My automation/orchestration thoughts

I’ve been thinking for a while to write about my experience in the field of network automation somewhere, and thanks to Richard I finally found a place where to share my thoughts and findings.
I will start with some non-technical dreck and then move on to some more technical stuff. I hope you will appreciate both.

The views and opinions expressed in this post are solely my own and do not express the views or opinions of my employer.

What?

To commence the discussion, let me warn you. Automation/orchestration/SDN are some of the most abused and confused terms in IT and are as misleading as vague.
The way I see it is that “orchestration” is something “higher”, that provides guidance to some clever workhorses working behind the scenes on how to do repetitive stuff with the right information.
There is no particular technologies or tools that give you “automation” or “orchestration”.
Orchestrating is about changing mindsets, organisational models and, ultimately, changing how we do things. It’s not about flipping tables, as that usually does not bring people together, but apart.

Why?

The reason everyone is nowadays craving for automation is just because it’s something that the net-execs love to hear. “We can lower OPEX by automating more !!”
Fact is that if you suck more money to achieve automation than the $$ you save, the discussion is a non-starter.
Also, if you don’t tackle the problem the right way from the start, it might mean that your investments are thrown in the bin as they are very short-lived.

Besides, it is pretty evident that having some clever non-human “functions” that are able to make our lives easier and let us spend more time in the pub would be nice, right?

Assume you have 3K+ switches where you want to change the NTP server address all in one go.
You certainly don’t want to have to login on 3K+ boxes one by one. A script could do it all for you, couldn’t it?
But that’s not the point. We’ve always had scripts, we’ve always had geeks writing 3000 lines-long bash scripts to carry out such tasks (that no-one but them could read or tweak because they were utterly bad-written).

The point of automation is having a much more comprehensive and coherent framework where tasks like changing an NTP server address or the MTU on an interface do not require ANY scripting skills, but can be done easily by anyone, even oompa-loompas.
Having such an environment lowers dramatically the chance for human errors and cross-configuration incompatibilities, which, in turn, lowers the risk of MPIs and, ultimately, of revenue losses.
We’re in the days of continuous testing and delivery of software, why can we not mutuate the same philosophy in the networking world?
A device config, in the end of the day, is a set of instructions that can be regarded exactly as a piece of code that can be version-controlled, automatically-delivered, and continuously tested.
We have to transform our network labs into continuous-testing facilities driven by tools like Jenkins/GIT/etc.!
My testing friends are probably going to hate me now. But can you imagine how quick and cheap would be to test a new service/feature just by defining the expected results and waiting for the outcome to appear in your mailbox?
I’m not saying we have to get rid of people but that we can concentrate on doing much clever stuff just by re-working our mindsets.

How ?

I work for one of the biggest ISPs in the UK. My company is a fairly complex organisation due to its history of acquisitions and massive growth rate in the last few years.
Introducing the orchestration topics in big and complex organisations is a particularly challenging task, not only because you have to get people from various departments on your side, but especially because you have to start operating on heterogeneous network estates whose stability is critical to guarantee the degree of services our customers expect (and pay us for).

We can spend hours discussing about the specific tools or softwares that can be used to orchestrate network functions, but we wouldn’t be tackling the main point.
Orchestrating a network (I’m taking SDN out of the picture here) essentially means “driving” the network configurations in a clever way, from a centralised system that I enjoy calling the SPOT (single-point-of-truth).
You can drive configurations quite easily, even if your network is multi-vendor, but you cannot do it without having a model to drive it with. You need some good-quality data to render your configurations. The recipe for an accurate network configuration has three ingredients: a good model, good data, and a good template. Et voila’.

Network configuration templates

Writing templates is easy, very easy. We opted for Jinja2 as a templating language, as it seems very widely used and is supported by tools like Ansible.
The challenge with templates is definitely one: re-usability. I do not want to write a template for how to configure a VLAN each and every time I write the templates for my projects. How can others can re-use my templates? Are my templates elastic enough to let others expand them should there need be?

Network models

Putting together a model is – fairly – simple. The big network players (coordinated by the IETF NETMOD WG) are trying to make ends meet and agree on some standardised models, all written in YANG. (www.yang-central.org)
A network device is typically used for more than one purpose (a BNG can be used to terminate subscribers, as a pure MPLS PE router where to connect CE devices, etc.) and I expect the model that best represents such a device would be composed by multiple services, each in the form of a mix of YANG model instances.

Network state Data

What is NOT easy at all is – especially in a brown field scenario – putting together the data that is necessary to populate the nice model you engineered so carefully.
Where do you fetch your IP addressing from, the L3/L2 MPLS VPN data, the interfaces descriptions and the network physical topology?
You might already have an IPAM, a database of some sort where you store that suff, but can they talk to each other? Do they expose any APIs? Are those APIs powerful enough to fulfill your needs?

I think the essence of the problem is about tool co-ordination. “Because that data needs to be somewhere”. You just need to talk to people and try to understand how you can make use of the data you already have.

The immaturity of YANG, the lack of models for ‘everything’, and the lack of network devices that can be configured through NETCONF just by the means of YANG data structures certainly does not help businesses to steer with determination towards the YANG road. (I suspect there is a lot of politics behind the reason why YANG is not yet so popular, although it’s been around for some years now).
The above is maybe one of the reasons why the landscape is blossoming with proprietary solutions developed (or acquired – that’s the case of Cisco NCS – former Tail-f) by multiple vendors that can help bridging the gap.

When?

NOW. This part is not going to be long and boring. You have to start NOW. Reason is that it’s a fairly complicated and time-consuming process (if you’re not starting from scratch).

Who?

That’s a 1BN$ question. I think everyone has to be involved in the process. No-one should feel left apart nor immune from the revolution.
If you don’t like it, change job. If you love CLIs, you don’t need to worry. You’re still going to need CLIs. It’s probably not going to be IOS or JUNOS anymore, but you can still call it CLI if it makes you feel better.

Summary

Introducing orchestration in big ISPs can be regarded as the first industrial revolution of the Internet after the advent of IP (just kidding), where efficiencies are introduced by automating dumb and time-consuming tasks.
The industrial revolutions took 20 years each to complete and I expect this one to last for a while too. There is going to be turmoils, riots and rallies arranged by the “non-automation” parties. But it is going to happen no matter what.

The biggest resistances to the revolution are maybe the ones coming from up above, as everything has to have a business-case nowadays, and building one for a revolution is rather difficult.
But wait, do we need one? This is not just about going out to the market and buying a shiny and very pricey “jack of all trades” that does everything at the click of a button or hiring 200 employees to code a monolithic tool.
What we can do is start tackling some simple problems wearing the hat of the orchestration architect (and therefore not writing 30.000 lines long python scripts that no-one can read) and orchestrating small portions of our networks at a minimal cost.
Then, when everyone else will see what you’ve done within the (small) budget of some other projects and how much you’ve simplified your and everybody else’s life they’ll say: “Wait a minute. Why don’t we jump on the bandwagon?”