Working Thoughts on SDN #4
Yesterday HP announced some SDN products, including a controller. If you had read my SDN Working Thoughts #3 post, then you already knew this data point. I have many questions about this announcement, starting with why would they announce an OpenFlow-based controller when you can get one from Big Switch Networks (BSN)? I am sure there is a smart answer, but that is not my point. In addition to HPQ, IBM announced a controller OEMed from NEC. My point is there has been and continues to be a lot of development and design of controllers going on. My hypothesis is that controller architecture will play a role in where the battle for SDN market share will be won and lost in the coming years, and that simplifying the market down to “separating the data plane from the control plane” is not specific enough and does not encompass a broad enough data set. I have written several times before that SDN is more than APIs and more than reinventing the past thirty years of networking in OpenFlowease.
I think a person’s perspective on the controller is directly related to how they see the network evolving and how their company wants to run its business. There is no standalone controller market. If I were to summarize the various views of the controller, I would say that incumbent vendors view a third-party controller as a threat and need to provide a controller as a hedge in their portfolio in case it becomes a strategic point of emphasis. Incumbents really do not know what to do with a controller in terms of their legacy business, which is why they market a controller as some sort of auto-provisioning, stat-collecting NMS on steroids. It will enable you to buy more of their legacy stuff, which for HPQ, after today’s guidance cut, may not be the case. The emerging SDN companies view the controller as a point of contention for network control. All the companies in the market-share bucket labeled “other” or “non-Cisco” view the controller as a means to access the locked-in market share of Cisco. In the past, I would have told you that control planes have enormous monetary value if you can commercialize them inside customers. Cisco did this with IGRP, IOS and NX-OS. Ciena did this with the CoreDirector. Sonus failed to do this. Ipsilon failed to do this. Does anyone remember the 56k modem standard battle between US Robotics and the rest of the world, and who won that market battle? The question over the next year or two becomes how many controllers are commercialized in the marketplace and what those controllers are doing. I think there is a difference between controllers doing network services and controllers providing network orchestration based on application needs.
The following quote is from Jim Duffy’s article in Network World on HP’s controller announcement:
“HP’s Virtual Application Networks SDN Controller is an x86-based appliance or software that runs on an x86-based server. It supports OpenFlow, and is designed to create a centralized view of the network and automate network configuration of devices by eliminating thousands of manual CLI entries. It also provides APIs to third-party developers to integrate custom enterprise applications. The controller can govern OpenFlow-enabled switches like the nine 3800 series rolled out this week, and the 16 unveiled earlier this year. Its southbound interface relays configuration instructions to switches with OpenFlow agents, while its northbound representational state transfer interfaces — developed by HP as the industry mulls standardization of these interfaces — relay state information back to the controller and up to the SDN orchestration systems.”
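To make the northbound piece concrete, here is a toy sketch of what a client of such a REST interface might look like. To be clear, the host, port, endpoint paths and payload fields below are my own invention for illustration; HP has not published this API. The point is only the shape of the interaction: read the centralized view, push a flow instead of typing a CLI entry.

```python
# Toy sketch of a "northbound REST" client for an SDN controller.
# The host, port, endpoint paths and JSON fields are invented for
# illustration only; this is NOT HP's published API.
import json
import urllib.request

CONTROLLER = "https://controller.example.com:8443"  # hypothetical address

def get_topology(token: str) -> dict:
    """Read the controller's centralized view of the network."""
    req = urllib.request.Request(
        f"{CONTROLLER}/sdn/v1/topology",
        headers={"X-Auth-Token": token},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def push_flow(token: str, dpid: str, match: dict, actions: list) -> None:
    """Program a flow through the controller instead of a manual CLI entry."""
    body = json.dumps({"dpid": dpid, "match": match, "actions": actions}).encode()
    req = urllib.request.Request(
        f"{CONTROLLER}/sdn/v1/flows",
        data=body,
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)
```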
Reading Duffy’s description, I think the SDN orchestration system (is that application orchestration?) is more valuable than the controller he describes, but that is a side discussion. I also took the time to read this blog post from HP. Much of this controller architecture discussion has been on my mind, as well as in my day-to-day work conversations, for the past few months. It seems a day cannot go by without a conversation on this matter. I have no conclusions to offer in this post, so if you are looking for one, please stop reading. The point of this post is that controller architecture, controller design and how SDN will evolve are all works in process, and I think it is a little early to be declaring the availability of solutions that offer marginal incremental value at best. The evolution of the controller thought process can be summarized at a high level by the following:
- Wired Article from Apr. 2012
- Urs Hölzle’s presentation from ONS in 2012
- Google’s “A Software Defined WAN Architecture” (81 slides) from ONS 2012
- Martin’s Blog
From Martin’s blog, in the section on General SDN Controllers:
“The platform we’ve been working on over the last couple of years (Onix) is of this latter category. It supports controller clustering (distribution), multiple controller/switch protocols (including OpenFlow) and provides a number of API design concessions to allow it to scale to very large deployments (tens or hundreds of thousands of ports under control). Since Onix is the controller we’re most familiar with, we’ll focus on it. So, what does the Onix API look like? It’s extremely simple. In brief, it presents the network to applications as an eventually consistent graph that is shared among the nodes in the controller cluster. In addition, it provides applications with distributed coordination primitives for coordinating nodes’ access to that graph.”
Regarding ONIX, here is a brief summary of the architecture, but you can read the paper on it here and note who the authors are and where they work (a toy sketch of this pattern follows the list):
- Centralized approach: a central controller configures switches using OpenFlow along with some lower-level extensions for more fine-grained control.
- Default topology is computed using legacy protocols (e.g. OSPF, STP, etc.), or static configuration.
- Collects and presents a unified topology picture (they call it a network information base – NIB) to apps that run on top of it.
- Multiple controllers (residing in apps) are allowed to modify the NIB by requesting a lock on the data structure in question.
- Scalability and Reliability:
- Cluster + Hierarchy of Onix instances, but NIB is synchronized across all instances (e.g. via a distributed database). For the hierarchical design, there is further discussion on partitioning the scope and responsibilities of each Onix instance.
- Transactional database for configuration (e.g. setting a forwarding table entry), DHT for volatile info (e.g. stats). There is a lot of focus on database synchronization and design.
- Example of “apps” mentioned in the paper:
- Security policy controller
- Distributed Virtual Switch controller
- Multi-tenant virtualized datacenter (i.e. NVP)
- Scale out BGP router
- Flexible DC architectures like PortLand, VL2 and SEATTLE for large DCs
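Pulling the Casado quote and the bullets above together, here is a minimal sketch of the programming pattern: the network presented to apps as a shared graph (the NIB), with an app taking a lock on an entity before writing to it. This is a toy, single-process illustration in Python, not the actual ONIX API.

```python
# Toy model of the pattern in the quote and bullets above: the network is
# presented to apps as a graph (the NIB), and an app takes a lock on an
# entity before modifying it. Illustration only; not the actual ONIX API.
import threading
from dataclasses import dataclass, field

@dataclass
class NibEntity:
    """One node in the network graph: a switch, port, link, host, etc."""
    name: str
    attrs: dict = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)

class Nib:
    """The shared, eventually consistent graph (toy, single-process version)."""
    def __init__(self) -> None:
        self.entities = {}

    def add(self, name: str) -> NibEntity:
        return self.entities.setdefault(name, NibEntity(name))

    def update(self, name: str, **attrs) -> None:
        entity = self.entities[name]
        with entity.lock:  # "requesting a lock" on the data structure in question
            entity.attrs.update(attrs)
        # A real ONIX instance would now replicate this change across the
        # cluster: transactional DB for configuration, DHT for volatile stats.

# Usage: an app marks a port down, then paths can be recomputed centrally.
nib = Nib()
nib.add("switch-1")
nib.update("switch-1", port3_status="down")
```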
Combining the info from multiple sources, Google uses ONIX for a network OS (see the link to the ONIX paper above). ONIX appears to be Nicira’s closed-source version of NOX, and both Nicira and Google use it. NEC has something called Helios that involves OpenFlow and, as noted above, was OEMed by IBM. I am not sure about HPQ and their recent controller announcement, but I think it serves us well to understand the history of the ONIX architecture:
- ONIX users think that fast failover at the switch level while maintaining application requirements is a hard problem to solve. They think it is better to focus on centralized reconfiguration in response to network failures.
- ONIX synchronizes state only at the ONIX controller
- ONIX is meant to support multiple controllers writing to the network information base, and probably to any table in any switch
Is ONIX a direction for some OpenFlow evolution or a design point? I think one of the early visions for OpenFlow and ONIX was for it to become a cloud OS, which it has yet to become, but others are trying. The evolution of the OF/ONIX vision looks something like this:
- Build a fabric solutions company with software and hardware, which is largely about controlling physical switches with OpenFlow (read the NOX paper here)
- Build a commercial controller (ONIX) and sell it as a platform product to a community of applications developers
- Build a network virtualization application (multi-tenancy through overlays…this is the part where Nicira renames ONIX to NVP?) that happens to embed their controller (formerly ONIX). Control the forwarding table with OpenFlow and every other aspect of the overlay implementation using the OVSDB protocol talking to OVS (it is largely about controlling virtual switches with only a pinch of OpenFlow); a sketch of this split follows the list.
- Nicira was purchased by VMware for its general expertise in SDN and for future applications of the technology assets (VMware today ships a virtualization/overlay solution using VXLAN that does not include any Nicira IP).
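As a rough sketch of the division of labor in the third bullet, assuming a host with Open vSwitch installed: OVSDB (driven here through ovs-vsctl) handles the switch and tunnel configuration, while the OpenFlow channel is reserved for programming the forwarding table. The IP addresses are placeholders, and this is just the OVS-level mechanics, not the NVP product interface.

```python
# Rough sketch of the overlay split described above, on a host with Open
# vSwitch installed: ovs-vsctl (OVSDB) sets up the switch and tunnels, the
# OpenFlow channel programs forwarding. IP addresses are placeholders.
import subprocess

def sh(*args: str) -> None:
    """Run a command, raising if it fails."""
    subprocess.run(args, check=True)

# OVSDB side: create the integration bridge and a VXLAN tunnel port.
sh("ovs-vsctl", "add-br", "br-int")
sh("ovs-vsctl", "add-port", "br-int", "vx0",
   "--", "set", "interface", "vx0", "type=vxlan",
   "options:remote_ip=192.0.2.10")  # tunnel endpoint on a peer hypervisor

# OpenFlow side: point the switch at the controller that will push
# forwarding-table entries (the "pinch of OpenFlow").
sh("ovs-vsctl", "set-controller", "br-int", "tcp:192.0.2.1:6633")
```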
It will be interesting over the next year or so to see how the architecture of the controller evolves. I wrote about some of this in the SDN Working Thoughts #3 post. I think we are coming to an understanding that there are alternatives to just running a controller in band with the data flows. I think we will conclude that having a controller act as a session-border-control device, translating between the legacy protocol world and the OpenFlow world, is also a non-starter, but this is the current hedge strategy of most incumbent vendors. As the world of SDN evolves, we will look back and see the path to what SDN has become by treating the failures as proofs along the way. The industry will solve the scaling and large-state questions, but I think the solutions will be shown to exist closer to the hardware (i.e. the network) than most envision in the pure software-only view.
In a prior post I made a reference to an article that was partially inspired by a post by Pascale Vicat-Blanc on the Lyatiss blog. The Lyatiss team has been working on a cloud language for virtual infrastructure modeling. In particular, it generalizes the FlowVisor concept of virtualizing the physical network infrastructure to include application concepts. I am not sure of the extent of their orchestration goals. Do they expect CloudWeaver to spin up the VMs and storage, place them on specific servers, configure the network to satisfy specific traffic-engineering constraints, and finally tear down the VMs? I am not sure. With Nicira now part of VMware, what is the future for NOX/ONIX, and will other companies be innovators or implementers?
There is another potential market evolution to consider when we think about the controller. The silicon developers are looking to develop chips that disaggregate servers into individual components. The objective is to make the components of the server, especially the CPU, upgradable. Some people have envisioned this type of compute cluster being controlled by OpenFlow, but I think that is unlikely. Network protocols will be around for a very long time, but putting that aside, the question is what this type of compute clustering does to the network. How much server-to-server traffic stays in the rack / cluster / pod / DC? I am not sure how much of this evolution will have to do with OpenFlow, but what I do know is that this type of compute evolution will have a lot to do with SDN, if you believe that SDN is about defining network topologies based on the needs of the applications that use the network.
In a true representation of the title, this post is just some working thoughts on SDN with hypotheses to be proven. Comments and insights welcome…
/wrk