SDN, It’s Free just like a Puppy!

I have written both and will post at the same time because I believe we are conflating many issues when it comes to networking.  For example: SDN, ONF, OpenFlow, Overlays, Underlays, Tunneling, VXLAN, STT, White Box, Merchant Silicon, Controllers, Leaf/Spine, Up, Down, Top, Bottom, DIY, Cloud, Hybrid, NetContainers, Service Chaining, DevOps, NoOps, SomeOps, NFV, Daylight, Floodlight, Spotlight to name a few.  Both of these posts are intended to be read back to back.  I wrote them in two parts to provide an intentional separation.

I have fantastic news for everyone who wants to implement SDN:  It is free; just like a puppy.  In many aspects, SDN is just like a puppy.  It is cute, we fall in love on sight, we want to play with it, we want to take it home, we limit its space until it is house trained, it provides us with unconditional affection and we like everything about it.  It is going to bring joy and happiness to our network, I mean our family.  Unfortunately, we have to live with it, the life-cycle is going to be about 15-18 years, and the puppy we have today is going to grow into a big dog that will soon stop being cute.  What we really need is a new network applicable everywhere that is on par with all the other IT technology pillars.

Taking a step back for a moment, let us review the decade of the noughts.  In 2001 we had just come off the internet/telecom bubble of the 1990s and the obscene spending on Y2K readiness.  In May 2003, Nicholas Carr wrote an article in HBR called IT Doesn’t Matter and followed it with a book a year later.  The article basically said that we will get our IT services from a great IT utility in the sky because the stuff is so complex, so expensive (i.e. CAPEX) and so labor intensive (i.e. OPEX) that the only way to control cost is to do it at scale, like the power grid.  Maybe it is regulated too.  A number of enterprises took the approach that they would just outsource their IT people and services to a third party.  Large businesses were built just to integrate and administer IT at scale.  All sorts of outsourcing companies grew into large businesses doing the complex work of making stuff work, and many companies even outsourced IT abroad.  Do you remember when we were all reading Thomas Friedman’s book The World Is Flat?

When I wrote this post a few weeks ago I was thinking about the past decade of networking, but I think I can set the background better than I did in that post.  Imagine we all worked in a data center until the end of 2002.  In December 2002 we decided to take a year off, bum around the BVIs and then open a coffee shop.  Something cool like Blue Bottle in SF or Voltage in Cambridge.  After ten years the coffee bug was worked out of our system and we decided to go back and work in the data center we left in 2002.  When we got there and looked around we would be in shock.  Every element of the modern data center, from the servers, applications, storage and DevOps tools to the power, cooling and physical size of the facilities, would have changed dramatically and be nearly unrecognizable from 2002.  Everything would be unrecognizable except the network.  The network would be the same.  We would recognize the network and could probably get current on its design and operation in a day or two, yet in the intervening decade the nature of applications, servers, workloads, storage and how those elements of IT interact would have changed dramatically.  I have heard this state described in many ways:

  • Cargo cult culture
  • Network people are masters of complexity
  • Network people do not think outside of the box – they are not risk takers

Some companies have attempted to break this culture, most notably Amazon and Google.  Amazon looked for ways to sell its excess IT capacity and this became AWS.  Google decided to build its own network equipment, figuring that if the current state of the switched network is the best it is going to get, meaning we are just going to lash up bigger and bigger leaf/spine hierarchies, then it can at least make it less expensive by doing it itself.  From around 2004 into 2007, a group at Stanford and some of the search engines came up with the idea of separating the data plane from the control plane and trying to solve the inflexible nature of the network through programmability.  Then Nicira came out with overlays.  All of these were attempts to fix the network by logically overlaying or abstracting aspects of the network with programmability.  Let me review the sequence of attempts to control network cost in the past ten years:

  • Outsource
  • Lease
  • Do It Yourself
  • Overlays
  • Implement SDN (i.e. Controller Architecture) + Merchant Silicon

These are all attempts to attack the cost in terms of both CAPEX and OPEX.  Commoditization and DIY networking are really the puppy elements of SDN.  Product life-cycle management is difficult.  Components, supply chains, QA testing, regression testing, you know, the six sigma stuff, are hard to do.  Substituting one cost element for another is not really a long term solution to the problems presented by the current state of switched network designs.  I think the siren song of commoditization and DIY networking is appealing to people because the network does not really change.  That is my point.  The solution is changing the network construct, not more of the same.  Network switches are not like disks and servers.  Disks and servers are not the same either.  The cheap-and-cheerful disposable hardware model only works:

  • …for servers if workload is fluid
  • …for disks if data is fluid
  • …for switches if capacity is fluid

The word fluid is being used as a condensation/conflation of: replicated, replicable, re-locatable, re-allocatable.  Interestingly, when you think about it this way, virtualization is not a means but a result of fluidization.  Pointedly, network virtualization does not make capacity fluid; it makes workloads fluid.  If workloads are fluid, it would be helpful to have fluid network capacity to allocate to the demands of the workloads.

The OPEX cost of the network is not going to change until the network portion of the IT model is made concurrent with the other elements of the IT model.  What needs to be addressed is: (1) network correlation and automation that is concurrent with the state of other technologies in the DC and (2) better utilization of the network based on what is important, which is applications and workloads.

/wrk

Dr. Strangelove: Or How I Learned to Stop Worrying and Love SDN

If you have not seen the movie Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb, you should.  It is an important point of cultural reference in our contemporary history.  I have been thinking about all this SDN stuff and various technical and business strategies over the past few weeks.  Today, a colleague made reference to the movie Dr. Strangelove in a passing conversation about network design.  It occurred to me that there are a lot of humorous parallels between the movie and networking.  This is a blog, and I think of it as a place between unfinished thoughts and longer form content.

I started writing about the OPEX problem of networking last year.  Here is a link to a post on OPEX, but I should state that the founder of Plexxi was thinking about this problem in 2009-2010 when he founded the company, and if you look through the old Nicira presentations you can see they were highlighting the same problem set.  See this presentation I did in January 2013, which is really a presentation on the origins of SDN and what it means.  For this post, I am going to pretend that SDN does not exist.  In fact, if networking did not exist at all and a group of us were put into a room and asked to create a network, I doubt that we would come up with anything remotely resembling what we have today.  This might be a good exercise for IT architects to go through.  How would you design a network today, forgetting the limitations of the last 20-30 years of networking?  That is what we did at Plexxi and continue to do each day.  That is the point of this post.  If I were to design a system called a network to connect compute, storage and users, I would want that network to have a number of characteristics.

1. Network Must Express What is Important: Solving the internet and client/server challenges was about solving the problem of connectivity.  Today, we are not solving the problem of connectivity, we are solving the problem of utilization and correlation.  We need to correlate the utilization of the network with the workloads that the network is tasked to support.  We want the network to be orchestrateable by the application and the developer of the application.  That means a developer can say that application A-B-C requires these sets of characteristics from the network such as low latency, jitter sensitive, bandwidth, hop count, path or what is called service chaining.  A network then has the ability to take these requirements from many applications and calculate a topology that best reflects the needs of the applications.

2. Dynamic Network: In order to make the network orchestrateable, we need the network to be dynamic.  Dynamic networks can be built in the wireless or optical domains.  Physically wiring a network reinforces the hardwired, physically limited design of the network.  When we wire up the conventional leaf/spine, switched network design, we lock in the network design requirements on day 1.  If the workload requirements of the network change after the first day, we have no ability to change how the network is configured.

3. Purpose Built for Automation: We want the network to be built for automation.  We do not want to be configuring network elements at the port or the device level through CLIs.  We want to compute topologies and script the network configuration.

4. Single Interface: We need to have centralized interface points that allow the network to be orchestrated.  This is not a single point, but it is not every network element.  We want to have orchestration from the top down to the device – not the device up.

5. API Driven: We want the network interface to be API driven.  We want an open interface that allows software developers and server administrators (i.e. DevOps people) to orchestrate the network based on the needs of their applications and services.  If we want to reduce OPEX, we need to remove the network as a silo that lags behind the dynamic nature of the other elements of IT.

6. Simplification: A network can do two things: connect and disconnect devices.  Physical layer cabling is simplified via an advanced dynamic interconnect, which translates into lower power, cooling and administrative costs.  The Plexxi topology is guaranteed to remain consistent as you grow the network in size, in contrast to leaf/spine, where budgets and build schedules frequently do not allow this to happen.

7. Scale: We need a network that can scale concurrent with the evolutionary cycles occurring in compute and storage.  The scaling problem is multi-dimensional.
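Items 1 through 5 describe applications stating their requirements and a central controller computing the network from them, top down. Here is a minimal Python sketch of that idea, under loud assumptions: the `Path` and `Requirement` types, the two candidate paths and the "tightest latency first, fewest hops that fits" policy are hypothetical illustrations, not Plexxi Control's API or its Affinity algorithm.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    hops: int
    latency_us: float
    free_gbps: float


@dataclass
class Requirement:
    app: str
    max_latency_us: float
    gbps: float


def place(requirements, paths):
    """Give each application the fewest-hop candidate path that meets its
    latency bound and still has enough free bandwidth, reserving capacity
    as it goes. Returns {app: path name, or None if nothing fits}."""
    placement = {}
    # Assumed policy: serve the tightest latency requirement first.
    for req in sorted(requirements, key=lambda r: r.max_latency_us):
        fits = [p for p in paths
                if p.latency_us <= req.max_latency_us and p.free_gbps >= req.gbps]
        best = min(fits, key=lambda p: (p.hops, p.latency_us), default=None)
        if best is not None:
            best.free_gbps -= req.gbps  # reserve bandwidth on the chosen path
        placement[req.app] = best.name if best is not None else None
    return placement


# Hypothetical candidate paths and application requirements.
paths = [Path("direct", hops=1, latency_us=2.0, free_gbps=10.0),
         Path("via-spine", hops=3, latency_us=6.0, free_gbps=40.0)]
requirements = [Requirement("A", max_latency_us=5.0, gbps=5.0),
                Requirement("B", max_latency_us=10.0, gbps=8.0),
                Requirement("C", max_latency_us=3.0, gbps=4.0)]
result = place(requirements, paths)
print(result)
```

Here the two latency-sensitive applications share the short path until it runs out of room, and the tolerant one is steered to the longer path, which is the kind of decision a hash-and-hope design never makes explicitly.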

When building a network with the Plexxi architecture, there are four powerful advantages:

  1. The controller architecture (centralized points of control and administration)
  2. Simplified cabling (no need to engineer a massive structured cabling plant)
  3. Linear cost as the network grows (no core switch upgrades)
  4. The use of optics provides enormous benefits at scale in terms of power and cooling

For illustrative purposes, let us look at the number of servers, switches and cables required for a full fat tree network design built from K-port switches:

  • Supports K^3/4 servers
  • Requires 5K^2/4 switches
  • Requires K^3/2 cables (switch-to-switch)

Using a 64-port leaf (distribution layer) ToR switch would result in a fat tree design supporting 65,536 ports, 5,120 switches and 131,072 cables.  In a Plexxi network, the same design providing 65,536 ports would result in 1,366 switches and 2,732 cables.  The reduction in the number of switches and cables is a direct result of the use of optics and a controller (i.e. SDN) architecture.  These advantages result in significant operational cost reductions.
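The fat tree arithmetic can be checked with a short Python sketch. It assumes the standard three-tier fat tree built from K-port switches, counts only switch-to-switch cables (server-to-edge cables would add another K^3/4), and takes the Plexxi switch and cable counts quoted above as given rather than deriving them.

```python
def fat_tree(k):
    """Element counts for a three-tier fat tree built from k-port switches (k even)."""
    servers = k ** 3 // 4
    # (k/2)^2 core switches plus k pods, each with k/2 aggregation and k/2 edge switches
    switches = (k // 2) ** 2 + k * k
    # edge-to-aggregation links (k^3/4) plus aggregation-to-core links (k^3/4)
    inter_switch_cables = k ** 3 // 2
    return servers, switches, inter_switch_cables


servers, switches, cables = fat_tree(64)
print(servers, switches, cables)

# Plexxi figures as quoted in the text, taken as given rather than derived here.
plexxi_switches, plexxi_cables = 1366, 2732
print(f"switch ratio {switches / plexxi_switches:.1f}, "
      f"cable ratio {cables / plexxi_cables:.1f}")
```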

Attacking the complexity and cost factors in running a network is multi-dimensional.  Servers are going from 1G to 10G.  That means a network upgrade is necessary.  The good news for customers is they have choice in how they design their network.  It is probably the first time they have had choice in network design since the demise of ATM and shared LANs.  For the past 16 years (or more) we have been designing switched hierarchical networks.  Choice for the customer was really price: what kind of OSR to use, who had the lowest-priced 1G and 10G switch ports, what kind of 40G density to go with, etc.  The network was the same, regardless of the choice of vendor.

Customers can decide to build a Plexxi network.  A Plexxi network is built to deliver a long term OPEX reduction and we can do it at scale.  We have an API driven controller.  We have a dynamic interconnect.  Our standards-based, merchant silicon based Ethernet switch was purpose built for automation from the controller.  A Plexxi network is simple.  It is an optical fabric orchestrateable via the controller.  It is built to correlate the workloads expressed to it from the user applications via the API.  In legacy networks, we spend a lot of time determining state, letting legacy protocols, QoS and firewalls figure out the topology.  In a Plexxi network we figure out what is important, calculate topology using that information and then program the network to use 100% of the bandwidth.  We do not want to have a network that is 20% utilized with one person per 100 devices.  We think we can do a lot better than those metrics.

/wrk

Self Similar Nature of Ethernet Traffic

With all the debates around networking at ONS 2013, I found myself reading competitive blog posts and watching competitive presentations from vendors.  It was the most entertaining part of ONS, and it certainly invigorated InterOp this past week with a new sense of purpose.  Many vendors announced new switches and products ahead of the InterOp show.  There has also been a steady discussion post-ONS on the definition of SDN.  With all the talk around buffer sizes, queue depths and port densities, I think something has been lost, or I missed a memo.  I often hear people talk about leaf/spine networks, load balancing, ECMP and building “spines of spines” in large DC networks.

Arista recently announced its new spine switch, the 7500E, with deep buffers and high port densities to build even bigger multipath, switched hierarchies.  The 7500 is really the state of the art for deep, multi-stage forwarding fabric switches, whose core design is relatively unchanged since the early to mid 1990s.  The need for deep buffers is the result of mapping real Ethernet traffic over statistical fabrics.  “Deep” is often put in quotes because, even though the amount (in GB) is large, you really have to compute the depth in terms of time (in µs), so the real, practical depth scales inversely with the inbound aggregate link speed.  The Arista 7500E is clearly the state of the art in terms of Ethernet switches.  As we continue to build bigger and higher, I wonder who has a 2000-port terabit Ethernet switch with 1000Gb of switching buffers on their POR?  After all, that is what we are going to need five years from now.  Here is a recent quote I read about multipath load balancing and optimal utilization, which these large multi-stage forwarding switches are designed to facilitate: “In practice, a well-provisioned IP network with rich multipath capabilities is robust, effective, and simple. Indeed, it’s been proven that multipath load-balancing can get very close to optimal utilization, even when the traffic matrix is unknown (which is the normal case).”
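The point that buffer depth should be measured in time rather than bytes is simple arithmetic. A minimal Python sketch, assuming a simple fluid model in which traffic arrives at an aggregate inbound rate, drains at the egress rate, and the queue grows at the difference. The 1 MB figure is a hypothetical per-port share chosen for round numbers, not a 7500E specification.

```python
import math


def time_to_fill_us(buffer_bytes, inbound_gbps, egress_gbps):
    """Microseconds until a buffer is exhausted when aggregate inbound
    traffic exceeds the egress rate (simple fluid model)."""
    excess_bits_per_s = (inbound_gbps - egress_gbps) * 1e9
    if excess_bits_per_s <= 0:
        return math.inf  # arrivals never outpace the egress link
    return buffer_bytes * 8 / excess_bits_per_s * 1e6


BUFFER = 1_000_000  # hypothetical 1 MB of buffer available to one egress port
for inbound, egress in [(20, 10), (40, 10), (200, 100), (400, 100)]:
    print(f"{inbound}G into {egress}G: "
          f"{time_to_fill_us(BUFFER, inbound, egress):.1f} us")
```

The same megabyte that absorbs 800 µs of a 2:1 burst onto a 10G port absorbs only 80 µs when every link is ten times faster, so the byte count has to grow with link speed just to stand still.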

I went about trying to find a cited source that proves that multipath load balancing can get very close to optimal network utilization with an unknown traffic matrix.  I asked a lot of people to point me to the paper or the traffic analysis.  Not a single person could point me to a paper or source.  One person said the origin of this belief comes from the Wikipedia page on Clos networks.  Really?  The Wikipedia page on Clos is now the reference design page for networking?  Another very well respected networking thought leader told me that MPTCP might provide this capability, but to date it was unproven.

I think we need to go back and do some reading, starting with Leland and Taqqu’s paper On the Self-Similar Nature of Ethernet Traffic.  I uploaded a copy here.  It is an interesting paper, and since it was published in 1993, I wonder if it has been lost in time.  Do people read it anymore, or was it lost in the transition from shared LANs to switched LANs and the excitement of the 1990s?  Self-similarity is the idea that something looks the same regardless of scale.  Put in terms of long range dependence, autocorrelation decays slowly and the Hurst parameter (H, a measure of burstiness, developed by Harold Hurst in 1965) increases as traffic increases.  In other words, the more you aggregate, the more you add spines of spines to the design, the more the burstiness (i.e. the H parameter) intensifies.  The obvious assumption is to expect a Poisson distribution, but that is not the result.  Processes with long range dependence are characterized by an autocorrelation function that decays hyperbolically as the lag increases, rather than exponentially as in conventional short-range dependent models.  Said another way, as the number of Ethernet users increases, the resulting aggregate traffic becomes burstier instead of smoother.

“We demonstrate that Ethernet local area network (LAN) traffic is statistically self-similar, that none of the commonly used traffic models is able to capture this fractal behavior, and that such behavior has serious implications for the design, control, and analysis of high-speed, cell-based networks. Intuitively, the critical characteristic of this self-similar traffic is that there is no natural length of a “burst”: at every time scale ranging from a few milliseconds to minutes and hours, similar-looking traffic bursts are evident; we find that aggregating streams of such traffic typically intensifies the self-similarity (“burstiness”) instead of smoothing it.”
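The effect the paper describes can be explored with a small experiment. The Python sketch below estimates H with the aggregated-variance (variance-time) method: average the series over blocks of size m and watch how fast the variance of the block means falls as m grows. It compares independent per-slot samples, a stand-in for smooth Poisson-like traffic that should give H near 0.5, with the sum of 20 ON/OFF sources whose ON and OFF period lengths are Pareto distributed. That heavy-tailed ON/OFF model comes from later work on explaining self-similarity, not from this paper, and the source count, Pareto shape and series length are arbitrary choices for illustration.

```python
import math
import random
import statistics


def hurst_aggregated_variance(series, min_blocks=32):
    """Estimate H from the slope of log Var(block means) vs log(block size).

    Short-range dependent (Poisson-like) traffic gives Var ~ m**-1, so H = 0.5.
    Self-similar traffic gives Var ~ m**(2H - 2), so 0.5 < H < 1.
    """
    n = len(series)
    xs, ys = [], []
    m = 1
    while n // m >= min_blocks:
        blocks = n // m
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(blocks)]
        xs.append(math.log(m))
        ys.append(math.log(statistics.pvariance(means)))
        m *= 2
    # Least-squares slope of log(variance) against log(block size).
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1 + slope / 2


def on_off_source(n_slots, alpha, rng):
    """1 unit of traffic per slot while ON, 0 while OFF; ON and OFF period
    lengths are Pareto(alpha), i.e. heavy-tailed for 1 < alpha < 2."""
    out, on = [], rng.random() < 0.5
    while len(out) < n_slots:
        length = min(math.ceil(rng.paretovariate(alpha)), n_slots - len(out))
        out.extend([1 if on else 0] * length)
        on = not on
    return out


def superpose(n_sources, n_slots, alpha, seed):
    """Aggregate traffic per slot from n_sources independent ON/OFF sources."""
    rng = random.Random(seed)
    total = [0] * n_slots
    for _ in range(n_sources):
        for i, v in enumerate(on_off_source(n_slots, alpha, rng)):
            total[i] += v
    return total


N = 2 ** 14
rng = random.Random(7)
independent = [rng.random() for _ in range(N)]  # no correlation between slots
bursty = superpose(n_sources=20, n_slots=N, alpha=1.4, seed=7)

h_independent = hurst_aggregated_variance(independent)
h_bursty = hurst_aggregated_variance(bursty)
print(f"independent slots: H ~ {h_independent:.2f}")
print(f"20 ON/OFF sources: H ~ {h_bursty:.2f}")
```

With fixed seeds the run is repeatable. The superposed sources should report an H well above 0.5, meaning the aggregate stays bursty across block sizes instead of smoothing out the way independent traffic does.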

As we continue to build ever taller and ever denser switched hierarchies, will we reach a point of diminishing returns in terms of complexity and cost?  When I think about the implementation of a controller architecture, I ask: what is the objective of this architecture?  Why would you deploy a controller?  This is the question I like to ask network people: what does your controller do?  The reason to build a controller cannot be to implement the past in a new protocol form.  It has to be to do something different to obtain a different result or value.  The overlay and TCAM programming strategies were developed to get better utilization and flexibility over the “rigid, no easy scale out, poor programmatic control, inflexible workload placement, limited multi-tenancy” aspects of the legacy switched (i.e. physical) network.  I quoted those words from a variety of vendor presentations on my desk.  I think a lot of the early SDN strategies assumed that the network construct of a fat tree, leaf/spine, spines of spines, switched hierarchy would never change.  Network designers need to be wary of aggregating complexity; just adding capacity, such as upgrading from 1G to 10G or 10G to 40G, does not resolve the flaw in the network, it merely masks the flaw for some period of time.  The fluidity of the network does not work in the same way as the fluidity of storage and compute.  That statement is intended for the people who think network devices work like servers and that when switches fail, traffic loads will be easily migrated to available resources.  I think to fully harness the benefit of a controller architecture, the controller must be able to physically and logically alter the network.  That is the point.  A controller can change the topology.  Why would we want to change the topology of the network?

The rule of symmetry in a leaf/spine network is the answer.  A leaf/spine network is a structured wiring network design and when you break the symmetry of the design you diminish the value of the design.  If you have to add spines to the spines to scale out the design, you lose the value of collapsing the tiers in the design.  With a controller architecture, you can move from the position that most random is most optimal to most optimal is specific to the workload requirements.  The network can react to the workload requirements.

The whole idea of randomly hashing flows across the network and calling it optimal is the opposite of most scientific principles.  I am certain that if server virtualization had worked by instantiating a VM and placing it on a randomly chosen CPU, adoption of server virtualization would have been far lower and slower.  That leaves me to ponder why we would conclude that randomizing the network is the best path to optimization when that is not the strategy of choice for compute and storage.  I fully agree that we do not want applications programming QoS settings, but we can abstract the requirements of the application and workload.  This abstraction will allow us to build a network tailored to the needs of the application workloads.  We can build a better network, rather than operating on the twenty-year assumption that most random is most optimal.  We can build a network based on specifics.  At Plexxi we call that Affinity.
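The contrast between random hashing and placement based on what the workload needs can be sketched with a toy Python model. It assumes CRC32 of the 5-tuple as a stand-in for a switch's ECMP hash (real ASICs use their own hash functions and field selections) and assumes flow sizes are known in advance, which is exactly the information a controller with an application-level view could supply. The greedy rule is a generic scheduling heuristic chosen for illustration, not Plexxi's Affinity algorithm, and it is not optimal in general.

```python
import zlib


def ecmp_hash(flows, n_links):
    """ECMP-style placement: hash each flow's 5-tuple to pick an uplink,
    ignoring how big the flow is. Returns per-link load in Gbps."""
    load = [0.0] * n_links
    for five_tuple, gbps in flows:
        link = zlib.crc32(repr(five_tuple).encode()) % n_links
        load[link] += gbps
    return load


def size_aware(flows, n_links):
    """Workload-aware placement: largest flows first, each onto the currently
    least-loaded uplink (the greedy longest-processing-time heuristic)."""
    load = [0.0] * n_links
    for _, gbps in sorted(flows, key=lambda f: f[1], reverse=True):
        i = load.index(min(load))
        load[i] += gbps
    return load


# Hypothetical mix: four 8 Gbps "elephant" flows and forty 0.1 Gbps "mice".
flows = [((f"10.0.0.{i}", f"10.0.1.{i}", 6, 40000 + i, 80), 8.0) for i in range(4)]
flows += [((f"10.0.2.{i}", f"10.0.3.{i}", 6, 50000 + i, 443), 0.1) for i in range(40)]

hashed = ecmp_hash(flows, n_links=4)
balanced = size_aware(flows, n_links=4)
print("hashed   max link load:", round(max(hashed), 1), "Gbps")
print("balanced max link load:", round(max(balanced), 1), "Gbps")
```

With four elephants hashed onto four uplinks, the chance that they land on four different links is only 4!/4^4, roughly 9%, so hashing usually stacks two elephants on one link. The size-aware placement spreads them one per link by construction.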

“An important implication of the self-similarity of LAN traffic is that aggregating streams of such traffic typically does not produce a smooth (“Poisson-like”) superposition process but instead, intensifies the burstiness (i.e., the degree of self-similarity) of the aggregation process. Thus, self-similarity is both ubiquitous in our data and unavoidable in future, more highly aggregated, traffic. However, none of the currently common formal models for LAN traffic is able to capture the self-similar nature of real traffic. We briefly mention two novel methods for modeling self-similar LAN traffic, based on stochastic self-similar processes and deterministic nonlinear chaotic maps, that provide accurate and parsimonious models.”

There is a fair amount of work going on in the IT industry to predict, understand, place and control workloads in the data center.  We should not ignore this work.  At Plexxi, we believe we can harness this measurement of utility and apply it to the network to obtain fully specified application topologies.  I think most IT professionals think the network has a low value of utility.  I think that can change.  It is just another way we want to build you a better network.  I think that five years from now, the design of the network will be different.  If we really want to build networks at exabit scale, the network will be rich in path diversity, not aggregation.  The more paths the better.  There was a presentation at OFC this year in which Facebook showed an astounding statistic: for every external 1kb HTTP request, there is a 930x increase in internal traffic.  Building at scale is going to require the IT world to harness the photonic advantage, and we will not talk about buffers and queue depths because in the photonic world there are no buffers.  The photonic advantage (see the chart from Dr. John Bowers), measured in capacity (Gbps) per unit of power (watts), is 20,000 times better for optics than for packet switching.  This is possible in the modern data center because a controller can compute topologies that use optical and packet switching technologies to their advantage.  In the past I wrote about traffic patterns and highlighted that the cheapest ports a network architect can purchase are the ToR ports.  The next era of network design will not be led by making the cheapest ports cheaper so we can buy more of them and add to the complexity of the network.

/wrk

Demonstrating AWESOME in the Pursuit of the Optical Data Center

This week at OFC, Plexxi and Calient are showing the power of SDN and optics.  The idea of using an optical or hybrid optical architecture for the data center has been pursued for years.  Here is a link to a 2010 paper, Helios: A Hybrid Electrical/Optical Switch Architecture for Modular Data Centers, written by a number of people, the most notable being Amin Vahdat.  At OFC this year, as in past years, there are a number of papers on the use of optics in the data center.  Here are some examples:

A search on “optical datacenter” using Google yields 4.29M results.  The links above are just from the first page or are well known papers.  What are Plexxi and Calient showing?  Here is a bit of the inside story.

The demonstration in the Calient booth at OFC was assembled in the past few weeks.  We did not even agree to put it together until late February.  One of my technical team members, Colin Ross, did some of the heavy lifting, but it was a team effort.  The actual connector in Plexxi Control that communicates with the Calient switch was written very quickly (in a couple of hours) in Python.  What we did next was use telemetry data from Boundary on VM performance to signal Plexxi Control to change the topology of the network using the MEMS fabric in the Calient switch.  This is a sharp visualization of end-to-end automation from the application level down to the network, rather than from the wires up.

There are many possibilities going forward, one of them being dynamic topology changes based on sFlow, which Plexxi will support in a future software release.  I have inserted two diagrams and a short video.  Here is a description of what you are seeing in the video:


First you will see that we have a connection between switch P_05 and P_04.  This correlates with the Boundary window showing “Client4->Client5” with activity.  When activity moves to “Client5->Client3” (around 18 seconds) in the Boundary window, you will see that shortly afterward the Calient switch connects P_05 to P_03.
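The control loop behind the demo can be sketched in Python. This is not the code shown at OFC: the telemetry records, the `OpticalCrossConnect` class and the client-to-switch mapping are hypothetical stand-ins (no Boundary, Plexxi Control or Calient API is used), and the policy of giving the single busiest client pair the optical circuit is an assumption chosen to mirror what the video shows.

```python
from collections import Counter

# Assumption: ClientN is attached to switch P_0N, matching the names in the video.
CLIENT_SWITCH = {f"Client{n}": f"P_0{n}" for n in range(1, 6)}


class OpticalCrossConnect:
    """Hypothetical stand-in for an optical circuit switch holding one circuit."""

    def __init__(self):
        self.circuit = None

    def connect(self, a, b):
        self.circuit = frozenset((a, b))
        print(f"circuit: {a} <-> {b}")


def busiest_pair(samples):
    """samples: (src_client, dst_client, bytes) records from one telemetry window."""
    volume = Counter()
    for src, dst, nbytes in samples:
        volume[frozenset((src, dst))] += nbytes
    return volume.most_common(1)[0][0] if volume else None


def reconcile(samples, xc):
    """One pass of the loop: point the optical circuit at the busiest client pair."""
    pair = busiest_pair(samples)
    if pair is None:
        return
    a, b = sorted((CLIENT_SWITCH[c] for c in pair), reverse=True)
    if xc.circuit != frozenset((a, b)):
        xc.connect(a, b)  # only touch the optical fabric when the answer changes


xc = OpticalCrossConnect()
reconcile([("Client4", "Client5", 9_000_000), ("Client1", "Client2", 1_000)], xc)
reconcile([("Client5", "Client3", 7_000_000), ("Client4", "Client5", 2_000)], xc)
```

The two calls replay the sequence in the video: the first window connects P_05 to P_04, and when the traffic shifts the second window moves the circuit to P_05 and P_03.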

If you are a networking person, this should be a pretty cool demo.  What we are showing is the true power of SDN: applications express a need to the network, and the network is programmed for that need, all through a Python interface.  I think that is pretty awesome, but I am sure there are people who want dad’s CLI back.

/wrk

Notebook 3.9.13: NFD #5, Portfolio Changes, Daylight-ONF, OFC

Last week was busy with travel to SF and the snowstorm in Boston.  This week is no easier as I spend part of the week in Boston and part of it in SF, with meetings in Silicon Valley.

Networking Field Day #5

Plexxi participated in Networking Field Day #5 on Thursday.  There were many questions from the NFD team as they were trying to digest the Plexxi system.  The NFD #5 team posted the videos, and if you have a couple of hours they provide a baseline introduction to Plexxi.  Much of what I have been blogging about over the past ~16 months has been about Plexxi and what we are trying to change in networking.  Having been on a lot of Plexxi sales calls over the past year, I would say that everyone wants to understand the hardware and is fascinated with the optical interconnect, but the optical portion of the Plexxi system is no more important than the merchant silicon portion.  The interesting part is Plexxi Control and how an application can orchestrate the underlying network elements (optical + switching silicon) in a Plexxi switch.  Towards the end of the third video (~50 min mark) there is a comment by Derick Winkworth (Plexxi team member) about a network-wide view via your new CLI, which is Python scripting through Plexxi Control.

Plexxi Intro

Plexxi Switch

Affinity Networking and Plexxi Control

Here is a post by my colleague Mike Bushong on The Insidious Cost of Incrementalism – Part 1.

Portfolio Changes

Crazy week of stock picks.  I have been remiss in noting all the changes, but I am long VMW, BRCD, BSFT, Gold and AMT.  I am waiting on a better entry point to buy back my GOOG position.  Gold is driving me crazy; I keep it as a hedge, but all the emotion around it almost makes me want to sell it and ignore it.

Controller Wars / Daylight / ONF

I have posted frequently over the past year about controllers and SDN.  I was even asked questions about Daylight at the Credit Suisse Datacenter conference this past week.  I am not sure of the accuracy of this post, but as I wrote in September of last year, it seemed that 2013 was destined to be the year of the controller wars.  Dell moves to block Daylight; see the link and analysis from SDN Central here.  Speaking of SDN Central, Plexxi and Boundary will be hosting a joint demo this week on Friday.

OFC: Plexxi Demo of Optically Connected Datacenter Switches

At OFC, Plexxi is going to be showing integration with an optical switch vendor.  I am going to write more about this subject next week, but it goes back to the Python scripting comment and orchestrating workloads across the network.  Here was an interesting Twitter exchange during the NFD #5 live session:

NFD

/wrk

It is all about Doctrine (I am talking about Networking and that SDN thing)

Last year, I wrote a long post on doctrine.  I was reminded of that post three times this week.  The first was from a Plexxi sales team who told me about a potential customer who was going to build a traditional switched hierarchical network as a test bed for SDN.  When I asked why they were going to do that, they said the customer stated it was the only method (i.e. doctrine) his people had knowledge of and it was just easier to do what they have always done.  The second occurrence was in a Twitter dialog with a group of industry colleagues across multiple companies (some competitors!), and one of the participants referenced doctrine as a means for incumbents to lock competitors out of markets.  The third instance occurred at the Credit Suisse Next-Generation Datacenter Conference when I was asked what will cause people to build networks differently.  Here are my thoughts on SDN, networking, doctrine, OPEX and building better networks.

Is SDN a killer app for networking?  No, but applications are.

Why is networking so complex?  Because it is a method (i.e. strategy) for keeping competition out of your market.  (1) Make it complex, (2) make it require advanced training, (3) institutionalize advanced training to master complex knowledge, and (4) induce a state of apogee via doctrine to prevent change.

How is doctrine enforced and institutionalized? By reinforcing the need for complexity.  If you cannot make it complex on your own, find others to help.

“The concept of doctrine enables technology people to make assumptions.  Assumptions are great as long as they hold.  When I refer to doctrine I am referring to procedures that ecosystem participants follow because they have been trained to reason and act in a certain manner within the command and control structure of their business and technology.  We design networks, manage companies, and evaluate technology and markets according to a common set of doctrines that have been infused into the technology ecosystem culture over many decades.  I was thinking along this line in mid-October when I posted ‘I also believe we are all susceptible to diminished breadth in our creativity as we get older.  Diminished breadth in our creativity is the root cause of why history repeats itself and another reason why, when we change companies, we tend to take content and processes from our prior company and port them to our new company.  This is especially true in the technology industry.  We recycle people; hence we recycle ideas, content and value propositions from what worked before.  Why be creative when it is easier to cut and paste?  As a casual observation it seems to me that most people working in tech have a theta calculation as to their creativity.  I believe a strategy to guard against creativity decay is to look back on the past and critique the work.’  In mid-October I had not fully fused the thesis of creativity fail or creativity theta with doctrine.  The idea to link the two concepts occurred to me last night as I was reading Shattered Sword for the second time.”

How is doctrine broken?  Doctrine is what people believe in and act on.  When intellectual thought leaders have the courage to lead, change can occur quickly.  I told the audience at the Credit Suisse (CS) conference that the migration from shared LANs to switched LANs occurred quite quickly, and if you were not involved in the technology industry in the early 1990s you probably do not know about Cabletron, Synoptics, Bytex, Chipcom, Proteon, CrossComm, Xyplex, DEC, etc.

I started writing about the OPEX problem of networking last year.  Here is a link to a post on OPEX, but I should state that the founder of Plexxi was thinking about this problem in 2009-2010 when he founded the company, and if you look through the old Nicira presentations you can see they were highlighting the same problem set.  See this presentation I did in January 2013, which is really a presentation on the origins of SDN and what it means.  For this post, I am going to pretend that SDN does not exist.  In fact, if networking did not exist at all and a group of us were put into a room and asked to create a network, I doubt that we would come up with anything remotely resembling what we have today.  This might be a good exercise for IT architects to go through.  How would you design a network today, forgetting the limitations of the last 20-30 years of networking?  That is what we did at Plexxi and continue to do each day.  That is the point of this post.  If I were to design a system called a network to connect compute, storage and applications, I would want that network to have the following characteristics.

1. Network Must Express What is Important: Solving the internet and client/server challenges was about solving the problem of connectivity and reachability.  Today, we are not solving the problem of connectivity and reachability; we are solving the problem of utilization and correlation.  We need to correlate the utilization of the network with the workloads that the network is tasked to support.  We want the network to be orchestrateable by the application and the developer of the application.  That means a developer can say that application A-B-C requires a particular set of characteristics from the network, such as low latency, jitter sensitivity, bandwidth, hop count, or a specific path through network services (what is called service chaining).  The network then has the ability to take these requirements from many applications and calculate a topology that best reflects the needs of those applications.

2. Dynamic Network: In order to make the network orchestrateable, we need the network to be dynamic.  Dynamic networks can be built in the wireless or optical domains.  Physically wiring a network reinforces its hardwired, physically limited design.  When we wire up the conventional leaf/spine switched network design, we lock in the network design and its requirements on day 1.  If the workload requirements of the network change after the first day, we have no ability to change how the network is configured.

3. Purpose Built for Automation: We want the network to be built for automation.  We do not want to be configuring network elements at the port or the device level through CLIs.  We want to compute topologies and script the network configuration.

4. Single Interface: We need to have centralized interface points that allow the network to be orchestrated.  This is not a single point, but it is not every network element.  We want to have orchestration from the top down to the device – not the device up.

5. API Driven: We want the network interface to be API driven.  We want an open interface that allows software developers and server administrators (i.e. DevOps people) to orchestrate the network based on the needs of their applications and services.  If we want to reduce OPEX, we need to remove the network as a silo that lags behind the dynamic nature of the other elements of IT.

6. Simplification: We want to build network designs that are simple.  A network can do two things: connect and disconnect devices.  Physical layer cabling is simplified via an advanced dynamic interconnect, which translates into lower power, cooling and administrative costs.

7. Scale: We need a network that can scale concurrently with the evolutionary cycles occurring in compute and storage.  The scaling problem is multi-dimensional.
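To make the idea in characteristics 1, 3 and 5 a little more concrete, here is a minimal sketch of an application declaring what it needs from the network and the network computing a path that satisfies it.  Everything here is an assumption for illustration: the Requirement and Link types, the best_path function and the leaf/spine example topology are hypothetical, requirements are limited to minimum bandwidth, maximum latency and maximum hop count, and the search is a brute-force depth-first search.  It is not the Plexxi controller algorithm or API, just a sketch of "express what is important, then calculate topology."

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Link:
    latency_us: float      # one-way link latency in microseconds (illustrative)
    bandwidth_gbps: float  # bandwidth available on the link in Gbps (illustrative)


@dataclass(frozen=True)
class Requirement:
    """Hypothetical requirements an application declares for one flow."""
    src: str
    dst: str
    min_bandwidth_gbps: float
    max_latency_us: float
    max_hops: int


# Adjacency map: node -> {neighbor: Link}. Links are stored in both directions.
Topology = Dict[str, Dict[str, Link]]


def add_link(topo: Topology, a: str, b: str, link: Link) -> None:
    """Add a bidirectional link between nodes a and b."""
    topo.setdefault(a, {})[b] = link
    topo.setdefault(b, {})[a] = link


def best_path(topo: Topology, req: Requirement) -> Optional[Tuple[List[str], float]]:
    """Return (path, total latency) for the lowest-latency loop-free path that
    meets every requirement, or None if no path does.

    Exhaustive depth-first search with pruning: fine for a handful of
    switches, not intended for production-sized fabrics.
    """
    best: Optional[Tuple[List[str], float]] = None

    def dfs(node: str, path: List[str], latency: float) -> None:
        nonlocal best
        if node == req.dst:
            if best is None or latency < best[1]:
                best = (list(path), latency)
            return
        if len(path) - 1 >= req.max_hops:
            return  # another link would exceed the hop budget
        for nbr, link in topo.get(node, {}).items():
            if nbr in path:
                continue  # keep the path loop-free
            if link.bandwidth_gbps < req.min_bandwidth_gbps:
                continue  # this link cannot carry the requested bandwidth
            new_latency = latency + link.latency_us
            if new_latency > req.max_latency_us:
                continue  # this link would blow the latency budget
            path.append(nbr)
            dfs(nbr, path, new_latency)
            path.pop()

    dfs(req.src, [req.src], 0.0)
    return best


# Hypothetical example: a thin direct link versus a fatter path through a spine.
topo: Topology = {}
add_link(topo, "leaf1", "spine1", Link(2.0, 40.0))
add_link(topo, "spine1", "leaf2", Link(2.0, 40.0))
add_link(topo, "leaf1", "leaf2", Link(1.0, 10.0))

req = Requirement("leaf1", "leaf2", min_bandwidth_gbps=20.0, max_latency_us=10.0, max_hops=3)

if __name__ == "__main__":
    print(best_path(topo, req))  # (['leaf1', 'spine1', 'leaf2'], 4.0)
```

The point of the sketch is the direction of control: the application states its needs and the network derives the path, rather than an operator configuring each device and hoping the protocols converge on something suitable.  An application asking for only 5 Gbps would get the direct leaf1-leaf2 link instead, because it is the lowest-latency path that still meets the requirement.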

Attacking the complexity and cost factors in running a network is multi-dimensional.  Servers are going from 1G to 10G, which means a network upgrade is necessary.  The good news for customers is that they have a choice in how they design their network.  It is probably the first time they have had a choice in network design since the demise of ATM and shared LANs.  For the past 16 years (or more) we have been designing switched hierarchical networks.  Choice for the customer was really about price: what oversubscription ratio (OSR) to use, who had the lower-priced 1G and 10G switch ports, what kind of 40G density to go with, etc.  The network was the same regardless of the vendor chosen.

Customers can decide to build a Plexxi network.  A Plexxi network is built to deliver a long term OPEX reduction, and we can do it at scale.  We have an API driven controller.  We have a dynamic interconnect.  Our standards-based, merchant silicon Ethernet switch was purpose built for automation from the controller.  A Plexxi network is simple.  It is an optical fabric orchestrateable via the controller through APIs.  It is built to correlate the workloads expressed to it by the applications via the API.  In legacy networks, we spend a lot of time determining state and letting legacy protocols, QoS and firewalls figure out the topology.  In a Plexxi network we figure out what is important from the application architect, calculate the topology using that information and then program the network to use 100% of the bandwidth.  We do not want a network that is 20% or 40% utilized with one person per 100 devices.  We think we can do a lot better than those metrics.  We think we can build you a better network.

/wrk

Talking SDN or Just Plain Next Generation Networking…

Tomorrow in SF, I will be talking about SDN, or as I like to call it, next generation networking, at the Credit Suisse Next Generation Data Center Conference.  It will be a panel discussion and each participant has a few minutes to present their company and thoughts on the market adoption of SDN.  Explaining the next twenty years of networking in fifteen minutes is a challenge, but I have been working with a small slide deck that helps make the point.  Here are those slides (link below).  I posted a variation of those slides a few weeks ago that I used in NYC, but I tailored this deck to a strict time limit of 15 minutes.  I will post more frequently after Plexxi is done at NFD #5 this week and around the time of OFC.

CS Next Gen DC Conference


/wrk