Self-Similar Nature of Ethernet Traffic
With all the debates around networking at ONS 2013, I found myself reading competitive blog posts and watching competitive presentations from vendors. It was the most entertaining part of ONS and it certainly invigorated Interop this past week with a new sense of purpose. Many vendors announced new switches and products ahead of the Interop show. There has also been a steady discussion post-ONS on the definition of SDN. With all the talk around buffer sizes, queue depths and port densities, I think something has been lost or I missed a memo. I often hear people talk about leaf/spine networks, load-balancing, ECMP and building “spines of spines” in large DC networks.
Arista recently announced their new spine switch, the 7500E, with deep buffers and high port density to build even bigger multipath, switched hierarchies. The 7500 is really the state of the art for deep, multi-stage forwarding fabric switches, a core design that is relatively unchanged since the early to mid 1990s. The need for deep buffers is the result of mapping real Ethernet traffic onto statistical fabrics. “Deep” is often written in quotes because, even though the amount of buffer (in GB) is large, you have to compute the depth in terms of time (in microseconds), so the real, practical depth scales inversely with the aggregate inbound link speed. The Arista 7500E is clearly the state of the art in Ethernet switches. As we continue to build bigger and higher, I wonder who has a 2000-port terabit Ethernet switch with 1000Gb of switching buffers on their POR? After all, that is what we are going to need five years from now. Here is a recent quote I read about multipath load balancing and optimal utilization, which these large multi-stage forwarding switches are designed to facilitate. “In practice, a well-provisioned IP network with rich multipath capabilities is robust, effective, and simple. Indeed, it’s been proven that multipath load-balancing can get very close to optimal utilization, even when the traffic matrix is unknown (which is the normal case).”
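To put “deep” in perspective, here is a quick back-of-the-envelope sketch. The port counts and buffer size are hypothetical, not any vendor’s specs; the point is only that the same buffer, measured in bytes, looks roughly 10x shallower in time when the fabric moves from 10G to 100G ports.

```python
# Convert a buffer measured in bytes into drain time at the aggregate
# inbound line rate. All numbers below are illustrative assumptions.
def buffer_depth_us(buffer_gigabytes, ports, port_gbps):
    aggregate_bps = ports * port_gbps * 1e9      # total inbound line rate (bits/s)
    buffer_bits = buffer_gigabytes * 8e9         # buffer expressed in bits
    return buffer_bits / aggregate_bps * 1e6     # drain time in microseconds

print(buffer_depth_us(1.0, ports=128, port_gbps=10))    # ~6250 us
print(buffer_depth_us(1.0, ports=128, port_gbps=100))   # ~625 us
```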
I went about trying to find a cited source that proves that multipath load-balancing can get very close to optimal network utilization with an unknown traffic matrix. I asked a lot of people to point me to the paper or the traffic analysis. Not a single person could point me to a paper or source. One person said the origin of this belief comes from the Wikipedia page on Clos networks. Really? The Wikipedia page on Clos networks is now the reference design page for networking? Another very well respected networking thought leader told me that MPTCP might provide this capability, but that to date it was unproven.
I think we need to go back and do some reading, starting with Leland, Taqqu, Willinger, and Wilson’s paper, On the Self-Similar Nature of Ethernet Traffic. I uploaded a copy here. It is an interesting paper and, having been published in 1993, I wonder if it has been lost in time. Do people read it anymore, or was it lost in the transition from shared LANs to switched LANs and the excitement of the 1990s? Self-similarity is the idea that something looks the same regardless of scale. Put in terms of long-range dependence, the autocorrelation decays slowly and the Hurst parameter (H, a measure of burstiness developed by Harold Hurst in 1965) increases as traffic is aggregated. In other words, the more you aggregate, the more you add spines of spines to the design, the more the burstiness (i.e. the H parameter) intensifies. The obvious assumption is to expect a Poisson distribution, but that is not the result. Processes with long-range dependence are characterized by an autocorrelation function that decays hyperbolically as the lag increases. Said another way, as the number of Ethernet users increases, the resulting aggregate traffic becomes burstier instead of smoother.
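You can see this effect with a few lines of code. The sketch below is not from the paper; it uses the standard construction of self-similar traffic as a superposition of ON/OFF sources with heavy-tailed (Pareto) period lengths, and the variance-time method to estimate H: for the aggregated series, the slope of log Var versus log block size is 2H − 2, so Poisson traffic comes out near H ≈ 0.5 while long-range-dependent traffic lands closer to 1. All parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def onoff_source(n_slots, alpha=1.4):
    """One source: Pareto-distributed ON and OFF periods, 1 pkt/slot while ON."""
    out = np.zeros(n_slots, dtype=np.int64)
    t, on = 0, True
    while t < n_slots:
        length = int(rng.pareto(alpha)) + 1
        if on:
            out[t:t + length] = 1
        t += length
        on = not on
    return out

def hurst_variance_time(counts, block_sizes):
    """Estimate H from the slope of log Var(aggregated series) vs log m."""
    logs_m, logs_v = [], []
    for m in block_sizes:
        n = (len(counts) // m) * m
        agg = counts[:n].reshape(-1, m).mean(axis=1)
        logs_m.append(np.log(m))
        logs_v.append(np.log(agg.var()))
    slope = np.polyfit(logs_m, logs_v, 1)[0]
    return 1 + slope / 2

n_slots, n_sources = 100_000, 50
blocks = [10, 30, 100, 300, 1000, 3000]

aggregate = sum(onoff_source(n_slots) for _ in range(n_sources))
poisson = rng.poisson(lam=aggregate.mean(), size=n_slots)

print("H estimate, ON/OFF aggregate:", round(hurst_variance_time(aggregate, blocks), 2))
print("H estimate, Poisson, same mean:", round(hurst_variance_time(poisson, blocks), 2))
```

Aggregating more of these sources does not smooth the traffic toward Poisson; the burstiness, and the estimated H, persists. That is the paper’s point.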
“We demonstrate that Ethernet local area network (LAN) traffic is statistically self-similar, that none of the commonly used traffic models is able to capture this fractal behavior, and that such behavior has serious implications for the design, control, and analysis of high-speed, cell-based networks. Intuitively, the critical characteristic of this self-similar traffic is that there is no natural length of a “burst”: at every time scale ranging from a few milliseconds to minutes and hours, similar-looking traffic bursts are evident; we find that aggregating streams of such traffic typically intensifies the self-similarity (“burstiness”) instead of smoothing it.”
As we continue to build ever taller and ever denser switched hierarchies, will we reach a point of diminishing returns in terms of complexity and cost? When I think about the implementation of a controller architecture, I ask: what is the objective of this architecture? Why would you deploy a controller? This is the question I like to ask network people: what does your controller do? The reason to build a controller cannot be to implement the past in a new protocol form. It has to be to do something different, to obtain a different result or value. The overlay and TCAM programming strategies were developed to get better utilization and flexibility over the “rigid, no easy scale out, poor programmatic control, inflexible workload placement, limited multi-tenancy” aspects of the legacy switched (i.e. physical) network. I quoted those words from a variety of vendor presentations on my desk. I think a lot of the early SDN strategies assumed that the network construct of a fat tree, leaf/spine, spines of spines, switched hierarchy would never change. Network designers need to be wary of aggregating complexity; just adding capacity, such as upgrading from 1G to 10G or 10G to 40G, does not resolve the flaw in the network, it merely masks the flaw for some period of time. The fluidity of the network does not work in the same way as the fluidity of storage and compute. That statement is intended for the people who think network devices work like servers and that when switches fail, traffic loads will be easily migrated to available resources. I think that to fully harness the benefit of a controller architecture, the controller must be able to physically and logically alter the network. That is the point. A controller can change the topology. Why would we want to change the topology of the network?
The rule of symmetry in a leaf/spine network is the answer. A leaf/spine network is a structured wiring design, and when you break the symmetry of the design you diminish its value. If you have to add spines to the spines to scale out the design, you lose the value of collapsing the tiers in the design. With a controller architecture, you can move from the position that most random is most optimal to one where most optimal is specific to the workload requirements. The network can react to the workload requirements.
The whole idea of randomly hashing flows across the network and calling it optimal is the opposite of most scientific principles. I am certain that if server virtualization had meant instantiating a VM and having it placed on a randomly chosen CPU, the adoption of server virtualization would have been far slower and lower. That leaves me to ponder why we would conclude that randomization of the network is the best path to optimization when that is not the strategy of choice for compute and storage. I will fully agree that we do not want applications programming QoS settings, but we can abstract the requirements of the application and workload. This abstraction will allow us to build a network tailored to the needs of the application workloads. We can build a better network, rather than operating on the twenty-year assumption that most random is most optimal. We can build a network based on specifics. At Plexxi we call that Affinity.
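To make the “random is not optimal” point concrete, here is a small sketch of hash-based ECMP. The flow mix, addresses, and CRC stand-in for a switch’s 5-tuple hash are all assumptions for illustration; the point is that a size-oblivious hash can land a few long-lived elephant flows on the same uplink no matter how well it mixes.

```python
import random
import statistics
import zlib

random.seed(7)
N_UPLINKS = 4

# Assumed traffic mix: 200 short "mice" flows plus 8 long-lived "elephants" (Mb/s).
flows = [("mouse", random.uniform(0.5, 2.0)) for _ in range(200)]
flows += [("elephant", random.uniform(1500, 2500)) for _ in range(8)]

def ecmp_pick(five_tuple):
    """Stand-in for a switch's 5-tuple hash: deterministic and size-oblivious."""
    return zlib.crc32(five_tuple.encode()) % N_UPLINKS

load = [0.0] * N_UPLINKS
for i, (kind, mbps) in enumerate(flows):
    five_tuple = f"10.0.{i % 7}.{i}:{40000 + i}->10.1.0.{i % 9}:80/tcp"
    load[ecmp_pick(five_tuple)] += mbps

print("per-uplink load (Mb/s):", [round(x) for x in load])
print("max/mean imbalance: %.2fx" % (max(load) / statistics.mean(load)))
```

The flow count is spread evenly, but the load is not, because the hash knows nothing about which flows matter.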
“An important implication of the self-similarity of LAN traffic is that aggregating streams of such traffic typically does not produce a smooth (“Poisson-like”) superposition process but instead, intensifies the burstiness (i.e., the degree of self-similarity) of the aggregation process. Thus, self-similarity is both ubiquitous in our data and unavoidable in future, more highly aggregated, traffic. However, none of the currently common formal models for LAN traffic is able to capture the self-similar nature of real traffic. We briefly mention two novel methods for modeling self-similar LAN traffic, based on stochastic self-similar processes and deterministic nonlinear chaotic maps, that provide accurate and parsimonious models.”
There is a fair amount of work going on in the IT industry to predict, understand, place and control workloads in the datacenter. We should not ignore this work. At Plexxi, we believe we can harness and use this measurement of utility and apply it to the network to obtain fully specified application topologies for the network. I think most IT professionals think the network has a low value of utility. I think that can change. It is just another way we want to build you a better network. I think that five years from now, the design of the network will be different. If we really want to build networks at exabit scale, the network will be rich in path diversity, not aggregation. The more paths the better. There was a presentation at OFC this year in which Facebook showed an astounding statistic: for every 1kb external HTTP request, they see a 930x increase in internal traffic. Building at scale is going to require the IT world to harness the photonic advantage, and we will not talk about buffers and queue depths because in the photonic world there are no buffers. The photonic advantage (see the chart to the left from Dr. John Bowers) in power required (watts) per unit of capacity (Gbps) is roughly 20,000 times better for optics than for packet switching. This is possible in the modern data center because a controller can compute topologies that use optical and packet switching technologies advantageously. In the past I wrote about traffic patterns and highlighted that the cheapest ports a network architect can purchase are the ToR ports. Moving to the next era of network design will not be led by making the cheapest ports cheaper so we can buy more of them and add to the complexity of the network.
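As a footnote on the scale argument, here is a rough illustration of what that 930x east-west amplification implies. The external request rate and size below are assumed for illustration; only the 930x factor comes from the quoted statistic.

```python
# Rough east-west amplification arithmetic; input numbers are assumptions.
external_req_per_sec = 1_000_000     # assumed front-door request rate
external_kb_per_req = 1.0            # ~1 kB per request, per the quoted stat
amplification = 930                  # internal traffic per unit of external traffic

external_gbps = external_req_per_sec * external_kb_per_req * 8 / 1e6
internal_gbps = external_gbps * amplification
print(f"external: ~{external_gbps:.0f} Gb/s -> internal: ~{internal_gbps:.0f} Gb/s")
```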
/wrk
An affinity-based approach to dealing with the self-similar nature of traffic makes assumptions about the physical placement of endpoints on a network, the scale of the system and the ability of a network controller to react in time to make a difference. A bigger issue in my experience is not the self-similarity of Ethernet traffic, but the effect of long-lived, high-bandwidth flows on shorter-lived flows. For this problem there are evolving active queue management schemes such as http://www.ietf.org/proceedings/86/slides/slides-86-iccrg-5.pdf as well as, indeed, affinity-based approaches such as http://cseweb.ucsd.edu/~vahdat/papers/helios-sigcomm10.pdf.