BGP, ASICs, and IPv4 at scale – Why the Internet is (or was) Broken
On Tuesday August 12, 2014, the global IPv4 Internet routing table exceeded 512,000 routes. This is an important number for several very common routing platforms across multiple networking vendors, and as a result many networks across the globe experienced interruptions and performance issues. While some articles have reported on the scope of impacted carriers and services, I’ll discuss the technologies and limitations that led to this failure.
To understand the source of this failure, there are two critical pieces of Internet technology to grasp. First is BGP, the way routers around the Internet exchange information about what networks they know about, and how to reach them. Second, and the more immediate cause of the recent failure, are the mechanics or implementation of routing and destination lookups in high performance, carrier grade routers and switches.
Border Gateway Protocol
The common language that all carrier and ISP routers speak throughout the Internet is crucial to the delivery of the interconnected, distributed network that we all use today. BGP dates back to 1995 and even earlier in older iterations, and is still in use today, albeit with many adjustments and new features. At the core of BGP is a common method for routers and network devices to tell each other about networks, and their relationship to those networks. A critical component to BGP operation is the Autonomous System or ASN, which defines a network and its associated prefixes (known routes) as viewed from the perspective of the BGP protocol.
BGP speaking routers across autonomous systems peer with each other in a variety of arrangements such as dedicated peering points or direct connections, and exchange their known routes or a subset of known routes with each other. This exchange also includes a large number of attributes (Origin, As_Path, Next_Hop, Local_Pref, communities…) associated with each prefix. As you might expect, at a global scale this table is quite large – and each time a new autonomous system (typically an ISP or larger enterprise) pops up, or an existing autonomous system further distributes its network, this table continues to grow.
TCAM, FIB, ASIC what?
Routers in larger scale networks need to store, reference, and update this large global BGP routing table quickly and frequently. So where does all of this information get stored? The answer isn’t simple, and varies wildly by vendor and hardware, but in large scale network hardware, the general implementation looks like this.
Routers handling BGP (and other routing protocols) maintain routing tables and store known routes in a Routing Information Base or RIB specific to that protocol. The best known route for a destination is then inserted into the Forwarding Information Base or FIB, which the router will use for forwarding lookups on packets due for processing. While the RIB is typically stored in general system memory, of which space is often readily available, the FIB is stored on TCAM residing on ASICs. Application Specific Integrated Circuits are the secret sauce to high performing networking appliances, enabling fast routing and switching above and beyond what traditional x86 based processing and memory can achieve. The ASICs on modern routers and switches contain a fixed amount of TCAM, enabling the low latency lookups and updates required for routing and switching.
So what happened and what does it have to do with TCAM? Going back to the original issue with our picture of routing tables and TCAMs in mind, you can see how the growing IPv4 Internet might be a problem for platforms with a fixed amount of space available. As it so happens, some of the most commonly deployed BGP speaking routers and switches over the last decade, including the Cisco 6500 and 7600 platform, have had default TCAM allocations allowing for a maximum of 512,000 IPv4 entries. On many platforms, as soon as TCAM space reaches its limit, failures can occur ranging from reduced performance to catastrophic failure. On Tuesday that 512,000 IPv4 limit was reached and the straw that broke the camel’s back, so to speak, was Verizon owned ASN 701 & 705. While the change, the apparently inadvertent advertisement of thousands of new routes, was quickly reversed, routers across the Internet saw performance issues and failures for hours due to the nature of BGP propagation.
Even though we are again under the 512k IPv4 route marker, it’s only a matter of time until we cross that barrier and beyond. This expansion of IPv4 tables is ultimately at the fault of no one, and should only be seen as the expected progression and growth of the “Internet of Things”. It’s hard to predict what type of scale future networks will need to adapt to, and in a world of fast expanding virtualization and cloud services it can be easy to forget that the underlying network transport behind these services must scale to cope with demand. Hold on to your hats over the next few weeks as we inevitably cross the 512k barrier again, and see who accepted this as a wake-up call to upgrade where necessary, and who didn’t.