"Small Forwarding Tables for Fast Routing Lookups"
Mikael Degermark, Andrej Brodnik, Svante Carlsson, and Stephen Pink
Summary by Andy Reitz.
January 27th, 2000
Summary:
The purpose of this paper is to describe a method which achieves Gigabit-level routing performance using commodity processors. The Author's have recognized that one of the key bottlenecks in the routing of IP packets is the routing lookup. Thus, it is hoped that if this procedure can be heavily optimized in a clean and robust manner, that the way should be paved for Gigabit routing on the Internet.Essentially, a routing lookup finds the nearest match to the destination address of a given IP packet, in the table of known routes (next hops). Before this paper, it was assumed this process was compute-intensive, and should be avoided at all costs. It is the intention of this paper to show that the impact of this process can be minimized, using the proper data structures.
To the authors, the proper data structure is one that is minimal in both size as well as the number of memory accesses required. The number of memory accesses should be kept low because each one incurs a large processor-cycle memory. Additionally, keeping the size of the data structure allows a larger portion of it to be cached in the processors internal caching hierarchy. Keeping the data in the processors cache can significantly increase performance.
In keeping with the theme of optimality, the authors also tried to code these data structures efficiently, such that the the smallest number of instructions are needed in order to create the data structure, as well as to perform lookups in it. By optimizing the performance from these two quarters (instruction complexity and data size), the authors show that the desired performance levels can be achieved from commodity processors.
Strengths Of This Design:
- Provides a method for performing a high rate of IP route lookups on common, Industry-standard hardware. Not only is the high rate of route lookups a boon in and of itself, but the fact that this can be done using commodity hardware has additional benefits.
- Firstly, commodity hardware tends to be cheaper than special-purpose hardware.
- Secondly, it is easier to find programmers and hardware designers that understand the commodity hardware.
- Scales to larger routing tables, via simple modifications to the data structures employed. Scales to different revisions of IP (i.e. IPv6) in a similar manner.
- Can perform at least 2.2 million routing lookups on the DEC Alpha 21164 @ 333Mhz, and 2.0 million on an Intel Pentium Pro @ 200 Mhz. Plus, these numbers are constrained only by the processors themselves. Hence, faster processors should be able to generate higher performance.
- A strictly software-based approach is generally more flexible than hardware-based, or hardware/software mixes (i.e. Tag Switching).
Weaknesses Of This Design:
- The extreme performance of this architecture relies largely on the presence of the forwarding table in the processor's cache hierarchy. Unfortunately, the author's benchmarks don't take into account the fact that the processor will be running some sort of multitasking operating system (e.g. Cisco's IOS). Under this sort of environment, frequent context switches will obliterate cached forwarding data, chewing up performance. The authors have tried to optimizing this by minimizing the number of instructions executed (as well as the data size), but to date, the true performance of this method can only be truly measured when it is employed in a real, live router.
- The data structure is rather complex, and could prove to be somewhat tricky to implement.
- The data structure described is essentially static -- it only supports lookups, not additions or deletions. Hence, the router must repeatedly construct this table from the Patricia Tree data structure that it maintains elsewhere. Although the authors claim that this conversion can be done efficiently, it does have the possibility of adding extra latency to the lookup process.