Existing Routing Protocols and IPv6

Petri Wessman
Akumiitti Oy
Petri.Wessman@akumiitti.fi

Abstract

Routing is a fundamental component of internetworking. Several diverse protocols have either been developed or have evolved to handle specific problem domains. The advent of IP version 6 represents both a challenge and an opportunity -- on the one hand, IPv6 must coexist peacefully with existing routing protocols; on the other hand, it gives developers an opportunity to migrate to protocols that are better suited to the demands of the present-day Internet. This document discusses the issues involved.

1. Introduction
   1.1 IPv6
   1.2 Provider-based addresses
   1.3 Routing
2. General routing problems in moving from IPv4 to IPv6
3. Existing routing protocols
   3.1 Interior routing protocols
       3.1.0 RIPv2
       3.1.1 OSPF
       3.1.2 IS-IS and EIGRP
   3.2 Exterior routing protocols
       3.2.0 BGPv4
       3.2.1 IDRP
4. Coexisting IPv4 and IPv6
5. New routing protocols
   5.1 Nimrod
   5.2 PNNI
6. Conclusions
References
Glossary

1. Introduction

Because the TCP/IP protocol suite is based on the concept of integrating networks (i.e. internetworking), routing plays a very important role in TCP/IP-based networks. Various routing protocols have been developed over the years to deal with different routing problem domains, ranging from internal routing protocols (routing within an autonomous system) to external routing protocols (routing between different autonomous systems). The terms "internal gateway protocol" and "external gateway protocol" are also used [Black95]. The current routing protocols are adequate for the current version of the IP protocol (IPv4), but they are not without their problems, the largest of which is the explosion in routing table size caused by the unforeseen growth of the Internet. The new version of the IP protocol (version 6, officially known as IPv6) changes the most fundamental part of Internet routing -- the address size -- forcing a re-evaluation of the current routing schemes.

1.1 IPv6

IPv6 is a fix for the most serious of the "death of the Internet" scenarios: the lack of address space. Back in the 1970's when IPv4 was developed, 32-bit addresses were seen as "more than large enough". Nobody foresaw the explosive growth of the Internet, and more seriously, the growth in the number of hosts connected to the net. At a time when personal computers as we know them today did not exist, those assumptions were no doubt valid. IPv6 uses an address size of 128 bits, which is currently seen as "more than large enough" -- time will tell whether this is true or not [RFC1883].

The decision about address size was a controversial one: a competing proposal (SIPP) advocated 64-bit addresses, while another competitor (TUBA) used NSAP addresses whose length could vary between 1 and 20 bytes. 128 bits was finally chosen because the "additional" bits (as compared to 64) could be used to handle the growing complexity of the Internet by adding logical layers to addresses; the same bits also made address autoconfiguration (also known as "plug-and-play") possible. Variable-length addresses as used by the TUBA proposal were rejected because we currently don't have enough practical experience with them; it was feared that they would lead to either overly complex and slow network algorithms or to programmers always opting to use the maximum address size to make coding easier [Huit96].

A complete description of the differences between IPv4 and IPv6 is outside the scope of this document; we will only discuss the structure of an IPv6 address as it pertains to routing. The designers of the IPv6 protocol chose to represent the 128-bit address as eight 16-bit integers separated by colons. Each integer is represented in hexadecimal form, and leading zeros may be omitted (e.g. 007A can be represented as 7A). An example address would be:

1075:3A:AEF3:0:0:0:210:A6EB

As a further abbreviation, consecutive null (zero) fields within an address can be marked with two colons, reducing the above example to:

1075:3A:AEF3::210:A6EB

Only one double-colon can be used within an address, otherwise we would get ambiguous addresses (::CA74::, for example).
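The two abbreviation rules can be checked mechanically. The following is a minimal sketch using Python's standard ipaddress module; the choice of Python and the helper name is_valid are ours and not part of the IPv6 specification. Note that the module prints hexadecimal digits in lower case.

```python
import ipaddress

# The example address from the text, written with all eight fields.
addr = ipaddress.IPv6Address("1075:3A:AEF3:0:0:0:210:A6EB")

# Shortest form: leading zeros dropped, the run of zero fields replaced by "::".
print(addr.compressed)  # 1075:3a:aef3::210:a6eb

# Fully expanded form: eight fields of four hexadecimal digits each.
print(addr.exploded)    # 1075:003a:aef3:0000:0000:0000:0210:a6eb


def is_valid(text):
    """Return True if text parses as an IPv6 address (hypothetical helper)."""
    try:
        ipaddress.IPv6Address(text)
        return True
    except ipaddress.AddressValueError:
        return False


# An address with two double-colons is ambiguous, so the parser rejects it.
print(is_valid("::CA74::"))  # False
```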

While IPv4 addresses are divided into network classes (class A, class B, etc.), IPv6 addressing and routing are performed using variable-length prefixes of the address. Hosts can legitimately treat IPv6 addresses as opaque 128-bit values, while routers need only store prefixes (ranging from 1 to 128 bits).

The exceptions to this are the special addresses:

The "unspecified" address (16 null bytes).
Loopback address (::1).
IPv4-based address (96 zero bits prepended to the 32-bit IPv4 address).
Site local address (for networks not connected to the Internet, noted with the binary prefix 1111 1110 11).
Link local address (for stations not yet provided with a provider-based address (see below) or a site local address, noted with the binary prefix 1111 1110 10).
Multicast address (for sending to a select group at once, noted with the binary prefix 1111 1111).
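As a rough illustration, the special addresses listed above can be recognized by comparing the leading bits of the 128-bit value against the binary prefixes given in the list. The function name classify and the category strings are our own; this is a sketch of the idea, not a complete address classifier.

```python
import ipaddress


def classify(text):
    """Classify an IPv6 address by the special prefixes listed in the text."""
    value = int(ipaddress.IPv6Address(text))
    if value == 0:
        return "unspecified"          # 16 null bytes
    if value == 1:
        return "loopback"             # ::1
    if value >> 120 == 0b11111111:
        return "multicast"            # top 8 bits are 1111 1111
    if value >> 118 == 0b1111111010:
        return "link local"           # top 10 bits are 1111 1110 10
    if value >> 118 == 0b1111111011:
        return "site local"           # top 10 bits are 1111 1110 11
    if value >> 32 == 0:
        return "IPv4-based"           # 96 zero bits, then an IPv4 address
    return "other"


for example in ["::", "::1", "ff02::1", "fe80::1", "fec0::1", "::192.0.2.1"]:
    print(example, "->", classify(example))
```

The unspecified and loopback checks come first because both addresses also begin with 96 zero bits and would otherwise be reported as IPv4-based.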

The initial address allocation schemes for IPv6 are designed with classless routing in mind; this is mirrored in the allocation of 1/8 of the address space to so-called provider-based unicast addresses [Huit96].

In addition to the above, IPv6 also supports the concept of "anycast", meaning sending to the "nearest" of a certain group of targets. Anycast addresses are syntactically identical to unicast addresses.

1.2 Provider-based addresses

The first "real" IPv6 addresses for the Internet will most probably be allocated as provider-based addresses. These have the binary prefix 010, followed by five components. A provider-based IPv6 address looks like the following:

# of bits   3     n          m          o            p            125-n-m-o-p
field       010   registry   provider   subscriber   subnetwork   interface

The meanings of the fields are:

Registry

The registry in charge of assigning addresses. Currently four registries have been specified:

10000 multiregional (IANA)
01000 RIPE NCC (Europe)
11000 INTERNIC (North America)
10100 APNIC (Asia/Pacific)

Provider

The regional network service provider (in some cases this will mean the regional registry)

Subscriber

The subscriber of the network link.

Subnetwork

The subnetwork within the subscriber's network.

Station

The individual station (or interface, more exactly).

In the initial allocation scheme, 5 bits have been reserved for the registry, followed by 16 bits of provider ID. This is followed by 8 zero bits in the initial phase. 24 bits have been reserved for the subscriber ID, followed by another 8 zero bits. The remaining bits will (probably) be used for 16 bits of subnetwork ID followed by 48 bits of station (interface) ID. In other words, the current specification for a provider-based address looks like this: [Huit96]

# of bits   3     5          16         8       24           8       16           48
field       010   registry   provider   empty   subscriber   empty   subnetwork   interface
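To make the layout concrete, the following sketch packs and unpacks the fields using the bit widths of the initial allocation scheme. The field names reserved1 and reserved2 (for the two empty octets), the function names, and all sample values are hypothetical; only the widths and the 010 prefix come from the text.

```python
import ipaddress

# (name, width in bits), most significant field first. The widths sum to 128.
FIELDS = [
    ("prefix", 3), ("registry", 5), ("provider", 16), ("reserved1", 8),
    ("subscriber", 24), ("reserved2", 8), ("subnetwork", 16), ("interface", 48),
]


def pack(values):
    """Combine field values into one 128-bit integer; missing fields are 0."""
    address = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        address = (address << width) | value
    return address


def unpack(address):
    """Split a 128-bit integer back into its named fields."""
    values = {}
    shift = 128
    for name, width in FIELDS:
        shift -= width
        values[name] = (address >> shift) & ((1 << width) - 1)
    return values


# Made-up sample values; registry 01000 is the RIPE NCC code from the text.
sample = {
    "prefix": 0b010, "registry": 0b01000, "provider": 0x1234,
    "subscriber": 0xABCDEF, "subnetwork": 0x0001, "interface": 0xC0FFEE01,
}
packed = pack(sample)
print(ipaddress.IPv6Address(packed))
print(unpack(packed)["subscriber"] == 0xABCDEF)  # True
```

Because the provider field ends at bit 24 and the subscriber field at bit 56, a router can aggregate on a /24 prefix per provider or a /56 prefix per subscriber under this layout.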

1.3 Routing

In the current-day Internet, a router wanting to provide an ideal route to all possible destinations would have to maintain information about every network in the Internet in its routing table. Most routers do not do this because of the huge size of the full routing tables, and resort to using default paths for networks that are not included in their tables. Unfortunately, the large backbone routers of the Internet cannot afford the suboptimal routing resulting from default paths, and must maintain full tables. Updating these tables is a continuous task.

The solution to this problem is to use a well-defined hierarchy of addressing, and have the routing protocols take advantage of this. In IPv4 no such easily usable hierarchy is available, but the IPv6 provider-based addressing scheme is designed explicitly with routing considerations in mind. The provider field was chosen as the base routing "domain" because of its good position in the address hierarchy [Huit96].
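How a router exploits such a hierarchy can be sketched as a longest-prefix match: the router stores a few prefixes and a default path, and picks the most specific entry that covers a destination. The prefixes and next-hop names below are hypothetical, and the linear scan is chosen for clarity; real routers use specialized lookup structures.

```python
import ipaddress


def longest_prefix_match(table, destination):
    """Return the next hop of the most specific prefix covering destination.

    table maps prefix strings to next-hop names (both hypothetical).
    """
    address = ipaddress.IPv6Address(destination)
    best = None
    for prefix, next_hop in table.items():
        network = ipaddress.IPv6Network(prefix)
        if address in network:
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, next_hop)
    return best[1] if best else None


table = {
    "::/0": "default-path",                    # used when nothing else matches
    "4080:1200::/24": "provider-A",            # one entry covers a whole provider
    "4080:1200:ab:cd00::/56": "subscriber-X",  # more specific exception
}

print(longest_prefix_match(table, "4080:1200:ab:cd00::1"))  # subscriber-X
print(longest_prefix_match(table, "4080:1200:ff:0::1"))     # provider-A
print(longest_prefix_match(table, "6000::1"))               # default-path
```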

IPv6 was designed with unified routing requirements in mind. The following general guidelines (among others) were imposed at the design stage [RFC1668]:

Addressing must support topologically significant address assignment.
The addressing structure should impose as few preconditions as possible on the address structure (because it is difficult to predict the evolution of the address hierarchy).
Hierarchical address assignment should not force the use of hierarchical routing.

2. General routing problems in moving from IPv4 to IPv6

A number of problems surface when we consider the migration from IPv4 to IPv6. The first is, of course, the assignment of new IP addresses. Considering the size of the Internet this is a daunting task. To help with migration, a mapping exists for current IPv4 addresses to IPv6 (by using a 96-zero-bit prefix). While this allows IPv6 software to handle IPv4 addresses, it doesn't help the routing concerns. Since native IPv6 routing is based on hierarchical routing and the IPv4 addresses do not generally correspond to the network topology, IPv4 addresses cannot be routed with the same algorithms as IPv6 addresses. While the use of converted IPv4 addresses is a temporary measure, IPv4 addresses will probably be in (limited) use for at least ten years to come [Huit96].
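The 96-zero-bit mapping mentioned above amounts to reusing the 32-bit IPv4 value as the low-order bits of a 128-bit address. A minimal sketch follows; the function names embed_ipv4 and extract_ipv4 are our own.

```python
import ipaddress


def embed_ipv4(ipv4_text):
    """Prepend 96 zero bits to an IPv4 address, giving an IPv6 address."""
    return ipaddress.IPv6Address(int(ipaddress.IPv4Address(ipv4_text)))


def extract_ipv4(ipv6_address):
    """Recover the IPv4 address, provided the upper 96 bits are all zero."""
    value = int(ipv6_address)
    if value >> 32 != 0:
        raise ValueError("upper 96 bits are not zero")
    return ipaddress.IPv4Address(value)


mapped = embed_ipv4("192.0.2.1")
print(mapped)                # the same 32 bits, shown in IPv6 notation
print(extract_ipv4(mapped))  # 192.0.2.1
```

The conversion is purely syntactic: the resulting address carries no provider hierarchy, which is why it does not help with the routing concerns discussed here.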

At first glance, routing IPv4 addresses efficiently in IPv6 looks like a hopeless task. However, the situation is made easier by the recent use of classless interdomain routing (CIDR) in IPv4 routing [RFC1787]. CIDR is an attempt to curb the growth of routing tables by using variable-length prefixes of the IPv4 address instead of fixed-length network IDs (very much like IPv6 provider-based addresses). In CIDR, address hierarchy is generally dictated by geographical and/or political boundaries, while IPv6 address hierarchy is dictated by more abstract levels. Nevertheless, CIDR addresses provide one possibility of using embedded hierarchy information in IPv4 addresses for IPv6 routing.
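The table-size saving that CIDR aims for can be illustrated with Python's standard ipaddress module, which can merge adjacent networks into a shorter common prefix. The address blocks below are arbitrary examples.

```python
import ipaddress

# Four adjacent /24 networks, as four separate routing table entries.
separate = [ipaddress.IPv4Network(f"10.1.{i}.0/24") for i in range(4)]

# With variable-length prefixes they can be announced as a single /22.
aggregated = list(ipaddress.collapse_addresses(separate))
print(aggregated)  # [IPv4Network('10.1.0.0/22')]

# A network that is not adjacent to the others cannot be folded in.
with_outlier = separate + [ipaddress.IPv4Network("10.1.8.0/24")]
print(len(list(ipaddress.collapse_addresses(with_outlier))))  # 2
```

Aggregation only works when address assignment follows the topology, which is the property the IPv6 provider-based scheme tries to guarantee from the start.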

IPv6 addressing does not follow political or geographical boundaries because such a plan would not allow proper address aggregation. Networks do not necessarily follow country boundaries, and geographical areas will typically have multiple ways of connecting to the Internet (i.e. providers). The abstract "provider" was chosen as a routing base point for this reason. Using providers does entail some difficulties, though, in that in a sense it ties a customer to a provider (i.e. changing providers changes the network address). If the customer switches from provider A to provider B, for example, there are two possibilities:

The customer convinces both providers to let him use his old address. While easy for the customer, this forces B to advertise to the whole Internet that the specific customer's traffic should be routed through his network, not through network A as specified in the address. In the long run, this would lead to a new version of the current "routing table explosion" problem.
The customer changes all stations within his network to use the new address. This is ideal for the provider, but lots of work for the customer using IPv4 technology.

Fortunately, IPv6 address autoconfiguration procedures enable the easy re-mapping of networks, making the second choice easier to deal with. In addition, since IPv6 addresses refer to interfaces instead of network hosts, the customer has the option of connecting to both networks and routing packets to either of the networks depending on the current network load and other possible factors. This was almost impossible to implement in IPv4 [Huit96].

While IPv6 provider-based addresses provide a convenient hierarchy, the actual assignment of the addresses themselves will probably be a slow process complicated by non-technical issues (i.e. local politics and economic concerns).

3. Existing routing protocols

The earliest TCP/IP routing protocols were born at a time when networks were small, and a single backbone served the entire Internet. At that time, factors such as routing table size, efficiency and scalability were not seen as very important -- "quick and dirty hacks" were more the rule than the exception [Black95]. In this text we will concentrate on the protocols that are actively in use today, including some protocols that are still in the development stage but which are seen as likely candidates for IPv6 network use.

3.1 Interior routing protocols

Interior routing protocols (or "interior gateway protocols", IGP for short) are protocols which manage routing within an autonomous system. An "autonomous system" in routing terms is a collection of networks that is administered as a whole by a single specific administrator, or in less vague terms, a collection of networks (and/or hosts) that are all connected to each other and which can communicate with each other without resorting to routing outside the autonomous system. Most current Internet domains are examples of autonomous systems.

The Internet has no clear interior routing protocol "leader". RIP is still popular, OSPF is seen as the current recommended protocol, and other protocols such as "dual" IS-IS and the proprietary EIGRP are also in some use. Older protocols such as hello and gated have (fortunately) mostly been abandoned due to being technically obsolete and inefficient [Black95].

3.1.0 RIPv2

The Routing Information Protocol (RIP) is a fairly old IGP which uses distance-vector metrics for routing. In other words, the cost of a route is measured by the number of hops in the route, regardless of the state of the specific hops (links). This is obviously a gross simplification, and is one of the reasons why RIP is seen as inferior to OSPF [Black95]. RIPv2 is version 2 of the original RIP protocol, which adds some additional information to the route information (including some security considerations) [RFC1723]. The advantage of RIP is that it is small (requires very little code) and easy to implement, making it useful for small network nodes (embedded systems etc) which cannot afford the memory space consumed by more efficient protocols.

RIP operates by having each node periodically broadcast the "distance" from itself to other nodes. The metric used for the distance is simply the hop count. RIP nodes use the information broadcast by other nodes to build routing tables from themselves to all the other nodes in the network. In essence, the network discovers its own topology through successive iterations of routing information broadcasts.

RIP has some problems, however. The iterative nature of RIP can cause routing loops in cases where the network has not yet "stabilized", leading to possible message congestion [Huit95]. Another problem with RIP is the so-called "counting to infinity" problem, in which the network "discovers" a certain type of broken link only by the network nodes successively incrementing the distance metric until a value of "infinite" is reached [Black95]. Because of this known problem, the value of "infinity" is set to a small value in RIP (16). This solves the problem in a way, but it can still cause slow network convergence after certain types of errors, and the choice of a low "infinite" value has another unfortunate effect: it limits the use of the distance metric to hop counts, because more meaningful metrics would require a larger numeric space.
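The "counting to infinity" behavior can be reproduced with a very small simulation. The sketch below assumes three nodes in a line (A, B, C), no split horizon, and strictly alternating updates; these simplifications are ours and real RIP timing is messier. After the link between B and C fails, A and B keep believing each other's stale route to C and raise the metric step by step until it reaches 16.

```python
INFINITY = 16  # the RIP value for "unreachable"


def count_to_infinity():
    """Simulate A and B converging after the B-C link fails.

    Topology before the failure: A -- B -- C, destination C.
    Returns the number of update rounds and the (A, B) metrics per round.
    """
    # Hop counts to C before the failure: B is adjacent, A goes via B.
    dist = {"A": 2, "B": 1}
    history = []
    rounds = 0
    while dist["A"] < INFINITY or dist["B"] < INFINITY:
        # B has lost its direct link and now trusts A's advertisement.
        dist["B"] = min(INFINITY, dist["A"] + 1)
        # A routes via B, so it must follow B's new, larger metric.
        dist["A"] = min(INFINITY, dist["B"] + 1)
        rounds += 1
        history.append((dist["A"], dist["B"]))
    return rounds, history


rounds, history = count_to_infinity()
print(rounds)       # 8
print(history[0])   # (4, 3)
print(history[-1])  # (16, 16)
```

With periodic broadcasts every 30 seconds, eight rounds of this kind take several minutes, which is the slow convergence described above.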

A version of RIP has been defined for IPv6 [Malk96][Huit96]. It was decided to keep the basics identical to RIPv2, with the new header being a straightforward extension of the RIPv2 header. This makes RIP easy to implement for IPv6 networks, although it is far from an ideal protocol.

3.1.1 OSPF

OSPF (Open Shortest Path First) is a relatively new protocol designed by the OSPF Working Group of the IETF. It is currently at version 2 [Moy95]. OSPF is a shortest-path-first protocol which bases routing decisions on link state records which are dynamically updated. The term "shortest path" in the name is a bit misleading; "optimum path" would be better. The name comes from the "shortest path first" algorithm developed by E.W. Dijkstra, which is used by OSPF nodes to compute preferred paths. OSPF offers fast convergence periods to stabilize routing tables after network topology changes, is designed to prevent packet looping, supports precise metrics (or multiple metrics), supports multiple paths to a destination, and uses a separate representation for external routes (useful in conjunction with the inter-autonomous system exterior routing protocols). OSPF is noticeably more complex than RIP.

OSPF is a link state protocol (as opposed to RIP, which is a "distance vector" protocol). OSPF nodes all maintain a complete "map" of the network, and perform local computation of best routes based on this internal map. This prevents packet looping, since the internal map is always kept in a coherent state. Changes in the network topology are propagated to all nodes quickly by a "flooding" protocol.

OSPF includes three different protocols (compared to RIP's one). The "hello" protocol checks that links are operational and is also used to negotiate a "designated router" (see below) and a backup. The "exchange" protocol is used to synchronize databases between two nodes, with one node acting as "master" and the other as "slave". The "flooding" protocol is used to propagate changes in link state to other nodes in the network.

When a network of OSPF nodes is brought up, each node must discover its peers and build up its database of the network topology. To simplify this step and to limit the number of exchanges needed, OSPF nodes "elect" one of the nodes to act as a designated router, and one additional router to act as backup for the designated node in case it fails. All nodes synchronize themselves with the designated router, speeding up the start process. The designated router also acts as coordinator in the sending of "flooding" messages (i.e. messages about network topology changes).

OSPF supports load sharing between links with equal or "almost equal" cost, though defining "almost equal" too loosely can easily cause routing loops. To avoid corrupted routing information, each OSPF packet contains a sequence number which can be used to discard old messages. OSPF also includes security provisions to protect against malicious nodes [Huit95].
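The local route computation that each OSPF node performs on its map of the network is Dijkstra's shortest path first algorithm. The following is a minimal sketch of that algorithm alone, not of OSPF itself; the router names and link costs are made up, and the map is represented as a plain dictionary.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra's algorithm: lowest total cost from source to each node.

    graph maps a node to a dict of {neighbor: link cost}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return dist


# Hypothetical link state map, identical on every router in the area.
network_map = {
    "R1": {"R2": 1, "R3": 4},
    "R2": {"R1": 1, "R3": 2, "R4": 7},
    "R3": {"R1": 4, "R2": 2, "R4": 3},
    "R4": {"R2": 7, "R3": 3},
}

print(shortest_paths(network_map, "R1"))
# {'R1': 0, 'R2': 1, 'R3': 3, 'R4': 6}
```

Because every node runs the same computation on the same map, the resulting routes are mutually consistent, which is what keeps packets from looping.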

OSPF is the recommended IGP for IPv6, and the IETF is designing a version of OSPF for IPv6 use [Colt96]. The changes needed are minimal (to accommodate the larger address format). OSPF with IPv6 will run between IPv6-capable nodes; the link state database will not be shared with a (possibly existing) IPv4 database -- the two versions of OSPF will operate in parallel. This "two ships in the night" mode of operation is seen as preferable to an implementation that handles both IPv4 and IPv6 [Huit96].

The main changes in IPv6 OSPF are [Huit96]:

The link state records will be identified by a 128-bit field instead of a 32-bit one.
The routers in the network will be identified by one of their IPv6 addresses.
The network areas will be identified by one of their IPv6 addresses or an address prefix.
An integer signifying the number of prefix bits will be used instead of a network mask.
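The last item, a prefix length in place of a network mask, loses no information as long as masks are contiguous. A short sketch of the conversion in both directions follows; the function names are our own, and the width parameter is 32 for IPv4 and 128 for IPv6.

```python
def prefix_length_to_mask(length, width):
    """Build a contiguous mask with `length` leading one bits."""
    return ((1 << length) - 1) << (width - length)


def mask_to_prefix_length(mask, width):
    """Return the number of leading one bits in a contiguous mask."""
    ones = bin(mask).count("1")
    if mask != prefix_length_to_mask(ones, width):
        raise ValueError("mask is not contiguous")
    return ones


print(mask_to_prefix_length(0xFFFFFC00, 32))  # 22 (mask 255.255.252.0)
print(hex(prefix_length_to_mask(22, 32)))     # 0xfffffc00
print(mask_to_prefix_length(prefix_length_to_mask(56, 128), 128))  # 56
```

A prefix length fits in one octet, while a full 128-bit mask would take sixteen.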

3.1.2 IS-IS and EIGRP

IS-IS

IS-IS (Intermediate System to Intermediate System Routeing Protocol) is an internal routing protocol designed by the ISO for use with the CLNP network layer. There are some implementations of IS-IS for IPv4, and it is highly likely that versions for IPv6 will appear in the future.

IS-IS is a link state protocol and is actually quite similar to OSPF. It contains a "hello" protocol to discover neighboring nodes, and uses a "flooding" protocol to propagate link information. There is no separate "exchange" protocol as in OSPF; the "flooding" protocol is used for this purpose, too. IS-IS uses a sequence number for messages, but it is a simple incrementing counter unlike OSPF's elaborate "lollipop" scheme [Huit95]. When the counter reaches the "ceiling", an IS-IS router has no option but to fake a failure and trigger a purge of all old information. However, this is not a problem in practice, since the sequence numbers used are 32 bits long, giving a very large sequence number space before the ceiling is reached.

Originally IS-IS was developed purely for use on OSI networks, but a version has been developed that can handle both OSI (CLNP) and IPv4 networks. This "dual-stack" version of IS-IS has been seen by some as a competitor for OSPF, but it suffers from some problems. IS-IS follows the hierarchical OSI model, which dictates fairly rigid constraints on the organization and connectivity of subnetworks (or "areas" in OSI terms). On OSI networks this rigidity is compensated by automatic address assignment to form "areas", but when used for IP, IS-IS retains the rigidity without offering any real advantages. It is possible that IS-IS implementations for IPv6 will utilize the autoconfiguration options of IPv6 to bring it closer to the OSI model of operation. At least two other technical problems exist, however: IS-IS uses a tiny metric (6 bits), severely limiting the information that can be conveyed with it; in addition, the link state number is only an 8-bit value, limiting the number of records that a router can advertise to 256. A further non-technical problem is that IS-IS is bound to OSI, and as such is much slower to evolve and to respond to change than OSPF (for example) [Huit95].

Only time will tell whether IS-IS will find use in the future IPv6 Internet. At the moment OSPF seems to be superior for IPv6 use, but (as always) the future is difficult to predict with any accuracy.

EIGRP

EIGRP (Enhanced Interior Gateway Routing Protocol) is a proprietary internal routing protocol designed by Cisco Systems Inc. It is an extended version of a protocol called IGRP. As with IS-IS, it will most likely be ported to support IPv6.

IGRP was born at a difficult time for cisco (the company's preferred format for their name). The IETF hadn't yet formalized the specifications for OSPF, and it was becoming clear that RIP had too many limitations to be considered "state of the art" in any sense. cisco had the option to either wait for the IETF to finish their work, or to develop its own protocol -- cisco chose the latter option.

IGRP is a distance vector protocol (like RIP) with "cures" for some of RIP's major problems. It operates at a lower frequency (every 90 seconds as compared to RIP's 30 seconds) and supports such features as composite metrics, some protection against loops, and multipath routing (like OSPF). IGRP routing uses a composite of four metrics: delay, bandwidth, reliability and load. These values (together with coefficients assigned by the network manager) are combined in a formula to obtain the final metric for a link. The fact that the metrics are precise allows routers to configure packet delivery based on local preferences. IGRP uses various methods to prevent message loops, including the "split horizon" and "triggered update" techniques used by some RIP implementations [Huit95]. The enhancements lessen, but do not remove, the looping problem. IGRP also supports multipath routing very much like OSPF -- load can be balanced among paths that are of "almost equal cost" (the meaning of "almost equal" depends on the version of IGRP).
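The "split horizon" technique mentioned above is simple to state: when advertising to a neighbor, a router leaves out the routes it learned through that same neighbor. The sketch below shows the general idea only; the table layout, function name and router names are hypothetical and do not reflect the actual IGRP or RIP message formats.

```python
def advertise(routing_table, neighbor):
    """Return the routes to announce to `neighbor` under split horizon.

    routing_table maps destination -> (metric, next_hop). Routes whose
    next hop is the neighbor itself are withheld from that neighbor.
    """
    return {
        destination: metric
        for destination, (metric, next_hop) in routing_table.items()
        if next_hop != neighbor
    }


# Hypothetical table of router A, which reaches net-C through router B.
table_of_a = {
    "net-A": (1, "direct"),
    "net-C": (2, "B"),
    "net-D": (3, "E"),
}

print(advertise(table_of_a, "B"))  # {'net-A': 1, 'net-D': 3}
print(advertise(table_of_a, "E"))  # {'net-A': 1, 'net-C': 2}
```

Since A never tells B about a route that runs through B, B cannot mistake it for an alternative path, which removes the simplest two-node loops. Loops involving three or more routers are still possible, consistent with the remark that the looping problem is lessened rather than removed.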

EIGRP implements a sophisticated algorithm (the "diffusing update algorithm" or DUAL), developed by J.J. Garcia-Luna-Aceves based on work by E.W. Dijkstra and C.S. Scholten in 1980 [Huit95]. It improves noticeably on the distance vector algorithm used by RIP and previous versions of IGRP, mainly by eliminating routing loops. The cost of this, however, is added complexity. EIGRP is not compatible with IGRP and is more complex both on the protocol level and on the implementation level. In addition to a better algorithm, EIGRP has support for CIDR subnet masks (which can presumably be easily extended to support IPv6 prefixes) and support for tagged external routes like OSPF. In short, EIGRP is a much more robust protocol than RIP, and is seen by some as a competitor to OSPF. Its acceptance is hindered, however, by the fact that a proprietary protocol is not seen as a good thing in today's Internet, which emphasizes "open" solutions.

3.2 Exterior routing protocols

Exterior routing protocols (or "external gateway protocols", EGP for short -- not to be confused with an old routing protocol called EGP) are protocols for routing packets between autonomous systems. They typically view autonomous systems as "black boxes" with well-defined entry- and exit-points, and attempt to route traffic between these points as efficiently as possible.

Older protocols such as GGP and EGP have largely been succeeded by newer protocols (currently various versions of BGP).

3.2.0 BGPv4

The Border Gateway Protocol (BGP, currently in version 4) is the current Internet EGP standard [Black95][RFC1771]. The current version supports the routing table aggregation procedures required by CIDR, and it is based on path vectors. Routers using BGP announce full paths between two sites, allowing the implementation of arbitrary routing policies with full loop detection.

BGP uses the TCP layer instead of the UDP packets used by most other protocols. This simplifies the protocol state machine, since it can rely on the TCP layer for message delivery, but it has the drawback that the link to another node is seen as either totally dead or totally alive -- there is no easy way to probe the quality of the connection (by counting the number of UDP packets sent/lost, etc.). In practice this hasn't caused problems because modern networks tend to be "binary" -- they either work or they're totally dead [Huit95]. The choice of TCP has the favorable effect of lessening the network load, since a reliable transport layer makes incremental updates possible instead of the traditional solution of copying the entire database. After the initialization stage BGP consumes extremely little bandwidth.

A node wishing to communicate with a BGP peer node initially opens a TCP connection to the default BGP port (179). An initial handshake is done, in which identification numbers, authentication information, protocol version numbers etc are exchanged. If the protocol numbers differ, the connection is terminated and can be retried with a new (lower) protocol number. If the authentication succeeds, the nodes will start exchanging "update" packets to bring each other up-to-date. Changes are propagated to other neighboring (connected) nodes. After these exchanges, traffic is limited to "keep alive" messages (informing the other end that the node is active) and further "update" messages if the network topology changes. Nodes do various sanity checks on the path vectors they receive, including checking for instances of themselves along the path -- an indication of a looping path.
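The loop check described at the end of the paragraph follows directly from the path-vector design: a node rejects any path that already contains its own autonomous system identifier, and adds its identifier before passing an accepted path on. A minimal sketch of just that rule follows; the function names and the AS numbers are hypothetical, and nothing here models the actual BGP message format.

```python
def accept_path(own_as, as_path):
    """Reject a path that already passes through our own autonomous system."""
    return own_as not in as_path


def propagate(own_as, as_path):
    """Prepend our own AS before announcing the path to neighbors."""
    return [own_as] + list(as_path)


# Hypothetical AS numbers. A path originates in AS 64512.
path = [64512]
path = propagate(64513, path)    # announced onward by AS 64513
path = propagate(64514, path)    # announced onward by AS 64514

print(path)                      # [64514, 64513, 64512]
print(accept_path(64515, path))  # True: AS 64515 is not on the path yet
print(accept_path(64513, path))  # False: the path would loop through 64513
```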

BGP is heavily optimized to handle 32-bit addresses, making it less than ideal for IPv6 use. BGPv4 is not easily upgradable to IPv6 even though it has extensions to support CIDR, and as a result the IETF chose another protocol (IDRP) as the basis of IPv6 external routing.

3.2.1 IDRP

The Inter-Domain Routing Protocol (IDRP) was first designed for use with the OSI family of protocols developed by the ISO, and is defined in ISO standard 10747. Since most designers of BGP were also involved with IDRP design (making IDRP a descendant of BGP), the widespread opinion among IPv6 developers was that adopting IDRP was wiser than developing a new version of BGP for IPv6 use. Other reasons for the choice are [Huit96]:

It does not have any OSI networking dependencies, even though it was developed for OSI network use.
It was designed from the ground up for multiprotocol routing and can compute routing information from different address families.
It is based on the same (well-tested) path-vector design as BGP, making it technically safe.

The main differences between BGP and IDRP are [Huit96]:

BGP messages are exchanged over a TCP connection; IDRP messages are exchanged using bare datagram services.
BGP is a single-address-family protocol; IDRP supports multiple simultaneous protocol families.
BGP uses 16-bit autonomous system IDs; IDRP uses variable-length prefixes.
BGP always describes the full list of autonomous systems that a path passes through; IDRP can use the concept of Routing Domain Confederations to aggregate this information.

The changes needed in IDRP to support IPv6 are more a matter of defining the content of certain fields than changing the protocol [Rek96].

4. Coexisting IPv4 and IPv6

In an ideal world, the entire Internet would be converted to IPv6 in an instant and everyone could get on with their business. In reality, of course, the transition will be anything but fast -- many sites will probably be reluctant to switch to the new technology unless they can visibly benefit from it. In the beginning, the sites who will benefit from the change will probably be those research and academic sites which have the resources to exploit the new opportunities given by IPv6. It is hard to convince a site to upgrade if the system they have now both works and is based on well-tested technology [RFC1671].

Whatever the actual time frame of the transition, the fact remains that for a fair amount of time the Internet will have IPv4 coexisting with IPv6. This presents some special problems for routing, since IPv6 prefix-based routing differs fundamentally from IPv4 routing (though CIDR is a step in the IPv6 direction). A network (autonomous system) which is composed of IPv6 hosts presents no problem, since it can communicate internally using IPv6 and use the IPv6 internal routing protocols, and communicate with the "outside" via IPv4 and the corresponding protocols. A problem surfaces, however, when we have two IPv6 domains (islands, if you will) separated by a "sea" of IPv4 networks. The routers of each IPv6 domain must establish a "tunnel" through the IPv4 network so that messages can transparently be relayed between the networks [Cal95].


A "tunnel" between IPv6 routers R1 and R2

IPv6 packets traveling from one domain to the other are encapsulated as IPv4 packets. The end points of the tunnel are two IPv4 addresses, in addition to which the MTU of the tunnel and the "time to live" value of the IPv4 packets must be chosen. Both of these choices should be made carefully to avoid strange results.

Since IPv4 supports packet fragmentation, the simple way to determine the MTU would be to ignore it; in other words, just use the IPv6 packet size and let the IPv4 network fragment it as needed. This works, but is potentially very wasteful of resources. Fragmentation forces the other end of the tunnel to store all the parts of a message until all are received, potentially using up large amounts of buffer space if large messages are used. If any fragments are lost, the remaining fragments will needlessly eat up buffer space until their time to live expires and the whole packet is retransmitted. To avoid fragmentation, the routers at both ends of the tunnel should attempt to discover the tunnel's MTU (by using variations of the IPv4 MTU discovery technique, for instance, in which MTUs are lowered if ICMP messages indicate that the packet size was too big). As long as the MTU is greater than or equal to IPv6's minimum packet size (576 octets, as compared to IPv4's 68 octets), fragmentation can be turned off in the IPv4 header. If the MTU is lower, IPv4 fragmentation must be used, with all the adverse effects it entails.
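The discovery procedure can be sketched as a small simulation. Several things here are our assumptions rather than statements from the text: each "too big" ICMP message is taken to report the MTU of the rejecting hop, the IPv4 header is taken to be 20 octets (no options), and the usable tunnel MTU is the path MTU minus that header. The function names and the sample link MTUs are hypothetical.

```python
IPV4_HEADER = 20    # octets; assumes an IPv4 header without options
IPV6_MIN_MTU = 576  # octets; the IPv6 minimum cited in the text


def discover_path_mtu(link_mtus, first_guess):
    """Simulate IPv4 path MTU discovery across the hops of a tunnel.

    Each loop iteration stands for one probe sent with fragmentation
    disabled. The first hop that cannot carry the probe answers with
    an ICMP error, and the sender retries with that hop's MTU.
    """
    mtu = first_guess
    while True:
        rejecting = [hop for hop in link_mtus if hop < mtu]
        if not rejecting:
            return mtu
        mtu = rejecting[0]


def tunnel_plan(link_mtus, first_guess=1500):
    """Return (tunnel MTU for IPv6 packets, whether fragmentation can be off)."""
    tunnel_mtu = discover_path_mtu(link_mtus, first_guess) - IPV4_HEADER
    return tunnel_mtu, tunnel_mtu >= IPV6_MIN_MTU


print(tunnel_plan([1500, 1006, 1500]))  # (986, True)
print(tunnel_plan([1500, 576]))         # (556, False)
```

In the second case the tunnel cannot carry a minimum-size IPv6 packet inside a single IPv4 packet, so IPv4 fragmentation has to stay enabled.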

The IPv4 time to live (TTL) value must simply be guessed at. It must be large enough to enable packets to pass through the tunnel without expiring, but small enough that possible looping packets and incomplete fragmented messages are caught as quickly as possible. The current recommendation is vague on this subject; the actual value used is left as "implementation specific".

One inconvenience of using tunneling is the difficulty of determining a valid metric for a tunnel. Protocols such as RIP see the tunnel as 1 hop, even though it is most likely composed of multiple hops. A metric can be assigned manually, but even this is difficult because the network that forms the "tunnel" is likely to change over time. All this makes for some potentially strange routing decisions, in which routing protocols route messages through the tunnel even though "cheaper" direct connections exist. To solve this problem we need a way of determining a metric for a tunnel from the IPv4 routing tables, and the software for that doesn't exist (yet).
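The effect is easy to reproduce with a toy hop-count comparison. The sketch below is not RIP itself; it only assumes, as the text does, that the protocol picks the route with the lowest advertised hop count. The `Route` class and the hop counts are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    advertised_hops: int  # metric visible to the routing protocol
    actual_hops: int      # routers really traversed (hidden inside the tunnel)


def select_route(routes):
    """Pick the route with the lowest advertised hop count."""
    return min(routes, key=lambda route: route.advertised_hops)


candidates = [
    Route("via tunnel", advertised_hops=1, actual_hops=9),
    Route("direct link", advertised_hops=2, actual_hops=2),
]
chosen = select_route(candidates)
print(chosen.name)  # prints "via tunnel"
```

The tunnel wins on the advertised metric even though it crosses more than four times as many routers as the direct connection.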

Another problem is resources in the tunnel. IPv4 resource allocation schemes see the tunnel as one "user" and give it a proportional share of the network resources, even though the tunnel could well be carrying heavy traffic from multiple users. This can result in poor transmission rates over the tunnel. Solutions to this have been proposed (enforcing a nominal bandwidth for IPv4-encapsulated IPv6 packets, for example), but they all have their drawbacks.
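A small worked example shows how quickly the share per IPv6 flow shrinks. The numbers and the equal-share allocation policy are assumptions chosen for the arithmetic; real IPv4 resource allocation schemes differ in detail.

```python
def per_user_share(capacity_kbps: float, users: int) -> float:
    """Equal share of a link when the network sees `users` competing users."""
    return capacity_kbps / users


capacity_kbps = 12000  # hypothetical link capacity
native_ipv4_flows = 3  # each counted as one user
tunnelled_ipv6_flows = 6  # all hidden behind a single tunnel "user"

# The IPv4 network sees three native flows plus one tunnel.
tunnel_share = per_user_share(capacity_kbps, native_ipv4_flows + 1)
share_per_ipv6_flow = tunnel_share / tunnelled_ipv6_flows
```

Each native IPv4 flow receives 3000 kbit/s, while each of the six IPv6 flows inside the tunnel is left with 500 kbit/s.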

It is probable that most tunnels will be formed (and collapsed) automatically by IPv6/IPv4-capable routers [Huit96].

5. New routing protocols

Some new routing protocols are being developed that attempt to build on current knowledge of routing protocols. The most prominent of these are Nimrod and its ATM offshoot, PNNI.

5.1 Nimrod

The Nimrod routing architecture ([Cast96][Ram96]) is a new scalable routing architecture still in the design stage. It is geared towards IPv6 while not being tied to it, and supports dynamic internetworking with arbitrary network sizes, provides service-specific routing, and allows incremental deployment within an internetwork. The design philosophy of the Nimrod effort is "maximize the lifetime and flexibility of the architecture", with a secondary philosophy of specifying all field lengths as somewhat larger than can conceivably be used -- past history shows that large changes in numeric magnitude are not exceptional in computer science, with microprocessor address size and IPv4 address size serving as good examples [RFC1753].

The main goals of Nimrod are [Cast96]:

Support a dynamic internetwork of arbitrary size by limiting the amount of routing information that must be known throughout the internetwork.
Provide service-specific routing in the presence of multiple constraints imposed by service providers and users.
Admit incremental deployment throughout an internetwork.

To meet these goals, Nimrod represents internetwork connectivity and services with maps which have different abstraction levels. It supports user-controlled route generation and selection based on these maps and on general traffic service requirements, and it also supports user-directed packet forwarding along established paths. Nimrod is a scalable architecture, that is to say it can perform routing either within a single domain or between multiple domains -- in effect, it is both an IGP and an EGP. The technology is not tied to IP, and can function equally well in an OSI environment.

Nimrod sees the internetwork as clusters of "entities" at various levels of abstraction. The entities can be hosts, routers, or other equipment, and the method of clustering them together is not mandated by Nimrod -- a cluster can represent a local network, a larger collection of networks that are managed by a single authority, etc. Clusters can be grouped into other clusters, giving abstraction levels to the network "map". All elements of a cluster must satisfy one condition: connectivity. Any two entities within a cluster must be connected by at least one route that lies entirely within the cluster. Once a cluster is formed, connectivity and service information about it is stored (this is statistical information that describes the cluster "as a whole").
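The connectivity condition can be checked mechanically. The following sketch is an illustration only: the representation of entities as strings and links as pairs is an assumption, and Nimrod does not prescribe this (or any) algorithm. It runs a breadth-first search that is restricted to links whose two ends both lie inside the cluster.

```python
from collections import deque


def is_connected_cluster(members, links):
    """True if every pair of members is joined by a path inside the cluster."""
    members = set(members)
    if not members:
        return True
    # Keep only links with both end points inside the cluster.
    neighbours = {member: set() for member in members}
    for a, b in links:
        if a in members and b in members:
            neighbours[a].add(b)
            neighbours[b].add(a)
    start = next(iter(members))
    reached = {start}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        for other in neighbours[entity]:
            if other not in reached:
                reached.add(other)
                queue.append(other)
    return reached == members


# Hypothetical topology: d is reachable from a, b, c only through x.
links = [("a", "b"), ("b", "c"), ("c", "x"), ("x", "d")]
valid = is_connected_cluster({"a", "b", "c"}, links)
invalid = is_connected_cluster({"a", "b", "c", "d"}, links)
```

The second grouping fails because the only route to `d` passes through `x`, which lies outside the cluster.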

Each cluster selects which portion of the available routing information it advertises to the "outside world" and which portion it wants to receive. This is one of the key elements of Nimrod's scalability, since it allows portions of an internetwork to retain only the necessary amount of information. At the rate the Internet is growing, it is becoming more and more impractical to store all available routing information. Nimrod allows route generation according to specific constraints, but since this is a computation-intensive procedure it calculates such information only for entities that request it. Entities can also select their own route selection algorithms.

To limit the amount of forwarding information that must be maintained in each router, Nimrod multiplexes multiple traffic flows with similar requirements over a single path. To meet the same goal, Nimrod retains information only for active traffic flows.
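One way to picture this is a forwarding table keyed by path and service requirement rather than by individual flow. The class below is a hypothetical sketch, not Nimrod's actual data structure: flows with the same path and service share one entry, and the entry disappears when its last active flow ends.

```python
class ForwardingTable:
    """Toy forwarding state: one entry per (path, service), shared by flows."""

    def __init__(self):
        self._flows = {}  # (path, service) -> set of active flow identifiers

    def add_flow(self, flow_id, path, service):
        self._flows.setdefault((path, service), set()).add(flow_id)

    def remove_flow(self, flow_id, path, service):
        key = (path, service)
        active = self._flows.get(key)
        if active is None:
            return
        active.discard(flow_id)
        if not active:  # no active flows left: drop the state
            del self._flows[key]

    def entry_count(self):
        return len(self._flows)


table = ForwardingTable()
path = ("n1", "n4", "n7")  # a path as a tuple of node names
table.add_flow("flow-1", path, "low-delay")
table.add_flow("flow-2", path, "low-delay")   # multiplexed onto the same entry
table.add_flow("flow-3", path, "bulk")        # different requirements
entries_while_active = table.entry_count()
table.remove_flow("flow-3", path, "bulk")
entries_after_teardown = table.entry_count()
```

Three flows need only two entries, and the table shrinks back to one entry as soon as the bulk flow ends.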

On the lowest level, Nimrod handles traffic between endpoints, which are identified by endpoint identifiers (EIDs). These are globally unique bit strings with no topological significance. Regions of the network (clusters) are represented as nodes, with stored adjacency information telling which nodes connect to which. Network topology is stored in maps, which consist of nodes and adjacency information. These maps are used to calculate routes for messages. It is not expected that routers on the network have consistent maps (due to the information hiding nature of Nimrod and to delays in routing information propagation). Nimrod has been designed to prevent loops even in the case of inconsistent router maps.

"Addresses" are expressed as locators in Nimrod. Each node and each endpoint has a locator, with endpoints potentially having multiple locators. All calculations are performed on locators; EIDs are not used in routing decisions. A node is said to own the locators which have the node's locator as prefix. A node can have a more detailed internal map, which a router can request from it if it wants to make detailed routing decisions. This internal map can recursively contain other internal maps, allowing the mapping of a whole internetwork into a single node. There is no defined "lowest level" of maps ("it's turtles all the way down!") [Cast96].

Nimrod is a fairly complex architecture, and a complete description of its operating modes (let alone protocols [Ram96]) is outside the scope of this document. A high-level description of Nimrod functionality can be found in [Cast96].
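To make the locator ownership rule from this section concrete, the sketch below models a locator as a tuple of labels. That representation and the function names are assumptions made for the example; the actual locator encoding is defined in the Nimrod documents.

```python
def owns(node_locator, locator):
    """True if `locator` has `node_locator` as a prefix."""
    return locator[:len(node_locator)] == node_locator


def most_specific_owner(node_locators, locator):
    """Return the longest node locator that owns `locator`, or None."""
    owners = [node for node in node_locators if owns(node, locator)]
    return max(owners, key=len, default=None)


# Hypothetical node locators at two levels of abstraction.
nodes = [("a",), ("a", "b"), ("x",)]
endpoint = ("a", "b", "c")
owner = most_specific_owner(nodes, endpoint)
```

Here both `("a",)` and `("a", "b")` own the endpoint's locator, and the longer one corresponds to the more detailed internal map.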

5.2 PNNI

PNNI is an evolving routing architecture for use with ATM networks. PNNI is based on the Nimrod protocol suite being developed by the IETF, but it is a distinct protocol and not just a version of Nimrod.

PNNI has similar goals to Nimrod, namely scalability, support for policy-based routing, and user-level determination of the desired quality of service (QoS). The basic units are switches, which are recursively grouped into peer groups. Like Nimrod, PNNI is map-based, and performs routing decisions based on internal maps which are kept as up-to-date as possible -- peer group leaders receive maps from the peer groups that they "contain", and forward their own maps to their (possible) parent peer groups. Like Nimrod, the maps contain resource information which is used in route determination.

PNNI is still an evolving protocol. As of this writing, the packet formats and protocols have largely been settled but there is still lots of work to be done, especially in the support for policy routing and authentication [PNNI].

While ATM networks currently feature very little in the structure of the Internet, the high-speed virtual circuits they offer will probably find use somewhere in the Internet community, and PNNI routing may have to be taken into account. As such, ATM is quite different from TCP/IP, with the former being a circuit-oriented design and the latter being a packet-oriented strategy. While some opponents see ATM as being based on multimedia needs as conceived 15 years ago, instead of the current philosophy of "lots of bandwidth is all you need" [Huit96], the substantial financial investments made by major telecommunications companies into ATM technology cannot be overlooked.

6. Conclusions

The move from IPv4 to IPv6 is inevitable, as is the fact that the transition will be somewhat painful. The newer routing protocols (OSPF, IDRP) support classless routing (CIDR), which extends naturally to IPv6 prefix-based routing. Older protocols which are rigidly tied to the 32-bit IPv4 address format (at least on the implementation level) will likely become obsolete, since there is no real reason to keep using them if one is going to upgrade to IPv6.

Interim measures such as tunneling and IPv6 encapsulation in IPv4 packets enable a smooth transition, but the fact that it is a transition should not be ignored. Tunneling and other layered network methods cause serious routing problems, and should not be seen as anything other than an interim solution. Dual-stack routing systems (IPv4/IPv6, IPv6/OSI, etc.) are also not a long-term solution, as their implementations can easily become over-complicated and they force compromise solutions in cases where the address formats differ considerably (IPv6 vs OSI, for instance).

The new routing protocols under development -- especially Nimrod -- show promise, but they are still experimental protocols and their performance and implementation in the "real world" are unknown factors. In the near future, IPv6 sites should expect to run OSPF for IPv6 as an internal routing protocol and a version of IDRP for external routing. In the not-so-near future the picture is far less clear -- what is seen as the "right" technology right now may not be suitable for the Internet of tomorrow. Time will tell, as always.

References

[Huit95]: Huitema, Christian. "Routing in the Internet"
1995, Prentice Hall, ISBN 1-201-592-2498
[Huit96]: Huitema, Christian. "IPv6, The New Internet Protocol"
1996, Prentice Hall, ISBN 0-13-241936-X
[Black95]: Black, Uyless. "TCP/IP & Related Protocols", 2nd edition
1995, McGraw-Hill, ISBN 0-07-005560-2
[Cal95]: R. Callon, D. Haskin "Routing Aspects Of IPv6 Transition"
10/11/1995, Internet-Draft
[Moy95]: J. Moy "OSPF Version 2"
11/22/1995, Internet-Draft
[Rek96]: Y. Rekhter; P. Traina "IDRP for IP v4 and v6"
January 1996, Internet-Draft
[Malk96]: G. Malkin "RIPng for IPv6"
02/07/1996, Internet-Draft
[Colt96]: R. Coltun, J. Moy "OSPF Version 2 For IP Version 6"
02/23/1996, Internet-Draft
[Cast96]: I. Castineyra, J. N. Chiappa, M. Steenstrup "The Nimrod Routing Architecture"
February 1996, Internet-Draft
[Ram96]: R. Ramanathan; M. Steenstrup "Nimrod Functionality and Protocol Specifications, Version 1"
March 1996, Internet-Draft
[RFC1668]: Estrin, D.; Li, T.; Rekhter, Y. "Unified Routing Requirements for IPng"
August 1994, RFC1668
[RFC1671]: Carpenter, B. "IPng White Paper on Transition and Other Considerations"
August 1994, RFC1671
[RFC1723]: G. Malkin "RIP Version 2 Carrying Additional Information"
November 1994, RFC1723
[RFC1753]: Chiappa, N. "IPng Technical Requirements Of the Nimrod Routing and Addressing Architecture"
December 1994, RFC1753
[RFC1771]: Y. Rekhter; T. Li "A Border Gateway Protocol 4 (BGP-4)"
03/21/1995, RFC1771
[RFC1787]: Rekhter, Y., ed. "Routing in a Multi-provider Internet."
April 1995, RFC1787
[RFC1883]: S. Deering; R. Hinden "Internet Protocol, Version 6 (IPv6) Specification."
01/04/1996, RFC1883
[PNNI]: Joel M. Halpern "The Architecture and Status of PNNI"
http://www.vivid.newbridge.com/documents/Joel.html

Glossary

ATM	Asynchronous Transfer Mode, a virtual-circuit-oriented high-speed network architecture. Also known as "Another Terrible Mistake" in some circles.
autonomous system	A group of networks that are administered as a whole system.
BGPv4	Border Gateway Protocol, version 4. An external routing protocol.
classless routing	A routing scheme that does not separate addresses into different network type classes.
EIGRP	Enhanced Interior Gateway Routing Protocol. A proprietary internal routing protocol developed by Cisco Systems Inc.
IDRP	Interdomain Routing Protocol. An external routing protocol, used in IPv6. Originally designed by the ISO.
IPv4	The current version of the Internet Protocol (version 4).
IPv6	The Internet Protocol, version 6. Earlier known as IPng.
IS-IS	Intermediate System to Intermediate System Routing protocol. An internal routing protocol defined by the ISO.
MTU	Maximum transmission unit. The (maximum) size for a packet that the media can transmit.
multicast address	An address that corresponds to a selected group of targets.
Nimrod	A new scalable routing architecture.
OSPF	Open Shortest Path First. An internal routing protocol for use in both IPv4 and IPv6 environments.
PNNI	A routing/switching protocol designed for ATM.
RIPv2	Routing Information Protocol, version 2. A version of RIP called RIPng is being defined for IPv6.
unicast address	An address for transmitting to exactly one target.