BGP and IPv6 routing courses
Several times a year I teach two training courses, one about BGP and one about IPv6. The BGP course is half theory and half hands-on practice, and so is the new IPv6 routing course. Previously, we did an IPv6 course without a hands-on part.
The courses consist of a theory part in the morning and a practical part in the afternoon, where the participants implement several assignments on a Cisco router (in groups of two participants per router).
The next dates are February 2 for the BGP course in Dutch and February 3 for the IPv6 routing course in Dutch. (There will be dates for the courses in English later in 2015.) Go to the NL-ix website to find more information and sign up. The location will be The Hague, Netherlands.
Interdomain Routing & IPv6 News
After some heated discussions about packet sizes on the mailing list of the IETF v6ops working group, I decided to do some measurements to find out what maximum packet sizes are supported on today's internet. I did this by capturing two types of packets. The first type is the ICMP "too big" message that routers send to tell a computer to send smaller packets. The second is the first packet of a TCP session, which contains the MSS option. TCP sessions use the maximum segment size (MSS) option to tell the other side the largest TCP segment we can receive. This depends on the maximum transmission unit (MTU) of the hardware, which system administrators may reduce further.
The Ethernet standard uses an MTU of 1500 bytes, although a lot of Ethernet hardware can support more, such as 9000-byte "jumbo frames". Wi-Fi also uses 1500 bytes to be compatible with Ethernet. However, sometimes one protocol needs to be tunneled over another protocol, such as IPv6 over IPv4 (over Ethernet) or PPP over Ethernet. This reduces the supported packet size to 1480 or 1492 bytes, respectively. The IPv6 specifications require that a minimum MTU of 1280 bytes is supported. IPv4 has no comparable minimum that matters in practice (RFC 791 only requires that every link can carry 68-byte packets). Note that all of this is about the maximum packet size; it is of course perfectly fine to send smaller packets.
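The 1480 and 1492 values follow from subtracting the tunnel headers from the 1500-byte Ethernet MTU. Here is a minimal sketch of that arithmetic. It assumes an outer IPv4 header without options for IPv6-over-IPv4, and a 6-byte PPPoE header plus a 2-byte PPP protocol field for PPPoE.

```python
# Encapsulation overhead in bytes, assuming headers without options.
OVERHEAD = {
    "IPv6 over IPv4": 20,  # outer IPv4 header
    "PPPoE": 8,            # 6-byte PPPoE header + 2-byte PPP protocol field
}

ETHERNET_MTU = 1500
IPV6_MIN_MTU = 1280  # minimum MTU required by the IPv6 specifications


def inner_mtu(outer_mtu: int, overhead: int) -> int:
    """Largest inner packet that fits once the tunnel headers are added."""
    return outer_mtu - overhead


for name, overhead in OVERHEAD.items():
    mtu = inner_mtu(ETHERNET_MTU, overhead)
    print(f"{name}: {mtu} bytes, enough for IPv6: {mtu >= IPV6_MIN_MTU}")
```

Both tunnels still leave room for IPv6's 1280-byte minimum. Stacking the two (IPv6 in IPv4 over PPPoE) simply subtracts both overheads.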
TCP segments (and UDP datagrams) are put inside IP packets that are then transmitted inside Ethernet frames. A 1500-byte IPv4 packet can carry 1460 bytes of TCP data, so the MSS is 1460. That is 1500 bytes minus the 20-byte IPv4 header and the 20-byte TCP header. This 1500-byte IP packet is transmitted as a 1518-byte Ethernet frame: a 14-byte header plus the 4-byte checksum at the end of the frame. Some people only count the 14-byte header and ignore the checksum. Because the IPv6 header is 40 bytes, a 1500-byte IPv6 packet can only hold 1440 bytes of TCP data. I'll be talking about IP MTU sizes rather than segment/MSS sizes to make it easier to compare IPv4 and IPv6 results.
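Converting between the MSS seen in a SYN and the IP MTU is a matter of adding or subtracting header sizes. This is a minimal sketch that assumes no IPv4 options, no IPv6 extension headers and no TCP options. It also ignores VLAN tags in the Ethernet frame size.

```python
IPV4_HEADER = 20        # IPv4 header without options
IPV6_HEADER = 40        # fixed IPv6 header, no extension headers
TCP_HEADER = 20         # TCP header without options
ETHERNET_OVERHEAD = 18  # 14-byte Ethernet header + 4-byte checksum (FCS)


def mtu_to_mss(mtu: int, ipv6: bool) -> int:
    """TCP payload that fits in an IP packet of `mtu` bytes."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


def mss_to_mtu(mss: int, ipv6: bool) -> int:
    """IP packet size implied by an advertised MSS."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mss + TCP_HEADER + ip_header


def frame_size(mtu: int) -> int:
    """Ethernet frame size for an IP packet of `mtu` bytes, checksum included."""
    return mtu + ETHERNET_OVERHEAD


print(mtu_to_mss(1500, ipv6=False))  # 1460
print(mtu_to_mss(1500, ipv6=True))   # 1440
print(frame_size(1500))              # 1518
```

Going from MSS to MTU is what lets the IPv4 and IPv6 measurements below be compared on the same scale.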
Over the better part of a week, my server received 41753 incoming TCP SYN packets with an MSS option on port 80. Another 140 packets didn't have the MSS option and looked like they were mostly TCP-based traceroute packets. 24246 packets were IPv4 packets, coming from 4164 unique IP addresses. 17507 were IPv6 packets, which I reduced to 227 unique IP addresses. Turns out, most of the IPv6 traffic on my server is from bots that check if I've added any new content to the site. Some of them use the same address each time. Others keep using different addresses but, strangely, the same source port numbers (12000 - 12006). I removed these addresses to keep them from drowning out the real data.
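Here is a minimal sketch of the reduction step: keep one MSS value per source address, and drop every address that was ever seen using the bots' source ports. The `Syn` record and its field names are hypothetical. They stand in for whatever the packet capture produces.

```python
from dataclasses import dataclass

# Source ports that the address-hopping bots kept reusing (12000 - 12006).
BOT_PORTS = range(12000, 12007)


@dataclass(frozen=True)
class Syn:
    """One captured TCP SYN; a hypothetical record format for this sketch."""
    src: str    # source IP address
    sport: int  # TCP source port
    mss: int    # value of the MSS option


def mss_per_source(syns: list[Syn]) -> dict[str, int]:
    """Map each source address to the MSS it advertised, skipping bot addresses."""
    bot_addresses = {s.src for s in syns if s.sport in BOT_PORTS}
    result: dict[str, int] = {}
    for s in syns:
        if s.src not in bot_addresses:
            result.setdefault(s.src, s.mss)  # keep the first SYN from each address
    return result


syns = [
    Syn("2001:db8::1", 50000, 1440),
    Syn("2001:db8::1", 50001, 1440),  # same host again, counted once
    Syn("2001:db8::2", 12003, 1220),  # bot source port, dropped
    Syn("192.0.2.1", 40000, 1460),
]
print(mss_per_source(syns))  # {'2001:db8::1': 1440, '192.0.2.1': 1460}
```

Keeping only the first SYN per address means a host that changes its MSS during the capture is counted with its first value. That seems acceptable for a per-system distribution.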
The data showed no fewer than 72 different MTU sizes for IPv4, ranging from 576 to 9198 bytes. However, both of these extremes only showed up once, and other values below 1280 and above 1500 are also quite rare:
I found the 9001 value quite curious; computers really like to work on nice round multiples of 2, 4 or 8 bytes. 9001, on the other hand, is a prime number. Turns out that 9001 bytes is used in Amazon's datacenters, where some of the bots that index my website reside. These are the more common MTU sizes advertised in the TCP MSS option:
1300 and 1400 look like someone set them manually; 1300 is also a common VPN MTU. 1440 bytes seems to be hardcoded in some home routers. 1460 could indicate IPv4-over-IPv6 tunneling. 1470 seems to be used by a number of broadband ISPs and 1492 results from PPP over Ethernet (PPPoE). Last but not least, just under two thirds of IPv4 visitors support the Ethernet MTU of 1500.
These are the results for IPv6 with the < 1% values removed (there were no values below 1280 or above 1500):
1280 and 1480 are probably IPv6-in-IPv4 tunnels and 1428 AYIYA tunnels. 1472 could be IPv6-in-UDP-in-IPv4 tunnels or IPv6-in-IPv4-over-PPPoE. The image below shows the cumulative frequency of MTU sizes for both IPv4 (red) and IPv6 (blue), where the line shows how many systems support a given MTU value, starting at 99.98% for 1200 and ending at 65.56% for 1500 (for IPv4).
The 90th percentile MTU size is 1428 for IPv6 and 1440 for IPv4, meaning that 90% of systems support at least that size. Obviously 100% of IPv6 systems support 1280, and 99.7% of IPv4 systems also support this size.
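The cumulative curve and the 90th percentile can both be computed from one MTU value per unique system. Below is a minimal sketch that assumes those values are already in a list. The sample values are made up and are not the measured data.

```python
import math


def support_fraction(mtus: list[int], size: int) -> float:
    """Fraction of systems whose MTU is at least `size` (one point on the curve)."""
    return sum(1 for m in mtus if m >= size) / len(mtus)


def percentile_mtu(mtus: list[int], pct: float) -> int:
    """Largest MTU that at least `pct` percent of systems support."""
    ordered = sorted(mtus, reverse=True)
    # Every system up to and including index k supports ordered[k].
    k = math.ceil(len(ordered) * pct / 100) - 1
    return ordered[k]


# Made-up sample: one MTU value per unique system.
sample = [1500] * 6 + [1492, 1480, 1440, 1280]
print(support_fraction(sample, 1500))  # 0.6
print(percentile_mtu(sample, 90))      # 1440
```

Evaluating `support_fraction` for every size from 1200 to 1500 gives the points of a cumulative curve like the one described above.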
The MSS reflects the maximum size that the systems at both ends of a connection think they can use. However, there may be a bottleneck somewhere along the path. In that case, routers send back an ICMP message saying the packet is too big (Packet Too Big in ICMPv6, Fragmentation Needed in ICMP for IPv4). Tomorrow, I'll look at those.
I'm in Honolulu for the IETF meeting this week. As always, on Sunday morning before the meeting proper starts, there's the IEPG meeting, which always has interesting presentations, usually from the operational side of networking.
Today, there were talks about IPv6 packets with extension headers being dropped, routing table and packet size issues by Geoff Huston, and a discussion on Shim6 and Multipath TCP (MPTCP) failure recovery by Brian Carpenter. All good stuff. However, at the end of Brian's presentation, Lorenzo Colitti thanked Brian for the interesting presentation about the performance of undeployable protocol A vs undeployable protocol B.
I kind of get why Shim6 and MPTCP are considered undeployable. You need to have addresses from two different ISPs, and you need to make sure that packets with addresses from ISP 1 go to ISP 1 and those with addresses from ISP 2 go to ISP 2. If not, BCP 38 ingress filtering will block the packets. The trouble with Shim6 is that the RIRs started giving out provider-independent IPv6 addresses shortly before Shim6 was finished, so larger networks simply use those. Shim6 never got any traction. So if you want to use it now, you'll find that nobody else uses it, and you need it at both ends for it to work. It's still somewhat early days for MPTCP, but it doesn't seem to be setting the world on fire, either.
But Lorenzo was talking about the fact that MPTCP uses TCP options that are filtered out by firewalls. Brian already mentioned that the Shim6 extension header is also often filtered, and suggested that probe packets should look like normal data packets.
However, these issues were considered when both protocols were designed. Obviously it would have been great if we could have implemented these two protocols without the need for additional options or headers, but I don't see how that would have been possible. So the next best thing was to make sure that if the options, or the packets containing them, are filtered, communication still works, just without the benefits of Shim6 or MPTCP. This means the protocols were never undeployable: you can turn them on by default without any issues. If the headers/options are filtered, you simply don't get any benefit, but everything still works. For paths where the options/headers are left alone, Shim6 and MPTCP get to do their thing and you benefit from being able to use additional paths. Hopefully, over time, firewall operators will realize these protocols don't cause any harm and stop filtering them.
Unfortunately, there are protocols that really do turn out to be undeployable, because firewalls or bad implementations break any communication that uses those protocols.
This week, the Amsterdam Internet Exchange is renumbering its peering LAN.
An internet exchange (IX) is simply a very big Ethernet. Members connect a router port to that Ethernet, and can then exchange packets with each other. When you want to exchange traffic with many other networks, obviously this is more efficient than setting up dedicated connections with all these other networks.
Until this week, AMS-IX used a /22 prefix, allowing for about a thousand connected routers. That was no longer enough, so they got a new /21 prefix, which can accommodate two thousand connected routers. This means that all the currently connected routers must get a new address. No big deal. This is why search-and-replace was invented.
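The "about a thousand" and "two thousand" figures follow from the prefix lengths. Here is a minimal sketch using Python's standard `ipaddress` module. It uses the documentation-style example prefixes 192.0.0.0/22 and 192.0.0.0/21 rather than the real AMS-IX ranges, and subtracts only the network and broadcast addresses.

```python
import ipaddress


def assignable_addresses(prefix: str) -> int:
    """Addresses in an IPv4 prefix minus the network and broadcast addresses."""
    return ipaddress.ip_network(prefix).num_addresses - 2


for prefix in ("192.0.0.0/22", "192.0.0.0/21"):
    print(prefix, assignable_addresses(prefix))
# 192.0.0.0/22 1022
# 192.0.0.0/21 2046
```

Each bit removed from the prefix length doubles the number of addresses, which is why going from a /22 to a /21 doubles the number of routers that can connect.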
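However, sometimes someone makes a mistake, like configuring <new address>/22 instead of <new address>/21, and then letting that /22 propagate to other networks over BGP. Routers that learn this /22 will prefer it over the directly connected /21, because it's more specific. Packets for addresses in that /22 then no longer go straight over the exchange.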
(A more specific prefix is a smaller range of IP addresses. 192.0.0.0/21 is BGP talk for the address range 192.0.0.0 - 192.0.7.255. 192.0.0.0/22 is the range 192.0.0.0 - 192.0.3.255. Because the latter identifies a smaller range of IP addresses, the packets are sent in that direction. It's just like following a sign "Paris" rather than a sign "France" if you were going to Paris, even though Paris is part of France, so presumably following the sign "France" would also get you to Paris.)
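Below is a minimal sketch of longest-prefix matching with the two example prefixes. A real router does this lookup in its forwarding table and only compares preferences such as administrative distance between routes for the *same* prefix. That is why a BGP-learned /22 beats a connected /21. The next-hop labels are made up for illustration.

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hop of the most specific route that covers destination."""
    address = ipaddress.ip_address(destination)
    best_net, best_hop = None, None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if address in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


routes = {
    "192.0.0.0/21": "directly connected IX LAN",
    "192.0.0.0/22": "BGP route from the misconfigured router",  # the mistaken /22
}
print(lookup(routes, "192.0.2.10"))  # BGP route from the misconfigured router
print(lookup(routes, "192.0.6.10"))  # directly connected IX LAN
```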
The sad thing is that the exact same thing happened in 2003, when the AMS-IX renumbered from a /24 to a /23. I always warn against this issue during my training courses. I tell students to filter the IX prefixes of internet exchanges they're connected to, as well as all possible subprefixes (more specifics) that fall within that IX prefix. For instance, a prefix list can reject incoming updates with your own prefix and all possible more specifics (assuming your prefix is 172.16.0.0/12). It can also reject the AMS-IX prefix 80.249.208.0/21 and all possible subprefixes. It then allows all prefixes with a prefix length of no more than /24, which is common practice for IPv4.
"le" means "less or equal" so "172.16.0.0/12 le 32" means:
Hopefully, by the time AMS-IX connects more than 2000 routers, the issue is moot because we no longer use IPv4. But for now: happy renumbering!
I was updating a presentation the other day, and I found something I wanted to share here. (Looks like old presentations are good blog fodder. Who knew.) The old presentation was the one I did at the 2013 ISOC New Year's event, where I put up some slides on Google's numbers of IPv6 users (from their vantage point) in various countries. That was less than two years ago, and a lot has changed.
The global number of IPv6-capable users was 1% then (January 2013), now (September 2014) it's 4%. Some countries that were ahead of the curve extended their lead, others didn't, and a new world leader emerged out of nowhere. Click the link to see the old slides followed by the current situation. (1.4 MB PDF.)
So the good news is that a lot can happen quickly when it comes to IPv6 deployment.
My Books: "BGP" and "Running IPv6"
On this page you can find more information about my book "BGP". Or you can jump immediately to chapter 6, "Traffic Engineering" (approx. 150 kB), which O'Reilly has put online as a sample chapter. Information about the Japanese translation can be found here.
"no synchronization"When you run BGP on two or more routers, you need to configure internal BGP (iBGP) between all of them. If those routers are Cisco routers, they won't work very well unless you configure them with no synchronization.
The no synchronization configuration command tells the routers that you don't want them to "synchronize" iBGP and the internal routing protocol, such as OSPF. The idea behind synchronizing is this: suppose two iBGP-speaking routers have another router in between that doesn't speak BGP. The non-BGP router in the middle then needs to have the same routing information as the BGP routers, or there could be routing loops. The way to make sure that the non-BGP router is aware of the routing information in BGP is to redistribute the BGP routing information into the internal routing protocol.
By default, Cisco routers expect you to do this, and wait for the BGP routing information to show up in an internal routing protocol before they'll use any routes learned through iBGP. However, these days redistributing full BGP routing into another protocol isn't really done any more, because it's easier to simply run BGP on any routers in the middle.
But if you don't redistribute BGP into internal routing, the router will still wait for the BGP routes to show up in an internal routing protocol, which will never happen, so the iBGP routes are never used.
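The rule can be summarized as a small decision function. This is a simplified model rather than Cisco's actual implementation. It treats "the IGP knows the prefix" as an exact prefix match in a set, and it ignores details such as the OSPF router ID check. The function and parameter names are hypothetical.

```python
def usable_bgp_routes(routes, igp_prefixes, synchronization=True):
    """Return the prefixes a router would use under a simplified model of
    BGP synchronization.

    routes: iterable of (prefix, learned_via) pairs; learned_via is "ebgp" or "ibgp"
    igp_prefixes: set of prefixes present in the internal routing protocol
    """
    usable = []
    for prefix, learned_via in routes:
        if learned_via == "ebgp":
            usable.append(prefix)  # synchronization only applies to iBGP routes
        elif not synchronization or prefix in igp_prefixes:
            usable.append(prefix)  # iBGP: wait for the IGP unless sync is off
    return usable


routes = [("198.51.100.0/24", "ibgp"), ("203.0.113.0/24", "ebgp")]
igp = {"10.0.0.0/8"}  # BGP was not redistributed into the IGP
print(usable_bgp_routes(routes, igp))                         # ['203.0.113.0/24']
print(usable_bgp_routes(routes, igp, synchronization=False))  # both prefixes
```

With synchronization on and no redistribution, the iBGP route never qualifies, which is exactly the situation described above.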
BGP Security
BGP has some security holes. This sounds very bad, and of course it isn't good, but don't be overly alarmed. There are basically two problems. First, sessions can be hijacked. Second, someone who has hijacked a session, or who has a legitimate BGP session, can inject incorrect information into the BGP tables.
Session hijacking is hard to do for someone who can't see the TCP sequence numbers of the TCP session that BGP runs over, and good anti-spoofing filters make it impossible. And of course using the TCP MD5 signature ("password") option (RFC 2385) makes all of this nearly impossible, even for someone who can sniff the BGP traffic.
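The reason the MD5 option helps is that every segment carries a digest that can't be computed without the shared password. Below is a minimal sketch of the digest described in RFC 2385, for IPv4 only. The digest covers four things, in order: the pseudo-header; the fixed TCP header with the checksum set to zero and options excluded; the segment data; and the key. The function name and parameters are hypothetical. This is not taken from any router or operating system implementation.

```python
import hashlib
import ipaddress
import struct

TCP_PROTOCOL = 6


def tcp_md5_digest(src: str, dst: str, tcp_header: bytes,
                   options_len: int, payload: bytes, key: bytes) -> bytes:
    """Compute the 16-byte digest for the TCP MD5 signature option (IPv4 only).

    tcp_header is the 20-byte fixed TCP header without options; options_len is
    the length of the options that follow it. Options count toward the segment
    length in the pseudo-header but are not themselves hashed.
    """
    if len(tcp_header) != 20:
        raise ValueError("expected the 20-byte fixed TCP header")
    segment_length = len(tcp_header) + options_len + len(payload)
    # 1. Pseudo-header: source, destination, zero-padded protocol, segment length.
    pseudo_header = (ipaddress.IPv4Address(src).packed
                     + ipaddress.IPv4Address(dst).packed
                     + struct.pack("!BBH", 0, TCP_PROTOCOL, segment_length))
    # 2. Fixed TCP header with the checksum field (bytes 16-17) set to zero.
    header = tcp_header[:16] + b"\x00\x00" + tcp_header[18:]
    # 3. Segment data, then 4. the shared key.
    return hashlib.md5(pseudo_header + header + payload + key).digest()


# A made-up SYN from port 50000 to the BGP port 179 with 20 bytes of options
# (data offset of 10 32-bit words).
syn = struct.pack("!HHIIBBHHH", 50000, 179, 1, 0, 10 << 4, 0x02, 65535, 0, 0)
digest = tcp_md5_digest("192.0.2.1", "192.0.2.2", syn, 20, b"", b"secret")
print(len(digest))  # 16
```

An attacker who can spoof addresses and guess sequence numbers still can't produce a valid digest without the key, so the receiving router discards the forged segment.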
Nearly all ISPs filter BGP information from customers, so in most cases it isn't possible to successfully inject false information. However, filtering on peering sessions between ISPs isn't as widespread, although some networks do this. A rogue ISP could do some real damage here.
There are now two efforts underway to better secure BGP:
The IETF RPSEC (routing protocol security) working group is active in this area.
IPv6
BGPexpert is available over IPv6 as well as IPv4. www.bgpexpert.com has both an IPv4 and an IPv6 address. You can see which one you're connected to at the bottom of the page. Alternatively, you can click on www.ipv6.bgpexpert.com to see if you can connect over IPv6. This URL only has an IPv6 address.
What is BGPexpert.com?
BGPexpert.com is a website dedicated to Internet routing issues. What we want is for packets to find their way from one end of the globe to the other, and to make the jobs of the people who make this happen a little easier.
Ok, but what is BGP?
Have a look at the "what is BGP" page. There is also a list of BGP and interdomain routing terms on this page.
BGP and Multihoming
If you are not an ISP, your main reason to be interested in BGP will probably be to multihome. By connecting to two or more ISPs at the same time, you are "multihomed" and you no longer have to depend on a single ISP for your network connectivity.
This sounds simple enough, but as always, there is a catch. For regular customers, it's the Internet Service Provider who makes sure the rest of the Internet knows where packets have to be sent to reach its customers. If you are multihomed, you can't let your ISP do this, because then you would have to depend on a single ISP again. This is where the BGP protocol comes in: BGP is the protocol that carries this information from ISP to ISP. By announcing reachability information for your network to two ISPs, you can make sure everybody still knows how to reach you if one of those ISPs has an outage.
For those of you interested in multihoming in IPv6 (which is pretty much impossible at the moment), have a look at the "IPv6 multihoming solutions" page.
Are you a BGP expert? Take the test to find out!
These questions are somewhat Cisco-centric. We now also have another set of questions and answers for self-study purposes.
You are visiting bgpexpert.com over IPv4.