BGP and IPv6 routing courses
Several times a year I teach two training courses, one about BGP and one about IPv6. The BGP course is half theory and half hands-on practice, and so is the new IPv6 routing course. Previously, we did an IPv6 course without a hands-on part.
The courses consist of a theory part in the morning and a practical part in the afternoon, where the participants implement several assignments on a Cisco router (in groups of two participants per router).
Dates for upcoming courses in 2015 are:
Interdomain Routing & IPv6 News
This June 30th, at 23:59:60, a leap second was added to Coordinated Universal Time (UTC). Dyn Research posted the following graph on Twitter that shows there was significant BGP update instability for five minutes after the leap second occurred:
Unfortunately, it's not clear why this happened. However, leap seconds have triggered all kinds of mishaps in the past. They're basically miniature Y2K problems. Time and time again, software engineers show that they can't be trusted to take corner cases into account properly.
This does remind me of a situation about a decade ago, where I had a customer that experienced BGP instability every night at the same time. They used Quagga running on Linux machines. We couldn't figure out what the problem was, until we realized that at that very moment, the ntpdate command was run from cron. ntpdate synchronizes the system clock with an NTP server. As the machine in question had a very poor system clock, the system's time was adjusted a lot every night: I think a minute or more, but definitely more than 30 seconds.
Which meant that if Quagga had gotten a BGP keepalive message 8 seconds earlier, it now thought that was 38 seconds ago. If BGP is configured with a hold time of 30 seconds, this means that Quagga now thinks the other side has been quiet for longer than the hold time and it'll tear down the BGP session. This is what happened every night for a bunch of BGP sessions. We solved this by running the NTP daemon continuously, so there was never a big adjustment in system time. (Alternatively, just letting the time drift would also have worked.)
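The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the failure mode, not Quagga's actual code: the function name hold_timer_expired and all timestamps are made up for the example. It compares a hold timer fed from the wall clock, which ntpdate can step, with one fed from a monotonic clock (such as Python's time.monotonic()), which is not affected by such adjustments.

```python
HOLD_TIME = 30  # seconds, the hold time from the example above


def hold_timer_expired(last_keepalive, now, hold_time=HOLD_TIME):
    """Return True if no keepalive was seen within hold_time seconds."""
    return (now - last_keepalive) > hold_time


# Wall-clock view: a keepalive arrived 8 seconds ago, then ntpdate
# steps the system clock forward by 30 seconds (hypothetical values).
last_keepalive_wall = 1000.0
wall_now = last_keepalive_wall + 8.0   # 8 real seconds have passed
clock_step = 30.0                      # size of the ntpdate adjustment
wall_after_step = wall_now + clock_step

# Monotonic view: the same 8 real seconds, unaffected by the clock step.
last_keepalive_mono = 500.0
mono_now = last_keepalive_mono + 8.0

expired_wall = hold_timer_expired(last_keepalive_wall, wall_after_step)
expired_mono = hold_timer_expired(last_keepalive_mono, mono_now)

print(expired_wall)  # True: 38 s of apparent silence exceeds the hold time
print(expired_mono)  # False: only 8 s have really elapsed
```

Timing the session with a clock that never jumps is a third way around the problem, next to running the NTP daemon continuously or letting the time drift.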
The minimum BGP hold time is 3 seconds, so adjusting for an (improperly handled) leap second shouldn't be able to make BGP think the hold time for a session has expired. However, there could be a bug somewhere else that impacted BGP.
I'm not sure whether these kinds of issues are a good argument in favor of abandoning leap seconds, as the bugs won't go away, they'll just show up at a less predictable time. But I don't like the current leap second practice, as leap seconds are unpredictable, and you can't calculate the time difference in seconds between two dates without taking the entire list of leap seconds into account. I think it would be better to save the leap seconds up and apply them all at the end of a century.
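To illustrate why the list of leap seconds is needed, here is a minimal Python sketch. It assumes a hand-maintained excerpt of the leap second table (only the insertions at the end of 2008, mid-2012 and mid-2015 are listed; a real implementation would need the complete, current list), and the names LEAP_SECONDS_AFTER and elapsed_seconds are hypothetical.

```python
from datetime import datetime

# Excerpt only: the first UTC instant after each inserted leap second.
# A complete calculation needs the full list published by the IERS.
LEAP_SECONDS_AFTER = [
    datetime(2009, 1, 1),
    datetime(2012, 7, 1),
    datetime(2015, 7, 1),
]


def elapsed_seconds(start, end):
    """Elapsed seconds between two UTC datetimes, counting the leap
    seconds from the excerpt above that fall between them."""
    naive = int((end - start).total_seconds())
    leaps = sum(1 for t in LEAP_SECONDS_AFTER if start < t <= end)
    return naive + leaps


start = datetime(2015, 6, 30, 23, 59, 0)
end = datetime(2015, 7, 1, 0, 1, 0)

naive = int((end - start).total_seconds())
actual = elapsed_seconds(start, end)

print(naive)   # 120: plain calendar arithmetic ignores 23:59:60
print(actual)  # 121: one leap second was inserted in this interval
```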
At the NANOG meeting in San Francisco two weeks ago, there was a session on The benefits of deploying IPv6 only. Someone from T-Mobile explained that the latest Windows Mobile and Android support 464XLAT to allow IPv4-only applications to work over IPv6 with NAT64, so those devices now only get IPv6. Other devices only get IPv4; there's no dual stack. At that point, the panelists didn't know yet that Apple is requiring iOS 9 apps to work over IPv6, so those can work through NAT64 without 464XLAT.
Another interesting data point is the observation by Facebook that IPv6 tends to perform better than IPv4, with the margin being as large as 40%:
However, why this is so is unclear: the RTTs are the same, yet the performance/bandwidth over IPv6 is better. There was some frustration because Apple's implementation of "happy eyeballs" only looks at the RTT to choose between IPv4 and IPv6, and thus lands on IPv4 a good deal of the time and doesn't enjoy the benefits of that better IPv6 performance.
Earlier this month, RIPE Labs had a lengthy blog post about transfers of IPv4 addresses within the RIPE region. A lot of addresses went from Romania to Saudi Arabia, but the rest of Europe and the Middle East has been busy, too. However:
In the subsequent months of January 2015 through to April 2015, levels of transfer were significantly lower. Because the RIPE NCC listing service continues to show strong demand, the lower amounts transferred may well be a sign that the market in the RIPE region is capped by availability; total demand cannot be met by available supplies. This may change after the recently accepted RIPE policy for inter-RIR transfers has been implemented.
It probably wasn't an accident that two of the sponsors of the RIPE-70 meeting were businesses that facilitate IPv4 address trading.
For some years now, the Regional Internet Registries have been rolling out RPKI. The Resource Public Key Infrastructure allows holders of IP addresses to authorize an autonomous system to inject those addresses in BGP. (See here for an overview of how RPKI works and more links.)
I've always thought it would be hard to deploy RPKI in the real world, because it's just way too easy for a certificate or ROA (route origination authorization) to expire. If that then leads to routes becoming invalid and the addresses in question being unreachable, that would be a good example of the cure being worse than the disease.
Fortunately, that's not the case: RPKI is ready for real-world deployment today.
So packets will follow a path that is RPKI-validated if available. If not, they follow a path that isn't covered by RPKI if that's available. Only if there are no "valid" or "unknown" paths will the packets be sent over an "invalid" path: one that is covered by RPKI, but for which validation failed. The trouble with this approach is that it still allows for invalid more specific prefixes to hijack traffic. For instance:
RIPE has a ROA for prefix 188.8.131.52/21 that allows AS 3333 to originate that prefix, with a maximum prefix length of /21. So if AS 4444 originates 184.108.40.206/21, that will result in the following BGP table:
   Network             Next Hop         Metric LocPrf Weight Path
>* 220.127.116.11/21   18.104.22.168        10    200      0 3333 i
*                      22.214.171.124       10     50      0 4444 i
So effectively, the path through AS 4444 is ignored. However, AS 4444 could also do this:
   Network             Next Hop         Metric LocPrf Weight Path
>* 126.96.36.199/21    188.8.131.52         10    200      0 3333 i
>* 184.108.40.206/24   220.127.116.11       10     50      0 4444 i
>* 18.104.22.168/24    22.214.171.124       10     50      0 4444 i
>* 126.96.36.199/24    188.8.131.52         10     50      0 4444 i
>* 184.108.40.206/24   220.127.116.11       10     50      0 4444 i
>* 18.104.22.168/24    22.214.171.124       10     50      0 4444 i
>* 126.96.36.199/24    188.8.131.52         10     50      0 4444 i
>* 184.108.40.206/24   220.127.116.11       10     50      0 4444 i
>* 18.104.22.168/24    22.214.171.124       10     50      0 4444 i
So even though the path towards the /21 is still routed to AS 3333, the packets flow to AS 4444 because of the longest match first rule. Solution: filter out "invalid" prefixes completely.
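The interaction between origin validation and longest match can be sketched in Python. This is a simplified model, not router code: it assumes a single ROA, uses the illustrative prefix 10.0.0.0/21 as a stand-in for the RIPE prefix, and the names ROAS, validation_state and best_route are invented for the example. Only AS 3333 and AS 4444 and the maximum length of /21 come from the text above.

```python
import ipaddress

# Hypothetical ROA set: (prefix, maximum prefix length, authorized origin AS).
ROAS = [(ipaddress.ip_network("10.0.0.0/21"), 21, 3333)]


def validation_state(prefix, origin_as, roas=ROAS):
    """Classify an announcement as "valid", "invalid" or "unknown"."""
    covered = False
    for roa_prefix, max_length, roa_as in roas:
        if prefix.version == roa_prefix.version and prefix.subnet_of(roa_prefix):
            covered = True
            if origin_as == roa_as and prefix.prefixlen <= max_length:
                return "valid"
    # Covered by a ROA but not matching it: invalid. Not covered: unknown.
    return "invalid" if covered else "unknown"


def best_route(routes, destination):
    """Longest-match lookup over (prefix, origin AS) routes."""
    matches = [r for r in routes if destination in r[0]]
    return max(matches, key=lambda r: r[0].prefixlen) if matches else None


announcements = [
    (ipaddress.ip_network("10.0.0.0/21"), 3333),  # legitimate origin
    (ipaddress.ip_network("10.0.1.0/24"), 4444),  # invalid more specific
]
destination = ipaddress.ip_address("10.0.1.1")

# Merely depreferencing "invalid" routes leaves the /24 in the table.
unfiltered = best_route(announcements, destination)

# Dropping "invalid" routes removes the hijacking more specific.
accepted = [r for r in announcements if validation_state(*r) != "invalid"]
filtered = best_route(accepted, destination)

print(unfiltered[1])  # 4444: the more specific /24 wins the lookup
print(filtered[1])    # 3333: only the validated /21 remains
```

Local preference never enters into it: the /24 and the /21 are different prefixes, so they don't compete in the BGP decision process at all.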
But then, what happens when RIPE forgets to renew their certificate or ROA in time? If their prefix then reverted to "invalid", it would disappear from routing tables everywhere, and RIPE would be unreachable:
Network Next Hop Metric LocPrf Weight Path
In this scenario, it would be very dangerous to filter "invalid" prefixes, as RPKI is still relatively immature and mistakes will happen.
❝If ARIN (or another other RIR) went offline or signed broken data, all signed prefixes that previously has the RPKI status "Valid", would fall back to the state "Unknown", as if they were never signed in the first place. The state would NOT be "Invalid".❞
So what would happen is this:
   Network             Next Hop         Metric LocPrf Weight Path
>* 126.96.36.199/21    188.8.131.52         10    100      0 3333 i
Obviously, in this case the protection against unauthorized origination of the prefixes in question would go away, but in the normal situation where nobody tries to hijack those prefixes, they would still be reachable and a mistake with certificate or ROA expiration wouldn't immediately lead to a network disappearing off of the internet.
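A small Python sketch of this fallback, under the same kind of assumptions as before: the prefix 10.0.0.0/21 is an illustrative stand-in, the function name validation_state is hypothetical, and the local preference values 200, 100 and 50 are simply the ones that appear in the tables on this page. When the ROA drops out of the validated set, nothing covers the prefix anymore, so the announcement is classified as "unknown" and not "invalid".

```python
import ipaddress

# Local preference per validation state, as in the tables on this page.
LOCAL_PREF = {"valid": 200, "unknown": 100, "invalid": 50}


def validation_state(prefix, origin_as, roas):
    """Classify an announcement against a list of
    (prefix, maximum prefix length, authorized origin AS) ROAs."""
    covering = [r for r in roas
                if prefix.version == r[0].version and prefix.subnet_of(r[0])]
    if not covering:
        return "unknown"  # no ROA covers the prefix at all
    for _, max_length, roa_as in covering:
        if origin_as == roa_as and prefix.prefixlen <= max_length:
            return "valid"
    return "invalid"


prefix = ipaddress.ip_network("10.0.0.0/21")
roas_before = [(prefix, 21, 3333)]
roas_after_expiry = []  # the expired ROA is no longer in the validated set

state_before = validation_state(prefix, 3333, roas_before)
state_after = validation_state(prefix, 3333, roas_after_expiry)

print(state_before, LOCAL_PREF[state_before])  # valid 200
print(state_after, LOCAL_PREF[state_after])    # unknown 100
```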
In other words: deploy RPKI today. It doesn't protect against all forms of malicious address hijacking, but it does offer very robust protection against accidental unauthorized route origination, such as the infamous YouTube/Pakistan incident. Also, you can run an RPKI validator locally without the need for your upstream ISPs or peers to do the same.
My Books: "BGP" and "Running IPv6"
On this page you can find more information about my book "BGP". Or you can jump immediately to chapter 6, "Traffic Engineering", (approx. 150kB) that O'Reilly has put online as a sample chapter. Information about the Japanese translation can be found here.
More information about my second book, "Running IPv6", is available here.
BGP Security
BGP has some security holes. This sounds very bad, and of course it isn't good, but don't be overly alarmed. There are basically two problems: sessions can be hijacked, and incorrect information can be injected into the BGP tables, either by someone who hijacks a session or by someone who has a legitimate BGP session.
Session hijacking is hard to do for someone who can't see the TCP sequence number for the TCP session the BGP protocol runs over, and if there are good anti-spoofing filters it is impossible. And of course using the TCP MD5 password option (RFC 2385) makes all of this nearly impossible even for someone who can sniff the BGP traffic.
Nearly all ISPs filter BGP information from customers, so in most cases it isn't possible to successfully inject false information. However, filtering on peering sessions between ISPs isn't as widespread, although some networks do this. A rogue ISP could do some real damage here.
There are now two efforts underway to better secure BGP:
The IETF RPSEC (routing protocol security) working group is active in this area.
What is BGPexpert.com?
BGPexpert.com is a website dedicated to Internet routing issues. What we want is for packets to find their way from one end of the globe to another, and make the jobs of the people that make this happen a little easier.
Ok, but what is BGP?
Have a look at the "what is BGP" page. There is also a list of BGP and interdomain routing terms on this page.
BGP and Multihoming
If you are not an ISP, your main reason to be interested in BGP will probably be to multihome. By connecting to two or more ISPs at the same time, you are "multihomed" and you no longer have to depend on a single ISP for your network connectivity.
This sounds simple enough, but as always, there is a catch. For regular customers, it's the Internet Service Provider who makes sure the rest of the Internet knows where packets have to be sent to reach their customer. If you are multihomed, you can't let your ISP do this, because then you would have to depend on a single ISP again. This is where the BGP protocol comes in: this is the protocol used to carry this information from ISP to ISP. By announcing reachability information for your network to two ISPs, you can make sure everybody still knows how to reach you if one of those ISPs has an outage.
For those of you interested in multihoming in IPv6 (which is pretty much impossible at the moment), have a look at the "IPv6 multihoming solutions" page.
Are you a BGP expert? Take the test to find out!
These questions are somewhat Cisco-centric. We now also have another set of questions and answers for self-study purposes.