If you could use some help with BGP, have a look at my business web site: inet6consult.com.
BGP routing courses
There are currently no training courses planned.
Interdomain Routing & IPv6 News
The BGP RFCs state that external BGP peers should insert their own AS into the AS PATH advertised to eBGP peers. Some peers strip their AS, generally for commercial gain. Juniper and Cisco have opposite default behaviors for handling this. Make sure you set bgp enforce-first-as on Juniper routers. Caveats apply.
The annoying part here is that you want to disable this check for internet exchange route servers, but keep it enabled for everything else for security reasons. But that's not universally possible, as on some routers this is a global setting, rather than a per-neighbor one.
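To make the check itself concrete, here is a minimal sketch in Python of what a per-neighbor first-AS check could look like. It is not any vendor's implementation: the `Peer` class, its `is_route_server` flag and the `accept_first_as` function are names invented for this illustration, and the AS path is simplified to a flat list of AS numbers.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    asn: int                       # the AS number configured for this eBGP neighbor
    is_route_server: bool = False  # hypothetical per-neighbor exemption flag


def accept_first_as(peer: Peer, as_path: list[int]) -> bool:
    """Return True if an update from this peer passes the first-AS check."""
    if peer.is_route_server:
        # Route servers normally don't insert their own AS, so skip the check.
        return True
    # An empty path or a leftmost AS that isn't the neighbor's AS fails.
    return bool(as_path) and as_path[0] == peer.asn
```

With a per-neighbor flag like this, `accept_first_as(Peer(64500), [64501])` is rejected while the same path from `Peer(64510, is_route_server=True)` is accepted. On routers where the setting is global, that distinction can't be made.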
The necessary background
In addition to the "well-known" BGP path attributes that we all know (because the RFC says we must) and love (because they make the internet work), it's also possible to define new attributes to provide new functionality. These can be "transitive" attributes, which means that a BGP router that doesn't recognize them propagates them to its BGP neighbors unchanged.
The ability to create new optional transitive attributes has allowed us to run BGP version 4 for three decades without having to bump the version number because we had to make backward-incompatible changes that would make adoption all but impossible.
For instance, 32-bit autonomous system numbers were added as the 16-bit BGP AS numbers started to run out. In addition to the well-known (mandatory) 16-bit AS path, an optional 32-bit AS path was added. If a router in the middle didn't understand the 32-bit AS path, it would update the 16-bit AS path and propagate the 32-bit AS path unchanged.
The next 32-bit capable BGP router can then add back the AS numbers from the 16-bit path that are missing from the 32-bit path, and 32-bit AS numbers work even if routers in the middle don't understand them. (They just see "23456".)
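The reconstruction step can be sketched in a few lines of Python. This is a simplification that treats both paths as flat lists of AS numbers and ignores AS sets and confederation segments; the function name `merge_as_paths` is made up for this example.

```python
AS_TRANS = 23456  # placeholder that 16-bit-only routers see instead of a 32-bit AS number


def merge_as_paths(as_path: list[int], as4_path: list[int]) -> list[int]:
    """Rebuild the full path from the 16-bit path and the 32-bit path."""
    if len(as_path) < len(as4_path):
        # The 32-bit path can't be longer than the 16-bit one; ignore it.
        return list(as_path)
    # ASes prepended by routers that didn't understand the 32-bit path
    # only show up at the front of the 16-bit path.
    missing = len(as_path) - len(as4_path)
    return list(as_path[:missing]) + list(as4_path)
```

For example, if AS 65550 (a 32-bit number) originates a route and a 16-bit-only AS 64500 passes it on, the next 32-bit capable router receives a 16-bit path of `[64500, 23456]` and a 32-bit path of `[65550]`. Merging them gives `[64500, 65550]`.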
The error handling issue
In Ben's blog post, he talks about a Brazilian network that included a malformed version of a still-experimental attribute. All the big routers in the core of the internet don't run experiments, so they just saw an attribute they didn't recognize, and propagated it as per the transitive setting. Eventually BGP updates with the broken attribute arrived at routers that did understand the attribute, but saw that it was broken.
So as per the original BGP spec, they tore down the BGP session towards the router that sent them the broken attribute. And then, after a short delay, tried to set up a new BGP session towards that neighboring router. Only to encounter the same error again and tear down the BGP session again. And so on.
Which is probably not what you want. Which is nicely explained in RFC 7606, published in 2015, which suggests treating such errors as if the neighboring router had asked to withdraw the route containing the offending path attribute. So if a neighbor tells me prefixes 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 are reachable through them, and 172.16.0.0/12 has a broken attribute, I just act as if my neighbor had told me that 172.16.0.0/12 is not reachable through them. But I don't bring down the BGP session so 10.0.0.0/8 and 192.168.0.0/16 remain reachable through the neighbor in question.
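The difference between the two behaviors can be shown with a toy model in Python. This is only a sketch: the routes learned from one neighbor are a plain dictionary, `process_update` and the `is_malformed` callback are invented for the example, and the real RFC 7606 has more outcomes than the single "treat as withdraw" case modeled here.

```python
from typing import Callable


def process_update(
    rib: dict[str, dict],
    prefix: str,
    attrs: dict,
    is_malformed: Callable[[dict], bool],
    rfc7606: bool = True,
) -> bool:
    """Apply one announcement from a neighbor.

    Returns False if the session towards the neighbor has to be torn down.
    """
    if not is_malformed(attrs):
        rib[prefix] = attrs  # normal case: install or replace the route
        return True
    if rfc7606:
        rib.pop(prefix, None)  # act as if the neighbor withdrew this prefix
        return True
    rib.clear()  # original behavior: session reset, all routes are lost
    return False
```

Feeding the three prefixes from the example above through this function with `rfc7606=True` leaves 10.0.0.0/8 and 192.168.0.0/16 in the table and keeps the session up. With `rfc7606=False`, the broken 172.16.0.0/12 update empties the table.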
Ben seems to be rather annoyed that many router vendors don't implement the RFC 7606 behavior, implement it but don't enable it by default, and/or don't have a bug bounty program to reward security researchers for pointing out these deficiencies. He spent a good amount of time evaluating different implementations and then "fuzzing" attributes to see what would happen, so that's somewhat understandable. Here is his score card from his presentation slides:
I agree that the RFC 7606 handling by default is what you want. I also agree that changing a default here, something router vendors are loath to do, shouldn't be problematic.
However, these are pretty obscure errors. This is not an internet extinction level issue.
For my own network, I would strongly prefer a mechanism to turn off handling of these often rather frivolous new attributes, both to avoid being bitten by buggy implementations elsewhere and to avoid inflating BGP messages. As BGP updates propagate, the AS paths (the 16- and 32-bit versions) increase in length, so an update that was just under the limit at one point will eventually exceed the maximum size of 4096 bytes, and then bad things will definitely happen.
However, it's important that new transitive attributes aren't filtered out wholesale, as that would make it impossible to add new features to BGP. I'm not sure if there is a workable way to put a stop to frivolous BGP path attributes being injected into the global routing system while at the same time not robbing BGP of its forward compatibility with future new innovations.
The other day, I landed on this article: In Focus: Subsea Network Architecture: IXPs. The article takes some time to arrive at the point that undersea internet exchanges would be a good idea. The most eyecatching part is a variation on this image:
As the article starts out discussing how datacenters have been moving away from large cities to take advantage of opportunities such as space, cheap energy and easier cooling, this image seems to suggest that these blue dots are good locations for datacenters and/or internet exchanges in general. And that's definitely not the point of the paper that the image is from.
That paper is very specifically about the best locations to place servers for high speed algorithmic trading on multiple markets some distance away from each other. This immediately explains why there is nothing around the western US: there are simply no stock exchanges / markets there (the red dots in the image).
The math looks more complicated, but presumably, in these cases it helps when the servers executing the trading algorithms are in the middle between the "users", rather than close to one and further from the other(s).
If you need data from two places far away from each other, then it's better when each is 25 milliseconds away, as you can then complete your action in 25 ms plus however long it takes to do your own processing. If you're close to one so it's 0 ms for one data source and 50 ms for the other, then the entire action takes at least 50 ms.
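The arithmetic here comes down to taking the maximum of the delays. The function below is a hypothetical helper for illustration, and it assumes the data from all sources is fetched in parallel.

```python
def completion_ms(latencies_ms: list[float], processing_ms: float = 0.0) -> float:
    """Time until an action that needs data from every source can complete."""
    # The fetches run in parallel, so the slowest source sets the pace.
    return max(latencies_ms) + processing_ms
```

In the middle, `completion_ms([25, 25])` is 25 ms. Sitting next to one of the sources, `completion_ms([0, 50])` is 50 ms, even though the average delay is the same in both cases.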
But is that a common situation?
In general, you can just copy the data beforehand. So this only applies if you're using "live" data from two or more locations. Videoconferencing with a number of participants could be an example, where a server receives the video from all the participants, mixes it into a single feed and then sends that single feed out to all the participants. If the server is in the middle, this limits the maximum delay. I guess that could be somewhat helpful. But to the degree that it makes sense to have datacenters in the middle of the ocean? I'm not convinced.
On potaroo.net Geoff Huston wishes Happy 50th Birthday Ethernet.
Back in 2011 I wrote an Ars Technica feature about the history of Ethernet: Speed matters: How Ethernet went from 3Mbps to 100Gbps… and beyond.
Interesting to compare our different takes!
And of course Ethernet is still going strong. My oldest computers have the original 10 Mbps Ethernet adapters that I got almost 30 years ago, while my newest computer has 10 gigabit Ethernet, 1000 x faster.
My Books: "BGP" and "Running IPv6"
On this page you can find more information about my book "BGP". Or you can jump immediately to chapter 6, "Traffic Engineering", (approx. 150kB) that O'Reilly has put online as a sample chapter. Information about the Japanese translation can be found here.
More information about my second book, "Running IPv6", is available here.
BGP Security
BGP has some security holes. This sounds very bad, and of course it isn't good, but don't be overly alarmed. There are basically two problems: sessions can be hijacked, and incorrect information can be injected into the BGP tables by someone who can hijack a session or who has a legitimate BGP session.
Session hijacking is hard to do for someone who can't see the TCP sequence number for the TCP session the BGP protocol runs over, and if there are good anti-spoofing filters it is even impossible. And of course using the TCP MD5 password option (RFC 2385) makes all of this nearly impossible even for someone who can sniff the BGP traffic.
Nearly all ISPs filter BGP information from customers, so in most cases it isn't possible to successfully inject false information. However, filtering on peering sessions between ISPs isn't as widespread, although some networks do this. A rogue ISP could do some real damage here.
There are now two efforts underway to better secure BGP:
The IETF RPSEC (routing protocol security) working group is active in this area.
What is BGPexpert.com?
BGPexpert.com is a website dedicated to Internet routing issues. What we want is for packets to find their way from one end of the globe to another, and make the jobs of the people that make this happen a little easier.
Ok, but what is BGP?
Have a look at the "what is BGP" page. There is also a list of BGP and interdomain routing terms on this page.
BGP and Multihoming
If you are not an ISP, your main reason to be interested in BGP will probably be to multihome. By connecting to two or more ISPs at the same time, you are "multihomed" and you no longer have to depend on a single ISP for your network connectivity.
This sounds simple enough, but as always, there is a catch. For regular customers, it's the Internet Service Provider who makes sure the rest of the Internet knows where packets have to be sent to reach their customer. If you are multihomed, you can't let your ISP do this, because then you would have to depend on a single ISP again. This is where the BGP protocol comes in: this is the protocol used to carry this information from ISP to ISP. By announcing reachability information for your network to two ISPs, you can make sure everybody still knows how to reach you if one of those ISPs has an outage.
For those of you interested in multihoming in IPv6 (which is pretty much impossible at the moment), have a look at the "IPv6 multihoming solutions" page.
Are you a BGP expert? Take the test to find out!
These questions are somewhat Cisco-centric. We now also have another set of questions and answers for self-study purposes.