Border Gateway Protocol (BGP) is a fundamental routing protocol used by cloud providers to enable connectivity and routing between networks. BGP is leveraged by major cloud providers like AWS, Azure, and GCP to create highly scalable, robust, and globally connected cloud networks.
Before we dive in, make sure you have read through my previous BGP posts.
- What is BGP in Networking
- BGP Message Types
- BGP States
- How to advertise routes in BGP – 5 Simple Techniques
- eBGP vs iBGP
- What is BGP Regular Expression
- BGP Security Best Practices
- Common BGP Misconfigurations and Ways to Fix Them
- Cisco BGP Multihoming with Two Different ISPs
Let’s get started.
Some of the key ways BGP is used in cloud networking are:
Interconnecting Cloud Networks to Customer’s On-Premises Networks:
BGP is crucial for establishing peering and routing between a cloud provider’s network and a customer’s on-premises network. This peering lets on-premises workloads communicate with resources in the cloud provider’s network and vice versa, with routing information exchanged dynamically between the two autonomous systems. Major cloud providers like AWS, Azure, and GCP use BGP to peer with customer networks; AWS, for example, runs BGP sessions with a customer’s on-premises network over AWS Direct Connect.
Let us understand Interconnecting Cloud Networks to Customer On-Premise Networks using BGP by the diagram:
This diagram represents the concept of interconnecting cloud networks for different cloud providers:
- The customer On-Premises Network connects to AWS, Azure, and GCP using BGP peering.
- AWS uses AWS Direct Connect for BGP peering with the customer’s on-premises network.
- Azure uses ExpressRoute for BGP peering with the customer’s on-premises network.
- GCP uses Cloud Interconnect for BGP peering with the customer’s on-premises network.
- Customer Workloads can communicate with Cloud Provider Resources, enabling dynamic routing and exchange of routing information between autonomous systems.
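The exchange of routing information between two autonomous systems described above can be sketched in a few lines of Python. This is only an illustrative model, not how any provider implements BGP: the `Speaker` and `Route` classes, the private AS numbers 65001 and 64512, and the prefixes are all hypothetical. It shows two things that real eBGP does: a speaker prepends its own AS number when it advertises a route, and it drops any received route whose AS path already contains its own AS number (loop prevention).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple  # AS numbers, nearest AS first


@dataclass
class Speaker:
    asn: int
    originated: list  # prefixes this AS owns
    rib: dict = field(default_factory=dict)  # prefix -> Route learned from the peer

    def advertise(self):
        """Routes sent to an eBGP peer, each with our own ASN prepended."""
        routes = [Route(p, (self.asn,)) for p in self.originated]
        routes += [Route(r.prefix, (self.asn,) + r.as_path) for r in self.rib.values()]
        return routes

    def receive(self, routes):
        """Install received routes, dropping any that already passed through us."""
        for route in routes:
            if self.asn in route.as_path:  # AS-path loop prevention
                continue
            self.rib[route.prefix] = route


# Hypothetical example values (private ASNs, RFC 1918 prefixes).
on_prem = Speaker(asn=65001, originated=["10.10.0.0/16"])
cloud = Speaker(asn=64512, originated=["172.31.0.0/16"])

cloud.receive(on_prem.advertise())
on_prem.receive(cloud.advertise())

print(sorted(cloud.rib))    # prefixes the cloud side learned
print(sorted(on_prem.rib))  # prefixes the on-premises side learned
```

After the exchange, each side knows the other side's prefix, and the on-premises speaker has ignored its own prefix when the cloud side echoed it back.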
Connecting Cloud Regions:
BGP is vital for connecting geographically distributed cloud regions, enabling the routing of traffic between regions for high availability and reduced latency.
Major cloud providers like AWS, Azure, and GCP use BGP to connect their regions worldwide, allowing workloads to span across regions. For example, AWS uses BGP to route traffic between its various regions, such as US-East, US-West, EU, and Asia Pacific, to maintain high availability and low latency for applications.
The interesting point is that as cloud providers expand into more regions, scaling BGP to connect numerous regions becomes challenging. To address this, AWS, Azure, and GCP have developed advanced BGP optimization techniques, such as BGP route reflectors combined with the ADD-PATH capability, which lets a router advertise more than one path per prefix and so distributes routes efficiently between many regions. These scale-out optimizations have enabled truly global cloud networks.
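To make the route reflector and add-path idea more concrete, here is a small Python sketch. It is a simplified model under stated assumptions: the `reflect` function, the `max_paths` parameter, the next-hop names, and the AS numbers are hypothetical, and paths are ranked by AS-path length only, which is a small part of the real BGP decision process. Without add-path, the reflector passes on only its single best path per prefix; with add-path, clients also receive a backup path.

```python
def reflect(paths_by_prefix, add_path=False, max_paths=2):
    """Return the paths a route reflector would advertise to its clients.

    Paths are ranked by AS-path length, then next-hop name (a simplification).
    """
    advertised = {}
    for prefix, paths in paths_by_prefix.items():
        ranked = sorted(paths, key=lambda p: (len(p["as_path"]), p["next_hop"]))
        advertised[prefix] = ranked[:max_paths] if add_path else ranked[:1]
    return advertised


# Hypothetical paths to one prefix, learned from two regional border routers.
learned = {
    "10.20.0.0/16": [
        {"next_hop": "region-a-border", "as_path": [64601]},
        {"next_hop": "region-b-border", "as_path": [64602, 64601]},
    ],
}

plain = reflect(learned)                 # best path only
multi = reflect(learned, add_path=True)  # best path plus one backup

print([p["next_hop"] for p in plain["10.20.0.0/16"]])
print([p["next_hop"] for p in multi["10.20.0.0/16"]])
```

The design point is that clients holding a backup path can switch to it locally instead of waiting for the reflector to compute and send a new best path.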
Let us understand Connecting Different Cloud Regions using BGP by the diagram:
This diagram represents the concept of connecting geographically distributed cloud regions for different cloud providers in more detail:
- AWS US-East, US-West, EU, and Asia Pacific regions are interconnected using BGP.
- Azure East US, West US, EU, and Asia Pacific regions are interconnected using BGP.
- GCP US-East, US-West, EU, and Asia Pacific regions are interconnected using BGP.
In all cases, BGP is used to establish routing between these cloud regions, ensuring high availability and low latency for applications running across regions.
Routing to the Internet:
BGP is essential for cloud providers to peer and route traffic to the Internet, acting as a transit point for Internet-bound and Internet-originating traffic. Major cloud providers like AWS, Azure, and GCP announce route prefixes to major ISPs and use BGP to route customer traffic to and from the Internet. AWS, for example, announces customer prefixes to its ISP partners so that EC2 instances and other AWS resources can reach the public Internet.
The interesting point is that the way cloud providers announce customer routes via BGP and the partnerships they form with ISPs significantly impact connectivity and performance. AWS, Azure, and GCP work with top ISPs to provide redundant connectivity and optimal peering while carefully tuning BGP policies to offer the best Internet routing for customers. These BGP and peering optimizations contribute to a high-performance global network backbone.
Let us understand Routing to the Internet in the Cloud using BGP by the diagram:
This diagram represents the concept of routing to the Internet for different cloud providers in more detail:
- An “AWS Customer” accesses an “EC2 Instance.”
- AWS announces a route prefix to an “ISP” partner.
- An “Azure Customer” accesses a “VM Instance.”
- Azure announces a route prefix to an “ISP” partner.
- A “GCP Customer” accesses a “GCE Instance.”
- GCP announces a route prefix to an “ISP” partner.
In all cases, the ISPs connect to the “Public Internet” to enable communication between customer instances and Internet resources.
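Once prefixes have been announced, a router that receives them forwards each packet along the most specific matching prefix (longest-prefix match). The following Python sketch illustrates that lookup with the standard `ipaddress` module. The `lookup` function, the next-hop names, and the prefixes (taken from the 198.51.100.0/24 documentation range) are hypothetical and only serve the example.

```python
import ipaddress


def lookup(table, destination):
    """Return the next hop of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the largest prefix length wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical routing table as an ISP router might hold it.
table = {
    ipaddress.ip_network("0.0.0.0/0"): "isp-default",
    ipaddress.ip_network("198.51.100.0/24"): "cloud-edge-1",
    ipaddress.ip_network("198.51.100.128/25"): "cloud-edge-2",
}

print(lookup(table, "198.51.100.10"))   # matched by the /24
print(lookup(table, "198.51.100.200"))  # matched by the more specific /25
print(lookup(table, "192.0.2.1"))       # only the default route matches
```

This is why the exact prefixes a provider chooses to announce matter: a more specific announcement attracts traffic away from a broader one.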
Route Aggregation:
Cloud providers typically aggregate customer routes to minimize the number of routes announced over BGP peering sessions, optimizing routing tables and reducing the size of BGP updates. Route aggregation is a crucial technique for scaling BGP in large cloud networks. Providers like AWS, Azure, and GCP aggregate customer CIDR blocks into larger prefixes to limit the number of routes announced to partners. For instance, AWS may aggregate routes from multiple VPCs into a 192.168.0.0/16 prefix to announce to an AWS Direct Connect partner.
The route aggregation schemes used by each cloud provider significantly impact their BGP routing tables’ size and the number of routes announced to partners. To minimize BGP routes, AWS, Azure, and GCP use CIDR allocation and aggregation strategies tailored to cloud networking, such as AWS allocating customers a /20 CIDR block and aggregating multiple blocks into a /16 prefix to announce to Direct Connect partners.
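The arithmetic behind aggregating /20 blocks into a /16 can be checked with Python's standard `ipaddress` module. This is a minimal sketch and the specific blocks under 192.168.0.0/16 are assumed for illustration. A /16 holds 2^(20-16) = 16 blocks of size /20. When all sixteen are present they collapse exactly into the /16; when only some are present, the /16 is a covering aggregate that also includes address space not currently in use.

```python
import ipaddress

# Sixteen hypothetical /20 customer blocks: 192.168.0.0/20, 192.168.16.0/20, ...
vpc_blocks = [ipaddress.ip_network(f"192.168.{i * 16}.0/20") for i in range(16)]

# Number of /20 blocks that fit in a /16.
blocks_per_16 = 2 ** (20 - 16)

# Exact aggregation: all sixteen contiguous /20s collapse into one /16.
full_aggregate = list(ipaddress.collapse_addresses(vpc_blocks))

# With only two adjacent /20s, the exact aggregate is a /19 ...
two_blocks = vpc_blocks[:2]
exact = list(ipaddress.collapse_addresses(two_blocks))

# ... while the /16 is a broader covering prefix that contains both.
covering = two_blocks[0].supernet(new_prefix=16)

print(blocks_per_16)
print(full_aggregate)
print(exact)
print(covering, all(b.subnet_of(covering) for b in two_blocks))
```

Announcing the covering /16 keeps the routing table small, at the cost of attracting traffic for addresses inside the /16 that no VPC actually uses.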
Check out my complete course on Mastering IP Addressing and Subnetting: From Fundamentals to Advanced Techniques.
Let us understand Route Aggregation in the Cloud using BGP by the diagram:
This diagram represents the route aggregation concept for different cloud providers in more detail:
- “Customer VPC 1 /20” and “Customer VPC 2 /20” are aggregated into an “Aggregate /16 Prefix.”
- This aggregate prefix is announced to an “AWS Direct Connect Partner.”
- “Azure VNet 1 /24” and “Azure VNet 2 /24” are aggregated into an “Aggregate /23 Prefix.”
- This aggregate prefix is announced to an “Azure ExpressRoute Partner.”
- “GCP VPC 1 /20” and “GCP VPC 2 /20” are aggregated into an “Aggregate /16 Prefix.”
- This aggregate prefix is announced to a “GCP Cloud Interconnect Partner.”
High Availability and Fast Failover:
BGP ensures fast failover and high availability in the cloud by quickly rerouting traffic to alternate paths when link failures or network issues occur, resulting in a robust network for customers. Major cloud providers like AWS, Azure, and GCP utilize BGP capabilities, such as fast failover routes, to avoid network disruption in case of router loss, link outage, or infrastructure issues. The convergence time and failover routing policies configured for BGP in each provider’s network determine the speed and efficiency of traffic restoration after a network failure.
AWS, Azure, and GCP have designed their BGP infrastructure for fast failure detection and optimized failover routing to provide maximum availability. Techniques like BGP prefix prioritization are employed to rank primary and failover routes in advance, so traffic is rerouted quickly even after significant network problems.
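The failover behaviour described above can be illustrated with a small Python sketch of path selection. It is a simplification under stated assumptions: the `best` function, the link names, the local preference values, and the `up` flag are hypothetical, and only two of BGP's tie-breakers (highest local preference, then shortest AS path) are modelled. The primary path is preferred while it is available, and the secondary path is selected as soon as the primary is withdrawn.

```python
def best(paths):
    """Pick the preferred usable path: highest local_pref, then shortest AS path."""
    usable = [p for p in paths if p["up"]]
    if not usable:
        return None
    return max(usable, key=lambda p: (p["local_pref"], -len(p["as_path"])))


# Hypothetical paths from a customer edge router to the same cloud prefix.
paths = [
    {"via": "primary-link", "local_pref": 200, "as_path": [64512], "up": True},
    {"via": "secondary-link", "local_pref": 100, "as_path": [64512], "up": True},
]

before = best(paths)["via"]  # primary is preferred while it is up

paths[0]["up"] = False       # simulate a failure of the primary link
after = best(paths)["via"]   # traffic moves to the secondary link

print(before, after)
```

Because the secondary path is already known and ranked, the switch only requires re-running the selection, which is the property that keeps failover fast.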
Let us understand High availability in the Cloud using BGP by the diagram:
This diagram represents the high availability concept in more detail:
- A “Customer Workload” communicates with a “Cloud Resource” via a “Primary Route.”
- The “Customer Workload” also has a “Failover Route” to an “Alternate Cloud Resource” in case of a failure in the primary path.
- An “Edge Router” in the customer network is connected to two “Cloud Provider Routers” via a “Primary Link” and “Secondary Link” to ensure redundancy.
- The “Cloud Provider Router 1” provides the “Primary Path” to the “Cloud Resource,” while the “Cloud Provider Router 2” provides the “Alternate Path” to the “Alternate Cloud Resource.”
- BGP prefix prioritization is used between the two Cloud Provider Routers to ensure quick failover and traffic rerouting in case of a network issue.
In conclusion, BGP plays a crucial role in enabling scalable, robust, highly available, and globally connected cloud networks. By understanding how large cloud providers have optimized BGP, valuable insights can be gained into architecting scalable cloud networks. BGP is a key foundational element in global cloud infrastructure and will continue to grow in importance as networks expand in size and complexity.