In a previous post here I detailed how to deploy a single-region hub and spoke network topology in Azure. However, most deployments in Azure have a requirement for high availability, meaning multiple hub and spoke topologies in different Azure regions. Deploying and connecting the two hub and spoke topologies is a complicated process so I wanted to share my experience in getting this to work.
Not all multi-region hub and spoke topologies have a requirement for connecting the regions (for example if you want to set up the second region as a passive disaster recovery region, with VNets as target VNets for Azure Site Recovery).
You would only connect the hubs if spokes in different regions need to communicate, e.g. for SQL replication. As each hub will have its own network virtual appliance (or firewall) in front of its spokes, the traffic flow should be Spoke 1 – Hub 1’s Firewall – Hub 2’s Firewall – Spoke 3.
I’ve seen four major scenarios for connecting hub and spoke architectures in Azure:
VPN Connections to On-Premises
Diagram: VPN Connections to On-Premises/Private Cloud
The most common topology I’ve seen is to use separate site-to-site VPN connections between on-premises/private datacentres and Azure hubs. Site-to-site VPNs are much cheaper to run, but they don’t give the resilience and performance an ExpressRoute connection would.
In this scenario it is recommended that VNet Peering is used to connect the two hubs, with route tables added to the firewall subnet to force traffic destined for the remote spokes through the remote hub’s firewall. It is useful to consider the traffic flow of, say, Spoke 1 to Spoke 3: we would want traffic to go from Spoke 1, over the peering and through the firewall in Hub 1, across VNet peering again to Hub 2 and through its firewall, and finally across VNet peering to Spoke 3. Firewall rules will need to be added to both Hub 1 and Hub 2’s firewalls to allow this traffic:
DIAGRAM: Traffic Flow between Spokes with VPN connections
For this to work, a route table needs to be added to the AzureFirewallSubnet (if using Azure Firewall). You will notice when deploying this configuration that Azure requires you to put a default route on any route table associated with this specific subnet.
I’ve also found that traffic directed on-premises over the VPN needs to be defined in this route table if BGP VPN isn’t used. On the topic of BGP, it’s also important to highlight a setting in Azure route tables: ‘Propagate gateway routes’.
This setting basically enables the propagation of BGP routes learnt from the Virtual Network Gateway (if using BGP VPN or ExpressRoute). When enabled, a number of extra routes will be added to a subnet’s route table which are learnt using the BGP protocol. To see these routes the easiest method is to check ‘Effective routes’ on a VM’s Network interface in that subnet:
If you aren’t using BGP at all, then all routes will need to be statically defined in an Azure route table.
Below is an example of a route table on the AzureFirewallSubnet in a hub that would work to route traffic through the remote hub and its Azure Firewall:
Remember that routing will always use the more specific route first, so in the above route table the default route is at the lowest priority.
When traffic hits the Azure Firewall to be forwarded on, it will go to the VNet Gateway (and across the VPN to on-prem) if the destination is in 192.168.0.0/16. To reach Spoke 3 and Spoke 4 it will be forwarded across the peering to the remote hub’s firewall IP address (10.3.254.4). If there are other subnets in the remote hub’s VNet that need to be routed to via the Azure Firewall, then you will need to add those to this route table too (otherwise the traffic will go directly over the peering and bypass the remote firewall).
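To make the "most specific route wins" behaviour concrete, here is a minimal Python sketch of longest-prefix matching against a route table like the one described. The 192.168.0.0/16 range and the 10.3.254.4 firewall address come from the example above; the Spoke 3 and Spoke 4 prefixes and the 'Internet' next hop on the default route are assumptions for illustration only. It models prefix length alone, not how Azure breaks ties between user-defined, BGP and system routes.

```python
import ipaddress

# Hypothetical route table for Hub 1's AzureFirewallSubnet.
# 192.168.0.0/16 and 10.3.254.4 are taken from the post;
# the spoke prefixes and the "Internet" next hop are assumed.
ROUTES = [
    ("0.0.0.0/0", "Internet"),                    # default route (assumed next hop)
    ("192.168.0.0/16", "VirtualNetworkGateway"),  # on-premises over the VPN
    ("10.4.0.0/16", "10.3.254.4"),                # assumed Spoke 3 range
    ("10.5.0.0/16", "10.3.254.4"),                # assumed Spoke 4 range
]


def next_hop(destination, routes=ROUTES):
    """Return the next hop of the most specific route that contains destination."""
    address = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if address in network:
            matches.append((network.prefixlen, hop))
    if not matches:
        return None
    # The longest prefix (largest prefix length) is the most specific route.
    return max(matches, key=lambda match: match[0])[1]


if __name__ == "__main__":
    for ip in ("192.168.10.5", "10.4.1.10", "8.8.8.8"):
        print(ip, "->", next_hop(ip))
```

A destination in an assumed spoke range resolves to the remote firewall, while anything not covered by a more specific entry falls through to the default route, which is why the default route is effectively the lowest priority.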
Remember also to add a similar corresponding route table on the AzureFirewallSubnet in Hub 2. Finally, as you should now be routing through two Azure firewalls to route traffic between spokes, you will need Firewall rules written into BOTH Azure Firewalls to allow the traffic.
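The need for rules in both firewalls can be sketched as a simple check: a flow only gets through if every firewall on the path has a rule that matches it. This is a simplified model, not how Azure Firewall evaluates rules; the address ranges, the rule format and the function names are hypothetical.

```python
import ipaddress


def allows(rules, source, destination):
    """True if any (source_prefix, destination_prefix) rule matches the flow."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    return any(
        src in ipaddress.ip_network(src_prefix)
        and dst in ipaddress.ip_network(dst_prefix)
        for src_prefix, dst_prefix in rules
    )


def path_allowed(firewalls, source, destination):
    """True only if every firewall on the path allows the flow."""
    return all(allows(rules, source, destination) for rules in firewalls)


# Assumed ranges: Spoke 1 is 10.1.0.0/16, Spoke 3 is 10.4.0.0/16.
hub1_rules = [("10.1.0.0/16", "10.4.0.0/16")]
hub2_rules_missing = []
hub2_rules = [("10.1.0.0/16", "10.4.0.0/16")]

if __name__ == "__main__":
    # Rule only on Hub 1: the flow is dropped at Hub 2.
    print(path_allowed([hub1_rules, hub2_rules_missing], "10.1.0.4", "10.4.0.4"))
    # Rule on both hubs: the flow is allowed end to end.
    print(path_allowed([hub1_rules, hub2_rules], "10.1.0.4", "10.4.0.4"))
```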
ExpressRoute Connections to On-Premises
When an ExpressRoute circuit is deployed and connected to the two hubs using ExpressRoute gateways, you can simply use the Microsoft core network to route traffic between the two hub and spoke topologies. So in the example of directing traffic from Spoke 1 to Spoke 3 the traffic would route through the Hub 1 Firewall, out the ExpressRoute Gateway onto the Microsoft core network, to Hub 2’s ExpressRoute gateway, through the Hub 2 Firewall and then to Spoke 3.
DIAGRAM: Traffic Flow in ExpressRoute hub and spoke topology
This all works using the BGP protocol: instead of having to use static routing and a route table on AzureFirewallSubnet (like in the VPN example), the ExpressRoute circuit knows about the IP ranges of the Hub 2, Spoke 3 and Spoke 4 networks and will route to Hub 2’s ExpressRoute gateway. No route tables are required on the AzureFirewallSubnet, and so long as you’ve connected your ExpressRoute circuit to ExpressRoute gateways in both hubs, the connectivity should be there. There should already be route tables on both GatewaySubnet subnets forcing all traffic to the firewall as it comes in from the ExpressRoute.
Remember however you will still need rules in both Azure Firewalls to allow Spoke 1 to communicate with Spoke 3.
Additionally, you are not limited to the bandwidth of your ExpressRoute circuit when connecting your hub and spoke topologies this way, so even if you have a 100 Mbps ExpressRoute circuit, the bandwidth between regions will be 1 Gbps or more. I believe you are instead limited by Azure VM sizes, as different sizes give different throughput on NICs.
ExpressRoute with VPN Backup Connections to On-Premises
For additional resilience to the single ExpressRoute circuit (even though there are always two physical cables for a single ExpressRoute circuit, there is still a single point of failure at the Microsoft Edge datacenter), a backup VPN gateway can be deployed in the GatewaySubnet of each hub VNet. If the ExpressRoute circuit were to fail and traffic could not be routed to on-premises, Azure will automatically route the traffic down the site-to-site VPN tunnels instead. Microsoft have a guide on this topology here:
DIAGRAM: Site-to-site VPN as a backup to ExpressRoute
In this scenario, when connecting hubs you would configure connectivity to use the ExpressRoute circuit as per the last section. In the event of the ExpressRoute circuit failing, you would need to configure peering and route tables on the AzureFirewallSubnet as per the VPN section earlier. Unfortunately this would be a manual process (unless scripted using Azure Automation, perhaps another blog post…), so to save time you could pre-create the route tables; in the event of an outage, VNet peering could then be introduced and the route tables applied to both firewall subnets.
Cloud-only (Disconnected)
Some organisations no longer have the requirement for on-premises traffic at all, for example if users connect into an Azure environment using Azure Virtual Desktop, or Azure VPN (also very useful with more and more people working from home). In this topology, you could have two hub and spoke topologies with no ExpressRoute or Site-to-site VPN deployed at all.
DIAGRAM: Cloud Only (disconnected) hub and spoke network topology
In this scenario, you would connect the two hubs using VNet Peering and put route tables on the AzureFirewallSubnet as illustrated in the VPN example above.
Final Thoughts
In the next few years, I believe Virtual WAN will supersede traditional hub and spoke topologies like this one, but this is still an Azure technology in its infancy. Instead of having to manually set up VNet peering, route tables etc., you simply deploy a Virtual WAN instance and several hubs (with or without Firewall) and connect spoke VNets. More info on Virtual WAN here:
https://docs.microsoft.com/en-us/azure/virtual-wan/virtual-wan-about
Finally, I also wanted to thank the marvellous Aidan Finn for his article on hub and spoke topologies here – this provided a basis for my understanding and learning of this concept:
https://aidanfinn.com/?p=21935
Hi Archie
I came across your comment on Aidan’s blog yesterday and left my comment (to be approved by Aidan). I had some use-cases for Azure cross-region connectivity and based on my observations, cross-region connectivity via ExpressRoute gateways and MSEE routers is limited by the bandwidth allocated to the ExpressRoute circuit. Refer to my observations at https://azure.cybergav.in/azure-cross-region-network-testing/ . Further, for SQL replication traffic (or any database replication traffic), it’s best to have the least (if not none) number of devices in the traffic path and hence Global VNet peering would be more appropriate for such a use case.
Regards
Gavin