Geo-Aware Multi-Cluster Ingress — Ingress for Anthos

Ingress for Anthos — Multi-cluster Ingress and Global Service Load Balancing

Gokul Chandra · Published in ITNEXT · Sep 18, 2020 · 14 min read


Ingress for Anthos is a Google Cloud-hosted multi-cluster ingress controller for Anthos GKE clusters. It supports deploying shared load balancing resources across clusters and across regions, enabling users to use the same load balancer, with an anycast IP, for applications running in a multi-cluster, multi-region topology.

In simpler terms, this allows users to place multiple GKE clusters located in different regions behind one load balancer. It is a controller for the external HTTP(S) load balancer that provides ingress for traffic coming from the internet to one or more clusters, programming the external HTTP(S) load balancer using network endpoint groups (NEGs). NEGs enable container-native load balancing, where each container can be represented as an endpoint to the load balancer.

Taking advantage of GCP’s 100+ points of presence and global network, Ingress for Anthos leverages GCLB along with multiple Kubernetes Engine clusters running in regions around the world to serve traffic from the closest cluster using a single anycast IP address.

Anycast, Geo-Aware Proximity-Based Routing

Global load balancing with Anycast

Ingress for Anthos creates an external HTTP(S) load balancer in Premium Tier. It uses a global external IP address advertised as an anycast IP and can intelligently route requests from users to the closest backend instance group or NEG (network endpoint group) based on proximity. Compared with using multiple addresses with DNS-based load balancing, a dedicated anycast address means that clients anywhere connect to the same IP address while still entering Google’s network as close to them as possible, connecting to a load balancer at the edge of Google’s network where the traffic entered and minimizing the network distance between the client and the frontline load balancer.

Anycast enables use of the same IP address worldwide; serving requests from backends with identical content in different geographical zones provides the shortest possible response times. Anycast directs packets to the geographically closest backend based on Border Gateway Protocol (BGP) paths. For example, if a user sets up instance groups/NEGs in North America, Europe, and Asia and attaches them to a load balancer’s backend service, user requests around the world are automatically sent to the VMs/Pods closest to the users, assuming the VMs/Pods pass health checks and have enough capacity (defined by the balancing mode).

Anycast relies on BGP, which ensures that all of a router’s neighbors are aware of the networks that can be reached through that router and the topological distance to those networks. With anycast, the system consistently chooses the shortest path. In the event of a node failure, the next shortest route is determined and traffic is redirected without having to change the IP address.

Anycast Routing

Anycast improves throughput and minimizes latency for clients around the world. Further, if Cloud CDN is added, caching can be enabled at these edge locations. A global anycast IP address lets users seamlessly change or add regions for deploying application instances and increase capacity as needed. Apart from latency optimization, the other big advantage of this approach is high availability: if the closest backends are all unhealthy, or if the closest instance group/NEG is at capacity and another instance group/NEG is not, the load balancer automatically sends requests to the next closest region with capacity.

Cold Potato Routing to Google PoPs

Requests from clients are routed cold potato to Google PoPs, meaning that internet traffic goes to the closest PoP and gets to the Google backbone as fast as possible.

The cold potato technique, in contrast to hot potato routing (where traffic is handed to the peer at the closest exchange point), carries the customer’s traffic on the internal network/backbone for as long as possible before delivering packets to the peer. Although cold potato routing increases overall operational cost, it provides advantages such as keeping the traffic under the network administrator’s control for longer, allowing sophisticated rules to be applied using custom software stacks, and letting operators of well-provisioned networks offer a higher quality of service to their customers.

Cold Potato vs Hot Potato Routing

Google PoPs, or GFEs (Google Front Ends), are software-defined, distributed systems that are located in Google points of presence (PoPs) and perform global load balancing in conjunction with other systems and control planes. Any internal service that chooses to publish itself externally uses the GFE as a smart reverse-proxy front end. This front end provides public IP hosting of its public DNS name, Denial of Service (DoS) protection, and TLS termination.

Container Native Load Balancing with NEGs (Network Endpoint Groups)

NEGs are lists of IP addresses used by Google Cloud load balancers; the IP addresses in a NEG can be primary or secondary IP addresses of a VM, which means they can be Pod IPs. In a GKE VPC-native cluster (where Pod IP addresses are natively routable within the cluster’s VPC network and other VPC networks connected to it by VPC Network Peering), the GCP VPC network is aware of alias IPs, so routing is taken care of by the VPC without any need for extra route configuration.

NEGs enable container-native load balancing, where each container can be represented as an endpoint to the load balancer, so traffic is sent directly to Pods from a Google Cloud load balancer.

In the older GKE load-balancing approach, traditional instance groups were the backends; these comprise the actual Kubernetes nodes where the workloads run.

Load Balancer — backends — Instance Groups

Instance groups with Kubernetes nodes:

Instance Groups — Kubernetes Nodes

Users could use a tool called kubemci (a tool to configure Kubernetes Ingress to load balance traffic across multiple Kubernetes clusters) to create a load balancer spread across clusters and associate an anycast IP.

LB with Anycast IP

This approach is now deprecated and Ingress for Anthos is the recommended way to deploy multi-cluster ingress moving forward.

Without container-native load balancing, the load balancer distributes packets to each Kubernetes node, and iptables on each node further distributes the packets to pods/containers, adding an extra hop. With container-native load balancing, the advantages are better load distribution, removal of the extra hop (which reduces latency), and more accurate health checks.

Ingress for Anthos is a cloud-hosted multi-cluster ingress controller that programs the external HTTP(S) load balancer using network endpoint groups (NEGs). The service is Google-hosted and supports deploying shared load balancing resources across clusters and across regions. The controller deploys Compute Engine load balancer resources and configures the appropriate Pods across clusters as backends when a user creates a MultiClusterIngress resource. The NEGs are used to track Pod endpoints dynamically so the Google load balancer has the right set of healthy backends.

Container Native Load Balancing with NEGs

On GKE the NEGs can be created and managed automatically by adding an annotation to a Kubernetes Service (cloud.google.com/neg). When NEGs are used with Anthos Ingress, the Ingress controller facilitates the creation of all aspects of the L7 load balancer. This includes creating the virtual IP address, forwarding rules, health checks, firewall rules, and more.
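For reference, this is roughly what the annotation looks like on a GKE Service (a minimal sketch; the Service name, selector and ports here are hypothetical, not from the walkthrough):

```yaml
# Sketch of a Service annotated for container-native load balancing.
# The cloud.google.com/neg annotation tells GKE to create NEGs for this Service.
apiVersion: v1
kind: Service
metadata:
  name: neg-demo-svc            # hypothetical name
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: ClusterIP
  selector:
    app: neg-demo               # hypothetical selector
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
```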

Tryout

Three GKE clusters are created in three different regions and all three clusters are registered to the Anthos Hub. The cluster in us-west2-a is selected as the config cluster; a config cluster is a centralized point of control for the multi-cluster resources MultiClusterIngress and MultiClusterService. These multi-cluster resources exist in, and are accessible from, a single logical API to retain consistency across all clusters. The Ingress controller watches the config cluster and reconciles the load balancing infrastructure.

Multi-Cluster and Multi-Region Topology

The Ingress controller is a globally distributed control plane that runs as a service outside of the clusters. An environ provides a unified way to view and manage multiple clusters and their workloads as part of Anthos. Clusters are registered to the project’s environ using Connect, and Ingress uses the concept of environs to determine how it is applied across different clusters: clusters that are registered to an environ become visible to Ingress, so they can be used as its backends. GKE clusters registered as members of the Hub become part of what is conceptually known as an environ.

Clusters registered to an environ are called member clusters; in the topology above, all three clusters are member clusters (the config cluster can be isolated with a cluster selector). The member clusters in the environ comprise the full scope of backends that Ingress is aware of.

MCI (MultiClusterIngress) and MCS (MultiClusterService) are custom resources (CRDs) that are the multi-cluster equivalents of the Ingress and Service resources. They are configured on the config cluster, and the MCS creates the required derived Services (actual Services mapping to endpoints) in the member clusters.

MultiClusterIngress and MultiClusterService

A MultiClusterIngress resource contains a list of backend services (names of MCS resources deployed in the same namespace on the config cluster as the MCI). A MultiClusterIngress (MCI) resource behaves identically in many ways to the core Ingress resource: both have the same specification for defining hosts, paths, protocol termination and backends.

A MultiClusterService (MCS) is a custom resource used by Ingress for Anthos that is a logical representation of a Service across multiple clusters. An MCS is similar to, but substantially different from, the core Service type. An MCS exists only in the config cluster and generates derived Services in the target clusters. An MCS does not route anything like a ClusterIP, LoadBalancer, or NodePort Service does.

MultiClusterIngress and MultiClusterService

An MCI resource can have a default backend (an MCS) and a list of rules specifying multiple backends. The MCI matches traffic arriving at the VIP against the hosts specified in the rules and sends it to the MCS resource specified in the matching backend; all other traffic that does not match a rule is sent to the default-backend MCS.
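As an illustration, a minimal MCI with a default backend and one host rule might look like the following sketch (apiVersion networking.gke.io/v1beta1 at the time of writing; the names, namespace, host and ports are hypothetical):

```yaml
# Illustrative MultiClusterIngress: unmatched traffic goes to the default
# backend MCS, traffic for foo.example.com goes to the 'foo' MCS.
apiVersion: networking.gke.io/v1beta1
kind: MultiClusterIngress
metadata:
  name: foo-ingress
  namespace: demo
spec:
  template:
    spec:
      backend:                      # default backend (an MCS in the same namespace)
        serviceName: default-backend
        servicePort: 8080
      rules:
      - host: foo.example.com
        http:
          paths:
          - backend:
              serviceName: foo      # MCS serving this host
              servicePort: 8080
```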

Users can selectively apply ingress rules using the ‘clusters’ section of the MCS configuration, in which case the derived Services are only created in the clusters listed. If the ‘clusters’ section of the MultiClusterService is not specified, or if no clusters are listed, it is interpreted as the default “all” clusters. This feature provides advantages such as isolating the config cluster so that MCSs do not select backends from it, routing to application backends that only exist in a subset of clusters/regions, controlling traffic between clusters in a blue/green fashion (useful for application migration), and using a single L7 VIP for different clusters.

MultiClusterIngress and MultiClusterService — Cluster Selector and Ingress Rules
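A hedged sketch of such an MCS with a ‘clusters’ section (cluster links are in “zone/cluster-name” form; the names below are placeholders):

```yaml
# MultiClusterService restricted to a subset of member clusters:
# derived Services are only created in the clusters listed under spec.clusters.
apiVersion: networking.gke.io/v1beta1
kind: MultiClusterService
metadata:
  name: foo
  namespace: demo
spec:
  template:
    spec:
      selector:
        app: foo
      ports:
      - name: web
        protocol: TCP
        port: 8080
        targetPort: 8080
  clusters:
  - link: "us-east4-a/cluster2"       # placeholder membership links
  - link: "europe-west2-b/cluster3"
```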

The following are the steps in the workflow to configure Ingress for Anthos across multiple clusters:

Workflow — Configuring Ingress for Anthos

The Multi Cluster Ingress API should be enabled in the project:

MulticlusterIngress API
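The console step above can also be done with gcloud; a sketch assuming a placeholder project ID:

```sh
# Enable the APIs used by Ingress for Anthos (project ID is a placeholder)
gcloud services enable \
    anthos.googleapis.com \
    gkehub.googleapis.com \
    multiclusteringress.googleapis.com \
    --project=my-project-id
```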

As shown below, three GKE clusters are created in different zones and added to the Hub. Cluster1 is selected as the config cluster and the Multi Cluster Ingress feature is enabled at the Hub level.

Enabling Multi Cluster Ingress Feature

All clusters are added to the Hub using Connect; as all of them are GKE-managed clusters, the type identifies them as GKE:

Clusters in a Hub — Registered using Connect
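Registration can also be scripted with gcloud; a sketch for one cluster, with placeholder membership and cluster names (repeat for the other clusters):

```sh
# Register a GKE cluster to the Hub using Connect (names are placeholders)
gcloud container hub memberships register cluster1 \
    --gke-cluster=us-west2-a/cluster1 \
    --enable-workload-identity
```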

Enabling the ingress feature from Anthos portal:

Enabling Multi Cluster Ingress Feature — Anthos Portal
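The CLI equivalent designates the config cluster via its Hub membership; a sketch with placeholder project and membership names (the command group was under gcloud alpha at the time of writing and has since moved between release tracks):

```sh
# Enable the Multi Cluster Ingress feature and pick cluster1 as the config cluster
gcloud alpha container hub ingress enable \
    --config-membership=projects/my-project-id/locations/global/memberships/cluster1
```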

Ingress detail page showing the config cluster information and all other registered clusters (memberships):

Cluster Memberships and Config Membership

A demo application, ‘zone-ingress’, is deployed on all three clusters; when accessed, the application simply prints the name of the data center (zone) where it is running.

While deploying applications across clusters, users have to consider a concept called “namespace sameness”, a characteristic possessed by environs. It assumes that resources with identical names in the same namespace across clusters are considered instances of the same resource. In effect, this means that Pods in the ‘zoneprinter’ namespace with the label ‘app: zoneprinter’ across different clusters are all considered part of the same pool of application backends from the perspective of Ingress for Anthos. This has ramifications for how different development teams operate across a group of clusters.

In this scenario, a deployment called zone-ingress is deployed on all three clusters in the ‘zoneprinter’ namespace:

Demo application deployed on all clusters
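A sketch of the Deployment applied to each cluster (the container image shown is Google’s sample zone-printer image; treat the exact image tag as an assumption):

```yaml
# Demo Deployment applied identically to all three clusters
# (namespace sameness: same name, namespace and labels everywhere).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zone-ingress
  namespace: zoneprinter
  labels:
    app: zoneprinter
spec:
  selector:
    matchLabels:
      app: zoneprinter
  template:
    metadata:
      labels:
        app: zoneprinter
    spec:
      containers:
      - name: frontend
        image: gcr.io/google-samples/zone-printer:0.2   # sample image; assumed tag
        ports:
        - containerPort: 8080
```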

The MCI (MultiClusterIngress) and MCS (MultiClusterService) are created on the config cluster:

MultiClusterIngress and MultiClusterService created in Config Cluster
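The MCS applied to the config cluster looks roughly like this (a sketch; the service name and port are assumptions based on the walkthrough):

```yaml
# MultiClusterService selecting the zoneprinter Pods in every member cluster
apiVersion: networking.gke.io/v1beta1
kind: MultiClusterService
metadata:
  name: zone-mcs
  namespace: zoneprinter
spec:
  template:
    spec:
      selector:
        app: zoneprinter
      ports:
      - name: web
        protocol: TCP
        port: 8080
        targetPort: 8080
```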

As no ‘clusters’ list was provided in the MCS configuration, a derived Service is created on all three clusters, including the config cluster:

Derived Services created by MCS on all Member Clusters

The derived Services created in the member clusters (all three clusters in this case) are annotated with the NEGs (a mapping to the respective network endpoint group), the multiclusterservice-parent (the MCS created on the config cluster) and the port information, as shown below:

Derived Services created by MCS on Member Cluster

This step creates NEGs in Compute Engine, which begins to register and manage service endpoints.

NEG Attachment

The controller auto-creates the NEGs, which map to the ‘zone-ingress’ Pods in all three clusters:

Network Endpoint Groups

As shown below, the network endpoints in the NEG map to the pod IP of zone-ingress on cluster1:

Network Endpoint Groups pointing to Pod IP

Creating the MCI resource on the config cluster:

Creating MCI Resource on Config Cluster
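The MCI is a simple default-backend resource pointing at the MCS above (a sketch; names and ports are assumed from the walkthrough):

```yaml
# MultiClusterIngress exposing zone-mcs behind a single global VIP
apiVersion: networking.gke.io/v1beta1
kind: MultiClusterIngress
metadata:
  name: zone-ingress
  namespace: zoneprinter
spec:
  template:
    spec:
      backend:
        serviceName: zone-mcs    # the MCS created earlier in the same namespace
        servicePort: 8080
```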

This step deploys the Compute Engine external load balancer resources and exposes the endpoints across clusters through a single load balancer VIP.

The MultiClusterIngress resource ‘Status’ field shows the backend services (MCS), firewalls, forwarding rules, health checks and NEGs of the individual clusters. The VIP is the anycast IPv4 address that maps to the load balancer.

MCI resource with Backend Services, Firewalls, Forwarding Rules, Health Checks and NEGs

The load balancer maps the NEGs as backends; this is auto-created by the Ingress controller when an MCI object is created:

Load Balancer with NEGs as Backends

The NEGs are configured as backends for the Load Balancer.

Load Balancer with NEGs as Backends

The frontend maps to a single anycast IPv4 address.

Load Balancer with Anycast IPv4 Address

Forwarding rules are configured to direct traffic arriving at the VIP to the load balancer’s proxy and on to the ingress backends.

Forwarding Rules

Accessing the demo Service (Zoneprinter) from different Geographical Regions

Three VM instances are created in the respective zones where the GKE clusters are located, to replicate access from different regions.

Test Instances — Test Access from Different Regions
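The checks below are plain curl calls against the VIP from each test VM (the address is a placeholder for the anycast IP reported in the MCI status):

```sh
# From each test VM, the same anycast VIP returns the zone of the nearest cluster
curl http://ANYCAST_VIP/
```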

Accessing Zoneprinter using the LB’s anycast IP from US-West returns a response from the Zoneprinter pod running in Cluster1 (us-west2-a).

Accessing from US-West

Accessing Zoneprinter using the LB’s anycast IP from US-East returns a response from the Zoneprinter pod running in Cluster2 (us-east4-a).

Accessing from US-East

Accessing Zoneprinter using the LB’s anycast IP from EU-West returns a response from the Zoneprinter pod running in Cluster3 (europe-west2-b).

Accessing from EU-West

Accessing the application using a web page performance tool shows the same response time from both the USA and Europe.

From Virginia:

Accessing from Virginia (us-east)

From London:

Accessing from London (eu-west)

Comparing traceroute results for the LB’s anycast IP with an app served from a traditional EIP (Elastic IP) shows drastic differences in response time and in the number of hops.

The traceroute from different regions to the LB’s anycast IP shows a minimal number of hops and lower response times. This demonstrates that proximity-based routing is choosing the application instance running on the nearest cluster.

Traceroute — Anycast IP from Different Regions

The traceroute from New York to an application served from an EIP in London shows a significant increase in the number of hops and in response time.

Traceroute — EIP from New York to London

Monitoring and Metrics

The load balancer details section provides per-backend health, utilization and request metrics.

LB — Monitoring

Map showing requests, traffic and serving backends.

LB — Monitoring

Other Features

Ingress for Anthos supports other features such as:

  • HTTPS support: HTTPS is configured using a Kubernetes Secret. Before enabling HTTPS support, the user must create a static IP address; this static IP allows HTTP and HTTPS to share the same IP address. The Secret is referenced using the ‘tls’ section of the MCI resource, as sketched after this list.
  • BackendConfig support: The BackendConfig CRD allows you to customize settings on the Compute Engine BackendService resource, also sketched below.
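A hedged sketch of both features; the static IP address, Secret name and BackendConfig settings are placeholders, and the exact apiVersions and annotation keys should be checked against the current docs:

```yaml
# MCI with a reserved static IP and TLS termination from a Kubernetes Secret
apiVersion: networking.gke.io/v1beta1
kind: MultiClusterIngress
metadata:
  name: zone-ingress
  namespace: zoneprinter
  annotations:
    networking.gke.io/static-ip: "203.0.113.10"   # placeholder: the reserved global static IP address
spec:
  template:
    spec:
      backend:
        serviceName: zone-mcs
        servicePort: 8080
      tls:
      - secretName: zone-tls-secret               # placeholder Secret holding the cert/key
---
# BackendConfig customizing the generated Compute Engine backend service;
# it is referenced from the MultiClusterService via a backend-config annotation
# (see the current docs for the exact annotation key).
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: zone-backendconfig
  namespace: zoneprinter
spec:
  timeoutSec: 60
  connectionDraining:
    drainingTimeoutSec: 30
```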

WAF with Cloud Armor

Google Cloud Armor is deployed at the edge of Google’s network and tightly coupled with the global load balancing infrastructure. Cloud Armor can be used to implement geo-based access controls, preconfigured WAF rules and custom L7 filtering policies using custom or preconfigured rules provided by Google Cloud. Google Cloud Armor provides preconfigured, complex web application firewall (WAF) rules with dozens of signatures that are compiled from open source industry standards.

Users can define security and SSL policies and apply them to targets; here, the targets are the load balancers.
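The console steps shown below map to gcloud roughly as follows (a sketch: the policy name, example rule and backend-service name are placeholders, and the backend service to update is the one generated by the Ingress controller):

```sh
# Create a Cloud Armor policy with a sample geo-based deny rule
gcloud compute security-policies create zone-ingress-policy \
    --description="WAF policy for the multi-cluster LB"

gcloud compute security-policies rules create 1000 \
    --security-policy=zone-ingress-policy \
    --expression="origin.region_code == 'CN'" \
    --action=deny-403

# Attach the policy to the backend service created by the Ingress controller
gcloud compute backend-services update MCI_BACKEND_SERVICE_NAME \
    --security-policy=zone-ingress-policy \
    --global
```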

Cloud Armor — Security Policy
Cloud Armor — Adding LB as Target
Cloud Armor — SSL Policy

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Apart from proximity-based routing, multi-cluster ingress simplifies other operations such as blue/green cluster upgrades, disaster recovery, automated failovers based on sophisticated health checks, running highly available (HA) apps on Kubernetes, and lowering DDoS attack risk. It has numerous other benefits, all of which have the potential to translate into a significant competitive edge.
