Dive into Kubernetes BGP Network, Part Three

Kubernetes BGP Network with Cilium

0x00 Introduction

In part two of this series, I wrote about deploying Cilium in BGP mode in the Kubernetes cluster, which worked fine for traffic inside the cluster. Today, I will write about how to make the cluster pod IPs routable from the outside, such as from an on-premises environment.

In this article, I will use a Linux virtual machine with BIRD installed as a software router. It runs as a BGP peer so that the on-premises network and the cluster pod IPs are routable to each other.

0x0000 The Environment

Here is the environment information of the software router, plus the kube-router version used in the cluster:

  • System Distribution: Ubuntu 22.04.1 LTS
  • IP Address: 192.168.56.10
  • BIRD version: 1.6.8
  • kube-router: v1.5.3
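
Before diving into the deployment, here is a rough sketch of the topology; the node names, IPs, and ASNs below are the ones that appear in the configurations later in this article:

on-premises machines
        |
        | static routes (see the summary)
        v
BIRD software router -- 192.168.56.10, AS 65000
        |                              |
        | BGP                          | BGP
        v                              v
k8smaster0 (192.168.56.4)      k8sslave0 (192.168.56.5)
     AS 65001 (kube-router)       AS 65001 (kube-router)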

0x01 Deployment

0x0101 Configure Kube-router

To use an external BGP peer, we have to change some of kube-router's configuration. You can download the original YAML of kube-router from https://github.com/cloudnativelabs/kube-router/blob/v1.5.3/daemonset/generic-kuberouter-only-advertise-routes.yaml, and here is the snippet of the configuration that I modified.

...
containers:
      - name: kube-router
        image: docker.io/cloudnativelabs/kube-router
        imagePullPolicy: Always
        args:
        - "--run-router=true"
        - "--run-firewall=false"
        - "--run-service-proxy=false"
        - "--bgp-graceful-restart=true"
        - "--enable-cni=false"
        - "--enable-ibgp=true"
        - "--enable-overlay=false"
        - "--peer-router-ips=192.168.56.10"
        - "--peer-router-asns=65000"
        - "--cluster-asn=65001"
        - "--advertise-cluster-ip=true"
        - "--advertise-external-ip=true"
        - "--advertise-loadbalancer-ip=true"
...

Some options need attention if you use an external BGP peer. peer-router-ips is the list of IP addresses of the external BGP peers. peer-router-asns is the list of ASNs of the BGP peers to which the cluster nodes will advertise the cluster IPs and each node's pod CIDR. advertise-cluster-ip adds the cluster IP of a service to the RIB (Routing Information Base, the routing information maintained by the router) so that it gets advertised to the BGP peers. Now we apply the configuration, as shown below.
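
The exact commands depend on where you saved the modified manifest; here is a minimal sketch, assuming the file keeps its upstream name and the DaemonSet pods carry the k8s-app=kube-router label (both are assumptions, adjust to your copy):

## Apply the modified kube-router manifest
kubectl apply -f generic-kuberouter-only-advertise-routes.yaml

## Verify the kube-router pods are running on every node
kubectl -n kube-system get pods -l k8s-app=kube-router -o wide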

0x0102 Deploy External BGP Peer

Now we need to install an external BGP peer on 192.168.56.10.

sudo apt -y install bird

Here is the configuration of BIRD.

cat /etc/bird/bird.conf
protocol kernel {
	scan time 60;
	import none;
	export all;   # Actually insert routes into the kernel routing table
}

# The Device protocol is not a real routing protocol. It doesn't generate any
# routes and it only serves as a module for getting information about network
# interfaces from the kernel.
protocol device {
	scan time 60;

}

# BGP session with the Kubernetes control-plane node (k8smaster0).
# The local ASN must match --peer-router-asns and the neighbor ASN
# must match --cluster-asn configured in kube-router.
protocol bgp k8smaster0 {
	import all;
	local as 65000;
	neighbor 192.168.56.4 as 65001;
}

# BGP session with the Kubernetes worker node (k8sslave0).
protocol bgp k8sslave0 {
	import all;
	local as 65000;
	neighbor 192.168.56.5 as 65001;
}
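
Before starting BIRD, it is worth running a quick sanity check on the configuration; bird's -p flag only parses the config file and exits, returning a non-zero exit code if there is a syntax error (no output means the file is fine):

sysadmin@ubuntu:~$ sudo bird -p -c /etc/bird/bird.conf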

This configuration means we accept the routes advertised by the Kubernetes nodes over BGP. The ASNs need to match what we configured in kube-router. After bird.conf has been configured, we can use the following commands to start the BIRD process and check the routes:

sysadmin@ubuntu:~$ sudo invoke-rc.d bird start

sysadmin@ubuntu:~$ sudo birdc show route
BIRD 1.6.8 ready.
10.98.28.103/32    via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
                   via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] (100) [AS65001i]
10.96.229.67/32    via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
                   via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] (100) [AS65001i]
10.96.0.1/32       via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
                   via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] (100) [AS65001i]
10.96.0.10/32      via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
                   via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] (100) [AS65001i]
10.112.0.0/24      via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
10.112.1.0/24      via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] * (100) [AS65001i]

The result shows that the cluster service IPs and the per-node pod CIDRs have been propagated to the external BGP peer. Now we can access the pod IPs from 192.168.56.10 directly.
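
As a side check, the kernel protocol in bird.conf (export all) should also have installed these routes into the VM's kernel routing table; a simple way to confirm is to grep the ip route output for the prefixes listed above (output omitted here, it mirrors the birdc listing):

sysadmin@ubuntu:~$ ip route | grep -E '^10\.(96|98|112)\.'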

## Get the Nginx Pod IP 
root@k8smaster0:~/kube-router# kubectl get pods -A -l app=nginx -o wide
NAMESPACE   NAME                                READY   STATUS    RESTARTS      AGE   IP             NODE        NOMINATED NODE   READINESS GATES
default     nginx-deployment-7fb96c846b-7jhtt   1/1     Running   2 (67m ago)   47h   10.112.1.194   k8sslave0   <none>           <none>
default     nginx-deployment-7fb96c846b-lqpgf   1/1     Running   2 (67m ago)   47h   10.112.1.90    k8sslave0   <none>           <none>

## Curl from 192.168.56.10
sysadmin@ubuntu:~$ curl -I 10.112.1.194
HTTP/1.1 200 OK
Server: nginx/1.14.2
Date: Wed, 11 Jan 2023 06:49:32 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 04 Dec 2018 14:44:49 GMT
Connection: keep-alive
ETag: "5c0692e1-264"
Accept-Ranges: bytes

Next, we try to access the cluster service IP.

## Get service IP of nginx
root@k8smaster0:~/kube-router# kubectl get service -A -l app=nginx -o wide
NAMESPACE   NAME    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE   SELECTOR
default     nginx   ClusterIP   10.96.229.67   <none>        80/TCP    47h   app=nginx

## Curl from 192.168.56.10
sysadmin@ubuntu:~$ curl 10.96.229.67 -v
*   Trying 10.96.229.67:80...

But the result shows we still can't access the cluster service IP directly. That's strange; let's check Cilium's monitor output to see whether the traffic from the external BGP peer VM is being handled normally.

## Step 1. Find the Nginx service in the Cilium agent's service list

root@k8smaster0:~/kube-router# kubectl -n kube-system exec cilium-2xlxh -- cilium service list
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
ID   Frontend           Service Type   Backend
1    10.96.229.67:80    ClusterIP      1 => 10.112.1.194:80 (active)
                                       2 => 10.112.1.90:80 (active)

## Step 2. Use the Cilium agent pod on the node that owns pod CIDR 10.112.1.0/24
## to watch for dropped traffic

root@k8smaster0:~/kube-router# kubectl -n kube-system exec -ti cilium-hcr6r -- cilium monitor --type drop
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
Listening for events on 4 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit

## Step 3. Curl from 192.168.56.10

sysadmin@ubuntu:~$ curl 10.96.229.67 -v
*   Trying 10.96.229.67:80...

## Step 4. Check the output of cilium monitor; we can see the TCP packets have been dropped

root@k8smaster0:~/kube-router# kubectl -n kube-system exec -ti cilium-hcr6r -- cilium monitor --type drop
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
Listening for events on 4 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
xx drop (Is a ClusterIP) flow 0x0 to endpoint 0, file bpf_host.c line 665, , identity world->unknown: 192.168.56.10:33802 -> 10.112.1.90:80 tcp SYN
xx drop (Is a ClusterIP) flow 0x0 to endpoint 0, file bpf_host.c line 665, , identity world->unknown: 192.168.56.10:33802 -> 10.112.1.90:80 tcp SYN

I read the official documentation, and the cause seems to be the bpf-lb-external-clusterip option. The documentation says this option enables external access to ClusterIP services, and it is false by default. Now we know why the curl failed, so we can set it to true:

root@k8smaster0:~/kube-router# cilium config view | grep -i bpf-lb-external-clusterip
bpf-lb-external-clusterip                      false

root@k8smaster0:~/kube-router# cilium config set bpf-lb-external-clusterip true
✨ Patching ConfigMap cilium-config with bpf-lb-external-clusterip=true...
♻️  Restarted Cilium pods

## curl from 192.168.56.10
sysadmin@ubuntu:~$ curl 10.96.229.67 -I
HTTP/1.1 200 OK
Server: nginx/1.14.2
Date: Wed, 11 Jan 2023 07:26:35 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 04 Dec 2018 14:44:49 GMT
Connection: keep-alive
ETag: "5c0692e1-264"
Accept-Ranges: bytes
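
If your Cilium installation is managed with Helm rather than the cilium CLI, the same option can be set through Helm values; as far as I know the corresponding value is bpf.lbExternalClusterIP, but verify it against the chart version you are running:

## Equivalent change via Helm (value name assumed, check your chart version)
helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
  --set bpf.lbExternalClusterIP=true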

Finally, we can access the cluster service from outside. The last step is to test whether an external IP is routable from the pods.

## Create a busybox pod
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
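
Assuming the manifest above is saved as busybox.yaml (the file name is just an example), create the pod and wait until it is ready:

kubectl apply -f busybox.yaml
kubectl wait --for=condition=Ready pod/busybox --timeout=60s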

## Ping from pod of busybox
root@k8smaster0:~/app# kubectl exec -it busybox -- ping -c 3 192.168.56.10
PING 192.168.56.10 (192.168.56.10): 56 data bytes
64 bytes from 192.168.56.10: seq=0 ttl=62 time=3.074 ms
64 bytes from 192.168.56.10: seq=1 ttl=62 time=0.566 ms
64 bytes from 192.168.56.10: seq=2 ttl=62 time=0.927 ms

--- 192.168.56.10 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.566/1.522/3.074 ms

0x02 Summary

By now, the Kubernetes cluster and the external BGP router are routable to each other. If you also want the K8s cluster and your local environment to reach each other, you only need to configure some static routes on the local machines and on the external BGP router.
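
For example, on a local machine that is not a BGP peer, the static routes toward the pod and service ranges via the software router could look like this; the aggregate CIDRs below (10.112.0.0/16 for pods, 10.96.0.0/12 for services) are assumptions based on the addresses seen in this lab, so replace them with your cluster's actual ranges:

## On a local (non-BGP) machine, send pod and service traffic via the BIRD router
sudo ip route add 10.112.0.0/16 via 192.168.56.10
sudo ip route add 10.96.0.0/12 via 192.168.56.10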