Dive into Kubernetes BGP Network, Part Three
0x00 Introduction
In part two of this series, I wrote about deploying Cilium in BGP mode in a Kubernetes cluster, which worked fine inside the cluster. Today, I will write about how to make the cluster's pod IPs routable from the outside, such as from an on-premises environment.
In this article, I will use a Linux virtual machine with BIRD installed as a software router. It runs as a BGP peer so that the on-premises environment and the cluster's pod IPs are routable to each other.
0x0000 The Environment
Here is the environment information of the software router:
- System Distribution: Ubuntu 22.04.1 LTS
- IP Address: 192.168.56.10
- BIRD version: 1.6.8
- kube-router: v1.5.3
0x01 Deployment
0x0101 Configure Kube-router
To use an external BGP peer, we have to change some of kube-router's configuration. You can download the original kube-router YAML from https://github.com/cloudnativelabs/kube-router/blob/v1.5.3/daemonset/generic-kuberouter-only-advertise-routes.yaml , and here is the snippet of the configuration that I modified.
...
containers:
- name: kube-router
  image: docker.io/cloudnativelabs/kube-router
  imagePullPolicy: Always
  args:
  - "--run-router=true"
  - "--run-firewall=false"
  - "--run-service-proxy=false"
  - "--bgp-graceful-restart=true"
  - "--enable-cni=false"
  - "--enable-ibgp=true"
  - "--enable-overlay=false"
  - "--peer-router-ips=192.168.56.10"
  - "--peer-router-asns=65000"
  - "--cluster-asn=65001"
  - "--advertise-cluster-ip=true"
  - "--advertise-external-ip=true"
  - "--advertise-loadbalancer-ip=true"
...
A few options deserve attention if you use an external BGP peer. peer-router-ips lists the IP addresses of the external BGP peers. peer-router-asns lists the ASNs of those BGP peers, to which the cluster nodes advertise the cluster IPs and each node's pod CIDR. advertise-cluster-ip adds the ClusterIP of each service to the RIB (Routing Information Base, the routing information maintained by the router) so that it gets advertised to the BGP peers. Now we apply the configuration.
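A small sanity check can catch a mismatch between the two peer lists before the DaemonSet is applied. The sketch below assumes that kube-router expects exactly one ASN per peer IP, in the same order. count_items and check_peer_args are hypothetical helper names and not part of kube-router.

```shell
# Hypothetical helpers, not part of kube-router.
# Print how many comma-separated entries the list in $1 contains.
count_items() {
  [ -z "$1" ] && { echo 0; return; }
  local old_ifs="$IFS"
  IFS=,
  set -- $1          # split the list on commas into positional parameters
  IFS="$old_ifs"
  echo "$#"
}

# Compare the --peer-router-ips list ($1) with the --peer-router-asns list ($2).
# Assumption: kube-router wants one ASN per peer IP.
check_peer_args() {
  if [ "$(count_items "$1")" -eq "$(count_items "$2")" ]; then
    echo "ok"
  else
    echo "mismatch"
    return 1
  fi
}

# Values taken from the args shown above.
check_peer_args "192.168.56.10" "65000"
```

With a second peer IP but still only one ASN, the same call prints mismatch and returns a non-zero status, which makes it easy to use in a deployment script.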
0x0102 Deploy External BGP Peer
Now we need to install an external BGP peer on 192.168.56.10.
sudo apt -y install bird-bgp
Here is the configuration of BIRD.
cat /etc/bird/bird.conf
protocol kernel {
    scan time 60;
    import none;
    export all;   # Actually insert routes into the kernel routing table
}
# The Device protocol is not a real routing protocol. It doesn't generate any
# routes and it only serves as a module for getting information about network
# interfaces from the kernel.
protocol device {
    scan time 60;
}
protocol bgp k8smaster0 {
    import all;
    local as 65000;
    neighbor 192.168.56.4 as 65001;
}
protocol bgp k8sslave0 {
    import all;
    local as 65000;
    neighbor 192.168.56.5 as 65001;
}
The above snippet means we accept the BGP routes advertised by the Kubernetes nodes. The ASNs need to match the ones we configured in kube-router: the router's local AS 65000 is kube-router's peer-router-asns, and the neighbor AS 65001 is its cluster-asn. After bird.conf has been configured, we can use the following commands to start the BIRD process and check the routes:
sysadmin@ubuntu:~$ sudo invoke-rc.d bird start
sysadmin@ubuntu:~$ sudo birdc show route
BIRD 1.6.8 ready.
10.98.28.103/32 via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
                via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] (100) [AS65001i]
10.96.229.67/32 via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
                via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] (100) [AS65001i]
10.96.0.1/32 via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
                via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] (100) [AS65001i]
10.96.0.10/32 via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
                via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] (100) [AS65001i]
10.112.0.0/24 via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
10.112.1.0/24 via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] * (100) [AS65001i]
The result shows that the cluster service IPs and the pod CIDRs have been propagated to the external BGP peer. Now we can access a pod IP from 192.168.56.10 directly.
## Get the Nginx Pod IP
root@k8smaster0:~/kube-router# kubectl get pods -A -l app=nginx -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default nginx-deployment-7fb96c846b-7jhtt 1/1 Running 2 (67m ago) 47h 10.112.1.194 k8sslave0 <none> <none>
default nginx-deployment-7fb96c846b-lqpgf 1/1 Running 2 (67m ago) 47h 10.112.1.90 k8sslave0 <none> <none>
## Curl from 192.168.56.10
sysadmin@ubuntu:~$ curl -I 10.112.1.194
HTTP/1.1 200 OK
Server: nginx/1.14.2
Date: Wed, 11 Jan 2023 06:49:32 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 04 Dec 2018 14:44:49 GMT
Connection: keep-alive
ETag: "5c0692e1-264"
Accept-Ranges: bytes
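Why does the request to 10.112.1.194 leave through 192.168.56.5? The router picks the most specific route that contains the destination. The following is a minimal sketch of that lookup in shell, fed with a few lines copied from the birdc show route output earlier. The function names (best_routes, ip_to_int, in_cidr, next_hop_for, sample_routes) are made up for illustration, and the real lookup is done by the kernel routing table, not by a script.

```shell
# Print "<prefix> <next-hop>" for every selected (*) route in
# `birdc show route` output read from stdin.
best_routes() {
  awk '$2 == "via" && index($0, "*") { print $1, $3 }'
}

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local old_ifs="$IFS"
  IFS=.
  set -- $1          # split the address on dots
  IFS="$old_ifs"
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if address $1 lies inside CIDR $2 (for example 10.112.1.0/24).
# Relies on the shell using 64-bit integers, as bash does.
in_cidr() {
  local ip net len mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  len=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - $len)) & 0xFFFFFFFF ))
  [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
}

# Read "<prefix> <next-hop>" pairs from stdin and print the next hop of
# the longest prefix that contains address $1.
next_hop_for() {
  local target="$1" prefix hop best_len=-1 best_hop=""
  while read -r prefix hop; do
    if in_cidr "$target" "$prefix" && [ "${prefix#*/}" -gt "$best_len" ]; then
      best_len=${prefix#*/}
      best_hop=$hop
    fi
  done
  [ -n "$best_hop" ] && echo "$best_hop"
}

# A few lines copied from the birdc output shown earlier.
sample_routes() {
  cat <<'EOF'
BIRD 1.6.8 ready.
10.96.229.67/32 via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
 via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] (100) [AS65001i]
10.112.0.0/24 via 192.168.56.4 on enp0s8 [k8smaster0 14:40:34] * (100) [AS65001i]
10.112.1.0/24 via 192.168.56.5 on enp0s8 [k8sslave0 14:40:33] * (100) [AS65001i]
EOF
}

sample_routes | best_routes | next_hop_for 10.112.1.194   # prints 192.168.56.5
```

On the router itself, the sample_routes call could be replaced with sudo birdc show route, assuming the output keeps the layout shown in this article.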
Then we try to access the cluster service IP.
## Get service IP of nginx
root@k8smaster0:~/kube-router# kubectl get service -A -l app=nginx -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default nginx ClusterIP 10.96.229.67 <none> 80/TCP 47h app=nginx
## Curl from 192.168.56.10
sysadmin@ubuntu:~$ curl 10.96.229.67 -v
* Trying 10.96.229.67:80...
But the result shows we still can't access the cluster service IP directly. That's strange; let's use Cilium's monitor to check whether the traffic from the external BGP peer VM is handled normally.
## Step 1. Find which backends Cilium has for the Nginx service
root@k8smaster0:~/kube-router# kubectl -n kube-system exec cilium-2xlxh -- cilium service list
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
ID Frontend Service Type Backend
1 10.96.229.67:80 ClusterIP 1 => 10.112.1.194:80 (active)
2 => 10.112.1.90:80 (active)
## Step 2. Use the Cilium agent pod on the node that owns CIDR 10.112.1.0/24
## to watch for dropped traffic
root@k8smaster0:~/kube-router# kubectl -n kube-system exec -ti cilium-hcr6r -- cilium monitor --type drop
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
Listening for events on 4 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
## Step 3. Curl from 192.168.56.10
sysadmin@ubuntu:~$ curl 10.96.229.67 -v
* Trying 10.96.229.67:80...
## Step 4. Check the stdout of cilium monitor and we can see the TCP packet has been dropped
root@k8smaster0:~/kube-router# kubectl -n kube-system exec -ti cilium-hcr6r -- cilium monitor --type drop
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
Listening for events on 4 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
level=info msg="Initializing dissection cache..." subsys=monitor
xx drop (Is a ClusterIP) flow 0x0 to endpoint 0, file bpf_host.c line 665, , identity world->unknown: 192.168.56.10:33802 -> 10.112.1.90:80 tcp SYN
xx drop (Is a ClusterIP) flow 0x0 to endpoint 0, file bpf_host.c line 665, , identity world->unknown: 192.168.56.10:33802 -> 10.112.1.90:80 tcp SYN
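When the monitor prints many lines, it helps to pull out only the drop reason and the flow. Here is a small sketch with sed that assumes the line layout shown above (a line starting with "xx drop", the reason in parentheses, and the flow after the last ": "). drop_reasons and drop_flows are hypothetical helper names.

```shell
# Print the drop reason from each `cilium monitor --type drop` line on stdin.
drop_reasons() {
  sed -n 's/^xx drop (\([^)]*\)).*/\1/p'
}

# Print "source -> destination" for each dropped packet on stdin.
drop_flows() {
  sed -n 's/^xx drop .*: \([0-9.]*:[0-9]*\) -> \([0-9.]*:[0-9]*\) .*/\1 -> \2/p'
}

# One drop line copied from the monitor output above.
sample='xx drop (Is a ClusterIP) flow 0x0 to endpoint 0, file bpf_host.c line 665, , identity world->unknown: 192.168.56.10:33802 -> 10.112.1.90:80 tcp SYN'

printf '%s\n' "$sample" | drop_reasons   # prints: Is a ClusterIP
printf '%s\n' "$sample" | drop_flows     # prints: 192.168.56.10:33802 -> 10.112.1.90:80
```

Piping the live monitor output through drop_reasons and then sort | uniq -c should give a quick count per reason, provided the output format matches the one in this article.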
The drop reason is "Is a ClusterIP". I read the official documentation, and the cause seems to be the bpf-lb-external-clusterip option. According to the documentation, this option enables external access to ClusterIP services, and it is false by default. Now we know why the curl failed, so we can set it to true:
root@k8smaster0:~/kube-router# cilium config view | grep -i bpf-lb-external-clusterip
bpf-lb-external-clusterip false
root@k8smaster0:~/kube-router# cilium config set bpf-lb-external-clusterip true
✨ Patching ConfigMap cilium-config with bpf-lb-external-clusterip=true...
♻️ Restarted Cilium pods
## curl from 192.168.56.10
sysadmin@ubuntu:~$ curl 10.96.229.67 -I
HTTP/1.1 200 OK
Server: nginx/1.14.2
Date: Wed, 11 Jan 2023 07:26:35 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 04 Dec 2018 14:44:49 GMT
Connection: keep-alive
ETag: "5c0692e1-264"
Accept-Ranges: bytes
Finally, we can access the cluster service from outside. The last step is to test whether the external IP is routable from the pods.
## Create a busybox pod
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - name: busybox
    image: busybox:1.28
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
## Ping from pod of busybox
root@k8smaster0:~/app# kubectl exec -it busybox -- ping -c 3 192.168.56.10
PING 192.168.56.10 (192.168.56.10): 56 data bytes
64 bytes from 192.168.56.10: seq=0 ttl=62 time=3.074 ms
64 bytes from 192.168.56.10: seq=1 ttl=62 time=0.566 ms
64 bytes from 192.168.56.10: seq=2 ttl=62 time=0.927 ms
--- 192.168.56.10 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.566/1.522/3.074 ms
0x02 Summary
By now, the Kubernetes cluster and the external BGP router are routable to each other. If you want the K8s cluster and your local environment to reach each other as well, you only need to configure some static routes on your local environment and on the external BGP router.
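The static routes on the local environment could look like the following. This is only a sketch under assumptions: static_route_cmds is a hypothetical helper, the prefixes are the ones from the birdc show route output in this article, and the local machine is assumed to reach the router at 192.168.56.10 directly. The function prints the commands instead of running them, so they can be reviewed first.

```shell
# Print (do not run) one `ip route` command per cluster prefix.
# $1 is the router address; the remaining arguments are prefixes.
static_route_cmds() {
  local router="$1" prefix
  shift
  for prefix in "$@"; do
    echo "ip route replace $prefix via $router"
  done
}

# Pod CIDRs and one service IP taken from the BIRD route table above.
static_route_cmds 192.168.56.10 10.112.0.0/24 10.112.1.0/24 10.96.229.67/32
```

The printed commands need root privileges to run. For the router to pass traffic between the local environment and the cluster, it also has to forward packets, which on Linux means net.ipv4.ip_forward must be set to 1.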