Dive into Kubernetes BGP Network, Part Two
Kubernetes with Cilium BGP
0x00 Introduction
In the previous blog of this series, I wrote about how to prepare the environment for deploying Kubernetes with a BGP network. Today, I am going to walk through the deployment process itself.
0x0000 The Environment
Before we start, here are the tools and versions I used to deploy Kubernetes.
- kubeadm: v1.25.4
- cilium: v1.12.2
- helm: v3.10.2
- kube-router: v1.5.3
Here is my kubeadm configuration; the comments explain the options I used.
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd # Specify the cgroup driver of kubelet; systemd is recommended
---
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.56.4 # the address the API server listens on
  bindPort: 6443
#nodeRegistration:
#  criSocket: unix:///var/run/containerd/containerd.sock
#  imagePullPolicy: IfNotPresent
#  name: k8smaster0
#  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.25.4
# Setting controlPlaneEndpoint is recommended if you want to add more API servers for HA;
# cluster-endpoint is a DNS name that resolves to the API servers.
controlPlaneEndpoint: "cluster-endpoint:6443"
networking:
  dnsDomain: cluster.local
  # Specify the Service CIDR range
  serviceSubnet: 10.96.0.0/12
  # Specify the Pod CIDR range
  podSubnet: 10.112.0.0/12
scheduler: {}
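Before actually creating the cluster, you can optionally sanity-check this configuration with a dry run; kubeadm prints what it would do without making real changes to the node:
kubeadm init --config kubeadm-config.yml --dry-run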
Usually, we could start deploying Kubernetes with kubeadm right away, but unfortunately, as a Chinese developer behind the GFW, there are some obstacles to pulling the images. So I wrote a script that pulls images from domestic mirror sources and renames them to the names that kubeadm expects.
After installing kubeadm, we can use kubeadm config images list to show the images required by kubeadm. Here is the result of running the command:
root@k8smaster0:~# kubeadm config images list
I1221 07:38:49.772660 1024 version.go:256] remote version is much newer: v1.26.0; falling back to: stable-1.25
registry.k8s.io/kube-apiserver:v1.25.5
registry.k8s.io/kube-controller-manager:v1.25.5
registry.k8s.io/kube-scheduler:v1.25.5
registry.k8s.io/kube-proxy:v1.25.5
registry.k8s.io/pause:3.8
registry.k8s.io/etcd:3.5.5-0
registry.k8s.io/coredns/coredns:v1.9.3
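Note that without a version argument kubeadm queries the internet and falls back to the newest stable patch release (v1.25.5 above), which is why the listed tags differ slightly from the v1.25.4 I actually installed. To list the images for the exact version in the config, pin it explicitly:
kubeadm config images list --kubernetes-version v1.25.4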
As you know, we already installed containerd as the Kubernetes runtime in the previous article. However, using containerd is somewhat different from Docker: containerd uses ctr as its command line tool, and it also has a namespace concept. If we run ctr without specifying a namespace, it uses the default namespace, whereas the Kubernetes CRI uses the k8s.io namespace. So if we want to pre-pull images, we need to put them into the correct namespace; otherwise, Kubernetes cannot find them. Here are my commands to pre-pull images:
# Download the script for pulling images from China.
wget https://raw.githubusercontent.com/N0mansky/docker_wrapper/master/crt_wrapper.py
chmod +x crt_wrapper.py
# Pull images into the k8s.io namespace using the script
./crt_wrapper.py pull registry.k8s.io/xxxxxx
# After the images have been pulled, we can check them with
ctr -n k8s.io image ls -q
# Or we can use crictl
root@k8sslave0:~# crictl image ls
IMAGE TAG IMAGE ID SIZE
docker.io/cloudnativelabs/kube-router latest a5e6dc4b76a3f 45MB
docker.io/library/busybox 1.28 8c811b4aec35f 728kB
docker.io/library/nginx 1.14.2 295c7be079025 44.7MB
quay.io/cilium/cilium v1.12.2 743cf6b60787d 167MB
quay.io/cilium/cilium v1.12.4 b7257a8403c50 167MB
quay.io/cilium/operator-generic v1.12.2 1f3c9d6876457 18.9MB
quay.io/cilium/operator-generic v1.12.4 ca5b3c9580cb3 18.9MB
registry.cn-hangzhou.aliyuncs.com/google_containers/coredns v1.9.3 5185b96f0becf 14.8MB
registry.k8s.io/coredns/coredns v1.9.3 5185b96f0becf 14.8MB
registry.cn-hangzhou.aliyuncs.com/google_containers/pause 3.8 4873874c08efc 311kB
registry.k8s.io/pause 3.8 4873874c08efc 311kB
registry.k8s.io/kube-proxy v1.25.4 2c2bc18642790 20.3MB
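If you prefer not to use the script, the manual equivalent is simply pulling from a mirror and re-tagging the image inside the k8s.io namespace. A minimal sketch, using the pause image as an example and assuming the Aliyun mirror carries the same tag:
# Pull the mirrored image into the k8s.io namespace
ctr -n k8s.io image pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.8
# Re-tag it to the name kubeadm expects
ctr -n k8s.io image tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.8 registry.k8s.io/pause:3.8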
0x01 Deployment
0x0100 Create Cluster
After all the preparation work is done, it is time to create a cluster. First, I initialized the master node with the following command:
# Skip the kube-proxy addon, because Cilium will replace it.
kubeadm init --config kubeadm-config.yml --skip-phases=addon/kube-proxy
# After the above command finished, the following output was printed.
...
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join cluster-endpoint:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:7337839717eb93c80bad2157ecbed814c389f8fa843c2d6b41e305e763751107 \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join cluster-endpoint:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:7337839717eb93c80bad2157ecbed814c389f8fa843c2d6b41e305e763751107
After the master node was initialized successfully, I ran the following command to add a worker node:
# Execute the join command on the worker node
kubeadm join cluster-endpoint:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:7337839717eb93c80bad2157ecbed814c389f8fa843c2d6b41e305e763751107
# After the above command finished, the following output was printed.
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
# And all the nodes were in the Ready status
root@k8smaster0:~# kubectl get nodes -A
NAME STATUS ROLES AGE VERSION
k8smaster0 Ready control-plane 5m42s v1.25.4
k8sslave0 Ready <none> 66s v1.25.4
Now we can use kubectl get pods -A to check the state of the pods. The result shows no kube-proxy pods, and the CoreDNS pods are stuck in the ContainerCreating status. That is because we skipped kube-proxy and have not yet installed a network add-on, and the CoreDNS containers need a pod network add-on before they can be created.
root@k8smaster0:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-565d847f94-l7pz9 0/1 ContainerCreating 0 18d
kube-system coredns-565d847f94-pfxbj 0/1 ContainerCreating 0 18d
kube-system etcd-k8smaster0 1/1 Running 1154 (5m50s ago) 18d
kube-system kube-apiserver-k8smaster0 1/1 Running 1327 (5m50s ago) 18d
kube-system kube-controller-manager-k8smaster0 1/1 Running 1125 (5m50s ago) 18d
kube-system kube-scheduler-k8smaster0 1/1 Running 1186 (5m50s ago) 18d
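If you are curious about the exact reason the CoreDNS pods are stuck, describing them shows the CNI-related events (k8s-app=kube-dns is the label kubeadm gives the CoreDNS pods):
kubectl -n kube-system describe pods -l k8s-app=kube-dns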
0x0101 Install Cilium
After the cluster is created, it is time to install the network add-on. In this case, I am using Cilium. There are many ways to install Cilium; I used Helm. Here is the script I used to install it:
root@k8smaster0:~# cat install_cilium.sh
API_SERVER_IP=192.168.56.4
# Kubeadm default is 6443
API_SERVER_PORT=6443
helm install cilium cilium/cilium --version 1.12.4 \
--namespace kube-system \
--set kubeProxyReplacement=strict \
--set k8sServiceHost=${API_SERVER_IP} \
--set k8sServicePort=${API_SERVER_PORT} \
--set ipv4NativeRoutingCIDR=192.168.56.0/24 \
--set tunnel="disabled" \
--set ipam.mode=kubernetes
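Note that the script above assumes the Cilium chart repository has already been added to Helm under the name cilium; if it has not been, add it first:
helm repo add cilium https://helm.cilium.io/
helm repo update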
Let me explain those options. kubeProxyReplacement=strict means we are using Cilium to fully replace the kube-proxy component. tunnel=disabled and ipv4NativeRoutingCIDR put Cilium in native routing mode instead of an overlay network, and ipam.mode=kubernetes delegates Pod IP address allocation to each node in the cluster.
After Cilium was installed, the CoreDNS pods finally started running:
root@k8smaster0:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-2xlxh 1/1 Running 0 6m53s
kube-system cilium-hcr6r 1/1 Running 0 6m53s
kube-system cilium-operator-675567f547-8jz7l 1/1 Running 0 6m53s
kube-system cilium-operator-675567f547-l68zt 1/1 Running 0 6m53s
kube-system coredns-565d847f94-l7pz9 1/1 Running 0 18d
kube-system coredns-565d847f94-pfxbj 1/1 Running 0 18d
kube-system etcd-k8smaster0 1/1 Running 1154 (4h19m ago) 18d
kube-system kube-apiserver-k8smaster0 1/1 Running 1327 (4h19m ago) 18d
kube-system kube-controller-manager-k8smaster0 1/1 Running 1125 (4h19m ago) 18d
kube-system kube-scheduler-k8smaster0 1/1 Running 1186 (4h19m ago) 18d
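Besides the pod status, we can also confirm that Cilium really took over the kube-proxy duties by querying the agent's status (the exact wording of this status line may vary between Cilium versions):
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement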
0x0102 Install kube-router
At this point, the network setup still needs some work. If we want to use BGP mode, we must install a BGP daemon as a DaemonSet on each node for BGP peering and route propagation. There are many options, such as kube-router, BIRD, and Cilium's native BGP support. I chose kube-router over the others because it is easy to use.
You can download the YAML file I used to install kube-router from https://github.com/cloudnativelabs/kube-router/blob/v1.5.3/daemonset/generic-kuberouter-only-advertise-routes.yaml, and I used the following options:
...
      containers:
      - name: kube-router
        image: docker.io/cloudnativelabs/kube-router
        imagePullPolicy: Always
        args:
        - "--run-router=true"              # run the BGP routing controller
        - "--run-firewall=false"           # network policy is handled by Cilium
        - "--run-service-proxy=false"      # service proxying is handled by Cilium
        - "--bgp-graceful-restart=true"
        - "--enable-cni=false"             # Cilium is the CNI; kube-router only speaks BGP
        - "--enable-ibgp=true"             # full-mesh iBGP peering between the nodes
        - "--enable-overlay=false"
        - "--cluster-asn=65001"            # ASN shared by all nodes in the cluster
        - "--advertise-cluster-ip=true"
        - "--advertise-external-ip=true"
        - "--advertise-loadbalancer-ip=true"
...
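After adjusting the manifest, apply it and verify that a kube-router pod is running on every node. The kube-router image also ships the gobgp CLI, so we can inspect the iBGP peerings from inside any kube-router pod (assuming the default k8s-app=kube-router label from the manifest; the pod name below is a placeholder):
kubectl apply -f generic-kuberouter-only-advertise-routes.yaml
kubectl -n kube-system get pods -l k8s-app=kube-router -o wide
# Check the BGP neighbors established by kube-router
kubectl -n kube-system exec -it <kube-router-pod> -- gobgp neighbor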
Now the cluster is already using BGP internally. We can create a Deployment and a Service to test it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
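Assuming the manifest above is saved as nginx.yaml (the filename is just my choice), apply it:
kubectl apply -f nginx.yaml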
Check the Nginx Pod and Service IP addresses with the following commands. The output indicates Cilium is working fine.
root@k8smaster0:~/app# kubectl get service nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx ClusterIP 10.96.229.67 <none> 80/TCP 12m
root@k8smaster0:~/app# kubectl get pods -o wide -l app=nginx
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-7fb96c846b-7jhtt 1/1 Running 0 12m 10.112.1.174 k8sslave0 <none> <none>
nginx-deployment-7fb96c846b-lqpgf 1/1 Running 0 12m 10.112.1.125 k8sslave0 <none> <none>
root@k8smaster0:~/app# curl 10.96.229.67
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
And we can use the cilium CLI inside the agent pods to list the services managed by Cilium.
root@k8smaster0:~/app# kubectl get pods -A -l k8s-app=cilium
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-2xlxh 1/1 Running 0 60m
kube-system cilium-hcr6r 1/1 Running 0 60m
root@k8smaster0:~/app# kubectl -n kube-system exec cilium-2xlxh -- cilium service list
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
ID Frontend Service Type Backend
1 10.96.0.1:443 ClusterIP 1 => 192.168.56.4:6443 (active)
2 10.96.0.10:53 ClusterIP 1 => 10.112.0.59:53 (active)
2 => 10.112.0.84:53 (active)
3 10.96.0.10:9153 ClusterIP 1 => 10.112.0.59:9153 (active)
2 => 10.112.0.84:9153 (active)
4 10.98.28.103:443 ClusterIP 1 => 192.168.56.4:4244 (active)
2 => 192.168.56.5:4244 (active)
5 10.96.229.67:80 ClusterIP 1 => 10.112.1.125:80 (active)
2 => 10.112.1.174:80 (active)
root@k8smaster0:~/app#
We can see that the Nginx cluster IP 10.96.229.67 is active.
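If you want to double-check that these addresses are really being advertised over BGP and not only handled locally by Cilium, you can dump the BGP RIB from inside a kube-router pod (again, the pod name is a placeholder):
kubectl -n kube-system exec -it <kube-router-pod> -- gobgp global rib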
By now, we have finished all the work: the cluster is using BGP for internal communication. However, there is still some work to do before external networks can route into the cluster. I'll cover that in my next blog.