Dive into Kubernetes BGP Network, Part Two
Kubernetes with Cilium BGP
0x00 Introduction
In the previous blog of this series, I wrote about how to prepare the environment for deploying Kubernetes with a BGP network. Today, I am going to walk through the deployment process itself.
0x0000 The Environment
Before we start, here are the tools and versions I used to deploy Kubernetes.
- kubeadm: v1.25.4
- cilium: v1.12.2
- helm: v3.10.2
- kube-router: v1.5.3
Here is my kubeadm configuration; the comments explain the options I used.
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd # Specify the cgroup driver of kubelet; systemd is recommended
---
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.56.4 # the address the API server listens on
  bindPort: 6443
#nodeRegistration:
#  criSocket: unix:///var/run/containerd/containerd.sock
#  imagePullPolicy: IfNotPresent
#  name: k8smaster0
#  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.25.4
# Setting controlPlaneEndpoint is recommended if you want to add more API servers for HA;
# cluster-endpoint is a DNS name that resolves to the API servers.
controlPlaneEndpoint: "cluster-endpoint:6443"
networking:
  dnsDomain: cluster.local
  # Specify the Service CIDR range
  serviceSubnet: 10.96.0.0/12
  # Specify the Pod CIDR range
  podSubnet: 10.112.0.0/12
scheduler: {}
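Before actually creating the cluster, you can optionally sanity-check this configuration with a dry run; kubeadm prints what it would do without making real changes to the node:
kubeadm init --config kubeadm-config.yml --dry-run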
Usually, we could start deploying Kubernetes with kubeadm right away, but unfortunately, as a Chinese developer behind the GFW, there are some obstacles to pulling the images. So I wrote a script that pulls images from domestic mirror sources and renames them to the names that kubeadm expects.
After installing kubeadm, we can use kubeadm config images list to show the images required by kubeadm. Here is the result of running the command:
root@k8smaster0:~# kubeadm config images list
I1221 07:38:49.772660 1024 version.go:256] remote version is much newer: v1.26.0; falling back to: stable-1.25
registry.k8s.io/kube-apiserver:v1.25.5
registry.k8s.io/kube-controller-manager:v1.25.5
registry.k8s.io/kube-scheduler:v1.25.5
registry.k8s.io/kube-proxy:v1.25.5
registry.k8s.io/pause:3.8
registry.k8s.io/etcd:3.5.5-0
registry.k8s.io/coredns/coredns:v1.9.3
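Note that without a version argument kubeadm queries the internet and falls back to the newest stable patch release (v1.25.5 above), which is why the listed tags differ slightly from the v1.25.4 I actually installed. To list the images for the exact version in the config, pin it explicitly:
kubeadm config images list --kubernetes-version v1.25.4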
As you know, we already installed containerd as the Kubernetes runtime in the previous article. However, using containerd is somewhat different from Docker: containerd uses ctr as its command line tool, and it also has a namespace concept. If we run ctr without specifying a namespace, it uses the default namespace, whereas the Kubernetes CRI uses the k8s.io namespace. So if we want to pre-pull images, we need to put them into the correct namespace; otherwise, Kubernetes cannot find them. Here are my commands to pre-pull images:
# Download the script for pulling images from China.
wget https://raw.githubusercontent.com/N0mansky/docker_wrapper/master/crt_wrapper.py
chmod +x crt_wrapper.py
# Pull images into the k8s.io namespace using the script
./crt_wrapper.py pull registry.k8s.io/xxxxxx
# After the images have been pulled, we can check them with
ctr -n k8s.io image ls -q
# Or we can use crictl
root@k8sslave0:~# crictl image ls
IMAGE TAG IMAGE ID SIZE
docker.io/cloudnativelabs/kube-router latest a5e6dc4b76a3f 45MB
docker.io/library/busybox 1.28 8c811b4aec35f 728kB
docker.io/library/nginx 1.14.2 295c7be079025 44.7MB
quay.io/cilium/cilium v1.12.2 743cf6b60787d 167MB
quay.io/cilium/cilium v1.12.4 b7257a8403c50 167MB
quay.io/cilium/operator-generic v1.12.2 1f3c9d6876457 18.9MB
quay.io/cilium/operator-generic v1.12.4 ca5b3c9580cb3 18.9MB
registry.cn-hangzhou.aliyuncs.com/google_containers/coredns v1.9.3 5185b96f0becf 14.8MB
registry.k8s.io/coredns/coredns v1.9.3 5185b96f0becf 14.8MB
registry.cn-hangzhou.aliyuncs.com/google_containers/pause 3.8 4873874c08efc 311kB
registry.k8s.io/pause 3.8 4873874c08efc 311kB
registry.k8s.io/kube-proxy v1.25.4 2c2bc18642790 20.3MB
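If you prefer not to use the script, the manual equivalent is simply pulling from a mirror and re-tagging the image inside the k8s.io namespace. A minimal sketch, using the pause image as an example and assuming the Aliyun mirror carries the same tag:
# Pull the mirrored image into the k8s.io namespace
ctr -n k8s.io image pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.8
# Re-tag it to the name kubeadm expects
ctr -n k8s.io image tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.8 registry.k8s.io/pause:3.8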
0x01 Deployment
0x0100 Create Cluster
After all the preparation work is done, it is time to create a cluster. First, I initialized the master node with the following command:
# Skip the kube-proxy addon, because Cilium will replace it.
kubeadm init --config kubeadm-config.yml --skip-phases=addon/kube-proxy
# After the above command finished, the following output was printed.
...
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join cluster-endpoint:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:7337839717eb93c80bad2157ecbed814c389f8fa843c2d6b41e305e763751107 \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join cluster-endpoint:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:7337839717eb93c80bad2157ecbed814c389f8fa843c2d6b41e305e763751107
After the master node was initialized successfully, I ran the following command to add a worker node:
# Execute the join command on the worker node
kubeadm join cluster-endpoint:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:7337839717eb93c80bad2157ecbed814c389f8fa843c2d6b41e305e763751107
# After the above command finished, the following output was printed.
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
# And all the nodes were in the Ready status
root@k8smaster0:~# kubectl get nodes -A
NAME STATUS ROLES AGE VERSION
k8smaster0 Ready control-plane 5m42s v1.25.4
k8sslave0 Ready <none> 66s v1.25.4
Now we can use kubectl get pods -A to check the state of the pods. The result shows no kube-proxy pods, and the CoreDNS pods are stuck in the ContainerCreating status. That is because we skipped kube-proxy and have not yet installed a network add-on, and the CoreDNS containers need a pod network add-on before they can be created.
root@k8smaster0:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-565d847f94-l7pz9 0/1 ContainerCreating 0 18d
kube-system coredns-565d847f94-pfxbj 0/1 ContainerCreating 0 18d
kube-system etcd-k8smaster0 1/1 Running 1154 (5m50s ago) 18d
kube-system kube-apiserver-k8smaster0 1/1 Running 1327 (5m50s ago) 18d
kube-system kube-controller-manager-k8smaster0 1/1 Running 1125 (5m50s ago) 18d
kube-system kube-scheduler-k8smaster0 1/1 Running 1186 (5m50s ago) 18d
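If you are curious about the exact reason the CoreDNS pods are stuck, describing them shows the CNI-related events (k8s-app=kube-dns is the label kubeadm gives the CoreDNS pods):
kubectl -n kube-system describe pods -l k8s-app=kube-dns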
0x0101 Install Cilium
After the cluster is created, it is time to install the network add-on. In this case, I am using Cilium. There are many ways to install Cilium; I used Helm. Here is the script I used to install it:
root@k8smaster0:~# cat install_cilium.sh
API_SERVER_IP=192.168.56.4
# Kubeadm default is 6443
API_SERVER_PORT=6443
helm install cilium cilium/cilium --version 1.12.4 \
--namespace kube-system \
--set kubeProxyReplacement=strict \
--set k8sServiceHost=${API_SERVER_IP} \
--set k8sServicePort=${API_SERVER_PORT} \
--set ipv4NativeRoutingCIDR=192.168.56.0/24 \
--set tunnel="disabled" \
--set ipam.mode=kubernetes
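Note that the script above assumes the Cilium chart repository has already been added to Helm under the name cilium; if it has not been, add it first:
helm repo add cilium https://helm.cilium.io/
helm repo update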
Let me explain those options. kubeProxyReplacement=strict means we are using Cilium to fully replace the kube-proxy component. tunnel=disabled and ipv4NativeRoutingCIDR put Cilium in native routing mode instead of an overlay network, and ipam.mode=kubernetes delegates Pod IP address allocation to each node in the cluster.
After Cilium was installed, the CoreDNS pods finally started running:
root@k8smaster0:~# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-2xlxh 1/1 Running 0 6m53s
kube-system cilium-hcr6r 1/1 Running 0 6m53s
kube-system cilium-operator-675567f547-8jz7l 1/1 Running 0 6m53s
kube-system cilium-operator-675567f547-l68zt 1/1 Running 0 6m53s
kube-system coredns-565d847f94-l7pz9 1/1 Running 0 18d
kube-system coredns-565d847f94-pfxbj 1/1 Running 0 18d
kube-system etcd-k8smaster0 1/1 Running 1154 (4h19m ago) 18d
kube-system kube-apiserver-k8smaster0 1/1 Running 1327 (4h19m ago) 18d
kube-system kube-controller-manager-k8smaster0 1/1 Running 1125 (4h19m ago) 18d
kube-system kube-scheduler-k8smaster0 1/1 Running 1186 (4h19m ago) 18d
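Besides the pod status, we can also confirm that Cilium really took over the kube-proxy duties by querying the agent's status (the exact wording of this status line may vary between Cilium versions):
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement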
0x0102 Install kube-router
At this point, the network setup still needs some work. If we want to use BGP mode, we must install a BGP daemon as a DaemonSet on each node for BGP peering and route propagation. There are many options, such as kube-router, BIRD, and Cilium's native BGP support. I chose kube-router over the others because it is easy to use.
You can download the YAML file I used to install kube-router from https://github.com/cloudnativelabs/kube-router/blob/v1.5.3/daemonset/generic-kuberouter-only-advertise-routes.yaml, and I used the following options:
...
      containers:
      - name: kube-router
        image: docker.io/cloudnativelabs/kube-router
        imagePullPolicy: Always
        args:
        - "--run-router=true"              # run the BGP routing controller
        - "--run-firewall=false"           # network policy is handled by Cilium
        - "--run-service-proxy=false"      # service proxying is handled by Cilium
        - "--bgp-graceful-restart=true"
        - "--enable-cni=false"             # Cilium is the CNI; kube-router only speaks BGP
        - "--enable-ibgp=true"             # full-mesh iBGP peering between the nodes
        - "--enable-overlay=false"
        - "--cluster-asn=65001"            # ASN shared by all nodes in the cluster
        - "--advertise-cluster-ip=true"
        - "--advertise-external-ip=true"
        - "--advertise-loadbalancer-ip=true"
...
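After adjusting the manifest, apply it and verify that a kube-router pod is running on every node. The kube-router image also ships the gobgp CLI, so we can inspect the iBGP peerings from inside any kube-router pod (assuming the default k8s-app=kube-router label from the manifest; the pod name below is a placeholder):
kubectl apply -f generic-kuberouter-only-advertise-routes.yaml
kubectl -n kube-system get pods -l k8s-app=kube-router -o wide
# Check the BGP neighbors established by kube-router
kubectl -n kube-system exec -it <kube-router-pod> -- gobgp neighbor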
Now the cluster is already using BGP internally. We can create a Deployment and a Service to test it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 # tells deployment to run 2 pods matching the template
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx
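Assuming the manifest above is saved as nginx.yaml (the filename is just my choice), apply it:
kubectl apply -f nginx.yaml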
Check the Nginx Pod and Service IP addresses with the following commands. The output indicates Cilium is working fine.
root@k8smaster0:~/app# kubectl get service nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx ClusterIP 10.96.229.67 <none> 80/TCP 12m
root@k8smaster0:~/app# kubectl get pods -o wide -l app=nginx
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-7fb96c846b-7jhtt 1/1 Running 0 12m 10.112.1.174 k8sslave0 <none> <none>
nginx-deployment-7fb96c846b-lqpgf 1/1 Running 0 12m 10.112.1.125 k8sslave0 <none> <none>
root@k8smaster0:~/app# curl 10.96.229.67
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
And we can use the cilium CLI inside the agent pods to list the services managed by Cilium.
root@k8smaster0:~/app# kubectl get pods -A -l k8s-app=cilium
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system cilium-2xlxh 1/1 Running 0 60m
kube-system cilium-hcr6r 1/1 Running 0 60m
root@k8smaster0:~/app# kubectl -n kube-system exec cilium-2xlxh -- cilium service list
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), mount-bpf-fs (init), clean-cilium-state (init)
ID Frontend Service Type Backend
1 10.96.0.1:443 ClusterIP 1 => 192.168.56.4:6443 (active)
2 10.96.0.10:53 ClusterIP 1 => 10.112.0.59:53 (active)
2 => 10.112.0.84:53 (active)
3 10.96.0.10:9153 ClusterIP 1 => 10.112.0.59:9153 (active)
2 => 10.112.0.84:9153 (active)
4 10.98.28.103:443 ClusterIP 1 => 192.168.56.4:4244 (active)
2 => 192.168.56.5:4244 (active)
5 10.96.229.67:80 ClusterIP 1 => 10.112.1.125:80 (active)
2 => 10.112.1.174:80 (active)
root@k8smaster0:~/app#
We can see that the Nginx cluster IP 10.96.229.67 is active.
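If you want to double-check that these addresses are really being advertised over BGP and not only handled locally by Cilium, you can dump the BGP RIB from inside a kube-router pod (again, the pod name is a placeholder):
kubectl -n kube-system exec -it <kube-router-pod> -- gobgp global rib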
By now, we have finished all the work: the cluster is using BGP for internal communication. However, there is still some work to do before external networks can route into the cluster. I'll cover that in my next blog.