Setting Up a Self-Managed Kubernetes Cluster: A Step-by-Step Guide (Non-HA)
As Kubernetes continues to dominate the landscape of container orchestration, many developers and system administrators are opting for self-managed clusters to gain complete control over their infrastructure. In this guide, I’ll walk you through the steps required to set up a self-managed Kubernetes cluster using kubeadm.
By the end of this tutorial, you’ll have a functional Kubernetes cluster ready to handle workloads. Let’s dive in!
📝 Prerequisites
Before beginning, ensure that you have:
- An Odd Number of Control Plane Nodes: For failure tolerance, the etcd cluster that backs the Kubernetes control plane works best with an odd number of members. This allows for a simple quorum, with (N-1)/2 giving the number of member failures that can be tolerated. For a minimal non-HA setup, you can start with 1 master node (or control plane node) and 2 worker nodes.
- Ubuntu 20.04 or a similar Linux distribution on your nodes.
- At least 2 CPUs and 2 GB of RAM on each node (one control plane node and at least one worker node).
- The control plane node should have at least 20 GB of storage while the worker nodes should have at least 10 GB of storage.
- A reliable network connection
- Root access or a user with sudo privileges during setup.
Step 1: Prepare the Environment
Note: Repeat Step 1 on both the control plane (master) node and each worker node. This will ensure all nodes are prepared with the necessary dependencies.
Update System Packages
Before installing Kubernetes components, let’s make sure your system packages are up to date. Run:
sudo apt update && sudo apt upgrade -y
Add Kubernetes Package Repository
- Download the Kubernetes GPG key:
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
- Add the Kubernetes apt repository:
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
- Update package lists:
sudo apt update
For more details, refer to the Kubernetes Installation Guide.
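Note: the packages.cloud.google.com / apt.kubernetes.io repository used above is the legacy Kubernetes apt repository, which has since been deprecated and frozen. If apt cannot find the packages, the community-owned repository at pkgs.k8s.io is the current recommendation; a minimal sketch (using v1.29 purely as an example minor version, adjust it to the release you want) looks like this:
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update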
Step 2: Install Kubernetes Tools
Note: The first part of this step, where we install kubeadm, kubelet, and kubectl, should also be repeated on each worker node. These tools are required on both the control plane and worker nodes.
Install kubeadm, kubelet, and kubectl
These tools are essential for managing the Kubernetes cluster:
sudo apt install -y kubeadm kubelet kubectl
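Optionally, pin these packages so that a routine apt upgrade does not move them to a different Kubernetes version unexpectedly (this mirrors the hold step in the official installation docs):
sudo apt-mark hold kubeadm kubelet kubectl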
Install Container Runtime (containerd)
Kubernetes requires a container runtime such as containerd to run containers. Install it with:
sudo apt update && sudo apt install -y containerd
Configure cgroups for containerd and kubelet
Control groups, or cgroups, are a Linux kernel feature that manage and allocate system resources such as CPU, memory, disk I/O, and network bandwidth for groups of processes. In the context of Kubernetes, cgroups play a crucial role in ensuring that containers are efficiently managed and isolated. Here’s what cgroups do:
- Resource Allocation and Limiting: Cgroups allow Kubernetes to set limits and requests on the amount of CPU and memory a container can use. For example, you can restrict a container to a maximum of 512 MB of memory or a limited share of CPU time. This prevents any single container from consuming all resources on the node.
- Resource Isolation: By isolating resources, cgroups ensure that containers and their processes run independently without interfering with each other. If one container crashes or spikes in resource usage, cgroups help prevent it from impacting other containers.
- Resource Monitoring: Cgroups provide visibility into resource usage, allowing Kubernetes to monitor the CPU and memory consumption of containers and make decisions based on this data (like scaling or scheduling).
- Resource Prioritization: Cgroups can be used to prioritize resources for certain containers over others, ensuring critical services receive the resources they need, even under high load.
Overall, cgroups help Kubernetes enforce resource limits, maintain stability, and optimize performance, which is essential for managing containerized applications in a multi-tenant environment.
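To make this concrete, here is a minimal, hypothetical pod spec you can try once the cluster is up at the end of this guide: the requests are used for scheduling, while the limits are enforced on the node through cgroups by the kubelet and the container runtime.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cgroup-limits-demo   # hypothetical example pod name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:              # what the scheduler reserves for the pod
        memory: "256Mi"
        cpu: "250m"
      limits:                # enforced on the node via cgroups
        memory: "512Mi"      # container is OOM-killed if it exceeds this
        cpu: "500m"          # CPU usage is throttled above half a core
EOF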
Kubernetes components need to use a consistent cgroup driver. There are two main cgroup drivers:
systemd cgroup driver
- Description: This driver uses systemd as the cgroup manager, which is the default init system on most modern Linux distributions (like Ubuntu and CentOS).
- Usage: Preferred for Kubernetes clusters as it aligns with the native Linux systemd init system, leading to better compatibility and stability in resource management.
cgroupfs cgroup driver
- Description: This driver uses cgroupfs as the cgroup manager, which manages cgroups directly through the filesystem.
- Usage: Often used with container runtimes like Docker, but it may lead to issues in Kubernetes because it can conflict with systemd. It’s generally recommended to switch to systemd for Kubernetes clusters, especially in production environments.
Note: Consistency between the cgroup driver used by the kubelet and the one used by the container runtime (such as containerd or Docker) is essential for stable cluster operations. Using the systemd driver is typically recommended for production environments.
Let’s set them to use systemd. Since systemd is already the init system on most modern distributions, all we have to do is verify that the node is running systemd and then configure containerd to use the systemd cgroup driver.
- Check the current init system by running:
ps -p 1
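On a systemd-based distribution such as Ubuntu 20.04, the CMD column for PID 1 should show systemd. Equivalently, you can print just the name of PID 1:
ps -p 1 -o comm=
# expected output on systemd-based systems: systemd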
- Create the configuration path for containerd:
sudo mkdir -p /etc/containerd
- Generate the default containerd config:
sudo containerd config default | sudo tee /etc/containerd/config.toml
- Edit the config to set SystemdCgroup to true:
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
- Verify the setting:
grep -i "SystemdCgroup" /etc/containerd/config.toml
- Restart containerd:
sudo systemctl restart containerd
Set the kubelet Cgroup Driver to cgroupfs (Optional)
This section can be skipped if you are using the systemd driver configured above. The whole point is to make sure the container runtime and the kubelet keep using the same driver.
1. Set the kubelet Cgroup Driver to cgroupfs
- Edit the kubelet configuration file, typically located at /var/lib/kubelet/config.yaml:
sudo nano /var/lib/kubelet/config.yaml
- Find the cgroupDriver setting and set it to cgroupfs:
cgroupDriver: cgroupfs
- Save and close the file.
- Restart kubelet to apply the changes:
sudo systemctl restart kubelet
If the kubelet configuration file doesn’t already exist, you can also specify the cgroup driver directly by adding the --cgroup-driver=cgroupfs flag in the kubelet service configuration file (typically at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf), then restart the kubelet service.
2. Set the Container Runtime Cgroup Driver to cgroupfs
For containerd
- Edit the containerd configuration file located at /etc/containerd/config.toml:
sudo nano /etc/containerd/config.toml
- Locate the SystemdCgroup option under the [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] section, and set it to false:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
- Save and close the file.
- Restart containerd to apply the changes:
sudo systemctl restart containerd
For Docker
- Create or edit the Docker daemon configuration file located at /etc/docker/daemon.json:
sudo nano /etc/docker/daemon.json
- Set the cgroup driver to cgroupfs in the JSON configuration:
{ "exec-opts": ["native.cgroupdriver=cgroupfs"] }
- Save and close the file.
- Restart Docker to apply the changes:
sudo systemctl restart docker
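To confirm Docker picked up the new driver, you can inspect the daemon info and look for the "Cgroup Driver" line:
docker info | grep -i "cgroup driver"
# should now report: Cgroup Driver: cgroupfs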
For more details, refer to the Container Runtime Configuration in Kubernetes documentation.
Step 3: Initialize the Control Plane Node
Disable Swap
Kubernetes requires swap to be turned off on every node (run the commands below on the control plane and on each worker). The reason is that Kubernetes manages memory resources for containers and relies on accurate memory allocation and usage data. If swap is enabled, it can interfere with Kubernetes’ ability to monitor and control resource usage accurately, which can lead to unexpected behavior where containers exceed memory limits or become unstable, affecting overall cluster stability.
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
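To verify that no swap is active, swapon should print nothing and free should report 0 B of swap:
swapon --show
free -h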
Enable IP Forwarding
IP forwarding is a feature in the Linux kernel that allows the server to forward network packets from one network interface to another. By default, many Linux distributions disable IP forwarding, which means the system will not route packets between network interfaces.
To ensure network traffic flows correctly between different network interfaces, IP forwarding is essential in Kubernetes. This will enable pod-to-pod communication between nodes.
Why IP Forwarding is Needed in Kubernetes
- Pod-to-Pod Communication Across Nodes:
- Kubernetes creates a virtual network where each pod can communicate with other pods, even if they’re on different nodes. For this to work, IP packets need to be routed between different interfaces (e.g., between the pod network interface and the node’s main network interface).
- Enabling IP forwarding allows each node to forward packets between these interfaces, facilitating cross-node communication in the cluster.
- Service and Network Routing:
- Kubernetes uses various service types (like ClusterIP, NodePort, and LoadBalancer) to expose applications to internal and external traffic.
- With IP forwarding enabled, packets can be routed to their destination service or pod IP, even if that IP is on a different node within the cluster. This is crucial for Kubernetes networking components, such as kube-proxy, to correctly handle traffic routing.
- Network Plugins and CNI (Container Network Interface):
- Many CNI plugins (like Flannel, Calico, and Weave) rely on IP forwarding to handle the complex routing required for networking in a multi-node cluster.
- Enabling IP forwarding ensures that the network plugins can set up and manage pod networking, allowing pods to communicate seamlessly across the cluster.
Enable IP forwarding to allow packets to be routed between different network interfaces:
# Make the change permanent across reboots
echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
# Apply the change immediately for the current session
sudo sysctl -w net.ipv4.ip_forward=1
Or, alternatively, use a dedicated sysctl file for all of the Kubernetes networking settings:
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system
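Note that the net.bridge.bridge-nf-call-* settings above only take effect when the br_netfilter kernel module is loaded. To load the required modules now and on every boot (as in the official container runtime setup docs), you can add:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter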
Initialize the Kubernetes Cluster
Let’s initialize the cluster on the control plane node. Replace <control-plane-ip> with the actual IP of your control plane node. The pod network CIDR 10.244.0.0/16 matches Flannel’s default network, which we will install in Step 4:
sudo kubeadm init --apiserver-advertise-address=<control-plane-ip> --pod-network-cidr="10.244.0.0/16" --upload-certs
Refer to the kubeadm Cluster Setup Guide for more details.
Configure kubectl for Cluster Access
Now, let’s set up kubectl so you can manage the cluster:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
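As a quick sanity check that kubectl can reach the API server, run:
kubectl cluster-info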
Step 4: Install Pod Network (CNI)
Install Flannel (Example CNI)
Kubernetes requires a container network interface (CNI) for pod-to-pod communication. Here, we’ll use Flannel. If pods fail to start because the overlay or br_netfilter kernel modules are missing, load them with:
sudo modprobe overlay
sudo modprobe br_netfilter
Download and Configure Flannel
We’ll use Flannel as our CNI. Follow these steps to download and configure it:
- Download the Flannel YAML configuration file and save it as kube-flannel.yml:
curl -LO https://raw.githubusercontent.com/flannel-io/flannel/v0.20.2/Documentation/kube-flannel.yml
- Edit the Flannel Configuration:
- Open the kube-flannel.yml file in a text editor:
nano kube-flannel.yml
- Locate the args section within the kube-flannel container definition. It should look like this:
args:
- --ip-masq
- --kube-subnet-mgr
- Add an additional argument to specify the network interface:
args:
- --ip-masq
- --kube-subnet-mgr
- --iface=eth0
- Adding --iface=eth0 ensures Flannel uses the correct network interface for pod communication. Replace eth0 with the name of your node’s primary interface if it differs (check with ip addr).
- Save and exit the file.
- Deploy Flannel:
kubectl apply -f kube-flannel.yml
For more about Flannel configuration, refer to the Flannel Documentation.
Step 5: Verify Control Plane Status
Run the following command to check if the control plane node is ready:
kubectl get nodes
You should see the control plane node in a “Ready” state if everything is set up correctly.
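You can also check that the Flannel and core system pods are up across all namespaces:
kubectl get pods -A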
Step 6: Join Worker Nodes to the Cluster
Set Unique Hostnames for Worker Nodes
Each worker node should have a unique hostname. Set it with:
sudo hostnamectl set-hostname <worker-node-name>
Join Worker Nodes
After initializing the control plane, kubeadm generates a command to join worker nodes. It will look similar to this:
kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>
Run the command on each worker node.
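If you no longer have the original join command handy, you can regenerate a valid one on the control plane node at any time:
kubeadm token create --print-join-command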
Refer to the Adding Nodes with kubeadm for more information.
Troubleshoot Common Issues
If you encounter errors related to unsupported fields in the kubeadm config file, such as caCertificateValidityPeriod or certificateValidityPeriod, open the config file:
sudo nano <your-kubeadm-config-file>.yaml
Remove the unsupported fields if you are using the v1beta3 API version.
🧪 Step 7: Run Kubernetes End-to-End (E2E) Tests
Kubernetes provides a set of E2E tests to ensure that all cluster components are working together as expected. These tests are especially useful for validating your setup, troubleshooting issues, and ensuring the cluster meets expected performance and reliability standards.
Prerequisites for E2E Tests
- Install kubectl and have access to your Kubernetes cluster.
- Make sure the kubeconfig file is correctly configured, so kubectl can interact with your cluster:
export KUBECONFIG=$HOME/.kube/config
- Install Go (if it’s not already installed): the Kubernetes E2E tests require Go, as they are implemented in Go code.
- Clone the Kubernetes Repository: the E2E test scripts are included in the Kubernetes GitHub repository.
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
Running Basic E2E Tests
Kubernetes includes a variety of E2E tests, from simple checks to complex scenarios. Follow these steps to run a basic test suite.
1. Build the E2E Test Binary:
make WHAT=test/e2e/e2e.test
2. Run the E2E Test Suite:
- The following command will run the E2E test suite against your cluster. Replace <your-cluster-ip> with your actual API server IP.
./_output/local/bin/linux/amd64/e2e.test --host=https://<your-cluster-ip>:6443
3. Specify Tests to Run:
- Use the --ginkgo.focus flag if you want to run specific tests. For example, to run only the pod-related scheduling tests, use:
./_output/local/bin/linux/amd64/e2e.test --ginkgo.focus="\[sig-scheduling\] Pod should"
4. Viewing Test Results:
The E2E tests will output results to the console, indicating success or failure for each test.
- You can also export logs to a file:
./_output/local/bin/linux/amd64/e2e.test --host=https://<your-cluster-ip>:6443 > e2e_test_results.log
Running Tests with Sonobuoy (Alternative Method)
Sonobuoy is a diagnostic tool designed to run Kubernetes conformance tests and can simplify the process of running E2E tests across different Kubernetes clusters.
- Download Sonobuoy:
curl -L https://github.com/vmware-tanzu/sonobuoy/releases/latest/download/sonobuoy_$(uname -s)_$(uname -m).tar.gz | tar xvz
sudo mv sonobuoy /usr/local/bin
- Run Sonobuoy Tests:
- Run Sonobuoy to start the conformance test suite:
sonobuoy run
- Check the test status:
sonobuoy status
- Retrieve and View Results:
- When tests are complete, download the results:
sonobuoy retrieve .
- Extract the tarball:
tar -xf *.tar.gz
- Results will be stored in a set of JSON files, providing detailed insights into which tests passed or failed.
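You can also print a pass/fail summary straight from the archive with Sonobuoy itself (replace <results-tarball> with the file name produced by the retrieve step):
sonobuoy results <results-tarball>.tar.gz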
Additional Testing Tips
- Run Tests on a Dedicated Test Cluster: Running E2E tests can place a significant load on the cluster, so it’s advisable to run them in a testing or staging environment.
- Customize Test Runs: For large clusters, it might be useful to limit test scope by using flags like --ginkgo.focus or --ginkgo.skip to focus on specific test types.
- Regular Testing: For production-grade clusters, consider integrating E2E tests into a CI/CD pipeline to monitor cluster health continuously.
More on E2E testing is available in the [Kubernetes E2E Testing Guide](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-testing/e2e-tests).
🎉 Wrapping Up
Congratulations! You’ve successfully set up a self-managed Kubernetes cluster. From here, you can deploy applications, set up monitoring, and explore Kubernetes’ powerful orchestration capabilities.
For more complex configurations, like high availability clusters, consider diving deeper into the Kubernetes documentation.
Happy Clustering! 👾