K3s is a simplified version of Kubernetes. It bundles all the components of a Kubernetes cluster into a few small binaries.

== Single node setup ==

<syntaxHighlight lang=nix>
{
  networking.firewall.allowedTCPPorts = [
    6443 # k3s: required so that pods can reach the API server (running on port 6443 by default)
    # 2379 # k3s, etcd clients: required if using a "High Availability Embedded etcd" configuration
    # 2380 # k3s, etcd peers: required if using a "High Availability Embedded etcd" configuration
  ];
  networking.firewall.allowedUDPPorts = [
    # 8472 # k3s, flannel: required if using multi-node for inter-node networking
  ];
  services.k3s.enable = true;
  services.k3s.role = "server";
  services.k3s.extraFlags = toString [
    # "--kubelet-arg=v=4" # Optionally add additional args to k3s
  ];
  environment.systemPackages = [ pkgs.k3s ];
}
</syntaxHighlight>

After enabling, you can access your cluster through <code>sudo k3s kubectl</code>, e.g. <code>sudo k3s kubectl cluster-info</code>, or by using the generated kubeconfig file in <code>/etc/rancher/k3s/k3s.yaml</code>.

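If you would rather use a standalone <code>kubectl</code>, here is a minimal convenience sketch (assuming the default kubeconfig path shown above): it installs kubectl system-wide and points <code>KUBECONFIG</code> at the generated file.

<syntaxHighlight lang=nix>
{
  # Sketch: standalone kubectl talking to the local k3s.
  # Assumes the kubeconfig that k3s generates at its default location.
  environment.systemPackages = [ pkgs.kubectl ];
  environment.variables.KUBECONFIG = "/etc/rancher/k3s/k3s.yaml";
}
</syntaxHighlight>

Note that the generated kubeconfig is only readable by root by default, so unprivileged users still need <code>sudo</code> or their own copy of the file.
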
== Multi-node setup ==

It is simple to create a cluster of multiple nodes in a highly available setup, where all nodes are part of the control plane and members of the etcd cluster.

The first node is configured like this:

<syntaxHighlight lang=nix>
{
  services.k3s = {
    enable = true;
    role = "server";
    token = "<randomized common secret>";
    clusterInit = true;
  };
}
</syntaxHighlight>

Any subsequent nodes can be added with a slightly different config:

<syntaxHighlight lang=nix>
{
  services.k3s = {
    enable = true;
    role = "server";
    token = "<randomized common secret>";
    serverAddr = "https://<ip of first node>:6443";
  };
}
</syntaxHighlight>

For this to work, you need to open the aforementioned API, etcd, and flannel ports in the firewall. Note that it is [https://etcd.io/docs/v3.3/faq/#why-an-odd-number-of-cluster-members recommended] to use an odd number of nodes in such a cluster.

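Concretely, that means enabling the ports that were commented out in the single-node example above:

<syntaxHighlight lang=nix>
{
  networking.firewall.allowedTCPPorts = [
    6443 # Kubernetes API server
    2379 # etcd clients
    2380 # etcd peers
  ];
  networking.firewall.allowedUDPPorts = [
    8472 # flannel VXLAN for inter-node networking
  ];
}
</syntaxHighlight>
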
Alternatively, see this [https://github.com/Mic92/doctor-cluster-config/tree/master/modules/k3s real-world example]. You might want to ignore some parts of it, e.g. the monitoring, as they are specific to that setup.

In that example, the K3s server needs to import <code>modules/k3s/server.nix</code> and an agent <code>modules/k3s/agent.nix</code>.

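If you also want worker-only machines, the same NixOS module options cover them. A minimal sketch of an agent node, assuming the shared token and the address of an existing server:

<syntaxHighlight lang=nix>
{
  services.k3s = {
    enable = true;
    role = "agent"; # joins the cluster without running the control plane or etcd
    token = "<randomized common secret>";
    serverAddr = "https://<ip of first node>:6443";
  };
}
</syntaxHighlight>
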
Tip: You might run into issues with coredns not being reachable from agent nodes. Right now, we disable the NixOS firewall altogether until we find a better solution.

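In NixOS terms, that blunt workaround is:

<syntaxHighlight lang=nix>
{
  # Workaround from the tip above; note this removes all host-level packet filtering.
  networking.firewall.enable = false;
}
</syntaxHighlight>
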
== ZFS support ==

K3s's builtin containerd does not support the zfs snapshotter. However, it is possible to configure it to use an external containerd:

<syntaxHighlight lang=nix>
virtualisation.containerd = {
  enable = true;
  settings =
    let
      fullCNIPlugins = pkgs.buildEnv {
        name = "full-cni";
        paths = with pkgs; [
          cni-plugins
          cni-plugin-flannel
        ];
      };
    in {
      plugins."io.containerd.grpc.v1.cri".cni = {
        bin_dir = "${fullCNIPlugins}/bin";
        conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d/";
      };
      # Optionally set private registry credentials here instead of using /etc/rancher/k3s/registries.yaml
      # plugins."io.containerd.grpc.v1.cri".registry.configs."registry.example.com".auth = {
      #   username = "";
      #   password = "";
      # };
    };
};
# TODO describe how to enable zfs snapshotter in containerd
services.k3s.extraFlags = toString [
  "--container-runtime-endpoint unix:///run/containerd/containerd.sock"
];
</syntaxHighlight>

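The TODO above is still open. As an untested sketch, containerd's CRI plugin should be able to select its zfs snapshotter through the same <code>settings</code> attribute set; this assumes a ZFS dataset is mounted at containerd's snapshotter state directory (<code>/var/lib/containerd/io.containerd.snapshotter.v1.zfs</code>):

<syntaxHighlight lang=nix>
# Untested sketch: use the zfs snapshotter for CRI-managed containers.
virtualisation.containerd.settings.plugins."io.containerd.grpc.v1.cri".containerd.snapshotter = "zfs";
</syntaxHighlight>
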
== Nvidia support ==

To use an Nvidia GPU in the cluster, the nvidia-container-runtime and runc are needed. To get the two components, it suffices to add the following to the configuration:

<syntaxHighlight lang=nix>
virtualisation.docker = {
  enable = true;
  enableNvidia = true;
};
environment.systemPackages = with pkgs; [ docker runc ];
</syntaxHighlight>

Note that using docker here is a workaround: it installs nvidia-container-runtime, which makes it accessible via <code>/run/current-system/sw/bin/nvidia-container-runtime</code>; currently it is not directly accessible in nixpkgs.

You now need to create a new file at <code>/var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl</code> with the following content:

<syntaxHighlight lang=toml>
{{ template "base" . }}

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  privileged_without_host_devices = false
  runtime_engine = ""
  runtime_root = ""
  runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
  BinaryName = "/run/current-system/sw/bin/nvidia-container-runtime"
</syntaxHighlight>

Update: as of 12 March 2024, the last two lines above appear to be added by default; if they are present (as shown above), the server will refuse to start. From that version onward, you will need to remove those two lines.

Note that here we are pointing the nvidia runtime at <code>/run/current-system/sw/bin/nvidia-container-runtime</code>.

Now apply the following runtime class to the k3s cluster:

<syntaxHighlight lang=yaml>
apiVersion: node.k8s.io/v1
handler: nvidia
kind: RuntimeClass
metadata:
  labels:
    app.kubernetes.io/component: gpu-operator
  name: nvidia
</syntaxHighlight>

Following [https://github.com/NVIDIA/k8s-device-plugin#deployment-via-helm k8s-device-plugin], install the helm chart with <code>runtimeClassName: nvidia</code> set. In order to pass the Nvidia card through to the container, your deployment's pod spec must contain:

<syntaxHighlight lang=yaml>
runtimeClassName: nvidia
env:
  - name: NVIDIA_VISIBLE_DEVICES
    value: all
  - name: NVIDIA_DRIVER_CAPABILITIES
    value: all
</syntaxHighlight>

To test that it is working, exec into a pod and run <code>nvidia-smi</code>. For more configurability of Nvidia-related matters in k3s, see the [https://docs.k3s.io/advanced#nvidia-container-runtime-support k3s docs].

== Storage ==

=== Longhorn ===

NixOS configuration required for Longhorn:

<syntaxHighlight lang=nix>
environment.systemPackages = [ pkgs.nfs-utils ];
services.openiscsi = {
  enable = true;
  name = "${config.networking.hostName}-initiatorhost";
};
</syntaxHighlight>

The Longhorn containers have trouble with the NixOS path. The solution is to override the PATH environment variable, such as:

<syntaxHighlight lang=bash>
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/run/wrappers/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin
</syntaxHighlight>

==== Kyverno Policy for Fixing Longhorn Container for NixOS ====

<syntaxHighlight lang=yaml>
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: longhorn-nixos-path
  namespace: longhorn-system
data:
  PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/run/wrappers/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: longhorn-add-nixos-path
  annotations:
    policies.kyverno.io/title: Add Environment Variables from ConfigMap
    policies.kyverno.io/subject: Pod
    policies.kyverno.io/category: Other
    policies.kyverno.io/description: >-
      Longhorn invokes executables on the host system, and needs
      to be aware of the host system's PATH. This modifies all
      deployments such that the PATH is explicitly set to support
      NixOS-based systems.
spec:
  rules:
    - name: add-env-vars
      match:
        resources:
          kinds:
            - Pod
          namespaces:
            - longhorn-system
      mutate:
        patchStrategicMerge:
          spec:
            initContainers:
              - (name): "*"
                envFrom:
                  - configMapRef:
                      name: longhorn-nixos-path
            containers:
              - (name): "*"
                envFrom:
                  - configMapRef:
                      name: longhorn-nixos-path
---
</syntaxHighlight>

=== NFS ===

NixOS configuration required for NFS:

<syntaxHighlight lang=nix>
boot.supportedFilesystems = [ "nfs" ];
services.rpcbind.enable = true;
</syntaxHighlight>

== Upkeep ==

=== Cluster Reset ===

Disable the K3s instances on '''all''' hosts. In your NixOS configuration, set:

<syntaxHighlight lang=nix>
services.k3s.enable = false;
</syntaxHighlight>

Rebuild NixOS. This removes the K3s service files, but it won't delete the K3s data.

To delete the K3s files:

Unmount the kubelet:

<syntaxHighlight lang=bash>
KUBELET_PATH=$(mount | grep kubelet | cut -d' ' -f3)
${KUBELET_PATH:+umount $KUBELET_PATH}
</syntaxHighlight>

Delete the K3s data:

<syntaxHighlight lang=bash>
rm -rf /etc/rancher/{k3s,node}
rm -rf /var/lib/{rancher/k3s,kubelet,longhorn,etcd,cni}
</syntaxHighlight>

When using etcd, reset etcd as well:

Make sure '''all''' K3s instances are stopped, because a single running instance can re-seed the etcd database with the previous cryptographic key.

Disable the etcd database in your NixOS configuration:

<syntaxHighlight lang=nix>
services.etcd.enable = false;
</syntaxHighlight>

Rebuild NixOS.

Delete the etcd files:

<syntaxHighlight lang=bash>
rm -rf /var/lib/etcd/
</syntaxHighlight>

Reboot the hosts.

Then, in your NixOS configuration:

# Re-enable etcd first. Rebuild NixOS. Verify service health (<code>systemctl status etcd</code>).
# Re-enable K3s second. Rebuild NixOS. Verify service health (<code>systemctl status k3s</code>).

The etcd and K3s clusters will be provisioned anew.

Tip: Use Ansible to automate the reset routine, like [https://gist.github.com/superherointj/d496714ddf218bdcd1c303dbfd834a5b this one].

== Troubleshooting ==

=== Raspberry Pi not working ===

If the k3s.service/k3s server does not start and gives you the error <code>FATA[0000] failed to find memory cgroup (v2)</code>, see the GitHub issue: https://github.com/k3s-io/k3s/issues/2067

To fix the problem, add the following to your configuration.nix:

<syntaxHighlight lang=nix>
boot.kernelParams = [
  "cgroup_enable=cpuset"
  "cgroup_memory=1"
  "cgroup_enable=memory"
];
</syntaxHighlight>

=== FailedKillPod: failed to get network "cbr0" cached result ===

<code>KillPodSandboxError: failed to get network "cbr0" cached result: decoding version from network config: unexpected end of JSON input</code>

Workaround: https://github.com/k3s-io/k3s/issues/6185#issuecomment-1581245331

== Release support ==

Documented [https://github.com/NixOS/nixpkgs/tree/master/pkgs/applications/networking/cluster/k3s#upstream-release-cadence-and-support here].

[[Category:Applications]]
[[Category:Server]]
[[Category:orchestration]]