
# Talos cluster

A comprehensive, step-by-step guide to deploying a multi-node Talos Linux Kubernetes cluster from scratch.

## What is Talos Linux?

Talos Linux is a modern, secure, and minimal Linux distribution designed specifically for Kubernetes. Unlike traditional Linux distributions, Talos is:

* **Immutable**: The OS cannot be modified at runtime, ensuring consistency
* **API-managed**: All configuration is done via a declarative API
* **Minimal**: Only includes what's necessary to run Kubernetes
* **Secure by default**: No SSH, no shell, reduced attack surface

Learn more in the [official Talos documentation](https://docs.siderolabs.com/talos/v1.11/getting-started/getting-started).

## Prerequisites

### 1. Control Machine Setup

You need a Linux machine (physical, VM, or WSL2) to orchestrate the deployment. This machine will NOT be part of the cluster; it's just your workstation.

* **Install talosctl (Talos CLI)**
* **Install kubectl (Kubernetes CLI)**
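
Before continuing, it can help to confirm that both tools are on your `PATH`. The following is a minimal sketch; `check_tools` is a hypothetical helper name used for illustration, not part of either CLI.

```bash
# check_tools: report whether each named command is available on PATH.
# Returns non-zero if any of them is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

On the control machine you would call it as `check_tools talosctl kubectl` and install whatever it reports as missing.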

### 2. Prepare Cluster Nodes

You need physical or virtual machines with Talos Linux installed. For this guide, we'll assume:

* **1 control plane node**: Will run the Kubernetes control plane and etcd
* **2 worker nodes**: Will run your application workloads

**Minimum requirements per node:**

* 2+ CPU cores (4 recommended)
* 4GB RAM (8GB recommended for control plane)
* 50GB+ disk space
* Network connectivity between all nodes

## Getting Your Talos Factory Image

Talos uses a "factory" system to generate custom installation images with system extensions (drivers, plugins, etc.). You need to determine which image matches your hardware.

### Step 1: Visit the Talos Image Factory

Go to [https://factory.talos.dev/](https://factory.talos.dev/)

### Step 2: Select System Extensions (if needed)

Common extensions include:

* **qemu-guest-agent**: For Proxmox/QEMU VMs
* **iscsi-tools**: For iSCSI storage
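* **util-linux-tools** (listed as `siderolabs/util-linux-tools`): Additional util-linux utilities

For a basic setup without special hardware, you can use the default image with no extensions. The last two extensions (`iscsi-tools` and `util-linux-tools`) are used when setting up CSI storage, so we highly recommend installing them.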

### Step 3: Copy Your Image Identifier

The factory will generate an image URL that looks like:

```
https://factory.talos.dev/image/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b/v1.12.0/metal-amd64.iso
```

The long hexadecimal string in the URL is the schematic ID, which is unique to your selected extensions. Save it; you'll need it in the following sections.
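
The identifier and version can be pulled out of that URL with shell parameter expansion. This is a sketch that assumes the URL layout shown above (`/image/<id>/<version>/<file>`). The installer reference it prints follows the `factory.talos.dev/installer/<id>:<version>` form used in the Upgrading section, so check it against what the Image Factory page shows for your selection. The variable names are illustrative only.

```bash
# Example URL copied from the Image Factory (the one shown above).
ISO_URL="https://factory.talos.dev/image/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b/v1.12.0/metal-amd64.iso"

# Drop everything up to and including "/image/", leaving "<id>/<version>/<file>".
rest="${ISO_URL#*/image/}"

# The identifier is the first path segment.
SCHEMATIC_ID="${rest%%/*}"

# The version is the second path segment.
rest="${rest#*/}"
TALOS_VERSION="${rest%%/*}"

echo "schematic: $SCHEMATIC_ID"
echo "version:   $TALOS_VERSION"
echo "installer: factory.talos.dev/installer/${SCHEMATIC_ID}:${TALOS_VERSION}"
```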

### Step 4: Boot the VMs with the Talos Image

Boot the three VMs from this image. Once booted, each VM displays the Talos console dashboard, which shows the state of the machine and its IP address.

## Environment Configuration

Now let's set up your environment variables. These will be used throughout the deployment process. This guide uses a single control plane node; for a highly available cluster, you should have three.

Open a terminal and export these variables (adjust IPs to match your infrastructure):

```bash theme={null}
# Cluster identification
export CLUSTER_NAME="my-talos-cluster"
export CONTROL_PLANE_IP="192.168.1.101"  # First control plane node

# Where to store generated configurations
export CONFIG_PATH="$HOME/.config/talos-cluster"

# Your control plane node IPs (space-separated)
export CONTROL_PLANE_NODES="192.168.1.101"

# Your worker node IPs (space-separated)
export WORKER_NODES="192.168.1.104 192.168.1.105"

# Talos installer image: the schematic ID from the previous section plus the Talos version
export INSTALL_IMAGE="factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.12.0"
```

**Important:** These variables only persist in your current shell session. If you close your terminal, you'll need to export them again, or add them to a script/file that you can source.

**Pro tip:** Save these to a file for easy reuse:

```bash theme={null}
cat > ~/talos-cluster-env.sh << 'EOF'
export CLUSTER_NAME="my-talos-cluster"
export CONTROL_PLANE_IP="192.168.1.101"
export CONFIG_PATH="$HOME/.config/talos-cluster"
export CONTROL_PLANE_NODES="192.168.1.101"
export WORKER_NODES="192.168.1.104 192.168.1.105"
export INSTALL_IMAGE="factory.talos.dev/installer/88d1f7a5c4f1d3aba7df787c448c1d3d008ed29cfb34af53fa0df4336a56040b:v1.12.0"
EOF

# Load them anytime with:
source ~/talos-cluster-env.sh
```
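
Because later steps fail in confusing ways when a variable is empty, a small guard can be worth running first. This is a sketch; `require_vars` is a hypothetical helper, not something provided by `talosctl`.

```bash
# require_vars: fail if any of the named environment variables is unset or empty.
require_vars() {
  local name value status=0
  for name in "$@"; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      status=1
    fi
  done
  return "$status"
}
```

For this guide the call would be `require_vars CLUSTER_NAME CONTROL_PLANE_IP CONFIG_PATH CONTROL_PLANE_NODES WORKER_NODES INSTALL_IMAGE`, which returns non-zero and names each missing variable.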

## Step-by-Step Deployment

### Step 1: Generate Cluster Configuration Files

Talos needs configuration files that define how each node should behave. We'll generate these now.

```bash theme={null}
# Create output directory
mkdir -p "$CONFIG_PATH"

# Generate configurations
talosctl gen config \
  "$CLUSTER_NAME" \
  "https://$CONTROL_PLANE_IP:6443" \
  --output-dir "$CONFIG_PATH" \
  --install-image "$INSTALL_IMAGE"
```

**What's happening here:**

* `talosctl gen config` creates three files:
  * `controlplane.yaml`: Configuration for control plane nodes
  * `worker.yaml`: Configuration for worker nodes
  * `talosconfig`: Authentication credentials for managing the cluster
* The `--install-image` flag tells Talos which image to use (with your hardware-specific extensions)
* The cluster endpoint URL (`https://$CONTROL_PLANE_IP:6443`) is where kubectl will connect

**Expected output:**

```
generating PKI and tokens
Created /home/user/.config/talos-cluster/controlplane.yaml
Created /home/user/.config/talos-cluster/worker.yaml
Created /home/user/.config/talos-cluster/talosconfig
```

**What just happened:**

* Talos generated cryptographic certificates for secure communication
* Created machine configurations with your cluster name and endpoint
* Generated credentials for you to manage the cluster

**Merge talosconfig into your local configuration:**

```bash theme={null}
talosctl config merge "$CONFIG_PATH/talosconfig"
```

This allows `talosctl` commands to authenticate with your cluster.
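
Before pushing anything to the nodes, you may want to confirm that the three files exist and that the machine configs pass `talosctl validate`. The sketch below assumes `--mode metal` is the right validation mode for nodes booted from the ISO; `check_generated_configs` is a hypothetical helper name.

```bash
# check_generated_configs: confirm the three generated files exist in the given
# directory, then validate both machine configs.
check_generated_configs() {
  local dir="$1" f
  for f in controlplane.yaml worker.yaml talosconfig; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      return 1
    fi
  done
  for f in controlplane.yaml worker.yaml; do
    talosctl validate --config "$dir/$f" --mode metal || return 1
  done
}
```

Run it as `check_generated_configs "$CONFIG_PATH"`; a non-zero exit means a file is missing or failed validation.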

### Step 2: Apply Configuration to Control Plane Nodes

Now we'll push the control plane configuration to each control plane node. This tells them they're control plane nodes and gives them their identity.

```bash theme={null}
# Apply to each control plane node
for IP in $CONTROL_PLANE_NODES; do
  echo "→ Configuring control plane node: $IP"
  talosctl apply-config \
    --insecure \
    --nodes "$IP" \
    --file "$CONFIG_PATH/controlplane.yaml"
  echo "✓ Configuration applied to $IP"
done
```

**What's happening here:**

* `--insecure` flag: Required on first boot because nodes don't have certificates yet
* `--nodes`: Specifies which node to configure
* `--file`: The configuration file to apply
* After applying, each node will reboot to apply the configuration

**What you'll see:**

```
→ Configuring control plane node: 192.168.1.101
✓ Configuration applied to 192.168.1.101
```

**Wait 2-5 minutes** for all control plane nodes to complete their reboot cycle.

### Step 3: Apply Configuration to Worker Nodes

Same process, but for worker nodes. These nodes will run your application workloads.

**Execute:**

```bash theme={null}
# Apply to each worker node
for IP in $WORKER_NODES; do
  echo "→ Configuring worker node: $IP"
  talosctl apply-config \
    --insecure \
    --nodes "$IP" \
    --file "$CONFIG_PATH/worker.yaml"
  echo "✓ Configuration applied to $IP"
done
```

**Wait another 2-5 minutes** for worker nodes to reboot and apply configuration.

**Verify nodes are responsive:**

```bash theme={null}
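# The endpoint is the control plane node that relays requests to each target node

# Check control plane nodes
for IP in $CONTROL_PLANE_NODES; do
  echo "Checking $IP..."
  talosctl --endpoints "$CONTROL_PLANE_IP" --nodes "$IP" version
done

# Check worker nodes
for IP in $WORKER_NODES; do
  echo "Checking $IP..."
  talosctl --endpoints "$CONTROL_PLANE_IP" --nodes "$IP" version
done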
```

You should see version information for each node. If you get connection errors, wait a bit longer; nodes might still be rebooting.
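
Instead of waiting a fixed number of minutes, the check can be retried until it succeeds. This is a minimal sketch of a generic retry loop; `retry_until` is a hypothetical helper, and the attempt count and delay are values you would tune yourself.

```bash
# retry_until: run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry_until <max_attempts> <delay_seconds> <command> [args...]
retry_until() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `retry_until 30 10 talosctl --endpoints "$CONTROL_PLANE_IP" --nodes "$IP" version` would poll a node every 10 seconds for up to roughly five minutes.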

### Step 4: Bootstrap the Cluster

This is the critical step where we initialize etcd and start Kubernetes. **This must only be done once per cluster.** Select one control plane node to bootstrap from (in our case, we only have one).

```bash theme={null}
# Set the endpoint and node for talosctl
talosctl config endpoint "$CONTROL_PLANE_IP"
talosctl config node "$CONTROL_PLANE_IP"

# Bootstrap the cluster
echo "Bootstrapping etcd and Kubernetes control plane..."
talosctl bootstrap
```

**What's happening here:**

* We're telling `talosctl` which node to communicate with (the first control plane node)
* The `bootstrap` command initializes etcd on this node
* etcd is the distributed database that stores all Kubernetes state
* Other control plane nodes will automatically join the etcd cluster
* Kubernetes control plane components will start

**⚠️ CRITICAL WARNING:**

DO NOT run `talosctl bootstrap` more than once. Running it again will destroy your etcd cluster and you'll lose all data. If you accidentally run it twice, you'll need to wipe all nodes and start over.
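
If you script the deployment, one way to protect against an accidental second run is a local marker file. This is only a sketch: the marker is a convention of this example (Talos knows nothing about it), `bootstrap_once` is a hypothetical helper, and it guards only the workstation where the marker lives.

```bash
# bootstrap_once: run `talosctl bootstrap` only if a local marker file is absent.
bootstrap_once() {
  local marker="$1"
  if [ -e "$marker" ]; then
    echo "bootstrap already recorded in $marker; skipping"
    return 0
  fi
  talosctl bootstrap || return 1
  # Record success so later runs of the script skip the bootstrap step.
  date > "$marker"
}
```

A deployment script would call `bootstrap_once "$CONFIG_PATH/.bootstrapped"` in place of the bare `talosctl bootstrap`.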

**What's happening in the background:**

1. First control plane node starts etcd
2. etcd creates the initial cluster state
3. Kubernetes API server starts
4. Controller manager and scheduler start
5. Other control plane nodes detect the running cluster and join
6. Worker nodes connect to the API server and join as compute nodes

**Wait 2-3 minutes** for Kubernetes to fully initialize.

**Verify the bootstrap:**

```bash theme={null}
# Check etcd health
talosctl etcd members

# Check services
talosctl services
```

You should see etcd, kubelet, and other services running.

### Step 5: Retrieve Kubeconfig

Now we need the kubeconfig file so `kubectl` can communicate with your cluster.

```bash theme={null}
# Retrieve kubeconfig
talosctl kubeconfig "$CONFIG_PATH"
```

**What's happening here:**

* Talos generates a kubeconfig file with admin credentials
* This file contains the cluster endpoint and authentication certificates
* kubectl will use this to communicate with the Kubernetes API

**Expected output:**

```
downloading kubeconfig from node 192.168.1.101
kubeconfig written to /home/user/.config/talos-cluster/kubeconfig
```

**Set KUBECONFIG environment variable:**

```bash theme={null}
export KUBECONFIG="$CONFIG_PATH/kubeconfig"
```

**Verify kubectl connectivity:**

```bash theme={null}
kubectl cluster-info
```

**Expected output:**

```
Kubernetes control plane is running at https://192.168.1.101:6443
CoreDNS is running at https://192.168.1.101:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
```
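
A reachable API server does not yet mean every node has joined. The sketch below counts nodes whose `STATUS` column is exactly `Ready` in `kubectl get nodes` output; both function names are hypothetical helpers, and the expected total of 3 matches the one control plane node and two workers assumed in this guide.

```bash
# count_ready_nodes: print how many nodes report a STATUS of exactly "Ready".
count_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# all_nodes_ready: succeed only when the Ready count equals the expected total.
all_nodes_ready() {
  local expected="$1"
  [ "$(count_ready_nodes)" -eq "$expected" ]
}
```

Calling `all_nodes_ready 3` returns zero once all three nodes are `Ready`. `kubectl wait --for=condition=Ready nodes --all --timeout=300s` is a built-in alternative if you only need to block until the nodes that have already registered become ready.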

## Understanding What We Built

### Architecture Overview

Your cluster now has:

**Control Plane Node (1):**

* Running etcd (distributed database)
* Running Kubernetes API server
* Running scheduler (assigns pods to nodes)
* Running controller manager (maintains desired state)

**Worker Nodes (2):**

* Running kubelet (node agent)
* Running container runtime
* Ready to run your application pods

**Networking:**

* Flannel CNI for pod networking

### Completely Reset a Node

If a node is broken beyond repair:

```bash theme={null}
talosctl reset --nodes <NODE_IP> --graceful=false --reboot
```

Then reapply configuration from Step 2 or 3.

## Upgrading Talos

When a new Talos version is released:

```bash theme={null}
# Upgrade nodes one at a time, starting with the control plane node.
# Replace v1.12.0 with the version you are upgrading to.
talosctl upgrade --nodes 192.168.1.101 \
  --image factory.talos.dev/installer/<YOUR_EXTENSIONS>:v1.12.0

# Wait for the node to come back up, then upgrade the next node
talosctl upgrade --nodes 192.168.1.104 \
  --image factory.talos.dev/installer/<YOUR_EXTENSIONS>:v1.12.0

# Continue for remaining nodes...
```
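
The same sequence can be expressed as a loop that stops at the first failure. This is a sketch, not an official procedure: `upgrade_nodes` is a hypothetical helper, and it relies on `talosctl upgrade` waiting for each node to finish before returning, which is my understanding of its default behavior.

```bash
# upgrade_nodes: upgrade nodes one at a time, stopping at the first failure.
# Usage: upgrade_nodes <installer_image> <node_ip> [node_ip...]
upgrade_nodes() {
  local image="$1" ip
  shift
  for ip in "$@"; do
    echo "→ Upgrading $ip to $image"
    if ! talosctl upgrade --nodes "$ip" --image "$image"; then
      echo "upgrade failed on $ip; not continuing" >&2
      return 1
    fi
  done
}
```

With the variables from this guide, the call would look like `upgrade_nodes "$NEW_IMAGE" $CONTROL_PLANE_NODES $WORKER_NODES`, where `NEW_IMAGE` is an illustrative variable holding your installer reference for the target version. The node lists are left unquoted so that each IP becomes its own argument.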


**Congratulations! You now have a multi-node Kubernetes cluster running on Talos Linux! 🎉**
