Welcome

Welcome to the Sidero documentation.

Community

If you’re interested in this project and would like to help in engineering efforts, or have general usage questions, we are happy to have you! We hold a weekly meeting that all audiences are welcome to attend.

Office Hours

You can subscribe to this meeting by joining the community forum.

1 - Overview

1.1 - Introduction

Sidero (“Iron” in Greek) is a project created by the Talos Systems team. The goal of this project is to provide lightweight, composable tools that can be used to create bare-metal Talos + Kubernetes clusters. These tools are built around the Cluster API project. Sidero is also a subproject of Talos Systems’ Arges project, which will publish known-good versions of these components (along with others) with each release.

Overview

Sidero is currently made up of three components:

  • Metal Metadata Server: Provides a Cluster API (CAPI)-aware metadata server
  • Metal Controller Manager: Provides custom resources and controllers for managing the lifecycle of metal machines
  • Cluster API Provider Sidero (CAPS): A Cluster API infrastructure provider that makes use of the pieces above to spin up Kubernetes clusters

Sidero also needs these co-requisites in order to be useful:

  • Cluster API
  • Cluster API Bootstrap Provider Talos (CABPT)
  • Cluster API Control Plane Provider Talos (CACPPT)

All components mentioned above can be installed using Cluster API’s clusterctl tool.

Because of the design of Cluster API, there is inherently a “chicken and egg” problem with needing an existing Kubernetes cluster in order to provision the management plane. Talos Systems and the Cluster API community have created tools to help make this transition easier. That being said, the management plane cluster does not have to be based on Talos. If you would, however, like to use Talos as the OS of choice for the Sidero management plane, you can find a number of ways to deploy Talos in the documentation.

1.2 - Installation

As of Cluster API version 0.3.9, Sidero is included as a default infrastructure provider in clusterctl.

To install Sidero and the other Talos providers, simply issue:

clusterctl init -b talos -c talos -i sidero

1.3 - Architecture

The overarching architecture of Sidero centers around a “management plane”. This plane is expected to serve as a single interface upon which administrators can create, scale, upgrade, and delete Kubernetes clusters. At a high level, the management plane and the clusters it creates should look something like this:

(Diagram: the Sidero management plane and the workload clusters it creates.)

1.4 - Resources

Sidero, the Talos bootstrap/controlplane providers, and Cluster API each provide several custom resources (CRDs) to Kubernetes. These CRDs are crucial to understanding the connections between each provider and in troubleshooting problems. It may also help to look at the cluster template to get an idea of the relationships between these.


Cluster API (CAPI)

It’s worth defining the most basic resources that CAPI provides first, as they are related to several subsequent resources below.

Cluster

Cluster is the highest level CAPI resource. It allows users to specify things like the network layout of the cluster, and contains references to the infrastructure and control plane resources that will be used to create the cluster.
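
A minimal sketch of a CAPI v1alpha3 Cluster as used with Sidero (the names and CIDR blocks here are illustrative assumptions, not required values):

apiVersion: cluster.x-k8s.io/v1alpha3
kind: Cluster
metadata:
  name: management-plane
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - 10.244.0.0/16
    services:
      cidrBlocks:
        - 10.96.0.0/12
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
    kind: TalosControlPlane
    name: management-plane-cp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: MetalCluster
    name: management-plane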

Machines

Machine represents an infrastructure component hosting a Kubernetes node. It allows for specification of things like the Kubernetes version, and contains a reference to the infrastructure resource that relates to this machine.

MachineDeployments

A MachineDeployment is to Machines what a Deployment is to Pods in core Kubernetes: it allows for the specification of a number of Machine replicas with a given specification.
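
An abridged sketch of a MachineDeployment referencing the Talos and Sidero templates described below (names, replica count, and version are illustrative):

apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineDeployment
metadata:
  name: management-plane-workers
spec:
  clusterName: management-plane
  replicas: 2
  template:
    spec:
      clusterName: management-plane
      version: v1.20.1
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
          kind: TalosConfigTemplate
          name: management-plane-workers
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: MetalMachineTemplate
        name: management-plane-workers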


Cluster API Bootstrap Provider Talos (CABPT)

TalosConfigs

The TalosConfig resource allows a user to specify the type (init, controlplane, join) for a given machine. The bootstrap provider will then generate a Talos machine configuration for that machine. This resource also provides the ability to pass a full, pre-generated machine configuration. Finally, users have the ability to pass configPatches, which are applied to edit a generated machine configuration with user-defined settings. The TalosConfig corresponds to the bootstrap sections of Machines, MachineDeployments, and the controlPlaneConfig section of TalosControlPlanes.
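
For illustration, a TalosConfig that asks the bootstrap provider to generate a controlplane configuration and applies a single patch might look like this (a sketch; the name and patch are example values):

apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: TalosConfig
metadata:
  name: management-plane-cp-0
spec:
  generateType: controlplane
  configPatches:
    - op: replace
      path: /machine/install/disk
      value: /dev/sda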

TalosConfigTemplates

TalosConfigTemplates are similar to the TalosConfig above, but used when specifying a bootstrap reference in a MachineDeployment.


Cluster API Control Plane Provider Talos (CACPPT)

TalosControlPlanes

The control plane provider presents a single CRD, the TalosControlPlane. This resource is similar to MachineDeployments, but is targeted exclusively for the Kubernetes control plane nodes. The TalosControlPlane allows for specification of the number of replicas, version of Kubernetes for the control plane nodes, references to the infrastructure resource to use (infrastructureTemplate section), as well as the configuration of the bootstrap data via the controlPlaneConfig section. This resource is referred to by the CAPI Cluster resource via the controlPlaneRef section.
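
A sketch of a TalosControlPlane tying these pieces together (the replica count, version, and names are illustrative):

apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: TalosControlPlane
metadata:
  name: management-plane-cp
spec:
  replicas: 3
  version: v1.20.1
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: MetalMachineTemplate
    name: management-plane-cp
  controlPlaneConfig:
    init:
      generateType: init
    controlplane:
      generateType: controlplane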


Sidero

Cluster API Provider Sidero (CAPS)

MetalClusters

A MetalCluster is Sidero’s view of the cluster resource. This resource allows users to define the control plane endpoint that corresponds to the Kubernetes API server. This resource corresponds to the infrastructureRef section of Cluster API’s Cluster resource.
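
A sketch of a MetalCluster (the endpoint shown is an illustrative assumption):

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: MetalCluster
metadata:
  name: management-plane
spec:
  controlPlaneEndpoint:
    host: 192.168.254.10
    port: 6443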

MetalMachines

A MetalMachine is Sidero’s view of a machine. Allows for reference of a single server or a server class from which a physical server will be picked to bootstrap.
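
A sketch of a MetalMachine that selects its server via a server class (names are illustrative):

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: MetalMachine
metadata:
  name: management-plane-cp-0
spec:
  serverClassRef:
    apiVersion: metal.sidero.dev/v1alpha1
    kind: ServerClass
    name: default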

MetalMachineTemplates

A MetalMachineTemplate is similar to a MetalMachine above, but serves as a template that is reused for resources like MachineDeployments or TalosControlPlanes that allocate multiple Machines at once.

ServerBindings

ServerBindings represent a one-to-one mapping between a Server resource and a MetalMachine resource. A ServerBinding is used internally to keep track of servers that are allocated to a Kubernetes cluster and used to make decisions on cleaning and returning servers to a ServerClass upon deallocation.
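
Since ServerBindings are managed by Sidero itself, they are mainly useful for inspection, for example:

kubectl get serverbindings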

Metal Controller Manager

Environments

These define a desired deployment environment for Talos, including things like which kernel to use, kernel args to pass, and the initrd to use. Sidero allows you to define a default environment, as well as other environments that may be specific to a subset of nodes. You can override the environment at the ServerClass or Server level if you have requirements for different kernels or kernel parameters.

See the Environments section of our Configuration docs for examples and more detail.

Servers

These represent physical machines as resources in the management plane. These Servers are created when the physical machine PXE boots and completes a “discovery” process in which it registers with the management plane and provides SMBIOS information such as the CPU manufacturer and version, and memory information.

See the Servers section of our Configuration docs for examples and more detail.

ServerClasses

ServerClasses are a grouping of the Servers mentioned above, grouped to create classes of servers based on Memory, CPU or other attributes. These can be used to compose a bank of Servers that are eligible for provisioning.

See the ServerClasses section of our Configuration docs for examples and more detail.

Metal Metadata Server

While the metadata server does not present unique CRDs within Kubernetes, it’s important to understand the metadata resources that are returned to physical servers during the boot process.

Metadata

The metadata server may be familiar to you if you have used cloud environments previously. Using Talos machine configurations created by the Talos Cluster API bootstrap provider, along with patches specified by editing Server/ServerClass resources or TalosConfig/TalosControlPlane resources, metadata is returned to servers that query the metadata server at boot time.

See the Metadata section of our Configuration docs for examples and more detail.

2 - Resource Configuration

2.1 - Environments

Environments are a custom resource provided by the Metal Controller Manager. An environment is a codified description of what should be returned by the PXE server when a physical server attempts to PXE boot.

Especially important in the environment types are the kernel args. From here, one can tweak the IP to the metadata server as well as various other kernel options that Talos and/or the Linux kernel supports.

Environments can be supplied to a given server either at the Server or the ServerClass level. The hierarchy from most to least respected is:

  • .spec.environmentRef provided at Server level
  • .spec.environmentRef provided at ServerClass level
  • "default" Environment created by administrator

A sample environment definition looks like this:

apiVersion: metal.sidero.dev/v1alpha1
kind: Environment
metadata:
  name: default
spec:
  kernel:
    url: "https://github.com/talos-systems/talos/releases/download/v0.8.1/vmlinuz-amd64"
    sha512: ""
    args:
      - init_on_alloc=1
      - init_on_free=1
      - slab_nomerge
      - pti=on
      - consoleblank=0
      - random.trust_cpu=on
      - ima_template=ima-ng
      - ima_appraise=fix
      - ima_hash=sha512
      - console=tty0
      - console=ttyS1,115200n8
      - earlyprintk=ttyS1,115200n8
      - panic=0
      - printk.devkmsg=on
      - talos.platform=metal
      - talos.config=http://$PUBLIC_IP:9091/configdata?uuid=
  initrd:
    url: "https://github.com/talos-systems/talos/releases/download/v0.8.1/initramfs-amd64.xz"
    sha512: ""

Example of overriding "default" Environment at the Server level:

apiVersion: metal.sidero.dev/v1alpha1
kind: Server
...
spec:
  environmentRef:
    namespace: default
    name: boot
  ...

Example of overriding "default" Environment at the ServerClass level:

apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
...
spec:
  environmentRef:
    namespace: default
    name: boot
  ...

2.2 - Servers

Servers are the basic resource of bare metal in the Metal Controller Manager. These are created by PXE booting the servers and allowing them to send a registration request to the management plane.

An example server may look like the following:

apiVersion: metal.sidero.dev/v1alpha1
kind: Server
metadata:
  name: 00000000-0000-0000-0000-d05099d333e0
spec:
  accepted: false
  configPatches:
    - op: replace
      path: /cluster/network/cni
      value:
        name: custom
        urls:
          - http://192.168.1.199/assets/cilium.yaml
  cpu:
    manufacturer: Intel(R) Corporation
    version: Intel(R) Atom(TM) CPU C3558 @ 2.20GHz
  system:
    family: Unknown
    manufacturer: Unknown
    productName: Unknown
    serialNumber: Unknown
    skuNumber: Unknown
    version: Unknown

Installation Disk

An installation disk is required by Talos on bare metal. This can be specified in a configPatch:

apiVersion: metal.sidero.dev/v1alpha1
kind: Server
...
spec:
  accepted: false
  configPatches:
    - op: replace
      path: /machine/install/disk
      value: /dev/sda

The install disk patch can also be set on the ServerClass:

apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
...
spec:
  configPatches:
    - op: replace
      path: /machine/install/disk
      value: /dev/sda

Server Acceptance

In order for a server to be eligible for consideration, it must be accepted. This is an important separation point which all Servers must pass. Before a Server is accepted, no write action will be performed against it. Thus, it is safe for a computer to be added to a network on which Sidero is operating. Sidero will never write to or wipe any disk on a computer which is not marked as accepted.

This can be tedious for systems in which all attached computers should be considered to be under the control of Sidero. Thus, you may also choose to automatically accept any machine into Sidero on its discovery. Please keep in mind that this means that any newly-connected computer WILL BE WIPED automatically. You can enable auto-acceptance by passing the --auto-accept-servers=true flag to sidero-controller-manager.
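
As a sketch of one way to enable this on an existing installation (assuming the manager is the second container in the sidero-controller-manager deployment, as in the patches shown in the Bootstrapping guide), you could append the flag with a JSON patch:

kubectl patch deploy -n sidero-system sidero-controller-manager --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/1/args/-", "value": "--auto-accept-servers=true"}]'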

Once accepted, a server will be reset (all disks wiped) and then made available to Sidero.

You should never change an accepted Server to be not accepted while it is in use. Because servers which are not accepted will not be modified, if an accepted Server is changed to not accepted while allocated, its disks will not be wiped when it leaves the cluster.

IPMI

Sidero can use IPMI information to control Server power state, reboot servers and set boot order.

IPMI connection information can be set in the Server spec after initial registration:

apiVersion: metal.sidero.dev/v1alpha1
kind: Server
...
spec:
  bmc:
    endpoint: 10.0.0.25
    user: admin
    pass: password

If IPMI information is set, the server boot order might be set to boot from disk, then network; Sidero will switch servers to PXE boot when required.

Without IPMI info, Sidero can still register servers, wipe them and provision clusters, but Sidero won’t be able to reboot servers once they are removed from the cluster. If IPMI info is not set, servers should be configured to boot first from network, then from disk.

2.3 - Server Classes

Server classes are a way to group distinct server resources. The “qualifiers” key allows the administrator to specify criteria upon which to group these servers. There are currently three keys: cpu, systemInformation, and labelSelectors. Each of these keys accepts a list of entries. The top level keys are a “logical AND”, while the lists under each key are a “logical OR”. Qualifiers that are not specified are not evaluated.

An example:

apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
metadata:
  name: default
spec:
  qualifiers:
    cpu:
      - manufacturer: Intel(R) Corporation
        version: Intel(R) Atom(TM) CPU C3558 @ 2.20GHz
      - manufacturer: Advanced Micro Devices, Inc.
        version: AMD Ryzen 7 2700X Eight-Core Processor
    labelSelectors:
      - "my-server-label": "true"

Servers would only be added to the above class if they had EITHER of the listed CPUs AND the label associated with the server resource.

2.4 - Metadata

The Metadata server manages the Machine metadata. In terms of Talos (the OS on which the Kubernetes cluster is formed), this is the “machine config”, which is used during the automated installation.

Talos Machine Configuration

The configuration of each machine is constructed from a number of sources:

  • The Talos bootstrap provider.
  • The Cluster of which the Machine is a member.
  • The ServerClass which was used to select the Server into the Cluster.
  • Any Server-specific patches.

The base template is constructed from the Talos bootstrap provider, using data from the associated Cluster manifest. Then, any configuration patches are applied from the ServerClass and Server.

Only configuration patches are allowed in the ServerClass and Server resources. These patches take the form of an RFC 6902 JSON (or YAML) patch. An example of the use of this patch method can be found in the Patching Guide.

Also note that while a Server can be a member of any number of ServerClasses, only the ServerClass which is used to select the Server into the Cluster will be used for the generation of the configuration of the Machine. In this way, Servers may have a number of different configuration patch sets based on which Cluster they are in at any given time.

3 - Guides

3.1 - Bootstrapping

A guide for bootstrapping Sidero management plane

Introduction

Imagine a scenario in which you have shown up to a datacenter with only a laptop and your task is to transition a rack of bare metal machines into an HA management plane and multiple Kubernetes clusters created by that management plane. In this guide, we will go through how to create a bootstrap cluster using a Docker-based Talos cluster, provision the management plane, and pivot over to it. Guides around post-pivoting setup and subsequent cluster creation should also be found in the “Guides” section of the sidebar.

Because of the design of Cluster API, there is inherently a “chicken and egg” problem with needing a Kubernetes cluster in order to provision the management plane. Talos Systems and the Cluster API community have created tools to help make this transition easier.

Prerequisites

First, you need to install the latest talosctl by running the following script:

curl -Lo /usr/local/bin/talosctl https://github.com/talos-systems/talos/releases/latest/download/talosctl-$(uname -s | tr "[:upper:]" "[:lower:]")-amd64
chmod +x /usr/local/bin/talosctl

You can read more about Talos and talosctl at talos.dev.

Next, there are two big prerequisites involved with bootstrapping Sidero: routing and DHCP setup.

From the routing side, the laptop from which you are bootstrapping must be accessible by the bare metal machines that we will be booting. In the datacenter scenario described above, the easiest way to achieve this is probably to hook the laptop onto the server rack’s subnet by plugging it into the top-of-rack switch. This is needed for TFTP, PXE booting, and for the ability to register machines with the bootstrap plane.

DHCP configuration is needed to tell the metal servers what their “next server” is when PXE booting. The configuration of this is different for each environment and each DHCP server, thus it’s impossible to give an easy guide. However, here is an example of the configuration for a Ubiquiti EdgeRouter that uses vyatta-dhcpd as the DHCP service:

This block shows the subnet setup, as well as the extra “subnet-parameters” that tell the DHCP server to include the ipxe-metal.conf file.

$ show service dhcp-server shared-network-name MetalDHCP

 authoritative enable
 subnet 192.168.254.0/24 {
     default-router 192.168.254.1
     dns-server 192.168.1.200
     lease 86400
     start 192.168.254.2 {
         stop 192.168.254.252
     }
     subnet-parameters "include "/etc/dhcp/ipxe-metal.conf";"
 }

Here is the ipxe-metal.conf file.

$ cat /etc/dhcp/ipxe-metal.conf

allow bootp;
allow booting;

next-server 192.168.1.150;
if exists user-class and option user-class = "iPXE" {
  filename "http://192.168.1.150:8081/boot.ipxe";
} elsif substring (option vendor-class-identifier, 0, 10) = "HTTPClient" {
  option vendor-class-identifier "HTTPClient";
  filename "http://192.168.1.150:8081/tftp/ipxe.efi";
} else {
  filename "ipxe.efi";
}

host talos-mgmt-0 {
    fixed-address 192.168.254.2;
    hardware ethernet d0:50:99:d3:33:60;
}

Notice that it sets a static address for the management node that I’ll be booting, in addition to providing the “next server” info. This “next server” IP address will match references to PUBLIC_IP found below in this guide.

Create a Local Cluster

The talosctl CLI tool has built-in support for spinning up Talos in docker containers. Let’s use this to our advantage as an easy Kubernetes cluster to start from.

Set an environment variable called PUBLIC_IP which is the “public” IP of your machine. Note that “public” is a bit of a misnomer. We’re really looking for the IP of your machine, not the IP of the node on the docker bridge (ex: 192.168.1.150).

export PUBLIC_IP="192.168.1.150"

We can now create our Docker cluster. Issue the following to create a single-node cluster:

talosctl cluster create \
  -p 69:69/udp,8081:8081/tcp,9091:9091/tcp,50100:50100/tcp \
  --workers 0 \
  --endpoint $PUBLIC_IP

Note that there are several ports mentioned in the command above. These allow us to access the services that will get deployed on this node.

Once the cluster create command is complete, issue talosctl kubeconfig /desired/path to fetch the kubeconfig for this cluster. You should then set your KUBECONFIG environment variable to the path of this file.
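
For example (the path is arbitrary):

talosctl kubeconfig ./kubeconfig
export KUBECONFIG=./kubeconfig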

Untaint Control Plane

Because this is a single node cluster, we need to remove the “NoSchedule” taint on the node to make sure non-controlplane components can be scheduled.

kubectl taint node talos-default-master-1 node-role.kubernetes.io/master:NoSchedule-

Install Sidero

As of Cluster API version 0.3.9, Sidero is included as a default infrastructure provider in clusterctl.

To install Sidero and the other Talos providers, simply issue:

clusterctl init -b talos -c talos -i sidero

Patch Components

We will now want to ensure that the Sidero services that got created are publicly accessible across our subnet. This will allow the metal machines to speak to these services later.

Patch the Metadata Server

Update the metadata server component with the following patches:

## Update args to use 9091 for port
kubectl patch deploy -n sidero-system sidero-metadata-server --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args", "value": ["--port=9091"]}]'

## Tweak container port to match
kubectl patch deploy -n sidero-system sidero-metadata-server --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/ports", "value": [{"containerPort": 9091,"name": "http"}]}]'

## Use host networking
kubectl patch deploy -n sidero-system sidero-metadata-server --type='json' -p='[{"op": "add", "path": "/spec/template/spec/hostNetwork", "value": true}]'

Patch the Metal Controller Manager

## Update args to specify the api endpoint to use for registration
kubectl patch deploy -n sidero-system sidero-controller-manager --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/1/args", "value": ["--api-endpoint='$PUBLIC_IP'","--metrics-addr=127.0.0.1:8080","--enable-leader-election"]}]'

## Use host networking
kubectl patch deploy -n sidero-system sidero-controller-manager --type='json' -p='[{"op": "add", "path": "/spec/template/spec/hostNetwork", "value": true}]'

Register the Servers

At this point, any servers on the same network as Sidero should PXE boot using the Sidero PXE service. To register a server with Sidero, simply turn it on and Sidero will do the rest. Once the registration is complete, you should see the servers registered with kubectl get servers:

$ kubectl get servers -o wide
NAME                                   HOSTNAME        ACCEPTED   ALLOCATED   CLEAN
00000000-0000-0000-0000-d05099d33360   192.168.254.2   false      false       false

Accept the Servers

Note in the output above that the newly registered servers are not accepted. In order for a server to be eligible for consideration, it must be marked as accepted. Before a Server is accepted, no write action will be performed against it. Servers can be accepted by issuing a patch command like:

kubectl patch server 00000000-0000-0000-0000-d05099d33360 --type='json' -p='[{"op": "replace", "path": "/spec/accepted", "value": true}]'

For more information on server acceptance, see the server docs.

Create the Default Environment

We must now create an Environment in our bootstrap cluster. An environment is a CRD that tells the PXE component of Sidero what information to return to nodes that request a PXE boot after completing the registration process above. Things that can be controlled here are kernel flags and the kernel and init images to use.

To create a default environment that will use the latest published Talos release, issue the following:

cat <<EOF | kubectl apply -f -
apiVersion: metal.sidero.dev/v1alpha1
kind: Environment
metadata:
  name: default
spec:
  kernel:
    url: "https://github.com/talos-systems/talos/releases/latest/download/vmlinuz-amd64"
    sha512: ""
    args:
      - initrd=initramfs.xz
      - page_poison=1
      - slab_nomerge
      - slub_debug=P
      - pti=on
      - random.trust_cpu=on
      - ima_template=ima-ng
      - ima_appraise=fix
      - ima_hash=sha512
      - console=tty0
      - console=ttyS1,115200n8
      - earlyprintk=ttyS1,115200n8
      - panic=0
      - printk.devkmsg=on
      - talos.platform=metal
      - talos.config=http://$PUBLIC_IP:9091/configdata?uuid=
  initrd:
    url: "https://github.com/talos-systems/talos/releases/latest/download/initramfs-amd64.xz"
    sha512: ""
EOF

Create Server Class

We must now create a server class to group the servers we registered. This is necessary for using the Talos control plane provider for Cluster API. The qualifiers needed for your server class will differ based on the data provided by your registration flow. See the server class docs for more info on how these work.

Here is an example of how to apply the server class once you have the proper info:

cat <<EOF | kubectl apply -f -
apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
metadata:
  name: default
spec:
  qualifiers:
    cpu:
      - manufacturer: Intel(R) Corporation
        version: Intel(R) Atom(TM) CPU C3558 @ 2.20GHz
EOF

In order to fetch hardware information, you can use:

kubectl get server -o yaml

Note that for a bare-metal setup, you would need to specify an installation disk. See the Installation Disk section of the Servers documentation.

Once created, you should see the servers that make up your server class appear as “available”:

$ kubectl get serverclass
NAME      AVAILABLE                                  IN USE
default   ["00000000-0000-0000-0000-d05099d33360"]   []

Create Management Plane

We are now ready to template out our management plane. Using clusterctl, we can create a cluster manifest with:

clusterctl config cluster management-plane -i sidero > management-plane.yaml

Note that there are several variables that should be set in order for the templating to work properly:

  • CONTROL_PLANE_ENDPOINT: The endpoint used for the Kubernetes API server (e.g. https://1.2.3.4:6443). This is the equivalent of the endpoint you would specify in talosctl gen config. There are a variety of ways to configure a control plane endpoint. Some common ways for an HA setup are to use DNS, a load balancer, or BGP. A simpler method is to use the IP of a single node. This has the disadvantage of being a single point of failure, but it can be a simple way to get running.
  • CONTROL_PLANE_SERVERCLASS: The server class to use for control plane nodes.
  • WORKER_SERVERCLASS: The server class to use for worker nodes.
  • KUBERNETES_VERSION: The version of Kubernetes to deploy (e.g. v1.19.4).
  • CONTROL_PLANE_PORT: The port used for the Kubernetes API server (e.g. 6443).

For instance:

export CONTROL_PLANE_SERVERCLASS=master
export WORKER_SERVERCLASS=worker
export KUBERNETES_VERSION=v1.20.1
export CONTROL_PLANE_PORT=6443
export CONTROL_PLANE_ENDPOINT=1.2.3.4
clusterctl config cluster management-plane -i sidero > management-plane.yaml

In addition, you can specify the replicas for control plane and worker nodes in the management-plane.yaml manifest via the TalosControlPlane and MachineDeployment objects. They can also be scaled later if needed:

kubectl get taloscontrolplane
kubectl get machinedeployment
kubectl scale taloscontrolplane management-plane-cp --replicas=3

Now that we have the manifest, we can simply apply it:

kubectl apply -f management-plane.yaml

NOTE: The templated manifest above is meant to act as a starting point. If customizations are needed to ensure proper setup of your Talos cluster, they should be added before applying.

Once the management plane is set up, you can fetch the talosconfig by using the cluster label. Be sure to update the cluster name and issue the following command:

kubectl get talosconfig \
  -l cluster.x-k8s.io/cluster-name=<CLUSTER NAME> \
  -o jsonpath='{.items[0].status.talosConfig}' > management-plane-talosconfig.yaml

With the talosconfig in hand, the management plane’s kubeconfig can be fetched with talosctl --talosconfig management-plane-talosconfig.yaml kubeconfig.

Pivoting

Once we have the kubeconfig for the management cluster, we now have the ability to pivot the cluster from our bootstrap. Using clusterctl, issue:

clusterctl init --kubeconfig=/path/to/management-plane/kubeconfig -i sidero -b talos -c talos

Followed by:

clusterctl move --to-kubeconfig=/path/to/management-plane/kubeconfig

Upon completion of this command, we can now tear down our bootstrap cluster with talosctl cluster destroy and begin using our management plane as our point of creation for all future clusters!

3.2 - Creating Your First Cluster

A guide for creating your first cluster with the Sidero management plane

Introduction

This guide will detail the steps needed to provision your first bare metal Talos cluster after completing the bootstrap and pivot steps detailed in the previous guide. There will be two main steps in this guide: reconfiguring the Sidero components now that they have been pivoted and the actual cluster creation.

Reconfigure Sidero

Patch Services

In this guide, we will convert the metadata service to a NodePort service and the other services to use host networking. This is also necessary because some protocols like TFTP don’t allow for port configuration. Along with some nodeSelectors and a scale up of the metal controller manager deployment, creating the services this way allows for the creation of DNS names that point to all management plane nodes and provides an HA experience if desired. It should also be noted, however, that there are many options for achieving this functionality. Users can look into projects like MetalLB or KubeRouter with BGP and ECMP if they desire something else.

Metal Controller Manager:

## Use host networking
kubectl patch deploy -n sidero-system sidero-controller-manager --type='json' -p='[{"op": "add", "path": "/spec/template/spec/hostNetwork", "value": true}]'

Metadata Server:

## Convert metadata server service to nodeport
kubectl patch service -n sidero-system sidero-metadata-server --type='json' -p='[{"op": "replace", "path": "/spec/type", "value": "NodePort"}]'

## Set a known nodeport for metadata server
kubectl patch service -n sidero-system sidero-metadata-server --type='json' -p='[{"op": "replace", "path": "/spec/ports", "value": [{"port": 80, "protocol": "TCP", "targetPort": "http", "nodePort": 30005}]}]'

Update Environment

The metadata server’s information needs to be updated in the default environment. Edit the environment with kubectl edit environment default and update the talos.config kernel arg with the IP of one of the management plane nodes (or the DNS entry you created) and the nodeport we specified above (30005).
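
For example, using the management node IP from this guide and the nodePort set above, the relevant arg would become something like:

- talos.config=http://192.168.254.2:30005/configdata?uuid=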

Update DHCP

The DHCP options configured in the previous guide should now be updated to point to your new management plane IP or to the DNS name if it was created.

A revised ipxe-metal.conf file looks like:

allow bootp;
allow booting;

next-server 192.168.254.2;
if exists user-class and option user-class = "iPXE" {
  filename "http://192.168.254.2:8081/boot.ipxe";
} else {
  if substring (option vendor-class-identifier, 15, 5) = "00000" {
    # BIOS
    if substring (option vendor-class-identifier, 0, 10) = "HTTPClient" {
      option vendor-class-identifier "HTTPClient";
      filename "http://192.168.254.2:8081/tftp/undionly.kpxe";
    } else {
      filename "undionly.kpxe";
    }
  } else {
    # UEFI
    if substring (option vendor-class-identifier, 0, 10) = "HTTPClient" {
      option vendor-class-identifier "HTTPClient";
      filename "http://192.168.254.2:8081/tftp/ipxe.efi";
    } else {
      filename "ipxe.efi";
    }
  }
}

host talos-mgmt-0 {
   fixed-address 192.168.254.2;
   hardware ethernet d0:50:99:d3:33:60;
}

Register the Servers

At this point, any servers on the same network as Sidero should PXE boot using the Sidero PXE service. To register a server with Sidero, simply turn it on and Sidero will do the rest. Once the registration is complete, you should see the servers registered with kubectl get servers:

$ kubectl get servers -o wide
NAME                                   HOSTNAME        ACCEPTED   ALLOCATED   CLEAN
00000000-0000-0000-0000-d05099d33360   192.168.254.2   false      false       false

Accept the Servers

Note in the output above that the newly registered servers are not accepted. In order for a server to be eligible for consideration, it must be marked as accepted. Before a Server is accepted, no write action will be performed against it. Servers can be accepted by issuing a patch command like:

kubectl patch server 00000000-0000-0000-0000-d05099d33360 --type='json' -p='[{"op": "replace", "path": "/spec/accepted", "value": true}]'

For more information on server acceptance, see the server docs.

Create the Cluster

The cluster creation process should be identical to what was detailed in the previous guide. Note that, for this example, the same “default” serverclass that we used in the previous guide is used again. Using clusterctl, we can create a cluster manifest with:

clusterctl config cluster workload-cluster -i sidero > workload-cluster.yaml

Note that there are several variables that should be set in order for the templating to work properly:

  • CONTROL_PLANE_ENDPOINT: The endpoint used for the Kubernetes API server (e.g. https://1.2.3.4:6443). This is the equivalent of the endpoint you would specify in talosctl gen config. There are a variety of ways to configure a control plane endpoint. Some common ways for an HA setup are to use DNS, a load balancer, or BGP. A simpler method is to use the IP of a single node. This has the disadvantage of being a single point of failure, but it can be a simple way to get running.
  • CONTROL_PLANE_SERVERCLASS: The server class to use for control plane nodes.
  • WORKER_SERVERCLASS: The server class to use for worker nodes.
  • KUBERNETES_VERSION: The version of Kubernetes to deploy (e.g. v1.19.4).

Now that we have the manifest, we can simply apply it:

kubectl apply -f workload-cluster.yaml

NOTE: The templated manifest above is meant to act as a starting point. If customizations are needed to ensure proper setup of your Talos cluster, they should be added before applying.

Once the workload cluster is set up, you can fetch the talosconfig with a command like:

kubectl get talosconfig workload-cluster-cp-xxx -o jsonpath='{.status.talosConfig}' > workload-cluster-talosconfig.yaml

Then the workload cluster’s kubeconfig can be fetched with talosctl --talosconfig workload-cluster-talosconfig.yaml kubeconfig /desired/path.

3.3 - Patching

A guide describing patching

Server resources can be updated by using the configPatches section of the custom resource. Any field of the Talos machine config can be overridden on a per-machine basis using this method. The format of these patches is based on RFC 6902 (JSON Patch), which you may be used to from tools like kustomize.

Any patches specified in the server resource are processed by the Metal Metadata Server before it returns a Talos machine config for a given server at boot time.

A set of patches may look like this:

apiVersion: metal.sidero.dev/v1alpha1
kind: Server
metadata:
  name: 00000000-0000-0000-0000-d05099d33360
spec:
  configPatches:
    - op: replace
      path: /machine/install
      value:
        disk: /dev/sda
    - op: replace
      path: /cluster/network/cni
      value:
        name: "custom"
        urls:
          - "http://192.168.1.199/assets/cilium.yaml"

Testing Configuration Patches

While developing config patches, it is usually convenient to test the generated config with the patches applied before an actual server is provisioned with it.

This can be achieved by querying the metadata server endpoint directly:

$ curl http://$PUBLIC_IP:9091/configdata?uuid=$SERVER_UUID
version: v1alpha1
...

Replace $PUBLIC_IP with the Sidero IP address and $SERVER_UUID with the name of the Server to test against.

If the metadata endpoint returns an error when applying JSON patches, make sure the config subtree being patched exists in the config. If it doesn’t exist, create it with an op: add patch placed above the op: replace patch.
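
For example, reusing the CNI patch from above and assuming /cluster/network already exists in the generated config, the following sketch first creates the missing cni subtree and then replaces its value:

configPatches:
  - op: add
    path: /cluster/network/cni
    value: {}
  - op: replace
    path: /cluster/network/cni
    value:
      name: custom
      urls:
        - http://192.168.1.199/assets/cilium.yaml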

Combining Patches from Multiple Sources

Config patches might be combined from multiple sources (Server, ServerClass), which is explained in detail in the Metadata section.

3.4 - Provisioning Flow

Diagrams for various flows in Sidero.

graph TD;
    Start(Start);
    End(End);

    %% Decisions

    IsOn{Is server powered on?};
    IsRegistered{Is server registered?};
    IsAccepted{Is server accepted?};
    IsClean{Is server clean?};
    IsAllocated{Is server allocated?};

    %% Actions

    DoPowerOn[Power server on];
    DoPowerOff[Power server off];
    DoBootAgentEnvironment[Boot agent];
    DoBootEnvironment[Boot environment];
    DoRegister[Register server];
    DoWipe[Wipe server];

    %% Chart

    Start-->IsOn;
    IsOn--Yes-->End;
    IsOn--No-->DoPowerOn;

    DoPowerOn--->IsRegistered;

    IsRegistered--Yes--->IsAccepted;
    IsRegistered--No--->DoBootAgentEnvironment-->DoRegister;

    DoRegister-->IsRegistered;

    IsAccepted--Yes--->IsAllocated;
    IsAccepted--No--->End;

    IsAllocated--Yes--->DoBootEnvironment;
    IsAllocated--No--->IsClean;
    IsClean--No--->DoWipe-->DoPowerOff;

    IsClean--Yes--->DoPowerOff;

    DoBootEnvironment-->End;

    DoPowerOff-->End;

Installation Flow

graph TD;
    Start(Start);
    End(End);

    %% Decisions

    IsInstalled{Is installed};

    %% Actions

    DoInstall[Install];
    DoReboot[Reboot];

    %% Chart

    Start-->IsInstalled;
    IsInstalled--Yes-->End;
    IsInstalled--No-->DoInstall;

    DoInstall-->DoReboot;

    DoReboot-->IsInstalled;