Netbooting and Tinkerbell for Bare Metal

Overview of Netbooting and Tinkerbell for EKS Anywhere on Bare Metal

EKS Anywhere uses Tinkerbell to provision machines for a Bare Metal cluster. Understanding what Tinkerbell is and how it works with EKS Anywhere can help you take advantage of advanced provisioning features or overcome provisioning problems you encounter.

As someone deploying an EKS Anywhere cluster on Bare Metal, you have several opportunities to interact with Tinkerbell:

  • Create a hardware CSV file: You are required to create a hardware CSV file that contains an entry for every physical machine you want to add at cluster creation time.
  • Create an EKS Anywhere cluster: By modifying the Bare Metal configuration file used to create a cluster, you can change some Tinkerbell settings or add actions to define how the operating system on each machine is configured.
  • Monitor provisioning: You can follow along with the Tinkerbell Overview in this page to monitor the progress of your hardware provisioning, as Tinkerbell finds machines and attempts to PXE boot, configure, and restart them.

Using Tinkerbell on EKS Anywhere

The sections below step through how Tinkerbell is integrated with EKS Anywhere to deploy a Bare Metal cluster. While based on features described in Tinkerbell Documentation , EKS Anywhere has modified and added to Tinkerbell components such that the entire Tinkerbell stack is now Kubernetes-friendly and can run on a Kubernetes cluster.

Create bare metal CSV file

The information that Tinkerbell uses to provision machines for the target EKS Anywhere cluster needs to be gathered in a CSV file with the following format:

hostname,bmc_ip,bmc_username,bmc_password,mac,ip_address,netmask,gateway,nameservers,labels,disk
eksa-cp01,10.10.44.1,root,PrZ8W93i,CC:48:3A:00:00:01,10.10.50.2,255.255.254.0,10.10.50.1,8.8.8.8,type=cp,/dev/sda
...

Each physical, bare metal machine is represented by a comma-separated list of information on a single line. It includes information needed to identify each machine (the NIC’s MAC address), PXE boot the machine, point to the disk to install on, and then configure and start the installed system. See Preparing hardware inventory for details on the content and format of that file.

Modify the cluster specification file

Before you create a cluster using the Bare Metal configuration file, you can make Tinkerbell-related changes to that file. In particular, TinkerbellDatacenterConfig fields , TinkerbellMachineConfig fields , and Tinkerbell Actions can be added or modified.

Tinkerbell actions vary based on the operating system you choose for your EKS Anywhere cluster. Actions are stored internally and not shown in the generated cluster specification file, so you must add those sections yourself to change from the defaults (see Ubuntu TinkerbellTemplateConfig example and Bottlerocket TinkerbellTemplateConfig example for details).

In most cases, you don’t need to touch the default actions. However, you might want to modify an action (for example to change kexec to a reboot action if the hardware requires it) or add an action to further configure the installed system. Examples in Advanced Bare Metal cluster configuration show a few actions you might want to add.

Once you have made all your modifications, you can go ahead and create the cluster. The next section describes how Tinkerbell works during cluster creation to provision your Bare Metal machines and prepare them to join the EKS Anywhere cluster.

Overview of Tinkerbell in EKS Anywhere

When you run the command to create an EKS Anywhere Bare Metal cluster, a set of Tinkerbell components start up on the Admin machine. One of these components runs in a container on Docker, while other components run as either controllers or services in pods on the Kubernetes kind cluster that is started up on the Admin machine. Tinkerbell components include boots, hegel, rufio, and tink.

Tinkerbell boots service

The boots service runs in a single container to handle the DHCP service and Netbooting activities. In particular, boots hands out IP addresses, serves iPXE binaries via HTTP and TFTP, delivers an iPXE script to the provisioned machines, and runs a syslog server.

Boots is different from the other Tinkerbell services because the DHCP service it runs must listen directly to layer 2 traffic. (The kind cluster running on the Admin machine doesn’t have the ability to have pods listening on layer 2 networks, which is why boots is run directly on Docker instead, with host networking enabled.)

Because boots is running as a container in Docker, you can see the output in the logs for the boots container by running:

docker logs boots

From the logs output, you will see iPXE try to netboot each machine. If the process doesn’t get all the information it wants from the DHCP server, it will time out. You can see iPXE loading variables, loading a kernel and initramfs (via DHCP), then booting into that kernel and initramfs: in other words, you will see everything that happens with iPXE before it switches over to the kernel and initramfs. The kernel, initramfs, and all images retrieved later are obtained remotely over HTTP and HTTPS.

Tinkerbell hegel, rufio, and tink components

After boots comes up on Docker, a small Kubernetes kind cluster starts up on the Admin machine. Other Tinkerbell components run as pods on that kind cluster. Those components include:

  • hegel: Manages Tinkerbell’s metadata service. The hegel service gets its metadata from the hardware specification stored in Kubernetes in the form of custom resources. The format that it serves is similar to an Ec2 metadata format.
  • rufio: Handles talking to BMCs (which manages things like starting and stopping systems with IPMI). The rufio Kubernetes controller sets things such as power state, persistent boot order, and eventually other services (like NTP, LDAP, and TLS certificates). BMC authentication is managed with Kubernetes secrets.
  • tink: The tink service consists of three components: tink server, tink controller, and tink worker. The tink controller manages hardware data, templates you want to execute, and the worflows that each target specific hardware you are provisioning. The tink worker is a small binary that runs inside of HookOS and talks to the tink server. The worker sends the tink server its MAC address and asks the server for workflows to run. The tink worker will then go through each action, one-by-one, and try to execute it.

To see those services and controllers running on the kind bootstrap cluster, type:

kubectl get pods -n eksa-system
NAME                                      READY STATUS    RESTARTS AGE
hegel-sbchp                               1/1   Running   0        3d
rufio-controller-manager-5dcc568c79-9kllz 1/1   Running   0        3d
tink-controller-manager-54dc786db6-tm2c5  1/1   Running   0        3d
tink-server-5c494445bc-986sl              1/1   Running   0        3d

Provisioning hardware with Tinkerbell

After you start up the cluster create process, the following is the general workflow that Tinkerbell performs to begin provisioning the bare metal machines and prepare them to become part of the EKS Anywhere target cluster. You can set up kubectl on the Admin machine to access the bootstrap cluster and follow along:

export KUBECONFIG=${PWD}/${CLUSTER_NAME}/generated/${CLUSTER_NAME}.kind.kubeconfig

Power up the nodes

Tinkerbell starts by finding a node from the hardware list (based on MAC address) and contacting it to identify a baseboard management job (BMJ) that runs a set of baseboard management tasks (BMT). To see that information, type:

kubectl get bmj -A
NAMESPACE    NAME                                           AGE
eksa-system  mycluster-md-0-1656099863422-vxvh2-provision   12m
kubectl get bmt -A 
NAMESPACE    NAME                                                AGE
eksa-system  mycluster-md-0-1656099863422-vxh2-provision-task-0  55s
eksa-system  mycluster-md-0-1656099863422-vxh2-provision-task-1  51s
eksa-system  mycluster-md-0-1656099863422-vxh2-provision-task-2  47s

The following shows snippets from the bmt output that represent the three tasks: Power Off, enable PXE boot, and Power On.

kubectl describe bmt -n eksa-system eksa-system mycluster-md-0-1656099863422-vxh2-provision-task-0
...
  Task:
    Power Action:  Off
Status:
  Completion Time:   2022-06-27T20:32:59Z
  Conditions:
    Status:    True
    Type:      Completed 
kubectl describe bmt -n eksa-system eksa-system mycluster-md-0-1656099863422-vxh2-provision-task-1
...
  Task:
    One Time Boot Device Action:
      Device:
        pxe
      Efi Boot:  true
Status:
  Completion Time:   2022-06-27T20:33:04Z
  Conditions:
    Status:    True
    Type:      Completed   
kubectl describe bmt -n eksa-system eksa-system mycluster-md-0-1656099863422-vxh2-provision-task-2
  Task:
    Power Action:  on
Status:
  Completion Time:   2022-06-27T20:33:10Z
  Conditions:
    Status:    True
    Type:      Completed   

Rufio converts the baseboard management jobs into task objects, then goes ahead and executes each task. To see rufio logs, type:

kubectl logs -n eksa-system rufio-controller-manager-5dcc568c79-9kllz | less

PXE boots

Next the boots service PXE boots the machine and begins streaming the HookOS (vmlinuz and initramfs) to the machine. HookOS runs in memory and provides the installation environment. To watch the boots log messages as each node boots, type:

docker logs boots 

You can search the output for vmlinuz and initramfs to watch as the HookOS is downloaded and booted from memory on each machine.

Running workflows

Once the HookOS is up, Tinkerbell begins running the tasks and actions contained in the workflows. This is coordinated between the tink worker, running in memory within the HookOS on the machine, and the tink server on the kind cluster. To see the workflows being run, type the following:

kubectl get workflows.tinkerbell.org -n eksa-system
NAME                                TEMPLATE                            STATE
mycluster-md-0-1656099863422-vxh2   mycluster-md-0-1656099863422-vxh2   STATE_RUNNING

This shows the worflow for the first machine that is being provisioned. Add -o yaml to see details of that workflow template:

kubectl get workflows.tinkerbell.org -n eksa-system -o yaml
...
status:
  state: STATE_RUNNING
  tasks:
  - actions
    - environment:
        COMPRESSED: "true"
        DEST_DISK: /dev/sda
        IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/11/artifacts/raw/1-22/ubuntu-v1.22.10-eks-d-1-22-8-eks-a-11-amd64.gz
      image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-11
      name: stream-image
      seconds: 35
      startedAt: "2022-06-27T20:37:39Z"
      status: STATE_SUCCESS
...

You can see that the first action in the workflow is to stream (stream-image) the operating system to the destination disk (DEST_DISK) on the machine. In this example, the Ubuntu operating system that will be copied to disk (/dev/sda) is being served from the location specificed by IMG_URL. The action was successful (STATE_SUCCESS) and it took 35 seconds.

Each action and its status is shown in this output for the whole workflow. To see details of the default actions for each supported operating system, see the Ubuntu TinkerbellTemplateConfig example and Bottlerocket TinkerbellTemplateConfig example .

In general, the actions include:

  • Streaming the operating system image to disk on each machine.
  • Configuring the network interfaces on each machine.
  • Setting up the cloud-init or similar service to add users and otherwise configure the system.
  • Identifying the data source to add to the system.
  • Setting the kernel to pivot to the installed system (using kexec) or having the system reboot to bring up the installed system from disk.

If all goes well, you will see all actions set to STATE_SUCCESS, except for the kexec-image action. That should show as STATE_RUNNING for as long as the machine is running.

You can review the CAPT logs to see provisioning activity. For example, at the start of a new provisioning event, you would see something like the following:

kubectl logs -n capt-system capt-controller-manager-9f8b95b-frbq | less
..."Created BMCJob to get hardware ready for provisioning"...

You can follow this output to see the machine as it goes through the provisioning process.

After the node is initialized, completes all the Tinkerbell actions, and is booted into the installed operating system (Ubuntu or Bottlerocket), the new system starts cloud-init to do further configuration. At this point, the system will reach out to the Tinkerbell hegel service to get the hegel metadata.

If something goes wrong, viewing hegel files can help you understand why a stuck system that has booted into Ubuntu has not joined the cluster yet. To see the hegel files, get the internal IP address for one of the new nodes. Then check for the names of hegel logs and display the contents of one of those logs, searching for the IP address of the node:

kubectl get nodes -o wide
NAME        STATUS   ROLES                 AGE    VERSION               INTERNAL-IP    ...
eksa-da04   Ready    control-plane,master  9m5s   v1.22.10-eks-7dc61e8  10.80.30.23
kubectl get logs -n eksa-system | grep hegel
hegel-n7ngs
kubectl logs -n eksa-system hegel-n7ngs
..."Retrieved IP peer IP..."userIP":"10.80.30.23...

If the log shows you are getting requests from the node, the problem is not a cloud-init issue.

After the first machine successfully completes the workflow, each other machine repeats the same process until the initial set of machines is all up and running.

Tinkerbell moves to target cluster

Once the initial set of machines is up and the EKS Anywhere cluster is running, all the Tinkerbell services and components (including boots) are moved to the new target cluster and run as pods on that cluster. Those services are deleted on the kind cluster on the Admin machine.

Reviewing the status

At this point, you can change your kubectl credentials to point at the new target cluster to get information about Tinkerbell services on the new cluster. For example:

export KUBECONFIG=${PWD}/${CLUSTER_NAME}/${CLUSTER_NAME}-eks-a-cluster.kubeconfig

First check that the Tinkerbell pods are all running by listing pods from the eksa-system namespace:

kubectl get pods -n eksa-system
NAME                                        READY   STATUS    RESTARTS   AGE
boots-5dc66b5d4-klhmj                       1/1     Running   0          3d
hegel-sbchp                                 1/1     Running   0          3d
rufio-controller-manager-5dcc568c79-9kllz   1/1     Running   0          3d
tink-controller-manager-54dc786db6-tm2c5    1/1     Running   0          3d
tink-server-5c494445bc-986sl                1/1     Running   0          3d

Next, check the list of Tinkerbell machines. If all of the machines were provisioned successfully, you should see true under the READY column for each one.

kubectl get tinkerbellmachine -A
NAMESPACE    NAME                                                   CLUSTER    STATE  READY  INSTANCEID                          MACHINE
eksa-system  mycluster-control-plane-template-1656099863422-pqq2q   mycluster         true   tinkerbell://eksa-system/eksa-da04  mycluster-72p72

You can also check the machines themselves. Watch the PHASE change from Provisioning to Provisioned to Running. The Running phase indicates that the machine is now running as a node on the new cluster:

kubectl get machines -n eksa-system
NAME              CLUSTER    NODENAME    PROVIDERID                         PHASE    AGE  VERSION
mycluster-72p72   mycluster  eksa-da04   tinkerbell://eksa-system/eksa-da04 Running  7m25s   v1.22.10-eks-1-22-8

Once you have confirmed that all your machines are successfully running as nodes on the target cluster, there is not much for Tinkerbell to do. It stays around to continue running the DHCP service and to be available to add more machines to the cluster.