1 - Requirements for EKS Anywhere on Bare Metal

Bare Metal provider requirements for EKS Anywhere

To run EKS Anywhere on Bare Metal, you need to meet the hardware and networking requirements described below.

Administrative machine

Set up an Administrative machine as described in Install EKS Anywhere .

Compute server requirements

The minimum number of physical machines needed to run EKS Anywhere on bare metal is 1. To configure EKS Anywhere to run on a single server, set controlPlaneConfiguration.count to 1, and omit workerNodeGroupConfigurations from your cluster configuration.

The recommended number of physical machines for production is at least:

  • Control plane physical machines: 3
  • Worker physical machines: 2

The compute hardware you need for your Bare Metal cluster must meet the following capacity requirements:

  • vCPU: 2
  • Memory: 8GB RAM
  • Storage: 25GB

Upgrade requirements

If you are running a standalone cluster with only one control plane node, you will need at least one additional, temporary machine for each control plane node grouping. For cluster with multiple control plane nodes, you can perform a rolling upgrade with or without an extra temporary machine. For worker node upgrades, you can perform a rolling upgrade with or without an extra temporary machine.

When upgrading without an extra machine, keep in mind that your control plane and your workload must be able to tolerate node unavailability. When upgrading with extra machine(s), you will need additional temporary machine(s) for each control plane and worker node grouping. Refer to Upgrade Bare Metal Cluster and Advanced configuration for rolling upgrade .

NOTE: For single node clusters that require an additional temporary machine for upgrading, if you don’t want to set up the extra hardware, you may recreate the cluster for upgrading and handle data recovery manually.

Network requirements

Each machine should include the following features:

  • Network Interface Cards: At least one NIC is required. It must be capable of netbooting from PXE.
  • IPMI integration (recommended): An IPMI implementation (such a Dell iDRAC, RedFish-compatible, legacy or HP iLO) on the computer’s motherboard or on a separate expansion card. This feature is used to allow remote management of the machine, such as turning the machine on and off.

NOTE: IPMI is not required for an EKS Anywhere cluster. However, without IPMI, upgrades are not supported and you will have to physically turn machines off and on when appropriate.

Here are other network requirements:

  • All EKS Anywhere machines, including the Admin, control plane and worker machines, must be on the same layer 2 network and have network connectivity to the BMC (IPMI, Redfish, and so on).

  • You must be able to run DHCP on the control plane/worker machine network.

NOTE:: If you have another DHCP service running on the network, you need to prevent it from interfering with the EKS Anywhere DHCP service. You can do that by configuring the other DHCP service to explicitly block all MAC addresses and exclude all IP addresses that you plan to use with your EKS Anywhere clusters.

  • The administrative machine and the target workload environment will need network access to:

    • public.ecr.aws
    • anywhere-assets.eks.amazonaws.com: To download the EKS Anywhere binaries, manifests and OVAs
    • distro.eks.amazonaws.com: To download EKS Distro binaries and manifests
    • d2glxqk2uabbnd.cloudfront.net: For EKS Anywhere and EKS Distro ECR container images
  • Two IP addresses routable from the cluster, but excluded from DHCP offering. One IP address is to be used as the Control Plane Endpoint IP. The other is for the Tinkerbell IP address on the target cluster. Below are some suggestions to ensure that these IP addresses are never handed out by your DHCP server. You may need to contact your network engineer to manage these addresses.

    • Pick IP addresses reachable from the cluster subnet that are excluded from the DHCP range or
    • Create an IP reservation for these addresses on your DHCP server. This is usually accomplished by adding a dummy mapping of this IP address to a non-existent mac address.

NOTE: When you set up your cluster configuration YAML file, the endpoint and Tinkerbell addresses are set in the ControlPlaneConfiguration.endpoint.host and tinkerbellIP fields, respectively.

  • Ports must be open to the Admin machine and cluster machines as described in Ports and protocols .

Validated hardware

Through extensive testing in a variety of on premises customer environments during our beta phase, we expect Amazon EKS Anywhere on bare metal to run on most generic hardware that meets the above requirements. In addition, we have collaborated with our hardware original equipment manufacturer (OEM) partners to provide you a list of validated hardware:

Bare metal servers IPMI NIC OS
Dell PowerEdge R740 iDRAC9 Mellanox ConnectX-4 LX 25GbE Validated with Ubuntu v20.04.1
Dell PowerEdge R7525 (NVIDIA Tesla™ T4 GPU’s) iDRAC9 Mellanox ConnectX-4 LX 25GbE & Intel Ethernet 10G 4P X710 OCP Validated with Ubuntu v20.04.1
Dell PowerFlex (R640) iDRAC9 Mellanox ConnectX-4 LX 25GbE Validated with Ubuntu v20.04.1
SuperServer SYS-510P-M IPMI2.0/Redfish API Intel® Ethernet Controller i350 2x 1GbE Validated with Ubuntu v20.04.1 and Bottlerocket v1.8.0
Dell PowerEdge R240 iDRAC9 Broadcom 57414 Dual Port 10/25GbE Validated with Ubuntu v20.04 and Bottlerocket v1.8.0
HPE ProLiant DL20 iLO5 HPE 361i 1G Validated with Ubuntu v20.04 and Bottlerocket v1.8.0
HPE ProLiant DL160 Gen10 iLO5 HPE Eth 10/25Gb 2P 640SFP28 A Validated with Ubuntu v20.04.1
Dell PowerEdge R340 iDRAC9 Broadcom 57416 Dual Port 10GbE Validated with Ubuntu v20.04.1 and Bottlerocket v1.8.0
HPE ProLiant DL360 iLO5 HPE Ethernet 1Gb 4-port 331i Validated with Ubuntu v20.04.1
Lenovo ThinkSystem SR650 V2 XClarity Controller Enterprise v7.92
  • Intel I350 1GbE RJ45 4-port OCP
  • Marvell QL41232 10/25GbE SFP28
    2-Port PCIe Ethernet Adapter
Validated with Ubuntu v20.04.1

2 - Preparing Bare Metal for EKS Anywhere

Set up a Bare Metal cluster to prepare it for EKS Anywhere

After gathering hardware described in Bare Metal Requirements , you need to prepare the hardware and create a CSV file describing that hardware.

Prepare hardware

To prepare your computer hardware for EKS Anywhere, you need to connect your computer hardware and do some configuration. Once the hardware is in place, you need to:

  • Obtain IP and MAC addresses for your machines' NICs.
  • Obtain IP addresses for your machines' IPMI interfaces.
  • Obtain the gateway address for your network to reach the Internet.
  • Obtain the IP address for your DNS servers.
  • Make sure the following settings are in place:
    • UEFI is enabled on all target cluster machines, unless you are provisioning RHEL systems. Enable legacy BIOS on any RHEL machines.
    • PXE boot is enabled for the NIC on each machine for which you provided the MAC address. This is the interface on which the operating system will be provisioned.
    • PXE is set as the first device in each machine’s boot order
    • IPMI over LAN is enabled on the IPMI interfaces
  • Go to the IPMI settings for each machine and set the IP address (bmc_ip), username (bmc_username), and password (bmc_password) to use later in the CSV file.

Prepare hardware inventory

Create a CSV file to provide information about all physical machines that you are ready to add to your target Bare Metal cluster. This file will be used:

  • When you generate the hardware file to be included in the cluster creation process described in the Create Bare Metal production cluster Getting Started guide.
  • To provide information that is passed to each machine from the Tinkerbell DHCP server when the machine is initially PXE booted.

The following is an example of an EKS Anywhere Bare Metal hardware CSV file:

hostname,bmc_ip,bmc_username,bmc_password,mac,ip_address,netmask,gateway,nameservers,labels,disk
eksa-cp01,10.10.44.1,root,PrZ8W93i,CC:48:3A:00:00:01,10.10.50.2,255.255.254.0,10.10.50.1,8.8.8.8|8.8.4.4,type=cp,/dev/sda
eksa-cp02,10.10.44.2,root,Me9xQf93,CC:48:3A:00:00:02,10.10.50.3,255.255.254.0,10.10.50.1,8.8.8.8|8.8.4.4,type=cp,/dev/sda
eksa-cp03,10.10.44.3,root,Z8x2M6hl,CC:48:3A:00:00:03,10.10.50.4,255.255.254.0,10.10.50.1,8.8.8.8|8.8.4.4,type=cp,/dev/sda
eksa-wk01,10.10.44.4,root,B398xRTp,CC:48:3A:00:00:04,10.10.50.5,255.255.254.0,10.10.50.1,8.8.8.8|8.8.4.4,type=worker,/dev/sda
eksa-wk02,10.10.44.5,root,w7EenR94,CC:48:3A:00:00:05,10.10.50.6,255.255.254.0,10.10.50.1,8.8.8.8|8.8.4.4,type=worker,/dev/sda

The CSV file is a comma-separated list of values in a plain text file, holding information about the physical machines in the datacenter that are intended to be a part of the cluster creation process. Each line represents a physical machine (not a virtual machine).

The following sections describe each value.

hostname

The hostname assigned to the machine.

bmc_ip (optional)

The IP address assigned to the IPMI interface on the machine.

bmc_username (optional)

The username assigned to the IPMI interface on the machine.

bmc_password (optional)

The password associated with the bmc_username assigned to the IPMI interface on the machine.

mac

The MAC address of the network interface card (NIC) that provides access to the host computer.

ip_address

The IP address providing access to the host computer.

netmask

The netmask associated with the ip_address value. In the example above, a /23 subnet mask is used, allowing you to use up to 510 IP addresses in that range.

gateway

IP address of the interface that provides access (the gateway) to the Internet.

nameservers

The IP address of the server that you want to provide DNS service to the cluster.

labels

The optional labels field can consist of a key/value pair to use in conjunction with the hardwareSelector field when you set up your Bare Metal configuration . The key/value pair is connected with an equal (=) sign.

For example, a TinkerbellMachineConfig with a hardwareSelector containing type: cp will match entries in the CSV containing type=cp in its label definition.

disk

The device name of the disk on which the operating system will be installed. For example, it could be /dev/sda for the first SCSI disk or /dev/nvme0n1 for the first NVME storage device.

3 - Netbooting and Tinkerbell for Bare Metal

Overview of Netbooting and Tinkerbell for EKS Anywhere on Bare Metal

EKS Anywhere uses Tinkerbell to provision machines for a Bare Metal cluster. Understanding what Tinkerbell is and how it works with EKS Anywhere can help you take advantage of advanced provisioning features or overcome provisioning problems you encounter.

As someone deploying an EKS Anywhere cluster on Bare Metal, you have several opportunities to interact with Tinkerbell:

  • Create a hardware CSV file: You are required to create a hardware CSV file that contains an entry for every physical machine you want to add at cluster creation time.
  • Create an EKS Anywhere cluster: By modifying the Bare Metal configuration file used to create a cluster, you can change some Tinkerbell settings or add actions to define how the operating system on each machine is configured.
  • Monitor provisioning: You can follow along with the Tinkerbell Overview in this page to monitor the progress of your hardware provisioning, as Tinkerbell finds machines and attempts to PXE boot, configure, and restart them.

Using Tinkerbell on EKS Anywhere

The sections below step through how Tinkerbell is integrated with EKS Anywhere to deploy a Bare Metal cluster. While based on features described in Tinkerbell Documentation , EKS Anywhere has modified and added to Tinkerbell components such that the entire Tinkerbell stack is now Kubernetes-friendly and can run on a Kubernetes cluster.

Create bare metal CSV file

The information that Tinkerbell uses to provision machines for the target EKS Anywhere cluster needs to be gathered in a CSV file with the following format:

hostname,bmc_ip,bmc_username,bmc_password,mac,ip_address,netmask,gateway,nameservers,labels,disk
eksa-cp01,10.10.44.1,root,PrZ8W93i,CC:48:3A:00:00:01,10.10.50.2,255.255.254.0,10.10.50.1,8.8.8.8,type=cp,/dev/sda
...

Each physical, bare metal machine is represented by a comma-separated list of information on a single line. It includes information needed to identify each machine (the NIC’s MAC address), PXE boot the machine, point to the disk to install on, and then configure and start the installed system. See Preparing hardware inventory for details on the content and format of that file.

Modify the cluster specification file

Before you create a cluster using the Bare Metal configuration file, you can make Tinkerbell-related changes to that file. In particular, TinkerbellDatacenterConfig fields , TinkerbellMachineConfig fields , and Tinkerbell Actions can be added or modified.

Tinkerbell actions vary based on the operating system you choose for your EKS Anywhere cluster. Actions are stored internally and not shown in the generated cluster specification file, so you must add those sections yourself to change from the defaults (see Ubuntu TinkerbellTemplateConfig example and Bottlerocket TinkerbellTemplateConfig example for details).

In most cases, you don’t need to touch the default actions. However, you might want to modify an action (for example to change kexec to a reboot action if the hardware requires it) or add an action to further configure the installed system. Examples in Advanced Bare Metal cluster configuration show a few actions you might want to add.

Once you have made all your modifications, you can go ahead and create the cluster. The next section describes how Tinkerbell works during cluster creation to provision your Bare Metal machines and prepare them to join the EKS Anywhere cluster.

Overview of Tinkerbell in EKS Anywhere

When you run the command to create an EKS Anywhere Bare Metal cluster, a set of Tinkerbell components start up on the Admin machine. One of these components runs in a container on Docker, while other components run as either controllers or services in pods on the Kubernetes kind cluster that is started up on the Admin machine. Tinkerbell components include boots, hegel, rufio, and tink.

Tinkerbell boots service

The boots service runs in a single container to handle the DHCP service and Netbooting activities. In particular, boots hands out IP addresses, serves iPXE binaries via HTTP and TFTP, delivers an iPXE script to the provisioned machines, and runs a syslog server.

Boots is different from the other Tinkerbell services because the DHCP service it runs must listen directly to layer 2 traffic. (The kind cluster running on the Admin machine doesn’t have the ability to have pods listening on layer 2 networks, which is why boots is run directly on Docker instead, with host networking enabled.)

Because boots is running as a container in Docker, you can see the output in the logs for the boots container by running:

docker logs boots

From the logs output, you will see iPXE try to netboot each machine. If the process doesn’t get all the information it wants from the DHCP server, it will time out. You can see iPXE loading variables, loading a kernel and initramfs (via DHCP), then booting into that kernel and initramfs: in other words, you will see everything that happens with iPXE before it switches over to the kernel and initramfs. The kernel, initramfs, and all images retrieved later are obtained remotely over HTTP and HTTPS.

Tinkerbell hegel, rufio, and tink components

After boots comes up on Docker, a small Kubernetes kind cluster starts up on the Admin machine. Other Tinkerbell components run as pods on that kind cluster. Those components include:

  • hegel: Manages Tinkerbell’s metadata service. The hegel service gets its metadata from the hardware specification stored in Kubernetes in the form of custom resources. The format that it serves is similar to an Ec2 metadata format.
  • rufio: Handles talking to BMCs (which manages things like starting and stopping systems with IPMI). The rufio Kubernetes controller sets things such as power state, persistent boot order, and eventually other services (like NTP, LDAP, and TLS certificates). BMC authentication is managed with Kubernetes secrets.
  • tink: The tink service consists of three components: tink server, tink controller, and tink worker. The tink controller manages hardware data, templates you want to execute, and the workflows that each target specific hardware you are provisioning. The tink worker is a small binary that runs inside of HookOS and talks to the tink server. The worker sends the tink server its MAC address and asks the server for workflows to run. The tink worker will then go through each action, one-by-one, and try to execute it.

To see those services and controllers running on the kind bootstrap cluster, type:

kubectl get pods -n eksa-system
NAME                                      READY STATUS    RESTARTS AGE
hegel-sbchp                               1/1   Running   0        3d
rufio-controller-manager-5dcc568c79-9kllz 1/1   Running   0        3d
tink-controller-manager-54dc786db6-tm2c5  1/1   Running   0        3d
tink-server-5c494445bc-986sl              1/1   Running   0        3d

Provisioning hardware with Tinkerbell

After you start up the cluster create process, the following is the general workflow that Tinkerbell performs to begin provisioning the bare metal machines and prepare them to become part of the EKS Anywhere target cluster. You can set up kubectl on the Admin machine to access the bootstrap cluster and follow along:

export KUBECONFIG=${PWD}/${CLUSTER_NAME}/generated/${CLUSTER_NAME}.kind.kubeconfig

Power up the nodes

Tinkerbell starts by finding a node from the hardware list (based on MAC address) and contacting it to identify a baseboard management job (BMJ) that runs a set of baseboard management tasks (BMT). To see that information, type:

kubectl get bmj -A
NAMESPACE    NAME                                           AGE
eksa-system  mycluster-md-0-1656099863422-vxvh2-provision   12m
kubectl get bmt -A 
NAMESPACE    NAME                                                AGE
eksa-system  mycluster-md-0-1656099863422-vxh2-provision-task-0  55s
eksa-system  mycluster-md-0-1656099863422-vxh2-provision-task-1  51s
eksa-system  mycluster-md-0-1656099863422-vxh2-provision-task-2  47s

The following shows snippets from the bmt output that represent the three tasks: Power Off, enable PXE boot, and Power On.

kubectl describe bmt -n eksa-system eksa-system mycluster-md-0-1656099863422-vxh2-provision-task-0
...
  Task:
    Power Action:  Off
Status:
  Completion Time:   2022-06-27T20:32:59Z
  Conditions:
    Status:    True
    Type:      Completed 
kubectl describe bmt -n eksa-system eksa-system mycluster-md-0-1656099863422-vxh2-provision-task-1
...
  Task:
    One Time Boot Device Action:
      Device:
        pxe
      Efi Boot:  true
Status:
  Completion Time:   2022-06-27T20:33:04Z
  Conditions:
    Status:    True
    Type:      Completed   
kubectl describe bmt -n eksa-system eksa-system mycluster-md-0-1656099863422-vxh2-provision-task-2
  Task:
    Power Action:  on
Status:
  Completion Time:   2022-06-27T20:33:10Z
  Conditions:
    Status:    True
    Type:      Completed   

Rufio converts the baseboard management jobs into task objects, then goes ahead and executes each task. To see rufio logs, type:

kubectl logs -n eksa-system rufio-controller-manager-5dcc568c79-9kllz | less

PXE boots

Next the boots service PXE boots the machine and begins streaming the HookOS (vmlinuz and initramfs) to the machine. HookOS runs in memory and provides the installation environment. To watch the boots log messages as each node boots, type:

docker logs boots 

You can search the output for vmlinuz and initramfs to watch as the HookOS is downloaded and booted from memory on each machine.

Running workflows

Once the HookOS is up, Tinkerbell begins running the tasks and actions contained in the workflows. This is coordinated between the tink worker, running in memory within the HookOS on the machine, and the tink server on the kind cluster. To see the workflows being run, type the following:

kubectl get workflows.tinkerbell.org -n eksa-system
NAME                                TEMPLATE                            STATE
mycluster-md-0-1656099863422-vxh2   mycluster-md-0-1656099863422-vxh2   STATE_RUNNING

This shows the workflow for the first machine that is being provisioned. Add -o yaml to see details of that workflow template:

kubectl get workflows.tinkerbell.org -n eksa-system -o yaml
...
status:
  state: STATE_RUNNING
  tasks:
  - actions
    - environment:
        COMPRESSED: "true"
        DEST_DISK: /dev/sda
        IMG_URL: https://anywhere-assets.eks.amazonaws.com/releases/bundles/11/artifacts/raw/1-22/bottlerocket-v1.22.10-eks-d-1-22-8-eks-a-11-amd64.img.gz
      image: public.ecr.aws/eks-anywhere/tinkerbell/hub/image2disk:6c0f0d437bde2c836d90b000312c8b25fa1b65e1-eks-a-15
      name: stream-image
      seconds: 35
      startedAt: "2022-06-27T20:37:39Z"
      status: STATE_SUCCESS
...

You can see that the first action in the workflow is to stream (stream-image) the operating system to the destination disk (DEST_DISK) on the machine. In this example, the Bottlerocket operating system that will be copied to disk (/dev/sda) is being served from the location specified by IMG_URL. The action was successful (STATE_SUCCESS) and it took 35 seconds.

Each action and its status is shown in this output for the whole workflow. To see details of the default actions for each supported operating system, see the Ubuntu TinkerbellTemplateConfig example and Bottlerocket TinkerbellTemplateConfig example .

In general, the actions include:

  • Streaming the operating system image to disk on each machine.
  • Configuring the network interfaces on each machine.
  • Setting up the cloud-init or similar service to add users and otherwise configure the system.
  • Identifying the data source to add to the system.
  • Setting the kernel to pivot to the installed system (using kexec) or having the system reboot to bring up the installed system from disk.

If all goes well, you will see all actions set to STATE_SUCCESS, except for the kexec-image action. That should show as STATE_RUNNING for as long as the machine is running.

You can review the CAPT logs to see provisioning activity. For example, at the start of a new provisioning event, you would see something like the following:

kubectl logs -n capt-system capt-controller-manager-9f8b95b-frbq | less
..."Created BMCJob to get hardware ready for provisioning"...

You can follow this output to see the machine as it goes through the provisioning process.

After the node is initialized, completes all the Tinkerbell actions, and is booted into the installed operating system (Ubuntu or Bottlerocket), the new system starts cloud-init to do further configuration. At this point, the system will reach out to the Tinkerbell hegel service to get the hegel metadata.

If something goes wrong, viewing hegel files can help you understand why a stuck system that has booted into Ubuntu or Bottlerocket has not joined the cluster yet. To see the hegel files, get the internal IP address for one of the new nodes. Then check for the names of hegel logs and display the contents of one of those logs, searching for the IP address of the node:

kubectl get nodes -o wide
NAME        STATUS   ROLES                 AGE    VERSION               INTERNAL-IP    ...
eksa-da04   Ready    control-plane,master  9m5s   v1.22.10-eks-7dc61e8  10.80.30.23
kubectl get logs -n eksa-system | grep hegel
hegel-n7ngs
kubectl logs -n eksa-system hegel-n7ngs
..."Retrieved IP peer IP..."userIP":"10.80.30.23...

If the log shows you are getting requests from the node, the problem is not a cloud-init issue.

After the first machine successfully completes the workflow, each other machine repeats the same process until the initial set of machines is all up and running.

Tinkerbell moves to target cluster

Once the initial set of machines is up and the EKS Anywhere cluster is running, all the Tinkerbell services and components (including boots) are moved to the new target cluster and run as pods on that cluster. Those services are deleted on the kind cluster on the Admin machine.

Reviewing the status

At this point, you can change your kubectl credentials to point at the new target cluster to get information about Tinkerbell services on the new cluster. For example:

export KUBECONFIG=${PWD}/${CLUSTER_NAME}/${CLUSTER_NAME}-eks-a-cluster.kubeconfig

First check that the Tinkerbell pods are all running by listing pods from the eksa-system namespace:

kubectl get pods -n eksa-system
NAME                                        READY   STATUS    RESTARTS   AGE
boots-5dc66b5d4-klhmj                       1/1     Running   0          3d
hegel-sbchp                                 1/1     Running   0          3d
rufio-controller-manager-5dcc568c79-9kllz   1/1     Running   0          3d
tink-controller-manager-54dc786db6-tm2c5    1/1     Running   0          3d
tink-server-5c494445bc-986sl                1/1     Running   0          3d

Next, check the list of Tinkerbell machines. If all of the machines were provisioned successfully, you should see true under the READY column for each one.

kubectl get tinkerbellmachine -A
NAMESPACE    NAME                                                   CLUSTER    STATE  READY  INSTANCEID                          MACHINE
eksa-system  mycluster-control-plane-template-1656099863422-pqq2q   mycluster         true   tinkerbell://eksa-system/eksa-da04  mycluster-72p72

You can also check the machines themselves. Watch the PHASE change from Provisioning to Provisioned to Running. The Running phase indicates that the machine is now running as a node on the new cluster:

kubectl get machines -n eksa-system
NAME              CLUSTER    NODENAME    PROVIDERID                         PHASE    AGE  VERSION
mycluster-72p72   mycluster  eksa-da04   tinkerbell://eksa-system/eksa-da04 Running  7m25s   v1.22.10-eks-1-22-8

Once you have confirmed that all your machines are successfully running as nodes on the target cluster, there is not much for Tinkerbell to do. It stays around to continue running the DHCP service and to be available to add more machines to the cluster.

4 - Customize HookOS for EKS Anywhere on Bare Metal

Customizing HookOS for EKS Anywhere on Bare Metal

To initially PXE boot bare metal machines used in EKS Anywhere clusters, Tinkerbell acquires a kernel and initial ramdisk that is referred to as the HookOS. A default HookOS is provided when you create an EKS Anywhere cluster. However, there may be cases where you want to override the default HookOS, such as to add drivers required to boot your particular type of hardware.

The following procedure describes how to get the Tinkerbell stack’s Hook/Linuxkit OS built locally. For more information on Tinkerbell’s Hook Installation Environment, see the Tinkerbell Hook repo .

  1. Clone the hook repo or your fork of that repo:

    git clone https://github.com/tinkerbell/hook.git
    cd hook/
    
  2. Pull down the commit that EKS Anywhere is tracking for Hook:

    git checkout -b <new-branch> 029ef8f0711579717bfd14ac5eb63cdc3e658b1d
    

    NOTE: This commit number can be obtained from the EKS-A build tooling repo .

  3. Make changes shown in the following diff in the Makefile located in the root of the repo using your favorite editor.

    diff --git a/Makefile b/Makefile
    index e7fd844..8e87c78 100644
    --- a/Makefile
    +++ b/Makefile
    @@ -2,7 +2,7 @@
     ### !!NOTE!!
     # If this is changed then a fresh output dir is required (`git clean -fxd` or just `rm -rf out`)
     # Handling this better shows some of make's suckiness compared to newer build tools (redo, tup ...) where the command lines to tools invoked isn't tracked by make
    -ORG := quay.io/tinkerbell
    +ORG := localhost:5000/tinkerbell
     # makes sure there's no trailing / so we can just add them in the recipes which looks nicer
     ORG := $(shell echo "${ORG}" | sed 's|/*$$||')
    
    

    Changes above change the ORG variable to use a local registry (localhost:5000)

  4. Make changes shown in the following diff in the rules.mk located in the root of the repo using your favorite editor.

    diff --git a/rules.mk b/rules.mk
    index b2c5133..64e32b1 100644
    --- a/rules.mk
    +++ b/rules.mk
    @@ -22,7 +22,7 @@ ifeq ($(ARCH),aarch64)
     ARCH = arm64
     endif
    
    -arches := amd64 arm64
    +arches := amd64
     modes := rel dbg
    
     hook-bootkit-deps := $(wildcard hook-bootkit/*)
    @@ -87,13 +87,12 @@ push-hook-bootkit push-hook-docker:
            docker buildx build --platform $$platforms --push -t $(ORG)/$(container):$T $(container)
    
     .PHONY: dist
    -dist: out/$T/rel/amd64/hook.tar out/$T/rel/arm64/hook.tar ## Build tarballs for distribution
    +dist: out/$T/rel/amd64/hook.tar ## Build tarballs for distribution
     dbg-dist: out/$T/dbg/$(ARCH)/hook.tar ## Build debug enabled tarball
     dist dbg-dist:
            for f in $^; do
            case $$f in
            *amd64*) arch=x86_64 ;;
     -      *arm64*) arch=aarch64 ;;
            *) echo unknown arch && exit 1;;
            esac
            d=$$(dirname $$(dirname $$f))
    
    

    Above changes are for the docker build command to only build for the immediately required platform (amd64 in this case) to save time.

  5. Modify the hook.yaml file located in the root of the repo with the following changes:

    diff --git a/hook.yaml b/hook.yaml
    
     index 0c5d789..b51b35e 100644
    
     net: host
    --- a/hook.yaml
    +++ b/hook.yaml
    @@ -1,5 +1,5 @@
     kernel:
    - image: quay.io/tinkerbell/hook-kernel:5.10.85 (http://quay.io/tinkerbell/hook-kernel:5.10.85)
    + image: localhost:5000/tinkerbell/hook-kernel:5.10.85
     cmdline: "console=tty0 console=ttyS0 console=ttyAMA0 console=ttysclp0"
     init:
     - linuxkit/init:v0.8
    @@ -42,7 +42,7 @@ services:
     binds:
     - /var/run:/var/run
     - name: docker
    - image: quay.io/tinkerbell/hook-docker:0.0 (http://quay.io/tinkerbell/hook-docker:0.0)
    + image: localhost:5000/tinkerbell/hook-docker:0.0
     capabilities:
     - all
     net: host
    @@ -64,7 +64,7 @@ services:
     - /var/run/docker
     - /var/run/worker
     - name: bootkit
    - image: quay.io/tinkerbell/hook-bootkit:0.0 (http://quay.io/tinkerbell/hook-bootkit:0.0)
    + image: localhost:5000/tinkerbell/hook-bootkit:0.0
     capabilities:
     - all
    

    The changes above are for using local registry (localhost:5000) for hook-docker, hook-bootkit, and hook-kernel.

    NOTE: You may also need to modify the hook.yaml file if you want to add or change components that are used to build up the image. So far, for example, we have needed to change versions of init and getty and inject SSH keys. Take a look at the LinuxKit Examples site for examples.

  6. Make any planned custom modifications to the files under hook, if you are only making changes to bootkit or tink-docker.

  7. If you are modifying the kernel, such as to change kernel config parameters to add or modify drivers, follow these steps:

    • Change into kernel directory and make a local image for amd64 architecture:
    cd kernel; make kconfig_amd64
    
    • Run the image
    docker run --rm -ti -v $(pwd):/src:z quay.io/tinkerbell/kconfig
    
    • You can now navigate to the source code and run the UI for configuring the kernel:
    cd linux-5-10
    make menuconfig
    
    • Once you have changed the necessary kernel configuration parameters, copy the new configuration:
    cp .config /src/config-5.10.x-x86_64
    

    Exit out of container into the repo’s kernel directory and run make:

    /linux-5.10.85 # exit
    user1 % make
    
  8. Install Linuxkit based on instructions from the LinuxKit page.

  9. Ensure that the linuxkit tool is in your PATH:

    export PATH=$PATH:/home/tink/linuxkit/bin
    
  10. Start a local registry:

    docker run -d -p 5000:5000 -—name registry registry:2
    
  11. Compile by running the following in the root of the repo:

    make dist  
    
  12. Artifacts will be put under the dist directory in the repo’s root:

    ./initramfs-aarch64
    ./initramfs-x86_64
    ./vmlinuz-aarch64
    ./vmlinuz-x86_64
    
  13. To use the kernel (vmlinuz) and initial ram disk (initramfs) when you build your cluster, see the description of the hookImagesURLPath field in your Bare Metal configuration file.