Saturday, September 18, 2021

OpenShift Assisted Installer on premise deep dive


Introduction

In this series of blog posts, we will demonstrate how Infrastructure as Code becomes a reality with the OpenShift Assisted Installer on premise. This post leverages KVM to show how to use the Assisted Installer to deploy OpenShift, but the concepts extend just as easily to bare-metal or vSphere deployments.

Lab Preparation

In this lab we will simulate bare-metal nodes with KVM VMs. Terraform will be used to orchestrate this virtual infrastructure.

A minimum of 256 GB of RAM and a 500 GB SSD is recommended. The scripts and install steps below assume a CentOS 8 machine as your host.

To get everything set up and all the bits installed, run the following commands:


git clone https://github.com/latouchek/assisted-installer-deepdive.git

cd assisted-installer-deepdive

cp -r terraform /opt/

cd scripts

sh prepare-kvm-host.sh

The script creates a dedicated ocp network. It is mandatory to have a DNS server and a static DHCP server on that network.

A dnsmasq.conf template is provided in assisted-installer-deepdive/config/, with MAC addresses matching the OCP VMs we will deploy later. It can be run on the host or on a dedicated VM/container.
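As a minimal sketch of what that template provides (the interface name, lease range and exact MAC/IP pairs below are assumptions taken from this lab; use the values from the repository), the dnsmasq configuration combines static DHCP leases for the node MAC addresses with DNS records for the API and ingress VIPs:

cat <<'EOF' > /etc/dnsmasq.d/ocp.conf
# Serve DHCP and DNS on the dedicated ocp network (interface name is an assumption)
interface=virbr-ocp
domain=lab.local
dhcp-range=192.167.124.10,192.167.124.100,12h
# One static lease per OCP VM (add one line per node, matching the Terraform MACs)
dhcp-host=aa:bb:cc:11:42:11,192.167.124.13
dhcp-host=aa:bb:cc:11:42:12,192.167.124.14
# Cluster VIPs: API and wildcard ingress
address=/api.ocpd.lab.local/192.167.124.7
address=/apps.ocpd.lab.local/192.167.124.8
EOF
systemctl restart dnsmasq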

Part I : Deploying the OpenShift Assisted Installer service on premise

1. Get the bits and build the service


sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux

setenforce 0

dnf install -y @container-tools

dnf group install "Development Tools" -y

dnf -y install python3-pip socat make tmux git jq crun

git clone https://github.com/openshift/assisted-service

cd assisted-service

IP=192.167.124.1

AI_URL=http://$IP:8090

Modify onprem-environment and the Makefile to set the proper URL and port forwarding:


sed -i "s@SERVICE_BASE_URL=.*@SERVICE_BASE_URL=$AI_URL@" onprem-environment

sed -i "s/5432,8000,8090,8080/5432:5432 -p 8000:8000 -p 8090:8090 -p 8080:8080/" Makefile

make deploy-onprem

.

.

.

If everything went well, we should see four containers running inside a pod:


[root@kvm-host ~]podman ps

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

a940818185cb k8s.gcr.io/pause:3.5 3 minutes ago Up 2 minutes ago 0.0.0.0:5432->5432/tcp, 0.0.0.0:8000->8000/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp 59b56cb07140-infra

d94a46c8b515 quay.io/ocpmetal/postgresql-12-centos7:latest run-postgresql 2 minutes ago Up 2 minutes ago 0.0.0.0:5432->5432/tcp, 0.0.0.0:8000->8000/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp db

8c0e90d8c4fa quay.io/ocpmetal/ocp-metal-ui:latest /opt/bitnami/scri... About a minute ago Up About a minute ago 0.0.0.0:5432->5432/tcp, 0.0.0.0:8000->8000/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp ui

e98627cdc5f8 quay.io/ocpmetal/assisted-service:latest /assisted-service 42 seconds ago Up 43 seconds ago 0.0.0.0:5432->5432/tcp, 0.0.0.0:8000->8000/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8090->8090/tcp installer


[root@kvm-host ~] podman pod ps

POD ID NAME STATUS CREATED INFRA ID # OF CONTAINERS

59b56cb07140 assisted-installer Running 4 minutes ago a940818185cb 4

The API should now be accessible at http://192.167.124.1:8090 and the GUI at http://192.167.124.1:8080/.

API documentation can be found here
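Before moving on, a quick sanity check against the API; at this stage it should simply return an empty JSON list:

curl -s "$AI_URL/api/assisted-install/v1/clusters" -H "accept: application/json" | jq .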

2. How does it work

In order to provision a cluster the following process must be followed:

  • Create a new OpenShift cluster definition in a json file

  • Register the new cluster by presenting the definition data to the API

  • Create a discovery boot media the nodes will boot from in order to be introspected and validated

  • Assign roles to introspected nodes and complete the cluster definition

  • Trigger the deployment

Part II : Using the Assisted Installer API

In this part we will show how to deploy a 5-node OCP cluster by following the steps mentioned above.

Even though this lab is purely CLI-based, it is recommended to keep the UI in sight to follow the whole process.

1. Deploy our first cluster with AI API

  • Create a cluster definition file

export CLUSTER_SSHKEY=$(cat ~/.ssh/id_ed25519.pub)

export PULL_SECRET=$(cat pull-secret.txt | jq -R .)

cat <<  EOF > ./deployment-multinodes.json

{

"kind": "Cluster",

"name": "ocpd",

"openshift_version": "4.8",

"ocp_release_image": "quay.io/openshift-release-dev/ocp-release:4.8.5-x86_64",

"base_dns_domain": "lab.local",

"hyperthreading": "all",

"ingress_vip": "192.167.124.8",

"schedulable_masters": false,

"high_availability_mode": "Full",

"user_managed_networking": false,

"platform": {

"type": "baremetal"

},

"cluster_networks": [

{

"cidr": "10.128.0.0/14",

"host_prefix": 23

}

],

"service_networks": [

{

"cidr": "172.31.0.0/16"

}

],

"machine_networks": [

{

"cidr": "192.167.124.0/24"

}

],

"network_type": "OVNKubernetes",

"additional_ntp_source": "ntp1.hetzner.de",

"vip_dhcp_allocation": false,

"ssh_public_key": "$CLUSTER_SSHKEY",

"pull_secret": $PULL_SECRET

}

EOF

The high_availability_mode and schedulable_masters parameters let you decide what type of cluster you want to install. Here is how to set them:

  • 3-node clusters: “high_availability_mode”: “Full” and “schedulable_masters”: true

  • Clusters with more than 3 nodes: “high_availability_mode”: “Full” and “schedulable_masters”: false

  • Single node: “high_availability_mode”: "None"

You can choose to handle load balancing in house by setting user_managed_networking to true, or leave it to OCP by keeping it false. In both cases, a DHCP and a DNS server are mandatory (only DNS in the case of a static IP deployment).
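These parameters can also be changed after registration using the same PATCH pattern shown later in this post. As a sketch, a hypothetical call to make the masters schedulable on an already registered cluster would look like this:

curl -X PATCH "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID" \
-H "accept: application/json" -H "Content-Type: application/json" \
-d '{ "schedulable_masters": true }'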

  • Use deployment-multinodes.json to register the new cluster

AI_URL='http://192.167.124.1:8090'

curl -s -X POST "$AI_URL/api/assisted-install/v1/clusters" \

-d @./deployment-multinodes.json --header "Content-Type: application/json" | jq .

  • Check cluster is registered

Once the cluster definition has been sent to the API, we should be able to retrieve its unique ID:


CLUSTER_ID=$(curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" -H "accept: application/json" -H "get_unregistered_clusters: false"| jq -r '.[].id')

[root@kvm-host ~] echo  $CLUSTER_ID

43b9c2f0-218e-4e76-8889-938fd52d6290

  • Check the new cluster status

[root@kvm-host ~] curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" -H "accept: application/json" -H "get_unregistered_clusters: false"| jq -r '.[].status'

pending-for-input

When registering a cluster, the assisted installer runs a series of validation tests to assess if the cluster is ready to be deployed.

‘pending-for-input’ tells us we need to take some action. Let’s take a look at validations_info:


curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" -H "accept: application/json" -H "get_unregistered_clusters: false"| jq -r '.[].validations_info'|jq .

We can see below that the installer is waiting for hosts. Before building the hosts, we need to create the discovery ISO.


{

"id": "sufficient-masters-count",

"status": "failure",

"message": "Clusters must have exactly 3 dedicated masters. Please either add hosts, or disable the worker host"

}

],

  

{

"id": "cluster-cidr-defined",

"status": "success",

"message": "The Cluster Network CIDR is defined."

},

  • Build the discovery boot ISO

The discovery boot ISO is a live CoreOS image that the nodes will boot from. Once booted, an introspection is performed by the discovery agent and the data is sent to the assisted service. If a node passes the validation tests, its status_info will be “Host is ready to be installed”.

We need some extra parameters to be injected into the ISO. To do so, we create a data file as described below:


cat <<  EOF > ./discovery-iso-params.json

{

"ssh_public_key": "$CLUSTER_SSHKEY",

"pull_secret": $PULL_SECRET,

"image_type": "full-iso"

}

EOF

The ISO is now ready to be built, so let’s make the API call. As you can see, we pass the data file so the pull secret and SSH public key are injected into the live ISO.


curl -s -X POST "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID/downloads/image"  \

-d @discovery-iso-params.json \

--header "Content-Type: application/json" \

| jq '.'

In the real world we would need to present this ISO to our hosts so they can boot from it. Because we are using KVM, we download the ISO into the libvirt images directory and create the VMs from it later.


curl \

-L "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID/downloads/image" \

-o /var/lib/libvirt/images/discovery_image_ocpd.iso

  • Start the nodes and the discovery process

In this lab, the bare-metal nodes are virtual and need to be provisioned first. A Terraform file is provided that builds 3 masters and 4 workers. All the VMs boot from the previously generated ISO. Run the following commands inside the Terraform folder:


[root@kvm-host terraform-ocp4-cluster-ai] terraform init ; terraform apply -auto-approve

Apply complete! Resources: 24 added, 0 changed, 0 destroyed.


[root@kvm-host terraform-ocp4-cluster-ai] virsh list --all

Id Name State

-----------------------------------

59 ocp4-master3 running

60 ocp4-master1 running

61 ocp4-master2 running

- ocp4-worker1 shut off

- ocp4-worker1-ht shut off

- ocp4-worker2 shut off

- ocp4-worker3 shut off
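As a side note, if you are not using the provided Terraform code, an individual discovery node can be created by hand with virt-install and booted from the same ISO; the sizing, network name and MAC address below are assumptions for this lab:

virt-install --name ocp4-master1 \
--vcpus 8 --memory 16384 \
--disk size=120 \
--network network=ocp,mac=aa:bb:cc:11:42:11 \
--cdrom /var/lib/libvirt/images/discovery_image_ocpd.iso \
--os-variant rhel8.4 --noautoconsole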

Only the master nodes are started for now. Wait a minute for them to be discovered, then check validations_info:


curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" \

-H "accept: application/json" \

-H "get_unregistered_clusters: false"| jq -r '.[].progress'


........

"hosts-data": [

{

"id": "all-hosts-are-ready-to-install",

"status": "success",

"message": "All hosts in the cluster are ready to install."

},

{

"id": "sufficient-masters-count",

"status": "success",

"message": "The cluster has a sufficient number of master candidates."

}

.........

Our hosts have been validated and are ready to be installed. Let’s take a closer look at the discovery data.

  • Retrieve the discovery hosts data with an API call

[root@kvm-host ~]curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" \

-H "accept: application/json" \

-H "get_unregistered_clusters: false"| jq -r '.[].hosts'


{

"checked_in_at": "2021-09-15T22:57:25.484Z",

"cluster_id": "71db492e-207e-47eb-af7b-c7c716c7e09d",

"connectivity": "{\"remote_hosts\":[{\"host_id\":\"2121a000-d27e-4596-a408-6813d3114caf\",\"l2_connectivity\":[{\"outgoing_ip_address\":\"192.167.124.12\",\"outgoing_nic\":\"ens3\",\"remote_ip_address\":\"192.167.124.13\",\"remote_mac\":\"aa:bb:cc:11:42:11\",\"successful\":true}],\"l3_connectivity\":[{\"average_rtt_ms\":0.304,\"outgoing_nic\":\"ens3\",\"remote_ip_address\":\"192.167.124.13\",\"successful\":true}]},{\"host_id\":\"84083091-8c0c-470b-a157-d002dbeed785\",\"l2_connectivity\":[{\"outgoing_ip_address\":\"192.167.124.12\",\"outgoing_nic\":\"ens3\",\"remote_ip_address\":\"192.167.124.14\",\"remote_mac\":\"aa:bb:cc:11:42:12\",\"successful\":true}],\"l3_connectivity\":[{\"average_rtt_ms\":0.237,\"outgoing_nic\":\"ens3\",\"remote_ip_address\":\"192.167.124.14\",\"successful\":true}]}]}",

"created_at": "2021-09-15T19:23:23.614Z",

"discovery_agent_version": "latest",

"domain_name_resolutions": "{\"resolutions\":[{\"domain_name\":\"api.ocpd.lab.local\",\"ipv4_addresses\":[\"192.167.124.7\"],\"ipv6_addresses\":[]},{\"domain_name\":\"api-int.ocpd.lab.local\",\"ipv4_addresses\":[],\"ipv6_addresses\":[]},{\"domain_name\":\"console-openshift-console.apps.ocpd.lab.local\",\"ipv4_addresses\":[\"192.167.124.8\"],\"ipv6_addresses\":[]},{\"domain_name\":\"validateNoWildcardDNS.ocpd.lab.local\",\"ipv4_addresses\":[],\"ipv6_addresses\":[]}]}",

"href": "/api/assisted-install/v2/infra-envs/71db492e-207e-47eb-af7b-c7c716c7e09d/hosts/fa89d7cd-c2d9-4f26-bd78-155647a32b04",

"id": "fa89d7cd-c2d9-4f26-bd78-155647a32b04",

"infra_env_id": "71db492e-207e-47eb-af7b-c7c716c7e09d",

"installation_disk_id": "/dev/disk/by-path/pci-0000:00:05.0",

"installation_disk_path": "/dev/vda",

"inventory": "{\"bmc_address\":\"0.0.0.0\",\"bmc_v6address\":\"::/0\",\"boot\":{\"current_boot_mode\":\"bios\"},\"cpu\":{\"architecture\":\"x86_64\",\"count\":12,\"flags\":[\"fpu\",\"vme\",\"de\",\"pse\",\"tsc\",\"msr\",\"pae\",\"mce\",\"cx8\",\"apic\",\"sep\",\"mtrr\",\"pge\",\"mca\",\"cmov\",\"pat\",\"pse36\",\"clflush\",\"mmx\",\"fxsr\",\"sse\",\"sse2\",\"ss\",\"syscall\",\"nx\",\"pdpe1gb\",\"rdtscp\",\"lm\",\"constant_tsc\",\"arch_perfmon\",\"rep_good\",\"nopl\",\"xtopology\",\"cpuid\",\"tsc_known_freq\",\"pni\",\"pclmulqdq\",\"vmx\",\"ssse3\",\"fma\",\"cx16\",\"pdcm\",\"pcid\",\"sse4_1\",\"sse4_2\",\"x2apic\",\"movbe\",\"popcnt\",\"tsc_deadline_timer\",\"aes\",\"xsave\",\"avx\",\"f16c\",\"rdrand\",\"hypervisor\",\"lahf_lm\",\"abm\",\"cpuid_fault\",\"invpcid_single\",\"pti\",\"ssbd\",\"ibrs\",\"ibpb\",\"stibp\",\"tpr_shadow\",\"vnmi\",\"flexpriority\",\"ept\",\"vpid\",\"ept_ad\",\"fsgsbase\",\"tsc_adjust\",\"bmi1\",\"avx2\",\"smep\",\"bmi2\",\"erms\",\"invpcid\",\"xsaveopt\",\"arat\",\"umip\",\"md_clear\",\"arch_capabilities\"],\"frequency\":3491.914,\"model_name\":\"Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz\"},\"disks\":[

  

{\"bootable\":true,\"by_path\":\"/dev/disk/by-path/pci-0000:00:01.1-ata-1\",\"drive_type\":\"ODD\",\"hctl\":\"0:0:0:0\",\"id\":\"/dev/

  

"progress": {

"current_stage": "",

.

},

"progress_stages": null,

"role": "auto-assign",

"user_name": "admin",

"validations_info": "{\"hardware\":[{\"id\":\"has-inventory\",\"status\":\"success\",\"message\":\"Valid inventory exists for  the host\"},{\"id\":\"has-min-cpu-cores\",\"status\":\"success\",\"message\":\"Sufficient CPU cores\"},{\"id\":\"has-min-memory\",\"status\":\"success\",\"message\":\"Sufficient minimum RAM\"},{\"id\":\"has-min-valid-disks\",\"status\":\"success\",\"message\":\"Sufficient disk capacity\"},{\"id\":\"has-cpu-cores-for-role\",\"status\":\"success\",\"message\":\"Sufficient CPU cores for role auto-assign\"},{\"id\":\"has-memory-for-role\",\"status\":\"success\",\"message\":\"Sufficient RAM for role auto-assign\"},{\"id\":\"hostname-unique\",\"status\":\"success\",\"message\":\"Hostname ocp4-master0.ocpd.lab.local is unique  in cluster\"},{\"id\":\"hostname-valid\",\"status\":\"success\",\"message\":\"Hostname ocp4-master0.ocpd.lab.local is allowed\"},{\"id\":\"valid-platform\",\"status\":\"success\",\"message\":\"Platform KVM is allowed\"},

.............................................................................

{\"id\":\"sufficient-installation-disk-speed\",\"status\":\"success\",\"message\":\"Speed of installation disk has not yet been measured\"},{\"id\":\"compatible-with-cluster-platform\",\"status\":\"success\",\"message\":\"Host is compatible with cluster platform \"message\":\"lso is disabled\"},{\"id\":\"ocs-requirements-satisfied\",\"status\":\"success\",\"message\":\"ocs is disabled\"}]}"

}

This is a truncated version of the full output, as it contains quite a lot of information. Basically, the agent provides all the hardware details to the assisted service so it can build a precise inventory of each host and validate the nodes.

To get more information about validation and the hardware inventory, you can use these two one-liners:


curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" -H "accept: application/json" \

-H "get_unregistered_clusters: false"| jq -r '.[].validations_info'|jq .


curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" -H "accept: application/json" \

-H "get_unregistered_clusters: false"| jq -r '.[].hosts[].inventory'|jq -r .

One important point to notice is that each host gets its own ID after this process. We can extract them with the following call:


[root@kvm-host ~] curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true"\

-H "accept: application/json" -H "get_unregistered_clusters: false"| jq -r '.[].hosts[].id'

  

2121a000-d27e-4596-a408-6813d3114caf

84083091-8c0c-470b-a157-d002dbeed785

fa89d7cd-c2d9-4f26-bd78-155647a32b04

  • Assign role to discovered Nodes

After validation, each node gets the ‘auto-assign’ role. We can check with this API call:


[root@kvm-host ~]curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" -H "accept: application/json" -H "get_unregistered_clusters: false"| jq -r '.[].hosts[].role'

auto-assign

auto-assign

auto-assign

If you want something a bit more predictable, you can assign roles based on node IDs. Since only our master nodes have been discovered, we will assign them the master role:


for  i  in  `curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true"\

-H "accept: application/json" -H "get_unregistered_clusters: false"| jq -r '.[].hosts[].id'| awk 'NR>0' |awk '{print $1;}'`

do curl -X PATCH "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"hosts_roles\": [ { \"id\": \"$i\", \"role\": \"master\" } ]}"

done

  

Check the result:


[root@kvm-host ~]curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true"\

-H "accept: application/json" \

-H "get_unregistered_clusters: false"| jq -r '.[].hosts[].role'

master

master

master

  • Add workers, complete configuration and trigger the installation

It’s now time to start our workers. The same discovery process takes place and the new nodes get the auto-assign role. Because a cluster cannot have more than 3 masters, auto-assign will resolve to worker this time.
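A minimal way to do that with virsh (the VM names follow the pattern used by the automation script at the end of this post; adjust them to match your own virsh list output):

for i in 0 1; do
virsh start --domain ocp4-worker$i
done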

Because we set vip_dhcp_allocation to false in the cluster definition file, we need to set the api_vip parameter before we can trigger the installation.


curl -X PATCH "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID" \

-H "accept: application/json"\

-H "Content-Type: application/json" -d "{ \"api_vip\": \"192.167.124.7\"}"

And finally, start the installation:


curl -X POST "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID/actions/install" \

-H "accept: application/json" \

-H "Content-Type: application/json"

During the installation process, disks are written and nodes reboot. One of the masters also plays the bootstrap role until the control plane is ready; then the installation continues as usual.

  • Monitoring the installation progress

We can closely monitor the node states during the installation process:


[root@kvm-host ~]curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true"\

-H "accept: application/json" \

-H "get_unregistered_clusters: false"| jq -r '.[].hosts[].progress'

{

"current_stage": "Writing image to disk",

"installation_percentage": 42,

"progress_info": "92%",

"stage_started_at": "2021-09-16T15:56:39.275Z",

"stage_updated_at": "2021-09-16T15:57:31.215Z"

}

{

"current_stage": "Writing image to disk",

"installation_percentage": 42,

"progress_info": "93%",

"stage_started_at": "2021-09-16T15:56:38.290Z",

"stage_updated_at": "2021-09-16T15:57:31.217Z"

}

{

"current_stage": "Writing image to disk",

"installation_percentage": 30,

"progress_info": "92%",

"stage_started_at": "2021-09-16T15:56:38.698Z",

"stage_updated_at": "2021-09-16T15:57:31.218Z"

}

{

"current_stage": "Waiting for control plane",

"installation_percentage": 44,

"stage_started_at": "2021-09-16T15:56:32.053Z",

"stage_updated_at": "2021-09-16T15:56:32.053Z"

}

{

"current_stage": "Waiting for control plane",

"installation_percentage": 44,

"stage_started_at": "2021-09-16T15:56:42.398Z",

"stage_updated_at": "2021-09-16T15:56:42.398Z"

}

To monitor the whole installation progress:


[root@kvm-host ~]curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" \

-H "accept: application/json" \

-H "get_unregistered_clusters: false"| jq -r '.[].progress'

{

"finalizing_stage_percentage": 100,

"installing_stage_percentage": 100,

"preparing_for_installation_stage_percentage": 100,

"total_percentage": 100

}
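If you want to block until the deployment is done, the same call can simply be polled until total_percentage reaches 100, which is essentially what the automation script shown later in this post does:

STATUS=""
while [[ $STATUS != 100 ]]; do
sleep 5
STATUS=$(curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" \
-H "accept: application/json" -H "get_unregistered_clusters: false" | jq -r '.[].progress.total_percentage')
done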

  

  • Retrieve kubeconfig and credentials

[root@kvm-host ~] curl -X GET "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID/downloads/kubeconfig" \

-H "accept: application/octet-stream" > .kube/config

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

100 12104 100 12104 0 0 2955k 0 --:--:-- --:--:-- --:--:-- 2955k

[root@kvm-host ~]oc get nodes

NAME STATUS ROLES AGE VERSION

ocp4-master0.ocpd.lab.local Ready master 119m v1.21.1+9807387

ocp4-master1.ocpd.lab.local Ready master 134m v1.21.1+9807387

ocp4-master2.ocpd.lab.local Ready master 134m v1.21.1+9807387

ocp4-worker0.ocpd.lab.local Ready worker 119m v1.21.1+9807387

ocp4-worker1.ocpd.lab.local Ready worker 119m v1.21.1+9807387


[root@kvm-host ~] curl -X GET "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID/credentials" \

-H "accept: application/json" |jq -r .

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

100 132 100 132 0 0 44000 0 --:--:-- --:--:-- --:--:-- 44000

{

"console_url": "https://console-openshift-console.apps.ocpd.lab.local",

"password": "8Tepe-uxF7Q-ztHg5-yoKPQ",

"username": "kubeadmin"

}

Part III : Day 2 Operations

Adding worker nodes

In order to add extra workers to an existing cluster the following process must be followed:

  • Create a new ‘AddHost cluster’

This creates a new OpenShift cluster definition for adding nodes to our existing OCP cluster.

We have to manually generate the new cluster ID and name, and make sure api_vip_dnsname matches the existing cluster.


###Generate id for addhost cluster####

  

NCLUSTER_ID=$(curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" -H "accept: application/json" -H "get_unregistered_clusters: false"| jq -r '.[].id'| tr b c)

##### creating addhost cluster

echo  $NCLUSTER_ID

  

curl -X POST "http://192.167.124.1:8090/api/assisted-install/v1/add_hosts_clusters" \

-H "accept: application/json" -H "Content-Type: application/json" \

-d "{ \"id\": \"$NCLUSTER_ID\", \"name\": \"ocp2\", \"api_vip_dnsname\": \"api.ocpd.lab.local\", \"openshift_version\": \"4.8\"}"

  

  • Create a new discovery boot media the new nodes will boot from in order to be introspected and validated

####Patch new cluster to add pullsecret####

  

cat <<  EOF > ./new-params.json

{

"ssh_public_key": "$CLUSTER_SSHKEY",

"pull_secret": $PULL_SECRET

}

EOF

curl -s -X PATCH "$AI_URL/api/assisted-install/v1/clusters/$NCLUSTER_ID" -d @new-params.json --header "Content-Type: application/json" | jq '.'

  

####create and download new ISO ####

curl -s -X POST "$AI_URL/api/assisted-install/v1/clusters/$NCLUSTER_ID/downloads/image" \

-d @new-params.json --header "Content-Type: application/json" \

| jq '.'

  

curl -L "$AI_URL/api/assisted-install/v1/clusters/$NCLUSTER_ID/downloads/image" \

-o /var/lib/libvirt/images/default/discovery_image_ocpd2.iso

  • Before starting the extra workers, make sure they boot from the newly created ISO

virsh change-media --domain ocp4-worker1-ht hda \

--current --update \

--source /var/lib/libvirt/images/default/discovery_image_ocpd2.iso

virsh start --domain ocp4-worker1-ht

  • Let’s take a closer look at both clusters

curl -s -X GET -H "Content-Type: application/json"  "$AI_URL/api/assisted-install/v1/clusters" | jq -r .[].id

Output should look like this:


58fb589e-2f8b-44ee-b056-08499ba7ddd5 <-- UUID of the existing cluster

58fc589e-2f8c-44ee-c056-08499ca7ddd5 <-- UUID of the AddHost cluster

Use the API to get more details:


[root@kvm-host ~] curl -s -X GET --header "Content-Type: application/json" \

"$AI_URL/api/assisted-install/v1/clusters/$NCLUSTER_ID" | jq -r .kind

AddHostsCluster

[root@kvm-host ~] curl -s -X GET --header "Content-Type: application/json" \

"$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID" | jq -r .kind

Cluster

[root@kvm-host ~] curl -s -X GET --header "Content-Type: application/json" \

"$AI_URL/api/assisted-install/v1/clusters/$NCLUSTER_ID" | jq -r .status

adding-hosts

[root@kvm-host ~] curl -s -X GET --header "Content-Type: application/json" \

"$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID" | jq -r .status

installed

####Check nodes for each clusters####

[root@kvm-host ~] curl -s -X GET --header "Content-Type: application/json" \

"$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID" | jq -r .hosts[].requested_hostname

ocp4-master1.ocpd.lab.local

ocp4-master2.ocpd.lab.local

ocp4-worker0.ocpd.lab.local

ocp4-worker1.ocpd.lab.local

ocp4-master0.ocpd.lab.local

[root@kvm-host ~] curl -s -X GET --header "Content-Type: application/json" \

"$AI_URL/api/assisted-install/v1/clusters/$NCLUSTER_ID" | jq -r .hosts[].requested_hostname

ocp4-worker1-ht.ocpd.lab.local

[root@kvm-host ~] curl -s -X GET --header "Content-Type: application/json" \

"$AI_URL/api/assisted-install/v1/clusters/$NCLUSTER_ID" | jq -r .hosts[].role

auto-assign

We see above that everything is working as expected:

  • The new cluster is in adding-hosts state

  • The existing cluster is in installed state

  • The new worker has been discovered and has been given the right role

  • Start the new node installation:


curl -X POST "$AI_URL/api/assisted-install/v1/clusters/$NCLUSTER_ID/actions/install_hosts" \

-H "accept: application/json" | jq '.'

As soon as the installation begins, the new node gets the worker role:


[root@kvm-host ~] curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" -H "accept: application/json" -H "get_unregistered_clusters: false"| jq -r '.[].hosts[].role'

master

master

worker

worker

master

worker

  • Wait for the new worker to reboot and check pending CSRs

[root@kvm-host ~] oc get csr|grep Pending

csr-5jrm7 5m55s kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper <none> Pending

  

###Approve all CSR###

[root@kvm-host ~] for  csr  in  $(oc -n openshift-machine-api get csr | awk '/Pending/ {print $1}'); do oc adm certificate approve $csr;done

certificatesigningrequest.certificates.k8s.io/csr-5jrm7 approved

We should now see the new node:


[root@kvm-host ~] oc get nodes

NAME STATUS ROLES AGE VERSION

ocp4-master0.ocpd.lab.local Ready master 59m v1.22.0-rc.0+75ee307

ocp4-master1.ocpd.lab.local Ready master 40m v1.22.0-rc.0+75ee307

ocp4-master2.ocpd.lab.local Ready master 59m v1.22.0-rc.0+75ee307

ocp4-worker0.ocpd.lab.local Ready worker 42m v1.22.0-rc.0+75ee307

ocp4-worker1-ht.ocpd.lab.local NotReady worker 48s v1.22.0-rc.0+75ee307

ocp4-worker1.ocpd.lab.local Ready worker 42m v1.22.0-rc.0+75ee307

After a few minutes we should see:


[root@kvm-host ~] oc get nodes

NAME STATUS ROLES AGE VERSION

ocp4-master0.ocpd.lab.local Ready master 62m v1.22.0-rc.0+75ee307

ocp4-master1.ocpd.lab.local Ready master 44m v1.22.0-rc.0+75ee307

ocp4-master2.ocpd.lab.local Ready master 62m v1.22.0-rc.0+75ee307

ocp4-worker0.ocpd.lab.local Ready worker 45m v1.22.0-rc.0+75ee307

ocp4-worker1-ht.ocpd.lab.local Ready worker 3m54s v1.22.0-rc.0+75ee307

ocp4-worker1.ocpd.lab.local Ready worker 45m v1.22.0-rc.0+75ee307

  • Check with AI API

[root@kvm-host ~] curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" \

-H "accept: application/json" -H "get_unregistered_clusters: false"| jq -r .[].hosts[].status

installed

installed

installed

installed

installed

added-to-existing-cluster

We successfully added an extra worker.

Part IV : Network Tweaks

Bond configuration

In order to provision a cluster with bonded interfaces for the workers, we need to use the static_network_config parameter when building the discovery ISO.


"static_network_config": [

{

"network_yaml": "Network state in json format for a specific node",

"mac_interface_map": [

{

"mac_address": "string",

"logical_nic_name": "string"

}

]

}

]

  • Let’s take a look at the different values we need to provide:

  • “network_yaml”: “Network state in json format”

  • Node network configuration is handled by kubernetes-nmstate and needs to be provided in JSON. For simplicity, we first write the desired network state in YAML and then JSON-encode it for each node. Many examples are provided here and in the assisted installer documentation here.

The YAML below describes the bond definition we will write for each worker in our lab.

In this example we create a bond with ens3 and ens4 as slaves, and we assign a static IP to ens5.


interfaces:
  - name: bond0
    description: Bond
    type: bond
    state: up
    ipv4:
      enabled: true
      dhcp: true
      auto-dns: true
      auto-gateway: true
      auto-routes: true
    link-aggregation:
      mode: balance-rr
      options:
        miimon: '140'
      port:
        - ens3
        - ens4
  - name: ens3
    state: up
    type: ethernet
  - name: ens4
    state: up
    type: ethernet
  - name: ens5
    state: up
    type: ethernet
    ipv4:
      address:
        - ip: 10.17.3.4
          prefix-length: 24
      enabled: true

  • “mac_interface_map”: []

  • Because all nodes will boot from the same discovery ISO, MAC addresses and logical NIC names have to be mapped as shown in the example below:


{

"mac_interface_map": [

{

"mac_address": "aa:bb:cc:11:42:21",

"logical_nic_name": "ens3"

},

{

"mac_address": "aa:bb:cc:11:42:51",

"logical_nic_name": "ens4"

},

{

"mac_address": "aa:bb:cc:11:42:61",

"logical_nic_name": "ens5"

}

]

}

It is important to understand that, with both the nmstate definitions and the MAC mapping, each node can be individually configured even though they all boot from the same discovery ISO.

Now that we have described the logic, let’s create the data file we’ll present to the API. We’ll use jq to JSON-encode the nmstate YAML and inject it into our final data file.

  • Prepare the environment:

In this lab, worker nodes have 3 NICs connected to 2 different networks (see the provided Terraform file for more details).

  • nmstate files are provided in the git repo

cd assisted-installer-deepdive

mkdir ~/bond

cp config/nmstate* ~/bond/

  • Create the network data file

export AI_URL='http://192.167.124.1:8090'

export CLUSTER_SSHKEY=$(cat ~/.ssh/id_ed25519.pub)

export PULL_SECRET=$(cat pull-secret.txt | jq -R .)

export NODE_SSH_KEY="$CLUSTER_SSHKEY"

cd /root/

  

jq -n --arg SSH_KEY "$NODE_SSH_KEY" --arg NMSTATE_YAML1 "$(cat ~/bond/nmstate-bond-worker0.yaml)" --arg NMSTATE_YAML2 "$(cat ~/bond/nmstate-bond-worker1.yaml)"  '{

"ssh_public_key": $CLUSTER_SSHKEY",

"image_type": "full-iso",

"static_network_config": [

{

"network_yaml": $NMSTATE_YAML1,

"mac_interface_map": [{"mac_address": "aa:bb:cc:11:42:20", "logical_nic_name": "ens3"}, {"mac_address": "aa:bb:cc:11:42:50", "logical_nic_name": "ens4"},{"mac_address": "aa:bb:cc:11:42:60", "logical_nic_name": "ens5"}]

},

{

"network_yaml": $NMSTATE_YAML2,

"mac_interface_map": [{"mac_address": "aa:bb:cc:11:42:21", "logical_nic_name": "ens3"}, {"mac_address": "aa:bb:cc:11:42:51", "logical_nic_name": "ens4"},{"mac_address": "aa:bb:cc:11:42:61", "logical_nic_name": "ens5"}]

}

]

}' > bond-workers

  

  • Build the image

curl -H "Content-Type: application/json" -X POST \

-d @bond-workers ${AI_URL}/api/assisted-install/v1/clusters/$CLUSTER_ID/downloads/image | jq .
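The resulting ISO is then downloaded exactly like before so the VMs can boot from it (adjust the target path to whatever your Terraform definition expects):

curl -L "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID/downloads/image" \
-o /var/lib/libvirt/images/discovery_image_ocpd.iso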

  • Deploy the cluster

For a fully automated deployment use the script full-deploy-ai-multinode-bond.sh provided in the git repo


###Create infra ###

  

terraform -chdir=/opt/terraform/ai-bond apply -auto-approve

####Assign master role to master VMs####

  

for  i  in  `curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true"\

-H "accept: application/json" -H "get_unregistered_clusters: false"| jq -r '.[].hosts[].id'| awk 'NR>0' |awk '{print $1;}'`

do curl -X PATCH "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"hosts_roles\": [ { \"id\": \"$i\", \"role\": \"master\" } ]}"

done

  

###set api IP###

  

curl -X PATCH "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"api_vip\": \"192.167.124.7\"}"

  

###Start workers####

for  i  in {0..1}

do virsh start ocp4-worker$i

done

  

sleep 180

  

###Start installation###

curl -X POST \

"$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID/actions/install" \

-H "accept: application/json" \

-H "Content-Type: application/json"

  

echo Wait for install to complete

  

while [[ $STATUS != 100 ]]

do

sleep 5

STATUS=$(curl -s -X GET "$AI_URL/api/assisted-install/v2/clusters?with_hosts=true" -H "accept: application/json" -H "get_unregistered_clusters: false"| jq -r '.[].progress.total_percentage')

done

  

echo Download kubeconfig

mkdir ~/.kube

curl -X GET "$AI_URL/api/assisted-install/v1/clusters/$CLUSTER_ID/downloads/kubeconfig" -H "accept: application/octet-stream" > .kube/config

  

  • Check workers are configured as requested:

  • After SSHing into worker0, we can see all the connections were created (the same result can be observed on worker1):


[root@ocp4-worker0 ~] ls -1 /etc/NetworkManager/system-connections/

bond0.nmconnection

br-ex.nmconnection

ens3-slave-ovs-clone.nmconnection

ens3.nmconnection

ens4-slave-ovs-clone.nmconnection

ens4.nmconnection

ens5.nmconnection

ovs-if-br-ex.nmconnection

ovs-if-phys0.nmconnection

ovs-port-br-ex.nmconnection

ovs-port-phys0.nmconnection
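The bond state can also be checked directly on the node; the kernel bonding driver exposes it under /proc, and nmcli lists the active connections:

cat /proc/net/bonding/bond0
nmcli -f NAME,TYPE,DEVICE connection show --active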

  • Check configuration for each NIC:

cat /etc/NetworkManager/system-connections/ens3.nmconnection

[connection]

id=ens3

uuid=501c47c3-7c1d-4424-b131-f40dd89827a9

type=ethernet

interface-name=ens3

master=bond0

permissions=

slave-type=bond

autoconnect=true

autoconnect-priority=1

  

[ethernet]

mac-address-blacklist=


cat /etc/NetworkManager/system-connections/ens4.nmconnection

[connection]

id=ens4

uuid=6baa8165-0fa0-4eae-83cb-f89462aa6f18

type=ethernet

interface-name=ens4

master=bond0

permissions=

slave-type=bond

autoconnect=true

autoconnect-priority=1

  

[ethernet]

mac-address-blacklist=


cat /etc/NetworkManager/system-connections/ens5.nmconnection

[connection]

id=ens5

uuid=c04c3a19-c8d7-4c6b-a836-c64b573de270

type=ethernet

interface-name=ens5

permissions=

autoconnect=true

autoconnect-priority=1

  

[ethernet]

mac-address-blacklist=

  

[ipv4]

address1=10.17.3.3/24

dhcp-client-id=mac

dns-search=

method=manual

  

[ipv6]

addr-gen-mode=eui64

dhcp-duid=ll

dhcp-iaid=mac

dns-search=

method=disabled

  

[proxy]

Thank you for reading


Thursday, April 22, 2021

How to automate OCP 4.7 UPI Installation on Vsphere and assign Static IPs to Nodes


Introduction

In this post we will show how to automate and customize an OCP 4.7 UPI installation on vSphere.
In the first part we will use govc, an open-source command-line utility for performing administrative actions on VMware vCenter or vSphere, and in the second part we will deploy the cluster with Terraform. All files can be found here

Prerequisites:

DNS, Load Balancer and Web Server

Use the templates provided in the files folder to configure the required services:

dnf install -y named httpd haproxy
mkdir -p  /var/www/html/ignition
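Note that the ignition files are served over port 8080 later in this post, so httpd has to listen there. Assuming a stock httpd configuration, something like this does the trick before enabling the services:

sed -i 's/^Listen 80$/Listen 8080/' /etc/httpd/conf/httpd.conf
systemctl enable --now named haproxy httpd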

Download necessary binaries

We need to install the govc client, the oc client, the OCP installer and Terraform. This is how we proceed:

wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest-4.7/openshift-client-linux.tar.gz
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest-4.7/openshift-install-linux.tar.gz
wget https://github.com/vmware/govmomi/releases/download/v0.24.0/govc_linux_amd64.gz
tar zxvf openshift-client-linux.tar.gz
tar zxvf openshift-install-linux.tar.gz
gunzip govc_linux_amd64.gz
rm -f *gz README.md
mv oc kubectl openshift-install /usr/local/bin/
mv govc_linux_amd64 /usr/local/bin/govc
dnf install -y dnf-plugins-core
dnf config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
dnf install terraform -y

Part I

Automating with govc:

Export variables for govc (modify according to your env)

export OCP_RELEASE="4.7.4"
export CLUSTER_DOMAIN="vmware.lab.local"
export GOVC_URL='192.168.124.3'
export GOVC_USERNAME='administrator@vsphere.local'
export GOVC_PASSWORD='vSphere Pass'
export GOVC_INSECURE=1
export GOVC_NETWORK='VM Network'
export VMWARE_SWITCH='DSwitch'
export GOVC_DATASTORE='datastore1'
export GOVC_DATACENTER='Datacenter'
export GOVC_RESOURCE_POOL=yourcluster_name/Resources  ####default pool
export MYPATH=$(pwd)
export HTTP_SERVER="192.168.124.1"
export bootstrap_name="bootstrap"
export bootstrap_ip="192.168.124.20"

export HTTP_SERVER="192.168.124.1"
export master_name="master"
export master1_ip="192.168.124.7"
export master2_ip="192.168.124.8"
export master3_ip="192.168.124.9"

export worker_name="worker"
export worker1_ip="192.168.124.10"
export worker2_ip="192.168.124.11"
export worker3_ip="192.168.124.12"

export MASTER_CPU="4"
export MASTER_MEMORY="16384"   
export WORKER_CPU="4"
export WORKER_MEMORY="16384"

export ocp_net_gw="192.168.124.1"
export ocp_net_mask="255.255.255.0"
export ocp_net_dns="192.168.124.235"

Create ignition files

Modify install-config.yaml according to your needs.
Because the bootstrap ignition file is too big, it needs to be placed on a web server and downloaded during the first boot. To achieve that, we create bootstrap-append.ign, which points to the right file:

rm -f /var/www/html/ignition/*.ign
rm -rf ${MYPATH}/openshift-install
rm -rf ~/.kube
mkdir ${MYPATH}/openshift-install
mkdir ~/.kube
cp install-config.yaml ${MYPATH}/openshift-install/install-config.yaml
cat > ${MYPATH}/openshift-install/bootstrap-append.ign <<EOF
{
  "ignition": {
    "config": {
      "merge": [
      {
        "source": "http://${HTTP_SERVER}:8080/ignition/bootstrap.ign"
      }
      ]
    },
    "version": "3.1.0"
  }
}
EOF
openshift-install create ignition-configs --dir  openshift-install --log-level debug
cp ${MYPATH}/openshift-install/*.ign /var/www/html/ignition/
chmod o+r /var/www/html/ignition/*.ign
restorecon -vR /var/www/html/
cp ${MYPATH}/openshift-install/auth/kubeconfig ~/.kube/config

Prepare CoreOS template

Before downloading the OVA, we create coreos.json to modify the network mapping (make sure GOVC_NETWORK is properly defined):

cat > coreos.json <<EOF
{
"DiskProvisioning": "flat",
"IPAllocationPolicy": "dhcpPolicy",
"IPProtocol": "IPv4",
"PropertyMapping": [
{
 "Key": "guestinfo.ignition.config.data",
 "Value": ""
},
{
 "Key": "guestinfo.ignition.config.data.encoding",
 "Value": ""
}
],
"NetworkMapping": [
{
 "Name": "VM Network",
 "Network": "${GOVC_NETWORK}"
}
],
"MarkAsTemplate": false,
"PowerOn": false,
"InjectOvfEnv": false,
"WaitForIP": false,
"Name": null
}
EOF

We can now download the image, import it with the custom options, mark the resulting VM as a template, and finally clone the bootstrap VM out of this template:

wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/latest/rhcos-vmware.x86_64.ova
govc import.ova -options=coreos.json -name coreostemplate rhcos-vmware.x86_64.ova
govc vm.markastemplate coreostemplate
govc vm.clone -vm coreostemplate  -on=false  bootstrap

Ignition files need to be provided to the vSphere instances through guestinfo.ignition.config.data. We first need to base64-encode them and then update the previously created bootstrap VM:

bootstrap=$(cat openshift-install/bootstrap-append.ign | base64 -w0)
govc vm.change -e="guestinfo.ignition.config.data=${bootstrap}" -vm=${bootstrap_name}
govc vm.change -e="guestinfo.ignition.config.data.encoding=base64" -vm=${bootstrap_name}
govc vm.change -e="disk.EnableUUID=TRUE" -vm=${bootstrap_name}

To set a static IP on the bootstrap node, we issue the following command:

govc vm.change -e="guestinfo.afterburn.initrd.network-kargs=ip=${bootstrap_ip}::${ocp_net_gw}:${ocp_net_mask}:${bootstrap_name}:ens192:off nameserver=${ocp_net_dns}" -vm=${bootstrap_name}

We are going to repeat those steps for the masters and workers:

govc vm.clone -vm coreostemplate  -on=false  ${master_name}00.${CLUSTER_DOMAIN}
govc vm.clone -vm coreostemplate  -on=false  ${master_name}01.${CLUSTER_DOMAIN}
govc vm.clone -vm coreostemplate  -on=false  ${master_name}02.${CLUSTER_DOMAIN}

govc vm.change -c=${MASTER_CPU} -m=${MASTER_MEMORY} -vm=${master_name}00.${CLUSTER_DOMAIN}
govc vm.change -c=${MASTER_CPU} -m=${MASTER_MEMORY} -vm=${master_name}01.${CLUSTER_DOMAIN}
govc vm.change -c=${MASTER_CPU} -m=${MASTER_MEMORY} -vm=${master_name}02.${CLUSTER_DOMAIN}

govc vm.disk.change -vm ${master_name}00.${CLUSTER_DOMAIN} -size 120GB
govc vm.disk.change -vm ${master_name}01.${CLUSTER_DOMAIN} -size 120GB
govc vm.disk.change -vm ${master_name}02.${CLUSTER_DOMAIN} -size 120GB

master=$(cat openshift-install/master.ign | base64 -w0)

govc vm.change -e="guestinfo.ignition.config.data=${master}" -vm=${master_name}00.${CLUSTER_DOMAIN}
govc vm.change -e="guestinfo.ignition.config.data=${master}" -vm=${master_name}01.${CLUSTER_DOMAIN}
govc vm.change -e="guestinfo.ignition.config.data=${master}" -vm=${master_name}02.${CLUSTER_DOMAIN}

govc vm.change -e="guestinfo.ignition.config.data.encoding=base64" -vm=${master_name}00.${CLUSTER_DOMAIN}
govc vm.change -e="guestinfo.ignition.config.data.encoding=base64" -vm=${master_name}01.${CLUSTER_DOMAIN}
govc vm.change -e="guestinfo.ignition.config.data.encoding=base64" -vm=${master_name}02.${CLUSTER_DOMAIN}

govc vm.change -e="disk.EnableUUID=TRUE" -vm=${master_name}00.${CLUSTER_DOMAIN}
govc vm.change -e="disk.EnableUUID=TRUE" -vm=${master_name}01.${CLUSTER_DOMAIN}
govc vm.change -e="disk.EnableUUID=TRUE" -vm=${master_name}02.${CLUSTER_DOMAIN}

govc vm.change -e="guestinfo.afterburn.initrd.network-kargs=ip=${master1_ip}::${ocp_net_gw}:${ocp_net_mask}:${master_name}00.${CLUSTER_DOMAIN}:ens192:off nameserver=${ocp_net_dns}" -vm=${master_name}00.${CLUSTER_DOMAIN}
govc vm.change -e="guestinfo.afterburn.initrd.network-kargs=ip=${master2_ip}::${ocp_net_gw}:${ocp_net_mask}:${master_name}01.${CLUSTER_DOMAIN}:ens192:off nameserver=${ocp_net_dns}" -vm=${master_name}01.${CLUSTER_DOMAIN}
govc vm.change -e="guestinfo.afterburn.initrd.network-kargs=ip=${master3_ip}::${ocp_net_gw}:${ocp_net_mask}:${master_name}02.${CLUSTER_DOMAIN}:ens192:off nameserver=${ocp_net_dns}" -vm=${master_name}02.${CLUSTER_DOMAIN}

worker=$(cat openshift-install/worker.ign | base64 -w0)
govc vm.clone -vm coreostemplate  -on=false  ${worker_name}00.${CLUSTER_DOMAIN}
govc vm.clone -vm coreostemplate  -on=false  ${worker_name}01.${CLUSTER_DOMAIN}
# govc vm.clone -vm coreostemplate  -on=false  ${worker_name}02.${CLUSTER_DOMAIN}

govc vm.change -e="guestinfo.ignition.config.data=${worker}" -vm=${worker_name}00.${CLUSTER_DOMAIN}
govc vm.change -e="guestinfo.ignition.config.data=${worker}" -vm=${worker_name}01.${CLUSTER_DOMAIN}
# govc vm.change -e="guestinfo.ignition.config.data=${worker}" -vm=${worker_name}02.${CLUSTER_DOMAIN}

govc vm.change -e="guestinfo.ignition.config.data.encoding=base64" -vm=${worker_name}00.${CLUSTER_DOMAIN}
govc vm.change -e="guestinfo.ignition.config.data.encoding=base64" -vm=${worker_name}01.${CLUSTER_DOMAIN}
# govc vm.change -e="guestinfo.ignition.config.data.encoding=base64" -vm=${worker_name}02.${CLUSTER_DOMAIN}

govc vm.change -e="disk.EnableUUID=TRUE" -vm=${worker_name}00.${CLUSTER_DOMAIN}
govc vm.change -e="disk.EnableUUID=TRUE" -vm=${worker_name}01.${CLUSTER_DOMAIN}
govc vm.change -e="disk.EnableUUID=TRUE" -vm=${worker_name}02.${CLUSTER_DOMAIN}

govc vm.change -e="guestinfo.afterburn.initrd.network-kargs=ip=${worker1_ip}::${ocp_net_gw}:${ocp_net_mask}:${worker_name}00.${CLUSTER_DOMAIN}:ens192:off nameserver=${ocp_net_dns}" -vm=${worker_name}00.${CLUSTER_DOMAIN}
govc vm.change -e="guestinfo.afterburn.initrd.network-kargs=ip=${worker2_ip}::${ocp_net_gw}:${ocp_net_mask}:${worker_name}01.${CLUSTER_DOMAIN}:ens192:off nameserver=${ocp_net_dns}" -vm=${worker_name}01.${CLUSTER_DOMAIN}
govc vm.change -e="guestinfo.afterburn.initrd.network-kargs=ip=${worker3_ip}::${ocp_net_gw}:${ocp_net_mask}:${worker_name}02.${CLUSTER_DOMAIN}:ens192:off nameserver=${ocp_net_dns}" -vm=${worker_name}02.${CLUSTER_DOMAIN}

govc vm.change -c=${WORKER_CPU} -m=${WORKER_MEMORY} -vm=${worker_name}00.${CLUSTER_DOMAIN}
govc vm.change -c=${WORKER_CPU} -m=${WORKER_MEMORY} -vm=${worker_name}01.${CLUSTER_DOMAIN}
govc vm.change -c=${WORKER_CPU} -m=${WORKER_MEMORY} -vm=${worker_name}02.${CLUSTER_DOMAIN}

govc vm.disk.change -vm ${worker_name}00.${CLUSTER_DOMAIN} -size 120GB
govc vm.disk.change -vm ${worker_name}01.${CLUSTER_DOMAIN} -size 120GB
govc vm.disk.change -vm ${worker_name}02.${CLUSTER_DOMAIN} -size 120GB

Time to start the nodes:

govc vm.power -on=true bootstrap
govc vm.power -on=true ${master_name}00.${CLUSTER_DOMAIN}
govc vm.power -on=true ${master_name}01.${CLUSTER_DOMAIN}
govc vm.power -on=true ${master_name}02.${CLUSTER_DOMAIN}
govc vm.power -on=true ${worker_name}00.${CLUSTER_DOMAIN}
govc vm.power -on=true ${worker_name}01.${CLUSTER_DOMAIN}
govc vm.power -on=true ${worker_name}02.${CLUSTER_DOMAIN}

Wait for the installation to complete

openshift-install --dir=openshift-install wait-for bootstrap-complete
openshift-install --dir=openshift-install wait-for bootstrap-complete > /tmp/bootstrap-test 2>&1
grep safe /tmp/bootstrap-test > /dev/null 2>&1
if [ "$?" -ne 0 ]
then
	echo -e "\n\n\nERROR: Bootstrap did not complete in time!"
	echo "Your environment (CPU or network bandwidth) might be"
	echo "too slow. Continue by hand or execute cleanup.sh and"
	echo "start all over again."
	exit 1
fi
echo -e "\n\n[INFO] Completing the installation and approving workers...\n"
for csr in $(oc -n openshift-machine-api get csr | awk '/Pending/ {print $1}'); do oc adm certificate approve $csr;done
sleep 180

for csr in $(oc -n openshift-machine-api get csr | awk '/Pending/ {print $1}'); do oc adm certificate approve $csr;done

openshift-install --dir=openshift-install wait-for install-complete --log-level debug       

Part II

Automating with Terraform

With Terraform we will create all the objects we need (templates and VMs) in a single piece of code. If we want to scale our cluster, we just have to modify one variable value and rerun Terraform to change the state of the cluster. Before proceeding, please modify variables.tf and install-config.yaml according to your needs and place them in the terraform folder.
We also need to export the govc variables, since Terraform needs them during the template creation:

export OCP_RELEASE="4.7.4"
export CLUSTER_DOMAIN="vmware.lab.local"
export GOVC_URL='192.168.124.3'
export GOVC_USERNAME='administrator@vsphere.local'
export GOVC_PASSWORD='password'
export GOVC_INSECURE=1
export GOVC_NETWORK='VM Network'
export VMWARE_SWITCH='DSwitch'
export GOVC_DATASTORE='datastore1'
export GOVC_DATACENTER='Datacenter'
export GOVC_RESOURCE_POOL=[VSPHERE_CLUSTER]/Resources  ####default pool
export MYPATH=$(pwd)
export HTTP_SERVER="192.168.124.1"

Create ignition files

cd terraform
rm -f /var/www/html/ignition/*.ign
rm -rf ${MYPATH}/openshift-install
rm -rf ~/.kube
mkdir ${MYPATH}/openshift-install
mkdir ~/.kube
cp install-config.yaml ${MYPATH}/openshift-install/install-config.yaml
cat > ${MYPATH}/openshift-install/bootstrap-append.ign <<EOF
{
  "ignition": {
    "config": {
      "merge": [
      {
        "source": "http://${HTTP_SERVER}:8080/ignition/bootstrap.ign"
      }
      ]
    },
    "version": "3.1.0"
  }
}
EOF
openshift-install create ignition-configs --dir  openshift-install --log-level debug
cp ${MYPATH}/openshift-install/*.ign /var/www/html/ignition/
chmod o+r /var/www/html/ignition/*.ign
restorecon -vR /var/www/html/
cp ${MYPATH}/openshift-install/auth/kubeconfig ~/.kube/config

Create the cluster

terraform init
terraform plan
terraform apply -auto-approve

Wait for installation to complete


openshift-install --dir=openshift-install wait-for bootstrap-complete
openshift-install --dir=openshift-install wait-for bootstrap-complete > /tmp/bootstrap-test 2>&1
grep safe /tmp/bootstrap-test > /dev/null 2>&1
if [ "$?" -ne 0 ]
then
	echo -e "\n\n\nERROR: Bootstrap did not complete in time!"
	echo "Your environment (CPU or network bandwidth) might be"
	echo "too slow. Continue by hand or execute cleanup.sh and"
	echo "start all over again."
	exit 1
fi
echo -e "\n\n[INFO] Completing the installation and approving workers...\n"
for csr in $(oc -n openshift-machine-api get csr | awk '/Pending/ {print $1}'); do oc adm certificate approve $csr;done
sleep 180

for csr in $(oc -n openshift-machine-api get csr | awk '/Pending/ {print $1}'); do oc adm certificate approve $csr;done

openshift-install --dir=openshift-install wait-for install-complete --log-level debug

If the installation times out you might need to type the following command again:

openshift-install --dir=openshift-install wait-for install-complete --log-level debug

Result should look like this:

[root@esxi-bastion terraform]# openshift-install --dir=openshift-install wait-for install-complete --log-level debug
DEBUG OpenShift Installer 4.7.4                    
.
.
.
DEBUG Route found in openshift-console namespace: console
DEBUG OpenShift console route is admitted          
INFO Install complete!                            
INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/root/terraform-vsphere-ignitiontest/openshift-install/auth/kubeconfig'
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.vmware.lab.local
INFO Login to the console with user: "kubeadmin", and password: "TkwHE-GWu5U-rAEsA-FrgqQ"

How does it work?

Just like with govc, we need to create a template and clone it to create the bootstrap, masters and workers, in that order, injecting the ignition data and network configuration along the way.

  • We create a template to clone from, as shown by the Terraform block below.

The local-exec provisioner is needed to actually stop the created VM so it can be used as a template.

In this case we use local_ovf_path and thus have to download the OVA beforehand, but remote_ovf_url works as well for a more dynamic approach.

Terraform's ovf_network_map capability allows us to set the right network for this template.

 resource "vsphere_virtual_machine" "coreostemplate" {
   name             = "coreostemplate"
   resource_pool_id = data.vsphere_resource_pool.pool.id
   datastore_id     = data.vsphere_datastore.datastore.id
   datacenter_id    = data.vsphere_datacenter.dc.id
   host_system_id   = data.vsphere_host.host.id
   num_cpus = 2
   memory   = 4096
   guest_id = "coreos64Guest"
   wait_for_guest_net_timeout  = 0
   wait_for_guest_ip_timeout   = 0
   wait_for_guest_net_routable = false
   enable_disk_uuid  = true
   network_interface {
     network_id = data.vsphere_network.network.id
   }
   ovf_deploy {
     #remote_ovf_url       = "https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/4.7/latest/rhcos-vmware.x86_64.ova"
     local_ovf_path       = "rhcos-vmware.x86_64.ova"  
     disk_provisioning    = "thin"
     ovf_network_map = {
       "VM Network" = data.vsphere_network.network.id
   }
  }
  provisioner "local-exec" {
    command = "govc vm.power -off=true coreostemplate && sleep 10"

    environment = {
      GOVC_URL      = var.vsphere_server
      GOVC_USERNAME = var.vsphere_user
      GOVC_PASSWORD = var.vsphere_password
      GOVC_INSECURE = "true"
    }
  }
 }
  • Let’s take a look at the masters definition block (some lines were removed for better visibility).

Node specs are described in variables.tf, and we use a count loop to create the nodes.

To feed the ignition data to our VMs, we create a data source that reads the local file master.ign:

data "local_file" "master_vm_ignition" {
 filename   = "${var.generationDir}/master.ign"
}

We define the masterVMs vsphere_virtual_machine resource, which depends on the bootstrapVM resource, and we use the CoreOS template created previously via the clone block.

resource "vsphere_virtual_machine" "masterVMs" {
  depends_on = [vsphere_virtual_machine.bootstrapVM]
  count      = var.master_count

  name             = "${var.cluster_name}-master0${count.index}"
.
.
.
.

  clone {
    template_uuid = data.vsphere_virtual_machine.coreostemplate.id
  }

To inject ignition data and metadata into the VM, we need to use the extra_config block. Unfortunately, as stated in the Fedora CoreOS documentation, vApp properties do not work in this scenario. The syntax is very similar to what was done with govc.

  extra_config = {
    "guestinfo.ignition.config.data"           = base64encode(data.local_file.master_vm_ignition.content)
    "guestinfo.ignition.config.data.encoding"  = "base64"
    "guestinfo.hostname"                       = "${var.cluster_name}-master${count.index}"
    "guestinfo.afterburn.initrd.network-kargs" = lookup(var.master_network_config, "master_${count.index}_type") != "dhcp" ? "ip=${lookup(var.master_network_config, "master_${count.index}_ip")}:${lookup(var.master_network_config, "master_${count.index}_server_id")}:${lookup(var.master_network_config, "master_${count.index}_gateway")}:${lookup(var.master_network_config, "master_${count.index}_subnet")}:${var.cluster_name}-master${count.index}:${lookup(var.master_network_config, "master_${count.index}_interface")}:off nameserver=${lookup(var.bootstrap_vm_network_config, "dns")}" : "ip=::::${var.cluster_name}-master${count.index}:ens192:on"
  }
}
  • Worker nodes are created the exact same way, except we change the resource referenced by the depends_on meta-argument. That way we make sure the workers are built after the masters.
resource "vsphere_virtual_machine" "workerVMs" {
  depends_on = [vsphere_virtual_machine.masterVMs]
  .
  .
  .  
}
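Because the node counts are plain Terraform variables (var.master_count is used above; a matching worker count variable is assumed in variables.tf), scaling the cluster afterwards is just a matter of overriding a variable and re-applying, for example:

terraform apply -var="worker_count=3" -auto-approve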

Thank you for reading
