HOL-1844-01: VMware Cloud Foundation Commissioning and Decommissioning a Host

In this simulation, we will show how to handle a host failure in an existing workload domain. We will remove the failed host and replace it with a host from the free pool. Then, after any necessary repairs are complete, we will re-image the failed host and return it to the free pool.

 

Commissioning and Decommissioning a Host

 

This part of the lab is presented as a Hands-on Labs Interactive Simulation. This allows you to experience steps that are too time-consuming or resource-intensive to perform live in the lab environment. In this simulation, you can use the software interface as if you were interacting with a live environment.

The orange boxes show where to click, and the left and right arrow keys can also be used to move through the simulation in either direction.
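
As a rough mental model before we begin: every click in this walkthrough triggers an automated workflow in SDDC Manager, and the three phases of the recovery (decommission the failed host, expand the domain with a free-pool host, and re-add the re-imaged host from its manifest) could in principle be scripted. The Python sketch below illustrates that flow against a hypothetical REST interface; the endpoint paths, payload fields, response shapes, and credentials are illustrative assumptions, not the documented SDDC Manager API.

```python
# Hypothetical sketch of the three-phase recovery flow. Endpoints, payloads,
# and response shapes are assumptions for illustration only -- consult the
# SDDC Manager documentation for the real interface.
import requests

SDDC_MANAGER = "https://sddc-manager.lassen.demo.vmware.com"  # assumed address
AUTH = ("admin", "changeme")  # placeholder credentials

def decommission_host(host_id: str) -> str:
    """Phase 1: remove the failed host from its workload domain."""
    r = requests.post(f"{SDDC_MANAGER}/api/hosts/{host_id}/decommission",
                      auth=AUTH, verify=False)
    r.raise_for_status()
    return r.json()["workflowId"]  # assumed response shape

def expand_domain(domain_id: str, host_id: str) -> str:
    """Phase 2: add a free-pool host to the degraded workload domain."""
    r = requests.post(f"{SDDC_MANAGER}/api/domains/{domain_id}/expand",
                      json={"hostIds": [host_id]}, auth=AUTH, verify=False)
    r.raise_for_status()
    return r.json()["workflowId"]

def add_host(manifest_path: str) -> str:
    """Phase 3: return the re-imaged host to the free pool via its manifest."""
    with open(manifest_path, "rb") as manifest:
        r = requests.post(f"{SDDC_MANAGER}/api/hosts", auth=AUTH,
                          files={"manifest": manifest}, verify=False)
    r.raise_for_status()
    return r.json()["workflowId"]
```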

We begin at the SDDC Manager Dashboard:

  1. Click on Status
    • From the System Status page, we can see that there is one critical alert.
  2. Click on VIEW DETAILS under Alerts
    • We see the "Alert - Server is powere...>" critical alert with the date and time that it occurred.
  3. Click on Alert - Server is powere...> to expand and see additional details
  4. Click on the scroll bar to scroll down.
    • In the alert details, we see this is a SERVER_DOWN_ALERT for host R1N5.  This host is powered off and may be having a power supply or other hardware-related issue. This means the vSAN cluster of which this host is a member is at risk of falling into a degraded state.
    • We need to respond quickly in order to replace the failed host.  We'll leverage the automation capabilities of the SDDC Manager to do this.
  5. Click on DASHBOARD
  6. Click on VIEW DETAILS next to Physical Resources
  7. Click on the icon labeled LASSEN 10 HOSTS
  8. Click on the scroll bar to scroll down
    • Here we see a summary of all the hosts in the rack.  Again we see that the host R1N5 is in a failed state (as indicated by the red icon in the STATUS column).
  9. Click on host R1N5
    • We are unable to view the host details because it is unreachable.  
    • Let's start our recovery by first adding a new host from the Cloud Foundation free pool to the cluster in order to replace this failed host.
  10. Click on DASHBOARD
    • First, let's check the cluster's state in the vSphere Web Client.
  11. Click on the vSphere Web Client tab in the browser
    • Here we see that host 'r1n5.lassen.demo.vmware.com' is not responding and that the affected cluster is our Management Workload Domain.  Let's go back to Cloud Foundation and remove this host from the Management Domain.
  12. Click on the VMware Cloud Foundation tab in the browser
  13. Click on VIEW DETAILS next to Workload Domains
  14. Click on the scroll bar to scroll down
  15. Click on the icon labeled MGMT MANAGEMENT Domain
    • This is the workload domain where our failed host is located.
  16. Click on the scroll bar to scroll down
  17. Click the VRACK-CLUSTER link
  18. Click on the scroll bar to scroll down
    • Here we see the four hosts that are part of our management workload domain.  With R1N5 down, we are down to three healthy hosts.  We need to remove R1N5 and replace it with an available host to restore the cluster to four nodes.
  19. Click on R1N5
    • This is the failed host we need to remove from this cluster.  We do this by decommissioning the host.
  20. Click on DECOMMISSION
    • The Decommission Host dialog box pops up, asking you to verify that you want to decommission this host.
  21. Click on CONFIRM to proceed with decommissioning the host
    • The Host Decommission workflow is initiated.  We can monitor the progress of this workflow by clicking the "System Status Screen" link.
  22. Click on the System Status Screen link
  23. Click on the scroll bar to scroll down
  24. Click on VIEW DETAILS under Workflow Tasks
    • Here we see our active workflow named "VI Resource Pool - Decommission of hosts (192.168..."
  25. Click on the VI Resource Pool - Decommission of hosts (192.168... workflow
  26. Click on the scroll bar to scroll down
    • We see a summary of the workflow.  There are currently 9 pending subtasks and 8 completed subtasks.
  27. Click View Sub Tasks to view the details
  28. Click on the scroll bar to scroll down
    • As the steps are performed, the subtasks will dynamically update in the UI, and progress can be monitored by watching them move from a NEW state, to a RUNNING state, to a SUCCESSFUL state.  Once all the subtasks have completed successfully, the workflow is done.  (A sketch of polling these subtask states appears after the walkthrough.)
    • Here we see that all of the tasks have completed successfully and the host has been removed from the management workload domain.
  29. Click the scroll bar in the subtask section to scroll through the list of tasks
    • The tasks in the decommission workflow include: removing the host from the vSphere cluster; updating the vSAN datastore, vSphere Distributed Switch, and NSX configurations to reflect that the host has been removed; and updating the vCenter Server inventory.  In addition, SDDC Manager will also reconfigure the host's ports on the top-of-rack switches.
  30. Click the scroll bar on the far right to scroll back to the top of the page
  31. Click on Workflows to return to the Workflow summary page
    • Here we see the decommission workflow has completed successfully.
    • Next, we'll verify the host has been removed from the Cloud Foundation inventory.
  32. Click on DASHBOARD
  33. Click on VIEW DETAILS next to Physical Resources
    • We can see that the LASSEN rack now contains 9 hosts, as one host has been decommissioned.
  34. Click on the icon labeled LASSEN 9 HOSTS
  35. Click on the scroll bar to scroll down
    • Looking through the list of servers in the rack, we can see that R1N5 has been removed.
    • We can now proceed with adding a replacement host to the cluster.
  36. Click on DASHBOARD
  37. Click on VIEW DETAILS under Workload Domains
  38. Click on the scroll bar to scroll down
  39. Click on the icon labeled MGMT MANAGEMENT Domain
  40. Click on the scroll bar to scroll down
  41. Click on VRACK-CLUSTER link
  42. Click on the scroll bar to scroll down
    • We can see that the management domain now has three hosts.  Let's add a fourth host to this workload domain to replace the host we just removed.
  43. Click the scroll bar to scroll up
  44. Click on the DOMAIN DETAILS breadcrumb
  45. Click on EXPAND DOMAIN
  46. Click on the scroll bar to scroll down
    • We see the three hosts currently assigned to the domain (indicated by the check box) along with the unassigned hosts that are currently available.
  47. Click on host R1N3 to assign it to the domain
  48. Click the scroll bar to scroll down
  49. Click NEXT
  50. Click the scroll bar to scroll down
    • Here we are able to review the details of the domain expansion.  We see that one additional host is being added.
  51. Click on APPLY
    • The Expand domain verification box pops up.
  52. Click on CONFIRM
    • We are notified that the Domain Expand Workflow has been triggered.  Here again, we can review the status of this workflow in the status section.
  53. Click OK
  54. Click on STATUS
  55. Click the scroll bar to scroll down
  56. Click on VIEW DETAILS under Workflow Tasks
    • Here we see the "VI Resource Pool - Expanding MGMT" workflow is RUNNING
  57. Click on VI Resource Pool - Expanding MGMT to view the details of this workflow.
  58. Click on the scroll bar to scroll down
    • We see there are 23 subtasks being executed to add the host.
  59. Click on View Sub Tasks
  60. Click the scroll bar to scroll down
    • Here we see the separate tasks that SDDC Manager is executing.  Again, we can scroll through the list to see the steps that are being taken to add the host to the cluster.
    • Here we see the switch ports are updated with the correct VLAN information, the host is added to the vCenter inventory and joined to the cluster, and the vSAN datastore, vSphere Distributed Switch, and NSX configurations are updated accordingly.
  61. Click on the scroll bar to scroll through the list of sub-tasks
    • We see that the sub-tasks have all completed successfully.
  62. Click on the scroll bar to scroll back to the top of the page
  63. Click Workflows to return to the workflow summary
    • We see the workflow "VI Resource Pool - Expanding MGMT" has completed successfully
  64. Click on DASHBOARD
  65. Click on VIEW DETAILS next to Workload Domains
  66. Click on the scroll bar to scroll down
  67. Click on the icon labeled MGMT MANAGEMENT Domain
  68. Click on the scroll bar to scroll down
    • We can see that following the domain expansion the Management Workload domain once again contains four hosts.
  69. Click the vSphere Web Client browser tab
    • From vSphere, we can confirm that host R1N3 has been added to the cluster; it again contains four hosts and is no longer in an alarm state.  (A scripted version of this check appears after the walkthrough.)
    • We have successfully replaced the failed host in the management workload domain.  Next, we'll return the failed host to the Cloud Foundation inventory.
  70. Click on the VMware Cloud Foundation tab in the browser
  71. Click on DASHBOARD
    • The failed host has been repaired, and we are ready to add it back to the Cloud Foundation free pool. To do this, we first need to re-image the server using the VIA (VMware Imaging Appliance).
  72. Click the new tab button in the browser to open a new browser window
    • In the new browser window, we have connected to the VIA using the URL "192.168.100.2:8080/via"
    • Here we will activate the software bundle, install ESXi on the repaired server, and download the manifest file.
  73. Click on Bundle
  74. Click on the Bundle drop down
  75. Select the Latest 2.3.0-5526927 version
  76. Click Activate Bundle
    • This tells the VIA which ESXi version it should use to image the host.
  77. Click the Imaging tab
  78. Click in the Name Box
  79. Type Repair Node 5
  80. Click in the Description Box
  81. Type Repair Node 5
  82. Click on the Deployment Type drop down
  83. Select Cloud Foundation Individual Deployment
    • The device type defaults to ESXi SERVER and the number of servers to be imaged defaults to 1, which is what we need, so we accept these defaults.
  84. Click on the Vendor Dropdown
  85. Select Quanta Computers, Inc.
  86. Click on Start Imaging
    • The VIA instantiates a DHCP server and prompts you to reboot the physical server to initiate a PXE boot.  The host will PXE boot, and the VIA will proceed to install ESXi.  (A sketch of this imaging request appears after the walkthrough.)
  87. Click on the ESXi SERVER box
    • During imaging, we can click the server to see the steps and monitor the progress.
  88. Click the X in the corner to close the pop-up window.
    • After the host has been imaged, the VIA verifies that ESXi was successfully installed and prompts us to complete the imaging process.
  89. Click on Complete
    • After imaging the host, we need to download its manifest file.
  90. Click the Inventory tab
    • The last imaging run is automatically selected; it is Run ID 16 and was named "Repair Node 5".
  91. Click the Download Manifest link
    • The manifest file for the host is downloaded.  
    • We are now ready to add the host back to the SDDC Manager inventory.
  92. Click on the VMware Cloud Foundation web browser tab
  93. Click on SETTINGS
  94. Click on ADD HOST
  95. Click the Select the Rack to add Host drop down
  96. Select LASSEN
  97. Click the BROWSE button
  98. Select the manifest file we just downloaded from the VIA (vcf-imaging-details-Repair-Node-5)
  99. Click on Open
  100. Click on ADD HOST
    • SDDC Manager uses the information in the manifest file to discover the new host.
  101. Click CONTINUE
    • SDDC Manager will now complete the host bring-up and add the host back to its inventory.
  102. Click on the scroll bar to scroll down
    • It takes just a few minutes for the host bring-up to complete.  Once completed, the host will be configured with a private IP address and the necessary DNS and NTP settings, ready for use with Cloud Foundation.
  103. Click OK
  104. Click the scroll bar to scroll down and see the tasks that were completed during the host bring-up.
    • We can now verify the host is back in the inventory.
  105. Click on DASHBOARD
  106. Click on VIEW DETAILS next to Physical Resources
  107. Click the icon labeled LASSEN 10 HOSTS
  108. Click on the scroll bar to scroll down
    • Here we see the host R1N5 is back in the inventory with a healthy status.
  109. Click on the host R1N5
  110. Click on the scroll bar to scroll down
    • As we can see from the details screen, R1N5 is now successfully online, with a status of Green.
  111. Click on DASHBOARD
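
Step 28 described subtasks progressing from NEW, to RUNNING, to SUCCESSFUL. A minimal polling sketch of that progression follows, assuming a hypothetical workflow endpoint and response shape; neither is the documented SDDC Manager API.

```python
# Minimal polling sketch for the NEW -> RUNNING -> SUCCESSFUL subtask
# progression described in step 28. The endpoint and response fields are
# assumptions for illustration only.
import time
import requests

SDDC_MANAGER = "https://sddc-manager.lassen.demo.vmware.com"  # assumed address
AUTH = ("admin", "changeme")  # placeholder credentials

def wait_for_workflow(workflow_id: str, interval_seconds: int = 30) -> None:
    """Block until every subtask of the workflow completes successfully."""
    while True:
        r = requests.get(f"{SDDC_MANAGER}/api/workflows/{workflow_id}",
                         auth=AUTH, verify=False)
        r.raise_for_status()
        states = [t["state"] for t in r.json()["subTasks"]]  # assumed shape
        if any(s == "FAILED" for s in states):
            raise RuntimeError(f"workflow {workflow_id} has a failed subtask")
        if all(s == "SUCCESSFUL" for s in states):
            return  # all subtasks done; the workflow is complete
        time.sleep(interval_seconds)
```

The same loop would apply to both the decommission workflow (step 24) and the domain-expansion workflow (step 56), since both report their progress as subtask states.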
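
Step 69 verified from the vSphere Web Client that the cluster is back to four connected hosts. The same check can be scripted with pyVmomi, the Python bindings for the vSphere API; the vCenter address and credentials below are assumptions for the lab environment.

```python
# Scripted version of the step 69 check: list each cluster's hosts and their
# connection state via pyVmomi (pip install pyvmomi). Address and credentials
# are assumptions for the lab environment.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only: skip certificate checks
si = SmartConnect(host="vcenter.lassen.demo.vmware.com",  # assumed address
                  user="administrator@vsphere.local", pwd="changeme",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    for cluster in view.view:
        for host in cluster.host:
            # After the expansion, expect four hosts, all "connected".
            print(cluster.name, host.name, host.runtime.connectionState)
finally:
    Disconnect(si)
```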
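
Steps 73-86 configured an imaging run in the VIA by hand. The sketch below restates those form values as a hypothetical request against the VIA address shown in the simulation; the REST paths and payload keys are illustrative assumptions, not a documented VIA API.

```python
# Hypothetical sketch of the VIA imaging run configured in steps 73-86.
# The VIA address comes from the simulation; the REST paths and payload
# keys are assumptions for illustration only.
import requests

VIA = "http://192.168.100.2:8080/via"  # address shown in the simulation

# Activate the ESXi software bundle selected in steps 74-76 (hypothetical path).
requests.post(f"{VIA}/api/bundles/2.3.0-5526927/activate").raise_for_status()

# Start an imaging run with the form values entered in steps 78-85.
run = {
    "name": "Repair Node 5",
    "description": "Repair Node 5",
    "deploymentType": "Cloud Foundation Individual Deployment",
    "deviceType": "ESXI_SERVER",  # default accepted in the walkthrough
    "serverCount": 1,             # default accepted in the walkthrough
    "vendor": "Quanta Computers, Inc.",
}
requests.post(f"{VIA}/api/imaging/runs", json=run).raise_for_status()
```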

This concludes our demonstration of how to replace a failed host in a Cloud Foundation environment.  In this demo, we have seen how cloud administrators can leverage the powerful automation capabilities of the VMware SDDC Manager to respond quickly to hardware failures and perform the recovery steps with no disruption to the business.

To return to the lab, click the link in the top right corner or close this browser tab.

 

 

Copyright © 2017 VMware, Inc. All rights reserved.