HOL-1844-01: VMware Cloud Foundation Commissioning and Decommissioning a Host

In this simulation, we will show how to handle a host failure in an existing workload domain. We will remove the failed host and replace it with a host from the free pool. Then, after any necessary repairs are complete, we will re-image the failed host and return it to the free pool.

 

Commissioning and Decommissioning a Host

 

This part of the lab is presented as a Hands-on Labs Interactive Simulation. This allows you to experience steps that are too time-consuming or resource-intensive to perform live in the lab environment. In this simulation, you can use the software interface as if you were interacting with a live environment.

The orange boxes show where to click, and the left and right arrow keys can also be used to move through the simulation in either direction.
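
As a rough mental model before we begin: every click in this walkthrough triggers an automated workflow in SDDC Manager, and the three phases of the recovery (decommission the failed host, expand the domain with a free-pool host, and re-add the re-imaged host from its manifest) could in principle be scripted. The Python sketch below illustrates that flow against a hypothetical REST interface; the endpoint paths, payload fields, response shapes, and credentials are illustrative assumptions, not the documented SDDC Manager API.

```python
# Hypothetical sketch of the three-phase recovery flow. Endpoints, payloads,
# and response shapes are assumptions for illustration only -- consult the
# SDDC Manager documentation for the real interface.
import requests

SDDC_MANAGER = "https://sddc-manager.lassen.demo.vmware.com"  # assumed address
AUTH = ("admin", "changeme")  # placeholder credentials

def decommission_host(host_id: str) -> str:
    """Phase 1: remove the failed host from its workload domain."""
    r = requests.post(f"{SDDC_MANAGER}/api/hosts/{host_id}/decommission",
                      auth=AUTH, verify=False)
    r.raise_for_status()
    return r.json()["workflowId"]  # assumed response shape

def expand_domain(domain_id: str, host_id: str) -> str:
    """Phase 2: add a free-pool host to the degraded workload domain."""
    r = requests.post(f"{SDDC_MANAGER}/api/domains/{domain_id}/expand",
                      json={"hostIds": [host_id]}, auth=AUTH, verify=False)
    r.raise_for_status()
    return r.json()["workflowId"]

def add_host(manifest_path: str) -> str:
    """Phase 3: return the re-imaged host to the free pool via its manifest."""
    with open(manifest_path, "rb") as manifest:
        r = requests.post(f"{SDDC_MANAGER}/api/hosts", auth=AUTH,
                          files={"manifest": manifest}, verify=False)
    r.raise_for_status()
    return r.json()["workflowId"]
```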

We begin at the SDDC Manager Dashboard:

  1. Click on Status
    • From the System Status page, we can see that there is one critical alert.
  2. Click on VIEW DETAILS under Alerts
    • We see the "Alert - Server is powere...>" critical alert with the date and time that it occurred.
  3. Click on Alert - Server is powere...> to expand and see additional details
  4. Click on the scroll bar to scroll down.
    • In the alert details, we see this is a SERVER_DOWN_ALERT for host R1N5.  This host is powered off and may be having a power supply or other hardware-related issue. This means the vSAN cluster of which this host is a member is at risk of falling into a degraded state.
    • We need to respond quickly in order to replace the failed host.  We'll leverage the automation capabilities of the SDDC Manager to do this.
  5. Click on DASHBOARD
  6. Click on VIEW DETAILS next to Physical Resources
  7. Click on the icon labeled LASSEN 10 HOSTS
  8. Click on the scroll bar to scroll down
    • Here we see a summary of all the hosts in the rack.  Again we see that the host R1N5 is in a failed state (as indicated by the red icon in the STATUS column).
  9. Click on host R1N5
    • We are unable to view the host details because it is unreachable.  
    • Let's start our recovery by first adding a new host from the Cloud Foundation free pool to the cluster in order to replace this failed host.
  10. Click on DASHBOARD
    • First, let's check the cluster's state in the vSphere Web Client.
  11. Click on the vSphere Web Client tab in the browser
    • Here we see that host 'r1n5.lassen.demo.vmware.com' is not responding and that the affected cluster is our Management Workload Domain.  Let's go back to Cloud Foundation and remove this host from the Management Domain.
  12. Click on the VMware Cloud Foundation tab in the browser
  13. Click on VIEW DETAILS next to Workload Domains
  14. Click on the scroll bar to scroll down
  15. Click on the icon labeled MGMT MANAGEMENT Domain
    • This is the workload domain where our failed host is located.
  16. Click on the scroll bar to scroll down
  17. Click the VRACK-CLUSTER link
  18. Click on the scroll bar to scroll down
    • Here we see the four hosts that are part of our management workload domain.  With R1N5 down, we are down to three healthy hosts.  We need to remove R1N5 and replace it with an available host to restore the cluster to four nodes.
  19. Click on R1N5
    • This is the failed host we need to remove from this cluster.  We do this by decommissioning the host.
  20. Click on DECOMMISSION
    • The Decommission Host dialog box pops up, asking you to verify that you want to decommission this host.
  21. Click on CONFIRM to proceed with decommissioning the host
    • The Host Decommission workflow is initiated.  We can monitor the progress of this workflow by clicking the "System Status Screen" link.
  22. Click on the System Status Screen link
  23. Click on the scroll bar to scroll down
  24. Click on VIEW DETAILS under Workflow Tasks
    • Here we see our active workflow named "VI Resource Pool - Decommission of hosts (192.168..."
  25. Click on the VI Resource Pool - Decommission of hosts (192.168... workflow
  26. Click on the scroll bar to scroll down
    • We see a summary of the workflow.  There are currently 9 pending subtasks and 8 completed subtasks.
  27. Click View Sub Tasks to view the details
  28. Click on the scroll bar to scroll down
    • As the steps are performed, the subtasks will dynamically update in the UI, and progress can be monitored by watching them move from a NEW state, to a RUNNING state, to a SUCCESSFUL state.  Once all the subtasks have completed successfully, the workflow is done.  (A sketch of polling these subtask states appears after the walkthrough.)
    • Here we see that all of the tasks have completed successfully and the host has been removed from the management workload domain.
  29. Click the scroll bar in the subtask section to scroll through the list of tasks
    • The tasks in the decommission workflow include: removing the host from the vSphere cluster; updating the vSAN datastore, vSphere Distributed Switch, and NSX configurations to reflect that the host has been removed; and updating the vCenter Server inventory.  In addition, SDDC Manager will also reconfigure the host's ports on the top-of-rack switches.
  30. Click the scroll bar on the far right to scroll back to the top of the page
  31. Click on Workflows to return to the Workflow summary page
    • Here we see the decommission workflow has completed successfully.
    • Next, we'll verify the host has been removed from the Cloud Foundation inventory.
  32. Click on DASHBOARD
  33. Click on VIEW DETAILS next to Physical Resources
    • We can see that the LASSEN rack now contains 9 hosts, as one host has been decommissioned.
  34. Click on the icon labeled LASSEN 9 HOSTS
  35. Click on the scroll bar to scroll down
    • Looking through the list of servers in the rack, we can see that R1N5 has been removed.
    • We can now proceed with adding a replacement host to the cluster.
  36. Click on DASHBOARD
  37. Click on VIEW DETAILS under Workload Domains
  38. Click on the scroll bar to scroll down
  39. Click on the icon labeled MGMT MANAGEMENT Domain
  40. Click on the scroll bar to scroll down
  41. Click on VRACK-CLUSTER link
  42. Click on the scroll bar to scroll down
    • We can see that the management domain now has three hosts.  Let's add a fourth host to this workload domain to replace the host we just removed.
  43. Click the scroll bar to scroll up
  44. Click on the DOMAIN DETAILS breadcrumb
  45. Click on EXPAND DOMAIN
  46. Click on the scroll bar to scroll down
    • We see the three hosts currently assigned to the domain (indicated by the check box) along with the unassigned hosts that are currently available.
  47. Click on host R1N3 to assign it to the domain
  48. Click the scroll bar to scroll down
  49. Click NEXT
  50. Click the scroll bar to scroll down
    • Here we are able to review the details of the domain expansion.  We see that one additional host is being added.
  51. Click on APPLY
    • The Expand domain verification box pops up.
  52. Click on CONFIRM
    • We are notified that the Domain Expand Workflow has been triggered.  Here again, we can review the status of this workflow in the status section.
  53. Click OK
  54. Click on STATUS
  55. Click the scroll bar to scroll down
  56. Click on VIEW DETAILS under Workflow Tasks
    • Here we see the "VI Resource Pool - Expanding MGMT" workflow is RUNNING
  57. Click on VI Resource Pool - Expanding MGMT to view the details of this workflow.
  58. Click on the scroll bar to scroll down
    • We see there are 23 subtasks being executed to add the host.
  59. Click on View Sub Tasks
  60. Click the scroll bar to scroll down
    • Here we see the separate tasks that SDDC Manager is executing.  Again, we can scroll through the list to see the steps that are being taken to add the host to the cluster.
    • Here we see the switch ports are updated with the correct VLAN information, the host is added to the vCenter inventory and joined to the cluster, and the vSAN datastore, vSphere Distributed Switch, and NSX configurations are updated accordingly.
  61. Click on the scroll bar to scroll through the list of sub-tasks
    • We see that the sub-tasks have all completed successfully.
  62. Click on the scroll bar to scroll back to the top of the page
  63. Click Workflows to return to the workflow summary
    • We see the workflow "VI Resource Pool - Expanding MGMT" has completed successfully
  64. Click on DASHBOARD
  65. Click on VIEW DETAILS next to Workload Domains
  66. Click on the scroll bar to scroll down
  67. Click on the icon labeled MGMT MANAGEMENT Domain
  68. Click on the scroll bar to scroll down
    • We can see that following the domain expansion the Management Workload domain once again contains four hosts.
  69. Click the vSphere Web Client browser tab
    • From vSphere, we can confirm that host R1N3 has been added to the cluster; it again contains four hosts and is no longer in an alarm state.  (A scripted version of this check appears after the walkthrough.)
    • We have successfully replaced the failed host in the management workload domain.  Next, we'll return the failed host to the Cloud Foundation inventory.
  70. Click on the VMware Cloud Foundation tab in the browser
  71. Click on DASHBOARD
    • The failed host has been repaired, and we are ready to add it back to the Cloud Foundation free pool. To do this, we first need to re-image the server using the VIA (VMware Imaging Appliance).
  72. Click the new tab button in the browser to open a new browser window
    • In the new browser window, we have connected to the VIA using the URL "192.168.100.2:8080/via"
    • Here we will activate the software bundle, install ESXi on the repaired server, and download the manifest file.
  73. Click on Bundle
  74. Click on the Bundle drop down
  75. Select the Latest 2.3.0-5526927 version
  76. Click Activate Bundle
    • This tells the VIA which ESXi version it should use to image the host.
  77. Click the Imaging tab
  78. Click in the Name Box
  79. Type Repair Node 5
  80. Click in the Description Box
  81. Type Repair Node 5
  82. Click on the Deployment Type drop down
  83. Select Cloud Foundation Individual Deployment
    • The device type defaults to ESXi SERVER and the number of servers to be imaged defaults to 1, which is what we need, so we accept these defaults.
  84. Click on the Vendor Dropdown
  85. Select Quanta Computers, Inc.
  86. Click on Start Imaging
    • The VIA instantiates a DHCP server and prompts you to reboot the physical server to initiate a PXE boot.  The host will PXE boot, and the VIA will proceed to install ESXi.  (A sketch of this imaging request appears after the walkthrough.)
  87. Click on the ESXi SERVER box
    • During imaging, we can click the server to see the steps and monitor the progress.
  88. Click the X in the corner to close the pop-up window.
    • After the host has been imaged, the VIA verifies that ESXi was successfully installed and prompts us to complete the imaging process.
  89. Click on Complete
    • After imaging the host, we need to download its manifest file.
  90. Click the Inventory tab
    • The last imaging run is automatically selected; it is Run ID 16 and was named "Repair Node 5".
  91. Click the Download Manifest link
    • The manifest file for the host is downloaded.  
    • We are now ready to add the host back to the SDDC Manager inventory.
  92. Click on the VMware Cloud Foundation web browser tab
  93. Click on SETTINGS
  94. Click on ADD HOST
  95. Click the Select the Rack to add Host drop down
  96. Select LASSEN
  97. Click the BROWSE button
  98. Select the manifest file we just downloaded from the VIA (vcf-imaging-details-Repair-Node-5)
  99. Click on Open
  100. Click on ADD HOST
    • SDDC Manager uses the information in the manifest file to discover the new host.
  101. Click CONTINUE
    • SDDC Manager will now complete the host bring-up and add the host back to its inventory.
  102. Click on the scroll bar to scroll down
    • It takes just a few minutes for the host bring-up to complete.  Once completed, the host will be configured with a private IP address and the necessary DNS and NTP settings, ready for use with Cloud Foundation.
  103. Click OK
  104. Click the scroll bar to scroll down and see the tasks that were completed during the host bring-up.
    • We can now verify the host is back in the inventory.
  105. Click on DASHBOARD
  106. Click on VIEW DETAILS next to Physical Resources
  107. Click the icon labeled LASSEN 10 HOSTS
  108. Click on the scroll bar to scroll down
    • Here we see the host R1N5 is back in the inventory with a healthy status.
  109. Click on the host R1N5
  110. Click on the scroll bar to scroll down
    • As we can see from the details screen, R1N5 is now successfully online, with a status of Green.
  111. Click on DASHBOARD
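
Step 28 described subtasks progressing from NEW, to RUNNING, to SUCCESSFUL. A minimal polling sketch of that progression follows, assuming a hypothetical workflow endpoint and response shape; neither is the documented SDDC Manager API.

```python
# Minimal polling sketch for the NEW -> RUNNING -> SUCCESSFUL subtask
# progression described in step 28. The endpoint and response fields are
# assumptions for illustration only.
import time
import requests

SDDC_MANAGER = "https://sddc-manager.lassen.demo.vmware.com"  # assumed address
AUTH = ("admin", "changeme")  # placeholder credentials

def wait_for_workflow(workflow_id: str, interval_seconds: int = 30) -> None:
    """Block until every subtask of the workflow completes successfully."""
    while True:
        r = requests.get(f"{SDDC_MANAGER}/api/workflows/{workflow_id}",
                         auth=AUTH, verify=False)
        r.raise_for_status()
        states = [t["state"] for t in r.json()["subTasks"]]  # assumed shape
        if any(s == "FAILED" for s in states):
            raise RuntimeError(f"workflow {workflow_id} has a failed subtask")
        if all(s == "SUCCESSFUL" for s in states):
            return  # all subtasks done; the workflow is complete
        time.sleep(interval_seconds)
```

The same loop would apply to both the decommission workflow (step 24) and the domain-expansion workflow (step 56), since both report their progress as subtask states.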
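
Step 69 verified from the vSphere Web Client that the cluster is back to four connected hosts. The same check can be scripted with pyVmomi, the Python bindings for the vSphere API; the vCenter address and credentials below are assumptions for the lab environment.

```python
# Scripted version of the step 69 check: list each cluster's hosts and their
# connection state via pyVmomi (pip install pyvmomi). Address and credentials
# are assumptions for the lab environment.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only: skip certificate checks
si = SmartConnect(host="vcenter.lassen.demo.vmware.com",  # assumed address
                  user="administrator@vsphere.local", pwd="changeme",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    for cluster in view.view:
        for host in cluster.host:
            # After the expansion, expect four hosts, all "connected".
            print(cluster.name, host.name, host.runtime.connectionState)
finally:
    Disconnect(si)
```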
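
Steps 73-86 configured an imaging run in the VIA by hand. The sketch below restates those form values as a hypothetical request against the VIA address shown in the simulation; the REST paths and payload keys are illustrative assumptions, not a documented VIA API.

```python
# Hypothetical sketch of the VIA imaging run configured in steps 73-86.
# The VIA address comes from the simulation; the REST paths and payload
# keys are assumptions for illustration only.
import requests

VIA = "http://192.168.100.2:8080/via"  # address shown in the simulation

# Activate the ESXi software bundle selected in steps 74-76 (hypothetical path).
requests.post(f"{VIA}/api/bundles/2.3.0-5526927/activate").raise_for_status()

# Start an imaging run with the form values entered in steps 78-85.
run = {
    "name": "Repair Node 5",
    "description": "Repair Node 5",
    "deploymentType": "Cloud Foundation Individual Deployment",
    "deviceType": "ESXI_SERVER",  # default accepted in the walkthrough
    "serverCount": 1,             # default accepted in the walkthrough
    "vendor": "Quanta Computers, Inc.",
}
requests.post(f"{VIA}/api/imaging/runs", json=run).raise_for_status()
```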

This concludes our demonstration of how to replace a failed host in a Cloud Foundation environment.  In this demo, we have seen how cloud administrators can leverage the powerful automation capabilities of the VMware SDDC Manager to respond quickly to hardware failures and perform the recovery steps with no disruption to the business.

To return to the lab, click the link in the top right corner or close this browser tab.

 

 

Copyright © 2017 VMware, Inc. All rights reserved.