VMware Hands-on Labs - HOL-SDC-1404


Lab Overview - HOL-SDC-1404 - vSphere Performance Optimization

Lab Guidance


This lab, HOL-SDC-1404, covers vSphere performance best practices and various performance-related features. You will work with a broad array of solutions and tools, including VMware Labs "Flings" and esxtop, to gauge and diagnose performance in a VMware environment. vSphere features related to performance include vFRC (vSphere Flash Read Cache), vNUMA, Latency Sensitivity, and Power Policy settings.

The lab contains the following modules.

You have 90 minutes for each lab session, and next to each module you can see the estimated time to complete it. Every module can be completed by itself, and the modules can be taken in any order, but make sure that you follow the instructions carefully with respect to the cleanup procedure after each module. In short, all VMs should be shut down after the completion of each module using the script indicated in that module.

Lab Captains: Tom Turco (Module 1), Henrik Moenster (Modules 2, 3, 4), Robert Jensen (Modules 5, 6).


Module 1 - Basic vSphere Performance Concepts and Troubleshooting (60 Min)

CPU Performance


The goal of this lab is to expose you to a CPU performance problem in a virtualized environment. It will also guide you on how to quickly identify performance problems by checking various performance metrics and settings.

This module is similar to Module 1 from the HOL-SDC-1304 lab from VMworld 2013. If you have taken that module, you may want to skip this module and move on to the modules on vSphere Flash Read Cache, Latency Sensitivity in vSphere, vNUMA, BIOS power policies and visualEsxtop.

While the time available in this lab constrains the number of performance problems we can review as examples, we have selected relevant problems that are commonly seen in vSphere environments. By walking through these examples, you should be better able to understand and troubleshoot typical performance problems.

For the complete Performance Troubleshooting Methodology and a list of VMware Best Practices, please visit the VMware.com website:

http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.5.pdf

http://pubs.vmware.com/vsphere-55/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-55-monitoring-performance-guide.pdf

http://communities.vmware.com/servlet/JiveServlet/downloadBody/23094-102-2-30667/vsphere5x-perfts-vcops.pdf


 

Getting Started

Performance problems may occur when there are insufficient CPU resources to satisfy demand. Excessive demand for CPU resources on a vSphere host may occur for many reasons. In some cases, the cause is straightforward. Populating a vSphere host with too many virtual machines running compute-intensive applications can make it impossible to supply sufficient CPU resources to all the individual virtual machines. However, sometimes the cause may be more subtle, related to the inefficient use of available resources or non-optimal virtual machine configurations.

This lab demonstrates CPU contention in a vSphere environment. The first step is to prepare the environment for the demonstration. Please continue to start the lab.

PLEASE NOTE: The performance lab is typically very popular and can stress the lab environment when too many people are running the lab simultaneously.  If you receive login, single-sign-on, or certificate errors when you start the lab, it probably means that the lab has not completed its startup process.  Wait a few minutes and try again.  If the issue persists, please contact a proctor for assistance.  

PLEASE NOTE #2: If you have problems typing in any of the required commands throughout this lab, please use the Keyboard Cheat Sheet found in the README.txt file on the desktop to Copy/Paste the correct commands.

 

 

For users with non-US Keyboards

 

NOTE:  If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop. Alternatively, you can use the On-Screen Keyboard.

 

 

On-Screen Keyboard

 

If you are having issues with the keyboard, another option is to use the On-Screen Keyboard.

To open the On-Screen Keyboard go to Start - All Programs - Accessories - Ease of Access - On-Screen Keyboard

 

 

Open a PowerCLI window

 

Click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt.

 

 

Start the CPU Workload

 

From the PowerCLI Console

type:

.\StartCPUTest.ps1

press enter

This first experiment demonstrates CPU contention between two CPU-intensive virtual machines running on the same host.

While the script configures and starts up the virtual machines, please continue to read ahead.

 

 

Overview of CPU Test

Below is a list of the most common CPU performance issues…

High Ready Time: Ready time above 10% could indicate CPU contention and might impact the performance of CPU-intensive applications. However, some less CPU-sensitive applications and virtual machines can have much higher values of ready time and still perform satisfactorily.

High Co-Stop Time: Co-stop time indicates that the VM has more vCPUs than it needs, and that the excess vCPUs create scheduling overhead that drags down the performance of the VM. The VM will likely run better with fewer vCPUs. A vCPU with high co-stop time is being held back from running while the other, more idle vCPUs of the same VM catch up to the busy one.

CPU Limits: CPU Limits directly prevent a virtual machine from using more than a set amount of CPU resources. Any CPU limit might cause a CPU performance problem if the virtual machine needs resources beyond the limit.

Host CPU Saturation: When the physical CPUs of a vSphere host are consistently utilized at 85% or more, the vSphere host may be saturated. When a vSphere host is saturated, it is more difficult for the scheduler to find free physical CPU resources on which to run virtual machines.

Guest CPU Saturation: Guest CPU (vCPU) saturation occurs when the application inside the virtual machine is using 90% or more of the CPU resources assigned to the virtual machine. This may be an indicator that the application is bottlenecked on vCPU resources. In these situations, adding vCPU resources to the virtual machine might improve performance.

Incorrect SMP Usage: Using large SMP virtual machines can cause extra overhead. Virtual machines should be correctly sized for the application intended to run in them. Some applications only support multithreading up to a certain number of threads, and assigning additional vCPUs to the virtual machine may cause additional overhead. If vCPU usage shows that a machine configured with multiple vCPUs is only using one of them, that may indicate that the application inside the virtual machine cannot take advantage of the additional vCPU capacity, or that the guest OS is not configured correctly.

Low Guest Usage: Low in-guest CPU utilization might indicate that the application is not configured correctly, or that the application is starved of some other resource, such as I/O or memory, and therefore cannot fully utilize the assigned vCPU resources.
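If you prefer to check the Ready and Co-Stop counters from the command line, the following is a minimal PowerCLI sketch; it assumes an existing Connect-VIServer session and uses this lab's perf_cpu_worker-l-01a VM purely as an example. The counter names are standard vCenter statistics, and real-time samples cover 20-second (20,000 ms) intervals.

# Minimal sketch: pull the latest real-time Ready and Co-Stop values for one VM
# and convert them from milliseconds to a percentage of the 20,000 ms sample period.
$vm = Get-VM -Name "perf_cpu_worker-l-01a"

$ready  = Get-Stat -Entity $vm -Stat "cpu.ready.summation"  -Realtime |
          Where-Object { $_.Instance -eq "" } | Select-Object -First 1
$costop = Get-Stat -Entity $vm -Stat "cpu.costop.summation" -Realtime |
          Where-Object { $_.Instance -eq "" } | Select-Object -First 1

# The VM-level (aggregate) values cover all vCPUs, so divide by vCPUs x sample period.
$readyPct  = $ready.Value  / (20000 * $vm.NumCpu) * 100
$costopPct = $costop.Value / (20000 * $vm.NumCpu) * 100

"{0}: Ready {1:N1}%  Co-Stop {2:N1}%" -f $vm.Name, $readyPct, $costopPct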

 

 

CPU Test Started

 

When the script completes, you will see two VM Stats Collector applications start up. The script has started a CPU intensive application on the perf_cpu_worker-l-01a and perf_cpu_worker-l-02a virtual machines and each is collecting the benchmark results from those CPU intensive workloads.

IMPORTANT NOTE: Due to changing loads in the lab environment, your values may vary from the values shown in the screenshots.

 

 

Login to vSphere

 

Start Firefox, and log into vSphere. The vSphere web client is the default start page.

Login using Windows session authentication.

Credentials used are:

User name: corp\Administrator
Password: VMware1!

 

 

Navigate the Web Client to Hosts and Clusters Screen

 

  1. Click on the Hosts and Clusters icon

 

 

Navigate to the Performance Screen for the perf_cpu_worker-l-01a virtual machine

 

  1. Select the perf_cpu_worker-l-01a virtual machine
  2. Select the Monitor tab
  3. Select the Performance screen
  4. Select the Advanced view
  5. Click on Chart Options

 

 

View Latest Values for Performance CPU Metrics

 

The key CPU counters to view when investigating a potential CPU issue are selected as follows:

  1. Select  CPU from the Chart metrics
  2. Select only the perf_cpu_worker-l-01a object
  3. Click None under the List of Counters section
  4. Select Demand, Ready, and Usage in MHz
  5. Click Ok

 

 

CPU State Time Explanation

 

Virtual machines can be in any one of four high-level CPU states: Run, Ready, Co-Stop, and Wait.

 

 

Navigating Performance Charts

 

TIP: You can shrink the left-hand navigation pane by clicking on the Navigator tab. You can also shrink the Advanced/Overview selector by pressing the arrows at the top of that pane.

TIP: You can right click on the Performance Chart Legend column header bar and select or deselect the columns you want to see.

Notice the amount of CPU this virtual machine is demanding and compare that to the amount of CPU usage the virtual machine is actually allocated (Usage in MHz). The virtual machine is demanding more than it is currently being allowed to use.

Notice that the virtual machine is also seeing a large amount of ready time.

Guidance: Ready time greater than 10% could be a performance concern.

 

 

Explanation of value conversion

 

NOTE:  vCenter reports some metrics such as "Ready Time" in milliseconds (ms). To convert the milliseconds (ms) value to a percentage, divide the Ready time by the length of the sample period in milliseconds and multiply by 100; the real-time charts use a 20-second (20,000 ms) sample period.

For multi vCPU virtual machines you need to multiply the Sample Period by the number of vCPUs in the virtual machine to determine the total time of the sample period. It is also beneficial to monitor Co-Stop time on multi vCPU virtual machines.  Like Ready time, Co-Stop time greater than 10% could indicate a performance problem.  You can examine Ready time and Co-Stop metrics per vCPU as well as per VM.  Per vCPU is the most accurate way to examine statistics like these.
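As a quick worked example (assuming the real-time chart's default 20-second sample period): a 2-vCPU virtual machine reporting 4,000 ms of Ready time in one sample has 4,000 / (20,000 x 2) x 100 = 10% ready time, right at the guidance threshold above.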

 

 

Examine ESX Host Level CPU Metrics

 

  1. Select esx-02a.corp.local
  2. Select the Monitor tab
  3. Select Performance
  4. Select the Advanced view
  5. Select the CPU view
  6. Notice in the chart that only one of the CPUs in the host seems to have any significant workload running on it. One CPU is at 100%, but the other CPU in the host is not really being used.

 

 

Misconfigured Affinity

 

In this experiment, the virtual machines have CPU affinity set, and it is misconfigured.

Change the affinity to correct the CPU contention.

1. Right Click on the perf_cpu_worker-l-01a virtual machine

2. Select Edit Settings

 

 

Check Affinity Settings

 

  1. Expand the CPU item in the list and you will see that affinity is set. Unfortunately, both virtual machines are bound to the same CPU (cpu 0). This can happen if an administrator sets affinity for a virtual machine and then creates a second virtual machine by cloning the original.
  2. Change the "0" to "1"  to correctly balance the virtual machines across the physical CPUs in the system.  
  3. Press OK to make the changes.

Note:  VMware does not recommend setting affinity in most cases. vSphere will balance VMs across CPUs optimally without manually specified affinity. Setting affinity prevents the use of some features, such as vMotion, can become a management headache, and can lead to performance issues like the one we just diagnosed.

 

 

See Better Performance

 

It may take a moment but the CPU Benchmark score should increase.

Also, you can return to the virtual machine's performance screen to see that the Ready Time is now below the 10% guidance.

In this example, we have seen how to compare the Demand and Usage in MHz CPU metrics to identify CPU contention. We showed you the Ready time metric and how it can be used to detect physical CPU contention. We also showed you the danger of setting affinity.

 

 

End CPU Test

 

  1. Stop the workload by clicking the "X" on the VM Stats Collectors windows.

Now let’s look at SMP virtual machines for the second part of this lab.

 

 

Start SMP VM's

 

From the PowerCLI Console

type:

.\StartCPUTest2.ps1

press enter

This experiment compares the difference between 2-way vCPU virtual machines. Your manager has asked you if there is any difference between multi-socket virtual machines compared to multi-core virtual machines. He also wanted to know if Hot-Add CPUs perform well.

 

 

Lab Started

 

It will take a few moments to start this portion of the lab. Please be patient.

A PuTTY session to 192.168.110.242 will start automatically. Don't close it; you will need it shortly.

The PowerCLI script has hot-added a second CPU to the perf_cpu_worker-l-02a virtual machine.

Now perf_cpu_worker-l-02a is a 2-way virtual machine (2 sockets x 1 core). But notice that the performance has not improved.

The performance has remained the same because when you hot-add a CPU, you must also bring that CPU online in the guest OS (on Linux, typically by writing 1 to /sys/devices/system/cpu/cpuN/online) so that the guest OS knows it can use the newly added CPU.

 

 

Look at the performance screen for perf_cpu_worker-l-02a

 

  1. Select perf_cpu_worker-l-02a
  2. Select Monitor
  3. Select Performance
  4. Select Advanced
  5. Select the CPU view

Notice that the virtual machine is only using one vCPU; the other is not being used at all.

This indicates a guest OS misconfiguration: the virtual machine is not using the newly hot-added vCPU.

 

 

Enable the CPU in the guest OS

 

In the PuTTY window

type:

/online_hotplug_cpu.sh     

and press enter.

 

 

Results of CPU Add

 

Now that we have brought the CPU online, notice that performance has increased and the application/guest OS can now use the other vCPU.

 

 

See Results in vCenter

 

You can go back and see that the second vCPU is now being used by the VM. Remember that there is no performance difference between configuring vCPUs as sockets or as cores in a virtual machine: if you set up a virtual machine with 2 vCPUs (2 sockets x 1 core) or 1 socket with 2 cores, you should see no difference in application performance. When dealing with larger VMs with more than 8 vCPUs, this might not always be true. See Module 4 about vNUMA for more details.

Remember, most workloads are not necessarily CPU bound. The OS and the application need to be multithreaded to benefit from additional CPUs. Most of the work an OS does is typically not CPU-bound; that is, most of its time is spent waiting for external events such as user interaction, device input, or data retrieval, rather than executing instructions. Because otherwise-unused CPU cycles are available to absorb the virtualization overhead, these workloads will typically have throughput similar to native, but potentially with a slight increase in latency.

Configuring a virtual machine with more virtual CPUs (vCPUs) than its workload can use might cause slightly increased resource usage, potentially impacting performance on very heavily loaded systems. Common examples of this include a single-threaded workload running in a multiple-vCPU virtual machine or a multi-threaded workload in a virtual machine with more vCPUs than the workload can effectively use.

Even if the guest operating system doesn’t use some of its vCPUs, configuring virtual machines with those vCPUs still imposes some small resource requirements on ESXi that translate to real CPU consumption on the host.

For example, unused vCPUs still consume timer interrupts in some guest operating systems, and the guest operating system's scheduler might migrate a single-threaded workload among multiple vCPUs, losing cache locality.

 

 

Shut Down SMP Test

 

Stop the workloads by pressing the "X" button on both VM Stats Collectors.  

Also close the Putty session.

In this test, we learned that it makes no difference in performance whether you add vCPUs as cores or as sockets, as long as a VM has no more than 8 vCPUs (see Module 4 about vNUMA for more details). Remember to make sure that newly added vCPUs are actually being used by the guest OS.

We also learned that multi-way virtual machines can scale well from 1 to 64 vCPUs in vSphere, but you need to be mindful of the potential overhead that unused vCPUs can impose on the environment.

 

 

Power Off Workloads

 

  1. Click the PowerCLI icon in the taskbar (close the old PowerCLI window first)
  2. Type:
.\StopBasicLabVMs.ps1

press enter

 

 

CPU Performance Summary

CPU contention problems are generally easy to detect. In fact, vCenter has several alarms that will trigger if host CPU utilization or virtual machine CPU utilization stays too high for extended periods of time.

vSphere 5.1+ allows you to create very large virtual machines with up to 64 vCPUs. It is highly recommended to size your virtual machines for the application workload that will be running in them. Configuring a virtual machine with more resources than its workload can actually use adds hypervisor overhead and can also lead to performance issues.
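As a rough starting point for right-sizing reviews, here is a minimal PowerCLI sketch (assuming an existing vCenter connection; it is illustrative only, not a sizing tool) that lists each powered-on VM's configured vCPU count next to its average CPU usage over the past day:

# Sketch: compare configured vCPUs with average CPU usage (%) over the past day.
Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" } | ForEach-Object {
    $avg = Get-Stat -Entity $_ -Stat "cpu.usage.average" -Start (Get-Date).AddDays(-1) |
           Measure-Object -Property Value -Average
    New-Object PSObject -Property @{
        VM        = $_.Name
        vCPUs     = $_.NumCpu
        AvgCpuPct = [math]::Round($avg.Average, 1)
    }
} | Sort-Object AvgCpuPct | Format-Table VM, vCPUs, AvgCpuPct -AutoSize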

In general, here are some common CPU performance tips:

Avoid running a large VM on too small a platform.

Don't expect consolidation ratios as high with busy workloads as you achieved with the low-hanging fruit.

Thanks for taking the CPU Performance 101 module!

 

Memory Performance


The goal of this lab is to expose you to a memory performance problem in a virtualized environment as an example.  It will also guide you on how to quickly identify performance problems by checking various performance metrics and settings.

This module is similar to Module 1 from the HOL-SDC-1304 lab from VMworld 2013. If you have taken that module, you may want to skip this module and move on to the modules on vSphere Flash Read Cache, Latency Sensitivity in vSphere, vNUMA, BIOS power policies and visualEsxtop.

While the time available in this lab constrains the number of performance problems we can review as examples, we have selected relevant problems that are commonly seen in vSphere environments. By walking through these examples, you should be better able to understand and troubleshoot typical performance problems.

For the complete Performance Troubleshooting Methodology and a list of VMware Best Practices, please visit the VMware.com website (note: this HOL was created before the release of vSphere 6.0, so only vSphere 5.5 documentation is available):

http://pubs.vmware.com/vsphere-55/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-55-monitoring-performance-guide.pdf

http://communities.vmware.com/servlet/JiveServlet/downloadBody/23094-102-2-30667/vsphere5x-perfts-vcops.pdf


 

Getting Started

Host memory is a limited resource. VMware vSphere incorporates sophisticated mechanisms that maximize the use of available memory through page sharing, resource-allocation controls, and other memory management techniques. However, several of vSphere's memory overcommitment techniques only kick in when the host is under memory pressure.

This module will discuss:

This test demonstrates Memory Demand vs. Consumed Memory in a vSphere environment.  It also demonstrates how memory overcommitment impacts host and VM performance.  The first step is to prepare the environment for the demonstration.  Please continue to start the lab.

 

 

For users with non-US Keyboards

 

NOTE:  If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop. Alternatively, you can use the On-Screen Keyboard.

 

 

On-Screen Keyboard

 

If you are having issues with the keyboard, another option is to use the On-Screen Keyboard.

To open the On-Screen Keyboard go to Start - All Programs - Accessories - Ease of Access - On-Screen Keyboard

 

 

Open a PowerCLI window

 

Click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt.

 

 

Start the Memory Test

 

From the PowerCLI Console

type:  

.\StartMemoryTest.ps1

press enter

 

 

Login to vSphere

 

Start Firefox, and log into vSphere. The vSphere web client is the default start page.

Login using Windows session authentication.

Credentials used are:

User name: corp\Administrator
Password: VMware1!

 

 

Navigate the Web Client to Hosts and Clusters Screen

 

  1. Click on the Hosts and Clusters icon

 

 

Select perf_mem_worker-l-01a

 

  1. Select perf_mem_worker-l-01a

 

 

Navigate to the perf_mem_worker-l-01a Resource Allocation screen.

 

  1. Select the Monitor tab.
  2. Select Resource Allocation

Also see the same Resource Allocation page for perf_mem_worker-l-02a.

Note: In the Host Memory section, the perf_mem_worker-l-01a and perf_mem_worker-l-02a virtual machines are each configured with 1 GB of memory on the physical host. If you wait for a while, the memory consumption of the virtual machines will look something like the above screenshot. The ESXi host has 4 GB of memory, so there is no memory contention at this time.

Note: vSphere randomly samples 100 memory pages per virtual machine to estimate active virtual machine memory. It is by no means 100% accurate, but for the statistics majors out there, it does have a very high confidence level and is generally quite accurate.

 

 

Select esx-02a.corp.local

 

  1. Select esx-02a.corp.local

 

 

View the ESX hosts memory metrics

 

  1. Select Monitor
  2. Select Performance
  3. Select Advanced
  4. Select the Memory view

Notice: Consumed memory on the host is more than 3GB, but active memory is less than 2 GB. You may also notice that in both the virtual machine memory metrics and the host memory metrics, Shared Memory (transparent Page Sharing) is very low.

Transparent Page Sharing (TPS) is a method by which redundant copies of memory pages are eliminated. TPS runs by default; however, on modern systems with hardware-assisted memory virtualization, vSphere preferentially backs guest physical pages with large host physical pages (2 MB contiguous memory regions instead of regular 4 KB pages) for better performance. vSphere will not attempt to share large physical pages because the probability of finding two large pages that are identical is very low. If memory pressure occurs on the host, vSphere may break the large memory pages into regular 4 KB pages, which TPS can then consolidate to reclaim memory on the host.

For that reason, it is no longer recommended to look solely at the "host memory consumed" metric for capacity planning; host consumed memory may be constantly high in most environments. Instead, active memory (memory demand) should be used for memory capacity planning.
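The same comparison can be pulled with PowerCLI; here is a minimal sketch (assuming an existing vCenter connection) that reads the host's consumed, active, and shared memory counters from the real-time statistics:

# Host-level memory counters; vCenter reports these values in KB.
$esx = Get-VMHost -Name "esx-02a.corp.local"
Get-Stat -Entity $esx -Realtime -MaxSamples 1 `
    -Stat "mem.consumed.average", "mem.active.average", "mem.shared.average" |
    Select-Object MetricId, @{ Name = "GB"; Expression = { [math]::Round($_.Value / 1MB, 2) } }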

 

 

 

Observe Application Performance

 

As there is no memory pressure in the environment, the virtual machines are performing well.

(Due to Lab fluctuations your values may vary)

 

 

Power on the Memory Hog virtual machine

 

Note: There are several Memory Hog virtual machines in this Environment. Make sure to select memhog-o-02a under the esx-02a.corp.local host.

  1. Select and Right click on the memhog-o-02a virtual machine
  2. Select Power
  3. Select Power On

The Memory Hog virtual machine will consume all the free memory of the host and cause memory pressure on the host.

While the Memory Hog powers on, keep an eye on the benchmark scores for memory performance on perf_mem_worker-l-01a and perf_mem_worker-l-02a. They will take a dip in performance as the memory pressure increases and vSphere works to stabilize the environment.

 

 

Additional Information on Memory Overcommit Technologies

 

There is a nifty "VMware vSphere 5 Memory Management and Monitoring diagram" that illustrates the various memory overcommit techniques that vSphere uses: http://kb.vmware.com/kb/2017642

There are four main memory overcommit techniques (transparent page sharing, ballooning, memory compression, and host swapping) and four main memory states/levels (high, soft, hard, and low) that determine when these techniques are engaged.

MinFree Memory for a vSphere 5.x host is calculated by default on a sliding scale from 6% to 1% of physical host memory.
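As a worked example (using the commonly documented vSphere 5.x breakpoints of 6% of the first 4 GB of host memory, 4% of the next 4 GB, 2% of the next 8 GB, and 1% of everything above 28 GB): the 4 GB host in this lab gets a MinFree of roughly 6% of 4,096 MB, or about 246 MB, while a 100 GB host would get roughly 246 + 328 + 328 + 737 = about 1,639 MB (1.6 GB) rather than a flat 6%.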

 

 

Review the Resource Allocation for the virtual machines

 

1. Select perf_mem_worker-l-01a

2. Select Monitor

3. Select Resource Allocation

Now that memory pressure is occurring in the system, vSphere will begin to use the memory overcommit techniques to conserve memory use.

It may take a while for vCenter to update the memory utilization statistics, so you might have to wait. (Try to refresh if nothing happens!)

Notice that vSphere has used some memory overcommit techniques on the perf_mem_worker virtual machines to relieve the memory pressure, and that consumed memory for the virtual machines is now lower than the original 1 GB. As long as the active memory the virtual machine requires can stay in physical memory, the application will perform well.

 

 

Observe the Memory Resources Allocation for perf_mem_worker-l-02a

 

1. Select perf_mem_worker-l-02a

2. Select Monitor

3. Select Resource Allocation

Once vSphere has stabilized the memory pressure, application performance should return to normal. You may notice that perf_mem_worker-l-02a, which has a larger memory footprint, consumes more physical memory than perf_mem_worker-l-01a. By default, vSphere only uses memory-reclamation techniques when it needs to; it tries to provide memory to the virtual machines that demand it and reclaim memory from virtual machines that are not using it. This allows for the best use of physical memory (providing the best performance) while at the same time providing the best consolidation of memory when memory pressure requires it.

 

 

Select esx-02a.corp.local

 

  1. Select esx-02a.corp.local

 

 

Review the ESX host memory metrics

 

Review the ESX host memory metrics now that we have powered on the Memory Hog.

  1. Select Monitor
  2. Select Performance
  3. Select Advanced
  4. Select Memory

Notice that Granted and Consumed are very close to the full size of the ESX host (4 GB), and that Active is higher than before but still less than Consumed. Also notice that Swap Used is relatively low. Any active swapping is a performance concern, but simply looking at the Swap Used metric can be misleading. To tell more accurately whether swapping is affecting performance, you need to look at the Swap In Rate, available from the chart options screen. Any non-trivial swap-in rate would likely indicate a performance problem.

Tip:  Memory over-allocation tends to be fine for most applications and environments. A 20% memory over-allocation is generally safe. If you are starting to over-allocate memory, begin with 20% or less, then increase or decrease it after monitoring application performance and confirming that the over-allocation does not cause a sustained swap-in rate.
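If you want to watch the swap-in rate from the command line, a minimal PowerCLI sketch (assuming an existing vCenter connection) is shown below; the counter is reported in KBps, and sustained non-zero values are the signal to investigate.

# Recent real-time swap-in rate samples for the host (KBps).
$esx = Get-VMHost -Name "esx-02a.corp.local"
Get-Stat -Entity $esx -Stat "mem.swapinRate.average" -Realtime -MaxSamples 5 |
    Select-Object Timestamp, Value, Unit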

 

 

Power Off the Workloads

 

  1. Click on the vSphere PowerCLI window in the taskbar (close old windows if they are still running)
  2. Type:  
.\StopBasicLabVMs.ps1

press enter

 

 

Storage Performance


The goal of this lab is to expose you to storage performance problems in a virtualized environment. It will also guide you on how to quickly identify performance problems by checking various performance metrics and settings.

This module is similar to module 1 from the HOL-SDC-1304 lab from VMworld 2013. If you have taken that module, you may want to skip this module and move onto the modules on vSphere Flash Read Cache, Latency Sensitivity in vSphere, vNUMA, BIOS power policies and visualEsxtop.

While the time available in this lab constrains the number of performance problems we can review as examples, we have selected relevant problems that are commonly seen in vSphere environments. By walking though these examples, you should be more capable to understand and troubleshoot typical performance problems.

For the complete Performance Troubleshooting Methodology and a list of VMware Best Practices, please visit the VMware.com website:

http://pubs.vmware.com/vsphere-55/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-55-monitoring-performance-guide.pdf

http://communities.vmware.com/servlet/JiveServlet/downloadBody/23094-102-2-30667/vsphere5x-perfts-vcops.pdf


 

Getting started

Approximately 90% of performance problems in a vSphere deployment are typically related to storage in some way.   There have been significant advances in storage technologies over the past 6-12 months to help improve storage performance.  There are a few things that you should be aware of:

In a well-architected environment, there is no difference in performance between storage fabric technologies.  A well-designed NFS, iSCSI or FC implementation will work just about the same as the others.

Despite advances in the interconnects, the performance limit is still typically hit at the media itself. In fact, 90% of the storage performance cases seen by GSS (Global Support Services - VMware support) that are not configuration related are media related. Some things to remember:

A good rule of thumb on the total number of IOPS any given disk will provide:

So, if you want to know how many IOPs you can achieve with a given number of disks:
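The lab's own rule-of-thumb table is not reproduced here, but commonly cited planning figures are on the order of 80 IOPS for a 7,200 RPM disk, roughly 120-130 IOPS for 10,000 RPM, roughly 150-180 IOPS for 15,000 RPM, and thousands of IOPS for an SSD. As a sketch of the aggregate math (ignoring RAID write penalties and array caching): eight 15,000 RPM spindles at about 175 IOPS each give you roughly 8 x 175 = 1,400 IOPS of raw back-end capability, while the single spindle backing this lab's shared datastore tops out at well under a tenth of that, which is exactly why two competing workloads on it perform poorly.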

This test demonstrates some methods to identify poor storage performance, and how to resolve it using VMware Storage DRS for workload balancing.  The first step is to prepare the environment for the demonstration.  Please continue to start the lab.

 

 

For users with non-US Keyboards

 

NOTE:  If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop. Alternatively, you can use the On-Screen Keyboard.

 

 

On-Screen Keyboard

 

If you are having issues with the keyboard, another option is to use the On-Screen Keyboard.

To open the On-Screen Keyboard go to Start - All Programs - Accessories - Ease of Access - On-Screen Keyboard

 

 

Open a PowerCLI window

 

Click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt.

 

 

Start the Storage Workloads

 

From the PowerCLI Console

type:

.\StartStorageTest 

press enter

While the script configures and starts up the virtual machines, please continue to read ahead.

(This script may take up to 3 minutes to complete)

If the script (StartStorageTest) fails due to a storage device not being present, restart the script.

 

 

Start the Storage Workloads

 

You will notice that two Iometer GUIs will start on the desktop.  

Iometer is a commonly used tool for testing storage.   For more details on Iometer or the free I/O Analyzer tool (offered as a VMware fling) that improves on Iometer, be sure to take the extended Iometer and I/O Analyzer modules offered in this lab.

  1. Start the Iometer Workloads in the first window by pressing the Green Flag
  2. Switch to the results tab by clicking on the Results Display tab
  3. Start the Iometer Workloads in the second window by pressing the Green Flag
  4. Switch to the results tab by clicking on the Results Display tab

 

 

View the Storage Performance as reported by Iometer

 

The storage workload is now running on both perf_storage_worker-l-01a and perf_storage_worker-l-02a. Since these virtual machines' test disks share the same datastore, and that datastore is backed by a single spindle, performance is poor.

The poor performance can be seen in the Iometer GUI as...

Long Latencies (Average I/O Response Time), latencies greater than 20ms.  

Low IOPs (Total I/O per Second)

Low Throughput (Total MBs per Second)

 

 

Login to vSphere

 

Start Firefox, and log into vSphere. The vSphere web client is the default start page.

Login using Windows session authentication.

Credentials used are:

User name: corp\Administrator
Password: VMware1!

 

 

Navigate the Web Client to Hosts and Clusters Screen

 

  1. Click on the Hosts and Clusters icon

 

 

Select perf_storage_worker-l-02a

 

  1. Select perf_storage_worker-l-02a

 

 

View Storage Performance Metrics in vCenter

 

  1. Select Monitor
  2. Select Performance
  3. Select Advanced
  4. Click Chart Options

 

 

Select Performance Metrics

 

  1. Select Virtual disk
  2. Deselect perf_storage_worker-l-02a
  3. Click None under Select Counters
  4. Select Read Latency and Write Latency
  5. Click OK

 

 

View Storage Performance Metrics in vCenter

 

Notice: In the Performance Chart Legend that the write latency on the scsi0:1 disk (the Iometer Test Disk) is quite high. In this screen capture, latency is 29ms.

(due to lab fluctuations latency values may vary)  

Guidance:  Device Latencies that are greater than 20 to 30 ms may be a performance impact to your applications.

vSphere provides several storage features to help manage and control storage performance, such as Storage I/O Control, Storage DRS, and Storage vMotion.

Let’s configure Storage DRS to solve this problem.

 

 

Change to Datastore View

 

  1. Click on the Datastore Icon
  2. Expand all datastores by clicking on the triangles

 

 

Disk I/O Latency

 

When we think about storage performance problems, the top issue is generally latency, so we need to look at the storage stack and understand what layers there are in the storage stack and where latency can build up.

At the topmost layer is the application running in the guest operating system. That is ultimately where we most care about latency: the total amount of latency the application sees, which includes the latencies of the entire storage stack, including the guest OS, the VMkernel virtualization layers, and the physical hardware.

ESX can’t see application latency because that is a layer above the ESX virtualization layer.

From ESXi  we see 3 main latencies that are reported in esxtop and vCenter.  

The topmost is GAVG, or guest average latency, which is the total amount of latency that ESXi can detect.

That is not the total amount of latency the application sees. In fact, if you compare GAVG (the total amount of latency ESXi sees) with the actual latency the application sees, you can tell how much latency the guest OS is adding to the storage stack, which can tell you whether the guest OS is configured incorrectly or is causing a performance problem. For example, if ESXi reports a GAVG of 10 ms, but the application or Perfmon in the guest OS reports a storage latency of 30 ms, then 20 ms of latency is somehow building up in the guest OS layer, and you should focus your debugging on the guest OS's storage configuration.

GAVG is made up of two major components: KAVG and DAVG. DAVG is essentially how much time is spent in the device, from the driver and HBA down to the storage array, and KAVG is how much time is spent in the ESXi kernel (that is, how much overhead the kernel is adding).

KAVG is actually a derived metric; ESXi does not measure it directly but calculates it as:

KAVG = GAVG (total latency) - DAVG

The VMkernel is very efficient at processing I/O, so there really should not be any significant time that an I/O waits in the kernel; KAVG should be close to 0 in well-configured, well-running environments. When KAVG is not 0, it most likely means that the I/O is stuck in a kernel queue inside the VMkernel. So the vast majority of the time, KAVG will equal QAVG, the queue average latency (the amount of time an I/O waits in a queue for a slot in a lower queue to free up so it can move down the stack).
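These same latencies can also be pulled outside of esxtop. Here is a minimal PowerCLI sketch (assuming an existing vCenter connection; the host name is simply the one used later in this module) that reads the per-device total, device, and kernel latency counters, which correspond roughly to GAVG, DAVG, and KAVG:

# Per-device latency counters for a host (values are in milliseconds).
$esx = Get-VMHost -Name "esx-01a.corp.local"
Get-Stat -Entity $esx -Realtime -MaxSamples 1 `
    -Stat "disk.totalLatency.average", "disk.deviceLatency.average", "disk.kernelLatency.average" |
    Sort-Object Instance, MetricId |
    Select-Object Instance, MetricId, Value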

 

 

Create a new Storage DRS Cluster

 

  1. Right-click Datacenter Site A
  2. Select Storage
  3. Select New Datastore Cluster...

 

 

Create a Datastore Cluster  ( part 1 of 6 )

 

For this lab, we will accept most of the default settings.

  1. Type DatastoreCluster as the name of the new datastore cluster.
  2. Click Next

 

 

Create a Datastore Cluster  ( part 2 of 6 )

 

  1. Click Next

 

 

Create a Datastore Cluster  ( part 3 of 6 )

 

  1. Change the Utilized Space threshold to 50%
  2. Click Next

Note: Since the HOL is a nested virtual environment, it is difficult to demonstrate high latency.

 

 

Create a Datastore Cluster  ( part 4 of 6 )

 

  1. Select Standalone Hosts
  2. Select esx-01a.corp.local
  3. Click Next

 

 

Create a Datastore Cluster  ( part 5 of 6 )

 

  1. Select Datastore-A and Datastore-B
  2. Click Next

 

 

Create a Datastore Cluster  ( part 6 of 6 )

 

  1. Click Finish

 

 

Run Storage DRS

 

  1. Select Datastore Cluster
  2. Select the Monitor tab
  3. Select Storage DRS
  4. Click on Run Storage DRS Now
  5. Press Apply Recommendations

Notice that SDRS recommends moving one of the workloads from Datastore-A to Datastore-B. It is making the recommendation based on space. SDRS makes storage moves based on performance only after it has collected performance data for more than 8 hours. Since the workloads only recently started, SDRS will not make a recommendation to balance them based on performance until it has collected more data.

 

 

Return to the Iometer GUIs to review the performance

 

Notice that performance is now much better.  Average I/O Response Time is now below 20ms and Total I/Os per Second and Total MBs per Second are significantly higher.

Guidance:  This shows the importance of sizing your storage correctly. It also shows that two storage-intensive sequential workloads sharing the same spindles can greatly impact each other's performance. If possible, keep workloads separated: sequential workloads backed by different spindles/LUNs than random workloads.

Guidance: From a vSphere perspective, for most applications, the use of one large Datastore vs. several small Datastores tends not to have a performance impact. However, the use of one large LUN vs. several LUNs is storage array dependent and most storage arrays perform better in a multi LUN configuration than a single large LUN configuration.

Guidance: Follow your storage vendor’s best practices and sizing guidelines to properly size and tune your storage for your virtualized environment.

Note: You may notice that the maximum I/O latency reported in Iometer can spike to 100+ ms during this test. This is a side effect of the lab environment and the simulated storage used in the lab. In general, frequent latency spikes of 60 ms or more would be a performance concern and something to investigate further.

 

 

Stop the Iometer workloads

 

Stop the Workloads

  1. Press the Stop Sign button on the Iometer GUI
  2. Close the GUIs by pressing the “X”
  3. Press the Stop Sign button on the Iometer GUI
  4. Close the GUIs by pressing the “X”

 

 

Power Off the Workloads

 

  1. Click the PowerCLI icon in the taskbar
  2. Type:  
.\StopBasicLabVMs

press enter

 

 

Wrap Up - Storage

In this test, we learned that storage latency greater than 20 to 30ms may cause performance problems for applications.  Not having enough spindles or sharing the same storage resources/disk spindles with competing workloads can cause poor storage performance. vSphere 5.1+ provides several storage features, such as Storage DRS that can perform Storage vMotions to balance storage workloads based on capacity and performance.

Other Things to keep in mind with storage are....

For more details on these topics, see the Performance Best Practices and Troubleshooting Guides on the VMware.com website.

http://pubs.vmware.com/vsphere-55/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-55-storage-guide.pdf

http://communities.vmware.com/docs/DOC-19166

 

Conclusion



 

Conclusion

 

This concludes Module 1 - Basic vSphere Performance Concepts and Troubleshooting.  We hope you have enjoyed taking it.  Do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab, along with an estimated time to complete each one.  Click on 'More Options - Table of Contents' button to quickly jump to a module in the manual.

 

Module 2 - Performance Feature: vSphere Flash Read Cache (45 Min)

What is vSphere Flash Read Cache?


vSphere Flash Read Cache (vFRC) is a flash based storage solution from VMware that is fully integrated with vSphere. vFRC design is based on a framework that enables the virtualization and management of local flash based devices in vSphere. The framework provides a centralized management platform for all locally attached flash-based devices and it maximizes their utilization throughout virtualized infrastructures.

This module focuses on configuring and analyzing the performance of the vSphere Flash Read Cache.

The vFRC framework design is based on two major components: the vFRC Infrastructure and the vFRC Cache Software.


 

vSphere Flash Read Cache Architecture

 

The vFRC Infrastructure component virtualizes SSDs and provides tight integration with vSphere features such as HA, vMotion and DRS.

The vFRC Cache Software component is a host-based cache that will enable the acceleration of VMs without requiring any change to the VMs that are provisioned on them. The vFRC tier will be managed as a resource just like CPU and storage.

In the following lab lesson, we will configure and utilize vFRC resources.

It should be noted, that because we are working in a fully virtualized lab environment, we cannot reliably demonstrate the performance benefits of vFRC.

 

 

For users with non-US Keyboards

 

NOTE:  If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop. Alternatively, you can use the On-Screen Keyboard.

 

 

On-Screen Keyboard

 

To open the On-Screen Keyboard go to Start - All Programs - Accessories - Ease of Access - On-Screen Keyboard

 

Configure vSphere Flash Read Cache



 

Login to the vSphere Web Client

 

Start Firefox, and log into vSphere. The vSphere web client is the default start page.

Login using Windows session authentication.

Credentials used are:

User name: corp\Administrator
Password: VMware1!

 

 

Hosts and clusters

 

  1. Click Hosts and Clusters

 

 

Verify SSD disk

 

To use vFRC, the ESXi host needs to have at least one local SSD drive installed.

In the lab environment, we have added an additional virtual disk drive to the ESXi host, and modified the storage system to recognize it as an SSD disk. This enables vSphere to accept the "SSD resource" as a valid device to be used as a vFRC resource.

Note that we cannot run comparative performance tests, because the fake SSD drive has an order of magnitude higher latency than a real SSD drive, and the data path to the device is longer.

We need to verify that the ESXi host has a valid SSD drive installed.

  1. Click esx-02a.corp.local
  2. Select Manage
  3. Select Storage
  4. Select Storage Adapters
  5. Select vmhba1
  6. Select Devices
  7. Verify that the ESXi host has two local drives, with at least one being of drive type Flash

 

 

Configure vFRC

 

Now we need to configure vFRC to use our "SSD" device.

  1. Select "Settings"
  2. Select "Virtual Flash Resource Management"
  3. Click "Add Capacity..."

Note that the Device Backing list is empty.

 

 

Add SSD device to Virtual Flash Resource Capacity

 

  1. Select the 1 GB device to be used as a vFRC resource
  2. Click OK

 

 

Verify device backing

 

Wait for the client to update or refresh the screen.

Verify that we now have the storage device listed.

You have now successfully configured an ESXi host to utilize a vFRC resource.

 

 

Configure virtual machine to use vFRC

 

Now we need to configure a virtual machine to use vFRC on a selected disk drive.

  1. Right Click "Win7-01a"
  2. Click "Edit Settings..."

 

 

Configure Cache Size

 

We will configure a specific disk drive to use vFRC

  1. Expand "Hard disk 1"
  2. Select "MB"
  3. Enter "64" MB of the flash device to be used for vFRC cache
  4. Click "Advanced"

 

 

Advanced Settings

 

Note: It is possible to change the vFRC block size, ranging from 4 KB to 1024 KB. This is an important setting, because to get the best performance from vFRC you should match the vFRC block size to your application's I/O pattern. The application I/O pattern can be investigated using vscsiStats.

Performance analysis like this should be done in a real physically installed vSphere environment and not in a fully virtualized environment like the VMworld HOL environment, where we cannot simulate the real performance of an SSD device.

  1. Change the Block Size to 4 KB

Click "OK" to exit the Virtual Flash Read Cache Settings window.

Click "OK" in the Win7-01a Settings window.

You have now successfully configured a virtual machine to utilize a vFRC resource.

 

Analyze vSphere Flash Read Cache Performance



 

Power on virtual machine

 

Now we will try and create some workload on the vFRC enabled virtual disk, and then we will analyze the cache statistics.

  1. Right Click "Win7-01a"
  2. Click "Power On"

 

 

Launch Putty from ControlCenter

 

vFRC statistics can be retrieved using esxcli from a command prompt.

Start Putty

 

 

SSH to esx-02a.corp.local

 

  1. Select the session "esx-02a.corp.local"
  2. Click "Open"

The login procedure will complete automatically using public key authentication.

The text for the following commands can be copied and pasted from the file ReadMe.txt on the desktop. The paste functionality in PuTTY is right-click.

 

 

Using esxcli with vFRC

 

Issue the following command to list the vFRC cache file descriptor:

esxcli storage vflash cache list -m vfc

1.   Note the output

Issue the following command to get the vFRC cache statistics. Note that the output from the previous command (1) must be inserted in place of <descriptor>:

esxcli storage vflash cache stats get -m vfc -c <descriptor>

Note: The text for this command can also be copied and pasted from the file README.txt on the desktop. The paste functionality in PuTTY is right-click.

 

 

vFRC statistics

 

Note the "Cache hit rate" value. This value should be pretty low, because the VM has just been powered on and we have not generated any I/O to the vFRC enabled disk yet.

 

 

Launch Console

 

Return to the vSphere Web Client and the Win7-01a VM.

  1. Click "Launch Console"

 

 

Log into Win7-01a VM

 

Login using:

User name: vmware
Password: VMware1!

 

 

Go to F:\

 

Use the shortcut on the desktop to navigate to F:\.

 

 

Start I/O test 1

 

  1. Launch the "vFlash_test_1.cmd file" by double clicking it

A command prompt will appear and run a 60 second random read test. When the command prompt disappears, continue to next step.

This test writes a 64 MB file to the vFRC-enabled disk and reads from it randomly for 60 seconds. Note that 64 MB is the same size as the vFRC cache we configured on the disk.

What will be your expected cache hit percentage?

 

 

Using esxcli with vFRC

 

Return to Putty.

Issue the same command again:

esxcli storage vflash cache stats get -m vfc -c <descriptor>

Where <descriptor> is the same vflash cache file descriptor found previously.

Hint: You can use the "Up Arrow" to repeat the command.

 

 

vFRC statistics

 

Note: The "Cache hit rate" value; this value should be much higher now. The reason for this is that we have been reading the same data over and over again for 60 seconds, and that the cache is just big enough to hold all data from the data file. So we read the data multiple times and gets a high cache hit rate.

 

 

Start I/O test 2

 

Return to the console of the Win7-01a VM.

  1. Launch the "vFlash_test_2.cmd file" by double clicking it

A command prompt will appear and run a 60 second read test. When the command prompt disappears, continue to the next step.

This test writes a 128MB file to the vFRC enabled disk, and reads it randomly for 60 seconds. Note that 128MB is twice as large as the vFRC cache we configured on the disk.

What will be your expected cache hit percentage, larger or smaller than before?

 

 

Using esxcli with vFRC

 

Return to Putty.

Issue the same command again:

esxcli storage vflash cache stats get -m vfc -c <descriptor>

Where <descriptor> is the same vFRC cache file descriptor found previously.

Hint: You can use the "Up Arrow" to repeat the command.

 

 

vFRC statistics

 

Note: The "Cache hit rate" value has decreased again now. The reason for this is that we have been reading randomly on a data set that is larger than the cache configured on the disk. So even though we might read the same data some times, the cache is not large enough to hold all data from the data set.

Now we have examined how we can use esxcli to obtain statistics from the vFRC software in vSphere. These statistics can be used to determine if the cache size has been sized appropriately and being used efficiently.

Optional:

Investigate the information from the output. Compare latencies for disk and cache, and cached vs total I/Os.

Feel free to modify the IOBlazer workloads or try to compare performance when vFRC is enabled and disabled. You can also reset the I/O counters using the following command:

esxcli storage vflash cache stats reset -m vfc -c <descriptor>

It should be noted, that because we are working in a fully virtualized lab environment, we cannot realistically demonstrate the performance benefits of vFRC, and you might get unexpected results.

 

 

vFRC Performance

 

The vCenter performance charts include a set of vFRC performance counters for monitoring the performance of the flash-based cache.

  1. Select the "Win7-01a" VM
  2. Select the "Monitor" tab
  3. Select the "Performance" tab
  4. Select "Advanced"

 

 

Virtual Disk

 

  1. Select the view: "Virtual disk" and let the display refresh
  2. Click "Chart Options"

 

 

vFRC Performance Metrics

 

  1. Click "None" to deselect all counters
  2. Select the three vFRC related counters

Click OK

 

 

Performance Charts

 

You can monitor the vFRC IOPs, latency and throughput for each virtual disk configured to utilize vFRC.

 

Migration of a VM configured to use vFRC


When virtual machines use local storage devices on an ESXi host, they are typically pinned to that particular host. With vFRC, this is not the case.


 

Migrate Win7-01a using vMotion

 

  1. Right click "Win7-01a"
  2. Click "Migrate..."

 

 

Select type of migration

 

  1. Select "Change host"
  2. Click "Next"

 

 

Select destination

 

  1. Select "esx-01a-corp.local" as the destination
  2. Click "Next"

Note that the compatibility check succeeds, even though the VM is using a host-local storage resource. esx-01a has been preconfigured for vFRC using a local flash device.

 

 

Select cache migration settings

 

Here we can select if we want to migrate the read cache of a VM to the destination host, or discard the read cache and build up a new cache on the destination host. If we migrate the cache, the amount of data being sent over the network is larger and the migration takes longer to complete. If we discard the cache, we might experience a dip in read performance of the VM as new cache content is built up on the destination host. So depending on the current load on the VM and the amount of cache, you can select a migration type that matches your needs.

  1. Select "Always migrate the cache contents"
  2. Click "Advanced"

 

 

Individual HDDs

 

Note that if a VM has more than one disk using vFRC, you can manage how the cache for each disk is handled during migration.

  1. Click "Next"

 

 

Select Network

 

  1. Select "VM Network" as Destination Network
  2. Click "Next"

 

 

vMotion Priority

 

  1. Click "Next"

 

 

Optional: Migrate the VM

 

Migration of a VM in a lab environment like this can take a while due to limited performance. You can choose to either migrate the VM and wait for five minutes or more, or just cancel out of the migration wizard.

  1. Click "Finish" or "Cancel"

 

 

Optional: Verify successful migration

 

If you selected to migrate the VM, verify that you have successfully migrated the VM.  It is now using vFRC on esx-01a.corp.local.

 

Clean-Up Procedure and Conclusion



 

Clean up procedure

In order to free up resources for the remaining parts of this lab, we need to shut down the used virtual machines.

 

 

Power Off Workloads

 

  1. Click the PowerCLI icon in the taskbar
  2. Type:
.\StopBasicLabVMs.ps1

press enter

 

 

Key takeaways

 

vSphere Flash Read Cache will give you the following features and benefits.

For more information on vSphere Flash Read Cache, see the documentation found on vmware.com.

http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-Flash-Read-Cache-FAQ.pdf

http://www.vmware.com/files/pdf/techpaper/vfrc-perf-vsphere55.pdf

 

 

Conclusion

 

This concludes Module 2, Performance Feature: vSphere Flash Read Cache. We hope you have enjoyed taking it.  Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab, along with an estimated time to complete each one.  Click on 'More Options - Table of Contents' to quickly jump to a module in the manual.

 

Module 3 - Performance Feature: Latency Sensitivity Setting (45 Min)

What is Latency and Jitter?



 

Latency Sensitivity feature of vSphere

The Latency Sensitivity feature aims to eliminate the major sources of extra latency imposed by virtualization, achieving low response time and low jitter. This per-VM feature does so by giving exclusive access to physical resources to avoid contention due to sharing, by bypassing virtualization layers to eliminate the overhead of extra processing, and by tuning virtualization layers to reduce the remaining overhead. Performance can be further improved when the Latency Sensitivity feature is used together with a pass-through mechanism such as single-root I/O virtualization (SR-IOV).

The latency-sensitivity feature is applied per VM, and thus a vSphere host can run a mix of normal VMs and VMs with this feature enabled.

 

 

Who should use this feature?

The Latency Sensitivity feature is intended for specialized use cases that require extremely low latency. It is extremely important to determine whether or not your workload could benefit from this feature before enabling it. In a nutshell, Latency Sensitivity provides extremely low network latency with a tradeoff of increased CPU and memory cost as a result of less resource sharing, and increased power consumption.

We define a 'Highly' latency sensitive application as one that requires network latencies in the order of tens of microseconds and very small jitter. Stock market trading applications are an example of highly latency sensitive applications.

Before deciding if this setting is right for you, you should be aware of the network latency needs of your application. If you set latency sensitivity to High, it could lead to increased host CPU utilization, power consumption, and even negatively impact performance in some cases.

 

 

Who should not use this feature?

Enabling the Latency Sensitivity feature reduces network latency. Latency Sensitivity will not decrease application latency if latency is influenced by storage latency or other sources of latency besides the network.

The Latency Sensitivity feature should only be enabled in environments in which CPU is undercommitted. VMs with Latency Sensitivity set to High are given exclusive access to the physical CPUs they need to run, which means the latency-sensitive VM can no longer share those CPUs with neighboring VMs.

Generally, VMs that use the latency sensitivity feature should have a number of vCPUs which is less than the number of cores per socket in your host to ensure that the latency sensitive VM occupies only one NUMA node.

If the Latency Sensitivity feature is not relevant to you, feel free to choose a different module from HOL-SDC-1404 now.

 

 

Changes to CPU access

When a VM has 'High' Latency Sensitivity set in vCenter, the VM is given exclusive access to the physical cores it needs to run. This is termed exclusive affinity. These cores will be reserved for the latency sensitive VM only, which results in greater CPU accessibility to the VM and less L1 and L2 cache pollution from multiplexing other VMs onto the same cores. When the VM is powered on, each vCPU is assigned to a particular physical CPU and remains on that CPU.

When the Latency Sensitive VM's vCPU is idle, ESXi also alters its halting behaviour so that the physical CPU remains active. This reduces wakeup latency when the VM becomes active again.

 

 

Changes to virtual NIC coalescing

A virtual NIC (vNIC) is a virtual device which exchanges data packets between the VMkernel and the guest operating system. Exchanges are typically triggered by interrupts to the guest OS or by the guest OS calling into the VMkernel, both of which are expensive operations. Virtual NIC coalescing, which is the default behaviour in ESXi, attempts to reduce CPU overhead by holding onto packets for some time before posting an interrupt or calling into the VMkernel. In doing so, coalescing introduces additional network latency and jitter, but these effects are negligible for most non-latency-sensitive workloads.

Enabling 'High' Latency Sensitivity disables virtual NIC coalescing, so that there is less latency between when a packet is sent or received and when the CPU is interrupted to process the packet.

The tradeoff is greater power and CPU consumption in exchange for reduced network latency. Disabling coalescing reduces latency when the number of packets being processed is small, but if the packet rate becomes large, it can actually be counterproductive due to the increased CPU overhead.
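If you ever need to turn off coalescing for a single vNIC without changing the VM's Latency Sensitivity, it can also be disabled through a per-VM advanced parameter. This is a hedged PowerCLI sketch; it assumes the VM is powered off and that the adapter in question is ethernet0:

# Disable interrupt coalescing on the VM's first vNIC (ethernet0); adjust the index for other adapters
$vm = Get-VM -Name "latency-sensitivity-l-01a"
New-AdvancedSetting -Entity $vm -Name "ethernet0.coalescingScheme" -Value "disabled" -Confirm:$false -Force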

Are you ready to get your hands dirty? Let's start the hands-on portion of this lab.

 

Analyze the effect of the Latency Sensitivity Setting in vSphere



 

For users with non-US Keyboards

 

NOTE:  If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop. Alternatively, you can use the On-Screen Keyboard.

 

 

On-Screen Keyboard

 

To open the On-Screen Keyboard, go to Start > All Programs > Accessories > Ease of Access > On-Screen Keyboard.

 

 

Open a PowerCLI window

 

To start the demo, click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt. It may take a moment to load.

Note: Your desktop may look slightly different from those in the screen captures.

 

 

Start Lab configuration

 

At the PowerCLI prompt

type:

.\StartLSLab.ps1

and hit Enter.

Note: The text for this command can also be copied and pasted from the file ReadMe.txt on the desktop. Alternatively, you can type a few characters of the script name and hit the Tab key to complete the name. Keep hitting the Tab key until the correct name is displayed and hit Enter.

Please leave this PowerCLI window open for the remainder of the lab and continue reading this lab manual while the lab environment is being configured.

 

 

VM Stats Collectors: CPU intensive workload started

 

In a few minutes, when the script completes, you will see two "VM Stats Collector" applications start up. Within about a minute, each utility will start a CPU intensive application on the perf_cpu_worker-l-01a and perf_cpu_worker-l-02a virtual machines and will collect the benchmark results from those workloads. These two VMs will create high demand for CPU on the host, which will help us demonstrate the Latency Sensitivity feature.

IMPORTANT NOTE: Due to changing loads in the lab environment your values may vary from the values shown in the screenshots.

Please continue to follow these instructions while the lab environment is being configured.

 

 

Login to the vSphere Web Client

 

Start Firefox, and log into vSphere. The vSphere web client is the default start page.

Login using Windows session authentication.

Credentials used are:

User name: corp\Administrator
Password: VMware1!

 

 

Hosts and clusters

 

  1. Click Hosts and Clusters

 

 

Edit resource reservations for the latency-sensitivity-l-01a VM

 

Your Lab datacenter is made up of two hosts named esx-01a.corp.local and esx-02a.corp.local.

We will use the latency-sensitivity-l-01a virtual machine to demonstrate the Latency Sensitivity feature. To show how the 'High' Latency Sensitivity setting affects network latency, we will compare network performance between latency-sensitivity-l-01a with Latency Sensitivity set to 'Normal' and that same VM with Latency Sensitivity set to 'High'.

IMPORTANT: If you can't find the latency-sensitivity-l-01a VM, refresh the browser, or log out and log in again.

The Latency Sensitivity feature, when set to 'High', has two VM resource requirements. For best performance, it needs 100% memory reservation and 100% CPU reservation.

To make a fair comparison, both the 'Normal' latency sensitivity VM and the 'High' latency sensitivity VM should have the same resource reservations, so that the only difference between the two is the 'High' latency sensitivity setting.

First, we will create resource allocations for the latency-sensitivity-l-01a virtual machine while Latency Sensitivity is set to "Normal".

 

  1. Right Click the VM latency-sensitivity-l-01a
  2. Click Edit Settings...

 

 

Set Memory Reservation

 

1. Under the tab Virtual Hardware, click on Memory.

2. Check the box Reserve all guest memory (All locked).

This sets a 100% memory reservation for the VM. Right now, we are going to test network performance on a 'Normal' Latency Sensitivity VM, but when we change the VM's latency sensitivity to 'High' later, 100% memory reservation ensures that all the memory the VM needs will be located close to the processor which is running the VM. If the VM has a 'High' Latency Sensitivity setting and does not have a 100% memory reservation, it will not power on.
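For reference, the same 'Reserve all guest memory (All locked)' option can be applied from PowerCLI through the reconfigure API; a minimal sketch, assuming an existing vCenter connection:

# Equivalent of ticking 'Reserve all guest memory (All locked)' in the Web Client
$vm   = Get-VM -Name "latency-sensitivity-l-01a"
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.MemoryReservationLockedToMax = $true
$vm.ExtensionData.ReconfigVM($spec)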

 

 

Set CPU Reservation

 

Still on the Edit Settings page,

1. Click on CPU.

2. In the Reservation field, click on the drop down menu.

3. Click on Maximum (in this case 3059 MHz).

Note: Your maximum CPU reservation value may differ from the screenshot above.

3059 MHz (your value may differ) is entered into the Reservation field; this is the frequency of one physical CPU on the host. This sets a 100% CPU reservation for the VM. latency-sensitivity-l-01a has one vCPU, so 'Maximum' reserves as many processor cycles for the vCPU as it needs. When the VM has the 'High' latency sensitivity setting, this CPU reservation enables exclusive affinity, so that one physical CPU is reserved solely for the use of the 'High' Latency Sensitive VM's vCPU.
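The same CPU reservation can also be set with PowerCLI; a sketch, assuming the roughly 3059 MHz per-core value shown above (substitute the maximum your host reports):

# Reserve the full clock rate of one physical core for this 1-vCPU VM
Get-VM -Name "latency-sensitivity-l-01a" |
    Get-VMResourceConfiguration |
    Set-VMResourceConfiguration -CpuReservationMhz 3059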

 

 

Ensure Latency Sensitivity is 'Normal'

 

Still on the Edit Settings page,

1. Click on the tab VM Options.

2. Click Advanced.

3. This VM has Normal Latency Sensitivity, which is the default setting.

4. Click OK.

 

 

Wait for reconfiguration to complete

 

In the top right corner, Recent Tasks will show that the settings for latency-sensitivity-l-01a are being reconfigured. Wait for the task to complete before proceeding. It should look like the image above.  You may see the VM momentarily disappear from the inventory when vCenter is reconfiguring the virtual machine.

 

 

Check Status

 

Check to see that the reconfiguration task has completed.

 

 

Return to Hosts and Clusters

 

1. Select Hosts and Clusters to return to latency-sensitivity-l-01a.

 

 

Power on latency-sensitivity-l-01a

 

1. In the inventory right click on the latency-sensitivity-l-01a VM.

2. Click Power On.

 

 

Monitor host CPU usage

 

  1. Click on esx-01a.corp.local
  2. Click on the tab Monitor
  3. Click Performance
  4. Click Advanced
  5. You can see that the Latest Value for esx-01a.corp.local Usage should be 99 or 100 percent

By this time, you should have seen VM Stats Collector windows open. This indicates that VMs perf_cpu_worker-l-01a and perf_cpu_worker-l-02a are consuming as much CPU on the host as they can. Although an environment which contains latency-sensitive VMs should typically remain CPU undercommitted, creating demand for CPU makes it more likely that we can see a difference between the 'Normal' and 'High' Latency Sensitivity network performance.

The VM perf_cpu_worker-l-03a will serve as the network performance test target.

 

 

Monitor Resource Allocation

 

1. In the inventory, click latency-sensitivity-l-01a.

 

 

Monitor Resource Allocation

 

2. Click the Monitor tab.

3. Click Resource Allocation.

The Resource Allocation for the 'Normal' Latency Sensitive VM shows only a small portion of the total CPU and Memory reservation is Active. Your screen may show different values if the VM is still booting up.

 

 

Open a PuTTY window

 

Start Putty

 

 

Test Latency Sensitivity on 'Normal' Latency Sensitive VM.

 

  1. Select latency-sensitivity-l-01a
  2. Click Open

The login procedure will complete automatically using public key authentication. If it does not, choose Yes if prompted with a Security Alert.

The text for the following commands can be copied and pasted from the file ReadMe.txt on the desktop. To paste in PuTTY, simply right-click in the window.

 

 

Test network latency on 'Normal' latency sensitivity VM

 

At the command line, type:

ping -f -w 1 192.168.110.243

Press enter.

Note: The text for this command can also be copy-pasted from the file Readme.txt on the desktop. Right click in PuTTY to paste it in.

Wait for the command to complete, and run this command a total of 3 times. On the second and third times, you can press the up arrow to retrieve the last command entered.

Ping is a very simple network workload that measures Round Trip Time (RTT): a network packet is sent to a target VM and then returned to the sender. The VM latency-sensitivity-l-01a is pinging a VM with the IP address 192.168.110.243, which is located on a remote host. For a period of one second, latency-sensitivity-l-01a sends back-to-back ping requests. Ping is an ideal low-level network test because the request is processed in the kernel and does not need to access the application layer of the operating system.

We have finished testing network latency on the 'Normal' Latency Sensitivity VM. Do not close this PuTTY window as we will use it for reference later. We will now change the VM to 'High' Latency Sensitivity.

 

 

Shut down the VM latency-sensitivity-l-01a.

 

  1. Right click latency-sensitivity-l-01a
  2. Click Shut Down Guest OS

At Confirm Guest Shut Down, click Yes

Wait for latency-sensitivity-l-01a to power off. If it takes too long, try refreshing the browser.

To enable the latency sensitivity feature for a VM, the VM must first be powered off. You can still change the setting while the VM is powered on but it doesn't fully apply until the VM has been powered off and then back on again.

 

 

Edit Settings

 

After latency-sensitivity-l-01a shows that it is powered off

  1. Right click latency-sensitivity-l-01a
  2. Click Edit Settings...

 

 

Set 'High' Latency Sensitivity.

 

  1. Click VM Options.
  2. Click Advanced.
  3. Next to the Latency Sensitivity field, click the drop down arrow.
  4. Select High.

The values for this setting are 'Low', 'Normal', 'Medium', and 'High'.
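Depending on your PowerCLI version there may be no dedicated cmdlet parameter for this setting, but the same change can be made through the vSphere API; a hedged sketch (the VM should be powered off first):

# Set the per-VM Latency Sensitivity level; accepted levels are low, normal, medium, and high
$vm   = Get-VM -Name "latency-sensitivity-l-01a"
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.LatencySensitivity = New-Object VMware.Vim.LatencySensitivity
$spec.LatencySensitivity.Level = "high"
$vm.ExtensionData.ReconfigVM($spec)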

 

 

CPU reservation warning

 

You will see a warning "Check CPU Reservation" appear next to the Latency Sensitivity setting. For best performance, High Latency Sensitivity requires you set 100% CPU reservation for the VM, which we did earlier. This warning will always appear in the Advanced Settings screen, even when the CPU reservation has already been set high enough.

If no reservation is set, the VM is still allowed to power on and no further warnings are made.

1. Click OK.

 

 

Power on latency-sensitivity-l-01a

 

1. In the inventory, right click on the latency-sensitivity-l-01a VM.

2. Click Power On.

 

 

Monitor Resource Allocation

 

  1. Click the Monitor tab.
  2. Click Resource Allocation.

On the top half of this image, we see that the 'High' Latency Sensitivity VM shows 100% Active CPU and Private Memory even though the VM itself is idle. Compare this to the Resource Allocation for the 'Normal' Latency Sensitive VM which we examined earlier. It shows only a small portion of the total CPU and Memory reservation is Active. This increase in Active CPU and Memory is the result of the 'High' Latency Sensitivity setting.

Although we cannot see the difference in this environment when 'High' Latency Sensitivity is set with 100% CPU reservation, the Host CPU will show 100% utilization of the physical core which is hosting the VM's vCPU. This is a normal result of exclusive affinity in the Lab environment and occurs even when the VM itself is idle. On many Intel processors, the physical CPU hosting the vCPU will be idle if the vCPU is idle but it will still be unavailable to other vCPUs.

 

 

Monitor the VM Stats Collectors

 

Before we set 'High' Latency Sensitivity for latency-sensitivity-l-01a, the CPU workers had equivalent benchmark scores. Now, one of the CPU workers will have a lower score. In the example above, CPU worker 02a has the lower score; your lab may show either CPU worker 01a or CPU worker 02a with the lower score. This confirms that latency-sensitivity-l-01a has impacted that CPU worker's access to CPU cycles, which decreases its CPU benchmark score.

Next, we will test network latency on the 'High' Latency Sensitivity VM.

 

 

Start a new PuTTY window

 

Start a second instance of PuTTY (for example, by right-clicking the title bar of the old PuTTY session and choosing New Session). Keep the current, disconnected window open for later comparison.

 

 

Test Latency Sensitivity on 'High' Latency Sensitive VM.

 

  1. Select latency-sensitivity-l-01a
  2. Click Open

The login procedure will complete automatically using public key authentication.

The text for the following commands can be copied and pasted from the file ReadMe.txt on the desktop. To paste in PuTTY, simply right-click in the window.

 

 

Test network latency on 'High' Latency Sensitive VM

 

At the command line, run the command:

ping -f -w 1 192.168.110.243

Note: You can press the up arrow to retrieve the last command entered. The text for this command can also be copy-pasted from the file CheatSheet.txt on the desktop. Right click in PuTTY to paste it in.

Like last time, wait for the command to complete, and run this command a total of three times.

We'll take a look at the results in a second, but first we will set the Latency Sensitivity setting back to default.

 

 

Edit Settings

 

Back at the Web Client,

  1. Right click latency-sensitivity-l-01a if not already selected.
  2. Click Edit Settings...

 

 

Set the Latency Sensitivity back to Normal

 

  1. Click VM Options.
  2. Click Advanced.
  3. Next to the Latency Sensitivity field, click the drop down arrow.
  4. Select Normal.
  5. Click OK.

 

 

Close the VM Stats Collector windows

 

From the task bar, click the .NET icon to bring the VM Stats Collectors to the foreground.

Notice the CPU benchmark scores are once again equivalent. This is because we changed the Latency Sensitivity setting on latency-sensitivity-l-01a from 'High' back to 'Normal'.

We have finished the network tests. Close the windows using the X on each window. This stops the VMs CPU worker 01a and 02a from placing load on the hosts. The VM latency-sensitivity-l-01a will also be automatically powered down. You will see the message "The system is going down for system halt NOW!" appear in the current PuTTY session which you can disregard.

 

 

Compare network latency tests

 

From the taskbar, click the PuTTY icons to bring both PuTTY windows to the foreground and arrange them with Normal Latency Sensitivity on top and High Latency Sensitivity on the bottom.

Hint: At the bottom of both windows there should be a broadcast message from root with a timestamp. The window with the oldest timestamp is the 'Normal' Latency Sensitivity session; place it on top and the other on the bottom.

Now let's delve into the performance results.

Important Note: Due to variable loads in the lab environment, your numbers may differ from those above.

The ping test we completed sends as many ping requests to the remote VM as possible ("back-to-back pings") within a one-second period. As soon as one ping is returned, another request is sent. The ping command outputs four round-trip time statistics per test: minimum, average, maximum, and mean deviation (mdev).

Of these, we are most interested in the minimum and average latency.

By 'eyeballing' the differences in numbers between the 'Normal' and 'High' Latency Sensitivity VMs, you should be able to see the difference in average latency and minimum latency. Note the numbers within the green brackets. This should give you a general idea of the impact that changing the latency setting from 'Normal' to 'High' can have. Because this is a shared virtualized test environment, these performance results are not representative of the effects of the Latency Sensitivity setting in a real-life environment. They are for demonstration purposes only.

Remember, these numbers were taken from the same VM with the same resource allocations, under the same conditions. The only difference between the two is setting 'Normal' versus 'High' Latency Sensitivity.

 

Clean-Up Procedure and Conclusion



 

Cleanup procedure

 

Make sure that you have closed the two VM Stats Collector windows as previously instructed, as this also will trigger the cleanup of the lab environment. This stops the VMs CPU worker 01a and 02a from placing load on the hosts. The VM latency-sensitivity-l-01a will also be automatically powered down and configuration will be set back to default.

 

 

Key take aways

The Latency Sensitivity setting is very easy to configure. Once you have determined whether your application fits the definition of 'High' latency sensitivity (tens of microseconds), configure Latency Sensitivity.

To review:

1. On a powered off VM, set 100% memory reservation for the latency sensitive VM.

2. If your environment allows, set 100% CPU reservation for the latency sensitive VM such that the MHz reserved is equal to 100% of the sum of the frequency of the VM's vCPUs.

3. In Advanced Settings, set Latency Sensitivity to High.

If you want to learn more about running latency sensitive applications on vSphere, consult these white papers:

http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf 

http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf

 

 

Conclusion

 

This concludes Module 3, Performance Feature: Latency Sensitivity Setting. We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents' to quickly jump to a module in the manual.

 

Module 4 - Performance Feature: vNUMA (30 Min)

What is vNUMA?


Since 5.0, vSphere has had the vNUMA feature that presents the physical NUMA topology to the guest operating system. Traditionally virtual machines have been presented with a single NUMA node, regardless of the size of the virtual machine, and regardless of the underlying hardware. Larger and larger workloads are being virtualized, and it has become increasingly important that the guest OS and applications can make decisions on where to execute application processes and where to place specific application memory. ESXi is NUMA aware, and will always try to fit a VM within a single NUMA node when possible. With the emergence of the "Monster VM" this is not always possible.

Besides the possibility of presenting a virtual NUMA architecture to a virtual machine, it is also possible to alter the cores per socket configuration of a virtual machine. Historically, this feature only affected how the CPU layout, with respect to cores per socket, was presented to the guest OS (for example, for licensing reasons) and it had no impact on performance, because changing the cores per socket configuration did not alter the NUMA architecture of the virtual machine. That is no longer always the case, at least for virtual machines where vNUMA is active (by default, those with more than 8 vCPUs). More about that later in this module.

Note that because we are working in a fully virtualized environment, we have to enforce the NUMA architecture presented to, for example, an ESXi host. In a real environment you would see the physical architecture directly. The purpose of this module is to gain an understanding of how vNUMA works, both by itself and in combination with the cores per socket feature.
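Before experimenting, it can be useful to see which numa.* overrides a VM already carries; a minimal PowerCLI sketch, assuming an existing vCenter connection (an empty result means the defaults apply):

# List any numa.* advanced options already set on the VM
$vm = Get-VM -Name "esx-03a.corp.local"
$vm.ExtensionData.Config.ExtraConfig |
    Where-Object { $_.Key -like "numa.*" } |
    Select-Object Key, Value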


 

NUMA

 

Non-Uniform Memory Access (NUMA) system architecture

Each node consists of CPU cores and memory. A pCPU can access memory across NUMA nodes, but at a performance cost: remote memory access time can be 30% to 100% longer.

 

 

 

Without vNUMA

 

In this example, a VM with 12 vCPUs is running on a host with four NUMA nodes with 6 cores each. This VM is not being presented with the physical NUMA configuration and hence the guest OS and application only sees a single NUMA node. This means that the guest has no chance of placing processes and memory within a physical NUMA node.

We have poor memory locality.

 

 

With vNUMA

 

In this example, a VM with 12 vCPUs is running on a host that has four NUMA nodes with 6 cores each. This VM is being presented with the physical NUMA configuration, and hence the guest OS and application sees two NUMA nodes. This means that the guest can place processes and accompanying memory within a physical NUMA node when possible.

We have good memory locality.

 

 

For users with non-US Keyboards

 

NOTE:  If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop. Alternatively, you can use the On-Screen Keyboard.

 

 

On-Screen Keyboard

 

To open the On-Screen Keyboard go to Start - All Programs - Accessories - Ease of Access - On-Screen Keyboard

 

Configure vNUMA



 

Login to vSphere

 

Start Firefox, and log into vSphere. The vSphere web client is the default start page.

Login using Windows session authentication.

Credentials used are:

User name: corp\Administrator
Password: VMware1!

 

 

Hosts and clusters

 

  1. Click Hosts and Clusters

 

 

Power on VM

 

  1. Right click esx-03a.corp.local
  2. Click Power On
  3. Wait a few minutes for esx-03a.corp.local to start.

esx-03a.corp.local is a virtual machine running ESXi. We will use this VM to illustrate how vNUMA is configured on a virtual machine, and as an ESXi host to show how NUMA architecture is seen from a host perspective.

 

 

Open a PuTTY window

 

Start Putty

 

 

Connect to esx-03a.corp.local

 

  1. Select esx-03a.corp.local
  2. Click Open

The login procedure will complete automatically using public key authentication.

If the connection fails, please wait until esx-03a is completely up and running. This can take a few minutes.

The text for the following commands can be copied and pasted from the file README.txt on the desktop. To paste in PuTTY, simply right-click in the window.

 

 

Verify default NUMA architecture

 

Type in the following command, followed by enter:

esxcli hardware memory get | grep NUMA

From the output you can tell that the VM is presented with a single NUMA node, as expected.

 

 

Launch esxtop

 

Type in the following command, followed by enter:

esxtop

 

 

Investigate memory information

 

Hit "m" to see memory information in esxtop.

Note that there is no information related to NUMA nodes. This is the expected result when running on hardware with only a single NUMA node.

 

 

Shut down esx-03a.corp.local

 

  1. Right click esx-03a.corp.local
  2. Click Shut Down Guest OS

 

 

Edit Settings of esx-03a.corp.local

 

  1. Right click esx-03a.corp.local
  2. Click Edit Settings...

 

 

Modify CPU configuration

 

  1. Select the Virtual Hardware tab
  2. Expand CPU
  3. On the Cores per Socket drop-down, select 2

Click OK to save the changes.

 

 

Power on VM

 

  1. Right click esx-03a.corp.local
  2. Click Power On

 

 

Open a PuTTY window

 

Start Putty

 

 

Connect to esx-03a.corp.local

 

  1. Select esx-03a.corp.local
  2. Click Open

The login procedure will complete automatically using public key authentication.

The text for the following commands can be copied and pasted from the file README.txt on the desktop. To paste in PuTTY, simply right-click in the window.

 

 

Verify default NUMA architecture

 

Type in the following command, followed by enter:

esxcli hardware memory get | grep NUMA

From the output you can tell that the VM is presented with a single NUMA node, which means that changing the cores per socket did not alter the NUMA architecture presented to the VM.

 

 

Shut down esx-03a.corp.local

 

  1. Right click esx-03a.corp.local
  2. Click Shut Down Guest OS

 

 

Edit Settings of esx-03a.corp.local

 

  1. Right click esx-03a.corp.local
  2. Click Edit Settings...

 

 

Change vNUMA configuration

 

  1. Select the VM Options tab
  2. Expand Advanced
  3. Click Edit Configuration

 

 

Change threshold for enabling vNUMA

 

  1. Click Add Row
  2. Select the name column in the newly created row and type in numa.vcpu.min
  3. Select the Value column and type in 3
  4. Click OK

Click OK in the Edit Settings windows to save changes

The numa.vcpu.min configuration parameter makes it possible to alter when vNUMA is presented to a VM. For backward compatibility with previous versions of vSphere, vNUMA is by default only presented to VMs with more than 8 vCPUs. Changing the value to 3 triggers the vNUMA presentation to our VM because it is equipped with 4 vCPUs.
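The same row can be added from PowerCLI instead of the Web Client; a minimal sketch (the VM should be powered off, as in the steps above):

# Lower the vNUMA threshold so this 4-vCPU VM is presented with a virtual NUMA topology
$vm = Get-VM -Name "esx-03a.corp.local"
New-AdvancedSetting -Entity $vm -Name "numa.vcpu.min" -Value 3 -Confirm:$false -Force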

 

 

Power on VM

 

  1. Right click esx-03a.corp.local
  2. Click Power On

 

 

Open a PuTTY window

 

Start Putty

 

 

Connect to esx-03a.corp.local

 

  1. Select esx-03a.corp.local
  2. Click Open

The login procedure will complete automatically using public key authentication.

The text for the following commands can be copied and pasted from the file README.txt on the desktop. To paste in PuTTY, simply right-click in the window.

 

 

Verify new NUMA architecture

 

Type in the following command, followed by enter:

esxcli hardware memory get | grep NUMA

From the output you can tell that the VM is presented with two NUMA nodes, which means that enabling vNUMA works. In a real physical environment the VM would now be presented with the physical NUMA layout. Because we are using a double virtualized lab environment, we do not have access to the physical layer and we are not creating large enough VMs to span the physical NUMA nodes in the underlying cloud environment. But do not worry because this gives us a reason to investigate how the combination of vNUMA and cores per socket works.

We saw earlier in this module that changing the cores per socket alone did not alter the NUMA architecture presented to the VM. Now we can see that when used in combination with vNUMA, the cores per socket configuration dictates the presented vNUMA architecture. This means that when using the cores per socket feature on VMs where vNUMA is active (by default, those with more than 8 vCPUs), the configuration dictates the vNUMA architecture presented to the VM and therefore can have an impact on VM performance, because we can force a VM to span multiple NUMA nodes when it is not needed.

A word of caution. Only modify cores per socket if needed for licensing reasons, or if a VM is running in a cluster where the hosts have varying NUMA node sizes and you want to ensure that the VM is using a vNUMA configuration that fits on all hosts.

 

 

Launch esxtop

 

Type in the following command, followed by enter:

esxtop

 

 

Investigate memory information

 

Hit "m" to see memory information in esxtop.

Note that now there is information about the size of two NUMA nodes.

 

Clean-Up Procedure and Conclusion



 

Clean up procedure

In order to free up resources for the remaining parts of this lab, we need to shut down the used virtual machine.

 

 

Power Off Workloads

 

  1. Click the PowerCLI icon in the taskbar
  2. Type:
.\StopBasicLabVMs.ps1

press enter

 

 

Key take aways

During this lab we saw that as long as vNUMA is not triggered (which, by default, happens only when a VM has more than 8 vCPUs), altering the cores per socket configuration does not affect performance, since the VM fits within a physical NUMA node regardless of its configuration. In that case, the cores per socket setting is used for licensing purposes only.

However, when vNUMA is triggered, the cores per socket setting does impact the virtual NUMA topology presented to the guest and as a result can have a performance impact if it does not match the physical NUMA topology.  By default, vNUMA will pick the optimal topology for you as long as you have not manually configured cores per socket to anything more than 1.  If in fact it has been changed, which you might need to do to address licensing, it is important to match the physical NUMA topology manually.

WARNING! When using the cores per socket configuration in combination with vNUMA, you need to be careful about the changes you make. Dictating a NUMA architecture for a VM that does not match, or at least fit within, the underlying NUMA architecture can cause performance problems for demanding applications. However, this can also be used to manage the NUMA architecture of a VM so that it matches the physical server's NUMA layout in a cluster with different physical NUMA layouts. The vNUMA configuration of a VM is locked the first time the VM is powered on and will, by default, not be altered after that. This is done to preserve guest OS and application stability.

In conclusion, when you start working with VMs where vNUMA is active, you should make sure that the vNUMA layout of a VM matches the physical NUMA architecture of all hosts within the cluster.

In a homogeneous cluster (identical hosts), ESXi will determine the optimal vNUMA layout if you have not changed cores per socket from 1.

In a heterogeneous cluster (non-identical hosts), you can either make sure that a VM is powered on for the first time on the host in the cluster with the smallest NUMA nodes, or you can use the cores per socket configuration to make sure that the vNUMA layout of the VM fits within the smallest physical NUMA node size, as in the sketch below.
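As a hedged illustration of the second approach, the cores per socket value can be set through the cpuid.coresPerSocket advanced option; the VM name and sizing below are hypothetical:

# Hypothetical example: give a 12-vCPU VM 6 cores per socket (two virtual sockets)
# so its vNUMA layout fits hosts whose smallest physical NUMA node has 6 cores
$vm = Get-VM -Name "monster-vm-01"    # hypothetical VM name
New-AdvancedSetting -Entity $vm -Name "cpuid.coresPerSocket" -Value 6 -Confirm:$false -Force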

Remember that a NUMA node consists of CPU cores and memory. So if a VM has more memory than will fit within a single NUMA node, and the VM has 8 or fewer vCPUs, it may make sense to force-enable vNUMA for this VM so that the guest OS has a chance to place processes and memory intelligently.

There has been some confusion around the performance impact of setting the cores per socket of a VM and how vNUMA actually works. By completing this module, we have shown that:

  1. Setting the cores per socket on a VM without vNUMA enabled has no performance impact and can be used to comply with license restrictions.
  2. vNUMA is an important feature for ensuring optimal performance of larger VMs (by default, those with more than 8 vCPUs).
  3. Setting the cores per socket of a VM with vNUMA enabled can have a performance impact and can be used to override the VM presentation of the physical NUMA architecture. Use with caution.

If you want to know more about the vNUMA feature of vSphere, see these articles:

http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf

http://blogs.vmware.com/vsphere/tag/vnuma

 

 

Conclusion

 

This concludes Module 4, Performance Feature: vNUMA.  We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents'  to quickly jump to a module in the manual.

 

Module 5 - Performance Feature: Power Policy Setting (15 Min)

How do BIOS Power Policies Influence VM Performance?


vSphere Host Power Management (HPM) is a technique that saves energy by placing certain parts of a computer system or device into a reduced power state when the system or device is inactive or does not need to run at maximum speed.

vSphere handles power management by utilizing Advanced Configuration and Power Interface (ACPI) performance and power states. In the VMware vSphere® 6 release, the default power management policy is based on dynamic voltage and frequency scaling (DVFS). This technology utilizes the processor’s performance states and allows some power to be saved by running the processor at a lower frequency and voltage while still leaving enough idle CPU time to maintain acceptable performance levels in most environments.

In this lab we will go through some of the power settings you can configure in vSphere. In a nested environment, it is not possible to set BIOS power policies. In the physical world, it is important that the BIOS is set to allow vSphere to manage the power settings. Check with your hardware manufacturer for the correct BIOS settings before applying these settings in your own environment.


 

For users with non-US Keyboards

 

NOTE:  If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop. Alternatively, you can use the On-Screen Keyboard.

 

 

On-Screen Keyboard

 

If you are having issues with the keyboard, another option is to use the On-Screen Keyboard.

To open the On-Screen Keyboard go to Start - All Programs - Accessories - Ease of Access - On-Screen Keyboard

 

 

Login to vSphere

 

Start Firefox and log into vSphere. The vSphere web client is the default start page.

Login using Windows session authentication.

Credentials used are:

User name: corp\Administrator
Password: VMware1!

 

 

Hosts and clusters

 

  1. Click Hosts and Clusters

 

 

Select ESX Host

 

  1. Select esx-01a.corp.local
  2. Select Manage
  3. Select Settings
  4. Select Power management in the hardware section.
  5. Select Edit

 

 

Power management settings in a nested environment

 

Due to the nature of the Hands-on Labs, which run on nested ESXi servers, it is not possible to change the Power Management settings and see an effect.

Therefore, the rest of the lab will consist of screenshots of how to apply the settings and what the purpose of each setting is.

 

 

Power management settings in a physical host.

 

On a physical host, the Power Management options could look like this (it may vary depending on the processor in the physical host).

Here you can see which ACPI states are presented to the host and which Power Policy is currently active.

In the next steps we will click Edit and show the different policies that are available.

 

 

High Performance

 

This power policy maximizes performance using no power management features. It keeps CPUs in the highest P-state at all times. It uses only the top two C-states (running and halted), not any of the deep states (for example, C3 and C6 on the latest Intel processors). High performance is the default power policy for ESX/ESXi releases prior to 5.0.

 

 

Balanced

 

This power policy is designed to reduce host power consumption while having little or no impact on performance. The balanced policy uses an algorithm that exploits the processor’s P-states. Balanced is the default power policy since ESXi 5.

 

 

Low Power

 

This power policy is designed to more aggressively reduce host power consumption through the use of deep C-states. This comes at the risk of reduced performance.

 

 

Custom

 

This power policy starts out the same as balanced, but it allows individual parameters to be modified. If the host hardware does not allow the operating system to manage power, only the Not Supported policy is available. (On some systems only the High Performance policy is available.)
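Although the nested lab hosts cannot demonstrate the effect, on physical hardware the active policy can also be inspected and changed through the vSphere API; a hedged PowerCLI sketch:

# Show the currently active power policy and the policies the hardware exposes
$vmhost = Get-VMHost -Name "esx-01a.corp.local"
$vmhost.ExtensionData.Config.PowerSystemInfo.CurrentPolicy | Format-List

$powerSys = Get-View $vmhost.ExtensionData.ConfigManager.PowerSystem
$powerSys.Capability.AvailablePolicy | Select-Object Key, ShortName    # e.g. static, dynamic, low, custom

# Switch policies by key; confirm the key-to-policy mapping from AvailablePolicy first
$powerSys.ConfigurePowerPolicy(2)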

 

 

Setting custom parameters

 

To set the custom policy settings,

  1. Select Advanced System Settings, just below the Power Management option that you selected earlier.
  2. Type "in custom policy" in the filter box to show only the advanced power settings.

The options you can adjust are listed below; a PowerCLI sketch for inspecting and changing them follows the list:

Power.CStateMaxLatency : Do not use C-states whose latency is greater than this value.

Power.CStatePredictionCoef : A parameter in the ESXi algorithm for predicting how long a CPU that becomes idle will remain idle. Changing this value is not recommended.

Power.CStateResidencyCoef : When a CPU becomes idle, choose the deepest C-state whose latency multiplied by this value is less than the host’s prediction of how long the CPU will remain idle. Larger values make ESXi more conservative about using deep C-states; smaller values are more aggressive.

Power.MaxCpuLoad : Use P-states to save power on a CPU only when the CPU is busy for less than the given percentage of real time.

Power.MaxFreqPct : Do not use any P-states faster than the given percentage of full CPU speed, rounded up to the next available P-state.

Power.MinFreqPct : Do not use any P-states slower than the given percentage of full CPU speed.

Power.PerfBias : Performance Energy Bias Hint (Intel only). Sets an MSR on Intel processors to an Intel-recommended value. Intel recommends 0 for high performance, 6 for balanced, and 15 for low power. Other values are undefined.

Power.TimerHz : Controls how many times per second ESXi reevaluates which P-state each CPU should be in.

Power.UseCStates : Use deep ACPI C-states (C2 or below) when the processor is idle.

Power.UsePStates : Use ACPI P-states to save power when the processor is busy.

Power.UseStallCtr : Use a deeper P-state when the processor is frequently stalled waiting for events such as cache misses.
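These parameters are ordinary host advanced settings, so they can also be read and changed from PowerCLI; a sketch with an illustrative value:

# Inspect all advanced power settings on the host
Get-VMHost -Name "esx-01a.corp.local" |
    Get-AdvancedSetting -Name "Power.*" |
    Select-Object Name, Value | Sort-Object Name

# Example with an illustrative value: cap P-states at 80% of full CPU speed under the Custom policy
Get-VMHost -Name "esx-01a.corp.local" |
    Get-AdvancedSetting -Name "Power.MaxFreqPct" |
    Set-AdvancedSetting -Value 80 -Confirm:$false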

 

Clean-Up Procedure and Conclusion



 

Clean up procedure

If you have powered on any VMs, or if any VMs from previous modules are still running, shut them down now in order to free up resources for the remaining parts of this lab.

 

 

Power Off Workloads

 

  1. Click the PowerCLI icon in the taskbar
  2. Type:
.\StopBasicLabVMs.ps1

press enter

 

 

Key take aways

We hope that you have discovered the importance of setting the correct BIOS power policy.

Remember to always consult your hardware provider for the correct settings in the BIOS you are using, and look at the white paper "Host Power Management in VMware vSphere 5.5" for more information. (Since this HOL was released before vSphere 6.0, the 5.5 paper is referenced.)

 

 

Conclusion

 

This concludes Module 5, Performance Feature: Power Policy Setting. We hope you have enjoyed taking it.  Do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents' to quickly jump to a module in the manual.

 

Module 6 - Using esxtop (60 Min)

When To Use esxtop


There are several tools to monitor and diagnose performance for vSphere environments. It is best to use esxtop to diagnose and further investigate performance issues that have already been identified through another tool or method. Esxtop is not a tool designed for monitoring performance over the long term, but is great for deep investigation or monitoring a specific issue or VM over a defined period of time.

In this lab, which should take about 60 minutes, we will use esxtop to enable a deep dive into performance troubleshooting. First, we will take a look at vCenter monitoring before using esxtop. We will then spend some time with esxtop to see how to use it both interactively and in batch data capture mode. Finally, we will use the data captured from esxtop in a few different tools. This will allow us to be able to better analyze and examine our data.


 

Day To Day Performance Monitoring

There are a variety of tools that can be used to monitor your vSphere environment on a day to day basis. vCenter Operations Manager (vCOPs) is a powerful tool that can be used to monitor your entire virtual infrastructure. It incorporates high-level dashboard views and built-in intelligence to analyze the data and identify possible problems.

vCOPs is not included in this lab, but we recommend that you take a look at some of the vCOPs labs after this one.

 

 

Performance in vSphere Client / vCenter

A good place to start examining performance is the vSphere Client connected to vCenter. Here you are able to look at performance stats both in real time and historically.

 

 

For users with non-US Keyboards

 

NOTE:  If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop. Alternatively, you can use the On-Screen Keyboard.

 

 

On-Screen Keyboard

 

If you are having issues with the keyboard, another option is to use the On-Screen Keyboard.

To open the On-Screen Keyboard go to Start - All Programs - Accessories - Ease of Access - On-Screen Keyboard

 

 

Login to vSphere

 

Start Firefox, and log into vSphere. The vSphere web client is the default start page.

Login using Windows session authentication.

Credentials used are:

User name: corp\Administrator
Password: VMware1!

 

 

Hosts and clusters

 

  1. Click Hosts and Clusters

 

 

vCenter Performance Graphs

 

  1. Select esx-01a.corp.local
  2. Select Monitor
  3. Select Performance
  4. Select Overview

You will now see a graph with the CPU utilization from the last 24 hours.

 

 

vCenter Real-Time Performance

 

You can change the time frame for the displayed performance

  1. Click on the drop down Time Range
  2. Click on Realtime

Now you see the current state of CPU usage for the selected host. In addition to CPU you can scroll down and see memory, disk, and networking specific performance information.

 

 

vCenter Advanced Performance Graphs

 

  1. Click on Advanced. More details are now available and specific performance counters can be added.
  2. Click on Chart Options to be able to specify exactly what is displayed in the advanced performance chart.

 

 

vCenter Advanced Performance Chart Options

 

  1. Click None
  2. Select Core Utilization and Ready
  3. Click on OK

 

 

Summary

vCenter (via the vSphere Client) provides a detailed level of real-time and historical performance information. These tools are designed to be used in day-to-day operations and management of vSphere. Esxtop is designed to provide deep-dive, real-time performance information for an ESXi host. This is usually only needed when the performance issue is either not identified by one of these other tools, or deep-level details are not available in these tools. The rest of this lab takes a look at how to use esxtop and how to sort through and analyze the data that it produces.

 

Diagnosing Performance In Interactive Mode


Esxtop can be used to diagnose performance issues involving almost any aspect of performance, from both the host and virtual machine perspectives. This section will step through how to use esxtop in interactive mode to view CPU, memory, disk, and network performance.


 

Open a PuTTY window

 

Start Putty

 

 

Connect to esx-01a.corp.local

 

  1. Select esx-01a.corp.local
  2. Click Open

The login procedure will complete automatically using public key authentication.

The text for the following commands can be copied and pasted from the file ReadMe.txt on the desktop. To paste in PuTTY, simply right-click in the window.

 

 

Start esxtop

 

Type:

esxtop 

and hit enter

 

 

Initial esxtop screen - CPU

 

The initial esxtop screen you will see is the CPU screen which shows CPU utilization in terms of the ESX host and all the VMs that are running.

Initially, we see any VMs that are running in the list of processes with their %USED, %RUN, %SYS, %WAIT, and %VMWAIT shown.  

There are also many other processes shown in the list. These are ESXi host processes and functions that, for the most part, can be identified by name.

 

 

CPU screen Details

 

Expand the PuTTY session window to fullscreen. This allows esxtop to display many more fields; you will see %RDY, %IDLE, %OVRLP, and others.

 

 

Customizing esxtop Screens

 

Press the letter o and you will be presented with the Current Field Order screen. Here you can customize the order from left to right that the different groups of data columns appear on the esxtop screen.

Press and hold the shift key while pressing the letter d three times. Then press the space bar.

Watch the string of letters after Current Field order at the top of the screen. You will see the letter d move with each press of SHIFT-d. This changes the screen so that the NAME field is displayed first, on the far left.

 

 

More esxtop Screen Customization

 

Now press the letter f key to open the Current Field Order Screen in toggle mode. When in this screen you can toggle on or off groups of fields.

Press the E and F keys to remove the NWLD and %STATE TIMES fields from the screen.

Press spacebar to return. You will see only the NAME, ID, and GID displayed.

 

 

Adding New Columns to CPU screen

 

Press the f key again to go back to the field toggle screen.

Add the %STATE TIMES back by pressing f.

Press space bar to return to the CPU screen.

 

 

Open a PowerCLI window

 

Click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt. It may take a moment to load.

Note: Your desktop may look slightly different from those in the screen captures.

 

 

Start Lab configuration

 

At the PowerCLI prompt

type:

.\StartesxtopLab.ps1

and hit Enter.

Note: The text for this command can also be copied and pasted from the file ReadMe.txt on the desktop. Alternatively, you can type a few characters of the script name and hit the Tab key to complete the name. Keep hitting the Tab key until the correct name is displayed and hit Enter.

Please leave this PowerCLI window open for the remainder of the lab, and continue reading this lab manual while the lab environment is being configured.

 

 

Verify Powered On VMs

 

Return to the vSphere web client and verify that the following VMs have been started and are located on esx-01a.corp.local:

If the vSphere web client was already open before starting the script, you should refresh the browser.

 

 

Monitoring VMs CPU in esxtop

 

Return to the putty session where esxtop is running.  (If you closed it, just open putty again and reconnect to the esx-01a.corp.local system.  Then type in esxtop to get back.)

Press <SHIFT> V (or capital V) to tell esxtop to just display the VMs.  You will now only see the three VMs we just started and not all the other processes.

 

 

Expand VM

 

Expand to see the details for a VM by pressing E and then entering the ID of a VM.

Here we expanded the perf_cpu_worker VM with ID 86296 (ID might be different in your environment)

 

 

CPU Details of a VM

 

You can now see all of the associated processes that make up the VM including the single vCPU of this VM. VMs with multiple vCPUs will show each as a separate process. This allows you to see the distribution of workload that is running inside the VM. (Sometimes after you expand the VM it might only show for one refresh cycle before disappearing. This is because multiple changes were made in a single refresh cycle.)

 

 

Memory Screen Details

 

Press m to switch to the Memory screen of esxtop.

This screen shows us stats in a similar style as the initial CPU screen, but all of these stats are dedicated to memory related information.  

 

 

Memory Screen Customizing

 

The same field selection and ordering customization can be made on the memory screen as for the cpu screen.

Press f to see the available groups of memory counters that can be displayed.

Press space bar to return to the memory screen.  

Press o to open the ordering screen and be able to change the order that they are displayed in.  

Press space bar to return to the memory screen when finished.

 

 

VMs Only On Memory Screen

 

To limit the view to just the VMs you can press <SHIFT> V (or capital V) to get the VM only view.  

 

 

Network Screen

 

Press n to go to the network screen of esxtop.

Here you will see all of the virtual networks and the virtual nics attached.  On the screen, you can quickly see how much network activity is occurring with each of the VMs and on which virtual nics.

 

 

Network Screen Customizing

 

The same field selection and ordering customization can be made on the network screen as for the cpu and memory screens.

Press f to see the available groups of network counters that can be displayed.

Press space bar to return to the network screen.

Press o to open the ordering screen and be able to change the order that they are displayed in.  

Press space bar to return to the network screen when finished.

 

 

Disk Screens

 

Press d to open the disk adapter esxtop screen.

This screen displays the disk performance data from the disk adapter perspective. There is also a disk device screen and virtual disk screen that provide disk performance information from different perspectives. We will look at those in just a few minutes.  

 

 

Disk Adapter Screen Customization

 

The same field selection and ordering customization can be made on the different disk screens as for the cpu, memory and network screens.

Press f to see the available groups of disk adapter counters that can be displayed.

Press space bar to return to the disk adapter screen.

Press o to open the ordering screen and be able to change the order that they are displayed in.

Press space bar to return to the disk adapter screen when finished.

 

 

Disk Adapter Details

 

Press e to expand the details for a disk adapter.  

Enter the adapter that you want to expand. In the example screen shot, we have entered vmhba1 to be expanded.  

Once expanded it will show all the paths currently on the expanded disk adapter. In the lab environment, there is only a single path per adapter; not so exciting to see.  In most real environments this additional detail will show many paths and will allow you to see exactly where the activity is occurring on the adapter.  

 

 

Disk Device Screen

 

Press u to reach the disk device screen.

This shows the disk performance information from the perspective of each specific disk device. In this example, we see that the two local disks and the NFS mounted disk are all shown.  

 

 

Disk Device Customization

 

The same field selection and ordering customization can be made on the different disk screens as for the cpu, memory and network screens.

Press f to see the available groups of disk device counters that can be displayed.

Press space bar to return to the disk device screen.

Press o to open the ordering screen and be able to change the order that they are displayed in.  

Press space bar to return to the disk device screen when finished.

This includes many interesting stats including VAAI related performance information.

 

 

Virtual Disk Screen

 

Press v to display the virtual disk screen.

This displays disk performance from the perspective of the virtual machines and the virtual disks that they have. The CMD/s displayed here should be very close to what is reported inside the guest.  

 

 

Virtual Disk Screen Customization

 

The same field selection and ordering customization can be made on the different disk screens as for the cpu, memory and network screens.

Press f to see the available groups of virtual disk counters that can be displayed.

Press space bar to return to the virtual disk screen.

Press o to open the ordering screen and be able to change the order that they are displayed in.  

Press space bar to return to the virtual disk screen when finished.

 

 

Virtual Disk Screen Details

 

Press e to expand a specific VM and enter the GID for the perf_cpu_worker VM.

In the example screen shot the GID is 86296. You will see each of the virtual disks that are attached to the virtual machine and the stats per virtual disk shown. This allows you to quickly see the IOPS (CMD/s) or latency (LAT/rd or LAT/wr) per virtual disk.

 

 

Saving Your esxtop Customizations

 

When using the field selection and ordering options of esxtop you will end up creating the environment that you want. In order to save these views and have esxtop default to these settings on the next start you can press <SHIFT> W (or capital W) at any time. This will create an esxtop configuration file that looks like the example in the screen shot above. It will default to use the file name that esxtop automatically looks for when it starts up (default is //.esxtop60rc). You can also specify a different file name and then have esxtop use that file on startup with the -c myconfigfilename option.

 

 

Exit esxtop

Hit q to exit esxtop, and leave PuTTY running. We will use PuTTY again in the next section.

 

 

Summary of Interactive Mode

Interactive mode allows for you to quickly access detailed real time stats grouped by CPU, memory, network, disk or device. This performance information is extremely detailed and can be customized in terms of what is and is not displayed.

 

Capturing Performance Data with esxtop


Esxtop can be used to capture or record performance data. This approach is often used to capture data when a known performance problem is occurring. The data can then be analyzed in detail after the capture. It is also often used while running performance tests or benchmarks so that performance metrics can be calculated and viewed from a variety of different perspectives after the testing is completed.  


 

Check for Available Disk Space

 

The size of the file that will hold the captured esxtop data will depend on the options that you specify when you do the capture. Sometimes these files get to be rather large, so it is important to make sure you have enough space. You can use the disk free command to see the available space on volumes attached to the ESXi host.

Type:

df -h 

to get the "human-readable" format.

Once you have determined where you want to store the esxtop data, switch to that volume.  

In this lab, change directories to /vmfs/volumes/ds-site-a-nfs01 as this volume has plenty of free space for our purposes.

type:

cd /vmfs/volumes/ds-site-a-nfs01

 

 

Batch Mode esxtop

 

Batch mode is used to capture esxtop data. This mode is entered by including the -b option on the esxtop command line. In addition, we will want to specify the number of data samples we want to capture and how long to wait between each sample. This is done via -n for the number of iterations (samples) and -d for the delay between samples. The combination of the iterations and delay parameters determines how long esxtop will capture data. For example, if iterations is set to 5 and delay is set to 60, the capture runs for 5 minutes (5 x 60 seconds).

We will also be redirecting the output of the batch mode esxtop to a file using an output redirect (the > symbol).

 

 

Capturing esxtop Data

 

Start an esxtop capture using the same syntax as in the screen shot. It should be:

esxtop -b -a -d 5 -n 12 > HOLesxtop1.csv

Note: The text for the commands in the putty window can be copied and pasted from the file README.txt on the desktop.

Please wait until this command completes; read on while waiting for more details on what is happening.

This will capture data for one minute because we have specified a 5-second delay between samples and 12 samples, for a total of 60 seconds. We have added the -a parameter so that all possible performance counters are captured. This makes the output file larger but ensures that as many counters as possible are captured. Nothing will appear on screen during the minute that the esxtop capture is running.

 

 

Viewing the Results

 

On the command line type:

more HOLesxtop1.csv

Then press q to quit the more command.

As you can see, the output of the batch mode esxtop is a csv file that is not meant to be viewed directly. Because it is in a csv format there are several tools that can be used to view and analyze the output. The next section in this module will go through the details of a few examples of ways that this csv output file can be analyzed.

 

 

Start WinSCP

 

Start WinSCP by

  1. Click the Start menu
  2. Type WinSCP in the search bar
  3. Select WinSCP from the search results

 

 

Select host

 

  1. Select the host esx-01a.corp.local
  2. Press Login

 

 

Find the file

 

  1. On the left side, browse to c:\esxtopdata
  2. On the right side, browse to /vmfs/volumes/ds-site-a-nfs01 (the displayed directory name changes when you select it, because the datastore name is a link to its UUID path)
  3. Right-click the file HOLesxtop1.csv
  4. Select Download

 

 

Download file

 

  1. Select OK.

When the transfer is finished, you can close WinSCP.

 

Performance Monitoring using Visual ESXtop


Monitoring data in esxtop can be hard. Visual ESXtop is a VMware Fling that provides a graphical tool for monitoring live or previously captured performance data from ESXi hosts.


 

Start Visual ESXtop

 

  1. Double click the visualEsxtop shortcut on the desktop

 

 

Import Saved data

 

  1. Select File
  2. Load batch output

 

 

Load saved data

 

  1. Browse to C:\esxtopdata
  2. Select the saved file: HOLesxtop1.csv
  3. Click Open

 

 

Browse hierarchy

 

  1. Double-click Object types to expand the tree view.

Here you can browse all the objects that we have captured performance data on.

 

 

Finding VM CPU Performance

 

  1. Expand Group and find the 3 running VMs.

 

 

Select counters

 

  1. Expand the VM folder 57213:perf_cpu_worker-l-01a
  2. Double click on %RDY and %RUN to add the counters to the graph area to the right

Do this for each of the 3 VMs.

You should now be able to see and compare all 6 metrics from the 3 VMs in the same graph.

The ability to quickly graph and compare multiple metrics makes it much easier to spot issues. A visual representation of performance data is often much easier to understand because trends, outliers, and correlations stand out.

 

 

Close local file

 

  1. Close the local file by clicking the lower x in the top right corner. If you accidentally close the entire application, just double-click the desktop icon to open it again.

 

 

Connect to Live Server

 

We will now connect to the running ESXi host and collect data live.

  1. Select File
  2. Connect to Live Server

 

 

Connect to Live host

 

  1. Remote server : esx-01a.corp.local
  2. Username/Password : root/VMware1!
  3. Click Connect

 

 

Monitor data

 

Maximize the application so that you can see all the field names.

Select the Memory pane

 

 

Customize columns

 

  1. Click the button in the right corner to open the columns window

 

 

Select Columns

 

Here you can select which columns you want to look at, just like you did in the esxtop part of this module.

  1. Select %ACTV to add another column to look at.
  2. Click OK

 

 

Learn about Columns

 

There are a lot of values to add. Since it can be difficult to remember the meaning of all the different values, there is built-in help in Visual ESXtop.

To access it,

  1. Position your cursor over the column of the value you want to learn about.
  2. A yellow box will then pop up with a help text describing what the value means.

 

 

Summary

We hope that we have shown you some of the features of Visual ESXtop. Remember that this tool is still a VMware Fling and does not come with support. That said, it can still be a valuable tool for troubleshooting issues in a more visual manner.

We recommend that you also take a look at our vCOPs labs. There you can find additional visual ways of monitoring and troubleshooting performance-related issues.

 

Clean-Up Procedure and Conclusion



 

Clean up procedure

In order to free up resources for the remaining parts of this lab, we need to shut down the virtual machines used in this module.

 

 

Power Off Workloads

 

  1. Click the PowerCLI icon in the taskbar
  2. Type:
.\StopBasicLabVMs.ps1

  3. Press Enter

 

 

Key Takeaways

In this module, we have seen that esxtop is an extremely powerful performance troubleshooting tool that provides very deep and detailed performance data. At the same time, esxtop does not give you a good way to monitor several performance metrics side by side or to identify trends.

Using visualEsxtop can provide you with a better overview than esxtop while maintaining the same level of detail.

 

 

Conclusion

 

This concludes Module 6: Using esxtop. We hope you have enjoyed taking it. Do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab and the estimated time to complete each one.  Click on 'More Options - Table of Contents' to quickly jump to a module in the manual.

 

Conclusion

Thank you for participating in the VMware Hands-on Labs. Be sure to visit http://hol.vmware.com/ to continue your lab experience online.

Lab SKU: HOL-SDC-1404

Version: 20150227-110430