VMware Hands-on Labs - HOL-1704-SDC-1


Lab Overview - HOL-1704-SDC-1 - vSphere 6: Performance Optimization

Lab Guidance


You have 90 minutes for each lab session, and the estimated time to complete each module is listed next to it.  Every module can be completed on its own, and the modules can be taken in any order, but make sure that you follow the instructions carefully with respect to the cleanup procedure after each module. In short, all VMs should be shut down after the completion of each module using the script indicated in that module. In total, there is more than six hours of content in this lab.

Lab Module List:

Lab Overview

Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)

Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)

Module 3: CPU Performance Feature: Power Policies (15 minutes)

Module 4: CPU Performance Feature: SMP-FT (30 minutes)

Module 5: Memory Performance, Basic Concepts and Troubleshooting (30 minutes)

Module 6: Memory Performance Feature:  vNUMA with Memory Hot Add (30 minutes)

Module 7: Storage Performance and Troubleshooting (30 minutes)

Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)

Module 9: Network Performance Feature: Network IO Control with Reservations (45 minutes)

Module 10: Performance Tool: esxtop CLI introduction (30 minutes)

Module 11: Performance Tool: vRealize Operations, next step in performance monitoring and Troubleshooting (30 minutes)

Lab Captains: David Morse and Jim Sou

This lab manual can be downloaded from the Hands-on Labs Document site found here:

http://docs.hol.vmware.com

This lab may be available in other languages.  To set your language preference and have a localized manual deployed with your lab, you may utilize this document to help guide you through the process:

http://docs.hol.vmware.com/announcements/nee-default-language.pdf


 

Location of the Main Console

 

  1. The area in the RED box contains the Main Console.  The Lab Manual is on the tab to the Right of the Main Console.
  2. A particular lab may have additional consoles found on separate tabs in the upper left. You will be directed to open another specific console if needed.
  3. Your lab starts with 90 minutes on the timer.  The lab cannot be saved.  All your work must be done during the lab session.  However, you can click EXTEND to increase your time.  If you are at a VMware event, you can extend your lab time twice, for up to 30 minutes; each click gives you an additional 15 minutes.  Outside of VMware events, you can extend your lab time up to 9 hours and 30 minutes; each click gives you an additional hour.

 

 

Activation Prompt or Watermark

 

When you first start your lab, you may notice a watermark on the desktop indicating that Windows is not activated.  

One of the major benefits of virtualization is that virtual machines can be moved and run on any platform.  The Hands-on Labs utilize this benefit, allowing us to run the labs out of multiple datacenters.  However, these datacenters may not have identical processors, which triggers a Microsoft activation check through the Internet.

Rest assured, VMware and the Hands-on Labs are in full compliance with Microsoft licensing requirements.  The lab that you are using is a self-contained pod and does not have full access to the Internet, which is required for Windows to verify the activation.  Without full access to the Internet, this automated process fails and you see this watermark.

This cosmetic issue has no effect on your lab.  

 

 

Alternate Methods of Keyboard Data Entry

During this module, you will input text into the Main Console. Besides typing it in directly, there are two very helpful methods of entering data which make it easier to enter complex data.

 

 

Click and Drag Lab Manual Content Into Console Active Window

You can also click and drag text and Command Line Interface (CLI) commands directly from the Lab Manual into the active window in the Main Console.  

 

 

Accessing the Online International Keyboard

 

You can also use the Online International Keyboard found in the Main Console.

  1. Click on the Keyboard Icon found on the Windows Quick Launch Task Bar.

 

 

Click once in active console window

 

In this example, you will use the Online Keyboard to enter the "@" sign used in email addresses. The "@" sign is Shift-2 on US keyboard layouts.

  1. Click once in the active console window.
  2. Click on the Shift key.

 

 

Click on the @ key

 

  1. Click on the "@" key.

Notice the @ sign entered in the active console window.

 

 

Look at the lower right portion of the screen

 

Please check to see that your lab has finished all the startup routines and is ready for you to start. If you see anything other than "Ready", please wait a few minutes.  If, after 5 minutes, your lab has not changed to "Ready", please ask for assistance.

 

vSphere 6 Performance Introduction


This lab, HOL-1704-SDC-1, covers vSphere performance best practices and various performance-related features available in vSphere 6. You will work with a broad array of solutions and tools, including VMware Labs "Flings" and esxtop, to gauge and diagnose performance in a vSphere environment. vSphere features related to performance include Network IO Control Reservations, vNUMA with Memory Hot Add, Latency Sensitivity, and Power Policy settings.

While the time available in this lab constrains the number of performance problems we can review as examples, we have selected relevant problems that are commonly seen in vSphere environments. By walking through these examples, you should be better able to understand and troubleshoot typical performance problems.

For the complete Performance Troubleshooting Methodology and a list of VMware Best Practices, please visit the vmware.com website:

http://www.vmware.com/files/pdf/techpaper/VMware-PerfBest-Practices-vSphere6-0.pdf

http://pubs.vmware.com/vsphere-60/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-60-monitoring-performance-guide.pdf

Furthermore, if you are interested in performance-related articles, make sure that you monitor the VMware VROOM! Blog:

http://blogs.vmware.com/performance/


Module 1: CPU Performance, Basic Concepts and Troubleshooting (15 minutes)

Introduction to CPU Performance Troubleshooting


The goal of this module is to expose you to a CPU contention issue in a virtualized environment. It will also guide you on how to quickly identify performance problems by checking various performance metrics and settings.

Performance problems may occur when there are insufficient CPU resources to satisfy demand. Excessive demand for CPU resources on a vSphere host may occur for many reasons. In some cases, the cause is straightforward. Populating a vSphere host with too many virtual machines running compute-intensive applications can make it impossible to supply sufficient CPU resources to all the individual virtual machines. However, sometimes the cause may be more subtle, related to the inefficient use of available resources or non-optimal virtual machine configurations.

Let's get started!


 

For users with non-US Keyboards

 

If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop.

 

 

On-Screen Keyboard

 

Another option, if you are having issues with the keyboard, is to use the On-Screen Keyboard.  

To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

 

 

Getting Back on Track

 

If, for any reason, you make a mistake or the lab breaks, perform the following actions to get back on track, and restart the current module from the beginning.  

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell prompt.

 

 

Resetting VMs to Restart Module

 

From the PowerCLI prompt, type:    

.\StopLabVMs.ps1

and press Enter.

The script will stop all running VMs and reset their settings, and you can restart the module.

 

 

Start this Module

 

Let's start this module.

Launch Chrome from the shortcut in the Taskbar.

 

 

Login to vSphere

 

Log into vSphere. The vSphere Web Client should be the default home page.

If, for some reason, that does not work, uncheck the box and use these credentials:

User name: CORP\Administrator
Password: VMware1!

 

 

Refresh the UI

 

In order to reduce the amount of manual input in this lab, a lot of tasks are automated using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the actual state of the inventory immediately after a script has run.

If you need to manually refresh the inventory, click the Refresh icon in the top of the vSphere Web Client.

 

 

Select Hosts and Clusters

 

 

vSphere HTML5 Web Client


During this module you have the option of utilizing the HTML5 version of the Web Client.  This version is designed to be more responsive and stable while still providing security.

At this point, not all features are available in the Fling release; only the most common actions and features are supported.

If you want to run this in your own environment you can reference the following link: https://labs.vmware.com/flings/vsphere-html5-web-client

Keep in mind that the majority of the actions in this module won’t work in the HTML5 version. The recommendation here is to have both Flash and HTML5 instances open at the same time and toggle between them.

The HTML5 Web Client is located in the bookmark section labeled “Region A HTML5 client”.


 

HTML5 Web Client Link

 

 

 

 

What is a Fling?

From the Flings website: https://labs.vmware.com/about

Our engineers work on tons of pet projects in their spare time, and are always looking to get feedback on their projects (or “flings”). Why flings? A fling is a short-term thing, not a serious relationship but a fun one. Likewise, the tools that are offered here are intended to be played with and explored. None of them are guaranteed to become part of any future product offering and there is no support for them. They are, however, totally free for you to download and play around with them!

 

 

CPU Contention


Below is a list of the most common CPU performance issues:

High Ready Time: A vCPU is in the Ready state when the virtual machine is ready to run but cannot, because the vSphere scheduler is unable to find physical host CPU resources on which to run it. Ready time above 10% could indicate CPU contention and might impact the performance of CPU-intensive applications. However, some less CPU-sensitive applications and virtual machines can have much higher values of ready time and still perform satisfactorily. (A PowerCLI sketch for retrieving the Ready and Co-Stop counters follows this list.)

High Co-Stop Time: Co-Stop time indicates that a VM has more vCPUs than it needs, and that the excess vCPUs create overhead that drags down the performance of the VM. The VM will likely run better with fewer vCPUs. A vCPU accumulates Co-Stop time while it is kept from running so that the other, more-idle vCPUs can catch up with the busy one.

CPU Limits: CPU Limits directly prevent a virtual machine from using more than a set amount of CPU resources. Any CPU limit might cause a CPU performance problem if the virtual machine needs resources beyond the limit.

Host CPU Saturation: When the physical CPUs of a vSphere host are consistently utilized at 85% or more, the vSphere host may be saturated. When a vSphere host is saturated, it is more difficult for the scheduler to find free physical CPU resources on which to run virtual machines.

Guest CPU Saturation: Guest CPU (vCPU) saturation occurs when the application inside the virtual machine is using 90% or more of the CPU resources assigned to the virtual machine. This may be an indicator that the application is bottlenecked on vCPU resources. In these situations, adding vCPU resources to the virtual machine might improve performance.

Oversizing VM vCPUs: Using large SMP (Symmetric Multi-Processing) virtual machines can cause unnecessary overhead. Virtual machines should be correctly sized for the application that is intended to run in them. Some applications only support multithreading up to a certain number of threads, and assigning additional vCPUs to the virtual machine may cause additional overhead. If vCPU usage shows that a virtual machine configured with multiple vCPUs is only using one of them, this might be an indicator that the application inside the virtual machine is unable to take advantage of the additional vCPU capacity, or that the guest OS is incorrectly configured.

Low Guest Usage: Low in-guest CPU utilization might be an indicator that the application is not configured correctly, or that the application is starved of some other resource, such as I/O or memory, and therefore cannot fully utilize the assigned vCPU resources.
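
For reference, below is a minimal PowerCLI sketch for pulling the Ready and Co-Stop counters discussed above. It assumes the standard vCenter real-time counters cpu.ready.summation and cpu.costop.summation; values are reported in milliseconds per 20-second sample, with one row per vCPU instance plus a VM-level aggregate.

# Hedged sketch: retrieve recent real-time Ready and Co-Stop values for a worker VM.
$vm    = Get-VM -Name perf-worker-01a
$stats = Get-Stat -Entity $vm -Stat "cpu.ready.summation","cpu.costop.summation" -Realtime -MaxSamples 15
$stats | Sort-Object Timestamp |
    Select-Object Timestamp, MetricId, Instance, Value |
    Format-Table -AutoSize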


 

Open a PowerCLI window

 

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command prompt.

 

 

Start the CPU Workload

 

From the PowerCLI Console

type:

.\StartCPUTest.ps1

press enter

While the script configures and starts up the virtual machines, please continue to read ahead.

 

 

CPU Test Started

 

When the script completes, you will see two Remote Desktop windows open (note: you may have to move one of the windows to display them side by side, as shown above).

The script has started a CPU intensive benchmark (SPECjbb2005) on both perf-worker-01a and perf-worker-01b virtual machines, and a GUI is displaying the real-time performance value as this workload runs.

If you do not see the SPECjbb2005 window open, launch the shortcut in the upper left-hand corner.

Above, we see an example screenshot where the performance of the benchmarks is around 15,000.

IMPORTANT NOTE: Due to changing loads in the lab environment, your values may vary from the values shown in the screenshots.

 

 

Navigate to perf-worker-01a (VM-level) Performance Chart

 

  1. Select the perf-worker-01a virtual machine from the list of VMs on the left
  2. Click the Monitor tab
  3. Click Performance
  4. Click Advanced
  5. Click on Chart Options

 

 

Select Specific Counters for CPU Performance Monitoring

 

When investigating a potential CPU issue, there are several counters that are important to analyze:

  1. Select CPU from the Chart metrics
  2. Check only the perf-worker-01a object
  3. Click None on the bottom right of the list of counters
  4. Now check only Demand, Ready, and Usage in MHz
  5. Click Ok

 

 

CPU State Time Explanation

 

Virtual machines can be in any one of four high-level CPU states: Run, Ready, Co-Stop, and Wait.

 

 

Monitor Demand vs. Usage lines

 

Notice the amount of CPU this virtual machine is demanding and compare that to the amount of CPU the virtual machine is actually allowed to use (Usage in MHz). The virtual machine is demanding more than it is currently being allowed to use.

Notice that the virtual machine is also seeing a large amount of ready time.

Guidance: Ready time > 10% could be a performance concern.

 

 

Explanation of value conversion

 

NOTE:  vCenter reports some metrics such as "Ready Time" in milliseconds (ms). Use the formula above to convert the milliseconds (ms) value to a percentage.

For multi-vCPU virtual machines, multiply the Sample Period by the number of vCPUs of the VM to determine the total time of the sample period. It is also beneficial to monitor Co-Stop time on multi-vCPU virtual machines.  Like Ready time, Co-Stop time greater than 10% could indicate a performance problem.  You can examine Ready time and Co-Stop metrics per vCPU as well as per VM.  Per vCPU is the most accurate way to examine statistics like these.
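
As a worked example of that conversion, the short PowerShell calculation below uses the 20-second (20,000 ms) sample period of the real-time charts; the Ready value shown is illustrative only.

# Hedged example: convert a CPU Ready summation value (ms) to a percentage.
$readyMs      = 4000      # Ready value read from the chart, in milliseconds
$samplePeriod = 20000     # real-time chart sample period: 20 seconds = 20,000 ms
$numVCpus     = 1         # multiply the period by the vCPU count for per-VM totals
$readyPercent = ($readyMs / ($samplePeriod * $numVCpus)) * 100
Write-Host ("CPU Ready: {0}%" -f $readyPercent)     # 20% in this example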

 

 

Navigate to Host-level CPU chart view

 

  1. Select esx-01a.corp.local
  2. Select the Monitor tab
  3. Select Performance
  4. Select the Advanced view
  5. Select the CPU view

 

 

Examine ESX Host Level CPU Metrics

 

Notice in the chart that only one of the CPUs in the host has any significant workload running on it.

One CPU is at 100%, but the other CPU in the host is not really being used.

 

 

Edit Settings of perf-worker-01a

 

Let's see how perf-worker-01a is configured:

  1. Click on the perf-worker-01a virtual machine
  2. Click Actions
  3. Click Edit Settings…

 

 

Check Affinity Settings on perf-worker-01a

 

  1. Expand the CPU item in the list and you will see that affinity is set to cpu1.
  2. Clear the "1" to correctly balance the virtual machines across the physical CPUs in the system.  
  3. Press OK to make the changes.

Note:  VMware does not recommend setting affinity in most cases. vSphere will balance VMs across CPUs optimally without manually specified affinity. Setting affinity prevents the use of some features, such as vMotion, can become a management headache, and can lead to performance issues like the one we just diagnosed.
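
If you want to audit your own environment for this kind of misconfiguration, the read-only PowerCLI sketch below lists any VMs on a host that have a manual CPU affinity set. It relies on the CpuAffinity property exposed through ExtensionData and does not change anything.

# Hedged sketch: report VMs on esx-01a.corp.local that have CPU affinity configured.
Get-VMHost -Name esx-01a.corp.local | Get-VM | ForEach-Object {
    $affinity = $_.ExtensionData.Config.CpuAffinity
    if ($affinity -and $affinity.AffinitySet) {
        [PSCustomObject]@{
            VM          = $_.Name
            AffinitySet = ($affinity.AffinitySet -join ', ')
        }
    }
}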

 

 

Check Affinity Settings on perf-worker-01b

 

  1. Expand the CPU item in the list and you will see that affinity is set. Unfortunately, both virtual machines are bound to the same processor (CPU1). This can happen if an administrator sets affinity for a virtual machine and then creates a second virtual machine by cloning the original.
  2. Clear the "1" to correctly balance the virtual machines across the physical CPUs in the system.  
  3. Press OK to make the changes.

Note:  VMware does not recommend setting affinity in most cases. vSphere will balance VMs across CPUs optimally without manually specified affinity. Setting affinity prevents the use of some features, such as vMotion, can become a management headache, and can lead to performance issues like the one we just diagnosed.

 

 

Monitor Ready time

 

Return to perf-worker-01a and see how the Ready time immediately drops, and the Usage in MHz increases.

 

 

See Better Performance

 

It may take a moment, but the CPU Benchmark scores should increase.  Click back to the Remote Desktop windows to confirm this.

In this example, we have seen how to compare the Demand and Usage CPU metrics to identify CPU contention.  We showed you the Ready time metric and how it can be used to detect physical CPU contention. We also showed you the danger of setting affinity.

 

 

Edit Settings of perf-worker-01b

 

Let's add a virtual CPU to perf-worker-01b to improve performance.

  1. Click on the perf-worker-01b virtual machine
  2. Click Actions
  3. Click Edit Settings…

 

 

Add a CPU to perf-worker-01b

 

  1. Change the number of CPUs to 2
  2. Click OK

 

 

Monitor CPU performance of perf-worker-01b

 

  1. Select perf-worker-01b
  2. Select Monitor
  3. Select Performance
  4. Select the CPU view

Notice that the virtual machine is now using both vCPUs. This is because the OS in the virtual machine supports CPU hot-add, and because that feature has been enabled on the virtual machine.
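
The same change can be scripted. The sketch below is a hedged PowerCLI equivalent of the Edit Settings steps: it checks the CpuHotAddEnabled property exposed through ExtensionData and then uses Set-VM to add the vCPU while the VM is running.

# Hedged sketch: add a vCPU to a running VM if CPU hot add is enabled.
$vm = Get-VM -Name perf-worker-01b
if ($vm.ExtensionData.Config.CpuHotAddEnabled) {
    Set-VM -VM $vm -NumCpu 2 -Confirm:$false
} else {
    Write-Host "CPU hot add is disabled; power off the VM before changing the vCPU count."
}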

 

 

Investigate performance

 

Notice that the performance of perf-worker-01b has increased, since we added the additional virtual CPU.

However, this is not always the case. If the host these VMs are running on (esx-01a) had only two physical CPUs, adding another vCPU would have caused overcommitment, leading to high %READY and poor performance.

Remember, most workloads are not necessarily CPU bound.  The OS and the application need to be multi-threaded to get performance improvements from additional CPUs.  Most of the work that an OS does is typically not CPU-bound; that is, most of its time is spent waiting for external events such as user interaction, device input, or data retrieval, rather than executing instructions. Because otherwise-unused CPU cycles are available to absorb the virtualization overhead, these workloads will typically have throughput similar to native, but potentially with a slight increase in latency.

Configuring a virtual machine with more virtual CPUs (vCPUs) than its workload can use might cause slightly increased resource usage, potentially impacting performance on very heavily loaded systems. Common examples of this include a single-threaded workload running in a multiple-vCPU virtual machine, or a multi-threaded workload in a virtual machine with more vCPUs than the workload can effectively use.

Even if the guest operating system doesn't use all of the vCPUs allocated to it, over-configuring virtual machines with too many vCPUs still imposes non-zero resource requirements on ESXi, and these requirements translate to real CPU consumption on the host.

 

 

Close Remote Desktop Connections

 

Close the two remote desktop connections.

 

Conclusion and Clean-Up


In order to free up resources for the remaining parts of this lab, we need to shut down the virtual machines used in this module and reset their configuration.


 

Launch PowerCLI

 

If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt.

 

 

Power off and Reset VMs

 

In the PowerCLI console, type:    

.\StopLabVMs.ps1

press Enter

The script will now stop all running VMs and reset their settings.

 

 

Close PowerCLI window

 

Close the PowerCLI Window  

You can now move on to another module.

 

 

Key takeaways

CPU contention problems are generally easy to detect. In fact, vCenter has several alarms that will trigger if host CPU utilization or virtual machine CPU utilization stays too high for extended periods of time.

vSphere 6.0 allows you to create very large virtual machines with up to 128 vCPUs. It is highly recommended to size your virtual machines for the application workloads that will be running in them. Sizing a virtual machine with resources that are unnecessarily larger than what the workload can actually use may result in hypervisor overhead and can also lead to performance issues.

In general, here are some common CPU performance tips:

Avoid running a large VM on too small a platform.

Don't expect as high consolidation ratios with busy workloads as you achieved with the low-hanging fruit.

 

 

Conclusion

 

This concludes Module 1: CPU Performance, Basic Concepts and Troubleshooting. We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents' to quickly jump to a module in the manual.

 

Module 2: CPU Performance Feature: Latency Sensitivity Setting (45 minutes)

Introduction to Latency Sensitivity


The latency sensitivity feature aims at eliminating the major sources of extra latency imposed by virtualization to achieve low response time and jitter. This per-VM feature achieves this goal by giving exclusive access to physical resources to avoid resource contention due to sharing, bypassing virtualization layers to eliminate the overhead of extra processing, and tuning virtualization layers to reduce overhead. Performance can be further improved when the latency sensitivity feature is used together with a pass-through mechanism such as single-root I/O virtualization (SR-IOV).

Since the latency sensitivity feature is applied on a per VM basis, a vSphere host can run a mixture of normal VMs and latency sensitive VMs.


 

Who should use this feature?

The latency sensitivity feature is intended for specialized use cases that require extremely low latency. It is extremely important to determine whether or not your workload could benefit from this feature before enabling it. In a nutshell, latency sensitivity provides extremely low network latency with a tradeoff of increased CPU and memory cost as a result of less resource sharing, and increased power consumption.

We define a "highly latency sensitive application" as one that requires network latencies in the order of tens of microseconds and very small jitter. Stock market trading applications are an example of highly latency sensitive applications.

Before deciding if this setting is right for you, you should be aware of the network latency needs of your application. If you set latency sensitivity to High, it could lead to increased host CPU utilization, power consumption, and even negatively impact performance in some cases.

 

 

Who should not use this feature?

Enabling the latency sensitivity feature reduces network latency only. It will not decrease application latency that is dominated by storage latency or other sources of latency besides the network.

The latency sensitivity feature should only be enabled in environments in which CPU is undercommitted. VMs which have latency sensitivity set to High will be given exclusive access to the physical CPUs they need to run. This means the latency sensitive VM can no longer share those CPUs with neighboring VMs.

Generally, VMs that use the latency sensitivity feature should have fewer vCPUs than the number of cores per socket in your host, to ensure that the latency-sensitive VM occupies only one NUMA node. (A quick PowerCLI check for this is sketched below.)
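
The check below is a minimal PowerCLI sketch, using the VM and host names from later in this module, that compares a VM's vCPU count to the host's cores per socket.

# Hedged sketch: verify that a latency-sensitive VM fits within one CPU socket.
$vm      = Get-VM -Name perf-worker-04a
$esxHost = Get-VMHost -Name esx-02a.corp.local
$cpuInfo = $esxHost.ExtensionData.Hardware.CpuInfo
$coresPerSocket = $cpuInfo.NumCpuCores / $cpuInfo.NumCpuPackages
if ($vm.NumCpu -lt $coresPerSocket) {
    Write-Host "$($vm.Name) ($($vm.NumCpu) vCPU) fits within one socket ($coresPerSocket cores)."
} else {
    Write-Host "Warning: $($vm.Name) may span NUMA nodes on this host."
}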

If the latency sensitivity feature is not relevant to your environment, feel free to choose a different module.

 

 

Changes to CPU access

When a VM has 'High' latency sensitivity set in vCenter, the VM is given exclusive access to the physical cores it needs to run. This is termed exclusive affinity. These cores will be reserved for the latency sensitive VM only, which results in greater CPU accessibility to the VM and less L1 and L2 cache pollution from multiplexing other VMs onto the same cores. When the VM is powered on, each vCPU is assigned to a particular physical CPU and remains on that CPU.

When the latency sensitive VM's vCPU is idle, ESXi also alters its halting behavior so that the physical CPU remains active. This reduces wakeup latency when the VM becomes active again.

 

 

Changes to virtual NIC interrupt coalescing

A virtual NIC (vNIC) is a virtual device that exchanges network packets between the VMkernel and the guest operating system. Exchanges are typically triggered by interrupts to the guest OS or by the guest OS calling into the VMkernel, both of which are expensive operations. Virtual NIC interrupt coalescing, which is enabled by default in vSphere, attempts to reduce this CPU overhead by holding back packets for some time (combining or "coalescing" these packets) before triggering an interrupt, so that the hypervisor does not have to wake up the VM for every packet.

Enabling 'High' latency sensitivity disables virtual NIC coalescing, so that there is less latency between when a packet is sent or received and when the CPU is interrupted to process the packet.   Typically, coalescing is desirable for higher throughput (so the CPU isn't interrupted as often), but it can introduce network latency and jitter.

While disabling coalescing can reduce latency, it can also increase CPU utilization, and thus power usage.  Therefore this option should only be used in environments with small packet rates and plenty of CPU headroom.
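
Outside of the Latency Sensitivity setting, coalescing can also be tuned per vNIC with the ethernetX.coalescingScheme advanced parameter described in the latency-tuning paper referenced at the end of this module. The PowerCLI sketch below is a hedged example of disabling it for one vNIC on a powered-off VM; verify the parameter name and supported values for your vSphere version before relying on it.

# Hedged sketch: disable interrupt coalescing for the first vNIC of perf-worker-04a.
$vm = Get-VM -Name perf-worker-04a
New-AdvancedSetting -Entity $vm -Name "ethernet0.coalescingScheme" -Value "disabled" -Confirm:$false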

Are you ready to get your hands dirty? Let's start the hands-on portion of this lab.

 

 

For users with non-US Keyboards

 

If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop.

 

 

On-Screen Keyboard

 

Another option, if you are having issues with the keyboard, is to use the On-Screen Keyboard.  

To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

 

 

Getting Back on Track

 

If, for any reason, you make a mistake or the lab breaks, perform the following actions to get back on track, and restart the current module from the beginning.  

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell prompt.

 

 

Resetting VMs to Restart Module

 

From the PowerCLI prompt, type:    

.\StopLabVMs.ps1

and press Enter.

The script will stop all running VMs and reset their settings, and you can restart the module.

 

 

Start this Module

 

Let's start this module.

Launch Chrome from the shortcut in the Taskbar.

 

 

Login to vSphere

 

Log into vSphere. The vSphere Web Client should be the default home page.

If, for some reason, that does not work, uncheck the box and use these credentials:

User name: CORP\Administrator
Password: VMware1!

 

 

Refresh the UI

 

In order to reduce the amount of manual input in this lab, a lot of tasks are automated using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the actual state of the inventory immediately after a script has run.

If you need to manually refresh the inventory, click the Refresh icon in the top of the vSphere Web Client.

 

 

Select Hosts and Clusters

 

 

vSphere HTML5 Web Client


During this module you have the option of utilizing the HTML5 version of the Web Client.  This version is designed to be more responsive and stable while still providing security.

At this point, not all features are available in the Fling release; only the most common actions and features are supported.

If you want to run this in your own environment you can reference the following link: https://labs.vmware.com/flings/vsphere-html5-web-client

Keep in mind that the majority of the actions in this module won’t work in the HTML5 version. The recommendation here is to have both Flash and HTML5 instances open at the same time and toggle between them.

The HTML5 Web Client is located in the bookmark section labeled “Region A HTML5 client”.


 

HTML5 Web Client Link

 

 

 

 

What is a Fling?

From the Flings website: https://labs.vmware.com/about

Our engineers work on tons of pet projects in their spare time, and are always looking to get feedback on their projects (or “flings”). Why flings? A fling is a short-term thing, not a serious relationship but a fun one. Likewise, the tools that are offered here are intended to be played with and explored. None of them are guaranteed to become part of any future product offering and there is no support for them. They are, however, totally free for you to download and play around with them!

 

 

Performance impact of the Latency Sensitivity setting


In this section, we will observe the impact of the Latency Sensitivity setting on network latency.  To do so, let's start up some workloads to stress the VMs.


 

Open a PowerCLI window

 

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command prompt.

 

 

Start CPU workload

 

From the PowerCLI Console, type:  

.\StartLSLab.ps1

press enter

The script will configure and start up three VMs (03a, 05a, and 06a), and generate a CPU workload on two of them (05a and 06a).

 

 

VM Stats Collectors: CPU intensive workload started

 

In a few minutes, when the script completes, you will see two “VM Stats Collector” applications start up. Within a minute after that, each utility will start a CPU-intensive application on the perf-worker-05a and perf-worker-06a virtual machines and will collect the benchmark results from those CPU-intensive workloads. The VMs perf-worker-05a and perf-worker-06a will create high demand for CPU on the host, which will help us demonstrate the Latency Sensitivity feature.

IMPORTANT NOTE: Due to changing loads in the lab environment, your values may vary from the values shown in the screenshots.

 

 

Select ESX Host

 

The environment where this lab is running is not constant, so it is important to note the speed of the CPUs on the nested ESXi hosts.

Open the vSphere Web client again.

You should already be in the host and clusters window.

  1. Select esx-01a.corp.local
  2. Make a note of the CPU speed of the processor (in this case, 3.07 GHz)

You will be using this, in a later step.

 

 

Edit perf-worker-04a

 

We will use the perf-worker-04a virtual machine to demonstrate the Latency Sensitivity feature. To show how the 'High' Latency Sensitivity setting affects network latency, we will compare network performance between perf-worker-04a with Latency Sensitivity set to 'Normal' and that same VM with Latency Sensitivity set to 'High'.

The Latency Sensitivity feature, when set to 'High', has two VM resource requirements. For best performance, it needs 100% memory reservation and 100% CPU reservation.

To make a fair comparison, both the 'Normal' latency sensitivity VM and the 'High' latency sensitivity VM should have the same resource reservations, so that the only difference between the two is the 'High' latency sensitivity setting.

First, we will create resource allocations for the perf-worker-04a virtual machine while Latency Sensitivity is set to "Normal".

  1. Select perf-worker-04a.  This VM resides on esx-02a.corp.local.
  2. Select edit settings

 

 

Set CPU Reservation to Maximum

 

  1. Expand CPU
  2. Set the Reservation value to the highest value possible, according to the CPU speed that you noted in the earlier step. If the CPU speed was 3.1 GHz, set it to 3058 MHz.

Note that it must be a few MHz less than the full CPU speed for the VM to be able to power on.

This sets a near-100% CPU reservation for the VM. When the VM has the 'High' latency sensitivity setting, this CPU reservation enables exclusive affinity so that one physical CPU is reserved solely for use of the 'High' Latency Sensitive VM vCPU.

Note that normally you would select the "Maximum" reservation, but because this is a fully virtualized environment, the CPU speed is detected incorrectly. Therefore, we set it manually according to the underlying hardware.

 

 

Set Memory Reservation

 

Still on the Edit Settings page,

  1. Click CPU to collapse the CPU view
  2. Click Memory to expand the Memory view
  3. Check the box Reserve all guest memory (All locked)

This sets a 100% memory reservation for the VM.

Right now, we are going to test network performance on a 'Normal' Latency Sensitivity VM, but when we change the VM's latency sensitivity to 'High' later, 100% memory reservation ensures that all the memory the VM needs will be located close to the processor which is running the VM. If the VM has a 'High' Latency Sensitivity setting and does not have a 100% memory reservation, it will not power on.
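
For reference, the same reservations can be applied with PowerCLI. The sketch below is a hedged equivalent of the two Edit Settings steps above; it sets a numeric 100% memory reservation rather than ticking the "Reserve all guest memory (All locked)" checkbox, and the 3058 MHz value mirrors the lab's example, so substitute the per-core speed you noted for your host.

# Hedged sketch: set near-100% CPU and 100% memory reservations on perf-worker-04a.
$vm = Get-VM -Name perf-worker-04a
Get-VMResourceConfiguration -VM $vm |
    Set-VMResourceConfiguration -CpuReservationMhz 3058 -MemReservationMB $vm.MemoryMB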

 

 

Ensure Latency Sensitivity is 'Normal'

 

Still on the Edit Settings page:

  1. Click the VM Options tab
  2. Click Advanced to expand this section
  3. Confirm the Latency Sensitivity is Normal
  4. Click OK

 

 

Power on perf-worker-04a

 

  1. Right-click "perf-worker-04a".
  2. Select "Power"
  3. Click "Power On"

 

 

Monitor esx-02a host CPU usage

 

  1. Select esx-02a.corp.local
  2. Select Monitor
  3. Select Performance
  4. Select Advanced
  5. You can see that the Latest value for esx-02a.corp.local Usage should be close to 100%.  This indicates that the perf-worker-05a and perf-worker-06a VMs are consuming as much CPU on the host as they can.

Although an environment which contains latency-sensitive VMs should typically remain CPU undercommitted, creating demand for CPU makes it more likely that we can see a difference between the 'Normal' and 'High' Latency Sensitivity network performance.

The VM perf-worker-03a will serve as the network performance test target.

 

 

Monitor Resource Allocation

 

  1. Select perf-worker-04a
  2. Select Monitor
  3. Select Utilization

The Resource Allocation for the 'Normal' Latency Sensitivity VM shows that only a small portion of the total CPU and Memory reservation is Active. Your screen may show different values if the VM is still booting up.

 

 

Open a PuTTY window

 

Click the PuTTY icon on the taskbar

 

 

SSH to perf-worker-04a

 

  1. Select perf-worker-04a
  2. Click Open

 

 

Test network latency on 'Normal' latency sensitivity VM

 

At the command line, type:

ping -f -w 1 192.168.100.153

Press enter.

Wait for the command to complete, and run this command a total of 3 times. On the second and third times, you can press the up arrow to retrieve the last command entered.

Ping is a very simple network workload, which measures Round Trip Time (RTT), in which a network packet is sent to a target VM and then returned back to the VM. The VM perf-worker-04a, located on esx-02a.corp.local, is pinging perf-worker-03a, located on esx-01a.corp.local, with the IP address 192.168.100.153. For a period of one second, perf-worker-04a sends back-to-back ping requests. Ping is an ideal low-level network test because the request is processed in the kernel and does not need to access the application layer of the operating system.

We have finished testing network latency and throughput on the 'normal' Latency Sensitivity VM. Do not close this PuTTY window as we will use it for reference later. We will now change the VM to 'high' Latency Sensitivity.

 

 

Shut down the perf-worker-04a VM

 

To enable the latency sensitivity feature for a VM, the VM must first be powered off. You can still change the setting while the VM is powered on, but it doesn't fully apply until the VM has been powered off and then back on again.

  1. Right-click perf-worker-04a
  2. Select Power
  3. Click Shut Down Guest OS

 

 

Confirm Guest Shut Down

 

Click Yes

Wait for perf-worker-04a to shut down.

 

 

Edit Settings for perf-worker-04a

 

We will use the perf-worker-04a virtual machine to demonstrate the Latency Sensitivity feature. To show how the 'High' Latency Sensitivity setting affects network latency, we will compare network performance with this setting set to Normal and High.

The Latency Sensitivity feature, when set to 'High', has two VM resource requirements. For best performance, it needs 100% memory reservation and 100% CPU reservation.

To make a fair comparison, both the 'Normal' latency sensitivity VM and the 'High' latency sensitivity VM should have the same resource reservations, so that the only difference between the two is the 'High' latency sensitivity setting.

Now that the VM is powered off, we will change its Latency Sensitivity setting from "Normal" (the default) to "High".

  1. Click perf-worker-04a
  2. Click Actions
  3. Click Edit Settings...

 

 

Set 'High' Latency Sensitivity.

 

  1. Select VM Options
  2. Expand Advanced
  3. Select High
  4. Click OK
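
If you prefer to script this step, the Latency Sensitivity level is also exposed through the vSphere API. The sketch below is a hedged PowerCLI example using ReconfigVM; the property names come from the public API documentation, so verify them against your SDK version before relying on it.

# Hedged sketch: set Latency Sensitivity to High via the vSphere API.
$vm   = Get-VM -Name perf-worker-04a
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.LatencySensitivity = New-Object VMware.Vim.LatencySensitivity
$spec.LatencySensitivity.Level = "high"     # valid levels include low, normal, medium, high
$vm.ExtensionData.ReconfigVM($spec)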

 

 

CPU reservation warning

 

You may have noticed a warning in the previous screenshot: "Check CPU reservation" appears next to the Latency Sensitivity setting. For best performance, High Latency Sensitivity requires a 100% CPU reservation for the VM, which we set earlier. This warning always appears on the Advanced Settings screen, even when the CPU reservation has already been set high enough.

If no reservation is set, the VM is still allowed to power on, and no further warnings are shown.

 

 

Power on perf-worker-04a

 

  1. Right-click perf-worker-04a
  2. Select Power
  3. Click Power On

 

 

Monitor Resource Allocation

 

  1. Select the "Monitor" tab
  2. Select "Utilization"

On the top half of this image, we see that the 'High' Latency Sensitivity VM shows 100% Active CPU and Private Memory even though the VM itself is idle. Compare this to the Resource Allocation for the 'Normal' Latency Sensitive VM which we examined earlier. It shows only a small portion of the total CPU and Memory reservation is Active. This increase in Active CPU and Memory is the result of the 'High' Latency Sensitivity setting.

Although we cannot see the difference in this environment when 'High' Latency Sensitivity is set with 100% CPU reservation, the Host CPU will show 100% utilization of the physical core which is hosting the VM's vCPU. This is a normal result of exclusive affinity in the Lab environment and occurs even when the VM itself is idle. On many Intel processors, the physical CPU hosting the vCPU will be idle if the vCPU is idle but it will still be unavailable to other vCPUs.

 

 

Monitor the VM Stats Collectors

 

Before we set 'High' Latency Sensitivity for perf-worker-04a, the CPU workers had equivalent benchmark scores. Now, one of the CPU workers will have a lower score. In the example above, perf-worker-06a has a lower score. Your lab may show either perf-worker-05a or perf-worker-06a with a lower score. This confirms that perf-worker-04a has impacted perf-worker-06a's access to CPU cycles which decreases its CPU benchmark score.

Next, we will test network latency on the 'High' Latency Sensitivity VM.

 

 

Open a PuTTY window

 

Click the PuTTY icon on the taskbar

 

 

SSH to perf-worker-04a

 

  1. Select perf-worker-04a
  2. Click Open

 

 

Test network latency on 'High' Latency Sensitive VM

 

At the command line, run the command:

ping -f -w 1 192.168.100.153

Like last time, wait for the command to complete, and run this command a total of three times.

We'll take a look at the results in a second, but first we will set the Latency Sensitivity setting back to default.

 

 

Compare network latency tests

 

From the taskbar, click the PuTTY icons to bring both PuTTY windows to the foreground and arrange them with Normal Latency Sensitivity on top and High Latency Sensitivity on the bottom.

Hint: At the bottom of both windows, there should be a timestamp:

Broadcast message from root (timestamp). The window with the oldest timestamp is the 'Normal' Latency Sensitivity VM; place it on top and the other on the bottom.

Now let's delve into the performance results.

Important Note: Due to variable loads in the lab environment, your numbers may differ from those above.

The ping test we completed sends as many ping requests to the remote VM as possible ("back-to-back pings") within a one-second period. As soon as one ping is returned, another request is sent. The ping command outputs four round-trip time statistics per test: minimum, average, maximum, and mean deviation (mdev).

Of these, we are most interested in the minimum latency and the deviation.

By eyeballing the differences in the numbers between the 'Normal' and 'High' Latency Sensitivity VMs, you should be able to see the improvement.  Note the numbers within the green brackets; the smaller deviation for the 'High' Latency Sensitivity VM represents less "jitter". Because this is a shared, virtualized test environment, these performance results are not representative of the effects of the Latency Sensitivity setting in a real-life environment. They are for demonstration purposes only.

Remember, these numbers were taken from the same VM with the same resource allocations, under the same conditions. The only difference between the two is setting 'Normal' versus 'High' Latency Sensitivity.

 

 

Close the VM Stats Collector windows

 

From the taskbar, click the .NET icon to bring the VM Stats Collectors to the foreground.

We have finished the network tests. Close the windows using the X on each window.

 

 

Close open PuTTY windows

 

Close the open PuTTY windows.

 

Conclusion and Cleanup



 

Clean up procedure

In order to free up resources for the remaining parts of this lab, we need to shut down the virtual machines used in this module and reset their configuration.

 

 

Launch PowerCLI

 

If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt.

 

 

Power off and Reset VMs

 

In the PowerCLI console, type:    

.\StopLabVMs.ps1

press enter

The script will now stop all running VMs and reset their settings.

 

 

Close PowerCLI window

 

Close the PowerCLI Window  

You can now move on to another module.

 

 

Key takeaways

The Latency Sensitivity setting is very easy to configure. Once you have determined whether your application fits the definition of 'High' latency sensitivity (tens of microseconds), configure Latency Sensitivity.

To review:

1. On a powered off VM, set 100% memory reservation for the latency sensitive VM.

2. If your environment allows, set a 100% CPU reservation for the latency sensitive VM, such that the MHz reserved is equal to the sum of the clock frequencies of all of the VM's vCPUs (see the worked example after this list).

3. In Advanced Settings, set Latency Sensitivity to High.
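
As a worked example of step 2 (values are illustrative only), a 100% CPU reservation for a 2-vCPU VM on a host with 3.07 GHz cores would be calculated like this:

# Hedged example: 100% CPU reservation for a 2-vCPU VM on 3.07 GHz cores.
$vCpus       = 2
$coreMhz     = 3070                    # per-core clock speed, in MHz
$reservation = $vCpus * $coreMhz       # 6140 MHz to reserve for the VM
Write-Host ("Reserve {0} MHz" -f $reservation)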

If you want to learn more about running latency sensitive applications on vSphere, consult these white papers:

http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf 

http://www.vmware.com/files/pdf/techpaper/latency-sensitive-perf-vsphere55.pdf

 

 

Conclusion

 

This concludes Module 2: CPU Performance Feature: Latency Sensitivity Setting. We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents'  to quickly jump to a module in the manual.

 

Module 3: CPU Performance Feature: Power Management Policies (15 minutes)

Introduction to, and Performance Impact of, Power Policies


VMware vSphere serves as a common virtualization platform for a diverse ecosystem of applications. Every application has different performance demands which must be met, but recent increases in density and computing needs in datacenters are straining power and cooling capacities and costs of running these applications.

vSphere Host Power Management (HPM) is a technique that saves energy by placing certain parts of a computer system or device into a reduced power state when the system or device is inactive or does not need to run at maximum speed.  vSphere handles power management by utilizing Advanced Configuration and Power Interface (ACPI) performance and power states. In VMware vSphere® 5.0, the default power management policy was based on dynamic voltage and frequency scaling (DVFS). This technology utilizes the processor’s performance states and allows some power to be saved by running the processor at a lower frequency and voltage. However, beginning in VMware vSphere 5.5, the default HPM policy uses deep halt states (C-states) in addition to DVFS to significantly increase power savings over previous releases while still maintaining good performance.

However, in order for ESXi to be able to control these features, you must ensure that the server BIOS power management profile is set to "OS Control mode" or the equivalent.

In this lab, we will show how to:

  1. Customize your server's BIOS settings (using example screen shots)
  2. Explain the four power policies that ESXi offers, and demonstrate how to change this setting
  3. Optimize your environment either to balance power and performance (recommended for most environments) or for maximum performance (which can sacrifice some power savings).

 

For users with non-US Keyboards

 

If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop.

 

 

On-Screen Keyboard

 

Another option, if you are having issues with the keyboard, is to use the On-Screen Keyboard.  

To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

 

 

Getting Back on Track

 

If, for any reason, you make a mistake or the lab breaks, perform the following actions to get back on track, and restart the current module from the beginning.  

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell prompt.

 

 

Resetting VMs to Restart Module

 

From the PowerCLI prompt, type:    

.\StopLabVMs.ps1

and press Enter.

The script will stop all running VMs and reset their settings, and you can restart the module.

 

 

Start this Module

 

Let's start this module.

Launch Chrome from the shortcut in the Taskbar.

 

 

Login to vSphere

 

Log into vSphere. The vSphere Web Client should be the default home page.

If, for some reason, that does not work, uncheck the box and use these credentials:

User name: CORP\Administrator
Password: VMware1!

 

 

Refresh the UI

 

In order to reduce the amount of manual input in this lab, a lot of tasks are automated using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the actual state of the inventory immediately after a script has run.

If you need to manually refresh the inventory, click the Refresh icon in the top of the vSphere Web Client.

 

 

Select Hosts and Clusters

 

 

Configuring the Server BIOS Power Management Settings


VMware ESXi includes a full range of host power management capabilities.  These can save power when an ESXi host is not fully utilized.  As a best practice, you should configure your server BIOS settings to allow ESXi the most flexibility in using the power management features offered by your hardware, and make your power management choices within ESXi (next section).

On most systems, the default setting is BIOS-controlled power management. With that setting, ESXi won’t be able to manage power; instead it will be managed by the BIOS firmware.  The sections that follow describe how to change this setting to OS Control (recommended for most environments).

In certain cases, poor performance may be related to processor power management, implemented either by ESXi or by the server hardware.  Certain applications that are very sensitive to processing speed latencies may show less than expected performance when processor power management features are enabled. It may be necessary to turn off ESXi and server hardware power management features to achieve the best performance for such applications.  This setting is typically called Maximum Performance mode in the BIOS.

NOTE: Disabling power management usually results in more power being consumed by the system, especially when it is lightly loaded. The majority of applications benefit from the power savings offered by power management, with little or no performance impact.

Bottom line: some form of power management is recommended, and should only be disabled if testing shows this is hurting your application performance.

For more details on how and what to configure, see this white paper: http://www.vmware.com/files/pdf/techpaper/hpm-perf-vsphere55.pdf


 

Configuring BIOS to OS Control mode (Dell example)

 

The screenshot above illustrates how an 11th Generation Dell PowerEdge server BIOS can be configured to allow the OS (ESXi) to control the CPU power-saving features directly:

For a Dell PowerEdge 12th Generation or newer server with UEFI (Unified Extensible Firmware Interface), review the System Profile modes in the System Setup > System BIOS settings. You see these options:

Choose Performance Per Watt (OS).

Next, you should verify the Power Management policy used by ESXi (see the next section).

 

 

Configuring BIOS to OS Control mode (HP example)

 

The screenshot above illustrates how a HP ProLiant server BIOS can be configured through the ROM-Based Setup Utility (RBSU).  The settings highlighted in red allow the OS (ESXi) to control some of the CPU power-saving features directly:

Next, you should verify the Power Management policy used by ESXi (see the next section).

 

 

Configuring BIOS to Maximum Performance mode (Dell example)

 

The screenshot above illustrates how an 11th Generation Dell PowerEdge server BIOS can be configured to disable power management:

For a Dell PowerEdge 12th Generation or newer server with UEFI, review the System Profile modes in the System Setup > System BIOS settings. You see these options:

Choose Performance to disable power management.

NOTE: Disabling power management usually results in more power being consumed by the system, especially when it is lightly loaded. The majority of applications benefit from the power savings offered by power management, with little or no performance impact. Therefore, if disabling power management does not realize any increased performance, VMware recommends that power management be re-enabled to reduce power consumption.

 

 

Configuring BIOS to Maximum Performance mode (HP example)

 

The screenshot above illustrates how to set the HP Power Profile mode in the server's RBSU to the Maximum Performance setting to disable power management:

NOTE: Disabling power management usually results in more power being consumed by the system, especially when it is lightly loaded. The majority of applications benefit from the power savings offered by power management, with little or no performance impact. Therefore, if disabling power management does not realize any increased performance, VMware recommends that power management be re-enabled to reduce power consumption.

 

 

Configuring BIOS Custom Settings (Advanced)

 

The screenshot above illustrates that if a Custom System Profile is selected, individual parameters are allowed to be modified.  Here are some examples of some of these settings; for more information, please consult your server's BIOS setup manual.

 

Configuring Host Power Management in ESXi


VMware ESXi includes a full range of host power management capabilities.  These can save power when an ESXi host is not fully utilized.  As a best practice, you should configure your server BIOS settings to allow ESXi the most flexibility in using the power management features offered by your hardware, and make your power management choices within ESXi.  These choices are described below.


 

Select Host Power Management Settings for esx-01a

 

  1. Select "esx-01a.corp.local"
  2. Select "Manage"
  3. Select "Settings"
  4. Select "Power Management" in the Hardware section (not under System)

 

 

Power Management Policies

 

On a physical host, the Power Management options could look like this (it may vary depending on the processors of the physical host).

Here you can see which ACPI states are presented to the host and which Power Management policy is currently active.  There are four Power Management policies available in ESXi 5.0, 5.1, 5.5, and 6.0, as well as ESXi/ESX 4.1:

  1. Click "Edit" to see the different options

NOTE: Due to the nature of this lab environment, we are not interacting directly with physical servers, so changing the Power Management policy will not have any noticeable effect.  Therefore, while the sections that follow will describe each Power Management policy, we won't actually change this setting.
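
For your own physical hosts, the active policy can also be read and changed from PowerCLI through the Power.CpuPolicy host advanced setting. The sketch below is a hedged example; it assumes that setting is exposed on your ESXi build, and running it against the nested lab hosts will have no noticeable effect.

# Hedged sketch: view and change the host power policy via Power.CpuPolicy.
$esxHost = Get-VMHost -Name esx-01a.corp.local
Get-AdvancedSetting -Entity $esxHost -Name "Power.CpuPolicy"
# Example: switch to High Performance. Other values include "Balanced",
# "Low Power", and "Custom".
Get-AdvancedSetting -Entity $esxHost -Name "Power.CpuPolicy" |
    Set-AdvancedSetting -Value "High Performance" -Confirm:$false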

 

 

High Performance

 

The High Performance power policy maximizes performance, and uses no power management features. It keeps CPUs in the highest P-state at all times. It uses only the top two C-states (running and halted), not any of the deep states (for example, C3 and C6 on the latest Intel processors). High performance was the default power policy for ESX/ESXi releases prior to 5.0.

 

 

Balanced (default)

 

The Balanced power policy is designed to reduce host power consumption while having little or no impact on performance. The balanced policy uses an algorithm that exploits the processor’s P-states. This is the default power policy since ESXi 5.0.  Beginning in ESXi 5.5, we now also use deep C-states (greater than C1) in the Balanced power policy. Formerly, when a CPU was idle, it would always enter C1. Now ESXi chooses a suitable deep C-state depending on its estimate of when the CPU will next need to wake up.

 

 

Low Power

 

The Low Power policy is designed to save substantially more power than the Balanced policy by making the P-state and C-state selection algorithms more aggressive, at the risk of reduced performance.

 

 

Custom

 

The Custom power policy starts out the same as Balanced, but allows individual parameters to be modified.

Click "Cancel" to exit.

The next step describes settings that control the Custom power policy.

 

 

Setting custom parameters

 

To configure the custom policy settings,

  1. Select Advanced System Settings (under the System section)
  2. Type "in custom policy" in the filter search bar (as shown above) to only show Custom policy settings.

The settings you can customize include:

 

Conclusion


This concludes Module 3: CPU Performance Feature: Power Policies.  We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.  Let's review some key takeaways and where to go from here.


 

Key takeaways

We hope that you now know how to change power policies, both at the server BIOS level and also within ESXi itself.

To summarize, here are some best practices around power management policies:

Depending on your applications and the level of utilization of your ESXi hosts, the correct power policy setting can have a significant impact on both performance and energy consumption. On modern hardware, it is possible to let ESXi control the power management features of the underlying platform. You can select one of the predefined policies or create your own custom policy.

Recent studies have shown that it is best to let ESXi control the power policy.  For more details, see the following references:

 

 

Next Steps

 

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents'  to quickly jump to a module in the manual.

 

Module 4: vSphere Fault Tolerance (FT) and Performance (30 minutes)

Introduction to vSphere Fault Tolerance


VMware vSphere Fault Tolerance (FT) is a pioneering component that provides continuous availability to applications, preventing downtime and data loss in the event of server failures. VMware Fault Tolerance, built using VMware vLockstep technology, provides operational continuity and high levels of uptime in VMware vSphere environments, with simplicity and at a low cost.

With vSphere 6, one of the key new features is support for up to 4 virtual CPUs (vCPUs) in FT virtual machines, also known as SMP FT. This is especially important for IT departments that may have limited clustering experience but don't want the downtime associated with a hardware failure.  Use vSphere FT as needed for applications that require continuous protection during critical times, such as quarter-end processing.


 

How does Fault Tolerance work?

 

VMware vSphere® Fault Tolerance (FT) provides continuous availability for applications in the event of server failures by creating a live "shadow instance" of a virtual machine that is always up-to-date with the primary virtual machine. In the event of a hardware outage, vSphere FT automatically triggers failover—ensuring zero downtime and preventing data loss.

After failover, vSphere FT automatically creates a new, secondary virtual machine to deliver continuous protection for the application.

 

 

FT Architecture

vSphere FT is made possible by four underlying technologies: storage, runtime state, network, and transparent failover.

Storage

vSphere FT ensures the storage of the primary and secondary virtual machines is always kept in sync.  Whenever vSphere FT protection begins, an initial synchronization of the VMDKs happens using a Storage vMotion to ensure the primary and secondary have the exact same disk state.

This initial Storage vMotion happens whenever FT is turned on, a failover occurs, or a powered-off FT virtual machine powers on.  The FT virtual machine is not considered FT-protected until the Storage vMotion completes.

After this initial synchronization, vSphere FT will mirror VMDK modifications between the primary and secondary over the FT network to ensure the storage of the replicas continues to be identical.

Runtime State

vSphere FT ensures the runtime state of the two replicas is always identical.  It does this by continuously capturing the active memory and precise execution state of the virtual machine, and rapidly transferring them over a high speed network, allowing the virtual machine to instantaneously switch from running on the primary ESXi host to the secondary ESXi host whenever a failure occurs.

Network

The networks being used by the virtual machine are also virtualized by the underlying ESXi host, ensuring that even after a failover, the virtual machine identity and network connections are preserved.  Similar to vSphere vMotion, vSphere FT manages the virtual MAC address as part of the process.  If the secondary virtual machine is activated, vSphere FT pings the network switch to ensure that it is aware of the new physical location of the virtual MAC address.  Since vSphere FT preserves the storage, the precise execution state, the network identity, and the active network connections, the result is zero downtime and no disruption to users should an ESXi server failure occur.

Transparent Failover

If a failover occurs, vSphere FT ensures that the new primary always agrees with the old primary about the state of the virtual machine.  This is achieved by holding and only releasing externally visible output from the virtual machine once an acknowledgment is made from the secondary affirming that the state of the two virtual machines is consistent (for the purposes of vSphere FT, externally visible output is network transmissions).

 

 

Benefits of FT

 

vSphere FT offers the following benefits:

 

 

What's new in vSphere 6.0 FT

 

With vSphere 6.0, the new Multi-Processor FT (SMP-FT) implementation now brings continuous availability protection for VMs with up to 4 vCPUs.  In addition, there are several other new features:

Note that there are some differences between the vSphere editions: Standard and Enterprise support 2 vCPU FT, while Enterprise Plus raises this to 4 vCPU support.

(credit: http://vinfrastructure.it/2015/02/vmware-vsphere-6-the-new-ft-feature/)

 

 

Considerations for vCenter Server with FT

 

When virtualizing vCenter Server, technologies such as vSphere FT can help protect the vCenter management server from hardware failures.

Compared to vSphere HA, vSphere FT can provide instantaneous protection, but the following limitations must be considered:

Because vSphere FT is suitable for workloads with a maximum of four vCPUs and 64GB of memory, it can be used in “tiny” and “small” vCenter Server deployments.

 

 

Prerequisites for FT

 

All hosts with vSphere FT enabled require a dedicated, low-latency 10Gbps VMkernel interface for vSphere FT logging traffic.
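
To check this prerequisite from the command line, here is a hedged PowerCLI sketch (not a lab step) that lists which VMkernel adapters in the cluster have Fault Tolerance logging enabled; the cluster and interface names come from this lab and are otherwise illustrative.

# Show FT logging status for every VMkernel adapter in the cluster
Get-Cluster "RegionA01-COMP01" | Get-VMHost | ForEach-Object {
    Get-VMHostNetworkAdapter -VMHost $_ -VMKernel |
        Select-Object VMHost, Name, IP, FaultToleranceLoggingEnabled
}

# Example only: enable FT logging on an existing VMkernel adapter (vmk1 assumed)
# Get-VMHostNetworkAdapter -VMHost "esx-01a.corp.local" -Name "vmk1" -VMKernel |
#     Set-VMHostNetworkAdapter -FaultToleranceLoggingEnabled:$true -Confirm:$false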

The option to turn on vSphere FT is unavailable (dimmed) if any of these conditions apply:

Next, we will go through an example of configuring a VM for vSphere FT.

 

 

For users with non-US Keyboards

 

If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop.

 

 

On-Screen Keyboard

 

Another option, if you are having issues with the keyboard, is to use the On-Screen Keyboard.  

To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

 

 

Getting Back on Track

 

If, for any reason, you make a mistake or the lab breaks, perform the following actions to get back on track, and restart the current module from the beginning.  

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell prompt.

 

 

Resetting VMs to Restart Module

 

From the PowerCLI prompt, type:    

.\StopLabVMs.ps1

and press Enter.

The script will stop all running VMs and reset their settings, and you can restart the module.

 

 

Start this Module

 

Let's start this module.

Launch Chrome from the shortcut in the Taskbar.

 

 

Login to vSphere

 

Log into vSphere. The vSphere Web Client should be the default home page.

If, for some reason, that does not work, uncheck the box and use these credentials:

User name: CORP\Administrator
Password: VMware1!

 

 

Refresh the UI

 

In order to reduce the amount of manual input in this lab, a lot of tasks are automated using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the actual state of the inventory immediately after a script has run.

If you need to manually refresh the inventory, click the Refresh icon in the top of the vSphere Web Client.

 

 

Select Hosts and Clusters

 

 

vSphere HTML5 Web Client


During this module you have the option of utilizing the HTML5 version of the Web Client.  This version is designed to be more responsive and stable, while still providing security.  

At this point, not all features are available in the Fling release.  The most common actions/features are the following:

If you want to run this in your own environment you can reference the following link: https://labs.vmware.com/flings/vsphere-html5-web-client

Keep in mind that the majority of the actions in this module won’t work in the HTML5 version. The recommendation here is to have both Flash and HTML5 instances open at the same time and toggle between them.

The HTML5 Web Client is located in the bookmark section labeled “Region A HTML5 client”.


 

HTML5 Web Client Link

 

 

 

 

What is a Fling?

From the Flings website: https://labs.vmware.com/about

Our engineers work on tons of pet projects in their spare time, and are always looking to get feedback on their projects (or “flings”). Why flings? A fling is a short-term thing, not a serious relationship but a fun one. Likewise, the tools that are offered here are intended to be played with and explored. None of them are guaranteed to become part of any future product offering and there is no support for them. They are, however, totally free for you to download and play around with them!

 

 

Configure Lab for Fault Tolerance


As it leverages existing vSphere HA clusters, vSphere FT can safeguard any number of virtual machines in a cluster.  Administrators can turn vSphere FT on or off for specific virtual machines with a point-and-click action in the vSphere Web Client.

To see how this works from a "functional" perspective, we will use our lab environment, which is a nested ESXi environment. SMP-FT no longer uses the "record/replay" capability of its predecessor, Uniprocessor Fault Tolerance (UP-FT). Instead, SMP-FT uses a new Fast Checkpointing technique, which not only improves overall performance compared to its predecessor but also greatly simplifies and reduces the additional configuration needed when running in a Nested ESXi environment like this Hands-On Lab.

NOTE: Running SMP-FT in a Nested ESXi environment does not replace or substitute actual testing of physical hardware. For any type of performance testing, please test SMP-FT using real hardware.


 

Open a PowerCLI window

 

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command prompt.

 

 

Start Fault Tolerance configuration

 

From the PowerCLI Console, type:  

.\StartFaultToleranceLab.ps1

and press Enter.

 

 

Edit Cluster's vSphere HA Settings

 

  1. Click RegionA01-COMP01 from the Hosts and Clusters list on the left
  2. Click Manage
  3. Click Settings
  4. Click vSphere HA (NOTE: you should see the message "vSphere HA is Turned OFF" as shown above)
  5. Click Edit...

We will now enable vSphere HA for the cluster.

 

 

Enable vSphere HA

 

  1. Check Turn on vSphere HA
  2. Check Host Monitoring
  3. Set Virtual Machine Monitoring to Disabled
  4. Click Admission Control to configure this option.  We will set this in the next step.

 

 

Enable vSphere HA (continued)

 

  1. Under Admission Control, scroll to the bottom and select the last radio button, Do not reserve failover capacity.  This is necessary in our lab environment since not all HA constraints may necessarily be guaranteed.
  2. Click OK.

vSphere HA will now be enabled to reduce downtime.  Again, this is a prerequisite for Fault Tolerance.

 

 

Verify vSphere HA is Enabled

 

  1. Now that vSphere HA has been enabled, click the Refresh icon at the top of the vSphere Web Client to ensure this is reflected in the UI.
  2. Click on vSphere HA Monitoring.

 

 

Review vSphere HA Monitoring page

 

Review the vSphere HA tab under the Monitor section.  You should see a screen similar to the above.

  1. To verify there are no issues, click Configuration Issues.

 

 

Review vSphere HA Configuration Issues

 

Review this Configuration Issues page.  This list will be empty if vSphere HA was enabled without issues.

 

 

Turn On vSphere FT for perf-worker-02a

 

  1. Left-click perf-worker-02a from the list of virtual machines on the left
  2. Click the Actions dropdown in the upper right pane
  3. Hover over the Fault Tolerance menu, and choose Turn On Fault Tolerance

This will pop up a Fault Tolerance configuration screen, which is shown next.

 

 

Select datastore for vSphere FT (1/2)

 

The first step is to select the datastores for the secondary VM configuration file, tie breaker file, and virtual hard disk.

  1. Click Browse...
  2. Click Browse... again from the dropdown.

A list of datastores will pop up next.

 

 

Select datastore for vSphere FT (2/2)

 

Click the only datastore we have in this environment (RegionA01-ISCSI02-COMP01), then click OK.

Repeat the previous step and this step for all three files (Configuration File, Tie Breaker File, and Hard disk 1).

 

 

Ensure compatibility checks succeeded

 

After selecting the datastore for the secondary files, you should see a screen like the one above, with a green checkbox that says "Compatibility checks succeeded."

Click Next.

 

 

Select host esx-02a for the secondary VM

 

We now need to select where to host the secondary VM.

  1. perf-worker-02a is already running on esx-01a, so we should select the other host in the cluster; click esx-02a.corp.local.
  2. You'll see a warning that we will be using the same datastore (RegionA01-ISCSI02-COMP01) for both the primary and secondary VMs' disks.  While not recommended for production, this is only a demonstration lab environment.  Click Next to continue.

 

 

 

Review selections and finalize enabling vSphere FT

 

Ensure your selections match the screenshot above, and click Finish to turn on fault tolerance for perf-worker-02a.

 

 

Power on perf-worker-02a

 

  1. Left-click perf-worker-02a from the list of virtual machines on the left.  Note the slightly darker blue color for the VM, which indicates it is now fault-tolerant.
  2. Click the Actions link in the upper right pane
  3. Hover over the Power menu, and choose Power On

This will power on the perf-worker-02a VM, and start the procedure to make it Fault Tolerant.

 

 

Monitor the Fault Tolerance secondary VM creation

 

  1. Click the vSphere Web Client text in the upper left-hand corner to return to the home screen
  2. Click Tasks on the left-hand pane
  3. Click the Refresh icon periodically and monitor the Task Console Progress until you see it has Completed

Step 3 could take up to 5-10 minutes to complete.  Once you see that all tasks have Completed, continue onto the next step.

 

 

Select the Fault Tolerant VM

 

Once the secondary VM has been created in the previous step, click on the perf-worker-02a VM to switch the view back to our fault-tolerant VM.

 

 

Verify the VM is Fault Tolerant

 

From the perf-worker-02a VM view, we can verify that the VM is now Fault Tolerant in a couple of ways:

Additionally, we could induce a failover from esx-01a to esx-02a by selecting Actions, Fault Tolerance, Test Failover.  However, this is time-consuming and resource-intensive, as it not only makes esx-02a the new Primary VM location, but also makes esx-01a the new Secondary VM location.
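
If you prefer the command line, here is a hedged PowerCLI sketch that reads the VM's Fault Tolerance state directly from the vSphere API; a value of "running" indicates the secondary VM is in sync.

$vm = Get-VM "perf-worker-02a"
# Possible values include notConfigured, enabled, needSecondary, starting, running
$vm.ExtensionData.Runtime.FaultToleranceState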

 

 

Turn Off vSphere FT for perf-worker-02a

 

Now let's reverse the process (disable Fault Tolerance) to clean up the environment:

  1. Click the Actions dropdown in the upper right pane
  2. Hover over the Fault Tolerance menu
  3. Click Turn Off Fault Tolerance

This will pop up a dialog box asking you to confirm; click Yes.

This will unregister, power off, and delete the secondary VM.

 

 

Select Hosts and Clusters

 

We need to reselect the cluster to remove vSphere HA:

  1. Get to the Home screen by clicking vSphere Web Client in the upper left
  2. Click Hosts and Clusters.

 

 

Edit Cluster's vSphere HA Settings

 

  1. Click RegionA01-COMP01 from the Hosts and Clusters list on the left
  2. Click Manage
  3. Click Settings
  4. Click vSphere HA (NOTE: you should see the message "vSphere HA is Turned ON" as shown above)
  5. Click Edit...

We will now disable vSphere HA for the cluster.

 

 

Disable vSphere HA

 

  1. Uncheck Turn on vSphere HA
  2. Click OK

 

 

Shut down perf-worker-02a

 

  1. Left-click perf-worker-02a from the list of virtual machines on the left.  Note the VM is no longer dark blue, since it is no longer fault-tolerant.
  2. Click the Actions link in the upper right pane
  3. Hover over the Power menu, and choose Shut Down Guest OS
  4. Hit Yes to confirm

This will shut down the perf-worker-02a VM.

 

Fault Tolerance Performance


Because the hands-on lab environment is shared, it is not feasible to run benchmarks that could saturate the environment (there are other users taking this and other labs, too :-).

Therefore, this section will show some results from a FT performance whitepaper, which used a variety of micro-benchmarks and real-life workloads.

Identical hardware test beds were used for all the experiments, and the performance comparison was done by running the same workload on the same virtual machine with and without FT enabled.


 

Kernel Compile

 

This experiment shows the time taken to do a parallel compile of the Linux kernel.  This is both a CPU- and MMU-intensive workload due to the forking of many parallel processes. The CPU is 100 percent utilized. This workload does some disk reads and writes, but generates no network traffic.

As shown in the figure above, FT protection increases the kernel compile time a small amount -- about 7 seconds.

 

 

Network Throughput (Netperf)

 

Netperf is a micro-benchmark that measures the performance of sending and receiving network packets.  Netperf was configured with several scenarios and network speeds to demonstrate the throughput and latency performance of TCP/IP under FT.

This netperf experiment measures unidirectional TCP/IP throughput.  One experiment is done in each direction, when the virtual machine is either receiving or transmitting data.  The speed of the virtual machine network is an important factor for performance; the experiment shown above was on a 1 Gbps uplink.

The throughput experiments reveal some important points about performance under FT protection:

Receive-heavy workloads tend to increase FT traffic due to the requirement to keep the replicas in sync.  The influx of data into the primary causes large differences between the replicas, and thus requires more data to be sent over the FT network.  Transmit-heavy workloads, on the other hand, cause very few differences between the replicas, and thus very little FT traffic.  Therefore, transmit-heavy applications, such as web servers or read-bound databases, tend to have lower FT traffic requirements.

 

 

Network Latency (Netperf)

 

Another aspect of networking performance to consider is latency. Fault tolerance introduces some delay to network output (measurable only in milliseconds, as shown above). The latency occurs because the Primary VM must wait until the Secondary is in an identical state before transmitting network packets.

In this experiment, netperf is run with the TCP_RR configuration (single stream, no parallel streams) and the round-trip latency is reported here (it is the inverse of the round-trip transaction rate).  TCP_RR is a pure latency benchmark: the sender transmits a 1-byte message and blocks while waiting for a 1-byte response; the benchmark counts the number of serial transactions completed per unit time and has no parallelization.

In a pure latency benchmark, latency increases are proportional to throughput decreases.  Normal server applications are not pure latency benchmarks; applications handle multiple connections at a time, and each connection will transmit multiple packets of data before pausing to hear a response.  The result is that real-world applications can tolerate network latencies without dropping throughput.  The previous netperf throughput experiment is an example of this, and the client/server workloads shown here demonstrate the same.

One aspect not measured by netperf is jitter and latency fluctuation.  FT-protected virtual machines can vary widely in latencies depending on the workload, and over time within a given workload.  This can cause significant jitter.  Highly latency-sensitive applications, such as high frequency trading (HFT), or some voice-over-IP (VOIP) applications may experience high overhead with FT. However, some voice applications, where the bulk data is carried by separate machines and only call management traffic is FT protected, would perform fine.

 

 

Iometer

 

Iometer is an I/O subsystem measurement and characterization tool for Microsoft Windows.  It is designed to produce a mix of operations to stress the disk.  This benchmark ran random I/Os of various types.  The bar charts above show that the FT-protected VM achieves nearly the same throughput as the non-protected VM.

 

 

Swingbench with Oracle 11g

 

In this experiment, an Oracle 11g database was driven using the Swingbench 2.2 order entry online transaction processing (OLTP) workload. This workload has a mixture of CPU, memory, disk, and network resource requirements.  The FT-protected virtual machine is able to achieve nearly the same throughput as the non-FT virtual machine (top chart).

The latency of basic operations is higher under FT protection (bottom chart), but it is still within an acceptable user threshold of milliseconds.

 

Conclusion


All Fault Tolerance solutions rely on redundancy.  That means a certain cost must be paid to establish replicas and keep them in sync.  These costs come in the form of CPU, storage, and network overheads. For a variety of workloads, CPU and storage overheads are generally modest or minimal with FT protection. The most noticeable overhead for FT-protected virtual machines is the increase in latency for network packets.  However, the experiments performed have shown that FT-protected workloads can achieve good application throughput despite an increase in network latency; network latency does not dictate overall application throughput for a wide variety of applications. On the other hand, applications that are sensitive to network latency (such as high frequency trading or realtime workloads) will pay a higher cost under FT protection.

VMware vSphere Fault Tolerance is a revolutionary new technology. It universally applies the basic principles and guarantees of fault-tolerant technology to any multi-vCPU workload in a uniquely simple-to-use way. The vSphere FT solution is able to achieve good throughput for a wide variety of applications.


Module 5: Memory Performance, Basic Concepts and Troubleshooting (30 minutes)

Introduction to Memory Performance Troubleshooting


The goal of this module is to expose you to a memory performance problem in a virtualized environment as an example. It will also guide you on how to quickly identify performance problems by checking various performance metrics and settings.

Host memory is a limited resource. VMware vSphere incorporates sophisticated mechanisms that maximize the use of available memory through page sharing, resource-allocation controls, and other memory management techniques. However, several vSphere memory overcommitment techniques only come into play when the host is under memory pressure (in other words, when the sum of all the VMs' configured memory exceeds the physical memory of the host running them).
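
As a quick way to see whether a host is overcommitted on memory, here is a hedged PowerCLI sketch (not a lab step) that compares the memory configured for powered-on VMs on esx-01a with the host's physical memory:

# Sum the configured memory of powered-on VMs and compare with host RAM
$vmhost  = Get-VMHost "esx-01a.corp.local"
$vmMemGB = (Get-VM -Location $vmhost |
            Where-Object { $_.PowerState -eq "PoweredOn" } |
            Measure-Object -Property MemoryGB -Sum).Sum
"Powered-on VM memory: {0:N1} GB; host physical memory: {1:N1} GB" -f $vmMemGB, $vmhost.MemoryTotalGB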

This module will discuss:

This test demonstrates Memory Demand vs. Consumed Memory in a vSphere environment.  It also demonstrates how memory overcommitment impacts host and VM performance.

The first step is to prepare the environment for the demonstration.


 

For users with non-US Keyboards

 

If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop.

 

 

On-Screen Keyboard

 

Another option, if you are having issues with the keyboard, is to use the On-Screen Keyboard.  

To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

 

 

Getting Back on Track

 

If, for any reason, you make a mistake or the lab breaks, perform the following actions to get back on track, and restart the current module from the beginning.  

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell prompt.

 

 

Resetting VMs to Restart Module

 

From the PowerCLI prompt, type:    

.\StopLabVMs.ps1

and press Enter.

The script will stop all running VMs and reset their settings, and you can restart the module.

 

 

Start this Module

 

Let's start this module.

Launch Chrome from the shortcut in the Taskbar.

 

 

Login to vSphere

 

Log into vSphere. The vSphere Web Client should be the default home page.

If, for some reason, that does not work, uncheck the box and use these credentials:

User name: CORP\Administrator
Password: VMware1!

 

 

Refresh the UI

 

In order to reduce the amount of manual input in this lab, a lot of tasks are automated using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the actual state of the inventory immediately after a script has run.

If you need to manually refresh the inventory, click the Refresh icon in the top of the vSphere Web Client.

 

 

Select Hosts and Clusters

 

 

vSphere HTML5 Web Client


During this module you have the option of utilizing the HTML5 version of the Web Client.  This version is designed to be more responsive and stable, while still providing security.  

At this point, not all features are available in the Fling release.  The most common actions/features are the following:

If you want to run this in your own environment you can reference the following link: https://labs.vmware.com/flings/vsphere-html5-web-client

Keep in mind that the majority of the actions in this module won’t work in the HTML5 version. The recommendation here is to have both Flash and HTML5 instances open at the same time and toggle between them.

The HTML5 Web Client is located in the bookmark section labeled “Region A HTML5 client”.


 

HTML5 Web Client Link

 

 

 

 

What is a Fling?

From the Flings website: https://labs.vmware.com/about

Our engineers work on tons of pet projects in their spare time, and are always looking to get feedback on their projects (or “flings”). Why flings? A fling is a short-term thing, not a serious relationship but a fun one. Likewise, the tools that are offered here are intended to be played with and explored. None of them are guaranteed to become part of any future product offering and there is no support for them. They are, however, totally free for you to download and play around with them!

 

Memory Resource Control



 

Open a PowerCLI window

 

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command prompt.

 

 

Start Memory workload

 

From the PowerCLI Console, type:  

.\StartMemoryTest.ps1

press Enter.  This script will configure and start up two VMs, and generate a memory workload.

NOTE: Please wait a couple of minutes, and do not proceed with the lab until you see output as shown in the next step.

 

 

Memory activity benchmark

 

Two windows showing a memory performance benchmark are launched. We need these to generate a workload that we can inspect. We will return to them shortly.

The actual performance numbers will vary from environment to environment.

 

 

Select perf-worker-02a

 

Return to the vSphere Web Client.

  1. Select perf-worker-02a

 

 

Monitor perf-worker-02a Utilization metrics

 

  1. Select the Monitor tab.
  2. Select Utilization

You can see that perf-worker-02a and perf-worker-03a virtual machines are configured with 1.5GB of memory and are running on the ESXi host esx-01a. If you wait for a while, the memory consumption of the virtual machines will look something like the above screenshot. The ESXi host has 8GB of memory, so there is no memory contention at this time.

A host determines allocations for each VM based on the number of shares allocated to it and an estimate of its recent working set size (shown as Active Guest Memory above):

This approach ensures that a virtual machine from which idle memory is reclaimed can ramp up quickly to its full share-based allocation when it starts using its memory more actively.

By default, active memory is estimated once every 60 seconds. To modify this, adjust the Mem.SamplePeriod advanced setting.
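
The same setting can be read or changed with PowerCLI; the following is a hedged sketch for reference (there is no need to change it in this lab):

# Read the sampling period used to estimate active (working set) memory
$vmhost = Get-VMHost "esx-01a.corp.local"
Get-AdvancedSetting -Entity $vmhost -Name "Mem.SamplePeriod"

# Example only: shorten the sampling period to 30 seconds
# Get-AdvancedSetting -Entity $vmhost -Name "Mem.SamplePeriod" |
#     Set-AdvancedSetting -Value 30 -Confirm:$false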

 

 

Select esx-01a Host

 

  1. Select esx-01a.corp.local

 

 

View the ESX hosts memory metrics

 

  1. Select "Monitor"
  2. Select "Performance"
  3. Select "Advanced"
  4. Select the "Memory" view

Consumed memory on the host is around 4GB, but active memory is less than 3GB. Notice that there is no memory contention, as the host has 8GB of memory.

In late 2014, VMware announced that ESXi would no longer have TPS (Transparent Page Sharing) between VMs enabled by default in future releases, although TPS is still available. For more information, see KB: http://kb.vmware.com/kb/2080735

Transparent page sharing is a method by which redundant copies of memory pages are eliminated (deduplicated). Until late 2014, TPS ran by default. However, even when TPS is enabled, on modern systems with hardware-assisted memory virtualization, vSphere will preferentially back guest physical pages with large host physical pages (2MB contiguous memory regions instead of regular 4KB pages) for better performance. vSphere will not attempt to share large physical pages, because the probability of finding two large pages that are identical is very low. If memory pressure occurs on the host, vSphere may break the large memory pages into regular 4KB pages, which TPS can then use to consolidate memory on the host.

In vSphere 6, TPS has been enhanced to support different levels of page sharing, such as intra-VM sharing and inter-VM sharing. See this article for more information: http://kb.vmware.com/kb/2097593
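
If you want to check how page sharing is configured on a host, here is a hedged PowerCLI sketch based on the KB article above; the Mem.ShareForceSalting setting may not exist on older builds.

# Per KB 2097593: 2 (default on recent builds) = intra-VM sharing only,
# 0 = classic inter-VM sharing, 1 = sharing governed by a per-VM salt value
$vmhost = Get-VMHost "esx-01a.corp.local"
Get-AdvancedSetting -Entity $vmhost -Name "Mem.ShareForceSalting"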

 

 

Observe Application Performance

 

As there is no memory pressure in the environment, the virtual machine performance is good. The virtual machines are configured identically, and therefore their performance numbers are very similar. The numbers will bounce a bit up and down due to the design of the lab.

 

 

Power on the Memory Hog virtual machine

 

Select and Right click on perf-worker-04a.

  1. Right click "perf-worker-04a"
  2. Select "Power"
  3. Click "Power On"

Perf-worker-04a has been configured to boot up as a VM that consumes a lot of memory, a memory hog. The memory hog virtual machine is configured with 5.5GB of memory, and will consume all the free memory of the host and cause memory contention.

While the memory hog powers on, keep an eye on the benchmark scores for the memory performance on perf-worker-02a and perf-worker-03a. They will take a large dip in performance as the memory pressure increases and vSphere has to stabilize the environment.

 

 

Review the Resource Allocation for the virtual machines

 

1. Select "perf-worker-02a"

2. Select "Monitor"

3. Select "Utilization"

Now that memory pressure is occurring in the system, vSphere will begin to use memory overcommit techniques to conserve memory use.

It may take a while for vCenter to update the memory utilization statistics, so you might have to wait. (Try to refresh if nothing happens)

Notice that vSphere has used some memory overcommit techniques on the perf-worker virtual machines to relieve the memory pressure. Notice that consumed memory for the virtual machines is now lower than before we applied memory pressure. As long as the Active memory that the virtual machines require stays in physical memory, the applications will perform well.

 

 

Select esx-01a.corp.local

 

  1. Select esx-01a.corp.local

 

 

Review the ESX host memory metrics

 

Review the ESX host memory metrics now that we have powered on the Memory Hog.

  1. Select Monitor
  2. Select Performance
  3. Select Advanced
  4. Select Memory

Notice that Granted and Consumed are very close to the full memory size of the ESXi host (8GB); Active has also increased, but is still lower than Consumed. You can also see how swapping and ballooning started when we increased the memory pressure on the host. Also notice that Swap used is relatively low. Any active swapping is a performance concern, but relying on this metric alone can be misleading. To tell more accurately whether swapping is affecting performance, you need to look at the Swap in rate, available from the Chart Options screen, which we will look at next. Any non-trivial Swap in rate would likely indicate a performance problem.
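
The same counter can also be pulled from the command line. Here is a hedged PowerCLI sketch; the realtime statistic name is assumed to be mem.swapinRate.average (reported in KBps).

# Pull the last ten realtime samples of the host's swap-in rate
$vmhost = Get-VMHost "esx-01a.corp.local"
Get-Stat -Entity $vmhost -Stat "mem.swapinRate.average" -Realtime -MaxSamples 10 |
    Select-Object Timestamp, Value, Unit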

 

 

Select Chart Options

 

Let's investigate the Swap in rate:

 

 

Select "Swap in rate" counter

 

  1. Scroll down to find "Swap in rate"
  2. Select "Swap in rate"
  3. Click "OK"

 

 

Monitor "Swap in rate" graph

 

Swap in rate is the highlighted purple graph. Note that it is different from the previous chart.

You don't have to wait for the graph to progress as far as illustrated above. Just let it run and come back to check the result later, before stopping perf-worker-04a in a later step.

Overallocating memory tends to be fine for most applications and environments. It is generally safe to have around 20% memory over-allocation, so for initial sizing start with no more than 20% over-allocation, then increase or decrease it after monitoring application performance and ensuring that the over-allocation does not cause a sustained Swap in rate. This also depends on whether you are using transparent page sharing.

As you can see, there is a significant amount of memory swap in occurring on the ESXi host.

Continue to see how that has impacted the memory performance measured.

 

 

Monitor Memory Performance under Contention

 

Now that we have applied memory contention to esx-01a.corp.local, the memory performance has dropped significantly. The virtual machines are performing fairly identically.

If the performance numbers are still high, wait a couple of minutes, and they will drop significantly.

If you monitor the benchmarks for a longer period of time, you will see the performance fluctuate. This is due to the benchmark keeping memory active in the virtual machines, which causes waves of swap-in and swap-out and makes the performance numbers shift. When ESXi has had time to optimize memory access between the powered-on virtual machines, application performance will start to increase, potentially up to the same level as before there was any memory contention. It all depends on the level of memory contention, the level of memory activity, and the memory overcommit techniques available.

Let's try and change the priority of the virtual machines access to memory.

 

 

Edit perf-worker-03a

 

  1. Select perf-worker-03a and right click
  2. Click Edit Settings...

 

 

Change Memory Shares to High

 

  1. Select "Virtual Hardware"
  2. Expand "Memory"
  3. Under shares, select "High"
  4. Click "OK"

 

 

Monitor Memory Performance with High Shares

 

Wait for a couple of minutes and see how the performance of perf-worker-03a starts to increase.

Now that we have doubled the amount of memory shares assigned to perf-worker-03a, the virtual machine is being prioritized over perf-worker-02a and perf-worker-04a. This results in increased memory performance of perf-worker-03a.

Shares are a way of influencing how access to a resource is prioritized between virtual machines, but only under resource contention. They will not increase VM performance in an underutilized environment.
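
For reference, the same change can be made with PowerCLI; this is a hedged sketch of the equivalent command, not a required lab step.

# Raise the memory shares level of perf-worker-03a from Normal to High
Get-VM "perf-worker-03a" | Get-VMResourceConfiguration |
    Set-VMResourceConfiguration -MemSharesLevel High

# Example only: specify an explicit custom share value instead of a preset level
# Get-VM "perf-worker-03a" | Get-VMResourceConfiguration |
#     Set-VMResourceConfiguration -NumMemShares 30720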

Let's try and remove the memory contention, but keep the high amount of shares assigned to perf-worker-03a.

 

 

Power off perf-worker-04a

 

  1. Right click "perf-worker-04a"
  2. Select "Power"
  3. Click "Power Off"

 

 

Confirm Power Off

 

 

 

Observe memory performance

 

Now monitor the application performance for a couple of minutes. Since we powered off the memory hog VM, we no longer have memory contention, and the memory performance returns to the same level as before we powered on the memory hog. So now the amount of shares assigned to perf-worker-03a is irrelevant.

 

 

Close the benchmark windows

 

  1. Close the two performance counters

 

Conclusion and Clean-Up



 

Clean up procedure

In order to free up resources for the remaining parts of this lab, we need to shut down the virtual machines used in this module and reset their configuration.

 

 

Launch PowerCLI

 

If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt.

 

 

Power off and Reset VMs

 

In the PowerCLI console, type:    

.\StopLabVMs.ps1

and press Enter.

The script will now stop all running VMs and reset their settings.

 

 

Close PowerCLI window

 

Close the PowerCLI Window  

You can now move on to another module.

 

 

Key takeaways

During this lab we saw how memory overcommitment affects performance, and how vSphere can use several techniques to reduce the impact of memory overcommit. We also touched on how it is possible to adjust whether ESXi uses TPS intra-VM, inter-VM, or not at all, depending on how you evaluate the security aspects of TPS. Even though the memory overcommit techniques in ESXi can compensate for some degree of memory overcommit, it is still recommended to rightsize the configuration of a virtual machine if possible. For inspiration on how to rightsize the resource configuration of virtual machines, take a look at HOL-SDC-1610 Module 5. There you will use vRealize Operations Manager to identify resource stress and perform right-sizing.

 

 

Conclusion

 

This concludes Module 5, Memory Performance, Basic Concepts and Troubleshooting. We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents'  to quickly jump to a module in the manual.

 

Module 6: Memory Performance Feature: vNUMA with Memory Hot Add (30 minutes)

Introduction to NUMA and vNUMA


Since 5.0, vSphere has had the vNUMA feature that presents the physical NUMA topology to the guest operating system. Traditionally virtual machines have been presented with a single NUMA node, regardless of the size of the virtual machine, and regardless of the underlying hardware. Larger and larger workloads are being virtualized, and it has become increasingly important that the guest OS and applications can make decisions on where to execute application processes and where to place specific application memory. ESXi is NUMA aware, and will always try to fit a VM within a single NUMA node when possible. With the emergence of the "Monster VM" this is not always possible.

Note that because we are working in a fully virtualized environment, we have to manually enforce the NUMA architecture presented to a VM. In a real environment, it would be possible to see the physical architecture. The purpose of this module is to gain an understanding of how vNUMA works, both by itself and in combination with the cores per socket feature.


 

NUMA

 

Non-Uniform Memory Access (NUMA) system architecture

Each node consists of CPU cores and memory. A pCPU can access memory across NUMA nodes, but at a performance cost: remote memory access time can be 30% to 100% longer than local access.

 

 

 

Without vNUMA

 

In this example, a VM with 12 vCPUs is running on a host with four NUMA nodes of 6 cores each. This VM is not being presented with the physical NUMA configuration, and hence the guest OS and application see only a single NUMA node. This means that the guest has no chance of placing processes and memory within a physical NUMA node.

We have poor memory locality.

 

 

With vNUMA

 

In this example, a VM with 12 vCPUs is running on a host that has four NUMA nodes of 6 cores each. This VM is being presented with the physical NUMA configuration, and hence the guest OS and application see two NUMA nodes. This means that the guest can place processes and accompanying memory within a physical NUMA node when possible.

We have good memory locality.

 

 

For users with non-US Keyboards

 

If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop.

 

 

On-Screen Keyboard

 

Another option, if you are having issues with the keyboard, is to use the On-Screen Keyboard.  

To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

 

 

Getting Back on Track

 

If, for any reason, you make a mistake or the lab breaks, perform the following actions to get back on track, and restart the current module from the beginning.  

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell prompt.

 

 

Resetting VMs to Restart Module

 

From the PowerCLI prompt, type:    

.\StopLabVMs.ps1

and press Enter.

The script will stop all running VMs and reset their settings, and you can restart the module.

 

 

Start this Module

 

Let's start this module.

Launch Chrome from the shortcut in the Taskbar.

 

 

Login to vSphere

 

Log into vSphere. The vSphere Web Client should be the default home page.

If, for some reason, that does not work, uncheck the box and use these credentials:

User name: CORP\Administrator
Password: VMware1!

 

 

Refresh the UI

 

In order to reduce the amount of manual input in this lab, a lot of tasks are automated using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the actual state of the inventory immediately after a script has run.

If you need to manually refresh the inventory, click the Refresh icon in the top of the vSphere Web Client.

 

 

Select Hosts and Clusters

 

 

vSphere HTML5 Web Client


During this module you have the option of utilizing the HTML5 version of the Web Client.  This version is designed to be more responsive and stable, while still providing security.  

At this point, not all features are available in the Fling release.  The most common actions/features are the following:

If you want to run this in your own environment you can reference the following link: https://labs.vmware.com/flings/vsphere-html5-web-client

Keep in mind that the majority of the actions in this module won’t work in the HTML5 version. The recommendation here is to have both Flash and HTML5 instances open at the same time and toggle between them.

The HTML5 Web Client is located in the bookmark section labeled “Region A HTML5 client”.


 

HTML5 Web Client Link

 

 

 

 

What is a Fling?

From the Flings website: https://labs.vmware.com/about

Our engineers work on tons of pet projects in their spare time, and are always looking to get feedback on their projects (or “flings”). Why flings? A fling is a short-term thing, not a serious relationship but a fun one. Likewise, the tools that are offered here are intended to be played with and explored. None of them are guaranteed to become part of any future product offering and there is no support for them. They are, however, totally free for you to download and play around with them!

 

vNUMA vs. Cores per Socket


Besides the possibility of presenting a virtual NUMA architecture to a virtual machine, it is also possible to alter the number of cores per socket for a virtual machine. This feature controls how virtual CPUs are presented to a guest OS, essentially allowing the guest OS to "see" multi-core CPUs, since by default VMware presents multiple single-core CPUs.

In general, it's best to stick with the default (1 core per virtual socket) and simply set the number of vCPUs as large as the workload needs; there is no performance gain to be realized from presenting virtual multi-core CPUs. The primary use for this feature is licensing, where an application may require fewer virtual sockets. In that case, the optimal vNUMA size should be determined first, and the number of virtual sockets should be set to match it.
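
For reference, cores per socket can also be set programmatically. The following is a hedged PowerCLI sketch using the vSphere API's NumCoresPerSocket field; the VM must be powered off (as in the Web Client steps later in this module), and the VM name matches the one used later in this module.

# Reconfigure a powered-off VM to present 2 cores per virtual socket
$vm   = Get-VM "perf-worker-01a"
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.NumCoresPerSocket = 2
$vm.ExtensionData.ReconfigVM($spec)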


 

Open a PowerCLI window

 

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command prompt.

 

 

Start the vNUMA Script

 

From the PowerCLI Console, type:  

.\StartvNUMA.ps1

and press Enter.

The script will configure and start up a VM with four vCPUs and memory hot-add enabled.

The script will then launch a Remote Desktop session to the perf-worker-01a VM.

 

 

Open a Command Prompt on perf-worker-01a

 

In the Remote Desktop window to perf-worker-01a, launch cmd.exe:

  1. Click the Start button
  2. Click cmd.exe

Note: Make sure you do this inside of the Remote Desktop Connection window (as shown above).

 

 

Observe default Cores/Sockets and NUMA architecture

 

Type in the following command, followed by enter:

coreinfo -c -n -s

From the output you can see that the VM is presented with:

  1. 4 Physical Processors (actually, virtual :-) which equals 4 cores (since it is configured with the default 1 core per virtual socket)
  2. 4 Logical Processors, one per physical socket
  3. 1 single NUMA node

Now let's see what impact changing the cores per socket has on this output.

 

 

Shut down perf-worker-01a

 

Back in the vSphere Web Client on the Main Console:

  1. Right click perf-worker-01a
  2. Select Power
  3. Select Shut Down Guest OS

 

 

Confirm Shut Down

 

Click Yes to confirm.

 

 

Edit Settings of perf-worker-01a

 

  1. Select perf-worker-01a from the list of VMs on the left
  2. Click Actions
  3. Click Edit Settings...

 

 

Modify CPU configuration

 

  1. Select the Virtual Hardware tab
  2. Expand CPU
  3. On the Cores per Socket drop-down, select 2
  4. Click OK

 

 

Power on perf-worker-01a VM

 

  1. Right click perf-worker-01a
  2. Select Power
  3. Click Power On

 

 

Start a Remote Desktop session to perf-worker-01a

 

Wait a minute to allow the VM to boot, then open a Remote Desktop Connection to perf-worker-01a by double-clicking the 01a shortcut on the desktop.

 

 

Open a Command Prompt on perf-worker-01a

 

In the perf-worker-01a window, launch a Command Prompt:

  1. Click the Start button
  2. Click cmd.exe

Note: Make sure you're in the Remote Desktop Connection window as shown above.

 

 

Verify Multiple Cores per Socket with coreinfo

 

Type in the following command and press Enter:

coreinfo -c -n -s

From the output you can see that one thing has changed: the VM is now presented with 2 Logical Processors (cores) per socket.  Since we still have 4 processors (cores), all presented in a single NUMA node, this is just a matter of presentation to the guest OS and has no performance impact. The feature can be used to adhere to licensing terms.

This is valid when vNUMA is not enabled. Let's see what happens when we enable vNUMA with this 2 cores per socket configuration.

 

 

Shut down perf-worker-01a

 

Back in the vSphere Web Client on the Main Console:

  1. Right click perf-worker-01a
  2. Select Power
  3. Select Shut Down Guest OS

 

 

Confirm Shut Down

 

Click Yes to confirm.

 

 

Edit Settings of perf-worker-01a

 

  1. Select perf-worker-01a from the list of VMs on the left
  2. Click Actions
  3. Click Edit Settings...

 

 

Edit Advanced Configuration Parameters

 

  1. Select the VM Options tab
  2. Expand Advanced
  3. Click Edit Configuration...

 

 

Reduce threshold for enabling vNUMA

 

  1. Click the Name column to sort the configuration parameters alphabetically
  2. Locate the row numa.vcpu.min and change the value to 4 (as shown above); the default is 9.
  3. Click OK twice to save this change.

The numa.vcpu.min configuration parameter specifies the minimum number of virtual CPUs in a VM before vNUMA is enabled.  The default is 9, which means that a VM must have 9 or more vCPUs before virtual NUMA is enabled.

By decreasing this value to 4, we can see what effect vNUMA has on this VM without having to increase the number of vCPUs in our resource-constrained lab environment.
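
For reference, the same advanced parameter can be added with PowerCLI while the VM is powered off; this is a hedged sketch of the equivalent command, not a required lab step.

# Lower the vNUMA threshold so this 4-vCPU VM gets a virtual NUMA topology
$vm = Get-VM "perf-worker-01a"
New-AdvancedSetting -Entity $vm -Name "numa.vcpu.min" -Value 4 -Force -Confirm:$false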

 

 

Power on perf-worker-01a VM

 

  1. Right click perf-worker-01a
  2. Select Power
  3. Click Power On

 

 

Start a Remote Desktop session to perf-worker-01a

 

Wait a minute to allow the VM to boot, then open a Remote Desktop Connection to perf-worker-01a by double-clicking the 01a shortcut on the desktop.

 

 

Open a Command Prompt on perf-worker-01a

 

In the perf-worker-01a window, launch a Command Prompt:

  1. Click the Start button
  2. Click cmd.exe

Note: Make sure you're in the Remote Desktop Connection window as shown above.

 

 

Verify Multiple vNUMA Nodes

 

Type in the following command, followed by enter:

coreinfo -c -n -s

From the output you can tell that the VM is now presented with 2 vNUMA nodes, each having 2 processors, which means that we have successfully enabled vNUMA for this VM.

We saw earlier in this module that changing the cores per socket alone did not alter the NUMA architecture presented to the VM. Now we can see that when used in combination with vNUMA, the cores per socket configuration dictates the presented vNUMA architecture. This means that when using the cores per socket feature on VMs with more than 8 vCPUs (the default threshold), the configuration dictates the vNUMA architecture presented to the VM and therefore can have an impact on VM performance, because we can force a VM to unnecessarily span multiple NUMA nodes.

 

 

Best Practices for vNUMA and Cores per Socket

In general, the following best practices should be followed regarding vNUMA and Cores per Socket:

There are many Advanced Virtual NUMA Attributes (see the vSphere Documentation Center for a full list); here are a few:

 

vNUMA with Memory Hot Add


In vSphere releases 5.0 through 5.5, if virtual NUMA was configured in combination with memory hot add, the additional memory was allocated only to NUMA node 0. In vSphere 6, the added memory is distributed evenly across all available NUMA nodes, providing better VM memory scalability.

Note that vNUMA is still disabled if vCPU hotplug is enabled; therefore, only enable vCPU hotplug if you plan to use it. See this article for more information: http://kb.vmware.com/kb/2040375
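
For reference, the memory hot add performed later in this module can also be done with PowerCLI; here is a hedged sketch of the equivalent command (the StartvNUMA.ps1 script has already enabled memory hot add on perf-worker-01a).

# Hot-add memory to the running, vNUMA-enabled VM (grow it to 4 GB)
Set-VM -VM "perf-worker-01a" -MemoryGB 4 -Confirm:$false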


 

Launch NumaExplorer

 

perf-worker-01a should already be running and you should have an active RDP session to it. If not, power on perf-worker-01a and launch the RDP session from the shortcut on the Main Console, as previously in this module.

On perf-worker-01a, do the following.

  1. Click Start
  2. Click NumaExplorer

 

 

Observe vNUMA memory sizes

 

  1. Click the Refresh button a couple of times, and monitor how the GetNumaAvailableMemory values change slightly.

As you can see, the VM still has 2 NUMA nodes, each with 2 processor cores.  Furthermore, you can see that the available memory is evenly distributed between the 2 nodes, taking the memory consumption of running processes into consideration.

 

 

Edit Settings of perf-worker-01a

 

  1. Select perf-worker-01a from the list of VMs on the left
  2. Click Actions
  3. Click Edit Settings...

 

 

Modify Memory configuration

 

  1. Select the Virtual Hardware tab
  2. Change Memory to 4096 MB
  3. Click OK

 

 

Observe vNUMA memory sizes

 

Click back to the perf-worker-01a Remote Desktop Connection window.

  1. Click the Refresh button again, and notice how the "GetNumaAvailableMemory" values have increased by about 1GB per NUMA node (2GB added in total).

This experiment shows us that on a vNUMA enabled VM, memory hot add does in fact distribute the additional memory evenly across vNUMA nodes.

 

 

Close Remote Desktop Connection to perf-worker-01a

 

  1. Close the Remote Desktop Connection window, as shown above.

 

Conclusion and Clean-Up



 

Clean up procedure

In order to free up resources for the remaining parts of this lab, we need to shut down the virtual machines used in this module and reset their configuration.

 

 

Launch PowerCLI

 

If the PowerCLI window is not already open, click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command prompt.

 

 

Power off and Reset VMs

 

In the PowerCLI console, type:    

.\StopLabVMs.ps1

and press Enter.

The script will now stop all running VMs and reset their settings.

 

 

Close PowerCLI window

 

Close the PowerCLI Window  

You can now move on to another module.

 

 

Key takeaways

During this lab we learned that for virtual machines with 8 or fewer vCPUs, virtual NUMA is not enabled by default.  In this scenario, increasing the cores per socket value from the default of 1 (to expose multi-core CPUs to the guest) does not affect performance since the VM fits within a physical NUMA node, regardless of its configuration.  In that case, the cores per socket setting is used for licensing issues only.

However, when vNUMA is used, the cores per socket setting does impact the virtual NUMA topology presented to the guest and can have a performance impact if it does not match the physical NUMA topology.  By default, vNUMA will pick the optimal topology for you as long as you have not manually increased the cores per socket value.  If it has been changed for licensing purposes, it is important to match the physical NUMA topology manually.

WARNING! When using the cores per socket configuration in combination with vNUMA, you need to be careful about the changes you make. Dictating a NUMA architecture of a VM that does not match the underlying NUMA architecture, or at least fit within the underlying NUMA architecture, can cause performance problems for demanding applications. However, this can also be used to manage the NUMA architecture of a VM so that it matches the physical server's NUMA layout in a cluster with different physical NUMA layouts. The vNUMA configuration of a VM is locked the first time the VM is powered on and will (by default) not be altered after that. This is to provide guest OS and application stability.

In conclusion, when working with vNUMA enabled VMs, you should ensure that the vNUMA layout of a VM matches the physical NUMA architecture of all hosts within a cluster.

Remember that a NUMA node consists of CPU cores and memory. So if a VM has more memory than will fit within a single NUMA node, and the VM has 8 or fewer vCPUs, it may make sense to enable vNUMA so that the guest OS can better place vCPUs and memory.

There has been some confusion around the performance impact of setting the cores per socket of a VM and how vNUMA actually works. By completing this module, we have shown that:

  1. Setting the cores per socket on a VM without vNUMA has no performance impact, and should only be used to comply with license restrictions.
  2. Setting the cores per socket of a VM with vNUMA enabled can have a performance impact and can be used to force a particular vNUMA architecture. Use with caution!
  3. vNUMA is an important feature to ensure optimal performance of larger VMs (>8 vCPUs by default)

If you want to know more about the vNUMA feature of vSphere, see these articles:

http://www.vmware.com/files/pdf/techpaper/VMware-vSphere-CPU-Sched-Perf.pdf (as of June 2015, this paper has not yet been updated with vSphere 6 additions)

http://blogs.vmware.com/vsphere/tag/vnuma

 

 

Conclusion

 

This concludes Module 6, Memory Performance Feature:  vNUMA with Memory Hot Add. We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents'  to quickly jump to a module in the manual.

 

Module 7: Storage Performance and Troubleshooting (30 minutes)

Introduction to Storage Performance Troubleshooting


Approximately 90% of performance problems in a vSphere deployment are related to storage in some way.  There have been significant advances in storage technologies over the past couple of years to help improve storage performance. There are a few things you should be aware of:

In a well-architected environment, there is no difference in performance between storage fabric technologies. A well-designed NFS, iSCSI or FC implementation will work just about the same as the others.

Despite advances in the interconnects, the performance limit is still usually hit at the media itself; in fact, 90% of the storage performance cases seen by GSS (Global Support Services, VMware support) that are not configuration related are media related. Some things to remember:

A good rule of thumb on the total number of IOPS any given disk will provide:

So, if you want to know how many IOPs you can achieve with a given number of disks:
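As a rough illustration of that arithmetic, here is a minimal PowerShell sketch; the per-disk figure is an assumed rule-of-thumb value for a single 15K RPM spindle, not a measurement from this lab, and RAID write penalties are ignored:

# Back-of-the-envelope estimate of aggregate IOPS for a set of identical disks
$iopsPerDisk   = 180    # assumed rule-of-thumb value for one 15K RPM spindle
$numberOfDisks = 8
$aggregateIops = $iopsPerDisk * $numberOfDisks
"Estimated raw aggregate: $aggregateIops IOPS (RAID write penalty not included)"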

This test demonstrates some methods to identify poor storage performance, and how to resolve it using VMware Storage DRS for workload balancing. The first step is to prepare the environment for the demonstration.


 

For users with non-US Keyboards

 

If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names, and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop.

 

 

On-Screen Keyboard

 

Another option, if you are having issues with the keyboard, is to use the On-Screen Keyboard.  

To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

 

 

Getting Back on Track

 

If, for any reason, you make a mistake or the lab breaks, perform the following actions to get back on track, and restart the current module from the beginning.  

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell prompt.

 

 

Resetting VMs to Restart Module

 

From the PowerCLI prompt, type:    

.\StopLabVMs.ps1

and press Enter.

The script will stop all running VMs and reset their settings, and you can restart the module.

 

 

Start this Module

 

Let's start this module.

Launch Chrome from the shortcut in the Taskbar.

 

 

Login to vSphere

 

Log into vSphere. The vSphere Web Client should be the default home page.

If, for some reason, that does not work, uncheck the box and use these credentials:

User name: CORP\Administrator
Password: VMware1!

 

 

Refresh the UI

 

In order to reduce the amount of manual input in this lab, a lot of tasks are automated using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the actual state of the inventory immediately after a script has run.

If you need to manually refresh the inventory, click the Refresh icon in the top of the vSphere Web Client.

 

 

Select Hosts and Clusters

 

 

vSphere HTML5 Web Client


During this module you have the option of utilizing the HTML5 version of the Web Client.  This version is designed to be more responsive and stable while also providing improved security.

At this point, not all features are available in the Fling release.  The most common actions/features available are the following:

If you want to run this in your own environment you can reference the following link: https://labs.vmware.com/flings/vsphere-html5-web-client

Keep in mind that the majority of the actions in this module won’t work in the HTML5 version. The recommendation here is to have both Flash and HTML5 instances open at the same time and toggle between them.

The HTML5 Web Client is located in the bookmark section labeled “Region A HTML5 client”.


 

HTML5 Web Client Link

 

 

 

 

What is a Fling?

From the Flings website: https://labs.vmware.com/about

Our engineers work on tons of pet projects in their spare time, and are always looking to get feedback on their projects (or “flings”). Why flings? A fling is a short-term thing, not a serious relationship but a fun one. Likewise, the tools that are offered here are intended to be played with and explored. None of them are guaranteed to become part of any future product offering and there is no support for them. They are, however, totally free for you to download and play around with!

 

Storage I/O Contention



 

Open a PowerCLI window

 

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command prompt.

 

 

Start the Storage Workloads

 

From the PowerCLI Console, type:

.\StartStorageTest.ps1

press enter

The script configures and starts up the virtual machines, and launches a storage workload using Iometer.

The script may take up to 5 minutes to complete. While the script runs, spend a few minutes reading through the next step to gain an understanding of storage latencies.

 

 

Disk I/O Latency

 

When we think about storage performance problems, the top issue is generally latency, so we need to look at the storage stack and understand what layers there are in the storage stack and where latency can build up.

At the topmost layer is the application running in the guest operating system. That is ultimately the place where we most care about latency. This is the total amount of latency that the application sees, and it includes the latencies of the entire storage stack: the guest OS, the VMkernel virtualization layers, and the physical hardware.

ESXi can’t see application latency because that is a layer above the ESXi virtualization layer.

From ESXi we see 3 main latencies that are reported in esxtop and vCenter.  

The topmost is GAVG, or Guest Average latency, which is the total amount of latency that ESXi can detect.

That does not mean it is the total amount of latency the application sees. In fact, if you compare GAVG (the total amount of latency ESXi is seeing) with the actual latency the application is seeing, you can tell how much latency the guest OS is adding to the storage stack, which can indicate whether the guest OS is configured incorrectly or is causing a performance problem. For example, if ESXi is reporting a GAVG of 10ms, but the application or Perfmon in the guest OS is reporting a storage latency of 30ms, then 20ms of latency is building up somewhere in the guest OS layer, and you should focus your debugging on the guest OS's storage configuration.

GAVG is made up of two major components, KAVG and DAVG. DAVG is basically how much time is spent in the device (the driver, HBA, and storage array), and KAVG is how much time is spent in the ESXi kernel (in other words, how much overhead the kernel is adding).

KAVG is actually a derived metric; ESXi does not measure it directly, but derives it using the following formula:

KAVG = Total Latency – DAVG

The VMkernel is very efficient at processing I/O, so there really should not be any significant time that an I/O waits in the kernel; KAVG should be equal to 0 in well-configured, well-running environments. When KAVG is not equal to 0, it most likely means that the I/O is stuck in a kernel queue inside the VMkernel. So, the vast majority of the time, KAVG will equal QAVG, or Queue Average latency (the amount of time an I/O is stuck in a queue, waiting for a slot in a lower queue to free up so it can move down the stack).
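These latencies are also exposed as vCenter performance counters, so they can be pulled with PowerCLI as well as viewed in esxtop. A minimal sketch, assuming the host disk counters disk.totalLatency.average, disk.deviceLatency.average and disk.kernelLatency.average are collected in your environment (availability depends on the statistics level):

# Pull the GAVG/DAVG/KAVG-style latencies (in ms) for a host from realtime stats
$esx = Get-VMHost -Name "esx-01a.corp.local"
Get-Stat -Entity $esx -Realtime -MaxSamples 6 `
    -Stat "disk.totalLatency.average","disk.deviceLatency.average","disk.kernelLatency.average" |
    Sort-Object Timestamp, MetricId |
    Select-Object Timestamp, MetricId, Instance, Value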

 

 

View the Storage Performance as reported by Iometer

 

When the storage script has completed, you should see two Iometer windows, and two storage workloads should be running.

The storage workload is started on both perf-worker-02a and perf-worker-03a. It will take a few minutes for the workloads to settle and for the performance numbers to become almost identical for the two VMs. The test disks of these virtual machines share the same datastore, and that datastore is saturated.

The performance can be seen in the Iometer GUI as:

Latencies (Average I/O Response Time): around 6ms.

Low IOPS (Total I/Os per Second): around 160 IOPS.

Low throughput (Total MBs per Second): around 2.7 MBps.

Disclaimer: Because we run this lab in a fully virtualized environment, where the ESXi host servers also run in virtual machines, we cannot assign physical disk spindles to individual datastores. Therefore, the performance numbers in these screenshots will vary depending on the actual load in the cloud environment the lab is running in.

 

 

Select perf-worker-03a

 

  1. Select "perf-worker-03a"

 

 

View Storage Performance Metrics in vCenter

 

  1. Select "Monitor"
  2. Select "Performance"
  3. Select "Advanced"
  4. Click "Chart Options"

 

 

Select Performance Metrics

 

  1. Select "Virtual disk"
  2. Select only "scsi0:1"
  3. Click "None" under "Select counters for this chart"
  4. Select "Write latency" and "Write rate"
  5. Click "OK"

The disk that Iometer uses for generating workload is scsi0:1, or sdb inside the guest.
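The same two counters can also be read with PowerCLI alongside the chart. A minimal sketch, assuming the counter names virtualDisk.totalWriteLatency.average (ms) and virtualDisk.write.average (KBps); check them against the counter list in your vCenter:

# Read write latency and write rate for the Iometer disk scsi0:1 on perf-worker-03a
$vm = Get-VM -Name "perf-worker-03a"
Get-Stat -Entity $vm -Realtime -MaxSamples 6 -Instance "scsi0:1" `
    -Stat "virtualDisk.totalWriteLatency.average","virtualDisk.write.average" |
    Select-Object Timestamp, MetricId, Value, Unit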

 

 

View Storage Performance Metrics in vCenter

 

Repeat the configuration of the performance chart for perf-worker-02a and verify that performance is almost identical to perf-worker-03a.

Guidance: Device latencies that are greater than 20ms may indicate a performance impact to your applications.

Due to the way we create a private datastore for this test, we actually see pretty low latency numbers. scsi0:1 is located on an iSCSI datastore backed by a RAM disk on perf-worker-04a (DatastoreA), running on the same ESXi host as perf-worker-03a. Hence, latencies are pretty low for a fully virtualized environment.

vSphere provides several storage features to help manage and control storage performance:

Let’s configure Storage DRS to solve this contention problem.

 

Storage Cluster and Storage DRS


A datastore cluster is a collection of datastores with shared resources and a shared management interface. Datastore clusters are to datastores what clusters are to hosts. When you create a datastore cluster, you can use vSphere Storage DRS to manage storage resources.

When you add a datastore to a datastore cluster, the datastore's resources become part of the datastore cluster's resources. As with clusters of hosts, you use datastore clusters to aggregate storage resources, which enables you to support resource allocation policies at the datastore cluster level. The following resource management capabilities are also available per datastore cluster.

Space utilization load balancing: You can set a threshold for space use. When space use on a datastore exceeds the threshold, Storage DRS generates recommendations or performs Storage vMotion migrations to balance space use across the datastore cluster.

I/O latency load balancing: You can set an I/O latency threshold for bottleneck avoidance. When I/O latency on a datastore exceeds the threshold, Storage DRS generates recommendations or performs Storage vMotion migrations to help alleviate high I/O load. Remember to consult your storage vendor to get their recommendation on using I/O latency load balancing.

Anti-affinity rules: You can create anti-affinity rules for virtual machine disks. For example, the virtual disks of a certain virtual machine must be kept on different datastores. By default, all virtual disks for a virtual machine are placed on the same datastore.
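The following steps create the datastore cluster through the Web Client wizard, but the same configuration can also be scripted with PowerCLI. A minimal sketch, assuming your PowerCLI version provides these cmdlets and parameters (in particular -SpaceUtilizationThresholdPercent, which may differ between releases):

# Create a datastore cluster, set the space threshold used later in this module,
# and move the two lab datastores into it
$dsc = New-DatastoreCluster -Name "DatastoreCluster" -Location (Get-Datacenter "RegionA01")
Set-DatastoreCluster -DatastoreCluster $dsc -SdrsAutomationLevel Manual `
    -SpaceUtilizationThresholdPercent 50
Move-Datastore -Datastore (Get-Datastore "DatastoreA","DatastoreB") -Destination $dsc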


 

Change to the Datastore view

 

  1. Change to the datastore view
  2. Expand "vcsa-01a.corp.local" and "RegionA01"

 

 

Create a Datastore Cluster

 

  1. Right Click on "RegionA01"
  2. Select "Storage"
  3. Click "New Datastore Cluster..."

 

 

Create a Datastore Cluster ( part 1 of 6 )

 

For this lab, we will accept most of the default settings.

  1. Type "DatastoreCluster" as the name of the new datastore cluster.
  2. Click Next

 

 

Create a Datastore Cluster (part 2 of 6 )

 

  1. Click "Next"

 

 

Create a Datastore Cluster ( part 3 of 6 )

 

  1. Change the "Utilized Space" threshold to "50"
  2. Click "Next"

Since the HOL is a nested virtual environment, it is difficult to demonstrate high latency in a reliable manner. Therefore, we do not use I/O latency to demonstrate load balancing. The default is to check for storage cluster imbalances every 8 hours, but this can be changed to as little as 60 minutes.

 

 

Create a Datastore Cluster ( part 4 of 6 )

 

  1. Select "Clusters"
  2. Select "RegionA01-COMP01"
  3. Click "Next"

 

 

Create a Datastore Cluster ( part 5 of 6 )

 

  1. Select "DatastoreA" and "DatastoreB"
  2. Click "Next"

 

 

Create a Datastore Cluster ( part 6 of 6 )

 

  1. Click "Finish"

 

 

Run Storage DRS

 

Take note of the name of the virtual machine that Storage DRS wants to migrate.

  1. Select "DatastoreCluster"
  2. Select the "Monitor" tab
  3. Select "Storage DRS"
  4. Click "Run Storage DRS Now"
  5. Click "Apply Recommendations"

Notice that SDRS recommends moving one of the workloads from DatastoreA to DatastoreB. It makes this recommendation based on space. SDRS makes storage moves based on performance only after it has collected performance data for more than 8 hours. Since the workloads were only recently started, SDRS will not make a recommendation to balance them based on performance until it has collected more data.
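For reference, the move that SDRS recommends is simply a Storage vMotion, which could also be issued directly from PowerCLI; a sketch, using the VM that was recommended in this example:

# Storage vMotion the recommended VM to the other datastore in the cluster
Move-VM -VM "perf-worker-03a" -Datastore (Get-Datastore "DatastoreB")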

 

 

Storage DRS in vSphere 6.0

 

  1. Select "Manage"
  2. Select "Settings"
  3. Select "Storage DRS"
  4. Investigate the different settings you can configure for Storage DRS

A number of enhancements have been made to Storage DRS in vSphere 6.0, in order to remove some of the previous limitations of Storage DRS.

Storage DRS has improved interoperability with deduplicated datastores, so that Storage DRS is able to identify whether datastores are backed by the same deduplication pool or not, and hence avoid moving a VM to a datastore using a different deduplication pool.

Storage DRS has improved interoperability with thin provisioned datastores, so that Storage DRS is able to identify whether thin provisioned datastores are backed by the same storage pool or not, and hence avoid moving a VM between datastores using the same storage pool.

Storage DRS has improved interoperability with array-based auto-tiering, so that Storage DRS can identify datastores with auto-tiering and treat them differently, according to the type and frequency of auto-tiering.

Common to all these improvements is that they require VASA 2.0, which requires that the storage vendor provides an updated storage provider.

 

 

Select the VM that was migrated

 

  1. Return to the "Hosts and Clusters" view
  2. Select the virtual machine that was migrated using Storage DRS, in this case perf-worker-03a

 

 

Increased throughput and lower latency

 

  1. Select the "Monitor" tab
  2. Select "Performance"
  3. Select "Advanced"

Now you should see the performance chart you created earlier in this module.

Notice how the throughput has increased and how the latency is lower (green arrows) than it was when both VMs shared the same datastore.

 

 

Return to the Iometer GUIs to review the performance

 

Return to the Iometer workers and see how they also report increased performance and lower latencies.

It will take a while, maybe 10 minutes, for Iometer to show these higher numbers. This is due to the way storage performance is throttled in this lab. If you want to take a shortcut, stop both workers, wait for 30 seconds, and then restart them (see the arrows in the picture). The workload will then spike and settle at the higher performance level within a couple of minutes.

 

 

Stop the Iometer workloads

 

Stop the Workloads

  1. Press the "Stop Sign" button on the Iometer GUI
  2. Close the GUI by pressing the “X”
  3. Press the "Stop Sign" button on the Iometer GUI
  4. Close the GUI by pressing the “X”

 

Conclusion and Clean-Up



 

Clean up procedure

In order to free up resources for the remaining parts of this lab, we need to shut down the virtual machines that were used and reset their configuration.

 

 

Launch PowerCLI

 

If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt.

 

 

Power off and Reset VMs

 

In the PowerCLI console, type:    

.\StopLabVMs.ps1

press enter

The script will now stop all running VMs and reset their settings.

 

 

Close PowerCLI window

 

Close the PowerCLI Window  

You can now move on to another module.

 

 

Key take aways

During this lab we saw the importance of sizing your storage correctly, with respect to both space and performance. It also showed that when two storage-intensive sequential workloads share the same spindles, performance can be greatly impacted. If possible, try to keep workloads separated: keep sequential workloads (backed by different spindles/LUNs) separate from random workloads.

In general, aim to keep storage latencies under 20ms (lower if possible), and monitor for frequent latency spikes of 60ms or more, which would be a performance concern and something to investigate further.

Guidance: From a vSphere perspective, for most applications, the use of one large datastore vs. several small datastores tends not to have a performance impact. However, the use of one large LUN vs. several LUNs is storage array dependent and most storage arrays perform better in a multi LUN configuration than a single large LUN configuration.

Guidance: Follow your storage vendor’s best practices and sizing guidelines to properly size and tune your storage for your virtualized environment.

 

 

Conclusion

 

This concludes Module 7, Storage Performance and Troubleshooting. We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents'  to quickly jump to a module in the manual.

 

Module 8: Network Performance, Basic Concepts and Troubleshooting (15 minutes)

Introduction to Network Performance


As defined by Wikipedia, network performance refers to measures of service quality of a telecommunications product as seen by the customer.

These metrics are considered important:

In the following module, we will show you how to monitor and troubleshoot some network-related issues, so that you can troubleshoot similar issues that may exist in your own environment.


 

For users with non-US Keyboards

 

If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names, and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop.

 

 

On-Screen Keyboard

 

Another option, if you are having issues with the keyboard, is to use the On-Screen Keyboard.  

To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

 

 

Getting Back on Track

 

If, for any reason, you make a mistake or the lab breaks, perform the following actions to get back on track, and restart the current module from the beginning.  

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell prompt.

 

 

Resetting VMs to Restart Module

 

From the PowerCLI prompt, type:    

.\StopLabVMs.ps1

and press Enter.

The script will stop all running VMs and reset their settings, and you can restart the module.

 

 

Start this Module

 

Let's start this module.

Launch Chrome from the shortcut in the Taskbar.

 

 

Login to vSphere

 

Log into vSphere. The vSphere Web Client should be the default home page.

If, for some reason, that does not work, uncheck the box and use these credentials:

User name: CORP\Administrator
Password: VMware1!

 

 

Refresh the UI

 

In order to reduce the amount of manual input in this lab, a lot of tasks are automated using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the actual state of the inventory immediately after a script has run.

If you need to manually refresh the inventory, click the Refresh icon in the top of the vSphere Web Client.

 

 

Select Hosts and Clusters

 

 

Show network contention


Network contention occurs when multiple VMs are fighting for the same network resources.

In the VMware Hands-on Labs, it's not possible to use all network resources in a way that simulates the real world.

Therefore, this module will focus on creating network load and showing you where to look when you suspect network problems in your own environment.

You might see different results on your screen due to the load of the environment when you are running the lab.


 

Launch PowerCLI

 

If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt.

 

 

Start network load

 

Start the lab VMs and start generating network load by typing

.\StartNetTest.ps1

in the PowerCLI window, and press enter.

 

 

Select VM

 

  1. Select "perf-worker-06a" on ESXi host "esx-02a.corp.local"
  2. Select "Monitor" tab
  3. Select "Performance" tab
  4. Select "Advanced"
  5. Click "Chart Options"

 

 

Select Chart options

 

  1. Select "Network"
  2. Click "None"
  3. Select "perf-worker-06a"
  4. Select the "Receive packets dropped" and "Transmit packets dropped" counters
  5. Click "OK"

Note: If you are unable to select all the metrics shown here, wait until the script has started the VMs, and then open the "Chart Options" again.
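If you prefer the command line, the same dropped-packet counters can be queried with PowerCLI once the VM is running. A minimal sketch, assuming the counter names net.droppedRx.summation and net.droppedTx.summation are available at your statistics level:

# Check received/transmitted packets dropped for perf-worker-06a (realtime samples)
$vm = Get-VM -Name "perf-worker-06a"
Get-Stat -Entity $vm -Realtime -MaxSamples 6 `
    -Stat "net.droppedRx.summation","net.droppedTx.summation" |
    Select-Object Timestamp, MetricId, Instance, Value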

 

 

Monitor chart

 

Depending on how long it has taken you to get here, the network load might already be done. You should still be able to see the load that was running in the charts. Notice that in the picture above, we ran the network load twice for illustration purposes.

  1. Here you can see the graphical network load on perf-worker-06a
  2. Here you can monitor the load of the VM and see the actual numbers of the data transmitted.

Some good advice on what to look for:

Usage:

If this number is too low, relative to what you expect, it might be because of problems in the network or in the VM.

Receive and Transmit packets dropped:

This is a good indication of contention. It means that packets are dropped and might need to be re-transmitted, which could be caused by contention or problems in the network.

Let's go to the host, and see if this is a VM, or a host problem.

 

 

Select Host

 

  1. Select "esx-01a.corp.local"
  2. Select "Monitor" tab
  3. Select "Performance" tab
  4. Select "Advanced"
  5. Select "Network" from the drop down menu
  6. Click "Chart Options"

 

 

Select Chart options

 

  1. Click "None"
  2. Select "esx-01a.corp.local"
  3. Select "Receive and Transmit packets dropped"
  4. Click "OK"

 

 

Monitor Chart

 

  1. See if there are any dropped packets on the host

In this example, there are no packets dropped on the host, which indicates that this is a VM problem.

Note that you might see different results in the lab, due to the nature of the Hands-on Labs.

 

Conclusion and Cleanup



 

Clean up procedure

In order to free up resources for the remaining parts of this lab, we need to shut down the used virtual machines and reset their configuration.

 

 

Launch PowerCLI

 

If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt.

 

 

Power off and Reset VMs

 

In the PowerCLI console, type:    

.\StopLabVMs.ps1

press enter

The script will now stop all running VMs and reset their settings. You can then move on to another module.

 

 

Close PowerCLI window

 

Close the PowerCLI Window  

You can now move on to another module.

 

 

Key take aways

During this lab we saw how to diagnose networking problems in VMs and hosts, using VMware's built-in monitoring tools in vCenter.

There are many other ways of performance troubleshooting.

If you want to know more about performance troubleshooting, continue with the next modules, or see this article:

Troubleshooting network performance issues in a vSphere environment

http://kb.vmware.com/kb/1004087 

 

 

Conclusion

 

This concludes Module 8, Network Performance, Basic Concepts and Troubleshooting. We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents'  to quickly jump to a module in the manual.

 

Module 9: Network Performance Feature: Network IO Control with Reservations (45 minutes)

Introduction to Network IO Control


The Network I/O Control (NIOC) feature in VMware vSphere® has been enhanced in vSphere 6.0 to support a number of exciting new features such as bandwidth reservations.

The full list of features can be seen below.

The above features are in addition to NetIOC features already available in vSphere 5, such as:


 

Architecture

 

An overview of the architecture of NIOC

 

Hands-on Labs Interactive Simulation: Network IO Control


Due to the nature of the Hands on labs, it's not possible to show Network IO Control live. Therefore we have created an interactive simulation that will allow you to explore the functionality.

  1. Click here to open the interactive simulation. It will open in a new browser window or tab.
  2. When finished, click the “Return to the lab” link to continue with this lab.

The lab will continue to run in the background. If it takes too long to complete the simulation the lab may go into standby mode, and you can resume it after completing the module.


Conclusion and Cleanup



 

Clean up procedure

Since this is an interactive demo, there is no cleanup necessary in this module.

If your lab was suspended during the interactive demo, please resume it now.

 

 

Key take aways

During this lab we saw that NIOC can be used to reserve bandwidth for certain types of VMs and workloads.

If you want to know more about NIOC, see these articles:

Performance Evaluation of Network I/O Control in VMware vSphere® 6

http://www.vmware.com/files/pdf/techpaper/Network-IOC-vSphere6-Performance-Evaluation.pdf

YouTube video explaining vSphere Network I/O Control, Version 3

https://www.youtube.com/watch?v=IvczUp6d8ZY

 

 

 

 

Conclusion

 

This concludes Module 9, Network Performance Feature: Network IO Control with Reservations. We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents' to quickly jump to a module in the manual.

 

Module 10: Performance Monitoring Tool: esxtop CLI introduction (30 minutes)

Introduction to esxtop


There are several tools to monitor and diagnose performance in vSphere environments. It is best to use esxtop to diagnose and further investigate performance issues that have already been identified through another tool or method. esxtop is not a tool designed for monitoring performance over the long term, but is great for deep investigation or monitoring a specific issue or VM over a defined period of time.

In this lab, which should take about 30 minutes, we will use esxtop to dive into performance troubleshooting across CPU, memory, storage, and network. The goal of this module is to expose you to the different views in esxtop and to present you with a different load in each view.  This is not meant to be a deep dive into esxtop, but to get you comfortable with the tool so that you can use it in your own environment.

To learn more about each metric in esxtop, and what they mean, we recommend that you look at the links at the end of this module.

In the next module, we will look at the ESXtopNGC Plugin which displays host-level statistics in new and more powerful ways by tapping into the GUI capabilities of the vSphere Web Client.

For day-to-day performance monitoring of an entire vSphere environment, vRealize Operations Manager (vROPs) is a powerful tool that can be used to monitor your entire virtual infrastructure. It incorporates high-level dashboard views and built-in intelligence to analyze the data and identify possible problems.  Module 11 of this lab shows you some basic functions of vROPs. We also recommend that you look at the other vROPs labs when you are finished with this one, for a better understanding of day-to-day monitoring.


 

For users with non-US Keyboards

 

If you are using a device with non-US keyboard layout, you might find it difficult to enter CLI commands, user names and passwords throughout the modules in this lab.

The CLI commands, user names, and passwords that need to be entered can be copied and pasted from the file README.txt on the desktop.

 

 

On-Screen Keyboard

 

Another option, if you are having issues with the keyboard, is to use the On-Screen Keyboard.  

To do so, click Start and On-Screen Keyboard, or the shortcut on the Taskbar.

 

 

Getting Back on Track

 

If, for any reason, you make a mistake or the lab breaks, perform the following actions to get back on track, and restart the current module from the beginning.  

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a PowerCLI shell prompt.

 

 

Resetting VMs to Restart Module

 

From the PowerCLI prompt, type:    

.\StopLabVMs.ps1

and press Enter.

The script will stop all running VMs and reset their settings, and you can restart the module.

 

 

Start this Module

 

Let's start this module.

Launch Chrome from the shortcut in the Taskbar.

 

 

Login to vSphere

 

Log into vSphere. The vSphere Web Client should be the default home page.

If, for some reason, that does not work, uncheck the box and use these credentials:

User name: CORP\Administrator
Password: VMware1!

 

 

Refresh the UI

 

In order to reduce the amount of manual input in this lab, a lot of tasks are automated using scripts. Therefore, it's possible that the vSphere Web Client does not reflect the actual state of the inventory immediately after a script has run.

If you need to manually refresh the inventory, click the Refresh icon in the top of the vSphere Web Client.

 

 

Select Hosts and Clusters

 

 

vSphere HTML5 Web Client


During this module you have the option of utilizing the HTML5 version of the Web Client.  This version is designed to be more responsive and stable while also providing improved security.

At this point, not all features are available in the Fling release.  The most common actions/features available are the following:

If you want to run this in your own environment you can reference the following link: https://labs.vmware.com/flings/vsphere-html5-web-client

Keep in mind that the majority of the actions in this module won’t work in the HTML5 version. The recommendation here is to have both Flash and HTML5 instances open at the same time and toggle between them.

The HTML5 Web Client is located in the bookmark section labeled “Region A HTML5 client”.


 

HTML5 Web Client Link

 

 

 

 

What is a Fling?

From the Flings website: https://labs.vmware.com/about

Our engineers work on tons of pet projects in their spare time, and are always looking to get feedback on their projects (or “flings”). Why flings? A fling is a short-term thing, not a serious relationship but a fun one. Likewise, the tools that are offered here are intended to be played with and explored. None of them are guaranteed to become part of any future product offering and there is no support for them. They are, however, totally free for you to download and play around with!

 

Show esxtop CPU Features


Esxtop can be used to diagnose performance issues involving almost any aspect of performance at both the host and virtual machine level. This section will step through how to view CPU performance, using esxtop in interactive mode.


 

Open a PowerCLI window

 

Click on the "VMware vSphere PowerCLI" icon in the taskbar to open a command prompt.

 

 

Start CPU load on VMs

 

Type

.\StartCPUTest2.ps1

and press Enter.  The lab VMs will now start.

 

 

Open PuTTY

 

 

 

SSH to esx-01a

 

  1. Select host esx-01a.corp.local
  2. Click Open

 

 

Start esxtop

 

  1. From the ESXi shell, type
esxtop

and press Enter.

  2. Click the Maximize icon so we can see the maximum amount of information.

 

 

Select the CPU view

 

If you just started esxtop, you are on the CPU view by default.

To be sure, press "c" to switch to the CPU view.

 

 

Filter the fields displayed

 

Type

f

To see the list of available fields (counters).

Since we don't have a lot of screen space, let's remove the ID and Group Id counters.

Do this by typing the following letters (NOTE: make sure these are capitalized, as these are case sensitive!)

A
B

Press Enter

 

 

Filter only VMs

 

This screen shows performance counters for both virtual machines and ESXi host processes.

To see only values for virtual machines

Press (capital)

V

 

 

Monitor VM load

 

Monitor the load on the two worker VMs: perf-worker-01a and perf-worker-01b.

They should both be running at (or near) 100% guest CPU utilization. If not, then wait for a moment and let the CPU workload startup.

One important metric to monitor is %RDY (CPU Ready).  This metric is the percentage of time a “world” is ready to run, but awaiting the CPU scheduler for approval.  This metric can go up to 100% per vCPU, which means that with 2 vCPU's, it has a maximum value of 200%.  A good guideline is to ensure this value is below 5% per vCPU, but it will always depend on the application.

Look at the worker VMs to see if they go above the 5% per vCPU threshold.  To force esxtop to refresh immediately, press the Space bar.
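Outside of esxtop, vCenter exposes CPU ready time as a summation in milliseconds rather than a percentage, and converting it requires the sampling interval (20 seconds for realtime statistics). A minimal PowerCLI sketch of that conversion, assuming the cpu.ready.summation counter:

# Convert cpu.ready.summation (ms accumulated per 20-second realtime sample) to %RDY
$vm = Get-VM -Name "perf-worker-01a"
Get-Stat -Entity $vm -Stat "cpu.ready.summation" -Realtime -MaxSamples 3 |
    Where-Object { $_.Instance -eq "" } |          # blank instance = aggregate of all vCPUs
    ForEach-Object {
        $readyPercent = ($_.Value / (20 * 1000)) * 100
        "{0}  ready = {1:N1}%" -f $_.Timestamp, $readyPercent
    }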

 

 

Edit Settings of perf-worker-01a

 

Let's see how perf-worker-01a is configured:

  1. Click on the perf-worker-01a virtual machine
  2. Click Actions
  3. Click Edit Settings…

 

 

Add a vCPU to perf-worker-01a

 

  1. Select 2 vCPUs
  2. Click "OK"

 

 

Edit Settings of perf-worker-01b

 

Let's add a virtual CPU to perf-worker-01b to improve performance (a PowerCLI alternative is sketched after these steps).

  1. Click on the perf-worker-01b virtual machine
  2. Click Actions
  3. Click Edit Settings…

 

 

Add a vCPU to perf-worker-01b

 

  1. Select 2 vCPUs
  2. Click "OK"
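As noted above, the same vCPU change can be made from PowerCLI; a minimal sketch covering both worker VMs (in your own environment the VM would normally need to be powered off first, unless CPU Hot Add is enabled):

# Set both CPU-test workers to 2 vCPUs
Get-VM -Name "perf-worker-01a","perf-worker-01b" | Set-VM -NumCpu 2 -Confirm:$false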

 

 

Monitor %USED and %RDY

 

Return to the PuTTY (esxtop) window.

Now that we have added an additional vCPU to each virtual machine, you should see results like the screenshot above:

 

 

Monitor %USED and %RDY (continued)

 

After a few minutes, the CPU benchmark will start to use the additional vCPUs, and %RDY will increase even more. This is due to CPU contention and SMP scheduling (increased %CSTP) on the system. The ESXi host has 4 vCPUs, and the two active virtual machines, each attempting to run 2 vCPUs at 100%, are fighting for those resources. Remember that the ESXi host also requires some CPU resources to run, which adds to the CPU contention.

 

Show esxtop memory features


Esxtop can be used to diagnose performance issues involving almost any aspect of performance, from both the host and virtual machine perspectives. This section will step through how to view memory performance using esxtop in interactive mode.


 

Start load on VM's

 

In the PowerCLI window type

.\StartMemoryTest.ps1

and press Enter to start the memory load.

You can continue to the next step while the script is running, but please don't close any windows, since that will stop the memory load.

 

 

Select the Memory view

 

In the PuTTY window type

m

To see the memory view

 

 

Select correct fields

 

Type

f

To see the list of available counters.

Since we don't have much screen space, we will remove a few counters, including ID and Group Id.

Do this by pressing (capital letters)

B
H
J

 Press enter to return to the esxtop screen

 

 

See only VMs

 

This screen shows performance counters for both virtual machines and ESXi host processes.

To see only values for virtual machines

Press (capital)

V

 

 

Monitor memory load with no contention

 

When the load on the worker VMs begins, you should be able to see them at the top of the esxtop window.

Some good metrics to look at are:

MCTL :

Is the balloon driver installed?  If not, then it's a good idea to fix that first.

MCTLSZ :

Shows how inflated the balloon is, that is, how much memory has been reclaimed from the guest operating system. This should be 0.

SWCUR :

Shows how much the VM has swapped. This should be 0, but a non-zero value can be acceptable if the following counters are OK.

SWR/S :

Shows how much read there is on the swap file.

SWW/S :

Shows how much write there is on the swap file.

Depending on the lab, all counters should be OK, but due to the nature of the nested lab, it's hard to predict exactly what you will see. So look around and check whether everything looks fine.
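If you want to cross-check what esxtop shows, the equivalent ballooning and swapping counters can also be read from vCenter with PowerCLI. A minimal sketch, assuming the counter names mem.vmmemctl.average (balloon size, KB) and mem.swapped.average (swapped memory, KB):

# Check balloon size and swapped memory for the memory-test workers
Get-VM -Name "perf-worker-02a","perf-worker-03a" |
    Get-Stat -Realtime -MaxSamples 3 -Stat "mem.vmmemctl.average","mem.swapped.average" |
    Select-Object Entity, Timestamp, MetricId, Value, Unit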

 

 

Power on perf-worker-04a

 

  1. Right Click "perf-worker-04a"
  2. Select "Power"
  3. Click "Power On"

 

 

Monitor memory load under contention

 

Now that we have created memory contention on the ESXi host, we can see the following:

perf-worker-04a has no VMware Tools (and no balloon driver) installed, and therefore doesn't have a ballooning target.

perf-worker-02a and 03a are ballooning around 400MB each.

perf-worker-02a, 03a and 04a are swapping to disk.

 

 

Stop load on worker

 

  1. Stop the load on the workers by closing the 2 VM stat collector windows that appeared after you started the load script.

 

Show esxtop storage features


Esxtop can be used to diagnose performance issues involving almost any aspect of performance, from both the host and virtual machine perspectives. This section will step through how to view storage performance using esxtop in interactive mode.


 

Start lab

 

In the PowerCLI window type

.\StartStorageTest.ps1

and press enter to start the lab

The lab will take about 5 minutes to prepare. Feel free to continue with the other steps while the script finishes.

After you start the script, be sure that you don't close any windows that appear.

 

 

Different views

 

When looking at storage in esxtop, you have multiple options to choose from.

Esxtop shows the storage statistics in three different screens:


We will focus on the VM screen in this module.

In the PuTTY window type (lower case)

v

To see the storage vm view

 

 

Select correct fields

 

Type

f

To see the list of available counters.

In this case, all counters are already selected except the vSCSI ID field.

Since we have enough room for all counters, we will add this one too by pressing (capital letter)

A

Press enter when finished

 

 

Start load on VM's

 

The StartStorageTest.ps1 script that we executed earlier in this module should be finished now, and you should have 2 Iometer windows on your desktop, looking like this.

If not, run

.\StartStorageTest.ps1 

again, and wait for it to finish.

 

 

Monitor VM load

 

You have 4 running VMs in the lab.

Two of them are running Iometer workloads, and the other two are iSCSI storage targets using a RAM disk. Because the targets use a RAM disk as backing storage, they do not generate any physical disk I/O.

The metrics to look for here are:

CMDS/S :

This is the total number of commands per second, which includes IOPS (Input/Output Operations Per Second) and other SCSI commands such as SCSI reservations, locks, vendor string requests, and unit attention commands being sent to or coming from the device or virtual machine being monitored.

In most cases, CMDS/s = IOPS unless there are a lot of metadata operations (such as SCSI reservations)

LAT/rd and LAT/wr :

Indicate the average response time of read and write I/O, as seen by the VM.

In this case, you should see high values in CMDS/s on the worker VMs currently running the Iometer load (perf-worker-02a and 03a), indicating that we are generating a lot of I/O, and a high value in LAT/wr, since we are only doing writes.

The numbers may be different on your screen, due to the nature of the Hands-on Labs.

 

 

Device or Kernel latency

 

Press

d

To go to the Device view.

Here you can see that the storage workload is on device vmhba33, which is the software iSCSI adapter. Look for DAVG (device latency) and KAVG (kernel latency). DAVG should be below 25ms, and KAVG, the latency caused by the kernel, should be very low and always below 2ms.

In this example the latencies are within acceptable values.

 

 

Stop load on worker

 

Close BOTH Iometer workers

  1. When finished, stop the Iometer workloads by clicking "STOP"
  2. Close the window by selecting the X in the top right corner.

 

Show esxtop network features


Esxtop can be used to diagnose performance issues involving almost any aspect of performance, from both the host and virtual machine perspectives. This section will step through how to view network performance using esxtop in interactive mode.


 

Start Lab VM's

 

In the PowerCLI window type

 .\StartNetTest.ps1

and press Enter.

Continue with the next steps while the script runs; it will take a few minutes.

 

 

Select the network view

 

In the PuTTY window type

n

to see the networking view

 

 

Select correct fields

 

Type

f

To see the list of available counters.

Since we don't have much screen space, we will remove the two counters PORT-ID and DNAME.

Do this by pressing (capital letters)

A
F

Press enter when finished.

 

 

Monitor load

 

Monitor the metrics.

Note that the results might be different on your screen, due to the load of the environment where the Hands-on Labs is running.

The screen updates automatically, but you can force a refresh by pressing

space

The metrics to watch for are:

%DRPTX and %DRPRX :

These are the percentages of sent and received packets that were dropped.

If this number goes up, it might be an indication of high network utilization.

Note that the StartNetTest.ps1 script that you ran in the first step starts the VMs and then waits for 2 minutes before running a network load for 5 minutes.

Depending on how quickly you got to this step, you might not see any load if it took you more than about seven minutes.

 

 

Restart network load

 

If you want to start the network load for another 5 minutes, return to the PowerCLI window.

In the PowerCLI window type

 .\StartupNetLoad.bat

And press enter.

The network load will now run for another five minutes.  While you wait, you can explore esxtop more.

 

 

Network workload complete

 

As described previously, the load will stop by itself. So when the PowerShell window says "Network load complete", no more load is being generated.

 

Conclusion and Cleanup



 

Clean up procedure

In order to free up resources for the remaining parts of this lab, we need to shut down the virtual machines that were used and reset their configuration.

 

 

Launch PowerCLI

 

If the PowerCLI window is not already open, click on the VMware vSphere PowerCLI icon in the taskbar to open a command prompt.

 

 

Power off and Reset VMs

 

In the PowerCLI console, type:    

.\StopLabVMs.ps1

press enter

The script will now stop all running VMs and reset their settings. You can then move on to another module.

 

 

Key take aways

During this lab we saw how to use esxtop to monitor CPU, memory, storage, and network load.

We have only scratched the surface of what esxtop can do.

If you want to know more about esxtop, see these articles:

Yellow-Bricks esxtop page

http://www.yellow-bricks.com/esxtop/

Esxtop Bible

https://communities.vmware.com/docs/DOC-9279

 

 

Conclusion

 

This concludes Module 10, Performance Tool: esxtop CLI introduction. We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents' to quickly jump to a module in the manual.

 

Module 11: Performance Monitoring and Troubleshooting: vRealize Operations, Next Steps (30 minutes)

Introduction to vRealize Operations Manager


vRealize Operations Manager 6 features a re-architected platform that delivers 8x greater scalability with unified management across vSphere and other domains. Analytics, smart alerts, and problem detection capabilities identify complex issues and then recommend automated tasks that streamline remediation before problems impact users.


 

Architecture

 

An overview of the architecture of vRealize Operations Manager version 6. vRealize Operations Manager now uses a scale-out architecture, whereas the older 5.x versions used a scale-up architecture.

 

Hands-on Labs Interactive Simulation: Using vRealize Operations Manager for Performance Troubleshooting


Since this lab focuses on deep, parameter-based performance troubleshooting, we have decided not to include vRealize Operations Manager in this particular lab. However, vRealize Operations Manager is a very powerful performance troubleshooting tool, and therefore we have included an interactive simulation that will hopefully inspire you to go and explore one of the dedicated vRealize Operations Manager Hands-on Labs.

The interactive simulation will allow you to experience steps which are too time-consuming or resource intensive to do live in the lab environment.

  1. Click here to open the interactive simulation. It will open in a new browser window or tab.
  2. When finished, click the “Return to the lab” link to continue with this lab.

The lab will continue to run in the background. If it takes too long to complete the simulation the lab may go into standby mode, and you can resume it after completing the module.


Conclusion and Cleanup



 

Clean up procedure

Since this is an interactive demo, there is no cleanup necessary in this module.

If your lab was suspended during the interactive demo, please resume it now.

 

 

Key take aways

During this lab we saw how vRealize Operations Manager can be used for performance troubleshooting. But there is much more to vRealize Operations Manager, and if you feel inspired to learn more about this powerful tool, try out the following labs: HOL-SDC-1601, HOL-SDC-1602 and HOL-SDC-1610.

If you want to learn more about vRealize Operations Manager, you can go to the following sites.

VMware TV on youtube.com: https://www.youtube.com/user/vmwaretv and go to "Software-Defined Data Center" - "vRealize Ops Mgmt"

VMware vRealize Operations Manager 6.2 Documentation Center: https://pubs.vmware.com/vrealizeoperationsmanager-62/index.jsp

 

 

Conclusion

 

This concludes Module 11, Performance Tool: vRealize Operations, next step in performance monitoring and Troubleshooting. We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.

If you have time remaining, here are the other modules that are part of this lab along with an estimated time to complete each one.  Click on 'More Options - Table of Contents' to quickly jump to a module in the manual.

 

Conclusion

Thank you for participating in the VMware Hands-on Labs. Be sure to visit http://hol.vmware.com/ to continue your lab experience online.

Lab SKU: HOL-1704-SDC-1

Version: 20161027-102724