VMware Hands-on Labs - HOL-2004-01-SDC


Lab Overview - HOL-2004-01-SDC - Mastering vSphere Performance

Lab Introduction


This lab, HOL-2004-01-SDC, Mastering vSphere Performance, contains a large amount of content, broken down into modules. First, you'll learn what is new and improved in the current vSphere 6.7 release. You will also work with a broad array of benchmarks, such as DVD Store, Weathervane, and X-Mem, and performance monitoring tools, such as esxtop and the advanced performance charts, to both measure performance and diagnose bottlenecks in a vSphere environment. We also explore performance-related vSphere features such as right-sizing virtual machines, virtual NUMA, Latency Sensitivity, and Host Power Management.

While the time available in this lab constrains the number of performance problems we can review as examples, we have selected relevant problems that are commonly seen in vSphere environments. Walking through these examples can help you understand and troubleshoot typical performance problems.

For the complete Performance Troubleshooting Methodology and a list of VMware Best Practices, please visit the www.vmware.com website.

Furthermore, if you are interested in performance-related articles, make sure that you monitor the VMware VROOM! Blog:

https://blogs.vmware.com/performance/


Lab Guidance


Note: It takes more than 90 minutes to complete this lab. You should expect to finish only two or three of the modules during your time.  The modules are independent of each other, so you can start at the beginning of any module and proceed from there. You can use the Table of Contents to access any module of your choosing at any point in the lab.

You can find the Table of Contents in the upper right-hand corner of the Lab Manual.

Lab Module List:

 Lab Captains:

This lab manual can be downloaded from the Hands-on Labs Document site found here:

http://docs.hol.vmware.com

This lab may be available in other languages.  To set your language preference and have a localized manual deployed with your lab, you may utilize this document to help guide you through the process:

http://docs.hol.vmware.com/announcements/nee-default-language.pdf


 

Location of the Main Console

 

  1. The area in the RED box contains the Main Console.  The Lab Manual is on the tab to the right of the Main Console.
  2. A particular lab may have additional consoles found on separate tabs in the upper left. You will be directed to open another specific console if needed.
  3. Your lab starts with 90 minutes on the timer.  The lab cannot be saved.  All your work must be done during the lab session, but you can click EXTEND to increase your time.  If you are at a VMware event, you can extend your lab time twice for up to 30 minutes; each click gives you an additional 15 minutes. Outside of VMware events, you can extend your lab time up to 9 hours and 30 minutes; each click gives you an additional hour.

 

 

Alternate Methods of Keyboard Data Entry

During this lab, you will input text into the Main Console. Besides typing directly, there are two helpful methods that make it easier to enter complex data.

 

 

Click and Drag Lab Manual Content Into Console Active Window

You can also click and drag text and Command Line Interface (CLI) commands directly from the Lab Manual into the active window in the Main Console.  

 

 

Accessing the Online International Keyboard

 

You can also use the Online International Keyboard found in the Main Console.

  1. Click on the Keyboard Icon found on the Windows Quick Launch Task Bar.

 

 

Click once in active console window

 

In this example, you will use the Online Keyboard to enter the "@" sign used in email addresses. The "@" sign is Shift-2 on US keyboard layouts.

  1. Click once in the active console window.
  2. Click on the Shift key.

 

 

Click on the @ key

 

  1. Click on the "@ key".

Notice the @ sign entered in the active console window.

 

 

Activation Prompt or Watermark

 

When you first start your lab, you may notice a watermark on the desktop indicating that Windows is not activated.  

One of the major benefits of virtualization is that virtual machines can be moved to and run on any platform.  The Hands-on Labs take advantage of this benefit, and we are able to run the labs out of multiple datacenters.  However, these datacenters may not have identical processors, which triggers a Microsoft activation check over the Internet.

Rest assured, VMware and the Hands-on Labs are in full compliance with Microsoft licensing requirements.  The lab that you are using is a self-contained pod and does not have full access to the Internet, which is required for Windows to verify the activation.  Without full access to the Internet, this automated process fails and you see this watermark.

This cosmetic issue has no effect on your lab.  

 

 

Look at the lower right portion of the screen

 

 

 

  1. Please check to see that your lab has finished all the startup routines and is ready for you to start.

If you see anything other than "Ready", please wait a few minutes.  If after five minutes your lab has not changed to "Ready", please ask for assistance.

 

Module 1 - vSphere 6.7 Performance: What's New? (30 minutes)

Introduction


Underlying each release of VMware vSphere® are many performance and scalability improvements. The vSphere 6.7 platform continues to provide industry-leading performance and features to ensure the successful virtualization and management of your entire software-defined datacenter.


 

Check the Lab Status in the lower-right of the desktop

 

Please check to see that your lab has finished all the startup routines and is ready for you to start. If you see anything other than "Ready", please wait a few minutes. If after five minutes your lab has not changed to "Ready", please ask for assistance.

 

 

Open Google Chrome

 

First, let's open Google Chrome.

 

 

Login to vCenter

 

This is the vCenter login screen.  To login to vCenter:

  1. Check the Use Windows session authentication checkbox
  2. Click the LOGIN button

 

 

Select Hosts and Clusters

 

 

Faster Lifecycle Management


VMware vSphere 6.7 includes several improvements that accelerate the host lifecycle management experience to save administrators valuable time.


 

New vSphere Update Manager Interface

 

To see the Update Manager in our lab environment:

  1. Click on the Menu dropdown
  2. Select Update Manager

 

 

Update Manager

 

This release of vSphere includes a brand-new Update Manager interface that is part of the HTML5-based vSphere Client.

  1. Click the Updates tab
  2. Click the ID column twice to sort by the most recent Update ID
  3. Click a radio button to select an update (note: your environment may be different, as new updates are continually released)
  4. Use the vertical scrollbar to scroll down to see more information about the selected update

Update Manager in vSphere 6.7 keeps VMware ESXi 6.x hosts reliable and secure by making it easy for administrators to deploy the latest patches and security fixes. When the time comes to upgrade older releases to the latest version of ESXi 6.7, Update Manager makes that task easy, too.

The new HTML5 Update Manager interface is more than a simple port from the old Flex client – the new UI provides a much more streamlined remediation process. For example, the previous multi-step remediation wizard has been replaced with a more efficient workflow, requiring just a few clicks to begin the procedure. In addition, the pre-check is now a separate operation, allowing administrators to verify that a cluster is ready for upgrade before initiating the workflow.

As of vSphere 6.7 Update 1, the HTML5 Client is now ‘Fully Featured’.  This means that you can manage all aspects of your vSphere environment using the HTML5-based vSphere Client, with no need to switch back and forth between the vSphere Client and the vSphere Web Client.  We’ve ported all features, including VMware Update Manager (VUM). Read about all the features released in this version of the vSphere Client by visiting the Functionality Updates for the vSphere Client site.

 

 

Faster Upgrades from ESXi 6.5 to 6.7

Hosts that are currently on ESXi 6.5 upgrade to 6.7 significantly faster than ever before. This is because several optimizations have been made for that upgrade path, including eliminating one of two reboots traditionally required for a host upgrade. In the past, hosts that were upgraded with Update Manager were rebooted a first time in order to initiate the upgrade process, and then rebooted once again after the upgrade was complete.

Modern server hardware, equipped with hundreds of gigabytes of RAM, typically takes several minutes to initialize and perform self-tests. Doing this hardware initialization twice during an upgrade really adds up, so this optimization significantly shortens the maintenance windows required to upgrade clusters of vSphere infrastructure.

These new improvements reduce the overall time required to upgrade clusters, shortening maintenance windows so that valuable efforts can be focused elsewhere.

Recall that, because of DRS and vMotion, applications are never subject to downtime during hypervisor upgrades – VMs are moved seamlessly from host to host as needed.

 

 

ESXi 6.7 Update Manager Video (3:48)

Since this lab runs in the cloud, it is not practical to upgrade an ESXi host to 6.7. Instead, check out this video to see how the process works:

 

 

vSphere Quick Boot

vSphere 6.7 introduces vSphere Quick Boot – a new capability designed to reduce the time required for a VMware ESXi host to reboot during update operations.

Host reboots occur infrequently but are typically necessary after activities such as applying a patch to the hypervisor or installing a third-party component or driver. Modern server hardware that is equipped with large amounts of RAM may take many minutes to perform device initialization and self-tests.

Quick Boot eliminates the time-consuming hardware initialization phase by shutting down ESXi in an orderly manner and then immediately re-starting it. If it takes several minutes or more for the physical hardware to initialize devices and perform necessary self-tests, then that is the approximate time savings to expect when using Quick Boot! In large clusters that are typically remediated one host at a time, it’s easy to see how this new technology can substantially shorten time requirements for data center maintenance windows.
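The arithmetic behind that claim is simple. As a back-of-the-envelope sketch (the 10-minute initialization time and 8-host cluster below are illustrative assumptions, not lab measurements):

```python
# Back-of-the-envelope estimate of time saved by Quick Boot during a
# serial cluster remediation: each host reboot normally pays the full
# hardware initialization cost, which Quick Boot skips.
# The numbers below are illustrative assumptions, not lab measurements.

def quick_boot_savings_minutes(hosts, hw_init_minutes, reboots_per_host=1):
    """Minutes saved across a cluster remediated one host at a time."""
    return hosts * reboots_per_host * hw_init_minutes

# An 8-host cluster whose servers take ~10 minutes to initialize:
print(quick_boot_savings_minutes(8, 10))  # 80 minutes off the window
```

For a patch that needs one reboot per host, that is over an hour of maintenance window recovered on this hypothetical cluster.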

 

 

Quick Boot video (1:53)

Since this lab runs in the cloud, we can't show a reboot of a physical host.  Instead, check out this video to see how it works!

 

 

Conclusion

The new streamlined Update Manager interface, single reboot upgrades, and vSphere Quick Boot shorten the time required for host lifecycle management operations and make VMware vSphere 6.7 the Efficient and Secure Platform for your Hybrid Cloud.

 

vCenter Server 6.7


vSphere 6.7 delivers an exceptional experience with an enhanced VMware vCenter® Server Appliance™ (vCSA). vSphere 6.7 adds functionality to support not only the typical workflows customers need but also other key functionality such as managing VMware NSX®, VMware vSAN™, VMware vSphere® Update Manager™ (VUM), as well as third-party components.


 

2X faster performance in vCenter operations per second

With their benchmark vcbench, VMware performance engineers measured the number of operations per second (throughput) that vCenter produced.

This benchmark stresses the vCenter server by performing typical vCenter operations like power on and off a VM among several others. vCenter 6.7 performs 16.7 operations per second, which is a twofold increase over the 8.3 operations per second vCenter 6.5 produced.
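A quick sanity check of the twofold claim, using the vcbench figures quoted above:

```python
# Sanity-check the "2X" vcbench claim from the figures quoted above.
v65_ops = 8.3   # vCenter 6.5 throughput (operations/second)
v67_ops = 16.7  # vCenter 6.7 throughput (operations/second)

speedup = v67_ops / v65_ops
print(round(speedup, 2))  # ~2.01, i.e. a twofold increase
```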

 

 

3X faster operations, 3X reduction in memory usage

Before vCenter can power on a VM, it first consults several sub-systems, including DRS, to support the initial placement of the VM on a vSphere host. Latency, in this context, is the duration of this process. VMware made many optimizations in the coordination of these sub-systems to reduce power-on latency from 9.5 seconds to 2.8 seconds.

VMware also optimized the core vCenter process (vpxd) to use much less memory (a 3x reduction!) to complete the same workloads.

 

 

vCenter Performance Analysis

For more information about vCenter Performance, check out vCenter Performance Analysis, a module later in this lab.

 

Core Platform Improvements


Let's look at the variety of improvements that vSphere 6.7 brings: host maximums, new scheduler options, large memory pages, per-VM EVC, virtual hardware versions 14 and 15, persistent memory (PMEM/NVDIMM), virtualization-based security (VBS), and Instant Clone.


 

Host Scalability

There are some minor improvements to vSphere 6.7 ESXi host maximums worth noting.

 

 

1 GB Large Memory Pages

 

Applications with large memory footprints, like SAP HANA, can often stress the hardware memory subsystem (that is, Translation Lookaside Buffer, or TLB) with their access patterns. Modern processors can mitigate this performance impact by creating larger mappings to memory and increasing the memory reach of the application. In prior releases, ESXi allowed guest operating system memory mappings based on 2 MB page sizes. This release introduces memory mappings for 1 GB page sizes.

As shown in this figure, there is up to 26% improvement in 1 GB memory access performance, compared to the 2 MB page size, through more efficient use of the TLB and processor L1-L3 cache.

To enable this advanced attribute, see Backing Guest vRAM with 1GB Pages at https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.resmgmt.doc/GUID-F0E284A5-A6DD-477E-B80B-8EFDF814EE01.html
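The linked documentation enables 1 GB pages through an advanced virtual machine option. As a sketch of what the VM's advanced configuration entry looks like (verify the exact option name and prerequisites, such as a full memory reservation, against the documentation for your release):

```
sched.mem.lpage.enable1GPage = "TRUE"
```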

 

 

CPU Scheduler Enhancements

The scalability of the vSphere ESXi CPU scheduler is improved with every release to support current and future requirements. New in vSphere 6.7 is the elimination of the last global lock, which allows the scheduler to support tens of thousands of worlds (the various processes running in the VMkernel; for example, each virtual CPU has a world associated with it). This feature ensures vSphere maintains its lead as a platform for containers and microservices.

In vSphere 6.7 U2, there is a new scheduler option called the side-channel aware scheduler to address a security vulnerability known as L1TF.  For more information, including performance test results, see this blog: https://blogs.vmware.com/performance/2019/05/new-scheduler-option-for-vsphere-6-7-u2.html 

 

 

Virtual Per-VM EVC

vSphere previously implemented Enhanced vMotion Compatibility (EVC) as a cluster-wide attribute because, at the cluster-wide level, you can make certain assumptions about migrating a VM (for example, even if the processor is not the same across all ESXi hosts, EVC still works). However, this policy can cause problems when you try to migrate across vCenter Server instances or vSphere clusters. By implementing per-VM EVC, the EVC mode becomes an attribute of the VM rather than of the specific processor generation it happens to be booted on in the cluster.

 

Let's configure EVC for a specific VM:

  1. Click on Menu then Hosts and Clusters (it should be underlined)
  2. Select the perf-worker-01a VM
  3. Click the Configure tab
  4. Select VMware EVC from the list
  5. You'll note that "EVC is Disabled".  Click the EDIT... button to see what the choices are.

 

  1. Click the Enable EVC for Intel hosts radio button
  2. Click the VMware EVC Mode dropdown and choose Intel "Haswell" Generation. Read the Description of this mode, indicating this would restrict the VM to only Haswell or future-generation Intel processors.
  3. Click Cancel (since this is just an example and we don't actually want to apply EVC in the lab)

 

 

Virtual Hardware 14

Virtual Hardware 14 adds support for new capabilities covered in this module, such as per-VM EVC, persistent memory (NVDIMM) devices, and virtualization-based security (VBS).

 

Verify that perf-worker-01a VM is running Virtual Hardware version 14:

  1. Ensure you're in the Hosts and Clusters view (it should be underlined)
  2. Select the perf-worker-01a VM
  3. Click the Summary tab
  4. Note that it has been configured with VM version 14, which is only compatible with ESXi 6.7 and later.

Next, we'll show how to upgrade a legacy VM to this version.

 

 

Virtual Hardware 15 (ESXi 6.7 Update 2 and later)

 

Virtual Hardware 15, which is only supported for ESXi 6.7 U2 (and later) hosts, increases the maximum number of logical processors from 128 to 256.

 

 

Persistent Memory (PMEM)

 

Persistent memory (PMEM) is non-volatile memory delivered in a DRAM form factor (NVDIMM): it approaches the speed of DRAM but retains its contents through power cycles. It is a new layer that sits between NAND flash and DRAM, providing faster performance than flash while, unlike DRAM, being non-volatile.

vSphere 6.7 supports two modes of accessing persistent memory:

vPMEMDisk - presents NVDIMM capacity as a local host datastore which requires no guest operating system changes to leverage this technology.

vPMEM - exposes NVDIMM capacity to the virtual machine through a new virtual NVDIMM device. Guest operating systems use it directly as a block device or in DAX mode.

This chart shows the result of a performance test run using the Sysbench MySQL benchmark, which measures the throughput and latency of a MySQL workload. Here, we ran the tests with three tables, nine threads, and an 80-20 read-write ratio against a MySQL server in a VM hosted on vSphere 6.7.

The blue bars show throughput measured in transactions per second. The green line shows latency measured as the 95th percentile in milliseconds.

We observe that virtual PMEM can improve performance by up to 1.8x better throughput and 2.3x better latency over standard SSD technology.

 

 

vSphere 6.7 Persistent Memory Video (2:00)

Check out this video to learn more about how vSphere Persistent Memory can significantly enhance performance for both existing and new applications.

 

 

Virtualization-based Security (VBS) Overview

 

Microsoft VBS, a feature of Windows 10 and Windows Server 2016 operating systems, uses hardware and software virtualization to enhance system security by creating an isolated, hypervisor-restricted, specialized subsystem. Starting with vSphere 6.7 and Virtual Hardware 14, you can enable Microsoft virtualization-based security (VBS) on supported Windows guest operating systems.

VMware engineering made a number of improvements to vSphere to enhance performance in VBS-enabled virtual machines.

To measure the performance of a vSphere 6.7 virtual machine running Windows with VBS enabled, we used the HammerDB benchmark. The test simulated 22 virtual users generating an OLTP TPC-C-like workload that wrote to a Microsoft SQL Server 2016 database.

As shown, these engineering efforts resulted in a 33% improvement in transactions per minute.

 

 

Creating a VBS-enabled VM

 

Let's create a VBS-enabled VM:

  1. Ensure you're in the Hosts and Clusters view
  2. Select esx-02a.corp.local as the host
  3. Select the ACTIONS dropdown
  4. Select New Virtual Machine...

 

The Create a new virtual machine option will be highlighted.  Click the NEXT button.

 

Type a name for the VM (e.g., VBS) and click NEXT.

 

esx-02a.corp.local should already be selected as the host.  Click NEXT.

 

Select the RegionA01-ISCSI02 datastore and click NEXT.

 

Select the virtual machine version.  By default, ESXi 6.7 and later is selected, which is required for VBS, so click NEXT.

 

  1. Ensure Windows is selected as the Guest OS Family
  2. Change Guest OS Version to Microsoft Windows Server 2016 (64-bit)
  3. Note there is a new checkbox for VBS, Enable Windows Virtualization Based Security.  Check this box.
  4. Click NEXT.

 

  1. Click VM Options
  2. Expand the Boot Options section.
    Note that by enabling VBS, the necessary options, such as EFI firmware and Secure Boot, are automatically set.
  3. Click NEXT.

 

Note that Virtualization Based Security is Enabled, allowing easy provisioning of a VBS-enabled Windows Server 2016 VM, with the VBS prerequisites automatically set!

Click CANCEL, as we are not going to continue installing the guest.

 

 

Instant Clone

 

The time to fully deploy and boot 64 clones using vSphere 6.7 Instant Clone showed approximately 2.8x improvement over the older Linked Clone architecture.

You can use Instant Clone technology to create powered-on virtual machines from the running state of another powered-on virtual machine. The result of an Instant Clone operation is a new virtual machine that is identical to the source virtual machine. With Instant Clone, you can create new virtual machines from a controlled point in time. Instant cloning is very convenient for large-scale application deployments because it ensures memory efficiency and allows for creating numerous virtual machines on a single host.

This Instant Clone video demonstration shows how 20 CentOS VMs can be provisioned in two minutes (credit: LearnVMware.online).  The magic happens around 3:11 if you want to skip ahead!

 

Conclusion


Based on these performance, scalability, and feature improvements in vSphere 6.7, VMware continues to demonstrate industry-leading performance.


 

You've finished Module 1

Congratulations on completing Module 1.

If you are looking for additional information on vSphere 6.7 performance, check out these links:

 

 

Test Your Skills!

 

Now that you’ve completed this lab, try testing your skills with VMware Odyssey, our newest Hands-on Labs gamification program. We have taken Hands-on Labs to the next level by adding gamification elements to the labs you know and love. Experience the fully automated VMware Odyssey as you race against the clock to complete tasks and reach the highest ranking on the leaderboard. Try the vSphere Performance Odyssey lab

 

Module 2 - Right-Sizing vSphere VMs for Optimal Performance (45 minutes)

Introduction


 

Meet Melvin the Monster VM! vSphere 6.5 and later can handle Melvin and any other large, business-critical workloads (known affectionately as "wide" or "monster" VMs) without breaking a sweat! :-)

In all seriousness, this module discusses rules of thumb for right-sizing VMs -- particularly those that are so large that they span multiple physical processor or memory node boundaries.  We introduce terms such as vCPUs, pCPUs, Cores per Socket, and NUMA (pNUMA and vNUMA), and show how to right-size these VMs to perform optimally.


NUMA and vNUMA


UMA, NUMA, and vNUMA, oh my!  Let's look at these acronyms and see what they look like from an architectural perspective.


 

UMA

 

This is a bit of a history lesson, as UMA, or Uniform Memory Access, is no longer how modern servers are designed.  The reason why?

The Memory Controller (highlighted) quickly became a bottleneck; it is easy to see why, as every CPU request for memory or I/O had to pass through this single layer.  (Credit: frankdenneman.nl)

 

 

NUMA

NUMA moves away from a centralized pool of memory and introduces the concept of a topology. By classifying memory locations based on signal path length from the processor to the memory, latency and bandwidth bottlenecks can be avoided. This requires redesigning the whole system of processor and chipset. NUMA architectures gained popularity at the end of the 90's, when they were used in SGI supercomputers such as the Cray Origin 2000. NUMA helped identify the location of memory; in the case of these systems, the OS had to know which memory region in which chassis was holding a given range of memory.

In the first half of the 2000s, AMD brought NUMA to the enterprise landscape, where UMA systems reigned supreme. In 2003, the AMD Opteron family was introduced, featuring integrated memory controllers, with each CPU owning designated memory banks. Each CPU now has its own memory address space. A NUMA-optimized operating system such as ESXi allows workloads to consume memory from both memory address spaces while optimizing for local memory access. Let's use an example of a two-CPU system to clarify the distinction between local and remote memory access within a single system:

(Credit: frankdenneman.nl)

 

The memory connected to the memory controller of CPU1 is considered local memory. Memory connected to another CPU socket (CPU2) is considered foreign or remote for CPU1. Remote memory access has additional latency overhead compared to local memory access, since it has to traverse an interconnect (point-to-point link) and connect to the remote memory controller. As a result of the different memory locations, this system experiences “non-uniform” memory access time.

 

 

Without vNUMA

 

In this example, a VM with 12 vCPUs is running on a host with four NUMA nodes of six cores each. This VM is not presented with the physical NUMA configuration, so the guest OS and application see only a single NUMA node. This means that the guest has no way of placing processes and memory within a physical NUMA node.

We have poor memory locality.

 

 

With vNUMA

Since vSphere 5, ESXi has had the vNUMA (virtual NUMA) feature that can present multiple NUMA nodes to the guest operating system. Traditionally, virtual machines have only been presented with a single NUMA node regardless of the size of the VM and its underlying hardware. Larger and larger workloads are being virtualized, so it has become increasingly important that the guest OS and applications can make decisions on where to execute applications and where to place memory.

VMware ESXi is NUMA aware and always tries to fit a VM within a single physical NUMA node when possible. However, with very large "monster VMs", this isn't always possible.

The purpose of this section is to gain understanding of how vNUMA works by itself and in combination with the cores per socket feature.

 

In this example, a VM with 12 vCPUs is running on a host that has four NUMA nodes with six cores each. This VM is presented with the physical NUMA configuration, and hence the guest OS and application see two NUMA nodes. This means that the guest can place processes and accompanying memory within a physical NUMA node when possible.

We have good memory locality.
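The sizing shown in the two examples above can be sketched as a simple calculation. This is only an illustration of the rule of thumb; ESXi's actual placement logic weighs additional factors such as memory size and host load:

```python
import math

# Sketch of the vNUMA sizing shown above: a VM larger than one physical
# NUMA node is presented with the minimum number of virtual NUMA nodes
# needed to cover its vCPUs. (Illustrative only; ESXi's real placement
# logic considers more factors, such as memory size and host load.)

def vnuma_nodes(vcpus, cores_per_pnuma_node):
    return max(1, math.ceil(vcpus / cores_per_pnuma_node))

# The 12-vCPU VM on a host with 6-core NUMA nodes from the example:
print(vnuma_nodes(12, 6))  # 2 virtual NUMA nodes -> good memory locality
print(vnuma_nodes(6, 6))   # fits in one node -> 1 vNUMA node
```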

 

vCPU and vNUMA Right-Sizing


Using virtualization, we enjoy the flexibility to quickly create virtual machines with various virtual CPU (vCPU) configurations for a diverse set of workloads.

However, as we virtualize larger and more demanding workloads such as databases, on top of the latest generations of processors with up to 28 cores, special care must be taken in vCPU and vNUMA configuration to ensure optimal performance.


 

vCPUs, Cores per Socket, vSockets, CPU Hot Plug/Hot Add

 

The most important values are shown in this screenshot, taken directly from the vSphere Web Client.

NOTE: You must expand the CPU dropdown to view/change some of these fields!

  1. CPU: This is the total number of vCPUs presented to the guest OS (20 in this example)
  2. Cores per Socket: If this value is 1 (the default), all CPUs are presented to the guest as single-core processors. For most VMs, the default value is OK, but there are definitely instances when you should consider increasing this value, which we'll discuss in a bit.

    In this example, we've increased it to 10, which means the guest will see multi-core (10-core) processors.
  3. Sockets: This is not a configurable value; it is simply the number of CPUs divided by Cores per Socket: in this example, 20 / 10 = 2.
    Also called "virtual sockets" or "vSockets".
  4. CPU Hot Plug: Also known as CPU Hot Add, this is a checkbox to allow adding more CPUs "on the fly" (while the guest is powered on).

    If you have right-sized your VM from the beginning, you should not enable this feature, because it has the major downside of disabling vNUMA. For more information, see vNUMA is disabled if VCPU hotplug is enabled (KB 2040375)

Let's refer to this 20 vCPU VM, as configured, as 2 Sockets x 10 Cores per Socket.
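Since Sockets is derived rather than configured, the relationship between the three values can be sketched in a few lines (a hypothetical helper, not a vSphere API):

```python
# The Sockets value shown in the UI is derived, not configured:
# sockets = total vCPUs / Cores per Socket. A sketch with validation;
# this is an illustrative helper, not a vSphere API.

def virtual_sockets(vcpus, cores_per_socket):
    if vcpus % cores_per_socket != 0:
        raise ValueError("vCPU count must be a multiple of Cores per Socket")
    return vcpus // cores_per_socket

print(virtual_sockets(20, 10))  # 2 sockets x 10 cores, as in the example
print(virtual_sockets(20, 1))   # default: 20 single-core virtual sockets
```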

 

 

Cores per Socket: Licensing Considerations

 

Let's talk about the Cores per Socket value.  As mentioned earlier, this defaults to 1, which means that every virtual CPU is presented as a socket to the guest VM. In most cases, there's no issue there.

However, this may not be ideal from a Microsoft licensing perspective, where the operating system and/or application is sometimes licensed per processor (socket).  Here are a few examples:

 

 

vNUMA Behavior Changes in vSphere 6.5 and above

In an effort to automate and simplify configurations for optimal performance, vSphere 6.5 introduced a few changes in vNUMA behavior.  Thanks to Frank Denneman for thoroughly documenting them here:

http://frankdenneman.nl/2016/12/12/decoupling-cores-per-socket-virtual-numa-topology-vsphere-6-5/

Essentially, the vNUMA presentation under vSphere 6.5/6.7 is no longer affected by Cores per Socket. vSphere will now always present the optimal vNUMA topology (unless you use advanced settings).

However, you should still choose the CPU and Cores per Socket values wisely.  Read on for some best practices.

 

 

Best Practices for Cores per Socket and vNUMA

In general, the following best practices should be followed regarding vNUMA and Cores per Socket:

There are many Advanced Virtual NUMA Attributes (click for a full list); here are a few guidelines, but in general, the defaults are best:

 

Of course, a picture (or in this case, a table) is worth a thousand words.  This table outlines how a VM could (should) be configured on a dual-socket, 10-core physical host to ensure an optimal vNUMA topology and performance, regardless of vSphere version.
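The table's guidance for this dual-socket, 10-core host boils down to a rule of thumb that can be sketched as follows (an illustration of the guideline, not vSphere's actual algorithm):

```python
import math

# Rule-of-thumb sketch of the guidance for a dual-socket,
# 10-core-per-socket host: keep a VM inside one physical NUMA node when
# it fits; otherwise split its vCPUs evenly across the fewest nodes
# needed. (A sketch of the guideline above, not vSphere's algorithm.)

def suggest_topology(vcpus, pnuma_cores=10, pnuma_nodes=2):
    nodes_needed = min(pnuma_nodes, math.ceil(vcpus / pnuma_cores))
    if vcpus % nodes_needed != 0:
        raise ValueError("choose a vCPU count divisible by the node count")
    cores_per_socket = vcpus // nodes_needed
    return nodes_needed, cores_per_socket  # (virtual sockets, cores/socket)

print(suggest_topology(8))   # (1, 8): fits inside one 10-core NUMA node
print(suggest_topology(20))  # (2, 10): mirrors the physical topology
```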

 

Guest OS Tools to View vCPUs/vNUMA


We saw how to use the vSphere Client to right-size a virtual machine's vCPUs and Cores per Socket.

What do these topologies look like from the guest OS perspective?  Let's look at some examples of tools for Windows and Linux that let us verify that the guest is seeing the expected processor and NUMA configurations.


 

vSphere Client CPU/Cores per Socket Example

 

Although shown before, it is worth repeating:

  1. CPU: This is the total number of vCPUs presented to the guest OS (20 in this example)
  2. Cores per Socket: If this value is 1 (the default), all CPUs are presented to the guest as single-core processors.

    For most VMs, the default value is OK, but there are definitely instances when you should consider increasing this value, which we'll discuss in a bit.

    In this example, we've increased it to 10, which means the guest will see multi-core (10-core) processors.

  3. Sockets: This is not a configurable value; it is simply the number of CPUs divided by Cores per Socket: in this example, 20 / 10 = 2.

    Also called "virtual sockets" or "vSockets".

  4. CPU Hot Plug: Also known as CPU Hot Add, this is a checkbox to allow adding more CPUs "on the fly" (while the guest is powered on).

    If you have right-sized your VM from the beginning, you should not enable this feature, because it has the major downside of disabling vNUMA.

Let's refer to this 20 vCPU VM, as configured, as 2 Sockets x 10 Cores per Socket.

 

 

Windows: Coreinfo

From the Microsoft Sysinternals web site:  

Coreinfo is a command-line utility that shows you the mapping between logical processors and the physical processor, NUMA node, and socket on which they reside, as well as the caches assigned to each logical processor. It uses the Windows GetLogicalProcessorInformation function to obtain this information and prints it to the screen, representing a mapping to a logical processor with an asterisk, e.g. ‘*’.

Coreinfo is useful for gaining insight into the processor and cache topology of your system.

Parameter  Description
-c         Dump information on cores.
-f         Dump core feature information.
-g         Dump information on groups.
-l         Dump information on caches.
-n         Dump information on NUMA nodes.
-s         Dump information on sockets.
-m         Dump NUMA access cost.
-v         Dump only virtualization-related features.

 

Here we see the output of coreinfo (with no command line options) on the aforementioned 20 vCPU VM.  Here is a breakdown of the highlights:

  1. Logical to Physical Processor Map: This section confirms Windows sees 20 vCPUs (note that it presents them as Logical and Physical Processors, with a 1:1 mapping)
  2. Logical Processor to Socket Map: This section confirms Windows sees 2 Sockets, with 10 Logical Processors on each Socket.  We can also refer to these as vSockets.
  3. Logical Processor to NUMA Node Map: This section confirms that Windows sees 2 NUMA Nodes, with 10 Logical Processors on each Node.  Since this is a VM, we call these vNUMA nodes.
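Coreinfo's map lines mark each logical processor on a node with an asterisk, so they can be tallied mechanically. Here is a Python sketch, using illustrative sample lines rather than real lab output:

```python
# Sketch: count logical processors per NUMA node from coreinfo-style
# map lines, where each '*' marks a logical processor on that node.
# (Sample lines are illustrative, not captured from the lab.)
sample = """\
NUMA Node 0:\t**********----------
NUMA Node 1:\t----------**********
"""

cpus_per_node = {}
for line in sample.splitlines():
    node, _, cpu_map = line.partition(":")
    cpus_per_node[node.strip()] = cpu_map.count("*")

print(cpus_per_node)  # → {'NUMA Node 0': 10, 'NUMA Node 1': 10}
```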

 

 

Linux: numactl

For Linux, the most useful tool for viewing virtual NUMA information is numactl.  Note that you may need to install the package that provides the numactl tool for your OS (for RHEL/CentOS 7, an appropriate command is yum install numactl).


 

Here we see the output of numactl -H (the -H is an abbreviation for hardware; use the man numactl command to see all of the available parameters).  Here is a quick explanation:

  1. numactl -H: This is the command we typed to get the output.
  2. available: 2 nodes (0-1): This section confirms Linux sees 2 NUMA nodes, also known as vNUMA nodes.
  3. node 0 cpus, node 1 cpus: This section confirms Linux sees 10 logical processors on each NUMA node (20 vCPUs total).
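The same numbers can be pulled out of captured numactl -H text programmatically. A Python sketch, using an illustrative excerpt modeled on this VM (not captured verbatim from the lab):

```python
import re

# Illustrative excerpt of `numactl -H` output for the 2 x 10 VM in this
# module (fabricated for the example).
sample = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9
node 1 cpus: 10 11 12 13 14 15 16 17 18 19
"""

nodes = int(re.search(r"available: (\d+) nodes", sample).group(1))
cpus = {int(m.group(1)): len(m.group(2).split())
        for m in re.finditer(r"node (\d+) cpus: ([\d ]+)", sample)}

print(nodes, cpus)  # → 2 {0: 10, 1: 10}
```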

 

Conclusion


Congratulations! You now know how to right size VMs optimally for vSphere 6.7!


 

Resources/Helpful Links

For more information about right-sizing VMs, NUMA/vNUMA, and vSphere performance in general, here are some helpful links:

 

Module 3 - Introduction to esxtop (30 minutes)

Introduction to esxtop


There are several tools to monitor and diagnose performance in vSphere environments. esxtop helps you diagnose and further investigate performance issues that you've already identified through the vSphere Client or another tool or method. esxtop is not designed for long-term performance monitoring, but it is great for deep investigation or for monitoring a specific issue or VM on a specific host over a defined period of time.

In this lab, which should take about 30 minutes, we use esxtop to dive into performance troubleshooting of CPU, memory, storage, network, and power utilization. The goal of this module is to expose you to the different views in esxtop and to present you with different loads in each view.  This is not meant to be a deep dive into esxtop, but to get you comfortable with the tool so that you can use it in your own environment.

To learn more about the metrics in esxtop and what they mean, we recommend that you look at the links at the end of this module.

For day-to-day performance monitoring of an entire vSphere environment, the VMware vRealize® suite offers a hybrid cloud management platform that provides comprehensive management of IT services on VMware vSphere and other hypervisors. vRealize Operations™ (vROps) is a powerful application you can use to monitor your entire virtual infrastructure. It incorporates high-level dashboard views, custom dashboards, and built-in intelligence to analyze the data and identify possible problems. We also recommend looking at the other VMware vRealize Hands-on Labs when you have finished with this one for a better understanding of day-to-day monitoring.


Show esxtop CPU features


You can use esxtop to diagnose performance issues involving almost any aspect of performance, from both the host and virtual machine perspectives. This section shows you how to view both VM and host CPU performance using esxtop in interactive mode.


 

Monitor VM vCPU load

 

 

Open a PowerShell window

 

Click on the "Windows PowerShell" icon in the taskbar.

 

 

Start CPU load on VMs

 

Type

.\StartCPUTest2.ps1

and press Enter.  Wait until you see the RDP sessions to continue.  

 

 

Open PuTTY

 

Click the PuTTY icon on the taskbar.

 

 

SSH to esx-01a

 

  1. Select host esx-01a.corp.local
  2. Click Open

 

 

Start esxtop

 

  1. From the ESXi shell, type
esxtop

and press Enter.

  2. Click the Maximize icon so we can see the maximum amount of information.

 

 

Select the CPU view

 

If you just started esxtop, you are in the CPU view by default.  

If you happen to be on a different screen, pressing "c" gets you back to this view.

By default, the screen refreshes every five seconds. To change this, for example to set the refresh rate to two seconds, press "s", then type "2" and press Enter:

s 2

Let's filter this view (remove some fields) by pressing the letter "f":

f

 

 

Filter the fields displayed

 

Since we don't have much screen space, let's remove (filter out) the ID and GID fields.

Do this by typing the following letters (NOTE: these toggles are case sensitive, so make sure they are capitalized!):

AB

You should see the * next to A: and B: disappear.  Press Enter to resume the esxtop screen.

 

 

Filter only VMs

 

By default, this screen shows performance counters for both virtual machines and ESXi host processes.

Let's filter out everything except for virtual machines.  To do this, type a capital "V":

V

 

 

Monitor VM load

 

Monitor the load on the two Worker VM's: perf-worker-01a and perf-worker-01b:

  1. Both VMs should be running at or near 100% utilization (%USED). If not, wait a moment for the CPU workload to start up.
  2. Another important metric to monitor is %RDY (CPU Ready).  This metric is the percentage of time a "world" is ready to run but waiting on the CPU scheduler for approval.  This metric can go up to 100% per vCPU, which means that with two vCPUs, it has a maximum value of 200%.  A good guideline is to keep this value below 5% per vCPU, but it always depends on the application.

    Look at the worker VMs to see if they go above the 5% per vCPU threshold. To force esxtop to refresh immediately, press the Space bar.
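The 5% per vCPU guideline can be captured in a small Python sketch (the ready_time_ok helper is hypothetical, for illustration only):

```python
def ready_time_ok(rdy_percent: float, vcpus: int, per_vcpu_limit: float = 5.0) -> bool:
    """Check esxtop %RDY against the rule-of-thumb per-vCPU threshold.

    %RDY is reported as a sum across all vCPUs, so a 2-vCPU VM can
    show up to 200%; the guideline is at most 5% per vCPU.
    """
    return rdy_percent / vcpus <= per_vcpu_limit

print(ready_time_ok(4.0, 2))   # → True  (2% per vCPU)
print(ready_time_ok(22.0, 2))  # → False (11% per vCPU, investigate)
```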

 

 

Open Google Chrome

 

Click the Google Chrome icon to open up a Web browser.

 

 

Login to the vSphere Client

 

  1. Make sure the Use Windows session authentication box is checked.
  2. Click the Login button to login to the vSphere Client.

 

 

Edit Settings of perf-worker-01a

 

Let's see how perf-worker-01a is configured:

  1. Click on perf-worker-01a, which is hosted on esx-01a.corp.local
  2. Click the Actions dropdown
  3. Click Edit Settings…

 

 

Add a vCPU to perf-worker-01a

 

Since we previously enabled CPU Hot Add, we can add another vCPU while the VM is running:

  1. Expand the CPU dropdown
  2. Change CPU to 2
  3. Click OK to save

 

 

Edit Settings of perf-worker-01b

 

Let's add a virtual CPU to perf-worker-01b as well to improve performance.

  1. Right click on the perf-worker-01b virtual machine
  2. Click Edit Settings…

 

 

Add a vCPU to perf-worker-01b

 

  1. Change CPU to 2
  2. Click OK to save

 

 

Switch back to esxtop/PuTTY

 

Return to the PuTTY (esxtop) window by clicking esx-01a.corp.local on the taskbar to see what has changed.

 

 

Monitor %USED and %RDY

 

Now that you've added an additional vCPU to each VM, you should see results like the screenshot above:

 

 

Monitor %USED and %RDY (continued)

 

After a few minutes, the CPU benchmark starts to use the additional vCPUs, and %RDY increases even more. This is due to CPU contention and SMP co-scheduling (note the increased %CSTP) on the system. The ESXi host has two active virtual machines, each with two vCPUs, and these four vCPUs all attempting to run at 100% results in a fight for resources. Remember that the ESXi host itself also requires some physical CPU resources to run, which adds to the CPU contention.

 

 

Monitor host CPU power

 

vSphere 6.5 added a new esxtop screen that lets you monitor the host CPU power statistics. To view the host power screen in esxtop, type a lowercase "p":

p

 

Press the letter "f" to see available fields to add to the screen:

f

Press the letter "f" again to add %Aperf/Mperf then press Enter:

f

 

This screen shows:

  1. Current power usage in watts
  2. The number of processors
  3. CPU %USED and %UTIL
  4. Turbo boost as a ratio of clock speeds with and without Turbo (%A/MPERF)

The metric to watch is: 

%A/MPERF:

This ratio column identifies the frequency at which the processor is currently running.  APERF and MPERF are two hardware registers that keep track of the actual and nominal frequencies of the processor. You can't see actual values here because of the nature of the Hands On Lab.

However, look at the following image captured from a physical host. It shows a host running VMware vSphere 6.7 U2 with 36 logical CPUs (18 physical CPUs with Hyperthreading enabled) each at 2.8 GHz. The host serves two VMs, and we started a CPU-intensive quad-threaded process on each VM to generate load. 

 

 

Actual and Nominal Frequency

 

  1. The host is using 358 watts
  2. The host has 36 logical processors (18 physical cores, doubled by Hyperthreading)
  3. %USED and %UTIL vary across the processors. Eight of the CPUs serve the eight CPU-intensive worker threads on the VMs
  4. The Aperf/Mperf ratio (%A/MPERF) at about 122% means that the processor is running at about 3.4 GHz:

2.8 GHz × 122% = approximately 3.4 GHz
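The frequency estimate above is just the nominal clock scaled by the ratio; as a Python sketch (effective_ghz is a hypothetical helper):

```python
def effective_ghz(nominal_ghz: float, aperf_mperf_percent: float) -> float:
    """Estimate the running clock speed from esxtop's %A/MPERF ratio."""
    return nominal_ghz * aperf_mperf_percent / 100.0

# The example above: 2.8 GHz nominal at 122% A/MPERF.
print(round(effective_ghz(2.8, 122), 2))  # → 3.42 (about 3.4 GHz, Turbo Boost active)
```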

For more details on host power policies, see .

 

Show esxtop memory features


You can use esxtop to diagnose performance issues involving almost any aspect of performance, from both the host and virtual machine perspectives. This section shows you how to view memory performance using esxtop in interactive mode.


 

Open a PowerShell Window (if necessary)

 

Click on the Windows PowerShell icon in the taskbar to open a command prompt.

NOTE: If you already have one open, just switch back to that window.

 

 

Reset Lab

 

Type

.\StopLabVMs.ps1

and press Enter.  This resets the lab into a base configuration.

 

 

Start Memory Test

 

In the PowerShell window type

.\StartMemoryTest.ps1

Then press Enter to start the memory load.

You can continue to the next step while the script is running, but please don't close any windows since that stops the memory load.

 

 

Select the esxtop Memory view

 

In the PuTTY window type

m

to see the memory view.

 

 

Select correct fields

 

Type

f

to see the list of available counters.

Since we don't have much screen space, let's remove the two counters ID and GID.

To do this, press (capital letters)

BH

Press Enter to return to the esxtop screen.

 

 

See only VMs

 

This screen shows memory performance counters for both virtual machines and ESXi host processes.

To see only values for virtual machines, press (capital)

V

You can press (capital) V again to toggle between all processes and only VM processes.

 

 

Monitor memory load with no contention

 

When the load on the worker VMs begins, you can see them at the top of the esxtop window.

Some good metrics to look at are:

MCTL: 

Indicates whether the balloon driver is installed.  If it is not, then it's a good idea to fix that first.

MCTLSZ: 

Shows how inflated the balloon is, that is, how much memory the balloon driver has reclaimed from the guest operating system. This should be 0.

SWCUR: 

Shows how much memory the VM has currently swapped to disk. This should be 0, but may be acceptable if SWR/S and SWW/S are low.

SWR/S: 

Shows reads from the swap file.

SWW/S: 

Shows writes to the swap file.

All of these counters should look good at this point. However, due to the nature of the nested lab, results can vary, so look around.
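The balloon and swap checks described above can be summarized in a Python sketch (hypothetical helper and thresholds, for illustration only; real triage depends on the workload):

```python
def memory_warnings(mctl_installed: bool, mctlsz_mb: float,
                    swcur_mb: float, swr_s: float, sww_s: float) -> list:
    """Flag the esxtop memory counters described above."""
    warnings = []
    if not mctl_installed:
        warnings.append("balloon driver not installed")
    if mctlsz_mb > 0:
        warnings.append("ballooning in progress")
    if swcur_mb > 0 and (swr_s > 0 or sww_s > 0):
        warnings.append("active swapping")
    return warnings

print(memory_warnings(True, 0, 0, 0, 0))       # → [] (healthy)
print(memory_warnings(True, 400, 128, 5, 12))  # → ['ballooning in progress', 'active swapping']
```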

 

 

Power on perf-worker-04a

 

  1. Click to focus on the vSphere Web Client browser window. Right click on perf-worker-04a
  2. Select Power
  3. Click Power On

 

 

Monitor memory load under contention

 

Now that we have created memory contention on the ESXi host, we can see:

  1. perf-worker-02a and 03a are ballooning around 400MB each
  2. perf-worker-02a, 03a and 04a are swapping to disk, indicating too much memory strain in this environment

 

 

Stop load on workers

 

  1. To stop the load on the workers that appeared after you started the load script, close the two VM Stats Collector windows.

 

Show esxtop storage features


You can use esxtop to diagnose performance issues involving almost any aspect of performance, from both the host and virtual machine perspectives. This section shows you how to view storage performance using esxtop in interactive mode.


 

Open a PowerShell Window (if necessary)

 

Click on the Windows PowerShell icon in the taskbar to open a command prompt

NOTE: If you already have one open, just switch back to that window.

 

 

Reset Lab

 

Type

.\StopLabVMs.ps1

and press Enter.  This resets the lab to a base configuration.  

 

 

Start Storage Test

 

In the PowerShell window type

.\StartStorageTest.ps1

and press Enter to start the lab.

The lab takes about five minutes to prepare. Feel free to continue on to the other steps while the script finishes.

After you start the script, be sure that you don't close any windows that appear.

 

 

Different views

 

When looking at storage in esxtop, you have multiple options to choose from.

esxtop shows the storage statistics in three different screens: 

  • adapter screen (d)
  • device screen (u)
  • vm screen (v)

and

  • vSAN (x)

Let's focus on the VM screen in this module.

In the Putty window type (lower case)

v

to see the storage vm view.

 

 

Select correct fields

 

To see the available list of counters, type

f

Let's add the ID field by pressing (capital letter) A:

A

Press Enter when finished. 

 

 

Display Iometer load on VMs

 

The StartStorageTest.ps1 script that we executed in the beginning of this lab should be finished now, and you should have two Iometer windows on your desktop that look like the above image.

If not, run 

.\StartStorageTest.ps1 

again, and wait for it to finish.

 

 

Monitor VM load

 

You have four running VMs in the Lab.

Two of them are running Iometer workloads, and the other two are iSCSI storage targets using RAM disk. Because they are using a RAM disk as storage target, they do not generate any disk I/O.

The metrics to watch are :

CMDS/S: 

This is the total number of commands per second, which includes IOPS (Input/Output Operations Per Second) as well as other SCSI commands such as SCSI reservations, locks, vendor string requests, and unit attention commands being sent to or coming from the device or virtual machine.

In most cases, CMDS/s = IOPS unless there are many metadata operations (such as SCSI reservations).

LAT/rd and LAT/wr: 

These indicate the average response time of read and write I/O as seen by the VM.

In this case, you should see high CMDS/s values on the worker VMs that are currently running the Iometer load (perf-worker-02a and 03a). This indicates that the VMs are generating a lot of I/O.

You can also observe a high value in LAT/wr since the VMs are doing only writes.

The numbers may be different on your screen due to the nature of the Hands On Labs.

 

 

Device or Kernel latency

 

Press

d

to go to the Device view.

Here you can see that the storage workload is on device vmhba65, which is the software iSCSI adapter. Look for DAVG (device latency) and KAVG (kernel latency).

  1. DAVG should be below 25ms
  2. KAVG should be very low and always below 2ms

In this example the latencies are within acceptable values.
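Those two rules of thumb can be expressed as a quick Python check (storage_latency_ok is a hypothetical helper):

```python
def storage_latency_ok(davg_ms: float, kavg_ms: float) -> bool:
    """Apply the guidelines above: DAVG (device latency) below 25 ms
    and KAVG (kernel latency) below 2 ms."""
    return davg_ms < 25.0 and kavg_ms < 2.0

print(storage_latency_ok(8.2, 0.1))   # → True  (within acceptable values)
print(storage_latency_ok(40.0, 0.1))  # → False (device latency too high)
```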

 

 

Stop load on workers

 

Close BOTH Iometer windows:

  1. When finished, stop the Iometer workloads by clicking the red STOP button in each Iometer window
  2. Click the red X in the top right corner of each window to close it

 

 

Wait for PowerShell script to complete

 

After both Iometer windows are closed, switch back to the PowerShell window and wait for the script to clean up the environment before proceeding. Once you see this screen, you can proceed.

 

Show esxtop network features


You can use esxtop to diagnose performance issues involving almost any aspect of performance, from both the host and virtual machine perspectives. This section shows you how to view network performance using esxtop in interactive mode.


 

Open a PowerShell Window (if necessary)

 

Click on the Windows PowerShell icon in the taskbar to open a command prompt

NOTE: If you already have one open, just switch back to that window.

 

 

Start Network Test

 

In the PowerShell window type

 .\StartNetTest.ps1

Press Enter.

Continue with the next steps while the script runs since it takes a few minutes to load.

 

 

Select the network view

 

In the PuTTY window type

n

to see the networking view

 

 

Select correct fields

 

To see the list of available counters type

f

Since there is not a lot of screen space, let's remove the two counters PORT-ID and DNAME

To do this, press (capital letters)

AF

Press Enter when finished.

 

 

Monitor load

 

Note that the result might be different on your screen due to the load of the environment where the Hands On Lab is running.

The screen updates automatically. To force a refresh, press

space

Take particular note of these metrics:

  1. PKTTX/s (Packets Transmitted per second) and MbTX/s (Megabits Transmitted per Second): transmit throughput of the NIC/VM
  2. PKTRX/s (Packets Received per second) and MbRX/s (Megabits Received per Second): receive throughput of the NIC/VM
  3. %DRPTX/RX: if these are non-zero and/or increase over time, your network utilization may be too high

Note that the StartNetTest.ps1 script that you ran in the first step starts the VMs and then waits for two minutes before running a network load for five minutes.

Depending on how fast you were at getting to this step, you might not see any load if it took you more than seven minutes. You can restart the network load in the next step if you need to.
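A %DRP-style figure is simply dropped packets as a share of all packets for that direction; as a Python sketch (hypothetical helper, not the exact esxtop formula):

```python
def drop_percent(dropped_pkts: float, sent_pkts: float) -> float:
    """Dropped packets as a percentage of all packets queued for a
    direction. Non-zero or rising values suggest network saturation."""
    total = dropped_pkts + sent_pkts
    return 0.0 if total == 0 else 100.0 * dropped_pkts / total

print(drop_percent(0, 50000))    # → 0.0 (healthy)
print(drop_percent(250, 49750))  # → 0.5 (investigate rising drops)
```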

 

 

Restart network load

 

If you want to run the network load for another five minutes, return to the PowerShell window.

In PowerShell type

 .\StartupNetLoad.bat

Press Enter.

The network load runs for another five minutes. While you wait, you can continue to explore esxtop.

 

 

Network workload complete

 

As described previously, the load stops by itself.  When the PowerShell window says "Network load complete", the script is no longer generating load and the test is finished.

 

Conclusion and Clean-Up



 

Key takeaways

During this lab we learned how to use esxtop to monitor load in CPU, memory, storage, network, and power views.

We have only scratched the surface of what esxtop can do. In the next module, we take a closer look at using esxtop in your own datacenter.

If you want to know more about esxtop, see these articles:

 

 

Clean up procedure

To free up resources for the remaining parts of this lab, we need to shut down all used virtual machines and reset the configuration.

 

 

Open a PowerShell Window (if necessary)

 

Click on the Windows PowerShell icon in the taskbar to open a command prompt

NOTE: If you already have one open, just switch back to that window.

 

 

Reset Lab

 

To reset the lab, type

.\StopLabVMs.ps1

and press Enter.  This resets the lab into a base configuration. You now can move on to another module.

 

 

Conclusion

This concludes the Introduction to esxtop module. We hope you have enjoyed taking it. To learn more about esxtop's advanced features, such as running in batch mode and viewing collected statistics, continue to the next module.

Please remember to fill out the survey when you finish.

 

Module 4 - esxtop in Real-World Use Cases (30 minutes)

esxtop in Real-World Use Cases


This module takes what you learned in the previous Introduction to esxtop module and applies it to real-world scenarios. It expands on some of the advanced esxtop metrics available to monitor. We also discuss how to run esxtop in batch mode, save its output into Comma-Separated Value (.csv) format, and graph that output with a graphical interface. 


Creating an esxtop resource file


Because the VM and host performance statistics on the esxtop screens can be overwhelming, esxtop lets you create a resource file (rc for short) that automatically filters the displays and saves only the information that interests you.

Once you become familiar with esxtop and begin using it interactively or in batch mode, you can see that it generates screens full of detailed VM and host information. When you have many VMs on a large host, the screen display can be difficult to manage, so esxtop lets you create one or more resource files that initialize the display to capture a subset of the performance statistics. The default file name is ~/.esxtop60rc.  Let's learn how to use it and trim down the number of fields to report.

If you took Module 3 - Introduction to esxtop (30 minutes) then you're already familiar with adding and removing fields from esxtop. In this module, we filter esxtop to capture commonly monitored performance statistics in the CPU, memory, I/O, network, and power components.

First, let's log into a host and start esxtop.


 

Open PuTTY

 

Click the PuTTY icon on the taskbar

 

 

SSH to esx-01a

 

  1. Select host esx-01a.corp.local
  2. Click Open

 

 

Start esxtop

 

From the ESXi shell, type esxtop on the command line:

esxtop

and press Enter.

 

If you just started esxtop, you are in the CPU view by default.  

If you happen to be on a different screen, pressing "c" gets you back to this view.

 

 

Stretch the esxtop window

 

Some of the columns exceed the width of the window, so hover the cursor over the right edge of the window, then click and drag it to the right to widen the window.

 

Now the PuTTY window is wide enough to display most or all of the available columns (as highlighted above).

 

 

Customize the CPU view

Let's filter this view (add and remove some fields) by pressing the letter "f":

f

 

 

Customize host CPU power fields

 

By default, this screen shows performance counters for both virtual machines and ESXi host processes. To view the host power screen in esxtop, type a lowercase "p":

p

 

 

Customize the memory view

 

 

 

Customize I/O views

 

To display the disk adapter statistics, press the letter "d":

d

 

 

 

Customize network view

 

To display the network statistics, press the letter "n":

n

 

To change the displayed fields, press the letter "f". To add identification of uplinks (UPLINK), press "B" and then press Enter:

B

 

As you can see, you have added the previously hidden UPLINK field to see which networks have uplinks.

 

 

Write to resource file

 

To write these custom settings to a resource file we'll name .esxtopHOL, type "W .esxtopHOL" then press Enter:

W .esxtopHOL

Note: Don't edit these resource files manually! To make changes, run esxtop and follow the preceding steps to update the resource file.

When you invoke esxtop, either interactively or in batch mode, it looks for the default resource file and automatically applies any filters it finds.

 

 

View the resource file's contents

You can create different resource files for specific components. For example, you may want to create a CPU-only resource file, memory-only, and so forth.

Type "q" to exit esxtop.

To see the differences between the default resource file .esxtop60rc and your custom .esxtopHOL resource file:

diff .esxtop60rc .esxtopHOL
--- .esxtop60rc
+++ .esxtopHOL
@@ -1,10 +1,10 @@
-ABcDEFghij
-aBcDefgHijKLmnOpq
-ABCdEfGhijkl
-ABcdeFGhIjklmnop
+abcDEFghiJ
+abcDefgHijkLmnOpq
+AbcdEfGhijkl
+AbcdeFGhIJKlmnop
 aBCDEfghIJKl
-AbcDEFGHIJKLMNopq
+ABcDEFGHIJKLMNopq
 ABCDeF
-ABCDef
+ABCDeF
 ABCd
-5c
+5n

 

 

Start esxtop with your customized views

Let's say you want to customize your view for capturing performance statistics at different times of the day for several minutes at a time. This is especially useful when using esxtop in batch mode. You can create several resource files and use them to filter your initial view when invoking esxtop whether interactively or through batch.

To start esxtop with your custom resource file, type:

esxtop -c .esxtopHOL

For more information on using esxtop in batch mode to capture statistics and analyze them later, see the next section.

 

Saving esxtop statistics with batch mode


This module discusses creating a Comma-Separated Value (.csv) file from the output of esxtop in batch mode, both to share with colleagues and to analyze the statistics you collect.

In the previous module we filtered the fields to display in esxtop and saved our preferences in the esxtop resource file. Now that we're collecting only the statistics we find interesting, we can invoke esxtop in batch mode and capture the statistics in a Comma-Separated Values (.csv) file to share with colleagues and graph to look at trends during the collection period.


 

Invoking esxtop interactively with a resource file

As we saw in the previous module, you can invoke esxtop interactively and apply your custom resource file settings with the -c switch. 

For example, if you created a resource file for all fields under the CPU display and named it .esxtopallcpustats, you can invoke esxtop and use the resource file to apply your preferred filters:

esxtop -c .esxtopallcpustats 

For this lab, we already created a sample resource file named .esxtopHOL. This resource file captures only the statistics we selected in the previous module.

 

 

Start the workload

Let's start a workload and use esxtop in batch mode to capture only the statistics we requested.

 

 

Open a PowerShell window

 

If you don't already have a "Windows PowerShell" window open, click on the "Windows PowerShell" icon in the taskbar.

 

 

Start load on VMs

Type

.\StartCPUTest2.ps1

and press Enter. Depending on the load of the lab systems, this may take several minutes.

Wait until you see the RDP sessions to continue.

 

 

Open PuTTY

 

Click the PuTTY icon on the taskbar.

 

 

SSH to esx-01a

 

  1. Select host esx-01a.corp.local
  2. Click Open

 

 

Start esxtop with a resource file in batch mode

 

Invoke esxtop in batch mode and apply our custom settings with the -b, -d, and -n switches:

esxtop -b -d 2 -n 100 -c .esxtopHOL > /tmp/esxtop_HOLstats.csv

where:

  • -b runs esxtop in batch mode
  • -d 2 sets the delay between samples to two seconds
  • -n 100 collects 100 samples (iterations)
  • -c .esxtopHOL applies our custom resource file

The above command collects 100 samples, one every two seconds over the course of 200 seconds, and writes the statistics to a file named /tmp/esxtop_HOLstats.csv.
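The relationship between the -d and -n switches and the capture duration is simple multiplication; as a Python sketch (hypothetical helper):

```python
def batch_capture_seconds(delay_s: int, samples: int) -> int:
    """Duration of an `esxtop -b -d <delay> -n <samples>` capture."""
    return delay_s * samples

# The command above: 100 samples, one every 2 seconds.
print(batch_capture_seconds(2, 100))   # → 200 seconds
# A longer capture of 360 samples every 30 seconds:
print(batch_capture_seconds(30, 360))  # → 10800 seconds (3 hours)
```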

 

After 200 seconds, esxtop finishes.

 

 

Review contents of the .csv file

 

To look inside the esxtop output .csv file, type "more /tmp/esxtop_HOLstats.csv" and press Enter:

more /tmp/esxtop_HOLstats.csv

 

As you can see, the output .csv file contains all the statistics we selected. You can now use NMON Visualizer to graph the statistics, as described in the next module. You can also copy the .csv file to a Windows system and use PERFMON to analyze the statistics you collected.
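esxtop batch output follows the PERFMON-compatible PDH-CSV layout (a timestamp column followed by one column per counter), so it parses with any CSV reader. Here is a Python sketch against a fabricated two-row excerpt (the counter name and values are illustrative, not captured from the lab):

```python
import csv
import io

# Illustrative slice of an esxtop batch .csv: first column is the
# timestamp; remaining headers use "\\host\group(instance)\counter"
# naming. Values are fabricated for the example.
sample = io.StringIO(
    '"(PDH-CSV 4.0) (UTC)(0)","\\\\esx-01a\\Physical Cpu(_Total)\\% Util Time"\n'
    '"04/17/2019 14:40:48","54.1"\n'
    '"04/17/2019 14:40:50","55.3"\n'
)

reader = csv.reader(sample)
headers = next(reader)
utils = [float(row[1]) for row in reader]
print(headers[1])                         # the counter name
print(round(sum(utils) / len(utils), 1))  # → 54.7 (average utilization in the slice)
```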

The next sections discuss examples of how to apply additional esxtop switches.

 

 

Example: View all statistics

If you want to override any resource files and record all metrics, add -a:

esxtop -b -d 30 -n 360 -a > /tmp/esxtop_HOLstats.csv

 

 

Example: Output to a compressed file

The esxtop output .csv file grows quickly, so you can pipe the output through gzip to create a compressed file:

esxtop -b -d 30 -n 360 -a | gzip -9c > /tmp/esxtop_HOLstats.csv.gz

 

 

Conclusion

For more details on esxtop and running in batch mode, see:

 

Graphing esxtop statistics


This module discusses taking the Comma-Separated Value (.csv) output from esxtop in batch mode and graphing it to see trends over the measurement interval.

You can graph the contents of the esxtop output file to visualize vSphere performance over the collection interval.  This module discusses using NMON Visualizer, a free Java program that graphs the contents of .csv files. You also can use Windows PERFMON to view the results. 

NMON Visualizer is a Java program that runs on any operating system where Java is installed, and its user interface is the same no matter which platform you use. Let's get familiar with it on Windows.


 

Copy the output file to desktop

 

We'll use the esxtop batch output file we created in the previous module. First, you need to copy the .csv output file from the ESXi host to the desktop.

  1. For this lab, we already copied esxtop_HOLstats.csv on the desktop
  2. To start NMON Visualizer, double click on its icon

 

 

Start NMON Visualizer

 

 

 

Load esxtop output file

 

You need to load the .csv file into NMON Visualizer. In the NMON Visualizer window:

  1. Click on File
  2. Click on Load...

 

Double click on esxtop_HOLstats.csv.

NMON Visualizer loads the output file.

 

  1. The host name
  2. The collection period

 

 

Graphing CPU statistics with NMON Visualizer

 

Click on the gray triangle next to the host "esx-01a.corp.local" to expand the list of collected statistics.

 

  1. Click on the  gray triangle next to "Physical Cpu" to open its folder
  2. Click on the  word "Total"

The graph displays total Physical CPU utilization broken down into Processor Time and Util Time. During the test, physical CPU averaged about 54% utilization.

Let's clear the CPU statistics and look at physical disk activity.

Click on the  gray triangle next to "Physical Cpu" to close its folder.

 

 

Graphing disk statistics with NMON Visualizer

 

  1. Click on the  gray triangle next to "Physical Disk" to open its folder
  2. Click on the last entry in the folder for vmhba65:vmhba65:C0:T0:L2

We can see that disk utilization increased towards the end of the collection period.

Let's narrow down the collection and see the load for a particular time period.

 

Click on the Manage button.

 

 

Displaying activity during a custom interval

 

  1. You can see the system time interval during which esxtop captured performance statistics.
  2. You can add a custom interval to narrow down the time period.

 

  1. Next to Start: change the time from 37 to 40 
  2. Press Add
  3. Press the red Close Window button to close the dialog box and display the narrowed collection period

 

You can see that we've narrowed the statistics to display only the records from 14:40:48 to 14:41:31. We can narrow down further to see only the physical disk statistics we're interested in. 

 

With NMON Visualizer you can add or remove statistics dynamically. In the box under the graph, click on the first three check boxes next to Physical Disk Path and deselect:

  1. Command/sec
  2. Reads/sec
  3. Writes/sec

 

Conclusion and Clean-Up



 

Key takeaways

During this lab we learned how to customize the performance statistics we collect using resource files, how to save the statistics into an output .csv file, and how to graph the statistics and produce performance charts. 

We have only scratched the surface of what esxtop can do. If you want to know more about esxtop, see these articles:

 

 

Clean up procedure

To free up resources for the remaining parts of this lab, we need to shut down all used virtual machines and reset the configuration.

 

 

Open a PowerShell Window (if necessary)

 

Click on the Windows PowerShell icon in the taskbar to open a command prompt

NOTE: If you already have one open, just switch back to that window.

 

 

Reset Lab

 

To reset the lab, type

.\StopLabVMs.ps1

and press Enter.  This resets the lab into a base configuration. You now can move on to another module.

 

 

Conclusion

This concludes the esxtop in Real-World Use Cases module. We hope you have enjoyed taking it. Please remember to fill out the survey when you finish.

 

Module 5 - vCenter Performance Analysis (30 minutes)

Introduction


vSphere 6.7 delivers an exceptional experience with an enhanced VMware vCenter® Server Appliance™ (vCSA).  As mentioned earlier, when measuring the performance of vCenter 6.7 versus 6.5, performance engineers saw much higher performance (throughput) and lower latency with operations such as powering on/off VMs.

This module will show you how to monitor the health/performance of your vCenter Server using the vCenter Server Appliance Management Interface (VAMI), as well as tools for detailed analysis, including vimtop, profiler, pg_top and postgres (database) log files.


 

Check the Lab Status in the lower-right of the desktop

 

Please check to see that your lab has finished all of the startup routines and is ready for you to start. If you see anything other than "Ready", please wait a few minutes. If after five minutes your lab has not changed to "Ready", please ask for assistance.

 

 

10,000 Foot View of vCenter

 

For most customers, vCenter looks like a service (vpxd) that UI and API clients make requests to, and vCenter stores inventory information (hosts, clusters, VMs) in a database.

Many years ago, vpxd used to be a monolithic service, and while it's still conceptually the same, there is a lot more going on under the hood to provide improved performance, additional features, etc.

 

 

vCenter: Under the Hood

 

Here is what a vCenter Server/vCSA looks like under the hood.  Don't worry, we'll touch on the most important of these as we look at debugging tools later in this module.

 

vCenter Server Appliance Management Interface (VAMI)


The vCenter Server Appliance Management Interface (VAMI) is the administration Web interface for the vCenter Server Appliance (vCSA), and is used to perform basic administrative tasks such as monitoring the vCSA, changing the host name and the network configuration, NTP configuration, and applying patches and updates.

The VAMI was included in early versions of the vCSA, removed in vSphere 6.0, and then reintroduced in vSphere 6.0 Update 1. The revamped VAMI in vCenter 6.7 uses HTML5 and has a new look and feel. In this section, we will access the VAMI within the HOL environment and showcase some of its performance monitoring features, along with some guidance on what to look for in case performance is not what you would expect.


 

Open Chrome

 

Click the Chrome icon from the shortcut in the Taskbar.

 

 

Open the VAMI Login page

 

  1. In the upper-left of the Chrome window, click the HOL Admin folder.
  2. Click the vcsa-01a Mgmt bookmark.  This is the VAMI interface.

 

 

Note the VAMI URL

 

We are now at the VAMI login screen.  Note a couple of things:

 

 

Login to the VAMI

 

To login to the VAMI, use these credentials:

  1. Username: root
  2. Password: VMware1!

 

 

VAMI Summary Screen

 

This is the Summary screen of the VAMI, which is the default when you login.  Note a couple of things: 

  1. This is a useful Health Status table, which shows various states of the vCenter Server (vCSA).  In this example, everything is in the "Good" (healthy) state.
  2. Click Monitor to explore the various subsystems that are monitored.

 

 

VAMI Monitoring: CPU & Memory

 

  1. Upon clicking Monitor, the first screen shown is CPU & Memory
  2. This shows the percentages of CPU & Memory consumption.
  3. By default, the time range is over the last hour, but you can change the time range at the top right of the screen.
  4. A good rule of thumb is to keep both CPU & Memory less than 70%.  What if they're higher?  Here are some options:
    • Split the inventory of the vCenter Server (hosts, clusters, VMs, etc.) across two or more vCenter Servers.  Using vCenter Enhanced Linked Mode allows you to log in to any single instance of a vCSA and view/manage the inventories of all the vCenter Server systems in the group.  You can join up to 15 vCSA deployments with vCenter Enhanced Linked Mode.
    • For CPU > 70%, Add Virtual CPUs to the vCSA VM.
    • Keep in mind that the CPU scale goes from 0-100% utilization and doesn't separate out the activity by the individual vCPUs of the vCSA VM.
      • For example, if you're showing 25% utilization and your vCSA has 4 vCPUs, this could mean that the workload is being divided evenly between each vCPU, but it could also indicate that one vCPU is being utilized 100% of the time.
        Many services that run on the vCSA are single-threaded, so you do need to keep this in mind. If you suspect that a single vCPU is being heavily utilized, you can monitor the CPU activity of the vCSA on a per-CPU basis from the vSphere client or by using vimtop (which we'll learn about later).
    • For Memory > 70%, Change the Memory Configuration of the vCSA VM.
    • Consider setting a memory reservation for the vCSA VM.  For more information, see Allocate Memory Resources.
  5. Let's move on to the next screen.  Click Disks.
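The rule-of-thumb arithmetic in point 4 can be sketched in plain shell; the 25% figure and the 4-vCPU count are just the example values from the text above, not measurements from your environment:

```shell
# Illustration of how an overall CPU percentage can hide per-vCPU skew.
# Example values from the text above: 25% overall utilization, 4 vCPUs.
overall_pct=25
vcpus=4

# Best case: the load is spread evenly across all vCPUs.
echo "even spread: each vCPU at ${overall_pct}%"

# Worst case: the entire load lands on a single vCPU.
echo "worst case: one vCPU at $((overall_pct * vcpus))%"
```

At 25% overall on 4 vCPUs, the worst case works out to one vCPU pegged at 100%, which is why per-CPU monitoring (in the vSphere client or vimtop) matters for single-threaded services.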

 

 

VAMI Monitoring: Disks

 

  1. You are now on the Disks section of the Monitor screens.  The Disks screen shows all of the virtual hard disks the vCSA is using, the purpose of the partition, and how much disk space is being consumed.
  2. The DB, DBLog, and SEAT (Stats/Events/Alarms/Tasks) partitions are write-intensive, so placing this data on SSDs (solid state drives) is preferred to achieve optimal performance.
  3. Let's move on to the next screen.  Click Network.

 

 

VAMI Monitoring: Network

 

  1. The Network screen shows a variety of network statistics, including transmit (tx) and receive (rx) throughput (KB/sec), for both loopback and eth0.  Unlike CPU & Memory, you'll need to click through the list of these counters to get an accurate portrayal of the network activity of the vCSA.  Although these counters should be monitored, networking is usually not an issue with the vCSA.
  2. The important thing to check is that you don't see any errors (as shown here, the value is 0) for eth0 tx/rx errors detected as well as packets dropped.  If greater than zero, you should look into whether there are networking infrastructure problems in your environment.
  3. Let's move on to the next and final screen.  Click Database.

 

 

VAMI Monitoring: Database

 

  1. The Database monitoring tab is arguably the most important, as the information it provides is not easily obtained by other means. The vCSA uses a PostgreSQL database to store its persistent information.
  2. The Database page is divided into two charts: Seat space and Overall space utilization trends.  Use Alarms to avoid running out of disk space.
  3. The Seat section displays the stats, events, alarms, and tasks (SEAT). These different categories can be displayed as graph lines by clicking on their names below the Seat graph.
    The total Seat utilization is shown in the bottom graph, as well as the DB log and core utilization; these graph lines can also be removed from the graph by clicking on the associated name below the graph. If any of these sections start to fill up, the reason for this anomaly should be investigated and appropriate actions taken to ensure that the vCSA database performs as expected.

 

 

VAMI Backup & Update

 

  1. We just covered all of the performance monitoring features of the Monitor tab.
  2. While unrelated to performance, you should back up your vCSA on a regular basis, especially before you perform a major operation on your vCSA such as updating it. The VAMI tool includes a powerful backup tool (the Backup tab highlighted) that lets you back up the data on your vCSA either on demand or on a set schedule. This tool is unique in that, in order to be as space efficient as possible, it only backs up the data on the vCSA and not the entire vCSA. To restore the vCSA, you reinstall the vCSA and then restore the backed up data on it. The restore process can be initiated from the vCSA installation ISO.
  3. One of the most critical tasks you can perform to make sure that your vCSA is safe, secure, reliable, and performant is keeping it updated, and the VAMI has a feature included that makes the upgrade process as painless as possible: the Update tab.  Let's take a look at what this screen looks like.

 

 

VAMI Update tab

 

  1. When you click the Update tab, a screen appears with current version details.
  2. In the upper right-hand corner of the screen is the Check Updates button, which downloads a list of the latest patches and updates for your vCSA from VMware.
  3. Once the list is downloaded, you can click on a patch to review important information about it, including its criticality, the size of the download, and whether it will require a reboot of your vCSA.
  4. To install the patch or upgrade, you can select Stage Only or Stage and Install. If you select Stage Only, it only downloads the patch and then later you'll have the option to install it when you see fit.

Since this is a lab environment, it is not feasible to upgrade the vCSA, as this is a resource- and time-intensive process.
For your environment, however, VAMI Monitoring, Backups and Updates will ensure your vCSA is running as optimally as possible.

 

 

Conclusion

The vCSA has become the de-facto standard in most datacenters for managing a vSphere environment. For your vSphere environment to run most efficiently, you need to ensure that the processes running on your vCSA have the resources that they need; by using the VAMI, you can monitor the performance of the vCSA and detect abnormalities. You can also use the VAMI to back up and update  your vCSA to ensure that it's patched to the most recent version so that, in the case of a catastrophic event, you can recover easily and efficiently.

Credits to Tom Fenton and Ravi Soundararajan for much of this VAMI content.  For more information on how to use the VAMI, see Tom's great blog article:
https://virtualizationreview.com/articles/2018/09/10/how-to-use-vami.aspx

 

Tools for Detailed Analysis: vimtop


This section will introduce vimtop, a tool for real-time CPU/memory debugging of the vCSA.  Let's see how it looks in the lab environment, and how it might look under benchmark loads.


 

Open PuTTY

 

First, click on the PuTTY icon on the taskbar.

 

 

Load vcsa-01a session

 

  1. Scroll down and click on vcsa-01a.corp.local
  2. Click Load

 

 

Open vimtop

 

Simply type in vimtop:

vimtop

and press Enter to start this tool.

 

 

Example vimtop screenshot

 

Here is an example screenshot of vimtop running within the lab environment.  If you're familiar with top (the Linux performance monitoring tool) or esxtop (the equivalent for ESXi), you'll notice vimtop has a similar look and feel.  The default vimtop screen provides you with an overview and task pane. The overview pane quantifies the CPU and memory resources that your vCSA is currently consuming (the top half of the screen); the task pane (bottom half) shows you the processes that are consuming the most CPU resources.  The CPU activity should never total more than 70% for your vCSA.

By default, vimtop refreshes its data every second. To pause this automatic refresh, press "p"; alternatively, to set a slower refresh rate, press "s" and then enter the number of seconds between screen refreshes.

To see the help menu, press "h". The help menu explains how to add, remove, and reorder columns in vimtop. To quit vimtop, press "q".

Let's see how vimtop looks while under load.

 

 

vimtop During a 'Churn' Benchmark

 

This is what vimtop looks like during a "churn" benchmark, which basically consists of creating a VM, powering it on, running it for a while, powering it off, and then deleting it.

This screen shows us several interesting things:

With their benchmark vcbench, VMware performance engineers measured the number of operations per second (throughput) that vCenter produced.

This benchmark stresses the vCenter Server by performing typical vCenter operations, such as powering a VM on and off, among several others. vCenter 6.7 performs 16.7 operations per second, a twofold increase over the 8.3 operations per second vCenter 6.5 produced.

 

 

vimtop During a Tagging Benchmark

 

This is what vimtop looks like during a tagging benchmark (which performs/simulates advanced API calls, such as PowerCLI Get-Tag).  Behind the scenes, tagging goes through a proxy, the endpoint, through the data service, to the vpxd services (aka vCenter Services, aka the tagging service).

This screen shows a couple of processes, and here are some additional ones that may pop up:

 

 

vimtop Showing Heap Issues; Consider Increasing Heap Size

 

The vCenter UI runs as a Java process within the vCSA, and as such, if the CPU utilization is consistently high, i.e. 100% (as shown here; note that this is not 100% across all vCPUs, just of one core), for a prolonged period of time, it may be invoking garbage collection too often.  This is an indicator that it may not have enough memory.

Let's look at a command that will show you how to increase the memory size.

 

 

vimtop Showing Heap Issues; Consider Increasing Heap Size

 

Assuming you still have the PuTTY session open to vcsa-01a, type this command:

cloudvm-ram-size -l vsphere-client

This will show you the memory allocated to the vsphere-client process in your particular environment (853MB in this example; this will be different in your environment).
You can increase this by using this command:

cloudvm-ram-size -C 1000 vsphere-client 

where 1000 is the value in MB that you want to increase the service's memory to.

Note that the preferred method is to shut down your vCSA and assign that VM more memory, which should auto-scale all of the processes such as vsphere-client, but that does involve some downtime.

 

 

Conclusion

vimtop is a powerful tool that shows you real-time resource issues that may be adversely affecting the performance of your vCSA.

For more information on vimtop, please visit these excellent resources online:

 

Tools for Detailed Analysis: vpxd profiler logs


This section will discuss the vCenter (vpxd) profiler log files - how to find them on your vCSA, what they look like, and some important counters to examine.


 

Open PuTTY

 

Let's find where the vpxd profiler logs are in the lab environment. If you don't already have a PuTTY session open to vcsa-01a, click on the PuTTY icon on the taskbar.

 

 

SSH to vcsa-01a

 

Scroll down and double-click on vcsa-01a.corp.local.

 

 

Find vpxd-profiler logs

 

1. To find the vpxd profiler log files, execute these commands in the PuTTY window:

cd /var/log/vmware/vpxd
ls -l vpxd-profiler*

2. Note that vpxd-profiler.log is a symbolic link to the most recent log file, while the older profiler logs are compressed (gzipped).

3. Let's look at the file format of this log file.  Run this command:

less vpxd-profiler.log
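Because the rotated profiler logs are gzipped (see step 2 above), you can search them without unpacking first by using zgrep. This sketch creates a throwaway sample file so it is self-contained; it does not touch the real lab logs:

```shell
# Create a small gzipped sample standing in for a rotated profiler log.
printf 'sample profiler entry\n' | gzip > /tmp/vpxd-profiler-demo.log.gz

# zgrep searches compressed files in place, just like grep on plain text.
zgrep 'profiler' /tmp/vpxd-profiler-demo.log.gz

rm -f /tmp/vpxd-profiler-demo.log.gz
```

Against the real rotated logs, `zgrep <counter-name> vpxd-profiler-*.gz` would search every compressed file at once.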

 

 

 

vpxd-profiler.log example

 

Here is an example of what the vpxd-profiler.log file consists of:

  1. Timestamp
  2. Key-Value pairs (i.e. a vCSA setting, and the value the setting was set to)

This is a large file, with a lot of counters, so what are some useful ones?  We'll look at some next.

 

 

Useful vpxd-profiler.log counters

 

Here are a few counters that may be useful while troubleshooting vCSA performance:

Press "q" when you are done reviewing the vpxd profiler log file.

 

Tools for Detailed Analysis: PostgreSQL logs and pg_top


This section discusses how to analyze the Postgres logs and using the pg_top command to debug the database of the vCSA.


 

Open PuTTY

 

Let's look at the Postgres logs and the pg_top command in the lab environment.

First, click on the PuTTY icon on the taskbar.

 

 

SSH to vcsa-01a

 

Scroll down and double-click on vcsa-01a.corp.local

 

 

List Postgresql Logs

 

To list the Postgres log files, run these commands in the PuTTY window:

cd /var/log/vmware/vpostgres/
ls -l postgresql-*

Note that each numbered log file is for a different day of the month; for example, postgresql-01.log above would contain the database log entries from June 1.

 

 

Search Postgresql Logs

 

Let's search for log entries with the string 'duration' to see which SQL queries took longer than one second (1,000 ms):

grep duration postgres*

For stats and events tables, these durations are OK. For other tables (core tables: host tables, VM tables, network tables), if you notice SQL queries consistently taking an abnormally long time (multiple seconds), that could indicate a performance issue with your database.
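If the plain grep returns too many lines, the logged durations can be filtered numerically. This sketch assumes the standard PostgreSQL "duration: <ms> ms" log format and uses a made-up sample line rather than the real lab logs:

```shell
# Hypothetical log line in the standard PostgreSQL slow-query format.
line='2019-06-01 14:02:11 UTC LOG:  duration: 6215.338 ms  statement: SELECT ...'

# Keep only entries slower than 5000 ms (5 seconds).
echo "$line" | awk -F'duration: ' '{ split($2, a, " "); if (a[1] + 0 > 5000) print }'
```

Against the real logs, you would pipe `grep duration postgresql-*.log` through the same awk filter to surface only the multi-second queries.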

How do we look at database performance once we suspect there's an issue?  We'll look at pg_top next, a tool to do just that.

 

 

Running pg_top

 

Here are the commands to run pg_top on your vCSA:

cd /opt/vmware/vpostgres/current/bin/
./pg_top -U postgres -d VCDB

If you're familiar with top (the Linux performance monitoring tool) or esxtop (the equivalent for ESXi), you'll notice pg_top has a similar look and feel.  The default pg_top screen provides you with an overview and task pane. The overview pane quantifies the CPU and memory resources that your PostgreSQL database (VCDB) is currently consuming (the top half of the screen); the task pane (bottom half) shows you the processes that are consuming the most CPU resources.  The CPU activity should never total more than 70%.

By default, pg_top refreshes its data every second. To pause this automatic refresh, press "p"; alternatively, to set a slower refresh rate, press "s" and then enter the number of seconds between screen refreshes.

To see the help menu, press "h". The help menu explains how to add, remove, and reorder columns in pg_top. To quit, press "q".

Let's see how pg_top looks while under load.

 

 

Example pg_top screenshot

 

Here is what pg_top looks like; as you can see, much like top, esxtop, or vimtop, it shows you real-time CPU and memory process usage, but only for the PostgreSQL database (VCDB).

There are many single-character commands available from this screen.  Press "?" to see a list of them.

 

 

pg_top help screen

 

Here is a list of pg_top commands.  Note that since this is a database-specific top, we can use the "Q" command to show the query of a currently running process, which can be useful to see which table a SQL query is accessing.

Press the Space Bar a couple of times to return to the main pg_top screen.

 

 

pg_top with a CPU-intensive process/query

 

Here is another screenshot of pg_top, but while the PostgreSQL database was running a CPU-intensive query.  Here are some things to note:

  1. The CPU for this process was 97.79% (very high)
  2. The PID for this process was 3063 (which would be entered when using the "Q" command to show the query details)
  3. The STATE for this process is "run", which means the process is still running (note the other processes are in a "sleep" state)
  4. The COMMAND includes a "DECLARE CURSOR"; CURSOR usually means a query on the stats table.  Recall from the earlier VAMI database section that the VCDB consists of Alarms, Events, Tasks, and Stats (Performance Statistics).  We'll confirm this query is on the "stats" table when we look at the query details on the next screen.

Since we are not running a benchmark in the lab environment, the next screen will show you what the output would be upon typing "Q" and then the PID (3063).

 

 

pg_top query details

 

Here is the result of inspecting the CPU-intensive query (PID 3063).  The "SELECT sc.stat_id" confirms that the SELECT SQL command was run against the stats table.

Your environment (queries, tables) may be different; just be mindful that long-running queries may be scanning all partitions.

 

Clients (UI and API) Performance Tips


This section discusses a few tips to achieve better vCenter client performance (either the user interface/UI or the APIs, e.g. PowerCLI).


 

Clients: UI

Here are some ways to ensure the vCenter user interface (UI) performance is optimal. 

 

 

Clients: API (PowerCLI) - Default PowerCLI

 

This is an example of some PowerCLI code that was taken from the VMware Community Forums: https://communities.vmware.com/thread/499845

While it gets the job done, internal performance testing with 20 hosts and 300 VMs showed that this code ran for 80 seconds.  Let's see how this code could be optimized, and how much faster it could run.

 

 

Clients: API (PowerCLI) - Optimized PowerCLI

 

Note that this PowerCLI code does the same thing, but it makes far fewer API calls to vCenter -- namely, the highlighted Get-VM and Get-VMHost calls are executed only once, outside of the ForEach loop.  Minimizing unnecessary/repeated PowerCLI calls is key to obtaining better client API performance.

By doing this, the runtime for the script was reduced from 80 seconds to 7.5 seconds (a 10x speedup).
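The same hoisting pattern can be reduced to plain shell terms. The `get_inventory` function below is a made-up stand-in for an expensive inventory call such as Get-VM, not a real cmdlet:

```shell
# Stand-in for an expensive inventory query (e.g. a Get-VM-style call).
get_inventory() { echo "vm-a vm-b vm-c"; }

# Slow pattern: one expensive call per loop iteration.
#   for host in 1 2 3; do get_inventory; done      # N calls to the server

# Fast pattern: call once, cache the result, reuse it inside the loop.
inventory=$(get_inventory)                         # 1 call to the server
for host in 1 2 3; do
  echo "iteration $host sees: $inventory"
done
```

The speedup scales with the cost of the call multiplied by the loop count, which is where the drop from 80 seconds to 7.5 seconds comes from.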

 

Conclusion and Clean-Up



 

Clean up procedure

To free up resources for the remaining parts of this lab, we need to shut down all used virtual machines and reset the configuration.

 

 

Open a PowerShell Window (if necessary)

 

Click on the Windows PowerShell icon in the taskbar to open a command prompt

NOTE: If you already have one open, just switch back to that window.

 

 

Reset Lab

 

To reset the lab, type

.\StopLabVMs.ps1

and press Enter.  This resets the lab into a base configuration. You can now move on to another module.

 

 

Conclusion

This concludes the vCenter Performance Analysis module. We hope you have enjoyed taking it. Please remember to fill out the survey when you finish.

 

Module 6 - Database Performance Testing with DVD Store (30 minutes)

Introduction


 

This Module introduces DVD Store 3, also known as DS3.  It simulates an online store that allows customers to log on, search for DVDs, read customer reviews, rate the helpfulness of reviews, and purchase DVDs.

Here is a brief overview for the content in this Module:


What is DVD Store 3?


This lesson will describe what DVD Store is, including all of its various features.


 

DVD Store 3 Description

 

Here is an overview of the DVD Store 3 (DS3) benchmark:

 

 

DVD Store 3 Database Sizes

DVD Store 3 supports three standard sizes: small, medium, and large. In addition to these standard sizes, any custom size can be specified during DVD Store setup. The number of rows in the various tables that make up the DVD Store 3 database is what is varied to produce the specified size.

The table below shows the number of rows for the standard sizes for the Customers, Orders, and Products tables as examples:

Database   Size      Customers       Orders             Products
Small      10 MB     20,000          1,000/month        10,000
Medium     1 GB      2,000,000       100,000/month      100,000
Large      100 GB    200,000,000     10,000,000/month   1,000,000

 

Downloading/Installing DVD Store 3


This lesson will describe how to install the DVD Store 3 benchmark.  Specifically, we will look at how we set it up for this lab environment using the LAMP stack (Linux, Apache, MySQL, and PHP).

NOTE #1: The LAMP stack is only one of the supported environments for DVD Store 3.  The benchmark supports a variety of databases: Microsoft SQL Server, Oracle, MySQL, and PostgreSQL.

NOTE #2: This VM and database have already been created; this is informational, if you'd like to set it up for testing in your own environment.

Creating the database is resource intensive, in terms of both time and storage, so it is not available for the hands-on lab environment.


 

Create a Linux VM

 

This screenshot shows that, in our lab environment, DVD Store 3 is installed in a CentOS Linux VM with 1 vCPU, 1 GB of memory, and a 10 GB hard disk.

You may notice that these are lower minimum system requirements than the Weathervane module.  There are a couple of reasons for this:

  1. We are only exercising a couple of applications in this VM (namely MySQL for the database tier and Apache HTTP Server for the web server tier).
  2. This VM has been built with a small database size.  From the previous lesson, we learned that DVD Store 3 comes in 3 sizes: small (10 MB), medium (1 GB), and large (100 GB). For building a medium or large database, you should scale up the CPU, memory, and disk size appropriately.

 

 

OS Installation/Post-Install Tasks

DVD Store should work on any modern Linux distribution.  This VM was installed with CentOS 6.8.

After the OS installation, some tasks should be run as the root user prior to installing DS3:
NOTE: these have already been done in our lab environment; do not run these commands now!

  1. Update all software packages by running the command yum update
  2. Install VMware Tools (or open-vm-tools)
  3. Stop the firewall by running service iptables stop and disable it on boot by running chkconfig iptables off
    (NOTE: this is for ease of use in a test/dev environment; never do this in production!)
  4. Install MySQL by running yum install mysql-server and start it by running service mysqld start
  5. Install Apache HTTPD Server by running yum install httpd httpd-devel
  6. Install PHP with MySQL support by running yum install php php-mysql
  7. Create a user named web and set its password to web:
    useradd web
    passwd web
    chmod 755 /home/web
  8. Set permissions for this new user within MySQL:
    mysql
    > create user 'web'@'localhost' identified by 'web';
    > grant ALL PRIVILEGES on *.* to 'web'@'localhost' IDENTIFIED BY 'web';
    > exit;

 

 

Download and Extract DS3

 

DVD Store 3 is an open source project that is actively developed and maintained.  The latest version can be downloaded from github, as shown here, from https://github.com/dvdstore/ds3/

To extract DS3, login as root to your CentOS host, and unzip it with the command unzip ds3-master.zip

NOTE: This has already been done in our hands-on lab environment, so do not run this command in the lab VM.

Finally, we need to copy the PHP Web pages to the correct place on the host (again, this has already been done in our lab, no need to run):

mkdir /var/www/html/ds3
cp /root/home/ds3/mysqlds3/web/php5/* /var/www/html/ds3
service httpd restart

 

Building a DVD Store 3 Database/Starting the Lab


This lesson shows how to build a DVD Store 3 database - hands-on!

We run the configuration script to generate the necessary SQL commands, but due to time and resource constraints, we do not run the actual build. Our lab environment already has a pre-built database ready to run.


 

Launch Performance Lab Module Switcher

 

Double click the Performance Lab MS shortcut on the Main Console desktop.

 

 

Start Module 6

 

Click on the Module 6 Start button (highlighted) to run a PowerShell script that starts the DVD Store 3 VM, and open a PuTTY session to it.

 

Once the module starts, a PuTTY window and a popup box appear indicating that it has started (as shown here). Click OK.

We are now ready to learn how to build a DS3 database!

 

 

Run the Install_DVDStore.pl script

 

Remember earlier when we learned DS3 has three "canned" database sizes (small, medium, and large)?  Well, we can also specify a custom database size to build.  Here's how:
(Press Enter after each command/value)

  1. Change to the DS3 directory.  In this VM, it's been installed to /root/ds3 and you're already in the /root folder so type:
    cd ds3
  2. Run the Install_DVDStore Perl script:
    perl Install_DVDStore.pl
  3. We are now asked how big we want our DS3 database to be.  Let's build a 100 MB MySQL database:
    100
  4. When asked if the database size is in MB or GB, specify MB:
    MB
  5. Since DS3 supports multiple databases, we need to specify MYSQL:
    MYSQL
  6. Finally, DS3 needs to know if the database server will be on a Windows or Linux machine; this determines whether the input files will have CR/LF (DOS format).  Choose LINUX:
    LINUX
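For reference, answers like these can also be supplied non-interactively with a here-document, assuming Install_DVDStore.pl reads its prompts from standard input. In the lab, type the answers at the prompts as described above; the `cat` below is a stand-in so the sketch runs anywhere:

```shell
# Stand-in reader; in a real run this would instead be:
#   perl Install_DVDStore.pl <<'EOF' ... EOF
cat <<'EOF'
100
MB
MYSQL
LINUX
EOF
```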

The Install_DVDStore.pl script now does the following:

Please wait for the Perl script to finish.

 

This is how the script looks upon completion.  Look for the message highlighted: Completed creating and writing build scripts for MySQL database...

Now that all the MySQL scripts have been generated, the database would normally be built at this point.  The scripts are generated, rather than the database being created directly, so that the database can easily be recreated later, or even modified if needed, to address the specific testing requirements of individual environments.

The database build is accomplished by the following commands.  NOTE: Do not run these commands in the lab environment, for a couple of reasons: the database build takes a long time, and we have already saved you the trouble (a database has been built and is ready to run).

# NOTE: Do not run these commands in the lab environment
cd mysqlds3
sh mysqlds3_create_all.sh

Now that we've seen how to build a DS3 database, let's start an actual run!

 

Configuring/Running DVD Store 3


This lesson will describe how to configure the DVD Store 3 load driver and run it against the MySQL database VM deployed in the lab environment.


 

Start top on the DVD Store VM

 

To view the performance of the DVD Store VM, type the command top and press Enter.  This shows us how much CPU and memory are consumed along with which processes are taxing the VM the most.

Next, we kick off the DS3 driver from our Windows machine.

 

 

Start the DVD Store driver on the Main Console

 

On the Main Console (Windows desktop), double-click the DVD Store 3 Driver icon shown here (note: you may need to minimize some windows in order to see it).

 

 

Monitor the driver and the PuTTY windows during the run

 

While the run is progressing, you should watch both the PuTTY console running top (shown here on the top) and the DS3 driver window (shown here on the bottom).

Let's make some observations about this screenshot (note: due to the variability of the cloud, your performance may vary):

  1. The CPU utilization line in top shows us that 34.2% is consumed in user space (application) and 9.3% in system (kernel), for a total of 43.5% CPU.  There is zero idle time, however; nearly all of the remaining CPU time (55.5%) is waiting for I/O -- meaning we likely have a disk or network bottleneck in our environment.
  2. The process that is consuming the 43.5% CPU utilization we saw is mysqld (the MySQL database) -- which makes sense, since we're hammering it with a database benchmark!

    Let's look at the output of the driver:

  3. These are normal DS3 driver startup messages, indicating the various threads that are connecting to the database server before the actual run begins
  4. Approximately every ten seconds, you will see a performance summary output to the screen (notice et, elapsed time, goes up by ten each line).
  5. There are many statistics on each line (many of them dealing with rt which is short for response time), but we're most interested in the primary DVD Store throughput performance metric, known as opm or orders per minute.  Here we can see we're only achieving about 40 opm on average, which is very low.  You would achieve much higher opm numbers in an optimized testbed.
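The CPU breakdown in point 1 can be double-checked with a quick awk one-liner. The top(1) summary line below is reconstructed from the values in this step; the ni/hi/si/st fields are assumptions, since the text only calls out us, sy, and wa:

```shell
# Reconstructed top(1) CPU line; us/sy/wa match this step, other fields assumed.
cpu='Cpu(s): 34.2%us,  9.3%sy,  0.0%ni,  0.0%id, 55.5%wa,  0.0%hi,  1.0%si,  0.0%st'

# Sum the user (us) and system (sy) fields to get the "busy" total.
echo "$cpu" | awk -F'[:,]' '{
  for (i = 2; i <= NF; i++)
    if ($i ~ /%us|%sy/) busy += $i + 0
  printf "busy=%.1f%%\n", busy
}'
```

This prints busy=43.5%, matching the total quoted in point 1.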

Congratulations!  You're now running DVD Store!

Here's the command we used on the Windows machine to start the driver, in case you're curious:

c:\hol\ds3mysqldriver --target=dvdstore-01a.corp.local --n_threads=5 --warmup_time=0 --detailed_view=Y

Let's see what each of these driver parameters means.

 

 

Show the driver parameters

 

Click the Command Prompt icon to open a Command Prompt window.

 

Type this command as shown and press Enter:
ds3mysqldriver

You can see a list showing each Parameter Name, Description, and Default Value.  You can also create a configuration file and pass that on the command line instead of manually setting each parameter.

 

Analyzing Results/Improving DVD Store 3 Performance


This lesson describes how to analyze DVD Store 3 results (specifically, comparing and contrasting a low-performing run with a higher-performing run) and then looks at ways to improve performance.


 

DVD Store 3 Performance Metrics: opm (throughput) and rt (response time)

Performance metric (abbreviation)   Definition                       Value
opm                                 Orders Per Minute (throughput)   Higher = better
rt                                  Response Time (latency)          Lower = better

We will look at a couple of results that we'll call "bad" and "good".

 

 

Example output: "Bad" performance (low opm, high rt)

 

Here is example output of a poorly-performing configuration.  We will look at several key areas:

  1. Real-time output: every 10 seconds (et is short for elapsed time, in seconds), the DS3 driver will output a line showing how long it's been running, how many orders per minute (opm) were achieved, and response times (rt).
  2. After the driver finishes, it prints a line that starts with Final that shows the overall performance statistics.
    et=  60.0 tells us that this was a short run (only one minute).
  3. opm=41 tells us that the database server was only able to process 41 orders per minute.  This is low, but expected, as it was run in our nested hands-on lab environment which shares resources with many other labs.
  4. rt_tot_avg=13218 tells us that the average response time was 13218 milliseconds (13.218 seconds).  This is high, but again, expected.

Let's compare this to a high-performing run that was done in an isolated dedicated lab environment.

 

 

Example output: "Good" performance (high opm, low rt)

 

Here is example output of a high-performing configuration:

  1. The summary line that starts with Final shows the overall performance statistics.
    et=  609.4 tells us that this was a 10-minute run (~600 seconds).
  2. opm=74932 indicates this database server was able to process 74,932 orders per minute.  This is much higher than the previous example, as it is a highly-tuned performance configuration.
  3. rt_tot_avg=87 tells us that the average response time was only 87 milliseconds.  Again, this low value is in stark contrast to the previous example.
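As a hedged sanity check, any opm/rt pair can be cross-checked with Little's Law (concurrency = throughput x response time). The implied concurrency computed below is our own inference, not a value the DS3 driver reports:

```shell
# Cross-check throughput (opm) against response time (rt) using
# Little's Law. The implied in-flight order count is an inference,
# not something the DS3 driver itself reports.
OPM=74932     # orders per minute from the "good" run
RT_MS=87      # average response time in milliseconds
awk -v opm="$OPM" -v rt="$RT_MS" 'BEGIN {
    tps = opm / 60                              # orders per second
    printf "throughput: %.1f orders/s\n", tps
    printf "implied in-flight orders: %.1f\n", tps * rt / 1000
}'
```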

So what factors determine whether a database server can sustain high load, and thus achieve the maximum opm?

 

 

Database Performance Factors

Obviously, we want to achieve the maximum opm (database performance) possible in our environment.
There are many factors that affect performance, and there isn't enough time in this lab to cover each one in detail, but here is a short list (NOTE: some references say 6.5, but still apply to 6.7):

By following these guides and testing the performance of your particular environment prior to production deployment, you can ensure your virtualized databases achieve maximum throughput.

 

Conclusion and Clean-Up


Congratulations! You now know how to install, configure, and run the DVD Store 3 benchmark!

You've also learned how to tune your database server to achieve the maximum orders per minute (opm), so your database throughput will be as high as possible with the lowest response times.


 

Stop Module 6

 

 

To end this module:

  1. Click on the Module Switcher in the taskbar (or the desktop icon if you closed it)
  2. Click the Stop button for Module 6.

 

 

Resources/Helpful Links

Congratulations on completing this module!

For more information about DVD Store 3, and database performance in general, here are some helpful links:

Best Practices:

DVD Store blogs/whitepapers:

DVD Store 3 is also one of the key workloads in VMmark 3.0:

 

Module 7 - Application Performance Testing with Weathervane (45 minutes)

Introduction


 

This Module introduces Weathervane, a new application-level performance benchmark designed to allow the investigation of performance trade-offs in modern virtualized and cloud infrastructures.

Here is a brief overview for the content in this Module:


What is Weathervane?


This lesson will describe what the Weathervane benchmark is, and how it is different from traditional benchmark workloads.


 

Weathervane Description

 

 

 

Weathervane Components

 

Weathervane consists of three main components (if the picture above seems daunting, do not fear: this lab has all three components running inside one Linux VM!).  It is possible to run every Weathervane service in one VM or container, but it is also possible to run only specific service tiers, or even only specific service instances.

  1. The Workload driver that can drive a realistic and repeatable load against the application
  2. The Run Harness that automates the process of executing runs and collecting results and relevant performance data
  3. The Auction Application itself is a web-application for hosting real-time auctions.

We will take a look at each of these components in more detail, then run Weathervane in our lab environment.

 

 

Workload Driver

 

The Weathervane workload driver has several key features:

 

 

Run Harness

 

The Weathervane run harness is controlled by a configuration file that describes the deployment, including:

The harness also does several other extremely useful tasks:

Later in this module, we will start an actual run in the lab environment using the harness to see how easy it is -- it is literally just one command!

 

 

Auction Application

 

The Auction Application, as we can tell from the picture above, is the most complex portion of Weathervane.

It is a web app that simulates hosting real-time auctions.  It uses an architecture that allows deployments to be easily scaled to, and sized for, a large range of user loads. A deployment of the application involves a wide variety of support services, such as caching, messaging, data store, and relational database tiers. Many of the services are optional, and some support multiple provider implementations.

A default Weathervane deployment like the VM in this lab uses the following applications (click the links for more information about the applications).  All are set up "out of the box" (ready to run) via the automatic setup script that comes with the benchmark:

In addition, the number of instances of some of these services can be scaled elastically at run time in response to a preset schedule or to monitored performance metrics. The flexibility of the application deployment allows us to investigate a wide variety of complex infrastructure-related performance questions.

 

Downloading/Installing Weathervane


This lesson describes how to install the Weathervane benchmark.  It is very easy to set up, as most of it is automated.

NOTE: Weathervane has already been installed in our hands-on lab environment, so this lesson is purely informational (for example, if you want to learn how easy it is to install Weathervane in your own environment).  In the next lesson, we will configure and run Weathervane in the lab environment.


 

Create a Weathervane VM

 

Setting up Weathervane starts with creating a Weathervane host: a CentOS 7 VM that we configure to run the workload driver, run harness, and application components.  When creating the VM, select Linux as the Guest OS Family and Red Hat Enterprise Linux 7 (64-bit) as the Guest OS Version. This is necessary for the guest customization scripts to operate properly when cloning the VM.

As shown in the screenshot, the virtual hardware must have at least 2 CPUs, 8 GB of memory, and at least 20 GB of disk space (we used 30 GB in this example).  For larger deployments, the hardware can be scaled up appropriately (see the Weathervane documentation for more details).

 

 

Install CentOS 7

 

The CentOS 7 installation may be a Minimal Install (the default, as shown) or a full desktop install.

In fact, you may want to create one Weathervane host with a full desktop install for running the harness, and a second with a Minimal Install for cloning to VMs for running the various Weathervane services.

 

 

Post-OS Installation Tasks

After completing the OS installation, a few tasks should be performed prior to installing Weathervane:

  1. Update all software packages by running the command yum update as the root user.
  2. Install VMware Tools (for CentOS 7, open-vm-tools) by running the command yum install -y open-vm-tools as the root user.
  3. Install Java by running the command yum install -y java-1.8.0-openjdk* as the root user.
  4. Install Perl by running the command yum install -y perl as the root user.

NOTE: These commands will not work in the lab environment, but these tasks have already been performed in our VM.

 

 

Download and Extract Weathervane

 

Weathervane is an open source project developed by VMware.  As such, the latest release tarball (.tar.gz) can be downloaded from github, as shown here, from https://github.com/vmware/weathervane/releases

A release tarball is a snapshot of the repository at a known good point in time.  Releases are typically more heavily tested than the latest check-in on the master branch.

To install Weathervane, log in as root to your CentOS host and unpack the tarball with the command tar zxf weathervane-1.0.14.tar.gz NOTE: This has already been done in our hands-on lab environment, so do not run this command in the lab VM.

Once the tarball is extracted, build the Weathervane executables.

 

 

Building the Weathervane Executables

After unpacking the tarball in the previous step, go into the /root/weathervane directory and build the Weathervane executables with the command:
./gradlew clean release NOTE: This has already been done in our hands-on lab environment, so do not run this command in the lab VM.

The first time you build Weathervane, this downloads a large number of dependencies. Wait until the build completes before proceeding to the next step.

 

 

Running the Weathervane auto-setup script

The auto-setup script configures the VM to run all of the Weathervane components.
NOTE: the VM must be connected to the Internet in order for this process to succeed.

From the Weathervane directory, run the script using the command:
./autoSetup.pl NOTE: This has already been done in our hands-on lab environment, so do not run this command in the lab VM.

The auto-setup script may take an hour or longer to run depending on the speed of your internet connection and the capabilities of the host hardware.
Once it has completed, the VM must be rebooted.  Weathervane is now ready to run!

 

Configuring Weathervane


This lesson describes how to start the lab and configure the Weathervane benchmark on our lab environment deployment.


 

Launch Performance Lab Module Switcher

 

Double click the Performance Lab MS shortcut on the Main Console desktop, or switch to that window on the taskbar.

 

 

Start Module 7

 

Click on the Module 7 Start button (highlighted) to start a PowerShell script to start the Weathervane VM, and open two PuTTY sessions to it.

 

Once the module starts, you see two PuTTY windows side-by-side and a popup window (as shown here). Click OK.

We are now ready to configure and run Weathervane!

 

 

Configuring Weathervane

 

We should look at the Weathervane configuration file to see how configurable this benchmark is.

In the PuTTY window on the left, type this command and press Enter:

less weathervane.config

We can now use the standard navigation keys (Up/Down arrows, Page Up/Down) to see the various parameters to customize.

 

We are now looking at the beginning of the Weathervane configuration file.  As is standard with most configuration files, lines that start with "#" are commented out and thus ignored by Weathervane.

Highlighted here is one of the most useful parameters (which is why it is at the top!): users.  As the comments state, this determines how many simulated users are active during a Weathervane benchmark run.  This has already been reduced to the minimum value of 60 due to the constraints of our lab environment, but the default is 300 as we will see next.

 

In the right-hand PuTTY window, type the following command and press Enter:
(Note: the character before less is the pipe symbol, typically typed by holding down Shift and pressing the backslash \ key.  You can also select this text and drag-and-drop it directly into the PuTTY window -- try it!)

./weathervane.pl --help |less

 

The --help command we just ran lists all the Weathervane command-line parameters.  If any of these parameters are set on the command line, they will override both the Parameter Default and even the value set in the weathervane.config file we just looked at.

As shown in this screenshot, the users parameter defaults to a value of 300, but we have set it to the minimum value of 60 in the weathervane.config.  If we wanted to try a Weathervane run of 100 users, we could override it on the command line, i.e.  ./weathervane.pl --users=100.

 

In both PuTTY windows, press the Page Down key to scroll down to the next page, and you should see a screen similar to this.  As the help text explains, Weathervane has three run length parameters: rampUp, steadyState, and rampDown.  To make it easier, you can set all three parameters by changing runLength to short, medium, or long.

In the interest of time (and to not tax our lab environment for any longer than it needs to be!), we have set the values to 30, 60, and 0 in our configuration file.  In an actual benchmark environment, we would want to set runLength to medium or long to gauge performance over a longer period of time.
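Put together, the run-length settings described above would look something like this as a weathervane.config fragment (a sketch based on the help text; exact field names and syntax may vary between Weathervane versions):

```
# Number of simulated users (lab minimum; the default is 300)
"users" : 60,

# Run-length phases, in seconds
"rampUp" : 30,
"steadyState" : 60,
"rampDown" : 0,
```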

At this point, feel free to use the arrow keys and the Page Up/Page Down keys to look at all of the parameters Weathervane supports.  As you can see, it is very configurable!

 

Now that we have looked at the Weathervane configuration file and the help text, left-click in each PuTTY window and press q to "quit" less and return to the bash shell.  You should see a screen similar to this.

In the next lesson, we'll start an actual Weathervane benchmark run!

 

Running/Tuning Weathervane


This lesson describes how to run and tune the Weathervane benchmark using the VM deployed in the lab environment.


 

Running Weathervane

 

Now that we have learned how to configure Weathervane, we can start a test run!  This is actually the easiest part, since the run harness automates starting the necessary services, gathering performance statistics, and stopping the benchmark once the run lengths we specified have elapsed.

Click in the left-hand PuTTY window, and start the Weathervane benchmark harness by running one simple command (and press Enter):

./weathervane.pl

Note that since we are already in the /root/weathervane directory, we invoke weathervane.pl from the current directory.

In the right-hand PuTTY window, you can monitor the processes consuming CPU, memory, etc. in real time while Weathervane is running by running the Linux top command (press Enter afterwards):

top

 

Weathervane should now be running!

The top output can be broken down into three sections:

  1. This shows the CPU utilization of the two virtual CPUs (vCPUs); these values will fluctuate throughout the run.  In this screenshot, they are both heavily utilized (95-96%), which is expected for this benchmark.
  2. This shows the memory utilization of the VM.
    The top line (KiB Mem) shows us that most of the 8 GB we have allocated to the VM is used, with very little free; again, this is expected, as there are many services/processes running and consuming RAM.
    Conversely, the next line (KiB Swap) shows that while we have ~3 GB of swap space, most of it is free, and very little used; this is a Good Thing, as Linux is not having to swap memory to disk (which is likely what would happen if we did not give the VM enough memory, e.g. only 4 GB).
  3. The bottom part of the top output shows the running processes, sorted by highest CPU utilization (%CPU) first.  At a quick glance, we can see that java (Tomcat), mongod (MongoDB), and postgres (PostgreSQL) are the heavy hitters.
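If you prefer non-interactive snapshots (for example, to log statistics to a file during a run), the same kinds of numbers can be read directly from /proc on any Linux guest. This is a generic sketch, not one of the lab steps:

```shell
# Section 1: aggregate CPU jiffies (user/system/idle/iowait columns)
awk '/^cpu / { printf "user=%s system=%s idle=%s iowait=%s\n", $2, $4, $5, $6 }' /proc/stat

# Section 2: the totals behind top's KiB Mem / KiB Swap lines
awk '/^(MemTotal|MemFree|SwapTotal|SwapFree):/ { print $1, $2, $3 }' /proc/meminfo

# Section 3: for the process list, top itself can run non-interactively:
#   top -b -n 1 | head -12
```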

This benchmark run takes some time to complete (~15 minutes from start to finish).  While we wait, we can browse through the Weathervane documentation to see how we can improve performance.

 

 

Tuning Parameters (User's Guide)

The Weathervane User's Guide comes as a PDF with the benchmark and shows how to install, configure, and tune Weathervane.  It also has a handy section on Tuning Parameters.

The document is available here on github: https://github.com/vmware/weathervane/raw/master/weathervane_users_guide.pdf

 

We will not make you read this 99-page document from beginning to end :-)  In any case, we have already touched on a lot of what this guide covers in terms of installation and configuration.

Therefore, scroll down to page 56 (shown here), which has a section on Component Tuning.  Skim through the next few pages to get a feel for the parameters you can experiment with to tune the various tiers inside Weathervane:

Another way to improve performance of a Weathervane environment is to clone the Weathervane host VM, and assign different services to each.  For example, you can have separate (and multiple) VMs that act as application servers, web servers, NoSQL data stores, etc.  For more information, see section 7.5 of the User's Guide, "Cloning the Weathervane VM".

 

 

Check on the Weathervane run

 

Periodically switch back to the PuTTY windows to check on the progress of the run.  When the Weathervane benchmark run has finished, you will see screens similar to this one.  Specifically:

  1. On the left, you will see messages about Cleaning and compacting storage, and whether the run Passed or Failed.
    NOTE: It is OK if it says failed and/or a message such as Failed Response-Time metric.  In our shared lab environment, the response times likely won't meet the benchmark requirements.  This would not be an issue in a dedicated test/dev environment.
  2. Take specific note of the run number at the end (in this example, it is Run 8).  We use that number in the next step when we look at the output files.
  3. On the right, note that the top output indicates the Linux VM is now essentially idle (%Cpu less than 1%, and most of the memory is free).
  4. Once you have confirmed the run is over, close the PuTTY window on the right by clicking the "x" in the upper-right (click OK when PuTTY asks you to confirm).
  5. Maximize the remaining PuTTY window on the left by clicking the maximize button in the upper-right, as shown.

 

 

 

Analyzing Weathervane benchmark output

After running the benchmark, you can look at the various log files the Weathervane run harness collects:

  1. cd output (all Weathervane output is stored in /root/weathervane/output)
  2. ls (to show all the runs on this VM; determine the most recent one)
  3. cd 8/ (replace with the most recent run number)
  4. cat version.txt (records the version of Weathervane used to run this result)
  5. cat run.log (shows any errors and details of response-times from each of the application instances)
  6. cat console.log (not shown; this is just a record of what you already saw output to the PuTTY console, i.e. the starting/stopping of services, whether the run passed or failed, and cleanup)

Once you are done looking at these files, you can close this PuTTY console.

If a run passes, this means that the application deployment and the underlying infrastructure can support the load driven by the given number of users with acceptable response-times for the users' operations.

A typical way of using Weathervane is to compare the maximum number of supported users when some component of the infrastructure is varied. For example, if the same application configuration is run on two different servers, you can compare the maximum user load supported by the servers to determine which has better performance for this type of web application.

Congratulations! You now know how to run Weathervane!

 

Conclusion and Clean-Up


Congratulations! You now know how to install, configure, and run the Weathervane benchmark!


 

How to End Module 7

 

To end this module, open the Module Switcher window and click the Stop button for Module 7.

 

 

Resources/Helpful Links

For more information about Weathervane, here are some helpful links:

 

Module 8 - Processor Performance Monitoring, Host Power Management (30 minutes)

Intro to CPU Performance Monitoring and Host Power Management


The goal of this module is twofold:

  1. Expose you to a CPU contention issue in a virtualized environment, and quickly identify performance problems by checking various performance metrics and settings
  2. Learn about how to review/change power management policies at the server BIOS and within ESXi via the vSphere Client

Performance problems may occur when there are insufficient CPU resources to satisfy demand. Excessive demand for CPU resources on a vSphere host may occur for many reasons. In some cases, the cause is straightforward. Populating a vSphere host with too many virtual machines running compute-intensive applications can make it impossible to supply sufficient CPU resources to all the individual virtual machines. However, sometimes the cause may be more subtle, related to the inefficient use of available resources or non-optimal virtual machine configurations.

Let's get started!


 

Check the Lab Status in the lower-right of the desktop

 

Please check to see that your lab has finished all the startup routines and is ready for you to start. If you see anything other than "Ready", please wait a few minutes. If after five minutes your lab has not changed to "Ready", please ask for assistance.

 

CPU Contention, vCenter Performance Charts


Below is a list of the most common CPU performance issues:

High Ready Time: A CPU is in the Ready state when the virtual machine is ready to run but unable to run because the vSphere scheduler cannot find physical host CPU resources to run the virtual machine on. Ready time above 10% could indicate CPU contention and might impact the performance of CPU-intensive applications. However, some less CPU-sensitive applications and virtual machines can have much higher values of ready time and still perform satisfactorily.

High Costop time: Costop time indicates that there are more vCPUs than necessary, and that the excess vCPUs create overhead that drags down the performance of the VM. The VM likely runs better with fewer vCPUs. The vCPU(s) with high costop are being kept from running while the other, more-idle vCPUs are catching up to the busy one.

CPU Limits: CPU Limits directly prevent a virtual machine from using more than a set amount of CPU resources. Any CPU limit might cause a CPU performance problem if the virtual machine needs resources beyond the limit.

Host CPU Saturation: When the physical CPUs of a vSphere host are consistently utilized at 85% or more, the vSphere host may be saturated. When a vSphere host is saturated, it is more difficult for the scheduler to find free physical CPU resources in order to run virtual machines.

Guest CPU Saturation: Guest CPU (vCPU) Saturation is when the application inside the virtual machine is using 90% or more of the CPU resources assigned to the virtual machine. This may be an indicator that the application is being bottlenecked on vCPU resources. In these situations, adding additional vCPU resources to the virtual machine might improve performance.

Oversizing VM vCPUs: Using large SMP (Symmetric Multi-Processing) virtual machines can cause unnecessary overhead. Virtual machines should be correctly sized for the application that is intended to run in the virtual machine. Some applications may only support multithreading up to a certain number of threads, and assigning additional vCPUs to the virtual machine may cause additional overhead. If vCPU usage shows that a machine configured with multiple vCPUs is only using one of them, that might be an indicator that the application inside the virtual machine is unable to take advantage of the additional vCPU capacity, or that the guest OS is incorrectly configured.

Low Guest Usage: Low in-guest CPU utilization might be an indicator that the application is not configured correctly, or that the application is starved of some other resource such as I/O or Memory and therefore cannot fully utilize the assigned vCPU resources.


 

Launch Performance Lab Module Switcher

 

Double click the Performance Lab MS shortcut on the Main Console desktop.

 

 

Start Module 8

 

Click on the Start button for Module 8, and a script launches.  

 

The script takes a few minutes to run.  

 

 

Wait until you see "Press Enter to continue" to proceed, then press Enter.

 

 

CPU Benchmarks Started

 

When the script completes, you see two Remote Desktop windows open (note: you may have to move one of the windows to display them side by side, as shown above).

The script has started a CPU intensive benchmark (SPECjbb2005) on both perf-worker-01a and perf-worker-01b virtual machines, and a GUI is displaying the real-time performance value as this workload runs.

If you do not see the SPECjbb2005 window open, launch the shortcut in the upper left hand corner.  

Above, we see an example screenshot where the performance of the benchmarks is around 15,000.

IMPORTANT NOTE: Due to changing loads in the lab environment, the performance values may vary.  Please make note of the approximate Performance scores, as they will change later.

 

 

Navigate to VM-level Performance Chart

 

Click on the Chrome icon to open a browser window.

 

This is the vCenter login screen.  To login to vCenter:

  1. Check the Use Windows session authentication checkbox
  2. Click the LOGIN button

 

  1. Select the perf-worker-01a virtual machine from the list of VMs on the left
  2. Click the Monitor tab
  3. Click Performance
  4. Click Advanced
  5. Click on the Popup Chart icon so we can get a dedicated chart popup window.

 

 

Select Chart Options

 

Let's maximize the window and select specific counters via Chart Options:

  1. Click the Maximize window icon (be careful not to click Close!)
  2. Click Chart Options at the top

 

 

Select CPU Counters for Performance Monitoring

 

When investigating a potential CPU issue, there are several counters that are important to analyze:

  1. Select CPU on the left-hand side (if it's not already selected by default)
  2. Scroll through the list, and check these counters: Demand, Ready, and Usage in MHz
  3. Only select perf-worker-01a for the Target Object (deselect 0 if it's checked)
  4. Click OK

 

 

Monitor Demand vs. Usage lines

 

Notice the amount of CPU this virtual machine is demanding and compare that to the amount of CPU usage the virtual machine is actually allocated (Usage in MHz). The virtual machine is demanding more than it is currently being allowed to use.

Notice that the virtual machine is also seeing a large amount of ready time. Guidance: Ready time > 10% could be a performance concern.

You can close this popup window, but please leave the vSphere Client window open.

 

 

CPU State Times Explanation

 

Virtual machines can be in any one of four high-level CPU States:

 

 

Explanation of value conversion

 

NOTE:  vCenter reports some metrics such as "Ready Time" in milliseconds (ms). Use the formula above to convert the milliseconds (ms) value to a percentage.

For multi-vCPU virtual machines, multiply the Sample Period by the number of vCPUs of the VM to determine the total time of the sample period. It is also beneficial to monitor Co-Stop time on multi-vCPU virtual machines.  Like Ready time, Co-Stop time greater than 10% could indicate a performance problem.  You can examine Ready time and Co-Stop metrics per vCPU as well as per VM.  Per vCPU is the most accurate way to examine statistics like these.
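As a worked example of this conversion (the Ready value below is hypothetical; vCenter real-time charts use a 20-second sample period, i.e. 20,000 ms):

```shell
# Convert a hypothetical CPU Ready summation value from the vCenter
# real-time chart into a percentage. Real-time samples are 20 s long,
# and the period is multiplied by the VM's vCPU count.
READY_MS=4000     # Ready time reported for the sample period (ms)
SAMPLE_MS=20000   # 20-second real-time sample period, in ms
VCPUS=2           # number of vCPUs in the VM

awk -v r="$READY_MS" -v s="$SAMPLE_MS" -v n="$VCPUS" \
    'BEGIN { printf "CPU Ready: %.1f%%\n", r / (s * n) * 100 }'
```

Ten percent is right at the guidance threshold mentioned earlier in this module.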

 

 

Navigate to Host-level CPU chart view

 

  1. Select esx-01a.corp.local in the vSphere Client.
  2. Select the Monitor tab
  3. Select the Advanced Performance view
  4. Select the Popup Chart icon

 

 

Examine ESXi Host Level CPU Metrics

 

Click the Maximize window icon to get the maximum real estate.

Notice in the Chart, that only one of the CPUs (pictured here in green) on the host seems to have any significant workload running on it.  We'll see why this is the case next.

 

 

VM CPU Affinity set via PowerCLI

 

The PowerShell/PowerCLI script that ran when we started this lab set the CPU Affinity of both VMs (perf-worker-01a and perf-worker-01b) to CPU 1, as shown here.

Affinitizing VMs to specific CPUs (also known as "pinning") is generally not a best practice.  It is only used here as a demonstration.

 

 

Stop 1 VM, Monitor ESXi Host CPU

 

Switch back to the Chrome window (vSphere Client) to shut down one of the VMs:

  1. Click on perf-worker-01b
  2. Click on the Shut Down Guest OS (stop icon) and click YES

Let's see if the ESXi Host CPU level has dropped from 100% by shutting this VM down.

 

 

Examine ESXi Host Level CPU with 1 VM

 

Notice that even after shutting down 1 of the VMs, CPU1 is still at 100%.  Why?

Since the remaining resources went to perf-worker-01a, let's see if its performance increased.

 

 

Notice VM Performance Increase on perf-worker-01a

 

If you recall the scores from both VMs at the beginning of the tests, you'll notice that the Performance of the remaining VM has increased to approximately double of its original value, now that we have shut down the other one (and thus reduced the CPU contention on CPU1).

 

 

Stop 2nd VM, Monitor ESXi Host CPU

 

Switch back to the Chrome window (vSphere Client) to shut down the remaining VM:

  1. Click on perf-worker-01a
  2. Click on the Shut Down Guest OS (stop icon) and click YES

Let's see if the ESXi Host CPU level has dropped from 100% by shutting this VM down.

 

 

Examine ESXi Host Level CPU

 

Now that there are no VMs running on the host, CPU1 is no longer at 100%:

  1. Notice the sharp dropoff in the line chart
  2. Monitor the Latest value for CPU1, and you'll notice it's essentially idle now.

 

 

Summary

In summary:

Next, let's talk about Power Management, and how to configure different power policies at the host/BIOS level and within ESXi.

 

Configuring Server BIOS Power Management


VMware ESXi includes a full range of host power management capabilities. These can save power when an ESXi host is not fully utilized.  As a best practice, you should configure your server BIOS settings to allow ESXi the most flexibility in using the power management features offered by your hardware, and make your power management choices within ESXi (next section).

On most systems, the default setting is BIOS-controlled power management. With that setting, ESXi won't be able to manage power; instead the BIOS firmware manages it.  The sections that follow describe how to change this setting to OS Control (recommended for most environments).

In certain cases, poor performance may be related to processor power management, implemented either by ESXi or by the server hardware.  Certain applications that are very sensitive to processing speed latencies may show less than expected performance when processor power management features are enabled. It may be necessary to turn off ESXi and server hardware power management features to achieve the best performance for such applications.  This setting is typically called Maximum Performance mode in the BIOS.

NOTE: Disabling power management usually results in more power being consumed by the system, especially when it is lightly loaded. The majority of applications benefit from the power savings offered by power management with little or no performance impact.

Bottom line: some form of power management is recommended and should only be disabled if testing shows this is impacting your application performance.

For more details on how and what to configure, see this white paper: http://www.vmware.com/files/pdf/techpaper/hpm-perf-vsphere55.pdf


 

Configuring BIOS to OS Control mode (Dell example)

 

The screenshot above illustrates how an 11th Generation Dell PowerEdge server BIOS can be configured to allow the OS (ESXi) to control the CPU power-saving features directly:

For a Dell PowerEdge 12th Generation or newer server with UEFI (Unified Extensible Firmware Interface), review the System Profile modes in the System Setup > System BIOS settings. You see these options:

Choose Performance Per Watt (OS).

Next, you should verify the Power Management policy used by ESXi (see the next section).

 

 

Configuring BIOS to OS Control mode (HP example)

 

The screenshot above illustrates how an HP ProLiant server BIOS can be configured through the ROM-Based Setup Utility (RBSU).  The settings highlighted in red allow the OS (ESXi) to control some of the CPU power-saving features directly:

Next, you should verify the Power Management policy used by ESXi (see the next section).

 

 

Configuring BIOS to Maximum Performance mode (Dell example)

 

The screenshot above illustrates how an 11th Generation Dell PowerEdge server BIOS can be configured to disable power management:

For a Dell PowerEdge 12th Generation or newer server with UEFI, review the System Profile modes in the System Setup > System BIOS settings. You see these options:

Choose Performance to disable power management.

NOTE: Disabling power management usually results in more power being consumed by the system, especially when it is lightly loaded. The majority of applications benefit from the power savings offered by power management, with little or no performance impact. Therefore, if disabling power management does not realize any increased performance, VMware recommends that power management be re-enabled to reduce power consumption.

 

 

Configuring BIOS to Maximum Performance mode (HP example)

 

The screenshot above illustrates how to set the HP Power Profile mode in the server's RBSU to the Maximum Performance setting to disable power management:

NOTE: Disabling power management usually results in more power being consumed by the system, especially when it is lightly loaded. The majority of applications benefit from the power savings offered by power management with little or no performance impact. Therefore, if disabling power management does not realize any increased performance, VMware recommends that power management be re-enabled to reduce power consumption.

 

Configuring ESXi Host Power Management


As described at the start of this module, VMware ESXi includes a full range of host power management capabilities that can save power when an ESXi host is not fully utilized.  Once the server BIOS is configured to give ESXi the most flexibility, you make your power management choices within ESXi.  These choices are described below.


 

Select Host Power Management Settings for esx-01a

 

  1. Select "esx-01a.corp.local"
  2. Select "Configure"
  3. Select "Hardware" (you will need to scroll all the way to the bottom)
  4. Select "Power Management"

 

 

Power Management Policies

 

On a physical host, the Power Management options may look like this (the exact options vary depending on the host's processors).

Here you can see which ACPI states are presented to the host and which Power Management policy is currently active.  There are four Power Management policies available in ESXi:

  1. Click "EDIT..." to see these different options.

NOTE: Due to the nature of this lab environment, we are not interacting directly with physical servers, so changing the Power Management policy will not have any noticeable effect.  Therefore, while the sections that follow will describe each Power Management policy, we won't actually change this setting.
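Although we won't change the policy in this lab, in your own environment you can check the active policy from a script; it is reported as a short key (for example, the value of the Power.CpuPolicy advanced setting) rather than the label shown in the vSphere Client. A minimal lookup sketch, assuming the commonly seen key values — verify these against your own hosts:

```python
# Map the short policy keys (e.g. Power.CpuPolicy advanced setting
# values) to the labels shown in the vSphere Client.  The key names
# are assumptions based on common ESXi values; verify them in your
# environment before relying on them.
POLICY_LABELS = {
    "static": "High Performance",  # no power-saving features used
    "dynamic": "Balanced",         # the default since ESXi 5.0
    "low": "Low Power",            # aggressive P-/C-state selection
    "custom": "Custom",            # Balanced, with tunable parameters
}

def policy_label(key):
    """Translate a reported policy key into its vSphere Client label."""
    return POLICY_LABELS.get(key, "unknown policy key: " + key)

print(policy_label("dynamic"))  # Balanced
```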

 

 

High Performance

 

The High Performance power policy maximizes performance and uses no power management features. It keeps CPUs in the highest P-state at all times and uses only the top two C-states (running and halted), not any of the deep states (for example, C3 and C6 on recent Intel processors). High Performance was the default power policy for ESX/ESXi releases prior to 5.0.

 

 

Balanced (default)

 

The Balanced power policy is designed to reduce host power consumption while having little or no impact on performance. It uses an algorithm that exploits the processor’s P-states, and it has been the default power policy since ESXi 5.0.  Beginning with ESXi 5.5, the Balanced policy also uses deep C-states (deeper than C1). Formerly, an idle CPU would always enter C1; now ESXi chooses a suitable deep C-state based on its estimate of when the CPU will next need to wake up.

 

 

Low Power

 

The Low Power policy is designed to save substantially more power than the Balanced policy by making the P-state and C-state selection algorithms more aggressive, at the risk of reduced performance.

 

 

Custom

 

The Custom power policy starts out the same as Balanced, but allows individual parameters to be modified.

Click "Cancel" to exit.

The next step describes settings that control the Custom power policy.

 

 

Edit Advanced System Settings

 

To configure the custom power policy settings:

  1. Select Advanced System Settings (under the System section)
  2. Click the EDIT... button.

 

 

Filter Advanced System Settings

 

To filter the System Settings for only Power settings:

  1. Click inside the Filter text box (next to the Filter icon) and type Power. (including the trailing period)
  2. Click the first parameter, Power.ChargeMemoryPct
  3. Note that a description and valid minimum and maximum values appear in the lower-left corner.
  4. Click CANCEL after you've reviewed this list.

Some Advanced Settings you can customize include:

 

Conclusion and Clean-Up


In order to free up resources for the remaining parts of this lab, we need to shut down the virtual machines used in this module and reset the configuration.


 

Stop Module 8

 

On your desktop, find the Module Switcher window and click the Stop button for Module 8.

 

 

Key takeaways

CPU contention problems are generally easy to detect. In fact, vCenter has several alarms that trigger if host CPU utilization or virtual machine CPU utilization stays too high for extended periods of time.

vSphere allows you to create very large virtual machines (up to 256 vCPUs as of 6.7 U2; see https://configmax.vmware.com/home for more information). It is highly recommended to size each virtual machine for the application workload that runs in it. Giving a virtual machine more resources than its workload can actually use adds hypervisor overhead and can itself lead to performance issues.

In general, here are some common CPU performance tips.

Avoid a large VM on too small a platform

Don't expect consolidation ratios as high with busy workloads as you achieved with the low-hanging fruit

Here are some best practices around power management policies:

Depending on your applications and the level of utilization of your ESXi hosts, the correct power policy setting can have a great impact on both performance and energy consumption. On modern hardware, it is possible to have ESXi control the power management features of the hardware platform used. You can select to use predefined policies or you can create your own custom policy.

Recent studies have shown that it is best to let ESXi control the power policy.

 

Module 9 - Memory Performance with X-Mem (30 minutes)

Introduction


 

The goal of this module is to learn how to characterize memory performance in a virtualized environment.  VMware vSphere incorporates sophisticated mechanisms that maximize the use of available memory through page sharing, resource-allocation controls, and other memory management techniques.

Host memory is a limited resource, but it is critical that you assign sufficient resources (especially memory, but also CPU) to each VM so it performs optimally.

This module discusses an open-source memory benchmark named X-Mem, which can be used to characterize both memory bandwidth (throughput) and memory latency (access time).


What is X-Mem / Why X-Mem?


This lesson will describe what X-Mem is (no, it's not a superhero movie), and why we've decided to use it to characterize memory performance in this lab.


 

What X-Mem is: A Cross-Platform, Extensible Memory Characterization Tool for the Cloud

From the X-Mem page on github (https://github.com/Microsoft/X-Mem):

X-Mem is a flexible open-source research tool for characterizing memory hierarchy throughput, latency, power, and more. The tool was developed jointly by Microsoft and the UCLA NanoCAD Lab. This project was started by Mark Gottscho (Email: mgottscho@ucla.edu) as a Summer 2014 PhD intern at Microsoft Research. X-Mem is released freely and open-source under the MIT License. The project is under active development.

 

 

Why X-Mem / Alternatives

 

Of course, X-Mem is not the only memory benchmark available.  Here is a feature comparison of X-Mem against some other popular memory benchmarks, such as STREAM, lmbench, and Intel's MLC, along with a quick summary of some key advantages that set it apart:

 

 

Research Paper and Attribution

There is a research paper describing the motivation, design, and implementation of X-Mem, as well as three experimental case studies that use the tool to deliver insights useful to both cloud providers and subscribers. For more information, see the following links:

Citation:

Mark Gottscho, Sriram Govindan, Bikash Sharma, Mohammed Shoaib, and Puneet Gupta. X-Mem: A Cross-Platform and Extensible Memory Characterization Tool for the Cloud. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 263-273. Uppsala, Sweden. April 17-19, 2016. DOI: http://dx.doi.org/10.1109/ISPASS.2016.7482101

 

Downloading/Installing X-Mem


This lesson describes how to download the X-Mem benchmark.  There are prebuilt binaries for Windows and Linux; this lab demonstrates X-Mem inside of Windows VMs.


 

Download and Extract X-Mem

 

There are multiple ways to obtain X-Mem, but the easiest is to go to http://nanocad-lab.github.io/X-Mem/ and click the Binaries (zip) button, which has precompiled binaries for Windows.  If you're using Linux, or wish to make modifications to the source code, click the appropriate link.

 

 

Runtime Prerequisites

There are a few runtime prerequisites for the software to run correctly.  Note that these requirements apply to the pre-compiled binaries available on the project homepage at https://nanocad-lab.github.io/X-Mem, and that they are already met in our lab environment:

HARDWARE:

WINDOWS:

GNU/LINUX:

 

 

Installation

Fortunately, the only file needed to run X-Mem is the respective executable: xmem-win-.exe on Windows, or xmem-linux- on GNU/Linux. It has no dependencies other than the pre-installed system prerequisites just outlined.

 

Running X-Mem



 

Launch Performance Lab Module Switcher

 

Double click on the Performance Lab MS shortcut on the Main Console desktop

 

 

 

Launch Module 9

Click on the Start button for Module 9.

NOTE: Please wait a couple of minutes, and do not proceed with the lab until you see Remote Desktop windows appear.

 

 

Reposition Remote Desktops

 

The script opens Remote Desktop Connections to two Windows VMs.  However, we need to make both of them visible.  Drag the title bars of the Remote Desktop windows:

  1. Position perf-worker-01a on the left (as shown)
  2. Position perf-worker-01b on the right (as shown)
  3. Note that perf-worker-01a has 4 vCPUs
  4. Note that perf-worker-01b has only 1 vCPU
  5. Note that both VMs have 2GB (2047 MB) RAM

Given #5, you might think the memory performance of these two VMs should be identical.  As we'll see, X-Mem can run multiple worker threads to exercise multiple CPUs simultaneously, allowing better scalability with more vCPUs.

 

 

X-Mem Command Line Options

Here is a summary of some of the command-line options we'll be using in this lab; X-Mem has many more options to customize how it is run.

Command-line Option   Purpose
-j                    Number of worker threads to use in benchmarks. NOTE: cannot exceed the number of vCPUs.
-n                    Number of iterations to run; helps ensure consistency (the results shouldn't fluctuate much).
-t                    Throughput benchmark mode (as opposed to -l for latency benchmark mode).
-R                    Use memory read-based patterns.
-W                    Use memory write-based patterns.
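If you later script X-Mem runs, it can help to pull the throughput numbers out of its console output. A rough sketch, assuming only that result lines contain a number followed by MB/s; the sample text below is illustrative, not actual X-Mem output:

```python
import re

def extract_throughputs(text):
    """Return every '<number> MB/s' value found in the text, in order."""
    return [float(m) for m in re.findall(r"([\d.]+)\s*MB/s", text)]

# Illustrative fragment only; real X-Mem output contains more detail.
sample = "Test #1T read 90664.66 MB/s\nTest #2T write 44113.39 MB/s\n"
print(extract_throughputs(sample))  # [90664.66, 44113.39]
```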

 

 

X-Mem Command-Line Help

 

Within the perf-worker-01b Remote Desktop window:

  1. Click the Command Prompt taskbar icon
  2. Type this command:  xmem -h  (h for help) and press Enter
  3. The window is not very big, and there's a lot of help text, so use the Up arrow or the scrollbar to scroll back up and see all the different options X-Mem has.

As you can see, X-Mem has a ton of options!  Let's look at some we'll be using for this lab.

 

 

Run X-Mem throughput (two jobs) on perf-worker-01b (FAIL)

 

You should already have a Command Prompt window open on perf-worker-01b from the previous step; if not, click the Command Prompt icon on the taskbar.

Let's try to run X-Mem with a couple of command-line parameters we just saw: -t to test memory throughput, and -j2 to run two worker threads:

  1. Type xmem -t -j2 and press Enter
  2. You should see output like what is shown here, namely:
ERROR: Number of worker threads may not exceed the number of logical CPUs (1)

This is expected: as you may recall, this VM has only one virtual CPU.

 

 

Run X-Mem throughput (two jobs) on perf-worker-01a (PASS)

 

Now run the exact same X-Mem command that failed on perf-worker-01b on perf-worker-01a:

  1. Select the perf-worker-01a Remote Desktop window.
  2. Click the Command Prompt icon on the taskbar.
  3. Type this command and press Enter: xmem -t -j2
  4. Notice how this command successfully runs the benchmark on this VM.

This command ran successfully on this VM, because it has 4 virtual CPUs (so -j3 or -j4 would also work).  Next, let's take a closer look at the results.

 

 

Review X-Mem throughput results (2 jobs)

 

Once you're back at the command prompt, use the scrollbar to scroll back up and look at the results:

  1. The first benchmark throughput test, Test #1T, will show Read/Write Mode: read.  Since we specified -j2, the output shows that it ran 2 worker threads.  The result in this example was 90664.66 MB/s (about 90.66 GB/s).
  2. The second benchmark throughput test, Test #2T, will show Read/Write Mode: write.  Again, it ran 2 worker threads.  The result in this example was 44113.39 MB/s (about 44.11 GB/s).

Note that your performance may vary, given the shared resources of the hands-on lab environment (where many other workloads are running).

Why did the second test have lower throughput (in this case, about half that of the first)?  Writes are almost always more expensive than reads; this is true for memory/RAM as well as for other subsystems, such as disk storage I/O.
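Using the example numbers above, the read/write gap is easy to quantify; your own measured values will differ:

```python
read_mbps = 90664.66   # Test #1T (read) result from this example run
write_mbps = 44113.39  # Test #2T (write) result from this example run

ratio = read_mbps / write_mbps
print("reads were %.2fx faster than writes" % ratio)  # roughly 2x
```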

 

 

Run X-Mem read throughput (four jobs) on perf-worker-01a

 

Let's further customize the X-Mem command line options, again on perf-worker-01a:

  1. Make sure the focus is on the Command Prompt of the perf-worker-01a Remote Desktop window (if it isn't already)
  2. Type this command and press Enter:   xmem -t -R -j4 -n5
  3. The results will be listed under the *** RESULTS*** heading, as shown here.

Notice that the benchmark ran differently due to the different command line we used.  Here is an explanation of each option:

 

 

Review X-Mem throughput results (four jobs)

 

Once you're back at the command prompt, use the scrollbar to scroll back up and look at the results.  In this example, the results are consistently around 170,000 MB/sec (170 GB/sec).  Since we specified -j4, it ran four worker threads, so the memory performance is significantly higher than when we ran with two worker threads.  

NOTE: Given the nature of our hands-on lab environment, your results may (and probably will) vary from this example.

As shown here, if your application is multi-threaded, additional vCPUs can potentially increase the VM's memory performance.
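Comparing this run against the earlier two-thread read result gives a rough sense of scaling. The figures below are the example values from this manual, so substitute your own:

```python
two_thread_mbps = 90664.66    # read throughput with -j2 (earlier step)
four_thread_mbps = 170000.0   # approximate read throughput with -j4

speedup = four_thread_mbps / two_thread_mbps
efficiency = speedup / 2.0    # thread count doubled, so ideal speedup is 2x
print("speedup: %.2fx, scaling efficiency: %.0f%%" % (speedup, efficiency * 100))
```

In this example the four-thread run scales close to ideally, which is what you would hope for on a VM whose vCPUs are not contended.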

 

 

Close the Remote Desktop windows

 

  1. Close the perf-worker-01a Remote Desktop window
  2. Close the perf-worker-01b Remote Desktop window

 

Conclusion and Clean-Up



 

Stop Module 9

 

On the main console, find the Module Switcher window and click the Stop button for Module 9.

 

 

Key takeaways

During this lab, we learned that X-Mem is a flexible memory benchmark tool.  It can:

You can download this tool to run in your environment to ensure you are getting optimal memory performance out of your hosts and virtual machines.

 

 

Conclusion

This concludes the Memory Performance with X-Mem module. We hope you have enjoyed taking it. Please don't forget to fill out the survey when you finish.

 

Module 10 - Storage Performance and Troubleshooting (30 minutes)

Introduction to Storage Performance Troubleshooting


Approximately 90% of performance problems in a vSphere deployment are related to storage in some way.  There have been significant advances in storage technologies over the past few years to help improve storage performance. There are a few things you should be aware of:

In a well-architected environment, there is no difference in performance between storage fabric technologies. A well-designed NFS, iSCSI or FC implementation works just about the same as the others.

Despite advances in the interconnects, the performance limit is still hit at the media itself. In fact, 90% of storage performance cases seen by GSS (Global Support Services - VMware support) that are not configuration related are media related. Some things to remember:

A good rule of thumb for the total number of IOPS any given disk provides:

So, if you want to know how many IOPS you can achieve with a given number of disks:
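As a sketch of that arithmetic — the per-disk figures below are generic rule-of-thumb estimates, not values from this lab, so substitute your vendor's numbers:

```python
# Typical per-disk IOPS estimates (rule-of-thumb assumptions, not
# measurements from this lab environment):
DISK_IOPS = {
    "7.2K RPM": 80,
    "10K RPM": 130,
    "15K RPM": 180,
}

def estimated_iops(disk_type, disk_count):
    """Rule of thumb: total IOPS ~= per-disk IOPS x number of disks."""
    return DISK_IOPS[disk_type] * disk_count

# For example, eight 15K RPM spindles:
print(estimated_iops("15K RPM", 8))  # 1440
```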

This test demonstrates some methods to identify poor storage performance and how to resolve it using VMware Storage DRS for workload balancing. The first step is to prepare the environment for the demonstration.


 

Launch Performance Lab Module Switcher

 

Double click on the Performance Lab MS shortcut on the Main Console desktop.

 

 

 

Launch Module 10

Click on the Start button under Module 10.  The script configures and starts up the virtual machines and launches a storage workload using Iometer.

The script may take up to five minutes to complete. While it runs, spend a few minutes reading through the next step to gain an understanding of storage latencies.

 

Storage I/O Contention



 

Disk I/O Latency

 

When we think about storage performance problems, the top issue is generally latency, so we need to look at the storage stack, understand its layers, and see where latency can build up.

At the topmost layer is the application running in the guest operating system. That is ultimately where we most care about latency: it is the total amount of latency the application sees, and it includes the latencies of the entire storage stack, including the guest OS, the VMkernel virtualization layers, and the physical hardware.

ESXi can’t see application latency because that is a layer above the ESXi virtualization layer.

From ESXi we see three main latencies that are reported in esxtop and vCenter.  

The topmost is GAVG, or Guest Average latency, which is the total amount of latency that ESXi can detect.

That is not to say this is the total amount of latency the application sees. In fact, if you compare GAVG (the total amount of latency ESXi sees) with the actual latency the application sees, you can tell how much latency the guest OS is adding to the storage stack. This can tell you whether the guest OS is configured incorrectly or is causing a performance problem. For example, if ESXi reports a GAVG of 10ms, but the application or perfmon in the guest OS reports a storage latency of 30ms, then 20ms of latency is building up in the guest OS layer, and you should focus your debugging on the guest OS's storage configuration.

GAVG is made up of two major components, KAVG and DAVG:

DAVG is basically how much time is spent in the device: the driver, HBA, and storage array.

KAVG is how much time is spent in the ESXi kernel (that is, how much overhead the kernel adds).

KAVG is actually a derived metric - ESXi does not measure it directly. ESXi calculates KAVG with the following formula:

KAVG = Total Latency (GAVG) – DAVG

The VMkernel is very efficient at processing I/O, so an I/O really should not spend significant time waiting in the kernel; KAVG should be 0 in well-configured, well-running environments. When KAVG is not 0, it most likely means that the I/O is stuck in a kernel queue inside the VMkernel.  So, the vast majority of the time, KAVG equals QAVG, or Queue Average latency (the amount of time an I/O waits in a queue for a slot in a lower queue to free up so it can move down the stack).
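The derivations above can be expressed as a quick check, using the example figures from this lesson:

```python
def kavg(gavg_ms, davg_ms):
    """KAVG = total latency ESXi sees (GAVG) minus device latency (DAVG)."""
    return gavg_ms - davg_ms

def guest_overhead(in_guest_ms, gavg_ms):
    """Latency added inside the guest OS, above what ESXi can see."""
    return in_guest_ms - gavg_ms

# The example from the text: GAVG of 10ms but 30ms seen in the guest
# means 20ms is building up in the guest OS layer.
print(guest_overhead(30, 10))   # 20
# A healthy host: GAVG is almost entirely DAVG, so KAVG is near 0.
print(kavg(10.0, 9.8) > 1.0)    # False -- no significant kernel queueing
```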

 

 

View the Storage Performance as reported by IOmeter

 

When the storage script has completed, you should see two IOmeter windows, and two storage workloads should be running.

The storage workload is started on both perf-worker-02a and perf-worker-03a. It takes a few minutes for the workloads to settle and for the performance numbers to become almost identical for the two VMs. These virtual machines' test disks share the same datastore, and that datastore is saturated.

The performance can be seen in the IOmeter GUI as:

Latencies (Average I/O Response Time) - around 6ms

Low IOPS (Total I/O per Second) - around 160 IOPS

Low Throughput (Total MBs per Second) - around 2.7 MBps

Disclaimer: Because we run this lab in a fully virtualized environment where the ESXi host servers also run in virtual machines, we cannot assign physical disk spindles to individual datastores. Therefore the performance numbers on these screenshots vary depending on the actual load in the cloud environment the lab is running in.
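The three IOmeter numbers are related: throughput is roughly IOPS multiplied by the average I/O size. A quick sanity check using the example figures above (illustrative only):

```python
iops = 160             # Total I/O per Second from the IOmeter GUI
throughput_mbps = 2.7  # Total MBs per Second from the IOmeter GUI

# Average I/O size: MB per second divided by I/Os per second, in KB.
avg_io_kb = throughput_mbps / iops * 1024
print("average I/O size: about %.1f KB" % avg_io_kb)
```

With these figures the average I/O works out to roughly 17 KB, which is consistent with a mid-sized Iometer access pattern.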

 

 

Log into vSphere web client

 

This is the vCenter login screen.  To login to vCenter:

  1. Check the Use Windows session authentication checkbox
  2. Click the LOGIN button

 

 

Select perf-worker-03a

 

  1. Select "perf-worker-03a"

 

 

View Storage Performance Metrics in vCenter

 

  1. Select "Monitor"
  2. Select "Performance"
  3. Select "Advanced"
  4. Click "Chart Options"

 

 

Select Performance Metrics

 

  1. Select "Virtual disk"
  2. Select only "scsi0:1"
  3. Click "None" under "Select counters for this chart"
  4. Select "Write latency" and "Write rate"
  5. Click "OK"

The disk that IOmeter uses for generating workload is scsi0:1 or sdb inside the guest.

 

 

View Storage Performance Metrics in vCenter

 

Repeat the configuration of the performance chart for perf-worker-02a and verify that performance is almost identical to perf-worker-03a.

Guidance:  Device latencies that are greater than 20ms may see a performance impact in your applications.

Due to the way we create a private datastore for this test, latency numbers are actually quite low. scsi0:1 is located on an iSCSI datastore backed by a RAM disk on perf-worker-04a (DatastoreA), running on the same ESXi host as perf-worker-03a. Hence, latencies are low for a fully virtualized environment.

vSphere provides several storage features to help manage and control storage performance:

Let’s configure Storage DRS to solve this contention problem.

 

Storage Cluster and Storage DRS


A datastore cluster is a collection of datastores with shared resources and a shared management interface. Datastore clusters are to datastores what clusters are to hosts. When you create a datastore cluster, you can use vSphere Storage DRS to manage storage resources.

When you add a datastore to a datastore cluster, the datastore's resources become part of the datastore cluster's resources. As with clusters of hosts, you use datastore clusters to aggregate storage resources, which enables you to support resource allocation policies at the datastore cluster level. The following resource management capabilities are also available per datastore cluster.

Space utilization load balancing: You can set a threshold for space use. When space use on a datastore exceeds the threshold, Storage DRS generates recommendations or performs Storage vMotion migrations to balance space use across the datastore cluster.

I/O latency load balancing: You can set an I/O latency threshold for bottleneck avoidance. When I/O latency on a datastore exceeds the threshold, Storage DRS generates recommendations or performs Storage vMotion migrations to help alleviate high I/O load. Remember to consult your storage vendor to get their recommendation on using I/O latency load balancing.

Anti-affinity rules: You can create anti-affinity rules for virtual machine disks. For example, the virtual disks of a certain virtual machine must be kept on different datastores. By default, all virtual disks for a virtual machine are placed on the same datastore.


 

Change to the Datastore view

 

  1. Change to the Storage view by clicking on the icon
  2. Click on RegionA01 which is under vcsa-01a.corp.local

 

 

Create a Datastore Cluster

 

  1. Click on ACTIONS
  2. Go to Storage
  3. Click on New Datastore Cluster...

 

 

Specify Datastore Name

 

For this lab, we will accept most of the default settings.

  1. We can specify a name for the Datastore cluster, but leave it at the default of DatastoreCluster.
  2. Click NEXT

 

 

Specify Storage DRS Automation

 

  1. Select No Automation (Manual Mode)
  2. Click NEXT

 

 

Specify Storage DRS Runtime Settings

 

  1. Move the slider all the way to the left to specify a 50% Utilized space threshold.
  2. Click NEXT

Since this lab is a nested virtual environment, it is difficult to demonstrate high latency in a reliable manner. Therefore we do not use I/O latency to demonstrate load balancing. The default is to check for storage cluster imbalances every eight hours, but it can be changed to 60 minutes as a minimum.

 

 

Select Clusters and Hosts

 

  1. Check RegionA01-COMP01 to select our lab cluster
  2. Click NEXT

 

 

Select Datastores

 

  1. Select DatastoreA and DatastoreB
  2. Click NEXT

 

 

Ready to Complete

 

Click FINISH to create the Datastore cluster.

 

 

Run Storage DRS

 

Take note of the name of the virtual machine that Storage DRS (SDRS) wants to migrate.

  1. Select DatastoreCluster
  2. Select the Monitor tab
  3. Select Storage DRS / Recommendations
  4. Click RUN STORAGE DRS NOW
  5. Click APPLY RECOMMENDATIONS

Notice that SDRS recommends moving one of the workloads from DatastoreA to DatastoreB. It is making the recommendation based on capacity. SDRS makes storage moves based on performance only after it has collected performance data for more than eight hours. Since the workloads just recently started, SDRS would not make a recommendation to balance the workloads based on performance until it has collected more data.

 

 

Configure Storage DRS

 

  1. Select Configure
  2. Select Storage DRS
  3. Select the dropdown arrows to observe the different SDRS settings you can configure

A number of enhancements have been made to Storage DRS to remove some of the previous limitations: 

Common for all these improvements is that they all require VASA 2.0, which requires that the storage vendor has an updated storage provider.

 

 

Select the VM that was migrated

 

  1. Return to the Hosts and Clusters view by clicking the icon.
  2. Select the VM that was migrated using Storage DRS.  In this example, it is perf-worker-03a

 

 

Increased throughput and lower latency

 

  1. Select the "Monitor" tab
  2. Select "Performance"
  3. Select "Advanced"

Now you should see the performance chart you created earlier in this module.

Notice how the throughput has increased and how the latency is lower (green arrows), than it was when both VMs shared the same datastore.

 

 

Return to the Iometer GUIs to review the performance

 

Return the Iometer workers, and see how they also report increased performance and lower latencies.

It can take a while, maybe ten minutes, for Iometer to show these higher numbers. This is due to the way storage performance is throttled in this lab. If you want to try a shortcut:

  1. Click the "Stop sign", and wait for about 30 seconds
  2. Click the "Green flag" (start tests) to restart the two workers (see arrows on the picture)

The workload should spike but then settle at the higher performance level in a couple of minutes.

 

 

Stop the Iometer workloads

 

Stop the Workloads

  1. Press the "Stop Sign" button on the first Iometer GUI
  2. Close that GUI by clicking the "X"
  3. Press the "Stop Sign" button on the second Iometer GUI
  4. Close that GUI by clicking the "X"

 

Conclusion and Clean-Up


This concludes the Storage Performance and Troubleshooting module. We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.


 

Stop Module 10

 

On the main console, find the Module Switcher window and click Stop for Module 10.  

 

 

Key takeaways

During this lab, we saw the importance of sizing your storage correctly with respect to both space and performance. We also saw that when two storage-intensive sequential workloads share the same spindles, performance can be greatly impacted. If possible, keep workloads separated; in particular, keep sequential workloads separate (backed by different spindles/LUNs) from random workloads.

In general, aim to keep storage latencies under 20ms (lower if possible), and monitor for frequent latency spikes of 60ms or more, which would be a performance concern and something to investigate further.
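Those thresholds can be captured in a small helper for scripted monitoring; the 20ms and 60ms figures come straight from the guidance above:

```python
def assess_latency(samples_ms):
    """Classify device latency samples against this module's guidance."""
    spikes = [s for s in samples_ms if s >= 60]
    if spikes:
        return "investigate: %d spike(s) of 60ms or more" % len(spikes)
    if max(samples_ms) > 20:
        return "warning: device latency above 20ms"
    return "ok: latency under 20ms"

print(assess_latency([6, 8, 5]))      # ok
print(assess_latency([25, 18, 22]))   # warning
print(assess_latency([15, 75, 12]))   # investigate
```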

Guidance: From a vSphere perspective, for most applications, the use of one large datastore vs. several small datastores tends not to have a performance impact. However, the use of one large LUN vs. several LUNs is storage array dependent and most storage arrays perform better in a multi-LUN configuration than a single large LUN configuration.

Guidance: Follow your storage vendor’s best practices and sizing guidelines to properly size and tune your storage for your virtualized environment.

 

Module 11 - Network Performance, Basic Concepts and Troubleshooting (15 minutes)

Introduction to Network Performance


As defined by Wikipedia, network performance refers to measures of service quality of a telecommunications product as seen by the customer.

These metrics are considered important:

In the following module, we will show you how to monitor and troubleshoot some network-related issues so that you can troubleshoot similar issues that may exist in your own environment.


 

Launch Performance Lab Module Switcher

 

To start this module, double-click on the Performance Lab MS shortcut on the Main Console desktop.

 

 

Start Module 11

 

Click on the Start button under Module 11.

 

 

Open Google Chrome

 

Click the Google Chrome icon on the taskbar.

 

 

Login to the vSphere Client

 

This is the vCenter login screen.  To login to vCenter:

  1. Check the Use Windows session authentication checkbox
  2. Click the LOGIN button

 

Monitor network activity with performance charts


Network contention can occur when multiple VMs are accessing the same "pipe" (virtual and/or physical network) and there isn't enough bandwidth available.

In our lab environment, it's not feasible to attempt to saturate the network (we'd like others to be able to take labs without delays!).  Therefore, this module focuses on creating network load and showing you where to look when you suspect network problems in your own environment. 

NOTE: You might see different results on your screen, which is to be expected given the variability of the lab environments.


 

Chart perf-worker-02a network performance

 

  1. Select the perf-worker-02a VM which is on ESXi host esx-01a.corp.local
  2. Select the Monitor tab
  3. Select Performance/Advanced option from the list
  4. Click the Popup Chart icon

 

 

Select Chart Options

 

Click Chart Options to select the network metrics we want to chart.

 

  1. Select the Network subsystem
  2. Select these counters: Packets received, Packets transmitted, and Usage (not pictured)
  3. Make sure only perf-worker-02a is checked (uncheck the other objects)
  4. Click OK

 

 

Monitor chart output

 

Depending on how long it took to get here, the network load test might already be finished. Even so, you should still be able to see the network load that ran.

  1. Here you can see the graphical representation of the network load of perf-worker-02a
  2. Here you can see the counters we selected in the previous step (Packets received, transmitted, and overall Usage in KBps) and a real-time view of their values
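Since the Usage counter is reported in KBps, it helps to relate it to the NIC's link speed. The sketch below (hypothetical function name and values, not part of the vSphere Client) converts Usage into a rough link-utilization percentage:

```python
# Convert the chart's Usage counter (kilobytes per second) into a
# rough utilization percentage of a NIC link (megabits per second).
# The example values are hypothetical.

def utilization_pct(usage_kbps, link_mbps):
    # KBps -> kbps (x8); link Mbps -> kbps (x1000)
    return usage_kbps * 8 / (link_mbps * 1000) * 100

# 12,500 KBps (~100 Mbps) on a 1 GbE link:
print(round(utilization_pct(12500, 1000), 1))  # → 10.0
```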

Good things to look for include sustained high Usage (approaching the capacity of the virtual or physical link) and dropped packets, which indicate the network cannot keep up.

Let's go to the host, and see if this is a VM or a host-level problem.

 

 

Select esx-01a.corp.local host

 

  1. Select host esx-01a.corp.local
  2. Select the Monitor tab
  3. Select Advanced Performance from the list
  4. Select the Popup Chart icon

 

 

Select Chart Options

 

Click Chart Options to select the network metrics we want to chart.

 

 

Monitor chart output

 

  1. See if there are any dropped packets on the host

In this example, there are no dropped packets at the host level, which indicates the host's NICs are not the bottleneck.

NOTE: You might see different results depending upon the lab environment conditions.

 

Conclusion and Clean-Up


This concludes the Network Performance, Basic Concepts and Troubleshooting module. We hope you have enjoyed taking it. Please don't forget to fill out the survey when you are finished.


 

Stop Module 11

 

On the main console, click Stop under Module 11.

 

 

Key takeaways

During this lab we saw how to diagnose networking problems, both at a VM and at an ESXi host level, using the vSphere Client's built-in performance charts.

Note that there are other ways to troubleshoot networking performance, such as the network panel in esxtop (press n) and packet-level tools inside the guest OS.

If you want to know more about troubleshooting network performance, see this VMware KB article:
"Troubleshooting network performance issues in a vSphere environment": http://kb.vmware.com/kb/1004087  

 

Module 12 - Advanced Performance Feature: Latency Sensitivity Setting (45 minutes)

Introduction to Latency Sensitivity


The 'Latency Sensitivity' feature was developed to address major sources of latency that can be introduced by virtualization. It programmatically reduces response time and jitter on a per-VM basis by giving sensitive workloads exclusive access to physical resources, avoiding resource contention at a granular level, and bypassing virtualization layers to reduce overhead. Even greater performance can be realized when Latency Sensitivity is used in conjunction with a pass-through mechanism such as single-root I/O virtualization (SR-IOV).

Since this feature is set on a per-VM basis, a mixture of both normal VMs and latency sensitive workload VMs can be run on a single vSphere host.


 

Who should use this feature?

The latency sensitivity feature is intended only for specialized use cases, namely workloads that require extremely low latency. It is important to determine whether your workload can actually benefit from this feature before enabling it. Latency Sensitivity delivers very low network latency at the cost of increased CPU and memory consumption (because of reduced resource sharing) and increased power consumption.

A “highly latency-sensitive application” is one that requires network latencies in the tens of microseconds and very small jitter. An example is a stock-trading application, where any introduced latency could mean the difference between making millions and losing millions.

Before deciding to leverage VMware’s latency sensitivity feature, perform a cost-benefit analysis to determine whether the feature is actually necessary. Enabling it just because it exists can lead to higher host CPU utilization and power consumption, and it can needlessly impact the performance of the other VMs running on the host.

 

 

Who should not use this feature?

Choosing whether to enable latency sensitivity is one of those “just because you can doesn’t mean you should” choices. The Latency Sensitivity feature reduces network latency; it does not decrease application latency, especially when that latency is dominated by storage or other sources besides the network.

The latency sensitivity feature should only be enabled in environments in which the CPU is undercommitted. VMs that have latency sensitivity set to High are given exclusive access to physical CPU cores on the host, which means a latency-sensitive VM can no longer share those cores with neighboring VMs.

Generally, VMs that use the latency sensitivity feature should have fewer vCPUs than the number of cores per socket in your host to ensure that the latency sensitive VM occupies only one NUMA node.
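That sizing rule can be expressed as a simple check; the function name and example values below are hypothetical:

```python
# Rule of thumb from above: to keep a latency-sensitive VM within a
# single NUMA node, its vCPU count should be below the host's
# cores-per-socket count. The example values are hypothetical.

def fits_one_numa_node(vm_vcpus, cores_per_socket):
    return vm_vcpus < cores_per_socket

print(fits_one_numa_node(8, 10))   # → True
print(fits_one_numa_node(12, 10))  # → False
```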

If the latency sensitivity feature is not relevant to your environment, consider choosing a different module.

 

 

Changes to CPU access

When a VM has 'High' latency sensitivity set in vCenter, the VM is given exclusive access to the physical cores it needs to run. This is termed exclusive affinity. These cores will be reserved for the latency sensitive VM only, which results in greater CPU accessibility to the VM and less L1 and L2 cache pollution from multiplexing other VMs onto the same cores. When the VM is powered on, each vCPU is assigned to a particular physical CPU and remains on that CPU.

When the latency sensitive VM's vCPU is idle, ESXi also alters its halting behavior so that the physical CPU remains active. This reduces wakeup latency when the VM becomes active again.

 

 

Changes to virtual NIC interrupt coalescing

A virtual NIC (vNIC) is a virtual device that exchanges network packets between the VMkernel and the guest operating system. Exchanges are typically triggered by interrupts to the guest OS or by the guest OS calling into the VMkernel, both of which are expensive operations. Virtual NIC interrupt coalescing, which is enabled by default in vSphere, attempts to reduce CPU overhead by holding back packets for some time (combining, or "coalescing", these packets) before triggering an interrupt, so that the hypervisor wakes the VM less frequently.

Enabling 'High' latency sensitivity disables virtual NIC coalescing, so that there is less latency between when a packet is sent or received and when the CPU is interrupted to process the packet.   Typically, coalescing is desirable for higher throughput (so the CPU isn't interrupted as often), but it can introduce network latency and jitter.

While disabling coalescing can reduce latency, it can also increase CPU utilization and thus power usage.  Therefore this option should only be used in environments with small packet rates and plenty of CPU headroom.
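For reference, vNIC coalescing can also be controlled per adapter through a VM's advanced configuration parameters, independent of the Latency Sensitivity setting. A sketch of the relevant .vmx entry, per VMware's latency-tuning guidance (verify the parameter against your vSphere version before relying on it):

```
ethernet0.coalescingScheme = "disabled"
```

Setting Latency Sensitivity to High disables coalescing automatically, so this manual setting is normally unnecessary.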

Are you ready to get your hands dirty? Let's start the hands-on portion of this lab.

 

 

Check the Lab Status in the lower-right of the desktop

 

Please check that your lab has finished all the startup routines and is ready for you to start. If you see anything other than "Ready", please wait a few minutes. If after five minutes your lab has not changed to "Ready", please ask for assistance.

 

Enabling and Confirming the Latency Sensitivity setting


In this section, we learn how to enable the Latency Sensitivity setting for a VM in the lab environment and confirm that it is active.


 

Open Google Chrome

 

First, let's open Google Chrome.

 

 

Login to vCenter

 

This is the vCenter login screen.  To log in to vCenter:

  1. Check the Use Windows session authentication checkbox
  2. Click the LOGIN button

 

 

Select the challenge-04a VM

 

Let's select the VM we'll be enabling Latency Sensitivity on:

  1. Ensure Hosts and Clusters is the view in the vSphere Client by clicking the highlighted icon
  2. Select the challenge-04a VM highlighted
  3. Note this VM has 2 CPUs and 2 GB of Memory configured.
  4. Click the Edit Settings icon so we can enable the setting.

 

 

Go to Advanced VM Settings

 

  1. Select VM Options
  2. Expand the Advanced pulldown
  3. Scroll down

 

 

Set Latency Sensitivity to High

 

After you scroll down, you should see the Latency Sensitivity setting.

  1. Select the dropdown and change it from Normal to High
  2. Click OK to save this setting

Now, let's try to power on this VM.  Hint: We may have to do a couple more things before it powers on successfully, but we'll learn how to do those as well.

 

 

Power on challenge-04a VM, Note CPU Reservation Requirement

 

Let's try to power on the VM, and note the error that comes up.

  1. Click the Power On icon to attempt to start the VM
  2. Note the failure in the lower right corner: Operation failed! 
  3. The reason for the failure is listed next to Status: Invalid CPU reservation for the latency-sensitive VM, (sched.cpu.min) should be at least 5598 MHz.
    NOTE: Your specific lab environment will likely show a different MHz value here; please make note of it for the next step.
  4. Click the Edit Settings icon again so we can set the CPU reservation.

 

 

Set CPU Reservation

 

Let's set the CPU reservation for the challenge-04a VM to resolve the power-on failure.

  1. Expand the CPU dropdown.
  2. The Reservation field is 0 by default (no CPU reservation).  Change this to the value that the error message stated in the previous step.  In this example, it is 5598 MHz but might differ in your lab environment. 
  3. Click OK.

Try to power on the VM again now that the CPU reservation has been set.
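The required figure comes from the VM's vCPU count multiplied by the host's per-core clock speed. A minimal sketch of that arithmetic, using this lab's example numbers (your host's core speed will differ):

```python
# Minimum CPU reservation for a latency-sensitive VM: each vCPU must
# reserve a full physical core's frequency. 2799 MHz is the lab
# host's core speed in this example; yours will differ.

def min_cpu_reservation_mhz(num_vcpus, core_mhz):
    return num_vcpus * core_mhz

print(min_cpu_reservation_mhz(2, 2799))  # → 5598, as in the error message
```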

 

 

Power on challenge-04a VM, Note Memory Reservation Requirement

 

Let's try to power on the VM, and note the error that comes up.

  1. Click the Power On icon to attempt to start the VM
  2. Note the failure in the lower right corner: Operation failed!
  3. The reason for the failure is listed next to Status: Invalid memory setting: memory reservation (sched.mem.min) should be equal to memsize(2048).
  4. Click the Edit Settings icon again so we can set the memory reservation.

 

 

Set Memory Reservation

 

Let's set the Memory Reservation for the challenge-04a VM to resolve the power-on failure.

  1. Expand the Memory dropdown.
  2. The Reservation field is 0 by default (no memory reservation).  Check the "Reserve all guest memory (All locked)" checkbox. 
  3. Click OK.

Let's try to power on the VM again, now that both the CPU and memory reservations have been set.

 

 

Power on challenge-04a VM

 

Let's try to power on the VM, and note that there shouldn't be any more errors.

  1. Click the Power On icon to attempt to start the VM.
  2. Click the Monitor tab so we can confirm that the CPU and Memory Reservations are indeed working as intended.

 

 

Open PuTTY

 

Click on the PuTTY icon so we can SSH to the host that is running the challenge-04a VM.

 

 

PuTTY to esx-01a

 

Double-click on the esx-01a.corp.local session.

 

 

Launch esxtop

 

Type in esxtop and press Enter.

 

 

Filter only running VMs (V)

 

Filtering only running VMs makes the display easier to read.
Type an uppercase V (Shift-v) to see the display change to show only challenge-04a.

 

 

Change the displayed esxtop fields (f)

 

Press the f key (short for fields) to see a display like the one above.

We want to remove the "F" field (CPU State Times) and add the "I" field (CPU Summary Stats).
Press the uppercase F and I keys, and you should see CPU Summary Stats selected.

Press Enter to return to the main esxtop display.

 

 

Expand the esxtop window to the right

 

There are still many fields in esxtop, so expand the window by clicking and dragging the edge of the right border of the window to the right.

There are two things we want to note here:

  1. Note the GID of your VM (279396 in this example; it will be different in your lab environment)
  2. Note that EXC_AF is Y - this is new with ESXi 6.7; it confirms that the VM has exclusive affinity

 

 

Expand the VM GID in esxtop

 

Expand the GID of the VM displayed in your lab environment: press the e key, then type the GID you noted in the previous step and press Enter.

 

 

Expanded esxtop display shows EXC_AF for all vCPUs

 

Note that we now see much more information about the challenge-04a VM, including processes, which CPUs those processes are on, and so on.

The highlighted rows show:

  1. There are two vCPUs (vmx-vcpu-0 and vmx-vcpu-1)
  2. Both vCPUs have 99% DMD (demand), confirming the CPU reservation set earlier
  3. Exclusive affinity (EXC_AF = Y), confirming the Latency Sensitivity setting is active

When you are done reviewing, go ahead and close PuTTY.

 

 

Monitor Performance Overview Charts (CPU and Memory)

 

Switch back to the vSphere Client (Chrome window) and let's look at the CPU and Memory usage, now that we have created the necessary Reservations for Latency Sensitivity to be High.

  1. Under Performance, click Overview.
  2. For the time range, make sure this drop-down is set to Real-time.
  3. Hover over the red lines on the left and right, which indicate CPU and Memory Usage for challenge-04a.
    These usages are 100%, which means the reservations we set are working.

    NOTE: While these reservations are necessary for Latency Sensitivity, keep in mind that they prevent ESXi from sharing or reclaiming idle resources for other VMs.

 

 

Shut down challenge-04a VM

 

Still in the Chrome/vSphere Client window:

  1. Click the Refresh icon
  2. Ensure the Monitor tab is still highlighted
  3. Click All Issues
  4. Note that vCenter generated CPU and memory usage alarms for this VM.
    This is expected, given the reservations we had to set for Latency Sensitivity.
  5. Click the Shut Down icon to power off the VM.
  6. Click YES to confirm.

 

 

 

Remove CPU Reservation for challenge-04a VM

 

To remove the CPU reservation:

  1. Expand CPU by clicking the caret as shown
  2. Change the Reservation from the value you set earlier to 0 MHz. This removes the CPU reservation.
  3. Collapse CPU by clicking the caret from Step 1.

Next, we'll remove the Memory reservation.

 

 

Remove Memory Reservation for challenge-04a VM

 

To remove the Memory reservation:

  1. Expand Memory by clicking the caret as shown
  2. Uncheck the "Reserve all guest memory (All locked)" checkbox
  3. Change the Reservation from 2048 MB to 0 MB as shown.  This removes the memory reservation.
  4. Click the VM Options tab, where we will set Latency Sensitivity back to Normal.

 

 

 

Go to Advanced VM Settings

 

  1. Select VM Options
  2. Expand the Advanced pulldown
  3. Scroll down

 

 

Set Latency Sensitivity to Normal

 

After you scroll down, you can see the Latency Sensitivity setting.

  1. Select the dropdown and change it from High to Normal.
  2. Click OK to save these settings.

 

 

Summary

Congratulations!  In summary, you have successfully:

  1. Enabled the Latency Sensitivity setting (High) on a VM
  2. Set the CPU and memory reservations that the setting requires
  3. Confirmed exclusive affinity (EXC_AF = Y) with esxtop
  4. Reverted the reservations and the Latency Sensitivity setting

 

Conclusion


This concludes the Latency Sensitivity module.  We hope you have enjoyed taking it. Please do not forget to fill out the survey when you are finished.


 

Key takeaways

The Latency Sensitivity setting is easy to configure, but you should determine whether your application fits the definition of "High" latency sensitivity.

To review: 

  1. In the VM's Advanced Settings, set Latency Sensitivity to High.
  2. Set the minimum CPU reservation for the latency-sensitive VM such that the MHz reserved equals the number of vCPUs multiplied by the CPU core frequency.
  3. Set a 100% memory reservation to reserve/lock all of the guest VM's memory.
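The reservation requirements above amount to two checks, sketched below with hypothetical names and values:

```python
# Sanity-check a VM's settings against the Latency Sensitivity
# requirements reviewed above. All names and values are hypothetical.

def ready_for_latency_sensitivity(num_vcpus, core_mhz, cpu_res_mhz,
                                  mem_mb, mem_res_mb):
    cpu_ok = cpu_res_mhz >= num_vcpus * core_mhz  # full-core CPU reservation
    mem_ok = mem_res_mb == mem_mb                 # all guest memory locked
    return cpu_ok and mem_ok

print(ready_for_latency_sensitivity(2, 2799, 5598, 2048, 2048))  # → True
print(ready_for_latency_sensitivity(2, 2799, 0, 2048, 0))        # → False
```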

If you want to learn more about running latency-sensitive applications on vSphere, consult VMware's white papers on tuning for latency-sensitive workloads.

 

 

Test Your Skills!

 

Now that you’ve completed this lab, try testing your skills with VMware Odyssey, our newest Hands-on Labs gamification program. We have taken Hands-on Labs to the next level by adding gamification elements to the labs you know and love. Experience the fully automated VMware Odyssey as you race against the clock to complete tasks and reach the highest ranking on the leaderboard. Try the vSphere Performance Odyssey lab.

 

Conclusion

Thank you for participating in the VMware Hands-on Labs. Be sure to visit http://hol.vmware.com/ to continue your lab experience online.

Lab SKU: HOL-2004-01-SDC

Version: 20200424-193737