Determining Number Of VMs Per VMFS Volume

This article explains the following two articles in more detail:

VMware Performance Study: Scalable Storage Performance
Yellow Bricks article: Max amount of VMs per VMFS volume

Both articles are quite technical and can be confusing. I have a fair amount of storage experience, more than the average system administrator, but not at the level of the people who have worked with NetApp and EMC gear for twenty years. So I decided to write an article that walks through both sources step by step, skipping the steps that are not relevant for my environment and providing an extensive example for future reference.

My environment:

  • ESX 4.1 update 1
  • VMs are roughly 80% Windows Server 2003 and 20% Windows Server 2008
  • Storage is an IBM N series N6060, which is a rebranded NetApp FAS3160
  • Storage link is 4 Gb Fibre Channel
  • Disks are Fibre Channel type

Excluded step: I'm not taking SCSI reservations into consideration. See the quote below for the kinds of operations that cause SCSI reservations. In my environment I do not expect these kinds of operations to happen on a daily basis:

VMFS is a clustered file system and uses SCSI reservations as part of its distributed locking algorithms. Administrative operations, such as creating or deleting a virtual disk, extending a VMFS volume, or creating or deleting snapshots, result in metadata updates to the file system using locks, and thus result in SCSI reservations. Reservations are also generated when you expand a virtual disk for a virtual machine with a snapshot. A reservation causes the LUN to be available exclusively to a single ESX host for a brief period of time. Although it is acceptable practice to perform a limited number of administrative tasks during peak hours, it is preferable to postpone major maintenance or configuration tasks to off-peak hours in order to minimize the impact on virtual machine performance.

Remember that changing settings and/or defaults can do more harm than good if you haven't properly analyzed your environment. Default values are there for a reason.

To come to an actual number of Virtual Machines per VMFS volume we have to gather data first. Data gathering consists of two parts:

  1. Performance and settings data gathering
  2. Capacity data gathering

Performance and Settings Data Gathering

Active SCSI Commands

The first step is gathering the average number of active SCSI commands, because that will take some time. To do so, we will run VMware esxtop in batch mode using a modified configuration file. First we have to create the configuration file:

  • Log on to the ESX host and type esxtop
  • Switch to each of these components and disable all counters:
    • c : cpu
    • m : memory
    • d : disk adapter
    • v : virtual machine disk
    • n : network
    • i : interrupt
    • p : power management
  • Then switch to u (disk device) and leave queue stats enabled while disabling all other counters
  • Save the configuration by typing W and providing a name for the file (/root/batch_mode)
  • Exit esxtop by typing q

Now start esxtop in batch mode like this:

esxtop -b -c /root/batch_mode -d 2 -n 900 | gzip -9c > /tmp/esxtop_esxprd01_1244.csv.gz

And run it in batch mode in the background for a full day:

nohup esxtop -b -c /root/batch_mode -d 15 -n 5760 | gzip -9c > /tmp/esxtop_fullday_`hostname -s`.csv.gz &

Explanation of the above switches:

  • b : batch mode
  • c : path to a custom config file for esxtop or resxtop
  • d : delay in seconds between samples
  • n : number of samples
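
For reference: with -d 15 and -n 5760 the capture spans 15 × 5760 = 86,400 seconds, a full 24 hours, while the first command (-d 2, -n 900) covers a 30-minute run.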

After running the command and downloading/unpacking the file you can load it into Excel or perfmon to evaluate the data. In my environment we had so many disks that Excel would run out of columns. Even the latest version of LibreOffice did not fix that. So I was left with perfmon, which does not cope very well with large files.

Update: Excel 2010 can import that many columns. Using paste special/transpose you can flip columns and rows and filter out what you don't need, so the file gets smaller, which is better for perfmon. I performed these steps to make the data readable (a scripted alternative follows the list):

  • Imported the csv as text and saved it as xls
  • Removed as many columns as I could easily select and saved the file
  • Copied the text and paste special/transposed it into a new workbook
  • Sorted the text on the first column and removed every counter I could easily select
  • Performed find and replace on specific counters, replacing them with blank cells (search for *Q Depth)
  • Sorted the text again, paste special/transposed it into a new workbook, and saved the file as csv
  • Opened the csv with a text editor and replaced the ; with “,”
  • Created a macro to add a “ at the start and end of each line
  • Saved the csv
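
If you would rather script this cleanup, the PowerShell sketch below reduces an esxtop batch csv to just the timestamp column and the queue counters before feeding it to perfmon. The file paths and the counter name patterns are assumptions; adjust them to your own output.

# Sketch: shrink an esxtop batch-mode csv so perfmon gets a smaller file.
# Paths and name patterns below are assumptions, not from the sources.
$inFile  = "C:\temp\esxtop_fullday_esxprd01.csv"
$outFile = "C:\temp\esxtop_fullday_esxprd01_small.csv"

$data = Import-Csv $inFile

# Keep the timestamp column (its header starts with "(PDH-CSV") plus any
# counter whose name contains Active Commands or Queued Commands
$headers = $data[0].PSObject.Properties | ForEach-Object { $_.Name }
$keep = $headers | Where-Object {
    $_ -like "(PDH-CSV*" -or $_ -match "Active Commands|Queued Commands"
}

$data | Select-Object $keep | Export-Csv $outFile -NoTypeInformation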

You can import a CSV in perfmon as follows:

  • Run: perfmon
  • Right click on the graph and select “Properties”.
  • Select the “Source” tab.
  • Select the “Log files:” radio button from the “Data source” section.
  • Click the “Add” button.
  • Select the CSV file created by esxtop and click “OK”.
  • Click the “Apply” button.
  • Optionally: reduce the range of time over which the data will be displayed by using the sliders under the “Time Range” button.
  • Select the “Data” tab.
  • Remove all Counters.
  • Click “Add” and select appropriate counters.
  • Click “OK”.
  • Click “OK”.

Because perfmon would crash several times, I was forced to create a file covering a shorter period of time. Eventually I added these counters:

  • Active Commands
  • Queued Commands

These counters look like this in perfmon: countersperfmon.jpg
The sum of these counters is the total number of outstanding commands for that specific LUN. In case multiple ESX hosts access that LUN, perform the same steps on all those hosts and add their numbers as well.

Queue Depth

The LUN queue depth determines how many commands the HBA is willing to accept and process per LUN. If a single VM is issuing IO, the queue depth setting is indeed the leading parameter. If multiple VMs are simultaneously issuing IO to the LUN, the Disk.SchedNumReqOutstanding setting becomes the leading parameter.

Note that it's a best practice to keep both settings at the same value.

The LUN queue depth can be found like this:

  • Select the host in vCenter and go to the Performance tab
  • Click the Advanced view
  • Click Chart options
  • Select Disk → Real Time
  • Select all disks in the Objects panel
  • Only select the Maximum Queue Depth in the Counters panel
  • Click OK

This will give you an overview of the LUNs accessible by the ESX host and the corresponding queue depth value: lunqueuedepth.jpg

Disk.SchedNumReqOutstanding is a per-host value and can be found like this:

  • Select the host in vCenter and go to the Configuration tab
  • Click Advanced Settings
  • Click Disk and find the value in the panel on the right:

advancedsettingsdisk.jpg
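
If you prefer querying this setting from PowerCLI instead of the vSphere Client, a sketch along these lines should work ("Acceptance" is the example cluster used later in this article; Get-VMHostAdvancedConfiguration is the PowerCLI cmdlet of this vSphere generation):

# Read Disk.SchedNumReqOutstanding for every host in a cluster
Get-Cluster "Acceptance" | Get-VMHost | ForEach-Object {
    $cfg = Get-VMHostAdvancedConfiguration -VMHost $_ -Name "Disk.SchedNumReqOutstanding"
    "{0}: {1}" -f $_.Name, $cfg["Disk.SchedNumReqOutstanding"]
}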

Formula

Now we can use the formula from the sources to determine the maximum number of VMs per VMFS volume:

First sum up all the gathered information:

  • Average active SCSI commands per server: 4
  • LUN queue depth: 16 or 32
  • Disk.SchedNumReqOutstanding: 32

The formula used on shared storage is: maximum number of VMs per VMFS volume = LUN queue depth / average active SCSI commands per VM. In my case that gives 32 / 4 = 8 or 16 / 4 = 4.

So, depending on the queue depth, which differs on a few LUNs, I can have a maximum of 8 or 4 VMs per VMFS volume.
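
As a minimal sketch of the same calculation in PowerShell (the function name is mine, not from the sources):

function Get-MaxVMsPerVolume {
    param(
        [int]$LunQueueDepth,          # or Disk.SchedNumReqOutstanding if that is lower
        [int]$AvgActiveCommandsPerVM
    )
    [math]::Floor($LunQueueDepth / [double]$AvgActiveCommandsPerVM)
}

Get-MaxVMsPerVolume -LunQueueDepth 32 -AvgActiveCommandsPerVM 4    # returns 8
Get-MaxVMsPerVolume -LunQueueDepth 16 -AvgActiveCommandsPerVM 4    # returns 4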

Capacity Data Gathering

Now that we have established that for performance reasons the maximum number of VMs per VMFS volume should not exceed 8, we should also look at the storage demand. In our environment we have an acceptance cluster consisting of 4 ESX hosts. In this cluster reside 117 VMs. We should calculate whether these 117 VMs (and future growth) can coexist with each other, keeping ESX limits in mind. In ESX 4.1 the maximums we should consider are these (a quick sanity check follows the list):

  • max paths per host: 1024
  • max LUNs per host: 256
  • max paths per LUN: 32
  • max LUN size: 2 TB minus 512 bytes
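
As a quick sanity check, assuming 4 paths per LUN (a typical dual-fabric Fibre Channel layout; adjust to your own zoning), the path limit and the LUN limit are reached at the same point:

# With an assumed 4 paths per LUN: 1024 / 4 = 256, so the 1024-path
# limit and the 256-LUN limit are both hit at 256 LUNs per host
$pathsPerLun = 4
$maxLunsByPaths = [math]::Floor(1024 / [double]$pathsPerLun)
[math]::Min(256, $maxLunsByPaths)    # returns 256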

Current Storage Needs

You can create csv files with PowerCLI that report the storage need per cluster:

$timestamp = Get-Date -format "yyyyMMdd-HH.mm"

# $vCenter = "vCenter"
# Connect-VIServer $vCenter

foreach ($cluster in (Get-Cluster)){
# $cluster = "Acceptance"

  # One timestamped csv per cluster
  $csvfile = "D:\adminshf\$timestamp-$cluster-storagerequirements.csv"

  $myCol = @()

  $vms = Get-Cluster $cluster | Get-VM
  # $vms = Get-Cluster "Acceptance" | Get-VM

  foreach($vm in $vms){

    $vmview = $vm | Get-View

    $VMInfo = "" | Select-Object VMName,UsedSpaceGB,ProvisionedSpaceGB,MEMSize,MEMReservation,ProposedMEMReservation,RequiredVMKernelStorage
    $VMInfo.VMName = $vmview.Name
    $VMInfo.UsedSpaceGB = [System.Math]::Round($vm.UsedSpaceGB,0)
    $VMInfo.ProvisionedSpaceGB = [System.Math]::Round($vm.ProvisionedSpaceGB,0)
    $VMInfo.MEMSize = $vmview.Config.Hardware.MemoryMB
    $VMInfo.MEMReservation = $vmview.Config.MemoryAllocation.Reservation
    # Proposed reservation: one third of the configured memory (MB)
    $VMInfo.ProposedMEMReservation = [System.Math]::Round(($VMInfo.MEMSize / 3),0)
    # VMkernel swap file size (MB): configured memory minus the proposed reservation
    $VMInfo.RequiredVMKernelStorage = ($VMInfo.MEMSize - $VMInfo.ProposedMEMReservation)

    $myCol += $VMInfo

  }

  $myCol | Export-Csv -NoTypeInformation $csvfile

}

# Disconnect-VIServer -Confirm:$false

You can now open the csv in Excel and make some calculations on what you need per LUN, for example:
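
As an illustrative sketch with assumed numbers (take the real averages from the exported csv; the 20% headroom is my assumption, not from the sources):

$vmsPerLun = 8      # performance maximum determined above
$avgDiskGB = 60     # assumed average provisioned disk per VM, from the csv
$avgSwapGB = 2      # assumed average VMkernel swap (RequiredVMKernelStorage)
# Add roughly 20% headroom for snapshots and growth (assumption)
[math]::Ceiling($vmsPerLun * ($avgDiskGB + $avgSwapGB) * 1.2)    # 596 GB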

Deploying Storage

The last step would logically be to deploy the storage. I wrote a special script for that, which you can find here.
