VMware Troubleshooting Disk and Datastore Related Issues FAQ - Part 1

Troubleshooting Disk and Datastore Related Issues

1. Troubleshooting a VMFS resource volume that is corrupted


The event indicates that the reported VMFS volume is corrupted.

Example

If 4976b16c-bd394790-6fd8-00215aaf0626 represents the UUID and san-lun-100 represents the associated volume label, you see:

For Event: vmfs.lock.corruptondisk

Volume 4976b16c-bd394790-6fd8-00215aaf0626 (san-lun-100) may be damaged on disk. Corrupt lock detected at offset 0

For Event: vmfs.resource.corruptondisk

Volume 4976b16c-bd394790-6fd8-00215aaf0626 (san-lun-100) may be damaged on disk. Resource cluster metadata corruption detected

Impact

The scope of the corruption may vary. It might affect just one file or corrupt the whole volume. Do not continue to use the affected VMFS volume.

Solution

To recover from this issue:

Back up all data on the volume.

Run the following command to save the VMFS3 metadata region, and provide the resulting file to VMware customer support:

dd if=/vmfs/devices/disks/<disk> of=/root/dump bs=1M count=1200 conv=notrunc

where <disk> is the partition that contains the volume. If you have a spanned volume, <disk> is the head partition.

This provides information on the extent of the volume corruption and can assist in recovering the volumes.
=======================================================================
2. VMFS Lock Volume is Corrupted

Details

You may observe either of the following events in the /var/log/vmkernel logs on your VMware ESX host:

Volume 4976b16c-bd394790-6fd8-00215aaf0626 (san-lun-100) may be damaged on disk. Corrupt lock detected at offset 0

Volume 4976b16c-bd394790-6fd8-00215aaf0626 (san-lun-100) may be damaged on disk. Resource cluster metadata corruption detected

Note: In this example 4976b16c-bd394790-6fd8-00215aaf0626 represents the UUID of the VMFS datastore and san-lun-100 represents the name of the VMFS datastore.
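As a rough illustration of how these two messages could be picked out of a saved copy of the vmkernel log, the following Python sketch extracts the UUID, the datastore name, and the corruption detail. The function name is hypothetical, and the UUID layout (8-8-4-12 hexadecimal digits) is an assumption inferred from the example above rather than from a published format.

```python
import re

# Matches the two corruption messages quoted in this section.
# Assumption: UUIDs always follow the 8-8-4-12 hex layout seen in the example.
EVENT_RE = re.compile(
    r"Volume (?P<uuid>[0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{12}) "
    r"\((?P<label>[^)]*)\) may be damaged on disk\. "
    r"(?P<detail>Corrupt lock detected at offset \S+"
    r"|Resource cluster metadata corruption detected)"
)


def find_corruption_events(log_text):
    """Return one (uuid, label, detail) tuple per log line that matches."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append(
                (match.group("uuid"), match.group("label"), match.group("detail"))
            )
    return events
```

Lines that do not contain either message are ignored, so the function can be run over a whole log file read into a string.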

Solution

The events indicate that the reported VMFS volume is corrupt. The scope and the cause of the corruption may vary. The corruption may affect just one file or the entire volume.

Create a new datastore and restore any information that may have been compromised to the new datastore from existing backups. Do not use the corrupt VMFS datastore any longer.

Note: If some information is still accessible on the datastore that is reportedly corrupt, you may attempt to migrate the information off of the datastore with the use of the vCenter migrate feature, vmkfstools, or the datastore browser. If you are able to migrate any information off of the corrupt datastore, validate the information to ensure that it has not been affected by the corruption.

Determining the cause of the corruption

If you would like assistance in determining the cause of the corruption, VMware technical support can provide assistance in a best-effort capacity.

Note: See the support service terms and conditions for more information.

To collect the appropriate information to diagnose the issue:

1. Log in to the service console as root.

2. Find the partition that contains the volume. In the case of a spanned volume, this is the head partition. Run the following command to find the value of the partition:

vmkfstools -P /vmfs/volumes/<volumeUUID>

For example, run the following command to find the partition for 4976b16c-bd394790-6fd8-00215aaf0626:

# vmkfstools -P /vmfs/volumes/4976b16c-bd394790-6fd8-00215aaf0626
File system label (if any): san-lun-1000
Mode: public
Capacity 80262201344 (76544 file blocks * 1048576), 36768317440 (35065 blocks) avail
UUID: 49767b15-1f252bd1-1e57-00215aaf0626
Partitions spanned (on "lvm"):
        naa.60060160b4111600826120bae2e3dd11:1

3. Make note of the first device listed in the output for the Partitions spanned list. This is the value for the partition. In the above example, the first device is:

naa.60060160b4111600826120bae2e3dd11:1

4. Using the value from step 3, run the following command to save the VMFS3 metadata region so that you can provide it to VMware customer support:

dd if=/vmfs/devices/disks/<disk:partition> of=/root/dump bs=1M count=1200 conv=notrunc

Note: The variable <disk:partition> is the value recorded in step 3.
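The lookup and the dd command line can be tied together in a short Python sketch. It only parses text and assembles an argument list; it does not run anything. The function names are hypothetical, and the parser assumes the device appears either on the same line as "Partitions spanned" or on a following line, since both layouts are plausible for saved output.

```python
def head_partition(vmkfstools_output):
    """Return the first device listed under 'Partitions spanned', or None."""
    lines = vmkfstools_output.splitlines()
    for index, line in enumerate(lines):
        if line.strip().startswith("Partitions spanned"):
            # Text after the first colon on this line, then any later lines.
            _, _, rest = line.partition(":")
            for candidate in [rest] + lines[index + 1:]:
                tokens = candidate.split()
                if tokens:
                    return tokens[0]
    return None


def build_dd_command(partition, dump_path="/root/dump"):
    """Assemble the dd invocation shown in the text as an argument list."""
    return [
        "dd",
        "if=/vmfs/devices/disks/" + partition,
        "of=" + dump_path,
        "bs=1M",
        "count=1200",
        "conv=notrunc",
    ]
```

Returning an argument list rather than a single string keeps the partition name from being reinterpreted by a shell if the list is later passed to a process launcher.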

Caution: The resulting file is approximately 1200 MB in size. Ensure that you have adequate space on the destination. The destination in the above example is the /root/ folder. To compress the file, you can use an open source utility called gzip. The following is an example of the command:

# gzip /root/dump

Note: For more information on the gzip utility, type man gzip at the console.
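The size in the caution follows from the dd parameters: 1200 blocks of 1 MiB each is 1200 × 1,048,576 = 1,258,291,200 bytes. A minimal Python sketch of the free-space check is shown below. It assumes that bs=1M means 1 MiB and that a Python interpreter is available where the check is run; the function names are hypothetical.

```python
import shutil

# bs=1M count=1200, taking 1M to mean 1 MiB (1,048,576 bytes).
DUMP_BYTES = 1200 * 1024 * 1024


def dump_fits(free_bytes, required=DUMP_BYTES):
    """Return True if free_bytes is enough to hold the uncompressed dump."""
    return free_bytes >= required


def destination_has_room(path="/root"):
    """Check the file system that holds `path` using shutil.disk_usage."""
    return dump_fits(shutil.disk_usage(path).free)
```

The check covers only the uncompressed dump; gzip needs room for the compressed copy before it removes the original.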

5. Create a new support request. For more information, see How to Submit a Support Request.

6. Upload the resulting file along with a full support bundle to VMware technical support.
======================================================================
3. Troubleshooting virtual machine performance issues

Symptoms

The guest operating system boots slowly
Applications running in virtual machines perform poorly
Applications running in virtual machines take a long time to launch
Applications running in virtual machines frequently become unresponsive
Multi-user services have long transaction times or can handle fewer simultaneous users than expected

Purpose

This article discusses identifying and resolving various issues that affect the performance of virtual machines running on VMware hosted products.


Resolution

Validate that each troubleshooting step below is true for your environment. The steps will provide instructions or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.

1. Verify that the reduced performance is unexpected behavior. When a workload is virtualized, it is common to see some performance reduction due to virtualization overhead. Troubleshoot a performance problem if you experience the following conditions:

The virtual machine was previously working at acceptable performance levels but has since degraded
The virtual machine performs significantly slower than a similar setup on a physical computer
You want to optimize your virtual machines for the best performance possible

2. Verify that you are running the most recent version of the VMware product being used. For download information, see the VMware Download Center.

3. Check that VMware Tools is installed in the virtual machine and running the correct version. The version listed in the toolbox application must match the version of the product hosting the virtual machine. To access the toolbox, double-click the VMware icon in the notification area on the task bar, or run vmware-toolbox in Linux. Some VMware products indicate when the version does not match by displaying a message below the console view. For more information on installing VMware Tools, consult the product's User Guide.

4. Review the virtual machine's virtual hardware settings and verify that you have provided enough resources to the virtual machine, including memory and CPU resources. Use the average hardware requirements typically used in a physical machine for that operating system as a guide. Adjust the settings to factor in the application load: higher for larger loads such as databases or multi-user services, and lower for less intense usage such as casual single-user applications like e-mail or web clients.

5. Ensure that any antivirus software installed on the host is configured to exclude the virtual machine files from active scanning. Install antivirus software inside the virtual machine for proper virus protection. For more information, see Investigating busy hosted virtual machine files (1003849).


Investigating busy hosted virtual machine files

Symptoms

Unable to open file.
Insufficient permissions.
Virtual machine runs slowly.
Virtual machine becomes unresponsive.
Virtual machine crashes.
Virtual machine file corruption.
Guest operating system data corruption.
Cannot power on virtual machine.
Unable to successfully perform snapshot operations.
Unexpected behaviour in guest operating system.

Purpose

This article guides you through determining if problems being experienced by a virtual machine are related to other software on your host trying to access virtual machine files. The article also offers means of correcting this situation.

Resolution

To ensure optimum performance and data integrity, VMware requires exclusive disk access to all of the files that comprise the resources of a virtual machine while it is powered on. If another program accesses one of these files at the wrong moment, unexpected results may occur.

For information on determining the location of virtual machine files, see Locating a hosted virtual machine's files (1003880).

The following are the types of software and activities that may interfere with normal virtual machine operation, and the steps to take in order to ensure that they do not cause a conflict:

Antivirus software. Exclude all of the virtual machine files from manual, automatic, and real time scanning. Limit antivirus scanning of virtual machines to the guest operating system inside the virtual machine. If an antivirus scan of the virtual machine files is required, ensure that the virtual machine is powered off prior to the scan.

Backup software. Exclude all virtual machine files from host file backups. If a virtual machine needs to be backed up, it can be done from the guest operating system inside the virtual machine. If the virtual machine files themselves need to be backed up, ensure that the virtual machine is powered off prior to the backup.

Disk utility software. Do not run host disk integrity checks, defragmentation routines, or anything else that involves writing to the disk or otherwise accessing files, on any of the virtual machine's files while the virtual machine is powered on.

Other software. This includes spyware, P2P applications, and anything else that may be accessing one of the virtual machine's files. Operations that make use of the files include reading, scanning, copying, and writing. Exclude the virtual machine's files from any of these operations.

Editing of virtual machine files. Do not edit any of the virtual machine's files while it is powered on.

6. Check the storage sub-system on the host and verify that it is configured for optimal performance. For information, see Troubleshooting hosted disk I/O performance problems (1008885).
===================================================

Troubleshooting hosted disk I/O performance problems

Symptoms

The virtual machine performs very slowly: applications start slowly or take a long time to appear, or the operating system takes a long time to boot.

Virtual machines stop responding for long periods of time.

Windows guests fail with STOP 0x77 KERNEL_STACK_INPAGE_ERROR, or the event log contains Event ID 51 from the source: Disk.

Linux guests become nonresponsive or lock up.

The vmware.log file contains lines similar to:

Command WRITE(10) took 10.858 seconds (ok)
Command READ(10) took 1.173 seconds (ok)
SCSI0: RESET BUS
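To gauge how often such slow commands occur, the vmware.log text can be scanned for the "took N seconds" pattern. The Python sketch below is one way to do that; the function name and the one-second default threshold are illustrative assumptions, not values taken from VMware documentation.

```python
import re

# Matches entries such as "Command WRITE(10) took 10.858 seconds (ok)".
SLOW_CMD_RE = re.compile(
    r"Command (?P<op>[A-Z]+\(\d+\)) took (?P<seconds>\d+(?:\.\d+)?) seconds"
)


def slow_commands(log_text, threshold_seconds=1.0):
    """Return (operation, seconds) pairs at or above the threshold, in log order."""
    results = []
    for match in SLOW_CMD_RE.finditer(log_text):
        seconds = float(match.group("seconds"))
        if seconds >= threshold_seconds:
            results.append((match.group("op"), seconds))
    return results
```

Raising the threshold narrows the result to the worst stalls, which can help when a log contains many entries.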

Resolution

These symptoms may indicate that there is a disk performance issue on the underlying hardware. When using VMware hosted products, consider that both the virtual machines and host operating system often share the same disk resources and hardware.

Below are some suggestions you can implement to address disk I/O performance issues:


Using non-growable or preallocated VMDK disks

When creating a production virtual machine, VMware recommends that the virtual hard disk be configured to preallocated mode. If existing disks are not in preallocated mode, use the vmware-vdiskmanager tool to convert the disks. Consult the product's User Guide for more information.

Removing or reducing snapshots

When a snapshot is created, the VMware product produces an additional delta file. Each successive snapshot produces an additional file. When a disk operation is performed within the guest, the disk I/O is recreated by parsing each snapshot delta file in the chain. This produces additional disk overhead on the host because more than one file must be opened and processed to recreate the I/O data for the guest operating system. For best performance, remove all snapshots in the guest operating system or store performance-sensitive data on an independent virtual disk. Consult the product's User Guide for information on configuring independent virtual disks.

Using separate physical and virtual hard disks

Install the host operating system on a different hard disk from the one that holds the virtual machines. Also store the paging file or swap partition on a different drive from the host operating system.

Optimizing the drive

Run disk defragmentation software on the host and in the guest operating system. Fragmentation of both the VMDK files and the file system within the guest can double the impact of fragmentation.

Using partitions

Implementing partitions inside the guest operating system or host can improve performance by creating fragmentation boundaries and can reduce further fragmentation. For example, consider storing the small, often modified files of the operating system away from large files such as database or Microsoft Exchange stores by using a separate partition. Also consider storing the virtual machine files (.VMDK files) on their own partition or disk on the host.

Using RAID or adjusting the RAID configuration or adding disks to the array

Certain RAID configurations can impact read or write performance positively and negatively. When using a RAID 5 configuration, consider adding more disks to the array. This generally improves the performance of the array. Using mirroring can improve read performance but may degrade write performance. If write performance is primarily impaired, consider a different RAID type to host the virtual machine.

Check for disk encryption

Disk encryption can reduce disk performance. Try moving the virtual machine to a non-encrypted volume and test if performance has improved.

Ensure the existing physical hardware is healthy and performing as expected

Often disk problems such as bad sectors or failing controllers can impact performance because I/O and bad cluster auto-recovery can cause sudden interruptions in I/O operations to the device. Perform a hardware and file system diagnostic to verify if this is impacting performance. For more information, see Performing a disk check (1004003).

Upgrade or choose different physical disk hardware

It is important to consider the performance characteristics of the physical disk hardware. In general, hardware RAID and independent disk controllers perform better than software RAID and integrated disk controllers. When an independent controller is used, often it is possible to configure or increase the cache memory to yield better performance. Consult the hardware vendor for more information. Typically older hardware performs slower than newer hardware. Hard disks used in laptop or notebook computers are often far slower than drives used in desktop computers. SCSI hard disks typically perform much faster than those used in regular desktops and notebooks. Hard disks connected over USB typically perform slower than directly attached local disks (such as IDE, SATA, and SCSI). Flash-based USB thumb drives typically perform slower than hard drives.

Review the performance specifications provided by the disk manufacturer on critical metrics such as spindle speed, average seek time (latency), and burst data transfer rates. Higher spindle speeds, lower seek times, and higher transfer rates perform better. When comparing flash-based drives, observe both the read and write transfer performance ratings.

Edit the virtual machine settings to reduce I/O usage by using more host memory

Adding the following settings to a virtual machine can reduce the I/O load on the hard disk, however these adjustments require additional memory on the host. Only add these settings if there is sufficient free memory on the host to accommodate all the memory allocated to the virtual machine, otherwise you may cause a memory starvation condition that can reduce performance of all the running virtual machines or possibly affect the host operating system. Use these settings with caution.

Open the .VMX file for the affected virtual machine while it is powered off. Add the following lines to the file using a text editor:

MemTrimRate = "0"
mainMem.useNamedFile = "FALSE"
sched.mem.pshare.enable = "FALSE"
prefvmx.useRecommendedLockedMemSize = "TRUE"

Note: If you are using VMware Server, you may need to restart the VMware Authorization Service (vmware-authd) for changes to take effect.
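If the same settings need to be applied to several configuration files, the edit can be scripted. The following Python sketch works on the text of a .vmx file and returns the updated text; it replaces an existing line for a key instead of adding a second one. The function name is hypothetical, and writing every value in the quoted form key = "VALUE" is an assumption about .vmx syntax. As stated above, only edit the file while the virtual machine is powered off.

```python
# The four settings listed in this section.
SETTINGS = {
    "MemTrimRate": "0",
    "mainMem.useNamedFile": "FALSE",
    "sched.mem.pshare.enable": "FALSE",
    "prefvmx.useRecommendedLockedMemSize": "TRUE",
}


def apply_settings(vmx_text, settings=SETTINGS):
    """Return vmx_text with each key in `settings` present exactly once."""
    seen = set()
    out = []
    for line in vmx_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in settings:
            if key not in seen:
                out.append('%s = "%s"' % (key, settings[key]))
                seen.add(key)
            # Later duplicates of an already handled key are dropped.
        else:
            out.append(line)
    # Append any setting that was not already in the file.
    for key, value in settings.items():
        if key not in seen:
            out.append('%s = "%s"' % (key, value))
    return "\n".join(out) + "\n"
```

Because existing keys are rewritten in place, running the function on its own output leaves the text unchanged.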

7. Verify that there are enough free resources on the host to satisfy the requirements of the virtual machine. In VMware hosted products resources must be shared by both the host operating system and all running guests. For more information, see Investigating hosted virtual machine resources (1003848).
==========================================================
Investigating hosted virtual machine resources

Symptoms

A virtual machine:
Cannot be powered on.
Cannot be resumed.
Cannot be suspended.
Cannot perform a snapshot operation.

A guest operating system or a host operating system with powered on virtual machines:
Has stopped responding.
Has performance problems.
Is slow.
Is experiencing excessive disk use.

Purpose

This article guides you through the process of determining if a lack of host resources is causing problems with a virtual machine operation. A lack of host resources can also cause problems on a virtual machine's guest operating system and on a host computer where a VMware product is installed and virtual machines are powered on. The steps outlined here eliminate the possibility that the problem is related to insufficient host resources.
Resolution

To determine if your host has enough resources to support your virtual machines, consider memory, disk space, and CPU. Complete the following steps for each of these resources.

Note: The following procedures must all be done when the affected virtual machine is powered off.

Note: If you perform a corrective action in any of the following steps, confirm whether you are still experiencing the problem.

Total the amount of the resource assigned to all virtual machines that can be powered on at the same time. If different virtual machines can be powered on at different times use the highest total.

Note: The amount of disk space assigned to a virtual machine is the combined maximum size of all of its virtual drives. If you use snapshots with a virtual machine, take into account that each snapshot may require as much disk space as the combined size of all of the virtual machine's virtual drives. Each snapshot starts off requiring very little disk space but its disk space requirements increase over time.

Note: There is no direct way of modifying the amount of CPU assigned to a virtual machine. When considering virtual machine performance, evaluate the CPU needs of the guest operating system and its applications.

Add to this the amount of the resource required by your host for its own operations.

If this results in a number that is not less than the amount of the resource available on your host, do one of the following:

Reduce the number of virtual machines powered on at the same time on this host.

Reduce the amount of this resource assigned to the virtual machines.

Increase the amount of this resource installed on the host.

Note: To adjust virtual CPU assignment if your host has multiple CPUs or CPU cores, it is possible to set processor affinity among virtual machines so that one or more CPUs are not used by any other virtual machine. For more information, see Associating a Virtual Machine With a Particular Host Processor (110). Alternatively, the host hardware must be upgraded to one with faster or more CPUs.
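The totaling procedure above can be written out as a small calculation. In this Python sketch, power_on_sets lists each group of virtual machines that can be powered on at the same time, and assigned maps each virtual machine to the amount of one resource (for example, memory in MB). All names and figures are illustrative.

```python
def peak_demand(power_on_sets, assigned):
    """Highest total over the groups of VMs that can be powered on together."""
    return max(sum(assigned[vm] for vm in group) for group in power_on_sets)


def host_is_sufficient(power_on_sets, assigned, host_own_need, host_available):
    """True only if peak VM demand plus the host's own need is less than what is available."""
    total = peak_demand(power_on_sets, assigned) + host_own_need
    return total < host_available
```

For example, with virtual machines a (2048), b (1024), and c (4096), where a and b run together and c runs alone, the peak demand is 4096. With 2048 reserved for the host, an 8192 host passes the check and a 6144 host does not, because the total is not less than the amount available.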

Associating a Virtual Machine With a Particular Host Processor

Details

I have a multiprocessor or hyperthreaded processor system, but my virtual machine shows only one processor. Why is that?

Solution

VMware products run on symmetric multiprocessor (SMP) systems, also referred to as multiprocessor specification (MPS) systems. However, the environment provided within each virtual machine is a uniprocessor system.

If you have multiple virtual machines running at the same time, some use one processor and some use another, thus taking advantage of the multiple processors in the system.

Associating a Virtual Machine with a Particular Processor on a Multiprocessor/Hyperthreaded Processor Host

If your host is a multiprocessor system (multiple physical processors) or if the processor or processors are hyperthreaded (where each physical processor is split into two or more logical processors), you can associate each virtual machine with a specific processor on the host.


By default, each virtual machine is associated with all physical and logical processors on the host. The virtual machine uses whichever processor is available at the time it needs to execute instructions.

To associate a virtual machine with a specific physical or logical processor on the host, do the following.

Note: These steps apply to virtual machines on Windows hosts and on Linux hosts with 2.6.x kernels.

In a text editor, open the virtual machine's configuration file (.vmx).

Add the following line for each processor with which you do not want to associate the virtual machine:

processor#.use = FALSE

where # is the number of the processor on the host, the count beginning at 0.

On a Windows host, processors are listed in the registry. To view the processors, complete the following steps.

Choose Start>Run, then type regedt32. The Windows registry opens.

In the registry, choose HKEY_LOCAL_MACHINE>HARDWARE>DESCRIPTION>System>CentralProcessor. Each CPU on the host is listed here, numbered starting with 0.

On a Linux host, processors are listed in /proc/cpuinfo.

Typically, on a Windows or Linux 2.6.x kernel system with multiple hyperthreaded processors, the physical processors are numbered first, followed by the logical processors. Keep this numbering system in mind if you move the virtual machine to another host with a different number of physical or logical processors.

Caution: GSX Server 3.1 and earlier and Workstation for Linux do not honor the processor#.use option. Thus, a virtual machine cannot be associated with a specific CPU while on a Linux host, and the workaround discussed here does not work. Keep this in mind if you move a virtual machine from a GSX Server or Workstation Windows host to a Workstation or older GSX Server Linux host.
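Because one processor#.use line is needed for every processor the virtual machine should not use, it can be easier to start from the processors to keep. The Python sketch below generates the exclusion lines for the .vmx file under that approach; the function name is hypothetical, and the line format is copied from the step above.

```python
def exclusion_lines(host_processor_count, keep):
    """Return a 'processorN.use = FALSE' line for every host processor not in `keep`."""
    keep = set(keep)
    invalid = sorted(n for n in keep if not 0 <= n < host_processor_count)
    if invalid:
        raise ValueError("processor numbers out of range: %s" % invalid)
    # Processor numbering starts at 0, as described in the text.
    return [
        "processor%d.use = FALSE" % n
        for n in range(host_processor_count)
        if n not in keep
    ]
```

On a host with four processors, keeping only processor 1 yields lines for processors 0, 2, and 3. The numbering caveat above still applies when the virtual machine is moved to a host with a different processor count.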


For more information on checking free host memory, see Investigating operating system memory usage (1004014).
=======================================================================


Investigating operating system memory usage


Purpose

This article describes how to determine memory usage. Memory usage information is useful in addressing problems encountered with an operating system as a result of a process taking up excessive memory or with an operating system that has insufficient free memory for correct operation. Memory usage problems result in slow operating system performance, slow application performance, and the inability of an application to load or continue to run. In some instances, these problems can include an operating system crashing or failing to respond.


Resolution

The methods of determining memory usage differ between operating systems. Refer to the section that matches your operating system.

Note: If you determine that you have insufficient memory, you must limit the number of concurrently running processes or increase the amount of memory. If your operating system has been installed on a virtual machine, you can increase the amount of memory assigned to the virtual machine. For more information, see Increasing the amount of memory assigned to a virtual machine (1004059).

Windows

To determine memory usage:

Run the Task Manager: Click Start>Run. Type taskmgr. Click OK.

Note: If you are running a version of Windows where this command does not work, you must find an alternate method of launching the Task Manager or determining the amount of free memory.

Click the Performance tab. The memory usage is displayed.

Linux

Note: The exact procedure may differ between distributions of Linux. If the following commands do not work, consult the manual for your distribution of Linux.

To determine memory usage:

Open a shell prompt. For more information, see Opening a command or shell prompt (1003892).

Type free -mt and press Enter. The memory usage is displayed.
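Where the figure is needed in a script instead of at the prompt, the same information can be read from /proc/meminfo, which reports sizes in kB. The Python sketch below is an alternative to the free command, not a reproduction of it; it assumes a Linux system that exposes /proc/meminfo, and the function names are hypothetical.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def free_mib(values):
    """Free memory in MiB, rounded down, taken from the MemFree field."""
    return values["MemFree"] // 1024


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file; `path` is overridable for testing."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

MemFree excludes memory held by buffers and caches, so it is usually lower than the memory that applications could actually obtain.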
Mac OS

To determine memory usage:

Press Shift + Command + U.

Double-click Activity Monitor.

Click the System Memory tab. The memory usage is displayed.

For more information on checking free disk space, see Investigating operating system disk space (1004007).

====================================================================

Investigating operating system disk space

Purpose

This article guides you through determining disk usage. Disk usage information is useful in addressing problems encountered with an operating system as a result of a lack of disk space. Problems related to disk usage can include slow operating system performance, slow application performance, and the inability of an application to load or continue to run. In some cases, these problems can include an operating system unexpectedly stopping or failing to respond.

Resolution

The methods of determining disk usage differ between operating systems. Refer to the section below that matches your operating system.

Note: If you determine that you have insufficient disk space you must free up some space or increase the size of your hard disk. If your operating system has been installed on a virtual machine, you can increase the size of its virtual disk. For more information, see Increasing the size of a virtual disk (1004047).

Windows

Note: The exact procedure differs between versions of Windows. If one procedure does not work try the other. If neither method works, consult the manual for your version of Windows.

To determine disk usage from the user interface:

Double-click the My Computer icon.

Right-click the entry for your local disk.

Click Properties. Disk usage is displayed graphically.

To determine disk usage from a command line:

Open a command prompt. For more information, see Opening a command or shell prompt (1003892).

Type dir c:\ and press Enter. Free disk space is displayed on the last line of output.

Note: If the local disk being investigated is not c:, replace c: with its drive letter.

Linux

Note: The exact procedure may differ between distributions of Linux. If the following commands do not work for you, consult the manual for your distribution of Linux.

To determine disk usage from a shell prompt:

Open a shell prompt. For more information, see Opening a command or shell prompt (1003892).

Type df -vh and press Enter. Disk usage is displayed for each file system.

Mac OS

To determine disk usage from the user interface:

Press Shift + Command + U.

Double-click Activity Monitor.

Click the Disk Usage tab. Disk usage is displayed graphically.

To determine disk usage from a shell prompt:

Open a shell prompt. For more information, see Opening a command or shell prompt (1003892).

Type df -H and press Enter. Disk usage is displayed for each file system.
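The same check can be made from a script on any of the three operating systems with Python's shutil.disk_usage, which returns the total, used, and free bytes for the file system holding a path. The sketch below is a minimal example with hypothetical function names. Its percentage is computed from total and free space, so it may differ slightly from the figure df prints.

```python
import shutil


def percent_used(total, free):
    """Percentage of `total` that is not free, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * (total - free) / total, 1)


def summarize(path):
    """Return total and free bytes plus percent used for the file system holding `path`."""
    usage = shutil.disk_usage(path)
    return {
        "total": usage.total,
        "free": usage.free,
        "percent_used": percent_used(usage.total, usage.free),
    }
```

On Windows, pass a drive root such as "c:\\" as the path; on Linux and Mac OS, pass any directory on the file system of interest.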

For more information on checking free host CPU, see Investigating operating system CPU usage (1004016).
===================================================================

Investigating operating system CPU usage

Purpose


This article guides you through determining CPU usage. CPU usage information is useful in addressing problems encountered with an operating system as a result of a process taking up excessive CPU cycles. CPU usage problems result in slow operating system performance.

Resolution

The methods of determining CPU usage differ between operating systems. Refer to the section that matches your operating system.

Note: If you determine that you have insufficient CPU, you must limit the number of concurrently running processes or increase the amount of CPU. If your operating system has been installed on a virtual machine running under an ESX Server host, see Increasing the amount of CPU assigned to a virtual machine (1004060). If this is a virtual machine running under a different product, there is no direct way of increasing the amount of CPU assigned. If your host has multiple CPUs or CPU cores, it is possible to set processor affinity among virtual machines so that one or more CPUs are not used by any other virtual machine. For more information, see Associating a Virtual Machine With a Particular Host Processor (110). Alternatively, the host hardware must be upgraded or the virtual machine moved to a different host.

If this is a virtual machine, you can increase the amount of memory assigned to the virtual machine. For more information, see Increasing the amount of memory assigned to a virtual machine (1004059).

Windows

To determine CPU usage:

Run the Task Manager: Click Start > Run. Type taskmgr. Click OK.

Note: If you are running a version of Windows where this command does not work, you must find an alternate method of launching the Task Manager or determining the CPU usage.

Click the Performance tab. The CPU usage is displayed.

Note: Click the Processes tab to get detailed information about the CPU usage of each process. Click the CPU column to sort the results by the amount of CPU each process is using.

Linux

Note: The exact procedure may differ between distributions of Linux. If the following commands do not work, consult the manual for your distribution of Linux.

To determine CPU usage:

Open a shell prompt. For further information, see Opening a command or shell prompt (1003892).

Type top and press Enter. The CPU usage is displayed.
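For a scripted reading on Linux, overall CPU usage can be estimated from two samples of the aggregate "cpu" line in /proc/stat. The Python sketch below treats the idle and iowait counters as idle time and everything else as busy time. It is an approximation under those assumptions, not a reimplementation of top, and the function names are hypothetical.

```python
def cpu_times(stat_text):
    """Return the first eight counters (user through steal) from the aggregate cpu line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Guest time is already included in user time, so stop after steal.
            return [int(value) for value in fields[1:9]]
    raise ValueError("no aggregate cpu line found")


def busy_percent(before, after):
    """Percentage of elapsed CPU time that was not idle or iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 100.0 * (total - idle) / total
```

To use it, read /proc/stat, wait about a second, read it again, and pass the two results of cpu_times to busy_percent.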

Mac OS

To determine CPU usage:

Press Shift + Command + U.

Double-click Activity Monitor.

Click the CPU tab. The CPU usage is displayed.

Note: Click the % CPU column to sort these results by the amount of CPU each process is using.

8. Disable the CPU power management features on the host. In some cases, these features can cause CPU performance issues with virtual machines. For more information, see Virtual Machine Clock Reports Time Unpredictably on Multiprocessor Systems (2041).
