VMware/ESX

Installation

Booting the ESX DVD on a SunFire v40z worked...until the boot screen came up, and we only had a serial console attached to it. We could type blindly or cheat and go to the box and attach a monitor to it:

 * press DOWN, to select "text install"
 * press BACKSPACE 5 times, to remove the "quiet" from the boot arguments
 * type "console=ttyS0,115200n8"
 * press ENTER

Or we just edit the ISO, so that it'll boot in textmode with the correct arguments. Yeah, we're so l337, baby :-)
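
A minimal sketch of the argument rewrite such an ISO edit would need, assuming the boot arguments sit on a single line of a boot loader config inside the ISO. The helper name add_serial_console and the sample line are hypothetical; the actual file layout on the ESX DVD is not covered here:

```shell
# Hypothetical helper: take one line of boot arguments, drop "quiet"
# and append the serial console setting used above.
add_serial_console() {
    printf '%s\n' "$1" |
        sed -e 's/ quiet$//' \
            -e 's/ quiet / /' \
            -e 's/$/ console=ttyS0,115200n8/'
}

# Sample line (an assumption, not copied from the ESX ISO):
add_serial_console "append initrd=initrd.img mem=512M quiet"
```

The function only rewrites text; unpacking and rebuilding the ISO is left out on purpose.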

If all goes well, the installation should proceed:

ESX 4.0 -- Virtual Infrastructure for the Enterprise------------
Welcome to the ESX Text Installer
Release 4.0

This wizard will guide you through the installation of ESX.

Postinstall

Serial Console

First, we have to set up the physical part of the serial console. This might be a real terminal or an emulated terminal, accessible from the service processor of the host system. On this particular system (Sun V40z), this can be done in the system's ILOM:

> platform set console --speed 115200
> platform get console
Rear Panel Console Redirection  Speed   Pruning  Log Trigger
SP Console Enabled              115200  No       244 KB

This will allow us to access the bootloader (GRUB). Here, we'll configure the kernel parameter which will allow us to watch Linux booting. During bootup, press "e" to edit the boot entries and modify the "kernel" line as follows:

 title VMware ESX 4.0
       #vmware:autogenerated esx
       root (hd0,0)
       uppermem 307200
       kernel /vmlinuz ro root=/dev/sda2 mem=300M console=ttyS0,115200n8 text
       initrd /initrd.img

Note that the "text" parameter was also added, so that VMware would start into a text install properly.

However, after booting has finished, the kernel will load /sbin/init, which is not (yet) configured for our serial console, so we lose control over the serial line at this point. We could log in via SSH to the newly installed ESX server, but we only created the root user during the installation process and root access is NOT possible via SSH right now. Luckily, KB 8375637 describes how to enable root access for SSH connections:

  1. Login with the vSphere Client directly on the ESX server (NOT on the vCenter server!)
  2. Once logged in, there will be a "Users & Groups" tab. Create a new, temporary user with a password and shell access
  3. When the user has been created, login via SSH with the username we just created
  4. Use "su" to gain root access.

Now we can add the following to /etc/inittab to support a serial console:

S:2345:respawn:/sbin/agetty 115200 ttyS0

Reload init with init q. Add ttyS0 to securetty to allow logins from the serial console:

$ tail -3 /etc/securetty 
tty10
tty11
ttyS0
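
Both file changes above are simple appends, so running them twice would leave duplicate lines. A small sketch of an idempotent variant, assuming a grep that supports -x and -F (the helper name append_once is made up for this example):

```shell
# Hypothetical helper: append a line to a file only if that exact
# line is not already present.
append_once() {
    # $1 = file, $2 = exact line to ensure is present
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Intended use on the ESX host (as root):
#   append_once /etc/inittab   'S:2345:respawn:/sbin/agetty 115200 ttyS0'
#   append_once /etc/securetty 'ttyS0'
#   init q
```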

From that point on, login via the serial console should be possible. While we're at it, enable root logins via SSH:

$ grep ^Permit /etc/ssh/sshd_config 
PermitRootLogin yes

$ service sshd reload
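
The sshd_config change itself is not shown above. One way to script it, as a sketch only: the function name set_permit_root_login is hypothetical, and it assumes the directive appears at most as a plain or commented-out "PermitRootLogin" line:

```shell
# Hypothetical helper: set PermitRootLogin in an sshd_config file.
set_permit_root_login() {
    conf=$1 value=$2
    tmp=$(mktemp) || return 1
    if grep -q '^#*[[:space:]]*PermitRootLogin' "$conf"; then
        # Replace the existing (possibly commented-out) directive.
        sed "s/^#*[[:space:]]*PermitRootLogin.*/PermitRootLogin $value/" "$conf" > "$tmp"
    else
        # No directive yet: append one.
        { cat "$conf"; printf 'PermitRootLogin %s\n' "$value"; } > "$tmp"
    fi
    # Write back in place so owner and mode of the original file are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Intended use on the ESX host (as root):
#   set_permit_root_login /etc/ssh/sshd_config yes
#   service sshd reload
```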

With all that in place, it might be a good idea to remove the (temporary) user we just created.

Firewall

Inbound SSH traffic is already enabled, but we also want to allow outbound SSH client connections. Outbound NetBIOS and NFS might come in handy too:

esxcfg-firewall -e sshClient
esxcfg-firewall -e smbClient
esxcfg-firewall -e nfsClient

As a shortcut, (temporarily) allow all outgoing traffic:

esxcfg-firewall --allowOutgoing

Be sure to disable it again:

esxcfg-firewall --blockOutgoing
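
To make sure the firewall is closed again, the two calls can be wrapped around the command that needs outgoing traffic. A sketch under the assumption that only the two esxcfg-firewall options shown above are needed; the wrapper name with_outgoing and the FW variable are inventions for this example:

```shell
# FW can be overridden for a dry run, e.g. FW=echo
FW=${FW:-esxcfg-firewall}

# Hypothetical wrapper: allow outgoing traffic, run the given command,
# then block outgoing traffic again, whatever the command returned.
with_outgoing() {
    "$FW" --allowOutgoing || return 1
    rc=0
    "$@" || rc=$?
    "$FW" --blockOutgoing
    return "$rc"
}

# Intended use on the ESX host:
#   with_outgoing wget http://apt.sw.be/RPM-GPG-KEY.dag.txt
```

The wrapper returns the exit status of the wrapped command, so it can be used in scripts that check for failures.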

To allow a specific service, e.g. outbound syslog messages:

esxcfg-firewall -o 514,udp,out,syslog

Software

DAG

Import their public keys:

$ wget http://apt.sw.be/RPM-GPG-KEY.dag.txt
$ rpm --import RPM-GPG-KEY.dag.txt
$ rpm -qa 'gpg-pubkey*'
gpg-pubkey-6b8d79e6-3f49313d

Let's see how our installation names itself:

$ rpm -qf /bin/ls
coreutils-5.97-14.el5

This "el5" stands for "Enterprise Linux 5" - more specifically for RHEL 5. Thus we can install packages from an "el5"-repository:

$ export REL=el5  
$ rpm -hiv http://apt.sw.be/redhat/$REL/en/`uname -i`/extras/RPMS/rsync-3.0.9-1.$REL.rfx.x86_64.rpm
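
Instead of reading the tag off the package name by eye, it can be extracted with sed. A small sketch; the function name dist_tag is hypothetical and it assumes the release string contains a ".elN" part as in the output above:

```shell
# Hypothetical helper: extract the "elN" tag from an RPM package name.
dist_tag() {
    printf '%s\n' "$1" | sed -n 's/.*\.\(el[0-9][0-9]*\).*/\1/p'
}

# With the package name from above:
REL=$(dist_tag "coreutils-5.97-14.el5")
echo "$REL"

# On the ESX host itself:
#   REL=$(dist_tag "$(rpm -qf /bin/ls)")
```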

pkgs.org

pkgs.org is a meta-site; we can install from one of the available mirror sites:

rpm -hiv http://mirror.centos.org/centos/5/os/`uname -i`/CentOS/rsync-3.0.6-4.el5_7.1.x86_64.rpm

Disable .vswp files

To disable (the use of) .vswp files, set "Memory Reservation" to the configured memory size of the VM. This way an empty .vswp file will be created.

vSphere Client

Once VMware ESX is installed, the client can be downloaded from the ESX server:

 https://ESX_SERVER.example.com/client/VMware-viclient.exe

vSphere client needs at least the following ports to connect to vCenter Server and/or ESXi/ESX hosts:

443/tcp
902/tcp
903/tcp

SSH forwarding should work just fine with these ports. In some cases, Windows might need a hint where to connect to:

vSphere Client could not connect to vCenter Server client01
Details: A connection failure occured (Unable to connect to remote server) 
  1. Add "127.0.0.1 esx01" to %systemroot%\system32\drivers\etc\hosts.
  2. Point vSphere Client to esx01 and the connection should work now.
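
For the forwarding itself, the three ports translate into three -L options. A sketch for an OpenSSH client; the function name forward_args and the gateway host in the usage line are assumptions, and binding local port 443 may require elevated privileges:

```shell
# Hypothetical helper: print the -L options for the three ports above.
forward_args() {
    # $1 = name of the ESX host as seen from the SSH gateway
    for port in 443 902 903; do
        printf '%s %s:%s:%s ' -L "$port" "$1" "$port"
    done
}

forward_args esx01; echo

# Intended use (hypothetical gateway and user):
#   ssh -N $(forward_args esx01) user@gateway.example.com
```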

Syslog

Configure a syslog server:

$ tail -1 /etc/syslog.conf
*.*;auth,authpriv,cron.warning             @loghost

$ service syslog restart

Update

ESX updates can be applied in different ways, described in the sections below.

Before the actual upgrade, we'll shut down or migrate all running VMs and enter the maintenance mode:

$ vimsh -n -e /hostsvc/maintenance_mode_enter
$ vimsh -n -e /hostsvc/runtimeinfo | grep inMaintenanceMode
inMaintenanceMode = true,
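
In a script, the same check can act as a guard before any update is started. A minimal sketch that only parses the runtimeinfo output shown above (the function name in_maintenance_mode is made up):

```shell
# Hypothetical helper: succeed if the runtimeinfo output on stdin
# reports that the host is in maintenance mode.
in_maintenance_mode() {
    grep -q 'inMaintenanceMode *= *true'
}

# Intended use on the ESX host:
#   vimsh -n -e /hostsvc/runtimeinfo | in_maintenance_mode || {
#       echo "host is not in maintenance mode" >&2
#       exit 1
#   }
```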

vihostupdate

Download vSphere CLI, then unpack:

sha1sum ../VMware-vSphere-CLI-*.tar.gz           # Verify the checksum!
tar -xzf ../VMware-vSphere-CLI-*.tar.gz
cd vmware-vsphere-cli-distrib
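
Comparing the checksum by eye is error-prone; sha1sum can do the comparison itself. A sketch assuming GNU sha1sum and an expected digest copied from the download page (the function name verify_sha1 and both variables in the usage line are placeholders):

```shell
# Hypothetical helper: succeed only if the file matches the expected digest.
verify_sha1() {
    # $1 = file, $2 = expected SHA-1 hex digest
    printf '%s  %s\n' "$2" "$1" | sha1sum -c - >/dev/null 2>&1
}

# Intended use ($tarball and $expected are placeholders):
#   verify_sha1 "$tarball" "$expected" && tar -xzf "$tarball"
```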

Install:

$ ./vmware-install.pl --prefix=/opt/vmware/cli
Creating a new vSphere CLI installer database using the tar4 format.
Installing vSphere CLI.
Installing version 253290 of vSphere CLI

which: no ld in (/bin:/usr/bin:/sbin:/usr/sbin)

No Crypt::SSLeay Perl module or linker could be found on the system.  Please
either install SSLeay from your distribution or install a development toolchain
and run this installer again for encrypted connections.

The following Perl modules were found on the system but may be too old to work
with vSphere CLI:

Compress::Zlib

Please wait while copying vSphere CLI files...

The installation of vSphere CLI 4.0.0 build-253290 for Linux completed
successfully. You can decide to remove this software from your system at any
time by invoking the following command:
"/opt/vmware/cli/bin/vmware-uninstall-vSphere-CLI.pl".

Add /opt/vmware/cli/bin to our PATH:

$ printf 'PATH=$PATH:/opt/vmware/cli/bin\nexport PATH\n' >> /etc/profile.d/local.sh
$ . /etc/profile.d/local.sh
$ vihostupdate --version
VI Perl Toolkit version: 4.0
Script 'vihostupdate' version: 4.0

esxupdate

Note: ESX updates are meant to be cumulative. However, patches are organized into "sets", and not every set is included in the next update. For example:

  • Patch_01 updates the following sets: VMkernel, hostd and Tools
  • Patch_02 updates the following sets: VMkernel, hostd
→ Patch_02 contains all available updates for "VMkernel" and "hostd" but leaves out "Tools". So, although patches are cumulative we will have to apply them one after another after all :-\

After downloading the updates they need to be transferred to the ESX host. Alternatively we could also access them via a network share:

mount -t cifs -o ro,user=guest //server/updates /mnt/cdrom
cd /mnt/cdrom

List all updates included in the package:

esxupdate --bundle update-from-esx4.0-4.0_update04.zip info | less

After reviewing the output, perform the actual update:

esxupdate --bundle update-from-esx4.0-4.0_update04.zip update
sync
reboot

After the reboot (and possibly further updates), let's review all installed updates:

$ esxupdate query
----Bulletin ID---- -----Installed----- ----------------Summary---------------- 
ESX400-Update04     2012-05-17T11:35:27 VMware ESX 4.0 Complete Update 4        
ESX400-201203402-SG 2012-05-17T12:26:39 Updates Python package                  
ESX400-201203403-SG 2012-05-17T12:26:39 Updates Curl RPM                        
ESX400-201203404-SG 2012-05-17T12:26:39 Updates samba RPM and libsmbclient      
ESX400-201203405-SG 2012-05-17T12:26:39 Updates popt, rpm, rpm-libs, rpm-python 
ESX400-201203406-SG 2012-05-17T12:26:39 Updates libuser                         
ESX400-201203407-SG 2012-05-17T12:26:39 Updates Kerberos RPMs                   
ESX400-201203408-BG 2012-05-17T12:26:39 Updates tzdata                          
ESX400-201205401-SG 2012-05-17T12:38:28 Updates VMkernel, VMX, and others 

The specific update path for this machine was:

esxupdate --bundle update-from-esx4.0-4.0_update04.zip update      # Released 2011-11-17
esxupdate --bundle ESX400-201112401.zip update                     # Released 2011-12-13
esxupdate --bundle ESX400-201203001.zip update                     # Released 2012-03-30
esxupdate --bundle ESX400-201205001.zip update                     # Released 2012-05-03
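
Such a sequence can be scripted so that it stops at the first failing bundle. A sketch only: the function name apply_bundles and the ESXUPDATE variable are inventions, and whether a reboot is needed between individual bundles is not covered by it:

```shell
# ESXUPDATE can be overridden for a dry run, e.g. ESXUPDATE=echo
ESXUPDATE=${ESXUPDATE:-esxupdate}

# Hypothetical helper: apply the given bundles in order, stop on error.
apply_bundles() {
    for bundle in "$@"; do
        "$ESXUPDATE" --bundle "$bundle" update || {
            echo "update failed for $bundle" >&2
            return 1
        }
    done
}

# Intended use on the ESX host, in maintenance mode:
#   apply_bundles update-from-esx4.0-4.0_update04.zip ESX400-201112401.zip
```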

After the update has been completed, exit the maintenance mode:

$ vimsh -n -e /hostsvc/maintenance_mode_exit
$ vimsh -n -e /hostsvc/runtimeinfo | grep inMaintenanceMode
  inMaintenanceMode = false,

vmware-cmd

Get status of each registered virtual machine:

$ for v in `vmware-cmd -l`; do printf '%s  ' "$v"; vmware-cmd "$v" getstate; done
/vmfs/volumes/3cfe21dd-c4f646d3-063f-00013d143b12/netbsd0/netbsd0.vmx  getstate() = on
/vmfs/volumes/3cfe21dd-c4f646d3-063f-00013d143b12/fedora0/fedora0.vmx  getstate() = on
/vmfs/volumes/3cfe21dd-c4f646d3-063f-00013d143b12/gentoo0/gentoo0.vmx  getstate() = off
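
The listing can be filtered further, for example to find all machines that are powered off. A sketch that works on the output format shown above and assumes .vmx paths without spaces (the function name vms_with_state is made up):

```shell
# Hypothetical helper: print the .vmx paths whose state matches $1.
# Reads "<vmx>  getstate() = <state>" lines on stdin.
vms_with_state() {
    awk -v want="$1" '$NF == want { print $1 }'
}

# Intended use on the ESX host:
#   for v in `vmware-cmd -l`; do
#       printf '%s  ' "$v"; vmware-cmd "$v" getstate
#   done | vms_with_state off
```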

Start virtual machine:

export VMX=/vmfs/volumes/3cfe21dd-c4f646d3-063f-00013d143b12/gentoo0/gentoo0.vmx
vmware-cmd $VMX start

Stop virtual machine, even when no VMware tools are installed:

vmware-cmd $VMX stop hard

Reset virtual machine, even when no VMware tools are installed:

vmware-cmd $VMX reset hard

Create snapshot:

vmware-cmd $VMX createsnapshot gentoo-20120928 "My first snapshot" 1 0       # QuiesceFilesystem=1, IncludeMemory=0

vim-cmd

vim-cmd can also be used to access the virtual machines.

$ vim-cmd vmsvc/getallvms
Vmid     Name                 File                  Guest OS     Version
128    fedora0    [v40z2] fedora0/fedora0.vmx     rhel6Guest     vmx-07
16     debian1    [v40z2] debian1/debian1.vmx     debian5Guest   vmx-07
160    netbsd0    [v40z2] netbsd0/netbsd0.vmx     freebsdGuest   vmx-07
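
Since most vim-cmd calls expect the Vmid, a lookup by name is handy. A sketch based on the column layout shown above; it assumes VM names without spaces, and the function name vmid_by_name is hypothetical:

```shell
# Hypothetical helper: print the Vmid for the VM name given as $1.
# Reads `vim-cmd vmsvc/getallvms` output on stdin.
vmid_by_name() {
    awk -v name="$1" '$1 ~ /^[0-9]+$/ && $2 == name { print $1 }'
}

# Intended use on the ESX host:
#   id=$(vim-cmd vmsvc/getallvms | vmid_by_name netbsd0)
#   vim-cmd vmsvc/power.on "$id"
```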

Start/shutdown virtual machine:

vim-cmd vmsvc/power.on 160
vim-cmd vmsvc/power.shutdown 160    # Works only with VMware Tools installed

Power off virtual machine:

vim-cmd vmsvc/power.off 160

List snapshots of one virtual machine:

$ vim-cmd vmsvc/snapshot.get 160
Get Snapshot:
|-ROOT
--Snapshot Name        : 2011-08-06
--Snapshot Desciption  :
--Snapshot Created On  : 8/6/2011 13:56:37
--Snapshot State       : powered off

List snapshots of all virtual machines:

$ for vm in `vim-cmd vmsvc/getallvms | awk '!/^Vmid/ {print $1}'`; do
     printf 'VMID: %s' "$vm"
     vim-cmd vmsvc/get.summary $vm | grep name
     vim-cmd vmsvc/snapshot.get $vm
     echo
done | less

Create snapshot:

vim-cmd vmsvc/snapshot.create 160 2012-09-28 "test" 0 1      # includeMemory=0, quiesced=1

Revert to snapshot:

$ vim-cmd vmsvc/snapshot.revert 160 0 0 0                 # suppressPowerOff=0, snapshotLevel=0, snapshotIndex=0
Remove Snapshot:
|-ROOT
--Snapshot Name        : 2012-12-16
--Snapshot Desciption  :
--Snapshot Created On  : 12/16/2012 23:19:29
--Snapshot State       : powered off

Remove snapshot:

vim-cmd vmsvc/snapshot.remove 160 0                       # removeChildren=0

Note: unlike vmware-cmd, vim-cmd returns to the command prompt immediately and the issued task continues in the background! Use vmsvc/get.tasklist to see running tasks.

Known Issues

Repeating characters in VMware Console

On a slow link, the console sometimes repeats characters, making it hard to type correctly. Add this to your VM configuration:

   keyboard.typematicMinDelay = 2000000

This will set the repeat time to 2000000µs, or 2 seconds.

IPMI hangs

Booting might hang on IPMI:

* ipmi ...                           [ !! ]
* vmci ...                           [ ok ]

In /var/log/messages we see:

 sfcb[3843]: RawIpmiProvider::initialize: No IPMI Interface. Will not be polling. \ 
             Error Message: File /dev/ipmi0 not found

According to vm-help.com, this most likely indicates a server or BIOS issue. To disable IPMI:

 sed -i.orig '/Exec/s/^/return ${SUCCESS} # disable IPMI\n\n/' /etc/vmware/init/init.d/72.ipmi
 mv /etc/vmware/init/init.d/72.ipmi.orig /var/tmp

Timed out waiting for vmware-aam to startup

During bootup, this happens:

 Starting vmware-aam:Timed out waiting for vmware-aam to startup... backgrounding[FAILED]

The AAM (Automated Availability Manager) logfile (/var/log/vmware/aam/vmware_bob.log) shows:

 Info FT Fri May 27 10:37:51 2011
By: FT/Agent on Node: bob
MESSAGE: Need to reconfigure heartbeat settings. Not yet set up.
===================================
 Info FT Fri May 27 10:37:51 2011
By: FT/Agent on Node: bob
MESSAGE: Starting reconfiguration of heartbeat settings.
===================================
 Error FT Fri May 27 10:37:52 2011
By: FT/Agent on Node: alice
MESSAGE: ftProcMon failed. Being restarted
===================================
 Info FT Fri May 27 10:37:53 2011
By: FT/Agent on Node: bob
MESSAGE: Finished reconfig of heartbeat settings in 2 seconds.
===================================
 Info NODE Fri May 27 10:37:53 2011
By: FT/Agent on Node: alice
MESSAGE: Node v40z1 is running.
===================================
 Info PROC Fri May 27 10:38:01 2011
By: FT/Agent on Node: alice
MESSAGE: Started process VMap_bob on bob [pid = 9312]

And indeed, booting continued and AAM seems to be running:

 # /etc/init.d/vmware-aam status
 vmware-aam is running                                      [  OK  ]

Unable to get COS default route

During boot, this happens:

 Starting VMware ESX services:
 'IpSecConfig' warning] Ipv6 not Enabled
 'RoutingInfo' warning] Unable to get COS default route
 'RoutingInfo' warning] Unable to restore VMkernel default gateway (10.0.0.1): \
                        Unable to set VMkernel gateway address. Please verify your IP settings and try again

The only reference I found was KB 1002729, where this happened with both iSCSI and DHCP enabled. The solution there was to specify a static default route:

 # echo "GATEWAY=10.0.0.1" >> /etc/sysconfig/network
 # cat /etc/sysconfig/network
 NETWORKING=yes
 HOSTNAME=bob.example.com
 IPV6_AUTOCONF=no
 NETWORKING_IPV6=no
 GATEWAY=10.0.0.1
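
Appending with echo adds a second GATEWAY line if the command is run again. A sketch of a variant that replaces an existing assignment instead; the function name set_kv is hypothetical and it assumes simple KEY=value lines:

```shell
# Hypothetical helper: set KEY=value in a file, replacing any
# existing assignment of the same key.
set_kv() {
    file=$1 key=$2 value=$3
    [ -r "$file" ] || return 1
    tmp=$(mktemp) || return 1
    # Keep every line except an existing assignment of the key.
    # grep exits non-zero if nothing is left, which is fine here.
    grep -v "^$key=" "$file" > "$tmp" || :
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back in place so owner and mode of the original file are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Intended use on the ESX host (as root):
#   set_kv /etc/sysconfig/network GATEWAY 10.0.0.1
```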

Changing the boot order for a VM

According to "Changing the default boot sequence for newly created virtual machines", this can only be done via the VM's virtual BIOS, which can be accessed by pressing F2 during POST.

Linux/x86-64 as a guest VM

We're currently unable to boot Linux/x86-64 as a guest VM (Host: AMD Opteron 848).

Status of other host hardware objects

Sometimes both ESX hosts are tagged with the following error:

v40z1.int.consol.us
Warning
Status of other host hardware objects
11/22/2012 1:31:54 AM

...which isn't really helpful. The "Hardware Status" tab should list the root cause for this, but sometimes the tab is not visible. Select "Plug-ins" → "Manage Plug-ins" and try to enable the "vCenter Hardware Status" and the "vCenter Service Status" plugin. But maybe this isn't working either:

vCenter Hardware Status VMware, Inc. 4.0
Disabled
Displays the hardware status of  hosts (CIM monitoring)
The following error occured while downloading the script plugin from https://natascha:8443/cim-ui/scriptConfig.xml: 
Unable to connect to the remote server

Check if the "VMware VirtualCenter Management Webservices" service is running (and set to "Automatic"). Once started, try to enable the plugins again. Now the "Hardware Status" tab should be visible and the real warning should be printed:

System Management Software 0 Event Logging: Log full,out of 94 sensors

Aha! :-) Login to the ILOM and clear the SEL:

ilom$ ipmi clear sel

vmware-webAccess

Accessing the vSphere Web Access URL (https://esx01.example.org/ui/) might generate an error:

503 Service Unavailable

Looking at the latest /var/log/vmware/hostd-8.log logfile, we can see the following:

Connection to localhost:8308 failed with error N7Vmacore15SystemExceptionE(Connection refused).

The vmware-webAccess service might not be running:

$ service vmware-webAccess status
webAccess is stopped

$ chkconfig --list vmware-webAccess
vmware-webAccess         0:off   1:off   2:off   3:off   4:off   5:off   6:off

Let's enable it for runlevels 3, 4 and 5 and start it now:

$ chkconfig --level 345 vmware-webAccess on
$ chkconfig --list vmware-webAccess
vmware-webAccess         0:off   1:off   2:off   3:on    4:on    5:on    6:off

$ service vmware-webAccess start
$ service vmware-webAccess status
webAccess (pid 26395) is running...

Now the Web Access URL should be working.
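
To check the chkconfig result in a script rather than by eye, the enabled runlevels can be pulled out of the listing. A sketch that relies only on the output format shown above (the function name enabled_runlevels is made up):

```shell
# Hypothetical helper: print the runlevels marked "on" as one string.
# Reads one `chkconfig --list <service>` line on stdin.
enabled_runlevels() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /:on$/) { split($i, p, ":"); printf "%s", p[1] }
    } END { print "" }'
}

# Intended use on the ESX host:
#   [ "$(chkconfig --list vmware-webAccess | enabled_runlevels)" = 345 ] ||
#       chkconfig --level 345 vmware-webAccess on
```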

Links

Files

These kernel configurations should be able to start an ESX virtual machine: