Q.14 | Killing stubborn VMs

It's killing time in the virtual world.

I've recently encountered a problem in my lab environment while trying to power on a virtual machine (let's call it TheVM in this post):

[Screenshot: stupidvm]

Yeah, as you can see - the error reported by vCenter wasn't really helpful:

A general system error occurred: Connection refused

When I tried to remove TheVM (or migrate it, or do anything else with it), I got a different, yet still unhelpful, message:

A general system error occurred: Unknown error

At this point I was a bit annoyed, to say the least. Restarting individual services on the host, or even rebooting it outright, didn't help either.

Here's what I did next:

  1. As usual, I started troubleshooting by connecting to the relevant host over SSH:
[vlku@vdi.ssh.lab]: ~>$ ssh esxi01.ssh.lab
  2. Next, I asked the system to list all the VM processes:
[vlku@esxi01.ssh.lab]: ~>$ esxcli vm process list

(...)
TheVM
World ID: 1234
Process ID: 0
VMX Cartel ID: 1233
UUID: 11 22 33 44 55 66 77 88-88 77 66 55 44 33 22 11
Display Name: TheVM
Config File: /vmfs/volumes/XXX/TheVM/TheVM.vmx
(...)
  3. From that output, I retrieved the World ID of TheVM: 1234. I needed that ID to kill the processes responsible for running TheVM:
[vlku@esxi01.ssh.lab]: ~>$ esxcli vm process kill -t hard -w 1234
  4. At this stage, TheVM was "dead" and - just like dead humans - it stopped resisting. I was able to remove it from the inventory and fix it up.
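
If you'd rather not pick the World ID out of the listing by eye, the lookup can be scripted. The snippet below is only a sketch: `world_id_for` is a hypothetical helper name (not part of esxcli), and it assumes the listing looks like the output shown above, with the `World ID:` line appearing before the `Display Name:` line of the same VM.

```shell
#!/bin/sh
# Hypothetical helper: print the World ID of the VM whose "Display Name"
# equals $1. Expects `esxcli vm process list` output on stdin.
# Assumption: "World ID:" precedes "Display Name:" within each VM's block.
world_id_for() {
    awk -v name="$1" '
        # Remember the most recent World ID seen (last field on the line).
        /^[[:space:]]*World ID:/ { wid = $NF }
        # On a Display Name line, strip the label and trailing whitespace,
        # then compare the remainder to the requested name.
        /^[[:space:]]*Display Name:/ {
            sub(/^[[:space:]]*Display Name:[[:space:]]*/, "")
            sub(/[[:space:]]+$/, "")
            if ($0 == name) { print wid; exit }
        }
    '
}

# Intended use on the ESXi host (commented out, since it needs esxcli):
#   wid=$(esxcli vm process list | world_id_for "TheVM")
#   [ -n "$wid" ] && esxcli vm process kill -t hard -w "$wid"
```

I went straight for `-t hard` in this post, but `esxcli vm process kill` also accepts `-t soft` (the gentlest option) and `-t force` (the last resort), so on a machine you care about it may be worth trying `soft` first.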

Read all the other posts of the "Quickie" series in the archive