Killing stubborn VMs
I recently ran into a problem in my lab environment while trying to power on a virtual machine (let’s call it TheVM in this post). The error reported by vCenter wasn’t really helpful:
A general system error occurred: Connection refused
When I tried to remove TheVM (or migrate it, or do anything else with it), I got a different, yet still unhelpful, message:
A general system error occurred: Unknown error
At this point I was a bit annoyed, to say the least. Restarting individual services on the host, or even rebooting it outright, didn’t seem to help either.
Here’s what I did next:
- As usual, I started troubleshooting by connecting to the relevant host over SSH:
[vlku@vdi.ssh.lab]: ~>$ ssh esxi01.ssh.lab
- Next, I asked the system to list all the VM processes:
[vlku@esxi01.ssh.lab]: ~>$ esxcli vm process list
(...)
TheVM
World ID: 1234
Process ID: 0
VMX Cartel ID: 1233
UUID: 11 22 33 44 55 66 77 88-88 77 66 55 44 33 22 11
Display Name: TheVM
Config File: /vmfs/volumes/XXX/TheVM/TheVM.vmx
(...)
- From the above command’s output, I was able to retrieve the World ID of TheVM: 1234. I needed that ID to kill the processes responsible for running TheVM:
[vlku@esxi01.ssh.lab]: ~>$ esxcli vm process kill -t hard -w 1234
- At this stage, TheVM was “dead” and, just like dead humans, it stopped resisting. I was able to remove it from the inventory and fix it up.
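If you’d rather not read the World ID off the screen, the lookup step above can be scripted. Here’s a minimal sketch, assuming the listing keeps the layout shown earlier (a "World ID:" line followed later by a "Display Name:" line in each VM’s block). The function name world_id_for is my own hypothetical helper, not something esxcli ships with:

```shell
# Hypothetical helper (not part of esxcli): reads `esxcli vm process list`
# output on stdin and prints the World ID of the VM whose Display Name
# exactly matches the first argument. Prints nothing if there is no match.
world_id_for() {
  awk -v name="$1" '
    # Remember the most recent World ID seen in the current VM block.
    /^[ \t]*World ID:/ { wid = $NF }
    # When the Display Name line matches, emit the remembered World ID.
    /^[ \t]*Display Name:/ {
      sub(/^[ \t]*Display Name:[ \t]*/, "")
      if ($0 == name) print wid
    }
  '
}

# Intended usage on the host (assumption: run in the ESXi shell):
#   wid=$(esxcli vm process list | world_id_for TheVM)
#   [ -n "$wid" ] && esxcli vm process kill -t hard -w "$wid"
```

The kill type is kept as hard to mirror what I ran above; esxcli also accepts soft and force, and trying soft first is the gentler option if the VM still responds at all.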