Sunday, February 5, 2017

HyperV Live Migration Changes in Windows Server 2016

 

After upgrading my lab servers to Windows Server 2016, I had an “interesting” (ask a Minnesotan what that means) weekend troubleshooting Hyper-V Live Migration, finally finding that there has been a major change in the way virtual machine migration works, and a couple gotchas. In an effort to save others the same trouble, I’ll discuss them here.

Image From Polarstein on Flickr

Kerberos Constrained Delegation, 0x8009030E, and You(r Network Service Account)

No credentials are available in the security package”, Event ID 20306. Under previous circumstances, this would have indicated that you didn’t have constrained delegation set up correctly as outlined in numerous other articles on the internet, but due to an underlying change the correct configuration is now different.

Previously, failover would be set up as outlined in articles such as this, with each HyperV host set up to allow constrained delegation over the Kerberos protocol only.


Starting in server 2016, the delegation must be set up to allow delegation over any protocol as displayed here:


The reason for this is that 2016 has changed the WMI provider used to a new version, which relies on WinRM to execute remote procedures rather than DCOM. WinRM, running as the Network Service, cannot access the Kerberos service ticket obtained to perform the action. By allowing any protocol, a “S4U” logon is sufficient to authenticate the request. While this setting is somewhat less secure, the point is made by the Team PM (published a few days ago, link below) that sensitive (privileged) accounts in any domain should have the “Account is sensitive and cannot be delegated” flag enabled to mitigate delegation risk.

NIC Teaming, 0x8007274C/0x80072741, and You(r Service Startup Problem)

This may impact 2012/R2 as well, though for some reason it only bit me on 2016. If using NIC teaming on your host for your failover network, the interface may not be available when the Virtual Machine Management Service (VMMS) attempts to start on bootup. This condition will result in the service not opening the port (6600) on the server, which makes it impossible to failover virtual machines. To fix this, change the service startup type from “Automatic” to “Automatic(Delayed Start)”. With PowerShell as our weapon of choice (hey nano server!) this is a two-step process:

Set-Service –Name vmms –StartupType "Automatic"
Set-ItemProperty -Path "Registry::HKLM\System\CurrentControlSet\Services\vmms" -Name "DelayedAutoStart" -Value 1 -Type DWORD

The service type should already be automatic, but we’ll re-assert that here to be sure. This will only delay service (and thus VM) startup by a small bit, but ensure that the adapter is available when it does.

EventID 21024, Failed at Migration Source, and You(r Crazy, Still Unexplained Error)

This is an odd one I can’t fully explain, but I’m including it in the hopes it may save others some time. On 2 of the 3 hosts, I had the following error preventing live migration after full 2016 setup:

Virtual machine migration operation for 'VMNAME' failed at migration source 'VMHOST'. (Virtual machine ID GUID-GOES-HERE)

This error message was not accompanied by any supporting information whatsoever. After numerous network captures and log combing, I found evidence of something slightly off with domain membership. In both cases the host was able to process group policy for the computer object, but never for any logged on users. This led me to attempt leaving and re-joining the domain, which in all cases remediated the problem. Note that when doing so you will need to delete the computer account prior to re-joining, then set up the constrained delegation as outlined above for each host again.

I wish I had more information about the root cause of this issue, but with it fixed I’m moving on.

In Closing

The upgrades to my lab didn’t go as smoothly as I would like, but I’m glad to have these issues out of the way to make for smoother efforts with production efforts. Hopefully this information will help you as well!

Additional References

Microsoft Virtualization Blog: Live Migration via Constrained Delegation with Kerberos in Windows Server 2016

Microsoft GTCS Romania EPS: Shared Nothing Migration Fails

Canberra PFE Team Blog: Kerberos Troubleshooting

Nyan Cat: 10 Hours 4k UHD For Endless Kerberos Packet Caps and Analysis!