Sunday, January 16, 2011

VCP Rants - DRS, DRS Clusters, Maintenance Mode

--- VCP RANT ---

Hey guys, here comes a VCP rant!

This post assumes you already know a bit about VMotion in VMware, so here is just a super quick refresher: VMotion lets one ESX server running a guest VM "migrate" that VM over to another ESX server with a very, very small amount of downtime. It does this by using shared storage and copying the VM's memory from the source ESX server to the "target" ESX server. That means you can move a VM from one machine to another and then perform maintenance on the physical host without causing any downtime for the VM. When configured correctly, VMotion results in essentially no downtime when migrating a VM.

OK, so today I am going to talk a little bit about clusters in ESX. A cluster is essentially a grouping of ESX/ESXi hosts in vCenter, and you would normally group together hosts that run the same kind of CPU. The reason for this is that a DRS cluster relies on features such as VMware HA and VMotion (more on that later), and VMotion in particular assumes that all the hosts in the cluster have compatible CPUs (same brand of CPU, same family of CPU).

OK, so hopefully I have explained a cluster a little bit. Now, when you create a cluster, you're asked if you want to enable DRS on it, and you're given a choice of "automation level": fully automated, partially automated or manual. Just what the heck is that all about?

First of all, DRS stands for Distributed Resource Scheduler, and it serves two purposes:

1. Whenever a new VM is started, DRS chooses an appropriate ESX server to run the VM on, based on the CPU load, memory, etc. being used on each of the hosts. This is called intelligent placement (VMware's documentation calls it initial placement).

2. Based on usage and load across the ESX/ESXi hosts, DRS will either recommend or execute VMotion migrations of the VMs to distribute the load more evenly among the available hosts.
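
To make the first purpose a bit more concrete, here is a minimal sketch of the placement idea in Python. This is a toy model, not how DRS is actually implemented: the Host class, the load() metric and the host names are all made up for illustration, and the real algorithm looks at far more than this.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical stand-in for an ESX host in a cluster."""
    name: str
    cpu_used_mhz: int
    cpu_total_mhz: int
    mem_used_mb: int
    mem_total_mb: int

    def load(self) -> float:
        # Toy metric (an assumption): the more constrained of CPU and memory.
        return max(self.cpu_used_mhz / self.cpu_total_mhz,
                   self.mem_used_mb / self.mem_total_mb)


def pick_host(hosts):
    """Return the least loaded host, i.e. where a new VM would be started."""
    return min(hosts, key=lambda h: h.load())


hosts = [
    Host("esx01", 6000, 8000, 12000, 16000),
    Host("esx02", 2000, 8000, 4000, 16000),
    Host("esx03", 4000, 8000, 15000, 16000),
]
chosen = pick_host(hosts)
print(chosen.name)
```

Note that esx03 loses out here even though its CPU is only half used, because its memory is nearly full. Taking the worse of the two numbers is just one simple way to avoid placing a VM on a host that is short on either resource.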

When you first set up DRS, you will be given three options: fully automated, partially automated and manual.

Manual
Manual mode requires an administrator to approve any change DRS recommends, and this includes the placement of newly started VMs. When you start a VM in manual mode, a dialog box will pop up showing recommendations on which hosts the vCenter server thinks you should run the VM on, based on resource usage across the hosts. Each option will also be given a "priority", with higher priority being more recommended. DRS will also suggest migrations to you (find them under the DRS tab) but will never execute them without your permission.

Partially Automated
This works almost exactly the same as manual mode, but when a VM is started a host is chosen automatically without any administrator intervention. Migration recommendations still wait for your approval.

Fully Automated
Fully automated will both start VMs on the DRS-recommended hosts and execute certain DRS migration recommendations on its own. Which recommendations it executes depends on the level you pick when you select fully automated, and these range from aggressive to conservative. There are 5 options available, which is no coincidence, because each recommendation by DRS is given a "stars" rating, with 5 stars being the strongest recommendation. The level you choose sets the minimum star rating a recommendation needs before it is executed automatically: the most conservative setting only applies five-star recommendations, and the most aggressive applies them all.
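
The star threshold is easy to picture as a simple filter. Here is a small Python sketch of that idea. The Recommendation class, the VM and host names and the min_stars parameter are made up for illustration, and the exact mapping from each of the 5 settings to a star rating is my assumption rather than something taken from VMware's documentation.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    """Hypothetical DRS migration recommendation."""
    vm: str
    target_host: str
    stars: int  # 1 (weak) to 5 (strong)


def recommendations_to_apply(recs, min_stars):
    """Keep only recommendations at or above the chosen star threshold."""
    return [r for r in recs if r.stars >= min_stars]


recs = [
    Recommendation("web01", "esx02", 2),
    Recommendation("db01", "esx03", 4),
    Recommendation("mail01", "esx02", 5),
]

# Most conservative setting: only five-star recommendations run automatically.
conservative = recommendations_to_apply(recs, min_stars=5)
# Most aggressive setting: everything from one star upwards runs.
aggressive = recommendations_to_apply(recs, min_stars=1)

print([r.vm for r in conservative])
print([r.vm for r in aggressive])
```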

So, now let's talk briefly about maintenance mode.

If you're anything like me, you have sometimes right-clicked an ESXi host and noticed "Enter Maintenance Mode" and wondered just what the heck that is and how it works. Maybe, like me, you have been lambasted by one of your VMware coworkers who got upset at you for "not entering maintenance mode first" on an ESXi host before shutting it down.

So what does it actually do? Maintenance mode prevents an ESXi host from being picked by DRS to run any new VMs, and no VMs will be allowed to start up on a host that has entered maintenance mode. Also, every VM that is currently running on the host will be given a five-star recommendation to be migrated off. The idea is that all the VMs can be migrated off so you can run the maintenance, like a patch perhaps, hence the term maintenance mode.
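
Here is a rough Python sketch of those two effects, just to pin the idea down. It is a toy model with made-up function names, host names and VM names, not anything from the real vCenter API.

```python
def enter_maintenance_mode(host, vms_by_host, in_maintenance):
    """Flag the host and return a five-star 'migrate off' entry for each VM on it."""
    in_maintenance.add(host)
    return [(vm, 5) for vm in vms_by_host.get(host, [])]


def hosts_eligible_for_power_on(all_hosts, in_maintenance):
    """Hosts in maintenance mode are never picked to start a VM."""
    return [h for h in all_hosts if h not in in_maintenance]


vms_by_host = {"esx01": ["mail01", "web01"], "esx02": ["db01"]}
in_maintenance = set()

evacuations = enter_maintenance_mode("esx01", vms_by_host, in_maintenance)
eligible = hosts_eligible_for_power_on(["esx01", "esx02"], in_maintenance)

print(evacuations)
print(eligible)
```

Since every evacuation gets five stars, a fully automated cluster would carry these migrations out at any threshold setting, while a manual or partially automated cluster would leave them waiting for you to approve.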

I hope this explains that part.

Now finally, it's worth talking about DRS rules and overrides. You can override the DRS automation level for individual virtual machines. For example, you might decide that there is a critical VM that you don't want moved between hosts in the cluster unless you do it manually. You could set the DRS level for that individual VM to manual, or even disable DRS for it completely (although that's not recommended).

DRS rules also allow you to do things like set VM affinity or anti-affinity. This lets you say "make sure these two VMs try not to run on the same ESX host" or "always run these VMs on the same ESX host".

Why might that be useful? Consider an Exchange setup with two hub transport machines for redundancy. If DRS moves both the hub transport VMs to the same ESXi host, you now have a physical single point of failure for your hub transport servers, and this might not be what you had in mind :p

By setting an anti-affinity rule you can prevent these VMs from running together. Of course, these rules are overridden if following them would actually take a VM down. For example, in our previous scenario, if you lost all your ESXi hosts except for one, VMware HA would still allow both the Exchange hub transports to run on the same host. Even though that's less than optimal, it's probably better than not running them at all!
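
If it helps, here is a tiny Python sketch of an anti-affinity rule that gets relaxed when there is nowhere else to go. The function names and the round-robin spreading are my own simplification for illustration, not how DRS or HA really decide placement.

```python
def spread_vms(vms, available_hosts):
    """Put each VM on a different host when possible, sharing only if forced to."""
    if not available_hosts:
        raise ValueError("no hosts left to run the VMs on")
    return {vm: available_hosts[i % len(available_hosts)]
            for i, vm in enumerate(vms)}


def violates_anti_affinity(placement, rule_vms):
    """True if any two VMs named in the rule ended up on the same host."""
    hosts_used = [placement[vm] for vm in rule_vms]
    return len(set(hosts_used)) < len(hosts_used)


hub_transports = ["hub01", "hub02"]

# Healthy cluster: the two hub transports land on different hosts.
normal = spread_vms(hub_transports, ["esx01", "esx02", "esx03"])
# Only one host survived: the rule is broken, but both VMs still run.
degraded = spread_vms(hub_transports, ["esx03"])

print(normal, violates_anti_affinity(normal, hub_transports))
print(degraded, violates_anti_affinity(degraded, hub_transports))
```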

I hope I helped someone out there!
