Sunday, January 16, 2011

VCP Rants - DRS, DRS Clusters, Maintenance Mode

--- VCP RANT ---

Hey guys, here comes a VCP rant!

This post assumes you already have some knowledge of vMotion in VMware, but here is a super quick refresher: vMotion allows one ESX server running a guest VM to "migrate" that guest VM over to another ESX server with essentially no downtime. It does this by using shared storage and copying the VM's memory from the source ESX server to the "target" ESX server. This lets you move a VM from one machine to another so that you can perform maintenance on a physical host without causing an outage.

OK, so today I am going to talk a little bit about clusters in ESX. A cluster is essentially a grouping of ESX/ESXi hosts in vCenter, and you would normally group together hosts that run the same CPU. The reason for this is that a DRS cluster uses features such as VMware HA and vMotion (more on that later) that assume all the hosts in the cluster have compatible CPUs (same brand of CPU, same family of CPU).

OK, so hopefully that explains a cluster a little bit. Now, when you create a cluster, you're asked if you want to enable the hosts for DRS, and given an option for the "automation level": fully automated, partially automated or manual. Just what the heck is that all about?

First of all, DRS stands for Distributed Resource Scheduler, and it serves two purposes:

1. Whenever a new VM is started, DRS chooses an appropriate ESX server to run the VM on based on the CPU load, memory, etc. being utilized by each of the hosts. This is called intelligent placement.

2. Based on usage and load across the ESX hosts, DRS will either recommend or execute vMotion migrations of the VMs to more evenly distribute the load amongst the available hosts.
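To make intelligent placement concrete, here is a toy sketch of the idea: given a new VM, pick the host with enough free memory and the lowest combined load. The real DRS algorithm is far more sophisticated; the host names and numbers below are invented for illustration.

```python
# Toy sketch of DRS-style "intelligent placement": pick the host with the
# most free capacity for a new VM. Not VMware's actual algorithm.

def pick_host(hosts, vm_mem_mb):
    """Return the name of the least-loaded host that can fit the VM."""
    candidates = [h for h in hosts if h["free_mem_mb"] >= vm_mem_mb]
    if not candidates:
        raise RuntimeError("no host has enough free memory")
    # Prefer the host with the lowest combined CPU/memory utilization.
    return min(candidates,
               key=lambda h: h["cpu_load"] + h["mem_load"])["name"]

hosts = [
    {"name": "esx01", "free_mem_mb": 2048, "cpu_load": 0.70, "mem_load": 0.60},
    {"name": "esx02", "free_mem_mb": 4096, "cpu_load": 0.30, "mem_load": 0.40},
    {"name": "esx03", "free_mem_mb": 1024, "cpu_load": 0.10, "mem_load": 0.20},
]

print(pick_host(hosts, vm_mem_mb=1536))  # esx02: esx03 is idle but too small
```

Note how esx03 has the lowest load but cannot fit the VM, so it is excluded before the load comparison ever happens.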

When you first set up DRS, you will be given three options: fully automated, partially automated and manual.

Manual
Manual requires an administrator to approve any changes DRS recommends, including the placement of newly started VMs. When you start a VM in manual mode, a dialog box pops up showing recommendations on which host the vCenter server thinks you should run the VM on, based on resource usage across the hosts. Each option is also given a "priority", with higher priority being more recommended. DRS will also suggest migrations to you (find them under the DRS tab) but will never execute them without your permission.

Partially Automated
This works almost exactly the same as manual mode but when a VM is started a host is chosen automatically without any administrator intervention.

Fully Automated
Fully automated will both start VMs on the DRS-recommended hosts and execute certain DRS recommendations. Which recommendations it will execute depends on the migration threshold you select when you choose fully automated, ranging from conservative to aggressive. There are five options available, which is no coincidence: each recommendation made by DRS is given a "stars" rating, with 5 stars being a strong recommendation, and the threshold you choose selects what "star" level of recommendation is executed automatically.
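The threshold behaviour can be sketched in a few lines: recommendations at or above the configured star level are executed, the rest are left for the administrator. The star values and VM names here are made up.

```python
# Toy model of the fully-automated migration threshold: only recommendations
# at or above the configured star level are executed automatically.

def auto_apply(recommendations, threshold_stars):
    """Split recommendations into those executed and those left pending."""
    executed = [r for r in recommendations if r["stars"] >= threshold_stars]
    pending = [r for r in recommendations if r["stars"] < threshold_stars]
    return executed, pending

recs = [
    {"vm": "web01", "stars": 5},   # strong recommendation
    {"vm": "db01",  "stars": 3},
    {"vm": "app01", "stars": 1},   # marginal benefit
]

# A fairly conservative threshold: only 4- and 5-star moves run on their own.
# An aggressive threshold of 1 would execute everything.
executed, pending = auto_apply(recs, threshold_stars=4)
print([r["vm"] for r in executed])  # ['web01']
```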

So, now let's talk briefly about maintenance mode.

If you're anything like me, you have sometimes right-clicked a host in ESXi, noticed "enter maintenance mode" and wondered just what the heck that is and how it works. Maybe, like me, you have been lambasted by one of your VMware coworkers who got upset at you for "not entering maintenance mode first" on an ESXi host before shutting it down.

So what does it actually do? Maintenance mode prevents an ESXi host from being picked by DRS to run any new VMs; no VM will be allowed to start up on a host that has entered maintenance mode. Also, every VM currently running on the host is given a five-star recommendation to be migrated off. The idea is that all the VMs can be migrated off so you can perform the maintenance, like a patch perhaps, hence the term maintenance mode.
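Those two effects can be sketched together: the host drops out of the placement candidate list, and its VMs all get top-priority evacuation recommendations. Purely illustrative; the names are made up.

```python
# Sketch of what maintenance mode implies: the host stops accepting new
# VMs, and its running VMs are flagged for migration off it.

def placement_candidates(hosts):
    """Hosts eligible to receive new or migrated VMs."""
    return [h["name"] for h in hosts if not h["maintenance"]]

def evacuation_recs(host):
    """Give every VM on the host a top-priority migration recommendation."""
    return [{"vm": vm, "stars": 5, "action": "migrate off " + host["name"]}
            for vm in host["vms"]]

esx01 = {"name": "esx01", "maintenance": True, "vms": ["dc01", "file01"]}
esx02 = {"name": "esx02", "maintenance": False, "vms": ["web01"]}

print(placement_candidates([esx01, esx02]))  # ['esx02']
print([r["vm"] for r in evacuation_recs(esx01)])  # ['dc01', 'file01']
```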

I hope this explains that part.

Now finally, it's worth talking about DRS rules. You can override individual virtual machines' DRS automation levels; for example, you might decide that there is a critical VM you don't want moved between hosts in the cluster unless you do it manually, so you could set the DRS level for that individual VM to manual or even disable it completely (although that's not recommended).

DRS rules also allow you to do things like set VM affinity or anti-affinity. This lets you say "always run these VMs on the same ESX host" (affinity) or "make sure these two VMs try not to run on the same ESX host" (anti-affinity).

Why might that be useful? Consider the example of an Exchange deployment with two hub transport machines for redundancy. If DRS moves both the hub transport VMs to the same ESXi host, you now have a physical single point of failure for your hub transport servers, which is probably not what you had in mind :p

By setting anti-affinity you can prevent these VMs from running together. Of course, these rules are overridden if enforcing them would actually take a VM down; for example, in our previous scenario, if you lost all your ESXi hosts except for one, VMware HA would still allow both the Exchange hub transports to run on the same host. Even though that's less than optimal, it's probably better than not running them at all!
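The "keep them apart unless there is no other choice" behaviour can be sketched like this. Hostnames and VM names are invented; the fallback at the end mirrors the HA scenario above.

```python
# Toy anti-affinity check: keep the two hub transport VMs apart when
# possible, but fall back to co-location rather than leaving a VM down.

def allowed_hosts(vm, anti_affinity_group, placements, hosts):
    """Hosts where `vm` may run without violating its anti-affinity rule."""
    blocked = {placements[peer] for peer in anti_affinity_group
               if peer != vm and peer in placements}
    ok = [h for h in hosts if h not in blocked]
    # Availability beats the rule: if every host is blocked, allow them all.
    return ok if ok else list(hosts)

group = {"hub01", "hub02"}
placements = {"hub01": "esx01"}  # hub01 already runs on esx01

print(allowed_hosts("hub02", group, placements, ["esx01", "esx02"]))
# ['esx02'] - hub02 is steered away from hub01's host

print(allowed_hosts("hub02", group, placements, ["esx01"]))
# ['esx01'] - only one host left, so co-location is tolerated
```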

I hope I helped someone out there!

Monday, January 3, 2011

VCP related studies - Memory in VMWARE reservations, limits and memory overcommit.

Hey Guys
--------------- VCP Rant -------------------------------
I might have to rename this blog to VCP Rants soon enough :) I am busy chasing my VCP, or VMware Certified Professional certification.

Cisco guys, let me tell you that you're highly likely to run into VMware sooner than you think. The new UC on UCS strategy from Cisco means that all Unified Communications products will soon be running under VMware.

With this in mind, I wanted to make sure I understood VMWare very well, so I can do the best possible job with the next generation UC.

This blog article will be the first in a series on VMware. These posts do assume you have at least some rudimentary knowledge of what virtual machines are and a reasonable understanding of what ESX is.
----------------------------- End Rant -----------------

Now onto business: this blog article is going to talk about memory in VMware.

ESX/ESXi is the only commercially available hypervisor that I am aware of (please leave a comment if you know otherwise!) that supports memory overcommit, that is, provisioning more memory for virtual machines than the host actually has.

Let's take a quick example. I have an ESX host, HostMachine, with 4 GB of RAM. If I want to create virtual machines, each assigned 1 GB of RAM, then without memory overcommit I could only provision four of them (in reality slightly less than 4 GB would be available to my virtual machines because of ESX overheads, but let's ignore that for a minute).

With memory overcommit, VMware allows you to provision more, so in our example above we could create 8 virtual machines, each with 1 GB of memory assigned, and run all 8 of them at the same time. ESX manages memory usage using three key technologies:

1. Idle page reclamation
This technology reclaims any idle pages in memory and allows other virtual machines to use them.
2. Transparent page sharing
If two virtual machines have exactly the same contents in a particular memory page, that single page is shared between both virtual machines. (Take, for example, two virtual machines running the exact same version of Windows: quite a few pages of memory are going to contain exactly the same information.)
3. Ballooning
This feature relies on VMware Tools being installed on the guest OS. It allows ESX to tell the VMware Tools driver on the guest OS to start grabbing memory in a process called "inflation"; the driver hands that memory back to the ESX host, which can then allocate it to other virtual machines. The memory is returned to the guest when no longer needed using a process called, you guessed it, deflation.
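Transparent page sharing in particular is easy to picture. Here is a toy model: if identical pages are stored once and reference-counted, two VMs running the same OS need far fewer physical pages. Real ESX hashes page contents and verifies matches before sharing; the "pages" below are just labels for illustration.

```python
# Toy model of transparent page sharing: pages with identical contents
# are stored once, so duplicate pages across VMs cost nothing extra.

from collections import Counter

def page_counts(vm_pages):
    """Return (pages needed without sharing, pages needed with sharing)."""
    all_pages = [page for pages in vm_pages.values() for page in pages]
    unique = Counter(all_pages)  # identical contents collapse to one copy
    return len(all_pages), len(unique)

# Two VMs running the same version of Windows share many identical pages.
vm_pages = {
    "winvm1": ["kernel", "ntdll", "appA"],
    "winvm2": ["kernel", "ntdll", "appB"],
}

total, shared = page_counts(vm_pages)
print(total, shared)  # 6 4 -> sharing saves two physical pages
```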

My favorite tech of the above is definitely ballooning, what a brilliant idea.

OK, so now you know how VMware can share and overcommit memory, but what about controlling access to memory? After all, memory is a finite resource. Plus, what happens when there is simply no other memory available?

Well, VMware uses a swap file, much like Windows, in the event that no physical memory is available. This file, with a .vswp extension, sits by default in the same directory as all the other files for the virtual machine. Its size is the memory configured for the virtual machine (strictly, minus any memory reservation, more on those below), so in our example above the machines would have 1 GB swap files. This file is used as a last resort should there literally be no other memory available, even with the three technologies mentioned above.
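The sizing rule is worth a quick sanity check: the swap file only needs to cover memory that is not guaranteed by a reservation, so its size is configured memory minus the reservation, which with the default reservation of 0 MB is simply the full configured size.

```python
# .vswp sizing: the swap file backs the memory NOT guaranteed by a
# reservation, i.e. configured size minus reservation.

def vswp_size_mb(configured_mb, reservation_mb=0):
    return configured_mb - reservation_mb

print(vswp_size_mb(1024))                      # 1024 - default, full size
print(vswp_size_mb(1024, reservation_mb=512))  # 512  - reserved RAM needs no swap
```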

But disk is slow! Incredibly slow compared to RAM, several orders of magnitude slower. So how can you ensure your mission-critical virtual machines are allocated REAL physical RAM rather than being paged out to this swap file? The answer in VMware is reservations and limits.

Go to Edit Settings on your virtual machine, then Resources, then Memory. You will see you can set reservations and limits; by default the settings are:
Reservation: 0 MB
Limit: Unlimited

Reservation controls how much physical RAM _must_ be available to the virtual machine before it will even be allowed to power on. This ensures the machine has an acceptable amount of memory available before it will boot, and it also ensures that no other virtual machine can encroach upon this memory. So for our example, let's say one of our four virtual machines is our ARC Attendant Console server. It's mission critical, so we want to ensure it gets at least 512 MB of physical RAM; we set the reservation to 512.

Limit controls how much of the memory assigned to the virtual machine can actually be physical memory. So for example, if on our ARC server we decided that the machine needed no more than 768 MB of memory but might occasionally burst to 1 GB, we could set the limit to 768. This means that once the virtual machine is using 768 MB of physical memory, any additional memory it needs over and above this, up to its configured memory (1 GB), will be served from the swap file.

So the reservation guarantees that the virtual machine will be allocated at LEAST that amount of physical memory, the limit caps how much physical RAM the virtual machine can have before it has to start using the swap file, and the configured memory is how much memory the guest OS can actually use.
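Putting the three numbers together, here is a toy breakdown of where a VM's memory ends up for a given demand, using the ARC server figures from above (1 GB configured, 512 MB reserved, 768 MB limit). This is a simplified model; real ESX also uses ballooning and sharing before resorting to swap.

```python
# Toy split of a VM's memory demand into physical RAM vs swap, given its
# configured size, reservation and limit.

def memory_split(configured_mb, reservation_mb, limit_mb, demand_mb):
    """Return (physical_mb, swapped_mb) for a given memory demand."""
    demand = min(demand_mb, configured_mb)   # can't use more than configured
    physical = min(demand, limit_mb)         # the limit caps physical RAM
    # The reservation guarantees a floor of physical RAM (up to the demand).
    physical = max(physical, min(demand, reservation_mb))
    return physical, demand - physical

# ARC server: 1 GB configured, 512 MB reserved, 768 MB limit.
print(memory_split(1024, 512, 768, demand_mb=1024))  # (768, 256)
print(memory_split(1024, 512, 768, demand_mb=600))   # (600, 0)
```

At full demand the VM gets its 768 MB of physical RAM and the remaining 256 MB lives on the swap file; below the limit, everything stays in RAM.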

I hope this helps a studying VCP out there!