Recently I wrote a post about the hazards of XenServer HA and how to avoid a couple of different pitfalls which lead to XenServer fencing. In that post I talked about the necessity of correctly setting the HA heartbeat timeout for your environment so that your XenServers will allow enough time for a storage failover to occur. The idea, of course, is to prevent your XenServer from going into a “fence” condition which can occur for many reasons. The reason we’re discussing here is triggered when the XenServer believes its storage has suddenly become unavailable and it is not able to recover its state quickly enough to prevent the HA timeout from fencing the server.

I frequently build environments that use a pair of replicated DataCore SANmelody nodes (two physical nodes) and configure my XenServer in a multipath configuration. With this configuration my XenServers see two active paths to their storage (the status of the multipath is shown in the image below) – one path to each of the two nodes. If, for example, one of the SANmelody nodes goes off line, the other node will immediately take over. However, the XenServers have to be given enough time to fully recognize a failover has occurred, and the storage is still available, in order to avoid a fence. The default HA timeout in XenServer is 30 seconds which means if it takes a XenServer more than 30 seconds to realize the storage is still healthy and available then the server will fence. If the storage was indeed still available, then more than likely there were still VM guests up and running on the XenServer, which have now been taken offline unnecessarily.

To test and tune this setting I first make sure HA is enabled on the pool, then I perform hard failover tests where, using a DRAC or iLO card if I have one, I suddenly power cycle one of the storage servers and watch to see if any XenServers fence. I run this hard power cycle test because this specific problem never comes up with simple storage stops and restarts; rather it only shows up when a storage server actually goes down suddenly, or “hard,” as we say. So I run these tests because I want to stress the system to simulate unfortunate things like power failures, sudden server reboots due to gremlins, and other things along those lines. If nothing happens then great – let’s go home and we can sleep well knowing HA is working correctly. But what if you do have one or more servers which do fence because they believe their storage is gone when in fact it is not?

The last time I had this happen to me I had to test my environment several times, and with each successive run through the hard failover test I used a different timeout setting. In the end I found that 120 seconds worked best for me. (Keep in mind I am doing this during a build and there are no live production workloads running on any of these servers.)

So what is the downside of setting your timeout this high? Well, if a XenServer really fails (for whatever reason) it will take about 120 seconds for the Pool to decide there is a problem and then take action to restart the VMs elsewhere based upon available resources and the restart priority of each VM. Personally, I’d rather wait the 120 seconds when something has really gone wrong than suffer an unnecessary fence/shutdown when all the VMs were actually still running fine.

So how did I set the timeout values? Like this:

Rather than enabling HA from the GUI, you’re going to have to do it from a command line. I use PuTTY when I’m not actually at the XenServer console. The command you will use is xe pool-ha-enable heartbeat-sr-uuids=<your SR UUID> ha-config:timeout=<timeout in seconds>.

But in that command string, how do you know what the sr-uuid is? The way I find it is to start with XenCenter and locate the SR (storage repository) which is going to be used for the heartbeat status disk. I locate the SCSI ID of that SR and copy the number as shown in this image:

Finding the SCSI ID of a Storage Repository


After I have that number, I connect to the master XenServer using PuTTY (the master XenServer in a pool is always the top server shown in XenCenter) and run this command, in which the long number at the end is the ID just copied from XenCenter: xe pbd-list device-config=SCSIid:\ 360030d903131325f48415f4865617274
Finding the sr-uuid


What is shown above is what the output should look like. You see three groupings in this example because there are three hosts in this pool; notice that the host-uuids are all different. Notice also that the sr-uuid value is the same in each grouping, and this is the number we are after. Take the sr-uuid you just found and enter it into a command like this: xe pool-ha-enable heartbeat-sr-uuids=7a213624-1209-c467-42ed-6ef72a1b7699 ha-config:timeout=120
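If you re-enable HA often, it can help to assemble that command programmatically so the UUID and timeout are checked before anything is typed on the pool master. The following is only a sketch: the helper name build_ha_enable_command is hypothetical, the UUID check simply assumes the usual 8-4-4-4-12 hexadecimal layout seen in the example above, and the code builds the argument list without running xe.

```python
import re
import shlex

# Assumption: sr-uuids follow the usual 8-4-4-4-12 hexadecimal layout,
# like the one shown in the example command.
UUID_PATTERN = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def build_ha_enable_command(sr_uuid: str, timeout_seconds: int) -> list[str]:
    """Return the argument list for the xe pool-ha-enable call.

    Hypothetical helper: it validates and assembles the arguments only;
    it never executes anything.
    """
    normalized = sr_uuid.strip().lower()
    if not UUID_PATTERN.match(normalized):
        raise ValueError(f"not a well-formed UUID: {sr_uuid!r}")
    if timeout_seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return [
        "xe",
        "pool-ha-enable",
        f"heartbeat-sr-uuids={normalized}",
        f"ha-config:timeout={timeout_seconds}",
    ]


command = build_ha_enable_command("7a213624-1209-c467-42ed-6ef72a1b7699", 120)
print(shlex.join(command))
```

The printed line can be pasted into the PuTTY session on the pool master; passing the list to a process launcher instead is left out deliberately, since that would need to run on the XenServer itself.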

It may take a bit of time for the command to complete, but once it does you should be able to refresh XenCenter by using either the xe-toolstack-restart or the service xapi restart command. Then, when you look at the HA tab at the pool level, you should see that HA is now turned on:

Verify that HA is now turned on


As I said previously, I found 120 seconds worked best for me – but how did I determine that? Simple: I started by setting the HA timeout to 60 seconds (twice the default) and then ran the hard shutdown test again. One of the XenServers still fenced, so I went to 90 seconds, and then finally 120 seconds. The point at which the XenServers do not fence is where you want to stop. But don’t just do this test on one side of the storage! You will want to recover your storage servers and, once everything is back online and healthy, run the same test again – but this time hard-shutdown the other storage node. Now if none of the XenServers fence then you are done…unless you disable and re-enable HA. As I pointed out in that earlier post, this manual timeout setting is not persistent – if you disable and re-enable HA on the pool, you will have to re-enable it from the command line again to ensure that the timeout is set correctly. If it’s done from the GUI, it will revert to the 30-second default.
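The tuning loop described above can be written down as a small search: try each candidate timeout in order, test against every storage node, and stop at the first value where nothing fences. This is a sketch only. The function names are hypothetical, and the any_host_fenced callback stands in for the manual procedure (enable HA with that timeout, hard power cycle one node, watch the pool). The recovery times in the stand-in are made-up numbers for illustration, not measurements.

```python
from typing import Callable, Iterable, Optional


def find_safe_timeout(
    candidate_timeouts: Iterable[int],
    storage_nodes: Iterable[str],
    any_host_fenced: Callable[[int, str], bool],
) -> Optional[int]:
    """Return the first timeout at which no host fences for any storage node.

    any_host_fenced(timeout, node) represents one manual hard failover test
    of a single storage node with HA enabled at the given timeout.
    """
    nodes = list(storage_nodes)
    for timeout in candidate_timeouts:
        # A timeout only passes if the test succeeds on BOTH sides of the storage.
        if not any(any_host_fenced(timeout, node) for node in nodes):
            return timeout
    return None


# Made-up stand-in for the real test: pretend the hosts need this many
# seconds to recognize the surviving path after each node is powered off.
SIMULATED_RECOVERY_SECONDS = {"san-node-a": 75, "san-node-b": 100}


def simulated_test(timeout: int, node: str) -> bool:
    """Report a fence whenever the simulated recovery outlasts the timeout."""
    return SIMULATED_RECOVERY_SECONDS[node] > timeout


chosen = find_safe_timeout([60, 90, 120], ["san-node-a", "san-node-b"], simulated_test)
print(chosen)
```

With these invented numbers the search passes over 60 and 90 and settles on 120, mirroring the sequence in the text. In a real build the callback would be you, a DRAC or iLO card, and some patience.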

In our post of October 6, hard on the heels of the Citrix news release that announced XenDesktop 4 (hereinafter called “XD4” to save wear and tear on my keyboard), we told you that XD4 was moving toward a strict per-user licensing model, rather than the concurrent-use model that Citrix products have been using since forever. Since that initial news release, however, Citrix has backed down on that position, and made some changes in how XD4 can be licensed.

XD4 Enterprise and Platinum Editions can now be licensed in either per-user or per-device mode. The per-device mode has obvious benefits in, say, classroom situations where a single device will be shared by multiple users, a clinical workstation in a hospital that is used by multiple users, or a factory floor where different shifts come and go. This aligns very closely with the Microsoft RDS CAL licensing model. (RDS, or Remote Desktop Services, is the new name for Terminal Services.) If a given use case would be more economically licensed using per-device RDS CALs, then per-device licensing for XD4 will probably make more sense as well.

A user who has been assigned a user license is entitled to use an unlimited number of devices to access an unlimited number of desktops. A device that has been assigned a device license can be used by an unlimited number of users. Just as is the case with Microsoft RDS CALs, user licenses can be reassigned permanently if a licensed user leaves the organization, or temporarily if a licensed user is absent for a protracted period of time. Likewise, a device license can be reassigned if a device must be replaced, or reassigned temporarily while a device is being repaired.

Customers can have both user and device licensing in the same enterprise, and licenses may be switched from user to device and vice-versa after 90 days. Once you reassign a license, you must wait at least another 90 days before you can switch back.

Just in case that’s not confusing enough, the low-end XD4 “VDI Edition” – which supports only VDI deployments and does not include any of the XenApp or “FlexCast” functionality – can be licensed in per-user, per-device, or concurrent mode. Concurrent licenses for the VDI Edition can be upgraded to either user or device licenses for XD4 Enterprise or Platinum Edition. However, within the VDI Edition, you cannot convert VDI concurrent licenses to VDI user or device licenses, nor can you convert VDI user or device licenses to VDI concurrent licenses.

License Management
Device licenses are assigned by manually adding a unique device identity to a device log. This device log must be manually maintained as devices come and go. User licenses leverage Active Directory – you create and maintain a specific OU for your licensed users.

One wrinkle that you may not be aware of is the concept of “overdraft” licenses. Citrix will actually grant one overdraft license for every 10 licenses that you allocate to a license file. These overdraft licenses are automatically rolled into the license file when it’s generated, and are displayed in a separate column of the License Management Console. The allocation of an overdraft license is recorded in the XenDesktop event log, but you won’t know unless you go looking for it – there is currently no alerting system that would proactively tell you that it’s happened. I would expect that, at some point, Citrix will build in some kind of overdraft alert.

Bear in mind that the overdraft licenses are not intended to let you, on an ongoing basis, exceed the license count you purchased. They’re intended to prevent the situation where a user is denied service because of a temporary spike in usage, or because a license hasn’t been properly allocated or re-allocated, and give you time to purchase additional licenses before the lack of available licenses becomes a crisis. Bottom line here is that if you think you’re getting close to your maximum license count, you should probably check the License Management Console from time to time to see how many licenses are actually in use, and whether you’re into your overdraft pool.
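To make the overdraft arithmetic concrete, here is a rough sketch of the "one overdraft license for every 10 allocated" rule. The function names are hypothetical, and the rounding is an assumption: the code treats a partial group of 10 as earning no overdraft license, which the text above does not actually specify.

```python
def overdraft_allowance(allocated_licenses: int) -> int:
    """One overdraft license per 10 allocated licenses.

    Assumption: partial groups of 10 earn nothing (floor division).
    """
    return allocated_licenses // 10


def license_status(allocated_licenses: int, licenses_in_use: int) -> str:
    """Classify current usage against the purchased count and the overdraft pool."""
    if licenses_in_use <= allocated_licenses:
        return "within purchased count"
    if licenses_in_use <= allocated_licenses + overdraft_allowance(allocated_licenses):
        return "using overdraft"
    return "exceeds overdraft"


for in_use in (95, 104, 111):
    print(in_use, license_status(100, in_use))
```

The middle case is the one worth watching for: service continues, but it is the signal to buy more licenses before the pool runs out.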

Citrix Provisioning Services, which evolved from their acquisition of the Ardence technology, enables some great concepts:

  • Since the first time a Citrix customer deployed more than one WinFrame server, we’ve struggled with the issue of change control – how do we ensure that, over time, all of the servers that are supposed to be identical do, in fact, remain identical? Booting and running them all from a single, read-only image is a great way to do that.
  • It gives you an “undo” option when you upgrade your server image. You can make a copy of your read-only image, set it to read/write, apply your patches, updates, etc., reboot one server from the new image, do your testing, then set the new image to read-only, reboot your servers, and ba-da-boom ba-da-bing (that’s a technical term), in the time it takes them to reboot, they’re all running from the new image. If you then discover that there’s something wrong with the new image, point them back at the old image and reboot them again, and, in the time it takes them to reboot again, you’ve just rolled back to the old image.
  • In a VDI scenario, not only do you enjoy the first two advantages, you also save a ton of expensive SAN storage. If your typical desktop image is, say, 10 GB, and you want to deploy 100 virtual desktops, with some vendors’ approaches you will consume a full terabyte of expensive SAN storage. By using provisioning services, you consume only the 10 GB required by the common image.
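The storage comparison in the last bullet is simple arithmetic, sketched here with a hypothetical helper. It counts only the base image, as the bullet does, and ignores any per-desktop write cache or other overhead, which the post does not quantify.

```python
def san_storage_gb(image_size_gb: int, desktops: int, shared_image: bool) -> int:
    """Base-image storage only; per-desktop overhead is deliberately ignored."""
    if shared_image:
        return image_size_gb            # one common read-only image
    return image_size_gb * desktops     # a full copy per virtual desktop


per_desktop_copies = san_storage_gb(10, 100, shared_image=False)
single_shared_image = san_storage_gb(10, 100, shared_image=True)
print(per_desktop_copies, single_shared_image)
```

For the figures in the text this gives 1000 GB (roughly a terabyte) against 10 GB.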

Unfortunately, when you convert a modern Microsoft OS image to a shared read-only image, it looks like a hardware change to the OS, and breaks the license activation. This is the case with Windows 2008, 2008 R2, Vista, and Windows 7.

Enter the KMS server. KMS stands for “Key Management Service,” and it’s one way to automate the activation of Microsoft volume licenses within an organization. There’s a pretty good video that you can download from Microsoft Technet that walks through the process of configuring a KMS server to automatically activate servers and workstations, but it was made prior to the release of 2008 R2, so it omits a very important point (which we will get to in due time).

The concept is that as an un-activated copy of Server 2008, Vista, or Win7 boots, it queries Active Directory to see if there is a KMS server on the network. If there is, it contacts the KMS server for activation. However, for reasons that are not at all clear to me, the KMS server must be contacted by a minimum number of machines before it will actually activate anything. So, each time a different machine contacts the KMS server for activation, it is assigned a unique ID number, and the KMS server increments its counter by one. When it has been contacted by a total of five different systems, it will begin to activate servers. When it has been contacted by a total of 25 different systems, it will begin to activate workstations.

Before the release of Server 2008 R2, only physical systems would increment the counter – virtual systems would not. (Don’t ask me how the KMS server could tell the difference – that’s one of the ongoing mysteries of KMS.) And that’s the message you’ll hear when you watch the video referenced earlier. However, if KMS is running on a Windows 2008 R2 server, both physical and virtual systems will increment the counter. Note also that what matters is the aggregate number of all systems that have contacted the server for activation, regardless of whether they’re running Server 2008, 2008 R2, Vista, or Win7.

If the threshold has not yet been reached, the system will not be activated, but will still run…within the constraints of the built-in 30-day “grace period” for activation. (Although the nag messages get pretty intrusive in the last three days of the grace period.) This, by the way, is good news if you’re looking at an evaluation or proof of concept that will involve fewer systems than it takes to meet the threshold – you should be OK as long as the evaluation term doesn’t exceed the 30-day grace period. The system will continue to check back in with the KMS server every two hours to see if the threshold has been met. When it is met, all of the systems that have been waiting will be activated. Once activated, a system will attempt to check back in and renew its activation every 7 days. It must renew its activation within 180 days, or it will revert to an un-activated state.
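The intervals in that paragraph are easier to reason about side by side. This sketch just encodes the four figures quoted above as constants (the names are hypothetical) and derives how many chances a client gets in each phase.

```python
from datetime import timedelta

# Figures as quoted in the text above.
GRACE_PERIOD = timedelta(days=30)          # un-activated system keeps running
RETRY_INTERVAL = timedelta(hours=2)        # un-activated client re-checks KMS
RENEWAL_INTERVAL = timedelta(days=7)       # activated client attempts renewal
ACTIVATION_VALIDITY = timedelta(days=180)  # activation lapses without renewal

# Dividing one timedelta by another with // yields a whole number of intervals.
activation_attempts_in_grace = GRACE_PERIOD // RETRY_INTERVAL
renewal_attempts_before_lapse = ACTIVATION_VALIDITY // RENEWAL_INTERVAL

print(activation_attempts_in_grace, renewal_attempts_before_lapse)
```

In other words, a waiting client gets a few hundred tries during the grace period, and an activated one would have to miss a couple dozen consecutive weekly renewals before it lapsed.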

The KMS server keeps track of the ID numbers of the systems that have contacted it for activation. If an activated system does not check back in within 30 days, its ID number is removed from the KMS server’s cache, and the counter is decremented. If the count falls back below the threshold, the KMS server will stop activating systems. To help guard against this, the KMS server’s cache size is set to 2x the threshold. In other words, if you’re only activating servers, the cache will contain the IDs of the last 10 servers that have contacted it for activation. If you’re activating workstations, or a combination of workstations and servers, the cache will contain the IDs of the last 50 systems that have contacted it for activation.
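Putting the counting rules together: the thresholds are 5 systems for servers and 25 for workstations, the cache holds twice the threshold, and virtual machines only count when the KMS host runs Server 2008 R2. The sketch below is a simplified model of those rules as described here, not Microsoft's implementation, and every function name in it is hypothetical.

```python
SERVER_THRESHOLD = 5
WORKSTATION_THRESHOLD = 25


def counted_systems(physical: int, virtual: int, host_is_2008_r2: bool) -> int:
    """Systems that increment the KMS counter, per the rules described above."""
    return physical + virtual if host_is_2008_r2 else physical


def will_activate(count: int, is_server: bool) -> bool:
    """True once the aggregate count has reached the relevant threshold."""
    threshold = SERVER_THRESHOLD if is_server else WORKSTATION_THRESHOLD
    return count >= threshold


def cache_size(activating_workstations: bool) -> int:
    """The cache is sized at twice the applicable threshold."""
    threshold = WORKSTATION_THRESHOLD if activating_workstations else SERVER_THRESHOLD
    return 2 * threshold


# Example: 2 physical and 6 virtual servers have contacted the KMS host.
pre_r2_count = counted_systems(2, 6, host_is_2008_r2=False)
r2_count = counted_systems(2, 6, host_is_2008_r2=True)
print(will_activate(pre_r2_count, is_server=True))   # only 2 systems counted
print(will_activate(r2_count, is_server=True))       # all 8 systems counted
print(will_activate(r2_count, is_server=False))      # still short of 25
```

The example shows why a mostly virtual environment stalls on a pre-R2 KMS host: the same eight machines clear the server threshold only when the virtual ones are counted.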

The KMS service can be co-hosted with other services in your server infrastructure – you do not have to dedicate a server to this function. In fact, if all you care about are workstations, you can host the KMS service on a Win7 workstation. You’re going to want to have more than one KMS host running, to ensure that it doesn’t become a single point of failure in your infrastructure. And remember, unless you’re going to be activating enough physical systems to meet the KMS threshold, you need to be running KMS on Server 2008 R2. That will give you the ability to activate “any Windows operating system that supports Volume Activation,” (which today means the four operating systems we’ve been discussing here), and count both physical and virtual systems toward the required threshold.

So…wrapping back around to the beginning of this discussion, if you want to use Provisioning Services to provision XenApp servers on Server 2008 (and remember, XenApp does not yet work on 2008 R2 as of this writing), you’re going to need a couple of KMS servers. And unless you have five or more physical 2008 servers that it can activate, you’re going to need to have your KMS servers running on R2. And even then, you’re going to need a total of at least five machines to meet the threshold before KMS will activate anything.

Likewise, if you want to use Provisioning Services to provision Win7 desktops – and I’m ignoring Vista here, because, even though I personally liked Vista, I think Win7 is sufficiently superior that it just doesn’t make sense at this point not to go to Win7 – you’re also going to need a couple of KMS servers. And unless you have 25 or more physical systems (in aggregate, counting both servers and workstations), they’re going to need to be running on R2. And in any event, you’re going to need a total of at least 25 systems.

For more information on exactly how KMS works, I strongly recommend the Technet Volume Activation Planning Guide for Windows 7 and Windows Server 2008 R2. Happy provisioning!

I have been cloning Citrix servers since the days of MetaFrame XP. Over the years I’ve done hundreds of systems and taught a number of people a process for cloning servers that has worked 100% of the time. Unfortunately that process required removing registry keys, running tools to change the SID, and “sterilizing” the image to get it ready to clone. Then once this was done you had to make a copy of the server (in the Bad Old Days we used Symantec Ghost – today we have better imaging tools, which we’ll discuss below), and then move that copy to either different hardware or to a virtualization platform. Then, after copying it, you had to reverse the whole process by adding back registry keys, changing the server name, joining the domain, and finally running “chfarm” (change farm) to join the machine back to the Citrix farm.

About a year and a half ago, Citrix came out with a tool called XenApp Prep, which takes the whole process down from about 30 minutes to just a couple of minutes (not including the amount of time to copy the files). With Windows 2008, the process is simple, and I’m going to tell you exactly how I clone an image. But before I start, I want to stress that, while the process is nearly the same for using XenApp Prep to make a V-Disk image for use with Provisioning Server, there are some slight differences, so be sure to read the “readme” file and the FAQ that come in the XenApp Prep zipped download.

Here are the high-level steps I use to create the server that I’m going to turn into a “Gold” image that I can then use as the source of my cloned image(s):

  1. First I install Windows Server 2008 and apply all critical OS patches and any optional patches I deem necessary to bring the server up to current standards. (Most IT shops have their own policies and standards for approving and applying patches, so your list may be different from mine.)
  2. Install any extra pieces that will be required by your application set: J#, .NET (whichever versions you need) with the appropriate SP, Java, etc.
  3. Turn on the required Terminal Services roles, and, if you are going to place the Web Interface on the server (I don’t personally recommend this), turn on the IIS role.
  4. When all my prerequisites are met – and you may want to check the admin guide or the Citrix Web site to find the most recent requirements – I install XenApp 5.0.
  5. Install the most recent Citrix service packs, hotfixes, feature packs, etc.
  6. Apply any best practices and tweaks necessary. (This is a whole topic by itself, so we won’t try to cover it here.)
  7. Now, unless I’m using application streaming (another subject we’re not covering here), I install all of my applications. Generally I start with Microsoft Office, because nearly all the time, a customer requires that at least part of the Office Suite be installed. For specific “line of business” and third-party applications, I would always want to work with the customer’s Subject Matter Expert (“SME”) to verify proper operation.
  8. After the application is installed, I have the SME test the functionality to verify that the application is functioning as would be expected to do whatever it is the business needs the application to do.

If the customer’s SME agrees that the applications are working correctly, I am ready to transform this server into my Gold image. This couldn’t be easier, especially if you’re virtualizing the XenApp servers. (And you know that XenServer is the best virtualization platform for XenApp, right?) Here are the steps:

  1. Hopefully I was thinking ahead and used a generic name for the server when I built it…but if for some reason I forgot to do that, I change the server name to something generic and reboot.
  2. Now I download XenApp Prep and install it on the server by running the MSI file. By default, the XenApp Prep installation places its executables in the C:\Program Files\Citrix\XenAppPrep directory:

    The XenApp Prep Directory

  3. If you are not creating an image for Provisioning Server – and we’re assuming here that you’re not – then all you do is navigate to the directory shown above and double-click XenAppPrep.exe to run it. (Again, refer to the readme and FAQ that come with XenApp Prep if you are creating an image for PVS.) A command window will appear, run a few commands, and close. That’s it – and that quick little process that took about 15 seconds saved you at least 10 minutes.

    Command Window

  4. Once XenApp Prep has completed, I next remove the IP address by either setting it to DHCP or to some static IP address. I prefer to set the address to something that’s not on its local subnet, so when it reboots, it cannot communicate until I want it to.
  5. I now navigate to the C:\windows\system32\sysprep directory, double-click the sysprep.exe file to run it, select the “OOBE” option (that’s “Out Of Box” Experience, not “Out Of Body”), select the option to shut down the server (not reboot), then click “next,” and sysprep runs – taking only a few seconds to complete:

    The Sysprep Directory

    The Out-of-Box Experience

    Sysprep Runs

At this point, you have your Gold image and you’re ready to deploy it over and over again. How do you do that? Again, it couldn’t be any easier:

  1. Copy the image to a new physical server using whatever imaging tool you prefer – we generally use Ultrabac’s UBDR Gold or Acronis, but whatever tool you prefer should work fine. If you’re virtualizing on XenServer, Hyper-V, or VMware all you need to do is copy the image to another storage repository.
  2. After the copying process is done – which is the longest step in the process of creating your clone – boot the server up, and follow the sysprep utility prompts (as though you just ran “setup” on a brand new server – hence the “Out of Box Experience”) to give the server its final name. This may take several minutes to complete.

    Boot Your New Server

  3. When sysprep is done, you will need to change the password in order to log on to the system.
  4. Immediately set the correct IP address and verify that the machine can ping the domain name.
  5. Go to the system properties and join the machine to your domain.
  6. Reboot.
  7. When the server comes up this time, and you log onto the domain, your server should have already joined the Citrix farm and be ready to go. Just to be sure, I open a command prompt and type “qfarm” to verify that the server is now a member of the farm.
  8. Once you’ve confirmed that the server is in the farm, run the Access Suite Console, and configure it to see the farm. Once it comes up, I simply drag and drop the published applications that should be assigned to the new server and it’s ready to go.
  9. After I drag the applications onto the server, just to be sure, I again run a qfarm command – “qfarm /app” – to verify that the farm sees the new server with the newly allocated published applications on it.
  10. After you test the new server, make sure you’ve enabled logons on it.

That’s it – you now have another server in your farm, and creating more servers should only take you a few minutes for each one. (Of course the copy process is the slowest part…but you can always use that time to refill your coffee cup, comment on our blog site, or otherwise multitask if you’re really ambitious.)

Which App Streaming Is Best?

October 28th, 2009 | Posted by Sid Herron in Citrix | Microsoft | VMware

For quite some time now, Citrix has had the ability to stream applications on demand, either to XenApp servers, or to desktop/laptop PCs. If you own current versions of XenApp, you can use it. Microsoft also has an application streaming product called App-V, which it evolved from its acquisition of Softricity a few years back. They recently announced that they were going to discontinue the App-V for Terminal Services licenses, and just bundle the rights into what is now (in Windows 2008 R2) called the Remote Desktop Services (“RDS”) CAL. So if you own Server 2008 TS CALs or 2008 R2 RDS CALs, you’ve got the rights to use App-V to stream apps to your Remote Desktop Servers a.k.a. Terminal Servers.

Not wanting to be left out of the application streaming game, VMware went shopping a while back, and bought ThinApp. They maintain that ThinApp is better – or at least safer – because it runs exclusively in user mode, whereas both App-V and Citrix App Streaming require the explicit installation of an agent that contains kernel components.

So what’s the real story? Which application streaming technology should you use? Which is really best? As is so often the case with IT, the answer is a resounding, “It depends.” It’s sometimes frustrating, but the fact is that we work in an industry where there is often no single “right way” to do something. But today I ran across a blog entry over in the Citrix Community Blog area that did such a great job of delving into the differences that I thought it was worth linking to here.

Check it out and let us know what you think.

NOTE: This was originally posted in October, 2009, and may not be a problem any more with current versions of XenServer, as some of the more recent comments would tend to verify – but we will keep the post active for historical purposes. (added by Moose Logic administrator, March 16, 2012)

The Level 1 HA (High Availability) feature that comes with Citrix Essentials for XenServer may be one of the best ways to crash your whole virtual infrastructure if you don’t understand how it works and don’t design in an appropriate level of redundancy. This of course will lead to hours of down time, unhappy management, possible data loss, and lots of extra work for you (most likely on a weekend).

The basics -
HA is designed to monitor the XenServer virtualization environment. When HA is enabled, the administrator can specify which virtual machines (VMs) need to be automatically restarted if the host server they’re running on should fail. If there is a failure of a host server, HA should then automatically restart its designated guest VMs on another host in the XenServer “resource pool.” Note that the HA function does not “live migrate” the guest VMs, because when a host fails the VMs on that host also fail. Rather, it selects another host server and restarts the VMs on that host. For all of this to happen correctly, Citrix’s HA requires two things to be true at all times:

  1. Each XenServer must be able to communicate with its peers in the pool.
  2. Each XenServer in the pool requires access at all times to the HA heartbeat disk, which is shared by all the XenServers in the pool.

If either of these two items is not true for any given XenServer in the pool, that server will “fence.” The short definition of “fencing” is that the XenServer suspects – although it’s not absolutely sure – that it is experiencing some kind of failure, so to protect against possible data corruption it shuts itself down – essentially sacrificing itself to protect the data – until a human comes along and sorts things out. If the fenced server is in a correctly configured HA pool, guest VMs that were configured for HA restart will be restarted on a surviving XenServer.
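Reduced to its essentials, the fencing decision described above is a check of two conditions. The one-function sketch below models only what this post describes. It is not XenServer's actual HA algorithm, and the function name is hypothetical.

```python
def should_fence(can_reach_peers: bool, can_reach_heartbeat_disk: bool) -> bool:
    """A host fences unless BOTH HA requirements hold (simplified model)."""
    return not (can_reach_peers and can_reach_heartbeat_disk)


# Healthy host: sees its peers and the heartbeat disk.
print(should_fence(can_reach_peers=True, can_reach_heartbeat_disk=True))
# Storage path lost: fences even though the peers are still reachable.
print(should_fence(can_reach_peers=True, can_reach_heartbeat_disk=False))
# Isolated from the pool: fences even though storage is fine.
print(should_fence(can_reach_peers=False, can_reach_heartbeat_disk=True))
```

The point to notice is that either failure alone is enough, which is why both the management network and the storage path need redundancy.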

Considerations -
So… you have two XenServers all set up and all your VMs configured just the way you like them, and you decide to turn on HA. Everything appears to be working until one of the hosts suffers a failure and goes off line. (Murphy’s Law says this will happen on a Saturday evening right before your BBQ party is starting.) With HA enabled, you would expect, based on the whole “High Availability” concept, that everything would be OK. Critical VMs should get restarted on the other host and you should be able to deal with the failed host on Monday.

Oh, but wait, remember HA rule #1? The XenServer host that is still running suddenly does not have any peers to talk to. It no longer knows whether or not it’s healthy so, in the interest of protecting your data from corruption, it does what it’s designed to do – it fences, and now both of your XenServers are down. They may try to reboot, but you are now in an endless loop of fencing, and to get it resolved, you’re going to have to know how to use the “xe host-emergency-ha-disable force=true” command to resolve your problems. (And if you don’t understand that last sentence, you’re in for a long weekend.)

This results in a situation that we in IT refer to as “not good,” with a chance of “career altering,” and you’re going to miss your BBQ party.

Here’s another scenario that will spoil your party: What if both XenServers are actually healthy, and all the virtual servers are up and functioning, but the network link for the management communications between the XenServers fails? Again, each XenServer would think it was stranded from the pool and fence itself in an attempt to correct the issue. With both servers fencing, this would again create an endless loop of server fencing. In essence, one server would start to come back online and would still not see the other XenServer and would fence again, and so on, and so on.

So for those reasons a two-XenServer pool cannot successfully run HA! Just don’t do it – even though you can configure HA on a two-server pool the result can be disastrous and ruin your weekend…not to mention your next performance review.

HA In a Two-Server Pool - Just Don't Do It!


Well, what about HA in a three-node XenServer pool? Based upon the previously described scenarios, you now have a valid “pool,” in which HA will function. So you configure and enable HA, and when you test the HA functionality by killing one of the XenServers, everything works like it is supposed to. The guest VMs are restarted on the surviving XenServer hosts and you’re happy that everything is working correctly.

But here is another “gotcha!” If you have only one Ethernet interface per XenServer assigned to management, and they’re all plugged into one switch, what happens if the management link fails because a NIC fails – or even worse, the switch fails? If it’s just a NIC in one server, then that XenServer will fence – not too bad but still not what you want. If you were using a different set of NICs (as you always should) for the guest VMs to communicate with the rest of the world, then the guests on that server were probably up and working just fine until the server fenced. Sure, the critical ones will restart on the remaining servers, but you’ve lost a third of the resources in your pool unnecessarily.

Now let’s consider what would happen if the switch should fail and you had only single management ports on each XenServer all plugged into just that one switch. If this happens, it may be time to dust off the old resume, because you have just lost your entire XenServer pool. Why? Because when the switch went down, all the XenServers lost communication with one another, and each assumed that, because it was suddenly isolated from the pool, it must be experiencing some kind of failure. Therefore the whole pool fenced.
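To make the failure modes above concrete, here is a minimal Python sketch of the isolation rule as described in this post: a host that can no longer reach any peer over the management network assumes it is the one at fault and fences. This is a deliberately simplified model, not XenServer's actual HA algorithm. The host and switch names are made up, and the sketch assumes two hosts can only talk if they share a working switch.

```python
from typing import Dict, List, Set


def fenced_hosts(mgmt_links: Dict[str, List[str]], failed: Set[str]) -> List[str]:
    """Return the hosts that end up isolated on the management network.

    mgmt_links maps a host name to the switches its management NICs use.
    failed is the set of switches that are currently down.
    Simplified rule from the post: a host with no reachable peer fences.
    """
    # Switches each host can still use after the failure
    live = {h: {s for s in sw if s not in failed} for h, sw in mgmt_links.items()}
    fenced = []
    for host, switches in live.items():
        # A peer is reachable if both hosts share at least one live switch
        peers = [p for p, psw in live.items() if p != host and switches & psw]
        if not peers:
            fenced.append(host)
    return sorted(fenced)


# Hypothetical three-host pool, one management NIC each, all on one switch
single_switch = {"xs1": ["sw1"], "xs2": ["sw1"], "xs3": ["sw1"]}

print(fenced_hosts(single_switch, failed=set()))    # []
print(fenced_hosts(single_switch, failed={"sw1"}))  # ['xs1', 'xs2', 'xs3']

# A dead NIC on xs1 is modeled as xs1 having no management links at all
print(fenced_hosts({"xs1": [], "xs2": ["sw1"], "xs3": ["sw1"]}, failed=set()))  # ['xs1']
```

The same function reproduces the two-server problem: with only two hosts, any break in the management link leaves each one without a peer, so both fence.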

Non-Redundant Management Links - Don't Do This Either!


Conclusions
Citrix’s HA does not work in a two-host pool, period. With a pool of three or more XenServers you’ll be OK if you design the infrastructure correctly so that there is no single point of failure in your peer communications. How? Simply by bonding together two NICs, dedicating them to the management communication function, and then splitting the bonded pairs between two separate Ethernet switches. That way you’re protected against both a NIC failure and a switch failure.
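One way to sanity-check a design like this on paper is to fail each NIC and each switch in turn and see whether any host would be cut off from its peers. The Python sketch below does exactly that. It is an illustration only: the host, NIC, and switch names are hypothetical, and it assumes that two hosts can reach each other only through a switch they both still have a working link to.

```python
from typing import Dict, List, Set, Tuple

Bond = List[Tuple[str, str]]  # (nic_name, switch_name) pairs in a host's management bond


def isolated_after_failure(bonds: Dict[str, Bond], failed: Set[str]) -> List[str]:
    """Hosts left with no reachable peer once the failed components are removed."""
    live = {
        host: {sw for nic, sw in links if nic not in failed and sw not in failed}
        for host, links in bonds.items()
    }
    return sorted(
        host for host, sws in live.items()
        if not any(sws & other for peer, other in live.items() if peer != host)
    )


def single_points_of_failure(bonds: Dict[str, Bond]) -> Dict[str, List[str]]:
    """Map each component whose loss isolates a host to the hosts it isolates."""
    components = sorted({c for links in bonds.values() for pair in links for c in pair})
    result = {}
    for component in components:
        isolated = isolated_after_failure(bonds, {component})
        if isolated:
            result[component] = isolated
    return result


hosts = ["xs1", "xs2", "xs3"]

# One management NIC per host, all on the same switch
unbonded = {h: [(f"{h}-eth0", "swA")] for h in hosts}

# Two bonded NICs per host, split across two switches
bonded = {h: [(f"{h}-eth0", "swA"), (f"{h}-eth1", "swB")] for h in hosts}

print(single_points_of_failure(unbonded))  # every NIC and swA show up
print(single_points_of_failure(bonded))    # {}
```

In the unbonded layout every NIC is a single point of failure for its own host, and the switch is one for the whole pool. In the bonded layout no single component failure isolates anyone, which is the whole point of the design.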

But you’re not out of the woods yet! Don’t forget HA rule #2 – servers need to see the HA heartbeat disk. This is equally important, and you must consider the topology of that side of the network (iSCSI, Fibre Channel, etc.) and be sure it is also redundant. And if you’re using iSCSI multi-pathing (e.g., with a pair of mirrored DataCore iSCSI SAN nodes), be sure to manually bump up the HA timeout interval so that if one of the SAN nodes should fail, the multi-pathing function has time to fail over to the other node before the XenServers all conclude that the HA heartbeat disk is gone. Otherwise, again, they will all fence. Our testing indicates that a two-minute timeout has an adequate margin of safety. The default setting of 30 seconds is definitely too short. Unfortunately, this setting does not appear to be persistent, so if you turn HA off and then back on, you’ll need to manually reset the timeout interval again. (This is probably a job for Workflow Studio, but we just haven’t had time to work through the process yet.)
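Because the timeout has to be reapplied every time HA is re-enabled, it helps to keep the numbers and the command in one place. The Python sketch below compares a set of measured storage failover times against a candidate timeout and builds the command line to re-enable HA with that timeout. The failover times are hypothetical placeholders, not results from our testing, and the `xe pool-ha-enable ... ha-config:timeout=` syntax is my understanding of the Citrix documentation, so verify it against your XenServer version before relying on it.

```python
from typing import Iterable


def timeout_margin(failover_times_s: Iterable[float], timeout_s: float) -> float:
    """Seconds of headroom between the slowest observed failover and the HA timeout.

    A negative result means at least one failover took longer than the
    timeout, i.e. the hosts would have fenced.
    """
    return timeout_s - max(failover_times_s)


def ha_enable_command(heartbeat_sr_uuid: str, timeout_s: int) -> str:
    """Build the xe command that enables HA with an explicit timeout.

    Assumed syntax; check it against the documentation for your release.
    """
    return (
        f"xe pool-ha-enable heartbeat-sr-uuids={heartbeat_sr_uuid} "
        f"ha-config:timeout={timeout_s}"
    )


# Hypothetical failover times (seconds) from hard power-cycle tests
measured = [41.0, 38.5, 47.2]

for candidate in (30, 120):
    margin = timeout_margin(measured, candidate)
    verdict = "OK" if margin > 0 else "hosts would fence"
    print(f"timeout={candidate}s margin={margin:.1f}s -> {verdict}")

# <heartbeat-sr-uuid> is a placeholder for your own heartbeat SR's UUID
print(ha_enable_command("<heartbeat-sr-uuid>", 120))
```

With these made-up numbers the 30-second default comes out negative, while two minutes leaves plenty of headroom. Substitute the failover times you actually observe in your own hard power-cycle tests.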

NO Single Points of Failure
HA will do a fine job of protecting you, if you build the network correctly. So make sure you’ve built in enough redundancy that you have no single point of failure, and enjoy your BBQ.

The Right Way to Build an HA Environment


P.S.: If you can’t justify more than two XenServers, but you still have one or more critical guests that need to be highly available, there is a solution: Marathon Technologies’ everRun VM. But that’s another post for another day.

Have you been considering moving from VMware ESX or vSphere to either Citrix® XenServer™ or Microsoft® Windows® Server 2008 Hyper-V™ – but been concerned about exactly how to go about it? Knowing what tools to use to make the migration go smoothly is often a major concern. Also, what kind of support can you get during the transition? And structured training on a new platform is not inexpensive, either. Now Citrix is trying to eliminate these obstacles with a new promotion that runs through March 31, 2010.

On October 14, 2009, Citrix announced a new program called Project “Open Door”. Customers who switch existing VMware servers to XenServer or Hyper-V, and add Citrix Essentials™ for advanced virtualization management, will receive additional technical support, training, and conversion tools from Citrix at no additional cost.

The Project Open Door promotion will be effective worldwide from October 1, 2009 through March 31, 2010. Customers who decommission five or more VMware vSphere 4 or VI3 servers and replace them with XenServer or Hyper-V plus the Citrix Essentials solution receive the following:

  • A free five incident support pack (5 by 8 hours) for every five servers converted
  • A voucher for six hours of online training for every five servers converted
  • Free migration tools for seamlessly transferring virtual machines from VMware to XenServer or Hyper-V

Check out http://www.citrix.com/opendoor for more information on the program. If you’re seriously considering making the switch, this just might be the time to do it.

Citrix Changes the Game Again

October 6th, 2009 | Posted by Sid Herron in Citrix | VDI | XenDesktop - (1 Comments)

Disclaimer: Moose Logic is a Citrix Solution Advisor, and the author has worked with Citrix products for well over a decade – which is about how long there have been Citrix products to work with. As a fan of the company and the technology, it’s sometimes difficult to be objective…but I’ll try.

Citrix has shown in the past that it is not afraid to make bold moves to shake up the market landscape. The most recent was the decision to make XenServer, the “type 1” hypervisor obtained through the acquisition of XenSource, free. With today’s announcement of XenDesktop 4, they’ve made another bold move – arguably the boldest and the most far-reaching retooling of their product line ever.

You can read the press release at the Citrix Web site, and also get all of the details of the new offerings there, as well as from the volumes that will be written in the blogosphere and trade press over the next few days. But the basics are as follows:

  • XenDesktop, in all but its most basic version, will include XenApp. With a single XenDesktop license, you will be able to:
    • Deploy a shared virtual desktop from a XenApp-equipped Terminal Server, or deliver published applications running on a XenApp-equipped Terminal Server.
    • Connect to a virtual instance of a PC Operating System running on your choice of virtualization platforms (XenServer, Hyper-V, or VMware) – the classic definition of “VDI.”
    • Connect to a blade PC, if your computing or graphics needs are so demanding that you need dedicated hardware.
    • Stream a PC Operating System in real time to a desktop PC across the LAN – allowing you to boot and run your PCs from a common master image.
    • Stream applications to XenApp servers, PCs (whether virtual or physical), or both, and, if necessary, cache them for off-line use.
    • (Coming very soon) stream a PC Operating System to a client-side hypervisor, where it can be cached for off-line use.
  • XenDesktop will be moving to a per-user license model – a major shift, since Citrix licensing has almost exclusively been based on concurrent use as long as anyone can remember. Sales of concurrent-use licenses for XenDesktop will be discontinued on November 16, when sales of XenDesktop 4 licenses begin.
  • XenApp Enterprise and Platinum users with current Subscription Advantage will be offered a screaming “trade-up” deal that runs through June 30, 2010.
  • Strategically speaking, XenApp is clearly taking the back seat compared to XenDesktop. It will continue to be sold in all existing editions, but is being repositioned as the best solution for customers with high user concurrency (greater than 2:1), or those who use it as a “point solution” (e.g., remote access over limited bandwidth connections, call center applications, etc.). This also is a huge shift, when you consider that XenApp is the product that made Citrix.

So…what’s behind these moves? Citrix clearly believes that the battle for control of desktop delivery is where the future of the company lies. WinFrame/MetaFrame/Presentation Server/XenApp has been the de facto standard for remote access and server-based computing for well over a decade. But if all you care about is deploying Terminal Services (a.k.a. Remote Desktop Services in Windows Server 2008 R2), the value proposition for adding XenApp to your Terminal Servers has been steadily declining – and with the new features of Windows Server 2008 R2, it declines even further. This is why Citrix has worked so hard to reposition the conversation as one about application delivery as opposed to remote access or server-based computing, and why they have continued to roll more features into XenApp – particularly the Platinum Edition, which is really a suite of products more than an edition of one product.

Now they are working to reposition the conversation yet again. Nearly everyone agrees that there will be a huge uptake of Windows 7 over the next couple of years. And as Brian Madden pointed out in a techtarget.com article recently: “…there’s no sense virtualizing your desktops just to end up with XP again. And when Windows 7 launches, there’s no sense migrating to it while still managing your desktops the ‘old’ way.” Clearly, the Windows 7 rollout is a perfect opportunity for organizations to rethink the way they deploy and manage desktops.

The message from Citrix is clear: Desktop virtualization does not equal VDI. VDI, as it is classically defined, is only one way to deliver a virtualized desktop. There are many other ways – which we listed at the top of this article – and all of them have perfectly valid use cases. Since Citrix has solutions that cover all of those ways, it makes sense to offer a single license that will allow customers to “mix and match” and choose the best virtualization solution for each use case.

As the old saying goes, “Nothing succeeds like success.” If this works out the way Citrix obviously hopes it will, it will, by definition, be viewed as one of the most brilliant marketing moves since the deal with Microsoft that led to MetaFrame. At the very least, I think it must be recognized as a pretty gutsy move. And it’s certainly going to be fun to watch.

Copyright © 2010 All rights reserved.