We were recently asked, by someone who was planning a XenDesktop 4 Proof of Concept, what minimum components were required to conduct the POC. Rather than prepare a document just for them, it seemed like a good idea to put the information here so others can read and contribute.

In its most basic configuration, XenDesktop is, functionally, going to look like this:

XenDesktop Functional Diagram


I lifted this drawing from a three-year-old Citrix PowerPoint presentation, and while XenDesktop has evolved considerably since then, the functional building blocks are still much the same:

  • You’re going to have a Desktop Delivery Controller (“DDC”). This is the Windows server that brokers the connection between the client device and the virtual OS. As you move into production and scale up the environment, you will probably have multiple DDCs.
  • You’re going to have a Citrix License Server. In a small deployment, like a POC, this service can also reside on the DDC.
  • You’re going to need a place for Citrix to store configuration data. In a production deployment, you’ll probably want the Data Store on a SQL Server. For the POC, it can also reside on the DDC.
  • You’re going to need a “Web Interface” server. One way or another, the client devices are going to communicate with the WI server, which will consume the user’s authentication credentials and (in most cases) present the user with the desktop choices that are available to him/her. I say “in most cases,” because it is possible to configure a client such that it will immediately connect to a designated virtual desktop without requiring the user to click on an icon.

    Once again, in a small deployment like a POC, the Web Interface services can run on the same Windows Server as the DDC, the Licensing Services, and the Data Store. So far, we haven’t moved beyond just a single Windows server – although, of course, as the environment expands and moves into production, these Web services should also be migrated to their own server.

  • All of this needs to live in a Windows Active Directory Domain, so if you’re building a POC that is isolated from your production environment, you’re going to need to provide a Domain Controller. That poor little DDC system already has enough running on it, so let’s make the Domain Controller a separate server.
  • You’re going to need some kind of virtualization infrastructure. XenDesktop is platform-agnostic at this level – it will run on XenServer, Hyper-V, or VMware. All of the other servers/services we’ve been talking about so far can be virtual servers running on this infrastructure. In a small POC, that’s the obvious way to go anyway.

Now things start to get a little tricky. That gray box that surrounds the repositories labeled “Profiles,” “Apps,” and “OS” can be broken down in a couple of ways.

Let’s assume that we are going to stream an OS, from a single, shared, read-only image, to virtual PCs that will be instantiated (I love that word – it just rolls off the tongue, and it sounds so technical) on-demand on whatever virtualization platform we’ve chosen. That means we need a Provisioning Server, and a place to store those read-only images. For a POC, the images can be stored on the Provisioning Server itself. When we move into production, since we don’t want the Provisioning Server to be a single point of failure in our VDI infrastructure, we’re going to want more than one Provisioning Server, which means that the OS images are going to need to reside on shared storage of some kind that can be accessed by all of our Provisioning Servers.

Elisabeth Teixeira of Citrix has a great 4-part series on High Availability for Provisioning Services over on the Citrix Community Blog site. Rather than go into detail here, I’d strongly recommend reading through her posts.

For our POC, the Provisioning Server can be virtualized. When we move into production, it’s probably best, for a variety of reasons that we won’t go into here, that the Provisioning Servers be physical servers.

Our virtual PCs are going to need apps as well. (After all, the entire purpose of a PC is to run apps, right?) If you wish, you can “bake” the applications into the read-only “golden” image that we’re going to use for provisioning, by first installing them on the PC that we’re going to use to create the image. Of course, that means that whenever you make a change to an app, you have to change the whole image, and we know what a pain that is, because many of us have been managing images for physical PCs that way for years. So we’re going to be better off if we stream the applications on-demand onto the virtual PCs after they’re booted up and users have attached to them. We will therefore need at least one XenApp server to manage the application streaming.

Finally, we’re going to need a file server to serve as a repository for user profiles and user data. The streamed OS images are, after all, read-only, so we’re going to need to use AD Group Policies to specify where that data is stored, since it can’t be stored in a profile that’s part of the streamed image.

One more thing comes into play, depending on what Windows OS you’re going to use for your virtual PCs. As we’ve noted in other posts, the process of converting a Vista or Windows 7 PC into a shared golden image will break the license key. You must therefore have a KMS Server available to auto-activate the PCs as they boot up. For best results, the KMS service should be running on a Windows 2008 R2 server. For more information on KMS and how it works, please see our earlier blog post on KMS.

That’s really all you need to do a POC, provided that all your clients will be connecting from within the protected network. If you want to grant access to clients connecting in from the public Internet, you’re going to need a secure way to do that. The simplest way is to use the software Citrix Secure Gateway that comes with XenApp. The CSG is basically an application-specific software SSL/VPN – running on a Windows Web server – that provides a secure proxy between the public Internet and the Web Interface server. For more demanding environments, you should consider the line of Citrix Access Gateway appliances, which can function as general-purpose SSL/VPN appliances as well as providing access to the XenDesktop infrastructure. They can also provide advanced features like redundancy and automatic failover and, with the NetScaler software load, even Global Server Load Balancing for automatic failover between a primary site and a DR site.

If you have clients in branch offices connecting to your XenDesktop infrastructure across a Wide Area Network, you may see some benefits from deploying the Citrix Branch Repeater line of WAN optimization appliances. It’s likely that as we move through the year and see the release of new technology like XenClient, we will see an expanded role for the Branch Repeater with Windows Server and its ability to cache data locally at the branch office level – but that’s another post for another day.

So there you have it. To summarize, our minimum POC environment will consist of the following servers/services running on our virtualization infrastructure:

  • Domain Controller
  • A Windows Server hosting the following services (which can be broken out onto separate servers as the environment scales):
    • Desktop Delivery Controller
    • License Server
    • Data Store
    • Web Interface
  • Provisioning Server
  • XenApp Server (for application streaming)
  • File Server (optional – in a pinch you could make file shares available on one of the other servers)
  • KMS Server (if you want to provision Vista or Win7 PCs)
  • Secure Gateway Server or Access Gateway Appliance (if you want to provide secure access from the public Internet…note that this server or appliance should be in a DMZ for best security)

Citrix has announced that, effective immediately, the XenDesktop 4 trade-up offer has been extended to customers who have XenApp Advanced Edition. This is great news for those customers, because, under the terms of the original trade-up offer, XenApp Advanced customers would have had to first upgrade their XenApp licenses to XenApp Enterprise, and then do the trade-up.

The table below shows the pricing grid for the trade-up program, depending on which version of XenApp you currently own, which version of XenDesktop you want to trade up to, whether you’re trading up all of your XenApp licenses, and whether or not your Subscription Advantage is current:

XenDesktop 4 Trade-Up Pricing

Because the part numbers for the trade-up from XenApp Advanced have not yet been released, customers who want to take advantage of it will need to request a special quote. Two other points to remember:

  • If you trade up 100% of your XenApp licenses, you get two XenDesktop licenses per XenApp license. Otherwise it’s one-for-one.
  • The trade-up offer runs through June 30, 2010. And as much as I hate to say this, that date will be here before you know it, so please don’t wait until the last minute!
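To make the two-for-one rule concrete, here is a minimal Python sketch. The function name and its parameters are made up for illustration, and it models only the license ratio described above, not the pricing grid.

```python
def xendesktop_licenses(xenapp_owned: int, xenapp_traded: int) -> int:
    """Return the number of XenDesktop licenses received for a trade-up.

    Hypothetical helper: it encodes only the ratio rule from this post.
    """
    if not 0 <= xenapp_traded <= xenapp_owned:
        raise ValueError("traded licenses must be between 0 and the number owned")
    # Trading up 100% of the XenApp licenses earns two XenDesktop licenses each;
    # a partial trade-up is one-for-one.
    full_trade_up = xenapp_owned > 0 and xenapp_traded == xenapp_owned
    ratio = 2 if full_trade_up else 1
    return xenapp_traded * ratio
```

For example, a customer with 100 XenApp licenses would get 200 XenDesktop licenses by trading up all of them, but only 60 by trading up 60.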

The on-line trade-up calculator has been updated to include information for XenApp Advanced.

To continue the discussion of “What is Virtualization?” that I started back on December 4, I bring you the next installment – Application Virtualization.

Application Virtualization is the isolation and separation of an application from its underlying Operating System (OS) as well as from other applications. The application is fooled into believing that it is running normally, interacting with the OS and using its resources as if it had been installed directly on the OS.

Additionally, the application can be installed once within the datacenter and preserved as a “golden image” to be delivered out to the end users. This gives you one instance to manage, one instance to patch, one instance to maintain – all housed in one location. This will help cut IT application maintenance costs as well as help control licensing costs as it will be easier to track application utilization.

Since each virtualized application is isolated from other applications, it becomes possible to deploy, on the same piece of hardware, applications that typically didn’t play nicely together in the past. This cuts down on the time needed to test application compatibility since each application resides inside its own “bubble” (much like teenagers).

Traditionally, both desktop admins and admins who were in charge of Terminal Servers (and XenApp servers) spent hours and hours on application compatibility testing. When a new application was added to the official desktop or server image, or an existing application was upgraded, regression testing was necessary to ensure that the new or upgraded application didn’t break some other application by, for example, overwriting a shared DLL file. By providing a method for virtualizing Registry entries and calls to particular folder locations, application isolation overcomes most of these headaches.

The real trick with application virtualization is the delivery method, since delivery is what separates the different vendor solutions in this field. The big three application virtualization solutions are Citrix XenApp, VMware ThinApp, and Microsoft Application Virtualization (a.k.a. “App-V”). These three vendors use either one delivery method or a combination of methods to get the applications to the end users.

Application Streaming: This refers to streaming the application over the network to the client PC on demand. The “secret sauce” here is in figuring out how to stream down just enough of the code to launch the application and allow the user to begin interacting with it. The rest of the code can be streamed down when the user attempts to use a feature that requires it, or it can be simply streamed down in the background until all of the application code is cached locally. An added benefit of streaming all of the code down is that it allows the application to continue to be used when the PC is not connected to the network. (E.g., you can unplug your laptop and take it on the road.)
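The streaming idea is easier to see in a toy model. The Python sketch below is purely illustrative: the class, the block names, and the in-memory "server" are all made up, and it does not represent how Citrix, VMware, or Microsoft actually implement streaming. It shows only the three behaviors described above: fetch just enough to launch, fetch a feature on first use, and optionally pull the rest down in the background so the app can run offline.

```python
class StreamedApp:
    """Toy model of on-demand application streaming (hypothetical, not a vendor API)."""

    def __init__(self, server_blocks, launch_blocks):
        self.server_blocks = dict(server_blocks)  # block name -> code, held on the server
        self.launch_blocks = list(launch_blocks)  # minimum set needed to start the app
        self.cache = {}                           # blocks already streamed to the client

    def _fetch(self, name):
        # Stream a block from the server only if it is not cached locally yet.
        if name not in self.cache:
            self.cache[name] = self.server_blocks[name]
        return self.cache[name]

    def launch(self):
        # Stream down just enough code to let the user start working.
        for name in self.launch_blocks:
            self._fetch(name)

    def use_feature(self, name):
        # A feature's code is streamed the first time the user needs it.
        return self._fetch(name)

    def prefetch_all(self):
        # Background streaming: pull down everything that is still missing.
        for name in self.server_blocks:
            self._fetch(name)

    def fully_cached(self):
        # True once every block is local, i.e. the app could run disconnected.
        return len(self.cache) == len(self.server_blocks)
```

In this model, an app with blocks "core", "print", and "charts" and a launch set of just "core" would start with one block cached, pick up "print" when the user first prints, and report fully_cached() only after prefetch_all() has run.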

The application streaming technology you use will determine the control and security of the application once it has been streamed to the end user device. For example, Citrix allows you to administratively set a “time to live” limit on how long apps will run in a disconnected state. If the PC isn’t reconnected to the network within that time limit, the app simply stops working – giving you some level of protection if a PC is lost or stolen. For another example, ThinApp allows you to make an application completely portable – you could carry the Office Suite with you on a USB stick, plug it into any PC, use it, and leave no trace behind when you unplug the USB stick. (Note: Doing this with the Office Suite could result in a violation of the Office EULA!)

Another “secret sauce” ingredient is the ability to allow limited communication between applications, even though they’re running in their own isolation environments (the “bubble” referred to earlier). For example, your accounting application may need to call Excel to render the output of a particular report. Early versions of application isolation required these applications to be “packaged” together, i.e., installed into the same isolation environment – otherwise, the accounting app wouldn’t know that Excel was available, and you’d get an application error. The latest implementations allow enough inter-isolation communication to take place to avoid problems like this while still avoiding application compatibility conflicts.

Application Hosting: This method can take a couple of different forms. The first is to virtualize the presentation of a typical Windows application by installing the application on a Terminal Server (in most cases, a Terminal Server with Citrix XenApp installed on it), and connecting to that Terminal Server using some kind of remote communications protocol (e.g., Microsoft’s RDP, Citrix’s ICA, etc.). We’ve been doing this for years, and thousands of customers and millions of users access applications this way every day.

Most readers of this blog are probably familiar with the advantages of this deployment model: centralized deployment and management, tighter security, granular control over what can be saved and/or printed at the client location, etc.

Application Streaming can work with this kind of Application Hosting by allowing you to stream applications to your Terminal Servers rather than having to explicitly install them or build them into your official server image. Citrix XenApp customers have the rights to use the Citrix streaming technology to do this, and Microsoft recently announced that the new Server 2008 R2 Remote Desktop Services CAL (formerly called a Terminal Services CAL) will include the rights to use App-V to stream applications to Terminal Servers.

Web-based applications can also be legitimately called “hosted applications” – whether they’re hosted in your own corporate data center, or by some kind of application service provider (e.g., Salesforce.com). In this scenario, all that’s required on the client PC is a browser – at least in theory.

In fact, the browser then becomes an application that must be managed! For example, you may find that you require a specific version of Java to access a particular hosted Web application – and if the user has local admin rights to the PC, the possibility exists that s/he will inadvertently install something that breaks its compatibility with your critical Web application. Some Microsoft applications require the use of Internet Explorer (e.g., Microsoft CRM is not compatible with Firefox). Some applications may even require a specific browser version. (When IE7 was first released, it caused compatibility issues for users of Microsoft CRM v3.0.)

Also, as a general rule, a Web application will require a more powerful client PC as well as more bandwidth between the client and the Web server to yield a good user experience, compared to an RDP or ICA client device connecting to a Terminal Server.

There is, of course, the option of installing an application directly on a device either by physically visiting the machine with installation media in hand or by using some kind of central management system to push the bits onto the client’s hard drive. These options, however, do not fall under the definition of application virtualization that we’re using here.

The important thing to take away from application virtualization is that no matter how you approach it, it will save you money:

  • Hardware – being able to host multiple applications on a single piece of hardware without worrying about application incompatibility. This can virtually eliminate the “silos” of servers with different configurations in large XenApp environments that used to be necessary to isolate those problem apps that wouldn’t play nicely with any others.
  • Licensing costs – with all your applications being housed in the data center, you will have a better understanding of how many instances of each application you are using and will be able to better track your licensing needs.
  • Maintenance – being able to update or patch a single instance of the application rather than needing to physically update and patch each machine.
  • Management – less hardware to look after, less time spent helping end users with application issues, and less time spent in application regression testing.

Hope this clears up that “what is application virtualization” question. However, if you have more questions, feel free to use the comments or contact me directly.

Effective today (12/7/09), qualifying institutions can take advantage of Citrix’s new campus-wide licensing for XenDesktop 4. This is an annual license (meaning that you pay this every year) that is based on the concept of “Full Time Equivalents” (FTEs). For example, an FTE student is defined as either:

  • One student attending the educational institution on a full-time basis, or
  • Three students attending the educational institution on a part-time basis.

The suggested pricing is as follows:

  • XenDesktop Platinum – $29/year/FTE
  • XenDesktop Enterprise – $19/year/FTE
  • XenDesktop VDI – $9/year/FTE

There are several other things you need to know if you want to take advantage of the campus-wide pricing model:

  • For K-12 educational institutions, a “campus” may be defined as a single school, or as an entire school district. Either way, all FTE students must be licensed – either all FTE students attending that single school, or all FTE students in all schools within the district.
  • For higher educational institutions, a “campus” may be defined as “a school or department, an individual location, or an entire multi-campus university.” For example, it could be the entire University of YourState, the University of YourState SpecificCity Campus, or just the University of YourState School of Engineering. Again, whichever definition you choose, you must license all FTE students that fall within that definition.
  • You are not required to license faculty and staff, but if you choose to do so, you must license 100% of them, “using the same FTE calculation as your Microsoft Campus or School Agreement.”
  • You must hold an active Microsoft Campus or School Agreement. The Citrix definition of “FTE” is deliberately designed to align with the definition Microsoft uses in these agreements.
  • To qualify for a campus-wide agreement, you must be:
    • “A school organized and operated exclusively for educational purposes, such as a correspondence school, junior college, college, university, scientific or technical institution, which is accredited by associations recognized by either the Department of Education and/or the local Education Authority, and that teaches students as its primary focus.” – or -
    • “The district, regional, or state administrative office of an entity described above, if the office is organized and operated exclusively for educational purposes.” – or -
    • “A hospital, healthcare organization, medical testing laboratory, non-profit museum or public library which is wholly owned by an entity described above. By way of example, the hospital or library of a university meeting the requirements would be part of the customer for purposes of this Agreement.” – or -
    • “Any administrative office or Board of Directors that controls, administers, or is controlled by or administered by an entity described above may also participate.”
  • There is a minimum purchase requirement of 1,000 licenses. You don’t necessarily have to have 1,000 students; you just have to buy 1,000 licenses.
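As a rough way to estimate what a campus-wide agreement would cost, here is a small Python sketch that combines the FTE definition, the suggested prices, and the 1,000-license minimum from this post. The function names are hypothetical, and rounding a leftover group of part-time students up to a whole FTE is my assumption; the announcement doesn’t say how fractions are handled.

```python
import math

# Suggested annual price per FTE, in US dollars, as listed in this post.
PRICE_PER_FTE = {"Platinum": 29, "Enterprise": 19, "VDI": 9}
MIN_LICENSES = 1000  # minimum purchase requirement


def fte_count(full_time: int, part_time: int) -> int:
    """One full-time student, or three part-time students, equals one FTE."""
    # Assumption: a partial group of part-time students rounds up to one FTE.
    return full_time + math.ceil(part_time / 3)


def annual_cost(full_time: int, part_time: int, edition: str) -> int:
    """Estimated yearly cost at suggested pricing, honoring the minimum purchase."""
    licenses = max(fte_count(full_time, part_time), MIN_LICENSES)
    return licenses * PRICE_PER_FTE[edition]
```

A school with 800 full-time and 300 part-time students works out to 900 FTEs, so it would still have to buy the 1,000-license minimum: $9,000 per year for the VDI edition at suggested pricing.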

You can find more information in this Citrix Community blog post by Sumit Dhawan.

Virtualization can mean different things depending on who you ask, so we are going to take a broad look at what virtualization is, the different forms it comes in, and why it is so popular.

This is going to be pretty basic stuff, so if you are looking for more advanced material, I promise we will cover it in future posts.

Virtualization has been getting a lot of buzz the last few years as it moved from being “bleeding edge” technology to becoming an industry standard. You may have even heard that there are lots of benefits to virtualizing your datacenter…but you may not be sure whether it’s for you, how it works, or even what it means.

There are several kinds of virtualization, including server virtualization, storage virtualization, application virtualization, network virtualization, and desktop virtualization. But when most folks talk about virtualization, they’re referring to server virtualization, so that’s what we will cover today.

So, what is server virtualization? Simply put, server virtualization is the technology that is designed to allow multiple (virtual) servers to reside on a single piece of (physical) hardware and share the resources of the physical server – while still maintaining separate operating environments, so that a problem that crops up in one virtual server won’t affect the operation of others that may be running on the same physical “host.” To help explain what this means I’m going to use the house and condo analogy.

Let’s say you’re a land developer and you build residential property. You cut your land into smaller plots and build one house per plot. As part of the land development, you need to bring in all the utilities from the main street to each and every plot. All of this development costs money. To make matters worse, you know that your city’s population is growing, you’re running out of land to build on, and you also need to control the spiraling costs of building materials. How do you cut costs and provide more homes for a growing population on a limited amount of land?

Figure 1 - Typical cul-de-sac USA

Perhaps instead of building single-family homes and having one resident per plot you start building condominiums that hold several residents each. Now the utilities that are brought in to the condo complex are shared by all the residents and yet no one ever sees the other residents’ bills. You’re making more efficient use of the land you have and not wasting time and money bringing in utilities to each individual house. Plus one yard is easier to take care of than ten yards.

Figure 2 - 1 & 2bd Condos Available Now!!

So how does this relate to server virtualization?

Each plot of land is a physical server, the structure you build on that plot is a server “workload” (e.g., Exchange, SQL, file server, print server, etc.), and the city is your data center. The utilities are things like power, cooling, and network connectivity. When there is only one workload per physical server, a lot of space and resources get wasted. It’s common to see only 10-15% (if that) processor utilization on physical servers which run only one operating system and one application.

With server virtualization we can now create several “virtual” servers on one physical piece of hardware – think of the hardware as little “server condos” if you like. Just as you can have one-bedroom, two-bedroom, and three-bedroom units in a single building, you can allocate differing amounts of processing and memory resources to the virtual servers depending on the requirements of each individual workload. Each virtual server can now share the physical resources of the host machine with the other virtual servers and never know that they are sharing. In fact, each virtual server “thinks” it’s running on its own dedicated hardware platform. By doing this you can now utilize 80-90% of the processing power of the hardware you own, and cut down on the total amount of power, cooling, and floor space you need in your data center.

For example (just pulling numbers out of the air), let’s say that you’ve been paying an average of $5K each for servers that would handle a single workload. If you need four of them, that’s $20K in hardware cost. But if you can buy one server for $8 – 10K to virtualize these 4 machines, that’s a significant reduction in hardware cost. And with fewer machines to plug in and keep cool, your savings can be up to 40% on power consumption alone. (Did you know that we’ve now reached the point where, over the service life of a typical new server, it’s going to cost you more to keep it cool than it cost you to buy it?)
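If you want to plug in your own numbers, the hardware arithmetic above fits in a few lines of Python. This is just a sketch; the function name is made up, and it looks only at purchase price, not power, cooling, or licensing.

```python
def hardware_savings(workloads: int, cost_per_server: float, host_cost: float):
    """Compare one-server-per-workload with a single virtualization host.

    Returns (dollars saved, fraction of the standalone hardware cost saved).
    """
    standalone_cost = workloads * cost_per_server  # one physical box per workload
    saved = standalone_cost - host_cost            # versus one larger host
    return saved, saved / standalone_cost
```

With the numbers from the example, four $5K servers against one host in the $8K to $10K range comes out to a saving of $10K to $12K, or 50% to 60% of the hardware cost.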

Since the virtual servers are all located on one physical box you now have fewer pieces of hardware to maintain – allowing the IT staff to use their time more efficiently. You’ll save space in your data center. You’ll also cut down on the amount of waste (some of it hazardous) that must be recycled or disposed of when your hardware finally reaches its end-of-life.

You’ve also cut down the time needed to bring a new server on line. In the past you would have had to acquire the hardware, assemble it, rack it, connect it to the network, install and patch the OS, install and configure the application, test it all, and finally put it into service. Now that the servers are virtual they can be created, configured, and put into production in a few hours as opposed to the weeks it used to take. In some cases, by using templates for commonly-needed workloads, it can take only minutes. This makes for a much more flexible and scalable environment.

So server virtualization can:

  • Cut hardware costs
  • Cut energy costs (for both power and cooling)
  • Cut system maintenance time and costs
  • Create a very scalable and flexible data center
  • Save space
  • Create a more environmentally friendly data center (a.k.a. “green computing”)

These are the main reasons that server virtualization has become an industry standard. According to folks like Gartner, we’ve now reached the point where the majority of new servers placed into service are being virtualized, and the majority of enterprises have made it a standard practice to virtualize all new servers unless there is a compelling reason why a server can’t or shouldn’t be virtualized. Virtualization also makes it easier to implement things like high availability, disaster recovery, and business continuity, but that’s a subject for a future post.

Recently I had my first opportunity to create a Windows 2008 R2 virtual machine on Citrix XenServer 5.0. When I attempted to install the operating system, I ran into an interesting issue: the installation would hang right at the initial Windows install screen with CPU usage pegged at 100%. Once that happened, no matter how long I waited, the installation never progressed beyond that point.

Of course what did I do? I turned to Google and quickly found the following article which provided a workaround for the issue I was having.

I followed the advice in the article and, after running the “xe vm-param-set uuid= platform:viridian=false” command as outlined there, was able to install Windows Server 2008 R2.
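If you need to apply the same workaround to several VMs, a small wrapper can help. The Python sketch below simply assembles the command quoted above; the function names are hypothetical, the VM UUID is something you have to supply yourself, and I’m assuming the script runs somewhere the xe CLI is available (for example, on the XenServer host itself).

```python
import subprocess
from typing import List


def viridian_workaround_command(vm_uuid: str) -> List[str]:
    """Build the xe command from the article for one VM (UUID supplied by the caller)."""
    if not vm_uuid:
        raise ValueError("a VM UUID is required")
    return ["xe", "vm-param-set", f"uuid={vm_uuid}", "platform:viridian=false"]


def apply_workaround(vm_uuid: str) -> None:
    """Run the command; raises CalledProcessError if xe reports a failure."""
    subprocess.run(viridian_workaround_command(vm_uuid), check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if you later loop over a set of UUIDs.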

Two more things are worth mentioning here, which are not specifically addressed in the previously referenced article:

  1. With Windows 2008 R2 I was able to install the XenServer 5.0 tools with none of the problems other people are having with Windows 7 installations.
  2. This issue has been resolved in XenServer v5.5!

The TechTarget family of blog sites has a lot of great information. That’s why we have several of their sites linked in our Blogroll (under “Virtualization” in the right sidebar). But one thing that I don’t like about their sites is that – unlike this blog – there is no way to directly comment on their posts. That makes it difficult to respond to posts like the one last week on VMware’s High Availability (VMHA).

In that post, author David Davis opens by stating:

VMware’s High Availability (VMHA) provides high availability to any guest operating system at a potentially much lower cost than other HA options (as you don’t have to pay per virtual machines [VMs] or per server; VMHA is included in the price of vSphere).

I have a couple of problems with this statement.

First, I don’t know what “a potentially much lower cost” means. Is it less expensive than other HA options, or isn’t it? If it is, which other HA options are you comparing it to? If you’re going to throw that line out there, shouldn’t you give us the data on which the statement is based?

Second, it appears that the “lower cost” claim is primarily based on the fact that VMHA is included in the price of vSphere, rather than requiring a separate license. That’s a little like claiming that the high-end German sound system is less expensive if you get it in a Mercedes – because it’s standard equipment – whereas if you want one in your Malibu you have to buy it separately. What matters is the total amount of money I have to spend to get all the functionality I need, isn’t it?

It is true that with, say, Citrix XenServer, you have to purchase a Citrix Essentials for XenServer license to get HA functionality. That will cost you, at the suggested retail price (which nobody actually pays), $2,500 per XenServer for the Enterprise Edition. But the copy of XenServer you’re putting it on is free. On the other hand, vSphere 4 lists for $2,875 per processor, so if I’m using dual-processor servers, I’m looking at $5,750 for vSphere 4 compared to $2,500 for that copy of Essentials for XenServer. If I’m using quad-processor servers, vSphere 4 is going to run $11,500, but I still only need that single license for Essentials. And don’t forget the cost of VirtualCenter to control my vSphere environment, whereas XenCenter is, again, free, and runs on a workstation rather than requiring a dedicated server.
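The list-price arithmetic above is easy to rerun for your own host count. Here is a short Python sketch using only the two suggested retail prices quoted in this post; the names are made up for illustration, and it deliberately leaves out VirtualCenter because I haven’t quoted a price for it here.

```python
ESSENTIALS_PER_HOST = 2500  # Citrix Essentials for XenServer, Enterprise Edition, per XenServer
VSPHERE_PER_CPU = 2875      # vSphere 4, per processor


def xenserver_ha_cost(hosts: int) -> int:
    """XenServer itself is free, so HA costs one Essentials license per host."""
    return hosts * ESSENTIALS_PER_HOST


def vsphere_cost(hosts: int, cpus_per_host: int) -> int:
    """vSphere 4 is licensed per processor."""
    return hosts * cpus_per_host * VSPHERE_PER_CPU
```

For a single dual-processor host that gives $5,750 against $2,500, and for a quad-processor host $11,500 against the same $2,500, matching the figures above.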

The point of this post is not to argue the relative merits of vSphere vs. XenServer, nor of whose HA feature is better. In fact, if you follow this blog, you’ll know that we’ve raised some red flags regarding how to properly deploy XenServer HA without risking potentially “career-altering” disasters. The point is simply that the old adage “don’t believe everything you read” is particularly appropriate for stuff you read on the Internet. (But you already knew that, right?)

People who throw out unsubstantiated generalized statements need to be challenged. If the TechTarget site allowed comments, I would have challenged the statement there. Since they don’t, I’m challenging it here. If I’m missing something, David Davis (or anyone else, for that matter) is welcome to comment on this post and point out what it is.

Recently I wrote a post about the hazards of XenServer HA and how to avoid a couple of different pitfalls which lead to XenServer fencing. In that post I talked about the necessity of correctly setting the HA heartbeat timeout for your environment so that your XenServers will allow enough time for a storage failover to occur. The idea, of course, is to prevent your XenServer from going into a “fence” condition, which can occur for many reasons. The one we’re discussing here arises when the XenServer believes its storage has suddenly become unavailable and cannot recover its state quickly enough to prevent the HA timeout from fencing the server.

I frequently build environments that use a pair of replicated DataCore SANmelody nodes (two physical nodes) and configure my XenServer in a multipath configuration. With this configuration my XenServers see two active paths to their storage (the status of the multipath is shown in the image below) – one path to each of the two nodes. If, for example, one of the SANmelody nodes goes off line, the other node will immediately take over. However, the XenServers have to be given enough time to fully recognize a failover has occurred, and the storage is still available, in order to avoid a fence. The default HA timeout in XenServer is 30 seconds which means if it takes a XenServer more than 30 seconds to realize the storage is still healthy and available then the server will fence. If the storage was indeed still available, then more than likely there were still VM guests up and running on the XenServer, which have now been taken offline unnecessarily.

To test and tune this setting I first make sure HA is enabled on the pool, then I perform hard failover tests where, using a DRAC or iLO card if I have one, I suddenly power cycle one of the storage servers and watch to see if any XenServers fence. I run this hard power cycle test because this specific problem never comes up with simple storage stops and restarts; rather it only shows up when a storage server actually goes down suddenly, or “hard,” as we say. So I run these tests because I want to stress the system to simulate unfortunate things like power failures, sudden server reboots due to gremlins, and other things along those lines. If nothing happens then great – let’s go home and we can sleep well knowing HA is working correctly. But what if you do have one or more servers which do fence because they believe their storage is gone when in fact it is not?

The last time I had this happen to me I had to test my environment several times, and with each successive run through the hard failover test I used a different timeout setting. In the end I found that 120 seconds worked best for me. (Keep in mind I am doing this during a build and there are no live production workloads running on any of these servers.)

So what is the downside of setting your timeout this high? Well, if a XenServer really fails (for whatever reason) it will take about 120 seconds for the Pool to decide there is a problem and then take action to restart the VMs elsewhere based upon available resources and the restart priority of each VM. Personally, I’d rather wait the 120 seconds when something has really gone wrong than suffer an unnecessary fence/shutdown when all the VMs were actually still running fine.

So how did I set the timeout values? Like this:

Rather than enable HA from the GUI, you’re going to have to do it from a command line. I use PuTTY when I’m not actually at the XenServer console. The command you will use is xe pool-ha-enable heartbeat-sr-uuids=<your SR UUID> ha-config:timeout=<timeout in seconds>.

But in that command string, how do you know what the sr-uuid is? The way I find it is to start with XenCenter and locate the SR (storage repository) which is going to be used for the heartbeat status disk. I locate the SCSI ID of that SR and copy the number as shown in this image:

Finding the SCSI ID of a Storage Repository


After I have that number, I connect to the master XenServer using PuTTY (the master XenServer in a pool is always the top server shown in XenCenter) and run the command xe pbd-list device-config=SCSIid:\ 360030d903131325f48415f4865617274, where the number following “SCSIid:\ ” is the ID just copied from XenCenter:
Finding the sr-uuid


What is shown above is what the output should look like. You see three groupings in this example because there are three hosts in this pool; notice that the host-uuids are all different. However, the sr-uuid value is the same in each grouping, and this is the number we are after. Take the sr-uuid you just found and enter it into a command like this: xe pool-ha-enable heartbeat-sr-uuids=7a213624-1209-c467-42ed-6ef72a1b7699 ha-config:timeout=120
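If you end up re-enabling HA more than once, it may help to generate the command string rather than retype it. The following is a minimal sketch that only builds the command shown above and does not run it. The function name and the UUID format check are assumptions added for illustration, not part of the xe tooling.

```python
import re

# Loose check for the 8-4-4-4-12 lowercase hex layout seen in the example sr-uuid above.
UUID_PATTERN = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def ha_enable_command(sr_uuid, timeout_seconds):
    """Build (but do not execute) the pool-ha-enable command as an argument list."""
    if not UUID_PATTERN.match(sr_uuid):
        raise ValueError("sr_uuid does not look like a UUID: %r" % sr_uuid)
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return [
        "xe",
        "pool-ha-enable",
        "heartbeat-sr-uuids=%s" % sr_uuid,
        "ha-config:timeout=%d" % timeout_seconds,
    ]


if __name__ == "__main__":
    # Prints the same command line used in this post.
    print(" ".join(ha_enable_command("7a213624-1209-c467-42ed-6ef72a1b7699", 120)))
```

Keeping the result as a list makes it easy to paste as a single line or hand to whatever remote-execution method you already trust.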

It may take a bit of time for the command to complete, but once it does you should be able to refresh XenCenter by using either the xe-toolstack-restart or the service xapi restart command. Then, when you look at the HA tab at the pool level, you should see that HA is now turned on:

Verify that HA is now turned on


As I said previously, I found 120 seconds worked best for me, but how did I determine that? Simple: I started by setting the HA timeout to 60 seconds (twice the default) and then ran the hard shutdown test again. One of the XenServers still fenced, so I went to 90 seconds, and then finally 120 seconds. The point at which the XenServers do not fence is where you want to stop. But don’t just do this test on one side of the storage! You will want to recover your storage servers and, once everything is back online and healthy, run the same test again, but this time hard-shutdown the other storage node. If none of the XenServers fence this time either, then you are done…unless you disable and re-enable HA. As I pointed out in that earlier post, this manual timeout setting is not persistent: if you disable and re-enable HA on the pool, you will have to re-enable it from the command line again to ensure that the timeout is set correctly. If it’s done from the GUI, it will revert to the 30-second default.
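The tuning procedure described above boils down to a simple search. Here is a rough sketch of that logic. The callback any_host_fenced is hypothetical: it stands for "set this timeout, hard power-cycle this storage node, and report whether any XenServer fenced," which in practice is a manual test, not something this code performs. The node names are placeholders.

```python
def first_safe_timeout(candidates, any_host_fenced, storage_nodes=("node-a", "node-b")):
    """Return the first timeout at which no host fences for a hard failure of
    either storage node, or None if every candidate still produces a fence.

    any_host_fenced(timeout, node) is a hypothetical stand-in for the manual
    hard power-cycle test of one storage node at the given HA timeout.
    """
    for timeout in candidates:
        # any() stops at the first node whose test produces a fence,
        # just as you would move on to the next timeout after a failed run.
        if not any(any_host_fenced(timeout, node) for node in storage_nodes):
            return timeout
    return None


if __name__ == "__main__":
    # Pretend results for illustration only: node-a needs 90s, node-b needs 120s.
    needed = {"node-a": 90, "node-b": 120}
    print(first_safe_timeout((60, 90, 120), lambda t, n: t < needed[n]))
```

The point of writing it this way is that a timeout only counts as safe once both storage nodes have been tested against it.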

I have been cloning Citrix servers since the days of MetaFrame XP. Over the years I’ve done hundreds of systems and taught a number of people a process for cloning servers that has worked 100% of the time. Unfortunately that process required removing registry keys, running tools to change the SID, and “sterilizing” the image to get it ready to clone. Then once this was done you had to make a copy of the server (in the Bad Old Days we used Symantec Ghost – today we have better imaging tools, which we’ll discuss below), and then move that copy to either different hardware or to a virtualization platform. Then, after copying it, you had to reverse the whole process by adding back registry keys, changing the server name, joining the domain, and finally running “chfarm” (change farm) to join the machine back to the Citrix farm.

About a year and a half ago, Citrix came out with a tool called XenApp Prep, which takes the whole process down from about 30 minutes to just a couple of minutes (not including the amount of time to copy the files). With Windows 2008, the process is simple, and I’m going to tell you exactly how I clone an image. But before I start, I want to stress that, while the process is nearly the same for using XenApp Prep to make a V-Disk image for use with Provisioning Server, there are some slight differences, so be sure to read the “readme” file and the FAQ that come in the XenApp Prep zipped download.

Here are the high-level steps I use to create the server that I’m going to turn into a “Gold” image that I can then use as the source of my cloned image(s):

  1. First I install Windows Server 2008 and apply all critical OS patches and any optional patches I deem necessary to bring the server up to current standards. (Most IT shops have their own policies and standards for approving and applying patches, so your list may be different from mine.)
  2. Install any extra pieces that will be required by your application set: J#, .NET (whichever versions you need) with the appropriate SP, Java, etc.
  3. Turn on the required Terminal Services roles, and, if you are going to place the Web Interface on the server (I don’t personally recommend this), turn on the IIS role.
  4. When all my prerequisites are met – and you may want to check the admin guide or the Citrix Web site to find the most recent requirements – I install XenApp 5.0.
  5. Install the most recent Citrix service packs, hotfixes, feature packs, etc.
  6. Apply any best practices and tweaks necessary. (This is a whole topic by itself, so we won’t try to cover it here.)
  7. Now, unless I’m using application streaming (another subject we’re not covering here), I install all of my applications. Generally I start with Microsoft Office, because nearly all the time, a customer requires that at least part of the Office Suite be installed. For specific “line of business” and third-party applications, I would always want to work with the customer’s Subject Matter Expert (“SME”) to verify proper operation.
  8. After each application is installed, I have the SME test it to verify that it does whatever the business needs it to do.

If the customer’s SME agrees that the applications are working correctly, I am ready to transform this server into my Gold image. This couldn’t be easier, especially if you’re virtualizing the XenApp servers. (And you know that XenServer is the best virtualization platform for XenApp, right?) Here are the steps:

  1. Hopefully I was thinking ahead and used a generic name for the server when I built it…but if for some reason I forgot to do that, I change the server name to something generic and reboot.
  2. Now I download XenApp Prep and install it on the server by running the MSI file. By default, the XenApp Prep installation places its executables in the C:\Program Files\Citrix\XenAppPrep directory:

    The XenApp Prep Directory

  3. If you are not creating an image for Provisioning Server (and we’re assuming here that you’re not), then all you do is navigate to the directory shown above and double-click XenAppPrep.exe to run it. (Again, refer to the readme and FAQ that come with XenApp Prep if you are creating an image for PVS.) A command window will appear, run a few commands, and close. That’s it: that quick little process took about 15 seconds and saved you at least 10 minutes.

    Command Window

  4. Once XenApp Prep has completed, I next remove the server’s current IP address, either by switching it to DHCP or by assigning some other static IP address. I prefer to set the address to something that’s not on its local subnet, so when it reboots, it cannot communicate until I want it to.
  5. I now navigate to the C:\windows\system32\sysprep directory and double-click sysprep.exe to run it, select the “OOBE” option (that’s “Out Of Box” Experience, not “Out Of Body”), select the option to shut down the server (not reboot), then click “next,” and sysprep runs, taking only a few seconds to complete:

    The Sysprep Directory

    The Out-of-Box Experience

    Sysprep Runs

At this point, you have your Gold image and you’re ready to deploy it over and over again. How do you do that? Again, it couldn’t be any easier:

  1. Copy the image to a new physical server using whatever imaging tool you prefer. We generally use UltraBac’s UBDR Gold or Acronis, but any tool you like should work fine. If you’re virtualizing on XenServer, Hyper-V, or VMware, all you need to do is copy the image to another storage repository.
  2. After the copying process is done (it is the longest step in creating your clone), boot the server up and follow the sysprep utility prompts (as though you just ran “setup” on a brand new server, hence the “Out of Box Experience”) to give the server its final name. This may take several minutes to complete.

    Boot Your New Server

  3. When sysprep is done, you will need to change the password in order to log on to the system.
  4. Immediately set the correct IP address and verify that the machine can ping the domain name.
  5. Go to the system properties and join the machine to your domain.
  6. Reboot.
  7. When the server comes up this time, and you log onto the domain, your server should have already joined the Citrix farm and be ready to go. Just to be sure, I open a command prompt and type “qfarm” to verify that the server is now a member of the farm.
  8. Once you’ve confirmed that the server is in the farm, run the Access Suite Console, and configure it to see the farm. Once it comes up, I simply drag and drop the published applications that should be assigned to the new server and it’s ready to go.
  9. After I drag the applications onto the server, just to be sure, I again run a qfarm command, “qfarm /app”, to verify that the farm sees the new server with the newly allocated published applications on it.
  10. After you test the new server, make sure you’ve enabled logons on it.

That’s it – you now have another server in your farm, and creating more servers should only take you a few minutes for each one. (Of course the copy process is the slowest part…but you can always use that time to refill your coffee cup, comment on our blog site, or otherwise multitask if you’re really ambitious.)

NOTE: This was originally posted in October, 2009, and may not be a problem any more with current versions of XenServer, as some of the more recent comments would tend to verify – but we will keep the post active for historical purposes. (added by Moose Logic administrator, March 16, 2012)

The Level 1 HA (High Availability) feature that comes with Citrix Essentials for XenServer may be one of the best ways to crash your whole virtual infrastructure if you don’t understand how it works and don’t design in an appropriate level of redundancy. This of course will lead to hours of down time, unhappy management, possible data loss, and lots of extra work for you (most likely on a weekend).

The basics -
HA is designed to monitor the XenServer virtualization environment. When HA is enabled, the administrator can specify which virtual machines (VMs) need to be automatically restarted if the host server they’re running on should fail. If there is a failure of a host server, HA should then automatically restart its designated guest VMs on another host in the XenServer “resource pool.” Note that the HA function does not “live migrate” the guest VMs, because when a host fails the VMs on that host also fail. Rather, it selects another host server and restarts the VMs on that host. For all of this to happen correctly, Citrix’s HA requires two things to be true at all times:

  1. Each XenServer must be able to communicate with its peers in the pool.
  2. Each XenServer in the pool requires access at all times to the HA heartbeat disk, which is shared by all the XenServers in the pool.

If either of these two items is not true for any given XenServer in the pool, that server will “fence.” The short definition of “fencing” is that the XenServer suspects – although it’s not absolutely sure – that it is experiencing some kind of failure, so to protect against possible data corruption it shuts itself down – essentially sacrificing itself to protect the data – until a human comes along and sorts things out. If the fenced server is in a correctly configured HA pool, guest VMs that were configured for HA restart will be restarted on a surviving XenServer.

Considerations -
So… you have two XenServers all set up and all your VMs configured just the way you like them, and you decide to turn on HA. Everything appears to be working until one of the hosts suffers a failure and goes off line. (Murphy’s Law says this will happen on a Saturday evening right before your BBQ party is starting.) With HA enabled, you would expect, based on the whole “High Availability” concept, that everything would be OK. Critical VMs should get restarted on the other host and you should be able to deal with the failed host on Monday.

Oh, but wait, remember HA rule #1? The XenServer host that is still running suddenly does not have any peers to talk to. It no longer knows whether or not it’s healthy so, in the interest of protecting your data from corruption, it does what it’s designed to do: it fences, and now both of your XenServers are down. They may try to reboot, but you are now in an endless loop of fencing, and to break out of it, you’re going to have to know how to use the “xe host-emergency-ha-disable force=true” command. (And if you don’t understand that last sentence, you’re in for a long weekend.)

This results in a situation that we in IT refer to as “not good,” with a chance of “career altering,” and you’re going to miss your BBQ party.

Here’s another scenario that will spoil your party: What if both XenServers are actually healthy, and all the virtual servers are up and functioning, but the network link for the management communications between the XenServers fails? Again, each XenServer would think it was stranded from the pool and fence itself in an attempt to correct the issue. With both servers fencing, this would again create an endless loop of server fencing. In essence, one server would start to come back online and would still not see the other XenServer and would fence again, and so on, and so on.

So for those reasons a two-XenServer pool cannot successfully run HA! Just don’t do it – even though you can configure HA on a two-server pool the result can be disastrous and ruin your weekend…not to mention your next performance review.

HA In a Two-Server Pool - Just Don't Do It!


Well, what about HA in a three node XenServer pool? Based upon the previously described scenarios, you now have a valid “pool,” in which HA will function. So you configure and enable HA, and when you test the HA functionality by killing one of the XenServers, everything works like it is supposed to. The guest VMs are restarted on the surviving XenServer hosts and you’re happy that everything is working correctly.
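The two-host and three-host cases described above can be expressed as a toy model. This is a deliberately simplified sketch of the two HA rules as this post states them (a host must reach at least one peer and must see the heartbeat disk), not XenServer’s actual fencing algorithm. The host names and function name are made up.

```python
def surviving_hosts(pool, failed, heartbeat_visible=True):
    """Return the hosts that stay up after the hosts in 'failed' go down.

    Simplified model: a running host fences if it has no running peer left
    to talk to, or if it cannot see the shared heartbeat disk.
    """
    alive = set(pool) - set(failed)
    survivors = set()
    for host in alive:
        peers = alive - {host}
        if peers and heartbeat_visible:
            survivors.add(host)
    return survivors


if __name__ == "__main__":
    # Two-host pool: the remaining host has no peers, so it fences too.
    print(surviving_hosts({"xs1", "xs2"}, failed={"xs1"}))
    # Three-host pool: the two remaining hosts still see each other.
    print(surviving_hosts({"xs1", "xs2", "xs3"}, failed={"xs1"}))
```

In this model the two-host pool ends up with no survivors after a single failure, which is the behavior the post warns about.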

But here is another “gotcha!” If you have only one Ethernet interface per XenServer assigned to management, and they’re all plugged into one switch, what happens if the management link fails because a NIC fails – or even worse, the switch fails? If it’s just a NIC in one server, then that XenServer will fence – not too bad but still not what you want. If you were using a different set of NICs (as you always should) for the guest VMs to communicate with the rest of the world, then the guests on that server were probably up and working just fine until the server fenced. Sure, the critical ones will restart on the remaining servers, but you’ve lost a third of the resources in your pool unnecessarily.

Now let’s consider what would happen if the switch should fail and you had only single management ports on each XenServer all plugged into just that one switch. If this happens, it may be time to dust off the old resume, because you have just lost your entire XenServer pool. Why? Because when the switch went down, all the XenServers lost communication with one another, and each assumed that, because it was suddenly isolated from the pool, it must be experiencing some kind of failure. Therefore the whole pool fenced.

Non-Redundant Management Links - Don't Do This Either!


Conclusions -
Citrix’s HA does not work in a two-host pool, period. With a pool of three or more XenServers you’ll be OK if you design the infrastructure correctly so that there is no single point of failure in your peer communications. How? Simply by bonding together two NICs in each XenServer, dedicating them to the management communication function, and then splitting each bonded pair between two separate Ethernet switches. That way you’re protected against both a NIC failure and a switch failure.
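To see why the bonded, split design removes the single point of failure, here is a small sketch that checks which hosts lose all management connectivity for a given failure. It is an illustrative model only: the host, NIC, and switch names are assumptions, and it treats a host as isolated only when every one of its management links is down.

```python
def isolated_hosts(mgmt_links, failed_switches=(), failed_nics=()):
    """Return the hosts whose management links are all down.

    mgmt_links maps host -> list of (nic, switch) pairs.
    failed_nics is a collection of (host, nic) pairs.
    """
    isolated = set()
    for host, links in mgmt_links.items():
        working = [
            (nic, switch)
            for nic, switch in links
            if switch not in failed_switches and (host, nic) not in failed_nics
        ]
        if not working:
            isolated.add(host)
    return isolated


if __name__ == "__main__":
    hosts = ("xs1", "xs2", "xs3")
    single = {h: [("eth0", "sw1")] for h in hosts}
    bonded = {h: [("eth0", "sw1"), ("eth1", "sw2")] for h in hosts}
    print(isolated_hosts(single, failed_switches={"sw1"}))  # every host is cut off
    print(isolated_hosts(bonded, failed_switches={"sw1"}))  # no host is cut off
```

With one management NIC per host on one switch, a switch failure isolates the whole pool; with the bond split across two switches, neither a single switch nor a single NIC failure isolates anyone.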

But you’re not out of the woods yet! Don’t forget HA rule #2: servers need to see the HA heartbeat disk. This is equally important, and you must consider the topology of that side of the network (iSCSI, Fibre Channel, etc.) and be sure it is also redundant. And if you’re using iSCSI multi-pathing (e.g., with a pair of mirrored DataCore iSCSI SAN nodes), be sure to manually bump up the HA timeout interval so that if one of the SAN nodes should fail, the multi-pathing function has time to fail over to the other node before the XenServers all conclude that the HA heartbeat disk is gone; otherwise, again, they will all fence. Our testing indicates that a two-minute timeout appears to have an adequate margin of safety. The default setting of 30 seconds is definitely too short. Unfortunately, this setting does not appear to be persistent, so if you turn HA off and then back on, you’ll need to manually reset the timeout interval again. (This is probably a job for Workflow Studio, but we just haven’t had time to work through the process yet.)

NO Single Points of Failure
HA will do a fine job of protecting you, if you build the network correctly. So make sure you’ve built in enough redundancy that you have no single point of failure, and enjoy your BBQ.

The Right Way to Build an HA Environment


P.S.: If you can’t justify more than two XenServers, but you still have one or more critical guests that need to be highly available, there is a solution: Marathon Technologies’ everRun VM. But that’s another post for another day.

Copyright © 2010 All rights reserved.