Your are here: Home > Blog

At the recent Synergy Berlin conference, Citrix announced Access Gateway 5.0. We have confirmed that, as of now, 5.0 is available for download from the Citrix download site – both as an update for the CAG 2010 hardware appliance, and in Access Gateway VPX (virtual appliance) format. (Note: you will need a “mycitrix” account to download the software.)

One of the things I really like about 5.0 is that it now supports running two 2010 appliances in an active/passive HA configuration with automatic failover. This was a serious shortcoming of the original CAG appliance.

In earlier versions, if you were using the Access Gateway as a general-purpose SSL VPN, you could configure HA of a sort within the Access Gateway client plug-in, by defining primary and secondary Access Gateways for the client to connect to. However, if you were simply running the Access Gateway in “CSG replacement” mode to connect to a XenApp farm without requiring your users to first establish an SSL/VPN connection, you had no ability to provide automatic failover unless you had some kind of network load balancing device in front of multiple Access Gateway appliances. That meant, of course, that to avoid having the load balancing device become a single point of failure, you had to have some kind of HA functionality there as well. By the time you were done, the price tag had climbed to a level that just didn’t make sense for some smaller deployments.

NOTE: This specifically applies to the 2010 appliance. The CAG Enterprise models, because they are built on the NetScaler hardware platform, have always supported operation as HA pairs with automatic failover. Of course, a CAG MPX 5500 also carries a $9,000 list price, compared to $3,500 for a CAG 2010.

Now, with the release of 5.0, you can purchase two 2010 appliances (which will cost you less than a single MPX 5500), and run them as an active/passive HA pair. Thank you very much, Citrix CAG team!

Here are a couple of videos from Citrix TV. The first deals with how to upgrade an existing CAG 2010 to the 5.0 software using a USB flash drive, and then set up the basic system parameters:

The second video shows how to configure a pair of appliances for active/passive failover:

You can access several other “how-to” videos by going to http://www.citrix.com/tv, and searching on “Access Gateway 5.0.”

Moose Logic has been building and supporting networks for a long time. And during most of that time we’ve had a real love-hate relationship with most of the backup technologies we’ve implemented and/or recommended.

Tape backups – although they are arguably the best technology for long-term archival storage – are a pain to manage. Tapes wear out. Tape drives get dirty. People just don’t do test restores as often as they should. As a result, all too often, the first time you realize that you’ve got a problem with your backups is when you have a data loss, try to restore from your backups, and find out that they’re no good.

Add to that the astronomical growth in storage capacity, meaning that all the data you need to back up often won’t fit on one tape any more. So, unless you have someone working the night shift who can swap out the tape when it gets full, you’re faced with…

  • Buying multiple tape drives, which typically means you’re going to spend more on your backup software. And if your servers are virtualized, where are you going to install those tape drives?
  • Buying a tape library (a.k.a. autoloader), which can also get expensive.
  • Changing the tape when you come in the next morning, which means that your network performance suffers because you’re trying to finish the backup job(s) while people are trying to get work done.

Then there’s the issue of getting a copy of your data out of the building. Typically, that’s done by having multiple sets of tapes, and a designated employee who takes one set home every Friday and brings the other set in. If s/he remembers. Or isn’t sick or on vacation.

Backing up to external hard drives is a reasonable alternative for some. It solves the capacity issue in most cases. But over the years, we’ve seen reliability issues with some manufacturers’ units. We’ve uncovered nagging little issues like some units that don’t automatically come back on line after a power interruption. And they’re not necessarily the best for long-term archival storage, unless you keep them powered on – or at least power them on once in a while – because hard disks that just sit for long periods of time may develop issues with the lubrication in their bearings and not want to spin back up.

But we’ve finally found an approach that we really, really like. One that, as one of our engineers said in an internal email thread, we actually enjoy managing. In fact, we like it so much we built a backup appliance around it. It’s Microsoft’s System Center Data Protection Manager (SCDPM).

In this installment of the Moose Logic Video Series, our own Scott Gorcester gives you a quick overview of SCDPM 2010:



For more detail on how it works, check out the description of our MooseSentryTM backup appliance.

One of the many enhancements Citrix made in XenApp v6 is that cloning a server is now much easier that it was in previous versions. Here’s a step-by-step guide, with lots of screen caps:

  1. Install the updated XenApp Server Configuration Tool.
  2. Run the XenApp Server Role Manager (Start – All Programs – Citrix – XenApp Server Role Manager – XenApp Server Role Manager):
    XenApp Server Role Manager

    XenApp Server Role Manager

  3. Select “Edit Configuration:”
    Edit Configuration

    Edit Configuration

  4. Select “Prepare this server for imaging and provisioning:”
    Choose a Task

    Choose a Task

  5. On the next screen, check “Remove this current server instance from the farm,” as shown below, then click “Next.” As the pop-up tip indicates, this will save you from having to do it manually later. The server will automatically join the farm when you bring it back on-line.
    Provisioning Options

    Provisioning Options

  6. On the next screen, click “Apply:”
    Ready to Configure

    Ready to Configure

  7. The server runs through the items that are needed to prepare XenApp for cloning. Note the informational warning that the settings will be applied when you clone or reboot the server. This means that once your new server comes on-line, it will automatically join the farm that the original server was in (before you removed it in Step 5).
    Configuring Server

    Configuring Server

  8. Back at the XenApp Server Role Manager screen, you can choose to reboot the server (which you probably don’t want to do just yet), or simply close the window and proceed with any additional tasks you may need to perform before cloning, such as Sysprep.
    XenApp Server Role Manager

    XenApp Server Role Manager

  9. After you’ve finished any additional tasks, you can shut the server down, and clone it to your heart’s content. When your clones come back on-line, if they have a network connection on the correct IP subnet, they will automatically join the farm. However (“gotcha” alert), if you didn’t Sysprep them, they will all try to join the farm under the same machine name – the one your original server had. So if you didn’t change the name of the server, it’s best to disconnect it from the network, change the name and IP address, reconnect to the network, join it to the AD Domain, and then reboot it so it can join the XenApp farm using the correct name.

If you’re a Citrix “old-timer,” you’ve got to agree that it doesn’t get much easier that this!

Tomorrow (May 5), at 17:00 GMT, all 13 root DNS servers on the Internet will begin using DNSSEC (Domain Name System Security Extensions) to reply to user requests. Here’s why you might care about this.

As most of our readers know, DNS is what translates the URL you type into your browser (like “www.mooselogic.com”) into an IP address (like “216.9.9.164″) that your computer can actually use to send packets of data across the Internet. If you have a Windows Server-based network, one (or more) of your Windows Servers is probably providing DNS services to the users on your network. But the DNS server on your network doesn’t automatically know where everything is. If it needs to resolve an address that doesn’t happen to already be in its local cache, it has to ask some other DNS server out on the Internet. Sometimes those queries go all the way to one of the root servers.

It’s been recognized for quite some time that the existing protocol used for DNS queries isn’t entirely secure. Therefore, the international standards bodies have been working on a more secure standard, which is DNSSEC. DNSSEC uses digital signatures to authenticate DNS responses, so your computer knows the response actually came from an authoritative DNS server.

So what’s the problem? The potential problem is that those DNS responses will arrive in significantly larger data packets than before. Specifically, rather than using UDP packets that are smaller than 512 bytes, the responses will not only be longer, but may be broken into multiple TCP packets. Some routers and firewalls specifically inspect DNS traffic to look for anomalies, and if you have older equipment that doesn’t know about the DNSSEC standard, these changes may very well look like anomalies, and be blocked. That would mean that your DNS clients or DNS server would not be able to communicate with the public root DNS servers, and that would mean that you would start having problems resolving DNS.

These problems may be intermittent in nature at first, because some DNS requests may be able to be resolved by using locally cached information…but DNS records typically have a “time to live” built into them, so eventually the cached information will expire and have to be refreshed. So if you do have a problem, it’s likely to get worse with time.

There are some tools available to help you determine whether you’re likely to have a problem. If you’re comfortable using a DNS query tool like dig (which is a command-line query that can be run from most unix or linux systems), you can find instructions on using it at https://www.dns-oarc.net/oarc/services/replysizetest. If you don’t have access to a unix or linux host, or don’t feel comfortable using such a tool, you can download a Java utility from http://labs.ripe.net/content/testing-your-resolver-dns-reply-size-issues, and run it on any system with Java run-time installed (which includes most Windows systems). Just download and save the file, then double-click it.

Watchguard customers should note that if you have a Watchguard Firebox or XTM appliance with current firmware, you should not have any issues with these new DNSSEC packets.

It’s 8 pm on a Sunday evening, and I get a panicked call from a customer because he cannot connect to his XenServersTM via the XenCenterTM management tool. However, as near as he could tell, all of the hosted virtual machines were up and running and in a healthy state. He had unsuccessfully tried to point the XenCenter management tool at another member of the XenServer pool but was unsuccessful.

So what happened and how do you fix it?

This situation can happen for several reasons but generally it happens when there are only two servers in the XenServer pool, and the pool master suddenly fails. In essence, what happens is the surviving server (let’s just call it the “slave”) can no longer see its peer, the pool master, so it assumes it has been stranded and goes into emergency mode to protect its own VMs. There are other ways this can happen (an incorrectly configured pool with HA turned on for example), but this is the most common reason that I have personally experienced.

Depending upon the situation, you may not be able to ping the master server because it is actually down, or you may be able to ping the server but it is in an inconsistent, “locked up”, state such that it cannot answer requests to it. If you are able to connect to the console of the master server either directly with a monitor, keyboard, and mouse (the old fashioned way) or through a remote management interface (DRAC, ILO, ILOM, etc) the server may appear to be running, but you may not be able to do anything with it.

At this point you may be thinking, “This is no big deal – just reboot the machine and it will be fine.” If you are lucky that may actually solve the problem, but in many cases it will not. What you might see is that after the master reboots you will be able to connect to the master but you will not see the slave. Or it may be that your master is truly broken and you are not able to simply reboot it due to a system or hardware failure. But, of course, you’ve still got to get your pool online and working again regardless.

During this period of time, if you try to use a tool such as Putty to connect to the slave via its management interface, you may not be able to connect to it either. If you try to ping the slave on the management interface you may not get any replies. But if you connect to the console of the slave (again, either the physical console or via a remote management interface) you will probably see that the machine is running, but if you look at XSconsole it will appear that the management interface is gone because there will be no IP address showing. By now you’ll probably be scratching your head because the strange thing is all the VMs are running.

So at this point your master appears to be down, or at least impaired, you’ve got no management interface on the slave, your pool is broken and you cannot manage the VMs. So what do you do?

Well, if this happens to you and your VMs are still up and running the first thing you should do is take a deep breath, because more than likely it is not as bad as you might think. XenServer is a robust platform and if the infrastructure is built correctly (and I’m going to quote a customer), “you can really slam the things around and they still work”.

After you take a deep breath and let it out slowly, from the console of the slave server, you will need to access the command line and start by typing:

xe host-is-in-emergency-mode

If the server returns an answer of “True” then you’ve confirmed that the server has gone into emergency mode in order to protect itself and the VMs running on it. (If the server returns an answer of “False” then you can stop reading, because the rest of this post isn’t going to help you.)

Assuming you receive the answer of “True” the slave server is in emergency mode because it cannot see a master – either because the master is actually down, or because the management interface(s) is(are) not working. Therefore, the next step is to promote the slave to master to get it out of emergency mode. We do this by typing:

xe pool-emergency-transition-to-master

At this point the slave server should take over as the pool master and the management interface should be available again. Now if you type the xe host-is-in-emergency-mode command again you should get an answer of “False”.

Now, open XenCenter again. It will first try to connect to the server that was the master, but after it times out it will then attempt to connect to the new master server. Be patient, because eventually it will connect (it may take several seconds) and you will again see your pool and be able to manage your VM’s. If some of the VMs are down because they were on the server that failed you’ll be able to start them on the remaining server (assuming you have shared backend storage and sufficient processor and memory resources).

Now what about the master if it has totally failed? What do I do after I’ve fixed, say, a hardware problem in order to return it to my pool?

If the following two conditions are true:

  1. You are using shared storage so that your VMs are not stored on the XenServer local drives, and
  2. You have built your XenServers with HBAs (fiber or iSCSI) rather than using Open iSCSI, which means the connectivity information to your backend SAN will be stored within the HBA,

…then it may be much simpler and quicker just to reload the XenServer operating system. (If you do not have shared backend storage, which means your VMs are on local storage, DO NOT DO THIS). I can rebuild my XenServers from scratch in about 20 – 30 minutes and have them back in the pool and running.

If either of those two conditions is not true then, depending upon your situation, recovery may be significantly more difficult. It could be as simple as resetting your Open iSCSI settings and connecting back to your SAN (still easy but takes more time to accomplish) or it could be as painful as rebuilding your VMs because you lost your server drives. (OUCH!)

Real world example: I recently had a NIC fail on the motherboard of my master server. Of course since the NIC was on the motherboard it meant the whole motherboard had to be replaced which significantly modified the hardware configuration for that server.

In this case, when I brought that XenServer back online it still had all the information about the old NICs showing in XenCenter, plus it had all the new NICs from the new hardware. Yes I could have used some PIF forget commands to remove the NICs that no longer existed and reconfigure everything but that would have taken me a bit of time to straighten out. Since I had iSCSI HBAs attached to a Datacore SAN (great product, by the way) for shared storage, all I did was reload XenServer on that machine, modify the multipath-enabled.conf file (that is a different blog topic for another day), and rejoin the server to the pool. Because the HBAs already had all the iSCSI information saved in the card, the storage automatically reconnected all the LUNs, the network interfaces took the configuration of the pool, and I was back online and running in less than 30 minutes.

After you repair the machine that failed and get it back online, you may want it to once again be the master server. To do this type:

xe host-list

You will get a list of available servers with their UUID’s. Record the UUID of the server that you want to designate as the new master and then type:

xe pool-designate-new-master host-uuid=[the uuid of the host you want]

After you type this your pool will again disappear from XenCenter, but after about 20 – 30 seconds (be patient) it will reappear with the new server as the master. Your pool should now be healthy, and you should again be able to manage servers as normal.

Recently, my Word 2007 program started experiencing some weird behavior.  When I had a document open and would go to close Word, I’d be greeted by this:

Microsoft Office Error

Also, since I’m running Windows 7, when I tried to Right-click the Word icon that I pinned to my taskbar and open a recently opened document, Word would open, but not the document.  I would have to explicitly open the document from within the application.

Initial reports I read pointed to a recently installed patch, but instead of going through my “Programs and Features” uninstalling patches one-by-one, I found an even easier solution on Ed Bott’s blog.

NOTE: The following instructions deal with editing the Windows Registry, which is not for the incautious or faint of heart. It would be a good idea to back up your Registry and/or create a restore point before trying this.

If I haven’t scared you off, open Regedit, navigate to HKCU\Software\Microsoft\Office\12\Word and delete the “Data” Key.  (I did export the “Data” key first, just to be on the safe side.) After deleting the key, Word appeared to open and close properly.  I right-clicked the Word icon on my taskbar, chose a document to open and it opened right up. So far, so good.

I then merged the backed-up reg key back into the registry and tried opening that same recently-used document from the task bar – Word opened, no document.  Then upon closing Word, it crashed again.

From the information in Ed’s blog entry, the cause appears to be related to having Word running at the time a particular “Important” update is applied to the system. Removing this reg key fixed it right up! I saved at least 30 minutes by avoiding having to uninstall and reinstall Office.

A couple months back, I was working on a XenDesktop 3.0 installation.  The customer was having some issues with licensing, so my first thought was to upgrade the license server to the newest version – because historically that’s been the recommended first step in dealing with license issues.  In this case, though, it turned out to break other things – but in the end gave us some very valuable information.

First, a little background information.  XenDesktop uses an agent on the virtual workstation images that communicates with a server piece called the Desktop Delivery Controller (“DDC”).  This agent/server communication happens on TCP port 8080.

Unfortunately, in this specific case, the Citrix License Service was running on the same server as the DDC service. And in the latest version of the license server service, Citrix changed the port that the License Management Console uses to communicate with the license server to…yeah, you guessed it: 8080.

This conflict prevented the XenDesktop machines from checking into the DDC, so the DDC, which, as you might infer from its name, is the broker service that controls the delivery of virtual desktops to client devices that request them, viewed them all as “offline”.  It therefore believed that there were no virtual desktops available, so no connection requests were being serviced. In technical terms, this is what we call a “bad thing.”

The solution in this case was to change the port that the License server and License Management Console communicate on to port 8082 – which just happens to be the port it used in previous versions… hmm…

It should be noted that this can affect more than just XenDesktop Agents.  Port 8080 is heavily used for other Citrix functions, such as the XML service.  Why Citrix would change the license service to port 8080? Why do people jump out of perfectly good airplanes? Guess it seemed like a good idea at the time…to somebody, anyway.

Once we started digging for it, we found detailed steps on editing the license server configuration in the License Server 11.6 readme, and it’s obvious from the text that Citrix knew using port 8080 could cause conflicts:

When you install the license server components, version 11.6.1, on the same server as other Citrix products, the License Management Console is configured to use port 8080, by default, which may conflict with the other Citrix products (for example, they may use the Citrix XML Service which may use 8080 as well). To resolve this issue:

  1. Navigate to C:\Program Files\Citrix\Licensing\LMC\tomcat\
    conf\server.xml
  2. Open the file in Wordpad.
  3. Replace the port number (8080) in the following line with 8082:
    <Connector port=”8080″ protocol=”HTTP/1.1″
    connectionTimeout=”20000″ redirectPort=”8443″/>
  4. Save the file.
  5. Reboot the server.

The other way we could have solved the problem, of course, would have been to move the license service to a server that had no other Citrix components running on it. We generally recommend that as a best practice, if at all possible (although we realize that sometimes it isn’t).  It just makes things easier in the long run.

This post was inspired by my quest to install the newest version of Windows Live Messenger on our 32-bit, Windows Server 2008 servers running XenApp 5.0 here at Moose.  We were previously running Live Messenger 8.5 until it started prompting us that an update was available and would not let us use the program until it was updated.  This update is mandated by Microsoft to keep everyone’s messengers up-to-date to protect against security flaws.

According to the Windows Live Essentials System Requirements, you can install it on Server 2008…HA!  Good luck!  Standard attempts to install just don’t work – it goes through the process and at the end of the installation it rolls everything back. (Note: Click on an image to view full-size.)

Live Messenger System Requirements

Live Messenger System Requirements


I tried a couple of different approaches, including copying the bits from C:\program files\Windows Live\Messenger on a Vista SP2 box over to the Citrix XenApp Servers, but that didn’t work either (big surprise).

By default, the package you download from http://download.live.com to install Live Messenger is a small executable that asks you what other programs you want to install.  Once you’ve made your choices, it downloads the actual installation bits from the Internet.  There is a full download that contains all the installers, but you won’t find it on a Microsoft site (at least I couldn’t).  I ultimately found it on this third-party Web site. You can do the installation with either package, but I did it with the full 140 Mb package, so I know it works that way.

I have not found any way to extract the contents of the full package executable into separate .msi files for each product.  You can browse to c:\program files\common files\windows live\.cache\ and search for the particular piece you’re looking for (Messenger.msi in my case), but it is a silent install, and things still didn’t work after trying to install it on Server 2008.

The solution I found was to actually hack the installer to allow it to install on a server OS.

I found an article via a Google search that is actually for installing Windows Live Wave 3 Beta on unsupported x86 and x64 versions of Server 2003 and 2008, and discovered that the method worked perfectly for Live Messenger as well.

To install Live messenger you will need to download:

  1. The full install of Windows Live Essentials;
  2. “Resource Hacker” from the link in the article above. I chose the UK mirror.

Customization and Installation steps:

  1. Extract the ResHack.zip file to your hard drive (anywhere works, I used the Desktop).
  2. Launch ResHack.exe from the extracted files.
  3. Open the Windows Live Installer executable (WLSetup-All.exe or WLSetup-Web.exe).
  4. Open the Windows Live installer file

    Open the Windows Live installer file

  5. Locate CONFIG -> CONFIG0 -> 0.
  6. Using ResHack

    Using ResHack

  7. Select-All and copy the XML code into a text editor such as Notepad (choose Word Wrap from the Format menu in Notepad to make it easier to edit)
  8. Find ‘os productType=”workstation”’  and replace ‘workstation’ with ‘server’
  9. Select Text to Replace

    Select Text to Replace

  10. Copy/paste the XML code back into ResHack, replacing the old code
  11. Click ‘Compile Script’
  12. Compile Script

    Compile Script

  13. Select File -> Save and save the modified executable (Make sure to put .exe on the end of the filename and name it something unique).  It’s 140Mb – be patient as it compiles the executable.
  14. Run the modified executable.

The Windows Live applications should now be installed on Server 2008!

Happy Messaging!

Recently I wrote a post about the hazards of XenServer HA and how to avoid a couple of different pitfalls which lead to XenServer fencing. In that post I talked about the necessity of correctly setting the HA heartbeat timeout for your environment so that your XenServers will allow enough time for a storage failover to occur. The idea, of course, is to prevent your XenServer from going into a “fence” condition which can occur for many reasons. The reason we’re discussing here is triggered when the XenServer believes its storage has suddenly become unavailable and it is not able to recover its state quickly enough to prevent the HA timeout from fencing the server.

I frequently build environments that use a pair of replicated DataCore SANmelody nodes (two physical nodes) and configure my XenServer in a multipath configuration. With this configuration my XenServers see two active paths to their storage (the status of the multipath is shown in the image below) – one path to each of the two nodes. If, for example, one of the SANmelody nodes goes off line, the other node will immediately take over. However, the XenServers have to be given enough time to fully recognize a failover has occurred, and the storage is still available, in order to avoid a fence. The default HA timeout in XenServer is 30 seconds which means if it takes a XenServer more than 30 seconds to realize the storage is still healthy and available then the server will fence. If the storage was indeed still available, then more than likely there were still VM guests up and running on the XenServer, which have now been taken offline unnecessarily.

To test and tune this setting I first make sure HA is enabled on the pool, then I perform hard failover tests where, using a DRAC or iLO card if I have one, I suddenly power cycle one of the storage servers and watch to see if any XenServers fence. I run this hard power cycle test because this specific problem never comes up with simple storage stops and restarts; rather it only shows up when a storage server actually goes down suddenly, or “hard,” as we say. So I run these tests because I want to stress the system to simulate unfortunate things like power failures, sudden server reboots due to gremlins, and other things along those lines. If nothing happens then great – let’s go home and we can sleep well knowing HA is working correctly. But what if you do have one or more servers which do fence because they believe their storage is gone when in fact it is not?

The last time I had this happen to me I had to test my environment several times, and with each successive run through the hard failover test I used a different timeout setting. In the end I found that 120 seconds worked best for me. (Keep in mind I am doing this during a build and there are no live production workloads running on any of these servers.)

So what is the downside of setting your timeout this high? Well, if a XenServer really fails (for whatever reason) it will take about 120 seconds for the Pool to decide there is a problem and then take action to restart the VMs elsewhere based upon available resources and the restart priority of each VM. Personally, I’d rather wait the 120 seconds when something has really gone wrong than suffer an unnecessary fence/shutdown when all the VMs were actually still running fine.

So how did I set the timeout values? Like this:

Rather than enable HA from the GUI you’re going to have to do it from a command line. I use PuTTY when I’m not actually at the XenServer console. The command you will use is xe pool-ha-enable heartbeat-sr-uuids=your uuid goes here ha-config:timeout=however many seconds you want.

But in that command string, how do you know what the sr-uuid is? The way I find it is to start with XenCenter and locate the SR (storage repository) which is going to be used for the heartbeat status disk. I locate the SCSI ID of that SR and copy the number as shown in this image (click picture to view full-size):

Finding the SCSI ID of a Storage Repository

Finding the SCSI ID of a Storage Repository


After I have that number I next connect to the master XenServer using PuTTY (the master XenServer in a pool is always the top server shown in XenCenter) and run this command xe pbd-list device-config=SCSIid:\ 360030d903131325f48415f4865617274 where the number in RED is the ID just copied from Xencenter:
Finding the sr-uuid

Finding the sr-uuid


What is shown above is what the output should look like. The reason you see three sequences in this example is because there are three hosts in this pool, notice the host-uuids are all different. However also notice the sr-uuid value is the same in each grouping and this is the number we are after. Take the sr-uuid you just found and enter it into a command like this: xe pool-ha-enable heartbeat-sr-uuids=7a213624-1209-c467-42ed-6ef72a1b7699 ha-config:timeout=120

It may take a bit of time for the command to actually complete but once it does you should be able to refresh your Xencenter by using either the xe-toolstack-restart or the service xapi restart command and then when you look at the pool level on the HA tab you should see that HA is now turned on:

Verify that HA is now turned on

Verify that HA is now turned on


As I said previously I found 120 seconds worked best for me – but how did I determine that? Simple: I started by setting the HA timeout to 60 seconds (twice the default) and then ran the hard shutdown test again. One of the XenServers still fenced so I went to 90 seconds, and then finally 120 seconds. The point at which the XenServers do not fence is where you want to stop. But don’t just do this test on one side of the storage! You will want to recover your storage servers and once everything is back online and healthy run the same test again – but this time hard-shutdown the other storage node. Now if none of the XenServers fence then you are done…unless you disable and re-enable HA. As I pointed out in that earlier post, this manual timeout setting is not persistent – if you disable and re-enable HA on the pool, you will have to re-enable it from the command line again to insure that the timeout is set correctly. If it’s done from the GUI, it will revert to the 30-second default.