Your are here: Home > Blog

Today, we’re going to play “What’s Wrong with This Picture.” First of all, take a look at the following screen capture. (You can view it full-sized by clicking on it.)

Example of Phishing Email

Example of Phishing Email


Now let’s see if you can list all the things that are wrong with this email. Here’s what I came up with:

  • There is no such thing as “Microsoft ServicePack update v6.7.8.”
  • The Microsoft Windows Update Center will never, ever send you a direct email message like this.
  • Spelling errors in the body of the email: “This update is avelable…” “…new futures were added…” (instead of “features”) and “Microsoft Udates” (OK, that last one is not visible in my screen cap, so it doesn’t count).
  • Problems with the hyperlink. Take a look at the little window that popped up when I hovered my mouse over the link: The actual link is to an IP address (85.214.70.156), not to microsoft.com, as the anchor text would have you believe. Furthermore, the directory path that finally takes you to the executable (“bilder/detail/windowsupdate…”) is not what I would expect to see in the structure of a Microsoft Web site.”

If you want to know what sp-update.v678.exe would do if you downloaded and executed it, take a look at the description on the McAfee Web site (click on the “Virus Characteristics” tab). Suffice it to say that this is not something you want on your PC.

Sad to say, I suspect that thousands of people have clicked through on it because it has the Windows logo at the top with a cute little “Windows Update Center” graphic.

Would you have spotted it as a phishing attempt? Did you spot other giveaways in addition to the ones I listed above? Let us know in the comments.

If you’ve been following this blog for any length of time, you know that we’ve written extensively about XenDesktop, and spent a lot of time on best practices and problems to avoid. And one of the biggest problems to avoid is poor storage design resulting in poor VDI performance.

In a nutshell, the problem is that a Windows desktop OS uses disk far differently than a Windows server OS. Thanks to the way Windows uses the swap file, disk writes outnumber disk reads by about 2 to 1. You can build your virtual desktop infrastructure on the latest and greatest server hardware, with tons of processing power and insanely huge amounts of RAM, but if all of the disk I/O for all of those virtual desktops is hitting your SAN, you’ve got a scalability problem on your hands.

Provisioning Services (“PVS”) can help to mitigate this in two ways (assuming for sake of argument that you’re provisioning multiple virtual systems from a common, read-only image): First, if you build your Provisioning Servers correctly, you should be able to serve up most of the OS read operations from the Provisioning Server’s own cache memory. Second, you can use the virtualization host’s local disk storage as the required “write cache” – because all of those write operations have to go somewhere while the virtual system is running.

But XenDesktop 5 introduced a new way to provision desktops called “Machine Creation Services” (“MCS”). We wrote about this in the April edition of our Moose Views newsletter, so if you’re not familiar with all the pros and cons of MCS vs. PVS, I’d encourage you to take a brief time out and read that article. Suffice it to say that, despite all the advantages of MCS, the biggest downside of using MCS to provision pooled desktops was that all of the IOPS hit your SAN storage, which limited the scalability of an MCS-provisioned VDI deployment.

But all of that just changed, with the release of XenDesktop 5 Service Pack 1, which was made available for download a week ago (May 13). With SP1, XenDesktop 5 is now able to take advantage of the “IntelliCache” feature that was introduced as part of XenServer v5.6 Service Pack 2. Using MCS with the combination of XenDesktop 5 SP1 and XenServer SP2…

  • The first time a virtual desktop is booted on a given XenServer, the boot image is cached on that XenServer’s local storage.
  • Subsequent virtual desktops booted on that same XenServer will boot and run from that locally cached image.
  • You can use the XenServer’s local storage for the write cache as well.

The bottom line is that you can move as much as 90% of the IOPS off of the SAN and onto local XenServer storage, removing nearly all of the scalability limitations from an MCS-provisioned environment.

With most of the IOPS for running VMs taking place on local storage, it’s pretty straightforward to figure out how many VMs you can expect to support on a given virtualization host. Dan Feller’s blog post does a great job of walking through the process of calculating the functional IOPS that your local XenServer storage repository should be able to support, and inferring from that number how many light, normal, or power users you should be able to support as a result.

This also means that using XenServer as the hypervisor for your XenDesktop 5 deployment is going to yield a significant performance advantage over any other hypervisor, unless or until the other guys come out with similar local caching features. So, if you’re a VMware shop, my advice is this: Go ahead and virtualize all of the supporting XenDesktop server components on your VSphere infrastructure. Run your XenDesktop 5 VMs on XenServer hosts, and just don’t tell anyone! If you’re asked, just say, “Oh, yeah, these are my XenDesktop host systems – they’re completely separate from our VSphere infrastructure, because we don’t need the (insert favorite VSphere feature) function for these systems.” Your infrastructure will run better, and no one will know but you…

Many times, terms like “High Availability” and “Fault Tolerance” get thrown around as though they were the same thing. In fact, the term “fault tolerant” can mean different things to different people – and much like the terms “portal,” or “cloud,” it’s important to be clear about exactly what someone means by the term “fault tolerant.”

As part of our continuing efforts to guide you through the jargon jungle, we would like to discuss redundancy, fault tolerance, failover, and high availability, and we’d like to add one more term: continuous availability.

Our friends at Marathon Technologies shared the following graphic, which shows how IDC classifies the levels of availability:
graphic of availability levels

Redundancy is simply a way of saying that you are duplicating critical components in an attempt to eliminate single points of failure. Multiple power supplies, hot-plug disk drive arrays, multi-pathing with additional switches, and even duplicate servers are all part of building redundant systems.

Unfortunately, there are some failures, particularly if we’re talking about server hardware, that can take a system down regardless of how much you’ve tried to make it redundant. You can build a server with redundant hot-plug power supplies and redundant hot-plug disk drives, and still have the system go down if the motherboard fails – not likely, but still possible. And if it does happen, the server is down. That’s why IDC classifies this as “Availability Level 1″ (“AL1″ on the graphic)…just one level above no protection at all.

The next step up is some kind of failover solution. If a server experiences a catastrophic failure, the work loads are “failed over” to a system that is capable of supporting those workloads. Depending on those work loads, and what kind of fail-over solution you have, that process can take anywhere from minutes to hours. If you’re at “AL2,” and you’ve replicated your data using, say, SAN replication or some kind of server-to-server replication, it could take a considerable amount of time to actually get things running again. If your servers are virtualized, with multiple virtualization hosts running against a shared storage repository, you may be able to configure your virtualization infrastructure to automatically restart a critical workload on a surviving host if the host it was running on experiences a catastrophic failure – meaning that your critical system is back up and on-line in the amount of time it takes the system to reboot – typically 5 to 10 minutes.

If you’re using clustering technology, your cluster may be able to fail over in a matter of seconds (“AL3″ on the graphic). Microsoft server clustering is a classic example of this. Of course, it means that your application has to be cluster-aware, you have to be running Windows Enterprise Edition, and you may have to purchase multiple licenses for your application as well. And managing a cluster is not trivial, particularly when you’ve fixed whatever failed and it’s time to unwind all the stuff that happened when you failed over. And your application was still unavailable during whatever interval of time was required for the cluster to detect the failure and complete the failover process.

You could argue that a fail over of 5 minutes or less equals a highly available system, and indeed there are probably many cases where you wouldn’t need anything better than that. But it is not truly fault tolerant. It’s probably not good enough if you are, say, running a security application that’s controlling the smart-card access to secured areas in an airport, or a video surveillance system that sufficiently critical that you can’t afford to have a 5-minute gap in your video record, or a process control system where a five minute halt means you’ve lost the integrity of your work in process and potentially have to discard thousands of dollars worth of raw material and lose thousands more in lost productivity while you clean out your assembly line and restart it.

That brings us to the concept of continuous availability. This is the highest level of availability, and what we consider to be true fault tolerance. Instead of simply failing workloads over, this level allows for continuous processing without disruption of access to those workloads. Since there is no disruption in service there is no data loss, no loss of productivity and no waiting for your systems to restart your workloads.

So all this leads to the question of what your business needs.

Do you have applications that are critical to your organization? If those applications go down how long could you afford to be without access to them? If those applications go down how much data can you afford to lose? 5 minutes? An hour? And, most importantly, what does it cost you if that application is unavailable for a period of time? Do you know, or can you calculate it?

This is another way to ask what the requirements are for your “RTO” (“Recovery Time Objective” – i.e., how long, when a system goes down, do you have before you must be back up) and “RPO” (“Recovery Point Objective” – i.e., when you do get the system back up, how much data it is OK to have lost in the process). We’ve discussed these concepts in previous posts. These are questions that only you can answer, and the answers are significantly different depending on your business model. If you’re a small business, and your accounting server goes down, and all it means is that you have to wait until tomorrow to enter today’s transactions, it’s a far different situation from a major bank that is processing millions of dollars in credit card transactions.

If you can satisfy your business needs by deploying one of the lower levels of availability, great! Just don’t settle for an AL1 or even an AL3 solution if what your business truly demands is continuous availability.

Color me skeptical when it comes to the “cloud computing” craze. Well, OK, maybe my skepticism isn’t so much about cloud computing per se as it is about the way people seem to think it is the ultimate answer to Life, the Universe, and Everything (shameless Douglass Adams reference). In part, that’s because I’ve been around IT long enough that I’ve seen previous incarnations of this concept come and go. Application Service Providers were supposed to take the world by storm a decade ago. Didn’t happen. The idea came back around as “Software as a Service” (or, as Microsoft preferred to frame it, “Software + Services”). Now it’s cloud computing. In all of its incarnations, the bottom line is that you’re putting your critical applications and data on someone else’s hardware, and sometimes even renting their Operating Systems to run it on and their software to manage it. And whenever you do that, there is an associated risk – as several users of Amazon’s EC2 service discovered just last week.

I have no doubt that the forensic analysis of what happened and why will drag on for a long time. Justin Santa Barbara had an interesting blog post last Thursday (April 21) that discussed how the design of Amazon Web Services (AWS), and its segmentation into Regions and Availability Zones, is supposed to protect you against precisely the kind of failure that occurred last week…except that it didn’t.

Phil Wainewright has an interesting post over at ZDnet.com on the “Seven lessons to learn from Amazon’s outage.” The first two points he makes are particularly important: First, “Read your cloud provider’s SLA very carefully” – because it appears that, despite the considerable pain some of Amazon’s customers were feeling, the SLA was not breached, legally speaking. Second, “Don’t take your provider’s assurances for granted” – for reasons that should be obvious.

Wainewright’s final point, though, may be the most disturbing, because it focuses on Amazon’s “lack of transparency.” He quotes BigDoor CEO Keith Smith as saying, “If Amazon had been more forthcoming with what they are experiencing, we would have been able to restore our systems sooner.” This was echoed in Santa Barbara’s blog post where, in discussing customers’ options for failing over to a different cloud, he observes, “Perhaps they would have started that process had AWS communicated at the start that it would have been such a big outage, but AWS communication is – frankly – abysmal other than their PR.” The transparency issue was also echoed by Andrew Hickey in an article posted April 26 on CRN.com.

CRN also wrote about “lessons learned,” although they came up with 10 of them. Their first point is that “Cloud outages are going to happen…and if you can’t stand the outage, get out of the cloud.” They go on to talk about not putting “Blind Trust” in the cloud, and to point out that management and maintenance are still required – “it’s not a ‘set it and forget it’ environment.”

And it’s not like this is the first time people have been affected by a failure in the cloud:

  • Amazon had a significant outage of their S3 online storage service back in July, 2008. Their northern Virginia data center was affected by a lightning strike in July of 2009, and another power issue affected “some instances in its US-EAST-1 availability zone” in December of 2009.
  • Gmail experienced a system-wide outage for a period of time in August, 2008, then was down again for over 1 ½ hours in September, 2009.
  • The Microsoft/Danger outage in October, 2009, caused a lot of T-Mobile customers to lose personal information that was stored on their Sidekick devices, including contacts, calendar entries, to-do lists, and photos.
  • In January, 2010, failure of a UPS took several hundred servers offline for hours at a Rackspace data center in London. (Rackspace also had a couple of service-affecting failures in their Dallas area data center in 2009.)
  • Salesforce.com users have suffered repeatedly from service outages over the last several years.

This takes me back to a comment made by one of our former customers, who was the CIO of a local insurance company, and who later joined our engineering team for a while. Speaking of the ASPs of a decade ago, he stated, “I wouldn’t trust my critical data to any of them – because I don’t believe that any of them care as much about my data as I do. And until they can convince me that they do, and show me the processes and procedures they have in place to protect it, they’re not getting my data!”

Don’t get me wrong – the “Cloud” (however you choose to define it…and that’s part of the problem) has its place. Cloud services are becoming more affordable, and more reliable. But, as one solution provider quoted in the CRN “lessons learned” article put it, “Just because I can move it into the cloud, that doesn’t mean I can ignore it. It still needs to be managed. It still needs to be maintained.” Never forget that it’s your data, and no one cares about it as much as you do, no matter what they tell you. Forrester analyst Rachel Dines may have said it best in her blog entry from last week: “ASSUME NOTHING. Your cloud provider isn’t in charge of your disaster recovery plan, YOU ARE!” (She also lists several really good questions you should ask your cloud provider.)

Cloud technologies can solve specific problems for you, and can provide some additional, and valuable, tools for your IT toolbox. But you dare not assume that all of your problems will automagically disappear just because you put all your stuff in the cloud. It’s still your stuff, and ultimately your responsibility.

I recently discovered a video on “Citrix TV” that does as good a job as I’ve ever seen in presenting the big picture of desktop and application virtualization using XenApp and XenDesktop (which, as we’ve said before, includes XenApp now). The entire video is just over 17 minutes long, which is longer than most videos we’ve posted here (I prefer to keep them under 5 minutes or so), but in that 17 minutes, you’re going to see:

  • How easy it is for a user to install the Citrix Receiver
  • Self-service application delivery
  • Smooth roaming (from a PC to a MacBook)
  • Application streaming for off-line use
  • A XenDesktop virtual desktop following the user from an HP Thin Client…
    • …to an iPad…
    • …as the iPad switches to 3G operation aboard a commuter train…
    • …to a Mac in the home office…
    • …to a Windows multi-touch PC in the kitchen…
    • …to an iPhone on the golf course.
  • And a demo of XenClient to wrap things up.

I remember, a few years ago, sitting through the keynote address at a Citrix conference and watching a similar video on where the technology was headed. But this isn’t smoke and mirrors, and it isn’t a presentation of some future, yet-to-be-released technology. All of this functionality is available now, and it’s all included in a single license model. The future is here. Now.

I think you’ll find that it’s 17 minutes that are well-spent:

Most companies instinctively know that they need to be prepared for an event that will compromise business operations, but it’s often difficult to know where to begin.  We hear a lot of acronyms: “BC” (Business Continuity), “DR” (Disaster Recovery), “BIA” (Business Impact Analysis), “RA” (Risk Assessment), but not a lot of guidance on exactly what those things are, or how to figure out what is right for any particular business.

Many companies we meet with today are not really sure what components to implement or what to prioritize.  So what is the default reaction?  “Back up my Servers!  Just get the stuff off-site and I will be OK.”   Unfortunately, this can leave you with a false sense of security.  So let’s stop and take a moment to understand these acronyms that are tossed out at us.

BIA (Business Impact Analysis)
BIA is a process through which a business will gain an understanding from a financial perspective how and what to recover once a disruptive business event occurs.   This is one of the more critical steps and should be done early on as it directly impacts  BC and DR. If you’re not sure how to get started, get out a blank sheet of paper, and start listing everything you can think of that could possibly disrupt your business. Once you have your list, rank each item on a scale of 1 – 3 on how likely it is to happen, and how severely it would impact your business if it did. This will give you some idea of what you need to worry about first (the items that were ranked #1 in both categories). Congratulations! You just performed a Risk Assessment!

Now, before we go much farther, you need to think about two more acronyms: “RTO” and “RPO.” RTO is the “Recovery Time Objective.” If one of those disruptive events occurs, how much time can pass before you have to be up and running again? An hour? A half day? A couple of days? It depends on your business, doesn’t it? I can’t tell you what’s right for you – only you can decide. RPO is the “Recovery Point Objective.” Once you’re back up, how much data is it OK to have lost in the recovery process? If you have to roll back to last night’s backup, is that OK? How about last Friday’s backup? Of course, if you’re Bank of America and you’re processing millions of dollars worth of credit card transactions, the answer to both RTO and RPO is “zero!” You can’t afford to be down at all, nor can you afford to lose any data in the recovery process. But, once again, most of our businesses don’t need quite that level of protection. Just be aware that the closer to zero you need those numbers to be, the more complex and expensive the solution is going to be!

BC (Business Continuity)
Business Continuity planning is the process through which a business develops a specific plan to assure survivability in the event of a disruptive business event: fire, earthquake, terrorist events, etc.  Ideally, that plan should encompass everything on the list you created – but if that’s too daunting, start with a plan that addresses the top-ranked items. Then revise the plan as time and resources allow to include items that were, say, ranked #1 in one category and #2 in the other, and so forth. Your plan should detail specifically how you are going to meet the RTO and RPO you decided on earlier.

And don’t forget the human factor. You can put together a great plan for how you’re going to replicate data off to another site where you can have critical systems up and running within a couple of hours of your primary facility turning into a smoking hole in the ground. But where are your employees going to report for work? Where will key management team members convene to deal with the crisis and its aftermath? How are they going to get there if transportation systems are disrupted, and how will they communicate if telephone lines are jammed?

DR (Disaster Recovery)
Disaster recovery is the process or action a business takes to bring the business back to a basic functioning entity after a disruptive business event. Note that BC and DR are complementary: BC addresses how you’re going to continue to operate in the face of a disruptive event; DR addresses how you get back to normal operation again.

Most small business think of disasters as events that are not likely to affect them.  Their concept of “disaster” is that of a rare act of God or a terrorist attack.  But in reality, there are many other things that would qualify as a “disruptive business event:” fire, long term power loss, network security breach, swine flu pandemic, and in the case of one of my clients, a fire in the power vault of a building that crippled the building for three days.  It is imperative to not overlook some of the simpler events that can stop us from conducting our business.

Finally, it is important to actually budget some money for these activities. Don’t try to justify this with a classic Return on Investment calculation, because you can’t. Something bad may never happen to your business…or it could happen tomorrow. If it never happens, then the only return you’ll get on your investment is peace of mind (or regulatory compliance, if you’re in a business that is required to have these plans in place). Instead, think of the expense the way you think of an insurance premium, because, just like an insurance premium, it’s money you’re paying to protect against a possible future loss.

Dan Feller is a Lead Architect with the Citrix Consulting group, and has written extensively about XenDesktop. We found his series on the top ten mistakes people make when implementing desktop virtualization to be quite enlightening. In case you missed it, we thought we’d share his “top ten” list here, with links to the individual posts. We would highly recommend that you take the time to read through the series in its entirety:

#10 – Not calculating user bandwidth requirements
Back in the “good old days” of MetaFrame, when we didn’t particularly care about 3D graphics, multimedia content, etc., we could get by with roughly 20 Kbps of network bandwidth per user session. That’s not going to cut it for a virtualized desktop, for a number of reasons that Dan outlines in his blog post. He provides the following estimates for the average bandwidth required both with and without the presence of a pair of Citrix Branch Repeaters (which have some secret sauce that is specifically designed to accelerate Citrix traffic) between the client device and the virtual desktop session:

Parameter XenDesktop Bandwidth without Branch Repeater XenDesktop Bandwidth with Branch Repeater
Office Productivity Apps 43 Kbps 31 Kbps
Internet 85 Kbps 38 Kbps
Printing 553 – 593 Kbps 155 – 180 Kbps
Flash Video (with HDX redirection) 174 Kbps 128 Kbps
Standard WMV Video (with HDX redirection) 464 Kbps 148 Kbps
HD WMV Video (with HDX redirection) 1812 Kbps 206 Kbps

NOTE: These are estimates – your mileage may vary!

One thing that should come across loud and clear from the table above is what a huge difference the Citrix Branch Repeater can make in your bandwidth utilization. And as we’ve always said: you only buy hardware once – bandwidth costs go on forever!

#9 – Not considering the user profile
It should go without saying that user profiles are important. But if it’s number 9 on the list of things people most often screw up, then apparently it doesn’t. In a nutshell: If you mess up the users’ profiles, the users won’t be happy – logon/logoff performance will suffer, settings (including personalization) will be lost. If the users aren’t happy, they will be extremely vocal about it, and your VDI deployment will fail for lack of user buy-in and support. There are some great tools available for managing user profiles, including the Citrix Profile Manager, and the AppSense Environment Manager. AppSense can even maintain a consistent user experience across platforms – making sure that the user profile is the same regardless of whether the user is logged onto a Windows XP system, a Windows 7 System, or a Windows Server 2008 R2-based XenApp server.

Do yourself a favor and make sure you understand what your users’ profile requirements are, then investigate the available tools and plan accordingly.

#8 – Lack of an application virtualization strategy
How many applications are actually deployed in your organization? Do you even know? Are the versions consistent across all users? Which users use which applications? You have to understand the application landscape before you can decide how you’re going to deploy applications in your new virtualized desktop environment.

You have three basic choices on how to deliver apps:

  1. You can install every application into a single desktop image. That means that whenever an application changes, you have to change your base image, and do regression testing to make sure that the new or changed application didn’t break something else.
  2. You can create multiple desktop images with different application sets in each image, depending on the needs of your different user groups. Now if an application changes, you may have to change and do regression testing on multiple images. It’s worth noting that many organizations have been taking this approach in managing PC desktop images for years…but part of the promise of desktop virtualization is that, if done correctly, you can break out of that cycle. But to do that, you must…
  3. Remove the applications from the desktop image and deliver them some other way: either by running them on a XenApp server, or by streaming the application using either the native XenApp streaming technology or Microsoft’s App-V (or some other streaming technology of your choice).

Ultimately, you may end up with a mixed approach, where some core applications that everyone uses are installed in the desktop image, and the rest are virtualized. But, once again, it’s critical to first understand the application landscape within your organization, and then plan (and test) carefully to determine the best application delivery approach.

#7 – Improper resource allocation
Quoting Dan: “Like me, many users only consume a fraction of their total potential desktop computing power, which makes desktop virtualization extremely attractive. By sharing the resources between all users, the overall amount of required resources is reduced. However, there is a fine line between maximizing the number of users a single server can support and providing the user with a good virtual desktop computing experience.”

This post provides some great guidelines on how to optimize the environment, depending on the underlying hypervisor you’re planning to use.

#6 – Protection from Anti-Virus (as well as protection from viruses)
If you are provisioning desktops from a shared read-only image (e.g., Citrix Provisioning Services), then any virus infection will go away when the virtual PC is rebooted, because changes to the base image – including the virus – are discarded by design. But you still need AV protection, because the virus can use the interval between infection and reboot to propagate itself to other systems. The gotcha here is that the AV software itself can cause serious performance issues if it is not configured properly. Dan provides a great outline in this post for how to approach AV protection in a virtual desktop environment.

#5 – Managing the incoming storm
In most organizations, the majority of users arrive and start logging into their desktops at approximately the same time. What you don’t want is dozens, or hundreds, of virtual desktops trying to start up simultaneously, because it will hammer your virtualization environment. There are some very specific things you need to do to survive the “boot storm,” and Dan outlines them in this post.

#4 – Not optimizing the virtual desktop image
Dan provides several tips on things you should do to optimize your desktop image for the virtual environment. He also has specific sections on his blog that deal with recommended optimizations for Windows 7, and recommended optimizations for Windows XP.

#3 – Not spending your cache wisely
Specifically, we’re talking about configuring the system cache on your Provisioning Server appropriately, depending on the OS and amount of RAM in your Provisioning Server, and the type of storage repository you’re using for your vDisk(s).

#2 – Using VDI defaults
Default settings are great for getting a small Proof of Concept up and running quickly. But as you scale up your VDI environment, there are a number of things you should do. If you ignore them, performance will suffer, which means that users will be upset, which means that your VDI project is more likely to fail.

#1 – Improper storage design
This shouldn’t be a surprise, because we’ve written about this before, and even linked to a Citrix TV video of Dan discussing this very thing as part of developing a reference architecture for an SMB (under 500 desktops) deployment. We’re talking here about how to calculate the “functional IOPS” available from a given storage system, and what that means in relation to the number of IOPS a typical user will need at boot time, logon time, working hours (which will vary depending on the users themselves), and logoff time.

Just to round things out, Dan also tossed in a few “honorable mentions,” like the improper use of NIC teaming or not optimizing the NIC configuration in Provisioning Servers, trying to provision images to hardware with mismatched hardware device drivers (generally not an issue if you’re provisioning into a virtual environment), and failing to have a good business reason for launching a VDI project in the first place.

Again, this post was intended to whet your appetite by giving you enough information that you’ll want to read through Dan’s individual “top ten” posts. We would heartily recommend that you do that – you’ll probably learn a lot. (We certainly did!)

Volume 9 of the Microsoft Security Intelligence Report is out, and it makes for some pretty interesting reading. Among other things, it talks extensively about botnets – the various “families” of botnets, how they are used, how they work, and how access to them is sold and traded on the black market. Why? Because (quoting from the report), “When we look at that intelligence as a whole, it’s clear that botnets pose one of the most significant threats to system, organizational, and personal security.”

One of the things you’ll find in the report is a discussion of the infection rates of different versions of the Windows Operating System. You may have noticed that every now and then, as part of the critical patches and updates that Microsoft pushes to your PC, there’s something included called the “Malicious Software Removal Tool,” or “MSRT.” Microsoft keeps track of how often the MSRT actually finds malicious software when it runs, and that information is presented here as the number of computers cleaned of bot-related malware per 1,000 executions of the MSRT. Take a look at the following graph, which covers just Q2 of 2010 (click to view larger image):

Infection rate found per 1,000 executions of MSRT

I would like to particularly direct your attention to the fact that the infection rate for Windows XP SP3 is four times the infection rate for Windows 7, and the rate for Windows XP SP2 is five times the Win7 rate.

I understand that, for some people, the issue of upgrading from Windows XP to something else borders on being a religious discussion. But, honestly, if Windows 7 is that much more secure – which it clearly is – isn’t it getting a bit difficult to justify the “you can have my Windows XP when you pry it from my cold, dead fingers” position?

Of course, larger enterprises have some challenges to overcome. As we discussed in our September post about the cost of a Windows 7 migration, Gartner recently reported that, since most organizations weren’t planning to begin their Win7 migrations until 4Q2010, and with PC hardware replacement cycles typically running at four to five years at present, most organizations simply will not be able to complete a Windows 7 migration through the normal PC replacement cycle before Microsoft ends support for XP SP3. There just isn’t enough time left.

But even if there was enough time – why would you not want to move to an Operating System that’s four times more secure as quickly as you possibly can?

As Gartner pointed out, one alternative is to move some users to a “hosted virtual desktop” instead of a new PC. Translation: Making VDI part of your migration strategy can help get you out from behind the eight ball. It can also boost the overall security of your organization. Doesn’t that make it a conversation worth having?

At the recent Synergy Berlin conference, Citrix announced Access Gateway 5.0. We have confirmed that, as of now, 5.0 is available for download from the Citrix download site – both as an update for the CAG 2010 hardware appliance, and in Access Gateway VPX (virtual appliance) format. (Note: you will need a “mycitrix” account to download the software.)

One of the things I really like about 5.0 is that it now supports running two 2010 appliances in an active/passive HA configuration with automatic failover. This was a serious shortcoming of the original CAG appliance.

In earlier versions, if you were using the Access Gateway as a general-purpose SSL VPN, you could configure HA of a sort within the Access Gateway client plug-in, by defining primary and secondary Access Gateways for the client to connect to. However, if you were simply running the Access Gateway in “CSG replacement” mode to connect to a XenApp farm without requiring your users to first establish an SSL/VPN connection, you had no ability to provide automatic failover unless you had some kind of network load balancing device in front of multiple Access Gateway appliances. That meant, of course, that to avoid having the load balancing device become a single point of failure, you had to have some kind of HA functionality there as well. By the time you were done, the price tag had climbed to a level that just didn’t make sense for some smaller deployments.

NOTE: This specifically applies to the 2010 appliance. The CAG Enterprise models, because they are built on the NetScaler hardware platform, have always supported operation as HA pairs with automatic failover. Of course, a CAG MPX 5500 also carries a $9,000 list price, compared to $3,500 for a CAG 2010.

Now, with the release of 5.0, you can purchase two 2010 appliances (which will cost you less than a single MPX 5500), and run them as an active/passive HA pair. Thank you very much, Citrix CAG team!

Here are a couple of videos from Citrix TV. The first deals with how to upgrade an existing CAG 2010 to the 5.0 software using a USB flash drive, and then set up the basic system parameters:

The second video shows how to configure a pair of appliances for active/passive failover:

You can access several other “how-to” videos by going to http://www.citrix.com/tv, and searching on “Access Gateway 5.0.”

In this installment of the Moose Logic Video Series, Steve Parlee, our Director of Engineering, talks about:

  • Why we always use iSCSI HBAs in our Citrix XenServer deployments.
  • The possible risks of using HA in a two-server pool. (NOTE: Initial testing indicates that XenServer v5.6 may not present the same problems in a two-server pool as earlier versions. When we have completed our testing, we will post an update here.)
  • A useful utility for XenServer called “hostdevscan.”