
Our friends at DataCore ran a press release yesterday positioning the new release (v8.1) of SANsymphony-V as a “storage hypervisor.” On the surface, that may just sound like some nice marketing spin, but the more I thought about it, the more sense it made – because it highlights one of the major differences between DataCore’s products and most other SAN products out there.

To understand what I mean, let’s think for a moment about what a “hypervisor” is in the server virtualization world. Whether you’re talking about vSphere, Hyper-V, or XenServer, you’re talking about software that provides an abstraction layer between hardware resources and operating system instances. An individual VM doesn’t know – or care – whether it’s running on an HP Server, a Dell, an IBM, or a “white box.” It doesn’t care whether it’s running on an Intel or an AMD processor. You can move a VM from one host to another without worrying about changes in the underlying hardware, BIOS, drivers, etc. (Not talking about “live motion” – that’s a little different.) The hypervisor presents the VM with a consistent execution platform that hides the underlying complexity of the hardware.

So, back to DataCore. Remember that SANsymphony-V is a software application that runs on top of Windows Server 2008 R2. In most cases, people buy a couple of servers that contain a bunch of local storage, install 2008 R2 on them, install SANsymphony-V on them, and turn that bunch of local storage into full-featured iSCSI SAN nodes. (We typically run them in pairs so that we can do synchronous mirroring of the data across the two nodes, such that if one node completely fails, the data is still accessible.) But that’s not all we can do.

Because it’s running on a 2008 R2 platform, it can aggregate and present any kind of storage the underlying Server OS can access at the block level. Got a fibre channel SAN that you want to throw into the mix? Great! Put fibre channel Host Bus Adapters (HBAs) in your DataCore nodes, present that storage to the servers that SANsymphony-V is running on, and now you can manage the fibre channel storage right along with the local storage in your DataCore nodes. Got some other iSCSI SAN that you’d like to leverage? No problem. Just make sure you’ve got a couple of extra NICs in the DataCore nodes (or install iSCSI HBAs if you want even better performance), present that iSCSI storage to the DataCore nodes, and you can manage it as well. You can even create a storage pool that crosses resource boundaries! And now, with the new auto-tiering functionality of SANsymphony-V v8.1, you can let DataCore automatically migrate the most frequently accessed data to the highest-performing storage subsystems.

Or how about this: You just bought a brand new storage system from Vendor A to replace the system from Vendor B that you’ve been using for the past few years. You’d really like to move Vendor B’s system to your disaster-recovery site, but Vendor A’s product doesn’t know how to replicate data to Vendor B’s product. If you front-end both vendors’ products with DataCore nodes, the DataCore nodes can handle the asynchronous replication to your DR site. Alternatively, maybe you bought Vendor A’s system because it offered higher performance than Vendor B’s system. Instead of using Vendor B’s product for DR, you can present both systems to SANsymphony-V and leverage its auto-tiering feature to automatically ensure that the data that needs the highest performance gets migrated to Vendor A’s platform.

So, on the back end, you can have disparate SAN products (iSCSI, fibre channel, or both) and local storage (including “JBOD” expansion shelves), and a mixture of SSD, SAS, and SATA drives. The SANsymphony-V software masks all of that complexity, and presents a consistent resource – in the form of iSCSI virtual volumes – to the systems that need to consume storage, e.g., physical or virtual servers.

That really is analogous to what a traditional hypervisor does in the server virtualization world. So it is not unreasonable at all to call SANsymphony-V a “storage hypervisor.” In fact, it’s pretty darned clever positioning, and I take my hat off to the person who crafted the campaign.

Today, we’re going to play “What’s Wrong with This Picture.” First of all, take a look at the following screen capture. (You can view it full-sized by clicking on it.)

Example of Phishing Email


Now let’s see if you can list all the things that are wrong with this email. Here’s what I came up with:

  • There is no such thing as “Microsoft ServicePack update v6.7.8.”
  • The Microsoft Windows Update Center will never, ever send you a direct email message like this.
  • Spelling errors in the body of the email: “This update is avelable…” “…new futures were added…” (instead of “features”) and “Microsoft Udates” (OK, that last one is not visible in my screen cap, so it doesn’t count).
  • Problems with the hyperlink. Take a look at the little window that popped up when I hovered my mouse over the link: The actual link is to an IP address (85.214.70.156), not to microsoft.com, as the anchor text would have you believe. Furthermore, the directory path that finally takes you to the executable (“bilder/detail/windowsupdate…”) is not what I would expect to see in the structure of a Microsoft Web site.
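The hyperlink check in the last bullet is the one giveaway that lends itself to automation. Here is a minimal Python sketch of the idea, using only the standard library. The function names are my own, and the sample HTML is hypothetical: it uses an address from a range reserved for documentation, not the address from the actual email. It flags a link when the target host is a raw IP address, or when the visible text names a different host than the real target.

```python
from html.parser import HTMLParser
from ipaddress import ip_address
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url_or_domain):
    """Return the lowercase hostname, or "" if the text does not look like one."""
    text = url_or_domain.strip()
    if not text or any(ch.isspace() for ch in text):
        return ""
    try:
        return urlparse(text if "://" in text else "//" + text).hostname or ""
    except ValueError:
        return ""


def is_ip_literal(host):
    try:
        ip_address(host)
        return True
    except ValueError:
        return False


def suspicious_links(html):
    """Return a list of (href, reasons) for links that deserve a second look."""
    parser = LinkCollector()
    parser.feed(html)
    findings = []
    for href, text in parser.links:
        target = host_of(href)
        shown = host_of(text)
        reasons = []
        if is_ip_literal(target):
            reasons.append("link target is a raw IP address")
        # Only compare when the visible text itself looks like a host name.
        if "." in shown and shown != target:
            reasons.append("visible text names a different host than the target")
        if reasons:
            findings.append((href, reasons))
    return findings


if __name__ == "__main__":
    # Hypothetical sample; 192.0.2.10 is a reserved documentation address.
    sample = ('<a href="http://192.0.2.10/some/path/update.exe">'
              'http://www.microsoft.com/update</a>')
    for href, reasons in suspicious_links(sample):
        print(href, "->", "; ".join(reasons))
```

This is a heuristic, not a phishing filter: plenty of malicious links use ordinary-looking domain names, so a clean result proves nothing.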

If you want to know what sp-update.v678.exe would do if you downloaded and executed it, take a look at the description on the McAfee Web site (click on the “Virus Characteristics” tab). Suffice it to say that this is not something you want on your PC.

Sad to say, I suspect that thousands of people have clicked through on it because it has the Windows logo at the top with a cute little “Windows Update Center” graphic.

Would you have spotted it as a phishing attempt? Did you spot other giveaways in addition to the ones I listed above? Let us know in the comments.

Red Cross Ready Rating Program

Ready Rating Program Seal

A few days ago, I spotted a headline in the local morning paper: “SBA Partners with the Red Cross to Promote Disaster Planning.” We’ve written some posts in the past that dealt with the importance of DR planning, and how to go about it, so this piqued my curiosity enough that I visited the Red Cross “Ready Rating” Web site. I was sufficiently impressed with what I found there that I wanted to share it with you.

Membership in the Ready Rating program is free. All you have to do to become a member is to sign up and take the on-line self-assessment, which will help you determine your current level of preparedness. And I’m talking about overall business preparedness, not just IT preparedness. The assessment rates you on your responses to questions dealing with things like:

  • Have you conducted a “hazard vulnerability assessment,” including identifying appropriate emergency responders (e.g., police, fire, etc.) in your area and, if necessary, obtaining agreements with them?
  • Have you developed a written emergency response plan?
  • Has that plan been communicated to employees, families, clients, media representatives, etc.?
  • Have you developed a “continuity of operations plan”?
  • Have you trained your people on what to do in an emergency?
  • Do you conduct regular drills and exercises?

That last point is more important than you might think. It’s not easy to think clearly when you’re in the middle of an earthquake, or when you’re trying to find the exit when the building is on fire and there’s smoke everywhere. The best way to ensure that everyone does what they’re supposed to do is to drill until the response is automatic. It’s why we had fire drills when we were in elementary school. It’s still effective now that we’re all grown up.

Once you become a member, your membership will automatically renew from year to year, as long as you take the self-assessment annually and can show that your score has improved from the prior year. (Once your score reaches a certain threshold, you’re only required to maintain that level to retain your membership.)

So, why should you be concerned about this? It’s hard to imagine that, after the tsunami in Japan and the flooding and tornadoes here at home, there’s anyone out there who still doesn’t get it. But, just in case, consider these points taken from the “Emergency Fast Facts” document in the members’ area:

  • Only 2 in 10 Americans feel prepared for a catastrophic event.
  • Close to 60% of Americans are wholly unprepared for a disaster of any kind.
  • 54% of Americans don’t prepare because they believe a disaster will not affect them – although 51% of Americans have experienced at least one emergency situation where they lost utilities for at least three days, had to evacuate and could not return home, could not communicate with family members, or had to provide first aid to others.
  • 94% of small business owners believe that a disaster could seriously disrupt their business within the next two years.
  • 15 – 40% of small businesses fail following a natural or man-made disaster.

If you’re not certain how to even get started, they can help there as well. Here’s a screen capture showing a partial list of the resources available in the members’ area:

Member Resources

You may also want to review the following articles and posts:

And speaking of getting started, check this out: Just about everything I’ve ever read about disaster preparedness talks about the importance of having a “72-hour kit” – something that you can quickly grab and take with you that contains everything you need to survive for three days. Well, for those of you who haven’t got the time to scrounge up all of the recommended items and pack them up, you may find the solution at your local Costco. Here’s what I spotted on my most recent trip:

Pre-Packaged 3-day Survival Kit

Yep, it’s a pre-packaged 3-day survival kit. The cost at my local store (in Woodinville, WA, if you’re curious) was $69.95. That, in my opinion, is a pretty good deal.

So, if you haven’t started planning yet, consider this your call to action. Don’t end up as a statistic. You can do this.

Mark Twain allegedly came up with the famous line: "Figures don’t lie, but liars figure." That’s a good thing to keep in mind any time you’re looking through a report that was sponsored ("sponsored" = "paid for") by a vendor that concludes that their product is better than the other guy’s.

Maybe it is better than the other guy’s. But you might want to look closely at what was tested, how it was tested, and whether they were, shall we say, selective in the facts they present.

Case in point: The Tolly Group’s report, released May 27, comparing VMware View 4.6 Premier Edition to Citrix XenDesktop 5 Platinum edition. There are several interesting aspects to this report, which are dealt with in detail in Tal Klein’s blog over on the Citrix Community blog site. Here are a few of the more egregious items:

  • VMware View 4.6 Premier licensing costs less than XenDesktop 5 Platinum. Absolutely true, and absolutely irrelevant. That’s like pointing out that if you load every possible dealer option onto your new car, it’s going to cost more than the basic model. Thank you, Captain Obvious. If you want an "apples-to-apples" comparison, you need to compare VMware View to the XenDesktop VDI Edition. But wait, if you do that, XenDesktop is actually less expensive, and that would be an awkward point to publish in a paper that’s being paid for by VMware.
  • VMware’s PCoIP provides a more consistent multi-media experience than XenDesktop 5. (Over a LAN. Using a single thin client device that did not support any of the Citrix HDX media acceleration features.) Sorry, guys, but once again this is not an apples-to-apples comparison. And did they publish any results of testing across a WAN link? Nope…and for the same reason they didn’t use XenDesktop VDI Edition for their price comparison.
  • It’s easier to upgrade View 4.5 to View 4.6 than it is to upgrade XenDesktop 4 to XenDesktop 5. Once again, both true and irrelevant. It’s easier to give your kitchen a new coat of paint than it is to rip out the cabinets and completely remodel it. Anybody surprised by that? There are significant architectural changes from XenDesktop 4 to XenDesktop 5. It shouldn’t be surprising to anyone that this will involve more effort than a "dot release" upgrade.

I’ve always been skeptical of vendor-sponsored "analysis" reports, and, to be fair, Citrix has used the Tolly Group in the past for its own sponsored reports – but it seems to me that this one is just over the top. Apparently, former Gartner analyst Simon Bramfitt agrees. His pithy assessment of the report: "There are undiscovered tribes lost in the darkest parts of the Amazon jungle that would know exactly what to do if a vendor airdropped a pile of competitive marketing literature authored by the Tolly Group; send it back, and asked [sic] that it be re-printed on more absorbent paper."

What do you think?

We have, for a long time, been fans of thin client devices. However, if you run the numbers, it turns out that thin-clients may not necessarily be the most cost-effective client devices for a VDI deployment.

Just before writing this post, I went to the Dell Web site and priced out a low-end Vostro Mini Tower system: 3.2 GHz Intel E5800 dual-core processor, 3 GB RAM, 320 GB disk drive, integrated Intel graphics, Windows 7 Professional 64-bit OS, 1 year next-business-day on-site service. Total price: $349.00.

When you buy a new PC with an OEM license of Windows on it, you have 90 days to add Microsoft Software Assurance to that PC. That will cost you $109.00 for two years of coverage. You’re now out of pocket $458.00. However, one of the benefits of Software Assurance is that you don’t need any other Microsoft license component to access a virtual desktop OS. You also have the rights, under SA, to install Windows Thin PC (WinTPC) on the system, which strips out a lot of non-essential stuff and allows you to administratively lock it down – think of WinTPC as Microsoft’s own tool kit for turning a PC into a thin client device.

Now consider the thin client option. A new Wyse Winterm built on Embedded Windows 7 carries an MSRP of $499. There are less expensive thin clients, but this one would be the closest to a Windows 7 PC in terms of the user experience (media redirection to a local Windows Media Player, Windows 7 user interface, etc.). However, having bought the thin client, you must now purchase a Microsoft Virtual Desktop Access (VDA) license to legally access your VDI environment. The VDA license is only available through the Open Value Subscription model, and will cost you $100/year forever. So your total cost over two years is $699 for the Wyse device vs. $458 for the Dell Vostro.

After the initial two-year term, you’ll have to renew Software Assurance on the PC for another two years. That will continue to cost you roughly $54.50/year vs. $100/year to keep paying for that VDA license.
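If you want to run the numbers yourself, the arithmetic above is easy to put into a few lines of Python. This is only a sketch using the prices quoted in this post. The function names are mine, and it assumes that Software Assurance renews at the same $109 per two-year term, that the VDA price stays at $100/year, and that neither device is replaced along the way.

```python
def pc_with_sa_cost(years, hardware=349.00, sa_per_term=109.00, term_years=2):
    """Cumulative cost of the PC plus Software Assurance.

    SA is paid a full term at a time, so a partial term is billed in full.
    Assumes the renewal price equals the initial price.
    """
    terms = -(-years // term_years)  # ceiling division on whole years
    return hardware + terms * sa_per_term


def thin_client_with_vda_cost(years, hardware=499.00, vda_per_year=100.00):
    """Cumulative cost of the thin client plus the annual VDA subscription."""
    return hardware + years * vda_per_year


if __name__ == "__main__":
    print(f"{'Years':>5} {'PC + SA':>10} {'Thin + VDA':>12}")
    for years in (2, 4, 6):
        pc = pc_with_sa_cost(years)
        thin = thin_client_with_vda_cost(years)
        print(f"{years:>5} {pc:>10.2f} {thin:>12.2f}")
```

At two years this reproduces the $458 vs. $699 figures above, and the gap widens with every renewal because the VDA subscription costs nearly twice as much per year as SA.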

Arguably, the Wyse thin client is a better choice for some use cases. It will work better in a hostile environment – like a factory floor – because it has no fan to pull dust and debris into the case. In fact, it has no moving parts at all, and will likely last longer as a result…although PC hardware is pretty darned reliable these days, and at that price point, the low-end PC becomes every bit as disposable as a thin client device.

So, as much as we love our friends at Wyse, the bottom line is…well, it’s the bottom line. And if you’re looking at a significant VDI deployment, it might be worth running the numbers both ways before you decide for sure which way you’re going to go.

No, I’m not talking about the weather here in San Francisco – that’s actually been pretty good. It’s just that everywhere you look here at the Citrix Summit / Synergy conference, the talk is all about clouds – public clouds, private clouds, even personal clouds, which, according to Mark Templeton’s keynote on Wednesday, refers to all your personal stuff:

  • My Devices – of which we have an increasing number
  • My Preferences – which we want to be persistent across all of our devices
  • My Data – which we want to get to from wherever we happen to be
  • My Life – which increasingly overlaps with…
  • My Work – which I want to use My Devices to perform, and which I want to reflect My Preferences, and which produces Work Data that is often all jumbled up with My Data (and that can open up a whole new world of problems, from security of business-proprietary information to regulatory compliance).

These five things overlap in very fluid and complex ways, and although I’ve never heard them referred to as a “personal cloud” before, we do need to think about all of them and all of the ways they interact with each other. So if creating yet another cloud definition helps us do that, I guess I’m OK with that, as long as nobody asks me to build one.

But lest I be accused of inconsistency, let me quickly recap the cloud concerns that I shared in a post about a month ago, hard on the heels of the big Amazon EC2 outage:

  1. We have to be clear in our definition of terms. If “cloud” can simply mean anything you want it to mean, then it means nothing.
  2. I’m worried that too many people are running to embrace the public cloud computing model while not doing enough due diligence first:
    1. What, exactly, does your cloud provider’s SLA say?
    2. What is their track record in living up to it?
    3. How well will they communicate with you if problems crop up?
    4. How are you ensuring that your data is protected in the event that the unthinkable happens, there’s a cloud outage, and you can’t get to it?
    5. What is your business continuity plan in the event of a cloud outage? Have you planned ahead and designed resiliency into the way you use the cloud?
    6. Never forget that, no matter what they tell you, nobody cares as much about your stuff as you do. It’s your stuff. It’s your responsibility to take care of it. You can’t just throw it into the cloud and never think about it again.

Having said that, and in an attempt to adhere to point #1 above, I will henceforth stick to the definitions of cloud computing set forth in the draft document (#800-145) released by the National Institute of Standards and Technology in January of this year, and I promise to tell you if and when I deviate from those definitions. The following are the essential characteristics of cloud computing as defined in that draft document:

  • On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service’s provider.
  • Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
  • Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, network bandwidth, and virtual machines.
  • Rapid elasticity. Capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out, and rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be purchased in any quantity at any time.
  • Measured Service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.

If you’ll read through those points a couple of times and give it a moment’s thought, a couple of things should become obvious.

First, most of the chunks of infrastructure that are being called “private clouds” aren’t – at least by the definition above. Standing up a XenApp or XenDesktop infrastructure, or even a mixed environment of both, does not mean that you have a private cloud, even if you access it from the Internet. Virtualizing a majority, or even all, of your servers doesn’t mean you have a private cloud.

Second, very few Small & Medium Enterprises can actually justify the investment required to build a true private cloud as defined above, although some of the technologies that are used to build public and private clouds (such as virtualization, support for broad network access, and some level of user self-service provisioning) will certainly trickle down into SME data centers. Instead, some will find that it makes sense to move some services into public clouds, or to leverage public clouds to scale out or scale in to address their elasticity needs. And some will decide that they simply don’t want to be in the IT infrastructure business anymore, and move all of their computing into a public cloud. And that’s not a bad thing, as long as they pay attention to my point #2 above. If that’s the way you feel, we want to help you do it safely, and in a way that meets your business needs. That’s one reason why I’ve been here all week.

So stay tuned, because we’ll definitely be writing more about the things we’ve learned here, and how you can apply them to make your business better.

If you’ve been following this blog for any length of time, you know that we’ve written extensively about XenDesktop, and spent a lot of time on best practices and problems to avoid. And one of the biggest problems to avoid is poor storage design resulting in poor VDI performance.

In a nutshell, the problem is that a Windows desktop OS uses disk far differently than a Windows server OS. Thanks to the way Windows uses the swap file, disk writes outnumber disk reads by about 2 to 1. You can build your virtual desktop infrastructure on the latest and greatest server hardware, with tons of processing power and insanely huge amounts of RAM, but if all of the disk I/O for all of those virtual desktops is hitting your SAN, you’ve got a scalability problem on your hands.

Provisioning Services (“PVS”) can help to mitigate this in two ways (assuming for sake of argument that you’re provisioning multiple virtual systems from a common, read-only image): First, if you build your Provisioning Servers correctly, you should be able to serve up most of the OS read operations from the Provisioning Server’s own cache memory. Second, you can use the virtualization host’s local disk storage as the required “write cache” – because all of those write operations have to go somewhere while the virtual system is running.

But XenDesktop 5 introduced a new way to provision desktops called “Machine Creation Services” (“MCS”). We wrote about this in the April edition of our Moose Views newsletter, so if you’re not familiar with all the pros and cons of MCS vs. PVS, I’d encourage you to take a brief time out and read that article. Suffice it to say that, despite all the advantages of MCS, the biggest downside of using MCS to provision pooled desktops was that all of the IOPS hit your SAN storage, which limited the scalability of an MCS-provisioned VDI deployment.

But all of that just changed, with the release of XenDesktop 5 Service Pack 1, which was made available for download a week ago (May 13). With SP1, XenDesktop 5 is now able to take advantage of the “IntelliCache” feature that was introduced as part of XenServer v5.6 Service Pack 2. Using MCS with the combination of XenDesktop 5 SP1 and XenServer SP2…

  • The first time a virtual desktop is booted on a given XenServer, the boot image is cached on that XenServer’s local storage.
  • Subsequent virtual desktops booted on that same XenServer will boot and run from that locally cached image.
  • You can use the XenServer’s local storage for the write cache as well.

The bottom line is that you can move as much as 90% of the IOPS off of the SAN and onto local XenServer storage, removing nearly all of the scalability limitations from an MCS-provisioned environment.

With most of the IOPS for running VMs taking place on local storage, it’s pretty straightforward to figure out how many VMs you can expect to support on a given virtualization host. Dan Feller’s blog post does a great job of walking through the process of calculating the functional IOPS that your local XenServer storage repository should be able to support, and inferring from that number how many light, normal, or power users you should be able to support as a result.
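To give a feel for that kind of calculation, here is a rough Python sketch. It is not a reproduction of the worked example in that post. It uses a common rule of thumb: reads cost one back-end I/O each, while writes are divided by a RAID write penalty. The default write fraction of 2/3 comes from the 2-to-1 write-to-read ratio mentioned earlier. The penalty table holds the usual textbook values, and the disk count, per-disk IOPS, and per-desktop IOPS in the usage section are placeholders that you should replace with your own measurements.

```python
# Usual textbook write penalties per RAID level (back-end I/Os per front-end write).
RAID_WRITE_PENALTY = {"raid0": 1, "raid1": 2, "raid10": 2, "raid5": 4, "raid6": 6}


def functional_iops(disks, iops_per_disk, raid_level, write_fraction=2 / 3):
    """Estimate the front-end IOPS a local disk array can sustain.

    Rule of thumb: reads pass through at face value; writes are divided
    by the RAID write penalty.
    """
    raw = disks * iops_per_disk
    penalty = RAID_WRITE_PENALTY[raid_level]
    reads = raw * (1 - write_fraction)
    writes = raw * write_fraction / penalty
    return reads + writes


def desktops_supported(available_iops, iops_per_desktop):
    """Whole number of desktops that fit within the available IOPS."""
    return int(available_iops // iops_per_desktop)


if __name__ == "__main__":
    # Placeholder hardware: 8 local disks at 150 IOPS each in RAID 10.
    iops = functional_iops(disks=8, iops_per_disk=150, raid_level="raid10")
    # Placeholder steady-state loads per user type; measure your own workloads.
    for label, per_desktop in (("light", 5), ("normal", 10), ("power", 25)):
        print(f"{label:>6} users: {desktops_supported(iops, per_desktop)}")
```

Treat the result as a ceiling for planning purposes: boot and logon storms drive far more I/O per desktop than steady-state use does.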

This also means that using XenServer as the hypervisor for your XenDesktop 5 deployment is going to yield a significant performance advantage over any other hypervisor, unless or until the other guys come out with similar local caching features. So, if you’re a VMware shop, my advice is this: Go ahead and virtualize all of the supporting XenDesktop server components on your vSphere infrastructure. Run your XenDesktop 5 VMs on XenServer hosts, and just don’t tell anyone! If you’re asked, just say, “Oh, yeah, these are my XenDesktop host systems – they’re completely separate from our vSphere infrastructure, because we don’t need the (insert favorite vSphere feature) function for these systems.” Your infrastructure will run better, and no one will know but you…

Many times, terms like “High Availability” and “Fault Tolerance” get thrown around as though they were the same thing. In fact, the term “fault tolerant” can mean different things to different people – and much like the terms “portal,” or “cloud,” it’s important to be clear about exactly what someone means by the term “fault tolerant.”

As part of our continuing efforts to guide you through the jargon jungle, we would like to discuss redundancy, fault tolerance, failover, and high availability, and we’d like to add one more term: continuous availability.

Our friends at Marathon Technologies shared the following graphic, which shows how IDC classifies the levels of availability:
graphic of availability levels

Redundancy is simply a way of saying that you are duplicating critical components in an attempt to eliminate single points of failure. Multiple power supplies, hot-plug disk drive arrays, multi-pathing with additional switches, and even duplicate servers are all part of building redundant systems.

Unfortunately, there are some failures, particularly if we’re talking about server hardware, that can take a system down regardless of how much you’ve tried to make it redundant. You can build a server with redundant hot-plug power supplies and redundant hot-plug disk drives, and still have the system go down if the motherboard fails – not likely, but still possible. And if it does happen, the server is down. That’s why IDC classifies this as “Availability Level 1” (“AL1” on the graphic)…just one level above no protection at all.

The next step up is some kind of failover solution. If a server experiences a catastrophic failure, the workloads are “failed over” to a system that is capable of supporting those workloads. Depending on those workloads, and what kind of failover solution you have, that process can take anywhere from minutes to hours. If you’re at “AL2,” and you’ve replicated your data using, say, SAN replication or some kind of server-to-server replication, it could take a considerable amount of time to actually get things running again. If your servers are virtualized, with multiple virtualization hosts running against a shared storage repository, you may be able to configure your virtualization infrastructure to automatically restart a critical workload on a surviving host if the host it was running on experiences a catastrophic failure – meaning that your critical system is back up and on-line in the amount of time it takes the system to reboot – typically 5 to 10 minutes.

If you’re using clustering technology, your cluster may be able to fail over in a matter of seconds (“AL3” on the graphic). Microsoft server clustering is a classic example of this. Of course, it means that your application has to be cluster-aware, you have to be running Windows Enterprise Edition, and you may have to purchase multiple licenses for your application as well. And managing a cluster is not trivial, particularly when you’ve fixed whatever failed and it’s time to unwind all the stuff that happened when you failed over. And your application was still unavailable during whatever interval of time was required for the cluster to detect the failure and complete the failover process.

You could argue that a failover of 5 minutes or less equals a highly available system, and indeed there are probably many cases where you wouldn’t need anything better than that. But it is not truly fault tolerant. It’s probably not good enough if you are, say, running a security application that’s controlling the smart-card access to secured areas in an airport, or a video surveillance system that is sufficiently critical that you can’t afford to have a 5-minute gap in your video record, or a process control system where a five-minute halt means you’ve lost the integrity of your work in process and potentially have to discard thousands of dollars worth of raw material and lose thousands more in lost productivity while you clean out your assembly line and restart it.

That brings us to the concept of continuous availability. This is the highest level of availability, and what we consider to be true fault tolerance. Instead of simply failing workloads over, this level allows for continuous processing without disruption of access to those workloads. Since there is no disruption in service there is no data loss, no loss of productivity and no waiting for your systems to restart your workloads.

So all this leads to the question of what your business needs.

Do you have applications that are critical to your organization? If those applications go down how long could you afford to be without access to them? If those applications go down how much data can you afford to lose? 5 minutes? An hour? And, most importantly, what does it cost you if that application is unavailable for a period of time? Do you know, or can you calculate it?

This is another way to ask what the requirements are for your “RTO” (“Recovery Time Objective” – i.e., how long, when a system goes down, do you have before you must be back up) and “RPO” (“Recovery Point Objective” – i.e., when you do get the system back up, how much data is it OK to have lost in the process). We’ve discussed these concepts in previous posts. These are questions that only you can answer, and the answers are significantly different depending on your business model. If you’re a small business, and your accounting server goes down, and all it means is that you have to wait until tomorrow to enter today’s transactions, it’s a far different situation from a major bank that is processing millions of dollars in credit card transactions.
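If you have never tried to put a number on it, a back-of-the-envelope model is a reasonable place to start. The Python sketch below is deliberately simple and entirely hypothetical: the class, the function, and every dollar figure are illustrative assumptions, not data from IDC or anyone else. It charges each outage for the minutes of downtime (your RTO) plus the minutes of lost data that must be re-created (your RPO), and multiplies by the number of outages you expect in a year.

```python
from dataclasses import dataclass


@dataclass
class OutageProfile:
    """What one outage looks like under a given availability solution."""
    rto_minutes: float        # how long the workload is unavailable
    rpo_minutes: float        # how many minutes of data are lost
    outages_per_year: float   # how often you expect this to happen


def annual_outage_cost(profile, downtime_cost_per_minute, data_loss_cost_per_minute):
    """Expected yearly cost: (downtime + lost data) per outage, times frequency."""
    per_outage = (profile.rto_minutes * downtime_cost_per_minute
                  + profile.rpo_minutes * data_loss_cost_per_minute)
    return per_outage * profile.outages_per_year


if __name__ == "__main__":
    # Illustrative numbers only; substitute figures from your own business.
    scenarios = {
        "restart on surviving host": OutageProfile(10, 5, 2),
        "cluster failover": OutageProfile(1, 0, 2),
        "continuous availability": OutageProfile(0, 0, 2),
    }
    for name, profile in scenarios.items():
        cost = annual_outage_cost(profile, downtime_cost_per_minute=500,
                                  data_loss_cost_per_minute=1000)
        print(f"{name:<28} ${cost:,.0f} per year")
```

Compare the differences between those yearly figures with the price differences between the solutions, and you have a first-pass answer to how much availability your business can justify.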

If you can satisfy your business needs by deploying one of the lower levels of availability, great! Just don’t settle for an AL1 or even an AL3 solution if what your business truly demands is continuous availability.

Color me skeptical when it comes to the “cloud computing” craze. Well, OK, maybe my skepticism isn’t so much about cloud computing per se as it is about the way people seem to think it is the ultimate answer to Life, the Universe, and Everything (shameless Douglas Adams reference). In part, that’s because I’ve been around IT long enough that I’ve seen previous incarnations of this concept come and go. Application Service Providers were supposed to take the world by storm a decade ago. Didn’t happen. The idea came back around as “Software as a Service” (or, as Microsoft preferred to frame it, “Software + Services”). Now it’s cloud computing. In all of its incarnations, the bottom line is that you’re putting your critical applications and data on someone else’s hardware, and sometimes even renting their Operating Systems to run it on and their software to manage it. And whenever you do that, there is an associated risk – as several users of Amazon’s EC2 service discovered just last week.

I have no doubt that the forensic analysis of what happened and why will drag on for a long time. Justin Santa Barbara had an interesting blog post last Thursday (April 21) that discussed how the design of Amazon Web Services (AWS), and its segmentation into Regions and Availability Zones, is supposed to protect you against precisely the kind of failure that occurred last week…except that it didn’t.

Phil Wainewright has an interesting post over at ZDNet.com on the “Seven lessons to learn from Amazon’s outage.” The first two points he makes are particularly important: First, “Read your cloud provider’s SLA very carefully” – because it appears that, despite the considerable pain some of Amazon’s customers were feeling, the SLA was not breached, legally speaking. Second, “Don’t take your provider’s assurances for granted” – for reasons that should be obvious.

Wainewright’s final point, though, may be the most disturbing, because it focuses on Amazon’s “lack of transparency.” He quotes BigDoor CEO Keith Smith as saying, “If Amazon had been more forthcoming with what they are experiencing, we would have been able to restore our systems sooner.” This was echoed in Santa Barbara’s blog post where, in discussing customers’ options for failing over to a different cloud, he observes, “Perhaps they would have started that process had AWS communicated at the start that it would have been such a big outage, but AWS communication is – frankly – abysmal other than their PR.” The transparency issue was also echoed by Andrew Hickey in an article posted April 26 on CRN.com.

CRN also wrote about “lessons learned,” although they came up with 10 of them. Their first point is that “Cloud outages are going to happen…and if you can’t stand the outage, get out of the cloud.” They go on to talk about not putting “Blind Trust” in the cloud, and to point out that management and maintenance are still required – “it’s not a ‘set it and forget it’ environment.”

And it’s not like this is the first time people have been affected by a failure in the cloud:

  • Amazon had a significant outage of their S3 online storage service back in July, 2008. Their northern Virginia data center was affected by a lightning strike in July of 2009, and another power issue affected “some instances in its US-EAST-1 availability zone” in December of 2009.
  • Gmail experienced a system-wide outage for a period of time in August, 2008, then was down again for over 1 ½ hours in September, 2009.
  • The Microsoft/Danger outage in October, 2009, caused a lot of T-Mobile customers to lose personal information that was stored on their Sidekick devices, including contacts, calendar entries, to-do lists, and photos.
  • In January, 2010, failure of a UPS took several hundred servers offline for hours at a Rackspace data center in London. (Rackspace also had a couple of service-affecting failures in their Dallas area data center in 2009.)
  • Salesforce.com users have suffered repeatedly from service outages over the last several years.

This takes me back to a comment made by one of our former customers, who was the CIO of a local insurance company, and who later joined our engineering team for a while. Speaking of the ASPs of a decade ago, he stated, “I wouldn’t trust my critical data to any of them – because I don’t believe that any of them care as much about my data as I do. And until they can convince me that they do, and show me the processes and procedures they have in place to protect it, they’re not getting my data!”

Don’t get me wrong – the “Cloud” (however you choose to define it…and that’s part of the problem) has its place. Cloud services are becoming more affordable, and more reliable. But, as one solution provider quoted in the CRN “lessons learned” article put it, “Just because I can move it into the cloud, that doesn’t mean I can ignore it. It still needs to be managed. It still needs to be maintained.” Never forget that it’s your data, and no one cares about it as much as you do, no matter what they tell you. Forrester analyst Rachel Dines may have said it best in her blog entry from last week: “ASSUME NOTHING. Your cloud provider isn’t in charge of your disaster recovery plan, YOU ARE!” (She also lists several really good questions you should ask your cloud provider.)

Cloud technologies can solve specific problems for you, and can provide some additional, and valuable, tools for your IT toolbox. But you dare not assume that all of your problems will automagically disappear just because you put all your stuff in the cloud. It’s still your stuff, and ultimately your responsibility.

Back at the end of January, DataCore announced the availability of a new product called SANsymphony-V. This product replaces SANmelody in their product line, and is the first step in the eventual convergence of SANmelody and SANsymphony into a single product with a common user interface.

Note: In case you’re not familiar with DataCore, they make software that will turn an off-the-shelf Windows server into an iSCSI SAN node (Fibre Channel is optional) with all the bells and whistles you would expect from a modern SAN product. You can read more about them on our DataCore page.

We’ve been playing with SANsymphony-V in our engineering lab, and our technical team is impressed with both the functionality and the new user interface – but that’s another post for another day. This post is focused on the packaging and pricing of SANsymphony-V, which in many cases can come in significantly below the old SANmelody pricing.

First, we need to recap the old SANmelody pricing model. SANmelody nodes were priced according to the maximum amount of raw capacity that each node could manage. The full-featured HA/DR product could be licensed for 0.5 TB, 1 TB, 2 TB, 3 TB, 4 TB, 8 TB, 16 TB, or 32 TB. So, for example, if you wanted 4 TB of mirrored storage (two 4 TB nodes in an HA pair), you would purchase two 4 TB licenses. At MSRP, including 1 year of software maintenance, this would have cost you a total of $17,496. But what if you had another 2 TB of archival data that you wanted available, but didn’t necessarily need it mirrored between your two nodes? Then you would want 4 TB in one node, and 6 TB in the other node. However, since there was no 6 TB license, you’d have to buy an 8 TB license. Now your total cost is up to $21,246.
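To make that rounding-up concrete, here is a minimal Python sketch. It assumes a node simply needs the smallest listed license size that covers its raw capacity; the function name is made up for illustration and is not part of any DataCore tool.

```python
# License sizes (in TB) listed above for the full-featured HA/DR product.
SANMELODY_LICENSE_SIZES_TB = [0.5, 1, 2, 3, 4, 8, 16, 32]


def smallest_license_tb(raw_capacity_tb):
    """Return the smallest listed license size that covers the node's raw capacity."""
    for size in SANMELODY_LICENSE_SIZES_TB:
        if size >= raw_capacity_tb:
            return size
    raise ValueError(f"no listed license covers {raw_capacity_tb} TB")


# The two nodes from the example: 4 TB in one, 6 TB in the other.
print(smallest_license_tb(4))  # 4
print(smallest_license_tb(6))  # 8
```

The 6 TB node lands on the 8 TB license, which is where the extra cost in the example comes from.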

SANsymphony-V introduced the concept of separate node licenses and capacity licenses. The node license is based on the maximum amount of raw storage that can exist in the storage pool to which that node belongs. The increments are:

  • “VL1” – Up to 5 TB – includes 1 TB of capacity per node (more on this in a moment)
  • “VL2” – Up to 16 TB – includes 2 TB of capacity per node
  • “VL3” – Up to 100 TB – includes 8 TB of capacity per node
  • “VL4” – Up to 256 TB – includes 40 TB of capacity per node
  • “VL5” – More than 256 TB – includes 120 TB of capacity per node

In my example above, with 4 TB of mirrored storage and 2 TB of non-mirrored storage, there is a total of 10 TB of storage in the storage pool: (4 × 2) + 2 = 10. Therefore, each node needs a “VL2” node license, since the total storage in the pool is more than 5 TB but less than 16 TB. We also need a total of 10 TB of capacity licensing. We’ve already got 4 TB, since 2 TB of capacity were included with each node license. So we need to buy an additional six 1 TB capacity licenses. At MSRP, this would cost a total of $14,850 – substantially less than the old SANmelody price.

The cool thing is, once we have our two VL2 nodes and our 10 TB of total capacity licensing, DataCore doesn’t care how that capacity is allocated between the nodes. We can have 5 TB of mirrored storage, we can have 4 TB in one node and 6 TB in the other, we can have 3 TB in one node and 7 TB in the other. We can divide it up any way we want to.

If we now want to add asynchronous replication to a third SAN node that’s off-site (e.g., in our DR site), that SAN node is considered a separate “pool,” so its licensing would be based on how much capacity we need at our DR site. If we only cared about replicating 4 TB to our DR site, then the DR node would only need a VL1 node license and a total of 4 TB of capacity licensing (i.e., a VL1 license + three additional 1 TB capacity licenses, since 1 TB of capacity is included with the VL1 license).
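The arithmetic in the two examples above can be sketched in a few lines of Python. This is only an illustration: the tier table comes from the list earlier in this post, but I’m assuming that “up to” includes the boundary value and that extra capacity is always bought in 1 TB licenses, as it is in the examples. The function name is hypothetical, and prices are left out because only totals are quoted here.

```python
import math

# Node license tiers from the list above:
# (name, maximum pool size in TB or None for no upper limit, TB included per node)
NODE_TIERS = [
    ("VL1", 5, 1),
    ("VL2", 16, 2),
    ("VL3", 100, 8),
    ("VL4", 256, 40),
    ("VL5", None, 120),
]


def license_plan(pool_tb, node_count):
    """Return (node tier, number of extra 1 TB capacity licenses) for one pool."""
    for name, max_pool_tb, included_tb in NODE_TIERS:
        if max_pool_tb is None or pool_tb <= max_pool_tb:
            included_total = included_tb * node_count
            # Capacity that is not already bundled with the node licenses.
            extra = max(0, math.ceil(pool_tb - included_total))
            return name, extra
    raise ValueError("no tier matched")  # not reachable: VL5 has no upper limit


# HA pair: 4 TB mirrored on both nodes plus 2 TB unmirrored = 10 TB in the pool.
print(license_plan(pool_tb=4 * 2 + 2, node_count=2))  # ('VL2', 6)

# DR node in its own pool, replicating 4 TB.
print(license_plan(pool_tb=4, node_count=1))  # ('VL1', 3)
```

Because the off-site node is its own pool, it is sized separately from the HA pair rather than being added to the 10 TB total.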

At this point, no new SANmelody licenses are being sold – although, if you need to, you can still upgrade an existing SANmelody license to handle more storage. If you’re an existing SANmelody customer with current software maintenance, rest assured that you will be entitled to upgrade to SANsymphony-V as a benefit of your software maintenance coverage. However, there will not be a mechanism that allows for an easy in-place upgrade until sometime in Q3. In the meantime, an upgrade from SANmelody to SANsymphony-V would entail a complete rebuild from the ground up. (Which we would be delighted to do for you if you just can’t wait for the new features.)