9 thoughts on “VDI and IOPS”

  1. Dan, I liked your latest post on IOPS. However, even if you know all that stuff, it is still difficult to be successful, for a few reasons that come to mind.

    Purchasing storage for VDI looks different to management and procurement. We have to help management see storage as performance, not just capacity. The people signing checks want to use all the available resources to maximize their investment; they do not recognize that consuming disk capacity beyond what the disks can deliver in performance results in outages.

    Also related to the purchasing of storage for VDI, vendors start eliminating parts to lower the cost of their solution to stay competitive. So, maybe their solution was solid, but procurement starts beating them up and the solution gets reworked and looks a lot different in the end.

    When we rush to purchase storage to meet project timelines, regardless of the available information about the desktop profile for VDI, we don’t have enough time to make the best decision. Even with slower timelines, information overload lowers the odds of selecting a suitable solution. It takes a lot of valuable time to get familiar with all the reference architectures, take customer reference calls, and read blogged feedback on the available solutions.

    The vendor needs to ask the right questions so they have what they need to correctly size the solution. This is not always obvious until it is too late. A trade-secret VDI calculator is hard to validate.

    There is also the risk of not recognizing when you are leaving a valuable VAR out of the loop, one that could have helped all the parties involved be successful.

    Lastly, overcoming a normalcy bias. Virtualization for servers does not typically require the same high level of design involvement as a VDI solution. Management underestimates the importance of design features and does not want to spend more money on a solution that is sized correctly until the VDI environment experiences large and prolonged outages.

    Eric Gustafson

  2. At Quest, that’s exactly what we do: cache commonly read blocks in RAM, then coalesce and serialize writes to limit the amount of disk access.
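    The two techniques named above can be sketched in a toy model. This is illustrative only, not Quest’s implementation: an LRU read cache serves hot blocks from RAM, while writes to the same block are coalesced in memory so a flush issues one serialized disk write per dirty block. All names here are hypothetical.

    ```python
    from collections import OrderedDict

    class BlockCache:
        """Toy sketch: RAM read cache plus write coalescing (illustrative only)."""

        def __init__(self, capacity_blocks):
            self.capacity = capacity_blocks
            self.read_cache = OrderedDict()  # LRU map: block id -> data
            self.pending_writes = {}         # coalesced dirty blocks
            self.disk_reads = 0              # counter showing disk savings

        def read(self, block, load_from_disk):
            if block in self.pending_writes:      # read-your-writes from RAM
                return self.pending_writes[block]
            if block in self.read_cache:          # cache hit: no disk access
                self.read_cache.move_to_end(block)
                return self.read_cache[block]
            data = load_from_disk(block)          # cache miss: one disk read
            self.disk_reads += 1
            self.read_cache[block] = data
            if len(self.read_cache) > self.capacity:
                self.read_cache.popitem(last=False)  # evict least recent
            return data

        def write(self, block, data):
            # Coalesce: later writes to the same block replace earlier ones,
            # so repeated writes cost one disk write at flush time.
            self.pending_writes[block] = data

        def flush(self, write_to_disk):
            for block in sorted(self.pending_writes):  # serialized order
                write_to_disk(block, self.pending_writes[block])
            flushed = len(self.pending_writes)
            self.pending_writes.clear()
            return flushed
    ```

    A real implementation would sit below the hypervisor’s storage path and bound the dirty-write buffer, but the savings mechanism is the same: repeated reads and rewrites of hot blocks never touch the spindles.
    
    
    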

    As for boot storms, it’s not just about VM creation or boot time. As we all know, we can boot VMs in chunks of 10 or 20 at a time to limit disk IO, but when doing a wholesale update to thousands of VMs, the time that takes is substantial. If 99% of the read IO is coming from RAM, the boot storm is a non-issue.

    Example: I have 1000 Win7 desktops spread across 7 VDI hosts and I want to delete them all and replace them with 1000 Win7 SP1 VMs. Booting Win7 reads about 300MB of data from disk, so booting 125+ per host would require ~40GB of data to be read from disk. If we cache the boot process in RAM, we read only 300MB from disk (per host).
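    A quick back-of-envelope check of those numbers, using the figures from the example (1000 VMs across 7 hosts, ~300 MB read per Win7 boot):

    ```python
    # Figures from the example above: 1000 Win7 VMs, 7 hosts, ~300 MB per boot.
    vms, hosts, mb_per_boot = 1000, 7, 300

    vms_per_host = vms / hosts                       # roughly 143 VMs per host
    uncached_gb = vms_per_host * mb_per_boot / 1024  # every boot hits disk
    cached_mb = mb_per_boot                          # shared blocks read once

    # ~42 GB read from each host's disks without a cache, versus ~300 MB with one.
    ```

    That is roughly the ~40GB per host cited above, and it shows why the cached case turns a storm into a single image’s worth of reads.
    
    
    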

    The fundamental issue is that to deliver the IO to sustain this process we either need to stagger it out over time, or move the workloads to SAN where we can provide enough spindles to deliver the IO, as 1U servers don’t have enough drive slots to handle the required IO. This leaves us with only SAN or SSD as viable options.

    Now if we cache read IO in RAM and optimize write IO, we can use commodity local disk, like 6 × 15K RPM SAS drives, to easily deliver the IO and provide more than enough storage for 100-200 VMs per host.
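    The spindle math behind that claim can be sketched as follows. All figures here are my assumptions, not from the comment: ~180 IOPS per 15K RPM SAS spindle, a RAID-10 write penalty of 2, a write-heavy VDI mix (20% reads / 80% writes), and ~5 steady-state IOPS per desktop.

    ```python
    # Assumed figures (not from the comment): 180 IOPS per 15K spindle,
    # RAID-10 write penalty of 2, 20/80 read/write mix, 5 IOPS per desktop.
    SPINDLES, IOPS_PER_SPINDLE = 6, 180
    READ_PCT, WRITE_PCT, WRITE_PENALTY = 0.2, 0.8, 2
    BACKEND_IOPS = SPINDLES * IOPS_PER_SPINDLE  # raw back-end budget

    def desktops_supported(read_hit_rate, iops_per_vm=5):
        # Reads served from the RAM cache never reach disk; write coalescing
        # is ignored here, so the write side is pessimistic.
        backend_cost_per_fe_io = (READ_PCT * (1 - read_hit_rate)
                                  + WRITE_PCT * WRITE_PENALTY)
        frontend_iops = BACKEND_IOPS / backend_cost_per_fe_io
        return int(frontend_iops // iops_per_vm)
    ```

    With a 99% read-hit rate this lands within the 100-200 desktops per host described above, even before crediting any write coalescing.
    
    
    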

    Customers don’t want to hear that they need special storage, special servers, special networks, and so on (all expensive) to deliver a virtual desktop that is “supposed to be” less expensive than the physical equivalent.

    Remove that requirement for a good percentage of the virtual desktops (the non-persistent VDI or session-host workloads), spend the money on SAN, FC/iSCSI, clustering, and HA only for the desktops that must be highly available (e.g. developer desktops), and the capital expenditure for the project drops significantly.

    Marry that with something that just works without any specialized virtualization skills (i.e. all the desktop team needs to know is how to install Windows and their apps, and the broker configures and manages the VDI host), and you have real value.

    1. I will continue to disagree that a boot storm is a problem for most customers; it’s not. The underlying point of my post was that too many people focus on the boot storm and read-IOPS caching, while the ignored and bigger problem is write IOPS. Read caching has a lot of solutions out there; it isn’t the barrier, it’s marketing. Shared storage solutions can easily hold GBs of data in cache, with no disk impact.

      Optimized write IO onto spinning disk doesn’t replace the need for SSD for writes, and doesn’t scale beyond the next generation of Intel processors. Distributed IO, handling it locally as opposed to putting it all back on the centralized storage solution as you are doing, is great architecture. Distributed write IO is a big deal, and you’ve done great work here.

      My crystal ball says the hypervisor will do many of these things (read caching, write optimization) in the future. We’re on the cusp of storage vendors having SSD solutions for both reads and writes; time will tell whether they solve the problem at a price that is cost-effective. For the time being, the major vendors are definitely challenged here, and your optimizations will indeed save dollars for non-persistent deployments.

      Citrix VDI-in-a-Box does not have the RAM-as-disk optimizations that Quest has, but it does have an architecture based on local-disk deployments. I can’t wait to see some integration of these ideas in XenDesktop, and better yet at the hypervisor.

      Thanks for the comments; I enjoy the discussion… and congratulations on the latest release of vWorkspace, it has my attention.
