Storage and fabric virtualization

7 08 2007

Aloha Open Systems Storage Guy,

What’s your take on virtualization? VSAN from Cisco, SVC from IBM? What other virtualization products are available from other vendors?

Thanks,
John

Cisco VSANs and IBM’s SVC are different things for certain :)

The VSAN lets you create multiple logical fabrics within the same switch: you tell the switch which ports belong to which VSAN, and you can manage each fabric individually. It’s especially useful if you’re bridging two locations’ fabrics together for replication, because with the right enterprise software feature you can do “inter VSAN routing” (IVR). That lets you have two separate fabrics whose devices can see each other, so if the link between the sites fails (which is more likely than a switch failure), you won’t have the management nightmare of rebuilding the original fabric out of two separated fabrics when the link comes back. VSANs are also commonly used to isolate groups of devices, keeping them logically separated from parts of the network they’ll never need to interact with.
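
If it helps to picture it, here’s a rough sketch (plain Python, purely illustrative- this is not Cisco’s CLI or anything a switch actually runs) of the bookkeeping involved: ports get assigned to VSANs, devices in the same VSAN can see each other, and an IVR-style exception lets two specific fabrics talk:

```python
# Illustrative model only: how a switch might track which ports belong to
# which VSAN and decide device visibility. Not real Cisco SAN-OS behaviour.

class Switch:
    def __init__(self):
        self.port_vsan = {}      # port name -> VSAN id
        self.ivr_routes = set()  # pairs of VSANs allowed to talk (IVR-style)

    def assign(self, port, vsan):
        self.port_vsan[port] = vsan

    def allow_ivr(self, vsan_a, vsan_b):
        # Hypothetical stand-in for an "inter VSAN routing" feature
        self.ivr_routes.add(frozenset((vsan_a, vsan_b)))

    def can_see(self, port_a, port_b):
        a, b = self.port_vsan[port_a], self.port_vsan[port_b]
        if a == b:
            return True                               # same logical fabric
        return frozenset((a, b)) in self.ivr_routes   # or explicitly routed

sw = Switch()
sw.assign("fc1/1", 10)   # local production fabric
sw.assign("fc1/2", 10)
sw.assign("fc1/3", 20)   # fabric bridged from the remote site
sw.allow_ivr(10, 20)

print(sw.can_see("fc1/1", "fc1/2"))  # True: same VSAN
print(sw.can_see("fc1/1", "fc1/3"))  # True only because IVR is allowed
```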

IBM’s SVC is a different technology, meant to consolidate multiple islands of FC storage. It’s essentially a Linux server cluster that you place between your application servers and the storage. It lets you take all the storage behind it and create what they call “virtual disks”- essentially a LUN that’s presented to a server but is built from multiple RAID arrays (possibly from multiple controllers). This gives you the option of striping your data across more spindles than you normally could, and allows you to do dynamic thin provisioning as your datasets grow.
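
Here’s a similarly rough sketch of the “virtual disk” idea (again just illustrative Python, not IBM’s actual implementation- the extent size and round-robin striping are assumptions for the example): the appliance carves the back-end arrays into fixed-size extents and builds the LUN it presents to the server out of extents taken from several arrays:

```python
# Sketch of a virtual disk striped across several back-end RAID arrays
# (possibly on different controllers). Illustrative only; the extent size
# and layout are arbitrary assumptions, not SVC's real internals.

EXTENT_SIZE = 16 * 1024 * 1024  # 16 MiB extents, chosen for the example

class VirtualDisk:
    def __init__(self, backend_arrays):
        self.backend_arrays = backend_arrays  # arrays behind the appliance

    def locate(self, byte_offset):
        """Map a logical byte offset on the virtual disk to (array, extent on
        that array), using simple round-robin striping across the arrays."""
        extent_index = byte_offset // EXTENT_SIZE
        array = self.backend_arrays[extent_index % len(self.backend_arrays)]
        return array, extent_index // len(self.backend_arrays)

vdisk = VirtualDisk(["DS4300-raid5-A", "DS4300-raid5-B", "cheap-array-raid10"])
for offset in (0, 20 * 1024 * 1024, 40 * 1024 * 1024):
    print(offset, "->", vdisk.locate(offset))
```

The server just sees one LUN; where each block actually lives is the appliance’s problem, which is also why it can move data between arrays without the server noticing.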

The only downside of the Cisco VSAN technology I can think of is its cost- it’s bloody expensive compared to a cheap low-end solution, and for anything smaller than a 50-device FC fabric I would question whether it’s worth it. There is an alternative from Brocade/McData called LSAN, though I’m not as familiar with it. I’ve been told it’s slightly less complicated, but harder to manage, and doesn’t have Cisco’s full feature set.

The downside to the IBM SVC is that you add latency to your disk reads- every time a server needs to read data, the request has to go through the Linux cluster first. The SVC has a much larger cache than most controllers, so there’s a better chance the data you’re looking for is already there, but if it’s not, your read performance might suffer a little because of the extra few milliseconds. The advantage is that you can now use incredibly cheap controllers with tiny amounts of cache, and you can migrate data from any manufacturer’s device to any other manufacturer’s device without interrupting your servers. Under a virtualized environment like this, an older DS4300 like you have will perform on roughly the same level as a more expensive DS4800 or EMC CX3-80 (assuming the same number of drives), because you don’t really use the cache of the underlying system. Another advantage of the SVC is pricing: most FC storage controllers charge you, either up front or over time, for the number of servers you plan to connect to them- IBM charges a “partition license” fee for LUN masking, and EMC charges a “multipath maintenance” tax. The multipath drivers for SVC are free, and the SVC only needs one partition from the controller, so you might be able to save money that way.
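
To put rough numbers on that read-latency trade-off, here’s a back-of-the-envelope model (every figure below is made up for illustration, not a measurement of SVC or anything else):

```python
# Back-of-the-envelope read latency model. All numbers are assumptions
# chosen for illustration, not measurements of any real product.

def effective_read_latency_ms(hit_rate, cache_hit_ms, backend_ms, hop_ms):
    # Hits are served from the appliance's cache; misses go to the back-end
    # controller and also pay the extra hop through the appliance.
    return hit_rate * cache_hit_ms + (1 - hit_rate) * (backend_ms + hop_ms)

# Assumed figures: 1 ms cache hit, 8 ms back-end read, 0.5 ms extra hop
for hit_rate in (0.2, 0.5, 0.8):
    latency = effective_read_latency_ms(hit_rate, 1.0, 8.0, 0.5)
    print(f"hit rate {hit_rate:.0%}: ~{latency:.1f} ms per read")
```

The extra hop only hurts on misses, so whether you notice it comes down to the hit rate your particular workload gets out of the bigger cache.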

Did you have any specific questions about these topics you want more detail on?

Also, one of the new bloggers in the storage world, Barry Whyte, focuses on IBM SVC. He just started, but his blog will hopefully become a real resource for people with IBM storage virtualization on their minds.


8 responses

7 08 2007
Barry Whyte

You rang ;-P

I’m in the middle of writing some stuff that should be of interest, plan to post over the next few days…

Anything specific, feel free to ask.

8 08 2007
John Call

Thanks OSSG! I’ve already got Barry’s RSS into my Google Reader. I’m patiently waiting for his posts. :)

I’ve had a quote prepared for the Cisco MDS line. You’re right, it is bloody expensive. I’m curious whether there are any other types of “virtualization” going on in the SAN besides fabric virtualization a la Cisco VSAN / Brocade LSAN, and what looks to me like LUN virtualization a la IBM SAN Volume Controller (are there any companies who make competing products to SVC?). What other types of abstraction exist in the SAN?

I’d like a bit more clarification on some things you said. First, increased latency. Does this happen for both reads and writes? The expanded cache on the Linux cluster nodes provides for a greater cache hit %, but I assume the penalty would only apply to reads. Write operations would be cached and control passed back to the host straight away, right? Second, partition licensing. I saw quite a few 0’s behind the partition license cost of our last purchase. Are you saying that I can create as many partitions as I like w/ the SVC w/out purchasing expensive partition licenses? I may be ignorant of a better way to do things: when setting up boot-from-SAN for my HS20 blades and VMware VI3, I’m chewing up a partition license for each blade. It seems like the HS20 requires LUN 0 for its boot LUN. Please correct me if I’ve horribly misunderstood the boot-from-SAN setup. The multi-pathing driver sounds like a win. I’m not terribly familiar w/ non-IBM licensing for MPIO drivers, but I’m excited about moving up from the RDAC to the SDD. What should I look out for w/ this transition?

Thanks OSSG,
John

8 08 2007
John Call

Sorry to hit you up again so quickly OSSG. I took another look at Barry Whyte’s inaugural post and wanted to ask a question about his comment…

“I plan to stick to the facts and prompt readers, interested in Storage Virtualization, to carefully consider their infrastructure needs to make an educated and constructed decision when it comes to thinking about, or actually going ahead and virtualizing their SAN environment.”

Why wouldn’t I want a virtualized SAN environment? I’m not sure of all that is entailed in virtualizing a SAN. I look forward to Barry’s post about the advantages of an appliance-based solution and all of the features it provides. Maybe before then you could sound off with your comments.

Thanks,
John

9 08 2007
Barry Whyte

Some comments over here

9 08 2007
opensystemsguy

Phew! Start late one day and the inbox piles up ;)

I’ll address John’s questions in this comment first:

“I’m curious whether there are any other types of “virtualization” going on in the SAN besides fabric virtualization a la Cisco VSAN / Brocade LSAN, and what looks to me like LUN virtualization a la IBM SAN Volume Controller (are there any companies who make competing products to SVC?). What other types of abstraction exist in the SAN?”

-IBM’s SVC is what’s called “in-band” virtualization and relies on a box between the servers and the storage. A competitor is Hitachi’s high-end boxes, which can run other vendors’ controllers behind their own controller. I’ve heard good things about it, but it carries a fairly steep entry price and I don’t think they offer the virtualization engine without their own storage. Aside from that, the only other in-band storage virtualization I know of is FalconStor- they offer a software solution that does essentially the same thing as SVC, with no need to buy a new disk array like Hitachi.

The other type of virtualization is called “out of band”- the logic for the virtualization lives in the switch. Cisco is (obviously) a proponent of this because you need one of their directors to run the blade where all the magic is done, and it’s also EMC’s approach to virtualization. The upside is that there’s little to no latency added; the downside is that there’s no cache added, and I don’t believe it provides the same range of options as an in-band appliance would. I’m going to have to see if I can get an EMC or Cisco blogger to guest-write a blurb explaining the technology.

“Does [increased latency] happen for both reads and writes? The expanded cache on the Linux cluster nodes provides for a greater cache hit %, but I assume the penalty would only apply to reads. Write operations would be cached and control passed back to the host straight away, right?”

-Correct. Reads are affected by the latency, but writes go much faster, as it’s hard to max out the large cache usually present in in-band appliance engines. Read latency is indeed offset by cache hits, but how much depends on your workload. You might be able to have your vendor run modeling software that would analyze your workload and project your read hits if you increased the cache.

“Are you saying that I can create as many partitions as I like w/ the SVC w/out purchasing expensive partition licenses?”

-As far as I know, yes. The SVC looks like one host to the storage device, so you only need one partition.
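
To show why that matters for the bill, here’s the arithmetic with hypothetical prices (the per-partition cost is invented purely for illustration):

```python
# Hypothetical arithmetic only: partition licenses needed with and without
# an in-band appliance in front of the controller. The price is made up.

hosts = 14                     # servers that need storage from the controller
price_per_partition = 3000     # assumed cost of one partition license

direct_attached = hosts * price_per_partition  # one partition per host
behind_svc = 1 * price_per_partition           # the SVC looks like a single host

print("direct-attached licensing:", direct_attached)
print("behind the SVC:           ", behind_svc)
```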

“I’m excited about moving up from the RDAC to the SDD. What should I look out for w/ this transition?”

-As with many infrastructure changes, there will be downtime and possibly version headaches. Make sure your servers, applications, and storage firmware support the version of the multipath driver you move to. Also, you might want to consider bringing in some hired guns who have done it before, just in case. It’s an hourly investment, but you can learn a lot from good service contractors :)

And here’s the big one: “Why wouldn’t I want a virtualized SAN environment? I’m not sure of all that is entailed in virtualizing a SAN”

-Barry might disagree with me on this one, but virtualization is not a silver bullet. While it can increase your disk utilization and decrease the management overhead of multiple storage devices, it is also an added layer of complexity. It is widely interoperable with storage devices, but while your DS4300 might be fully compatible with your server environment, SVC might not be. In fact, I know from rather painful personal experience that at one time SVC had issues with VMware. Officially it was not supported without special permission, and the special permission came with restrictions on how you could use it. Typically, you are limited in the versions and types of guest operating systems you can run.

If your environment is compatible, its benefits outweigh its complications, but then you have to factor in its cost. While it allows you to avoid spending top dollar on cache-heavy controllers, it is licensed by the TB, adding a linear cost to your growth. If you would have bought low-end FC arrays anyway, it might not be worth it. If you are buying it instead of an expensive high-cache machine, the cost of the solution is offset by the fact that you can sometimes reduce the price of your disk by an order of magnitude by buying a low-end array.

The price of an array is made up of two parts- disk capacity, and everything else. Generally, the disks are the part you have to pay for over time as you grow, and this cost eventually dwarfs the investment in everything else (controller, cache, licensing, features, etc.). Adding virtualization to the equation increases the cost of the capacity you get, and that can be significant.
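
Here’s a toy version of that cost math (all numbers invented, just to show the shape of the curves):

```python
# Toy cost model with invented numbers: a fixed controller cost plus a
# per-TB disk cost, with the virtualization license adding its own per-TB term.

def total_cost(fixed, per_tb_disk, per_tb_license, capacity_tb):
    return fixed + capacity_tb * (per_tb_disk + per_tb_license)

for tb in (10, 50, 200):
    high_end     = total_cost(150_000, 4_000,     0, tb)  # cache-heavy array
    low_end      = total_cost( 30_000, 1_500,     0, tb)  # cheap array on its own
    low_end_virt = total_cost( 30_000, 1_500, 1_000, tb)  # cheap array + per-TB license
    print(f"{tb:>4} TB  high-end: {high_end:>7}  low-end: {low_end:>7}  low-end + virtualization: {low_end_virt:>7}")
```

Past a certain capacity the per-TB terms dominate whatever you paid for the controller, which is exactly why the per-TB license can be significant even when the underlying disk is cheap.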

9 08 2007
Barry Whyte

Looks like we are in agreement for the most part- see the comments on my blog regarding latency, however. It’s like when I had my RX-8: the first thing everyone said was “drinks a lot of oil”, when it actually used only a little more than my Celica had done. Where had that statement come from? Press coverage that had neglected to actually check whether it was any different from the RX-7. Anyway, while it’s obvious that sticking something in the middle of your SAN that all data flows through *could* add latency, we have done everything possible to keep it to a minimum.

As for the silver bullet, I’d agree- there is a time and a place, and it’s not for everyone, hence my original quoted comment.

Ongoing currency support is a nightmare for our test organizations spread throughout the world, especially when some OEMs make it as awkward as possible… but we manage, and there’s much to be said for finding a set of levels that work and ‘not fixing what ain’t broke’.

9 08 2007
opensystemsguy

In this case, it’s not an OEM- it’s VMware. Their software doesn’t play nice when there’s a non-critical failure (the kind where the multipath driver is supposed to keep you up during repairs). The only workaround is to use Windows guest machines and pass the storage directly to them. The way everyone uses VMware is to pass the LUNs to the ESX hypervisor, which then passes storage as needed to the guest machines- this functionality is required for all the nifty VMware features like VMotion and VMware HA, and it breaks (and is not supported by Big Blue) when you try to use SVC.

9 08 2007
Barry Whyte

It’s always a problem, and we often get asked “why don’t you support X” or “why is X restricted with Y”. When we find issues during our testing with other vendors’ products, or with the way they interact with SVC, what option do we have? In some cases we can work around an issue by adding special handling code to SVC, but when it’s a bug or deficiency in another product, we can only report it and wait for a fix or change in behaviour. Until then we have to restrict usage or impose SAN maintenance restrictions to ensure it doesn’t cause an SVC outage.
