John- hardware vs. software RAID, RAID 5 or 10?

23 07 2007

“Aloha Open Systems Storage Guy,

I’m a recent convert to storage administration. I’m having a hard time cutting through the cruft to find the truth. Could you answer some of these questions?

1 – Which is faster, software-based RAID (e.g. Linux md, Windows Dynamic Disks) or hardware-based RAID? One person said that software-based RAID is faster because it has a faster processor and more RAM/cache (something like a Xeon 3.0 GHz w/ 4GB of RAM would be typical in my environment). But how could that stack up against my (little bit old) IBM DS4300 Turbo (2GB cache)?

2 – Which is faster, RAID-5 or RAID-10 (or is that RAID-01?) I know everybody says RAID-10, but what about those fancy XOR engines? Or have I fallen prey to marketing?

Thanks for taking a moment to listen to my questions.
Mahalo (Thank you),

Hi John, and welcome to the blog!

To answer your questions, I’m first going to give a bit of background info. If any of my statements don’t make sense, please reply and I’ll answer :).

The term “faster” can mean different things to different people. Each type of storage has its strengths and weaknesses, and different applications perform differently on the same storage systems. There are two primary application workloads- those that do random IOs, and those that do sequential IOs.

The random workloads are the hardest ones to provide storage for because it’s very difficult to “read ahead” by predicting where the next read will fall. An example of an application that has a random workload would be a database or email server.

The sequential workloads are easier to provide storage for. Pre-fetching the next block means that, most of the time, the next read is already in cache. An example of an application like this would be a backup server or certain file servers.

Another general bit of info is that in a RAID, reads (not writes) are usually the bottleneck. Writes are usually fed into the cache and acknowledged to the host server immediately. Reads, however, are typically 70% of the IO being done by a system, and as we discussed are often impossible to “pre-cache”.

When you’re calculating performance, the two stats you’ll want to know are IOs per second for random loads and MB per second for sequential loads (abbreviated IOPS and MBPS). When you’re trying to tune a system to be quick for your applications, you need to know the different levels of your system and which one is the bottleneck. Normally, on a decent controller, the number of spindles you have in the RAID will determine the IOPS. You should get a roughly linear increase in performance as you add drives to a RAID, until the controller itself becomes the limit. Cache is important for the 30% of writes you can expect (your mileage may vary); however, everything goes to disk eventually, and most people experiencing slow performance on their disk controllers simply don’t have enough disks.
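To put rough numbers on the "more spindles, more IOPS" idea, here is a minimal sketch. The per-drive IOPS figures are rule-of-thumb assumptions of mine, not measurements, and the function names are made up for illustration. It also assumes the controller is not the bottleneck.

```python
import math

# Rule-of-thumb random IOPS per drive. These figures are assumptions for
# illustration, not measurements from a DS4300 or any other array.
ASSUMED_IOPS_PER_DRIVE = {"7200rpm": 80, "10k": 130, "15k": 180}


def estimate_array_iops(drive_count: int, drive_class: str) -> int:
    """Back-end random IOPS, assuming linear scaling with spindle count."""
    return drive_count * ASSUMED_IOPS_PER_DRIVE[drive_class]


def drives_needed(target_iops: int, drive_class: str) -> int:
    """Smallest spindle count that meets the target under the same assumption."""
    return math.ceil(target_iops / ASSUMED_IOPS_PER_DRIVE[drive_class])


if __name__ == "__main__":
    print(estimate_array_iops(12, "10k"))
    print(drives_needed(3000, "15k"))
```

Swap in figures you have actually measured on your own drives before trusting the result; the point is only that the estimate scales with the drive count.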

Onto the specifics of your question:

1- Software or Hardware RAID: For most workloads, a dedicated hardware RAID controller is faster. Software RAIDs have to share resources with the operating system, which is usually not optimized for sharing on that level. The IBM DS4300 you have is actually an LSI box, and has a very powerful RAID controller for its price. Don’t let your sales rep try to replace your controller! Those boxes may be a little old, but the only major difference between that and the newer IBMs is that the newer ones use 4 Gb Fibre Channel and more cache. It’s very rare that a workload can max out 2 Gb Fibre Channel on the front end, and even more rare that the controller can fully utilize all the bandwidth on the disk side. The extra cache can be useful, but you will experience diminishing returns- the benefit of going from 2 to 4 GB is way less than from 1 to 2 GB. The controller should not be your bottleneck for anything under 80 FC drives on the system you have, so unless you want to go beyond that, keep your box until the maintenance costs more than the replacement. Add more drives if you need IOPS or MBPS, but don’t throw it out. These boxes are supposed to be like houses- only buy a bigger one when you need it, not because the last one is obsolete.

2- RAID 5 or RAID 10: I will compare them in reliability and performance. RAID 5 uses the space of one disk for parity, and RAID 10 uses the space of half the disks. Reliability-wise, RAID 10 is the obvious winner. You can lose up to half your disks before you lose data (assuming you don’t lose both drives of the same pair). If you lose a second drive while rebuilding a critical RAID 5 array, you will always have to go back to your last backup. Generally, this is more of a worry for large SATA drives than it is for the smaller and faster FC drives- SATA RAIDs take much longer to rebuild because of the larger amount of data combined with the lower performance per spindle.
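The capacity and failure-tolerance trade-off above can be sketched in a few lines. This is only an illustration: the function names are mine, and the RAID 10 layout (drives 2k and 2k+1 forming mirror pair k) is an assumption, since real controllers choose their own pairing.

```python
def usable_drives(level: str, drive_count: int) -> int:
    """Number of drives' worth of usable space for a given RAID level."""
    if level == "raid5":
        return drive_count - 1   # one drive's worth of space holds parity
    if level == "raid10":
        return drive_count // 2  # half the drives hold mirror copies
    raise ValueError(f"unsupported level: {level}")


def survives(level: str, failed: set) -> bool:
    """True if the array still holds all data after the given drives fail."""
    if level == "raid5":
        return len(failed) <= 1
    if level == "raid10":
        # Assumed layout: drives 2k and 2k+1 form mirror pair k.
        pairs_hit = [drive // 2 for drive in failed]
        return len(pairs_hit) == len(set(pairs_hit))  # no pair lost twice
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    print(usable_drives("raid5", 12), usable_drives("raid10", 12))
    print(survives("raid10", {0, 2, 4}), survives("raid10", {0, 1}))
```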

Speaking of performance, the performance (per drive) is better on RAID 5. Most people put two RAID 5s on each enclosure, and have 4 to 6 RAIDs per hot spare. The XOR engine you speak of performs the parity calculations for RAID 5; however, it is not needed for RAID 10 or any other non-parity type of RAID. Since you do have a fairly fast controller, RAID 5 is attractive; however, you have to balance your decision based on performance and reliability.
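For anyone curious what the XOR engine is actually computing, here is a minimal sketch of RAID 5 style parity in software. It is a toy with tiny byte strings standing in for drive blocks, not how any particular controller implements it, and the names are made up for illustration.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


if __name__ == "__main__":
    # Three data blocks in one stripe (toy values).
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)

    # Pretend the drive holding data[1] failed: XOR the survivors with
    # the parity block to get the missing block back.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
```

A rebuild does this for every stripe on the failed drive, which is why every surviving drive is busy until the rebuild finishes.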




9 responses

27 07 2007
Barry Whyte

I think one of the key points to note about R5 vs R10 is the performance difference. If money is not an issue, then R10 is always better. From reading your otherwise excellent write up, the final paragraph implies that because per drive performance is better on R5, overall performance is better, which is not the case.

In a single write operation to an R5 you may have to do several reads and writes to regenerate the parity information; this is known as the write penalty on R5. With R10, yes, you have to do two writes, but these can be done in parallel. Of course, in a sequential workload you may be able to eliminate this by writing all segments in a stride (thus calculating the new parity without having to read anything), known as a ‘full stride write’.

As for reads, depending on the RAID implementation, R10 can get twice as much performance due to having two copies of the data.

These are important considerations. As I stated above, if money is not an issue then R10 will in most cases outperform R5.
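The write penalty and the full stride write described in this comment can be sketched as follows. This is a simplified model, not any vendor's implementation: it assumes the common read-modify-write approach (read old data and old parity, write new data and new parity), and the function names and I/O counters are made up for illustration.

```python
from functools import reduce
from operator import xor


def xor_parity(blocks):
    """Parity block for a stripe: byte-wise XOR of all data blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def small_write(stripe, parity, index, new_block):
    """Update one block by read-modify-write: 2 reads and 2 writes."""
    old_block = stripe[index]  # read 1: old data
    old_parity = parity        # read 2: old parity
    new_parity = bytes(
        p ^ o ^ n for p, o, n in zip(old_parity, old_block, new_block)
    )
    # write 1: new data block, write 2: new parity block
    new_stripe = stripe[:index] + [new_block] + stripe[index + 1:]
    return new_stripe, new_parity, {"reads": 2, "writes": 2}


def full_stride_write(new_blocks):
    """Write a whole stripe: parity comes from the new data, no reads."""
    blocks = list(new_blocks)
    return blocks, xor_parity(blocks), {"reads": 0, "writes": len(blocks) + 1}


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_parity(stripe)
    stripe, parity, ios = small_write(stripe, parity, 1, b"\xaa\xbb")
    print(ios, parity == xor_parity(stripe))
```

In this model a one-block update costs four disk operations on R5, against two parallel writes on R10, while a full stride write costs one write per drive and no reads.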

27 07 2007

Hi Barry, thanks for your feedback!

As for your comment, I am under the impression that the drive pairs in a RAID 10 must move in lockstep- thus the greatest number of spindles you can have reading or writing at a time is half of the total drive count. For example: if you are streaming a backup to a RAID with 16 drives in RAID 10, all 16 drives will be running at full throttle, however the throughput will be the same as a RAID 0 with 8 drives because both drives in a pair work in lockstep. Each of the 8 pairs can work on only one request at a time. You can’t (as far as I know) have one side of the pair reading from or writing to one sector and the other side another.

Even with the parity overhead in terms of latency and IO for RAID 5, the performance boost you get from going from using half the drives to all but one is significant. Of course, everyone has to make their own decision, and some controllers (especially controllers that rely on software based XOR engines) really don’t do RAID 5 fast enough to make it worth it.

In John’s case, he has controllers with dedicated RAID XOR engines, so his RAID 5 performance per drive might be good enough to justify choosing RAID 5 over RAID 10. Benchmarks are the only way to test this though.

27 07 2007
Barry Whyte

True, as with most performance questions “it all depends on the workload”.

It’s true that writes to R10 have to be in lock-step, but some implementations do make use of the dual channel read effect: since it’s the same data on both drives, you only need to read from one. (Usually only very high end RAID hardware controllers though.)

1 08 2007
John Call

Barry, OSG,

Mahalo! (Thank You). I’ve been absent for a few days because I’m trying to wrap my brain around IBM’s TotalStorage Productivity Center – Standard Edition. My intent is to get some measurements that will help me determine how my RAID is doing, and how my ‘old house’ is doing. Nice analogy. I’ve had my sales reps knocking at my door to swap out my DS4300 with a DS4700 or DS4800 – they talk about 4Gb/s a lot, but I agree. It’s pretty hard to saturate a 2Gb/s link. I’ve got some SNMP data from my FC switches that shows bursts up to 800Mb/s (I think). So I’d like to follow my gut feeling, which you’ve validated, which would be to throw more HDD behind the controller and boost my IOPS / MBPS until there’s no more room for expansion.

I appreciate your comments on ‘random vs. sequential’. My SAN supports an 18-host, 100-VM VMware infrastructure. It also supports a few Oracle instances for ERP. I’m trying to convince my TSM Backup guy to use some TBs, but he’s still a DAS guy. Please correct me if I’m wrong, but I’ve got a lot of random IO going on here.

Like I said, I’m new at this stuff. I walked onto the job and saw that our contractors had created three RAID-5 arrays (each 12+2 hot spare). Each array is composed of 12 FC drives: 73GB, 146GB, and 300GB respectively. I didn’t time the last (and only) rebuild on the 300GB array. But the vendor did a good job at placing fear in my heart of rebuilding a 500GB (or pray-not 750GB) RAID-5 array using SATA disks. Something about the odds of losing another drive: the intensive IO going on to rebuild the array would more than likely break a second disk and forfeit all the data. :) Sounds like a job-killer to me.

So, besides TPC, what performance tools can I use? I like data, not impressions, on performance. How would running the IOmeter compare? Would other traffic to/from the DS4300 controllers throw off any host-based metrics?

Thanks again guys!

1 08 2007

Hi John,

What you’re describing is indeed a random IO type environment. As for quantitative performance measurement tools, there are a few options. Iometer is a very specific tool that has its uses, but (from what I understand) it only simulates application IO traffic to the SAN to stress test it. Since you have a real workload (that probably has a few different IO characteristics than the simulation model), I don’t know what Iometer would do for you…

Another complication is that your theoretical native performance caps might not be reachable through the VMware hypervisor. If you use VMware the way many people do, with all the SAN storage allocated to a few large VDisks using VMFS to take advantage of availability features like VMotion, then the performance you see using perfmon inside several virtual servers will be slightly lower than you would see if you built those same servers as physical boxes on bare metal, because VMFS adds a little latency and overhead. It’s worth it because of the advantages of VMware, but it can complicate your efforts to quantify your performance.

The real question here, I suppose, is what information you need? What do you mean by “determine how my RAID is doing”? Are users experiencing slower performance than they want? Do you want to know if you need more spindles?

2 08 2007


Thanks for the validation. Just when I think I’ve figured something out in this storage arena I get broadsided by some new-to-me concept, or misleading information from vendors. My real question is, how do I know if it’s time to trade in my DS4300? By my guessing, we’ve got years to go on our investment. After all, we’re new to SAN, and we’ve only attached a single EXP710 to the controller. But I’d like to confirm that with numbers (maybe from TPC).

The sub-question would be, how do I go about creating a more tiered storage environment? As I said, I’ve got three types of disk… all FC

73GB – 10K
146GB – 15K
300GB – 10K

Right now I feel like we effectively have one tier of storage. Maybe you know how well VMware would run on SATA drives in the DS4300.

Thanks again!

2 08 2007

Hi John- I think I understand. The answer to the first question actually has little to do with benchmarks. The performance of a storage device is first and foremost dictated by the number, type, and configuration of drives it controls, followed by whether it has the horsepower to run those drives as fast as they can go. You can tell when your horsepower is maxed out by not seeing an improvement in performance when you add drives.

If you have 20-odd drives in your system, the controller almost certainly has the horsepower to run all of them at full tilt, so you may well have years of life left in it. Most people only buy a new controller when the new option becomes cheaper than maintenance on the old one. In the meantime, if you can afford the downtime to stress test your system before and after a drive upgrade, then by all means, run Iometer. If not, but you want to add transactional performance for your random IO apps, then you can start by adding more 10k or 15k rpm drives and test the performance from inside the virtual servers with perfmon (or TPC, but I’m less familiar with it, and I don’t know how it handles VMware).

For your sub question about tiered storage, you should first determine if you know your data well enough to identify what could be placed on SATA drives. The obvious candidates are backups and light use file systems. I’d be careful about VMware though- if all your VMware virtual disks are in the same LUN, it might be hard to differentiate the workloads handled in your ESX environment. If you do have a way to move specific virtual data to SATA without affecting the rest of the VMware environment, then you’ll have to determine which of your workloads are less transactional in nature (and ideally less active). Contrary to popular myth, SATA drives are almost as stable as FC, but they still spin at 7200 rpm, so they are poor at random IO workloads. Their other downside is that because of the long rebuild time for RAIDs, they have a slightly higher risk of data loss from a second disk failure during a rebuild. Don’t run an app on them that cannot tolerate the possibility of downtime due to a tape restore.
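To get a feel for why rebuild time matters more on big SATA drives, here is a back-of-the-envelope sketch. The rebuild rates are placeholder assumptions, not figures from any controller; real rates depend heavily on the controller and on how busy the array is with host IO while it rebuilds.

```python
def rebuild_hours(drive_capacity_gb: float, rebuild_rate_mb_s: float) -> float:
    """Hours to rewrite one whole drive at a sustained rebuild rate."""
    return drive_capacity_gb * 1000 / rebuild_rate_mb_s / 3600


if __name__ == "__main__":
    # Capacities are from the thread; the MB/s rates are assumed placeholders.
    for label, capacity_gb, rate_mb_s in [
        ("300 GB FC", 300, 50),
        ("750 GB SATA", 750, 25),
    ]:
        hours = rebuild_hours(capacity_gb, rate_mb_s)
        print(f"{label}: about {hours:.1f} h")
```

The window in which a second failure costs you the array grows with capacity and shrinks with rebuild speed, so a larger, slower drive is exposed for longer on both counts.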

8 08 2007
John Call

OSSG, I don’t mean to kick a dead subject, but I read something from Jon at that made me think about your latest comment. Does my DS4300 qualify as a ‘junk box’? Or is Jon a little over-dosed on Vicks?

Here’s a copy of the question I raised to Jon…

Can a new guy ask a question?

Jon, can you clarify what the fellow’s question was again? I’ve got an older IBM storage box (DS4300 Turbo). I hope it’s not junk. I need to add to my institution’s storage capacity and figure I’ll just strap a few extra EXP810 drawers behind the DS4300 filled with as many FC drives as my budget will afford. Are you saying that I’m setting myself up for disaster? I’d like to get a few more years ROI out of my initial DS4300 purchase. What types of vendor-gouging do I need to look out for? The reseller who sold me the storage, the same who will help me increase capacity, also sells ERP software and consulting services, and we buy those services.

Thanks Jon,

edited to add:

Sorry, I left off the link to the comment from Jon Toigo

9 08 2007

Does my DS4300 qualify as a ‘junk box’?

Actually Jon was talking about the metaphorical “junk drawer”- a repository for all sorts of unstructured but useful things. I don’t think he was talking about the hardware at all, but about the fact that most companies tend to not clean out their storage and prioritize certain data. Many organizations have a capacity problem- they can’t afford enough. This problem can be solved by investing in some sort of solution that will identify data that takes up space and needs to be available but is not really accessed often and move it off to SATA.

I think you have a few years left in your box, and attaching more spindles will increase your performance and capacity.
