Barry’s question

7 01 2008

Via email:

“When you are thinking about Disaster Recovery, CDP, do you assume that Tier3 is adequate, mainly because this is backup only, or maybe DR so hopefully not needed? How does your thinking proceed? Do you think about your primary data at the same time?

I ask this as a loaded question, knowing that anything that has to copy to, snap to, or mirror with secondary, backup, DR or CDP storage now has a definite tie to the primary.

Barry Whyte

SVC Performance Architect
IBM Systems & Technology Group”

This is a loaded question! To start with, I’ll note some assumptions and clarify a few concepts to make sure we’re talking about the same thing. If I’m off on anything, let me know ;)

  • CDP: continuous data protection, a backup technique (as in IBM’s backup software) in which small changes are sent continuously to a central server
  • Tier 3: low-price random-access storage media (not tape), usually cheap SATA drives
    • Note: there’s been discussion about these tier definitions before, and I hold that tier 3 means different things to different companies.

To your question: I would have to decide based on the company’s current architecture. If they have a storage solution with synchronous mirroring between two sites, then using low-performance drives on either side will slow production. If they’re doing asynchronous replication (or a server-based rather than storage-based DR solution), I would probably be fine with SATA/tier 3.
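Here is a rough sketch of why the synchronous case is different. It assumes a deliberately simplified model: a synchronous mirror only acknowledges the host once both copies are committed, while an asynchronous one acknowledges after the local commit. The latency numbers in the example are made-up illustrations, not measurements.

```python
# Simplified model of host-visible write latency under sync vs async mirroring.
# All latency figures used below are illustrative assumptions.

def sync_write_latency_ms(local_ms: float, remote_ms: float, link_rtt_ms: float) -> float:
    """Synchronous mirror: the host is acknowledged only after BOTH copies
    commit, so the slower path (remote disk plus link round trip) wins."""
    return max(local_ms, link_rtt_ms + remote_ms)


def async_write_latency_ms(local_ms: float) -> float:
    """Asynchronous mirror: the host is acknowledged after the local commit;
    the remote copy is shipped later (as long as the buffer has room)."""
    return local_ms


if __name__ == "__main__":
    fast_primary = 6.0   # assumed ms per random write on 15K RPM FC
    slow_remote = 12.0   # assumed ms per random write on 7,200 RPM SATA
    rtt = 2.0            # assumed inter-site round trip in ms
    print(sync_write_latency_ms(fast_primary, slow_remote, rtt))  # 14.0
    print(async_write_latency_ms(fast_primary))                   # 6.0
```

In this model the slow remote drives show up directly in every synchronous write, while the asynchronous case hides them from the host, at least until its buffer fills.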

To explain my reasoning, I must first say that I cannot decide without a specific case and an IT person to question. My advice would be based on risk tolerance versus capital expenditure tolerance. Second, SATA has an undeserved bad rap: the drives are about as reliable as other enterprise drives (according to Google). SATA drives are certainly not fast for random-access loads, but for sequential and low-urgency loads like backups, they will do the job.

Low-performance media will always be part of a healthy storage balance. For most companies, the most bang for the buck will come from prioritizing their applications (or even their data) and using the media that makes the most sense for each. Need an Oracle server to stop freezing up your warehouse management app? Put that baby on 15,000 RPM FC hard drives, and lots of them. Need to keep a backup copy of a file server on site in case of a server outage? SATA will do the job. Need to keep nightly point-in-time backups of your entire storage infrastructure for years? You probably can’t afford to put that on drives at all; use tape.
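The rules of thumb above can be written down as a tiny placement function. This is only an illustration: the workload attributes and the one-year cutoff for tape are my assumptions, and a real tiering decision would weigh cost, risk tolerance and growth.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    random_io_heavy: bool     # dominated by small random reads/writes?
    latency_sensitive: bool   # do users feel it when this slows down?
    retention_years: float    # how long copies of this data must be kept


def suggest_media(w: Workload) -> str:
    """Rule-of-thumb placement; the one-year tape cutoff is an assumption."""
    if w.random_io_heavy and w.latency_sensitive:
        return "15K RPM FC, many spindles"
    if w.retention_years >= 1:
        return "tape"
    return "SATA (tier 3)"


if __name__ == "__main__":
    for w in [
        Workload("Oracle behind warehouse management", True, True, 0.1),
        Workload("on-site copy of a file server", False, False, 0.1),
        Workload("nightly point-in-time backups", False, False, 7),
    ]:
        print(f"{w.name}: {suggest_media(w)}")
```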

That said, most companies that haven’t reached a boiling point in their yearly storage spending won’t bother with much of this. Face it: tiering your applications for storage takes operator time, and to management, gear just seems cheaper than IT man-hours. That, and the explosive growth in media density over the last 5 years, has limited tiered storage adoption to either ridiculously large data producers who have no other choice (like large banks) or more forward-thinking smaller shops.



2 responses

9 01 2008

OSG, thanks for the reply. You’ve covered the essence of what I was asking. On the whole I agree with your thinking; however, I disagree that async mirroring is particularly different to sync mirroring.

This is especially true when the async mirror has been implemented with a small RPO (Recovery Point Objective), because then you are trying to keep the secondary site as close in time to the primary as possible. Given the delays over a long-distance link, this probably means an RPO of a few seconds (less than a minute); this is, for example, the RPO that SVC implements. Therefore, with a finite amount of local buffer space, if your secondary has substantially lower random performance than your primary, I’d say you are asking for trouble.
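A back-of-the-envelope way to see the buffer problem: if the secondary applies random writes more slowly than the primary produces them, the backlog grows at the difference, and a finite buffer fills in finite time. The sketch below assumes the buffer is measured in pending writes and uses invented rates.

```python
import math


def seconds_until_buffer_full(buffer_writes: int,
                              primary_write_iops: float,
                              secondary_write_iops: float) -> float:
    """Async-mirror sketch: the local buffer absorbs the gap between the rate
    the primary produces random writes and the rate the secondary applies
    them. Returns math.inf when the secondary keeps up."""
    backlog_growth = primary_write_iops - secondary_write_iops
    if backlog_growth <= 0:
        return math.inf
    return buffer_writes / backlog_growth


if __name__ == "__main__":
    # Assumed figures: buffer holds 200,000 pending writes, the primary issues
    # 4,000 random writes/s, the SATA secondary can apply only 3,000/s.
    print(seconds_until_buffer_full(200_000, 4_000, 3_000))  # 200.0
```

Once that time runs out, the system has two options: throttle the primary (behaving much like a synchronous mirror) or let the recovery point slip.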

This is a common misconception about asynchronous mirroring with a low RPO. SATA would be fine if you have enough random IO/s performance at the secondary not to slow down your primary. With a 15K RPM primary site, this would mean something on the order of 2.5 to 3 times the number of spindles at the secondary site.
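The spindle multiplier falls out of per-drive random IO/s. The per-drive figures below (about 180 IO/s for a 15K RPM FC drive, about 70 for a 7,200 RPM SATA drive) are rule-of-thumb assumptions, not vendor specs.

```python
def secondary_spindles_needed(primary_spindles: int,
                              primary_iops_per_spindle: int,
                              secondary_iops_per_spindle: int) -> int:
    """Spindles the secondary needs to match the primary's random IO/s."""
    total_iops = primary_spindles * primary_iops_per_spindle
    return -(-total_iops // secondary_iops_per_spindle)  # ceiling division


if __name__ == "__main__":
    # Assumed: 40 primary 15K FC drives at ~180 IO/s, SATA at ~70 IO/s.
    print(secondary_spindles_needed(40, 180, 70))  # 103
```

With these assumptions, 40 FC spindles need 103 SATA spindles, roughly 2.6 times as many, which lands inside the 2.5 to 3 times range.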

Your points about sequential (archive) performance and backup to SATA are valid, and I’d agree.

My CDP question was really aimed at FlashCopy (point-in-time copy). Again, if you are using SATA for the target volumes, great care is needed to ensure that the IO/s rate at the source disk is not hampered by the rate at which the random copies can be made to the target disk. Most good implementations will have a cache layer above the FlashCopy (i.e. any writes to the source are hidden from the subsequent read and clone operations). However, under some circumstances (for example, performing a complete copy of the source to the target while random I/O to the source continues), things can quickly overload the SATA target volume.
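To make the target load concrete, here is a generic copy-on-write point-in-time copy. It is a sketch of the general technique, not IBM’s FlashCopy code, and it ignores the cache layer entirely. The volume is split into grains; the first host write to an uncopied grain forces the old data onto the target, and a full background copy eventually writes every grain there.

```python
class CopyOnWriteSnapshot:
    """Generic copy-on-write point-in-time copy (a sketch, not IBM's FlashCopy
    implementation). A bitmap records which grains are preserved on the target."""

    def __init__(self, grains: int) -> None:
        self.copied = [False] * grains
        self.source_reads = 0   # extra reads of old data from the source
        self.target_writes = 0  # physical writes landing on the target

    def _preserve(self, grain: int) -> None:
        if not self.copied[grain]:
            self.source_reads += 1
            self.target_writes += 1
            self.copied[grain] = True

    def host_write(self, grain: int) -> None:
        # Old data must reach the target before the source grain is
        # overwritten, so a slow target delays this host write.
        self._preserve(grain)

    def background_copy(self) -> None:
        # A complete copy walks every grain not yet preserved.
        for grain in range(len(self.copied)):
            self._preserve(grain)


if __name__ == "__main__":
    snap = CopyOnWriteSnapshot(grains=1_000)
    for g in [1, 2, 2, 3]:      # the second write to grain 2 costs nothing extra
        snap.host_write(g)
    print(snap.target_writes)    # 3
    snap.background_copy()
    print(snap.target_writes)    # 1000
```

During a full background copy the target absorbs one write per grain on top of whatever host writes trigger, which is exactly when a slow SATA target starts to push back on the source.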

My comments come from experience in the field, where the general ‘assumption’ is that SATA is fine under these circumstances. All I’m saying is that with a standard RAID-5/6 SATA controller, you need to do the maths carefully before deciding that your async copy or FlashCopy targets can live on a stock SATA RAID-5/6 array. (Of course, the XIV distributed RAID model starts to change this substantially.)
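Part of “doing the maths” is the RAID write penalty: each small random host write costs several back-end disk operations (read-modify-write of data and parity). The sketch uses the commonly cited penalties of 2, 4 and 6; the target load and the ~70 IO/s per SATA drive are assumptions, and controller cache that coalesces full-stripe writes can improve on this in practice.

```python
# Commonly cited back-end write penalties for small random writes.
RAID_WRITE_PENALTY = {"RAID-10": 2, "RAID-5": 4, "RAID-6": 6}


def backend_iops(read_iops: int, write_iops: int, raid_level: str) -> int:
    """Disk operations the array must perform for a given host load."""
    return read_iops + write_iops * RAID_WRITE_PENALTY[raid_level]


def spindles_for(read_iops: int, write_iops: int, raid_level: str,
                 iops_per_spindle: int) -> int:
    need = backend_iops(read_iops, write_iops, raid_level)
    return -(-need // iops_per_spindle)  # ceiling division


if __name__ == "__main__":
    # Assumed: a mirror or FlashCopy target absorbing 2,000 random writes/s,
    # on SATA drives delivering ~70 random IO/s each.
    for level in ("RAID-5", "RAID-6"):
        print(level, spindles_for(0, 2_000, level, 70))
```

Under these assumptions the target needs 115 SATA spindles on RAID-5 and 172 on RAID-6, far more than a “stock” shelf.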

9 01 2008
Open Systems Guy

In my experience, asynch replication usually means that the IT staff don’t mind a more distant recovery point. That said, if they are using asynch with a short RPO, then you’re absolutely correct. The remote target must be able to go as fast as the source, or else they will have to choose between slowing the primary storage and letting their recovery point fall behind their objective.

In practice, replication setups include a periodic consistency set: the synchronous or asynchronous mirror of an open database is useless if the data and log don’t agree on a consistency point. Files are less of an issue, but email and database applications are very sensitive to this. Typically, scripting is used to halt application writes while the buffers are flushed to disk locally and remotely, and then a consistent, bootable flash copy is created.
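The scripted sequence can be sketched like this. Every class and method here (`App`, `Mirror`, `Array`, `suspend_writes`, `wait_until_drained`, `create_snapshot`) is a hypothetical stand-in that only logs its step; a real script would call the application’s and the array’s own tools. The one design point worth copying is the `finally`: writes are resumed even if a step fails.

```python
from contextlib import contextmanager


class App:  # hypothetical application hooks
    def __init__(self, log): self.log = log
    def suspend_writes(self): self.log.append("app: suspend writes")
    def flush_buffers(self): self.log.append("app: flush buffers to disk")
    def resume_writes(self): self.log.append("app: resume writes")


class Mirror:  # hypothetical replication link
    def __init__(self, log): self.log = log
    def wait_until_drained(self): self.log.append("mirror: remote copy caught up")


class Array:  # hypothetical storage array
    def __init__(self, log): self.log = log

    def create_snapshot(self, label):
        self.log.append(f"array: snapshot {label}")
        return label


@contextmanager
def writes_suspended(app):
    app.suspend_writes()
    try:
        yield
    finally:
        app.resume_writes()  # always resume, even if a step fails


def consistent_snapshot(app, mirror, array, label):
    with writes_suspended(app):
        app.flush_buffers()          # local disk now holds a consistent image
        mirror.wait_until_drained()  # remote side has the same image
        return array.create_snapshot(label)


if __name__ == "__main__":
    log = []
    consistent_snapshot(App(log), Mirror(log), Array(log), "nightly")
    print("\n".join(log))
```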

As for flash copy, different implementations use different algorithms. Assuming we’re talking about the standard “bundle of pointers” implementation, people generally put the flash data (the pointers) on the same type of media as the main data. It is so interconnected with the real data that most people feel uneasy mixing media. I have seen SATA used for this, though, and the performance hit on a speedy FC LUN was not as bad as one would expect: a huge portion of the IO/s for most workloads is reads, and the SATA only had to handle the writes, which were acknowledged to the host and put into cache before the real disk even saw them.
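A quick estimate shows why a read-heavy workload spares the SATA target. It assumes a copy-on-write scheme where reads never touch the target and only writes to not-yet-copied grains do; the 5,000 IO/s and 30% write mix are invented for illustration. Cache smooths bursts, but the sustained average still has to fit the target.

```python
def target_write_iops(host_iops: float, write_pct: float,
                      uncopied_pct: float) -> float:
    """Sustained write load on a copy-on-write target: reads never reach the
    target, and only writes that hit not-yet-copied grains do."""
    return host_iops * (write_pct / 100) * (uncopied_pct / 100)


if __name__ == "__main__":
    # Assumed workload: 5,000 host IO/s with 30% writes.
    print(target_write_iops(5_000, 30, 100))  # fresh copy, every grain uncopied
    print(target_write_iops(5_000, 30, 20))   # older copy, 20% still uncopied
```

Here the target sees about 1,500 writes/s right after the copy is taken and about 300 once most grains are already preserved, a fraction of the 5,000 IO/s the FC LUN serves.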

Overall, I agree that SATA is not for everything, but I always believe careful math should be done before deciding where to draw the primary/secondary media tier line. Every shop will have its own priorities.
