Oracle RAC ASM on a JBOD- Mike’s question

21 07 2008

I had Oracle RAC with ASM running in a RAW configuration on dual 32-bit servers running RedHat 4. I upgraded to dual qla2200 HBAs on each server and they connect through a Brocade 2250 switch to two JBOD disk arrays, a NexStor 18f and a NexStor 8f. I have set up multipath in the multibus configuration and can see the drives as multipath devices in an active-active configuration. I am using OCFS2 for my CRS voting and config disks and that runs fine. When I try to start ASM I can get proper connectivity on the first server that starts up, but the second server hangs until it eventually errors out with an access issue.

I have verified all required permissions are set and can see the disks on both sides using the oracleasm utility. It appears DM is only allowing a single host to access the ASM disks at one time, so when node A starts up and its ASM instance acquires the ASM disks, node B is left hung, and vice versa if node B starts first.

I was told it may be a SCSI reservation issue, but I can’t seem to find any information on this. I know people are using this type of configuration with RAID controllers, but is the JBOD causing issues? How do I get both instances to see the ASM disks?

Thanks!

Hi Mike! To preface my answer, I’ll start by saying that my Oracle knowledge was purely acquired through osmosis, and that I’m primarily a platform guy. I’ve never sat in front of an Oracle server and done anything, but I do understand a fair bit about how they interact with storage :)

First, when you say “I upgraded to dual qla2200 HBAs on each server”, do you mean that you had a working system using ASM and RAC before you changed the HBA hardware? If so, without even going into the rest of the story, I would start by checking Oracle’s and Nexstor’s support for that card and seeing if they have any known issues with the firmware level you’re using.
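If you want to check what you’re actually running before making those calls, something along these lines is usually enough on RHEL 4 (a rough sketch- the module name and proc path depend on which QLogic driver build is loaded, and the host numbers will differ on your servers):

    # Which QLogic module is loaded, and what driver version it reports
    lsmod | grep qla
    modinfo qla2xxx | grep -i version

    # The qla2xxx driver normally exposes firmware/BIOS levels per adapter
    # somewhere under /proc/scsi/ - list it first to find your host numbers
    ls /proc/scsi/
    cat /proc/scsi/qla2xxx/1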

Second, it really sounds like your main issue is a multipath one. A “SCSI reservation issue” is another way of saying that a server is locking the devices to itself, which is exactly what multipathing software is supposed to fix. There are several places that it could break down: the application, the OS, the hardware, or the firmware. The only way to see which level your problem comes from is to try to eliminate them by swapping them out. I’d start with Oracle- ASM is supposed to really get down and directly control the disks as raw devices, so they might have a compatibility matrix that contains the whole stack. Maybe it’s as simple as Oracle not supporting your firmware…
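Before you get anyone on the phone, you can at least look for reservations yourself from either node. Here’s roughly how I’d check (a sketch- sg_persist comes from the sg3_utils package, and /dev/sda is just a stand-in for one path to an ASM disk):

    # What device-mapper thinks of the paths right now
    multipath -ll

    # Look for SCSI-3 persistent reservations on one of the ASM disks
    # (older SCSI-2 reserve/release locks won't show up here)
    sg_persist --in --read-keys /dev/sda
    sg_persist --in --read-reservation /dev/sda

If a reservation shows up as soon as the first node’s ASM instance comes up, you’ll at least know the lock is real and can take that to support.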

If not, you’ll have to do some troubleshooting and vendor support calls until you find out where in your config the error is. I am fairly certain that Oracle can fix this for you though.
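In the meantime, a quick sanity check that both nodes agree on the ASMLib labels never hurts (another sketch- DATA1 is a made-up disk label, and the scan-order parameters only exist in the 2.0-era ASMLib packages):

    # Run on both nodes: do they see the same stamped disks?
    /etc/init.d/oracleasm listdisks
    /etc/init.d/oracleasm querydisk DATA1

    # If ASMLib is binding to the single-path /dev/sd* devices instead of the
    # dm-multipath devices, odd access behaviour like yours can follow.
    # Newer ASMLib builds let you steer the scan in /etc/sysconfig/oracleasm:
    #   ORACLEASM_SCANORDER="dm"
    #   ORACLEASM_SCANEXCLUDE="sd"
    # then re-run: /etc/init.d/oracleasm scandisks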





Linux sharing a JBOD- Paul’s question

23 05 2008

“Question about multiple servers accessing same disks through a SAN switch

I’m trying to set up a Linux system for server failover where two servers (with SAS HBA) are accessing the same set of disks (jbod) through a SAN switch. First – will this work? If so, what software do I need to run on the servers to keep the two servers from stepping on each other? Do I need multipath support?”

It depends. A JBOD does not do any RAID management- it leaves that to the servers. If you have two servers trying to operate on the same disks, they have to be running cluster-aware software so they don’t overwrite each other’s data. Linux can do that (with a cluster filesystem, for example), and I know Windows has a cluster edition, but it’s certainly not the simplest way to get servers sharing data.
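To make that concrete, here is very roughly what the cluster filesystem route looks like on Linux using OCFS2 (only a sketch- it assumes the o2cb cluster stack is already configured in /etc/ocfs2/cluster.conf on both nodes, and /dev/sdb is a placeholder for the shared JBOD disk):

    # On one node: create an OCFS2 filesystem with slots for two nodes
    mkfs.ocfs2 -N 2 -L shared01 /dev/sdb

    # On both nodes: bring up the cluster stack and mount the same device
    service o2cb start
    mount -t ocfs2 /dev/sdb /mnt/shared

    # The distributed lock manager inside OCFS2 is what keeps the two
    # servers from stepping on each other's writes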

This brings me to my first question: what are you trying to do? Do you want them to run the same application so if one fails, the other will pick up the slack without losing anything? Or are you trying to process the same data twice as fast by using two server “heads”?

Secondly, when you say “SAN switch”, do you mean fibre channel switches? If you have SAS HBAs, those cannot plug into an FC network.

Thirdly, multipath support usually means letting a server see a single LUN through more than one path (often across separate fabrics). If you have more than one path between every server and drive, multipathing software would indeed be recommended.
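If it turns out you do have redundant paths, the Linux piece of that is device-mapper multipath, and the gist is something like this (a sketch only- a deliberately minimal /etc/multipath.conf with no vendor-specific device section, which your array may well need):

    # /etc/multipath.conf - treat all paths to a LUN as one load-balanced group
    defaults {
            path_grouping_policy    multibus
    }

    # Restart the daemon, then confirm each disk shows up once with all of
    # its paths listed underneath it
    service multipathd restart
    multipath -ll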





Gene’s question

21 11 2007

Gene writes:

question about SAN interoperability

…2 windows 2003 server sp2 servers, running on HP proliant dl380 g4, each with one single-port fiber HBA. Servers will be clustered to run sql 2005. HBA’s are hp branded- emulex fc2143’s (Emulex id is lp1150)…SUN SAN has both 6130’s disk array and 3510 array…we want to use disk from both arrays…(better disks in 6130, slower stuff in 3510)

Do we actually need multipath drivers? (SUN has come out with DSM’s for both these arrays)…any issue using multiple DSM’s if they are required.

Any known issues with the type of device drivers for the HBAs? Storport versus scsiport…

any help is appreciated

In general, if you only have one FC port per server, you don’t need a multipath driver. I am not sure if this holds true with multiple subsystems that aren’t under some sort of virtualization umbrella though… you might need a device driver that understands how to work with multiple subsystems. This would not be a multipath driver though- those are for multiple paths to the same LUN.

Regarding scsiport versus storport, I found an excellent whitepaper detailing the differences here. The way I read this is that these layers of the storage stack replace the proprietary device and multipath drivers provided by Sun- if they support it, then you should take storport, the more recent version. Unfortunately, I can’t give you very specific caveats with this technology because every system I’ve worked on used the vendor’s device and multi-path drivers, or a virtualization head to combine multiple physical subsystems into a logical one.

Those disk subsystems have been withdrawn from Sun’s marketing- have you asked your Sun contact whether they’ll support the setup you’re considering?





Storage and fabric virtualization

7 08 2007

Aloha Open Systems Storage Guy,

What’s your take on virtualization? VSAN from Cisco, SVC from IBM? What other virtualization products are available from other vendors?

Thanks,
John

Cisco VSANs and IBM’s SVC are different things for certain :)

The VSAN allows you to create multiple logical fabrics within the same switch- you tell it what ports are part of what SAN, and you can manage the fabrics individually. It’s especially useful if you’re bridging two locations’ fabrics together for replication or something, because it allows you to do “inter VSAN routing” if you have the right enterprise software feature. That would allow you to have two separate fabrics whose devices can see each other, but if the link between the sites fails (which is more likely than a switch failure), you won’t have the management nightmare of having to rebuild the original fabric out of two separated fabrics when the link comes back. VSANs are also commonly used to isolate groups of devices from parts of the network they’ll never need to interact with.

IBM’s SVC is a different technology that is supposed to consolidate multiple islands of FC storage. It’s essentially a Linux server cluster that you place between your application servers and the storage. It allows you to take all the storage behind it and create what they call “virtual disks”- essentially a LUN that’s passed to a server but is actually built from multiple RAID arrays (possibly from multiple controllers). This gives you the option of striping your data across more spindles than you normally could, and allows you to do dynamic thin provisioning when your datasets grow.

The only downside of the Cisco VSAN technology I can think of is its cost- it’s bloody expensive compared to a cheap low end solution, and for anything less than a 50 device FC fabric, I would question whether it’s worth it. There is an alternative from Brocade/McData they call LSAN, however I am not as familiar with it. I have been told that it’s slightly less complicated, but harder to manage, and doesn’t have the full feature-set of Cisco.

The downside to the IBM SVC is that you add latency to all your disk reads- every time a server needs to read from disk, the request has to go through the Linux cluster first. The SVC has a much larger cache than most controllers, so there’s a better chance that the data you’re looking for is already there, but if it’s not, your read performance might suffer a little because of the extra few milliseconds. The advantage is that you can now use incredibly cheap controllers with tiny amounts of cache, and it allows you to migrate data from any manufacturer’s device to any other manufacturer’s device without interrupting your servers. Under a virtualized environment like this, an older DS4300 like you have will perform pretty much on the same level as a more expensive DS4800 or EMC CX3-80 (assuming the same number of drives) because you don’t really use the cache of the underlying system.

Another advantage of the SVC is that most FC storage controllers charge you, either one time or over time, for the number of servers you’re planning to connect to them. IBM charges a “partition license” fee for LUN masking, and EMC charges a “multipath maintenance” tax. The multipath drivers for SVC are free, and it only needs one partition from the controller, so you might be able to save money that way.

Did you have any specific questions about these topics you want more detail on?

Also, one of the new bloggers in the storage world, Barry Whyte, focuses on IBM SVC. He just started, but his blog will hopefully become a real resource for people with IBM storage virtualization on their mind.





Another question from John- multipath drivers

2 08 2007

Aloha Open Systems Guy, can you take another question from me? I’ve got some questions about OS drivers for disk subsystems…What’s up with all the RDAC, MPIO/DSM, and SDD? I’ll try and keep things consistent by limiting my question to one OS (Windows Server 2003).

I’ve heard talk about SDD being superior for the ESS / DS8000 line of storage. It’s apparently not even available in an active/passive array. However, I’ve got a mid-range disk subsystem from IBM, the DS4300 Turbo model.

Until tonight I thought there was only a single choice of multi-pathing driver for me, RDAC. However, when I went about installing my first Windows OS to be SAN-connected I ran into all kinds of new information like SCSIport and STORport and now MPIO / DSM.

Can you help de-mystify this enigma for me?

Mahalo nui loa,
John

Certainly! Always happy to get more questions. I’m a chronic sufferer of writer’s block, so your questions help by providing material ;)

Each vendor dictates the support they provide for multi-path drivers, and going outside these constraints is possible, but will usually void the warranty. My experience with IBM is that they usually support something out of the box if it works, or in special cases if it can be made to work. Since they only support RDAC with the DS4000 series, I’ll bet that nothing else would work. Whether through design or technical limitation, I do not know, but I suggest that you stick with the driver they recommend.

The only limitation to RDAC is that it does not dynamically load balance- however in terms of failover protection, it’s bullet-proof.

edited to add: The other drivers you mention are supported on other IBM systems, by the way.