Oracle RAC ASM on a JBOD- Mike’s question

21 07 2008

I had Oracle RAC with ASM running in a RAW configuration on dual 32 bit servers running RedHat 4. I upgraded to dual qla2200 HBAs on each server and they connect through a Brocade 2250 switch to two JBOD disk arrays, a NexStor 18f and a NexStor 8f. I have set up multipath in the multibus configuration and can see the drives as multipath devices in an active-active configuration. I am using OCFS2 for my CRS voting and config disks and that runs fine. When I try to start ASM I can get proper connectivity on the first server that starts up, the second server hands until it eventually errors with a access issue.

I have verified all permissions required are set and can see the disks on both sides using the oracleasm utility. It appears DM is only allowing a single host to access the ASM disks at one time, so when node A starts up and acquires the ASM disks when the ASM instance starts, node B is left hung, visa versa if Node B starts first.

I was told it may be a SCSI reservation issue, but can’t seem to find any information on this. I know people are using this type of configuration to RAID controllers but is the JBOD causing issues? How to get both instances seeing the ASM disks?

Thanks!

Hi Mike! To preface my answer, I’ll start by saying that my Oracle knowledge was purely acquired through osmosis, and that I’m primarily a platform guy. I’ve never sat in front of an Oracle server and done anything, but I do understand a fair bit about how they interact with storage :)

First, when you say “I upgraded to dual qla2200 HBAs on each server”, do you mean that you had a working system using ASM and RAC before you changed the HBA hardware? If so, without even going into the rest of the story, I would start by checking Oracle’s and Nexstor’s support for that card and seeing if they have any known issues with the firmware level you’re using.

Second, it really sounds like your main issue is a multipath one. A “SCSI reservation issue” is another way of saying that a server is locking the devices to itself, which is exactly what multipathing software is supposed to fix. There are several places that it could break down: the application, the OS, the hardware, or the firmware. The only way to see which level your problem comes from is to try to eliminate them by swapping them out. I’d start with Oracle- ASM is supposed to really get down and directly control the disks as raw devices, so they might have a compatibility matrix that contains the whole stack. Maybe it’s as simple as Oracle not supporting your firmware…

If not, you’ll have to do some trouble shooting and vendor support calls until you find out where in your config the error is. I am fairly certain that Oracle can fix this for you though.