Opened 13 years ago

Last modified 12 years ago

#28 assigned defect

5.5.0 seems to cause freeze on R300 servers running lenny

Reported by: guillaume Owned by: bas
Priority: critical Milestone:
Version: 5.5.0 Keywords:
Cc: xani666@…, kae@…

Description

Hello,

I had some Dell R300 servers running the 5.4 package without problems for a few months. I upgraded the servers to lenny, and two days later I installed the new 5.5 package via apt-get upgrade.

On the 1950s the upgrade went fine, but on every R300 the upgrade froze the server (no ping, no serial console via IPMI). After a manual reboot, the server freezes again if I try to restart OMSA.

I tried to downgrade and reinstall 5.4.0 (apt-get --purge remove dellomsa; apt-get install dellomsa), but the server freezes again when I restart OMSA. Perhaps the uninstall script left some files behind or changed the 5.4 base configuration?

Has anyone else experienced this problem? I will test on a fresh lenny install to see whether the upgrade process is what causes the problem.

PS: Someone else is having the same issue: http://lists.us.dell.com/pipermail/linux-poweredge/2009-January/038331.html

/guillaume

Change History (24)

comment:1 Changed 13 years ago by bas

  • Owner set to bas
  • Status changed from new to assigned

We do not have any R300 servers, so I cannot help you much with this issue. What I do know is that dpkg -P should delete all the files that dellomsa installed. Maybe some files are left in /var?

comment:2 Changed 13 years ago by Vincent

Same problem with a fresh install of lenny on an R300. During apt-get install dellomsa, the SSH connection freezes at the message "Starting dsm_sa_eventmgr32d: ." and the network seems to be down, but the machine still responds to the keyboard. I also get the following kernel message: Program dsm_sa_datamgr3 tried to access /dev/mem between fffff000->100010000.

comment:3 Changed 13 years ago by onyx.peridot@…

  • Summary changed from 5.5.0 upgrade from 5.4.0 on R300 servers seems to cause freeze on lenny to 5.5.0 degraded RAID5 on MD1000 unusable

I see the same problem on my PowerEdge 1950 + MD1000 combo. My MD1000 is a 15-disk SATA RAID5 with one hot spare. When I simulated a single-disk failure, my RAID5 crashed in degraded mode. More specifically, my LVM2 physical and logical volumes broke, and the EXT3 file system on top of LVM2 became read-only and unusable while the MD1000 was rebuilding the RAID5 onto its hot spare drive.

Once the RAID5 rebuild had completed, I had to reboot to get things working again. Although no data was lost, the degraded RAID was completely unusable during the rebuild.

comment:4 Changed 13 years ago by onyx.peridot@…

I ran a slight variation of my previous test. This time I rebooted immediately after the single-disk failure (LVM2 and EXT3 had both crashed after the disk failure). The MD1000 started rebuilding and the degraded RAID5 was usable, with the whole LVM/EXT3 stack operational. So the problem is that the machine must be rebooted to get a usable degraded RAID5.

comment:5 Changed 13 years ago by Guillaume

  • Summary changed from 5.5.0 degraded RAID5 on MD1000 unusable to 5.5.0 seems to cause freeze on R300 servers running lenny

Hello,

Could you please open a separate ticket? This seems to be a different problem.

Thank you

comment:6 Changed 13 years ago by goth

  • Priority changed from major to critical

It works with the 2.6.24 kernel on an R300, but I get the same crash with 2.6.26, and I cannot uninstall dellomsa:

    dpkg -P dellomsa
    (Reading database ... 24916 files and directories currently installed.)
    Removing dellomsa ...
    dpkg - warning: while removing dellomsa, unable to remove directory `/opt': Device or resource busy - directory may be a mount point ?
    Purging configuration files for dellomsa ...
    rm: cannot remove `/etc/ld.so.conf.d/dell-omsa.conf': No such file or directory
    dpkg - warning: while removing dellomsa, unable to remove directory `/opt': Device or resource busy - directory may be a mount point ?

comment:7 follow-up: Changed 13 years ago by bas

Did you reboot the system and try it again? It is strange that you cannot remove the package.

comment:8 in reply to: ↑ 7 Changed 13 years ago by goth

Replying to bas:

Did you reboot the system and try it again? It is strange that you cannot remove the package.

Rebooting and retrying worked. I hope a workaround for installing OMSA on the R300 will be found soon.

comment:9 Changed 13 years ago by xani666@…

  • Cc xani666@… added

I've got a similar problem on T300, R300 and 1950 servers. After installing dellomsa, the system was unreachable over the network (the direct console worked). After a restart it worked fine until I changed the IP address and rebooted; then it "cut all network", including IPMI, but you could still log in on the normal console.

comment:10 Changed 12 years ago by bas

Strange. We use dellomsa 5 on 1850, 1950 and M600 servers running Debian Lenny without any problem. We have just upgraded all our servers to dellomsa version 6; maybe this will solve your problem. We do not use the standard Debian kernel but a vanilla kernel from www.kernel.org.

comment:11 Changed 12 years ago by xani666@…

I also use a vanilla kernel. The fix for me was adding ifdown -a and ifup -a to the dellomsa init scripts. It worked properly when I was getting the IP via DHCP; my guess is that dhclient sets the interface parameters after dellomsa has started.
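A minimal sketch of the workaround above, written as a POSIX shell function that could be called at the end of the "start" branch of the OMSA init script. The function name, the DRY_RUN switch and the idea of calling it from /etc/init.d/dataeng are assumptions for illustration; only the ifdown -a / ifup -a pair comes from the comment. Running it over SSH will drop the session, so try it from the console.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with dellomsa): re-apply the ifupdown
# configuration after the OMSA daemons have started (workaround from comment 11).

# Execute a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take down the interfaces ifupdown has brought up, then bring up all
# interfaces marked "auto" in /etc/network/interfaces.
restart_interfaces_after_omsa() {
    run ifdown -a
    run ifup -a
}
```

Running `DRY_RUN=1 restart_interfaces_after_omsa` prints the two commands without touching the network.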

comment:12 Changed 12 years ago by bas

Thanks for the info. That explains why I never encountered this: we use DHCP on all our nodes.

comment:13 Changed 12 years ago by crbr-dell@…

The problem seems to be related to the tg3 network driver. On my R300s the network card stops responding after OMSA is loaded, and I have to unload and reload the tg3 module to make it work again. As far as I have tested, the problem exists with the standard Debian kernels (etch 2.6.18 and lenny 2.6.26) as well as with custom kernels (the last one I tried was 2.6.30). I also tried OMSA version 6 and the old 5.4, but neither helped, so I don't think it is specific to version 5.5.0 of Dell OMSA. I fear it might be hardware related.

The PE1850 (using Intel e1000) and the PE1950/PE1955/M600/M610 and others (using Broadcom bnx2) are immune to this problem because they simply do not use the same network cards.

comment:14 Changed 12 years ago by xani666@…

Have you tried just doing ifdown/ifup? It worked for me.

comment:15 Changed 12 years ago by noah-junk@…

Just adding a "me too" here for a fresh Lenny AMD64 install on a T300.

Any ideas for a fix? Has anyone successfully integrated the workarounds mentioned into the OMSA initscripts?

comment:16 Changed 12 years ago by anonymous

I have the same problem on a Dell R300: after installing dellomsa with apt-get, the network is inaccessible.

comment:17 Changed 12 years ago by anonymous

The error appears when I run /etc/init.d/networking restart:

    Reconfiguring network interfaces ... Master 'eth0' : Error: handshake with driver failed. Aborting

comment:18 Changed 12 years ago by franck.leprette@…

I found a temporary fix. After dataeng is launched, I remove the bonding module (rmmod bonding), then reload it (modprobe bonding), and then restart the network (/etc/init.d/networking restart). After that, it works.

It seems that the bonding module (and perhaps the network modules) needs to be loaded after dellomsa. Is there a way to do this at every startup?
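A minimal sketch of how the temporary fix above could be run at every startup. It assumes the bonding setup described in the comment, and it assumes OMSA's dataeng service is already running when the script executes (for example when called from /etc/rc.local, which Debian runs near the end of the boot sequence). The function name and the DRY_RUN switch are hypothetical; the three commands are the ones from the comment.

```shell
#!/bin/sh
# Hypothetical startup hook (not part of dellomsa): reload the bonding driver
# and restart networking once dataeng is running (workaround from comment 18).

# Execute a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reload_bonding_after_omsa() {
    # Unload and reload the bonding module so it is initialised after OMSA.
    run rmmod bonding
    run modprobe bonding
    # Re-apply the interface configuration from /etc/network/interfaces.
    run /etc/init.d/networking restart
}
```

It would be called as `reload_bonding_after_omsa` from /etc/rc.local, before that file's final `exit 0`; `DRY_RUN=1 reload_bonding_after_omsa` only prints the commands.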

comment:19 Changed 12 years ago by anonymous

Same problem on my R300, but I'd like to point out that it works fine on the R200. Is there a difference between the two?

comment:20 follow-up: Changed 12 years ago by guillaume

Has anyone tried OMSA 6.0.1 on the R300? Does it behave the same way?

comment:21 in reply to: ↑ 20 Changed 12 years ago by theo

Replying to guillaume:

Has anyone tried OMSA 6.0.1 on the R300? Does it behave the same way?

Yes, same behaviour here, with various default lenny and lenny-backports kernels.

comment:22 Changed 12 years ago by saranl@…

Same here: a fresh Debian 5.0.3 setup on a new R300 with dellomsa_5.4.0-1_i386.deb breaks networking at install time and whenever dataeng is restarted. Does anyone have any news on this?

comment:23 Changed 12 years ago by saranl@…

Correction: the package I tried was dellomsa_5.5.0-5_i386.deb.

A fresh Debian 5.0.3 setup on a new R300 with dellomsa_5.5.0-5_i386.deb breaks networking at install time and whenever dataeng is restarted. Does anyone have any news on this?

comment:24 Changed 12 years ago by kae@…

  • Cc kae@… added

We have the same problems (ten R300 servers). Is this likely to get resolved?
