Opened 12 years ago

Last modified 12 years ago

#66 new defect

severe memory leak in OMSA6 dsm_sa_snmp32d on Ubuntu Lucid 64-bit

Reported by: Gavin McCullagh <gavin.mccullagh@…> Owned by:
Priority: critical Milestone:
Version: 6.0.1 Keywords: memory leak snmp
Cc:

Description

Hi,

I have a 64-bit Ubuntu install which was running Hardy and has been running Lucid since an upgrade last week.

Looking at htop, I can see:

  PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
25757 root      20   0  382M  336M  2888 S  0.0  5.8  0:00.02 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d
25758 root      20   0  382M  336M  2888 S  0.0  5.8  0:11.21 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d
25760 root      20   0  382M  336M  2888 S  0.0  5.8  0:01.14 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d
25761 root      20   0  382M  336M  2888 S  0.0  5.8  0:00.02 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d
25770 root      20   0  382M  336M  2888 S  0.0  5.8  0:06.75 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d
25771 root      20   0  382M  336M  2888 S  0.0  5.8  0:00.08 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d

The MEM% seems to be stepping up in increments of about 3-4%, possibly each time the SNMP agent is queried (we have Nagios polling it every 5 minutes). The data_eng stuff then stops responding after a period of time, at which point (until now) I've been restarting it. Attached is an image showing the munin RAM usage on the machine since the upgrade to Lucid.
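
One way to put numbers on the growth between Nagios polls is to log the resident set size of the agent directly instead of reading it off htop. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names and the 512 MiB limit are illustrative and not part of OMSA. Note that htop lists threads by default, so the six identical rows above probably describe a single process.

```python
import os
import re


def parse_name(status_text):
    """Return the process name from /proc/<pid>/status text, or None."""
    match = re.search(r"^Name:\s+(.+)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else None


def parse_vmrss_kib(status_text):
    """Return VmRSS in KiB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_by_pid(process_name, proc_root="/proc"):
    """Map PID -> resident set size (KiB) for processes with the given name.

    The kernel truncates the Name field to 15 characters;
    "dsm_sa_snmp32d" is 14 characters long, so it matches in full.
    """
    result = {}
    if not os.path.isdir(proc_root):
        return result
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # the process exited while we were scanning
        if parse_name(text) == process_name:
            rss = parse_vmrss_kib(text)
            if rss is not None:
                result[int(entry)] = rss
    return result


def exceeds_limit(rss_kib, limit_mib):
    """True when the resident set size is above limit_mib mebibytes."""
    return rss_kib > limit_mib * 1024


if __name__ == "__main__":
    # Hypothetical threshold; pick one that suits the machine.
    for pid, rss in sorted(rss_by_pid("dsm_sa_snmp32d").items()):
        flag = "OVER LIMIT" if exceeds_limit(rss, 512) else "ok"
        print(f"{pid} {rss // 1024} MiB {flag}")
```

Run from cron every few minutes, the output gives a per-poll record of the leak; what to do when the limit is exceeded (for example restarting the service by hand, as described above) is left out of the sketch.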

When installing OMSA on Lucid, I decided to try OMSA v6. Perhaps v5 would be safer? I also had trouble with libstdc++5 being unavailable on Lucid. As suggested somewhere, I copied the library from the 32-bit jaunty package here:

http://de.archive.ubuntu.com/ubuntu/pool/universe/g/gcc-3.3/libstdc++5_3.3.6-17ubuntu1_i386.deb

We're also regularly getting errors like this one:

Jun 22 12:09:20 cuimhne kernel: [ 9423.640213] megasas: Failed to copy out to user sense data
Jun 22 12:09:20 cuimhne kernel: [ 9423.642845] megasas: Failed to copy out to user sense data
Jun 22 12:09:20 cuimhne kernel: [ 9423.646571] megasas: Failed to copy out to user sense data
Jun 22 12:09:20 cuimhne kernel: [ 9423.661891] megasas: Failed to copy out to user sense data
Jun 22 12:09:20 cuimhne kernel: [ 9423.678531] megasas: Failed to copy out to user sense data
Jun 22 12:09:20 cuimhne kernel: [ 9423.681089] megasas: Failed to copy out to user sense data
Jun 22 12:09:20 cuimhne kernel: [ 9423.683658] megasas: Failed to copy out to user sense data

which apparently can be fixed with a kernel patch which I haven't yet had time to try:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/566853
https://bugs.launchpad.net/ubuntu/+bug/544982

Gavin

Attachments (1)

cuimhne.staff.gcd.ie-memory-week.png (63.9 KB) - added by anonymous 12 years ago.


Change History (6)

Changed 12 years ago by anonymous

comment:1 Changed 12 years ago by anonymous

later on in the same day....

PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command

25757 root 20 0 1982M 1935M 2888 S 0.0 33.1 0:00.02 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d 25758 root 20 0 1982M 1935M 2888 S 0.0 33.1 1:10.51 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d 25760 root 20 0 1982M 1935M 2888 S 0.0 33.1 0:06.46 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d 25761 root 20 0 1982M 1935M 2888 S 0.0 33.1 0:00.12 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d 25770 root 20 0 1982M 1935M 2888 S 0.0 33.1 0:38.89 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d 25771 root 20 0 1982M 1935M 2888 S 0.0 33.1 0:00.45 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d

comment:2 Changed 12 years ago by Gavin McCullagh <gavin.mccullagh@…>

Ahem. With formatting this time:

  PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
25757 root      20   0 1982M 1935M  2888 S  0.0 33.1  0:00.02 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d
25758 root      20   0 1982M 1935M  2888 S  0.0 33.1  1:10.51 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d
25760 root      20   0 1982M 1935M  2888 S  0.0 33.1  0:06.46 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d
25761 root      20   0 1982M 1935M  2888 S  0.0 33.1  0:00.12 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d
25770 root      20   0 1982M 1935M  2888 S  0.0 33.1  0:38.89 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d
25771 root      20   0 1982M 1935M  2888 S  0.0 33.1  0:00.45 /opt/dell/srvadmin/dataeng/bin/dsm_sa_snmp32d

comment:3 Changed 12 years ago by anonymous

I tried a downgrade to dellomsa v5 and the same (or similar) memory leak seemed to be present.

For some reason, v5 wasn't able to see the temperature probes, so I've moved back to v6.

gavinmc@cuimhne:~$ sudo omreport chassis temps
 Error! No temperature probes found on this system.


Gavin

comment:4 follow-up: Changed 12 years ago by anonymous

I have the same issue with 5 R410 servers on 10.4 64bit with OMSA 6. Definitely critical... Anyone with ideas what to do?

comment:5 in reply to: ↑ 4 Changed 12 years ago by anonymous

Replying to anonymous:

I have the same issue with 5 R410 servers on 10.4 64bit with OMSA 6. Definitely critical... Anyone with ideas what to do?

Now I followed this old advice: http://lists.us.dell.com/pipermail/linux-poweredge/2007-February/029648.html

Uninstall OpenManage and use IPMI for monitoring. This page encouraged me to write my own script: https://hep.pa.msu.edu/twiki/bin/view/AGLT2/DellCactiSetup
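
For anyone taking the same IPMI route, a check script might look roughly like the sketch below. It is not the script referred to above. It assumes `ipmitool` is installed and that `ipmitool sdr type Temperature` prints pipe-separated rows such as `Ambient Temp | 0Eh | ok | 7.1 | 22 degrees C`; the function names and thresholds are made up for illustration. The return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess


def parse_sdr_temperatures(output):
    """Extract (sensor name, degrees C) pairs from `ipmitool sdr` style output.

    Rows without a numeric Celsius reading (for example "Disabled" or
    "No Reading") are skipped. A list is used rather than a dict because
    several sensors can share the same name.
    """
    readings = []
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 5:
            continue
        parts = fields[4].split()
        if len(parts) >= 3 and parts[1] == "degrees" and parts[2] == "C":
            try:
                readings.append((fields[0], float(parts[0])))
            except ValueError:
                continue
    return readings


def nagios_status(readings, warn_c, crit_c):
    """Return (exit code, message) for the hottest sensor."""
    if not readings:
        return 3, "UNKNOWN: no temperature readings"
    name, hottest = max(readings, key=lambda pair: pair[1])
    if hottest >= crit_c:
        return 2, f"CRITICAL: {name} at {hottest:g} C"
    if hottest >= warn_c:
        return 1, f"WARNING: {name} at {hottest:g} C"
    return 0, f"OK: hottest sensor {name} at {hottest:g} C"


def read_temperatures():
    """Run ipmitool locally (usually needs root) and parse its output."""
    result = subprocess.run(
        ["ipmitool", "sdr", "type", "Temperature"],
        capture_output=True, text=True, check=True,
    )
    return parse_sdr_temperatures(result.stdout)
```

A Nagios wrapper would call `nagios_status(read_temperatures(), warn_c, crit_c)`, print the message and exit with the returned code. Keeping the parsing separate from the `ipmitool` call means it can be tested against captured output without touching the BMC.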
