Opened 14 years ago

Closed 9 years ago

#72 closed defect (worksforme)

jobmond memory leak

Reported by: ramonb
Owned by: ramonb
Priority: normal
Milestone: 1.0
Component: jobmond
Version: 0.3.1
Keywords:
Cc:
Estimated Number of Hours:

Description

At least in version 0.3.1 there is a memory leak present in jobmond.

I have now received multiple reports of jobmond consuming several gigabytes of memory after running for a while.

Change History (8)

comment:1 Changed 14 years ago by ramonb

  • Owner changed from somebody to ramonb
  • Status changed from new to assigned

comment:2 Changed 14 years ago by ramonb

This seems related to pbs-python.

When I run this loop:

>>> import time
>>> import PBSQuery
>>> ps = PBSQuery.PBSQuery()
>>> while True:
...     j = ps.getjobs()
...     time.sleep(10)
...

the Python process's memory grows steadily, by as much as 1 MB per getjobs() iteration.

This could very well be a PBSQuery bug; Bas (van der Vlies) is looking into it.

Meanwhile, I can try to work around this leak.
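One possible workaround — a sketch, not necessarily what jobmond ended up doing — is to isolate the leaky query in a short-lived child process, so whatever memory the query leaks is reclaimed by the OS when the child exits and the long-running daemon's footprint stays flat. Here `query_jobs` is a hypothetical stand-in for the real `PBSQuery.PBSQuery().getjobs()` call:

```python
import multiprocessing

def query_jobs():
    # Hypothetical stand-in for PBSQuery.PBSQuery().getjobs();
    # must return picklable data so it can cross the process boundary.
    return {"123.master": {"job_state": "R"}}

def _child(conn):
    # Runs in the child: perform the (leaky) query, ship the result back.
    conn.send(query_jobs())
    conn.close()

def isolated_query():
    # Each call forks a fresh child, so any memory leaked by the query
    # dies with the child instead of accumulating in the daemon.
    parent_end, child_end = multiprocessing.Pipe()
    p = multiprocessing.Process(target=_child, args=(child_end,))
    p.start()
    jobs = parent_end.recv()
    p.join()
    return jobs

if __name__ == "__main__":
    print(isolated_query())
```

Note that `multiprocessing` first appeared in Python 2.6; on the Python 2.4 systems mentioned later in this ticket, `os.fork()` plus a pipe would serve the same purpose.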

comment:3 Changed 14 years ago by ramonb

  • Priority changed from critical to normal

This seems directly related to pbs-python: version 2.9.8 has the issue, while 2.9.4 does not.

comment:4 Changed 13 years ago by mike.scchen@…

I'm having a similar problem now. System config: CentOS 5.4, GNU compiler 4.1.2, python 2.4.3, PgSQL 8.1.18-2 Packages: jobmonarch 0.3.1 ganglia 3.0.7 torque 2.3.7 rrdtool 1.4.2 pbs_python 3.2.0 pyPgSQL 2.5.1 py-rrdtool 1.0b1 All these packages are built from source.

The jobarchived daemon leaks ~600 MB of memory per day in my case, while continuously giving this message: "Unhandled exception in thread started by" (by what?) This message appears no matter whether I use rrdtool's internal python module (built when python-devel exists), the external py-rrdtool, or even deprecated calls to the rrdtool binary. Except for this, my jobarchived installation works perfectly.

comment:5 follow-up: Changed 13 years ago by mike.scchen@…

Ow, bad formatting. Fixing:

I'm having a similar problem now.
System config: CentOS 5.4, GNU compiler 4.1.2, python 2.4.3, PgSQL 8.1.18-2
Packages:
jobmonarch 0.3.1
ganglia 3.0.7
torque 2.3.7
rrdtool 1.4.2
pbs_python 3.2.0
pyPgSQL 2.5.1
py-rrdtool 1.0b1
All these packages are built from source.

The jobarchived daemon leaks ~600 MB of memory per day in my case, while continuously giving this message:
"Unhandled exception in thread started by".
This message appears no matter whether I use rrdtool's internal python module (built when python-devel exists), the external py-rrdtool, or even deprecated calls to the rrdtool binary.
Except for this, my jobarchived installation works perfectly.

comment:6 in reply to: ↑ 5 Changed 13 years ago by ramonb

  • Cc mike.scchen@… added

I think you are suffering from the threading bug in jobarchived, as discovered and described in tickets #45 and #34.

I'm working on fixing this, but it will probably require some major rewrites in the threading model for jobarchived. Unfortunately there is no easy fix at this time.

I'm hoping to get this fixed for the upcoming version 1.0 release.

This, by the way, has nothing to do with the memory leak in this particular ticket #72, which is related to the pbs_python querying module used by jobmond. That is an entirely separate, unrelated issue.

Replying to mike.scchen@…:

Ow, bad formatting. Fixing:

I'm having a similar problem now.
System config: CentOS 5.4, GNU compiler 4.1.2, python 2.4.3, PgSQL 8.1.18-2
Packages:
jobmonarch 0.3.1
ganglia 3.0.7
torque 2.3.7
rrdtool 1.4.2
pbs_python 3.2.0
pyPgSQL 2.5.1
py-rrdtool 1.0b1
All these packages are built from source.

The jobarchived daemon leaks ~600 MB of memory per day in my case, while continuously giving this message:
"Unhandled exception in thread started by".
This message appears no matter whether I use rrdtool's internal python module (built when python-devel exists), the external py-rrdtool, or even deprecated calls to the rrdtool binary.
Except for this, my jobarchived installation works perfectly.

comment:7 Changed 13 years ago by ramonb

  • Cc mike.scchen@… removed

comment:8 Changed 9 years ago by ramonb

  • Milestone set to 1.0
  • Resolution set to worksforme
  • Status changed from assigned to closed

Should be fixed in 1.0.
