Custom Query (101 matches)

Results (46 - 48 of 101)

Ticket | Resolution | Summary | Owner | Reporter
#78 | invalid | jobmonarch data not showing up in ganglia stream | somebody | jsarlo@…
Description

We had a network issue on our cluster this morning where the frontend couldn't reach any of the nodes. We have fixed that and ganglia is showing all information again, but jobmonarch data no longer shows up. As far as I can tell, nothing should have changed. When I run telnet localhost 8649, nothing from jobmonarch shows up. If I set debug to 10 in jobmond.conf and run jobmond in the foreground instead of as a daemon, it prints the information, but it still does not appear in telnet localhost 8649. I can't figure out what might have messed this up.

Any ideas on what to try?

Thanks. Jeff
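The `telnet localhost 8649` check can also be repeated from a script. The following is a minimal Python 3 sketch, assuming that gmond writes its complete XML dump to any TCP client on port 8649 and then closes the connection, and that jobmond's metrics can be recognised by a name prefix. The default prefix `MONARCH` and the function names `read_gmond_xml` and `metrics_with_prefix` are assumptions for illustration, not part of jobmonarch; adjust the prefix to the metric names your jobmond version actually publishes.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump gmond sends to a TCP client (what telnet shows)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_with_prefix(xml_bytes, prefix="MONARCH"):
    """Return (host, metric) pairs whose METRIC NAME starts with `prefix`.

    The default prefix is an assumption; pass the one your jobmond uses.
    """
    root = ET.fromstring(xml_bytes)
    found = []
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            name = metric.get("NAME", "")
            if name.startswith(prefix):
                found.append((host.get("NAME"), name))
    return found
```

For example, `metrics_with_prefix(read_gmond_xml())` returns an empty list when no matching metrics reach gmond, which is the symptom described in this ticket. Since the debug output shows jobmond gathering the data, an empty result would suggest looking at the path from jobmond to gmond rather than at the job gathering itself.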

#40 | fixed | Jobmond dies when monitoring a number of jobs > number of processors | bastiaans | anonymous
Description

Jobmond dies with the following traceback when it is monitoring more jobs than there are processors:

Traceback (most recent call last):
  File "/usr/local/sbin/jobmond.py", line 811, in ?
    main()
  File "/usr/local/sbin/jobmond.py", line 806, in main
    gather.run()
  File "/usr/local/sbin/jobmond.py", line 339, in run
    self.jobs = self.getJobData( self.jobs )
  File "/usr/local/sbin/jobmond.py", line 623, in getJobData
    count_mynodes = count_mynodes + int( nodepart )
ValueError: invalid literal for int():

A patch for the error is provided as an attachment.
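The failing line passes `nodepart` straight to `int()`, so any fragment of the node specification that is not a plain number (a host name, or an empty string) raises `ValueError`. The sketch below shows defensive parsing of a PBS-style node specification such as `2:ppn=4+node05:ppn=2`. It is a hypothetical Python 3 helper (`count_requested_nodes` is an illustrative name), not the patch attached to this ticket, and it assumes that `+` separates node requests, that the first `:`-separated field of each request is either a node count or a host name, and that a named host counts as one node.

```python
def count_requested_nodes(nodes_spec):
    """Count nodes in a PBS-style spec like '2:ppn=4+node05:ppn=2'.

    Hypothetical helper for illustration; not jobmond's actual code.
    """
    count = 0
    for part in nodes_spec.split("+"):
        nodepart = part.split(":")[0].strip()
        if not nodepart:
            continue  # skip empty fragments instead of calling int('')
        if nodepart.isdigit():
            count += int(nodepart)  # numeric request: that many nodes
        else:
            count += 1  # a named host counts as a single node
    return count
```

With this approach `count_requested_nodes("2:ppn=4+node05:ppn=2")` yields 3 instead of raising, because `node05` is treated as one node rather than passed to `int()`.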

#72 | worksforme | jobmond memory leak | ramonb | ramonb
Description

There is a memory leak in jobmond, present at least in version 0.3.1.

I have now received multiple reports of jobmond consuming several gigabytes of memory after running for a while.
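One way to confirm a report like this is to sample the resident memory of a running jobmond process over time. The following is a minimal Python 3 sketch, assuming a Linux host where `/proc/<pid>/status` contains a `VmRSS:` line reported in kB. The function names are hypothetical, and the jobmond process ID has to be found separately (for example with `pgrep -f jobmond`).

```python
import time


def parse_vmrss_kib(status_text):
    """Return VmRSS in KiB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    return None


def sample_rss(pid, interval_s=60.0, samples=10):
    """Read VmRSS for `pid` `samples` times, `interval_s` seconds apart (Linux only)."""
    readings = []
    for i in range(samples):
        with open(f"/proc/{pid}/status") as f:
            readings.append(parse_vmrss_kib(f.read()))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings
```

A series that keeps rising over many samples, rather than levelling off, would be consistent with the leak described in this ticket.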
