Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#40 closed defect (fixed)

Jobmond dies when monitoring a number of jobs > number of processors

Reported by: anonymous
Owned by: bastiaans
Priority: critical
Milestone: 0.3
Component: jobmond
Version: 0.2
Keywords:
Cc:
Estimated Number of Hours:

Description

Jobmond dies with the following traceback when the number of jobs it is monitoring is greater than the number of processors:

Traceback (most recent call last):
  File "/usr/local/sbin/jobmond.py", line 811, in ?
    main()
  File "/usr/local/sbin/jobmond.py", line 806, in main
    gather.run()
  File "/usr/local/sbin/jobmond.py", line 339, in run
    self.jobs = self.getJobData( self.jobs )
  File "/usr/local/sbin/jobmond.py", line 623, in getJobData
    count_mynodes = count_mynodes + int( nodepart )
ValueError: invalid literal for int():

A patch for the error is provided as an attachment.

Attachments (1)

jobmond.py.patch (626 bytes) - added by aloga@… 14 years ago.


Change History (6)

Changed 14 years ago by aloga@…

comment:1 Changed 14 years ago by bastiaans

  • Cc aloga@… added
  • Milestone set to 0.2.1
  • Owner changed from somebody to bastiaans
  • Status changed from new to assigned

Thanks a lot for this and your other patches!

This particular bug has been hard to find and debug, and it didn't occur at my local site. ;)

comment:2 Changed 14 years ago by bastiaans

  • Cc aloga@… removed

comment:3 Changed 14 years ago by bastiaans

  • Cc aloga@… added
  • Milestone changed from 0.2.1 to 0.3
  • Resolution set to fixed
  • Status changed from assigned to closed

Committed in changelog r451.

The bug was caused by the numeric_node variable not being initialized properly within the loop.

The node request processing is now documented better.
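
For readers hitting the same class of bug, here is a minimal sketch of the pattern, not the code committed in r451. It assumes a PBS-style node request in which parts are joined by "+" and each part starts with either a node count or a hostname; that format, the function name count_requested_nodes and its parameter are illustrative assumptions. Only numeric_node and nodepart are names taken from this ticket. The point is that the flag is recomputed for every part, and an empty field is never passed to int().

```python
def count_requested_nodes(nodespec):
    """Count the nodes named in a node request string.

    Hypothetical helper: assumes a PBS-style request such as
    '2:ppn=4+node01:ppn=2', which this ticket does not confirm.
    """
    count = 0
    for part in nodespec.split("+"):
        # The first ':'-separated field is a node count or a hostname.
        nodepart = part.split(":")[0].strip()
        if not nodepart:
            # int('') raises ValueError, so skip empty fields.
            continue
        # Recomputed on every iteration, so no state leaks from the
        # previous part of the request.
        numeric_node = nodepart.isdecimal()
        if numeric_node:
            count += int(nodepart)
        else:
            # A hostname stands for exactly one node.
            count += 1
    return count
```

Under these assumptions, a request mixing a count and a hostname, such as '2:ppn=4+node01:ppn=2', is counted as three nodes, and a trailing "+" or an empty request no longer raises.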

comment:4 Changed 14 years ago by bastiaans

  • Cc aloga@… removed
  • Milestone changed from 0.4 to 0.3