Custom Query (101 matches)

Results (67 - 69 of 101)

#3 [resolution: fixed] jobarchived does not finalize jobs that exited while it was not running (owner: bastiaans, reporter: anonymous)
Description

Jobs that exited while jobarchived was not running are never finalized; they stay in the 'R' state forever.
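Jobs left behind this way can be spotted directly in the job archive database. Below is a minimal sketch, assuming the jobs table and job_status column that appear in the #171 log further down, a PostgreSQL backend reachable via psycopg2, and placeholder connection details:

# Hedged sketch: list archived jobs still marked 'R' (running).
# The "jobs" table and "job_status" column are taken from the query shown in
# ticket #171's log below; the connection parameters are placeholders.
import psycopg2

conn = psycopg2.connect(dbname="jobarchive", user="jobarchive", host="localhost")
cur = conn.cursor()
cur.execute("SELECT * FROM jobs WHERE job_status = 'R'")
for row in cur.fetchall():
    print(row)
cur.close()
conn.close()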

#76 [resolution: worksforme] jobarchived does not change status to "F" (owner: ramonb, reporter: j.kasiak@…)
Description

Jobarchived does not update a job's status to "F" once the job finishes. Jobmond runs on the head node; gmetad runs on a separate box. I've narrowed the problem down: when I run the following on my gmetad box

telnet -l ganglia localhost 8651 | grep -i monarch | grep -i 23055

<METRIC NAME="MONARCH-JOB-23055-0" VAL="status=R start_timestamp=1269222985 name=STDIN poll_interval=30 queue=batch reported=1269223164 requested_time=100:00:00 queued_timestamp=1269222984 owner=user1 nodes=p340050" TYPE="string" UNITS="" TN="442" TMAX="60" DMAX="0" SLOPE="both" SOURCE="gmond">
Connection closed by foreign host.

The job is still there! Only a restart of gmetad clears it. This is a problem, because jobarchived parses this XML, puts the node in its array of active nodes, and never gets to set job_status to "F".

How can I fix this? Thanks, Jan
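For illustration, the stale metric can be pulled straight from gmetad's XML port and its VAL string decoded into key=value pairs. The following is a minimal sketch that only assumes the localhost:8651 endpoint and the MONARCH-JOB metric format quoted above; it is not the actual jobarchived parser:

# Hedged sketch: fetch gmetad's XML (port 8651, as in the telnet command above)
# and print the status field of every MONARCH job metric. A job that has ended
# but still shows status=R here is exactly the symptom described in this ticket.
import socket
import xml.sax

class MonarchHandler(xml.sax.ContentHandler):
    def startElement(self, name, attrs):
        if name == 'METRIC' and attrs.get('NAME', '').startswith('MONARCH-JOB-'):
            # VAL is a space-separated list of key=value pairs
            fields = dict(kv.split('=', 1) for kv in attrs['VAL'].split())
            print(attrs['NAME'], 'status=%s' % fields.get('status'))

sock = socket.create_connection(('localhost', 8651))
chunks = []
while True:
    data = sock.recv(8192)
    if not data:          # gmetad closes the connection after sending the XML
        break
    chunks.append(data)
sock.close()
xml.sax.parseString(b''.join(chunks), MonarchHandler())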

#171 [resolution: fixed] jobarchived crashed after 20 XML parsing iterations due to an exception (owner: ramonb, reporter: oufei.zhao@…)
Description

At line 914 of jobarchived.py, a None check on timedout_jobs should be added before iterating over it; an exception is thrown when timedout_jobs is None. Once that check is added, it works fine.

if timedout_jobs != None:        # <== added
    for j in timedout_jobs:
        del self.jobAttrs[ j ]
        del self.jobAttrsSaved[ j ]
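Written out with the surrounding loop, the proposed guard amounts to the pattern below. This is a self-contained sketch of the fix's logic (it uses dict.pop so a job missing from one cache is tolerated), not a patch against jobarchived.py itself:

# Hedged sketch of the guard pattern from the proposed fix: skip the purge
# entirely when the timed-out-jobs lookup yields None instead of a list.
def purge_timed_out(job_attrs, job_attrs_saved, timedout_jobs):
    if timedout_jobs is not None:          # the added None check
        for j in timedout_jobs:
            job_attrs.pop(j, None)         # drop the job from both caches
            job_attrs_saved.pop(j, None)

# Example: a None result no longer raises "iteration over non-sequence".
attrs, saved = {1: 'R', 2: 'R'}, {1: 'R'}
purge_timed_out(attrs, saved, None)        # no-op
purge_timed_out(attrs, saved, [1])         # removes job 1 from both dicts
print(attrs, saved)                        # {2: 'R'} {}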

See below for the log and stack trace:

Mon 24 Jun 2013 18:00:59 - job_xml_thread(): Retrieving XML data..
Mon 24 Jun 2013 18:00:59 - job_xml_thread(): Done retrieving: data size 2656
Mon 24 Jun 2013 18:00:59 - job_xml_thread(): Parsing XML..
Mon 24 Jun 2013 18:00:59 - XML: Start document: iteration 20
Mon 24 Jun 2013 18:00:59 - XML: Processed 2 elements - found 0 jobs
Mon 24 Jun 2013 18:00:59 - self.heartbeat = 0
Mon 24 Jun 2013 18:00:59 - job_xml_thread(): Done parsing.
Mon 24 Jun 2013 18:00:59 - job_xml_thread(): Sleeping.. (15s)
Mon 24 Jun 2013 18:01:04 - job_xml_thread(): Retrieving XML data..
Mon 24 Jun 2013 18:01:04 - job_xml_thread(): Done retrieving: data size 2656
Mon 24 Jun 2013 18:01:04 - job_xml_thread(): Parsing XML..
Mon 24 Jun 2013 18:01:04 - Housekeeping: checking database for timed out jobs..
Mon 24 Jun 2013 18:01:04 - doDatabase(): get: SELECT * from jobs WHERE job_status != 'F'
Mon 24 Jun 2013 18:01:04 - doDatabase(): result: []

Exception in thread job_proc_thread:
Traceback (most recent call last):
  File "/usr/local/lib/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/local/lib/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/sbin/jobarchived", line 870, in run
    xml.sax.parseString( my_data, self.myXMLHandler, self.myXMLError )
  File "/usr/local/lib/python2.4/xml/sax/__init__.py", line 49, in parseString
    parser.parse(inpsrc)
  File "/usr/local/lib/python2.4/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.4/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/local/lib/python2.4/xml/sax/expatreader.py", line 200, in feed
    self._cont_handler.startDocument()
  File "/usr/sbin/jobarchived", line 914, in startDocument
    for j in timedout_jobs:
TypeError: iteration over non-sequence
