Custom Query (101 matches)



Results (34 - 36 of 101)

Ticket Resolution Summary Owner Reporter
#78 invalid jobmonarch data not showing up in ganglia stream somebody jsarlo@…

We had a network issue on our cluster this morning where the frontend couldn't reach any of the nodes. That is fixed now and ganglia is showing all information again, but jobmonarch data no longer shows up. Nothing should have changed as far as I can tell. When I run telnet localhost 8649, nothing for jobmonarch appears in the output. If I change jobmond.conf to debug level 10 and run jobmond in the foreground instead of as a daemon, it prints the job information, but there is still nothing in telnet localhost 8649. I can't figure out what might have messed this up.

Any ideas on what to try?

Thanks. Jeff
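The telnet check described above can also be scripted. The following is a minimal sketch, not part of jobmonarch: it assumes gmond writes its full XML dump on connect to port 8649 and then closes the connection, and that jobmonarch metrics can be recognised by a NAME beginning with MONARCH (as in the MONARCH-JOB-23055-0 sample in ticket #76). The function names fetch_xml and monarch_metric_names are hypothetical.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host="localhost", port=8649, timeout=10.0):
    """Read everything the peer sends until it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def monarch_metric_names(xml_bytes):
    """Return the NAME of every METRIC element whose name starts with MONARCH.

    The MONARCH prefix is an assumption taken from the sample metric name.
    """
    root = ET.fromstring(xml_bytes)
    return [
        metric.get("NAME")
        for metric in root.iter("METRIC")
        if metric.get("NAME", "").startswith("MONARCH")
    ]
```

Calling monarch_metric_names(fetch_xml()) on the frontend would return an empty list in the situation described, which confirms the symptom without reading the raw dump by eye.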

#76 worksforme jobarchived does not change status to "F" ramonb j.kasiak@…

Jobarchived does not update a job's status to "F" once it finishes. Jobmond runs on the head node; gmetad runs on a separate box. I've narrowed down the problem. When I run the following on my gmetad box:

telnet -l ganglia localhost 8651 | grep -i monarch | grep -i 23055

<METRIC NAME="MONARCH-JOB-23055-0" VAL="status=R start_timestamp=1269222985 name=STDIN poll_interval=30 queue=batch reported=1269223164 requested_time=100:00:00 queued_timestamp=1269222984 owner=user1 nodes=p340050" TYPE="string" UNITS="" TN="442" TMAX="60" DMAX="0" SLOPE="both" SOURCE="gmond">
Connection closed by foreign host.

The job is still there!!! Only a restart of gmetad clears this. This is a problem, since jobarchived parses this xml file and puts this node in an array of active nodes, and never gets to set the job_status to "F".

How can I fix this? Thanks, Jan
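To inspect such a metric by script, the VAL attribute can be split into fields. This is a minimal sketch and not jobarchived's actual parser: it assumes the fields are whitespace-separated key=value pairs whose values contain no spaces, which holds for the sample above but may not hold for every job name. Both function names are hypothetical. Comparing the reported timestamp with the current time is one possible way to spot an entry that gmetad keeps serving after the job has stopped being reported.

```python
def parse_job_val(val):
    """Split a 'key=value key=value' string into a dict.

    Assumes values contain no spaces; tokens without '=' are skipped.
    """
    fields = {}
    for pair in val.split():
        key, sep, value = pair.partition("=")
        if sep:
            fields[key] = value
    return fields


def seconds_since_report(fields, now):
    """Seconds between the metric's 'reported' timestamp and 'now' (epoch seconds)."""
    return int(now) - int(fields["reported"])


VAL = (
    "status=R start_timestamp=1269222985 name=STDIN poll_interval=30 "
    "queue=batch reported=1269223164 requested_time=100:00:00 "
    "queued_timestamp=1269222984 owner=user1 nodes=p340050"
)
job = parse_job_val(VAL)
```

Here job["status"] is still "R", and seconds_since_report(job, time.time()) would keep growing for as long as the stale entry remains.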

#170 fixed jobarchived runs wild after Non-recoverable XML error ramonb jaap.dijkshoorn@…

Jun 7 08:48:20 xtrac jobarchived: FATAL ERROR: Non-recoverable XML error <unknown>:33565:0: unclosed token
Jun 7 08:48:21 xtrac jobarchived: FATAL ERROR: Non-recoverable XML error <unknown>:33565:0: unclosed token

After this the Python process runs wild. It also seems to be hitting a memory leak: Python was using 20% of memory at the time it was killed.
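The ticket does not say why the process spun after the error, so the following only illustrates one defensive pattern and is not the fix that was applied: treat a truncated document as a failed poll, wait before retrying, and stop after a bounded number of consecutive failures. The names parse_or_none and poll, and the fetch and handle callbacks, are hypothetical.

```python
import time
import xml.etree.ElementTree as ET


def parse_or_none(xml_bytes):
    """Return the parsed root element, or None if the document is malformed."""
    try:
        return ET.fromstring(xml_bytes)
    except ET.ParseError:
        return None


def poll(fetch, handle, interval=30.0, max_failures=5, sleep=time.sleep):
    """Fetch and parse repeatedly, sleeping between attempts.

    Returns the failure count once max_failures consecutive documents
    could not be parsed; a successful parse resets the counter.
    """
    failures = 0
    while failures < max_failures:
        root = parse_or_none(fetch())
        if root is None:
            failures += 1
        else:
            failures = 0
            handle(root)
        sleep(interval)  # always pause, so a bad document cannot cause a busy loop
    return failures
```

Passing sleep as a parameter keeps the loop easy to exercise in a test without real delays.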
