Custom Query (101 matches)
Results (1 - 3 of 101)
Ticket | Resolution | Summary | Owner | Reporter |
---|---|---|---|---|
#24 | worksforme | SGE support broken | ramonb | bastiaans |
Description |
After going through the instructions, I attempted to run jobmond.py and received the following error message:

    cluster1:/usr/local/sbin # /usr/local/sbin/jobmond.py -c /etc/jobmond.conf
    Traceback (most recent call last):
      File "/usr/local/sbin/jobmond.py", line 814, in ?
        main()
      File "/usr/local/sbin/jobmond.py", line 807, in main
        gather.daemon()
    UnboundLocalError: local variable 'gather' referenced before assignment

An examination of the code reveals that the SGE data-gathering code was commented out on line 792. Uncommenting it had the following effect:

    cluster1:/usr/local/sbin # /usr/local/sbin/jobmond.py -c /etc/jobmond.conf
      File "/usr/local/sbin/jobmond.py", line 797
        debug_msg( 0, "fatal error: BATCH_API set to 'sge' but python module 'sge_drmaa' is not installed' )
                                                                                                           ^
    SyntaxError: EOL while scanning single-quoted string

Commenting out everything but "gather = SgeDataGatherer()" gave me the following error:

    cluster1:/usr/local/sbin # /usr/local/sbin/jobmond.py -c /etc/jobmond.conf
    Traceback (most recent call last):
      File "/usr/local/sbin/jobmond.py", line 814, in ?
        main()
      File "/usr/local/sbin/jobmond.py", line 800, in main
        gather = SgeDataGatherer()
      File "/usr/local/sbin/jobmond.py", line 419, in __init__
        self.initSgeJobInfo()
      File "/usr/local/sbin/jobmond.py", line 426, in initSgeJobInfo
        self.qstatparser = SgeQstatXMLParser( SGE_QSTAT_XML_FILE )
    NameError: global name 'SGE_QSTAT_XML_FILE' is not defined

At this point, I searched my systems for references to drmaa and found several, in C++ example and header files related to it. Is the sge_drmaa module supposed to be provided by Job Monarch or by Sun Grid Engine? |
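Two of the errors in this report come from the same spot in main(): "gather" is only bound on some code paths, and the fatal-error message on line 797 opens with a double quote but closes with a single quote. A minimal Python 3 sketch of a selection step that cannot leave "gather" unbound is below; it is not the actual jobmond.py code. Only SgeDataGatherer, BATCH_API and the module name sge_drmaa come from the ticket; PbsDataGatherer, GATHERERS, module_available and select_gatherer are hypothetical names, and both gatherer classes are stand-ins. It does not address the undefined SGE_QSTAT_XML_FILE setting.

```python
import importlib.util


class PbsDataGatherer:
    """Hypothetical stand-in for a non-SGE gatherer."""

    def daemon(self):
        return "pbs"


class SgeDataGatherer:
    """Hypothetical stand-in; the real class lives in jobmond.py."""

    def daemon(self):
        return "sge"


# Hypothetical table: BATCH_API value -> (gatherer class, required module or None).
# 'sge_drmaa' is the module named in the ticket's error message.
GATHERERS = {
    "pbs": (PbsDataGatherer, None),
    "sge": (SgeDataGatherer, "sge_drmaa"),
}


def module_available(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def select_gatherer(batch_api):
    """Return a gatherer instance, or exit with a readable fatal error.

    Every path either returns a gatherer or raises SystemExit, so the
    caller can never reach gather.daemon() with 'gather' unbound.
    """
    if batch_api not in GATHERERS:
        raise SystemExit("fatal error: unknown BATCH_API '%s'" % batch_api)
    cls, required_module = GATHERERS[batch_api]
    if required_module is not None and not module_available(required_module):
        # Double quotes delimit the string; the single quotes are literal
        # text inside it, so this line parses (unlike line 797 above).
        raise SystemExit(
            "fatal error: BATCH_API set to '%s' but python module '%s' "
            "is not installed" % (batch_api, required_module)
        )
    return cls()


def main(batch_api):
    gather = select_gatherer(batch_api)  # always bound past this line
    return gather.daemon()
```

With this layout, a missing optional module produces a clear fatal message at startup rather than an UnboundLocalError later in main().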
|||
#45 | worksforme | jobarchived storage threads can't be stopped if they take too long | ramonb | bastiaans |
Description |
We need to add a function to jobarchived's storage threads so that they can be stopped if they take too long. Otherwise, too many storage threads may get started, since the old ones are not killed correctly. See also ticket #34. |
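Python threads cannot be killed from outside, so one common way to get this behaviour is cooperative cancellation: the storage thread checks a threading.Event between units of work, and the caller waits with join(timeout) and sets the event if the thread overruns. The sketch below assumes storage can be split into batches; StorageThread, run_with_timeout and step_seconds are hypothetical names, and time.sleep stands in for the real storage writes.

```python
import threading
import time


class StorageThread(threading.Thread):
    """Hypothetical storage worker that stops cooperatively when asked."""

    def __init__(self, batches, step_seconds=0.01):
        super().__init__(daemon=True)
        self.batches = batches
        self.step_seconds = step_seconds
        self.stop_event = threading.Event()
        self.stored = 0

    def run(self):
        for _batch in self.batches:
            # Check between units of work: the thread must notice the
            # stop request itself, since it cannot be killed externally.
            if self.stop_event.is_set():
                break
            time.sleep(self.step_seconds)  # stand-in for storing one batch
            self.stored += 1

    def stop(self):
        self.stop_event.set()


def run_with_timeout(thread, timeout):
    """Start thread, wait up to timeout seconds, then ask it to stop.

    Returns True if the thread finished on its own within the timeout.
    """
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        thread.stop()
        thread.join()  # returns once the thread reaches its next check
        return False
    return True
```

Because run_with_timeout waits for an overrunning thread to acknowledge the stop before returning, a loop that calls it once per storage cycle never has more than one storage thread alive. A batch that is already inside a slow write still finishes that write before the thread exits.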
|||
#53 | worksforme | Error trying to run jobarchived | ramonb | mhanafi@… |
Description |
It looks like it doesn't find all the hosts, and it gives the following error. I have tried versions 0.3.1 and 0.4:

    [root@aphrodite-adm jobarchived]# python jobarchived.py
    Mon 17 Mar 2008 15:37:36 - Checking database..
    Mon 17 Mar 2008 15:37:36 - Check done.
    Mon 17 Mar 2008 15:37:36 - Checking rrd archive..
    Mon 17 Mar 2008 15:37:36 - Check done.
    Mon 17 Mar 2008 15:37:36 - torque_xml_thread(): started.
    Mon 17 Mar 2008 15:37:36 - torque_xml_thread(): Retrieving XML data..
    Mon 17 Mar 2008 15:37:36 - torque_xml_thread(): Done retrieving.
    Mon 17 Mar 2008 15:37:36 - ganglia_parse_thread(): Parsing XML..
    Mon 17 Mar 2008 15:37:36 - main threading started.
    Mon 17 Mar 2008 15:37:36 - XML: Processed 1492 elements - found 1 (updated) jobs
    Mon 17 Mar 2008 15:37:36 - ganglia_xml_thread(): started.
    Mon 17 Mar 2008 15:37:36 - ganglia_xml_thread(): Sleeping.. (15s)
    Mon 17 Mar 2008 15:37:36 - torque_xml_thread(): Storing..
    Mon 17 Mar 2008 15:37:36 - ganglia_parse_thread(): started.
    Mon 17 Mar 2008 15:37:36 - ganglia_parse_thread(): Retrieving XML data..
    Mon 17 Mar 2008 15:37:36 - ganglia_parse_thread(): Done retrieving.
    Mon 17 Mar 2008 15:37:36 - ganglia_parse_thread(): Parsing XML..
    Mon 17 Mar 2008 15:37:36 - ganglia_store_metric_thread(): started.
    Mon 17 Mar 2008 15:37:36 - ganglia_store_metric_thread(): Storing data..
    Mon 17 Mar 2008 15:37:36 - ganglia_store_thread(): started.
    Mon 17 Mar 2008 15:37:36 - ganglia_store_thread(): Sleeping.. (360s)
    Mon 17 Mar 2008 15:37:36 - Entering storeMetrics()
    Mon 17 Mar 2008 15:37:36 - size of cluster 'aphrodite': 3 hosts 71 metrics 71 values 1027 bits 128 bytes
    Exception in thread store_metric_thread:
    Traceback (most recent call last):
      File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
        self.run()
      File "/usr/lib64/python2.4/threading.py", line 422, in run
        self.__target(*self.__args, **self.__kwargs)
      File "jobarchived.py", line 1378, in storeThread
        ret = self.myXMLHandler.storeMetrics()
      File "jobarchived.py", line 1104, in storeMetrics
        ret = rrdh.storeMetrics()
      File "jobarchived.py", line 1752, in storeMetrics
        create_ret = self.createCheck( hostname, metricname, period )
      File "jobarchived.py", line 1891, in createCheck
        heartbeat = 8 * int( interval )
    TypeError: int() argument must be a string or a number
    Mon 17 Mar 2008 15:37:36 - ganglia_parse_thread(): Done parsing.
    Mon 17 Mar 2008 15:37:36 - ganglia_parse_thread(): finished.
    Mon 17 Mar 2008 15:37:36 - torque_xml_thread(): Done storing.
    Mon 17 Mar 2008 15:37:36 - ganglia_parse_thread(): Done parsing.
    Mon 17 Mar 2008 15:37:36 - torque_xml_thread(): Sleeping.. (15s)
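The traceback ends in createCheck(), where "heartbeat = 8 * int( interval )" fails. In Python 2, int() raises this TypeError for None and other non-numeric objects, so a likely (unconfirmed) cause is that interval was None for some metric, for example because the metric arrived without an interval value. A defensive sketch follows, assuming a fallback interval is acceptable; heartbeat_for and DEFAULT_INTERVAL (15 seconds) are hypothetical and not part of jobarchived.py.

```python
DEFAULT_INTERVAL = 15  # hypothetical fallback interval, in seconds


def heartbeat_for(interval, default=DEFAULT_INTERVAL):
    """Return an RRD heartbeat of 8 * interval, tolerating a bad interval."""
    try:
        seconds = int(interval)
    except (TypeError, ValueError):
        # int(None) raises TypeError, as in the ticket's traceback, and
        # int("abc") raises ValueError; use the fallback instead of
        # letting the exception kill the storage thread.
        seconds = default
    return 8 * seconds
```

Whether to fall back to a default or to log and skip the metric is a design choice; skipping avoids creating an RRD with a guessed heartbeat.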