Opened 10 years ago
Closed 10 years ago
#163 closed defect (fixed)
Exception in thread store_metric_thread
Reported by: | vitt@… | Owned by: | ramonb |
---|---|---|---|
Priority: | minor | Milestone: | 1.1 |
Component: | jobarchived | Version: | 1.0 |
Keywords: | | Cc: | |
Estimated Number of Hours: | | | |
Description
I have tried version 1.0. The job archive is available in the web interface and shows the list of archived jobs, but no metrics are stored.
[root@master ~]# service jobarchived start
Starting Job Archiving Daemon:
Sun 05 May 2013 23:53:00 - XML: Handler created
Sun 05 May 2013 23:53:00 - Checking database..
Sun 05 May 2013 23:53:00 - Check done.
Sun 05 May 2013 23:53:00 - Checking rrd archive..
Sun 05 May 2013 23:53:00 - Check done.
Sun 05 May 2013 23:53:00 - job_xml_thread(): started.
Sun 05 May 2013 23:53:00 - job_xml_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:00 - job_xml_thread(): Done retrieving: data size 37183
Sun 05 May 2013 23:53:00 - job_xml_thread(): Parsing XML..
Sun 05 May 2013 23:53:00 - main threading started.
Sun 05 May 2013 23:53:00 - ganglia_xml_thread(): started.
Sun 05 May 2013 23:53:00 - ganglia_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): started.
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): Done retrieving: data size 37183
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): Parsing XML..
Sun 05 May 2013 23:53:00 - ganglia_store_metric_thread(): started.
Sun 05 May 2013 23:53:00 - ganglia_store_metric_thread(): Storing data..
Sun 05 May 2013 23:53:00 - ganglia_store_thread(): started.
Sun 05 May 2013 23:53:00 - Entering storeMetrics()
Sun 05 May 2013 23:53:00 - size of cluster 'Test Cluster': 0 hosts 0 metrics 0 values 0 bits 0 bytes
Sun 05 May 2013 23:53:00 - ganglia_store_thread(): Sleeping.. (60s)
Sun 05 May 2013 23:53:00 - Leaving storeMetrics()
Sun 05 May 2013 23:53:00 - ganglia_store_metric_thread(): Done storing.
Sun 05 May 2013 23:53:00 - ganglia_store_metric_thread(): finished.
Sun 05 May 2013 23:53:00 - XML: Start document
Sun 05 May 2013 23:53:00 - XML: Processed 518 elements - found 0 jobs
Sun 05 May 2013 23:53:00 - job_xml_thread(): Found 0 updated jobs.
Sun 05 May 2013 23:53:00 - job_xml_thread(): No jobs to store.
Sun 05 May 2013 23:53:00 - job_xml_thread(): Done parsing.
Sun 05 May 2013 23:53:00 - job_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): Done parsing.
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): finished.
Sun 05 May 2013 23:53:15 - ganglia_xml_thread(): Done sleeping.
Sun 05 May 2013 23:53:15 - ganglia_xml_thread(): finished.
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): started.
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:15 - ganglia_xml_thread(): started.
Sun 05 May 2013 23:53:15 - ganglia_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): Done retrieving: data size 37196
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): Parsing XML..
Sun 05 May 2013 23:53:15 - job_xml_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:15 - job_xml_thread(): Done retrieving: data size 37196
Sun 05 May 2013 23:53:15 - job_xml_thread(): Parsing XML..
Sun 05 May 2013 23:53:15 - XML: Start document
Sun 05 May 2013 23:53:15 - XML: Processed 518 elements - found 0 jobs
Sun 05 May 2013 23:53:15 - job_xml_thread(): Found 0 updated jobs.
Sun 05 May 2013 23:53:15 - job_xml_thread(): No jobs to store.
Sun 05 May 2013 23:53:15 - job_xml_thread(): Done parsing.
Sun 05 May 2013 23:53:15 - job_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): Done parsing.
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): finished.
Sun 05 May 2013 23:53:30 - ganglia_xml_thread(): Done sleeping.
Sun 05 May 2013 23:53:30 - ganglia_xml_thread(): finished.
Sun 05 May 2013 23:53:30 - ganglia_xml_thread(): started.
Sun 05 May 2013 23:53:30 - ganglia_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): started.
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): Done retrieving: data size 37194
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): Parsing XML..
Sun 05 May 2013 23:53:30 - job_xml_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:30 - job_xml_thread(): Done retrieving: data size 37194
Sun 05 May 2013 23:53:30 - job_xml_thread(): Parsing XML..
Sun 05 May 2013 23:53:30 - XML: Start document
Sun 05 May 2013 23:53:30 - XML: Processed 518 elements - found 0 jobs
Sun 05 May 2013 23:53:30 - job_xml_thread(): Found 0 updated jobs.
Sun 05 May 2013 23:53:30 - job_xml_thread(): No jobs to store.
Sun 05 May 2013 23:53:30 - job_xml_thread(): Done parsing.
Sun 05 May 2013 23:53:30 - job_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): Done parsing.
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): finished.
Sun 05 May 2013 23:53:45 - ganglia_xml_thread(): Done sleeping.
Sun 05 May 2013 23:53:45 - ganglia_xml_thread(): finished.
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): started.
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:45 - ganglia_xml_thread(): started.
Sun 05 May 2013 23:53:45 - ganglia_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): Done retrieving: data size 37162
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): Parsing XML..
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): Done parsing.
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): finished.
Sun 05 May 2013 23:53:45 - job_xml_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:45 - job_xml_thread(): Done retrieving: data size 37162
Sun 05 May 2013 23:53:45 - job_xml_thread(): Parsing XML..
Sun 05 May 2013 23:53:45 - XML: Start document
Sun 05 May 2013 23:53:45 - XML: Processed 518 elements - found 0 jobs
Sun 05 May 2013 23:53:45 - job_xml_thread(): Found 0 updated jobs.
Sun 05 May 2013 23:53:45 - job_xml_thread(): No jobs to store.
Sun 05 May 2013 23:53:45 - job_xml_thread(): Done parsing.
Sun 05 May 2013 23:53:45 - job_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:54:00 - ganglia_store_thread(): Done sleeping.
Sun 05 May 2013 23:54:00 - ganglia_store_thread(): finished.
Sun 05 May 2013 23:54:00 - ganglia_store_metric_thread(): started.
Sun 05 May 2013 23:54:00 - ganglia_store_metric_thread(): Storing data..
Sun 05 May 2013 23:54:00 - Entering storeMetrics()
Sun 05 May 2013 23:54:00 - size of cluster 'Test Cluster': 1 hosts 97 metrics 388 values 6172 bits 771 bytes
Sun 05 May 2013 23:54:00 - ganglia_store_thread(): started.
Sun 05 May 2013 23:54:00 - ganglia_store_thread(): Sleeping.. (60s)
Exception in thread store_metric_thread:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/sbin/jobarchived", line 1464, in storeThread
    ret = self.myXMLHandler.storeMetrics()
  File "/usr/sbin/jobarchived", line 1188, in storeMetrics
    ret = rrdh.storeMetrics()
  File "/usr/sbin/jobarchived", line 1843, in storeMetrics
    create_ret = self.createCheck( hostname, metricname, period )
  File "/usr/sbin/jobarchived", line 1982, in createCheck
    heartbeat = 8 * int( interval )
TypeError: int() argument must be a string or a number
Sun 05 May 2013 23:54:00 - ganglia_xml_thread(): Done sleeping.
Sun 05 May 2013 23:54:00 - ganglia_xml_thread(): finished.
Sun 05 May 2013 23:54:00 - job_xml_thread(): Retrieving XML data..
Sun 05 May 2013 23:54:00 - job_xml_thread(): Done retrieving: data size 37162
Sun 05 May 2013 23:54:00 - job_xml_thread(): Parsing XML..
Sun 05 May 2013 23:54:00 - XML: Start document
Sun 05 May 2013 23:54:00 - XML: Processed 518 elements - found 0 jobs
Sun 05 May 2013 23:54:00 - job_xml_thread(): Found 0 updated jobs.
Sun 05 May 2013 23:54:00 - job_xml_thread(): No jobs to store.
Sun 05 May 2013 23:54:00 - job_xml_thread(): Done parsing.
Sun 05 May 2013 23:54:00 - job_xml_thread(): Sleeping.. (15s)
Change History (5)
comment:1 Changed 10 years ago by ramonb
- Owner changed from somebody to ramonb
- Status changed from new to assigned
comment:2 Changed 10 years ago by ramonb
- Milestone set to 1.1
- Priority changed from normal to minor
This exception is probably triggered when the jobmond metrics are not reported completely or correctly, as caused by the network issue described below.
It is not a big bug; nevertheless, jobarchived should catch this and continue, perhaps only issuing a warning that jobmond is not running (correctly).
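The catch-and-warn behaviour suggested above could look roughly like the following. This is a minimal sketch, not the actual change made in jobarchived: the function name `heartbeat_for` and the logger name are hypothetical, and it is written for current Python rather than the Python 2.4 shown in the traceback. Only the expression `8 * int( interval )` comes from the traceback itself.

```python
import logging

log = logging.getLogger("jobarchived-sketch")  # hypothetical logger name


def heartbeat_for(interval, factor=8):
    """Return factor * int(interval), or None if interval is unusable.

    Hypothetical helper: int(None) raises TypeError (as in the ticket's
    traceback) and int("abc") raises ValueError; both are caught here so
    the calling thread can skip the metric instead of dying.
    """
    try:
        return factor * int(interval)
    except (TypeError, ValueError):
        log.warning(
            "no usable interval (%r); is jobmond running correctly?", interval
        )
        return None
```

A caller would then skip RRD creation for that metric when `heartbeat_for(...)` returns `None`, rather than letting the exception end the store thread.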
Hi, I had a wrong Ganglia configuration. Torque on my test server reports the following host names: master.cluster (head node) and node01.cluster (compute node). But gmond reported different host names (from /etc/hosts), which correspond to the configuration of the second network interface (internet) on those hosts. I added a static route to my first (cluster) interface and the problem was resolved (route add -host 239.2.11.71 dev eth0). The jobs and RRD graphs are now stored in the database.
....
Tue 07 May 2013 00:20:42 - ganglia_store_thread(): started.
Tue 07 May 2013 00:20:42 - ganglia_store_thread(): Sleeping.. (60s)
Tue 07 May 2013 00:20:42 - size of cluster 'TestCluster': 2 hosts 192 metrics 768 values 12153 bits 1519 bytes
Tue 07 May 2013 00:20:42 - Leaving storeMetrics()
Tue 07 May 2013 00:20:42 - Entering storeMetrics()
Tue 07 May 2013 00:20:42 - size of cluster 'Test Cluster': 2 hosts 192 metrics 0 values 0 bits 0 bytes
Tue 07 May 2013 00:20:42 - Leaving storeMetrics()
Tue 07 May 2013 00:20:42 - ganglia_store_metric_thread(): Done storing.
Tue 07 May 2013 00:20:42 - ganglia_store_metric_thread(): finished.
Tue 07 May 2013 00:20:44 - job_xml_thread(): Retrieving XML data..
Tue 07 May 2013 00:20:44 - job_xml_thread(): Done retrieving: data size 70591
Tue 07 May 2013 00:20:44 - job_xml_thread(): Parsing XML..
....
Anyway, I am still working on the configuration of the network, Ganglia and jobmonarch, and I don't completely understand how I should set those up.
Best Regards
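The host name mismatch described above (Torque and gmond reporting different names for the same machine) can be checked with a short script. This is a rough sketch under stated assumptions: the function names are hypothetical, the sample XML is a simplified, made-up stand-in for a gmond dump (only the `HOST` element and its `NAME` attribute are relied on), and `master.example.org` and the IP addresses are invented for illustration.

```python
import xml.etree.ElementTree as ET


def gmond_hostnames(xml_text):
    """Collect the NAME attribute of every HOST element in a gmond XML dump."""
    root = ET.fromstring(xml_text)
    return {host.get("NAME") for host in root.iter("HOST")}


def unmatched_hosts(torque_names, xml_text):
    """Return Torque host names that gmond does not report under the same name."""
    return sorted(set(torque_names) - gmond_hostnames(xml_text))


# Hypothetical, simplified sample: gmond reports the head node under the
# name of its other (internet) interface instead of master.cluster.
SAMPLE_XML = """<GANGLIA_XML>
  <CLUSTER NAME="Test Cluster">
    <HOST NAME="master.example.org" IP="192.0.2.10"/>
    <HOST NAME="node01.cluster" IP="10.0.0.2"/>
  </CLUSTER>
</GANGLIA_XML>"""

print(unmatched_hosts(["master.cluster", "node01.cluster"], SAMPLE_XML))
```

Any name printed here is a host for which job data and Ganglia metrics would not line up under the same host name.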
comment:3 Changed 10 years ago by ramonb
I'm having a hard time reproducing this, but it is related to Ganglia config parsing.
comment:4 Changed 10 years ago by ramonb
In 855:
comment:5 Changed 10 years ago by ramonb
- Resolution set to fixed
- Status changed from assigned to closed
While I am unable to reproduce what exactly caused this exception (regardless of (mis)configuration issues), I have now made the interval determination more robust. In addition, a check is now performed to prevent jobarchived.conf misconfiguration.
This Exception should no longer happen.
So there are jobs stored in the database, but no RRD graphs stored?
Will investigate