Custom Query (101 matches)
Results (1 - 3 of 101)
Ticket | Resolution | Summary | Owner | Reporter |
---|---|---|---|---|
#163 | fixed | Exception in thread store_metric_thread | ramonb | vitt@… |
Description |
I have tried version 1.0. The job archive is available in the web interface and shows the list of archived jobs, but no metrics are stored.
[root@master ~]# service jobarchived start
Starting Job Archiving Daemon:
Sun 05 May 2013 23:53:00 - XML: Handler created
Sun 05 May 2013 23:53:00 - Checking database..
Sun 05 May 2013 23:53:00 - Check done.
Sun 05 May 2013 23:53:00 - Checking rrd archive..
Sun 05 May 2013 23:53:00 - Check done.
Sun 05 May 2013 23:53:00 - job_xml_thread(): started.
Sun 05 May 2013 23:53:00 - job_xml_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:00 - job_xml_thread(): Done retrieving: data size 37183
Sun 05 May 2013 23:53:00 - job_xml_thread(): Parsing XML..
Sun 05 May 2013 23:53:00 - main threading started.
Sun 05 May 2013 23:53:00 - ganglia_xml_thread(): started.
Sun 05 May 2013 23:53:00 - ganglia_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): started.
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): Done retrieving: data size 37183
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): Parsing XML..
Sun 05 May 2013 23:53:00 - ganglia_store_metric_thread(): started.
Sun 05 May 2013 23:53:00 - ganglia_store_metric_thread(): Storing data..
Sun 05 May 2013 23:53:00 - ganglia_store_thread(): started.
Sun 05 May 2013 23:53:00 - Entering storeMetrics()
Sun 05 May 2013 23:53:00 - size of cluster 'Test Cluster': 0 hosts 0 metrics 0 values 0 bits 0 bytes
Sun 05 May 2013 23:53:00 - ganglia_store_thread(): Sleeping.. (60s)
Sun 05 May 2013 23:53:00 - Leaving storeMetrics()
Sun 05 May 2013 23:53:00 - ganglia_store_metric_thread(): Done storing.
Sun 05 May 2013 23:53:00 - ganglia_store_metric_thread(): finished.
Sun 05 May 2013 23:53:00 - XML: Start document
Sun 05 May 2013 23:53:00 - XML: Processed 518 elements - found 0 jobs
Sun 05 May 2013 23:53:00 - job_xml_thread(): Found 0 updated jobs.
Sun 05 May 2013 23:53:00 - job_xml_thread(): No jobs to store.
Sun 05 May 2013 23:53:00 - job_xml_thread(): Done parsing.
Sun 05 May 2013 23:53:00 - job_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): Done parsing.
Sun 05 May 2013 23:53:00 - ganglia_parse_thread(): finished.
Sun 05 May 2013 23:53:15 - ganglia_xml_thread(): Done sleeping.
Sun 05 May 2013 23:53:15 - ganglia_xml_thread(): finished.
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): started.
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:15 - ganglia_xml_thread(): started.
Sun 05 May 2013 23:53:15 - ganglia_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): Done retrieving: data size 37196
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): Parsing XML..
Sun 05 May 2013 23:53:15 - job_xml_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:15 - job_xml_thread(): Done retrieving: data size 37196
Sun 05 May 2013 23:53:15 - job_xml_thread(): Parsing XML..
Sun 05 May 2013 23:53:15 - XML: Start document
Sun 05 May 2013 23:53:15 - XML: Processed 518 elements - found 0 jobs
Sun 05 May 2013 23:53:15 - job_xml_thread(): Found 0 updated jobs.
Sun 05 May 2013 23:53:15 - job_xml_thread(): No jobs to store.
Sun 05 May 2013 23:53:15 - job_xml_thread(): Done parsing.
Sun 05 May 2013 23:53:15 - job_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): Done parsing.
Sun 05 May 2013 23:53:15 - ganglia_parse_thread(): finished.
Sun 05 May 2013 23:53:30 - ganglia_xml_thread(): Done sleeping.
Sun 05 May 2013 23:53:30 - ganglia_xml_thread(): finished.
Sun 05 May 2013 23:53:30 - ganglia_xml_thread(): started.
Sun 05 May 2013 23:53:30 - ganglia_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): started.
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): Done retrieving: data size 37194
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): Parsing XML..
Sun 05 May 2013 23:53:30 - job_xml_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:30 - job_xml_thread(): Done retrieving: data size 37194
Sun 05 May 2013 23:53:30 - job_xml_thread(): Parsing XML..
Sun 05 May 2013 23:53:30 - XML: Start document
Sun 05 May 2013 23:53:30 - XML: Processed 518 elements - found 0 jobs
Sun 05 May 2013 23:53:30 - job_xml_thread(): Found 0 updated jobs.
Sun 05 May 2013 23:53:30 - job_xml_thread(): No jobs to store.
Sun 05 May 2013 23:53:30 - job_xml_thread(): Done parsing.
Sun 05 May 2013 23:53:30 - job_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): Done parsing.
Sun 05 May 2013 23:53:30 - ganglia_parse_thread(): finished.
Sun 05 May 2013 23:53:45 - ganglia_xml_thread(): Done sleeping.
Sun 05 May 2013 23:53:45 - ganglia_xml_thread(): finished.
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): started.
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:45 - ganglia_xml_thread(): started.
Sun 05 May 2013 23:53:45 - ganglia_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): Done retrieving: data size 37162
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): Parsing XML..
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): Done parsing.
Sun 05 May 2013 23:53:45 - ganglia_parse_thread(): finished.
Sun 05 May 2013 23:53:45 - job_xml_thread(): Retrieving XML data..
Sun 05 May 2013 23:53:45 - job_xml_thread(): Done retrieving: data size 37162
Sun 05 May 2013 23:53:45 - job_xml_thread(): Parsing XML..
Sun 05 May 2013 23:53:45 - XML: Start document
Sun 05 May 2013 23:53:45 - XML: Processed 518 elements - found 0 jobs
Sun 05 May 2013 23:53:45 - job_xml_thread(): Found 0 updated jobs.
Sun 05 May 2013 23:53:45 - job_xml_thread(): No jobs to store.
Sun 05 May 2013 23:53:45 - job_xml_thread(): Done parsing.
Sun 05 May 2013 23:53:45 - job_xml_thread(): Sleeping.. (15s)
Sun 05 May 2013 23:54:00 - ganglia_store_thread(): Done sleeping.
Sun 05 May 2013 23:54:00 - ganglia_store_thread(): finished.
Sun 05 May 2013 23:54:00 - ganglia_store_metric_thread(): started.
Sun 05 May 2013 23:54:00 - ganglia_store_metric_thread(): Storing data..
Sun 05 May 2013 23:54:00 - Entering storeMetrics()
Sun 05 May 2013 23:54:00 - size of cluster 'Test Cluster': 1 hosts 97 metrics 388 values 6172 bits 771 bytes
Sun 05 May 2013 23:54:00 - ganglia_store_thread(): started.
Sun 05 May 2013 23:54:00 - ganglia_store_thread(): Sleeping.. (60s)
Exception in thread store_metric_thread:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/lib64/python2.4/threading.py", line 422, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/sbin/jobarchived", line 1464, in storeThread
    ret = self.myXMLHandler.storeMetrics()
  File "/usr/sbin/jobarchived", line 1188, in storeMetrics
    ret = rrdh.storeMetrics()
  File "/usr/sbin/jobarchived", line 1843, in storeMetrics
    create_ret = self.createCheck( hostname, metricname, period )
  File "/usr/sbin/jobarchived", line 1982, in createCheck
    heartbeat = 8 * int( interval )
TypeError: int() argument must be a string or a number
Sun 05 May 2013 23:54:00 - ganglia_xml_thread(): Done sleeping.
Sun 05 May 2013 23:54:00 - ganglia_xml_thread(): finished.
Sun 05 May 2013 23:54:00 - job_xml_thread(): Retrieving XML data..
Sun 05 May 2013 23:54:00 - job_xml_thread(): Done retrieving: data size 37162
Sun 05 May 2013 23:54:00 - job_xml_thread(): Parsing XML..
Sun 05 May 2013 23:54:00 - XML: Start document
Sun 05 May 2013 23:54:00 - XML: Processed 518 elements - found 0 jobs
Sun 05 May 2013 23:54:00 - job_xml_thread(): Found 0 updated jobs.
Sun 05 May 2013 23:54:00 - job_xml_thread(): No jobs to store.
Sun 05 May 2013 23:54:00 - job_xml_thread(): Done parsing.
Sun 05 May 2013 23:54:00 - job_xml_thread(): Sleeping.. (15s) |
|||
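Editorial note on #163: the traceback ends in `heartbeat = 8 * int( interval )`, and `int()` raises exactly this TypeError when it is handed something that is neither a string nor a number, such as `None`. Whether `interval` was actually `None` here, and how the ticket was ultimately fixed, is not stated in the report. The following is only a minimal sketch of a defensive conversion; the function name `heartbeat_for` and the fallback value of 15 seconds are hypothetical and not taken from jobarchived.

```python
def heartbeat_for(interval, default_interval=15, factor=8):
    """Return factor * interval as an integer number of seconds.

    Hypothetical helper, not part of jobarchived. The factor of 8 mirrors
    the expression in the traceback; default_interval is an assumed
    fallback used when the polling interval is missing or malformed.
    """
    try:
        seconds = int(interval)
    except (TypeError, ValueError):
        # TypeError: interval is None or another non-numeric object.
        # ValueError: interval is a string that is not an integer.
        seconds = default_interval
    return factor * seconds


if __name__ == "__main__":
    print(heartbeat_for("10"))
    print(heartbeat_for(None))
```

Falling back silently can hide a configuration problem, so a real fix might prefer to log the bad value or skip the metric instead.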
#56 | invalid | Display of a name and information from nodes. | somebody | user |
Description |
Hello, I see an interesting situation in how jobmonarch displays information from nodes. My nodes are displayed in two ways: by short name and by full DNS name (with domain name), for example node01 and node01.local.net. As a result, the statistics displayed in the web frontend contain two sets of nodes: one that is always present and shown (full name), and another that appears only while jobs are running on the node (short name). The reported Number of Nodes = real number (based on full names) + number of nodes currently used by jobs (based on short names). The cluster image for jobs also shows two lists of nodes under different names, but only the nodes with short names are marked, for example with "J". I don't know how to solve this because I don't know what produces it: jobmonarch, Ganglia, the batch system, or a DNS problem. What do people think about this, and how can I identify my problem? Thanks. |
|||
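Editorial note on #56: the double counting described above is what one would expect if one data source reports `node01` while another reports `node01.local.net` and the two strings are compared as-is. The sketch below only illustrates that idea by collapsing both spellings onto the short name; the helper names `short_name` and `merge_hosts` are hypothetical, and nothing here is taken from the jobmonarch or Ganglia code.

```python
def short_name(hostname):
    """Return the part of a hostname before the first dot, lowercased.

    Hypothetical helper. It assumes real host names, not IP addresses.
    """
    return hostname.split(".", 1)[0].lower()


def merge_hosts(names):
    """Collapse short and fully qualified spellings of the same host.

    Returns a dict mapping each short name to the longest spelling seen,
    so a host reported as both "node01" and "node01.local.net" is
    counted once.
    """
    merged = {}
    for name in names:
        key = short_name(name)
        if key not in merged or len(name) > len(merged[key]):
            merged[key] = name
    return merged


if __name__ == "__main__":
    reported = ["node01.local.net", "node01", "node02.local.net"]
    hosts = merge_hosts(reported)
    print(len(hosts), sorted(hosts.values()))
```

Comparing the output of `hostname` and `hostname -f` on a compute node with the names the batch system reports is one way to find out which component introduces the short form.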
#69 | wontfix | Job information leaking over from one Ganglia cluster to another when clusters are in the same PBS queue | ramonb | renfro@… |
Description |
At one time, I had many Torque queues: one for each group of homogeneous systems. Since I couldn't rely on my users to consistently check qstat, showq, or Ganglia before submitting a job to a queue with free CPUs rather than a queue with none, I converted my Torque settings to put all cluster systems into one queue, and use Maui partitions to keep parallel jobs on a group of homogeneous systems. This has worked out great as far as queue efficiency is concerned. Now that I'm getting Job Monarch integrated into the setup, I've noticed that active jobs in my batch queue show up in all cluster joblists and overviews, even when that particular cluster has no active jobs on its nodes. I'll try to attach screenshots, but if my users still have jobs running when you read this ticket, you can see for yourself on the live server: Ganglia knows, for example, that "ChE Compute Nodes" has 9 systems in it named ch226-11 ... ch226-19. Monarch displays those, but also displays ch226-29 and ch226-31 from the "PNGV Project Compute Nodes" cluster that had active jobs. Any cluster view where Monarch was enabled had this effect. |
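
Editorial note on #69: the behaviour the reporter expected amounts to showing a job in a cluster view only when at least one of its nodes belongs to that Ganglia cluster. The sketch below illustrates that filter using the node names from the report; the function `jobs_for_cluster`, the job ids, and the data layout are hypothetical, and this is not how Job Monarch is implemented (the ticket was closed as wontfix).

```python
def jobs_for_cluster(jobs, cluster_hosts):
    """Keep only the jobs that run on at least one host of the cluster.

    jobs: dict mapping a job id to the list of node names it runs on.
    cluster_hosts: iterable of node names that belong to the cluster.
    Both layouts are assumptions made for this sketch.
    """
    hosts = set(cluster_hosts)
    return {
        job_id: nodes
        for job_id, nodes in jobs.items()
        if hosts.intersection(nodes)
    }


if __name__ == "__main__":
    # "ChE Compute Nodes": ch226-11 ... ch226-19, as named in the ticket.
    che_nodes = ["ch226-%d" % i for i in range(11, 20)]
    # Hypothetical job ids; the node names come from the report.
    active_jobs = {
        "job-a": ["ch226-29"],
        "job-b": ["ch226-31"],
    }
    print(jobs_for_cluster(active_jobs, che_nodes))
```

With the data from the report, neither job touches a ChE node, so the filtered job list for that cluster is empty.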