Opened 4 years ago

Last modified 4 years ago

#173 assigned defect

archiving multiple clusters malfunction

Reported by: ramonb Owned by: ramonb
Priority: normal Milestone: 1.2
Component: jobarchived Version: 1.1
Keywords: Cc:
Estimated Number of Hours: 24

Description

Archiving multiple clusters does not seem to work properly in 1.1.0.

For starters, if you set 2 ARCHIVE_DATASOURCES, hosts seem to overlap and clash between the clusters,

e.g. "Render Cluster": 22 hosts and "LISA Cluster": 628 hosts

but jobarchived's debug output shows:

Wed 31 Jul 2013 14:22:02 - size of cluster 'Render Cluster': 0 hosts 0 metrics 0 values 0 bits 0 bytes 
Wed 31 Jul 2013 14:22:02 - size of cluster 'LISA Cluster': 1 hosts 23 metrics 23 values 367 bits 45 bytes 
Wed 31 Jul 2013 14:23:02 - size of cluster 'Render Cluster': 650 hosts 81378 metrics 273561 values 4325519 bits 540689 bytes 
Wed 31 Jul 2013 14:30:03 - size of cluster 'Render Cluster': 650 hosts 81378 metrics 1057378 values 16719410 bits 2089926 bytes 
Wed 31 Jul 2013 14:37:04 - size of cluster 'Render Cluster': 650 hosts 81378 metrics 829343 values 13115240 bits 1639405 bytes 
Wed 31 Jul 2013 14:43:26 - size of cluster 'LISA Cluster': 650 hosts 81378 metrics 369914 values 5850154 bits 731269 bytes 
Wed 31 Jul 2013 14:44:04 - size of cluster 'Render Cluster': 650 hosts 81378 metrics 361926 values 5723618 bits 715452 bytes 
Wed 31 Jul 2013 14:44:50 - size of cluster 'LISA Cluster': 650 hosts 81378 metrics 313158 values 4953612 bits 619201 bytes 
Wed 31 Jul 2013 14:51:05 - size of cluster 'Render Cluster': 650 hosts 81378 metrics 212615 values 3364312 bits 420539 bytes 
Wed 31 Jul 2013 14:51:40 - size of cluster 'LISA Cluster': 650 hosts 81378 metrics 188598 values 2984378 bits 373047 bytes 
Wed 31 Jul 2013 14:52:38 - size of cluster 'LISA Cluster': 650 hosts 82770 metrics 144556 values 2286402 bits 285800 bytes 
Wed 31 Jul 2013 14:58:04 - size of cluster 'Render Cluster': 650 hosts 82983 metrics 162484 values 2569767 bits 321220 bytes 
Wed 31 Jul 2013 14:58:21 - size of cluster 'LISA Cluster': 650 hosts 82983 metrics 157824 values 2496428 bits 312053 bytes 
Wed 31 Jul 2013 15:05:04 - size of cluster 'Render Cluster': 650 hosts 82983 metrics 309337 values 4896702 bits 612087 bytes 
Wed 31 Jul 2013 15:05:07 - size of cluster 'LISA Cluster': 650 hosts 82983 metrics 315044 values 4987062 bits 623382 bytes 
Wed 31 Jul 2013 15:10:54 - size of cluster 'LISA Cluster': 650 hosts 82983 metrics 290641 values 4597175 bits 574646 bytes 

It looks like all hosts are added to both clusters: 628 + 22 = 650.

This results in hosts being stored twice in RRD.

In addition, jobs from Render Cluster do not seem to be picked up; only LISA Cluster's jobs are.

Change History (7)

comment:1 Changed 4 years ago by ramonb

  • Owner changed from somebody to ramonb
  • Status changed from new to assigned

Parsing seems to work properly. Metrics are stored properly in their own clusters.

comment:2 Changed 4 years ago by ramonb

In 930:

jobarchived/jobarchived.py:

  • rearranged RRDHandler's class variables to be explicitly set in the constructor so that they are local to each instance
  • small changes to debug messages
  • see #173
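
The symptom in the description (both clusters reporting 650 hosts) is what a mutable class-level attribute typically produces in Python, because every instance shares the same object. The following is only a minimal sketch of that effect and of the constructor-based fix; `SharedHandler`, `LocalHandler`, `add_host` and `count_hosts` are hypothetical names for illustration, not the actual RRDHandler code.

```python
class SharedHandler:
    # Hypothetical stand-in, not the real RRDHandler.
    hosts = {}  # class attribute: one dict shared by every instance

    def __init__(self, cluster):
        self.cluster = cluster

    def add_host(self, name):
        # Mutates the dict that lives on the class, so all instances see it.
        self.hosts[name] = self.cluster


class LocalHandler:
    # Same interface, but the dict is created in the constructor.
    def __init__(self, cluster):
        self.cluster = cluster
        self.hosts = {}  # instance attribute: a fresh dict per instance

    def add_host(self, name):
        self.hosts[name] = self.cluster


def count_hosts(handler_cls):
    """Register 22 + 628 hosts on two handlers and return both host counts."""
    render = handler_cls("Render Cluster")
    lisa = handler_cls("LISA Cluster")
    for i in range(22):
        render.add_host("render%d" % i)
    for i in range(628):
        lisa.add_host("lisa%d" % i)
    return len(render.hosts), len(lisa.hosts)
```

With the shared dict, `count_hosts(SharedHandler)` gives 650 for both clusters; with the per-instance dict, `count_hosts(LocalHandler)` gives 22 and 628.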

comment:3 Changed 4 years ago by ramonb

That change fixed the first part:

Wed 31 Jul 2013 16:36:56 - size of cluster 'Render Cluster': 22 hosts 3192 metrics 9574 values 150720 bits 18840 bytes 
Wed 31 Jul 2013 16:37:14 - size of cluster 'LISA Cluster': 624 hosts 77410 metrics 254148 values 4019597 bits 502449 bytes 
Wed 31 Jul 2013 16:42:56 - size of cluster 'Render Cluster': 22 hosts 3214 metrics 22828 values 359319 bits 44914 bytes 
Wed 31 Jul 2013 16:43:18 - size of cluster 'LISA Cluster': 624 hosts 77410 metrics 314220 values 4969672 bits 621209 bytes 
Wed 31 Jul 2013 16:45:30 - size of cluster 'Render Cluster': 22 hosts 3216 metrics 5828 values 91852 bits 11481 bytes 
Wed 31 Jul 2013 16:45:31 - size of cluster 'LISA Cluster': 624 hosts 77410 metrics 95822 values 1516196 bits 189524 bytes 
Wed 31 Jul 2013 16:46:30 - size of cluster 'Render Cluster': 22 hosts 3216 metrics 1279 values 20566 bits 2570 bytes 
Wed 31 Jul 2013 16:46:31 - size of cluster 'LISA Cluster': 624 hosts 77410 metrics 163546 values 2588273 bits 323534 bytes 

This now works properly again.

comment:4 Changed 4 years ago by ramonb

Jobs are still not found properly for both clusters; for some reason only the first cluster's jobs are picked up.

The RRDs are good now.

comment:5 Changed 4 years ago by ramonb

  • Estimated Number of Hours set to 24

comment:6 Changed 4 years ago by ramonb

In 945:

jobarchived/jobarchived.py:

  • see #173
  • changed threading: now each cluster gets its own ganglia xml/store threads
  • added some yappi profiling functions for debugging
  • better debug statements
  • many performance improvements:
    • now use deque collections instead of lists for storing metrics: faster appends and pops
    • removed some typecasts
    • replaced xml readlines() with read()
    • XMLDataGatherer now truly caches and prevents unnecessary data retrieval
    • replaced some if statements with try/except blocks: this is faster
    • disabled thread locking
    • excluded metrics are now ignored while storing, not while parsing, and are matched with a compiled regexp
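
Several of the performance items above can be illustrated together. This is a rough sketch, not the code from the changeset: the `EXCLUDE` pattern, the `store` function and the tuple layout are assumptions made for the example. It drains a `deque` with `popleft()`, uses `try/except` instead of checking the length on every iteration, and filters excluded metrics at store time with a regexp that is compiled once.

```python
import re
from collections import deque

# Hypothetical exclusion pattern; the real list would come from the
# jobarchived configuration. Compiled once, reused for every metric.
EXCLUDE = re.compile(r"^(?:zplugin_.*|boottime)$")


def store(pending, stored):
    """Drain 'pending', a deque of (host, metric, value) tuples, into 'stored'."""
    while True:
        try:
            # popleft() on a deque is O(1); list.pop(0) is O(n).
            host, metric, value = pending.popleft()
        except IndexError:
            # Empty deque: cheaper than testing len(pending) on each pass.
            break
        if EXCLUDE.match(metric):
            continue  # excluded metrics are dropped while storing
        stored.setdefault(host, []).append((metric, value))
    return stored
```

For example, feeding it `deque([("n1", "load_one", "0.5"), ("n1", "boottime", "1")])` stores only the `load_one` value for `n1` and leaves the deque empty.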

comment:7 Changed 4 years ago by ramonb

In 949:

1.1/jobarchived/jobarchived.py:

  • reverted change that was supposed to go in 1.2

1.2/jobarchived/jobarchived.py:

  • see #173
  • changed threading: now each cluster gets its own ganglia xml/store threads
  • added some yappi profiling functions for debugging
  • better debug statements
  • many performance improvements:
    • now use deque collections instead of lists for storing metrics: faster appends and pops
    • removed some typecasts
    • replaced xml readlines() with read()
    • XMLDataGatherer now truly caches and prevents unnecessary data retrieval
    • replaced some if statements with try/except blocks: this is faster
    • disabled thread locking
    • excluded metrics are now ignored while storing, not while parsing, and are matched with a compiled regexp
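
The threading change can be pictured roughly as follows. This is a sketch under assumptions, not the actual jobarchived thread classes: `ClusterWorker` and `archive` are hypothetical names, and the data source is a plain list instead of ganglia XML. Each cluster gets its own worker thread with private state, so nothing is shared between clusters and no lock is needed.

```python
import threading


class ClusterWorker(threading.Thread):
    """Hypothetical per-cluster worker; each instance owns its own state."""

    def __init__(self, cluster, source):
        threading.Thread.__init__(self)
        self.cluster = cluster
        self.source = source  # iterable of (host, metric, value)
        self.stored = {}      # private to this worker, never shared

    def run(self):
        for host, metric, value in self.source:
            self.stored.setdefault(host, {})[metric] = value


def archive(sources):
    """Run one worker per cluster and return the hosts each one stored."""
    workers = [ClusterWorker(name, data) for name, data in sources.items()]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return {worker.cluster: sorted(worker.stored) for worker in workers}
```

Called with one entry per cluster, `archive` returns a separate host list for each, with no host appearing under a cluster it does not belong to.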