Opened 11 years ago

Last modified 11 years ago

#160 closed task

version 1.0 — at Version 13

Reported by: ramonb Owned by: ramonb
Priority: normal Milestone: 1.0
Component: general Version: 1.0
Keywords: Cc:
Estimated Number of Hours:

Description (last modified by ramonb)

Work in progress to get Job Monarch working again with new Ganglia.

Should result in release: 1.0

Now have a semi-working setup:

Tested with Ganglia 3.40 and Ganglia-web2 3.5.6

Already fixed a lot, but still some stuff to do.

TODO:

  • Still a little slow
  • update web templates to ganglia-web2 html/css
  • fix running/queued jobs graph
  • rename metrics so they sort lower in the alphabetically sorted metrics list
  • fix job arrays
  • check archive
  • address (missing) jobrange/jobstart line for RRDs in overview

Change History (13)

comment:1 Changed 11 years ago by ramonb

  • Description modified (diff)
  • Version changed from 0.3.1 to 0.4

comment:2 Changed 11 years ago by ramonb

  • Description modified (diff)

comment:3 Changed 11 years ago by ramonb

Job array IDs use the same jobid-sequence style that jobmond uses in its metric names, so I think we should change the separator.

i.e.:

6758475-1.batch1     marenh   serial   JobStopoGAIN-EA.  10364     1   1    --  15:00 R 01:23
   r40n22/0
 
6758490.batch1.l     mbotseka serial   batch_all6_P.txt    --      1   1    --  02:20 R   -- 
   r39n9/0
    -- 

and in jobmond

Fri, 22 Mar 2013 13:14:02 [gmetric 145.101.32.3:8649] name: zplugin-monarch-job-6758490-0 - val: status=R start_timestamp=1363953787 name=batch_all6_P.txt poll_interval=120 domain=lisa.surfsara.nl queue=serial reported=1363954441 requested_time=02:20:00 queued_timestamp=1363950186 owner=mbotseka nodes=r39n9 - dmax: 240

Fri, 22 Mar 2013 13:14:02 [gmetric 145.101.32.3:8649] name: zplugin-monarch-job-6758475-1-0 - val: status=R start_timestamp=1363949179 name=JobStopoGAIN-EA.txt-1 poll_interval=120 domain=lisa.surfsara.nl queue=serial reported=1363954441 requested_time=15:00:00 queued_timestamp=1363949174 owner=marenh nodes=r40n22 - dmax: 240

that screws things up
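
A minimal sketch of the clash, using the two metric names from the log above. The helper names and the "_" separator are assumptions for illustration only, not what jobmond actually does; the point is just that a hyphen inside the job id cannot be told apart from the hyphen before the sequence number.

```python
PREFIX = "zplugin-monarch-job-"  # prefix as seen in the gmetric log lines


def split_on_first_hyphen(metric_name):
    """Hypothetical naive parser: everything before the first hyphen is the job id."""
    body = metric_name[len(PREFIX):]
    job_id, _, rest = body.partition("-")
    return job_id, rest


def build_name(job_id, sequence, sep="_"):
    """Build a metric name with a separator that does not occur in job ids (assumed: '_')."""
    return f"{PREFIX}{job_id}{sep}{sequence}"


def parse_name(metric_name, sep="_"):
    """Split the metric name back into (job_id, sequence) on the last separator."""
    body = metric_name[len(PREFIX):]
    job_id, sequence = body.rsplit(sep, 1)
    return job_id, int(sequence)


if __name__ == "__main__":
    # Array job 6758475-1, sequence 0: the job id is cut at its own hyphen.
    print(split_on_first_hyphen("zplugin-monarch-job-6758475-1-0"))
    # With a distinct separator the array job id survives the round trip.
    print(parse_name(build_name("6758475-1", 0)))
```

Any separator works as long as it can never appear in a job id reported by the batch system.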

comment:4 Changed 11 years ago by ramonb

  • Description modified (diff)
  • Owner changed from somebody to ramonb
  • Status changed from new to assigned

comment:5 Changed 11 years ago by ramonb

it appears the jobstart / jobrange variable for graphing is no longer present in ganglia-web2. might need to implement one

comment:6 Changed 11 years ago by ramonb

  • Description modified (diff)

comment:7 Changed 11 years ago by ramonb

  • Description modified (diff)

comment:8 Changed 11 years ago by ramonb

removed an unnecessary XML parsing step.

reduced overview loading time (for 600 jobs and 600 nodes) from 53 seconds down to 12 seconds.

comment:9 Changed 11 years ago by ramonb

now 3 times faster at 14 seconds for 2000 jobs

comment:10 Changed 11 years ago by ramonb

now at ~10 seconds for 2000 jobs. Would be nice to get it below 5 seconds but now it's going to get tough.

already a whole lot faster. Might leave the speedups (for now) and get started on ironing out the last web things and archive testing

comment:11 Changed 11 years ago by ramonb

  • Description modified (diff)

lots of speedups implemented, XML parsing optimized more.

rewrote the templating from TemplatePower to Dwoo, since TemplatePower has changed its licensing for commercial use.

comment:12 Changed 11 years ago by ramonb

  • Description modified (diff)

archive is now working again too. Going to let it run for a while to properly test it and iron out some bugs.

Now nearing a fully functional 0.4 version. Might ask some people to test it

comment:13 Changed 11 years ago by ramonb

  • Description modified (diff)
  • Summary changed from version 0.4 to version 1.0

extensively tested the new jobmond and jobarchived.

Working well now, squashed some more bugs. One in particular in jobarchived was very nasty: whenever a single job update/insert in SQL failed, the entire job XML thread would hang without crashing the daemon. This has been fixed.
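
For illustration, a minimal sketch of the general idea of isolating a failure to one job, so that a single failed update/insert does not stall processing of the rest. This is not the actual fix: sqlite3 is used only to keep the example self-contained, and the table, columns and function name are hypothetical.

```python
import logging
import sqlite3

log = logging.getLogger("archive-sketch")

# Hypothetical schema, only for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    owner  TEXT
)
"""


def store_jobs(conn, jobs):
    """Insert or update each job in its own transaction.

    A failure on one job is logged and skipped; the ids of the failed
    jobs are returned so the caller can retry or report them.
    """
    failed = []
    for job in jobs:
        try:
            with conn:  # commits on success, rolls back on an exception
                conn.execute(
                    "INSERT OR REPLACE INTO jobs (job_id, status, owner) "
                    "VALUES (?, ?, ?)",
                    (job["job_id"], job["status"], job["owner"]),
                )
        except sqlite3.Error as exc:
            log.warning("job %s could not be stored: %s", job["job_id"], exc)
            failed.append(job["job_id"])
    return failed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    jobs = [
        {"job_id": "6758475-1", "status": "R", "owner": "marenh"},
        {"job_id": "broken", "status": None, "owner": "nobody"},  # violates NOT NULL
        {"job_id": "6758490", "status": "R", "owner": "mbotseka"},
    ]
    print(store_jobs(conn, jobs))
```

Keeping each job in its own transaction means a rollback only discards the one job that failed.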

We also discussed bumping the version number to 1.0, because the new version is incompatible with previous versions of Job Monarch. This is caused by, among other things, changes in the database schema and changes in the Job Monarch protocol. Version 1.0 is still the non-Ajax-pretty-ExtJS-ui version, i.e. the old-simple-web-interface version.

Now just need to finish up the web interface of 1.0 and then we should be ready for release.

The pretty Ajax ExtJS UI version will be bumped up to version 2.0.
