| 8 | == Features == |
| 9 | |
| 10 | Job Monarch, short for 'Job Monitoring and Archiving' tool, consists of three components: |
| 11 | |
| 12 | __jobmond__:: |
| 13 | |
| 14 | The Job Monitoring Daemon. |
| 15 | |
| 16 | Gathers PBS/Torque batch statistics on jobs/nodes and submits them into |
| 17 | Ganglia's XML stream. |
| 18 | |
| 19 | Through this daemon, users can view the PBS/Torque batch system and the |
| 20 | jobs and nodes in it, whether running or queued. |
| 21 | |
| 22 | __jobarchived (optional)__:: |
| 23 | |
| 24 | The Job Archiving Daemon. |
| 25 | |
| 26 | Listens to Ganglia's XML stream and archives the job and node statistics. |
| 27 | It stores the job statistics in a PostgreSQL database and the node statistics |
| 28 | in RRD files. |
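
As a rough illustration of what listening to Ganglia's XML stream involves, the sketch below parses a small Ganglia-style XML document and separates job metrics from ordinary node metrics. The sample XML, the `MONARCH-JOB` metric prefix, and the `split_metrics` helper are assumptions made for this example; they are not taken from jobarchived itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a Ganglia-style XML stream.
# A real stream carries more attributes per element.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0.1" SOURCE="gmond">
  <CLUSTER NAME="example">
    <HOST NAME="node01" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float"/>
      <METRIC NAME="MONARCH-JOB-101-0" VAL="owner=alice status=R" TYPE="string"/>
    </HOST>
    <HOST NAME="node02" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.50" TYPE="float"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

JOB_PREFIX = "MONARCH-JOB"  # assumed naming convention, for illustration only


def split_metrics(xml_text, job_prefix=JOB_PREFIX):
    """Separate job metrics from ordinary node metrics."""
    root = ET.fromstring(xml_text)
    jobs, nodes = {}, {}
    for host in root.iter("HOST"):
        for metric in host.iter("METRIC"):
            name, val = metric.get("NAME"), metric.get("VAL")
            if name.startswith(job_prefix):
                # Job metrics would go to the database.
                jobs[name] = val
            else:
                # Node metrics would go to RRD files, grouped per host.
                nodes.setdefault(host.get("NAME"), {})[name] = val
    return jobs, nodes


jobs, nodes = split_metrics(SAMPLE_XML)
```

In this sketch the two returned dictionaries stand in for the two storage targets described above: job statistics for the database and node statistics for the RRD files.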
| 29 | |
| 30 | Through this daemon, users are able to look up an old or finished job |
| 31 | and view all its statistics. |
| 32 | |
| 33 | This daemon is optional: run it only if your users have a use for it, |
| 34 | as it can be a heavy application to run and not everyone needs it. |
| 35 | |
| 36 | * Key features |
| 37 | * Multithreaded[[BR]] |
| 38 | Does not miss any data, even when storage is slow |
| 39 | * Staged writing[[BR]] |
| 40 | Spreads the write load over longer time periods |
| 41 | * High precision RRDs[[BR]] |
| 42 | Allows zooming in on old periods with high precision |
| 43 | * Timeperiod RRDs[[BR]] |
| 44 | Allows for a smaller number of files while still keeping disk usage low |
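
The combination of multithreading and staged writing listed above can be sketched as follows: one thread keeps accepting samples while a second thread writes them out in batches, so slow storage does not stall collection. This is a minimal sketch under stated assumptions, not jobarchived's actual implementation; the `StagedWriter` class, its `flush_size` parameter, and the `sink` callable are hypothetical names introduced for this example.

```python
import queue
import threading


class StagedWriter:
    """Buffer samples in memory and write them out in batches."""

    def __init__(self, flush_size, sink):
        self.flush_size = flush_size  # hypothetical batch size
        self.sink = sink              # callable that receives a list of samples
        self.inbox = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def submit(self, sample):
        # Called by the collecting thread; it never waits on storage.
        self.inbox.put(sample)

    def close(self):
        # A None sentinel tells the worker to flush what is left and stop.
        self.inbox.put(None)
        self.worker.join()

    def _run(self):
        batch = []
        while True:
            sample = self.inbox.get()
            if sample is None:
                break
            batch.append(sample)
            if len(batch) >= self.flush_size:
                self.sink(batch)
                batch = []
        if batch:
            self.sink(batch)


written = []
writer = StagedWriter(flush_size=3, sink=written.append)
for i in range(7):
    writer.submit(("node01", "load_one", i))
writer.close()
```

Here `written.append` stands in for the slow storage backend. A time-based flush interval would be a natural variation on the size-based batching used in this sketch.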
| 45 | |
| 46 | __web__:: |
| 47 | |
| 48 | The Job Monarch web interface. |
| 49 | |
| 50 | It interfaces with the jobmond data and, optionally, the jobarchived data, and presents the |
| 51 | data and graphs. |
| 52 | |
| 53 | It uses a layout similar to that of Ganglia itself, so navigation and usage are intuitive. |
| 54 | |
| 55 | * Key features |
| 56 | * Graphical usage[[BR]] |
| 57 | Displays a graphical cluster overview so you can see the cluster (job) state |
| 58 | in one image, plus a pie chart with information relevant to your |
| 59 | current view |
| 60 | * Filters[[BR]] |
| 61 | Ability to filter output to limit the information displayed (useful for |
| 62 | clusters with 500+ jobs). The filter also applies to the graphical overview image |
| 63 | and pie chart, so you only see data relevant to the filter |
| 64 | * Archive[[BR]] |
| 65 | When jobarchived is enabled, users can go back as far as the database |
| 66 | or archived RRDs reach to find out what happened to a crashed or old job |
| 67 | * Zoom ability[[BR]] |
| 68 | Users can zoom into a time period as small as the smallest grain of the RRDs |
| 69 | (typically 10 seconds) when jobarchived is present |
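
The filter behaviour described above, where one filter limits both the job listing and the pie chart, can be illustrated with a short sketch. The job records, their field names, and the `apply_filters` and `pie_slices` helpers are hypothetical and chosen for this example only; the actual web interface is written in PHP and may work differently.

```python
from collections import Counter

# Hypothetical job records; field names are illustrative only.
JOBS = [
    {"id": "101", "owner": "alice", "queue": "short", "status": "R"},
    {"id": "102", "owner": "bob", "queue": "long", "status": "Q"},
    {"id": "103", "owner": "alice", "queue": "long", "status": "R"},
]


def apply_filters(jobs, **filters):
    """Keep only jobs whose fields match every given filter value."""
    return [j for j in jobs if all(j.get(k) == v for k, v in filters.items())]


def pie_slices(jobs, field):
    """Count jobs per value of `field`, as input for a pie chart."""
    return Counter(j[field] for j in jobs)


# Filter once, then derive both the listing and the chart from the same subset.
visible = apply_filters(JOBS, owner="alice")
slices = pie_slices(visible, "queue")
```

Deriving the chart data from the already filtered list is what keeps the listing and the pie chart consistent with each other in this sketch.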
| 70 | |
| 71 | == Requirements == |
| 72 | |
| 73 | * Python 2.3 or higher |
| 74 | |
| 75 | __jobmond__ |
| 76 | |
| 77 | * pbs_python v2.8.2 or higher[[BR]] |
| 78 | ftp://ftp.sara.nl/pub/outgoing/pbs_python.tar.gz |
| 79 | * gmond v3.0.1 or higher[[BR]] |
| 80 | http://www.ganglia.info |
| 81 | |
| 82 | __jobarchived__ |
| 83 | |
| 84 | PostgreSQL v7.xx[[BR]] |
| 85 | http://www.postgres.org |
| 86 | |
| 87 | * rrdtool v1.xx[[BR]] |
| 88 | http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/ |
| 89 | |
| 90 | * python-pgsql v4.x.x[[BR]] |
| 91 | http://sourceforge.net/projects/pypgsql/ |
| 92 | |
| 93 | * gmetad v3.x.x[[BR]] |
| 94 | http://www.ganglia.info |
| 95 | |
| 96 | __web__ |
| 97 | |
| 98 | * PHP v4.1 or higher[[BR]] |
| 99 | http://www.php.net |
| 100 | * php-mbstring (multibyte string handling support)[[BR]] |
| 101 | (configure php with --enable-mbstring) |
| 102 | * php-pgsql v4.x.x[[BR]] |
| 103 | (should come with Postgres) |
| 104 | * GD v2.x[[BR]] |
| 105 | http://www.boutell.com/gd/ |
| 106 | * Ganglia web frontend v3.x.x[[BR]] |
| 107 | http://www.ganglia.info |
| 108 | |