[[PageOutline]]

= Welcome to '''Job Monarch'''! =

Job Monarch is an addon to the [http://www.ganglia.info/ Ganglia Monitoring System] that provides (batch) job monitoring and a graphical overview of clusters and assorted batch systems.

Monarch is an abbreviation for Monitoring and Archiving: it can also archive these job (monitoring) statistics, so that your (batch) cluster users can look up information on old (and possibly failed) jobs to analyze possible problems.

== Features ==

Job Monarch consists of three (3) components:

 __jobmond__::
   The Job Monitoring Daemon. Gathers batch statistics on jobs/nodes and submits them into Ganglia's XML stream. Through this daemon, users are able to view the PBS/Torque batch system and the jobs/nodes that are in it (whether running or queued).
   Currently supported batch systems:
    * PBS
    * Torque
 __jobarchived (optional)__::
   The Job Archiving Daemon. Listens to Ganglia's XML stream and archives the job and node statistics. It stores the job statistics in a (Postgres) SQL database and the node statistics in RRD files. Through this daemon, users are able to look up an old/finished job and view all its statistics.
   This daemon is optional: run it only if your users have a use for it, as it can be a heavy application to run and not everyone may need it.
    * Key features
      * Multithreaded[[BR]] Will not miss any data regardless of (slow) storage
      * Staged writing[[BR]] Spreads load over bigger time periods
      * High-precision RRDs[[BR]] Allow zooming in on old periods with high precision
      * Time-period RRDs[[BR]] Allow for a smaller number of files while still keeping the advantage of low disk usage
 __web__::
   The Job Monarch web interface. It interfaces with the jobmond data and (optionally) the jobarchived data, and presents the data and graphs. Its layout is similar to that of Ganglia itself, so navigation and usage are intuitive.
    * Key features
      * Graphical usage[[BR]] Displays a graphical cluster overview so you can see the cluster (job) state in one view/image, plus a pie chart with information relevant to your current view
      * Filters[[BR]] Ability to filter output to limit the information displayed (useful for clusters with 500+ jobs). The filter also applies to the graphical overview images and the pie chart, so you only see the data relevant to the filter
      * Archive[[BR]] When jobarchived is enabled, users can go back as far as recorded in the database or archived RRDs to find out what happened to a crashed or old job
      * Zoom ability[[BR]] When a jobarchived is present, users can zoom into a time period as small as the smallest grain of the RRDs (typically down to 10 seconds)

= Documentation =

Visit our online documentation here:
 * [wiki:Documentation/Requirements Requirements]
 * [wiki:Documentation/Installation Installation]
 * [wiki:Documentation/Configuration Configuration]
 * [wiki:Documentation/Usage Usage]
 * [wiki:Documentation/FAQ Frequently Asked Questions]

= Screenshots =

You can have a look at a number of screenshots, displaying Job Monarch in action:
 * [wiki:Documentation/Screenshots Screenshots]

= Working example preview =

You can see a working preview/example here:
 * [http://ganglia.sara.nl/addons/job_monarch/?c=LISA%20Cluster SARA's Lisa Cluster Job Monarch page]
 * [http://ganglia.sara.nl/?c=LISA%20Cluster SARA's Lisa Cluster Ganglia page]

= Download =

You can grab the tarball from our FTP site:
 * [ftp://ftp.sara.nl/pub/outgoing/ganglia_jobmonarch-latest.tar.gz ganglia_jobmonarch-latest.tar.gz]

DEB and RPM packages are also available; get the latest versions here:
 * [ftp://ftp.sara.nl/pub/outgoing/jobmonarch/latest/ ftp://ftp.sara.nl/pub/outgoing/jobmonarch/latest/]

== Source code ==

You can browse the current code here:
 * [source:tags/] -- releases (stable)
 * [source:trunk/] -- current development (non-stable)

Or you can check out the code (anonymous, read-only) through Subversion:
 * {{{svn co https://subtrac.sara.nl/oss/svn/jobmonarch/tags}}} -- releases (stable)
 * {{{svn co https://subtrac.sara.nl/oss/svn/jobmonarch/trunk}}} -- current development (non-stable)

=== Build packages ===

You can build the RPM and DEB packages or tarballs yourself from the SVN tree, through the Makefile:
{{{
make deb
make rpm
make tarball
}}}

This is useful if, for example, you want to change the web installdir of your packages, or simply want to test a development version.

= Report bugs =

You can create tickets and/or submit patches in our ticket system:
 * [https://subtrac.sara.nl/oss/jobmonarch/newticket Create a new ticket] -- Don't forget to supply your e-mail address if you would like to be kept informed!
 * [report:1 View current tickets]

= Links =

 * [https://subtrac.sara.nl/oss/pbs_python pbs_python] -- Homepage of pbs_python, the Python module used for gathering job statistics from PBS/Torque
 * [http://www.ganglia.info/ Ganglia] -- Homepage of The Ganglia Monitoring System

= Looking for help! =

To maximize the results and compatibility we are looking for help! Don't hesitate to contact the author below if you think you can help with:
 * Testing Job Monarch: currently only Python 2.4 with PHP4 and PBS is actively tested
   * Can you test cross platform: x64, ia64?
   * Can you test cross batch: SGE, LSF?
   * Can you test cross versions: PHP4, PHP5 and Python 2.3 or 2.5?
 * Implementing support for more batch systems
   * for example: !LoadLeveler is still wanted
 * Contributing in any other way you can think of

= Contact & Community =

Two mailing lists have been set up to support the Job Monarch project.

For user community discussion and help:
 * https://lists.sourceforge.net/lists/listinfo/jobmonarch-users/

For project development progress and discussion:
 * https://lists.sourceforge.net/lists/listinfo/jobmonarch-developers/