[[ShowPath]]
[[PageOutline]]

= Configuration =

After installation, each component requires additional configuration.

== jobmond ==

Here is an example of the contents of a typical jobmond.conf file:

{{{
[DEFAULT]

# Specify debugging level here;
#
# 10 = gmetric cmd's
#
DEBUG_LEVEL : 0

# Whether or not to run as a daemon in background
#
DAEMONIZE : 1

# What Batch type is the system
#
# Currently supported: pbs, slurm, sge (experimental), lsf (experimental)
#
BATCH_API : pbs

# Which Batch server to monitor
#
BATCH_SERVER : localhost

# Which queue(s) to report jobs of
# (optional)
#
#QUEUE : long, short

# How many seconds interval for polling of jobs
#
# this will directly affect how accurately the
# end time of a job can be determined
#
BATCH_POLL_INTERVAL : 30

# Location of gmond.conf
#
# Default: /etc/gmond.conf
#
# DEPRECATED!: use GMETRIC_TARGET!
#
#GMOND_CONF : /etc/gmond.conf

# Location of gmetric binary
#
# Default: /usr/bin/gmetric
#
# DEPRECATED!: use GMETRIC_TARGET!
#
#GMETRIC_BINARY : /usr/bin/gmetric

# Target of Gmetric's: where should we report to
# (usually: your udp_send_channel from gmond)
#
# Syntax: <ip>:<port>
#
GMETRIC_TARGET : 239.2.11.71:8649

# Enable logging to syslog?
#
USE_SYSLOG : 1

# What level of messages should be logged to syslog?
#
# usually: lvl 0 (errors)
#
SYSLOG_LEVEL : 0

# Which facility to use in syslog
#
# Known:
#       KERN, USER, MAIL, DAEMON, AUTH, LPR,
#       NEWS, UUCP, CRON and LOCAL0 through LOCAL7
#
SYSLOG_FACILITY : DAEMON

# Whether or not to detect differences in
# time from Torque server and local time.
#
# Ideally both machines (if not the same)
# should have the same time (via ntp or whatever)
#
DETECT_TIME_DIFFS : 1

# Regexp style hostname translation
#
# Useful if your Batch hostnames are not the same as your
# Ganglia hostnames (different network interfaces)
#
# Syntax: /orig/new/, /orig/new/
#
BATCH_HOST_TRANSLATE :
}}}

=== DEBUG_LEVEL ===

 * required
 * valid values: any number from 0 to 20

Sets which level of messages is sent to syslog (in daemon mode) or printed to stdout (in foreground mode).

=== DAEMONIZE ===

 * required
 * valid values: 0 or 1
   * 0: don't daemonize; run in the foreground. Any DEBUG_LEVEL messages are sent to stdout
   * 1: daemonize; run in the background. Any DEBUG_LEVEL messages are sent to syslog

Determines whether or not jobmond should run as a daemon in the background.

=== BATCH_API ===

 * required
 * valid values: pbs, slurm, sge, lsf
   * pbs: PBS or [http://www.adaptivecomputing.com/products/open-source/torque/ TORQUE Resource Manager]
     * requires [https://oss.trac.surfsara.nl/pbs_python pbs_python]
   * slurm: [http://slurm.schedmd.com SLURM Workload Manager]
     * requires [http://www.gingergeeks.co.uk/pyslurm/ pyslurm]
   * lsf: (experimental) [http://www-03.ibm.com/systems/technicalcomputing/platformcomputing/products/lsf/ IBM Platform LSF]
     * requires [http://sourceforge.net/projects/lsfobject/ lsfObject]
   * sge: (experimental) [http://www.oracle.com/us/products/tools/oracle-grid-engine-075549.html Oracle Grid Engine] or [http://gridscheduler.sourceforge.net Open Grid Scheduler]

Specifies which type of batch system (API) is used.

=== BATCH_SERVER ===

 * optional
 * valid values: any text string

Tells jobmond whether to connect to a remote batch server (of type BATCH_API).
 * If set: connect with BATCH_API to BATCH_SERVER
 * If not set: use BATCH_API on the local system where jobmond is running (which should be the batch server)

=== QUEUE ===

 * optional
 * valid values: any text string or comma-separated list

Specifies which queues of the batch system to monitor. To limit job reporting to certain queues, list them here.

 * If set: only jobs that reside in QUEUE are reported
 * If not set: all jobs are reported

=== BATCH_POLL_INTERVAL ===

 * required
 * valid values: any number (of seconds)

Sets how often jobmond polls the BATCH_API and how often this information is reported. This directly affects how accurately jobarchived can detect finished jobs. For example: if this is set to 180 seconds and a job has finished, it may take jobarchived up to 180 seconds to set a finish time in the job database.

=== GMOND_CONF ===

 * optional
 * default: /etc/ganglia/gmond.conf
 * valid values: any text string

Specifies the location of Ganglia's gmond.conf:

 * If set: jobmond checks GMOND_CONF for which udp_send_channels to use for reporting job metrics
 * If not set: jobmond uses GMETRIC_TARGET for reporting job metrics

=== GMETRIC_BINARY ===

 * deprecated
 * optional
 * valid values: any text string

Specifies the location of Ganglia's gmetric binary. This forces jobmond to use Ganglia's gmetric binary to report jobs. This should not be needed: jobmond uses its own internal gmetric handling, which is much faster.

 * If set: disables jobmond's internal gmetric handling and submits gmetrics using GMETRIC_BINARY; requires GMOND_CONF to be set
 * If not set: jobmond's internal gmetric handling is used

=== GMETRIC_TARGET ===

 * optional
 * valid values: {{{<ip>:<port>}}}

Specifies where to report job information to. This can be a multicast or unicast address. A gmond must be running that has this address set as its udp_recv_channel, and proper network routes to this address have to be set up.
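A GMETRIC_TARGET value can be checked before it goes into jobmond.conf. The following is only a minimal sketch: the function name parse_gmetric_target is hypothetical, and it assumes the value is an IP literal followed by a port, as in the example configuration. It is not how jobmond itself parses the option.

```python
import ipaddress


def parse_gmetric_target(value):
    """Split an 'ip:port' string into (address, port, is_multicast).

    Hypothetical helper for illustration; assumes an IP literal, not a hostname.
    """
    host, _, port_text = value.strip().rpartition(":")
    if not host or not port_text.isdigit():
        raise ValueError("expected ip:port, got %r" % value)
    address = ipaddress.ip_address(host)  # raises ValueError if not an IP literal
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return str(address), port, address.is_multicast


# The target from the example jobmond.conf is a multicast address.
print(parse_gmetric_target("239.2.11.71:8649"))
```

Knowing whether the target is multicast or unicast helps when checking that the matching gmond receive channel and the network routes are set up for that address.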
 * If set: report job information to GMETRIC_TARGET
 * If not set: report job information to the udp_send_channels found in GMOND_CONF; requires a valid GMOND_CONF

=== USE_SYSLOG ===

 * required
 * valid values: 0 or 1
   * 0: don't log messages
   * 1: log any messages at DEBUG_LEVEL to syslog's SYSLOG_FACILITY

Specifies whether or not to use syslog for messages.

=== SYSLOG_FACILITY ===

 * required
 * valid values: KERN, USER, MAIL, DAEMON, AUTH, LPR, NEWS, UUCP, CRON, LOCAL0, LOCAL1, LOCAL2, LOCAL3, LOCAL4, LOCAL5, LOCAL6, LOCAL7

Specifies to which syslog facility syslog messages are sent.

=== DETECT_TIME_DIFFS ===

 * required
 * valid values: 0 or 1
   * 0: don't detect time differences
   * 1: detect time differences between BATCH_SERVER and localhost

When a remote BATCH_SERVER is used, this tells jobmond to detect and compensate for any time difference between localhost and the remote BATCH_SERVER. Ideally both servers should use NTP to maintain the same date and time.

=== BATCH_HOST_TRANSLATE ===

 * required
 * default: (empty)
 * valid values: a comma-separated list of {{{/orig/new/}}}

Specifies a search and replace (regular expressions allowed) to apply to batch node hostnames before reporting them. This is useful when your batch node hostnames and Ganglia hostnames are not the same.
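The effect of a comma-separated list of /orig/new/ rules can be tried out in isolation. The following is a minimal sketch, not jobmond's own code: the helper names and the hostnames are hypothetical, and it assumes that patterns contain no slash or comma and that rules are applied in the order given.

```python
import re


def parse_translate_rules(setting):
    """Turn '/orig/new/, /orig/new/' into a list of (compiled pattern, replacement)."""
    rules = []
    for item in setting.split(","):
        item = item.strip()
        if not item:
            continue
        parts = item.split("/")
        # '/orig/new/' splits into ['', 'orig', 'new', '']
        if len(parts) != 4 or parts[0] or parts[3]:
            raise ValueError("expected /orig/new/, got %r" % item)
        rules.append((re.compile(parts[1]), parts[2]))
    return rules


def translate_hostname(hostname, rules):
    """Apply every search/replace rule to the hostname, in order."""
    for pattern, replacement in rules:
        hostname = pattern.sub(replacement, hostname)
    return hostname


# Hypothetical rules: drop an 'ib-' prefix, then a '.cluster.local' suffix.
rules = parse_translate_rules(r"/^ib-//, /\.cluster\.local$//")
print(translate_hostname("ib-node12.cluster.local", rules))
```

An empty setting yields no rules, so hostnames pass through unchanged.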
For example, a job runs on a batch node with hostname infiniband-host1, but in Ganglia the node is named host1. Then you can set {{{BATCH_HOST_TRANSLATE: /infiniband-//}}} and it will strip the infiniband- prefix from the hostname.

 * If not empty: all batch node names are passed through all specified regular expression search/replace statements before being reported
 * If empty: no search/replace is done

== jobarchived ==

Here is an example of the contents of a typical jobarchived.conf file:

{{{
[DEFAULT]

# Whether or not to run as a daemon in background
#
DAEMONIZE : 1

# Specify debugging level here (only when _not_ DAEMONIZE)
#
# 11 = XML: metrics
# 10 = XML: host, cluster, grid, ganglia
# 9  = RRD activity, gmetad config parsing
# 8  = RRD file activity
# 6  = SQL
# 1  = daemon threading
# 0  = errors
#
# default: 0
#
DEBUG_LEVEL : 1

# Enable logging to syslog?
#
USE_SYSLOG : 1

# What level of messages should be logged to syslog?
#
# usually: lvl 0 (errors)
#
SYSLOG_LEVEL : 0

# Which facility to use in syslog
#
# Known:
#       KERN, USER, MAIL, DAEMON, AUTH, LPR,
#       NEWS, UUCP, CRON and LOCAL0 through LOCAL7
#
SYSLOG_FACILITY : DAEMON

# Where is the gmetad.conf located
#
GMETAD_CONF : /etc/ganglia/gmetad.conf

# Where to grab XML data from
# Usually: local gmetad (port 8651)
#
# Syntax: <hostname>:<port>
#
ARCHIVE_XMLSOURCE : localhost:8651

# List of data_source names to archive for
#
# Syntax: [ "<data_source name>", "<data_source name>" ]
#
ARCHIVE_DATASOURCES : [ "My Cluster" ]

# Amount of hours to store in one single archived rrd
#
# If you would like fewer files you can set this higher,
# but that could degrade performance
#
# For now 12 hours seems to work: 2 periods per day
#
ARCHIVE_HOURS_PER_RRD : 12

# Which metrics to exclude from archiving
# NOTE: This can be a regexp or a string
#
ARCHIVE_EXCLUDE_METRICS : ".*Temp.*", ".*RPM.*", ".*Version.*", ".*Tag$", "boottime", "gexec", "os.*", "machine_type"

# Where to store the archived rrd's
#
ARCHIVE_PATH : /usr/local/jobmonarch

# Archive's SQL dbase to use
#
# Syntax: <hostname>/<database>
#
JOB_SQL_DBASE : localhost/jobarchive
JOB_SQL_USER : jobarchive
#JOB_SQL_PASSWORD :

# Timeout for jobs in archive
#
# Assume job has already finished while jobarchived was not running
# after this amount of hours: then it will be finished anyway in the database
#
JOB_TIMEOUT : 168

# Location of rrdtool binary
#
RRDTOOL : /usr/bin/rrdtool
}}}

=== DEBUG_LEVEL ===

 * required
 * valid values: any number from 0 to 20

Sets which level of messages is sent to syslog (in daemon mode) or printed to stdout (in foreground mode).

=== DAEMONIZE ===

 * required
 * valid values: 0 or 1
   * 0: don't daemonize; run in the foreground. Any DEBUG_LEVEL messages are sent to stdout
   * 1: daemonize; run in the background. Any DEBUG_LEVEL messages are sent to syslog

Determines whether or not jobarchived should run as a daemon in the background.

=== USE_SYSLOG ===

 * required
 * valid values: 0 or 1
   * 0: don't log messages
   * 1: log any messages at DEBUG_LEVEL to syslog's SYSLOG_FACILITY

Specifies whether or not to use syslog for messages.

=== SYSLOG_FACILITY ===

 * required
 * valid values: KERN, USER, MAIL, DAEMON, AUTH, LPR, NEWS, UUCP, CRON, LOCAL0, LOCAL1, LOCAL2, LOCAL3, LOCAL4, LOCAL5, LOCAL6, LOCAL7

Specifies to which syslog facility syslog messages are sent.

=== GMETAD_CONF ===

 * required
 * valid values: any text string

Specifies the location of Ganglia's gmetad.conf.

=== ARCHIVE_XMLSOURCE ===

 * required
 * valid values: {{{<hostname>:<port>}}}

Specifies where to get the XML from that is stored in the archive. Normally this is the XML port of a gmetad daemon ({{{xml_port}}} in gmetad.conf).

== web ==

 1. Change your Ganglia web template to Job Monarch:
    {{{
    vi /var/www/ganglia/conf.php
    }}}
    {{{
    $template_name = "job_monarch";
    }}}
 2. Change Job Monarch's config to reflect your settings:
    {{{
    vi /var/www/ganglia/addons/job_monarch/conf.php
    }}}
    (see the comments in the config file for syntax and explanation)
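The options marked as required for jobmond earlier on this page can be checked before the daemon is started. The following is a minimal sketch under stated assumptions: it assumes the file can be read by Python's standard configparser module (the KEY : value layout under [DEFAULT] suggests this), and the function name check_jobmond_conf and the sample text are hypothetical.

```python
import configparser

# Hypothetical excerpt of a jobmond.conf, reduced to options described on this page.
SAMPLE_CONF = """
[DEFAULT]
DEBUG_LEVEL : 0
DAEMONIZE : 1
BATCH_API : pbs
BATCH_POLL_INTERVAL : 30
GMETRIC_TARGET : 239.2.11.71:8649
USE_SYSLOG : 1
SYSLOG_FACILITY : DAEMON
DETECT_TIME_DIFFS : 1
BATCH_HOST_TRANSLATE :
"""

# Options this page lists as required for jobmond.
REQUIRED = [
    "DEBUG_LEVEL", "DAEMONIZE", "BATCH_API", "BATCH_POLL_INTERVAL",
    "USE_SYSLOG", "SYSLOG_FACILITY", "DETECT_TIME_DIFFS", "BATCH_HOST_TRANSLATE",
]
VALID_BATCH_APIS = {"pbs", "slurm", "sge", "lsf"}


def check_jobmond_conf(text):
    """Return a list of problems found in the config text (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names in upper case
    parser.read_string(text)
    options = parser.defaults()  # contents of the [DEFAULT] section

    problems = ["missing required option: " + name
                for name in REQUIRED if name not in options]
    if "BATCH_API" in options and options["BATCH_API"] not in VALID_BATCH_APIS:
        problems.append("unsupported BATCH_API: " + options["BATCH_API"])
    if "DAEMONIZE" in options and options["DAEMONIZE"] not in ("0", "1"):
        problems.append("DAEMONIZE must be 0 or 1")
    return problems


print(check_jobmond_conf(SAMPLE_CONF))
```

An option with an empty value, such as BATCH_HOST_TRANSLATE in the sample, still counts as present in this sketch.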