source: branches/1.0/INSTALL @ 980

Last change on this file since 980 was 849, checked in by olahaye, 11 years ago

[INSTALL] replaced pyPgSQL with python-psycopg2

DESCRIPTION
===========

        Job Monarch is a set of tools to monitor and optionally archive (batch) job information.

        It is an addon for the Ganglia monitoring system and plugs into an existing Ganglia setup.

        To view an operational setup with Job Monarch, have a look here: http://ganglia.sara.nl/


        Job Monarch stands for 'Job Monitoring and Archiving' tool and consists of three (3) components:

        * jobmond

                The Job Monitoring Daemon.

                Gathers PBS/Torque batch statistics on jobs/nodes and submits them into
                Ganglia's XML stream.

                Through this daemon, users are able to view the PBS/Torque batch system and the
                jobs/nodes that are in it (whether running or queued).

        * jobarchived (optional)

                The Job Archiving Daemon.

                Listens to Ganglia's XML stream and archives the job and node statistics.
                It stores the job statistics in a PostgreSQL database and the node statistics
                in RRD files.

                Through this daemon, users are able to look up an old/finished job
                and view all its statistics.

                Optional: run this daemon only if your users have a use for it. It can be
                a heavy application to run, and not everyone may need it.

                - Multithreaded:        Will not miss any data regardless of (slow) storage

                - Staged writing:       Spread load over bigger time periods

                - High precision RRDs:  Allow for zooming on old periods with large precision

                - Timeperiod RRDs:      Allow for smaller number of files while still keeping advantage
                                        of small disk space

        * web

                The Job Monarch web interface.

                This interfaces with the jobmond data and (optionally) the jobarchived data and presents the
                data and graphs.

                It does this in a layout/setup similar to Ganglia itself, so navigation and usage are intuitive.

                - Graphical usage:      Displays a graphical cluster overview so you can see the cluster (job) state
                                        in one view/image, plus a pie chart with relevant information on your
                                        current view

                - Filters:              Ability to filter output to limit the information displayed (useful for
                                        clusters with 500+ jobs). This also filters the graphical overview image
                                        and pie chart so you only see data relevant to the filter

                - Archive:              When jobarchived is enabled, users can go back as far as recorded in the database
                                        or archived RRDs to find out what happened to a crashed or old job

                - Zoom ability:         Users can zoom into a time period as small as the smallest grain of the RRDs
                                        (typically down to 10 seconds) when a jobarchived is present

REQUIREMENTS
============

        all:

                - Python 2.3 or higher

        jobmond:

                - pbs_python v2.8.2 or higher
                  https://subtrac.sara.nl/oss/pbs_python/

                - gmond v3.0.1 or higher
                  http://www.ganglia.info/

        jobarchived:

                - PostgreSQL v7.xx
                  http://www.postgres.org/

                - rrdtool v1.xx
                  http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/

                - py-rrdtool
                  http://sourceforge.net/projects/py-rrdtool/

                - python-psycopg2
                  https://www.psycopg.org/

                - gmetad v3.x.x
                  http://www.ganglia.info/

        web:

                - PHP v4.1 or higher
                  http://www.php.net

                - php-pgsql v4.x.x
                  (should come with Postgres)

                - php-mbstring

                - GD v2.x
                  http://www.boutell.com/gd/

                - Ganglia web frontend v3.x.x
                  http://www.ganglia.info
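
        As a quick sanity check before installing, the sketch below reports which command-line
        tools are missing from PATH. The function name check_cmds and the binary names in the
        comments (python, gmond, psql, rrdtool, gmetad, php) are assumptions; distributions may
        name them differently. The sketch does not check versions, nor the Python and PHP
        modules listed above.

```shell
#!/bin/sh
# Report which of the given commands are not found on PATH.
# Prints one "missing: NAME" line per absent command and returns
# non-zero if anything is missing.
check_cmds() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            rc=1
        fi
    done
    return $rc
}

# Possible invocations (binary names are assumptions, adjust as needed):
#   jobmond host:      check_cmds python gmond
#   jobarchived host:  check_cmds python psql rrdtool gmetad
#   web host:          check_cmds php
```

        An empty result only means the binaries were found; it says nothing about whether
        their versions meet the minimums above.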


INSTALLATION
============

        Before installing the software, make sure you meet the requirements
        listed above.

        NOTE: You can install to other paths/directories if your setup is different.

        * jobmond

                1. Copy jobmond.py:

                 > cp jobmond/jobmond.py /usr/local/sbin/jobmond.py

                2. Copy jobmond.conf:

                 > cp jobmond/jobmond.conf /etc/jobmond.conf

        * jobarchived

                1. Create a PostgreSQL database for jobarchived:

                 > createdb jobarchive

                2. Set up jobarchived's tables:

                 > psql -f jobarchived/job_dbase.sql jobarchive

                3. Copy jobarchived.conf:

                 > cp jobarchived/jobarchived.conf /etc/jobarchived.conf

                4. Copy jobarchived.py:

                 > cp jobarchived/jobarchived.py /usr/local/sbin/jobarchived.py

        * web

                1. Copy the Job Monarch template to your Ganglia installation:

                 > cp -a web/templates/job_monarch /var/www/ganglia/templates

                2. Copy the web interface files to the addons directory in Ganglia:

                 > mkdir -p /var/www/ganglia/addons
                 > cp -a web/addons/job_monarch /var/www/ganglia/addons
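
        If your setup uses other paths, it can help to preview the copy commands before
        running them. The following is a minimal dry-run sketch, not part of Job Monarch:
        the variable names SBIN_DIR, CONF_DIR and GANGLIA_WEB and the function name
        print_install_cmds are hypothetical, and the defaults simply mirror the paths used
        above. The database steps (createdb, psql) do not depend on these paths and are
        left out.

```shell
#!/bin/sh
# Target locations; the defaults mirror the paths used in this file.
SBIN_DIR="${SBIN_DIR:-/usr/local/sbin}"
CONF_DIR="${CONF_DIR:-/etc}"
GANGLIA_WEB="${GANGLIA_WEB:-/var/www/ganglia}"

# Print (do not execute) the copy commands for one component.
print_install_cmds() {
    case "$1" in
        jobmond)
            echo "cp jobmond/jobmond.py $SBIN_DIR/jobmond.py"
            echo "cp jobmond/jobmond.conf $CONF_DIR/jobmond.conf"
            ;;
        jobarchived)
            echo "cp jobarchived/jobarchived.conf $CONF_DIR/jobarchived.conf"
            echo "cp jobarchived/jobarchived.py $SBIN_DIR/jobarchived.py"
            ;;
        web)
            echo "cp -a web/templates/job_monarch $GANGLIA_WEB/templates"
            echo "mkdir -p $GANGLIA_WEB/addons"
            echo "cp -a web/addons/job_monarch $GANGLIA_WEB/addons"
            ;;
        *)
            echo "unknown component: $1" >&2
            return 1
            ;;
    esac
}
```

        For example, "GANGLIA_WEB=/srv/www/ganglia" followed by "print_install_cmds web"
        prints the three web commands with that prefix. If you move the config files,
        remember to pass the new location with -c when starting the daemons.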

CONFIGURATION
=============

        After installation, each component requires additional configuration.

        * jobmond

                1. Edit jobmond's config to reflect your settings:

                 - In /etc/jobmond.conf

                   ( see config comments for syntax and explanation )

        * jobarchived

                1. Edit jobarchived's config to reflect your settings:

                 - In /etc/jobarchived.conf

                   ( see config comments for syntax and explanation )

        * web

                1. Change your Ganglia web template to Job Monarch:

                 - In /var/www/ganglia/conf.php:

                 > $template_name = "job_monarch";

                2. Change Job Monarch's config to reflect your settings:

                 - In /var/www/ganglia/addons/job_monarch/conf.php

                   ( see config comments for syntax and explanation )
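
        To confirm that the template setting was picked up, one option is to read the value
        back from conf.php. The sketch below is an illustration only: the function name
        get_template_name is hypothetical, and it assumes the assignment sits on a single
        line with a double-quoted value, as in the example above. Commented-out assignments
        are skipped, and the last active assignment wins.

```shell
#!/bin/sh
# Print the value assigned to $template_name in a Ganglia conf.php.
# Assumes a one-line assignment of the form:
#   $template_name = "job_monarch";
get_template_name() {
    sed -n 's/^[[:space:]]*\$template_name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" \
        | tail -n 1
}

# Possible invocation:
#   get_template_name /var/www/ganglia/conf.php
```

        If the function prints anything other than job_monarch, Ganglia is still using a
        different template.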

START
=====

        * jobmond

                The Job Monitor has to run on a machine that is allowed to
                query the PBS/Torque server.
                If 'acl_hosts' is enabled on your PBS/Torque server, make sure
                that jobmond's machine is listed in it.

                1. Start the Job Monitor:

                 > /usr/local/sbin/jobmond.py -c /etc/jobmond.conf

        * jobarchived

                1. Start the Job Archiver:

                 > /usr/local/sbin/jobarchived.py -c /etc/jobarchived.conf

        * web

                Doesn't require you to (re)start anything.
                ( make sure the PostgreSQL database is running though )
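
        Both daemons are started the same way: the program followed by -c and the path to
        its config file. A small wrapper can catch a missing program or config file before
        launching. This is a sketch under assumptions, not something shipped with Job
        Monarch; the function name start_daemon is hypothetical, and it only checks that
        the files are usable, not that the config contents are valid.

```shell
#!/bin/sh
# Start a daemon only if its program file and config file are usable.
# Usage: start_daemon PROGRAM CONFIG
start_daemon() {
    prog=$1
    conf=$2
    if [ ! -x "$prog" ]; then
        echo "not executable: $prog" >&2
        return 1
    fi
    if [ ! -r "$conf" ]; then
        echo "config not readable: $conf" >&2
        return 1
    fi
    # Same invocation as the commands above: PROGRAM -c CONFIG
    "$prog" -c "$conf"
}

# Possible invocations, using the default paths from this file:
#   start_daemon /usr/local/sbin/jobmond.py /etc/jobmond.conf
#   start_daemon /usr/local/sbin/jobarchived.py /etc/jobarchived.conf
```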

CONTACT
=======

        To contact the author for anything from bugfixes to flame/hate mail:

        * Ramon Bastiaans

          <bastiaans ( a t ) sara ( d o t ) nl>