DESCRIPTION
===========

        Job Monarch is a set of tools to monitor and optionally archive (batch) job information.

        It is an add-on for the Ganglia monitoring system and plugs into an existing Ganglia setup.

        To view an operational setup with Job Monarch, have a look here: http://ganglia.sara.nl/


        Job Monarch stands for 'Job Monitoring and Archiving' tool and consists of three (3) components:

        * jobmond

                The Job Monitoring Daemon.

                Gathers PBS/Torque batch statistics on jobs/nodes and submits them into
                Ganglia's XML stream.

                Through this daemon, users are able to view the PBS/Torque batch system and the
                jobs/nodes that are in it (whether running or queued).

        * jobarchived (optional)

                The Job Archiving Daemon.

                Listens to Ganglia's XML stream and archives the job and node statistics.
                It stores the job statistics in a PostgreSQL database and the node statistics
                in RRD files.

                Through this daemon, users are able to look up an old/finished job
                and view all its statistics.

                Optional: run this daemon only if your users have a use for it, since it
                can be a heavy application to run and not everyone needs it.

                - Multithreaded:        Will not miss any data regardless of (slow) storage

                - Staged writing:       Spreads load over longer time periods

                - High-precision RRDs:  Allow zooming in on old periods with high precision

                - Time-period RRDs:     Allow for a smaller number of files while still keeping the
                                        advantage of small disk space

        * web

                The Job Monarch web interface.

                It interfaces with the jobmond data and (optionally) jobarchived, and presents the
                data and graphs.

                It does this in a layout similar to Ganglia itself, so navigation and usage are intuitive.

                - Graphical usage:      Displays a graphical cluster overview so you can see the cluster (job)
                                        state in one view/image, plus an additional pie chart with relevant
                                        information on your current view

                - Filters:              Ability to filter output to limit the information displayed (useful for
                                        clusters with 500+ jobs). This also filters the graphical overview image
                                        and pie chart so you only see the data relevant to the filter

                - Archive:              When jobarchived is enabled, users can go back as far as recorded in the
                                        database or archived RRDs to find out what happened to a crashed or old job

                - Zoom ability:         Users can zoom into a time period as small as the smallest grain of the RRDs
                                        (typically down to 10 seconds) when a jobarchived is present

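        As an illustration of what reading Ganglia's XML stream involves, here is a minimal
        Python sketch that parses a small hand-written sample and groups metric values per
        host. It is not taken from jobarchived's source. The sample data, the metric name
        prefix "example_job_" and the function name metrics_by_host are assumptions made up
        for this example; only the CLUSTER/HOST/METRIC element layout follows Ganglia's XML
        output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of Ganglia's XML output.
# Host names, metric names and values are made up for this example.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0.1" SOURCE="gmond">
  <CLUSTER NAME="example-cluster">
    <HOST NAME="node01" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="example_job_1234" VAL="owner=alice status=R" TYPE="string" UNITS=""/>
    </HOST>
    <HOST NAME="node02" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="1.50" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metrics_by_host(xml_text, name_prefix=""):
    """Return {host: {metric_name: value}} for metrics whose name starts with name_prefix."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        metrics = {}
        for metric in host.iter("METRIC"):
            name = metric.get("NAME", "")
            if name.startswith(name_prefix):
                metrics[name] = metric.get("VAL")
        result[host.get("NAME")] = metrics
    return result


if __name__ == "__main__":
    # Show only the (hypothetical) job metrics for each host.
    for host, metrics in metrics_by_host(SAMPLE_XML, "example_job_").items():
        print(host, metrics)
```

        A real consumer would read the XML from gmond or gmetad over the network instead of
        from a string; that part is left out here.
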
REQUIREMENTS
============

        all:

                - Python 2.3 or higher

        jobmond:

                - pbs_python v2.8.2 or higher
                  ftp://ftp.sara.nl/pub/outgoing/pbs_python.tar.gz

                - gmond v3.0.1 or higher
                  http://www.ganglia.info

        jobarchived:

                - PostgreSQL v7.xx
                  http://www.postgres.org

                - rrdtool v1.xx
                  http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/

                - python-pgsql v4.x.x
                  http://sourceforge.net/projects/pypgsql/

                - gmetad v3.x.x
                  http://www.ganglia.info

        web:

                - PHP v4.1 or higher
                  http://www.php.net

                - php-pgsql v4.x.x
                  (should come with PostgreSQL)

                - GD v2.x
                  http://www.boutell.com/gd/

                - Ganglia web frontend v3.x.x
                  http://www.ganglia.info

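        A quick way to spot missing requirements is to check the interpreter version and
        look for the relevant programs on the PATH. The sketch below does only that. The
        command names (gmond, psql, rrdtool, gmetad, php) are assumptions about what the
        listed packages usually install, it does not check package versions or Python/PHP
        modules, and it uses shutil.which, so the check itself needs Python 3.3 or newer.

```python
import shutil
import sys

# Command names are assumptions: the binaries the listed packages usually
# install. Adjust them for your distribution.
REQUIRED_COMMANDS = {
    "jobmond": ["gmond"],
    "jobarchived": ["psql", "rrdtool", "gmetad"],
    "web": ["php"],
}


def version_at_least(actual, minimum):
    """Compare version tuples, e.g. (2, 4) against (2, 3)."""
    return tuple(actual) >= tuple(minimum)


def missing_commands(component, which=shutil.which):
    """Return the commands for a component that are not found on the PATH."""
    return [cmd for cmd in REQUIRED_COMMANDS[component] if which(cmd) is None]


if __name__ == "__main__":
    ok = version_at_least(sys.version_info[:2], (2, 3))
    print("Python >= 2.3:", "yes" if ok else "no")
    for component in REQUIRED_COMMANDS:
        missing = missing_commands(component)
        status = "missing " + ", ".join(missing) if missing else "commands found"
        print(component + ":", status)
```

        The which parameter exists only so the lookup can be replaced in tests; normal use
        calls missing_commands("jobarchived") and so on.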

INSTALLATION
============

        Prior to installing the software, make sure you meet the necessary requirements as
        mentioned above.

        NOTE: You can choose to install to other paths/directories if your setup is different.

        * jobmond

                1. Copy jobmond.py:

                 > cp jobmond/jobmond.py /usr/local/sbin/jobmond.py

                2. Copy jobmond.conf:

                 > cp jobmond/jobmond.conf /etc/jobmond.conf

        * jobarchived

                1. Create a PostgreSQL database for jobarchived:

                 > createdb jobarchive

                2. Set up jobarchived's tables:

                 > psql -f jobarchived/job_dbase.sql jobarchive

                3. Copy jobarchived/jobarchived.conf:

                 > cp jobarchived/jobarchived.conf /etc/jobarchived.conf

                4. Copy jobarchived.py and DBClass.py:

                 > cp jobarchived/jobarchived.py /usr/local/sbin/jobarchived.py
                 > cp jobarchived/DBClass.py /usr/local/sbin/DBClass.py

        * web

                1. Copy the Job Monarch template to your Ganglia installation:

                 > cp -a web/templates/job_monarch /var/www/ganglia/templates

                2. Copy the web interface files to the addons directory in Ganglia:

                 > cp -a web/addons/job_monarch /var/www/ganglia/addons

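        If you prefer to script the single-file copy steps, something along these lines
        could work. This is a sketch, not part of Job Monarch: the function name
        install_files and its dest_root/dry_run parameters are made up for the example.
        It covers only the plain "cp" steps for jobmond and jobarchived; the database steps
        (createdb, psql) and the "cp -a" directory copies for the web interface are left
        out. By default it only reports what it would copy.

```python
import os
import shutil

# (source, destination) pairs taken from the jobmond and jobarchived steps above.
FILE_COPIES = [
    ("jobmond/jobmond.py", "/usr/local/sbin/jobmond.py"),
    ("jobmond/jobmond.conf", "/etc/jobmond.conf"),
    ("jobarchived/jobarchived.conf", "/etc/jobarchived.conf"),
    ("jobarchived/jobarchived.py", "/usr/local/sbin/jobarchived.py"),
    ("jobarchived/DBClass.py", "/usr/local/sbin/DBClass.py"),
]


def install_files(source_root, dest_root="/", dry_run=True):
    """Plan (and optionally perform) the copies; returns the (src, dst) pairs.

    dest_root re-roots every destination, which is handy for staging the
    files somewhere harmless before touching the real system.
    """
    planned = []
    for src, dst in FILE_COPIES:
        src_path = os.path.join(source_root, src)
        dst_path = os.path.join(dest_root, dst.lstrip("/"))
        planned.append((src_path, dst_path))
        if not dry_run:
            os.makedirs(os.path.dirname(dst_path), exist_ok=True)
            shutil.copy(src_path, dst_path)
    return planned


if __name__ == "__main__":
    # Dry run from the current directory: print the plan, copy nothing.
    for src_path, dst_path in install_files("."):
        print(src_path, "->", dst_path)
```

        Calling install_files(".", dry_run=False) from the unpacked source directory would
        perform the copies; writing to /etc and /usr/local/sbin normally requires root.
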
CONFIGURATION
=============

        After installation each component requires additional configuration.

        * jobmond

                1. Edit jobmond's config to reflect your settings:

                 - In /etc/jobmond.conf

                   ( see config comments for syntax and explanation )

        * jobarchived

                1. Edit jobarchived's config to reflect your settings:

                 - In /etc/jobarchived.conf

                   ( see config comments for syntax and explanation )

        * web

                1. Change your Ganglia web template to Job Monarch:

                 - In /var/www/ganglia/conf.php:

                 > $template_name = "job_monarch";

                2. Change Job Monarch's config to reflect your settings:

                 - In /var/www/ganglia/addons/job_monarch/conf.php

                   ( see config comments for syntax and explanation )

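        To double-check the template setting after editing, a small script can read the
        value back out of Ganglia's conf.php. The sketch below is an assumption-laden
        helper, not part of Job Monarch: it uses a simple regular expression rather than a
        PHP parser, so it only recognizes a plain quoted assignment at the start of a line,
        and the function name template_name is made up for the example.

```python
import re

# Matches an uncommented assignment such as: $template_name = "job_monarch";
TEMPLATE_RE = re.compile(
    r'^\s*\$template_name\s*=\s*["\']([^"\']*)["\']\s*;', re.MULTILINE
)


def template_name(conf_text):
    """Return the last $template_name value assigned in conf_text, or None."""
    matches = TEMPLATE_RE.findall(conf_text)
    return matches[-1] if matches else None


if __name__ == "__main__":
    # Path as used in the steps above; adjust if Ganglia lives elsewhere.
    try:
        with open("/var/www/ganglia/conf.php") as handle:
            print("template_name:", template_name(handle.read()))
    except OSError as error:
        print("could not read conf.php:", error)
```

        The last assignment wins because a later line in conf.php overrides an earlier one.
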
START
=====

        * jobmond

                The Job Monitor has to run on a machine that is allowed to
                query the PBS/Torque server.
                If 'acl_hosts' is enabled on your PBS/Torque server, make sure
                that jobmond's machine is listed in it.

                1. Start the Job Monitor:

                 > /usr/local/sbin/jobmond.py -c /etc/jobmond.conf

        * jobarchived

                1. Start the Job Archiver:

                 > /usr/local/sbin/jobarchived.py -c /etc/jobarchived.conf

        * web

                The web interface doesn't require you to (re)start anything.
                ( make sure the PostgreSQL database is running though )

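        Since the archive features depend on the database being up, it can help to test
        that the PostgreSQL port accepts connections before starting anything. The sketch
        below assumes the database runs on localhost and listens on PostgreSQL's default
        port 5432; both are assumptions to adjust for your setup, and an open port does not
        prove that the jobarchive database itself is usable. It then prints the start
        commands from this section without running them.

```python
import socket

# Start commands exactly as listed above.
START_COMMANDS = {
    "jobmond": ["/usr/local/sbin/jobmond.py", "-c", "/etc/jobmond.conf"],
    "jobarchived": ["/usr/local/sbin/jobarchived.py", "-c", "/etc/jobarchived.conf"],
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # localhost and 5432 (PostgreSQL's default port) are assumptions.
    if port_is_open("localhost", 5432):
        print("PostgreSQL port reachable")
    else:
        print("PostgreSQL port not reachable; start the database first")
    for name, argv in START_COMMANDS.items():
        print(name + ":", " ".join(argv))
```
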
CONTACT
=======

        To contact the author for anything from bugfixes to flame/hate mail:

        * Ramon Bastiaans

          <ramon ( a t ) sara ( d o t ) nl>