DESCRIPTION
===========

Job Monarch is a set of tools to monitor and optionally archive (batch) job information.

It is an addon for the Ganglia monitoring system and plugs into an existing Ganglia setup.

To view an operational setup with Job Monarch, have a look here: http://ganglia.sara.nl/


Job Monarch stands for 'Job Monitoring and Archiving' tool and consists of three components:

* jobmond

    The Job Monitoring Daemon.

    Gathers PBS/Torque batch statistics on jobs/nodes and submits them into
    Ganglia's XML stream.

    Through this daemon, users are able to view the PBS/Torque batch system and the
    jobs/nodes that are in it (whether running or queued).

* jobarchived

    The Job Archiving Daemon (optional).

    Listens to Ganglia's XML stream and archives the job and node statistics.
    It stores the job statistics in a PostgreSQL database and the node statistics
    in RRD files.

    Through this daemon, users are able to look up an old/finished job
    and view all its statistics.

    This daemon is optional: run it only if your users need it. It can be a
    heavy application to run, even though it is optimized (staged/buffered
    writes and multi-threaded), and not every site needs job archiving.

* web

    The Job Monarch web interface.

    It interfaces with the jobmond data and (optionally) the jobarchived archive,
    and presents the data and graphs.

    It uses a layout and navigation similar to Ganglia's own, so it should feel
    familiar to Ganglia users.
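
The jobmond item says the daemon submits job statistics into Ganglia's XML stream. One common way to inject custom values into that stream is Ganglia's `gmetric` tool. The Python 3 sketch below only builds such a command line for a single job and does not run it. It is an illustration under assumptions, not jobmond's actual code: the `MONARCH-JOB-` metric name, the space-separated `key=value` encoding and the `job_to_gmetric_args` helper are hypothetical, and you should check `gmetric --help` for the options your Ganglia version supports.

```python
from typing import Dict, List


def job_to_gmetric_args(job_id: str, attrs: Dict[str, str],
                        tmax: int = 60, dmax: int = 300) -> List[str]:
    """Build a gmetric argv that publishes one job as a string metric.

    Hypothetical encoding: metric name "MONARCH-JOB-<id>", value made of
    space-separated key=value pairs.
    """
    name = "MONARCH-JOB-" + job_id
    # Sort keys so the generated value is deterministic.
    value = " ".join("%s=%s" % (key, attrs[key]) for key in sorted(attrs))
    return [
        "gmetric",
        "--name=" + name,
        "--value=" + value,
        "--type=string",
        "--tmax=%d" % tmax,  # maximum seconds between updates
        "--dmax=%d" % dmax,  # seconds after which an un-refreshed metric expires
    ]


args = job_to_gmetric_args("1234.server", {"status": "R", "owner": "alice"})
print(args)
```

Passing the list to `subprocess.run(args, check=True)` would submit it; keeping the value as a single argv element avoids shell quoting problems with the embedded spaces.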
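
The jobarchived item mentions staged/buffered, multi-threaded writes. The Python 3 sketch below shows that general technique and is not jobarchived's actual code: rows are staged in memory under a lock and handed to a flush callback in batches, so the database sees a few large writes instead of many small ones. `BufferedWriter`, `flush_size` and the `flush` callback are hypothetical names; in a real setup the callback might perform one multi-row INSERT.

```python
import threading
from typing import Any, Callable, List


class BufferedWriter:
    """Stage rows in memory and hand them to `flush` in batches."""

    def __init__(self, flush: Callable[[List[Any]], None],
                 flush_size: int = 100) -> None:
        self._flush = flush
        self._flush_size = flush_size
        self._rows: List[Any] = []
        self._lock = threading.Lock()  # several threads may call add()

    def add(self, row: Any) -> None:
        batch: List[Any] = []
        with self._lock:
            self._rows.append(row)
            if len(self._rows) >= self._flush_size:
                # Swap out the full buffer while holding the lock...
                batch, self._rows = self._rows, []
        # ...then do the slow write outside it so other threads keep staging.
        if batch:
            self._flush(batch)

    def close(self) -> None:
        """Flush any remaining rows, e.g. at daemon shutdown."""
        with self._lock:
            batch, self._rows = self._rows, []
        if batch:
            self._flush(batch)


batches: List[List[Any]] = []
writer = BufferedWriter(batches.append, flush_size=3)
for n in range(7):
    writer.add(("job-%d" % n, "R"))
writer.close()
print([len(b) for b in batches])  # [3, 3, 1]
```

Because the write happens outside the lock, two threads can call `flush` at the same time, so the callback itself must be safe to run concurrently.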

REQUIREMENTS
============

all:

    - Python 2.3 or higher

jobmond:

    - pbs_python v2.8.1 or higher
      ftp://ftp.sara.nl/pub/outgoing/pbs_python.tar.gz

    - gmond v3.0.1 or higher
      http://www.ganglia.info

jobarchived:

    - PostgreSQL v7.xx
      http://www.postgres.org

    - rrdtool v1.xx
      http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/

    - python-pgsql v4.x.x
      http://sourceforge.net/projects/pypgsql/

    - gmetad v3.x.x
      http://www.ganglia.info

web:

    - PHP v4.1 or higher
      http://www.php.net

    - php-pgsql v4.x.x
      (should come with Postgres)

    - GD v2.x
      http://www.boutell.com/gd/

    - Ganglia web frontend v3.x.x
      http://www.ganglia.info
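
Before installing, it can help to check which of the command-line tools behind these requirements are already on the PATH. The Python 3 sketch below does that with `shutil.which`. The mapping of components to executable names is an assumption about a typical installation; the sketch does not check versions, Python modules (pbs_python, python-pgsql) or the web requirements.

```python
import shutil
from typing import Dict, List

# Assumed executable names per component; adjust for your distribution.
REQUIRED_TOOLS: Dict[str, List[str]] = {
    "jobmond": ["gmond"],
    "jobarchived": ["gmetad", "rrdtool", "psql", "createdb"],
}


def missing_tools(components: List[str]) -> Dict[str, List[str]]:
    """Map each component to the assumed executables not found on PATH."""
    missing: Dict[str, List[str]] = {}
    for component in components:
        absent = [tool for tool in REQUIRED_TOOLS[component]
                  if shutil.which(tool) is None]
        if absent:
            missing[component] = absent
    return missing


for component, tools in missing_tools(list(REQUIRED_TOOLS)).items():
    print("%s: not found on PATH: %s" % (component, ", ".join(tools)))
```

If you only plan to run jobmond, call `missing_tools(["jobmond"])` instead of checking every component.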


INSTALLATION
============

Before installing the software, make sure you meet the requirements listed above.

NOTE: You can install to other paths/directories if your setup is different.

* jobmond

    1. Copy jobmond.py:

        > cp jobmond/jobmond.py /usr/local/sbin/jobmond.py

    2. Copy jobmond.conf:

        > cp jobmond/jobmond.conf /etc/jobmond.conf

* jobarchived

    1. Create a PostgreSQL database for jobarchived:

        > createdb jobarchive

    2. Set up jobarchived's tables:

        > psql -f jobarchived/job_dbase.sql jobarchive

    3. Copy jobarchived/jobarchived.conf:

        > cp jobarchived/jobarchived.conf /etc/jobarchived.conf

    4. Copy jobarchived.py and DBClass.py:

        > cp jobarchived/jobarchived.py /usr/local/sbin/jobarchived.py
        > cp jobarchived/DBClass.py /usr/local/sbin/DBClass.py

* web

    1. Copy the Job Monarch template to your Ganglia installation:

        > cp -a web/templates/job_monarch /var/www/ganglia/templates

    2. Copy the web interface files to the addons directory in Ganglia:

        > cp -a web/addons/job_monarch /var/www/ganglia/addons
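
The copy steps above can also be scripted. The Python 3 sketch below mirrors them, with the destination directories as parameters so you can install to other paths as the NOTE allows. `install_files` and its parameters are hypothetical; it assumes `src_root` is the top of the unpacked Job Monarch source tree, and it leaves the database steps (`createdb`, `psql`) to be run by hand as shown.

```python
import os
import shutil
from typing import List


def install_files(src_root: str, sbin_dir: str = "/usr/local/sbin",
                  etc_dir: str = "/etc",
                  ganglia_web: str = "/var/www/ganglia") -> List[str]:
    """Copy Job Monarch files as in the INSTALLATION steps; return destinations."""
    copies = [
        ("jobmond/jobmond.py", sbin_dir),
        ("jobmond/jobmond.conf", etc_dir),
        ("jobarchived/jobarchived.conf", etc_dir),
        ("jobarchived/jobarchived.py", sbin_dir),
        ("jobarchived/DBClass.py", sbin_dir),
    ]
    installed = []
    for rel_path, dest_dir in copies:
        os.makedirs(dest_dir, exist_ok=True)
        # copy2 also copies permission bits and timestamps.
        installed.append(shutil.copy2(os.path.join(src_root, rel_path), dest_dir))
    # The web parts are whole directory trees (cp -a in the steps above).
    for rel_dir, subdir in [("web/templates/job_monarch", "templates"),
                            ("web/addons/job_monarch", "addons")]:
        dest = os.path.join(ganglia_web, subdir, "job_monarch")
        installed.append(shutil.copytree(os.path.join(src_root, rel_dir), dest,
                                         dirs_exist_ok=True))
    return installed
```

For a trial run, point the destinations at a scratch directory, e.g. `install_files(".", sbin_dir="/tmp/jm/sbin", etc_dir="/tmp/jm/etc", ganglia_web="/tmp/jm/ganglia")`, and compare the returned paths with the steps above. `dirs_exist_ok` needs Python 3.8 or newer.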

CONFIGURATION
=============

After installation, each component requires additional configuration.

* jobmond

    1. Edit jobmond's config to reflect your settings:

        - In /etc/jobmond.conf

        (see the comments in the config file for syntax and explanation)

* jobarchived

    1. Edit jobarchived's config to reflect your settings:

        - In /etc/jobarchived.conf

        (see the comments in the config file for syntax and explanation)

* web

    1. Change your Ganglia web template to Job Monarch:

        - In /var/www/ganglia/conf.php:

        > $template_name = "job_monarch";

    2. Change Job Monarch's config to reflect your settings:

        - In /var/www/ganglia/addons/job_monarch/conf.php

        (see the comments in the config file for syntax and explanation)
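
If you manage Ganglia's conf.php with scripts, the template switch in web step 1 can be automated. The Python 3 sketch below rewrites an existing single-line `$template_name = ...;` assignment to `"job_monarch"`, or adds one before the closing `?>` tag if none is found. The function name is hypothetical, and the regular expression assumes the assignment sits on one line, as in the example above.

```python
import re

# A single-line PHP assignment such as:  $template_name = "default";
_TEMPLATE_RE = re.compile(
    r"""^(\s*\$template_name\s*=\s*)["'][^"']*["']\s*;""", re.MULTILINE)


def set_template_name(conf_text: str, template: str = "job_monarch") -> str:
    """Return conf.php text with $template_name set to `template`."""
    new_text, count = _TEMPLATE_RE.subn(r'\g<1>"%s";' % template, conf_text)
    if count:
        return new_text
    # No assignment found: add one, keeping it inside the PHP block.
    line = '$template_name = "%s";\n' % template
    end = conf_text.rfind("?>")
    if end == -1:
        return conf_text.rstrip("\n") + "\n" + line
    head = conf_text[:end]
    if head and not head.endswith("\n"):
        head += "\n"
    return head + line + conf_text[end:]


print(set_template_name('<?php\n$template_name = "default";\n?>\n'))
```

Commented-out assignments (lines starting with `#` or `//`) are left untouched, since the pattern only matches lines that begin with `$template_name`.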

START
=====

* jobmond

    The Job Monitor has to run on a machine that is allowed to
    query the PBS/Torque server. If you have 'acl_hosts' enabled on your
    PBS/Torque server, make sure jobmond's machine is listed in it.

    1. Start the Job Monitor:

        > /usr/local/sbin/jobmond.py -c /etc/jobmond.conf

* jobarchived

    1. Start the Job Archiver:

        > /usr/local/sbin/jobarchived.py -c /etc/jobarchived.conf

* web

    The web interface does not require you to (re)start anything
    (make sure the Postgres database is running, though).
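
To verify the acl_hosts requirement for jobmond, one option is to inspect the server configuration printed by Torque's `qmgr -c "print server"`. The Python 3 sketch below parses `set server acl_hosts = ...` / `+= ...` and `set server acl_host_enable = ...` lines from that output and checks whether a given host is listed. It assumes that output format, compares host names exactly (no wildcard entries), and the function names are hypothetical.

```python
import re
from typing import Set

# Assumed `qmgr -c "print server"` line formats:
#   set server acl_host_enable = True
#   set server acl_hosts = head1
#   set server acl_hosts += mon1,mon2
_ACL_RE = re.compile(r"^\s*set server acl_hosts\s*\+?=\s*(\S+)", re.MULTILINE)
_ENABLE_RE = re.compile(r"^\s*set server acl_host_enable\s*=\s*(\S+)",
                        re.MULTILINE)


def acl_hosts(print_server_output: str) -> Set[str]:
    """Collect every host listed in acl_hosts (comma-separated values allowed)."""
    hosts: Set[str] = set()
    for value in _ACL_RE.findall(print_server_output):
        hosts.update(h for h in value.split(",") if h)
    return hosts


def host_allowed(print_server_output: str, host: str) -> bool:
    """True if host ACL checking is off or `host` is listed exactly."""
    enabled = _ENABLE_RE.search(print_server_output)
    if enabled is None or enabled.group(1).lower() != "true":
        return True
    return host in acl_hosts(print_server_output)


sample = (
    "set server acl_host_enable = True\n"
    "set server acl_hosts = head1\n"
    "set server acl_hosts += mon1.example.org,mon2\n"
)
print(host_allowed(sample, "mon2"), host_allowed(sample, "other"))  # True False
```

In practice you would feed it the real `qmgr` output and the jobmond machine's fully qualified name, for example from `socket.getfqdn()`.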
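
The web interface (and jobarchived) needs the Postgres database to be up. A quick way to check that something is listening on the database port is a plain TCP connect, sketched below in Python 3. It assumes PostgreSQL's default port 5432 and a host name you supply; a successful connect only shows that the port is open, not that the jobarchive database exists or accepts your credentials.

```python
import socket


def port_open(host: str = "localhost", port: int = 5432,
              timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `port_open("localhost")` on the database machine, or `port_open` with your database host's name from the web server, before opening the Job Monarch pages.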

CONTACT
=======

To contact the author for anything from bugfixes to flame/hate mail:

* Ramon Bastiaans

    <ramon ( a t ) sara ( d o t ) nl>