source: branches/1.1/CHANGELOG @ 954

Last change on this file since 954 was 951, checked in by ramonb, 10 years ago
  • updated for 1.1.2 release
        LEGEND  f: fixed - c: changed - a: added - r: removed

1.1.2:

    jobmond)

        c: job info is now escaped: special characters (for example from job
           names) are escaped to prevent (XML) errors
        f: no longer eats up file descriptors, which led to a crash after
           running out of file descriptors
        f: no longer crashes after a Torque/PBS unavailability issue under
           certain conditions

1.1.1:

    packaging)

        f: correctly set the JOBARCHIVE_RRDS in both jobarchived.conf and
           web/conf.php.in
        f: debian init.d script names in post/pre pkg corrected to new name
        f: debian postrm was incorrectly trying redhat conditional restart

    web)

        c: column nodes renamed to: hosts
        a: sorting by hosts now implemented

    jobarchived)

        f: now properly exits on fatal xml errors
        f: prevent an exception from occurring when no timed out jobs are
           found during Housekeeping

    jobmond)

        f: BATCH_HOST_TRANSLATE no longer required in jobmond.conf

1.1:

    web)

        a: archive search now has "include running jobs" option
        c: rewritten short versus FQDN hostname detection: now works properly
           with ganglia hosts not using FQDN hostnames
        f: display of xml parsetime for overview. no longer display parsetime
           for archive (no parsing done)
        f: down/offline nodes are now properly marked in cluster image again
        f: bug where "Unavailable" row would not be shown in overview summary
           table

    packaging)

        c: completely redone and rewritten by Olivier Lahaye - thanks!

    jobmond)

        a: now supports SLURM Workload Manager!
        a: warning if connecting to remote BATCH_SERVER is not supported by
           selected BATCH_API
        f: bug where incorrect commandline option would trigger traceback in
           usage()

    jobarchived)

        a: now performs regular database Housekeeping every 20 job XML
           iterations (previously only once at startup)
        a: now checks if ARCHIVE_DATASOURCES are present in gmetad.conf
        f: prevent an exception from occurring when determining datasource
           polling interval
        f: bug where config file handle was not closed

1.0:

    jobmond)

        a: now supports multiple udp send channels
        a: now supports job arrays
        c: updated Gmetric XDR protocol to be compatible with version 3.1+

        c: gmond.conf parsing has been rewritten to handle includes and
           multiple send channels
        c: METRIC_MAX_VAL_LEN is now determined from gmond.conf
        c: utilize new job monarch protocol

        f: can now handle new PBSQuery / pbs_python versions
        f: default gmond.conf search location is now /etc/ganglia/gmond.conf
        f: fatal errors are now printed to shell upon startup, not just syslog
        f: more error checking and miscellaneous bugfixes

    jobarchived)

        r: no longer use pyPgSQL for postgres database
        c: now use psycopg2 module for postgres database

        a: job thread now utilizes db commits and rollbacks
        a: now use USER/PASS authentication to database (instead of hostbased)

        c: database schema: changed job_id to varchar to support job arrays
        c: database schema: changed job_name max length to 255, just like
           torque
        c: database schema: added username/password role authentication
        c: utilize new job monarch protocol

        f: job thread no longer hangs when insert/update of a job in database
           fails
        f: rewrite of job (finished) detection: all finished jobs again
           properly detected
        f: job checking now done post-parsing not while parsing
        f: more error checking and miscellaneous bugfixes

    web)

        r: removed Pie chart
        r: removed TemplatePower
        r: removed php ini_set's and time limit directive: should be handled in
           php.ini
        r: removed "Get Fresh Data" button: served no purpose anymore
        a: now utilize Dwoo templates for html output

        a: now use USER/PASS authentication to database (instead of hostbased)
        a: ClusterImage now drops a shadow below nodes
        a: RRDs now show "Last: Min: Avg: Max:" values in legend

        c: utilize new job monarch protocol
        c: all templates rewritten from TemplatePower to Dwoo
        c: graph.php now used for overview and archive
        c: RRDs job start/finish line is now dashed green/red line with legend

        f: some dbase fields are now CAST to INT for php since postgres now
           requires explicit casts
        f: sort order descending/ascending is now correct
        f: many, many speed and memory improvements
        f: more error checking and miscellaneous bugfixes

0.4:

        jobmond)
                a:      SGE support
                        thanks to: Dave Love - d(d.o.t)love(a.t)liverpool(d.o.t)ac(d.o.t)uk
                        for writing it!
                a:      LSF support
                        thanks to: Mahmoud Hanafi - mhanafi(a.t)csc(d.o.t)com
                        for writing it!
                a:      GMETRIC_TARGET is now parsed from gmond.conf
                a:      GMETRIC_BINARY is now looked for in PATH
                f:      queue selection support is now working
                        thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
                        for the patch
        web)
                a:      large graphs link for job report
                        thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
                a:      SHOW_EMPTY_COLUMN, SHOW_EMPTY_ROW options for ClusterImage hostname parsing

0.3.1:

        other)
                f:      updated INSTALL since "addons" directory is not included by default anymore in Ganglia
                        thanks to: Steven DuChene linux(d.a.s.h)clusters(a.t)mindspring(d.o.t)com
                        for reporting it

        rpm)
                f:      add "addons" directory since it's not included by default anymore in Ganglia
                f:      properly rewrite WEBDIR path in %files when rebuilding rpms with Makefile

        web)
                f:      typo in empty_cpu variable: causing incorrect 'free cpu' count and similar errors
                        thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
                        for reporting it
                f:      changed erroneous domain detection a little
                        thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
                        for reporting it
                a:      now properly detects whether to use FQDN or short hostnames w/o domain
                        thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
                        thanks to: Jeffrey Sarlo - JSarlo(a.t)Central(d.o.t)UH(d.o.t)EDU
                        for the extensive testing and for reporting it

                        SPECIAL THANKS to the University of Houston for sending me a shirt!

        jobarchived)
                f:      properly catch postgres exception
                f:      don't use debug_message while loading config file

0.3:

        web)
                a:      allow per-cluster settings/override options: see CLUSTER_CONFS option
                a:      clusterimage can now draw nodes at x,y position parsed from hostname
                        see SORTBY_HOSTNAME for this in clusterconf/example.php
                a:      clusterimage nodes are now clickable: has link to all jobs from that host
                a:      clusterimage nodes now have a tooltip: displays hostname and jobids for now
                a:      jobmonarch logo image
                        thanks to: Robin Day
                        for the design
                a:      rrd graph of running/queued jobs to overview
                a:      per-cluster settings for archive database
                        thanks to: Alexis Michon - alexis(d.o.t)michon(a.t)ibcp(d.o.t)fr
                        for the patch

                c:      host archive view is now more complete and detailed in the same manner as
                        Ganglia's own host view
                c:      host archive view available metric list is now compiled from disk,
                        so that the detailed archive host view works even when the node is currently down.
                c:      removed size restrictions from detailed host archive view

                f:      compatibility: removed php5 call
                        thanks to: Alexis Michon - alexis(d.o.t)michon(a.t)ibcp(d.o.t)fr
                        for the patch
                f:      prevent negative cpu/node calculation
                        thanks to: aloga(a.t)ifca(d.o.t)unican(d.o.t)es
                        for the patch
                f:      archive search not properly resetting nodes list
                        thanks to: Alexis Michon - alexis(d.o.t)michon(a.t)ibcp(d.o.t)fr
                        for the patch
                f:      detailed host view from jobarchive was broken since hostbased support of 0.2
                        now host view is properly set and parsed again
                        thanks to: Alexis Michon - alexis(d.o.t)michon(a.t)ibcp(d.o.t)fr
                        for reporting the bug and suggesting a patch
                f:      bug where jobstart redline indicator in host detail graphs was set incorrectly
                        or not at all due to a miscalculation in job times
                f:      bug where hostimage headertext xoffset was miscalculated, causing the column names
                        to overlap their position when the columnname was longer than the columnvalues

        jobmond)

                a:      syslog support
                a:      report number of running/queued jobs as separate metrics
                a:      native gmetric support, much faster and cleaner!
                        thanks to: Nick Galbreath - nickg(a.t)modp(d.o.t)com
                        for writing it and allowing inclusion in jobmond

                f:      crashing jobmond when multiple node amounts are requested in
                        a queued job: numeric_node variable not initialized properly
                        thanks to: aloga(a.t)ifca(d.o.t)unican(d.o.t)es
                        for supplying the patch
                        and many others for reporting and helping debug this
                f:      hanging/blocked, increased cpu usage and halted reporting
                        thanks to: Bas van der Vlies - basv(a.t)sara(d.o.t)nl
                        for discovering the origin of the bug
                        thanks to: Mickael Gastineau - gastineau(a.t)imcce(d.o.t)fr
                        for reporting it and testing the fix
                        thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
                        for reporting it and testing the fix
                f:      uninitialized variable in checkGmetricVersion()
                        thanks to: Peter Kruse - pk(a.t)q-leap(d.o.t)com
                        for the patch
                f:      undefined PBSError
                        thanks to: Peter Kruse - pk(a.t)q-leap(d.o.t)com
                        for reporting it

                r:      SGE support broken

        jobarchived)

                a:      can now use py-rrdtool api instead of pipes, much faster!
                        install py-rrdtool to use this
                        backwards compatible: falls back to pipes if module not installed

                c:      all XML input was unicode-encoded, which could cause errors,
                        now all properly converted to normal strings

                f:      when XML data source (gmetad) is unavailable parsethread didn't return correctly
                        which caused a large number of threads to spawn while consuming large amounts of memory
                f:      autocreate clusterdirs in archivedir
                f:      unhandled gather exception
                f:      incorrect stop_timestamping when jobs finished
                        thanks to: Alexis Michon - alexis(d.o.t)michon(a.t)ibcp(d.o.t)fr
                        for finding and debugging/testing it

0.2:

        web)
                f:      misc. optimization and bugfixes
                f:      now fully compatible with latest PHP5 and PHP4

                c:      cluster image now incorporates small text descr.
                c:      monarch (cluster/host) images no longer displayed
                        for clusters that are not jobmond enabled
                c:      pie chart percentages are now cpu-based instead of node-based

                a:      host template for Ganglia
                        adds an extra monarch host image to Ganglia's host overview
                        which displays/links to the jobs on that host
                        NOTE!: be sure to copy/install new template from addons/templates
                a:      (optional) nodes hostnames column
                        thanks to: Daniel Barthel - daniel(d.o.t)barthel(a.t)nottingham(d.o.t)ac(d.o.t)uk
                        for the suggestion

        jobmond)

                f:      when a job metric is longer than maximum metric length,
                        the info is split up amongst multiple metrics
                f:      no longer exit when batch server is unavailable
                        thanks to: Peter Kruse - pk(a.t)q-leap(d.o.t)com
                        for the patch
                f:      fd closure bug causing stderr/stdout to remain open after daemonizing

                c:      rearranged code to allow support for other batch systems

                a:      (experimental) SGE (Sun Grid Engine) support as batch server
                        thanks to: Babu Sundaram - babu(a.t)cs(d.o.t)uh(d.o.t)edu
                        who developed it for OSCAR's Google-SoC project
                a:      pidfile support
                        thanks to: Michael Jeanson - michael(a.t)ccs(d.o.t)usherbrooke(d.o.t)ca
                        for the patch
                a:      usage display
                        thanks to: Michael Jeanson - michael(a.t)ccs(d.o.t)usherbrooke(d.o.t)ca
                        for the patch
                a:      queue selection support: ability to specify which QUEUEs to get jobinfo from
                        thanks to: Michael Jeanson - michael(a.t)ccs(d.o.t)usherbrooke(d.o.t)ca
                        for the patch

        jobarchived)

                f:      XML retrieval for Ganglia version >= 3.0.3 working properly again
                f:      database storing for Ganglia version >= 3.0.3 working properly again
                f:      fd closure bug causing stderr/stdout to remain open after daemonizing

                c:      misc. bugfixes to optimize XML connections
                c:      misc. bugfixes for misc. minor issues

                a:      cleaning of stale jobs in dbase: see JOB_TIMEOUT option

0.1.1:

        web)

                f:      misc. layout bugs for overview & search
                f:      bug that occurred when calculating the number of nodes when there
                        was more than one job running on a machine

                c:      column requested memory is now optional through conf.php
                c:      search and overview tables are now full screen (100%)
                c:      overview jobnames are now cut off at max 9 characters
                        to prevent (layout) skews in the tables
                c:      overview graphs are no longer downsized

                a:      (optional) column 'queued' (since) in overview
                a:      search results (can) now have a SEARCH_RESULT_LIMIT
                        this increases performance of the queries significantly!
                a:      date/time format as displayed is now configurable through conf.php

        jobmond)

                a:      now reports 'queued since' (or creation time) of jobs

        documentation)

                f:      wrong e-mail address in INSTALL (doh!)

0.1:

        - First public release