LEGEND f: fixed - c: changed - a: added - r: removed

1.1.2:

jobmond)

c: job info is now escaped: special characters (for example from job
   names) are escaped to prevent XML errors
f: no longer leaks file descriptors, which eventually caused a crash
   once they ran out
f: no longer crashes after a Torque/PBS unavailability issue under
   certain conditions
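The 1.1.2 escaping fix can be illustrated with Python's standard library (jobmond is written in Python). This is a minimal sketch under assumptions, not jobmond's actual code: the job_info_to_xml helper and the JOB/NAME element names are hypothetical, chosen only to show how escape() and quoteattr() keep characters such as <, & and " in a job name from breaking the surrounding XML.

```python
from xml.sax.saxutils import escape, quoteattr


def job_info_to_xml(job_id, job_name):
    """Build a small XML fragment from job fields (hypothetical element names).

    escape() replaces &, < and > in element text; quoteattr() wraps an
    attribute value in quotes and escapes anything that would break it.
    """
    return "<JOB ID=%s><NAME>%s</NAME></JOB>" % (quoteattr(job_id), escape(job_name))


print(job_info_to_xml("42", 'a<b & "c"'))
# prints: <JOB ID="42"><NAME>a&lt;b &amp; "c"</NAME></JOB>
```

Without escaping, the same job name would produce a fragment that an XML parser rejects, which is the class of error the entry above refers to.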

1.1.1:

packaging)

f: correctly set JOBARCHIVE_RRDS in both jobarchived.conf and
   web/conf.php.in
f: corrected Debian init.d script names in the pre/post package scripts
   to the new name
f: Debian postrm was incorrectly trying a Red Hat conditional restart

web)

c: column 'nodes' renamed to 'hosts'
a: sorting by hosts is now implemented

jobarchived)

f: now properly exits on fatal XML errors
f: prevent an exception from occurring when no timed-out jobs are found
   during housekeeping

jobmond)

f: BATCH_HOST_TRANSLATE is no longer required in jobmond.conf

1.1:

web)

a: archive search now has an "include running jobs" option
c: rewritten short versus FQDN hostname detection: now works properly
   with Ganglia hosts not using FQDN hostnames
f: display of XML parse time in overview; parse time is no longer
   displayed for archive (no parsing done)
f: down/offline nodes are now properly marked in cluster image again
f: bug where the "Unavailable" row would not be shown in the overview
   summary table

packaging)

c: completely redone and rewritten by Olivier Lahaye - thanks!

jobmond)

a: now supports SLURM Workload Manager!
a: warning if connecting to a remote BATCH_SERVER is not supported by
   the selected BATCH_API
f: bug where an incorrect commandline option would trigger a traceback
   in usage()

jobarchived)

a: now performs regular database housekeeping every 20 job XML
   iterations (previously only once at startup)
a: now checks if ARCHIVE_DATASOURCES are present in gmetad.conf
f: prevent an exception from occurring when determining the datasource
   polling interval
f: bug where the config file handle was not closed

1.0:

jobmond)

a: now supports multiple UDP send channels
a: now supports job arrays
c: updated Gmetric XDR protocol to be compatible with version 3.1+

c: gmond.conf parsing has been rewritten to handle includes and
   multiple send channels
c: METRIC_MAX_VAL_LEN is now determined from gmond.conf
c: utilize new job monarch protocol

f: can now handle new PBSQuery / pbs_python versions
f: default gmond.conf search location is now /etc/ganglia/gmond.conf
f: fatal errors are now printed to the shell upon startup, not just to
   syslog
f: more error checking and miscellaneous bugfixes

jobarchived)

r: no longer use pyPgSQL for postgres database
c: now use psycopg2 module for postgres database

a: job thread now utilizes database commits and rollbacks
a: now use USER/PASS authentication to database (instead of host-based)

c: database schema: changed job_id to varchar to support job arrays
c: database schema: changed job_name max length to 255, just like
   Torque
c: database schema: added username/password role authentication
c: utilize new job monarch protocol

f: job thread no longer hangs when insert/update of a job in database
   fails
f: rewrite of job (finished) detection: all finished jobs are again
   properly detected
f: job checking now done post-parsing, not while parsing
f: more error checking and miscellaneous bugfixes

web)

r: removed pie chart
r: removed TemplatePower
r: removed PHP ini_set() calls and time limit directive: should be
   handled in php.ini
r: removed "Get Fresh Data" button: served no purpose anymore
a: now utilize Dwoo templates for HTML output

a: now use USER/PASS authentication to database (instead of host-based)
a: ClusterImage now drops a shadow below nodes
a: RRDs now show "Last: Min: Avg: Max:" values in legend

c: utilize new job monarch protocol
c: all templates rewritten from TemplatePower to Dwoo
c: graph.php now used for overview and archive
c: RRDs job start/finish line is now a dashed green/red line with legend

f: some database fields are now CAST to INT for PHP since postgres now
   requires explicit casts
f: sort order descending/ascending is now correct
f: many, many speed and memory improvements
f: more error checking and miscellaneous bugfixes

0.4:

jobmond)
a: SGE support
   thanks to: Dave Love - d(d.o.t)love(a.t)liverpool(d.o.t)ac(d.o.t)uk
   for writing it!
a: LSF support
   thanks to: Mahmoud Hanafi - mhanafi(a.t)csc(d.o.t)com
   for writing it!
a: GMETRIC_TARGET is now parsed from gmond.conf
a: GMETRIC_BINARY is now looked for in PATH
f: queue selection support is now working
   thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
   for the patch

web)
a: large graphs link for job report
   thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
a: SHOW_EMPTY_COLUMN, SHOW_EMPTY_ROW options for ClusterImage hostname parsing

0.3.1:

other)
f: updated INSTALL since the "addons" directory is no longer included by default in Ganglia
   thanks to: Steven DuChene - linux(d.a.s.h)clusters(a.t)mindspring(d.o.t)com
   for reporting it

rpm)
f: add "addons" directory since it is no longer included by default in Ganglia
f: properly rewrite WEBDIR path in %files when rebuilding rpms with Makefile

web)
f: typo in empty_cpu variable causing incorrect 'free cpu' count and similar errors
   thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
   for reporting it
f: slightly adjusted erroneous domain detection
   thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
   for reporting it
a: now properly detects whether to use FQDN or short hostnames (without domain)
   thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
   thanks to: Jeffrey Sarlo - JSarlo(a.t)Central(d.o.t)UH(d.o.t)EDU
   for extensive testing and reporting it

SPECIAL THANKS to the University of Houston for sending me a shirt!

jobarchived)
f: properly catch postgres exception
f: don't use debug_message while loading config file

0.3:

web)
a: allow per-cluster settings/override options: see CLUSTER_CONFS option
a: clusterimage can now draw nodes at x,y position parsed from hostname
   (see SORTBY_HOSTNAME in clusterconf/example.php)
a: clusterimage nodes are now clickable: they link to all jobs from that host
a: clusterimage nodes now have a tooltip: displays hostname and jobids for now
a: jobmonarch logo image
   thanks to: Robin Day
   for the design
a: RRD graph of running/queued jobs in overview
a: per-cluster settings for archive database
   thanks to: Alexis Michon - alexis(d.o.t)michon(a.t)ibcp(d.o.t)fr
   for the patch

c: host archive view is now more complete and detailed in the same manner as
   Ganglia's own host view
c: host archive view available metric list is now compiled from disk,
   so that the detailed archive host view works even when the node is currently down
c: removed size restrictions from detailed host archive view

f: compatibility: removed php5 call
   thanks to: Alexis Michon - alexis(d.o.t)michon(a.t)ibcp(d.o.t)fr
   for the patch
f: prevent negative cpu/node calculation
   thanks to: aloga(a.t)ifca(d.o.t)unican(d.o.t)es
   for the patch
f: archive search not properly resetting nodes list
   thanks to: Alexis Michon - alexis(d.o.t)michon(a.t)ibcp(d.o.t)fr
   for the patch
f: detailed host view from jobarchive was broken since the host-based support of 0.2;
   host view is now properly set and parsed again
   thanks to: Alexis Michon - alexis(d.o.t)michon(a.t)ibcp(d.o.t)fr
   for reporting the bug and suggesting a patch
f: bug where the jobstart red line indicator in host detail graphs was set incorrectly
   or not at all due to a miscalculation in job times
f: bug where the hostimage header text x-offset was miscalculated, causing column names
   to overlap when a column name was longer than its column values

jobmond)

a: syslog support
a: report number of running/queued jobs as separate metrics
a: native gmetric support, much faster and cleaner!
   thanks to: Nick Galbreath - nickg(a.t)modp(d.o.t)com
   for writing it and allowing inclusion in jobmond

f: jobmond crashing when multiple node amounts are requested in
   a queued job: numeric_node variable not initialized properly
   thanks to: aloga(a.t)ifca(d.o.t)unican(d.o.t)es
   for supplying the patch
   and many others for reporting and helping debug this
f: hanging/blocked jobmond, increased cpu usage and halted reporting
   thanks to: Bas van der Vlies - basv(a.t)sara(d.o.t)nl
   for discovering the origin of the bug
   thanks to: Mickael Gastineau - gastineau(a.t)imcce(d.o.t)fr
   for reporting it and testing the fix
   thanks to: Craig West - cwest(a.t)astro(d.o.t)umass(d.o.t)edu
   for reporting it and testing the fix
f: uninitialized variable in checkGmetricVersion()
   thanks to: Peter Kruse - pk(a.t)q-leap(d.o.t)com
   for the patch
f: undefined PBSError
   thanks to: Peter Kruse - pk(a.t)q-leap(d.o.t)com
   for reporting it

r: SGE support broken

jobarchived)

a: can now use the py-rrdtool API instead of pipes, much faster!
   install py-rrdtool to use this
   backwards compatible: falls back to pipes if the module is not installed

c: all XML input was unicode-encoded, which could cause errors;
   it is now all properly converted to normal strings

f: when the XML data source (gmetad) was unavailable, the parse thread didn't return correctly,
   which caused a large number of threads to spawn while consuming large amounts of memory
f: autocreate cluster dirs in archivedir
f: unhandled gather exception
f: incorrect stop_timestamping when jobs finished
   thanks to: Alexis Michon - alexis(d.o.t)michon(a.t)ibcp(d.o.t)fr
   for finding and debugging/testing it

0.2:

web)
f: misc. optimizations and bugfixes
f: now fully compatible with latest PHP5 and PHP4

c: cluster image now incorporates a small text description
c: monarch (cluster/host) images no longer displayed
   for clusters that are not jobmond enabled
c: pie chart percentages are now cpu-based instead of node-based

a: host template for Ganglia
   adds an extra monarch host image to Ganglia's host overview
   which displays/links to the jobs on that host
   NOTE!: be sure to copy/install the new template from addons/templates
a: (optional) node hostnames column
   thanks to: Daniel Barthel - daniel(d.o.t)barthel(a.t)nottingham(d.o.t)ac(d.o.t)uk
   for the suggestion

jobmond)

f: when a job metric is longer than the maximum metric length,
   the info is split up amongst multiple metrics
f: no longer exit when batch server is unavailable
   thanks to: Peter Kruse - pk(a.t)q-leap(d.o.t)com
   for the patch
f: fd closure bug causing stderr/stdout to remain open after daemonizing

c: rearranged code to allow support for other batch systems

a: (experimental) SGE (Sun Grid Engine) support as batch server
   thanks to: Babu Sundaram - babu(a.t)cs(d.o.t)uh(d.o.t)edu
   who developed it for an OSCAR Google SoC project
a: pidfile support
   thanks to: Michael Jeanson - michael(a.t)ccs(d.o.t)usherbrooke(d.o.t)ca
   for the patch
a: usage display
   thanks to: Michael Jeanson - michael(a.t)ccs(d.o.t)usherbrooke(d.o.t)ca
   for the patch
a: queue selection support: ability to specify which QUEUEs to get job info from
   thanks to: Michael Jeanson - michael(a.t)ccs(d.o.t)usherbrooke(d.o.t)ca
   for the patch
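The 0.2 jobmond fix that splits an over-long job metric across several metrics can be sketched as follows. This is an illustration under assumptions, not jobmond's code: split_metric_value, join_metric_values and the "name_index" naming scheme are hypothetical, and max_len stands in for the maximum metric length (which, per the 1.0 entry, is later determined from gmond.conf as METRIC_MAX_VAL_LEN).

```python
from typing import List, Tuple


def split_metric_value(name: str, value: str, max_len: int) -> List[Tuple[str, str]]:
    """Split a job metric value into chunks of at most max_len characters.

    Each chunk becomes its own metric; the "<name>_<index>" naming scheme is
    a hypothetical convention for this sketch, not jobmond's actual format.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not value:
        # Still emit one (empty) metric so the receiver sees the job
        return [(name + "_0", "")]
    return [
        ("%s_%d" % (name, index), value[start:start + max_len])
        for index, start in enumerate(range(0, len(value), max_len))
    ]


def join_metric_values(parts: List[Tuple[str, str]]) -> str:
    """Reassemble the original value on the receiving side, ordered by index."""
    ordered = sorted(parts, key=lambda part: int(part[0].rsplit("_", 1)[1]))
    return "".join(chunk for _, chunk in ordered)


parts = split_metric_value("job_42", "nodes=node01,node02,node03", 10)
# parts == [("job_42_0", "nodes=node"), ("job_42_1", "01,node02,"), ("job_42_2", "node03")]
```

Sorting by index on reassembly is a design choice for this sketch: metrics are sent over UDP, so the chunks are not guaranteed to arrive in order.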
[308] | 312 | |
---|
| 313 | jobarchived) |
---|
| 314 | |
---|
[342] | 315 | f: XML retrieval for Ganglia version >= 3.0.3 working properly again |
---|
| 316 | f: database storing for Ganglia version >= 3.0.3 working properly again |
---|
| 317 | f: fd closure bug causing stderr/stdout to remain open after daemonizing |
---|
[308] | 318 | |
---|
[342] | 319 | c: misc. bugfixes to optimize XML connections |
---|
| 320 | c: misc. bugfixes for misc. minor issues |
---|
[308] | 321 | |
---|
[342] | 322 | a: cleaning of stale jobs in dbase: see JOB_TIMEOUT option |
---|
[308] | 323 | |
---|
0.1.1:

web)

f: misc. layout bugs for overview & search
f: bug that occurred when calculating the number of nodes when there
   was more than one job running on a machine

c: column requested memory is now optional through conf.php
c: search and overview tables are now full screen (100%)
c: overview jobnames are now cut off at max 9 characters
   to prevent (layout) skews in the tables
c: overview graphs are no longer downsized

a: (optional) column 'queued' (since) in overview
a: search results (can) now have a SEARCH_RESULT_LIMIT
   this increases performance of the queries significantly!
a: date/time format as displayed is now configurable through conf.php

jobmond)

a: now reports 'queued since' (or creation time) of jobs

documentation)

f: wrong e-mail address in INSTALL (doh!)

0.1:

- First public release