Opened 9 years ago
Last modified 8 years ago
#54 assigned enhancement
pbs_submit_hash()
Reported by: | glen.beane@… | Owned by: | bas |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | pbs | Version: | 4.6.0 |
Keywords: | | Cc: | nate@… |
Description
We've found that pbs_submit() has not been reliable. Occasionally pbs_server gets into a state that causes all jobs submitted via pbs_submit() to fail (this is not a problem with pbs_python -- the problem exists even with the C API directly). However, pbs_submit_hash() continues to work in this case. Since Torque moved to pbs_submit_hash(), I don't feel that pbs_submit() is as well tested.
We'd like to move our applications from pbs_submit() to pbs_submit_hash(), but I don't think all the functionality we need is in pbs_python.
Here is a short snippet of C code using pbs_submit_hash:
int fd = pbs_connect(0);
char *new_jobid;
memmgr *mm;
job_data *job_attrs = 0;

memmgr_init(&mm, 0);

/* pass empty ATTR_v, just to show use of hash_add_or_exit */
hash_add_or_exit(&mm, &job_attrs, ATTR_v, "", ENV_DATA);

pbs_submit_hash(fd, &mm, job_attrs, NULL, "/tmp/test.sh", NULL, NULL, &new_jobid, NULL);
Attachments (0)
Change History (9)
comment:1 Changed 9 years ago by bas
- Status changed from new to assigned
comment:2 Changed 9 years ago by glen.beane@…
We are using 4.2, but we are considering an upgrade to Torque 5 (first we need to fully test with our pipeline framework, which uses pbs_python).
I just looked at Torque on GitHub. In the 5.1.0 branch, pbs_submit_hash_ext() is declared in include/pbs_ifl.h, pbs_submit_hash() is declared in lib/Libifl/lib_ifl.h, and hash_add_or_exit() is in u_hash_map_structs.h.
qsub still calls pbs_submit_hash() directly.
all pbs_submit_hash_ext() does is call pbs_submit_hash(), but it takes void* instead of job_data_container*:
int pbs_submit_hash_ext(
    int    socket,
    void  *job_attr,
    void  *res_attr,
    char  *script,
    char  *destination,
    char  *extend,  /* (optional) */
    char **return_jobid,
    char **msg)
{
    return pbs_submit_hash(socket,
        (job_data_container *)job_attr,
        (job_data_container *)res_attr,
        script, destination, extend, return_jobid, msg);
}
/* END pbsD_submit.c */
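To illustrate the pattern only, here is a minimal self-contained sketch of a public wrapper that accepts void* and forwards to a typed internal function, so that callers do not need the internal header. Everything in it is a hypothetical stand-in: the struct layout, submit_typed(), submit_ext() and the fake job id are assumptions made for the example and are not Torque's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an internal container type that the
 * public header does not expose. Not Torque's real layout. */
typedef struct job_data_container {
    int count; /* number of attributes held (illustrative only) */
} job_data_container;

/* Fake job id handed back by the stand-in submit function. */
static char fake_jobid[] = "1.example-server";

/* Stand-in for the typed, internal entry point. It validates its
 * arguments and returns a fixed job id; it submits nothing. */
static int submit_typed(int socket,
                        job_data_container *job_attr,
                        job_data_container *res_attr,
                        const char *script,
                        char **return_jobid)
{
    assert(return_jobid != NULL);
    (void)res_attr; /* unused in this sketch */

    if (socket < 0 || job_attr == NULL || script == NULL)
        return -1;

    *return_jobid = fake_jobid;
    return 0;
}

/* Public wrapper: takes void* so the caller never has to see the
 * container type, then casts and forwards to the typed function. */
int submit_ext(int socket,
               void *job_attr,
               void *res_attr,
               const char *script,
               char **return_jobid)
{
    return submit_typed(socket,
                        (job_data_container *)job_attr,
                        (job_data_container *)res_attr,
                        script,
                        return_jobid);
}
```

The trade-off is that the void* parameters give up compile-time type checking, so a caller can pass the wrong kind of pointer without any warning.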
comment:3 Changed 9 years ago by bas
Glen, thanks for sorting this out. I am only using pbs_ifl.h, which is the publicly available function API for libtorque. In Torque 5.X it only contains a definition for:
- pbs_submit_hash_ext
So I have to test on our 5.X test cluster. I do not want to include more and more header files like lib_ifl.h; maybe we can use the new function pbs_submit_hash_ext. As said, I cannot test Torque 4.X. Adaptive changes the interface (API) a lot between versions, which is hard to keep up with ;-(
comment:4 Changed 9 years ago by anonymous
The guys from Adaptive Computing found the problem in the pbs_submit function and came up with a patch, thanks to David Beer:
diff --git a/src/lib/Libifl/pbsD_submit.c b/src/lib/Libifl/pbsD_submit.c
index e096aa8..ca41cca 100644
--- a/src/lib/Libifl/pbsD_submit.c
+++ b/src/lib/Libifl/pbsD_submit.c
@@ -131,7 +131,7 @@ char *pbs_submit_err(
   if ((script != NULL) && (*script != '\0'))
     {
-    if (PBSD_jscript(c, script, NULL) != 0)
+    if (PBSD_jscript(c, script, return_jobid) != 0)
       {
       *local_errno = PBSE_BADSCRIPT;
comment:5 Changed 9 years ago by anonymous
see also #62
comment:6 Changed 8 years ago by nate@…
- Cc nate@… added
Users of pbs_python in Galaxy have reported this issue as well and the fix in comment:4 is working for them. If this could be included in a pbs_python release that'd be great. Here's the issue thread from Galaxy:
comment:7 Changed 8 years ago by glen.beane@…
We also fixed this by patching libtorque on a system that was still running Torque 4. The fix was described on the Torque mailing list and is the same one in the Galaxy issue linked in comment #6.
Note that this is fixed in more recent versions of Torque, so if you have an up-to-date Torque, this isn't an issue.
Nate: I don't think you can include this fix with pbs_python because it involves patching libtorque, which is not provided by pbs_python.
comment:8 Changed 8 years ago by nate@…
Hey Glen, thanks for the hint, I totally missed that point.
comment:9 Changed 8 years ago by bas
It is interesting to read all comments and where pbs_python is used.
Glen, which version of Torque do you use?
For Torque 4.x: I do not have it installed and cannot test it.
For Torque 5.X I see this function:
I do not see any function named hash_add_or_exit.