On 11 Apr 2013 09:02, dale janus wrote:
You probably explained it. I may have orphaned the lock if I tried
to cancel the CHGPF job.
  If indeed a lock was orphaned, one that is not visible via WRKOBJLCK, 
then it was almost surely due to a defect in the OS code.
  I seem to recall that it was deemed acceptable for the non-commit DB 
recovery code path to leave its normal object locks pending a recovery 
being initiated for the interrupted request, such that only invocation 
terminations that also ended the job would have those locks implicitly 
dropped.  IIRC that effect was due to there being no /invocation exit/ 
established for that code path.  Without such a /cancel handler/ being 
established, only a request that either successfully ran to completion 
or failed due to handled exceptions would back out those locks.  Thus 
ENDRQS and unmonitored exceptions that caused termination of the request 
could acceptably leave behind any locks that had been obtained.  However 
any non-standard types of locks, i.e. locks other than the object and 
data locks, such as SLLs, were supposed to be /protected/ from EndRqs, 
specifically to ensure that they could not be orphaned due to a user 
request to end the invocation.
 I really don't remember if I let it time out or not.
  But if the CHGPF request had failed instead due to timing out trying 
to obtain all of the necessary locks, then as a /normal/ and monitored 
failure, the code should have dropped any locks the processing had been 
able to obtain before backing out its attempt at forward progress. 
Another job should not encounter any conflicting locks for its requests 
against the file if the job requesting the CHGPF had failed solely due 
to its inability to allocate the file; i.e. failed due to CPF3202 or 
CPF3203 being issued as the error, per a timeout while the CHGPF request 
was obtaining the necessary locks to proceed with its work.
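  FWiW such a timeout is a monitorable failure; a minimal CL sketch 
[inside a CL program, with hypothetical file and library names]:

    CHGPF      FILE(MYLIB/MYFILE) SRCFILE(MYLIB/QDDSSRC)
    MONMSG     MSGID(CPF3202 CPF3203) EXEC(DO)
      /* Allocation failed; as a handled error, any locks the   */
      /* request had already obtained should have been dropped. */
      SNDPGMMSG  MSG('CHGPF could not allocate the file')
    ENDDO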
Probably if I had signed off from the session that ran the CHGPF,
the lock would have been released.
  Yes.  Although if the situation could be recreated on a test file, 
whereby the conflicting lock is not visible on WRKOBJLCK, then that is 
probably a defect that can be reported.
Or if I would have changed the heading using SQL or the database part
of ops navigator, but green screen commands die hard.
  I think only performing the change operation under commitment control 
would have changed the outcome; SQL requested with no isolation would 
perform effectively the same request as CHGPF SRCFILE(named) when that 
source changes just the column headings.  That is presumed solely on the 
basis of the different implementations for how locks are registered and 
removed in the commit vs non-commit code paths for database recovery. 
That leaves only LABEL ON to effect the request [under commitment 
control], because an SQL ALTER request does not give the option to 
change the column labels the way CHGPF SRCFILE(specified) does.
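  For example [a sketch only; the library, file, and column names are 
hypothetical, and the RUNSQL CL command assumes a sufficiently current 
7.1 TR level], the heading change could be requested as:

    RUNSQL     SQL('LABEL ON MYLIB/MYFILE +
                 (CUSTNO IS ''Customer Number'')') +
                 COMMIT(*NONE) NAMING(*SYS)

The IS form changes the column heading [interpreted as 20-character 
segments], whereas TEXT IS would change the column text instead.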
  When a termination occurs and the work has been registered under 
commitment control, the locks that had been obtained are dropped either 
as part of the explicit or implicit rollback [ROLLBACK] of an 
interrupted request, or when the successfully completed operation is 
eventually committed [COMMIT].
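  Correspondingly, a sketch of the same request run under commitment 
control [hypothetical names again]; if the request were interrupted, the 
implicit rollback would drop the locks rather than leave them pending 
recovery:

    STRCMTCTL  LCKLVL(*CHG)
    RUNSQL     SQL('LABEL ON MYLIB/MYFILE +
                 (CUSTNO IS ''Customer Number'')') +
                 COMMIT(*CHG) NAMING(*SYS)
    COMMIT     /* or ROLLBACK, to back out the change */
    ENDCMTCTL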
We are running V7R1 and applied latest cum a few weeks ago.
  The condition may be easy to recreate in a test environment, using 
jobs that need not be the web interface but that mimic what the web 
interface did.  Such a recreate scenario could be submitted as a defect 
report to the service provider, in expectation of an APAR and PTF from 
IBM.  Getting a PTF as a preventive sure beats encountering the problem 
again, and could save others the same hassle.
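  One possible recreate sketch [an assumption of the failing scenario; 
hypothetical names, with the ENDRQS requested via System Request option 
2 against the job running the CHGPF]:

    Job A:  ALCOBJ     OBJ((MYLIB/MYFILE *FILE *SHRRD))
    Job B:  CHGPF      FILE(MYLIB/MYFILE) SRCFILE(MYLIB/QDDSSRC)
    Job B:  ENDRQS     /* interrupt the CHGPF while it waits on Job A */
    Job A:  DLCOBJ     OBJ((MYLIB/MYFILE *FILE *SHRRD))
    Job C:  /* SQL against MYLIB/MYFILE; does SQL0913 occur, while  */
            /* WRKOBJLCK OBJ(MYLIB/MYFILE) OBJTYPE(*FILE) shows no  */
            /* conflicting lock?                                    */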
I am still concerned that WRKOBJLCK did not show the problem,
  As would I be... and likely indicative of a defect with the OS.
  If the origin was a conflict with a held/orphaned SLL, the nature of 
SLLs, as I recall, would not allow presentation via WRKOBJLCK very 
easily, nor especially at all efficiently.  A Space Location Lock is on 
any space, and is not specific to an /object/ as an allocated resource. 
And as I recall they are most easily materialized from the job, which is 
why they are available via the Retrieve Job Locks (QWCRJBLK) API [and 
similar] based on MATPRLK, but not via the List Object Locks (QWCLOBJL) 
API based on MATOBJLK.  As noted in my earlier reply, I believe iNav has 
an interface to show SLLs that are held, perhaps also showing waiters, 
though most likely in an interface requesting information about one or 
more /jobs/ vs one requesting information about an /object/; i.e. a 
/job/ interface vs an /object/ interface.  Anyhow...
  An option to materialize a list of SLLs using a base address of any 
space object type as input would be nice.  As it is, the specific 
address with offset [the specific location] must be supplied to inquire 
about a list of any active holders.  Otherwise all processes would have 
to be materialized for all of their held SLLs, and the list of addresses 
then pared down to those that share the base addresses of interest.  If 
that were available via the LIC, then the database could inquire about 
all of the base addresses of the various space objects that make up the 
composite object of the database *FILE [for a request from the Work 
Control feature (WC) via WRKOBJLCK], to present the effects on an 
object basis.
but I can understand it now due to the odd nature of my problem.
  Odd, as in, likely a defect.  Not as in /understand/ that something 
was done wrong; just that what was done, if EndRqs, might validly leave 
locks, but would not /validly/ leave locks that are not visible from 
WRKOBJLCK [based on my recollection of design intent for the OS database 
feature (DB)].
  Were there any errors preceding the -913 in those jobs getting the 
SQL0913?  Any such messages could assist in finding the origin; e.g. 
MCH5804 "Lock space location operation not satisfied ..." vs MCH5802 
"Lock operation for object &1 not satisfied" would clearly diagnose what 
type of lock was the origin of the conflict.  The failing instructions 
identify exactly the code that requested the lock, and the code path in 
which that lock request sits could make the reason the lock was 
requested very conspicuous; e.g. a preceding test in the OS code that 
says "if the mutex-like indicator is set, then request a read-SLL to 
ensure not to proceed until the SLL can be obtained" could be very 
revealing as to origin.
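  Reviewing the complete joblog of one of the failing jobs should reveal 
any such preceding messages; e.g. [the job name here is hypothetical]:

    DSPJOBLOG  JOB(123456/WEBUSER/WEBJOB) OUTPUT(*PRINT)

then scan the spooled joblog for an MCH5802 or MCH5804 just prior to the 
SQL0913.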
  While I had suggested in my earlier reply that OPEN is unaffected by 
pending recovery, I seem to recall that an open by the SQL might have a 
protocol for delaying the open pending completion of certain 
identified-as /exclusive/ work, for which a member or data lock might 
not be held to prevent the open, but work for which the SQL should 
probably await completion.  And I suppose that exclusivity might have 
been implemented via a flag in the file [as a space, it can be changed 
irrespective of locking], which acts as an effective mutex informing the 
SQL that it must await completion of some changes; and perhaps that was 
implemented via Space Location Locks [SLL], i.e. that location would 
have been locked by the CHGPF requester, an SLL obtained, and then the 
SQL open would await a lock on that location if the exclusive-work flag 
was set.  I seem to recall that some easy action would reset the flag in 
situations where the flag was improperly left on... perhaps something 
like DSPFD?
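  If that recollection is accurate, then something as simple as the 
following might reset an improperly left-on flag [purely conjecture, per 
the above; the names are hypothetical]:

    DSPFD      FILE(MYLIB/MYFILE)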
  FWiW here is a v6 issue describing the /change file/ interface as an 
example of the OS DB leaving an orphaned lock.  That example involved 
referential integrity, where the orphaned lock was left on the parent 
file vs the child file with the dependent data; there is no mention of 
the type of lock that was orphaned:
http://www.ibm.com/support/docview.wss?uid=nas3bd0dcd2b5f3164e28625772a0073bbb3
  4refOnly:
http://www.google.com/search?q=%22space+location+lock%22+sql0913+OR+%22-913%22+OR+%22msgsql0913