On 24-Sep-2015 08:47 -0600, John McKee wrote:
Yesterday was the fourth week in a row that Go SAVE 21 failed. But,
very differently.
  Minimally, the symptoms from the joblog should be offered.  The term 
"failed" is so nebulous that it can elicit no worthwhile response, only 
inquiries for more information to elucidate the failure.
Following the misadventure where a tape got stuck, I ended up (since
time was already allocated) doing an IPL.  That lowered max unprotect
from 20G to around 5G yesterday.  Combine that with save/delete of a
number of libraries.  No idea if there is any relation.
  So somewhat less storage was required for the *NONSYS portion of 
this save.  Could that have prevented the need for the tape-change 
[volume-change] condition?
  But apparently no other preventive action was attempted prior to that 
save; i.e. no attempt was made prior to the save to do any of these?:
 1> mark the file named flightrec [in whatever directory that resides] 
as AlwSav=No
 2> disable\remove BRMS
 3> disable\remove MSE
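  As a sketch only, the first of those preventive actions might be 
requested with the Change Attribute (CHGATR) command, as shown here.  The 
path is a hypothetical placeholder, because the directory in which 
flightrec resides is not known; the real path must be substituted.

```cl
/* Hypothetical path: the actual location of flightrec is not known here. */
/* Marks the stream file so that the SAV portion will skip it.           */
CHGATR OBJ('/path/to/flightrec') ATR(*ALWSAV) VALUE(*NO)
```

  Issuing the same request with VALUE(*YES) would make the file eligible 
for save again, once the testing is done.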
So, yesterday, I saw the SAV portion move along.  Then, it froze for
a long time.  Initially, I thought perhaps just a really big IFS
file.  Maybe, but no idea.  The progress line disappeared at some
point and then reappeared. That was followed by messages that
spooled output had been created for... a number of times.  I think 21
spool files.  Disk usage dropped as a result of IPL and removal of
libraries to the point that I thought it was low enough (from before)
that second tape would not be needed.  There was NO message
requesting a tape change.  Just failed backup, and device left FAILED
and damaged.
  Without any preventive action taken prior to that save, the same 
seize contention could occur for the flight recorder, but occur in a 
different code path; the notable wait could have been the effect of the 
seize-wait [similar to how a lock-wait is manifest].
  While the request to log to the [BRMS] flight recorder was visible 
from the tape-change in the prior failures, perhaps this time the [BRMS] 
flight recorder logging is still occurring concurrent to the dump\save 
of the descriptor with the flightrec file, but from a different code 
path.  And this time, the failing code path apparently has the First 
Failure Data Capture (FFDC) active, for which the effect is:
But, this is different. This time, a problem was logged. The symptom
string is 5722 F/QTADMPDV.
  That symptom is generated as a result of FFDC, manifest as a Problem 
Record visible in Work with Problems (WRKPRB).  The record is produced in 
response to an unexpected failure, if the Software Error Logging 
(QSFWERRLOG) system value says to do so.
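  For reference, a minimal sketch of how one might confirm that setting 
and then locate the logged record; nothing here is specific to this 
failure, and the comments are my own annotations.

```cl
/* Show the Software Error Logging system value; *LOG means that */
/* software errors are logged, *NOLOG means they are not.        */
DSPSYSVAL SYSVAL(QSFWERRLOG)

/* List the problem records, from which the entry having the     */
/* symptom string of interest can be selected for its details.   */
WRKPRB
```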
I did attempt to Google this. But, I do not understand what I
received, as it appears to be related to MLB installation.
  The generated symptom-string is both sparse and generic, so inferences 
drawn from other issues with the same symptom are unlikely to be 
telling.
  The actual data would need to be reviewed to interpret the meaning 
of the failure data being logged.  IIRC the QTADMPDV is the program 
[TA=Tape, DMP=Dump, DV=Device] of the Tape feature that will dump the 
Device to include the Tape Flight Recorder information for that device 
[and the corresponding tape MLB]; typically that is something the user 
is asked to call, but the FFDC may invoke that feature in response to 
whatever failure that feature was logging.  The generated symptom may 
merely indicate that a non-specific failure is logged to include 
dump-tape info, rather than a failure of that dump-tape feature itself.
Just looked at PRTERRLOG. I decided to no longer use the "standard"
of WEEKLY for the volume id. The tape I used had one temporary write
error, with 138457 M Bytes written. This was the first time this
volume id had ever been used.
  My guess is that the one temporary write error is probably immaterial 
with regard to the failure.
To summarize: four fails, only last one logged a problem, last
failure also did NOT issue a tape change request, and there were
entries in LIC LOG much as before. The LIC LOG had no activity posted
prior to the failure, which is also different. The entries, to my
uneducated mind, appear to be same type as before.
  My guess is that this failure was the same issue, manifest from a 
different place.  Only by review of the spooled data and the VLog data 
could that be inferred with any confidence.  As with the data collected 
from the prior failures, I would be willing to review the LIC data and 
the aforementioned spooled data produced within the job, to try to infer 
what transpired and then to describe that failure in terms more easily 
digested.
Question: Is this one of *those* issues that violate that stupid
dogma "If it ain't broke...", and was there some PTF already
created?
  It could be that a lack of maintenance allows the failure; i.e. a 
preventive PTF may exist, but is not applied.  But again, personally, I would 
ensure I had a good save of the system [even if not a GO SAVE opt-21; 
something from which the system could be reloaded if necessary] before 
applying maintenance.  As I recall, the directions for applying 
maintenance suggest a save before and one after [even if that almost 
never happens]; being on a system that has effectively no support except 
on a pay-as-you-go basis, that suggestion [despite any mis-recollection 
I might have] is probably sage advice in that situation.
Also, from WRKPRB, F11 display APAR library, shows 45 entries.
Only two do NOT have text. The text for the others has this: Problem
1526634180 and system serial number.
Any distant bells ringing?
  I do not recall what WRKPRB looks like, and I do not have any access 
to see [there is a dearth of panels in docs, and so I will not even 
waste time to look for any], so I have no idea what the big number is 
nor why the same number would appear.  Also I have no idea why any would 
appear devoid of text.
  Without some specifics about symptoms [e.g. from the joblog and other 
spooled results, or the problem entries themselves], a reader is left 
with nothing but wanting to ask for details, before any worthwhile 
reply\comment could be composed.
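  As a sketch of collecting those specifics, the joblog of the job that 
ran the save could be spooled and then located among the other spooled 
results.  The qualified job name shown is a hypothetical placeholder; the 
actual number, user, and name of the job that ran the save must be 
substituted.

```cl
/* Hypothetical job identifier: replace with the job that ran the save. */
DSPJOBLOG JOB(123456/QSECOFR/QPADEV0001) OUTPUT(*PRINT)

/* Work with the spooled files of the current user, to find the joblog  */
/* along with the other spooled output created during the failure.      */
WRKSPLF
```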