On 13-Oct-2014 01:18 -0500, Graap, Kenneth wrote:
I have a Power720 system with 2 active processors, 96GB of RAM and
an 8TB iASP - 50% utilized. Not a large system by today's standards.
For some reason that is still being determined, this system CRASHED
HARD (immediately did a MSD and IPL'ed) right at the beginning of a
work day.
If there was a successful MSD, then the crash actually was on the
softer side of "hard", on a scale of softest to hardest. While *any*
crash on a production system is "hard" from the perspective of the
system\partition owner and users, any crash that both generates and
properly stores a full Main Storage Dump (MSD) is relatively "soft"
from the perspective of the OS.
After a hard crash like this it is strongly recommended that a
RCLSTG process be run.
Surely recommended after the hardest of crashes; e.g. a crash that is
effectively a pulled power plug, with either a failed UPS or no UPS,
the effect of which is a failure to perform any shutdown processing
and\or even to produce a MSD. For other "softer" crashes
[mostly software, even sometimes, but less so, if due to limits],
whereby the MSD was able to be written to disk, then probably not so
strongly.
AFaIK, the reclaim is not really "strongly recommended" anymore,
outside of a power loss. My recollection is that IBM has long been
advocating that no Reclaim Storage (RCLSTG) be performed except when
recommended by service [as likely resolution to an identified issue for
which an alternate corrective is either impossible or unreasonable] or given
error messages for which the request was identified by the system as a
preferable recovery action [though those are more likely to be about and
thus to be resolved by, a reclaim of just the *DBXREF, rather than using
a full reclaim]. That direction to discourage the request, due entirely
to the requirements to perform the reclaim being incompatible with most
service\up-time requirements, was a direct consequence of the systems
having been enabled to become and thus becoming much larger due to
having extended system limits over time. There should be some KB
articles [now called TechNote documents] that imply the avoidance of a
full reclaim is most desirable, and that other means are available to
correct specific issues for which a full RCLSTG might otherwise be
used as resolution; e.g. Reclaim DB Cross Reference (RCLDBXREF) [or
RCLSTG SELECT(*DBXREF)] for System Database XREF issues, Change Object
Owner (CHGOBJOWN) and Grant Object Authority (GRTOBJAUT) to correct
ownership\authority issues, Reclaim Objects by Owner (RCLOBJOWN) to
correct out-of-context /QSYS.LIB objects, and Reclaim Object Links
(RCLLNK) for some file-system issues.
After negotiating a 12 hour window on Sunday, I started running
RCLSTG against the 8TB iASP...
For an iASP reclaim, I seem to recall that, given their probable and
preferred use in HA, the backup\mirror iASP could be activated in
place of the one taken offline, and thus the offline copy could be
reclaimed _while_ the other iASP is active in its place.?
After over 9 hours, the "Reading objects from disk" step was only
51% complete with an estimated remaining time of almost 9 more
hours!
I do not recall what is a reasonable estimate for that phase. Long
ago I recall some user experiences being documented with timings and
some specific configuration information; gathered from actual reclaim
requests, from the data stored in the QRCLSTG data area.?
                     Reclaim Storage in Progress                     S02
                                                       10/12/14 23:01:46
 RCLSTG:
   Select/Omit/ASP device or group :   *ALL   *NONE   IASP1
   Start date and time . . . . . . :   10/12/14 13:47:38
   Current step / total  . . . . . :   2   7

 Reclaim Storage Step                    Percent  Time Elapsed  Time Remaining
  Data base/library/directory recovery     100      00:00:11       00:00:00
 >Reading objects from disk                 51      09:13:54       08:52:10
  Processing data base relationships         0
  File ID table recovery                     0
  Directory recovery                         0
  Object description verification            0
  Final cleanup                              0
 Total . . . . . . . . . . . . . . . . . . . . :    09:14:05
The "almost 9 more hours" was solely for that one phase of
processing; i.e. the estimated Time Remaining includes just that
currently active step [as designated with the greater than sign on the
same line of output].?
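FWiW: the figures on that status display are consistent with a simple
linear extrapolation for the active step; i.e. remaining = elapsed *
(100 - percent) / percent. That is only an observation drawn from the
numbers shown above, not a documented description of how RCLSTG computes
its estimate, and the function name below is purely illustrative. A
minimal Python sketch of that arithmetic, using the values for the
"Reading objects from disk" step:

```python
from datetime import timedelta


def estimate_remaining(elapsed: timedelta, percent_done: float) -> timedelta:
    """Linearly extrapolate the time remaining for a single step.

    Assumes progress is proportional to elapsed time, which is an
    assumption made for illustration; the step may not progress evenly.
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in the range (0, 100]")
    remaining_seconds = elapsed.total_seconds() * (100 - percent_done) / percent_done
    # Truncate to whole seconds, matching the hh:mm:ss granularity shown
    return timedelta(seconds=int(remaining_seconds))


# Values taken from the status display: 51 percent after 09:13:54
step_elapsed = timedelta(hours=9, minutes=13, seconds=54)
step_remaining = estimate_remaining(step_elapsed, 51)

print("Remaining for this step:", step_remaining)
print("Projected total for this step:", step_elapsed + step_remaining)
```

The result covers only the one active step; the five steps still at 0
percent offer nothing to extrapolate from, so the time for the complete
reclaim would be longer by some unknown amount. Also, the displayed
percent is a whole number, so any such projection is approximate.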
FWiW: The RECLAIM instruction also implements the Retrieve Disk
Information (RTVDSKINF) request. Thus the requirements for that similar
phase of processing [listing and some minimal review\processing of the
objects in the permanent storage directory] in both, could probably be
inferred from making and timing that RTVDSKINF request.
I had to abort the Reclaim Storage process and bring up the system,
knowing full well that when (or if) I restart RCLSTG it will start from
the beginning again.
It seems like the RCLSTG process needs to be modified in some way
that it can complete in a reasonable amount of time. Maybe it could
be designed to multitask (??)
The RECLAIM instruction already operates with multiple LIC tasks to
perform the stage shown; obtaining the list of every /object/ from the
Permanent Storage Directory. I believe the number of tasks that are
used for the request, is calculated based on the CPU [number] and
storage [size; possibly also arms] configurations; possibly also the
available memory. A review of the [number of] RCxxx tasks in the LIC
task list [IIRC: STRSST, D/A/D, ...] would show how many of those LIC
tasks are active for the RECLAIM instruction.
or keep track of what it had done so if restarted it could continue
where it left off (??) ...
If the operation is both discouraged and has lower-impact
alternatives, then the costs [to IBM] to achieve that capability, for the
perceived benefits [to the few using the feature], likely would be hard
to justify; i.e. the benefits would be experienced very seldom, and by few.
IMO the request is most often performed more for the placebo effect
than for any legitimate effect. But the pain of that pill [the cost of
the reclaim] often far outweighs the actual benefits that the reclaim
could offer as relief for the pain to the system, in care of the
system-owner.
All I know is it isn't working for me...
Has anyone else experienced this?
Consider: If a scratch-install Disaster Recovery of the disks is
faster than a RCLSTG, then performing that DR restore and applying
changes since the last save\backup, could be considered a /better/
choice than the reclaim; i.e. effectively the same result [though an
even /cleaner/ effect], achieved quicker.