Re: Has Reclaim Storage becoming outdated? -- MIDRANGE-L

On 13-Oct-2014 01:18 -0500, Graap, Kenneth wrote:

I have a Power720 system with 2 active processors, 96GB of RAM and
an 8TB iASP - 50% utilized. Not a large system by today's standards.

For some reason that is still being determined, this system CRASHED
HARD (immediately did a MSD and IPL'ed) right at the beginning of a
work day.

If there was a successful MSD, then the crash was actually on the softer side of "hard", on a scale of softest to hardest. While *any* crash on a production system is "hard" from the perspective of the system\partition owner and users, a crash that both generates and properly stores a full Main Storage Dump (MSD) is relatively "soft" from the perspective of the OS.

After a hard crash like this it is strongly recommended that a
RCLSTG process be run.

Surely recommended after the hardest of crashes: one that results from an effectively pulled power plug, with either a failed UPS or no UPS, such that no shutdown processing is performed and not even a MSD is produced. For other "softer" crashes [mostly software; sometimes, but less so, those due to limits], whereby the MSD was able to be written to disk, then probably not so strongly.

AFaIK, the reclaim is not really "strongly recommended" anymore, outside of a power loss. My recollection is that IBM has long advocated that no Reclaim Storage (RCLSTG) be performed except when recommended by service [as the likely resolution to an identified issue for which an alternative corrective action is not possible or is unreasonable] or when error messages identify the request as a preferable recovery action [though those are more likely to be about, and thus to be resolved by, a reclaim of just the *DBXREF, rather than a full reclaim].

That direction to discourage the request, due entirely to the requirements of the reclaim being incompatible with most service\up-time requirements, was a direct consequence of systems becoming much larger as system limits were extended over time.

There should be some KB articles [now called TechNote documents] that imply the avoidance of a full reclaim is most desirable, and that other means are available to correct specific issues for which a full RCLSTG might otherwise be used as the resolution; e.g. Reclaim DB Cross Reference (RCLDBXREF) [or RCLSTG SELECT(*DBXREF)] for System Database XREF issues, Change Object Owner (CHGOBJOWN) and Grant Object Authority (GRTOBJAUT) to correct ownership\authority issues, Reclaim Objects by Owner (RCLOBJOWN) to correct out-of-context /QSYS.LIB objects, and Reclaim Object Links (RCLLNK) for some file-system issues.

After negotiating a 12 hour window on Sunday, I started running
RCLSTG against the 8TB iASP...

As this is an iASP reclaim: I seem to recall that, given the probable and preferred use of iASPs in HA, the backup\mirror iASP could be activated in place of the one taken offline, and thus the offline copy could be reclaimed _while_ the other iASP is active in its place.?

After over 9 hours, the "Reading objects from disk" step was only
51% complete with an estimated remaining time of almost 9 more
hours!

I do not recall what a reasonable estimate is for that phase. I do recall that, long ago, some user experiences were documented with timings and some specific configuration information, gathered from actual reclaim requests, from the data stored in the QRCLSTG data area.?

                      Reclaim Storage in Progress                      S02
                                                        10/12/14  23:01:46
RCLSTG:
  Select/Omit/ASP device or group :  *ALL  *NONE  IASP1
  Start date and time . . . . . . :  10/12/14  13:47:38
  Current step / total  . . . . . :  2  7

 Reclaim Storage Step                     Percent  Time Elapsed  Time Remaining
 Data base/library/directory recovery       100      00:00:11       00:00:00
>Reading objects from disk                   51      09:13:54       08:52:10
 Processing data base relationships           0
 File ID table recovery                       0
 Directory recovery                           0
 Object description verification              0
 Final cleanup                                0

 Total . . . . . . . . . . . . . . . . . . . . :     09:14:05

The "almost 9 more hours" was solely for that one phase of processing; i.e. the estimated Time Remaining covers just the currently active step [the one designated with the greater-than sign at the start of its line of output], not the five steps still to follow.?
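FWiW, the Time Remaining figure for the active step can be reproduced with simple arithmetic. What follows is only a minimal sketch in Python, under the assumption that the estimate is a straight-line extrapolation from the percent complete and the elapsed time of that step; I do not know that the system actually computes it that way. The function names (parse_hms, estimate_remaining, format_hms) are hypothetical and exist only for this illustration.

```python
from datetime import timedelta


def parse_hms(text):
    """Convert an HH:MM:SS string, as shown on the display, to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def estimate_remaining(elapsed, percent_complete):
    """Assumed linear model: remaining = elapsed * (100 - pct) / pct."""
    if not 0 < percent_complete <= 100:
        raise ValueError("percent_complete must be greater than 0 and at most 100")
    elapsed_seconds = int(elapsed.total_seconds())
    # Integer arithmetic; fractional seconds are truncated.
    remaining_seconds = elapsed_seconds * (100 - percent_complete) // percent_complete
    return timedelta(seconds=remaining_seconds)


def format_hms(delta):
    """Format a timedelta as HH:MM:SS, allowing hours beyond 24."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Figures taken from the "Reading objects from disk" line of the display.
elapsed = parse_hms("09:13:54")
remaining = estimate_remaining(elapsed, 51)

print(format_hms(remaining))            # 08:52:10
print(format_hms(elapsed + remaining))  # 18:06:04
```

With the figures from the display, the straight-line model yields the same 08:52:10 that was shown, and projects about 18 hours for that one step alone. That agreement suggests, but does not prove, that the displayed estimate is a simple extrapolation; either way, it says nothing about the duration of the later steps.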

FWiW: The RECLAIM instruction also implements the Retrieve Disk Information (RTVDSKINF) request. Thus the requirements for that similar phase of processing in both [listing, and some minimal review\processing of, the objects in the permanent storage directory] could probably be inferred from making and timing that RTVDSKINF request.

I had to abort the Reclaim Storage process and bring up the system,
knowing full well that when (or if) I restart RCLSTG it will start from
the beginning again.

It seems like the RCLSTG process needs to be modified in some way
that it can complete in a reasonable amount of time. Maybe it could
be designed to multitask (??)

The RECLAIM instruction already operates with multiple LIC tasks to perform the stage shown; i.e. obtaining the list of every /object/ from the Permanent Storage Directory. I believe the number of tasks used for the request is calculated based on the CPU [number] and storage [size; possibly also arms] configurations; possibly also the available memory. A review of the [number of] RCxxx tasks in the LIC task list [IIRC: STRSST, D/A/D, ...] would show how many of those LIC tasks are active for the RECLAIM instruction.

or keep track of what it had done so if restarted it could continue
where it left off (??) ...

If the operation is both discouraged and has lower-impact alternatives, then the costs [to IBM] of achieving that capability, for the perceived benefits [to the few using the feature], likely would be hard to justify; i.e. the benefits would be experienced very seldom and by few.

IMO the request is most often performed more for the placebo effect than for any legitimate effect. But the pain of that pill [the cost of the reclaim] often far outweighs the actual benefits that the reclaim could offer as relief for the pain to the system, in care of the system-owner.

All I know is it isn't working for me...
Has anyone else experienced this?

Consider: If a scratch-install Disaster Recovery of the disks is faster than a RCLSTG, then performing that DR restore and applying changes since the last save\backup, could be considered a /better/ choice than the reclaim; i.e. effectively the same result [though an even /cleaner/ effect], achieved quicker.