Jon:
See embedded remarks below.
(Note that the following comments do not apply to programs running
within PASE.)
HTH,
Mark S. Waterbury
> On 7/12/2015 5:17 AM, Wilson, Jonathan wrote:
> An odd question occurred to me while playing around with some code.
> How does the i handle individual invocations of a list of programs with
> regard to a simple call versus all of the programs compiled into one
> massive single program?
There is always some "overhead" to "activate" each program or service
program within a job or activation group. And so, there is also
additional overhead for the OS to activate multiple programs, versus
activating just one large program where all of the *MODULEs are bound
together into a single *PGM object. However, as POWERx architecture
RISC processors keep getting faster and faster, this "overhead" becomes
less and less noticeable. This was part of the rationale for ILE
providing an ability to bind one or more *MODULEs to create a single
*PGM or *SRVPGM object, using the CRTPGM or CRTSRVPGM commands. (When
ILE was designed, OS/400 ran on much slower IMPI technology and CISC
hardware, so the overhead of activating multiple programs versus one
large program was far more significant and very noticeable.)
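
To make that concrete, here is a minimal ILE C sketch (the program,
module, and library names are made up for illustration) contrasting a
dynamic program call, which carries activation overhead the first time
the target *PGM is called in an activation group, with a bound procedure
call to a *MODULE that was linked into the same *PGM at CRTPGM time:

    /* Dynamic call: WWCUST is a separate *PGM object that the OS must   */
    /* activate on its first call within this job/activation group.      */
    #pragma linkage(WWCUST, OS)         /* ILE C form for calling a *PGM */
    void WWCUST(void);

    /* Bound call: show_cust_types() is exported from another *MODULE    */
    /* that was bound into this same program, e.g.                       */
    /*   CRTPGM PGM(MYLIB/MENU) MODULE(MYLIB/MENU MYLIB/CUSTTYPES)       */
    /* so the call resolves at bind time -- no separate activation.      */
    void show_cust_types(void);

    int main(void)
    {
        WWCUST();            /* dynamic: activation overhead on 1st call */
        show_cust_types();   /* bound: an ordinary procedure call        */
        return 0;
    }
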
> If I have a program that calls another program that calls another
> program (say a menu program calling WW-customers calling ww-customer
> types) I now have 3 programs on the call stack. Does/can the i decide
> that, as the first two programs are no longer actively running and are
> effectively paused until the most recently called program returns, they
> can be good candidates for the i equivalent of paging?
IBM i and OS/400, like System/38 CPF before them, are true
"demand-paging" virtual memory systems. The idea of "single level
storage" means that all
objects reside on DASD, and all of real main storage (memory) is just a
(large) "cache" for any objects that are currently (or recently) "in use".
> Does the i even care what part of a program or variable is in or out of
> memory, or does it use some kind of "well that bit hasn't run for a
> little while/long while/very long while" and just shift accordingly,
> having no knowledge of the state of a program within a stack of calls?
With any "demand paged" virtual memory, if a page goes unreferenced for
a long enough period of time, eventually, its "page frame" (or "slot" in
real memory) may be needed to make room for some other page that needs
to be brought in, and so it might need to get paged out. Note that
modern IBM i POWER6/7/8 systems have vastly larger main storage sizes in
the tens or even hundreds of Gigabytes, compared with IBM i systems of
just a few years ago (POWER4, POWER5, etc.) -- the larger the real main
storage "cache" for holding pages of the single-level storage objects in
real memory, the less likely that other pages need to get "paged out"
(or swapped out, aka "stolen"), to make room to bring in other pages of
other objects that are needed by the same or other jobs.
The hardware (page table entry) maintains a "referenced" and "changed"
bit for each page frame or "slot." Whenever a program fetches from a
page (e.g. reads the value of a variable or loads executable
instructions into the processor, etc.), the "referenced" bit is set on
for that page slot. Whenever a program alters the value in a variable,
the "changed" bit gets set on for that page in memory. Then,
periodically, the OS scans the page tables to determine which pages are
going "unreferenced" and also notices which pages have changed, and then
those bits are reset to 0. Then, if a slot is needed for another page,
and the page previously in that slot was "changed" it must first be
written back out to "backing store" (DASD), before another page can be
loaded into that slot. On the other hand, if a slot is needed and that
page has not changed, it can simply be overwritten with the new
contents, without first having to save the previous contents. So, the OS
will usually prefer to replace such unchanged "read-only" pages before
forcing changed pages out to "backing store," because that requires
additional I/O operations that slow down the whole process: the old page
contents must be saved before the new page contents can be loaded.
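
To make that replacement preference concrete, here is a small, purely
conceptual C sketch (not IBM i's actual paging code; the structure and
function names are invented) of how referenced and changed bits drive
the choice of which frame to steal:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct frame {
        bool referenced;   /* set by hardware on any fetch from the page */
        bool changed;      /* set by hardware on any store into the page */
    };

    /* Choose a frame to steal: an unreferenced, unchanged page is the   */
    /* cheapest victim (just overwrite it); a changed page must first be */
    /* written back to backing store, costing an extra I/O operation.    */
    static size_t pick_victim(const struct frame *f, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (!f[i].referenced && !f[i].changed)
                return i;             /* overwrite directly              */
        for (size_t i = 0; i < n; i++)
            if (!f[i].referenced)
                return i;             /* must page out old contents 1st  */
        return 0;                     /* everything was recently used    */
    }

    /* Periodic scan: note usage, then clear the referenced bits so the  */
    /* next interval starts fresh.                                       */
    static void age_frames(struct frame *f, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            f[i].referenced = false;
    }

    int main(void)
    {
        struct frame frames[4] = {
            { true,  true  },   /* hot, dirty                            */
            { false, true  },   /* cold, dirty                           */
            { false, false },   /* cold, clean -- ideal victim           */
            { true,  false },   /* hot, clean                            */
        };
        printf("victim: frame %zu\n", pick_victim(frames, 4)); /* -> 2   */
        age_frames(frames, 4);
        return 0;
    }
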
> If, however, I was to write the programs as say three service programs
> or three modules and then link them into one massive "pc application"
> style program how much difference would there be as to how the i would
> handle such a program in comparison to the first example?
Service programs (*SRVPGM) are very much like Dynamic Link Libraries
(DLLs) in OS/2 or Microsoft Windows, or like "shared libraries" in Unix
or Linux. You can bind one or more *MODULEs to create one large
*SRVPGM, and then many *PGMs can use it and call procedures within the
same *SRVPGM, and within the same job, there is only one "activation" of
that *SRVPGM (per activation group). In some ways, this can give you
many of the benefits of "linking everything into one large application
program" as you suggested. But, there are also maintenance advantages to
using service programs, versus linking many of the same *MODULEs into
many different *PGMs. Consider what happens when you need to make
changes to one of these *MODULEs. You would then need to hunt down
every *PGM it is bound into, and then replace it there (e.g. using
UPDPGM). With *SRVPGMs, you can strive to ensure that each *MODULE is
only ever bound into one *SRVPGM, and then all client programs
dynamically call the procedures of that *MODULE within that one *SRVPGM,
and that way, when you need to make changes to that *MODULE, you have
just one *SRVPGM object to update (via the UPDSRVPGM command).
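
As a sketch of that maintenance pattern (the library, object, and
procedure names below are invented for illustration), the module itself
is ordinary ILE C, and the CL build and refresh steps, shown in
comments, are why only the one *SRVPGM needs updating when the module
changes:

    /* custtypes.c -- compiled into a module with, for example:          */
    /*   CRTCMOD MODULE(MYLIB/CUSTTYPES) SRCFILE(MYLIB/QCSRC)            */
    /* and bound once into a single service program:                     */
    /*   CRTSRVPGM SRVPGM(MYLIB/CUSTSRV) MODULE(MYLIB/CUSTTYPES)         */
    /*             EXPORT(*ALL)                                          */
    /* Client programs bind to the service program by reference:         */
    /*   CRTPGM PGM(MYLIB/WWCUST) MODULE(MYLIB/WWCUST)                   */
    /*          BNDSRVPGM(MYLIB/CUSTSRV)                                 */
    /* After changing this module, only the one *SRVPGM is refreshed:    */
    /*   UPDSRVPGM SRVPGM(MYLIB/CUSTSRV) MODULE(MYLIB/CUSTTYPES)         */

    /* Exported procedure that every client calls through the *SRVPGM.   */
    const char *cust_type_desc(int type)
    {
        switch (type) {
        case 1:  return "Retail";
        case 2:  return "Wholesale";
        default: return "Unknown";
        }
    }

In practice you would usually also use binder source (EXPORT(*SRCFILE)
with a binder language member) so the service program's signature stays
stable and client programs do not have to be re-created after an update.
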
> Also the start up of "a program". I can't recall if the i loads all of
> the program and initialises all of its variable memory when a program is
> called, or if it just loads chunks of code and then initialises the
> variables the first time it comes across them, or even some combination
> - perhaps something totally different - of the above.
This is called "activation". What happens is, some virtual memory is
allocated for any static storage (variables) for that *PGM or *SRVPGM,
within the activation group that it is running under. This is completely
separate from the "executable code" which simply resides in single-level
storage (virtual memory) and gets paged-in and paged-out on demand.
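
A tiny C illustration of that separation (just a sketch; the names are
hypothetical): the static variable below lives in storage that is
allocated and set to its initial value when the program is activated in
the job's activation group and then persists across calls, while the
executable code itself simply resides in single-level storage and is
paged in on demand:

    #include <stdio.h>

    /* Static storage: one copy per activation (per job/activation       */
    /* group), allocated and initialized to 0 at activation time.        */
    static int times_called = 0;

    void do_work(void)
    {
        int local = 42;             /* automatic storage: per invocation */
        times_called++;
        printf("call %d, local %d\n", times_called, local);
    }

    int main(void)
    {
        do_work();   /* prints: call 1, local 42                         */
        do_work();   /* prints: call 2, local 42 -- static value kept    */
        return 0;
    }
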
> I understand, as far as I can recall, the concepts of single level
> storage where everything is just an address, with the OS handling the
> details of whether the address is in memory or on disk etc. But at some
> point, for a newly started program, the variables used must be set up to
> be unique to the user/job/etc. running the program, so until that point
> the memory for the program's variables has no address... after that the
> variables can be dealt with by the single level storage, in memory, in
> NVRAM, on SSDs or on good old spinning disks.
With single-level storage, everything (both data and executable code)
resides in the same vast address space, and gets "demand paged" into
memory when it is first referenced. During program activation, any
initialized static storage will also be initialized to its "initial
value" at that time.
> I'm guessing, but might be wrong, that the actual program (real good old
> honest logic code) in its non-active state looks no different from the
> program once it's in an active state, so the program "code" fits nicely
> into the single level storage concept and there is no overhead of
> un-packing code from its on-disk structure to its in-memory footprint.
All compilers on IBM i or OS/400 always generate "reentrant code" so
just one copy of the executable code can always be shared across any
number of jobs and processes. This is also in the same single-level
storage virtual memory, and so the code gets paged in and paged out in
the same way as any other virtual memory pages. Note however that if
multiple jobs are using the same code, it will be getting referenced far
more often, and so become far less likely to ever get selected to be
"paged out."
> I also can't remember if the "loaded" program is loaded once but points
> to unique data per job, or if a job running a program has a unique copy
> of both the code and the variable memory.
The "activation" mentioned above is the part the is unique per job or
per activation group. So each job has a "local" copy of the data and
variables, but the executable code is always shared.
> While writing the above, something niggled at the back of my memory that
> said the first time a variable is accessed it gets a fault saying
> something like "not initialised", which differs from a "not in
> memory/page" fault... but I might be wrong; it was so long ago that I
> read up on it.
For more detailed information, see Dr. Frank Soltis' books: "Inside the
AS/400" and "Fortress Rochester." You can find used copies on e-bay or
various used bookstores on-line, such as www.abebooks.com, or you might
even find copies at a public or university library.
> Jon.