Re: Unexpected SQLSTATE returned from embedded SQL -- RPG400-L

On 15-Jul-2015 11:07 -0600, Gary Thompson wrote:

I have the following SQL that has been working as expected for about
2 years:

SELECT COALESCE(om8.suptyp,' ')
INTO :wk_OutAsnSts
FROM om01 om1
LEFT JOIN om08 om8
ON om8.outnum = om1.outnum
AND om8.suptyp = 'AS'
AND om8.recsta = '5'
WHERE om1.outnum = :wk_OutNum

The query is intended to return the EDI 856 ASN status of the "Sales
Outlet" specified by "wk_OutNum".

This week we received a problem report from one of our seven
warehouse locations that six outlets did not receive the expected ASN
pallet labels and Document.

This happened on Monday and Tuesday of this week at that one
location, but has yet to be reported by any other warehouse location.
We send 70-90 ASN documents daily.

I have the impression from followup(s) that the row data that should be selected is static\unchanging; i.e. remains unchanged since a prior [auto-committed] update performed long before the query runs, and that the data remains unchanged even until the next run of the query when the problem did not occur [as noted below].

Probably worth ensuring the query runs with messages_debug active, and the joblogs are saved to review if the Access Plan was rebuilt and\or there is a visibly different implementation being logged for the failing runs as contrasted with the successful runs. The database monitor feature should assist in that regard too; detailed monitor against those jobs to catch that query each time it runs. Be sure to get a Print SQL Information (PRTSQLINF) of the program too; taken after a successful run.

All location data is processed by one RPGLE program which has the
SQL shown above.

I reviewed all status reports created by this program yesterday and
found six examples where the program reported SQL State = 02000 (no
rows) for an outlet that *is* defined.

Late yesterday, we identified this problem in time to repeat the
procedure that runs the query, and then successfully created and
transmitted the required ASN documents.

Perhaps the Access Plan changed, and the new plan runs a code path that is incorrectly unable to find the record; i.e. perhaps there is a defect in the run-time implementing the query with an alternate AP. If the change date of the program is updated, then the changed\defective plan or the original\functional plan presumably has been stored; the Print SQL Information (PRTSQLINF) would record some details of the plan. If the plan changes for the failing queries, that would support the idea of a possible defect and either the maintenance is not applied or the problem is not yet diagnosed. The plan(s) for the statement could exist only in the cache; e.g. if the user(s) running the program are not authorized to update the program [associated space (PAS)] to store the updated plan.

There was no information given about the host variable declaration (wk_OutNum) or the value of that variable at run-time, nor the DDL for the column being compared (outnum columns) nor keys\index\constraints.

Note that seemingly unpredictable results can be /normal/ in some cases. If the data is VARCHAR and the physical data is no properly trimmed, there is a known issue [though there may be a QAQQINI feature that might have a toggle to adjust in some way] whereby comparisons via a keyed Access Path may operate different than non-key compares; untrimmed and trimmed values [for trailing blanks] may compare equal or mis-compare when the RTRIMmed values are equivalent. Similar issues may affect host variable compares. Similarly unpredictable (mis)compares may be possible for zoned decimal with improper zone portions of the numerals; i.e. different results for different implementation, keyed vs non-keyed. If the Sort Sequence (SRTSEQ) is dynamic, then data issues like capitalization [upper\lower case] might effect comparison differences for different submissions; e.g. submitted later in response to a prior failure, that Work Management setup might be different than the typical WM setup for running the query, and thus the resubmissions function as expected whereas the others submissions could be properly failing to find the row.

Repeating the query now also returns the expected "row found"
condition.

Presumably that intends to suggest sqlcode='00000'; arguably, a lack of a condition\class code indicating a lack of any identifiable issue with the operation of the SQL statement.?

My question is what, other than an actual "no row" could cause SQL
State = 02000 ?

Only the condition that the row was not found, according to the specified selection for the SELECT INTO, should be identified by the SqlState [with Class Code 02 meaning "No Data"] of '02000' with the associated SQLCODE=100.

Because I often see CPU well above 100%, I suspect that some kind of
"time-out" may cause this ?

Almost surely, No. A timeout should be reported as an SQLSTATE that reflects whatever was the timeout condition.

If the effect or implementation for the query changes such that the result is incorrect [perhaps due to environment such as available CPU or memory], then the effect is clearly a defect for apparently static row data.