RE: CPF5060 Session cannot send now; CPF4533 Device response code is 6155 -- MIDRANGE-L

We have had similar problems during the last few weeks. They were rather sporadic; as soon as I canceled the session on the System i, the user was able to connect and sign in again.

In our setup the Ethernet from the data center is forwarded to a relay in the middle of the building, and this relay services the north side of the building. All of our issues have been on the north side, so we suspected the relay, but neither I nor my boss is really competent to diagnose the hardware.

This weekend we had to work on other issues, but, as I was going out for a smoke break, I heard an alarm coming from the relay closet. It turned out to be coming from a dead UPS. I mean DEAD; CPR was not an option. We replaced the UPS with a new spare, and the sessions on the north side came back. We speculated that, perhaps, the UPS had been on life support and had been flickering off for just a split (nano?) second at a time during the past few weeks. As I said, neither of us is capable of saying for sure. We just intend to keep an eye on the situation for the foreseeable future.

Just throwing this out as a scenario (or something similar) that might be checked.

Jerry C. Adams
IBM System i Programmer/Analyst
--
B&W Wholesale
office: 615-995-7024
email: jerry@xxxxxxxxxxxxxxx

-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of CRPence
Sent: Saturday, October 10, 2009 3:12 PM
To: midrange-l@xxxxxxxxxxxx
Subject: Re: CPF5060 Session cannot send now; CPF4533 Device response code is 6155

elehti wrote:

Occasionally (perhaps one or two times weekly) a 5250 session
will be orphaned/stranded.

Is it possible the user has for some reason canceled their 5250
session, for example by killing the emulation window rather than
signing off; possibly because an application software problem had
the user frustrated, perhaps for example that the application
program was looping or in a [valid, possibly long] wait state? Then
perhaps the device is simply awaiting QDEVRCYACN, effected either by
the application writing to the device or by the communications layer
reaching a timeout condition.

Is it that the workstation session has problems for which the
reaction was to issue ENDJOB, or instead that as a result of the
ENDJOB the session has problems? What are the symptoms for what is
referred to as "orphaned/stranded"; i.e. what actions when requested
from or against that workstation device or job, failed to react
appropriately, and what was the unexpected reaction\response to each
of the failing requests?

Job log shown below.

5722SS1 V5R4M0 060210 Job Log SAMSON 10/09/09 12:37:24
CPC2206 Comp 10/09/09 09:13:35.275224 QSYCHONR QSYS 0665 QLIINSRT QSYS 04
Message . . . . : Ownership of object ID4074 in QTEMP type *USRIDX changed.
Cause . . . . . : The ownership of object ID4074 in library QTEMP type
*USRIDX has changed.
CPC1126 Comp 50 10/09/09 12:28:37.244032 QWTCCCNJ QSYS 078E *EXT *N
From user . . . . . . . . . : ELEHTI
Message . . . . : Job 811256/SBLUE/QPADEV003X was ended by user ELEHTI.
Cause . . . . . : User ELEHTI issued a controlled end job request for job
811256/SBLUE/QPADEV003X.
CPC1125 Comp 50 10/09/09 12:35:16.763720 QWTCCCNJ QSYS 078E *EXT *N
From user . . . . . . . . . : ELEHTI
Message . . . . : Job 811256/SBLUE/QPADEV003X was ended by user ELEHTI.
Cause . . . . . : User ELEHTI issued an immediate end job request for job
811256/SBLUE/QPADEV003X.
CPF5060 Notify 30 10/09/09 12:35:17.109960 QWSPUDDS QSYS 0BBE QUIINMGR QSYS 05
Message . . . . : Session cannot send now.
Cause . . . . . : Sending not allowed to device QPADEV003X. Recovery . . .
: Change sequence of operations in user-defined data stream and then try
the command again. Possible choices for replying to message . . . . . . . .
. . . . . . . : I -- Request is ignored. C -- Request is cancelled.
CPF5104 is sent.
*NONE Reply 10/09/09 12:35:17.109960 QMHSNINQ QSYS 0D9A QUIINMGR QSYS
Message . . . . : C
CPF5104 Diagnostic 40 10/09/09 12:35:17.110344 QWSPUDDS QSYS 0C05 QWTPITDP QSYS
Message . . . . : Cancel reply received for message CPF5060.
Cause . . . . . : Either a cancel reply was received from the operator or
the program, or the system used a default reply. The cancel reply was
received for message CPF5060 in file QDUI132 in library QSYS with device or
member QPADEV003X.
CPC1129 Comp 00 10/09/09 12:36:36.933152 QWTCHGJB QSYS 0B36 *EXT
From user . . . . . . . . . : ELEHTI
Message . . . . : Job 811256/SBLUE/QPADEV003X changed by ELEHTI.
CPF4533 Diagnostic 70 10/09/09 12:37:22.997568 QWSERROR QSYS 0628 QWSCLOSE QSYS
Message . . . . : Error on device QPADEV003X. Device response code is 6155.
Cause . . . . . : Device detected an error. Recovery . . . : Device
response codes and their meanings follow: 6140-Lost contact with device.
6155-No response from the device in configured device wait time.
6826-Bind failure. Close successful. Try the
command again. If the problem occurs again, enter the ANZPRB command to run
problem analysis.
CPF5503 Diagnostic 30 10/09/09 12:37:22.997648 QWSERROR QSYS 064A QWSCLOSE QSYS
Message . . . . : Input or Output request failed. See message CPF4533.
Recovery . . . : See the message CPF4533. Correct the errors and then try
the request again.
CPF4168 Diagnostic 70 10/09/09 12:37:23.058504 QWSOPEN QSYS 1226 QWTPITDP QSYS
Message . . . . : Error on device or location QPADEV003X in file QDUI132 in
QSYS.
Cause . . . . . : There was an error on device or location QPADEV003X for
file QDUI132 in library QSYS.
Recovery . . . : Vary off device QPADEV003X
and then vary it on again (VRYCFG command). Then try the request again. If
the problem continues, start problem analysis (ANZPRB command).
CPC2191 Comp 00 10/09/09 12:37:23.922520 QLIDLOBJ QSYS 051B QLICLLIB QSYS
Message . . . . : Object X00RLS in QTEMP type *USRSPC deleted.
CPC2191 Comp 00 10/09/09 12:37:23.923408 QLIDLOBJ QSYS 051B QLICLLIB QSYS
Message . . . . : Object QUS0000004 in QTEMP type *USRSPC deleted.
CPC2191 Comp 00 10/09/09 12:37:23.924280 QLIDLOBJ QSYS 051B QLICLLIB QSYS
Message . . . . : Object X0028 in QTEMP type *DTAARA deleted.
CPC2191 Comp 00 10/09/09 12:37:23.925152 QLIDLOBJ QSYS 051B QLICLLIB QSYS

The user logged on or was presumably successfully operating at
09:13:35 until sometime before 12:28:37 when they requested ELEHTI
to end their non-responsive job? Since there is no activity [no
msgs] recorded in the joblog for over three hours, effectively only
screen captures, traces, or the dreaded user-description of what the
user activity was at the workstation would seem to be able to help
clarify what the user might have been doing.
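
To put numbers on the gaps in that joblog, here is a minimal Python sketch. It assumes the header lines are copied from the job log above (trimmed to message ID, type, severity, date, and time) and that the dates are MM/DD/YY. The names JOBLOG_HEADERS, parse_events, and gaps are made up for illustration; nothing here is an IBM-supplied tool.

```python
import re
from datetime import datetime

# Header lines copied from the job log above (message ID, type, optional
# severity, date, time); the rest of each line is left out for brevity.
JOBLOG_HEADERS = """\
CPC2206 Comp 10/09/09 09:13:35.275224
CPC1126 Comp 50 10/09/09 12:28:37.244032
CPC1125 Comp 50 10/09/09 12:35:16.763720
CPF5060 Notify 30 10/09/09 12:35:17.109960
CPF4533 Diagnostic 70 10/09/09 12:37:22.997568
"""

# Assumed layout: a 7-character message ID at the start of the line,
# then later on the same line an MM/DD/YY date and HH:MM:SS.ffffff time.
HEADER = re.compile(
    r"^(?P<msgid>[A-Z]{3}[0-9A-F]{4})\s.*?"
    r"(?P<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6})",
    re.MULTILINE,
)

def parse_events(text):
    """Return (message ID, datetime) pairs in the order they appear."""
    return [
        (m["msgid"], datetime.strptime(m["stamp"], "%m/%d/%y %H:%M:%S.%f"))
        for m in HEADER.finditer(text)
    ]

def gaps(events):
    """Return (earlier ID, later ID, timedelta) for consecutive events."""
    return [
        (a_id, b_id, b_time - a_time)
        for (a_id, a_time), (b_id, b_time) in zip(events, events[1:])
    ]

for earlier, later, delta in gaps(parse_events(JOBLOG_HEADERS)):
    print(f"{earlier} -> {later}: {delta}")
```

The first gap (CPC2206 to CPC1126) works out to a little over three and a quarter hours with nothing logged, CPF5060 follows the immediate ENDJOB (CPC1125) by well under a second, and CPF4533 arrives roughly two minutes after CPF5060.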

IBM Tech support says that the i5 is trying to send a message but
some network problem caused the interruption.

The workstation component is attempting to /put/ some User
Defined 5250 Data Stream to the device [in 132 wide mode], but the
device did not respond [within the timeout parameters defined].
However that error transpired only after the ENDJOB was requested.
There is no specific indication in the joblog that the device had
already lost its connection [presumably; i.e. non-responsive job]
prior to that failed program-to-device communication. What does the
history log show about the device, controller, and line for the period
prior to 12:35:17? And what about the problem log, LIC [comm] error
logs, and VLIC logs?

Comments?

If instead of ending the job there was some investigative
activity performed, perhaps more can be learned. Minimally, what
has the user done [e.g. did they just Ctrl-V some RTF, press Enter
at screen Z of application Y, etc.] and what does the user actually
see at their emulation session [or did they just kill the client
emulator session], what is the job status, the job stack [and other
job details], and is anything changing or just static? Can the user
perform a system request 4 for DSPMSG, a SysRqs 3 OUTPUT(*PRINT), or
a SysRqs-2 for ENDRQS? What about the status of the device,
controller, and line, and the subsystem monitor job? Additionally,
try sending a break message or a start service job request to the
user job. What are the symptoms for the requester, and does
anything appear in the joblog of the target job and subsystem
monitor job?

If it is known that a specific user\job is typically having the
problem, a TRCJOB and possibly also some screen captures could be
enabled to try to narrow down more specifically what the problem is
such that ENDJOB is perceived as the apparent recovery action. On a
prior occasion, apparently it was a different user?:
http://archive.midrange.com/midrange-l/200908/msg00617.html

Regards, Chuck