I'll post this just in case anyone is interested and, since the blade is 
now back on its legs after 9 long days, I'll blog about it when I get the 
chance. Here is what transpired:
The VIOS support guy I was working with couldn't get VIOS to boot, 
despite a couple of days of fiddling with it.  He finally suggested that 
I order the latest VIOS install CD and reinstall VIOS to see if that 
would fix it (ah! The tried and true Microsoft method: re-install).  I 
tried ESS (IBM's Entitled Software Support site) and came up empty on an 
install image, then made about a half dozen phone calls until a savvy 
entitlement person noted that I have entitlements under both a "System i" 
and a "System p" entry in ESS.  Turns out the System p downloads area was 
where I wanted to go.  It was subtle enough that two ESS folks and I 
didn't see it.
5 hours later I had the VIOS new install DVD image downloaded.  I 
started the installation at about 8pm and at about 10pm I noticed that 
it stopped progressing at 86% at the "Copying Cu* to disk" stage.  This 
morning it was still at the same point so I figured it had hung and I 
restarted the install.
A couple of hours later I noticed that it had finished, so I restarted 
VIOS and logged in (new userid and password).  I had to accept the VIOS 
license agreements again and I had to configure the networking, then I 
bounced the server and now I had the IVM back.
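For anyone repeating this, the first-login steps from the padmin shell 
look roughly like the following.  The hostname and IP values here are just 
placeholders for my setup, and the en0 interface name may differ on your 
blade:

   license -accept
   mktcpip -hostname vios1 -inetaddr 192.168.1.10 -interface en0 \
      -netmask 255.255.255.0 -gateway 192.168.1.1

That gets you far enough to reach the IVM web interface on the address you 
just assigned.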
The i5/OS partition was still there but in checking the properties I 
didn't have any hard disks configured for the partition any more.  A 
moment of panic ensued while I wracked my brain trying to remember just 
what the heck I had originally configured.  After poking around a bit I 
remembered that all the RAID stuff is done on the i and I remembered 
having two mirrored pairs, so I decided to cross my fingers and give it 
4 of the SAS drives in the BCS.  The application hung for about 5 
minutes after I clicked OK but finally it displayed the partition info 
and I clicked on the "Activate" button for the partition.  Almost 
immediately the BCS reported errors and phoned home, indicating a 
hardware failure and no load source.  I assumed I must have configured 
the partition incorrectly so I attempted to open up the partition 
properties and, after a long pause, the console started spilling errors 
into the IVM.  From that point on the blade was pretty much a doorstop.
So I bounced the blade and decided to install the latest fix pack for VIOS 
2.1.  The command prompt took a long time to return after the bounce but 
I was able to start and finish the update.  After a restart and a long 
boot cycle I got the IVM back and checked on the i5/OS partition 
information.  This time the configure disks screen displayed almost 
instantly and I selected the 4 SAS disks and then clicked OK.  It saved 
the info and I double checked by clicking on the partition properties 
again, just to make sure.  Then I attempted to restart the i5/OS 
partition.  After about 10 minutes it was up and running.
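For reference, the fix pack apply and a disk sanity check can also be done 
from the VIOS command line.  A rough sketch, assuming the fix pack is on 
the optical device:

   updateios -install -accept -dev /dev/cd0   # apply the fix pack
   ioslevel                                   # confirm the new VIOS level
   lsmap -all    # verify the hdisks are mapped to the partition's vhost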
However, I couldn't connect to it, and after again poking through the 
blade readme and some other information I had, I guessed that the 
virtual networking wasn't set up properly.  I followed the readme 
instructions again and assigned the virtual Ethernet Bridge to the 
Ethernet line I had configured in VIOS and voila, the i was back!
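For the record, the bridging can be done from the View/Modify Virtual 
Ethernet screen in IVM, and the command-line equivalent is a Shared 
Ethernet Adapter.  A sketch (the adapter names are examples; list yours 
with lsdev first):

   lsdev -type adapter
   mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1

Here ent0 is the physical adapter and ent2 is the virtual Ethernet adapter 
the i partition is attached to.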
So, it looks like if you re-install VIOS you need to reassign disks and 
networking resources to the i partition again.  All my other 
partition selections were retained.  Don't know if this is working as 
designed or if it is a feature.
But, at least it is back.  Just in time for me to work on my Common 
presentations......
Pete
Pete Helgren wrote:
I am just passing this on in case any of you early adopters on blades 
end up with a similar situation:
My JS12 had a few intermittent episodes where it would just stop 
responding. The blade itself would continue to run but VIOS and i5/OS 
would no longer respond.  The Blade Center itself has an error log and 
errors would show up there.  However, I never got a call from IBM 
letting me know it had phoned home.  No big deal, bouncing the blade 
seemed to take care of it.
Only, it didn't.  The problem continued and last weekend after the blade 
crashed again, I finally figured out that the i5/OS ECS line wasn't 
working, and despite my best efforts, couldn't get it to configure and 
phone home a test problem (that is another story).  However, I got the 
Blade Center configured to phone home and I lit up the blade again, 
sending a test problem.  All was well.  Until last Monday morning when I 
discovered the blade was DOA again.  This time IBM did call but I was 
working with the Blade Center group, who seemed to think that there was a 
part that needed replacing, wholly unconcerned that the system was down.  
They sent me the part and it was replaced two days later.
That didn't take care of the problem and now the blade is really down 
and these folks don't seem to be as "responsive" as the "real" System i 
folks have been in the past.  They know that there is an issue with the 
Media Tray causing a system to crash because other customers have 
reported the same issue but they haven't come up with a patch/workaround 
to the problem. This is a development machine so it isn't like the 
business can't run, but I am a little surprised by the lack of urgency 
in trying to quickly resolve the problem. There is of course much 
correspondence and more to the story than I can give here, but this is a 
bit disconcerting.  Even on my development machines in the past IBM 
would dispatch someone pretty quickly.
Anybody on a blade have a different (positive) experience?   Have you 
always gone to the i group first?  This is definitely a 
hardware/firmware issue but I think I should go through the IBM i group 
next, rather than wait on these blade folks to figure it out.
Thoughts?
Pete Helgren
  