Archive for the 'Geeks Paradise' Category

LDOMS and virtual disks

Wednesday, June 24th, 2009

Have you ever had an LDOM that failed to boot due to some erroneous configuration files in the root file system? Ever wish you could modify the root filesystem of that LDOM without having to boot the LDOM?

Well, you could remove the virtual disk from the broken LDOM, assign it to another LDOM, restart that LDOM, and then access the disk from there.

For example:

ldm rm-vdisk bootdisk broken_ldom
ldm add-vdisk broken_bootdisk broken_ldom_bootdisk@primary-vds0 working_ldom
ldm stop working_ldom
ldm start working_ldom

Once working_ldom boots, you can run “devfsadm” to create the device links for the new disk. You can then mount it and fix whatever you want to fix.

But what if you didn’t have an LDOM that you could restart at this point? What if you didn’t have the resources to create a new temporary LDOM for this purpose?

There is another way, but it has a limitation: you will only be able to access the first partition on that virtual disk.

If the virtual disk is a zfs volume you can mount it directly as follows:
mount -o rw /dev/zvol/dsk/zpoolname/volname /mountpoint

If the virtual disk is a file, then you can use lofiadm to create a device for it:
lofiadm -a /path/to/vdisk/file
mount -o rw /dev/lofi/1 /mountpoint
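
The lofi device is not always /dev/lofi/1. lofiadm -a prints the name of the device it allocated, so it is safer to capture that output than to assume the number. A minimal sketch, assuming a Bourne-compatible shell; the function names are made up for illustration, and the image path passed to lofiadm must be absolute:

```shell
# mount_vdisk_file IMAGE MOUNTPOINT   (hypothetical helper name)
# Attaches IMAGE with lofiadm, mounts the resulting device read-write and
# prints the lofi device name so it can be released later.
mount_vdisk_file() {
    dev=`lofiadm -a "$1"` || return 1
    if mount -o rw "$dev" "$2"; then
        echo "$dev"
    else
        lofiadm -d "$dev"    # mount failed: release the lofi device again
        return 1
    fi
}

# unmount_vdisk_file MOUNTPOINT DEVICE   (hypothetical helper name)
# Unmounts the file system and detaches the lofi device.
unmount_vdisk_file() {
    umount "$1" && lofiadm -d "$2"
}
```

Detaching the device with lofiadm -d once you are done matters here, since the LDOM should not be started while the file is still attached and mounted elsewhere.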

I’ve only tested this on volumes that have UFS file systems on them. It may work with other file systems (even ZFS), I just haven’t tried it.

Now if anyone can figure out how to create device files for the other partitions within the volume (by mapping that device file to an offset within the file), then managing every slice in a virtual disk file or volume would be possible and greatly simplified.
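
This is not the offset-mapped device file asked for above, but a copy-based workaround may get you part of the way: copy the slice out of the image with dd, work on the copy (for example by attaching it with lofiadm), then write it back to the same offset. A rough sketch, untested against a real LDOM disk. The function names are hypothetical, and the first sector and sector count are assumed to come from the disk's label (prtvtoc reports them for real disks; whether it can read the label of a zvol or lofi device is not verified here):

```shell
# extract_slice IMAGE FIRST_SECTOR SECTOR_COUNT OUTPUT   (hypothetical helper)
# Copies SECTOR_COUNT 512-byte sectors starting at FIRST_SECTOR into OUTPUT.
extract_slice() {
    dd if="$1" of="$4" bs=512 skip="$2" count="$3"
}

# write_back_slice IMAGE FIRST_SECTOR SLICE_COPY   (hypothetical helper)
# Writes the modified copy back at the same offset without truncating IMAGE.
write_back_slice() {
    dd if="$3" of="$1" bs=512 seek="$2" conv=notrunc
}
```

The obvious cost is disk space and copy time for large slices, and a wrong offset on the way back will overwrite the wrong part of the disk, so keep a backup of the image first.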

-- Posted in Geeks Paradise

Apple Macbook Air Parody

Wednesday, February 6th, 2008

I thought this was pretty cool.


-- Posted in Geeks Paradise

Solaris 9 Volume Manager Problems

Thursday, January 11th, 2007

Yesterday at work, I noticed a file system filling up. This file system happened to be on a RAID5 volume on a Solaris 9 system (using Solaris Volume Manager). After freeing up some space I decided to check the health of the RAID5 volume (I know I should have this automated) using metastat. To my surprise, one of the slices in the 3-slice RAID5 metadevice was in maintenance state. System messages indicated that the drive had failed.

I wasn’t overly concerned. I had a hotspare pool that kicked in to support that RAID5 volume. I could endure two more drive failures on that volume before suffering any data loss. But I needed to replace the failed drive.

The failed drive was an old 36 GB Seagate SCSI drive which was no longer available. I hunted for a spare, but only found a 146 GB Seagate SCSI drive. So I thought I could use that. As long as the slice allocated for the raid 5 volume is the same size, I should be OK (or so I thought).

I took out the broken disk, and replaced it with the new larger drive. The system recognized it without any problems so I proceeded to partition the drive using Solaris’ format utility. The slice used in the original drive was slice 0. I partitioned the new drive with an identically sized slice 0.

There were also s6 and s7 slices on the original drive, used for metadbs, so I created those as well. After replacing the drive, naturally those metadbs were corrupted (metadb reported “W” beside those metadbs indicating “device has write errors”). I was able to delete those metadbs using metadb -d.

Now I wanted to re-attach slice 0 of the new drive to the RAID5 volume. Simple procedure, right? metareplace -e d30 c1t11d0s0. But before I do that why don’t I check on the status of the RAID5 volume first. metastat d30.

I was greeted by this unexpected message:

# metastat d30
Assertion failed: mdrcp->colnamep->start_blk <= rcp->un_orig_devstart, file ../common/meta_raid.c, line 151
metastat: Abort
Abort (core dumped)

Every other meta* command I tried for fixing the problem caused the same “Assertion failed” error message. However, when I ran metastat as a non-root user, I did not get the assertion failure, and the command ran successfully, giving me a report of all my metadevices.

I googled this for hours trying to find someone who had experienced this and fixed this. I found one relevant entry that suggested metadevfsadm. Perhaps the md subsystem still thinks I have the old drive and so its idea of the size of the drive did not match the new drive. metadevfsadm did not core dump and it did update the “Device Relocation Information” to reflect the description of the new drive. But metastat as root continued to report assertion failures.

Perhaps if I delete the whole raid5 meta device I can re-create it successfully. The raid5 array was still accessible. So I backed up the contents of the raid5 array into another slice in the new drive, then remounted my file systems using the backup. I attempted to rebuild the raid5 array using metareplace -e but that too, resulted in assertion failures. Attempting to delete the raid5 device using metaclear also reported the same assertion failures. Basically I could not delete the array because it is in a strange state.

Rebooting the system did not clear the problem. I did notice that after rebooting the device information as reported by iostat -E correctly identifies the disk (before the reboot it still had information — vendor, model and serial number — about the old disk). Installing the latest md patch did not fix the problem. It appears my raid5 array will be in maintenance state forever.

I needed to find a way to delete the RAID5 volume so I could re-create it. How could I do that when metaclear died with the same assertion error? Since metastat succeeded as a normal user, perhaps the assertion could be safely ignored.

My solution was to use LD_PRELOAD. If I could create my own “assert” function which does not cause a core dump, and insert it into the application, then I may succeed in running metaclear.

source for assert.c

#include <stdio.h>

/* Stand-in for libc's __assert(): print a message and return instead of aborting. */
void __assert(int a){
    fprintf(stderr, "assertion failed: mdrcp->colnamep->start_blk <= rcp->un_orig_devstart\n");
    return;
}

compile that with

cc -Kpic -c assert.c
ld -zdefs -G -h libassert.so.1 -o libassert.so.1 assert.o -lc

copy the resulting libassert.so.1 to /usr/lib

Then as root:

# LD_PRELOAD=/usr/lib/libassert.so.1
# export LD_PRELOAD
# metastat d30

Metastat no longer core dumped! This was a good sign. So I attempted to delete the metadevice using metaclear:
metaclear -f d30

Success!!

At this point I can unset my LD_PRELOAD. Easiest thing to do is exit the shell.
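
An alternative that avoids leaving LD_PRELOAD set in the root shell at all is to set it for a single command only. A minimal sketch, assuming the standard env utility; the helper name run_with_preload is made up for illustration:

```shell
# run_with_preload LIBRARY COMMAND [ARGS...]   (hypothetical helper name)
# Runs COMMAND with LD_PRELOAD set only in that command's environment.
# The calling shell's own environment is left untouched.
run_with_preload() {
    lib=$1
    shift
    env LD_PRELOAD="$lib" "$@"
}

# Example use with the library built above:
#   run_with_preload /usr/lib/libassert.so.1 metaclear -f d30
```

With this approach every other command typed into the same shell still gets the real assert behaviour, which seems preferable when running as root.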

Now I can re-create the RAID5 volume and copy the data back into it.

Disclaimer: This solution works for me. I do not guarantee that it will work for you. I will not be held liable or responsible for data loss or any other kind of damages if you attempt this solution.

-- Posted in Geeks Paradise

The Pilgrimage To End All Pilgrimages.

Thursday, October 26th, 2006

In a previous post I wrote about making a pilgrimage out of my business travels by visiting Apple Stores. Well, this time I’ve done what Apple fans dream of doing: visiting the Apple headquarters in Cupertino.

Now I can say, “been there… done that… got a T-Shirt that says ‘I visited the mothership’.”

-- Posted in Journal, Geeks Paradise

Environment Variable Injection in Solaris

Tuesday, August 29th, 2006

Here’s a trick to inject environment variables into a Unix login session.

1. become root
2. set the environment variables you would like to inject
3. kill the inetd daemon
4. restart the inetd daemon

What happens is that inetd inherits all of the environment variables that are set when it starts up. When it spawns other services like telnet, these environment variables are inherited by those services. In the case of telnet, it sets these for the user’s shell.
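
The inheritance chain can be seen without touching inetd. In this small sketch the outer sh stands in for inetd and the inner sh for the service and the user's shell; INJECTED_VAR is a made-up variable name:

```shell
# Set and export a variable in the "root" shell (hypothetical name and value).
INJECTED_VAR="set before inetd was restarted"
export INJECTED_VAR

# The outer sh plays inetd, the inner sh plays the spawned service / login shell.
# Neither process sets the variable itself, yet the innermost one still sees it.
sh -c 'sh -c "echo \"\$INJECTED_VAR\""'
```

Nothing in the child processes refers to a startup file, which is why the user cannot find the variable in any of them.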

The user will wonder where these variables came from. They’re not in the user’s .cshrc, .login or .profile files. They’re not from any shell initialization files in the home directory or in /etc.

Of course this only affects daemons started by inetd. If you’d like to set these for ssh users, make sshd start from inetd as well, unless of course sshd cleans up its environment before invoking login.

This works on Solaris 9. I have not tried it on any other Solaris versions, or any Linux for that matter.

This was discovered by accident when I could not figure out why I had some environment variables set upon login using telnet, when I could not find where they were being set.

This is probably a bug in inetd. Inetd should clean up the environment prior to exec’ing the requested service.

-- Posted in Geeks Paradise