Newsgroups: armory.general
From: spcecdt@armory.com (John DuBois)
Subject: A day in the life of deeptht
Organization: The Armory
Date: Sat, 3 Sep 1994 04:38:24 PDT

TODAY deeptht ran out of space on the u filesystem again. I decided that the only thing to do about it was to get rid of the online backups, and give the 113M of space that was allocated to them to u.

The only "supported" way of changing the size of a filesystem is to back it up onto tape, recreate the filesystem with the larger size, and restore from tape. But that takes a while, so I generally take the shortcut of directly patching the field in the superblock that sets the filesystem size. This is a somewhat dangerous operation (I've hosed filesystems in the past by doing it incorrectly) so I did a backup of u first. Because I always have to poke around a bit to figure out where the size field is, this time I actually wrote a little utility to change the size.

So, after the backups were done, I modified the divvy table to get rid of the backup fs and extend the u fs. But, this brought up a problem. Because my SCSI drives map their internal geometry to a geometry in which there is one cylinder per megabyte, the increase in the size of u from 1000M to 1113M caused it to extend beyond the first 1024 cylinders of the drive. This puts the filesystem in danger of being unbootable, because all of the boot stages up until the kernel is executing use the system BIOS to read data from the drive, and the PC BIOS call that does the read has only a 10 bit field for the cylinder, meaning that data beyond the 1024th cylinder cannot be read.

Normally, this wouldn't be an issue for anything but the root filesystem, but it happens that what is now the u filesystem on deeptht used to be the root filesystem. At some point, I needed to move the root filesystem off of the boot drive to make room to expand u.
When doing that, the usual thing to do would be to change the SCSI IDs of the first & second drives so that the system could boot from the root filesystem. But that would require renaming lots of divisions, because changing the SCSI ID of a drive changes the minor numbers of all of the filesystems on the drive. So, instead I had just copied the minimum files necessary to have on a boot filesystem (/boot and /etc/default/boot) to u, and changed the boot parameters to tell the boot program to boot from a nonstandard device (hd(105) instead of the usual hd(40)). That worked fine until the 1024-cylinder limit was exceeded by u.

As long as /boot and /etc/default/boot were never touched, neither would be liable to be allocated a block beyond the 1024th cylinder, but I do make modifications to /etc/default/boot occasionally, and I might even load a new /boot some day, so I decided at this point that the safe thing to do would be to finally get around to changing the drives' SCSI IDs.

To avoid the hassle of renaming the divisions, it occurred to me that I could change the kernel's SCSI-ID-to-drive-number map at the same time. So I swapped the order of the lines in the mapping file and relinked the kernel. Having the boot drive be anything other than the 0th drive is also an unusual thing to do, but it *should* work, so I figured I'd give it a try.

Changing the drive IDs turned out to be a bit difficult since I couldn't find the manuals for either of the drives. I took a guess at which jumpers on the drives were the ID jumpers and what their orientation was (i.e., which jumpers were for each of the 1, 2, and 4 bits of the SCSI ID), changed them, disconnected the 2nd drive so that I could try them one at a time, and rebooted to see if the host adapter would find the first drive. It did. Then I re-connected the 2nd drive. The host adapter didn't find it. So I tried the opposite orientation. It worked.
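
For anyone puzzling over the same kind of guess, here is a minimal sketch in Python of how three ID jumpers weighted 1, 2, and 4 encode a SCSI ID, and how reading them in the opposite orientation changes the result. It is only an illustration: the function name, the lsb_first flag, and the idea of listing jumpers in physical order are assumptions, not anything taken from the drives or their (missing) manuals.

```python
def scsi_id(jumpers, lsb_first=True):
    """Return the SCSI ID encoded by three jumpers.

    jumpers: three booleans in physical order (True = jumper installed).
    lsb_first: hypothetical flag; True means the first jumper is the 1 bit,
    False means the first jumper is the 4 bit (the opposite orientation).
    """
    weights = (1, 2, 4) if lsb_first else (4, 2, 1)
    return sum(weight for weight, installed in zip(weights, jumpers) if installed)


# Only the first jumper in physical order is installed.
jumpers = (True, False, False)
print(scsi_id(jumpers, lsb_first=True))   # 1
print(scsi_id(jumpers, lsb_first=False))  # 4
```

Patterns that read the same in both directions (no jumpers, all jumpers, or only the middle one) give the same ID either way, so a wrong guess about orientation only shows up for the other patterns.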

With the drive IDs set, I tried booting with parameters for booting off of the second drive (because, remember, I had told the kernel that the drive at ID 0 should be considered the 2nd drive). The kernel booted but then hung. I tried various things, getting strange results, before realizing that although I changed the kernel's ID mapping, the boot program has its own, fixed mapping, which is supposed to be the same as the kernel's. Of course, it isn't if you change the kernel's mapping. Some of the strange results were because I had an old kernel on u (put there to allow me to boot if the root filesystem was damaged) which I was accidentally booting in some cases.

So, I tried booting with parameters that told /boot to read the kernel from hd(40) (the first division of the first drive, which is the normal boot device), but to "root" off of hd(104) (the first division of the second drive), since this parameter is interpreted by the kernel after it has booted. This gave me even more peculiar results. I finally remembered that when I had moved the root filesystem over to the 2nd drive, I had put it on the second division instead of the first, because it happened to be free and it didn't really matter when it wasn't on the boot drive anyway. In fact, I wouldn't have been able to boot at all except that I happened to have the boot files on the filesystem on the 2nd division of that drive, too.

Fortunately, it's pretty easy to change the minor numbers. I edited the divvy table for the drive to swap the start and end block numbers for the divisions on hd(40) and hd(41), which were the local and root filesystems respectively. Then I changed their names to be root and local, the net effect being to swap the minor numbers of the two filesystems. I booted again. It didn't work, because the default boot parameters I had in /etc/default/boot were still telling it to root off of the second division.
Before I realized this, I tried a few other things, including booting using an old kernel that did not have its ID mapping swapped. That kernel panicked and tried to write an image to hd(41), the standard "dump" device, but appeared to fail. Panicking when there is no root filesystem available is normal, and the failure to write to hd(41) seemed normal because my swap/dump device is not there; it's on what the kernel thinks is the second drive.

I eventually successfully booted, but decided that all this was really too much hassle, and I should just change the ID mapping back to the normal one and rename the divisions. So I did. It only took a few minutes. I made sure the boot parameters were correct for the final configuration, and let the system autoboot, which brings it into multiuser mode.

The system came up, tried to mount u, and found it dirty (probably due to my having to power the system off at some point while it was mounted... things had begun to blur together). Cleaning a large filesystem takes a long time so I went off and let it do its thing. You eventually get to recognizing the sound of a normal fsck, and I came back to the console when I heard things going wrong. There were errors streaming by... serious errors. It seemed that somewhere along the line I had munged u somehow. That didn't worry me too much because I had just backed it up.

The fsck had reached the pitiful point where it looked like it was probably doing more harm than good, and it occurred to me that it might even be cleaning the wrong device with all the mucking about I'd done, so I powered the system off and brought it back up in single-user mode. My suspicion was reinforced when I ran fsck on u and it came up clean. I checked all the minor numbers, made sure I was booting the right kernel, etc. Nothing seemed to be wrong so I went multiuser again, keeping an eye on the console this time. Then I saw what the real problem was: it was local, not u, that was corrupt.
During the last boot it had gone past cleaning u and on to local without my noticing. I power-cycled the system again and went single user. I ran fsck on local manually, and it looked really grim. Thousands of bad inodes... maybe even all of them; it was hard to tell because it gave up after a while. "root inode unallocated" (a very bad sign).

I was beginning to get a bit worried because I hadn't made a backup of local before starting all this (since I hadn't expected to even be touching the drive it was on). The last backup of local was three weeks ago, meaning three weeks of work would be lost. Normally, the daily online backups save any files touched each day, but I had just gotten rid of the backup fs! I usually do a complete system backup (all filesystems) before junking the data on the backup fs, but I didn't today because u was full and I wanted to do something about it quickly.

After trying fsck a couple of times, it was obvious that there was no hope. It may have been the kernel panic, writing over part of local's inode table. I edited the divvy table again, moving local from division 1 to 2 to make sure it couldn't happen again, and began resigning myself to restoring from the three-week-old backup.

But, it occurred to me that although I had mounted, cleaned, etc. the u fs, I hadn't added anything to it, so although I had given the backup fs's space to u, the data in that space might still be untouched. I even remembered what the old start and end block numbers were. So I restored the old divvy parameters. That put u in an invalid state, because it now thought it had 113M more than its divvy entry gave it, but I didn't intend to mount u until it was back to the larger size. After rebooting (to make sure the kernel wasn't terminally confused by all this), I tried mounting the old backup fs, and succeeded. Happy Happy!
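
To make the "invalid state" of u concrete, here is a small arithmetic sketch in Python. It only compares the size the filesystem believes it has with the size its division actually provides, using the megabyte figures from this post; the function name is hypothetical, and real divvy tables and superblocks count blocks rather than megabytes.

```python
def overhang_mb(fs_size_mb, division_size_mb):
    """Megabytes the filesystem claims beyond the end of its division (0 if it fits)."""
    return max(0, fs_size_mb - division_size_mb)


# u's superblock had been patched to 1113M, but the restored divvy entry
# gave it only the original 1000M again.
print(overhang_mb(1113, 1000))  # 113
```

Any nonzero result means the filesystem could try to use blocks that now belong to another division, which is why u had to stay unmounted until its division was enlarged again.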

I quickly recreated local and issued a command that copied all of the files from local that had been backed up to the backup fs (all of the files that had been touched in the last three weeks), keeping the most recent copies of each. Then, I unmounted the backup fs (in case there was corruption in it that I hadn't run into), and started a restore from my tape backups of local, with instructions to not overwrite anything that I had already written to local. There were two 150M tapes. The first one restored without problems. In the middle of the second, though, I got a media error and the restore aborted. That was a bit odd since I had used these tapes to copy the local data to my machine at work. I tried again, this time starting with the second tape and with an option that tells cpio (the backup/restore program) to skip over any corruption. It got to the same spot on the tape, gave a media error, and hung. It refused to go past that spot.

I had been getting all-too-frequent errors with the tape drive lately, so I decided that it might be time to clean it, something I had only done once before in the 3 years I'd been using it. I dug up the tape drive instructions to find out how to clean it, got out some head cleaner & a long swab, and did the job. After letting it dry, I put the tape back in and tried to read it. It hung at the same spot.

I started thinking about going to work and making a backup of the local data from the machine I had installed it on. But, I had removed all of the stuff that was only relevant on deeptht, and changed lots of other stuff. Still, I tried making a file list from the tape, so I could get a list of those files that were not successfully read and restore only them from work. But (as I pretty much expected) it wouldn't let me do that either. It still hung at the bad spot.

Finally, I thought I'd try skipping over the bad spot on the tape by using the "no-rewind" tape device, which tells the drive to not rewind the tape when it's done reading.
Since the tape was stopped at the bad spot, I removed it and aborted the cpio process. Then I reset the tape drive, put the tape back in, and tried reading from the no-rewind device using cpio. But it rewound the tape anyway. It didn't surprise me since the no-rewind device often doesn't seem to behave the way it should. I tried a few other combinations, putting other tapes in at times to let them be rewound instead of the one I was trying to read, but it didn't work.

My last attempt was to read from the no-rewind device, let it hang, abort the cpio process, and then just start another one, also reading from the no-rewind device. It worked. cpio skipped the bad file (part of the webster dictionary database) and read the rest of the tape. As soon as it started, I knew I was home free at last.

As far as I know, all I lost was the webster file (which I can get from another backup tape when I get around to it), and the filesystem size changing program. Since I was doing other stuff all day, I didn't do any other work on local in the last day that I can recall.

When the restore was done, I changed the divvy table a final time to give the backup space back to u. And that's why deeptht was down for 7 hours this evening.

-- 
John DuBois     spcecdt@armory.com     KC6QKZ     http://www.armory.com/~spcecdt/