The NetWinder Rescue-HOWTO <author>Ralph Siemsen, <tt>ralphs@netwinder.org</tt> <date>$Revision: 1.5 $, $Date: 2000/11/19 17:15:26 $ <abstract> This document explains how to restore a NetWinder using the `Rescue Filesystem' that is now being shipped on all units. This is expected to replace, in most cases, the other methods of rescue (which are described in the <url url="Disk-Update-HOWTO">). This document also contains a section on troubleshooting common hardware and software problems on the NetWinder. </abstract> <toc> <sect>Introduction<p> Restoring a NetWinder to its "factory" condition has traditionally been a difficult task, since it required a second computer configured to act as a boot server for the NetWinder. Although this is a flexible solution for technical users, it can be quite difficult for novices to get working. Thus, the OfficeServer includes a `Rescue' partition which eliminates the need for a boot server. This document explains how to use the rescue partition to recover a NetWinder to factory condition.<p> This document is therefore aimed primarily at users of the OfficeServer product, though owners of the Developer model can also benefit. The rescue partition software began shipping on all models in October 1999. For machines shipped prior to that date, the rescue software may be retrofitted (see chapter 3).<p> <sect1>Other sources of information<p> A lot of NetWinder-specific information can be found on my home page at <url url="http://www.netwinder.org/~ralphs/">, including a number of other HOWTOs on disk images, kernel and firmware installation and usage.<p> There is also a wealth of information in the general Linux HOWTOs, most of which apply directly to the NetWinder as well. They can be found in many places on the net, including <url url="http://www.linux.org/help/howto.html">. I'd particularly recommend the Ethernet-HOWTO, the NET-3-HOWTO, and (if networking is all new to you) the Networking-HOWTO.
Actually, all of them :)<p> <sect>Using the rescue partition<p> This chapter explains the various ways that the rescue partition can be used to recover a NetWinder back to its factory state. Keep in mind that this is normally a `last resort' measure for fixing your system; often you can repair the damage more easily in other ways. The rescue partition can also be used as an emergency boot device, which allows you to go in and fix things on the main partition.<p> <sect1>Overview<p> All recent NetWinder machines include a small (10 MB) rescue partition, which contains enough software to reformat the NetWinder's internal hard disk and then reinstall the normal software load. Naturally, an image of all the files on the hard disk is also necessary; for the OfficeServer, this image is included on the CDROM in the <tt>rescue</tt> folder.<p> In the most common scenario, the OfficeServer CDROM is placed into a PC that has network access, so that the NetWinder can retrieve the data from the CDROM via the network. The PC must have file sharing enabled, and a `shared folder' must be created for the CDROM.<p> The NetWinder rescue partition is then booted, and networking is configured so that the PC can be reached. A series of scripts guides you through formatting and then mounting the NetWinder hard disk. Then the drive image is retrieved from the CDROM on the PC and installed on the NetWinder hard disk. (There is also the option to fetch the drive image via FTP.)<p> The following sections describe the process in greater detail.<p> <sect1>Booting the rescue partition<p> When you need to use the NetWinder's rescue partition, here are the steps to access it. You'll need to connect a keyboard and monitor to the NetWinder to carry out this rescue process.<p> <enum> <item> Turn on your NetWinder (or reboot it, if it was running). <item> Stop the autoboot sequence by pressing a key as the NetWinder boots (e.g.
when it says `Press any key to abort autoboot'). <item> Type the following commands at the firmware prompt:
<verb>
setenv kerndev /dev/hda4
setenv rootdev /dev/hda4
boot
</verb>
</enum> That will do it: the NetWinder will now boot from the rescue partition. Shortly, a shell prompt will appear, along with a message telling you to run <tt>netconfig</tt> to configure the network.<p> <sect1>Configuring networking<p> The <tt>netconfig</tt> script will allow you to set up a network interface. It will ask a number of questions about your network, such as the IP address and netmask to be used. Some options, like DNS servers and gateways, are not required if your rescue computer is on the same subnet.<p> The <tt>netconfig</tt> script will ask you which interface to use. Normally, the OfficeServer uses <tt>eth1</tt> (the 10/100-base-T port) for its internal gateway, so that is generally the one you would select. Then give an IP address and a netmask. The script will try to compute the broadcast address for you.<p> If you normally operate using DHCP, you'll have to `guess' a free IP address to be used during this rescue boot. Go to some other computer on your network, check what its IP address is, and then add one or two to the last number. You can use <tt>ping</tt> or other tools to verify that the address is free for use (if nothing answers, it is probably unused). Then enter the free address into the NetWinder's script.<p> It is a good idea to test the network connection once it's been configured. From the NetWinder you can try to <tt>ping</tt> another machine on your network. DNS name resolution might not work, but numeric IP addresses should. Note that the rescue partition shell does not support job control, which means you cannot abort a <tt>ping</tt> with <tt>CTRL-C</tt>. Instead, you have to use <tt>ping -c 5 aa.bb.cc.dd</tt>, which tells ping to send only 5 packets and then stop.<p> <sect1>Now what?<p> At this point, there are five possible options for re-imaging the NetWinder's hard disk.
Three of them are quite common:<p> <enum> <item><tt>mountsmb</tt> is used if the rescue image is going to be loaded from a Windows 95/98/NT computer on your network, <item><tt>mountnfs</tt> is used if the rescue image is going to be loaded via NFS from a Unix system on your network, and <item><tt>ftprescue</tt> is used if the rescue image will be downloaded from an FTP server. </enum> In some cases, instead of connecting from the NetWinder to a rescue server, you'll want to turn the NetWinder into a server so that other computers can connect to it. If this seems like the same thing to you, then don't worry about it, and ignore the following options:<p> <enum> <item><tt>nfsserver</tt> turns your NetWinder into an NFS server, with the root filesystem exported to the whole network, <item><tt>smbserver</tt> similarly turns the NetWinder into a Samba server, so that other (Windows) clients can connect to it. </enum> These options are described further in the following sections. Two more helper scripts are used later in the process: <tt>wipefs</tt>, which erases the hard disk, and <tt>mountfs</tt>, which mounts the partitions in preparation for untarring the disk image.<p> <sect1>Using <tt>mountsmb</tt><p> This is the option that most people will use. It requires that you have a computer running Windows on your network. You place the OfficeServer CD-ROM into this machine and allow the CD to be shared across the network. Open `My Computer', then right-click on the CD-ROM icon. From the menu that appears, select `Properties' and then click on the `Sharing' tab. Turn on sharing and give the share a name, for example, `CDROM'.<p> On the NetWinder, you should now run the <tt>mountsmb</tt> script. It will ask for the name of the Windows computer (if you don't know what it is, go to the Windows machine, right-click on `Network Neighborhood', select `Properties', and then click the `Identification' tab). Next, you'll be prompted for the name of the share (`CDROM' in the example above).
Finally, you should enter the username (which matches the name you used to log into Windows). The NetWinder will then try to establish the connection to the Windows machine.<p> If the connection fails, you'll have to check your settings carefully and try again. Make sure the network cables are plugged in and that you can <tt>ping</tt> the Windows computer from your NetWinder, and vice versa. Try entering the computer name and share name in uppercase, as some Windows systems seem to want it that way. If your DNS server is unreliable or nonexistent, then you'll need to use the IP address of the Windows machine in place of its name.<p> Once the mount is successful, the contents of the CDROM should be visible on the NetWinder. To verify, type <tt>ls -l /mnt/rescue</tt>. You should see a directory called `recovery' (or `Recovery') and inside that directory, the OfficeServer disk image. You can now skip down to the <ref id="Actual installation"> section to complete the process.<p> <sect1>Using <tt>mountnfs</tt><p> If you have other computers on your network that run Linux or some other UNIX-like operating system, then this option is the one to use. Place the CDROM into the drive and then do whatever is necessary to mount and share the CD to the network. For Linux, this means mounting the disk (<tt>mount /dev/cdrom /mnt/cdrom</tt>), editing the <tt>/etc/exports</tt> file to allow the <tt>/mnt/cdrom</tt> directory to be shared, and then restarting the NFS service.<p> On the NetWinder, the <tt>mountnfs</tt> script will prompt you for the IP address (or name) of the rescue server, and the name of the share (e.g. <tt>/mnt/cdrom</tt>). It will then try to mount the volume so that it can be accessed on the NetWinder as <tt>/mnt/rescue</tt>.<p> If the mount fails, check the network cables, IP addresses, and the settings on your server. Try mounting the server from elsewhere on your network, to see if it is correctly configured.
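<p> If the server is a Linux machine, one setting worth checking is whether <tt>/etc/exports</tt> really contains an entry for the CD-ROM mount point. The following is a minimal sketch, not part of the rescue software, of a hypothetical <tt>add_export</tt> shell function that appends such an entry only if one is missing. The read-only <tt>(ro)</tt> option and the example network address are assumptions to adapt to your own setup.

```shell
#!/bin/sh
# add_export: hypothetical helper (not part of the NetWinder rescue software).
# Appends an NFS export entry to an exports file unless DIR is already listed.
#   $1 = exports file (normally /etc/exports)
#   $2 = directory to share (e.g. /mnt/cdrom)
#   $3 = clients allowed to mount it (e.g. 192.168.1.0/255.255.255.0)
# Assumption: read-only "(ro)" access is enough, since a CD-ROM cannot be written.
# Note: DIR is used as a grep pattern, which is fine for plain paths.
add_export() {
    exports_file=$1
    dir=$2
    clients=$3
    # An entry starting with DIR already exists: leave the file unchanged.
    if grep -q "^$dir[[:space:]]" "$exports_file" 2>/dev/null; then
        return 0
    fi
    # No space between the client list and "(ro)": a space changes the meaning.
    printf '%s %s(ro)\n' "$dir" "$clients" >> "$exports_file"
}
```

For example, running <tt>add_export /etc/exports /mnt/cdrom 192.168.1.0/255.255.255.0</tt> as root on the server would add the line <tt>/mnt/cdrom 192.168.1.0/255.255.255.0(ro)</tt> if it is not already present.<p>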
Often you have to restart both the NFS and portmap services on the server. Try ping tests to verify that the NetWinder can talk to the server.<p> Once the mount is successful, the contents of the CDROM should be visible on the NetWinder. To verify, type <tt>ls -l /mnt/rescue</tt>. You should see a directory called `recovery' (or `Recovery') and inside that directory, the OfficeServer disk image. You can now skip down to the <ref id="Actual installation"> section to complete the process.<p> <sect1>Using <tt>ftprescue</tt><p> To be written.<p> <sect1>Using <tt>nfsserver</tt><p> To be written.<p> <sect1>Using <tt>smbserver</tt><p> To be written.<p> <sect1>Actual installation<p> <label id="Actual installation">At this point, the new disk image you want to install should be accessible somewhere under <tt>/mnt/rescue</tt>, and you should know its exact path and filename. Since CD-ROMs may be subject to the old DOS filename limitations, you may find that the image has a strange name, like <tt>os-1_0_2~.gz</tt>, when really it should be something more meaningful like <tt>os-1.0-2.tar.gz</tt>. In the following examples, just substitute the actual filename.<p> You can now proceed to erase the <tt>hda1</tt> and <tt>hda3</tt> partitions and then transfer the new disk image via the network onto the empty partitions. Two scripts are provided to facilitate this process: <tt>wipefs</tt> is used to clear the two disk partitions, and <tt>mountfs</tt> sets the partitions up so they can be accessed from <tt>/mnt/hdroot</tt>.<p> <em>Note:</em> there is a bug in the early versions of the rescue system. If you type <tt>cat /proc/version</tt> and it reports Linux version 2.2.9-3, then you will likely have trouble formatting the two partitions. The format command (<tt>mke2fs</tt>) will fail randomly with a `memory violation' error.
If this happens to you, your options are to replace the kernel with a newer version (2.2.12), to repeat the command until it succeeds, or to use <tt>rm -rf</tt> to delete all the files instead of running <tt>mke2fs</tt>.<p> After you've used <tt>wipefs</tt> and <tt>mountfs</tt>, the new disk image can be installed directly. Just to keep you on your toes, we did not include a script for doing this. You have to type the commands yourself:<p>
<verb>
cd /mnt/hdroot
tar zxvpf /mnt/rescue/recovery/os-1.0-2.tar.gz
</verb>
Adjust the pathname on the <tt>tar</tt> command as necessary to reflect the actual path and filename where the new image is located. It is critical to use the `p' option so that permissions will be set correctly on the files. The `v' option can be omitted if you don't want to see the names of the files scrolling by.<p> It should take about 15 minutes to copy all the data across. Once it's done, wait a little longer (30 seconds or so) to let the data be flushed to disk. Then type <tt>exit</tt> and wait until the message appears saying that it's safe to shut down. Then press the reset button to reboot. At this point, the new image will be loaded and hopefully all will be well.<p> <sect>Rescue partition installation<p> This chapter explains how to install and use the `rescue partition' software package. NetWinder OfficeServer and DM models shipped after October 1999 include this software package by default; older systems need to be retrofitted (or sent back for upgrade) in order to make use of the new package.<p> <sect1>Do I already have it?<p> If you received your machine after October 1999, then you should already have the rescue package installed on your system. To be sure, there are two things to check. As <tt>root</tt>, run the command <tt>fdisk -l /dev/hda</tt>.
This will list the current partition table, which should look something like this:<p>
<verb>
   Device Boot   Start      End   Blocks   Id  System
/dev/hda1            1     3895  1963048+  83  Linux native
/dev/hda2         3896     4026    66024   82  Linux swap
/dev/hda3         4027     7921  1963080   83  Linux native
/dev/hda4         7922     7944    11592   83  Linux native
</verb>
The rescue partition is <tt>/dev/hda4</tt>, and it's just a bit over 11 MB in size. This is a good sign that you have the rescue image, or at least that you have the space for it. To verify that the data is actually there, you need to mount the partition (temporarily):<p>
<verb>
mount /dev/hda4 /mnt
cd /mnt
ls
</verb>
If the <tt>mount</tt> command fails with `You must specify the filesystem type', then <tt>/dev/hda4</tt> probably is not formatted and therefore does not contain the rescue image. Otherwise, you should see a fairly standard directory structure listed:
<verb>
bin   dev  lib         mnt   sbin  usr
boot  etc  lost+found  proc  tmp   var
</verb>
If you see these directories, then you're all set. Note that the rescue package is updated from time to time, so it's a good idea to periodically install a newer version anyway. There currently isn't a way to find out which version of the rescue package you have installed, but in the future, we'll include a <tt>README</tt> file in the root directory (shown above) that will tell you which version you are looking at.<p> <sect1>Installing the image<p> The following steps explain how to install the rescue image onto your system (or how to upgrade to a newer rescue image; it's the same procedure). I'm assuming that you do actually have a <tt>/dev/hda4</tt> partition of at least 10 MB. See below for advice if you do not have this partition.<p> To install or update the rescue image on <tt>/dev/hda4</tt>, follow these steps:<p> <enum> <item> Download the latest rescue image by anonymous FTP from <url url="ftp://ftp.netwinder.org/pub/netwinder/images/">.
The filename is <tt>rescue.tar.gz</tt>, although a newer version may be available. <item> Log in as <tt>root</tt> or use the <tt>su -</tt> command to become root. <item> If you had previously mounted the partition, unmount it with the command <tt>umount /dev/hda4</tt>. <item> Format the hda4 partition, then mount it on <tt>/mnt</tt>:
<verb>
mke2fs /dev/hda4
mount /dev/hda4 /mnt
</verb>
<item> Change directory to the mount point, and untar the rescue image:
<verb>
cd /mnt
tar zxvpf /root/rescue.tar.gz
</verb>
</enum> You will of course need to adjust the pathname on the <tt>tar</tt> command to reflect the location where you downloaded the rescue image.<p> <sect1>If you don't have <tt>/dev/hda4</tt><p> If you have an older system where the disk is already fully allocated to partitions 1 through 3, then it's a bit difficult to install the rescue system. I would recommend using one of the other rescue methods, which are described in the <url url="Disk-Update-HOWTO.html">. Instead of installing the full disk image, though, you can repartition the drive and install only the rescue package. Then the rescue package can be used to reinstall everything else.<p> Another option is to try to merge two partitions together. If there is enough free space, then you can copy e.g. <tt>/dev/hda3</tt> over to <tt>/dev/hda1</tt>, and then safely split 10 MB or so off from <tt>/dev/hda3</tt> to be used as the rescue partition. Sadly, there is no way to resize an <em>ext2</em> partition without erasing the data on it. (There is <em>fips</em>, but that only works for DOS partitions.)<p> If you want to try this, then the first thing to do is to run <tt>df</tt> to check how much disk space is available.
It should look roughly like this:<p>
<verb>
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/hda1              1477028    301819   1098880  22% /
/dev/hda3              1521792   1151033    292110  80% /usr
</verb>
In this case, <tt>hda3</tt> holds 1151033 KB of data, but only 1098880 KB are free on <tt>hda1</tt>, so it won't fit on <tt>hda1</tt>. It could be copied the other way (making <tt>hda3</tt> the root filesystem), but in that case you'd need to carefully adjust <tt>/etc/fstab</tt> to reflect the fact that the root filesystem is then on <tt>/dev/hda3</tt>, and remember to delete <tt>/etc/mtab</tt> before shutting down.<p> To copy the data between the partitions, you would use the following series of commands. Note that in my case, <tt>/dev/hda3</tt> was mounted as <tt>/usr</tt> (as indicated in the output from <tt>df</tt> above). On older systems, it was mounted on <tt>/home</tt> instead; if that is the case for you, substitute <tt>home</tt> for <tt>usr</tt> below. Since <tt>umount</tt> fails while any files on the filesystem are in use, this is probably best done from single-user mode.<p>
<verb>
umount /dev/hda3
mount /dev/hda3 /mnt
cp -avx /mnt/. /usr
umount /dev/hda3
</verb>
Now you have to edit <tt>/etc/fstab</tt> and comment out (with the <tt>#</tt> character) the line that begins with <tt>/dev/hda3</tt>. (You don't have to do this if you plan to move everything right back again after repartitioning; just don't reboot in the meantime.)<p> You can then safely split <tt>/dev/hda3</tt> into two smaller pieces using <tt>fdisk /dev/hda</tt>. First delete the entry for partition 3, then create a new primary partition 3. When prompted for the size, enter 10 MB less than the space you have left. You can either do the math (total cylinders divided by the total drive size, times 10 MB) or just fiddle by trial and error. For example, with the geometry shown in the <tt>fdisk</tt> listing earlier, each cylinder holds 504 1k-blocks, so 10 MB (10240 KB) works out to a little over 20 cylinders.<p> Then create a 4th primary partition with the remaining 10 MB of space. Save the partition table, and format both partitions.
You might also want to copy the data back from <tt>/dev/hda1</tt>:<p>
<verb>
mke2fs /dev/hda3
mke2fs /dev/hda4
# Now copy /usr back from /dev/hda1 if desired:
mount /dev/hda3 /mnt
cp -avx /usr/. /mnt
umount /mnt
rm -rf /usr/*          # Careful with this !!
mount /dev/hda3 /usr
</verb>
Don't forget to restore the <tt>/etc/fstab</tt> file if you changed it. Then you can install the rescue image onto <tt>/dev/hda4</tt> as described above.<p> <sect>Troubleshooting<p> <sect1>Normal boot procedure and terminology<p> This section describes the stages that the NetWinder goes through when booting up, from the moment power is applied until the login prompt appears. It also covers the common things that can go wrong.<p> <sect2>NeTTrom / BIOS<p> When power is first applied, the first block of flash memory (64k) gets mapped in and executed. The first visible action is a quick probe of video RAM, to determine how much memory there is. The screen is then cleared and the firmware version number and build date are displayed. Any logos that might be found are also rendered, along with the NetWinder logo animation if it is enabled. Meanwhile, the remainder of flash memory is read into RAM and the code therein is decompressed. A red progress meter is shown at the bottom of the screen during this time. When the decompression completes successfully, the screen fades to black, and then the decompressed code is executed.<p> If the progress meter stops, then flash memory has been corrupted (or bad data was written to it). The only way to boot the NetWinder in this case is to hook up a serial terminal and download a kernel via the serial port. For more details, see section 3.7 of the <url url="Firmware-HOWTO.html">.<p> <sect2>Minikernel<p> The system now boots into a small Linux kernel. The screen clears and reverts back to text mode. In older versions, the full boot-up messages were displayed as the minikernel boots.
In recent versions, only selected messages describing the hardware found are shown. This kernel can mount a root filesystem, and fetch the main kernel, in a variety of ways. There is a `firmware control menu' available here.<p> Normally, the minikernel loads a real kernel from the hard disk. The parameters <tt>kerndev</tt> and <tt>kernfile</tt> specify the actual file in this case (default values are <tt>/dev/hda1</tt> and <tt>/boot/vmlinux</tt> respectively).<p> If an invalid kernel filename is given, the firmware will stop with an error message. The root filesystem, however, is a different matter: since it is not mounted until the kernel boots, the firmware cannot report if an invalid value is specified. So you won't find out until later, when the kernel says <tt>VFS: Unable to mount root fs</tt> and proceeds to try booting from the non-existent floppy disk.<p> <sect2>Second stage NeTTrom<p> After the main kernel has been loaded into RAM, a reset is performed. Execution once again starts in the first block of flash code. However, this time it notices that it's the second boot. The RAM refresh is quickly turned on and execution jumps directly to the main kernel.<p> If the main kernel is not bootable, the screen will stay dark at this point. This can also be caused by inappropriate arguments being passed from the firmware to the main kernel (in particular, the amount of RAM on the system). Using old firmware with a new kernel will generally trigger this condition. Please see <url url="http://netwinder.org/~ralphs/compat.html"> for details on this.<p> <sect2>Main kernel<p> The main kernel, generally loaded from disk, then goes through its normal boot sequence. Hardware is probed, devices are reported, and eventually the root filesystem gets mounted.
This could fail for a variety of reasons, particularly if an NFS root is being used.<p> Once the root filesystem is mounted, the kernel tries to start the <tt>init</tt> program, which then runs through the SysV-style init process. It reads <tt>/etc/inittab</tt>, which in turn tells it to run <tt>/etc/rc.d/rc.sysinit</tt> and then all of the <tt>/etc/rcN.d/S*</tt> scripts (where N is the current runlevel, as defined in <tt>inittab</tt>). Finally, <tt>getty</tt> processes are launched on the various virtual consoles.<p> <sect>Misc<p> <sect1>Author<p> The author and maintainer of the NetWinder Rescue-HOWTO is Ralph Siemsen (ralphs@netwinder.org). Please send me any comments, additions, or corrections so that they can be included in the next release. The latest version of this document can be obtained from <url url="http://www.netwinder.org/~ralphs/howto/Rescue-HOWTO.html">.<p> <sect1>To-do<p> The `sgml2info' version of this document doesn't show the examples properly: for some reason the linefeeds are removed. Why is this and how do I fix it?<p> <sect1>History<p> Sep 21, 1999 (version 1.0): First public release of this document.<p> Nov 09, 1999 (version 1.1): Reorganization, and significant rewrite.<p> <sect1>Contributors<p> Phil Petruzzo (philpe@rebel.com) contributed the section on how to install and use the rescue partition.<p> Douglas Paul (douglasp@netwinder.org) put together the rescue partition software.<p> <sect1>Legal stuff<p> This document is copyright (c) Ralph Siemsen, 1999.<p> Permission is granted to make and distribute copies of this manual provided the copyright notice and this permission notice are preserved on all copies.<p> There is no warranty whatsoever.<p> </article>