
Wednesday, January 7, 2009

Testing AoE over SCSI

In a previous post, I demonstrated that it was possible to serve an LVM logical volume over ATA-over-Ethernet. That particular LV was backed by a parallel ATA physical device. Now I want to see if ATA-over-Ethernet is a misnomer of sorts, and if it is in fact possible to serve an LV backed by a SCSI physical device. I suspect this to be the case.

The test system:

Intel(R) Pentium(R) 4 CPU 2.53GHz
1.0 GB RAM
Adaptec AIC-7892A U160/m (rev 02) SCSI controller
80G PATA hard disk (root file system)
5 Seagate ST373207LW 73GB U320 disks (Yes, I know I only have a U160 controller installed. I don't really care about the speed of the thing. It's a waltzing bear.)
SOYO mainboard

The SCSI disks will be configured as a RAID5 array using Linux software RAID.

mdadm --create /dev/md/0 -f --level=raid5 --raid-devices=5 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

As I am impatient:

echo 100000 > /proc/sys/dev/raid/speed_limit_max
echo 100000 > /proc/sys/dev/raid/speed_limit_min

Which brings my construction speed up to ~10,600 KB/s. It will still take a good two hours to build, though.
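If you're curious how the build is going, /proc/mdstat reports the resync progress, the current speed, and an estimated finish time. Watching it is as simple as something like this (the five-second refresh is just my preference):

watch -n 5 cat /proc/mdstat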

Once the RAID has been constructed, I will then create a physical volume for LVM on it:

pvcreate /dev/md0

And create a volume group named "scsi":

vgcreate scsi /dev/md0

And create a logical volume named "raid":

lvcreate --name raid -l 70001 scsi

Note that the "scsi" VG had 70001 free extents.
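In case you're wondering where that number came from, vgdisplay reports it; the grep just trims the output down:

vgdisplay scsi | grep Free

If your lvm2 is new enough, lvcreate will also accept a percentage, which saves looking the number up by hand:

lvcreate --name raid -l 100%FREE scsi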

Next, I'll create an ext3 filesystem on the LV:

mkfs.ext3 /dev/scsi/raid

And now, the fun part - sharing the LV via AoE:

vbladed 0 1 eth0 /dev/scsi/raid

Syslog on the test server shows that vbladed is running. Let's see what I get when I run aoe-discover and aoe-stat on my desktop (which is connected to the same network the AoE test server is on):

aoe-discover
aoe-stat

e0.1 293.605GB eth0 up

That looks pretty good, except that lvdisplay on the server says the LV is only 273.44 GB in size. However, 273.44 * 1024^3 bytes (273.44 gibibytes) == 293.604 * 10^9 bytes (293.604 gigabytes). So it's likely the disparity is just two different definitions of "GB". Still worth testing, though.

Anyway, now it's time to mount the AoE drive:

mount /dev/etherd/e0.1 /mnt/

No errors reported on either server. But here's the real test: 1 GiB of random data.


dd if=/dev/urandom of=/mnt/random bs=1024 count=1048576
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 169.715 s, 6.3 MB/s


No errors. Next I'll take an MD5 sum of the file, unmount the AoE share, then mount the LV on the test server. In point of fact, I'm not required to unmount the AoE share first, but I want to ensure that there is no possible way both systems could be writing to the test file at once. Sure enough, the MD5 sums are identical. This shows that you can, in fact, export any LVM logical volume via ATA over Ethernet, no matter what physical medium backs it. It remains to be seen, however, if it is possible to export a single partition on a SCSI drive via AoE.
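For the record, the MD5 check described above amounted to something like the following; the server-side mount point is my assumption:

md5sum /mnt/random          # on the desktop, against the AoE share
umount /mnt                 # on the desktop
mount /dev/scsi/raid /mnt   # on the server, against the LV directly
md5sum /mnt/random          # on the server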

Wednesday, December 3, 2008

Proof-of-concept AoE on Linux

One thing I've been wanting to play with at work is ATA over Ethernet. Seems like a pretty neat trick - stick a bunch of drives in a box somewhere and mount them from somewhere else, like a VMware image.

There's a nice article on Debian Administration on how to do this. I'll mostly be parroting that article, but I figure it might be useful to show how I did it.

I took an old P4 box that we had laying around with a 20G hard drive and put Debian 4.0 on it. I set up the disk using LVM2, because I was curious what would happen if I did that. My LVM setup:

5G root, formatted ext3
500M swap
12.92G "files", unformatted. I figure since I'll be using this thing as an AoE volume, I'll let the system that actually mounts this volume do the formatting. I don't think that's strictly necessary, though.

Also, per the article, I installed the "aoetools" and "vblade" packages. "aoetools" provides various useful tools for managing AoE volumes. "vblade" is described as a "virtual AoE blade emulator", which will allow me to export a local disk (or, in this case, an LV) over AoE. The command is:

vbladed 0 1 eth0 /dev/mapper/aoetest-files

And, sure enough, I see in my syslog:


... vbladed: ioctl returned 0
... vbladed: 13870562238 bytes
... vbladed: pid 2306: e0.1, 27090944 sectors


So now I need to access the AoE volume. Before I get into that, though, I'll note that both my AoE proof-of-concept machine and the machine I'll be mounting the AoE volume from have a dedicated network interface for AoE, connected here via a crossover cable; in production I'd use a dedicated switch instead, so that my AoE traffic wasn't sharing the network with regular traffic. While I think this is good practice, it's not strictly necessary, and I'm pretty sure it's possible to run AoE over the regular network if you have to.

On my desktop (which I'll be using to mount the AoE volume) I've installed the "aoetools" package and loaded the AoE kernel module with modprobe aoe. Next I do aoe-discover, and I see:

"aoe-discover: /dev/etherd/discover does not exist or is not writeable."

Well, that's not good. What did I do wrong? Nothing, as it turns out. This is a bug in Ubuntu 8.10, and as yet, there has been no fix posted. But maybe I can fix it myself.

grep etherd /etc/udev/rules.d/* on the Debian box gives me:


/etc/udev/rules.d/udev.rules:SUBSYSTEM=="aoe", KERNEL=="discover", NAME="etherd/%k"
/etc/udev/rules.d/udev.rules:SUBSYSTEM=="aoe", KERNEL=="err", NAME="etherd/%k"
/etc/udev/rules.d/udev.rules:SUBSYSTEM=="aoe", KERNEL=="interfaces", NAME="etherd/%k"
/etc/udev/rules.d/udev.rules:SUBSYSTEM=="aoe", KERNEL=="revalidate", NAME="etherd/%k"


The same command on my Ubuntu box gives me nothing. However, I can't just tack those lines onto /etc/udev/rules.d/udev.rules on my desktop, because apparently Ubuntu doesn't use that file. Instead I'll put them in a file just for AoE: /etc/udev/rules.d/25-aoe.rules. Restart udev, and voilà! The devices are there!
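Concretely, the fix amounted to something like this, run as root; the rule lines are the ones copied from the Debian box above, and the udev restart command may differ between releases:

cat > /etc/udev/rules.d/25-aoe.rules << 'EOF'
SUBSYSTEM=="aoe", KERNEL=="discover", NAME="etherd/%k"
SUBSYSTEM=="aoe", KERNEL=="err", NAME="etherd/%k"
SUBSYSTEM=="aoe", KERNEL=="interfaces", NAME="etherd/%k"
SUBSYSTEM=="aoe", KERNEL=="revalidate", NAME="etherd/%k"
EOF
/etc/init.d/udev restart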

Now, when I run "aoe-discover", I see nothing. That's OK. aoe-discover doesn't have any output. It's aoe-stat that will tell me what's there, and when I run that, I get:

e0.1 13.870GB eth1 up

Hooray! I create a filesystem with mkfs.ext3 /dev/etherd/e0.1, mount it at /mnt, then, as a test, create a 100M file: dd if=/dev/urandom of=/mnt/test1 bs=1M count=100. Takes 15.1 seconds. Creating a similar file locally? 14.8 seconds, so not too bad for speed. Of course, the two boxen are connected via a crossover - I might well see some slowdown using a switch.

So here we have it. A proof-of-concept Linux-based AoE appliance using commodity hardware. Since the AoE volume is an LVM logical volume on the appliance, you can use LVM tools to change the size of that LV, should you need to. I wouldn't recommend it, though.

Friday, November 28, 2008

Setting up LVM on an already-setup box

I have this box at work that someone else was nice enough to set up with Debian Lenny and a great big honkin' RAID5 array. We've already got the basic filesystem structure set up on the box, but we'd like to add the RAID as well.

I'm going to set up the RAID as its own volume group under LVM. This allows us, should the OS drive fail, to slap another drive with an OS on it into the machine, boot, and remount the RAID. It also allows us to dynamically add storage, say some sort of SAN, to the physical volume, then resize the logical volumes on the fly. It also simplifies things a little by keeping the OS volume group and the file storage volume group separate.
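As an aside, the "resize the logical volumes on the fly" bit boils down to something like the following once new storage shows up. The /dev/sdb1 device is made up, the VG and LV names are the ones I create later in this post, and you'll want to confirm your kernel and filesystem support online growth before trying it on a mounted volume:

vgextend file-storage /dev/sdb1             # add the new storage to the volume group
lvextend -L +500G /dev/file-storage/files   # grow the logical volume
resize2fs /dev/file-storage/files           # grow the filesystem to match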

Being an LVM newbie, I'll be referencing A simple introduction to working with LVM and The LVM HOWTO. You can assume that I pulled just about any LVM-specific command syntax here from one of those two sources.

Now, the first thing to do is set up a partition on the RAID array, as LVM runs on top of physical partitions. I do this with fdisk, because that's the way I learned it. :) If I were a bit more clever, or if I felt like it, I'd do this with a single call to sfdisk.
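For the curious, the sfdisk version would be something along these lines - a single partition of type 8e (Linux LVM) spanning the whole array, which shows up here as /dev/sda:

echo ',,8e' | sfdisk /dev/sda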

Next I create a physical volume for the RAID: pvcreate /dev/sda1, and then a volume group: vgcreate file-storage /dev/sda1. Checking my work with vgscan, I see:

Reading all physical volumes. This may take a while...
Found volume group "os" using metadata type lvm2
Found volume group "file-storage" using metadata type lvm2


Now I want to create a logical volume (LV) that encompasses the entire volume group. I do this by first examining the output of vgdisplay, where I see the line: "Free PE / Size 357375 / 1.36 TB" (I told you it was a big honkin' RAID. Also note here that "PE" means "Physical Extent", the size of one quantum of storage in LVM. One PE is exactly the same size as one Logical Extent (LE), so here one LE is about 4 MB). I will thus create an LV of 357375 extents: lvcreate -n files file-storage -l 357375. With this done, it's time to format the LV.

After consulting with my colleagues, I've decided to use ext4 for the filesystem on the RAID. I like what I read about ext4, both on Wikipedia and from IBM, and as this box is slated to become a backup server, it seems like a good place to play with it. Before I begin, though, I'll update the kernel to the latest version in (Debian) testing, 2.6.26-1, so as to have the latest ext4 fixes that have been included in Debian kernels. Even with that, though, I'll want to add "nodelalloc" to the line in my fstab for the RAID:

It should be noted that the stock 2.6.26 ext4 has problems with delayed allocation and with filesystems with non-extent based files. So until Debian starts shipping a 2.6.27 based kernel or a 2.6.26 kernel with at least the 2.6.26-ext4-7 patchset, you should mount ext4dev filesystems using -o nodelalloc and only use freshly created filesystems using "mke2fs -t ext4dev". (Without these fixes, if you try to use an ext3 filesystem which was converted using tune2fs -E test_fs -o extents /dev/DEV, you will probably hit a kernel BUG the moment you try to delete or truncate an old non-extent based file.)


At any rate, per the ext4 HOWTO, I'll create an ext4 filesystem on the "files" LV with: mke2fs -t ext4dev /dev/file-storage/files. And wait. And wait. And wait some more, because 1.36 TiB is a lot of space.

From here, the RAID is like any other filesystem. Pick a mount point, make sure to mount it "-o nodelalloc", and off you go.
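A hypothetical fstab line, just to make that concrete - the mount point is made up, and per the correction below the "nodelalloc" option ended up being dropped anyway:

/dev/file-storage/files  /srv/backup  ext4dev  defaults,nodelalloc  0  2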

CORRECTION: Debian kernel 2.6.26-1 does not support the "nodelalloc" mount option. I ended up installing kernel 2.6.27.7 from http://kernel.org/. As the "nodelalloc" option was only recommended for 2.6.26-based kernels, I am no longer mounting the ext4 filesystem with the "nodelalloc" option.

Installing Ubuntu 8.04 on an ASUS F8Va-C1 laptop

I just ordered an ASUS F8Va-C1 laptop from Newegg to serve as my primary workstation. On the surface, this thing looked slick: 2.53GHz Intel Core2 Duo processor, 4GB of RAM, ATI Radeon 3650 video card with 1G dedicated VRAM, and a 320GB HDD. It came with Vista installed, but I figured I could just resize the Windows partition and install Ubuntu. The best laid plans...

I decided to go with Ubuntu 8.10, Intrepid Ibex. Lacking a CD burner on my current machine, I had one of my colleagues burn a copy for me. I put the CD into my new laptop (after spending a good 15 minutes trying to figure out that F2 was the key to get into the BIOS - serves me right for not RTFMing) and turned it on. Splash screen came up, and eventually I was presented with a white screen and nothing more. No sounds, no cursor, nothing.

Then I tried 8.04, and at least got the LiveCD to boot. I tried doing the install from the LiveCD, but ended up getting SquashFS errors (i.e. bad sectors on the CD). I took the thing home after work, burned an 8.10 CD according to the Coasterless CD burning instructions for Linux (just in case I was having issues caused by a poorly-burned CD), and had the same results. I burned an 8.04 CD using the same instructions and got a little further, but again got SquashFS errors. I booted the CD via an external drive I had laying around, and got even further, but still got SquashFS errors. As it was the night before Thanksgiving, and I had to be up at 0500 to brine the turkey, I decided not to play with it any further.

Today I brought the laptop back into work and installed an Ubuntu 8.04 network install image onto a USB stick using UNetbootin. So far, this is going well. I've been able to set up my disk using LVM (unfortunately, in all the installation attempts I deleted the Vista install, but hey, it's Vista. I'll likely be putting XP Pro on the Windows partition.) I'm keeping my fingers crossed...

Those SquashFS errors, though, have me worried. I suspect that the laptop's internal CD drive is messed up - I need to do some tests to be sure.