July 30, 2008

fiber channel (fibre channel)

Filed under: Computer joy — dalvenjah @ 11:42 pm

One of the things I want to start posting about is fiber channel (sometimes spelled fibre channel). It’s something I was first exposed to back in 1998, and have been dabbling with off and on since then (entirely at work, since it’s pricey stuff). In the past year I’ve been using it much more, and have learned quite a bit about it.

The biggest issue I have is that whenever I google a problem, I run into either someone using it for a database application or someone using it at the very low end, in a small video setup. Neither scenario fits our situation at work, so I have to experiment to find the solution to a problem, and more often than not it’s a shot in the dark. On the plus side, it gives me an opportunity to really learn it and, with luck, pass on more information about it.

The three-sentence description of fiber channel is basically this. Attaching a disk to a computer happens via some kind of storage interface, usually IDE/ATA (two disks per channel), SCSI (7 or 15 disks per bus), or SATA (one disk per port, more with port multipliers). USB and FireWire don’t count, since those enclosures just convert IDE or SATA to USB or FireWire and then attach. Fiber channel is basically the result of someone saying “Hey Beavis, let’s take the disks out of computers, put them somewhere else, and then tie all the computer-to-disk connections together!”

The result is that you get some of the great features of networks. With the right hardware, you can attach thousands of disks to a single computer. With the right hardware, you can transfer data screamingly fast. With the right software and hardware, you can combine multiple links and increase the speed of the connection. And with the right software, you can share a common set of disks between multiple computers.

The problem is that it brings with it the bad features of disk controllers. Most operating systems will scan for disks when they start up, and whatever they find, that’s what they expect to keep. They don’t like having new disks presented to them after bootup, and they *really* don’t like losing a disk that they found at bootup. Windows is a lot worse — it actually assumes that it owns any disk it sees, and writes a little tag to the beginning of each disk it finds if it doesn’t recognize the existing disk label.

I’ve spent the past year trying to wring performance and reliability out of several different fiber channel configurations at work, for video, database, shared storage, and other configurations, and have mostly succeeded, with lots of help from colleagues and vendors. I’ve learned a lot, and so has everyone involved; and I’ve not found a lot of references to most of the things we’ve learned, so I want to try and share it.

Next post on this topic — an introduction to our fiber channel switches.

March 29, 2008

as I’ve discovered, NetApps are reasonably rock solid

Filed under: Computer joy — dalvenjah @ 4:15 pm

So as part of my job I’ve had to start taking the firehose crash course in Fiber Channel (or Fibre Channel, as some still put it) storage technology. It’s interesting, and still somewhat relegated to the higher end of storage devices, but with Apple’s recent (okay, 3-year-old) introduction of Xsan and related products for the video editing world, a lot of it has become more reasonably priced and less Fortune-50-level. However, as I’ve found, there are very few people who really know their stuff when it comes to complicated fiber channel setups; at some point I’ll try to relay what I’ve learned. But that’s not for this post.

In this post, I would like to relay to you just how I discovered how robust NetApp storage really is. NetApp has been around for at least fifteen years (probably longer); I’ve been interacting with their equipment in various capacities for almost ten. They’re basically dedicated file-serving appliances, loosely based around a customized Intel x86 architecture, with a very nice filesystem and operating system that tries as hard as it can to protect the data you store on it. Oh, and they’re quite fast at serving NFS and CIFS traffic, too. Some wag at work decided it might be a good idea to store old data on Buffalo TeraStations, which, while fine for home users, really kind of pale in a multi-user situation. Copying data off of these things happens at a maximum of about 7Mbytes/sec, whereas I’ve gotten upwards of 250Mbytes/sec going to a NetApp box, and it wasn’t even one of their higher-end models.
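To put those two rates in perspective, here’s a quick back-of-the-envelope calculation in Python. The 1TB data set is a hypothetical size I picked for illustration, not something from this story; it treats 1TB as 1,000,000MB and assumes both rates hold steady for the whole copy, which real copies rarely do.

```python
def copy_hours(size_mb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_mb megabytes at a sustained rate_mb_per_s."""
    return size_mb / rate_mb_per_s / 3600

# Hypothetical 1TB data set, taken as 1,000,000MB (decimal units).
DATA_MB = 1_000_000

terastation = copy_hours(DATA_MB, 7)  # rate seen copying off the TeraStations
netapp = copy_hours(DATA_MB, 250)     # rate seen writing to the NetApp

print(f"TeraStation: {terastation:.1f} hours")
print(f"NetApp:      {netapp:.1f} hours")
```

That works out to roughly 40 hours for the TeraStation versus a little over an hour for the NetApp.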

NetApp isn’t the fastest network storage out there (that award would probably have to go to BlueArc), but their systems are solid, reliable, and recover gracefully from pretty much any failure you can throw at them. NetApp is the company that first introduced me to the concept of “we know what’s wrong with your system before you do”. If a drive is failing (it doesn’t even have to have completely failed, just be throwing enough weird juju that the OS loses confidence in it), the OS will grab a spare disk, start copying data from the failing drive to the spare, then fail the bad drive and move the spare into the RAID group, all without you really noticing unless you’re paying attention to the log messages. With NetApp, the way you notice that a drive failed overnight is that a replacement drive is waiting for you when you arrive at the office the next morning.

(A note to the brave and/or foolhardy; don’t try this at home or at work, it’s certainly not supported by NetApp, and I got lucky. It may void your warranty, though I trust it won’t cause your NetApp to burst into flames.)

So, I was working with just such a system, adding some extra shelves to it as we’re trying to move towards 500G and 750G drive shelves, away from the older 275G and 320G shelves to increase density of this particular system. It uses fiber channel to connect from the head to the drives, which means that although the proper way to make changes to the system is to take it down, reconnect things as needed, then bring it back up, you can make certain changes (like adding disk shelves) on the fly.

I connected up the shelves together, wired them up to the file server, set the shelf IDs, and powered them on. The system recognized the new drives, and started doing its work to add them into the system as spare drives.

Except for one piece of stupidity on my part. I’d forgotten that shelf IDs are not like SCSI IDs: even though the little ID setting allows it, you can’t use a shelf ID of zero, because shelf IDs start at 1. Which meant that shelf “zero” wasn’t being recognized by the system.

So I figured, what the hell, they’re spare disks, let’s just power the shelves back off and reset the IDs. I’d noticed that the system was upgrading the firmware on one of the shelves, so I waited until this was done and it didn’t appear to be doing anything else to that chain of disks, powered them off, reset the IDs, and powered them back on.

The system did scream bloody murder (beeped, sent pages to the admins, and opened a trouble case with NetApp support), but after about five minutes of twitching, it figured out that the 56 disks that had just disappeared had indeed reappeared, and after a few more bouts of convulsing, it finally calmed down and decided everything was really all right.

What impressed me most about this was the reasonably graceful manner in which the NetApp figured out what was happening and recovered from what for all intents and purposes was a catastrophic disk failure. I suppose the two things that made this less severe than it could have been were 1) the disks were all spares, not data disks, and 2) the disks were on their own fiber channel interfaces that didn’t have other disks on them as well. But my recent experiences with Macintosh computers on fiber channel (even if a disk’s not mounted by a system, if the Mac can see it over the fiber channel network and the disk goes away, the Mac will probably lock up at some point in the future if you don’t reboot it first) had made me wonder what would happen when I tried this. I would at least have expected the system to not re-recognize the disks the second time I powered them up.

I have to say that for as pricey as these things are (usually in the five to low six figures; though they’re not nearly as bad as some higher-end storage), they’re worth the amount you pay for them in purchase and support. The systems don’t go down, the support organization behind them is stellar, and they just plain work. The only thing I don’t like about them is their new logo (the monolithic ‘n’; it looks like a piece of a henge. You know, like stonehenge, woodhenge, and strawhenge). Beyond that, I’m quite happy with that equipment.

May 10, 2007

mysterious error messages, part 2

Filed under: Computer joy — dalvenjah @ 1:24 pm

Here’s one that I just ran into; the results from a google search aren’t exactly helpful (no, you don’t need to reinstall the package because of this error).

If you install proftpd 1.3.0a on CentOS 4.4 with a mostly default /etc/proftpd.conf and try to start it up, you get the following error message:

- Fatal: ScoreboardFile: : unable to use '/var/run/proftpd.scoreboard': Operation not permitted on line 58 of '/etc/proftpd.conf'

The unhelpful error message doesn’t tell you what the comments in the source code do: that the scoreboard file must not live in a world-writable directory. On CentOS 4.4, /var/run is world-writable with the sticky bit set (like /tmp), so that processes that don’t run as root can put their lock files there.

Solution: create a new directory (I chose /var/lib/proftpd), chown it to the same user that proftpd runs as (the User directive in /etc/proftpd.conf), and make sure it’s mode 775 or similar. Then change the following line in /etc/proftpd.conf:

ScoreboardFile /var/run/proftpd.scoreboard

to

ScoreboardFile /var/lib/proftpd/proftpd.scoreboard
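If you want to check a candidate directory before pointing ScoreboardFile at it, the rule can be sketched in Python. This assumes, going by the source comments mentioned above, that proftpd refuses a scoreboard whose containing directory has the world-write bit set; it is not proftpd’s actual code, which may check more than this, and the function names are hypothetical.

```python
import os
import stat


def mode_is_world_writable(mode: int) -> bool:
    """True if the 'other' write bit is set in an st_mode value."""
    return bool(mode & stat.S_IWOTH)


def scoreboard_dir_ok(scoreboard_path: str) -> bool:
    """Sketch: would the directory holding scoreboard_path pass a
    'not world-writable' test? (Assumed to mirror proftpd's rule.)"""
    parent = os.path.dirname(os.path.abspath(scoreboard_path))
    return not mode_is_world_writable(os.stat(parent).st_mode)


if __name__ == "__main__":
    for path in ("/var/run/proftpd.scoreboard",
                 "/var/lib/proftpd/proftpd.scoreboard"):
        try:
            verdict = "ok" if scoreboard_dir_ok(path) else "world-writable, rejected"
        except OSError:
            verdict = "directory missing or unreadable"
        print(f"{path}: {verdict}")
```

Checking only the "other" write bit also means a mode 775 directory, like the one suggested above, passes.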

I should probably submit a patch to make a more helpful error message. But that won’t help the users with default installs who just run into this error.

March 21, 2007

mysterious error messages, part 1

Filed under: Computer joy — dalvenjah @ 10:50 pm

This may or may not be the first in a series of posts in which I run into a strange, unexplained error and find a non-obvious solution.

This particular error message came up when creating a software RAID device:


# mdadm --create /dev/md7 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
mdadm: /dev/sdc1 is too small: 0K

I had just partitioned the disks with fdisk and set the partition type, and sfdisk -l on the disk gave the correct output. Nobody appeared to have a solution to this; the couple of posts I found asking the same question had gone unanswered.

It turns out that, for the first time ever for me (despite fdisk’s perpetual warning), the kernel didn’t properly reread the partition table when fdisk wrote out the new one. This only happened on sdc, not on sdd.

I figured this out with mke2fs’s much more explanatory error message:


# mke2fs /dev/sdc1
mke2fs 1.35 (28-Feb-2004)
mke2fs: Device size reported to be zero. Invalid partition specified, or
partition table wasn't reread after running fdisk, due to
a modified partition being busy and in use. You may need to reboot
to re-read your partition table.

The fix was to run fdisk one more time and just say ‘w’ to write out the partition table again, and (more importantly) to have it make the ioctl() call again so the kernel rereads the partition table, this time properly. If that hadn’t worked, the next step would have been to reboot, which I didn’t want to do. (As Saif said in the previous post, rebooting is for adding new hardware.)
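One way to spot this condition without waiting for mdadm or mke2fs to complain is to look at the kernel’s own view of the partitions in /proc/partitions, which lists sizes in 1K blocks. Here’s a Python sketch that flags partitions the kernel either doesn’t list or reports as empty; the function names are hypothetical, and the sample text is made up to match the situation above, not captured from my system.

```python
def parse_proc_partitions(text: str) -> dict[str, int]:
    """Map device name -> size in 1K blocks from /proc/partitions-style text."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines look like: major minor #blocks name
        if len(fields) == 4 and fields[0].isdigit():
            sizes[fields[3]] = int(fields[2])
    return sizes


def stale_partitions(text: str, wanted: list[str]) -> list[str]:
    """Partitions the kernel doesn't know about, or thinks are empty."""
    sizes = parse_proc_partitions(text)
    return [name for name in wanted if sizes.get(name, 0) == 0]


# Made-up sample: sdd1 is known to the kernel, sdc1 is not.
SAMPLE = """major minor  #blocks  name

   8    32  293036184 sdc
   8    48  293036184 sdd
   8    49  293033601 sdd1
"""

print(stale_partitions(SAMPLE, ["sdc1", "sdd1"]))  # ['sdc1']
```

On a live system you’d feed it open("/proc/partitions").read() instead of the sample text.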

As I find more of these non-obvious error messages and the solution, I’ll try to post about them. Hope this helps someone out.

March 18, 2007

ghetto raid scrubbing with linux

Filed under: Computer joy — dalvenjah @ 1:40 am

As a follow-up to my adventures with Linux RAID scrubbing (or lack thereof), I decided to poke around a bit more this weekend after a filesystem started throwing some errors.

It appears that someone did fix at least part of the issue I ran into (a memcpy() was left out of the kernel’s repair code), but I’m not planning on installing that kernel for a while. Not without some serious testing, anyway, or perhaps not until the fix shows up in a Red Hat/CentOS update kernel.

However, I did come up with something that may work as a very ghetto software RAID1 verification technique. (The following keywords should help someone google this post: linux software raid verify scrub oh shit.)

Here’s what you do. First, find the size of the mirror from /proc/mdstat:

md5 : active raid1 hdd1[1] hdb1[0]
      58613056 blocks [2/2] [UU]

Multiply the number of blocks by 1024 to get the size in bytes (/proc/mdstat counts in 1K blocks):

[root@linux] # bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
58613056*1024
60019769344
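The same multiplication can be done without bc. Here’s a small Python sketch that pulls the block count for an array out of /proc/mdstat text and converts it to bytes, assuming (as above) that mdstat counts in 1K blocks and that the count sits on the line after the array’s header line; the function name is just for illustration.

```python
import re


def md_data_bytes(mdstat: str, array: str) -> int:
    """Size in bytes of `array` (e.g. 'md5'), read from /proc/mdstat text."""
    lines = mdstat.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(array + " :") and i + 1 < len(lines):
            # The block count sits on the line after the array's header line.
            match = re.search(r"(\d+) blocks", lines[i + 1])
            if match:
                return int(match.group(1)) * 1024  # mdstat counts 1K blocks
    raise ValueError(f"no block count for {array} in mdstat text")


MDSTAT = """md5 : active raid1 hdd1[1] hdb1[0]
      58613056 blocks [2/2] [UU]
"""

print(md_data_bytes(MDSTAT, "md5"))  # 60019769344
```

On a live system you’d pass in open("/proc/mdstat").read() instead of the pasted snippet.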

Then, run cmp on the two devices that make up the mirror:

[root@linux] # cmp /dev/hdb1 /dev/hdd1
/dev/hdb1 /dev/hdd1 differ: byte 60019769345, line 246090365

If the byte at which the two devices first differ is past the number you came up with using bc (here it’s the very next byte), the data areas of both mirrors are identical. (From what I can tell, the area after that is where the RAID metadata/superblock sits, at the end of the disk.)

If the differing byte number is smaller, you can probably run a more extended test with cmp -l to find out what data differs and whether there are one or more differences. I’m not sure how to repair things at that point; if you feel lucky, you might be able to do some kind of block editing (and guess the value that block should have), but I’m not about to try that part.
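If you’d rather have the comparison stop at the end of the data area than rely on where cmp happens to stop, here’s a Python sketch of the same check. It reads both members in chunks, only up to the byte count from the bc step, and reports the first differing byte using cmp’s 1-based numbering. Pointing it at raw devices assumes you run it as root with the array idle; the function name and the 1MB chunk size are my own choices.

```python
from typing import Optional


def first_difference(path_a: str, path_b: str, limit: int,
                     chunk_size: int = 1 << 20) -> Optional[int]:
    """Compare the first `limit` bytes of two files or block devices.

    Returns the 1-based offset of the first differing byte (as cmp prints it),
    or None if the first `limit` bytes are identical.
    """
    offset = 0
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while offset < limit:
            want = min(chunk_size, limit - offset)
            block_a = a.read(want)
            block_b = b.read(want)
            if block_a != block_b:
                # Locate the exact byte within this chunk.
                for i, (x, y) in enumerate(zip(block_a, block_b)):
                    if x != y:
                        return offset + i + 1
                # Common prefix matches but one side hit EOF early.
                return offset + min(len(block_a), len(block_b)) + 1
            if not block_a:
                break  # both sides hit EOF before `limit`
            offset += len(block_a)
    return None
```

For the mirror above, that would be first_difference("/dev/hdb1", "/dev/hdd1", 60019769344); a result of None means the data areas match.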

Part of the point of scrubbing is to read every byte of data from every disk and make sure there aren’t any read errors. If there are, the kernel should log an error, and with IDE drives, the drive firmware may get a chance to reallocate a block that has a soft error in it (which will show up in smartd’s output).

Note that this will only work with RAID1; RAID5 lays out data differently, in stripes of data and parity, so you’d have to do the parity calculations as well as figure out where the parity lives. It could probably be done with some programming, but that’s left as an exercise for the reader }:>.

So yeah, it’s really ghetto, but it appears to work. And now I don’t feel like I’m flying 100% blind and not knowing whether my mirrors are really mirrors. If I feel industrious, I’ll probably put this into a shell script and start running it weekly or something.

January 29, 2007

today, java is not my friend

Filed under: Computer joy — dalvenjah @ 1:02 am

One thing before I start my rant about java. It means two things: coffee and a programming language. Coffee, to be clear, is absolutely my friend. It makes me happy in the morning and keeps me awake at night. I treat it with respect and it helps me out.

Java the programming language, on the other hand, is not my friend. Let me explain.

At work we’ve been working on a piece of software that, from within a web browser, needs to open up a file on the user’s computer and send it up to a server. To make sure this works on pretty much every browser, the only real option is java.

So we look around for how to do this, even buy a source license from a commercial package that does something similar, and think we’ve got it figured out. The head programmer writes up a set of code that should do the trick, builds a signed jar file, and tests it out — it works! But only on the programmer’s computer.

Let me back up and explain something about java. Unlike most languages, it has two modes. The first is your standard “everything works” mode, where you write code, you run it like a normal program, and it just works. The second mode happens when you download java code from a web site and run it inside a web browser. In this case, there is a set of things you can’t do (you can’t make arbitrary network connections, you can’t open local files, etc.) unless the code is signed and/or you specifically grant the program permission, either by clicking ‘Yes’ at an allow box or by modifying a java config file.

September 24, 2006

linux raid is not my friend right now

Filed under: Computer joy — dalvenjah @ 11:27 pm

Aaargh.

For a while now, I’ve been using Linux software RAID 1 (2 drives, mirrored, so if one fails, you still have your data). The one thing I started to notice, though, was that Linux had no method to do ‘scrubbing’, or verifying that the data on both drives really is the same. NetApps have it, 3ware cards have it, any real professional RAID setup has it. Except Linux.

So every now and then I google for ‘linux raid scrub’, and see if someone’s done it. Thursday, I discover that yes, someone has! They added user-requested data verification to kernel 2.6.18; you run echo 'check' > /sys/block/mdX/md/sync_action, and it starts a scrub.
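For reference, here’s a Python sketch of driving that interface: it writes check to sync_action, and later reads sync_action back along with mismatch_cnt, which md exposes in the same directory (on kernels that have it) to report how many sectors didn’t match. The sysfs_root parameter exists only so the sketch can be pointed at a fake directory tree for testing; on a real system it’s /sys/block, the write needs root, and given the rest of this post, don’t trust the results on a kernel you haven’t tested.

```python
import os


def start_check(array: str, sysfs_root: str = "/sys/block") -> None:
    """Ask md to start a read-only consistency check of `array` (e.g. 'md5')."""
    with open(os.path.join(sysfs_root, array, "md", "sync_action"), "w") as f:
        f.write("check\n")


def sync_status(array: str, sysfs_root: str = "/sys/block") -> tuple[str, int]:
    """Return (current sync_action, mismatch_cnt) for `array`."""
    md_dir = os.path.join(sysfs_root, array, "md")
    with open(os.path.join(md_dir, "sync_action")) as f:
        action = f.read().strip()
    with open(os.path.join(md_dir, "mismatch_cnt")) as f:
        mismatches = int(f.read().strip())
    return action, mismatches
```

The idea would be to call start_check("md5"), then poll sync_status("md5") until the action reads idle again and look at the mismatch count.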

So I download the kernel, compile it (a daylong process), and Friday night I boot the new kernel on one of my systems and start the scrubbing. I’ve been wanting to do this for a while, since I suspected one of the drives was starting to fail.

The scrub takes a while; it’s a pair of 300GB drives. The next morning, I discover that one of the drives has been kicked out of the raid, so I replace it (had one on hand just in case) and add it to the mirror.

What’s supposed to happen next is that the kernel sees a new drive as member #2 of the raid set, and starts copying the data from the first drive to the second. What happened in this case was that the kernel added the drive and said everything was fine right away.

However, everything was NOT fine; when reading from a mirrored RAID set, reads can come from either drive, depending on which one has its heads closest to the data. One drive had the data. The other didn’t. Chaos ensued.

So after trying a few different things, I began to realize that RAID1 in this new kernel was broken. Horribly horribly broken. I tried to manually start the sync, left it for the 6 hours that it took, and found that it still didn’t fix the problem (apparently it hadn’t actually synced data; now that I think about it, that probably saved my butt, since it didn’t try to sync the bogus data on the new drive back to the old one).

So I remove the new drive from the raid set, boot back into the old kernel, and add the drive again. This time it does what it’s supposed to; now the drives are happy again.

I went back to kernel.org to look at the release date for this supposedly ‘stable’ kernel; turns out it was released Thursday. I’m not sure how the document explaining how to do a manual RAID sync (which doesn’t work) made it out there, but it did.

So now I’m back on an old kernel, with a newly rebuilt RAID set, and a few more grey hairs.

I dislike hardware RAID cards since I never know exactly what they’re doing with the drive; I like software RAID since in theory I can figure out what’s going on, look at the source, putz with it, break a mirror and mount the drive on its own, etc. (I got burned once by a cheap raid card that lost its config and hence the data on the drives.) And I can’t afford a NetApp, which seems to do RAID correctly.

I know the people working on Linux software RAID are doing the best they can with the time and resources available to them. But yeesh; this weekend was certainly Not Fun.

August 21, 2006

why must pretty be big?

Filed under: Computer joy — dalvenjah @ 11:37 pm

So I’m trying to free up space on one of my computers (all data expands to fill available storage space, don’t you know), and start looking at what takes up space. I’m using a nifty program called Disk Inventory X. It shows you visually how much space each file and folder is using, which is very useful for figuring out where all the space went.
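The basic idea behind a tool like Disk Inventory X (total up file sizes under each folder so the big ones stand out) can be sketched in a few lines of Python. This is only a rough approximation of what that program does: it counts apparent file sizes, skips symlinks, and doesn’t draw anything. The function name is hypothetical.

```python
import os
import sys


def folder_sizes(root: str) -> list[tuple[str, int]]:
    """Total bytes under each immediate subfolder of `root`, biggest first."""
    totals = []
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if os.path.islink(path):
                        continue  # don't count symlink targets
                    try:
                        total += os.path.getsize(path)
                    except OSError:
                        pass  # unreadable or vanished file; skip it
            totals.append((entry.name, total))
    return sorted(totals, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # e.g. python folder_sizes.py /Applications
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, size in folder_sizes(root)[:10]:
        print(f"{size / 2**30:6.2f} GB  {name}")
```

Since Mac application bundles are really folders, pointing this at /Applications would list iDVD.app along with everything else.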

One of the things I find is that iDVD, the DVD maker application that comes with the system, is using 1.5GB of space. 1.5GB! Even Photoshop only uses 153MB.

I do some investigating; as it turns out, the space is all used by the themes for the DVD menus. I suppose whoever crammed that much in there had the best interests of the user in mind; but on a system with only a 30GB hard drive, 1.5GB is a sizeable amount.

So now I just have to figure out if I’m ever going to use iDVD; I may just toss it, along with the MS Office Test Drive. But my mind still boggles; 1.5GB in a single application…

July 12, 2006

crazy big

Filed under: Computer joy — dalvenjah @ 1:43 am

So in idly browsing the web today, I find a link to the blog of Sun’s CEO, who links to Sun’s Tuesday press conference, in which they introduce their crazy new Sun Fire server, the X4500.

Sun Fire X4500 top view
image copyright Sun Microsystems

When I first started out doing computer stuff, I started on DOS and early Windows, then moved on to Sun Sparc and Solaris equipment. Ten years ago, Sun with Sparc was the workhorse of the server and workstation computing world. Then came Intel-based servers (and the fact that you can usually buy ten to twenty Intel servers for the price of one Sun Sparc server that performs the same as, or worse than, one of those Intel servers), and Sun sort of became the slow, creaky grandfather of servers: everyone had a few in their back closet they couldn’t get rid of, but the Intel computers stole the show and ran the new, exciting, intensive applications. Solaris was a pain in the butt, too, but it worked for the business application side of things. If you were a funky internet company, you’d run Linux or FreeBSD to serve those newfangled “web pages”, but the database backend (and the real value of the business) would always be running on Solaris.

So Sun struggled along for a few years to find its way; they went to the super-crazy-big stuff with the baby-Cray-type E10k and E15k, and tried the SGI method of selling an Intel box for twice the price after adding a few pieces of plastic to the case.

It sounds like they let the nutty engineers with clue back in the driver’s seat, though. They came out with a couple of very interesting products today, the most interesting of which is this Sun Fire X4500 system. It’s basically a 2-CPU (AMD64-based) system with an assload of hard disks: 48 500GB disks, making for 24TB of raw storage. Even if you RAID-5 those and give up a fifth of the space to parity, that’s about 19.2TB. Now granted, Sun’s listing this for $70k fully loaded, so it’s not exactly within reach of a Mr. Average Middle-Class Budget like me (of course I’m thinking of the MythTV system from hell), but still, most “enterprise class” storage systems that would give you even 10TB cost way more than that once you throw in the server and other stuff necessary to make it work. And it all fits in 4U of rack space. (Which, as the guy in the press conference said, lets you put 1 petabyte in 4 racks. Not bad, that.)
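The capacity figures are easy to sanity-check. Here’s a Python sketch of the arithmetic, in decimal units (1TB = 1000GB) as drive vendors count. Two assumptions are mine, not Sun’s: that the 19.2TB figure comes from losing a fifth of the space to parity, as in 4+1 RAID-5 groups (48 disks don’t split evenly into 5-disk groups, so treat it as a rough figure), and that the racks are standard 42U ones, since the press conference figure as quoted doesn’t say.

```python
DISKS = 48
DISK_GB = 500
CHASSIS_U = 4
RACK_U = 42  # assumption: a standard full-height rack

raw_tb = DISKS * DISK_GB / 1000

# Assumption: RAID-5 in groups of 5 disks (4 data + 1 parity),
# so one fifth of the raw space goes to parity.
raid5_tb = raw_tb * 4 / 5

per_rack = RACK_U // CHASSIS_U         # whole chassis that fit in one rack
four_racks_tb = 4 * per_rack * raw_tb  # raw capacity of four full racks

print(f"raw: {raw_tb:.1f} TB, RAID-5 (4+1): {raid5_tb:.1f} TB")
print(f"{per_rack} chassis per rack, four racks: {four_racks_tb:.0f} TB raw")
```

Ten chassis per rack across four racks comes to 960TB raw, which is close enough to the petabyte quoted.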

May 23, 2006

aaaaargh. my tires are properly inflated!

Filed under: Computer joy — dalvenjah @ 11:38 pm

[updated 12/2006]

So, we have a 2005 Toyota Matrix. Nice car.

Except for the fact that the stupid tire pressure monitoring system doesn’t work right:

Picture of tire error light

The light is supposed to come on if one of the tires has incorrect pressure, indicating either “Put some air in”, or “You have a flat, dumbass”. There’s a button you’re supposed to push and hold when the pressure is correct in all the tires to reset the system.

In our car, however, the light seems to come on whenever the system feels lonely — that is, all the time. Oh sure, we can reset it, and the light goes away, pouting, for a few minutes. But then, after half an hour or an hour of driving, there it pops up again. “Did you miss me?” it seems to say. “I had a grand time being off, but I’m here to be on for you again!”

We’ve had the car in multiple times asking about it, to no avail. One time we were told that there’s also a sensor in the spare tire, so it had to be checked too. But nope — equalizing the pressure in all the tires (even checking to make sure the pressures are still equal after it comes on again) doesn’t help. And of course nobody has posted anything about it on the net. (Or my google-fu is not wise enough to locate the proper page.)

So now it’s kind of become a game; if the light comes on, hold the reset switch in (while driving) to see if there’s a stretch of road smooth enough that the tire pressures will be constant over the 5 or so second period that the system measures the pressures. I’ve had it happen a couple of times, which is surprising given the number of potholes in this town.

At some point someone may figure it out; meanwhile, the light taunts us in its yellow glowingness.

[Update 12/2006]: As it turns out, the issue was the valve stem in one of the tires. Apparently this system works by having special valve stems with a battery and a transmitter that send the pressure data to the car. When a battery dies or one of the stems goes bad, you (or the tire shop) can replace it. In our case, the tires wore down, we had to get new ones, and (knock on wood) the light hasn’t been on since.

May 22, 2006

Rack mounting a Playstation 2, part 2

Filed under: Computer joy — dalvenjah @ 3:37 pm

In which rack mounting a PS2 is finished.

Rack mounting a Playstation 2, part 1

Filed under: Computer joy — dalvenjah @ 3:36 pm

I’m not sure exactly whether anyone will find this useful anymore, but I have the pictures, so might as well put them to use.

A while ago we had a use for some Playstation 2 consoles running Linux. Nothing fancy (it didn’t even take advantage of the special chips and such in the system), but they worked.

Sony used to sell a Linux kit for the PS2 (it was recently discontinued). It came with the network adapter, a hard drive, mouse, keyboard, VGA adapter, and a Linux boot CD. (You supply the PS2 and memory card.) Since we had more than a couple, we decided to rack mount them.

Middle Atlantic sells custom rackmount shelves to fit almost any product; they’re the company custom home theater installers use to get form-fitting rackmount shelves for almost any electronic equipment out there. If they don’t have a template in stock for your item, you can ship it to them (insured) and they’ll measure it and make you a custom shelf. We got ours locally, but SmartHome carries them too. Note that we got the clamp option; without the two clamps that go above the PS2s, they’ll slide around a bit every time you try to do something to them.

May 7, 2006

why can’t computers just get along?

Filed under: Computer joy — dalvenjah @ 6:01 pm

So here’s my current general frustration. Computers just can’t get along.

I have a Mac running iTunes. I also have an AirPort Express. The AirPort has an S/PDIF digital audio output, which I hook into my Denon receiver. The Denon receiver has an ethernet port; last year, there was a press release saying that “Real Soon Now” there would be a firmware upgrade so that the Denon could receive Windows Media (or something) broadcasts, and that they were in talks with other device manufacturers to support the same thing.

May 5, 2006

Maya, debug output, and NFS

Filed under: Computer joy — dalvenjah @ 6:05 pm

So in the course of doing some troubleshooting on a file server slowdown, I discovered the following bit of wisdom. Others probably already know it, but I hadn’t put it all together before now.

If you’re using Maya on an NFS server, turn off debug output and other verboseness.
