Matt Pson - Just another dead sysadmin blog

27 Aug 2011

NFS and Debian Squeeze followup

I previously wrote about some troubles with using a server running Debian Squeeze (6.0) with NFS. I'm now inclined to think that either the problem has been fixed (a clean install of 6.0.2 shows none of those problems) or that it was related to using an old server that was probably installed back in the Woody days and just 'apt-get dist-upgrade'd from there. The fact that the server worked well in general when upgraded from Woody to Squeeze, via Sarge, Etch and Lenny, is a great testament to how well Debian works imho.

18 Jul 2011

Proving a point – building a SAN (part 4 – the neverending story?)

Finally, after weeks and weeks of waiting, all the parts for my SAN project had arrived. I unpacked the last cables, installed them, and they sure did work just fine. All 16 disks jumped online and agreed to my configuration. Victory!

(Well, one of the Seagate Constellation 500GB 2,5" disks failed quite early when I started to transfer some data onto the filesystem. No way I would wait another couple of weeks for Cisco to ship me a new one, so I went by the local PC dealer and picked up a Seagate Momentus 500GB which worked perfectly.)

I decided to spread the disks evenly among the two physical controllers and their channels, so I got:

Card 1, channel 1: 1 SAS disk (system) and 3 SATA (storage pool)
Card 1, channel 2: 1 SSD disk (zil) and 3 SATA (storage pool)
Card 2, channel 1: 1 SAS disk (system) and 3 SATA (storage pool)
Card 2, channel 2: 1 SSD disk (zil) and 3 SATA (storage pool)

Brilliant, now to install NexentaStor CE and make some zpools. I was happy to see that during my wait for the hardware a new version, 3.0.5, had been released, which hopefully addressed some of the evil bugs I had read about.

I set up my zpool with two mirrored log devices and two raidz2 vdevs of 6 disks each, to be safe even if more than one disk fails later on. The end result looks like this:

config:

        NAME         STATE     READ WRITE CKSUM
        stor         ONLINE       0     0     0
          raidz2-0   ONLINE       0     0     0
            c0t1d0   ONLINE       0     0     0
            c0t2d0   ONLINE       0     0     0
            c0t3d0   ONLINE       0     0     0
            c1t10d0  ONLINE       0     0     0
            c1t5d0   ONLINE       0     0     0
            c1t6d0   ONLINE       0     0     0
          raidz2-1   ONLINE       0     0     0
            c0t4d0   ONLINE       0     0     0
            c0t8d0   ONLINE       0     0     0
            c0t6d0   ONLINE       0     0     0
            c1t7d0   ONLINE       0     0     0
            c1t8d0   ONLINE       0     0     0
            c1t9d0   ONLINE       0     0     0
        logs
          mirror-2   ONLINE       0     0     0
            c0t0d0   ONLINE       0     0     0
            c1t4d0   ONLINE       0     0     0

errors: No known data errors
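
For reference, a pool with this exact layout could be created from the command line with something along these lines (device names taken from the status output above; just a sketch of the equivalent zpool command, not necessarily how I set it up):

# two 6-disk raidz2 vdevs plus a mirrored SSD log device
zpool create stor \
    raidz2 c0t1d0 c0t2d0 c0t3d0 c1t10d0 c1t5d0 c1t6d0 \
    raidz2 c0t4d0 c0t8d0 c0t6d0 c1t7d0 c1t8d0 c1t9d0 \
    log mirror c0t0d0 c1t4d0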


Sure, there is a lot of disk space 'wasted' with this configuration, as I ended up with approximately 3.5TB of usable space from 6TB (12x500GB) of disks. But better safe than sorry when it comes to data; in this case the data would be virtual disks for a bunch of servers, and if there is one thing I hate it's sitting up in the middle of the night restoring terabytes of data in an emergency (it's a fact that disks never break during daytime).
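
The maths roughly checks out: each 6-disk raidz2 vdev gives up two disks to parity, so 2 x 4 x 500GB = 4TB of raw data capacity, which lands around the 3.5TB usable once filesystem overhead and the GB-vs-GiB difference are taken into account.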

Enough lab testing, everything seemed to work just fine. Down to the datacenter to rack the server so it could be put into pre-production before my summer vacation in a couple of days. I took the 4 gigabit ports and made a nice LACP aggregate of them so that no single NFS client could eat up all the network bandwidth.
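
Link aggregation on the Solaris kernel underneath NexentaStor is handled by dladm; a minimal sketch of an LACP aggregate, assuming the four ports show up as igb0-igb3 (the interface names and the address here are assumptions):

# bundle the four gigabit ports into one LACP aggregate and plumb it
dladm create-aggr -L active -l igb0 -l igb1 -l igb2 -l igb3 aggr1
ifconfig aggr1 plumb 10.10.10.5 netmask 255.255.255.0 up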

Made an NFS share, mounted it in our VMware ESXi cluster and transferred some non-critical VMs to the new SAN. And...

"Damn! That's fast..."

The difference when using a proper hardware setup (read: SSD as ZIL in a zpool) was like night and day compared to our old, SSD-less, Sun 7210. The average write latency reported in VMware was in the single digits, compared to the three-digit numbers we were used to.

Edit: now, a few weeks later, we have done even more tests and even transferred some critical systems to the new SAN, and it just works without a single hiccup so far (knock on wood). Next up is to transfer all data off the Sun 7210 system so we can upgrade the firmware and re-install it with the latest updates.

7 May 2011

Proving a point – building a SAN (part 2)

So, let's make a little update on the last post, as I have now put together (and ordered) a bunch of hardware in order to build a better SAN than my last experiment. This is what I ended up with:

  • Cisco C210 M2 server with 24GB RAM ( a nice 2U server with 16 x 2,5" disk slots, starting off with one 2,4GHz quadcore Intel E5620 CPU )
  • an Intel quad gigabit network card ( I plan to aggregate these ports into a 4Gb NFS port using LACP )
  • two LSI MegaRAID SAS3081E-R RAID cards ( to get 16 channels to connect disks to )
  • 2 x 146GB 10000rpm SAS disks ( mirrored for the OS )
  • 2 x 120GB OCZ Vertex 3 Max IOPS SSD ( mirrored for the ZFS log )
  • 12 x 500GB 7200rpm SATA disks ( to create the storage pool )
  • NexentaStor ( storage OS based on Solaris )

That would be it. It will probably be a few weeks until all the parts arrive, which gives me time to think about the setup.

When it comes to disk layout I'm still deciding between two options. The first is 6 mirror sets of 2 disks each ( effectively a raid-10, giving a total of 3TB usable disk but with what should be the best possible performance; creating the mirrors across controllers, disk 1 on controller one mirrored with disk 1 on controller two, would also give quite high resilience to failing disks - if the 'right' disks (or a controller) fail it could in theory survive 6 disk failures ). The second is 11 disks in a raidz with one spare ( like raid-5, giving a total of 5TB raw disk but with less resilience to faults and the usual write penalty that comes with stripes/distributed parity ). It all depends on whether the SSD log disks are enough to achieve good write performance, which I'm told they will be. The two candidate layouts are sketched below.
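
To make the two options concrete, they would look roughly like this (hypothetical device names, just a sketch):

# option 1: striped mirrors ("raid-10"), pairing disk N on controller one with disk N on controller two
zpool create stor mirror c0t1d0 c1t1d0 mirror c0t2d0 c1t2d0 mirror c0t3d0 c1t3d0 \
    mirror c0t4d0 c1t4d0 mirror c0t5d0 c1t5d0 mirror c0t6d0 c1t6d0

# option 2: one 11-disk raidz plus a hot spare
zpool create stor raidz c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
    c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 spare c0t6d0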

The log disk, the mirrored SSD, is another thing I'm pondering, as they come in a 120GB size and I guess that ZFS will use only a fraction of that. Some tell me that partitioning each disk into two partitions and using one mirrored pair of partitions for the write log and the other pair of partitions for read cache is a viable solution. I wonder if that will even be needed - our problems so far have not been about read performance, just write performance.
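
If I go down that road it would look something like this (hypothetical slice names; worth noting that while the log devices can be mirrored, cache (L2ARC) devices are just added individually - ZFS does not mirror them):

# slice 0 on each SSD as a mirrored ZIL, slice 1 on each as read cache
zpool add stor log mirror c0t0d0s0 c1t4d0s0
zpool add stor cache c0t0d0s1 c1t4d0s1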

One person even told me that since I'm using SSD drives with MLC technology I could partition each of them into four 30GB partitions, creating an 8-way mirror for the ZFS log in order to cover for cell failures on the disks. I'm not that convinced, as that would lead to 4 times more write operations, and if a disk fails I still have to replace the whole disk.

Well, well, when the hardware arrives I'll have to do some testing I think. I'll write some updates when I have something to report 🙂

28 Mar 2011

Using a Debian server as NFS storage for VMware ESXi

When installing a new VMware environment at work I got the idea of not using expensive SAN disks to provide various ISO files for installing virtual machines or software. In my environment I have access to several physical Linux servers running Debian (Squeeze), so why not use one of those for a (read-only) share of ISOs via NFS, especially since we have way too much unused storage in most of those servers anyway.

After picking a suitable server and scp'ing some ISO files to a directory, it was time to install an NFS server. Really easy, just:

# apt-get install nfs-kernel-server

and after a few additional packages were pulled in, there it was, ready to use. A quick edit of the /etc/exports file to share the directory /export/iso as read-only to hosts on my backend ESXi network:

/export/iso         10.10.10.0/24(ro,sync,no_subtree_check)

and reloading the NFS server to enable the new config. Brilliant, that should be all (I thought).
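
For the record, the reload itself is just a matter of making the server re-read /etc/exports, something like:

# exportfs -ra

or a plain reload of the nfs-kernel-server init script.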

Trying to mount the NFS share on one of my ESXi hosts failed with a quite cryptic error message ("Unable to connect to NAS volume mynfs: Unable to complete Sysinfo operation. Please see the VMkernel log file for more details").

Ok, a quick look at the ESXi logs produced another message: "MOUNT RPC failed with RPC status 9 (RPC program version mismatch)". That's pretty odd. I know ESXi wants to use NFS version 3 over TCP, but I also know that Linux has been NFSv3 capable for a long time now. Time to see what the Linux server thinks about this. Glancing at the logfiles on the Linux side gave nothing useful. An rpcinfo maybe?

$ rpcinfo -p  | grep nfs

100003    2   tcp   2049  nfs
100003    3   tcp   2049  nfs
100003    4   tcp   2049  nfs

So I clearly see NFS version 3 over TCP up there. Something else must be wrong, so I checked all the configuration again to make sure there wasn't any weird stuff anywhere. Mounting the share from another Linux server worked (of course), so the basic functionality was there. Insert some headscratching here and a new cup of tea.

After looking at the output of "nfsstat" I realized that the NFS mount by the other Linux server generated statistics under "Server nfs v2" and not under "Server nfs v3" (all zeroes there). So the Linux-Linux mount used version 2 and not 3. Why? A "ps auxwww" later I noticed that the mountd process looked quite odd:

/usr/sbin/rpc.mountd --manage-gids --no-nfs-version 3

So mountd has version 3 disabled? No configuration I saw included that "--no-nfs-version 3" option so it had to come from somewhere else. A quick read of the /etc/init.d/nfs-kernel-server file gave the answer:

$PREFIX/bin/rpcinfo -u localhost nfs 3 >/dev/null 2>&1 ||
    RPCMOUNTDOPTS="$RPCMOUNTDOPTS --no-nfs-version 3"

The startup script looks for NFS version 3 over UDP (the -u flag) and if it doesn't find it, it adds the "--no-nfs-version 3" flag. The "rpcinfo" command above did not list any NFS over UDP so it would always add that flag.
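
In other words, the fix is to make that check look for TCP instead. With the '-u' swapped for a '-t' the line reads:

$PREFIX/bin/rpcinfo -t localhost nfs 3 >/dev/null 2>&1 ||
    RPCMOUNTDOPTS="$RPCMOUNTDOPTS --no-nfs-version 3"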

So, that quick edit of the /etc/init.d/nfs-kernel-server file done, followed by a restart of the NFS service via:

# /etc/init.d/nfs-kernel-server restart

and voila! - it works. The ESXi host mounted the directory without a problem. Problem solved.
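
A quick way to double-check, by the way, is to ask the RPC service directly whether mountd now answers for version 3 over TCP:

$ rpcinfo -t localhost mountd 3

which should report the program as ready and waiting instead of an error.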

So in short, Debian Squeeze provides an NFS version 3 service over TCP but always seems to disable it in the start-up script when there is no such service over UDP. I guess there is a reason for that somewhere, but it would have been nice to see a bit more flexible start-up script; it would certainly have saved me some time.

Edit: when googling the above problem I came across posts mentioning DNS problems and system clocks that were out of sync, so I guess in theory those things can cause trouble too.