the lazy cheater’s guide to solaris

October 14th, 2008

Yes, it’s true. I haven’t blogged for a LONG time. Sometimes I find blogging to be a harmless way to vent my annoyances, and I guess I haven’t been all that annoyed lately.

Well this week I started using Solaris.

We are getting set to test LINA on some new platforms - branching out into other ’nixes, starting with BSD and Solaris. So I’m setting up test machines, and I’m thinking it’s all UNIX, right? I mean, how hard can it be?

To digress a little… This blog is addressed to a very small audience. As far as I can tell, lazy cheaters simply don’t use Solaris, unless, like me, they find themselves thrown together with it through some cruel twist of fate. If that is the case for you, you may find the following tips useful, if you’re running OpenSolaris 2008.05 anyway.

Installing OpenSolaris is actually quite pleasant, a huge step forward in usability from a Solaris 9 install I did a couple years back. A few nice click-throughs and it’s up, with a pretty Gnome interface and everything. I’m amazed. I begin to wonder if Solaris could possibly have become a lovely GUI-guided experience with plenty of back-patting and hand-holding. Would I be forced to retract all the snarky comments I’ve made lately about the non-usability that seems to permeate all things Sun?

So I go to set up the network.

Exploring the drop-down menus, I find System/Administration/Network. Cool! I click on it, and it gives me this:

[Screenshot: Solaris network administration message]

Okey dokey. I like the handy link, especially useful because the whole point is that networking is actually NOTworking. So I guess I need to figure out what NWAM is, or at least how to disable it. This brings us to our first lesson:

  • Starting and stopping services
    Yes, there is a menu item called “Services” under System/Administration. Unfortunately, it does not appear to offer a complete list of services. I could find nothing that might be our friend NWAM. So the next stop is /etc/init.d. This directory is oddly vacant, save for some legacy scraps and, luckily, a README file. This is the first breadcrumb on the trail. And here, in a nutshell, is where the trail leads.
    To see all services and their status, type:
    svcs -a
    Aha! There’s nwam, as svc:/network/physical:nwam. So how do we temporarily disable it, whatever it is, just until the next boot? Simple. Just type:
    svcadm disable -t network/physical:nwam
    And there you have it!
    Now, going back to System/Administration/Network, we get:
    [Screenshot: Solaris network settings GUI]
    Sweet! A few clicks and I have the network up and running in DHCP mode.
    And then I go to test it. I type ‘ping openlina.com’ at the command line. And I get: ‘ping: unknown host openlina.com’. Drat!

  • Oh, you wanted DNS with that network?
    As a first shot at debugging, I try ‘ping 209.85.26.3’. Success! So it’s just not resolving domain names. I guess that kind of thing is considered a bonus in Solaris land. What, don’t you know the IP address? You must be a lazy cheater.
    Why, yes I am. So I figured out how to make DNS work, as a service to my fellow sloths:

    1. As root, edit the file /etc/nsswitch.conf
    2. Change the line that says ‘hosts: files’ to ‘hosts: files dns’.

    And there you have it. No need to even restart the network. See how friendly Solaris is!

  • Installing packages
    So by now you may have noticed that the only available console text editor is vi. If you’re a lazy cheater like me, you may find this unacceptable. So you open the package manager by going to System/Administration/Package Manager and hunt around a little. The only available text editor is something called SUNW-gnome-text-editor, which turns out to be gedit. Yuck. What if you want something simple and friendly, like nano? Are more repositories available?
    The answer lies at sunfreeware.com. Clearly by and for geeks, this website is nonetheless home to all manner of interesting and useful little Solaris apps. To find the right ones, just choose your Solaris version and hardware from the list on the right. So can this repository be added to the Package Manager? I couldn’t find a way to do this, but perhaps someone with a longer attention span could. Luckily, it turns out that installing packages by hand is not quite rocket science, once you know how. This is how it’s done:

    1. Select the package you want from the download page for your system. The relevant page for Solaris 10 on Intel is www.sunfreeware.com/indexintel10.html. Warning: do NOT try to make sense of this page. Simply observe that there is a list of applications at the right. Choose one, say nano-2.0.9, and click on it.
    2. Now click on the link that appears at the top of the page and download the corresponding gzipped package file to your desktop. In the case of nano, the file is nano-2.0.9-sol10-x86-local.gz.
    3. Open a terminal and go to your desktop by typing “cd /export/home/username/Desktop”.
    4. Uncompress the package by typing “gunzip nano-2.0.9-sol10-x86-local.gz”.
    5. Install by changing to root (“su”) and typing “pkgadd -d nano-2.0.9-sol10-x86-local”.
    6. Now type “nano” at the command line and note that you get the response “bash: nano: command not found”. This is because the default sunfreeware install location is /usr/local/bin, and this is not on your path.
    7. Remedy this by editing the file /export/home/username/.bashrc. Add ‘:/usr/local/bin’ to the end of the line that starts with ‘export PATH’. After you do this, close your terminal window and open a new one so that the new path takes effect.
    8. Now type “nano” at the command line and note that it complains that ncurses is not installed. Remedy this by downloading and installing ncurses-5.6 from sunfreeware.com, as above.
    9. Now, type “nano” at the command line, and it will work!
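If you find yourself doing all this on more than one test machine, the service lookup and the two file edits above are easy to script. Here’s a minimal Python sketch, assuming the ‘svcs -a’ output has the STATE/STIME/FMRI columns we saw, that /etc/nsswitch.conf has a plain ‘hosts:’ line, and that the PATH line in .bashrc is unquoted. The function names are my own hypothetical helpers, not anything Sun ships. The sketch only transforms text, so you’d still need root to write /etc/nsswitch.conf and to run svcadm.

```python
"""Sketch: script the Solaris steps from the post (hypothetical helpers).

Assumes OpenSolaris 2008.05-style files and command output. These
functions only transform text; writing the real files or running
svcadm still needs root.
"""
from typing import List, Tuple


def find_services(svcs_output: str, keyword: str) -> List[Tuple[str, str]]:
    """Return (state, FMRI) pairs from 'svcs -a' output whose FMRI contains keyword."""
    matches = []
    for line in svcs_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "STATE":  # skip header and blank lines
            continue
        state, fmri = fields[0], fields[-1]
        if keyword in fmri:
            matches.append((state, fmri))
    return matches


def disable_cmd(fmri: str) -> List[str]:
    """Command that disables a service until the next boot (-t = temporary)."""
    return ["svcadm", "disable", "-t", fmri]


def add_dns_to_hosts(nsswitch_text: str) -> str:
    """Turn 'hosts: files' into 'hosts: files dns' (no-op if dns is already listed)."""
    out = []
    for line in nsswitch_text.splitlines():
        if line.startswith("hosts:") and "dns" not in line.split()[1:]:
            line = line.rstrip() + " dns"
        out.append(line)
    return "\n".join(out) + "\n"


def add_to_path(bashrc_text: str, directory: str = "/usr/local/bin") -> str:
    """Append ':directory' to the unquoted 'export PATH=...' line."""
    out = []
    for line in bashrc_text.splitlines():
        if line.startswith("export PATH") and directory not in line.split(":"):
            line = line.rstrip() + ":" + directory
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "STATE          STIME    FMRI\nonline         10:01:05 svc:/network/physical:nwam\n"
    for state, fmri in find_services(sample, "nwam"):
        print(state, " ".join(disable_cmd(fmri)))
    print(add_dns_to_hosts("hosts: files\n"), end="")
```

Both edit helpers check first whether the change is already there, so running them twice on the same file is harmless.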

See, Solaris is not that bad!

impatience never pays

July 12th, 2008

I guess we just need to accept the fact that the last hours of testing always stretch into days. At least this time each of us has actually gone home to sleep occasionally, between naps on the floor.

We have sixteen operating systems to support, and every single freaking one is quirky. We’d taken two small testing shortcuts these past weeks. First, we’d been doing our main testing on just one of each of the main OS flavors: XP representing Windows, Leopard Intel representing Mac and Fedora7 standing up for UNIX. Second, we’d been testing hand-built installers, in order to speed up the turn-around time between bug discovery and bug stomp. Rebuilding from scratch takes at least a half day, while patching code is nearly instantaneous. We were eager to keep moving forward. No time for details like testing OS variants or building out the fixes.

And oh boy, did we pay for our impatience.

We expected glitches when we expanded to the full spectrum of OS’s. Some of the OS variants are puzzlingly unique when it comes to permissions, services and supported packages. Some are downright diabolical. We already knew that. Going from testing with our three little friends to including the whole mangy lot is like inviting the whole extended family over for dinner. It’s invariably messy, and it always goes on too long.

The installers generated by the build system should have been identical to the hand-built ones, and they almost were. There were only about two or three features that didn’t quite make it into the build system intact. Yeah, we’ve been doing continuous test builds, and yeah they were working to the extent that they were spitting out installers every day like chickens laying fat little eggs, but we were just watching those eggs fall into the basket, never noticing that they had started to get a tiny bit scrambled. Because of the 12-24 hour lag time between fix and build completion there were always fresher fixes to be fried up. And now we’re paying, with a 12-24 hour wait as we test each missing feature. Sure would have been nice if we’d been verifying the built installers as we went along.

Important testing note: Always test what you actually plan to use. At least occasionally.

Ok, better go check on those chickens.

I’m going to be SO happy when this is done.

stickiness

July 5th, 2008

I’m reading a very good book right now: Made to Stick by Chip and Dan Heath. You really ought to read it yourself, so I won’t go into much detail, but the basic premise is that if you frame your idea like an urban legend or a proverb, people will understand it, and spread it.

Which ideas become part of the common unconscious? What does everyone just know? Let’s see…

  • An apple a day keeps the doctor away.
  • Don’t take candy from strangers.
  • Campbell’s chicken soup is mmm mmm good.
  • If you’re throwing out the bathwater, check it for babies first.

What do these ideas have in common? Short, sweet, nice rhythm. A child could easily make them into colorful posters.

So the big question is, how do we do this with LINA?

We’ve got a tough one. LINA’s silent and invisible. It will mean different things to different people. To a software developer it will be a revolutionary breakthrough. To the head of an IT department it will be better than aspirin. To my mom it will be this thing she clicked on to make other things on her computer work.

What’s our sound-bite? Our tag-line? Our elevator pitch?

Invisible system-hopping super-bunnies? That lay eggs and grow wool and give milk? That future-proof your business while you are sleeping?

We’re working on it. Stay tuned.

release watch

June 17th, 2008

Release-week has actually become somewhat routine now. Yeah, we all sleep less and make stupider jokes. But really we’re getting pretty good at the whole thing. Now that the xml-based test system is clunking along fairly robustly, I can largely administer the builds from home. Which means Nile can call me at three A.M. with some fixes checked in, and I can drag myself out of bed and type a few characters and have the whole thing humming along again. No tougher than feeding an infant. In fact a lot less messy. And definitely preferable to our old test-week routine, where we took turns napping and pacing, with an occasional break for a groggy consultation followed by mad typing.

It helps that we’ve been testing lazily in the background during this whole release cycle, banging the kinks out of both the test system and the code itself. This time around we’re entering the formal testing period armed with a robust build machine. When all goes well, we can wander in in the morning and pluck its fruits for hand-testing. This leaves us a lot more time and brain space for other efforts, like rigorous usability testing, business-plan writing, and website design. Not to mention haiku writing for the forum.

Mainly we’re fueled by excitement over this release. Everything’s starting to work nicely, rather than just limping along. We’re very much looking forward to letting this one out of its cage.

long weekend

March 12th, 2008

Wow, it’s Wednesday already.

We’ve had a long weekend here at the lab. Nile and Frog the dog have been mainly sleeping here, on the floor, warmed by the heat of our pounding and churning multi-brained test machine. I keep forgetting to bring a blanket from home. Paul and I have escaped occasionally to attend to domestic situations, but we have been on 24 hour call to network in and start or stop a test, examine results, hash out solutions, check in fixes, and get it all going again. We discovered a missing dependency on several disk images late Saturday night, and it was much easier to go in to the lab to examine the images than to try to check them all remotely. Networking in from home is like working in flatland - like peeking at the three-dimensional landscape of computers and workstations through a single line of characters on my screen. It works, but my late night mind needs more tangible objects to grab onto. Besides, it’s like a round-the-clock slumber party for geeks and dogs here. I hate to miss any of the fun.

Our big setback last week was discovering that after implementing all conceivable optimizations, doing an entire LINA build on a virtual OS takes 18-24 hours, even on our super-speedy Mac Minis with 2 Gig of RAM. Major monkey-wrench in our plans. We spent Friday ripping apart the build so that the host-dependent part could be tested independently of the LINA internals. That way we can do a full build on bare metal while simultaneously doing “host-only” builds on our whole zoo of virtual OS’s. This allows us to catch any platform-specific errors during the three-to-six hour virtual host-only builds, while the full build runs at the same time, testing the LINA end of things. The full build produces a LINA bootstrap disk that the host-only builds then need to retest. So that’s one level of iterativity. Then the platform-specific builds produce installer templates and platform-specific LINA install binaries, that then need to be tested on clean hosts, calling for a second iteration.

If anything breaks during the testing rounds, we fix it, then go back to square one testing-wise, to make sure our fix didn’t break something else.
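The two rounds of retesting, plus the back-to-square-one rule, are easier to see as a dependency graph. Here’s a minimal Python sketch (3.9+, using the standard library’s graphlib) of the stage ordering as I’ve described it. The stage names are hypothetical labels, and run_stage is a stand-in for whatever actually kicks off a build or test and reports pass or fail.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Optional, Set

# Hypothetical stage names; each maps to the stages it depends on.
STAGES: Dict[str, Set[str]] = {
    "full_build": set(),                 # bare metal, produces the bootstrap disk
    "host_only_builds": set(),           # virtual OS's, runs alongside the full build
    "retest_with_bootstrap": {"full_build", "host_only_builds"},  # first iteration
    "clean_host_tests": {"retest_with_bootstrap"},                # second iteration
}


def run_pipeline(run_stage: Callable[[str], bool], max_rounds: int = 5) -> Optional[int]:
    """Run stages in dependency order; any failure sends us back to square one.

    Returns the number of the round that passed completely, or None.
    """
    for round_no in range(1, max_rounds + 1):
        sorter = TopologicalSorter(STAGES)
        sorter.prepare()
        passed = True
        while passed and sorter.is_active():
            # Every stage in one ready batch could run in parallel on separate hosts.
            for stage in sorter.get_ready():
                if not run_stage(stage):
                    passed = False  # stop this round; the next round starts from scratch
                    break
                sorter.done(stage)
        if passed:
            return round_no
    return None
```

In real life a fix gets checked in between rounds. Here that’s just the callback returning True the next time it is asked about the stage that failed.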

You know, when you have one or two computers, it’s like having one or two children. With a bit of work and attention you can pretty much keep them clean and healthy. Yesterday, standing there surrounded by my ten test and development machines and dozens of virtual operating systems in various stages of LINA development, some sleeping, some awake, some needing to be registered with Microsoft, some humming along happily, some in need of serious first-aid, I had a surreal feeling that I was the matriarch of one of those enormous Irish Catholic families.

At any rate…

I think this is our final test round. Taking a break now to write some web content. Looking forward to getting this out there.

public service announcement #1

March 4th, 2008

During these recent long dark nights of my technological soul, the internet has been my best friend. Google has been my shining beacon of hope, illuminating the technical info-nuggets that have kept me on the precarious path of progress that rims the abyss of impossibility.

In the interest of giving back to the community, I want to share a few of the nifty tips that I have discovered, in hopes of guiding a future traveler whose path intersects with the one along which I’ve recently trudged.

So herewith, some nifty tricks that I wish I’d found when I was frantically googling:

remote run vmware
VMware comes with an awesome user interface - undoubtedly what most people use when they want to run a virtual operating system. Which is why it’s even more impressive that VMware comes with an equally awesome mechanism for running a virtual operating system without a user interface. This is a crucial ability if one is using ssh to methodically test software on a variety of virtual operating systems running on a network of real computers, since ssh chokes hard on graphical output. To make it work, simply type:

vmrun start virtualosname.vmx nogui

This gets the operating system virtualosname.vmx up and running in the background, and it does so very politely, with no squawking whatsoever about moved images or missing sound cards.
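Since the whole point is driving guests over ssh, here’s a minimal Python sketch of that pattern. It builds the same ‘vmrun start ... nogui’ command shown above and optionally wraps it in ssh to a remote host. By default it only prints the commands (dry run). It assumes vmrun is on the remote host’s PATH, and the host name and .vmx paths in it are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional


def vmrun_start_cmd(vmx_path: str, ssh_host: Optional[str] = None) -> List[str]:
    """Build 'vmrun start <vmx> nogui', optionally to be run on ssh_host."""
    cmd = ["vmrun", "start", vmx_path, "nogui"]
    if ssh_host:
        # ssh hands the remote side a single command string, so quote each part.
        cmd = ["ssh", ssh_host, " ".join(shlex.quote(part) for part in cmd)]
    return cmd


def start_headless(vmx_paths: List[str], ssh_host: Optional[str] = None,
                   dry_run: bool = True) -> List[List[str]]:
    """Start each guest in the background; with dry_run, only print the commands."""
    cmds = [vmrun_start_cmd(p, ssh_host) for p in vmx_paths]
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds


if __name__ == "__main__":
    # Hypothetical host and image paths.
    start_headless(["/vms/debian4/debian4.vmx", "/vms/xp/xp.vmx"], ssh_host="testhost1")
```

The shlex.quote step only matters when a .vmx path contains spaces or shell metacharacters, but those turn up surprisingly often on Windows-created images.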

Then one just needs to figure out how to communicate with an operating system that’s so undercover that it won’t even reveal its own IP address. In all fairness, vmrun does come with some nifty tools. You can see them listed by typing “vmrun” without arguments on the command line. But I couldn’t get them to work right away, and I have a short attention span. Plus sometimes you just want to ssh on there and muck around with things. Which brings us to the next part:

vmware network ssh host port mapping
When VMware is installed on a new host machine, it chooses a block of private IP addresses for host/guest communication. As far as I can tell, it does this arbitrarily. It has not yet chosen the same block of addresses on any machine I’ve set it up on. To find out which address block has been assigned to VMware on a given host, you can run ipconfig on Windows or ifconfig (usually sudo /sbin/ifconfig) on UNIX. If you’ve set up VMware according to defaults, the output of this program will include an entry for vmnet8. On this particular computer, this entry shows inet addr:192.168.150.1. When a virtual operating system is set up using the NAT network configuration, this is the network address that the guest operating system can use to connect to the host.

But what we really want to know is how the host can connect to the guest OS.

It turns out that VMware defaults to assigning the address range from 128 to 254 to its DHCP server. In the case above, the addresses assigned by VMware to its guests would range from 192.168.150.128 to 192.168.150.254.
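If you have a whole network of hosts to survey, this lookup and arithmetic are easy to script. A minimal Python sketch follows, assuming Linux ifconfig output and a /24 vmnet8 subnet, which is what we’ve seen so far. The regex accepts both the older ‘inet addr:’ format and the newer bare ‘inet’ format. The function names are hypothetical.

```python
import ipaddress
import re
from typing import Optional, Tuple


def vmnet8_host_address(ifconfig_output: str) -> Optional[str]:
    """Return the IPv4 address listed under the vmnet8 interface, if any."""
    # Interface blocks start at column 0; their detail lines are indented.
    for block in re.split(r"\n(?=\S)", ifconfig_output):
        if block.startswith("vmnet8"):
            match = re.search(r"inet (?:addr:)?(\d+\.\d+\.\d+\.\d+)", block)
            if match:
                return match.group(1)
    return None


def default_guest_range(host_addr: str, prefix: int = 24) -> Tuple[str, str]:
    """VMware's default DHCP range: .128 through .254 of the host's subnet."""
    net = ipaddress.ip_network(f"{host_addr}/{prefix}", strict=False)
    base = int(net.network_address)
    return (str(ipaddress.ip_address(base + 128)),
            str(ipaddress.ip_address(base + 254)))


if __name__ == "__main__":
    sample = ("vmnet8    Link encap:Ethernet  HWaddr 00:50:56:C0:00:08\n"
              "          inet addr:192.168.150.1  Bcast:192.168.150.255  Mask:255.255.255.0\n")
    host = vmnet8_host_address(sample)
    if host:
        print(host, default_guest_range(host))
```

For the example host address 192.168.150.1, default_guest_range gives the 192.168.150.128 to 192.168.150.254 range described above.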

Left to its own devices, VMware will assign its guests these IP addresses in order - each new virtual operating system gets a new IP address. This is all fine if we’re assuming that each virtual operating system stays put on its original host. A quick ifconfig or ipconfig on the virtual guest OS reveals its assigned IP address, and once this is known the host and guest can communicate like any two machines on a network.

But what if the guest/host relationship is not so carved in stone? What if the guests are routinely packed up and sent to new hosts? Is there any way to contact them programmatically, in a consistent manner, despite all the variability and arbitrariness of VMware’s network address assignment?

The answer is yes, and the implementation is two-pronged. First, narrow VMware’s range of allowed IP addresses on each host; then map an unused port on the host to the port of interest at the guest’s IP address.

  1. First set the networking configuration on the virtual operating system by selecting VM-Settings-Ethernet, clicking the “Custom” button and selecting “vmnet8”. Alternatively, on a Linux host, you can add the line ‘ethernet0.vnet = "/dev/vmnet8"’ to the .vmx file.
  2. Now narrow the range of IP addresses that VMware can assign to new guests. On a Linux host this is done by editing the file /etc/vmware/vmnet8/dhcpd/dhcpd.conf. On a Windows host the file is something like: C:\Documents and Settings\All Users\Application Data\VMware\vmnetdhcp.conf. In either case, simply edit the “range” line of this file to narrow the range to two IP addresses. On my machine, I changed this line from this:
    range 192.168.150.128 192.168.150.254;
    to this:
    range 192.168.150.128 192.168.150.129;
    This looks like it should allow VMware to assign both of the addresses in the range, but for some reason VMware doesn’t use the highest address. Therefore, with this configuration VMware will always assign the IP address 192.168.150.128 to guest OS’s running on this host. Which is exactly what we want!
    Remember though, that the IP address ranges used by VMware on different hosts are assigned in a seemingly arbitrary manner. All the guests on one host might be assigned the IP address 192.168.150.128, while on another they might be assigned 172.16.180.128. Still a bit of a problem if you want to connect to all your guests on each host in the same way.
  3. In order to make the guest OS’s on ALL our hosts accessible using the same ssh call, we reassign a particular port on each host to the ssh port on the IP address chosen above. This is done on a Linux host by editing the file /etc/vmware/vmnet8/nat/nat.conf, or on a Windows host by editing the file C:\Documents and Settings\All Users\Application Data\VMware\vmnetnat.conf. Inside the file, simply uncomment the line in the SSH section that begins with numbers, then change the four-digit port number to the port that you want to map to the guest SSH port. On my machine, the SSH section looks like:
    # SSH
    # ssh -p 4444 root@localhost (this line is just a nice clue from VMware - don’t uncomment!)
    4444 = 192.168.150.128:22

    This allows me to connect via ssh to any user on the guest via the port 4444 on localhost, using the command ssh -p 4444 username@localhost.
    Woohoo!
  4. Finally, restart the vmware service (on a Linux host type “sudo /etc/init.d/vmware restart”; on a Windows host you’re on your own). Or simply restart the machine. And there you have it.
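Since steps 2 and 3 have to be repeated on every host, here’s a minimal Python sketch of the two file edits plus the resulting ssh command. It works on file contents as text, and the helper names are hypothetical, not anything VMware provides. It assumes the dhcpd.conf ‘range’ line looks like the one in step 2, and it keeps a two-address range because of the highest-address quirk described there. You’d still need to write the files back as root and restart VMware as in step 4.

```python
import ipaddress
import re
from typing import List


def narrow_dhcp_range(conf_text: str, first_addr: str) -> str:
    """Rewrite each 'range A B;' line to 'range first first+1;'.

    VMware seems to skip the top address, so this pins guests to first_addr.
    """
    second = str(ipaddress.ip_address(first_addr) + 1)
    return re.sub(
        r"^(\s*)range\s+\S+\s+\S+;",
        lambda m: f"{m.group(1)}range {first_addr} {second};",
        conf_text,
        flags=re.MULTILINE,
    )


def nat_ssh_line(host_port: int, guest_addr: str, guest_port: int = 22) -> str:
    """The nat.conf port-mapping line, e.g. '4444 = 192.168.150.128:22'."""
    return f"{host_port} = {guest_addr}:{guest_port}"


def ssh_cmd(user: str, host_port: int = 4444) -> List[str]:
    """ssh into the guest through the mapped port on the host."""
    return ["ssh", "-p", str(host_port), f"{user}@localhost"]


if __name__ == "__main__":
    conf = "subnet 192.168.150.0 netmask 255.255.255.0 {\n\trange 192.168.150.128 192.168.150.254;\n}\n"
    print(narrow_dhcp_range(conf, "192.168.150.128"), end="")
    print(nat_ssh_line(4444, "192.168.150.128"))
    print(" ".join(ssh_cmd("username")))
```

The replacement uses a lambda rather than a backreference string so the leading whitespace is kept as-is and no digits from the address can be mistaken for a group number.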

Up next:

convert qemu disk image vmware
windows 2003 server network new dhcp
windows vista cygwin ssh

the twenty-eight day week

February 25th, 2008

It was the last week in January, nearly a month ago, and we were preparing for the imminent release of LINA 0.72.

It was down to details. We believed ourselves to be past the unknown unknowns. Nile had hacked together code that could run natively on Linux while generating installers on all the major platforms. Paul and I had a sweet little test system sketched out: seven real-live computers wired together and KVM’d into two workstations. Five host machines ruled by Zuul, the Gatekeeper, programmed to distribute LINA test-runs to our zoo of real and virtual operating systems; a machine dual-booting the two latest Mac flavors on i386; and the dual-booting Mac PPC machine waiting in the wings. That, plus our QEMU disk images in various stages of completion: Debian4, Fedora7, Fedora8, OpenSUSE 10.2, OpenSUSE 10.3, Ubuntu Feisty, Ubuntu Gutsy, Windows 2000, Windows 2003, Windows XP and Windows Vista, fully prepped with the LINA prerequisites, compressed from 30 Gigs into nice little 1-2 Gig tarballs, and camping out on our networked hard drive.

All was in order. In test-runs the convoluted scripts were delivering lovely false results files back to Zuul. The 0.72 release including the new install code was producing all kinds of actual GUI elements, running sweetly, if primitively, on the development machines. Yes indeed, a tiny bit more testing, and we should be good to go. LINA 0.72 would be out within a week. It was down to details.

And the long week began.

Day 1 - The Dell laptop host boots into blue screens and kernel panics after a package update. After hours of fsck-ing around with it, I pronounce the hard drive dead.

Day 2 - The first test run doesn’t quite loop right, eats its own brain, and leaves tarballs strewn across the hosts. I write a little cleanup routine.

Day 3 - All of the tests either fail outright or time out. This with the 0.71 release, known to work on seven of our platforms. I fix the passwordless sudo situation on three guests, pass host ssh keys to two other guests, and change the code to allow a long wait time during the kernel build.

Day 4 - Add wheel group to the Ubuntu images, thus REALLY fixing their passwordless sudo problem. Increase time-out again.

Day 5 - All-day business meeting, then research into the blue-screen problem with starting Vista on QEMU.

Day 6 - Paul sets up Good Old Dell with the new 120G hard drive. Installs Fedora7, whacks self on forehead then installs Fedora8. I struggle to figure out why we’ve only had one successful build so far.

Day 7 - Fit out Good Old Dell with the qemu packages (gcc3.4, kqemu, qemu) and the git packages (git and curl). Set up ssh. Open each disk image and pass it the revised authorized_keys file.

Day 8 - Attempt a day of rest. I come in to find out why the Good Old Dell build has disappeared from the radar. Turn off strict-host-checking in its ssh-config. Oops. Restart tests. Get in the car around dusk to go home. Car doesn’t start. Lots of time on the bus ride to think about improving test outputs.

Day 9 - Car has a broken timing belt. Still struggling to understand the lack of good builds. Set up scripts to preserve ALL output, with the option to preserve machine states as well.

Day 10 - Still struggling. Add a missing package to Ubuntu Gutsy and install kernel sources on Debian. Increase the time-out.

Day 11 - Another successful build - OpenSUSE 10.3 on a Mac Mini host. Which is weird because that image had hung earlier on a different host. Hmm.

Day 12 - Examining the results more carefully. Every Windows build has timed out. Every build on the HP Desktop and Good Old Dell has timed out. Hmm.

Day 13 - Another attempted rest day. Fooling around with getting Vista up on QEMU. Try translating an image built using VirtualBox, which kinda works except for nearly complete lack of hardware compatibility, including networking.

Day 14 - Board meeting and first round of installer usability testing. Begin exploration of the ACPI problem that prevents Vista from running on QEMU. Start another test run with even longer time-outs. Installer on Mac and Linux is sweet but still a tiny bit awkward. Nile takes it home to work through details again.

Day 15 - Recompile QEMU repeatedly, following clues from forums, working through different ACPI options. Still no luck with Vista. Starting to give up hope on Vista. Plan B, real hardware, may be the only choice. How barbaric.

Day 16 - Struggle to understand last night’s results. Still no completed builds on Windows. Bring home XP image to run it by hand on QEMU on my development laptop. Fedora7 has finally built, on a mini - after a timeout and a memory failure on other hosts. Hmm.

Day 17 - The XP build, virtualized on my own machine, appears to have hung. It trudged along for ten hours or so, but has made no progress in the last eighteen. On XP installed on hardware the entire build took about twelve hours. Cygwin on Windows on QEMU seems to be a disappearingly narrow bottleneck.

Day 18 - Still no completed builds on the slow HP desktop or Good Old Dell. It’s all starting to make sense. Start a local build of OpenSUSE 10.3 (known to be a good image) on Good Old Dell, with no time-out.

Day 19 - Finally get around to investigating the wxWidgets failure on the Mac Tiger build. Adjust script to copy libtoolize to system area. Still fails. Brainstorm with Nile and Google - seems like the latest Tiger update isn’t supported by our version of wxWidgets. Find a proposed fix to include in the current code.

Day 20 - Day of rest… Good Old Dell is still chomping away on the build. 36 hours is too long. The HP is even slower. Looks like we’re down to three host machines.

Day 21 - The super-speedy dual core 64-bit Dell server host has become suspect as well. Builds either hang or wind up with an alloc error. Are we maxing out QEMU’s memory at nearly 2 Gig? Is it some other problem? Tests are devised.

Day 22 - The Dell server has joined our collection of useless hosts. Sigh. Down to the two Minis as host machines. And still no good builds on Windows images, regardless of host. Meanwhile, the installer code is almost rock-solid. We really need to start testing the current code soon…

Day 23 - Paul and I lay it on the line - we need to do something fast. We can limp along with the two Minis for testing Unix images, but what about Windows? Better virtualization? Testing on bare metal? A bit of tense brainstorming, and we decide to do a trial run using VMware.

Day 24 - Pounding on VMware - nice interface, super speedy. The Debian image I convert runs with no graphical output, but the converted XP image doesn’t boot. Looks like we’ll need to re-create Windows images for VMware. Can VMware support the internal LINA virtualization? Tests are devised.

Day 25 - Building LINA on Vista on VMware from a script! Now if we can just get a handle on the seemingly randomly assigned IP addresses, and get VMware to stop quizzing us about relocated images…

Day 26 - Meanwhile, back at the lab, testing new code on Mac Tiger and the Linux distros.

Day 27 - Windows installer code complete, ready to test Windows builds by hand using VMware - saving the automation for next time!

Day 28 - Test system is up and running, installer code is finished. Just a tiny bit of bug-stomping and we’re good to go.

Within a week for sure.

the end of the networked hard drive story

December 7th, 2007

In German the word for it is “glimpflich”. We don’t have a word for it in English, so we express the concept by saying: “It couldda been a lot worse”.

I couldn’t make it back into the office the night I pulled the plug on the testbed, so I went to bed haunted by the knowledge that we had suffered a major setback. At best we’d lost our storage solution. At worst we’d lost the days of work that it had taken to create the disk images stored on that evil hard drive. Secure storage solution, hah! I fell asleep plotting redundancy - dual backups, triple backups, sending 10 Gigabyte compressed images to our remote server, stashing hard drives in bank vaults, burying DVDs deep in remote glaciers… I did not rest well.

Finally, well before dawn, I got up, made some tea, and tiptoed out of the house. It was a good time to be on the freeway. I was at the office within minutes. I opened the door and there it was, the Iomega, sitting mutely, all its lights shining brightly, smugly holding our data in its little cuboidal belly. I hated it.

I powered it off, waited two full minutes, then powered it back on. I tried to connect to it through the network - nothing at its IP address. I tried a USB connection - no response from any of our OS’s. Finally, as a last resort, I dug up the Iomega install disk with “Disaster Recovery Tools”, dusted off our Windows box, and inserted the CD. I selected Detect NetHDD, and thought, “OK, Go for it. Good luck.”

The slow old machine pondered for a full minute before returning the expected result:

No Networked Hard Drive Detected. Troubleshoot?

Fine, ok, let’s troubleshoot.

Troubleshooting:

1. Is the host computer connected to the network?

Oh. Right.

So I connect the network cable and try again. This time I swear I hear a heavenly chorus. The Iomega network interface pops up in its full glory. No errors or warnings, all data apparently present and accounted for. It only takes a little more mucking around to figure out that the device has apparently forgotten all of its connection settings. A few clicks, and it’s fully connected.

Amazing relief washes over me. The sun is just coming up, the world is new and beautiful. I set up a backup of all the Iomega data to our big-disked server machine, then walk to get fresh bagels and head home to start my real day. It’s been a long night.

[Photo: long night]

Much later, I go back to the office and “protect” our power supply with some electrical tape:

[Photo: the protected power supply]

Tomorrow we’ll buy an uninterruptible power supply, and set up a system to back up the disk images to our remote server.

interruptible power supply

December 6th, 2007

Paul and I have been working like maniacs to set up the hardware and software for our testing system. Maniacs who sleep and get fresh air from time to time, of course. We’ve got a whole little zoo of test machines - five “host” boxes running Fedora 8, three assorted Macs, and a slow old laptop for administering the builds. These are all nicely wired together, along with our 1 Terabyte Iomega StorCenter Pro networked hard drive that stores our virtual operating system images.

So anyway, yesterday we were going at it hard - all the machines set up and humming. Paul spent the day connecting hardware and setting up virtual operating systems while I pounded away at our Python scripts. We were making good progress, and the beginning of actual testing was in sight.

Late in the afternoon I got up to leave. And tragedy struck. Reaching down to unplug my laptop from one of the power strips neatly strewn about on the floor, I lost my balance just enough to put my hand down hard… right on the on/off switch to the power supply. In a moment of panic I immediately flipped the switch back on, effectively creating a nice little brown-out - exactly the best recipe for messing up electronics.

The first obvious effect of the incident was that Paul’s two nearly-finished virtual OS’s had vanished into the ether. He was very decent about it, but I felt awful. It wasn’t until I got home and checked my email that the real nightmare began. In my inbox was an astonishingly polite note from Paul, noting the unfortunate fact that the networked hard drive’s lights had come back on, but otherwise it was non-responsive. Perhaps our little collection of virtual images and archived data was still there, perhaps not. The lights were on, but no one was home.

A tiny bit of anguished googling brought no solace. It turns out that the Iomega StorCenter is not the most reliable piece of equipment, despite glowing reviews from new owners. Apparently Iomega utilizes the Hitachi Deskstar, aka “Deathstar” drive, which is notorious for crashing and eating data for no reason at all. Subjecting one to a brown-out was surely a very bad move. The most hopeful article I could find was one that provides instructions for possibly salvaging some data by removing the drives and physically mounting them on a host computer.

So the upshot, as of late last night, was that our data was clearly AWOL, and on top of that we were almost certainly no longer in possession of a useful networked storage solution. And the obvious moral of the story is that we need to start acting like grown-ups and protect our power supply.

I guess it had occurred to me once or twice that our power supply wasn’t quite optimally safeguarded. But, you know, we were going to fix it later, and meanwhile we were pretty careful, and the dogs had really mellowed out since they’d gotten more used to the new office, and we kept the power strips fairly well out of the main foot-traffic zone…

In case anyone wants to try it themselves, here are some simple methods for ensuring an interruptible power supply:

The manual method

The footal method

The canine method

Up next: the end of the story, and lessons learned.

peace and productivity through enlightened despotism

November 27th, 2007

OSCON is a monumental gathering of like minds. OSCON 2007 was the first such gathering I’ve been to, and I felt like a three-year-old at Disneyland - overstimulated and completely enchanted. It has been many, many months since it happened, but what I learned in a single session there has lingered and reverberated in such a way as to have a lasting effect on our work.

The first couple of days of OSCON are devoted to three-hour seminars. I arrived too late for the Monday morning session, but on Monday afternoon I heard Mark-Jason Dominus talk about Advanced Techniques for Parsing, a mind-blowing exploration of the power of abstracting out and codifying patterns. Parsing has been a passion of mine since I became smitten with AWK for DOS approximately an eon ago. The way Mark-Jason conjures fractal-like complexity from intelligent folding of simple code patterns was enthralling. Plus I picked up at least a dozen nifty Perl tricks. All in all, a lovely afternoon.

Lovely, but not quite life-changing. That didn’t happen until Tuesday morning. In choosing my tutorials and sessions, I decided to make sure I was picking up at least a bit of practical information, not just running around slurping up tasty bits of random knowledge. To that end, I chose a tutorial called “Technical Management of Software Development”. Not exactly the most thrilling title. I pictured some buttoned-down guy with a haircut droning on about best practices. But I had been having more than my share of fun, and I’d been thinking that someone around here really ought to learn a bit about how the big boys manage software teams. I decided to bite the bullet and try to sit through at least part of it.

So you can imagine how shocked and delighted I was to find myself three-and-a-half hours later, not just still sitting but on the edge of my seat, overflowing with excitement about Alex Martelli’s simple words about decency, kindness and common sense.

In my experience, “management” seems often to be about dehumanization, about setting up policy and structure in a way that is theorized to maximize output. The modern workplace seems much like the modern dairy farm, with everyone lined up and scheduled and cubicled. It’s all about production - “human resources” with the emphasis on the second word, not the first. It was hard to imagine wanting to incorporate “management techniques” into our rag-tag, dogs-at-the-office, sleep-on-the-floor working style.

Alex Martelli’s approach is exactly the opposite of dehumanization. It’s all about understanding the universal and particular needs and desires of each person on a team. He spoke the self-evident truth that a software team is maximally effective and productive only if each human feels happy, well-liked, encouraged, respected and well-taken-care-of. And he followed this premise with many, many practical guidelines for creating this situation. Among the pearls of wisdom:

  • We have all evolved over millennia to sense whether someone really likes us or not. Moral - if you don’t genuinely like the people you are managing (or people in general), you need to find a different job.
  • All of the “motivating factors” in Maslow’s hierarchy of needs must be fulfilled in order for people to be happy and creative. Alex adds “transcendence” at the peak of the pyramid, describing this quality as “helping others for the sake of helping others”.
  • Therefore…
    • We need to safeguard the physical well-being of the team. In our case that means strongly encouraging a 45-hour work week - a 180-degree shift from our prior round-the-clock mode.
    • We need to feel secure and productive. In our case that means being realistic about what we can produce in what amount of time, and understanding that we are in “innovation mode”, meaning that we need to schedule our deliverables no more than two weeks out, with a meeting at the halfway point to make sure we’re on track.
    • We need a beautiful, comfortable place to work.
    • We need to do whatever is necessary to nurture our feelings of affection, belonging and respect for each other. For us that means lots of face-time - brief meetings almost every day in order to keep our creative differences from widening into chasms.

I left the seminar and went downstairs to collect as many of the bound hand-outs as they would let me have. I now carry one in my computer bag at all times. And then I appointed myself chief dictator of our software team and set out to whip us into shape.

The upshot is that we have become much, much happier and more productive. The 45-hour workweek idea isn’t exactly happening, but it does feel freeing to allow that goal to replace our former goal of working until our eyes started to bleed. Sticking to a two-week plan for deliverables makes everything MUCH more pleasant. Frequent reality checks allow us to give the “unknown unknowns” the respect they deserve, and there is much less of that horrible desperate feeling that comes when complexities arise and get in the way of the fantasyland schedule. In terms of peace and esthetic beauty, our wonderful CFO went way beyond the call of duty to set up a developer’s paradise for us, complete with huge desks and sturdy test machine racks and plenty of Feng Shui. Finally, in this new condition of feeling safe, peaceful and well-rested, we are finding it much, much easier to be nice to each other.

Many thanks, Alex. Please write a book!

