Tangled in the Threads
Jon Udell, February 28, 2001
The End of Backup
Will P2P synchronization change the rules?
We debate backup strategies, and contemplate a world in which data is always safely copied just as a matter of course.
Tim O'Reilly said recently that, in the not-too-distant future, peer computing will bring about the end of backup as we know it. In other words, when our systems are redundantly interconnected in a peer-to-peer fashion, our data will just naturally replicate and we won't have to make a conscious effort to create and manage safe copies.
We will, of course, have to worry about making sure that the right people can access those safe copies, and that the wrong people can't. This leads to an interesting conundrum. Here are two equally plausible but diametrically opposed possible futures:
A future in which cryptography and authentication work so well that our data resides out in the cloud of storage as safely as, and maybe more safely than, our data on removable media locked up in our homes and offices.
A future in which nothing that isn't air-gapped from the Net has any prayer of being safe and private.
These issues were debated recently in my newsgroup. What prompted the thread was my purchase, last week, of a CD-RW drive. It's been a while since I've used this technology. Years ago I worked for a company that made what might have been the first information product based on recordable CD (then called WORM -- write once, read many) technology. Later, at BYTE, I reviewed the first generation of CD recorders. Now CD recording, which is among other things a great backup solution, has reached the commodity stage.
Digression on Linux vs Win98 CD-RW setup
Well, perhaps "commodity" overstates the case slightly. I installed my CD-RW, the SCSI flavor of the Yamaha CRW2100, under both Linux and Win98. Although not officially supported on Linux, setup was flawless. But ironically, though it is officially supported on Win98, setup was a horror show.
It's my own fault. Win98 shouldn't even be on that machine. I had never used it until recently (preferring NT and now Win2K), but I figured, hey, the kids will like playing games on it, and it does talk USB to my photo gear, and millions of people use it, so how bad could it be?
Really bad.
After booting into Linux, checking out the drive, and writing my first disc on that side of the box, I rebooted into Win98 and...black screen of despair.
Part of the problem was that it detected the original CD-ROM drive (still in the system) twice, instead of one CD-ROM and one CD-RW. Part of it -- the worst of it -- was that the CD-RW ended up on the same interrupt as the video card. Finally I convinced Win98 to forget enough of what it thought it knew about its hardware in order to recover. This is a consumer experience? This is what the great majority who don't use Linux, or the Mac, or a proper version of Windows, put up with routinely?
Backup strategies
Enough griping. Both Linux and Win98 can see the drive now, it's a joy to use, and it is indeed a great backup solution.
Randy Switt:
For a while I was recommending Zip drives for personal storage, as they were relatively cheap, ubiquitous and easy to use. But now that CD-RW drives are VERY affordable and the media is MUCH cheaper, they make more sense than a Zip drive, particularly for anyone with a laptop (internal Zip drives for laptops are phenomenally expensive). Plus nearly anyone can read a CD-RW now, even if they don't have a CD-RW drive.
Of course with hard drives getting so cheap, it's tempting to bypass removable media altogether.
Mark Wilcox:
I know of several people who simply mirror drives because backup tapes just can't be trusted to work.
But we've got a ways to go to make this work. I just tried to synch files from my Windows ME desktop to my Windows 2000 laptop. Ended up setting up Apache with Mod_DAV and using Web Folders to publish them.
I think Groove -- I finally got around to using it the other day and it rocks! -- has shown how seamlessly the crypto and even authentication could possibly work. It's up to users, though, to determine if they really need it or not.
Indeed. When you are collaborating in a Groove shared space, or even just working alone in a shared space that synchs between two of your own PCs, it's a great relief to know that your data is never just in one place. This will inevitably, I think, become the norm. At some point we'll look back and wonder how things could ever have been different. But learning to trust a "cloud of storage" is a major shift, and it'll take a while to happen.
Meanwhile, there are a million backup scenarios. Here's another:
Michael Strasser:
I finally have a backup solution for my Win2K laptop that I'm happy with: an IEEE 1394 (FireWire) external hard disk. I keep the backup disk in a fireproof safe.
I had considered removable media, external tapes etc. but this setup enables a full backup of the system's 10GB hard drive three times. It appears to be the best value for money at the moment. I purchased the CardBus controller and disk (both Western Digital) on a recent trip to the U.S. for $US350 total. (They still cost much more here in Australia.)
1394 seems a logical choice for an external bus for laptops because of its ease of use and speed. It is still relatively new (compared to USB) but I'm seeing more devices advertised all the time. I'm thinking of buying an IDE enclosure into which I can put a CD-RW drive or another HDD.
The job's not done until you boot from the cloned disk
Christopher Spry:
For some years I have provided each of my systems (fortunately only running Windows 2000 and IRIX) with a spare hard disk, which is an 'exact' copy (i.e. replacement) for the system disk. Each backup contains the OS and applications. I create each backup whenever I have altered the system to the state that would take more than an hour or so to update the backup with the changes. I use Drive Image Pro for the Windows systems, and a script for IRIX. I check that they work by booting the system with the backup disk and checking that they perform correctly. I put these backup drives somewhere safe in the expectation that I will need them as a direct replacement for the system disk, or as a new disk for a replacement computer. I find I have to use them about once every six months, or so. They are very useful when I have installed software that wrecks my system.
I hold the firm opinion that no backup is complete until it has been shown to work, which rules out most disk mirroring and tape-systems for me. Can Michael Strasser's IEEE 1394 external hard disk backup be tested: i.e. can you boot from this external drive?
Michael Strasser:
No. The laptop's boot system knows nothing about the CardBus 1394 controller. Only Win2K does.
I agree that you need to test restores from backups, but I do not believe that a complete disaster recovery solution is the only way to go. Remember, this is a laptop, so I can't just put another disk on the IDE/SCSI bus.
If, for example, the disk dies I will have to start from a minimal Win2K installation (2 hours?) on a small partition and restore everything else to different partitions (one for Win2K & programs, one for my data) and set the restored Win2K partition bootable. (I have Partition Magic 6 so the partitioning is no problem.)
That is still better than only backing up the data. And that is way, way, better than no backup at all.
Christopher Spry:
Many notebook computers can only boot from one system disk, so we are forced to waste hours of time and effort when the hard disk dies.
But times change. Are drives now so reliable that we can forget disaster recovery? Are computers disposable, in the sense that we will replace them in a shorter time than it takes for the hard disk to die?
Times change
I would say rather: drives are so cheap that we can easily do a lot of replication of data, which is what's really irreplaceable. It's annoying to spend a day rebuilding a hosed system, but that's nothing compared to the loss of personal or business data.
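Replication only protects data if the copies are good, which is Christopher Spry's point about untested backups. As a minimal sketch (not something anyone in the thread describes using), the Python below compares a replica against the original by hashing every file. The function names, the chunk size, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copy(original, copy):
    """Return (missing, changed): files absent from the copy, and files whose contents differ."""
    src = tree_digests(original)
    dst = tree_digests(copy)
    missing = sorted(set(src) - set(dst))
    changed = sorted(name for name in src.keys() & dst.keys() if src[name] != dst[name])
    return missing, changed
```

Two empty lists from compare_copy mean every file in the original has an identical twin in the copy. That is a weaker test than booting from a cloned disk, but it is one that works for plain data on a laptop.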
I'll bet many if not most of the readers of this column have at least 2 machines, say a desktop and a notebook, and can work productively on either. My desktop and my laptop don't need to be bit-for-bit images of one another. If they have roughly the same apps and data, I can work on either. This doesn't get me out of the job of disaster recovery, but gives me some flexibility about when and how I attack that chore.
With our semi-disposable PCs and disks we're starting to create the kind of replicated environment that will, in the future, live canonically in the Net rather than in our own devices.
Anticipating that day, I've always been in the habit of not customizing systems to the nth degree. I tend to just accept and work within the defaults (screensavers, sounds) because I regard effort invested in optional customization as a kind of liability -- more "state" to back up, to worry about, to have to test the restoration of. So I travel as light as I can.
To a certain extent, I've already begun to rely on the "cloud of storage." For example, I can now burn a CD with all of the digital images that I've taken since I got my camera a year ago. And I probably will. But I noticed I haven't bothered to do it yet. Do I really need to? The truth is that I've already culled through those images, and placed the ones that matter to me, and might matter to others, on the Net. This is a fraction of the total. If I don't particularly care about the rest, who else is likely to?
I like the notion of a storage ecology that remembers stuff that is used, and forgets or loses what isn't. We have a kind of horror of forgetting and losing, but as our lives are increasingly recorded and stored in digital form, we'll find that there's a value in being able to forget and lose things, and in having systems actually do that for us so we don't have to make painful decisions about a lot of trivial things.
I remember watching Ted Nelson, at the open source conference at which he "released" Xanadu, obsessively videotaping his own life as he was living it, which is apparently his habit. Xanadu, of course, was to be a storage system that never forgets. This is a compelling idea, but in practice I'm not sure if it's really useful.
Peter Thoeny:
I like the idea of a system that never forgets. There are cases where you'd like to dig out something that has been in the attic for a long time.
It should work the way our brains do. We remember recent stuff quickly. We remember an event that happened a long time ago if it made an impression; we also remember those events by connecting them to related events.
How does that translate to a system that never forgets? "Remember" is search. That means the system could be designed to "forget" about data that is accessed rarely by placing it in the attic. Search brings up the recent stuff by default, which is fast. If you don't find what you are looking for you can do an extensive search that includes the attic.
Well said. Of course, unlike a real attic, which has finite capacity, there is, practically speaking, no end to the amount of stuff you can throw into the virtual attic. The question then becomes: will search remain effective as a way to find things in the ever-growing pile? I think the jury's still out on that one. Internet search engines have kept pace better than I would have expected. But I don't yet have a high degree of confidence that I'll be able to find a document that I know exists somewhere on the Web.
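Peter Thoeny's attic can be sketched in a few lines. The Python below is only an illustration under stated assumptions: the class and field names are hypothetical, "rarely accessed" is reduced to a single age cutoff, and search is a plain substring match rather than anything a real search engine would do.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Document:
    name: str
    text: str
    last_accessed: datetime


class AtticStore:
    """Two-tier store: recent documents are searched by default, the attic only on request."""

    def __init__(self, attic_after):
        # Hypothetical cutoff: anything untouched for longer than this is "in the attic".
        self.attic_after = attic_after
        self.documents = []

    def add(self, doc):
        self.documents.append(doc)

    def _in_attic(self, doc, now):
        return now - doc.last_accessed > self.attic_after

    def search(self, term, now, include_attic=False):
        """Return names of matching documents; attic documents only if asked for."""
        hits = []
        for doc in self.documents:
            if term.lower() not in doc.text.lower():
                continue
            if self._in_attic(doc, now) and not include_attic:
                continue
            # A hit counts as use, so a found document comes back down from the attic.
            doc.last_accessed = now
            hits.append(doc.name)
        return hits
```

The default search skips the attic and so stays fast; passing include_attic=True is the "extensive search." Nothing is ever deleted, so this models forgetting only as a matter of where search looks first.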
Andrew Ducker:
Personally, as soon as I get a fast connection to the internet, I'm going to use it for my backups. I don't have anything to hide, so I'd just be using it for off-site file storage. I want whatever I want, where and when I want it, and I want someone else to look after it.
Honestly, I'd still be reluctant to do that. But I'll admit the reasons may be anachronistic. The physical security of my house, and the physical and electronic protections wrapped around my computer, are probably far weaker than what a well-designed and well-run data haven could offer.
Ray Ozzie's got the right idea with Groove -- as usual, ahead of his time. For the Groove user, the concept of backup ceases to exist. Safe copies are just a by-product of creating, using, and sharing data. That's clearly the way things ought to be, and will be. The only question is: how soon?
Jon Udell (http://udell.roninhouse.com/) was BYTE Magazine's executive editor for new media, the architect of the original www.byte.com, and author of BYTE's Web Project column. He's now an independent Web/Internet consultant, and is the author of Practical Internet Groupware, from O'Reilly and Associates. His recent BYTE.com columns are archived at http://www.byte.com/index/threads
This work is licensed under a Creative Commons License.