Jon Udell: From virtual memory to object storage

Tangled in the Threads
Jon Udell, August 13, 1999
From virtual memory to object storage

It's funny how you take some things for granted, like virtual memory. A few weeks ago Franck Arnaud made this provocative comment:

Of course, virtual memory is likely to be useful for legacy apps but is it a good thing in modern operating systems?

Virtual memory was invented when memory was a very scarce resource (expensive, small) and the main programming language used was C or other languages using similar memory management schemes.

Nowadays, if we were to design a workstation or server OS from scratch, would we want virtual memory?

Heresy? Well, it got me thinking. There's a point at which a change in degree becomes a change in kind. The quantities of RAM that we think nothing of stuffing into deskside PCs were unthinkable just a few years ago. When we routinely dedicate 128MB to Win9.x, or 256MB to NT/Win2K, is it possible that some old rules no longer apply?

As we discussed last week, the QNX 1.44MB Web Challenge certainly is food for thought. A (VM-less) OS, TCP/IP stack, modem-, netcard-, and graphics-detection logic and drivers, windowing system, Web browser, and Web server, that boots and runs from a floppy? Maybe Windows could learn something from this innovative realtime OS.

Franck Arnaud:

If I could design my ideal computer, it probably would not even have a hard disk. I could do all my computing on a machine with say 128 MB of RAM and 128 MB of flash. So could your mother do her wordprocessing, accounting and web browsing. But the software is not there so your mother and I have to go with bulky, ugly, noisy, inefficient PCs with Linux or Windows.

I'd rather do something else instead of waiting for my NT machine to swap around -- often when I have all the real memory I need to do everything in RAM, even with the bloated apps I use!

Similarly in a server environment. I'd prefer a server that quickly refused a request when it's fully used than swapping to death trying to serve too many requests and taking ages to do so. And it could serve more requests in the first instance if it were not so busy swapping.

Bjørn Borud isn't buying this argument, though. It's true that QNX has reminded us you can cram elegant, modern GUI applications into incredibly tiny spaces. But that doesn't mean that a general-purpose OS doesn't need VM:

I have a lot of applications running at any given time. Some of them can sit idle for minutes or even hours or days. There is no point in having their code and data in memory -- it is just wasted RAM, so you throw it to swap and use the freed memory for other things. Like disk cache, for instance. This exact use of memory is extremely useful if you operate on large files or many files.

He adds:

Memory-mapping files is a powerful way to use virtual memory and an elegant one. Consider doing some processing on a large file that involves several random access patterns. Which is easier and more elegant: accessing array elements and letting the OS worry about fetching the data, or fetching and dumping the actual data yourself?
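
To make Bjørn's comparison concrete, here is a minimal sketch in Python, assuming a file of fixed-size 32-bit integer records. The record layout and the function names are my own illustration, not anything from the discussion. The first reader treats the mapped file as an array and leaves the fetching to the OS; the second seeks and reads by hand.

```python
import mmap
import os
import struct
import tempfile

# Assumed layout: one little-endian 32-bit unsigned integer per record.
RECORD = struct.Struct("<I")


def write_records(path, count):
    """Create a file holding the integers 0..count-1 as fixed-size records."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(RECORD.pack(i))


def read_record_mapped(path, index):
    """Memory-mapped access: index into the file as if it were an array."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # The OS pages in whatever part of the file this offset touches.
            return RECORD.unpack_from(m, index * RECORD.size)[0]


def read_record_explicit(path, index):
    """Explicit I/O: seek to the record and read the bytes ourselves."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))[0]


def demo_mmap():
    """Write 1000 records to a temporary file and read one back both ways."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, 1000)
        return read_record_mapped(path, 742), read_record_explicit(path, 742)
    finally:
        os.remove(path)
```

Both readers return the same value. For a single lookup there is little to choose between them; the mapped version starts to pay off when many random accesses hit the same file, because the offset arithmetic is all the program has to manage.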

Virtualizing objects, not 4K pages

Fair enough. Maybe what's needed is not to eliminate VM, but rather to raise the level of abstraction at which it operates. Today programs create and use in-memory objects, then jump through hoops to decompose those data structures into some kind of storage representation -- for example SQL records.

Object databases come along and say: "Let me handle your data structures, and make them magically persistent." Under the covers, the OODB is a kind of VM system, but it's about objects, not 4K pages (or rows and columns). The objects that your program actually touches are kept in memory by the OODB, the ones not used can age out. This arrangement is frictionless, in that there is never an explicit mapping made between in-memory data structures and storage.
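
As a rough illustration of that idea, here is a toy "object pager" in Python, built on the standard shelve module. The ObjectPager class and its method names are hypothetical, and this is not how any real OODB is implemented. It keeps a bounded number of live objects in memory, ages the least recently used ones out to storage, and faults them back in on access.

```python
import os
import shelve
import tempfile
from collections import OrderedDict


class ObjectPager:
    """Keep up to `resident_limit` objects in memory; page the rest to a shelf."""

    def __init__(self, path, resident_limit):
        self._shelf = shelve.open(path)
        self._resident = OrderedDict()  # key -> live object, least recent first
        self._limit = resident_limit

    def put(self, key, obj):
        self._resident[key] = obj
        self._resident.move_to_end(key)
        self._evict()

    def get(self, key):
        if key not in self._resident:
            # "Object fault": bring the object back from storage.
            self._resident[key] = self._shelf[key]
        self._resident.move_to_end(key)
        obj = self._resident[key]
        self._evict()
        return obj

    def _evict(self):
        # Age out the least recently used objects, writing them back first.
        while len(self._resident) > self._limit:
            old_key, old_obj = self._resident.popitem(last=False)
            self._shelf[old_key] = old_obj

    def resident_keys(self):
        return list(self._resident)

    def close(self):
        # Flush everything still in memory, then close the backing store.
        for key, obj in self._resident.items():
            self._shelf[key] = obj
        self._shelf.close()


def demo_pager():
    with tempfile.TemporaryDirectory() as d:
        pager = ObjectPager(os.path.join(d, "objects"), resident_limit=2)
        pager.put("msg1", {"subject": "hello"})
        pager.put("msg2", {"subject": "re: hello"})
        pager.put("msg3", {"subject": "lunch?"})  # msg1 ages out to disk
        before = pager.resident_keys()
        subject = pager.get("msg1")["subject"]    # faulted back in; msg2 ages out
        after = pager.resident_keys()
        pager.close()
        return before, subject, after
```

The program never says "save" or "load"; it just asks for objects by key. What the sketch leaves out is considerable: a real object database would also track changes made to an object after it has aged out, and would handle references between objects, transactions, and concurrency.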

What would an OS look like that was built from the ground up in an OO language whose data objects automatically persisted to an object database? The OODB might even be in cahoots with the OO language's garbage collector.

From an application's point of view, there wouldn't be the concept of an email message, or a wordprocessor document, as data stored in a file. There would be the concept, simply, of an email message or a wordprocessor document, and these things would be directly storable as such.

"What about the 100MB wordprocessor document?" OK, it's not just one big atomic object, it has substructure that's represented as XML, but stored as subobjects in the object file system.

I've been thinking about this because I just tested and reviewed Object Design's eXcelon, a first-generation "XML data server" which does exactly this: puts an XML interface onto persistent object storage. I think this is really, really neat. It's the kind of facility that I can imagine living not only in user space, but perhaps also (or instead) in kernel space.

Not so fast, says Bjørn Borud:

This can be accomplished today; it doesn't need OS support. My feeling is that letting the OS know about email and wordprocessor documents is letting the OS know too much. I think the OS should be more primitive than that. It should know about "chunks of data." Right now we have chunks in the form of file systems, files, memory regions and would-be-chunks in the form of data streams.

Cairo: still ahead of its time?

Well, in Windows, a wordprocessor format is in fact registered with the OS as a COM GUID, as is the automation API associated with that wordprocessor app. Is this "too much" app-specific knowledge to be (nominally) "part of" the OS? I'm not sure, but I'm not actually arguing for or against that mechanism.

I am arguing for a lower level of chunking than the app level, but a higher level than files, memory regions, and data streams. At this hypothetical intermediate level you could deal primitively with an object that might have a set of attributes (think email headers) and one or more associated data streams (think email body, perhaps with attachments), and also some relationships to other similar objects (same author, same containing folder, etc.). What seems interesting to imagine is the ability to implement an email datastore (or a CAD drawing, or many other kinds of typically app-specific datastores) on top of these kinds of native OS services:

field-indexing of attributes

fulltext-indexing of data streams

object-level virtualization

serialization

query capability

transactionally-consistent update capability

object-to-relational mapping where appropriate
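
To give a feel for that intermediate level, here is a small in-memory sketch in Python. Everything in it (StoredObject, ObjectStore, the attribute names) is hypothetical, invented for illustration. It covers only three of the services listed above: field-indexing of attributes, a crude fulltext index over data streams, and query. Virtualization, serialization, transactions, and relational mapping are left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    oid: str
    attributes: dict                               # e.g. email headers
    streams: dict = field(default_factory=dict)    # name -> bytes (body, attachments)
    relations: dict = field(default_factory=dict)  # name -> oid of a related object


class ObjectStore:
    def __init__(self):
        self._objects = {}
        self._field_index = defaultdict(set)  # (attribute, value) -> oids
        self._text_index = defaultdict(set)   # word -> oids

    def put(self, obj):
        # Note: re-putting a changed object would leave stale index entries;
        # a real store would have to remove the old entries first.
        self._objects[obj.oid] = obj
        for name, value in obj.attributes.items():
            self._field_index[(name, value)].add(obj.oid)
        for data in obj.streams.values():
            for word in data.decode("utf-8", errors="ignore").lower().split():
                self._text_index[word].add(obj.oid)

    def query(self, text=None, **attrs):
        """Return sorted oids matching every attribute and, optionally, a word."""
        hits = set(self._objects)
        for name, value in attrs.items():
            hits &= self._field_index.get((name, value), set())
        if text is not None:
            hits &= self._text_index.get(text.lower(), set())
        return sorted(hits)


store = ObjectStore()
store.put(StoredObject(
    "m1",
    {"author": "franck", "folder": "inbox"},
    {"body": b"Virtual memory is legacy"},
))
store.put(StoredObject(
    "m2",
    {"author": "bjorn", "folder": "inbox"},
    {"body": b"Memory mapping is elegant"},
    {"in_reply_to": "m1"},
))

inbox_about_memory = store.query(folder="inbox", text="memory")
from_bjorn = store.query(author="bjorn")
```

Here the first query finds both messages and the second finds only m2. Nothing in the store knows what an email message is: it deals only in attributes, streams, and relations. An email client, or a CAD package, would be a thin layer on top.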

MS Cairo was headed in this direction, and I saw early demos of some of these ideas back in 1993. It was then, and may still be now, ahead of its time, but I think that the basic level of OS storage abstraction will inevitably rise, just as the basic level of programming abstraction has risen -- and for the same reasons.

Perl wishlist

Suppose you could sit down with a roomful of Perl luminaries, including Larry Wall, Tom Christiansen, Chip Salzenberg, and Dick Hardt. Suppose the purpose of the meeting was to discuss what should be the strategic priorities for the next iteration of Perl.

My list would include:

A GUI binding technology that is more complete, and easier to deploy and use than Perl/Tk. I'd love to be able to build, for example, a deeply-programmable messaging client made of Perl bound to a GUI, but there's not an obvious way to do that.

A better-defined way to construct Perl-based network services. Think Zope for Perl. There is mod_perl, an indispensable tool, but nothing that offers the kind of framework Zope provides for Python.

Object database bindings. Perl's breadth and depth of SQL support is a major point in its favor, as compared to, say, Python. But unlike, say, Java, Perl doesn't have a well-established mechanism for binding to object databases. There's a one-off driver for ObjectStore, but nothing organized like Perl's DBI/DBD facility, which would support other OODBs and perhaps also the emerging XML data servers.

Better UNIX/NT standardization. The situation has improved enormously; a year ago at this time was the debut of the first merged version of Perl. Before that, you couldn't make a standard Perl module on NT using the mantra:
perl Makefile.PL
make
make test
make install
and now you usually can. Usually, but not always. There's also still an uncomfortable divide between ActiveState's cool, XML-based, downloadable, but not always current modules, and CPAN's old-fashioned, downloadable, current, but not always NT-capable modules.

What items would you put on the list? Drop by and let us know because, at the upcoming Open Source convention, I'm going to get the opportunity to relay these kinds of requests to the aforementioned roomful of Perl luminaries.

Jon Udell (http://udell.roninhouse.com/) was BYTE Magazine's executive editor for new media, the architect of the original www.byte.com, and author of BYTE's Web Project column. He's now an independent Web/Internet consultant, and is the author of Practical Internet Groupware, forthcoming from O'Reilly and Associates.

This work is licensed under a Creative Commons License.