Tangled in the Threads

Jon Udell, May 16, 2001

GUI: Past, Present, Future

Mouse gestures, GUI scripting, and GUI bloopers

The venerable GUI still affords plenty of scope for innovation. But before web designers reinvent everything, they should take a history lesson.

Last week, Randall Parker alerted my newsgroup to some clever UI innovation in the Opera browser:

Randall Parker:

Anyone tried Opera 5.10 or 5.11? They introduced a feature that I think is really cool: Mouse Gestures. The idea is to give you more things you can do with just a mouse.

Examples:

  1. Hold down second mouse button and then a leftward motion goes back a page and a rightward motion goes forward a page.

  2. Hold down second mouse button and then a down/up motion opens another page with the same current URL.

  3. Hold down second mouse button and then a down and then rightward motion closes the current page.

They have some others. But this is enough to get the general idea. You can do a motion that, for instance, opens a new window on an URL. You can use the mouse scroll wheel to move between multiple opened windows.

I think all this is neat stuff. Anyone else agree?

Yes, we all did agree. It's a reminder that even when WIMP (Windows, Icons, Menus, Pointing device) technology might seem to have reached the end of its evolutionary road, there are still ways to expand this channel of communication between people and computers.
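To make the idea concrete, here is a minimal sketch in Python of how a gesture recognizer might reduce a mouse path to a command. This is not Opera's implementation. The gesture table (modeled on Randall's examples), the 20-pixel jitter threshold, and the function names are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]

# Hypothetical gesture table, loosely modeled on the examples quoted above.
GESTURES = {
    ("L",): "back",
    ("R",): "forward",
    ("D", "U"): "duplicate page",
    ("D", "R"): "close page",
}


def stroke_directions(points: List[Point], min_move: int = 20) -> Tuple[str, ...]:
    """Reduce a mouse path to a sequence of distinct directions (L, R, U, D)."""
    if not points:
        return ()
    directions: List[str] = []
    anchor_x, anchor_y = points[0]
    for x, y in points[1:]:
        dx, dy = x - anchor_x, y - anchor_y
        if max(abs(dx), abs(dy)) < min_move:
            continue  # ignore jitter below the (assumed) threshold
        if abs(dx) >= abs(dy):
            direction = "R" if dx > 0 else "L"
        else:
            direction = "D" if dy > 0 else "U"  # screen y grows downward
        # Record a direction only when it differs from the previous one.
        if not directions or directions[-1] != direction:
            directions.append(direction)
        anchor_x, anchor_y = x, y
    return tuple(directions)


def recognize(points: List[Point]) -> Optional[str]:
    """Return the command for a mouse path, or None if it matches no gesture."""
    return GESTURES.get(stroke_directions(points))
```

A path sampled while the second mouse button is held down would be passed to recognize(), which returns a command name or None when the stroke matches nothing in the table.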

Of course, there's also really nothing new under the sun:

Alan Shutko:

Gestures in general have been around for a long time. Mentor Graphics has been using them since at least 1994 or so (when I first heard of it). I _really_ don't think that using gestures for back button is worthy of a patent in and of itself, but....

Another newsgroup participant mentioned that SGI graphics software also uses mouse gestures.

The whole conversation got me thinking, again, about the tremendous challenges that we face when trying to make graphical interfaces adequate to the tasks for which we build them. Everybody's got a million GUI stories, but here's one that just came to light for me. I have, lately, been making an effort to use the Outlook Express newsreader instead of my preferred Netscape 4.x newsreader. The latter is extremely capable, but joined at the hip to the Netscape 4.x browser, which has -- sadly -- become a liability. I'm finding it quite hard to like Outlook Express, though, and as I analyze why that's so, it turns out that some apparently small UI details are major obstacles.

Here's one of those details. I frequently switch between threaded and unthreaded views of my newsgroups. The context is that I remember a comment someone made, I can't remember in which thread the comment appeared, and I would like to find and review the entire thread. Bear in mind that my default view is threaded, and date-ordered. In Netscape's newsreader, I can do this:

  1. Click a button to sort by Sender, which flattens thread hierarchy.

  2. Locate the message using Date and Subject cues. (Hopefully, it has been tagged with a descriptive Subject!)

  3. Click another button to group messages by thread, restoring hierarchy.

In Outlook Express, this same sequence goes like this:

  1. Use View->Current View->Group Messages by Conversation to flatten thread hierarchy.

  2. Click a button to sort by Sender.

  3. Locate the message.

  4. Use View->Current View->Group Messages by Conversation again to restore thread hierarchy.

The Group Messages by Conversation menu toggle is the fly in the ointment. Where Netscape's newsreader requires only a click, Outlook Express requires: click, drag, slide down, slide right, slide down, click. Or, using accelerator keys, ALT, V, V, G. Either way, this adds up to significant impedance. The application, which did not expect this would be a frequent operation, has arranged things in a way that virtually ensures it will not become a frequent operation.

Should Outlook Express have anticipated such behavior? Arguably not. This is not at all the kind of thing you'd expect users to do when using a newsreader for its original purpose -- to read USENET news. In that environment, messages are transient, and there is little expectation that something said weeks or months ago might need to be found, reread, and even commented upon.

The BYTE.com newsgroups exemplify a very different use of the same technology. There, messages are preserved indefinitely, and conversations can be revisited and re-explored. Private and non-expiring newsgroups are also, as I argued in my book, a great collaborative tool for intranets. Perhaps the designers of Outlook Express should have anticipated this, as the designers of the Netscape newsreader did. But given that they did not, how might the software have managed the omission more gracefully?

Making GUI apps more extensible

A few months back, in a two-part column on managing web images, I extolled the scriptability of applications such as ImageMagick and gPhoto. The latter, in particular, was notable for being a GUI app that goes out of its way to expose an alternative command-line interface suitable for scripting.

That's well and good, but not sufficient:

Charlie Clark:

I used to do this sort of work (crop & scale images, put them on the web) a lot myself and was constantly frustrated at environments which weren't conducive to optimising it. Adobe Photoshop now includes some batch functions and JASC's Image Robot is very good but both are overkill without really doing the whole job (producing template based HTML). So I was confronted with a similar problem but as a user of the BeOS (http://free.be.com/) I came up with a different solution.

Command-line scripting is great but suffers from a lack of consistency in the interfaces: scripting ImageMagick is different to scripting sed, awk or the shell itself because the syntax and semantics are often unique to individual applications.

Because messaging between applications is an integral part of the BeOS I was hoping to find a way to automate my favourite imaging editor to do the work. This is possible using the "hey" utility from Attila Mezei. "hey" allows scripts to control programs remotely in a fairly uniform manner. As with most program automation environments "hey" requires explicit support in a program for all but the most basic controls. Sander Stoks' "Becasso" (http://www.bebits.com/app/74) is an excellent example of this. Using "hey" to control "Becasso" I was able to write a script to scale and crop images which turned out very useful when putting the photos (over 70) from the last BeGeistert online!

With the release of BeScript (http://www.begeistert.org/bescript/bescript.zip) application scripting has become more powerful and easier to use in the BeOS. BeScript requires less explicit support in applications while at the same time giving more control -- more applications can be automated, and to a greater degree, as exemplified in the office suite Gobe Productive.

What's still missing is a kind of "record and play" facility which would allow non-programmers to create scripts without having to learn a shell, "hey" or a programming language.

Right. It'd be great if GUI apps were just built on top of frameworks that made "record and play" inherently available. To return to the Outlook Express example, I'd simply record ALT, V, V, G, then assign it to a toolbar button. Of course even this requires more thought and effort than most users will want to expend. Nor should we blame users for this reluctance. It's a rational strategy. Effort invested in customizing apps turns out to have been wasted if you find yourself using an instance of the app on another machine, or if your customizations vanish after a disk crash and recovery.
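As a rough sketch of what "record and play" built into a framework might mean, suppose every user-visible action is registered as a named command and dispatched through one choke point. Recording then comes for free, with no effort from the application programmer. The CommandFramework class and the command names below are hypothetical, and the dictionary merely stands in for a newsreader's view settings.

```python
from typing import Callable, Dict, List, Optional


class CommandFramework:
    """Hypothetical app framework: every UI action is a named command."""

    def __init__(self) -> None:
        self._commands: Dict[str, Callable[[], None]] = {}
        self._macros: Dict[str, List[str]] = {}
        self._recording: Optional[List[str]] = None

    def register(self, name: str, action: Callable[[], None]) -> None:
        self._commands[name] = action

    def start_recording(self) -> None:
        self._recording = []

    def stop_recording(self, macro_name: str) -> None:
        """Save the recorded command names as a replayable macro."""
        self._macros[macro_name] = list(self._recording or [])
        self._recording = None

    def invoke(self, name: str) -> None:
        """Run a command, or replay the steps of a macro with that name."""
        steps = self._macros.get(name, [name])
        for step in steps:
            if self._recording is not None:
                self._recording.append(step)  # macros store primitive steps only
            self._commands[step]()


# Stand-in for the newsreader's view state (an assumption for this sketch).
state = {"threaded": True, "sort": "date"}

app = CommandFramework()
app.register("toggle_threading", lambda: state.update(threaded=not state["threaded"]))
app.register("sort_by_sender", lambda: state.update(sort="sender"))

# Record the two-step sequence once...
app.start_recording()
app.invoke("toggle_threading")
app.invoke("sort_by_sender")
app.stop_recording("flatten_and_sort")

# ...then replay it with a single invocation, e.g. from a toolbar button.
app.invoke("flatten_and_sort")
```

Because the dispatcher, not the application, does the recording, any app built on such a framework would get the facility without special effort.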

So while we're dreaming up an ideal solution, let's shoot for the moon. Suppose that the primitive features of every GUI app are available for scripting and "record and play" -- just by virtue of the underlying frameworks, with no special effort required of the programmer. Suppose, further, that behaviors -- such as the newsgroup-viewing behavior I've described here -- can be expressed in some standard way and published as URLs. So for example:

<behavior 
   app="Outlook Express"
   name="Group Messages by Conversation">
<implementation mode="keystrokes">
  <key>ALT</key>
  <key>V</key>
  <key>V</key>
  <key>G</key>
</implementation>
<invocation mode="toolbar">
  <button
    img="http://...."
    label="Group Messages by Conversation"/>
</invocation>
</behavior>
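A descriptor like this would be easy for a client to consume. The following sketch uses Python's standard xml.etree.ElementTree module to pull out the keystroke sequence and the toolbar label. The descriptor format is purely the thought experiment above, not an existing standard. The function name load_behavior is hypothetical, and the sketch assumes that img and label are attributes of the button element.

```python
import xml.etree.ElementTree as ET

# The hypothetical descriptor, with img and label written as attributes.
# The image URL is left elided, as in the column.
DESCRIPTOR = """
<behavior app="Outlook Express" name="Group Messages by Conversation">
  <implementation mode="keystrokes">
    <key>ALT</key>
    <key>V</key>
    <key>V</key>
    <key>G</key>
  </implementation>
  <invocation mode="toolbar">
    <button img="http://...." label="Group Messages by Conversation"/>
  </invocation>
</behavior>
"""


def load_behavior(xml_text: str) -> dict:
    """Parse a behavior descriptor into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    implementation = root.find("implementation")
    button = root.find("invocation/button")
    if implementation is None or button is None:
        raise ValueError("descriptor lacks an implementation or a toolbar button")
    return {
        "app": root.get("app"),
        "name": root.get("name"),
        "mode": implementation.get("mode"),
        "keys": [key.text for key in implementation.findall("key")],
        "label": button.get("label"),
    }


behavior = load_behavior(DESCRIPTOR)
```

A host application could then bind the "keys" list to a new toolbar button carrying the given label.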

Now such behavior can be transmitted virally:

Me: It bugs me that you have to do ALT,V,V,G in Outlook Express, while Netscape only needs a click.

You: Try this: http://www.AppBehaviors.org/oe/GroupMessagesByConversation.

Me: Click...that did it, thanks!

While we're at it, let's further suppose that when I click that link, I'm offered the option to install that behavior in a web-based profile, to which I can then attach any instance of Outlook Express that I find myself using.

OK, OK, I'm not holding my breath waiting for this to happen. But I can dream, can't I?

Jeff Johnson's GUI Bloopers

Anyone with more than a passing interest in issues of GUI design will profit from reading GUI Bloopers: Don'ts and Do's for Software Developers and Web Designers, by Jeff Johnson. Like Edward Tufte in his classic works, Johnson illustrates deep principles using concrete examples.

Two such principles are:

A "GUI blooper" that violates both of these principles is the one Johnson calls dancing tabs. Not to pick on Outlook Express, because it is only one of many applications that commit this blooper, but here is how it looks in OE:

Dancing tabs: before

Too many tabs. Solution: arrange them in multiple rows.

Now here's what you get when you want to work with the settings on the Connection tab:

Dancing tabs: after

Oops. Where did General go?

Says Johnson:

This is very disorienting to users. They click on a tab and it seems to vanish. It didn't vanish; it just moved to a new position, but it takes users a few confused seconds to find it again. Furthermore, contrary to what computer engineers might expect, users do not quickly get over the disorientation caused by shifting tab rows; their previous and continuing experience with single rows of tabs (as well as with tabs in the physical world) perpetuates their expectation that tabs stay put when selected.

The book is crammed full of these gems. Often, as in this case, we're only subliminally aware of the kinds of problems they demonstrate. The GUI is a kind of language; it has a correct grammar; that grammar is so often violated that we all learn to accept, and perpetuate, violations.

Johnson focuses mainly on the conventional GUI, and there is much less explicit treatment of web design principles. But the book is, for that reason, no less useful to web designers. The web's simplification of the conventional GUI was a correct and necessary way to achieve crucial goals: portability, rapid development, zero-footprint deployment, ubiquity. But much was sacrificed along the way. A whole generation of application designers is now largely unaware of the complete GUI grammar. Even if the pendulum does not swing back to richer web GUIs -- and I'm pretty sure that it will -- every web designer ought to absorb the principles articulated in Johnson's book. Nearly all of them are relevant by analogy to the world of HTML-based applications.


Jon Udell (http://udell.roninhouse.com/) was BYTE Magazine's executive editor for new media, the architect of the original www.byte.com, and author of BYTE's Web Project column. He is the author of Practical Internet Groupware, from O'Reilly and Associates. Jon now works as an independent Web/Internet consultant. His recent BYTE.com columns are archived at http://www.byte.com/index/threads

This work is licensed under a Creative Commons License.