Tangled in the Threads

Jon Udell, September 27, 2000

Whatever became of computer telephony?

CTI (computer/telephone integration) was the Next Big Thing in 1994. It still is.

We tend to focus on merging voice and data traffic into the same pipe. That'll eventually happen, but meanwhile can't our PCs help us use the existing telephone infrastructure more effectively?

Once upon a time (1994) I wrote a cover story for BYTE on the subject of computer telephony. Last week, I was reminded of that story when I found myself involved in several of those annoying conference call fiascos:

"Hang on, I'll try to patch in so-and-so."

"If I lose you, I'll call back."


The number of minutes of productivity lost every day to this sort of nonsense is startling, and the situation seems to have improved not at all in recent years. I still don't see voice-over-IP as a great near-term solution. Internet audio is, for many people, a disappointment. And in any case, why not leverage the reliable low-latency 64kbps voice circuits that we already have?

I still see CTI (computer-telephone integration) as a killer app waiting to happen. In that 1994 article, the focus wasn't on merging voice and data traffic into the same pipe. Rather, it was on ways that our PCs could help us use the existing telephone infrastructure more effectively.

You don't have to think very hard to come up with a whole series of interesting opportunities:

Forget the phone, use IM?

In follow-on discussion, Mark Wilcox suggests that maybe the phone just matters less than it used to:

Instead of doing voice conferences, use instant messaging. (Full disclosure: I'm working on a consulting project for Jabber.com, which sponsors the open-source Jabber IM project.)

I'll admit that before July, when I started working on this project, I thought IM was kind of cheesy, but I've been using it regularly to do the development work for the project instead of just email.

The cool thing about IM is that you can do things like regular email (send a note to a single person) including attachments. But, you can know if they are online or not, via a presence protocol. On Jabber, you can even send a person who's offline a message and they'll get it once they are online.

Then there is groupchat, which is more like your traditional IRC environment. It does suffer from confusion with 3 or 4 people involved, but I don't think it's worse than 3 or more people on a conference call.

The only real downside at the moment is that you don't have a guaranteed standard way to 'persist' the conversation, though many clients support a feature like that. But if you really need this, you can always revert to persistent messaging, or what we've more traditionally called "e-mail."

At one time I too underestimated the significance of IM. I've been corrected by various people who've shown me the business relevance of the medium. But people talk much faster than they type. Equally important, they convey a great deal in tone and inflection. Voice is a rich medium, and it is carried most effectively through the existing and well-established circuit-switched network. Integrating that network with the packet data network need not, and should not, wait for voice-over-IP. As for ditching voice conferences for IM, I just can't see it. As a general rule, I think the name of the game is not to converge on a single best channel of communication, because every channel has strengths and weaknesses. Rather, we want to find the best ways to coordinate the use of multiple channels.

The point about persistent chat is, by the way, interesting. Notes James Power:

Initially I thought it would be great to have a written record of the conversation. In fact, chat sessions (those I've looked at afterwards) are too disjointed and full of abbreviations to do anything with. Just more junk to file and ignore, or worse, spend time exploring only to come up with nothing.

I wouldn't expect to usefully spend much time reading chat transcripts. Searching them, though, that's something else again. Chat is not normally used in a mode where everything is indexed and searchable, but it can be, and that can be useful.
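To make the indexing idea concrete, here is a minimal sketch of a searchable chat log, built on a plain in-memory inverted index. The transcript, the speaker names, and the function names (tokenize, build_index, search) are all invented for illustration; no particular chat client or server is assumed to work this way.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(messages):
    """Map each token to the set of positions of the messages containing it."""
    index = defaultdict(set)
    for pos, (speaker, text) in enumerate(messages):
        for token in tokenize(text):
            index[token].add(pos)
    return index

def search(index, messages, query):
    """Return the messages that contain every query term, in transcript order."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [messages[pos] for pos in sorted(hits)]

# Hypothetical transcript, invented for this sketch.
transcript = [
    ("joe", "brb, phone"),
    ("ann", "did anyone test the SSL proxy config?"),
    ("joe", "yes, proxy works but SSL handshake is slow"),
    ("ann", "ok, lunch?"),
]

index = build_index(transcript)
results = search(index, transcript, "ssl proxy")
for speaker, text in results:
    print(f"{speaker}: {text}")
```

The query skips the chatter and returns only the two messages that mention both terms, which is the point: nobody has to read the transcript to get value out of it.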

Message metadata

Something else that might be useful, someone else pointed out, would be a mechanism for threading and marking threads.

Like NNTP, except real-time, and with the ability for me to mark a thread as "interesting." Just a quick mouse-click or keypress, nothing more. I often have trouble finding that useful remark that's buried among three hundred lines of crud. Maybe I can't even remember what the idea was -- I just remember that Joe said something that I thought was interesting at the time.

Added James Ramirez:

What about leveraging the kind of solution that occurs in MUDs? Instead of explicitly tagging conversations/comments, individuals adopt a different method of communication. They use 'page' rather than 'say', or use a different channel ('admin' vs 'public') according to need. These methods tend to make your communication visibly distinct.

This raises the general issue of metadata-tagging our messages. I'd like to be able to incorporate the "speech act" model of communication into our messaging. In other words, I'd like (in any given messaging channel: mail, chat, phone) to be able to be explicit about the nature and purpose of the message. Is it:

There is a kind of deep structure underlying many communication acts. I have long believed that software should ultimately guide us, as we exchange messages, in making such deep structure explicit as metadata -- and thereby available for processing that can help us streamline and organize our communication.

Things like Subject/Author/Date (in email), Rate This Message 1-5 (in forums), and Message from Joe at 2PM (in voicemail) barely scratch the surface, with respect to what message metadata could and perhaps should be.
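As a rough sketch of what richer message metadata might look like, the record below carries an explicit "act" field and a one-click "interesting" flag alongside the usual author and channel. The act labels (request, promise, question, fyi), the field names, and the needs_reply helper are assumptions made up for this example. They are not a proposed standard, and they are not drawn from any existing mail or chat protocol.

```python
from dataclasses import dataclass
from enum import Enum

class Act(Enum):
    # Hypothetical labels chosen for this sketch only.
    REQUEST = "request"
    PROMISE = "promise"
    QUESTION = "question"
    FYI = "fyi"

@dataclass
class Message:
    channel: str               # "mail", "chat", or "phone"
    author: str
    act: Act                   # the sender's declared purpose
    body: str
    interesting: bool = False  # the quick "mark as interesting" flag

def needs_reply(messages):
    """Select the messages whose declared act calls for a response."""
    return [m for m in messages if m.act in (Act.REQUEST, Act.QUESTION)]

# Hypothetical inbox spanning three channels, invented for this sketch.
inbox = [
    Message("mail", "joe", Act.FYI, "Slides are on the server."),
    Message("chat", "ann", Act.QUESTION, "Can you join the 3pm call?"),
    Message("phone", "sam", Act.REQUEST, "Please send the draft by Friday.",
            interesting=True),
]

pending = needs_reply(inbox)
marked = [m for m in inbox if m.interesting]
```

Once the purpose of a message is data rather than something buried in its prose, the same filter works across mail, chat, and voicemail alike.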

Voice recognition's hybrid potential

Let's switch back to the voice medium. What wouldn't I give for a searchable archive of my phone conversations? Here's another killer app just waiting in the wings. Some years ago, I saw a really interesting hybrid app. It scanned and OCR'd a bunch of resumes into a searchable database. Even though there was no correction on the OCR, it turned out that for searching, it worked fine. If I'm looking for key terms like "Java" or "SSL" in a stack of resumes, I can easily find the image of the document containing those recognized terms. The OCR'd text may not be all that readable, but the corresponding document image certainly is. The trick is to find it. The OCR'd text can be a great way to locate a document image.

The fact that fulltext search is an effective locator of documents, even when the text is in a degraded state, was a revelation. Now consider voice. Suppose all my phone conversations spool to disk (only, of course, with the permission of my interlocutors). Suppose those voice conversations are automatically recognized as (imperfect) text. Now I search. The recognized text may be just fine to help me locate, and randomly access, a piece of recorded audio.
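Here is a minimal sketch of that locate-and-jump idea, assuming a speech recognizer has already produced time-stamped (and error-ridden) text segments for a recorded call. The segment data and the locate function are hypothetical, and no real recognition engine or its API is implied.

```python
import re

def locate(segments, term):
    """Return start offsets (in seconds) of segments whose text contains term."""
    pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    return [start for start, text in segments if pattern.search(text.lower())]

# Hypothetical recognizer output: (start offset in seconds, imperfect text).
segments = [
    (0.0, "hi john thanks four calling"),
    (12.5, "we need the java servlet two handle ssl"),
    (31.0, "i will male the contract tomorrow"),
    (47.2, "ssl certificate expires in marge"),
]

offsets = locate(segments, "ssl")
for seconds in offsets:
    print(f"play recording from {seconds} seconds")
```

The recognized text is never shown to anyone; it only supplies offsets into the audio. The sketch also shows the limit of the approach: a search for "mail" finds nothing here, because the recognizer heard "male." Distinctive key terms that survive recognition are what make this workable.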

There are lots more good ideas along these lines, I'm sure. Clearly 1995 wasn't the "Year of CTI," as I had hoped. I wonder when that year will be?

Jon Udell (http://udell.roninhouse.com/) was BYTE Magazine's executive editor for new media, the architect of the original www.byte.com, and author of BYTE's Web Project column. He's now an independent Web/Internet consultant, and is the author of Practical Internet Groupware, from O'Reilly and Associates. His recent BYTE.com columns are archived at http://www.byte.com/index/threads

This work is licensed under a Creative Commons License.