Jon Udell: Code browsing, dogfood, cookie federations, and false security

Tangled in the Threads
Jon Udell, February 16, 2000

A mixed bag of sightings, advice, observations, and questions

Ask and ye shall receive

A few weeks ago, I said that Zope developers would benefit from a searchable, cross-referenced source viewer such as the lxr tool that's used in the Mozilla project.

This week, Digital Creations' Chris McDonough noted that my wish is granted. A new site, http://www.codecatalog.com, offers hyperlinked browsing and searching of more than 130 open-source projects including AOLserver, Apache, Berkeley DB, Samba, Tcl, Perl, Python, wxPython, and Zope. It's a terrific resource!

Admittedly, CodeCatalog's textual analysis of the source code in its repository is not yet as sophisticated as that of lxr. For example, lxr makes the crucial distinction between definitions of symbols and references to them; CodeCatalog doesn't do that yet. Of course that's a tougher nut to crack in the case of CodeCatalog, since its repository includes packages written in C/C++, Perl, Python, and Java.
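
To make the definition/reference distinction concrete, here is a minimal sketch that uses Python's standard `ast` module to index a single Python file. It is not how lxr or CodeCatalog work, and the function name `index_symbols` is hypothetical. It also only understands Python, which is exactly the difficulty: every language in a mixed repository needs its own parser before a browser can tell a definition from a use.

```python
import ast
from collections import defaultdict
from typing import Dict, List


def index_symbols(source: str) -> Dict[str, Dict[str, List[int]]]:
    """Map each name in a Python module to the lines where it is
    defined ("defs") and where it is referenced ("refs")."""
    index = defaultdict(lambda: {"defs": [], "refs": []})
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # def/class statements define a name.
            index[node.name]["defs"].append(node.lineno)
        elif isinstance(node, ast.Name):
            # A Name in Store context is an assignment target (a definition);
            # Load and Del contexts are treated as references.
            kind = "defs" if isinstance(node.ctx, ast.Store) else "refs"
            index[node.id][kind].append(node.lineno)
    # ast.walk is breadth-first, so sort the line numbers for display.
    for entry in index.values():
        entry["defs"].sort()
        entry["refs"].sort()
    return dict(index)


# Function parameters (ast.arg) and attribute names are not tracked here.
sample = 'def greet(name):\n    return "hi " + name\n\nmsg = greet("jon")\nprint(msg)\n'
for symbol, where in sorted(index_symbols(sample).items()):
    print(symbol, where)
```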

Still, CodeCatalog is a major step in the right direction. The open source movement is generating enormous quantities of publicly available source code and documentation. Services that can mine, reorganize, and thereby add value to this stuff are important new pieces of open-source infrastructure and -- not coincidentally -- excellent business opportunities.

Eat your own dogfood

Patrick Hicks wrote to inquire about documentation and training materials for database software. And he asked:

If you were to do it again, knowing now what you did not know then, how would you go about becoming a systems administrator or database developer?

Alan Shutko replied:

I'm neither a sysadmin nor a database developer, but I have one rule for learning things which has served me very well: Use it at home.

If you're trying to learn an OS, replace whatever you're using at home with it. If you're trying to learn a database, move all your other databases into it. You won't learn everything this way; you'll still have a lot to learn from other sources, but it's amazing how quickly you'll pick things up, especially the little tricks books and classes rarely mention.

Dominic Amann seconded the motion:

I emphasize the "full immersion" effect: commit yourself to the exclusion of anything else (like going to French immersion school), and you WILL learn.

The "eat your own dogfood" principle is always sage advice. The phrase has popped up elsewhere recently. For example, the Mozilla team says that it has achieved "dogfood status" -- meaning that the Mozilla team is now using Mozilla for the majority of its daily browsing and communicating.

Cookie federation

Suppose you run sites in several different domains, and want to share cookie-based user identification among them. The sites use different application servers, so it's not practical to thread some common identifier through all the HTTP transactions to all the sites. Is there a way to use cookies to identify users across this federation of sites? The problem, of course, is that a basic tenet of browser security is that cookie transmission is scoped by domain name, so a browser that sends a cookie to a.com won't send that same cookie to b.com. Is there a way around this?
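
The scoping rule amounts to a domain-match test that the browser applies before attaching a cookie to a request. The Python function below is a simplified, hypothetical rendering of the rule later standardized in RFC 6265 (browsers of this era followed the older Netscape cookie rules); it ignores paths, the Secure flag, and public-suffix restrictions.

```python
import ipaddress


def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified domain-match: would a cookie scoped to cookie_domain
    be sent along with a request to request_host?"""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")  # a leading dot is ignored
    if host == domain:
        return True
    try:
        ipaddress.ip_address(host)
        return False  # IP addresses must match exactly
    except ValueError:
        pass
    # Otherwise the host must be a subdomain of the cookie's domain.
    return host.endswith("." + domain)


print(domain_matches("www.a.com", "a.com"))  # True: subdomain of a.com
print(domain_matches("b.com", "a.com"))      # False: different domain
```

A cookie set by a.com matches a.com and its subdomains but never b.com, so no client-side trick will make the browser volunteer it.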

What if the servers talk to each other? Troels Arvin suggested:

One server could keep the cookie data and share it with the others, using ODBC, or a native DBMS communication protocol, generic sockets, RMI, CORBA, server-to-server HTTP, etc.

This idea sounds promising. Let's explore it:

a.com is the cookiemaster.

I visit a.com for the first time, and get my cookie.

I visit b.com for the first time. My browser holds no cookie for b.com, so it sends none.

Oops. Here we are stuck. How can b.com leverage a.com's knowledge of me?

What about this (admittedly awkward) scheme?

All b.com pages embed silent calls (e.g., fetching of a 1-pixel invisible gif) to a.com, the cookiemaster.

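Thus, my visit to b.com causes a.com to log that IP address X fetched its GIF at time Y. Because my browser sends its a.com cookie along with that request, a.com can also tie X and Y to my cookie.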

b.com, seeing no cookie, asks a.com "Did you see IP address X at roughly time Y?" and if so, the cookie data replicates to b.com and is stamped into the browser.

From then on, I am recognized at b.com with the same cookie data as originally set by a.com.
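
The pixel-based scheme can be sketched with an in-memory log standing in for a.com's records. Everything here is hypothetical: the `CookieMaster` class and its method names are illustrative, the list stands in for whatever server-to-server channel (ODBC, sockets, HTTP) b.com would actually use, and the five-second tolerance is arbitrary. It assumes the browser sends its a.com cookie along with the GIF request.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CookieMaster:
    """Hypothetical stand-in for a.com's log of pixel fetches."""

    # Each entry: (client IP, timestamp in seconds, a.com cookie value).
    sightings: List[Tuple[str, float, str]] = field(default_factory=list)

    def log_pixel_fetch(self, ip: str, when: float, cookie: Optional[str]) -> None:
        """Called when a b.com page's 1-pixel GIF is fetched from a.com."""
        if cookie is not None:  # only browsers that already hold an a.com cookie help
            self.sightings.append((ip, when, cookie))

    def lookup(self, ip: str, when: float, tolerance: float = 5.0) -> Optional[str]:
        """b.com asks: did you see IP address `ip` at roughly time `when`?"""
        for seen_ip, seen_when, cookie in reversed(self.sightings):  # newest first
            if seen_ip == ip and abs(seen_when - when) <= tolerance:
                return cookie
        return None


master = CookieMaster()
master.log_pixel_fetch("10.0.0.7", 100.0, "uid=jon")
print(master.lookup("10.0.0.7", 102.0))  # same address, close in time
print(master.lookup("10.0.0.8", 102.0))  # proxy presented a different address
print(master.lookup("10.0.0.7", 200.0))  # timestamps too far apart
```

The last two lookups show how brittle the match is: if a proxy changes the client's address, or the clocks disagree by more than the tolerance, b.com learns nothing.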

As Troels points out, using the remote IP is a poor solution. A user coming from behind a proxy such as AOL's may present a different IP address from one request to the next. Synchronizing timestamps across all participating servers is another problem. But here's the worst flaw:

If b.com needs to know the user data all the way from the first HTML page request, then there is a problem: the HTML needs to be loaded before we may reliably assume that a GIF will be requested from cookie master.

Back to the drawing board. What if:

b.com redirects all cookieless requests to a script on a.com, embedding the originally requested URL in the script URL.

The browser, which has a cookie from a.com (because it earlier registered with that server), transmits it to a.com.

The a.com script now has the originally requested b.com URL and the cookie identifying the user.

The a.com script redirects to a b.com script, embedding this package of information in the request.

The b.com script sets the cookie and finally redirects to the originally requested URL.
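
The redirect chain above can be sketched as three plain functions that pass (status, headers) pairs around instead of running on real web servers. The script URLs, the function names, and the shared secret are hypothetical. The HMAC signature is an added assumption, not part of the scheme as described: without something like it, anyone could hand the b.com script an arbitrary user id.

```python
import hashlib
import hmac
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical script URLs and secret shared by a.com and b.com.
A_WHOAMI = "https://a.com/cgi-bin/whoami"
B_SETCOOKIE = "https://b.com/cgi-bin/setcookie"
SHARED_SECRET = b"secret-known-only-to-a-and-b"

Response = Tuple[int, Dict[str, str]]  # (HTTP status, response headers)


def sign(uid: str, dest: str) -> str:
    """Let b.com verify that a.com, not the user, produced the package."""
    return hmac.new(SHARED_SECRET, f"{uid}|{dest}".encode(), hashlib.sha256).hexdigest()


def query_of(url: str) -> Dict[str, str]:
    return {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}


def b_page(url: str, b_cookies: Dict[str, str]) -> Response:
    """Step 1: b.com bounces a cookieless request to a.com, carrying the URL."""
    if "uid" in b_cookies:
        return 200, {}
    return 302, {"Location": A_WHOAMI + "?" + urlencode({"dest": url})}


def a_whoami(url: str, a_cookies: Dict[str, str]) -> Response:
    """Steps 2-4: the browser presents its a.com cookie; a.com redirects
    to b.com's script with the user id and the original URL."""
    if "uid" not in a_cookies:
        return 401, {}  # the scheme assumes the user registered at a.com
    dest = query_of(url)["dest"]
    uid = a_cookies["uid"]
    package = urlencode({"uid": uid, "dest": dest, "sig": sign(uid, dest)})
    return 302, {"Location": B_SETCOOKIE + "?" + package}


def b_setcookie(url: str) -> Response:
    """Step 5: b.com sets its own cookie and redirects to the original URL."""
    q = query_of(url)
    expected = sign(q.get("uid", ""), q.get("dest", ""))
    if not hmac.compare_digest(q.get("sig", "").encode(), expected.encode()):
        return 403, {}  # package was not produced by a.com
    return 302, {"Location": q["dest"], "Set-Cookie": f"uid={q['uid']}; Path=/"}
```

Followed end to end, `b_page` redirects to `a_whoami`, whose redirect to `b_setcookie` ends with a `Set-Cookie` header for b.com and a final redirect back to the page first requested. The cost is three extra round trips on the user's first visit to b.com.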

This can't be a new problem, I shouldn't think. Yet I can't come up with an existence proof of a cookie federation scheme that will allow ad-hoc groups of sites running in different domains to share common user identification. If you've seen a setup like this in operation, drop by the newsgroup and let us know about it.

False sense of security

To many Web users, the HTTPS lock icon in IE and Communicator sends a strong signal that the site is secure and trustworthy. But as Franck Arnaud points out, that sense of security is easily misplaced. He cited a service that enables sites to collect form data using an HTTPS-based script, but then relays that data to its final destination by SMTP email.

They claim you can optionally use PGP but the lack of any documentation on how to enable it, and their obvious emphasis on being a service for the clueless, leads me to believe few of their clients would use PGP (or know what it is).

I suspect Franck's right. I asked the proprietors of this service, www.safepage.com, how the PGP option is handled. A SafePage representative replied that:

The orders are immediately encrypted and never stored as cleartext.

It is the customer's responsibility to acquire PGP, generate a key pair, and transmit the public key to SafePage.

Most customers do not use the PGP option.

Of course, this isn't an isolated scenario. I've seen other examples of "secure ordering" that use unencrypted email to relay those "secure" orders to their destination.

Notes Alan Shutko:

At least you're actually getting HTTPS form submission, so it's encrypted a little bit of the way. I keep seeing sites where the forms are on a secure server... but submit to a script on a regular one. Oops.

Yup. This is another common scenario. And still another is the case of an HTTPS script that dumps the cleartext order to a file which is in the web server's visible tree, in a directory that allows browsing.
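
A site owner could catch the mistake Alan describes with a simple audit. This Python sketch uses the standard `html.parser` module to resolve each form's `action` against the page URL and report any target that isn't HTTPS. The function names and example URLs are hypothetical. The check covers only the browser-to-server hop; it says nothing about whether the order is later emailed in cleartext or written to a browsable directory.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class FormActionAudit(HTMLParser):
    """Collect the absolute submission URL of every <form> on a page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.actions: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A missing or empty action submits back to the page's own URL.
            action = dict(attrs).get("action") or ""
            self.actions.append(urljoin(self.page_url, action))


def cleartext_form_targets(page_url: str, html: str) -> List[str]:
    """Return form targets that would carry submitted data unencrypted."""
    audit = FormActionAudit(page_url)
    audit.feed(html)
    audit.close()
    return [a for a in audit.actions if urlparse(a).scheme != "https"]


page = "https://shop.example/order.html"
html = ('<form action="http://shop.example/cgi-bin/order.pl">...</form>'
        '<form action="/cgi-bin/confirm.pl">...</form>')
print(cleartext_form_targets(page, html))  # only the http:// target is flagged
```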

It's a bit of a quandary. Encryption based on PKI (public-key infrastructure) is frighteningly complex. It's nothing short of miraculous that browsers have, for years now, reduced key exchange, server authentication, and the negotiation of an SSL session to a no-brainer operation that is so transparent to users that, in fact, it is a major challenge to educate them as to the existence and meaning of those "lock" icons in IE and Navigator.

But security is an end-to-end problem. It's hard enough to get users to grasp the notion of a secure session between a browser and a server. How can they possibly evaluate the risks associated with handling an order thus received? To assure customers that online shopping is safe, e-commerce sites ritualistically chant "VeriSign, SSL, 128-bit." But that's practically meaningless -- anybody can buy a server digital certificate from VeriSign and support 128-bit SSL sessions. What really differentiates e-commerce sites, in terms of security, is the infrastructure behind the SSL session.

Convincing customers that the infrastructure is solid is a can of worms that nobody wants to open because, well, the truth is that it isn't all that solid yet. Consider, for example, this remarkable observation from the latest Netcraft Web Server Survey:

Leading encryption company RSA which has styled itself the most trusted name in e-security has had its web site successfully compromised twice recently, and seems to have changed web server platform on each occasion. On Thursday 10th February www.rsa.com was running Solaris and Netscape-Enterprise, by Sunday 13th it had switched to Linux and Apache/1.3.6, while today [Monday 14th February] it is running NT4 and Microsoft-IIS/4.0. It would be interesting to know the reasons for this; sometimes companies change platforms as a knee jerk reaction to a security or reliability problem, but going through the three most common platforms in four days seems exceptional.

Indeed.

E-commerce today is supported by some remarkable innovations. In some respects, the technology is capable of far more than users demand. Client authentication, for example, is the inverse of the server authentication which occurs when your browser establishes a secure session with an e-commerce site. Users should be demanding that e-commerce sites support client authentication. After all, I'm not too concerned that I'll end up at a rogue site pretending to be www.amazon.com -- which is what the currently-standard server authentication protocol guards against. I'm much more concerned that amazon.com should strongly authenticate me, Jon Udell, as the source of the digits of my credit-card number, rather than accepting any random Web client that happens to present those digits. This is what client authentication can do, and the technology's been widespread in browsers since 1996. But it's a chicken-and-egg situation. E-commerce sites don't want to burden users with the administrivia of acquiring client certificates, and therefore browser makers and certificate authorities (such as VeriSign) aren't aggressively working to simplify the procedure.
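
The two directions of authentication correspond to two different settings on a secure connection. This sketch uses Python's modern `ssl` module (TLS rather than the SSL of 2000) purely to illustrate the configuration. The function names are hypothetical, and a real site would also load its own certificate and key, plus the CA certificates it trusts to have issued its customers' client certificates.

```python
import ssl
from typing import Optional


def browser_context() -> ssl.SSLContext:
    """Client side: server authentication, which every HTTPS session performs."""
    # The defaults verify the server's certificate chain and check that the
    # certificate names the host we meant to reach (the rogue-site check).
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH)


def site_context(client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server side: client authentication, the inverse check."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Abort the handshake unless the client presents a valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if client_ca_file is not None:
        # CA certificates trusted to have issued customers' client certificates.
        ctx.load_verify_locations(cafile=client_ca_file)
    # A real site would also call ctx.load_cert_chain(...) with its own
    # certificate and private key before accepting connections.
    return ctx
```

On the server, requiring client certificates is a single setting; the burden lies in getting a certificate into every customer's browser, which is the chicken-and-egg problem described above.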

There is much hard work still to do before e-commerce sites will be able to offer assurances that go beyond "VeriSign, SSL, 128-bit."

Jon Udell (http://udell.roninhouse.com/) was BYTE Magazine's executive editor for new media, the architect of the original www.byte.com, and author of BYTE's Web Project column. He's now an independent Web/Internet consultant, and is the author of Practical Internet Groupware, from O'Reilly and Associates. His recent BYTE.com columns are archived at http://www.byte.com/index/threads

This work is licensed under a Creative Commons License.