The Document Is Dead ... Long Live The Document
The pitch: Certain human interface elements are metaphors for real-world objects. However, the metaphors have been extended so far that the interface elements now have widely-understood semantics far beyond their real-world counterparts.
In this post I’m going to pick on a specific example: the document. Yes, those icons of a piece of paper with the top right corner folded down are everywhere. The document is a user interface metaphor which is very common in modern operating systems. And like most metaphors, it has been stretched. But more importantly, has the document outlived its usefulness, or can it be extended into the world of Web 2.0 and beyond?
Cues From The Analogue World
We all know what a dead-tree document is in the analogue world (you know, the world outside the computer). The idea is that computer documents should be similar to these in most respects, in order to maximise usability. You write dead-tree documents, send them to people, file them, forward them to colleagues, or fold them into paper aeroplanes… and so it shall be for electronic equivalents.
I've always liked apps like Apple Mail which use paper aeroplanes as icons for various email actions. It just seems more appropriate than the envelope and letter metaphor, somehow. Or maybe it's just because it reminds me of playing Glider. Ahh, happy memories.
The only problem is that there are things you can do with electronic documents that you can’t do with paper ones. Like, save them, for one thing. Where does that fit into the metaphor? (Unless you count “saving” a paper document as not throwing it in the bin).
But this is OK: the metaphor is only intended as a leg-up to help newbies understand user interface concepts. Or as Tog puts it,
Good metaphors are stories, creating visible pictures in the mind.
A metaphor shouldn’t necessarily constrain us as user interface designers. Obviously we don’t want to break the metaphor entirely, but it is inevitable that a user interface entity will perform many more functions than its real-world counterpart.
Of course, sometimes the metaphor is broken in hilariously bad ways. In MacOS pre-X you ejected a disk by putting it in the trash can. In Windows today you apply wallpaper to your desktop. It makes sense until you think about it. Wallpaper. On the desktop.
Here be usability dragons. For example, there are folks like Tog who think that the Save function shouldn’t have been invented in the first place. For the record, I agree, but would argue there are even better ways of managing changes to documents.
So given that the idea of a document as a user interface no longer corresponds to a physical document, it seems only natural to ask what the hell is it? More precisely I’d like to define exactly how the analogue world document metaphor has been extended into the world of the computer user interface. This is important because the definition leads to consistency and, ultimately, usability.
What a Document Isn’t
So if we Google on the definition of a document (with respect to user interfaces) we get about a million hits, none of them relevant. The term document is just too widespread.
Julian points at the definition of a document that you’ll find in the .NET API documentation:
A Document object represents each open document or designer in the environment – that is, windows that are not tool windows and have an area to edit text.
Wow, they really did write that. That’s far worse than the worst definition of a document that I could think of. They seem to be saying that documents are a specialisation of window. There are of course many ways to refute this, but one obvious way is to point at the multiplicities between documents and windows. A document can be rendered in more than one window, thus showing that a document is not a window.
Let’s look elsewhere.
Back to basics
The remainder of this post is unashamedly Mac-centric. This is partly because of my background as a Mac programmer, but mostly because I think everyone else basically followed Apple's lead on this important innovation.
First a bit of history: MacApp was an application framework for MacOS (surprise!). It pioneered, or at least popularised, many object-oriented design patterns. It is widely cited, including in the hugely-influential Gang of Four book, if you look closely.
MacApp was also designed to support development of applications that were fully compliant with the Apple Human Interface Guidelines at the time. So without further ado, let’s take a walk down memory lane, with the Programmer’s guide to MacApp:
Most Macintosh applications use the document as a repository for data, both in memory and on disk. When a user double-clicks a document’s icon or chooses Open from the File menu, the application opens the document, reads its data into memory, and opens a window to display the data. When the user closes the document, the application saves the data to disk if it has changed.
Documents are closely associated with windows. Opening a document usually opens one or more windows to display the document’s data. Closing a window may close its associated document–closing a document always closes any associated windows.
Despite its age, I think this captures most of what we mean when we think about documents, even in today’s computing environments. Hey, it’s a start anyway.
Tog says that
human interface objects are not necessarily the same objects as found in object-oriented systems.
A good point in general, but I think that in this case there is a high degree of overlap between the O-O framework's concept of a document, and that of the HI. I offer no real justification for this.
To summarise: a document is a persistent store of data. The data can be loaded into memory, manipulated, and then stored again. Manipulation of document data is done through one or more windows.
I think this is a pretty good definition of a document, and in the early 90s it would have been unquestioned. However, the world has moved on since then.
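As a rough illustration of the definition summarised above, here is a minimal Python sketch. It is not MacApp's API: the class, its method names and the string window handles are all made up for illustration, and the rule that closing the last window closes the document is just one reading of "closing a window may close its associated document".

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class Document:
    """A persistent store of data, edited through one or more windows."""
    path: Path
    data: str = ""
    dirty: bool = False
    is_open: bool = False
    windows: list = field(default_factory=list)
    window_count: int = 0  # only used to give each window a unique name

    def open(self):
        # Read the stored data into memory and display it in a first window.
        self.data = self.path.read_text() if self.path.exists() else ""
        self.dirty = False
        self.is_open = True
        return self.open_window()

    def open_window(self):
        # One document may be displayed by several windows at once,
        # which is why a document cannot simply be a kind of window.
        self.window_count += 1
        window = f"window-{self.window_count}"
        self.windows.append(window)
        return window

    def edit(self, new_data):
        # Manipulation happens in memory; the stored copy is now out of date.
        self.data = new_data
        self.dirty = True

    def save(self):
        self.path.write_text(self.data)
        self.dirty = False

    def close_window(self, window):
        # Assumption: closing the last window closes the document too.
        self.windows.remove(window)
        if not self.windows:
            self.close()

    def close(self):
        # Closing the document saves changed data and closes every window.
        if self.dirty:
            self.save()
        self.windows.clear()
        self.is_open = False


# Usage: open a document, view it in two windows, edit it, then close it.
with tempfile.TemporaryDirectory() as folder:
    doc = Document(Path(folder) / "letter.txt")
    first = doc.open()
    second = doc.open_window()
    doc.edit("Dear Tog")
    doc.close()
    saved = doc.path.read_text()
```

Note that the windows hold no data of their own. Everything that persists lives in the document, which is the point of the definition.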
The document metaphor was very successful on the Macintosh. So successful, in fact, that the document metaphor was thought to be suitable for all data, and hence all interactions on computer systems. This is how OpenDoc, and the more general idea of Document-centric computing, was born. It seems a bit naive, but I confess I was a believer at the time.
The idea was that emphasising the document over the application was a usability win. The stated justification was that it allowed the user to focus on their data rather than the software. OpenDoc mainly consisted of a fantastically complex framework to manage access to the document canvas by the various “part editors”. It was kind of like Windows OLE, but better. (Well, better conceived, if not better executed.)
With hindsight, it all sounds impossibly idealistic. Even in the early days there were questions as to how the document metaphor could be made to fit applications that were mainly clients for a central database. Short answer: it couldn’t. Then came the web.
Apple had an OpenDoc-based web browser called (I am not making this up) CyberDog. It allowed a web page to be embedded in a document. All of the same usability problems for document-based database applications applied to CyberDog.
Now obviously OpenDoc suffered from many implementation problems but I think that with the advent of the web, the entire concept of document-centric computing was fatally flawed. Embedding a browser into a document? Why would anyone want to do that? Sure, embed content from a web page, but not an entire browser frame surely? It just doesn’t seem to fit into our real-world ideas of documents.
On OddThinking I made an offhand comment to the effect that “browsers don’t use documents” and proceeded to justify it by defining documents in terms of persistent storage and author-specified content, neither of which is true for browsers today. The persistent storage criterion originated from the MacApp definition above. And the author-specified content? That was my goofy way of saying that CyberDog was a dog.
This post started as yet another extended comment on OddThinking. Then I realised that I was spending far too much time writing comments over there, and not putting effort into my own blog. Rather than risk girtby.net becoming jealous (or over-anthropomorphised) I posted my reply here. Besides, we'd gotten off the original topic anyway.
What has it done for us lately?
So document-centric computing waned just as the web started to wax. Fast forward to 2006. I look at my OS X installation and notice that document-centric applications are definitely the exception rather than the rule, at least for the applications that I use. Looking down my dock, I see three browsers, NetNewsWire, Apple Mail, a couple of IM apps, iTunes, iPhoto, iCal, Address Book, Interarchy, Terminal, MarsEdit and a couple of text editors. How many document-centric apps are there? Two, if you count the editors.
OK, I admit I’m not a typical user. I dislike WYSIWYG, so I’m not a fan of traditional document-editing applications like word processors. So instead, let’s look at the apps in the iLife suite. None of them are traditional document-based applications, although three of the six operate on what I call a project metaphor (missing, crucially, the Save function that would make them a document-based application).
Let’s face it: documents are not an important user interface metaphor in the MacOS world any more. Just look: the current revision of the Human Interface Guidelines has no mention of them, other than to say “here is the type of window you should use when displaying documents”.
A Document? That’s soooooo pre-Web 2.0
Has the web killed the document star? If not, will the web kill the document?
Let’s say you had some content you wanted to have reviewed by your colleagues. The document-centric way (or workflow if you want to get all jargony) of doing this is, of course, to package up the document into an email and send it off to the reviewers. They would annotate it and send it back to you. You’d then merge the comments and act on them.
I think this highlights an important attribute of the document: it is not just a delivery mechanism for content, but also a token which grants the possessor a certain role in a given workflow. You ship me the document, and I get to approve it, review it, or whatever. Just like a real world document, I then ship it back to you.
The web-centric way of doing this is to publish online and invite the reviewers to either annotate the content directly (technology willing) or provide comments online. The problems of delivering these workflows to the web are being worked on, and solutions are not too far away.
Hybrid web-desktop documents
In previous posts I’ve banged on about the advent of hybrid web/desktop applications. These are basically applications that are mainly accessed through a desktop-based user interface, but also through a web-based interface when necessary. The examples that I’ve seen so far have not chosen to use a document metaphor.
However, I can see that a document metaphor would be a great addition to web-based applications. Let’s call them Web Docs. They might work in the following way:
- Each Web Doc would be associated through a file extension (or whatever) to a hybrid web/desktop application. The traditional document operations such as New, Open and Close would all work as they do currently.
- The Web Doc would contain a URL for a subset of the data on the web application. So if the web application were a blog, the Web Doc would contain a URL for an individual post on that blog.
- The Web Doc would contain a cached copy of the most recent version of the data on the web. This would be used for offline editing, increasing performance, or whatever.
- The Save operation would be reserved for uploading local changes to the web version. Of course there is the potential for conflicts here, and these need to be resolved, unlike current documents.
- The Revert to Saved operation would trigger a download of any changed data from the server to the Web Doc. Maybe this could be triggered automatically?
- Creating a Web Doc could be done from within the appropriate application, or by a fairly simple enhancement to the browser. When the user drags a link (either from the location field or the page itself) to the desktop, current browsers create a “generic” URL document. In the Web Doc world, they should instead first perform an HTTP HEAD operation on the link, determine the MIME type of the resource and then create a document containing this URL, but associated with the correct application for the MIME type.
- As a consequence of the previous point, this raises the possibility of “unpopulated” Web Docs. These have a URL for the data but no local cache. This implies that the data URL should be conveyed in file system metadata rather than in the document content itself, otherwise a common file format would be needed across all Web Doc-aware applications.
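To make the list above a little more concrete, here is a minimal Python sketch of how a Web Doc might behave. Everything in it is hypothetical: the WebDoc class, the in-memory FakeServer standing in for the web application, the version-number conflict check and the MIME type table are assumptions made for illustration, not an existing API. A real browser would get the MIME type from the Content-Type header of the HTTP HEAD response; here it is simply passed in.

```python
from dataclasses import dataclass
from typing import Optional


class ConflictError(Exception):
    """The web copy changed since this Web Doc last downloaded it."""


class FakeServer:
    """In-memory stand-in for the web application: URL -> (version, data)."""

    def __init__(self):
        self.resources = {}

    def get(self, url):
        return self.resources[url]

    def put(self, url, data, base_version):
        # Reject the upload if someone else saved a newer version meanwhile.
        version, _ = self.resources.get(url, (0, ""))
        if version != base_version:
            raise ConflictError(url)
        self.resources[url] = (version + 1, data)
        return version + 1


# Hypothetical table associating MIME types with hybrid applications.
APPLICATIONS = {"application/atom+xml": "BlogEditor"}


def application_for(mime_type):
    # Fall back to the browser, like today's "generic" URL documents.
    return APPLICATIONS.get(mime_type, "Browser")


@dataclass
class WebDoc:
    url: str                     # would live in file system metadata
    application: str             # chosen from the resource's MIME type
    cache: Optional[str] = None  # None means an "unpopulated" Web Doc
    base_version: int = 0        # server version the cache was taken from
    dirty: bool = False

    @property
    def populated(self):
        return self.cache is not None

    def revert(self, server):
        # Revert to Saved: download the current web version into the cache.
        self.base_version, self.cache = server.get(self.url)
        self.dirty = False

    def edit(self, data):
        # Offline editing only touches the local cache.
        self.cache = data
        self.dirty = True

    def save(self, server):
        # Save: upload local changes; raises ConflictError if the web moved on.
        self.base_version = server.put(self.url, self.cache, self.base_version)
        self.dirty = False


# Usage: drag a link to the desktop, then edit the post and save it back.
server = FakeServer()
url = "https://example.com/blog/posts/1"
server.resources[url] = (1, "First draft")

doc = WebDoc(url, application_for("application/atom+xml"))
was_populated = doc.populated   # False: URL only, no local cache yet
doc.revert(server)
doc.edit("Second draft")
doc.save(server)
```

Refusing the upload when the version numbers differ is only the simplest possible answer to the conflict problem. A real application would presumably offer to merge the two versions instead.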
There are many consequences, and admittedly I haven’t thought them all through. Like the fact that the Web Doc idea is dependent on the document metaphor surviving in an environment of multiple authors. And the need for authentication to be decoupled somehow from the data (perhaps mandating a system-wide credentials repository like Keychain?). And the need to resolve confusion between local access controls and access controls for the web-based content. And, well, lots of other stuff I’m sure.
The key goal here would be to enable document-like behaviour for web-based content. That’s a pretty powerful combination if you ask me.