Most developers and architects trying to create new RESTful hypermedia formats today are focused on “machine-to-machine” systems where the client is not driven by a user interface (UI). Hypermedia formats already exist for UI-driven clients: HTML and a whole family of related standards (SVG, SMIL, etc.) serve graphical UIs, and standards such as VoiceXML serve voice UIs. While there are many great examples of hypermedia formats for UI-driven clients, it’s not even clear what “hypermedia” actually means outside of the context of a UI.
Let’s take a look at the Wikipedia definition of “hypermedia”:
Hypermedia is used as a logical extension of the term hypertext in which graphics, audio, video, plain text and hyperlinks intertwine to create a generally non-linear medium of information. This contrasts with the broader term multimedia, which may be used to describe non-interactive linear presentations as well as hypermedia.
This seems to define hypermedia as an extension of media designed for human consumption. So does it make sense to use the term hypermedia for something that isn’t consumed through some sort of user interface?
Perhaps hypermedia has a slightly different definition in the context of REST. Let’s look at the definition of distributed hypermedia in Roy Fielding’s dissertation:
Hypermedia is defined by the presence of application control information embedded within, or as a layer above, the presentation of information. Distributed hypermedia allows the presentation and control information to be stored at remote locations.
In a more recent 2008 ApacheCon presentation, Fielding defines hypertext as:
The simultaneous presentation of information and controls such that the information becomes the affordance through which the user obtains choices and selects actions.
and then sums it up as:
Hypertext = data-guided controls
This is interesting — “controls” implies the ability to effect change in an application through some sort of input or action. In the field of design, the term “affordance” means the set of possible actions that a user can take, though it is more often used to mean the set of possible actions that the user is made aware of – the “perceived affordance”. An on-screen control “affords clicking” if the user believes that this is a useful and meaningful action to take.
In a browser, hypertext determines what text and graphics are presented on the computer screen as well as what on-screen controls are made available. Realizing controls requires not only communicating to the user which areas of the screen can be clicked or will respond to keyboard input, but also effecting the response to that input. The input should of course be meaningful, since the user is being led to perceive it as such.
In short, the hypertext informs the browser how to turn input and output resources (a screen, keyboard and mouse) into interactive information. This is actually very much in line with the Wikipedia definition we started with which, in addition to the non-linear nature of hypermedia, identifies interactivity as a characteristic which separates hypermedia from multimedia. Perhaps Wikipedia wasn’t such a bad source after all!
This leads us to the question of how to realize interactivity in a machine-to-machine context. In the presentation slides referenced above, Fielding notes:
Hypertext does not need to be HTML on a browser – machines can follow links when they understand the data format and relationship types
This evokes the notion of a spider crawling through linked documents, and it is certainly common to see attempts at building RESTful clients take an approach that is similar to spiders. However, in the context of HTML, spiders are a form of “secondary client”. Unlike browsers, they do not realize the controls described by the hypermedia document. HTML is a declarative format – it is a description of the interactive output of the browser. As described by the Principle of Least Power, a secondary client like a spider can analyze what the browser would do when given a specific hypertext document as input without actually realizing the presentation of information and controls itself. If hypertext documents were instead written in an imperative language like Java, this would not be possible.
Because a spider is able to determine what effect the activation of a specific control (e.g. clicking a link) will have, it is able to perform the same action itself (e.g. GET the document referenced by the link URI). This is not the same thing as using the control as realized by the browser. The browser uses graphical means (usually blue, underlined text and a special mouse cursor) to indicate that a section of text is a link that can be clicked. When the section of the screen where the link resides is clicked, the browser receives an input event, which triggers the associated action. The text itself provides additional details to the user regarding the meaning of the link, whereas the spider may use this text as well as other information, such as a link relation, that is hidden from the user.
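To make the spider's position concrete, here is a minimal sketch in Python of a secondary client reading the declarative description of links without rendering anything. The sample markup, the example.org URIs and the names AnchorCollector and extract_links are my own illustrative assumptions, not part of any real spider. The sketch only works out which URI a click would dereference; it does not issue the GET.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects <a href> elements from markup without rendering anything."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []       # finished link records
        self._current = None  # the anchor currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        if attrs.get("href"):
            self._current = {
                # The URI a browser would GET if the user clicked the link.
                "uri": urljoin(self.base_url, attrs["href"]),
                # Link relations: metadata the user normally never sees.
                "rel": (attrs.get("rel") or "").split(),
                # Link text: what the user would be shown.
                "text": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def extract_links(html, base_url):
    parser = AnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment, invented for this example.
SAMPLE = """
<p>Read the <a href="/books/42" rel="item">book details</a> or go to the
<a href="page2.html" rel="next">next page</a>.</p>
"""

links = extract_links(SAMPLE, "http://example.org/catalog/index.html")
for link in links:
    print(link["uri"], link["rel"], link["text"])
```

Nothing here draws blue underlined text or waits for a mouse event. The spider has only the description of the control, and whatever it does next is its own decision rather than a use of the control itself.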
A spider does not use the controls described by the hypertext document — even if it could realize the controls, they are designed to be used by human beings. Rather, it attempts to understand the meaning of the control based on the declarative description of what is conveyed to the user about that control as well as control metadata such as link relations. Based on this information it determines whether to take the action associated with the control. Because of this, spiders are typically limited in what they can do. For example, spiders usually cannot fill in and submit a form without some out-of-band knowledge about the web site (e.g. that the form is designed to capture the details of a book purchase).
To deal with this issue, machine-specific control information is often layered onto a user-specific HTML page. Link relations, microformats and the like are constructs for this purpose. Another approach is to use an entirely separate format from HTML for machine interaction with a service. Unfortunately, the controls offered to the client programs are quite anemic, modeled after the <link> tag in HTML with nothing but a simple link relation to drive the control. Interestingly, the <link> tag is not associated with a UI control in HTML; the de facto standard for machine-to-machine hypermedia controls isn’t a control at all. It is a declaration of a typed association between two resources – not the same thing. The <a> tag is a control, but <link> is not even though they are both types of “links”.
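The distinction between the two kinds of “link” can be shown with a short Python sketch that sorts the elements of a page into controls and typed associations. The class name LinkClassifier, the sample markup and the /orders URIs are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class LinkClassifier(HTMLParser):
    """Separates <a> elements (UI controls) from <link> elements
    (declared, typed associations between resources)."""

    def __init__(self):
        super().__init__()
        self.controls = []      # (href, rels) pairs taken from <a>
        self.associations = []  # (href, rels) pairs taken from <link>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rels = (attrs.get("rel") or "").split()
        if tag == "a":
            self.controls.append((href, rels))
        elif tag == "link":
            self.associations.append((href, rels))


# Hypothetical document, invented for this example.
PAGE = """
<html><head>
<link rel="next" href="/orders?page=2">
<link rel="stylesheet" href="/site.css">
</head><body>
<a href="/orders?page=2">More orders</a>
</body></html>
"""

classifier = LinkClassifier()
classifier.feed(PAGE)
classifier.close()

print("controls:", classifier.controls)
print("associations:", classifier.associations)
```

In this sample both the <link rel="next"> and the <a> point at the same URI, but only the <a> is something a browser turns into a clickable control. The <link> entries carry nothing beyond a relation name and a target.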
I suggest that a new approach to hypermedia design is required to address the needs of machine-to-machine systems: one based on the design of data-guided controls appropriate for the specific machine-driven clients relevant to a problem space. Such an approach treats a machine control as an analogue of a user interface control: a construct that provides an equivalent to perceived affordance suitable for machines and that processes input events from machines. I intend to explore this further in upcoming posts.