Machine-to-Machine Hypermedia

Most developers and architects trying to create new RESTful hypermedia formats today are focused on “machine-to-machine” systems where the client is not driven by a user interface (UI). Hypermedia formats already exist for UI-driven clients. For graphical UIs there is HTML plus a whole family of standards (SVG, SMIL, etc.), and for voice UIs there are standards such as VoiceXML. While there are many great examples of hypermedia formats for UI-driven clients, it’s not even clear what “hypermedia” actually means outside of the context of a UI.

Let’s take a look at the Wikipedia definition of “hypermedia”:

Hypermedia is used as a logical extension of the term hypertext in which graphics, audio, video, plain text and hyperlinks intertwine to create a generally non-linear medium of information. This contrasts with the broader term multimedia, which may be used to describe non-interactive linear presentations as well as hypermedia.

This seems to define hypermedia as an extension of media designed for human consumption. So does it make sense to use the term hypermedia for something that isn’t consumed through some sort of user interface?

Perhaps hypermedia has a slightly different definition in the context of REST. Let’s look instead at the definition of distributed hypermedia in Roy Fielding’s dissertation:

Hypermedia is defined by the presence of application control information embedded within, or as a layer above, the presentation of information. Distributed hypermedia allows the presentation and control information to be stored at remote locations.

In a more recent 2008 ApacheCon presentation, Fielding defines hypertext as:

The simultaneous presentation of information and controls such that the information becomes the affordance through which the user obtains choices and selects actions.

and then sums it up as:

Hypertext = data-guided controls

This is interesting — “controls” implies the ability to effect change in an application through some sort of input or action. In the field of design, the term “affordance” means the set of possible actions that a user can take, though it is more often used to mean the set of possible actions that the user is made aware of – the “perceived affordance”. An on-screen control “affords clicking” if the user believes that this is a useful and meaningful action to take.

In a browser, hypertext determines what text and graphics are presented on the computer screen as well as what on-screen controls are made available. Realizing controls not only requires communication to the user of what areas of the screen can be clicked or respond to keyboard input, but also effecting the response to this input. The input should of course be meaningful as the user is being made to perceive it as such.

In short, the hypertext informs the browser how to turn input and output resources (a screen, keyboard and mouse) into interactive information. This is actually very much in line with the Wikipedia definition we started with which, in addition to the non-linear nature of hypermedia, identifies interactivity as a characteristic which separates hypermedia from multimedia. Perhaps Wikipedia wasn’t such a bad source after all!

This leads us to the question of how to realize interactivity in a machine-to-machine context. In the presentation slides referenced above, Fielding notes:

Hypertext does not need to be HTML on a browser – machines can follow links when they understand the data format and relationship types

This evokes the notion of a spider crawling through linked documents, and it is certainly common to see attempts at building RESTful clients take an approach that is similar to spiders. However, in the context of HTML, spiders are a form of “secondary client”. Unlike browsers, they do not realize the controls described by the hypermedia document. HTML is a declarative format – it is a description of the interactive output of the browser. As described by the Principle of Least Power, a secondary client like a spider can analyze what the browser would do when given a specific hypertext document as input without actually realizing the presentation of information and controls itself. If hypertext documents were instead written in an imperative language like Java, this would not be possible.

Because a spider is able to determine what effect the activation of a specific control (e.g. clicking a link) will have, it is able to perform the same action itself (e.g. GET the document referenced by the link URI). This is not the same thing as using the control as realized by the browser. The browser uses graphical means (usually blue, underlined text and a special mouse cursor) to indicate that a section of text is a link that can be clicked. When the section of the screen where the link resides is clicked, the browser receives an input event, which triggers the associated action. The text itself provides additional details to the user regarding the meaning of the link, whereas the spider may use this text as well as other information, such as a link relation, that is hidden from the user.
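To make the spider's position concrete, here is a minimal sketch in Python of a secondary client reading the declarative description of links without rendering anything. The sample markup, the example.org URIs, and the names LinkCollector, collect_links and links_to_follow are all assumptions made up for illustration; a real spider would also need fetching, politeness rules and error handling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects <a> links from a document without rendering anything."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []     # finished links: dicts with href, rel, text
        self._open = None   # the <a> element currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            href = attrs.get("href")
            if href:
                self._open = {
                    "href": urljoin(self.base_url, href),
                    "rel": (attrs.get("rel") or "").split(),
                    "text": "",
                }

    def handle_data(self, data):
        # The link text is what a human would read; the spider keeps it too.
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            self._open["text"] = self._open["text"].strip()
            self.links.append(self._open)
            self._open = None


def collect_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def links_to_follow(links, wanted_rel):
    """Pick the URIs whose link relation matches what the spider wants."""
    return [link["href"] for link in links if wanted_rel in link["rel"]]


# Hypothetical document; the URIs and rel values are illustrative only.
SAMPLE = """
<p>Read the <a href="/posts/2" rel="next">next post</a> or go
<a href="/">home</a>.</p>
"""

links = collect_links(SAMPLE, "http://example.org/posts/1")
next_uris = links_to_follow(links, "next")
```

Nothing here draws blue underlined text or waits for a click. The spider works out which GET the browser would issue and decides for itself, from the link relation, whether to issue it.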

A spider does not use the controls described by the hypertext document — even if it could realize the controls, they are designed to be used by human beings. Rather, it attempts to understand the meaning of the control based on the declarative description of what is conveyed to the user about that control as well as control metadata such as link relations. Based on this information it determines whether to take the action associated with the control. Because of this, spiders are typically limited in what they can do. For example, spiders usually cannot fill in and submit a form without some out-of-band knowledge about the web site (e.g. that the form is designed to capture the details of a book purchase).
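The form limitation can be sketched the same way. In the Python below, the form markup, the opaque field names f1 and f2, the placeholder ISBN, and the helpers FormFieldCollector and plan_submission are all hypothetical. The point is only that the document tells the client where and how to submit, while the meaning of each field has to come from a hand-written mapping supplied out of band.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Records the action, method, and input names of the first form seen."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.fields = []
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields.append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def plan_submission(html, field_meanings, data):
    """Return (method, action, payload), or None if a field is not understood.

    field_meanings is the out-of-band part: a hand-written map from this
    site's field names to concepts the client already has data for.
    """
    parser = FormFieldCollector()
    parser.feed(html)
    parser.close()
    payload = {}
    for name in parser.fields:
        concept = field_meanings.get(name)
        if concept is None or concept not in data:
            return None
        payload[name] = data[concept]
    return parser.method, parser.action, payload


# Hypothetical purchase form; the field names carry no meaning by themselves.
FORM = (
    '<form action="/orders" method="post">'
    '<input name="f1"><input name="f2">'
    '<input type="submit" value="Buy">'
    "</form>"
)
ORDER = {"isbn": "0000000000", "quantity": "1"}  # placeholder values

without_knowledge = plan_submission(FORM, {}, ORDER)
with_knowledge = plan_submission(FORM, {"f1": "isbn", "f2": "quantity"}, ORDER)
```

Without the mapping the client has nothing to go on and gives up; with it, the same document yields a complete request. A human reader gets that mapping from the labels and surrounding text on the page.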

To deal with this issue, machine-specific control information is often layered onto a user-specific HTML page. Link relations, microformats and the like are constructs for this purpose. Another approach is to use an entirely separate format from HTML for machine interaction with a service. Unfortunately, the controls offered to the client programs are quite anemic, modeled after the <link> tag in HTML with nothing but a simple link relation to drive the control. Interestingly, the <link> tag is not associated with a UI control in HTML; the de facto standard for machine-to-machine hypermedia controls isn’t a control at all. It is a declaration of a typed association between two resources – not the same thing. The <a> tag is a control, but <link> is not even though they are both types of “links”.

I suggest that a new approach to hypermedia design is required to address the needs of machine-to-machine systems: one that is based on the design of data-guided controls that are appropriate for the specific machine-driven clients that are relevant to a problem space. Such an approach treats a machine control as an analogue of a user interface control: a construct that provides an equivalent to perceived affordance suitable for machines and processes input events from machines. I intend to explore this further in upcoming posts.

This entry was posted on August 25, 2010 at 6:38 am and is filed under Hypermedia, REST, Software Architecture.

7 Responses to “Machine-to-Machine Hypermedia”

Mike K Says:
August 25, 2010 at 9:12 pm
Machines are not autonomous, and they don’t have intuition. HTML relies on these human properties to drive applications.

Machine clients will always rely on ‘out-of-band’ knowledge because they aren’t actually capable of comprehending any in-band knowledge – they’re just pre-defined mechanisms, after all – everything a machine client ‘knows’ is out-of-band, because that is where it is created by its developer(s).

Accepting that, ‘simple link relations’ (and accompanying documentation) provide everything a developer needs to comprehend/discover a particular application and build the machine and, given that, everything the machine itself needs to operate.

As a practical example: if I significantly change my customers’ checkout process in my HTML-driven web app, I can use natural language and GUI elements to guide them through the new flow, and it is likely most of them will adapt ‘in-band’. Machine clients just aren’t capable of reacting in the same way right now, and possibly never will – so why bother creating unnecessary hypertext controls and bloating your hypermedia types?
- Andrew Wahbe Says:
  August 26, 2010 at 3:00 am
  Thanks for the comment Mike.
  
  A few things:
  – controls for machines are definitely different than controls for human beings, but that doesn’t mean they are impossible to build
  – yes a machine can’t “know” anything, but that doesn’t mean that output messages to the machine can’t put it in a state where it will ultimately generate an acceptable input message
  – this works best when (and perhaps is only possible when) the hypermedia format is designed for a specific type of machine user
  – the link relation approach assumes that the only attributes necessary to define a control are a type and a URI. I think it’s hard to build interesting or useful controls that way
  – Here’s my practical example: One can build and evolve all sorts of call control applications using CCXML. These applications are “used” by a machine — a call control platform. While CCXML has its warts, I believe it qualifies as a hypermedia format for machines.
  
  I will definitely elaborate on this in future posts.
Duncan Cragg Says:
August 28, 2010 at 5:51 pm
My own interpretation of REST for Machine-to-Machine integration or distribution is here:

http://duncan-cragg.org/blog/post/deriving-forest/

I see links only as creating hyperdata graphs, which is quite a different view from the ‘link-rel-action’ approach that is seeing increasing popularity these days.
Using Typed Links to Forms :: iansrobinson.com Says:
September 2, 2010 at 2:23 pm
[…] many different kinds of hypermedia control. Recently, Andrew Wahbe started examining the need for machine-to-machine hypermedia, and the differences between controls for machines and controls for humans. Watch his blog for […]
This Week in #REST – Volume 24 (Aug 23 2010 – Sep 5 2010) « This week in REST Says:
September 6, 2010 at 7:27 am
[…] Machine-to-Machine Hypermedia – More analysis of what hypermedia is and why we need machine-to-machine hypermedia controls. (by AndrewWahbe) […]