Hypermedia is the Client’s Lens

RESTful systems are by definition supposed to be based on the architectural style of the Web; however, there is one big fat glaring difference between the Web and almost all of the other systems out there that claim to be RESTful. I’m not talking about use of methods or any other aspect of HTTP. I’m not talking about the structure of the URIs, the resources in the system, or even whether or not the representations contain links. Many systems that can honestly claim to be at Level 3 of the Richardson Maturity Model have this deficiency.

I’m talking about over-constrained, service-specific hypermedia formats that precisely represent a service’s resources and workflow. It’s hard to think of this as a “problem”; it’s what we’re used to doing in software interfaces. But this is certainly not how things work on the Web. Here we have a single format, HTML, used by a wide variety of services (Google, Facebook, Amazon, etc.) that do very different things. The markup language is not designed around the semantics of any of the resources exposed by these services. There is no <book> element used to represent a book on Amazon.com. The Web’s interface is uniform not only because of HTTP but also because of HTML.

Think of it this way: the fact that HTML is able to describe a wide variety of services not only allows a single client to work with the limitless set of sites on the Web, but also allows a single service to evolve. A site can change itself from one of the services describable by HTML to another without requiring the browser to be updated. However, if a hypermedia format is only able to express a very constrained range of services, then service evolution becomes impossible. If you provide a service for “sprocket management” and your hypermedia format is defined in terms of sprockets, collections of sprockets and link relations describing all the supported actions one might take on sprockets, then it’s hard to see how it could be used for anything else in the future.

You may be wondering how exactly you go about broadening the range of your service’s hypermedia format. Sure, you can generalize the format (maybe a sprocket is just a type of widget, and instead of sprocket collections you can have a generic collection element), but that will only get you so far. More importantly, most folks don’t have the precognitive abilities required to know how their service is going to evolve. Format extensibility isn’t the answer either: clients would still need to be adapted to any extensions required to support changes to a service.

We can look to the Web for a strategy to deal with this.

The formats used on the Web aren’t designed around the services. They are designed around the client and what it does with the information. A browser is a program that presents information to human beings, primarily through a textual/graphical user interface, and the browser languages revolve around the needs of this task. HTML tags are used to provide structure to text and inline media as well as to provide controls for interacting with the information via links and forms. CSS allows the presentation to be customized, and JavaScript event handlers express handling for focus, mouse click, and other events if the default handlers aren’t appropriate for the application. These languages drive what the browser does with the information; they have nothing to do with the specific services being executed.

HTML is the language of the browser; it is the lens through which browsers see all resources. Services adapt their information and workflow descriptions (resources) into documents in this language (representations) when working with browsers. The HTTP Accept header tells them that they are working with a client that speaks the browser’s language — a browser or some other client, like an indexing spider that would like to see the resources from the browser’s perspective.

When building RESTful systems for other domains, why do we make the hypermedia format specific to the service? Instead, we should be defining hypermedia formats around the clients — if the goal is to allow the client to interact with the widest range of services possible, then it makes sense to use a format that is designed around the client semantics and bounded only by the capabilities of the client.
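To make the idea of a client-centric format a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the markup (`<say>`, `<on event="…" goto="…">`), the URIs, and the in-memory `DOCUMENTS` table standing in for HTTP are invented for illustration, loosely in the spirit of a voice client but not VoiceXML itself. The point is that the format’s vocabulary describes what the client does (produce output, bind an input event to a link), so one unmodified client can drive two unrelated services.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents from two unrelated services. Both use the same
# client-oriented markup, so a single client can drive either one.
DOCUMENTS = {
    "/bank": (
        "<doc><say>Welcome to the bank.</say>"
        '<on event="key-1" goto="/bank/balance">Press 1 for your balance.</on></doc>'
    ),
    "/bank/balance": "<doc><say>Your balance is 250.00.</say></doc>",
    "/weather": (
        "<doc><say>Weather line.</say>"
        '<on event="key-1" goto="/weather/today">Press 1 for today.</on></doc>'
    ),
    "/weather/today": "<doc><say>Sunny.</say></doc>",
}


def fetch(uri):
    # Stand-in for an HTTP GET of the client's own media type.
    return ET.fromstring(DOCUMENTS[uri])


def interpret(doc):
    """Turn markup into output commands plus an event -> URI binding table."""
    commands, bindings = [], {}
    for element in doc:
        text = (element.text or "").strip()
        if element.tag == "say":
            commands.append(("speak", text))
        elif element.tag == "on":
            commands.append(("speak", text))
            bindings[element.get("event")] = element.get("goto")
        # Other elements are ignored in this sketch.
    return commands, bindings


class Client:
    """Executes documents; knows nothing about banks or weather."""

    def __init__(self, start_uri):
        self.load(start_uri)

    def load(self, uri):
        self.uri = uri
        self.commands, self.bindings = interpret(fetch(uri))

    def handle(self, event):
        """Follow the link bound to an input event; unbound events are ignored."""
        target = self.bindings.get(event)
        if target is not None:
            self.load(target)
        return self.commands
```

For example, `Client("/bank").handle("key-1")` follows the bound link and returns `[("speak", "Your balance is 250.00.")]`. The services can reorganize their menus and URIs freely; only a change to what the client itself can do would require a new version of the format.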

At first glance this seems to bind the service to a specific client, but not completely, thanks to the declarative nature of markup and the Principle of Least Power. Other, secondary types of clients, like indexing spiders, are able to analyse the markup and understand what it does in order to provide other capabilities such as search. So really the service is bound to the ecosystem of clients that understand that markup language. Still, this model does limit the clients that can be supported with a single markup language.

This is where the separation of resources and representations, and Content Negotiation (conneg), come into play. A single resource can have multiple representations. They can have separate URIs or share a single one (using the conneg features of HTTP), but either way a client is able to get the resource represented in a format it supports. By supporting multiple hypermedia formats, your service can cater to multiple classes of client simultaneously. Support for new formats/clients can easily be added without disrupting existing ones.
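As a rough illustration of server-driven conneg, the following Python sketch picks a representation of one resource based on the request’s `Accept` header. The account data, the `application/vnd.example.cheque+xml` media type, and all function names are hypothetical. The `Accept` parsing is deliberately simplified: it honours q-values and `type/*` or `*/*` ranges but ignores any media-type parameters other than `q`.

```python
from html import escape

# Hypothetical bank-account resource; the resource itself is format-neutral.
ACCOUNT = {"id": "12345", "owner": "A. Customer", "balance": "250.00"}


def render_html(acct):
    # Representation for browsers (e.g. teller terminals): text plus a link.
    aid = escape(acct["id"])
    return (
        "<html><body>"
        f"<h1>Account {aid}</h1>"
        f"<p>Owner: {escape(acct['owner'])}</p>"
        f"<p>Balance: {escape(acct['balance'])}</p>"
        f'<a href="/accounts/{aid}/transactions">Transactions</a>'
        "</body></html>"
    )


def render_cheque_xml(acct):
    # Hypothetical format built around what a cheque-processing machine does:
    # identify a deposit target and submit a scanned cheque to a URI.
    aid = escape(acct["id"])
    return (
        f'<deposit-target account="{aid}">'
        f'<submit href="/accounts/{aid}/deposits"/>'
        "</deposit-target>"
    )


# Supported media types, in server preference order (used to break ties).
RENDERERS = {
    "text/html": render_html,
    "application/vnd.example.cheque+xml": render_cheque_xml,
}


def parse_accept(header):
    """Return a list of (media_range, q) pairs from an Accept header."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def matches(media_range, media_type):
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    ttype, _, tsub = media_type.partition("/")
    return rtype == ttype and rsub in ("*", tsub)


def negotiate(accept, available):
    """Pick the available type with the highest q; None means 406."""
    ranges = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        # The most specific matching range determines q (fewer '*' wins).
        q, stars = 0.0, 3
        for media_range, range_q in ranges:
            if matches(media_range, media_type) and media_range.count("*") < stars:
                q, stars = range_q, media_range.count("*")
        if q > best_q:
            best, best_q = media_type, q
    return best


def represent(accept, resource):
    """Return (status, content_type, body) for a GET on the resource."""
    if accept is None:  # no Accept header: any representation will do
        accept = "*/*"
    media_type = negotiate(accept, list(RENDERERS))
    if media_type is None:
        return 406, None, None
    return 200, media_type, RENDERERS[media_type](resource)
```

With this design, supporting a new class of client amounts to adding another entry to `RENDERERS`; existing clients keep receiving the same representations. A real service sharing one URI across representations would also send `Vary: Accept` so that caches key on that header.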

For example, say that you are building services within a bank. Rather than designing a single hypermedia format around the resources and services such as accounts, balances and financial transactions, design (or re-use) separate formats for each type of client. The teller terminals would likely use HTML; the ATMs might also use HTML, or something similar that deals with the specifics of their interface; and a separate hypermedia format might be designed for automated cheque-processing machines. Inter-bank transactions are a tricky point: they might require a format designed around accounts, transactions, etc., but it should at least be customized to the specific inter-bank use cases. Alternatively, you could support multiple hypermedia formats at the inter-bank boundary, one for each specific client type. (OK, I’m not in banking myself, so maybe I’m a little off base on the specifics, but I think the principle is sound. If you want examples from my domain, I can simply refer you to VoiceXML and CCXML.) By customizing the hypermedia to the capabilities and needs of the individual classes of client, you give the services the ability to change and grow.

This model is based on the assumptions that the services outnumber and will evolve more quickly than the clients and that updating clients is more difficult/expensive than updating services. HTML and browsers do evolve — but there are far fewer browser implementations and versions of HTML than there are services (never mind versions of those services). Also, it is much more difficult to get millions of users to update their browsers than to update your web site. If these assumptions don’t hold for your domain, maybe designing your hypermedia format around your service does make sense — just don’t fool yourself into thinking that you’ve decoupled your clients from your services.

That’s enough for now. I’ve likely raised more questions than I’ve answered here but this is definitely not my last word on this. I plan to provide more details in upcoming posts. Please fire away with the questions and feedback though so I can focus on the right areas.


11 Responses to “Hypermedia is the Client’s Lens”

  1. Nathan Says:

    Think you nailed it 🙂

    In addition, if you use a hypermedia type that supports typed relations, you can serialize machine-readable data within it. Or you could bump up to something like RDFa, which embeds machine-readable EAV/RDF structured data with typed relations inside human-readable HTML; then, to serve a lightweight version with the human presentation elements stripped, you could conneg through to something like Turtle, N3 or RDF/XML…



    • Andrew Wahbe Says:

      Thanks Nathan.

      Yeah, RDFa and microformats etc. offer a neat twist on things, allowing extra semantic meaning to be layered on top of your data. It’s not clear to me if the right way to think of this is as expanding the ecosystem of clients that support your base format (and also allowing existing clients to do more with the data) or as really supporting multiple formats (and hence truly distinct clients) in one doc. The difference for me would be that the latter would allow different workflows through the data for each class of client.

  2. Scott Banwart's Blog » Blog Archive » Distributed Weekly 54 Says:

    […] Hypermedia is the Client’s Lens […]

  3. guilhermesilveira Says:

    Great post Andrew.

    In the human web, we design new services around the knowledge that our possible clients have, while in the machine world we have been designing clients around what our servers provide.
    Although we can change the order of things, designing our services around our client sets, there will always be a binding between both sides; after all, it’s a client-server architecture.

    If a new set of clients comes in, the resources and representations that we could not provide yet can be supported by providing a new media type, by linking the resources (and processes) within the existing ones, or both.

    What do you think?

  4. This Week in #REST – Volume 19 (May 31 2010 – Jun 13 2010) « This week in REST Says:

    […] Hypermedia is the Client’s Lens – “Over-constrained, service-specific hypermedia formats that precisely represent a service’s resources and workflow” are the problem with modern RESTful systems (other than the Web). (by Andrew Wahbe) […]

  5. Mike Says:

    That’s a great overview – I’m now just confused as to why you would dislike the idea behind application/hal+xml, since they are essentially the same.

    • Andrew Wahbe Says:

      Thanks Mike.
      So first I’ll just say that maybe I’m misunderstanding HAL. Is there a more comprehensive description than your blog post: http://restafari.blogspot.com/2010/06/please-accept-applicationhalxml.html ? Perhaps I just need to get a better sense of it.

      Otherwise, my take is that we’re not taking the same approach. I believe you are shooting for a highly extensible format — what I was addressing when I said:

      “Format extensibility isn’t the answer either: clients would still need to be adapted to any extensions required to support changes to a service.”

      To me, HAL is pushing the real definition of the media type semantics into the extensions. A base HAL client (that understands no extensions) isn’t very powerful — it can’t do much more than parse the representation. The advantage over simple XML seems to be that it has a notion of links, but without the relations (and these are all extensions) a base client has no idea what to do with the links. From what I can tell data semantics are also defined in extensions — again a base client can do nothing with the data.

      So in practice your hypermedia format is actually defined by HAL + a specific set of extensions. Now you could define your HAL-based hypermedia types around the client rather than the service — but I don’t think HAL necessarily makes this easier. In fact, I’d rather see a type more customized around the client’s needs (e.g. HTML, VoiceXML and CCXML) than something generic — it makes for a better, cleaner format. For example, HAL seems generic enough to implement something HTML-like, but the resulting format would be painful for developers IMO.

      On the other hand, maybe there are some use cases where it’s a good fit, perhaps making hypermedia format design a bit more of a paint-by-numbers, simple exercise. Not sure. And as I said, I might be missing something.

  6. Matthew W Says:

    One thing to bear in mind:

    There’s one important characteristic of HTML which makes it particularly amenable to being less constrained and more general in its semantics.

    That being that HTML documents are largely designed for direct human consumption!

    The client software itself (browser) doesn’t need to know anything about the application-specific semantics of an HTML document – it just needs to present hypertext options in a generic fashion to a human being, and the human being uses out-of-band, fuzzy, human reasoning to decide what the HTML page actually means and which hypertext options to select as a result.

    Not all APIs, clients and protocols are like this.

    Not saying that necessarily changes your argument, and of course there are ways to add structured semantics on top of general-purpose formats like HTML too. But I think it’s worth thinking about and talking about as part of this debate.

    • Matthew W Says:

      I guess a more concise way of putting that would be to say that the right design decision here could depend greatly on whether you’re catering for ‘thick clients’ which offer a lot of local application-specific functionality, or ‘thin clients’ which are glorified hypermedia browsers.

      The latter is more the REST way I suppose — but REST is also being sold quite heavily for use with other kinds of clients too.

      It would be nice if rather than focusing on what is the One True Way To Do REST, people focused on different kinds and different levels of REST for different situations, without making any kind of global value judgement as to which is better.

    • Andrew Wahbe Says:

      Yup… I was waiting for someone to present this point of view — I have a series of blog entries planned to argue against this perspective (this is the first) but let me make a few points here.

      HTML is not designed for direct consumption by human beings. Browsers consume HTML — they execute the declarative program represented in a markup language designed for textual/visual (well primarily visual) presentation of information. The markup looks like the output (what humans consume) because of the declarative nature of the language. But it’s not the same thing.

      HTML is not that generic. For example, it’s a pretty lousy language for specifying a call control application. CCXML is much better there. It also has structured semantics — they revolve around the task the client performs — textual/visual information presentation.

      HTML is generic enough to express all the things you want a browser to do and not much more. It is also specifically designed to express those things.

      There is a difference between the domain (for HTML: information presentation) and range (for HTML: services provided by all of the web sites on the internet, e.g. social networking, search, book selling) of a markup language. Don’t confuse the two — this is the main message of this post.

      I think the HTML model (as I see it) is widely applicable. At the end of the day the browser is converting HTML into command messages to the OS/window manager to draw things on the screen and input (keyboard, mouse) event messages into DOM events that ultimately trigger links to new HTML documents. This pattern is applicable to many use cases — VoiceXML and CCXML can be viewed this way. You need to design your hypermedia language around the commands and events that are processed on the client.

      I agree that you don’t need all of REST’s constraints in all cases. But I’m trying to address how to reap the full benefits — something that a lot of people haven’t had much success with IMO.
