Web Linking

A couple of months ago, Mark Nottingham’s Web Linking Internet-Draft made its way to RFC status as RFC 5988. This is a pretty significant specification for the web. It does three key things:

  1. It provides a generic definition of a “link”;
  2. It establishes a registry for link relations; and
  3. It defines the HTTP link header.

The first point is one of those things that surprisingly hadn’t been done before, at least as far as I know. Sure, links have been defined in the context of specific formats, and the semantic web has a fairly generic definition of a link, but the Web Linking RFC provides an application- and serialization-agnostic definition, which is a pretty useful thing to have.

The RFC defines a link as a typed connection between two resources: a context resource and a target resource. The resources are identified by URIs (well, technically IRIs) and the type is identified by a link relation. A link relation identifier can be one of the relation types from the registry established by the RFC, or it can be an extension (non-registered) relation identified by a URI. The type defines the semantics of the link. For example, the “stylesheet” relation means that the target resource is a stylesheet for the context resource.
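
To make the pieces of that definition concrete, here is a minimal Python sketch of the link model. The `Link` class, the `relation_kind` helper and the `REGISTERED_SUBSET` set are names made up for illustration, and the URIs are hypothetical. The set holds only a handful of relation types from the registry, not the full list, and the extension check is a simplification that only looks for a URI scheme.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Illustrative subset of registered relation types; the real registry is much longer.
REGISTERED_SUBSET = {"alternate", "next", "prev", "self", "stylesheet"}


@dataclass(frozen=True)
class Link:
    """A typed connection between a context resource and a target resource."""
    context: str   # IRI of the context resource
    relation: str  # registered relation name or extension relation URI
    target: str    # IRI of the target resource


def relation_kind(relation: str) -> str:
    """Classify a relation as 'registered', 'extension', or 'unknown'."""
    # Registered relation names are compared case-insensitively.
    if relation.lower() in REGISTERED_SUBSET:
        return "registered"
    # Simplification: anything with a URI scheme is treated as an extension relation.
    if urlparse(relation).scheme:
        return "extension"
    return "unknown"


link = Link(
    context="http://example.com/page.html",   # hypothetical context resource
    relation="stylesheet",
    target="http://example.com/style.css",    # hypothetical target resource
)
print(relation_kind(link.relation))
print(relation_kind("http://example.com/rels/widget"))
```

The point of the sketch is only that the relation carries the semantics: two links with the same context and target but different relations are different links.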

The HTTP link header provides a mechanism for specifying links in an HTTP response when the representation format being used does not natively support them. For example, a response to an HTTP GET request for a PNG image could use the link header to indicate that the resource is linked to other resources in various ways.

Now some might not immediately see the benefit of this specification. There are tons of (so-called) “RESTful” APIs out there stuffing links in JSON data structures all over the web right now. For example, what about the following HTTP request:

GET /someresource.json HTTP/1.1
Host: example.com

and response:

HTTP/1.1 200 OK
Content-Type: application/json

{ "node" : {
  "id" : "http://example.com/identifiers/12345",
  "next" : "http://example.com/nextres.json",
  "value" : "http://other-example.org/somelink.html"
}}

Well, this data structure seems to contain three URIs, but does it contain three links? It’s hard to tell without knowing more about the data format, but if we make some assumptions from the names, the properties demonstrate three distinct uses of URIs. The “id” property uses a URI as a unique identifier, similar to how URIs are used in XML namespace identifiers or in the Atom id element. The “value” property is just a string; it could be anything, but in this example that string happens to contain an http URI. One might even argue that this isn’t really a URI at all, but just a string that looks like one. The “next” property is the only link out of the three, providing the URI of another node that follows the one we are looking at (i.e. the nodes form a linked list).

Of course, if the property names were “x”, “y” and “z” we’d have a much harder time figuring this out. This is an example of where serving data with a content-type of “application/json” (or “application/xml”) falls short. You can’t tell how to interpret the data just by looking at the response; you need some extra information or documentation to understand it. This is where self-descriptive messages are critical: the response would be self-descriptive if the data format had its own media type (e.g. “application/node”) with an associated specification that described how to interpret the individual properties. This specification would define the semantics of the link relation associated with the “next” property (i.e. this format doesn’t use a generic link construct with an extensible relation field), allowing the consumer to follow the link to fetch the following node in the list. This format specification could also be considered “extra information or documentation”, but there are two differences. It isn’t specific to the URIs you are interacting with (in a linked environment you can find yourself interacting with URIs that you hadn’t expected), and the relevant specification can easily be identified via the communicated media type.

This hopefully makes it clear that it is not sufficient for data to simply contain URIs for it to be “linked”. There must be a specification of the format that identifies those URIs as links, and either defines the link semantics or how they can be determined. The link might be part of a generic link construct like the Atom and HTML <link> elements, referencing a relation from the link relation registry that provides the link semantics. Alternatively, the link semantics might be defined in the data format, as was the case in the “next” property from our example.
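
As a rough illustration of that point, the sketch below keys link extraction on the media type. It assumes the fictitious “application/node” type from this post, with a rule that the “next” property is a link; the names `links_from_node`, `LINK_EXTRACTORS` and `extract_links` are invented for the example. A body served as plain “application/json” yields no links, because nothing tells the consumer which strings are links.

```python
import json


def links_from_node(body: str) -> list[tuple[str, str]]:
    """Assumed application/node rule: "next" is a link with relation "next"."""
    node = json.loads(body)["node"]
    return [("next", node["next"])] if "next" in node else []


# Media type -> function that knows which properties of that format are links.
LINK_EXTRACTORS = {"application/node": links_from_node}


def extract_links(content_type: str, body: str) -> list[tuple[str, str]]:
    """Return (relation, target) pairs, or nothing if the format defines no links."""
    media_type = content_type.split(";")[0].strip().lower()
    extractor = LINK_EXTRACTORS.get(media_type)
    return extractor(body) if extractor else []


body = """
{ "node" : {
  "id" : "http://example.com/identifiers/12345",
  "next" : "http://example.com/nextres.json",
  "value" : "http://other-example.org/somelink.html"
}}
"""

print(extract_links("application/json", body))  # same bytes, no link semantics
print(extract_links("application/node", body))  # the format spec identifies "next"
```

The same bytes produce different answers depending on the declared media type, which is the sense in which the URIs alone do not make the data “linked”.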

The Web Linking RFC is really useful because you don’t have to go and redefine what a link is in every data format you define; you can reference the definition in the RFC. The specific link types in your format can reference the definitions in the link relation registry. For example, the definition of the format used in our example above could simply reference the “next” relation in the registry. Alternatively, a generic link serialization could be used that leaves the relation as a field, allowing any relation in the registry (or a proprietary relation) to be used.

For example, the response in the example above could be re-defined as follows:

HTTP/1.1 200 OK
Content-Type: application/node

{ "node" : {
  "id" : "http://example.com/identifiers/12345",
  "links" : [ {
    "href" : "http://example.com/nextres.json",
    "rel" : "next"
  } ],
  "value" : "http://other-example.org/somelink.html"
}}

(Update: I changed the content type here from “application/json” to the fictitious “application/node” media type — see Nathan’s comment below. We assume that the specification for this type properly defines the “links” field as an array of links.)
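
With a generic “links” array, a consumer only needs to know the relation it is interested in, not a format-specific property name. Here is a small sketch under the same assumption that the fictitious “application/node” specification defines “links” as an array of objects with “href” and “rel” fields; `targets_for` is a hypothetical helper name, and the relation comparison is a plain string match.

```python
import json

body = """
{ "node" : {
  "id" : "http://example.com/identifiers/12345",
  "links" : [ {
    "href" : "http://example.com/nextres.json",
    "rel" : "next"
  } ],
  "value" : "http://other-example.org/somelink.html"
}}
"""


def targets_for(document: dict, rel: str) -> list[str]:
    """Return the href of every entry in "links" whose rel matches exactly."""
    links = document["node"].get("links", [])
    return [link["href"] for link in links if link.get("rel") == rel]


node_doc = json.loads(body)
print(targets_for(node_doc, "next"))
print(targets_for(node_doc, "prev"))
```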

Alternatively, the link header could be used as follows:

HTTP/1.1 200 OK
Content-Type: application/json
Link: <http://example.com/nextres.json>; rel="next"

{ "node" : {
  "id" : "http://example.com/identifiers/12345",
  "value" : "http://other-example.org/somelink.html"
}}

However, this has the disadvantage of separating the link from the content – if you store the body somewhere (e.g. to disk) you lose the link information that was sent along with it. It also means that you can’t communicate the link information if you communicate the structure using a protocol that doesn’t support the link header (e.g. FTP). It is best to embed the link information in the response body unless the format doesn’t allow it (e.g. it’s an image).
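
For completeness, here is a minimal sketch of how a client might read a Link header value like the one in the example above. It is not a full parser for the header grammar: it assumes simple values of the form `<uri>; name="value"`, does not cope with a `<` inside a quoted parameter, and leaves a multi-valued `rel` as one string. The function name `parse_link_header` and the `href` key are choices made for this example.

```python
import re

# One link-value: a <target> followed by everything up to the next "<".
_LINK = re.compile(r'<([^>]*)>([^<]*)')
# One parameter: ;name="quoted value" or ;name=token
_PARAM = re.compile(r';\s*([A-Za-z0-9*_-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value: str) -> list[dict[str, str]]:
    """Parse a simple Link header value into a list of dicts."""
    links = []
    for target, params in _LINK.findall(value):
        link = {"href": target}
        for name, quoted, bare in _PARAM.findall(params):
            # Keep the first occurrence of a parameter and never overwrite href.
            link.setdefault(name.lower(), quoted or bare)
        links.append(link)
    return links


header = '<http://example.com/nextres.json>; rel="next"'
print(parse_link_header(header))
```

A client that stores the body to disk could keep the parsed links alongside it, which is one way to soften the separation problem described above.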

I hope it’s clear that the Web Linking RFC makes it really easy to add links to any resource on the web. But does this mean that any resource can be turned into hypermedia? Is a linked resource the same thing as hypermedia? As discussed in my last post, we can summarize “hypermedia” as data-guided controls. Is adding links to data sufficient to provide “data-guided controls”? This is something I’ll continue to explore in future posts.


Responses to “Web Linking”

  1. Nathan Says:

    The first example contains 3 URIs. If they were presented to a human in such a way that they were actionable, then those would be called hyperlinks, untyped ones.

    The second example is exactly the same. Because application/json doesn’t have any hypermedia semantics, the terms href and rel in that JSON have no meaning at all. It’s no good using two terms from another media type and saying that they have the same meaning elsewhere; they don’t.

    The third example has one fully typed link, the one in the Link header. And that’s the only example with any kind of link in it.

    Best,

    Nathan

    • Andrew Wahbe Says:

      Oops yes — I agree on the second example. I forgot to change the content type in the second example from “application/json” to my fictitious “application/node” media type. Will update.

      Your comment about “presenting the link to a human” is touching on the “data-guided controls” issue I talked about at the end. More on that to follow.

