In an earlier post, I examined multiple definitions of the term “hypermedia”. A key aspect shared by all those definitions was that hypermedia is interactive – an entity interacting with the system is able to effect change through some sort of input or action. In fact, Roy Fielding summarizes “hypertext” as data-guided controls. Unfortunately, little focus is typically given to input event processing in the context of REST even though the style offers some definite advantages in this area.
In another post, I discussed a comparison of various alternative architectures for the Web found in section 5.2.1 of Roy Fielding’s dissertation. The focus of that analysis was the transfer of information from server to client, since the movement of data to the processor is what distinguishes REST from other architectural styles that move processing agents to the data.
In this post, I will re-examine these alternatives, extending the analysis to the information flowing from client to server and the protocols used to exchange information. By doing so, I hope to clarify the advantages provided by REST for input event processing.
Option 1: Client-Server
The first alternative architecture is based on a client-server model where information is rendered on the server and the resulting image is sent to the client. As discussed previously, this provides data encapsulation and decoupling at the expense of bandwidth usage and increased server-side processing. It also inhibits the creation of secondary clients such as spiders.
Now, let’s turn our attention to input processing: in such a system, the client has no knowledge of which input events are relevant to the application or what effect they might have. The only information the server has provided the client is an image to display on the screen – the client has no idea where the links or other input controls are located. There is nothing to tell it which areas of the screen respond to mouse-over events, clicks, etc. And so every mouse and keyboard event must be communicated to the server just in case it is relevant. This wastes bandwidth and wastes server-side processing on irrelevant input events. The client would likely also feel quite slow, because even the most minor change to the display requires a round trip to the server.
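The inefficiency is easy to see in miniature. Below is a minimal sketch – all class names, event tuples, and the "image" strings are illustrative inventions, not from any real system – of a thin client that must forward every raw event and wait for a freshly rendered image:

```python
class EchoServer:
    """Toy server for the sketch: it alone knows which screen regions matter."""
    def __init__(self):
        self.state = "home"

    def render(self):
        return f"<image of {self.state}>"

    def handle(self, event):
        # Only a click on one particular spot is actually a link...
        if event == ("click", 10, 20):
            self.state = "next-page"
        # ...but every event costs a full round trip and a full re-render.
        return self.render()


class ThinClient:
    """The client holds only an image; it cannot tell which events matter."""
    def __init__(self, server):
        self.server = server
        self.screen = server.render()

    def on_event(self, event):
        # No local knowledge of links or controls: forward everything.
        self.screen = self.server.handle(event)
```

Every `on_event` call – even an idle mouse movement – triggers a round trip, which is exactly the bandwidth and latency cost described above.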
Option 2: Mobile-Object
The second option is a mobile-object style architecture where the server sends a combination of both the data and a rendering engine (i.e. processing logic or code) to the client. Here, client-side events can be filtered and processed by the mobile code running on the client, so that communication with the server takes place only when the application truly needs it. Processing an event might update the rendered image through purely local computation, or it might generate a request to the server; many events could simply be ignored.
The client can also establish a connection with the server and communicate using virtually any protocol, since the protocol stack is included in the code that comprises the mobile object, and the communication protocol can be optimized for individual applications (though this flexibility cannot extend to the protocol used to download the initial mobile object). Thus client-to-server bandwidth usage, user-perceived latency, and server-side processing are all optimized. This comes at the expense of visibility: intermediaries are unable to interpret the information exchanged between the client and server and thus cannot perform any processing on the data or optimization of the communication flows (e.g. caching).
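By contrast with Option 1, mobile code can triage events locally. A minimal sketch (the class name, event kinds, and the toy "protocol" message are all invented for illustration) of downloaded code that drops irrelevant events, handles some entirely client-side, and contacts the server only on demand:

```python
class MobileRenderer:
    """Stand-in for downloaded mobile code: it decides which events matter."""
    def __init__(self, send):
        self.send = send       # stand-in for an app-specific protocol stack
        self.keystrokes = 0    # application state maintained client-side

    def on_event(self, event):
        kind = event[0]
        if kind == "mousemove":
            return                                 # irrelevant: no traffic
        if kind == "keypress":
            self.keystrokes += 1                   # handled entirely locally
        elif kind == "submit":
            self.send(("SAVE", self.keystrokes))   # only now contact server
```

Only the final `submit` produces any communication; everything else is filtered or absorbed by the downloaded code.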
Option 3: Raw Hyper-Data
In the third architecture, the server sends the raw data to the client along with a media type that indicates the data format, allowing the client to choose a rendering engine for the data. Here, the downloaded data not only drives the rendering of output but also instructs the client on how to interpret input events – ignoring irrelevant events, handling events locally where possible, and generating requests to the server for more data in other cases. Thus the client can be described as possessing an interactive rendering engine.
Unlike Option 2, these rendering and input-handling instructions do not take the form of downloaded code; instead, they are embedded in the data as declarative controls. Control data tells the rendering engine how to solicit input events from the user – for example, indicating that a certain string of text should be clickable and should therefore be rendered in a special way (underlined and blue) so that it (perceivably) affords clicking. The control data also tells the rendering engine that mouse click events targeted at the area of the screen occupied by that text must be processed in a certain way, for example, by downloading new data to drive the rendering engine.
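As a rough illustration, imagine a generic rendering engine driven entirely by such declarative control data – here a Python dict stands in for hypermedia markup such as an HTML anchor, and the field names and bracket "styling" are invented for the sketch:

```python
# A document: raw data plus embedded declarative controls (no code).
document = {
    "text": "See the next chapter",
    "controls": [
        # A link control: render its span specially, map clicks to a fetch.
        {"kind": "link", "span": (8, 20), "target": "chapter-2"},
    ],
}

def render(doc):
    """Render output, styling any region a control claims (brackets stand
    in for 'underlined and blue')."""
    out = doc["text"]
    for c in doc["controls"]:
        if c["kind"] == "link":
            a, b = c["span"]
            out = out[:a] + "[" + out[a:b] + "]" + out[b:]
    return out

def on_click(doc, position):
    """Return the target to fetch if the click hits a control, else None."""
    for c in doc["controls"]:
        a, b = c["span"]
        if c["kind"] == "link" and a <= position < b:
            return c["target"]   # generate a request for new data
    return None                  # irrelevant event: ignored locally
```

The same control data drives both output (how the link looks) and input (what a click on it means) – and a non-rendering client such as a spider could read `controls` directly without ever calling `render`.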
In this architecture, mobile code implementing a protocol stack cannot be downloaded and run by the client. Instead, communication must be restricted to a protocol that is already understood by the client. The control data indicates how to construct the data to be sent to the server in requests. This request data may be constructed from other control data (e.g. a logical name for the next set of data to download, performing a function similar to a URI) or from data received from the user via previous input events (e.g. a keyboard input event).
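A small, hypothetical illustration of that request construction – a form-like control combining its own data (an action name) with previously captured keyboard input; the request shape and field names are invented for the sketch:

```python
# A declarative form-like control embedded in the downloaded data.
form = {"kind": "form", "action": "search", "field": "q"}

def build_request(control, typed_text):
    """Assemble request data from control data plus user input; no
    downloaded code, only a pre-agreed request shape."""
    return {
        "op": "GET",                        # operation on the known protocol
        "name": control["action"],          # from the control data itself
        "params": {control["field"]: typed_text},  # from prior input events
    }
```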
This architecture shares some of the benefits of the mobile code architecture in that the client is able to process events locally rather than forwarding them all to the server. However, in this case event processing is restricted to the set of control types that can be expressed by the data format. Also, just as this architecture requires the client and server to pre-agree on a data format (inhibiting data encapsulation), they must also pre-agree on a set of data transfer protocols (restricting interoperability). However, as discussed in my earlier post, this comes with an advantage: because data is downloaded instead of code, not only is less bandwidth likely used, but the client also has more flexibility – it is not restricted to running the code; it can perform alternate functions such as index spidering without actually rendering the data. The fact that the control information is embedded in the data enhances this capability, allowing these alternate non-rendering clients to understand not only how the data would be rendered as output but also how the interactive rendering engine would process input events.
As discussed in my earlier post, Option 3 is not quite REST. REST additionally requires that data be transferred in a representation format “matching one of an evolving set of standard data types, selected dynamically based on the capabilities or desires of the recipient and the nature of the resource”, addressing the data encapsulation issues in Option 3. REST also requires the use of an evolving set of standard transfer protocols (that can all be mapped to a uniform interface). This addresses the issue in Option 3 where the client and server are required to pre-agree on a (potentially non-standard) protocol. In REST, a resource is referenced by a URI that uses a standard URI scheme. The URI scheme tells the client which standard protocol to use; this reduces pre-agreement between client and server to an agreement on a common space of standard protocols. This use of URIs to identify resources also allows the data format standards and the data transfer protocols to evolve separately (Larry Masinter has published a great discussion of this principle and it is also discussed in the first “rule” in Roy Fielding’s guidelines on REST APIs). Thus, the use of standard formats and transfer protocols allows REST to address the client-server coupling issues of Option 3.
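The scheme-based protocol selection can be shown concretely: Python's standard `urllib.parse` extracts the scheme from a URI, and a client can dispatch on it. (The handler table here is, of course, illustrative – the point is only that the URI itself names the standard protocol.)

```python
from urllib.parse import urlsplit

# Illustrative mapping from standard URI schemes to protocol handlers.
HANDLERS = {"http": "HTTP stack", "ftp": "FTP stack"}

def protocol_for(uri):
    """Select a standard protocol based solely on the URI scheme."""
    scheme = urlsplit(uri).scheme
    return HANDLERS.get(scheme, "unsupported scheme")
```

Because the scheme travels with the reference, client and server need only share the common space of standard schemes, not a per-application protocol agreement.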
REST also adds the use of code-on-demand to Option 3, allowing scripts and bytecode to be used as representation formats. This allows REST to achieve many of the benefits of Option 2. While Option 3 restricts the client’s rendering and input processing capabilities to what can be expressed by the data format, REST allows scripts and embedded objects to extend the client’s capabilities. Increased use of code-on-demand comes at the cost of decreased visibility – a tradeoff that must be kept in mind when designing a system. Practically, the more script-driven a web site is, the less non-browser clients (spiders, intermediaries, analytics and authoring tools, etc.) can do with it.
It should be noted that REST does not relax the constraints that require standard formats and protocols in the context of code-on-demand. For example, HTTP is used for communication performed by browser scripts in an AJAX application. To some, this may seem like a disadvantage, providing Option 2 with more flexibility than REST. However, standard protocols have some clear advantages over custom protocols. While I will leave a detailed discussion of this point for a future post, the advantages primarily come from the ability of intermediaries to act on communications, providing services such as caching and content transformation.
In my last post, I asked whether adding links to data was sufficient to turn it into hypermedia. It seems that there is more to hypermedia than links; hypermedia is an event filter. In a RESTful system, input events from the user are processed by the client according to the controls specified by the currently-loaded hypermedia document. These controls determine whether an event is simply dropped, transforms the document or application state in some way, or is translated into a request on the uniform interface for a new document to be loaded. Embedding controls in the data transferred to the client offers performance benefits over a client-server model that transmits all events to the server, and visibility benefits over a mobile-code architecture where downloaded code processes events. Controls are more than links; hypermedia is more than linked data.
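That closing idea – hypermedia as an event filter with three possible outcomes per event – can be summarized in a few lines. (A hypothetical sketch; the control vocabulary and event kinds are invented.)

```python
def filter_event(controls, kind):
    """Decide the fate of one input event under the current controls."""
    action = controls.get(kind)
    if action is None:
        return ("drop", None)                      # irrelevant: ignored
    if action["type"] == "local":
        return ("update-state", action["effect"])  # handled client-side
    return ("request", action["uri"])              # uniform-interface request

# Controls carried by the currently-loaded hypermedia document.
controls = {
    "hover": {"type": "local", "effect": "highlight"},
    "click": {"type": "fetch", "uri": "/chapter-2"},
}
```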