ETag and JSON data

Recently, as part of an update project for our university portal MyEd (which runs on uPortal), there was an emphasis on moving our content towards more client-driven access to data. We wanted to separate data from presentation, and also cut down on the load and traffic that a single large server-side render would produce.

We wanted to use JSON as the data format, as it is lightweight and easy to parse with existing JavaScript libraries (like jQuery). We then added static URLs to the uPortal portlets, which allow the currently authenticated user (and only that user) to access their own data.

Our portal is under reasonably heavy concurrent load at any given time, so we wanted to explore caching to make sure client-side calls perform well under that load.

Cache Headers versus ETag

Cache headers tell a browser not to re-request an object from the server until a certain time, typically by setting an expiry date. This avoids any traffic going to the server at all, which reduces load, but it can mean that changes to data are missed because the cache expiry date has not yet been reached.
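
As a rough illustration of expiry-based caching, the sketch below builds the two response headers commonly used for it. The class and method names (CacheHeaderSketch, expiryHeaders) are invented for this example and are not part of our portlet; "private" is used on the assumption that the data is per-user and should not be stored by shared caches.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class CacheHeaderSketch {

    // HTTP dates use a fixed English format in GMT with a two-digit day.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
                    .withZone(ZoneOffset.UTC);

    // Builds headers telling the browser it may reuse the response until now + maxAge.
    static Map<String, String> expiryHeaders(Instant now, Duration maxAge) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "private, max-age=" + maxAge.getSeconds());
        headers.put("Expires", HTTP_DATE.format(now.plus(maxAge)));
        return headers;
    }
}
```

Until that time passes the browser will not contact the server at all, which is exactly why a change made in the meantime can be missed.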

ETagging is different, in that an ETag value is set in the header, for example:

ETag: "asb227873hva23456n"

When the browser re-requests data from the URL, it passes the ETag back to the server in an If-None-Match header, e.g.:

If-None-Match: "asb227873hva23456n"

The server then uses the ETag to decide what to do: either send an HTTP status code of 304 Not Modified (with no response body, so the response is very short), or refresh the data and return the new information to the client. This reduces the bandwidth required, but more importantly allows the server to decide how and when to respond with fresh data.
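
The decision described above can be sketched as a small pure function. This is only an illustration under stated assumptions: the names ConditionalGetSketch and statusFor are made up here, and splitting the header on commas is a simplification that would break for an ETag containing a comma. It does handle the "*" wildcard and ignores the "W/" weak prefix when comparing, which is how If-None-Match is meant to be compared.

```java
// Hypothetical helper, for illustration only.
final class ConditionalGetSketch {

    static final int SC_OK = 200;
    static final int SC_NOT_MODIFIED = 304;

    // Returns 304 if any ETag listed in If-None-Match matches the current one, else 200.
    static int statusFor(String ifNoneMatch, String currentETag) {
        if (ifNoneMatch == null || currentETag == null) {
            return SC_OK; // nothing to compare, so send the full response
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String tag = candidate.trim();
            if (tag.equals("*") || stripWeak(tag).equals(stripWeak(currentETag))) {
                return SC_NOT_MODIFIED;
            }
        }
        return SC_OK;
    }

    // Drops the "W/" weak-validator prefix so that weak and strong tags compare equal.
    private static String stripWeak(String tag) {
        return tag.startsWith("W/") ? tag.substring(2) : tag;
    }
}
```

With the example headers above, statusFor("\"asb227873hva23456n\"", "\"asb227873hva23456n\"") yields 304, while a request without If-None-Match yields 200.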

To get the best performance, you would in most situations use both cache headers and ETags: caching limits high-frequency client traffic to the server, while the ETag lets the server reduce the work it does for the requests that still arrive. When we used both, we found that the behaviour of our uPortal server alongside our load balancer led to unexpected results, so we opted to use ETagging only at first.

(Why our load balancer caused unexpected caching behaviour is something we’ll have to investigate later, and potentially write up in a post of its own!)

Portlet modifications

In the portlet itself (which is written in Java), we changed the JSON data controller method to add an ETag:

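// Compute an ETag for the data and a server-side expiry time for it.
final String eTag = getETag(data);
final Date expiry = new Date(System.currentTimeMillis() + MAX_AGE_MILLIS);

// Remember both in the user's session so later requests can be checked against them.
session.setAttribute(SESSION_ATTR_CACHE_ETAG, eTag);
session.setAttribute(SESSION_ATTR_CACHE_EXPIRY, expiry);
response.setStatus(HttpServletResponse.SC_OK);
// Tell caches to revalidate a stale copy with the server instead of reusing it.
response.setHeader("Cache-Control", "must-revalidate");
response.setHeader("ETag", eTag);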
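
The body of getETag is not shown in this post. One plausible way to write such a helper, assuming the data is available as a JSON string, is to hash it and wrap the hex digest in double quotes (ETag values are quoted strings). The class name ETagSketch and the choice of SHA-256 are assumptions made for this sketch, not a description of our actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only.
final class ETagSketch {

    // Derives a quoted ETag from the JSON payload: identical payloads give identical tags.
    static String getETag(String json) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(json.getBytes(StandardCharsets.UTF_8));
            StringBuilder tag = new StringBuilder("\"");
            for (byte b : hash) {
                tag.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return tag.append('"').toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

A content hash like this changes whenever the data changes, at the cost of needing the data in hand when the tag is first computed.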

Finally, we added a check in the method for the ETag coming from the If-None-Match header:

final String ifNoneMatch = request.getHeader("If-None-Match");
final String existingETag = (String) session.getAttribute(SESSION_ATTR_CACHE_ETAG);
final Date existingExpiry = (Date) session.getAttribute(SESSION_ATTR_CACHE_EXPIRY);
if (null != ifNoneMatch
        && null != existingETag
        && null != existingExpiry
        && ifNoneMatch.equals(existingETag)
        && System.currentTimeMillis() < existingExpiry.getTime())
{
    response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
    return null;
}

The code above takes the ETag passed in by the browser, compares it with the one stored in the user’s session, and also checks the expiry time stored alongside it. If the tags match and the expiry hasn’t passed, it responds with a 304 Not Modified. In that case the method returns null, which means it doesn’t have to query the underlying dataset to respond, and therefore the response time and the bandwidth used are dramatically reduced.
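
To make the expiry behaviour easier to see (and to test without a servlet container), the same check can be sketched against a plain Map standing in for the session, with the current time passed in as a parameter. Everything here is illustrative: the class SessionETagCheckSketch, its method names, and the attribute keys are invented for the example.

```java
import java.util.Map;

// Hypothetical stand-in for the session-based check, for illustration only.
final class SessionETagCheckSketch {

    static final String ATTR_ETAG = "cacheETag";
    static final String ATTR_EXPIRY = "cacheExpiryMillis";

    // Stores the ETag and the time (in epoch millis) after which it must not be trusted.
    static void remember(Map<String, Object> session, String eTag,
                         long nowMillis, long maxAgeMillis) {
        session.put(ATTR_ETAG, eTag);
        session.put(ATTR_EXPIRY, nowMillis + maxAgeMillis);
    }

    // True only when the browser's tag matches the stored one and the expiry has not passed.
    static boolean canAnswerNotModified(Map<String, Object> session,
                                        String ifNoneMatch, long nowMillis) {
        String existingETag = (String) session.get(ATTR_ETAG);
        Long existingExpiry = (Long) session.get(ATTR_EXPIRY);
        return ifNoneMatch != null
                && existingETag != null
                && existingExpiry != null
                && ifNoneMatch.equals(existingETag)
                && nowMillis < existingExpiry;
    }
}
```

Once the expiry has passed, even a matching tag produces a full response, so the stored expiry acts as an upper bound on how long the server will keep answering 304 without looking at the data again.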