Reducing server bandwidth with conditional GETs

Let's imagine one day you've been poking around the network usage section of your phone, trying to see what apps are killing your allotted 10GB of mobile data from T-Mobile.

You scroll down and notice the usual suspects: YouTube, TikTok, whatnot. Then, out of the blue, you start to see a bunch of applications that seem out of place. Newspaper apps, stock apps, and even some banking apps can use more bandwidth than you would think.

How could that be? It turns out that many applications from the New York Times to Robinhood will often re-poll for the latest information from every few minutes to every second. These constant GET requests, while small, can add up.

In this article, I'll explain a method many of these apps (hopefully) use to reduce the amount of bandwidth they take up: conditional GETs. Conditional GETs can help prevent your apps from getting the same 20 KB response every time you ping your server.

The gist

[Figure: A diagram showing the steps of conditional GETs]

Conditional GETs are used in asset caching to prevent a browser from receiving the same JavaScript/image/CSS payload if it has already cached the latest copy. We should try to use conditional GETs in any request to the server when we poll for cacheable content.

Let's look at a typical flow for the conditional request:

  1. The browser requests some content from a website.
  2. The server returns the content with one or both of these headers:
    • Last-Modified: some-date - The time (usually a timestamp) that this content was last modified
    • ETag: some-generated-value - A unique id that ties a resource to its state at a particular point in time
      • An ETag could be a hash of the content, an id assigned whenever the content is updated, or a unique string representing the content
  3. The browser requests the same content at a later time and can pass some conditional request headers:
    • If-Modified-Since: some-date - The last timestamp saved on the browser
    • If-None-Match: some-generated-value - The previous ETag saved on the browser
  4. The server checks whether either of those values still describes the current content:
    • If the content is the same, the server will return a 304 (Not Modified) status with no body
    • If the content is different, the server will return the new data with a new Last-Modified and/or ETag.
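The decision in step 4 can be sketched as one small function. This is a minimal illustration rather than a complete implementation: the name evaluate_conditional_get and its arguments are made up for this example, and the ETag check is a plain string comparison.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def evaluate_conditional_get(request_headers, current_etag, last_modified):
    """Return 304 if the client's copy is still current, otherwise 200.

    request_headers: dict of request header names to values (hypothetical shape)
    current_etag: the resource's current ETag string, e.g. '"v1"'
    last_modified: timezone-aware datetime of the last change
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present, If-Modified-Since is ignored
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or current_etag in candidates:
            return 304
        return 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: treat the request as unconditional
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates only have one-second resolution
        if last_modified.replace(microsecond=0) <= since:
            return 304

    return 200


last_change = datetime(2020, 6, 10, 12, 0, 0, tzinfo=timezone.utc)
print(evaluate_conditional_get({"If-None-Match": '"v1"'}, '"v1"', last_change))  # 304
```

A request with no conditional headers always falls through to 200, which is what a first-time visitor needs.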

In Practice

In the example below, I am creating a server that allows a user to update and retrieve their user information. The application would allow us to fetch a user's social media information on request.

// @language-override:Node + Express
// @repl-it-link: https://repl.it/@4shub/conditional-get-nodejs
const express = require('express');
const server = express();
const port = 3000;

// Imagine this user is in our database
const someUser = {
  name: 'Yuo Mo',
  age: 24,
  followers: 0,
  updatedAt: new Date().valueOf(),
}

// helper function to "get" a user from our database
const getUser = () => Promise.resolve(someUser);

server.use(express.json());

server.get('/user', async (req, res) => {
  // express will automatically lowercase "If-Modified-Since"
  const lastModified = req.headers['if-modified-since'];

  const user = await getUser();

  if (parseFloat(lastModified) >= user.updatedAt) {
    res.sendStatus(304);
    return;
  }

  const payload = { user };

  res.setHeader('Last-Modified', user.updatedAt);
  res.send(payload);
});

server.listen(port, () => console.log(`App listening at http://localhost:${port}`));

# @language-override:Python + Flask
# @repl-it-link: https://repl.it/@4shub/conditional-get-python#main.py
from flask import Flask, request, make_response
import json
import calendar
import time

# The route decorator below needs an application object to attach to
app = Flask(__name__)

# Imagine this user is in our database
some_user = {
  'name': 'Shub Naik',
  'age': 23,
  'followers': 0,
  'updated_at': calendar.timegm(time.gmtime())
}

# helper function to "get" a user from our database
def get_user_from_db():
  return some_user

def get_user():
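  # The header is missing on a first request, so default to an empty string
  last_modified_at = request.headers.get('If-Modified-Since', '')

  user = get_user_from_db()

  response = make_response()

  # Only compare when the client actually sent a numeric timestamp
  if last_modified_at.isdigit() and int(last_modified_at) >= user['updated_at']:
    response.status_code = 304
    return response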

  response.data = json.dumps({'user': user})
  response.headers.set('Last-Modified', user['updated_at'])

  return response

@app.route("/user", methods=['GET'])
def user():
  return get_user()

if __name__ == "__main__":
    # Flask needs to run in threaded mode
    app.run(host='0.0.0.0', port=3000, threaded=True)

We use the updatedAt attribute of someUser to validate the "newness" of the response and return it as Last-Modified. We will work with ETags later.
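One caveat: the servers above send updatedAt as a raw number to keep the demo short, while HTTP defines Last-Modified and If-Modified-Since as HTTP-dates such as "Wed, 10 Jun 2020 12:00:00 GMT". If you want standard values, a conversion could look roughly like this sketch. The helper names to_http_date and from_http_date are invented for this example and rely only on Python's standard library.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def to_http_date(epoch_seconds):
    """Format a Unix timestamp (in seconds) as an HTTP-date for Last-Modified."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return format_datetime(moment, usegmt=True)


def from_http_date(value):
    """Parse an HTTP-date (e.g. from If-Modified-Since) back to Unix seconds."""
    return int(parsedate_to_datetime(value).timestamp())


print(to_http_date(1591790400))  # Wed, 10 Jun 2020 12:00:00 GMT
```

Because HTTP-dates stop at whole seconds, a millisecond timestamp like the one in the Node example would need to be rounded down before comparing.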

// @codepen-link:https://codepen.io/4shub/pen/zYvXRwJ

// Create a helper function so we can reuse it later
const createConditionalGetFetch = (url) => {
  let etag;
  let userLastUpdatedAt;
  let cachedData;


  return () => {
    const headers = {};

    if (userLastUpdatedAt) {
      headers['If-Modified-Since'] = userLastUpdatedAt;
    }

    if (etag) {
      headers['If-None-Match'] = etag;
    }

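    const processResponse = (response) => {
      const { status } = response;
      if (status === 304) {
        // Nothing changed, so reuse the copy saved from the last full response
        return Promise.resolve({ status, data: cachedData });
      }

      userLastUpdatedAt = response.headers.get('Last-Modified');
      etag = response.headers.get('ETag');

      return response.json().then((data) => {
        // Remember the payload so a later 304 has something to return
        cachedData = data;
        return { status, data };
      });
    };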

    return fetch(url, { 
      method: 'GET', 
      headers,
    }).then(processResponse);
  }
}

const fetchUser = createConditionalGetFetch('https://ds.shub.dev/e/user');

// The helper resolves to { status, data }, and the server wraps the user in the payload
fetchUser().then(({ data }) => { /* do something with data.user */ });
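If your poller is written in Python instead, the same idea might look like the sketch below. It assumes only the standard library. The names http_get and create_conditional_get_fetch, and the transport parameter, are hypothetical and exist only for this illustration. Note that urllib reports a 304 as an HTTPError, so the sketch catches it and treats it as a normal result.

```python
import json
import urllib.error
import urllib.request


def http_get(url, headers):
    """Send a GET and return (status, headers, body), treating 304 as a result."""
    request = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.headers, response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return 304, error.headers, b""
        raise


def create_conditional_get_fetch(url, transport=http_get):
    """Build a fetch function that remembers validators between calls."""
    state = {"etag": None, "last_modified": None, "data": None}

    def fetch():
        headers = {}
        if state["last_modified"]:
            headers["If-Modified-Since"] = state["last_modified"]
        if state["etag"]:
            headers["If-None-Match"] = state["etag"]

        status, response_headers, body = transport(url, headers)
        if status == 304:
            # Nothing changed: hand back the copy we already have
            return status, state["data"]

        state["last_modified"] = response_headers.get("Last-Modified")
        state["etag"] = response_headers.get("ETag")
        state["data"] = json.loads(body)
        return status, state["data"]

    return fetch
```

Passing the transport in as a parameter is just a convenience so the caching logic can be exercised without a live server.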

Going Deeper

More headers!

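The conditional request specification gives us a few more conditional headers to work with besides If-None-Match and If-Modified-Since:

  • If-Match: some-generated-value - The request proceeds only if the resource's current ETag matches one of the given values; otherwise the server returns 412 (Precondition Failed)
  • If-Unmodified-Since: some-date - The request proceeds only if the resource has not been modified after the given date; otherwise the server returns 412 (Precondition Failed)
  • If-Range: some-date-or-etag - Used together with a Range header; the server sends just the requested range if the validator still matches, and the full content otherwise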
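If-Match is the mirror image of If-None-Match and is mostly useful for writes: "only apply my update if nobody changed the resource since I read it." Here is a rough sketch of that precondition check. The function name check_if_match is an assumption made for this example.

```python
def check_if_match(if_match_header, current_etag):
    """Return 412 when an If-Match precondition fails, or None to proceed.

    if_match_header: raw If-Match value from the request, or None if absent
    current_etag: the resource's current ETag, or None if it does not exist
    """
    if if_match_header is None:
        return None  # no precondition was supplied

    candidates = [tag.strip() for tag in if_match_header.split(",")]

    # "*" matches any existing representation of the resource
    if "*" in candidates and current_etag is not None:
        return None

    # If-Match uses strong comparison, so weak tags (W/ prefix) never match
    if (
        current_etag is not None
        and not current_etag.startswith("W/")
        and current_etag in candidates
    ):
        return None

    return 412
```

A handler for PUT or PATCH would call this before touching the database and stop with a 412 response when it returns one.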

Strong and Weak Validation

The HTTP specification for ETags gives us two approaches to validating our ETags:

Strong validation must ensure that the content requested is byte-by-byte the same as the previously requested content for a client to receive a 304 response. An example could be a dataset containing all your banking information. If anything has changed on the server, we should always send the most recent data.

Weak validation means that the server's content could be different from what already is on the client, but the change is not significant enough for the server to pass back new data. Let's go back to that banking information example. Let's say the banking information also contains some metadata information on an A/B test going on. This information is not essential and probably doesn't need to be updated on the client if we are performing live updates on the browser.

To ask a server to perform weak validation, you would prefix your ETag with W/.
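For reference, the HTTP specification defines two comparison functions over entity tags. Under strong comparison, two tags match only if neither is weak and their opaque values are identical. Under weak comparison, the W/ prefix is ignored and only the opaque values are compared. A small sketch of both follows; the helper names are invented for illustration.

```python
def parse_etag(value):
    """Split an entity tag into (is_weak, opaque_tag)."""
    value = value.strip()
    if value.startswith("W/"):
        return True, value[2:]
    return False, value


def strong_match(a, b):
    """Both tags must be strong and have identical opaque values."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


def weak_match(a, b):
    """Opaque values must be identical; the W/ prefix is ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]


print(strong_match('W/"1"', '"1"'))  # False
print(weak_match('W/"1"', '"1"'))    # True
```

The server built in this article takes a slightly different route and lets the client opt in to weak validation per request, which is a design choice of the demo rather than something the specification requires.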

Let's build a server that can perform both strong and weak ETag validation.

// @language-override:Node + Express
// @repl-it-link:https://repl.it/@4shub/conditional-get-validation-nodejs
const express = require('express');
const md5 = require('md5');

const server = express();
const port = 3000;

const article = {
  content: 'Hello there! this is an article there!',
  meta: 'Meta content for user',
  adInfo: '349243'
}

// gets an article from "our database"
const getArticle = () => Promise.resolve(article);

const generateETag = (article) => {
  const contentHash = md5(article.content);
  const metaHash = md5(article.meta + article.adInfo);

  return `${contentHash}_${metaHash}`;
}

const validateETag = (etag, article) => {
  const useWeakValidation = etag.includes('W/');
  const parsedTag = etag.replace('W/', '');

  if (useWeakValidation) {
    const weakCompare = md5(article.content);

    return weakCompare === parsedTag.split('_')[0];
  }

  const strongCompare = generateETag(article);

  return strongCompare === parsedTag;
}

server.get('/article', async (req, res) => {
  const etag = req.headers['if-none-match'];

  const article = await getArticle();

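  // Only answer 304 when the client sent an ETag and it still matches;
  // a first request without If-None-Match must receive the full article
  if (etag && validateETag(etag, article)) {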
    res.sendStatus(304);
    return;
  }

  const nextEtag = generateETag(article);
  res.setHeader('ETag', nextEtag);
  res.send({ article });
})

server.listen(port, () => console.log(`App listening at http://localhost:${port}`));

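Above, we created a function called generateETag that creates an ETag composed of two parts, a contentHash and a metaHash. The contentHash is an md5 hash of only the article's content. The metaHash is an md5 hash of all the non-content parts of this article.

We also created a validation function, validateETag, that will:

  • Check whether the incoming ETag is marked with W/ and strip that prefix
  • For a weak ETag, compare only the contentHash part, so changes to meta or adInfo still produce a 304
  • For a strong ETag, compare the full value against a freshly generated ETag, so any change produces new data

Weak validation is a little more complicated to implement than just checking whether any byte has changed. Still, building weak validation can help reduce unnecessary GETs when doing repetitive polls.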
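If your backend is in Python, the same two-part ETag idea could be sketched as follows, using hashlib from the standard library in place of the md5 package. The function names mirror the Node example and the article dictionary is the same made-up data. md5 is used here only as a cheap change detector, not for anything security-related.

```python
import hashlib


def md5_hex(text):
    """Hex md5 digest of a string, used only to detect changes."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def generate_etag(article):
    """Build a two-part ETag: content hash, then hash of everything else."""
    content_hash = md5_hex(article["content"])
    meta_hash = md5_hex(article["meta"] + article["adInfo"])
    return f"{content_hash}_{meta_hash}"


def validate_etag(etag, article):
    """Return True when the client's ETag still describes the article."""
    use_weak_validation = etag.startswith("W/")
    parsed_tag = etag[2:] if use_weak_validation else etag

    if use_weak_validation:
        # Weak: only the content part has to match
        return parsed_tag.split("_")[0] == md5_hex(article["content"])

    # Strong: every part has to match
    return parsed_tag == generate_etag(article)


article = {
    "content": "Hello there! this is an article there!",
    "meta": "Meta content for user",
    "adInfo": "349243",
}
saved_etag = generate_etag(article)
article["adInfo"] = "999999"  # only the ad metadata changes

print(validate_etag(saved_etag, article))         # False
print(validate_etag("W/" + saved_etag, article))  # True
```

In the usage at the bottom, the strong check asks for fresh data after the ad change, while the weak check lets the client keep its copy.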

Conclusion

Conditional GETs are a straightforward way to reduce the bandwidth handled through your application. The bandwidth savings can directly reduce your networking costs and also help your customers reduce their networking costs (if they pay for their bandwidth).

Try this solution out alongside client-side caching, and you can have even more savings as users who return to your website or app don't need to redownload content that hasn't changed since their last visit. Anyway, give it a go--let me know what you make!

--
Written by Shubham Naik on June 10, 2020.

Comments, Questions, or Clarifications? Contact shub@shub.club