CloudFront

Edge cache with some compute and moderately bizarre limitations

Warning: This section is under construction

Caching GitHub API Responses

On this wiki itself, I leverage remote content from https://github.com/thiskevinwang/public-docs

To construct the left-hand navigation sidebar, I use the filesystem structure of https://github.com/thiskevinwang/public-docs. Because this is a separate repository, I call the GitHub API to get that data.

GET /repos/:owner/:repo/git/trees/:tree_sha?recursive=0 gives me back a JSON blob describing the whole tree (GitHub treats any value of recursive, including 0, as a request to recurse). It contains everything under the wiki directory, which is what I ultimately want.

Take note of the sha field. I use that later in blob requests.
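For orientation, the response has roughly the shape sketched below. This follows GitHub's documented tree response rather than my actual output; the wikiEntries helper, the "wiki/" prefix default, and the example path in the comments are assumptions for illustration.

```typescript
// One entry in the `tree` array returned by
// GET /repos/:owner/:repo/git/trees/:tree_sha
interface TreeEntry {
  path: string; // e.g. "wiki/aws/cloudfront.mdx" (hypothetical path)
  mode: string; // git file mode, e.g. "100644"
  type: "blob" | "tree" | "commit";
  sha: string; // reused later as :file_sha in blob requests
  size?: number; // present for blobs only
  url?: string;
}

interface TreeResponse {
  sha: string;
  url: string;
  tree: TreeEntry[];
  truncated: boolean;
}

// Hypothetical helper: keep only the files (blobs) under a directory prefix.
function wikiEntries(res: TreeResponse, prefix: string = "wiki/"): TreeEntry[] {
  return res.tree.filter(
    (entry) => entry.type === "blob" && entry.path.startsWith(prefix)
  );
}
```

Filtering on type "blob" drops directory entries, so what is left maps one-to-one onto pages and their sha values.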

This is 1 API call. Unfortunately, it doesn’t provide pretty text like titles or descriptions. It only returns paths, which I use as website slugs. The various MDX files have frontmatter with name and description keys, which are pleasing to the human eye.

I can then fire off N more requests (1 + N in total) to GET /repos/:owner/:repo/git/blobs/:file_sha, passing in each sha from earlier to fetch the individual files' string contents.

Note: I need to set the {"Accept": "application/vnd.github.raw"} header to fetch the raw string contents. By default, the endpoint returns JSON with base64-encoded content.

This way I have access to MDX frontmatter and can attach pretty titles to my previously ugly tree JSON data.
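A minimal sketch of that step, assuming a personal access token and flat key: value frontmatter. The names blobRequest and parseFrontmatter, the baseUrl parameter, and the token argument are hypothetical; the parser is deliberately naive and is not a full YAML implementation.

```typescript
interface BlobRequest {
  url: string;
  headers: Record<string, string>;
}

// Build the URL and headers for a raw blob request.
// `baseUrl` defaults to GitHub's API host (assumption: it could be swapped for a proxy).
function blobRequest(
  owner: string,
  repo: string,
  fileSha: string,
  token: string,
  baseUrl: string = "https://api.github.com"
): BlobRequest {
  return {
    url: `${baseUrl}/repos/${owner}/${repo}/git/blobs/${fileSha}`,
    headers: {
      Accept: "application/vnd.github.raw", // ask for the raw file contents
      Authorization: `Bearer ${token}`,
    },
  };
}

// Naive frontmatter reader: only handles flat `key: value` lines
// between a leading pair of `---` fences.
function parseFrontmatter(mdx: string): Record<string, string> {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(mdx);
  if (!match) return {};
  const data: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    // Strip one pair of surrounding quotes, if any.
    const value = line.slice(idx + 1).trim().replace(/^["']|["']$/g, "");
    if (key) data[key] = value;
  }
  return data;
}
```

The name and description values read this way can then be merged onto the matching tree entry by sha.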

However, with anywhere from 50 to hundreds of files, this quickly burns through the rate limit that GitHub sets: 5,000 requests per hour for authenticated requests.

So for this class of requests, which are conveniently unique via :file_sha, I can set up a CloudFront proxy to cache each item for 1 year. A blob's SHA is derived from its contents, so the response for a given :file_sha never changes and is safe to cache that long.

This significantly cuts down on my GitHub API consumption at page build time, whether that is at static build time or at on-demand generation time. The math is roughly M * (1 + N) requests, where:

  • M: number of actual pages under /wiki/[[...slug]]
  • 1: the single call to GET /repos/:owner/:repo/git/trees/:tree_sha?recursive=0
  • N: the multiple calls to GET /repos/:owner/:repo/git/blobs/:file_sha
    • (All of these N requests now go to CloudFront)
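To put rough numbers on this, here is a small sketch of the arithmetic. The values M = 100 and N = 100 are made up for illustration, the function name is hypothetical, and the cached case assumes every blob is already in the CloudFront cache.

```typescript
// Requests that actually reach GitHub for one full build.
// m: number of pages, n: blob requests made per page.
function githubRequestsPerBuild(m: number, n: number, blobsCached: boolean): number {
  const treeCalls = m * 1; // one tree call per page
  const blobCalls = blobsCached ? 0 : m * n; // served by CloudFront when cached
  return treeCalls + blobCalls;
}

const uncached = githubRequestsPerBuild(100, 100, false); // 100 * (1 + 100) = 10100
const cached = githubRequestsPerBuild(100, 100, true); // 100
```

Under these assumptions a single uncached build would already exceed a 5,000-request budget, while a warm cache leaves only the tree calls. On a cold cache, each blob still costs one GitHub request the first time CloudFront fetches it.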

CloudFront Settings

For the CloudFront proxy, the notable settings I set are:

  • Use Legacy cache settings
  • Include the Accept and Authorization headers in the cache key (with legacy settings, this also forwards them to GitHub)
  • Set custom cache TTLs (seconds)
    • Minimum: 31536000
    • Maximum: 31536000
    • Default: 31536000
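For anyone who prefers to see these settings as config, the fragment below mirrors them using CloudFormation-style property names for a distribution's default cache behavior. It is a sketch, not an export of my actual distribution: the TargetOriginId value, the viewer protocol policy, and the query string and cookie settings are assumptions.

```typescript
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // 31536000

// Shaped like DistributionConfig.DefaultCacheBehavior with legacy
// ForwardedValues, written as a plain object.
const defaultCacheBehavior = {
  TargetOriginId: "github-api", // hypothetical origin ID for api.github.com
  ViewerProtocolPolicy: "https-only", // assumption
  ForwardedValues: {
    QueryString: false, // assumption: blob requests carry no query string
    Cookies: { Forward: "none" }, // assumption
    Headers: ["Accept", "Authorization"], // forwarded and part of the cache key
  },
  MinTTL: ONE_YEAR_SECONDS,
  DefaultTTL: ONE_YEAR_SECONDS,
  MaxTTL: ONE_YEAR_SECONDS,
};
```

Setting all three TTLs to the same value keeps CloudFront from shortening or extending the cache lifetime based on whatever caching headers the origin sends.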

That's it. And it's ready in about 3 minutes.

The uniqueness of the GitHub API endpoint, /repos/:owner/:repo/git/blobs/:file_sha, lends itself nicely to an almost-zero-config CloudFront setup.

Rate-limit sanity check

Here's a Postman Test script for handy rate-limit observability when making GitHub API calls. I know there's a dedicated /rate_limit endpoint, but I prefer not calling that, simply to avoid tabbing back and forth within Postman.
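As a sketch of the same idea outside Postman, the function below reads the x-ratelimit-* headers that GitHub attaches to API responses and turns them into a small summary. The function and interface names are hypothetical. Inside a Postman test script, the same values should be readable with pm.response.headers.get("x-ratelimit-remaining") and friends.

```typescript
interface RateLimitSummary {
  limit: number; // max requests in the current window
  remaining: number; // requests left in the current window
  used: number; // requests made in the current window
  resetsAt: Date; // when the window resets
}

// `headers` is a plain name -> value map of response headers.
function summarizeRateLimit(headers: Record<string, string>): RateLimitSummary {
  // Header names are case-insensitive, so normalize to lower case first.
  const lower: Record<string, string> = {};
  for (const key of Object.keys(headers)) {
    lower[key.toLowerCase()] = headers[key];
  }
  // Missing headers become NaN rather than throwing.
  const num = (name: string): number => Number(lower[name]);
  return {
    limit: num("x-ratelimit-limit"),
    remaining: num("x-ratelimit-remaining"),
    used: num("x-ratelimit-used"),
    // x-ratelimit-reset is expressed in UTC epoch seconds.
    resetsAt: new Date(num("x-ratelimit-reset") * 1000),
  };
}
```

Logging this summary after each call gives a running view of the remaining budget without a separate request to /rate_limit.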