Adventures in nginx caching and redirects
Published under the Coding category.
I run a service that lets you create a sparkline that shows your MediaWiki contribution history. I use this tool to show how many contributions I have made to the IndieWeb wiki on my home page. Here is an example of a sparkline from the tool:
The way the service works is as follows:
- A user makes a request to my sparkline service with their MediaWiki username and the URL of a MediaWiki API. This API URL will vary depending on how your MediaWiki instance is set up. For example, I use `https://sparkline.jamesg.blog/?username=User:Jamesg.blog&api_url=https://indieweb.org/wiki/api.php` to retrieve my IndieWeb wiki contributions.
- The service makes a request to the MediaWiki API to calculate how many contributions you have made to the wiki over the last N (by default, 90) days.
- The service redirects you to an SVG file with query string parameters that show your day-by-day contributions. For example, if you contributed on one day, then missed one, then contributed again, the SVG file would be `/sparkline.svg?1,0,1` (see the sketch after this list).
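To make the flow concrete, here is a minimal sketch of the redirect step. The service is written in Ruby; the framework (Sinatra) and the `count_contributions` helper are my assumptions for illustration, not the actual implementation:

```ruby
require "sinatra"

# Hypothetical helper: the real service queries the MediaWiki API
# (e.g. list=usercontribs) and returns one count per day.
def count_contributions(username, api_url, days: 90)
  Array.new(days) { rand(0..2) } # stub data for illustration
end

get "/" do
  counts = count_contributions(params["username"], params["api_url"])

  # Redirect (HTTP 302) to the SVG, carrying the day-by-day counts
  # in the query string, e.g. /sparkline.svg?1,0,1
  redirect "/sparkline.svg?#{counts.join(',')}"
end
```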
There is no caching at the second step, so a sparkline is recalculated every time someone queries my service. Herein lies a problem: if your sparkline is embedded on a popular web page (e.g. your home page), many calls could be made to the MediaWiki API. I worked around this on my own site by requesting the SVG when I build my static website and embedding the resulting `sparkline.svg` URL, with the numbers already in the query string. But this is not practical for most people.
My first thought was "what about using HTTP caching headers?", until I remembered those are client-side, not server-side. If I set caching HTTP headers, every new user would still make a request to the sparkline service -- and thus to the corresponding MediaWiki API for the wiki being queried -- at least once. This would be an improvement, since subsequent requests from the same user would be served from their cache, but it would only partially solve the problem. What I wanted was for the first user every hour to trigger generating the sparkline, and for everyone else to be served that generated sparkline.
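For reference, that client-side approach would have looked something like the snippet below (a sketch; the one-hour `max-age` value is illustrative):

```nginx
location / {
    # Each visitor's browser caches its own copy for an hour, but every
    # *new* visitor still hits the service (and the MediaWiki API) once.
    add_header Cache-Control "public, max-age=3600";
}
```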
I discussed solutions for reducing the number of requests made to MediaWiki APIs. Options included:
- Creating a generic service that could cache changing assets like this, which people could host on their own websites. This was a general solution, but would involve more work than was needed to solve the problem at hand.
- Updating the sparkline service itself, implemented in Ruby, to use a cache that expires every 60 minutes (sketched below).
- Using NGINX's content caching features to cache responses.
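For context, the second option would have looked roughly like this inside the Ruby service (a hypothetical sketch; the cache structure and names are my assumptions, not the service's real code):

```ruby
# Hypothetical in-process cache with a 60-minute expiry, keyed on the
# username and API URL.
CACHE = {}
TTL = 60 * 60 # seconds

def cached_counts(username, api_url)
  entry = CACHE[[username, api_url]]
  return entry[:counts] if entry && Time.now - entry[:at] < TTL

  counts = Array.new(90) { 0 } # placeholder for the real MediaWiki API call
  CACHE[[username, api_url]] = { counts: counts, at: Time.now }
  counts
end
```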
After some discussion in the IndieWeb community, the third option -- using NGINX's content caching features -- was the best fit for my use case. I did not need to change my project code to implement caching; I could do everything at the NGINX layer. The basic proxy caching features are available in the free version of NGINX. There are more advanced features you can pay for with NGINX Plus, but I didn't get that far in my research; the free-tier options are sufficient for my needs.
Now, you can embed my sparkline API directly onto your site without the service making a request to your MediaWiki API for every user who visits your site. Here is an example URL:
https://sparkline.jamesg.blog/?username=User%3AJamesg.blog&api_url=https%3A%2F%2Findieweb.org%2Fwiki%2Fapi.php&only_image=true
You can set a proxy cache that saves responses for later use after the response has been generated once. This cache can be configured to last for as long as you want for a given URL. You can even cache redirects, which is important for my use case. I set up a proxy cache for my MediaWiki site. Now, the site works as follows:
- A user makes a request to my sparkline service with their MediaWiki username and the URL of a MediaWiki API.
- If a sparkline has been generated in the last hour, the NGINX proxy cache will redirect you to the cached SVG. If no sparkline is available, or the sparkline was generated more than an hour ago, the service makes a request to the MediaWiki API to calculate how many contributions you have made to the wiki over the last N (by default, 90) days.
- The service redirects you to an SVG file with query string parameters that show your day-by-day contributions. For example, if you contributed on one day, then missed one, then contributed again, the SVG file would be `/sparkline.svg?1,0,1`.
With this approach, an individual sparkline will be generated at most 24 times per day (once per hour, if one or more requests are made for the sparkline in each hour).
To set up the cache, I:
- Created a `/data/nginx/cache` directory.
- Added a `proxy_cache_path /data/nginx/cache keys_zone=mycache:60m;` directive to my `nginx.conf` file.
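The directive lives in the `http` context. One detail worth noting: the `60m` in `keys_zone` is the size of the shared-memory zone that holds cache keys (60 megabytes), not a duration; how long responses stay fresh is set separately with `proxy_cache_valid`, shown below.

```nginx
http {
    # 60m here is 60 MB of shared memory for cache keys,
    # not a 60-minute expiry (that comes from proxy_cache_valid)
    proxy_cache_path /data/nginx/cache keys_zone=mycache:60m;
}
```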
This makes the cache available to sites served by NGINX, but you still have to specify which sites will use it. I updated the location block in my sparkline NGINX configuration file to include the following:
```nginx
server {
    server_name sparkline.jamesg.blog;

    location / {
        proxy_cache mycache;
        proxy_cache_valid 200 302 60m;
        add_header X-Cache-Status $upstream_cache_status;
        ...
    }
}
```
In the configuration above, I specify that I want to cache all 200 and 302 responses for up to 60 minutes. After an asset expires from the cache, it will be regenerated on the next request and the new response cached for later use.
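Note that `proxy_cache_valid` can give different statuses different lifetimes; for example (the 404 line is an illustrative extra, not part of my config):

```nginx
proxy_cache_valid 200 302 60m;  # cache successes and redirects for an hour
proxy_cache_valid 404 1m;       # cache not-found responses briefly
```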
An X-Cache-Status header is also set. This lets us inspect whether an HTTP request was served from our cache or our application. If a cached response is sent, X-Cache-Status will be equal to HIT. If a new response is generated by the application, X-Cache-Status will be equal to MISS.
You can inspect X-Cache-Status using the `curl -I` command, which shows the headers a server sends when you make an HTTP request.
For example, we could use the following command to inspect headers for our app:
curl -I "https://sparkline.jamesg.blog/?username=User:Jamesg.blog&api_url=https://indieweb.org/wiki/api.php"
Here is an example response from the curl command:
```
HTTP/2 200
server: nginx
date: Wed, 10 Jan 2024 11:52:14 GMT
content-type: text/html;charset=utf-8
content-length: 438
vary: Accept-Encoding
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
x-frame-options: SAMEORIGIN
x-cache-status: MISS
```
The last header indicates a cache miss, which means our request was served by the application. The next time we run the command, `x-cache-status: HIT` is returned, since the cache now has a response it can serve within the 60m limit we set earlier.
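To check just that header on a repeat request, you can filter the output (same URL as above):

```
curl -sI "https://sparkline.jamesg.blog/?username=User:Jamesg.blog&api_url=https://indieweb.org/wiki/api.php" | grep -i x-cache-status
```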
