Building a Feed Reader to Follow Blogs and Websites: Part I
Published under the IndieWeb category.

I like surfing the web in the traditional sense of the word, going from site to site in search of interesting websites. I enjoy doing this because there are so many unique websites to explore in terms of design and content. While social media sites may have set a precedent with regard to similar user experiences, personal websites mostly have their own identities. Website owners—irrespective of whether they coded their site or used a tool to generate their site—can use their website as a method of expression. The owner can choose a design. The owner can write content that they want to share with the world.
Because there are so many websites out there publishing interesting content, it is impractical to visit them all on a regular basis. That is why I rely on a feed reader. A feed reader is almost like a social media feed, but the content comes from websites rather than from users on a centralised platform. Feed readers aggregate content from the sites I like so I can go to one place to see it all.
I used Feedly for months to keep track of web feeds. However, I did not want to pay for a recurring subscription and felt that Feedly offered more features than I reasonably needed. I also wanted to learn more about how feeds worked. Thus, I decided to build my own feed reader. In this post, I will discuss some of the server logic I have written to create my own feed reader. In part two of this series, I will talk about the Microsub client I built to read feeds.
Choosing the Microsub architecture
Microsub is a draft specification that describes two services—a client and a server—and the ways in which they interact to process feed subscriptions. The client shows content published by blogs that you like. The server does all of the work behind the scenes to actually retrieve the content. Microsub defines a few clear methods on how to retrieve feed content (a "timeline") from a server, subscribe or unsubscribe from a feed, and perform administrative actions.
I found out about Microsub through the IndieWeb wiki. I played around with two hosted services, Aperture and Monocle, and was intrigued by the promise of Microsub. I liked how Microsub separates the client and the server so both services can interoperate with each other. I could build a server that would work with Monocle, a Microsub reader, and any other client that follows the specification. With some experience using Microsub, I decided that I was ready to replace Feedly, my primary method of consuming blogs, with my own project.
Microsub is still a draft specification with a lot of GitHub issues to work through. With that said, the architecture has been implemented by a number of different people and makes logical sense to me as an implementer. I am excited to see the specification mature with more implementation and discussions around how it can be improved.
Building the Server
I started by building a Microsub server first. The server manages all of my feed subscriptions, retrieves content, and lets me interact with content (for example, I can remove a piece of content from my feed, mark a post as read or unread, or block and unblock a particular feed). I decided to build the server component first because I like building back-end tools and wanted to dive deeper into feeds.
My Microsub server consists of three parts:
- A Python Flask server that can respond to the requests defined in the Microsub draft specification. The server also provides some user-facing views that let you manage subscriptions.
- A set of functions that are called by the Flask server and execute requests (i.e. follow a feed, unfollow a feed, preview a feed).
- A program that polls all of the feeds to which I am subscribed.
The Microsub server is powered by an SQLite database and Python. I chose this combination because I am most familiar with Python and I did not want to introduce too much database overhead. I may change the database to PostgreSQL in the future. PostgreSQL is more robust and is less likely to cause issues if multiple operations (i.e. feed polling and my trying to subscribe to a feed) occur at the same time.
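To make the database side concrete, here is a minimal sketch of what the underlying tables could look like. The table and column names here are my own guesses for illustration, not the actual schema used by the server described in this post.

```python
import sqlite3

# Hypothetical schema sketch: channels have an explicit position so they
# can be reordered, and subscriptions remember the last ETag and
# Last-Modified values seen for each feed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE channels (
        uid TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        position INTEGER NOT NULL
    );
    CREATE TABLE subscriptions (
        url TEXT PRIMARY KEY,
        channel_uid TEXT NOT NULL REFERENCES channels(uid),
        etag TEXT,
        last_modified TEXT
    );
    """
)
conn.execute("INSERT INTO channels VALUES ('notes', 'Notes', 1)")
conn.execute(
    "INSERT INTO subscriptions (url, channel_uid) "
    "VALUES ('https://example.com/feed.xml', 'notes')"
)
conn.commit()
```

Keeping the stored ETag and Last-Modified values next to each subscription is what allows the polling program, described later, to skip feeds that have not changed.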
Microsub defines one endpoint that should accept the requests you choose to support. This is hosted at the /endpoint destination on my Microsub server. I have other views that render web templates so I can perform some administrative actions like creating a channel (a category for feeds) and a subscription using a web interface.
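Because everything goes through one endpoint, the Flask view is essentially a dispatcher on the action parameter. The sketch below shows the general shape, assuming hypothetical handler logic; it is not the actual code from my server.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/endpoint", methods=["GET", "POST"])
def microsub_endpoint():
    # Microsub sends the requested operation as an "action" parameter,
    # either in the query string (GET) or the form body (POST).
    action = request.args.get("action") or request.form.get("action")
    if action == "channels":
        # A real server would read these from the database.
        return jsonify({"channels": [{"uid": "notes", "name": "Notes"}]})
    if action == "timeline":
        # A real server would query stored entries for the given channel.
        return jsonify({"items": [], "paging": {}})
    return jsonify({"error": "invalid_request"}), 400
```

Unsupported actions fall through to an error response, which is what a conforming client should expect for operations the server has not implemented.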
Here is what my channel dashboard looks like:
The set of functions I wrote are all based on the Microsub specification. I have implemented the main functions that I expected to use on a daily or regular basis such as timeline support, follow and unfollow support, and a few other functions.
For instance, the screenshot above shows buttons such as "Move Up" and "Move Down." Those buttons send a request to my Microsub endpoint that instructs the server to change the order of channels in my reader. The image above also shows a form that lets me create a channel. This form sends a request to my Microsub server to create a channel.
I have not yet implemented the search function but I do think there is an opportunity to use the "discover" feature of IndieWeb search to find new feeds. This is on my radar. However, at the moment I usually find sites I am interested in manually as I explore the web, so I would likely not use this feature much.
Here is an example Microsub request that my feed reader can parse:
GET https://server.com/?action=channels&channel=[channel_name]
This request returns a list of channels. When my server receives this request, it calls a function that retrieves a list of all of the channels to which I subscribed and orders them in the way I have set. (Reordering one's list of feeds is defined in the Microsub specification.) Then, the server sends the list to the client so that the list of feeds can appear on a web interface.
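The response to the channels action has a simple shape: a JSON object with a "channels" key holding an ordered list of channel objects, each with a uid and a human-readable name. The channel names below are made up for illustration.

```python
import json

# The shape of a Microsub "channels" response. The server returns the
# channels in the order I have set, and the client renders them as-is.
response = {
    "channels": [
        {"uid": "notifications", "name": "Notifications"},
        {"uid": "indieweb", "name": "IndieWeb"},
    ]
}
print(json.dumps(response, indent=2))
```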
Defining Feed Polling Logic
The program that polls feeds has taken a lot of work to get right. The program runs every hour. In summary, this program:
- Gets a list of my current subscriptions from the database.
- Sends a HEAD request to a feed to get headers.
- Checks if the feed has a different ETag value than the last one saved. ETags are usually sent with feed files to tell readers whether the content in the feed has changed. If the ETag value is the same, my program skips polling the feed.
- Checks if the feed has changed in the last 12 hours using the Last-Modified header value.
- Checks if the feed is an XML feed, JSON Feed, or an h-feed.
- Retrieves the feed with a GET request.
- If the feed is a supported type, the program retrieves all of its contents and converts each entry into a jf2 object. jf2 is a standardised way of representing content in JSON. This standard is used in Microsub so that clients know exactly what content they can expect to see in a feed entry.
- The jf2 object is saved to the database alongside some other pieces of information.
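The ETag and Last-Modified checks from steps three and four can be sketched as a single decision function. This is an illustrative simplification, assuming the headers come from an HTTP HEAD request and the stored ETag comes from the database; it is not the exact logic my server runs.

```python
import email.utils
from datetime import datetime, timedelta, timezone

def feed_changed(headers, stored_etag, now=None):
    """Decide whether a feed is worth fetching, given its response headers."""
    now = now or datetime.now(timezone.utc)

    # Step three: an unchanged ETag means the feed can be skipped.
    etag = headers.get("ETag")
    if etag and stored_etag and etag == stored_etag:
        return False

    # Step four: skip the feed if Last-Modified says it has not
    # changed in the last 12 hours.
    last_modified = headers.get("Last-Modified")
    if last_modified:
        modified = email.utils.parsedate_to_datetime(last_modified)
        if now - modified > timedelta(hours=12):
            return False

    return True
```

Feeds that send neither header always fall through to a full fetch, which is the safe default.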
Steps three and four help save on processing time. If a feed has not changed according to its ETag or Last-Modified value, there is no use in fetching the feed and making that extra request (which would lengthen the time it takes to poll all feeds). However, ETag values may not always be configured correctly. I plan to add a feature so that each feed is polled every so often regardless, although the logic for this is yet to be determined. [1]
To process feeds, I rely on a few libraries:
- feedparser: Used for processing XML feeds.
- mf2py: Used for turning HTML documents into JSON objects with microformats2 values.
These libraries make it easy for me to retrieve content from feeds without having to write code that manually looks for values on each page in line with RSS, Atom, and microformats2 standards. I do not use a library to process JSON Feed objects because they are already in JSON and easy to parse. The goal is to retrieve exactly what content I need from a page—a summary of a post, the main post image (if available), the title of the post, etc.—and format it all in jf2 so that clients only need to recognise one type of object to render content.
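To show what the normalisation step looks like, here is a hedged sketch of converting one parsed entry into a jf2 item. The input is a feedparser-style entry dictionary and the fallback rules are simplified; the key names on the jf2 side follow the jf2 convention used by Microsub, but this is not my server's actual conversion code.

```python
def entry_to_jf2(entry):
    """Convert a parsed feed entry (a feedparser-style dict) into a jf2 item."""
    jf2 = {"type": "entry", "url": entry.get("link")}
    if entry.get("title"):
        jf2["name"] = entry["title"]
    if entry.get("summary"):
        jf2["content"] = {"text": entry["summary"]}
    if entry.get("author"):
        jf2["author"] = {"type": "card", "name": entry["author"]}
    if entry.get("published"):
        jf2["published"] = entry["published"]
    return jf2
```

Because every feed type (RSS, Atom, JSON Feed, h-feed) funnels into the same jf2 shape, the client only ever has to render one kind of object.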
I recently made a change so that all of these steps happen concurrently. I used Python's concurrent.futures library to implement this concurrency. I saw a massive improvement in the time spent polling feeds with this change. This is because a lot of time in the polling process was taken up waiting for websites to return their feeds when my program made an HTTP request. HTTP request times are a particularly notable concern because some sites send relatively large feed files. These files would block the program until the whole file had been downloaded.
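The concurrency pattern is simple: hand the per-feed polling function to a thread pool and let slow responses overlap. The sketch below uses a placeholder poll function and an arbitrary worker count, assuming the real function performs the HEAD/GET requests and database writes described above.

```python
import concurrent.futures

def poll_feed(url):
    # Placeholder: the real function sends the HEAD and GET requests
    # described above and writes new entries to the database.
    return url, "polled"

feeds = [
    "https://example.com/feed.xml",
    "https://example.org/feed.json",
]

# Poll feeds concurrently so one slow or large feed does not block
# the whole run. max_workers is an arbitrary choice for illustration.
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(poll_feed, feeds))
```

Threads work well here because the workload is I/O-bound: most of the time is spent waiting on the network, not computing.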
Wrapping Up
My Microsub server is in active development as I learn more about different feed types and the information available. For instance, I recently made a change to my h-feed processing logic so that I can render videos in feeds. I have also been adding more fallbacks so that posts without titles will be assigned a title like "Post by {author or domain name}", so that feed items look good in a reader.
Working on the server has been a good learning experience. While feedparser and mf2py have abstracted away the need to get too far into the "feed logic" weeds, I have had to read and experiment quite a bit to make my server recognise content in just the way I need. I expect to make more changes if I run into feeds that mark up information in ways for which I have not planned.
In the second part of this series, I am going to talk through how all of this server work translates to a web interface. I will show off the Microsub client I am currently developing and how I built it.
Are you using a Microsub reader? Do you follow blogs and websites using a feed reader? I would love to hear from you! Feel free to reach out to me at readers@jamesg.blog.
[1]: Invalid etags are an edge case but one I need to plan for to ensure my server is as robust as possible.