Building an IRC archiver bot for the IndieWeb community
Published under the IndieWeb category.
A few weeks ago, I learned that the IndieWeb community aims to archive all of the Etherpad documents from meetups on the wiki. Etherpad documents are made available at online meetups so participants can document ideas and what happened in the call. Archiving these documents to the wiki makes them easily searchable and ensures their contents are preserved somewhere other than a document that could be edited further down the line.
Archiving Etherpad documents takes time. It is mostly a manual process where you copy and paste an Etherpad into a new wiki page, add some header information and links, then make some manual formatting changes. Doing this for a meetup every two weeks or so (the cadence at which most IndieWeb events that use Etherpad are run) is fine, but if you have more to archive, the process can get repetitive. That gave me an idea: what if there were a bot that could archive the Etherpad documents for us?
I decided to write this bot in Perl because I am trying to branch into different programming languages. I decided this bot would need to connect to IRC because that is the backbone of the community. I looked around for some libraries that would handle the IRC connection for me and found Bot::BasicBot on CPAN, the Comprehensive Perl Archive Network. I was excited that this library existed because I didn't want to write too much networking code to make this work (although I do love learning more about networking!).
After finding this library, I had a big decision to make: how would the bot work?
Here's the logic I came up with:
- Create an !archive command that accepts a URL as a parameter. This URL must be an IndieWeb events page with an Etherpad link added. This is important because I can infer some data from the events page (e.g. the date on which the event runs) to make a slug for the new wiki page for the event. I talk more about this in the third bullet point and the last one.
- When the command is run, the bot will retrieve the events page, find the Etherpad link, then retrieve the plain text version of the Etherpad (this is possible using a custom plain text URL that Etherpad makes available for all documents).
- Compose a header with the information that all archived Etherpads should start with. This includes a link to the original event page, a link to the original Etherpad, the event name, and the date on which the event was run. The event name and date can be retrieved from the events page thanks to its structured markup.
- Add the Etherpad text below the header.
- Add a footer that tags the requisite category and adds a Homebrew Website Club shortcode to the event.
- Connect to the IndieWeb wiki using the MediaWiki API. (This is done using an account for the bot rather than my personal wiki account.)
- Upload the new document to the wiki. The slug of the new wiki document is derived from the date and name of the event.
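The steps above can be sketched in a few small functions. This is a minimal Python illustration, not the bot's Perl source. The function names, the "etherpad" hostname check, the slug format, and the header and footer text are all assumptions made for the example. The `/export/txt` suffix is the plain text export path that Etherpad exposes for a pad.

```python
import re
from datetime import date
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlparse


def parse_archive_command(message: str) -> Optional[str]:
    """Return the URL from an '!archive <url>' message, or None otherwise."""
    match = re.fullmatch(r"!archive\s+(\S+)", message.strip())
    if match is None:
        return None
    url = match.group(1)
    try:
        parsed = urlparse(url)
    except ValueError:
        return None
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    return url


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_etherpad_link(event_html: str) -> Optional[str]:
    """Return the first link whose host contains 'etherpad' (an assumption)."""
    collector = _LinkCollector()
    collector.feed(event_html)
    for link in collector.links:
        if "etherpad" in urlparse(link).netloc:
            return link
    return None


def plain_text_url(pad_url: str) -> str:
    """Build the plain text export URL for a pad."""
    return pad_url.rstrip("/") + "/export/txt"


def page_slug(event_date: date, event_name: str) -> str:
    """Build a slug from the event date and name (hypothetical format)."""
    name = re.sub(r"[^a-z0-9]+", "-", event_name.lower()).strip("-")
    return f"events/{event_date.isoformat()}-{name}"


def compose_page(event_url, pad_url, event_name, event_date, pad_text):
    """Join header, Etherpad text, and footer into one wiki page body."""
    header = (
        f"'''{event_name}''' took place on {event_date.isoformat()}.\n\n"
        f"* Event page: {event_url}\n"
        f"* Etherpad: {pad_url}\n"
    )
    # Placeholder footer: the real category and shortcode are not shown here.
    footer = "[[Category:Events]]"
    return f"{header}\n{pad_text.strip()}\n\n{footer}\n"
```

Keeping each step as a pure function means the parsing and page composition can be checked without touching IRC, Etherpad, or the wiki; only the fetch and upload steps need a network.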
Throughout development, I tested the bot privately so I could make sure everything worked. It took a bit of tinkering to get the final document looking the way I wanted. Getting the MediaWiki connection to work took some time, too. To make a connection to the wiki, I had to retrieve a login token, authenticate with my username and bot password (I created a bot password in the MediaWiki settings to access the API, as recommended), retrieve a CSRF token, then make the request to create a new page.
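For anyone curious what that exchange looks like, here is a rough Python sketch of the MediaWiki Action API sequence for a bot password: fetch a login token, log in, fetch a CSRF token, then create the page. It is not the bot's Perl code. The `make_sender` and `create_page` names are made up for this example, and the API URL, credentials, and page title are whatever you pass in. The `action`, `meta`, `lgname`, `lgpassword`, `lgtoken`, `token`, and `createonly` parameters are the standard Action API ones.

```python
import json
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_sender(api_url):
    """Return send(method, params), which keeps session cookies between calls."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))

    def send(method, params):
        query = urlencode({**params, "format": "json"})
        if method == "GET":
            request = Request(f"{api_url}?{query}")
        else:
            # Supplying a body makes urllib issue a POST.
            request = Request(api_url, data=query.encode("utf-8"))
        with opener.open(request) as response:
            return json.load(response)

    return send


def create_page(send, username, password, title, text, summary):
    """Log in with a bot password and create a new page."""
    # 1. A login token is required before logging in.
    reply = send("GET", {"action": "query", "meta": "tokens", "type": "login"})
    login_token = reply["query"]["tokens"]["logintoken"]

    # 2. Log in; the session cookie is stored by the sender.
    reply = send("POST", {
        "action": "login",
        "lgname": username,
        "lgpassword": password,
        "lgtoken": login_token,
    })
    if reply["login"]["result"] != "Success":
        raise RuntimeError(f"Login failed: {reply['login']}")

    # 3. Fetch a CSRF token for the logged-in session.
    reply = send("GET", {"action": "query", "meta": "tokens"})
    csrf_token = reply["query"]["tokens"]["csrftoken"]

    # 4. Create the page; createonly refuses to overwrite an existing page.
    reply = send("POST", {
        "action": "edit",
        "title": title,
        "text": text,
        "summary": summary,
        "createonly": "1",
        "token": csrf_token,
    })
    if reply.get("edit", {}).get("result") != "Success":
        raise RuntimeError(f"Edit failed: {reply}")
    return reply["edit"]
```

Passing the transport in as `send` keeps the four-step sequence separate from the HTTP details, so the order of calls can be exercised with a fake sender before pointing it at a real wiki.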
The whole process of creating a new wiki page takes about 5-10 seconds. The bot then replies with a message informing you that the new page has been created (assuming you provided a valid IndieWeb events page URL that has an attached Etherpad document). The bot also asks that you manually review the new page. This is necessary because every Etherpad is different and sometimes formatting changes are required to make the Etherpad contents look good on the wiki. I wonder how many of the formatting issues I could fix automatically. I shall have to think about that more.
At first, I used this tool in the community to archive a Homebrew Website Club London / Europe Etherpad document. Then I realised this tool makes archiving so easy that I could go back and archive some old events that were not documented on the wiki. This led to an archiving spree of sorts, where I uploaded many old London / Europe meetup documents to the wiki. I found this really fun. After archiving them, I went back and made some minor formatting changes as necessary. The bot saved me a lot of time in archiving these old documents.
The source code for the bot is available on GitHub. While the use case outlined above is incredibly niche, you may find some of the code interesting if you're thinking about integrating Perl with MediaWiki.