The Thermal Printer Project: IndieWeb Wiki

Published under the IndieWeb category.

A piece of thermal paper with an IndieWeb word of the day written on it

I have enjoyed crafting my daily update that is printed with the thermal printer. So far, my daily update includes, among other things, my schedule, the weather for the day, my RSS feed, a news update from The Guardian, and a tech news update from The Guardian. But I wanted to keep experimenting with the printer. Then I got an idea: why don’t I try to add an IndieWeb word of the day to the update?

The IndieWeb wiki is filled with interesting pieces of knowledge related to web development, one of my interests. In addition, the wiki is structured in such a way that the first paragraph of most pages gives a definition of the term being discussed on the page. That meant that I could easily retrieve the information I needed. With this idea in mind and a vague sense of how I would build this project, I set out to create an “IndieWeb word of the day” module for my thermal printer.

Using the MediaWiki API

The first challenge was deciding how I would retrieve a random page. I found that MediaWiki, the software on which the IndieWeb wiki is hosted, has an API which is enabled on each instance of their software. I looked through the API documentation and found that I could retrieve a random page on the wiki with one API call.

To retrieve this information, I use code similar to this:

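import requests

valid = False

# Keep requesting random pages until one passes every check
while not valid:
	# list=random with rnnamespace=0 asks for one random page from the main namespace
	url = "https://indieweb.org/wiki/api.php?action=query&list=random&rnnamespace=0&rnlimit=1&format=json"

	wiki_page = requests.get(url)

	if wiki_page.status_code == 200:
		wiki_page_name = wiki_page.json()["query"]["random"][0]["title"]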
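The query string in the request above can also be built from a dictionary, which makes each parameter easier to read and change. The following is a minimal sketch, not the code used in the project: the helper names `random_page_url` and `titles_from_response` are hypothetical, and the sample response is made up to mirror the keys that the snippet above indexes into.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://indieweb.org/wiki/api.php"


def random_page_url(limit=1):
    """Build the MediaWiki API URL for random main-namespace pages."""
    params = {
        "action": "query",
        "list": "random",
        "rnnamespace": 0,  # 0 is the main (article) namespace
        "rnlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def titles_from_response(payload):
    """Pull the page titles out of a decoded list=random response."""
    return [entry["title"] for entry in payload["query"]["random"]]


# Made-up sample response, shaped like the JSON the original code reads
sample = {"query": {"random": [{"id": 1, "ns": 0, "title": "Example term"}]}}

print(random_page_url())
print(titles_from_response(sample))
```

Keeping the parsing in its own function means it can be checked against a saved response without calling the wiki.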

This code has a big limitation. Some pages are related to meetups, and their titles often include a forward slash (“/”). I did not want these pages to show up because they are about an event rather than a term I might be interested in.

To solve this problem, I wrote a quick if statement that excludes every wiki page name containing a forward slash. I also exclude page names that start with “User:” so that I don’t get any user pages. I am not 100% sure that user pages would show up anyway, but I added the check just in case. Here is the if statement:

if "/" not in wiki_page_name and not wiki_page_name.startswith("User:"):
	# Code here
else:
	continue # Run loop again

The while loop in the first code snippet is crucial because I want the program to keep requesting random pages until a valid one is returned. If a page name does not meet the above criteria, the program executes a continue statement so that the loop runs again.
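The filter and the retry loop described above can be sketched as two small functions. This is an illustration under stated assumptions rather than the project's actual code: `is_term_title` and `pick_title` are hypothetical names, `fetch_title` stands in for the API request, and the cap on attempts is an addition so the loop cannot spin forever if the wiki is unreachable.

```python
def is_term_title(title):
    """Return True if the title looks like a term, not an event or user page."""
    return "/" not in title and not title.startswith("User:")


def pick_title(fetch_title, max_attempts=10):
    """Call fetch_title until it returns an acceptable title.

    fetch_title is any callable that returns a page title, or None when the
    request failed. Returns None if no acceptable title turns up in time.
    """
    for _ in range(max_attempts):
        title = fetch_title()
        if title is not None and is_term_title(title):
            return title
    return None


# Demo with canned titles (all made up) instead of live API calls
canned = iter(["events/2020-01-01-example", "User:Example.com", "Example term"])
print(pick_title(lambda: next(canned)))
```

Passing the fetcher in as an argument keeps the filtering logic testable without network access.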

Next I had to retrieve, and then validate, the definition of a term.

Retrieving the definition of a term

The definition of each term is usually in the first paragraph of the main text of its wiki page. With this in mind, I wrote a few lines of code that use the BeautifulSoup library to parse the page and retrieve the text of that first paragraph. This code sits inside the “if” statement mentioned earlier:

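# Requires "from bs4 import BeautifulSoup" at the top of the file
get_page = requests.get("https://indieweb.org/{}".format(wiki_page_name))

soup = BeautifulSoup(get_page.content, "lxml") # Parse the HTML with the lxml parser

main_content = soup.find_all("div", {"class": "mw-content-ltr"})[0]

# The first <p> without a class attribute is the first paragraph of main text
top_paragraph = main_content.find("p", attrs={"class": None}).text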

if len(top_paragraph) >= 20:
	valid = True
else:
	continue

I use attrs={"class": None} to get the first paragraph of main text because a stub notice sometimes appears at the top of an entry. The stub paragraph has a class, so by retrieving the first paragraph without a class inside the <div> with the mw-content-ltr class, I skip the notice and retrieve the definition.

I then check whether the first paragraph contains 20 or more characters. If it does, I set valid to True, which ends the while loop that encloses all of the code for retrieving a wiki page. Otherwise, I use a continue statement so the loop runs again until it finds a page that meets all of the requirements above.
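To see the class filter and the length check working together, here is a small sketch that runs against an inline HTML string instead of a live page. The markup, the `stub` class name, and the function name `first_definition` are all assumptions made for illustration. The sketch also uses Python's built-in "html.parser" rather than lxml so that it runs with only BeautifulSoup installed.

```python
from bs4 import BeautifulSoup

# Made-up markup that imitates a wiki page with a stub notice on top
SAMPLE_HTML = """
<div class="mw-content-ltr">
  <p class="stub">This article is a stub.</p>
  <p>An example term is a made-up entry used to test the scraper.</p>
</div>
"""


def first_definition(html, min_length=20):
    """Return the first class-less paragraph of the content div, or None."""
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", {"class": "mw-content-ltr"})
    if content is None:
        return None
    # class=None matches only <p> tags that have no class attribute
    paragraph = content.find("p", attrs={"class": None})
    if paragraph is None:
        return None
    text = paragraph.get_text().strip()
    return text if len(text) >= min_length else None


print(first_definition(SAMPLE_HTML))
```

Returning None for a missing div, a missing paragraph, or a short paragraph gives the calling loop a single signal to try another page.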

After all of this, I print the term found on the wiki and the description of that term onto paper using my thermal printer.
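Thermal paper is narrow, so the definition usually needs to be wrapped before it is printed. The sketch below shows one way to lay out the text. The 32-character line width, the heading text, and the function name `format_entry` are assumptions, and the code only prepares the string: sending it to the printer depends on the printer library in use.

```python
import textwrap


def format_entry(term, definition, width=32):
    """Lay out a term and its definition for a narrow roll of paper."""
    lines = ["IndieWeb word of the day", term, ""]
    # Wrap the definition so that no line is wider than the paper allows
    lines.extend(textwrap.wrap(definition, width=width))
    return "\n".join(lines)


# Placeholder term and definition, not taken from the wiki
print(format_entry(
    "Example term",
    "An example term is a made-up entry used to show how wrapping works.",
))
```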

Wrapping up

I think the IndieWeb word of the day will be a good way to learn something new about the web that I might not otherwise have encountered. Maybe I’ll even learn a few new words to use in the IndieWeb chat. I enjoyed putting together this module, which is what really matters. Now every day I will wake up to an update with, among other things, a random IndieWeb term and its definition.

Today’s word of the day was “POOSNOW.” Pictured at the start of this article is the definition of the term.

Also posted on IndieNews.
