When I consumed more speciality coffee, I had an idea for an application that would let you take a photo (or photos) of a label on a bag of coffee and build a personal library of the coffees you had consumed. The only input would be a photo, providing a relatively low-friction way to catalog the coffees you had consumed. When I first had this idea, I was held back by my limited knowledge of the high-quality open-source OCR models available.
This weekend I have been playing around with EasyOCR, and the results were promising enough to get me thinking about how this application could work. I wrote some code, but there are a lot of fine details that would need to be figured out to build this application. With that said, I thought I would write up some of my notes here for reference, in case anyone else wants to build it.
The crux of the application is an interface through which you can take a photo and have the important information from the coffee label extracted. This information would include:
- The way in which the coffee was processed
- The origin from which the coffee was farmed
- The date on which the coffee was roasted
- The business that roasted the coffee
The application would create statistics showing how many coffees you had consumed, broken down by the attributes above. There would also be a search feature that, given a natural language query, returns relevant coffees. If I wanted to find a particular Ethiopian coffee that I enjoyed, I should be able to find it via a region tag or a search.
Now, the big question: how does one get this information from an image? This is where OCR comes in. I was playing around with EasyOCR this afternoon and was able to extract good-quality results from a product listing image. I haven't tried this on personal photos of coffee bags; my experimentation lasted an hour or two.
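As a rough sketch of what the extraction step might look like: EasyOCR's `readtext` returns a list of (bounding box, text, confidence) tuples. The helper names, filename, and confidence threshold below are my own choices, not part of EasyOCR.

```python
def read_label(image_path):
    """Run EasyOCR over an image, returning (bbox, text, confidence) tuples.

    The bounding box is four [x, y] corner points.
    """
    import easyocr  # pip install easyocr; model weights download on first use

    reader = easyocr.Reader(['en'])
    return reader.readtext(image_path)

def high_confidence_text(results, threshold=0.5):
    """Keep only detections above a confidence threshold (threshold is a guess
    that would need tuning against real photos of coffee bags)."""
    return [(text, conf) for _, text, conf in results if conf >= threshold]
```

Usage might look like `high_confidence_text(read_label("coffee_label.jpg"))`, where the filename is hypothetical.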
Nevertheless, OCR will only get you the text, and that text needs processing. I suspect a fuzzy string matching library would help fix slight typos. Perhaps GPT could help correct larger issues. For instance, on a label I tested, the OCR rendered some characters in "Cairngorm" as digits, but GPT was able to correct it. Experimentation would be needed here.
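For the "slight typos" case, even the standard library's `difflib` could snap an OCR token to the closest known term. The vocabulary and cutoff below are illustrative placeholders:

```python
import difflib

# A tiny, hypothetical vocabulary of terms we expect on coffee labels.
KNOWN_TERMS = ["washed", "natural", "honey", "ethiopia", "colombia", "kenya"]

def correct_token(token, vocabulary=KNOWN_TERMS, cutoff=0.8):
    """Replace an OCR token with the closest known term, if one is close enough.

    get_close_matches ranks candidates by SequenceMatcher similarity; the
    cutoff controls how aggressive the correction is.
    """
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token
```

For example, `correct_token("Colombla")` (a typical 'i'/'l' confusion) comes back as `"colombia"`, while an unknown word like "Cairngorm" passes through untouched for a heavier-duty correction step.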
EasyOCR returns a bounding box for each piece of text, which is great because it means you can merge pieces of text that belong together. One would need to experiment to figure out how to merge text without creating unnecessary combinations (i.e. "Region" and "Colombia" should be merged, but "Region" and "Process" should not be, even if they are in close proximity).
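One plausible heuristic, sketched below with made-up pixel thresholds: treat two detections as mergeable when their boxes overlap vertically (same line) and the horizontal gap between them is small.

```python
def box_extent(bbox):
    """EasyOCR boxes are four [x, y] corner points; reduce to min/max extents."""
    xs = [p[0] for p in bbox]
    ys = [p[1] for p in bbox]
    return min(xs), min(ys), max(xs), max(ys)

def should_merge(a, b, max_gap=20, min_overlap=0.5):
    """Merge two detections if they sit on roughly the same line (enough
    vertical overlap) and the horizontal gap is small. max_gap is in pixels
    and would need tuning per image resolution."""
    ax0, ay0, ax1, ay1 = box_extent(a)
    bx0, by0, bx1, by1 = box_extent(b)
    # Vertical overlap as a fraction of the shorter box's height.
    shared = min(ay1, by1) - max(ay0, by0)
    shorter = min(ay1 - ay0, by1 - by0)
    same_line = shorter > 0 and shared / shorter >= min_overlap
    # Horizontal gap between the boxes (negative if they overlap).
    gap = max(ax0, bx0) - min(ax1, bx1)
    return same_line and gap <= max_gap
```

This would still merge "Region" and "Process" if a label prints them side by side on one line, which is why the thresholds (and perhaps a column-detection step) need real-world experimentation.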
From here, the app would have a database of regions, processing methods, and flavours. Each cleaned string from the OCR -- leading and trailing whitespace removed, symbols stripped, fuzzy matching applied, etc. -- would be looked up in this database to map a word (e.g. "apple") to its label (e.g. "flavour note"). This information could be used to tag the data. If a string could not be mapped, it may be worth saving it in a field hidden from users for use in full-text search, in case the information is valuable but doesn't fall into the defined ontology for coffee labels.
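That lookup could start as a plain dictionary before graduating to a database table. The ontology below is a toy stand-in for illustration:

```python
# A hypothetical, tiny ontology mapping known terms to label categories.
ONTOLOGY = {
    "washed": "process",
    "natural": "process",
    "ethiopia": "origin",
    "colombia": "origin",
    "apple": "flavour note",
    "jasmine": "flavour note",
}

def classify(cleaned_tokens):
    """Split cleaned OCR tokens into tagged attributes and leftovers.

    Leftovers are returned rather than discarded, so they can be stored
    in a hidden field for full-text search."""
    tagged, unmatched = {}, []
    for token in cleaned_tokens:
        label = ONTOLOGY.get(token.lower())
        if label:
            tagged.setdefault(label, []).append(token.lower())
        else:
            unmatched.append(token)
    return tagged, unmatched
```

Running `classify(["Washed", "Ethiopia", "Apple", "Cairngorm"])` tags the first three tokens and leaves "Cairngorm" (the roaster's name, absent from the ontology) in the unmatched list.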
To provide search, a database's built-in full-text search system, like the one offered by PostgreSQL, would be sufficient.
There are likely other technical challenges to address with this task that I haven't discovered yet. As I am not drinking coffee at home as actively as I once was, I'm not sure I am the best person to build this (in the vein of "make what you need"), but I am nevertheless excited by an application like this. My logic above may not be the best implementation path, but such is the nature of a brainstorm! If you want to chat more about this idea, feel free to email me!