The web platform is vast. There are so many tools available for building websites and web applications. For example, you can run machine learning models in your browser. This enables many novel applications (with the downside that models, which can be large, must be downloaded before they can run in the web page).
At Homebrew Website Club this week, we had a brief discussion on alternative modalities of interacting with the web. I showed an application I made that lets you use your hands to scroll up and down a web page. Try the demo of the application. The demo will ask for permission to use your webcam. The application and model run in your browser.
The application works by using a hand pose detection model that runs in the browser. This model is able to detect different points on one's fingers. You can use this information to figure out if someone is making a thumbs up or a fist, is waving, or is making another gesture. To use the model, you must first give permission for the browser to access your webcam.
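To make the idea of going from finger points to a gesture concrete, here is a minimal sketch. It is not the code behind the demo. It assumes the model returns 21 points per hand in the layout MediaPipe uses (index 0 is the wrist, and each finger has a middle joint and a tip), and the names `Point`, `FINGERS` and `classifyGesture` are made up for illustration. The heuristic is deliberately simple: a finger counts as extended when its tip is farther from the wrist than its middle joint.

```typescript
// A hand landmark in image coordinates (assumed normalized to 0..1).
interface Point {
  x: number;
  y: number;
}

type Gesture = "open_palm" | "fist" | "unknown";

// Assumed layout: MediaPipe's 21-point hand convention, where index 0 is the
// wrist. Each pair is [middle (PIP) joint, fingertip] for one finger.
const FINGERS: Array<[number, number]> = [
  [6, 8], // index finger
  [10, 12], // middle finger
  [14, 16], // ring finger
  [18, 20], // little finger
];

function distance(a: Point, b: Point): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Heuristic: a finger is "extended" when its tip is farther from the wrist
// than its middle joint. Four extended fingers is treated as an open palm,
// none as a fist, and anything in between as unknown.
function classifyGesture(landmarks: Point[]): Gesture {
  if (landmarks.length < 21) return "unknown";
  const wrist = landmarks[0];
  const extended = FINGERS.filter(
    ([pip, tip]) =>
      distance(landmarks[tip], wrist) > distance(landmarks[pip], wrist)
  ).length;
  if (extended === 4) return "open_palm";
  if (extended === 0) return "fist";
  return "unknown";
}
```

The thumb is ignored here to keep the sketch short, so a thumbs up would need an extra check of its own.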
When someone makes an open palm, my web application records that an open palm has been made. If the user's palm moves up, I scroll the page up; if the user's palm moves down, I scroll the page down. I need to work on some logic that stops the scrolling when you make a fist so that the page doesn't move when you have scrolled to the place you want to see on the web page.
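The scrolling logic described above, plus one possible way to handle the fist case, could be sketched like this. It is an illustration under assumptions rather than the demo's actual code: `palmY` is assumed to be the palm's vertical position in the camera frame (0 at the top, 1 at the bottom), and `PalmScroller`, `pixelsPerUnit` and `deadZone` are hypothetical names and tuning values.

```typescript
type Gesture = "open_palm" | "fist" | "unknown";

class PalmScroller {
  private lastY: number | null = null;
  private readonly pixelsPerUnit: number;
  private readonly deadZone: number;

  // pixelsPerUnit: pixels scrolled when the palm crosses the whole frame.
  // deadZone: movement smaller than this is treated as jitter.
  // Both defaults are guesses that would need tuning.
  constructor(pixelsPerUnit: number = 2000, deadZone: number = 0.01) {
    this.pixelsPerUnit = pixelsPerUnit;
    this.deadZone = deadZone;
  }

  // Returns how many pixels to scroll for this frame.
  // Negative means scroll up, positive means scroll down.
  update(gesture: Gesture, palmY: number): number {
    if (gesture !== "open_palm") {
      // A fist (or anything unrecognized) forgets the palm position,
      // so the page stays where it is.
      this.lastY = null;
      return 0;
    }
    if (this.lastY === null) {
      // First frame with an open palm: record it, nothing to compare yet.
      this.lastY = palmY;
      return 0;
    }
    const dy = palmY - this.lastY;
    if (Math.abs(dy) < this.deadZone) return 0;
    this.lastY = palmY;
    return Math.round(dy * this.pixelsPerUnit);
  }
}
```

In a browser, the value returned on each video frame could be passed to `window.scrollBy(0, pixels)`. Resetting on a fist means reopening your hand somewhere else in the frame does not make the page jump.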
I loved building this project. Seeing the browser scroll as I moved my hands was delightful. I have seen further experiments that let you use hand gestures to navigate digital maps.
My experiment got me thinking: what novel ways of interacting with the web are possible with locally-running vision models like this?
One example I thought about was a way of liking web pages that combines web action handlers and a model that looks for a thumbs up gesture. Web actions are a proposed way of responding to web pages using your own website. With web actions, you can configure a "handler" that runs when you click on a like button, for example. Combining web actions with computer vision, I thought about an app that saves a page for later if you make a thumbs up gesture.
Another idea is a Chrome extension that looks for you making a love heart with your hands. When you do, the extension records your scroll position and generates a fragmention link which links to the first sentence visible on the page. Will this be useful? I'm not sure. Perhaps one of the wonderful readers of this blog will find this inspiring and explore the idea a bit more!
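The link-building step of that idea could be sketched as below. This is only the part that turns a sentence into a link; finding the first visible sentence is left out. It assumes the fragmention form where the phrase follows `#` with spaces written as `+`, which is worth checking against the fragmention documentation before relying on it. `fragmentionLink` and `maxWords` are hypothetical names.

```typescript
// Builds a fragmention-style link from a page URL and a sentence.
// Assumption: the phrase goes after "#" with words joined by "+".
// Only the first `maxWords` words are used to keep the link short.
function fragmentionLink(
  pageUrl: string,
  sentence: string,
  maxWords: number = 6
): string {
  // Drop any fragment already on the URL.
  const base = pageUrl.split("#")[0];
  const words = sentence
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .slice(0, maxWords);
  if (words.length === 0) return base;
  const phrase = words.map((word) => encodeURIComponent(word)).join("+");
  return `${base}#${phrase}`;
}
```

For example, `fragmentionLink("https://example.com/post", "The web platform is vast.", 3)` gives `https://example.com/post#The+web+platform`.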
Another demo I have spoken about in the past is my website that lets you open a link by clicking your fingers and saying a phrase from the anchor text of the link you want to click. If there is a link on a page that says "Coffee", you can snap your fingers and say "coffee" to open the link. This is another example of a novel way of interacting with a web page.
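The matching step, going from what was said to which link to open, might look something like this sketch. It assumes the speech has already been transcribed to text and the page's links have been collected into a list; `PageLink`, `normalize` and `findSpokenLink` are names invented for the example, not taken from the demo.

```typescript
// A link on the page: its visible anchor text and where it points.
interface PageLink {
  text: string;
  href: string;
}

// Lowercases, strips punctuation and collapses whitespace so that
// "Coffee!" and "coffee" compare equal. ASCII only, for simplicity.
function normalize(s: string): string {
  return s
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Returns the first link whose anchor text contains the spoken phrase
// as whole words, or undefined if nothing matches.
function findSpokenLink(
  spoken: string,
  links: PageLink[]
): PageLink | undefined {
  const phrase = normalize(spoken);
  if (phrase === "") return undefined;
  // Padding with spaces stops "tea" from matching inside "steam".
  return links.find((link) =>
    ` ${normalize(link.text)} `.includes(` ${phrase} `)
  );
}
```

In a browser, the matched link's `href` could then be assigned to `window.location.href` once the finger snap is detected.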
Suppose you are able to detect hand gestures or sounds. How would you use this to change the way someone interacts with a web page? How could you use this to help you navigate the web more efficiently? If you are interested in a machine learning approach, check out Teachable Machine, which lets you train a vision model in your browser without any code, and MediaPipe's hand gesture model, which you can use to detect points on hands. You can email me at readers [at] jamesg [dot] blog if you have any questions about how to get started.
If you build a new way of interacting with the web -- whether it uses machine learning or something else entirely (navigate web pages with a MIDI keyboard?) -- I would love to hear about what you have made.