Charlie Gerard presented a terrific talk on machine learning in the browser at Beyond Tellerrand Berlin two weeks ago. Both during and after the talk, I felt inspired. The talk introduced me to the variety of machine learning model types one can work with in the browser. Audio classification. Pose detection. Facial landmarks.
Over the last few days, I have been playing around with TensorFlow.js, a machine learning framework that runs in the browser. TensorFlow.js provides concise APIs for different machine learning tasks. You can use many of them out of the box, without having to train a model. This means you can think more about writing logic than training models.
Yesterday I was working on an application that superimposes a photo of Taylor Swift on stage over your webcam feed. You then have to try to match the pose she is making. After brainstorming with a friend, I learned that a good way to implement such a project is to use a pose detection model, such as one available through TensorFlow.js, then calculate the angles between "keypoints." Keypoints are parts of a person that a pose detection model has been trained to identify, such as elbows, shoulders, and knees.
My application works by:
- Asking the user for permission to use their webcam.
- Placing an image of Taylor Swift over the webcam.
- Calculating keypoints in the image of Taylor Swift.
- Calculating the angles between relevant keypoints in the Taylor Swift image (see below for the definition of "relevant").
- Calculating the angles between relevant keypoints in the webcam feed.
- Measuring the cosine similarity between each angle in the Taylor Swift image and the corresponding angle in the webcam feed. The average of the similarities is then taken. The closer the value is to 1, the more similar the poses are.
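The comparison step above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: it assumes each keypoint is an object with `x` and `y` pixel coordinates, and the function names (`segmentAngle`, `angleSimilarity`, `poseSimilarity`) are hypothetical. It scores two angles with the cosine of their difference, which equals the cosine similarity of the two direction vectors those angles describe.

```javascript
// Angle (in radians) of the line running from keypoint `from` to keypoint `to`.
const segmentAngle = (from, to) => Math.atan2(to.y - from.y, to.x - from.x);

// cos(a - b): 1 when two segments point the same way,
// 0 when they are perpendicular, -1 when they point in opposite directions.
const angleSimilarity = (a, b) => Math.cos(a - b);

// Average similarity over every compared segment.
// `pairs` is a list of [fromId, toId] keypoint IDs (assumed format).
const poseSimilarity = (referenceKeypoints, webcamKeypoints, pairs) => {
  const total = pairs.reduce((sum, [from, to]) => {
    const refAngle = segmentAngle(referenceKeypoints[from], referenceKeypoints[to]);
    const camAngle = segmentAngle(webcamKeypoints[from], webcamKeypoints[to]);
    return sum + angleSimilarity(refAngle, camAngle);
  }, 0);
  return total / pairs.length;
};
```

Because only angles are compared, the score does not depend on how large the player appears in the frame or where they stand relative to the reference photo.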
Pose detection models that aim to identify the pose of a person (other types include hand or head pose detection models) typically return 17 points. But only a subset of these are relevant for comparing poses. For example, comparing the location of one's eyes -- included in the 17 points -- is not needed to know if your pose is like that of Taylor Swift in an image. The angles of one's elbow to shoulder and knee to hip, on the other hand, are relevant.
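For reference, here is a sketch of how the 17 points could be split into face and body keypoints. It assumes the model returns keypoints in the COCO ordering, which matches the IDs used in this post; the constant names are hypothetical.

```javascript
// The 17 keypoints in COCO order (assumed to be what the model returns).
// The array index is the keypoint ID.
const KEYPOINT_NAMES = [
  "nose", "left_eye", "right_eye", "left_ear", "right_ear",
  "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
  "left_wrist", "right_wrist", "left_hip", "right_hip",
  "left_knee", "right_knee", "left_ankle", "right_ankle",
];

// Facial keypoints say little about body pose, so they are excluded.
const FACE_KEYPOINTS = new Set(["nose", "left_eye", "right_eye", "left_ear", "right_ear"]);

// IDs of the keypoints that remain: shoulders, elbows, wrists, hips, knees, ankles.
const BODY_KEYPOINT_IDS = KEYPOINT_NAMES
  .map((name, id) => (FACE_KEYPOINTS.has(name) ? -1 : id))
  .filter((id) => id !== -1);
```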
Here are the keypoint pairs I calculated angles for:
```
// angles:
// 10 + 8 = right wrist to right elbow
// 8 + 6 = right elbow to right shoulder
// 9 + 7 = left wrist to left elbow
// 7 + 5 = left elbow to left shoulder

// angles:
// 16 + 14 = right ankle to right knee
// 14 + 12 = right knee to right hip
// 15 + 13 = left ankle to left knee
// 13 + 11 = left knee to left hip

// angles:
// 11 + 12 = right hip to left hip
// 5 + 6 = right shoulder to left shoulder
```
The numbers correspond to the ID of the keypoint returned by the pose detection model.
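The pairs listed above could be stored as data and used to look up both endpoints of each segment. This is a sketch, assuming the model's keypoints arrive as an array indexed by keypoint ID; `SEGMENTS` and `segmentEndpoints` are hypothetical names.

```javascript
// [fromId, toId, label] for each pair of keypoints listed above.
const SEGMENTS = [
  [10, 8, "right wrist to right elbow"],
  [8, 6, "right elbow to right shoulder"],
  [9, 7, "left wrist to left elbow"],
  [7, 5, "left elbow to left shoulder"],
  [16, 14, "right ankle to right knee"],
  [14, 12, "right knee to right hip"],
  [15, 13, "left ankle to left knee"],
  [13, 11, "left knee to left hip"],
  [11, 12, "hip to hip"],
  [5, 6, "shoulder to shoulder"],
];

// Pair up the two endpoints of every segment from a keypoints array,
// where keypoints[id] is the keypoint with that ID.
const segmentEndpoints = (keypoints) =>
  SEGMENTS.map(([fromId, toId, label]) => ({
    label,
    from: keypoints[fromId],
    to: keypoints[toId],
  }));
```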
For each image, I manually tested to figure out which similarity threshold is appropriate to consider the player's pose to be similar enough to the reference image. For easier poses, such as standing with one's arms out and looking up, a high similarity threshold can be set. For more difficult poses, such as standing on one leg, with the other leg bent, one arm down, and the other arm pointed up (Swift is holding a microphone in said image), a lower threshold is used. I am unsure if the manual testing approach is the best here. I am very much learning to figure out what works!
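Per-image thresholds like these could be kept in a simple lookup table. The sketch below is illustrative only: the image names, the threshold values, and the `isPoseMatch` helper are all hypothetical and are not taken from the project.

```javascript
// Hypothetical image names and threshold values, for illustration only.
const THRESHOLDS = {
  "arms-out.jpg": 0.95, // easier pose: demand a closer match
  "one-leg.jpg": 0.85, // harder pose: be more forgiving
};

// Fallback for images without a hand-tuned threshold (also an assumption).
const DEFAULT_THRESHOLD = 0.9;

// True when the averaged similarity clears the threshold for this image.
const isPoseMatch = (imageName, similarity) =>
  similarity >= (THRESHOLDS[imageName] ?? DEFAULT_THRESHOLD);
```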
One of the most fun parts about this project was the testing. To know if my code was working and that the game felt intuitive, I had to play! I had my computer angled to see the whole room, took a step back from my laptop, and tried to make the pose in the image. The "felt intuitive" part is crucial. Players shouldn't need to make an exact replica of a stance. A close enough pose to the reference image is more fun.
If you are interested in seeing my code, check out the project's GitHub repository. You can test the game on one image, too! (I haven't finished working on the gameplay mechanics at the time of writing, so the demo only lets you play with one image. If your pose is close enough to the pose on screen, confetti should appear.)