Page 1: Pose Detection and Animation

This page provides an introduction to developing a web application that captures human motion in real time and uses the captured data to animate a 3D articulated model.

Here’s what we will cover:

  • How to detect hand pose and body pose keypoints with TensorFlow.js.
  • How to apply the detected keypoints to drive an animation.

This part stands at the intersection of real-time computer graphics, machine learning, and human-computer interaction, offering a practical exploration of modern interactive technologies. We will use packages from TensorFlow.js.

Hand Pose Detection

In this part we will use the MediaPipeHands model from TensorFlow.js to detect keypoints in a video of hand movement. Before we get started in our .js file, we need to import the required packages in our .html file:

<!-- Load MediaPipe and TensorFlow.js dependencies -->
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/hands@0.4.1646424915/hands.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core@4.10.0/dist/tf-core.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl@4.10.0/dist/tf-backend-webgl.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/hand-pose-detection@2.0.0/dist/hand-pose-detection.min.js"></script>

After importing required packages, we create a hand pose detection model and set its configuration:

const model = handPoseDetection.SupportedModels.MediaPipeHands;
const detectorConfig = {
  runtime: 'mediapipe',
  solutionPath: 'https://cdn.jsdelivr.net/npm/@mediapipe/hands@0.4.1646424915',
  minDetectionConfidence: 0.5,
  minTrackingConfidence: 0.5,
  modelType: 'lite'
};
// create the detector from the chosen model and its configuration
const detector = await handPoseDetection.createDetector(model, detectorConfig);

There are two modelType options to choose from: lite and full. From lite to full, accuracy increases while inference speed decreases. Once the detector is created, we can pass it a video stream or a static image to detect poses. Here we pass the video hand-pose-one-through-five.mp4, which is embedded in our .html file.
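
Here we assume the .html file exposes that clip through a <video> element with id "video", for example (the attributes are a suggestion, not required by the detection API):

<!-- Assumed markup: the clip the detector will read frames from -->
<video id="video" src="hand-pose-one-through-five.mp4" autoplay muted loop playsinline></video>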

const video = document.getElementById('video');
const estimationConfig = { flipHorizontal: true };
const hands = await detector.estimateHands(video, estimationConfig);
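
For real-time animation, estimateHands is typically called once per frame rather than once. A minimal sketch of such a loop, assuming the detector and video defined above; render() is a placeholder for whatever redraws your scene, not part of the API:

async function detectLoop() {
  const hands = await detector.estimateHands(video, estimationConfig);
  if (hands.length > 0) {
    // use hands[0].keypoints / hands[0].keypoints3D to update the visualization
  }
  render();                          // redraw the canvas or Three.js scene
  requestAnimationFrame(detectLoop); // schedule the next frame
}
detectLoop();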

The output format is as follows: hands is an array of detected hand predictions in the image frame. For each hand, the structure contains a prediction of the handedness (left or right) as well as a confidence score for that prediction. An array of 2D keypoints is also returned, where each keypoint contains x, y, and name. Here x and y denote the horizontal and vertical position of the hand keypoint in image pixel space, and name denotes the joint label. In addition to 2D keypoints, 3D keypoints (x, y, z values) are returned in a metric scale, with the origin at an auxiliary keypoint formed as the average of the first knuckles of the index, middle, ring, and pinky fingers.

[
  {
    score: 0.8,
    handedness: 'Right',
    keypoints: [
      {x: 105, y: 107, name: "wrist"},
      {x: 108, y: 160, name: "pinky_finger_tip"},
      ...
    ],
    keypoints3D: [
      {x: 0.00388, y: -0.0205, z: 0.0217, name: "wrist"},
      {x: -0.025138, y: -0.0255, z: -0.0051, name: "pinky_finger_tip"},
      ...
    ]
  }
]

Note that handedness is determined assuming the input image or video is mirrored, i.e., taken with a front-facing/selfie camera with images flipped horizontally. If that is not the case, we should set flipHorizontal to true and then flip the x values of the output keypoints as well.

For 2D keypoints, we can simply flip x as canvas.width - x. For 3D keypoints, which are in a metric scale centered on the hand rather than in pixel space, we may want to flip x, y, and z to -x, -y, and -z before using them in Three.js (a small conversion sketch follows the list). The reasons are:

  • Y-axis: In MediaPipe, Y increases downward (image coordinates), but in Three.js Y increases upward, so we negate Y to convert between coordinate systems.
  • Z-axis: In MediaPipe, Z is depth away from the camera (positive = farther), but in Three.js negative Z points away from the camera, so we negate Z.
  • X-axis: We negate X because flipHorizontal: true gives us coordinates in the flipped space, but we want to match the un-flipped video.
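
A minimal sketch of this conversion, assuming Three.js is loaded; SCALE is an arbitrary factor (not part of the API) chosen to make the metric-scale hand visible in the scene:

const SCALE = 10; // metric keypoints3D values are small, so scale them up for display

// MediaPipe keypoint3D -> Three.js position (negate x, y, z as explained above)
function toThreePosition(kp) {
  return new THREE.Vector3(-kp.x * SCALE, -kp.y * SCALE, -kp.z * SCALE);
}

// 2D canvas case: only x needs to be mirrored
function toCanvasX(kp, canvas) {
  return canvas.width - kp.x;
}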

Box 10-01-01   10-01-01.html   10-01-01.js

Box 10-01-02  (3 basic + 1 advanced)   10-01-02.html   10-01-02.js

There are basic or advanced items to be completed for Box 10-01-02.

10-01-01.js provides a simple demo of hand pose detection that visualizes the detected hand skeleton on a 2D canvas using hands.keypoints. In 10-01-02.js, we apply the keypoints we get from the detector to animate a 3D hand model using hands.keypoints3D, where each joint node is represented by a SphereGeometry and a CylinderGeometry is used to link two nodes. All the detected keypoints are already provided, and you only need to create the Geometry objects to build the skeleton animation. We assume the skeleton of the hand looks like:

const HAND_CONNECTIONS = [
  [0, 1], [1, 2], [2, 3], [3, 4], // Thumb
  [0, 5], [5, 6], [6, 7], [7, 8], // Index finger
  [0, 9], [9, 10], [10, 11], [11, 12], // Middle finger
  [0, 13], [13, 14], [14, 15], [15, 16], // Ring finger
  [0, 17], [17, 18], [18, 19], [19, 20], // Pinky
  [5, 9], [9, 13], [13, 17] // Palm connections
];

where [0, 1] means the keypoints with indices 0 and 1 in the output keypoints array are two of the thumb joint nodes.
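
One possible way to turn these connections into geometry (a sketch under the assumptions above, not the reference solution for the box): create one sphere per keypoint and one unit-height cylinder per connection once, then reposition them every frame from keypoints3D. toThreePosition and scene are assumed from the earlier snippets.

// One sphere per joint (MediaPipeHands returns 21 keypoints per hand)
const jointMeshes = [];
const jointMaterial = new THREE.MeshStandardMaterial({ color: 0xff8800 });
for (let i = 0; i < 21; i++) {
  const sphere = new THREE.Mesh(new THREE.SphereGeometry(0.15, 16, 16), jointMaterial);
  scene.add(sphere);
  jointMeshes.push(sphere);
}

// One unit-height cylinder per connection; it is scaled and rotated each frame
const boneMeshes = [];
const boneMaterial = new THREE.MeshStandardMaterial({ color: 0x2288ff });
for (let i = 0; i < HAND_CONNECTIONS.length; i++) {
  const cylinder = new THREE.Mesh(new THREE.CylinderGeometry(0.05, 0.05, 1, 8), boneMaterial);
  scene.add(cylinder);
  boneMeshes.push(cylinder);
}

// Called once per detected frame with a single hand prediction
function updateHandSkeleton(hand) {
  const positions = hand.keypoints3D.map(toThreePosition);
  positions.forEach((p, i) => jointMeshes[i].position.copy(p));
  HAND_CONNECTIONS.forEach(([a, b], i) => {
    const start = positions[a];
    const end = positions[b];
    const direction = new THREE.Vector3().subVectors(end, start);
    const bone = boneMeshes[i];
    bone.position.copy(start).add(end).multiplyScalar(0.5); // midpoint of the segment
    bone.scale.set(1, direction.length(), 1);                // stretch the unit-height cylinder
    bone.quaternion.setFromUnitVectors(                      // align the cylinder's Y axis with the bone
      new THREE.Vector3(0, 1, 0), direction.normalize());
  });
}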

Body Pose Detection And Animation

As with hand pose detection, we can leverage body pose detection models from TensorFlow.js to get the keypoints of body movement. An example is the BlazePose model:

const model = poseDetection.SupportedModels.BlazePose;
const detectorConfig = {
  runtime: 'mediapipe',
  solutionPath: 'https://cdn.jsdelivr.net/npm/@mediapipe/pose@0.4.1624666670/',
  modelType: 'lite'
};
// create the detector from the chosen model and its configuration
const detector = await poseDetection.createDetector(model, detectorConfig);
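
As in the hand pose section, the corresponding packages need to be loaded in the .html file first. A sketch of the script tags (the pinned @mediapipe/pose version matches the solutionPath above; the other versions are assumptions and should match your setup):

<!-- Load MediaPipe Pose and TensorFlow.js pose-detection dependencies -->
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/pose@0.4.1624666670/pose.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-core/dist/tf-core.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgl/dist/tf-backend-webgl.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/pose-detection/dist/pose-detection.min.js"></script>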

Once the detector is created, we can pass it a video stream or a static image to detect poses. Here we pass the video body-pose-dance.mp4, which is embedded in our .html file.

const video = document.getElementById('video');
const estimationConfig = { flipHorizontal: true };
const poses = await detector.estimatePoses(video, estimationConfig);

The default mirroring assumption is the same as for hand pose. So if we feed in a pre-recorded video instead of a mirrored selfie captured with our webcam, we should set flipHorizontal to true and then flip the position values accordingly.
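
The output mirrors the hand detector's format, with 33 keypoints per pose. A representative (illustrative, not exact) result looks like the following; fields such as the per-keypoint score may vary by runtime:

[
  {
    keypoints: [
      {x: 231, y: 118, score: 0.99, name: "nose"},
      {x: 286, y: 209, score: 0.98, name: "left_shoulder"},
      ...
    ],
    keypoints3D: [
      {x: 0.01, y: -0.58, z: -0.21, score: 0.99, name: "nose"},
      ...
    ]
  }
]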

BlazePose keypoints (as used in MediaPipe BlazePose):

0 - nose
1 - left eye (inner)
2 - left eye
3 - left eye (outer)
4 - right eye (inner)
5 - right eye
6 - right eye (outer)
7 - left ear
8 - right ear
9 - mouth (left)
10 - mouth (right)
11 - left shoulder
12 - right shoulder
13 - left elbow
14 - right elbow
15 - left wrist
16 - right wrist
17 - left pinky
18 - right pinky
19 - left index
20 - right index
21 - left thumb
22 - right thumb
23 - left hip
24 - right hip
25 - left knee
26 - right knee
27 - left ankle
28 - right ankle
29 - left heel
30 - right heel
31 - left foot index
32 - right foot index
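
Analogous to HAND_CONNECTIONS above, a body connection list can be written down directly from these indices. A simplified sketch that skips the face detail (the name BODY_CONNECTIONS is ours, not part of the library):

const BODY_CONNECTIONS = [
  [11, 12],                               // shoulders
  [11, 13], [13, 15],                     // left arm
  [12, 14], [14, 16],                     // right arm
  [11, 23], [12, 24], [23, 24],           // torso
  [23, 25], [25, 27], [27, 29], [29, 31], // left leg
  [24, 26], [26, 28], [28, 30], [30, 32]  // right leg
];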

Box 10-01-03   10-01-03.html   10-01-03.js

Box 10-01-04  (3 basic + 1 advanced)   10-01-04.html   10-01-04.js

There are basic or advanced gallery items to be completed for Box 10-01-04. To get full points, please add explanations of what you did in the .html file (or as notes in the Workbook Form) and submit at least one screenshot and one video recording of the box to the Workbook Canvas Assignment.

10-01-03.js provides a simple demo of body pose detection that visualizes the detected body skeleton on a 2D canvas using pose.keypoints. In 10-01-04.js, we apply the keypoints we get from the detector to animate a 3D body model using pose.keypoints3D, where each joint node is represented by a SphereGeometry and a CylinderGeometry is used to link two nodes. All the detected keypoints are already provided, and you only need to create the Geometry objects to build the skeleton animation.

Box 10-01-05   10-01-05.html   10-01-05.js

In addition, 10-01-05.js provides an example of how to use the detected keypoints to animate a glTF model. The keypoints are used to compute the rotations of the animated bones. If you are interested, you can replace the model with one you like and give it a try.
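
The core idea is to turn pairs of keypoints into bone rotations. A rough sketch of that idea, assuming a loaded glTF SkinnedMesh; the bone name, the skinnedMesh variable, and the rest direction below are hypothetical and depend on the specific model, and this simplified version ignores parent-bone rotations (it is not the file's actual code):

// Rotate a bone so it points from a "start" keypoint towards an "end" keypoint
function aimBone(skeleton, boneName, startKp, endKp, restDirection) {
  const bone = skeleton.getBoneByName(boneName);
  if (!bone) return;
  // Convert the keypoint delta into Three.js space (negate x, y, z as before)
  const target = new THREE.Vector3(
    -(endKp.x - startKp.x),
    -(endKp.y - startKp.y),
    -(endKp.z - startKp.z)
  ).normalize();
  // restDirection: the bone's direction in the model's rest pose, e.g. new THREE.Vector3(0, 1, 0)
  bone.quaternion.setFromUnitVectors(restDirection, target);
}

// Example: drive a (hypothetical) left upper-arm bone from shoulder (11) to elbow (13)
// skinnedMesh: the SkinnedMesh found inside the loaded glTF scene (assumed)
aimBone(skinnedMesh.skeleton, 'LeftArm',
        pose.keypoints3D[11], pose.keypoints3D[13], new THREE.Vector3(0, 1, 0));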

Basic Items

10-01-02

  • Correctly show the keypoints in 3D space using basic geometry objects (like SphereGeometry).
  • Correctly connect all the keypoints using basic geometry objects (like CylinderGeometry) to create a hand skeleton.
  • The movement of the hand skeleton correctly follows the video.

10-01-04

  • Correctly show the keypoints in 3D space using basic geometry objects (like SphereGeometry).
  • Correctly connect all the keypoints using basic geometry objects (like CylinderGeometry) to create a body skeleton.
  • The movement of the body skeleton correctly follows the video.

Advanced Items

10-01-02

  • Make the hand pose animation more creative. For example, add some interactions with the environment.

10-01-04

  • Make the body pose animation more creative. For example, add some interactions with the environment.
Next: Cloth Simulation