Body keypoints, 17 joints, one API call.
Detect human bodies and 17 COCO keypoints (eyes, shoulders, elbows, wrists, hips, knees, ankles) in any image — multi-person, with skeleton overlay, in under a second.
- Model: YOLOv8-Pose (m)
- Inputs: JPG/PNG ≤ 25 MB
- Outputs: 17 keypoints/person · bbox · skeleton overlay
- Latency: ~150 ms p50
- Free quota: 300 calls / month
YOLOv8-Pose extends YOLOv8 detection with a keypoint regression head, giving you per-person bounding boxes and the 17 standard COCO keypoints in a single forward pass. Faster than two-stage top-down models, accurate enough for production sports analysis, ergonomics, AR avatars, and behavioural monitoring.
mSightFlow hosts YOLOv8m-Pose (medium-size checkpoint) by default with optional skeleton overlay rendering. For custom keypoint schemas — hands, faces, animals, mechanical parts — define your template via /skeletons/ and we'll render and export accordingly.
When pose estimation is the right tool
Sports & fitness
Form analysis, rep counting, joint-angle measurement, technique scoring. Build a fitness coach in an afternoon.
Workplace ergonomics
Posture monitoring on production lines, lifting-form risk scoring, fatigue-trigger detection. RULA / REBA pipelines.
AR & avatars
Body tracking for AR filters, avatar driving, motion-capture-lite. Combine with depth for 3D-aware avatars.
The 17 COCO keypoints
Default skeleton schema. Index, name, and the order they appear in the keypoints array of each person.
| Index | Keypoint | Index | Keypoint |
|---|---|---|---|
| 0 | nose | 9 | left_wrist |
| 1 | left_eye | 10 | right_wrist |
| 2 | right_eye | 11 | left_hip |
| 3 | left_ear | 12 | right_hip |
| 4 | right_ear | 13 | left_knee |
| 5 | left_shoulder | 14 | right_knee |
| 6 | right_shoulder | 15 | left_ankle |
| 7 | left_elbow | 16 | right_ankle |
| 8 | right_elbow | | |
For custom schemas (hands, faces, animals, mechanical), see the skeleton templates API.
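For convenience, the table above can be encoded as a plain lookup list in client code (a small sketch independent of the API itself):

```python
# COCO-17 keypoint names in array order (matches the table above).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def keypoint_index(name: str) -> int:
    """Return the COCO array index for a keypoint name."""
    return COCO_KEYPOINTS.index(name)

print(keypoint_index("left_wrist"))  # 9
```

Indexing by name instead of hard-coded integers keeps downstream code readable and resistant to off-by-one mistakes.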
Code — Python, Node, cURL
```python
import os, requests
from pathlib import Path

resp = requests.post(
    "https://api.msightflow.ai/v1/pose",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"image": Path("athlete.jpg").read_bytes()},
    data={"return_overlay": "true"},
)
for p in resp.json()["persons"]:
    nose = p["keypoints"][0]  # COCO index 0 = nose
    l_wrist, r_wrist = p["keypoints"][9], p["keypoints"][10]
    print(f"person bbox={p['box']}, nose={nose}, wrists={l_wrist}, {r_wrist}")
```
```javascript
import fetch from "node-fetch";
import FormData from "form-data";
import fs from "fs";

const form = new FormData();
form.append("image", fs.createReadStream("athlete.jpg"));
form.append("return_overlay", "true");

const resp = await fetch("https://api.msightflow.ai/v1/pose", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.MSF_API_KEY}` },
  body: form,
});
const { persons } = await resp.json();
console.log(`${persons.length} persons, total keypoints: ${persons.reduce((n, p) => n + p.keypoints.length, 0)}`);
```
```bash
curl -X POST https://api.msightflow.ai/v1/pose \
  -H "Authorization: Bearer $MSF_API_KEY" \
  -F "image=@athlete.jpg" \
  -F "return_overlay=true"
```
```python
# Compute elbow angle for form analysis (rep counting, ergonomics).
import numpy as np

def angle(a, b, c):
    ba, bc = np.array(a) - np.array(b), np.array(c) - np.array(b)
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return float(np.degrees(np.arccos(np.clip(cos, -1, 1))))

person = resp.json()["persons"][0]
kp = person["keypoints"]
# COCO indices: 5 left_shoulder, 7 left_elbow, 9 left_wrist
l_elbow_angle = angle(kp[5][:2], kp[7][:2], kp[9][:2])
print(f"left elbow flexion: {l_elbow_angle:.1f}°")
```
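Building on the elbow angle above, a rep counter only needs hysteresis between a flexed and an extended threshold. This is a sketch; the 60°/160° thresholds are illustrative defaults, not API values:

```python
class RepCounter:
    """Count reps from a stream of joint angles using hysteresis.

    One rep = extended (angle > up_thresh) -> flexed (angle < down_thresh)
    -> extended again. Hysteresis prevents jitter near a single threshold
    from registering phantom reps.
    """
    def __init__(self, down_thresh=60.0, up_thresh=160.0):
        self.down, self.up = down_thresh, up_thresh
        self.flexed = False
        self.reps = 0

    def update(self, angle_deg: float) -> int:
        if not self.flexed and angle_deg < self.down:
            self.flexed = True            # reached the bottom of the curl
        elif self.flexed and angle_deg > self.up:
            self.flexed = False           # back to full extension: rep done
            self.reps += 1
        return self.reps

counter = RepCounter()
for a in [170, 120, 50, 90, 165, 40, 170]:  # two simulated curls
    counter.update(a)
print(counter.reps)  # 2
```

Feed it the per-frame elbow angle from the snippet above to count curls in a video or webcam stream.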
Pricing — same as every other endpoint
Free
$0
- 300 API calls / month
- Multi-person, COCO schema
- Skeleton overlay PNG
- No credit card
Standard
$10/mo
- 5,400 API calls / month
- Custom skeleton templates
- Batch up to 10 images / call
Related features
Depth estimation
Combine 17 keypoints with per-pixel depth for 3D-aware pose, AR avatars, ergonomics scoring.
Object detection
Detect people, equipment, and PPE alongside pose for full-scene workplace and sports analytics.
SAM segmentation
Pixel-level body masks (silhouettes) for compositing, background removal, AR effects.
FAQ
Which keypoints are returned?
The 17 standard COCO keypoints: nose, left/right eye, left/right ear, left/right shoulder, left/right elbow, left/right wrist, left/right hip, left/right knee, left/right ankle. Each comes with an (x, y) image coordinate and a confidence score between 0 and 1.
What if I need keypoints for hands, faces, or animals?
Use /features/pose-estimation/skeletons to define a custom keypoint schema. mSightFlow ships templates for COCO Person (17), MediaPipe Hand (21), MediaPipe Face Mesh (468), and Animal Pose (24). For brand-new schemas, contact us — we can host a fine-tuned checkpoint on the Pro tier.
How are multiple people handled?
The response is an array of detected persons, each with its own bbox, 17 keypoints, and confidence. The model handles occlusion by down-weighting confidence; expect lower per-keypoint confidence for partially visible bodies.
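A common pattern is to drop low-confidence keypoints before downstream geometry. This sketch assumes each keypoint is an `[x, y, conf]` triple, as in the Python example above; adapt it if your response schema differs:

```python
def visible_keypoints(person, min_conf=0.5):
    """Return {index: (x, y)} for keypoints above a confidence threshold.

    Assumes each keypoint is an [x, y, conf] triple (an assumption based
    on the examples in this page, not a schema guarantee).
    """
    return {
        i: (x, y)
        for i, (x, y, conf) in enumerate(person["keypoints"])
        if conf >= min_conf
    }

# Toy person with three keypoints; the middle one is occluded (low conf).
person = {"keypoints": [[100, 50, 0.9], [110, 45, 0.2], [90, 45, 0.8]]}
print(visible_keypoints(person))  # {0: (100, 50), 2: (90, 45)}
```

Thresholding first means joint-angle code like the elbow example never runs on occluded, unreliable points.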
How accurate is YOLOv8-Pose?
On COCO Person Keypoints, YOLOv8m-Pose hits ~64 mAP — strong for a single-stage model. Faster than top-down models like HRNet, less accurate than the largest two-stage models. Best balance for real-time and batch processing.
Can I use this for video?
Yes — submit a video via /v1/video/upload (tier-gated by length: 8 s free / 15 s Standard / unlimited Pro). The pose model runs on sampled frames, returning per-frame keypoints. For continuous streams the streaming endpoint is on the roadmap.
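Per-frame keypoints from sampled video frames tend to jitter, so a light temporal smoother is often worth adding client-side. A minimal exponential-smoothing sketch, assuming you have collected one person's keypoints as a list of (17, 2) coordinate arrays per frame (the exact per-frame response shape may differ):

```python
import numpy as np

def smooth_keypoints(frames, alpha=0.6):
    """Exponentially smooth (x, y) keypoints across video frames.

    `frames`: list of (17, 2) arrays for one tracked person (an assumed
    shape -- adapt to the actual per-frame response). Higher alpha tracks
    motion faster; lower alpha suppresses jitter more.
    """
    smoothed, state = [], None
    for kp in frames:
        kp = np.asarray(kp, dtype=float)
        state = kp if state is None else alpha * kp + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

frames = [np.zeros((17, 2)), np.ones((17, 2))]
out = smooth_keypoints(frames)
print(out[1][0])  # [0.6 0.6]
```

For multi-person video you would first need to associate detections across frames (e.g. by bbox IoU) before smoothing each track independently.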
17 joints, every person, one call.
300 free API calls / month. YOLOv8-Pose. Custom schemas on Pro.