Per-pixel depth from a single image. No stereo camera needed.
Estimate scene depth from any photo. No stereo rig. No LiDAR. No calibration. mSightFlow hosts Depth Anything v2 — the strongest open monocular depth model — as a REST endpoint with a magma-colormap visualisation by default and raw float32 on Pro tier.
| Model | Depth Anything v2 (ViT-B) |
|---|---|
| Inputs | JPG/PNG ≤ 25 MB (single image) |
| Outputs | per-pixel depth map (PNG, magma colormap) |
| Latency | ~400 ms p50 |
| Free quota | 300 calls / month |
Monocular depth — depth from a single image, no stereo — was hard until Depth Anything v2. Trained on 62 million unlabelled images plus 595K labelled pairs, it generalises across indoor, outdoor, macro, drone, and aerial views with a quality that two years ago required a stereo rig or LiDAR.
mSightFlow exposes Depth Anything v2 ViT-B as a REST endpoint. It returns a colourised PNG by default; the Pro tier adds raw float32 depth so you can feed it into your own maths. Pair it with SAM or detection to get per-object depth statistics.
When monocular depth is the right tool
AR & composition
Depth-aware background blur, focus pulling, simulated bokeh, parallax effects, AR object placement.
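As a quick illustration of the blur case, the sketch below blends a sharp and a blurred copy of the photo, weighted by the depth map. It is a minimal approximation: it treats the colourised depth PNG's luminance as a rough depth proxy (the Pro-tier raw float32 output is the cleaner input), and the filenames and blur radius are placeholders.
# Minimal depth-aware blur sketch: blend sharp and blurred copies of the image,
# weighted by the depth map. Uses the colourised PNG's luminance as a rough
# depth proxy; flip the weight if near/far looks inverted for your image.
import numpy as np
from PIL import Image, ImageFilter
img = Image.open("scene.jpg").convert("RGB")
depth = Image.open("depth.png").convert("L").resize(img.size)
sharp = np.asarray(img, dtype=np.float32)
blurred = np.asarray(img.filter(ImageFilter.GaussianBlur(radius=8)), dtype=np.float32)
w = (np.asarray(depth, dtype=np.float32) / 255.0)[..., None]   # per-pixel weight in 0.0–1.0
out = sharp * (1.0 - w) + blurred * w
Image.fromarray(out.astype(np.uint8)).save("bokeh.jpg")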
Defect sizing
Combine a SAM mask with the depth map to estimate physical extent of an annotated defect relative to a known reference.
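One minimal way to sketch this, assuming you already have boolean masks for the defect and for a reference object of known width (from the SAM endpoint) plus the depth map: compare pixel extents, and use depth only to check that both objects sit at a similar distance, since the pixel ratio is otherwise meaningless. The function and variable names here are illustrative, not part of the API.
# Rough defect sizing against a reference object of known physical width.
# mask_defect / mask_ref: boolean masks from SAM; depth: 2-D depth array;
# ref_width_mm: known width of the reference. Illustrative sketch only.
import numpy as np
def pixel_width(mask: np.ndarray) -> int:
    cols = np.where(mask.any(axis=0))[0]
    return int(cols[-1] - cols[0] + 1)
def defect_width_mm(mask_defect, mask_ref, depth, ref_width_mm, tol=0.1):
    # The pixel ratio only holds if both objects are at a similar depth.
    if abs(np.median(depth[mask_defect]) - np.median(depth[mask_ref])) > tol * np.median(depth[mask_ref]):
        raise ValueError("defect and reference are at different depths")
    mm_per_px = ref_width_mm / pixel_width(mask_ref)
    return pixel_width(mask_defect) * mm_per_px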
Robotics-adjacent
Single-camera obstacle priority, distance-to-target estimation, scene parsing where stereo isn't available.
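For instance, given bounding boxes from a detector, you can rank objects by how close they appear using the median depth inside each box. The (x1, y1, x2, y2) pixel box format below is an assumption; adapt it to whatever your detector returns.
# Rank detected objects by apparent proximity using median depth per box.
# Box format (x1, y1, x2, y2) in pixels is assumed here, not prescribed.
import numpy as np
def rank_by_proximity(depth: np.ndarray, boxes):
    scored = [(float(np.median(depth[y1:y2, x1:x2])), (x1, y1, x2, y2))
              for (x1, y1, x2, y2) in boxes]
    # Whether low values mean near or far depends on the depth convention;
    # reverse the sort if the ordering looks inverted for your output.
    return sorted(scored, key=lambda s: s[0])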
Relative vs metric depth — which one?
| | Relative depth (default) | Metric depth (roadmapped) |
|---|---|---|
| Output unit | 0.0–1.0 (normalised) | metres |
| Needs calibration? | No | Camera intrinsics required |
| Cross-image comparison? | No | Yes |
| Good for | AR, sizing relative to reference, scene priority | Measurement, robotics, mapping |
| Available today? | ✅ Yes | Pro tier — request access |
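If you need rough metric values before the metric model ships, one workaround is to calibrate the relative output against two or more points in the scene whose real distances you know. The sketch below assumes relative depth maps roughly affinely onto inverse metric depth; that is an approximation, not a property the model guarantees.
# Approximate metric calibration of relative depth from known reference points.
# Assumes relative depth relates affinely to inverse metric depth; this is a
# rough approximation, not something the model promises.
import numpy as np
def calibrate_metres(depth_rel: np.ndarray, refs):
    # refs: [((row, col), distance_m), ...] with at least two entries
    rel = np.array([depth_rel[r, c] for (r, c), _ in refs])
    inv = np.array([1.0 / d for _, d in refs])
    a, b = np.polyfit(rel, inv, 1)              # fit inverse depth ~ a * rel + b
    return 1.0 / np.clip(a * depth_rel + b, 1e-6, None)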
Code — Python, Node, cURL
Python
import os, base64, requests
from pathlib import Path
resp = requests.post(
"https://api.msightflow.ai/v1/depth",
headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
files={"image": Path("scene.jpg").read_bytes()},
)
result = resp.json()
# Save the colourised depth map
Path("depth.png").write_bytes(base64.b64decode(result["depth_map"]))
# On Pro tier, raw float32 depth is also returned
# import numpy as np
# arr = np.frombuffer(base64.b64decode(result["raw_depth"]), dtype=np.float32)
Node
import fetch from "node-fetch";
import FormData from "form-data";
import fs from "fs";
const form = new FormData();
form.append("image", fs.createReadStream("scene.jpg"));
const resp = await fetch("https://api.msightflow.ai/v1/depth", {
method: "POST",
headers: { Authorization: `Bearer ${process.env.MSF_API_KEY}` },
body: form,
});
const { depth_map } = await resp.json();
fs.writeFileSync("depth.png", Buffer.from(depth_map, "base64"));
cURL
curl -X POST https://api.msightflow.ai/v1/depth \
-H "Authorization: Bearer $MSF_API_KEY" \
-F "image=@scene.jpg" \
--output depth-response.json
# Depth + SAM = per-object depth statistics.
import os, io, base64
import numpy as np
import requests
from pathlib import Path
from PIL import Image

api = "https://api.msightflow.ai/v1"
hdr = {"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"}
img = Path("scene.jpg").read_bytes()

# 1. Get depth map
depth_resp = requests.post(api + "/depth", headers=hdr, files={"image": img}).json()
depth = np.array(Image.open(io.BytesIO(base64.b64decode(depth_resp["depth_map"]))).convert("L"))

# 2. Get object mask via SAM (one positive click at x=320, y=240)
seg_resp = requests.post(api + "/interactive_segment", headers=hdr,
                         files={"image": img}, data={"points": '[{"x":320,"y":240,"label":1}]'}).json()
mask = np.array(Image.open(io.BytesIO(base64.b64decode(seg_resp["mask"]))).convert("L")) > 127

# 3. Depth statistics within the object mask
obj_depth = depth[mask]
print(f"object median depth: {np.median(obj_depth):.1f}, std: {np.std(obj_depth):.1f}")
Pricing — same as every other endpoint
Standard
$10/mo
- 5,400 API calls / month
- 500 exports / month
- Batch up to 10 images / call
Related features
SAM segmentation
Mask + depth = per-object depth statistics. Sizing, AR, priority — all unlocked.
Pose estimation
17 keypoints in image coordinates. Combine with depth for 3D-aware pose, AR avatars, ergonomics.
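A minimal sketch of that combination, assuming the pose endpoint gives keypoints as (x, y) pixel tuples (the exact schema may differ): look up the depth value under each keypoint to get a crude 2.5D skeleton.
# Attach a depth value to each pose keypoint. (x, y) pixel tuples are an
# assumption here; adapt to the pose endpoint's actual response schema.
import numpy as np
def keypoints_with_depth(depth: np.ndarray, keypoints):
    h, w = depth.shape[:2]
    return [(x, y, float(depth[min(y, h - 1), min(x, w - 1)])) for (x, y) in keypoints]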
Object detection
Bboxes + depth = scene-priority object lists. The detection backbone for robotics-adjacent uses.
FAQ
What's the difference between relative and metric depth?
Depth Anything v2 (the model we host) returns relative depth — depths are accurate relative to each other within the same image, but not in metres. For metric depth you need either a calibrated stereo rig, a known reference object in the scene, or a metric-finetuned model (roadmapped for Pro tier).
Why monocular instead of stereo?
Monocular works with any single-frame image — phone photos, drone footage, archived video, historical photos. No special hardware required. Stereo and structured-light depth are higher-accuracy but require capture-side investment. Many real applications (AR effects, defect sizing relative to a reference object, scene priority) work fine with relative depth.
What input resolution should I use?
Depth Anything v2 internally resizes to 518 px on the long side, so larger inputs don't add accuracy and they slow down the call. Send images at the resolution you care about (we resize the output back to match your input). For batch processing of large image archives, downsize to ~1024 px to save bandwidth.
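For example, with Pillow (filenames here are placeholders):
# Downsize to ~1024 px on the long side before upload (batch/archive use).
from PIL import Image
img = Image.open("archive_frame.jpg")
img.thumbnail((1024, 1024))        # keeps aspect ratio, resizes in place
img.save("archive_frame_small.jpg")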
Can I use this for SLAM or 3D reconstruction?
Not directly. SLAM needs camera-pose tracking, which mSightFlow doesn't do. Depth Anything v2 gives you depth per frame; combining frames into a coherent 3D model is a separate workflow (try Open3D or COLMAP downstream).
How accurate is it?
On standard benchmarks Depth Anything v2 ViT-B is state-of-the-art for zero-shot monocular depth. It handles indoor scenes, outdoor scenes, and macro shots well. Transparent or highly reflective surfaces remain the hardest case — expect noise on glass and polished metal.
One image. Full depth. No rig.
300 free API calls / month. Depth Anything v2. No credit card.