Segment Anything (SAM) — click once, get a perfect mask.
Meta's Segment Anything Model gives you pixel-perfect masks from a single point. mSightFlow hosts SAM ViT-Base as a REST endpoint — no GPU setup, no model download, no PyTorch install. Drop it into your annotation tool, dataset pipeline, or product UI in under 10 lines of code.
- Model: SAM ViT-Base (Meta AI)
- Inputs: JPG, PNG · ≤ 25 MB
- Outputs: binary mask (base64 PNG) · bbox · confidence
- Latency: ~200 ms p50
- Free quota: 300 calls / month
SAM (Segment Anything Model) was trained on the largest-ever segmentation dataset — 11 million images and 1.1 billion masks — with the goal of foundation-model segmentation: one model that can mask any object you point at, without per-class training. That breaks the usual segmentation pipeline (label thousands of class instances, train a model, watch it fail on a new class). With SAM, you point and you get the mask.
mSightFlow hosts SAM in two modes — point-prompt for interactive annotation and Everything mode for fully-automatic mask generation across an image — and includes the connective tissue that turns SAM's raw masks into COCO / YOLO polygons, auto-labelled datasets, or refined annotations.
When SAM is the right tool
Annotation tools
Replace the polygon-drawing flow with a one-click prompt. A ~3-hour image batch becomes a ~20-minute one. Pair with active learning to label only what your model is unsure about.
Dataset bootstrapping
Pair SAM with Grounding DINO to build a COCO-format dataset for a brand-new class with zero training data. Detect with text prompt, segment with SAM, export.
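A minimal sketch of that loop in Python. The zero-shot detection route (`/v1/detect/zero-shot`), its response shape, and the "shipping container" prompt are assumptions for illustration; check the zero-shot detection page for the real contract. The SAM call matches the code section below.

```python
import os, json
import requests
from pathlib import Path

API = "https://api.msightflow.ai"
HEADERS = {"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"}
image = Path("image.jpg").read_bytes()

# 1. Detect a brand-new class with a text prompt.
#    NOTE: this endpoint path and response shape are assumptions, not the documented API.
det = requests.post(
    f"{API}/v1/detect/zero-shot",
    headers=HEADERS,
    files={"image": image},
    data={"prompt": "shipping container"},
).json()

# 2. Prompt SAM with each detection's box centre to get a tight mask.
masks = []
for obj in det["objects"]:
    x1, y1, x2, y2 = obj["bbox"]
    point = [{"x": (x1 + x2) // 2, "y": (y1 + y2) // 2, "label": 1}]
    seg = requests.post(
        f"{API}/v1/segment/interactive",
        headers=HEADERS,
        files={"image": image},
        data={"points": json.dumps(point)},
    ).json()
    masks.append(seg)

print(f"{len(masks)} masks ready for /labeling/mask-to-polygon and COCO export")
```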
Background removal & cutouts
Point at the foreground subject; ship the resulting alpha matte to your e-commerce, AR, or editing pipeline. SAM masks are tighter than those from classical saliency-segmentation models.
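As a concrete sketch of the cutout flow: the endpoint and response fields match the code section below, while the point coordinates and file names are placeholders. The returned mask drops straight in as a Pillow alpha channel.

```python
import os, base64, io
import requests
from PIL import Image
from pathlib import Path

resp = requests.post(
    "https://api.msightflow.ai/v1/segment/interactive",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"image": Path("product.jpg").read_bytes()},
    data={"points": '[{"x": 512, "y": 384, "label": 1}]'},  # a point on the subject
)
mask_png = base64.b64decode(resp.json()["mask"])

# Use the binary mask as the alpha channel: background pixels become transparent.
image = Image.open("product.jpg").convert("RGBA")
alpha = Image.open(io.BytesIO(mask_png)).convert("L").resize(image.size)
image.putalpha(alpha)
image.save("cutout.png")
```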
Two modes: point-prompt vs Everything
Point-prompt mode
You know what you want. Click on it.
- One or more `(x, y, label)` points
- `label=1` foreground (include), `label=0` background (exclude)
- Returns the single mask containing your prompts
- Best for annotation UIs, AR cutouts, interactive product flows
Everything mode
Mask every object on a grid. No prompts.
- `points_per_side` parameter (4 coarse → 32 fine)
- Returns an array of distinct masks with bboxes
- Best for batch dataset annotation, change detection, object counting
- Slower (proportional to `points_per_side`²; see the sweep sketch below)
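A quick way to feel out that quadratic trade-off is to sweep `points_per_side` and watch mask count and latency grow. This sketch reuses the Everything-mode endpoint exactly as shown in the code section below; note that each iteration spends one API call from your quota.

```python
import os, time
import requests
from pathlib import Path

HEADERS = {"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"}
image = Path("image.jpg").read_bytes()

for pps in (4, 8, 16):  # each step roughly quadruples the prompt grid
    t0 = time.perf_counter()
    resp = requests.post(
        "https://api.msightflow.ai/v1/segment/everything",
        headers=HEADERS,
        files={"image": image},
        data={"points_per_side": str(pps)},
    )
    masks = resp.json()["masks"]
    print(f"points_per_side={pps}: {len(masks)} masks in {time.perf_counter() - t0:.1f}s")
```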
How it works — three steps
1. Send your image and a point. POST an image and one or more `(x, y, label)` points to `/v1/segment/interactive` with your bearer token. Free tier: 300 calls / month, no credit card.
2. Get a mask back. The response is a binary mask (base64 PNG), bounding box, and confidence score. Median latency on a 1024-px image is around 200 ms.
3. Save, refine, or convert. Add positive points to grow the mask or negative points to shrink it. Convert the mask to a COCO polygon with `/labeling/mask-to-polygon`, or pipe straight into dataset export.
Code — Python, Node, cURL
Drop into any HTTP-capable language. Python and Node SDKs are optional sugar over the REST endpoint.
```python
import os, base64
import requests
from pathlib import Path

api_key = os.environ["MSF_API_KEY"]

resp = requests.post(
    "https://api.msightflow.ai/v1/segment/interactive",
    headers={"Authorization": f"Bearer {api_key}"},
    files={"image": Path("image.jpg").read_bytes()},
    data={
        "points": '[{"x": 320, "y": 240, "label": 1}]',  # 1 = foreground
        "return_overlay": "true",
    },
)
result = resp.json()

# Save the mask
Path("mask.png").write_bytes(base64.b64decode(result["mask"]))
print("bbox:", result["bbox"], "confidence:", result["confidence"])
```
```javascript
import fetch from "node-fetch";
import FormData from "form-data";
import fs from "fs";

const form = new FormData();
form.append("image", fs.createReadStream("image.jpg"));
form.append("points", JSON.stringify([{ x: 320, y: 240, label: 1 }]));
form.append("return_overlay", "true");

const resp = await fetch("https://api.msightflow.ai/v1/segment/interactive", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.MSF_API_KEY}` },
  body: form,
});
const result = await resp.json();
console.log("bbox:", result.bbox, "confidence:", result.confidence);
```
```bash
curl -X POST https://api.msightflow.ai/v1/segment/interactive \
  -H "Authorization: Bearer $MSF_API_KEY" \
  -F "image=@image.jpg" \
  -F 'points=[{"x":320,"y":240,"label":1}]' \
  -F "return_overlay=true"
```
# "Everything mode": auto-mask every object in the image.
resp = requests.post(
"https://api.msightflow.ai/v1/segment/everything",
headers={"Authorization": f"Bearer {api_key}"},
files={"image": Path("image.jpg").read_bytes()},
data={"points_per_side": "16"}, # 4 (coarse) - 32 (fine)
)
result = resp.json()
# result["masks"]: list of {mask: base64, bbox, confidence}
print(f"Found {len(result['masks'])} distinct objects")
Integrate SAM into your annotation tool
A typical click-to-annotate workflow on top of mSightFlow:
- User clicks on an object in your image canvas → grab the click coordinates relative to the image.
- POST `/v1/segment/interactive` with that point.
- Render the returned mask as a coloured overlay on the canvas (decode base64 PNG → ImageBitmap → drawImage).
- If the mask is wrong, capture a refinement click (positive or negative) and re-POST with the full point list; SAM resolves them jointly (see the sketch after this list).
- When the user accepts, POST `/labeling/mask-to-polygon` to convert to a polygon and save the annotation in your project.
- At export time, `/export` writes the annotations as COCO / YOLO / Pascal VOC with the polygons attached.
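On the server side, the refinement step is just the same request with a growing point list. A minimal sketch, with hard-coded clicks standing in for your canvas events:

```python
import os, base64, json
import requests
from pathlib import Path

HEADERS = {"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"}
image = Path("image.jpg").read_bytes()

def segment(points):
    resp = requests.post(
        "https://api.msightflow.ai/v1/segment/interactive",
        headers=HEADERS,
        files={"image": image},
        data={"points": json.dumps(points)},
    )
    return resp.json()

points = [{"x": 320, "y": 240, "label": 1}]  # initial click
result = segment(points)

# Mask bled into the background? Append a negative click and re-send the FULL
# list: SAM resolves all points jointly rather than editing the previous mask.
points.append({"x": 400, "y": 260, "label": 0})
result = segment(points)
Path("mask.png").write_bytes(base64.b64decode(result["mask"]))
```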
SAM vs semantic segmentation — which one?
| | SAM (this page) | Semantic / instance segmentation |
|---|---|---|
| Outputs class labels? | No — class-agnostic | Yes — labels from training set |
| Works on unseen classes? | ✅ Yes | ❌ Only trained classes |
| Interactive prompting? | ✅ Point, box, or mask prompts | ❌ No prompts |
| Auto-mask everything? | ✅ Everything mode | ✅ Instance segmentation |
| Latency | ~200 ms point-prompt; ~3-15 s Everything mode | ~50-200 ms per image |
| Best for | Annotation UIs, dataset bootstrapping, cutouts | Production inference with known classes |
In real workflows, the two complement each other: SAM for the mask geometry, Grounding DINO or a trained detector for the class label. mSightFlow's auto-labelling pipeline runs both for you.
Pricing — same as every other endpoint
SAM consumes one API call per /v1/segment/interactive request. Everything mode also costs one call regardless of how many masks it returns.
Standard
$10/mo
- 5,400 API calls / month
- 500 exports / month
- Batch up to 10 images / call
Full feature matrix on the pricing page.
Related features
Zero-shot detection
Grounding DINO. Type a text prompt → get bounding boxes. Pair with SAM to label new classes from scratch.
Auto-labelling
One-call dispatcher that runs detect + SAM + classify + pose to bootstrap an annotation pass.
Semantic segmentation
Class-aware masks from trained models. Use when class labels matter and inputs are in your domain.
FAQ
Which SAM model does mSightFlow use?
SAM ViT-Base — the smallest of Meta's three SAM checkpoints. It's a strong balance of accuracy and speed (~200 ms median on a single 1024-px image). ViT-Large and ViT-Huge are roadmapped for the Pro tier, for use cases where mask accuracy matters more than latency.
What's the difference between point-prompt and Everything mode?
Point-prompt segments the single object containing your click; Everything mode runs SAM on a regular grid of prompts (4-32 points per side) and returns every distinct mask in the image. Use point-prompt for click-to-annotate workflows; use Everything mode for automatic mask generation across a dataset.
Can I use multiple points?
Yes. Pass an array of points each tagged label=1 (foreground / include) or label=0 (background / exclude). The model resolves them jointly, which is the standard way to fix a mask that initially gets the wrong region.
Is the mask exportable to COCO or YOLO?
Yes. Use /labeling/mask-to-polygon to convert the binary mask to a polygon via cv2.findContours + Douglas-Peucker simplification. Polygons round-trip cleanly to COCO segmentation format and YOLO polygon format.
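If you'd rather convert locally, the same technique is a few lines of OpenCV (`cv2.approxPolyDP` is the Douglas-Peucker step). A sketch: the `0.002 × perimeter` tolerance is an illustrative default, not the endpoint's setting.

```python
import base64

import cv2
import numpy as np

# `result` is the JSON response from /v1/segment/interactive
mask_bytes = base64.b64decode(result["mask"])
mask = cv2.imdecode(np.frombuffer(mask_bytes, np.uint8), cv2.IMREAD_GRAYSCALE)

# Trace the mask outline, then simplify it with Douglas-Peucker.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)
epsilon = 0.002 * cv2.arcLength(largest, True)  # tolerance ≈ 0.2% of the perimeter
polygon = cv2.approxPolyDP(largest, epsilon, True).reshape(-1, 2)

# COCO stores segmentation as a flat [x1, y1, x2, y2, ...] list.
coco_segmentation = polygon.flatten().tolist()
```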
Is SAM the same as semantic segmentation?
No. SAM is class-agnostic — it gives you a mask for whatever you point at, with no class label attached. Semantic segmentation models output class labels (`person`, `car`, `road`) but require a class to be in their training set. Many real workflows combine them: SAM for the mask geometry, a classifier or zero-shot detection for the class.
Can SAM run on the edge?
Today, mSightFlow hosts SAM as a cloud REST endpoint. You can call it from any internet-connected device (Jetson, Raspberry Pi, ESP32-CAM). On-device SAM is feasible but slow on edge hardware without quantisation; an edge SDK is roadmapped.
Click. Mask. Ship.
300 free API calls / month. SAM ViT-Base. No credit card. No setup.