Built on Meta AI's Segment Anything Model

Segment Anything (SAM) — click once, get a perfect mask.

Meta's Segment Anything Model gives you pixel-perfect masks from a single point. mSightFlow hosts SAM ViT-Base as a REST endpoint — no GPU setup, no model download, no PyTorch install. Drop it into your annotation tool, dataset pipeline, or product UI in under 10 lines of code.

Model: SAM ViT-Base (Meta AI)
Inputs: JPG, PNG · ≤ 25 MB
Outputs: binary mask (base64 PNG) · bbox · confidence
Latency: ~200 ms p50
Free quota: 300 calls / month

SAM (Segment Anything Model) was trained on the largest-ever segmentation dataset — 11 million images and 1.1 billion masks — with the goal of foundation-model segmentation: one model that can mask any object you point at, without per-class training. That breaks the usual segmentation pipeline (label thousands of class instances, train a model, watch it fail on a new class). With SAM, you point and you get the mask.

mSightFlow hosts SAM in two modes — point-prompt for interactive annotation and Everything mode for fully-automatic mask generation across an image — and includes the connective tissue that turns SAM's raw masks into COCO / YOLO polygons, auto-labelled datasets, or refined annotations.

When SAM is the right tool

Annotation tools

Replace the polygon-drawing flow with a one-click prompt. A ~3-hour image batch becomes a ~20-minute one. Pair with active learning to label only what your model is unsure about.

Dataset bootstrapping

Pair SAM with Grounding DINO to build a COCO-format dataset for a brand-new class with zero training data. Detect with text prompt, segment with SAM, export.

Background removal & cutouts

Point at the foreground subject; ship the resulting alpha matte to your e-commerce, AR, or editing pipeline. SAM masks are typically tighter than those produced by dedicated saliency-segmentation models.

Two modes: point-prompt vs Everything

Point-prompt mode

You know what you want. Click on it.

  • One or more (x, y, label) points
  • label=1 foreground (include), label=0 background (exclude)
  • Returns the single mask containing your prompts
  • Best for annotation UIs, AR cutouts, interactive product flows

Everything mode

Mask every object on a grid. No prompts.

  • points_per_side parameter (4 coarse → 32 fine)
  • Returns an array of distinct masks with bboxes
  • Best for batch dataset annotation, change detection, object counting
  • Slower (proportional to points_per_side²)

How it works — three steps

  1. Send your image and a point

     POST an image and one or more (x, y, label) points to /v1/segment/interactive with your bearer token. Free tier: 300 calls / month, no credit card.

  2. Get a mask back

     The response is a binary mask (base64 PNG), a bounding box, and a confidence score. Median latency on a 1024-px image is around 200 ms.

  3. Save, refine, or convert

     Add positive points to grow the mask or negative points to shrink it. Convert the mask to a COCO polygon with /labeling/mask-to-polygon, or pipe straight into dataset export.

Code — Python, Node, cURL

Drop into any HTTP-capable language. Python and Node SDKs are optional sugar over the REST endpoint.

Python · requests
import os, base64
import requests
from pathlib import Path

api_key = os.environ["MSF_API_KEY"]

resp = requests.post(
    "https://api.msightflow.ai/v1/segment/interactive",
    headers={"Authorization": f"Bearer {api_key}"},
    files={"image": Path("image.jpg").read_bytes()},
    data={
        "points": '[{"x": 320, "y": 240, "label": 1}]',  # 1 = foreground
        "return_overlay": "true",
    },
)
result = resp.json()

# Save the mask
Path("mask.png").write_bytes(base64.b64decode(result["mask"]))
print("bbox:", result["bbox"], "confidence:", result["confidence"])
Node.js
import fetch from "node-fetch";
import FormData from "form-data";
import fs from "fs";

const form = new FormData();
form.append("image", fs.createReadStream("image.jpg"));
form.append("points", JSON.stringify([{ x: 320, y: 240, label: 1 }]));
form.append("return_overlay", "true");

const resp = await fetch("https://api.msightflow.ai/v1/segment/interactive", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.MSF_API_KEY}`,
    ...form.getHeaders(), // sets the multipart Content-Type boundary
  },
  body: form,
});
const result = await resp.json();
console.log("bbox:", result.bbox, "confidence:", result.confidence);
cURL
curl -X POST https://api.msightflow.ai/v1/segment/interactive \
  -H "Authorization: Bearer $MSF_API_KEY" \
  -F "image=@image.jpg" \
  -F 'points=[{"x":320,"y":240,"label":1}]' \
  -F "return_overlay=true"
Everything mode
# "Everything mode": auto-mask every object in the image.
resp = requests.post(
    "https://api.msightflow.ai/v1/segment/everything",
    headers={"Authorization": f"Bearer {api_key}"},
    files={"image": Path("image.jpg").read_bytes()},
    data={"points_per_side": "16"},   # 4 (coarse) - 32 (fine)
)
result = resp.json()
# result["masks"]: list of {mask: base64, bbox, confidence}
print(f"Found {len(result['masks'])} distinct objects")

Integrate SAM into your annotation tool

A typical click-to-annotate workflow on top of mSightFlow:

  1. User clicks on an object in your image canvas → grab the click coordinates relative to the image.
  2. POST /v1/segment/interactive with that point.
  3. Render the returned mask as a coloured overlay on the canvas (decode base64 PNG → ImageBitmap → drawImage).
  4. If the mask is wrong, capture a refinement click (positive or negative) and re-POST with the full point list. SAM resolves them jointly.
  5. When the user accepts, POST /labeling/mask-to-polygon to convert to a polygon and save the annotation in your project.
  6. At export time, /export writes the annotations as COCO / YOLO / Pascal VOC with the polygons attached.
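
A minimal Python sketch of steps 2, 4 and 5, written against the endpoints above (the refinement coordinates are made up, and the request field for /labeling/mask-to-polygon is an assumption; check the endpoint reference for the exact field name):

import json, os
import requests
from pathlib import Path

API = "https://api.msightflow.ai"
HEADERS = {"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"}
image_bytes = Path("image.jpg").read_bytes()

def segment(points):
    # Steps 2 and 4: always send the full point list; SAM resolves the points jointly.
    resp = requests.post(
        f"{API}/v1/segment/interactive",
        headers=HEADERS,
        files={"image": image_bytes},
        data={"points": json.dumps(points)},
    )
    resp.raise_for_status()
    return resp.json()

# Step 2: the user's first click (foreground).
points = [{"x": 320, "y": 240, "label": 1}]
result = segment(points)

# Step 4: a refinement click marking a wrongly included region as background.
points.append({"x": 400, "y": 250, "label": 0})
result = segment(points)

# Step 5: convert the accepted mask to a polygon.
# The "mask" field name is an assumption here, not documented on this page.
polygon = requests.post(
    f"{API}/labeling/mask-to-polygon",
    headers=HEADERS,
    data={"mask": result["mask"]},
).json()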

SAM vs semantic segmentation — which one?

|                          | SAM (this page)                                | Semantic / instance segmentation        |
|--------------------------|------------------------------------------------|-----------------------------------------|
| Outputs class labels?    | No — class-agnostic                            | Yes — labels from training set          |
| Works on unseen classes? | ✅ Yes                                         | ❌ Only trained classes                 |
| Interactive prompting?   | ✅ Point, box, or mask prompts                 | ❌ No prompts                           |
| Auto-mask everything?    | ✅ Everything mode                             | ✅ Instance segmentation                |
| Latency                  | ~200 ms point-prompt; ~3-15 s Everything mode  | ~50-200 ms per image                    |
| Best for                 | Annotation UIs, dataset bootstrapping, cutouts | Production inference with known classes |

In real workflows, the two complement each other: SAM for the mask geometry, Grounding DINO or a trained detector for the class label. mSightFlow's auto-labelling pipeline runs both for you.
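
A rough sketch of that pairing: the detection endpoint, its response fields, and the text prompt below are illustrative assumptions rather than documented mSightFlow API; only the SAM point-prompt call matches the endpoint described on this page.

import json, os
import requests
from pathlib import Path

API = "https://api.msightflow.ai"
HEADERS = {"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"}
image_bytes = Path("image.jpg").read_bytes()

# 1. Hypothetical zero-shot detection call (path and fields are placeholders for
#    whatever detector you use, e.g. Grounding DINO behind the auto-labelling pipeline).
detections = requests.post(
    f"{API}/v1/detect/zero-shot",      # placeholder path, not a documented endpoint
    headers=HEADERS,
    files={"image": image_bytes},
    data={"prompt": "forklift"},
).json()["detections"]                  # assumed shape: [{"bbox": [x, y, w, h], "label": ...}]

# 2. Prompt SAM with the centre of each detected box to get the mask geometry.
for det in detections:
    x, y, w, h = det["bbox"]
    point = [{"x": int(x + w / 2), "y": int(y + h / 2), "label": 1}]
    mask = requests.post(
        f"{API}/v1/segment/interactive",
        headers=HEADERS,
        files={"image": image_bytes},
        data={"points": json.dumps(point)},
    ).json()
    print(det["label"], "mask bbox:", mask["bbox"], "confidence:", mask["confidence"])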

Pricing — same as every other endpoint

SAM consumes one API call per /v1/segment/interactive request. Everything mode also costs one call regardless of how many masks it returns.

Free · $0

  • 300 API calls / month
  • 50 exports / month
  • All inference endpoints
  • No credit card
Start free

Pro · $29/mo

  • Unlimited calls
  • Unlimited exports
  • Higher per-provider quotas
Go Pro

Full feature matrix on the pricing page.

FAQ

Which SAM model does mSightFlow use?

SAM ViT-Base — the smallest of Meta's three SAM checkpoints. It strikes a strong balance of accuracy and speed (~200 ms median on a single 1024-px image). ViT-Large and ViT-Huge are on the roadmap for the Pro tier, for accuracy-critical use cases that can tolerate the extra latency.

What's the difference between point-prompt and Everything mode?

Point-prompt segments the single object containing your click; Everything mode runs SAM on a regular grid of prompts (4-32 points per side) and returns every distinct mask in the image. Use point-prompt for click-to-annotate workflows; use Everything mode for automatic mask generation across a dataset.

Can I use multiple points?

Yes. Pass an array of points, each tagged label=1 (foreground / include) or label=0 (background / exclude). The model resolves them jointly, which is the standard way to fix a mask that initially gets the wrong region.
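
For example, a payload mixing an include point and an exclude point (coordinates are illustrative) is just a JSON array in the points form field:

points = [
    {"x": 320, "y": 240, "label": 1},  # foreground: keep the region under this point
    {"x": 355, "y": 260, "label": 0},  # background: carve this region out of the mask
]
# send as: data={"points": json.dumps(points)}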

Is the mask exportable to COCO or YOLO?

Yes. Use /labeling/mask-to-polygon to convert the binary mask to a polygon via cv2.findContours + Douglas-Peucker simplification. Polygons round-trip cleanly to COCO segmentation format and YOLO polygon format.
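
If you prefer to do the same conversion locally, an OpenCV sketch along those lines (not the service's exact parameters) looks like this:

import cv2

# Load the binary mask saved from /v1/segment/interactive (see the Python example above).
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
polygons = []
for contour in contours:
    # Douglas-Peucker simplification; the 0.5%-of-perimeter epsilon is a starting point to tune.
    approx = cv2.approxPolyDP(contour, 0.005 * cv2.arcLength(contour, True), True)
    polygons.append(approx.reshape(-1, 2).tolist())  # list of [x, y] vertices per contour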

Is SAM the same as semantic segmentation?

No. SAM is class-agnostic — it gives you a mask for whatever you point at, with no class label attached. Semantic segmentation models output class labels (`person`, `car`, `road`) but require a class to be in their training set. Many real workflows combine them: SAM for the mask geometry, a classifier or zero-shot detection for the class.

Can SAM run on the edge?

Today, mSightFlow hosts SAM as a cloud REST endpoint. You can call it from any internet-connected device (Jetson, Raspberry Pi, ESP32-CAM). On-device SAM is feasible but slow on edge hardware without quantisation; an edge SDK is roadmapped.

Click. Mask. Ship.

300 free API calls / month. SAM ViT-Base. No credit card. No setup.