8 task types, one API call

Auto-label your dataset in minutes — not weeks.

The auto-labelling aggregator dispatches detection, segmentation (with SAM refinement), pose, classification, OCR, zero-shot detection, captioning, and AI-image detection in a single call. Get COCO-format suggestions back, drop them into your annotation project, and have humans verify rather than draw.

Model: Aggregator (8 sub-tasks)
Inputs: JPG/PNG ≤ 25 MB · task list
Outputs: COCO-format suggestions (bboxes · masks · keypoints · labels)
Speed-up: 5-10× vs manual
Free quota: 300 calls / month

Labelling is usually the single biggest cost in a CV project. Auto-labelling doesn't replace humans — it gets the boring 80% done so humans focus on the hard 20%. mSightFlow's aggregator runs any subset of the eight sub-tasks (detect, segment, pose, classify, OCR, zero-shot, caption, ai_detection), returns standard COCO output, and integrates with active learning so your team reviews uncertainty-sorted suggestions first.
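
For orientation, here is a minimal sketch of what comes back, using only the fields the code examples below name (annotations, bbox, category, confidence, source); treat the exact shape as illustrative rather than a schema reference.

# Illustrative response shape — field names taken from the examples
# further down this page, not from a schema reference.
result = {
    "annotations": [
        {
            "bbox": [412.0, 188.5, 96.0, 210.0],   # COCO xywh
            "category": "person",
            "confidence": 0.91,
            "source": "ai_generated",
            # "mask" / "keypoints" appear when segment / pose are requested
        },
    ],
}

# Lowest confidence first = the order your reviewers should see them in
for ann in sorted(result["annotations"], key=lambda a: a["confidence"]):
    print(ann["category"], ann["confidence"])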

When auto-labelling is the right tool

Bootstrap a new class

Use Grounding DINO + SAM to get day-1 annotations for a class with zero training data.

Refine existing labels

Run detection on your existing labelled set to find missed objects, double-count errors, or wrong-class labels — quality control at scale.
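
A minimal sketch of that QA pass, assuming the endpoint and response fields shown in the code section below (the labels/img_001.json path and the 0.5 IoU threshold are illustrative choices): flag existing boxes that no fresh suggestion overlaps, and suggestions with no existing match.

import os, json, requests
from pathlib import Path

def iou(a, b):
    # a, b are COCO xywh boxes
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

resp = requests.post(
    "https://api.msightflow.ai/v1/label/auto",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"image": Path("img_001.jpg").read_bytes()},
    data={"tasks": "detect"},
)
suggested = [a["bbox"] for a in resp.json()["annotations"]]
existing = json.load(open("labels/img_001.json"))["annotations"]  # your COCO file

for ann in existing:   # human box with no AI match: wrong class or stale label?
    if max((iou(ann["bbox"], s) for s in suggested), default=0.0) < 0.5:
        print("check existing label:", ann["bbox"])
for s in suggested:    # AI box with no human match: possibly a missed object
    if max((iou(s, e["bbox"]) for e in existing), default=0.0) < 0.5:
        print("possibly missed object:", s)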

Multi-task labels in one pass

Need both bbox AND mask AND keypoints? One call, all three. Saves API quota and keeps annotations consistent across tasks.

The 8 task types

Task · What it does · Models used
detect · Object detection bboxes · YOLO family + /v1/detect
segment · Semantic / instance masks · /v1/segmentation + SAM refinement
pose · 17 COCO keypoints per person · YOLOv8-Pose
classify · Top-k image labels · /v1/classify + cloud LLM
ocr · Text regions + confidence · EasyOCR + cloud LLM fallback
zero_shot · Text-prompted detection · Grounding DINO
caption · Image description / VQA · BLIP + cloud LLM
ai_detection · Synthetic-image / deepfake flag · 4-detector ensemble

Pass any subset comma-separated: tasks=detect,segment,pose.

Code — single image, zero-shot, and batch

Python — multi-task on one image
import os, requests
from pathlib import Path

resp = requests.post(
    "https://api.msightflow.ai/v1/label/auto",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"image": Path("scene.jpg").read_bytes()},
    data={
        "tasks": "detect,segment,pose",
        "model": "yolov8m",
    },
)
result = resp.json()
print(f"{len(result['annotations'])} suggestions generated")
# Each annotation has: bbox, mask (if segment), keypoints (if pose),
# category, confidence, source='ai_generated'
Zero-shot + SAM combo
# Zero-shot new class + SAM mask refinement, in one call.
import os, requests
from pathlib import Path
resp = requests.post(
    "https://api.msightflow.ai/v1/label/auto",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"image": Path("factory.jpg").read_bytes()},
    data={
        "tasks": "zero_shot,segment",
        "prompt": "cracked weld bead, spatter, undercut",   # zero-shot classes
        "segment_with": "sam",                              # refine bboxes via SAM
    },
)
# Returns: for each zero-shot detection, a SAM mask refined from the bbox
Batch + active-learning review queue
# Batch over a folder; pair with active learning to prioritise review.
import os, glob, requests

for img_path in glob.glob("dataset/*.jpg"):
    with open(img_path, "rb") as f:   # close the file handle after each upload
        resp = requests.post(
            "https://api.msightflow.ai/v1/label/auto",
            headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
            files={"image": f},
            data={"tasks": "detect", "project_id": "PROJECT_ID"},
        )
    # Suggestions land in project automatically

# Then sort the project queue by model uncertainty:
queue = requests.get(
    "https://api.msightflow.ai/v1/label/score-batch?project_id=PROJECT_ID",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
).json()
print(f"Review these {len(queue[:50])} first (lowest-confidence)")

Pricing — same as every other endpoint

Free

$0

  • 300 API calls / month
  • 50 exports / month
  • All 8 sub-tasks
Start free

Pro

$29/mo

  • Unlimited calls
  • Higher per-provider quotas
Go Pro

FAQ

What's the difference between auto-labelling and zero-shot detection?

Zero-shot detection is one model (Grounding DINO) producing one signal (bounding boxes for a text prompt). Auto-labelling is an aggregator — it can call detection, classification, segmentation, SAM, pose, OCR, captioning, and zero-shot detection in a single batch, returning unified COCO output. Use zero-shot when you know exactly which prompt you want; use auto-label when you want a complete annotation pass across multiple task types.

How much faster is this than manual labelling?

Practitioners typically report 5-10× speed-up on bounding-box annotation when auto-labelling sets the initial state and the human only verifies/corrects. For segmentation, SAM-assisted refinement adds another 3-5×. For classification (single-label-per-image), well-fitting models can give you 95%+ accuracy and reduce labelling to spot-checking.

What format are the suggestions in?

COCO JSON format — the canonical CV annotation schema. Each suggestion gets a confidence score so you can sort by uncertainty (lowest-confidence first, which is what active learning does). Suggestions are flagged as 'ai_generated' in the source field so downstream QA can audit them separately from human annotations.
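
As a sketch of working with that schema (assuming a COCO-style export with the confidence and source fields described above; export.json is a placeholder path):

import json

coco = json.load(open("export.json"))  # placeholder: a COCO export of your project

# Lowest-confidence first — the review order active learning would give you
ai = [a for a in coco["annotations"] if a.get("source") == "ai_generated"]
for ann in sorted(ai, key=lambda a: a["confidence"])[:20]:
    print(ann["category"], round(ann["confidence"], 2))

# Audit AI suggestions separately from human annotations
human = [a for a in coco["annotations"] if a.get("source") != "ai_generated"]
print(f"{len(ai)} AI suggestions, {len(human)} human annotations")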

Can I use my own models?

Yes, on Pro tier. Bring your own ONNX or PyTorch model to host alongside the built-in models, and the auto-label aggregator will include it in dispatch. Useful for domain-specific classes (e.g. a fine-tuned welding-defect detector) where the public models are off the mark.
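
The upload flow isn't shown on this page; purely as a hypothetical sketch, with placeholder endpoint and parameter names (/v1/models, model_id, and custom_model are guesses, not confirmed API):

import os, requests
from pathlib import Path

# HYPOTHETICAL: the endpoint and field names below are placeholders for the
# Pro BYO-model flow, not confirmed API — check the models docs.
resp = requests.post(
    "https://api.msightflow.ai/v1/models",               # placeholder
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"weights": Path("weld_defects.onnx").read_bytes()},
    data={"name": "weld-defects-v1", "format": "onnx"},
)
model_id = resp.json()["model_id"]                        # placeholder field

resp = requests.post(
    "https://api.msightflow.ai/v1/label/auto",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"image": Path("weld.jpg").read_bytes()},
    data={"tasks": "detect", "custom_model": model_id},   # parameter is a guess
)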

How does this work with my annotation team?

Auto-labelling drops suggestions into your project; human annotators see them as pre-filled annotations that can be accepted, edited, or rejected. Combined with /v1/quality endpoints (inter-annotator agreement, class-balance alerts), you get a full assisted-annotation workflow with quality controls.

One call. Eight task types. 5-10× faster.

300 free API calls / month. Auto-labelling across detect, segment, pose, classify, OCR.