Auto-label your dataset in minutes, not weeks.
The auto-labelling aggregator dispatches detection, segmentation, SAM refinement, pose, classification, OCR, zero-shot detection, and captioning in a single call. Get COCO-format suggestions back, drop them into your annotation project, and have humans verify rather than draw.
| Spec | Details |
|---|---|
| Model | Aggregator (8 sub-tasks) |
| Inputs | JPG/PNG ≤ 25 MB · task list |
| Outputs | COCO-format suggestions (bboxes · masks · keypoints · labels) |
| Speed-up | 5-10× vs manual |
| Free quota | 300 calls / month |
Labelling is often the single biggest cost in a CV project. Auto-labelling doesn't replace humans; it gets the boring 80% done so humans can focus on the hard 20%. mSightFlow's aggregator runs whichever subset of detect, segment, pose, classify, OCR, zero-shot, and caption you ask for, returns standard COCO output, and integrates with active learning so your team reviews uncertainty-sorted suggestions first.
When auto-labelling is the right tool
Bootstrap a new class
Use Grounding DINO + SAM to get day-1 annotations for a class with zero training data.
Refine existing labels
Run detection on your existing labelled set to find missed objects, double-count errors, or wrong-class labels — quality control at scale.
Multi-task labels in one pass
Need both bbox AND mask AND keypoints? One call, all three. Saves API quota and keeps annotations consistent across tasks.
The 8 task types
| Task | What it does | Models used |
|---|---|---|
| detect | Object detection bboxes | YOLO family + /v1/detect |
| segment | Semantic / instance masks | /v1/segmentation + SAM refinement |
| pose | 17 COCO keypoints per person | YOLOv8-Pose |
| classify | Top-k image labels | /v1/classify + cloud LLM |
| ocr | Text regions + confidence | EasyOCR + cloud LLM fallback |
| zero_shot | Text-prompted detection | Grounding DINO |
| caption | Image description / VQA | BLIP + cloud LLM |
| ai_detection | Synthetic-image / deepfake flag | 4-detector ensemble |
Pass any subset comma-separated: `tasks=detect,segment,pose`.
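As a sketch, here's one way to build and sanity-check the tasks string client-side (the helper below is illustrative, not part of any SDK):

VALID_TASKS = {"detect", "segment", "pose", "classify",
               "ocr", "zero_shot", "caption", "ai_detection"}

def build_tasks_param(tasks):
    """Join a task list into the comma-separated form the API expects."""
    unknown = set(tasks) - VALID_TASKS
    if unknown:
        raise ValueError(f"unknown tasks: {sorted(unknown)}")
    return ",".join(tasks)

build_tasks_param(["detect", "segment", "pose"])  # -> "detect,segment,pose"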
Code — single image, zero-shot, and batch
import os, requests
from pathlib import Path

resp = requests.post(
    "https://api.msightflow.ai/v1/label/auto",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"image": Path("scene.jpg").read_bytes()},
    data={
        "tasks": "detect,segment,pose",
        "model": "yolov8m",
    },
)
result = resp.json()
print(f"{len(result['annotations'])} suggestions generated")
# Each annotation has: bbox, mask (if segment), keypoints (if pose),
# category, confidence, source='ai_generated'
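A minimal sketch of walking that response, assuming exactly the per-annotation fields listed in the comment above; the 0.5 review threshold is an arbitrary choice:

for ann in result["annotations"]:
    fields = [k for k in ("bbox", "mask", "keypoints") if ann.get(k)]
    flag = "  <- queue for human review" if ann["confidence"] < 0.5 else ""
    print(f"{ann['category']} ({'+'.join(fields)}) conf={ann['confidence']:.2f}{flag}")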
# Zero-shot new class + SAM mask refinement, in one call.
resp = requests.post(
    "https://api.msightflow.ai/v1/label/auto",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"image": Path("factory.jpg").read_bytes()},
    data={
        "tasks": "zero_shot,segment",
        "prompt": "cracked weld bead, spatter, undercut",  # zero-shot classes
        "segment_with": "sam",  # refine bboxes via SAM
    },
)
# Returns: for each zero-shot detection, a SAM mask refined from the bbox
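As a quick sanity check on a zero-shot pass, you can tally detections per prompted class before importing them (assumes the same annotations schema as the first example):

from collections import Counter

counts = Counter(a["category"] for a in resp.json()["annotations"])
for cls in ("cracked weld bead", "spatter", "undercut"):
    print(f"{cls}: {counts.get(cls, 0)} detections (each with a SAM mask)")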
# Batch over a folder; pair with active learning to prioritise review.
import glob
import os
import requests

for img_path in glob.glob("dataset/*.jpg"):
    with open(img_path, "rb") as f:  # context manager closes each file handle
        resp = requests.post(
            "https://api.msightflow.ai/v1/label/auto",
            headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
            files={"image": f},
            data={"tasks": "detect", "project_id": "PROJECT_ID"},
        )
    # Suggestions land in the project automatically

# Then sort the project queue by model uncertainty:
queue = requests.get(
    "https://api.msightflow.ai/v1/label/score-batch?project_id=PROJECT_ID",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
).json()
print(f"Review these {len(queue[:50])} first (lowest confidence)")
Pricing — same as every other endpoint
Standard
$10/mo
- 5,400 API calls / month
- 500 exports / month
- Batch up to 10 images / call
Related features
Active learning
After auto-labelling, sort the queue by uncertainty so humans review the lowest-confidence suggestions first.
SAM segmentation
When detection gets a bbox but you need a polygon, click once to convert it with SAM.
Dataset export
Once humans verify the suggestions, export the whole labelled set to COCO / YOLO / VOC in one click.
FAQ
What's the difference between auto-labelling and zero-shot detection?
Zero-shot detection is one model (Grounding DINO) producing one signal (bounding boxes for a text prompt). Auto-labelling is an aggregator — it can call detection, classification, segmentation, SAM, pose, OCR, captioning, and zero-shot detection in a single batch, returning unified COCO output. Use zero-shot when you know exactly which prompt you want; use auto-label when you want a complete annotation pass across multiple task types.
How much faster is this than manual labelling?
Practitioners typically report 5-10× speed-up on bounding-box annotation when auto-labelling sets the initial state and the human only verifies/corrects. For segmentation, SAM-assisted refinement adds another 3-5×. For classification (single-label-per-image), well-fitting models can give you 95%+ accuracy and reduce labelling to spot-checking.
What format are the suggestions in?
COCO JSON format — the canonical CV annotation schema. Each suggestion gets a confidence score so you can sort by uncertainty (lowest-confidence first, which is what active learning does). Suggestions are flagged as 'ai_generated' in the source field so downstream QA can audit them separately from human annotations.
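For example, a downstream QA script can split a COCO export on that source field (a minimal sketch, assuming the flag is preserved in the export):

import json

with open("export.json") as f:
    coco = json.load(f)

ai = [a for a in coco["annotations"] if a.get("source") == "ai_generated"]
print(f"{len(ai)} AI suggestions vs {len(coco['annotations']) - len(ai)} human annotations")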
Can I use my own models?
Yes, on Pro tier. Bring your own ONNX or PyTorch model to host alongside the built-in models, and the auto-label aggregator will include it in dispatch. Useful for domain-specific classes (e.g. a fine-tuned welding-defect detector) where the public models are off the mark.
How does this work with my annotation team?
Auto-labelling drops suggestions into your project; human annotators see them as pre-filled annotations that can be accepted, edited, or rejected. Combined with /v1/quality endpoints (inter-annotator agreement, class-balance alerts), you get a full assisted-annotation workflow with quality controls.
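As an illustration (the exact /v1/quality paths and parameters aren't documented here, so treat the endpoint below as hypothetical):

import os, requests

resp = requests.get(
    "https://api.msightflow.ai/v1/quality/agreement",  # hypothetical sub-path
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    params={"project_id": "PROJECT_ID"},
)
print(resp.json())  # assumed: per-class inter-annotator agreement scores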
One call. Eight task types. 5-10× faster.
300 free API calls / month. Auto-labelling across detect, segment, pose, classify, OCR.