Export to COCO, YOLO, or Pascal VOC in one call.
Take your labelled dataset to any trainer in any framework. Auto-generated dataset.yaml, deterministic train/val/test split, optional webhook on completion, and a DatasetVersion snapshot for reproducibility. No format-conversion scripts.
- Formats: COCO · YOLO · Pascal VOC
- Inputs: project_id · format · split · webhook URL
- Outputs: zipped dataset · dataset.yaml · webhook on completion
- Speed: ~5k images / min
- Free quota: 50 exports / month
Every CV team eventually writes its own COCO-to-YOLO converter, its own train/val/test splitter, its own dataset.yaml generator. Then they write the converter the other way. mSightFlow gives you all three formats out of one project state — pick by what your trainer needs, switch when you swap frameworks. Webhook + DatasetVersion means exports are CI-friendly and reproducible.
Three formats — pick by trainer
COCO JSON
- Detection + segmentation + keypoints in one file
- Universal — every modern trainer reads it
- RLE-encoded masks for instance segmentation
YOLO TXT
- One TXT per image with normalised bbox coords
- Auto-generated dataset.yaml ready for Ultralytics
- Polygon format for YOLOv8-seg
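As a rough illustration of what "normalised bbox coords" means, the sketch below converts one COCO-style box (top-left x, top-left y, width, height, in pixels) into a YOLO label line (class index, then centre x, centre y, width, height as fractions of the image size). The function name and the six-decimal formatting are assumptions made for this example, not the exporter's actual code.

```python
def coco_bbox_to_yolo_line(class_index, bbox, image_width, image_height):
    """Convert a COCO [x_min, y_min, width, height] pixel box to a YOLO label line."""
    x_min, y_min, box_w, box_h = bbox
    # YOLO stores the box centre and size as fractions of the image dimensions.
    x_center = (x_min + box_w / 2) / image_width
    y_center = (y_min + box_h / 2) / image_height
    norm_w = box_w / image_width
    norm_h = box_h / image_height
    return f"{class_index} {x_center:.6f} {y_center:.6f} {norm_w:.6f} {norm_h:.6f}"


# A 200x100 px box whose top-left corner is at (100, 50) in an 800x400 px image.
line = coco_bbox_to_yolo_line(1, [100, 50, 200, 100], image_width=800, image_height=400)
print(line)  # 1 0.250000 0.250000 0.250000 0.250000
```

Each image's TXT file would hold one such line per annotated object.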
Pascal VOC XML
- One XML per image (annotation_file)
- Legacy support for older trainers
- Detection only (no native mask format)
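For readers who have not seen the Pascal VOC layout, the sketch below builds one annotation document with Python's standard library. It follows the commonly used VOC structure (filename, size, one object element per box with a bndbox in pixel corners). The helper name, the sample values, and the fixed depth of 3 are assumptions for illustration. The exported files may carry additional fields.

```python
import xml.etree.ElementTree as ET


def build_voc_annotation(filename, width, height, objects):
    """Build a Pascal VOC <annotation> element for one image.

    objects: list of (class_name, xmin, ymin, xmax, ymax) tuples in pixels.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumption: RGB images
    for name, xmin, ymin, xmax, ymax in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        box = ET.SubElement(obj, "bndbox")
        # VOC boxes are absolute pixel corners, not normalised values.
        for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(value)
    return root


annotation = build_voc_annotation("site_001.jpg", 800, 400, [("hard_hat", 100, 50, 300, 150)])
xml_text = ET.tostring(annotation, encoding="unicode")
print(xml_text)
```

The structure has a slot for boxes only, which is why this format is listed as detection only.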
Code — export, cURL, webhook
import os
import requests

# Synchronous GET: the response body is the dataset ZIP.
# split syntax is "train/val/test" as integers, e.g. "80/10/10".
resp = requests.get(
    "https://api.msightflow.ai/v1/projects/PROJECT_ID/export",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    params={"format": "yolo", "split": "80/10/10"},
    stream=True,
)
resp.raise_for_status()

# Stream to disk in 1 MiB chunks and count bytes as they are written;
# resp.content is not available once a streamed body has been consumed.
bytes_written = 0
with open("dataset.zip", "wb") as f:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        bytes_written += f.write(chunk)
print("saved dataset.zip ·", bytes_written, "bytes")
# Same call as a one-liner — pipes the ZIP directly to disk.
curl -L \
  -H "Authorization: Bearer $MSF_API_KEY" \
  -o dataset.zip \
  "https://api.msightflow.ai/v1/projects/PROJECT_ID/export?format=yolo&split=80/10/10"
# Each export also fires a project-level "project.exported" webhook if the
# project has a webhook URL configured. Configure it once in project settings
# (or via the projects API); every subsequent export delivers a POST to the URL.
# Webhook payload (POST to your URL):
# {
# "event": "project.exported",
# "project_id": "PROJECT_ID",
# "format": "yolo",
# "image_count": 4231,
# "version_id": "v3",
# "user": "you@example.com"
# }
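On the receiving side, a CI job usually only needs a few fields from that payload. The sketch below parses a request body shaped like the example above and returns those fields. The function name is hypothetical. It does no signature or authenticity checking because the payload format shown here does not describe any, so treat it as a starting point only.

```python
import json


def parse_export_webhook(raw_body):
    """Parse a "project.exported" webhook body and return the fields a CI job needs."""
    payload = json.loads(raw_body)
    if payload.get("event") != "project.exported":
        # Ignore any other event type delivered to the same URL.
        return None
    return {
        "project_id": payload["project_id"],
        "format": payload["format"],
        "version_id": payload["version_id"],
        "image_count": payload["image_count"],
    }


# Sample body mirroring the payload documented above.
sample_body = json.dumps({
    "event": "project.exported",
    "project_id": "PROJECT_ID",
    "format": "yolo",
    "image_count": 4231,
    "version_id": "v3",
    "user": "you@example.com",
})
export_info = parse_export_webhook(sample_body)
print(export_info["version_id"], export_info["image_count"])  # v3 4231
```

Recording version_id alongside the training run ties a trained model back to the exact export it used.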
# Example dataset.yaml generated for YOLO format
path: ./
train: images/train
val: images/val
test: images/test

# Classes
names:
  0: person
  1: hard_hat
  2: safety_vest

# Metadata
exported_at: '2026-05-14T13:42:00Z'
source: mSightFlow
project_id: 'PROJECT_ID'
version: 'v3'
Pricing — same as every other endpoint
Related features
Data augmentation
Pair export with augmentation in a single request to ship a training-ready augmented dataset.
Auto-labelling
The labelling layer feeds the export. Auto-label, verify, export — the canonical mSightFlow workflow.
Annotation quality
Run IAA + class-balance checks before exporting. Filter unverified or low-agreement annotations.
FAQ
Which format should I use?
COCO if you have segmentation masks or keypoints — it's the only format that natively supports all annotation types. YOLO if you're training a YOLO-family model and want fastest data loading. Pascal VOC if your downstream tool (legacy Detectron, MMDetection) expects it. For most projects, default to COCO — every modern training pipeline can read it.
Are train/val/test splits reproducible?
Yes. Splits use a deterministic hash of image_id seeded by the export request, so the same project + same split ratios always produce the same files. Add a seed parameter to vary the split for cross-validation experiments without changing the underlying ratios.
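To make the idea of a hash-based split concrete, here is a minimal sketch. The choice of SHA-256, the way the seed is mixed in, and the function name are all assumptions for illustration. The service's actual hashing scheme is not documented here.

```python
import hashlib


def assign_split(image_id, ratios=(80, 10, 10), seed=0):
    """Map an image ID to "train", "val", or "test" using a stable hash."""
    digest = hashlib.sha256(f"{seed}:{image_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and reduce it to a bucket in [0, sum(ratios)).
    bucket = int.from_bytes(digest[:8], "big") % sum(ratios)
    train_end = ratios[0]
    val_end = ratios[0] + ratios[1]
    if bucket < train_end:
        return "train"
    if bucket < val_end:
        return "val"
    return "test"


image_ids = [f"img_{i:05d}" for i in range(1000)]
first_run = {image_id: assign_split(image_id) for image_id in image_ids}
second_run = {image_id: assign_split(image_id) for image_id in image_ids}
print(first_run == second_run)  # True: the assignment depends only on the inputs
```

Because each image is assigned independently, adding new images never moves existing ones between splits. The trade-off in this sketch is that the realised proportions are close to the requested ratios but not exact.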
What does dataset.yaml contain?
For YOLO format, the YAML has train/val/test paths (relative), class names (index → name), and number of classes. It drops straight into Ultralytics' train command: `yolo detect train data=dataset.yaml model=yolov8m.pt`. We also generate a README.md with class distribution + export timestamp.
Can I include only verified annotations?
Yes — pass include_unverified=false to filter out annotations flagged source=ai_generated that haven't been human-confirmed. Combined with the active-learning workflow, you can ship a high-quality export even from a mostly-auto-labelled project.
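The options mentioned in this FAQ can be combined in one request. The sketch below only builds the URL. It assumes that seed and include_unverified are passed as query parameters on the same export endpoint as format and split, which is not spelled out above, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.msightflow.ai/v1/projects/{project_id}/export"


def build_export_url(project_id, export_format="coco", split="80/10/10",
                     include_unverified=False, seed=None):
    """Assemble an export URL; parameter placement is an assumption (see lead-in)."""
    params = {
        "format": export_format,
        "split": split,
        # Lower-case "true"/"false" to match the include_unverified=false spelling.
        "include_unverified": str(include_unverified).lower(),
    }
    if seed is not None:
        params["seed"] = seed
    # safe="/" keeps the split ratios readable instead of percent-encoding the slashes.
    return BASE_URL.format(project_id=project_id) + "?" + urlencode(params, safe="/")


url = build_export_url("PROJECT_ID")
print(url)
```

The resulting URL can be passed to the same requests.get or curl call used for a plain export.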
Are exports versioned?
Each export creates a DatasetVersion snapshot — the project state at export time is preserved so you can re-download the same dataset later even if the project changes. Versions are listed in the project settings and can be tagged (v1.0, v1.1-augmented).
From project to dataset.zip, one call.
50 free exports / month. COCO, YOLO, Pascal VOC. Webhook + version snapshots.