Grow your dataset 5-10×: labels stay correct, transforms stay reproducible.
Augment your labelled dataset on the server with Albumentations. Flip, rotate, jitter brightness, add noise, crop, blur — bounding boxes and polygons transform correctly. Export the augmented set in COCO or YOLO format with a train/val/test split.
- Engine: Albumentations (bbox-aware)
- Inputs: labelled dataset + pipeline config JSON
- Outputs: augmented dataset (COCO/YOLO) with transformed labels
- Speed: ~10k images / min
- Free quota: 50 exports / month
Smaller datasets need augmentation more than bigger ones, but every dataset benefits. Augmentation regularises the model against lighting, rotation, sensor noise, and viewpoint variation — improvements you'd otherwise have to collect by adding more real data. mSightFlow runs Albumentations on our GPUs so you can export an augmented dataset in minutes instead of writing your own offline-augmentation script.
The 8 transforms that matter most
- HorizontalFlip / VerticalFlip: mirror the image; bboxes flip with it.
- Rotate: rotate ±N degrees; bboxes re-bounded.
- RandomBrightnessContrast: per-image brightness and contrast jitter for lighting robustness.
- HueSaturationValue: per-pixel HSV jitter for colour invariance.
- RandomCrop / Pad: random window crop or zero-padding; bboxes clipped.
- GaussNoise / ShotNoise: additive noise to simulate sensor and low-light conditions.
- Grayscale: convert RGB → grayscale (probability-gated).
- Cutout / CoarseDropout (Pro): random rectangular masks; forces the model to use context rather than relying on a single discriminative region.
Pro tier exposes the full Albumentations library — ElasticTransform, GridDistortion, RandomRain, RandomShadow, and ~40 more.
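On Pro, the extra transforms slot into the same pipeline config. A sketch of what that could look like (parameter names follow Albumentations' own arguments; whether the server accepts each one verbatim is an assumption):

```python
# Hypothetical Pro pipeline config: parameter names mirror Albumentations'
# own arguments, but server-side acceptance of each is an assumption.
pro_pipeline = [
    {"type": "ElasticTransform", "alpha": 1.0, "sigma": 50, "p": 0.3},
    {"type": "GridDistortion", "num_steps": 5, "distort_limit": 0.3, "p": 0.3},
    {"type": "CoarseDropout", "max_holes": 8, "max_height": 32, "max_width": 32, "p": 0.5},
]
print(len(pro_pipeline))
```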
Code — pipeline + webhook
import os, requests

pipeline = [
    {"type": "HorizontalFlip", "p": 0.5},
    {"type": "Rotate", "limit": 15, "p": 0.7},
    {"type": "RandomBrightnessContrast", "brightness_limit": 0.2, "p": 0.6},
    {"type": "GaussNoise", "var_limit": [10, 50], "p": 0.3},
]

resp = requests.post(
    "https://api.msightflow.ai/v1/projects/PROJECT_ID/export-augmented",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    json={
        "pipeline": pipeline,
        "augmentations_per_image": 4,
        "format": "yolo",
        "split": {"train": 0.8, "val": 0.1, "test": 0.1},
    },
)
print(resp.json())
# → {"job_id": "...", "estimated_completion": "PT2M"}
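estimated_completion is an ISO-8601 duration (PT2M is about two minutes). If you'd rather poll on a timer than wait for the webhook, a small parser converts it to seconds; iso_duration_to_seconds is our helper name, not part of any SDK:

```python
import re

def iso_duration_to_seconds(d: str) -> int:
    """Parse simple time-only ISO-8601 durations such as 'PT2M' or 'PT90S'."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", d)
    if not m:
        raise ValueError(f"unsupported duration: {d}")
    h, mi, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mi * 60 + s

print(iso_duration_to_seconds("PT2M"))  # → 120
```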
# Webhook on completion — your endpoint receives a POST when the export is done.
# Configure the URL in your project settings, or pass it per-request:
resp = requests.post(
    "https://api.msightflow.ai/v1/projects/PROJECT_ID/export-augmented",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    json={
        "pipeline": pipeline,
        "augmentations_per_image": 4,
        "format": "coco",
        "webhook_url": "https://your-app.example.com/cv-augment-done",
    },
)
# Webhook payload:
# { "job_id": "...", "status": "done", "download_url": "https://...", "size_mb": 412.3 }
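A minimal sketch of the receiving side, assuming the payload shape shown above; framework routing is omitted and handle_augment_webhook is a name we made up for illustration:

```python
import json

def handle_augment_webhook(body: bytes):
    """Return the download URL once the export is done, else None.
    Field names match the example webhook payload documented above."""
    payload = json.loads(body)
    if payload.get("status") != "done":
        return None  # still running or failed; handle those states separately
    return payload["download_url"]

example = b'{"job_id": "a1", "status": "done", "download_url": "https://dl.example.com/set.zip", "size_mb": 412.3}'
print(handle_augment_webhook(example))  # → https://dl.example.com/set.zip
```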
Pricing — same as every other endpoint
Related features
Dataset export
Pair augmentation with COCO / YOLO / VOC export and train/val/test split in one workflow.
Auto-labelling
Label first, augment second. Auto-label gets the base set; augmentation grows it for training.
Annotation quality + IAA
Augmentation can amplify bad labels. Run quality checks before exporting to avoid garbage-in-garbage-out.
FAQ
Why augment on the server instead of in my training loop?
Two reasons. (1) Reproducibility — augmentations are fixed when you export, so the same dataset trains the same way across runs and machines. (2) Sharing — your collaborators get the augmented set without needing to install Albumentations or replicate your transform code. The trade-off is dataset size on disk; for very large augmented sets you'll want training-time augmentation instead.
Are bounding boxes / polygons transformed correctly?
Yes. Albumentations is bbox-aware: rotation rotates both image and labels, flip mirrors both, crop drops bboxes outside the crop window. Polygon transforms work the same way for segmentation masks. We verify bbox sanity (no zero-area boxes, no boxes outside image) and drop invalid ones with a warning.
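For intuition, this is the geometry behind a horizontal flip of a pixel-space [x_min, y_min, x_max, y_max] box; the helper is ours, written only to show the mapping:

```python
def hflip_bbox(bbox, img_w):
    """Mirror a pixel-space [x_min, y_min, x_max, y_max] box about the
    image's vertical centre line: new x_min = W - old x_max, and vice versa."""
    x_min, y_min, x_max, y_max = bbox
    return [img_w - x_max, y_min, img_w - x_min, y_max]

print(hflip_bbox([10, 20, 110, 220], img_w=640))  # → [530, 20, 630, 220]
```

Flipping twice recovers the original box, which is a quick sanity check for any bbox transform.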
Which transforms are supported?
The mSightFlow defaults cover the eight most-used: HorizontalFlip, VerticalFlip, Rotate, RandomBrightnessContrast, GaussNoise, RandomCrop, Grayscale, and HueSaturationValue. On Pro tier, the full Albumentations library is exposed via pipeline config — including geometric distortions (ElasticTransform, GridDistortion), weather sims (RandomRain, RandomShadow), and Cutout / CoarseDropout.
How many augmented copies should I generate?
Empirically 3-5× the original count is the sweet spot for most detection tasks. Below 2× you don't get the regularisation effect; above 10× you mostly burn disk and training time without accuracy gains. For small datasets (< 500 images), 8-10× helps more.
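The disk cost is simple arithmetic. With augmentations_per_image set to 4, a 500-image train split yields 2,000 augmented images, plus the 500 originals if they are kept in the export (whether originals are included is an assumption here, hence the flag):

```python
def export_size(n_train, augs_per_image, keep_originals=True):
    """Images in the exported train split. Whether originals ship alongside
    augmented copies is an assumption, so it's an explicit flag."""
    return n_train * augs_per_image + (n_train if keep_originals else 0)

print(export_size(500, 4))  # → 2500
```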
Can I split train/val/test before augmenting?
Yes — and you should. mSightFlow augments only the train split by default; val and test pass through unchanged. This prevents augmented copies of test images leaking into training and inflating apparent accuracy. Set split via the export request.
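The split-first order can be sketched in a few lines: partition image indices once, then augment only the train bucket. This mirrors the behaviour described above, not actual server code:

```python
import random

def split_indices(n_images, train=0.8, val=0.1, seed=0):
    """Split image indices once, deterministically. Only the 'train' bucket
    should later be augmented; 'val' and 'test' pass through untouched."""
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)  # fixed seed keeps the split reproducible
    n_tr = int(n_images * train)
    n_va = int(n_images * val)
    return {"train": idx[:n_tr], "val": idx[n_tr:n_tr + n_va], "test": idx[n_tr + n_va:]}

splits = split_indices(1000)
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))  # → 800 100 100
```

Because the buckets are disjoint before any augmentation happens, no augmented copy of a val or test image can end up in training.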
From 500 images to 5,000 — with correct labels.
50 free exports / month. Bbox-aware. COCO + YOLO. No setup.