Grow your dataset 5-10×: labels stay correct, transforms stay reproducible.
Augment your labelled dataset on the server with Albumentations. Flip, rotate, jitter brightness, add noise, crop, blur — bounding boxes and polygons transform correctly. Export the augmented set in COCO or YOLO format with a train/val/test split.
- Engine: Albumentations (bbox-aware)
- Inputs: labelled dataset + pipeline config JSON
- Outputs: augmented dataset (COCO/YOLO) with transformed labels
- Speed: ~10k images / min
- Free quota: 50 exports / month
Smaller datasets need augmentation more than bigger ones, but every dataset benefits. Augmentation regularises the model against lighting, rotation, sensor noise, and viewpoint variation — improvements you'd otherwise have to collect by adding more real data. mSightFlow runs Albumentations on our GPUs so you can export an augmented dataset in minutes instead of writing your own offline-augmentation script.
The 8 transforms that matter most
- HorizontalFlip / VerticalFlip: mirror the image; bboxes flip with it.
- Rotate: rotate ±N degrees; bboxes re-bounded.
- RandomBrightnessContrast: per-image brightness and contrast jitter for lighting robustness.
- HueSaturationValue: per-pixel HSV jitter for colour invariance.
- RandomCrop / Pad: random window crop or zero-padding; bboxes clipped.
- GaussNoise / ShotNoise: additive noise to simulate sensor and low-light conditions.
- Grayscale: convert RGB → grayscale (probability-gated).
- Cutout / CoarseDropout (Pro): random rectangular masks; forces the model to use context rather than relying on a single discriminative region.
Pro tier exposes the full Albumentations library — ElasticTransform, GridDistortion, RandomRain, RandomShadow, and ~40 more.
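On Pro, the extra transforms slot into the same pipeline config. A sketch of what that could look like (parameter names follow Albumentations' own arguments; whether the server accepts each one verbatim is an assumption):

```python
# Hypothetical Pro pipeline config: parameter names mirror Albumentations'
# own arguments, but server-side acceptance of each is an assumption.
pro_pipeline = [
    {"type": "ElasticTransform", "alpha": 1.0, "sigma": 50, "p": 0.3},
    {"type": "GridDistortion", "num_steps": 5, "distort_limit": 0.3, "p": 0.3},
    {"type": "CoarseDropout", "max_holes": 8, "max_height": 32, "max_width": 32, "p": 0.5},
]
print(len(pro_pipeline))
```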
Code — pipeline + webhook
import os, requests

pipeline = [
    {"type": "HorizontalFlip", "p": 0.5},
    {"type": "Rotate", "limit": 15, "p": 0.7},
    {"type": "RandomBrightnessContrast", "brightness_limit": 0.2, "p": 0.6},
    {"type": "GaussNoise", "var_limit": [10, 50], "p": 0.3},
]

resp = requests.post(
    "https://api.msightflow.ai/v1/projects/PROJECT_ID/export-augmented",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    json={
        "pipeline": pipeline,
        "augmentations_per_image": 4,
        "format": "yolo",
        "split": {"train": 0.8, "val": 0.1, "test": 0.1},
    },
)
print(resp.json())
# → {"job_id": "...", "estimated_completion": "PT2M"}
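estimated_completion is an ISO-8601 duration (PT2M is about two minutes). If you'd rather poll on a timer than wait for the webhook, a small parser converts it to seconds; iso_duration_to_seconds is our helper name, not part of any SDK:

```python
import re

def iso_duration_to_seconds(d: str) -> int:
    """Parse simple time-only ISO-8601 durations such as 'PT2M' or 'PT90S'."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", d)
    if not m:
        raise ValueError(f"unsupported duration: {d}")
    h, mi, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mi * 60 + s

print(iso_duration_to_seconds("PT2M"))  # → 120
```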
# Webhook on completion — your endpoint receives a POST when the export is done.
# Configure the URL in your project settings, or pass it per-request:
resp = requests.post(
    "https://api.msightflow.ai/v1/projects/PROJECT_ID/export-augmented",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    json={
        "pipeline": pipeline,
        "augmentations_per_image": 4,
        "format": "coco",
        "webhook_url": "https://your-app.example.com/cv-augment-done",
    },
)
# Webhook payload:
# { "job_id": "...", "status": "done", "download_url": "https://...", "size_mb": 412.3 }
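A minimal sketch of the receiving side, assuming the payload shape shown above; framework routing is omitted and handle_augment_webhook is a name we made up for illustration:

```python
import json

def handle_augment_webhook(body: bytes):
    """Return the download URL once the export is done, else None.
    Field names match the example webhook payload documented above."""
    payload = json.loads(body)
    if payload.get("status") != "done":
        return None  # still running or failed; handle those states separately
    return payload["download_url"]

example = b'{"job_id": "a1", "status": "done", "download_url": "https://dl.example.com/set.zip", "size_mb": 412.3}'
print(handle_augment_webhook(example))  # → https://dl.example.com/set.zip
```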
Pricing — same as every other endpoint
Related features
Dataset export
Pair augmentation with COCO / YOLO / VOC export and train/val/test split in one workflow.
Auto-labelling
Label first, augment second. Auto-label gets the base set; augmentation grows it for training.
Annotation quality + IAA
Augmentation can amplify bad labels. Run quality checks before exporting to avoid garbage-in-garbage-out.
FAQ
Why augment on the server instead of in my training loop?
Two reasons. (1) Reproducibility — augmentations are fixed when you export, so the same dataset trains the same way across runs and machines. (2) Sharing — your collaborators get the augmented set without needing to install Albumentations or replicate your transform code. The trade-off is dataset size on disk; for very large augmented sets you'll want training-time augmentation instead.
Are bounding boxes / polygons transformed correctly?
Yes. Albumentations is bbox-aware: rotation rotates both image and labels, flip mirrors both, crop drops bboxes outside the crop window. Polygon transforms work the same way for segmentation masks. We verify bbox sanity (no zero-area boxes, no boxes outside image) and drop invalid ones with a warning.
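For intuition, this is the geometry behind a horizontal flip of a pixel-space [x_min, y_min, x_max, y_max] box; the helper is ours, written only to show the mapping:

```python
def hflip_bbox(bbox, img_w):
    """Mirror a pixel-space [x_min, y_min, x_max, y_max] box about the
    image's vertical centre line: new x_min = W - old x_max, and vice versa."""
    x_min, y_min, x_max, y_max = bbox
    return [img_w - x_max, y_min, img_w - x_min, y_max]

print(hflip_bbox([10, 20, 110, 220], img_w=640))  # → [530, 20, 630, 220]
```

Flipping twice recovers the original box, which is a quick sanity check for any bbox transform.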
Which transforms are supported?
The mSightFlow defaults cover the eight most-used: HorizontalFlip, VerticalFlip, Rotate, RandomBrightnessContrast, GaussNoise, RandomCrop, Grayscale, and HueSaturationValue. On Pro tier, the full Albumentations library is exposed via pipeline config — including geometric distortions (ElasticTransform, GridDistortion), weather sims (RandomRain, RandomShadow), and Cutout / CoarseDropout.
How many augmented copies should I generate?
Empirically 3-5× the original count is the sweet spot for most detection tasks. Below 2× you don't get the regularisation effect; above 10× you mostly burn disk and training time without accuracy gains. For small datasets (< 500 images), 8-10× helps more.
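The disk cost is simple arithmetic. With augmentations_per_image set to 4, a 500-image train split yields 2,000 augmented images, plus the 500 originals if they are kept in the export (whether originals are included is an assumption here, hence the flag):

```python
def export_size(n_train, augs_per_image, keep_originals=True):
    """Images in the exported train split. Whether originals ship alongside
    augmented copies is an assumption, so it's an explicit flag."""
    return n_train * augs_per_image + (n_train if keep_originals else 0)

print(export_size(500, 4))  # → 2500
```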
Can I split train/val/test before augmenting?
Yes — and you should. mSightFlow augments only the train split by default; val and test pass through unchanged. This prevents augmented copies of test images leaking into training and inflating apparent accuracy. Set split via the export request.
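The split-first order can be sketched in a few lines: partition image indices once, then augment only the train bucket. This mirrors the behaviour described above, not actual server code:

```python
import random

def split_indices(n_images, train=0.8, val=0.1, seed=0):
    """Split image indices once, deterministically. Only the 'train' bucket
    should later be augmented; 'val' and 'test' pass through untouched."""
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)  # fixed seed keeps the split reproducible
    n_tr = int(n_images * train)
    n_va = int(n_images * val)
    return {"train": idx[:n_tr], "val": idx[n_tr:n_tr + n_va], "test": idx[n_tr + n_va:]}

splits = split_indices(1000)
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))  # → 800 100 100
```

Because the buckets are disjoint before any augmentation happens, no augmented copy of a val or test image can end up in training.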
From 500 images to 5,000 — with correct labels.
50 free exports / month. Bbox-aware. COCO + YOLO. No setup.