Ensemble of 4 complementary detectors

Spot AI-generated images and deepfakes.

Verify image authenticity in one API call. mSightFlow runs four complementary detectors in parallel — ConvNeXt-Base, DIRE (diffusion-specialist), SBI (deepfake faces), and UniversalFakeDetect (CLIP probe). The ensemble catches what any single model misses.

Model
ConvNeXt + DIRE + SBI + UniversalFakeDetect (ensemble of 4)
Inputs
JPG/PNG ≤ 25 MB
Outputs
is_ai_generated · per-detector confidence · ensemble score
Latency
~600 ms p50 (4 in parallel)
Free quota
300 calls / month
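A successful call returns a JSON body shaped like the following. Values and the per-detector key names are illustrative; the top-level fields match the code samples below:

```json
{
  "is_ai_generated": true,
  "ensemble_confidence": 0.87,
  "detectors": {
    "convnext_base": 0.91,
    "dire": 0.83,
    "sbi": 0.62,
    "universal_fake_detect": 0.88
  }
}
```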

Single-model AI-image detectors all fail on the same kind of input: outputs from a generator they weren't trained on. The generator landscape moves faster than any individual detector can keep up with. mSightFlow's ensemble combines four detectors with different inductive biases — pixel statistics, diffusion reconstruction, face blending, and CLIP features — so an image that fools one usually trips at least one of the others.

The four detectors — complementary, not redundant

ConvNeXt-Base

General classifier. Fine-tuned ConvNeXt-Base discriminator trained on real-vs-AI pairs across multiple generators. The default workhorse.

DIRE

Diffusion-specialist. Diffusion Reconstruction Error — measures how easily a diffusion model can reconstruct the image. High score = likely diffusion output.

SBI

Deepfake-face detector. Self-Blended Images — purpose-built for face-swap / lip-sync deepfakes. Strongest on portrait and face-centred imagery.

UniversalFakeDetect

CLIP linear probe. A linear probe on top of CLIP ViT-L/14 features. Generalises to unseen generators that pixel-statistics-based detectors miss.
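The exact fusion rule is not documented here; as a mental model only, a simple ensemble can be sketched as a weighted mean of per-detector scores that is pulled toward the single most confident detector, so one strong specialist (e.g. SBI on a face swap) is not averaged away. The weights and the 0.9 override threshold below are hypothetical, not the real fusion rule:

```python
def ensemble_score(scores: dict) -> float:
    """Combine per-detector confidences into one score.

    Sketch only: a weighted mean, blended toward the peak score
    when any single detector is very confident, so a strong
    specialist signal survives averaging.
    """
    weights = {  # hypothetical weights, not mSightFlow's actual rule
        "convnext_base": 0.30,
        "dire": 0.25,
        "sbi": 0.20,
        "universal_fake_detect": 0.25,
    }
    mean = sum(weights[k] * scores[k] for k in weights)
    peak = max(scores.values())
    # If one detector is near-certain, move halfway toward its score.
    return 0.5 * (mean + peak) if peak > 0.9 else mean
```

With this shape, four lukewarm scores stay lukewarm, while one detector at 0.95 lifts an otherwise low mean into the review zone.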

When AI-detection is the right tool

Media & newsroom verification

Pre-publish authenticity check on user-submitted photos and stringer content. Pair with C2PA-provenance lookups when available.

Content moderation

UGC platforms screening for synthetic-imagery policy violations — particularly non-consensual deepfakes and AI-generated CSAM-adjacent content.

KYC & identity

Detect synthetic / morphed ID photos as one layer in a multi-signal identity verification stack. Combine with liveness + face matching.

Code — request and decide

Python
import os, requests

with open("suspicious.jpg", "rb") as f:
    resp = requests.post(
        "https://api.msightflow.ai/v1/ai-detect",
        headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
        files={"image": ("suspicious.jpg", f, "image/jpeg")},
        timeout=30,
    )
resp.raise_for_status()
data = resp.json()

print("verdict:", "AI-generated" if data["is_ai_generated"] else "authentic")
print(f"ensemble confidence: {data['ensemble_confidence']:.3f}")
for det, score in data["detectors"].items():
    print(f"  {det:>22}  {score:.3f}")
Node.js
import fetch from "node-fetch";
import FormData from "form-data";
import fs from "fs";

const form = new FormData();
form.append("image", fs.createReadStream("suspicious.jpg"));

const resp = await fetch("https://api.msightflow.ai/v1/ai-detect", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.MSF_API_KEY}` },
  body: form,
});
const r = await resp.json();
console.log(r.is_ai_generated ? "AI-generated" : "authentic", r.ensemble_confidence);
cURL
curl -X POST https://api.msightflow.ai/v1/ai-detect \
  -H "Authorization: Bearer $MSF_API_KEY" \
  -F "image=@suspicious.jpg"
Threshold policy
# Policy: flag for human review when ensemble confidence is in the grey zone.
import os, requests

API = "https://api.msightflow.ai/v1/ai-detect"
HEADERS = {"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"}

THRESHOLD_FLAG = 0.4
THRESHOLD_BLOCK = 0.75

def decide(image_path):
    with open(image_path, "rb") as f:
        r = requests.post(API, headers=HEADERS, files={"image": f}, timeout=30)
    r.raise_for_status()
    s = r.json()["ensemble_confidence"]
    if s >= THRESHOLD_BLOCK:
        return "block"
    if s >= THRESHOLD_FLAG:
        return "review"
    return "allow"

Honest limitations

  1. Novel generators degrade scores. A detector trained before a generator existed will under-perform on its outputs. The ensemble buys robustness — but isn't magic.
  2. Heavy compression / filters fool DIRE. JPEG re-encoding, social-platform filters, and aggressive denoising all attack diffusion fingerprints. UniversalFakeDetect and ConvNeXt-Base hold up better.
  3. Faces are the hard case. SBI is best-in-class for deepfake faces, but adversarially-aware deepfakes specifically attack face detectors. Combine with provenance and liveness.
  4. This is a probabilistic signal, not proof. Use ensemble_confidence as one input into a policy with thresholds + human review. We don't recommend a one-and-done block at any threshold.
  5. Adversarial attacks exist. Researchers have shown that small perturbations can fool individual detectors. The ensemble raises the cost but doesn't eliminate it.

Pricing — same as every other endpoint

Free

$0

  • 300 API calls / month
  • All 4 detectors
  • Ensemble + per-detector scores
  • No credit card
Start free

Pro

$29/mo

  • Unlimited calls
  • Higher per-provider quotas
Go Pro

FAQ

Why four detectors instead of one?

No single AI-image detector generalises to every image-generation model. ConvNeXt-Base is a strong general classifier; DIRE specialises in diffusion outputs (Stable Diffusion class); SBI is purpose-built for deepfake faces; UniversalFakeDetect is a CLIP linear probe that generalises to unseen generators. The ensemble combines complementary signals — a generator that fools one detector usually trips at least one of the others.

How accurate is it on the latest generators?

On in-distribution generators (Stable Diffusion, Midjourney, DALL-E 3) the ensemble reaches ~95% AUC. On novel generators released after the detectors' training cutoffs, accuracy degrades — typically 70-85% AUC in our internal tests. Treat the score as a strong signal, not a verdict; for high-stakes uses, combine with provenance signals (C2PA, EXIF, network metadata).

Does it work on cropped, recompressed, or filtered images?

Robustness varies by detector. DIRE is sensitive to compression. UniversalFakeDetect and ConvNeXt-Base are more robust to JPEG and resizing. The ensemble degrades gracefully — even when one detector is fooled, two or three others usually catch the artefact. Heavy retouching can reduce accuracy on faces (the SBI detector is most affected).
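One practical way to gauge this on your own traffic is to re-score an image after JPEG recompression and compare the two ensemble confidences. A minimal sketch using Pillow — `score_fn` stands in for any wrapper around the /v1/ai-detect call shown earlier, and the quality setting is just an example:

```python
import io

from PIL import Image


def recompress_jpeg(image_bytes: bytes, quality: int = 70) -> bytes:
    """Re-encode an image as JPEG at the given quality,
    simulating what a social platform does on upload."""
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    out = io.BytesIO()
    img.save(out, format="JPEG", quality=quality)
    return out.getvalue()


def robustness_delta(score_fn, image_bytes: bytes, quality: int = 70) -> float:
    """Score the original and a recompressed copy with the same
    scoring function and return how much confidence was lost."""
    return score_fn(image_bytes) - score_fn(recompress_jpeg(image_bytes, quality))
```

A large positive delta on your corpus suggests your inputs live in the regime where DIRE is weakest and the other detectors carry the ensemble.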

Can it detect deepfake video?

Not directly — we run on still images. For video, sample frames (e.g. every 30th) via /v1/video/upload and run /v1/ai-detect on each. SBI was originally trained on face deepfakes and gives the strongest video signal when faces are present in frame.

Is this enough for KYC / identity verification?

It's a strong layer but not a complete KYC stack. Combine with liveness detection (separate vertical), face matching, document authenticity checks, and provenance signals. The ai-detect endpoint specifically targets pixel-level synthesis artefacts — a different threat from impersonation via real-but-stolen photos.

Four detectors. One verdict.

300 free API calls / month. Ensemble defence against AI-generated imagery.