Spot AI-generated images and deepfakes.
Verify image authenticity in one API call. mSightFlow runs four complementary detectors in parallel — ConvNeXt-Base, DIRE (diffusion-specialist), SBI (deepfake faces), and UniversalFakeDetect (CLIP probe). The ensemble catches what any single model misses.
- Model: ConvNeXt + DIRE + SBI + UniversalFakeDetect (ensemble of 4)
- Inputs: JPG/PNG ≤ 25 MB
- Outputs: is_ai_generated · per-detector confidence · ensemble score
- Latency: ~600 ms p50 (4 detectors in parallel)
- Free quota: 300 calls / month
Single-model AI-image detectors all fail on the same kind of input: outputs from a generator they weren't trained on. The generator landscape moves faster than any individual detector can keep up with. mSightFlow's ensemble combines four detectors with different inductive biases — pixel statistics, diffusion reconstruction, face blending, and CLIP features — so an image that fools one usually trips at least one of the others.
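The principle can be made concrete with a small sketch. This is hypothetical: the real ensemble weighting is internal to the API. The idea is to blend the mean of all detector scores with the single strongest one, so one confident specialist can raise the ensemble score on an image that slips past the other three.

```python
def fuse_scores(scores, weights=None):
    """Blend a weighted mean with the strongest single detector.
    Illustrative only; not mSightFlow's actual fusion rule."""
    weights = weights or {name: 1.0 for name in scores}
    mean = sum(scores[n] * weights[n] for n in scores) / sum(weights[n] for n in scores)
    # Let one confident specialist pull the ensemble score up:
    return max(mean, 0.5 * mean + 0.5 * max(scores.values()))

# A diffusion image that slips past three detectors but trips DIRE:
scores = {"convnext": 0.2, "dire": 0.9, "sbi": 0.1, "universal_fake_detect": 0.3}
fused = fuse_scores(scores)  # well above the plain mean of 0.375
```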
The four detectors — complementary, not redundant
ConvNeXt-Base
General classifier. Fine-tuned ConvNeXt-Base discriminator trained on real-vs-AI pairs across multiple generators. The default workhorse.
DIRE
Diffusion-specialist. Diffusion Reconstruction Error — measures how easily a diffusion model can reconstruct the image. High score = likely diffusion output.
SBI
Deepfake-face detector. Self-Blended Images — purpose-built for face-swap / lip-sync deepfakes. Strongest on portrait and face-centred imagery.
UniversalFakeDetect
CLIP linear probe. Linear probe on top of CLIP ViT-L/14 features. Generalises to unseen generators where pixel-statistics-based detectors miss.
When AI-detection is the right tool
Media & newsroom verification
Pre-publish authenticity check on user-submitted photos and stringer content. Pair with C2PA-provenance lookups when available.
Content moderation
UGC platforms screening for synthetic-imagery policy violations — particularly non-consensual deepfakes and AI-generated CSAM-adjacent content.
KYC & identity
Detect synthetic / morphed ID photos as one layer in a multi-signal identity verification stack. Combine with liveness + face matching.
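A minimal sketch of such a layered identity policy, where the synthesis score is one gate among several. All names and thresholds here are illustrative assumptions, not product recommendations:

```python
def kyc_identity_check(ai_score, liveness_passed, face_match_score):
    """Layered KYC decision: synthetic-image score is one signal of several.
    Thresholds are hypothetical, for illustration only."""
    if ai_score >= 0.75:
        return "reject"          # likely synthetic or morphed ID photo
    if not liveness_passed:
        return "reject"          # fails the liveness layer outright
    if face_match_score < 0.8 or ai_score >= 0.4:
        return "manual_review"   # weak face match, or grey-zone synthesis score
    return "approve"
```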
Code — request and decide
import os, requests
from pathlib import Path
resp = requests.post(
"https://api.msightflow.ai/v1/ai-detect",
headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"image": ("suspicious.jpg", Path("suspicious.jpg").read_bytes(), "image/jpeg")},
).json()
print("verdict:", "AI-generated" if resp["is_ai_generated"] else "authentic")
print(f"ensemble confidence: {resp['ensemble_confidence']:.3f}")
for det, score in resp["detectors"].items():
print(f" {det:>22} {score:.3f}")
import fetch from "node-fetch";
import FormData from "form-data";
import fs from "fs";
const form = new FormData();
form.append("image", fs.createReadStream("suspicious.jpg"));
const resp = await fetch("https://api.msightflow.ai/v1/ai-detect", {
method: "POST",
  headers: { ...form.getHeaders(), Authorization: `Bearer ${process.env.MSF_API_KEY}` },
body: form,
});
const r = await resp.json();
console.log(r.is_ai_generated ? "AI-generated" : "authentic", r.ensemble_confidence);
curl -X POST https://api.msightflow.ai/v1/ai-detect \
-H "Authorization: Bearer $MSF_API_KEY" \
-F "image=@suspicious.jpg"
# Policy: flag for human review when ensemble confidence is in the grey zone.
import os, requests

API = "https://api.msightflow.ai/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"}

THRESHOLD_FLAG = 0.4
THRESHOLD_BLOCK = 0.75

def decide(image_path):
    with open(image_path, "rb") as f:
        r = requests.post(API + "/ai-detect", headers=HEADERS,
                          files={"image": f}).json()
    s = r["ensemble_confidence"]
    if s >= THRESHOLD_BLOCK:
        return "block"
    if s >= THRESHOLD_FLAG:
        return "review"
    return "allow"
Honest limitations
- Novel generators degrade scores. A detector trained before a generator existed will under-perform on its outputs. The ensemble buys robustness — but isn't magic.
- Heavy compression / filters fool DIRE. JPEG re-encoding, social-platform filters, and aggressive denoising all attack diffusion fingerprints. UniversalFakeDetect and ConvNeXt-Base hold up better.
- Faces are the hard case. SBI is best-in-class for deepfake faces, but adversarially-aware deepfakes specifically attack face detectors. Combine with provenance and liveness.
- This is a probabilistic signal, not proof. Use ensemble_confidence as one input into a policy with thresholds + human review. We don't recommend a one-and-done block at any threshold.
- Adversarial attacks exist. Researchers have shown that small perturbations can fool individual detectors. The ensemble raises the cost but doesn't eliminate it.
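One practical mitigation the limitations above suggest: treat sharp disagreement between the per-detector scores as its own review signal, since a wide spread often means the input sits in one model's blind spot. A minimal sketch (threshold is an assumption, not a recommendation):

```python
def needs_review(detectors, spread=0.5):
    """Flag for human review when detectors disagree sharply.
    A large max-min spread suggests the input is near one model's blind spot."""
    vals = list(detectors.values())
    return (max(vals) - min(vals)) >= spread

# Three detectors say "real" but DIRE strongly disagrees: route to a human.
needs_review({"convnext": 0.15, "dire": 0.88, "sbi": 0.10,
              "universal_fake_detect": 0.20})  # True
```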
Pricing — same as every other endpoint
Free
$0
- 300 API calls / month
- All 4 detectors
- Ensemble + per-detector scores
- No credit card
Related features
CLIP image search
Match against a database of known fakes / reference images. Pair with detection for provenance + similarity defence.
Captioning + VQA
Use VQA to ask “does this look AI-generated?”. Weaker than the specialised ensemble but useful for explanations.
Object detection
For face-deepfake workflows: detect faces, then run /ai_detection on each face crop for per-face authenticity scores.
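That per-face workflow can be sketched as follows. The face boxes are assumed to come from the object-detection endpoint in (x, y, w, h) pixel format (an assumption about its response shape), and the detection call mirrors the examples above:

```python
import io
import os
import requests

API = "https://api.msightflow.ai/v1"
HEADERS = {"Authorization": f"Bearer {os.environ.get('MSF_API_KEY', '')}"}

def expand_box(box, margin, img_w, img_h):
    """Pad an (x, y, w, h) face box by margin x its size, clamped to the image,
    so the authenticity model sees some context around the face."""
    x, y, w, h = box
    dx, dy = int(w * margin), int(h * margin)
    return (max(0, x - dx), max(0, y - dy),
            min(img_w, x + w + dx), min(img_h, y + h + dy))

def score_faces(image_path, boxes):
    """Crop each detected face and score the crop; returns [(box, score), ...]."""
    from PIL import Image  # local import: the helper above stays dependency-free
    img = Image.open(image_path)
    results = []
    for box in boxes:
        crop = img.crop(expand_box(box, 0.3, img.width, img.height))
        buf = io.BytesIO()
        crop.convert("RGB").save(buf, format="JPEG")
        r = requests.post(API + "/ai-detect", headers=HEADERS,
                          files={"image": ("face.jpg", buf.getvalue(), "image/jpeg")}).json()
        results.append((box, r["ensemble_confidence"]))
    return results
```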
FAQ
Why four detectors instead of one?
No single AI-image detector generalises to every image-generation model. ConvNeXt-Base is a strong general classifier; DIRE specialises in diffusion outputs (Stable Diffusion class); SBI is purpose-built for deepfake faces; UniversalFakeDetect is a CLIP linear probe that generalises to unseen generators. The ensemble combines complementary signals — an image that fools one detector usually trips at least one of the others.
How accurate is it on the latest generators?
On in-distribution generators (Stable Diffusion, Midjourney, DALL-E 3) the ensemble reaches ~0.95 AUC. On novel generators released after the detectors' training cutoff, accuracy degrades — typically 0.70–0.85 AUC in our internal tests. Treat the score as a strong signal, not a verdict; for high-stakes uses, combine with provenance signals (C2PA, EXIF, network metadata).
Does it work on cropped, recompressed, or filtered images?
Robustness varies by detector. DIRE is sensitive to compression. UniversalFakeDetect and ConvNeXt-Base are more robust to JPEG and resizing. The ensemble degrades gracefully — even when one detector is fooled, two or three others usually catch the artefact. Heavy retouching can reduce accuracy on faces (the SBI detector is most affected).
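One cheap robustness probe, assuming only the documented ensemble score: score the image, re-encode it at lower JPEG quality (Pillow's `Image.save` with a `quality` parameter), score it again, and compare. A large drop suggests the verdict leaned on fragile high-frequency fingerprints:

```python
import io

def reencode_jpeg(image_bytes, quality=60):
    """Simulate a social-platform re-encode (Pillow import kept local)."""
    from PIL import Image
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()

def score_drop(original_score, reencoded_score):
    """Fraction of ensemble confidence lost after re-encoding."""
    return max(0.0, original_score - reencoded_score) / max(original_score, 1e-9)
```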
Can it detect deepfake video?
Not directly — we run on still images. For video, sample frames (e.g. every 30th) via /v1/video/upload and run /ai_detection on each. SBI was originally trained on face deepfakes and gives the strongest video signal when faces are present in frame.
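The frame-sampling loop described above can be sketched like this. Frame extraction itself would use something like OpenCV's `cv2.VideoCapture`; the helpers below cover only the sampling and aggregation logic, with the threshold as an illustrative assumption:

```python
def sample_indices(total_frames, stride=30):
    """Indices of frames to score: every stride-th frame, starting at frame 0."""
    return list(range(0, total_frames, stride))

def clip_verdict(frame_scores, block=0.75):
    """One high-confidence frame is enough to flag the whole clip."""
    return max(frame_scores, default=0.0) >= block
```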
Is this enough for KYC / identity verification?
It's a strong layer but not a complete KYC stack. Combine with liveness detection (separate vertical), face matching, document authenticity checks, and provenance signals. The ai_detection endpoint specifically targets pixel-level synthesis artefacts — different threat than impersonation via real-but-stolen photos.
Four detectors. One verdict.
300 free API calls / month. Ensemble defence against AI-generated imagery.