Extract text from any image: Latin, CJK, Cyrillic, and mixed scripts.
Get text out of images in one API call. mSightFlow runs EasyOCR by default (fast, 80+ scripts), with one-flag fallback to GPT-4o or Claude Vision when the layout is harder than clean printed text — handwriting, tables, low contrast.
- Model: EasyOCR + optional GPT-4o / Claude Vision fallback
- Inputs: JPG/PNG ≤ 25 MB
- Outputs: text regions · bboxes · confidence · full string
- Latency: ~250 ms (EasyOCR) / ~3 s (cloud)
- Free quota: 300 calls / month
Two engines under one endpoint. EasyOCR runs locally on our GPUs at ~250 ms per image and handles 80+ scripts — perfect for receipts, barcodes, license plates, signage, and the bulk of clean-print tasks. When EasyOCR struggles (handwriting, complex tables, low-contrast scans), set provider=cloud and the call reroutes to GPT-4o or Claude Vision with structured output support.
When OCR is the right tool
Receipt & invoice parsing
Expense tracking, account reconciliation, tax-prep tools. Pair with VQA for structured field extraction.
ID document extraction
KYC, age verification, onboarding flows. EasyOCR for the text; pair with face detection / liveness for full KYC.
License plates & signs
ANPR pipelines, signage digitisation, whiteboard capture, sticky-note-to-text. Combine with detection for plate cropping.
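The detect-then-OCR pairing usually works better if you pad the detector's box before cropping, so the OCR crop keeps a little context around the plate or sign. A minimal sketch of that helper (the `(x1, y1, x2, y2)` box format and the 10% padding value are assumptions for illustration, not part of the API):

```python
def crop_box(box, img_w, img_h, pad=0.1):
    """Expand a detection box by `pad` (fraction of box size) and clamp
    it to the image bounds. `box` is (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * pad
    dy = (y2 - y1) * pad
    return (
        max(0, int(x1 - dx)),
        max(0, int(y1 - dy)),
        min(img_w, int(x2 + dx)),
        min(img_h, int(y2 + dy)),
    )

# A plate box near the image edge: the padding is clamped on the left.
print(crop_box((10, 20, 110, 60), img_w=640, img_h=480))  # → (0, 16, 120, 64)
```

Crop the image to the returned tuple, then POST the crop to `/v1/ocr` as in the code samples below.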
EasyOCR vs cloud-LLM fallback
| | EasyOCR (default) | Cloud LLM (GPT-4o / Claude) |
|---|---|---|
| Latency | ~250 ms | ~2-4 s |
| Cost (per call) | 1 quota unit | 1 quota unit + provider quota |
| Handwriting | Poor | Excellent |
| Complex tables / forms | Mixed | Strong (with output_schema) |
| Multi-language | 80+ scripts, must specify | Auto-detect, 100+ |
| Structured output | No | Yes (JSON schema) |
| Best for | Clean printed text, batch | Hard layouts, structured extraction |
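A common pattern the table suggests: try EasyOCR first, and retry with `provider=cloud` only when the per-region confidences come back low. A minimal sketch of the decision logic (the 0.6 threshold is an assumption; the region dicts mirror the `confidence` field in the OCR response):

```python
def needs_cloud_fallback(regions, threshold=0.6):
    """Decide whether to retry with provider=cloud, based on the
    per-region confidences EasyOCR returns. An empty result always
    triggers the fallback."""
    if not regions:
        return True
    mean_conf = sum(r["confidence"] for r in regions) / len(regions)
    return mean_conf < threshold

clean = [{"confidence": 0.95, "text": "TOTAL"}, {"confidence": 0.91, "text": "$12.40"}]
messy = [{"confidence": 0.41, "text": "T0T~L"}]
print(needs_cloud_fallback(clean))  # → False
print(needs_cloud_fallback(messy))  # → True
```

This keeps the fast, free path as the default and spends cloud quota only on the hard images.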
Code — Python, Node, cURL
```python
import os, requests
from pathlib import Path

resp = requests.post(
    "https://api.msightflow.ai/v1/ocr",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"image": Path("receipt.jpg").read_bytes()},
    data={"language": "en", "provider": "easyocr"},
)
result = resp.json()

print(result["full_text"])
for r in result["regions"]:
    print(f"  {r['confidence']:.2f}  {r['box']}  {r['text']}")

# Hard layout? Fall back to GPT-4o for structured output.
resp = requests.post(
    "https://api.msightflow.ai/v1/ocr",
    headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
    files={"image": Path("invoice.jpg").read_bytes()},
    data={
        "provider": "cloud",
        "model": "gpt-4o",
        "output_schema": '{"invoice_number": "string", "total": "number", "date": "string"}',
    },
)
print(resp.json()["structured"])
# → {"invoice_number": "INV-00231", "total": 423.50, "date": "2026-04-12"}
```
```javascript
import fetch from "node-fetch";
import FormData from "form-data";
import fs from "fs";

const form = new FormData();
form.append("image", fs.createReadStream("receipt.jpg"));
form.append("language", "en");

const resp = await fetch("https://api.msightflow.ai/v1/ocr", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.MSF_API_KEY}` },
  body: form,
});
const { full_text } = await resp.json();
console.log(full_text);
```
```bash
curl -X POST https://api.msightflow.ai/v1/ocr \
  -H "Authorization: Bearer $MSF_API_KEY" \
  -F "image=@receipt.jpg" \
  -F "language=en"
```
Pricing — same as every other endpoint
Related features
Captioning + VQA
Ask “what's the total on this receipt?” — structured answers, no OCR post-processing.
Object detection
Detect license plates, then OCR them. Detect documents, then OCR them. The two are paired.
Zero-shot detection
Find “a price tag”, “a barcode”, “a serial number”, then OCR the crop. No training.
FAQ
EasyOCR vs cloud-LLM fallback — when does which win?
EasyOCR is fast (~250 ms), free-tier, and accurate on clean printed text in 80+ languages. Cloud-LLM fallback (GPT-4o / Claude Vision) is slower (~2-4 s) and pricier but handles handwriting, weird layouts (tables, multi-column receipts), low-contrast photos, and obscured text much better. Use EasyOCR by default; flip provider=cloud when accuracy matters more than speed.
Which languages are supported?
EasyOCR supports 80+ scripts including Latin (English, German, French, etc.), Cyrillic, CJK (Chinese, Japanese, Korean), Arabic, Devanagari, Thai, and more. Pass language=en, language=ja, or comma-separated for mixed-script. Cloud-LLM fallback supports any language the underlying model handles (~100+ for GPT-4o).
Can I do receipt or invoice parsing?
Yes, but for structured extraction (line items, totals, dates), pair OCR with /v1/describe?mode=vqa or use a cloud-LLM provider with a structured-output schema. OCR alone gives you raw text; structuring it into fields is a downstream step the captioning/VQA endpoint helps with.
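If you want to stay on the free EasyOCR path, a crude downstream step can pull common fields out of the raw `full_text` with regular expressions. A sketch (the patterns are assumptions and far less robust than the VQA or `output_schema` routes):

```python
import re

def extract_fields(full_text):
    """Naive field extraction from raw OCR text. Only works for
    receipts that literally print 'Total' and an ISO-style date."""
    total = re.search(r"(?i)\btotal\b\D*?(\d+[.,]\d{2})", full_text)
    date = re.search(r"(\d{4}-\d{2}-\d{2})", full_text)
    return {
        "total": float(total.group(1).replace(",", ".")) if total else None,
        "date": date.group(1) if date else None,
    }

text = "ACME STORE\n2026-04-12\nSubtotal 399.00\nTax 24.50\nTotal 423.50"
print(extract_fields(text))  # → {'total': 423.5, 'date': '2026-04-12'}
```

Note the `\b` word boundary, which stops "Subtotal" from matching; anything beyond this quickly becomes a job for the VQA endpoint or a cloud-LLM schema.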
Is OCR affected by image rotation?
EasyOCR handles rotated text within ±45° well; beyond that, results degrade. Set rotate=auto to let EasyOCR detect and rotate. For document scans, run /cv_tools/auto_orient first to normalise EXIF rotation.
Can I extract text from PDFs or scanned multi-page documents?
Not directly today — mSightFlow OCR is image-input. For PDFs, rasterise each page (e.g. pdf2image) and POST one image per page. A native PDF-input endpoint is on the roadmap.
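The per-page workaround can be sketched as follows, assuming pdf2image's `convert_from_path` (which needs poppler installed; the imports are deferred so the module loads without it). The endpoint and fields match the code samples above:

```python
def ocr_pdf_pages(pdf_path, language="en"):
    """Rasterise each PDF page with pdf2image, POST it to /v1/ocr,
    and yield one full_text string per page."""
    import os
    from io import BytesIO

    import requests
    from pdf2image import convert_from_path  # deferred: needs poppler

    for page in convert_from_path(pdf_path, dpi=300):
        buf = BytesIO()
        page.save(buf, format="PNG")
        resp = requests.post(
            "https://api.msightflow.ai/v1/ocr",
            headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
            files={"image": buf.getvalue()},
            data={"language": language},
        )
        yield resp.json()["full_text"]
```

Each page counts as one call against the monthly quota, so a 40-page scan costs 40 calls.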
Pixels in. Text out.
300 free API calls / month. EasyOCR + cloud fallback. No credit card.