EasyOCR + optional cloud-LLM fallback

Extract text from any image: Latin, CJK, Cyrillic, or mixed scripts.

Get text out of images in one API call. mSightFlow runs EasyOCR by default (fast, 80+ languages), with a one-flag fallback to GPT-4o or Claude Vision when the image is harder than clean printed text: handwriting, tables, or low contrast.

Model: EasyOCR + optional GPT-4o / Claude Vision fallback
Inputs: JPG/PNG ≤ 25 MB
Outputs: text regions · bboxes · confidence · full string
Latency: ~250 ms (EasyOCR) / ~2-4 s (cloud)
Free quota: 300 calls / month

Two engines under one endpoint. EasyOCR runs on our own GPUs at ~250 ms per image and handles 80+ languages. That makes it a good fit for receipts, barcode labels, license plates, signage, and most clean-print tasks. When EasyOCR struggles (handwriting, complex tables, low-contrast scans), set provider=cloud and the call is routed to GPT-4o or Claude Vision instead, with support for structured output.
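A common client-side pattern is to try EasyOCR first and pay for the cloud call only when the first pass looks weak. The sketch below is a hypothetical policy, not part of the API. It assumes the response shape printed in the Python example further down: a regions list whose items carry a confidence between 0 and 1. The 0.6 threshold is an arbitrary starting point that you should tune on your own images.

```python
def mean_confidence(result: dict) -> float:
    """Average confidence across the regions of an OCR response (0.0 if none)."""
    regions = result.get("regions", [])
    if not regions:
        return 0.0
    return sum(r["confidence"] for r in regions) / len(regions)


def should_retry_on_cloud(result: dict, min_confidence: float = 0.6) -> bool:
    """Hypothetical fallback policy: retry with provider=cloud when EasyOCR
    found no text at all or its average confidence is below min_confidence."""
    if not result.get("regions"):
        return True
    return mean_confidence(result) < min_confidence


# Usage sketch: call /v1/ocr with provider=easyocr first, then
#   if should_retry_on_cloud(result): repeat the request with provider=cloud
```

Average confidence is only a proxy. A confidently misread line of handwriting can still slip through, so check the threshold against a few labelled samples before relying on it.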

When OCR is the right tool

Receipt & invoice parsing

Expense tracking, account reconciliation, tax-prep tools. Pair with VQA for structured field extraction.

ID document extraction

KYC, age verification, onboarding flows. EasyOCR for the text; pair with face detection / liveness for full KYC.

License plates & signs

ANPR pipelines, signage digitisation, whiteboard capture, sticky-note-to-text (handwritten notes usually need provider=cloud). Combine with object detection to crop plates before OCR.
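For plates, cropping to the detector's box before OCR keeps nearby text (dealer stickers, bumper lettering) out of the result. Detector boxes are often tight, so it helps to pad them slightly. The helper below is a minimal sketch that assumes boxes arrive as (x1, y1, x2, y2) pixel coordinates; the 10% default margin is an arbitrary choice. It returns the (left, upper, right, lower) tuple that Pillow's Image.crop accepts.

```python
import math


def padded_crop_box(box, image_width, image_height, margin=0.1):
    """Expand an (x1, y1, x2, y2) box by `margin` times its width/height on
    each side, clamped to the image bounds. Returns integer
    (left, upper, right, lower), the order Pillow's Image.crop expects."""
    x1, y1, x2, y2 = box
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin
    return (
        max(0, math.floor(x1 - pad_x)),   # round outward on the left/top...
        max(0, math.floor(y1 - pad_y)),
        min(image_width, math.ceil(x2 + pad_x)),   # ...and on the right/bottom
        min(image_height, math.ceil(y2 + pad_y)),
    )
```

With Pillow, img.crop(padded_crop_box(box, img.width, img.height)) gives the plate region. You can then save it to PNG bytes and POST that instead of the full frame.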

EasyOCR vs cloud-LLM fallback

| | EasyOCR (default) | Cloud LLM (GPT-4o / Claude) |
|---|---|---|
| Latency | ~250 ms | ~2-4 s |
| Cost per call | 1 quota unit | 1 quota unit + provider quota |
| Handwriting | Poor | Excellent |
| Complex tables / forms | Mixed | Strong (with output_schema) |
| Languages | 80+, must be specified | 100+, auto-detected |
| Structured output | No | Yes (JSON schema) |
| Best for | Clean printed text, batch jobs | Hard layouts, structured extraction |

Code — Python, Node, cURL

Python — EasyOCR
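import os

import requests

# Default engine: EasyOCR. Set "language" to the language you expect.
with open("receipt.jpg", "rb") as f:
    resp = requests.post(
        "https://api.msightflow.ai/v1/ocr",
        headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
        # (filename, file object, MIME type) so the server sees a proper upload
        files={"image": ("receipt.jpg", f, "image/jpeg")},
        data={"language": "en", "provider": "easyocr"},
        timeout=30,
    )
resp.raise_for_status()  # surface 4xx/5xx errors instead of parsing an error body
result = resp.json()

print(result["full_text"])
for r in result["regions"]:
    print(f"  {r['confidence']:.2f}  {r['box']}  {r['text']}")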
Python — cloud fallback + structured output
# Hard layout? Fall back to GPT-4o for structured output.
import json
import os

import requests

schema = {"invoice_number": "string", "total": "number", "date": "string"}

with open("invoice.jpg", "rb") as f:
    resp = requests.post(
        "https://api.msightflow.ai/v1/ocr",
        headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
        files={"image": ("invoice.jpg", f, "image/jpeg")},
        data={
            "provider": "cloud",
            "model": "gpt-4o",
            "output_schema": json.dumps(schema),  # sent as a JSON string form field
        },
        timeout=60,  # cloud calls take seconds, not milliseconds
    )
resp.raise_for_status()
print(resp.json()["structured"])
# e.g. {'invoice_number': 'INV-00231', 'total': 423.5, 'date': '2026-04-12'}
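A language model can return a field with the wrong type (a total as the string "423.50") or leave one out. It is worth checking the structured result before it reaches your database. Here is a minimal sketch that assumes the flat field-to-type format used for output_schema above, with only "string" and "number" types. Nested schemas would need a full validator such as the jsonschema package.

```python
# Python types accepted for each schema type name used in output_schema.
_EXPECTED = {"string": str, "number": (int, float)}


def check_structured(structured: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means every schema field is
    present with the expected type."""
    problems = []
    for field, type_name in schema.items():
        if type_name not in _EXPECTED:
            problems.append(f"{field}: unsupported schema type {type_name!r}")
        elif field not in structured:
            problems.append(f"missing field: {field}")
        else:
            value = structured[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, _EXPECTED[type_name]):
                problems.append(
                    f"{field}: expected {type_name}, got {type(value).__name__}"
                )
    return problems
```

Pass it the parsed structured dict and the same dict you serialised into output_schema. A non-empty list is a good trigger for a retry or a manual review queue.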
Node.js
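// Node 18+: fetch, FormData and Blob are built in. Run as an ES module (.mjs)
// so top-level await works.
import { readFile } from "node:fs/promises";

const form = new FormData();
form.append(
  "image",
  new Blob([await readFile("receipt.jpg")], { type: "image/jpeg" }),
  "receipt.jpg",
);
form.append("language", "en");

const resp = await fetch("https://api.msightflow.ai/v1/ocr", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.MSF_API_KEY}` },
  body: form, // fetch sets the multipart Content-Type and boundary itself
});
if (!resp.ok) {
  throw new Error(`OCR request failed: ${resp.status} ${await resp.text()}`);
}
const { full_text } = await resp.json();
console.log(full_text);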
cURL
curl -X POST https://api.msightflow.ai/v1/ocr \
  -H "Authorization: Bearer $MSF_API_KEY" \
  -F "image=@receipt.jpg" \
  -F "language=en"

Pricing — same as every other endpoint

Free

$0

  • 300 API calls / month
  • EasyOCR + 80+ languages
  • Visualisation overlay
  • No credit card

Pro

$29/mo

  • Unlimited calls
  • Higher per-provider quotas


FAQ

EasyOCR vs cloud-LLM fallback — when does which win?

EasyOCR is fast (~250 ms), available on the free tier, and accurate on clean printed text in 80+ languages. The cloud-LLM fallback (GPT-4o / Claude Vision) is slower (~2-4 s) and costs more, because each call also consumes provider quota. In return, it handles handwriting, irregular layouts (tables, multi-column receipts), low-contrast photos, and partly obscured text much better. Use EasyOCR by default, and set provider=cloud when accuracy matters more than speed.

Which languages are supported?

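EasyOCR supports 80+ languages across scripts including Latin (English, German, French, etc.), Cyrillic, CJK (Chinese, Japanese, Korean), Arabic, Devanagari, Thai, and more. Pass a single code such as language=en or language=ja, or a comma-separated list such as language=ja,en for mixed-script images. EasyOCR can only combine languages that are compatible with each other; English is compatible with all of them. The cloud-LLM fallback supports any language the underlying model handles (100+ for GPT-4o).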

Can I do receipt or invoice parsing?

Yes, but for structured extraction (line items, totals, dates), either pair OCR with /v1/describe?mode=vqa or use provider=cloud with an output_schema. OCR alone gives you raw text; turning it into fields is a downstream step that the captioning/VQA endpoint helps with.

Is OCR affected by image rotation?

EasyOCR handles text rotated within ±45° well; beyond that, results degrade. Set rotate=auto to have EasyOCR detect the text orientation and rotate the image accordingly. For document scans, run /cv_tools/auto_orient first to normalise EXIF rotation.

Can I extract text from PDFs or scanned multi-page documents?

Not directly today: mSightFlow OCR accepts images only. For PDFs, rasterise each page (e.g. with pdf2image) and POST one image per page. A native PDF-input endpoint is on the roadmap.
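The page-by-page workflow can be sketched as follows. It assumes pdf2image (which needs the poppler utilities installed) and requests. Both are imported inside the functions that use them, so the per-page loop can be exercised without either installed. The 200 dpi default and the per-page PNG filename are arbitrary choices. The endpoint, header, and full_text field match the Python example earlier on this page.

```python
import io
import os


def rasterise_pdf(pdf_path, dpi=200):
    """Yield each page of a PDF as PNG bytes.

    Requires pdf2image plus the poppler utilities; imported here so the rest
    of the module works without them.
    """
    from pdf2image import convert_from_path

    for page in convert_from_path(pdf_path, dpi=dpi):
        buf = io.BytesIO()
        page.save(buf, format="PNG")
        yield buf.getvalue()


def ocr_png(png_bytes, language="en"):
    """POST one page image to /v1/ocr and return its full_text."""
    import requests  # imported lazily, like pdf2image above

    resp = requests.post(
        "https://api.msightflow.ai/v1/ocr",
        headers={"Authorization": f"Bearer {os.environ['MSF_API_KEY']}"},
        files={"image": ("page.png", png_bytes, "image/png")},
        data={"language": language},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["full_text"]


def ocr_pages(pages, ocr=ocr_png):
    """OCR each page in order and return one string per page."""
    return [ocr(page) for page in pages]


# Usage (one OCR call, and so one quota unit, per page):
#   texts = ocr_pages(rasterise_pdf("contract.pdf"))
```

convert_from_path renders every page into memory at once. For long documents, its first_page and last_page arguments let you rasterise in batches instead.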

Pixels in. Text out.

300 free API calls / month. EasyOCR + cloud fallback. No credit card.