How to use Image to Text
Extract editable text from images and scanned PDFs right in your browser. The OCR engine is Tesseract.js. It is open source, runs as a Web Worker on your machine, and supports 24 languages across Latin, Cyrillic, CJK, Indic, and right-to-left (RTL) scripts. Standard OCR never uploads your file to a server; only the optional AI assist does. Each recognised word comes back with a confidence score, so you can see at a glance which parts need a manual sanity-check.
Good for
- Pulling text out of a scanned contract, receipt, or printed form
- Searching screenshots and photo notes for a phrase you half-remember
- Copying a recipe, paragraph, or quote from a phone photo of a printed page
- Turning a scanned PDF into text you can paste into Word
- Batch-OCR'ing whiteboard photos after a meeting
Not good for
- Handwriting: Tesseract is trained on printed text, so cursive scans return mush
- Tables (the output is line-by-line; column structure collapses)
- Mixed-script docs (e.g., English and Arabic paragraphs in the same image): pick the dominant language and accept some loss
- Very low-resolution images below ~150 DPI: accuracy drops sharply
- Heavily stylised fonts (logos, decorative type, distressed lettering)
Walkthrough
Step by step
- 01
Drop the image or PDF
Tools menu → Image to Text. Accepts PNG, JPG, TIFF, and PDF up to 25 MB. Multi-page PDFs get OCR'd page-by-page and the text concatenates in reading order.
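The upload limits above can be expressed as a small client-side check. This is a minimal sketch, not the tool's code: `validateUpload`, `concatenatePages`, and the MIME strings are assumptions, and "25 MB" is read as 25 MiB. Sorting by page index keeps the joined text in document order even if page results were to arrive out of sequence.

```typescript
// Hypothetical pre-flight check; the MIME strings are assumptions about how
// the browser reports PNG, JPG, TIFF, and PDF files.
const ACCEPTED_TYPES = new Set<string>([
  "image/png",
  "image/jpeg",
  "image/tiff",
  "application/pdf",
]);
const MAX_BYTES = 25 * 1024 * 1024; // assumes "25 MB" means 25 MiB

interface UploadInfo {
  name: string;
  type: string; // MIME type, as exposed by a browser File object
  size: number; // size in bytes
}

// Returns an error message, or null when the file passes both checks.
function validateUpload(file: UploadInfo): string | null {
  if (!ACCEPTED_TYPES.has(file.type)) {
    return `Unsupported file type "${file.type}" (${file.name})`;
  }
  if (file.size > MAX_BYTES) {
    return `${file.name} exceeds the 25 MB limit`;
  }
  return null;
}

interface PageText {
  pageIndex: number; // 0-based position of the page in the PDF
  text: string; // OCR output for that page
}

// Joins per-page results in page order, separated by a blank line.
function concatenatePages(pages: PageText[]): string {
  return [...pages]
    .sort((a, b) => a.pageIndex - b.pageIndex)
    .map((page) => page.text.trim())
    .join("\n\n");
}
```

A browser `File` object has `name`, `type`, and `size`, so it can be passed to `validateUpload` directly.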
- 02
Pick the language
24 languages across Latin/European, Cyrillic, CJK, Indic, and right-to-left scripts. Pick the one that matches your document — the matching language pack lazy-loads on first use (~5–15 MB, cached afterwards).
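The "lazy-loads on first use, cached afterwards" behaviour amounts to starting at most one download per language. A hypothetical sketch: the language table is an illustrative subset using Tesseract's traineddata codes, `fetchPack` stands in for whatever actually downloads a pack, and the in-memory `Map` stands in for the persistent browser cache the tool presumably uses.

```typescript
// Illustrative subset only; the tool's full 24-language list isn't shown.
// Codes follow Tesseract's traineddata naming (e.g. "eng", "chi_sim").
type ScriptDirection = "ltr" | "rtl";

interface LanguagePack {
  code: string;
  direction: ScriptDirection;
}

const LANGUAGES: Record<string, LanguagePack> = {
  English: { code: "eng", direction: "ltr" },
  German: { code: "deu", direction: "ltr" },
  Russian: { code: "rus", direction: "ltr" },
  "Chinese (Simplified)": { code: "chi_sim", direction: "ltr" },
  Hindi: { code: "hin", direction: "ltr" },
  Arabic: { code: "ara", direction: "rtl" },
  Hebrew: { code: "heb", direction: "rtl" },
};

// Hypothetical downloader for a language pack (e.g. a fetch() wrapper).
type FetchPack = (code: string) => Promise<ArrayBuffer>;

// One in-flight or completed download per language code.
const packCache = new Map<string, Promise<ArrayBuffer>>();

function getLanguagePack(language: string, fetchPack: FetchPack): Promise<ArrayBuffer> {
  const entry = LANGUAGES[language];
  if (entry === undefined) {
    throw new Error(`Unsupported language: ${language}`);
  }
  const code = entry.code;
  let pending = packCache.get(code);
  if (pending === undefined) {
    // First use: start the download; drop it from the cache if it fails
    // so a later call can retry.
    pending = fetchPack(code).catch((err: unknown) => {
      packCache.delete(code);
      throw err;
    });
    packCache.set(code, pending);
  }
  return pending; // later calls reuse the same promise
}
```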
- 03
Toggle AI assist (optional)
For hard scans (faded faxes, handwriting, low-light photos), turn on the AI toggle. Each page is then sent to a server-side vision model instead of being processed locally. AI assist has a small daily quota per IP address, which is lifted on the Platform tier.
- 04
Start recognition
Click Extract Text. The progress bar updates page by page as Tesseract works. A clean 5-page scan typically finishes in 15–30 seconds on a modern laptop; older devices take about twice as long.
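The timings above work out to roughly 3–6 seconds per page. A back-of-envelope estimator built only from those figures; the function name and the linear per-page model are assumptions, and real timings will shift with image size and content.

```typescript
// Per-page rate derived from the stated figures: 15-30 s for a clean
// 5-page scan is 3-6 s per page on a modern laptop.
const SECONDS_PER_PAGE_MIN = 3;
const SECONDS_PER_PAGE_MAX = 6;

interface TimeEstimate {
  minSeconds: number;
  maxSeconds: number;
}

function estimateOcrTime(pages: number, olderDevice = false): TimeEstimate {
  const slowdown = olderDevice ? 2 : 1; // "about twice as long" on older devices
  return {
    minSeconds: pages * SECONDS_PER_PAGE_MIN * slowdown,
    maxSeconds: pages * SECONDS_PER_PAGE_MAX * slowdown,
  };
}
```

For example, `estimateOcrTime(50)` gives 150–300 seconds (2.5–5 minutes) for a 50-page PDF, and `estimateOcrTime(50, true)` doubles that.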
- 05
Read the confidence highlights
Every recognised word gets a confidence score from 0 to 100. Scores of 90 and above render normally, 70–89 in amber, and below 70 in red with a dotted underline. Skim the red words first; that's where errors hide.
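The colour bands above map directly onto a threshold function. A minimal sketch with hypothetical names: `highlightFor`, `redWordsToReview`, and the `RecognisedWord` shape are illustrative, not Tesseract.js types, and placing fractional scores such as 89.5 in the lower band is an assumption.

```typescript
// Thresholds from the step above; fractional scores fall into the lower band.
type HighlightLevel = "normal" | "amber" | "red";

interface RecognisedWord {
  text: string;
  confidence: number; // 0-100
}

function highlightFor(confidence: number): HighlightLevel {
  if (confidence >= 90) return "normal";
  if (confidence >= 70) return "amber";
  return "red"; // rendered red with a dotted underline
}

// Red words only, least confident first: the suggested review order.
function redWordsToReview(words: RecognisedWord[]): RecognisedWord[] {
  return words
    .filter((word) => highlightFor(word.confidence) === "red")
    .sort((a, b) => a.confidence - b.confidence);
}
```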
- 06
Copy or download
Click Copy to copy the whole text to the clipboard, or Download to save it as a .txt file. The original file is never modified or stored on a server.
Tips
- Scan at 300 DPI or higher: on clean prints, recognition accuracy typically jumps from ~80% to ~97%.
- Crop tightly around the text before uploading; busy borders and excess whitespace confuse the segmentation step.
- Even a 5° page skew costs ~10% accuracy. The pre-processing pass deskews automatically, but only within a few degrees; if the page is visibly rotated, re-scan it straight.
- Multi-page PDFs: OCR runs in your browser and is CPU-bound, so keep the tab open until it finishes. On a laptop, plug into power for documents of 50+ pages.
- For languages not in the dropdown (Thai, Greek, additional Indic scripts), turn on AI assist: its model covers ~100 languages, subject to the daily quota.
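To check whether an image meets the 300 DPI target, or falls under the ~150 DPI floor listed under Not good for, divide its pixel width by the physical width of the page it shows. A sketch under that assumption; the function names and the "marginal" band for 150–299 DPI are illustrative.

```typescript
// Effective resolution of a page image: pixels across divided by the
// printed page's physical width in inches.
type ResolutionVerdict = "good" | "marginal" | "too low";

function effectiveDpi(pixelWidth: number, pageWidthInches: number): number {
  return pixelWidth / pageWidthInches;
}

// Aim for 300 DPI or more; below ~150 DPI accuracy drops sharply.
function resolutionVerdict(dpi: number): ResolutionVerdict {
  if (dpi >= 300) return "good";
  if (dpi >= 150) return "marginal";
  return "too low";
}

// Minimum pixel width needed to reach a target DPI for a given page width.
function pixelsNeeded(pageWidthInches: number, targetDpi = 300): number {
  return Math.ceil(pageWidthInches * targetDpi);
}
```

US Letter is 8.5 inches wide, so `pixelsNeeded(8.5)` is 2550 pixels, while a 1200-pixel-wide capture of the same page works out to about 141 DPI and lands in "too low".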
Need workflow + audit on every doc your team handles?
Docverix Platform turns these tools into a routed, audited pipeline — validator → supervisor → approver, with a complete audit trail.