OCR settings that actually matter
Text layers, image-only scans, and when 300 dpi is worth it.
“OCR” gets used for two different jobs, and knowing which one you need saves a lot of frustration. The first job is extracting text that’s already there: most PDFs exported from Word, saved from a web page, or generated by other software already contain a real text layer, and “recognition” is just reading it out, fast and exact. The second job is recognizing pixels: a scanned page is a photograph of text, and turning it back into characters is genuine machine-learning work.
FernPDF’s OCR tool handles the first case entirely on-device: it reads the embedded text layer and rebuilds it as a searchable PDF or a plain .txt file. If your document came from software rather than a scanner, this is all you need, and the output matches the text layer character for character because nothing is being guessed.
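How do you tell which case you have? One rough heuristic, sketched below as an illustration rather than how FernPDF decides, is to pull each page’s text layer and count the visible characters: an image-only scan yields almost nothing. The per-page strings could come from any text-layer extractor (pypdf’s PdfReader(path).pages[i].extract_text() is one); the triage_pages function and its 20-character threshold are hypothetical choices you would tune for your own documents.

```python
from dataclasses import dataclass

# Hypothetical threshold: pages whose text layer has fewer visible characters
# than this are treated as image-only. Tune it for your own documents.
MIN_CHARS_PER_PAGE = 20


@dataclass
class PageTriage:
    page: int           # 1-based page number
    visible_chars: int  # non-whitespace characters found in the text layer
    needs_ocr: bool     # True if the page probably has no usable text layer


def triage_pages(page_texts: list[str], min_chars: int = MIN_CHARS_PER_PAGE) -> list[PageTriage]:
    """Classify each page by how much text its embedded text layer yields."""
    results = []
    for number, text in enumerate(page_texts, start=1):
        visible = sum(1 for ch in text if not ch.isspace())
        results.append(PageTriage(number, visible, visible < min_chars))
    return results


if __name__ == "__main__":
    # Stand-ins for per-page strings returned by a text-layer extractor.
    pages = [
        "Quarterly report\nRevenue grew 4% over the prior quarter.",
        "",           # a scanned page: the text layer is empty
        "  \n\x0c ",  # only whitespace and a form feed
    ]
    for result in triage_pages(pages):
        route = "needs OCR" if result.needs_ocr else "text layer is enough"
        print(f"page {result.page}: {result.visible_chars} chars, {route}")
```

A page can also carry a text layer that extracts as garbage (for example, text set in fonts without a usable character mapping), so a character count is a first filter, not a guarantee.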
For true scans, resolution drives everything. 300 dpi is the sweet spot for recognition accuracy. Below 200 dpi, characters lose the fine detail OCR engines rely on, and accuracy falls off a cliff. If you control the scanner, scan at 300 dpi in grayscale; color rarely helps recognition and roughly triples the file size.
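To put numbers on this, the sketch below computes pixel dimensions, the height of a font’s em square in scanned pixels, and raw bitmap size at a few resolutions. The letter-size page, the 10-point font, and uncompressed 8-bits-per-channel sizes are assumptions for illustration; real scanners compress their output, so on-disk ratios vary, but for raw pixels RGB is exactly three times grayscale.

```python
# Illustrative assumptions: US Letter page, 8 bits per channel, no compression.
PAGE_W_IN, PAGE_H_IN = 8.5, 11.0
POINTS_PER_INCH = 72


def pixels(inches: float, dpi: int) -> int:
    """Pixel count along one page dimension at the given resolution."""
    return round(inches * dpi)


def raw_size_mb(dpi: int, channels: int) -> float:
    """Uncompressed page bitmap size in megabytes (10**6 bytes).

    channels=1 is 8-bit grayscale, channels=3 is 24-bit RGB.
    """
    return pixels(PAGE_W_IN, dpi) * pixels(PAGE_H_IN, dpi) * channels / 1e6


def em_height_px(font_pt: float, dpi: int) -> float:
    """Height of one em of a font_pt font, measured in scanned pixels."""
    return font_pt / POINTS_PER_INCH * dpi


if __name__ == "__main__":
    for dpi in (150, 200, 300):
        w, h = pixels(PAGE_W_IN, dpi), pixels(PAGE_H_IN, dpi)
        print(
            f"{dpi} dpi: {w} x {h} px, 10 pt em = {em_height_px(10, dpi):.1f} px, "
            f"grayscale {raw_size_mb(dpi, 1):.1f} MB, RGB {raw_size_mb(dpi, 3):.1f} MB"
        )
```

At 300 dpi a letter page comes out at 2550 x 3300 pixels and a 10-point em is about 42 pixels tall; at 150 dpi the same em is about 21 pixels, half as many in each direction to describe every character.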
Skew is the silent killer. A page scanned two degrees off-axis can drop recognition accuracy by double digits, because each tilted line drifts into its neighbors’ rows and line segmentation breaks before character recognition even starts. Good OCR pipelines deskew first; if yours doesn’t, re-scan with the page flat and square rather than fighting the output.
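Two pieces of arithmetic show why skew hurts and how it is commonly undone. First, a tilted line drifts vertically across its width: assuming a 6.5-inch text line scanned at 300 dpi (1950 pixels) and 12-point line spacing (50 pixels), a 2-degree tilt moves the end of a line about 68 pixels, more than a full line pitch, so a segmenter that slices the page into horizontal rows mixes neighboring lines. Second, one common deskewing technique (Hough-transform methods are another) is the projection profile: try candidate angles, undo each one, and keep the angle whose row histogram is most sharply peaked. The sketch below is a simplified, hypothetical illustration that works on a list of dark-pixel coordinates rather than a real image; it is not FernPDF’s pipeline.

```python
import math
from collections import Counter


def drift_px(line_width_px: float, skew_deg: float) -> float:
    """Vertical drift, in pixels, from one end of a tilted text line to the other."""
    return line_width_px * math.tan(math.radians(skew_deg))


def profile_score(points, angle_deg: float) -> int:
    """Undo a tilt of angle_deg, bin points into 1-pixel rows, score peakiness.

    Summing squared row counts rewards profiles where ink piles into a few
    rows, which is what level text lines produce.
    """
    a = math.radians(angle_deg)
    s, c = math.sin(a), math.cos(a)
    rows = Counter(round(y * c - x * s) for x, y in points)
    return sum(n * n for n in rows.values())


def estimate_skew(points, max_deg: float = 5.0, step_deg: float = 0.1) -> float:
    """Brute-force projection-profile search over [-max_deg, +max_deg].

    Returns the tilt in degrees (positive means lines drop toward the right,
    with y increasing downward); rotate the page by the negated angle to deskew.
    """
    n = round(max_deg / step_deg)
    candidates = [i * step_deg for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: profile_score(points, a))


def synthetic_page(skew_deg: float, line_ys=(100, 150, 200), width: int = 1000):
    """Hypothetical test data: dark-pixel (x, y) coordinates of tilted straight lines."""
    t = math.tan(math.radians(skew_deg))
    return [(x, round(y0 + x * t)) for y0 in line_ys for x in range(width)]


if __name__ == "__main__":
    # Assumed 6.5-inch line at 300 dpi (1950 px); 12 pt leading would be 50 px.
    print(f"drift at 2 degrees: {drift_px(1950, 2.0):.1f} px")
    print(f"estimated skew: {estimate_skew(synthetic_page(2.0)):.1f} degrees")
```

A production deskewer would typically run on a downsampled, binarized image and search coarse-to-fine before rotating the page; the brute-force loop here favors clarity over speed.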
Finally, keep the original. OCR output belongs in an invisible text layer underneath the scan image, not in place of it. That way the document still looks exactly like the paper, but you can search, select, and copy its text. That’s the “searchable PDF” format, and it’s what many archives and courts expect.
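In PDF terms, the invisible layer is ordinary text drawn with text rendering mode 3 (set with the Tr operator), which paints neither fill nor stroke, so viewers can search and select it without displaying it. The sketch below builds just the page content stream for such a page from OCR word boxes. The Word structure, the resource names /F1 and /Im0, and treating the bottom of each box as the baseline are assumptions; a complete file also needs the font and image objects, the page dictionary, and a cross-reference table, and the simple string escaping assumes ASCII text.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical OCR result: one word and its box in scan pixels (origin top-left)."""
    text: str
    left: int
    top: int
    width: int   # unused in this sketch; horizontal scaling (Tz) would need it
    height: int


def pdf_string(text: str) -> str:
    """Wrap ASCII text as a PDF literal string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def page_content(words: list[Word], img_w_px: int, img_h_px: int, dpi: int) -> str:
    """Content stream: invisible OCR text first, then the scan image painted over it."""
    scale = 72 / dpi  # scan pixels -> PDF points (1/72 inch)
    page_w, page_h = img_w_px * scale, img_h_px * scale
    ops = ["BT", "3 Tr"]  # rendering mode 3: neither fill nor stroke, so invisible
    for w in words:
        size = w.height * scale  # crude: box height used as the font size
        x = w.left * scale
        y = page_h - (w.top + w.height) * scale  # flip y; box bottom taken as baseline
        ops.append(f"/F1 {size:.2f} Tf 1 0 0 1 {x:.2f} {y:.2f} Tm {pdf_string(w.text)} Tj")
    ops.append("ET")
    # Scale the unit-square image XObject /Im0 to cover the whole page.
    ops.append(f"q {page_w:.2f} 0 0 {page_h:.2f} 0 0 cm /Im0 Do Q")
    return "\n".join(ops)


if __name__ == "__main__":
    words = [Word("Invoice", 300, 300, 400, 50), Word("(draft)", 720, 300, 330, 50)]
    print(page_content(words, img_w_px=2550, img_h_px=3300, dpi=300))
```

At 300 dpi, a 2550 x 3300 pixel scan maps to a 612 x 792 point page (US Letter), which should match the page’s MediaBox. Many implementations also stretch each word horizontally, for example with the Tz operator, so that selections line up with the printed words.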
Try it yourself. Every FernPDF tool runs in your browser. Open one and watch the network tab.