Optimize scanned documents
Whether your document comes from a camera or scanner, learn how to enhance the result in Acrobat for a better PDF viewing and searching experience.
Whether your document comes from a camera or scanner, you can enhance the result in Acrobat for a better PDF viewing and searching experience. I’m starting with a JPEG file that was previously scanned, but you could also select the Create PDF tool to access your scanner directly within Acrobat. This file is just an image; you can’t even search the text in it.
To optimize this file and recognize its text, I’ll select Scan & OCR in the Tools center.
Then, from the dropdown, I’ll select Enhance Scanned Document, but you could also start from a Camera Image.
Next, I’ll select the gear icon. In the Enhance Scanned PDF dialog box, you can control how scanned images are filtered and compressed. The default settings suit a wide range of scanned pages, but you may want to customize them for higher-quality images, smaller file sizes, or to address scanning issues. When you select Apply Adaptive Compression, Acrobat divides each page into black-and-white, grayscale, and color regions, and then chooses a representation for each that preserves its appearance while compressing it heavily. For color or grayscale scans, select JPEG 2000, ZIP, or JPEG. JPEG 2000 is not recommended if you’re creating PDF/A files. For black-and-white or monotone images, select the lossless or lossy version of JBIG2, or CCITT Group 4. The highest quality levels use the lossless methods. The slider sets the balance between smallest file size and maximum image quality: moving it to the right increases both quality and file size, and moving it to the left reduces both.
Select Edit to modify the filters that are applied. Deskew rotates any page that isn’t square, which I want turned on for my scan. Background Removal adjusts the contrast between letters and background for clarity. Descreen removes halftone dot structure, which can reduce JPEG compression, cause moiré patterns, and make text difficult to recognize. Select Off when scanning a page with no pictures or filled areas. Use the Text Sharpening filter if text characters are touching each other, in which case you should use a higher, or brighter, setting. If the characters are separated, use a lower, or darker, setting.
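The adaptive compression rules above can be summarized as a small decision procedure. The Python below is a simplified, hypothetical model, not Acrobat’s actual algorithm: it classifies a region of RGB pixels as black-and-white, grayscale, or color, then picks one of the formats the dialog offers for that region type. The function names, the pixel-based classification rule, and the decision to never choose ZIP or CCITT Group 4 are assumptions made for illustration.

```python
from enum import Enum


class ColorMode(Enum):
    BLACK_AND_WHITE = "black-and-white"
    GRAYSCALE = "grayscale"
    COLOR = "color"


def classify_region(pixels):
    """Classify a region given as (r, g, b) tuples with 0-255 channels.

    Hypothetical rule: any pixel with unequal channels makes the region
    color; otherwise it is black-and-white if every pixel is pure black
    or pure white, and grayscale if not.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("region has no pixels")
    if any(r != g or g != b for r, g, b in pixels):
        return ColorMode.COLOR
    if all(r in (0, 255) for r, _, _ in pixels):
        return ColorMode.BLACK_AND_WHITE
    return ColorMode.GRAYSCALE


def choose_compression(mode, *, pdfa=False, lossless=True):
    """Pick a format from the ones the dialog offers for each region type.

    `lossless` only affects black-and-white regions, since lossless
    compression applies only to monochrome images. Simplification: ZIP
    (color/grayscale) and CCITT Group 4 (black-and-white) are also offered
    but are never chosen here.
    """
    if mode is ColorMode.BLACK_AND_WHITE:
        # JBIG2 comes in lossless and lossy variants for monochrome content.
        return "JBIG2 (lossless)" if lossless else "JBIG2 (lossy)"
    # Color and grayscale: JPEG 2000 is not recommended for PDF/A output.
    return "JPEG" if pdfa else "JPEG 2000"


# Usage: classify a tiny gray region and pick a format for PDF/A output.
region = [(0, 0, 0), (128, 128, 128), (255, 255, 255)]
mode = classify_region(region)
print(mode.value, "->", choose_compression(mode, pdfa=True))  # grayscale -> JPEG
```

Keeping classification and format choice in separate functions mirrors the two steps described: first split the page into region types, then compress each type with a suitable method.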
Select Edit to modify the text recognition settings. You can select the language and set the PDF output style. Searchable Image keeps the image on top and places an invisible layer of text underneath so you can search the file, while Editable Text and Images converts the PDF to real text and graphics that you can edit or export.
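To make the Searchable Image idea concrete, here is a conceptual Python model, not Acrobat’s internal file structure: each page keeps its scanned image plus a list of recognized words with bounding boxes, and a search matches only against that invisible text layer. The `OcrWord` and `SearchablePage` types, the box format, and the case-insensitive matching are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OcrWord:
    text: str
    # Hypothetical bounding box in page units: (left, top, right, bottom).
    bbox: Tuple[float, float, float, float]


@dataclass
class SearchablePage:
    image: bytes               # the scanned image that stays visible on top
    text_layer: List[OcrWord]  # invisible recognized words placed underneath

    def search(self, query):
        """Return the boxes of words equal to query, ignoring case."""
        q = query.casefold()
        return [w.bbox for w in self.text_layer if w.text.casefold() == q]


page = SearchablePage(
    image=b"placeholder scanned pixels",
    text_layer=[
        OcrWord("Invoice", (10, 10, 60, 20)),
        OcrWord("total", (10, 30, 40, 40)),
        OcrWord("TOTAL", (50, 30, 90, 40)),
    ],
)
print(page.search("Total"))  # [(10, 30, 40, 40), (50, 30, 90, 40)]
```

The search never touches the image bytes; the boxes it returns are what a viewer would use to highlight matches over the visible scan.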
Once you’re done configuring the settings, select Enhance to run the compression and filters. Notice how the scan is now deskewed and optimized, and I can search the text in the file while the original image stays layered on top.
Here are a few tips to consider when optimizing a scanned image. Acrobat accepts images between 10 and 3000 dpi. If you’re recognizing text, a resolution of 72 dpi or higher is required, and input resolutions higher than 600 dpi are downsampled. Also, lossless compression can be applied only to monochrome images, that is, images with only one color or values of one color. And if you save the PDF using Save As, the scanned image may be compressed again with lossy compression. For most pages, black-and-white scanning at 300 dpi produces text best suited for recognition. But if text recognition fails on many words, or the text is 9 points or smaller, try scanning at a higher resolution. Scan in black and white whenever possible as well.
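The resolution limits in these tips can be expressed as a small validation helper. This Python sketch is hypothetical: the function names are invented, and it assumes that inputs above 600 dpi are downsampled to exactly 600 dpi when recognizing text, which the tips don’t state precisely.

```python
MIN_DPI = 10
MAX_DPI = 3000
MIN_OCR_DPI = 72
OCR_DOWNSAMPLE_DPI = 600  # assumption: inputs above 600 dpi are reduced to 600
RECOMMENDED_DPI = 300     # black-and-white scanning at 300 dpi suits most pages


def effective_dpi(input_dpi, *, recognize_text):
    """Apply the resolution limits described above to an input resolution."""
    if not MIN_DPI <= input_dpi <= MAX_DPI:
        raise ValueError(f"{input_dpi} dpi is outside the {MIN_DPI}-{MAX_DPI} dpi range")
    if recognize_text:
        if input_dpi < MIN_OCR_DPI:
            raise ValueError(f"text recognition needs at least {MIN_OCR_DPI} dpi")
        # Higher resolutions are downsampled before recognition.
        return min(input_dpi, OCR_DOWNSAMPLE_DPI)
    return input_dpi


def should_rescan_higher(font_size_pt, many_words_failed):
    """Suggest going above RECOMMENDED_DPI for small text or poor recognition."""
    return font_size_pt <= 9 or many_words_failed


for dpi in (300, 1200):
    print(dpi, "->", effective_dpi(dpi, recognize_text=True))
print(should_rescan_higher(8, many_words_failed=False))
```

Raising an error for out-of-range input, rather than silently clamping, reflects that Acrobat won’t accept such images at all, whereas high-resolution input is still accepted and merely downsampled.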
Now here are some scanner-specific tips. Avoid dithering or halftone scanner settings: they can improve the appearance of photographs, but they make text difficult to recognize. For text printed on colored paper, try increasing the brightness and contrast by about 10%. If your scanner has color-filtering capability, consider using a filter or lamp that drops out the background color. If the text isn’t crisp, or drops out, try adjusting the scanner’s contrast and brightness to clarify the scan.
If your scanner has a manual brightness control, adjust it so that the characters are clean and well formed. If characters are touching, use a higher or brighter setting, and if they’re separated, use a lower or darker setting.
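The brightness and contrast advice can be sketched as two adjustments to a scanner profile. This Python is hypothetical: the `ScannerSettings` type, the 0 to 100 scale, reading "about 10%" as a relative increase, and the 5-point step for touching or separated characters are all assumptions, since real scanner drivers expose their own controls and ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScannerSettings:
    # Hypothetical 0-100 scale; real scanners expose their own ranges.
    brightness: float
    contrast: float


def _clamp(value, low=0.0, high=100.0):
    return max(low, min(high, value))


def for_colored_paper(s, boost=0.10):
    """Raise brightness and contrast by about 10% (relative), clamped to the scale."""
    return ScannerSettings(
        brightness=round(_clamp(s.brightness * (1 + boost)), 1),
        contrast=round(_clamp(s.contrast * (1 + boost)), 1),
    )


def adjust_for_characters(s, *, touching, step=5.0):
    """Touching characters: go brighter. Separated characters: go darker."""
    delta = step if touching else -step
    return ScannerSettings(_clamp(s.brightness + delta), s.contrast)


base = ScannerSettings(brightness=50.0, contrast=40.0)
print(for_colored_paper(base))
print(adjust_for_characters(base, touching=True))
```

In practice you would make one small adjustment, rescan, and check whether the characters look clean and well formed before adjusting again.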
When you come across scanned images that are large and difficult to read, be sure to use the compression, filter, and text recognition tools in Acrobat to optimize the viewing experience.