Software for PDF Scanning


When you get a document that has been scanned rather than exported from the software that created it, what you really have is a picture of each page. There are also many software and hardware packages that scan paper directly into PDF. For now, I'm not going to address using Acrobat or other tools as the scanning software. For our purposes today, let's just say you've got image files that you want to convert into something you can search.

The unique thing about PDF is that you can have an exact image of the document, plus the text, plus all kinds of metadata. With the "Paper Capture" tools in Acrobat, the software reads the picture and figures out what the text is. So while you still see the "image," the software can also read the underlying text. OCR is not perfect, and it works best on first-generation, laser-printed originals (just like your eyes do). In the past decade, however, OCR technology has gotten surprisingly accurate.

Scanning and OCR with Acrobat

One area that concerns many people is how to use the OCR (Optical Character Recognition) abilities of Acrobat. Here's an overview, and I'll try to deal with other OCR issues very soon.

Acrobat can import scanned images (TIFF works best) or talk to scanners and digital cameras through any TWAIN driver. Either way, it imports only pixel data, so a text recognition step is needed to create searchable text and possibly reduce file size. Acrobat calls this OCR step Paper Capture.
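Since TIFF is the preferred input, it can be worth confirming that the files you were handed really are TIFFs before you import them. Here is a minimal sketch in Python that checks the standard TIFF header bytes. The function names are my own and have nothing to do with Acrobat, and it only recognizes classic TIFF, not the newer BigTIFF variant.

```python
import struct


def is_tiff(header: bytes) -> bool:
    """Return True if these bytes start with a classic TIFF header.

    A classic TIFF begins with a two-byte byte-order mark
    ("II" = little-endian, "MM" = big-endian) followed by the
    number 42 stored as a 16-bit integer in that byte order.
    """
    if len(header) < 4:
        return False
    order = header[:2]
    if order == b"II":
        fmt = "<H"  # little-endian unsigned 16-bit
    elif order == b"MM":
        fmt = ">H"  # big-endian unsigned 16-bit
    else:
        return False
    (magic,) = struct.unpack(fmt, header[2:4])
    return magic == 42


def file_is_tiff(path: str) -> bool:
    """Read the first four bytes of a file and test them (hypothetical helper)."""
    with open(path, "rb") as f:
        return is_tiff(f.read(4))
```

A file that fails this check may still be a perfectly good image in another format; it just means it is not a classic TIFF.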


    •     Convert scanned pages to searchable Adobe PDF files that anyone with the free Adobe Reader can view, navigate, and print.
    •     Efficiently correct OCR text suspects with the new QuickFix tool.
    •     Use the new Zone tool to define areas of scanned pages to be treated as images, text, or even keywords.
    •     Decrease processing time with workload balancing and multi-processor support (Cluster Edition only).
    •     Create your own web interface with simple HTML pages using the Acrobat Capture SDK.

Bring your paper documents to life on the Web

Bridge the gap between your paper and digital workflows. Adobe® Acrobat® Capture® 3.0 is a professional production tool that teams with your scanner to convert volumes of paper documents into searchable Adobe Portable Document Format (PDF) files. Accurate OCR, advanced page and content recognition, and powerful cleanup tools let you turn all your important paper-based information into high-quality electronic documents ready for publication via the Web, intranets, extranets, CD-ROM, and more. Sophisticated productivity features streamline processing from start to finish, so you can get your jobs done more efficiently than ever.

When it's done, don't forget to File > Save the document. And there you have it. (At this point, I always like to do a little test by running a quick search on a word that I see on the first page. It just makes me feel better to know that it worked. I also have a continuing dialogue about what to do with the original TIFF file...)

As I said, if your image file is from a laser printed copy, and it's a decent scan, the OCR accuracy is amazingly good. But it may have garbled some words, so if you want to get really fancy, go back to Document > Paper Capture and select "Find first OCR suspect" or "Find all OCR suspects." This identifies characters that the OCR engine had problems with, and gives you a chance to correct the text. You can fix the spelling if it's important to you -- say for a proper name or term. That way you can be sure that the search software will find it. Otherwise, for a common word, I'd just save time and let it slide.
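To make the idea of an "OCR suspect" concrete, here is a rough sketch in Python. I don't know how Acrobat decides what counts as a suspect, so this assumes the OCR engine hands back a confidence score between 0 and 1 for each word and treats anything under a cutoff as suspect. The OcrWord class, the function names, the 0.85 cutoff, and the sample scores are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OcrWord:
    text: str          # what the OCR engine thinks the word is
    confidence: float  # assumed score from 0.0 (guess) to 1.0 (certain)


def find_all_suspects(
    words: List[OcrWord], threshold: float = 0.85
) -> List[Tuple[int, OcrWord]]:
    """Return (position, word) for every word scored below the cutoff."""
    return [(i, w) for i, w in enumerate(words) if w.confidence < threshold]


def find_first_suspect(
    words: List[OcrWord], threshold: float = 0.85
) -> Optional[Tuple[int, OcrWord]]:
    """Return the first word scored below the cutoff, or None if all pass."""
    for i, w in enumerate(words):
        if w.confidence < threshold:
            return i, w
    return None


# Made-up sample: a proper name and a legal term that the engine garbled.
sample_words = [
    OcrWord("Smith", 0.98),
    OcrWord("Srnith", 0.41),
    OcrWord("agreement", 0.93),
    OcrWord("c1ause", 0.62),
]

for position, word in find_all_suspects(sample_words):
    print(position, word.text, word.confidence)
```

Raising the cutoff flags more words for review and lowering it flags fewer, which is the same trade-off as deciding whether to fix only the proper names or every common word.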
More details: www.softi.co.uk
