You know your office needs to go digital, but the thought of scanning all of those paper files, one page at a time, has you a bit discouraged. Sure, it would be nice to have a digital copy of every document, but would you have to shut down the office for a week to scan all of those files and then tag them?
Fortunately, some document management software uses a tool called optical character recognition (OCR) to help automate the task of organizing the files.
When you scan a document, you’re basically sending a photo of the page to the computer. A photo of a page might be fine for reading, but it’s not helpful if you want the text to be searchable. Optical character recognition software bridges this gap. OCR looks at the scanned image, recognizes the letters and numbers in it, and converts them to editable, searchable text. If you have a lot of documents to scan, OCR is indispensable. You can digitize everything from books to old student papers.
Depending upon what you want to do with the converted text, you may save it as a word processing file for editing or as a PDF for reading. Printed tables can be converted to spreadsheets so their data can be easily accessed and added to a large database. Some OCR software can even convert printed documents to HTML, making them viewable on the web.
The text created by OCR is editable, which might be the end goal for many users who want to take a lengthy text and put it on the web or in a digital document without having to retype the entire thing. But this use is just the beginning of what OCR can do. Because the software can “read” each document, it can also decide where to file it, so you can effectively auto-archive reams of paper just by feeding them into the scanner.
Let’s say you run a law firm and you have boxes of files for many different cases. When you digitize the files, you can teach the software to organize them by a unique identifier, like a case number. You set the tags for a given case number once (defendant, charges, outcome, court, etc.), and the software assigns them to every file in which it reads that case number.
This way, if you want to compile a list of all of the times you successfully defended a client against a jaywalking charge, all of the metadata will already be in place.
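To make the law firm example concrete, here is a minimal sketch of tagging by case number. It is not how any particular document management product works. The case number format, the `CASE_TAGS` table, and the function names `tag_document` and `matching_cases` are all hypothetical, and the input is assumed to be text that OCR has already produced.

```python
import re

# Hypothetical tag table: someone enters the tags for each case number once.
CASE_TAGS = {
    "2019-CR-0417": {"defendant": "J. Doe", "charges": "jaywalking",
                     "outcome": "acquitted", "court": "Municipal Court"},
    "2020-CR-0032": {"defendant": "R. Roe", "charges": "jaywalking",
                     "outcome": "convicted", "court": "Municipal Court"},
}

# Assumed case number format (e.g. 2019-CR-0417); a real firm would adapt this.
CASE_NUMBER = re.compile(r"\b\d{4}-[A-Z]{2}-\d{4}\b")


def tag_document(ocr_text):
    """Return the tags for the first known case number found in the text.

    Returns None when no case number is found or the number is unknown,
    so the page can be set aside for a person to file by hand.
    """
    match = CASE_NUMBER.search(ocr_text)
    if match is None:
        return None
    case_number = match.group(0)
    tags = CASE_TAGS.get(case_number)
    if tags is None:
        return None
    return {"case_number": case_number, **tags}


def matching_cases(tagged_docs, **criteria):
    """List the distinct case numbers whose tags match every criterion."""
    return sorted({
        doc["case_number"]
        for doc in tagged_docs
        if all(doc.get(key) == value for key, value in criteria.items())
    })


# Usage: tag a few scanned pages, then pull the successful jaywalking defenses.
pages = [
    "Case No. 2019-CR-0417, State v. Doe. Motion to dismiss granted.",
    "Re: 2020-CR-0032. Sentencing memorandum.",
    "Office supply invoice, no case number on this page.",
]
tagged = [tags for tags in map(tag_document, pages) if tags is not None]
wins = matching_cases(tagged, charges="jaywalking", outcome="acquitted")
```

The tags are entered once per case and copied onto every page that mentions that case, which is why the final search needs no extra data entry.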
If you’re considering optical character recognition software, look past the bells and whistles on the interface and make sure you choose the one with the most accurate recognition you can find. If you’re just scanning a 10-page report, you can afford a few errors that you can proofread and fix; it’s still better than retyping it. But if you’re scanning thousands of pages, you can’t afford an OCR program that’s inaccurate, because nobody will have time to proofread them all.
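A quick back-of-the-envelope calculation shows why accuracy matters more as the page count grows. The numbers below are assumptions for illustration only: roughly 2,000 characters per page, and accuracy measured per character. The function name `expected_errors` is hypothetical, and real accuracy varies with scan quality and typeface.

```python
def expected_errors(pages, chars_per_page, accuracy):
    """Estimate how many characters OCR will get wrong.

    accuracy is the assumed fraction of characters recognized correctly,
    e.g. 0.99 for 99 percent.
    """
    total_chars = pages * chars_per_page
    return round(total_chars * (1 - accuracy))


CHARS_PER_PAGE = 2000  # assumed average for a typed page

short_report = expected_errors(10, CHARS_PER_PAGE, 0.99)
big_archive = expected_errors(5000, CHARS_PER_PAGE, 0.99)
better_ocr = expected_errors(5000, CHARS_PER_PAGE, 0.999)
```

Under these assumptions, a 10-page report at 99 percent accuracy has about 200 wrong characters, which one person can proofread. A 5,000-page archive at the same accuracy has about 100,000, and even at 99.9 percent it still has about 10,000.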