The following glossary covers common document imaging terms that you should be familiar with as you move forward in the market. The terms are loosely ordered from the beginning of an imaging process to the end:
Scanning: This is the physical process of converting paper documents to electronic images. It is accomplished through a scanner, a device that contains cameras, lighting, and other imaging electronics.
ADF (automatic document feeder): This is what separates dedicated document scanners from photo scanners. An ADF is a paper tray that enables a user to place multiple documents in a stack, which will then automatically be run through a scanner in succession.
MFP (multi-function peripheral): A printer that also offers document copying and scanning capabilities.
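Double-feed detection: A scanner feature that senses when two or more sheets are pulled through the ADF at the same time, so that overlapped pages are not missed. It is typically implemented with ultrasonic or paper-length sensors that pause the feed when an overlap is detected, and it works alongside other paper-handling techniques designed to feed paper smoothly and prevent jams.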
Capture: The entire process, from scanning through data capture, associated with converting a paper document to electronic information.
Image processing: This is a set of technologies typically applied to a document image as it’s being captured. Auto-color detection and blank-page detection, as well as deskewing and despeckling, are common processes designed to improve the quality of the images being output.
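As a rough illustration of blank-page detection, the sketch below treats a page as blank when only a tiny share of its pixels are dark. It assumes an 8-bit grayscale image held as a list of rows. The function name and both cutoff values are hypothetical, and commercial scanners use their own, more sophisticated methods.

```python
def is_blank_page(pixels, dark_cutoff=128, max_dark_fraction=0.005):
    """Return True when the share of dark pixels is at or below max_dark_fraction.

    pixels: list of rows, each a list of grayscale values (0 = black, 255 = white).
    dark_cutoff and max_dark_fraction are illustrative defaults, not vendor settings.
    """
    total = sum(len(row) for row in pixels)
    if total == 0:
        return True  # an empty image carries no content
    dark = sum(1 for row in pixels for value in row if value < dark_cutoff)
    return dark / total <= max_dark_fraction


# A white page with a few specks still counts as blank; a page with text does not.
speckled = [[255] * 100 for _ in range(100)]
speckled[10][10] = 0
speckled[50][70] = 0

printed = [[255] * 100 for _ in range(100)]
for row in range(20, 30):
    for col in range(10, 90):
        printed[row][col] = 0

print(is_blank_page(speckled), is_blank_page(printed))
```

Allowing a small fraction of dark pixels keeps stray specks from causing a blank page to be retained.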
Grayscale thresholding: An advanced image-processing technique that uses grayscale data to create higher-quality bitonal images. It’s typically very effective on lower-contrast images and in applications where OCR/ICR is being applied.
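One well-known way to choose a threshold from grayscale data is Otsu's method, which picks the cutoff that best separates dark pixels from light ones. The sketch below is a minimal pure-Python version, assuming an 8-bit grayscale image stored as a list of rows. It is not the technique of any particular scanner vendor, whose thresholding is often proprietary and adaptive.

```python
def otsu_threshold(pixels):
    """Pick the cutoff that maximizes between-class variance (Otsu's method)."""
    histogram = [0] * 256
    for row in pixels:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += histogram[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += level * histogram[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def to_bitonal(pixels, threshold):
    """Map pixels at or below the threshold to 0 (black) and the rest to 1 (white)."""
    return [[0 if value <= threshold else 1 for value in row] for row in pixels]


# Low-contrast sample: faint text (110) on a gray background (150).
page = [[150, 150, 110, 150],
        [150, 110, 110, 150]]
cutoff = otsu_threshold(page)
bitonal = to_bitonal(page, cutoff)
```

A fixed cutoff of 128 would turn this entire sample black, which is why deriving the threshold from the grayscale data helps on low-contrast originals.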
TIFF: A document image format often utilized in ECM systems. TIFF files are typically bitonal and incorporate Group 4 compression. A dedicated TIFF viewer is often required, as most browsers do not support TIFF.
PDF: A document format utilized in some document imaging applications. PDF is published as an ISO standard (ISO 32000), while TIFF is a publicly available specification maintained by Adobe, with only specialized variants such as TIFF/IT standardized by ISO. PDF is a richer, more complex format, which can be a double-edged sword for document imaging.
OCR/ICR (optical/intelligent character recognition): Technology for electronically recognizing numbers, letters, and words on a document image. Although it’s not technically correct, many people refer to OCR as machine print recognition and ICR as handprint recognition. OCR/ICR can be applied to an entire page for file conversion (image to word processing, for example) or full-text indexing, or to individual fields for data extraction.
Forms processing: Extracting data from a document image. OCR/ICR is typically utilized, although mark recognition (think bubble tests), key-from-image techniques, and database lookups can also be applied. Captured data is typically fed into another software application like an ERP, accounting, or ECM system.
IDR (intelligent document recognition): Advanced forms processing technology that has helped expand the use of forms processing from strictly structured forms (tax, health insurance claims, surveys, etc.) to semi-structured (invoices) and even unstructured (correspondence) documents.
Auto-classification: A form of IDR used to identify document type; typically utilized to automate the routing of images to the next step in a workflow.
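To make the routing idea concrete, here is a deliberately simple sketch that guesses a document type by counting keywords in OCR text. The rule table, the type names, and the `manual_review` fallback are all hypothetical. Real IDR products rely on far richer techniques, such as layout analysis and trained models.

```python
# Hypothetical keyword rules; a real deployment would tune or learn these.
ROUTING_RULES = {
    "invoice": ("invoice", "amount due", "remit to"),
    "claim": ("claim number", "policy number", "insured"),
}


def classify(ocr_text, rules=ROUTING_RULES, default="manual_review"):
    """Return the document type whose keywords appear most often in the text."""
    text = ocr_text.lower()
    scores = {
        doc_type: sum(text.count(keyword) for keyword in keywords)
        for doc_type, keywords in rules.items()
    }
    best_type = max(scores, key=scores.get)
    # With no keyword hits at all, send the image to a person instead of guessing.
    return best_type if scores[best_type] > 0 else default


route = classify("INVOICE #123  Amount due: $40.00")
```

Falling back to manual review when nothing matches keeps unrecognized documents from being routed to the wrong workflow step.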
Confidence levels: Feedback on how confident an automated recognition application is in the accuracy of the data being captured. Thresholds can be set, and if the confidence level falls below a certain percentage, manual intervention can be invoked.
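The threshold idea can be sketched in a few lines. The example assumes the recognition engine reports a confidence between 0.0 and 1.0 for each field. The `FieldResult` type, the field names, and the 90 percent default are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def fields_needing_review(fields, threshold=0.90):
    """Return the fields whose confidence falls below the threshold."""
    return [field for field in fields if field.confidence < threshold]


results = [
    FieldResult("invoice_number", "INV-1042", 0.99),
    FieldResult("total", "1,280.00", 0.72),
    FieldResult("date", "2024-03-01", 0.90),
]
flagged = fields_needing_review(results)
```

Only the `total` field is flagged here, so an operator would key or confirm that one value rather than the whole document.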
QA (quality assurance): Human intervention in automated document capture processing. It can be invoked for every document and/or for a random sampling, and it can be based on confidence levels.
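A QA policy that combines both triggers might look like the following sketch: every document below a confidence threshold is reviewed, along with a random sample of the rest. The function name, the 5 percent sample rate, and the `(document_id, confidence)` input shape are assumptions made for illustration.

```python
import random


def select_for_qa(documents, min_confidence=0.90, sample_rate=0.05, rng=None):
    """Pick documents for human review.

    documents: iterable of (document_id, confidence) pairs.
    A document is selected when its confidence is below min_confidence,
    or when it is drawn by the random sample.
    """
    rng = rng or random.Random()
    selected = []
    for document_id, confidence in documents:
        if confidence < min_confidence or rng.random() < sample_rate:
            selected.append(document_id)
    return selected


batch = [("doc-1", 0.98), ("doc-2", 0.61), ("doc-3", 0.95)]
# With sampling switched off, only the low-confidence document is selected.
low_confidence_only = select_for_qa(batch, sample_rate=0.0)
# A sample rate of 1.0 reproduces "review every document".
everything = select_for_qa(batch, sample_rate=1.0)
```

Passing a seeded `random.Random` as `rng` makes the sample repeatable, which can help when auditing the QA process itself.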
Workflow: The ability to move documents among the various people and systems that need to access the information on them to complete a business process. Often, document imaging is utilized to facilitate workflow automation, as moving electronic documents among multiple parties can be more efficient than working with paper documents.
ECM (enterprise content management): The technology group under which document imaging is often included. Document images are content. Management features typically include automated workflow, access control, mark-up capabilities, and lifecycle/records management.
Metadata: Descriptive information about a document. For an image, it might include a date, the type of file, an account number or name, and other relevant information depending on the application. Metadata is often applied at the capture stage of an imaging process and utilized in workflow, retrieval, and archiving.
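As a small sketch of how metadata applied at capture supports later retrieval, the record below carries the kinds of values mentioned above. The `DocumentMetadata` type and its field names are hypothetical and not drawn from any particular ECM product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DocumentMetadata:
    document_id: str
    document_type: str   # e.g. "invoice"
    file_format: str     # e.g. "TIFF" or "PDF"
    captured_on: date
    account_number: str
    extra: dict = field(default_factory=dict)  # application-specific values


def find_by_account(records, account_number):
    """Retrieve every record indexed under the given account number."""
    return [r for r in records if r.account_number == account_number]


archive = [
    DocumentMetadata("img-001", "invoice", "TIFF", date(2024, 3, 1), "ACCT-17"),
    DocumentMetadata("img-002", "correspondence", "PDF", date(2024, 3, 2), "ACCT-22"),
    DocumentMetadata("img-003", "invoice", "TIFF", date(2024, 3, 5), "ACCT-17",
                     extra={"po_number": "PO-5531"}),
]
matches = find_by_account(archive, "ACCT-17")
```

The same fields that drive retrieval here could also drive workflow routing or archiving decisions.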
Records management: Software for controlling the lifecycle of a document. It typically comes into play after a document has reached its final format and is being archived for compliance (either internal policy or external regulation) purposes. It helps organizations effectively manage their archives by enforcing consistent processes related to the disposal of information.
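A consistent disposal process often comes down to a retention rule applied the same way to every record. The sketch below assumes a retention period counted in whole years from the archive date. The function names and the seven-year figure are illustrative only, and actual retention periods depend on the policy or regulation involved.

```python
from datetime import date


def disposal_date(archived_on, retention_years):
    """Return the first date on which the record may be disposed of."""
    target_year = archived_on.year + retention_years
    try:
        return archived_on.replace(year=target_year)
    except ValueError:
        # February 29 archived date landing in a non-leap year.
        return archived_on.replace(year=target_year, day=28)


def is_eligible_for_disposal(archived_on, retention_years, today):
    """True once the retention period has fully elapsed."""
    return today >= disposal_date(archived_on, retention_years)


archived = date(2015, 3, 1)
too_early = is_eligible_for_disposal(archived, 7, today=date(2022, 2, 28))
due = is_eligible_for_disposal(archived, 7, today=date(2022, 3, 1))
```

Here `too_early` is False and `due` is True, since the seven-year period ends on March 1, 2022.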
These are not the only terms that you will run into in the field, but this glossary should be enough to get you started and to support clear communication among you, your vendors, and your customers.