A growing concern for organizations is safeguarding customers’ personally identifiable information (PII). The problem is complex because PII can appear in unstructured form, on different document types, submitted through different channels. Because of this complexity, redaction processing time becomes a factor, especially when large volumes of documents must be viewed and redacted.
The redaction process can be automated, with an exception queue set up for manual processing of documents that fail to redact. Manual processing requires a user interface (UI) in which the user can view the document, redact it if necessary, and save it back to the document management system.
An automated redaction process requires two components: a workflow engine and redaction software. The workflow engine controls all I/O functions, while the redaction software performs the processing within the individual work steps. In a typical workflow, the workflow engine makes calls to the redaction application, which in turn performs specific functions. For instance, the workflow engine retrieves a document from the document management system and calls the redaction application, which initiates the image cleanup and OCR processes. The workflow engine records the history of the process in an extensive log, with an entry for each event.
The primary work steps in the process are as follows:
• Image Retrieval – from the document management system document repository
• Image Cleanup – despeckle, deskew, and image enhancement
• OCR – to render document content to text
• Document Identification – in order to determine whether a document needs to be redacted or not
• Application of Redaction Rules – to locate and redact necessary PII fields, identify exceptions for manual redaction, and import back to the document management system
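The division of labor between the workflow engine and the work steps can be illustrated with a minimal sketch. It assumes each work step is a plain function that takes and returns a document; the names `Document`, `run_workflow`, `image_cleanup`, and `ocr` are hypothetical and not taken from any particular product. The engine only sequences the steps and logs each event.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    doc_id: str
    content: str  # stands in for image data or OCR text
    history: List[str] = field(default_factory=list)  # per-event log


# A work step takes a document and returns the (possibly modified) document.
WorkStep = Callable[[Document], Document]


def run_workflow(doc: Document, steps: List[WorkStep]) -> Document:
    """Run the work steps in order, logging the start and end of each one."""
    for step in steps:
        doc.history.append(f"start {step.__name__}")
        doc = step(doc)
        doc.history.append(f"done {step.__name__}")
    return doc


# Placeholder work steps; real ones would call the redaction software.
def image_cleanup(doc: Document) -> Document:
    return doc


def ocr(doc: Document) -> Document:
    return doc


result = run_workflow(Document("doc-1", "scanned page"), [image_cleanup, ocr])
print(result.history)
```

Keeping the steps as interchangeable functions means a step can be added or reordered without changing the engine, which is one plausible way to structure such a pipeline.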
Image retrieval requires the workflow application to integrate with the document management system document repository. Images are retrieved either in automated batch mode or manually, one image at a time.
Image quality is difficult to control when capture is distributed and the source is unpredictable (fax, multi-function device, scanner), so a work step for image cleanup is extremely important. Image cleanup produces a sharper image with a smaller file size. The sharper image improves redaction accuracy when identifying information on a document, and the smaller file size reduces processing time.
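As a rough illustration of one cleanup operation, the sketch below despeckles a bitonal image by removing isolated black pixels. It assumes the image is a list of rows with 1 for black and 0 for white, and that a "speckle" is a black pixel with no black neighbor; production cleanup software is likely to use more sophisticated filters, and the function name `despeckle` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # 1 = black (ink), 0 = white (background)


def despeckle(image: Image) -> Image:
    """Return a copy with isolated black pixels (no black neighbors) removed."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    cleaned = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            # Look at the up-to-eight surrounding pixels, clipped at the edges.
            has_neighbor = any(
                image[nr][nc] == 1
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            )
            if not has_neighbor:
                cleaned[r][c] = 0
    return cleaned


page = [
    [1, 0, 0, 0],  # lone pixel at top left: treated as a speckle
    [0, 0, 1, 1],  # two adjacent pixels: treated as ink
    [0, 0, 0, 0],
]
print(despeckle(page))
```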
During OCR, each character is returned with an accuracy confidence value and alternate character information. Because one incorrect character can drastically change a word, the recognition engine also reports a certainty value for every recognized word. Advanced font and location information allows the redaction application to create a text representation of the original with a similar layout.
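One way to reason about character and word certainty is sketched below. The `OcrChar` structure, the 0.0 to 1.0 confidence scale, the 0.80 threshold, and the rule of scoring a word by its weakest character are all assumptions made for illustration; an actual recognition engine may compute word certainty differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OcrChar:
    char: str
    confidence: float      # 0.0 to 1.0, as reported by a hypothetical engine
    alternates: List[str]  # other candidate characters


def word_confidence(chars: List[OcrChar]) -> float:
    """Score a non-empty word by its weakest character (assumed rule)."""
    return min(c.confidence for c in chars)


def needs_review(chars: List[OcrChar], threshold: float = 0.80) -> bool:
    """Flag a word whose certainty falls below the threshold."""
    return word_confidence(chars) < threshold


# "SSN" where the middle character was possibly misread as "5".
word = [OcrChar("S", 0.99, []), OcrChar("5", 0.55, ["S"]), OcrChar("N", 0.97, [])]
print(word_confidence(word), needs_review(word))
```

Taking the minimum reflects the point that a single doubtful character is enough to make the whole word doubtful.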
For document identification, the system administrator sets up a profile for each document type. This profile is used to identify the document and to instruct the application to either redact the document or exclude it from redaction. If the document is excluded from redaction, it is imported into the document management system without redaction. If the document is tagged for redaction, it is sent to the next step, “Application of Redaction Rules”.
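A profile of this kind might be modeled as follows. This is a sketch under stated assumptions: identification is done by keyword matching on the OCR text, and the names `DocumentProfile`, `identify`, and the example profiles are hypothetical. The text above does not say how unidentified documents are handled, so the sketch simply returns `None` and leaves that decision to the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocumentProfile:
    name: str
    keywords: List[str]  # phrases expected in the OCR text of this document type
    redact: bool         # True: send to redaction rules; False: import as-is


def identify(text: str, profiles: List[DocumentProfile]) -> Optional[DocumentProfile]:
    """Return the first profile whose keywords all appear in the text."""
    lowered = text.lower()
    for profile in profiles:
        if all(k.lower() in lowered for k in profile.keywords):
            return profile
    return None


PROFILES = [
    DocumentProfile("loan_application", ["loan application", "social security"], redact=True),
    DocumentProfile("marketing_flyer", ["special offer"], redact=False),
]

match = identify("LOAN APPLICATION\nSocial Security Number: ...", PROFILES)
print(match.name if match else "unidentified")
```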
Application of Redaction Rules and Redaction
A comprehensive set of redaction rules is used to locate PII fields that require redaction. The application identifies the beginning and end of each field to be redacted and applies an overlay to obscure the information. The overlay is user-definable and can be a solid color or a set of characters. If a document was tagged for redaction but the redaction process failed to find any PII, the document is moved to an exception queue. From the exception queue, users can retrieve the document in the manual redaction viewer and redact it manually. The redacted document is then imported into the document management system.
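The rule-and-overlay logic can be sketched on plain text. The example assumes rules are regular expressions (here, illustrative patterns for a US Social Security number and a 16-digit card number), that the overlay is a character repeated over the matched span, and that the exception queue is a simple list; `RULES`, `redact`, and `process` are hypothetical names. A real system would apply the overlay to image coordinates rather than to text.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rules: field name -> pattern locating that field.
RULES: Dict[str, re.Pattern] = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def redact(text: str, overlay: str = "X") -> Tuple[str, int]:
    """Overlay every rule match; return the redacted text and the match count.

    The overlay must be non-empty; it is repeated and trimmed so that the
    redacted text keeps the original length and layout.
    """
    spans: List[Tuple[int, int]] = []
    for pattern in RULES.values():
        spans.extend(m.span() for m in pattern.finditer(text))
    chars = list(text)
    for start, end in spans:
        width = end - start
        chars[start:end] = list((overlay * width)[:width])
    return "".join(chars), len(spans)


def process(text: str, exception_queue: List[str]) -> str:
    """Redact a document tagged for redaction; queue it if no PII was found."""
    redacted, count = redact(text)
    if count == 0:
        exception_queue.append(text)  # held for manual redaction
    return redacted


queue: List[str] = []
print(process("SSN: 123-45-6789", queue))
print(process("No identifiers on this page", queue), len(queue))
```

Recording the beginning and end of each match before overlaying mirrors the two-stage description above: locate the field first, then obscure it.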