Kofax KTM – Understanding Your Kofax System’s Potential
March 17, 2015
Adrian Enders, Senior ECM Consultant
DoxTek, Inc.
References
1. Van Ittersum, Randy and Erin Spalding, 2005. "Understanding the Difference Between Structured and Unstructured
Documents." http://www.disusa.com/privatelibrary/documents/WP_Structured_&_Unstructured_Documents.pdf
2. Fenton, Paul, January 6, 2014. "10 Benefits of Moving to Electronic Document Management System." http://blog.montrium.com/blog/10-benefits-of-moving-to-electronic-document-management-for-life-sciencecompanies
Overview of KTM
Kofax Transformation Modules (KTM) is an advanced forms processing engine designed to identify
electronic documents and then automatically extract data from the document. KTM specializes in
processing unstructured documents. A structured document has the same layout every time, and its
data always appears in the same location on the page; examples include a W-2, an IRS Form 1040, or
a Form 4506-T. With unstructured documents, data can appear in unexpected places on the
document. KTM can be used for many applications; one of the most obvious is to support uploading
documents into an Electronic Document Management System (EDMS).
There are many sound business reasons to employ an EDMS: lowering manual labor costs, facilitating
collaboration, and/or increasing security and control. One of the major steps in deploying an EDMS, or in
moving closer to a paperless process, is converting paper documents into electronic images.
Converting paper to electronic images is just part of the process; you also need to consider how the
documents will be retrieved in the future. An EDMS allows you to assign keywords that describe
or identify the document. For example, on a W-2 the form name (W-2) and employee name (e.g., Jim
Shoe) could be used to retrieve the W-2 form for a specific employee. Rather than having employees
manually assign and type in the keywords for each document, KTM can automatically extract the data
and assign it to the image. This leaves your employees with more time for doing business rather than
typing data into an EDMS.
There are three steps in processing these electronic documents: separating, classifying, and extracting
the data.
Separation
Separation determines where a document starts and stops. In the ideal situation, you would want to
scan the documents by simply stacking them into a document scanner and pressing "Start." Minimal
pre-scanning preparation is most effective. KTM can automatically separate a stack of documents after
scanning.
Join us!
For more information contact us at: [email protected] or 866-678-8400. www.doxtek.com
KTM uses several methods to automatically separate pages into documents. One method is to insert
separator pages that contain a unique look or unique data to indicate to KTM that this page is the start
of a new document. This provides a high level of accuracy but requires preparation time for your
employees to create and insert the separator pages in the correct location before scanning.
If the first pages of the documents being scanned are all different from each other or contain some
unique data, then KTM can be set up to perform automatic separation based on these unique
characteristics. Scanned documents are compared to a sample set of documents given to KTM to
determine the first page, and then KTM will separate each time a sample page is found in the stack.
When a sample page is identified, all pages following that page are associated as one document until
KTM identifies the next sample page. This method generally requires only 1-5 copies of each document
as samples to determine separation.
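The first-page method described above can be illustrated with a minimal Python sketch. This is not KTM's implementation or API: the function names, the sample headings, and the simple text comparison are all assumptions made for illustration, standing in for KTM's comparison of each scanned page against its sample set.

```python
from typing import Callable, List


def separate_pages(pages: List[str],
                   is_first_page: Callable[[str], bool]) -> List[List[str]]:
    """Group a scanned stack into documents, one per recognized first page."""
    documents: List[List[str]] = []
    for page in pages:
        # Start a new document at every recognized first page
        # (and at the very first page of the stack, whatever it is).
        if is_first_page(page) or not documents:
            documents.append([page])
        else:
            # Every other page belongs to the document currently being built.
            documents[-1].append(page)
    return documents


# Hypothetical stand-in for the sample set: headings found on known first pages.
SAMPLE_FIRST_PAGE_HEADINGS = ("INVOICE", "FORM W-2")


def looks_like_first_page(page_text: str) -> bool:
    """Assumed check: the page text begins with one of the sample headings."""
    return page_text.strip().upper().startswith(SAMPLE_FIRST_PAGE_HEADINGS)


stack = [
    "Invoice No. 1001",
    "Line items, continued",
    "Form W-2 Wage and Tax Statement",
    "Notice to employee",
    "Invoice No. 1002",
]
documents = separate_pages(stack, looks_like_first_page)
print([len(doc) for doc in documents])  # [2, 2, 1]
```

The five-page stack is split into three documents because three pages match a sample first page; all pages in between are attached to the document that precedes them.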
In some cases, documents are more complex or do not vary in appearance enough for KTM to
distinguish between the start and end of a document. In this case, KTM can be configured for trainable
document separation. The content of the document is analyzed to determine first, middle, and last
pages. This requires that more sample images be provided to KTM as a base set to identify the first and last
pages in a document.
Classification
Classification identifies what type of document is being scanned. Classification answers the question,
"What document is this?" Is it a W-2 form, a travel request, or an employee’s I-9 form? There are three
methods that KTM uses to identify or classify the document.
First, the layout, or the "look" of the document, is used to determine what the document is. This is
generally the easiest and fastest method if feasible.
Second, the content of the document is used to determine what the document is. This requires that
Optical Character Recognition (OCR) be performed first to locate the words, so the process takes a little
longer than layout classification. KTM then runs a series of algorithms to match the words to its trained
document set.
Finally, specific instructions can be applied to the classification set. If the document specifically contains
the words "invoice number," "invoice amount," and "amount due," then it's probably an invoice.
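An instruction-based rule of this kind can be sketched in a few lines of Python. The rule table and function name below are hypothetical and only illustrate the idea; the invoice phrases come from the example above, while the travel request phrases are invented for the sketch.

```python
from typing import Dict, List, Optional

# Hypothetical rule table: a document type applies when ALL of its phrases appear.
RULES: Dict[str, List[str]] = {
    "invoice": ["invoice number", "invoice amount", "amount due"],
    "travel request": ["travel request", "destination"],  # invented example rule
}


def classify_by_rules(ocr_text: str,
                      rules: Dict[str, List[str]] = RULES) -> Optional[str]:
    """Return the first document type whose required phrases all occur in the text."""
    # Lower-case and collapse whitespace so line breaks in the OCR output
    # do not hide a phrase such as "amount   due".
    text = " ".join(ocr_text.lower().split())
    for doc_type, phrases in rules.items():
        if all(phrase in text for phrase in phrases):
            return doc_type
    return None  # no rule matched; fall back to layout or content classification


page = "Invoice Number: 1001\nInvoice Amount: $250.00\nAmount   Due: $250.00"
print(classify_by_rules(page))  # invoice
```

Returning None when no rule matches leaves room for the layout and content methods to make the decision instead.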
As you probably noticed, document classification can be closely intertwined with document separation.
Documents are not always separated before they are classified. These classification techniques can be
blended to provide successful automatic document separation and classification.
Extraction
Extraction is the process of reading the data on the page. OCR is performed to locate the words on the
pages of the document. Remember that KTM is primed for extracting data from unstructured
documents. When your company receives invoices from your vendors, each vendor's
invoice will look different but contain the same kinds of data. KTM uses key words as markers on the document
to find the data. For example, the invoice amount will always be currency and probably close to words
like "total," "total due," "pay amount," or something similar.
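The key-word idea can be sketched as follows. This is an illustrative simplification, not KTM's locator logic: it assumes the OCR text is available line by line and that the amount appears on the same line, to the right of the key word. The function name and regular expressions are assumptions.

```python
import re
from typing import List, Optional

# Key words that usually sit next to the invoice amount (taken from the text above).
# Word boundaries keep "total" from matching inside "Subtotal".
KEYWORD = re.compile(r"\b(?:total due|total|pay amount)\b", re.IGNORECASE)

# A currency value such as 216.00 or 1,250.00, with an optional dollar sign.
CURRENCY = re.compile(r"\$?\s*(\d+(?:,\d{3})*\.\d{2})")


def find_invoice_amount(ocr_lines: List[str]) -> Optional[str]:
    """Return the first currency value found to the right of an amount key word."""
    for line in ocr_lines:
        keyword = KEYWORD.search(line)
        if keyword is None:
            continue
        # Only look at the text that follows the key word on the same line.
        amount = CURRENCY.search(line, keyword.end())
        if amount:
            return amount.group(1)
    return None


invoice_lines = [
    "Acme Supply Co.",
    "Subtotal: $200.00",
    "Tax: $16.00",
    "Total Due: $216.00",
]
print(find_invoice_amount(invoice_lines))  # 216.00
```

Because the search is anchored to the key word rather than to a page position, the same code works when another vendor prints the total in a different place.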
Some pieces of data follow specific patterns, such as amounts, dates, and phone numbers. KTM also uses
these format patterns to find information.
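Format patterns are commonly expressed as regular expressions. The sketch below shows the general technique in Python; the three patterns are simplified assumptions (US-style dates, phone numbers, and dollar amounts) and do not represent how KTM defines its formats.

```python
import re
from typing import Dict, List

# Simplified, assumed formats; real documents need broader patterns.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),   # e.g. 03/17/2015
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),      # e.g. 801-555-0142
    "amount": re.compile(r"\$\d+(?:,\d{3})*\.\d{2}"),   # e.g. $1,250.00
}


def find_by_format(text: str) -> Dict[str, List[str]]:
    """Return every substring of the text that matches each format pattern."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = "Invoice date 03/17/2015. Questions? Call 801-555-0142. Amount due $1,250.00."
found = find_by_format(sample)
print(found["date"], found["phone"], found["amount"])
```

Format patterns find candidates without relying on any nearby key word, so they combine well with the key-word approach: the pattern proposes values, and the surrounding words help decide which one is the invoice amount.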
Additionally, the content of the document can be compared to external sources for identification. KTM
can compare the document against a database table of vendors to find the vendor name and address, and
then automatically associate the document with that vendor with confidence.
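A database comparison of this kind has to tolerate OCR errors, so it is usually a fuzzy match with a confidence threshold rather than an exact lookup. The following Python sketch uses the standard library's difflib for the similarity score; the vendor table, vendor IDs, threshold, and function name are all hypothetical, and KTM's own matching method may differ.

```python
import difflib
from typing import Dict, Optional, Tuple

# Hypothetical vendor table: vendor name -> vendor ID.
VENDORS: Dict[str, str] = {
    "Acme Supply Co.": "V-1001",
    "Globex Corporation": "V-1002",
}


def match_vendor(ocr_text: str,
                 vendors: Dict[str, str] = VENDORS,
                 cutoff: float = 0.8) -> Optional[Tuple[str, str, float]]:
    """Return (vendor_id, vendor_name, score) for the best match, or None."""
    best_name: Optional[str] = None
    best_score = 0.0
    for line in ocr_text.splitlines():
        candidate = line.strip().lower()
        for name in vendors:
            # ratio() is 1.0 for identical strings and falls toward 0.0 as they differ.
            score = difflib.SequenceMatcher(None, candidate, name.lower()).ratio()
            if score > best_score:
                best_name, best_score = name, score
    # Only accept the match when the similarity reaches the assumed cutoff.
    if best_name is not None and best_score >= cutoff:
        return vendors[best_name], best_name, best_score
    return None


# The OCR engine misread the "l" in "Supply" as the digit "1".
scanned = "Acme Supp1y Co.\n123 Main Street\nInvoice Number: 1001"
print(match_vendor(scanned))
```

The score doubles as a confidence value: a document that falls below the cutoff can be routed to an employee for review instead of being associated with a vendor automatically.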
Summary
KTM can be used for a variety of applications; perhaps most notably, it automates the extraction of the data
that an EDMS needs to index and retrieve documents. There are many business reasons to employ an EDMS. Although
cost has historically been a driving factor for many companies to implement such a system, the desire to
protect data becomes increasingly important as concerns grow regarding security and control of internal
information and documentation. DoxTek is ready and willing to analyze the current processes used within any
company that is looking to employ an EDMS or is interested in reviewing its current Kofax system.
Adrian Enders