CEDAR-FOX is a software system for forensic comparison of handwriting. It was developed at CEDAR, the Center of Excellence for Document Analysis and Recognition at the University at Buffalo. CEDAR-FOX has capabilities for interaction with the questioned document examiner to go through processing steps such as extracting regions of interest from a scanned document, determining lines and words of text, and recognizing textual elements. The final goal is to compare two samples of writing and compute the log-likelihood ratio of the prosecution and defense hypotheses. It can also be used to compare signature samples. The software, which is protected by a United States patent, can be licensed from Cedartech, Inc.
Details
Writer verification is the task of determining whether two handwritten samples were written by the same writer. It is used in questioned document examination. Using a set of metrics, CedarFox associates a measure of confidence with the decision of whether two documents were written by the same individual or by different individuals. CedarFox allows you to select either the entire document or a specific region of a document for the comparison. The comparison is based on macro features, micro features, and style features. Two modes of writer verification are available: a questioned document is compared against a single known document, or a questioned document is compared against multiple known documents. In the second mode, the system learns the writer's habits from the known documents; at least four known documents must be available to use it. The task of identifying the writer is split into two parts,
CEDAR-FOX performs a variety of operations on documents to make them ready for comparison. These include thresholding, line removal, line segmentation, word segmentation and transcript mapping.
Image Processing
Thresholding converts a grayscale image to a binary image, separating foreground (ink) pixels from background pixels. The thresholding methods used are Otsu's thresholding, adaptive thresholding and texture thresholding.
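The methods are only named here. As a minimal sketch of the first one, Otsu's method chooses the gray level that maximizes the between-class variance of the two pixel classes it creates. This is a generic NumPy implementation, not CEDAR-FOX's code; the function names `otsu_threshold` and `binarize` are hypothetical, and the sketch assumes an 8-bit (`uint8`) image with dark ink on light paper.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold t for an 8-bit grayscale image.

    Pixels <= t form one class and pixels > t the other; t is chosen to
    maximize the between-class variance of the two classes.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of the class <= t
    mu = np.cumsum(prob * np.arange(256))    # first moment of the class <= t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                # thresholds that leave one class empty
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Foreground (ink) mask, assuming dark ink on light paper (hypothetical helper)."""
    return gray <= otsu_threshold(gray)
```

Otsu's method picks one global threshold for the whole page; adaptive thresholding instead computes a threshold per neighbourhood, which copes better with uneven lighting in a scan.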
If a document is written on ruled paper, the user can perform an underline removal operation. A Hough transform is applied for this operation, and the user can select the threshold it uses. Selecting a high threshold will result in removing some of the character strokes, so the user has to find an appropriate value for the threshold.
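A minimal sketch of Hough-based rule-line removal, assuming a binary image where `True` marks ink. The accumulator is restricted to near-horizontal angles, and `min_votes` (the number of ink pixels a line must collect) is this sketch's own threshold; it is an assumption and is not claimed to match the semantics of the threshold exposed in CEDAR-FOX. The names `find_rule_lines` and `remove_rule_lines` are hypothetical.

```python
import numpy as np


def find_rule_lines(binary, min_votes, angles_deg=np.arange(88.0, 92.5, 0.5)):
    """Hough transform restricted to near-horizontal lines.

    binary: 2-D bool array, True = ink. A line is x*cos(theta) + y*sin(theta) = rho,
    so a horizontal rule line has theta = 90 degrees and rho = its row index.
    Returns every (rho, theta) cell that received at least min_votes ink pixels.
    """
    ys, xs = np.nonzero(binary)
    thetas = np.deg2rad(angles_deg)
    max_rho = int(np.ceil(np.hypot(*binary.shape)))
    acc = np.zeros((2 * max_rho + 1, len(thetas)), dtype=int)
    for j, th in enumerate(thetas):
        rhos = np.round(xs * np.cos(th) + ys * np.sin(th)).astype(int)
        np.add.at(acc, (rhos + max_rho, j), 1)   # one vote per ink pixel
    return [(int(r) - max_rho, float(thetas[j])) for r, j in np.argwhere(acc >= min_votes)]


def remove_rule_lines(binary, min_votes, tol=0.5):
    """Erase ink pixels lying within tol of any detected rule line."""
    out = binary.copy()
    ys, xs = np.nonzero(binary)
    for rho, th in find_rule_lines(binary, min_votes):
        on_line = np.abs(xs * np.cos(th) + ys * np.sin(th) - rho) <= tol
        out[ys[on_line], xs[on_line]] = False
    return out
```

Any handwriting pixel that sits exactly on a detected rule line is erased together with it, which illustrates why line removal can damage character strokes.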
Line segmentation separates the document into individual lines of text using bivariate Gaussian densities. Word segmentation works in a similar way, separating each word within the document.
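The text does not say how the bivariate Gaussian densities are used. One plausible reading, stated here purely as an assumption, is that each text line is modelled as a 2-D Gaussian over the (x, y) coordinates of its ink pixels, and an ambiguous connected component (for example a descender that reaches toward the next line) is assigned to the line under which its centroid is most likely. The sketch below shows only that assignment step; `fit_line_gaussian`, `log_density` and `assign_to_lines` are hypothetical names.

```python
import numpy as np


def fit_line_gaussian(points):
    """Fit a bivariate Gaussian (mean, covariance) to the (x, y) ink pixels of one text line."""
    pts = np.asarray(points, dtype=float)
    mean = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False) + 1e-6 * np.eye(2)   # small ridge keeps cov invertible
    return mean, cov


def log_density(p, mean, cov):
    """Log of the bivariate normal density N(mean, cov) evaluated at point p."""
    d = np.asarray(p, dtype=float) - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + 2 * np.log(2 * np.pi))


def assign_to_lines(centroids, line_models):
    """For each component centroid, return the index of the line under which it is most likely."""
    return [int(np.argmax([log_density(c, m, s) for m, s in line_models])) for c in centroids]
```

Because a line's Gaussian is wide in x and narrow in y, vertical distance from the line's centre dominates the decision, which is the behaviour wanted when separating stacked lines of text.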
Transcript matching is a form of ground-truth matching in which the software is provided with a text file containing the transcript of the handwritten image. This is useful when different subjects are required to handwrite the same content, which is then matched with the unknown document. The software finds the best word-level alignment between the transcript and the handwritten image. The character images are then extracted and can be used to compare the similarity between the documents.
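Finding a best word-level alignment is naturally a dynamic-programming problem. The sketch below assumes a precomputed cost for pairing each transcript word with each segmented word image (for example a negative log recognition score; how CEDAR-FOX scores a pair is not described here) and a hypothetical `skip_penalty` for leaving either side unmatched, which absorbs stray marks or segmentation errors. `align_transcript` is a hypothetical name.

```python
def align_transcript(cost, skip_penalty):
    """Monotonic, minimum-cost alignment of transcript words to segmented word images.

    cost[i][j]: cost of pairing transcript word i with word image j
                (e.g. a negative log recognition score; hypothetical).
    skip_penalty: cost of leaving a transcript word or a word image unmatched.
    Returns the matched (transcript_index, image_index) pairs in reading order.
    """
    n = len(cost)
    m = len(cost[0]) if n else 0
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j and best[i - 1][j - 1] + cost[i - 1][j - 1] < best[i][j]:
                best[i][j] = best[i - 1][j - 1] + cost[i - 1][j - 1]
                move[i][j] = "match"
            if i and best[i - 1][j] + skip_penalty < best[i][j]:
                best[i][j] = best[i - 1][j] + skip_penalty
                move[i][j] = "skip_word"
            if j and best[i][j - 1] + skip_penalty < best[i][j]:
                best[i][j] = best[i][j - 1] + skip_penalty
                move[i][j] = "skip_image"
    pairs, i, j = [], n, m
    while i or j:                       # trace the optimal path back from (n, m)
        if move[i][j] == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "skip_word":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

Keeping the alignment monotonic reflects the fact that the handwriting follows the transcript's word order, so the search is O(n·m) instead of over all possible pairings.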
System Utilities
CedarFox has user interfaces for scanning documents directly, for entering results directly into spreadsheets, and for printing intermediate results. Database access is also available for storing document metadata.
Document Comparison
Many options are available with CEDAR-FOX for document comparison. The four major verification models used are:
Parametric modelling of the distance-space distribution using probability density functions (pdfs).
Computing a strength of evidence on a 9-point scale.
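A minimal sketch of how the two listed items fit together, under stated assumptions: each feature's distance between the questioned and known samples is modelled with one Gaussian pdf for same-writer pairs (prosecution hypothesis) and one for different-writer pairs (defense hypothesis), features are treated as independent so their log-likelihood ratios add, and the total is mapped onto nine opinion levels. The Gaussian choice, the level wording and especially the `CUTS` boundaries are illustrative inventions, not CEDAR-FOX's values; all function names are hypothetical.

```python
import math
from bisect import bisect_right


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def log_likelihood_ratio(distances, same_params, diff_params):
    """Sum of per-feature log-likelihood ratios, assuming independent features.

    distances: per-feature distance between the questioned and known samples.
    same_params / diff_params: per-feature (mean, std) of the distance distribution
    under the same-writer (prosecution) and different-writer (defense) hypotheses.
    """
    return sum(
        gaussian_logpdf(d, *s) - gaussian_logpdf(d, *o)
        for d, s, o in zip(distances, same_params, diff_params)
    )


# Nine opinion levels, from strongest "different writer" to strongest "same writer".
SCALE = [
    "eliminated", "highly probable did not", "probably did not", "indications did not",
    "no conclusion",
    "indications did", "probably did", "highly probable did", "identified",
]
CUTS = [-20.0, -10.0, -5.0, -1.0, 1.0, 5.0, 10.0, 20.0]  # hypothetical LLR boundaries


def strength_of_evidence(llr):
    """Map an LLR onto the 9-point scale: each cut point passed moves one level up."""
    return SCALE[bisect_right(CUTS, llr)]
```

For example, `strength_of_evidence(log_likelihood_ratio([0.2] * 3, [(0.2, 0.1)] * 3, [(0.6, 0.15)] * 3))` evaluates small distances that are typical of same-writer pairs, so the LLR is positive and the opinion lands on the "did" side of the scale.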
Searching
CedarFox has several modalities for searching handwritten documents for the presence of keywords. Word spotting allows the user to select a word image as a query, which is used to find similar word images in a specified document. Another type of search allows the user to type in a word, which is used to rank all words in the document by how likely each is to match the query.
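The ranking step of image-query word spotting can be sketched as follows, assuming each word image has already been reduced to a fixed-length feature vector. The features and the similarity measure CEDAR-FOX uses are not described here; cosine similarity is a generic stand-in, and `rank_by_similarity` is a hypothetical name. The typed-query mode is not covered.

```python
import numpy as np


def rank_by_similarity(query_vec, candidate_vecs):
    """Rank candidate word images by cosine similarity of their feature vectors to the query.

    Returns (indices, similarities), both ordered from most to least similar.
    """
    q = np.asarray(query_vec, dtype=float)
    c = np.asarray(candidate_vecs, dtype=float)
    sims = c @ q / (np.linalg.norm(c, axis=1) * np.linalg.norm(q) + 1e-12)  # epsilon avoids 0/0
    order = np.argsort(-sims, kind="stable")
    return order.tolist(), sims[order].tolist()
```

Returning a full ranking rather than a single best match lets the user inspect the top few candidates, which suits an interactive examiner workflow.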
Handwriting Recognition
CedarFox has automatic character recognition capability. Word recognition with a pre-specified lexicon is also built-in. The user can also manually input character identities if the highest character recognition accuracy is desired for the purpose of writer verification/identification.
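Lexicon-driven word recognition can be sketched by scoring every lexicon entry against per-character recognition probabilities and keeping the best-scoring words. This simplified version assumes the word is already segmented into characters (so only lexicon words of matching length are considered) and gives unseen characters a hypothetical floor probability of `1e-6`; `recognize_with_lexicon` is a hypothetical name, and CEDAR-FOX's actual recognizer is not described here.

```python
import math


def recognize_with_lexicon(char_probs, lexicon, floor=1e-6):
    """Rank lexicon words by the log-probability of their characters.

    char_probs: one dict per segmented character, mapping character -> probability.
    Words whose length differs from the number of segments are dropped
    (a simplifying assumption of this sketch).
    """
    scored = []
    for word in lexicon:
        if len(word) != len(char_probs):
            continue
        score = sum(math.log(p.get(ch, floor)) for p, ch in zip(char_probs, word))
        scored.append((score, word))
    return [w for _, w in sorted(scored, key=lambda t: t[0], reverse=True)]
```

Restricting output to a lexicon turns uncertain character guesses into a ranked choice among valid words, which is why lexicon-driven recognition is typically more accurate than free character-by-character reading.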
Legibility and Readability Analysis
Word gap comparison and comparison with Palmer metrics are supported.
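How word gaps are compared is not described. As a sketch under stated assumptions, the gaps can be measured from the horizontal extents of the word images on each line, and the two documents' gap distributions compared with a two-sample Kolmogorov-Smirnov statistic (0 for identical distributions, up to 1 for fully separated ones). The KS statistic is a generic, swapped-in technique rather than CEDAR-FOX's method, comparison with Palmer metrics is not sketched, and `word_gaps` and `ks_statistic` are hypothetical names.

```python
def word_gaps(boxes):
    """Horizontal gaps between consecutive words on one text line.

    boxes: (x_min, x_max) extents of the word images, in any order.
    Overlapping neighbours are given a gap of 0.
    """
    ordered = sorted(boxes)
    return [max(0, right[0] - left[1]) for left, right in zip(ordered, ordered[1:])]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    def ecdf(xs, t):
        return sum(x <= t for x in xs) / len(xs)
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in set(a) | set(b))
```

Comparing whole distributions rather than mean gaps keeps information about how consistently a writer spaces words, not just how widely.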