About
175 Publications
25,329 Reads
6,622 Citations
Publications (175)
Handwriting Verification is a critical task in document forensics. Deep-learning-based approaches often face skepticism from forensic document examiners due to their lack of explainability and reliance on extensive training data and handcrafted features. This paper explores using Vision Language Models (VLMs), such as OpenAI's GPT-4o and Google's PaliGe...
We present SSL-HV: Self-Supervised Learning approaches applied to the task of Handwriting Verification. This task involves determining whether a given pair of handwritten images originate from the same or different writer distribution. We have compared the performance of multiple generative, contrastive SSL approaches against handcrafted feature ex...
EEG decoding systems based on deep neural networks have been widely used for decision making in brain-computer interfaces (BCI). Their predictions, however, can be unreliable given the significant variance and noise in EEG signals. Previous works on EEG analysis mainly focus on the exploration of noise patterns in the source signal, while the uncerta...
Self-supervised learning provides an opportunity to explore unlabeled chest X-rays and their associated free-text reports accumulated in clinical routine without manual supervision. This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports. The model was pr...
In clinical applications, neural networks must focus on and highlight the most important parts of an input image. Soft-Attention mechanism enables a neural network to achieve this goal. This paper investigates the effectiveness of Soft-Attention in deep neural architectures. The central aim of Soft-Attention is to boost the value of important featu...
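As a rough illustration of the weighting idea described above, the sketch below normalizes per-location relevance scores with a softmax and scales each location's feature vector by its weight, so high-scoring locations are emphasized relative to low-scoring ones. It is a generic formulation written for this page: how the scores are produced, and any learnable scaling or concatenation the paper may use, are not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(
    features: Sequence[Sequence[float]], scores: Sequence[float]
) -> Tuple[List[List[float]], List[float]]:
    """Scale each spatial location's feature vector by its softmax attention weight.

    features[i] is the feature vector at location i; scores[i] is a relevance
    score for that location (assumed here to come from some learned layer).
    Returns (reweighted_features, weights); the weights sum to 1.
    """
    if len(features) != len(scores):
        raise ValueError("need one score per location")
    weights = softmax(scores)
    reweighted = [[w * f for f in vec] for w, vec in zip(weights, features)]
    return reweighted, weights


# Two locations; the second is scored as three times more relevant (e^{ln 3} = 3).
out, w = soft_attention([[1.0, 2.0], [3.0, 4.0]], [0.0, math.log(3.0)])
print(w)  # approximately [0.25, 0.75]
```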
Electroencephalogram (EEG) signals have large variance, and their patterns differ significantly across subjects. Cross-subject EEG classification is a challenging task due to such pattern variation and the limited target data available, as collecting and annotating EEG data for a new user is costly and involves effort from human experts. We model the ta...
During the last few years many document recognition methods have been developed to determine whether a handwriting specimen can be attributed to a known writer. However, in practice, the workflow of the document examiner continues to be labor-intensive. Before a systematic or computational approach can be developed, an articulation of the steps...
As the most common type of evidence at crime scenes, footwear marks are found more often than fingerprints, and yet left largely unused due to lack of efficient and reliable tools. While the central task is stated simply - retrieve the closest matches among a database of known outsole prints - the difficulty is the poor quality of the marks and a v...
The biometric verification task is to determine whether or not an input and a template belong to the same individual. In the context of automatic fingerprint verification the task consists of three steps: feature extraction, where features (typically minutiae) are extracted from each fingerprint, scoring, where the degree of match between the two s...
Understanding a block of handwritten text means mapping it into a semantic representation. We describe an approach to reading a block of handwritten text when there are certain loose constraints placed on the spatial layout and syntax of the text. Early recognition of primitives guides the location of syntactic components. A system to read handwrit...
A method for determining the delivery point codes (DPCs) for handwritten addresses is described. Determining the DPC requires locating and recognizing address components (e.g., ZIP Code, street number, P.O. box number) and using multiple information sources to assign a five, nine or eleven digit barcode (i.e., the DPC) to an address. Our method use...
We provide a statistical basis for reporting the results of handwriting examination by questioned document (QD) examiners. As a facet of Questioned Document (QD) examination, the analysis and reporting of handwriting examination suffers from the lack of statistical data concerning the frequency of occurrence of combinations of particular handwritin...
Over the last century forensic document science has developed progressively more sophisticated pattern recognition methodologies for ascertaining the authorship of disputed documents. We present a writer verification method and an evaluation of its performance on historical documents with known and unknown writers. The questioned document is compar...
One of the most challenging tasks in analyzing handwritten documents is to tackle the inherent skew introduced by the writer's handwriting, segment the handwritten lines and estimate the skew angle and its direction. Complexities such as variable spacing between words and lines, variable line skew, variable line width and height, overlappin...
Signature verification is a common task in forensic document analysis. The goal is to make a decision whether a questioned signature belongs to a set of known signatures of an individual or not. In a typical forgery case a very limited number of known signatures may be available, with as few as four or five knowns. Here we describe a fully Bayesian...
Research on footwear impression evidence has been gaining increasing importance in forensic science. Given a footwear impression at a crime scene, a key task is to find the closest match in a local/national database so as to determine footwear brand and model. This process is made faster if database prints are grouped into clusters of similar patte...
In the analysis of handwriting in documents a central task is that of determining line structure of the text, e.g., number of text lines, location of their starting and end-points, line-width, etc. While simple methods can handle ideal images, real-world documents have complexities such as overlapping line structure, variable line spacing, line sk...
Over the last century forensic document science has developed progressively more sophisticated pattern recognition methodologies for ascertaining the authorship of disputed documents. These include advances not only in computer assisted stylometrics, but forensic handwriting analysis. We present a writer verification method and an evaluation of an...
Footwear impression evidence has been gaining increasing importance in forensic investigation. The most challenging task for a forensic examiner is to work with highly degraded footwear marks and match them to the most similar footwear print available in the database. Retrieval process from a large database can be made significantly faster if the d...
A machine learning approach to off-line signature verification is presented. The prior distributions are determined from genuine and forged signatures of several individuals. The task of signature verification is a problem of determining genuine-class membership of a questioned (test) signature. We take a 3-step, writer independent approach: 1) Det...
Many governments have some form of "direct democracy" legislation procedure whereby individual citizens can propose various measures creating or altering laws. Generally, such a process is started with the gathering of a large number of signatures. There is interest in whether or not there are fraudulent signatures present in such a petition, and i...
A novel statistical model for determining whether a pair of documents, a known and a questioned, were written by the same individual is proposed. The goal of this formulation is to learn the specific uniqueness of style in a particular author's writing, given the known document. Since there are often insufficient samples to extrapolate a generalize...
We present a framework of adaptive (self-training) semi-supervised learning as applied to the problem of handwriting recognition. Each problem instance itself is treated as a set of unlabeled "training" data; a general model, trained on a set of labeled data, is adapted into an appropriate problem-specific model. Learning is continued until con...
Large quantities of scanned handwritten and printed documents are rapidly being made available for use by information storage and retrieval systems, such as for use by libraries. We present the design and performance of a language-independent system for spotting handwritten/printed words in scanned document images. The technique is evaluated with t...
Expanding on an earlier study to objectively validate the hypothesis that handwriting is individualistic, we extend the study to include handwriting in the Arabic script. Handwriting samples from twelve native speakers of Arabic were obtained. Analyzing differences in handwriting was done by using computer algorithms for extracting features from sc...
The fully Bayesian approach has been shown to be powerful in machine learning. This paper describes signature verification using a non-parametric Bayesian approach. Given sample(s) of Genuine signatures of an individual, the task of signature verification is a problem of classifying a questioned signature as Genuine or Forgery. The verification p...
Shoeprints are among the most commonly found types of evidence at crime scenes. A latent shoeprint is a photograph of the impressions made by a shoe on the surface it contacts. Latent shoeprints can be used for identification of suspects in a forensic case by narrowing down the search space. This is done by elimination of the type of shoe, by matching...
Searching handwritten documents is a relatively unexplored frontier for documents in any language. Traditional approaches use either image-based or text-based techniques. This paper describes a framework for versatile search where the query can be either text or image, and the retrieval method fuses text and image retrieval methods. A UNICODE and a...
Line segmentation is the first and most critical pre-processing step for a document recognition/analysis task. Complex handwritten documents with lines running into each other pose a great challenge for the line segmentation problem due to the absence of online stroke information. This paper describes a method to disentangle lines running i...
Writer adaptation or specialization is the adjustment of handwriting recognition algorithms to a specific writer's style of handwriting. Such adjustment yields significantly improved recognition rates over counterpart general recognition algorithms. We present the first unconstrained off-line handwriting adaptation algorithm for Arabic presented in...
Impression evidence in the form of shoe-prints is commonly found at crime scenes. A critical step in automatic shoe-print identification is extraction of the shoe-print pattern. It involves isolating the shoe-print foreground (impressions made by the shoe) from the remaining elements (background and noise). The problem is formulated as one of labe...
A study of the discriminability of fingerprints of twins is presented. The fingerprint data used is of high quality and quantity because of a predominantly young subject population of 298 pairs of twins whose tenprints were captured using a livescan device. Discriminability using level 1 and level 2 features is independently reported. The level 1...
Writer adaptation or specialization is the adjustment of handwriting recognition algorithms to a specific writer's style of handwriting. Such adjustment yields significantly improved recognition rates over counterpart general recognition algorithms. We present a discussion of a method of prototype integration for writer adaptation and evaluate th...
Signature verification is a common task in forensic document analysis. Its aim is to determine whether a questioned signature matches known signature samples. From the viewpoint of automating the task it can be viewed as one that involves machine learning from a population of signatures. There are two types of learning tasks to be accomplished: pe...
Generative models of pattern individuality attempt to represent the distribution of observed quantitative features, e.g., by learning parameters from a database, and then use such distributions to determine the probability of two random patterns being the same. Considering fingerprint patterns, Gaussian distributions have been previously used for m...
Offline Chinese handwriting recognition (OCHR) is a particularly difficult pattern recognition problem. Many authors have presented various approaches to recognizing its different aspects. We present a survey and an assessment of relevant papers that have recently appeared in conferences and journals, including those appearing in ICDAR,...
The biometric verification task is one of determining whether an input consisting of measurements from an unknown individual matches the corresponding measurements of a known individual. This chapter describes a statistical learning methodology for determining whether a pair of biometric samples belong to the same individual. The methodology involv...
Matching of partial fingerprints has important applications in both biometrics and forensics. It is well known that the accuracy of minutiae-based matching algorithms decreases dramatically as the number of available minutiae decreases. When singular structures such as core and delta are unavailable, general ridges can be utilized. Some existing hig...
Automating the task of scoring short handwritten student essays is considered. The goal is to assign scores which are comparable to those of human scorers by coupling two AI technologies: optical handwriting recognition and automated essay scoring. The test-bed is that of essays written by children in reading comprehension tests. The proces...
The problem of writer verification is to decide whether or not two handwritten documents were written by the same person. Providing a strength of evidence for any such decision is an integral part of the writer verification problem. The strength of evidence should incorporate (i) the amount of information compared in each of the two docu...
The paper describes a lexicon driven approach for word recognition on handwritten documents using conditional random fields (CRFs). CRFs are discriminative models and do not make any assumptions about the underlying data and hence are known to be superior to hidden Markov models (HMMs) for sequence labeling problems. For word recognition, the docum...
Over a hundred years, several attempts have been made to quantitatively establish the degree of individuality of fingerprints. Measurements have been made using models based on grids, ridges, fixed probabilities, relative measurements and generative distributions. This paper is a survey and assessment of various fingerprint individuality models pro...
In the analysis and recognition of handwriting, a useful first task is to assign ground truth for words in the writing. Such an assignment is useful for various subsequent machine learning tasks for performing automatic recognition, writer verification, etc. Since automatic word segmentation and recognition can be error-prone, an intermediate ap...
Given a set of handwritten documents, a common goal is to search for a relevant subset. Attempting to find a query word or image in such a set of documents is called word spotting. Spotting handwritten words in documents written in the Latin alphabet, and more recently in Arabic, has received considerable attention. One issue is generating candida...
The fingerprint verification task answers the question of whether or not two fingerprints belong to the same finger. The paper focuses on the classification aspect of fingerprint verification. Classification is the third and final step after the two earlier steps of feature extraction, where a known set of features (minutiae points) have bee...
Understanding printed documents such as newspapers is a common intelligent activity of humans. Making a computer perform the task of analyzing a newspaper image and deriving useful high-level representations requires the development and integration of techniques in several areas, including pattern recognition, computer vision, language understanding...
Handwritten essays are widely used in educational assessments, particularly in classroom instruction. This paper concerns the design of an automated system for performing the task of taking as input scanned images of handwritten student essays in reading comprehension tests and producing as output scores for the answers which are analogous to th...
In searching a repository of business documents, a task of interest is that of using a query signature image to retrieve other signatures matching the query from a database. The signature retrieval task involves a two-step process of extracting all the signatures from the documents and then performing a match on these signatures. This paper presen...
Signature verification is a common task in forensic document analysis. It is one of determining whether a questioned signature matches known signature samples. From the viewpoint of automating the task it can be viewed as one that involves machine learning from a population of signatures. There are two types of learning to be accomplished. In the f...
A system for spotting words in scanned document images in three scripts, Devanagari, Arabic and Latin, is described. The three main components of the system are a word segmenter, a shape-based matcher for words and a search interface. The user gives a query which can be either a word image or text. The candidate words that are searched in the docum...
New machine learning strategies are proposed for person identification which can be used in several biometric modalities such as friction ridges, handwriting, signatures and speech. The biometric or forensic performance task answers the question of whether or not a sample belongs to a known person. Two different learning paradigms are discussed: pe...
Automatic signature verification of scanned documents is presented here. The strategy used for verification is applicable in scenarios where there are multiple knowns (genuine signature samples) from a writer. First the learning process involves learning the variation and similarities from the known genuine samples from the given writer and then c...
Search aspects of a system for analyzing handwritten documents are described. Documents are indexed using global image features, e.g., stroke width, slant, as well as local features that describe the shapes of words and characters. Image indexing is done automatically using page analysis, page segmentation, line separation, word segmentation and rec...
A statistical model for determining whether a pair of documents, a known and a questioned, were written by the same individual is proposed. The model has the following four components: (i) discriminating elements, e.g., global features and characters, are extracted from each document; (ii) differences between corresponding elements from each docume...
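The decision that follows components (i) and (ii) can be sketched as a log-likelihood ratio between a same-writer and a different-writer hypothesis. This minimal version assumes that each element-wise distance is Gaussian under both hypotheses and that elements are independent; both are assumptions made here, not details recovered from the truncated abstract, and the parameter values in the example are hypothetical.

```python
import math
from typing import Iterable, Tuple


def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    """Log density of a univariate normal distribution."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def log_likelihood_ratio(
    distances: Iterable[float],
    same_params: Tuple[float, float],
    diff_params: Tuple[float, float],
) -> float:
    """Sum of per-element log LRs, log p(d | same) - log p(d | different).

    Positive totals favour 'same writer', negative totals favour 'different
    writers'. Summing the terms assumes the elements are independent.
    """
    return sum(
        gaussian_logpdf(d, *same_params) - gaussian_logpdf(d, *diff_params)
        for d in distances
    )


# Hypothetical models: same-writer distances ~ N(1, 0.5), different-writer ~ N(3, 1).
SAME, DIFF = (1.0, 0.5), (3.0, 1.0)
print(log_likelihood_ratio([1.0], SAME, DIFF))  # 2 + ln 2, about 2.69
print(log_likelihood_ratio([3.0], SAME, DIFF))  # ln 2 - 8, about -7.31
```

In practice the two distributions would be estimated from known same-writer and different-writer document pairs.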
The design and performance of a content-based information retrieval system for handwritten documents is described. System indexing and retrieval are based on writer characteristics, textual content as well as document metadata such as writer profile. Documents are indexed using global image features, e.g., stroke width, slant, word gaps, as well lo...
Most fast k-nearest neighbor (k-NN) algorithms exploit metric properties of distance measures for reducing computation cost and a few can work effectively on both metric and nonmetric measures. We propose a cluster-based tree algorithm to accelerate k-NN classification without any presuppositions about the metric form and properties of a dissimilar...
A recognition-based system was developed for constructing handwriting databases. The system automatically recognizes the word and the character images in handwritten document images by applying a transcript mapping algorithm. The transcript-mapping process is modeled as an optimization problem involving multiple word-segmentation hypotheses, word r...
Analysis of allographs (characters) and allograph combinations (words) is the key for the identification/verification of a writer's handwriting. While allographs are usually part of words and the segmentation of a word into allographs is a subjective process, analysis of handwritten words is a natural option, complementary to allograph and docu...
In certain spaces using some distance measures, the sum of any two distances is always bigger than the third one. Such a special property is called the tri-edge inequality (TEI). In this paper, the tri-edge inequality characterizing several binary distance measures is mathematically proven and experimentally verified, and the implications of TEI ar...
Using handwritten characters we address two questions (i) what is the group identification performance of different alphabets (upper and lower case) and (ii) what are the best characters for the verification task (same writer/different writer discrimination) knowing demographic information about the writer such as ethnicity, age or sex. The Bhattac...
Existing word image retrieval algorithms suffer from either low retrieval precision or high computation complexity. We present an effective and efficient approach for word image matching by using gradient-based binary features. Experiments over a large database of handwritten word images show that the proposed approach consistently outperforms the...
Classifying an unknown input is a fundamental problem in pattern recognition. A common method is to define a distance metric between patterns and find the most similar pattern in the reference set. When patterns are in binary feature vector form, there have been two approaches to improve the performance over the equal-weighted Hamming distance metr...
The analysis of handwritten documents from the viewpoint of determining their writership has great bearing on the criminal justice system. In many cases, only a limited amount of handwriting is available and sometimes it consists of only numerals. Using a large number of handwritten numeral images extracted from about 3000 samples written by 1000 w...
We investigate the combination of Type-III classifiers using the Dempster-Shafer Theory of Evidence. Various methods of building BPAs for each classifier using both "global" and "local" classifier information are explored. We propose modifications to two established BPA-computation methods to make them better suited for combining Type-III classifie...
Several dissimilarity measures for binary vectors are formulated and examined for their recognition capability in handwriting identification for which the binary micro-features are used to characterize handwritten character shapes. Pertaining to eight dissimilarity measures, i.e., Jaccard-Needham, Dice, Correlation, Yule, Russell-Rao, Sokal-Michene...
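Several of the measures named above can be written in terms of the four co-occurrence counts of two binary vectors (S11, S10, S01, S00). The sketch below implements four of them in their common textbook forms, which agree with SciPy's boolean definitions up to edge-case handling; the paper's exact normalizations may differ, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple


def counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (s11, s10, s01, s00) for two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    s11 = sum(1 for a, b in zip(x, y) if a and b)
    s10 = sum(1 for a, b in zip(x, y) if a and not b)
    s01 = sum(1 for a, b in zip(x, y) if not a and b)
    return s11, s10, s01, len(x) - s11 - s10 - s01


def jaccard(x, y) -> float:
    s11, s10, s01, _ = counts(x, y)
    denom = s11 + s10 + s01
    return (s10 + s01) / denom if denom else 0.0  # 0/0 treated as identical


def dice(x, y) -> float:
    s11, s10, s01, _ = counts(x, y)
    denom = 2 * s11 + s10 + s01
    return (s10 + s01) / denom if denom else 0.0


def russell_rao(x, y) -> float:
    s11, _, _, _ = counts(x, y)
    return (len(x) - s11) / len(x)  # only shared ones count as agreement


def yule(x, y) -> float:
    s11, s10, s01, s00 = counts(x, y)
    denom = s11 * s00 + s10 * s01
    return 2 * s10 * s01 / denom if denom else 0.0


x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 1, 0, 1, 0]
print(counts(x, y))  # (2, 1, 1, 2)
print(jaccard(x, y), dice(x, y))  # 0.5 0.3333333333333333
```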
The Dempster-Shafer Theory of Evidence is an established method for combining different sources of information. In this paper we explore ways to improve the combination performance by building a better BPA for each classifier using both "global" and "local" classifier information. We propose modifications to two well-known BPA-computation...
Analysis of allographs (characters) and allograph combinations (words) is the key for obtaining the discriminating elements of handwriting. While allographs usually occur within words and segregation of a word into allographs is more subjective than objective, especially for cursive writing, analysis of handwritten words is a natural and bett...
Analysis of handwritten characters (allographs) plays an important role in forensic document examination. However, so far there has been no comprehensive and quantitative study of the individuality of handwritten characters. Based on a large number of handwritten characters extracted from handwriting samples of 1000 individuals in the US, the individuality o...
Optical character recognition (OCR) is performed by optical character readers which are automated electronic systems. OCR may be defined as the process of converting images of machine-printed or handwritten numerals, letters, and symbols into a computer-processable format. The long history of research in this area, commercial success, and the cont...
String distance measures are useful in both on-line and off-line character recognition for comparing on-line stroke and off-line contour sequence strings. Since stroke and contour string elements are angular in that they represent a circular measurement (0° ~ 360°), usual edit distances with cost matrix are inadequate for this type of strings. For...
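The inadequacy of a plain cost matrix for angular strings can be illustrated with a sketch. Assuming direction codes in degrees, the version below replaces the usual 0/1 substitution cost of Levenshtein distance with the circular difference between two angles (scaled to [0, 1]) and uses a hypothetical constant insertion/deletion cost; it illustrates the general idea and is not claimed to be the measure the paper proposes.

```python
from typing import Sequence


def angular_diff(a: float, b: float) -> float:
    """Smallest difference between two angles in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def angular_edit_distance(
    s: Sequence[float], t: Sequence[float], indel: float = 1.0
) -> float:
    """Levenshtein-style DP where substituting angle a by b costs
    angular_diff(a, b) / 180, so 350 and 10 degrees are close, not a mismatch."""
    m, n = len(s), len(t)
    prev = [j * indel for j in range(n + 1)]  # row for the empty prefix of s
    for i in range(1, m + 1):
        cur = [i * indel] + [0.0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + indel,  # delete s[i-1]
                cur[j - 1] + indel,  # insert t[j-1]
                prev[j - 1] + angular_diff(s[i - 1], t[j - 1]) / 180.0,  # substitute
            )
        prev = cur
    return prev[n]


print(angular_diff(350.0, 10.0))  # 20.0
print(angular_edit_distance([350.0], [10.0]))  # 20/180, about 0.111
```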
Motivated by several rulings in United States courts concerning expert testimony in general, and handwriting testimony in particular, we undertook a study to objectively validate the hypothesis that handwriting is individual. Handwriting samples of 1,500 individuals, representative of the U.S. population with respect to gender, age, ethnic groups,...
A distance measure between two histograms has applications in feature selection, image indexing and retrieval, pattern classification and clustering, etc. We propose a distance between sets of measurement values as a measure of dissimilarity of two histograms. The proposed measure has the advantage over the traditional distance measures regarding t...
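For ordinal measurements, one well-known way to respect bin order is to sum the absolute differences of the two cumulative histograms (the one-dimensional earth mover's distance). The sketch below assumes equal-total histograms; it illustrates why an ordering-aware distance behaves differently from a bin-by-bin one, and is not claimed to be the exact measure the paper proposes.

```python
from itertools import accumulate
from typing import Sequence


def bin_by_bin_l1(a: Sequence[int], b: Sequence[int]) -> int:
    """Conventional bin-by-bin L1 distance, blind to how far apart bins are."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ordinal_histogram_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of |prefix-sum differences|: the minimum number of moves of one
    count to an adjacent bin needed to turn histogram a into histogram b."""
    if len(a) != len(b) or sum(a) != sum(b):
        raise ValueError("histograms need equal length and equal total count")
    return sum(abs(p) for p in accumulate(x - y for x, y in zip(a, b)))


a = [1, 0, 0, 0]
near, far = [0, 1, 0, 0], [0, 0, 0, 1]
print(bin_by_bin_l1(a, near), bin_by_bin_l1(a, far))  # 2 2
print(ordinal_histogram_distance(a, near), ordinal_histogram_distance(a, far))  # 1 3
```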
Classifying an unknown input is a fundamental problem in Pattern Recognition. One standard method is finding its nearest neighbors in a reference set. It would be very time consuming if computed feature by feature for all templates in the reference set; this naïve method is O(nd) where n is the number of templates in the reference set and d is the...
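The O(nd) cost of the naive method is easy to see in code. The sketch below, assuming squared Euclidean distance on real-valued features, contrasts exhaustive search with partial-distance search, a standard early-abandoning speed-up that returns the same nearest neighbour; it is shown only as a familiar point of comparison, not as the method this paper proposes, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def nn_exhaustive(query: Sequence[float], refs: List[Sequence[float]]) -> Tuple[int, float]:
    """Naive search: every one of n templates x d features, i.e. O(nd) work."""
    best_i, best_d = -1, float("inf")
    for i, ref in enumerate(refs):
        d = sum((q - v) ** 2 for q, v in zip(query, ref))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def nn_partial_distance(query: Sequence[float], refs: List[Sequence[float]]) -> Tuple[int, float]:
    """Same answer, but stops accumulating a template's distance as soon as it
    can no longer beat the best distance found so far."""
    best_i, best_d = -1, float("inf")
    for i, ref in enumerate(refs):
        d = 0.0
        for q, v in zip(query, ref):
            d += (q - v) ** 2
            if d >= best_d:
                break  # abandoned early: this template cannot be the nearest
        else:
            best_i, best_d = i, d  # loop completed without break, so d < best_d
    return best_i, best_d


refs = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
print(nn_exhaustive([1.0, 2.0], refs))  # (2, 1.0)
print(nn_partial_distance([1.0, 2.0], refs))  # (2, 1.0)
```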
Multi-classifier combination based on Dempster-Shafer theory of evidence has demonstrated its superior performance. In the approach based on Dempster-Shafer theory, the basic probability assignments for evidence are usually derived from classifiers' global performance. However, our study discovered that while using classifiers' global performance a...
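To make the global-performance construction concrete: a common way (assumed here, not taken from the paper) is to give a classifier's top choice a mass equal to its overall recognition rate, leave the remainder on the whole frame of discernment, and fuse classifiers with Dempster's rule of combination. The helper names and the recognition rates in the example are hypothetical; the paper's local-performance refinement is not shown.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def bpa_from_top_choice(top: str, rate: float, frame: FrozenSet[str]) -> Mass:
    """Global-performance BPA: mass `rate` on the top class, the rest on the frame."""
    return {frozenset([top]): rate, frame: 1.0 - rate}


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses of intersecting focal elements and
    renormalize by 1 - K, where K is the mass assigned to empty intersections."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


frame = frozenset({"a", "b", "c"})
m1 = bpa_from_top_choice("a", 0.9, frame)  # hypothetical 90%-accurate classifier says "a"
m2 = bpa_from_top_choice("b", 0.6, frame)  # hypothetical 60%-accurate classifier says "b"
fused = dempster_combine(m1, m2)
print(round(fused[frozenset({"a"})], 3))  # 0.783
```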
In our previous work of writer identification, a database of handwriting samples (written in English) of over one thousand individuals was created, and two types of computer-generated features of sample handwriting were extracted: macro and micro features. Using these features, writer identification experiments were performed: given that a document...
A study was undertaken to determine the power of handwriting to distinguish between individuals. Handwriting samples of one thousand five hundred individuals, representative of the US population with respect to gender, age, ethnic groups, etc., were obtained. Analyzing differences in handwriting was done by using computer algorithms for extracting...
Foreign mail recognition (FMR) is part of the more general problem of recognizing destination addresses in a mail stream. It is defined as the problem of finding the country of destination of a mail piece sent to a foreign address. We discuss some of the differences between FMR and domestic mail recognition (DMR) and present its specific challe...
The sub-category classification problem is that of discriminating a pattern among all sub-categories. Sub-category classification performance estimates are useful information to mine, as many researchers are interested in trends of patterns within specific sub-categories. This paper presents a datamining technique to mine a database consi...
Motivated by several rulings in United States courts concerning expert testimony in general and handwriting testimony in particular, we undertook a study to objectively validate the hypothesis that handwriting is individualistic. Handwriting samples of 1500 individuals, representative of the US population with respect to gender, age, ethnic groups,...
We undertook a study to objectively validate the hypothesis that handwriting is individualistic. Handwriting samples of one thousand five hundred individuals, representative of the US population with respect to gender, age, ethnic groups, etc., were obtained. Analyzing differences in handwriting was done by using computer algorithms for extracting f...
This paper describes an off-line handwritten document data collection effort conducted at CEDAR and discusses systems that manage the document image data. We introduce the CEDAR letter, discuss its completeness and then describe the specification of the CEDAR letter image database consisting of writer data and features obtained from a handwriting s...
A word recognition algorithm is proposed that integrates character recognition with word shape analysis. The algorithm consists of a set of serial filters and parallel classifiers, and the decisions are combined to generate a consensus ranking of the input lexicon. Experimental results with multifont machine-printed word images are discussed.
This paper presents a word shape analysis approach for word recognition that is independent of character segmentation. The algorithm receives a word image and a lexicon. A set of global and local shape features are extracted from the image and matched with words in the lexicon by a set of highly specialized classifiers. A ranking combination strate...
Difficult pattern recognition problems involving large class sets and noisy input can be solved by a multiple classifier system, which allows simultaneous use of arbitrary feature descriptors and classification procedures. Independent decisions by each classifier can be combined by methods of the highest rank, Borda count, and logistic regression,...
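Two of the combination rules named above can be sketched briefly. This is a minimal illustration, assuming each classifier returns a complete best-first ranking of the same class set; tie-breaking by class label is an arbitrary choice made here, the function names are hypothetical, and the logistic-regression rule is omitted.

```python
from typing import Dict, List


def borda_count(rankings: List[List[str]]) -> List[str]:
    """Each classifier awards (n - 1 - position) points; sort by total points."""
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, cls in enumerate(ranking):
            scores[cls] = scores.get(cls, 0) + (n - 1 - pos)
    return sorted(scores, key=lambda c: (-scores[c], c))


def highest_rank(rankings: List[List[str]]) -> List[str]:
    """Each class keeps its best (lowest) rank over all classifiers."""
    best: Dict[str, int] = {}
    for ranking in rankings:
        for pos, cls in enumerate(ranking):
            best[cls] = min(best.get(cls, pos), pos)
    return sorted(best, key=lambda c: (best[c], c))


r = [["a", "b", "c"], ["b", "c", "a"], ["b", "a", "c"]]
print(borda_count(r))  # ['b', 'a', 'c']  (totals: b=5, a=3, c=1)
print(highest_rank(r))  # ['a', 'b', 'c']  (a and b both reach rank 0)
```

The example shows the rules disagreeing: highest rank rewards a class that any single classifier puts first, while Borda count rewards consistent support.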
In this paper, we demonstrate that this is possible, and propose a method that can be used to combine the decisions of individual classifiers to obtain a classification procedure which performs better than any of the individual classifiers.
A technique for combining the results of classifier decisions in a multi-classifier recognition system is presented. Each classifier produces a ranking of a set of classes. The combination technique uses these rankings to determine a small subset of the set of classes that contains the correct class. A group consensus function is then applied to re...
A regression method is proposed to combine decisions of multiple character recognition algorithms. The method computes a weighted sum of the rank scores produced by the individual classifiers and derives a consensus ranking. The weights are estimated by a logistic regression analysis. Two experiments are discussed where the method was applied to rec...
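The weighted rank-score combination can be sketched as follows, assuming rank scores of n-1 for a classifier's top class down to 0 for its last, and assuming the per-classifier weights and bias have already been fitted by logistic regression (the values in the example are hypothetical, not estimates from the paper). Because the logistic function is monotonic, ordering by probability gives the same consensus ranking as ordering by the weighted sum.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_scores(ranking: List[str]) -> Dict[str, int]:
    """Convert a best-first ranking into rank scores: top gets n-1, last gets 0."""
    n = len(ranking)
    return {cls: n - 1 - pos for pos, cls in enumerate(ranking)}


def logistic_consensus(
    rankings: List[List[str]], weights: Sequence[float], bias: float = 0.0
) -> List[Tuple[str, float]]:
    """Score each class by sigmoid(bias + sum_k w_k * s_k(class)), best first."""
    if len(rankings) != len(weights):
        raise ValueError("need one weight per classifier")
    per_clf = [rank_scores(r) for r in rankings]
    classes = set().union(*per_clf)
    out = []
    for cls in classes:
        z = bias + sum(w * s.get(cls, 0) for w, s in zip(weights, per_clf))
        out.append((cls, 1.0 / (1.0 + math.exp(-z))))
    return sorted(out, key=lambda t: (-t[1], t[0]))


rankings = [["a", "b", "c"], ["b", "c", "a"], ["b", "a", "c"]]
result = logistic_consensus(rankings, weights=[2.0, 0.5, 0.5], bias=-3.0)
for cls, p in result:
    print(cls, round(p, 3))  # a 0.818, then b 0.731, then c 0.076
```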
A top-down approach to word recognition is proposed. Discussions are presented on dynamically selecting the most effective feature combinations, which are applied to discriminate between a limited set of word hypotheses.
This paper aims to determine the statistical validity of individuality in handwriting based on measurement of features, quantification and statistical analysis. In classification problems such as writer, face, fingerprint or speaker identification, the number of classes is very large or unspecified. To establish the inherent distinctness of the clas...
Firm name recognition provides a useful source of information for automatic postal address interpretation. This paper presents two approaches to firm name recognition. The word-based approach treats a firm name as a list of words each providing an index to the database. The character-based approach treats a firm name as a sequence of characters and...
The similarity between two histograms has attracted the attention of many researchers in various fields. The histograms to be matched are often of angular type, such as gradient directions in character images and hue values in color images. The distance between two angular-type histograms differs from that between nominal- or ordinal-type histograms; however, conventional...
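For angular (circular) histograms the first and last bins are neighbours. One known way to handle this, assumed here rather than taken from the truncated abstract, extends the cumulative-difference idea used for ordinal histograms: an unknown constant flow c may circulate across the wrap-around boundary, the cost becomes the sum of |P_i - c| over the prefix differences P_i, and that sum is minimized when c is a median of the P_i. The function name is hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def circular_histogram_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Minimum number of moves of one count to a neighbouring bin, where the
    first and last bins are adjacent (e.g. 0 and 359 degrees)."""
    if len(a) != len(b) or sum(a) != sum(b):
        raise ValueError("histograms need equal length and equal total count")
    prefix = list(accumulate(x - y for x, y in zip(a, b)))
    if not prefix:
        return 0
    c = sorted(prefix)[len(prefix) // 2]  # a median minimizes sum |p - c|
    return sum(abs(p - c) for p in prefix)


a = [1, 0, 0, 0]
print(circular_histogram_distance(a, [0, 0, 0, 1]))  # 1: bins 0 and 3 wrap around
print(circular_histogram_distance(a, [0, 0, 1, 0]))  # 2: opposite side of the circle
```

Without the wrap-around term, the first example would cost 3 moves instead of 1.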