Sargur N. Srihari
University at Buffalo, State University of New York | SUNY Buffalo · Department of Computer Science and Engineering

About

175
Publications
25,329
Reads
6,622
Citations

Publications (175)
Preprint
Full-text available
Handwriting Verification is critical in document forensics. Deep learning based approaches often face skepticism from forensic document examiners due to their lack of explainability and reliance on extensive training data and handcrafted features. This paper explores using Vision Language Models (VLMs), such as OpenAI's GPT-4o and Google's PaliGe...
Preprint
Full-text available
We present SSL-HV: Self-Supervised Learning approaches applied to the task of Handwriting Verification. This task involves determining whether a given pair of handwritten images originate from the same or different writer distribution. We have compared the performance of multiple generative, contrastive SSL approaches against handcrafted feature ex...
Preprint
Full-text available
EEG decoding systems based on deep neural networks have been widely used in decision making of brain computer interfaces (BCI). Their predictions, however, can be unreliable given the significant variance and noise in EEG signals. Previous works on EEG analysis mainly focus on the exploration of noise pattern in the source signal, while the uncerta...
Chapter
Full-text available
Self-supervised learning provides an opportunity to explore unlabeled chest X-rays and their associated free-text reports accumulated in clinical routine without manual supervision. This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports. The model was pr...
Chapter
Full-text available
In clinical applications, neural networks must focus on and highlight the most important parts of an input image. Soft-Attention mechanism enables a neural network to achieve this goal. This paper investigates the effectiveness of Soft-Attention in deep neural architectures. The central aim of Soft-Attention is to boost the value of important featu...
Preprint
Full-text available
In clinical applications, neural networks must focus on and highlight the most important parts of an input image. Soft-Attention mechanism enables a neural network to achieve this goal. This paper investigates the effectiveness of Soft-Attention in deep neural architectures. The central aim of Soft-Attention is to boost the value of important featu...
Article
Full-text available
Electroencephalogram (EEG) signal has large variance and its pattern differs significantly across subjects. Cross-subject EEG classification is a challenging task due to such pattern variation and the limited target data available, as collecting and annotating EEG data for a new user is costly and involves effort from human experts. We model the ta...
Article
During the last few years many document recognition methods have been developed to determine whether a handwriting specimen can be attributed to a known writer. However, in practice, the work-flow of the document examiner continues to be manual-intensive. Before a systematic or computational approach can be developed, an articulation of the steps...
Article
As the most common type of evidence at crime scenes, footwear marks are found more often than fingerprints, and yet left largely unused due to lack of efficient and reliable tools. While the central task is stated simply - retrieve the closest matches among a database of known outsole prints - the difficulty is the poor quality of the marks and a v...
Article
The biometric verification task is to determine whether or not an input and a template belong to the same individual. In the context of automatic fingerprint verification the task consists of three steps: feature extraction, where features (typically minutiae) are extracted from each fingerprint, scoring, where the degree of match between the two s...
Article
Full-text available
Understanding a block of handwritten text means mapping it into a semantic representation. We describe an approach to reading a block of handwritten text when there are certain loose constraints placed on the spatial layout and syntax of the text. Early recognition of primitives guides the location of syntactic components. A system to read handwrit...
Article
Full-text available
A method for determining the delivery point codes (DPCs) for handwritten addresses is described. Determining the DPC requires locating and recognizing address components (e.g., ZIP Code, street number, P.O. box number) and using multiple information sources to assign a five, nine or eleven digit barcode (i.e., the DPC) to an address. Our method use...
Conference Paper
We provide a statistical basis for reporting the results of handwriting examination by questioned document (QD) examiners. As a facet of Questioned Document (QD) examination, the analysis and reporting of handwriting examination suffers from the lack of statistical data concerning the frequency of occurrence of combinations of particular handwritin...
Conference Paper
Full-text available
Over the last century forensic document science has developed progressively more sophisticated pattern recognition methodologies for ascertaining the authorship of disputed documents. We present a writer verification method and an evaluation of its performance on historical documents with known and unknown writers. The questioned document is compar...
Conference Paper
One of the most challenging tasks in analyzing handwritten documents is to tackle the inherent skew that is introduced due to writer's handwriting, segment the handwritten lines and estimate the skew angle and its direction. Complexities such as variable spacing between words and lines, variable line skew, variable line width and height, overlappin...
Conference Paper
Signature verification is a common task in forensic document analysis. The goal is to make a decision whether a questioned signature belongs to a set of known signatures of an individual or not. In a typical forgery case a very limited number of known signatures may be available, with as few as four or five knowns. Here we describe a fully Bayesian...
Conference Paper
Full-text available
Research on footwear impression evidence has been gaining increasing importance in forensic science. Given a footwear impression at a crime scene, a key task is to find the closest match in a local/national database so as to determine footwear brand and model. This process is made faster if database prints are grouped into clusters of similar patte...
Conference Paper
In the analysis of handwriting in documents a central task is that of determining line structure of the text, e.g., number of text lines, location of their starting and end-points, line-width, etc. While simple methods can handle ideal images, real world documents have complexities such as overlapping line structure, variable line spacing, line sk...
Conference Paper
Full-text available
Over the last century forensic document science has developed progressively more sophisticated pattern recognition methodologies for ascertaining the authorship of disputed documents. These include advances not only in computer assisted stylometrics, but forensic handwriting analysis. We present a writer verification method and an evaluation of an...
Conference Paper
Full-text available
Footwear impression evidence has been gaining increasing importance in forensic investigation. The most challenging task for a forensic examiner is to work with highly degraded footwear marks and match them to the most similar footwear print available in the database. Retrieval process from a large database can be made significantly faster if the d...
Conference Paper
A machine learning approach to off-line signature verification is presented. The prior distributions are determined from genuine and forged signatures of several individuals. The task of signature verification is a problem of determining genuine-class membership of a questioned (test) signature. We take a 3-step, writer independent approach: 1) Det...
Conference Paper
Many governments have some form of "direct democracy" legislation procedure whereby individual citizens can propose various measures creating or altering laws. Generally, such a process is started with the gathering of a large number of signatures. There is interest in whether or not there are fraudulent signatures present in such a petition, and i...
Conference Paper
A novel statistical model for determining whether a pair of documents, a known and a questioned, were written by the same individual is proposed. The goal of this formulation is to learn the specific uniqueness of style in a particular author's writing, given the known document. Since there are often insufficient samples to extrapolate a generalize...
Conference Paper
We present a framework of adaptive (self-training) semi-supervised learning as applied to the problem of handwriting recognition. Each problem instance itself is treated as a set of unlabeled "training" data; a general model, trained on a set of labeled data, is adapted into an appropriate problem specific model. Learning is continued until con...
Conference Paper
Large quantities of scanned handwritten and printed documents are rapidly being made available for use by information storage and retrieval systems, such as for use by libraries. We present the design and performance of a language independent system for spotting handwritten/printed words in scanned document images. The technique is evaluated with t...
Conference Paper
Expanding on an earlier study to objectively validate the hypothesis that handwriting is individualistic, we extend the study to include handwriting in the Arabic script. Handwriting samples from twelve native speakers of Arabic were obtained. Analyzing differences in handwriting was done by using computer algorithms for extracting features from sc...
Conference Paper
The fully Bayesian approach has been shown to be powerful in machine learning. This paper describes signature verification using a non-parametric Bayesian approach. Given sample(s) of Genuine signatures of an individual, the task of signature verification is a problem of classifying a questioned signature as Genuine or Forgery. The verification p...
Conference Paper
Shoeprints are one of the most commonly found types of evidence at crime scenes. A latent shoeprint is a photograph of the impressions made by a shoe on the surface of its contact. Latent shoeprints can be used for identification of suspects in a forensic case by narrowing down the search space. This is done by elimination of the type of shoe, by matching...
Chapter
Searching handwritten documents is a relatively unexplored frontier for documents in any language. Traditional approaches use either image-based or text-based techniques. This paper describes a framework for versatile search where the query can be either text or image, and the retrieval method fuses text and image retrieval methods. A UNICODE and a...
Conference Paper
Line segmentation is the first and the most critical pre-processing step for a document recognition/analysis task. Complex handwritten documents with lines running into each other impose a great challenge for the line segmentation problem due to the absence of online stroke information. This paper describes a method to disentangle lines running i...
Conference Paper
Writer adaptation or specialization is the adjustment of handwriting recognition algorithms to a specific writer's style of handwriting. Such adjustment yields significantly improved recognition rates over counterpart general recognition algorithms. We present the first unconstrained off-line handwriting adaptation algorithm for Arabic presented in...
Conference Paper
Impression evidence in the form of shoe-prints is commonly found at crime scenes. A critical step in automatic shoe-print identification is extraction of the shoe-print pattern. It involves isolating the shoe-print foreground (impressions made by the shoe) from the remaining elements (background and noise). The problem is formulated as one of labe...
Article
A study of the discriminability of fingerprints of twins is presented. The fingerprint data used is of high quality and quantity because of a predominantly young subject population of 298 pairs of twins whose tenprints were captured using a livescan device. Discriminability using level 1 and level 2 features is independently reported. The level 1...
Article
Writer adaptation or specialization is the adjustment of handwriting recognition algorithms to a specific writer's style of handwriting. Such adjustment yields significantly improved recognition rates over counterpart general recognition algorithms. We present a discussion of a method of prototype integration for writer adaptation and evaluate th...
Chapter
Signature verification is a common task in forensic document analysis. Its aim is to determine whether a questioned signature matches known signature samples. From the viewpoint of automating the task it can be viewed as one that involves machine learning from a population of signatures. There are two types of learning tasks to be accomplished: pe...
Conference Paper
Generative models of pattern individuality attempt to represent the distribution of observed quantitative features, e.g., by learning parameters from a database, and then use such distributions to determine the probability of two random patterns being the same. Considering fingerprint patterns, Gaussian distributions have been previously used for m...
Article
Offline Chinese handwriting recognition (OCHR) is a typically difficult pattern recognition problem. Many authors have presented various approaches to recognizing its different aspects. We present a survey and an assessment of relevant papers appearing in recent publications of relevant conferences and journals, including those appearing in ICDAR,...
Chapter
The biometric verification task is one of determining whether an input consisting of measurements from an unknown individual matches the corresponding measurements of a known individual. This chapter describes a statistical learning methodology for determining whether a pair of biometric samples belong to the same individual. The methodology involv...
Article
Matching of partial fingerprints has important applications in both biometrics and forensics. It is well-known that the accuracy of minutiae-based matching algorithms decreases dramatically as the number of available minutiae decreases. When singular structures such as core and delta are unavailable, general ridges can be utilized. Some existing hig...
Conference Paper
Automating the task of scoring short handwritten student essays is considered. The goal is to assign scores which are comparable to those of human scorers by coupling two AI technologies: optical handwriting recognition and automated essay scoring. The test-bed is that of essays written by children in reading comprehension tests. The proces...
Conference Paper
The problem of writer verification is to make a decision of whether or not two handwritten documents are written by the same person. Providing a strength of evidence for any such decision is an integral part of the writer verification problem. The strength of evidence should incorporate (i) The amount of information compared in each of the two docu...
Conference Paper
The paper describes a lexicon driven approach for word recognition on handwritten documents using conditional random fields (CRFs). CRFs are discriminative models and do not make any assumptions about the underlying data and hence are known to be superior to hidden Markov models (HMMs) for sequence labeling problems. For word recognition, the docum...
Article
Over a hundred years, several attempts have been made to quantitatively establish the degree of individuality of fingerprints. Measurements have been made using models based on grids, ridges, fixed probabilities, relative measurements and generative distributions. This paper is a survey and assessment of various fingerprint individuality models pro...
Article
In the analysis and recognition of handwriting, a useful first task is to assign ground truth for words in the writing. Such an assignment is useful for various subsequent machine learning tasks for performing automatic recognition, writer verification, etc. Since automatic word segmentation and recognition can be error prone, an intermediate ap...
Article
Given a set of handwritten documents, a common goal is to search for a relevant subset. Attempting to find a query word or image in such a set of documents is called word spotting. Spotting handwritten words in documents written in the Latin alphabet, and more recently in Arabic, has received considerable attention. One issue is generating candida...
Article
The fingerprint verification task answers the question of whether or not two fingerprints belong to the same finger. The paper focuses on the classification aspect of fingerprint verification. Classification is the third and final step after the two earlier steps of feature extraction, where a known set of features (minutiae points) have bee...
Chapter
Understanding printed documents such as newspapers is a common intelligent activity of humans. Making a computer perform the task of analyzing a newspaper image and derive useful high-level representations requires the development and integration of techniques in several areas, including pattern recognition, computer vision, language understanding...
Conference Paper
Handwritten essays are widely used in educational assessments, particularly in classroom instruction. This paper concerns the design of an automated system for performing the task of taking as input scanned images of handwritten student essays in reading comprehension tests and to produce as output scores for the answers which are analogous to th...
Conference Paper
In searching a repository of business documents, a task of interest is that of using a query signature image to retrieve from a database, other signatures matching the query. The signature retrieval task involves a two-step process of extracting all the signatures from the documents and then performing a match on these signatures. This paper presen...
Conference Paper
Signature verification is a common task in forensic document analysis. It is one of determining whether a questioned signature matches known signature samples. From the viewpoint of automating the task it can be viewed as one that involves machine learning from a population of signatures. There are two types of learning to be accomplished. In the f...
Article
A system for spotting words in scanned document images in three scripts, Devanagari, Arabic and Latin is described. Three main components of the system are a word segmenter, a shape based matcher for words and a search interface. The user gives a query which can be either a word image or text. The candidate words that are searched in the docum...
Article
New machine learning strategies are proposed for person identification which can be used in several biometric modalities such as friction ridges, handwriting, signatures and speech. The biometric or forensic performance task answers the question of whether or not a sample belongs to a known person. Two different learning paradigms are discussed: pe...
Article
Automatic signature verification of scanned documents is presented here. The strategy used for verification is applicable in scenarios where there are multiple knowns (genuine signature samples) from a writer. First the learning process involves learning the variation and similarities from the known genuine samples from the given writer and then c...
Conference Paper
Search aspects of a system for analyzing handwritten documents are described. Documents are indexed using global image features, e.g., stroke width, slant as well as local features that describe the shapes of words and characters. Image indexing is done automatically using page analysis, page segmentation, line separation, word segmentation and rec...
Conference Paper
A statistical model for determining whether a pair of documents, a known and a questioned, were written by the same individual is proposed. The model has the following four components: (i) discriminating elements, e.g., global features and characters, are extracted from each document; (ii) differences between corresponding elements from each docume...
Conference Paper
The design and performance of a content-based information retrieval system for handwritten documents is described. System indexing and retrieval is based on writer characteristics, textual content as well as document meta data such as writer profile. Documents are indexed using global image features, e.g., stroke width, slant, word gaps, as well lo...
Article
Most fast k-nearest neighbor (k-NN) algorithms exploit metric properties of distance measures for reducing computation cost and a few can work effectively on both metric and nonmetric measures. We propose a cluster-based tree algorithm to accelerate k-NN classification without any presuppositions about the metric form and properties of a dissimilar...
Conference Paper
Full-text available
A recognition-based system was developed for constructing handwriting databases. The system automatically recognizes the word and the character images in handwritten document images by applying a transcript mapping algorithm. The transcript-mapping process is modeled as an optimization problem involving multiple word-segmentation hypotheses, word r...
Conference Paper
Analysis of allographs (characters) and allograph combinations (words) is the key for the identification/verification of a writer's handwriting. While allographs are usually part of words and the segmentation of a word into allographs is a subjective process, analysis of handwritten words is a natural option, complementary to allograph and docu...
Conference Paper
In certain spaces using some distance measures, the sum of any two distances is always bigger than the third one. Such a special property is called the tri-edge inequality (TEI). In this paper, the tri-edge inequality characterizing several binary distance measures is mathematically proven and experimentally verified, and the implications of TEI ar...
Conference Paper
Using handwritten characters we address two questions (i) what is the group identification performance of different alphabets (upper and lower case) and (ii) what are the best characters for the verification task (same writer/different writer discrimination) knowing demographic information about the writer such as ethnicity, age or sex. The Bhattac...
Conference Paper
Existing word image retrieval algorithms suffer from either low retrieval precision or high computation complexity. We present an effective and efficient approach for word image matching by using gradient-based binary features. Experiments over a large database of handwritten word images show that the proposed approach consistently outperforms the...
Article
Classifying an unknown input is a fundamental problem in pattern recognition. A common method is to define a distance metric between patterns and find the most similar pattern in the reference set. When patterns are in binary feature vector form, there have been two approaches to improve the performance over the equal-weighted Hamming distance metr...
Article
The analysis of handwritten documents from the viewpoint of determining their writership has great bearing on the criminal justice system. In many cases, only a limited amount of handwriting is available and sometimes it consists of only numerals. Using a large number of handwritten numeral images extracted from about 3000 samples written by 1000 w...
Article
We investigate the combination of Type-III classifiers using the Dempster-Shafer Theory of Evidence. Various methods of building BPA's for each classifier using both "global" and "local" classifier information are explored. We propose modifications to two established BPA-computation methods to make them better suited for combining Type-III classifie...
Conference Paper
Several dissimilarity measures for binary vectors are formulated and examined for their recognition capability in handwriting identification for which the binary micro-features are used to characterize handwritten character shapes. Pertaining to eight dissimilarity measures, i.e., Jaccard-Needham, Dice, Correlation, Yule, Russell-Rao, Sokal-Michene...
Conference Paper
The Dempster-Shafer Theory of Evidence is an established method for combining different sources of information. In this paper we explore ways to improve the combination performance by building a better BPA for each classifier using both "global" and "local" classifier information. We propose modifications to two well-known BPA-computation...
Conference Paper
Analysis of allographs (characters) and allograph combinations (words) is the key for obtaining the discriminating elements of handwriting. While allographs usually inhabit words and segregation of a word into allographs is more subjective than objective, especially for cursive writing, analysis of handwritten words is a natural and bett...
Conference Paper
Analysis of handwritten characters (allographs) plays an important role in forensic document examination. However, a comprehensive and quantitative study on the individuality of handwritten characters has so far been lacking. Based on a large number of handwritten characters extracted from handwriting samples of 1000 individuals in US, the individuality o...
Article
Optical character recognition (OCR) is performed by optical character readers which are automated electronic systems. OCR may be defined as the process of converting images of machine printed or handwritten numerals, letters, and symbols into a computer-processable format. The long history of research in this area, commercial success, and the cont...
Chapter
String distance measures are useful in both on-line and off-line character recognition for comparing on-line stroke and off-line contour sequence strings. Since stroke and contour string elements are angular in that they represent a circular measurement (0° ~ 360°), usual edit distances with cost matrix are inadequate for this type of strings. For...
Article
Motivated by several rulings in United States courts concerning expert testimony in general, and handwriting testimony in particular, we undertook a study to objectively validate the hypothesis that handwriting is individual. Handwriting samples of 1,500 individuals, representative of the U.S. population with respect to gender, age, ethnic groups,...
Article
A distance measure between two histograms has applications in feature selection, image indexing and retrieval, pattern classification and clustering, etc. We propose a distance between sets of measurement values as a measure of dissimilarity of two histograms. The proposed measure has the advantage over the traditional distance measures regarding t...
Article
Classifying an unknown input is a fundamental problem in Pattern Recognition. One standard method is finding its nearest neighbors in a reference set. It would be very time consuming if computed feature by feature for all templates in the reference set; this naïve method is O(nd) where n is the number of templates in the reference set and d is the...
Conference Paper
Multi-classifier combination based on Dempster-Shafer theory of evidence has demonstrated its superior performance. In the approach based on Dempster-Shafer theory, the basic probability assignments for evidence are usually derived from classifiers' global performance. However, our study discovered that while using classifiers' global performance a...
Article
In our previous work of writer identification, a database of handwriting samples (written in English) of over one thousand individuals was created, and two types of computer-generated features of sample handwriting were extracted: macro and micro features. Using these features, writer identification experiments were performed: given that a document...
Article
A study was undertaken to determine the power of handwriting to distinguish between individuals. Handwriting samples of one thousand five hundred individuals, representative of the US population with respect to gender, age, ethnic groups, etc., were obtained. Analyzing differences in handwriting was done by using computer algorithms for extracting...
Conference Paper
Foreign mail recognition (FMR) is part of the more general problem of recognizing destination addresses in a mail stream. It is defined as the problem of finding the country of destination of a mail piece sent to a foreign address. We discuss some of the differences between FMR and domestic mail recognition (DMR) and present its specific challe...
Conference Paper
Full-text available
The sub-category classification problem is that of discriminating a pattern to all sub-categories. Not surprisingly, sub-category classification performance estimates are useful information to mine as many researchers are interested in any trend of pattern in specific sub-category. This paper presents a datamining technique to mine a database consi...
Conference Paper
Full-text available
Motivated by several rulings in United States courts concerning expert testimony in general and handwriting testimony in particular, we undertook a study to objectively validate the hypothesis that handwriting is individualistic. Handwriting samples of 1500 individuals, representative of the US population with respect to gender, age, ethnic groups,...
Conference Paper
Full-text available
We undertook a study to objectively validate the hypothesis that handwriting is individualistic. Handwriting samples of one thousand five hundred individuals, representative of the US population with respect to gender, age, ethnic groups, etc., were obtained. Analyzing differences in handwriting was done by using computer algorithms for extracting f...
Article
In our previous work of writer identification, a database of handwriting samples (written in English) of over one thousand individuals was created, and two types of computer-generated features of sample handwriting were extracted: macro and micro features. Using these features, writer identification experiments were performed: given that a document...
Article
This paper describes an off-line handwritten document data collection effort conducted at CEDAR and discusses systems that manage the document image data. We introduce the CEDAR letter, discuss its completeness and then describe the specification of the CEDAR letter image database consisting of writer data and features obtained from a handwriting s...
Article
Full-text available
A word recognition algorithm is proposed that integrates character recognition with word shape analysis. The algorithm consists of a set of serial filters and parallel classifiers, and the decisions are combined to generate a consensus ranking of the input lexicon. Experimental results with multifont machine-printed word images are discussed.
Article
Full-text available
This paper presents a word shape analysis approach for word recognition that is independent of character segmentation. The algorithm receives a word image and a lexicon. A set of global and local shape features are extracted from the image and matched with words in the lexicon by a set of highly specialized classifiers. A ranking combination strate...
Article
Full-text available
Difficult pattern recognition problems involving large class sets and noisy input can be solved by a multiple classifier system, which allows simultaneous use of arbitrary feature descriptors and classification procedures. Independent decisions by each classifier can be combined by methods of the highest rank, Borda count, and logistic regression,...
Article
In this paper, we demonstrate that this is possible, and propose a method that can be used to combine the decisions of individual classifiers to obtain a classification procedure which performs better than any of the individual classifiers
Article
Full-text available
A technique for combining the results of classifier decisions in a multi-classifier recognition system is presented. Each classifier produces a ranking of a set of classes. The combination technique uses these rankings to determine a small subset of the set of classes that contains the correct class. A group consensus function is then applied to re...
Article
Full-text available
A regression method is proposed to combine decisions of multiple character recognition algorithms. The method computes a weighted sum of the rank scores produced by the individual classifiers and derives a consensus ranking. The weights are estimated by a logistic regression analysis. Two experiments are discussed where the method was applied to rec...
Article
Full-text available
A top-down approach to word recognition is proposed. Discussions are presented on dynamically selecting the most effective feature combinations, which are applied to discriminate between a limited set of word hypotheses.
Conference Paper
This paper is to determine the statistical validity of individuality in handwriting based on measurement of features, quantification and statistical analysis. In classification problems such as writer, face, finger print or speaker identification, the number of classes is very large or unspecified. To establish the inherent distinctness of the clas...
Article
Firm name recognition provides a useful source of information for automatic postal address interpretation. This paper presents two approaches to firm name recognition. The word-based approach treats a firm name as a list of words each providing an index to the database. The character-based approach treats a firm name as a sequence of characters and...
Conference Paper
The similarity between two histograms has attracted many researchers in various fields. The type of histograms to be matched is often angular such as gradient directions in character images and hue values in color images. The distance between two angular type histograms differs from those of nominal or ordinal type histograms; however, conventional...