Article

OCR — Optical Character Recognition


Abstract

OCR (Optical Character Recognition) denotes the automatic recognition of printed characters by means of optical scanning. Put simply, with the help of OCR the computer transcribes printed texts on its own. The technology is used above all in document recognition, form processing, archiving systems, and receipt capture. However, applying it requires not only a scanner but also OCR software on the PC, which converts documents into formats such as DOC, HTML, PDF, or TXT for further processing. Once the document has been converted, the scanned text can be edited by the practice team.
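A minimal sketch of the scan-then-recognize workflow described above, assuming the open source Tesseract engine and its pytesseract wrapper are installed, that a German language pack is available, and that "scan.png" is a scanned page; the file names and the plain-text output format are illustrative choices, not part of the abstract.

```python
from PIL import Image
import pytesseract

# Recognize the characters in a scanned page (German language pack assumed).
text = pytesseract.image_to_string(Image.open("scan.png"), lang="deu")

# Store the result as plain TXT so the recognized text can be edited further.
with open("scan.txt", "w", encoding="utf-8") as f:
    f.write(text)
```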


... Artificial neural networks, also known as multilayer perceptrons, are "a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs" (Reh, 2012). Artificial neural networks (ANN) are composed of nodes which imitate neurons in the organic brain. ...
... Each node in the first hidden layer represents an activated sum of weighted values from the input layer. Each connection of a node of the input layer and that of the first hidden layer has an associated weight that determines how strongly that input affects the target node (Reh, 2012). ...
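The weighted-sum-and-activation behaviour described in these excerpts can be illustrated with a small sketch; the layer sizes, weights, and the logistic activation below are illustrative assumptions rather than details from the cited work.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation squashing the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hidden_layer(x, W, b):
    """Each hidden node is an activated, weighted sum of the inputs.

    x : (n_inputs,) input vector
    W : (n_hidden, n_inputs) connection weights; W[j, i] controls how
        strongly input i affects hidden node j
    b : (n_hidden,) bias terms
    """
    return sigmoid(W @ x + b)

# Toy example: 3 inputs feeding 2 hidden nodes (all values illustrative).
x = np.array([0.5, -1.0, 2.0])
W = np.array([[0.1, 0.4, -0.2],
              [0.7, -0.3, 0.5]])
b = np.array([0.0, 0.1])
print(hidden_layer(x, W, b))
```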
Article
Full-text available
Abstract: The article addresses the issue of Georgian handwritten text recognition. As a result of the performed research activity, a framework for recognizing handwritten Georgian text using Self-Normalizing Convolutional Neural Networks (CNN) was developed. To train the CNN model, an extensive dataset was created with over 200,000 character samples. This framework has been deployed as a web service, as well as in the form of apps for Windows, Linux, and iOS. Keywords: Artificial intelligence, CNN, SNN, OCR, Handwritten Character Recognition.
Chapter
Classification tries to assign the best category to given unknown records based on previous observations. It is clear that with the growing amount of data, any classification algorithm can become very slow. The learning speed of many developed state-of-the-art algorithms like deep neural networks or support vector machines is very low. Evolutionary-based approaches to classification have the same problem. This paper describes five different evolutionary-based approaches that solve the classification problem and run in real time. This was achieved by using GPU parallelization. These classifiers are evaluated on two collections that contain millions of records. The proposed parallel approach is much faster and preserves the same precision as the serial version.
Article
Full-text available
This paper addresses the problem of retrieving meaningful geometric information implied in image data. We outline a general algorithmic scheme to solve the problem in any geometric domain. The scheme, which depends on the domain, may lead to concrete algorithms when the domain is properly and formally specified. Taking plane Euclidean geometry 𝔼 as an example of the domain, we show how to formally specify 𝔼 and how to concretize the scheme to yield algorithms for the retrieval of meaningful geometric information in 𝔼. For images of hand-drawn diagrams in 𝔼, we present concrete algorithms to retrieve typical geometric objects and geometric relations, as well as their labels, and demonstrate the feasibility of our algorithms with experiments. An example is presented to illustrate how nontrivial geometric theorems can be generated from retrieved geometric objects and relations and thus how implied geometric knowledge may be discovered automatically from images.
Article
Full-text available
The image processing technique known as super-resolution (SR), which attempts to increase the effective pixel sampling density of a digital imager, has gained rapid popularity over the last decade. The majority of literature focuses on its ability to provide results that are visually pleasing to a human observer. In this paper, we instead examine the ability of SR to improve the resolution-critical capability of an imaging system to perform a classification task from a remote location, specifically from an airborne camera. In order to focus the scope of the study, we address and quantify results for the narrow case of text classification. However, we expect the results generalize to a large set of related, remote classification tasks. We generate theoretical results through simulation, which are corroborated by experiments with a camera mounted on a DJI Phantom 3 quadcopter.
Article
Full-text available
In this paper, we propose an object categorization framework to extract different visual cues and tackle the problem of categorizing previously unseen objects under various viewpoints. Specifically, we decompose the input image into three visual cues: structure, texture and shape cues. Then, local features are extracted using the log-polar transform to achieve scale and rotation invariance. The local descriptors obtained from different visual cues are fused using the bag-of-words representation with some key contributions: (1) a keypoint detection scheme based on variational calculus is proposed for selecting sampling locations; (2) a codebook optimization scheme based on discrete entropy is proposed to choose the optimal codewords and at the same time increase the overall performance. We tested the proposed object classification framework on the ETH-80 dataset using the leave-one-object-out protocol to specifically tackle the problem of categorizing previously unseen objects under various viewpoints. On this popular dataset, the proposed object categorization system obtained a very high improvement in classification performance compared to state-of-the-art methods.
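As a rough illustration of the bag-of-words step mentioned in this abstract (local descriptors quantized against a codebook and pooled into a histogram), the sketch below assumes precomputed descriptors and a plain k-means codebook; the log-polar features, variational keypoint detection, and entropy-based codebook optimization are not reproduced.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

# Illustrative data: 500 local descriptors of dimension 32 from one image,
# and a codebook of 64 visual words learned from training descriptors.
rng = np.random.default_rng(0)
train_descriptors = rng.normal(size=(5000, 32))
image_descriptors = rng.normal(size=(500, 32))

codebook, _ = kmeans2(train_descriptors, 64)   # learn the visual vocabulary
words, _ = vq(image_descriptors, codebook)     # assign each descriptor to a word

# Bag-of-words representation: normalized histogram of visual-word counts.
hist = np.bincount(words, minlength=64).astype(float)
hist /= hist.sum()
print(hist.shape, hist.sum())
```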
Article
The current paper propounds a method for extracting text from images of book covers and embedded text. Automating this process greatly reduces human intervention while converting books (specifically their covers, where this task becomes extremely difficult) to a readable and editable electronic format, in particular for electronic book readers. To achieve this purpose we propose a technique which works on scanned images of documents. The image is first clustered to reduce the number of color variances, a suitable plane is identified, and then the text region is segmented using a connected-component-based method. The text thus obtained is then enhanced to improve the results.
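A minimal sketch of the connected-component segmentation step, assuming a grayscale cover image with dark text on a light background; the threshold and area limits are illustrative, and the color-clustering and plane-selection stages from the abstract are omitted.

```python
import numpy as np
from scipy import ndimage

def candidate_text_components(gray, threshold=128, min_area=20, max_area=5000):
    """Rough sketch of connected-component text segmentation.

    gray      : 2-D array of grayscale pixel values (0-255)
    threshold : illustrative global binarization threshold
    Returns bounding boxes (slice pairs) of components whose area falls
    in a plausible character-size range.
    """
    binary = gray < threshold                 # assume dark text on light cover
    labels, n = ndimage.label(binary)         # connected-component labelling
    boxes = []
    for obj in ndimage.find_objects(labels):
        area = (obj[0].stop - obj[0].start) * (obj[1].stop - obj[1].start)
        if min_area <= area <= max_area:
            boxes.append(obj)
    return boxes
```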
Article
As smartphones become ever more present and interwoven into the daily computing of individuals, a broader perspective of the differences between computer security and smartphone security must be considered. As a general purpose computer, smartphones inherently suffer from all the same computer security issues as traditional computers; however, there exist fundamental differences between smartphones and traditional computing in how we interact with smartphones via the touchscreen. Smartphone interaction is physical, hand-held, and tactile, and this thesis shows how this interaction leads to new side channel vulnerabilities. This is demonstrated through the study of two side channels: one based on external smartphone observations via photographic and forensic evidence, and the other based on internal smartphone observations via the smartphone's on-board sensors. First, we demonstrate a smudge attack, a side channel resulting from oily residues remaining on the touch screen surface after user input. We show that these external observations can reveal users' Android password patterns, and we show that properties of the Android password pattern, in particular, render it susceptible to this attack. Next, we demonstrate a sensor-based side channel that leverages the smartphone's internal on-board sensors, particularly the accelerometer, to surreptitiously learn about user input. We show that such attacks are practical; however, broad dictionary-based attacks may be challenging. The contributions of this thesis also speak to the future of security research as new computing platforms with new computing interfaces are developed. We argue that a broad perspective of the security of these new devices must be considered, including the computing interface.
Article
Full-text available
Research on event-based processing and analysis of media is receiving an increasing attention from the scientific community due to its relevance for an abundance of applications, from consumer video management and video surveillance to lifelogging and social media. Events have the ability to semantically encode relationships of different informational modalities, such as visual-audio-text, time, involved agents and objects, with the spatio-temporal component of events being a key feature for contextual analysis. This unveils an enormous potential for exploiting new information sources and opening new research directions. In this paper, we survey the existing literature in this field. We extensively review the employed conceptualization of the notion of event in multimedia, the techniques for event representation and modeling, the feature representation and event inference approaches for the problems of event detection in audio, visual, and textual content. Furthermore, we review some key event-based multimedia applications, and various benchmarking activities that provide solid frameworks for measuring the performance of different event processing and analysis systems. We provide an in-depth discussion of the insights obtained from reviewing the literature and identify future directions and challenges.
Chapter
This chapter gives an overview of the most important application areas of Markov model technology. First, the automatic recognition of speech will be considered as the prototypical application, before the two further main application areas are presented, namely character and handwriting recognition as well as the analysis of biological sequences. The chapter closes with an outlook onto some of the many further fields of application for Markov models.
Article
Inspection and monitoring of key components of nuclear power plant reactors is an essential activity for understanding the current health of the power plant and ensuring that they continue to remain safe to operate. As the power plants age, and the components degrade from their initial start-of-life conditions, the requirement for more and more detailed inspection and monitoring information increases. Deployment of new monitoring and inspection equipment on existing operational plant is complex and expensive, as the effect of introducing new sensing and imaging equipment to the existing operational functions needs to be fully understood. Where existing sources of data can be leveraged, the need for new equipment development and installation can be offset by the development of advanced data processing techniques. This paper introduces a novel technique for creating full 360° panoramic images of the inside surface of fuel channels from in-core inspection footage. Through the development of this technique, a number of technical challenges associated with the constraints of using existing equipment have been addressed. These include: the inability to calibrate the camera specifically for image stitching; dealing with additional data not relevant to the panorama construction; dealing with noisy images; and generalising the approach to work with two different capture devices deployed at seven different Advanced Gas Cooled Reactor nuclear power plants. The resulting data processing system is currently under formal assessment with a view to replacing the existing manual assembly of in-core defect montages. Deployment of the system will result in significant time savings on the critical outage path for the plant operator and will allow improved visualization of the surface of the inside of fuel channels, far beyond that which can be gained from manually analysing the raw video footage as is done at present.
Article
Image dehazing has been extensively studied, but the performance evaluation method for dehazing techniques has not attracted significant interest. This paper surveys many existing performance evaluation methods of image dehazing. In order to analyze the reliability of the evaluation methods, synthetic hazy images are first reconstructed using the ground-truth color and depth image pairs, and the dehazed images are then compared with the original haze-free images. Meanwhile we also evaluate dehazing algorithms not by the dehazed images' quality but by the performance of computer vision algorithms before/after applying image dehazing. All the aforementioned evaluation methods are analyzed and compared, and research direction for improving the existing methods is discussed.
Conference Paper
Hidden links are designed solely for search engines rather than visitors. To get high search engine rankings, link hiding techniques are usually used for the profitability of underground economies, such as illicit game servers, false medical services, illegal gambling, and other less reputable but high-profit industries. This paper investigates hyperlink hiding techniques on the Web, and gives a detailed taxonomy. We believe the taxonomy can help develop appropriate countermeasures. Statistical experimental results on real Web data indicate that link hiding techniques are very prevalent. We also tried to explore the attitude of Google towards link hiding spam by analyzing the PageRank values of relative links. The results show that more should be done to punish the hidden link spam.
Article
Progress in optical character recognition, which underlies most applications of document processing, has been driven mainly by technological advances in microprocessors and optical sensor arrays. Software development based on algorithmic innovations appears to be reaching the point of diminishing returns. Research results, dispersed among a dozen venues, tend to lag behind commercial methodology. Some early mainline applications, like reading typescript, patents and law books, have already become obsolete. Check, postal address, and form processing are on their way out. Open source software may open up niche applications that don't generate enough revenue for commercial developers, including poorly-funded transcription of historical documents (especially genealogical records). Smartphone cameras and wearable technologies are engendering new image-based applications, but there is little evidence of widespread adoption. As document contents are integrated into a web-based continuum of data, they are likely losing even the meager individuality of discrete sheets of paper. The persistent need to create, preserve and communicate information is giving rise to entirely new genres of digital documents with a concomitant need for new approaches to document understanding.
Article
Automatic Bangla character recognition has been a great challenge for research and development because of the huge number of characters, change of shape in a word and in conjunctive characters, and other similar reasons. An optical joint transform correlation-based technique is developed for Bangla character recognition which involves a simple architecture, but can operate at a very high speed because of optics, and offer a very high level of accuracy with negligible false alarms. The proposed correlation technique can successfully identify a target character in a given input scene by producing a single correlation peak per target at the target location. The discrimination between target and non-target correlation peaks is found to be very high even in noisy conditions. The recognition performance of the proposed technique is observed to be insensitive to the type and number of targets. Further improvement of the technique is made by incorporating a synthetic discriminant function, which is created from distorted images of the target character and hence can make the system efficiently recognize Bangla characters in different practical scenarios.
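The optical correlator itself cannot be reproduced in software, but the underlying idea of spotting a target character through a correlation peak can be sketched digitally with FFT-based cross-correlation; everything below is an illustrative digital analogue, not the authors' joint transform architecture or synthetic discriminant function.

```python
import numpy as np

def correlation_peak(scene, template):
    """Digital analogue of correlation-based character spotting.

    Cross-correlates a template character with an input scene via the FFT
    and returns the location and value of the strongest correlation peak.
    """
    # Zero-pad the template to the scene size.
    padded = np.zeros_like(scene, dtype=float)
    padded[: template.shape[0], : template.shape[1]] = template
    # Correlation theorem: corr = IFFT(FFT(scene) * conj(FFT(template))).
    corr = np.fft.ifft2(np.fft.fft2(scene) * np.conj(np.fft.fft2(padded))).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return peak, corr[peak]
```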
Article
Full-text available
Despite the large number of approaches and techniques proposed to solve the Arabic handwriting recognition problems, the corresponding results remain weak. Indeed, intensive experiments revealed that such approaches and techniques are unable to deal properly especially with large quantities of handwritten Arabic documents. The complex morphology of Arabic writing is mainly behind this weakness. A deep study of some of these existing approaches and techniques fortunately revealed their complementarity. Such complementarity can be exploited by making them collaborate in a flexible manner. This flexible collaboration can substantially improve the recognition rate and may consequently lead to building powerful Arabic handwriting systems. Web services seem to be an adequate technology which can make possible the flexible collaboration of several approaches and techniques to solve certain problems. Consequently, we first present in this paper a comprehensive review of Arabic handwriting recognition, commonly known as Arabic Optical Character Recognition (AOCR), approaches and techniques. Then, we present our idea, which consists in building AOCR systems based on the flexible collaboration of two or more complementary approaches and techniques by using web service technology.
Article
The standard approach to recognizing text in images consists in first classifying local image regions into candidate characters and then combining them with high-level word models such as conditional random fields. This paper explores a new paradigm that departs from this bottom-up view. We propose to embed word labels and word images into a common Euclidean space. Given a word image to be recognized, the text recognition problem is cast as one of retrieval: find the closest word label in this space. This common space is learned using the Structured SVM framework by enforcing matching label-image pairs to be closer than non-matching pairs. This method presents several advantages: it does not require ad-hoc or costly pre-/post-processing operations, it can build on top of any state-of-the-art image descriptor (Fisher vectors in our case), it allows for the recognition of never-seen-before words (zero-shot recognition) and the recognition process is simple and efficient, as it amounts to a nearest neighbor search. Experiments are performed on challenging datasets of license plates and scene text. The main conclusion of the paper is that with such a frugal approach it is possible to obtain results which are competitive with standard bottom-up approaches, thus establishing label embedding as an interesting and simple to compute baseline for text recognition.
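The retrieval view of recognition can be sketched as follows, assuming the common-space projections have already been learned (for example with a structured-SVM objective); the descriptor here is only a stand-in for the Fisher vectors used in the paper.

```python
import numpy as np

def recognize(word_image_descriptor, W_img, label_embeddings, lexicon):
    """Cast recognition as retrieval in a common embedded space.

    word_image_descriptor : (d,) image descriptor (stand-in for a Fisher vector)
    W_img                 : (k, d) learned projection of images into the common space
    label_embeddings      : (n_words, k) embeddings of the lexicon word labels
    lexicon               : list of n_words candidate transcriptions
    """
    q = W_img @ word_image_descriptor                      # embed the image
    dists = np.linalg.norm(label_embeddings - q, axis=1)   # compare to every label
    return lexicon[int(np.argmin(dists))]                  # nearest label wins
```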
Conference Paper
This paper extends our work on automated discovery of geometric theorems from diagrams by taking scanned and photographed images instead of images produced with dynamic geometry software. We first adopt techniques of Hough transform and randomized detection algorithms to detect geometric objects from scanned and photographed images, then use methods of image matching to recognize labels for the detected geometric objects, and finally employ numerical-computation-based methods to mine geometric relations among the objects. Experiments with a preliminary implementation of the techniques and methods demonstrate the effectiveness and efficiency of geometric information retrieval from scanned and photographed images for the purpose of discovering geometric theorems automatically.
Article
Full-text available
This article presents selected research on the development of complex fundamentals of building intelligent interactive systems for the design of machine elements and assemblies on the basis of their features described in natural language. We propose a new method for handwriting recognition that utilizes geometric features of letters. The article deals with recognition of isolated handwritten characters using neural networks. As a result of the geometrical analysis, graphical representations of recognized characters are obtained in the form of pattern descriptions of isolated characters. Selected parameters of the characters are inputs to the neural network for writing recognition, which is font independent. In this article, we present a new method for off-line natural writing recognition and also describe our research and conclusions on the experiments.
Chapter
Document segmentation is the process of dividing a document (handwritten or printed) into its base components (lines, words, characters). Once the zones (text and non-text) have been identified, the segmentation of the text elements can begin. Several challenges exist which need to be worked out in order to segment the elements correctly. For line segmentation, touching, broken, or overlapping text lines frequently occur. Handwritten documents have the additional challenge of curvilinear lines. Once a line has been segmented, it is processed to further segment it into characters. Similar problems of touching and broken elements exist for characters. An added level of complexity exists since documents have a degree of noise which can come from scanning, photocopying, or from physical damage. Historical documents have some amount of degradation to them. In addition, variation of typefaces for printed text and of styles for handwritten text brings new difficulties for segmentation and recognition algorithms. This chapter contains descriptions of some methodologies, presented from recent research, that propose solutions to overcome these obstacles. Line segmentation solutions include horizontal projection, region growth techniques, probability density, and the level set method as possible, albeit partial, solutions. A method of angle stepping to detect angles for slanted lines is presented. Locating the boundaries of characters in historical, degraded ancient documents employs multi-level classifiers and a level set active contour scheme as a possible solution. Mathematical expressions are generally more complex since their layout does not follow standard and typical text blocks. Lines can be composed of split sections (numerator and denominator), can have symbols spanning and overlapping other elements, and contain a higher concentration of superscript and subscript characters than regular text lines. Template matching is described as a partial solution to segment these characters. The methods described here apply to both printed and handwritten text. They have been tested on Latin-based scripts as well as Arabic, Dari, Farsi, Pashto, and Urdu.
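As an example of the simplest of the line segmentation techniques named above, horizontal projection, the sketch below finds runs of rows containing ink in a binarized page; it is only adequate for clean, roughly horizontal printed lines and does not reflect the more elaborate methods discussed in the chapter.

```python
import numpy as np

def segment_lines(binary_page):
    """Sketch of line segmentation by horizontal projection.

    binary_page : 2-D array, 1 for ink pixels, 0 for background
    Returns (start_row, end_row) pairs for runs of rows containing ink.
    """
    profile = binary_page.sum(axis=1)          # ink pixels per row
    ink_rows = profile > 0
    lines, start = [], None
    for r, has_ink in enumerate(ink_rows):
        if has_ink and start is None:
            start = r                          # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, r - 1))       # the line ended on the previous row
            start = None
    if start is not None:
        lines.append((start, len(ink_rows) - 1))
    return lines
```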
Chapter
One of the first application domains for computer science was Optical Character Recognition. At that time, it was expected that a machine would quickly be able to read any document. History has proven that the task was more difficult than that. This chapter explores the history of the document analysis and recognition domain, from OCR to page analysis and on to the open problems which are still to be completely dealt with.
Article
Full-text available
To reduce false alarms from smoke detectors, a series of experiments was conducted to collect and analyse the time series of signal patterns generated by the detectors under three fire categories – flaming fire (propanol), smoldering fire (cloth-cotton) and non-fire sources (joss stick and steam). The time series of each fire category was studied. A dissimilarity measure such as Euclidean distance was used to discriminate the data collected. It is used to classify the fire category into fire class or non-fire class and enhance the effectiveness of the fire alarm judgment system. 30 sets of learning samples of each fire category (total: 120 experiments) were collected. It showed that the accuracy of the smoke detector was over 80%. If a multi-sensor (smoke and heat) detector was used, the accuracy was over 90%.
Article
Full-text available
P300 spellers can provide a means of communication for individuals with severe neuromuscular limitations. However, its use as an effective communication tool is reliant on high P300 classification accuracies (>70%) to account for error revisions. Error-related potentials (ErrP), which are changes in EEG potentials when a person is aware of or perceives erroneous behaviour or feedback, have been proposed as inputs to drive corrective mechanisms that veto erroneous actions by BCI systems. The goal of this study is to demonstrate that training an additional ErrP classifier for a P300 speller is not necessary, as we hypothesize that error information is encoded in the P300 classifier responses used for character selection. We perform off-line simulations of P300 spelling to compare ErrP and non-ErrP based corrective algorithms. A simple dictionary correction based on string matching and word frequency significantly improved accuracy (35-185%), in contrast to an ErrP-based method that flagged, deleted and replaced erroneous characters (-50-0%). Providing additional information about the likelihood of characters to a dictionary-based correction further improves accuracy. Our Bayesian dictionary-based correction algorithm that utilizes P300 classifier confidences performed comparably (44-416%) to an oracle ErrP dictionary-based method that assumed perfect ErrP classification (43-433%).
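A minimal sketch of the simple dictionary-correction idea (string matching weighted by word frequency); the dictionary, frequencies, and scoring rule are illustrative, and the Bayesian use of P300 classifier confidences described in the abstract is not reproduced.

```python
from difflib import SequenceMatcher

# Illustrative dictionary with assumed relative word frequencies.
DICTIONARY = {"hello": 0.6, "help": 0.3, "hallo": 0.1}

def correct(spelled, dictionary=DICTIONARY):
    """Replace a possibly erroneous spelled string with the dictionary word
    that best trades off string similarity and word frequency."""
    def score(word, freq):
        similarity = SequenceMatcher(None, spelled, word).ratio()
        return similarity * freq        # crude stand-in for a Bayesian combination
    return max(dictionary.items(), key=lambda kv: score(*kv))[0]

print(correct("helko"))   # -> "hello" with these illustrative frequencies
```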
Article
In this paper we present a new OCR-concept designed for the requirements of historic prints in the context of mass-digitizations. The core part is the glyph recognition, based on pattern matching with patterns that are derived from computer font glyphs and are generated on-the-fly. The classification of a sample is organized as a search process for the most similar glyph pattern. This results in consistently good hit rates for arbitrary fonts without any training. In particular, we investigate the performance of our prototype in comparison to popular commercially available OCR-software.
Article
Full-text available
Feature extraction is extracting from the raw data the information which is most relevant for classification purposes. The following subsections describe the techniques and algorithms used to extract an assortment of features used in this research. We start by detecting the secondary components of the Arabic letters and extracting features from these components. Then we remove the secondary components and extract additional features from the main body, the main body's skeleton, and the main body's boundary.
Book
This book covers up-to-date methods and algorithms for the automated analysis of engineering drawings and digital cartographic maps. The Non-Deterministic Agent System (NDAS) offers a parallel computational approach to such image analysis. The book describes techniques suitable for persistent and explicit knowledge representation for engineering drawings and digital maps. It also highlights more specific techniques, e.g., applying robot navigation and mapping methods to this problem. Also included are more detailed accounts of the use of unsupervised segmentation algorithms to map images. Finally, all these threads are woven together in two related systems: NDAS and AMAM (Automatic Map Analysis Module).
Article
Full-text available
The article presents a solution for an integrated, content-based indexing and retrieval system for a multimedia archive. Content is indexed automatically using OCR, ASR, and face recognition techniques. Metadata storage draws in part on the techniques and standards Dublin Core, MPEG-7, and SQL. The system is being built as part of the international OASIS Archive project. 1. INTRODUCTION One of the key aspects of building an archive of multimedia content correctly is taking care, at creation time, to produce as accurate and precise a description of the archived material as possible. Recently, besides the standard manually generated description, automatically generated metadata based on content analysis has begun to play an increasingly important role. The second section of this article gives a basic overview of the current state of research on automatic metadata generation using various mechanisms of multimedia content analysis. The third section discusses in more detail one solution for integrated, content-based indexing and retrieval in a multimedia archive, which is to a large extent the result of the authors' own work. The conclusion presents findings, a summary, and the anticipated further stages of the work.
Article
We present a methodology for fast and accurate detection of social security numbers for automatic processing of handwritten forms. We use a K-means clustering classifier that operates on a low dimensional space of features to identify indi-vidual numerals. We cross-validate the results of the classification against a database of employee social security numbers and demonstrate that the probability of misclassification is extremely low.
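A sketch of classifying digits by distance to per-class centroids in a low-dimensional feature space; this is an illustrative simplification of the K-means clustering classifier named in the abstract, and the feature extraction and cross-validation against the employee database are omitted.

```python
import numpy as np

class NearestCentroidDigitClassifier:
    """Nearest-centroid digit classifier in a low-dimensional feature space.
    Centroids are simply the per-class means here, an illustrative
    simplification of a K-means-based classifier."""

    def fit(self, features, labels):
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.centroids_ = np.array(
            [features[labels == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, features):
        # Distance from every sample to every class centroid.
        dists = np.linalg.norm(
            features[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(dists, axis=1)]
```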
Article
Compared with machine-printed characters, handwritings have a variety of shape deformations. One of the important goals of character recognition is to find some qualitative features that are invariant under deformation of shapes. In this paper, we propose a method for structural analysis of on-line handwritten curves based on topological turning patterns. The topological turning pattern is described by the initial direction, directional change, and inflection number of the curve, computed from directional features of the segments that constitute the curve. The mathematical properties of the topological turning pattern are explained along with the experimental results of the method on numerals.
Article
The last main stage in an automatic number plate recognition system (ANPRs) is optical character recognition (OCR), where the number plate characters on the number plate image are converted into encoded texts. In this study, an artificial neural network-based OCR algorithm for ANPR application and its efficient architecture are presented. The proposed architecture has been successfully implemented and tested using the Mentor Graphics RC240 field programmable gate arrays (FPGA) development board equipped with a 4M Gates Xilinx Virtex-4 LX40. A database of 3570 UK binary character images have been used for testing the performance of the proposed architecture. Results achieved have shown that the proposed architecture can meet the real-time requirement of an ANPR system and can process a character image in 0.7 ms with 97.3% successful character recognition rate and consumes only 23% of the available area in the used FPGA.
Article
The cursive nature of Arabic writing is the main challenge to Arabic Optical Character Recognition developer. Methods to segment Arabic words into characters have been proposed. This paper provides a comprehensive review of the methods proposed by researchers to segment Arabic characters. The segmentation methods are categorized into nine different methods based on techniques used. The advantages and drawbacks of each are presented and discussed. Most researchers did not report the segmentation accuracy in their research; instead, they reported the overall recognition rate which did not reflect the influence of each sub-stage on the final recognition rate. The size of the training/testing data was not large enough to be generalized. The field of Arabic Character Recognition needs a standard set of test documents in both image and character formats, together with the ground truth and a set of performance evaluation tools, which would enable comparing the performance of different algorithms. As each method has its strengths, a hybrid segmentation approach is a promising method. The paper concludes that there is still no perfect segmentation method for ACR and much opportunity for research in this area.
Conference Paper
Full-text available
We propose a new method for natural writing recognition that utilizes geometric features of letters. The paper deals with recognition of isolated handwritten characters using an artificial neural network. As a result of the geometrical analysis performed, graphical representations of recognized characters are obtained in the form of pattern descriptions of isolated characters. The radius measurements of the characters obtained are inputs to the neural network for natural writing recognition, which is font independent. In this paper, we present a new method for off-line natural writing recognition and also describe our research and the tests performed on the neural network.
Article
Scientists have long dreamed of creating machines humans could interact with by voice. Although one no longer believes Turing's prophecy that machines will be able to converse like humans in the near future, real progress has been made in the voice and text-based human-machine interaction. This paper is a light introduction and survey of some deployed natural language systems and technologies and their historical evolution. We review two fundamental problems involving natural language: the language prediction problem and the language understanding problem. While describing in detail all these technologies is beyond our scope, we do comment on some aspects less discussed in the literature such as language prediction using huge models and semantic labeling using Marcus contextual grammars.
Article
Full-text available
The problem of handwritten digit recognition has long been an open problem in the field of pattern classification and of great importance in industry. The heart of the problem lies within the ability to design an efficient algorithm that can recognize digits written and submitted by users via a tablet, scanner, and other digital devices. From an engineering point of view, it is desirable to achieve a good performance within limited resources. To this end, we have developed a new approach for handwritten digit recognition that uses a small number of patterns for training phase. To improve the overall performance achieved in classification task, the literature suggests combining the decision of multiple classifiers rather than using the output of the best classifier in the ensemble; so, in this new approach, an ensemble of classifiers is used for the recognition of handwritten digit. The classifiers used in proposed system are based on singular value decomposition (SVD) algorithm. The experimental results and the literature show that the SVD algorithm is suitable for solving sparse matrices such as handwritten digit. The decisions obtained by SVD classifiers are combined by a novel proposed combination rule which we named reliable multi-phase particle swarm optimization. We call the method “Reliable” because we have introduced a novel reliability parameter which is applied to tackle the problem of PSO being trapped in local minima. In comparison with previous methods, one of the significant advantages of the proposed method is that it is not sensitive to the size of training set. Unlike other methods, the proposed method uses just 15 % of the dataset as a training set, while other methods usually use (60–75) % of the whole dataset as the training set. To evaluate the proposed method, we tested our algorithm on Farsi/Arabic handwritten digit dataset. What makes the recognition of the handwritten Farsi/Arabic digits more challenging is that some of the digits can be legally written in different shapes. Therefore, 6000 hard samples (600 samples per class) are chosen by K-nearest neighbor algorithm from the HODA dataset which is a standard Farsi/Arabic digit dataset. Experimental results have shown that the proposed method is fast, accurate, and robust against the local minima of PSO. Finally, the proposed method is compared with state of the art methods and some ensemble classifier based on MLP, RBF, and ANFIS with various combination rules.
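The SVD part of the approach can be sketched as follows: each class is represented by the leading left singular vectors of its training matrix, and a test digit is assigned to the class whose subspace reconstructs it with the smallest residual. The ensemble and the reliable multi-phase PSO combination rule from the paper are not reproduced here, and the rank is an illustrative choice.

```python
import numpy as np

def fit_svd_bases(train_by_class, rank=10):
    """For each digit class, keep the leading left singular vectors of the
    matrix whose columns are that class's (flattened) training images."""
    bases = {}
    for label, samples in train_by_class.items():   # samples: (n_samples, n_pixels)
        U, _, _ = np.linalg.svd(samples.T, full_matrices=False)
        bases[label] = U[:, :rank]
    return bases

def classify(x, bases):
    """Assign x to the class whose singular subspace reconstructs it best."""
    def residual(U):
        return np.linalg.norm(x - U @ (U.T @ x))
    return min(bases, key=lambda label: residual(bases[label]))
```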
Article
Morphological operators are commonly used in image processing. We study their suitability for use in synthetic aperture radar (SAR) image enhancement and target classification. Morphological operations are nonlinear operators defined by set theory. The dilation and erosion operations grow or shrink image features that match a predefined structuring element. The opening and closing operations are combinations of successive dilation and erosion. These morphological operations can visually emphasize scattering of interest in an image. We investigate whether these operations can also improve target classification performance. The operators are nonlinear and image dependent; thus we cannot predict performance without empirical testing. We test and evaluate the morphological operators using simulated and measured SAR data. Results show the dilation operator is most promising for increasing match score and separation between classes in the decision space.
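The basic operations named above are available in common image processing libraries; the following sketch applies grayscale dilation, erosion, opening, and closing with an illustrative 3x3 structuring element to a random patch standing in for SAR magnitude data.

```python
import numpy as np
from scipy import ndimage

# Illustrative grayscale patch standing in for SAR magnitude data.
image = np.random.default_rng(1).random((64, 64))

# A 3x3 structuring element: dilation grows bright features that fit it,
# erosion shrinks them, and opening/closing chain the two operations.
size = (3, 3)
dilated = ndimage.grey_dilation(image, size=size)
eroded = ndimage.grey_erosion(image, size=size)
opened = ndimage.grey_opening(image, size=size)   # erosion followed by dilation
closed = ndimage.grey_closing(image, size=size)   # dilation followed by erosion
```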
Article
Full-text available
Purpose – This paper aims to present an evaluation of open source OCR for supporting research on material in small‐ to medium‐scale historical archives. Design/methodology/approach – The approach was to develop a workflow engine to support the easy customisation of the OCR process towards the historical materials using open source technologies. Commercial OCR often fails to deliver sufficient results here, as their processing is optimised towards large‐scale commercially relevant collections. The approach presented here allows users to combine the most effective parts of different OCR tools. Findings – The authors demonstrate their application and its flexibility and present two case studies, which demonstrate how OCR can be embedded into wider digitally enabled historical research. The first case study produces high‐quality research‐oriented digitisation outputs, utilizing services that the authors developed to allow for the direct linkage of digitisation image and OCR output. The second case study demonstrates what becomes possible if OCR can be customised directly within a larger research infrastructure for history. In such a scenario, further semantics can be added easily to the workflow, enhancing the research browse experience significantly. Originality/value – There has been little work on the use of open source OCR technologies for historical research. This paper demonstrates that the authors' workflow approach allows users to combine commercial engines' ability to read a wider range of character sets with the flexibility of open source tools in terms of customisable pre‐processing and layout analysis. All this can be done without the need to develop dedicated code.
Article
A novel device developed as a rehabilitation tool for people having difficulties in reading printed text is presented. The proposed system enables the user to randomly access the words of a printed text by directing a handheld pointer that is similar to a pencil. The application of modern information technologies makes possible the implementation of a device that combines effectiveness, robustness, and friendliness of use. A micro camera is fixed to the pointer and frames the word the user wants to read. A voice synthesizer reproduces the word just recognized. A suitable measurement of the device motion based on real-time image processing allows determining when the user has stopped the pointer, thus indicating the wish to hear the pointed word(s). A trial run with a selected group of pupils shows the potential of the novel system.
Article
Handwritten Character Recognition is an important part of Pattern Recognition. This is also referred to as Intelligent Character Recognition (ICR). In this paper, a conditional probability based combination of multiple recognizers for character recognition will be introduced. After preprocessing the given character image, different feature recognition algorithms are employed, and their performance on a given training set is analyzed. The reliability of the recognition algorithms is measured in terms of Conditional Probabilities. A rule based on their reliability is identified to combine all these individual feature recognition algorithms by incorporating their interdependence.
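One plausible way to realize a reliability-weighted combination like the one described is to turn each recognizer's confusion matrix, measured on the training set, into conditional probabilities P(true class | predicted class) and sum them across recognizers; the sketch below illustrates that idea and is not the specific rule derived in the paper.

```python
import numpy as np

def reliability_tables(confusions):
    """Convert each recognizer's confusion matrix C[pred, true], counted on
    the training set, into conditional probabilities P(true | predicted)."""
    tables = []
    for C in confusions:
        row_sums = C.sum(axis=1, keepdims=True)
        tables.append(C / np.maximum(row_sums, 1))
    return tables

def combine(predictions, tables):
    """Combine the recognizers' decisions: each recognizer votes for every
    class with the conditional probability attached to its own prediction,
    and the class with the largest total score wins."""
    n_classes = tables[0].shape[1]
    scores = np.zeros(n_classes)
    for pred, table in zip(predictions, tables):
        scores += table[pred]
    return int(np.argmax(scores))
```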
Article
Full-text available
In this paper, we propose a content selection framework that improves the users' experience when they are enriching or authoring pieces of news. This framework combines a variety of techniques to retrieve semantically related videos, based on a set of criteria which are specified automatically depending on the media's constraints. The combination of different content selection mechanisms can improve the quality of the retrieved scenes, because each technique's limitations are minimized by other techniques' strengths. We present an evaluation based on a number of experiments, which show that the retrieved results are better when all criteria are used at the same time.
Article
Full-text available
The digitization of scanned forms and documents is changing the data sources that enterprises manage. To integrate these new data sources with enterprise data, the current state-of-the-art approach is to convert the images to ASCII text using optical character recognition (OCR) software and then to store the resulting ASCII text in a relational database. The OCR problem is challenging, and so the output of OCR often contains errors. In turn, queries on the output of OCR may fail to retrieve relevant answers. State-of-the-art OCR programs, e.g., the OCR powering Google Books, use a probabilistic model that captures many alternatives during the OCR process. Only when the results of OCR are stored in the database, do these approaches discard the uncertainty. In this work, we propose to retain the probabilistic models produced by OCR process in a relational database management system. A key technical challenge is that the probabilistic data produced by OCR software is very large (a single book blows up to 2GB from 400kB as ASCII). As a result, a baseline solution that integrates these models with an RDBMS is over 1000x slower versus standard text processing for single table select-project queries. However, many applications may have quality-performance needs that are in between these two extremes of ASCII and the complete model output by the OCR software. Thus, we propose a novel approximation scheme called Staccato that allows a user to trade recall for query performance. Additionally, we provide a formal analysis of our scheme's properties, and describe how we integrate our scheme with standard-RDBMS text indexing.
Article
Nowadays, the use of medical sensors with embedded communication modules provides accurate number reading and automatic recording. However, such readers are usually more expensive than similar devices without an embedded communication module. Further, different vendors define proprietary communication protocols and data formats for their own medical sensors. Due to the twin issues of high cost and diversity of standards, the automatic collection of patients' vital signs is not common in hospitals, meaning that medical staff need to periodically collect all patients' vital signs. This may cause further problems in caring for patients. We propose a low-cost reader using a cheap web camera to automatically read vital sign monitors in hospitals. The reader uses a high-resolution web camera to take a series of pictures of vital sign monitors, recognizes the vital signs in electronic form and then forwards that information to hospital information systems. Its major benefit is that different sensors equipped with vital sign monitors, whether or not they include a computer communications module, can have their displayed numbers recognized. It saves time in recording the monitored vital signs of patients located throughout a hospital. In sum, the care of patients by medical staff may be usefully assisted by the proposed reader, which automatically collects all patients' vital signs, significantly improving patient care.
Article
With a selection of biomedical literature available for open access, a natural pairing seems to be the use of open source software to automatically analyze content, in particular, the content of figures. Considering the large number of possible tools and approaches, we choose to focus on the recognition of printed characters. As the problem of optical character recognition (OCR) under reasonable conditions is considered to be solved, and as open source software is fully capable of isolating the location of characters and identifying most of them accurately, we instead use OCR as an application area for the relatively recent development of compressive sampling, and in particular a fast implementation called compressive sensing matching pursuit (CoSaMP). Compressive sampling enables recovery of a signal from noisy measurements if certain rigorous mathematical conditions hold on previously measured samples, the mathematical conditions stating that measured samples must be essentially nearly perpendicular, orthogonal, to each other. For OCR, we investigate approximating such nearly orthogonal samples by selecting random curves, then using CoSaMP to determine a sparse number of samples approximating character shapes. We compare the accuracy of three different methods of applying CoSaMP to the problem of matching a blurred character to one of a set of previously sampled characters. We show numerically that selecting random curves does not satisfy the strict mathematical conditions for compressive sampling theory to guarantee optimal solutions. However, character matching strategies using CoSaMP transformed characters can be developed whose accuracy is roughly comparable to a baseline comparison of blurred characters with original characters, suggesting that OCR is an example where the performance of compressive sampling methods declines gracefully as conditions are weakened on the sampling matrix.
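CoSaMP itself is compact enough to sketch. The version below is a textbook-style illustration of the greedy recovery loop (signal proxy, support merging, least-squares estimate, pruning) and makes no attempt to reproduce the random-curve sampling or character-matching experiments from the abstract.

```python
import numpy as np

def cosamp(Phi, y, sparsity, max_iter=50, tol=1e-6):
    """Basic CoSaMP sketch: recover an (approximately) `sparsity`-sparse x
    from measurements y ≈ Phi @ x. Stopping rule and conditioning checks
    are simplified for illustration."""
    m, n = Phi.shape
    x = np.zeros(n)
    for _ in range(max_iter):
        residual = y - Phi @ x
        if np.linalg.norm(residual) <= tol:
            break
        proxy = Phi.T @ residual                             # signal proxy
        omega = np.argsort(np.abs(proxy))[-2 * sparsity:]    # largest 2s entries
        support = np.union1d(omega, np.flatnonzero(x))       # merge supports
        sol, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        b = np.zeros(n)
        b[support] = sol                                     # least-squares estimate on T
        keep = np.argsort(np.abs(b))[-sparsity:]             # prune to s largest
        x = np.zeros(n)
        x[keep] = b[keep]
    return x
```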
Article
Handwriting recognition (HWR) on whiteboards is receiving growing attention in the field of pattern recognition due to its usage in so-called "Smart-Meeting-Rooms". Herein, distortions caused by the writer's upright position are a challenge. In this thesis, systems for on-line HWR of whiteboard notes based on both continuous and discrete Hidden Markov Models (HMM) are developed and enhanced. Relevant features are selected and the pen's pressure information is modeled in a lossless and implicit manner. The script lines within a line of text written on a whiteboard suffer from distortions. Hence, a novel approach for identifying the script lines in those texts is presented.
Article
Available in film copy from University Microfilms International. Thesis (Ph. D.)--Brown University, 2001. Vita. Thesis advisor: Stuart Geman. Includes bibliographical references (leaves 85-87).
Article
The volume of cargo flowing through today's transportation system is growing at an ever increasing rate. Recent studies show that 90% of all international cargo that enters the United States flows through our vast seaport system. When this cargo enters the US, time is of the essence to quickly obtain and verify its identity, screen it against an ever increasingly wide variety of security concerns, and ultimately correctly direct the cargo towards its final destination. Over the past few years, new port and container security initiatives and regulations have generated huge interest in the need for accurate real-time identification and tracking of incoming and outgoing traffic of vehicles and cargo. In contrast, the manually intensive identification and tracking processes typically employed today are inherently both inefficient and inadequate, and can be seen as a possible enabling factor for potential threats to our ports and therefore our national security. The contradiction between current and required processes, coupled with the correlation with accelerated growth in container traffic, has clearly identified the need for a solution. One heavily researched option is the utilization of video-based systems implementing Optical Character Recognition (OCR) processes for automatically extracting the unique container identification code to expedite the flow of cargo through various points in the seaport. The actual current process of how this occurs, along with the opportunities and challenges of adding such a technological solution, will be investigated in great detail. This thesis will investigate the feasibility of applying motion compensation algorithms as an enhancement to OCR systems specifically designed to address the challenges of OCR of cargo containers in a seaport environment. This motion compensation could offer a cost effective alternative to the sophisticated hardware systems currently being offered to US ports. Saad, Ashraf, Committee Member; Jackson, Joel, Committee Chair; AlRegib, Ghassan, Committee Member. Thesis (M. S.)--Electrical and Computer Engineering, Georgia Institute of Technology, 2006.