Thomas M. Breuel

Technische Universität Kaiserslautern, Kaiserslautern, Rheinland-Pfalz, Germany

Publications (193) · 24.78 Total Impact Points

  • Joost van Beusekom, Faisal Shafait, Thomas M. Breuel
    ABSTRACT: Authentication of documents can be done by detecting the printing device used to generate the print-out. Many manufacturers of color laser printers and copiers design their devices to embed a unique tracking pattern in each print-out, which identifies the exact device the print-out originates from. In this paper, we present an important extension of our previous work on (a) detecting the class of printer that was used to generate a print-out: namely, automatic methods for (b) comparing two base patterns from two different print-outs to verify whether they come from the same printer, and (c) automatically decoding the base pattern to extract the serial number and, if available, the time and date the document was printed. Finally, we present (d) the first public dataset on tracking patterns (also called machine identification codes), containing 1,264 images from 132 different printers. Evaluation on this dataset yielded accuracies of up to 93.0% for detecting the printer class; comparison and decoding of the tracking patterns achieved accuracies of 91.3% and 98.3%, respectively.
    Pattern Analysis and Applications 11/2013; · 0.74 Impact Factor
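    The detection pipeline itself is not reproduced here, but the first step of any such system is isolating the faint yellow dot grid from the scan. A minimal Python sketch, assuming a hypothetical input file scan.png and illustrative color thresholds (not values from the paper):

    # Hypothetical sketch: isolate candidate yellow tracking dots in a scan.
    import numpy as np
    from PIL import Image
    from scipy import ndimage

    img = np.asarray(Image.open("scan.png").convert("RGB")).astype(float) / 255.0
    r, g, b = img[..., 0], img[..., 1], img[..., 2]

    # Yellow dots are bright in red and green but dark in blue.
    yellowness = (r + g) / 2.0 - b
    mask = yellowness > 0.15  # tuning constant, not from the paper

    # Keep only tiny connected components -- tracking dots are a few pixels wide.
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    dot_ids = [i + 1 for i, s in enumerate(sizes) if 1 <= s <= 50]
    centers = ndimage.center_of_mass(mask, labels, dot_ids)
    print(f"{len(centers)} candidate dots")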
  • ABSTRACT: We address the challenge of sentiment analysis from visual content. In contrast to existing methods, which infer sentiment or emotion directly from low-level visual features, we propose a novel approach based on understanding the visual concepts that are strongly related to sentiments. Our key contribution is two-fold: first, we present a method built upon psychological theories and web mining to automatically construct a large-scale Visual Sentiment Ontology (VSO) consisting of more than 3,000 Adjective Noun Pairs (ANPs). Second, we propose SentiBank, a novel visual concept detector library that can be used to detect the presence of 1,200 ANPs in an image. The VSO and SentiBank are distinct from existing work and open the door to various applications enabled by automatic sentiment analysis. Experiments on detecting the sentiment of image tweets demonstrate significant improvement in detection accuracy when comparing the proposed SentiBank-based predictors with text-based approaches. The effort also yields a large publicly available resource consisting of a visual sentiment ontology, a large detector library, and a training/testing benchmark for visual sentiment analysis.
    Proceedings of the 21st ACM international conference on Multimedia; 10/2013
  • ABSTRACT: Historical text presents numerous challenges for contemporary language technologies such as information retrieval, OCR, and POS tagging. In particular, the absence of consistent orthographic conventions in historical text causes difficulties for any system that requires reference to a fixed lexicon accessed by orthographic form, for example a language model or retrieval engine applied to OCR output, where the spelling of a word may have evolved over time into several variant forms. Rules for automatically mapping historical wordforms to modern ones are therefore essential for such systems. In this paper, we propose a new technique that models the target modern language with a recurrent neural network using the long short-term memory (LSTM) architecture. Because the network is recurrent and its memory cells are designed to capture long-term dependencies, the context it considers is not limited to a fixed size. In experiments on the Luther Bible database, we transform wordforms from Early New High German (ENHG, 14th-16th centuries) into the corresponding modern wordforms in New High German (NHG), and compare the proposed supervised LSTM model with statistical and heuristic methods for computing word alignments. The LSTM outperforms the three state-of-the-art baselines: its accuracy is 93.90% on known wordforms and 87.95% on unknown wordforms, while the existing state-of-the-art combined wordlist-based and rule-based normalization approach achieves 92.93% on known and 76.88% on unknown tokens. In the reverse direction, normalizing modern wordforms to historical wordforms, our LSTM model also performs best: 93.4% on seen tokens and 89.17% on unknown tokens.
    Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing; 08/2013
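    As a rough illustration of the model family described above, the following PyTorch sketch builds a character-level LSTM that emits one output character per input character. The layer sizes are assumptions, and the equal-length input/output simplification sidesteps the alignment handling of the actual normalization model:

    import torch
    import torch.nn as nn

    class CharLSTMNormalizer(nn.Module):
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)  # one prediction per input char

        def forward(self, x):                # x: (batch, seq_len) of char ids
            h, _ = self.lstm(self.embed(x))  # h: (batch, seq_len, hidden_dim)
            return self.out(h)               # logits over the modern alphabet

    model = CharLSTMNormalizer(vocab_size=60)
    dummy = torch.randint(0, 60, (2, 12))    # two 12-character historical wordforms
    print(model(dummy).shape)                # torch.Size([2, 12, 60])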
  • Source
    Adnan Ul-Hasan, Faisal Shafait, Thomas M. Breuel
    ABSTRACT: Long Short-Term Memory (LSTM) networks have yielded excellent results on handwriting recognition. This paper describes an application of bidirectional LSTM networks to the problem of machine-printed Latin and Fraktur recognition. Latin and Fraktur recognition differs significantly from handwriting recognition in both the statistical properties of the data, as well as in the required, much higher levels of accuracy. Applications of LSTM networks to handwriting recognition use two-dimensional recurrent networks, since the exact position and baseline of handwritten characters is variable. In contrast, for printed OCR, we used a one-dimensional recurrent network combined with a novel algorithm for baseline and x-height normalization. A number of databases were used for training and testing, including the UW3 database, artificially generated and degraded Fraktur text and scanned pages from a book digitization project. The LSTM architecture achieved 0.6% character-level test-set error on English text. When the artificially degraded Fraktur data set is divided into training and test sets, the system achieves an error rate of 1.64%. On specific books printed in Fraktur (not part of the training set), the system achieves error rates of 0.15% (Fontane) and 1.47% (Ersch-Gruber). These recognition accuracies were found without using any language modelling or any other post-processing techniques.
    International Conference on Document Analysis and Recognition, Washington D.C., USA; 08/2013
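    A minimal PyTorch sketch of the 1D bidirectional LSTM plus CTC setup described above; the line height, layer sizes, and alphabet size are illustrative assumptions, not the paper's configuration:

    import torch
    import torch.nn as nn

    H, HIDDEN, CLASSES = 48, 100, 110   # line height, LSTM units, alphabet + blank

    blstm = nn.LSTM(input_size=H, hidden_size=HIDDEN, bidirectional=True)
    proj = nn.Linear(2 * HIDDEN, CLASSES)
    ctc = nn.CTCLoss(blank=0)

    # A normalized text-line image is treated as a sequence of pixel columns.
    line = torch.randn(200, 1, H)            # (width, batch=1, height)
    logits = proj(blstm(line)[0])            # (width, 1, CLASSES)
    log_probs = logits.log_softmax(dim=2)

    target = torch.randint(1, CLASSES, (1, 12))  # 12-character transcript
    loss = ctc(log_probs, target,
               input_lengths=torch.tensor([200]),
               target_lengths=torch.tensor([12]))
    print(float(loss))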
  • Source
    ABSTRACT: Recurrent neural networks (RNNs) have been successfully applied to the recognition of cursive handwritten documents, in both English and Arabic scripts. The ability of RNNs to model context in sequence data like speech and text makes them a suitable candidate for developing OCR systems for printed Nabataean scripts (including Nastaleeq, for which no OCR system is available to date). In this work, we present the results of applying RNNs to printed Urdu text in the Nastaleeq script. A Bidirectional Long Short-Term Memory (BLSTM) architecture with a Connectionist Temporal Classification (CTC) output layer was employed to recognize printed Urdu text. We evaluated BLSTM networks for two cases: one ignoring the characters' shape variations and the other considering them. The character-level recognition error rate for the first case is 5.15% and for the second 13.6%. These results were obtained on the synthetically generated UPTI dataset, which contains clean images along with artificially degraded images reflecting real-world scanning artefacts. A comparison with a shape-matching-based method is also presented.
    International Conference on Document Analysis and Recognition, Washington D.C., USA; 08/2013
  • Nibal Nayef, Thomas M. Breuel
    ABSTRACT: Symbol retrieval is important for content-based search in digital libraries and for automatic interpretation of line drawings. In this work, we present a complete symbol retrieval system. The proposed system has an off-line content-analysis stage, in which the contents of a database of line drawings are represented as a symbol index: a compact, indexable representation of the database that allows efficient on-line query retrieval. Within the retrieval system, three methods are presented. First, a feature grouping method identifies local regions of interest (ROIs) in the drawings; the found ROIs represent symbol parts. Second, a clustering method based on geometric matching clusters the similar parts from all the drawings together, and a symbol index is constructed from the clusters' representatives. Finally, the ROIs of a query symbol are matched to the clusters' representatives; the matching symbol parts are retrieved from the clusters, and spatial verification is performed on the matching parts. By using the symbol index we achieve a query look-up time that is independent of the database size and depends only on the size of the symbol index. The retrieval system achieves higher recall and precision than state-of-the-art methods.
    Proc SPIE 01/2013;
  • Mayce Al Azawi, Marcus Liwicki, Thomas M. Breuel
    ABSTRACT: This work proposes several approaches for generating correspondences between real scanned books and their transcriptions, which may contain modifications and layout variations, while also taking OCR errors into account. Our approaches to aligning the manuscript with the transcription are based on weighted finite state transducers (WFSTs). In particular, we propose adapted WFSTs to represent the transcription to be aligned with the OCR lattices. The character-level alignment has edit rules that allow edit operations (insertion, deletion, substitution). These edit operations let the transcription model deal with OCR segmentation and recognition errors, as well as with the task of aligning different text editions. We implemented the alignment model with a hyphenation model, so that it can adapt the non-hyphenated transcription. Our models also work with Fraktur ligatures, which are typically found in historical Fraktur documents. We evaluated our approach on Fraktur documents from the "Wanderungen durch die Mark Brandenburg" volumes (1862-1889) and observed the performance of these models under OCR errors. We compare the performance of our model for four different scenarios: having no information about the correspondence at the (i) word, (ii) line, (iii) sentence, or (iv) page level.
    Proc SPIE 01/2013;
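    The paper's machinery is WFST composition; the same insertion/deletion/substitution edit operations can be illustrated with a plain dynamic-programming aligner. A minimal sketch (not the WFST implementation itself):

    def align(ocr, ref):
        n, m = len(ocr), len(ref)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1): d[i][0] = i
        for j in range(m + 1): d[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d[i][j] = min(d[i-1][j] + 1,                        # delete
                              d[i][j-1] + 1,                        # insert
                              d[i-1][j-1] + (ocr[i-1] != ref[j-1])) # substitute/match
        # Backtrace to recover the character-level correspondence.
        ops, i, j = [], n, m
        while i > 0 or j > 0:
            if i > 0 and j > 0 and d[i][j] == d[i-1][j-1] + (ocr[i-1] != ref[j-1]):
                ops.append((ocr[i-1], ref[j-1])); i, j = i - 1, j - 1
            elif i > 0 and d[i][j] == d[i-1][j] + 1:
                ops.append((ocr[i-1], None)); i -= 1      # spurious OCR character
            else:
                ops.append((None, ref[j-1])); j -= 1      # character missed by OCR
        return list(reversed(ops))

    print(align("Wandrungen", "Wanderungen"))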
  • Nibal Nayef, Thomas M. Breuel
    ABSTRACT: Symbol spotting is important for automatic interpretation of technical line drawings. Current spotting methods are not reliable enough for such tasks due to low precision rates. In this paper, we combine a geometric matching-based spotting method with an SVM classifier to improve the precision of the spotting. In symbol spotting, a query symbol is to be located within a line drawing. Candidate matches can be found; however, the found matches may be true or false. To distinguish false matches, an SVM classifier is used. The classifier is trained on true and false matches of a query symbol. The matches are represented as vectors that indicate how well the query features are matched; these qualities are obtained via geometric matching. Using this classification, the precision of the spotting improved from an average of 76.6% to an average of 97.2% on a database of technical line drawings.
    Proc SPIE 01/2013;
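    A hedged sketch of the verification stage: an SVM trained on vectors of match-quality scores separates true from false candidate matches. The 3-dimensional quality vectors and the synthetic data below are stand-ins for the paper's actual geometric-matching features:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Hypothetical training data: rows = candidate matches, columns = match-quality
    # scores from geometric matching; label 1 = true match, 0 = false match.
    true_matches = rng.normal(0.9, 0.05, size=(50, 3))
    false_matches = rng.normal(0.5, 0.15, size=(50, 3))
    X = np.vstack([true_matches, false_matches])
    y = np.array([1] * 50 + [0] * 50)

    clf = SVC(kernel="rbf").fit(X, y)
    candidate = np.array([[0.88, 0.91, 0.85]])
    print("accept" if clf.predict(candidate)[0] == 1 else "reject")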
  • N. Nayef, T.M. Breuel
    ABSTRACT: This paper presents a method for analysing symbol alphabets of technical line drawings and finding their underlying structure, which is important for investigating the (dis)similarity of different symbols. The proposed method constructs a hierarchical structure over a set of technical symbols. It is based on agglomerative hierarchical clustering using one of two similarity measures: geometric matching between the symbols' shapes, or an off-the-shelf shape descriptor. Identifying such a hierarchical structure can improve symbol recognition/spotting systems: it helps with scalability issues and provides information on the degree of similarity among symbols, so that those systems can automatically adapt their parameter values for more accurate recognition. Our method has been tested on the symbol alphabet of the GREC-2011 symbol recognition/spotting contest and achieved promising results.
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on; 01/2013
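    A minimal sketch of the clustering step using SciPy; the random 8-dimensional descriptors stand in for the paper's shape descriptor or geometric-matching similarity:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(1)
    descriptors = rng.random((20, 8))        # one 8-dim descriptor per symbol

    dist = pdist(descriptors, metric="euclidean")   # pairwise (dis)similarity
    tree = linkage(dist, method="average")          # hierarchical structure

    # Cut the dendrogram to get groups of similar symbols.
    groups = fcluster(tree, t=4, criterion="maxclust")
    print(groups)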
  • Source
    Adnan Ul-Hasan, Thomas M Breuel
    ABSTRACT: Language models or recognition dictionaries are usually considered an essential step in OCR. However, using a language model complicates training of OCR systems, and it also narrows the range of texts that an OCR system can be used with. Recent results have shown that Long Short-Term Memory (LSTM) based OCR yields low error rates even without language modeling. In this paper, we explore to what extent LSTM models can be used for multilingual OCR without the use of language models. To do this, we measure the cross-language performance of LSTM models trained on different languages. LSTM models show good promise for language-independent OCR: the recognition errors are very low (around 1%) without any language model or dictionary correction.
    4th International Workshop on Multilingual OCR, Washington D.C., USA; 01/2013
  • S.S. Bukhari, F. Shafait, T.M. Breuel
    ABSTRACT: Text-line extraction is the backbone of document image analysis. For decades, a large number of text-line finding methods have been proposed, but these methods rely on certain assumptions about a target class of documents with respect to writing styles, digitization methods, intensity values, and scripts. There is no generic text-line finding method that can be robustly applied to a large variety of simple and complex document images. We introduced the ridge-based text-line finding method and published its initial results for curled text-line detection in camera-captured document images. In this paper, we demonstrate that the ridge-based method is a generic text-line finding approach that can be robustly applied to a diverse collection of simple and complex document images. A comprehensive performance evaluation of the ridge-based method and its comparison with several state-of-the-art methods is presented. For this purpose, diverse categories of publicly available standard datasets have been selected: UWIII (scanned, printed English script), DFKI-I (camera-captured, printed English script), UMD (handwritten Chinese, Hindi, and Korean scripts), the ICDAR 2007 handwriting segmentation contest (handwritten English, French, German, and Greek scripts), Arabic/Urdu (scanned, printed script), and Fraktur (scanned, calligraphic German script). Experiments on these datasets show that the ridge-based method achieves better text-line extraction results than those of the best-performing, domain-specific text-line finding methods. These results demonstrate that the ridge-based method is a generic text-line extraction method, and they also help the community assess its advantages.
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on; 01/2013
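    The core ridge idea can be illustrated in a few lines: smooth a page anisotropically so each text line fuses into a dark band, then take local maxima as line centers. The sketch below is a 1D simplification on synthetic data, not the paper's 2D ridge detector:

    import numpy as np
    from scipy import ndimage

    page = np.zeros((120, 300))
    for top in (10, 45, 80):                 # three synthetic "text lines"
        page[top:top + 20, 20:280] = np.random.rand(20, 260) > 0.5

    # Anisotropic Gaussian: smooth much more along x than along y.
    smooth = ndimage.gaussian_filter(page, sigma=(3, 25))

    profile = smooth.mean(axis=1)            # vertical projection of smoothed ink
    ridges = [y for y in range(1, len(profile) - 1)
              if profile[y] > profile[y - 1] and profile[y] >= profile[y + 1]
              and profile[y] > 0.1]
    print("line centers near rows:", ridges)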
  • Source
    ABSTRACT: A large amount of real-world data is required to train and benchmark any character recognition algorithm. Developing a page-level ground-truth database for this purpose is overwhelmingly laborious, as it involves a great deal of manual effort to produce a reasonable database that covers all possible words of a language. Moreover, generating such a database for historical (degraded) documents, or for a cursive script like Urdu, is even more complex and grueling. The presented work attempts to solve this problem by proposing a semi-automated technique for generating a ground-truth database. We believe the proposed automation will greatly reduce the manual effort of developing any OCR database. The basic idea is to apply ligature clustering prior to manual labeling. Two prototype datasets for Urdu script have been developed using the proposed technique, and the results are presented.
    International Conference on Pattern Recognition, Japan; 11/2012
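    A sketch of the semi-automated labeling idea: cluster ligature images first so a human labels one cluster rather than every instance. The features and cluster count below are illustrative placeholders:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(2)
    # Stand-in features: e.g. downsampled pixels or shape moments per ligature.
    ligature_features = rng.random((500, 16))

    kmeans = KMeans(n_clusters=40, n_init=10, random_state=0)
    cluster_of = kmeans.fit_predict(ligature_features)

    # One manual label per cluster propagates to all its members.
    manual_labels = {c: f"ligature_{c}" for c in range(40)}   # human supplies these
    ground_truth = [manual_labels[c] for c in cluster_of]
    print(ground_truth[:5])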
  • Damian Borth, Adrian Ulges, Thomas M. Breuel
    ABSTRACT: We present a novel approach towards automatic vocabulary selection for video concept detection. Our key idea is to expand concept vocabularies with trending topics that we mine automatically on other media like Wikipedia or Twitter. We evaluate several strategies for extending concept detection to auto-detect these topics in new videos, either by linking them to a static concept vocabulary, by a visual learning of trends on the fly, or by an expansion of the vocabulary. Our study on 6,800 YouTube clips and the top 23 target trends (covering a timespan of 6 months) demonstrates that a direct visual classification of trends (by a "live" learning on trend videos) outperforms an inference from static vocabularies. However, further improvements can be achieved by a combination of both approaches.
    10/2012;
  • Mohammad Reza Yousefi, Thomas M. Breuel
    ABSTRACT: We study boosting with a gating mechanism, Gated Boosting, which performs resampling instead of the weighting mechanism used in AdaBoost. In our method, gating networks determine the distribution of samples used for training each consecutive base classifier, taking into account the predictions of the prior base classifiers. Using gating networks prevents training instances from being repeatedly included in the different subsets used for training base classifiers, which is key to achieving diversity. Furthermore, it is the gating networks that determine which classifiers' outputs are pooled to produce the final output. The performance of the proposed method is demonstrated and compared to AdaBoost on four benchmarks from the UCI repository and on the MNIST dataset.
    Proceedings of the 35th Annual German conference on Advances in Artificial Intelligence; 09/2012
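    A very rough sketch of the gating idea, under the simplifying assumption (not the paper's exact algorithm) that a gate learns where the first base classifier errs and routes those samples to a second one:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=600, n_features=10, random_state=0)

    base1 = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    wrong = base1.predict(X) != y                  # where base1 fails

    gate = LogisticRegression(max_iter=1000).fit(X, wrong)  # predicts failure
    base2 = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X[wrong], y[wrong])

    route = gate.predict(X).astype(bool)           # samples handed to base2
    pred = np.where(route, base2.predict(X), base1.predict(X))
    print("training accuracy:", (pred == y).mean())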
  • Source
    ABSTRACT: Document images prove to be a difficult case for standard stereo correspondence approaches. One of the major problems is that document images are highly self-similar. Most algorithms try to tackle this by incorporating a global optimization scheme, which tends to be computationally expensive. In this paper, we show that incorporating layout information into the matching paradigm, as a grouping entity for features, leads to better results in terms of robustness and efficiency, and ultimately to a better 3D model of the captured document that can be used in various document restoration systems. This can be seen as a divide-and-conquer approach that partitions the search space into portions given by each grouping entity and then solves each of them independently. As a grouping entity, text-lines are preferred over individual character blobs because it is easier to establish correspondences between them, and text-line extraction works reasonably well on stereo image pairs in the presence of perspective distortions. The proposed approach is highly efficient, and the matches obtained are more reliable. These claims are backed up by experimental evaluations demonstrating the approach's practical applicability.
    01/2012;
  • Source
    ABSTRACT: A Table of Contents (ToC) is an integral part of multiple-page documents like books and magazines. Most existing techniques use textual similarity to automatically detect ToC pages. However, such techniques cannot be applied where OCR technology is not available, as is indeed the case for historical documents and many modern Nabataean (Arabic) and Indic scripts. It is therefore necessary to develop tools to navigate through such documents without the use of OCR. This paper reports a preliminary effort to address this challenge. The proposed algorithm has been applied to find ToC pages in Urdu books, and an overall initial accuracy of 88% has been achieved.
    01/2012;
  • ABSTRACT: In this paper we present a novel method for automatic text-line parameter selection for stereo image pairs. The parameters are selected such that correspondence between the same content in a stereo pair is maximized. Automatic parameter selection is carried out by establishing robust text-line correspondence, which is itself a contribution of the presented work. The proposed method is applied to one text-line extraction algorithm as a proof of concept. The results are compared with ground truth to show the validity of the method.
    Pattern Recognition (ICPR), 2012 21st International Conference on; 01/2012
  • Source
    ABSTRACT: Choosing a suitable classifier for a given dataset is an important part of developing a pattern recognition system. Since a large variety of classification algorithms have been proposed in the literature, non-experts do not know which method to use to obtain good classification results on their data. Meta-learning tries to address this problem by recommending promising classifiers based on meta-features computed from a given dataset. In this paper, we empirically evaluate five categories of state-of-the-art meta-features for their suitability in predicting the classification accuracies of several widely used classifiers (including Support Vector Machines, Neural Networks, Random Forests, Decision Trees, and Logistic Regression). Based on the evaluation results, we have developed the first open-source meta-learning system capable of accurately predicting accuracies of target classifiers: the user provides a dataset as input and gets an automatically created, high-performance, ready-to-use pattern recognition system in a few simple steps. A user study with non-experts showed that users were able to develop more accurate pattern recognition systems in significantly less development time when using our system than when using a state-of-the-art data mining package.
    Pattern Analysis and Applications 01/2012; · 0.74 Impact Factor
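    A toy sketch of the meta-learning loop: describe a dataset by simple meta-features, then recommend the classifier that won on the most similar previously seen dataset. The meta-feature set and the experience base below are illustrative, not the system's actual ones:

    import numpy as np

    def meta_features(X, y):
        classes, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        class_entropy = -(p * np.log2(p)).sum()
        return np.array([np.log(X.shape[0]), X.shape[1], len(classes), class_entropy])

    # Hypothetical experience base: meta-features of past datasets and the
    # classifier that performed best on each.
    experience = [
        (np.array([6.9, 4.0, 3.0, 1.58]), "RandomForest"),
        (np.array([9.2, 64.0, 10.0, 3.32]), "SVM"),
        (np.array([5.3, 13.0, 2.0, 0.99]), "LogisticRegression"),
    ]

    def recommend(X, y):
        m = meta_features(X, y)
        return min(experience, key=lambda e: np.linalg.norm(e[0] - m))[1]

    X = np.random.rand(1000, 60); y = np.random.randint(0, 10, 1000)
    print(recommend(X, y))   # nearest neighbour in meta-feature space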
  • ABSTRACT: In this paper we present a novel method for robust stereo matching on document image pairs. The matching itself is performed using an affine-invariant similarity measure that compensates for perspective distortions; affine invariance is achieved by normalization using second-order statistics, which finally allows a simple pixel-wise comparison. To handle the inherently high self-similarity of the page content, we apply a dynamic programming approach on text-line pairs. We show quantitatively that the proposed method performs better than standard approaches using SURF, whether or not text-line information is incorporated.
    Pattern Recognition (ICPR), 2012 21st International Conference on; 01/2012
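    A sketch of the second-order normalization idea: whitening the spatial covariance of a set of points makes affinely distorted versions of the same shape directly comparable (up to a residual rotation, which the full similarity measure must still handle). Illustrative only, not the paper's implementation:

    import numpy as np

    def normalize(points):
        pts = points - points.mean(axis=0)          # translation invariance
        cov = np.cov(pts.T)
        vals, vecs = np.linalg.eigh(cov)
        W = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T   # whitening transform
        return pts @ W.T                            # unit covariance afterwards

    glyph = np.random.rand(200, 2)
    A = np.array([[1.5, 0.4], [0.1, 0.8]])          # some affine distortion
    distorted = glyph @ A.T + np.array([3.0, -1.0])

    n1, n2 = normalize(glyph), normalize(distorted)
    print(np.allclose(np.cov(n1.T), np.eye(2)))     # True: covariance whitened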

Publication Stats

1k Citations
24.78 Total Impact Points

Institutions

  • 2006–2013
    • Technische Universität Kaiserslautern
      • Image Understanding and Pattern Recognition Group
      • Fachbereich für Informatik
      Kaiserslautern, Rheinland-Pfalz, Germany
  • 2005–2011
    • Deutsches Forschungszentrum für Künstliche Intelligenz
      Kaiserslautern, Rheinland-Pfalz, Germany
  • 2009
    • Technische Universität Berlin
      Berlin, Berlin, Germany
  • 2000–2003
    • Palo Alto Research Center
      Palo Alto, California, United States
  • 1996
    • Idiap Research Institute
      Martigny, Valais, Switzerland
  • 1991–1992
    • Massachusetts Institute of Technology
      Cambridge, Massachusetts, United States