August 2024 · 10 Reads
Neural Computing and Applications
Automatic transcription of large series of historical handwritten documents generally aims to enable searching for textual information in these documents. However, automatic transcripts often lack the level of accuracy needed for reliable text indexing and search purposes. Probabilistic Indexing (PrIx) offers a unique alternative to raw transcripts. Since it needs training data to achieve good search performance, PrIx-based crowdsourcing techniques are introduced in this paper to gather the required data. In the proposed approach, PrIx confidence measures are used to drive a correction process in which users can amend errors and possibly add missing text. In a further step, the corrected data are used to retrain the PrIx models. Results reported on five large series show consistent improvements after retraining. However, it remains debatable whether the overall cost of the crowdsourcing operation pays off for these improvements, or whether it would have been more cost-effective to simply start with a larger and cleaner set of professionally produced training transcripts.
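The abstract describes a workflow in which PrIx confidence measures decide which hypotheses are routed to human correction before retraining. Below is a minimal sketch of such confidence-driven selection; the spot structure, thresholds, and function names are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Spot:
    """A hypothetical PrIx 'spot': a (pseudo-)word hypothesis in a line image."""
    line_id: str           # identifier of the text-line image
    word: str              # hypothesized (pseudo-)word
    relevance_prob: float  # PrIx confidence / relevance measure in [0, 1]

def select_for_correction(spots, low=0.2, high=0.8):
    """Route spots to crowdsourced correction when confidence is inconclusive.

    Very low-confidence spots are likely noise and very high-confidence ones are
    likely correct; the mid range is where human amendment is most cost-effective.
    The threshold values are illustrative assumptions.
    """
    return [s for s in spots if low <= s.relevance_prob <= high]

# Usage example: the corrected spots would later be fed back to retrain the PrIx models.
spots = [Spot("p1_l3", "ciudad", 0.55), Spot("p1_l3", "qve", 0.12), Spot("p1_l4", "mayo", 0.93)]
print([s.word for s in select_for_correction(spots)])  # -> ['ciudad']
```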
April 2024 · 3 Reads
The proposed PrIx framework is formally presented in this chapter. In short, PrIx aims at processing each text image in such a way that all the sets of strokes in the image which can reasonably be interpreted as text elements, such as characters and words, become symbolically represented; that is, represented like electronic text. However, the primary concern of PrIx is to retain all the information needed to also represent the intrinsic uncertainty which underlies text images, and more specifically handwritten text images. A dual presentation is given. First, a “pure” image processing viewpoint is adopted, where each text element in the images is treated simply as a small object that has to be somehow detected and identified. This presentation makes it clear that PrIx, and KWS alike, essentially boil down to an object recognition process in which the class posterior probability of each object has to be estimated at each image location. Then PrIx is developed in full detail from another, equivalent viewpoint in which the underlying object recognition problem is equated to HTR, thereby considering PrIx as a form of HTR which explicitly retains image interpretation uncertainty.
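The object-recognition view described here amounts to estimating, for each vocabulary element and image region, a posterior relevance probability. A hedged LaTeX sketch of the kind of quantity involved, following the usual PrIx formulation (the notation is assumed here, not quoted from the chapter):

```latex
% Relevance probability that word v is written somewhere in image (region) X,
% marginalizing over candidate bounding boxes b within X:
P(R = 1 \mid X, v) \;=\; \sum_{b \sqsubseteq X} P(R = 1, b \mid X, v)
                   \;\approx\; \max_{b \sqsubseteq X} P(v \mid X, b)
```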
April 2024 · 4 Reads
As discussed in the previous chapter, PrIx (and KWS) can be fruitfully seen from a handwritten text recognition (HTR) viewpoint. So, this chapter reviews segmentation-free approaches to HTR and the corresponding probabilistic models. Pre-processing steps used in traditional HTR workflows, such as handwriting style normalization and hand-crafted feature extraction, are briefly outlined. More recent techniques, like Convolutional Neural Networks for feature extraction learning and neural-network-based text line detection, are reviewed in greater detail. Probabilistic “optical models” for HTR are characterized by simultaneously modeling the textual contents and the geometric positions (alignment) of handwritten strokes on images. We review Recurrent Neural Networks (RNN) and, in particular, the Long Short-Term Memory RNN model, along with the so-called Connectionist Temporal Classification loss function, which are currently considered state of the art. We also review the time-honored Hidden Markov Models (HMM), not only because of their sound formalization and paradigmatic role in the field of HTR, but also because they are still useful nowadays as a convenient asset to combine all types of optical models with linguistic constraints related to lexicon and syntax. These constraints have traditionally been modeled with n-grams, which can boost HTR (and PrIx) performance, sometimes very significantly. N-grams are also thoroughly reviewed, including the way they can seamlessly be integrated with HMMs or RNNs using Weighted Finite State Transducers and the corresponding automata algebra. Finally, Word Graph (or lattice) concepts and methods are explained in full detail since, as anticipated in the previous chapter, these graphs constitute a key tool for PrIx computation.
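Since the chapter reviews how n-gram language models are combined with optical models, a short LaTeX sketch of the conventional formulation may help; the notation is the standard one and is assumed here rather than taken from the chapter:

```latex
% n-gram language model: each word is conditioned on its n-1 predecessors
P(w_1^L) \;\approx\; \prod_{i=1}^{L} P\bigl(w_i \mid w_{i-n+1}^{\,i-1}\bigr)

% HTR decoding rule: combine the optical model P(x | w) with the language model P(w)
\hat{w} \;=\; \arg\max_{w} \; P(x \mid w)\, P(w)
```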
April 2024 · 24 Reads
As discussed in the previous chapter, the PrIx framework explicitly adopts the IR point of view, but it draws from many concepts and methods developed over the last 50 years in the field of KWS, first for speech signals and more recently for text images. A comprehensive survey of these approaches can be consulted in [23] and a recent review in [36]. This chapter overviews the taxonomy and the most important state-of-the-art approaches to KWS for text images. Detailed insights about the most interesting and/or relevant of these approaches are provided in Chapter 7 of this book. This chapter also reviews the work carried out so far on certain issues which are very significant for PrIx (and KWS alike) but are very seldom considered in the KWS literature; namely, querying text images for hyphenated, abbreviated and/or multiple words. Finally, since PrIx shares optical and language models with HTR, the state of the art in HTR is also briefly outlined.
April 2024 · 5 Reads
This chapter provides details for the implementation of the different approaches proposed for PrIx under the probabilistic framework presented in Chapter 3, using the HTR models and tools described in Chapter 4. After a relatively brief presentation of the implementation of image-processing-oriented methods based on pixel-level posteriorgrams, the rest of the chapter presents in detail efficient algorithms which can be used to produce PrIxs under the HTR point of view. In both cases, the aim is to process large collections of text images off-line, so as to allow fast response to on-line queries with very low computing time complexity. To this end, for each image of the collection a series of sufficiently likely “spots” is extracted. Each spot contains a word or a character sequence, called a (pseudo-)word, along with the corresponding relevance probability and word position information. Two main approaches are discussed: lexicon-based and lexicon-free. The former allows simpler implementations and provides better search accuracy whenever the adopted lexicon sufficiently covers the expected query words. The latter is not as accurate, but it is much more versatile since the words to be indexed are automatically “discovered” in the very images being indexed.
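A minimal sketch of what a lexicon-based PrIx query could look like once the off-line indexing step has produced spots; the data layout, field names and threshold below are illustrative assumptions, not the book's actual algorithms.

```python
from collections import defaultdict

# Hypothetical PrIx: maps each indexed (pseudo-)word to its spots.
# Each spot: (image_id, bounding_box, relevance_probability).
prix_index = defaultdict(list)

def add_spot(word, image_id, bbox, rel_prob):
    """Off-line step: store one sufficiently likely spot for a (pseudo-)word."""
    prix_index[word.lower()].append((image_id, bbox, rel_prob))

def query(word, min_rel_prob=0.3):
    """On-line step: retrieve spots above a relevance threshold, best first."""
    hits = [s for s in prix_index[word.lower()] if s[2] >= min_rel_prob]
    return sorted(hits, key=lambda s: s[2], reverse=True)

# Usage example
add_spot("Sevilla", "img_0042", (120, 310, 480, 360), 0.87)
add_spot("Sevilla", "img_0107", (90, 150, 420, 205), 0.41)
print(query("sevilla"))
```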
April 2024 · 1 Read
The proposed probabilistic framework and most of the specific approaches, algorithms, assumptions, and claims discussed throughout the previous chapters require empirical validation. This is the main purpose of this chapter. In particular, the most relevant questions that we aim to answer through the experiments are:
1. As compared with the HTR-oriented formulation proposed in Sec. 3.4, how do the various lexicon-based, image-processing-oriented posteriorgram methods discussed in Secs. 3.1 and 3.3.2 perform? This is studied in Sec. 6.2.
2. As discussed in Chapter 3, text lines are particularly interesting image regions for indexing purposes. So, the question is: how can the different RPs defined in that chapter be advantageously used under a line-level PrIx paradigm? This is developed in Sec. 6.3.
3. What is the impact of a language model on PrIx performance? Which general approach is preferable, lexicon-based or lexicon-free? These questions are tackled in Sec. 6.4.
4. How does the amount of training examples affect PrIx performance? This is studied in Sec. 6.5.
5. Given that both PrIx and HTR use the same underlying probability distributions, is there a clear correlation between the performance on HTR and PrIx tasks? This topic is examined in Sec. 6.6.
6. Since search is one of the main applications of both PrIx and KWS, how do our PrIx methods compare with state-of-the-art KWS approaches? This is studied at line level in Sec. 6.7.
7. How much PrIx performance improvement can be expected by using the newer, neural-network-based CRNN optical models with respect to adopting more traditional statistical HMM models? We analyze this question in Sec. 6.8.
8. Can line-oriented PrIx RPs be used to tackle KWS under a segmentation-free paradigm? This is empirically assessed in Sec. 6.10.1.
9. Is the approach proposed in Sec. 3.6 to compute the RP for a query image (rather than a textual query) adequate to perform traditional QbE KWS? How does this approach fare with respect to other segmentation-free QbE KWS methods? This is studied in Sec. 6.10.2.
April 2024 · 4 Reads
PrIx development was originally driven by the need to search for textual information in large collections of untranscribed text images. The spots that result from the PrIx process are not image transcripts, but they provide very rich probabilistic information about the text rendered in the images and in image regions or locations. This chapter presents approaches to exploit this information to go beyond information search applications. Specifically, we present methods that use the PrIx of an image or an image collection to deal with tasks that traditionally require actual textual data, such as electronic text. We cover, in order, basic and advanced text analytics, statistical information extraction, and document image classification by textual content.
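One way to "go beyond search", consistent with the text-analytics use described here, is to estimate expected word frequencies of an untranscribed image directly from its PrIx by summing relevance probabilities. The sketch below illustrates that idea under this assumption; it is not the book's actual implementation.

```python
from collections import Counter

def expected_word_counts(spots):
    """Estimate word frequencies of an untranscribed image from its PrIx spots.

    Each spot is a (word, relevance_probability) pair; summing probabilities per
    word yields the expected number of occurrences, usable as a bag-of-words
    representation for analytics or document image classification.
    """
    counts = Counter()
    for word, rel_prob in spots:
        counts[word.lower()] += rel_prob
    return counts

# Usage example: expected counts are roughly carta ~1.3, pago 0.8, testamento 0.1
spots = [("carta", 0.9), ("pago", 0.8), ("carta", 0.4), ("testamento", 0.1)]
print(expected_word_counts(spots))
```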
April 2024 · 9 Reads
In this chapter the main contributions of this book are summarized. In addition, as future work, we explain how the probabilistic indexing framework can also be applied with the new, promising models for optical modeling in HTR which are gaining increasing popularity. Finally, we also suggest how the ideas proposed in this book could be applied to other domains, where images are not necessarily of handwritten text.
... As an alternative to indexing noisy plain-text produced by HTR, in the last decade the Probabilistic Indexing (PrIx) framework has emerged as a solid technique for making free-text searching in untranscribed document images a reality [24,44,45,47,53]. This approach provides a convenient trade-off between recall and precision that allows users to locate most of the relevant information they are looking for in large image collections [3,44,46,52]. ...
January 2024
... In real-world applications, machine learning has seen considerable success in areas such as computer vision, natural language processing, and recommendation systems [11][12][13][14][15][16]. Computer vision, in particular, is a significant research area that simulates human visual systems to understand the real world through computational tools. ...
January 2023
IEEE Access
... It is also worth mentioning that this pipeline is hardware-lightweight and can be trained on a low-memory GPU (6 GB or less). This work extends the research started in Ref. [22]. Here, a detailed formalization of the problem and new ways of evaluating the results are proposed. ...
June 2023
Lecture Notes in Computer Science
... Typically, these images are organized sequentially in various archival units like folders, books or boxes, here called "image bundles". Each of these bundles can encompass thousands of individual page images which are sequentially organized into several, often many, "image documents", also known as "files", "acts", or, specifically for the notarial documents considered in this work, "deeds". ...
June 2023
Pattern Recognition Letters
... Automatic classification techniques have been developed to classify image documents (deeds) into their specific typological categories, like "Letter of Payment" or "Will", with encouraging results documented in Refs. [8,21,23,30]. However, these studies assume that the successive page images of each deed are already given. ...
June 2023
Pattern Recognition Letters
... Adapting this process to the AMNLT challenge results in three music-aware output encodings: Pseudo-GABC, GABC and MEI (Figure 13: Example of the adaptation of the output encodings) [37]. To address this limitation, we introduce a new measure specific to AMNLT that focuses on alignment accuracy: the Alignment Error Rate. ...
May 2023
Pattern Recognition
... As an alternative to indexing noisy plain-text produced by HTR, in the last decade the Probabilistic Indexing (PrIx) framework has emerged as a solid technique for making free-text searching in untranscribed document images a reality [24,44,45,47,53]. This approach provides a convenient trade-off between recall and precision that allows users to locate most of the relevant information they are looking for in large image collections [3,44,46,52]. ...
May 2023
Neural Computing and Applications
... Thus, this section shows the related works on this issue, followed by the summary illustrated in Table 1 (reference | year | method | dataset | reported result):
[20] | 2021 | CNN | Hijja | Accuracy 97%
Al Hamad et al. [21] | 2022 | ILDT | Benchmark | -
Zouhaira et al. [22] | 2021 | CNN and BLSTM | KHATT | CER 8%, WER 20.1%
Maalej et al. [23] | 2022 | MDLSTM | IFN/ENIT | 92.59%
Granell et al. [24] | 2023 | BLSTM | HisClima | -
Pareek et al. [25] | 2020 | CNN and MLP | Private | Accuracy 97.21%
Shuvo et al. [26] | 2022 | CNN | Benchmark | Accuracy 99.62%
Souibgui et al. [27] | 2022 | Few-shot learning | Borg | SER 24%
Alghizzawi et al. [28] | 2023 | Self-supervised | TKH | Accuracy 95%
Cao et al. [29] | 2020 | CNN | Benchmark | -
Meddeb et al. [30] | 2022 | Deep-learning | BRAD | RMSE 80%
Awni et al. [31] | 2022 | Deep-learning | IFN/ENIT | 96.11% ...
April 2023
Pattern Recognition Letters
... Consequently, bWER is prone to become increasingly optimistic as the size of the evaluation sample (e.g., page image transcript) becomes larger. This is thoroughly studied in [43] and the results show that, in general, bWER can be safely used for typical page sizes and text densities, up to some hundreds of words per page, or even much larger in some datasets. ...
January 2023
SSRN Electronic Journal
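The snippet above refers to bWER, a word error rate computed on bags of words rather than on aligned sequences. Below is a minimal sketch under the assumption that bWER counts the multiset difference between reference and hypothesis words, normalized by the reference length; the exact definition should be taken from the cited work [43].

```python
from collections import Counter

def bwer(reference_words, hypothesis_words):
    """Bag-of-words WER sketch: an order-independent word error rate.

    Assumption: errors are the words that cannot be matched between the two
    bags (multisets), normalized by the number of reference words.
    """
    ref, hyp = Counter(reference_words), Counter(hypothesis_words)
    matched = sum((ref & hyp).values())  # size of the multiset intersection
    errors = max(sum(ref.values()), sum(hyp.values())) - matched
    return errors / max(sum(ref.values()), 1)

# Usage example: one unmatched word out of four reference words -> 0.25
print(bwer("a b c d".split(), "a c b x".split()))
```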
... In the text recognition part, text recognition is studied using the network structure of CRNN (Convolutional Recurrent Neural Network) and CTC. For information extraction, a method based on text pattern and keyword matching is employed to obtain keyword key-value pairs [15,16,17]. ...
January 2023
SSRN Electronic Journal
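The snippet above mentions extracting keyword key-value pairs from recognized text (e.g., CRNN+CTC output) by text-pattern matching. A minimal sketch of that kind of post-processing follows; the field names and regular expressions are illustrative assumptions, not the cited paper's patterns.

```python
import re

# Hypothetical field patterns applied to recognized plain text.
FIELD_PATTERNS = {
    "date":   re.compile(r"(?:Date|Fecha)\s*[:=]\s*([0-9]{1,2}[/-][0-9]{1,2}[/-][0-9]{2,4})", re.I),
    "amount": re.compile(r"(?:Total|Amount)\s*[:=]\s*([0-9]+(?:[.,][0-9]{2})?)", re.I),
}

def extract_key_values(text):
    """Return keyword key-value pairs found by pattern matching in the text."""
    return {field: m.group(1) for field, pat in FIELD_PATTERNS.items() if (m := pat.search(text))}

# Usage example on a toy recognized string
recognized = "Invoice Date: 12/05/2022  ...  Total: 149.90"
print(extract_key_values(recognized))  # {'date': '12/05/2022', 'amount': '149.90'}
```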