
Leon Todoran, University of Amsterdam
About
Publications: 14
Reads: 2,731
Citations: 539
Publications (14)
Publications on color document image analysis present results on small, nonpublicly available datasets. In this paper we propose a well-defined and groundtruthed color dataset consisting of over 1000 pages, with associated tools for evaluation. As we focus on aspects specific to color documents, we leave out the document textual content in the...
This paper describes the robust reading competitions for ICDAR 2003. With the rapid growth in research over the last few years on recognizing text in natural scenes, there is an urgent need to establish some common benchmark datasets and gain a clear understanding of the current state of the art. We use the term 'robust reading' to refer to text im...
This document is a collection of four working group reports in the areas of digital libraries, document image retrieval, layout analysis, and Web document analysis. These reports were the outcome of discussions by participants at the Fifth IAPR International Workshop on Document Analysis Systems held in Princeton, NJ on 19-21 August 2002.
Publications on color document image analysis present results on small, non-publicly available datasets. We propose in this paper a well-defined and groundtruthed color dataset consisting of over 1000 pages, with associated tools for evaluation. The color data groundtruthing and evaluation tools are based on a well-defined document model, complexity...
We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents. All information sources, from geometric features and spatial relations to the textual features and content are employed in the analysis. To deal effectively with these information sources, we define a document representation...
We present a fully implemented system based on generic document knowledge for detecting the logical structure of documents for which only general layout information is assumed. In particular, we focus on detecting the reading order. Our system integrates components based on computer vision, artificial intelligence, and natural language processing t...
This paper describes our participation in the TREC Video Retrieval evaluation. Our approach uses two complementary automatic approaches (the first based on visual content, the other on transcripts), to be refined in an interactive setting. The experiments focused on revealing relationships between (1) different modalities, (2) the amount of human p...
Goal driven authoring of training material from existing technical manuals requires the automatic indexing of the manual. In this contribution we consider the different representation levels and document knowledge required to do the task. On that basis we have developed tools for automatic indexing in diverse domains.
We present a fully implemented system based on generic document knowledge for detecting the logical structure of documents for which only general layout information is assumed. In particular, we focus on detecting the reading order. Our system integrates components based on computer vision, artificial intelligence, and natural language processing t...
We present a fully implemented system based on generic document knowledge for detecting the logical structure of documents for which only general layout information is assumed. In particular, we focus on detecting the reading order. Our system integrates components based on computer vision, artificial intelligence, and natural language processing tec...
We present a framework to analyze color documents of complex layout. In addition, no assumption is made on the layout. Our framework combines in a content-driven bottom-up approach two different sources of information: textual and spatial. To analyze the text, shallow natural language processing tools, such as taggers and partial parsers, are used....
In this contribution we introduce a new method for global segmentation of color documents with a structure based on text frames and pictures. It is based on an extensive analysis of the expected shape of clusters in RGB-color space. The method provides an improved segmentation over k-means based clustering, and gives a proper basis for indexing and...