Leon Todoran

  • University of Amsterdam

About

14 Publications
2,731 Reads
539 Citations
Current institution
University of Amsterdam

Publications (14)
Article
Full-text available
Publications on color document image analysis present results on small, nonpublicly available datasets. In this paper we propose a well-defined and groundtruthed color dataset consisting of over 1000 pages, with associated tools for evaluation. As we focus on aspects specific to color documents, we leave out the document textual content in the...
Article
Full-text available
This paper describes the robust reading competitions for ICDAR 2003. With the rapid growth in research over the last few years on recognizing text in natural scenes, there is an urgent need to establish some common benchmark datasets and gain a clear understanding of the current state of the art. We use the term 'robust reading' to refer to text im...
Article
Full-text available
This document is a collection of four working group reports in the areas of digital libraries, document image retrieval, layout analysis, and Web document analysis. These reports were the outcome of discussions by participants at the Fifth IAPR International Workshop on Document Analysis Systems held in Princeton, NJ on 19-21 August 2002.
Conference Paper
Full-text available
Publications on color document image analysis present results on small, non-publicly available datasets. In this paper we propose a well-defined and groundtruthed color dataset consisting of over 1000 pages, with associated tools for evaluation. The color data groundtruthing and evaluation tools are based on a well-defined document model, complexity...
Article
Full-text available
We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents. All information sources, from geometric features and spatial relations to the textual features and content, are employed in the analysis. To deal effectively with these information sources, we define a document representation...
Article
We present a fully implemented system based on generic document knowledge for detecting the logical structure of documents for which only general layout information is assumed. In particular, we focus on detecting the reading order. Our system integrates components based on computer vision, artificial intelligence, and natural language processing t...
Article
Full-text available
This paper describes our participation in the TREC Video Retrieval evaluation. Our approach uses two complementary automatic approaches (the first based on visual content, the other on transcripts) to be refined in an interactive setting. The experiments focused on revealing relationships between (1) different modalities, (2) the amount of human p...
Conference Paper
Full-text available
Goal-driven authoring of training material from existing technical manuals requires the automatic indexing of the manual. In this contribution we consider the different representation levels and document knowledge required to do the task. On that basis we have developed tools for automatic indexing in diverse domains.
Article
We present a fully implemented system based on generic document knowledge for detecting the logical structure of documents for which only general layout information is assumed. In particular, we focus on detecting the reading order. Our system integrates components based on computer vision, artificial intelligence, and natural language processing t...
Article
Full-text available
We present a fully implemented system based on generic document knowledge for detecting the logical structure of documents for which only general layout information is assumed. In particular, we focus on detecting the reading order. Our system integrates components based on computer vision, artificial intelligence, and natural language processing tec...
Article
Full-text available
We present a framework to analyze color documents of complex layout. No assumption is made about the layout. Our framework combines two different sources of information, textual and spatial, in a content-driven, bottom-up approach. To analyze the text, shallow natural language processing tools, such as taggers and partial parsers, are used....
Conference Paper
Full-text available
In this contribution we introduce a new method for global segmentation of color documents with a structure based on text frames and pictures. It is based on an extensive analysis of the expected shape of clusters in RGB-color space. The method provides an improved segmentation over k-means based clustering, and gives a proper basis for indexing and...
