Conference Paper

Metadata Extraction System for Chinese Books

DOI: 10.1109/ICDAR.2011.156
Conference: 2011 International Conference on Document Analysis and Recognition, ICDAR 2011, Beijing, China, September 18-21, 2011
Source: DBLP


Extracting metadata from academic papers has attracted considerable attention from researchers in recent years, but automatic metadata extraction from books has rarely been discussed. In this paper, we address this task for Chinese books and present a system that extracts metadata from the title page of a book. The system consists of three components: metadata segmentation, metadata labeling, and post-processing. It adopts different strategies to identify different metadata types, and it draws on a variety of information sources, including geometric layout, linguistic cues, semantic content, and headers and footers, to accommodate the wide range of metadata layouts. Experimental results on real-world data demonstrate the effectiveness of the proposed system.
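To make the three-stage structure concrete, here is a minimal sketch of a segment, label, and post-process pipeline for a title page. It is not the authors' method. The `TextLine` type, the gap-based segmentation rule, the "tallest line is the title" heuristic, and the cue words (`著`, `编`, `译`, `主编`, `出版社`) are all hypothetical assumptions, and the sketch uses only geometric-layout and linguistic cues; the semantic-content and header-footer sources mentioned above are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextLine:
    text: str
    y: float       # top coordinate on the page (hypothetical units)
    height: float  # line height, used here as a proxy for font size


# Hypothetical linguistic cues for Chinese title pages (assumption).
AUTHOR_CUES = ("主编", "著", "编", "译")
PUBLISHER_CUES = ("出版社",)


def segment(lines: List[TextLine], gap_ratio: float = 1.5) -> List[List[TextLine]]:
    """Segmentation: group vertically adjacent lines into blocks.
    A vertical gap larger than gap_ratio * previous line height starts a new block."""
    blocks: List[List[TextLine]] = []
    for line in sorted(lines, key=lambda l: l.y):
        if blocks:
            prev = blocks[-1][-1]
            gap = line.y - (prev.y + prev.height)
            if gap <= gap_ratio * prev.height:
                blocks[-1].append(line)
                continue
        blocks.append([line])
    return blocks


def label(blocks: List[List[TextLine]]) -> Dict[str, str]:
    """Labeling: assign a metadata type to each block with simple rules."""
    if not blocks:
        return {}
    texts = ["".join(l.text for l in b) for b in blocks]
    # Layout cue: the block containing the tallest line is taken as the title.
    title_idx = max(range(len(blocks)), key=lambda i: max(l.height for l in blocks[i]))
    labels = {"title": texts[title_idx]}
    for i, text in enumerate(texts):
        if i == title_idx:
            continue
        if any(cue in text for cue in PUBLISHER_CUES):
            labels.setdefault("publisher", text)
        elif text.endswith(AUTHOR_CUES):
            labels.setdefault("author", text)
    return labels


def post_process(labels: Dict[str, str]) -> Dict[str, str]:
    """Post-processing: strip a trailing role word such as 著 from the author."""
    out = dict(labels)
    if "author" in out:
        author = out["author"]
        for cue in sorted(AUTHOR_CUES, key=len, reverse=True):
            if author.endswith(cue):
                author = author[: -len(cue)]
                break
        out["author"] = author.strip()
    return out


def extract_metadata(lines: List[TextLine]) -> Dict[str, str]:
    return post_process(label(segment(lines)))


if __name__ == "__main__":
    page = [
        TextLine("数据挖掘导论", y=100, height=40),
        TextLine("张三 著", y=220, height=16),
        TextLine("清华大学出版社", y=700, height=16),
    ]
    print(extract_metadata(page))
```

Keeping the stages as separate functions mirrors the segmentation, labeling, and post-processing split described in the abstract, so any one rule set could be swapped out (for example, a learned classifier in place of `label`) without touching the others.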



Available from: Liangcai Gao, Sep 17, 2015