Conference Proceeding

Automatically mining review records from forum Web sites

Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
09/2010; DOI: 10.1109/FSKD.2010.5569292; pp. 2450-2455. In proceedings of: Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on, Volume 5
Source: IEEE Xplore

ABSTRACT The rapid development of Web 2.0 has brought a flourishing of web reviews. Web reviews are usually published in the form of structured records. As an important information source for many popular applications (e.g. monitoring and analysis of public opinion), review records need to be extracted accurately from web pages. To the best of our knowledge, little work in the literature has systematically investigated this problem. Besides the variety of web page templates, user-generated review content raises a new challenge. The inconsistency of review content, in both the DOM tree and the visual appearance, impairs the similarity among review records, which seriously degrades the performance of existing solutions for web data record extraction. To tackle this challenge, we propose a novel approach that performs automatic extraction of review records by employing sophisticated techniques. Our experimental results over 20 forum web sites indicate that the proposed approach can achieve high extraction accuracy.
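
The challenge named in the abstract, that user-generated content makes otherwise identical records look dissimilar in the DOM tree, can be illustrated with a small sketch. This is not the paper's method, which the abstract does not describe; it only compares the start-tag sequences of two hypothetical forum posts with Python's standard `difflib`. The class names `post`, `author`, and `content`, the two sample records, and the helper `tag_similarity` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collect start tags in document order.

    If skip_class is given, the descendants of any element with that class
    are ignored (the element's own tag is still recorded).
    Assumes well-formed markup without void tags such as <br> or <img>.
    """

    def __init__(self, skip_class=None):
        super().__init__()
        self.tags = []
        self.skip_class = skip_class
        self._skip_depth = 0  # > 0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if self._skip_depth:
            self._skip_depth += 1
            return
        self.tags.append(tag)
        if self.skip_class and dict(attrs).get("class") == self.skip_class:
            self._skip_depth = 1

    def handle_endtag(self, tag):
        if self._skip_depth:
            self._skip_depth -= 1


def tag_similarity(html_a, html_b, skip_class=None):
    """Similarity ratio in [0, 1] between the tag sequences of two records."""
    sequences = []
    for html in (html_a, html_b):
        parser = TagSequence(skip_class)
        parser.feed(html)
        parser.close()
        sequences.append(parser.tags)
    return SequenceMatcher(None, sequences[0], sequences[1]).ratio()


# Two hypothetical review records rendered from the same template.
RECORD_A = (
    '<div class="post"><span class="author">ann</span>'
    '<div class="content"><p>Great phone.</p></div></div>'
)
RECORD_B = (
    '<div class="post"><span class="author">bob</span>'
    '<div class="content"><blockquote><p>Great phone.</p></blockquote>'
    '<p>I disagree.</p><ul><li>slow</li><li>heavy</li></ul></div></div>'
)

full = tag_similarity(RECORD_A, RECORD_B)
template_only = tag_similarity(RECORD_A, RECORD_B, skip_class="content")
print(f"full subtree: {full:.2f}, template only: {template_only:.2f}")
```

With the free-form content included, the two records share only part of their tag sequence; once the content subtree is ignored, the template parts match exactly. A real extractor would not know in advance which subtree holds the user-generated content, which is the difficulty the abstract points to.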

Keywords

20 forum web sites
DOM tree
existing solutions
experimental results
information source
popular applications
proposed approach
rapid development
records
review contents
review records
serious impact
tackle
user-generated review contents
visual appearance
web data record extraction
web page templates
web pages
web reviews