Conference Paper

A File Search Method Based on Intertask Relationships Derived from Access Frequency and RMC Operations on Files.

DOI: 10.1007/978-3-642-23088-2_27
In: Database and Expert Systems Applications: 22nd International Conference, DEXA 2011, Toulouse, France, August 29 - September 2, 2011, Proceedings, Part I
Source: DBLP

ABSTRACT: The tremendous growth in the number of files stored in filesystems makes it increasingly difficult to find desired files.
Traditional keyword-based search engines cannot retrieve files that do not contain the query keywords. To tackle this problem,
we use file-access logs to derive intertask relationships for file search. We observe that 1) files related to the
same task are frequently used together, and 2) a set of Rename, Move, and Copy (RMC) operations tends to initiate a new task.
We have implemented a system named SUGOI, which detects two types of tasks, FI tasks and RMC tasks, from file-access logs.
An FI task corresponds to a group of files frequently accessed together; an RMC task is generated by RMC operations. SUGOI then
constructs a graph of intertask relationships based on the influence of RMC operations and the similarity between tasks.
Using the detected tasks and intertask relationships, the system expands the search results of a keyword-based search engine.
Experiments using actual file-access logs indicate that the proposed approach significantly improves search results.
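The task-detection and search-expansion pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not SUGOI's implementation. The (timestamp, path) log format, the idle-gap session rule, the reading of FI tasks as maximal frequent itemsets of co-accessed files, the Jaccard similarity threshold, and all function names are hypothetical choices made for illustration. RMC tasks and RMC-based edges are left out entirely.

```python
from itertools import combinations


def sessions(log, gap=300.0):
    """Split (timestamp, path) file-open events into sessions.

    A new session starts after an idle gap longer than `gap` seconds
    (an assumed segmentation rule, not taken from the paper)."""
    result, current, last_t = [], set(), None
    for t, path in sorted(log):
        if last_t is not None and t - last_t > gap:
            result.append(current)
            current = set()
        current.add(path)
        last_t = t
    if current:
        result.append(current)
    return result


def frequent_itemsets(sess, min_support=2, max_size=3):
    """Simplified level-wise (Apriori-style) search for file sets of size >= 2
    that appear together in at least `min_support` sessions."""
    frequent = []
    candidates = {frozenset([f]) for s in sess for f in s}
    for size in range(1, max_size + 1):
        level = [c for c in candidates
                 if sum(1 for s in sess if c <= s) >= min_support]
        if size >= 2:
            frequent.extend(level)
        # Join step: union pairs of frequent sets that grow by exactly one file.
        candidates = {a | b for a in level for b in level if len(a | b) == size + 1}
    return frequent


def maximal(itemsets):
    """Keep itemsets not strictly contained in another one (hypothetical FI tasks)."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]


def jaccard(a, b):
    return len(a & b) / len(a | b)


def task_graph(tasks, threshold=0.2):
    """Undirected intertask graph: link tasks whose Jaccard similarity is at
    least `threshold` (RMC-based edges are omitted in this sketch)."""
    graph = {i: set() for i in range(len(tasks))}
    for i, j in combinations(range(len(tasks)), 2):
        if jaccard(tasks[i], tasks[j]) >= threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def expand_results(hits, tasks, graph):
    """Add to the keyword hits (a set) every file of a task containing a hit,
    plus every file of a task adjacent to such a task in the graph."""
    seeds = {i for i, t in enumerate(tasks) if t & hits}
    neighbors = {j for i in seeds for j in graph[i]}
    expanded = set(hits)
    for i in seeds | neighbors:
        expanded |= tasks[i]
    return expanded


if __name__ == "__main__":
    log = [(0, "a.tex"), (10, "b.bib"), (20, "fig.png"),
           (1000, "a.tex"), (1010, "b.bib"), (1020, "fig.png"),
           (2000, "a.tex"), (2010, "b.bib"), (2020, "notes.txt"),
           (3000, "notes.txt"), (3010, "b.bib")]
    tasks = maximal(frequent_itemsets(sessions(log)))
    graph = task_graph(tasks)
    print(sorted(expand_results({"notes.txt"}, tasks, graph)))
    # → ['a.tex', 'b.bib', 'fig.png', 'notes.txt']
```

Here the two detected tasks, {a.tex, b.bib, fig.png} and {b.bib, notes.txt}, share one file, giving a Jaccard similarity of 1/4. With `threshold=0.3` they would not be linked, and a search hit on `notes.txt` would expand only to `b.bib` and `notes.txt`.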

  • ABSTRACT: As the Internet has become well established, information on the Web grows richer every day, and Web pages have become increasingly useful in daily life. It is therefore very common to refer to information on the Web, particularly when writing documents or programs. If we later want to modify part of a file, it can be very hard to track down the Web pages originally referred to. In this paper, we propose methods for extracting relationships between files and Web pages based on the co-occurrence of data in Web-access logs and file-access logs. These relationships are very useful for revisiting Web pages related to target files. To analyze co-occurrence across these two types of access logs, we consider two approaches to merging the logs, which trade off accuracy against execution time: the Pre-Merge and Post-Merge methods. We have evaluated these two methods using actual access logs.
    International Journal of Business Intelligence and Data Mining 10/2012; 7(3):152-171.
  • ABSTRACT: Information on the Web grows richer every day. Web access is now very useful in many aspects of daily life, particularly for writing documents and programs. In fact, it has become quite usual to edit files while referring to information on the Web. During the file-editing process, we usually visit so many Web pages that we cannot remember all of the relevant ones. If we later want to modify part of a file, it can be very hard to track down the Web pages originally referred to. In this paper, we propose methods for finding relationships between files and Web pages based on the co-occurrence of data in Web-access logs and file-access logs. These relationships are very useful for revisiting Web pages related to target files. To analyze co-occurrence across these two types of access logs, we consider two approaches to merging the logs, which trade off accuracy against execution time: the Pre-Merge and Post-Merge methods. We have evaluated both methods using actual access logs.
    iiWAS 2011 - The 13th International Conference on Information Integration and Web-based Applications and Services, 5-7 December 2011, Ho Chi Minh City, Vietnam; 01/2011
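The file-to-Web-page co-occurrence idea in these abstracts can be illustrated by counting, for each file access, the Web pages visited within a fixed time window. This Python sketch is only an illustration under stated assumptions and is not the authors' Pre-Merge or Post-Merge method. The (timestamp, item) log format, the 600-second window, the example URLs, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def cooccurrence(file_log, web_log, window=600.0):
    """Count (file, url) pairs whose access times lie within `window` seconds.

    Both logs are lists of (timestamp_seconds, item) tuples (assumed format)."""
    web = sorted(web_log)
    times = [t for t, _ in web]
    counts = Counter()
    for t, path in file_log:
        # Binary search for the Web accesses inside [t - window, t + window].
        lo = bisect_left(times, t - window)
        hi = bisect_right(times, t + window)
        for _, url in web[lo:hi]:
            counts[(path, url)] += 1
    return counts


def related_pages(counts, path, k=3):
    """Return up to k (url, count) pairs for `path`, most frequent first."""
    ranked = [(url, n) for (p, url), n in counts.items() if p == path]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


if __name__ == "__main__":
    files = [(100, "report.docx"), (5000, "report.docx"), (9000, "main.py")]
    pages = [(50, "https://example.org/style-guide"),
             (400, "https://example.org/style-guide"),
             (4900, "https://example.org/style-guide"),
             (20000, "https://example.org/news")]
    counts = cooccurrence(files, pages)
    print(related_pages(counts, "report.docx"))
    # → [('https://example.org/style-guide', 3)]
```

Sorting the Web log once and using binary search keeps each file access to two logarithmic lookups plus the matching entries. The window size is the main tuning knob in this sketch: a wider window finds more candidate pages but also admits more unrelated ones.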