Building a knowledge repository of educational resources using dynamic harvesting
ABSTRACT The World Wide Web hosts a huge amount of information on many subjects, and education is no exception. Given this volume of data, searching manually for an educational resource is very difficult. To overcome this, an intelligent repository of educational resources is needed, one that helps users choose among the available resources. This paper discusses an attempt to build such a repository. It will help users choose among the available solutions for their needs by providing a comparative analysis of those solutions, together with the reported experience of other users. Because content on the web changes regularly and new resources keep being added, the repository will be updated dynamically. All of these tasks are automated as far as possible. This work uses crawling, classification, and information extraction techniques to identify educational software and tools on the web. Our implementation focuses on free and open source software (FOSS) for the education domain. The final framework is intended to be generic so that it can be extended to other domains.
ABSTRACT: Recent approaches to text classification have used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption.
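The abstract above is truncated and does not name the two models. As an illustration only, the following minimal sketch shows one common naive Bayes text classifier, the multinomial event model with add-one smoothing. It is not the cited paper's implementation; the function names, the toy documents, and the `edu`/`other` labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model.

    docs   -- list of token lists (hypothetical toy input)
    labels -- list of class names, one per document
    alpha  -- additive (Laplace) smoothing constant
    """
    vocab = {tok for doc in docs for tok in doc}
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    model = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    for label, n_docs in class_counts.items():
        model["log_prior"][label] = math.log(n_docs / len(docs))
        # Smoothed denominator: tokens seen in the class plus alpha per vocab word.
        total = sum(token_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            tok: math.log((token_counts[label][tok] + alpha) / total)
            for tok in vocab
        }
    return model


def classify(model, doc):
    """Return the class with the highest posterior log-score for a token list."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        log_lik = model["log_likelihood"][label]
        # Naive Bayes assumption: tokens are independent given the class,
        # so per-token log-likelihoods simply add. Unknown tokens are skipped.
        scores[label] = log_prior + sum(
            log_lik[tok] for tok in doc if tok in model["vocab"]
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus: two "educational software" pages, two unrelated pages.
docs = [
    ["free", "open", "source", "quiz", "tool"],
    ["course", "quiz", "software", "download"],
    ["football", "match", "score"],
    ["stock", "market", "score"],
]
labels = ["edu", "edu", "other", "other"]

model = train_multinomial_nb(docs, labels)
predicted = classify(model, ["open", "source", "quiz"])
```

In a repository-building setting like the one described in the main abstract, a classifier of this kind could be one way to decide whether a crawled page is about educational software, though the source does not say which classifier is actually used.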
ABSTRACT: The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs. The first is an ontology that defines the classes (e.g., company, person, employee, product) and relations (e.g., employed_by, produced_by) of interest when creating the knowledge base. The second is a set of training data consisting of labeled regions of hypertext that represent instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This article describes our general approach, several machine learning algorithms for this task, and promising initial results with a prototype system that has created a knowledge base describing university people, courses, and research projects.
Artificial Intelligence 04/2000; 118(1-2):69-113. DOI:10.1016/S0004-3702(00)00004-7