Automatic Extraction of Pedagogic Metadata from Learning Content.

I. J. Artificial Intelligence in Education 01/2008; 18:97-118.
Source: DBLP

ABSTRACT Annotating learning material with metadata makes it easy for different learning and tutoring systems to reuse it. Several metadata standards have been developed to represent learning objects and courses. A learning system needs pedagogic attributes including document type, topic, coverage of concepts, and, for each concept, its significance and role. Moreover, to build a flexible and reusable repository of e-learning materials, documents should be annotated with such metadata automatically as far as possible. This paper describes the attributes that represent some important pedagogic characteristics of learning materials. To reduce the overhead of manual annotation, we have explored the feasibility of automatically annotating learning materials with metadata. This facilitates the creation of an open e-learning repository for storing the annotated learning materials, which can then be used by learning systems. The automatic annotation is based on a domain knowledge base and uses a number of techniques, including standard classification algorithms and the parsing and analysis of documents. The results show a fair degree of accuracy, which may be improved in the future by using more sophisticated algorithms.

Available from: Sudeshna Sarkar, Jun 16, 2015
  • ABSTRACT: The objective of any tutoring system is to provide meaningful learning to the learner. An automated tutoring system should therefore be able to tell whether a concept mentioned in a document is a prerequisite for studying that document or can be learned from it. This paper addresses the problem of identifying defined concepts and prerequisite concepts in learning resources in HTML format. A supervised machine learning approach is taken, based on linguistic features that capture contextual information and on stylistic features such as font size and font weight. The paper shows that contextual information in addition to format information can give better results when used with the SVM classifier than with the (LP)² algorithm.
    Neural Networks (IJCNN), The 2010 International Joint Conference on; 08/2010
  • ABSTRACT: This paper proposes a methodology for constructing a semantic open syllabus, to be eventually used by an automated and collaborative book authoring tool. It proposes using RDF/OWL in place of the existing use of XML to define an Open Syllabus. Using RDF/OWL eliminates the fixed-schema requirement of XML, allowing seamless semantic integration with other semantic e-learning objects. The paper also delineates the importance of using semantic technologies in an active learning community for specifying the syllabus and for general course development. It emphasizes creating learner-centric syllabi that foster collaborative learning and knowledge generation within the collaborative learning community. The student indirectly participates in the formation of course objectives, and hence in course development, leading to a truly learner-centric teaching environment. The paper proposes a detailed semantic open syllabus ontology to specify course contents, and a corresponding methodology that would aid automated textbook generation in a collaborative learning environment.
    Technology Enhanced Education (ICTEE), 2012 IEEE International Conference on; 01/2012
  • ABSTRACT: Personalization is increasingly vital, especially for enterprises that need to reach their customers. The key challenge in supporting personalization is the need for rich metadata, such as metadata about structural relationships, subject/concept relations between documents, and cognitive metadata about documents (e.g. the difficulty of a document). Manual annotation of large knowledge bases with such rich metadata is not scalable. Moreover, automatic mining of cognitive metadata is challenging, since it is very difficult to understand the underlying intellectual knowledge of a document automatically. At the same time, Web content is increasingly becoming multilingual, since a growing amount of the data generated on the Web is non-English. Current metadata extraction systems are generally based on English content, and this needs to change in order to adapt to the changing dynamics of the Web. To alleviate these problems, we introduce a novel automatic metadata extraction framework, which is based on a novel fuzzy-based method for automatic cognitive metadata generation and uses different document parsing algorithms to extract rich metadata from multilingual enterprise content using the newly developed DocBook, Resource Type and Topic ontologies. Since the metadata generation process is based upon DocBook-structured enterprise content, our framework is focused on enterprise documents and on content that is loosely based on the DocBook type of formatting. DocBook is a common documentation format for formally producing corporate data and has been adopted by many enterprises. The proposed framework is illustrated and evaluated on the English, German and French versions of the Symantec Norton 360 knowledge bases. The user study showed that the proposed fuzzy-based method generates reasonably accurate values, with an average precision of 89.39% on the metadata values of document difficulty, document interactivity level and document interactivity type. The proposed fuzzy inference system achieves improved results compared to a rule-based reasoner for difficulty metadata extraction (an improvement of about 11%). In addition, user-perceived metadata quality scores (mean of 5.57 out of 6) were found to be high, and automated metadata analysis showed that the extracted metadata is of high quality and can be suitable for personalized information retrieval.
    Journal of Web Semantics 01/2011; DOI:10.1016/j.websem.2011.11.001 · 1.38 Impact Factor