Mining Associative Meanings from the Web: from word disambiguation to the global brain

Source: CiteSeer

ABSTRACT. A general problem in all systems to process language (parsing, translating, etc.) is ambiguity: words have many, fuzzily defined meanings, and meanings shift with the context. This may be tackled by quantifying the connotative or associative meaning, which can be represented as a matrix of mutual association strengths. With many thousands of words, there are billions of possible associations, though, and there is no obvious method to measure all of them. This "knowledge acquisition bottleneck" can be tackled by mining implicit associations from the billions of documents and millions of users on the World-Wide Web. The present paper discusses two methods to achieve this: lexical co-occurrence, a measurement of the frequency with which words appear in each other's neighborhood, and web learning algorithms, an application of the Hebbian rule to create associations between subsequently "activated" words or pages. The mechanism of spreading activation can be applied to the resulting associative networks for clustering, context-driven disambiguation, and personalized recommendation. A generalization of such methods could transform the web into a "global brain", that is, an intelligent, learning network that assimilates the implicit knowledge and preferences of its users.
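The two ideas named in the abstract, lexical co-occurrence and spreading activation, can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the window size, the decay factor, the row normalization, and the four-sentence corpus are all assumptions chosen for illustration.

```python
from collections import defaultdict


def cooccurrence(sentences, window=2):
    """Count how often two different words appear within `window` positions."""
    counts = defaultdict(lambda: defaultdict(float))
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                v = tokens[j]
                if v != w:
                    # Store the association symmetrically.
                    counts[w][v] += 1.0
                    counts[v][w] += 1.0
    return counts


def normalize(counts):
    """Turn raw counts into association strengths that sum to 1 per word."""
    assoc = {}
    for w, row in counts.items():
        total = sum(row.values())
        assoc[w] = {v: c / total for v, c in row.items()}
    return assoc


def spread(assoc, seeds, steps=2, decay=0.5):
    """Spread activation from the seed words along association links."""
    activation = {w: 1.0 for w in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        nxt = defaultdict(float)
        for w, a in frontier.items():
            for v, strength in assoc.get(w, {}).items():
                nxt[v] += decay * a * strength
        for v, a in nxt.items():
            activation[v] = activation.get(v, 0.0) + a
        frontier = nxt
    return activation


# Hypothetical mini-corpus in which "bank" is ambiguous.
corpus = [
    "the bank approved the loan".split(),
    "money in the bank earns interest".split(),
    "the river bank was muddy".split(),
    "fish swim near the river bank".split(),
]
counts = cooccurrence(corpus)
assoc = normalize(counts)

# The context words act as additional seeds that bias the activation pattern.
finance = spread(assoc, ["bank", "loan"])
nature = spread(assoc, ["bank", "river"])
```

In this sketch the context word seeded alongside "bank" pulls activation toward one sense or the other, which is the intuition behind context-driven disambiguation. A Hebbian variant would increment the same kind of matrix whenever two words or pages are activated in succession rather than when they co-occur in text.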

  • ABSTRACT: We present a system which unearths relationships between named entities from information in Web pages. We use an adaptive named entity recognition system, ESpotter, which recognizes entities of various types with high precision and recall from various domains on the Web, to generate entity data such as people's names. Given an entity, we apply a link analysis algorithm to the entity data to find other entities which are closely related to it. We present our results to the people whose names were included so that they can assess our findings. User feedback is analyzed by a statistical method. The results can be used to maintain a domain ontology. Our experiments on the Knowledge Media Institute (KMi) domain show that our system can accurately find entities such as organizations, people, projects, and research areas which are closely related to people working in KMi, and that the results conform with the existing knowledge in our ontology and suggest new knowledge which can be used to update the ontology.
  • ABSTRACT: Community-driven services compile data provided by the community members, for instance playlists in Web 2.0 music sites. We show how this data can be analysed and how knowledge about sequential associations between songs and artists can be discovered. While most analyses of this kind focus on (symmetric) similarity measures, we intend to discover which songs can "musically follow" others, focusing on the sequential nature of this data in a database of over 500,000 playlists. We obtain a song association model and an artist association model, evaluate these models by comparing the results with other similarity-based analyses, and finally show how these models can be used to automatically schedule sequences of songs in a social Web radio service.
  • ABSTRACT: As most of us subconsciously feel, it is very difficult to create a program which could imitate the human way of thinking. Recently the importance of the relation between the expressions "feel", "create" and "way of thinking" used in the previous sentence has been noticed, which gave birth to so-called "affective computing". During our experiments within the GENTA project, we have observed useful connotations between the common sense information and the emotional information which could be retrieved automatically from Internet resources. Those observations seem promising for language and knowledge acquisition and prompted us to investigate the subject, and also to develop some ideas which could be useful to researchers in various AI fields. We describe GENTA-related sub-projects and their preliminary experiments.
    Journal of Systemics, Cybernetics and Informatics. 01/2004; 2:50-57.
