Fig. 1 - uploaded by Vincenzo Maltese
A clear example of cultural diversity

Source publication
Article
One of the aims of the LivingKnowledge project is to bring a new quality into search and knowledge management technology, by making opinions, bias, diversity and evolution more tangible and more digestible. In order to capture diversity in knowledge, we believe that developing more powerful tools for extracting information from both text and non-te...

Contexts in source publication

Context 1
... is strongly influenced by the diversity of context, mainly cultural, in which it is generated. Thus, while it may be appropriate to say that (some kinds of) cats and dogs are food in some parts of China, Japan, Korea, Laos and the Philippines, this is unlikely to be the case in the rest of the world [223]. A similar example is provided in Fig. 1. Sometimes, it is not just a matter of diversity in culture, viewpoints or opinion, but rather a combination of different perspectives and goals. In fact, knowledge useful for a certain task, and in a certain environment, will often not be directly applicable given other circumstances, and will thus require adaptation. Hence, there is ...
Context 2
... etc.) evolve over time. New facts are added (e.g., awards, lawsuits, divorces), some facts change (e.g., spouses, CEOs, political positions). These changes do in turn influence the media coverage of certain entities. Facts and media coverage together influence opinions in specific portals, blogs, forums, etc. This situation is illustrated in Fig. 10. Potentially, there are also influences from media and opinions on facts. For example, media coverage may force a politician to resign from an office, and sometimes it is the grassroots' opinions in blogs and online forums that are eventually picked up by print media and TV. Understanding these mutual influences, as a function of time, ...
Context 3
... prominent people in entertainment, business, and politics such as www.thewiplist.com, which contain temporal profiles about people whose assessment in the public has been highly time-variant such as Bill Clinton, Silvio Berlusconi, etc. As for the evolution of media coverage, Websites like news.google.com provide relevant information, as shown in Fig. 11. In isolation, this kind of information is not that informative. But when connected with temporal facts and opinions, it can reveal important insights. For example, the April 2008 burst in Berlusconi's timeline and the older burst in 1994 are clearly due to his becoming elected as prime minister. So, facts and media intensity are ...
Context 4
... use Websites on the topic of (human) migration to demonstrate that Web pages are primarily multimodal objects. They make meanings hierarchically following topological principles relating to the organisation of space, and hence -periodicity over linearity predominating - structures. The way that evolution over time is expressed in the home page in Fig. 13 expresses how multimodal and hierarchical meaning-making principles dominate in contemporary Web pages. Take, for example, the masthead-cum-logo cluster used to express the Website's topic in this and many other Websites. These cluster type functions, rather like a title, give the page its basic identity. The masthead is the name of ...
Context 5
... may further illustrate the principle of thematic expansion which is at work on this page. As illustrated by the red arrows in Fig. 14 a reading of the page starts with the Macro-Theme which is expanded into 4 Macro-News thanks to the cohesive tie created by the Swoosh-type object identifiable with the logo. To put the matter in a different way, the logo functions on a par with numbering systems, tabulating systems, and subordinating structures in language to create ...
Context 6
... with indices in traditional literacy. The logo is used to structure and articulate the entire page. A different colour is used to indicate that there are four different Clusters which make up a higher level unit of meaning based on a repeating visual pattern and hence form a SuperCluster. When clicking on Migration Histories the page shown in Fig. 16 is loaded. The function of the page is to provide an Introduction to personal experiences of migration to England over a period of 200 years. With the invitation "Listen to people's personal experiences of the different receptions they faced when arriving in England, and the struggle to create a new home" the page sets up the ...
Context 7
... new home" the page sets up the expectation that many different opinions are contained in the Website, many of which are likely to relate to poverty and social relations of power. Indeed, the Moving Here Website is characterised by the prominent use of a Pagelet, a higher order textual structure, typically related to strongly ideological stances. Fig. 17 (see dotted rectangle) contains a Pagelet timeline which dominates the page and functions as a visually oriented timeline complex. Thus, it replaces the earlier generation of timelines with linear paragraph structure with one based on the principle of periodicity. Fig. 15 exemplifies a more traditional timeline consisting of a date ...
Context 8
... order textual structure, typically related to strongly ideological stances. Fig. 17 (see dotted rectangle) contains a Pagelet timeline which dominates the page and functions as a visually oriented timeline complex. Thus, it replaces the earlier generation of timelines with linear paragraph structure with one based on the principle of periodicity. Fig. 15 exemplifies a more traditional timeline consisting of a date followed by a one-line synthesis of an ...
Context 9
... are made up of SuperClusters which, in turn, are made up of Clusters which, again, are made up of SubClusters. In the example given in Fig. 16 the Pagelet is made up of 4 SuperClusters i.e. a set of Clusters that contains a periodically repeating pattern such that invariants are easily distinguishable from variants. The first SuperCluster is the overall timeline made up of three Clusters, i.e. the numbers indicating centuries, which are linked up with each other at the ...
Context 10
... like the one illustrated have distinctive linguistic, visual and spatial properties, including the tell-tale, textually-discrete references to years and the explicit line-based linkage between image and written text. Suitably annotated, these features, possibly coupled with keyword searching, would make Timelines such as the one shown in Fig. 17 detectable as GENRES and would come to the rescue of scores of teenagers desperately trying to revise history ahead of tomorrow's classroom test who currently waste precious time when resorting to keyword methods that unearth many irrelevant instances of the word 'timeline' and, frustratingly, fail to detect the higher-level Timeline ...
Context 11
... such as paragraphs. A manual annotation tool, with some semi-automatic features, the MCA Web Browser, is currently being developed to annotate Websites relating to the various thematics explored within the project. It is designed to speed up the process of multimodal genre analysis which in the current stage of research is a laborious process. As Fig. 18 shows the tool takes the form of a Website browser capable of annotating the different hierarchical levels found in Websites in terms of a series of coloured rectangles according to position in the hierarchy: the current scale posits the existence of 5 levels of structure in Web pages which, from lowest to highest, are: SubClusters, ...
Context 12
... as shapes and lines. Kress and van Leeuwen [53] hold that shapes are the visual counterparts of nouns in that they represent Participants, while lines represent Processes and are thus the visual counterparts to verbs. We posit that lines have major functions in Websites e.g. the pointing (or deictic) function of lines such as the Swoosh Logo in Fig. 13. We may note that framing is also a very important feature of Websites carried out by lines. An example is given in Fig. 16 where four Photos are used to frame the abstract concept of a visual repository. It is represented in the centre by an Icon which could not have this meaning if it were not contextualised by the surrounding frame. ...
Context 13
... Participants, while lines represent Processes and are thus the visual counterparts to verbs. We posit that lines have major functions in Websites e.g. the pointing (or deictic) function of lines such as the Swoosh Logo in Fig. 13. We may note that framing is also a very important feature of Websites carried out by lines. An example is given in Fig. 16 where four Photos are used to frame the abstract concept of a visual repository. It is represented in the centre by an Icon which could not have this meaning if it were not contextualised by the surrounding frame. Frames are thus important devices in the co-contextualisation of resources and further evidence of the hierarchical ...
Context 14
... Intersemiotic (i.e. multimodal) rather than language-only structures: they rely on integrated visual, spatial and linguistic resources rather than just on language in the creation of meaning; like many home pages, the Web page in Fig. 13 contains no paragraphs and indeed no sentences of the type typically associated with paragraph-based running text (cf. newspaper articles) characterised by explicit subject, verb nuclei and predicates. The only sentences used are subjectless (and hence themeless) imperative forms: "Find out...", "Go to...", "Contact us…" and so on; ...
Context 15
... predicates. The only sentences used are subjectless (and hence themeless) imperative forms: "Find out...", "Go to...", "Contact us…" and so on; instead meanings are mostly made through highly elliptical linguistic structures which are intertwined with visual and spatial resources
• Hierarchical and cyclic rather than linear: many Websites (e.g. Fig. 14) encase written texts in explicit frames that guide the page-scanning and reading process. Websites base their thematic expansion on periodicity and visual/spatial subordination. Frames are indicative of a hierarchical scale of page subunits running from page to resource/subcluster level via the following sequence ...
Context 16
... the Internet evolves, it is increasingly dominated by compositional hierarchies. There is a decrease in the number of pagey Websites that rely on the parchment-based principle of scrolling and a corresponding increase in the 3-D properties of screeny Web sites (e.g. Fig. 13) which rely on horizontal organisation and 'piercing-the-page' access to information. Central to the semiotics view of Websites described here is the co-contextualising nature of Website objects that derives from the semiotically hierarchical organisation of Websites. It is this co-contextualising property that determines the ...
Context 17
... the example in Fig. 19 and taken from [215]. Notice that here the fundamental categories are those used by Bhattacharyya [78]. All the terms occurring in the labels of the faceted lightweight ontology on the right correspond to a term and corresponding concept in the medicine domain background knowledge. They have a well defined structure and, as such, they ...
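
The nesting described in Context 9 (a Pagelet made up of SuperClusters, which are made up of Clusters, which are made up of SubClusters) can be pictured as a small tree. The sketch below is illustrative only and is not the MCA Web Browser's data model: the `Unit` class, the `count_by_level` helper, the labels, and the rule that a unit may only contain the level directly below it are all assumptions made for the example. Only the four level names and the counts for the timeline (4 SuperClusters, the first with three Clusters) come from the excerpt.

```python
from dataclasses import dataclass, field

# Level names taken from the excerpt, ordered from highest to lowest.
LEVELS = ["Pagelet", "SuperCluster", "Cluster", "SubCluster"]


@dataclass
class Unit:
    """Hypothetical node in the page hierarchy (not the authors' data model)."""
    level: str
    label: str
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child, allowing only the level directly below this one."""
        position = LEVELS.index(self.level)
        if position + 1 >= len(LEVELS) or LEVELS[position + 1] != child.level:
            raise ValueError(f"{child.level} cannot sit directly under {self.level}")
        self.children.append(child)
        return child


def count_by_level(unit, counts=None):
    """Count how many units of each level occur in the tree rooted at `unit`."""
    counts = {} if counts is None else counts
    counts[unit.level] = counts.get(unit.level, 0) + 1
    for child in unit.children:
        count_by_level(child, counts)
    return counts


# The timeline Pagelet: 4 SuperClusters, the first holding three Clusters
# (the century numbers). The remaining labels are placeholders.
pagelet = Unit("Pagelet", "timeline")
overall = pagelet.add(Unit("SuperCluster", "overall timeline"))
for n in range(1, 4):
    overall.add(Unit("Cluster", f"century marker {n}"))
for n in range(2, 5):
    pagelet.add(Unit("SuperCluster", f"placeholder SuperCluster {n}"))

counts = count_by_level(pagelet)
print(counts)
```

The strict one-level-down rule keeps the example short; real pages may well nest less regularly, in which case the check in `add` would need to be relaxed.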


Citations

... Topic diversity refers to the presence of multiple, possibly contradictory, topics in a given text [23]. Several diversity dimensions have been discussed in the literature, including diversity in topic, diversity in viewpoint, and diversity in language. ...
... The greater the number of topics in a conversation the more diverse it is. Text sentiment analysis attempts to extract the semantic orientation conveyed in the text, which can be positive, negative, or neutral [23]. Topic diversity and sentiment analysis have many applications in health care, public opinion analysis, social relationship analysis, marketing, and sales predictions [24]. ...
Article
Keyword extraction refers to the process of detecting the most relevant terms and expressions in a given text in a timely manner. In the information explosion era, keyword extraction has attracted increasing attention. The importance of keyword extraction in text summarization, text comparisons, and document categorization has led to an emphasis on graph-based keyword extraction techniques because they can capture more structural information compared to other classic text analysis methods. In this paper, we propose a simple unsupervised text mining approach that aims to extract a set of keywords from a given text and analyze its topic diversity using graph analysis tools. Initially, the text is represented as a directed graph using synonym relationships. Then, community detection and other measures are used to identify keywords in the text. The set of extracted keywords is used to assess topic diversity within the text and analyze its sentiment. The proposed approach relies on grouping semantically similar candidate words. This approach ensures that the set of extracted keywords is comprehensive. Differing from other graph-based keyword extraction approaches, the proposed method does not require user parameters during graph construction and word scoring. The proposed approach achieved significant results compared to other keyword extraction techniques.
... Such technologies will be called expert. Annotated information objects are used, in particular, as a standard for evaluating the quality of text-processing programs (Giunchiglia, et al., 2009). ...
Conference Paper
The paper focuses on a fundamental problem: developing a model that describes the iterative process of goal-oriented creation of new individual knowledge, and information technology that provides this process. It is assumed that a linguist creates a personal knowledge system as a cross-language typology. We proceed from the notion of a goal-oriented knowledge system. A linguist creates the typology to fill a knowledge gap in contrastive grammar. This gap can be identified through observation of the subject area. Our study arose from a need to fill the gap in the cross-language knowledge system for machine translation. Here, we suggest a model and information technology that facilitates goal-oriented creation of a new personal knowledge system by linguists as a typology. The proposed model consists of two submodels, one representing the formation of annotations of the studied language units, which is performed by a group of linguists, and the other representing the creation of a cross-language typology by an expert-linguist on the basis of generated annotations. In the process of creating a new typology, a linguist analyses bilingual texts. With the help of information technology, a linguist matches up emerging parts of knowledge with the analysed aligned sentences of these texts. The ability to establish this correspondence is the principal distinction of the proposed technology. To show the feasibility of the technology, our team has designed the prototype of the computer system supporting goal-oriented creation of new individual knowledge. This prototype contains German-Russian translations of books totaling about 2.5 million words, analysed by a linguist. The subject of the analysis is translation models of German modal verbs into Russian, which are discovered by a linguist from bilingual texts in an automated mode. There is a wide range of Russian lexical units and syntactic constructions in translations of German modal verbs. At present, there is no systematic description of them. The main aim of the translations' analysis is to create the typology, which will fill the gap in the German-Russian contrastive grammar.
... Applying certain algorithms can also lead to search engines presenting one side of an argument or only the results of a certain type or tendency. Some algorithms not only try to rank results according to relevance but also mix different result types within the top results to achieve diversity (Giunchiglia et al. 2009). ...
Chapter
This chapter discusses the responsibilities of Google as the leading search engine provider to provide fair and unbiased results. In its role, Google has a large influence on what is actually searchable on the Web as well as what results users get to see when they search for information. Google serves billions of queries per month, and users only seldom consider alternatives to this search engine. This market dominance further exacerbates the situation. This leads to questions regarding the responsibility of search engines in general, and Google in particular, for providing fair and balanced results. Areas to consider here are (1) the inclusion of documents in the search engine’s databases and (2) results ranking and presentation. I find that, while search engines should at least be held responsible for their practices regarding indexing, results ranking, delivering results from collections built by the search engine provider itself and the presentation of search engine results pages; today’s dominant player, Google, argues that there actually is no problem with these issues. Its basic argument here is that “competition is one click away”, and, therefore, it should be treated like any other smaller search engine company. I approach the topic from two standpoints: from a technical standpoint, I will discuss techniques and algorithms from information retrieval and how decisions made in the design of the algorithms influence what we as users get to see in search engines. From a societal standpoint, I will discuss what biased search engines mean for knowledge acquisition in society and how we can overcome today’s unwanted search monopoly.
... Annotation is performed by expert linguists as part of semantic text analysis of one or several corpora [9][10][11]. Annotated information objects are used among others as a model to assess the quality of natural language text processing software [12]. ...
Article
This paper explores the problem of constructing a classification scheme of logical-semantic relations between parts of sentences, sentences and fragments of text regardless of its language. The proposed technology for the construction of the classification scheme involves two main steps: the automated formation of a classification heading list and the development of a scheme based on the generated list. The developed method of automated heading formation makes it possible to create verifiable classifications for a wide range of subject areas in which methods of text (and other information objects) processing are applied, for example, in the field of scientific and technical information.
... 2012, among others. In addition, Sasaki et al. have also propounded the organization of news using a domain analytical approach (2012), while the Living Knowledge Project has studied the description of news and other information on the Web in relation to aspects such as diversity, opinion, bias and context, although focusing on techniques such as automatic classification and faceting and other aspects such as the public image of a company, PR campaigns and future predictions (Giunchiglia et al. 2009; Madalli and Prasad 2011). Concerning subjectivity in KO in a broader sense, although some authors have worked with statistical methodology in combination with feature selection methods to extract subjectivity from documents (Sarvabhotla et al. 2011), the most common approach (leaving aside positivist views in which research on subjectivity was discarded for being considered unwelcome) has been the ethical one in which bias has even been discussed as a potentially positive feature (Feinberg 2007; Hjørland 2008), as well as forming a part of the legitimate plural construction of reality (García Gutiérrez 2002; 2011c). ...
... It is the set of ideas that constitute one's goals, expectations, and actions." For the purpose of this work and unless otherwise expressly indicated, as the semantic precision of the concepts "opinion," "criticism" or "bias" are not particularly necessary, they will be used indistinctly or in accordance with the dominant subjective meaning in the general definitions found in Giunchiglia et al. (2009). MKOS are additional "mediators," "metamediators" in the complex process of journalistic discourse (post)production. ...
Article
This paper studies knowledge organization (KO) in media archives, focusing on the presence of subjectivity in the core tasks of mass media knowledge organizers (MKOS) dealing with press, radio and TV records, such as classification, representation, and any other process related to content analysis and organization in news information systems. Far from rejecting subjectivity and ideological bias in these operations - since they co-participate in the media construction of reality - the authors consider MKOS to be genuine ideological and cultural mediators with the right and social responsibility to explicitly state the results of their "objectifiable" work (obtained through KO protocols and procedures determined by the media/company, classifications, thesauri, ontologies, etc.) and differentiate them from those of their political, ideological, cultural and, in sum, subjective stances. In order to achieve this, we propose the application of critical operators that should be followed by technical, collaborative and even technological actions geared to investing information systems with the capacity to consider those stances and allowing users to distinguish them. In short, it is the theoretical recognition of the subjective and biased presence of media knowledge organization operators in a job that is usually considered neutral, banal and even objective, and the initial development of tools for critical, self-critical, technical, and technological training keyed to its practical solution. This paper outlines the lines of work of a broader research study on the critical function of KO in the field of global media memory.
... Diversity in search results is a multi-faceted concept. Giunchiglia et al. (2009) define the following dimensions of diversity: diversity of sources (multiplicity of sources of texts and images); diversity of resources (e.g., images, text); diversity of topic; diversity of viewpoint; diversity of genre (e.g., blogs, news, comments); diversity of language; geographical/spatial diversity; and temporal diversity. ...
Article
Purpose — The purpose of this chapter is to give an overview of the context of Web search and search engine related research, as well as to introduce the reader to the sections and chapters of the book. Methodology/approach — We review literature dealing with various aspects of search engines, with special emphasis on emerging areas of Web searching, search engine evaluation going beyond traditional methods and new perspectives on Web searching. Findings — The approaches to studying Web search engines are manifold. Given the importance of Web search engines for knowledge acquisition, research from different perspectives needs to be integrated into a more cohesive perspective. Research limitations/implications — The chapter suggests a basis for research in the field and also introduces further research directions. Originality/value of paper — The chapter gives a concise overview of the topics dealt within the book and also shows directions for researchers interested in Web search engines.
... Bias is defined by Wikipedia as "an inclination to present or hold a partial perspective at the expense of (possibly equally valid) alternatives". The definition of bias by Giunchiglia et al. in [5] states that "bias is the degree of correlation between (a) the polarity of an opinion and (b) the context of the opinion holder". The context can be a variety of factors such as ideological, political, or educational background, ethnicity, race, profession, age, location, or time. ...
... In the Cambridge Advanced Learner's Dictionary diversity is defined as: "when many different types of things or people are included in something". In [5] diversity is given from a more knowledge diversity focused point of view as "the co-existence of contradictory opinions and/or statements (some typically nonfactual or referring to opposing beliefs/opinions)". In the same paper, different dimensions of diversity are described such as: diversity of resources, diversity of topic, diversity of viewpoint, diversity of genre, diversity of language, geographical/spatial diversity, and temporal diversity. ...
Article
The Web is an unprecedented enabler for publishing, using and exchanging information at global scale. Virtually any topic is covered by an amazing diversity of opinions, viewpoints, mind sets and backgrounds. The research project RENDER works on methods and techniques to leverage diversity as a crucial source of innovation and creativity, and designs novel algorithms that exploit diversity for ranking, aggregating and presenting Web content. Essential in this respect is a knowledge model that makes the diversity in content accessible, both cognitively to human users and computationally to the machine. In this paper, we present a glossary of relevant terms that serves as a baseline for the specification of the Knowledge Diversity Model.
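
The definition of bias quoted above ("the degree of correlation between (a) the polarity of an opinion and (b) the context of the opinion holder") can be made concrete with a toy calculation. The sketch below is one possible reading, not the method of Giunchiglia et al. or of the RENDER project: it assumes polarity is a number in [-1, 1], reduces context to a single yes/no attribute, and uses the Pearson coefficient as the "degree of correlation". The function names, the record layout and the sample data are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: no variation means no measurable correlation
    return cov / sqrt(var_x * var_y)


def bias_score(opinions, context_key, context_value):
    """Correlate opinion polarity with membership in one context group."""
    polarities = [o["polarity"] for o in opinions]
    in_context = [1.0 if o[context_key] == context_value else 0.0 for o in opinions]
    return pearson(polarities, in_context)


# Made-up opinions: polarity lines up perfectly with location here.
aligned = [
    {"polarity": 1.0, "location": "A"},
    {"polarity": 1.0, "location": "A"},
    {"polarity": -1.0, "location": "B"},
    {"polarity": -1.0, "location": "B"},
]
# Made-up opinions: polarity is unrelated to location here.
mixed = [
    {"polarity": 1.0, "location": "A"},
    {"polarity": -1.0, "location": "A"},
    {"polarity": 1.0, "location": "B"},
    {"polarity": -1.0, "location": "B"},
]

aligned_score = bias_score(aligned, "location", "A")
mixed_score = bias_score(mixed, "location", "A")
print(aligned_score, mixed_score)
```

A score near 1 or -1 would suggest that polarity tracks the chosen context attribute, while a score near 0 would suggest it does not. Real contexts (ideology, profession, age, time) are richer than a single binary flag, so this only illustrates the shape of the idea.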
Article
Websites, as an internet product, provide us with multimedia content: they combine written text, images, audio, video, and hyperlinks. The purpose of this study is to gain insight into the ways content is organised through design and the semiotic resources that are actually put to use by the European research groups in their homepages as a means of facilitating visibility of their work. The study comprises a corpus of 10 homepages from Horizon 2020 research projects. I draw on Systemic Functional Linguistics−Multimodal Discourse Analysis (SFL−MDA). The analysis has two layers: it first studies the homepage as a multimodal ensemble in order to identify the semiotic resources at work, and then carries out a cluster analysis to describe layout, which builds up visibility through the textual organisation of content.
Article
This paper considers the problem of constructing a classification scheme of logical-semantic relations between parts of sentences, sentences and text fragments, regardless of the language in which the text is written. The proposed technology for constructing the classification scheme comprises two main stages: the automated formation of a list of classification headings and the creation of a scheme based on the generated list. The developed method of automated heading formation makes it possible to create verifiable classifications across a wide range of subject areas that use methods for processing texts and other information objects, including the field of scientific and technical information.
Article
This study takes the old myth of objectivity in media discourse to one of the most important but unrecognized actors in the process of its construction: the mass media information scientist or documentalist. Accepting the subjective presence of the documentalist in his/her productions, this article opts for the recognition and explicit statement of this role, recommending two actions. First, we suggest that public higher education institutions combine the technical training of mass media documentalists with training in critical thinking skills. Our study analysed the subjects covered in course syllabi to detect the deficiencies to be addressed in meeting this objective. Second, we propose alternative lines of training that can contribute to cross-training of mass media documentalists in those degree programs to ensure that they acquire the needed skills in critical analysis.