Article

Generating metadata to study and teach about African issues


Abstract

Purpose – After almost three centuries of employing Western educational approaches, many African societies are still characterized by low Western literacy rates, civil conflicts, and underdevelopment. It is obvious that these Western educational paradigms, which are not indigenous to Africans, have done relatively little good for Africans. Thus, the purpose of this paper is to argue that the salvation for Africans hinges upon employing indigenous African educational paradigms, which can be subsumed under the rubric of ubuntugogy, which the authors define as the art and science of teaching and learning undergirded by humanity toward others.
Design/methodology/approach – Ubuntugogy transcends pedagogy (the art and science of teaching), andragogy (the art and science of helping adults learn), ergonagy (the art and science of helping people learn to work), and heutagogy (the study of self-determined learning). Many great African minds, realizing the debilitating effects of the Western educational systems that have been forced upon Africans, have called for different approaches.
Findings – One of the biggest challenges for studying and teaching about Africa in Africa at the higher education level, however, is the paucity of published material. Automated generation of metadata is one way of mining massive data sets to compensate for this shortcoming.
Originality/value – The authors address the following major research question: what is automated generation of metadata, and how can the technique be employed from an African-centered perspective? After addressing this question, conclusions and recommendations are offered.
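As a concrete, if simplified, illustration of what automated generation of metadata can look like, the following Python sketch derives a few Dublin Core-style fields (title, subject keywords) from raw text by simple frequency analysis. This is a hypothetical minimal example, not the method proposed in the paper; the stopword list, field names, and sample text are invented.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "that", "on"}

def generate_metadata(text, num_keywords=5):
    """Derive simple Dublin Core-style metadata fields from raw text."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    words = re.findall(r"[a-z]+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS and len(w) > 3)
    return {
        "title": lines[0] if lines else "",              # first non-blank line
        "subject": [w for w, _ in freq.most_common(num_keywords)],
        "format": "text/plain",
    }

doc = """Ubuntugogy and African Education
Ubuntugogy is the art and science of teaching and learning
undergirded by humanity toward others. African education
benefits when African paradigms guide teaching."""

meta = generate_metadata(doc)
print(meta["title"])
print(meta["subject"])
```

A production system would of course draw on richer signals (document structure, controlled vocabularies, named entities) rather than raw term frequency alone.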


Article
Purpose – This paper aims to determine the key trust antecedents that influence Internet users’ trust level toward Internet service providers (ISPs) in a high-risk society. It also investigates the trust-building process, major causes of its violation, their potential implications, and restoration. Design/methodology/approach – A mixed-method approach was used in collecting data in Kenya in 2014, using questionnaire and interview techniques. The former was administered to 250 (with an 81 per cent response rate) randomly selected Internet users at Kenyatta University, while the latter focused on key decision-makers from four randomly selected ISPs in Nairobi. Findings – The results show that Internet users’ perceptions of ISPs’ ability to be trusted in Kenya depend more on their competence in terms of service delivery (ability) and desire to protect users (benevolence) than upholding acceptable standards (integrity). The results also indicate a lack of trust manifested in poor communication and greed for profit among ISPs as major causes of trust violation. Originality/value – This paper proposes two frameworks that can enhance Internet use by providing a better understanding of trust in a high-risk society.
Article
Full-text available
Spatial data is a key resource for the development of a nation. There is a lot of economic potential locked away in spatial data collections, and this potential is realised by making the data widely available. Spatial Data Infrastructures (SDI) provide a platform for spatial data users, producers and those that manage it to distribute the data more efficiently. Governments all over the world are realising the value of National Spatial Data Infrastructures (NSDI) and are therefore making major investments to establish them. However, in Africa, implementation of formal NSDI is proceeding at a seemingly slow pace. This paper presents an assessment of the status of NSDI activity in Africa. Twenty-nine countries were surveyed, and an assessment was made per region (South, West, East, North and Central Africa). The results show that, generally, formal NSDI activity in most African countries is still in its infancy. The paper also gives recommendations on possible measures that can be taken to foster SDI implementation on the continent. In addition, it highlights potential areas for further SDI research.
Conference Paper
Full-text available
This paper depicts the interrelation between situated learning and learning management from an organisational and personal perspective. Based on this introduction, we show how educational metadata can be used for approaches of situated learning and how we can take care of learning approaches and contexts using situated and context-specific metadata and role-based models.
Article
Full-text available
This paper describes three largely qualitative studies, spread over a five-year period, into the current practice of data mining in several large South African organisations. The objective was to gain an understanding, through in-depth interviews, of the major issues faced by participants in the data mining process. The focus is more on the organisational, resource and business issues than on technological or algorithmic aspects. Strong progress is revealed to have been made over this period, and a model for the data mining organisation is proposed.
Conference Paper
Full-text available
This paper presents a proposal to create a graph representation for GIS, using both spatial and non-spatial data and also including spatial relations between spatial objects. Because graphs are a powerful and flexible knowledge representation, we are able to combine spatial and non-spatial data at the same time, and this is one of the strengths of the proposal. We hope to apply this knowledge representation to the data mining process with GIS data, including three types of spatial relations: topological, orientation and distance.
Conference Paper
Full-text available
Data mining is the process of extracting implicit, valuable, and interesting information from large sets of data. Visualization is the process of visually exploring data for pattern and trend analysis, and it is a common method of browsing spatial datasets to look for patterns. However, the growing volume of spatial datasets makes it difficult for humans to browse such datasets in their entirety, and data mining algorithms are needed to filter out large uninteresting parts of spatial datasets. We construct a web-based visualization software package for observing the summarization of spatial patterns and temporal trends. We also present data mining algorithms for filtering out vast parts of datasets for spatial outlier patterns. The algorithms were implemented and tested with a real-world set of Minneapolis-St. Paul (Twin Cities) traffic data.
Conference Paper
Full-text available
With explosively increasing volumes of remote sensing, modelling and other Earth science data available, and the popularity of the Internet, scientists are now facing challenges to publish and to find interesting data sets effectively and efficiently. Metadata has been recognized as a key technology to ease the searching and retrieval of Earth science data. In this paper, we discuss the DIMES (DIstributed MEtadata Server) prototype system. Designed to be flexible yet simple, DIMES uses XML to represent, store, retrieve and interoperate metadata in a distributed environment. DIMES accepts metadata in any well-formed XML format and thus assumes the “tree” semantics of metadata entries. Additional domain knowledge can be represented as specific links through XML's ID/IDREF mechanism. DIMES provides a number of mechanisms, including the “nearest-neighbor search”, to navigate and to search metadata. Though started for the Earth science community, DIMES can be easily extended to serve scientific communities in other disciplines.
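To make the XML-as-metadata-tree idea concrete, here is a small hypothetical sketch (not taken from DIMES itself) of representing and navigating a metadata record as an XML tree with Python's standard library. The element names, attributes, and values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed XML metadata record, in the spirit of
# "any well-formed XML format with tree semantics".
record = """
<dataset id="modis-lst">
  <title>MODIS Land Surface Temperature</title>
  <keywords>
    <keyword>remote sensing</keyword>
    <keyword>temperature</keyword>
  </keywords>
  <spatial region="Africa"/>
</dataset>
"""

root = ET.fromstring(record)

def find_text(elem, tag):
    """Walk the metadata tree and collect the text of all matching tags."""
    return [e.text for e in elem.iter(tag)]

print(find_text(root, "keyword"))            # all keyword values
print(root.find("spatial").get("region"))    # attribute lookup
```

The tree structure is what makes generic navigation possible: a server need not know a schema in advance to walk elements, match tags, and follow links.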
Conference Paper
Full-text available
We address the implementation of a distributed data system designed to serve Earth system scientists. A consortium led by George Mason University has been funded by NASA's Working Prototype Earth Science Information Partner (WP-ESIP) program to develop, implement, and operate a distributed data and information system. The system will address the research needs of seasonal-to-interannual scientists whose research focus includes phenomena such as El Niño, monsoons and associated climate studies. The system implementation involves several institutions using a multitiered client-server architecture. Specifically, the consortium involves an information system of three physical sites, GMU, the Center for Ocean-Land-Atmosphere Studies (COLA) and the Goddard Distributed Active Archive Center, distributing tasks in the areas of user services, access to data, archiving, and other aspects enabled by a low-cost, scalable information technology implementation. The project can serve as a model for a larger WP-ESIP Federation to assist in the overall data information system associated with future large Earth Observing System data sets and their distribution. The consortium has developed innovative information technology techniques such as content-based browsing, data mining and associated component working prototypes; analysis tools, particularly GrADS, developed by COLA, the preferred analysis tool of the working seasonal-to-interannual communities; and a Java front-end query engine working prototype.
Article
Full-text available
Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity for major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information-providing services, such as data warehousing and online services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided and to increase business opportunities. In response to such a demand, this article provides a survey, from a database researcher's point of view, of the data mining techniques developed recently. A classification of the available data mining techniques is provided and a comparative study of such techniques is presented.
Article
Full-text available
This article describes an internet infrastructure for working with data called DataSpace. A distributed DataSpace application containing data from the 2MASS and DPOSS astronomical data sets is also described. DataSpace is designed so that client applications supporting the remote analysis and distributed mining of data are easy to build.
Article
Full-text available
Astronomy has a long history of acquiring, systematizing, and interpreting large quantities of data. Starting from the earliest sky atlases through the first major photographic sky surveys of the 20th century, this tradition is continuing today, and at an ever increasing rate. Like many other fields, astronomy has become a very data-rich science, driven by the advances in telescope, detector, and computer technology. Numerous large digital sky surveys and archives already exist, with information content measured in multiple Terabytes, and even larger, multi-Petabyte data sets are on the horizon. Systematic observations of the sky, over a range of wavelengths, are becoming the primary source of astronomical data. Numerical simulations are also producing comparable volumes of information. Data mining promises to both make the scientific utilization of these data sets more effective and more complete, and to open completely new avenues of astronomical research. Technological problems range from the issues of database design and federation, to data mining and advanced visualization, leading to a new toolkit for astronomical research. This is similar to challenges encountered in other data-intensive fields today. These advances are now being organized through a concept of the Virtual Observatories, federations of data archives and services representing a new information infrastructure for astronomy of the 21st century. In this article, we provide an overview of some of the major datasets in astronomy, discuss different techniques used for archiving data, and conclude with a discussion of the future of massive datasets in astronomy.
Article
Full-text available
Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this paper, we explore whether clustering methods have a role to play in spatial data mining. To this end, we develop a new clustering method called CLARANS which is based on randomized search. We also develop two spatial data mining algorithms that use CLARANS. Our analysis and experiments show that with the assistance of CLARANS, these two algorithms are very effective and can lead to discoveries that are difficult to find with current spatial data mining algorithms. Furthermore, experiments conducted to compare the performance of CLARANS with that of existing clustering methods show that CLARANS is the most efficient.
Article
Full-text available
Spatial data mining, i.e., discovery of interesting, implicit knowledge in spatial databases, is a highly demanding field because very large amounts of spatial data have been collected in various applications, ranging from remote sensing, to geographical information systems (GIS), computer cartography, environmental assessment and planning, etc. In this paper, an efficient method for building decision trees for the classification of objects stored in geographic information databases is proposed and studied. Our approach to spatial classification is based on both (1) non-spatial properties of the classified objects and (2) attributes, predicates and functions describing spatial relations between classified objects and other features located in the spatial proximity of the classified objects. Several optimization techniques are explored, including a two-step spatial computation technique, use of spatial-join indices, etc. We implemented the algorithm and conducted experiments that showed...
Article
After almost three centuries of employing Western educational approaches, many African societies are still characterized by low Western literacy rates, civil conflicts and underdevelopment. It is obvious that these Western educational paradigms, which are not indigenous to Africans, have done relatively little good for Africans. Thus, I argue in this paper that the salvation for Africans hinges upon employing indigenous African educational paradigms which can be subsumed under the rubric of ubuntugogy, which I define as the art and science of teaching and learning undergirded by humanity towards others. Therefore, ubuntugogy transcends pedagogy (the art and science of teaching), andragogy (the art and science of helping adults learn), ergonagy (the art and science of helping people learn to work), and heutagogy (the study of self-determined learning).
Chapter
Data mining techniques have gained acceptance as a viable means of finding useful information in data. While the techniques can be applied to any kind of data, a brief survey of the work presented at recent conferences in data mining and knowledge discovery might lead one to believe that these techniques are being applied mainly to commercial data sets, to address problems such as customer relationship management, market basket analysis, credit card fraud, etc. Often overlooked is the fact that data mining techniques have long been applied to scientific datasets, with fields such as remote sensing, astronomy, biology, physics, and chemistry, providing a rich environment for the practice of these techniques. In this paper, I describe the various scientific and engineering areas in which data mining is playing an important role and discuss some of the issues that make scientific data mining different from its commercial counterpart. I show that the diversity of applications, the richness of the problems faced by practitioners, and the opportunity to borrow ideas from other domains, make scientific data mining an exciting and challenging field.
Article
The study of fluid flow turbulence has been an active area of research for over 100 years, mainly because of its technological importance to a vast number of applications. In recent times, with the advent of supercomputers and new experimental imaging techniques, terabyte-scale data sets are being generated, and hence storage as well as analysis of this data has become a major issue. In this chapter we outline a new approach to tackling these data sets which relies on selective data storage based on real-time feature extraction and utilizing data mining tools to aid in the discovery and analysis of the data. Visualization results are presented which highlight the type and number of spatially and temporally evolving coherent features that can be extracted from the data sets as well as other high-level features.
Article
This paper depicts the interrelation between situated learning and learning management from an organizational and personal perspective. Based on this introduction we show how educational metadata can be used for approaches of situated learning and how we can take care of contexts using context specific role-based metadata.
Article
The development of software tools and techniques for the efficient access and analysis of large astronomical databases poses some unique challenges. We briefly describe some of the problems astronomical data and datasets present, give an example from our own efforts to automate the classification of galaxies, and then discuss where "clustering" algorithms may be applicable.
Article
The evolution of digital libraries and the Internet has dramatically transformed the processing, storage, and retrieval of information. Efforts to digitize text, images, video, and audio now consume a substantial portion of both academic and industrial activity. Even when there is no shortage of textual materials on a particular topic, procedures for indexing or extracting the knowledge or conceptual information contained in them can be lacking. Recently developed information retrieval technologies are based on the concept of a vector space. Data are modeled as a matrix, and a user's query of the database is represented as a vector. Relevant documents in the database are then identified via simple vector operations. Orthogonal factorizations of the matrix provide mechanisms for handling uncertainty in the database itself. The purpose of this paper is to show how such fundamental mathematical concepts from linear algebra can be used to manage and index large text collections.
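The vector-space idea described above can be sketched in a few lines of Python: documents and a query become term-frequency vectors, and relevance is cosine similarity between them. This is a minimal illustration under simplifying assumptions (no weighting, no matrix factorization, invented documents), not the article's actual indexing machinery.

```python
import math
from collections import Counter

def vectorize(text):
    """Represent a text as a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented toy collection.
docs = {
    "d1": "metadata standards for spatial data",
    "d2": "teaching and learning in africa",
    "d3": "mining spatial data for patterns",
}

query = vectorize("spatial data mining")
ranked = sorted(docs, key=lambda d: cosine(query, vectorize(docs[d])), reverse=True)
print(ranked)
```

Real systems refine this with term weighting (e.g. tf-idf) and low-rank factorizations of the term-document matrix, which is exactly the linear-algebra territory the article covers.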
Article
The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we generalize this algorithm in two important directions. The generalized algorithm, called GDBSCAN, can cluster point objects as well as spatially extended objects according to both their spatial and their nonspatial attributes. In addition, four applications using 2D points (astronomy), 3D points (biology), 5D points (earth science) and 2D polygons (geography) are presented, demonstrating the applicability of GDBSCAN to real-world problems.
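For readers unfamiliar with the density-based notion GDBSCAN generalizes, plain DBSCAN can be sketched compactly: a point with at least `min_pts` neighbors within radius `eps` seeds a cluster, which grows through density-reachable points; everything else is noise. The following is a minimal textbook-style sketch (not GDBSCAN and not the authors' implementation); the sample points are invented.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: labels[i] is a cluster id, or -1 for noise."""
    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1                # noise (may later become a border point)
            continue
        labels[i] = cid
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid           # noise reclaimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nbrs = region_query(points, j, eps)
            if len(nbrs) >= min_pts:      # core point: expand the cluster
                seeds.extend(nbrs)
        cid += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=2))
```

GDBSCAN's generalization replaces the fixed `eps`-ball with an arbitrary neighborhood predicate and the point count with an arbitrary weight, which is what lets it handle polygons and nonspatial attributes.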
Conference Paper
Text mining applies the same analytical functions of data mining to the domain of textual information, relying on sophisticated text analysis techniques that distill information from free-text documents. IBM's Intelligent Miner for Text provides the necessary tools to unlock the business information that is "trapped" in email, insurance claims, news feeds, or other document repositories. It has been successfully applied in analyzing patent portfolios, customer complaint letters, and even competitors' Web pages. After defining our notion of "text mining", we focus on the differences between text and data mining and describe in some more detail the unique technologies that are key to successful text mining.
Conference Paper
Bioinformatics is a bridge between life science and computer science: computer algorithms are needed to face the complexity of biological processes. Bioinformatics applications manage complex biological data stored in distributed and often heterogeneous databases and require large computing power. We discuss the requirements of such applications and present the architecture of PROTEUS, a grid-based problem-solving environment that integrates ontology and workflow approaches to enhance the composition and execution of bioinformatics applications on the grid.
Conference Paper
Data from biological research is proliferating rapidly, and advanced data storage and analysis methods are required to manage it. We introduce the main sources of biological data available and outline some of the domain-specific problems associated with automated analysis. We discuss two major areas in which we are likely to experience software engineering challenges over the next ten years: data integration and presentation.
Article
Spatial data mining algorithms heavily depend on the efficient processing of neighborhood relations since the neighbors of many objects have to be investigated in a single run of a typical algorithm. Therefore, providing general concepts for neighborhood relations as well as an efficient implementation of these concepts will allow a tight integration of spatial data mining algorithms with a spatial database management system. This will speed up both the development and the execution of spatial data mining algorithms. In this paper, we define neighborhood graphs and paths and a small set of database primitives for their manipulation. We show that typical spatial data mining algorithms are well supported by the proposed basic operations. For finding significant spatial patterns, only certain classes of paths “leading away” from a starting object are relevant. We discuss filters allowing only such neighborhood paths which will significantly reduce the search space for spatial data mining algorithms. Furthermore, we introduce neighborhood indices to speed up the processing of our database primitives. We implemented the database primitives on top of a commercial spatial database management system. The effectiveness and efficiency of the proposed approach was evaluated by using an analytical cost model and an extensive experimental study on a geographic database.
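The notions of neighborhood graphs and neighborhood paths can be illustrated with a toy in-memory sketch. This is a simplified assumption (one within-distance relation over a handful of invented points, rather than database primitives on top of a spatial DBMS): it builds the graph and enumerates non-revisiting paths of a given length from a start object.

```python
import math

# Invented spatial objects (id -> coordinates); the neighborhood relation
# here is "within distance d", one of several relations the paper supports.
objects = {"a": (0, 0), "b": (1, 0), "c": (5, 5), "d": (6, 5)}

def neighborhood_graph(objs, d):
    """Adjacency sets: an edge between every pair of objects closer than d."""
    graph = {k: set() for k in objs}
    for u in objs:
        for v in objs:
            if u != v and math.dist(objs[u], objs[v]) <= d:
                graph[u].add(v)
    return graph

def paths(graph, start, length):
    """All neighborhood paths of the given length that never revisit a node."""
    if length == 0:
        return [[start]]
    out = []
    for nxt in graph[start]:
        for tail in paths(graph, nxt, length - 1):
            if start not in tail:
                out.append([start] + tail)
    return out

g = neighborhood_graph(objects, 2.0)
print(sorted(g["a"]))
print(paths(g, "a", 1))
```

The filters the paper describes would prune this path enumeration to paths "leading away" from the start object, which is where the real search-space savings come from.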
Article
Spatial development initiatives (SDIs) are becoming a critical feature in the planning for reconstruction in post-apartheid South (and Southern) Africa. The SDI programme marks a fundamental break with the trajectories and initiatives for economic and spatial planning of the apartheid past. The objective in this paper is to examine the record and developmental impact of SDI planning in South (ern) Africa through the lens of the most well-known SDI, the Maputo Development Corridor or Maputo SDI. The cross-border nature of the Maputo SDI makes it an important case study in terms of a recent shift in focus of the SDI programme towards a greater role for strengthening the regional Southern African economy. It is argued that the case of the Maputo SDI represents one illustration of the construction or configuring of a 'new regionalism' in Southern Africa. Copyright Royal Dutch Geographical Society 2001.
Conference Paper
Biological databanks have proven useful to bioscience researchers, especially in the analysis of raw data. Computational tools for sequence identification, structural analysis, and visualization have been built to access these databanks. This paper describes a way to utilize these resources (both data and tools) by integrating different biological databanks into a unified XML framework. An interface to access the embedded bioinformatic tools for this common model is built by leveraging the query language of an XML database management system. The proposed framework has been implemented with an emphasis on reusing existing bioinformatic data and tools. This paper describes the overall architecture of this prototype and some design issues.
Conference Paper
An association rule in data mining is an implication of the form X→Y, where X is a set of antecedent items and Y is the consequent item. For years researchers have developed many tools to visualize association rules. However, few of these tools can handle more than dozens of rules, and none of them can effectively manage rules with multiple antecedents. Thus, it is extremely difficult to visualize and understand the association information of a large data set even when all the rules are available. This paper presents a novel visualization technique to tackle many of these problems. We apply the technology to a text mining study on large corpora. The results indicate that our design can easily handle hundreds of multiple-antecedent association rules in a three-dimensional display with minimum human interaction, low occlusion percentage, and no screen swapping.
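To recall what the X→Y rules being visualized actually are, here is a minimal, hypothetical support/confidence enumeration over invented market-basket data; it is not the visualization technique of the paper, just the kind of rule set such a tool would display.

```python
from itertools import combinations

# Invented toy transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules(min_support=0.5, min_confidence=0.6):
    """Enumerate X -> Y rules with a single consequent item Y."""
    items = sorted(set().union(*transactions))
    found = []
    for size in (2, 3):
        for combo in combinations(items, size):
            s = support(set(combo))
            if s < min_support:
                continue
            for y in combo:
                x = set(combo) - {y}
                conf = s / support(x)            # confidence of X -> y
                if conf >= min_confidence:
                    found.append((tuple(sorted(x)), y, round(conf, 2)))
    return found

for x, y, conf in rules():
    print(x, "->", y, conf)
```

Even this tiny basket yields a handful of rules; realistic corpora yield hundreds with multiple antecedents, which is precisely the scale problem the paper's 3D display addresses.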
Article
Inventories of manually compiled dictionaries usually serve as a source for word senses. However, they often include many rare senses while missing corpus/domain-specific senses. We present a clustering algorithm called CBC (Clustering By Committee) that automatically discovers word senses from text. It initially discovers a set of tight clusters called committees that are well scattered in the similarity space. The centroid of the members of a committee is used as the feature vector of the cluster. We proceed by assigning words to their most similar clusters. After assigning an element to a cluster, we remove their overlapping features from the element. This allows CBC to discover the less frequent senses of a word and to avoid discovering duplicate senses. Each cluster that a word belongs to represents one of its senses. We also present an evaluation methodology for automatically measuring the precision and recall of discovered senses.
Article
Spatial data mining, i.e., discovery of interesting, implicit knowledge in spatial databases, is an important task for understanding and use of spatial data- and knowledge-bases. In this paper, an efficient method for mining strong spatial association rules in geographic information databases is proposed and studied. A spatial association rule is a rule indicating certain association relationship among a set of spatial and possibly some nonspatial predicates. A strong rule indicates that the patterns in the rule have relatively frequent occurrences in the database and strong implication relationships. Several optimization techniques are explored, including a two-step spatial computation technique (approximate computation on large sets, and refined computations on small promising patterns), shared processing in the derivation of large predicates at multiple concept levels, etc. Our analysis shows that interesting association rules can be discovered efficiently in large sp...
Article
An invaluable portion of scientific data occurs naturally in text form. Given a large unlabeled document collection, it is often helpful to organize this collection into clusters of related documents. By using a vector space model, text data can be treated as high-dimensional but sparse numerical data vectors. It is a contemporary challenge to efficiently preprocess and cluster very large document collections. In this paper we present a time- and memory-efficient technique for the entire clustering process, including the creation of the vector space model. This efficiency is obtained by (i) a memory-efficient multi-threaded preprocessing scheme, and (ii) a fast clustering algorithm that fully exploits the sparsity of the data set. We show that this entire process takes time that is linear in the size of the document collection. Detailed experimental results are presented; a highlight of our results is that we are able to effectively cluster a collection of 113,716 NSF award abstracts in 23 minutes (including disk I/O costs) on a single workstation with modest memory consumption.
Article
Knowledge discovery in databases (KDD) is an important task in spatial databases since both the number and the size of such databases are rapidly growing. This paper introduces a set of basic operations which should be supported by a spatial database system (SDBS) to express algorithms for KDD in SDBS. For this purpose, we introduce the concepts of neighborhood graphs and paths and a small set of operations for their manipulation. We argue that these operations are sufficient for KDD algorithms considering spatial neighborhood relations by presenting the implementation of four typical spatial KDD algorithms based on the proposed operations. Furthermore, the efficient support of operations on large neighborhood graphs and on large sets of neighborhood paths by the SDBS is discussed. Neighborhood indices are introduced to materialize selected neighborhood graphs in order to speed up the processing of the proposed operations.
Spatial data infrastructure - Africa
  • Sdi-Africa
Text mining with conceptual graphs
  • M Gomez
  • A Gelbuhk
  • A Lopez
  • R Yates
Book review of Ron Eglash's African fractals: modern computing and indigenous design
  • A K Bangura
“Welcome to Africa France bioinformatics program”
  • V Lefort
Massive data: management, analysis, visualization, and security
  • S Hambrusch
  • C Hoffman
  • M Bock
  • S King
  • D Miller
Metadata standards - Friend or Foe?
  • Snapsurvey.com
“African astronomical society”, available at: www.spemconnector.org/afrocanastronomicalsociety
  • Stemconnector
“Algorithms and applications for spatial data mining”, Geographic Data Mining and Knowledge Discovery
  • M Ester
  • H Kriegel
  • J Sander