Chapter

The Data Deluge: An e‐Science Perspective

Authors: Tony Hey, Anne E. Trefethen

Abstract

This paper previews the imminent flood of scientific data expected from the next generation of experiments, simulations, sensors and satellites. In order to be exploited by search engines and data mining software tools, such experimental data needs to be annotated with relevant metadata giving information as to provenance, content, conditions and so on. The need to automate the process of going from raw data to information to knowledge is briefly discussed. The paper argues the case for creating new types of digital libraries for scientific data with the same sort of management services as conventional digital libraries, in addition to other data-specific services. Some likely implications of both the Open Archives Initiative and e-Science data for the future role of university libraries are briefly mentioned. A substantial subset of this e-Science data needs to be archived and curated for long-term preservation. Some of the issues involved in the digital preservation of both scientific data and of the programs needed to interpret the data are reviewed. Finally, the implications of this wealth of e-Science data for the Grid middleware infrastructure are highlighted.
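As a minimal sketch of the kind of annotation the abstract calls for, the snippet below attaches a small provenance/content record to a raw data object. The field names and the Python representation are illustrative assumptions, not a scheme prescribed by the paper.

```python
import json
from datetime import datetime, timezone

def annotate(raw_data_uri: str, instrument: str, conditions: dict) -> dict:
    """Attach minimal provenance/content metadata to a raw data object.
    Field names are illustrative, loosely Dublin Core-flavoured."""
    return {
        "identifier": raw_data_uri,
        "created": datetime.now(timezone.utc).isoformat(),
        "provenance": {"instrument": instrument, "operator": "unknown"},
        "conditions": conditions,             # e.g. temperature, calibration run
        "format": "application/octet-stream",
        "schema_version": "0.1",
    }

record = annotate("file:///archive/run-0042.dat",
                  instrument="synchrotron-beamline-B16",
                  conditions={"temperature_K": 293, "exposure_s": 0.5})
print(json.dumps(record, indent=2))
```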


... Even though its growth is less than exponential, the trend seems to follow a constant growth factor. Web users, as well as researchers, easily get drowned in this data deluge [43], and searching for relevant information is nowadays a tedious task. Despite the numerous scientific search engines, the bibliographic phase is still a complex and laborious task that researchers regularly face. ...
... Neither of these ideas has yet been added to our approach, and only the specific connected categories are kept, because several general ones, sometimes even off-topic, may be included among a synset's categories. A perfect example is the synset for Artificial intelligence.43 It contains correct categories such as Artificial intelligence or Computational neuroscience, but also very generic ones such as Emerging technologies or Formal sciences. ...
... This safety should be defined. Currently, we have not found a rule defining the threshold at which categories are good enough based on domain connections. (Footnote 43: https://babelnet.org/synset?word=bn:00002150n) ...
Thesis
The abundance of data on the Internet is such that web users struggle to find data relevant to their initial problem and find themselves drowned in the data deluge. This observation also applies to researchers and other scientists during their bibliographic research phase. This thesis, realized in collaboration with MDPI (publisher of open access scientific articles - www.mdpi.com), proposes an original way of linking semantically related scientific articles. For this, the disambiguation of the keywords of the articles against a knowledge base is achieved in the first step of the approach, namely the categorization. A data augmentation step is then performed by extracting semantic neighbors of these contextualized keywords. Finally, a metric taking into account all the possible intersections between disambiguated keywords and their semantic neighbors is proposed. Other similarity measures based on neural networks or other probabilistic models have also been implemented and compared in this thesis. The results obtained are comparable to those obtained by our approach, and further investigations are possible (e.g., a combination of these methods). The evaluation of our approach highlights promising results (up to 92% accuracy) and opens interesting leads for future research.
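The thesis abstract mentions a metric over all possible intersections between disambiguated keywords and their semantic neighbors without giving the formula; the sketch below is only one plausible Jaccard-style reading of that idea, with invented example identifiers.

```python
def article_similarity(keywords_a, neighbors_a, keywords_b, neighbors_b):
    """Toy overlap score between two articles: compares keyword sets,
    neighbor sets, and the cross intersections, then averages.
    An illustrative reading of the idea, not the thesis's formula."""
    def jaccard(x, y):
        x, y = set(x), set(y)
        return len(x & y) / len(x | y) if x | y else 0.0

    parts = [
        jaccard(keywords_a, keywords_b),    # keyword vs keyword
        jaccard(neighbors_a, neighbors_b),  # neighbor vs neighbor
        jaccard(keywords_a, neighbors_b),   # cross intersections
        jaccard(neighbors_a, keywords_b),
    ]
    return sum(parts) / len(parts)

score = article_similarity(
    {"bn:00002150n"}, {"machine_learning", "neural_network"},
    {"bn:00002150n"}, {"machine_learning", "robotics"})
print(round(score, 3))
```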
... To be sure, e-Research transforms knowledge in various ways, but how? Some (for example, Hey & Trefethen, 2003) have claimed that a 'paradigm shift' or 'revolution' (as opposed to evolution) in knowledge is taking place, drawing on philosophical ideas about the nature of knowledge. Yet we will argue, first, that philosophy on its own is unlikely to be a useful guide to this transformation. ...
... Instead, it has been shown that there are major field differences (see, for example, Fry & Schroeder, forthcoming; Olson, Zimmerman, & Bos, 2008). The second would apply most readily to the problem of the 'data deluge' (Hey & Trefethen, 2003), as it has been argued that this problem requires a paradigm shift. Again, however, there are field differences (Borgman, 2007). ...
Preprint
e-Research is a rapidly growing research area, both in terms of publications and in terms of funding. In this article we argue that it is necessary to reconceptualize the ways in which we seek to measure and understand e-Research by developing a sociology of knowledge based on our understanding of how science has been transformed historically and shifted into online forms. Next, we report data which allows the examination of e-Research through a variety of traces in order to begin to understand how the knowledge in the realm of e-Research has been and is being constructed. These data indicate that e-Research has had a variable impact in different fields of research. We argue that only an overall account of the scale and scope of e-Research within and between different fields makes it possible to identify the organizational coherence and diffuseness of e-Research in terms of its socio-technical networks, and thus to identify the contributions of e-Research to various research fronts in the online production of knowledge.
... For the same research aim, some natural sciences domains with many similarities to DCH research appear more advanced, or at least seem to converge more efficiently on stabilized workflows and methodologies [Huisman et al. 2021]. For a while, the DCH community has been claiming to move toward massive digitization [Santos et al. 2017, Hey and Trefethen 2003], but has had to admit the important gap between the humanities and the top e-Science contributors (physics, astronomy, biology or earth and chemical sciences) [Hey and Trefethen 2003, Schroer and Mudge 2017] in terms of advances in data integration and ingestion. Besides this scalability gap, Heritage Sciences share a sort of modus operandi with biology and earth sciences. ...
Article
Full-text available
In the field of Digital Heritage Studies, data provenance has always been an open and challenging issue. As Cultural Heritage objects are unique by definition, the methods, practices and strategies used to build digital documentation are not homogeneous, universal or standardized. Metadata is a minimalistic yet powerful way to source and describe a digital document, but it is often required or made mandatory only at an advanced stage of a Digital Heritage project. Our approach is to integrate, from the data capture steps onwards, meaningful information to document a Digital Heritage asset, which is nowadays increasingly composed from multiple sources or multimodal imaging surveys. This article exposes the methodological and technical aspects of the ongoing development of MEMoS, standing for Metadata Enriched Multimodal documentation System. MEMoS aims to contribute to data provenance issues in current multimodal imaging surveys. It explores a way to document CH-oriented capture data sets with a versatile descriptive metadata scheme inspired by the W7 ontological model. In addition, an experiment illustrated by several case studies explores the possibility of integrating those metadata, encoded into 2D barcodes, directly into the captured image set. The article lays the foundation of a three-part methodology, namely describe, encode and display, toward metadata-enriched documentation of CH objects.
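As a rough illustration of the "metadata encoded into 2D barcodes" idea described above, the sketch below packs a hypothetical W7-style record into a QR code using the open source qrcode package; the field names, values and file paths are assumptions for illustration, not the MEMoS scheme itself.

```python
import json
import qrcode  # pip install qrcode[pil]

# Hypothetical W7-style descriptive record for one capture
# (what/when/where/who/how/which/why); MEMoS's real scheme may differ.
capture_metadata = {
    "what": "RTI capture, column capital, north portal",
    "when": "2022-05-12T09:30:00Z",
    "where": "site-A / trench 3",
    "who": "imaging team 2",
    "how": {"device": "DSLR + dome", "images": 64},
    "which": "dataset-0137",
    "why": "condition monitoring",
}

img = qrcode.make(json.dumps(capture_metadata))
img.save("capture-0137-metadata.png")  # barcode kept alongside the image set
```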
... The widespread re-purposing [175] of data is a root cause of many data curation problems. In practice, there are three main categories of approaches to data curation, as data workers aim to prepare data for analytics, namely, ad hoc/manual [76,120], automated [68], and human-in-the-loop [89,113]. Manual approaches to data curation are still the predominant choice for data workers [15] and usually do not facilitate building well-defined processes for required data curation activities, and, hence, the impact of data curation (particularly transformations) on the quality of the data [120] remains unclear, as observed in the Illness Severity Prediction case study with the impact of inconsistent and missing data. ...
... Manual approaches to data curation are still the predominant choice for data workers [15] and usually do not facilitate building well-defined processes for required data curation activities, and, hence, the impact of data curation (particularly transformations) on the quality of the data [120] remains unclear, as observed in the Illness Severity Prediction case study with the impact of inconsistent and missing data. Thus, it can lead to several potential issues, including bias effects brought in by the data worker during their analysis process, the problem of data reusability brought in by lack of transparency and documentation of the analysis or generative processes, and scalability and generalisability across different datasets and use cases [61,76]. ...
Article
Full-text available
The appetite for effective use of information assets has been steadily rising in both public and private sector organisations. However, whether the information is used for social good or commercial gain, there is a growing recognition of the complex socio-technical challenges associated with balancing the diverse demands of regulatory compliance and data privacy, social expectations and ethical use, business process agility and value creation, and scarcity of data science talent. In this vision paper, we present a series of case studies that highlight these interconnected challenges, across a range of application areas. We use the insights from the case studies to introduce Information Resilience, as a scaffold within which the competing requirements of responsible and agile approaches to information use can be positioned. The aim of this paper is to develop and present a manifesto for Information Resilience that can serve as a reference for future research and development in relevant areas of responsible data management.
... High quality surgical stereoscopic video is based on two streams of 1.5 Gbits/s each, while radiological images from a CT scan, MRI, or PET scan might take up several hundred megabytes per test. In 2009, 2.5 PBytes were needed to store mammograms in the United States, and by 2010, 30% of all photos kept globally were medical images [3]. Medical records are kept for extended periods in many nations. ...
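The bandwidth figure quoted in this excerpt translates directly into storage volume; a quick back-of-the-envelope calculation (assuming uncompressed streams) is shown below.

```python
# Two uncompressed stereoscopic streams at 1.5 Gbit/s each, for one hour:
streams = 2
rate_gbit_s = 1.5
seconds = 3600

total_gbit = streams * rate_gbit_s * seconds     # 10 800 Gbit
total_tb = total_gbit / 8 / 1000                 # bits -> gigabytes -> terabytes
print(f"{total_tb:.2f} TB per hour of surgery")  # ~1.35 TB, before compression
```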
Article
Full-text available
Endoscopic video storage is a major issue today in cloud-based health centres. The Electronic Health Record must include full-length endoscopic surgeries for diagnosis and research. Hence this paper presents compression of endoscopic video using a MapReduce technique. An artificial-intelligence-based solution is employed as an intelligent video splitter to form the key-value "Map" stages that filter the endoscopic video into groups of frames based on redundancy. These outputs are passed to "Reduce" to merge them into a single output. After mapping and reducing the endoscopic video frames, lossless compression is applied, and experimental results of PSNR 30-40 dB, SSI 0.7-0.8, bitrate 32.17 and MSE 2.1 are obtained.
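The abstract describes a map stage that splits video into redundancy-based frame groups and a reduce stage that merges them before lossless compression; the toy pipeline below mimics that flow in plain Python. The frame representation, the brightness-difference grouping and the threshold value are assumptions for illustration, not the paper's algorithm.

```python
from collections import defaultdict

def map_frames(frames, threshold=5):
    """Map stage: emit (group_key, frame) pairs, starting a new group
    whenever the difference to the previous frame exceeds a threshold."""
    key, prev = 0, None
    for idx, frame in enumerate(frames):
        if prev is not None and abs(frame - prev) > threshold:
            key += 1                      # redundancy broken -> new group
        prev = frame
        yield key, (idx, frame)

def reduce_groups(mapped):
    """Reduce stage: merge each group of redundant frames into one record."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return dict(groups)

# Toy "frames" as brightness values; a real pipeline would use image hashes.
frames = [10, 11, 11, 40, 41, 42, 90]
print(reduce_groups(map_frames(frames)))
```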
... With the digitalization of almost all areas of life, the volume of generated data has been rising steadily for years. [1][2][3] This development is particularly relevant in the engineering sciences. The data volumes generated by Industrie 4.0 require adequate management in order to enable modern methods such as machine learning (ML) as well as classical methods of information extraction. ...
Article
Full-text available
Research data management (RDM) has been gaining importance for years. However, the goal of preparing data for reuse and re-using it, instead of laboriously collecting it anew, is only rarely pursued by researchers in the German engineering sciences. To counteract this, a framework for RDM in the engineering sciences was developed. In a proof-of-concept study, it is now validated for the first time using the research project "KIOptiPack".
... CT, MRI, and PET-scan images, for instance, can take up several hundred megabytes per examination, while high-definition surgical stereoscopic video is based on two streams of 1.5 Gbits/s each. In 2009, 2.5 PBytes were needed to store mammograms in the United States, and by 2010, 30% of all images stored globally were medical images [3]. Also, the preservation time for medical records is relatively considerable in many nations. Medical photographs related to a patient must be retained for 20 years following their final visit in nations like France and Poland. ...
Article
Full-length endoscopic operations are increasingly being stored. Endoscopic videos are used for further diagnosis and research, and they must be combined with patient-level data in the electronic health record (EHR). The health-based cloud centre is a large-scale private cloud that requires a lot of storage capacity. Using a HEVC-based method, this research provides a revolutionary video compression technique based on DPCM. An effective compression technique is explored in our proposed study. Our goal is to demonstrate a method for lossless compression of endoscopic video while maintaining quality. Our implementation measures performance in terms of PSNR, SSI, and other metrics. Discussion: The proposed method is implemented in MATLAB; the testing results reveal its performance in terms of compression ratio, PSNR, SSI, and bit rate.
... High quality surgical stereoscopic video is based on two streams of 1.5 Gbits/s each, while radiological images from a CT scan, MRI, or PET scan might take up several hundred megabytes per test. In 2009, 2.5 PBytes were needed to store mammograms in the United States, and by 2010, 30% of all photos kept globally were medical images [3]. Medical records are kept for extended periods in many nations. ...
Conference Paper
Endoscopic video storage is a major issue today in cloud-based health centres. The EHR must include full-length endoscopic surgeries for diagnosis and research. This paper presents MapReduce-based compression; thus, MapReduce addresses the storage issue. First, an AI-based solution is employed as an intelligent video splitter to form the key-value "Map" stages that filter the endoscopic video into groups of frames based on redundancy. These outputs are passed to "Reduce" to merge them into a single output. After mapping and reducing the endoscopic video frames, lossless compression is applied, with PSNR 30-40 dB, SSI 0.7-0.8, bitrate 32.17, MSE 2.1.
... The appearance and evolution of databases in geochemistry are intimately linked to the progression of analytical techniques in geochemical laboratories that have allowed researchers to acquire an ever-growing volume of data, and to the changes in the way researchers have (or have not) shared these data with their peers and the general public. The geochemical data ecosystem has radically changed since the first geochemical analyses were performed in the 19th century: from the exponential growth of data volumes (e.g., Hey and Trefethen, 2003); to the advent of electronic publishing (e.g., Smith, 2001); to the rise of the Open Science movement (e.g., Woelfle et al., 2011;Vicente-Saez and Martinez-Fuentes, 2018); to the rapidly expanding application of computational methodologies such as artificial intelligence and machine learning for extracting new knowledge from geochemical data (e.g., Pignatelli and Piochi, 2021;He et al., 2022). ...
... The technicization of science is a natural consequence of the development of information and communication technologies, fast computer networks, media convergence, massification, and globalization of the processes of obtaining, processing and using information, the increase in information resources (Hey, Trefethen, 2003; Jankowski, 2007), the pressure of sharing data and communication between scientists with different audiences, and the increasing importance of reputation and the adoption of reputation management systems in scientific careers (Burgelman et al., 2010). The technicization of science refers to the pervasive use of information and communication technology (Buecheler et al., 2010) to create scientific knowledge (Fausto et al., 2012). ...
... Yet, in our time the necessity of procuring or caring for such specialized resources has been questioned in many quarters (Knoche, 2016), observing how they might be out of touch with the epistemology of contemporary academia. While the traditional task of the book collection as privileged storage of information - an "ark to save learning from deluge," in Francis Bacon's (1561-1626) words - has been unequivocally challenged, we could argue that precisely because collections are limited assets - as opposed to the boundless extent of the digital information flood (Hey & Trefethen, 2003) - and because they often express a clear critical stance - rather than the supposed neutrality of large repositories - they could be endowed with new meanings in contemporary scholarship and teaching. As often contended by Italian semiologist Umberto Eco (1932-2016), the real task of education in the humanities today should not be to enlarge the boundaries of what is known, but rather to decimate - or, in German, aufheben: to remove and preserve at once - removing what is superfluous in a specific field of inquiry in order to preserve what is essential (Assunto, 1978). ...
Conference Paper
Full-text available
The College of Architecture and Urban Planning (CAUP) at Tongji University has recently acquired a remarkable book collection on Western art and architecture: the personal research library of the late Henry A. Millon (1927-2018). The research presented in the paper tries to address the challenges and affordances of this collection. It does that by developing a methodology that leverages heuristic models from data science to interpret, classify, and translate its contents, and to represent them as an interconnected set of associative visual representations. The goal of this approach is to use an image-based language to make the contents of the collection truly accessible for research, teaching, and public outreach. Such a methodology could prove to be at once scalable, economical, and effective, due to the intrinsic capacity of images to foster critical insight, easily overcoming barriers in language, specialization, and cultural background: "images externalize and clarify common ground. They can be understood, revised, and manipulated by a community … they facilitate information processing, they expand long-term memory, they organize thought, they promote inference and discovery. Because they are visual and spatial, they allow human agility in visual-spatial processing and inference" (Tversky, 2011, p. 502).
... Global Biodiversity Information Facility [GBIF], Food and Agriculture Organization of the United Nations [FAO], national statistical institutes) that can be used in downstream studies (Ladouceur & Shackelford 2021), such as ES assessments. In fact, according to Borgman et al. (2006, 2007), data are becoming scientific capital, and e-Science (a term used to represent the increasingly global collaborations of people and shared resources) promises to increase the pace of science via fast, distributed access to computational resources, analytical tools, and digital libraries (Hey & Trefethen 2003). Digital data can be used for several purposes, generally involving the compilation, standardization, and integration of data from different sources. ...
Article
African coastal ecosystems encompass high biodiversity that provides crucial ecosystem services (ES). However, the supply of these ES is threatened due to ecosystem degradation, which threatens human well-being and livelihoods. This study investigated the link between pressures and the ES provided by marine macroinvertebrates (MMI) in mangroves and seagrasses. We assessed ecosystem condition (marine protected areas, MPAs), pressures, namely climate change (sea surface temperature and sea level), land-use and land-cover changes, and overexploitation (mangrove deforestation and overfishing), and core MMI ES (provisioning, regulation, cultural). Our results revealed a low ratio of MPAs compared to the Aichi target 11, emphasizing the need for a comprehensive conservation strategy. Sea temperature and level showed an increasing trend, indicating the vulnerability of coastal ecosystems to climate change. The decline in mangrove forest cover highlights the need to mitigate adverse effects of land-use change. The increasing number of artisanal fishery licenses suggests an increased pressure on MMI, which can have severe consequences for local communities. MMI food production, particularly shrimp, and recreational fishing increased in the last two decades. Regulation services, and cultural services related to research and education, varied through time due to the limited availability of data. This information was used to develop an exploratory conceptual model illustrating the complex relationships among pressures, condition, MMI ES, and management goals for the sustainable use of marine resources and their connection with food security. Our findings underscore the importance of preserving MMI populations and habitats while addressing knowledge gaps to enhance the resilience of coastal ecosystems.
... ("Spanish university webs", "teenagers' MySpaces", "all blogs"). This is not the e-science "data deluge" (Hey & Trefethen, 2003) but is more like an e-research "document deluge". ...
Thesis
Full-text available
Lennart Björneborn's famous tweet, "connecto ergo sum", which means "I link, therefore I exist", puts forward the intriguing dimension of the web as a platform for link-based research, a major tenet of Webometrics. Webometrics, as applied in this study, explored the web presence, web visibility, web impact and linkage of archival institutions in the ESARBICA region; examined the types of institutions that provide links to archival institutions in the ESARBICA region; examined the search queries that lead patrons to the websites of archival institutions in the ESARBICA region; established the essential web services provided to clients; and ascertained the extent to which archival institutions in the ESARBICA region have implemented Web 2.0. The study was underpinned by Citation Analysis theory. Search engines, metasearch engines and web content analysis were used to collect webometric data from ESARBICA archival websites. The data was analysed using the UCINET for Windows ©2002, Microsoft Excel ©2013 and NVivo 10 ©2014 software packages. The findings of the study revealed that the web impact of ESARBICA archival institutions is generally low, as evidenced by the low impact factors attained. The impact results show that in the ESARBICA region, Southern Africa was more strongly represented, with archival institutions from six countries (Lesotho, Malawi, Namibia, South Africa, Swaziland and Zimbabwe), while the Eastern African region had archival institutions from two countries (Kenya and Tanzania). The findings further showed that not all archival institutions attained web presence in the form of accessible websites; as such, institutions like the National Archives and Records Services of Botswana made use of their Facebook page to attain web presence. The link classification results revealed that the ESARBICA websites mostly attracted industry links, with the extensions .com and .co as the most popular Top Level Domains (TLDs). A strong link relationship was noted between archival institutions and research-based activities in universities, as well as evidence of openness, as archival institutions published documents with archives-related discussions on Google Scholar. The study showed that ESARBICA archival websites are not interactive in nature and have not yet embraced Web 2.0 tools. The implications of the study include that archival institutions without websites might consider attaining web presence by constructing websites, that archival institutions should establish link relationships, and that more data should be made available to enhance web presence in rankings. The study recommended that ESARBICA archival institutions host standalone websites; establish links with archives-related research sites; establish a feedback mechanism; and make use of Search Engine Optimisation and Web 2.0 tools to enhance the web visibility of archives.
... As the digital world advances, there has been an increase in the number of day-to-day interactions over proliferating forms of communication, including newer, nuanced products such as Siri, Alexa, and Google Assistant, known as virtual personal assistants (VPAs), and Internet of Things (IoT) applications [1,2]. The two key drivers of the digital revolution are Moore's Law (the exponential increase in computing power and solid-state memory) and the substantial progress in enhancing communication bandwidth [3]. This, in fact, has raised the expectations and demands of customers. ...
Article
Full-text available
Human communication is predominantly expressed through speech and writing, which are powerful mediums for conveying thoughts and opinions. Researchers have been studying the analysis of human sentiments for a long time, including the emerging area of bimodal sentiment analysis in natural language processing (NLP). Bimodal sentiment analysis has gained attention in various areas such as social opinion mining, healthcare, banking, and more. However, there is a limited amount of research on bimodal conversational sentiment analysis, which is challenging due to the complex nature of how humans express sentiment cues across different modalities. To address this gap in research, a comparison of multiple data modality models has been conducted on the widely used MELD dataset, which serves as a benchmark for sentiment analysis in the research community. The results show the effectiveness of combining acoustic and linguistic representations using a proposed neural-network-based ensemble learning technique over six transformer and deep-learning-based models, achieving state-of-the-art accuracy.
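The late fusion of acoustic and linguistic representations mentioned in this abstract can be illustrated with a generic two-classifier ensemble; this sketch uses placeholder features and scikit-learn models rather than the paper's transformer-based architecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder features: a real system would use e.g. transformer text
# embeddings and acoustic embeddings extracted from the MELD recordings.
X_text = rng.normal(size=(200, 16))
X_audio = rng.normal(size=(200, 8))
y = rng.integers(0, 3, size=200)          # negative / neutral / positive

text_clf = LogisticRegression(max_iter=1000).fit(X_text, y)
audio_clf = LogisticRegression(max_iter=1000).fit(X_audio, y)

# Late fusion: average the per-class probabilities of both modalities.
proba = (text_clf.predict_proba(X_text) + audio_clf.predict_proba(X_audio)) / 2
pred = proba.argmax(axis=1)
print("fused training accuracy:", (pred == y).mean())
```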
... Organizations such as the UK e-Science Programme (Hey & Trefethen, 2003), the Australian e-Research Programme (Tsoi et al., 2007), and others have brought to the attention of stakeholders the importance of research data as a resource whose worth must be preserved for future research. To accomplish this, research data must be systematically organized, securely kept, fully described, easily findable, accessible with the necessary authority, shareable, preserved, and curated (Procter et al., 2012). ...
Article
Full-text available
Due to the vast amounts of data generated during research activities and the mandates imposed by funding agencies, higher education institutions place a high priority on research data management (RDM). RDM is prioritized by global organizations, and India is no exception. This work aims to address the current state of RDM in the top-ranked higher education institutions (HEIs) of India and attempts to answer the research questions posed on various aspects of RDM, such as RDM services, policy implementation, responsible stakeholders, funding support for RDM strategy and development, and so on. Furthermore, unlike most of the existing literature, the development of RDM capacity is viewed in this study as an institutional concern rather than a library concern. The survey and website analysis approaches were used to conduct this study. The outcomes of the study show that RDM is still in its early stage among India’s leading institutions.
... This process needs to be automated because most genomes are too large for manual annotation, not to mention the need to annotate as many genomes as possible since sequencing speed is no longer an issue. The annotations are made possible by the fact that genes have recognizable start and end regions (promoters and terminators that often have similar or identical composition in different groups of organisms), although the exact sequence found in these regions may vary between genes [3]. ...
Article
Full-text available
In this article, we consider which IT technologies are most used in medicine, and in genomics methods in particular, and we also look at the use of big data in this matter. Additionally, we explain what a connectome is and analyze the 4M and 3V frameworks in genomics. Statistics in medicine is one of the tools for analysing experimental data and clinical observations, as well as the language by means of which the obtained mathematical results are reported. However, this is not the only task of statistics in medicine. The mathematical apparatus is widely used for diagnostic purposes, for solving classification problems, for searching for new patterns, and for formulating new scientific hypotheses. The use of statistical programs presupposes knowledge of the basic methods and stages of statistical analysis: their sequence, necessity and sufficiency. In the presentation proposed here, the main emphasis is not on a detailed exposition of the formulas that make up the statistical methods, but on their essence and rules of application. Finally, we discuss genome-wide association studies, methods of statistical processing of medical data, and their relevance. In this article, we analyzed the basic concepts of statistics, statistical methods in medicine and data science, and considered several areas in which large amounts of data are used that require modern IT technologies, including genomics, genome-wide association studies, visualization and connectome data collection.
... This paper lies at the focal point of three orthogonal advances. First, the recent surge in GLAM 1 -led digitisation efforts (Terras, 2011), open citizen science (Haklay et al., 2021) and the expansive commodification of data (Hey and Trefethen, 2003), have enabled a new mode of historical inquiry that capitalises on the 'big data of the past' (Kaplan and Di Lenardo, 2017). Second, the 2017 breakthrough that was the transformer architecture (Vaswani et al., 2017) has led to the so-called ImageNet moment of Natural Language Processing (Ruder, 2018) and brought about unprecedented progress in transfer-learning (Raffel et al., 2020), few-shot learning (Schick and Schütze, 2021), zero-shot learning (Sanh et al., 2021), and prompt-based learning (Le Scao and Rush, 2021) for natural language. ...
Preprint
Full-text available
In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in 3 languages as test-bed, we use prompts to extract possible named entities. Our results show that a naive approach for prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but highlights the potential of such an approach for historical languages lacking labeled datasets. Moreover, we also find that T0-like models can be probed to predict the publication date and language of a document, which could be very relevant for the study of historical texts.
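A prompt-based probe of the kind the preprint describes can be sketched with the Hugging Face transformers API; the smaller bigscience/T0_3B checkpoint, the example sentence and the prompt wording are assumptions for illustration, and outputs are expected to be as error-prone as the authors report.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "bigscience/T0_3B"  # smaller T0 variant; the paper's exact checkpoint may differ
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

sentence = ("Le maire de Lausanne a inauguré hier la nouvelle gare, "
            "en présence de M. Dupont.")
prompt = f"{sentence}\nList all person names mentioned in the text above."

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```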
... Datasets and reproducibility of research play a crucial role in modern data-driven research. Scientific data management has become increasingly complex and is gaining traction in the research community, mainly when spotlighting the sharing, reuse, and interoperation of data and metadata, particularly for machine-actionable processes [1]. Genomic databases are classic examples of this scenario. ...
Preprint
Full-text available
While the publication of datasets in scientific repositories has become broadly recognised, the repositories tend to have increasing semantic-related problems. For instance, they present various data reuse obstacles for machine-actionable processes, especially in biological repositories, hampering the reproducibility of scientific experiments. An example of these shortcomings is the GenBank database. We propose GAP, an innovative data model to enhance the semantic data meaning to address these issues. The model focuses on converging related approaches like data provenance, semantic interoperability, FAIR principles, and nanopublications. Our experiments include a prototype to scrape genomic data and trace them to nanopublications as a proof of concept. For this, (meta)data are stored in a three-level nanopub data model. The first level is related to a target organism, specifying data in terms of biological taxonomy. The second level focuses on the biological strains of the target, the central part of our contribution. The strains express information related to deciphered (meta)data of the genetic variations of the genomic material. The third level stores related scientific papers (meta)data. We expect it will offer higher data storage flexibility and more extensive interoperability with other data sources by incorporating and adopting associated approaches to store genomic data in the proposed model.
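The three-level nanopublication structure outlined above (organism, strain, related papers) can be sketched as plain data classes; the attribute names and example values below are assumptions for illustration, not the GAP schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PaperMeta:                  # level 3: related publications
    doi: str
    title: str

@dataclass
class StrainMeta:                 # level 2: genetic variations of the material
    strain_id: str
    variations: List[str] = field(default_factory=list)
    papers: List[PaperMeta] = field(default_factory=list)

@dataclass
class OrganismNanopub:            # level 1: target organism / taxonomy
    taxon: str
    genbank_accession: str
    strains: List[StrainMeta] = field(default_factory=list)

np_record = OrganismNanopub(
    taxon="Severe acute respiratory syndrome coronavirus 2",
    genbank_accession="NC_045512",
    strains=[StrainMeta("B.1.1.7", ["N501Y"],
                        [PaperMeta("10.1000/example", "Variant analysis")])])
print(np_record.strains[0].variations)
```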
... "Data Deluge" [Hey 2003] 2015: 6.5 ZettaByte (10 21 ) of data 2020: 44-60 ZettaByte of data [IDC] Science nowadays data-driven HPC/Supercomputing as a driving force Example: Molecular Simulation Workflow Trajectories of molecular systems based on computer models Calculation of the trajectories on the compute nodes Output files are written to the parallel filesystem Knowledge gained through output data analysis (multiple 100TB/project) Publication of results in a scientific paper Figure 1. "'Dark data' is not carefully indexed and stored so it becomes nearly invisible to scientists and other potential users and therefore is more likely to remain underutilized and eventually lost [..]" [Heidorn 2008, 280] 2. " [..] the type of data that exists only in the bottom left-hand desk drawer of scientists on some media that is quickly aging and soon will be unreadable by commonly available devices" [Heidorn 2008, 281] "Dark Data" [Heidorn 2008]: unusable/invisible data 1. ...
Presentation
Full-text available
... This problem of scale has been widely documented (e.g. Berman, 2008; Hey, 2003). And it is from this problem that others arise. ...
Thesis
The goal of this study is to understand if digital repositories that have a preservation mandate are engaging in disaster planning, particularly in relation to their pursuit of trusted digital repository status. For those that are engaging in disaster planning, the study examines the creation of formal disaster response and recovery plans, finding that in most cases the process of going through an audit for certification as a trusted repository provides the impetus for the creation of formalized disaster planning documentation. This paper also discusses obstacles that repositories encounter and finds that most repositories struggle with making their documentation available.
Article
The purpose of this study is to reveal the current situation of research data management (RDM) services within the framework of data management responsibilities, policies, budget and resource competencies in libraries at research universities in Türkiye. The study also aims to present the potential of researchers from research universities to store and share their research data in data archives. Within the scope of the research, interviews were conducted with 15 participants who are, or have the potential to be, responsible for processes related to RDM. In addition, the records transferred to Zenodo and Aperta by researchers affiliated with research universities were examined within the scope of the study. According to the results, the amount of shared data sets in Zenodo and Aperta was quite small. However, almost all of the data transferred to these data archives is open access. Findings based on participant opinions showed that tools and techniques such as cloud storage, modern techniques, devices, and service providers, as well as resources such as budget, infrastructure and personnel for RDM, are not sufficient at research universities in Türkiye. It is thought that this study will draw attention to the benefits of RDM services for universities by revealing what kind of roles the libraries affiliated with research universities play in the process of RDM.
Article
Full-text available
There is a paradigm shift in the practice of research methods, which has entered the phase of e-Science. The very essence of science is changing, particularly through the deployment of electronic networks and high-speed computers, the two core components of e-Science. This transformation is not limited to the natural sciences, where e-Science has become the modus operandi, but extends to the domains of the social sciences and humanities. Preserving and archiving raw data for reuse has become imperative in the context of time and money. Librarians have the responsibility to initiate research data literacy to facilitate research data management. This article presents an overall picture of research data literacy and the role of library professionals. India, in order to promote its scientific productivity, needs to join the stream of research data management, which requires initiatives from academic libraries.
Chapter
Endoscopic video storage is a major issue today in cloud-based health centres. The EHR must include full-length endoscopic surgeries for diagnosis and research. This paper presents MapReduce-based compression; thus, MapReduce addresses the storage issue. First, an AI-based solution is employed as an intelligent video splitter to form the key-value “Map” stages that filter the endoscopic video into groups of frames based on redundancy. These outputs are passed to “Reduce” to merge them into a single output. After mapping and reducing the endoscopic video frames, lossless compression is applied, with PSNR 30–40 dB, SSI 0.7–0.8, bitrate 32.17, MSE 2.1.
Article
Full-text available
With the growing global awareness of the environmental impact of clothing consumption, there has been a notable surge in the publication of journal articles dedicated to “fashion sustainability” in the past decade, specifically from 2010 to 2020. However, despite this wealth of research, many studies remain disconnected and fragmented due to varying research objectives, focuses, and approaches. Conducting a systematic literature review with a mixed methods research approach can help identify key research themes, trends, and developmental patterns, while also shedding light on the complexity of fashion, sustainability, and consumption. To enhance the literature review and analytical process, the current systematic literature review employed text mining techniques and bibliometric visualization tools, including RAKE, VOSviewer, and CitNetExplorer. The findings revealed an increase in the number of publications focusing on “fashion and sustainability” between 2010 and 2021. Most studies were predominantly conducted in the United States, with a specific focus on female consumers. Moreover, a greater emphasis was placed on non-sustainable cues rather than the sustainable cues. Additionally, a higher number of case studies was undertaken to investigate three fast-fashion companies. To enhance our knowledge and understanding of this subject, this article highlights several valuable contributions and provides recommendations for future research.
Article
Data, a conceptual object that has been brought anew to light with the rise of big data, open data, artificial intelligence, machine learning and algorithms, considered as the "new economic black gold of the 21st century" (Lemaire, 2022), as a competitive advantage (CNIL and BpiFrance 2018) and as a pillar of Industry 4.0 (Mandon and Bellit, 2021), is the subject of trade wars and of recurring citizen debates. The need for data control is constantly cited, and has been confirmed since the Covid-19 pandemic (European Commission, 2020). Data literacy is then seen as a miracle solution, the key skill of the twenty-first century (Alliancy, 2021), central to the training of "data literates" able to understand and master this phenomenon. The question of how data literacy deals with the challenge of data acculturation arises: what visions does it depict? Which skills and knowledge does it train? Does it bring a single solution or a plurality of distinct applications? Must we then talk about data literacy or data literacies? In this article, we shed new light on the characteristics of data literacy by recalling its legacies and evolutions, while highlighting the questions that confront it.
Chapter
Research Data Practices (RDP) refer to research activities conducted across the lifespan of data. Characterizing RDP in disciplinary contexts is beneficial for providing data stakeholders with a practical understanding of RDP necessary to design data curation services tailored to researchers’ needs. In this paper, we focus on the five most common types of RDP – collecting data, processing data, analyzing data, representing data, and publishing or citing data. First, we compared the distributions of the five types of RDP across disciplines and observed noticeable differences between disciplines. In addition, we examined the characteristics of each type of RDP under different disciplinary contexts by developing discipline-specific RDP vocabularies employing the tf-idf approach. Based on the common terms as well as the discipline-specific ones, we found that the five types of RDP can be distinctly conceptualized, while each type of RDP varies by discipline in terms of its action, object, and instrument.
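The tf-idf step mentioned above for deriving discipline-specific RDP vocabularies can be sketched with scikit-learn; the tiny pseudo-corpus below is invented purely for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# One pseudo-document per discipline, concatenating its RDP descriptions.
corpus = {
    "biology":   "collect field samples sequence genomes deposit genbank",
    "physics":   "run detector calibrate instrument archive raw events",
    "sociology": "design survey transcribe interviews anonymise responses",
}

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(corpus.values())
terms = vectorizer.get_feature_names_out()

for discipline, row in zip(corpus, matrix.toarray()):
    top = sorted(zip(row, terms), reverse=True)[:3]
    print(discipline, [t for _, t in top])  # highest-weighted, i.e. most distinctive, terms
```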
Thesis
With the new information requirements in the field of research, derived from the ERA, e-Science, e-Research and Open Science, the role of academic libraries has needed to evolve in order to match the quality level that the research community requires. Thus, research data management is meant to be a general core, because data have become the reference point that allows research to prove its results and to be replicated, disseminated and spread. This makes appropriate management the goal to be achieved, in particular for academic libraries, due to their trajectory and transversality. This Master's Thesis brings together and lists the main characteristics that frame the scope of the librarian's role, the consulting and instructional service to be provided in order to offer a quality management service, the audience to whom it should be directed, and the allied institutional infrastructure, in order to show that this professional development is not a utopia.
Article
Full-text available
This study aims to analyze the needs of researchers in a regional comprehensive university for research data management services; discuss the options for developing a research data management program at the university; and then propose a phased three-year implementation plan for the university libraries. The method was to design a survey to collect information from researchers and assess and evaluate their needs for research data management services. The results show that researchers’ needs in a regional comprehensive university could be quite different from those of researchers in a research-intensive university. Also, the results verify the hypothesis that researchers in the regional comprehensive university would welcome the libraries offering managed data services for the research community. Therefore, this study suggests a phased three-year implementation plan. The significance of the study is that it can give some insights and helpful information for regional comprehensive universities that are planning to develop a research data management program.
Article
This article reports results from a survey about data management practices and attitudes sent to agriculture researchers and extension personnel at the University of Tennessee Institute of Agriculture (UTIA) and the College of Agricultural Sciences and Warner College of Natural Resources at Colorado State University. Results confirm agriculture researchers, like many other scientists, continue to exhibit data management practices that fall short of generally accepted best practices. In addition, librarians, and others seeking to influence future behavior, may be informed by our finding of a relationship between the land-grant mission and researchers' data management practices.
Article
Full-text available
In the aftermath of revelations made by ex-NSA employee Edward Snowden about violation of privacy of individuals by states in the name of surveillance, right to privacy became one of the highly debated rights. There is no doubt that the state must secure privacy of its citizens, but it also has a responsibility towards safety of the citizens. There exist different views related to privacy and surveillance. One view is that the state has no right to look into the private affairs of an individual while the other view is that there is no harm in putting someone suspicious under the surveillance as it is the duty of the State to prevent any untoward act in the society. Considering the contrasting views about privacy and surveillance, this article explores the position existing in the United Kingdom and aims to answer several questions pertaining to the Privacy v. Surveillance debate.
Article
Full-text available
The growth of published science in recent years has escalated the difficulty that human and algorithmic agents face in reasoning over prior knowledge to select the next experiment. This challenge is increased by uncertainty about the reproducibility of published findings. The availability of massive digital archives, machine reading, extraction tools and automated high-throughput experiments allows us to evaluate these challenges computationally at scale and identify novel opportunities to craft policies that accelerate scientific progress. Here we demonstrate a Bayesian calculus that enables positive prediction of robust scientific claims with findings extracted from published literature, weighted by scientific, social and institutional factors demonstrated to increase replicability. Illustrated with the case of gene regulatory interactions, our approach automatically estimates and counteracts sources of bias, revealing that scientifically focused but socially and institutionally diverse research activity is most likely to replicate. This results in updated certainty about the literature, which accurately predicts robust scientific facts on which new experiments should build. Our findings allow us to identify and evaluate policy recommendations for scientific institutions that may increase robust scientific knowledge, including sponsorship of increased diversity of and independence between investigations of any particular scientific phenomenon, and diversity of scientific phenomena investigated.
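The "Bayesian calculus" is described here only at a high level, so the sketch below shows a generic log-odds update weighted by replication-related factors; the feature names and weights are invented placeholders, not the paper's fitted model.

```python
import math

def replication_probability(prior_prob, features, weights):
    """Generic log-odds update: start from a prior probability that a claim
    replicates and add weighted evidence terms. Features and weights here
    are illustrative placeholders."""
    log_odds = math.log(prior_prob / (1 - prior_prob))
    for name, value in features.items():
        log_odds += weights.get(name, 0.0) * value
    return 1 / (1 + math.exp(-log_odds))

features = {"independent_labs": 3, "methodological_diversity": 1, "single_institution": 0}
weights = {"independent_labs": 0.4, "methodological_diversity": 0.3, "single_institution": -0.5}
print(round(replication_probability(0.5, features, weights), 3))
```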
Chapter
In this chapter, we will introduce several key points of this new discipline with a particular focus on human-inspired cognitive systems. We will provide several examples of well-known developed robots, to finally reach a detailed description of a special case study: F.A.C.E., Facial Automaton for Conveying Emotions, which is a highly expressive humanoid robot with a bio-inspired cognitive system. At the end of the chapter, we will briefly discuss the future perspective about this branch of science and its potential merging with the IoT, giving our vision of what could happen in a not-too-distant future.
Chapter
While the publication of datasets in scientific repositories has become broadly recognised, the repositories tend to have increasing semantic-related problems. For instance, they present various data reuse obstacles for machine-actionable processes, especially in biological repositories, hampering the reproducibility of scientific experiments. An example of these shortcomings is the GenBank database. We propose GAP, an innovative data model to enhance the semantic data meaning to address these issues. The model focuses on converging related approaches like data provenance, semantic interoperability, FAIR principles, and nanopublications. Our experiments include a prototype to scrape genomic data and trace them to nanopublications as a proof of concept. For this, (meta)data are stored in a three-level nanopub data model. The first level is related to a target organism, specifying data in terms of biological taxonomy. The second level focuses on the biological strains of the target, the central part of our contribution. The strains express information related to deciphered (meta)data of the genetic variations of the genomic material. The third level stores related scientific papers (meta)data. We expect it will offer higher data storage flexibility and more extensive interoperability with other data sources by incorporating and adopting associated approaches to store genomic data in the proposed model. Keywords: Nanopublication, FAIR principles, Data provenance, Genomic data, Reusability, Interoperability
Article
This paper presents findings from an interview study of research data managers in academic data archives. Our study examined policies and professional autonomy with a focus on dilemmas encountered in everyday work by data managers. We found that dilemmas arose at every stage of the research data lifecycle, and legacy data presents particularly vexing challenges. The iFields' emphasis on knowledge organization and representation provides insight into how data, used by scientists, are used to create knowledge. The iFields' disciplinary emphasis also encompasses the sociotechnical complexity of dilemmas that we found arise in research data management. Therefore, we posit that iSchools are positioned to contribute to data science education by teaching about ethics and infrastructure used to collect, organize, and disseminate data through problem‐based learning.
Article
Full-text available
The ever-increasing amount of data generated from experiments and simulations in engineering sciences is relying more and more on data science applications to generate new knowledge. Comprehensive metadata descriptions and a suitable research data infrastructure are essential prerequisites for these tasks. Experimental tribology, in particular, presents some unique challenges in this regard due to the interdisciplinary nature of the field and the lack of existing standards. In this work, we demonstrate the versatility of the open source research data infrastructure Kadi4Mat by managing and producing FAIR tribological data. As a showcase example, a tribological experiment is conducted by an experimental group with a focus on comprehensiveness. The result is a FAIR data package containing all produced data as well as machine- and user-readable metadata. The close collaboration between tribologists and software developers shows a practical bottom-up approach and how such infrastructures are an essential part of our FAIR digital future.
Article
Full-text available
ABSTRACT Objective: In a manner similar to the "information explosion", the Big Data phenomenon has increasingly become an object of study for Information Science and Knowledge Organization (CI/OC). How can we discover, access, process and reuse the enormous and growing amount of data continuously made available on the Web by our society? In particular, how should we treat so-called "unstructured data", the textual documents that have always been the object of CI/OC? Methodology: Broad-spectrum theories such as Ontology and Semiotics were used to analyse data as the essential element of Big Data, in particular "unstructured data". Results: From the analysis of several definitions of data, a datum is identified as part of already known logical and semiotic schemes: propositions. A datum is found together with others, forming datasets. Datasets are in fact sets of propositions. These are present in what is known as structured data (tables of relational databases or spreadsheets). Textual documents also contain sets of propositions. Structured data are compared with "unstructured data". Conclusions: Although, in the limit, both contain propositions and may be equivalent as sets, structured data are expressed and perceived as a whole, whereas unstructured datasets are processual and expressed sequentially, which makes it more difficult to identify unstructured data in textual documents for machine processing.
Article
Full-text available
Objective. This study aims to analyze the scientific production on research data management indexed in the Dimensions database. Design/Methodology/Approach. Using the term “research data management” in the Dimensions database, 677 articles were retrieved and analyzed employing bibliometric and altmetric indicators. The Altmetrics.com system was used to collect data from alternative virtual sources to measure the online attention received by the retrieved articles. Bibliometric networks from journals bibliographic coupling and keywords co-occurrence were generated using the VOSviewer software. Results/Discussion. Growth in scientific production over the period 1970-2021 was observed. The countries/regions with the highest rates of publications were the USA, Germany, and the United Kingdom. Among the most productive authors were Andrew Martin Cox, Stephen Pinfield, Marta Teperek, Mary Anne Kennan, and Amanda L. Whitmire. The most productive journals were the International Journal of Digital Curation, Journal of eScience Librarianship, and Data Science Journal, while the most representative research areas were Information and Computing Sciences, Information Systems, and Library and Information Studies. Conclusions. The multidisciplinarity in research data management was demonstrated by publications occurring in different fields of research, such as Information and Computing Sciences, Information Systems, Library and Information Studies, Medical and Health Sciences, and History and Archeology. About 60% of the publications had at least one citation, with a total of 3,598 citations found, featuring a growing academic impact. Originality/Value. This bibliometric and altmetric study allowed the analysis of the literature on research data management. The theme was investigated in the Dimensions database and analyzed using productivity, impact, and online attention indicators.
Article
Full-text available
In the design and development of novel materials that have excellent mechanical properties, classification and regression methods have been diversely used across mechanical deformation simulations or experiments. The use of materials informatics methods on large data that originate in experiments or/and multiscale modeling simulations may accelerate materials’ discovery or develop new understanding of materials’ behavior. In this fast-growing field, we focus on reviewing advances at the intersection of data science with mechanical deformation simulations and experiments, with a particular focus on studies of metals and alloys. We discuss examples of applications, as well as identify challenges and prospects.
Article
Full-text available
In December 1999, IBM announced the start of a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding. The project has two main goals: to advance our understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. This project should enable biomolecular simulations that are orders of magnitude larger than current technology permits. Major areas of investigation include: how to most effectively utilize this novel platform to meet our scientific goals, how to make such massively parallel machines more usable, and how to achieve performance targets, with reasonable cost, through novel machine architectures. This paper provides an overview of the Blue Gene project at IBM Research. It includes some of the plans that have been made, the intended goals, and the anticipated challenges regarding the scientific work, the software application, and the hardware design.
Conference Paper
Full-text available
The usefulness of the many on-line journals and scientific digital libraries that exist today is limited by the lack of a service that can federate them through a unified interface. The Open Archives Initiative (OAI) is one major effort to address technical interoperability among distributed archives. The objective of OAI is to develop a framework to facilitate the discovery of content in distributed archives. In this paper, we describe our experience and lessons learned in building Arc, the first federated searching service based on the OAI protocol. Arc harvests metadata from several OAI-compliant archives, normalizes them, and stores them in a search service based on a relational database (MySQL or Oracle). At present we have over 165K metadata records from 16 data providers from various domains.
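The harvesting step that Arc performs follows the OAI protocol for metadata harvesting, in which a service provider repeatedly issues ListRecords requests and follows resumption tokens until the repository is exhausted. The sketch below shows that loop in outline; the repository base URL is a placeholder, and the normalization and database loading that Arc performs (against MySQL or Oracle) are omitted.

import requests
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
BASE_URL = "https://example.org/oai"  # placeholder repository endpoint

def harvest(base_url):
    """Yield Dublin Core records via the OAI-PMH ListRecords verb."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        root = ET.fromstring(requests.get(base_url, params=params, timeout=30).content)
        for record in root.iter(f"{OAI_NS}record"):
            yield record
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break
        # Continue the harvest from where the repository left off.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

for i, rec in enumerate(harvest(BASE_URL)):
    print(i, rec.findtext(f".//{OAI_NS}identifier"))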
Conference Paper
Full-text available
Data Grids are becoming increasingly important in scientific communities for sharing large data collections and for archiving and disseminating them in a digital library framework. The Storage Resource Broker (SRB) provides transparent virtualized middleware for sharing data across distributed, heterogeneous data resources separated by different administrative and security domains. The MySRB is a Web-based interface to the SRB that provides a user-friendly interface to distributed collections brokered by the SRB. In this paper we briefly describe the use of the SRB infrastructure as tools in the data grid architecture for building distributed data collections, digital libraries, and persistent archives. We also provide details about the MySRB and its functionalities.
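To make the idea of "transparent virtualized middleware" concrete, the toy sketch below keeps a logical collection namespace that maps logical paths to physical replicas on different storage resources, so a client asks for a logical name and never needs to know where the bytes actually live. This is only an illustration of the brokering concept, not the SRB or MySRB API, and every name in it is invented.

from dataclasses import dataclass, field

@dataclass
class Replica:
    resource: str       # e.g. a tape silo, a disk cache, a remote archive
    physical_path: str

@dataclass
class LogicalCollection:
    """Toy broker: logical names resolved to whichever replica suits the request."""
    replicas: dict = field(default_factory=dict)  # logical path -> [Replica, ...]

    def register(self, logical_path, replica):
        self.replicas.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path, preferred_resource=None):
        for r in self.replicas.get(logical_path, []):
            if preferred_resource is None or r.resource == preferred_resource:
                return r
        raise KeyError(f"no replica registered for {logical_path}")

broker = LogicalCollection()
broker.register("/survey/run42/image.fits", Replica("tape-archive", "/silo3/0001/ab42"))
broker.register("/survey/run42/image.fits", Replica("disk-cache", "/cache/im/ab42.fits"))
print(broker.resolve("/survey/run42/image.fits", preferred_resource="disk-cache"))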
Conference Paper
Grids tie together distributed storage systems and execution platforms into globally accessible resources. Data Grids provide collection management and global namespaces for organizing data objects that reside within a grid. Knowledge-based grids provide concept spaces for discovering relevant data objects. In a knowledge-based grid, data objects are discovered by mapping from scientific domain concepts, to the attributes used within a data grid collection, to the digital objects residing in an archival storage system. These concepts will be explored in the context of large-scale storage in the web, and illustrated based on infrastructure under development at the San Diego Supercomputer Center.
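The discovery path described here (domain concept, to collection attributes, to stored objects) can be pictured with a small dictionary-based sketch. The concept names, attribute values and object handles below are all invented; the point is only the chain of mappings that a knowledge-based grid would automate.

# Invented concept space: domain concept -> metadata attribute constraints.
concept_space = {
    "supernova lightcurve": {"instrument": "wide-field-camera", "product": "lightcurve"},
}

# Invented data grid catalogue: attribute sets attached to archived object handles.
catalogue = [
    ({"instrument": "wide-field-camera", "product": "lightcurve"}, "archive://objects/000187"),
    ({"instrument": "spectrograph", "product": "spectrum"}, "archive://objects/000342"),
]

def discover(concept):
    """Map a scientific concept to matching objects in archival storage."""
    wanted = concept_space[concept]
    return [handle for attrs, handle in catalogue
            if all(attrs.get(k) == v for k, v in wanted.items())]

print(discover("supernova lightcurve"))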
Article
If digital documents and their programs are to be saved, their migration must not modify their bit streams, because programs and their files can be corrupted by the slightest change. If such changes are unavoidable, they must be reversible without loss. Moreover, one must record enough detail about each transformation to allow reconstruction of the original encoding of the bit stream. Although bit streams can be designed to be immune to any expected change, future migration may introduce unexpected alterations. Similarly, encryption makes it impossible to recover an original bit stream without the decryption key.
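One way to honour the "reversible without loss" requirement is to record, alongside each migrated file, both a fixity value for the original bit stream and enough detail about the transformation to invert it. The sketch below does this for the simplest possible "migration" (gzip compression), chosen only because it is trivially reversible; the record format is invented for illustration.

import gzip
import hashlib
import json

def migrate(original: bytes):
    """'Migrate' a bit stream reversibly and record how to recover the original."""
    migrated = gzip.compress(original)
    record = {
        "transformation": "gzip",            # enough detail to reverse the change
        "original_sha256": hashlib.sha256(original).hexdigest(),
        "migrated_sha256": hashlib.sha256(migrated).hexdigest(),
    }
    return migrated, record

def restore(migrated: bytes, record: dict) -> bytes:
    assert record["transformation"] == "gzip"
    original = gzip.decompress(migrated)
    # Verify that reversal reproduced the original encoding bit for bit.
    assert hashlib.sha256(original).hexdigest() == record["original_sha256"]
    return original

data = b"a digital document and the program that reads it"
blob, rec = migrate(data)
print(json.dumps(rec, indent=2))
assert restore(blob, rec) == data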
Conference Paper
Implicit in the evolution of current technology and high-end systems is the anticipated arrival of computers capable of a peak performance of 1 Petaflops by the year 2010. This is consistent both with the semiconductor industry's roadmap of basic device technology development and with an extrapolation of the TOP-500 list of the world's fastest computers according to the Linpack benchmark. But if contemporary experience with today's largest systems holds true for their descendants at the end of the decade, then they will be very expensive (> $100M), consume too much power (> 3 MW), take up too much floor space (> 10,000 square feet), deliver very low efficiency (< 10%), and be too difficult to program. Also important is the likely degradation of reliability due to the multiplicative factors of component MTBF and scale. Even if these systems do manage to drag the community to the edge of Petaflops, there is no basis for confidence that they will provide the foundation for systems across the next decade that will transition across the trans-Petaflops performance regime. It has become increasingly clear that an alternative model of system architecture may be required for future generation high-end computers.
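Spelling out the arithmetic implied by the figures quoted above (and using only those figures) makes the concern explicit: a 1 Petaflops peak machine at under 10% efficiency sustains under 100 Teraflops, while a 3 MW draw corresponds to roughly 26 GWh of energy per year.

\[
P_{\text{sustained}} = \eta \, P_{\text{peak}} < 0.10 \times 10^{15}\ \text{flop/s} = 10^{14}\ \text{flop/s} \ (100\ \text{Tflops}),
\]
\[
E_{\text{year}} = 3\ \text{MW} \times 8760\ \text{h} \approx 2.6 \times 10^{4}\ \text{MWh} \approx 26\ \text{GWh per year}.
\]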
Conference Paper
The preservation of digital data for the long term presents a variety of challenges, from technical to social and organizational. The technical challenge is to ensure that the information generated today can survive long-term changes in storage media, devices and data formats. This paper presents a novel approach to the problem. It distinguishes between archiving of data files and archiving of programs (so that their behavior may be reenacted in the future). For the archiving of a data file, the proposal consists of specifying the processing that needs to be performed on the data (as physically stored) in order to return the information to a future client (according to a logical view of the data). The process specification and the logical view definition are archived with the data. For the archiving of a program behavior, the proposal consists of saving the original executable object code together with the specification of the processing that needs to be performed for each machine instruction of the original computer (emulation). In both cases, the processing specification is based on a Universal Virtual Computer that is general, yet basic enough to remain relevant in the future.
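The emulation half of this proposal (archive the object code plus a specification of each machine instruction, expressed for a Universal Virtual Computer) can be pictured with a toy interpreter. The three-instruction machine below is invented purely for illustration and is far simpler than the actual UVC; the point is that a future implementation only needs to re-implement this small interpreter loop to re-enact the archived program.

def run(program, memory):
    """Toy 'virtual computer': re-enacts an archived program's behaviour.

    Each instruction is a tuple; the archived specification would define
    exactly what every opcode does, so only this loop must be reimplemented
    on future hardware. The opcodes here are invented for illustration.
    """
    pc, acc = 0, 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "LOAD":      # acc <- memory[arg]
            acc = memory[arg]
        elif op == "ADD":     # acc <- acc + memory[arg]
            acc += memory[arg]
        elif op == "STORE":   # memory[arg] <- acc
            memory[arg] = acc
        pc += 1
    return memory

# An 'archived' program: memory[2] = memory[0] + memory[1].
archived_program = [("LOAD", 0), ("ADD", 1), ("STORE", 2)]
print(run(archived_program, [40, 2, 0]))   # -> [40, 2, 42]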
Article
The Open Archives Initiative (OAI) develops and promotes interoperability solutions that aim to facilitate the efficient dissemination of content. The roots of the OAI lie in the E-Print community. Over the last year its focus has been extended to include all content providers. This paper describes the recent history of the OAI -- its origins in promoting E-Prints, the broadening of its focus, the details of its technical standard for metadata harvesting, the applications of this standard, and future plans.