
Jane Greenberg- Drexel University
About
193 Publications
20,945 Reads
2,477 Citations
Publications (193)
The materials science community seeks to support the FAIR principles for computational simulation research. The MatCore Project was recently launched to address this need, with the goal of developing an overall metadata framework and accompanying guidelines. This paper reports on the MatCore goals and overall progress. Historical background context...
Biologists study diatoms, a fundamental group of algae, to assess the health of aquatic systems. Diatom specimens have traditionally been preserved on analog slides, where a single slide can contain thousands of these microscopic organisms. Digitization of these collections presents both metadata challenges and opportunities. This paper reports on metadata...
HIVE4MAT is a linked data interactive application for navigating ontologies of value to materials science. HIVE enables automatic indexing of textual resources with standardized terminology. This article presents the motivation underlying HIVE4MAT, explains the system architecture, reports on two evaluations, and discusses future plans.
This paper reports on a scientometric analysis bolstered by human in the loop, domain experts, to examine the field of metal organic frameworks (MOFs) research. Scientometric analyses reveal the intellectual landscape of a field. The study engaged MOF scientists in the design and review of our research workflow. MOF materials are an essential compo...
Purpose: This paper reports on a scientometric analysis bolstered by human-in-the-loop, domain experts, to examine the field of metal-organic frameworks (MOFs) research. Scientometric analyses reveal the intellectual landscape of a field. The study engaged MOF scientists in the design and review of our research workflow. MOF materials are an essenti...
Image‐based machine learning tools are an ascendant ‘big data’ research avenue. Citizen science platforms, like iNaturalist, and museum‐led initiatives provide researchers with an abundance of data and knowledge to extract. These include extraction of metadata, species identification, and phenomic data. Ecological and evolutionary biologists are in...
This book explores the latest advances in how knowledge organization can both draw on and inform different disciplines and technological developments. It examines how best to combine theory and practice. The content considers practical solutions as well as the theory behind the design, development and implementation of knowledge organization system...
Flexible metadata pipelines are crucial for supporting the FAIR data principles. Despite this need, researchers seldom report their approaches for identifying metadata standards and protocols that support optimal flexibility. This paper reports on an initiative targeting the development of a flexible metadata pipeline for a collection containing ov...
Researchers across nearly every discipline seek to leverage ontologies for knowledge discovery and computational tasks; yet, the number of machine readable materials science ontologies is limited. The work presented in this paper explores the Processing, Structure, Properties and Performance (PSPP) framework for accelerating the development of mate...
This paper reports on a demonstration of YAMZ (Yet Another Metadata Zoo) as a mechanism for building community consensus around metadata terms. The demonstration is motivated by the complexity of the metadata standards environment and the need for more user-friendly approaches for researchers to achieve vocabulary consensus. The paper reviews a ser...
Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens that can be found online. Unfortunately, the associated metadata is often sparse and, at times, erroneous. This paper extends previous research conducted with the Illinois Natural History Survey (INHS) colle...
This panel will explore provenance of the past, present, and future in information science and technology, as both a concept and professional and intellectual value. The OED defines “provenance” as origin, source, ownership of an artwork, or guidance to determine authenticity. Provenance today is not limited to history domains. It can be used to de...
Reproducibility of research is essential for science. However, in the way modern computational biology research is done, it is easy to lose track of small, but extremely critical, details. Key details, such as the specific version of a software used or iteration of a genome can easily be lost in the shuffle or perhaps not noted at all. Much work is...
This research explores temporal concept drift and temporal alignment in knowledge organization systems (KOS). A comparative analysis is pursued using the 1910 Library of Congress Subject Headings, 2020 FAST Topical, and automatic indexing. The use case involves a sample of 90 nineteenth-century Encyclopedia Britannica entries. The entries were inde...
Measuring the distance between ontological elements is a fundamental component for any matching solutions. String-based distance metrics relying on discrete symbol operations are notorious for shallow syntactic matching. In this study, we explore Wasserstein distance metric across ontology concept embeddings. Wasserstein distance metric targets con...
Metal-Organic Frameworks (MOFs) are a class of modular, porous crystalline materials that have great potential to revolutionize applications such as gas storage, molecular separations, chemical sensing, catalysis, and drug delivery. The Cambridge Structural Database (CSD) reports 10,636 synthesized MOF crystals which in addition contains ca. 114,37...
Reproducibility of research is essential for science. However, in the way modern computational biology research is done, it is easy to lose track of small, but extremely critical, details. Key details, such as the specific version of a software used or iteration of a genome can easily be lost in the shuffle, or perhaps not noted at all. Much work i...
Metadata is a key data source for researchers seeking to apply machine learning (ML) to the vast collections of digitized biological specimens that can be found online. Unfortunately, the available metadata is often sparse and, at times, erroneous. This paper extends previous research with the Illinois Natural History Survey (INHS) collection (7,24...
FAIR metadata is critical to supporting FAIR data overall. Transparency, community engagement, and flexibility are key aspects of FAIR that apply to metadata. This paper presents YAMZ (Yet Another Metadata Zoo), a community-driven vocabulary application that supports FAIR. The history of YAMZ and its original features are reviewed, followed by a pr...
Reproducibility of research is critical for science. Computational biology research presents a significant challenge, given the need to track critical details, such as software version or genome draft iteration. Metadata research infrastructures, while greatly improved, often assume a level of programming skills in their user community, or rely on...
Species classification is an important task which is the foundation of industrial, commercial, ecological and scientific applications involving the study of species distributions, dynamics and evolution. While conventional approaches for this task use off‐the‐shelf machine learning (ML) methods such as existing Convolutional Neural Network (ConvNet...
In the preface to his Cyclopaedia published in 1728 Ephraim Chambers offers readers a systematized structure of his attempt to produce a universal repository of human knowledge. Divided into an interconnected taxonomic tree and domain vocabulary, this structure forms the basis of one effort from the Metadata Research Center to study historical onto...
The advantages of data sharing across organizations and disciplines are indisputable; although, sensitive and restricted data cannot be easily shared due to policies and legal matters. The research presented in this paper takes a step toward systematizing the sharing of sensitive and restricted research data by developing an ontology to frame and g...
Knowledge Organization Systems (KOS) as networks of knowledge have the potential to inform AI operations. This paper explores natural language processing and machine learning in the context of KOS and Helping Interdisciplinary Vocabulary Engineering (HIVE) technology. The paper presents three use cases: HIVE and Historical Knowledge Networks, HIVE...
FAIR metadata is critical to supporting FAIR data overall. Transparency, community engagement, and flexibility are key aspects of FAIR that apply to metadata. This paper presents YAMZ (Yet Another Metadata Zoo), a community-driven vocabulary application that supports FAIR. The history of YAMZ and its original features are reviewed, followed by a pre...
Purpose: The output of academic literature has increased significantly due to digital technology, presenting researchers with a challenge across every discipline, including materials science, as it is impossible to manually read and extract knowledge from millions of publications. The purpose of this study is to address this challenge by exp...
Preservation pipelines demonstrate extended value when digitized content is also computation-ready. Expanding this to historical controlled vocabularies published in analog format requires additional steps if they are to be fully leveraged for research. This paper reports on work addressing this challenge. We report on a pipeline and project progre...
Metadata are key descriptors of research data, particularly for researchers seeking to apply machine learning (ML) to the vast collections of digitized specimens. Unfortunately, the available metadata is often sparse and, at times, erroneous. Additionally, it is prohibitively expensive to address these limitations through traditional, manual means....
Sentence embedding methods offer a powerful approach for working with short textual constructs or sequences of words. By representing sentences as dense numerical vectors, many natural language processing (NLP) applications have improved their performance. However, relatively little is understood about the latent structure of sentence embeddings. S...
Processes and practices—and in general, informational doings and their diverse constellations—are pertinent elements of the information landscape. This panel presents research on documentation and description of processes and practices in the information field addressing: 1) how different conceptualisations of processes and practices influence how...
Scientific literature is one of the most significant resources for sharing knowledge. Researchers turn to scientific literature as a first step in designing an experiment. Given the extensive and growing volume of literature, the common approach of reading and manually extracting knowledge is too time consuming, creating a bottleneck in the researc...
Preservation pipelines demonstrate extended value when digitized content is also computation ready. Expanding this to historical controlled vocabularies published in analog format requires additional steps if they are to be fully leveraged for research. This paper reports on work addressing this challenge. We report on a pipeline and project progre...
Metadata is a key data source, particularly for researchers seeking to apply machine learning (ML) to the vast collections of digitized specimens. Unfortunately, the available metadata is often sparse and, at times, erroneous. Additionally, it is prohibitively expensive to address these limitations through traditional, manual means. This paper repo...
Reproducible computational research (RCR) is the keystone of the scientific method for in silico analyses, packaging the transformation of raw data to published results. In addition to its role in research integrity, improving the reproducibility of scientific studies can accelerate evaluation and reuse. This potential and wide support for the FAIR...
The field of LIS continues to face a vexing paradox. Its longstanding ideal of and concomitant commitment to serving diverse communities and users equally has failed to translate into diversity, equity, and inclusion (DEI) in the profession or in LIS education. This article analyzes efforts to promote diversity, equity, and inclusion in North Ameri...
Introduction: Secondary use of electronic health record (EHR) data for research requires that the data are fit for use. Data quality (DQ) frameworks have traditionally focused on structural conformance and completeness of clinical data extracted from source systems. In this paper, we propose a framework for evaluating semantic DQ that will allow res...
Biodiversity image repositories are crucial sources for training machine learning approaches to support biological research. Metadata about object (e.g. image) quality is a putatively important prerequisite to selecting samples for these experiments. This paper reports on a study demonstrating the importance of image quality metadata for a species...
This paper introduces Helping Interdisciplinary Vocabulary Engineering for Materials Science (HIVE-4-MAT), an automatic linked data ontology application. The paper provides contextual background for materials science, shared ontology infrastructures, and knowledge extraction applications. HIVE-4-MAT’s three key features are reviewed: 1) Vocabulary...
Biodiversity image repositories are crucial sources of training data for machine learning approaches to biological research. Metadata, specifically metadata about object quality, is putatively an important prerequisite to selecting sample subsets for these experiments. This study demonstrates the importance of image quality metadata to a species cl...
Introduces HIVE-4-MAT - Helping Interdisciplinary Vocabulary Engineering for Materials Science, an automatic linked data ontology application. Covers contextual background for materials science, shared ontology infrastructures, and reviews the knowledge extraction and indexing process. HIVE-4-MAT's vocabulary browsing, term search and selection, an...
Species classification is an important task that is the foundation of industrial, commercial, ecological, and scientific applications involving the study of species distributions, dynamics, and evolution. While conventional approaches for this task use off-the-shelf machine learning (ML) methods such as existing Convolutional Neural Network (ConvNe...
This paper presents a use case exploring the application of the Archival Resource Key (ARK) persistent identifier for promoting and maintaining ontologies. In particular, we look at improving computation with an in-house ontology server in the context of temporally aligned vocabularies. This effort demonstrates the utility of ARKs in preparing hist...
Purpose: Given the ubiquitous presence of the internet in our lives, many individuals turn to the web for medical information. A challenge here is that many laypersons (as “consumers”) do not use professional terms found in the medical nomenclature when describing their conditions and searching the internet. The Consumer Health Vocabulary (CHV) onto...
Reproducible computational research (RCR) is the keystone of the scientific method for in silico analyses, packaging the transformation of raw data to published results. In addition to its role in research integrity, RCR has the capacity to significantly accelerate evaluation and reuse. This potential and wide support for the FAIR principles have m...
Representing aboutness is a challenge for humanities documents, given the linguistic indeterminacy of the text. The challenge is even greater when applying automatic indexing to historical documents for a multidisciplinary collection, such as encyclopedias. The research presented in this paper explores this challenge with an automatic indexing comp...
Over the last twenty years, a wide variety of resources have been developed to address the rights and licensing problems inherent with contemporary data sharing practices. The landscape of developments in this area is increasingly confusing and difficult to navigate, due to the complexity of intellectual property and ethics issues associated with s...
The data paper, an emerging scholarly genre, describes research data sets and is intended to bridge the gap between the publication of research data and scientific articles. Research examining how data papers report data events, such as data transactions and manipulations, is limited. The research reported on in this article addresses this limitati...
The data paper, an emerging scholarly genre, describes research datasets and is intended to bridge the gap between the publication of research data and scientific articles. Research examining how data papers report data events, such as data transactions and manipulations, is limited. The research reported on in this paper addresses this limitation...
Introduction: Medical regulatory authorities (MRAs) depend upon accurate and up to date information about health professionals to ensure that practitioners are qualified to safely practice their profession. While there appears to be consensus in the international community of MRAs regarding the types of information required to make licensure/regist...
Our rapidly growing, data-driven culture is motivating curriculum change in nearly every discipline, not the least of which is information science. This article explores this change specifically within the iSchool community, in which information science is a major unifying discipline. A cross-institutional analysis of data-related curricula was con...
As the demand for data science and data‐intensive capabilities grows in all sectors, educators in schools of information and library and information science are working to deepen and expand their programs to meet workforce expectations. This panel will examine current trends and investments in data education and professionalization, with an emphasi...
Agencies responsible for regulating health professionals require detailed data describing individual practitioners and their qualifications. Initial collection of these data allows agencies to determine whether health professionals are qualified to practise safely. Ongoing collection of practitioner data is also critical for ensuring an adequate su...
Legal and policy-oriented restrictions often hamper if not inhibit well-intended efforts to share sensitive or restricted data. The research reported on in this paper is a part of a larger initiative to develop a prototype system for automatically generating data sharing agreements that address privacy, legal concerns, and other restrictions. A con...
International migration of health professionals has been increasing in our globalized world, compounding a pressing need to improve information systems that confirm their qualifications and track health workforce volume. This paper reports on research to help address this need by introducing a framework for defining health professionals as agents....
The panel aims to update ASIS&T members and information professionals about important standards, specifications and best practice guidelines that have been developed or initiated by the international standards institutions and communities in dealing with open data. The openness and flexibility of the web have created both new opportunities and new...
The purpose of the paper is to provide a framework for addressing the disconnect between metadata and data science. Data science cannot progress without metadata research. This paper takes steps toward advancing the synergy between metadata and data science, and identifies pathways for developing a more cohesive metadata research agenda in data sci...
Scientific software is as important to scientific studies as raw data. Yet, attention to this genre of research data is limited in studies on data reuse, citation, and metadata standards. This paper presents results from an exploratory study that examined how scientific software's reuse information is presented in the current citation practice and...
This book constitutes the refereed proceedings of the 10th Metadata and Semantics Research Conference, MTSR 2016, held in Göttingen, Germany, in November 2016. The 26 full papers and 6 short papers presented were carefully reviewed and selected from 67 submissions. The papers are organized in several sessions and tracks: Digital Libraries, Informat...
This paper reports on research exploring a threshold for engaging scientists in semantic ontology development. The domain application, nanocrystalline metals, was pursued using a multi-method approach involving algorithm comparison, semantic concept/term evaluation, and term sorting. Algorithms from four open source term extraction applications (RA...
Controlled vocabularies have great applicability for organizing and providing access to scientific data. This paper presents research examining the controlled vocabulary use and desired application features specific to scientific data. A survey was conducted, gathering data from U.S. DataNet participants and other data stakeholder communities. Resu...
This paper presents research examining metadata capital in the context of the Viral Vector Core Laboratory at the National Institute of Environmental Health Sciences (NIEHS). Methods include collaborative workflow modeling and a metadata analysis. Models of the laboratory’s workflow and metadata activity are generated to identify potential opportun...
Scientists produce vast amounts of data that often are not preserved properly or do not have inventories, placing them at risk. As part of an effort to more fully understand the data-at-risk predicament, researchers who were engaged in the DARI project at UNC's Metadata Research Center surveyed information custodians working in a range of settings....
Metadata is crucial for understanding data, and can be viewed as a form of capital in the context of Big data. This paper reports on research simulating the potential of SGHI (Self-Generated Health Information) for predicting asthma episodes. A data set of 2,000 cases was generated using the Monte Carlo simulation method, with secondary modificatio...
Data preservation has gained momentum and visibility in connection with the growth in digital data and data sharing policies. The Dryad Repository, a curated general-purpose repository for preserving and sharing the data underlying scientific publications, has taken steps to develop a preservation policy to ensure the long-term persistence of this...
The Research Data Alliance (RDA) Metadata Standards Directory Working Group (MSDWG) is building a directory of descriptive, discipline-specific metadata standards. The purpose of the directory is to promote the discovery, access and use of such standards, thereby improving the state of research data interoperability and reducing duplicative standar...
Purpose: The purpose of this paper is to examine the effect of the Helping Interdisciplinary Vocabulary Engineering (HIVE) system on the inter-indexer consistency of information professionals when assigning keywords to a scientific abstract. This study examined first, the inter-indexer consistency of potential HIVE users; second, the impact HIVE h...
Editor's summary: While the value of information is widely recognized, the next step is recognizing metadata as an economic asset. Generating metadata involves costs in technological and human resources, but failure to generate and use metadata can lead to lost opportunity costs. Metadata activities are ultimately motivated by a drive for return on...
This paper explores metadata capital via linked open metadata vocabularies, specifically via the HIVE (Helping Interdisciplinary Vocabulary Engineering) initiative in the U.S. DataNet Federation Consortium (DFC). Formulas representing 'Capital-sigma notation' and 'Successive growth rates' are introduced as potential means for quantifying metadata c...
Computer models are widely used in hydrology and water resources management. A large variety of models exist, each tailored to address specific challenges related to hydrologic science and water resources management. When scientists and engineers apply one of these models to address a specific question, they must devote significant effort to set up...
Future predictions generally resolve some place between a desired outcome and a predetermined path set by fixed circumstances. This essay explores the future of metadata, recognizing the impossibility of creating a precise road map comingled with the fact that researchers and practitioners do, in fact, have some capacity to impact future plans. We...
The DataNet Federation Consortium (DFC) is developing data grids for multidisciplinary research. As the DFC grid grows in size and number of disciplines, it becomes critical to address metadata management and findability challenges. The HIVE project is being integrated into the iRODS in the DFC architecture to provide a scalable linked open data a...
Purpose: This editorial underscores the importance of linked data and linked open data (LD/LOD) in contemporary librarianship and information science. It aims to present the motivation for this special issue of Library High Tech (LHT), specifically the theme of linking and opening vocabularies (LOV) as a component of the LOD landscape. The editori...
Metadata disorder and unnecessary costs are increasing due to the expanding population of scientific data schemes and standards. Metadata challenges are reviewed; and SeaIce, a community driven metadata vocabulary application, is introduced as a potential solution. SeaIce functions and development challenges are presented. CAMP-4-DATA participant...