About
87
Publications
10,612
Reads
984
Citations
Introduction
John S. Erickson, Ph.D. has spent nearly two decades applying Web Science principles to solve the unique social, legal, and technical problems inherent in creating and disseminating information in networked ecosystems. John is the Deputy Director of the Web Science Research Center at the Tetherless World Constellation at Rensselaer Polytechnic Institute, managing the delivery of large scale open government data and eScience projects that advance Semantic Web best practices.
Additional affiliations
December 2009 - present
Bitwacker Associates
Position
- Head of Faculty
Description
- John uses Web Science principles to solve the unique social, legal, and technical problems inherent in creating and disseminating information in networked ecosystems.
Education
August 1992 - June 1997
August 1988 - August 1989
August 1980 - May 1984
Publications (87)
Governments around the world have been releasing raw data to their citizens at an increased pace. The mixing and linking of these datasets by a community of users enhances their value and makes new insights possible. The use of mashups — digital works in which data from one or more sources is combined and presented in innovative ways — is a great w...
In this poster, we will present the International Open Government Dataset Search (IOGDS). IOGDS is a faceted browsing interface for searching over more than one million open government datasets from around the world. We will present ongoing research and development towards the improved discovery and access to open government data. IOGDS has been de...
In December 2010, the International Open Government Dataset Search (IOGDS) team at the Tetherless World Constellation (TWC) at Rensselaer Polytechnic Institute (RPI) embarked on a project to discover, document, and analyze open data catalogs published by governments at various levels around the world. By early 2013, the IOGDS project had accumulate...
The TWC International Open Government Dataset Catalog (IOGDC) integrates a diverse selection of more than 70 government dataset catalogs from around the world. IOGDC demonstrates a practical dataset catalog metadata model for integrating diverse dataset catalogs collected from the real world and linking those catalogs into the Linked Data Cloud. IOGDC'...
As open government initiatives around the world publish an increasing number of raw datasets, citizens and communities face daunting challenges when organizing, understanding, and associating disparate data related to their interests. Immediate and incremental solutions are needed to integrate, collaboratively manipulate, and transparently consume...
We propose a survival analysis approach for discovering and characterizing user behavior and risks for lending protocols in decentralized finance (DeFi). We demonstrate how to gather and prepare DeFi transaction data for survival analysis. We illustrate our approach using transactions in Aave, one of the largest lending protocols. We develop a DeFi...
The emerging decentralized financial ecosystem (DeFi) is comprised of numerous protocols, one type being lending protocols. People make transactions in lending protocols, each of which is attributed to a specific blockchain address which could represent an externally-owned account (EOA) or a smart contract. Using Aave, one of the largest lending pr...
As concerns have grown about bias in ML models, the field of ML fairness has expanded considerably beyond classification. Researchers now propose fairness metrics for regression, but unlike classification there is no literature review of regression fairness metrics and no comprehensive resource to define, categorize, and compare them. To address th...
Introduction and aims: Dietary Rational Gene Targeting (DRGT) is a therapeutic dietary strategy that uses healthy dietary agents to modulate the expression of disease-causing genes back toward the normal. Here we use the DRGT approach to (1) identify human studies assessing gene expression after ingestion of healthy dietary agents with an emphasis o...
We propose a decentralized finance (DeFi) survival analysis approach for discovering and characterizing user behavior and risks in lending protocols. We demonstrate how to gather and prepare DeFi transaction data for survival analysis. We demonstrate our approach using transactions in AAVE, one of the largest lending protocols. We develop a DeFi su...
The way people respond to messaging from public health organizations on social media can provide insight into public perceptions on critical health issues, especially during a global crisis such as COVID-19. It could be valuable for high-impact organizations such as the US Centers for Disease Control and Prevention (CDC) or the World Health Organiz...
This paper reports on Data Analytics Research (DAR), a course-based undergraduate research experience (CURE) in which undergraduate students conduct data analysis research on open real-world problems for industry, university, and community clients. We describe how DAR, offered by the Mathematical Sciences Department at Rensselaer Polytechnic Instit...
Access to healthcare data such as electronic health records (EHR) is often restricted by laws established to protect patient privacy. These restrictions hinder the reproducibility of existing results based on private healthcare data and also limit new research. Synthetically-generated healthcare data solve this problem by preserving privacy and ena...
This study examines how social determinants associated with COVID-19 mortality change over time. Using US county-level data from July 5 and December 28, 2020, the effect of 19 high-risk factors on COVID-19 mortality rate was quantified at each time point with negative binomial mixed models. Then, these high-risk factors were used as controls in two...
In this exploratory study, we scrutinize a database of over one million tweets collected from March to July 2020 to illustrate public attitudes towards mask usage during the COVID-19 pandemic. We employ natural language processing, clustering and sentiment analysis techniques to organize tweets relating to mask-wearing into high-level themes, then...
Generating synthetic data represents an attractive solution for creating open data, enabling health research and education while preserving patient privacy. We reproduce the research outcomes obtained on two previously published studies, which used private health data, using synthetic data generated with a method that we developed, called HealthGAN...
Because radio spectrum is a finite resource, its usage and sharing are regulated by government agencies. These agencies define policies to manage spectrum allocation and assignment across multiple organizations, systems, and devices. With more portions of the radio spectrum being licensed for commercial use, the importance of providing an increased...
This study examines social determinants associated with disparities in COVID-19 mortality rates in the United States. Using county-level data, 42 negative binomial mixed models were used to evaluate the impact of social determinants on COVID-19 outcome. First, to identify proper controls, the effect of 24 high-risk factors on COVID-19 mortality rate...
In this exploratory study, we scrutinize a database of over 1 million tweets collected across the first five months of 2020 to draw conclusions about public attitudes towards the preventative measure of mask usage during the COVID-19 pandemic. In recent months, a body of literature has emerged to suggest the robustness of trends in online activity...
We develop a semantics-driven, automated approach for dynamically performing rigorous scientific studies. This framework may be applied to a wide variety of data and study types; here, we demonstrate its suitability for conducting retrospective cohort studies using publicly available population health data. The goal is to identify risk factors that...
Increased understanding of developmental disorders of the brain has shown that genetic mutations, environmental toxins and biological insults typically act during developmental windows of susceptibility. Identifying these vulnerable periods is a necessary and vital step for safeguarding women and their fetuses against disease causing agents during...
One primary task of population health analysis is the identification of risk factors that, for some subpopulation, have a significant association with some health condition. Examples include finding lifestyle factors associated with chronic diseases and finding genetic mutations associated with diseases in precision health. We develop a combined se...
We present a new "grey-box" approach to anomaly detection in smart manufacturing. The approach is designed for tools run by control systems which execute recipe steps to produce semiconductor wafers. Multiple streaming sensors capture trace data to guide the control systems and for quality control. These control systems are typically PI controllers...
A major challenge when working with open government data is managing, connecting, and understanding the links between references to entities found across multiple datasets when these datasets use different vocabularies to refer to identical entities (e.g., one dataset may refer to Microsoft as "Microsoft", another may refer to the company by its SE...
A large number of legacy datasets are contained in geoscience literature published between 1930 and 1980 and not expressed external to the publication text in digitized formats. Extracting, organizing, and reusing these “dark” datasets is highly valuable for many within the earth and planetary science community. As a part of the Deep Carbon Observa...
Reproducibility and reuse are rapidly becoming guiding principles in publishing and sharing scientific results. In order to enhance researchers' ability to leverage existing results, many are moving in the direction of semantic workflow systems, which enable users to define and share experimental procedures as linked data on the Web. These workflow...
Geoscience researchers are increasingly dependent on informatics and the Web to conduct their research. Geoscience is one of the first domains to take the lead in initiatives such as open data, open code, open access, and open collections, which comprise key topics of Open Science in academia. The meaning of being open can be understood at two levels...
By treating the end-to-end data science workflow as data itself and through the conceptual modeling of the goals and functional intent of the data analyst, the entire process of data analytics becomes open and accessible to the powerful tools of artificial intelligence, machine learning, statistics, and data mining. We examine the fundamental quest...
New NIH grants require establishing scientific rigor, i.e. applicants must provide evidence of strict application of the scientific method to ensure robust and unbiased experimental design, methodology, analysis, interpretation and reporting of results. Researchers must transparently report experimental details so others may reproduce and extend fi...
Data interoperability is an ongoing challenge for global open data initiatives. The machine-readable specification of data types for datasets will help address interoperability issues. Data types have typically been at the syntactical level such as integer, float and string, etc. in programming languages. The work presented in this paper is a model...
The widespread propagation of networked computers in Earth and space science and especially the representation of scientific data in formats that can be shared, analyzed and visualized has given rise to exciting new opportunities for exploration and discovery. Data are the representation of some facts. We can see data of various subjects, types and...
In decision support systems, such as those designed to predict scientific and technical emergence based on analysis of collections of data, the presentation of provenance lineage records in the form of a human-readable explanation has been shown to be an effective strategy for assisting users in the interpretation of results. This work focuses on the...
The Research Data Alliance (RDA) - Data Type Registry (DTR) working group output addressed a core issue relevant to data interoperability: to parse, understand, and potentially reuse data retrieved from others. DTR explores ways to enable data creators to record and make explicit the implicit assumptions of a dataset. The RDA - Persistent Identifie...
The Deep Carbon Observatory (DCO) community is building a cyber-enabled platform for linked science, made available to the community by a multi-institutional data portal. Persistent identifiers and domain specific data types have been identified as key technological issues the portal must address. This presentation focuses on the DCO portal’s plann...
Focusing on malicious attacks on the current infrastructure might be distracting us from another looming challenge: the risk to emerging infrastructure due to carelessness.
The multi-disciplinary nature of Web Science and the large size and diversity of data collected and studied by its practitioners has inspired a new type of Web resource known as the Web Observatory. Web observatories are platforms that enable researchers to collect, analyze and share data about the Web and to share tools for Web research. At the Bo...
This paper explores the impact of health information technologies, including the Web, on society and advocates for the development of a Health Web Observatory (HWO) to collect, store and analyze new sources of health information. The paper begins with a high-level literature review from across domains to demonstrate the need for a multi-disciplinar...
As the World Wide Web continues to grow and change, the need to study and understand it grows. Web Science is an effort to do just this. Due to the multidisciplinary nature of Web Science, and the wide variety of data on the web it studies and produces, Web Observatories are needed to help foster collaboration and provide archiving of work and stud...
A major challenge when working with open government data is managing, connecting, and understanding the links between references to entities found across multiple datasets when these datasets use different vocabularies to refer to identical entities, e.g., one dataset may refer to Microsoft as "Microsoft", another may refer to the company by its SEC...
First responder communities must identify technologies that are effective in performing duties ranging from law enforcement to emergency medical to fire fighting. We aimed to create tools that gather and assist in quickly understanding responders' requirements using semantic technologies and social network analysis. We describe the design and proto...
A database structure that may be used for semistructured databases assigns each node of a database to a collection. For each collection, create rights, retrieve rights, associate rights and disassociate rights are provided to one or more users, the rights being assigned in common for all nodes of the collection. Users can only carry out the task i...
One aspect of the present invention is a method of playing multi-media content through a personal computer. The personal computer includes a processor and memory, with the memory having software instructions stored therein. The processor executes the instructions to carry out the method. The method includes: receiving data representing multi-media...
International open government initiatives are releasing an increasing volume of raw government datasets directly to citizens via the Web. The transparency resulting from these releases not only creates new application opportunities but also imposes new burdens inherent to large-scale distributed data integration, collaborative data manipulation and...
Apparatus for controlling cross-organizational access by end users associated with a plurality of organizations to one or more distributed object services available via a resource server across an information technology communications network. The apparatus comprises at least one Requesting Organization (RO) having access to services via the resour...
A method of providing internet access to a data object repository comprising managing data objects hosted by said repository using a generalised repository directed graph data model having object nodes and resources, said resources comprising at least one of
(i) a literal;
(ii) actual resource data; and/or
(iii) a URI directing a request for resour...
Copyrighted electronic media are packaged in a secure electronic format, and registered on an associated registration server, which serves to provide on-line licensing and copyright management for that media. Users are connected to the server, e.g., through a computer network or the Internet, to enable data transfers and to transact licenses to utiliz...
Emphasizing communication, collaborative work, and community, the authors envision a cloud-based platform that inverts the traditional application-content relationship by placing content rather than applications at the center, enabling users to rapidly build customized solutions around their content items. The future of collaboration will focus on...
In this paper, we argue increased outsourcing of non-core competencies will create demand for cloud-based platforms to address the need for content-centered collaboration between organizations. We introduce a prototype created to evaluate the suitability of current enterprise content management (ECM) technologies for this type of platform. Followin...
Surveys of open repository implementers over recent years have clearly highlighted the "institutional" focus of institutional repositories (IRs). The stated motivations for implementing IRs uniformly emphasize the requirements of the host institution, while the benefits claimed for individual users and contributors are either those of the institut...
Messages including encrypted data and having the form of XML documents are exchanged within an information technology network according to Simple Object Access Protocol (SOAP). Each message includes a session key (encrypted to the public key of the party receiving the message) within the XML document containing the encrypted data, meaning that each...
Digital Creative Works such as copyrighted electronic media are packaged in a secure electronic format, or CONTAINER, and registered on an associated registration server, which serves to provide on-line licensing and copyright management for that Work. Users are connected to the registration server through a computer network or the Internet to enable...
There is growing dissatisfaction with the established scholarly communication system. This dissatisfaction is the result of a variety of factors including rapidly rising subscription prices, concerns about copyright, latency between results and their actual publication, and restrictions on what can be published and how it can be disseminated. The r...
Digital rights management (DRM) mechanisms, built upon trusted computing platforms, promise to give content providers the ability to impose rules reliably and deterministically on end-user experiences with information resources ranging from literary works and scholarly publications to a vast array of entertainment content. DRM represents just the f...
The increased commercial demand and use of sophisticated digital rights management (DRM) technologies due to the wide scale adoption of trusted computing principles in end-user systems is discussed. DRM includes a range of technologies that provide parties varying degrees of control over how digital content and services may be used, including by wh...
Current rights management systems focus on the rights of the content provider. Privacy protection schemes exist that would enable the protection of consumer rights while allowing also the protection of content provider rights. We propose that the W3C provide a rights management framework that includes the consumer as a first class participant, and...
In this paper, we argue that increased outsourcing of non-core competencies will drive the demand for a new generation of multi-tenanted cloud-based platforms that address the needs of content-centered collaboration between organizations. We introduce the FRACTAL conceptual prototype which has allowed us to evaluate the suitability of current enter...
With Xerox's upcoming unveiling of an XML-based version of the Digital Property Rights Language (DPRL), I've been pondering to what extent Xerox and other rights management players (e.g., InterTrust) will work towards open standards. In particular, I'm curious to see if they'll work to foster a rights management interoperability framework analogous to...
We present Quambo, a recommender system add-on for the DSpace open source repository platform. We explain how Quambo generates content recommendations based upon a user selected set of examples, our approach to presenting content recommendations to the user, and our experiences applying the system to a repository of technical reports. We consider h...
Thesis (Ph. D.)--Dartmouth College, Thayer School of Engineering, 1997. Photocopy.