Mark Alan Musen

Mark Alan Musen
Stanford University | SU · Stanford Center for Biomedical Informatics Research

About

678
Publications
154,954
Reads
How we measure 'reads'
A 'read' is counted each time someone views a publication summary (such as the title, abstract, and list of authors), clicks on a figure, or views or downloads the full-text. Learn more
29,816
Citations
Introduction
Mark Alan Musen currently works at the Stanford Center for Biomedical Informatics Research, Stanford University.
Additional affiliations
September 1988 - present
Stanford University
Position
  • Professor
Education
December 1987 - August 1988
Erasmus University Rotterdam
Field of study
  • Biomedical Informatics
September 1983 - December 1987
Stanford University
Field of study
  • Biomedical Informatics
June 1980 - June 1983
Stanford Medicine
Field of study
  • Internal Medicine

Publications

Publications (678)
Chapter
When applying ontologies in practice, human and machine agents need to ensure that their provenance is trustworthy and it can be relied upon the contained concepts. This is particularly crucial for sensitive tasks such as in medical diagnostics or for safety-criticial applications. In this paper, we propose an architecture for the decentralized att...
Conference Paper
Full-text available
Ontologies are shared, formal conceptualizations of a domain that are consumed by human and machine agents alike. Trust in ontolo-gies is a central issue for their application. For example, machine learning algorithms for medical diagnosis may rely on the correctness of ontolo-gies and could potentially deliver false results. For enhancing trust, w...
Article
Objective Although social and environmental factors are central to provider–patient interactions, the data that reflect these factors can be incomplete, vague, and subjective. We sought to create a conceptual framework to describe and classify data about presence, the domain of interpersonal connection in medicine. Methods Our top-down approach fo...
Chapter
This chapter discusses information technology that provides health-care workers and patients with situation-specific advice that can inform their decision making. The intricacies of the clinical environment, new legislative mandates, and the increasing complexity of medical practice all escalate the demand for clinical decision-support systems (CDS...
Preprint
Full-text available
Objective Although social and environmental factors are central to provider patient interactions, the data that reflect these factors can be incomplete, vague, and subjective. We sought to create a conceptual framework to describe and classify data about presence, the domain of interpersonal connection in medicine. Methods Our top down approach for...
Article
Full-text available
While the biomedical community has published several “open data” sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data t...
Article
Full-text available
Metadata that are structured using principled schemas and that use terms from ontologies are essential to making biomedical data findable and reusable for downstream analyses. The largest source of metadata that describes the experimental protocol, funding, and scientific leadership of clinical studies is ClinicalTrials.gov. We evaluated whether va...
Conference Paper
Full-text available
An overarching WHO-FIC Content Model will allow uniform modeling of classifications in the WHO Family of International Classifications (WHO-FIC). It will facilitate the development of a unified Foundation to generate variants of ICD, ICF, and ICHI as linearizations and will promote their joint use. We describe a conceptualization and formalization...
Article
Full-text available
Objective: To assess usability and usefulness of a machine learning-based order recommender system applied to simulated clinical cases. Materials and methods: 43 physicians entered orders for 5 simulated clinical cases using a clinical order entry interface with or without access to a previously developed automated order recommender system. Case...
Preprint
Full-text available
The National Institutes of Health's (NIH) Human Biomolecular Atlas Program (HuBMAP) aims to create a comprehensive high-resolution atlas of all the cells in the healthy human body. Multiple laboratories across the United States are collecting tissue specimens from different organs of donors who vary in sex, age, and body size. Integrating and harmo...
Article
Full-text available
An overarching WHO-FIC Content Model will allow uniform modeling of classifications in the WHO Family of International Classifications (WHO-FIC) and promote their joint use. We provide an initial conceptualization of such a model.
Preprint
Full-text available
While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data t...
Article
Clinical decision support tools that automatically disseminate patterns of clinical orders have the potential to improve patient care by reducing errors of omission and streamlining physician workflows. However, it is unknown if physicians will accept such tools or how their behavior will be affected. In this randomized controlled study, we exposed...
Article
Developing promising treatments in biomedicine often requires aggregation and analysis of data from disparate sources across the healthcare and research spectrum. To facilitate these approaches, there is a growing focus on supporting interoperation of datasets by standardizing data-capture and reporting requirements. Common Data Elements (CDEs)-pre...
Preprint
Full-text available
Objective: To determine whether clinicians will use machine learned clinical order recommender systems for electronic order entry for simulated inpatient cases, and whether such recommendations impact the clinical appropriateness of the orders being placed. Materials and Methods: 43 physicians used a clinical order entry interface for five simulate...
Preprint
Full-text available
Objective: ClinicalTrials.gov is a registry of clinical-trial metadata whose use is required by many funding agencies and scientific publishers. Metadata are essential to the reuse of data, but issues such as heterogenous metadata schemas, inconsistent values, and usage of free text instead of controlled terms pervade many metadata repositories. Ou...
Article
Full-text available
The FAIR principles articulate the behaviors expected from digital artifacts that are Findable, Accessible, Interoperable and Reusable by machines and by people. Although by now widely accepted, the FAIR Principles by design do not explicitly consider actual implementation choices enabling FAIR behaviors. As different communities have their own, of...
Chapter
Pinterest is a popular Web application that has over 250 million active users. It is a visual discovery engine for finding ideas for recipes, fashion, weddings, home decoration, and much more. In the last year, the company adopted Semantic Web technologies to create a knowledge graph that aims to represent the vast amount of content and users on Pi...
Article
Full-text available
In this deliberately provocative position paper, we claim that more than ten years into Linked Data there are still (too?) many unresolved challenges towards arriving at a truly machine-readable and decentralized Web of data. We take a deeper look at key challenges in usage and adoption of Linked Data from the ever-present "LOD cloud" diagram. Here...
Article
Full-text available
The biomedical data landscape is fragmented with several isolated, heterogeneous data and knowledge sources, which use varying formats, syntaxes, schemas, and entity notations, existing on the Web. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in...
Preprint
Full-text available
Pinterest is a popular Web application that has over 250 million active users. It is a visual discovery engine for finding ideas for recipes, fashion, weddings, home decoration, and much more. In the last year, the company adopted Semantic Web technologies to create a knowledge graph that aims to represent the vast amount of content and users on Pi...
Chapter
Full-text available
The metadata about scientific experiments published in online repositories have been shown to suffer from a high degree of representational heterogeneity—there are often many ways to represent the same type of information, such as a geographical location via its latitude and longitude. To harness the potential that metadata have for discovering sci...
Preprint
Full-text available
The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed--the CEDAR Workbench--is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality me...
Conference Paper
Full-text available
We present WebProtégé, a tool to develop ontologies represented in the Web Ontology Language (OWL). WebProtégé is a cloud-based application that allows users to collaboratively edit OWL ontologies, and it is available for use at https://webprotege.stanford.edu. WebProtégé currently hosts more than 68,000 OWL ontology projects and has over 50,000 us...
Preprint
Full-text available
Metadata-the machine-readable descriptions of the data-are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very...
Preprint
Full-text available
The metadata about scientific experiments published in online repositories have been shown to suffer from a high degree of representational heterogeneity---there are often many ways to represent the same type of information, such as a geographical location via its latitude and longitude. To harness the potential that metadata have for discovering s...
Preprint
Full-text available
This paper introduces HopRank, an algorithm for modeling human navigation on semantic networks. HopRank leverages the assumption that users know or can see the whole structure of the network. Therefore, besides following links, they also follow nodes at certain distances (i.e., k-hop neighborhoods), and not at random as suggested by PageRank, which...
Preprint
Full-text available
There is a growing acknowledgement in the scientific community of the importance of making experimental data machine findable, accessible, interoperable, and reusable (FAIR). Recognizing that high quality metadata are essential to make datasets FAIR, members of the GO FAIR Initiative and the Research Data Alliance (RDA) have initiated a series of w...
Article
Full-text available
We present an analytical study of the quality of metadata about samples used in biomedical experiments. The metadata under analysis are stored in two well-known databases: BioSample—a repository managed by the National Center for Biotechnology Information (NCBI), and BioSamples—a repository managed by the European Bioinformatics Institute (EBI). We...
Preprint
Full-text available
We present the WebProtégé application to develop ontologies represented in the Web Ontology Language (OWL). WebProtégé is a cloud-based application that allows users to collaboratively edit OWL ontologies, and it is available for use at https://webprotege.stanford.edu. WebProtégé currently hosts more than 66,000 OWL ontology projects and has over 5...
Article
Full-text available
BioPortal is widely regarded to be the world's most comprehensive repository of biomedical ontologies. With a coverage of many biomedical subfields by 716 ontologies (June 27, 2018), BioPortal is an extremely diverse repository. BioPortal maintains easily accessible information about the ontologies submitted by ontology curators. This includes size...
Preprint
Full-text available
We present an analytical study of the quality of metadata about samples used in biomedical experiments. The metadata under analysis are stored in two well- known databases: BioSample---a repository managed by the National Center for Biotechnology Information (NCBI), and BioSamples---a repository managed by the European Bioinformatics Institute (EBI...
Article
Full-text available
The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious...
Article
Full-text available
Background Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of...
Conference Paper
Full-text available
There is an expectation that scientists will archive their experimental data online in public repositories to enable other investigators to verify their work and to re-explore their data in search of new discoveries. When left to their own devices, however, scientists do a poor job creating the metadata that describe their datasets. A lack of stand...
Article
Full-text available
In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acq...
Article
Full-text available
Adverse drug reactions (ADR) result in significant morbidity and mortality in patients, and a substantial proportion of these ADRs are caused by drug-drug interactions (DDIs). Pharmacovigilance methods are used to detect unanticipated DDIs and ADRs by mining Spontaneous Reporting Systems, such as the US FDA Adverse Event Reporting System (FAERS). H...
Article
Full-text available
Gene Ontology (GO) enrichment analysis is ubiquitously used for interpreting high throughput molecular data and generating hypotheses about underlying biological phenomena of experiments. However, the two building blocks of this analysis — the ontology and the annotations — evolve rapidly. We used gene signatures derived from 104 disease analyses t...
Article
Full-text available
Biomedical ontologies are large: Several ontologies in the BioPortal repository contain thousands or even hundreds of thousands of entities. The development and maintenance of such large ontologies is difficult. To support ontology authors and repository developers in their work, it is crucial to improve our understanding of how these ontologies ar...
Article
Full-text available
Gene Ontology (GO) enrichment analysis is ubiquitously used for interpreting high throughput molecular data and generating hypotheses about underlying biological phenomena of experiments. However, the two building blocks of this analysis - the ontology and the annotations - evolve rapidly. We used gene signatures derived from 104 disease analyses t...
Article
Full-text available
Many vocabularies and ontologies are produced to represent and annotate agronomic data. However, those ontologies are spread out, in different formats, of different size, with different structures and from overlapping domains. Therefore, there is need for a common platform to receive and host them, align them, and enabling their use in agro-informa...
Conference Paper
Full-text available
Adverse drug reactions (ADR) result in significant morbidity and mortality in patients, and a substantial proportion of these ADRs are caused by drug-drug interactions (DDIs). Pharmacovigilance methods are used to detect unan-ticipated DDIs and ADRs by mining Spontaneous Reporting Systems, such as the US FDA Adverse Event Reporting System (FAERS)....
Conference Paper
Full-text available
The Center for Expanded Data Annotation and Retrieval (CEDAR) aims to revolutionize the way that metadata describing scientific experiments are authored. The software we have developed—the CEDAR Workbench—is a suite of Web-based tools and REST APIs that allows users to construct metadata templates, to fill in templates to generate high-quality meta...
Conference Paper
Full-text available
BiOnIC is a catalog of aggregated statistics of user clicks, queries, and reuse counts for access to over 200 biomedical ontologies. BiOnIC also provides anonymized sequences of classes accessed by users over a period of four years. To generate the statistics, we processed the access logs of BioPortal, a large open biomedical ontology repository. W...
Article
Full-text available
The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample meta-data remains poorly described by unstructured free text attributes preventing its largescale reanalysis. We introduce the Search Tag Analyze Resource for GEO as a web...
Article
Full-text available
The metadata about scientific experiments are crucial for finding, reproducing, and reusing the data that the metadata describe. We present a study of the quality of the metadata stored in BioSample--a repository of metadata about samples used in biomedical experiments managed by the U.S. National Center for Biomedical Technology Information (NCBI)...
Article
Full-text available
Background Structured data acquisition is a common task that is widely performed in biomedicine. However, current solutions for this task are far from providing a means to structure data in such a way that it can be automatically employed in decision making (e.g., in our example application domain of clinical functional assessment, for determining...
Article
Full-text available
Clinicians and clinical decision-support systems often follow pharmacotherapy recommendations for patients based on clinical practice guidelines (CPGs). In multimorbid patients, these recommendations can potentially have clinically significant drug–drug interactions (DDIs). In this study, we describe and validate a method for programmatically detec...
Preprint
This proposal is a response to NIH's call for creation of a Data Commons (RM-17-026). The Commons must support use cases of many stakeholders who need access to scholarly process, content, and outcomes in pursuit of knowledge. Moreover, the Commons must be flexible enough to respect researchers’ idiosyncratic workflows, yet specific enough to solve...
Article
Full-text available
Background Ontologies and controlled terminologies have become increasingly important in biomedical research. Researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability across disparate datasets. However, the number, variety and complexity of current biomedical ontologies make it cum...