Daniel Garijo

Universidad Politécnica de Madrid | UPM · Departamento de Inteligencia Artificial

PhD in Artificial Intelligence. All my publications are available at http://dgarijo.com/publications

About

121
Publications
30,841
Reads
3,131
Citations
Introduction
I am a researcher at the Information Sciences Institute of the University of Southern California. I also collaborate with the Ontology Engineering Group at the Computer Science Faculty of Universidad Politécnica de Madrid. My research focuses on e-Science and the Semantic Web, specifically on increasing the understandability of scientific workflows by describing their inputs, outputs, provenance, metadata and intermediate results, and exposing them as Linked Data.

Publications (121)
Preprint
Full-text available
The Workflows Community Summit gathered 111 participants from 18 countries to discuss emerging trends and challenges in scientific workflows, focusing on six key areas: time-sensitive workflows, AI-HPC convergence, multi-facility workflows, heterogeneous HPC environments, user experience, and FAIR computational workflows. The integration of AI and...
Preprint
Full-text available
Recent trends within computational and data sciences show an increasing recognition and adoption of computational workflows as tools for productivity, reproducibility, and democratized access to platforms and processing know-how. As digital objects to be shared, discovered, and reused, computational workflows benefit from the FAIR principles, which...
Article
Full-text available
Recording the provenance of scientific computation results is key to the support of traceability, reproducibility and quality assessment of data products. Several data models have been explored to address this need, providing representations of workflow plans and their executions as well as means of packaging the resulting information for archiving...
Article
Full-text available
RDF-star has been proposed as an extension of RDF to make statements about statements. Libraries and graph stores have started adopting RDF-star, but the generation of RDF-star data remains largely unexplored. To allow generating RDF-star from heterogeneous data, RML-star was proposed as an extension of RML. However, no system has been developed so...
Chapter
Greenhouse gas emissions have become a common means for determining the carbon footprint of any commercial activity, ranging from booking a trip or manufacturing a product to training a machine learning model. However, calculating the amount of emissions associated with these activities can be a difficult task, involving estimations of energy used...
Preprint
Full-text available
Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given th...
Article
The 12th Symposium on Educational Advances in Artificial Intelligence (EAAI-22, co-chaired by Michael Guerzhoy and Marion Neumann) continued the AAAI/ACM SIGAI New and Future AI Educator Program to support the training of early-career university faculty, secondary school faculty, and future educators (PhD candidates or postdocs who intend a career i...
Article
Full-text available
Background The FAIR principles (Wilkinson et al. 2016) are fundamental for data discovery, sharing, consumption and reuse; however, their broad interpretation and the many ways to implement them can lead to inconsistencies and incompatibility (Jacobsen et al. 2020). The European Open Science Cloud (EOSC) has been instrumental in maturing and encouraging FAIR...
Article
Full-text available
RO-Crate (Soiland-Reyes et al. 2022) is a lightweight method to package research outputs along with their metadata, based on Linked Data principles (Bizer et al. 2009) and W3C standards. RO-Crate provides a flexible mechanism for researchers archiving and publishing rich data packages (or any other research outcome) by capturing their dependencies...
Article
Full-text available
A Digital Object (DO) "is a sequence of bits, incorporating a work or portion of a work or other information in which a party has rights or interests, or in which there is value". DOs should have persistent identifiers and metadata, and be readable by both humans and machines. A FAIR Digital Object is a DO able to interact with automated data processi...
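The abstract above lists the core properties of a Digital Object: a persistent identifier, metadata, and readability by both humans and machines. As a rough illustration only — the class name, fields, and identifier below are hypothetical, not part of any FAIR DO specification — such an object could be sketched as:

```python
import json
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    """Toy sketch of a Digital Object: a bit sequence plus a persistent
    identifier and metadata, readable by humans and machines."""
    pid: str                        # persistent identifier, e.g. a DOI or Handle
    bits: bytes                     # the content itself ("a sequence of bits")
    metadata: dict = field(default_factory=dict)

    def machine_readable(self) -> str:
        """Serialize the identifier and metadata as JSON so automated
        services can parse the object's description."""
        return json.dumps({"pid": self.pid, **self.metadata}, sort_keys=True)

do = DigitalObject(
    pid="https://doi.org/10.0000/example",   # hypothetical identifier
    bits=b"...",
    metadata={"title": "Example dataset", "license": "CC-BY-4.0"},
)
```

The human-readable side would typically be a landing page rendered from the same metadata; the sketch only covers the machine-readable serialization.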
Chapter
Ontologies define data organization and meaning in Knowledge Graphs (KGs). However, ontologies have generally not been taken into account when designing and generating Application Programming Interfaces (APIs) to allow developers to consume KG data in a developer-friendly way. To fill this gap, this work proposes a method for API generation based o...
Chapter
The FAIR principles have become a popular means to guide researchers when publishing their research outputs (i.e., data, software, etc.) in a Findable, Accessible, Interoperable and Reusable manner. In order to ease compliance with FAIR, different frameworks have been developed by the scientific community, offering guidance and suggestions to resea...
Article
Full-text available
Scientific software registries and repositories improve software findability and research transparency, provide information for software citations, and foster preservation of computational methods in a wide range of disciplines. Registries and repositories play a critical role by supporting research reproducibility and replicability, but developing...
Article
An increasing amount of researchers use software images to capture the requirements and code dependencies needed to carry out computational experiments. Software images preserve the computational environment required to execute a scientific experiment and have become a crucial asset for reproducibility. However, software images are usually not prop...
Article
Full-text available
An increasing number of researchers support reproducibility by including pointers to and descriptions of datasets, software and methods in their publications. However, scientific articles may be ambiguous, incomplete and difficult to process by automated systems. In this paper we introduce RO-Crate, an open, community-driven, and lightweight approa...
Article
Full-text available
Wikidata has been increasingly adopted by many communities for a wide variety of applications, which demand high-quality knowledge to deliver successful results. In this paper, we develop a framework to detect and analyze low-quality statements in Wikidata by shedding light on the current practices exercised by the community. We explore three indic...
Article
Full-text available
An increasing number of researchers rely on computational methods to generate or manipulate the results described in their scientific publications. Software created to this end—scientific software—is key to understanding, reproducing, and reusing existing work in many disciplines, ranging from Geosciences to Astronomy or Artificial Intelligence. Ho...
Preprint
Full-text available
The landscape of workflow systems for scientific applications is notoriously convoluted with hundreds of seemingly equivalent workflow systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projec...
Preprint
Full-text available
An increasing number of researchers support reproducibility by including pointers to and descriptions of datasets, software and methods in their publications. However, scientific articles may be ambiguous, incomplete and difficult to process by automated systems. In this paper we introduce RO-Crate, an open, community-driven, and lightweight approa...
Preprint
Full-text available
Application developers today have three choices for exploiting the knowledge present in Wikidata: they can download the Wikidata dumps in JSON or RDF format, they can use the Wikidata API to get data about individual entities, or they can use the Wikidata SPARQL endpoint. None of these methods can support complex, yet common, query use cases, such...
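Of the three access methods the abstract mentions, the SPARQL endpoint is the most flexible for ad-hoc queries. A minimal sketch of building such a request follows; the QID Q5 ("human") and property P31 ("instance of") are standard Wikidata identifiers, and the request is only constructed here, not sent:

```python
import urllib.parse

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def build_query_url(sparql: str) -> str:
    """Encode a SPARQL query as a GET request URL for the Wikidata endpoint."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return f"{WIKIDATA_SPARQL}?{params}"

# Five entities that are instances of (P31) human (Q5), with English labels.
sparql = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q5 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""
url = build_query_url(sparql)
# The URL could then be fetched (e.g. with urllib.request.urlopen) to get JSON results.
```

As the abstract notes, this method struggles with complex or high-volume use cases (the public endpoint enforces timeouts), which is what motivates alternative access layers.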
Article
Major societal and environmental challenges involve complex systems that have diverse multi-scale interacting processes. Consider, for example, how droughts and water reserves affect crop production and how agriculture and industrial needs affect water quality and availability. Preventive measures, such as delaying planting dates and adopting new a...
Preprint
Full-text available
Wikidata has been increasingly adopted by many communities for a wide variety of applications, which demand high-quality knowledge to deliver successful results. In this paper, we develop a framework to detect and analyze low-quality statements in Wikidata by shedding light on the current practices exercised by the community. We explore three indic...
Preprint
Full-text available
Scientific workflows are a cornerstone of modern scientific computing, and they have underpinned some of the most significant discoveries of the last decade. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale H...
Article
The adoption of Knowledge Graphs (KGs) by public and private organizations to integrate and publish data has increased in recent years. Ontologies play a crucial role in providing the structure for KGs, but are usually disregarded when designing Application Programming Interfaces (APIs) to enable browsing KGs in a developer-friendly manner. In this...
Article
Scientific software registries and repositories serve various roles in their respective disciplines. These resources improve software discoverability and research transparency, provide information for software citations, and foster preservation of computational methods that might otherwise be lost over time, thereby supporting research reproducibil...
Chapter
Many organizations maintain knowledge graphs that are organized according to ontologies. However, these ontologies are implemented in languages (e.g. OWL) that are difficult to understand by users who are not familiar with knowledge representation techniques. In particular, this affects web developers who want to develop ontology-based applications...
Article
Full-text available
This review summarizes the last decade of work by the ENIGMA (Enhancing NeuroImaging Genetics through Meta Analysis) Consortium, a global alliance of over 1400 scientists across 43 countries, studying the human brain in health and disease. Building on large-scale genetic studies that discovered the first robustly replicated genetic loci as...
Chapter
With the adoption of Semantic Web technologies, an increasing number of vocabularies and ontologies have been developed in different domains, ranging from Biology to Agronomy or Geosciences. However, many of these ontologies are still difficult to find, access and understand by researchers due to a lack of documentation, URI resolving issues, versi...
Chapter
In recent years, Semantic Web technologies have been increasingly adopted by researchers, industry and public institutions to describe and link data on the Web, create web annotations and consume large knowledge graphs like Wikidata and DBpedia. However, there is still a knowledge gap between ontology engineers, who design, populate and create know...
Chapter
Full-text available
Knowledge graphs (KGs) have become the preferred technology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is heterogeneous, difficult to integrate and only covers a subset of the operations that are commonly nee...
Chapter
In the originally published version of chapter 18 the name of Rongpeng Li was misspelled. This has been corrected.
Chapter
Ontologies are widely used nowadays for many different purposes and in many different contexts, like industry and research, and in domains ranging from geosciences, biology, chemistry or medicine. When used for research, ontologies should be treated as other research artefacts, such as data, software, methods, etc.; following the same principles us...
Preprint
Full-text available
Climate science is critical for understanding both the causes and consequences of changes in global temperatures and has become imperative for decisive policy-making. However, climate science studies commonly require addressing complex interoperability issues between data, software, and experimental approaches from multiple fields. Scientific workf...
Preprint
Full-text available
In recent years, Semantic Web technologies have been increasingly adopted by researchers, industry and public institutions to describe and link data on the Web, create web annotations and consume large knowledge graphs like Wikidata and DBpedia. However, there is still a knowledge gap between ontology engineers, who design, populate and create know...
Preprint
Full-text available
Knowledge graphs (KGs) have become the preferred technology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is heterogeneous, difficult to integrate and only covers a subset of the operations that are commonly nee...
Preprint
Full-text available
With the adoption of Semantic Web technologies, an increasing number of vocabularies and ontologies have been developed in different domains, ranging from Biology to Agronomy or Geosciences. However, many of these ontologies are still difficult to find, access and understand by researchers due to a lack of documentation, URI resolving issues, versi...
Preprint
Full-text available
Ontologies are widely used nowadays for many different purposes and in many different contexts, like industry and research, and in domains ranging from geosciences, biology, chemistry or medicine. When used for research, ontologies should be treated as other research artefacts, such as data, software, methods, etc.; following the same principles...
Poster
Understanding the impacts of climate change on natural and human systems poses major challenges as it requires the integration of models and data across various disciplines, including hydrology, agriculture, ecosystem modeling, and econometrics. While tactical situations arising from an extreme weather event require rapid responses, integrating the...
Article
Full-text available
Computational workflows describe the complex multi-step methods that are used for data collection, data preparation, analytics, predictive modelling, and simulation that lead to new data products. They can inherently contribute to the FAIR data principles: by processing data according to established metadata; by creating metadata themselves during...
Article
Full-text available
The progress of science is tied to the standardization of measurements, instruments, and data. This is especially true in the Big Data age, where analyzing large data volumes critically hinges on the data being standardized. Accordingly, the lack of community‐sanctioned data standards in paleoclimatology has largely precluded the benefits of Big Da...
Chapter
Scientific data generation in the world is continuous. However, scientific studies once published do not take advantage of new data. In order to leverage this incoming flow of data, we present Neuro-DISK, an end-to-end framework to continuously process neuroscience data and update the assessment of a given hypothesis as new data become available. O...
Technical Report
This document proposes an overarching modeling problem framework and definitions for terms used in the World Modelers program in TA2-TA3 modeling, also called bottom-up modeling. There are several reasons to propose a shared conceptual framework and terminology. First, there is a high degree of ambiguity in the use of terms such as "scenario", "mod...
Conference Paper
The web contains millions of useful spreadsheets and CSV files, but these files are difficult to use in applications because they use a wide variety of data layouts and terminology. We present Table To Wikidata Mapping Language (T2WML), a language that makes it easy to map and link arbitrary spreadsheets and CSV files to the Wikidata data model. Th...
Preprint
Full-text available
This review summarizes the last decade of work by the ENIGMA (Enhancing NeuroImaging Genetics through Meta Analysis) Consortium, a global alliance of over 1,400 scientists across 43 countries, studying the human brain in health and disease. Building on large-scale genetic studies that discovered the first robustly replicated genetic loci associated wi...
Poster
Many aspects of geosciences pose novel problems for intelligent systems research. Geoscience data is challenging because it tends to be uncertain, intermittent, sparse, multiresolution, and multi-scale. Geosciences processes and objects often have amorphous spatiotemporal boundaries. The lack of ground truth makes model evaluation, testing, and com...
Conference Paper
Automated Machine Learning (AutoML) systems are emerging that automatically search for possible solutions from a large space of possible kinds of models. Although fully automated machine learning is appropriate for many applications, users often have knowledge that supplements and constrains the available data and solutions. This paper proposes hu...
Conference Paper
Understanding the interactions between natural processes and human activities poses major challenges as it requires the integration of models and data across disparate disciplines. It typically takes many months and even years to create valid end-to-end simulations as different models need to be configured in consistent ways and generate data that...
Article
Full-text available
Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and Dialogue for Reverse Engineering Assessments and Methods (DREAM) have been instrumental in driving the development of bioinformatics methods. Typically, challenges are posted, and then competitors perform a prediction based upon blinded test data. Challengers t...
Article
Due to the increasing uptake of semantic technologies, ontologies are now part of a good number of information systems. As a result, software development teams that have to combine ontology engineering activities with software development practices are facing several challenges, since these two areas have evolved, in general, separately. In this pa...
Preprint
Full-text available
The cerebral cortex underlies our complex cognitive capabilities, yet we know little about the specific genetic loci influencing human cortical structure. To identify genetic variants impacting cortical structure, we conducted a genome-wide association meta-analysis of brain MRI data from 35,660 individuals with replication in 15,578 individuals. W...
Conference Paper
Model repositories are key resources for scientists in terms of model discovery and reuse, but do not focus on important tasks such as model comparison and composition. Model repositories do not typically capture important comparative metadata to describe assumptions and model variables that enable a scientist to discern which models would be bette...
Poster
Geoscience problems are complex and often involve data that changes across space and time. Frequently geoscience knowledge and understanding provides valuable information and insight for problems related to energy, water, climate, mineral resources, and our understanding of how the Earth evolves through time. Simultaneously, many grand challenges i...
Poster
Data integration applications are ubiquitous in scientific disciplines. A state-of-the-art data integration system accepts both a set of data sources and a target ontology as input, and semi-automatically maps the data sources in terms of concepts and relationships in the target ontology. Mappings can be both complex and highly domain-specific. Onc...
Conference Paper
Full-text available
In this paper we describe WIDOCO, a WIzard for DOCumenting Ontologies that guides users through the documentation process of their vocabularies. Given an RDF vocabulary, WIDOCO detects missing vocabulary metadata and creates a documentation with diagrams, human readable descriptions of the ontology terms and a summary of changes with respect to pre...
Conference Paper
Full-text available
Traditional approaches to ontology development have a large lapse between the time when a user using the ontology has found a need to extend it and the time when it does get extended. For scientists, this delay can be weeks or months and can be a significant barrier for adoption. We present a new approach to ontology development and data annotation...
Article
Scientific collaborations involving multiple institutions are increasingly commonplace. It is not unusual for publications to have dozens or hundreds of authors, in some cases even a few thousand. Gathering the information for such papers may be very time consuming, since the author list must include authors who made different kinds of contributio...
Article
The reproducibility of scientific experiments is crucial for corroborating, consolidating and reusing new scientific discoveries. However, the constant pressure for publishing results (Fanelli, 2010) has removed reproducibility from the agenda of many researchers: in a recent survey published in Nature (with more than 1500 scientists) over 70% of t...
Conference Paper
We propose a new area of research on automating data narratives. Data narratives are containers of information about computationally generated research findings. They have three major components: 1) A record of events, that describe a new result through a workflow and/or provenance of all the computations executed; 2) Persistent entries for key ent...
Article
Scientific data is continuously generated throughout the world. However, analyses of these data are typically performed exactly once and on a small fragment of recently generated data. Ideally, data analysis would be a continuous process that uses all the data available at the time, and would be automatically re-run and updated when new data appear...
Article
Full-text available
During the last 20 years, video games have become very popular and widely adopted in our society. However, despite the growth of the video game industry, there is a lack of interoperability that would allow developers to exchange their information freely and to form stronger partnerships. In this paper we present the Video Game Ontology (VGO), a model for...
Article
Scientific workflows are increasingly used to manage and share scientific computations and methods to analyze data. A variety of systems have been developed that store the workflows executed and make them part of public repositories. However, workflows are published in the idiosyncratic format of the workflow system used for the creation and executi...
Conference Paper
OntoSoft is a distributed semantic registry for scientific software. This paper describes three major novel contributions of OntoSoft: 1) a software metadata registry designed for scientists, 2) a distributed approach to software registries that targets communities of interest, and 3) metadata crowdsourcing through access control. Software metadata...
Conference Paper
Full-text available
Software is fundamental to academic research work, both as part of the method and as the result of research. In June 2016 25 people gathered at Schloss Dagstuhl for a week-long Perspectives Workshop and began to develop a manifesto which places emphasis on the scholarly value of academic software and on personal responsibility. Twenty pledges cover...