About
163
Publications
82,043
Reads
3,829
Citations
Introduction
Data scientist with a biological background: more than 15 years of experience in programming and data analysis as well as more than 10 years of experience in applying machine learning.
Additional affiliations
Education
October 2012 - November 2012
July 2011 - July 2011
Independent Researcher
Field of study
- Management of corrective and preventive actions under ISO/IEC 17025
September 2004
Microsoft Developer Network Users Group Peru
Field of study
- Extreme Programming
Publications
Publications (163)
A major justification for taxonomic and biogeographic research is its assumed ability to predict the presence of traits in a group for which the trait has been observed in only a representative subset of the group. Such predictors are regularly used by breeders interested in choosing potential sources of disease and pest resistant germplasm for cul...
The common potato, Solanum tuberosum L., is the third most important food crop and is grown and consumed worldwide. Indigenous cultivated (landrace) potatoes and wild potato species, all classified as Solanum section Petota, are widely used for potato improvement. Members of section Petota are broadly distributed in the Americas from the southweste...
The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model...
Motivation Modern genomic breeding methods rely heavily on very large amounts of phenotyping and genotyping data, presenting new challenges in effective data management and integration. Recently, the size and complexity of datasets have increased significantly, with the result that data is often stored on multiple systems. As analyses of interest i...
We report the first identification of novel viruses, and sequence of an entire viral genome, by a single step of high-throughput parallel sequencing of small RNAs from diseased, as well as symptomless plants. Contigs were assembled from sequenced total siRNA from plants using small sequence assembly software and could positively identify RNA, ssDNA...
Solanum section Petota, which includes the cultivated potato (S. tuberosum) and its wild relatives, is distributed from the southwestern United States to central Argentina, Uruguay, and adjacent Chile. This taxonomic treatment includes all wild species of section Petota from northern South America, which includes Venezuela, Colombia, Ecuador, and P...
Wild potato species have substantial phenotypic and physiological diversity. Here, we report a comprehensive assessment of wild and cultivated potato species based on genomic analyses of 201 accessions of Solanum section Petota. We sequenced the genomes of these 201 accessions and identified 6,487,006 high-quality single nucleotide polymorphisms (S...
Several databases exist that store data from plant breeding experiments. To facilitate data access and data re-use a common Breeding Application Programming Interface (BrAPI) has been agreed between database developers from an international community representing the public and private sector. It is based on the principles of open linked data and a...
The whole idea of crop improvement, be it through conventional breeding methods or through genomics-assisted breeding methods, is to increase genetic gain over time. Genomics-assisted breeding is hoped to help reduce the length of the conventional breeding cycle by allowing selection, at least in some generations, during early developmental stages...
Here I present a personal selection of experiences and lessons (stories) learnt with regard to knowledge management along three dimensions: data integration, data exploration and data analysis. Additional considerations include cost reduction and disaster recovery. The stories highlight the importance of documentation of processes, prot...
Several hypotheses have been postulated about the wild species that probably gave rise to cultivated potato species. Some of them have been evaluated, using molecular approaches and have received some support. Unfortunately, these studies have been done with a limited number of species and/or representatives of species, which limits the acceptance...
Sweetpotato is a food security crop especially for the poorest of the world, a polyploid crop (2n=6x=90) with a highly heterozygous genome with an estimated size of 4.4Gb. Consequently, genome analysis is difficult and genomic tools for sweetpotato improvement are scarce. Toward closing this gap, we followed the two-way pseudo-testcross approach to...
The Insect Life Cycle Modeling software (ILCYM) assists in the development of phenology models. It also provides analytical tools for studying pest population ecology. The software consists of three modules. The "model builder" facilitates the development of insect phenology models based on experimental temperature-dependent life-table data of a sp...
The course aims to provide statisticians or scientists interested in statistical programming with additional tools to support reproducibility and productivity. As experiments get more complicated, both design and analysis become more challenging. Modularization, documentation, standardization and keeping logs are key ingredients to...
Here we give an update on recent developments in the wider community and on the workshop at NCSU in early 2016. We highlight the motivation, namely the necessity of data sharing and data re-use. We also briefly describe major issues.
- Faster routine analysis of breeding trials
- Building on current tools like CloneSelector
- Working with community tools:
  - Sweetpotatobase
  - KSU fieldbook app
  - Accu datalog
- Technical update to take better advantage of
  - Interactivity: Linked data & linked views
  - R reproducible reports for automating analyses
  - Ontologies to facilitate (statistical) handling o...
Here we give an overview of current status of ontology related research at CIP. This includes building and curation of potato and sweet potato ontology with special sections on participatory varietal selection, and the building of R libraries and R based tools; a) to access and transform ontologies from/between formats; and b) to analyze and visual...
This analytical tool aims to facilitate data exploration, visualization, analysis and reporting for clonal crop breeders. Ontologies in this application facilitate the analytical workflow in two ways: a) they reduce ambiguities specifically in data types and b) help to automate reporting. The tool has a web and a desktop interface.
The Crop Ontology (CO, http://www.cropontology.org/) is a resource of the Integrated Breeding Platform (IBP, http://integratedbreeding.net/) providing breeders with crop specific terms for fieldbook edition and data annotation. Until May 2015, a plant phenotype was annotated with 3 CO identifiers for the trait, the method and the scale, respectivel...
Plant breeders routinely work with large datasets that increasingly also add new levels of dimensionality and complexity using, for example, –omics data. This gives breeders additional work for data analysis, which can be addressed at least partially using computational tools. To some extent, field trials are routine and repetitive and can benefit...
Plant breeders and educators working with the International Potato Center (CIP) needed freely available statistical tools. In response, we created first a set of scripts for specific tasks using the open source statistical software R. Based on this we eventually compiled the R package agricolae as it covered a niche. Here we describe for the first...
The sweetpotato ontology is part of a community effort to establish a set of related crop ontologies (www.cropontology.org). The crop ontologies provide a standard nomenclature to describe crop development and agronomic traits to facilitate the analyzing and sharing of phenotypic and genotypic information. An ontology consists of controlled, hierar...
The genus Solanum is among the most species-rich genera both of the Peruvian flora and of the tropical Andes in general. We currently recognize 276 species of Solanum L. from Peru, of which 253 are native, while 23 are introduced and/or cultivated. A total of 74 Solanum species (29% of native species) are endemic to Peru. Additional 58 species occu...
Crop wild relatives have a long history of use in potato breeding, particularly for pest and disease resistance, and are expected to be increasingly used in the search for tolerance to biotic and abiotic stresses. Their current and future use in crop improvement depends on their availability in ex situ germplasm collections. As these plants are imp...
The authors. This article is published by the Revista Peruana de Biología of the Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos. It is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. (http://creativecommons.org/licenses/...
Datacheck provides functions to check variables against a set of data quality rules. A rule file can be accompanied by look-up tables. In addition, there are some convenience functions that may serve as an example for defining clearer 'data rules'. An HTML based user interface facilitates initial exploration of the functionality. This is a release...
Note: We were made aware of a naming conflict with a prior similar tool; therefore we renamed our tool to QuiGMap.
Genetic maps are an important visual tool to summarize and explore genetic data. As genetic marker sets become denser, it is increasingly important that the graphical user interface of a genetic map viewer is fast. Limitations in th...
Sweetpotato, the seventh most important food crop worldwide, is valued for combating hunger, malnutrition, and poverty because of its hardy nature, wide adaptability, and tolerance to abiotic stresses. Cultivated sweetpotato is hexaploid (2n = 6x = 90) and due to the complexity of its genome, it is still an orphan in the understanding of its geneti...
Records (with and without coordinates) representing germplasm accessions and sightings of the wild relatives of potato. These records were used as input to assess the ex situ conservation urgency of 73 wild relatives of potato
Genebanks increasingly use molecular markers for routine characterization of ex-situ collections and farmer managed diversity. The International Potato Center presently uses a SSR marker-kit to create molecular profiles for potato accessions. We identified a need for a compact graphical representation that allows comparative presentation of molecul...
This R package provides convenience functions for botanists to create specimen indices ('exsiccatae'). This version is primarily a maintenance release to be compatible with the latest base software (R 3.3.1). Bug fixes: grouping of specimen numbers by species in the Index to Numbered Collections was fixed.
The Crop Ontology project has been identified as an initiative of the "Generation Challenge Program" (GCP), led by "Bioversity International", as a fundamental tool for managing and analyzing crop-related information. In recent years, agriculture has seen an increase in terminolo...
Data integration is the process of combining heterogeneous data from different sources, giving researchers a unified view of them. CIP holds records with passport, pedigree, genotypic and phenotypic (evaluation and characterization) information as well as geographic and climatic data. This c...
The package provides functionality to check variables against a set of basic rules. A rule file can be accompanied by look-up tables. In addition, the package provides some convenience testing functions that may serve also as an example for defining 'data rules' or tests. This version is primarily a maintenance release to be compatible with the la...
RTBMaps is an online GIS tool to visualize production, constraints and social indicators associated with Roots and Tubers and Bananas (RTB) crops. Information mapped by the tool, includes data on pests and diseases, evapotranspiration rates, vulnerability to failed harvests, fertilizer application rates and the incidence of malnutrition in children...
Genebanks increasingly use molecular markers for routine characterization of ex-situ collections and farmer managed diversity. CIP's (International Potato Center) genebank presently uses a SSR marker-kit to produce molecular profiles for potato accessions. We have been searching for a compact graphical representation that shows both molecular divers...
Gene banks increasingly use molecular markers for routine characterization of plant collections and farmer managed diversity. The gene bank of the International Potato Center presently uses a micro-satellite marker kit to produce molecular profiles for potato accessions. We have been searching for a compact graphical representation that shows both...
Integrating and sharing accession-level and omics-size genotype, phenotype and environmental data: Experiences at the International Potato Center (CIP) Introduction Plant breeding consists in the creation and selection of new genotypes. This involves not only keeping records across generations and environments but also accommodating data of increas...
Progress in developing a potato ontology for breeders Introduction The potato ontology is part of a community effort to establish a set of related crop ontologies. The advantage of an ontology is that both humans and software applications can understand a data domain. This will allow the application of numerical or data mining techniques that may h...
Progress in developing a sweetpotato ontology for breeders Introduction Crop ontologies have been identified under the Generation Challenge Program and at the International Potato Center (CIP) as a crucial tool for managing and analyzing crop related information. Here we report progress on applying ontological concepts on sweetpotato traits importa...
Sweetpotato is the fifth most important global crop. The largest comprehensive collection of sweetpotato plants is held in trust at the International Potato Center (CIP). To analyze genetic similarity, group germplasm, and complement phenotypic characterization and evaluation data, we developed a simple sequence repeat (SSR) marker kit for sweetpota...
The purpose of this white paper is to provide an overview of the ongoing initiatives at center level to respond to changing public expectations and to the challenge of improving the conduct of science by making research data widely available. We also attempt to provide a framework for implementing open access for research data to maximize the CGIAR...
Understanding the genetic architecture of complex traits of agronomical importance is key to improving them to enhance the performance of crops. Appropriate plant material, available genotypic resources and segregation of traits of interest, multi-environment, multi-year trials and best database management ensure rapid and efficient results. In ord...
The genome of potato, a major global food crop, was recently sequenced. The work presented here details the integration of the potato reference genome (DM) with a new STS marker based linkage map and other physical and genetic maps of potato and the closely related species tomato. Primary anchoring of the DM genome assembly was accomplished using a...
Potato and sweetpotato breeding involves not only handling large amounts of plant materials, locations and respective environmental and management conditions but also the associated phenotypic traits like yield, resistance levels and nutritional values. Molecular markers have been introduced to facilitate the selection process – these current portf...
Species Distribution Modeling has many applications in ex-situ and in-situ management of genetic resources. In this presentation I present a summary of several recent published studies around three applications. They are: a) testing predictivity of bio-geographical origin; b) assessing the invasiveness of species and c) analyzing disjunct habitats...
Background
Conserved ortholog set (COS) markers are an important functional genomics resource that has greatly improved orthology detection in Asterid species. A comprehensive list of these markers is available at Sol Genomics Network (http://solgenomics.net/) and many of these have been placed on the genetic maps of a number of solanaceous specie...
Premise of the study: Taxonomists manage large amounts of specimen data. This is usually initiated in spreadsheets and then converted for publication into locality lists and indices to associate collectors and collector numbers from herbarium sheets to identifications (exsiccatae). This conversion process is mostly done by hand and is time-consumi...
The package provides functionality to check variables against a set of basic rules. A rule file can be accompanied by lookup tables. In addition, the package provides some convenience testing functions that may serve also as an example for defining clearer 'data rules' or tests. An HTML based user interface facilitates initial exploration of the func...
Crop wild relatives (CWR) offer a critical resource to address food security needs by providing genetic diversity for crop improvement, leading to increased plasticity and productivity of farming systems. However, plant breeders typically have not developed systematic or comprehensive strategies for the characterization and utilization of CWR for c...
The tool allows creating simple specimen indices as found in taxonomic treatments based on a table of specimen records. An example file of tabulated specimen data is provided. In addition, four different exsiccatae styles are provided.
Conserved ortholog set (COS) markers are useful for genetic mapping across diverse taxa, including the Solanaceae. We amplified over 300 COS markers from a diverse set of Solanum germplasm, sequenced them and aligned them to the whole genome sequence of potato. We also mapped a set of COS markers genetically using three diploid interspecific Solanum cro...
Ontologies serve to structure a knowledge domain, share concepts more easily, and facilitate the use of terms by both humans and algorithms. Other potential uses include knowledge transfer across species and predictions. Potato offers several opportunities to test and evaluate the different uses of ontologies: a) potato has more than 100 re...
The Crop Ontology (CO) of the Generation Challenge Program (GCP) (http://cropontology.org/) currently contains eleven crop-specific ontologies and has been developed for the Integrated Breeding Platform (IBP) (https://www.integratedbreeding.net/) by several CGIAR centers. The CO provides validated trait names used by crop communities of practice (C...
Biological research centers around the world evaluate large numbers of clones or accessions manually, which is very costly. Tools that allow the evaluation to be carried out quickly and with less bias in the data would be of great help. Characterizing tuber samples from an image will facilitate the wor...
Field data collection should include adequate documentation of the experiment's parameters. Preferably, defined and transparent protocols and variables should be used to improve the potential for re-use. In this work we present free software called "DataCollector" to facilitate the organization and documentation of field b...
Origin and diversity of sweetpotato
Sweetpotato (Ipomoea batatas) belongs to the botanical family Convolvulaceae, Genus Ipomoea, section Batatas. The crop originates from wild species probably somewhere between the Yucatan Peninsula of Mexico and the Orinoco River in Venezuela. Understanding its origin, domestication, and diversity is vital to desi...
Originating from the Andean region and co-evolved with its food plant, the potato (Solanum sp.), the potato tuber moth Phthorimaea operculella (Zeller) has become an invasive potato pest globally. The hypothesis of our present study was that the future distribution and abundance (damage potential) of this pest will be greatly affected by climate-ch...
Solanum morelliforme is an epiphytic wild potato (Solanum section Petota) species widely distributed throughout central Mexico to Honduras. A strikingly disjunct (approximately 4,000 km) population was recently discovered in Bolivia, representing the first record of this species in South America, and the first species in the section growing in both...