Skills and Expertise
Sep 1993 - Sep 1995
US Department of Energy
- Washington, DC, USA
- Program Officer
Research Items (63)
- Jan 2017
Helping women make choices to reduce cancer risk and to improve breast health behaviors is important, but the best ways to reach more people with intervention assistance are not known. To test the efficacy of a web-based intervention designed to help women make better breast health choices, we adapted our previously tested, successful breast health intervention package to be delivered on the Internet, and then we tested it in a randomized trial. We recruited women from the general public to be randomized to either an active intervention group or a delayed intervention control group. The intervention consisted of a specialized website providing tailored and personalized risk information to all participants, followed by offers of additional support if needed. Follow-up at one year post-randomization revealed significant improvements in mammography screening in intervention women compared with control women (an improvement of 13 percentage points). The intervention effects were stronger in women who increased breast health knowledge and decreased cancer worry during the intervention. These data indicate that increases in mammography can be accomplished in population-based, mostly insured samples by implementing this simple, low-resource intervention.
Background: Efforts to harmonize genomic data standards used by the biodiversity and metagenomic research communities have shown that prokaryotic data cannot be understood or represented in a traditional, classical biological context for conceptual reasons, not technical ones. Results: Biology, like physics, has a fundamental duality—the classical macroscale eukaryotic realm vs. the quantum microscale microbial realm—with the two realms differing profoundly, and counter-intuitively, from one another. Just as classical physics is emergent from and cannot explain the microscale realm of quantum physics, so classical biology is emergent from and cannot explain the microscale realm of prokaryotic life. Classical biology describes the familiar, macroscale realm of multi-cellular eukaryotic organisms, which constitute a highly derived and constrained evolutionary subset of the biosphere, unrepresentative of the vast, mostly unseen, microbial world of prokaryotic life that comprises at least half of the planet's biomass and most of its genetic diversity. The two realms occupy fundamentally different mega-niches: eukaryotes interact primarily mechanically with the environment, prokaryotes primarily physiologically. Further, many foundational tenets of classical biology simply do not apply to prokaryotic biology. Conclusions: Classical genetics once held that genes, arranged on chromosomes like beads on a string, were the fundamental units of mutation, recombination, and heredity. Then, molecular analysis showed that there were no fundamental units, no beads, no string. Similarly, classical biology asserts that individual organisms and species are fundamental units of ecology, evolution, and biodiversity, composing an evolutionary history of objectively real, lineage-defined groups in a single-rooted tree of life. Now, metagenomic tools are forcing a recognition that there are no completely objective individuals, no unique lineages, and no one true tree. The newly revealed biosphere of microbial dark matter cannot be understood merely by extending the concepts and methods of eukaryotic macrobiology. The unveiling of biological dark matter is allowing us to see, for the first time, the diversity of the entire biosphere and, to paraphrase Darwin, is providing a new view of life. Advancing and understanding that view will require major revisions to some of the most fundamental concepts and theories in biology.
We describe the outcomes of three recent workshops aimed at advancing development of the Biological Collections Ontology (BCO), the Population and Community Ontology (PCO), and tools to annotate data using those and other ontologies. The first workshop gathered use cases to help grow the PCO, agreed upon a format for modeling challenging concepts such as ecological niche, and developed ontology design patterns for defining collections of organisms and population-level phenotypes. The second focused on mapping datasets to ontology terms and converting them to Resource Description Framework (RDF), using the BCO. To follow up, a BCO hackathon was held concurrently with the 16th Genomic Standards Consortium meeting, during which we converted additional datasets to RDF, developed a Material Sample Core for the Global Biodiversity Information Facility, created a Web Ontology Language (OWL) file for importing Darwin Core classes and properties into BCO, and developed a workflow for converting biodiversity data among formats.
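As an illustration of the kind of dataset-to-RDF conversion described above (not taken from the workshop report itself), the short Python sketch below maps one hypothetical sample record to Darwin Core terms with rdflib and serializes it as Turtle. The Darwin Core namespace and the terms used are real; the record values, the identifier scheme, and the choice of terms are assumptions made for the example.

# A minimal sketch of mapping a flat sample record to Darwin Core terms
# and serializing it as RDF (Turtle) with rdflib. The record contents and
# the example.org identifier scheme are hypothetical.
from rdflib import Graph, Literal, Namespace, RDF

DWC = Namespace("http://rs.tdwg.org/dwc/terms/")   # Darwin Core terms
EX = Namespace("http://example.org/sample/")       # hypothetical sample IDs

g = Graph()
g.bind("dwc", DWC)

# One hypothetical sample record, as it might appear in a flat file.
record = {"materialSampleID": "MS-001",
          "scientificName": "Peromyscus maniculatus",
          "decimalLatitude": "47.61",
          "decimalLongitude": "-122.33"}

subject = EX[record["materialSampleID"]]
g.add((subject, RDF.type, DWC.MaterialSample))
for field in ("scientificName", "decimalLatitude", "decimalLongitude"):
    g.add((subject, DWC[field], Literal(record[field])))

print(g.serialize(format="turtle"))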
The most fundamental unit of traditional biodiversity - the individual organism (defined as a physically connected, multi-cellular aggregation, with all of the cells clonally derived from one ancestral cell) - has no parallel in the world of prokaryotic biology. Yet recent advances (metagenomic tools, etc.) have shown that about half of the world's biomass and by far most of its physiological (as opposed to morphological, or mechanical) biodiversity occurs in the prokaryotic realm. Other work is establishing that symbiosis (in the sense of the bio-outsourcing of some key biological functions) is far more common than traditionally recognized. In fact, because horizontal gene transfer (HGT) occurs without regard for species boundaries, it is now becoming clear that some microbial genes are more attributes of a particular ecosystem than of a particular individual or species. (This is why, for example, the same "cassette" of pathology genes is often found spreading across multiple bacterial species in hospital settings.) All of the above suggests that an understanding of biodiversity is incomplete, and significantly so, without the inclusion of microbial biodiversity, both as a component of ecosystems ("free living" microbial communities) and also at the level of full understanding of macroscale organisms (as influenced by the presence, make-up, physiology, and function of their associated microbiomes). From the perspective of biodiversity informatics, the complete addition of a microbial component using current methods and schemas is not possible. This session will begin with an overview of the gaps in today's biodiversity information systems that preclude an accurate understanding of patterns of life on Earth, and how those patterns are changing. This overview will be followed by a discussion around possible roles for Darwin Core in helping to fill these gaps, and how the standard might have to evolve in order to fulfill these roles.
The Genomic Standards Consortium (GSC) is an open-membership community that was founded in 2005 to work towards the development, implementation, and harmonization of standards in the field of genomics. Starting with the defined task of establishing a minimal set of descriptions, the GSC has evolved into an active standards-setting body that currently has 18 ongoing projects, with additional projects regularly proposed from within and outside the GSC. Here we describe our recently enacted policy for proposing new activities that are intended to be taken on by the GSC, along with the template for proposing such new activities.
Question - How do I get information about the location of all genes on their respective chromosomes using base pair numeration?
There is a conceptual problem with the idea that a gene would have an absolute "chromosomal address" described quantitatively in base pairs. All human chromosomes carry variable number tandem repeats (or VNTRs) that occur with different numbers of repeats in different individuals. As a result, some human chromosomes show length variations of up to 10% across "normal" individuals. Given the size of human chromosomes, a possible 10% length variation means the bp address of a gene could vary by millions of base pairs across different individuals. It is better to represent gene locations using relative addressing, where the location is specified relative to some nearby genomic landmark.
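To make the contrast concrete, the short Python sketch below shows how a relative address stays stable while the corresponding absolute base-pair address shifts between individuals whose chromosomes differ in length; the landmark positions and the offset are purely illustrative assumptions, not real coordinates.

# A minimal sketch contrasting absolute and relative gene addressing.
# VNTR length variation shifts a gene's absolute bp position between
# individuals, but its offset from a nearby landmark does not change.

def absolute_position(landmark_pos: int, offset_from_landmark: int) -> int:
    """Absolute bp address = landmark position + relative offset."""
    return landmark_pos + offset_from_landmark

GENE_OFFSET = 12_000  # hypothetical: gene starts 12 kb downstream of the landmark

# Hypothetical absolute positions of the same landmark in two individuals,
# differing because upstream VNTRs expand or contract the chromosome.
landmark_position = {"individual A": 48_200_000, "individual B": 51_900_000}

for person, pos in landmark_position.items():
    print(person, absolute_position(pos, GENE_OFFSET))
# The absolute addresses differ by millions of bp, yet "landmark + 12,000 bp"
# identifies the same gene in both individuals.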
This position paper, Data Management for LTER: 1980 – 2010, was independently prepared by Dr. Robert J. Robbins, a member of the 30 Year LTER Review Committee. This position paper is not part of the LTER 30 Year Review Report but is provided by the BIO Advisory Committee without comment or endorsement as an independent perspective regarding LTER data management. José N. Onuchic, Chair, BIO Advisory Committee. BIO 12-002
Public Release: 12/09/2011. Dear Colleagues: The Advisory Committee for the Biological Sciences (BIO AC) has approved the Long Term Ecological Research Program Report of the 30 Year Review Committee for public posting on the Advisory Committee web site. The BIO AC retains its prerogative to comment on the content of the report at a future date. The Advisory Committee would like to thank the Review Committee for its efforts in preparing this important report. Sincerely, Barbara Schaal, Ph.D., Chair, Advisory Committee for the Biological Sciences. This report was prepared by the participants of the review committee. Any opinions, findings, conclusions, or recommendations expressed in this report are those of the participants and do not necessarily represent the official views, opinions, or policy of the National Science Foundation. BIO 12-001
This report summarizes the proceedings of the 14th workshop of the Genomic Standards Consortium (GSC) held at the University of Oxford in September 2012. The primary goal of the workshop was to work towards the launch of the Genomic Observatories (GOs) Network under the GSC. For the first time, it brought together potential GOs sites, GSC members, and a range of interested partner organizations. It thus represented the first meeting of the GOs Network (GOs1). Key outcomes include the formation of a core group of "champions" ready to take the GOs Network forward, as well as the formation of working groups. The workshop also served as the first meeting of a wide range of participants in the Ocean Sampling Day (OSD) initiative, a first GOs action. Three projects with complementary interests – COST Action ES1103, MG4U and Micro B3 – organized joint sessions at the workshop. A two-day GSC Hackathon followed the main three days of meetings.
“If names be not correct, language is not in accordance with the truth of things. If language be not in accordance with the truth of things, affairs cannot be carried on to success.” - Confucius, Analects, Book XIII, Chapter 3, verses 4-7, translated by James Legge. Two workshops were held in 2012 that brought together domain experts from genomic and biodiversity informatics, information modeling, and biology to clarify concepts and terms at the intersection of these domains. These workshops grew out of efforts sponsored by the NSF-funded Research Coordination Network (RCN) project for the GSC (RCN4GSC, hosted at UCSD, with John Wooley as PI) to reconcile terms from the Darwin Core (DwC) vocabulary with those in the MIxS family of checklists (Minimum Information about Any Type of Sequence). The original RCN4GSC meetings were able to align many terms between DwC and MIxS, finding both common and complementary terms. However, deciding exactly what constitutes the concept of a sample, a specimen, and an occurrence to satisfy the needs of all use cases proved difficult, especially given the wide variety of sampling strategies employed within and between communities. Further, participants in the initial RCN4GSC workshops needed additional guidance on how to relate these entities to processes that act upon them and the environments in which organisms live. These issues provided the motivation for the workshops described below. The two workshops drew largely from experiences with the Basic Formal Ontology (BFO) and were led by Barry Smith, State University of New York at Buffalo. We chose to interact with Smith based on his successful interactions with the GSC in developing the Environment Ontology (EnvO) and on the ability of BFO to unite previously disconnected ontologies in the medical domain. The first workshop addressed term definitions in biodiversity informatics, working within the BFO framework, while the second workshop developed a prototype Bio-Collections Ontology, dealing with samples and processes acting on samples. Concurrent with these workshops were two ongoing efforts involving data acquisition, visualization, and analysis that rely on a solid conceptual understanding of samples, specimens, and occurrences. These implementations are included in this report to show practical applications of term clarification. Finally, this report provides a discussion of some of the next steps discussed during the workshops.
The workshop-hackathon was convened by the Global Biodiversity Information Facility (GBIF) at its secretariat in Copenhagen over 22-24 May 2013 with additional support from several projects (RCN4GSC, EAGER, VertNet, BiSciCol, GGBN, and Micro B3). It assembled a team of experts to address the challenge of adapting the Darwin Core standard for a wide variety of sample data. Topics addressed in the workshop included 1) a review of outstanding issues in the Darwin Core standard, 2) issues relating to publishing of biodiversity data through Darwin Core Archives, 3) use of Darwin Core Archives for publishing sample and monitoring data, 4) the case for modifying the Darwin Core Text Guide specification to support many-to-many relations, and 5) the generalization of the Darwin Core Archive to a “Biodiversity Data Archive”. A wide variety of use cases were assembled and discussed in order to inform further developments.
Following up on efforts from two earlier workshops, a meeting was convened in San Diego to (a) establish working connections between experts in the use of the Darwin Core and the GSC MIxS standards, (b) conduct mutual briefings to promote knowledge exchange and to increase the understanding of the two communities' approaches, constraints, community goals, subtleties, etc., (c) perform an element-by-element comparison of the two standards, assessing the compatibility and complementarity of the two approaches, (d) propose and consider possible use cases and test beds in which a joint annotation approach might be tried, to useful scientific effect, and (e) propose additional action items necessary to continue the development of this joint effort. Several focused working teams were identified to continue the work after the meeting ended.
At the GSC11 meeting (4-6 April 2011, Hinxton, England), the GSC's genomic biodiversity working group (GBWG) developed an initial model for a data management testbed at the interface of biodiversity with genomics and metagenomics. With representatives of the Global Biodiversity Information Facility (GBIF) participating, it was agreed that the most useful course of action would be for GBIF to collaborate with the GSC in its ongoing GBWG workshops to achieve common goals around interoperability and data integration across (meta)genomic and species-level data. It was determined that a quick comparison should be made of the contents of the Darwin Core (DwC) and the GSC data checklists, with a goal of determining their degree of overlap and compatibility. An ad hoc task group led by Renzo Kottman and Peter Dawyndt undertook an initial comparison between the Darwin Core (DwC) standard used by the Global Biodiversity Information Facility (GBIF) and the MIxS checklists put forward by the Genomic Standards Consortium (GSC). A term-by-term comparison showed that DwC and GSC concepts complement each other far more than they compete with each other. Because the preliminary analysis done at this meeting was based on expertise with GSC standards, but not with DwC standards, the group recommended that a joint meeting of DwC and GSC experts be convened as soon as possible to continue this joint assessment and to propose additional work going forward.
Building on the planning efforts of the RCN4GSC project, a workshop was convened in San Diego to bring together experts from genomics and metagenomics, biodiversity, ecology, and bioinformatics with the charge to identify potential for positive interactions and progress, especially building on successes at establishing data standards by the GSC and by the biodiversity and ecological communities. Until recently, the contribution of microbial life to the biomass and biodiversity of the biosphere was largely overlooked (because it was resistant to systematic study). Now, emerging genomic and metagenomic tools are making investigation possible. Initial research findings suggest that major advances are in the offing. Although different research communities share some overlapping concepts and traditions, they differ significantly in sampling approaches, vocabularies and workflows. Likewise, their definitions of 'fitness for use' for data differ significantly, as this concept stems from the specific research questions of most importance in the different fields. Nevertheless, there is little doubt that there is much to be gained from greater coordination and integration. As a first step toward interoperability of the information systems used by the different communities, participants agreed to conduct a case study on two of the leading data standards from the two formerly disparate fields: (a) GSC's standard checklists for genomics and metagenomics and (b) TDWG's Darwin Core standard, used primarily in taxonomy and systematic biology.
A traditional drawing of the tree of life emphasizes the eukaryota. Classical biology has largely been the study of somatic tissue in multi-cellular eukaryotes. Yet the eukaryotes represent a highly specialized and highly constrained form of life. Attempts to understand biological dark matter -- the invisible prokaryotic realm -- by generalizing from studies on eukaryota cannot succeed. As metagenomic tools allow the study of the prokaryotic realm, we are discovering this truly is a microbial planet. The macroscopic organisms -- the subject of classical biology -- are just lumps in the microbial soup.
This report details the outcome of the 13th Meeting of the Genomic Standards Consortium. The three-day conference was held at the Kingkey Palace Hotel, Shenzhen, China, on March 5-7, 2012, and was hosted by the Beijing Genomics Institute. The meeting, titled From Genomes to Interactions to Communities to Models, highlighted the role of data standards associated with genomic, metagenomic, and amplicon sequence data and the contextual information associated with the sample. To this end the meeting focused on genomic projects for animals, plants, fungi, and viruses; metagenomic studies in host-microbe interactions; and the dynamics of microbial communities. In addition, the meeting hosted a Genomic Observatories Network session, a Genomic Standards Consortium biodiversity working group session, and a Microbiology of the Built Environment session sponsored by the Alfred P. Sloan Foundation.
Microbial ecology has been enhanced greatly by the ongoing 'omics revolution, bringing half the world's biomass and most of its biodiversity into analytical view for the first time; indeed, it feels almost like the invention of the microscope and the discovery of the new world at the same time. With major microbial ecology research efforts accumulating prodigious quantities of sequence, protein, and metabolite data, we are now poised to address environmental microbial research at macro scales, and to begin to characterize and understand the dimensions of microbial biodiversity on the planet. What is currently impeding progress is the need for a framework within which the research community can develop, exchange and discuss predictive ecosystem models that describe the biodiversity and functional interactions. Such a framework must encompass data and metadata transparency and interoperation; data and results validation, curation, and search; application programming interfaces for modeling and analysis tools; and human and technical processes and services necessary to ensure broad adoption. Here we discuss the need for focused community interaction to augment and deepen established community efforts, beginning with the Genomic Standards Consortium (GSC), to create a science-driven strategic plan for a Genomic Software Institute (GSI).
GBWG (the GSC Biodiversity Working Group, with assistance from RCN4GSC) is reaching out to other communities to engage scientists at the interface of genomics and biodiversity. The NSF RCN4GSC project at UCSD has the mission to create a research coordination network to promote and integrate standards for genomic and metagenomic data and metadata within an international community. The network is based on the existing Genomic Standards Consortium and will be extended under this award to include ecological data standards such as Ecological Metadata Language, biodiversity standards such as Darwin Core, and environmental research programs such as the Global Lake Ecological Observatory Network and Long Term Ecological Research. Why the big push into biodiversity?
• Biodiversity is less a field of biology than a perspective (that of variance) on biology.
• Diversity is a sine qua non of biology; no diversity, no evolution.
• Genetics and genomics are equally central to biology.
• Probably half of the world's biomass and by far most of its biodiversity exists in microbial communities, many of which have been effectively invisible until recently. Understanding this biological dark matter is critical to a full understanding of the biosphere.
Recent developments in our ability to capture, curate, and analyze data, the field of data-intensive science (DIS), have indeed made these interesting and challenging times for scientific practice as well as for policy making in real time. We are confronted with immense datasets that challenge our ability to pool, transfer, analyze, or interpret scientific observations. We have more data available than ever before, yet more questions to be answered as well, and no clear path to answer them. We are excited by the potential for science-based solutions to humankind's problems, yet stymied by the limitations of our current cyberinfrastructure and existing public policies. Importantly, DIS signals a transformation of the hypothesis-driven tradition of science ("first hypothesize, then experiment") to one typified by a "first experiment, then hypothesize" mode of discovery. Another hallmark of DIS is that it amasses data that are public goods (i.e., creates a "commons") that can further be creatively mined for various applications in different sectors. As such, this calls for a science policy vision that is long term. We herein reflect on how best to approach policy making at this critical inflection point, when DIS applications are being diversified in agriculture, ecology, marine biology, and environmental research internationally. This article outlines the key policy issues and gaps that emerged from the multidisciplinary discussions at the NSF-funded DIS workshop held at the Seattle Children's Research Institute in Seattle on September 19-20, 2010.
Helping women make choices to reduce cancer risk and to improve breast health behaviors is important, but the best ways to reach more people with intervention assistance are not known. To test the efficacy of a Web-based intervention designed to help women make better breast health choices, we adapted our previously tested, successful breast health intervention package to be delivered on the Internet, and then we tested it in a randomized trial. We recruited women from the general public to be randomized to either an active intervention group or a delayed intervention control group. The intervention consisted of a specialized Web site providing tailored and personalized risk information to all participants, followed by offers of additional support if needed. Follow-up at 1 year post-randomization revealed significant improvements in mammography screening in intervention women compared with control women (an improvement of 13 percentage points). The intervention effects were stronger in women who increased breast health knowledge and decreased cancer worry during the intervention. These data indicate that increases in mammography can be accomplished in population-based, mostly insured samples by implementing this simple, low-resource intervention.
A random, population-based sample of 431 women aged 18–74 in King County, Washington, USA, completed a survey module on Internet use and access. Level of mental health, level of general health perceptions, older age, and higher income predicted women's health-related Internet use. Participants without access reported various barriers to obtaining access; perceived lack of usefulness of the Internet as an information source and unfamiliarity with using this technology appear to be reasons as important as financial cost for not adopting the Internet. Internet use motivators are complex; these findings have relevance to the design of Internet-based interventions.
Data protection is important for all information systems that deal with human-subjects data. Grid-based systems--such as the cancer Biomedical Informatics Grid (caBIG)--seek to develop new mechanisms to facilitate real-time federation of cancer-relevant data sources, including sources protected under a variety of regulatory laws, such as HIPAA and 21CFR11. These systems embody new models for data sharing, and hence pose new challenges to the regulatory community and to those who would develop or adopt them. These challenges must be understood by both system developers and system adopters. In this paper, we describe our work collecting policy statements, expectations, and requirements from regulatory decision makers at academic cancer centers in the United States. We use these statements to examine fundamental assumptions regarding data sharing using data federations and grid computing. We conducted an interview-based study of key stakeholders from a sample of US cancer centers. Interviews were structured and used an instrument that was developed for the purpose of this study. The instrument included a set of problem scenarios--difficult policy situations that were derived during a full-day discussion of potentially problematic issues by a set of project participants with diverse expertise. Each problem scenario included a set of open-ended questions that were designed to elicit stakeholder opinions and concerns. Interviews were transcribed verbatim and used for both qualitative and quantitative analysis. For quantitative analysis, data were aggregated at the individual or institutional unit of analysis, depending on the specific interview question. Thirty-one (31) individuals at six cancer centers were contacted to participate. Twenty-four of the thirty-one (24/31) individuals responded to our request, yielding a total response rate of 77%. Respondents included IRB directors and policy-makers, privacy and security officers, directors of offices of research, information security officers, and university legal counsel. Nineteen total interviews were conducted over a period of 16 weeks. Respondents provided answers for all four scenarios (a total of 87 questions). Results were grouped by broad themes, including, among others: governance, legal and financial issues, partnership agreements, de-identification, institutional technical infrastructure for security and privacy protection, training, risk management, auditing, IRB issues, and patient/subject consent. The findings suggest that, with additional work, large-scale federated sharing of data within a regulated environment is possible. A key challenge is developing suitable models for authentication and authorization practices within a federated environment. Authentication--the recognition and validation of a person's identity--is in fact a global property of such systems, while authorization--the permission to access data or resources--mimics data sharing agreements in being best served at a local level. Nine specific recommendations result from the work and are discussed in detail.
These include: (1) the necessity to construct separate legal or corporate entities for governance of federated sharing initiatives on this scale; (2) consensus on the treatment of foreign and commercial partnerships; (3) the development of risk models and risk management processes; (4) development of technical infrastructure to support the credentialing process associated with research involving human subjects; (5) exploring the feasibility of developing large-scale, federated honest broker approaches; (6) the development of suitable, federated identity provisioning processes to support federated authentication and authorization; (7) community development of requisite HIPAA and research ethics training modules by federation members; (8) the recognition of the need for central auditing requirements and authority; and (9) use of two-protocol data exchange models where possible in the federation.
Structured interview instruments. This file contains the four interview instruments that were used in the study. The questions for these instruments were developed using a team-based approach as described in the methods section of the paper.
- Jan 2007
The information explosion and new advances in high-throughput experiments have challenged biomedical research, and they suggest a future in which inter-institutional and international collaborations will be the norm. The cancer Biomedical Informatics Grid (caBIG) is an ambitious initiative launched by the US National Cancer Institute to develop a network of tools, data, and researchers to support translational and clinical research in oncology, with the ultimate goal of improving cancer care for patients. The three-year pilot phase of caBIG ends in 2007 and has engaged over 900 clinicians, scientists, and patient advocates as developers, adopters, and workspace participants. Progress has been demonstrated in creating tools and building prototype grid architecture for collaborative research. Accomplishments in the pilot phase set the stage for extension of the community into other biomedical domains and for federation of the caBIG enterprise with similar initiatives in other scientific areas and in other countries.
Much is written about Internet access, Web access, Web site accessibility, and access to online health information. The term access has, however, a variety of meanings to authors in different contexts when applied to the Internet, the Web, and interactive health communication. We have summarized those varied uses and definitions and consolidated them into a framework that defines Internet and Web access issues for health researchers. We group issues into two categories: connectivity and human interface. Our focus is to conceptualize access as a multicomponent issue that can either reduce or enhance the public health utility of electronic communications.
The Internet might transform the way in which health information is communicated to patient and general populations. Understanding differences in usage patterns will be critically important to ensuring the successful distribution of health information. The present study reports early data on the use patterns and predictors of use of a Web-based intervention in a population-based subsample of women aged 18-74 in King County, WA. By three months, over half (51%) of users had logged into the website, using multiple components. Predictors of use by three months included employment, perceptions of health, and mental health scores. These data have implications for how to conduct Web-based intervention research and for individuals who may not benefit from such interventions.
Bioinformatics, the use of computers to support biological information management, has become an enabling technology, essential for the success of big-science projects in biology. Not yet a true discipline of its own, bioinformatics occupies space between biology and computer science, with interests in library and information science, engineering, and management as well. The interdisciplinary nature of bioinformatics can make it difficult for projects to gain support from agencies focused either on biology or on computer science. With the growth of global networking, achieving interoperability among biological information resources is now one of the most pressing challenges in bioinformatics. Technical, semantic, and social advances will be required for success to occur. Although the great success of WWW browsers in providing a loosely coupled, distributed information delivery system has finally proved the tremendous utility of a federated information infrastructure, WWW technology itself is not sufficient to meet the needs of those who need coordinated access into robust, structured data. Tools for resource discovery and resource filtering loom as unmet needs. Better data standardization and data indexing will be required as the resources continue to grow exponentially. As databases become more like scientific literature, new infrastructure functionality must be added. Intellectual property rights, data sharing, and data access remain as challenging policy issues, complicated by differing national approaches. Improved and coordinated approaches to providing long-term support for bioinformatics projects are needed. As the Global Information Infrastructure takes shape, international agencies with an interest in bioinformatics should work together to ensure that advances in the commercial sector are accompanied by support for needed functionality in the research community.
The original goals of the Human Genome Project (HGP) were: 1) construction of a high-resolution genetic map of the human genome; 2) production of a variety of physical maps of all human chromosomes and of the DNA of selected model organisms; 3) determination of the complete sequence of human DNA and of the DNA of selected model organisms; 4) development of capabilities for collecting, storing, distributing, and analyzing the data produced; and 5) creation of appropriate technologies necessary to achieve these objectives. Here, the authors assert that the most pressing information-infrastructure requirement now facing the HGP is achieving better interoperation among electronic information resources. Other needs may be equally important (better methods to support large-scale sequencing and mapping, for example), but none are as pressing. The problem of interoperability grows exponentially with the data. Efforts to develop distributed information publishing systems are now underway in many locations. If the needs of the genome project are not soon defined and articulated, they will not be addressed by these external projects. De facto standards will emerge, and if these prove inadequate for scientific data publishing, the research community will have little choice but to tolerate this inadequacy indefinitely.
Information technology is transforming biology, and the relentless effects of Moore's Law are transforming that transformation. Nowhere are these changes more apparent than in the international collaboration known as the Human Genome Project (HGP). The authors consider the relationship of informatics to genomic research. Topics discussed include: the nature of information technology; Moore's Law; informatics as an enabling technology; agency commitments; community recommendations; current trends; and future support.
The goals of the Human Genome Project are: (1) construction of a high-resolution genetic map of the human genome, (2) production of a variety of physical maps of all human chromosomes and of the DNA of selected model organisms, (3) determination of the complete sequence of human DNA and of the DNA of selected model organisms, (4) development of capabilities for collecting, storing, distributing, and analyzing the data produced, and (5) creation of appropriate technologies necessary to achieve these objectives.
Biology is entering a new era in which data are being generated that cannot be published in the traditional literature. Databases are taking the role of scientific literature in distributing this information to the community. The success of some major biological undertakings, such as the Human Genome Project, will depend upon the development of a system for electronic data publishing. Many biological databases began as secondary literature: reviews in which certain kinds of data were collected from the primary literature. Now these databases are becoming a new kind of primary literature, with findings being submitted directly to the database and never being published in print form. Some databases are offering publishing-on-demand services, where users can identify subsets of the data that are of interest, then subscribe to periodic distributions of the requested data. New systems, such as the Internet Gopher, make building electronic information resources easy and affordable while offering a powerful search tool to the scientific community. Although many questions remain regarding the ultimate interactions between electronic and traditional data publishing and about their respective roles in the scientific process, electronic data publishing is here now, changing the way biology is done. The technical problems associated with mounting cost-effective electronic data publishing are either solved, or solutions seem in reach. What is needed now, to take us all the way into electronic data publishing as a new, formal literature, is the development of more high-quality, professionally operated electronic data publishing (EDP) sites. The key to transforming these into a new scientific literature is the establishment of appropriate editorial and review policies for electronic data publishing sites. Editors have the opportunity and the responsibility to work in the vanguard of a revolution in scientific publishing.
- Dec 1993
This conference was held in St. Petersburg Beach, Florida, June 4-7, 1992. The purpose of this conference was to provide a forum for exchange of state-of-the-art information in the field of the human genome. This provided an opportunity to gain firsthand knowledge of the scope, direction, and future prospects of information and computing in complex genome analysis. Topics of discussion include the following: linguistic approaches; applications of neural networks; databases; genome maps; genome sequence analysis; mapping and sequencing; and computer technologies and mathematical approaches. Individual papers are processed separately for inclusion in the appropriate databases.
Informatics of some kind will play a role in every aspect of the Human Genome Project (HGP): data acquisition, data analysis, data exchange, data publication, and data visualization. What are the real requirements and challenges? The primary requirement is clear thinking, and the main challenge is design. If good design is lacking, the price will be failure of genome informatics and ultimately failure of the genome project itself. We need good designs to deliver the tools necessary for acquiring and analyzing DNA sequences. As these tools become more efficient, we will need new tools for comparative genomic analyses. To make the tools work, we will need to address and solve nomenclature issues that are essential, if also tedious. We must devise systems that will scale gracefully with the increasing flow of data. We must be able to move data easily from one system to another, with no loss of content. As scientists, we will have failed in our responsibility to share results, should repeating experiments ever become preferable to searching the literature. Our databases must become a new kind of scientific literature, and we must develop ways to make electronic data publishing as routine as traditional journal publishing. Ultimately, we must build systems so advanced that they are virtually invisible. In summary, the HGP can be considered the most ambitious, most audacious information-management project ever undertaken. In the HGP, computers will not merely serve as tools for cataloging existing knowledge. Rather, they will serve as instruments, helping to create new knowledge by changing the way we see the biological world. Computers will allow us to see genomes, just as radio telescopes let us see quasars and electron microscopes let us see viruses.
Version 5.0 of the Genome Data Base (GDB™) was released in March 1993. This document describes some of the significant changes to the types of data that are stored within the GDB. In addition to handling a wider scope of data, the GDB 5.0 application software now supports the X-Windows protocol. Although the GDB software still remains the most widely utilized method for accessing the data, alternate methods of access are now available, including direct SQL (Structured Query Language) queries, FTP (Internet File Transfer Protocol), WAIS (Wide Area Information Server), and other tools produced by third-party developers.
The variable white mutation arose spontaneously in 1983 within a laboratory stock of wild-type deer mice (Peromyscus maniculatus). The original mutant animal was born to a wild-type pair that had previously produced several entirely wild-type litters. Other variable white animals were bred from the initial individual. Variable white deer mice exhibit extensive areas of white on the head, sides, and tail. Usually a portion of pigmented pelage occurs dorsally and on the shoulders, but the extent of white varies from nearly all white to patches of white on the muzzle, tip of tail, and sides. The pattern is irregular, but not entirely asymmetrical. Eyes are pigmented, but histologically reveal a decrease in thickness and pigmentation of the choroid layer. Many variable white animals do not respond to auditory stimuli, an effect that is particularly evident in animals in which the head is entirely white. Ataxic behavior is also prevalent. Pigment distribution, together with auditory and retinal deficiencies, suggests a neural crest cell migration defect. Breeding data are consistent with an autosomal semidominant, lethal mode of inheritance. The trait differs from two somewhat similar variants in Peromyscus: from dominant spot (S) in extent and pattern of pigmentation and from whiteside (ws), an autosomal recessive trait, in the mode of inheritance and viability. Evidence for possible homology with the Va (varitint-waddler) locus in house mouse (Mus) is presented. The symbol Vw is tentatively assigned for the variable white locus in Peromyscus.
In 1991 the Genome Data Base at Johns Hopkins University School of Medicine was selected as the central repository for mapping data from the Human Genome Project, and was funded by NIH and DOE under a three-year award. GDB has now finished 28 months of federally funded operation. During this period a great deal of progress and many internal changes have taken place. In addition, many changes have also occurred in the external environment, and GDB has adapted its strategies to play an appropriate role in those changes as well. Recognizing the central role of mapping information in the genome project, it is important that GDB respond aggressively to the increasing demands of genomic researchers, as well as formulate a program of response to a number of long-standing, but still unmet, needs of that community. It is even more important that GDB provide leadership in the genome informatics enterprise. Three themes described here are dominant in our future plans and represent the essence of the major changes made in the past year. They include: enhanced data acquisition, better map representation, and full integration into the collection of genomic databases.
The purpose and scope of the Human Genome Project are discussed. Basic biological concepts underlying the project are reviewed. The two processes, logic and experimentation, used to understand DNA sequences are compared, and the process of obtaining the sequence is described. Two problem areas in this collaboration between biologists and computer scientists are examined: they arise from differences in training that can make communication difficult, and from nomenclature problems.
Mapping and sequencing the entire human genome in a timely fashion requires organization of all available resources to the common goal. Federal funding agencies have established individual genome centers that will focus on one or more chromosomes. Further, chromosome-specific workshops are being organized to permit individual centers, researchers, or groups to pool their results with other colleagues working on the same chromosome. The activities imply the following: (1) each chromosome community should have its own database; (2) the databases should permit inclusion of data from many different groups and give different map interpretations of the same chromosome region; and (3) similar formats for data storage and representation should be used across the databases to simplify data exchange and interpretation. However, no matter how sophisticated modern database management systems may be, they cannot realistically fulfill their responsibilities until all parties concerned are prepared to submit their data to centralized databases. To do this they need to be provided with adequate tools and incentives. Provision of the tools is the task of the database organizations. Provision of incentives is partly a question of adequate peer recognition for direct submission, partly a willingness to openly share information with the community at large, and partly the need for funding organizations to insist on data sharing as a requisite for their continued support.
An autosomal recessive mutation affecting hair and eye pigmentation was discovered in the F2 progeny of wild-type deer mice (Peromyscus maniculatus) trapped near East Lansing, Michigan. When homozygous, the mutation (designated blonde, bl) reduces both black and yellow pigmentation deposited in the fur, reduces or eliminates pigmentation in the non-follicular melanocytes of the outer ear, peri-orbital skin, and tail, slightly reduces the amount of pigmentation in the choroidal melanocytes, and completely eliminates pigmentation of the retinal epithelium.
Data are presented to show that the ingestion of sublethal quantities of toxic bait can modify rodents' subsequent acceptance of that and other baits by way of two separate mechanisms: a learned aversion to that specific toxicant-bait combination and a nonlearned aversion to novel foods in general. A test procedure is offered which provides separate measures of the degree of bait shyness produced by these two mechanisms and which also avoids confounding the results with the direct effect of toxicant flavor upon bait palatability.
An investigation was made of the occurrence of learned and nonlearned aversions in the acquisition of illness-induced taste aversions in mice of the genus Peromyscus. It was determined: (1) that illness following the ingestion of a novel flavor both produced aversions specific to that flavor and also enhanced neophobia directed toward novel flavors in general; (2) that the specific aversion and the enhanced neophobia appeared to be mediated by independent processes, with no indication that the enhanced neophobia was dependent upon the integrity of the specific aversion; and (3) that illness following the ingestion of familiar water produced enhanced neophobia, which did not appear to be mediated by an aversion to water. It was noted that the results were fundamentally in agreement with those previously obtained with laboratory rats, except that a demonstration of the independence between the two types of aversions has not yet been reported in those animals.
An examination of the effect of sex upon taste-aversion learning in deer mice (Peromyscus maniculatus bairdi) found that (a) sex has no apparent effect upon either the acquisition or the extinction of a LiCl-induced aversion to sucrose solution if the animals are tested while fluid deprived, but that (b) if the animals are tested under nondeprived conditions, males exhibit a greater initial aversion than females but both sexes seem to extinguish their aversions at similar rates. These findings differ from those previously reported for laboratory rats, in which it has been found that sex affects the extinction but not the acquisition of poison-induced taste aversions. It was suggested that either (a) sex interacts with taste-aversion learning via different mechanisms in deer mice and in rats, or (b) the apparent differences in extinction rates reported for rats might conceivably reflect differences in initial aversion strength which were undetected due to the use of high doses of toxin.
Although bait shyness has long been recognized as a problem to be overcome in the control of vertebrate pests, it has recently been suggested that the phenomenon might be turned to an advantage and used as an alternative, non-lethal form of control. Unfortunately, this technique has not proven to be as useful as hoped, as the work which has been done on coyotes is inconclusive at best and some recent work on rodents has cast serious doubts upon the method's potential. However, an extensive literature dealing with the formation of poison-based food aversions now exists, and insights gained from these studies can be used to increase the efficacy of traditional, lethal control techniques. For example, the efficacy of pre-baiting may be greatly increased if the pre-bait is treated with a non-toxic flavor which mimics the flavor of the subsequently used toxin, even if this non-toxic flavor decreases the acceptability of the pre-bait.
An investigation was made of the effects of prior familiarization with sucrose on the acquisition and extinction of LiCl-induced aversions to sucrose by mice of the genus Peromyscus. As in previous studies on other species, it was found that flavor familiarization inhibits the formation of learned taste aversions. However, in contrast to some reports on other species, it was demonstrated that for Peromyscus familiarization does not accelerate, but instead retards, the extinction of taste aversions. It was noted that (a) the contrasting extinction results reported for other species may be confounded with masked acquisition effects, (b) the latent inhibition effect is often not obtained with fewer than 20 preexposures, yet the flavor-preexposure effect has been demonstrated with as few as one preexposure, (c) the flavor-preexposure schedule is logically and operationally equivalent to a short partial-reinforcement schedule, and (d) both the acquisition and extinction effects shown by Peromyscus are consistent with a partial-reinforcement interpretation. Therefore, it was suggested that future analysis of the phenomenon might profitably consider the possibility that the flavor-preexposure effect upon taste-aversion learning may be a case of partial reinforcement.
A series of experiments tested the ability of mice of the native genus Peromyscus to form learned taste aversions. It was found that (a) the mice acquired a strong aversion after a single flavor/toxicosis pairing, (b) naive mice drinking a LiCl solution apparently began to experience toxic effects within 90 sec after the beginning of consumption, (c) the mice acquired a total aversion after a single flavor/delayed illness pairing when high doses of toxin were employed, and (d) the aversion produced by a single flavor/delayed-illness pairing was specific to the flavor paired with illness and was dependent on the contingency between the flavor and illness. Although these responses are qualitatively similar to those reported for domestic rats, the mice formed considerably weaker aversions than those previously reported for laboratory rats tested with the same weight-specific doses of LiCl.
Progress on the project to study the effects of the ELF Communication System on small mammals and nesting birds is detailed for the base period, 1982. Initial population surveys were conducted, which showed that the main study species, the Black-capped Chickadee (Parus atricapillus) and the deer mouse (Peromyscus gracilis), were present and abundant on the pilot plot and on several other plots that are potential sites for establishing permanent plots once the route of the ELF antenna is known. The pilot plot will serve as a testing site for studies of parental care, nestling growth and maturation, fecundity, homing, activity patterns, embryological development, and metabolic physiology. Nesting boxes were placed on the pilot plot in September and October 1982 to ensure that animals would be available for these studies in the spring of 1983. A database management system was developed to coordinate the collection and analysis of data for the planned studies.
As the Human Genome Project (HGP) moves toward its successful completion, more and more people have become interested in understanding this project and its results. Since the HGP has significant ethical, legal, and social implications for all citizens, the number of individuals who do, or should, wish to become familiar with the project is high. In addition to its importance in the training of professional geneticists, the HGP is of special relevance for undergraduate training in basic biology, and even for high-school and other K-12 education. Understanding the results of HGP research requires a familiarity with the notions of basic genetics. Unlike other disciplines that evolved over centuries, modern genetics began abruptly with the rediscovery of Gregor Mendel's work in 1900. Within a few years, fundamental concepts were elaborated and the foundations of genetics established. Because genetics developed so rapidly in just a few decades after 1900, the literature of that period constitutes a valuable resource even now. It may be read profitably by students and scientists wishing to understand the foundations of their field, as well as by laymen or historians of science. Unfortunately, the early literature is rapidly becoming almost inaccessible. Newer libraries do not hold older journals, and even established libraries are moving their materials from that era into hard-to-reach (and impossible to browse) long-term storage in remote warehouses. To be sure, key studies from the early work are discussed in nearly all textbooks, but a comparison of these presentations with the actual literature shows that most textbook treatments have essentially mythologized the early work so that real understanding is lost. There have been several collections of classic works developed over the years (although none lately), but these suffer from the effects of the necessary, but nonetheless pernicious, highly selective sampling that accompanies these projects. Such selectivity, coupled with introductions that offer essentially modern interpretations of the work, obscures the intellectual rigor and excitement of the original efforts.