Karl Czajkowski's research while affiliated with University of Southern California and other places

Publications (51)

Article
Full-text available
The Common Fund Data Ecosystem (CFDE) has created a flexible system of data federation that enables researchers to discover datasets from across the US National Institutes of Health Common Fund without requiring that data owners move, reformat, or rehost those data. This system is centered on a catalog that integrates detailed descriptions of biome...
Article
Full-text available
Significance Imaging of labeled excitatory synapses in the intact brain before and after classical conditioning permits a longitudinal analysis of changes that accompany associative memory formation. When applied to midlarval stage zebrafish, this approach reveals adjacent regions of synapse gain and loss in the lateral and medial pallium, respecti...
Preprint
Full-text available
Discovery of new knowledge is increasingly data-driven, predicated on a team's ability to collaboratively create, find, analyze, retrieve, and share pertinent datasets over the duration of an investigation. This is especially true in the domain of scientific discovery where generation, analysis, and interpretation of data are the fundamental mechan...
Conference Paper
Database evolution is a notoriously difficult task, and it is exacerbated by the necessity to evolve database-dependent applications. As science becomes increasingly dependent on sophisticated data management, the need to evolve an array of database-driven systems will only intensify. In this paper, we present an architecture for data-centric ecosy...
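As a rough illustration of the paper's central idea, evolving a schema and its dependent artifacts as one planned unit rather than as isolated ALTER statements, here is a minimal Python sketch. All names (EvolutionPlan, SchemaChange) are hypothetical, chosen only to illustrate the pattern, and are not taken from the paper.

```python
# A minimal sketch (not the paper's actual API) of the idea that a schema
# change and its dependent artifacts evolve together as one planned unit.
from dataclasses import dataclass, field

@dataclass
class SchemaChange:
    """One atomic evolution step, e.g. renaming a column."""
    description: str
    forward_sql: str
    backward_sql: str

@dataclass
class EvolutionPlan:
    """Groups schema changes with the dependent-view rebuilds they imply,
    so a database and its applications move in lockstep."""
    changes: list = field(default_factory=list)

    def add(self, change: SchemaChange):
        self.changes.append(change)
        return self

    def apply(self, execute):
        for c in self.changes:
            execute(c.forward_sql)

plan = (EvolutionPlan()
        .add(SchemaChange("rename specimen column",
                          'ALTER TABLE specimen RENAME COLUMN "desc" TO description;',
                          'ALTER TABLE specimen RENAME COLUMN description TO "desc";'))
        .add(SchemaChange("rebuild dependent view",
                          "CREATE OR REPLACE VIEW specimen_summary AS "
                          "SELECT id, description FROM specimen;",
                          "-- previous view definition would be restored here")))

plan.apply(print)  # stand-in executor; a real one would run against the DB
```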
Conference Paper
The foundation of data-oriented scientific collaboration is the ability for participants to find, access and reuse data created during the course of an investigation, which has been referred to as the FAIR principles. In this paper, we describe ERMrest, a collaborative data management service that promotes data-oriented collaboration by enabling FAI...
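ERMrest exposes catalog data through a RESTful entity API. As a hedged illustration, the Python sketch below issues an entity query following the URL pattern ERMrest documents; the host, catalog id, schema, table, and filter values are placeholders, not a real deployment.

```python
# A hedged sketch of querying an ERMrest catalog over its REST interface.
# The URL pattern follows ERMrest's entity API; host, catalog id, schema,
# table, and filters below are illustrative placeholders.
import requests

HOST = "https://example.org"   # assumption: an ERMrest deployment
CATALOG = 1                    # assumption: catalog id

def get_entities(schema, table, **filters):
    """Fetch rows of <schema>:<table>, optionally filtered by column=value."""
    path = f"{HOST}/ermrest/catalog/{CATALOG}/entity/{schema}:{table}"
    if filters:
        # conjunctive filters appear as one column=value predicate segment
        predicate = "&".join(f"{k}={v}" for k, v in filters.items())
        path = f"{path}/{predicate}"
    resp = requests.get(path, headers={"Accept": "application/json"})
    resp.raise_for_status()
    return resp.json()   # entities are returned as a JSON array

# e.g. rows = get_entities("isa", "dataset", released="true")
```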
Conference Paper
Full-text available
The pace of discovery in eScience is increasingly dependent on a scientist's ability to acquire, curate, integrate, analyze, and share large and diverse collections of data. It is all too common for investigators to spend inordinate amounts of time developing ad hoc procedures to manage their data. In previous work, we presented Deriva, a Scientifi...
Conference Paper
Creating and maintaining an accurate description of data assets and the relationships between assets is a critical aspect of making data findable, accessible, interoperable, and reusable (FAIR). Typically, such metadata are created and maintained in a data catalog by a curator as part of data publication. However, allowing metadata to be created an...
Article
Scientific discovery is increasingly dependent on a scientist's ability to acquire, curate, integrate, analyze, and share large and diverse collections of data. While the details vary from domain to domain, these data often consist of diverse digital assets (e.g. image files, sequence data, or simulation outputs) that are organized with complex rel...
Conference Paper
Full-text available
The overhead and burden of managing data in complex discovery processes involving experimental protocols with numerous data-producing and computational steps has become the gating factor that determines the pace of discovery. The lack of comprehensive systems to capture, manage, organize and retrieve data throughout the discovery life cycle leads t...
Conference Paper
Full-text available
Increasingly, scientific discovery is driven by the analysis, manipulation, organization, annotation, sharing, and reuse of high-value scientific data. While great attention has been given to the specifics of analyzing and mining data, we find that there are almost no tools or systematic infrastructure to facilitate the process of discovery from d...
Article
Full-text available
Biomedical research depends upon increasingly high throughput instruments and sophisticated data analytics. In spite of the significant overhead of handling research data, there is little support for researchers to manage and organize data for purposes of exploration, analysis, and ultimately publication. Shared file systems with metadata coded int...
Conference Paper
Full-text available
Increasingly, advances in biomedical research are the result of combining and analyzing heterogeneous data types from different sources, spanning genomic, proteomic, imaging, and clinical data. Yet despite the proliferation of data-driven methods, tools to support the integration and management of large collections of data for purposes of data driv...
Article
Full-text available
One of the main challenges in Grid computing is efficient allocation of resources (CPU-hours, network bandwidth, etc.) to the tasks submitted by users. Due to the lack of centralized control and the dynamic/stochastic nature of resource availability, any successful allocation mechanism should be highly distributed and robust to the changes in the...
Article
One of the criteria for the Grid infrastructure is the ability to share resources with nontrivial qualities of service. However, sharing resources in Grids is complicated in that it requires the ability to bridge the differing policy requirements of the resource owners to create a consistent cross-organizational policy domain that delivers the necessa...
Article
We often encounter in distributed systems the need to model, access, and manage state. This state may be, for example, data in a purchase order, service level agreements representing resource availability, or the current load on a computer. We introduce two closely related approaches to modeling and manipulating state within a Web services (WS) fra...
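The notion of addressable state described here can be sketched compactly: a service front end kept stateless, with state factored into identified resources whose properties are read and written by reference. All names below are hypothetical Python stand-ins for what the WS specifications express as message exchanges.

```python
# An illustrative sketch (names are hypothetical) of the WS-Resource idea:
# a stateless service front end, with state factored into identifiable
# resources whose properties clients read and update by reference.
class ResourceHome:
    """Holds stateful resources keyed by an identifier that travels
    with each message (the 'implied resource' pattern)."""
    def __init__(self):
        self._resources = {}

    def create(self, key, **properties):
        self._resources[key] = dict(properties)
        return key

    def get_property(self, key, name):
        return self._resources[key][name]

    def set_property(self, key, name, value):
        self._resources[key][name] = value

    def destroy(self, key):
        del self._resources[key]    # explicit lifetime management

home = ResourceHome()
ref = home.create("order-17", status="pending", quantity=12)
home.set_property(ref, "status", "approved")
print(home.get_property(ref, "status"))   # -> approved
```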
Article
This chapter focuses on the management of Grid resources and services. It introduces a generalized resource management framework and uses it as a basis for characterizing existing approaches and for defining a direction for resource management development, particularly as framed within the Open Grid Services Architecture. Although there are many fa...
Article
Recent scientific and engineering advances increase the demands on tools for high performance interactive visual exploration of large-scale, multi-dimensional simulation and sensor-based datasets. For example, earthquake scientists can now study earthquake phenomena in detail via "first principle," physics-based, large-scale simulations in a time-v...
Article
Full-text available
One of the main challenges in Grid computing is efficient allocation of resources (CPU-hours, network bandwidth, etc.) to the tasks submitted by users. Due to the lack of centralized control and the dynamic/stochastic nature of resource availability, any successful allocation mechanism should be highly distributed and robust to the changes in the G...
Article
We present a reformulation of the well-known GRAM architecture based on the Service-Level Agreement (SLA) negotiation protocols defined within the Service Negotiation and Access Protocol (SNAP) framework. We illustrate how a range of local, distributed, and workflow scheduling mechanisms can be viewed as part of a cohesive yet open system, in whi...
Article
We present a reformulation of the well-known GRAM architecture based on the Service-Level Agreement (SLA) negotiation protocols defined within the Service Negotiation and Access Protocol (SNAP) framework. We illustrate how a range of local, distributed, and workflow scheduling mechanisms can be viewed as part of a cohesive yet open syste...
Article
The WS-Resource construct has been proposed as a means of expressing the relationship between stateful resources and Web services. We introduce here the WS-Resource framework, a set of proposed Web services specifications that define a rendering of the WS-Resource approach in terms of specific message exchanges and related XML definitions. These sp...
Article
Computational grids are enabling collaboration between scientists and organizations to generate and archive extremely large datasets across shared, distributed resources. There is a need to visually explore such data throughout the life-cycle of projects. Practical exploration of large datasets requires visualization tools that can function in the...
Conference Paper
Grid computing is concerned with the sharing and coordinated use of diverse resources in distributed "virtual organizations." The dynamic and multi-institutional nature of these environments introduces challenging security issues that demand new technical approaches. In particular, one must deal with diverse local mechanisms, support dynamic creatio...
Article
Full-text available
This document provides information to the community regarding the specification of the Agreement-Based Grid Service Management (OGSI-Agreement) model. Distribution of this document is unlimited. The Open Grid Services Architecture (OGSA) integrates Grid technologies with Web services mechanisms to create a distributed computing framework based arou...
Conference Paper
A fundamental problem in distributed computing is to map activities such as computation or data transfer onto resources that meet requirements for performance, cost, security, or other quality of service metrics. The creation of such mappings requires negotiation among application and resources to discover, reserve, acquire, configure, and monitor...
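The SNAP model distinguishes resource promises, task requirements, and the bindings that apply one to the other. The Python sketch below illustrates those three agreement kinds; the field names and the validity check are illustrative assumptions, not the protocol's wire format.

```python
# A minimal sketch, using hypothetical field names, of the three agreement
# kinds the SNAP model distinguishes: a resource promise (RSLA), a task
# description (TSLA), and a binding of task to resource (BSLA).
from dataclasses import dataclass

@dataclass
class RSLA:
    """Resource SLA: a provider's promise of capability over a time window."""
    provider: str
    capability: dict     # e.g. {"cpus": 16, "memory_gb": 64}
    start: float
    end: float

@dataclass
class TSLA:
    """Task SLA: what the client wants performed, with its requirements."""
    task: str
    requirements: dict   # e.g. {"cpus": 8}

@dataclass
class BSLA:
    """Binding SLA: applies a resource promise to a specific task."""
    rsla: RSLA
    tsla: TSLA

    def valid(self):
        # the binding holds only if the promise covers the requirements
        return all(self.rsla.capability.get(k, 0) >= v
                   for k, v in self.tsla.requirements.items())

r = RSLA("clusterA", {"cpus": 16, "memory_gb": 64}, 0.0, 3600.0)
t = TSLA("reconstruct-run-42", {"cpus": 8})
print(BSLA(r, t).valid())   # -> True
```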
Article
Building on both Grid and Web services technologies, the Open Grid Services Architecture (OGSA) defines mechanisms for creating, managing, and exchanging information among entities called Grid services. Succinctly, a Grid service is a Web service that conforms to a set of conventions (interfaces and behaviors) that define how a client interacts wit...
Conference Paper
Computational grids are enabling collaboration between scientists and organizations to generate and archive extremely large datasets across shared, distributed resources. There is a need to visually explore such data throughout the life-cycle of projects. Practical exploration of large datasets requires visualization tools that can function in the...
Conference Paper
Grid technologies enable large-scale sharing of resources within formal or informal consortia of individuals and/or institutions: what are sometimes called virtual organizations. In these settings, the discovery, characterization, and monitoring of resources, services, and computations are challenging problems due to the considerable diversity; lar...
Conference Paper
Applications designed to execute on “computational grids” frequently require the simultaneous co-allocation of multiple resources in order to meet performance requirements. For example, several computers and network elements may be required in order to achieve real-time reconstruction of experimental data, while a large numerical simulation may req...
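The essence of co-allocation is atomicity: either every required resource is reserved together, or none are held. A minimal Python sketch of that all-or-nothing behavior follows, with hypothetical Reservation and co_allocate names.

```python
# An illustrative all-or-nothing co-allocation sketch (hypothetical API):
# reserve every required resource, and release the partial set if any
# single reservation fails, since the application needs all of them at once.
class Reservation:
    def __init__(self, name, available=True):
        self.name, self.available, self.held = name, available, False

    def reserve(self):
        if not self.available:
            raise RuntimeError(f"{self.name} unavailable")
        self.held = True

    def release(self):
        self.held = False

def co_allocate(resources):
    acquired = []
    try:
        for r in resources:
            r.reserve()
            acquired.append(r)
        return acquired                 # all resources held together
    except RuntimeError:
        for r in acquired:              # roll back the partial allocation
            r.release()
        raise

nodes = [Reservation("computeA"), Reservation("computeB"),
         Reservation("net-path-1", available=False)]
try:
    co_allocate(nodes)
except RuntimeError as e:
    print("co-allocation aborted:", e)
```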
Conference Paper
Full-text available
The development of applications and tools for high-performance “computational grids” is complicated by the heterogeneity and frequently dynamic behavior of the underlying resources; by the complexity of the applications themselves, which often combine aspects of supercomputing and distributed computing; and by the need to achieve high levels of per...
Conference Paper
Advances in networking infrastructure have made it possible to build very large scale applications whose execution spans multiple supercomputers. In such very large scale or ultra-scale applications, a central requirement is the ability to simultaneously co-allocate large collections of resources, to initiate a computation on those resources and to...
Article
Metacomputing systems are intended to support remote and/or concurrent use of geographically distributed computational resources. Resource management in such systems is complicated by five concerns that do not typically arise in other situations: site autonomy and heterogeneous substrates at the resources, and application requirements for policy ex...
Article
The GRAAP working group of the Global Grid Forum is drafting a specification for the management of resources and services using negotiated service level agreements in a Web services environment (WS-Agreement). This memo discusses ongoing design considerations for this activity, focusing on the desire to strike a balance between goals for flexibility...
Article
This document specifies the Grid Notification Framework developed for the Grid community. It describes a generic notification framework through which information about the existence of a grid entity, as well as properties about its state...
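The publish/subscribe pattern the framework describes can be sketched compactly: sources accept subscriptions for an entity and push state notifications to each registered sink. The Python names below are illustrative stand-ins, not the framework's actual interfaces.

```python
# A minimal publish/subscribe sketch (hypothetical names) of the pattern
# the framework describes: sources push state notifications to any sink
# that has registered interest in an entity.
from collections import defaultdict

class NotificationSource:
    def __init__(self):
        self._sinks = defaultdict(list)   # entity -> interested sinks

    def subscribe(self, entity, sink):
        self._sinks[entity].append(sink)

    def notify(self, entity, **state):
        for sink in self._sinks[entity]:
            sink(entity, state)

source = NotificationSource()
source.subscribe("worker-3", lambda e, s: print(f"{e} changed: {s}"))
source.notify("worker-3", load=0.82, status="busy")
```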
Article
Building on both Grid and Web services technologies, the Open Grid Services Infrastructure (OGSI) defines mechanisms for creating, managing, and exchanging information among entities called Grid services. Succinctly, a Grid service is a Web service that conforms to a set of conventions (interfaces and behaviors) that define how a client interacts w...

Citations

... Furthermore, initiatives are being proposed to develop cloud environments to store, manage, and analyze data in effective, scalable, and secure ways (Schatz et al., 2022). Similarly, models have been developed to create an ecosystem to improve data description and hosting (Charbonneau et al., 2022). ...
... However, automated database schema evolution is a non-trivial task [8], [9] and CI/CD for relational database applications is still rarely applied [10]. It remains one of the most challenging aspects of database development [11]. Database design [12], testing [13] [14], data quality [15], and schema evolution [16] are preconditions for successful CI/CD adoption. ...
... FaceBase uses the DERIVA (Discovery Environment for Relational Information and Versioned Assets) platform for data-intensive sciences built with FAIR principles at its core (Schuler et al. 2016; Bugacov et al. 2017). DERIVA provides a unique platform for building online data resources consisting of a metadata catalog with rich data modeling facilities and expressive ad hoc query support (Czajkowski et al. 2018), file storage with versioning and fixity, persistent identifiers for all data records, user-friendly desktop applications for bulk data transfer and validation, semantic file containers (Chard et al. 2016), and intuitive user interfaces for searching, displaying, and data entry that adapt to any data model implemented in the system (Tangmunarunkit et al. 2021). Building on this foundation, we devised and employed the following strategy for self-serve data curation. ...
... Once a data package is uploaded, a DERIVA [44] database instance automatically begins ingesting it, performing further validation using a custom validation script. Users are notified by email when the ingest process has completed and are provided with a link to preview the data in a secure section of the CFDE portal (or to access a description of any ingest errors). ...
... This can include automated workflows for approval processes and user authentication mechanisms [8,15,25,40]. Such functionality ensures that the visibility of catalog content needs to be unlocked by access requests and the assignment of appropriate access keys [5,41]. As a more recent development, Artificial Intelligence (AI) can be used to identify sensitive or secret data by assigning attributes or to display data sets that are not accessible to the user [15,23,24]. ...
... If the goal is simply to facilitate enhanced annotation of a singular, very specific kind of study data, then one does not need the sophistication of the technology that CEDAR offers. For example, developers associated with the NIH Biomedical Informatics Research Network (BIRN) created an attractive, hardcoded interface for annotating their particular kinds of neuroimaging data as part of their Digital Asset Management System (DAMS) 54 . The Stanford Microarray Database developed a similarly well-crafted tool, known as Annotare 55 , for entering metadata about gene-expression datasets in accordance with hardcoded MAGE-Tab descriptions 50 . ...
... The Entity-relationship model, put forward by P. P. Chen in 1976 [28], was an efficient method used to design database schema in different subjects, such as internal control construction of a transaction system [29], data storage service for web-based, data-oriented collaboration [30] and a system of adult extended education [31]. Although improved entity-relationship models have been put forward [32][33][34][35], it is more convenient to directly modify the rules of the basic entity-relationship model in this article. ...
... Thus, these tools cannot be used to enable fine-grained data-driven rule engines, such as R . Other data-driven policy engines, such as IOBox [12], also require individual data events. IOBox is an extract, transform, and load (ETL) system, designed to crawl and monitor local file systems to detect file events, apply pattern matching, and invoke actions. ...
... In previous work [2] we introduced the concept of a BDAM for biomedical research and described our experiences with several prototypical user studies which had informed the early design of the system. In this paper, we expand that discussion to describe in detail the architecture, design and implementation of the BDAM catalog service as well as the use of BDAM in a major research center and microscopy core for stem cell-based kidney research. ...