
Varun Ratnakar
- Master of Science
- University of Southern California
About
76 Publications · 13,699 Reads · 2,209 Citations
Publications (76)
Collaborative and multi-site neuroimaging studies have greatly accelerated the rate at which new and existing data can be aggregated to answer a neuroscientific question. New research initiatives are continuously collecting more data, allowing opportunities to refine previous published findings through continuous and dynamic updates. Yet, we lack a...
We present a Python package geared toward the intuitive analysis and visualization of paleoclimate timeseries, Pyleoclim. The code is open‐source, object‐oriented, and built upon the standard scientific Python stack, allowing users to take advantage of a large collection of existing and emerging techniques. We describe the code's philosophy, struct...
Understanding the interactions between natural processes and human activities poses major challenges as it requires the integration of models and data across disparate disciplines. It typically takes many months and even years to create valid end-to-end simulations as the different models need to be configured in consistent ways so their results can b...
Major societal and environmental challenges involve complex systems that have diverse multi-scale interacting processes. Consider, for example, how droughts and water reserves affect crop production and how agriculture and industrial needs affect water quality and availability. Preventive measures, such as delaying planting dates and adopting new a...
Understanding the impacts of climate change on natural and human systems poses major challenges as it requires the integration of models and data across various disciplines, including hydrology, agriculture, ecosystem modeling, and econometrics. While tactical situations arising from an extreme weather event require rapid responses, integrating the...
The progress of science is tied to the standardization of measurements, instruments, and data. This is especially true in the Big Data age, where analyzing large data volumes critically hinges on the data being standardized. Accordingly, the lack of community‐sanctioned data standards in paleoclimatology has largely precluded the benefits of Big Da...
Scientific data generation in the world is continuous. However, scientific studies once published do not take advantage of new data. In order to leverage this incoming flow of data, we present Neuro-DISK, an end-to-end framework to continuously process neuroscience data and update the assessment of a given hypothesis as new data become available. O...
Understanding the interactions between natural processes and human activities poses major challenges as it requires the integration of models and data across disparate disciplines. It typically takes many months and even years to create valid end-to-end simulations as different models need to be configured in consistent ways and generate data that...
Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and Dialogue for Reverse Engineering Assessments and Methods (DREAM) have been instrumental in driving the development of bioinformatics methods. Typically, challenges are posted, and then competitors perform a prediction based upon blinded test data. Challengers t...
The cerebral cortex underlies our complex cognitive capabilities, yet we know little about the specific genetic loci influencing human cortical structure. To identify genetic variants impacting cortical structure, we conducted a genome-wide association meta-analysis of brain MRI data from 35,660 individuals with replication in 15,578 individuals. W...
Model repositories are key resources for scientists in terms of model discovery and reuse, but do not focus on important tasks such as model comparison and composition. Model repositories do not typically capture important comparative metadata to describe assumptions and model variables that enable a scientist to discern which models would be bette...
Traditional approaches to ontology development have a large lapse between the time when a user using the ontology has found a need to extend it and the time when it does get extended. For scientists, this delay can be weeks or months and can be a significant barrier for adoption. We present a new approach to ontology development and data annotation...
Scientific collaborations involving multiple institutions are increasingly commonplace. It is not unusual for publications to have dozens or hundreds of authors, in some cases even a few thousand. Gathering the information for such papers may be very time consuming, since the author list must include authors who made different kinds of contributio...
Scientific data is continuously generated throughout the world. However, analyses of these data are typically performed exactly once and on a small fragment of recently generated data. Ideally, data analysis would be a continuous process that uses all the data available at the time, and would be automatically re-run and updated when new data appear...
OntoSoft is a distributed semantic registry for scientific software. This paper describes three major novel contributions of OntoSoft: 1) a software metadata registry designed for scientists, 2) a distributed approach to software registries that targets communities of interest, and 3) metadata crowdsourcing through access control. Software metadata...
This software implements the DISK approach with a portal for creating hypotheses and lines of inquiry, and testing them against the WINGS workflow system.
DISK can formulate new hypotheses by refining user-provided hypothesis statements. For example, in the omics domain, DISK is aware of the concept of mutations. As part of its automated analysis, DISK...
This bundle contains a web page describing the materials used in the evaluation of the paper, along with references to the software and datasets, provenance metadata and workflows used. All the scripts and descriptions are included as well.
Contributors in hundreds of semantic wiki sites are creating structured information in RDF every day, thus growing the semantic content of the Web in spades. Although wikis have been analyzed extensively, there has been little analysis of the use of semantic wikis. The Provenance Bee Wiki was created to gather and aggregate data from these sites, s...
This paper presents OntoSoft, an ontology to describe metadata for scientific software. The ontology is designed considering how scientists would approach the reuse and sharing of software. This includes supporting a scientist to: 1) identify software, 2) understand and assess software, 3) execute software, 4) get support for the software, 5) do re...
Although science has become an increasingly collaborative endeavor over the last hundred years, little attention has been devoted to supporting scientific communities. Our work focuses on scientific collaborations that revolve around complex science questions that require significant coordination to synthesize multidisciplinary findings, entic...
Collaboration is ubiquitous in today's science, yet there is limited support for coordinating scientific work. The general-purpose tools that are typically used (e.g., email, shared document editing, social coding sites), have still not replaced in-person meetings, phone calls, and extensive emails needed to coordinate and track collaborative activ...
Recent highly publicized cases of premature patient assignment into clinical trials, resulting from non-reproducible omics analyses, have prompted many to call for a more thorough examination of translational omics and highlighted the critical need for transparency and reproducibility to ensure patient safety. The use of workflow platforms such as...
The Web was originally developed to support collaboration in science. Although scientists benefit from many forms of collaboration on the Web (e.g., blogs, wikis, forums, code sharing, etc.), most collaborative projects are coordinated over email, phone calls, and in-person meetings. Our goal is to develop a collaborative infrastructure for scienti...
Although collaborative activities are paramount in science, little attention has been devoted to supporting on-line scientific collaborations. Our work focuses on scientific collaborations that revolve around complex science questions that require significant coordination to synthesize multidisciplinary findings, enticing contributors to remain eng...
This paper gives an overview of the Organic Data Science framework, a new approach for scientific collaboration that opens the science process and exposes information about shared tasks, participants, and other relevant entities. The framework enables scientists to formulate new tasks and contribute to tasks posed by others. The framework is curren...
Domain experts are often untrained in big data technologies and this limits their ability to exploit the data they have available. Workflow systems hide the complexities of high-end computing and software engineering by offering pre-packaged analytic steps combined into multi-step methods commonly used by experts. A current limitation of workflow s...
Semantic wikis augment wikis with semantic properties that can be used to aggregate and query data through reasoning. Semantic wikis are used by many communities, for widely varying purposes such as organizing genomic knowledge, coding software, and tracking environmental data. Although wikis have been analyzed extensively, there has been no publis...
Semantic wikis augment wikis with semantic properties that can be used to structure content that can therefore be aggregated and queried through reasoning. Semantic wikis have been adopted by many communities for very diverse purposes, such as organizing genomic knowledge, coding software, learning about hobbies, and tracking environmental data. Altho...
Although to-do lists are a ubiquitous form of personal task management, there has been no work on intelligent assistance to automate, elaborate, or coordinate a user’s to-dos. Our research focuses on three aspects of intelligent assistance for to-dos. We investigated the use of intelligent agents to automate to-dos in an office setting. We collecte...
Personal tasks are managed with a variety of mechanisms from To-Do lists to calendars to emails. Task management remains challenging, since many tasks are interrelated, some may depend on other people's tasks to be accomplished, and their priority and status changes over time. While many tasks could be automated by services and agents available on...
Computational workflows are a powerful paradigm to represent and manage complex applications, particularly in large-scale distributed scientific data analysis. Workflows represent application components that result in individual computations as well as their interdependences in terms of dataflow. Workflow systems use these representations to manage...
The San Joaquin River (SJR) restoration effort began in October 2009 with the onset of federally mandated continuous flow. A key objective of the effort is to restore and maintain fish populations in the main stem of the San Joaquin River, from below the Friant Dam to the confluence of the Merced River. In addition to the renewed flows, the restora...
Scientific metadata containing semantic descriptions of scientific data is expensive to capture and is typically not used across entire data analytic processes. We present an approach where semantic metadata is generated as scientific data is being prepared, and then subsequently used to configure models and to customize them to the data. The metad...
Shortipedia is a Web-based knowledge repository that pulls together a growing number of sources in order to provide a comprehensive, diversified view on entities of interest. Contributors to Shortipedia can easily add claims to the knowledge base, provide sources for their claims, and find links to knowledge already available on the Semantic Web.
This paper describes an approach to allow end users to define new procedures through tutorial instruction. Our approach allows users to specify procedures in natural language in the same way that they would instruct another person, while the system handles incompleteness and ambiguity inherent in natural human instruction and formulates follow up q...
To assist scientists in data analysis tasks, we have developed semantic workflow representations that support automatic constraint propagation and reasoning algorithms to manage constraints among the individual workflow steps. Semantic constraints can be used to represent requirements of input datasets as well as best practices for the method repre...
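The constraint-propagation idea in the abstract above can be illustrated with a minimal, purely hypothetical sketch: each workflow step declares what metadata its input must satisfy and what metadata it asserts about its output, and constraints flow forward step by step. The step structure, field names, and `propagate` function here are invented for illustration and are not the actual Wings API.

```python
# Hypothetical sketch of metadata constraint propagation through a
# linear workflow; names are invented, not taken from Wings.

def propagate(steps, dataset_metadata):
    """Flow metadata forward, checking each step's input requirements."""
    metadata = dict(dataset_metadata)
    for step in steps:
        # Rule out invalid designs: input must satisfy the step's constraints.
        for key, expected in step["requires"].items():
            if metadata.get(key) != expected:
                raise ValueError(
                    f"step {step['name']!r}: input violates {key}={expected!r}"
                )
        # Propagate what this step asserts about its output.
        metadata.update(step["adds"])
    return metadata

steps = [
    {"name": "normalize", "requires": {"format": "csv"}, "adds": {"normalized": True}},
    {"name": "cluster", "requires": {"normalized": True}, "adds": {"labeled": True}},
]
result = propagate(steps, {"format": "csv"})
```

A dataset lacking `format: csv` would be rejected at the first step, which is the sense in which such reasoning can encode input requirements and best practices.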
Describes the Wings intelligent workflow system that assists scientists with designing computational experiments by automatically tracking constraints and ruling out invalid designs, letting scientists focus on their experiments and goals.
To-Do lists are widely used for personal task management. We propose a novel approach to assist users in managing their To-Dos by matching them to How-To knowledge from the Web. We have implemented a system that, given a To-Do item, provides a number of possibly matching How-Tos, broken down into steps that can be used as new To-Do entries. Our imp...
Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multidimensional parameter space consisting of input performance paramet...
Workflow systems can manage complex scientific applications with distributed data processing. Although some workflow systems can represent collections of data with very compact abstractions and manage their execution efficiently, there are no approaches to date to manage collections of application components required to express some scientific appl...
Workflows are becoming an increasingly more common paradigm to manage scientific analyses. As workflow repositories start to emerge, workflow retrieval and discovery becomes a challenge. Studies have shown that scientists wish to discover workflows given properties of workflow data inputs, intermediate data products, and data results. However, work...
Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. While some performance parameters...
The goal of this effort was to drastically reduce the human effort required to configure and execute new workflows for data analysis from weeks to minutes by eliminating the need for costly human monitoring and intervention. This involves developing end-to-end data analysis systems to analyze data from many different sources and with many different...
This paper describes our experience to date employing the systematic mapping and optimization of large- scale scientific application workflows to current and future parallel platforms. The overall goal of the project is to integrate a set of system layers - application program, compiler, run-time environment, knowledge representation, optimization...
Our research focuses on creating and executing large-scale scientific workflows that often involve thousands of computations over distributed, shared resources. We describe an approach to workflow creation and refinement that uses semantic representations to 1) describe complex scientific applications in a data-independent manner, 2) automatically...
The first Provenance Challenge was set up in order to provide a forum for the community to help understand the capabilities of different provenance systems and the expressiveness of their provenance representations. To this end, a Functional Magnetic Resonance Imaging workflow was defined, which participants had to either simulate or run in order t...
Assisting users with to-do lists presents new challenges for intelligent user interfaces. This paper presents a detailed analysis of to-do list entries jotted by users of a system that automates tasks for users that we would like to extend to assist users with their to-do entries. We also present four distinct stages of interpretation of to-do entr...
To-do lists have been found to be the most popular personal information management tools, yet there is no automated system to interpret and act upon them when appropriate on behalf of the user. Automating to-do lists is challenging, not only because they are specified as free text but also because most items contain abbreviated tasks, many do not s...
Scientific workflows are being developed for many domains as a useful paradigm to manage complex scientific computations. In our work, we are challenged with efficiently generating and validating workflows that contain large amounts (hundreds to thousands) of individual computations to be executed over distributed environments. This paper describes...
Collaborative e-Science projects commonly require data analysis to be performed on distributed data sets which may contain sensitive information. In addition to the credential-based privacy protection, ensuring proper handling of computerized data for disclosure and analysis is particularly essential in e-Science. In this paper, we propose a seman...
In recent years, workflows have been increasingly used in scientific applications. This paper presents novel metadata reasoning capabilities that we have developed to support the creation of large workflows. They include 1) use of semantic web technologies in handling metadata constraints on file collections and nested file collections, 2) propagat...
Metadata catalogs store descriptive information (metadata attributes) about logical data items. These catalogs can then be queried to retrieve the particular logical data item that matches the criteria. However, the query has to be formulated in terms of the metadata attributes defined for the catalog. Our work explores the concept of virtual metad...
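The notion of a virtual metadata attribute described above can be sketched in a few lines: a query term that is not stored in the catalog is derived on demand from attributes that are. The catalog entries, the `duration` attribute, and the `query` function below are hypothetical, invented only to illustrate the idea.

```python
# Hypothetical sketch: answering a query over a "virtual" attribute
# derived from stored metadata attributes (all names invented).

stored = [
    {"id": "run1", "start": 10, "end": 25},
    {"id": "run2", "start": 5, "end": 12},
]

# Virtual attributes are computed from stored ones rather than cataloged.
virtual = {"duration": lambda item: item["end"] - item["start"]}

def query(catalog, attr, predicate):
    def value(item):
        if attr in item:
            return item[attr]          # stored attribute: read directly
        return virtual[attr](item)     # virtual attribute: derive on demand
    return [item["id"] for item in catalog if predicate(value(item))]

long_runs = query(stored, "duration", lambda d: d > 10)
```

Here `duration` never appears in the catalog schema, yet the query can still be posed in those terms, which is the gap virtual metadata is meant to bridge.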
Scientific workflows are being developed for many domains as a paradigm to manage complex scientific computations. In our work, we are challenged with efficiently generating and validating workflows that contain large amounts (hundreds to thousands) of individual computations to be executed over distributed environments. We describe a new approach...
A common approach to managing large, heterogeneous, and distributed collections of data is to separate the data itself (and its physical rendering in replicas) from the metadata that describes the nature of the data (often called logical data descriptions). Metadata catalogs store descriptive information (metadata attributes) about logical dat...
Representation of the earthquake source is an important element in seismic hazard analysis and earthquake simulations. Source models span a range of conceptual complexity - from simple time-independent point sources to extended fault slip distributions. Further computational complexity arises because the seismological community has established so m...
When designing mixed-initiative systems, full formalization of all potentially relevant knowledge may not be cost-effective or practical. This paper motivates the need for semi-formal representations that combine machine-processable structures with free text statements, and discusses the need to design them in a way that makes the free text more am...
Decision making is a fundamental human activity. Most important decisions require careful analysis of the factors influencing a decision. Surprisingly, there has been little work on tools to capture and assess validity of a heterogeneous set of facts and claims that bear on a decision. Good decision making requires two components which are spec...
We propose a new approach to develop knowledge bases that captures at different levels of formality and specificity how each piece of knowledge in the system was derived from original sources, which are often Web sources. If a knowledge base contains a trace of information about how each piece of knowledge was defined, it will be easier to reuse, e...
TRELLIS provides an interactive environment that allows users to add their observations, opinions, and conclusions as they analyze information by making semantic annotations about on-line documents. TRELLIS includes a vocabulary and markup language for semantic annotations of decisions and tradeoffs, and allows users to extend this vocabulary with...
Several languages have been proposed as candidates for semantic markup. We needed to adopt a language for our current research on developing user-oriented tools operating over the Semantic Web. This paper presents the results of our analysis of three candidates that we considered: XML, RDF, and DAML+OIL along with their associated schemas and ontol...
This paper describes an approach to derive assessments about information sources based on individual feedback about the sources. We describe TRELLIS, a system that helps users annotate their analysis of alternative information sources that can be contradictory and incomplete. As the user makes a decision on which sources to dismiss and which to bel...
Many useful planning applications are handled by plan execution tools, such as PRS, that keep track of several interacting goals and tasks, and different ways to expand them, using procedure definitions. I describe Tailor, an implemented tool to help end users modify the procedure definitions used in these tools by interpreting instructions given a...
Many scientists do not share their data due to the cost and lack of incentives of traditional approaches to data sharing. We present a new approach to data sharing that takes into account the cultural practices of science and offers a semantic framework that 1) links dataset contributions directly to science questions, 2) reduces the burden of da...