Amarnath Gupta

University of California, San Diego | UCSD · San Diego Supercomputer Center (SDSC)

PhD

About

225
Publications
30,625
Reads
15,583
Citations
Citations since 2017
22 Research Items
2012 Citations

Publications (225)
Preprint
Data science applications increasingly rely on heterogeneous data sources and analytics. This has led to growing interest in polystore systems, especially analytical polystores. In this work, we focus on a class of emerging multi-data model analytics workloads that fluidly straddle relational, graph, and text analytics. Instead of a generic polysto...
Chapter
The Relationship-based Access Control Model (ReBAC) generalizes Role-based Access Control (RBAC) by considering both hierarchical and non-hierarchical relationships between users to specify access control of a set of target resources (objects). This paper extends the ReBAC model by considering relationships between objects as well as between subjec...
Article
Full-text available
Early detection of diseases such as COVID-19 could be a critical tool in reducing disease transmission by helping individuals recognize when they should self-isolate, seek testing, and obtain early medical intervention. Consumer wearable devices that continuously measure physiological metrics hold promise as tools for early illness detection. We ga...
Article
Full-text available
There is significant variability in neutralizing antibody responses (which correlate with immune protection) after COVID-19 vaccination, but only limited information is available about predictors of these responses. We investigated whether device-generated summaries of physiological metrics collected by a wearable device correlated with post-vaccin...
Preprint
Full-text available
Modern big data applications usually involve heterogeneous data sources and analytical functions, leading to increasing demand for polystore systems, especially analytical polystore systems. This paper presents the AWESOME system along with a domain-specific language, ADIL. ADIL is a powerful language which supports 1) native heterogeneous data models s...
Preprint
Knowledge analysis is an important application of knowledge graphs. In this paper, we present a complex knowledge analysis problem that discovers the gaps in the technology areas of interest to an organization. Our knowledge graph is developed on a heterogeneous data management platform. The analysis combines semantic search, graph analytics, and p...
Chapter
Full-text available
Quantum materials research is a rapidly growing domain of materials research, seeking novel compounds whose electronic properties are born from the uniquely quantum aspects of their constituent electrons. The data from this rapidly evolving area of quantum materials requires a new community-driven approach for collaboration and sharing the data fro...
Preprint
Full-text available
Social media data are often modeled as heterogeneous graphs with multiple types of nodes and edges. We present a discovery algorithm that first chooses a "background" graph based on a user's analytical interest and then automatically discovers subgraphs that are structurally and content-wise distinctly different from the background graph. The techn...
Preprint
Full-text available
We present our experience with a data science problem in Public Health, where researchers use social media (Twitter) to determine whether the public shows awareness of HIV prevention measures offered by Public Health campaigns. To help the researcher, we develop an investigative exploration system called boutique that allows a user to perform a mul...
Preprint
Full-text available
Many data science applications like social network analysis use graphs as their primary form of data. However, acquiring graph-structured data from social media presents some interesting challenges. The first challenge is the high data velocity and bursty nature of the social media data. The second challenge is that the complex nature of the data m...
Preprint
Full-text available
Temporal text, i.e., time-stamped text data are found abundantly in a variety of data sources like newspapers, blogs and social media posts. While today's data management systems provide facilities for searching full-text data, they do not provide any simple primitives for performing analytical operations with text. This paper proposes the temporal...
Conference Paper
Full-text available
AWESOME is a polystore system that enables a data analyst to create a data ingestion script that specifies how the system should collect and organize data, run a data-derivation pipeline, and report the results of the analysis. The collected data can be stored in different component stores under AWESOME for subsequent secondary analysis. This paper demonstrates the pr...
Article
Full-text available
Proactive forensics uses the investigative principles of digital forensics to develop automated techniques that prevent cybercrime. One such prevention-minded methodology is PROFORMA, a prototype system that continuously evaluates the trustworthiness and risk of social communications.
Conference Paper
Full-text available
Specifying the search space is an important step in designing multimedia annotation systems. With the large amount of data available from sensors and web services, context-aware approaches for pruning search spaces are becoming increasingly common. In these approaches, the search space is limited by the contextual information obtained from a fixed...
Conference Paper
Full-text available
Polystores, i.e., data management systems that use multiple stores for different data models, are gaining popularity. We are developing a polystore-based system called AWESOME to support social data analytics. The AWESOME polystore can support relational, semistructured, graph and text data and houses a Spark computation engine to produce derived d...
Conference Paper
Attack graphs have been widely used by network security administrators to understand the possible attack paths an attacker may follow to compromise critical resources. As networks get larger and more complex, one needs to use databases to perform iterative, interactive analysis tasks with attack graphs. In this paper we investigate h...
Conference Paper
Full-text available
Social media data can be viewed as "mixed model" data that reflect interesting community behavior. We take a graph-centric view of microblogs and develop a user-defined specification of a community on these social graphs. We demonstrate that the temporal behavior of communities can be captured by a set of graph metrics. We describe a system which tran...
Article
Full-text available
Wildfires are critical for ecosystems in many geographical regions. However, our current urbanized existence in these environments is inducing the ecological balance to evolve into a different dynamic leading to the biggest fires in history. Wildfire wind speeds and directions change in an instant, and first responders can only be effective if they...
Chapter
Graphs have emerged as an important genre of data that are found in a wide class of applications. The most dominant benchmark for graph data today is Graph 500 that generates a Stochastic Kronecker graph of various sizes, and reports the time to perform a breadth-first search. Apache Giraph uses Pagerank computation as an algorithmic benchmark for...
Conference Paper
Connecting people to the resources they need is a fundamental task for any society. We present the idea of a technology that can be used by the middle tier of a society so that it uses people's mobile devices and social networks to connect the needy with providers. We conceive of a world observatory called the Social Life Network (SLN) that connect...
Conference Paper
The NIF system is a semantic search engine that uses an ontology to improve search quality. In this experience paper we present SKEYQL, our semantic keyword query language and describe a number of ontology-based query reformulation strategies that go beyond standard query expansion techniques. We also present a set of lessons learnt and strategies...
Conference Paper
Computational problems are increasingly relying on context-aware approaches for tractable solutions. Usually, these approaches statically link additional sources of information to those already present in the problem space. We have been building CueNet, a context discovery framework, which will dynamically discover the most relevant context for a g...
Article
The availability of enormous volumes of heterogeneous Cyber-Physical-Social (CPS) data streams allows the design and implementation of networks that connect people with essential life resources. We call these networks Social Life Networks (SLNs). We are developing concepts, technology, and infrastructure to design and build these networks. SLNs will be he...
Article
Full-text available
We report on progress of employing the Kepler workflow engine to prototype “end-to-end” application integration workflows that concern data coming from microscopes deployed at the National Center for Microscopy Imaging Research (NCMIR). This system is built upon the mature code base of the Cell Centered Database (CCDB) and integrated rule-oriented...
Article
Full-text available
The number of available neuroscience resources (databases, tools, materials, and networks) available via the Web continues to expand, particularly in light of newly implemented data sharing policies required by funding agencies and journals. However, the nature of dense, multifaceted neuroscience data and the design of classic search engine systems...
Conference Paper
In this short paper, we present early results from ongoing research on creating a new graph-based representation from NLP analysis of scientific documents so that the graph can be utilized for answering structured queries on NL-processed data. We present a sketch of the data model and the query language to show how scientifically meaningful quer...
Article
Full-text available
An initiative of the NIH Blueprint for Neuroscience Research, the Neuroscience Information Framework (NIF) project advances neuroscience by enabling discovery and access to public research data and tools worldwide through an open source, semantically enhanced search portal. One of the critical components for the overall NIF system, the NIF Standard...
Article
The number of neuroscience resources (databases, tools, materials and networks) available on the web has expanded and continues to expand, particularly in light of newly implemented data sharing policies required by funding agencies and journals. However, the nature of dense, multi-faceted neuroscience data and the design of classic search engine systems...
Data
Methods. Detailed description of the methods and data types used in the BiologicalNetworks system for host-pathogen studies.
Article
Full-text available
Understanding of immune response mechanisms of pathogen-infected host requires multi-scale analysis of genome-wide data. Data integration methods have proved useful to the study of biological processes in model organisms, but their systematic application to the study of host immune system response to a pathogen and human disease is still in the ini...
Article
In this paper, we examine the problem of efficiently computing aggregate functions over polygonal regions of space. We first formalize a class of efficient region-based aggregation models, where the aggregation query is computed by representing the query region with pre-defined regions using set operations. By focusing on a grid tessellation, we fir...
Chapter
As we saw in the last chapter, there is wide diversity in the way data modelers and knowledge representation researchers view events. In this chapter, we will present a number of approaches to event data modeling.
Article
Full-text available
A significant problem in the study of mechanisms of an organism's development is the elucidation of interrelated factors which are making an impact on the different levels of the organism, such as genes, biological molecules, cells, and cell systems. Numerous sources of heterogeneous data which exist for these subsystems are still not integrated su...
Article
Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA) is an eScience project to enable the microbial ecology community in managing the challenges of metagenomics analysis. CAMERA supports extensive metadata-based data acquisition and access, as well as execution of metagenomics experiments through stand...
Article
This paper presents BIODB, an ontology-enhanced information system to manage heterogeneous data. An ontology-enhanced system is a system where ad hoc data is imported into the system by a user, annotated by the user to connect the data to an ontology or other data sources, and then all data connected through the ontology can be queried in a federat...
Conference Paper
Social science is often concerned with the emergence of collective behavior out of the interactions of large numbers of individuals, but in this regard it has long suffered from a severe measurement problem - namely that individual-level behavior and ...
Article
The objective of the CAMERA project is to provide a facility to enable researchers to achieve revolutionary advances in the understanding of marine microbial ecology.
Conference Paper
Full-text available
This paper proposes a navigational method for mining by collecting evidence from diverse data sources. Since the representation method and even the semantics of data elements differ widely from one data source to another, consolidating the data under a single platform is not cost-effective. Instead, this paper has proposed a method of mining i...
Article
Full-text available
Events are at least as important as objects in modeling the dynamic universe. Modeling the real world and weaving the web of events require Composite Events that are valid constitution of atomic and composite sub-events. The progress in event composition has been limited to construction of some entities and relationships in upper ontologies such as...
Article
Full-text available
As increasing volumes and varieties of data are becoming available online, the challenges of accessing and using heterogeneous data resources are growing. We have developed a mediator-based data integration system called Cartel for biological oceanography data. A mediation approach is appropriate in cases where a single central warehouse is not des...
Article
Full-text available
Since the dawn of human civilization, stories have been a popular medium of communication, both synchronously and asynchronously. Technically, a story is a time-ordered coherent sequence of events. In many applications, heterogeneous data is collected and organized so appropriate stories could be told. In this paper, we present a system that help...
Conference Paper
Full-text available
Autism spectrum disorder is an inherently complex phenomenon requiring large studies of many different types to further understanding of its causes. The National Database for Autism Research (NDAR) is being constructed to aid in this effort by providing a means for researchers to share and integrate data. An autism ontology drafted by a group at St...
Conference Paper
Full-text available
When SQL and the relational data model were introduced 25 years ago as a general data management concept, enterprise software migrated quickly to this new technology. It is fair to say that SQL and the various implementations of RDBMSs became the backbone ...
Article
Full-text available
Using global physical and biological datasets, we tested oceanographic retention (factoring out effects of seamount depth and age) as one possible mechanism structuring seamount benthic decapod and gastropod communities. We first determined the relative oceanographic retentive potential (such as from Taylor caps or columns) for individual seamo...
Article
Full-text available
Amarnath Gupta and Yang Yang (University of California, San Diego; {gupta,yyang}@sdsc.edu), Aditya Bagchi (Indian Statistical Institute, Calcutta; aditya@isical.ac.in), Animesh Ray (Keck Graduate Institute; Animesh Ray@kgi.edu)
Article
Full-text available
This paper presents current progress in the development of a semantic data integration environment that is part of the Biomedical Informatics Research Network (BIRN; http://www.nbirn.net) project. BIRN is sponsored by the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). A goal is the developmen...
Article
Full-text available
The overarching goal of the NIF (Neuroscience Information Framework) project is to be a one-stop-shop for Neuroscience. This paper provides a technical overview of how the system is designed. The technical goal of the first version of the NIF system was to develop an information system that a neuroscientist can use to locate relevant information fr...
Article
Full-text available
A critical component of the Neuroscience Information Framework (NIF) project is a consistent, flexible terminology for describing and retrieving neuroscience-relevant resources. Although the original NIF specification called for a loosely structured controlled vocabulary for describing neuroscience resources, as the NIF system evolved, the requirem...
Article
Full-text available
Querying live media streams is a challenging problem that is becoming an essential requirement in a growing number of applications. Research in multimedia information systems has addressed and made good progress in dealing with archived data. Meanwhile, research in stream databases has received significant attention for querying alphanumeric symbol...
Conference Paper
Full-text available
Annotation is the process of supplementing data with additional information that was not part of the actual observation, but reflects post-facto comments and associations made by a user who analyzes the data. While annotation management systems are emerging in the field of relational data, such systems for scientific applications, where there is a...
Article
Databases have become integral parts of data management, dissemination, and mining in biology. At the Second Annual Conference on Electron Tomography, held in Amsterdam in 2001, we proposed that electron tomography data should be shared in a manner analogous to structural data at the protein and sequence scales. At that time, we outlined our progre...
Article
The broadly defined mission of the Biomedical Informatics Research Network (BIRN, www.nbirn.net) is to better understand the causes of human disease and the specific ways in which animal models inform that understanding. To construct the community-wide infrastructure for gathering, organizing and managing this knowledge, BIRN is developing a federated...
Article
Full-text available
With support from the Institutes and Centers forming the NIH Blueprint for Neuroscience Research, we have designed and implemented a new initiative for integrating access to and use of Web-based neuroscience resources: the Neuroscience Information Framework. The Framework arises from the expressed need of the neuroscience community for neuroinforma...
Article
This chapter focuses on the application of brain cartography to the problem of multiscale integration of brain data in the context of the Biomedical Informatics Research Network (BIRN) project. The BIRN project focuses on creating a grid infrastructure for integrating data on brain morphology and function obtained by different researchers to support co...
Conference Paper
Full-text available
We present the semantic data model for an ontological database for subcellular anatomy for Neurosciences. The data model builds upon the foundations of OWL and the Basic Formal Ontology, but extends them to include novel constructs that address several unresolved challenges encountered by biologists in using ontological models in their databases. T...