Sanguthevar Rajasekaran

  • University of Connecticut

About

Publications: 492
Reads: 46,893
Citations: 5,839
Current institution
University of Connecticut

Publications (492)
Article
Full-text available
In this paper, we describe the supervised dynamic correlated topic model (sDCTM) for classifying categorical time series. This model extends the correlated topic model used for analyzing textual documents to a supervised framework that features dynamic modeling of latent topics. sDCTM treats each time series as a document and each categorical value...
Article
Full-text available
Health and disease are fundamentally influenced by microbial communities and their genes (the microbiome). An in-depth analysis of microbiome structure that enables the classification of individuals based on their health can be crucial in enhancing diagnostics and treatment strategies to improve the overall well-being of an individual. In this pape...
Article
Full-text available
With the advent of the “Internet of Things” (IoT), insurers are increasingly leveraging remote sensor technology in the development of novel insurance products and risk management programs. For example, Hartford Steam Boiler’s (HSB) IoT freeze loss program uses IoT temperature sensors to monitor indoor temperatures in locations at high risk of wate...
Chapter
Record Linkage is the process of merging data from several sources and identifying records that are associated with the same entities, or individuals, where a unique identifier is not available. Record Linkage has applications in several domains such as master data management, law enforcement, health care, social networking, historical research, et...
Chapter
Full-text available
Jaro similarity is widely used in computing the similarity (or distance) between two strings of characters. For example, record linkage is an application of great interest in many domains for which Jaro similarity is popularly employed. Existing algorithms for computing the Jaro similarity between two given strings take quadratic time in the worst...
Article
The ‘Internet of Things’ (IoT) is a rapidly developing set of technologies that leverages large numbers of networked sensors, to relay data in an online fashion. Typically, knowledge of the sensor environment is incomplete and subject to changes over time. There is a need to employ classification algorithms to understand the data. We first review o...
Preprint
Full-text available
Background: Health and disease are fundamentally influenced by microbial communities and their genes (the microbiome). An in-depth analysis of microbiome structure that enables the classification of individuals based on their health can be crucial in enhancing diagnostics and treatment strategies to improve the overall well-being of an individual. In...
Article
Full-text available
Recent advances in technology have led to an explosion of data in virtually all domains of our lives. Modern biomedical devices can acquire a large number of physical readings from patients. Often, these readings are stored in the form of time series data. Such time series data can form the basis for important research to advance healthcare and wel...
Article
Full-text available
Most heating, ventilation, and air-conditioning (HVAC) systems operate with one or more faults that result in increased energy consumption and that could lead to system failure over time. Today, most building owners are performing reactive maintenance only and may be less concerned or less able to assess the health of the system until catastrophic...
Article
Density functional theory (DFT) within the local or semilocal density approximations, i.e., the local density approximation (LDA) or generalized gradient approximation (GGA), has become a workhorse in the electronic structure theory of solids, being extremely fast and reliable for energetics and structural properties, yet remaining highly inaccurat...
Article
Record linkage is an important problem studied widely in many domains including biomedical informatics. A standard version of this problem is to cluster records from several datasets, such that each cluster has records pertinent to just one individual. Typically, datasets are huge in size. Hence, existing record linkage algorithms take a very long...
Preprint
Full-text available
In this work, we consider the problem of designing secure and efficient federated learning (FL) frameworks. Existing solutions either involve a trusted aggregator or require heavyweight cryptographic primitives, which degrades performance significantly. Moreover, many existing secure FL designs work only under the restrictive assumption that none o...
Article
Full-text available
Background: Nowadays we are observing an explosion of gene expression data with phenotypes. It enables us to accurately identify genes responsible for certain medical condition as well as classify them for drug target. Like any other phenotype data in medical domain, gene expression data with phenotypes also suffer from being a very underdetermined...
Article
Indoor radon concentrations are controlled by both human factors and geological factors. It is important to separate the anthropogenic and geogenic contributions. We show that there is a positive correlation between the radiometric map of uranium in the ground and the measured radon in the household in Sweden. A map of gamma radiation is used to ob...
Preprint
Full-text available
Background: Current form of genome-wide association studies (GWAS) is inadequate to accurately explain the genetics of complex traits due to the lack of sufficient statistical power. It explores each variant individually, but current studies show that multiple variants with varying effect sizes actually act in a concerted way to develop a complex di...
Preprint
Background: Alzheimer’s disease (AD) is the most common form of dementia among older people. It is a complex disease and the genetics and environmental factors behind it are not conclusive yet. Traditional statistical analyses are inadequate to identify variants, genes, or pathways capable of explaining AD as a unit. In this context, pathway network...
Article
Full-text available
The dimensionality of the spatially distributed channels and the temporal resolution of electroencephalogram (EEG) based brain-computer interfaces (BCI) undermine emotion recognition models. Thus, prior to modeling such data, as the final stage of the learning pipeline, adequate preprocessing, transforming, and extracting temporal (i.e., time-serie...
Article
A biological pathway is an ordered set of interactions between intracellular molecules having collective activity that impacts cellular function, for example, by controlling metabolite synthesis or by regulating the expression of sets of genes. They play a key role in advanced studies of genomics. However, existing pathway analytics methods are ina...
Conference Paper
The large model size, high computational operations, and vulnerability against membership inference attack (MIA) have impeded deep learning or deep neural networks (DNNs) popularity, especially on mobile devices. To address the challenge, we envision that the weight pruning technique will help DNNs against MIA while reducing model storage and compu...
Chapter
A biological pathway is an ordered set of interactions between intracellular molecules having collective activity that impacts cellular function, for example, by controlling metabolite synthesis or by regulating the expression of sets of genes. They play a key role in advanced studies of genomics. However, existing pathway analytics methods are ina...
Article
Full-text available
Objective: To investigate seasonality and temporal trends in the incidence of NEC. Study design: A retrospective cohort study from two tertiary NICUs in northern and central Connecticut involving 16,761 infants admitted over a 28-year period. Various perinatal and neonatal risk factors were evaluated by univariate, multivariate, and spectral density...
Preprint
Full-text available
Although federated learning has increasingly gained attention in terms of effectively utilizing local devices for data privacy enhancement, recent studies show that publicly shared gradients in the training process can reveal the private training images (gradient leakage) to a third-party in computer vision. We have, however, no systematic understa...
Book
This book constitutes the proceedings of the 10th International Conference on Computational Advances in Bio and Medical Sciences, ICCABS 2020, held in December 2020. Due to COVID-19 pandemic the conference was held virtually. The 6 regular and 5 invited papers presented in this book were carefully reviewed and selected from 16 submissions. The use...
Preprint
Full-text available
Widespread availability of next-generation sequencing (NGS) technologies has prompted a recent surge in interest in the microbiome. As a consequence, metagenomics is a fast growing field in bioinformatics and computational biology. An important problem in analyzing metagenomic sequenced data is to identify the microbes present in the sample and fig...
Article
Discovering patterns in biological sequences is a crucial step to extract useful information from them. Motif search has numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns etc. The general problem of motif search is intractable. One of the most studied models of motif s...
Preprint
Full-text available
Distributed learning such as federated learning or collaborative learning enables model training on decentralized data from users and only collects local gradients, where data is processed close to its sources for data privacy. The nature of not centralizing the training data addresses the privacy issue of privacy-sensitive data. Recent studies sho...
Article
Full-text available
Feature selection is a crucial problem in efficient machine learning, and it also greatly contributes to the explainability of machine-driven decisions. Methods, like decision trees and Least Absolute Shrinkage and Selection Operator (LASSO), can select features during training. However, these embedded approaches can only be applied to a small subs...
Preprint
Full-text available
Density functional theory within the local or semilocal density approximations (DFT-LDA/GGA) has become a workhorse in electronic structure theory of solids, being extremely fast and reliable for energetics and structural properties, yet remaining highly inaccurate for predicting band gaps of semiconductors and insulators. Accurate prediction of ba...
Preprint
Deep learning or deep neural networks (DNNs) have nowadays enabled high performance, including but not limited to fraud detection, recommendations, and different kinds of analytical transactions. However, the large model size, high computational cost, and vulnerability against membership inference attack (MIA) have impeded its popularity, especiall...
Preprint
Full-text available
The closest pair of points problem or closest pair problem (CPP) is an important problem in computational geometry where we have to find a pair of points from a set of points in metric space with the smallest distance between them. This problem arises in a number of applications, such as but not limited to clustering, graph partitioning, image proc...
Preprint
Full-text available
Nowadays we are observing an explosion of gene expression data with phenotypes. It enables researchers to efficiently identify genes responsible for certain medical condition as well as classify them for drug target. Like any other phenotype data in medical domain, gene expression data with phenotypes also suffers from being very underdetermined sy...
Chapter
Given a collection of records, the problem of record linkage is to cluster them such that each cluster contains all the records of one and only one individual. Existing algorithms for this important problem have large run times especially when the number of records is large. Often, a small number of new records have to be linked with a large number...
Article
The evidence base in health psychology is vast and growing rapidly. These factors make it difficult (and sometimes practically impossible) to consider all available evidence when making decisions about the state of knowledge on a given phenomenon (e.g., associations of variables, effects of interventions on particular outcomes). Systematic reviews,...
Book
This book constitutes revised selected papers from the 9th International Conference on Computational Advances in Bio and Medical Sciences, ICCABS 2019, held in Miami, Florida, USA in November 2019. The 15 papers presented in this volume were carefully reviewed and selected from 30 submissions. They deal with topics such as computational biology; bi...
Chapter
The closest pair of points problem or closest pair problem (CPP) is an important problem in computational geometry where we have to find a pair of points from a set of points in a metric space with the smallest distance between them. This problem arises in a number of applications, such as but not limited to clustering, graph partitioning, image pr...
Conference Paper
Underwater acoustic sensor networks (UWASNs) have been introduced as a new technology to extract the data for underwater real-time applications such as seismic monitoring, undersea monitoring and control, oil well inspection, military applications, and disaster prevention. This new technology adds more networking capabilities and enables real-time...
Conference Paper
Higher order spectra (HOS) are a powerful tool in nonlinear time series analysis and they have been extensively used as feature representations in data mining, communications and cosmology domains. However, HOS estimation suffers from high computational cost and memory consumption. Any algorithm for computing the kth order spectra on a dataset of s...
Conference Paper
In recent years, convolutional neural networks (CNN) have been successfully employed for performing various tasks due to their high capacity. However, just like a double-edged sword, high capacity results from millions of parameters, which also brings a huge amount of redundancy and dramatically increases the computational complexity. The task of p...
Article
Full-text available
Motivation: Metagenomics is the study of genetic materials directly sampled from natural habitats. It has the potential to reveal previously hidden diversity of microscopic life largely due to the existence of highly parallel and low-cost next-generation sequencing technology. Conventional approaches align metagenomic reads onto known reference ge...
Article
The National Science Foundation (NSF) 2018 Materials and Data Science Hackathon (MATDAT18) took place at the Residence Inn Alexandria Old Town/Duke Street, Alexandria, VA over the period May 30–June 1, 2018. This three-day collaborative “hackathon” or “datathon” brought together teams of materials scientists and data scientists to collaboratively e...
Article
Full-text available
Background: Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l,d)-motif model is one of these that has been studied wi...
Chapter
Motif mining is a classical data mining problem which aims to extract relevant information and discover knowledge from voluminous datasets in a variety of domains. Specifically, for the temporal data containing real numbers, it is formulated as time series motif mining (TSMM) problem. If the input is alphabetical and edit-distance is considered, th...
Chapter
A Generative Adversarial Network (GAN) is an unsupervised generative framework to generate a sample distribution that is identical to the data distribution. Recently, mix strategy multi-generator/discriminator GANs have been shown to outperform single pair GANs. However, the mixed model suffers from the problem of linearly growing training time. Al...
Chapter
The Closest Pair Problem (CPP) is one of the fundamental problems that has a wide range of applications in data mining, such as unsupervised data clustering, user pattern similarity search, etc. A number of exact and approximate algorithms have been proposed to solve it in the low dimensional space. In this paper, we address the problem when the me...
Article
With the advances in the next generation sequencing technology, huge amounts of data have been and get generated in biology. A bottleneck in dealing with such datasets lies in developing effective algorithms for extracting useful information from them. Algorithms for finding patterns in biological data pave the way for extracting crucial informatio...
Preprint
Full-text available
Polyspectral estimation is a problem of great importance in the analysis of nonlinear time series that has applications in biomedical signal processing, communications, geophysics, image, radar, sonar and speech processing, etc. Higher order spectra (HOS) have been used in unsupervised and supervised clustering in big data scenarios, in testing for...
Article
Full-text available
Minimotif Miner (MnM) is a database and web system for analyzing short functional peptide motifs, termed minimotifs. We present an update to MnM growing the database from ∼300 000 to >1 000 000 minimotif consensus sequences and instances. This growth comes largely from updating data from existing databases and annotation of articles with high-throu...
Conference Paper
Through the use of network simulators we are able to test and examine different combinations of protocols in low cost and controlled environments. To ensure the accuracy of these simulators it is crucial that they consistently expand and enhance their modules to offer extensive support. In this paper, we introduce Aqua-Sim Next Generation, an NS-3...
Conference Paper
Advances made in sequencing technology have resulted in the sequencing of thousands of genomes. Novel analysis tools are needed to process these data and extract useful information. Such tools could aid in personalized medicine. As an example, we could identify the causes for a disease by comparing the genomes of people who have the disease and tho...
Article
Full-text available
In prior works, stochastic dual coordinate ascent (SDCA) has been parallelized in a multi-core environment where the cores communicate through shared memory, or in a multi-processor distributed memory environment where the processors communicate through message passing. In this paper, we propose a hybrid SDCA framework for multi-core clusters, the...
Preprint
In prior works, stochastic dual coordinate ascent (SDCA) has been parallelized in a multi-core environment where the cores communicate through shared memory, or in a multi-processor distributed memory environment where the processors communicate through message passing. In this paper, we propose a hybrid SDCA framework for multi-core clusters, the...
Conference Paper
Full-text available
RNA Sequencing (RNA-seq) based on next-generation sequencing (NGS) technology enables transcriptome analyses of entire genomes at a very high resolution. Due to limitations of the sequencing technology the reads are very short and erroneous. As a consequence it is a very challenging task to accurately map RNA-seq reads onto the genome and identify...
Conference Paper
The concept of securing data in Underwater Sensor Networks (UWSN) is an ongoing conflict due to the many challenges faced in this harsh environment. Due to the restriction of energy among underwater modems, we must ensure that our transmissions are efficient as possible. Furthermore, we must consider the open possibility of mobile malicious nodes i...
Article
Full-text available
Background: Motif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for so...
Article
Full-text available
Motivation: Next-generation sequencing (NGS) techniques produce millions to billions of short reads. The procedure is not only very cost effective but also can be done in laboratory environment. The state-of-the-art sequence assemblers then construct the whole genomic sequence from these reads. Current cutting edge computing technology makes it poss...
Conference Paper
Identifying the location of a target in an unbounded underwater environment is a challenging task. Limited and intermittent communication only adds to the difficulty of the process. Many target search algorithms use communication as an essential structural component to coordinate Autonomous Underwater Vehicles’ (AUV) motion and share sensory inform...
Article
Motivation: A massive number of bioinformatics applications require counting of k-length substrings in genetically important long strings. A k-mer counter generates the frequencies of each k-length substring in genome sequences. Genome assembly, repeat detection, multiple sequence alignment, error detection, and many other related applications use...
Article
Full-text available
Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in fin...
Poster
High-temperature gas sensors have recently garnered significant attention in industrial and research fields as concerns about the environment and energy consumption have been rising. Specifically, the demand of wireless high-temperature surface acoustic wave (SAW) sensors has been increasing because they show promising sensing characteristics in...
Article
Full-text available
All translated proteins end with a carboxylic acid commonly called the C-terminus. Many short functional sequences (minimotifs) are located on or immediately proximal to the C-terminus. However, information about the function of protein C-termini has not been consolidated into a single source. Here, we built a new "C-terminome" database and web sys...
Data
GERP score analysis for minimotif conservation. (XLSX)
