Vipin Kumar
University of Minnesota Twin Cities | UMN · Department of Computer Science and Engineering
PhD, Computer Science
About
746
Publications
446,144
Reads
85,897
Citations
Introduction
Vipin Kumar currently works at the Department of Computer Science and Engineering, University of Minnesota Twin Cities. Vipin does research in data mining, machine learning, and their applications to environmental and health sciences. His current projects are 'Big Data in Climate' and 'Machine Learning for Health Care'.
Additional affiliations
August 1989 - present
University of Minnesota Twin Cities
August 1989 - present
January 1983 - July 1989
Education
August 1979 - December 1982
January 1978 - May 1979
Philips International Institute
Field of study
- Electronics engineering
July 1973 - May 1977
Publications (746)
The prediction of streamflows and other environmental variables in unmonitored basins is a grand challenge in hydrology. Recent machine learning (ML) models can harness vast datasets for accurate predictions at large spatial scales. However, there are open questions regarding model design and data needed for inputs and training to improve performan...
Streamflow, vital for water resource management, is governed by complex hydrological systems involving intermediate processes driven by meteorological forces. While deep learning models have achieved state-of-the-art results of streamflow prediction, their end-to-end single-task learning approach often fails to capture the causal relationships with...
Accurate long-term predictions are the foundations for many machine learning applications and decision-making processes. Traditional time series approaches for prediction often focus on either autoregressive modeling, which relies solely on past observations of the target "endogenous variables", or forward modeling, which considers only current c...
We present a knowledge-guided machine learning (KGML) framework for modeling multi-scale processes, and study its performance in the context of streamflow forecasting in hydrology. Specifically, we propose a novel hierarchical recurrent neural architecture that factorizes the system dynamics at multiple temporal scales and captures their interactio...
In recent years, there has been increased interest in foundation models for geoscience due to the vast amount of Earth-observing satellite imagery. Existing remote sensing foundation models make use of the various sources of spectral imagery to create large models pretrained on a masked reconstruction task. The embeddings from these foundation models are then...
Machine learning (ML) has been broadly applied for vadose zone applications in recent years. This article provides a comprehensive review of such developments. ML applications for variables corresponding to different complex vadose zone processes are summarized mostly in a prediction context. By analyzing and assessing these applications, we discov...
This paper presents a deep supervised learning architecture for 30-min global precipitation nowcasts with a 4-h lead time. The architecture follows a U-Net structure with convolutional long short-term memory (ConvLSTM) cells empowered by ConvLSTM-based skip connections to reduce information loss due to the pooling operation. The training uses data...
Accurate and cost-effective quantification of the carbon cycle for agroecosystems at decision-relevant scales is critical to mitigating climate change and ensuring sustainable food production. However, conventional process-based or data-driven modeling approaches alone have large prediction uncertainties due to the complex biogeochemical processes...
Process-based models are widely used to predict the agroecosystem dynamics, but such modeled results often contain considerable uncertainty due to the imperfect model structure, biased model parameters, and inaccurate or inaccessible model inputs. Data assimilation (DA) techniques are widely adopted to reduce prediction uncertainty by calibrating m...
We present a Task-aware modulation using Representation Learning (TAM-RL) framework that enhances personalized predictions in few-shot settings for heterogeneous systems when individual task characteristics are not known. TAM-RL extracts embeddings representing the actual inherent characteristics of these entities and uses these characteristics to...
Prediction of dynamic environmental variables in unmonitored sites remains a long-standing challenge for water resources science. The majority of the world's freshwater resources have inadequate monitoring of critical environmental variables needed for management. Yet, the need to have widespread predictions of hydrological variables such as river...
Meeting the United Nations' Sustainable Development Goals (SDGs) calls for an integrative scientific approach, combining expertise, data, models and tools across many disciplines towards addressing sustainability challenges at various spatial and temporal scales. This holistic approach, while necessary, exacerbates the big data and computational cha...
Improving the estimation of CO2 exchange between the atmosphere and terrestrial ecosystems is critical to reducing the large uncertainty in the global carbon budget. Large amounts of the atmospheric CO2 assimilated by plants return to the atmosphere by ecosystem respiration (Reco), including plant autotrophic respiration (Ra) and soil microbial heter...
Cashews are grown by over 3 million smallholder farmers in >40 countries worldwide as a principal source of income. Expanding the area of cashew plantations and increasing productivity are critical to improving the livelihood of many smallholder communities. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew...
Personalized prediction of responses for individual entities caused by external drivers is vital across many disciplines. Recent machine learning (ML) advances have led to new state-of-the-art response prediction models. Models built at a population level often lead to sub-optimal performance in many personalized prediction settings due to heteroge...
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across...
Deep learning (DL) models are increasingly used to make accurate hindcasts of management‐relevant variables, but they are less commonly used in forecasting applications. Data assimilation (DA) can be used for forecasts to leverage real‐time observations, where the difference between model predictions and observations today is used to adjust the mod...
Applying deep learning (DL) models to graphical causal learning has brought outstanding effectiveness and efficiency but is still far from widespread use in domain sciences. In research on electronic healthcare records (EHR), we find that some confounding bias inherently exists in the causally formed data, which DL cannot automatically adjust. T...
In many environmental applications, recurrent neural networks (RNNs) are often used to model physical variables with long temporal dependencies. However, due to mini-batch training, temporal relationships between training segments within the batch (intra-batch) as well as between batches (inter-batch) are not considered, which can lead to limited p...
Creating separable representations via representation learning and clustering is critical in analyzing large unstructured datasets with only a few labels. Separable representations can lead to supervised models with better classification capabilities and additionally aid in generating new labeled samples. Most unsupervised and semisupervised method...
The astounding success of these methods has made it imperative to obtain more explainable and trustworthy estimates from these models. In hydrology, basin characteristics can be noisy or missing, impacting streamflow prediction. For solving inverse problems in such applications, ensuring explainability is pivotal for tackling issues relating to dat...
Streamflow prediction is a long‐standing hydrologic problem. Development of models for streamflow prediction often requires incorporation of catchment physical descriptors to characterize the associated complex hydrological processes. Across different scales of catchments, these physical descriptors also allow models to extrapolate hydrologic infor...
Lakes and reservoirs, as most humans experience and use them, are dynamic bodies of water, with surface extents that increase and decrease with seasonal precipitation patterns, long-term changes in climate, and human management decisions. This paper presents a new global dataset that contains the location and surface area variations of 681,137 lake...
Agricultural nitrous oxide (N2O) emission accounts for a non-trivial fraction of global greenhouse gas (GHG) budget. To date, estimating N2O fluxes from cropland remains a challenging task because the related microbial processes (e.g., nitrification and denitrification) are controlled by complex interactions among climate, soil, plant and human act...
Deep learning (DL) models can accurately predict many hydrologic variables including streamflow and water temperature; however, these models have typically predicted hydrologic variables independently. This study explored the benefits of modeling two interdependent variables, daily average streamflow and daily average stream water temperature, toge...
There is a growing consensus that solutions to complex science and engineering problems require novel methodologies that are able to integrate traditional physics-based modeling approaches with state-of-the-art machine learning (ML) techniques. This paper provides a structured overview of such techniques. Application-centric objective areas for whi...
The dataset described here includes estimates of historical (1980–2020) daily surface water temperature, lake metadata, and daily weather conditions for lakes bigger than 4 ha in the conterminous United States (n = 185,549), and also in situ temperature observations for a subset of lakes (n = 12,227). Estimates were generated using a long short‐ter...
The global decline of water quality in rivers and streams has resulted in a pressing need to design new watershed management strategies. Water quality can be affected by multiple stressors including population growth, land use change, global warming, and extreme events, with repercussions on human and ecosystem health. A scientific understanding of...
Lakes and reservoirs, as most humans experience and use them, are dynamic bodies of water, with surface extents that increase and decrease with seasonal precipitation patterns, long-term changes in climate, and human management decisions. This paper presents a new global dataset that contains the location and surface area variations of 683,734 medi...
Agriculture contributes nearly a quarter of global greenhouse gas (GHG) emissions, which is motivating interest in certain farming practices that have the potential to reduce GHG emissions or sequester carbon in soil. The related GHG emission (including N2O and CH4) and changes in soil carbon stock are defined here as “agricultural carbon outcomes”...
Objective: Hospital-acquired infections (HAIs) are associated with significant morbidity, mortality, and prolonged hospital length of stay. Risk prediction models based on pre- and intraoperative data have been proposed to assess the risk of HAIs at the end of the surgery, but the performance of these models lags behind HAI detection models based o...
Agricultural nitrous oxide (N2O) emission accounts for a non-trivial fraction of global greenhouse gases (GHGs) budget. To date, estimating N2O fluxes from cropland remains a challenging task because the related microbial processes (e.g., nitrification and denitrification) are controlled by complex interactions among climate, soil, plant and human...
Machine learning is used extensively in hydrology, especially for streamflow prediction in basins/watersheds. Basin characteristics are essential for modeling the rainfall-runoff response of these watersheds, and therefore data-driven methods must take this ancillary characteristics data into account. However, there are several limitations, namely...
In many applications, finding adequate labeled data to train predictive models is a major challenge. In this work, we propose methods to use group-level binary labels as weak supervision to train instance-level binary classification models. Aggregate labels are common in several domains where annotating on a group-level might be cheaper or might be...
Collecting large annotated datasets in Remote Sensing is often expensive and thus can become a major obstacle for training advanced machine learning models. Common techniques of addressing this issue, based on the underlying idea of pre-training the Deep Neural Networks (DNN) on freely available large datasets, cannot be used for Remote Sensing due...
Near-term forecasts of environmental outcomes can inform real-time decision making. Data assimilation modeling techniques can be used for forecasts to leverage real-time data streams, where the difference between model predictions and observations can be used to adjust the model to make better predictions tomorrow. In this use case, we developed a...
Mapping and monitoring crops is a key step towards sustainable intensification of agriculture and addressing global food security. A dataset like ImageNet that revolutionized computer vision applications can accelerate development of novel crop mapping techniques. Currently, the United States Department of Agriculture (USDA) annually releases the C...
Major societal and environmental challenges involve complex systems that have diverse multi-scale interacting processes. Consider, for example, how droughts and water reserves affect crop production and how agriculture and industrial needs affect water quality and availability. Preventive measures, such as delaying planting dates and adopting new a...
Objective: The association of body mass index (BMI) and all-cause mortality is controversial, frequently referred to as a paradox. Whether the cause is metabolic factors or statistical biases is still controversial. We assessed the association of BMI and all-cause mortality considering a wide range of comorbidities and baseline mortality risk. Meth...
Most environmental data come from a minority of well-monitored sites. An ongoing challenge in the environmental sciences is transferring knowledge from monitored sites to unmonitored sites. Here, we demonstrate a novel transfer-learning framework that accurately predicts depth-specific temperature in unmonitored lakes (targets) by borrowing models...
Diseases can show different courses of progression even when patients share the same risk factors. Recent studies have revealed that the use of trajectories, the order in which diseases manifest throughout life, can be predictive of the course of progression. In this study, we propose a novel computational method for learning disease trajectories f...
Several deep learning methods for phase retrieval exist, but most of them fail on realistic data without precise support information. We propose a novel method based on single-instance deep generative prior that works well on complex-valued crystal data.
Physics-based models are often used to study engineering and environmental systems. The ability to model these systems is the key to achieving our future environmental sustainability and improving the quality of human life. This article focuses on simulating lake water temperature, which is critical for understanding the impact of changing climate...
The availability of massive Earth-observing satellite data provides huge opportunities for land use and land cover mapping. However, such mapping efforts are challenging due to the existence of various land cover classes, noisy data, and the lack of proper labels. Also, each land cover class typically has its own unique temporal pattern and can be ide...
Timely and accurate monitoring of tree crop extent and productivity is necessary for informing policy-making and investments. However, except for a very few tree species (e.g., oil palms) with obvious canopy and extensive planting, most small-crown tree crops are understudied in the remote sensing domain. To conduct large-scale small-crown tree...
Land cover mapping is essential for monitoring global environmental change and managing natural resources. Unfortunately, traditional classification models are plagued by limited training data available in existing land cover products and data heterogeneity over space and time. In this survey, we provide a structured and comprehensive overview of c...