Vipin Kumar

University of Minnesota Twin Cities | UMN · Department of Computer Science and Engineering

PhD, Computer Science

About

718
Publications
362,138
Reads
75,843
Citations
Citations since 2017
119 Research Items
30,168 Citations
[Chart: citations per year, 2017 to 2023; vertical axis 0 to 5,000]
Introduction
Vipin Kumar currently works at the Department of Computer Science and Engineering, University of Minnesota Twin Cities. He does research in data mining, machine learning, and their applications to environmental and health sciences. His current projects are 'Big Data in Climate' and 'Machine Learning for Health Care'.
Additional affiliations
August 1989 - present
University of Minnesota Twin Cities
Position
  • Professor (Full)
January 1983 - July 1989
University of Texas at Austin
Position
  • Professor (Assistant)
Education
August 1979 - December 1982
University of Maryland, College Park
Field of study
  • Computer science
January 1978 - May 1979
Philips International Institute
Field of study
  • Electronics engineering
July 1973 - May 1977
Indian Institute of Technology Roorkee
Field of study
  • Electronics engineering

Publications

Publications (718)
Preprint
Full-text available
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across...
Article
Deep learning (DL) models are increasingly used to make accurate hindcasts of management‐relevant variables, but they are less commonly used in forecasting applications. Data assimilation (DA) can be used for forecasts to leverage real‐time observations, where the difference between model predictions and observations today is used to adjust the mod...
Preprint
Applying Deep Learning (DL) models to graphical causal learning has brought outstanding effectiveness and efficiency but is still far from widespread use in domain sciences. In research of EHR (Electronic Healthcare Records), we realize that some confounding bias inherently exists in the causally formed data, which DL cannot automatically adjust. T...
Preprint
In many environmental applications, recurrent neural networks (RNNs) are often used to model physical variables with long temporal dependencies. However, due to mini-batch training, temporal relationships between training segments within the batch (intra-batch) as well as between batches (inter-batch) are not considered, which can lead to limited p...
Preprint
Full-text available
Creating separable representations via representation learning and clustering is critical in analyzing large unstructured datasets with only a few labels. Separable representations can lead to supervised models with better classification capabilities and additionally aid in generating new labeled samples. Most unsupervised and semisupervised method...
Preprint
Full-text available
The astounding success of these methods has made it imperative to obtain more explainable and trustworthy estimates from these models. In hydrology, basin characteristics can be noisy or missing, impacting streamflow prediction. For solving inverse problems in such applications, ensuring explainability is pivotal for tackling issues relating to dat...
Article
Full-text available
Streamflow prediction is a long‐standing hydrologic problem. Development of models for streamflow prediction often requires incorporation of catchment physical descriptors to characterize the associated complex hydrological processes. Across different scales of catchments, these physical descriptors also allow models to extrapolate hydrologic infor...
Article
Full-text available
Lakes and reservoirs, as most humans experience and use them, are dynamic bodies of water, with surface extents that increase and decrease with seasonal precipitation patterns, long-term changes in climate, and human management decisions. This paper presents a new global dataset that contains the location and surface area variations of 681,137 lake...
Article
Full-text available
Agricultural nitrous oxide (N2O) emission accounts for a non-trivial fraction of global greenhouse gas (GHG) budget. To date, estimating N2O fluxes from cropland remains a challenging task because the related microbial processes (e.g., nitrification and denitrification) are controlled by complex interactions among climate, soil, plant and human act...
Article
Full-text available
Deep learning (DL) models can accurately predict many hydrologic variables including streamflow and water temperature; however, these models have typically predicted hydrologic variables independently. This study explored the benefits of modeling two interdependent variables, daily average streamflow and daily average stream water temperature, toge...
Article
There is a growing consensus that solutions to complex science and engineering problems require novel methodologies that are able to integrate traditional physics-based modeling approaches with state-of-the-art machine learning (ML) techniques. This paper provides a structured overview of such techniques. Application-centric objective areas for whi...
Article
Full-text available
The dataset described here includes estimates of historical (1980–2020) daily surface water temperature, lake metadata, and daily weather conditions for lakes bigger than 4 ha in the conterminous United States (n = 185,549), and also in situ temperature observations for a subset of lakes (n = 12,227). Estimates were generated using a long short‐ter...
Article
The global decline of water quality in rivers and streams has resulted in a pressing need to design new watershed management strategies. Water quality can be affected by multiple stressors including population growth, land use change, global warming, and extreme events, with repercussions on human and ecosystem health. A scientific understanding of...
Preprint
Lakes and reservoirs, as most humans experience and use them, are dynamic bodies of water, with surface extents that increase and decrease with seasonal precipitation patterns, long-term changes in climate, and human management decisions. This paper presents a new global dataset that contains the location and surface area variations of 683,734 medi...
Preprint
Agriculture contributes nearly a quarter of global greenhouse gas (GHG) emissions, which is motivating interest in certain farming practices that have the potential to reduce GHG emissions or sequester carbon in soil. The related GHG emission (including N2O and CH4) and changes in soil carbon stock are defined here as “agricultural carbon outcomes”...
Article
Objective: Hospital-acquired infections (HAIs) are associated with significant morbidity, mortality, and prolonged hospital length of stay. Risk prediction models based on pre- and intraoperative data have been proposed to assess the risk of HAIs at the end of the surgery, but the performance of these models lags behind HAI detection models based o...
Preprint
Full-text available
Agricultural nitrous oxide (N2O) emission accounts for a non-trivial fraction of global greenhouse gases (GHGs) budget. To date, estimating N2O fluxes from cropland remains a challenging task because the related microbial processes (e.g., nitrification and denitrification) are controlled by complex interactions among climate, soil, plant and human...
Preprint
Full-text available
Machine Learning is being extensively used in hydrology, especially streamflow prediction of basins/watersheds. Basin characteristics are essential for modeling the rainfall-runoff response of these watersheds and therefore data-driven methods must take into account this ancillary characteristics data. However there are several limitations, namely...
Preprint
Full-text available
In many applications, finding adequate labeled data to train predictive models is a major challenge. In this work, we propose methods to use group-level binary labels as weak supervision to train instance-level binary classification models. Aggregate labels are common in several domains where annotating on a group-level might be cheaper or might be...
Preprint
Full-text available
Collecting large annotated datasets in Remote Sensing is often expensive and thus can become a major obstacle for training advanced machine learning models. Common techniques of addressing this issue, based on the underlying idea of pre-training the Deep Neural Networks (DNN) on freely available large datasets, cannot be used for Remote Sensing due...
Preprint
Full-text available
Near-term forecasts of environmental outcomes can inform real-time decision making. Data assimilation modeling techniques can be used for forecasts to leverage real-time data streams, where the difference between model predictions and observations can be used to adjust the model to make better predictions tomorrow. In this use case, we developed a...
Preprint
Full-text available
Mapping and monitoring crops is a key step towards sustainable intensification of agriculture and addressing global food security. A dataset like ImageNet that revolutionized computer vision applications can accelerate development of novel crop mapping techniques. Currently, the United States Department of Agriculture (USDA) annually releases the C...
Article
Major societal and environmental challenges involve complex systems that have diverse multi-scale interacting processes. Consider, for example, how droughts and water reserves affect crop production and how agriculture and industrial needs affect water quality and availability. Preventive measures, such as delaying planting dates and adopting new a...
Article
Full-text available
Objective: The association of body mass index (BMI) and all-cause mortality is controversial, frequently referred to as a paradox. Whether the cause is metabolic factors or statistical biases is still controversial. We assessed the association of BMI and all-cause mortality considering a wide range of comorbidities and baseline mortality risk. Me...
Article
Full-text available
Most environmental data come from a minority of well-monitored sites. An ongoing challenge in the environmental sciences is transferring knowledge from monitored sites to unmonitored sites. Here, we demonstrate a novel transfer-learning framework that accurately predicts depth-specific temperature in unmonitored lakes (targets) by borrowing models...
Article
Diseases can show different courses of progression even when patients share the same risk factors. Recent studies have revealed that the use of trajectories, the order in which diseases manifest throughout life, can be predictive of the course of progression. In this study, we propose a novel computational method for learning disease trajectories f...
Preprint
Full-text available
Several deep learning methods for phase retrieval exist, but most of them fail on realistic data without precise support information. We propose a novel method based on single-instance deep generative prior that works well on complex-valued crystal data.
Article
Physics-based models are often used to study engineering and environmental systems. The ability to model these systems is the key to achieving our future environmental sustainability and improving the quality of human life. This article focuses on simulating lake water temperature, which is critical for understanding the impact of changing climate...
Preprint
Full-text available
The availability of massive earth observing satellite data provides huge opportunities for land use and land cover mapping. However, such mapping effort is challenging due to the existence of various land cover classes, noisy data, and the lack of proper labels. Also, each land cover class typically has its own unique temporal pattern and can be ide...
Article
Full-text available
Timely and accurate monitoring of tree crop extent and productivities are necessary for informing policy-making and investments. However, except for a very few tree species (e.g., oil palms) with obvious canopy and extensive planting, most small-crown tree crops are understudied in the remote sensing domain. To conduct large-scale small-crown tree...
Preprint
Full-text available
Land cover mapping is essential for monitoring global environmental change and managing natural resources. Unfortunately, traditional classification models are plagued by limited training data available in existing land cover products and data heterogeneity over space and time. In this survey, we provide a structured and comprehensive overview of c...
Conference Paper
Several deep learning methods for phase retrieval exist, but most of them fail on realistic data without precise support information. We propose a novel method based on single-instance deep generative prior that works well on complex-valued crystal data.
Conference Paper
The existing iterative and data-driven methods fail to solve phase retrieval due to the intrinsic problem symmetries. We propose two end-to-end learning methods that break the barrier and work in a new regime.
Preprint
Streamflow prediction is one of the key challenges in the field of hydrology due to the complex interplay between multiple non-linear physical mechanisms behind streamflow generation. While physically-based models are rooted in rich understanding of the physical processes, a significant performance gap still remains which can be potentially address...
Preprint
Causal inference is a powerful statistical methodology for explanatory analysis and individualized treatment effect (ITE) estimation, a prominent causal inference task that has become a fundamental research problem. ITE estimation, when performed naively, tends to produce biased estimates. To obtain unbiased estimates, counterfactual information is...
Preprint
Most environmental data come from a minority of well-observed sites. An ongoing challenge in the environmental sciences is transferring knowledge from monitored sites to unobserved sites. Here, we demonstrate a novel transfer learning framework that accurately predicts temperature in unobserved lakes (targets) by borrowing models from highly observ...
Preprint
Full-text available
This paper proposes a physics-guided machine learning approach that combines advanced machine learning models and physics-based models to improve the prediction of water flow and temperature in river networks. We first build a recurrent graph network model to capture the interactions among multiple segments in the river network. Then we present a p...
Article
Full-text available
Global river monitoring is an important mission within the remote sensing society. One of the main challenges faced by this mission is generating an accurate water mask from remote sensing images (RSI) of rivers (RSIR), especially on a global scale with various river features. Aiming at better water area classification using semantic information, t...
Article
Full-text available
Phosphorus (P) loading to lakes is degrading the quality and usability of water globally. Accurate predictions of lake P dynamics are needed to understand whole-ecosystem P budgets, as well as the consequences of changing lake P concentrations for water quality. However, complex biophysical processes within lakes, along with limited observational d...
Article
Full-text available
The recent availability of freely and openly available satellite remote sensing products has enabled the implementation of global surface water monitoring at a level not previously possible. Here we present a global set of satellite-derived time series of surface water storage variations for lakes and reservoirs for a period that covers the satelli...
Preprint
Full-text available
In many physical systems, inputs related by intrinsic system symmetries are mapped to the same output. When inverting such systems, i.e., solving the associated inverse problems, there is no unique solution. This causes fundamental difficulties for deploying the emerging end-to-end deep learning approach. Using the generalized phase retrieval probl...
Preprint
In this manuscript, we provide a structured and comprehensive overview of techniques to integrate machine learning with physics-based modeling. First, we provide a summary of application areas for which these approaches have been applied. Then, we describe classes of methodologies used to construct physics-guided machine learning models and hybrid...
Article
Full-text available
Expansion of large-scale tree plantations for commodity crop and timber production is a leading cause of tropical deforestation. While automated detection of plantations across large spatial scales and with high temporal resolution is critical to inform policies to reduce deforestation, such mapping is technically challenging. Thus, most available...
Article
Full-text available
Background: The ubiquity of electronic health records (EHR) offers an opportunity to observe trajectories of laboratory results and vital signs over long periods of time. This study assessed the value of risk factor trajectories available in the electronic health record to predict incident type 2 diabetes. Study design and methods: Analysis was...
Preprint
Full-text available
Many real-world phenomena are observed at multiple resolutions. Predictive models designed to predict these phenomena typically consider different resolutions separately. This approach might be limiting in applications where predictions are desired at fine resolutions but available training data is scarce. In this paper, we propose classification a...
Preprint
Full-text available
The recent availability of freely and openly available satellite remote sensing products has enabled the implementation of global surface water monitoring to a level not previously possible. Here we present a global set of satellite-derived time series of surface water storage variations for lakes and reservoirs for a period that covers t...
Chapter
Full-text available
Many real-world phenomena are observed at multiple resolutions. Predictive models designed to predict these phenomena typically consider different resolutions separately. This approach might be limiting in applications where predictions are desired at fine resolutions but available training data is scarce. In this paper, we propose classification a...
Poster
Understanding the impacts of climate change on natural and human systems poses major challenges as it requires the integration of models and data across various disciplines, including hydrology, agriculture, ecosystem modeling, and econometrics. While tactical situations arising from an extreme weather event require rapid responses, integrating the...