
David Crandall- Indiana University Bloomington
About
132 Publications · 29,790 Reads · 9,214 Citations
Publications (132)
Stereophotogrammetry [62] is an established technique for scene understanding. Its origins go back to at least the 1800s when people first started to investigate using photographs to measure the physical properties of the world. Since then, thousands of approaches have been explored. The classic geometric technique of Shape from Stereo is built on...
Video temporal grounding aims to localize relevant temporal boundaries in a video given a textual prompt. Recent work has focused on enabling Video LLMs to perform video temporal grounding via next-token prediction of temporal timestamps. However, accurately localizing timestamps in videos remains challenging for Video LLMs when relying solely on t...
This short paper presents preliminary research on the Case-Enhanced Vision Transformer (CEViT), a similarity measurement method aimed at improving the explainability of similarity assessments for image data. Initial experimental results suggest that integrating CEViT into k-Nearest Neighbor (k-NN) classification yields classification accuracy compa...
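The integration described above can be sketched generically: k-NN classification where the distance or similarity function is a pluggable component. This is a minimal illustrative sketch, not the paper's method; plain cosine similarity stands in for a learned measure like CEViT, and all names are assumptions.

```python
# Minimal sketch: k-NN classification with a pluggable similarity measure.
# A learned model (e.g. CEViT in the paper) would replace `cosine` here.
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def knn_classify(query, examples, k=3, similarity=cosine):
    # examples: list of (feature_vector, label) pairs.
    # Rank training examples by similarity to the query, keep the top k,
    # and return the majority label among those neighbors.
    neighbors = sorted(examples, key=lambda ex: similarity(query, ex[0]),
                       reverse=True)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Because the similarity function is a parameter, swapping in a different measure changes the classifier's behavior without touching the voting logic.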
Counterfeit goods are pervasive, being found in products as diverse as textiles and optical media to pharmaceuticals and sensitive electronics. Here, an anti‐counterfeit platform is reported in which plasmonic nanoparticles (NPs) are used to create unique image tags that can be authenticated quickly and reliably. Specifically, plasmonic NPs are ass...
The last several years have witnessed remarkable progress in video-and-language (VidL) understanding. However, most modern VidL approaches use complex and specialized model architectures and sophisticated pretraining protocols, making the reproducibility, analysis and comparisons of these frameworks difficult. Hence, instead of proposing yet anothe...
Video segmentation—partitioning video frames into multiple segments or objects—plays a critical role in a broad range of practical applications, from enhancing visual effects in movies, to understanding scenes in autonomous driving, to creating virtual backgrounds in video conferencing. Recently, with the renaissance of connectionism in computer visi...
Machine learning models of visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond trained situations. We set out to identify properties of...
Video anomaly detection has been extensively studied for static cameras but is much more challenging in egocentric driving videos where the scenes are extremely dynamic. This paper proposes an unsupervised method for anomaly detection based on future object localization. The idea is to predict locations of traffic participants short time steps into...
In this paper, we propose a model that can attack segmentation models with semantic and dynamic targets in the context of self-driving. Specifically, our model is designed to map an input image as well as its corresponding label to perturbations. After adding the perturbation to the input image, the adversarial example can manipulate the labels of...
We propose to predict the future trajectories of observed agents (e.g., pedestrians or vehicles) by estimating and using their goals at multiple time scales. We argue that the goal of a moving agent may change over time, and modeling goals continuously provides more accurate and detailed information for future trajectory estimation. In this paper,...
Visible-infrared person re-identification (VI-ReID) is a challenging cross-modality pedestrian retrieval problem. Due to the large intra-class variations and cross-modality discrepancy with large amount of sample noise, it is difficult to learn discriminative part features. Existing VI-ReID methods instead tend to learn global representations, whic...
Our project is at the interface of Big Data and HPC – High-Performance Big Data computing – and this paper describes a collaboration among 7 universities: Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia Tech, and Utah. It addresses the intersection of High-performance and Big Data computing with several differe...
Infants are powerful learners. A large corpus of experimental paradigms demonstrates that infants readily learn distributional cues of name-object co-occurrences. But infants' natural learning environment is cluttered: every heard word has multiple competing referents in view. Here we ask how infants start learning name-object co-occurrences in natu...
Pervasive photo sharing in online social media platforms can cause unintended privacy violations when elements of an image reveal sensitive information. Prior studies have identified image obfuscation methods (e.g., blurring) to enhance privacy, but many of these methods adversely affect viewers' satisfaction with the photo, which may cause people...
Knowing who is in one's vicinity is key to managing privacy in everyday environments, but is challenging for people with visual impairments. Wearable cameras and other sensors may be able to detect such information, but how should this complex visually-derived information be conveyed in a way that is discreet, intuitive, and unobtrusive? Motivated...
Recognizing abnormal events such as traffic violations and accidents in natural driving scenes is essential for successful autonomous and advanced driver assistance systems. However, most work on video anomaly detection suffers from one of two crucial drawbacks. First, it assumes cameras are fixed and videos have a static background, which is reaso...
Transferring knowledge across different datasets is an important approach to successfully train deep models with a small-scale target dataset or when few labeled instances are available. In this paper, we aim at developing a model that can generalize across multiple domain shifts, so that this model can adapt from a single source to multiple target...
Real-world learning systems have practical limitations on the quality and quantity of the training datasets that they can collect and consider. How should a system go about choosing a subset of the possible training examples that still allows for learning accurate, generalizable models? To help address this question, we draw inspiration from a high...
Most work on temporal action detection is formulated in an offline manner, in which the start and end times of actions are determined after the entire video is fully observed. However, real-time applications including surveillance and driver assistance systems require identifying actions as soon as each video frame arrives, based only on current an...
Infants and toddlers view the world, at a basic sensory level, in a fundamentally different way from their parents. This is largely due to biological constraints: infants possess different body proportions than their parents and the ability to control their own head movements is less developed. Such constraints limit the visual input available. Thi...
Our project is at the interface of Big Data and HPC -- High-Performance Big Data computing -- and this paper describes a collaboration among 7 universities: Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia Tech, and Utah. It addresses the intersection of High-performance and Big Data computing with several different app...
Predicting the future location of vehicles is essential for safety-critical applications such as advanced driver assistance systems (ADAS) and autonomous driving. This paper introduces a novel approach to simultaneously predict both the location and scale of target vehicles in the first-person (egocentric) view of an ego-vehicle. We present a multi...
In a world of pervasive cameras, public spaces are often captured from multiple perspectives by cameras of different types, both fixed and mobile. An important problem is to organize these heterogeneous collections of videos by finding connections between them, such as identifying correspondences between the people appearing in the videos and the p...
Automatic image captioning has been studied extensively over the last few years, driven by breakthroughs in deep learning-based image-to-text translation models. However, most of this work has considered captioning web images from standard data sets like MS-COCO, and has considered single images in isolation. To what extent can automatic captioning...
A single image captures the appearance and position of multiple entities in a scene as well as their complex interactions. As a consequence, natural language grounded in visual contexts tends to be diverse---with utterances differing as focus shifts to specific objects, interactions, or levels of detail. Recently, neural sequence models such as RNN...
With the rise of digital photography and social networking, people are sharing personal photos online at an unprecedented rate. In addition to their main subject matter, photographs often capture various incidental information that could harm people's privacy. While blurring and other image filters may help obscure private content, they also often...
In a world in which cameras are becoming more and more pervasive, scenes in public spaces are often captured from multiple perspectives by diverse types of cameras, including surveillance and wearable cameras. An important problem is how to organize these heterogeneous collections of videos by finding connections between them, such as identifying c...
A major emerging challenge is how to protect people's privacy as cameras and computer vision are increasingly integrated into our daily lives, including in smart devices inside homes. A potential solution is to capture and record just the minimum amount of information needed to perform a task of interest. In this paper, we propose a fully-coupled t...
Deep learning methods have surpassed the performance of traditional techniques on a wide range of problems in computer vision, but nearly all of this work has studied consumer photos, where precisely correct output is often not critical. It is less clear how well these techniques may apply on structured prediction problems where fine-grained output...
Ground-penetrating radar on planes and satellites now makes it practical to collect 3D observations of the subsurface structure of the polar ice sheets, providing crucial data for understanding and tracking global climate change. But converting these noisy readings into useful observations is generally done by hand, which is impractical at a contin...
Recent advances in wearable camera technology have led many cognitive psychologists to study the development of the human visual system by recording the field of view of infants and toddlers. Meanwhile, the vast success of deep learning in computer vision is driving researchers in both disciplines to aim to benefit from each other's understanding....
Toddlers quickly learn to recognize thousands of everyday objects despite the seemingly suboptimal training conditions of a visually cluttered world. One reason for this success may be that toddlers do not just passively perceive visual information, but actively explore and manipulate objects around them. The work in this paper is based on the idea...
We propose a novel convolutional neural network architecture for estimating geospatial functions such as population density, land cover, or land use. In our approach, we combine overhead and ground-level images in an end-to-end trainable neural network, which uses kernel regression and density estimation to convert features extracted from the groun...
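The kernel regression step mentioned above is, in its classical form, a Nadaraya-Watson weighted average: sparse observations are spread into a dense map by weighting each one with a kernel of its distance to the query point. The sketch below is a minimal illustration under assumed 2-D locations and scalar values, not the paper's actual end-to-end feature maps.

```python
# Minimal sketch of Nadaraya-Watson kernel regression over 2-D locations.
# Locations, values, and the bandwidth are illustrative assumptions.
import math

def gaussian_kernel(d, bandwidth):
    # Gaussian weight that decays with distance d.
    return math.exp(-(d * d) / (2 * bandwidth * bandwidth))

def kernel_regress(query_xy, samples, bandwidth=1.0):
    # samples: list of ((x, y), value) pairs -- sparse observations.
    # Returns the kernel-weighted average of the sample values at query_xy.
    num, den = 0.0, 0.0
    for (x, y), v in samples:
        d = math.hypot(query_xy[0] - x, query_xy[1] - y)
        w = gaussian_kernel(d, bandwidth)
        num += w * v
        den += w
    return num / den if den > 0 else 0.0
```

Evaluating this at every cell of a grid converts scattered point observations into a dense, smoothly varying map.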
Since its ambitious beginnings to create a hyperlinked information system, the web has evolved over 25 years to become our primary means of expression and communication. No longer limited to text, the evolving visual features of websites are important signals of larger societal shifts in humanity's technologies, aesthetics, cultures, and industries...
With the help of various assistive devices, people with visual impairments are able to live their lives with greater independence both online and offline. But significant work remains to understand and address their safety, security, and privacy concerns, especially in the physical, offline world. For example, people with visual impairments are par...
Despite the progress in image recognition from recent data-driven paradigms, it is still expensive to manually label a large training dataset to fit a convolutional neural network (CNN) model. This paper proposes a hybrid supervised-unsupervised method combining a pre-trained AlexNet with Latent Dirichlet Allocation (LDA) to extract image topics from bot...
Two major trends in computing systems are the growth in high performance computing (HPC) with an international exascale initiative, and the big data phenomenon with an accompanying cloud infrastructure of well publicized dramatic and increasing size and sophistication. This tutorial weaves these trends together using some key building blocks. The f...
Status of NSF 1443054 Project. Big Data Application Analysis identifies features of data intensive applications that need to be supported in software and represented in benchmarks. This analysis was started for the proposal and has been extended to support HPC-Simulations-Big Data convergence. The proje...
This covers Streaming workshops held; IoTCloud for cloud control of robots; the SPIDAL project; HPC-ABDS; WebPlotviz visualization and Stock Market data; and scientific paper impact analysis for XSEDE
This poster covers the Harp HPC Hadoop plugin; the RaPyDLI deep learning system; Virtual Clusters on the XSEDE Comet system; Cloudmesh to defer Ansible Big Data applications; Big Data Ogres and Diamonds to converge HPC and Big Data; and the performance of Flink on machine learning
This poster introduces all of the DSC projects below and covers items 1), 3), 4), and 5): 1) Digital Science Center Facilities; 2) RaPyDLI Deep Learning Environment; 3) SPIDAL Scalable Data Analytics Library and applications including Bioinformatics and Polar Remote Sensing Data Analysis; 4) MIDAS Big Data Software, Harp for HPC-ABDS; 5) Big Data Ogres Classification an...
Lifelogging cameras capture everyday life from a first-person perspective, but generate so much data that it is hard for users to browse and organize their image collections effectively. In this paper, we propose to use automatic image captioning algorithms to generate textual representations of these collections. We develop and explore novel techn...
Neural sequence models are widely used to model time-series data in many fields. Equally ubiquitous is the usage of beam search (BS) as an approximate inference algorithm to decode output sequences from these models. BS explores the search space in a greedy left-right fashion retaining only the top-$B$ candidates -- resulting in sequences that diff...
Accurate, efficient, global observation of natural events is important for ecologists, meteorologists, governments, and the public. Satellites are effective but limited by their perspective and by atmospheric conditions. Public images on photo-sharing websites could provide crowd-sourced ground data to complement satellites, since photos contain ev...
Subjective and sentiment analysis has gained considerable attention recently. Most of the resources and systems built so far are for English, and the need to design systems for other languages is increasing. This paper surveys different approaches used to build systems for subjective and sentiment analysis for languages other than English. There...
This is a 21-month progress report on an NSF-funded project, NSF14-43054, started October 1, 2014 and involving a collaboration between university teams at Arizona, Emory, Indiana (lead), Kansas, Rutgers, Virginia Tech, and Utah. The project is constructing data building blocks to address major cyberinfrastructure challenges in seven different communi...
This paper presents a novel model of science funding that exploits the wisdom of the scientific crowd. Each researcher receives an equal, unconditional part of all available science funding on a yearly basis, but is required to individually donate to other scientists a given fraction of all they receive. Science funding thus moves from one scientis...
During early visual development, the infant's body and actions both create and constrain the experiences on which the visual system grows. Evidence on early motor development suggests a bias for acting on objects with the eyes, head, trunk, hands, and object aligned at midline. Because these sensory-motor bodies structure visual input, they may als...
The dramatic growth of social media websites over the last few years has created huge collections of online images and raised new challenges in organizing them effectively. One particularly intuitive way of browsing and searching images is by the geo-spatial location of where on Earth they were taken, but most online images do not have GPS metadata...
People with visual impairments face a variety of obstacles in their daily lives. Recent work has identified specific physical privacy concerns of this population and explored how emerging technology , such as wearable devices, could help. In this study we investigated their physical safety and security concerns and behaviors by conducting interview...
Many practical perception systems exist within larger processes which often include interactions with users or additional components that are capable of evaluating the quality of predicted solutions. In these contexts, it is beneficial to provide these oracle mechanisms with multiple highly likely hypotheses rather than a single prediction. In this...
Low-cost, lightweight wearable cameras let us record (or 'lifelog') our lives from a 'first-person' perspective for purposes ranging from fun to therapy. But they also capture private information that people may not want to be recorded, especially if images are stored in the cloud or visible to other people. For example, recent studies suggest that...
Simultaneous Localization and Mapping (SLAM) for mobile robots is a computationally expensive task. A robot capable of SLAM needs a powerful onboard computer, but this can limit the robot's mobility because of weight and power demands. We consider moving this task to a remote compute cloud, by proposing a general cloud-based architecture for real-t...
The basal topography of the Canadian Arctic Archipelago ice caps is unknown for a number of the glaciers which drain the ice caps. The basal topography is needed for calculating present sea level contribution using the surface mass balance and discharge method and to understand future sea level contributions using ice flow model studies. During the...
Hands appear very often in egocentric video, and their appearance and pose give important cues about what people are doing and what they are paying attention to. But existing work in hand detection has made strong assumptions that work well in only simple scenarios, such as with limited interaction with other people or in lab settings. We develop m...
Convolutional Neural Networks have achieved state-of-the-art performance on a wide range of tasks. Most benchmarks are led by ensembles of these powerful learners, but ensembling is typically treated as a post-hoc procedure implemented by averaging independently trained models with model variation induced by bagging or random initialization. In thi...
Wearable devices are becoming part of everyday life, from first-person cameras (GoPro, Google Glass), to smart watches (Apple Watch), to activity trackers (FitBit). These devices are often equipped with advanced sensors that gather data about the wearer and the environment. These sensors enable new ways of recognizing and analyzing the wearer's eve...
With vast quantities of imagery now available online, researchers have begun to explore whether visual patterns can be discovered automatically. Here we consider the particular domain of architecture, using huge collections of street-level imagery to find visual patterns that correspond to semantic-level architectural elements distinctive to partic...
Various technologies have been developed to help make the world more accessible to visually impaired people, and recent advances in low-cost wearable and mobile computing are likely to drive even more advances. However, the unique privacy and security needs of visually impaired people remain largely unaddressed. We conducted an exploratory user stu...
While media reports about wearable cameras have focused on the privacy concerns of bystanders, the perspectives of the `lifeloggers' themselves have not been adequately studied. We report on additional analysis of our previous in-situ lifelogging study in which 36 participants wore a camera for a week and then reviewed the images to specify privacy...
Geographic location is a powerful property for organizing large-scale photo collections, but only a small fraction of online photos are geo-tagged. Most work in automatically estimating geo-tags from image content is based on comparison against models of buildings or landmarks, or on matching to large reference collections of geotagged images. Thes...
Climate models that predict polar ice sheet behavior require accurate measurements of the bedrock-ice and ice-air boundaries in ground-penetrating radar imagery. Identifying these features is typically performed by hand, which can be tedious and error prone. We propose an approach for automatically estimating layer boundaries by viewing this task a...
We live and work in environments that are inundated with cameras embedded in devices such as phones, tablets, laptops, and monitors. Newer wearable devices like Google Glass, Narrative Clip, and Autographer offer the ability to quietly log our lives with cameras from a `first person' perspective. While capturing several meaningful and interesting m...
Consumer electronic devices like smartphones increasingly feature arrays of sensors that can 'see', 'hear', and 'feel' the environment around them. While these devices began with primitive capabilities, newer generations of electronics offer sophisticated sensing arrays that collect high-fidelity representations of the physical world. For example,...
A number of wearable 'lifelogging' camera devices have been released recently, allowing consumers to capture images and other sensor data continuously from a first-person perspective. Unlike traditional cameras that are used deliberately and sporadically, lifelogging devices are always 'on' and automatically capturing images. Such features may chal...
Understanding visual attention in children could yield insight into how the visual system develops during formative years and how children's overt attention plays a role in development and learning. We are particularly interested in the role of hands and hand activities in children's visual attention. We use head-mounted cameras to collect egocentr...
Egocentric cameras are becoming more popular, introducing increasing volumes of video in which the biases and framing of traditional photography are replaced with those of natural viewing tendencies. This paradigm enables new applications, including novel studies of social interaction and human development. Recent work has focused on identifying th...
Photo-sharing websites have become very popular in the last few years, leading to huge collections of online images. In addition to image data, these websites collect a variety of multimodal metadata about photos including text tags, captions, GPS coordinates, camera metadata, user profiles, etc. However, this metadata is not well constrained and i...
A response by Bollen et al.
Mobile devices collect a variety of information about their environments, recording “digital footprints” about the locations and activities of their human owners. These footprints come from physical sensors such as GPS, WiFi, and Bluetooth, as well as social behavior logs like phone calls, application usage, etc. Existing studies analyze mobile dev...
Vehicle recognition is a challenging task with many useful applications. State-of-the-art methods usually learn discriminative classifiers for different vehicle categories or different viewpoint angles, but little work has explored vehicle recognition using semantic visual attributes. In this paper, we propose a novel iterative multiple instance le...
The traditional peer review system for grant proposals is not always optimal. A new crowdfunding proposal based on advances in technology and mathematics could improve efficiency while retaining peer judgement.
The billions of public photos on online social media sites contain a vast amount of latent visual information about the world. In this paper, we study the feasibility of observing the state of the natural world by recognizing specific types of scenes and objects in large-scale social image collections. More specifically, we study whether we can rec...