Lex Fridman

Massachusetts Institute of Technology | MIT

PhD

About

Publications: 69
Reads: 70,953
Citations: 1,921
Introduction
I'm a research scientist at MIT, working on human-centered artificial intelligence. In particular, I'm developing and applying new computer vision and deep learning approaches in the context of self-driving cars with a human-in-the-loop. I work with large-scale, real-world data, with the goal of building intelligent systems that have real-world impact. I received my BS, MS, and PhD from Drexel University where I worked on applications of machine learning, computer vision, and decision fusion techniques in a number of fields including robotics, active authentication, activity recognition, and optimal resource allocation on multi-commodity networks. Before joining MIT, I was at Google working on machine learning for large-scale behavior-based authentication.
Additional affiliations
Google Inc. | Visiting Researcher | September 2014 – May 2015

Publications (69)
Article
Non-intrusive, real-time analysis of the dynamics of the eye region allows us to monitor humans’ visual attention allocation and estimate their mental state during the performance of real-world tasks, which can potentially benefit a wide range of human-computer interaction (HCI) applications. While commercial eye-tracking devices have been frequent...
Article
Semantic scene segmentation has primarily been addressed by forming high-level visual representations of single images. The problem of semantic segmentation in dynamic scenes has begun to receive attention with the video object segmentation and tracking problem. While there has been some recent work attempting to use deep learning models on the video...
Article
Little is known about the use of driving automation in production vehicles. This study measured mileage driven on different roadway classes when level 1 and 2 automation was engaged. Volunteer drivers drove for 4 weeks in a Range Rover Evoque with adaptive cruise control (ACC) or a Volvo S90 with ACC and Pilot Assist (PA) with one of two versions o...
Preprint
Object detection is a critical part of visual scene understanding. The representation of the object in the detection task has important implications on the efficiency and feasibility of annotation, robustness to occlusion, pose, lighting, and other visual sources of semantic uncertainty, and effectiveness in real-world applications (e.g., autonomou...
Article
Today, and possibly for a long time to come, the full driving task is too complex an activity to be fully formalized as a sensing-acting robotics system that can be explicitly solved through model-based and learning-based approaches in order to achieve full unconstrained vehicle autonomy. Localization, mapping, scene perception, vehicle control, tr...
Article
If a vehicle is driving itself and asks the driver to take over, how much time does the driver need to comprehend the scene and respond appropriately? Previous work on natural-scene perception suggests that observers quickly acquire the gist, but gist-level understanding may not be sufficient to enable action. The moving road environment cannot be...
Preprint
When asked, a majority of people believe that, as pedestrians, they make eye contact with the driver of an approaching vehicle when making their crossing decisions. This work presents evidence that this widely held belief is false. We do so by showing that, in the majority of cases where conflict is possible, pedestrians begin crossing long before they...
Preprint
Humans, as both pedestrians and drivers, generally skillfully navigate traffic intersections. Despite the uncertainty, danger, and the non-verbal nature of communication commonly found in these interactions, there are surprisingly few collisions considering the total number of interactions. As the role of automation technology in vehicles grows, it...
Preprint
We use an immersive virtual reality environment to explore the intricate social cues that underlie non-verbal communication involved in a pedestrian's crossing decision. We "hack" non-verbal communication between pedestrian and vehicle by engineering a set of 15 vehicle trajectories, some of which follow social conventions and some of which break them....
Preprint
Semantic scene segmentation has primarily been addressed by forming representations of single images with both supervised and unsupervised methods. The problem of semantic segmentation in dynamic scenes has recently begun to receive attention with video object segmentation approaches. What is not known is how much extra information the temporal dyn...
Preprint
There are as many paths to mass adoption of autonomous vehicle systems as there are people, companies, and governments willing to try to engineer and support the development of such systems. Opinions vary vastly. Many researchers, engineers, and policy-makers believe that semi-autonomous systems are too difficult to engineer safely and effectively...
Conference Paper
Researchers, technology reviewers, and governmental agencies have expressed concern that automation may necessitate the introduction of added displays to indicate vehicle intent in vehicle-to-pedestrian interactions. An automated online methodology for obtaining communication intent perceptions for 30 external vehicle-to-pedestrian display concepts...
Preprint
Building effective, enjoyable, and safe autonomous vehicles is a lot harder than has historically been considered. The reason is that, simply put, an autonomous vehicle must interact with human beings. This interaction is not a robotics problem nor a machine learning problem nor a psychology problem nor an economics problem nor a policy problem. It...
Article
As machine learning approaches ubiquity in industrial systems and consumer products, human factors research must attend to machine learning, specifically to how intelligent systems built on machine learning are different from early generations of automated systems, and what these differences mean for human-system interaction, design, evaluation and...
Preprint
We propose that safe, beautiful, fulfilling vehicle HMI design must start from a rigorous consideration of minimalist design. Modern vehicles are changing from mechanical machines to mobile computing devices, similar to the change from landline phones to smartphones. We propose the approach of "designing toward minimalism", where we ask "why?" rath...
Conference Paper
Cognitive load has been shown, over hundreds of validated studies, to be an important variable for understanding human performance. However, establishing practical, non-contact approaches for automated estimation of cognitive load under real-world conditions is far from a solved problem. Toward the goal of designing such a system, we propose two no...
Conference Paper
We will explore how deep learning approaches can be used for perceiving and interpreting the state and behavior of human beings in images, video, audio, and text data. The course will cover how convolutional, recurrent and generative neural networks can be used for applications of face recognition, eye tracking, cognitive load estimation, emotion r...
Article
Cognitive load has been shown, over hundreds of validated studies, to be an important variable for understanding human performance. However, establishing practical, non-contact approaches for automated estimation of cognitive load under real-world conditions is far from a solved problem. Toward the goal of designing such a system, we propose two no...
Article
There is increasing use of smaller cells, or femtocells, to improve the performance and coverage of next-generation heterogeneous wireless networks (HetNets). However, the interference caused by femtocells to neighboring cells is a limiting performance factor in dense HetNets. This interference is being managed via distributed resource allocation m...
Article
The relationship between a driver’s glance orientation and corresponding head rotation is highly complex due to its nonlinear dependence on the individual, task, and driving context. This paper presents expanded analytic detail and findings from an effort that explored the ability of head pose to serve as an estimator for driver gaze by connecting...
Conference Paper
We present a traffic simulation named DeepTraffic where the planning systems for a subset of the vehicles are handled by a neural network as part of a model-free, off-policy reinforcement learning process. The primary goal of DeepTraffic is to make the hands-on study of deep reinforcement learning accessible to thousands of students, educators, and...
Article
We consider the paradigm of a black box AI system that makes life-critical decisions. We propose an "arguing machines" framework that pairs the primary AI system with a secondary one that is independently trained to perform the same task. We show that disagreement between the two systems, without any knowledge of underlying system design or operati...
Article
Visualizing the information available to a human observer in a single glance at an image provides a powerful tool for evaluating models of full-field human vision. The hard part is human-realistic visualization of the periphery. Degradation of information with distance from fixation is far more complex than a mere reduction of acuity that might be...
Conference Paper
In the peripheral field of view, our visual system provides a much lower image quality than in the central region. This has often been attributed to a mere loss of spatial acuity, but recent investigations suggest that the system uses a more refined strategy. To lower its data load, it computes a statistical summary representation based on low-le...
Conference Paper
We consider a large dataset of real-world, on-road driving from a 100-car naturalistic study to explore the predictive power of driver glances and, specifically, to answer the following question: what can be predicted about the state of the driver and the state of the driving environment from a 6-second sequence of macro-glances? The context-based...
Conference Paper
Failures in drivers’ attention allocation become evident when multitasking-related demands leave vehicle operators unable to detect or respond appropriately to roadway threats or interfere adversely with their ability to appropriately control the vehicle. Robust methods for obtaining evidence and data about demands upon and decrements in the alloc...
Article
Multitasking-related demands can adversely affect drivers' allocation of attention to the roadway, resulting in delayed or missed responses to roadway threats and in decrements in driving performance. Robust methods for obtaining evidence and data about demands on and decrements in the allocation of driver attention are needed as input for design, t...
Article
We propose a framework for semi-automated annotation of video frames where the video is of an object that at any point in time can be labeled as being in one of a finite number of discrete states. A Hidden Markov Model (HMM) is used to model (1) the behavior of the underlying object and (2) the noisy observation of its state through an image proces...
Article
We consider a large dataset of real-world, on-road driving from a 100-car naturalistic study to explore the predictive power of driver glances and, specifically, to answer the following question: what can be predicted about the state of the driver and the state of the driving environment from a 6-second sequence of macro-glances? The context-based...
Article
Driving is an intricate task where different demands compete for the driver’s attention. Current interface designs present novel multi-modal interactions that extend beyond traditional visual-manual modalities. These new interaction paradigms have given rise to additional subtask elements which call upon varying degrees of cognitive, auditory, voca...
Conference Paper
This study explores the effects of minor changes in automation level on drivers' engagement in secondary activities. Three levels of automation were tested: manual, semi-autonomous, and fully-autonomous. Potential distractor items were present and participants were instructed they could use them if they felt it was safe. Hand positions and engageme...
Conference Paper
Representations obtained from the statistical pooling of features are gaining popularity. The common assumption is that low-level features are best suited for such statistical pooling. Here we investigate which level of a visual feature hierarchy can actually produce the optimal statistical representation. We make use of the award-winning VGG...
Article
Automated estimation of the allocation of a driver's visual attention could be a critical component of future advanced driver assistance systems. In theory, vision-based tracking of the eye can provide a good estimate of gaze location. But in practice, eye tracking from video is challenging because of sunglasses, eyeglass reflections, lighting cond...
Article
Our senses can process only a limited amount of the incoming sensory information. In the human visual system this becomes apparent in the reduced performance in the peripheral field of view, as compared to the central fovea. Recent research shows that this loss of information cannot solely be attributed to a spatially coarser resolution but is esse...
Article
The relationship between a driver's glance pattern and corresponding head rotation is highly complex due to its nonlinear dependence on the individual, task, and driving context. This study explores the ability of head pose to serve as an estimator for driver gaze by connecting head rotation data with manually coded gaze region data using both a st...
Article
Cell biasing and downlink transmit power are two controls that may be used to improve the spectral efficiency of cellular networks. With cell biasing, each mobile user associates with the base station offering, say, the highest biased signal to interference plus noise ratio. Biasing affects the cell association decisions of mobile users, but not th...
Article
We introduce a recurrent neural network architecture for automated road surface wetness detection from audio of tire-surface interaction. The robustness of our approach is evaluated on 785,826 bins of audio that span an extensive range of vehicle speeds, noises from the environment, road surface types, and pavement conditions including internationa...
Article
We present a large-scale study, exploring the capability of temporal deep neural networks in interpreting natural human kinematics and introduce the first method for active biometric authentication with mobile inertial sensors. At Google, we have created a first-of-its-kind dataset of human movements, passively collected by 1500 volunteers using th...
Article
We propose a method for automated synchronization of vehicle sensors useful for the study of multi-modal driver behavior and for the design of advanced driver assistance systems. Multi-sensor decision fusion relies on synchronized data streams in (1) the offline supervised learning context and (2) the online prediction context. In practice, such da...
Article
Accurate, robust, inexpensive gaze tracking in the car can help keep a driver safe by facilitating the more effective study of how to improve (1) vehicle interfaces and (2) the design of future Advanced Driver Assistance Systems. In this paper, we estimate head pose and eye pose from monocular video using methods developed extensively in prior work...
Article
Automated estimation of the allocation of a driver's visual attention may be a critical component of future Advanced Driver Assistance Systems. In theory, vision-based tracking of the eye can provide a good estimate of gaze location. In practice, eye tracking from video is challenging because of sunglasses, eyeglass reflections, lighting conditions...
Article
Active authentication is the problem of continuously verifying the identity of a person based on behavioral aspects of their interaction with a computing device. In this study, we collect and analyze behavioral biometrics data from 200 subjects, each using their personal Android mobile device for a period of at least 30 days. This dataset is novel i...
Conference Paper
Active authentication is the process of continuously verifying a user based on his/her ongoing interactions with a computer. Forensic stylometry is the study of linguistic style applied to author (user) identification. This paper evaluates the Active Linguistic Authentication Dataset, collected from users working individually in an office environm...
Conference Paper
Active authentication is the process of continuously verifying a user based on his/her ongoing interaction with the computer. Forensic stylometry is the study of linguistic style, applied to author (user) identification. We evaluate the Active Linguistic Authentication Dataset [Juola et al., 2013], collected from users working individually in an of...
Article
The authors apply a decision fusion architecture on a collection of behavioral biometric sensors using keystroke dynamics, mouse movement, stylometry, and Web browsing behavior. They test this active authentication approach on a dataset collected from 19 individuals in an office environment.
Conference Paper
In this paper, we consider cellular downlink communication from a set of fixed stations to a set of users with uncertain locations modeled by a spatial distribution. Each user is associated with a transmitter through association zones that maximize the signal-to-interference-plus-noise ratio (SINR) to that user. We define an expected spatial capaci...
Conference Paper
The interaction between humans and most desktop and laptop computers is often performed through two input devices: the keyboard and the mouse. Continuous tracking of these devices provides an opportunity to verify the identity of a user, based on a profile of behavioral biometrics from the user's previous interaction with these devices. We propose...
Article
We present a software library that aids in the design of mobile ad hoc networks (MANET). The OMAN design engine works by taking a specification of network requirements and objectives, and allocates resources which satisfy the input constraints and maximize the communication performance objective. The tool is used to explore networking design option...
Conference Paper
In the energy-constrained medium of video sensor networks, the objective of much research has been to statistically minimize the number of nodes that will achieve a sufficient degree of coverage. We consider increasing the number of nodes beyond the threshold of full coverage, and cooperatively filtering out the high level of redundant data in the...
Conference Paper
Complex graphs, ones containing thousands of nodes of high degree, are difficult to visualize. Displaying all of the nodes and edges of these graphs can create an incomprehensible cluttered output. This paper presents a simplification algorithm that may be applied to a complex graph in order to produce a controlled thinning of the graph. Using impo...
Conference Paper
Movement and allocation of network resources for a system of communicating agents are usually optimized independently. Path planning under kinematic restrictions and obstacle avoidance provides a set of paths for the agents, and given the paths, it is then the job of network design algorithms to allocate communication resources to ensure a satisfac...
Conference Paper
Cognitive radios permit dynamic control of physical layer resources such as transmission power and constellation size; these degrees of freedom can be employed to achieve significant improvements in network throughput above that obtainable using conventional radios (with fixed transmission power and constellation size). In this paper we present a u...
Conference Paper
Resource allocation in ad hoc communication networks is a field of high complexity because of both i) the distributed nature of the interactions between the nodes, and ii) the large set of control variables for even the most primitive networks. Visual representation of this information across physical space and across layers of the network can grea...
Conference Paper
Path planning and network design are often treated by architects of mobile communication networks as separate problems. In fact, most mobile ad hoc network (MANET) designs do not consider the path that the network nodes would take as part of the objective set, but incorporate them in an abstract form as general constraints on mobility (limit on the...
Conference Paper
The conventional philosophy in designing mobile networks is that network node movement should be independent of network state. However, there are practical situations where movement decisions may be modified to ensure connectivity. For example, emergency responders in a crisis region relying upon an ad hoc network may need constant reliable communi...
Conference Paper
In this paper we apply robust optimization techniques to the problem of power control in mobile ad hoc wireless networks. Our approach is inherently multi-objective in that we seek a solution set that trades off the dual objectives of achieving optimality and maintaining feasibility. In particular, our objective is to minimize the aggregate power e...
