Zhaozheng Yin's research while affiliated with Stony Brook University and other places

Publications (66)

Preprint
Microscopy cell images of biological experiments on different tissues/organs/imaging conditions usually contain cells with various shapes and appearances on different image backgrounds, which makes a cell counting model trained in a source domain hard to transfer to a new target domain. Thus, costly manual annotation is required to train deep lear...
Article
Phase contrast microscopy, as a noninvasive imaging technique, has been widely used to monitor the behavior of transparent cells without staining or altering them. Due to the optical principle of the specifically-designed microscope, phase contrast microscopy images contain artifacts such as halo and shade-off which hinder the cell segmentation and...
Preprint
Detecting dangerous traffic agents in videos captured by vehicle-mounted dashboard cameras (dashcams) is essential to facilitate safe navigation in a complex environment. Accident-related videos are just a minor portion of the driving video big data, and the transient pre-accident processes are highly dynamic and complex. Besides, risky and non-ris...
Preprint
Aerial robots such as drones have been leveraged to perform bridge inspections. Inspection images with both recognizable structural elements and apparent surface defects can be collected by onboard cameras to provide valuable information for the condition assessment. This article aims to determine a suitable deep neural network (DNN) for parsing mu...
Article
The rapid advancement of sensor technologies and artificial intelligence is creating new opportunities for traffic safety enhancement. Dashboard cameras (dashcams) have been widely deployed on both human-driven vehicles and automated driving vehicles. A computational intelligence model that can accurately and promptly predict accidents from the d...
Article
As artificial intelligence and industrial automation develop, human-robot collaboration (HRC) with advanced interaction capabilities has become an increasingly significant area of research. In this paper, we design and develop a real-time, multi-modal HRC system using speech and gestures. A set of sixteen dynamic gestures is designed for com...
Article
Future action anticipation aims to infer future actions from the observation of a small set of past video frames. In this paper, we propose a novel Jointly-learnt Action Anticipation Network (J-AAN) via Self-Knowledge Distillation (Self-KD) and cycle consistency for future action anticipation. In contrast to the current state-of-the-art methods whi...
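The truncated abstract does not say which signals act as the teacher in J-AAN's Self-Knowledge Distillation. As a point of reference only, the standard temperature-softened distillation loss can be sketched as follows, assuming the teacher logits come from the same network (for example another branch or stage). The function names, the temperature, and the logits are hypothetical, and this is not the paper's loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax (max-subtracted for numerical stability)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(p_teacher || p_student) on softened distributions, scaled by T**2.

    In self-distillation the "teacher" logits come from the same network;
    here they are simply passed in (where they come from is an assumption).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    return kl * temperature ** 2


if __name__ == "__main__":
    # Hypothetical logits over three future-action classes.
    print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
    print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]) > 0)  # True
```

The loss is zero when student and teacher agree and grows as their softened predictions diverge. The T² factor keeps gradient magnitudes comparable across temperatures.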
Article
To deal with the challenges in video object detection (VOD), such as occlusion and motion blur, many state-of-the-art video object detectors adopt a feature aggregation module to encode long-range contextual information that supports the current frame. The main drawbacks of these detectors are threefold: first, the frame-wise detection slows dow...
Chapter
The purpose of cell counting is to estimate the number of cells in microscopy images. Most popular methods obtain the cell numbers by integrating the density maps that are generated by deep cell counting networks. However, cell counting networks that rely on estimated cell density maps may leave cell locations in a black box. In this paper,...
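Obtaining a count "by integrating the density map" means summing every pixel of the map. A minimal stdlib-only sketch, assuming a toy density map built from one normalized Gaussian per hypothetical cell centre (the function names, grid size, and sigma are illustrative, not the networks discussed above):

```python
import math


def gaussian_density(height, width, centers, sigma=2.0):
    """Toy density map: one Gaussian per annotated cell centre, each normalized
    so that it sums to 1 over the image grid."""
    density = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        kernel = [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        total = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density


def count_cells(density):
    """Integrate (sum) the density map to obtain the estimated cell count."""
    return sum(map(sum, density))


if __name__ == "__main__":
    dmap = gaussian_density(32, 32, [(5, 5), (16, 20), (28, 9)])
    print(round(count_cells(dmap), 6))  # 3.0
```

The sum recovers the count but discards where the mass came from, which is the "black box" concern the abstract raises about cell locations.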
Chapter
Cell segmentation is a fundamental and critical step in numerous biomedical image studies. For the fully-supervised cell segmentation algorithms, although highly effective, a large quantity of high-quality training data is required, which is usually labor-intensive to produce. In this work, we formulate the unsupervised cell segmentation as a sligh...
Chapter
Detecting airway anomalies can be an essential aid to lung disease diagnosis. Since normal human airways share a common anatomical structure, we design a graph prototype whose structure follows the normal airway anatomy. Then, we learn the prototype and a graph neural network from a weakly-supervised airway dataset, i.e., only the holistic lab...
Chapter
Recent advances in deep learning have achieved impressive results on microscopy cell counting tasks. The success of deep learning models usually needs sufficient training data with manual annotations, which can be time-consuming and costly. In this paper, we propose an annotation-efficient cell counting approach which injects cell counting networks...
Preprint
Bridge inspection is an important step in preserving and rehabilitating transportation infrastructure for extending their service lives. The advancement of mobile robotic technology allows the rapid collection of a large amount of inspection video data. However, the data are mainly images of complex scenes, wherein a bridge of various structural el...
Article
A carefully tailored tone in response to a complaint on social media can create positive emotions for an upset customer. However, very few studies have identified what response tones, based on an established theory, would be most effective for complaint management. This study conceptualizes a service agent's response tones based on Ballmer and Bren...
Article
Real-time Action Recognition (ActRgn) of assembly workers can timely assist manufacturers in correcting human mistakes and improving task performance. Yet, recognizing worker actions in assembly reliably is challenging because such actions are complex and fine-grained, and workers are heterogeneous. This paper proposes to create an individualized s...
Preprint
Recently, autonomous vehicles and vehicles equipped with an Advanced Driver Assistance System (ADAS) have been emerging. They share the road with regular vehicles operated entirely by human drivers. To guarantee the safety of passengers and other road users, it becomes essential for autonomous vehicles and ADAS to anticipate traffic accidents from natural...
Article
Recent progress in video object detection (VOD) has shown that aggregating features from other frames to capture long-range contextual information is very important to deal with the challenges in VOD, such as partial occlusion, motion blur, etc. To exploit more effective feature aggregation, we propose several improvements over previous works in th...
Preprint
To assist human drivers and autonomous vehicles in assessing crash risks, driving scene analysis using dash cameras on vehicles and deep learning algorithms is of paramount importance. Although these technologies are increasingly available, driving scene analysis for this purpose remains a challenge. This is mainly due to the lack of annotate...
Article
Bridge inspection is an important step in preserving and rehabilitating transportation infrastructure for extending their service lives. The advancement of mobile robotic technology allows the rapid collection of a large amount of inspection video data. However, the data are mainly the images of complex scenes, wherein a bridge of various structura...
Article
Modeling the moving behaviors and predicting the future paths of pedestrians, especially for those in complex scenes, remain a challenging problem in machine learning. We recognize that human motion trajectories, governed by social norms and constrained by physical structures of the surrounding environment, are both forward predictable and backward...
Article
Reducing fatal traffic crashes has been an important mission in transportation. With the rapid development of sensor and Artificial Intelligence (AI) technologies, computer vision (CV)-based crash anticipation in the near-crash phase is receiving growing attention. The ability to perceive fatal crash risks at an early stage is of paramount impo...
Article
We introduce a simple yet effective network that embeds a novel Discriminative Feature Pooling (DFP) mechanism and a novel Video Segment Attention Model (VSAM) for video-based human action recognition from both trimmed and untrimmed videos. Our DFP module introduces an attentional pooling mechanism for 3D Convolutional Neural Networks that attenti...
Preprint
In 2019, outbreaks of vaccine-preventable diseases reached the highest number in the US since 1992. Medical misinformation, such as antivaccine content propagating through social media, is associated with increases in vaccine delay and refusal. Our overall goal is to develop an automatic detector for antivaccine messages to counteract the negative...
Article
We propose weakly supervised training schemes to train end-to-end cell segmentation networks that require only a single point annotation per cell as the training label and that generate high-quality segmentation masks close to those of fully supervised methods using mask annotations of cells. Three training schemes are investigated to train cell segmentati...
Preprint
This paper presents an intelligent price suggestion system for online second-hand listings based on their uploaded images and text descriptions. The goal of price prediction is to help sellers set effective and reasonable prices for their second-hand items with the images and text descriptions uploaded to the online platforms. Specifically, we desi...
Preprint
Different from shopping in physical stores, where people have the opportunity to closely check a product (e.g., touching the surface of a T-shirt or smelling the scent of perfume) before making a purchase decision, online shoppers rely greatly on the uploaded product images to make any purchase decision. The decision-making is challenging when sell...
Preprint
The pancreatic disease taxonomy includes ten types of masses (tumors or cysts) [20,8]. Previous work focuses on developing segmentation or classification methods only for certain mass types. Differential diagnosis of all mass types is clinically highly desirable [20] but has not been investigated using an automated image understanding approach. We e...
Conference Paper
With the development of industrial automation and artificial intelligence, robotic systems are becoming an essential part of factory production, and human-robot collaboration (HRC) is becoming a new trend in the industrial field. In our previous work, ten dynamic gestures were designed for communication between a human worker and a robo...
Article
In 2019, outbreaks of vaccine-preventable diseases reached the highest number in the US since 1992. Medical misinformation, such as antivaccine content propagating through social media, is associated with increases in vaccine delay and refusal. Our overall goal is to develop an automatic detector for antivaccine messages to counteract the negative...
Article
This study aims at sensing and understanding the worker’s activity in a human-centered intelligent manufacturing system. We propose a novel multi-modal approach for worker activity recognition by leveraging information from different sensors and in different modalities. Specifically, a smart armband and a visual camera are applied to capture Inerti...
Conference Paper
Video object detection (VOD) has been a rising topic in recent years due to challenges such as occlusion and motion blur. To deal with these challenges, feature aggregation from local or global support frames has been verified as effective. To exploit better feature aggregation, in this paper, we propose two improvements over previous works: a class-c...
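The truncated abstract does not specify the paper's aggregation module. As background for "feature aggregation from support frames", a generic similarity-weighted scheme can be sketched as follows, assuming one non-zero feature vector per frame and softmax weights over cosine similarities to the target frame. This is an illustrative stand-in, not the proposed class-c... module; all names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def aggregate(target, supports, temperature=1.0):
    """Similarity-weighted aggregation of support-frame features.

    Weights are a softmax over cosine similarities to the target-frame feature
    (the target itself is included), and the output is the weighted sum.
    """
    frames = [target] + list(supports)
    scores = [cosine(target, f) / temperature for f in frames]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(len(target))]
    return fused, weights


if __name__ == "__main__":
    fused, weights = aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print([round(w, 3) for w in weights])  # [0.422, 0.422, 0.155]
```

Support frames that resemble the target (for example, unoccluded views of the same object) contribute more, while dissimilar frames are down-weighted rather than discarded.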
Conference Paper
Human-robot collaboration (HRC) is a challenging task in modern industry and gesture communication in HRC has attracted much interest. This paper proposes and demonstrates a dynamic gesture recognition system based on Motion History Image (MHI) and Convolutional Neural Networks (CNN). Firstly, ten dynamic gestures are designed for a human worker to...
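A Motion History Image (MHI), in the formulation of Bobick and Davis, collapses a frame sequence into one grayscale image. A pixel is reset to a maximum value τ wherever motion is detected and decays toward 0 elsewhere, so brighter pixels mark more recent motion. A minimal sketch, assuming frame differencing as the motion detector and hypothetical values for `tau`, `decay`, and `threshold` (not the paper's settings):

```python
def update_mhi(mhi, prev_frame, frame, tau=255, decay=32, threshold=30):
    """One Motion History Image update on grayscale frames (nested lists).

    Pixels whose intensity changed by more than `threshold` are set to `tau`;
    all other pixels fade by `decay`, never dropping below 0.
    """
    return [
        [tau if abs(c - p) > threshold else max(0, h - decay)
         for h, p, c in zip(row_h, row_p, row_c)]
        for row_h, row_p, row_c in zip(mhi, prev_frame, frame)
    ]


def motion_history(frames, **params):
    """Fold a frame sequence into a single MHI (brighter = more recent motion)."""
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]
    for prev_frame, frame in zip(frames, frames[1:]):
        mhi = update_mhi(mhi, prev_frame, frame, **params)
    return mhi


if __name__ == "__main__":
    # A bright blob moving left to right across a 1 x 4 strip.
    frames = [[[200, 0, 0, 0]], [[0, 200, 0, 0]], [[0, 0, 200, 0]]]
    print(motion_history(frames))  # [[223, 255, 255, 0]]
```

The resulting single image encodes both where and how recently a gesture moved, which is what makes it a convenient fixed-size input for a CNN classifier.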
Article
Assembly carries paramount importance in manufacturing. Being able to support workers in real time to maximize their positive contributions to assembly is a tremendous interest of manufacturers. Human action recognition has been a way to automatically analyze and understand worker actions to support real-time assistance for workers and facilitate w...
Article
Influencers are non-celebrity individuals who gain popularity on social media by posting visually attractive content (e.g., photos and videos) and by interacting with other users (i.e., Followers) to create a sense of authenticity and friendship. Brands partner with Influencers to garner engagement from their target consumers in a new marketing str...
Article
Quality and efficiency are crucial indicators of any manufacturing company. Many companies are suffering from a shortage of experienced workers across the production line to perform complex assembly tasks. To reduce time and error in an assembly task, a worker-centered system consisting of multi-modal Augmented Reality (AR) instructions with the su...
Article
In a human-centered intelligent manufacturing system, every element is to assist the operator in achieving the optimal operational performance. The primary task of developing such a human-centered system is to accurately understand human behavior. In this paper, we propose a fog computing framework for assembly operation recognition, which brings c...
Conference Paper
Different from shopping in physical stores, where people have the opportunity to closely check a product (e.g., touching the surface of a T-shirt or smelling the scent of perfume) before making a purchase decision, online shoppers rely greatly on the uploaded product images to make any purchase decision. The decision-making is challenging when sell...
Preprint
In a human-centered intelligent manufacturing system, sensing and understanding of the worker's activity are the primary tasks. In this paper, we propose a novel multi-modal approach for worker activity recognition by leveraging information from different sensors and in different modalities. Specifically, a smart armband and a visual camera are app...
Article
Training and on-site assistance is critical to help workers master required skills, improve worker productivity, and guarantee the product quality. Traditional training methods lack worker-centered considerations that are particularly in need when workers are facing ever-changing demands. In this study, we propose a worker-centered training & assis...
Article
In this paper, we solve the problem of mitosis event localization and its stage localization in time-lapse phase-contrast microscopy images. Our method contains three steps: first, we formulate a Low-Rank Matrix Recovery (LRMR) model to find salient regions from microscopy images and extract candidate patch sequences, which potentially contain mito...
Article
Production innovations are occurring faster than ever. Manufacturing workers thus need to frequently learn new methods and skills. In fast changing, largely uncertain production systems, manufacturers with the ability to comprehend workers’ behavior and assess their operation performance in near real-time will achieve better performance than peers....
Article
In today’s competitive production era, the ability to identify and track important objects in near real time is greatly desired among manufacturers who are moving towards streamlined production. Manually keeping track of every object in a complex manufacturing plant is infeasible; therefore, an automatic system of that functionality is...
Preprint
American Sign Language (ASL) alphabet recognition by computer vision is a challenging task due to the complexity of ASL signs, high interclass similarities, large intraclass variations, and constant occlusions. This paper describes a method for ASL alphabet recognition using Convolutional Neural Networks (CNN) with multiview augmentation and infer...
Conference Paper
In a smart manufacturing system involving workers, recognition of the worker's activity can be used for quantification and evaluation of the worker's performance, as well as to provide onsite instructions with augmented reality. In this paper, we propose a method for activity recognition using Inertial Measurement Unit (IMU) and surface electromyog...
Conference Paper
Recognition of American Sign Language (ASL) alphabet not only could bring benefits to the ASL users, but also could provide solutions for natural human-computer/robot interactions in many applications. In this paper, we propose a method for ASL alphabet recognition with use of a Leap Motion Controller (LMC). The skeleton data from the native LMC AP...
Article
Smartphones embedded with cameras and other sensors offer possibilities to attack the problem of indoor localization where GPS is not reliable. In this paper, a novel tree-based localization system is proposed based on WiFi, inertial and visual signals. There are three levels in the tree: (1) WiFi-based coarse positioning. The WiFi database of a bu...
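The abstract is cut off before it describes how the WiFi database is used for coarse positioning. A common baseline for this first level is k-nearest-neighbour fingerprint matching. The sketch below assumes that approach: each reference point stores a received-signal-strength (RSSI) scan, and the coarse estimate is the mean position of the k closest scans. The `-100` dBm fill value, `k`, and all names are hypothetical.

```python
import math

MISSING_RSSI = -100.0  # assumed reading (dBm) for an access point that was not heard


def rssi_distance(a, b):
    """Euclidean distance between two {access_point_id: rssi_dbm} scans."""
    aps = set(a) | set(b)
    return math.sqrt(sum((a.get(ap, MISSING_RSSI) - b.get(ap, MISSING_RSSI)) ** 2 for ap in aps))


def knn_locate(database, scan, k=3):
    """Coarse position = mean location of the k closest reference fingerprints.

    database: list of ((x, y), {ap_id: rssi_dbm}) collected at known positions.
    """
    nearest = sorted(database, key=lambda entry: rssi_distance(entry[1], scan))[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return x, y


if __name__ == "__main__":
    database = [
        ((0.0, 0.0), {"ap1": -40, "ap2": -70}),
        ((10.0, 0.0), {"ap1": -70, "ap2": -40}),
        ((0.0, 10.0), {"ap1": -60, "ap2": -60}),
    ]
    print(knn_locate(database, {"ap1": -42, "ap2": -68}, k=2))  # (0.0, 5.0)
```

Such a coarse estimate only narrows the search region; in a tree-structured system like the one described, finer levels (inertial and visual) would refine it.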
Article
In biomedical applications such as tracking hundreds of specimens over months, we assume none of the existing visual tracking approaches is capable of achieving error-free accuracy in these challenging big data scenarios. Meanwhile, biological discovery requires high-quality tracking results for solid analysis. However, manually debugging (verifyin...
Article
Vision-based pedestrian tracking becomes a hard problem when long-term or heavy occlusion occurs or when a pedestrian temporarily moves out of the visual field. In this paper, a novel persistent pedestrian tracking system is presented which combines visual signals from surveillance cameras and sensor signals from Inertial Measurement Units (IMU) carried by pe...
Article
Increasing evidence has shown that the energy use of ant colonies increases sublinearly with colony size so that large colonies consume less per capita energy than small colonies. It has been postulated that social environment (e.g., in the presence of queen and brood) is critical for the sublinear group energetics, and a few studies of ant workers...
Conference Paper
This paper reports on a system using a Digital Micromirror Device (DMD) to modulate a near-infrared laser source spatially and temporally. The DMD can produce an arbitrary heat source varying both spatially and temporally over the target. When the thermal response of the target surface is recorded using a thermal imager, this provides new possibili...
Article
Automated microscopy image restoration, especially in the Differential Interference Contrast (DIC) imaging modality, has attracted increasing attention since it greatly facilitates long-term living cell analysis without staining. Although the previous work on DIC image restoration is able to restore the nuclei regions of living cells, it is still chal...
Conference Paper
Human tracking with wearable sensors such as Inertial Measurement Units (IMU) is of great significance for ubiquitous computing and ambient applications. This paper proposes a novel Dead Reckoning based tracking algorithm using IMUs placed in the pocket. The contribution of our approach is threefold: (1) Precise steps are detected according t...
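Step-based dead reckoning first detects steps in the inertial signal and then advances the position by one stride per step: x ← x + L·cos θ, y ← y + L·sin θ, where L is the step length and θ the heading. The paper's precise step detection and heading estimation are cut off in the abstract. The sketch below therefore substitutes simple peak picking on the acceleration magnitude and a fixed step length. The threshold, minimum gap, and 0.7 m stride are hypothetical.

```python
import math


def detect_steps(acc_norm, threshold=11.0, min_gap=10):
    """Indices of local peaks in the acceleration magnitude (m/s^2) that exceed
    `threshold` and lie at least `min_gap` samples after the previous step."""
    steps = []
    for i in range(1, len(acc_norm) - 1):
        a = acc_norm[i]
        is_peak = a > threshold and a >= acc_norm[i - 1] and a > acc_norm[i + 1]
        if is_peak and (not steps or i - steps[-1] >= min_gap):
            steps.append(i)
    return steps


def dead_reckon(start, headings, step_length=0.7):
    """Advance the position by one fixed-length stride per step along each heading (radians)."""
    x, y = start
    path = [(x, y)]
    for theta in headings:
        x += step_length * math.cos(theta)
        y += step_length * math.sin(theta)
        path.append((x, y))
    return path


if __name__ == "__main__":
    signal = [9.8] * 5 + [12.0] + [9.8] * 12 + [12.5] + [9.8] * 5
    print(detect_steps(signal))  # [5, 18]
    # One heading per detected step: east, then north.
    print(dead_reckon((0.0, 0.0), [0.0, math.pi / 2], step_length=1.0)[-1])
```

Because each update adds the error of one step, drift accumulates over time. This is why IMU tracking is often fused with other signals, as in the camera-plus-IMU pedestrian tracking work listed above.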
Conference Paper
Automated image restoration in microscopy, especially in Differential Interference Contrast (DIC) imaging modality, has attracted increasing attention since it greatly facilitates living cell analysis. Previous work is able to restore the nuclei of living cells, but it is very challenging to reconstruct the unnoticeable cytoplasm details in DIC ima...
Conference Paper
Human physical activity recognition based on wearable sensors has applications relevant to our daily life, such as healthcare. How to achieve high recognition accuracy with low computational cost is an important issue in ubiquitous computing. Rather than exploring handcrafted features from time-series sensor signals, we assemble signal sequences...
Conference Paper
We propose a novel microscopy image restoration algorithm capable of co-restoring Phase Contrast and Differential Interference Contrast (DIC) microscopy images captured on the same cell dish simultaneously. Cells with different phase retardation and DIC gradient signals are restored into a single image without the halo artifact from phase contrast...
Conference Paper
Studying the behavior of fruit flies that mimic normal animal motivations can inform us about the molecular mechanisms and biochemical pathways. We build a glass chamber to house flies and record their behaviors in video frame sequences. Due to the challenges of low image contrast, small object size and fast object motion, we propose an adaptive Lo...

Citations

... The predictive uncertainties arise naturally from the use of a Bayesian neural network (BNN) to predict accident scores. Karim et al. [14] proposed the Dynamic Spatial-Temporal Attention (DSTA) network to analyze streaming dashcam video, which contains complex spatial-temporal relations among traffic agents against dynamic backgrounds; DSTA combines a Dynamic Temporal Attention (DTA) module and a Dynamic Spatial Attention (DSA) module. A Gated Recurrent Unit (GRU) network is implemented with a VGG-16 feature extractor to extract features from each frame and to detect the frame with the highest detection score, which is used to update the likelihood of a future crash. ...
... We employ an annotation-efficient strategy to generate 3D tumor masks while reducing labor costs. Specifically, we start with the PDAC segmentation model trained on the arterial-late phase described in [46] to generate pseudo annotations. Next, the model is fine-tuned under the supervision of the pseudo annotations and then applied to produce segmentation masks on our dataset. ...
... [11][12][13][14][15] Considering this, some researchers have introduced DL-based methods for cell detection and counting. [16][17][18][19][20][21][22][23] As demonstrated in our previous work [10], a modified YOLO method has been proposed for accurate blood cell counting, which provides an efficient solution for identifying overlapping cells. ...
... This kind of two-dimensional (2D) convolution extracts spatial features well but rarely handles temporal features. Therefore, Ji et al. [40] extended the traditional 2D-CNN [41] to a 3D-CNN that performs feature extraction in both the time and space dimensions, so that the feature maps of adjacent frames can interact during convolution. Guo et al. [42] employ a 3D-CNN and statistical analysis algorithms to extract video and WiFi features, respectively, and propose a novel multi-modal learning approach for video and WiFi feature fusion. ...
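Extending 2D convolution to 3D means the kernel also slides along the time axis, so each output value mixes pixels from several consecutive frames. A minimal pure-Python sketch of single-channel 3D cross-correlation (valid padding, stride 1), assuming no channels, bias, or learned weights. Framework layers such as `torch.nn.Conv3d` add those. The hand-picked temporal-difference kernel in the example is only an illustration.

```python
def conv3d_valid(volume, kernel):
    """Single-channel 3D cross-correlation with 'valid' padding and stride 1.

    volume: T x H x W nested lists (frames stacked in time);
    kernel: kT x kH x kW nested lists. Output: (T-kT+1) x (H-kH+1) x (W-kW+1).
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kT, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kT + 1):
        plane = []
        for y in range(H - kH + 1):
            row = []
            for x in range(W - kW + 1):
                # The sum runs over time as well as space, so neighbouring frames interact.
                row.append(sum(volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                               for dt in range(kT) for dy in range(kH) for dx in range(kW)))
            plane.append(row)
        out.append(plane)
    return out


if __name__ == "__main__":
    # Three 2 x 2 frames; one pixel switches on between frame 0 and frame 1.
    frames = [[[0, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 1], [0, 0]]]
    temporal_diff = [[[-1]], [[1]]]  # kT=2, 1 x 1 spatial: frame[t+1] - frame[t]
    print(conv3d_valid(frames, temporal_diff))
    # [[[0, 1], [0, 0]], [[0, 0], [0, 0]]]
```

The kernel responds only where a pixel changes between frames, a temporal cue that a 2D kernel applied to one frame at a time cannot capture.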
... They propose exploiting inter-video and intra-video proposal relations to tackle object confusion. Other seminal works [27,28] attempt to solve this problem by devising better feature aggregation schemes that enhance the target frame's feature representation. Despite the gratifying improvement in detection, these approaches rely on a region-based detector that focuses more on discriminating between background and foreground regions than on differentiating between various foreground regions [12]. ...
... Hence, positioning methods that do not rely on GPS signals are a necessary feature of a UAS designed for bridge inspection. Several local positioning methods have been applied in bridge inspection UASs, including vision-based, ultrasonic beacon-based, and optical flow-based positioning methods [25][26][27][28]. The vision-based positioning method requires many payloads, such as high-performance onboard computers and stereo cameras, which increases the load on the UAS. ...
... Several works have shown that learning the movement of users is still a great challenge [21]. In [22], the authors proposed a reciprocal twin networks approach for accurate and robust pedestrian trajectory prediction. A forward prediction network and a backward prediction network were designed to predict future trajectories from past observations and to perform trajectory prediction backward, respectively. ...
... Service providers are advised to provide justice-seeking consumers with a recovery response process, which focuses on listening to the consumer and taking actions that aim to solve the issue at hand. A carefully tailored response and the proper handling of a complaint can create positive emotions for an upset consumer (Argyris et al., 2021). ...
... The goal of activity recognition is to understand the nature of the work environment, which allows for a better understanding of how people perform their jobs and what they are operating at any given time [1,2]. Our previous work analyzed the recognition of coarse-grained gestures [3][4][5] and worker assembly operation steps [6]. While activity recognition is a current focus of research, the industry's challenging problem of fine-grained activity recognition is largely overlooked. ...
... Due to the advancement of deep neural networks, significant progress has been achieved in object detection in still images [1], [2], [3], [4], [5], [6], [7], [8], [9]. With the development of storage and communication technologies, video is becoming a popular medium for conveying richer information, and video-based analysis, such as action recognition [10], [11], semantic segmentation [12], [13], object tracking [14], [15] and detection [16], [17], has become pervasive. Among these tasks, video object detection (VOD), a fundamental task for numerous downstream applications such as robotics and autonomous driving, has become increasingly important. ...