# Xiangyang Ji's research while affiliated with Tsinghua University and other places

## Publications (246)

Preprint
Full-text available
Category-level object pose estimation aims to predict the 6D pose as well as the 3D metric size of arbitrary objects from a known set of categories. Recent methods harness shape prior adaptation to map the observed point cloud into the canonical space and apply the Umeyama algorithm to recover the pose and size. However, their shape prior integration s...
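The Umeyama step mentioned in this abstract can be illustrated with a minimal NumPy sketch. It assumes point correspondences between canonical-space points (`src`) and observed points (`dst`) are already given; the function name `umeyama` and its arguments are illustrative and are not taken from the paper, whose own pipeline is not shown in the truncated abstract.

```python
import numpy as np

def umeyama(src, dst):
    """Least-squares similarity transform (hypothetical helper).

    Finds scale s, rotation R, translation t minimizing
    sum_i || dst_i - (s * R @ src_i + t) ||^2 for (n, d) point arrays.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n, d = src.shape

    # Center both point sets.
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Variance of the source points and cross-covariance matrix.
    var_src = (src_c ** 2).sum() / n
    cov = dst_c.T @ src_c / n

    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(d)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[-1, -1] = -1.0

    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

In a category-level setting, the recovered scale would relate the normalized canonical shape to metric size, while `R` and `t` give the 6D pose; that reading is an assumption about how the output is used, not a statement of the authors' implementation.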
Article
Objective: Warfarin anticoagulation management requires sequential decision-making to continuously adjust dosages based on patients’ evolving states. We aimed to leverage reinforcement learning (RL) to optimize dynamic in-hospital warfarin dosing in patients after surgical valve replacement (SVR). Materials and Methods: 10 408 SVR cases with war...
Preprint
While category-level 9DoF object pose estimation has emerged recently, previous correspondence-based or direct regression methods are both limited in accuracy due to the huge intra-category variances in object shape and color, etc. Orthogonal to them, this work presents a category-level object pose and size refiner CATRE, which is able to iterative...
Article
We propose an end-to-end image compression and analysis model with Transformers, targeting cloud-based image classification applications. Instead of placing an existing Transformer-based image classification model directly after an image codec, we aim to redesign the Vision Transformer (ViT) model to perform image classification from the comp...
Article
Offline reinforcement learning aims to maximize the expected cumulative rewards with a fixed collection of data. The basic principle of current offline reinforcement learning methods is to restrict the policy to the offline dataset action space. However, they ignore the case where the dataset's trajectories fail to cover the state space completely....
Article
3D meshes are widely employed to represent the geometric structure of 3D shapes. Due to the limited precision of scanning sensors and other issues, meshes are inevitably affected by noise, which hampers subsequent applications. Convolutional neural networks (CNNs) have achieved great success in image processing tasks, including 2D image denoising, and have...
Article
Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward. These pre-trained policies can accelerate learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning. Conventional approaches of unsupervised ski...
Article
Full-text available
As an essential step in the pathological diagnosis, histochemical staining can show specific tissue structure information and, consequently, assist pathologists in making accurate diagnoses. Clinical kidney histopathological analyses usually employ more than one type of staining: H&E, MAS, PAS, PASM, etc. However, due to the interference of colors...
Preprint
The success of deep neural networks greatly relies on the availability of large amounts of high-quality annotated data, which however are difficult or expensive to obtain. The resulting labels may be class imbalanced, noisy or human biased. It is challenging to learn unbiased classification models from imperfectly annotated datasets, on which we us...
Preprint
One of the main challenges for feature representation in deep learning-based classification is the design of appropriate loss functions that exhibit strong discriminative power. The classical softmax loss does not explicitly encourage discriminative learning of features. A popular direction of research is to incorporate margins in well-established...
Article
Full-text available
Effectively imaging within volumetric scattering media is of great importance and challenging especially in macroscopic applications. Recent works have demonstrated the ability to image through scattering media or within the weak volumetric scattering media using spatial distribution or temporal characteristics of the scattered field. Here, we focu...
Preprint
Full-text available
The performance of machine learning models under distribution shift has been the focus of the community in recent years. Most current methods have been proposed to improve robustness to distribution shift from the algorithmic perspective, i.e., designing better training algorithms to help generalization in shifted test distributions. Thi...
Preprint
Modern object detectors have taken advantage of pre-trained vision transformers by using them as backbone networks. However, except for the backbone networks, other detector components, such as the detector head and the feature pyramid network, remain randomly initialized, which hinders the consistency between detectors and pre-trained models....
Preprint
Vision-language pre-training (VLP) relying on large-scale pre-training datasets has shown premier performance on various downstream tasks. In this sense, a complete and fair benchmark (i.e., including large-scale pre-training datasets and a variety of downstream datasets) is essential for VLP. But how to construct such a benchmark in Chinese remain...
Article
Full-text available
The demand for on-chip multifunctional optoelectronic systems is increasing in today's Internet of Things era. III-nitride quantum well diodes (QWDs) can transmit and receive information through visible light and can be used as both light-emitting diodes (LEDs) and photodetectors (PDs). Spectral emission-detection overlap gives the III-nitride QWD...
Preprint
Full-text available
Compressed Sensing MRI (CS-MRI) aims at reconstructing de-aliased images from sub-Nyquist sampling k-space data to accelerate MR Imaging, thus presenting two basic issues, i.e., where to sample and how to reconstruct. To deal with both problems simultaneously, we propose a novel end-to-end Probabilistic Under-sampling and Explicable Reconstruction...
Preprint
Full-text available
Point cloud upsampling, which aims to generate dense and uniform point clouds from a given sparse input, is a challenging problem. Most existing methods either follow an end-to-end supervised learning approach, where large numbers of pairs of sparse inputs and dense ground truth are exploited as supervision information; or treat up-scaling of different scale...
Preprint
This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes (MDP) that enjoys a regret bound *independent of the planning horizon*. Specifically, we consider tabular MDP with $S$ states, $A$ actions, a planning horizon $H$, total reward bounded by $1$, and the agent plays for $K$ episodes. We design an algorithm...
Preprint
Full-text available
6D object pose estimation is a fundamental yet challenging problem in computer vision. Convolutional Neural Networks (CNNs) have recently proven to be capable of predicting reliable 6D pose estimates even under monocular settings. Nonetheless, CNNs are identified as being extremely data-driven, and acquiring adequate annotations is oftentimes very...
Preprint
Full-text available
While 6D object pose estimation has recently made a huge leap forward, most methods can still only handle a single or a handful of different objects, which limits their applications. To circumvent this problem, category-level object pose estimation has recently been revamped, which aims at predicting the 6D pose as well as the 3D metric size for pr...
Preprint
Full-text available
Estimating the risk level of adversarial examples is essential for safely deploying machine learning models in the real world. One popular approach for physical-world attacks is to adopt the "sticker-pasting" strategy, which however suffers from some limitations, including difficulties in access to the target or printing by valid colors. A new type...
Article
Detecting oriented and densely packed objects is a challenging problem considering that the receptive field intersection between objects causes spatial feature aliasing. In this paper, we propose a convex-hull feature adaptation (CFA) approach, with the aim to configure convolutional features in accordance with irregular object layouts. CFA roots i...
Article
6D object pose estimation is a fundamental yet challenging problem in computer vision. Convolutional Neural Networks (CNNs) have recently proven to be capable of predicting reliable 6D pose estimates even under monocular settings. Nonetheless, CNNs are identified as being extremely data-driven, and acquiring adequate annotations is oftentimes very...
Preprint
We propose an end-to-end image compression and analysis model with Transformers, targeting cloud-based image classification applications. Instead of placing an existing Transformer-based image classification model directly after an image codec, we aim to redesign the Vision Transformer (ViT) model to perform image classification from the comp...
Preprint
Full-text available
The guided filter is a fundamental tool in computer vision and computer graphics that aims to transfer structure information from a guidance image to a target image. Most existing methods construct filter kernels from the guidance itself without considering the mutual dependency between the guidance and the target. However, since there typically exist sig...
Article
A depth map records the distance between the viewpoint and objects in the scene, which plays a critical role in many real-world applications. However, depth maps captured by consumer-grade RGB-D cameras suffer from low spatial resolution. Guided depth map super-resolution (DSR) is a popular approach to address this problem, which attempts to restore a hi...
Article
Confocal microscopy is a standard approach for obtaining volumetric images of a sample with high axial and lateral resolution, especially when dealing with scattering samples. Unfortunately, a confocal microscope is quite expensive compared to traditional microscopes. In addition, the point scanning in confocal microscopy leads to slow imaging spee...
Article
Full-text available
In this paper, a deep learning method for video‐based action recognition is proposed. On the one hand, boundary compensation on the basis of a deep neural network is performed to achieve action proposal. Boundary compensation considering non‐maximum suppression according to sliding window priority is applied to remove redundant windows. To...
Article
Library matching using carbon-13 nuclear magnetic resonance (¹³C NMR) spectra has been a popular method adopted in compound identification systems. However, the usability of existing approaches has been restricted as enlarging a library containing both a chemical structure and spectrum is a costly and time-consuming process. Therefore, we propose a...
Article
Controllable tailoring and understanding the phase-structure relationship of the 1T phase two-dimensional (2D) materials are critical for their applications in nanodevices. The in situ transmission electron microscope (TEM) could regulate and monitor the evolution process of the nanostructure of 2D material with atomic resolution. In this work, a c...
Article
Full-text available
Quantitative volumetric fluorescence imaging at high speed across a long term is vital to understand various cellular and subcellular behaviors in living organisms. Light-field microscopy provides a compact computational solution by imaging the entire volume in a tomographic way, while facing severe degradation in scattering tissue or densely-label...
Article
The difficulty of no-reference image quality assessment (NR IQA) often lies in the lack of knowledge about the distortion in the image, which makes quality assessment blind and thus inefficient. To tackle this issue, in this article, we propose a novel scheme for precise NR IQA, which includes two successive steps, i.e., distortion identification a...
Preprint
Full-text available
Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward. These pre-trained policies can accelerate learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning. Conventional approaches of unsupervised ski...
Preprint
We study the optimal batch-regret tradeoff for batch linear contextual bandits. For any batch number $M$, number of actions $K$, time horizon $T$, and dimension $d$, we provide an algorithm and prove its regret guarantee, which, due to technical reasons, features a two-phase expression as the time horizon $T$ grows. We also prove a lower bound theo...
Preprint
Confocal microscopy is the standard approach for obtaining volumetric images of a sample with high axial and lateral resolution, especially when dealing with scattering samples. Unfortunately, a confocal microscope is quite expensive compared to traditional microscopes. In addition, the point scanning in a confocal leads to slow imaging speed and p...
Article
Ptychography-based lensless on-chip microscopy enables high-throughput imaging by retrieving the missing phase information from intensity measurements. Numerous reconstruction algorithms for ptychography have been proposed, yet only a few incremental algorithms can be extended to lensless on-chip microscopy because of large-scale datasets but limit...
Preprint
Depth estimation from a single image is an active research topic in computer vision. The most accurate approaches are based on fully supervised learning models, which rely on a large amount of dense and high-resolution (HR) ground-truth depth maps. However, in practice, color images are usually captured with much higher resolution than depth maps,...
Article
Multifunctioning InGaN/GaN multi-quantum well (MQW) diodes can transmit and detect light separately. In particular, they have spectral overlap between electroluminescence (EL) and responsivity, conferring the unique ability to detect light emitted by another device sharing an identical MQW structure. Here, we monolithically integrate a III-nitride...
Preprint
Full-text available
Directly regressing all 6 degrees-of-freedom (6DoF) for the object pose (e.g. the 3D rotation and translation) in a cluttered environment from a single RGB image is a challenging problem. While end-to-end methods have recently demonstrated promising results at high efficiency, they are still inferior when compared with elaborate P$n$P/RANSAC-based...
Article
Full-text available
Recent research on whole slide imaging (WSI) has greatly promoted the development of digital pathology. However, accurate autofocusing is still the main challenge for WSI acquisition and automated digital microscope. To address this problem, this paper describes a low cost WSI system and proposes a fast, robust autofocusing method based on deep lea...
Preprint
Full-text available
Learning with noisy labels is an important and challenging task for training accurate deep neural networks. Some commonly-used loss functions, such as Cross Entropy (CE), suffer from severe overfitting to noisy labels. Robust loss functions that satisfy the symmetric condition were tailored to remedy this problem, which however encounter the underf...
Article
Structured illumination microscopy (SIM) enhances spatial resolution by projecting sinusoidal patterns with various orientations and lateral phase shifts. Here, we report a framework, termed Deep-SIM, powered by a deep neural network that learns the physical relationship between images with different lateral phase shifts. This approach captures one...
Article
Despite the substantial progress in visual object detection, models trained in one video domain often fail to generalize well to others due to the change of camera configurations, lighting conditions, and object person views. In this paper, we present Domain Contrast (DC), a simple yet effective approach inspired by contrastive learning for trai...
Article
Few-shot semantic segmentation remains an open problem for lack of an effective method to handle the semantic misalignment between objects. In this article, we propose part-based semantic transform (PST), which aims at aligning object semantics in support images with those in query images by semantic decomposition-and-match. The semantic decompo...
Preprint
Full-text available
Robust loss functions are essential for training deep neural networks with better generalization power in the presence of noisy labels. Symmetric loss functions are confirmed to be robust to label noise. However, the symmetric condition is overly restrictive. In this work, we propose a new class of loss functions, namely asymmetric loss fun...
Article
Micro multiple quantum well (MQW) III‐nitride diodes usually function as micro light‐emitting diodes, which are considered the next generation of display technology. In addition to both illumination and display, MQW III‐nitride diodes in theory have the capability to both detect and modulate light. Here, proof that the III‐nitride diode can simulta...
Preprint
Full-text available
Multiple instance learning (MIL) is a powerful tool for weakly supervised classification in whole slide image (WSI) based pathology diagnosis. However, current MIL methods are usually based on the independent and identically distributed hypothesis and thus neglect the correlation among different instances. To address this problem, we proposed...
Preprint
Encouraging progress in few-shot semantic segmentation has been made by leveraging features learned upon base classes with sufficient training data to represent novel classes with few-shot examples. However, this feature sharing mechanism inevitably causes semantic aliasing between novel classes when they have similar compositions of semantic conce...
Article
Weakly supervised object detection (WSOD) is a challenging task that requires simultaneously learning object detectors and estimating object locations under the supervision of image category labels. Many WSOD methods that adopt multiple instance learning (MIL) have nonconvex objective functions and, therefore, are prone to get stuck in local minima...
Preprint
Despite the substantial progress in active learning for image recognition, there is still no instance-level active learning method designed specifically for object detection. In this paper, we propose Multiple Instance Active Object Detection (MI-AOD), to select the most informative images for detector training by observing instance-level uncertainty. MI-AO...
Preprint
Full-text available
A depth map records the distance between the viewpoint and objects in the scene, which plays a critical role in many real-world applications. However, depth maps captured by consumer-grade RGB-D cameras suffer from low spatial resolution. Guided depth map super-resolution (DSR) is a popular approach to address this problem, which attempts to restore a hi...
Preprint
Full-text available
We propose an unsupervised foreground-background segmentation method via training a segmentation network on the synthetic pseudo segmentation dataset generated from GANs, which are trained from a collection of images without annotations to explicitly disentangle foreground and background. To efficiently generate foreground and background layers and...
Preprint
We propose a novel joint lossy image and residual compression framework for learning $\ell_\infty$-constrained near-lossless image compression. Specifically, we obtain a lossy reconstruction of the raw image through lossy image compression and uniformly quantize the corresponding residual to satisfy a given tight $\ell_\infty$ error bound. Suppose...
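The residual quantization idea in this abstract can be sketched with the standard near-lossless uniform quantizer for integer-valued residuals, which uses a bin size of $2\tau + 1$ to guarantee a maximum absolute error of $\tau$. This is a minimal sketch under that assumption; the function names are hypothetical, the lossy codec is replaced by an arbitrary given reconstruction, and the paper's actual learned components are not reproduced here.

```python
import numpy as np

def quantize_residual(residual, tau):
    """Map integer residuals to bin indices with bin size 2*tau + 1."""
    residual = np.asarray(residual, dtype=np.int64)
    step = 2 * tau + 1
    return np.sign(residual) * ((np.abs(residual) + tau) // step)

def dequantize_residual(indices, tau):
    """Recover the bin centers from the bin indices."""
    return np.asarray(indices, dtype=np.int64) * (2 * tau + 1)

def near_lossless_reconstruct(raw, lossy, tau):
    """Combine a lossy reconstruction with the quantized residual.

    `raw` and `lossy` are integer images of the same shape; `lossy`
    stands in for the output of any lossy codec (an assumption here).
    The result differs from `raw` by at most `tau` at every pixel.
    """
    raw = np.asarray(raw, dtype=np.int64)
    lossy = np.asarray(lossy, dtype=np.int64)
    indices = quantize_residual(raw - lossy, tau)
    return lossy + dequantize_residual(indices, tau)
```

With `tau = 0` the bin size is 1 and the scheme reduces to lossless coding of the residual; larger `tau` trades a bounded per-pixel error for fewer distinct residual symbols to encode.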
Article
Full-text available
Significance: Fourier ptychography (FP) is a computational imaging approach that achieves high-resolution reconstruction. Inspired by neural networks, many deep-learning-based methods are proposed to solve FP problems. However, the performance of FP still suffers from optical aberration, which needs to be considered. Aim: We present a neural net...
Preprint
In offline reinforcement learning, a policy learns to maximize cumulative rewards with a fixed collection of data. To remain conservative, current methods choose to regularize the behavior policy or learn a lower bound of the value function. However, excessive conservatism tends to impair the policy's generalization ability and degrade its...
Preprint
Full-text available
6D pose estimation from a single RGB image is a challenging and vital task in computer vision. The current mainstream deep model methods resort to 2D images annotated with real-world ground-truth 6D object poses, whose collection is fairly cumbersome and expensive, even unavailable in many cases. In this work, to get rid of the burden of 6D annotat...
Preprint
Reward decomposition is a critical problem in the centralized training with decentralized execution (CTDE) paradigm for multi-agent reinforcement learning. To take full advantage of global information, which exploits the states from all agents and the related environment for decomposing Q values into individual credits, we propose a general meta-learni...