
Hongzhi Zhang - Harbin Institute of Technology
72 Publications
13,142 Reads
641 Citations
Publications (72)
Recently, considerable progress has been made in all-in-one image restoration. Generally, existing methods can be degradation-agnostic or degradation-aware. However, the former are limited in leveraging degradation-specific restoration, and the latter suffer from the inevitable error in degradation estimation. Consequently, the performance of existi...
In this paper, we consider two challenging issues in reference-based super-resolution (RefSR) for smartphone, (i) how to choose a proper reference image, and (ii) how to learn RefSR in a self-supervised manner. Particularly, we propose a novel self-supervised learning approach for real-world RefSR from observations at dual and multiple camera zooms...
Burst super-resolution (BurstSR) aims at reconstructing a high-resolution (HR) image from a sequence of low-resolution (LR) and noisy images, which is conducive to enhancing the imaging effects of smartphones with limited sensors. The main challenge of BurstSR is to effectively combine the complementary information from input frames, while existing...
Natural videos captured by consumer cameras often suffer from low framerate and motion blur due to the combination of dynamic scene complexity, lens and sensor imperfection, and less than ideal exposure setting. As a result, computational methods that jointly perform video frame interpolation and deblurring begin to emerge with the unrealistic assu...
In this paper, we consider two challenging issues in reference-based super-resolution (RefSR), (i) how to choose a proper reference image, and (ii) how to learn real-world RefSR in a self-supervised manner. Particularly, we present a novel self-supervised learning approach for real-world image SR from observations at dual camera zooms (SelfDZSR). C...
Recently, deep learning-based image denoising methods have achieved promising performance. However, when the distribution of real-world noisy images is unknown, the denoising performance is still limited due to the domain gap between the training set and the testing set. Nonetheless, the unknown noise distribution usually can be modeled as proper c...
Object detection is usually solved by learning a deep architecture involving classification and localization tasks, where feature learning for these two tasks is shared using the same backbone model. Recent works have shown that suitable disentanglement of classification and localization tasks has the great potential to improve performance of objec...
The study of multi-task learning has drawn great attention from the community. Despite the remarkable progress, the challenge of optimally learning different tasks simultaneously remains to be explored. Previous works attempt to modify the gradients from different tasks. Yet these methods give a subjective assumption of the relationship between tas...
In this paper, we consider two challenging issues in reference-based super-resolution (RefSR), (i) how to choose a proper reference image, and (ii) how to learn real-world RefSR in a self-supervised manner. Particularly, we present a novel self-supervised learning approach for real-world image SR from observations at dual camera zooms (SelfDZSR). F...
Weakly supervised semantic segmentation (WSSS) based on bounding box annotations has attracted considerable recent attention and has achieved promising performance. However, most of existing methods focus on generation of high-quality pseudo labels for segmented objects using box indicators, but they fail to fully explore and exploit prior from bou...
Crowd counting is critical for numerous video surveillance scenarios. One of the main issues in this task is how to handle the dramatic scale variations of pedestrians caused by the perspective effect. To address this issue, this paper proposes a novel convolution neural network-based crowd counting method, termed Perspective-guided Fractional-Dila...
Well-annotated training samples show necessity in achieving high performance of object detection, but collection of massive samples is extremely laborious and costly. Recently, cut-paste based methods show the potential to augment the training samples by cutting the foreground instances and pasting them on some background regions. However, existing...
In this paper, online non-negative discriminative dictionary learning for tracking is proposed, which combines the advantages of the global dictionary learning model and the class-specific dictionary learning model. The previous algorithm based on general dictionary learning does not take into account the inter-class relations between clas...
The tradeoff between receptive field size and efficiency is a crucial issue in low level vision. Plain convolutional networks (CNNs) generally enlarge the receptive field at the expense of computational cost. Recently, dilated filtering has been adopted to address this issue. But it suffers from gridding effect, and the resulting receptive field is...
Ear recognition task is known as predicting whether two ear images belong to the same person or not. In this paper, we present a novel metric learning method for ear recognition. This method is formulated as a pairwise constrained optimization problem. In each training cycle, this method selects the nearest similar and dissimilar neighbors of each...
Convolutional neural networks (CNNs) based deep features have been demonstrated with remarkable performance in various computer vision tasks, such as image classification and face verification. Compared with the hand-crafted descriptors, deep features exhibit more powerful representation ability. Typically, higher layer features contain more semant...
Deep features extracted from Convolutional Neural Networks (CNNs) have been widely adopted in various applications, such as face recognition. Compared with the handcrafted descriptors, deep features have more powerful representation ability which can lead to better performance. Effective feature representations play an important role in ear recogni...
Recently, deep features extracted from Convolutional Neural Networks (CNNs) have been widely adopted in various applications, such as face recognition. Compared with the handcrafted descriptors, deep features have more powerful representation ability which can lead to better performance. Effective feature representations play an important role in e...
Medical diagnosis using the tongue is a unique and important diagnostic method of traditional Chinese medicine (TCM). However, the clinical applications of tongue diagnosis have been limited due to three factors: (1) tongue diagnosis is usually based on the capacity of the eye for detailed discrimination. (2) the correctness of tongue diagnosis de...
Automatic tongue area segmentation is crucial for computer-aided tongue diagnosis, but traditional intensity-based segmentation methods that make use of monochromatic images cannot provide accurate and robust results. We propose a novel tongue segmentation method that uses hyperspectral images and the support vector machine. This method combines s...
Automated tongue image segmentation, in Chinese medicine, is difficult due to two special factors: (1) there are many pathological details on the surface of the tongue, which have a large influence on edge extraction, and (2) the shapes of the tongue bodies captured from various persons (with different diseases) are quite different, so it is impos...
This chapter presents a region merging-based automatic tongue segmentation method. First, gradient vector flow is modified as a scalar diffusion equation to diffuse the tongue image while preserving the edge structures of the tongue body. Then the diffused tongue image is segmented into many small regions by using the watershed algorithm. Third, ma...
This chapter focuses on relationships between diseases and the appearance of the human tongue in terms of quantitative features. The experimental samples are digital tongue images captured from three groups of candidates: one group in normal health, one suffering with appendicitis, and a third suffering with pancreatitis. For the purposes of diag...
Tongue diagnosis, one of the most important diagnosis methods of Traditional Chinese Medicine, is considered a very good candidate for remote diagnosis methods because of its simplicity and noninvasiveness. Recently, considerable research interests have been given to the development of automated tongue segmentation technologies, which is difficult...
Diabetes mellitus (DM) and its complications leading to diabetic retinopathy (DR) will soon become one of the twenty-first century’s major health problems. This represents a huge financial burden to healthcare officials and governments. To combat this approaching epidemic, this chapter presents a noninvasive method to detect DM and nonproliferative...
Color images produced by digital cameras are usually device-dependent, i.e., the generated color information (usually presented in the RGB color space) is dependent on the imaging characteristics of specific cameras. This is a serious problem in computer-aided tongue image analysis because it relies on the accurate rendering of color information....
In order to improve the quality and consistency of tongue images acquired by current imaging devices, this research aims to develop a novel imaging system which can faithfully and precisely record human tongue information for medical analysis. A thorough demand analysis is first conducted in this chapter in order to summarize requirements for rende...
A tongue diagnosis system can offer significant information for health conditions. To ensure the feasibility and reliability of tongue diagnosis, a robust and accurate tongue segmentation method is a prerequisite. However, both of the common segmentation methods (edge-based or region-based) have limitations so that satisfactory results especially f...
Tongue diagnosis is one of the most important and widely used diagnostic methods in Chinese medicine. Visual inspection of the human tongue offers a simple, immediate, inexpensive, and noninvasive solution for various clinical applications and self-diagnosis. Increasingly, powerful information technologies have made it possible to develop a compute...
A novel tongue color analysis system for medical applications is introduced in this chapter. Using the tongue color gamut, tongue foreground pixels are first extracted and assigned to one of 12 colors representing this gamut. The ratio of each color for the entire image is calculated and forms a tongue color feature vector. Experimenting on a lar...
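The color-ratio feature described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `color_ratio_features`, the nearest-center assignment by squared RGB distance, and the three placeholder color centers are hypothetical and are not taken from the chapter, which uses 12 gamut colors whose values are not given here.

```python
from collections import Counter


def color_ratio_features(pixels, centers):
    """Return, for each color center, the fraction of pixels assigned to it.

    pixels  -- iterable of (R, G, B) tuples for the foreground region
    centers -- list of (R, G, B) reference colors (hypothetical values)
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no foreground pixels supplied")

    def nearest(p):
        # Assumption: assign each pixel to the center with the smallest
        # squared Euclidean distance in RGB space.
        return min(
            range(len(centers)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
        )

    counts = Counter(nearest(p) for p in pixels)
    # Counter returns 0 for centers that received no pixels.
    return [counts[i] / len(pixels) for i in range(len(centers))]


# Placeholder palette with 3 entries; the chapter's gamut has 12 colors.
centers = [(200, 60, 60), (230, 200, 200), (120, 40, 80)]
pixels = [(198, 62, 58), (205, 55, 65), (228, 205, 199), (118, 42, 79)]
features = color_ratio_features(pixels, centers)
```

The resulting list has one entry per color center and sums to 1, so images of different sizes yield comparable vectors.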
In this book, four types of tongue image analysis technologies were elaborated by including the most current research findings in all aspects of tongue image acquisition, preprocessing, classification, and diagnostic support methodologies. In this chapter, we summarized these technologies from a systemic point of view and presented our thoughts on...
In this chapter, an in-depth analysis of the statistical distribution characteristics of human tongue color with the aim of proposing a mathematically described tongue color space for diagnostic feature extraction is presented. Three characteristics of tongue color space, i.e., the tongue color gamut that defines the range of colors, color centers...
In order to improve the correction accuracy of tongue colors by use of the Munsell colorchecker, this research aims to design a new colorchecker by aid of the tongue color space. Three essential issues leading to the development of this space-based colorchecker are investigated in this chapter. First, based on a large and comprehensive tongue dat...
Tongue diagnosis is an important diagnostic method in traditional Chinese medicine (TCM). However, due to its qualitative, subjective, and experience-based nature, traditional tongue diagnosis has a very limited application in clinical medicine. Moreover, traditional tongue diagnosis is always concerned with the identification of syndromes rather...
The human tongue is an important organ of the body, which carries information of the health status. The images of the human tongue that are currently used in computerized tongue diagnosis of traditional Chinese medicine (TCM) are all RGB color images captured with color CCD cameras. However, this conventional method impedes the accurate analysis of...
Traditional Chinese Medicine diagnoses a wide range of health conditions by examining features of the tongue, including its shape. This chapter presents a classification approach for automatically recognizing and analyzing tongue shapes based on geometric features. The approach corrects tongue deflection by applying three geometric criteria and the...
This is the first book offering a systematic description of tongue image analysis and processing technologies and their typical applications in computerized tongue diagnostic (CTD) systems. It features the most current research findings in all aspects of tongue image acquisition, preprocessing, classification, and diagnostic support methodologies,...
Dictionary learning (DL) has recently attracted intensive attention due to its representative and discriminative power in various classification tasks. Although much progress has been reported in the existing supervised DL approaches, it is still an open problem that how to build the relationship between dictionary atoms and the class labels in mul...
The discriminative ability of geometric features can be well supported by empirical studies in ear recognition. Recently, a number of methods have been suggested for geometric feature extraction from ear images. However, these methods usually have relatively high feature dimension or are sensitive to rotation and scale variations. In this paper, we...
Discriminative dictionary learning (DDL) has recently attracted intensive attention due to its representative and discriminative power in various classification tasks. However, most of the existing DDL methods fall into two extreme cases, i.e., they either learn a global dictionary for all classes or train a class-specific dictionary, leading to le...
Due to limited network bandwidth, the blurred and downsampled high-resolution images in the spatial domain are inevitably used for transmission over the internet, and so single image super-resolution (SISR) algorithms would play a vital role in reconstructing the lost spatial information of the low-resolution images. Recently, it has been recognize...
In computational tongue diagnosis, specular reflection is generally inevitable in tongue image acquisition, which has adverse impact on the feature extraction and tends to degrade the diagnosis performance. In this paper, we proposed a two-stage (i.e., the detection and inpainting pipeline) approach to address this issue: (i) by considering both hi...
Image blur caused by camera shake is often spatially variant, which makes it more challenging to recover the latent sharp image. Geometrical camera shake model based non-uniform deblurring methods, modeling the blurry image as a weighted summation of the homographically transformed images of the latent sharp image, although can achieve satisfactory...
Feature selection aims to select a subset of features to decrease time complexity, reduce storage burden and improve the generalization ability of classification or clustering. For the countless unlabeled high dimensional data, unsupervised feature selection is effective in alleviating the curse of dimensionality and can find applications in vario...
Image noise is difficult to avoid during the image acquisition and communication, and thus we need to suppress noise in the low level vision. Among all of the existing image denoising methods, image priors, such as hyper-Laplacian priors of the heavy-tailed distribution of image gradient, play an important role. However, many denoising methods tend...
Distance metric learning plays an important role in many machine learning tasks. In this paper, we propose a method for learning a Mahalanobis distance metric. By formulating the metric learning problem with relative distance constraints, we suggest a Relative Distance Constrained Metric Learning (RDCML) model which can be easily implemented and ef...
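For readers unfamiliar with the terms in the abstract above, the following is a minimal sketch of a Mahalanobis distance and of one common triplet-style form of a relative distance constraint. It is not the RDCML model itself: the constraint form, the `margin` parameter, and both function names are assumptions made for illustration, since the abstract does not spell them out.

```python
import math


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M d, written out to stay dependency-free.
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return math.sqrt(sum(di * mi for di, mi in zip(d, Md)))


def satisfies_relative_constraint(anchor, similar, dissimilar, M, margin=1.0):
    """Assumed constraint form: the dissimilar sample must be farther from
    the anchor than the similar sample by at least `margin`."""
    gap = mahalanobis(anchor, dissimilar, M) - mahalanobis(anchor, similar, M)
    return gap >= margin


identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to Euclidean distance
stretched = [[4.0, 0.0], [0.0, 1.0]]  # weights the first axis more heavily

d_euclid = mahalanobis((0.0, 0.0), (3.0, 4.0), identity)
d_stretch = mahalanobis((0.0, 0.0), (1.0, 0.0), stretched)
ok = satisfies_relative_constraint((0.0, 0.0), (1.0, 0.0), (3.0, 4.0), identity)
```

Metric learning in this setting amounts to choosing M so that as many such constraints as possible hold on the training data.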
Using a global regression model for single image super-resolution (SISR) generally fails to produce visually pleasant output. The recently developed local learning methods provide a remedy by partitioning the feature space into a number of clusters and learning a simple local model for each cluster. However, in these methods the space partition is...
As a fundamental tool, L0 gradient smoothing has found a flurry of applications. Inspired by the progress of research on hyper-Laplacian prior, we propose a novel model, corresponding to Lp-norm of gradients, for image smoothing, which can better maintain the general structure, whereas diminishing insignificant texture and impulse noise-like high...
With the rapid development of the face recognition technology, more and more optical products are applied in people's real life. The recognition accuracy can be improved by increasing the number of training samples, but the colossal training samples will result in the increase of computational complexity. In recent years, sparse representation meth...
Classification using the l2-norm-based representation is usually computationally efficient and is able to obtain high accuracy in the recognition of faces. Among l2-norm-based representation methods, linear regression classification (LRC) and collaborative representation classification (CRC) have been widely used. LRC and CRC produce residuals in...
The sparse representation classification (SRC) method proposed by Wright et al. is considered as the breakthrough of face recognition because of its good performance. Nevertheless it still cannot perfectly address the face recognition problem. The main reason for this is that variation of poses, facial expressions, and illuminations of the facial i...
Pulse signal contains important information about health status and pulse diagnosis has been extensively applied in oriental medicine. In recent years more and more research interests have been given on computerized pulse diagnosis. Pulse feature extraction plays an important role in computerized pulse diagnosis. The most popular pulse feature extr...
Multiclass classification is an important problem in pattern recognition. Various classification methods have been proposed in the past few decades. However, most of these classification methods neglect the errors or the noises that exist in samples. As a result, classification accuracy is badly influenced by the errors or noises. In this paper, we...
Nonuniform blurring is general for image degradation. Either defocus, camera shaking, or motion would result in nonuniform blurring. However, most current image restoration algorithms were developed for restoration from image blurred with one single space-invariant convolution kernel. The computational inefficiency would be significant if we direct...
Recent study reported that wrist pulse blood flow signal is effective for disease diagnosis. The multiscale entropy, which was developed for quantifying the complexity of a time series of physiological signals over a range of scales, had been widely applied for feature extraction from medical signals. In this paper, using the multiscale sample entr...
For the robust recognition of noisy face images, this paper proposed an improved fast neighborhood component analysis (FNCA) method by introducing a spatially smooth regularizer (FNCA-SSR). The SSR can penalize large differences between adjacent pixels by enforcing local spatially smoothness, and makes FNCA-SSR model robust to Gaussian and pepper-s...
Subspace methods have been very successful in face recognition. Neighborhood components analysis (NCA), one popular subspace method, however, cannot outperform discriminative common vectors (DCV) when applied to face recognition. In this paper, we proposed a Gabor feature-based fast NCA method (Gabor-FNCA). First, we extract multi-scale and multi-o...
Wrist pulse signal is of great importance in the analysis of the health status and pathologic changes of a person. A number of feature extraction methods have been proposed to extract linear and nonlinear, and time and frequency features of wrist pulse signal. These features are heterogeneous in nature and are likely to contain complementary inform...
The toothprint on tongue is an important objective index for revealing the human sub-health state, and thus the extraction and description of toothprint is of great significance in clinical applications. Current toothprint detection methods are only based on concave point. These methods, however, heavily depend on the accuracy of the segmentation o...
Examination of the tongue condition is a standard diagnostic method in Traditional Chinese Medicine (TCM) and takes account of a wide variety of features including shape, texture, and color. The terms “warm”, “neutral”, and “cool” are used to refer to a kind of chromatics characteristic of the tongue color and are associated with various health sta...
Along with the rapid growth of medical data, image retrieval, a kind of technology for browsing, searching and retrieving similar images of the given image, has become increasingly important from a large database of digital images. Tongue coating is the most important characteristic to reveal the pathological changes of the tongues for identifying...
Tongue diagnosis is a unique and important diagnostic method in Traditional Chinese Medicine (TCM). It is used to observe abnormal changes in the tongue color for identifying syndrome patterns. However, due to its qualitative, subjective and experience-based nature, it is hard to represent and visualize tongue color in computerized tongue diagnosis...
With the increasing of tongue image data, exploratory data analysis would provide an effective means to obtain knowledge and to verify the correctness of several traditional tongue diagnosis findings. This paper proposes an exploratory tongue color analysis method by using the manifold learning technique. We further present an evaluation criterion...
Petechia Spot ranks itself among the most significant pathological features utilized in the practice of the Tongue Diagnosis because it can reveal the existence of tiny hemorrhage in some organ inside the body. In this paper, an automated feature extraction method for recognition of Petechia Spot in tongue image is proposed employing marker-based w...
Using the framework of quotient image model, this paper presented a median filtering-based method (MFQI) for the illumination normalization of facial images. The proposed method first uses adaptive median filtering to derive an estimation of the received light, and then uses the quotient image model for local illumination normalization. Compared wi...
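The two steps named in the abstract above (estimate the received light with a median filter, then divide the image by that estimate) can be sketched as below. This is a simplified illustration, not the MFQI method: it substitutes a plain fixed-window median filter for the adaptive median filter the abstract mentions, and the function names, the `radius` parameter, and the `eps` guard against division by zero are assumptions.

```python
from statistics import median


def median_filter(img, radius=1):
    """Fixed-window median filter on a 2D list; windows are clipped at borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(median(window))
        out.append(row)
    return out


def quotient_image(img, radius=1, eps=1e-6):
    """Divide each pixel by the local illumination estimate."""
    light = median_filter(img, radius)
    return [
        [img[y][x] / (light[y][x] + eps) for x in range(len(img[0]))]
        for y in range(len(img))
    ]


face = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
brighter = [[2.0 * v for v in row] for row in face]  # same scene, doubled light
q_face = quotient_image(face)
q_brighter = quotient_image(brighter)
```

Because the median scales with the image, `q_face` and `q_brighter` agree up to the small `eps` term, which is the sense in which the quotient is insensitive to a global change in illumination.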