
Tongue Images Classification Based on Constrained High Dispersal Network

Computer aided tongue diagnosis has a great potential to play important roles in traditional Chinese medicine (TCM). However, the majority of the existing tongue image analyses and classification methods are based on the low-level features, which may not provide a holistic view of the tongue. Inspired by deep convolutional neural network (CNN), we propose a novel feature extraction framework called constrained high dispersal neural networks (CHDNet) to extract unbiased features and reduce human labor for tongue diagnosis in TCM. Previous CNN models have mostly focused on learning convolutional filters and adapting weights between them, but these models have two major issues: redundancy and insufficient capability in handling unbalanced sample distribution. We introduce high dispersal and local response normalization operation to address the issue of redundancy. We also add multiscale feature analysis to avoid the problem of sensitivity to deformation. Our proposed CHDNet learns high-level features and provides more classification information during training time, which may result in higher accuracy when predicting testing samples. We tested the proposed method on a set of 267 gastritis patients and a control group of 48 healthy volunteers. Test results show that CHDNet is a promising method in tongue image classification for the TCM study.
Research Article
Dan Meng,1 Guitao Cao,1,2 Ye Duan,2 Minghua Zhu,1 Liping Tu,2,3 Dong Xu,2 and Jiatuo Xu4
1 MOE Research Center for Software/Hardware Co-Design Engineering, East China Normal University, Shanghai 200062, China
2 Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
3 Department of TCM Information and Technology Center, Shanghai University of TCM, Shanghai, China
4 Department of Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Pudong New Area, Shanghai 201203, China
Correspondence should be addressed to Guitao Cao; gtcao@sei.ecnu.edu.cn and Minghua Zhu; mhzhu@sei.ecnu.edu.cn
Received 31 August 2016; Revised 5 January 2017; Accepted 17 January 2017; Published 30 March 2017
Academic Editor: Jeng-Ren Duann
Copyright © 2017 Dan Meng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
Tongue image classification is a key component in traditional Chinese medicine (TCM). For thousands of years, Chinese medical physicians have judged the patient's health status by examining the tongue's color, shape, and texture [, ]. With the improvement in digital medical imaging equipment and pattern recognition methods, computer aided tongue diagnoses have a great potential to play an important role in TCM by providing more accurate, consistent, and objective clinical diagnoses [].
In the past decades, tongue image feature extraction methods have been intensively studied. According to these studies, computer aided tongue diagnosis methods can be divided into two categories: single feature and multifeatures. Many single feature extraction methods have been proposed and applied to tongue image analysis. Such methods can exploit useful information based on a simple descriptor such as color, texture, shape, and orientation. As presented in [–], a single feature was used to analyze the tongue images. Li and Yuen [] investigated and reported the color matching of tongue images with different metrics in different color spaces. In [], Wang et al. presented a color recognition scheme for tongue images that obtains a number of homogeneous regions before classification. Spectral Angle Mapper (SAM) [] can recognize and classify tongue colors using their spectral signatures rather than their color values in RGB color space. In [] the authors aimed to build a mathematically described tongue color space for diagnostic feature extraction based on the statistical distribution of tongue color. In [], the partition of patients' states (either healthy or diseased) was quantitatively analyzed using geometric tongue shape features with computerized methods. In [], Cao et al. presented a feature extraction method based on statistical features.
Hindawi
Evidence-Based Complementary and Alternative Medicine
Volume 2017, Article ID 7452427, 12 pages
https://doi.org/10.1155/2017/7452427
Although many models based on the strategy of a single feature have been proposed and achieved successful results, this type of method only utilizes low-level features. So multifeatures [, ] are helpful to detect normal and abnormal tongue images. The works [–] used multifeatures (such as the combination of color and texture or shape) to identify and match tongue images. In [], Kanawong et al. proposed a coating separation step before extracting features. In [], Guo proposed a color-textured operator called primary difference signal local binary pattern (PDSLBP) to handle the tongue image matching problem. In [], a tongue computing model (TCoM) based on quantitative measurements that include chromatic and textural features was proposed to diagnose appendicitis. In [], multilabeled learning was applied to tongue image classification after extracting color and texture features.
In fact, the aforementioned methods only used low-level features, either a single feature or multifeatures, which cannot completely describe the characteristics of the tongue. It was necessary for us to integrate a framework that can generate complete features from tongue images. Thus, high-level features were necessary for computer aided tongue analysis. Many publications have described and applied deep learning models to extract high-level feature representations for a wide range of vision analysis tasks [] (such as handwritten digit recognition [], face recognition [], and object recognition []). However, there exists little or no literature on computer aided tongue image analysis using deep learning models, whereas a computer aided expert system that delivers unambiguous and objective tongue analysis results could support diagnosis in both traditional Chinese and Western medical practice.
PCANet [] is a simple deep neural network for visual classification tasks developed by Chan et al. []. In [], PCANet was compared with the well-known convolutional neural networks (CNN) [] with respect to performance on various tasks. Instead of initializing the weights of the network randomly or via pretraining and then updating them by backpropagation, as a CNN does, PCANet [] uses only the most basic PCA filters in the convolution filter bank of each stage, without further training. Since PCANet [] was developed, high-level extraction methods based on PCANet [] have been studied in the fields of face recognition [], human fall detection [], speech emotion detection [], and so forth. Although PCANet has not been applied to the field of computer aided tongue diagnosis, it has several characteristics that make it applicable to tongue image classification. As presented in PCANet [], it is easy to train and to adapt to different data and tasks without changing the architecture of the network. Besides, there is little need to fine-tune parameters. Additionally, PCANet [] combined with machine-learning classification algorithms, such as k-nearest neighbor (KNN), SVM, and Random Forest (RF), can achieve excellent performance in classification tasks.
However, we observed that the original PCANet [] has two major issues: redundancy and insufficient capability to handle unbalanced samples. The PCA method, by its nature, will respond to the large eigenvalues. Therefore, in the PCANet, there is often a significant amount of redundancy in the convolved feature maps. Another issue is that the classification tasks mentioned in PCANet [] are based on the assumption that the distribution of samples is balanced and the number of samples in the dataset is large. Since PCANet [] only consists of a convolutional layer and histogram, we had no way of knowing if it could gain sufficient information for distinguishing normal from abnormal status in our specific tongue classification task.
Inspired by the works of deep learning models and their variants, this paper proposes a framework referred to as constrained high dispersal neural networks (CHDNet) based on PCA convolutional kernels to aid in tongue diagnosis. The proposed CHDNet learns useful features from the clinical data in an unsupervised way, and, with these obtained features, a supervised machine-learning technique is used to learn how to partition a patient's health status into normal or abnormal states.
The main contributions of this paper are as follows:
(i) A new feature extraction method (CHDNet) is presented that explores the feature representations of normal and abnormal tongue images. It mainly benefits from four important components: nonlinear transformation, multiscale feature analysis, high dispersal, and local normalization.
(ii) CHDNet can provide robust feature representations to predict a patient's health status based on samples with unbalanced distribution, given the fact that most people come to the hospital for their physician's advice or prescription when they feel sick.
(iii) CHDNet has been evaluated on samples diagnosed by clinicians. Experimental results confirmed that the gastritis patients classified by our proposed model were in good agreement with the clinicians' diagnosis.
The rest of this paper is organized as follows. Section 2 introduces the framework of the proposed CHDNet. Experimental results are shown in Section 3. Finally, the conclusion and future work are presented in Section 4.
2. Algorithm Overview
For each image, we first extracted the tongue body from its background. Then, we applied CHDNet to learn features of normal and abnormal tongue bodies. Figure 1 shows the flowchart of the normal and abnormal detection framework based on CHDNet. After the tongue body was extracted from its background, each tongue image was normalized to a fixed height and width. After these two preprocessing steps, we partitioned the tongue images into training and testing sets to learn convolutional kernels and generate feature representations. We then sent the feature representations of the whole tongue image dataset into a classifier using a K-fold cross validation strategy for the classification task. The samples were labeled into two classes, namely, normal and abnormal. The whole set of feature representations was separated into K training and testing sets in K different ways. The classifier was first trained with K − 1 subsets, and then its performance was evaluated on the remaining Kth subset. Each subset was used as the test set once through repeating the process K times. The final result was obtained by averaging the outcomes produced in the K corresponding rounds.

Figure 1: Flowchart of normal and abnormal detection framework based on CHDNet.

Algorithm 1: Tongue images classification based on CHDNet.
Input: Tongue images with labels $(T_i, y_i)$, where $i = 1, 2, \ldots, N$; $y_i = 1, 2, \ldots, c$.
Output: Predicted labels of testing images.
(1) Partition the whole dataset into training set and testing set.
(2) if $T_i \in$ training set then
(3)   Compute patch mean removal of $T_i$.
(4)   for each stage, from 1 to stage, do
(5)     Compute convolutional kernels $K_j$ at the current stage.
(6)     Compute convolutional feature map $C_{i,j}$ using (2).
(7)     Compute non-linear transformation feature map $T_{i,j}$ using (3).
(8)     Compute the convolutional kernels $K_j$ at the current stage.
(9)     if the current stage is the last stage then
(10)      for each convolutional filter $K^2$ do
(11)        Compute the multi-scale feature maps using (9).
(12)        Apply high dispersal operation by (10).
(13)        Execute local response normalization by (11).
(14)        Compute the feature map at the current stage according to ().
(15)      end for
(16)    end if
(17)  end for
(18)  Extract feature representations for $T_i$ by (12).
(19) else
(20)  for each stage, from 1 to stage, do
(21)    Compute the feature map at the current stage using the learned kernels, which is similar to Steps (6)–(16).
(22)  end for
(23)  Extract multiscale features $f_i^{\text{test}}$, which is similar to Step (18).
(24) end if
(25) for validation = 1 to K do
(26)   Train classifier: train[$f_{\text{train}}$, $y_{\text{train}}$].
(27)   Predict labels for test images: predictLabels = predict[$f_{\text{test}}$].
(28) end for
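The K-fold procedure described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the function names `k_fold_indices` and `cross_validate` and the `evaluate` callback are hypothetical, and the shuffling and fold assignment are one simple choice, not necessarily the partitioning used in the experiments.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold f receives every k-th shuffled index starting at position f.
    return [indices[f::k] for f in range(k)]


def cross_validate(n_samples, k, evaluate, seed=0):
    """Average a score over k rounds; each fold is the test set exactly once.

    `evaluate(train_idx, test_idx)` is a user-supplied (hypothetical) callback
    that trains on train_idx and returns a score measured on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k
```

With 315 feature vectors and K = 5, each round trains on 252 samples and tests on the remaining 63.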
Algorithm 1 elaborates the details of the proposed CHDNet.
2.1. Features Extraction with the Proposed CHDNet. The original tongue images are 1728 × 1296 pixels. After tongue body extraction, we noticed that our area of interest (the tongue body) is around 500 × 500 pixels, so we resized all the tongue body images to 32 × 32 pixels.
For image classification tasks, hand-crafted low-level features, like HOG [], LBP [], and SIFT [] for object recognition, can generally work well when dealing with some specific tasks or data processes. Yet they are not universal for all conditions, especially when the application scenario is medicine. Therefore, the concept of learning features from data of interest is proposed to overcome the limitation of hand-crafted features, and deep learning is treated as a better method to extract high-level features, which provide more invariance to intraclass variability. As mentioned in Section 1, PCANet [] is a deep network, which has the capability to extract high-level features.
Compared to PCANet [], CHDNet has four important components:
(i) High Dispersal. With the high dispersal operation, features in each feature map achieve the property of dispersal without redundancy.
(ii) Local Response Normalization. After the processing of high dispersal, features in the same position of different feature maps still have redundancy. The proposed local response normalization aims to solve this problem.
(iii) Nonlinear Transformation Layer. Since we use tanh(·) in the feature convolutional layer, negative values exist, which conflicts with the principle of visual systems. In order to prepare suitable inputs for the convolutional layer and local response normalization, we add a nonlinear transformation layer after each convolutional layer.
(iv) Multiscale Feature Analysis. To improve the ability to handle deformation, we introduce multiscale feature analysis before high dispersal and local response normalization.

Figure 2: The structure of the two-stage CHDNet. (Input layer; first stage: PCA filters convolution and nonlinear transformation; second stage: PCA filters convolution and nonlinear transformation; feature pooling layer: histogram and multiscale feature analysis, high dispersal and local response normalization.)

It should be pointed out that these new ideas and methods are uniquely designed to address the major limitations of the original PCANet [] approach and aim to significantly improve its performance. Experimental results in Section 3 will validate how these new ideas impact the performance of tongue image classification. Since we add several constraints on the networks and the features are distributed in a high dispersal manner, our proposed feature extraction method is called the constrained high dispersal neural network (CHDNet).
Compared with a convolutional neural network (CNN), which is a complex neural network architecture requiring tricky parameter tuning and time-consuming calculations, CHDNet is extremely simple and efficient.
Figure 2 shows the basic architecture of CHDNet. It is composed of three components: a PCA filters convolution layer, a nonlinear transformation layer, and a feature pooling layer.
2.1.1. Nonlinear Transformation. Suppose we have $N$ training samples of size $h \times w$. As illustrated in Figure 3, we collect patches of size $s_1 \times s_2$, with stride = 1. We also do patch mean removal on all overlapping patches and, then, vectorize and combine them into a matrix.
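The patch collection step can be illustrated with a small sketch. It assumes patches are taken only where they fit entirely inside the image (no padding), which matches the 2 × 2 patches on a 5 × 5 image in the figure; the helper names `extract_patches` and `to_patch_matrix` are hypothetical.

```python
def extract_patches(image, s1, s2):
    """Collect every overlapping s1 x s2 patch (stride 1), vectorize it,
    and subtract the patch's own mean (patch mean removal)."""
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(rows - s1 + 1):
        for left in range(cols - s2 + 1):
            patch = [image[top + dr][left + dc] for dr in range(s1) for dc in range(s2)]
            mean = sum(patch) / len(patch)
            patches.append([value - mean for value in patch])
    return patches  # p vectors, each of length s1 * s2


def to_patch_matrix(patches):
    """Arrange the patch vectors as columns: s1*s2 rows by p columns."""
    return [list(row) for row in zip(*patches)]


# A 5 x 5 image with 2 x 2 patches gives (5 - 2 + 1)^2 = 16 patches of 4 values.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
patches = extract_patches(image, 2, 2)
matrix = to_patch_matrix(patches)
```

Here `matrix` has 4 rows and 16 columns, and every column sums to zero because of the mean removal.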
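For the $i$th image $T_i^0$ at the input stage, after the patch mean removal operation we get $P_i^0 = [P_{i,1}^0, P_{i,2}^0, \ldots, P_{i,p}^0] \in \mathbb{R}^{s_1 s_2 \times p}$. Similarly, for the entire set of training samples we have $\{P_i^0\}_{i=1}^{N} = [P_1^0, P_2^0, \ldots] \in \mathbb{R}^{N \times s_1 s_2 \times p}$. In order to learn the PCA filters at the first stage, we need to obtain the eigenvectors of the covariance matrix $P^0 (P^0)^T$ and select the eigenvectors corresponding to the $V_1$ largest eigenvalues as the PCA filters:
\[ [E^1, \Lambda^1] = \operatorname{svd}\left(\frac{1}{N \cdot h \cdot w} P^0 (P^0)^T\right), \qquad K^1 = \operatorname{reshape}(E^1, s_1, s_2) \in \mathbb{R}^{s_1 \times s_2}, \tag{1} \]
where $N$ is the number of training images of size $h \times w$, the matrix $E^1 \in \mathbb{R}^{s_1 s_2 \times s_1 s_2}$ contains the eigenvectors of the covariance matrix, and the diagonal entries of the matrix $\Lambda^1$ hold the corresponding eigenvalues. In (1), $\operatorname{reshape}(E^1, s_1, s_2)$ is a function that maps $E^1 \in \mathbb{R}^{s_1 s_2}$ to $K^1 \in \mathbb{R}^{s_1 \times s_2}$. Then, with the boundary zero-padded, we move on to obtain the PCA convolved feature maps:
\[ C_{ij}^1 = \tanh(K_j^1 * T_i^0) \in \mathbb{R}^{N \times s_1 s_2 \times p}. \tag{2} \]
Before entering the second stage, a nonlinear transformation layer is applied using the following equation:
\[ T_{ij}^1 = \gamma \cdot \sqrt{\varepsilon + (C_{ij}^1)^2} + (1 - \gamma) \log\left(1 + e^{C_{ij}^1}\right). \tag{3} \]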
At the second stage, we follow a process similar to stage one, and the input images of the second stage are $\{T_{ij}^1\}_{i=1,j=1}^{i=N,j=V_1} \in \mathbb{R}^{s_1 s_2 \times V_1 N p}$. For the $i$th image convolved with the $j$th filter and passed through the nonlinear transformation at the previous stage, after patch mean removal we have
\[ P_{ij}^1 = [P_{ij,1}^1, P_{ij,2}^1, \ldots, P_{ij,p}^1] \in \mathbb{R}^{s_1 s_2 \times p}, \quad i = 1, 2, \ldots, N, \; j = 1, 2, \ldots, V_1. \tag{4} \]

Figure 3: Illustration of 2 × 2 patch size taking for a 5 × 5 image. (The 1st, 2nd, 3rd, ... patches are vectorized into a 4 × 16 matrix.)

For each input image $T_i^0$, we get $V_1$ feature maps. So we combine them and obtain
\[ \{P_{ij}^1\}_{i=1,j=1}^{i=N,j=V_1} = [P_{11}^1, P_{12}^1, \ldots, P_{1V_1}^1, P_{21}^1, \ldots, P_{NV_1}^1]. \tag{5} \]
Similar to stage 1, we keep the eigenvectors corresponding to the $V_2$ largest eigenvalues of $P^1 (P^1)^T$ to get the PCA filters at the second stage, followed by the convolutional layer and the nonlinear transformation layer:
\[ K^2 = \operatorname{reshape}(E^2, s_1, s_2) \in \mathbb{R}^{s_1 \times s_2}. \tag{6} \]
The convolutional and nonlinear transformation layers are
\[ C_{ij}^2 = \tanh(K_j^2 * T_{ij}^1) \in \mathbb{R}^{N \times s_1 s_2 \times V_1 N p}, \qquad T_{ij}^2 = \gamma \cdot \sqrt{\varepsilon + (C_{ij}^2)^2} + (1 - \gamma) \log\left(1 + e^{C_{ij}^2}\right). \tag{7} \]
2.1.2. Feature Pooling. The last component of CHDNet is the feature pooling layer, containing histogram, multiscale feature analysis, high dispersal, and local response normalization. We illustrate this layer more clearly by taking a specific input image $T_i^0$ as an example.
(a) Histogram. For each set of feature maps, we convert the feature maps belonging to the corresponding filter at the last stage into one histogram image whose every pixel is an integer in the range [0, 255] and is treated as a distinct “word”; $H_{ij}$ can be expressed as
\[ H_{ij} = \sum_{k=1}^{V_1} \left(2^{k-1} \bmod 256\right) \times \operatorname{Heaviside}\left(T_{(i \times k)j}^2\right), \qquad H_j = \frac{H_j - \min H_j}{\max H_j - \min H_j} \times 255. \tag{8} \]
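The binary hashing idea behind the histogram image can be sketched as follows: each of up to eight feature maps contributes one bit per pixel through a Heaviside step, so every pixel becomes an integer "word" in [0, 255]. The function names are hypothetical, and treating an input of exactly zero as 0 in the step function is an assumption.

```python
def heaviside(value):
    """Step function: 1 for positive inputs, 0 otherwise (assumed convention at 0)."""
    return 1 if value > 0 else 0


def hash_feature_maps(maps):
    """Combine V same-sized 2-D maps into one integer-coded map.

    The map at position k (0-based) contributes the bit with weight 2**k,
    which corresponds to 2**(k-1) for 1-based k.
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    coded = [[0] * cols for _ in range(rows)]
    for k, feature_map in enumerate(maps):
        weight = 2 ** k
        for r in range(rows):
            for c in range(cols):
                coded[r][c] += weight * heaviside(feature_map[r][c])
    return coded


def rescale_to_255(coded):
    """Min-max rescale a coded map to the range [0, 255]."""
    flat = [v for row in coded for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in coded]
    return [[(v - lo) / (hi - lo) * 255 for v in row] for row in coded]


# Three 1 x 2 maps: pixel 0 has bits (1, 0, 1) -> 5, pixel 1 has bits (0, 1, 1) -> 6.
coded = hash_feature_maps([[[0.5, -0.2]], [[-1.0, 0.3]], [[2.0, 0.1]]])
```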
(b) Multiscale Feature Analysis. For each histogram image $H_{ij}$, we constructed a sequence of grids at resolutions $0, 1, \ldots, L$. Let $H_{ij}^l$ denote the histogram of $H_{ij}$ at resolution $l$, so that $H_{ij}^l(g)$ is the vector containing the numbers of points from $H_{ij}$ that fall into the $g$th cell of the grid according to different words. We cascade $H_{ij}^l(g)$ to build a multiscale feature map as
\[ \tilde{H}_{ij} = \left[H_{ij}^0(1), H_{ij}^1(2), \ldots, H_{ij}^L(G)\right] \in \mathbb{R}^{G \times 256}, \qquad G = \sum_{l=0}^{L} 2^l, \quad j = 1, 2, \ldots, V_2. \tag{9} \]
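(c) High Dispersal. For each multiscale feature map $\tilde{H}_{ij}$, we use high dispersal to prevent degenerate situations and enforce competition between features by
\[ \hat{H}_{ij}(x, y) = \mu \cdot \frac{\tilde{H}_{ij}(x, y)}{\sqrt{\sum_{p=1,q=1}^{r,c} \tilde{H}_{ij}(p, q)^2}}, \tag{10} \]
where $r \times c$ is the size of the multiscale feature map and $\mu$ is a scaling factor.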
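(d) Local Normalization. For each feature at the same position in different multiscale feature maps, we use local normalization to prevent redundancy by
\[ F_{ij}(x, y) = \frac{\hat{H}_{ij}(x, y)}{\left(k + \alpha \sum_{m=\max(1,\, j - n/2)}^{\min(V_2,\, j + n/2)} \hat{H}_{im}(x, y)^2\right)^{\beta}}, \tag{11} \]
where $\hat{H}_{ij}$ denotes the $j$th feature map after high dispersal.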
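Local normalization across neighbouring feature maps can be sketched as below. The sketch assumes the widely used local response normalization form, in which each value is divided by (k + alpha × sum of squares over n adjacent maps at the same position) raised to the power beta; the parameter names `k`, `n`, `alpha`, `beta`, the default values, and the use of integer division for n/2 are assumptions for illustration.

```python
def local_response_normalize(maps, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize maps[j][x][y] by the energy of up to n neighbouring maps
    (indices j - n//2 .. j + n//2, clipped to the valid range) at (x, y)."""
    num_maps = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    half = n // 2
    normalized_maps = []
    for j in range(num_maps):
        first = max(0, j - half)
        last = min(num_maps - 1, j + half)
        normalized = [[0.0] * cols for _ in range(rows)]
        for x in range(rows):
            for y in range(cols):
                energy = sum(maps[m][x][y] ** 2 for m in range(first, last + 1))
                normalized[x][y] = maps[j][x][y] / (k + alpha * energy) ** beta
        normalized_maps.append(normalized)
    return normalized_maps


# Three 1 x 1 maps with simple constants so the arithmetic is easy to follow.
demo = local_response_normalize([[[1.0]], [[2.0]], [[3.0]]], k=1.0, n=3, alpha=1.0, beta=1.0)
```

In the demo, the middle map sees all three neighbours (energy 14), while the first and last maps see only two, so the values become 1/6, 2/15, and 3/14.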
Finally, the feature vector of the input image is defined as
\[ f_i = \left[\operatorname{vec}(F_{i1}), \operatorname{vec}(F_{i2}), \ldots, \operatorname{vec}(F_{iV_2})\right]^T. \tag{12} \]
The parameters $\varepsilon$, $\gamma$, $k$, $\alpha$, $\beta$, $\mu$, and $n$ in (3), (7), (10), and (11) are determined empirically. We use a grid search method to determine these parameters based on the randomly selected training samples. To be more specific, $\varepsilon \in [10^{-10}, 1]$, $\gamma = 0$ or $1$, $k \in [0, 10]$, $\alpha \in [10^{-6}, 10^{2}]$, $\beta \in [0, 1]$, and $\mu = (315 \times \theta)/84$, where $\theta \in [0.59, 0.60]$ with the step of $10^{-2}$; each of the other ranges, and $n$, is likewise searched with a fixed step. The number of filters $V_1$ and $V_2$ and the size of the PCA filter $s_1 \times s_2$ are decided as suggested in [].
It should be noticed that, when compared with PCANet [], our CHDNet only shares some similarity in learning PCA kernels, but the structure of our CHDNet and especially the techniques in the feature pooling layer are new and uniquely designed to address the tongue image classification problem and significantly improve its performance.
3. Experiments
We used the same dataset as in our previous work []. Raw tongue image samples were acquired from Dongzhimen Hospital, Beijing University of Chinese Medicine. We have only collected 315 cases, and the proposed CHDNet for tongue image classification will be transformed into a mobile application. More cases will be obtained through practical application. The 315 cases include 48 normal cases and 267 abnormal cases diagnosed by clinicians. Our tongue image classification consists of two steps: feature extraction and classification. The feature extraction step can be further divided into a training stage and a testing stage. During the training stage of the feature extraction step, in order to learn unbiased convolutional kernels, we randomly chose normal and abnormal samples (84 images, about 26.67% of the 315 tongue images in the dataset) as the training set, which is used to learn the convolutional kernels and determine the parameters $\varepsilon$, $\gamma$, $k$, $\alpha$, $\beta$, $\mu$, and $n$. With the learned kernels and determined parameters, we extract features of the remaining 231 samples. As a result, feature representations for all 315 samples are obtained. Then these feature representations are sent into the classifier. All of the reported results in this section are the averaged outcomes after 10 rounds of 5-fold cross validation.
Besides accuracy, sensitivity, and specificity, we also use precision, recall, and F1-score to evaluate the performance of our proposed method and other methods. These indices are commonly used in detection literature []. Accuracy (ACC) = (TP + TN)/(TP + FN + FP + TN), sensitivity (SEN) = TP/(TP + FN), specificity (SPE) = TN/(TN + FP), positive predictive value (PPV) = TP/(TP + FP), negative predictive value (NPV) = TN/(TN + FN), and F1-score = (2 × TP)/(2 × TP + FP + FN), where
(i) TP (True Positive) is the number of positive samples correctly predicted by the system;
(ii) TN (True Negative) is the number of negative samples correctly predicted by the system;
(iii) FP (False Positive) is the number of negative samples falsely detected as positive by the system;
(iv) FN (False Negative) is the number of actual positive samples missed by the system.
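The indices defined above follow directly from the four confusion counts. The sketch below is illustrative only: the function name is hypothetical, abnormal is assumed to be the positive class, and a ratio with a zero denominator is reported as NaN. The example uses a hypothetical classifier that labels all 315 images (267 abnormal, 48 normal) as abnormal, which shows why accuracy alone is misleading on an unbalanced dataset.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(tp, tn, fp, fn):
    """Compute the evaluation indices from the confusion counts."""
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),   # also called recall
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),           # also called precision
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical "always abnormal" classifier on 267 abnormal and 48 normal cases.
always_abnormal = classification_metrics(tp=267, tn=0, fp=48, fn=0)
```

This degenerate classifier reaches about 84.8% accuracy and 100% sensitivity while its specificity is 0%, the same pattern as the PCANet baseline row of the component comparison table.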
Table 1: Performance comparison on the tongue images dataset with the proposed components of CHDNet.
Method                             Accuracy   Sensitivity   Specificity
PCANet                             84.77%     100.00%       0.00%
PCANet + NT                        85.40%     100.00%       4.17%
PCANet + MFA                       86.01%     99.25%        12.31%
PCANet + HD                        87.37%     98.16%        27.35%
PCANet + LRN                       84.77%     100.00%       0.00%
CHDNet with all four components    91.14%     94.26%        75.40%
3.1. Impact of the Proposed Components of CHDNet. As illustrated in Section 2, our proposed CHDNet modifies PCANet in four aspects. The example in Table 1 shows how these new ideas improve upon the PCANet method and contribute to our final performance gain. For a fair comparison, we use LIBLINEAR SVM [] as the classifier for all methods listed in Table 1. Here, we use the tongue images dataset and demonstrate that the combination of the proposed high dispersal (HD) method, local response normalization (LRN), multiscale feature analysis (MFA), and nonlinear transform (NT) is able to significantly improve the tongue image recognition rate from the 84.77% achieved by the original PCANet algorithm to 91.14%. It is well known that the rate of missed diagnoses decreases with higher sensitivity, and the misdiagnosis rate decreases with higher specificity. This means that although PCANet [] combined with LIBLINEAR SVM [] can correctly classify the abnormal samples, it can hardly recognize the normal status. With the help of our four components, specificity improves greatly at the cost of a slight decrease in sensitivity.
3.2. Unbalanced Dataset Processing. In our classification task, the majority of examples are from one of the classes. The number of abnormal samples is much larger than that of normal samples, since most patients come to visit the hospital only when they feel ill. Imbalance in the class distribution often causes machine-learning algorithms to perform poorly on the minority class. Therefore we need to improve the performance of the classifier on the unbalanced dataset. In this paper, in order to achieve better performance, we adjusted the class weights of normal and abnormal samples. Since the SVM tends to be biased towards the majority class, we should place a heavier penalty on misclassified minority class samples.
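The effect of a heavier penalty on the minority class can be sketched with a class-weighted hinge loss, the loss term of a linear SVM in which each class's penalty parameter is scaled by its weight. This is an illustrative sketch, not the LIBLINEAR implementation: the function name, the label encoding (+1 for the minority normal class, -1 for the majority abnormal class), and the example scores are all assumptions.

```python
def weighted_hinge_loss(scores, labels, class_weight, c=1.0):
    """Sum of hinge losses, each scaled by c times the weight of its true class.

    scores: decision values w.x + b for each sample.
    labels: +1 or -1 for each sample.
    class_weight: dict mapping a label to its penalty multiplier.
    """
    total = 0.0
    for score, label in zip(scores, labels):
        margin_violation = max(0.0, 1.0 - label * score)
        total += c * class_weight[label] * margin_violation
    return total


# One misclassified sample per class, each violating the margin by 1.5.
scores = [-0.5, 0.5]
labels = [+1, -1]
equal_loss = weighted_hinge_loss(scores, labels, {+1: 1.0, -1: 1.0})
skewed_loss = weighted_hinge_loss(scores, labels, {+1: 8.0, -1: 1.0})
```

With equal weights the two errors cost the same (1.5 each); with a weight of 8 on the minority class its error costs 12.0, so the optimizer is pushed to correct minority-class mistakes first.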
For weighted LIBLINEAR SVM [], we tuned the class weight for the validation and training sets. As we used a 5-fold cross validation strategy, we partitioned the whole 315 feature representations obtained by our proposed CHDNet into 5 training and testing sets in 5 different ways. The weighted LIBLINEAR SVM [] was first trained with 4 subsets, and then its performance was evaluated on the 5th subset. Each subset was used as the test set once through repeating the process 5 times. The final result was obtained by averaging the outcomes produced in the 5 corresponding rounds. For each round, we assign a weight to each class, with the majority class always set to 1 and the minority class given a larger weight, namely, integers ranging from 1 to 9. Since we raised the weight of the minority class, the cost of misclassifying the minority class goes up. As a result, the True Positive rate becomes higher while the True Negative rate turns out to be lower. To this end, in order to measure the balance of accuracy in our problem, we used the geometric mean (g-mean) [] of sensitivity and specificity:
\[ g = \text{Sensitivity} \times \text{Specificity}. \tag{13} \]
This measure has the distinctive property of being independent of the distribution of examples between classes. The result in Table 2 shows that when the weight ratio of the two classes is 8 : 1 the model outperforms the other weight configurations on normal and abnormal samples.

Table 2: Performance comparison for weighted LIBLINEAR SVM.
Normal : Abnormal   Accuracy   Sensitivity   Specificity   g-mean
1 : 1               90.61%     93.79%        72.93%        68.40%
2 : 1               91.01%     94.22%        73.36%        69.12%
3 : 1               90.62%     94.57%        68.60%        64.88%
4 : 1               90.94%     94.19%        72.89%        68.66%
5 : 1               91.02%     94.79%        70.07%        66.42%
6 : 1               90.55%     94.28%        69.89%        66.25%
7 : 1               90.59%     94.87%        66.91%        63.48%
8 : 1               91.14%     94.26%        75.40%        71.07%
9 : 1               91.04%     94.39%        72.27%        68.22%
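As a quick arithmetic check on the g-mean column of Table 2, the tabulated values match the plain product of sensitivity and specificity. The conventional geometric mean is the square root of that product; because the square root is monotonic, both versions rank the weight configurations identically. The function names below are hypothetical.

```python
import math


def sensitivity_specificity_product(sensitivity, specificity):
    """Plain product of the two rates, as fractions in [0, 1]."""
    return sensitivity * specificity


def geometric_mean(sensitivity, specificity):
    """Conventional geometric mean: square root of the product."""
    return math.sqrt(sensitivity * specificity)


# Row with weight ratio 8 : 1 from Table 2: sensitivity 94.26%, specificity 75.40%.
product_8_to_1 = sensitivity_specificity_product(0.9426, 0.7540)
g_mean_8_to_1 = geometric_mean(0.9426, 0.7540)

# Row with weight ratio 1 : 1: sensitivity 93.79%, specificity 72.93%.
product_1_to_1 = sensitivity_specificity_product(0.9379, 0.7293)
```

The 8 : 1 product is about 0.7107 (the tabulated 71.07%), while its square root is about 0.843; the 1 : 1 product is about 0.6840 (the tabulated 68.40%).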
3.3. Comparison of Classification Accuracy Using Different Feature Extraction Methods. During the training stage, CHDNet learns convolutional kernels, and feature representations for the training set can also be obtained. During the testing stage, testing samples are convolved with the PCA filters learned at the training stage and passed through the nonlinear transformation. After the feature pooling layer, the features produced by CHDNet, combined with the features of the training set, are fed into LIBLINEAR SVM [].
According to the experimental experience [], the number of filters is fixed to $V_1 = V_2 = 8$, and the filter size is 5 × 5. We empirically set the parameters $\varepsilon = 10^{-8}$, $\gamma = 1$, $k = 2$, $\alpha = 10^{-4}$, $\beta = 0.75$, $\mu = 2.238$, and $n = 5$ for our CHDNet.
Table 3 shows some tongue images (with the patient numbers shown as N, N, D, and X) in our dataset and the prediction labels based on the trained LIBLINEAR SVM [] classifier. Besides, tongue images labeled with  represent that patients are in normal status, while  reflects an abnormal gastritis condition. Table 4 lists pathological information by TCM, which indicates that the prediction label based on our CHDNet and LIBLINEAR SVM [] is in line with the diagnosis of the Chinese medical physician.
Sensitivity refers to the test's ability to correctly detect patients who do have the condition, and specificity relates to the test's ability to correctly detect patients without the condition. As a result, in the context of computer aided tongue image classification, we pay more attention to sensitivity and specificity than to other indices (e.g., accuracy, positive predictive value, negative predictive value, and F1-measure) in order to find a trade-off between sensitivity and specificity.
The proposed CHDNet framework was compared quantitatively with both low-level and high-level feature extraction approaches on the same dataset under seven different classifiers. The compared methods are single features obtained by HOG, LBP, and SIFT, multifeatures that result from the combination of the three mentioned single features, state-of-the-art hand-crafted features calculated by Doublets [] and Doublets + HOG [], and high-level features generated by PCANet. Experimental results show that the proposed feature extraction method outperformed both the low-level and the high-level feature extraction approaches. From Table 5, although some feature extraction methods like HOG or LBP can achieve high sensitivity, their specificity under the same classifier that yields the high sensitivity is relatively low. For example, HOG features combined with GBDT achieve 100.00% sensitivity, and the specificity is only 54.58%. If HOG features are combined with a CART classifier, they can achieve 56.31% specificity, which is the best performance among the seven classifiers. However, the sensitivity is then only 90.19%. That is to say, when the distribution of samples is unbalanced, the tongue image classification model based on HOG or LBP features is unbalanced. This property also holds true for other single feature extraction approaches. We can also see that multifeatures contain richer information for building a more accurate classification model when compared with single features. However, the specificity is still not acceptable. While high-level features learned from training samples can achieve the best sensitivity, they can hardly construct a balanced classification model.
As shown in Table , by adding nonlinear transformation,
multiscale feature analysis, high dispersal, and local response
normalization, our CHDNet achieved a 91.14% recognition
accuracy rate, which is more than six percentage points above
PCANet []. We noticed that the sensitivity of our proposed
CHDNet is about 4.8% inferior to the best performance;
however, the specificity of our CHDNet is at least 8% superior
when compared to the other feature extraction methods. These
results indicate that the classification models based on the
compared methods seem partial to the majority of the tongue
T : Some normal and abnormal tongue images classied by our method.
Patient’s
number Original image Mask Tongue body Normalization Predict
label
Actual
label
N 
N 
D 
X 
e rst column is the original image of 1728 × 1296 pixels; the second column is the background mask; the third column is the extraction of tongue body; the
fourth column is the color space transformed tongue body image; the h column is the normalized 512 × 512 pixels image; the last column is the predicted
label.
image data. In addition, the receiver operating characteristic
(ROC) curve is usually used to evaluate the performance of
classification models. Since we repeated 5-fold cross-validation 10
times, Figure  gives the mean ROC curve of each mentioned
method. The bigger the area under the curve (AUC), the better
the performance of the model. From Figure , we can see that
the AUC of our CHDNet is equal to 0.94, which is the
closest to 1 among the compared methods. This indicates
that our proposed CHDNet has the best performance when
compared with the other four mentioned feature extraction
methods, even though positive and negative samples are unevenly
distributed.
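To make the AUC criterion concrete, the following Python sketch builds ROC points by sweeping a threshold over classifier scores and integrates them with the trapezoidal rule. The function names, the label coding (1 = positive), and the scores are hypothetical. Averaging the per-run AUC values is a simplification that stands in for the mean ROC curve described above; it is not the authors' procedure.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr) by sweeping a threshold over the scores.

    labels: 1 for positive, 0 for negative; both classes must be present.
    scores: a higher score means "more likely positive".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples that share one score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Hypothetical (labels, scores) pairs from two validation runs.
runs = [
    ([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]),
    ([1, 0, 1, 0], [0.9, 0.6, 0.5, 0.1]),
]
aucs = [auc(*roc_points(y, s)) for y, s in runs]
mean_auc = sum(aucs) / len(aucs)
print(aucs, mean_auc)
```

An AUC of 1.0 corresponds to scores that rank every positive sample above every negative one, and 0.5 corresponds to uninformative scores, so a value such as 0.94 sits close to the ideal end of that range.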
3.4. Comparison of Classification Accuracy Using Different
Classifiers. Another important issue for automatic classification
of tongue images is to develop a high accuracy
classifier. To apply deep learning models to the tongue image
classification task, the problem can be considered as a feature
extraction problem of digital signals or images, cascaded
with a classifier [].
After obtaining feature representations, different
machine-learning algorithms have been used for classification
tasks. Among them, distance-based models, support vector
machines (SVM), and tree-based models are three widely
used algorithms.
The performance of our proposed CHDNet incorporating
LIBLINEAR SVM [] was also compared with other classifiers,
including LDA, KNN, CART, GBDT, RF, and LIBSVM,
using identical data and features. Generally, the performances
of CART, RF, and GBDT are comparatively poor because
the dimension of the features obtained by our CHDNet is very
high and, in this setting, these tree models are inferior to simple classifiers.
Besides, because the weighted LIBLINEAR SVM [] handles the
unbalanced distribution of the tongue image dataset well,
LDA and KNN do not perform as well as it does.
In bioinformatics, SVM is a commonly used tool for
classification or regression purposes with high-dimensional
features []. Instead of using LIBSVM [] as the classifier,
we use LIBLINEAR SVM []. The reason is that LIBLINEAR
Figure : Mean receiver operating characteristic of different feature extraction methods on tongue images dataset (x-axis: false positive rate; y-axis: true positive rate). Legend: HOG (area = 0.86); LBP (area = 0.89); SIFT (area = 0.71); HOG + LBP (area = 0.90); HOG + SIFT (area = 0.71); LBP + SIFT (area = 0.71); HOG + LBP + SIFT (area = 0.71); PCANet (area = 0.76); Doublets (area = 0.55); Doublets + HOG (area = 0.52); our method (area = 0.94).
T : Pathological information by TCM.
Patient’s
number
Pathological
feature
Clinical
practitioners
subjective
diagnosis
Diagnosis
result
N — Normal
N — Normal
D Supercial
gastritis
Cold Zheng-
deciency
cold of the
spleen
Abnormal
X Atrophic
gastritis
Hot
Zheng-damp
heat in the
spleen and
the stomach
Abnormal
e rst column is the patient’s number, the second column is pathological
feature, the third column is clinical practitioners subjective diagnosis, and
the last column is the clinical practitioners diagnosis result.
SVM [] performs better than LIBSVM [] when the
number of samples is far smaller than the number of features.
As in our CHDNet, the number of samples is 315,and
thenumberoffeaturesofeachsampleis43008.So,com-
pared with LIBSVM [], LIBLINEAR SVM [] is a better
choice.
As shown in Table , the overall performance of LIBLINEAR
SVM [] is the best of the seven classifiers: it attains the highest
specificity, PPV, and NPV and matches the best F1-score
(specified in bold), while its accuracy and sensitivity are within
0.6 percentage points of those of LIBSVM. After optimizing the
weights of the LIBLINEAR SVM [], its accuracy
can reach 91.14%, which is 6.24 percentage points higher than LDA. Besides,
the specificity of LIBLINEAR SVM [] improves by 3% to
25% when compared with distance-based models and
tree structure models. Through the comparison, we can see
that the SVM classifier [, ] with the optimal parameters is
superior to the other five methods. The LIBLINEAR SVM
[] method increases the accuracy to 91.14%
and improves the other performance measurements to different
degrees.
This indicates that our feature extraction framework
combined with LIBLINEAR SVM [] can be considered as
a reliable indicator of normal and abnormal samples.
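The text does not state how the class weights of the weighted LIBLINEAR SVM were optimized. As one common starting point, and purely as an assumption, the Python sketch below computes inverse-frequency ("balanced") class weights for the 267 gastritis and 48 healthy samples. The function name and the label coding are hypothetical. Weights of this kind could then be supplied as per-class penalty factors, for example through LIBLINEAR's `-wi` training option.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}


# Assumed label coding: 1 = gastritis (267 samples), 0 = healthy (48 samples).
labels = [1] * 267 + [0] * 48
weights = balanced_class_weights(labels)
print(weights)
```

With this heuristic the minority (healthy) class receives the larger weight, so that both classes contribute the same total weight to the training objective.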
4. Conclusions
In this paper, we proposed a new framework for tongue
image classification based on unsupervised feature learning
methods. We learned features with CHDNet and trained a
weighted LIBLINEAR SVM classifier to predict normal/abnormal
patients. Tests show
that our framework combined with the weighted LIBLINEAR
SVM can obtain suitable features, which are able to construct
the most balanced prediction model when compared with
T : Comparison of the proposed method with other feature extracting approaches.
LDA KNN CART GBDT RF LIBSVM LIBLEAR SVM
HOG [] Sensitivity 91.65%100.00%90.19%100.00%100.00%99.93%98.02%
Specicity 58.00%22.33%56.31%54.58%47.42%48.82%51.27%
LBP Sensitivity 99.70%100.00%90.49%92.06%100.00%100.00%100.00%
Specicity 64.56%60.93%65.91%65.16%58.29%58.64%58.84%
SIFT Sensitivity 98.95%100.00%90.49%92.78%100.00%98.28%98.17%
Specicity 41.44%0.20%58.18%55.20%62.31%44.53%45.42%
HOG + LBP Sensitivity 99.93%99.96%91.38%92.85%100.00%100.00%100.00%
Specicity 49.60%56.69%64.49%67.24%58.20%60.33%60.36%
HOG + SIFT Sensitivity 100.00%100.00%91.87%92.32%100.00%98.19%97.76%
Specicity 59.91%0.42%60.56%58.29%62.49%43.87%46.62%
LBP + SIFT Sensitivity 100.00%100.00%91.58%91.90%100.00%98.31%97.87%
Specicity 58.89%0.82 62.31%63.33%60.20%44.07%44.33%
HOG + LBP + SIFT Sensitivity 99.96%100.00%91.95%92.85%100.00%98.27%98.24%
Specicity 59.67%0.87%65.64%66.33%59.93%43.18%46.18%
Doublets [] Sensitivity 91.88%100.00%93.22%93.67%100.00%100.00%100.00%
Specicity 36.71%0.00%48.91%49.42 25.51%0.00%0.00%
Doublets + HOG [] Sensitivity 92.35%100.00%94.22%94.38%100.00%100.00%100.00%
Specicity 29.96%0.00%51.80%46.44%27.29%0.00%0.00%
PCANet [] Sensitivity 100.00%98.35%87.05%88.50%100.00%100.00%100.00%
Specicity 0.00%14.09%29.51%28.40 0.00%0.00%0.00%
Our method Sensitivity 90.68%93.11%92.44%92.63%93.37%94.68%94.26%
Specicity 52.91%69.18%61.56%59.98%65.62%71.18%75.40%
Note. e sum rule of feature combination is a cascade operation. Given two types of features 𝑓𝑖and 𝑓𝑗obtained by feature extraction methods FEM𝑖and
FEM𝑗, respectively, then FEM𝑖+FEM𝑗is equal to [𝑓𝑖,𝑓
𝑗]. For example, HOG + LBP means, for each sample, we append LBP features just aer HOG features.
Besides, “our method” is short for the proposed CHDNet feature extraction method.
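The cascade rule in the note above can be sketched in a few lines of Python. The helper name `combine_features` and the toy feature values are hypothetical; real HOG or LBP descriptors would be much longer vectors.

```python
def combine_features(*feature_vectors):
    """Concatenate per-sample feature vectors in the order given."""
    combined = []
    for features in feature_vectors:
        combined.extend(features)
    return combined


# Toy descriptors for one sample (values are made up for illustration).
hog = [0.1, 0.4, 0.5]
lbp = [3.0, 1.0]
hog_lbp = combine_features(hog, lbp)  # HOG + LBP: LBP appended after HOG
print(hog_lbp)
```

The combined vector has as many entries as the two descriptors together, so the dimensionality seen by the classifier grows with every method added to the combination.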
T : Comparison of the proposed method with dierent classiers.
Classier ACC SEN SPE PPV NPV 1-score
LDA 84.90%91.09%50.42%91.16%51.10%91.08%
KNN 89.55%93.10%69.71%94.56%65.18%93.78%
CART 87.67%93.03%57.80%92.56%60.04%92.75%
GBDT 87.91%93.18%58.60%92.65%62.72%92.87%
RF 88.92%93.29%64.60%93.70%64.65%93.45%
LIBSVM 91.27%94.76%72.04%95.00%72.90%94.83%
LIBLINEAR SVM 91.14%94.22%75.40%95.59%74.56%94.83%
Note. ACC = accuracy, SEN = sensitivity, SPC = specicity, PPV = positive predictive value, and NPV = negative predictive value.
other feature extracting methods. For future study, we would
like to develop a real-time computer aided tongue diagnosis
system based on this approach.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported in part by the NSFC-Zhejiang Joint
Fund for the Integration of Industrialization and Informatization
under Grant no. U, and it was supported in
part by the National Natural Science Foundation of China
(Grant no. , Grant no. , and Grant no.
). The authors would like to thank Professor Shao Li
for providing the data.
References
[] Y. Jiao, X. Zhang, L. Zhuo, M. Chen, and K. Wang, “Tongue image classification based on Universum SVM,” in Proceedings of the 3rd International Conference on BioMedical Engineering and Informatics (BMEI ’10), pp. –, IEEE, Yantai, China, October .
[] B. Zhang, X. Wang, J. You, and D. Zhang, “Tongue color analysis for medical application,” Evidence-Based Complementary and Alternative Medicine, vol. , Article ID ,  pages, .
[] T. Obafemi-Ajayi, R. Kanawong, D. Xu, S. Li, and Y. Duan, “Features for automated tongue image shape classification,” in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW ’12), pp. –, October .
[] C. H. Li and P. C. Yuen, “Tongue image matching using color content,” Pattern Recognition, vol. , no. , pp. –, .
[] Y.-G. Wang, J. Yang, Y. Zhou, and Y.-Z. Wang, “Region partition and feature matching based color recognition of tongue image,” Pattern Recognition Letters, vol. , no. , pp. –, .
[] Q. Li and Z. Liu, “Tongue color analysis and discrimination based on hyperspectral images,” Computerized Medical Imaging and Graphics, vol. , no. , pp. –, .
[] X. Wang, B. Zhang, Z. Yang, H. Wang, and D. Zhang, “Statistical analysis of tongue images for feature extraction and diagnostics,” IEEE Transactions on Image Processing, vol. , no. , pp. –, .
[] B. Zhang and H. Zhang, “Significant geometry features in tongue image analysis,” Evidence-Based Complementary and Alternative Medicine, vol. , Article ID ,  pages, .
[] G. Cao, J. Ding, Y. Duan, L. Tu, J. Xu, and D. Xu, “Classification of tongue images based on doublet and color space dictionary,” in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM ’16), Shenzhen, China, December .
[] L. Zhang, L. Yang, and T. Luo, “Unified saliency detection model using color and texture features,” PLoS ONE, vol. , no. , Article ID e, .
[] L. Zhang, Y. Sun, T. Luo, and M. M. Rahman, “Note: a manifold ranking based saliency detection method for camera,” Review of Scientific Instruments, vol. , no. , Article ID , .
[] R. Kanawong, T. Obafemi-Ajayi, J. Yu, D. Xu, S. Li, and Y. Duan, “ZHENG classification in Traditional Chinese Medicine based on modified specular-free tongue images,” in Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW ’12), pp. –, IEEE, Philadelphia, Pa, USA, October .
[] Z. Guo, “Tongue image matching using color and texture,” in Proceedings of the International Conference on Medical Biometrics (ICMB ’08), pp. –, Hong Kong, .
[] B. Pang, D. Zhang, and K. Wang, “Tongue image analysis for appendicitis diagnosis,” Information Sciences, vol. , no. , pp. –, .
[] X. Zhang, J. Zhang, G. Hu, and Y. Wang, “Preliminary study of tongue image classification based on multi-label learning,” in Advanced Intelligent Computing Theories and Applications, vol.  of Lecture Notes in Computer Science, pp. –, Springer, .
[] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: a review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. , no. , pp. –, .
[] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS ’12), December .
[] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, “DeepFace: closing the gap to human-level performance in face verification,” in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR ’14), pp. –, June .
[] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. , no. , pp. –, .
[] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), pp. –, IEEE, San Diego, Calif, USA, June .
[] F. Wang, W. Zuo, L. Zhang, D. Meng, and D. Zhang, “A kernel classification framework for metric learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. , no. , pp. –, .
[] J. Ding, G. Cao, and D. Meng, “Classification of tongue images based on doublet SVM,” in Proceedings of the International Symposium on System and Software Reliability (ISSSR ’06), pp. –, Shanghai, China, October .
[] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, “PCANet: a simple deep learning baseline for image classification?” IEEE Transactions on Image Processing, vol. , no. , pp. –, .
[] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” Computing Science, vol. , no. , pp. –, .
[] J. W. Huang and C. Yuan, “Weighted-PCANet for face recognition,” in Neural Information Processing, vol.  of Lecture Notes in Computer Science, pp. –, Springer, Berlin, Germany, .
[] S. Wang, L. Chen, Z. Zhou, X. Sun, and J. Dong, “Human fall detection in surveillance video based on PCANet,” Multimedia Tools and Applications, vol. , no. , pp. –, .
[] Z. Huang, W. Xue, Q. Mao, and Y. Zhan, “Unsupervised domain adaptation for speech emotion recognition using PCANet,” Multimedia Tools and Applications, pp. –, .
[] T. Ojala, M. Pietikäinen, and T. Mäenpää, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. , no. , pp. –, .
[] R. Kanawong, T. Obafemi-Ajayi, T. Ma, D. Xu, S. Li, and Y. Duan, “Automated tongue feature extraction for ZHENG classification in Traditional Chinese Medicine,” Evidence-Based Complementary and Alternative Medicine, vol. , Article ID ,  pages, .
[] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: a library for large linear classification,” Journal of Machine Learning Research, vol. , pp. –, .
[] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, John Wiley & Sons, Inc., New York, NY, USA, .
[] F. Shen, C. Shen, X. Zhou, Y. Yang, and H. T. Shen, “Face image classification by pooling raw features,” Pattern Recognition, vol. , pp. –, .
[] C. C. Chang and C. J. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. , no. , pp. –, .
... With the assistance of artificial intelligence (AI), tongue diagnosis will be objective and people without medical knowledge can give themselves a preliminary diagnosis of a health condition. In recent years, much effort has been spent on AI-based tongue diagnosis, especially in the field of tongue color recognition [5,6], tongue shape analysis [7], cracks segmentation [8], thickness, and moisture of tongue coating classification [9,10]. ...
... Swin-Transformer Encoder Z 10,11 Swin-Transformer Encoder Z [4][5][6][7][8][9] Swin-Transformer Encoder Z 2,3 Pre-trained Position Embedding Figure 4: Architecture of tongue segmentor. e tongue image is divided into several patches and then added with position embeddings to retain spatial information. ...
Article
Full-text available
Tongue diagnosis is a convenient and noninvasive clinical practice of traditional Chinese medicine (TCM), having existed for thousands of years. Prickle, as an essential indicator in TCM, appears as a large number of red thorns protruding from the tongue. The term “prickly tongue” has been used to describe the flow of qi and blood in TCM and assess the conditions of disease as well as the health status of subhealthy people. Different location and density of prickles indicate different symptoms. As proved by modern medical research, the prickles originate in the fungiform papillae, which are enlarged and protrude to form spikes like awn. Prickle recognition, however, is subjective, burdensome, and susceptible to external factors. To solve this issue, an end-to-end prickle detection workflow based on deep learning is proposed. First, raw tongue images are fed into the Swin Transformer to remove interference information. Then, segmented tongues are partitioned into four areas: root, center, tip, and margin. We manually labeled the prickles on 224 tongue images with the assistance of an OpenCV spot detector. After training on the labeled dataset, the super-resolutionfaster-RCNN extracts advanced tongue features and predicts the bounding box of each single prickle. We show the synergy of deep learning and TCM by achieving a 92.42% recall, which is 2.52% higher than the previous work. This work provides a quantitative perspective for symptoms and disease diagnosis according to tongue characteristics. Furthermore, it is convenient to transfer this portable model to detect petechiae or tooth-marks on tongue images.
... Their result shows that their method is more practical and accurate than the traditional methods. Meng et al. [130] propose a feature extraction framework called constrained high dispersal neural network (CHDNet) to extract unbiased features and reduce human labor for tongue diagnosis. High dispersal and local response normalization operation are introduced to address the issue of redundancy. ...
Article
Full-text available
Traditional Chinese Medicine (TCM), as an effective alternative medicine, utilizes tongue diagnosis as a major method to assess the patient’s health status by examining the tongue’s color, shape, and texture. Tongue images can also give the pre-disease indications without any significant disease symptoms, which provides a basis for preventive medicine and lifestyle adjustment. However, traditional tongue diagnosis has limitations, as the process may be subjective and inconsistent. Hence, computer-aided tongue diagnoses have a great potential to provide more consistent and objective health assessments. This paper reviewed the current trends in TCM tongue diagnosis, including tongue image acquisition hardware, tongue segmentation, feature extraction, color correction, tongue classification, and tongue diagnosis system. We also present a case of TCM constitution classification based on tongue images.
... Firstly, they made use of massive dataset containing 1548 tongue images captured by several tools. Meng et al. [16] presented an innovative feature extraction module termed CHDNet to extract unbiased features and reduce the human effort upon tongue diagnosis in TCM. CNN model is known to have primarily concentrated on learning convolution filter and adjusting the weights amongst itself. ...
Article
The rapid development of biomedical imaging modalities led to its wide application in disease diagnosis. Tongue-based diagnostic procedures are proficient and non-invasive in nature to carry out secondary diagnostic processes ubiquitously. Traditionally, physicians examine the characteristics of tongue prior to decision-making. In this scenario, to get rid of qualitative aspects, tongue images can be quantitatively inspected for which a new disease diagnosis model is proposed. This model can reduce the physical harm made to the patients. Several tongue image analytical methodologies have been proposed earlier. However, there is a need exists to design an intelligent Deep Learning (DL) based disease diagnosis model. With this motivation, the current research article designs an Intelligent DL-based Disease Diagnosis method using Biomedical Tongue Images called IDLDD-BTI model. The proposed IDLDD-BTI model incorporates Fuzzy-based Adaptive Median Filtering (FADM) technique for noise removal process. Besides, SqueezeNet model is employed as a feature extractor in which the hyperparameters of SqueezeNet are tuned using Oppositional Glowworm Swarm Optimization (OGSO) algorithm. At last, Weighted Extreme Learning Machine (WELM) classifier is applied to allocate proper class labels for input tongue color images. The design of OGSO algorithm for SqueezeNet model shows the novelty of the work. To assess the enhanced diagnostic performance of the presented IDLDD-BTI technique, a series of simulations was conducted on benchmark dataset and the results were examined in terms of several measures. The resultant experimental values highlighted the supremacy of IDLDD-BTI model over other state-of-the-art methods.
... Machine learning is increasingly being used to overcome this basic problem of representation, prediction, and treatment selection in the branch of medicine (Schultebraucks et al., 2019). The advantage of machine learning is that it analyzes various data types and integrates them into the research in disease risks, diagnosis, prognosis, and appropriate treatment (Kee et al., 2019), such as disease risk prediction (Poplin et al., 2018), tongue diagnosis (Meng et al., 2017;Dai and Wang, 2018), medication rule analysis (You et al., 2019), and prediction of the risk of medicine-induced injury (Saini et al., 2018). Deep Learning (DL) uses a multilayer neural network structure to decompose complex mappings into a series of nested simple mappings and extracts from local features to overall features layer by layer to solve complex problems. ...
Article
Full-text available
Purpose: This study proposes an S-TextBLCNN model for the efficacy of traditional Chinese medicine (TCM) formula classification. This model uses deep learning to analyze the relationship between herb efficacy and formula efficacy, which is helpful in further exploring the internal rules of formula combination. Methods: First, for the TCM herbs extracted from Chinese Pharmacopoeia , natural language processing (NLP) is used to learn and realize the quantitative expression of different TCM herbs. Three features of herb name, herb properties, and herb efficacy are selected to encode herbs and to construct formula-vector and herb-vector. Then, based on 2,664 formulae for stroke collected in TCM literature and 19 formula efficacy categories extracted from Yifang Jijie , an improved deep learning model TextBLCNN consists of a bidirectional long short-term memory (Bi-LSTM) neural network and a convolutional neural network (CNN) is proposed. Based on 19 formula efficacy categories, binary classifiers are established to classify the TCM formulae. Finally, aiming at the imbalance problem of formula data, the over-sampling method SMOTE is used to solve it and the S-TextBLCNN model is proposed. Results: The formula-vector composed of herb efficacy has the best effect on the classification model, so it can be inferred that there is a strong relationship between herb efficacy and formula efficacy. The TextBLCNN model has an accuracy of 0.858 and an F 1 -score of 0.762, both higher than the logistic regression (acc = 0.561, F 1 -score = 0.567), SVM (acc = 0.703, F 1 -score = 0.591), LSTM (acc = 0.723, F 1 -score = 0.621), and TextCNN (acc = 0.745, F 1 -score = 0.644) models. In addition, the over-sampling method SMOTE is used in our model to tackle data imbalance, and the F 1 -score is greatly improved by an average of 47.1% in 19 models. 
Conclusion: The combination of formula feature representation and the S-TextBLCNN model improve the accuracy in formula efficacy classification. It provides a new research idea for the study of TCM formula compatibility.
... Later, they utilize ResNet34 CNN framework for extracting features and performs classification. Meng et al. [27] proposed a new feature extraction architecture named CHDNet for extracting unbiased features and decrease human labor to diagnoses tongue in TCM. Prior CNN modules are mainly concentrated on learning convolution filters and adapting weights among themselves, however, these modules contain 2 main problems: redundancy and inadequate ability in managing unbalanced sample distribution. ...
Article
Full-text available
In recent times, internet of things (IoT) and wireless communication techniques become widely used in healthcare sector. Biomedical image processing is commonly employed to detect the existence of diseases using biomedical images. Tongue diagnosis is an efficient, non-invasive model to perform auxiliary diagnosis any time anywhere that is support the global necessity in the primary healthcare system. Conventionally, medical practitioners investigate the tongue features based on their expert’s knowledge comes from experience. In order to eradicate the qualitative aspects, tongue images can be quantitatively examined, offering an effective disease diagnostic process in such a way that the physical harm of the patients can be minimized. Numerous tongue image analysis approaches exist in the literature, it is required to develop automated deep learning (DL) models to diagnose the diseases using tongue image analysis. In this view, this paper designs an automated IoT and synergic deep learning based tongue color image (ASDL-TCI) analysis model for disease diagnosis and classification. The proposed ASDL-TCI model operates on major stages namely data acquisition, pre-processing, feature extraction, classification, and parameter optimization. Primarily, the IoT devices are used to capture the human tongue images and transmitted to the cloud for further analysis. In addition, median filtering based image pre-processing and SDL based feature extraction techniques are employed. Moreover, deep neural network (DNN) based classifier is applied to determine the existence of the diseases. Lastly, enhanced black widow optimization (EBWO) based parameter tuning process takes place to enhance the diagnostic performance. For assessing the effectual performance of the ASDL-TCI model, a set of simulations take place on benchmark tongue images and examined the results under distinct dimensions. 
The simulation outcome verified the enhanced diagnostic performance of the ASDL-TCI model over the compared methods with the maximum precision, recall, and accuracy of 0.984, 0.973, and 0.983.
... However, traditional tongue diagnosis is affected by factors such as the external environment and doctors' subjective clinical experience. Computerized tongue diagnosis systems are gradually being accepted by an increasing number of clinicians as a medical application for the health assessment and diagnosis of diseases, such as type-2 diabetes mellitus [4][5][6][7], breast cancer [8], colorectal cancer [9], appendicitis [10], and gastritis [11]. ...
Article
Full-text available
Background Tongue diagnosis is an important research field of TCM diagnostic technology modernization. The quality of tongue images is the basis for constructing a standard dataset in the field of tongue diagnosis. To establish a standard tongue image database in the TCM industry, we need to evaluate the quality of a massive number of tongue images and add qualified images to the database. Therefore, an automatic, efficient and accurate quality control model is of significance to the development of intelligent tongue diagnosis technology for TCM. Methods Machine learning methods, including Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Adaptive Boosting Algorithm (Adaboost), Naïve Bayes, Decision Tree (DT), Residual Neural Network (ResNet), Convolution Neural Network developed by Visual Geometry Group at University of Oxford (VGG), and Densely Connected Convolutional Networks (DenseNet), were utilized to identify good-quality and poor-quality tongue images. Their performances were made comparisons by using metrics such as accuracy, precision, recall, and F1-Score. Results The experimental results showed that the accuracy of the three deep learning models was more than 96%, and the accuracy of ResNet-152 and DenseNet-169 was more than 98%. The model ResNet-152 obtained accuracy of 99.04%, precision of 99.05%, recall of 99.04%, and F1-score of 99.05%. The performances were better than performances of other eight models. The eight models are VGG-16, DenseNet-169, SVM, RF, GBDT, Adaboost, Naïve Bayes, and DT. ResNet-152 was selected as quality-screening model for tongue IQA. Conclusions Our research findings demonstrate various CNN models in the decision-making process for the selection of tongue image quality assessment and indicate that applying deep learning methods, specifically deep CNNs, to evaluate poor-quality tongue images is feasible.
... In the 1970s, AI was applied in the field of TCM diagnosis [5], which provided a rare opportunity for the development of objective and modernized TCM diagnosis, but the problems of logical reasoning and objective quantification were not well solved and its development speed was slow [6]. In recent years, thanks to the rapid development of microsensors [7], computer image analysis, speech recognition technology [8], and deep learning [9,10], the programmatic innovation of TCM has been accelerated and a milestone progress has been made in the standardization and normalization of TCM diagnosis [11]. e purpose of this paper is to review the application and development of AI in assisting TCM diagnosis and to analyze the current development bottlenecks and future development directions of AI-assisted TCM diagnosis. ...
Article
Full-text available
As an emerging comprehensive discipline, artificial intelligence (AI) has been widely applied in various fields, including traditional Chinese medicine (TCM), a treasure of the Chinese nation. Realizing the organic combination of AI and TCM can promote the inheritance and development of TCM. The paper summarizes the development and application of AI in auxiliary TCM diagnosis, analyzes the bottleneck of artificial intelligence in the field of auxiliary TCM diagnosis at present, and proposes a possible future direction of its development.
Article
Objective: To study the current situation of the application of artificial intelligence in Chinese medicine diagnosis. Methods: In the past ten years, the Chinese databases China Knowledge Network, Wanfang database, and English databases Pub Med, web of science, Science Direct, and Google scholars were used to study the application of artificial intelligence in Chinese medicine diagnosis with the theme words or keywords artificial intelligence, machine learning, deep learning, Chinese medicine diagnosis, Chinese medicine diagnosis, and Chinese medicine diagnosis respectively. Machine learning, deep learning, TCM diagnosis, four diagnoses. Intelligent diagnosis, intelligent Chinese medicine, and so on, to filter out the literature related to intelligent Chinese medicine diagnosis, and categorize them by looking, intelligent Chinese medicine diagnosis by smelling, by asking, by cutting, and so on, and conduct more literature related to intelligent diagnosis of TCM was categorized into the intelligent diagnosis of TCM viewing, intelligent diagnosis of TCM smelling, intelligent diagnosis of TCM questioning and intelligent diagnosis of TCM cutting, and analyzed in depth. Results: The Chinese and English literature on the intelligent research of the four diagnoses of TCM in the past ten years was summarized, and the research hotspots in this field were analyzed in-depth on this basis. Conclusion: Artificial intelligence technology is widely used in TCM clinical diagnosis, and the application of artificial intelligence technology makes TCM diagnosis technology more accurate and can effectively help modernize and standardize TCM research.
Article
Ethnopharmacological relevance: Tongue coating has been used as an effective signature of health in traditional Chinese medicine (TCM). The level of greasy coating closely relates to the strength of dampness or pathogenic qi in TCM theory. Previous empirical studies and our systematic review have shown the relation between greasy coating and various diseases, including gastroenteropathy, coronary heart disease, and coronavirus disease 2019 (COVID-19). However, objective and intelligent methods for recognizing greasy coating and related diseases are still lacking. The construction of artificial intelligence tongue recognition models may provide important methods for syndrome diagnosis and efficacy evaluation, and contribute to the understanding of ethnopharmacological mechanisms based on TCM theory. Aim of the study: The present study aimed to develop an artificial intelligence model for greasy tongue coating recognition and explore its application in COVID-19. Materials and methods: We developed greasy tongue coating recognition networks (GreasyCoatNet) using a convolutional neural network and a relatively large set (N = 1486) of tongue images from standard devices. Tests were performed using both cross-validation procedures and a new dataset (N = 50) captured by common cameras. The accuracy and time efficiency of GreasyCoatNet were also compared with those of doctors. Finally, the model was transferred to recognize the greasy coating level in COVID-19. Results: The overall accuracy in 3-level greasy coating classification was 88.8% with cross-validation and 82.0% on the new dataset, indicating that GreasyCoatNet can obtain robust greasy coating estimates from diverse datasets. In addition, a user study confirmed that GreasyCoatNet outperforms TCM practitioners while consuming only roughly 1% of the doctors' examination time. Critically, we demonstrated that GreasyCoatNet, combined with transfer learning, yields a more suitable classifier for COVID-19 than a classifier trained directly on patient versus control datasets. We therefore derived a disease-specific deep learning network by fine-tuning the generic GreasyCoatNet. Conclusions: Our framework may provide an important research paradigm for differentiating tongue characteristics, diagnosing TCM syndromes, tracking disease progression, and evaluating intervention efficacy, exhibiting its unique potential in clinical applications.
Article
Background and Objective: The modernization of tongue diagnosis is an important research topic in Traditional Chinese Medicine. An accurate and practical tongue segmentation method is a prerequisite for subsequent analyses. In this paper, an unsupervised tongue segmentation method is proposed based on an improved gPb-owt-ucm algorithm, where gPb-owt-ucm is short for globalized probability of boundary, oriented watershed transform, and ultrametric contour map. Methods: The improved gPb-owt-ucm algorithm is adopted because of its powerful contour detection capabilities. The boundary likelihood of each pixel is calculated from the pixel weights, and the result is converted into multiple closed regions and a hierarchical tree. Finally, the accurate tongue boundary is located with a rectangular slider to produce the final tongue segmentation. Two experiments are designed to evaluate its effectiveness by comparison with the snake method. Results: 300 tongue images were tested (150 from diabetic patients and 150 from healthy subjects) in two experiments. The first validates boundary detection performance (CBDR experiment). The second validates classification performance (CCE experiment) between diabetic and healthy tongues. In the CBDR experiment, the IoU obtained using our improved gPb-owt-ucm method is 0.72±0.19, which is better than that of the snake method. In the CCE experiment, the precision and F1-score obtained using our method are 1.0 and 0.97, respectively, on the diabetic data, and 0.94 and 0.97 on the healthy data. Conclusion: The effectiveness of our improved unsupervised gPb-owt-ucm method is validated by comparison with the snake method. In the future, we plan to combine the proposed method with a supervised method to further improve tongue segmentation.
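The metrics reported in the abstract above (IoU for boundary detection, precision and F1-score for classification) can be illustrated with a minimal Python sketch. It is not the cited authors' code: the function names `iou` and `precision_f1`, the nested-list mask representation, and the toy inputs are assumptions made only for illustration.

```python
def iou(pred_mask, true_mask):
    """Intersection over union of two binary masks (nested lists of 0/1).

    Assumes both masks have the same shape; returns 1.0 when both are empty.
    """
    inter = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                inter += 1  # pixel labelled tongue in both masks
            if p or t:
                union += 1  # pixel labelled tongue in at least one mask
    return inter / union if union else 1.0


def precision_f1(tp, fp, fn):
    """Precision and F1-score from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, f1


# Hypothetical 2x2 masks: one shared pixel, three pixels in the union.
predicted = [[1, 1], [0, 0]]
reference = [[1, 0], [1, 0]]
print(iou(predicted, reference))

# Hypothetical counts: no false positives, two missed positives.
print(precision_f1(tp=8, fp=0, fn=2))
```

A precision of 1.0 with an F1-score below 1.0, as in the diabetic-class result quoted above, means there were no false positives but some positives were missed, which is the situation the second toy call reproduces.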
Article
Research in emotion recognition seeks to develop insights into the variances of features of emotion in one common domain. However, automatic emotion recognition from speech is challenging when training data and test data are drawn from different domains due to different recording conditions, languages, speakers and many other factors. In this paper, we propose a novel feature transfer approach with PCANet (a deep network), which extracts both the domain-shared and the domain-specific latent features to improve performance. The approach learns multiple intermediate feature representations along an interpolating path between the source and target domains using PCANet, taking into account the distribution shift between the two domains, and then aligns the other feature representations on the path with the target subspace so that they change in the right direction towards the target. To demonstrate the effectiveness of our approach, we select the INTERSPEECH 2009 Emotion Challenge's FAU Aibo Emotion Corpus as the target database and two public databases (ABC and Emo-DB) as the source sets. Experimental results demonstrate that the proposed feature transfer learning method outperforms conventional machine learning methods and other transfer learning methods.
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
Research on salient object regions in natural scenes has attracted a lot of attention in computer vision and has been widely used in applications such as object detection and segmentation. However, accurately focusing on the salient region while photographing real-world scenery is still a challenging task. To deal with this problem, this paper presents a novel approach based on the human visual system, which works better by using both a background prior and a compactness prior. In the proposed method, we eliminate unsuitable boundaries with a fixed threshold to optimize the image boundary selection, which can provide more precise estimations. Then, the object detection, which is optimized with the compactness prior, is obtained by ranking with background queries. Salient objects are generally grouped together into connected areas that have compact spatial distributions. The experimental results on three public datasets demonstrate that the precision and robustness of the proposed algorithm are clearly improved.
Conference Paper
Weighted-PCANet, a novel feature learning method, is proposed for face recognition by combining the Linear Regression Classification (LRC) model with the PCANet construction. The sample-specific hat matrix is used to handle different images in the feature extraction stage. After appropriate adaptation, the new model outperforms various mainstream methods, including PCANet, for face recognition on the Extended YaleB dataset. In particular, various experiments confirm the robustness of Weighted-PCANet when dealing with fewer training samples or corrupted data.
Conference Paper
Tongue diagnosis characterization is a key research issue in the development of Traditional Chinese Medicine (TCM). Many kinds of information, such as tongue body color, coat color and coat thickness, are reflected in a tongue image. That is, tongue images are multi-label data. However, traditional supervised learning is designed to model single-label data. In this paper, multi-label learning is applied to tongue image classification. Color features and texture features are extracted after separation of the tongue coat and body, and multi-label learning algorithms are used for classification. Results showed that LEAD (Multi-Label Learning by Exploiting Label Dependency), a multi-label learning algorithm that exploits correlations among labels, is superior to the other multi-label algorithms. Finally, an iterative algorithm is used to set an optimal threshold for each label to improve the results of LEAD. This paper thus provides an effective approach to computer-aided TCM diagnosis.