A Reinforcement Learning Framework
for Medical Image Segmentation
Farhang Sahba, Member, IEEE, Hamid R. Tizhoosh, and Magdy M.A. Salama, Fellow, IEEE
Abstract— This paper introduces a new method for medical image segmentation based on a reinforcement learning scheme. We use this novel idea as an effective way to find appropriate local thresholding and structuring-element values and segment the prostate in ultrasound images. The reinforcement learning agent uses an ultrasound image and its manually segmented version, and takes actions (i.e., different thresholding and structuring-element values) to change its environment (the quality of the segmented image). The agent is provided with a scalar reinforcement signal that is determined objectively. The agent uses these objective rewards and punishments to explore and exploit the solution space. The values obtained in this way serve as valuable knowledge to fill a Q-matrix, which the reinforcement learning agent can then apply to similar ultrasound images as well. The results demonstrate the high potential of reinforcement learning for medical image segmentation.
I. INTRODUCTION
Many applications in medical imaging require segmenting an object in the image [1]. Ultrasound imaging is an important modality for clinical applications, and the accurate detection of the prostate boundary in ultrasound images is crucial for diagnostic tasks [2]. However, the contrast in these images is usually low and the boundaries between the prostate and the background are fuzzy. Speckle and weak edges also make ultrasound images inherently difficult to segment. The prostate boundaries are generally extracted from transrectal ultrasound (TRUS) images [2]. Prostate segmentation methods generally have limitations when shadows with gray level and texture similar to the prostate are attached to it, and/or when boundary segments are missing; in these cases the segmentation error may increase considerably. Another obstacle may be the lack of a sufficient number of training (gold) samples when a learning technique is employed and the samples must be prepared by an expert, as in supervised methods. Algorithms based on active contours have been implemented quite successfully, with the major drawback that they depend on user interaction to determine the initial snake. A more universal approach should therefore require a minimal level of user interaction and a minimal training data set.
Farhang Sahba is with the Pattern Analysis and Machine Intelligence Laboratory, Department of Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada (e-mail: fsahba@uwaterloo.ca).
Hamid R. Tizhoosh is with the Pattern Analysis and Machine Intelligence Laboratory, Department of Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada (e-mail: tizhoosh@uwaterloo.ca).
Magdy M.A. Salama is with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada (e-mail: msalama@hivolt.uwaterloo.ca).
Considering the above factors, we introduce a new algorithm based on reinforcement learning (RL) to locally segment the prostate in ultrasound images. The most important concept of RL is learning by trial and error through interaction with the environment [3], [4], which makes an RL agent suitable for dynamic environments. Its goal is to find an action policy that controls the behavior of the dynamic process, guided by signals (reinforcements) that indicate how well it is performing the required task.
When this method is applied to medical image segmentation, the agent takes actions (i.e., different values for the threshold and for the structuring element of a morphological operator) to change its environment (the quality of the segmented object). The states are likewise defined based on the quality of this segmented object. First, the agent takes the image and applies some parameter values. It then receives an objective reward or punishment based on a comparison of its result with the goal image. The agent learns which actions gain the highest reward. After this stage, based on the accumulated rewards, the agent has appropriate knowledge for similar images as well.
In our algorithm we use this reinforced local parameter adjustment to segment the prostate. The proposed method controls the local threshold and the post-processing parameter using a reinforcement learning agent. The main purpose of this work is to demonstrate that, as an intelligent technique, reinforcement learning can be trained with a very limited number of samples and can also gain extra knowledge during online training. This is a major advantage over other approaches (such as supervised methods), which need either a large training set or a significant amount of expert or a priori knowledge.
This paper is organized as follows: Section II is a short introduction to reinforcement learning. Section III describes the proposed method. Section IV presents the results, and Section V concludes the work.
II. REINFORCEMENT LEARNING
Reinforcement learning (RL) is based on the idea that an
artificial agent learns by interacting with its environment
[3], [4]. It allows agents to automatically determine the
ideal behavior within a specific context that maximizes
performance with respect to predefined measures. Several
components constitute the general idea behind reinforcement
learning. The RL agent is the decision maker of the process and attempts to take actions that are recognized by the environment. It receives a reward or punishment from its environment depending on the action taken. The RL agent discovers which
actions bring more reward using exploration and exploitation.
The agent also receives information concerning the state of the environment. At the beginning of the learning process, the RL agent has no knowledge of how promising the different actions are [3]. It takes various actions and observes the results. After a while, the agent has discovered the actions that bring the highest reward and gradually begins to exploit them. In fact, the agent acquires knowledge of the actions and eventually learns to perform those that are the most rewarding. During this process it tries to meet a certain goal relating to the state of the environment. The reward and punishment can be defined objectively, when they are computed by a function, or subjectively, when they are given to the agent by an experienced operator.
The action policy π is the strategy used by the agent to select an action to change the current state. The agent must make a trade-off between immediate and long-term return: it must explore unseen states as well as maximize its return by choosing what it already knows. Therefore, there must be a balance between the exploration of unseen states and the exploitation of familiar (rewarding) states.
A reinforcement learning agent learns online and can continuously learn and adapt while performing the required task. This behavior is useful in many settings, such as medical imaging, where precise learning samples are difficult or impossible to obtain [3], [7].
The design of an RL agent is based on the definition of the problem at hand. Figures 1(a) and 1(b) show the general components of reinforcement learning and the model used in our proposed approach, respectively. The agent, which is the decision maker of the process, observes the state of the environment. It then takes an action based on its former experience associated with the current observation and the accumulated reinforcement (reward/punishment). Finally, the agent receives a reward or punishment from its environment depending on the action taken.
Q-learning, a popular technique proposed by Watkins in 1989, is an iterative method for action policy learning [5], [6]. This off-policy method is one of the most commonly used RL methods in temporal-difference learning [4]. The Boltzmann policy is frequently used to estimate the probability of taking each action a in a given state s. The probability used in this policy is calculated as follows [3]:
$$p(a) = \frac{e^{Q(s,a)/\theta}}{\sum_{a'} e^{Q(s,a')/\theta}}. \qquad (1)$$
In this equation θ is the temperature. It is initialized with a high value and decreases as the number of iterations increases. There are also other policies for Q-learning, such as ε-greedy and greedy. For some applications ε-greedy performs better than greedy, because the greedy policy does not explore all actions, while ε-greedy selects the action with the highest Q-value in a given state with probability 1 − ε and selects the other actions with probability ε. Considering action a_t taken when visiting state s_t and following an action policy such as Boltzmann exploration, the Q-learning algorithm can be defined as given in Table I.
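As an illustration, a minimal sketch of Boltzmann action selection over a tabular Q-matrix, following Eq. (1), is given below; the Q-matrix layout (rows indexed by state, columns by action) and the use of NumPy are our assumptions, not details taken from the paper:

```python
import numpy as np

def boltzmann_action(Q, s, theta):
    """Sample an action with probability proportional to exp(Q(s,a)/theta), Eq. (1)."""
    prefs = Q[s] / theta
    prefs = prefs - prefs.max()        # subtract the max for numerical stability
    p = np.exp(prefs)
    p = p / p.sum()
    return np.random.choice(len(p), p=p)
```

As noted above, θ starts high (near-uniform exploration) and is decreased over the iterations so that the policy gradually concentrates on the most rewarding actions.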
TABLE I
Q-LEARNING ALGORITHM [3].

    Initialize Q(s, a) randomly
    Repeat (for each episode):
        Initialize s
        Repeat (for each step of the episode):
            Choose a from s using the policy π derived from Q
            Take action a; observe reward r and next state s'
            Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
            s ← s'
        until s is terminal
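A compact sketch of the update loop in Table I is shown below; the environment interface (`reset`/`step`) and the learning-rate and discount values are illustrative assumptions, not part of the paper:

```python
def q_learning_episode(Q, env, select_action, alpha=0.1, gamma=0.9):
    """Run one episode of the tabular Q-learning of Table I.

    `env.reset()` -> initial state; `env.step(s, a)` -> (r, s_next, done).
    This interface is assumed here for illustration only.
    """
    s = env.reset()
    done = False
    while not done:
        a = select_action(Q, s)            # e.g., Boltzmann or epsilon-greedy
        r, s_next, done = env.step(s, a)
        # temporal-difference update toward r + gamma * max_a' Q(s', a')
        target = r if done else r + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q
```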
Fig. 1. (a) A general model of a reinforcement learning agent; (b) the model used in the proposed approach.
III. PROPOSED APPROACH
Reinforcement learning has already been used for some
other image processing applications [7], [8], [9], [10]. In this
paper, we show that it enables us to implement the task of
prostate segmentation in a new way.
In our proposed approach we divide the ultrasound image into several sub-images and use two main stages to locally segment the objects of interest. First, we threshold the sub-images using local values. Due to disturbing factors such as speckle and low contrast, many artifacts usually remain after thresholding. Therefore, in a second stage, we use morphological opening to locally post-process each thresholded sub-image. The reinforcement learning agent determines the local thresholding value and the size of the structuring element for each individual sub-image.
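The two-stage local operation can be sketched as follows; the square structuring element and the even grid slicing are our assumptions (the paper does not state the element's shape):

```python
import numpy as np
from scipy import ndimage

def segment_subimage(sub, threshold, se_size):
    """Stage 1: threshold the sub-image; stage 2: clean it with morphological opening."""
    binary = sub > threshold
    if se_size > 0:
        structure = np.ones((se_size, se_size), dtype=bool)   # assumed square element
        binary = ndimage.binary_opening(binary, structure=structure)
    return binary

def segment_image(image, thresholds, se_sizes, m_s, n_s):
    """Apply per-sub-image parameters over an M_S x N_S grid.

    `thresholds` and `se_sizes` are (m_s, n_s) arrays of local parameters.
    """
    h, w = image.shape
    out = np.zeros(image.shape, dtype=bool)
    for i in range(m_s):
        for j in range(n_s):
            rows = slice(i * h // m_s, (i + 1) * h // m_s)
            cols = slice(j * w // n_s, (j + 1) * w // n_s)
            out[rows, cols] = segment_subimage(
                image[rows, cols], thresholds[i, j], se_sizes[i, j])
    return out
```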
To construct the RL agent, three components must be defined: states, actions, and reward. The Q-matrix can then be constructed according to the definitions of the states and actions.
The RL agent starts its work using an ultrasound image and its manually segmented version. The agent works on each sub-image and, using the gold standard (obtained from the manually segmented version), explores the solution space for that sub-image. During this time the RL agent changes the local thresholding value and the size of the structuring element for each sub-image individually. For each action taken, the agent receives the corresponding reward or punishment for that state–action pair and updates the corresponding entry in the Q-matrix. After this process the agent has explored many actions and tries to exploit the most rewarding ones. This method is especially useful for prostate ultrasound images, where several images from the same patient share inherently the same characteristics. In such a case, instead of adjusting the parameters for each individual input image or using a large training data set to cover all possible cases, we can use a few of the images and apply the acquired knowledge to segment the others. It is also possible to gain extra knowledge during online training, when the agent segments new images.
Figures 2(a) and (b) illustrate a prostate ultrasound image and its manually segmented version. They can be employed as reference images from which the RL agent gains its knowledge.
Fig. 2. (a) Original ultrasound image; (b) its manually segmented version.
A. Defining the States
To define the states, the following features were considered:
1) The location of the sub-images: To segment the image locally, we divide it into M_S rows and N_S columns (M_S × N_S sub-images in total), and the RL agent works on each of them separately. The location of each sub-image is used as a state parameter.
2) Existence of parts attached to the prostate and/or missing boundary segments: Generally, prostate segmentation methods have limitations when the image contains irrelevant parts with similar gray level (usually caused by shadow) attached to the prostate, and/or missing boundary segments. When we threshold the sub-images, these attached and missing parts may be revealed as well. The presence and extent of these parts on the prostate boundary can be evaluated as a state parameter.
In our proposed algorithm we use a method that represents how much of the prostate boundary is affected by such parts. To recognize irregularities in the boundary of the segmented object (in our case, the prostate), we use the signature of its contour combined with an estimator based on the Kalman filter [11], [12].
A signature is a functional representation of a contour and can be generated by various techniques [12], [13]. In our approach we use a signature based on distance versus angle. In this method, we assume that the geometric center of the prostate in the original image is given by the user, and the distance from the points on the boundary to the geometric center of the object is represented as a 2π-periodic function. In general, one angle θ in a signature may correspond to several distances r, so the signature may have to be represented as a 2D function f(θ, r) containing the values 0 and 1. However, because we want to find the points where an irregularity starts (due to attached parts or missing boundary segments), for each angle we use the nearest corresponding contour point as the measured datum. In this way the signature can always be described as a 1D function.
Because we use the geometric center of the object shape, this representation is invariant to translation. We also normalize r to make the transformation scale invariant. Because we only need to detect abrupt changes along the signature as irregular points, the method is not sensitive to orientation either.
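A sketch of this signature computation is given below; the angle-binning resolution, and measuring θ with atan2 from the horizontal axis (the paper measures it from the vertical axis), are implementation assumptions:

```python
import numpy as np

def contour_signature(boundary_xy, center_xy, n_angles=360):
    """Distance-versus-angle signature of a closed contour.

    `boundary_xy` is an (N, 2) array of boundary coordinates. For each
    angle bin we keep the *nearest* contour point, so the signature is
    always a 1D function of the angle, as described in the paper.
    Normalizing by the mean radius makes it scale invariant.
    """
    dx = boundary_xy[:, 0] - center_xy[0]
    dy = boundary_xy[:, 1] - center_xy[1]
    r = np.hypot(dx, dy)
    theta = np.mod(np.arctan2(dy, dx), 2 * np.pi)

    bins = (theta / (2 * np.pi) * n_angles).astype(int) % n_angles
    signature = np.full(n_angles, np.nan)
    for b, radius in zip(bins, r):
        if np.isnan(signature[b]) or radius < signature[b]:
            signature[b] = radius                 # nearest point wins
    return signature / np.nanmean(signature)      # scale invariance
```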
To find the points corresponding to irregular parts we can use an estimator based on the Kalman filter [11]. We can use some properties of this filter to evaluate the data on the object signature and detect the existence of attached and/or missing parts. To implement such a technique, we formulate the problem of tracing the signature as a dynamic tracking system. In this system the data located on the signature of the segmented object are used as the input (measurement data) of the tracking filter. Using such an estimator, the Kalman filter can track the trajectory of the signature over a whole period. Each datum on the signature brings updated information for the current and future data. We model the tracing as a one-dimensional dynamic movement and take the position and velocity as the variables that describe the state of the system. Using this method we can estimate the position and, eventually, the abrupt changes on the border of the segmented object.
In our case, we have one variable for the position and one for the velocity. We represent the state variables based on the data located on the signature. Whenever we want to extract the state parameter, we use the geometric center O of the whole segmented area to define the state variables and consider the following state vector:

$$\mathbf{x} = \begin{bmatrix} r \\ \dot{r} \end{bmatrix}, \qquad (2)$$

where r is the distance between the geometric center O(x_c, y_c) and the pixels (x_p, y_p) located on the border of the prostate (the signature value). For each value of r there is a corresponding angle θ between the vertical axis and r. Hence, the following relations hold for r, ṙ and θ:

$$r = \left[ (x_p - x_c)^2 + (y_p - y_c)^2 \right]^{1/2}, \qquad (3)$$

$$\dot{r} \approx \dot{y}\cos\theta + \dot{x}\sin\theta, \qquad (4)$$

$$\theta = \tan^{-1}\!\frac{y_p - y_c}{x_p - x_c}, \qquad (5)$$
where ṙ is the radial velocity. Using the above state vector, we represent the sequential data on the signature of the detected object in terms of θ. The estimator uses a discrete dynamic model consisting of state and measurement equations:

$$\mathbf{x}_k = A_{k-1}\mathbf{x}_{k-1} + B_{k-1} W_{k-1}, \qquad (6)$$

$$Z_k = H_k \mathbf{x}_k + V_k, \qquad (7)$$

where x is the state vector of equation (2). The other components are defined as follows:

$$A_k = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix}, \qquad (8)$$

$$B_k = \begin{bmatrix} T^2/2 \\ T \end{bmatrix}, \qquad (9)$$

$$H_k = \begin{bmatrix} 1 & 0 \end{bmatrix}, \qquad (10)$$

where W_k ∼ N(0, Q_k) and V_k ∼ N(0, R_k) are the process and measurement noise, respectively. T is the interval that represents the changes in the state and measurement equations. Its value does not affect the final result, so we choose T = 1 for simplicity. R and Q are the measurement and process noise covariances, respectively. The acceleration in the radius is modeled as the zero-mean, white, Gaussian noise W, and the measurement data Z_k, calculated from the locations of the boundary pixels, are assumed to be a noisy version of the actual position.
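A minimal sketch of one predict/update cycle with the matrices of Eqs. (6)–(10) follows; the noise variances `q_var` and `r_var` are tuning assumptions, since the paper describes them only qualitatively:

```python
import numpy as np

# Constant-velocity model of Eqs. (6)-(10) with T = 1.
T = 1.0
A = np.array([[1.0, T], [0.0, 1.0]])     # state transition (Eq. 8)
B = np.array([[T**2 / 2.0], [T]])        # process-noise input (Eq. 9)
H = np.array([[1.0, 0.0]])               # measurement matrix (Eq. 10)

def kalman_step(x, P, z, q_var, r_var):
    """One predict/update cycle of the signature tracker.

    x: state [r, r_dot]; P: 2x2 state covariance; z: measured radius.
    Returns the updated state/covariance and the prediction, which is
    needed later for the association (gating) test.
    """
    # predict
    x_pred = A @ x
    Q = B @ B.T * q_var
    P_pred = A @ P @ A.T + Q
    # update with the measured signature value
    S = H @ P_pred @ H.T + r_var         # innovation covariance
    K = P_pred @ H.T / S                 # Kalman gain
    x_new = x_pred + (K * (z - H @ x_pred)).ravel()
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new, x_pred
```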
The Kalman filter starts the estimation using the signature of the segmented object in the thresholded image. In each sequential iteration, the points on the signature (corresponding to the points along the prostate border) are used as measured data, and the Kalman filter estimates the next r. The predicted values determine a point as the next one on the signature; the filter also predicts ṙ for the next iteration. In the next iteration, the new datum on the signature is the new measured datum for the filter. It is compared to the position predicted in the previous iteration. If they are sufficiently correlated, the measured datum is incorporated to update the filter state; otherwise, based on the shape of the prostate, the predicted point is used as the measured datum and the filter starts the next iteration. To measure the correlation we implement an association process between the predicted and measured data. For this association process we use an interval δr, the so-called "association interval", around the predicted point. Only the data located on the signature and inside this interval are considered valid measurements for updating the filter. For good performance, the association interval must be adaptive; that is, its size must vary. It can be changed based on the covariance of the Kalman filter so that it maximizes the presence of valid data and minimizes that of invalid data. In one-dimensional problems, it can be represented in the following form:

$$\alpha \cdot \delta r \le L, \qquad (11)$$

where L is a constant and α is the corresponding element of the Kalman filter covariance. When there is no datum inside the association interval, we still need to capture one; therefore the value of L is increased gradually. Figure 3 shows r, θ and the association interval for a sample point on the prostate border.
Fig. 3. r, θ and the association interval for a sample point on the prostate border.
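The gating logic can be sketched as below, under the reading of Eq. (11) that the valid half-width of the interval is bounded by L/α; the growth factor applied when the gate is empty is an assumed tuning value:

```python
def associate(z, r_pred, alpha, L):
    """Gate a measured signature value against the Kalman prediction.

    Accept z only if it falls inside the association interval around the
    predicted radius; the interval shrinks as the covariance element
    `alpha` grows, following the gate rule alpha * delta_r <= L (Eq. 11).
    """
    delta_r = L / alpha                # interval half-width implied by Eq. (11)
    if abs(z - r_pred) <= delta_r:
        return z, L                    # valid measurement: keep L unchanged
    # no valid datum inside the gate: fall back on the prediction and
    # gradually enlarge the gate until the true path is recaptured
    return r_pred, L * 1.5             # growth factor 1.5 is an assumed tuning value
```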
When the border reaches an attached shadow, or a missing boundary segment is encountered, there is an abrupt change in the pixels' path. These sharp changes appear to the tracking process as new paths. After a few iterations the Kalman filter detects that such cases do not belong to the true path, because the data do not correlate with the path followed so far. The association interval is then enlarged until it again captures the true data on the object signature, which have enough correlation with the predicted point. Using this association technique, the data belonging to shadows and missing segments on the prostate border are detected.
In areas where the border changes smoothly, the measurement data lie inside the association interval and the value of the measurement noise in matrix R should be small. When there is no datum inside the association gate, we cannot be sure about the validity of the measurement data, and in these cases the value of the measurement noise in R should be large. The value of the process noise in matrix Q models the small variations around the estimated point.
Figure 4(a) illustrates the irrelevant parts that may be revealed after thresholding. In this figure, the parts AB, CD and GH are attached parts and EF is a missing boundary segment. Figure 4(b) shows the points used to construct the signature and, consequently, the input of the Kalman filter. Figure 4(c) shows the result of the Kalman filter on the signature of the segmented object in part AB. The filter's estimate of the border is marked with the '×' sign.
The above process must be applied to the whole segmented object. Therefore, to find the points corresponding to the attached and missing parts, we examine the whole segmented image. When we detect potential points, we note in which sub-image they are located and proceed with the local operations.
Using this method, if an attached or missing part exists on the prostate border, we can estimate its thickness. The discretized value of this thickness is used as a parameter to define the state for the RL agent.
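For illustration, the state index can be formed by combining the sub-image location with the discretized thickness; this flat encoding, which reproduces the 12 × 9 = 108 states reported in Section IV, is our assumption:

```python
def state_id(sub_row, sub_col, thickness_level, n_cols, n_levels):
    """Encode (sub-image location, discretized thickness) as one state index.

    With M_S x N_S sub-images and `n_levels` thickness levels this yields
    M_S * N_S * n_levels states; the flat layout itself is assumed here.
    """
    sub_index = sub_row * n_cols + sub_col
    return sub_index * n_levels + thickness_level
```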
B. Defining the Actions
For each sub-image, the agent must adjust the threshold value and the size of the structuring element for the morphological opening. This can be done by increasing or decreasing the local thresholding value assigned to each sub-image: we can add or subtract a specific step ±ΔTr to increase or decrease the threshold Tr. A simpler alternative is to choose among some predefined values (T_1, T_2, ..., T_n) between the maximum and minimum gray levels in each iteration. For the morphological opening, we likewise increase or decrease the size of the structuring element within a specific interval, or choose among some predefined values (s_1, s_2, ..., s_n).
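One way to enumerate such an action set is sketched below; combining a threshold step and a structuring-element size into a single joint action, and the specific values (taken from Section IV), are assumptions about details the paper leaves open:

```python
# Illustrative action set: each action steps the local threshold by
# +/- one tenth of the sub-image gray-level range (or leaves it unchanged)
# and picks a structuring-element size for the opening.
THRESHOLD_STEPS = (-0.1, 0.0, +0.1)   # fraction of (max - min) gray level
SE_SIZES = (0, 5, 10, 20)             # structuring-element sizes (Section IV)

ACTIONS = [(dt, se) for dt in THRESHOLD_STEPS for se in SE_SIZES]

def apply_action(threshold, sub_image, action):
    """Return the new (threshold, se_size) after taking `action` on a sub-image."""
    dt, se_size = action
    gray_range = float(sub_image.max() - sub_image.min())
    return threshold + dt * gray_range, se_size
```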
C. Defining Reward/Punishment
Fig. 4. (a) Attached parts and missing boundary segments; (b) the points used to generate the data for the object signature and, consequently, for the Kalman filter; (c) the result of the Kalman estimator on the signature of the segmented object in part AB. The estimate is marked by the '×' sign.

In order to define an objective reward/punishment, we need a criterion for how well the object has been segmented in each sub-image. Several criteria can be used for this purpose. A straightforward method is to compare the results before and after the action based on the quality of the segmented objects. For each sub-image we measure how much this quality changes after the action: for a large increase in the quality of the segmented object the agent receives a high reward, for a medium increase it receives less, and for a decrease in quality it is punished:

$$\text{reward} = \begin{cases} \kappa_1 \cdot D, & D \ge 0, \\ \kappa_2 \cdot D, & D < 0, \end{cases} \qquad (12)$$

where D is the normalized difference between the quality measures before and after taking the action, determined automatically from the increase or decrease of the attached or missing parts, and κ₁ and κ₂ are constant values.
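A direct transcription of Eq. (12) might look as follows, assuming the quality measure is already normalized to [0, 1] so that D needs no further scaling:

```python
KAPPA_1 = 10.0   # reward slope for quality gains (Section IV uses 10)
KAPPA_2 = 10.0   # punishment slope for quality losses

def reward(quality_before, quality_after):
    """Objective reward of Eq. (12) from the change in sub-image quality."""
    D = quality_after - quality_before
    return (KAPPA_1 if D >= 0 else KAPPA_2) * D
```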
D. Offline Procedure and Testing
Now the system is completely designed and can start with a reference image and its segmented version. The states and actions are those designed in Sections III-A and III-B, respectively. The perfect output image is available from the manually segmented version. For the reward/punishment function we use equation (12), but as the quality measure of each sub-image we calculate how much the similarity with the perfect output image has changed after the action was taken. To measure this similarity, we calculate the percentage of pixels that are the same in the perfect output image and in the image segmented by the RL agent.
During this procedure, the system must explore the parameter space. This can be achieved using the Boltzmann policy with a high temperature, or an ε-greedy policy. After a sufficiently large number of iterations, the Q-matrix is filled with appropriate values; that is, the agent can estimate the best action for each given state. We can then use the system on new samples. The agent must find the appropriate thresholding and post-processing parameter (the size of the structuring element) for each sub-image such that the prostate is correctly segmented. The system takes its actions based on the knowledge it has already gained, and after a limited number of iterations it recognizes the optimal values and segments the prostate.
IV. RESULTS AND DISCUSSIONS
In this section we present and discuss the results of
the proposed approach. The ultrasound image and its manually segmented version illustrated in Figures 2(a) and (b) were used as the sample. We implemented an ε-greedy policy to explore/exploit the solution space. The ultrasound image was divided into M_S = 3 rows and N_S = 4 columns. The number of discrete levels for the thickness (covering over-segmentation and under-segmentation) was set to 9. Because there are 12 sub-images in our case, there are 9 × 12 = 108 states in total. The RL agent was trained using a total of 5000 iterations over all sub-images.
The threshold action is defined as increasing or decreasing the current local threshold by a specific value. This value is equal to 1/10 of the difference between the maximum and minimum gray levels of each sub-image, or 0 for no change. For the post-processing action (the morphological opening operator) we chose the size of the structuring element among the values 0, 5, 10 and 20. For the calculation of the reward we chose κ₁ = κ₂ = 10 (see Eq. 12). After performing this procedure, the Q-matrix was filled with appropriate values; in fact, the agent had gained enough knowledge to recognize the optimum values for each sub-image.
In the test stage we used six similar sample images from the same patient in order to verify the segmentation results. Figure 5 shows these test images (I1-I6). In all cases, after a limited number of iterations (usually fewer than 20 in the conducted experiments), the agent could segment the prostate and terminate the process.
To quantitatively evaluate our results, we used a similarity measure η based on the misclassification rate, a general criterion in image segmentation [14], [15]:

$$\eta = 100 \times \frac{|B_O \cap B_T| + |F_O \cap F_T|}{|B_O| + |F_O|}, \qquad (13)$$

where B_O and F_O denote the background and foreground of the perfect (manually segmented) image, B_T and F_T denote the background and foreground pixels of the result image, and |·| is the cardinality of a set. Table II summarizes the results for these images.
TABLE II
RESULTS FOR TEST IMAGES I1-I6 (SEE FIGURE 5)

Sample   I1     I2     I3     I4     I5     I6
η (%)    95.5   94.8   96.6   95.7   94.3   95.6
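The measure of Eq. (13) can be computed directly from the two segmentation masks; boolean NumPy masks with True marking foreground are assumed:

```python
import numpy as np

def similarity_eta(perfect_mask, result_mask):
    """Similarity measure of Eq. (13), as a percentage."""
    fo, ft = perfect_mask, result_mask       # foregrounds
    bo, bt = ~perfect_mask, ~result_mask     # backgrounds
    agree = np.sum(bo & bt) + np.sum(fo & ft)
    return 100.0 * agree / (bo.sum() + fo.sum())
```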
Table II shows that for simple cases the proposed approach yields results acceptable as the input of a fine-tuning segmentation algorithm. For instance, the result of the proposed approach may be used as the initial snake for the well-known method introduced in [16], or as a coarse estimate for the method introduced by the authors in [17]. In parts where the original image has good quality, the results of the proposed approach can even match the final segmentation.
V. CONCLUSIONS
In this work, a reinforcement learning method was proposed as a novel idea for prostate segmentation, and some results were illustrated. First, the image is divided into sub-images. Then, in an offline stage, the agent takes actions (i.e., changing the thresholding value and the size of the structuring element) to change its environment (the quality of the segmented parts) in each sub-image. After this stage, the agent takes the action with the maximum reward for each possible state of each sub-image. Based on its accumulated knowledge, it can choose the appropriate values for input images with similar characteristics. The proposed method can be trained for object segmentation in medical images to achieve an acceptable level of performance. The underlying idea has the potential to be used as a main segmentation approach, or as an interim stage serving other segmentation methods. The method was applied to several similar test ultrasound images containing the prostate, and we showed its effectiveness based on a simple similarity measure. Our future work will concentrate on extensions of the algorithm: adaptive selection of the number of sub-images and the integration of more, and more robust, features will be investigated. Adding other operations, such as noise filtering, to be controlled by the RL agent will also be tested. Finally, more appropriate quality measures (those usually used in medical imaging) must be applied to evaluate the performance more accurately.
Fig. 5. The original image and its result for test images I1-I6: (a) Image 1, (b) Image 2, (c) Image 3, (d) Image 4, (e) Image 5, (f) Image 6.

REFERENCES
[1] C. Mettlin, "American Society national cancer detection project," Cancer, vol. 75, pp. 1790-1794, 1995.
[2] M. F. Insana and D. G. Brown, "Acoustic scattering theory applied to soft biological tissues," in Ultrasonic Scattering in Biological Tissues, CRC Press, Boca Raton, pp. 76-124, 1993.
[3] R. S. Sutton and A. G. Barto, Reinforcement Learning, MIT Press, Cambridge, MA, 1998.
[4] S. Singh, P. Norvig, and D. Cohn, Introduction to Reinforcement Learning, Harlequin Inc., 1996.
[5] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, pp. 279-292, 1992.
[6] C. J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. dissertation, Cambridge University, Cambridge, U.K., 1989.
[7] H. R. Tizhoosh and G. W. Taylor, "Reinforced contrast adaptation," International Journal of Image and Graphics, 2006 (to be published).
[8] M. Shokri and H. R. Tizhoosh, "Q(λ)-based image thresholding," Canadian Conference on Computer and Robot Vision, 2004.
[9] F. Sahba and H. R. Tizhoosh, "Filter fusion for image enhancement using reinforcement learning," CCECE 2003, Montreal, May 2003.
[10] F. Sahba, H. R. Tizhoosh, and M. M. A. Salama, "Using reinforcement learning for filter fusion in image enhancement," The Fourth IASTED International Conference on Computational Intelligence, Calgary, Canada, July 2005.
[11] A. Gelb, Applied Optimal Estimation, MIT Press, 1974.
[12] J. O'Rourke, "The signature of a plane curve," SIAM J. Comput., vol. 15, no. 1, pp. 34-51, Feb. 1986.
[13] R. J. Holt and A. N. Netravali, "Using line correspondences in invariant signatures for curve recognition," Image and Vision Computing, vol. 11, no. 7, pp. 440-446, Sept. 1993.
[14] B. Sankur and M. Sezgin, "Survey over image thresholding techniques and quantitative performance evaluation," Journal of Electronic Imaging, vol. 13, no. 1, pp. 146-165, 2004.
[15] W. A. Yasnoff, J. K. Mui, and J. W. Bacus, "Error measures for scene segmentation," Pattern Recognition, vol. 9, 1977.
[16] H. M. Ladak, F. Mao, Y. Wang, D. B. Downey, D. A. Steinman, and A. Fenster, "Prostate boundary segmentation from 2D ultrasound images," Medical Physics, vol. 27, pp. 1777-1788, 2000.
[17] F. Sahba, H. R. Tizhoosh, and M. M. A. Salama, "A coarse-to-fine approach to prostate boundary segmentation in ultrasound images," BioMedical Engineering OnLine, 4:58, October 2005.
[18] M. Sezgin and B. Sankur, "Survey over image thresholding techniques and quantitative performance evaluation," Journal of Electronic Imaging, vol. 13, no. 1, pp. 146-168, January 2004.