AUTOMATED DNA FIBER TRACKING AND MEASUREMENT
Yaping Wang1,2, Paul Chastain3, Pew-Thian Yap2, Jie-Zhi Cheng2,
David Kaufman3, Lei Guo1, Dinggang Shen2
1School of Automation, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, P. R. China
2Dept. of Radiology and BRIC, University of North Carolina at Chapel Hill, U.S.A
3Dept. of Pathology and Laboratory Medicine, University of North Carolina at Chapel Hill, U.S.A
The rapid assessment of how cells respond to pathologic,
biological, environmental, and endogenous agents is critical for
understanding how such responses may increase genomic
instability, disease development and, ultimately, affect quality of
life. Recently, we have applied the technique of DNA fiber
analysis to increase our understanding of how DNA damaging
agents influence DNA replication . The major drawback to this
technique, however, is the length of time needed to manually
assess the replication dynamics, i.e., 40 hours per experimental
condition. We introduce in this paper an automatic and effective
image analysis framework for accurate quantification of various
replication tracks for high throughput DNA analysis, called
Computer Assisted Selection and Analysis (CASA). This algorithm
is implemented in two major steps: random-walk-based DNA fiber
tracking and pattern-recognition-based measurement of the
replication tracks. Experimental results show that the proposed
method yields results that are highly consistent with manual
analysis while requiring 160-fold less processing time, without
Index Terms— DNA replication tracks, DNA fiber tracking,
random walk, DNA pattern recognition
To minimize the potentially lethal, mutagenic and clastogenic
effects of replicating and segregating damaged DNA, human cells
rely on an intricate network of cell cycle checkpoints and DNA
repair pathways. Previous studies showed that cells are most
susceptible to malignant transformation when treated with
chemical carcinogens during the earliest part of the S phase
(synthesis phase), because critical DNA target sites for malignant
transformation are replicated in this period of time . S phase
cells are most vulnerable to genetic alterations induced by DNA
lesions that interfere with DNA replication. Human cells respond
to DNA damage by reducing the rate of ongoing DNA synthesis
through passive and active mechanisms and by actively inhibiting
origin initiation .
To detect locations within the genome undergoing replication
at any given moment, DNA is pulse-labeled sequentially with
two thymidine analogs, IdU (5-Iodo-2′-deoxyuridine) and
CldU (5-Chloro-2′-deoxyuridine), which can only be
incorporated into nascent DNA during S phase (Fig. 1). After
pulsing, the cells are lysed, their genomic DNA is
straightened and aligned, and the IdU and CldU tracks are identified
using antibodies which stain IdU tracks red and CldU tracks green
. After staining, the DNA fibers containing the IdU and CldU
tracks are visualized using confocal laser scanning microscopy
(CLSM) and images are taken (see Fig. 1). The DNA tracks are
then divided into replication tracks that are active during the first
pulse (red-only tracks), initiated during the second pulse (green-
only tracks), or active during both pulses (red-green tracks, green-
red-green tracks and red-green-red tracks) (Fig. 2) . By
comparing how these various tracks change between normal and
cancer cells or when cells are exposed to various DNA damaging
agents, such as ultraviolet (UV) damage, ionizing radiation, and
oxidative damage, one can gain critical insights into how the cell
protects itself from these agents [1, 4].
A CLSM image typically contains hundreds or thousands of
DNA fibers. Analysis based on human observations is time-
intensive and may be prone to observer bias. One experimental
condition can take over 40 hours to analyze and typically 10 to 50
experimental conditions are needed for each study. Here we
propose an effective and fast algorithm for automated
identification and quantification of R-Only, G-Only, R-G, G-R-G
and R-G-R tracks in the combined stained image (Fig. 1). The
proposed algorithm consists of (1) DNA fiber tracking based on a
random-walk scheme and (2) a robust DNA color pattern
recognition method. Using these two strategies, the DNA
replication tracks can be effectively detected and recognized. The
results of this method are compared with the manually labeled
results from an expert.
Fig. 1. Immunofluorescence microscopy of combed DNA fibers:
(a) IdU stained DNA fibers; (b) CldU stained DNA fibers; (c)
combined stained DNA fibers.
Fig. 2. (A) Color patterns exhibited in DNA tracks after a 10-min
pulsing of asynchronous cells with IdU followed by a 20-min
pulsing with CldU. Individual replication units are visualized by
immuno-fluorescence detection of the incorporated halogenated
nucleotides in combed DNA molecules: (i) green-only (G-Only),
newly initiated origins; (ii) red-only (R-Only), replication
terminations; (iii) red-green (R-G), ongoing replication tracks; (iv)
green-red-green (G-R-G), origins which are initiated during first
pulse; (v) red-green-red (R-G-R), replication merging. (B) Field of
labeled replication tracks, along with examples of individual
Detection of DNA fibers is formulated as a line detection problem.
Many existing line detection and vessel detection approaches
leverage the Hessian matrix of local structures in different ways
[5-6]. Hessian matrices are known for their effectiveness in
representing local structures, and are hence widely used for the
analysis of line or curvilinear structures. Here, we
employ the line filter described in , which gives better geometric
interpretation of the line structure and effectively suppresses noise
and background by accounting for all eigenvalues of the Hessian
matrix simultaneously. First, a Hessian filter is employed to
enhance the line structures of the combined stained DNA image
(i.e., in Fig. 1(c)). To locate the DNA fiber, points on the Hessian
map over a certain threshold are set as seeding points. A random-
walk-based tracking method is then used to extract all enhanced
tracks. We subsequently use a pattern analysis method to identify
the tracks of interest, i.e., the R-Only, G-Only, R-G, G-R-G, and
R-G-R tracks. Details of these steps are given below.
2.1. Hessian Probability Map
The Hessian probability map is generated from the combined stained
image by a multiscale line enhancement filter based on the
eigenvalues and eigenvectors of the Hessian matrix, which reflect
local structures . For simplicity, we refer to the value of each
point on the Hessian probability map as its lineness.
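To make the idea of lineness concrete, the following is a minimal single-scale sketch in Python. It is not the filter used in the paper, which is multiscale and whose exact formula is not given here. The function names, the finite-difference Hessian, the `beta` parameter, and the specific response formula are all illustrative assumptions; the sketch only shows how the two Hessian eigenvalues can separate a bright line (one strongly negative eigenvalue, one near zero) from a blob or flat background.

```python
import math

def hessian_at(img, r, c):
    """Central-difference Hessian entries (Ixx, Ixy, Iyy) at an interior pixel."""
    ixx = img[r][c + 1] - 2 * img[r][c] + img[r][c - 1]
    iyy = img[r + 1][c] - 2 * img[r][c] + img[r - 1][c]
    ixy = (img[r + 1][c + 1] - img[r + 1][c - 1]
           - img[r - 1][c + 1] + img[r - 1][c - 1]) / 4.0
    return ixx, ixy, iyy

def eigenvalues_2x2(ixx, ixy, iyy):
    """Eigenvalues of a symmetric 2x2 matrix, ordered so that |l1| <= |l2|."""
    mean = (ixx + iyy) / 2.0
    root = math.sqrt(((ixx - iyy) / 2.0) ** 2 + ixy ** 2)
    return tuple(sorted((mean - root, mean + root), key=abs))

def lineness(img, r, c, beta=0.5):
    """Toy bright-line response (assumed form, not the paper's filter)."""
    l1, l2 = eigenvalues_2x2(*hessian_at(img, r, c))
    if l2 >= 0:                      # not a bright ridge
        return 0.0
    blobness = (l1 / l2) ** 2        # near 0 for lines, near 1 for blobs
    return -l2 * math.exp(-blobness / (2 * beta ** 2))
```

On a one-pixel-wide horizontal line this response is high at the line center, zero on the adjacent background, and much lower on an isolated bright dot.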
2.2. DNA Fiber Tracking by Random-Walk
The standard way of detecting the tracks of interest is through
progressively tracing the points on the track. For this, we use the
random-walk-based scheme proposed in , which derives
multiple sampling paths by exploring the uncertainties of three
tracking factors: tracking direction, jumping distance, and lineness
value, as detailed below.
2.2.1. Seeding Strategy and Initialization
The seeding points for the DNA tracking are selected by globally
thresholding the Hessian probability map (we use a threshold of 0.01). The
detected points are clustered, and then the clusters with sparse
points are treated as noise clusters and discarded. For each
remaining cluster, the point with the highest lineness is set as the
initial seeding point for tracking initialization.
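The seeding strategy can be sketched as follows. The text does not say how points are clustered or what counts as a sparse cluster, so this sketch assumes 8-connected components and a hypothetical `min_cluster_size` parameter; only the 0.01 threshold and the choice of the highest-lineness point per cluster come from the description above.

```python
from collections import deque

def select_seeds(prob_map, threshold=0.01, min_cluster_size=3):
    """Return one seed (row, col) per sufficiently large above-threshold cluster."""
    rows, cols = len(prob_map), len(prob_map[0])
    visited = [[False] * cols for _ in range(rows)]
    seeds = []
    for r in range(rows):
        for c in range(cols):
            if visited[r][c] or prob_map[r][c] <= threshold:
                continue
            # Flood-fill one 8-connected cluster of above-threshold points.
            cluster, queue = [], deque([(r, c)])
            visited[r][c] = True
            while queue:
                cr, cc = queue.popleft()
                cluster.append((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and not visited[nr][nc]
                                and prob_map[nr][nc] > threshold):
                            visited[nr][nc] = True
                            queue.append((nr, nc))
            # Sparse clusters are treated as noise; otherwise keep the
            # point with the highest lineness as the seed.
            if len(cluster) >= min_cluster_size:
                seeds.append(max(cluster, key=lambda p: prob_map[p[0]][p[1]]))
    return seeds
```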
2.2.2. Tracking Direction
The tracking direction determines the location of the next
candidate point. In our case, the direction of search is obtained by
combining the previous direction and the locally estimated
direction, which is acquired based on the Hessian matrix. For the
initial seeding points, we use Principal Component Analysis (PCA)
to estimate the initial local principal direction of the vicinity of the
seeding point. This gives a more robust direction than the direction
given by the eigenvector of a single Hessian matrix. The search
angle for subsequent tracking is set within the range of .
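A minimal sketch of the two direction estimates described above, in Python. `principal_direction` applies PCA to the 2D coordinates of points near a seed, and `blend_directions` combines a previous direction with a local one. The equal weighting and the sign-alignment step are assumptions made for illustration; the text does not specify how the two directions are combined.

```python
import math

def principal_direction(points):
    """Unit vector (dx, dy) along the largest-variance axis of 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

def blend_directions(prev, local, weight=0.5):
    """Weighted mix of two unit directions (the weighting is an assumption)."""
    # An eigenvector's sign is arbitrary, so flip `local` to agree with `prev`.
    if prev[0] * local[0] + prev[1] * local[1] < 0:
        local = (-local[0], -local[1])
    bx = weight * prev[0] + (1 - weight) * local[0]
    by = weight * prev[1] + (1 - weight) * local[1]
    norm = math.hypot(bx, by)
    return bx / norm, by / norm
```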
2.2.3. Jumping Distance
The jumping distance is described by a uniformly distributed
random variable that ranges from 4 to 6 pixels. With this
setting, we can see from labels ① and ② in Fig. 3(I-d) and Fig.
3(II-d), respectively, that small gaps do not disrupt the tracking.
2.2.4. Optimal Lineness Point
Multiple local maxima sometimes exist in the tracking range. In
such cases, the point which has the maximum lineness is selected
as the next point within the ring-shape window determined by the
searching angle range and jumping distance.
2.2.5. Stopping Criterion
The iteration of the random-walk-based line tracking ends when no
further point with strong lineness can be found.
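The tracking loop of Sections 2.2.3 to 2.2.5 can be sketched as below. This is a simplified single-path illustration, not the multi-path sampling scheme the paper builds on. The half-angle of 30 degrees, the ring thickness of one pixel either side of the jump radius, the `min_lineness` cutoff, and all function names are assumptions; only the 4 to 6 pixel jumping distance, the maximum-lineness selection, and the stopping rule follow the text.

```python
import math
import random

def next_point(lineness, pos, direction, rng,
               half_angle_deg=30.0, min_lineness=0.01):
    """Strongest point in a ring-sector window ahead of pos, or None."""
    rows, cols = len(lineness), len(lineness[0])
    jump = rng.uniform(4.0, 6.0)               # jumping distance in pixels
    heading = math.atan2(direction[1], direction[0])  # direction = (dx, dy)
    reach = int(math.ceil(jump)) + 1
    r0, c0 = pos
    best, best_val = None, min_lineness
    for r in range(max(0, r0 - reach), min(rows, r0 + reach + 1)):
        for c in range(max(0, c0 - reach), min(cols, c0 + reach + 1)):
            dist = math.hypot(r - r0, c - c0)
            if abs(dist - jump) > 1.0:         # outside the ring
                continue
            angle = math.atan2(r - r0, c - c0)
            diff = abs((angle - heading + math.pi) % (2 * math.pi) - math.pi)
            if diff <= math.radians(half_angle_deg) and lineness[r][c] > best_val:
                best, best_val = (r, c), lineness[r][c]
    return best

def track(lineness, seed, direction, rng, max_steps=1000):
    """Follow a fiber from seed until no strong-lineness point remains."""
    path = [seed]
    for _ in range(max_steps):
        nxt = next_point(lineness, path[-1], direction, rng)
        if nxt is None:                        # stopping criterion
            break
        direction = (nxt[1] - path[-1][1], nxt[0] - path[-1][0])
        path.append(nxt)
    return path
```

Because each step jumps several pixels and accepts any point inside the ring, a one-pixel break in the lineness map does not stop the walk in this sketch.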
2.2.6. Elimination of Undesired Fibers
Because most stained DNA images contain crossing fibers, overlaps,
and noise, distinguishing one fiber from another overlapping fiber
is non-trivial. We eliminate crossing DNA replication tracks from
our analysis by discarding fibers that share a significant number
of points with other fibers. Specifically, a fiber is discarded if
more than 90% of its points are covered by another, longer fiber.
Labels ③ in Fig. 3(II-d) and Fig. 3(III-d) show examples of this
case. For robustness to noise, fibers shorter than a specified
minimum length are removed (compare the corresponding pictures in
the 2nd and 3rd columns of Fig. 3). Noisy fibers not discarded in
this step are further processed in the color pattern recognition
step, as discussed next.
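The two elimination rules can be sketched as a simple filter. The sketch assumes each fiber is a list of pixel coordinates and that length is measured as the number of points. The minimum length of 8 is a placeholder, since the text only says "a specified minimum value". The 90% overlap rule follows the text.

```python
def filter_fibers(fibers, min_length=8, max_overlap=0.9):
    """Drop short fibers and fibers mostly covered by a longer fiber.

    fibers: list of lists of (row, col) points.
    """
    kept = []
    for i, fiber in enumerate(fibers):
        if len(fiber) < min_length:            # too short: treated as noise
            continue
        points = set(fiber)
        covered = False
        for j, other in enumerate(fibers):
            if i == j or len(other) <= len(fiber):
                continue                       # only longer fibers can cover
            overlap = len(points & set(other)) / len(points)
            if overlap > max_overlap:
                covered = True
                break
        if not covered:
            kept.append(fiber)
    return kept
```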
2.3. Pattern Recognition
After DNA fiber tracking, we need to further determine the color
patterns of the tracks of interest: the R-Only, G-Only, R-G, G-R-G,
and R-G-R tracks. These tracks are essential for the biologist to
quantify how cells respond to pathologic, biological,
environmental, and endogenous agents.
2.3.1. Robust Color Coding
Based on the results in Section 2.2, we estimate the color of each
point on the track using the red and green stained DNA fiber
images. To make the color code robust, we estimate it for each
point from the color information in its eight neighboring points.
If over half of the points in the neighborhood are green, the
current point is labeled as having a green component, and likewise
for red. Because the green signal represents a more accurate
replication signal (since the antibodies which recognize IdU have a
slight affinity towards CldU), we record the points on the lines
according to the following criteria: a point with both green and
red components, or with the green component only, is regarded as a
green point (labeled as G); a point with the red component only is
regarded as a red point (labeled as R); and a point with neither a
green nor a red component is regarded as a gap point.
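The voting rule above can be sketched as follows. The sketch assumes the red and green channels have already been binarized into 0/1 masks, which the text does not describe, and uses "-" as a hypothetical symbol for a gap point.

```python
def color_code(red_mask, green_mask, r, c):
    """Return 'G', 'R', or '-' (gap) for point (r, c) from its 8 neighbors."""
    rows, cols = len(red_mask), len(red_mask[0])
    red_votes = green_votes = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue                       # skip the point itself
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                red_votes += red_mask[nr][nc]
                green_votes += green_mask[nr][nc]
    has_green = green_votes > 4                # over half of eight neighbors
    has_red = red_votes > 4
    if has_green:                              # green wins when both are present
        return "G"
    if has_red:
        return "R"
    return "-"
```

Green takes precedence over red because, as stated above, the green signal is the more reliable of the two.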
2.3.2. Track Connectedness and Gap Tolerance
Due to staining inhomogeneity and imaging noise, some fibers will
be disrupted by gaps or noisy signals, as shown by labels ① and ②
in Fig. 3(I-d) and Fig. 3(II-d). Hence, in the process of
determining the color pattern of a certain DNA replication track,
the track connectedness and gap tolerance are very important
factors for judging the color pattern correctly. We introduce a
confidence score $C_t$ for each track $t$:

$$C_t = \frac{L_t - G_t}{L_t}$$

Here, $L_t$ denotes the total length of track $t$ and $G_t$ denotes
the total length of the gaps on track $t$. Tracks with $C_t$ less
than 0.9 are discarded. For tracks with $C_t \geq 0.9$, we allow the
existence of multiple gaps, subject to a maximal length for each
gap. Our criterion is set based on the experience of an expert:
(1) a maximum of three gaps is allowed on a single track; (2) no
single gap should be longer than the maximal gap length; (3) each
single connected color segment should be longer than a minimum
length. We use these three conditions as constraints in the
process of color pattern recognition. This is demonstrated by
labels ① and ② in Fig. 3(I-d) and Fig. 3(II-d). For our purpose, a
setting of 3 pixels allows our color coding method to tolerate the
gaps in the aforementioned tracks.
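The gap rules can be sketched on a per-track label string such as "RRRR-GGGG". Two assumptions are made here and should be read as such: the confidence score is taken to be the fraction of the track that is not gap, and the maximal gap length is taken to be 3 pixels. The 0.9 cutoff and the limit of three gaps come from the text.

```python
from itertools import groupby

def runs(labels):
    """Run-length encode a label string: 'RR-GG' -> [('R', 2), ('-', 1), ('G', 2)]."""
    return [(key, len(list(group))) for key, group in groupby(labels)]

def passes_gap_rules(labels, min_score=0.9, max_gaps=3, max_gap_len=3):
    """Check the confidence score and gap constraints for one non-empty track."""
    total = len(labels)
    gap_runs = [n for key, n in runs(labels) if key == "-"]
    # Assumed form of the score: fraction of the track length that is not gap.
    score = (total - sum(gap_runs)) / total
    return (score >= min_score
            and len(gap_runs) <= max_gaps
            and all(n <= max_gap_len for n in gap_runs))
```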
2.3.3. Track Length Limitation
Undesired tracks are further removed based on a minimum length
requirement. The minimum required length for a G-Only or R-Only
track is 8 pixels. For the R-G track, the minimum length is 9
pixels, with each red or green segment longer than 3 pixels. For
the G-R-G or R-G-R track, the minimum length is 12 pixels, with
each red or green segment longer than 3 pixels. These values were
empirically determined by a DNA fiber analysis expert.
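A minimal sketch of how these length limits could be combined with pattern naming, again on a label string with "-" for gap points. The treatment of gaps (bridged when reading the pattern, counted in the total length) and the handling of a track read from its green end are assumptions of this sketch; the numeric limits come from the text.

```python
from itertools import groupby

MIN_TOTAL = {"R": 8, "G": 8, "RG": 9, "GRG": 12, "RGR": 12}
NAMES = {"R": "R-Only", "G": "G-Only", "RG": "R-G",
         "GRG": "G-R-G", "RGR": "R-G-R"}

def classify(labels, min_segment=3):
    """Return the pattern name for a label string, or None if rejected."""
    colored = [ch for ch in labels if ch != "-"]   # bridge tolerated gaps
    segments = [(key, len(list(group))) for key, group in groupby(colored)]
    pattern = "".join(key for key, _ in segments)
    if pattern == "GR":                            # same track read from the other end
        pattern = "RG"
    if pattern not in MIN_TOTAL or len(labels) < MIN_TOTAL[pattern]:
        return None
    # Multi-color tracks need every segment to be longer than min_segment.
    if len(segments) > 1 and any(n <= min_segment for _, n in segments):
        return None
    return NAMES[pattern]
```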
2.3.4. Ratio of Lineness
As mentioned in Section 2.1, lineness is the value obtained from
the Hessian probability map. We found that for most fibers, the
lineness contrast between the central line and its neighborhood is
very apparent. However, on occasion, there are fibers that have
weak signals, are twisted together, or contain dot noise and
consequently do not show sufficient lineness contrast in their local
ROIs. To discard these fibers, we introduce the lineness ratio

$$R_t = \frac{\bar{l}_{\mathrm{center}}}{\bar{l}_{\mathrm{neighbor}}}$$

Here $\bar{l}_{\mathrm{center}}$ represents the average lineness of
the central part of the fiber track and
$\bar{l}_{\mathrm{neighbor}}$ represents the average lineness of the
remainder of the local ROI of the track. We set the central segment
according to the general width of the DNA fiber, and then set its
neighborhood to be points on the two sides of the central segment.
We set the threshold on $R_t$ to be 2.0 based on empirical
observation; tracks with ratios less than 2.0 are discarded. The
effectiveness of this setting is demonstrated by label ④ in
Fig. 3(II-d).
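The lineness-contrast check can be sketched as below, assuming the measurement is the ratio of the average lineness in the central part of the track to the average lineness in the rest of its local ROI. That form is inferred from the section title and the 2.0 cutoff. The function names and the handling of a zero-lineness neighborhood are illustrative.

```python
def lineness_ratio(center_values, neighbor_values):
    """Average central lineness divided by average neighborhood lineness."""
    center = sum(center_values) / len(center_values)
    neighbor = sum(neighbor_values) / len(neighbor_values)
    # A neighborhood with no response at all gives unbounded contrast.
    return float("inf") if neighbor == 0 else center / neighbor

def keep_track(center_values, neighbor_values, min_ratio=2.0):
    """Keep a track only if its lineness contrast reaches the cutoff."""
    return lineness_ratio(center_values, neighbor_values) >= min_ratio
```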
3.1. Qualitative Evaluation
Example results from each of the major steps of our method are
illustrated in Fig. 3, where three close-up views of the DNA fiber
image shown in Fig. 1(c) are given. An overall view of the
detection result is shown in Fig. 4. It can be observed that CASA
yields results which closely match the data.
3.2. Quantitative Evaluation
For quantitative assessment, we show in Fig. 5 the numbers of
tracks detected manually by an expert and automatically by CASA. It
can be observed that the numbers of tracks of interest detected by
CASA are comparable with those determined by the expert. We
performed further evaluation by using 10 additional DNA fiber
images from cells that were not treated with DNA damaging
agents. For each color pattern, we compute the scoring difference
by subtracting the number of tracks determined manually by the
expert from the number determined by CASA. The results, given
in Fig. 6, show that the scoring difference is predominantly in the
range of [-3, 3], indicating high accuracy. CASA occasionally
identifies more tracks than the expert, partly because it more
easily detects small replication tracks, which are harder for the
expert to detect.
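The scoring difference described above is a per-pattern subtraction, sketched here in Python. The function name is hypothetical, and the counts in the usage example are made-up numbers for illustration only, not results from the paper.

```python
PATTERNS = ("R-Only", "G-Only", "R-G", "G-R-G", "R-G-R")

def scoring_difference(casa_counts, expert_counts):
    """CASA count minus expert count for each color pattern."""
    return {p: casa_counts.get(p, 0) - expert_counts.get(p, 0)
            for p in PATTERNS}

# Hypothetical counts for one image (not data from the paper).
example = scoring_difference(
    {"R-Only": 12, "G-Only": 7, "R-G": 30},
    {"R-Only": 10, "G-Only": 9, "R-G": 30, "G-R-G": 1},
)
```

A positive value means CASA found more tracks of that pattern than the expert, and a negative value means it found fewer.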
The length of a DNA replication track is commonly used to
determine how replication rates are influenced by various
conditions . Measuring the lengths manually with high precision
is extremely laborious and time-consuming. With CASA, this can
be done automatically and rapidly. The average green and red track
lengths obtained by CASA for the 10 DNA fiber images above are
23.90 and 12.40 pixels, respectively. Their ratio is close to 2:1,
which agrees with the fact that the cells were pulsed with IdU for
10 minutes and CldU for 20 minutes (a two-fold difference in
pulsing time).
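Written out, the comparison between the measured length ratio and the pulse-time ratio is the following. The symbols are introduced here only for this check.

```latex
\frac{\bar{\ell}_{\text{green}}}{\bar{\ell}_{\text{red}}}
  = \frac{23.90}{12.40} \approx 1.93,
\qquad
\frac{t_{\text{CldU}}}{t_{\text{IdU}}}
  = \frac{20\ \text{min}}{10\ \text{min}} = 2
```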
Fig. 3. A step-by-step demonstration of the proposed method based on three close-up views of the CLSM image in Fig. 1(c). Each row
represents one close-up image. From left to right are (a) the raw images, (b) initial fiber tracking results overlaid on the Hessian map, (c)
the final fiber tracking results, and (d) the detection results which are drawn around each DNA fiber. The frames delineating the fibers
show the detection results given by CASA. The numbers are indices of the detected fibers.
This work was supported in part by NIH grants (EB008374, ES018918, ES015856, CA084493, CA125337, ES010126, ES0005948, and ES18918).
CASA, currently implemented in MATLAB, requires only 15
minutes to analyze about 10,000 replication tracks in these 10
images. This is a more than 160-fold reduction in processing time
compared with the 40 hours of manual labeling. When allowed to
process the data in parallel, CASA can produce results in an even
shorter amount of time.
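The 160-fold figure follows directly from the two durations quoted above:

```latex
\frac{40\ \text{h} \times 60\ \text{min/h}}{15\ \text{min}}
  = \frac{2400\ \text{min}}{15\ \text{min}} = 160
```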
To evaluate CASA’s capability in quantifying replication
dynamics after the cells were exposed to DNA damaging agents,
we analyzed the DNA replication tracks from cells that were
irradiated with different UV fluences (0, 5, 10, and 20 J/m²)
between the IdU and CldU pulses. We found that the results given
by CASA agree very well with the results given by the expert.
CASA was able to detect the increase of the R-Only tracks, the
decrease of the G-Only tracks, and the shortening of the green
segments in the R-G tracks, in agreement with the results reported
in the literature [1, 4]. Fig. 7(a) compares the relative amount
of R-Only tracks (relative to the unirradiated case, 0 fluence)
quantified by CASA and the expert. Both analyses
revealed an increase in R-Only tracks with the increase of UV
radiation exposure. This indicates that some active replication
forks are unable to overcome the UV induced damage. In fact, Fig.
7(a) indicates that CASA yields a more reasonable result at
, where CASA shows more DNA replication tracks being
terminated compared to . Fig. 7(b) shows the relative
amount of G-Only tracks. Both the expert and CASA observed a
decrease in G-Only tracks in the UV treated cells, indicating that
the replication origins were inhibited as a consequence of UV
damage. The trends shown in Fig. 7(a) and Fig. 7(b) are both
consistent with the findings previously reported [1, 4]. The length
of the green portion of the R-G track is usually assessed to help
understand replication dynamics in relation to UV damage. Fig. 7(c)
shows that both the expert and CASA detected a shortening of the
green segments of the R-G tracks with increasing UV radiation
exposure, suggesting a reduction of the overall replication rate,
in line with the conclusion in .
A fully automatic DNA fiber tracking and pattern analysis method,
namely CASA, is proposed in this paper. Experimental results
show that the performance of our method is statistically
comparable to human observer analysis. CASA will benefit
scientists who need to perform high throughput analyses of DNA
fibers in an unbiased manner with speed, accuracy and precision.
As a caveat, however, since it is difficult to determine a set of
parameters which cater for all situations exhaustively, there are
some occasions where a human expert can judge a DNA fiber
better than CASA. Therefore, our future work will entail adding
human interaction functionalities into CASA.
[1] P.D. Chastain, et al., “Checkpoint Regulation of Replication Dynamics
in UV-Irradiated Human Cells,” Cell Cycle, 5(18): 2160-2167, 2006.
[2] D.G. Kaufman, et al., “Early S phase DNA replication: a search for
targets of carcinogenesis,” Adv Enzyme Regul., 47: 127-38, 2007.
[3] W.K. Kaufmann, “The human intra-S checkpoint response to UVC-
induced DNA damage,” Carcinogenesis, 31(5): 751-65, 2010.
[4] K. Unsal-Kaçmaz, et al., “The human Tim/Tipin complex coordinates
an Intra-S checkpoint response to UV that slows replication fork
displacement,” Mol Cell Biol, 27(8): 3131-3142, 2007.
[5] A. Frangi, et al., “Multiscale Vessel Enhancement Filtering,” MICCAI
1998, LNCS, 1496: 130-137, 1998.
[6] J. Cheng, et al., “Detection of Arterial Calcification in Mammograms
by Random Walks,” IPMI 2009, LNCS, 5635: 713-724, 2009.
Fig. 4. An overall view of the detection results of the DNA fiber
image shown in Fig. 1(c).
Fig. 5. Track number comparison of the image shown in Fig. 1(c).
Fig. 6. Scoring difference between the expert and CASA.
Fig. 7. CASA compares favorably with the results given by the
expert over different amounts of radiation dose. Horizontal axis of
each panel: UV Radiation (J/m²).