Table Detection using Deep Learning
Azka Gilani, Shah Rukh Qasim, Imran Malik and Faisal Shafait
National University of Sciences and Technology (NUST), Islamabad, Pakistan
Email: agilani(dot)mscs15seecs,14beesqasim,malik(dot)imran,faisal(dot)shafait(at)
Abstract— Table detection is a crucial step in many document
analysis applications as tables are used for presenting essential
information to the reader in a structured manner. It is a
hard problem due to varying layouts and encodings of the
tables. Researchers have proposed numerous techniques for table
detection based on layout analysis of documents. Most of these
techniques fail to generalize because they rely on hand engineered
features which are not robust to layout variations. In this paper,
we have presented a deep learning based method for table
detection. In the proposed method, document images are first
pre-processed. These images are then fed to a Region Proposal
Network followed by a fully connected neural network for table
detection. The proposed method works with high precision on
document images with varying layouts that include documents,
research papers, and magazines. We have done our evaluations
on publicly available UNLV dataset where it beats Tesseract's
state of the art table detection system by a significant margin.
I. INTRODUCTION

Tables are widely used for presenting structural and functional information. They are present in diverse classes of documents including newspapers, research articles, and scientific documents. Tables enable readers to rapidly compare, analyse, and understand facts present in documents. Table detection in documents is significant in the field of document analysis and recognition; hence it has attracted a number of researchers to make contributions in this domain.
Table detection is carried out by layout and content analysis of documents. Tables have varying layouts and a variety of encodings, which makes it very hard to write a general algorithm for table detection. Hence, table detection is considered a hard problem in the research community. A large number of studies have been carried out in this field, but most of them have limitations. Existing commercial and open source techniques for document analysis, including Tesseract, lack the capability to completely detect table regions from document images [1].
In recent years, deep learning techniques have greatly improved the results on various computer vision problems. Recently, Hao et al. [2] presented an approach for table detection in documents using deep learning. Their proposed method employs a combination of custom algorithms and machine learning in order to generate region proposals and to detect whether a table exists in the proposed region or not. The major limitation of this method is that it is limited to non-raster PDF (Portable Document Format) documents. Another limitation is that it works well on tables that have ruling lines but fails to detect those without ruling lines and those that span multiple columns.
Hence, in order to improve the performance of table detection and to make up for the limitations of prior techniques, this paper proposes a methodology for table detection based purely on deep learning, without extensive pre- or post-processing. To explain it further, the document image is first transformed into a new image. This paper then uses the Faster Region-based Convolutional Neural Network (Faster R-CNN) as the deep learning module. In contrast to the technique of Hao et al. [2], Faster R-CNN computes region proposals itself and then determines whether the selected area is a table or not. Our approach has the major advantage of being invariant to changes in table structure and layout, as it can be fine-tuned to work on any dataset very easily. This capability is not present in any of the existing approaches.
Hence, we make a significant contribution to the table detection problem by making it data-driven. Additionally, we have used the publicly available UNLV dataset [3] for the evaluation of our proposed methodology, where it gives better results than Tesseract's table detection system. We have also compared our results with the market-leading commercial OCR engine, Abbyy Cloud OCR SDK [4].
The rest of the paper is organized as follows: Section II reviews research related to table detection. Section III describes our proposed methodology, which consists of pre-processing and detection modules. Section IV describes the performance measures used to evaluate our system and presents the experimental results. Section V concludes the paper and provides some directions for future work.
II. RELATED WORK

Several researchers have reported their work on table detection in document images. Kieninger et al. [5]–[7] proposed an algorithm for table spotting and structure extraction from documents called T-Recs. This system takes word bounding boxes as input, which are clustered into a segmentation graph using a bottom-up approach. The key problem with this technique is that it depends entirely on word bounding boxes and is unable to perform well in the presence of multi-column layouts.
Another approach was proposed by Wang et al. [8]. It detects table lines depending on the distance between consecutive words. After that, horizontally consecutive words are grouped together with vertically adjacent lines in order to propose table entity candidates. This statistical approach assumes that the maximum number of columns in a document is two and designs the algorithm according to three layout templates (single column, double column, mixed column). A column classification algorithm is then applied to find the column layout of the page, and this information is used as prior knowledge for table spotting. The major limitation of this technique is that it can only work on those templates for which it has been designed.
Hu et al. [9] presented an approach for table detection that assumes the input images are single-column. Like the previous methods, this technique cannot be applied to multi-column layouts. Shafait et al. [10] presented another approach for table detection in heterogeneous documents. This system is integrated into the open source Tesseract OCR engine. It works well on a large variety of documents, but its major limitation is that it is a traditional, hand-crafted technique and not data-driven.
Tupaj et al. [11] proposed an OCR-based table detection technique. The system searches for sequences of table-like lines based on keywords that might be present in table headers. The line that contains a keyword is regarded as the starting line, while subsequent lines are analyzed to match a predefined set of tokens and are then categorized as table structure. The limitation of this technique is that it depends highly on the keywords that might appear in table headers.
Harit et al. [12] proposed a technique for table detection based on the identification of unique table start and trailer patterns. The major limitation of this method is that it will not work properly whenever the table start patterns are not unique in document images.
Gatos et al. [13] proposed an approach for table detection by finding the areas of intersection between horizontal and vertical lines. Tables are then reconstructed by drawing the corresponding horizontal and vertical lines that are connected to intersection pairs. The limitation of this system is that it works only for documents in which the table rows and columns are separated by ruling lines. Costa e Silva [14] presented a technique for table detection using Hidden Markov Models (HMMs). The system extracts text from PDF files using the pdftotext Linux utility. Feature vectors are then computed on the basis of the spaces present in the text. The major limitation of this technique is that it works only on non-raster PDF files that do not have any noise.
Kasar et al. [15] presented a method to locate tables by identifying column and row line separators. The system employs a run-length approach to detect horizontal and vertical lines in the input image. From each group of horizontal and vertical lines, a set of 26 low-level features is extracted and passed to a Support Vector Machine (SVM), which then detects the table. The major limitation of this approach is that it will fail on tables without ruling lines.
Jahan et al. [16] presented a method that uses local thresh-
olds for word spacing and line height for localization and
extraction of table regions from document images. The major
limitation of this method is that it detects table regions along
with surrounding text regions. Hence it cannot be used for
localization of table regions only.
Anh et al. [17] presented a hybrid approach for table detection in document images. This system first classifies the document into text and non-text regions. On that basis, it uses a hybrid method to find candidate table regions. These regions are then examined to obtain the table regions. This approach will fail if a table spans multiple columns in the document. Moreover, it will not work on scanned images as it does not use any heuristic filter to cater for noisy images.
Hao et al. [2] presented a deep learning based approach for table detection. This system computes region proposals from document images through a predefined set of rules. These region proposals are then passed to a CNN that detects whether a certain region proposal belongs to a table region or not. The major limitation is that it works well for tables with ruling lines but fails to localize table regions if the table spans multiple columns. Another limitation is that it works only on non-raster PDF documents.
In order to make up for the limitations of prior methodologies, this paper adapts Faster R-CNN, a deep learning technique for object detection in natural images, to the table detection problem.
III. PROPOSED METHODOLOGY

The proposed method consists of two major modules: image transformation and table detection. Documents consist of content regions and blank spaces. Image transformation is applied in order to separate these regions, while the table detection module uses Faster R-CNN as the basic element of the deep network. Faster R-CNN relies on a combined network that is composed of a Region Proposal Network (RPN) and Fast R-CNN. In this section we describe each module in detail.
A. Image Transformation

Image transformation is the initial step of our proposed methodology. Faster R-CNN [18] was originally proposed for natural images. Hence, image transformation plays a preliminary role: it converts document images into images that are as close to natural images as possible, so that existing Faster R-CNN models can easily be fine-tuned. The distance transform [19]–[21] is a derived representation of a digital image. It calculates the precise distance between text regions and the white spaces present in the document image, which gives a good estimate of the presence of a table region. In our proposed methodology, we have used different types of distance transforms so that different features can be stored in all three channels. Image transformation is done using the following procedure.
The transformation algorithm takes a binary image as input. It then computes the Euclidean distance transform, the linear distance transform, and the max distance transform [19]–[21] on the blue, green, and red channels of the image, respectively. The result of the image transformation algorithm on document images is shown in Figure 1.

Fig. 1: Transformed Images
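For illustration, a minimal sketch of this transformation is given below, assuming OpenCV/SciPy and a binarized input in which text pixels are black (0). Interpreting "linear" as the city-block metric and "max" as the chessboard metric, as well as the per-channel normalisation to the 0–255 range, are our assumptions rather than details specified in the paper.

```python
import cv2
import numpy as np
from scipy.ndimage import distance_transform_edt, distance_transform_cdt

def transform_document_image(binary_img):
    """Stack three distance transforms into a pseudo-natural three-channel image."""
    background = binary_img != 0  # True wherever there is no text
    euclidean = distance_transform_edt(background)                     # blue channel
    linear = distance_transform_cdt(background, metric='taxicab')      # green channel
    maximum = distance_transform_cdt(background, metric='chessboard')  # red channel

    def to_uint8(dt):
        # Normalisation choice is an assumption; the paper does not specify scaling.
        return np.uint8(255 * dt / (dt.max() + 1e-6))

    # OpenCV stores images in B, G, R channel order.
    return cv2.merge([to_uint8(euclidean), to_uint8(linear), to_uint8(maximum)])
```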
B. Table Detection

For detection, our approach employs Faster R-CNN [18]. Faster R-CNN was originally proposed for object detection and classification in natural images. It is composed of two modules. The first module is an RPN that proposes regions. The region proposals are fed to the second module, the detector module originally proposed in Fast R-CNN [22]. The entire system is a unified network for object detection. Figure 2 shows the architectural diagram of our approach.
1) Region Proposal Network: As described in [18], the Region Proposal Network (RPN) takes the transformed image as input and outputs a set of rectangular object proposals, each with an objectness score. The RPN shares a common set of convolutional layers with the detector module of Faster R-CNN. Ren et al. have used the Zeiler and Fergus model (ZF) [23] and the Simonyan and Zisserman model (VGG-16) in their experiments.

In order to generate region proposals, a small network is slid over the convolutional feature map output by the last shared convolutional layer. The RPN takes an n×n spatial window of the input convolutional feature map and maps each sliding window to a lower-dimensional (256-d for the ZF model) feature. The complete architecture of the network is shown in Figure 2. The feature is then fed into two fully connected layers: a regression layer and a classification layer. For this paper we have used the default implementation of Faster R-CNN, which takes n = 3. The fully connected layers of the network are shared across all spatial locations. This architecture [18] is naturally implemented with an n×n convolutional layer followed by two 1×1 convolutional layers for regression and classification. At each sliding window position, Faster R-CNN simultaneously predicts multiple region proposals, whose number per location is denoted by k. The classification layer therefore has 2k output scores while the regression layer has 4k outputs that encode the coordinates of the k boxes. The k region proposals are parametrized relative to k reference boxes, known as anchors. Faster R-CNN uses k = 9 anchors at each sliding position.

Fig. 2: Our approach: The document image is first transformed and then fed into a fine-tuned CNN model. It outputs a feature map which is fed into the region proposal network for proposing candidate table regions. These regions are finally given as input to a fully connected detection network, along with the convolutional feature map, to classify them into tables or non-tables.
An important property is that Faster R-CNN generates region proposals that are scale and translation invariant. The RPN is then trained end-to-end by Stochastic Gradient Descent (SGD) and backpropagation. In this paper, all the layers are fine-tuned starting from the ZF network.
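As a rough sketch (in PyTorch, not the authors' Caffe implementation), the RPN head described above amounts to an n×n convolution over the shared feature map followed by two sibling 1×1 convolutions that produce 2k objectness scores and 4k box offsets per spatial location:

```python
import torch.nn as nn

class RPNHead(nn.Module):
    """Sliding-window RPN head: an n x n conv followed by two sibling 1 x 1 convs."""
    def __init__(self, in_channels=256, k=9, n=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 256, kernel_size=n, padding=n // 2)
        self.relu = nn.ReLU(inplace=True)
        self.cls_score = nn.Conv2d(256, 2 * k, kernel_size=1)  # 2 objectness scores per anchor
        self.bbox_pred = nn.Conv2d(256, 4 * k, kernel_size=1)  # 4 box offsets per anchor

    def forward(self, feature_map):
        t = self.relu(self.conv(feature_map))
        return self.cls_score(t), self.bbox_pred(t)
```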
2) Detection Network: After the network has been trained for region proposal generation, the proposals are passed to the region-based object detection CNN module that utilizes them. The detection module is based on the unified network composed of the RPN and Fast R-CNN with shared convolutional layers. As a result, it detects tables in the test set and returns the coordinates of the bounding boxes of the predicted tables.
C. Training

We have used the Caffe-based implementation of Faster R-CNN [18] to fine-tune on our images. A momentum optimizer with a learning rate of 0.001 and a momentum of 0.9 was used, and the number of training iterations was 10,000. We trained our system on two classes, i.e., background and table region. The background class has been used as the negative example (no table region present) while the table class has been used as the positive example (containing a table region). Due to this, our proposed system does not search aggressively for table regions in negative samples.
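The training setup can be sketched as follows. This is not the authors' Caffe pipeline but an illustrative PyTorch equivalent that only mirrors the reported hyperparameters (SGD with momentum 0.9, learning rate 0.001, 10,000 iterations, two classes); the detector and the data_loader yielding transformed images with table annotations are assumptions.

```python
import torch
import torchvision

# Stand-in detector; the authors fine-tuned a ZF-based Faster R-CNN in Caffe.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)  # background + table
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

iteration = 0
while iteration < 10000:
    for images, targets in data_loader:      # data_loader is assumed to exist
        loss_dict = model(images, targets)   # RPN + detection losses in training mode
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        iteration += 1
        if iteration == 10000:
            break
```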
Fig. 3: Results showing: (a) Partial detection, (b) Missed, (c) Over-Segmented, and (d) False Positives
Fig. 4: Some sample images from the UNLV dataset showing detection results of proposed Table Detection approach. Ground
truth is blue while the detected regions are red.
IV. PERFORMANCE MEASURES AND EXPERIMENTAL RESULTS

Different performance measures have been mentioned in the literature for the evaluation of table detection algorithms. These measures include precision and recall [9], [24], which have been used for evaluating various table detection algorithms [1], [8], [24]–[26]. We have compared our proposed methodology with Shafait et al. [10] and a commercial engine, Abbyy Cloud OCR SDK [4]. The evaluation measures described in [10] have been employed.
We have used the open UNLV dataset [3], as used in [10], to make a fair comparison between both methodologies. Since the systems proposed in the ICDAR 2013 Table Detection competition were evaluated on a different dataset, we did not compare our methodology with them, as such a comparison would not be fair. Moreover, most of the techniques proposed in ICDAR 2013 [15], [27] are not data-driven and are highly dependent on table layout and the extraction of custom features from the images, which makes them non-robust to varying layouts.
Let the ground truth bounding box be represented by Gi and the bounding box detected by our system by Dj. The formula for finding the overlap between two bounding boxes is given by [10]:

A(G_i, D_j) = \frac{2\,|G_i \cap D_j|}{|G_i| + |D_j|}, \quad A \in [0, 1]    (1)

A(Gi, Dj) represents the overlap between the ground truth and detected bounding boxes. Depending on the area of the intersecting region, its value lies between zero and one. Note that we are using the same threshold values as described by Shafait et al. [10] to make a fair comparison with their approach.
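As a small worked example (helper names are ours, boxes given as (x1, y1, x2, y2) corner coordinates), Eq. (1) can be computed for a pair of axis-aligned boxes as follows:

```python
def box_area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def overlap(gt_box, det_box):
    """A(G_i, D_j) = 2 * |G_i intersect D_j| / (|G_i| + |D_j|), in [0, 1]."""
    ix1, iy1 = max(gt_box[0], det_box[0]), max(gt_box[1], det_box[1])
    ix2, iy2 = min(gt_box[2], det_box[2]), min(gt_box[3], det_box[3])
    intersection = box_area((ix1, iy1, ix2, iy2))
    return 2.0 * intersection / (box_area(gt_box) + box_area(det_box))

# Identical boxes give A = 1.0; disjoint boxes give A = 0.0.
print(overlap((0, 0, 10, 10), (0, 0, 10, 10)))    # 1.0
print(overlap((0, 0, 10, 10), (20, 20, 30, 30)))  # 0.0
```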
Figure 3 shows some of the errors (partial detection, over-segmentation, and false positive detection) that occurred during table detection. Here the blue regions represent the ground truth bounding boxes while the red regions represent the bounding boxes of the detected regions.
A. Correct Detections

These are the number of ground truth tables that have a major overlap (A ≥ 0.9) with one of the detected tables. The area has been calculated using Eq. (1).
B. Partial Detections
These are the number of ground truth tables that have a
partial overlap (0.1 < A < 0.9) with one of the detected tables.
C. Over-Segmented Tables

These are the number of ground truth tables that have an overlap (0.1 < A < 0.9) with more than one detected table. It means that different parts of the ground truth table have been detected as separate tables.
D. Under-Segmented Tables

These are the number of ground truth tables that have a major overlap (0.1 < A < 0.9) with a detected table, but that detected table also overlaps with several other ground truth tables. It means that more than one table was merged during detection and reported as a single table.
E. False Positive Tables

This indicates the number of detected tables that do not have an overlap (A ≤ 0.1) with any of the ground truth tables. Such detections do not correspond to any actual table.
F. Missed Tables

This indicates the number of ground truth tables that do not have an overlap (A ≤ 0.1) with any of the detected tables. It means that these tables were missed by the detection algorithm.
G. Precision

The precision measure has been used for evaluating the overall performance of the table detection method. It finds the percentage of the detected table area that actually belongs to table regions of the ground truth document image. The formula for calculating precision is:

\text{Precision} = \frac{\text{Area of ground truth regions in detected regions}}{\text{Area of all detected table regions}}    (2)
H. Recall

Recall is evaluated by finding the percentage of ground truth table regions that were marked as detected table regions. The formula for calculating recall is:

\text{Recall} = \frac{\text{Area of ground truth regions in detected regions}}{\text{Area of all ground truth table regions}}    (3)
I. F1 Score

The F1 score considers both precision and recall to compute the overall accuracy of the methodology:

F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}    (4)
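For completeness, a minimal sketch of Eqs. (2)–(4), assuming the three area quantities have already been accumulated over a page (variable names are ours):

```python
def precision_recall_f1(gt_area_in_detections, detected_area, gt_area):
    """Area-based precision, recall and F1 as defined in Eqs. (2)-(4)."""
    precision = gt_area_in_detections / detected_area if detected_area else 0.0
    recall = gt_area_in_detections / gt_area if gt_area else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```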
Fig. 5: Visualization of various engines. Ground truth bound-
ing box is represented by blue color while the detected
bounding box by our method is represented by green color.
Magenta color represents bounding box of Abbyy Cloud OCR
SDK while maroon color shows the result of Tesseract.
In order to evaluate the performance of the proposed methodology, we chose the publicly available UNLV dataset [3]. This dataset consists of a wide variety of document images, ranging from business reports to research papers and magazines, and includes varying and very complex table layouts. It contains approximately 10,000 images at different resolutions. For each scanned image, manually keyed ground truth text is provided, along with manually determined zone information. Each zone is further categorized depending on its contents (text, half-tone, table, etc.). Amongst the 10,000 document images, only 427 contain table regions. We have used all of these 427 images from the UNLV dataset for evaluating our proposed technique. As the dataset is small, we have used a transfer learning approach [28]. We have also used data augmentation, including rotation, scaling, and flipping, to overcome over-fitting.
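A minimal sketch of such augmentation with OpenCV is shown below; the rotation and scaling ranges are our assumptions (the paper does not report them), and the table bounding boxes would have to be transformed with the same affine matrix (omitted here for brevity).

```python
import random
import cv2

def augment(image):
    """Randomly rotate, scale and horizontally flip a document image."""
    h, w = image.shape[:2]
    angle = random.uniform(-5, 5)     # degrees; range is an assumption
    scale = random.uniform(0.8, 1.2)  # scale factor; range is an assumption
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    image = cv2.warpAffine(image, M, (w, h), borderValue=(255, 255, 255))
    if random.random() < 0.5:
        image = cv2.flip(image, 1)    # horizontal flip
    return image
```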
Performance comparison between the open source technique of Shafait et al. [10] (Tesseract), a commercial engine (Abbyy Cloud OCR SDK), and our method is shown in Table I. While parsing a table, row and column headers are often used as keys, so if they are missed it is impossible to extract any information; hence, the whole detected table becomes useless. Thus, the number of correct detections is the most expressive performance measure. Tesseract and Abbyy fail to detect tables in the presence of complex layouts that contain wide white spaces. The results show that our approach has better performance, as correct detections improve significantly from 44.9% to 60.5%.

Figure 5 visualizes the results of all three engines with respect to the ground truth. Overall results of our proposed methodology are shown in Figure 4.
TABLE I: Performance comparison of different engines (accuracy in %)

Performance Measures        Tesseract   Abbyy   Without Distance Transform   Our Approach
Correct Detections          44.9        41.28   51.37                        60.5
Partial Detections          28.4        32.1    42.2                         30.2
Missed Tables               25.68       25.68   6.42                         9.17
Over-Segmented Tables       3.66        7.33    29.35                        24.7
Under-Segmented Tables      3.66        7.33    42.20                        30.27
False Positive Detections   22.72       7.21    5                            10.17
Area Precision              93.2        95.0    84.5                         82.3
Area Recall                 64.29       64.3    89.17                        90.67
F1 Score                    76.09       76.69   86.77                        86.29

V. CONCLUSION AND FUTURE WORK

This paper presented an approach for table detection based on deep learning. The proposed system uses an image transformation to separate text regions from non-text regions. It then uses an RPN followed by a fully connected neural network to detect table regions in document images. Experimental results show that the deep learning based system is robust to layout variations, as it does not depend on hand-engineered features. The proposed system has been evaluated on the publicly available UNLV dataset, where it gives better results than Tesseract's state-of-the-art table detection system. We plan to extend this work in the direction of table structure and content extraction in the future.

REFERENCES
[1] J. Hu, R. S. Kashi, D. Lopresti, and G. T. Wilfong, “Evaluating the
performance of table processing algorithms,” International Journal on
Document Analysis and Recognition, vol. 4, no. 3, pp. 140–153, 2002.
[2] L. Hao, L. Gao, X. Yi, and Z. Tang, “A table detection method for
pdf documents based on convolutional neural networks,” in Document
Analysis Systems (DAS), 2016 12th IAPR Workshop on. IEEE, 2016,
pp. 287–292.
[3] A. Shahab, “Table ground truth for the UW3 and
UNLV datasets,” [Online; accessed 7-April-2017]. [Online].
[4] Abbyy. (2017) OCR SDK engine. [Online]. Available: https://www.
[5] T. Kieninger and A. Dengel, “A paper-to-html table converting system,”
in Proceedings of document analysis systems (DAS), vol. 98, 1998.
[6] ——, “Table recognition and labeling using intrinsic layout features,” in
International Conference on Advances in Pattern Recognition. Springer,
1999, pp. 307–316.
[7] ——, “Applying the t-recs table recognition system to the business letter
domain,” in Document Analysis and Recognition, 2001. Proceedings.
Sixth International Conference on. IEEE, 2001, pp. 518–522.
[8] Y. Wang, I. Phillip, and R. Haralick, “Automatic table ground truth
generation and a background-analysis-based table structure extraction
method,” in Document Analysis and Recognition, 2001. Proceedings.
Sixth International Conference on. IEEE, 2001, pp. 528–532.
[9] J. Hu, R. S. Kashi, D. P. Lopresti, and G. Wilfong, “Medium-
independent table detection,” in Electronic Imaging. International
Society for Optics and Photonics, 1999, pp. 291–302.
[10] F. Shafait and R. Smith, “Table detection in heterogeneous documents,”
in Proceedings of the 9th IAPR International Workshop on Document
Analysis Systems. ACM, 2010, pp. 65–72.
[11] S. Tupaj, Z. Shi, C. H. Chang, and H. Alam, “Extracting tabular infor-
mation from text files,” EECS Department, Tufts University, Medford,
USA, 1996.
[12] G. Harit and A. Bansal, “Table detection in document images using
header and trailer patterns,” in Proceedings of the Eighth Indian Con-
ference on Computer Vision, Graphics and Image Processing. ACM,
2012, p. 62.
[13] B. Gatos, D. Danatsas, I. Pratikakis, and S. Perantonis, “Automatic table
detection in document images,” Pattern recognition and data mining, pp.
609–618, 2005.
[14] A. C. e Silva, “Learning rich Hidden Markov Models in document
analysis: Table location,” in Document Analysis and Recognition, 2009.
ICDAR’09. 10th International Conference on. IEEE, 2009, pp. 843–
[15] T. Kasar, P. Barlas, S. Adam, C. Chatelain, and T. Paquet, “Learning
to detect tables in scanned document images using line information,” in
Document Analysis and Recognition (ICDAR), 2013 12th International
Conference on. IEEE, 2013, pp. 1185–1189.
[16] M. A. Jahan and R. G. Ragel, “Locating tables in scanned documents
for reconstructing and republishing,” in Information and Automation for
Sustainability (ICIAfS), 2014 7th International Conference on. IEEE,
2014, pp. 1–6.
[17] T. T. Anh, N. In-Seop, and K. Soo-Hyung, “A hybrid method for table
detection from document image,” in Pattern Recognition (ACPR), 2015
3rd IAPR Asian Conference on. IEEE, 2015, pp. 131–135.
[18] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-
time object detection with region proposal networks,” in Advances in
neural information processing systems, 2015, pp. 91–99.
[19] H. Breu, J. Gil, D. Kirkpatrick, and M. Werman, “Linear time Euclidean
distance transform algorithms,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 17, no. 5, pp. 529–533, 1995.
[20] R. Fabbri, L. D. F. Costa, J. C. Torelli, and O. M. Bruno, “2D Euclidean
distance transform algorithms: A comparative survey,” ACM Computing
Surveys (CSUR), vol. 40, no. 1, p. 2, 2008.
[21] I. Ragnemalm, “The Euclidean distance transform in arbitrary dimen-
sions,” Pattern Recognition Letters, vol. 14, no. 11, pp. 883–888, 1993.
[22] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International
Conference on Computer Vision, 2015, pp. 1440–1448.
[23] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolu-
tional networks,” in European conference on computer vision. Springer,
2014, pp. 818–833.
[24] T. Kieninger and A. Dengel, “An approach towards benchmarking of
table structure recognition results,” in Document Analysis and Recog-
nition, 2005. Proceedings. Eighth International Conference on. IEEE,
2005, pp. 1232–1236.
[25] S. Mandal, S. Chowdhury, A. K. Das, and B. Chanda, "A simple and effective table detection system from document images," International Journal of Document Analysis and Recognition (IJDAR), vol. 8, no. 2-3, pp. 172–182, 2006.
[26] F. Shafait, D. Keysers, and T. Breuel, "Performance evaluation and benchmarking of six page segmentation algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 6, pp. 941–954, 2008.
[27] J. Fang, L. Gao, K. Bai, R. Qiu, X. Tao, and Z. Tang, "A table detection method for multipage pdf documents via visual separators and tabular structures," in Document Analysis and Recognition (ICDAR), 2011 International Conference on. IEEE, 2011, pp. 779–783.
[28] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical
machine learning tools and techniques. Morgan Kaufmann, 2016.