Segmentation-Free Detection of Comic Panels
Martin Stommel1, Lena I. Merhej2, and Marion G. Müller3
1Artificial Intelligence Group, Universität Bremen, Germany
mstommel@tzi.de
2Art, Art History and Visual Studies, Duke University, Durham NC, USA,
VisComX, Visual Communication and Expertise, Jacobs University Bremen
l.merhej@jacobs-university.de
3Mass Communication, School of Humanities & Social Sciences,
Jacobs University Bremen, Germany
m.mueller@jacobs-university.de
Abstract. The detection of comic panels is a crucial functionality in
assistance systems for iconotextual media analysis. Most systems use
recursive cuts based on image projections or background segmentation
to find comic panels. This usually limits their applicability to comics with
a white background and free space between the panels. In this paper, we
introduce a set of new features that allow for the detection of panels by
their outline instead of by the separating space. Our method is therefore
more robust to structured backgrounds.4
1 Introduction
The understanding of the narrative is one of the most important aspects that
artists, art historians and scholars of visual studies pursue in the analysis of
iconotextual sequences such as comics and serialized graphics (e.g. poster series
or newspaper advertising). The development of dedicated imaging software sys-
tems has simplified primarily the technical side of the production process. Tools
such as Comic Life (Freeverse Software), Manga Studio Debut (Smith Micro
Software Inc.), and Comic Book Creator (Planetwide Games) support the de-
sign of graphics and text, online publishing and page layout. Few tools, however,
exist to support the analytical side. Audio and video annotation tools such as
ELAN5, ANVIL6, and KIVI7 allow for the labelling of the time-line and the
creation of multiple views on the data. Unfortunately they do not support the
specific structure of a comic where multiple panels with graphical and textual
parts (cf. Fig. 1) are arranged in multiframes [1] and represented on single or
double pages.
4The original publication is available at
link.springer.com/content/pdf/10.1007/978-3-642-33564-8_76.pdf
5Language Archive Organization, http://www.lat-mpi.eu/tools/elan/
6Michael Kipp, http://www.anvil-software.de/
7http://keyvisuals.jacobs-university.de/kivi.html
Fig. 1. Structure of a comic book
A basic functionality of a tool for the analysis of the narration of comics
is to automatically extract the panels of the comic pages and display them in
reading order. It is the prerequisite for the annotation of the time-line of a comic
and therefore the analysis of causal and other dependencies or the rhythm and
development of the narrative.
In this paper, we present a method for the automatic detection of comic
panels that exploits multiple sources of local and global image information. As
a consequence, it is possible to relieve some of the constraints that are usually
imposed on the appearance of the panel separations.
2 Related Work
Algorithmic approaches to detect comic panels are related to document structure
analysis [2] and image segmentation [3, 4]. By assuming that the background is
predominantly white and homogeneous, it can be identified by thresholding, watershed
segmentation [5], or region growing [4, 6]. The remaining foreground areas
are considered as the comic panels. Small connections between panels (caused
e.g. by overlapping text balloons) can be broken up to a certain degree by mor-
phological operations [4, 6]. For simple layouts, the reading order of the panels
can be estimated by first sorting according to the vertical position, then by
horizontal position within a row [5].
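As an illustration, the row-major ordering described in [5] can be sketched as follows. This is a hypothetical minimal implementation, not the code of [5]; in particular, the row-grouping tolerance `row_tol` is our assumption.

```python
def simple_reading_order(panels, row_tol=20):
    """Sketch of row-major panel ordering for simple layouts:
    group panels into rows by vertical position (row_tol is an
    assumed tolerance), then sort left to right within each row.
    Each panel is an (x, y) position of its upper-left corner."""
    panels = sorted(panels, key=lambda p: p[1])   # sort by top y coordinate
    rows, current = [], [panels[0]]
    for p in panels[1:]:
        # Panels whose top edges are close together belong to the same row.
        if abs(p[1] - current[0][1]) <= row_tol:
            current.append(p)
        else:
            rows.append(current)
            current = [p]
    rows.append(current)
    # Within each row, read from left to right.
    return [p for row in rows for p in sorted(row, key=lambda p: p[0])]
```

As the text notes, this heuristic only works when panels are laid out on a clean grid; staggered or overlapping layouts break the row grouping.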
Since segmentation algorithms are sensitive even to small connections be-
tween panels, many systems use top-down approaches [3,4,6]. Pages are recur-
sively split into segments based on detected separating stripes. The lack of visual
features of the stripes is compensated by the assumption of a certain length,
straightness, and homogeneity. These properties are most frequently measured
by axis-parallel [7] or omnidirectional [3] projections of the intensity or the gra-
dient [8]. The ordering of the recursive splits plays an important role, since errors
manifest as split, missed, or merged panels. Tanaka et al. [8] therefore introduce
a dedicated detector for T-joints of separating stripes. Han et al. [9] improve the
noise robustness of the traditional X-Y recursive cut algorithm [10] by reducing
the document page to a set of candidate splitting points. The splitting points
are detected by a multilayer perceptron. The method is suited for disjoint panels
with horizontal and vertical borders. Corner detectors are occasionally used to
increase the accuracy of the detected panel positions [11].
3 Proposed Features and Procedural Pathways
As the examples from our data set (Fig. 2) show, the assumption of white back-
ground is often not appropriate to find the gaps between the panels. Instead,
the background is often coloured. Overlapping text balloons make it difficult to
distinguish between panel separations and elements within a panel. We therefore
decided to focus on the detection of the panel outlines rather than the detection
of homogeneous gaps. As a simplification, we currently assume rectangular
panels, which is of course not always the case. In order to recognise the panel
outlines, we developed five procedural pathways that deal with edges, corners,
regions, globally dominant structures, and rectangles (cf. Fig. 3).
Dominant Vertical and Horizontal Edges After a grey scale conversion
of the input image, the gradient is computed using the Sobel operator (Fig. 3,
blue pathway). Then the horizontal and vertical projections of the gradient are
computed, i.e. the mean gradient magnitude in each row and column. The aim is
to detect rows and columns with long vertical and horizontal edges. To increase
the selectivity, only those pixels are considered whose gradient orientation is
perpendicular to the direction of the projection. Then for every pixel the gradient
magnitude is weighted by the respective average gradient in one line or row,
depending on the orientation. The weighted gradient is then scaled by a root
function and normalised to the maximum. The two pathways described in the
following contribute additional multiplicative weightings.
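This pathway can be sketched as follows. The sketch is a minimal illustration of the description above, not the original implementation: it uses NumPy's `np.gradient` as a stand-in for the Sobel operator, and the exact combination of the weighting steps is our assumption.

```python
import numpy as np

def edge_projection_weights(gray):
    """Sketch of the dominant-edge pathway: project the gradient onto
    rows and columns, weight every pixel by the average gradient of its
    row or column, then apply root scaling and normalisation."""
    gy, gx = np.gradient(gray.astype(float))  # stand-in for Sobel
    mag = np.hypot(gx, gy)
    # Only pixels whose orientation is perpendicular to the projection
    # direction contribute: vertical edges (strong gx) vote into column
    # projections, horizontal edges (strong gy) into row projections.
    vertical = np.where(np.abs(gx) >= np.abs(gy), mag, 0.0)
    horizontal = np.where(np.abs(gy) > np.abs(gx), mag, 0.0)
    col_mean = vertical.mean(axis=0)          # mean magnitude per column
    row_mean = horizontal.mean(axis=1)        # mean magnitude per row
    # Weight every pixel by the respective average gradient.
    weighted = vertical * col_mean[None, :] + horizontal * row_mean[:, None]
    weighted = np.sqrt(weighted)              # root-function scaling
    if weighted.max() > 0:
        weighted /= weighted.max()            # normalise to the maximum
    return weighted
```

A long vertical panel border thus reinforces itself: every pixel on it raises the column mean, which in turn raises the weight of every pixel in that column.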
Stripe-Shaped Gaps Between Panels A skeletonisation is performed to
detect homogeneous stripe-shaped gaps between the panels (Fig. 3, red pathway).
To this end a coarse edge map is constructed by thresholding the gradient mag-
nitude at half of the maximum. The skeletonisation gives the center pixels of
the regions between the resulting edge pixels. For every local region around a
skeleton point, the principal orientation of the local skeleton is computed, so we
know the orientation of the respective region. We then compute horizontal and
vertical projections of the skeleton, analogous to our procedure in the edge detection.
By weighting each skeleton point with the mean number of skeleton points
in each row or column, we emphasize horizontal or vertical skeleton lines cor-
responding to vertical or horizontal stripes. For every edge pixel, we determine
the closest point on the skeleton as well as its orientation. The skeletonisation
is then used in two ways to reinforce the edge detection results near the panel
borders: First, we weight the edge pixel by the weight of the local skeleton. This
emphasises edge pixels near horizontal or vertical stripes. Secondly, we weight
the edge pixel by the collinearity between the edge direction and the direction of
the local skeleton. This suppresses edge pixels which are not the outline of the
separating stripe but regular panel contents.
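The core of this pathway can be illustrated with a deliberately simplified stand-in. A proper skeletonisation is morphological; the toy version below merely marks, per row, the midpoint between consecutive edge pixels, which is enough to show how a stripe-shaped gap produces a dense skeleton column. The half-maximum threshold follows the text; everything else is an assumption for illustration.

```python
import numpy as np

def gap_skeleton(mag):
    """Toy stand-in for the skeletonisation step: threshold the gradient
    magnitude at half its maximum to get a coarse edge map, then mark
    the midpoint between consecutive edge pixels in each row as a
    skeleton point (a real implementation would use morphological
    skeletonisation)."""
    edges = mag >= 0.5 * mag.max()
    skel = np.zeros_like(edges)
    for r in range(edges.shape[0]):
        cols = np.flatnonzero(edges[r])
        for a, b in zip(cols[:-1], cols[1:]):
            if b - a > 1:                     # a genuine gap, not a thick edge
                skel[r, (a + b) // 2] = True
    # Column projection: columns with many skeleton points correspond
    # to vertical stripe-shaped gaps between panels.
    col_count = skel.sum(axis=0)
    return skel, col_count
```

The column counts play the role of the projection weights in the text: edge pixels next to a strong skeleton column are reinforced, others are suppressed.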
Fig. 2. Some pages from our data set: a) "Cola", b) + c) "Une Enfance Heureuse", d)
+ e) "Malaak", f) "Mission: Moon"
Panel Corners and T-Junctions The detection of panel corners is solved by a
template matching algorithm with additional inputs from a Hough transform [12]
(Fig. 3, green pathway). The motivation for template matching was a complete
failure of the Förstner [13] and SUSAN [14] corner detectors. This might be
a result of the coarseness of the comic drawings. Unfortunately, it cannot be
solved by downsizing the image because that would destroy strokes and smear
gaps. Our template matching algorithm subdivides each local image region into
a 3×3-grid and computes the mean vertical and horizontal gradient in each
cell. By assigning a positive or negative sign to every cell and orientation and
summing up the results of selected cells, we compute the response for the four
panel corner directions and four directions of T-junctions. The Hough transform
described in the following paragraph provides the accumulated strength of lines
Fig. 3. Procedural pathways of the algorithm
running through each corner coordinate. The weights of a possible vertical and
a possible horizontal line are read from the accumulator of the Hough transform
and multiplied with the response of the corner detector. This prefers corners on
the panel borders over corners that lie on short edges or curved contours. A fixed
number of the highest ranking local maxima in the corner response are chosen
as discrete corner points. In our system the number is set to 80, which is near
the maximum number of panel corners on the pages of our database.
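The grid-based template response can be sketched for one corner type. This is an illustrative reconstruction from the description above, not the original detector: the cell size, the use of `np.gradient` instead of Sobel, and the particular sign assignment for a top-left corner are all assumptions.

```python
import numpy as np

def top_left_corner_response(gray, r, c, size=12):
    """Sketch of the 3x3-grid corner template for one corner type
    (top-left): the top row of cells should contain a horizontal edge
    (strong row-gradient), the left column a vertical edge (strong
    column-gradient); all other cells count negatively."""
    win = gray[r:r + size, c:c + size].astype(float)
    gy, gx = np.gradient(win)                  # stand-in for Sobel
    s = size // 3                              # assumed cell size
    score = 0.0
    for i in range(3):
        for j in range(3):
            cell_gy = np.abs(gy[i*s:(i+1)*s, j*s:(j+1)*s]).mean()
            cell_gx = np.abs(gx[i*s:(i+1)*s, j*s:(j+1)*s]).mean()
            sign_h = 1.0 if i == 0 else -1.0   # horizontal edge expected on top
            sign_v = 1.0 if j == 0 else -1.0   # vertical edge expected on left
            score += sign_h * cell_gy + sign_v * cell_gx
    return score
```

The remaining three corner directions and the four T-junction directions follow by permuting the signs; coarse strokes inside the cells average out, which is why this template tolerates the rough drawing style that defeats classical corner detectors.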
Mutual Reinforcement of Edges and Corners A Hough transform [12]
is used to compute histograms (called accumulators, or accus for short) of
hypothetical line parameters that are plausible given the local gradient
information (Fig. 3, brown pathway). We compute accus with fine and coarse bin widths. A
comparison between the two accus indicates areas with parallel lines, for exam-
ple in hatchings. Hatchings should be suppressed because they achieve high edge
weights but rarely represent panel borders. The fine accu is also used in corner
detection as mentioned before. For every line parameter, we also accumulate the
corner weights where a line crosses the image. The types of corners accumulated
along a line also indicate a panel border: For panel borders near mostly homoge-
neous stripes, T-junctions and corners to the inside of the panel are more likely.
The maximum corner weight that is accumulated for a hypothesised panel bor-
der orientation is chosen to weight the gradient magnitude, together with the
parallel line suppression. The result is a measure of the panel border strength.
The panel border strength under the detected corners is used in an iterative
process to threshold the result (see Fig. 4).
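The accumulator idea can be illustrated with a minimal version restricted to axis-parallel lines. A full Hough transform parameterises lines by angle and distance; the sketch below keeps only one bin per row and per column, which already suffices to show how the corner weighting works. The product combination in `corner_line_weight` is our assumption.

```python
import numpy as np

def line_accumulator(edges, bin_width=1):
    """Minimal stand-in for the Hough accumulator, restricted to
    axis-parallel lines: one bin per row and per column, counting edge
    pixels. bin_width > 1 yields the coarse accumulator used for
    parallel-line (hatching) comparison."""
    rows = edges.sum(axis=1)
    cols = edges.sum(axis=0)
    if bin_width > 1:
        pad_r = (-len(rows)) % bin_width
        pad_c = (-len(cols)) % bin_width
        rows = np.pad(rows, (0, pad_r)).reshape(-1, bin_width).sum(axis=1)
        cols = np.pad(cols, (0, pad_c)).reshape(-1, bin_width).sum(axis=1)
    return rows, cols

def corner_line_weight(edges, r, c):
    """Weight a corner candidate by the horizontal and vertical line
    strength running through it (assumed combination: product)."""
    rows, cols = line_accumulator(edges)
    return rows[r] * cols[c]
```

A corner lying on two long panel borders thus receives a far higher weight than one on a short stroke or curved contour, mirroring the mutual reinforcement described above.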
Rectangles and Reading Order In the last part of the procedural pathway
(Fig. 3, yellow), the thresholded border strength is converted to a list of
rectangles. At first, a list of panel candidates is created by forming a rectangle
between all diagonally opposite corners. The confidence of every side of a rect-
angle is estimated by accumulating the respective values in the thresholded map
Fig. 4. Thresholded borders (red and blue, depending on the orientation) and corners
(blue) for the images in Fig. 2
and searching for suitable corners. Rectangles that cross highly confident borders
of other rectangles are discarded, unless they are crossed themselves by other
rectangles. Rectangles with low overall confidence are discarded. Then every rectangle
that is intersected by a more confident rectangle is discarded. Intersections with
the most confident rectangles are processed first. For the remaining rectangles
the reading order is determined. To this end, the geometric relation ’upper left’
describing a precedence in reading order is computed for every pair of rectangles.
The final reading order is determined by iteratively selecting the first rectangle
that is not blocked by any rectangle that has not yet been selected.
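The iterative selection can be sketched as follows. The concrete definition of the 'upper left' relation below is an assumed heuristic (entirely-above, else left-of within a shared row band), not the paper's exact formulation.

```python
def precedes(a, b):
    """'Upper left' relation between two rectangles (x, y, w, h):
    a precedes b if a lies entirely above b, or if they share a row
    band and a lies to its left (assumed reading-order heuristic)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    if ay + ah <= by:                 # a entirely above b
        return True
    if by + bh <= ay:                 # b entirely above a
        return False
    return ax < bx                    # same row band: left to right

def reading_order(rects):
    """Iteratively pick the rectangle that no unselected rectangle
    precedes, i.e. the first rectangle that is not blocked."""
    remaining = list(rects)
    ordered = []
    while remaining:
        nxt = next(r for r in remaining
                   if not any(precedes(o, r) for o in remaining if o is not r))
        ordered.append(nxt)
        remaining.remove(nxt)
    return ordered
```

Selecting against the *remaining* rectangles rather than sorting globally is what lets the relation stay a partial order; pathological staggered layouts can still leave no unblocked rectangle, which a robust implementation would have to resolve by a tie-breaking rule.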
4 Experimental Results
We tested our algorithm on the comics "Cola"8 and "Une Enfance Heureuse"
(2003) by Mazen Kerbaj, "Malaak" (2006) by Joumana Medlej9, and "Mission:
Moon", Ep. 1 by Ahmad Qatato10 (Samandal Comic Book, issue 8). Figure 2
shows some examples. The combination of edge, corner and skeleton features to-
gether with the mutual reinforcement by the Hough transform leads to a robust
emphasis of the panel borders with good background suppression. Although the
strength of the detection varies over the image plane, the iterative thresholding is
usually able to automatically find appropriate binarisation parameters. The cor-
ner detector reliably yields the strongest responses for the correct panel corners.
Figure 4 shows the output of the feature extraction. False borders are sometimes
detected for straight horizontal or vertical lines within the panels. Text balloons
extending over the panel borders usually produce gaps in the detections. The
same holds for adjoining panels, where the joint border has T-junctions in both
directions. This case is difficult to distinguish from other straight lines within a
comic.
The analysis of the rectangles is able to resolve most conflicts that are caused
by ambiguities and overlaps. As a result, many panels can be localised correctly.
8Kerbaj, M.: Cola. In: Le tour du monde en bande dessinée, Vol. 2, Delcourt (2009)
9http://www.malaakonline.com
10 http://ahmadqatato.com
Figure 5 shows that most panels outlined by a rectangle have been found cor-
rectly. Panels without border do not yield detections (Fig. 5d). A lower recall
here is tolerable because it is easier to manually add missing rectangles without
having to delete spurious matches beforehand. The reading order is determined
correctly. However, the filter heuristics are presently too simple to resolve two
frequent ambiguities: 1. Small rectangles within a panel are preferred over the
surrounding panel outline. This happens for example with rectangular text boxes
(Fig. 5a, upper right panel, Fig. 5c, lower left panel). 2. Two adjacent rectangles
with a low confidence of the adjoining side are not merged. The resolution
of these conflicts requires better heuristics or information from additional
features. The algorithm takes about 5 s per image on a desktop computer
(Intel Core2 Quad, 2.6 GHz). A further speed-up is possible by employing
standard code optimisation techniques (e.g. multithreading).
5 Conclusion
The detection of comic panels is a difficult problem if the panels are not
separated by white background. In this paper we proposed a set of corner-based,
edge-based, region-based and global features that allow for the recognition of
panels by their outlining rectangle instead of by separating white regions. While the
heuristics to resolve ambiguities in the assignment of correct rectangles serve ba-
sic visualisation needs, our experiments document a high reliability of the feature
extraction steps even in difficult image material.
References
1. Groensteen, T.: Système de la bande dessinée ["System of Comics", issued in
English in 2007 by the University Press of Mississippi]. Presses universitaires de
France, Paris (1999)
2. Liang, J.: Document Structure Analysis and Performance Evaluation. PhD thesis,
University of Washington (1999)
3. Chan, C.H., Leung, H., Komura, T.: Automatic panel extraction of color comic
images. In: 8th Pacific Rim Conference on Multimedia (PCM), Springer (2007)
4. Ho, A.K.N., Burie, J.C., Ogier, J.M.: Comics page structure analysis based on au-
tomatic panel extraction. In: 9th International Workshop on Graphics Recognition
(GREC 2011), Seoul, Korea, September 15th-16th, 2011. (2011)
5. Ponsard, C., Fries, V.: Enhancing the Accessibility for All of Digital Comic Books.
Int. J. on Human-Computer Interaction (eMinds) 1(5) (2009) 127–144
6. Ho, A.K.N., Burie, J.C., Ogier, J.M.: Panel and speech balloon extraction from
comics books. In: IAPR International Workshop on Document Analysis Systems
(DAS 2012), Gold Coast, Australia, March 27th - 29th, 2012, IEEE (2012) 424–428
7. Ha, J., Haralick, R., Phillips, I.: Document Page Decomposition by the Bounding-
Box Projection Technique. In: Proceedings of Third Intl Conf. Document Analysis
and Recognition (ICDAR), IEEE (1995) 1119–1122
Fig. 5. Detected panels (dashed rectangles) and estimated reading order (numbers)
8. Tanaka, T., Shoji, K., Toyama, F., Miyamichi, J.: Layout Analysis of Tree-
Structured Scene Frames in Comic Images. In: International Joint Conference
on Artificial Intelligence (IJCAI), Morgan Kaufmann (2007) 2885–2890
9. Han, E., Kim, K., Yang, H., Jung, K.: Frame Segmentation Used MLP-Based
X-Y Recursive for Mobile Cartoon Content. In Jacko, J., ed.: Human-Computer
Interaction. HCI Intelligent Multimodal Interaction Environments. Volume 4552 of
Lecture Notes in Computer Science. Springer Berlin / Heidelberg (2007) 872–881
10. Nagy, G., Seth, S.: Hierarchical representation of optically scanned documents. In:
Int. Conf. on Pattern Recognition (ICPR), IEEE Comp. Soc. (1984) 347–349
11. Ishii, D., Watanabe, H.: A Study on Frame Position Detection of Digitized Comics
Images. In: Workshop on Picture Coding and Image Processing (PCSJ/IMPS),
Dec. 7, 2010, Nagoya, Japan. (2010) 1–2
12. Ballard, D.H.: Generalizing the Hough transform to detect arbitrary shapes. Pat-
tern Recognition 13(2) (1981) 111–122
13. Förstner, W.: A feature based correspondence algorithm for image matching. ISP
Comm. III, Rovaniemi, Int. Arch. of Photogrammetry 26(3/3) (1986)
14. Smith, S.M., Brady, J.M.: SUSAN – A new approach to low level image processing.
Technical Report TR95SMS1c, Chertsey, Surrey, UK (1995)
... More recent articles focus on training Deep Neural Networks [76], but there is not enough publicly annotated data yet to reach the full potential of such approaches [18]. Finally, it is worth noticing that before being applied, many face detection approaches require some preprocessing, in particular detecting panel bounds and speech bubbles [76], which in turn constitute specific problems [307,352]. ...
... However, this requires efficiently solving certain lower-level problems, in particular panel identification and panel ordering. Existing methods to detect the boundaries of panels take advantage of the black lines generally outlining them, or of the white space called gutter separating them [18,352]. But a number of artists use complex page layouts, which makes both panel detection and ordering much harder: overlapping panels, panels joined by other objects (speech bubbles, caption) partly open panels, or even panels with no explicit boundary [352]. ...
... Existing methods to detect the boundaries of panels take advantage of the black lines generally outlining them, or of the white space called gutter separating them [18,352]. But a number of artists use complex page layouts, which makes both panel detection and ordering much harder: overlapping panels, panels joined by other objects (speech bubbles, caption) partly open panels, or even panels with no explicit boundary [352]. ...
... More recent articles focus on training Deep Neural Networks [76], but there is not enough publicly annotated data yet to reach the full potential of such approaches [18]. Finally, it is worth noticing that before being applied, many face detection approaches require some preprocessing, in particular detecting panel bounds and speech bubbles [76], which in turn constitute specific problems [312,364]. ...
... However, this requires efficiently solving certain lower-level problems, in particular panel identification and panel ordering. Existing methods to detect the boundaries of panels take advantage of the black lines generally outlining them, or of the white space called gutter separating them [18,364]. But a number of artists use complex page layouts, which makes both panel detection and ordering much harder: overlapping panels, panels joined by other objects (speech bubbles, caption) partly open panels, or even panels with no explicit boundary [364]. ...
... Existing methods to detect the boundaries of panels take advantage of the black lines generally outlining them, or of the white space called gutter separating them [18,364]. But a number of artists use complex page layouts, which makes both panel detection and ordering much harder: overlapping panels, panels joined by other objects (speech bubbles, caption) partly open panels, or even panels with no explicit boundary [364]. ...
... More recent articles focus on training Deep Neural Networks [65], but there is not enough publicly annotated data yet to reach the full potential of such approaches [17]. Finally, it is worth noticing that before being applied, many face detection approaches require some preprocessing, in particular detecting panel bounds and speech bubbles [65], which in turn constitute specific problems [236,276]. ...
... However, this requires efficiently solving certain lower-level problems, in particular panel identification and panel ordering. Existing methods to detect the boundaries of panels take advantage of the black lines generally outlining them, or of the white space called gutter separating them [17,276]. But a number of artists use complex page layouts, which makes both panel detection and ordering much harder: overlapping panels, panels joined by other objects (speech bubbles, caption) partly open panels, or even panels with no explicit boundary [276]. ...
... Existing methods to detect the boundaries of panels take advantage of the black lines generally outlining them, or of the white space called gutter separating them [17,276]. But a number of artists use complex page layouts, which makes both panel detection and ordering much harder: overlapping panels, panels joined by other objects (speech bubbles, caption) partly open panels, or even panels with no explicit boundary [276]. ...
Article
Full-text available
A character network is a graph extracted from a narrative in which vertices represent characters and edges correspond to interactions between them. A number of narrative-related problems can be addressed automatically through the analysis of character networks, such as summarization, classification, or role detection. Character networks are particularly relevant when considering works of fiction (e.g., novels, plays, movies, TV series), as their exploitation allows developing information retrieval and recommendation systems. However, works of fiction possess specific properties that make these tasks harder. This survey aims at presenting and organizing the scientific literature related to the extraction of character networks from works of fiction, as well as their analysis. We first describe the extraction process in a generic way and explain how its constituting steps are implemented in practice, depending on the medium of the narrative, the goal of the network analysis, and other factors. We then review the descriptive tools used to characterize character networks, with a focus on the way they are interpreted in this context. We illustrate the relevance of character networks by also providing a review of applications derived from their analysis. Finally, we identify the limitations of the existing approaches and the most promising perspectives.
... More recent articles focus on training Deep Neural Networks [76], but there is not enough publicly annotated data yet to reach the full potential of such approaches [18]. Finally, it is worth noticing that before being applied, many face detection approaches require some preprocessing, in particular detecting panel bounds and speech bubbles [76], which in turn constitute specific problems [312,364]. ...
... However, this requires efficiently solving certain lower-level problems, in particular panel identification and panel ordering. Existing methods to detect the boundaries of panels take advantage of the black lines generally outlining them, or of the white space called gutter separating them [18,364]. But a number of artists use complex page layouts, which makes both panel detection and ordering much harder: overlapping panels, panels joined by other objects (speech bubbles, caption) partly open panels, or even panels with no explicit boundary [364]. ...
... Existing methods to detect the boundaries of panels take advantage of the black lines generally outlining them, or of the white space called gutter separating them [18,364]. But a number of artists use complex page layouts, which makes both panel detection and ordering much harder: overlapping panels, panels joined by other objects (speech bubbles, caption) partly open panels, or even panels with no explicit boundary [364]. ...
Preprint
Full-text available
A character network is a graph extracted from a narrative, in which vertices represent characters and edges correspond to interactions between them. A number of narrative-related problems can be addressed automatically through the analysis of character networks, such as summarization, classification, or role detection. Character networks are particularly relevant when considering works of fictions (e.g. novels, plays, movies, TV series), as their exploitation allows developing information retrieval and recommendation systems. However, works of fiction possess specific properties making these tasks harder. This survey aims at presenting and organizing the scientific literature related to the extraction of character networks from works of fiction, as well as their analysis. We first describe the extraction process in a generic way, and explain how its constituting steps are implemented in practice, depending on the medium of the narrative, the goal of the network analysis, and other factors. We then review the descriptive tools used to characterize character networks, with a focus on the way they are interpreted in this context. We illustrate the relevance of character networks by also providing a review of applications derived from their analysis. Finally, we identify the limitations of the existing approaches, and the most promising perspectives.
... Thus, identifying these semantic contents it is essential. For panel extraction, techniques that make use of region of interest detection (Stommel, Merhej, & Müller, 2012) and recursive binary splitting (Pang, Cao, Lau, & Chan, 2014) can be taken as key examples. Both methods are quite robust and are capable of extracting panels not separated by white backgrounds as well. ...
Article
Full-text available
Visual impairment can affect a student’s ability to learn since their concept development when interacting with educational material is being limited. Learning activities based on images and visually rich content are mainstream learning methods, where facilitating students with visual impairments for engaged learning can be challenging. For comic books, which have shown promising results in engaged student learning, this problem is more severe. To overcome this challenge, this research presents a novel voice synthesised learning method to reduce the gap between the learning experience of a student with visual impairments compared to a mainstream learning activity. Utilising comic books, the proposed technique and the tool developed extracts semantic content, stores them in a database, and generates an audio stream in multiple languages on the user’s demand. To assess the usability of the system, a survey for a selected set of students with visual impairments was carried out. The results showed a mean rating of 5.76 out of 7 for the Informative Interest. Furthermore, a concept-mapping approach was used to analyse the feedback given through the open-ended questions. From the analysis, key concepts with an emphasis on positive emotions, willingness to try again, and features to be improved, were identified.
... Although the results of the tests conducted for these methods prove that they work well, for comics with border-free panels and those with no distinct separation between the background and the foreground, the performance of these methods are not very good since they have not been taken into consideration when devising these methods. In addition, as a solution for comic books with panels which are not separated by white backgrounds, techniques that utilize region of interest detection [3] as well as recursive binary splitting [4] have been developed. ...
Article
Full-text available
Digitisation of comic books would play a crucial role in identifying new areas in which digital comics can be used. Currently, existing systems in this domain lack the capacity to achieve complete digitisation. Digitisation requires a thorough analysis of the semantic content within comic books. This can be further sub-categorised as detection and identification of comic book characters, extraction and analysis of panels as well as texts, derivation of associations between characters and speech balloons, and analysis of different styles of reading. This paper provides an overview of using several object-detection models to detect semantic content in comics. This analysis showed that, under the constraint of limited computational capacity, YOLOv3 was the best-suited model out of the models evaluated. A study of text extraction and recognition using Optical Character Recognition, a method for determining associable speech balloons, as well as a distance-based approach for associations between characters and speech balloons are also presented here. This association method provides an increased accuracy compared to the Euclidean distance-based approach. Finally, a study on comic style is provided along with a learning model with an accuracy of 0.89 to analyse the reading order of comics.
... Though having high results in the tests conducted for these methods, for comics with border-free panels or ones without a distinct separation between the background and the foreground, these methods perform quite poorly as they have not been considered when devising these methods. In addition, as a remedy for comic books that have panels that are not separated by white backgrounds, techniques that utilize region of interest detection [3] as well as recursive binary splitting [4] have been developed. ...
Conference Paper
Full-text available
Comic book digitization would play a pivotal role in exploring new avenues on how digital comics can be consumed. As of present, the systems capable of doing such a task are limited in capability to achieve complete digitization. This task of digitization requires the understanding of the content within comic books, which can be drawn from sub-tasks such as identification and extraction of comic book content, extraction and analysis of texts, derivation of character-speech balloon associations and analysis of reading styles. In this paper, first, an analysis of the usage of several object detection models for detecting semantic elements is presented. Under the constraint of limited computational power, this analysis revealed that YOLOv3 was the most suited out of the models evaluated. Then, a particular focus is given to the analysis of extraction and recognition of texts utilizing Optical Character Recognition, along with distance-based methods for deriving associable speech balloons as well as character and speech balloon associations under given constraints. The presented association method gave an improved accuracy relative to the Euclidean distance-based method. Finally, an analysis of comic styles is presented along with a learning model to determine the reading order of comics with an accuracy of 0.89.
... For example, it will erroneously filter inset panels and encounter problems with panels that are connected by objects in the foreground. Similarly, the algorithm of Stommel et al. (2012), which relies on a complex set of engineered features, can only detect rectangular panels. ...
Article
Comics are complex documents whose reception engages cognitive processes such as scene perception, language processing, and narrative understanding. Possibly because of their complexity, they have rarely been studied in cognitive science. Modeling the stimulus ideally requires a formal description, which can be provided by feature descriptors from computer vision and computational linguistics. With a focus on document analysis, here we review work on the computational modeling of comics. We argue that the development of modern feature descriptors based on deep learning techniques has made sufficient progress to allow the investigation of complex material such as comics for reception studies, including experimentation and computational modeling of cognitive processes.
... Methods in [1,17,20,24,44,47] rely on white line cutting, connected component labeling, morphological analysis or region growing. More recently, new methods based on watershed [35], line segmentation using the Canny operator and polygon detection [24], region of interest detection [45], and recursive binary splitting [33] have been proposed. The work in [16,17] has proposed new approaches which can handle irregular comic panels by representing the detected panels by a quadrilateral shape instead of the bounding box. ...
Article
Full-text available
Comic book image analysis methods often propose multiple algorithms or models for multiple tasks like panel and character (body and face) detection, balloon segmentation, text recognition, etc. In this work, we aim to reduce the processing time for comic book image analysis by proposing a single model, called Comic MTL, that learns multiple tasks instead of using one model per task. In addition to detection and segmentation tasks, we integrate the relation analysis task for balloons and characters into the Comic MTL model. The experiments are carried out on the public DCM772 and eBDtheque datasets, which contain annotations for panels, balloons and characters as well as the associations between balloons and characters. We show that the Comic MTL model can detect the associations between balloons and their speakers (comic characters) and handle the other tasks, panel and character detection and balloon segmentation, with promising results.
Conference Paper
Full-text available
Comic books represent an important cultural heritage in many countries. However, little research has been done on analysing the content of comics, such as panels, speech balloons or characters. At first glance, the structure of a comic page may appear easy to determine. In practice, the configuration of the page and the size and shape of the panels can differ from one page to the next. Moreover, authors often draw extended content (speech balloons or comic art) that overlaps two or more panels. In some situations, panel extraction can become a real challenge. Speech balloons are another important element of comics. Full-text indexing is only possible if the text can be extracted, but the text is usually embedded among graphic elements. Moreover, unlike newspapers, the text layout in speech balloons can be irregular, so classic text extraction methods can fail. We propose, in this paper, a method based on region growing and mathematical morphology to automatically extract the panels of a comic page, together with a method to detect speech balloons. Our approach is compared with other methods found in the literature. Results are presented and discussed.
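The abstract's region-growing step is not specified in detail; one common realisation is to flood-fill the background from the page border and treat the remaining connected regions as panel candidates. A minimal sketch under that assumption (all names are illustrative, and the morphological clean-up described in the abstract is omitted):

```python
from collections import deque

def panel_boxes(img, bg=0):
    """Candidate comic panels via background flood fill.

    Flood-fills the background (value ``bg``) from the page border,
    then returns the bounding boxes (r0, c0, r1, c1) of the remaining
    connected regions as panel candidates.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]

    def flood(seeds, match):
        # breadth-first region growing over 4-connected neighbours
        q = deque()
        for r, c in seeds:
            if not seen[r][c] and match(r, c):
                seen[r][c] = True
                q.append((r, c))
        cells = []
        while q:
            r, c = q.popleft()
            cells.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w and not seen[nr][nc] and match(nr, nc):
                    seen[nr][nc] = True
                    q.append((nr, nc))
        return cells

    # 1) absorb the border-connected background
    border = ([(0, c) for c in range(w)] + [(h - 1, c) for c in range(w)] +
              [(r, 0) for r in range(h)] + [(r, w - 1) for r in range(h)])
    flood(border, lambda r, c: img[r][c] == bg)

    # 2) every remaining connected region is a panel candidate
    boxes = []
    for r in range(h):
        for c in range(w):
            if not seen[r][c]:
                cells = flood([(r, c)], lambda r, c: True)
                rows = [p[0] for p in cells]
                cols = [p[1] for p in cells]
                boxes.append((min(rows), min(cols), max(rows), max(cols)))
    return boxes
```

Note this only works when panels are surrounded by background, which is exactly the limitation the paper under discussion sets out to avoid.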
Conference Paper
Full-text available
This paper describes a method for extracting words, textlines and text blocks by analyzing the spatial configuration of bounding boxes of connected components in a given document image. The basic idea is that connected components of black pixels can be used as computational units in document image analysis. The problem of extracting words, textlines and text blocks is viewed as a clustering problem in the 2-dimensional discrete domain. Our main strategy is to use profile analysis to measure horizontal or vertical gaps between (groups of) components during image segmentation. For this purpose, we compute the smallest rectangular box, called the bounding box, that circumscribes a connected component. These boxes are projected horizontally and/or vertically, and local and global projection profiles are analyzed for word, textline and text-block segmentation. In the last step, the document decomposition hierarchy is produced from the segmented objects.
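The gap-measuring idea can be sketched by projecting bounding boxes onto one axis and reporting the intervals no box covers; this is a simplified illustration, not the paper's full profiling analysis (the function name and the `min_gap` threshold are assumptions):

```python
def gap_positions(boxes, axis=0, min_gap=2):
    """Gaps in the projection profile of bounding boxes.

    boxes: (x0, y0, x1, y1) connected-component bounding boxes.
    axis 0 projects onto x (gaps are candidate vertical cuts),
    axis 1 onto y.  Returns (start, end) intervals at least
    ``min_gap`` wide that no box covers.
    """
    intervals = sorted((b[axis], b[axis + 2]) for b in boxes)
    gaps, cursor = [], None
    for lo, hi in intervals:
        if cursor is not None and lo - cursor >= min_gap:
            gaps.append((cursor, lo))  # uncovered stretch of the profile
        cursor = hi if cursor is None else max(cursor, hi)
    return gaps
```

Clustering components into words, lines and blocks then amounts to cutting at the widest gaps at each level of the hierarchy.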
Article
This paper describes a new approach to low level image processing; in particular, edge and corner detection and structure preserving noise reduction. Non-linear filtering is used to define which parts of the image are closely related to each individual pixel; each pixel has associated with it a local image region which is of similar brightness to that pixel. The new feature detectors are based on the minimization of this local image region, and the noise reduction method uses this region as the smoothing neighbourhood. The resulting methods are accurate, noise resistant and fast. Details of the new feature detectors and of the new noise reduction method are described, along with test results.
Article
The objective of the research is to develop a schema for representing raster-digitized (scanned) documents. The representation is to retain not only the spatial structure of a printed document, but should also facilitate automatic labeling of various components, such as text, figures, subtitles, and figure captions, and allow the extraction of important relationships (such as reading order) among them. Intended applications include (1) data compression for document transmission and archival, and (2) document entry, without rekeying, into editing, formatting, and information retrieval systems.
Article
A new feature-based correspondence algorithm for image matching is presented. The interest operator is optimal for selecting points which promise high matching accuracy, including corners with an arbitrary number and orientation of edges and centres of discs, circles or rings. The similarity measure can take the seldomness of the selected points into account. The consistency of the solution is achieved by maximum-likelihood-type (robust) estimation of the parameters of an object model.
Article
The Hough transform is a method for detecting curves by exploiting the duality between points on a curve and parameters of that curve. The initial work showed how to detect both analytic curves(1,2) and non-analytic curves,(3) but these methods were restricted to binary edge images. This work was generalized to the detection of some analytic curves in grey level images, specifically lines,(4) circles(5) and parabolas.(6) The line detection case is the best known of these and has been ingeniously exploited in several applications.(7,8,9) We show how the boundaries of an arbitrary non-analytic shape can be used to construct a mapping between image space and Hough transform space. Such a mapping can be exploited to detect instances of that particular shape in an image. Furthermore, variations in the shape such as rotations, scale changes or figure ground reversals correspond to straightforward transformations of this mapping. However, the most remarkable property is that such mappings can be composed to build mappings for complex shapes from the mappings of simpler component shapes. This makes the generalized Hough transform a kind of universal transform which can be used to find arbitrarily complex shapes.
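The R-table mechanism at the heart of the generalized Hough transform can be sketched for the translation-only case; quantised gradient angles are assumed to be supplied by an edge detector, and rotation or scale invariance would add dimensions to the accumulator (function names are illustrative):

```python
from collections import defaultdict

def build_r_table(boundary, reference):
    """R-table: for each (quantised) gradient angle on the model
    boundary, store the displacement to the shape's reference point.

    boundary: iterable of (x, y, angle) model boundary points.
    """
    rx, ry = reference
    table = defaultdict(list)
    for x, y, angle in boundary:
        table[angle].append((rx - x, ry - y))
    return table

def ght_votes(edge_points, table):
    """Accumulate votes for the reference-point location.

    Each image edge point casts one vote per R-table entry for its
    angle; a peak in the accumulator marks a detected shape instance.
    """
    acc = defaultdict(int)
    for x, y, angle in edge_points:
        for dx, dy in table.get(angle, ()):
            acc[(x + dx, y + dy)] += 1
    return acc
```

The composability noted in the abstract follows directly: R-tables of component shapes can be merged into one table for the compound shape.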
Conference Paper
With the rapid growth of the mobile industry, the limitations of small mobile screens are attracting the attention of many researchers working on transforming on/off-line content into mobile content. Frame segmentation for limited mobile browsers is the key step in off-line content transformation. The X-Y recursive cut algorithm has been widely used for frame segmentation in document analysis. However, this algorithm has drawbacks for cartoon images, which come in various types and often contain noise, especially online cartoon content obtained by scanning; the noise makes it difficult for the X-Y recursive cut algorithm to find the exact cutting points. In this paper, we propose a method to segment on/off-line cartoon content into frames fitted to the mobile screen by combining two concepts: the X-Y recursive cut algorithm, which performs well on noise-free content, to extract candidate segmenting positions, and a Multi-Layer Perceptron (MLP) to verify the candidates. This combination increases the accuracy of the frame segmentation and is applicable to various off-line cartoon images with frames.
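The candidate-cut stage of the X-Y recursive cut can be sketched as follows for a binary page image; the MLP verification step described in the paper is omitted, and the function name and the convention "1 = ink" are illustrative:

```python
def xy_cut(img, top=0, left=0, bottom=None, right=None):
    """Recursive X-Y cut on a binary page (nested lists, 1 = ink).

    Finds a completely empty row or column strictly inside the
    region, splits there, and recurses; leaf regions are returned
    as (top, left, bottom, right) frames.
    """
    if bottom is None:
        bottom, right = len(img), len(img[0])
    # ink-count projection profiles of the current region
    row = [sum(img[r][left:right]) for r in range(top, bottom)]
    col = [sum(img[r][c] for r in range(top, bottom))
           for c in range(left, right)]
    # prefer a horizontal cut, then a vertical one
    for i in range(1, len(row) - 1):
        if row[i] == 0:
            return (xy_cut(img, top, left, top + i, right) +
                    xy_cut(img, top + i + 1, left, bottom, right))
    for j in range(1, len(col) - 1):
        if col[j] == 0:
            return (xy_cut(img, top, left, bottom, left + j) +
                    xy_cut(img, top, left + j + 1, bottom, right))
    return [(top, left, bottom, right)]  # no empty slice: a leaf frame
```

On scanned pages, stray noise pixels prevent any profile entry from reaching zero, which is precisely why the paper adds a learned verification stage on top of these candidates.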
Conference Paper
Today, the demand for comic content services is increasing because paper magazines and books are bulky, while digital content can be read anytime and anywhere on cellular phones and PDAs. To convert existing print comic materials into a digital format that can be read on the small screens of cellular phones and PDAs, it is necessary to divide each page into scene frames and to determine the reading order of those frames. The division of comic images into scene frames can be considered a type of document layout analysis. We analyzed the layout of comic images using the density gradient; the method can be applied to comics in which balloons or pictures are drawn over the scene frames. We propose a method for detecting the scene frame division in comic images using the density gradient after filling the quadrangle regions in each image with black. Experimental results show that 80 percent of 672 pages in four print comic booklets are successfully divided into scene frames by the proposed method.