Article

Ruled lines detection and removal in grey level handwritten image documents

Abstract

Converting handwritten documents to their machine-written counterparts using Optical Character Recognition has many benefits, such as easier formatting, less storage space, and automatic translation if needed. However, the process is hampered by several kinds of noise; one of them is the intersection of ruled lines with the text. In this paper, we introduce an attempt to detect these ruled lines and remove them without significantly affecting the text strokes in grey-level image documents. The detection stage uses the Hough Transform in four square sub-windows, and the removal stage employs the intensity histogram and its entropy to isolate the text. The removal stage is followed by a morphology-based enhancement of the resulting text. The proposed technique has been tested on several real-world image documents and achieved very good results.
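As an illustration of how a histogram-entropy criterion can separate text from background, the sketch below applies a Kapur-style maximum-entropy threshold to a grey-level histogram. The abstract does not specify the exact criterion, so the choice of Kapur's rule, the function name `max_entropy_threshold`, and the toy histogram are assumptions rather than the authors' method.

```python
import math


def max_entropy_threshold(hist):
    """Return the grey level t that maximises the summed entropies of the
    classes [0..t] and [t+1..end] (Kapur-style criterion, an assumption).

    Returns None when no split leaves both classes non-empty."""
    total = sum(hist)
    best_t, best_score = None, float("-inf")
    for t in range(len(hist) - 1):
        low = sum(hist[: t + 1])   # pixel count at or below t
        high = total - low         # pixel count above t
        if low == 0 or high == 0:
            continue               # one class is empty, entropy undefined
        h_low = -sum(h / low * math.log(h / low)
                     for h in hist[: t + 1] if h > 0)
        h_high = -sum(h / high * math.log(h / high)
                      for h in hist[t + 1:] if h > 0)
        if h_low + h_high > best_score:
            best_t, best_score = t, h_low + h_high
    return best_t


# Toy 8-level histogram: dark ink in levels 0-1, bright paper in levels 6-7.
toy_hist = [10, 10, 0, 0, 0, 0, 10, 10]
threshold = max_entropy_threshold(toy_hist)
```

Assuming dark ink on light paper, pixels at or below the returned level would be kept as text candidates. On the toy histogram the threshold falls in the gap between the two modes.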

Chapter
Even state-of-the-art neural approaches to handwriting recognition struggle when the handwriting is on ruled paper. We thus explore CNN-based methods to remove ruled lines while retaining the parts of the writing that overlap with the ruled line. For that purpose, we devise a method to create a large synthetic dataset for training and evaluation of our models. We show that our best model variants are capable of reconstructing characters that overlap with the line to be removed, which is a problem that simpler approaches often fail to solve. On a dataset of children's handwriting, we show that removing the ruled lines improves character recognition. We made our synthetic dataset and all experimental code available to foster further research in this area.
Conference Paper
In this paper, we present a method for removing ruling lines from handwritten documents without damaging the existing characters. It is argued that ruling lines have a predictable position in the page, but their thickness and the distance between them may differ from one document to another; both are estimated with a simple algorithm. Another important challenge in this regard is detecting the edge of the line. In this paper, the two columns that best represent the edge of the main lines are considered. Compared to other methods, our method, which has been tested on six languages (English, French, German, Greek, Arabic and Persian), offers reduced computation time for rule-line removal and higher performance.
Conference Paper
Paper often includes pre-printed ruling lines to help people write more neatly. This particular example of real-world noise can, however, have a serious impact on applications such as handwriting recognition and writer identification. In this work, we investigate the effects of ruling lines on writer ID. We study a method for detecting and removing ruling lines and test its utility for Arabic writer identification through a series of experiments. Our preliminary results show that under realistic assumptions, where ruling lines are expected to have different properties across the collection (e.g., thickness and spacing), removing them significantly improves identification performance. We conclude with a discussion of work in progress to examine follow-up questions raised by our initial investigations.
Conference Paper
In this paper we present a procedure for removing ruling lines from a handwritten document image that does not break existing characters. We take advantage of common ruling line properties such as uniform width, predictable spacing, position vs. text, etc. The proposed process has no effect on document images without ruling lines, hence no a priori discrimination is required. The system is evaluated on synthetic page images in five different languages.
Conference Paper
Ruling lines are commonly used to help people write neatly on paper. In document analysis, however, they raise hurdles for the tasks of handwriting recognition or writer identification. In this paper, we model ruling line detection as a multi-line linear regression problem and then derive a globally optimal solution under the Least Squares Error. For performance evaluation, we compute the error statistics on the model attributes and also employ human correction of algorithmic results, instead of using pixel-level performance measures. We demonstrate the effectiveness of our method on three datasets, including modern and historic document images. Specifically, we obtained 95% accuracy in detecting ruling lines in a modern handwriting dataset with 100 documents. Under an interactive evaluation framework, the new algorithm showed performance gains over one existing approach.
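One way to read "multi-line linear regression" is a single least-squares fit shared by all ruling lines. The sketch below assumes, purely for illustration, that the lines are parallel and evenly spaced, so that line k satisfies y = a*x + b0 + d*k, and that each sample point already carries its line index k. The abstract does not state this model; the names `fit_ruling_lines` and `solve_linear_system` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Raises ZeroDivisionError if the system is singular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ruling_lines(samples):
    """samples: iterable of (x, y, k), k being the ruling line index.
    Returns [a, b0, d] minimising the squared error of
    y = a*x + b0 + d*k over all samples (assumed model)."""
    rows = [(float(x), 1.0, float(k)) for x, _, k in samples]
    ys = [float(y) for _, y, _ in samples]
    # Normal equations: (A^T A) p = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(ata, aty)


# Synthetic points on three lines: slope 0.01, first offset 50, spacing 30.
points = [(x, 0.01 * x + 50 + 30 * k, k)
          for k in range(3) for x in (0, 100, 200)]
slope, offset, spacing = fit_ruling_lines(points)
```

Fitting all lines jointly lets every sample contribute to the shared slope and spacing, which is the main appeal of a multi-line formulation over fitting each line separately.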
Conference Paper
Converting handwritten documents into their machine-written counterparts automatically requires several processes, including removal of background noise and ruled lines, followed by Optical Character Recognition. In this paper, we present fast detection and removal algorithms for ruled lines in colored scanned handwritten documents. The ruled lines detection is based on the Hough transform of the central 1/9th image rectangle. Once the ruled lines are detected, the removal process, or text isolation, is based on hue histogram segmentation in full-color image documents. The early results show very promising effectiveness and reliability of the proposed method.
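The detection idea can be sketched as a standard Hough vote restricted to the middle cell of a 3x3 grid over the page. This is a minimal illustration on an already binarised image, not the paper's implementation; the function names `central_ninth` and `hough_votes`, the one-degree angle step, and the rounding of rho to whole pixels are all assumptions.

```python
import math
from collections import Counter


def central_ninth(img):
    """Return the middle cell of a 3x3 grid laid over the image."""
    h, w = len(img), len(img[0])
    return [row[w // 3: 2 * w // 3] for row in img[h // 3: 2 * h // 3]]


def hough_votes(binary, angles_deg=range(0, 180)):
    """Accumulate votes for lines rho = x*cos(theta) + y*sin(theta).
    Keys are (angle in degrees, rho rounded to the nearest pixel)."""
    votes = Counter()
    for y, row in enumerate(binary):
        for x, pixel in enumerate(row):
            if not pixel:
                continue
            for angle in angles_deg:
                theta = math.radians(angle)
                rho = round(x * math.cos(theta) + y * math.sin(theta))
                votes[(angle, rho)] += 1
    return votes


# Toy 30x30 page with one horizontal ruled line on row 15.
page = [[1 if y == 15 else 0 for _ in range(30)] for y in range(30)]
window = central_ninth(page)   # rows 10..19, columns 10..19
votes = hough_votes(window)
line_votes = votes[(90, 5)]    # horizontal line at local row 5
```

Working on the central window only cuts the number of voting pixels to roughly one ninth. On such a small window, neighbouring angles can tie with the true one, so a practical detector would need a larger window or finer peak analysis.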
Conference Paper
Analysis of handwritten document images is one of the key areas of research in the image processing domain. The objective of the analysis is to recognize the text components in an image and extract the intended information. However, handwriting is usually inscribed on documents with rule lines, since these act as guidelines that help the writer keep the writing straight and of uniform size. These lines make the task of recognition difficult, and hence removing them automatically becomes a major issue in text image processing. To accomplish this objective, this paper attempts to remove the horizontal rule lines and the vertical margin line for efficient recognition and analysis of the foreground text. Using mathematical morphology, predominant horizontal and vertical lines are removed, leaving behind stray lines that hinder further processing of the text. The stray lines are identified and removed using entropy with a sliding window based on dynamic thresholding.
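The morphological step can be illustrated with a binary opening by a horizontal structuring element, which keeps only pixels lying in long horizontal runs; subtracting the opened image then removes those runs. The sketch below is a simplified stand-in, assuming a binarised image stored as nested lists with 1 for ink. The function names and the `length` parameter are hypothetical, and the entropy-based cleanup of stray lines is not covered.

```python
def horizontal_opening(img, length):
    """Binary opening with a 1 x length horizontal structuring element.
    For a flat horizontal element this equals keeping every foreground
    run that is at least `length` pixels long."""
    out = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if not row[x]:
                x += 1
                continue
            start = x
            while x < len(row) and row[x]:
                x += 1                      # walk to the end of the run
            if x - start >= length:
                for i in range(start, x):
                    out[y][i] = 1           # run is long enough: keep it
    return out


def remove_horizontal_lines(img, length):
    """Clear every pixel that belongs to a detected horizontal line."""
    lines = horizontal_opening(img, length)
    return [[1 if p and not l else 0 for p, l in zip(row, line_row)]
            for row, line_row in zip(img, lines)]


# Toy image: a full-width line on row 1 crossed by two short strokes.
toy = [
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
cleaned = remove_horizontal_lines(toy, length=5)
```

Note that this naive subtraction also erases stroke pixels where the text crosses the line, so some form of stroke restoration would still be needed afterwards.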
Conference Paper
Ruling line removal is an important pre-processing step in document image processing. Several algorithms have been proposed for this task. However, it is important to be able to take full advantage of the existing algorithms by adapting them to the specific properties of a document image collection. In this paper, a system is presented for fine-tuning the parameters of ruling line removal algorithms, or otherwise adapting them to a specific document image collection, in order to improve the results. The application of our method to an existing line removal algorithm is presented.
Conference Paper
In this paper, a new threshold correction method for document image binarization that is focused on ruled-line extraction is presented. This method enhances the binary image of a ruled line, which is often adversely influenced by adjacent text pixels or background noise. The threshold correction method consists of two submethods. One is a noise reduction method that is based on background determination, and the other is a threshold surface conversion method. Both methods use a local straightness feature to distinguish ruled-line pixels from background pixels.