Enhancement and Segmentation of Historical Records

Authors

Soumya A (1) and G Hemantha Kumar (2); (1) R V College of Engineering, India; (2) University of Mysore, India

Abstract

Document Analysis and Recognition (DAR) aims to extract the information in a document automatically and also addresses human comprehension of it. The automatic processing of degraded historical documents is an application of document image analysis that faces many difficulties owing to storage conditions and the complexity of the script. The main aim of enhancing historical documents is to remove undesirable statistics that appear in the background and to highlight the foreground, so that documents can be recognized automatically with high accuracy. This paper addresses pre-processing and segmentation of ancient scripts as an initial step toward automating the task of an epigraphist in reading and deciphering inscriptions. Pre-processing involves enhancement of degraded ancient document images, achieved through four spatial filtering methods for smoothing or sharpening, namely the Median, Gaussian blur, Mean and Bilateral filters, with different mask sizes. This is followed by binarization of the enhanced image using the Otsu thresholding algorithm, to highlight the foreground information. In the second phase, segmentation is carried out using the Drop Fall and Water Reservoir approaches to obtain sampled characters, which can be used in later stages of OCR. The system showed good results when tested on nearly 150 samples of epigraphic images with varying degradation, and gives better enhanced output with a 4x4 mask for the Median filter, a 2x2 mask for Gaussian blur, and a 4x4 mask for the Mean and Bilateral filters. The system can effectively sample characters from enhanced images, giving a segmentation rate of 85%-90% for the Drop Fall technique and 85%-90% for the Water Reservoir technique.
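
The pre-processing stage described in the abstract (spatial filtering followed by Otsu binarization) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `median_filter`, `otsu_threshold` and `binarize` are hypothetical, the image is assumed to be a 2D list of grey levels in 0-255, the foreground is assumed to be darker than the background, and the filter uses an odd, centred 3x3 window rather than the mask sizes reported in the paper.

```python
from statistics import median


def median_filter(img, k=3):
    """k x k median filter on a 2D list of grey levels; edges are replicated.

    Assumption: k is odd, so the window is centred on the pixel.
    """
    if k % 2 == 0:
        raise ValueError("this sketch only handles odd window sizes")
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the neighbourhood, clamping coordinates at the border.
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            out[y][x] = int(median(window))
    return out


def otsu_threshold(img):
    """Return the grey level t that maximises the between-class variance.

    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of grey levels in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(img, t):
    """Mark pixels <= t as foreground (1); assumes dark text on a light background."""
    return [[1 if v <= t else 0 for v in row] for row in img]
```

A typical call sequence would be `cleaned = median_filter(page)`, then `t = otsu_threshold(cleaned)`, then `binary = binarize(cleaned, t)`. For inscriptions where the characters are lighter than the background, the comparison in `binarize` would need to be inverted.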

Keywords

Document Analysis, Preprocessing, Filters, Segmentation, Drop Fall Technique, Water Reservoir Technique

Volume 5, Number 13