A Two Stage Method for Bengali Text Extraction from Still Images Containing Text

Authors

Ankita Sikdar, Payal Roy, Somdeep Mukherjee, Moumita Das and Sreeparna Banerjee, West Bengal University of Technology, India

Abstract

Bengali text present in multimedia images, which combine several content forms such as pictures and text, carries information that has many applications once it is extracted. The images can be of different types: objects and text may be completely separated, may overlap, or may be embedded in each other. The Bengali text can also vary in shape and size. Extracting text from such images is challenging because the textual portion has to be correctly separated from the background. In the proposed method, the input image passes through two stages. The first stage locates the different components in the image using entropy filtering. The second stage distinguishes the components that represent text from the non-textual components, based on several features of Bengali text. The text obtained from the image can then be passed to software such as a Bengali OCR system for character recognition.
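
The two-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the window radius, the entropy threshold, and the aspect-ratio test in `looks_like_text` are assumptions made for illustration, and the aspect-ratio test is only a hypothetical stand-in for the Bengali-specific features that the abstract mentions but does not list. The image is assumed to be a grayscale grid given as a list of rows.

```python
import math
from collections import Counter, deque


def local_entropy(img, radius=1):
    """Shannon entropy (in bits) of the gray levels in a square window around each pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            n = len(window)
            out[y][x] = -sum((c / n) * math.log2(c / n)
                             for c in Counter(window).values())
    return out


def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right) of 8-connected True regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, left = min(top, cy), min(left, cx)
                bottom, right = max(bottom, cy), max(right, cx)
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def looks_like_text(box, min_aspect=1.5):
    """Hypothetical stage-two test: keep wide components (placeholder for real text features)."""
    top, left, bottom, right = box
    return (right - left + 1) / (bottom - top + 1) >= min_aspect


def extract_text_regions(img, entropy_threshold=0.4):
    """Stage 1: threshold the entropy map and find components. Stage 2: filter them."""
    entropy = local_entropy(img)
    mask = [[e > entropy_threshold for e in row] for row in entropy]
    return [box for box in connected_components(mask) if looks_like_text(box)]


# A flat 7x12 background with one textured horizontal strip on row 3.
image = [[0] * 12 for _ in range(7)]
for col in (3, 5, 7):
    image[3][col] = 255
regions = extract_text_regions(image)
```

Flat regions have zero local entropy and drop out in the first stage, while textured regions survive as connected components. A real second stage would replace `looks_like_text` with tests on features specific to Bengali script.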

Keywords

Bengali character feature identification, Connected components, Entropy filtering, Text extraction

Full Text  Volume 2, Number 3