Designing a Business Card Reader (BCR) for mobile devices is challenging
because of the heavy deformation in acquired images, the diverse nature of
business cards and, most importantly, the computational constraints of mobile
devices. This paper presents a text extraction method developed as part of
our work towards a BCR for mobile devices. First, the background of a
camera-captured image is eliminated at a coarse level. Then, various
rule-based techniques are applied to the Connected Components (CCs) to filter
out noise and picture regions.
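
The rule-based filtering of connected components can be illustrated with a minimal sketch. It is not the method of the paper: the specific rules (minimum area, maximum size relative to the image, maximum aspect ratio), their thresholds, and the function names `connected_components` and `looks_like_text` are all assumptions chosen for illustration. The input is assumed to be a binarized image given as a list of rows of 0/1 values.

```python
from collections import deque


def connected_components(binary):
    """Find 8-connected groups of foreground (non-zero) pixels.

    Returns a list of dicts holding the bounding box
    (top, left, bottom, right) and the pixel count of each component.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right, area = y, x, y, x, 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append({"box": (top, left, bottom, right), "area": area})
    return comps


def looks_like_text(comp, img_h, img_w,
                    min_area=3, max_frac=0.5, max_aspect=10.0):
    """Hypothetical rules; all thresholds are illustrative assumptions."""
    top, left, bottom, right = comp["box"]
    height, width = bottom - top + 1, right - left + 1
    if comp["area"] < min_area:
        return False  # too small: treated as speck noise
    if height > max_frac * img_h or width > max_frac * img_w:
        return False  # too large: treated as a picture or graphic region
    if max(height, width) / min(height, width) > max_aspect:
        return False  # very elongated: treated as a rule line or border
    return True
```

A caller would keep only the components for which `looks_like_text` returns `True`. On a resource-constrained device, a single pass of cheap geometric tests like these avoids running a classifier on every component.
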
Business card images are diverse in nature, as they often contain graphics,
pictures and text in various fonts and sizes, in both background and
foreground. Consequently, conventional binarization techniques designed for
document images cannot be directly applied on mobile devices. In this paper,
we present a fast binarization technique for camera-captured business card
images. A card image is split into small blocks. Some of these blocks are
classified as part of the background based on intensity variance.
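
The block-splitting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the technique of the paper: the block size, the variance threshold, the rule that low-variance blocks are background, and the names `block_variance` and `classify_blocks` are all hypothetical. The input is assumed to be a grayscale image given as a list of rows of intensity values.

```python
def block_variance(gray, top, left, size):
    """Population variance of the intensities in one block.

    Blocks on the right or bottom edge may be smaller than size x size;
    slicing truncates them automatically.
    """
    values = [v for row in gray[top:top + size]
              for v in row[left:left + size]]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def classify_blocks(gray, size=4, var_threshold=100.0):
    """Label each block by comparing its variance with a threshold.

    Assumption: a nearly uniform block (low variance) contains no text
    strokes and is treated as background; the rest is left as foreground
    for later processing.
    """
    h, w = len(gray), len(gray[0])
    labels = {}
    for top in range(0, h, size):
        for left in range(0, w, size):
            var = block_variance(gray, top, left, size)
            labels[(top, left)] = (
                "background" if var < var_threshold else "foreground"
            )
    return labels
```

Each pixel is visited once, so the cost is linear in the image size, which suits the computational limits of a mobile device. The returned dictionary maps the top-left corner of each block to its label.
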
India is a multilingual country where Roman script is often used alongside
different Indic scripts in a text document. To develop a script-specific
handwritten Optical Character Recognition (OCR) system, it is therefore
necessary to identify the scripts of handwritten text correctly.
A novel approach for recognition of handwritten compound Bangla characters,
along with the basic characters of the Bangla alphabet, is presented here.
Compared with Roman scripts such as English, a major stumbling block in Optical
Character Recognition (OCR) of handwritten Bangla script is the large number of
complex-shaped character classes in the Bangla alphabet. In addition to 50
basic character classes, there are nearly 160 complex-shaped compound character
classes in the Bangla alphabet.