ISSN(Online): 2320-9801
ISSN (Print): 2320-9798
International Journal of Innovative Research in Computer
and Communication Engineering
(An ISO 3297: 2007 Certified Organization)
Vol. 3, Special Issue 2, March 2015
Image and Video Processing Application
K. Ponniyun Selvan¹, R. Yogapriya², M. Yuvasri³
¹ Associate Professor, Department of IT, R.M.K. Engineering College, Tamilnadu, India
² UG Student, Department of IT, R.M.K. Engineering College, Tamilnadu, India
³ UG Student, Department of EEE, R.M.K. Engineering College, Tamilnadu, India
ABSTRACT: This paper proposes an application that provides a platform for self-learning using image and video
processing. The image component is implemented using Content Based Image Retrieval: the feature vector of an
uploaded image is compared with those of the images already available in the database, and if the feature vectors
match, the text information related to the image is displayed. For images that contain text, the text is extracted
from the image, and information related to that text is retrieved and displayed. Visual speech recognition from a
video lecture in which the tutor addresses the audience from a stationary position is done by lip information
extraction. The speech is displayed simultaneously as continuous text, and it can be saved separately as a document
for future reference.
KEYWORDS: Image process, Content Based Image Retrieval (CBIR), text extraction, video process, lip movement
detection.
I. INTRODUCTION
This is a proposal for content (image) based text retrieval and for simultaneous text display for video by analysing
lip movement. In the current scenario, text based content retrieval needs keywords such as metadata or tags to be
assigned to an image manually when the image is uploaded to a database. This metadata is essential for finding the
image. Here we propose that the image uploaded by the user is compared with the images already available in the
database by image comparison. When the image parameters match, the keyword of the matched image is used to retrieve
its related description, which is displayed as text content. Consider the scenario of a video lecture in which the
tutor is not walking to and fro, or of a person talking while sitting in front of a camera, for example a webinar, a
live discussion for a conference, online video lectures, or an address by a celebrity. The lip movement is read
continuously, and the content is identified using a continuous sentence recognizer. The content is displayed in text
format simultaneously, and it can be used later for future reference.
II. LITERATURE SURVEY
Content based image retrieval (CBIR) is a technique in which the content of an image is used as the matching
criterion instead of the image's metadata such as keywords, tags, or any name associated with the image [2]. This
provides a closer approximate match than text based image retrieval. The term “content” in this context might refer
to colours, shapes, textures, or any other information that can be derived from the image itself [1] [7].
Upper and lower lips in side-face images are modelled by two line components. An angle between the two lines is used
as the lip-contour geometric features (LCGFs). The angle is hereafter referred to as “lip-angle.” The lip-angle
extraction process consists of three components: detecting a lip area, extracting a center point of lips, and
determining lip-lines and a lip-angle [3].
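As an illustration of the lip-angle idea only, the following Python sketch computes the angle between two lip-lines
that meet at the lip centre point. The function name `lip_angle` and its point arguments are hypothetical; how [3]
detects the lip area and fits the two lines is not reproduced here.

```python
import math


def lip_angle(center, upper_point, lower_point):
    """Angle in degrees between the upper and lower lip-lines.

    center:      (x, y) centre point of the lips (hypothetical input).
    upper_point: (x, y) any other point on the upper lip-line.
    lower_point: (x, y) any other point on the lower lip-line.
    """
    # Direction vectors of the two lines, both starting at the centre point.
    ux, uy = upper_point[0] - center[0], upper_point[1] - center[1]
    lx, ly = lower_point[0] - center[0], lower_point[1] - center[1]
    if (ux, uy) == (0, 0) or (lx, ly) == (0, 0):
        raise ValueError("each lip-line needs a point distinct from the centre")
    dot = ux * lx + uy * ly      # proportional to the cosine of the angle
    cross = ux * ly - uy * lx    # proportional to the sine of the angle
    # atan2 is numerically stable for both small and large angles.
    return math.degrees(abs(math.atan2(cross, dot)))
```

A wider mouth opening gives a larger angle, so a sequence of such angles over successive frames could serve as a
simple one-dimensional visual feature.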
Apache Hadoop is a collection of open-source software projects for reliable, scalable, distributed computing.
The Hadoop software library is a framework that allows distributed processing of large data sets across
clusters of computers using simple programming models. It is designed to scale up from a single machine to thousands
of machines, each offering local computation and storage [4][5].
The recognized lip area is analysed frame by frame and compared with the content stored in the database, and the
respective word is generated [6].
III. IMAGE PROCESS
Image processing is any form of signal processing for which the input is an image, such as a photograph or video
frame. The output of image processing may be either an image or a set of characteristics or parameters related to the
image. Image retrieval based on colour actually means retrieval based on colour descriptors. The most commonly used
colour descriptors are the colour histogram, colour coherence vector, colour correlogram, and colour moments. A colour
histogram identifies the proportion of pixels within an image holding specific values, which can be used to find the
similarity between two images by using similarity distance measures [1], [2]. It tries to identify colour proportion
by region and is independent of image size, format or orientation.
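A minimal sketch of a normalised colour histogram is given below, assuming the image is already available as a list
of (r, g, b) tuples with values from 0 to 255. The function name `colour_histogram` and the choice of 4 bins per
channel are illustrative assumptions, not part of the proposed system. Because the histogram stores proportions
rather than raw counts, it does not change with image size or pixel order.

```python
from collections import Counter


def colour_histogram(pixels, bins_per_channel=4):
    """Return a normalised RGB histogram as a list of proportions.

    pixels: iterable of (r, g, b) tuples, each value in 0..255.
    The result has bins_per_channel ** 3 entries that sum to 1.
    """
    counts = Counter()
    total = 0
    for r, g, b in pixels:
        # Quantise each channel to a bin index in 0..bins_per_channel - 1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        # Fold the three bin indices into a single histogram index.
        counts[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
        total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return [counts[i] / total for i in range(bins_per_channel ** 3)]
```

For example, an image made of equal numbers of pure red and pure blue pixels yields the same histogram whether it
contains 2 pixels or 200.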
Figure 1: Upload image
This figure shows the user interface for uploading an image. The image is uploaded by clicking the Upload Image
button. After the image has been uploaded, click the OK button; to cancel the process, click the CANCEL button.
IV. IMAGE CONVERSION
The colour histogram of an image gives information about its structure: the number of pixels, their colours in RGB
format, etc. With this information we calculate the mean, entropy and median of the image, which can be used as
features of the image for feature extraction. These form the feature vector. This feature vector can be compared with
the queried image's feature vector to retrieve all similar images from the database [7].
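The feature computation described above might look like the following sketch, which assumes a single channel of
integer intensity values (0 to 255) and uses the Shannon entropy of the intensity histogram. The function name
`histogram_features` and the order of the three features are assumptions made for illustration.

```python
import math
from statistics import mean, median


def histogram_features(levels):
    """Return [mean, entropy, median] for one channel of an image.

    levels: non-empty list of integer intensity values (0..255).
    """
    n = len(levels)
    # Histogram: how many pixels hold each intensity value.
    counts = {}
    for v in levels:
        counts[v] = counts.get(v, 0) + 1
    # Shannon entropy (in bits) of the intensity distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [mean(levels), entropy, median(levels)]
```

Applying the function to each of the R, G and B channels and concatenating the results would give a nine-element
feature vector per image; that layout is one possible choice, not one stated in the proposal.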
V. SIMILARITY MATCHING
The feature vector of each uploaded image is stored in the database. This feature vector is matched with the feature
vector of an input image. Both feature vectors are used to calculate a similarity coefficient. The Map-Reduce
technique is used at the similarity matching stage of the system. It can also be used efficiently in the feature
extraction process by splitting the image into parts at the map stage and then combining them at the reduce stage.
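The matching stage can be sketched in plain Python as a map step followed by a reduce step. The text does not say
which similarity coefficient is used, so cosine similarity is assumed here; `cosine_similarity` and `best_match` are
hypothetical names, and the built-in `map` and `functools.reduce` only stand in for a real Hadoop job.

```python
import math
from functools import reduce


def cosine_similarity(a, b):
    """Similarity coefficient in [-1, 1] between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query, stored):
    """Return (image_id, score) of the stored vector most similar to query.

    stored: non-empty dict mapping an image id to its feature vector.
    """
    if not stored:
        raise ValueError("no stored feature vectors to compare against")
    # Map stage: each stored vector is scored independently of the others,
    # so this step could run in parallel on separate nodes.
    scored = map(lambda item: (item[0], cosine_similarity(query, item[1])),
                 stored.items())
    # Reduce stage: keep only the pair with the highest score.
    return reduce(lambda x, y: x if x[1] >= y[1] else y, scored)
```

A caller would typically compare the returned score with a threshold before treating the pair as a match; the
threshold value would have to be chosen experimentally.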
VI. IMAGE RETRIEVAL
When the feature vector of the processed image matches that of an image already available in the database, the
details about the image are found.
Figure 2: Image Search and Retrieval
Both feature vectors are compared; if they match, the related content is fetched from the database and displayed as a
text description.
VII. DESCRIPTION DISPLAY
The picture below shows how the description of the image is shown in the user interface.
Figure 3: Description About Uploaded Image
In case the uploaded image contains text, the text is identified using Optical Character Recognition (OCR). Content
related to the retrieved text is searched for in the database and then displayed as shown in Fig. 3. In this way
images with text can also be made readable, and information regarding them is obtained.
VIII. VIDEO PROCESS
a. Face and Mouth Position Detection:
Features used for face detection are grey-level differences between sums of pixel values in different rectangular
regions in an image window. The window slides over the image and changes its
scale. Image features may be computed rapidly for any scale and location in a video frame using integral images. The
face detection algorithm finds the location of all faces in every video frame. It is assumed that only one person is
present in the camera field of view; therefore only the first face location is used for further processing. In order
to increase the speed of face detection and to make sure that the face is large enough to recognize lip gestures, the
minimal width of a face was set to half of the image frame width. Sample results of face detection and mouth region
finding are pictured in Fig. 4. The mouth region is localized arbitrarily in the lower part of the detected face
region. It is defined by the half-ellipse horizontally centred in the lower half of the face region. The width and the
height of the half-ellipse are equal to half of the height and half of the width of the face region, respectively.
Only the mouth region of each video frame is used for lip gesture recognition.
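The role of the integral image can be illustrated with the sketch below, which assumes a grey-level frame given as a
list of rows. Once the table is built, the sum of any rectangle costs four lookups regardless of its size, which is
why features can be evaluated quickly at every scale and location. The function names are hypothetical, and the
two-rectangle feature is only one example of a grey-level difference between rectangular regions.

```python
def integral_image(gray):
    """Build a summed-area table with one extra row and column of zeros.

    gray: non-empty list of equal-length rows of pixel values.
    ii[y][x] holds the sum of all pixels above and to the left of (x, y).
    """
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y) and size w x h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Grey-level difference between the left and right halves of a window."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```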
Figure 4: Lip region detection
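The geometric rules for choosing the face and placing the mouth region can be written as a short sketch. Faces are
assumed to arrive as (x, y, w, h) rectangles from any face detector. The names `select_face` and `mouth_region` are
hypothetical, and the vertical placement of the half-ellipse (its bounding box starting at the top of the lower half
of the face) is an assumption, since the text only fixes its horizontal centring and size.

```python
def select_face(faces, frame_width):
    """Return the first detected face if it is wide enough, else None.

    faces: list of (x, y, w, h) rectangles; only the first one is used
    because a single speaker is assumed to be in view.
    """
    if not faces:
        return None
    first = faces[0]
    # The minimal face width is half of the frame width.
    return first if first[2] >= frame_width / 2 else None


def mouth_region(face):
    """Bounding box (left, top, width, height) of the mouth half-ellipse."""
    x, y, w, h = face
    ellipse_w = h / 2                  # width  = half of the face height
    ellipse_h = w / 2                  # height = half of the face width
    left = x + (w - ellipse_w) / 2     # horizontally centred in the face
    top = y + h / 2                    # assumption: start of the lower half
    return (left, top, ellipse_w, ellipse_h)
```

For a face rectangle of (100, 50, 200, 240), this gives a mouth bounding box of (140.0, 170.0, 120.0, 100.0).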
b. Lip Gesture Recognition
The lip region size is not constant and the region moves and tilts according to the results of lip shape approximation.
Figure 5: Find the alphabet based on lip movement
The above picture shows the relation between the lip gesture and the pronunciation of each letter. For each letter,
such as “a” or “o”, the lip movement is different. This is read from the monitoring region [6].
IX. HADOOP
Apache Hadoop is open source software for storing and processing large-scale data on commodity hardware [4].
Apache Hadoop consists of several modules:
• Hadoop Common: the libraries needed for running the other Hadoop modules.
• Hadoop Distributed File System (HDFS): the storage layer, which stores large amounts of data on commodity
hardware.
• Hadoop YARN: used for scheduling cluster resources and users' applications.
• Hadoop Map-Reduce: processes large data sets in parallel to improve performance.
Hadoop has been designed so that its software framework can automatically detect and handle hardware failures.
Hadoop Map-Reduce and HDFS are modelled on Google's MapReduce and the Google File System.
X. BENEFITS
• The background disturbance in the video is not converted to text, since only lip movement is considered.
• The video conversion is useful for deaf people, who will be able to get the content of the video as text.
• Image processing is useful for finding information about the images of interest.
• If no content related to the image is available, a description of its colour is displayed.
• Unrelated content is avoided during the search.
XI. CONCLUSION AND FUTURE WORK
This application is a proposal intended mainly for the benefit of partially hearing and deaf people and as a
self-learning tool. It processes image and video and provides a related text file that can be stored for future
reference. Disturbance and noise in the video are eliminated, so no information is lost. We plan to develop the text
processing technique further, incorporate more features, and implement it in a real time environment.
REFERENCES
[1] D. Sasirekha, E. Chandra, “Text to speech: a simple tutorial,” International Journal of Soft Computing and
Engineering (IJSCE), ISSN: 2231-2307, Volume-2, Issue-1, March 2012.
[2] Arpit D. Dongaonkar, “Private Content Based Image Information Retrieval using Map-Reduce,” Department of Computer
Engineering and Information Technology, College of Engineering, Pune-5, June 2013.
[3] Koji Iwano, Tomoaki Yoshinaga, Satoshi Tamura, and Sadaoki Furui, “Audio-visual speech recognition using lip
information extracted from side-face images,” Department of Computer Science, Tokyo Institute of Technology, Japan.
Received 12 July 2006; Revised 24 January 2007; Accepted 25 January 2007.
[4] Apache Hadoop. [Online]. Available: http://hadoop.apache.org/
[5] Liangliang Shi, Bin Wu, Bai Wang and Xuguang Yan, “Map/reduce in CBIR application,” International Conference on
Computer Science and Network Technology (ICCSNT), Vol. 4, pp. 2465-2468, Dec. 2011.
[6] Piotr Dalka, Andrzej Czyzewski, “Human-computer interface based on visual lip movement and gesture recognition,”
Gdansk University of Technology, Multimedia Systems Department. International Journal of Computer Science and
Applications, Technomathematics Research Foundation, Vol. 7 No. 3, pp. 124-139, 2010.
[7] Shankar M. Patil, “Content Based Image Retrieval Using Color, Texture and Shape,” International Journal of
Computer Science & Engineering Technology (IJCSET), Vol. 3, Sept. 2012.
Copyright to IJIRCCE
www.ijircce.com