ISSN (Online): 2320-9801
ISSN (Print): 2320-9798

International Journal of Innovative Research in Computer and Communication Engineering
(An ISO 3297: 2007 Certified Organization)
Vol. 3, Special Issue 2, March 2015

Image and Video Processing Application

K. Ponniyun Selvan 1, R. Yogapriya 2, M. Yuvasri 3

Associate Professor, Department of IT, R.M.K. Engineering College, Tamilnadu, India 1
UG Student, Department of IT, R.M.K. Engineering College, Tamilnadu, India 2
UG Student, Department of EEE, R.M.K. Engineering College, Tamilnadu, India 3

ABSTRACT: This paper proposes an application that provides a platform for self-learning using image and video processing. The image part is implemented using Content Based Image Retrieval: the feature vector of an uploaded image is compared with the feature vectors of images already available in the database and, if a match is found, the text information related to the image is displayed. For images that contain text, the text is extracted and information related to it is retrieved and displayed. For video lectures in which the tutor speaks from a fixed position, visual speech recognition is performed by lip information extraction. The recognized speech is displayed as continuous text while the video plays, and it can be saved separately as a document for future reference.

KEYWORDS: Image process, Content Based Image Retrieval (CBIR), text extraction, video process, lip movement detection.

I. INTRODUCTION

This is a proposal for content (image) based text retrieval and for simultaneous text display for video by analysing lip movement. At present, text-based content retrieval needs keywords, such as metadata or tags, to be assigned manually when an image is uploaded to a database. This metadata is essential for finding the image. Here we propose that the image uploaded by the user be compared with the images already available in the database by image comparison.
When the image parameters are matched using the keyword of the image, the related description is displayed as text content.

Consider a video lecture in which the tutor does not walk to and fro, or a person talking while seated in front of a camera, for example a webinar, a live conference discussion, an online video lecture, or a celebrity address. The lip movement is read continuously and the content is identified using a continuous sentence recognizer. The content is displayed in text format simultaneously and can be saved for future reference.

II. LITERATURE SURVEY

Content based image retrieval (CBIR) is a technique in which the content of an image is used as the matching criterion instead of the image's metadata, such as keywords, tags, or any name associated with the image [2]. This provides a closer approximate match than text-based image retrieval. The term "content" in this context might refer to colours, shapes, textures, or any other information that can be derived from the image itself [1] [7].

Upper and lower lips in side-face images are modelled by two line components. The angle between the two lines is used as the lip-contour geometric feature (LCGF). The angle is hereafter referred to as the "lip-angle." The lip-angle extraction process consists of three components: detecting a lip area, extracting a center point of the lips, and determining lip-lines and a lip-angle [3].

Apache Hadoop is a collection of open-source software projects for reliable, scalable, distributed computing. The Hadoop software libraries provide a framework that allows distributed processing of large data sets across clusters of computers using simple programming models.
It is designed to scale up from a single machine to thousands of machines, each offering local computation and storage [4][5].

The recognized lip area is analysed frame by frame and compared with the content stored in the database, and the respective word is generated [6].

III. IMAGE PROCESS

Image processing is any form of signal processing for which the input is an image, such as a photograph or video frame. The output of image processing may be either an image or a set of characteristics or parameters related to the image.

Image retrieval based on colour actually means retrieval based on colour descriptors. The most commonly used colour descriptors are the colour histogram, colour coherence vector, colour correlogram, and colour moments. A colour histogram identifies the proportion of pixels within an image holding specific values, which can be used to find the similarity between two images by using similarity distance measures [1], [2]. It tries to identify colour proportion by region and is independent of image size, format or orientation.

Figure 1: Upload image

Figure 1 shows the user interface for uploading an image. The image is uploaded by clicking the upload image button. After the image has been uploaded, the user clicks the OK button; clicking the CANCEL button cancels the process.

IV. IMAGE CONVERSION

The colour histogram of an image gives information about its structure, such as the number of pixels and their colours in RGB format. From this information we calculate the mean, entropy and median of the image, which serve as its features during feature extraction. Together they form the feature vector. This feature vector can be compared with the feature vector of a queried image to retrieve all similar images from the database [7].

V. SIMILARITY MATCHING

The feature vector of each uploaded image is stored in the database. It is matched against the feature vector of an input image, and both feature vectors are used to calculate a similarity coefficient. The Map-Reduce technique is used at the similarity matching stage of the system.
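The feature extraction and similarity matching steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an image is given as a flat list of 0-255 intensity values, and the similarity coefficient `1 / (1 + Euclidean distance)` and the function names `feature_vector`, `similarity` and `best_match` are hypothetical choices, since the paper does not specify them.

```python
import math
from collections import Counter
from statistics import mean, median


def feature_vector(pixels):
    """Return [mean, entropy, median] for a flat list of intensity values."""
    histogram = Counter(pixels)  # intensity value -> number of pixels
    total = len(pixels)
    # Shannon entropy (in bits) of the normalised histogram.
    entropy = -sum((n / total) * math.log2(n / total) for n in histogram.values())
    return [mean(pixels), entropy, median(pixels)]


def similarity(a, b):
    """Assumed similarity coefficient in (0, 1]; 1.0 means identical vectors."""
    return 1.0 / (1.0 + math.dist(a, b))


def best_match(query, database):
    """Return (name, score) of the stored feature vector closest to the query."""
    scored = ((name, similarity(query, vec)) for name, vec in database.items())
    return max(scored, key=lambda item: item[1])


database = {
    "dark": feature_vector([0, 0, 10, 10]),
    "bright": feature_vector([240, 240, 255, 255]),
}
name, score = best_match(feature_vector([235, 240, 250, 255]), database)
print(name, round(score, 3))
```

In a real system the matched name would be the key used to fetch the stored description, and a threshold on the score would decide whether the match counts at all.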
Map-Reduce can also be used efficiently in the feature extraction process by splitting the image into parts at the map stage and combining the results at the reduce stage.

VI. IMAGE RETRIEVAL

The details of the image are found from the match between the feature vector of the processed image and that of an image already available in the database.

Figure 2: Image Search and Retrieval

The two feature vectors are compared; if they match, the related content is fetched from the database and displayed as a text description.

VII. DESCRIPTION DISPLAY

Figure 3 shows how the description of the image is presented in the user interface.

Figure 3: Description About Uploaded Image

If the uploaded image contains text, the text is identified using Optical Character Recognition (OCR). Content related to the extracted text is searched for in the database and displayed as shown in Fig. 3. In this way images containing text are also made readable, and information about them is obtained.

VIII. VIDEO PROCESS

a. Face and Mouth Position Detection

The features used for face detection are grey-level differences between sums of pixel values in different rectangular regions of an image window. The window slides over the image and changes its scale. Image features can be computed rapidly for any scale and location in a video frame using integral images. The face detection algorithm finds the location of all faces in every video frame.
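The role of integral images can be illustrated with a short sketch. It assumes a grey-level image stored as a list of rows; the function names and the particular two-rectangle feature (left half minus right half) are illustrative assumptions rather than the detector used in [6]. Once the integral image is built, the sum over any rectangle costs four lookups regardless of its size, which is what makes features cheap at every scale and location.

```python
def integral_image(img):
    """Build a summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels above and to the left of (x, y).
    """
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Illustrative feature: sum of the left half minus sum of the right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
ii = integral_image(image)
print(rect_sum(ii, 1, 0, 2, 2))          # 2 + 3 + 6 + 7
print(two_rect_feature(ii, 0, 0, 4, 2))  # (1 + 2 + 5 + 6) - (3 + 4 + 7 + 8)
```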
It is assumed that only one person is present in the camera's field of view; therefore only the first face location is used for further processing. To increase the speed of face detection and to make sure that the face is large enough for lip gestures to be recognized, the minimum width of a face was set to half of the image frame width. Sample results of face detection and mouth region finding are pictured in Fig. 4.

The mouth region is localized arbitrarily in the lower part of the detected face region. It is defined by a half-ellipse horizontally centred in the lower half of the face region. The width and the height of the half-ellipse are equal to half of the height and half of the width of the face region, respectively. Only the mouth region of each video frame is used for lip gesture recognition.

Figure 4: Lip region detection

b. Lip Gesture Recognition

The lip region size is not constant, and the region moves and tilts according to the results of lip shape approximation.

Figure 5: Find the alphabet based on lip movement

Figure 5 illustrates the relation between the lip gesture and the pronunciation of each letter. The lip movement differs for each letter, such as "a" and "o". This is read from the monitoring region [6].

IX. HADOOP

Apache Hadoop is open-source software for storing and processing large-scale data on commodity hardware [4]. Apache Hadoop consists of several modules.

• Hadoop Common: The common libraries and utilities needed by the other Hadoop modules.
• Hadoop Distributed File System (HDFS): The storage layer, which stores large amounts of data on commodity hardware.
• Hadoop YARN: Used for scheduling cluster resources and users' applications.
• Hadoop Map-Reduce: Processes large data sets in parallel, which can shorten processing time.

Hadoop has been designed so that its software framework can automatically detect and handle hardware failures.
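The Map-Reduce pattern that the proposal applies at the similarity matching stage can be imitated in a few lines of plain Python. This sketch does not use the Hadoop API; it only mimics the map, group and reduce steps in a single process. The function names, the `(label, feature vector)` record layout and the similarity coefficient `1 / (1 + distance)` are assumptions made for illustration.

```python
import math
from collections import defaultdict


def map_phase(query, split):
    """Map: emit (label, similarity) for every stored vector in one data split."""
    for label, vec in split:
        yield label, 1.0 / (1.0 + math.dist(query, vec))


def reduce_phase(label, scores):
    """Reduce: keep the best similarity seen for one label."""
    return label, max(scores)


def run_job(query, splits):
    """Map every split, group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for split in splits:  # on a cluster, each split would go to its own mapper
        for label, score in map_phase(query, split):
            grouped[label].append(score)
    results = [reduce_phase(label, scores) for label, scores in grouped.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


splits = [
    [("cat", [1.0, 2.0]), ("dog", [5.0, 5.0])],
    [("cat", [1.0, 1.0]), ("car", [9.0, 9.0])],
]
for label, score in run_job([1.0, 1.0], splits):
    print(label, round(score, 3))
```

Because each split is scored independently, the map step is the part that would run in parallel across machines; only the small per-label score lists need to be brought together for the reduce step.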
Hadoop Map-Reduce and HDFS were inspired by Google's MapReduce and the Google File System.

X. BENEFITS

Background disturbance in the video is not converted to text, since only lip movement is considered. The video conversion is useful for deaf users, who are able to get the content of the video as text. Image processing is useful for finding information about the images of interest. If no content related to the image is available, a description of its colour is displayed. Unrelated content is avoided during the search.

XI. CONCLUSION AND FUTURE WORK

This application is a proposal intended mainly for the benefit of partially hearing and deaf people and as a self-learning tool. It processes images and video and provides a related text file that can be stored for future reference. Disturbance and noise in the video are excluded, so they do not affect the extracted information. We plan to develop the text processing technique further, incorporate more features, and implement the system in a real-time environment.

REFERENCES

[1] D. Sasirekha and E. Chandra, "Text to speech: a simple tutorial," International Journal of Soft Computing and Engineering (IJSCE), ISSN: 2231-2307, Vol. 2, Issue 1, March 2012.
[2] Arpit D. Dongaonkar, "Private Content Based Image Information Retrieval using Map-Reduce," Department of Computer Engineering and Information Technology, College of Engineering, Pune-5, June 2013.
[3] Koji Iwano, Tomoaki Yoshinaga, Satoshi Tamura, and Sadaoki Furui, "Audio-visual speech recognition using lip information extracted from side-face images," Department of Computer Science, Tokyo Institute of Technology, Japan. Received 12 July 2006; Revised 24 January 2007; Accepted 25 January 2007.
[4] Apache Hadoop. [Online]. Available: http://hadoop.apache.org/
[5] Liangliang Shi, Bin Wu, Bai Wang, and Xuguang Yan, "Map/reduce in CBIR application," International Conference on Computer Science and Network Technology (ICCSNT), Vol. 4, pp. 2465-2468, Dec. 2011.
[6] Piotr Dalka and Andrzej Czyzewski, "Human-computer interface based on visual lip movement and gesture recognition," Gdansk University of Technology, Multimedia Systems Department. International Journal of Computer Science and Applications, Technomathematics Research Foundation, Vol. 7, No. 3, pp. 124-139, 2010.
[7] Shankar M. Patil, "Content Based Image Retrieval Using Color, Texture and Shape," International Journal of Computer Science & Engineering Technology (IJCSET), Vol. 3, Sept. 2012.

Copyright to IJIRCCE www.ijircce.com