
Multimedia Content Analysis
Dr. Alan Hanjalic
Information and Communication Theory Group
Department of Mediamatics
Delft University of Technology
What is Multimedia Content Analysis?
• MCA
  • A research direction within Multimedia Retrieval targeting the extraction of content-related information from multimedia data
[Figure: algorithms map audiovisual data to processed audiovisual data carrying content labels such as “News report on topic T”, “Alpine landscape” or “Suspicious behavior”]
What is Multimedia Content Analysis?
• Multimedia CA versus Audiovisual CA?
  • Search for synergy, not for a simple sum
• Integration of information from different modalities already at the low-level processing steps
• Combining features from different modalities in multi-modal content models
• Letting the “small pieces” of information from different modalities complement each other at various levels of the content analysis process, providing reliable input for reasoning at the highest level
MCA Case Study:
Video Content Analysis (VCA)
• Video → “true” multimedia
  • Synchronized visual, audio (music, speech) and text modalities
  • Communicated information → a “synergy” of information segments carried by different modalities
• Vast popularity of digital video
  • Compression technology
  • High-capacity digital storage systems
  • Affordable digital cameras
  • Access to Internet and broadband networks
• Emerging huge collections of digital video → digital video libraries (DVL)
Benefits of VCA:
Handling Digital Video Broadcasting (1)
• Broadcasters are moving to a fully digital video production, transmission and receiving chain
• Growing popularity of high-capacity digital storage devices among consumers
→ Huge numbers of video hours instantaneously accessible to the user
Benefits of VCA:
Handling Digital Video Broadcasting (2)
• The explosion in consumer choice has important consequences for TV broadcast “consumption”
• Changes in the understanding of the broadcasting mechanism
  • The concept of a “channel” will lose its meaning
• Broadcast material will be recorded routinely and automatically
  • Programs will be accessed on demand, from local storage
  • Viewing of live TV is likely to diminish drastically over time
Benefits of VCA:
Handling Digital Video Broadcasting (3)
• VCA can make a difference!
• Securing maximum transparency of the stored content
  • Efficiently and effectively organizing and presenting the stored content to the user
    → Automated video abstraction
  • Channeling the stored video material to the user according to his preferences
    → Personalized video delivery
Where else can VCA make a difference?
• Remote education
  • Instructional video archives easily manageable, searchable and reusable
• Business
  • Summarization and topic-based organization of conference/meeting videos
  • Internet video collections easily manageable, searchable and reusable
• Security/Public safety
  • Smart cameras for video surveillance
Video Content Analysis:
Data → Features → Semantics
• Features (signal/data properties)
  - Color composition
  - Shape and texture characteristics
  - Camera and object motion intensity/direction
  - Speech and audio signal properties
  - …
• Semantics (signal/data meaning)
  - News report on the Euro
  - Car chase through NY
  - Dialog between A and B
  - An interview
  - Happiness
  - Romance
  - …
→ The “semantic gap” separates the feature level from the semantic level
Development of the VCA research area
• Shot-boundary detection (1991 – )
  • Parsing a video into continuous camera shots
• Still and dynamic video abstracts (1992 – )
  • Making video browsable via representative frames (keyframes)
  • Generating short clips carrying the essence of the video content
  (together: low-level VCA)
• High-level parsing (1997 – )
  • Parsing a video into semantically meaningful segments
• Automatic annotation (indexing) (1999 – )
  • Detecting prespecified events/scenes/objects in video
• Affective VCA (2001 – )
  • Extracting moods and emotions conveyed by the video content
  (together: high-level VCA)
• Future: Multimedia Content Mining and Knowledge Discovery!
Toward the Meaning of Video Data:
How to bridge the semantic gap?
• The “catch”:
  • Integration of feature-based evidence and domain knowledge
• Example: finding the appearances of a news reader in TV news
  • Assumptions based on domain knowledge:
    • Shots with the news reader are characterized by a (more-or-less) constant studio background
    • The news reader’s face is the only face in the program appearing multiple times in a (more-or-less) constant setting
    • Speech signal characteristics are the same in all shots featuring the same news reader
• Important: the level of prior knowledge
  • depends on the context and nature of the application
  • determines the flexibility and applicability scope of a method
Accessing Video by Text Labels
• “Classical” video indexing approach
  • Classify video clips according to content labels
• Content label
  • Serves as an index (annotation, label) for a given video clip in a video archive
  • Enables easy retrieval of video clips
  • Typically prespecified by the user
• Examples of labels:
  • News: “Parliament”, “United Nations”, “Amsterdam”, “Foreign politics”
  • Movie: “Action”, “Romance”, “Dialog”, “Car chase”
  • Wildlife documentary: “Hunt”, “Running lion”
[Figure: video indexing assigns labels such as “News report on topic T”, “Dialog” or “Score in a soccer game” to video frames, producing labeled video clips]
Video Indexing:
General approach
• Generally, a pattern classification problem
  • Assigning a label to a clip according to the match between the content model representing the label and the pattern formed by the features of the data in the clip
• A simple, illustrative example (Yeung and Yeo, 1997)
  • Apply time-constrained clustering to shots
  • Label all shots from one and the same cluster with the same letter
  • Search for content patterns in the resulting series of labels, e.g.

    XYZABABABCDEFEDEGHIABCDEBFGEHIBABCE

    • an alternating pattern such as ABABAB → dialog
    • a run of quickly changing labels → action
    • detection is controlled by “noise” labels and a minimum allowed repetition of labels
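As a sketch of this pattern search over the label series above, the hypothetical helper below scans a shot-label string for dialog-like alternations and action-like runs of distinct labels; the regular expression, window size and thresholds are illustrative assumptions, not Yeung and Yeo’s exact procedure.

```python
import re

def find_patterns(labels: str, min_pairs: int = 3, action_win: int = 5):
    """Scan a shot-label series for dialog- and action-like content patterns."""
    events = []
    # Dialog: two distinct labels alternating at least `min_pairs` times (ABABAB...).
    for m in re.finditer(r"(\w)(\w)(?:\1\2){%d,}" % (min_pairs - 1), labels):
        if m.group(1) != m.group(2):
            events.append(("dialog", m.start(), m.end()))
    # Action: a window in which every label is distinct (overlapping hits not merged here).
    for i in range(len(labels) - action_win + 1):
        if len(set(labels[i:i + action_win])) == action_win:
            events.append(("action", i, i + action_win))
    return events

print(find_patterns("XYZABABABCDEFEDEGHIABCDEBFGEHIBABCE"))
```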
Video Indexing via Content Modeling
• More indexing robustness and more difficult indexing criteria require a more sophisticated indexing approach
• A promising approach:
  • Define a general hierarchy of semantic concepts
  • Define probabilistic content models at each level of the hierarchy
  • Put different concepts into probabilistic interrelation with each other using a networked model based on prior knowledge
    → Allows flexibility in feature-concept and concept-concept relations
    → Spreads the uncertainty in these relations over different nodes and layers
  • Assign label X to a clip if model X is found with sufficiently high confidence
• NOTE: seamless use of different modalities at different model levels
General Hierarchy of Semantic Concepts
• High-level concepts: “Topics”
  • The most general content aspects of video clips
  • Examples: action movie scene, suspicious human behavior, news topic T
• Mid-level concepts: “Events”
  • Have dynamic audiovisual content
  • Serve as the main components of a “topic”
  • Examples: goal, human talking, dialog, moving car, hunt, explosion
• Low-level concepts: “Sites” and “Objects”
  • Static site and object instances
  • Examples: car, cityscape, chair, snow, sky, indoor, lion, sunset, outdoor
• Features computed from the video frames form the lowest layer
Modeling Low-Level Semantic Concepts:
Example approach
• The concept of “multijects” (Naphade and Huang, 2002)
• A model of the semantic concept X
  • Takes as input
    • Features computed in the clip
    • Weighted probabilities of the presence of other semantic concepts in the clip
  • Gives as output the probability that concept X is present in the clip, e.g. P(“Indoor” | features, other multijects)
  • Can be realized as, e.g., a mixture of Gaussians
• The weights reveal the likely co-occurrence of concepts (other multijects: “Sky”, “Bed”, “Chair”, “Snow”, …)
  • “Sky” and “Snow” reduce the confidence in detecting “Indoor”
  • “Indoor” is more likely if “Chair” or “Bed” have already been detected
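A minimal sketch of a multiject-style detector: a Gaussian-mixture likelihood of the clip’s features is fused with co-occurrence evidence from other concepts. The logistic fusion, the hand-set weights and the synthetic data are assumptions of this sketch, not Naphade and Huang’s exact formulation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class Multiject:
    """Detector for one semantic concept, e.g. "Indoor"."""

    def __init__(self, n_components=4, cooccurrence=None):
        self.gmm = GaussianMixture(n_components=n_components, random_state=0)
        # Positive weight: the other concept supports X; negative: contradicts it.
        self.cooccurrence = cooccurrence or {}

    def fit(self, features):
        # features: (n_samples, n_dims) computed from clips showing the concept.
        self.gmm.fit(features)
        return self

    def probability(self, clip_features, other_probs):
        # Feature evidence: mean log-likelihood under the concept's GMM
        # (in practice this would be calibrated, e.g. against a background model).
        evidence = self.gmm.score(clip_features)
        # Context evidence: weighted probabilities of the other multijects.
        context = sum(w * other_probs.get(name, 0.0)
                      for name, w in self.cooccurrence.items())
        # Logistic fusion into a probability (an assumption of this sketch).
        return 1.0 / (1.0 + np.exp(-(evidence + context)))

rng = np.random.default_rng(0)
indoor = Multiject(cooccurrence={"Chair": 1.2, "Sky": -1.5}).fit(rng.normal(size=(300, 6)))
p = indoor.probability(rng.normal(size=(20, 6)), {"Chair": 0.8, "Sky": 0.1})
```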
Modeling Medium-Level Semantic Concepts
• Hidden Markov models (HMMs)
  • A practical and effective mechanism for modeling time-varying patterns
• HMM-based “event” model
  • A complex “multiject”
  • States follow the temporal structure of the event; for “hunt”, e.g.: non-hunt, beginning of hunt, 2nd hunt shot, valid hunt, end of hunt
  • Observations consist of features
  • Prior probabilities depend on the presence of other semantic concepts
  • The output represents the posterior probability of the modeled event
• Complex events → complex HMM models
  • Event-coupled and hierarchical HMMs
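A sketch of such an event model using the hmmlearn package: one HMM for the event, one for the background, and a log-likelihood-ratio decision. The five-state hunt topology, the synthetic training features and the prior-odds hook are assumptions for illustration.

```python
import numpy as np
from hmmlearn import hmm

# Synthetic stand-ins for per-frame feature vectors (e.g. motion, color).
rng = np.random.default_rng(0)
X_hunt, hunt_lengths = rng.normal(size=(200, 8)), [100, 100]
X_other, other_lengths = rng.normal(loc=1.0, size=(150, 8)), [150]

# States roughly mirror the slide's hunt model: non-hunt, beginning of hunt,
# 2nd hunt shot, valid hunt, end of hunt.
hunt_model = hmm.GaussianHMM(n_components=5, covariance_type="diag").fit(X_hunt, hunt_lengths)
background = hmm.GaussianHMM(n_components=3, covariance_type="diag").fit(X_other, other_lengths)

def is_hunt(clip_features, prior_log_odds=0.0):
    # prior_log_odds can encode evidence from other multijects,
    # e.g. "Lion" already detected raises the prior for "hunt".
    llr = hunt_model.score(clip_features) - background.score(clip_features)
    return llr + prior_log_odds > 0.0

print(is_hunt(rng.normal(size=(40, 8))))
```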
Example Approach
• Detection of “play” and “break” segments in baseball (Li and Sezan, 2001)
• Shot classification into
  • “Start-of-play”
  • “In play”
  • “End-of-play”
  • “No play”
• An HMM governs the transitions between these shot classes
[Figure: HMM whose states are the four shot classes, with example keyframes per class]
Modeling High-Level Semantic Concepts
• Bringing low- and medium-level content models into probabilistic relations with each other
  • e.g. Bayesian belief networks, factor graphs
• Example: the “multinet” (Naphade and Huang, 2002)
  • Multijects serve as nodes (e.g. “Skydiving”, “Bird”, “Airplane”, “Shark”, “Indoor”, “Water”) built on top of the feature layer
  • Links between the multijects, marked positive or negative, reveal prior knowledge regarding
    • Co-occurrence of different semantic concepts
      • “Shark” is unlikely to co-occur with “Bird” (negative link)
      • “Shark” is supported by “Water” (positive link)
    • Contextual constraints (e.g. spatio-temporal ones)
      • “Sky” is always above “Water”
      • Speech synchronous with facial expressions to model the event of “Human talking”
Multi-Segment Video Indexing
• An HMM whose states are the content categories (Category 1, …, Category N, plus “Miscellaneous”)
• The HMM’s state transition probability P(Cat. i → Cat. j) directs category changes from one segment to another
• All segments are classified simultaneously according to the most probable path through the HMM
[Figure: a video sequence whose segments 1–6 receive the labels Cat. 1, Cat. 3, Misc., Cat. 4, Cat. N and Cat. 5 along the most probable path]
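A compact sketch of that joint labeling as a Viterbi decode over segment-level scores; the toy emission scores, transition matrix and category count are invented for illustration.

```python
import numpy as np

def classify_segments(emission_logp, trans_logp, init_logp):
    """Jointly label all segments via the most probable HMM path (Viterbi).

    emission_logp: (n_segments, n_categories) log P(segment features | category)
    trans_logp:    (n_categories, n_categories) log transition probabilities
    init_logp:     (n_categories,) log initial probabilities
    """
    n_seg, n_cat = emission_logp.shape
    score = init_logp + emission_logp[0]
    back = np.zeros((n_seg, n_cat), dtype=int)
    for t in range(1, n_seg):
        cand = score[:, None] + trans_logp            # cand[i, j]: from i to j
        back[t] = np.argmax(cand, axis=0)             # best predecessor per state
        score = cand[back[t], np.arange(n_cat)] + emission_logp[t]
    path = [int(np.argmax(score))]
    for t in range(n_seg - 1, 0, -1):                 # backtrack the best path
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(1)
emissions = np.log(rng.dirichlet(np.ones(4), size=6))   # 6 segments, 4 categories
trans = np.log(np.full((4, 4), 0.1) + 0.6 * np.eye(4))  # sticky toy transitions
print(classify_segments(emissions, trans, np.log(np.full(4, 0.25))))
```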
Video Indexing via Content Modeling:
What did we learn?
• Main principle:
  • Use expert knowledge to compose high-level content models
  • Train the model parameters using training data (of immense quantity and diversity)
  • Apply the models to new data to detect the occurrence of the modeled semantic concept; if the confidence is high, assign the label!
• The more knowledge, the more sophisticated the labels:
  • “Ronaldo making a sudden move to the right and catching the ball with his left leg while scratching his nose with his right arm”
Video Indexing via Content Modeling:
Some observations
• (Probabilistic) class modeling is one of the basic approaches in pattern classification (generative classification)
  • Straightforward “jump” from pattern classification to MCA
  • Increasing tendency to “classify anything one gets hold of!”
  • Abundance of narrow-scope/narrow-domain MCA solutions
• Examples of recent results
  • Face detectors that cannot handle all face poses, camera views, occlusions, lighting conditions, complex backgrounds, …
  • Goal detectors that can handle specific camera views only
  • Tools capable of finding Bill Clinton in front of the American flag (but please don’t remove the flag!)
• Main problem: solutions are not (sufficiently) scalable
  • Too sensitive to training data, too inflexible, and too reliant on heuristics
The curse of domain knowledge
• Bridging the semantic gap by integrating feature-based evidence and domain knowledge
  • Example problem: searching for video clips with a Ferrari
• Narrowing the scope of the problem → more domain knowledge → narrower semantic gap
  • Formula 1: red color, keywords, logos
• Advantage:
  • A well-defined narrow-scope problem can be solved successfully
• Disadvantage:
  • The number of specific scenarios needed to cover the whole problem → ∞
  • One solution needed per scenario → not (always) the way to go!
An analogy: Image compression
• Image-specific compression method
  • Take an image
  • Analyze the pixel content
  • Use the analysis results to develop a compression method with optimized rate-distortion performance for that image
  → May lead to optimized rate-distortion performance per considered image, but is practically irrelevant!
• Generic compression principle
  • Analyze general image properties relevant for compression
  • Develop a generic compression principle that may work well for every image (e.g. for redundancy and irrelevance removal)
  • Optimize the PRINCIPLE, not the coding performance for a single image!
Benefits of working on a generic principle
• THE way to
  • approach solving the problem COMPLETELY
  • increase the robustness of VCA solutions through the strong theoretical foundation of the principle
  • secure consistent performance across the entire target domain
  • make turning research into technology feasible
• Cross-domain applicability possible
• Concentrating research manpower on the same problem
  • Many brains working on one challenge, instead of on many scattered small problems
  → Joint, successful standardization activities!
An Example:
A Glimpse at Surveillance Application
• Modeling and classification based on prespecified events (high-level prior knowledge) is possible, but does not lead to a practical, widely applicable solution:
  • What is a suspicious event? We don’t know a priori!
  • We don’t care what it is precisely! Just alert me if it is suspicious!
• A possible alternative approach:
  • Use prior knowledge at the lowest inference levels
  • Let the system learn the highest-level inference itself, e.g. based on sporadic feedback from the user
• No scattered event-detection modules, but a generic system based on one basic principle!
Automated surveillance:
A concept of an ideal solution
[Figure: audio and video sensors feed a suspiciousness model that integrates the modalities using low-level prior knowledge and adapts through relevance feedback; the resulting level-of-suspiciousness curve over time is compared against a threshold, set according to the desired alertness level, and every potential event of interest exceeding it triggers an alert]
Smart Camera’s Suspiciousness Model:
A generic development approach
• Some “ingredients”
  • People detection
  • People counting
  • People/group motion detection and recognition
  • New still-object detection (“somebody left a suitcase!”)
  • Extremes in “perceived” audio signals
  • …
• Model development (integration)
  • Calibration based on context
  • Adaptation (learning) using relevance feedback
  • Translating input/computed cues into confidences
  • Integrating all confidences into the confidence for the overall suspiciousness level
• A system currently under development at the ICT Group!
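A toy sketch of the integration step: per-cue confidences are fused into one suspiciousness level and compared against a threshold derived from the desired alertness level. The cue names, weights and weighted-average fusion are placeholders, not the ICT Group’s actual system.

```python
def suspiciousness(cues, weights, threshold=0.6):
    """Fuse per-cue confidences in [0, 1] into an overall level and alert flag.

    cues:    e.g. {"people_count": 0.2, "left_object": 0.9, "audio_extreme": 0.4}
    weights: relative trust in each cue (calibrated from context in a real system)
    """
    total = sum(weights[name] * conf for name, conf in cues.items())
    level = total / sum(weights[name] for name in cues)   # normalized fusion
    return level, level > threshold                       # (level, alert?)

level, alert = suspiciousness(
    {"people_count": 0.2, "left_object": 0.9, "audio_extreme": 0.4},
    {"people_count": 1.0, "left_object": 2.0, "audio_extreme": 1.0},
)
```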
Where the “curve” idea already works:
Affective MCA
• Affective MCA
  • Extraction of affect (feelings, emotions, moods) from data/signals
  • Extracting affect-related features from different modalities
  • Combining features in affect models
  • Indexing temporal signal segments in view of their affective properties
• Importance
  • Affect-based indexing (“find me exciting, relaxing, funny content”)
  • Personalized content recommendation (“I’m in the mood for …”)
  • Highlights extraction (e.g. “find the 10 most exciting minutes of …”)
  • Automated surveillance (e.g. detection of aggression, fights, tension)
• Relation to state-of-the-art MCA research
  • So far, the emphasis has been on “cognitive” content, i.e. “facts” (temporal content structure, scene composition and type, contained objects/events)
An example:
Extracting moods from AV signal
• A “straightforward” approach
  • Pick a set of moods that you are interested in extracting
  • Pick a set of features and a classification mechanism
  • Collect training data, train the classifier and classify!
• Problems:
  • Affect is too abstract to be modeled explicitly
  • The prespecified set of moods can hardly be exhaustive
  • Which color, texture, motion or structure is to be linked to joy, sadness, tension or happiness?
  • Immense variety of content corresponding to a given mood
    → Training data? Difficult to generalize the obtained results
Searching for features:
Ask people who know more about it!
• Advertising people
  • Power of color to induce product-related mood effects
  • Combining color with scene structure to enhance the effect
• Psychophysiology people
  • Impact of motion on emotional responses in viewers
• Cinematography people
  • Patterning of shot lengths
• HCI people
  • Determining the emotion in speech and audio
From features to moods
• Remaining problems:
  • Feature-mood relations are rather vague
  • Vast variety of content for a given mood
• A solution inspired by psychophysiology:
  • Decouple the connection between features and moods by introducing an intermediate layer!
• The intermediate layer is defined by
  • Arousal (intensity of affect) → level of excitement
  • Valence (type of affect) → from pleasant to unpleasant
The Valence-Arousal paradigm *
• All moods extracted from video can be mapped onto an emotion space created by the arousal and valence axes
• Affective states → points in the 2D affect space
• Arousal and valence
  • Can be measured using physiological functions (e.g. heart rate, skin conductance)
  • Can be linked to a number of audiovisual features by means of user tests

* Dietz and Lang: Affective agents: Effects of agent affect on arousal, attention, liking and learning, 3rd International Cognitive Technology Conference, 1999
Arousal, Valence and Affect time curve*
• Video
  • Temporal content flow
  • Smooth transitions from one affective state to another
• Measurement of arousal and valence along a video
  → arousal and valence time curves!
• Combining the arousal and valence time curves yields the affect curve: a trajectory through the 2D affect space
[Figure: (a) an arousal time curve and (b) a valence time curve with points 1–4 marked along time; (c) the corresponding affect curve traced through the valence-arousal plane]

* Hanjalic and Xu: Affective video content representation and modeling, IEEE Transactions on Multimedia, February 2005
Example: A Model for Arousal Curve *
• Arousal time curve:

  A(k) = F(G_i(k), i = 1, …, N)

  • N – the number of features considered in the model
  • G_i(k) – models the arousal variations revealed by feature i
  • F – integrates the contributions of all features
• Three features, measured along the video per frame k:
  • motion activity m(k)
  • density of cuts c(k)
  • sound energy e(k)
• Each feature time curve is smoothed and scaled to obtain G_i(k)
• F – a weighted average of the components G_i(k)

* Hanjalic and Xu: Affective video content representation and modeling, IEEE Transactions on Multimedia, February 2005
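A minimal numeric sketch of such a model: each per-frame feature curve is smoothed with a long window, rescaled to [0, 1], and the results are averaged. The Kaiser window, its length and the equal weights are assumptions, not the paper’s exact parameters.

```python
import numpy as np

def smooth_and_scale(x, win_len=500):
    """G_i(k): smooth a per-frame feature curve and rescale it to [0, 1]."""
    win = np.kaiser(win_len, beta=5.0)
    g = np.convolve(x, win / win.sum(), mode="same")
    return (g - g.min()) / (np.ptp(g) + 1e-12)

def arousal_curve(motion, cut_density, energy, weights=(1/3, 1/3, 1/3)):
    """A(k) = F(G_i(k)): a weighted average of the smoothed feature curves."""
    comps = [smooth_and_scale(x) for x in (motion, cut_density, energy)]
    return sum(w * g for w, g in zip(weights, comps))

# Synthetic stand-ins for m(k), c(k) and e(k) over 10,000 frames.
rng = np.random.default_rng(0)
n = 10_000
A = arousal_curve(rng.random(n), rng.random(n), rng.random(n))
```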
Extracting moods using
Valence-Arousal Paradigm *
• An effective and generic alternative to the content modeling approach
• Possibility to optimize the measurement side and the inference side separately and on-the-fly
[Figure: on the measurement side, feature extraction maps the AV signal to the affect curve in the valence-arousal plane; on the inference side, psychophysiology translates a user query such as “horror thrill”, “hilarious fun” or “romantic feel-good” into a mood → VA value range, which is then matched against the affect curve]

* Hanjalic: Extracting Moods from Pictures and Sounds: Towards Truly Personalized TV, IEEE Signal Processing Magazine, March 2006
Personalized video content delivery:
Affective user profile generation *
• The user profile is obtained by
  • Collecting the affect curves of all programs watched in the past
  • Collecting the prevailing moods into areas of interest
• A new video is then matched by checking whether the prevailing mood of its extracted affect curve overlaps with any area of interest
[Figure: the affect curve extracted from a new video, its prevailing mood, and the user’s areas of interest in the affect space; the question is whether the areas overlap]

* Hanjalic: Extracting Moods from Pictures and Sounds: Towards Truly Personalized TV, IEEE Signal Processing Magazine, March 2006
Personalized video content delivery:
Browsing the 2D affect space *
• The user moves a pointer and scans the affect space
• Each position links to a list of videos already known to the system that have a similar prevailing mood
[Figure: the 2D affect space with video lists 1–5 anchored at different valence-arousal positions]

* Hanjalic: Extracting Moods from Pictures and Sounds: Towards Truly Personalized TV, IEEE Signal Processing Magazine, March 2006
Another example of a paradigm shift:
Detecting soccer highlights
• Classical approach:
  • Select a number of highlight-like events
  • Train a sufficient number of event detectors
  • Use the event detectors to detect highlights
• Paradigm shift:
  • Use the link between highlights and excitement!
• An “outside the box” solution:
  • Model the excitement along a video as an arousal time curve
  • Select the soccer video segments with maximum excitement by thresholding the arousal time curve
Soccer highlights extraction:
A simple realization *
• Cut off the peaks of the arousal time curve so that the segments above the cut-off line add up to the desired duration
[Figure: arousal curve with a horizontal cut-off line; the peak segments above the line form the highlights]

* Hanjalic: Generic approach to highlights detection in a sports video, IEEE ICIP 2003
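A sketch of the peak-cutting step: a bisection search raises or lowers the cut-off line until the frames above it fill the requested duration. The search itself, the toy arousal curve and the frame rate are implementation assumptions.

```python
import numpy as np

def extract_highlights(arousal, target_frames):
    """Place the cut-off line so the frames above it fill the target duration."""
    lo, hi = float(arousal.min()), float(arousal.max())
    for _ in range(50):                       # bisection on the cut-off level
        mid = 0.5 * (lo + hi)
        if np.count_nonzero(arousal > mid) > target_frames:
            lo = mid                          # too much kept -> raise the line
        else:
            hi = mid                          # too little kept -> lower the line
    return arousal > 0.5 * (lo + hi)          # boolean per-frame highlight mask

# Toy arousal curve; in practice this is A(k) from the arousal model.
k = np.linspace(0, 20, 10_000)
mask = extract_highlights(np.abs(np.sin(k)) * (0.5 + k / 40), 25 * 60)  # 1 min at 25 fps
```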
Soccer highlights extraction:
A more sophisticated realization *
• Possibility to influence the composition of the highlight sequence of a fixed duration
• Considering highlight “strength” → the number of “reacting” features
[Figure: arousal curves filtered for maximum selectiveness (all features react) versus less selectiveness]

* Hanjalic: Generic approach to highlights detection in a sports video, IEEE ICIP 2003
Maximum Selectiveness
• Weighting the arousal components with respect to the “weakest” one:

  G_i'(k) = G_i(k) · w(k),  i = 1, …, 3

  with

  w(k) = (1/2) · [ 1 + erf( ( min_i G_i(k) − d ) / σ ) ]

  and

  erf(x) = (2/√π) ∫₀^x e^(−t²) dt

[Figure: the weighting function w(x), rising from 0 to 1 around x = d, with the transition width controlled by σ]
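A direct sketch of this weighting using SciPy’s erf; the values of d and σ are tunable and chosen here only for illustration.

```python
import numpy as np
from scipy.special import erf

def selective_components(G, d=0.15, sigma=0.05):
    """G_i'(k) = G_i(k) * w(k): suppress arousal where the weakest feature barely reacts.

    G: array of shape (3, n_frames) holding the component curves G_i(k).
    """
    weakest = G.min(axis=0)                        # min_i G_i(k)
    w = 0.5 * (1.0 + erf((weakest - d) / sigma))   # soft gate in [0, 1]
    return G * w                                   # broadcasts over the three rows

G = np.random.default_rng(0).random((3, 1000))
G_selective = selective_components(G)
```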
Adaptive “filtering” of arousal curve *
[Figure: the same arousal curve after adaptive “filtering” with maximum selectiveness versus less selectiveness]

* Hanjalic: Generic approach to highlights detection in a sports video, IEEE ICIP 2003
Affective MCA still a Grand Challenge
• Need for more solid links between affect dimensions and features
• Need for more sophisticated integrative models of the affect dimensions
• Need for optimal ways of employing affect measurements for personalization:
  • Affect curve abstraction and representation
  • …
Other Challenges in MCA
• Multimedia Content Mining and Knowledge Discovery
  • Extracting key content elements, or multimedia keywords (the equivalent of keywords in text)
  • Revealing the semantic temporal content structure of a general multimedia data stream (the equivalent of text segmentation)
  • Finding semantically meaningful data clusters at different granularity levels
  • Indexing the found clusters and segments by multimedia keywords
• Cross-Media Learning and Reasoning
  • Linking the persons, objects, events and scenes with the words appearing in the accompanying speech or overlay text
  • Learning the text-video-audio links on-the-fly (e.g. self-learning)
• Importance
  • Enabling a “Multimedia Google”
Computing Text Content Similarity (1)
• The similarity between the texts of clips m and n is obtained on the basis of
  • The number of shared words
  • The similarity of their significance in both clips
• Word significance is computed based on
  • TF – word frequency (how many times a word occurs)
  • IDF – collection frequency (how exclusive or unique a word is for a clip)
  • Document length (serves to normalize the above two measures)
• The significance of word k in text t_m is expressed by the “weight” w(k, t_m)
• The feature vector V for clip comparison consists of the word weights
• Clip similarity is computed using a cosine measure over K, the “joint” vocabulary of the clips:

  S(m, n, V) = cos(t_m, t_n) = Σ_K w(k, t_m) · w(k, t_n) / √( Σ_K w²(k, t_m) · Σ_K w²(k, t_n) )
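A small sketch of this measure with a plain TF-IDF weighting; the tokenization and the exact normalization are simplified assumptions.

```python
import math
from collections import Counter

def tfidf_weights(doc_tokens, docs):
    """w(k, t_m): TF-IDF weight for every word in one clip's text."""
    tf = Counter(doc_tokens)
    weights = {}
    for word, count in tf.items():
        df = max(1, sum(1 for d in docs if word in d))   # collection frequency
        idf = math.log(len(docs) / df)
        weights[word] = (count / len(doc_tokens)) * idf  # length-normalized TF
    return weights

def cosine_similarity(wm, wn):
    """S(m, n, V): cosine of the weight vectors over the joint vocabulary."""
    num = sum(wm.get(k, 0.0) * wn.get(k, 0.0) for k in set(wm) | set(wn))
    den = math.sqrt(sum(v * v for v in wm.values()) *
                    sum(v * v for v in wn.values()))
    return num / den if den else 0.0

docs = [["goal", "crowd", "cheer"], ["news", "anchor", "studio"], ["goal", "replay"]]
print(cosine_similarity(tfidf_weights(docs[0], docs), tfidf_weights(docs[2], docs)))
```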
Computing Text Content Similarity (2)
• Improved performance through more sophisticated text analysis techniques
  • Applying stemming to capture word derivatives (e.g. endings)
  • Exploiting dissimilar but semantically related words (e.g. “volcano” and “lava”) by using a thesaurus or a collocation network
  • Inferring subtle semantic relations between words using, e.g., Latent Semantic Analysis (LSA)
From textual to multimedia keywords:
An audio example
• Identify audio “words”
  • Clusters of elementary audio segments with similar signal properties
  • Requires suitable features, a similarity metric and a clustering mechanism
• Define similarity between audio words to be able to “count” them
  • Eliminate noise from the feature vectors by working with dominant features (e.g. via Singular Value Decomposition)
• Probabilistic “counting” based on the level of “word” similarity
  • “Expected” TF and IDF instead of the exact ones! *

* Lu, Hanjalic: Towards Optimal Audio "Keywords" Detection for Audio Content Analysis and Discovery, ACM Multimedia 2006
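A sketch of the probabilistic counting: each elementary segment contributes a soft, similarity-based probability to every audio word instead of a hard 0/1 count. The softmax-over-distances similarity is an assumption of this sketch, not the paper’s exact estimator.

```python
import numpy as np

def expected_tf(segment_feats, word_centroids, temperature=1.0):
    """"Expected" term frequency of each audio word in one clip."""
    # Distance of every segment to every audio-word centroid.
    d = np.linalg.norm(segment_feats[:, None, :] - word_centroids[None, :, :], axis=2)
    p = np.exp(-d / temperature)
    p /= p.sum(axis=1, keepdims=True)   # per-segment word probabilities
    return p.sum(axis=0)                # soft count of each word in the clip

rng = np.random.default_rng(0)
counts = expected_tf(rng.normal(size=(50, 12)), rng.normal(size=(8, 12)))
```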
Illustration:
Text-like Audio Scene Segmentation *
• Compute the co-occurrence and significance of audio words
• Define the semantic affinity between segments as a function of
  • Their co-occurrence
  • Their mutual distance
  • The probability that they are key audio segments (“keywords”):

  A(s_i, s_j) = Co(e_i, e_j) · e^(−T(s_i, s_j)/T_m) · P_{e_i} · P_{e_j}

• Confidence for an audio scene boundary at time t, computed over a left buffer L_t and a right buffer R_t around the potential boundary:

  C(t) = (1/W) · Σ_{s_i ∈ L_t} Σ_{s_j ∈ R_t} A(s_i, s_j)

• The weight W serves to unbias the confidence:

  W = Σ_{s_i ∈ L_t} Σ_{s_j ∈ R_t} P_{e_i} · P_{e_j}

[Figure: an audio stream of audio elements; a potential boundary at time t separates a left buffer (L-Buf) containing segment s_i of element e_i from a right buffer (R-Buf) containing segment s_j of element e_j]

* Lu, Cai and Hanjalic: Audio Elements based Auditory Scene Segmentation, IEEE ICASSP 2006
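A sketch of the boundary-confidence computation, with segments, co-occurrence values and keyword probabilities passed in as plain Python structures; the buffer length and T_m are illustrative, not the paper’s settings.

```python
import numpy as np

def boundary_confidence(t, segs, cooc, key_prob, buf=8.0, T_m=4.0):
    """C(t): confidence for an audio scene boundary at time t.

    segs:     list of (time, element_id) elementary audio segments
    cooc:     dict (element_i, element_j) -> co-occurrence Co(e_i, e_j)
    key_prob: dict element_id -> probability P_e of being a key element
    """
    left = [(ti, e) for ti, e in segs if t - buf <= ti < t]    # L-Buf
    right = [(tj, e) for tj, e in segs if t <= tj < t + buf]   # R-Buf
    num = W = 0.0
    for ti, ei in left:
        for tj, ej in right:
            pp = key_prob[ei] * key_prob[ej]
            num += cooc.get((ei, ej), 0.0) * np.exp(-(tj - ti) / T_m) * pp
            W += pp
    return num / W if W else 0.0
```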
Illustration:
Content discovery in composite audio*
[Figure: a processing chain that parallels text analysis. (I) Audio streams undergo feature extraction and an iterative spectral clustering scheme with context-based scaling factors and BIC-based estimation of the cluster number, yielding audio elements and, via importance measures, key audio elements; documents/web pages analogously undergo word parsing and index term selection, yielding words and keywords. (II, a-b) Auditory scene detection followed by information-theoretic co-clustering-based auditory scene categorization yields auditory scene groups, the analogue of categorizing documents with similar topics]

* Cai, Lu, Hanjalic: Unsupervised content discovery in composite audio, ACM Multimedia 2005
References
• Boggs J.M., Petrie D.W.: The art of watching films, 5th ed., Mountain View, CA: Mayfield, 2000
• Cai R., Lu L., Hanjalic A.: Unsupervised content discovery in composite audio, ACM Multimedia 2005
• Dietz R., Lang A.: Affective agents: Effects of agent affect on arousal, attention, liking and learning, 3rd International Cognitive Technology Conference (CT'99), 1999
• Hanjalic A., Lagendijk R.L., Biemond J.: Automated high-level movie segmentation for advanced video-retrieval systems, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 4, pp. 580-588, June 1999
• Hanjalic A., Xu L.-Q.: Affective video content representation and modeling, IEEE Transactions on Multimedia, February 2005
• Hanjalic A.: Generic approach to highlights detection in a sports video, IEEE ICIP 2003
• Hanjalic A.: Content-based analysis of digital video, Kluwer/Springer, 2004
• Hanjalic A.: Extracting moods from pictures and sounds: Towards truly personalized TV, IEEE Signal Processing Magazine, March 2006
• Hanjalic A., Kakes G., Lagendijk R.L., Biemond J.: Indexing and retrieval of TV broadcast news using DANCERS, Journal of Electronic Imaging, October 2001
• Jiang et al.: Video segmentation with the assistance of audio content analysis, IEEE ICME 2000
• Kender J.R., Yeo B.-L.: Video scene segmentation via continuous video coherence, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1998
• Li B., Sezan M.I.: Event detection and summarization in sports video, IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL 2001), pp. 132-138, 2001
• Lu L., Cai R., Hanjalic A.: Audio elements based auditory scene segmentation, IEEE ICASSP 2006
• Lu L., Hanjalic A.: Towards optimal audio "keywords" detection for audio content analysis and discovery, ACM Multimedia 2006
• Naphade M., Huang T.S.: Extracting semantics from audiovisual content: The final frontier in multimedia retrieval, IEEE Transactions on Neural Networks, Vol. 13, No. 4, July 2002
• Rui Y., Huang T.S., Mehrotra S.: Constructing table-of-content for videos, Multimedia Systems, Special Section on Video Libraries, 7(5), pp. 359-368, 1999
• Sundaram H., Chang S.-F.: Determining computable scenes in films and their structures using audio-visual memory models, ACM Multimedia 2000
• Yeung M., Yeo B.-L.: Time-constrained clustering for segmentation of video into story units, Proceedings of the International Conference on Pattern Recognition (ICPR'96), pp. 375-380, 1996
• Yeung M., Yeo B.-L.: Video visualization for compact presentation and fast browsing of pictorial content, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, pp. 771-785, 1997