In this paper, we propose a user-based video indexing method that
automatically generates thumbnails of the most important scenes of an online
video stream by analyzing users' interactions with a web video player. As a
test bench to verify our idea, we have extended the YouTube video player into
the VideoSkip system. In addition, VideoSkip uses a web database (Google
Application Engine) to keep a record of some important parameters, such as the
timing of basic user actions (play, pause, skip). Moreover, we implemented an
algorithm that selects representative thumbnails.
Convolution and cross-correlation are the basis of filtering and pattern or
template matching in multimedia signal processing. We propose two throughput
scaling options for any one-dimensional convolution kernel in programmable
processors by adjusting the imprecision (distortion) of computation. Our
approach is based on scalar quantization, followed by two forms of tight
packing in floating-point (one of which is proposed in this paper) that allow
for concurrent calculation of multiple results.
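The idea of computing several results concurrently through packing can be
pictured with the minimal sketch below. It is not the packing scheme of the
paper: it assumes nonnegative, already quantized integer inputs and kernel, and
the function names and the default shift are illustrative choices only.

```python
def convolve(x, h):
    """Plain full 1-D convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def packed_convolve(x1, x2, h, shift_bits=20):
    """Convolve two equal-length integer signals with h in a single pass.

    Assumption (not from the source): all inputs are nonnegative integers,
    every output sample of x1 * h stays below 2**shift_bits, and the packed
    values stay below 2**53 so that float arithmetic remains exact.
    """
    scale = float(2 ** shift_bits)
    # Pack: x2 is placed in the high part of each float, x1 in the low part.
    packed = [a + b * scale for a, b in zip(x1, x2)]
    y = convolve(packed, h)
    # Unpack: by linearity, y = (x1 * h) + scale * (x2 * h).
    y1 = [int(v % scale) for v in y]
    y2 = [int(v // scale) for v in y]
    return y1, y2
```

Because convolution is linear, one pass over the packed signal yields both
outputs; the price is the reduced dynamic range available to each signal.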
A cost-effective, gesture-based modelling technique called Virtual
Interactive Prototyping (VIP) is described in this paper. Prototyping is
implemented by projecting a virtual model of the equipment to be prototyped.
Users can interact with the virtual model like the original working equipment.
Image and sound processing techniques are used to capture and track the user's
interactions with the model.
With the rapid progress of media streaming applications, video streaming can be
classified into two types: live video streaming and Video on Demand (VoD).
Live video streaming is a service that allows clients to watch many TV channels
over the internet; the only operation clients can perform is switching
channels.
With ever-growing digital libraries and video databases, it is
increasingly important to understand and mine knowledge from video databases
automatically. Discovering association rules between items in a large video
database plays a considerable role in video data mining research.
Based on the research and development in the past years, application of
association rule mining is growing in different domains such as surveillance,
meetings, broadcast news, sports, archives, movies, medical data, as well as
personal and online media collections.
This paper addresses the automatic classification of X-rated videos by
analyzing their obscene sounds. In this paper, obscene sounds refer to audio
signals generated from sexual moans and screams during sexual scenes. By
analyzing various sound samples, we determined the distinguishable
characteristics of obscene sounds and propose a repeated curve-like spectrum
feature that represents the characteristics of such sounds. We constructed
6,269 audio clips to evaluate the proposed feature, and separately constructed
1,200 X-rated and general videos for classification.
This paper presents a method for indexing activities of daily living in
videos obtained from wearable cameras. In the context of dementia diagnosis by
doctors, the videos are recorded at patients' houses and later visualized by
the medical practitioners. The videos may last up to two hours, so a
tool for efficient navigation in terms of activities of interest is crucial
for the doctors. The specific recording mode yields video data that are
very difficult to analyze: each recording is a single sequence shot in which
strong motion and sharp lighting changes often appear.
This paper proposes a novel latent semantic learning method for extracting
high-level features (i.e. latent semantics) from a large vocabulary of abundant
mid-level features (i.e. visual keywords) with structured sparse
representation, which can help to bridge the semantic gap in the challenging
task of human action recognition. To discover the manifold structure of
mid-level features, we develop a spectral embedding approach to latent semantic
learning based on the L1-graph, without the need to tune any parameter for
graph construction, which is a key step of manifold learning.
A multilayer secured image watermarking technique based on DWT-DCT and the YIQ
color space, with robustness and better correlation, is presented here. The
security level is increased by using multiple PN sequences, Arnold
scrambling, the DWT domain, the DCT domain and color space conversions. Peak
signal-to-noise ratio (PSNR) and normalized correlation are used as evaluation
metrics.
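The two metrics named above can be computed as in the following sketch. The
abstract does not give its exact formulas, so the definitions here are the
commonly used ones and are an assumption: images and watermarks are flattened
to equal-length sequences, and the peak value of 255 assumes 8-bit samples.

```python
import math


def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


def normalized_correlation(watermark, extracted):
    """Normalized correlation between embedded and extracted watermarks."""
    num = sum(a * b for a, b in zip(watermark, extracted))
    den = math.sqrt(sum(a * a for a in watermark) *
                    sum(b * b for b in extracted))
    return num / den if den else 0.0
```

A higher PSNR indicates that the watermarked image is closer to the original,
and a normalized correlation near 1 indicates that the extracted watermark
closely matches the embedded one.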
In this paper, we attempt to revisit the problem of multi-party conferencing
from a practical perspective, and to rethink the design space involved in this
problem. We believe that an emphasis on low end-to-end delays between any two
parties in the conference is a must, and the source sending rate in a session
should adapt to bandwidth availability and congestion. We present Celerity, a
multi-party conferencing solution specifically designed to achieve our
objectives. It is entirely Peer-to-Peer (P2P), and as such eliminates the cost
of maintaining centrally administered servers.
In this paper, we suggest a general model for the fixed-valued impulse noise
and propose a two-stage method for high density noise suppression while
preserving the image details. In the first stage, we apply an iterative impulse
detector, exploiting the image entropy, to identify the corrupted pixels and
then employ an Adaptive Iterative Mean filter (AIM) to restore them. The filter
is adaptive in terms of the number of iterations, which is different for each
noisy pixel, according to its Euclidean distance from the nearest uncorrupted
pixel.
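The restoration step can be illustrated with the rough sketch below. It is not
the AIM filter itself: the entropy-based detector is replaced by a much simpler
assumption (a pixel is corrupted if it equals one of the fixed noise values),
and the 3 x 3 window, function name and parameters are hypothetical. It does
show why pixels far from any uncorrupted pixel need more iterations.

```python
def restore_impulse_noise(img, noise_values=(0, 255), max_iters=50):
    """Iteratively replace corrupted pixels by the mean of clean neighbours."""
    rows, cols = len(img), len(img[0])
    out = [[float(v) for v in row] for row in img]
    # Simplified detector (assumption): fixed-valued impulses only.
    corrupted = {(r, c) for r in range(rows) for c in range(cols)
                 if img[r][c] in noise_values}
    for _ in range(max_iters):
        if not corrupted:
            break
        fixed = {}
        for (r, c) in corrupted:
            # Collect clean (or already restored) pixels in the 3 x 3 window.
            good = [out[rr][cc]
                    for rr in range(max(0, r - 1), min(rows, r + 2))
                    for cc in range(max(0, c - 1), min(cols, c + 2))
                    if (rr, cc) != (r, c) and (rr, cc) not in corrupted]
            if good:
                fixed[(r, c)] = sum(good) / len(good)
        if not fixed:
            break  # no clean pixel anywhere; nothing can be restored
        for (r, c), value in fixed.items():
            out[r][c] = value
        corrupted.difference_update(fixed)
    return out
```

A pixel adjacent to a clean pixel is restored in the first pass, while a pixel
in the middle of a noisy region is restored only after its neighbours have
been, so the iteration count grows with the distance to the nearest clean pixel.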
Many tasks in music information retrieval, such as recommendation and
playlist generation for online radio, fall naturally into the query-by-example
setting, wherein a user queries the system by providing a song, and the system
responds with a list of relevant or similar song recommendations. Such
applications ultimately depend on the notion of similarity between items to
produce high-quality results. Current state-of-the-art systems employ
collaborative filtering methods to represent musical items, effectively
comparing items in terms of their constituent users.
This document supplements an experimental Jitter / Max/MSP collection of
implementation patches whose goal is to simulate an alchemical process for
a person standing in front of a mirror-like screen while interacting with it.
The work involved takes some patience and has three stages to go through. At
the final stage the "alchemist" in the mirror wearing sharp-colored gloves (for
motion tracking) is to extract the final ultimate shining sparkle (FFT-based
visualization) in the nexus of the hands. The farther apart the hands are, the
larger the sparkle should be.
Information system designers face many challenges w.r.t. selecting
appropriate semantic technologies and deciding on a modelling approach for
their system. However, there is no clear methodology yet to evaluate
"semantically enriched" information systems.
Nowadays the internet has become a vast source of entertainment, and new
entertainment services become available in quick succession. One of these
services, Video-on-Demand, is the most hyped in this context.
Transferring video over the network with few errors is the main objective
of service providers. In this paper we present an algorithm for routing
video to the user in an effective manner, along with a method that ensures a
lower error rate than others.
In this paper, background detection in images taken under poor lighting is
performed with morphological filters. Recently, contrast enhancement
techniques based on Weber's Law have been used to detect the image background.
The proposed technique is more effective in that background detection can also
be carried out in color images. The image obtained with this method is of good
quality, and further enhancement can be obtained when comparing the
results. In this technique, compressed domain enhancement is used for
better results.
M-Learning is a new learning paradigm of the new social structure with mobile
and wireless technologies. Smart school is one of the four flagship applications
for the Multimedia Super Corridor (MSC) under a Malaysian government initiative
to improve the education standard in the country. With advances in mobile device
technologies, mobile learning could help the government in realizing the
initiative. This paper discusses the prospect of implementing mobile learning
for primary school students.
In this paper, we propose a systematic solution to the problem of scheduling
delay-sensitive media data for transmission over time-varying wireless
channels. We first formulate the dynamic scheduling problem as a Markov
decision process (MDP) that explicitly considers the users' heterogeneous
multimedia data characteristics (e.g.
This paper proposes Leader in Charge (LiC), a reliable multicast architecture
for device-to-device (D2D) radio underlaying cellular networks. The
multicast-requesting user equipments (UEs) in close proximity form a D2D
cluster to receive the multicast packets through cooperation. In addition to
receiving the multicast packets from the eNB, UEs share what they received from
the multicast on short-range links among UEs, namely the D2D links, to exploit
the wireless resources in a more efficient way.
This paper presents a novel approach for web video categorization by
leveraging Wikipedia categories (WikiCs) and open resources describing the same
content as the video, i.e., content-duplicated open resources (CDORs). Note
that current approaches only collect CDORs within one or a few media forms and
ignore CDORs of other forms. We explore all these resources by utilizing WikiCs
and commercial search engines. Given a web video, its discriminative Wikipedia
concepts are first identified and classified. Then a textual query is
constructed, from which CDORs are collected.
In this paper, an efficient DWT-based watermarking technique is proposed to
embed signatures in images to attest owner identity and discourage
unauthorized copying. The paper uses a fuzzy inference filter to
choose the larger-entropy coefficients in which to embed watermarks. Unlike most
previous watermarking frameworks which embedded watermarks in the larger
coefficients of inner coarser subbands, the proposed technique is based on
utilizing a context model and fuzzy inference filter by embedding watermarks in
the larger-entropy coefficients of coarser DWT subbands.
Our research focuses on analysing human activities according to a known
behaviorist scenario, in the case of noisy and high-dimensional collected data.
The data come from the monitoring of patients with dementia by wearable
cameras. We define a structural model of video recordings based on a Hidden
Markov Model. New spatio-temporal features, color features and localization
features are proposed as observations. First results in recognition of
activities are promising.
This paper presents a practical steganographic implementation for 4-bit
images. The proposed technique converts a 4-bit image into a 4-shade grayscale
image, which acts as the reference image for hiding the text. Using this
grayscale reference image, any text can be hidden. A single character of the
text is represented by 8 bits, which can be split into four 2-bit pieces of
information. If the reference image and the data file are transmitted through
the network separately, we can achieve the effect of steganography.
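Splitting an 8-bit character into four 2-bit values, and joining them back,
can be sketched as follows. The function names are illustrative, and the way
the 2-bit values are then tied to the four gray shades of the reference image
is not described in the abstract, so it is left out.

```python
def split_char(ch):
    """Split one 8-bit character into four 2-bit values, high bits first."""
    code = ord(ch)
    if code > 0xFF:
        raise ValueError("character does not fit in 8 bits")
    return [(code >> shift) & 0b11 for shift in (6, 4, 2, 0)]


def join_pieces(pieces):
    """Rebuild the character from its four 2-bit values."""
    code = 0
    for piece in pieces:
        code = (code << 2) | (piece & 0b11)
    return chr(code)
```

Each resulting value lies in the range 0 to 3, so it can address one of four
levels, which is what makes a 4-shade image a natural carrier.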
Image steganography is the art of hiding information into a cover image. This
paper presents a novel technique for Image steganography based on Block-DCT,
where DCT is used to transform original image (cover image) blocks from spatial
domain to frequency domain. First, a gray level image of size M x N is divided
into disjoint 8 x 8 blocks, and a two-dimensional Discrete Cosine Transform (2-D
DCT) is performed on each of the P = MN / 64 blocks.
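The block partitioning and per-block transform can be sketched as below. This
covers only the transform step, not the embedding; the orthonormal DCT-II
scaling and the requirement that M and N be multiples of 8 are assumptions made
for the example.

```python
import math

BLOCK = 8  # block size used in the text


def split_blocks(image):
    """Split an M x N image (list of rows) into disjoint 8 x 8 blocks."""
    rows, cols = len(image), len(image[0])
    if rows % BLOCK or cols % BLOCK:
        raise ValueError("image dimensions must be multiples of 8")
    return [[row[c:c + BLOCK] for row in image[r:r + BLOCK]]
            for r in range(0, rows, BLOCK)
            for c in range(0, cols, BLOCK)]


def dct2(block):
    """Orthonormal 2-D DCT-II of one 8 x 8 block."""
    def alpha(k):
        return math.sqrt(1.0 / BLOCK) if k == 0 else math.sqrt(2.0 / BLOCK)

    def basis(n, k):
        return math.cos((2 * n + 1) * k * math.pi / (2 * BLOCK))

    return [[alpha(u) * alpha(v) *
             sum(block[x][y] * basis(x, u) * basis(y, v)
                 for x in range(BLOCK) for y in range(BLOCK))
             for v in range(BLOCK)]
            for u in range(BLOCK)]
```

For an 8 x 16 image this yields P = 8 * 16 / 64 = 2 blocks, and a constant
block has all of its energy in the DC coefficient.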
This paper presents an efficient method for approximation of temporal video
data using linear Bezier fitting. For a given sequence of frames, the proposed
method estimates the intensity variations of each pixel in temporal dimension
using linear Bezier fitting in Euclidean space. The fitting of each segment
respects a specified upper bound on the mean squared error. A break-and-fit
criterion is employed to minimize the number of segments required to fit the
data. The proposed method is well suited to lossy compression of temporal video
data and automates the fitting process of each pixel.
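A break-and-fit loop for one pixel's temporal intensity sequence might look
like the sketch below. It assumes a simple greedy rule (extend the current
segment until the error bound is exceeded, then break at the previous sample);
the exact criterion of the paper is not given in the abstract, and the function
names are illustrative.

```python
def segment_mse(values, start, end):
    """MSE between values[start..end] and the line joining its end points.

    A linear Bezier segment is the straight line between two control points,
    taken here to be the first and last samples of the segment.
    """
    n = end - start
    if n == 0:
        return 0.0
    err = 0.0
    for i in range(start, end + 1):
        t = (i - start) / n
        fitted = (1 - t) * values[start] + t * values[end]
        err += (values[i] - fitted) ** 2
    return err / (n + 1)


def break_and_fit(values, max_mse):
    """Return the break point indices, including the first and last sample."""
    if not values:
        return []
    breaks = [0]
    start, end = 0, 1
    while end < len(values):
        if segment_mse(values, start, end) > max_mse:
            start = end - 1          # break: previous sample closes the segment
            breaks.append(start)
        else:
            end += 1                 # fit still acceptable: extend the segment
    if breaks[-1] != len(values) - 1:
        breaks.append(len(values) - 1)
    return breaks
```

Only the samples at the break points need to be stored, which is where the
compression comes from.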
Digital image data is rapidly expanding in quantity and heterogeneity.
Traditional information retrieval techniques do not meet users'
demands, so there is a need to develop an efficient system for content based
image retrieval. Content based image retrieval means retrieving images from a
database on the basis of visual features of the image such as color and texture.
In our proposed method, features are extracted after applying Phong shading to
the input image.
Most P2P VoD schemes focus on service architectures and overlay
optimization without considering segment rarity and the performance of
prefetching strategies. As a result, they cannot adequately support VCR-oriented
service in heterogeneous environments where clients use free VCR controls.
Despite the remarkable popularity of VoD systems, there exists no prior work
that studies the performance gap between different prefetching strategies. In
this paper, we analyze the performance of different prefetching
strategies.
Video fusion is a process that combines visual data from different sensors to
obtain a single composite video preserving the information of the sources. The
availability of a system that enhances the human ability to perceive the
observed scenario is crucial to improving the performance of a surveillance
system. An infrared (IR) camera captures thermal images of objects in night-time
environments, when only limited visual information can be captured by an RGB
camera.
Speaker identification is the process of determining which registered speaker
provides a given utterance. Speaker identification requires making a claim on
the identity of the speaker from among the Ns trained speakers in the user
database. In this study, we propose the combination of a clustering algorithm
and a classification technique: subtractive clustering and Radial Basis
Function (RBF). The proposed technique is chosen because RBF has a simpler
network structure and a faster learning algorithm.
...The steganography scheme makes it possible to hide the medical image in
different bit locations of host media without inviting suspicion. The secret
file is embedded in a cover medium with a key. At the receiving end the key can
be derived by all the classes which are higher in the hierarchy using symmetric
polynomial and the medical image file can be retrieved. The system is
implemented and found to be secure, fast and scalable. Simulation results show
that the system is dynamic in nature and allows any type of hierarchy.
As multimedia and internet technologies grow rapidly, the
transmission of digital media plays an important role in communication.
Various digital media such as audio, video and images are transferred
through the internet. Digital data transferred through the internet face many
threats, and a number of security techniques have been
employed to protect such data. This paper
proposes a new technique for sending secret messages securely, using
a steganographic technique.
A method of lossless data hiding in images using integer wavelet transform
and histogram shifting for gray scale images is proposed. The method shifts
part of the histogram to create space for embedding the watermark information
bits. The method embeds the watermark while maintaining good visual quality.
The method is completely reversible. The original image and the watermark data
can be recovered without any loss.
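The histogram shifting step and its reversibility can be illustrated with the
sketch below. It is a simplification: it works directly on pixel values rather
than on integer wavelet coefficients, it uses a single peak/empty-bin pair, and
it assumes the receiver knows the peak, the empty bin and the payload length.
All names are illustrative.

```python
def embed(pixels, bits, levels=256):
    """Hide bits by shifting the histogram between its peak and an empty bin."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    peak = max(range(levels), key=lambda v: hist[v])
    empty = [v for v in range(peak + 1, levels) if hist[v] == 0]
    if not empty:
        raise ValueError("no empty bin above the peak")
    zero = empty[0]
    if len(bits) > hist[peak]:
        raise ValueError("payload exceeds capacity")
    payload = iter(bits)
    marked = []
    for v in pixels:
        if peak < v < zero:
            marked.append(v + 1)                 # shift to free the bin peak+1
        elif v == peak:
            marked.append(v + next(payload, 0))  # bit 1 moves the pixel up
        else:
            marked.append(v)
    return marked, peak, zero


def extract(marked, peak, zero, n_bits):
    """Recover the hidden bits and the original pixels exactly."""
    bits, restored = [], []
    for v in marked:
        if v in (peak, peak + 1):
            bits.append(v - peak)
            restored.append(peak)
        elif peak + 1 < v <= zero:
            restored.append(v - 1)               # undo the shift
        else:
            restored.append(v)
    return restored, bits[:n_bits]
```

The capacity equals the number of samples in the peak bin, and every sample
changes by at most one level, which is why the visual impact stays small.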
The development and application of various remote sensing platforms result in
the production of huge amounts of satellite image data. Therefore, there is an
increasing need for effective querying and browsing in these image databases.
To take advantage and make good use of satellite image data, we must
be able to extract meaningful information from the imagery. Hence, in this
paper we propose a new algorithm for SAR image segmentation that applies a
vector quantization technique to the entropy image.
There has been a remarkable increase in data exchange over the web and in the
widespread use of digital media. As a result, multimedia data transfers have
also increased. The mounting interest in digital watermarking
over the last decade is certainly due to the increased need for
copyright protection of digital content, and is further driven by
commercial prospects. Applications of video watermarking in copy control,
broadcast monitoring, fingerprinting, video authentication, copyright
protection, etc. are rising immensely.
This paper introduces a novel indexing and access method, called
Feature-Based Adaptive Tolerance Tree (FATT), which uses the wavelet transform
to organize large image data sets efficiently and to support popular image
access mechanisms like Content Based Image Retrieval (CBIR). Conventional database
systems are designed for managing textual and numerical data and retrieving
such data is often based on simple comparisons of text or numerical values.
However, this method is no longer adequate for images, since the digital
presentation of images does not convey the reality of images.
Tag recommendation is a common way to enrich the textual annotation of
multimedia contents. However, state-of-the-art recommendation methods are built
upon pairwise tag relevance, which hardly captures the context of the web
video, i.e., who is doing what, when and where. In this paper we propose the
context-oriented tag recommendation (CtextR) approach, which expands tags for
web videos under the context-consistent constraint.
With the rapid development of various multimedia technologies, more and more
multimedia data are generated and transmitted in the medical, commercial, and
military fields, which may include sensitive information that should not
be accessed by, or can only be partially exposed to, general users.
Therefore, security and privacy have become an important issue. Another problem
with digital documents and video is that undetectable modifications can be made
with very simple and widely available equipment, which puts the use of digital
material for evidential purposes into question. With the large flood
Digital processing of the speech signal and voice recognition algorithms are
very important for fast and accurate automatic voice recognition technology. The
voice is a signal of infinite information. Direct analysis and synthesis of
the complex voice signal is difficult because of the amount of information
contained in the signal. Therefore, digital signal processing steps such as
Feature Extraction and Feature Matching are introduced to represent the voice
signal.
In this paper we propose a scalable cluster architecture of
interconnected proxy servers for high quality and high availability services.
We also propose an optimal regional popularity based video prefix replication
strategy and a scene change based replica caching algorithm that utilizes the
Zipf-like video popularity distribution to maximize the availability of videos
closer to the client and the request-servicing rate, thereby reducing the client
rejection ratio and the response time for the client.
Institutions all over the world are continuously exploring ways to use ICT in
improving teaching and learning effectiveness. The use of course web pages,
discussion groups, bulletin boards, and e-mail has had a considerable impact
on teaching and learning across all disciplines.
Multimedia data is a form of data that can represent all types of data
(images, sound and text). The use of multimedia data in online applications
requires a more comprehensive database with respect to storage media, sorting /
indexing, and system / data searching. This is necessary in order to
help providers and users access multimedia data online. Systems that use
an index image as a reference require storage media and rules, and
require special expertise to obtain the desired file.
Image compression plays a very important role in image processing, especially
when images are sent over the internet. Threats to information on
the internet are increasing, and images are no exception. Generally an image is
sent over the internet in compressed form to make optimal use of the bandwidth
of the network. But while it is on the network, the image can
be changed, intentionally or unintentionally, at any intermediate point.
The rapid development of multimedia and the internet allows for wide
distribution of digital media data. It has become much easier to edit, modify
and duplicate digital information. Digital documents are also easy to copy and
distribute, and therefore face many threats. This is a big security
and privacy issue.
Although content-based image retrieval (CBIR) is not a new subject, it keeps
attracting more and more attention, as the number of images grows tremendously
due to the internet, inexpensive hardware and automation of image acquisition.
One of the applications of CBIR is fetching images from a database. This paper
presents a new method for automatic image retrieval using moment invariants and
image entropy. Our technique can be used to find semi-matches or perfect matches
in a query-by-example manner. Experimental results demonstrate that the
proposed technique is scalable and efficient.
A method for the design of a Fast Haar wavelet for signal processing and image
processing is proposed. In the proposed work, the analysis bank and
synthesis bank of the Haar wavelet are modified by using a polyphase structure.
The resulting Fast Haar wavelet satisfies the alias-free and
perfect reconstruction conditions. Computational time and computational
complexity are reduced in the Fast Haar wavelet transform.
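A one-level Haar analysis and synthesis pair in polyphase form can be sketched
as follows. This is the textbook polyphase arrangement with orthonormal
scaling, not necessarily the modified filter bank of the paper, and it assumes
an even-length input.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor


def haar_analysis(x):
    """Split x into even/odd phases, then form sums and differences."""
    even, odd = x[0::2], x[1::2]
    approx = [(e + o) * S for e, o in zip(even, odd)]
    detail = [(e - o) * S for e, o in zip(even, odd)]
    return approx, detail


def haar_synthesis(approx, detail):
    """Invert haar_analysis and interleave the two phases again."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) * S)  # even-indexed sample
        x.append((a - d) * S)  # odd-indexed sample
    return x
```

Working on the two phases avoids computing samples that downsampling would
discard, and synthesis recovers the input up to floating-point rounding.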
Motivated by the work of Uehara et al. [1], an improved method to recover DC
coefficients from AC coefficients of DCT-transformed images is investigated in
this work, which finds applications in cryptanalysis of selective multimedia
encryption. The proposed under/over-flow rate minimization (FRM) method employs
an optimization process to get a statistically more accurate estimation of
unknown DC coefficients, thus achieving a better recovery performance.
The purpose of this paper is to describe our research on different feature
extraction and matching techniques in designing a Content Based Image Retrieval
(CBIR) system. Due to the enormous increase in image database sizes, as well as
their vast deployment in various applications, the need for CBIR development
arose. First, this paper describes the primitive feature
extraction techniques, such as texture, colour, and shape. Once these features are
extracted and used as the basis for a similarity check between images, the
various matching techniques are discussed.
In a video on demand system, the main video repository may be far away from
the user and generally has limited streaming capacities. Since a high quality
video's size is huge, it requires high bandwidth for streaming over the
internet. To achieve a higher video hit ratio and reduced client waiting
time, a distributed server architecture can be used, in which multiple local
servers are placed close to clients and, based on regional demand, video
contents are cached dynamically from the main server.
In this paper we propose a dynamic buffer allocation algorithm for the
prefix, based on the popularity of the videos. More cache blocks are allocated
to the most popular videos and fewer cache blocks are allocated to less popular
videos. Buffer utilization is also maximized irrespective of the load on the
Video-on-Demand system. Overload can cause the server to slow down.
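One simple way to give popular videos more prefix cache blocks while always
using the whole buffer is proportional allocation with largest-remainder
rounding, sketched below. This rule is an assumption made for illustration; the
abstract does not state the allocation formula, and the names are hypothetical.

```python
def allocate_prefix_blocks(popularity, total_blocks):
    """Share total_blocks among videos in proportion to their popularity.

    popularity maps a video id to a nonnegative weight such as a request
    count; at least one weight must be positive.
    """
    total = sum(popularity.values())
    shares = {vid: total_blocks * p / total for vid, p in popularity.items()}
    alloc = {vid: int(share) for vid, share in shares.items()}
    leftover = total_blocks - sum(alloc.values())
    # Hand the remaining blocks to the largest fractional remainders,
    # so that the buffer is always fully used.
    by_remainder = sorted(shares, key=lambda vid: shares[vid] - alloc[vid],
                          reverse=True)
    for vid in by_remainder[:leftover]:
        alloc[vid] += 1
    return alloc
```

For example, weights of 50, 30 and 20 with a 10-block buffer give 5, 3 and 2
blocks respectively.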
This paper presents a gradient-based motion estimation algorithm based on
shape-motion prediction, which takes advantage of the correlation between
neighboring Binary Alpha Blocks (BABs) to match the MPEG-4 shape coding
case and speed up the estimation process. The PSNR and computation time
achieved by the proposed algorithm seem to be better than those obtained by
most popular motion estimation techniques.
In this paper we propose an adaptive dynamic cache replacement
algorithm for a multimedia server's cache system. The goal is to achieve
effective utilization of the cache memory, which stores the prefixes of popular
videos. A replacement policy is usually evaluated using the hit ratio, the
frequency with which a requested video is found in the cache. Discarding the
least recently used page is usually the policy of choice in cache management.
The adaptive dynamic replacement approach for the prefix cache is a self-tuning,
low-overhead algorithm that responds online to changing access patterns.
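For reference, the least-recently-used baseline mentioned above, together with
a hit ratio counter, can be sketched as follows. This is plain LRU, not the
adaptive algorithm proposed in the paper; the class name and the use of video
ids as cache keys are assumptions.

```python
from collections import OrderedDict


class LRUPrefixCache:
    """LRU cache of video prefixes that also tracks its hit ratio."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # oldest entry first
        self.hits = 0
        self.requests = 0

    def request(self, video_id):
        """Record a request; return True on a cache hit."""
        self.requests += 1
        if video_id in self.store:
            self.hits += 1
            self.store.move_to_end(video_id)  # mark as most recently used
            return True
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)    # evict least recently used
        self.store[video_id] = True
        return False

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0
```

Replaying a request trace through such a baseline gives a hit ratio against
which an adaptive policy can be compared.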
The aim of this paper is to propose a novel Video on Demand (VoD)
architecture and the implementation of an efficient load sharing algorithm to
achieve Quality of Service (QoS). This scheme reduces the transmission cost
from the Centralized Multimedia Server (CMS) to Proxy Servers (PS) by sharing
the videos among the proxy servers of the Local Proxy Servers Group (LPSG) and
among the neighboring LPSGs, which are interconnected in a ring fashion. This
results in a very low request rejection ratio, reduced transmission time and
cost, reduced load on the CMS and high QoS for the users.
Admission control is a key component in multimedia servers: it allows
resources to be used by a client only when they are available. A problem
faced by numerous content serving machines is overload: when there are too many
clients who need to be served, the server tends to slow down. An admission
control algorithm for a multimedia server is responsible for determining if a
new request can be accepted without violating the QoS requirements of the
existing requests in the system.
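The basic admission decision can be sketched with a single resource. The
example assumes that each stream reserves a fixed bandwidth and that QoS is
preserved as long as the sum of reservations does not exceed the server
capacity; real servers also account for disk and buffer resources, and the
class name and units are illustrative.

```python
class AdmissionController:
    """Admit a stream only if its bandwidth fits in the remaining capacity."""

    def __init__(self, capacity_kbps):
        self.capacity_kbps = capacity_kbps
        self.reserved_kbps = 0

    def admit(self, rate_kbps):
        """Return True and reserve bandwidth if the request can be served."""
        if rate_kbps <= 0:
            raise ValueError("rate must be positive")
        if self.reserved_kbps + rate_kbps > self.capacity_kbps:
            return False  # accepting would degrade the existing streams
        self.reserved_kbps += rate_kbps
        return True

    def release(self, rate_kbps):
        """Free the bandwidth of a stream that has finished."""
        self.reserved_kbps = max(0, self.reserved_kbps - rate_kbps)
```

Rejecting a request when capacity is exhausted keeps the server out of the
overload region instead of slowing down every admitted client.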
We take an analytical approach to study Quality of user Experience (QoE) for
video streaming applications. First, we show that random linear network coding
applied to blocks of video frames can significantly simplify the packet
requests at the network layer and save resources by avoiding duplicate packet
reception. Network coding allows us to model the receiver's buffer as a queue
with Poisson arrivals and deterministic departures. We consider the probability
of interruption in video playback as well as the number of initially buffered
packets (initial waiting time) as the QoE metrics.
We consider the problem of rate allocation among multiple simultaneous video
streams sharing multiple heterogeneous access networks. We develop and evaluate
an analytical framework for optimal rate allocation based on observed available
bit rate (ABR) and round-trip time (RTT) over each access network and video
distortion-rate (DR) characteristics. The rate allocation is formulated as a
convex optimization problem that minimizes the total expected distortion of all
video streams.
Dance video is one of the important types of narrative video with semantically
rich content. This paper proposes a new meta model, Dance Video Content Model
(DVCM) to represent the expressive semantics of the dance videos at multiple
granularity levels. The DVCM is designed based on the concepts such as video,
shot, segment, event and object, which are the components of MPEG-7 MDS. This
paper introduces a new relationship type called Temporal Semantic Relationship
to infer the semantic relationships between the dance video objects.
Dance videos are interesting and semantics-intensive. At the same time, they
are a more complex type of video than other types such as sports, news
and movie videos. In fact, dance video has been less explored by
researchers across the globe. Dance videos exhibit rich semantics, such as macro
features and micro features, and can be classified into several types. Hence,
the conceptual modeling of the expressive semantics of dance videos is
crucial and complex.
Educational media mining is the process of converting raw media data from
educational systems to useful information that can be used to design learning
systems, answer research questions and allow personalized learning experiences.
Knowledge discovery encompasses a wide range of techniques ranging from
database queries to more recent developments in machine learning and language
technology. Educational media mining techniques are now being used in IT
Services research worldwide.
Out of the scope of the usual positions of computing in the field of music
and musicology, one notices the emergence of human-computer systems that do
exist by breaking off. Though these singular systems take effect in the usual
fields of expansion of music, they do not make any systematic reference to
known musicological categories. On the contrary, they make possible experiments
that open uses where listening, composition and musical transmission get merged
in a gesture sometimes named as "music-ripping".
In this paper we present the new genre of interactive operas implemented on
personal computers. They differ from traditional ones not only because they are
virtual, but mainly because they offer to composers and listeners new
perspectives of combinations and interactions between music, text and visual
aspects.
QoS is a very important issue for multimedia communication systems. In this
paper, a new system is proposed that reinstalls the relation between the QoS
elements (RSVP, routing protocol, sender, and receiver) during multimedia
transmission; an alternative path is then created in case of
failure of the original multimedia path.
Recent advances in the field of multimedia have provided many facilities for the
transport, transmission and manipulation of data. Along with this advancement
of facilities come larger threats to the authentication of data, its licensed
use and its protection against illegal use. Many digital image
watermarking techniques have been designed and implemented to stop the illegal
use of digital multimedia images.
Playout buffers are used in VoIP systems to compensate for network delay
jitter by making a trade-off between delay and loss. In this work we propose a
playout buffer algorithm that makes the trade-off based on maximization of
conversational speech quality, aiming to keep the computational complexity
as low as possible. We model the network delay using a Pareto distribution and
show that it is a good compromise between providing an appropriate fit to the
network delay characteristics and yielding a low arithmetical complexity.
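The low arithmetical cost of a Pareto delay model can be seen in the sketch
below, which relates the playout delay to the late-loss probability through the
Pareto tail P(D > d) = (x_m / d)^alpha for d >= x_m. It illustrates only this
delay/loss trade-off, not the quality-maximizing algorithm of the paper; the
parameter names and the example values are assumptions.

```python
def late_loss_probability(playout_delay, scale, shape):
    """Probability that a packet arrives after the playout deadline.

    scale is the Pareto minimum delay x_m and shape is the tail index alpha.
    """
    if playout_delay <= scale:
        return 1.0
    return (scale / playout_delay) ** shape


def playout_delay_for_loss(target_loss, scale, shape):
    """Smallest playout delay whose late-loss probability equals target_loss."""
    if not 0.0 < target_loss <= 1.0:
        raise ValueError("target_loss must be in (0, 1]")
    return scale * target_loss ** (-1.0 / shape)
```

With x_m = 20 ms and alpha = 2, a 1% late-loss target corresponds to a playout
delay of about 200 ms, obtained with a single power operation.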
The intent of the H.264 AVC project was to create a standard capable of
providing good video quality at substantially lower bit rates than previous
standards without increasing the complexity of design so much that it would be
impractical or excessively expensive to implement. An additional goal was to
provide enough flexibility to allow the standard to be applied to a wide
variety of applications.
WiCoM enables remote management of web resources. Our application, Mobile
Reporter, is aimed at journalists, who will be able to capture events in
real time using their mobile phones and update their web server with the latest
events. WiCoM has been developed using J2ME technology on the client side and
PHP on the server side. The communication between the client and the server is
established through GPRS. Mobile Reporter will be able to upload, edit and
remove both textual and multimedia content on the server.