MIR (Multimodal Information Retrieval System)
Center of Excellence for Document Analysis and Recognition, University at Buffalo,
NY, USA.
http://www.cedar.buffalo.edu/MMIR/.
A demo of the system is available at
http://www.cedar.Buffalo.EDU/MMIR/demo/index.html.
[Sri95], [SZR00].
Figure 17: MIR query result.
The MIR system combines various techniques from text processing and image
processing in an attempt to derive semantic descriptions of images.
Natural language processing (NLP) techniques are applied to the image captions
to derive information about the picture's content.
This includes whether there are people in the picture, their names, the location
and time of the photograph, the spatial relationships among the people in the image,
and other visual characteristics (attributes that can assist in face
recognition, such as gender, hair color, beards, mustaches, and glasses).
Statistical text indexing techniques are also used to capture the general
context of the image (such as indoor versus outdoor).
Pictures in which NLP detects the presence of people are subjected
to face detection to verify this hypothesis.
The face detection algorithm uses a three-contour model representing the hairline
and the left and right outlines of the face.
Edges are extracted at different image resolutions by means of a wavelet
transform, after which graph matching generates candidate faces in the image.
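The multi-resolution edge extraction step can be illustrated with a minimal sketch. The source does not say which wavelet is used; this sketch assumes a plain 2-D Haar transform on a grayscale image given as a list of rows, and takes the magnitude of the horizontal and vertical detail coefficients at each level as the edge strength. The function names `haar_level` and `multiresolution_edges` are hypothetical, and the three-contour model and the graph matching are not shown.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def haar_level(img: Image) -> Tuple[Image, Image]:
    """Apply one level of a 2-D Haar transform to non-overlapping 2x2 blocks.

    Returns (approximation, edge_magnitude); both are half the size of img.
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even at every level")
    approx: Image = []
    edges: Image = []
    for r in range(0, rows, 2):
        approx_row: List[float] = []
        edge_row: List[float] = []
        for c in range(0, cols, 2):
            tl, tr = img[r][c], img[r][c + 1]
            bl, br = img[r + 1][c], img[r + 1][c + 1]
            approx_row.append((tl + tr + bl + br) / 4)
            horizontal = (tl + tr - bl - br) / 4  # top vs. bottom: horizontal edges
            vertical = (tl - tr + bl - br) / 4    # left vs. right: vertical edges
            edge_row.append(math.hypot(horizontal, vertical))
        approx.append(approx_row)
        edges.append(edge_row)
    return approx, edges


def multiresolution_edges(img: Image, levels: int) -> List[Image]:
    """Return one edge-magnitude map per level, from finest to coarsest."""
    maps: List[Image] = []
    current = img
    for _ in range(levels):
        current, edge_map = haar_level(current)  # recurse on the approximation
        maps.append(edge_map)
    return maps
```

For a 4x4 image whose first column is 0 and whose other columns are 8, `multiresolution_edges(img, 2)` returns `[[[4.0, 0.0], [4.0, 0.0]], [[2.0]]]`: the step is detected at both resolutions, with a weaker response at the coarser level.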
After face detection, the face areas are cropped out and a color histogram
is computed for the rest of the image.
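Computing the color histogram only over the pixels outside the cropped face areas can be sketched as follows. This assumes the image has already been quantized into color-bin indices and that each detected face is given as an axis-aligned box; both the box format and the name `background_histogram` are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical face-box format: (top, left, bottom, right), bottom/right exclusive.
Box = Tuple[int, int, int, int]


def background_histogram(quantized: Sequence[Sequence[int]],
                         face_boxes: Sequence[Box],
                         n_bins: int) -> List[int]:
    """Count color-bin occurrences over all pixels outside the face boxes."""
    hist = [0] * n_bins
    for r, row in enumerate(quantized):
        for c, color in enumerate(row):
            in_face = any(top <= r < bottom and left <= c < right
                          for top, left, bottom, right in face_boxes)
            if not in_face:  # only background pixels contribute
                hist[color] += 1
    return hist
```

For example, `background_histogram([[0, 1, 1], [0, 2, 2], [3, 3, 3]], [(0, 1, 2, 3)], 4)` returns `[2, 0, 0, 3]`, because the four pixels of colors 1 and 2 lie inside the face box.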
For scenery pictures (where the NLP analysis detects no people),
a color correlation histogram is computed.
If $(i, j, k)$ is a color triple in the quantized HVS space, then $h_{ijkl}$ denotes
the number of occurrences of three pixels of these colors as the vertices of an
isosceles right triangle whose two shorter sides have length $l$.
The color correlation histogram is
$f_l(i, j, k) = h_{ijkl} / (4 h_i)$,
where $h_i$ is the $i$th bin of the traditional color histogram.
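The definition can be turned into a small sketch. The source does not specify how the triangles are oriented; one reading that fits the normalization factor 4 is that the pixel of color $i$ sits at the right-angle vertex, the legs are axis-aligned, and there are four orientations per pixel, with $j$ and $k$ at the ends of the first and second leg. The sketch assumes that reading and an image already quantized into color indices; `color_correlation_histogram` is a hypothetical name, and the result is a sparse dictionary keyed by $(i, j, k)$.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Triple = Tuple[int, int, int]


def color_correlation_histogram(quantized: Sequence[Sequence[int]],
                                l: int) -> Dict[Triple, float]:
    """Return f_l(i, j, k) = h_ijkl / (4 * h_i) as a sparse dictionary.

    Assumption: the pixel of color i is the right-angle vertex, the legs of
    length l are axis-aligned, and j, k are the colors at the ends of the
    first and second leg (four orientations, each rotated by 90 degrees).
    """
    rows, cols = len(quantized), len(quantized[0])
    legs = [(0, l), (-l, 0), (0, -l), (l, 0)]  # (row, col) offsets
    h_i: Counter = Counter()     # traditional color histogram
    h_ijkl: Counter = Counter()  # triangle counts for this fixed l
    for r in range(rows):
        for c in range(cols):
            i = quantized[r][c]
            h_i[i] += 1
            for t in range(4):
                (dr1, dc1), (dr2, dc2) = legs[t], legs[(t + 1) % 4]
                r1, c1, r2, c2 = r + dr1, c + dc1, r + dr2, c + dc2
                # Triangles that fall partly outside the image are skipped.
                if 0 <= r1 < rows and 0 <= c1 < cols and 0 <= r2 < rows and 0 <= c2 < cols:
                    h_ijkl[(i, quantized[r1][c1], quantized[r2][c2])] += 1
    return {(i, j, k): n / (4 * h_i[i]) for (i, j, k), n in h_ijkl.items()}
```

For `[[0, 1], [2, 3]]` and `l = 1`, each pixel is the right-angle vertex of exactly one triangle that fits in the image, so every observed triple gets the value 0.25. Because border pixels have fewer than four complete triangles, the values for a given $i$ sum to less than 1 near the image edges under this reading.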
A query can include a text string, an image, and a topic (selected
from a predefined set containing sports, politics, entertainment, etc.).
The user can also indicate the relative importance of text versus image
content and, for images containing people, the relative importance of the
background versus the foreground (the people).
The similarity between a query and a database image is a weighted sum of the
similarities between different sources of information (text-based and
content-based) extracted from the image.
Two color correlation histograms are matched using the Euclidean distance.
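The matching and scoring can be sketched as follows. Sparse correlation histograms (dictionaries keyed by color triple, with missing keys treated as zero) are compared with the Euclidean distance, and per-source similarities are combined as a weighted sum with user-supplied weights. The source does not say how a distance is converted into a similarity or how the sources are named; the mapping `1 / (1 + d)`, the source names, and the function names are assumptions.

```python
import math
from typing import Dict, Hashable


def euclidean_distance(f_a: Dict[Hashable, float], f_b: Dict[Hashable, float]) -> float:
    """Euclidean distance between two sparse histograms (missing bins count as 0)."""
    keys = set(f_a) | set(f_b)
    return math.sqrt(sum((f_a.get(k, 0.0) - f_b.get(k, 0.0)) ** 2 for k in keys))


def combined_similarity(similarities: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-source similarities (e.g. text-based and content-based)."""
    return sum(weights[source] * sim for source, sim in similarities.items())


# Hypothetical example: a query weighted 60% on text and 40% on image content.
d = euclidean_distance({(0, 0, 0): 0.25}, {(0, 0, 0): 0.25, (0, 1, 1): 0.5})
image_sim = 1.0 / (1.0 + d)  # assumed distance-to-similarity mapping
score = combined_similarity({"text": 0.8, "image": image_sim},
                            {"text": 0.6, "image": 0.4})
```

Here `d` is 0.5, so `image_sim` is 2/3 and `score` is about 0.747. A foreground/background weighting for images with people could be expressed the same way, by adding those as further sources in both dictionaries.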
The system retrieves pictures of people and/or similar scenery in
various contexts. Three databases were used to test the system.
The first consisted of approximately 5000 images with accompanying
text, provided by United Press International. The second, provided
by Kodak, consisted of consumer photos accompanied by speech
annotations. The third consisted of multimodal documents
downloaded from the Web.
Remco Veltkamp
2001-03-08