
MIR

(Multimodal Information Retrieval System)

Developer

Center of Excellence for Document Analysis and Recognition, University at Buffalo, NY, USA.

URL

http://www.cedar.buffalo.edu/MMIR/. A demo of the system is available at http://www.cedar.Buffalo.EDU/MMIR/demo/index.html.

References

[Sri95], [SZR00].
  
Figure 17: MIR query result.

Features

The MIR system combines various techniques from text processing and image processing in an attempt to derive semantic descriptions of images. Natural language processing (NLP) techniques are applied to image captions to derive information about the picture's content. This includes whether there are people in the picture, their names, the location and time of the photograph, the spatial relationships of the people in the image, and other visual characteristics (attributes that can assist in face recognition, such as gender, hair color, beards, mustaches, and glasses). In addition, statistical text indexing techniques are used to capture the general context of the image (such as indoor versus outdoor).

Pictures in which NLP detects the presence of people are subjected to face detection to verify this hypothesis. The face detection algorithm uses a three-contour model representing the hairline and the left and right outlines of the face. After edges are extracted at different image resolutions by means of a wavelet transform, graph matching generates possible face candidates in the image. After face detection, the face areas are cropped out, and a color histogram is computed for the rest of the image.

For scenery pictures (pictures in which the NLP analysis indicates no people), a color correlation histogram is computed. If $\{i,j,k\}$ is a color triple in the $2\times2\times2$ quantized HSV space, then $h^{l}_{ijk}$ denotes the number of occurrences of three pixels of these colors as the vertices of an isosceles right triangle whose two shorter sides have length $l$. The color correlation histogram is $f^{l}_{ijk} = h^{l}_{ijk} / (4 h_i)$, where $h_i$ is the $i$th bin of the traditional color histogram.
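The color correlation histogram can be made concrete with a short sketch. It is a minimal illustration under several assumptions not stated in the text: HSV is quantized with 2 levels per channel into bin $4q_H + 2q_S + q_V$, the right-angle vertex carries color $i$, the triangles are the four axis-aligned ones at that pixel (which would explain the factor 4 in the normalization), and colors $j$ and $k$ are read from the two other vertices in a fixed rotational order. The function names are hypothetical; `colorsys.rgb_to_hsv` is the Python standard library call.

```python
import colorsys

# Assumed triangle layout: the right angle sits at pixel p, and each pair gives
# the (dy, dx) unit steps from p to the j-vertex and to the k-vertex. The second
# leg is the first leg rotated by 90 degrees, giving four orientations per pixel.
ORIENTATIONS = [((0, 1), (1, 0)), ((1, 0), (0, -1)),
                ((0, -1), (-1, 0)), ((-1, 0), (0, 1))]


def quantize_hsv(rgb):
    """Map (r, g, b) in [0, 1] to one of 8 bins: 2 levels each for H, S, V."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    qh, qs, qv = (min(int(c * 2), 1) for c in (h, s, v))
    return 4 * qh + 2 * qs + qv


def color_histogram(labels, n_bins=8):
    """Traditional color histogram h_i of a 2-D grid of bin labels."""
    hist = [0] * n_bins
    for row in labels:
        for c in row:
            hist[c] += 1
    return hist


def color_correlation_histogram(labels, l, n_bins=8):
    """Return f[i][j][k] = h^l_ijk / (4 * h_i) for a 2-D grid of bin labels."""
    rows, cols = len(labels), len(labels[0])
    h = [[[0] * n_bins for _ in range(n_bins)] for _ in range(n_bins)]
    for y in range(rows):
        for x in range(cols):
            i = labels[y][x]
            for (dy1, dx1), (dy2, dx2) in ORIENTATIONS:
                y1, x1 = y + l * dy1, x + l * dx1
                y2, x2 = y + l * dy2, x + l * dx2
                # Count only triangles whose vertices all lie inside the image.
                if 0 <= y1 < rows and 0 <= x1 < cols and 0 <= y2 < rows and 0 <= x2 < cols:
                    h[i][labels[y1][x1]][labels[y2][x2]] += 1
    hist = color_histogram(labels, n_bins)
    # Colors absent from the image get an all-zero slice instead of dividing by 0.
    return [[[h[i][j][k] / (4 * hist[i]) if hist[i] else 0.0
              for k in range(n_bins)]
             for j in range(n_bins)]
            for i in range(n_bins)]


# Usage: a 3x3 pure-red image falls entirely into bin 3 (H=0, S=1, V=1).
image = [[(1.0, 0.0, 0.0)] * 3 for _ in range(3)]
labels = [[quantize_hsv(p) for p in row] for row in image]
f = color_correlation_histogram(labels, l=1)
print(f[3][3][3])  # 16 in-bounds triangles / (4 * 9 pixels)
```

Under these conventions, border pixels lose some of their four triangles, so each slice $f^{l}_{i\cdot\cdot}$ sums to at most 1.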

Querying

A query can include a text string, an image, and a topic selected from a predefined set (such as sports, politics, and entertainment). The user can also indicate the relative importance of text versus image content and, for images containing people, the relative importance of the foreground (the people) versus the background.
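One way to picture such a query is as a small record holding the three optional parts and the two importance settings. This is a hypothetical sketch: the class `MirQuery`, the topic set, and especially the rule that splits the two settings into three source weights are assumptions for illustration, not the MIR interface.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subset only; the actual predefined topic list is not given.
TOPICS = {"sports", "politics", "entertainment"}


@dataclass
class MirQuery:
    text: str = ""
    image: Optional[str] = None        # hypothetical: identifier of the query image
    topic: Optional[str] = None
    text_weight: float = 0.5           # importance of text versus image content
    foreground_weight: float = 0.5     # importance of people versus background

    def __post_init__(self):
        if self.topic is not None and self.topic not in TOPICS:
            raise ValueError(f"unknown topic: {self.topic!r}")
        for name in ("text_weight", "foreground_weight"):
            w = getattr(self, name)
            if not 0.0 <= w <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {w}")

    def source_weights(self):
        """Assumed split of the two settings into weights that sum to 1."""
        image_weight = 1.0 - self.text_weight
        return {
            "text": self.text_weight,
            "foreground": image_weight * self.foreground_weight,
            "background": image_weight * (1.0 - self.foreground_weight),
        }


q = MirQuery(text="press conference", topic="politics",
             text_weight=0.25, foreground_weight=0.5)
print(q.source_weights())  # {'text': 0.25, 'foreground': 0.375, 'background': 0.375}
```

The multiplicative split keeps the two user settings independent: changing the foreground/background balance never alters how much weight text receives.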

Matching

The similarity between a query and a database image is a weighted sum of the similarities between different sources of information (text-based and content-based) extracted from the image. Two color correlation histograms are matched using the Euclidean distance.
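The matching step can be sketched as follows. Histograms are compared as flat vectors (for example, the $8\times8\times8$ array for one value of $l$ flattened into 512 numbers). The mapping from a Euclidean distance to a similarity, `1 / (1 + d)`, is an assumption made here so that the image score can enter a weighted sum alongside a text score; the text does not specify how the distance is converted. All function names are hypothetical.

```python
import math


def euclidean_distance(f1, f2):
    """Euclidean distance between two flattened color correlation histograms."""
    if len(f1) != len(f2):
        raise ValueError("histograms must have the same number of bins")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def distance_to_similarity(d):
    """Assumed mapping from a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + d)


def combined_similarity(similarities, weights):
    """Weighted sum of per-source similarities.

    Both arguments map a source name (e.g. "text", "image") to a number.
    A source that has a weight but no similarity contributes 0.
    """
    return sum(w * similarities.get(source, 0.0) for source, w in weights.items())


# Usage with made-up numbers: two 4-bin histograms and a text score.
query_hist = [0.5, 0.25, 0.25, 0.0]
db_hist = [0.5, 0.0, 0.25, 0.25]
image_sim = distance_to_similarity(euclidean_distance(query_hist, db_hist))
score = combined_similarity({"text": 0.8, "image": image_sim},
                            {"text": 0.3, "image": 0.7})
print(score)
```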

Applications

The system retrieves pictures of people and/or similar scenery in various contexts. Three databases were used to test the system. The first consists of approximately 5000 images with accompanying text provided by United Press International. The second, provided by Kodak, consists of consumer photos accompanied by speech annotations. The third consists of multimodal documents downloaded from the Web.

 
Remco Veltkamp
2001-03-08