MuSeUM : Multiple-collection Searching Using Metadata

The MuSeUM project addresses the prototypical problem of a cultural heritage institution with the ambition to disclose all of its content in a single, unified system. The institution has various legacy systems, each dealing with a small part of the collection, each constructed for different purposes, at different times, by different people, working in different traditions, based on different design principles, with different access methods, and so on. In short, the cultural heritage institution is confronted with its own history.

MuSeUM investigates theoretically transparent ways of combining modern information retrieval methods based on statistical language modeling with varying amounts of metadata and non-content features. Our approach to metadata is, in essence, the famous dumb-down principle: although metadata is based on a specific thesaurus or ontology, we can always fall back on the description of the terms in ordinary language. In this way, we can directly employ the powerful methods of textual information retrieval.
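As a rough illustration of the dumb-down idea, the sketch below replaces each metadata code by its plain-language description and then ranks records with a simple query-likelihood language model. Everything in it is an assumption made for illustration and not the project's actual system: the `THESAURUS` codes and descriptions, the two sample records, the function names, and the choice of Jelinek-Mercer smoothing with `lam=0.5`.

```python
import math
from collections import Counter

# Hypothetical thesaurus: controlled-vocabulary code -> ordinary-language description.
THESAURUS = {
    "T:0042": "oil painting on canvas",
    "T:0107": "seventeenth century dutch",
}

def dumb_down(record):
    """Flatten a record to words: its free text plus the descriptions of its metadata terms."""
    words = record["text"].lower().split()
    for code in record.get("metadata", []):
        words += THESAURUS.get(code, "").split()
    return words

def query_likelihood(query, doc_words, collection_counts, collection_len, lam=0.5):
    """Return log P(query | document) under a unigram model with Jelinek-Mercer smoothing."""
    doc_counts = Counter(doc_words)
    doc_len = len(doc_words)
    score = 0.0
    for term in query.lower().split():
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_col = collection_counts[term] / collection_len
        p = lam * p_doc + (1 - lam) * p_col
        if p == 0.0:
            # Term occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score

# Two made-up records; only the first carries metadata.
records = [
    {"id": "a", "text": "Portrait of a merchant", "metadata": ["T:0042", "T:0107"]},
    {"id": "b", "text": "Bronze statue of a horse", "metadata": []},
]
docs = {r["id"]: dumb_down(r) for r in records}
collection_counts = Counter(w for words in docs.values() for w in words)
collection_len = sum(collection_counts.values())

def rank(query):
    """Return record ids ordered from most to least likely to generate the query."""
    scored = [
        (query_likelihood(query, words, collection_counts, collection_len), rid)
        for rid, words in docs.items()
    ]
    return [rid for _, rid in sorted(scored, reverse=True)]

print(rank("dutch painting"))
```

In this toy setting, the query "dutch painting" matches record "a" only through the descriptions of its metadata terms, since neither word occurs in its free text.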

Concretely, we will address the following research questions:

What is the effectiveness of information retrieval techniques on a collection with varying degrees of metadata? That is:

What if we ignore all metadata?
What if we use only the heterogeneous metadata of the original subcollections?
What if we use only the common metadata?
What if we use all available metadata?

What is the retrieval effectiveness for various user types and task types?
What is the relative impact of techniques dealing with structure?
What is the relative impact of techniques dealing with multilingual content, metadata and information needs?

Project description

A detailed project description is here.
