The MuSeUM project addresses the prototypical problem of a cultural heritage institution that aims to disclose all of its content in a single, unified system. The institution has various legacy systems, each covering a small part of the collection and each constructed for different purposes, at different times, by different people working in different traditions, based on different design principles, with different access methods, and so on. In short, the cultural heritage institution is confronted with its own history.
MuSeUM investigates theoretically transparent ways of combining modern information retrieval methods based on statistical language modeling with varying amounts of metadata and non-content features. Our approach to metadata is, in essence, the famous dumb-down principle: although metadata is based on a specific thesaurus or ontology, we can always fall back on the description of the terms in ordinary language. In this way, we can directly employ the powerful methods of textual information retrieval.
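To make the dumb-down fallback concrete, the following is a minimal sketch and not the project's actual system. It assumes a hypothetical thesaurus (`THESAURUS`) that maps each metadata term to an ordinary-language description, flattens a record into plain words, and ranks records with a query-likelihood language model using Jelinek-Mercer smoothing. The record layout, the term identifier, the smoothing method, and the weight `lam` are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical thesaurus: metadata term -> ordinary-language description.
THESAURUS = {
    "obj:still-life": "painting of arranged everyday objects such as fruit and flowers",
}

def dumb_down(record):
    """Flatten a record to plain words: its free text plus the
    ordinary-language descriptions of its metadata terms."""
    words = record["text"].lower().split()
    for term in record["metadata"]:
        words += THESAURUS.get(term, "").lower().split()
    return words

def query_log_likelihood(query, doc_words, coll_counts, coll_len, lam=0.5):
    """Log P(query | document) with Jelinek-Mercer smoothing:
    P(w | d) = lam * P_ml(w | d) + (1 - lam) * P(w | collection)."""
    doc_counts = Counter(doc_words)
    score = 0.0
    for w in query.lower().split():
        p_doc = doc_counts[w] / len(doc_words) if doc_words else 0.0
        p_coll = coll_counts[w] / coll_len
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            # Word occurs nowhere in the collection.
            return float("-inf")
        score += math.log(p)
    return score

def rank(query, records):
    """Return record ids sorted from most to least likely to generate the query."""
    flattened = {rid: dumb_down(rec) for rid, rec in records.items()}
    coll_counts = Counter(w for words in flattened.values() for w in words)
    coll_len = sum(coll_counts.values())
    scores = {
        rid: query_log_likelihood(query, words, coll_counts, coll_len)
        for rid, words in flattened.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative records: only r1 carries a metadata term.
RECORDS = {
    "r1": {"text": "oil on canvas by an unknown master", "metadata": ["obj:still-life"]},
    "r2": {"text": "portrait of a merchant in oil on canvas", "metadata": []},
}

if __name__ == "__main__":
    print(rank("fruit flowers", RECORDS))
```

In this toy setting, a query such as "fruit flowers" can match record `r1` only through the description of its metadata term, which is the effect the dumb-down principle relies on.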
Concretely, we will address the following research questions: