DIESIRAE is a prototype for extracting and indexing knowledge from natural language documents. The underlying domain model has two levels: a conceptual level, which represents the domain knowledge and is described by a Domain Ontology (our main industrial partner was a wine company, so the ontology concerns the wine domain), and a lexical level, which represents the domain vocabulary and is based on WordNet. A stochastic model, which combines HMM and Maximum Entropy models in a novel way, stores the mapping between the two levels, taking into account the linguistic context of each word. This context contains not only the surrounding words but also the morphological and syntactic information extracted by Natural Language Processing tools. During the document indexing phase, the stochastic model is used to disambiguate word senses. The Semantic Information Retrieval engine we developed supports both simple keyword-based queries and natural language queries. The engine can also extend the domain knowledge by discovering new, relevant concepts to add to the domain model.
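
To make the idea of context-dependent sense disambiguation concrete, here is a minimal Python sketch of a maximum-entropy-style (log-linear) scorer that maps a word to an ontology concept using the word itself, its part of speech, and its neighbours as features. It is not the DIESIRAE model: it leaves out the HMM component entirely, and the concept names (`WineBody`, `HumanBody`), the feature templates, and the weights are all hypothetical values chosen by hand for illustration, whereas a real system would learn the weights from annotated text.

```python
import math

# Hypothetical feature weights, set by hand for illustration only.
# Keys are (feature, concept) pairs; missing pairs count as weight 0.
WEIGHTS = {
    ("lemma=body", "WineBody"): 0.5,
    ("lemma=body", "HumanBody"): 0.5,
    ("pos=NOUN", "WineBody"): 0.2,
    ("pos=NOUN", "HumanBody"): 0.2,
    ("prev=full", "WineBody"): 1.5,
    ("next=temperature", "HumanBody"): 1.5,
}

# Hypothetical lexical-to-conceptual mapping: candidate concepts per word.
CANDIDATES = {"body": ["WineBody", "HumanBody"]}


def context_features(tokens, i):
    """Build features for tokens[i]; tokens is a list of (word, pos) pairs."""
    word, pos = tokens[i]
    feats = [f"lemma={word}", f"pos={pos}"]
    if i > 0:
        feats.append(f"prev={tokens[i - 1][0]}")
    if i + 1 < len(tokens):
        feats.append(f"next={tokens[i + 1][0]}")
    return feats


def concept_distribution(tokens, i):
    """Return P(concept | context) as a softmax over summed feature weights."""
    concepts = CANDIDATES.get(tokens[i][0], [])
    feats = context_features(tokens, i)
    scores = {c: sum(WEIGHTS.get((f, c), 0.0) for f in feats) for c in concepts}
    z = sum(math.exp(s) for s in scores.values())
    return {c: math.exp(s) / z for c, s in scores.items()}


def disambiguate(tokens, i):
    """Pick the most probable concept, or None if the word has no candidates."""
    dist = concept_distribution(tokens, i)
    return max(dist, key=dist.get) if dist else None


wine_phrase = [("full", "ADJ"), ("body", "NOUN")]
medical_phrase = [("body", "NOUN"), ("temperature", "NOUN")]
print(disambiguate(wine_phrase, 1))     # WineBody
print(disambiguate(medical_phrase, 0))  # HumanBody
```

The same word receives a different concept depending on its neighbours, which is the role the linguistic context plays in the mapping between the lexical and conceptual levels.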

DIESIRAE is part of the ArtDeco project (see the Projects section above).

