Nov 07 2012


Roberto Tedesco

DIESIRAE is a prototype for extracting and indexing knowledge from natural language documents. The underlying domain model relies on two levels: a conceptual level, described by means of a Domain Ontology, which represents the domain knowledge (as our main industrial partner was a wine company, the Domain Ontology covers the wine domain); and a lexical level, based on WordNet, which represents the domain vocabulary. A stochastic model, which mixes HMM and Maximum Entropy models in a novel way, stores the mapping between these levels, taking into account the linguistic context of words. This context contains not only the surrounding words but also the morphological and syntactic information extracted by means of Natural Language Processing tools. The stochastic model is then used to disambiguate word senses during the document indexing phase. The Semantic Information Retrieval engine we developed supports simple keyword-based queries as well as natural language-based queries. The engine is also able to extend the domain knowledge by discovering new, relevant concepts to add to the domain model.
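The description above does not say how the HMM and Maximum Entropy components are combined (the combination is described as novel), so the following is only a minimal sketch of the general idea using a different, standard technique: a Maximum Entropy Markov Model (MEMM). Each word is tagged with a domain concept; a log-linear (Maximum Entropy) model gives P(concept | previous concept, context features), and HMM-style Viterbi decoding picks the most likely concept sequence. The lexicon, concept names, feature strings such as `pos=NOUN`, and weights are all hypothetical and are not taken from DIESIRAE.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical lexicon: word -> candidate domain concepts. In DIESIRAE the
# lexical level is WordNet-based; these entries are invented for illustration.
# Unknown words are not handled in this sketch.
LEXICON: Dict[str, List[str]] = {
    "red": ["Colour", "RedWine"],
    "glass": ["Material", "WineGlass"],
    "pour": ["Pouring"],
}

# Hypothetical Maximum Entropy weights, keyed by (feature, concept).
# Features encode the previous concept ("prev=...") or context info ("pos=...").
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("prev=Pouring", "RedWine"): 1.5,
    ("prev=RedWine", "WineGlass"): 2.0,
    ("pos=NOUN", "RedWine"): 0.5,
    ("pos=ADJ", "Colour"): 1.0,
}

START = "<s>"  # pseudo-concept preceding the first word


def local_log_probs(prev: str, candidates: Sequence[str],
                    context: Sequence[str]) -> Dict[str, float]:
    """Log of the MaxEnt distribution P(c | prev, context) over the candidates."""
    feats = ["prev=" + prev] + list(context)
    scores = {c: sum(WEIGHTS.get((f, c), 0.0) for f in feats) for c in candidates}
    log_z = math.log(sum(math.exp(s) for s in scores.values()))  # normaliser
    return {c: s - log_z for c, s in scores.items()}


def viterbi(words: Sequence[str], contexts: Sequence[Sequence[str]]) -> List[str]:
    """Most likely concept sequence, one concept per word."""
    # best[c] = (log-prob of the best path ending in concept c, that path)
    best: Dict[str, Tuple[float, List[str]]] = {START: (0.0, [])}
    for word, ctx in zip(words, contexts):
        new_best: Dict[str, Tuple[float, List[str]]] = {}
        for prev, (prev_lp, path) in best.items():
            for c, lp in local_log_probs(prev, LEXICON[word], ctx).items():
                total = prev_lp + lp
                if c not in new_best or total > new_best[c][0]:
                    new_best[c] = (total, path + [c])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]


if __name__ == "__main__":
    print(viterbi(["pour", "red", "glass"],
                  [["pos=VERB"], ["pos=NOUN"], ["pos=NOUN"]]))
```

With these made-up weights, "red" after "pour" is resolved to `RedWine` rather than `Colour`, and that choice in turn favours `WineGlass` for "glass", so the call returns `['Pouring', 'RedWine', 'WineGlass']`. This shows how the previous concept and the morphosyntactic context jointly drive disambiguation; note that a plain MEMM normalises locally and is known to suffer from label bias, which may be one reason to combine HMM and MaxEnt components differently.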

DIESIRAE is part of the ArtDeco project (see the Projects section above).

Designed and developed by:
L. Sbattella and R. Tedesco

Extending the Domain Model


A short demo of DIESIRAE (a brief description of the query syntax can be found in the attached document).

We do not plan to release DIESIRAE at this time, as it is still in an early stage of development.