Oct 30 2012


Roberto Tedesco

Providing access to complex content is a challenge that authors must cope with, and it is particularly hard when the content has to be accessed by people with cognitive and learning disabilities. SPARTA2 is a tool that supports the authoring of highly accessible texts. With it, a text can be tailored to meet the requirements of a specific target audience. The tool not only calculates the current readability level of the text, but also actively supports authors by pointing out the critical parts and suggesting how to modify them. The tool is integrated with the Word 2007 user interface and is particularly easy to use.

In contrast with other approaches, SPARTA2 distinguishes between readability and understandability because, in our opinion, these concepts capture different aspects of text complexity: a text could be highly readable, since its syntax is extremely simple, and yet extremely hard to understand because of the lexicon it uses. In our approach, readability evaluates the structure of sentences, while understandability captures lexical aspects.

The Readability Index is composed of three sub-indexes: the Gulpease Index, the Chunk Index, and the Chunk Type Index.

The Gulpease Index is a widely used readability formula for the Italian language (SPARTA2 currently analyses Italian texts, but it can easily be adapted to other languages). This approach, which is similar to Flesch's formula and widely adopted in the readability literature, does not take into account the deep structure of the sentences.
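The Gulpease Index is usually computed as 89 + (300 * sentences - 10 * letters) / words, with higher values indicating easier texts. The sketch below computes it using deliberately naive tokenization, which is an assumption of this example and not how SPARTA2 tokenizes: sentences are split at `.`, `!` and `?`, and words are runs of letters (so an elided form like "l'albero" counts as two words).

```python
import re

# Runs of Unicode letters (accented Italian letters included, digits excluded).
LETTER_RUN = re.compile(r"[^\W\d_]+")


def gulpease(text: str) -> float:
    """Gulpease Index: 89 + (300 * sentences - 10 * letters) / words."""
    words = LETTER_RUN.findall(text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(len(w) for w in words)
    # A sentence is any segment between terminators that contains a letter,
    # so a final sentence without a full stop is still counted.
    sentences = sum(1 for seg in re.split(r"[.!?]+", text) if LETTER_RUN.search(seg))
    return 89 + (300 * sentences - 10 * letters) / len(words)
```

Because the formula rewards short words and many sentences per word, very short texts can score above 100; the index is meaningful mainly on passages of realistic length.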

The Chunk Index and the Chunk Type Index take into account the structure of the sentences in terms of chunks. These indexes are based on the analysis performed by CHAOS, a shallow parser for Italian. In particular, the Chunk Index relates the number of chunks in a text to its readability. However, counting chunks ignores the fact that different chunk types may affect readability differently. Thus, we added the Chunk Type Index, which is based on the distribution of chunk types in the text.
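The formulas behind the two chunk-based indexes are not given here, so the following is only a hypothetical sketch of the kind of computation involved: a chunk index taken as the average number of chunks per sentence, and a chunk type index taken as a weighted average over the chunk-type distribution. The input format, the labels `NP`, `VP`, `PP` and `SUB`, and the weights are all assumptions; they are not CHAOS's tag set or SPARTA2's actual parameters.

```python
from collections import Counter

# Hypothetical input: each sentence is a list of chunk-type labels.
Sentence = list[str]

# Hypothetical difficulty weights per chunk type (illustrative only).
HYPOTHETICAL_TYPE_WEIGHTS = {
    "NP": 1.0,   # nominal chunk
    "VP": 1.0,   # verbal chunk
    "PP": 1.5,   # prepositional chunk
    "SUB": 2.0,  # subordinate-clause marker
}


def chunk_index(sentences: list[Sentence]) -> float:
    """Average number of chunks per sentence (more chunks: harder text)."""
    if not sentences:
        raise ValueError("no sentences")
    return sum(len(s) for s in sentences) / len(sentences)


def chunk_type_distribution(sentences: list[Sentence]) -> dict[str, float]:
    """Relative frequency of each chunk type across the whole text."""
    counts = Counter(label for s in sentences for label in s)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no chunks")
    return {label: n / total for label, n in counts.items()}


def chunk_type_index(sentences: list[Sentence],
                     weights: dict[str, float] = HYPOTHETICAL_TYPE_WEIGHTS,
                     default: float = 1.0) -> float:
    """Distribution-weighted average difficulty of the chunk types used."""
    dist = chunk_type_distribution(sentences)
    return sum(freq * weights.get(label, default) for label, freq in dist.items())
```

Two texts with the same number of chunks get the same chunk index, but the one richer in (hypothetically) heavier chunk types, such as prepositional or subordinate ones, gets a higher chunk type index.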

The Understandability Index measures the complexity related to the lexicon. The index is based on the De Mauro basic Italian dictionary, which contains the 4700 most frequently used lemmas of the Italian language. The vocabulary is divided into three sections: basic vocabulary, highly used words, and less used words.
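One way to turn such a sectioned vocabulary into a score is to look up each lemma, measure how the text's lemmas are spread across the sections, and weight that profile. The sketch below does exactly that under several assumptions: the text is already lemmatized and lowercased, the tiny `TOY_VOCABULARY` stands in for the real De Mauro resource, and the per-section scores are hypothetical, not SPARTA2's.

```python
from collections import Counter

# Toy stand-in for the De Mauro vocabulary (the real one has ~4700 lemmas).
TOY_VOCABULARY = {
    "basic": {"casa", "andare", "grande", "essere"},
    "high_use": {"analisi", "risultato"},
    "low_use": {"ermeneutica"},
}

# Hypothetical per-section scores; lemmas outside the vocabulary score 0.
SECTION_SCORES = {"basic": 1.0, "high_use": 0.5, "low_use": 0.25}


def lexical_profile(lemmas: list[str],
                    vocabulary: dict[str, set[str]] = TOY_VOCABULARY) -> dict[str, float]:
    """Fraction of lemmas falling into each section, or into 'unknown'."""
    if not lemmas:
        raise ValueError("no lemmas")
    counts = Counter()
    for lemma in lemmas:
        section = next((name for name, words in vocabulary.items() if lemma in words),
                       "unknown")
        counts[section] += 1
    return {section: n / len(lemmas) for section, n in counts.items()}


def understandability(lemmas: list[str],
                      vocabulary: dict[str, set[str]] = TOY_VOCABULARY,
                      scores: dict[str, float] = SECTION_SCORES) -> float:
    """Weighted share of 'easy' lemmas, in [0, 1]; higher means easier."""
    profile = lexical_profile(lemmas, vocabulary)
    return sum(frac * scores.get(section, 0.0) for section, frac in profile.items())
```

Keeping the profile separate from the final score makes it easy to show authors which part of the lexicon is pulling the index down.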

In our approach, we recognize that authors should be guided through the process of simplifying their texts. Thus, SPARTA2 detects and reports potential readability issues by analysing the structure and the lexicon of the sentences. This functionality fully exploits the chunk analysis performed by CHAOS: the result of the analysis is passed to a set of plug-ins, which can generate warnings for the user and also suggest possible solutions.
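A plug-in pipeline of this kind might look roughly like the sketch below. Everything in it is hypothetical: the `Chunk` and `Issue` types, the `check` method, the two example plug-ins and their thresholds only illustrate the idea of passing each parsed sentence to independent checkers that return warnings with suggested fixes. This is not SPARTA2's actual plug-in API.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Chunk:
    kind: str          # hypothetical chunk-type label, e.g. "NP"
    lemmas: list[str]  # lemmatized words in the chunk


@dataclass
class Issue:
    sentence_no: int
    message: str
    suggestions: list[str] = field(default_factory=list)


class Plugin(Protocol):
    def check(self, sentence_no: int, chunks: list[Chunk]) -> list[Issue]: ...


class LongSentencePlugin:
    """Flags sentences whose chunk count exceeds a hypothetical limit."""

    def __init__(self, max_chunks: int = 6):
        self.max_chunks = max_chunks

    def check(self, sentence_no: int, chunks: list[Chunk]) -> list[Issue]:
        if len(chunks) <= self.max_chunks:
            return []
        return [Issue(sentence_no,
                      f"sentence has {len(chunks)} chunks (limit {self.max_chunks})",
                      ["split the sentence into shorter ones"])]


class RareLemmaPlugin:
    """Flags lemmas that have an easier synonym in a hypothetical table."""

    def __init__(self, synonyms: dict[str, list[str]]):
        self.synonyms = synonyms

    def check(self, sentence_no: int, chunks: list[Chunk]) -> list[Issue]:
        return [Issue(sentence_no, f"uncommon word: {lemma}", list(self.synonyms[lemma]))
                for chunk in chunks for lemma in chunk.lemmas
                if lemma in self.synonyms]


def run_plugins(sentences: list[list[Chunk]], plugins: list[Plugin]) -> list[Issue]:
    """Pass each parsed sentence to every plug-in and collect the issues."""
    issues: list[Issue] = []
    for i, chunks in enumerate(sentences):
        for plugin in plugins:
            issues.extend(plugin.check(i, chunks))
    return issues
```

New checks can be added by writing another class with a `check` method, without touching the analysis code, which is the main appeal of a plug-in design.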

The user interface of SPARTA2 is integrated into Word 2007; the indexes are visible to the author at all times and can be updated by clicking a button. Smart Tags appear whenever a warning is reported to the user, and their menus contain the solutions proposed by the plug-ins.

Designed and developed by:
A. Colombo, L. Sbattella, and R. Tedesco

The source code will soon be released as open-source software.