Project T2 (Kügler, Stede)

Information structure in speech synthesis


Automatic speech synthesis has made considerable progress in recent years, but the dominant paradigm of text-to-speech synthesis still shows deficits when the prosodic structure of an utterance depends on context in non-trivial ways. A central problem is the lack of treatment of information structure (IS): often, given information cannot be distinguished from new information, and focused or contrasting elements are not signalled as such. Our project aims to improve speech synthesis by systematically incorporating such discourse-based information. To circumvent the (extremely difficult) task of automatic text analysis at the IS level, we will work with an existing system that generates text automatically: given a database and a user query, it dynamically produces descriptions and comparisons of suitable products (here: textbooks on computational linguistics). In such a setting, it is possible to compute the degree of activation of discourse referents as well as features of contrastiveness. Our first task is thus to determine IS annotations for the generated sentences, i.e., to label them with features for givenness, topicality, and focus. To this end, we will develop a scheme for discourse modelling (which should be largely independent of the specific application domain) and an algorithm for computing the IS parameters of the next sentence in the given context. These IS features are then mapped to a prosodic annotation in accordance with the GToBI scheme. The tonal structure has to be optimized for parameters such as givenness and contrast, which determine, among other things, the size of the focus domain, word accentuation and deaccentuation, and structural features such as the position of the nuclear accent of the sentence.
The intonation contour of the sentence will be computed by combining the tonal annotation with the information structure; the same algorithm will also select the pitch register relative to which the individual pitch accents are scaled. As the speech synthesis module, we will use the MARY system developed by DFKI GmbH (Saarbrücken). DFKI will be a partner in this project and will support us in making the necessary additions and adjustments to MARY. The second external partner is beyo GmbH (Potsdam), which will help with the practical evaluation of our synthesis results and assess how the results can be transferred to applications such as reading web pages aloud.
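The step from discourse model to tonal annotation described above can be illustrated with a small sketch. It is not the project's algorithm: the class and function names, the decay and threshold values, and the particular mapping (contrast to L+H*, new to H*, given to deaccented) are all assumptions chosen for illustration, and topicality, focus domain size, nuclear accent placement and pitch register are left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Token:
    """One word of a generated sentence (hypothetical structure)."""
    word: str
    referent: Optional[str] = None   # id of the discourse referent, if any
    contrastive: bool = False        # set by the text generator


class DiscourseModel:
    """Tracks a degree of activation per discourse referent.

    The decay factor and the givenness threshold are assumed values.
    """

    def __init__(self, decay: float = 0.5, threshold: float = 0.3) -> None:
        self.decay = decay
        self.threshold = threshold
        self.activation: Dict[str, float] = {}

    def is_given(self, referent: str) -> bool:
        # A referent counts as given while its activation is high enough.
        return self.activation.get(referent, 0.0) >= self.threshold

    def update(self, mentioned: Iterable[str]) -> None:
        # Older referents fade; referents just mentioned are fully active.
        for ref in self.activation:
            self.activation[ref] *= self.decay
        for ref in mentioned:
            self.activation[ref] = 1.0


def annotate(tokens: List[Token],
             model: DiscourseModel) -> List[Tuple[str, Optional[str]]]:
    """Return (word, accent) pairs; None means no pitch accent.

    Assumed mapping: contrastive -> L+H*, new referent -> H*,
    given referent or non-referring word -> unaccented.
    """
    labels: List[Tuple[str, Optional[str]]] = []
    for tok in tokens:
        if tok.contrastive:
            accent: Optional[str] = "L+H*"
        elif tok.referent is None or model.is_given(tok.referent):
            accent = None
        else:
            accent = "H*"
        labels.append((tok.word, accent))
    # The sentence just annotated becomes context for the next one.
    model.update(t.referent for t in tokens if t.referent is not None)
    return labels


model = DiscourseModel()
first = annotate(
    [Token("book", "book1"), Token("covers"), Token("parsing", "parsing")],
    model,
)
second = annotate(
    [Token("it", "book1"), Token("omits"), Token("semantics", "semantics")],
    model,
)
```

In this sketch the referent `book1` is accented when first mentioned and deaccented in the following sentence, because its activation is still above the threshold. A fuller version would pass the resulting labels, together with boundary tones, on to the synthesis module.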

Full description 3rd phase SFB 632 / T2 (excerpt from the request)

