Automatic discourse structure generation using rhetorical structure theory

PhD thesis


LeThanh, H. 2004. Automatic discourse structure generation using rhetorical structure theory. PhD thesis, Middlesex University, School of Computing Science.
Type: PhD thesis
Title: Automatic discourse structure generation using rhetorical structure theory
Authors: LeThanh, H.
Abstract

This thesis addresses a difficult problem in text processing: creating a system that automatically derives the rhetorical structure of text. Although rhetorical structure has proven useful in many fields of text processing, such as text summarisation and information extraction, systems that automatically generate rhetorical structures with high accuracy are difficult to find. This is because discourse is one of the largest and yet least well-defined areas of linguistics, and researchers have not yet agreed on the best method for analysing the rhetorical structure of text.
This thesis investigates a method for generating the rhetorical structure of text. It recognises rhetorical relations between spans by checking for the appearance of different cohesive devices: cue phrases, noun-phrase cues, verb-phrase cues, reference words, time references, substitution words, ellipses, and syntactic information. The discourse analyser is divided into two levels: sentence level and text level. The former uses syntactic information and cue phrases to segment sentences into elementary discourse units and to generate a rhetorical structure for each sentence. The latter derives rhetorical relations between large spans and then replaces each sentence with its corresponding rhetorical structure to produce the rhetorical structure of the text. The text-level rhetorical structure is derived by selecting rhetorical relations that connect adjacent, non-overlapping spans to form a discourse structure covering the entire text. Constraints of textual organisation and textual adjacency are used in a beam search to reduce the search space when generating such structures. In the experiments carried out in this research, the system achieved an F-score of 89.4% for discourse segmentation, 52.4% for the sentence-level discourse analyser, and 38.1% for its final output. These results indicate that the approach performs well in comparison with current research in discourse analysis.
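The text-level step described above (repeatedly joining adjacent, non-overlapping spans under a beam search until one structure covers the whole text) can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Span` class, the `beam_search_tree` function, the `toy_score` scorer, and the beam width are all hypothetical names and values chosen for illustration, and the scorer stands in for whatever relation recogniser and constraints the real system uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Span:
    """A discourse span covering units [start, end), end exclusive."""
    start: int
    end: int
    relation: Optional[str] = None   # None for a leaf (elementary unit)
    left: Optional["Span"] = None
    right: Optional["Span"] = None


# Hypothetical scorer: given two adjacent spans, return (relation, score).
Scorer = Callable[[Span, Span], Tuple[str, float]]


def beam_search_tree(n_units: int, score: Scorer, beam_width: int = 3) -> Span:
    """Join adjacent spans until a single span covers all units,
    keeping only the beam_width best partial analyses after each step."""
    if n_units < 1:
        raise ValueError("need at least one discourse unit")
    leaves = tuple(Span(i, i + 1) for i in range(n_units))
    beam: List[Tuple[float, Tuple[Span, ...]]] = [(0.0, leaves)]
    for _ in range(n_units - 1):
        candidates = []
        for total, spans in beam:
            # Only neighbouring spans may be joined (textual adjacency).
            for i in range(len(spans) - 1):
                left, right = spans[i], spans[i + 1]
                relation, s = score(left, right)
                merged = Span(left.start, right.end, relation, left, right)
                new_spans = spans[:i] + (merged,) + spans[i + 2:]
                candidates.append((total + s, new_spans))
        # Prune: keep the highest-scoring partial analyses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam[0][1][0]


def toy_score(left: Span, right: Span) -> Tuple[str, float]:
    """Toy scorer (assumption): always proposes Elaboration and
    prefers joining smaller spans first."""
    return "Elaboration", 1.0 / (right.end - left.start)


tree = beam_search_tree(4, toy_score)
print(tree.start, tree.end, tree.relation)
```

Every hypothesis in the beam holds the same number of spans at each step, so after `n_units - 1` joins each surviving hypothesis is a single tree over the whole text. A realistic scorer would draw on evidence such as the cue phrases and syntactic information listed above rather than span length.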

Department name: School of Computing Science
Institution name: Middlesex University
Publication dates
Print: 14 Jul 2011
Publication process dates
Deposited: 14 Jul 2011
Completed: Sep 2004
Output status: Published
Accepted author manuscript
Additional information

A thesis submitted to Middlesex University in partial fulfilment of the requirements for the degree of Doctor of Philosophy.

Language: English
Permalink: https://repository.mdx.ac.uk/item/83641
