Assessing relevance using automatically translated documents for cross-language information retrieval

PhD thesis


Orengo, V. 2004. Assessing relevance using automatically translated documents for cross-language information retrieval. PhD thesis, Middlesex University.
Type: PhD thesis
Title: Assessing relevance using automatically translated documents for cross-language information retrieval
Authors: Orengo, V.
Abstract

This thesis focuses on the Relevance Feedback (RF) process, and the scenario considered is that of a Portuguese-English Cross-Language Information Retrieval (CLIR) system. CLIR deals with the retrieval of documents in one natural language in response to a query expressed in another language. RF is an automatic process for query reformulation. The idea behind it is that users are unlikely to produce perfect queries, especially if given just one attempt. The process aims at improving the query specification, which will lead to more relevant documents being retrieved. The method consists of asking the user to analyse an initial sample of documents retrieved in response to a query and judge them for relevance.
In that context, two main questions were posed. The first relates to users' ability to assess the relevance of texts in a foreign language, texts hand-translated into their language, and texts automatically translated into their language. The second concerns the relationship between the accuracy of the participants' judgements and the improvement achieved through the RF process.
To answer those questions, this work performed an experiment in which Portuguese speakers were asked to judge the relevance of English documents, documents hand-translated into Portuguese, and documents automatically translated into Portuguese. The results show that machine translation is as effective as hand translation in helping users to assess relevance. In addition, the impact of misjudged documents on the performance of RF is moderate overall and varies greatly across query topics.
This work advances the existing research on RF by considering a CLIR scenario and carrying out user experiments that analyse aspects of RF and CLIR which had remained unexplored until now. The contributions of this work also include: the investigation of CLIR using a new language pair; the design and implementation of a stemming algorithm for Portuguese; and several experiments using Latent Semantic Indexing, which contribute data points to CLIR theory.

Institution name: Middlesex University
Publication dates
Print: 06 Feb 2015
Publication process dates
Deposited: 06 Feb 2015
Completed: Mar 2004
Output status: Published
Accepted author manuscript
Language: English
Permalink: https://repository.mdx.ac.uk/item/84wqx
