Using Bilingual Parallel Corpora in Translation Memory Systems

Hossein Keshtkar, Tayebeh Mosavi Miangah


Automatic word alignment techniques commonly used in Translation Memory systems tend to work at the single-word level, where there is a one-to-one correspondence between words in subsequences of the two languages. As a result, they cannot fully exploit subsentential repetitions such as clauses, phrases and expressions. This paper introduces a search method, named "space-based reduction search", that uses the spaces between words. The main goal is to maximize the use of parallel corpus resources. We aim to show that this search method can significantly increase the chance of finding matches for subsequences of input sentences, and that it is therefore applicable in a Sub-Sentential Translation Memory (SSTM) system without running automatic alignment tools.
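The abstract does not spell out the algorithm, so the following is only a rough sketch of one way a search that reduces an input sentence at word spaces might look. It assumes the corpus is a list of (source, target) sentence pairs and that "reduction" means trying progressively shorter word subsequences until one occurs in a source sentence. The function name, the corpus layout and the longest-first order are all illustrative assumptions, not the authors' method.

```python
from typing import List, Optional, Tuple

# Hypothetical corpus layout: a list of (source sentence, target sentence) pairs.
Corpus = List[Tuple[str, str]]


def space_based_reduction_search(
    sentence: str, corpus: Corpus
) -> Optional[Tuple[str, str, str]]:
    """Find the longest word subsequence of `sentence` in a source sentence.

    Returns (matched subsequence, source sentence, target sentence),
    or None when not even a single word matches.
    Assumption: the input is reduced one word at a time at space boundaries.
    """
    words = sentence.split()
    n = len(words)
    # Try the longest subsequences first, then reduce the length step by step.
    for length in range(n, 0, -1):
        for start in range(0, n - length + 1):
            candidate = " ".join(words[start:start + length])
            for src, tgt in corpus:
                # Normalize whitespace and pad with spaces so that matches
                # respect word boundaries ("at" must not match inside "cat").
                padded_src = " " + " ".join(src.split()) + " "
                if " " + candidate + " " in padded_src:
                    return candidate, src, tgt
    return None


# Illustrative data only; these sentence pairs are not from the paper.
corpus = [
    ("the cat sat on the mat", "le chat s'est assis sur le tapis"),
    ("he is in the garden", "il est dans le jardin"),
]
result = space_based_reduction_search("the dog sat on the mat", corpus)
```

In this sketch the query shares no full sentence with the corpus, but the four-word subsequence "sat on the mat" is still found, and the whole target sentence is returned for the translator to inspect. No alignment step is run, so locating the exact target-side equivalent of the match is left to the user.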



Sub-Sentential Translation Memory, Parallel corpus, Alignment








This work is licensed under a Creative Commons Attribution 4.0 International License.

2012-2021 (CC-BY) Australian International Academic Centre PTY.LTD

International Journal of Applied Linguistics and English Literature
