Module Specifications

Archived Version 2019 - 2020

Module Title
Module Code


NFQ level: 8
Credit Rating: 7.5
Pre-requisite: None
Co-requisite: None
Compatibles: None
Incompatibles: None

This course introduces the fundamentals of statistical machine translation.

Learning Outcomes

1. Discuss the challenges associated with machine translation
2. Explain the noisy channel model underpinning statistical machine translation
3. Demonstrate how a statistical translation model can be inferred from a parallel corpus of texts using unsupervised machine learning techniques
4. Explain the concept of statistical language modelling and how it fits into the basic SMT architecture
5. Explain the concept of decoding and be able to implement a beam decoder
6. Evaluate a statistical machine translation system using at least one automatic metric
7. Demonstrate knowledge of the state of the art in statistical machine translation
8. Train, test and evaluate MT systems using the open-source Moses toolkit
9. Implement a language model (including smoothing) and a basic word aligner

Workload (full-time hours per semester)
Type                    Hours   Description
Lecture                 24      Two lectures a week
Laboratory              24      One two-hour lab session a week
Group work              40      Group project
Assignment Completion   50      Individual assignment
Independent Study       50      Studying material presented in lectures; reading research papers
Total Workload: 188

Indicative Content and Learning Activities

Noisy Channel Model of Statistical Machine Translation
The noisy channel model and its link to Bayes' theorem
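The link to Bayes' theorem can be written out in one line. Using f for the foreign (source) sentence and e for the English (target) sentence, the decoder searches for the English sentence that is most probable given the input:

```latex
\hat{e} = \arg\max_{e} p(e \mid f)
        = \arg\max_{e} \frac{p(f \mid e)\, p(e)}{p(f)}
        = \arg\max_{e} p(f \mid e)\, p(e)
```

The denominator p(f) does not depend on e, so it can be dropped from the maximisation. What remains splits the problem into a translation model p(f | e) and a language model p(e), the two components covered in the topics below.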

Evaluating SMT systems
The relative advantages and disadvantages of human evaluation, automatic evaluation and task-based evaluation. The BLEU evaluation metric
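A minimal sketch of sentence-level BLEU, assuming a single reference translation, pre-tokenised input and no smoothing. The function names are illustrative; BLEU is normally reported at corpus level, where clipped n-gram counts and lengths are summed over all sentences before the precisions and brevity penalty are computed.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # the geometric mean is zero if any precision is zero
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: penalise candidates that are shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)
```

For example, "the cat sat on the mat" scored against the reference "the cat sat on a mat" has clipped 1- to 4-gram precisions of 5/6, 3/5, 2/4 and 1/3, so its score is the fourth root of 1/12 (about 0.54).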

Language Modelling
The role of language modelling in SMT. The importance of smoothing in language modelling
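To see why smoothing matters, consider a bigram model estimated by relative frequency: any word pair never seen in training gets probability zero, and so does every sentence containing it. The sketch below uses add-one (Laplace) smoothing because it is the simplest scheme to show; it is an illustrative choice, not a recommendation, since SMT systems usually rely on better-behaved methods such as Kneser-Ney. The class name and the `<s>`/`</s>` boundary markers are assumptions of this example.

```python
import math
from collections import Counter

class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.histories = Counter()  # how often each word is followed by something
        self.bigrams = Counter()
        for sent in sentences:
            tokens = ["<s>"] + sent + ["</s>"]
            self.histories.update(tokens[:-1])
            self.bigrams.update(zip(tokens[:-1], tokens[1:]))
        # Words that can be predicted: every training word plus the end marker.
        self.vocab = {w for s in sentences for w in s} | {"</s>"}

    def prob(self, prev, word):
        # Each bigram gets a pseudo-count of 1, so unseen pairs keep some mass.
        return (self.bigrams[(prev, word)] + 1) / (self.histories[prev] + len(self.vocab))

    def log_prob(self, sent):
        """Log-probability of a whole sentence, including the end marker."""
        tokens = ["<s>"] + sent + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens[:-1], tokens[1:]))
```

Trained on "the cat sat" and "the dog sat", the model gives p(cat | the) = (1 + 1) / (2 + 5) = 2/7, while the unseen pair p(dog | cat) still receives 1/6 instead of zero.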

Translation Models
Learning a word-based translation model from a parallel corpus using Expectation Maximization. Deriving a phrase-based model from a word-based model. The relative strengths and weaknesses of various models
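The Expectation Maximization step can be illustrated with IBM Model 1, the simplest word-based model. The sketch below assumes the corpus is a list of tokenised (foreign, English) sentence pairs and adds a NULL token on the English side so that foreign words may align to nothing; the function name and the dictionary keyed by (foreign, english) pairs are choices made for this example only.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(f|e) with IBM Model 1 EM.

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (f, e) to t(f|e).
    """
    corpus = [(f, ["NULL"] + e) for f, e in corpus]
    f_vocab = {w for f, _ in corpus for w in f}
    # Start from a uniform distribution over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked to anything
        # E-step: share each foreign word among the English words in its sentence.
        for f_sent, e_sent in corpus:
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into probabilities t(f|e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

On the three pairs "das Haus / the house", "das Buch / the book" and "ein Buch / a book", one iteration already gives t(das | the) = 0.5, and further iterations make "das" the dominant translation of "the" and "Buch" that of "book", even though no word alignments were ever supplied.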

Decoding
A beam search decoding algorithm for SMT. Techniques for pruning the search space.
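A toy beam search decoder makes the idea concrete. This is a sketch under strong simplifying assumptions: translation is monotone (no reordering), the phrase table is a hypothetical dictionary from foreign phrase tuples to (English phrase, log-probability) options, and the language model is any function returning log p(word | previous word). Hypotheses are grouped into stacks by how many source words they cover, and each stack is pruned to the `beam_size` best (histogram pruning). Real decoders, such as the one in Moses, also handle reordering, future-cost estimation and hypothesis recombination, all of which are omitted here.

```python
import heapq
import math

def beam_decode(source, phrase_table, lm_logprob, beam_size=3, max_phrase_len=3):
    """Monotone phrase-based beam search; returns the best English tuple or None.

    source: list of foreign tokens.
    phrase_table: dict mapping a foreign phrase (tuple) to a list of
        (english_phrase_tuple, log_prob) options.
    lm_logprob: function (prev_word, word) -> log p(word | prev_word).
    """
    # stacks[i] holds hypotheses covering the first i source words,
    # each stored as (score, last_english_word, output_tuple).
    stacks = [[] for _ in range(len(source) + 1)]
    stacks[0] = [(0.0, "<s>", ())]
    for i in range(len(source)):
        # Histogram pruning: expand only the beam_size best hypotheses.
        stacks[i] = heapq.nlargest(beam_size, stacks[i])
        for score, last, output in stacks[i]:
            for j in range(i + 1, min(i + max_phrase_len, len(source)) + 1):
                for english, tm_score in phrase_table.get(tuple(source[i:j]), []):
                    new_score, prev = score + tm_score, last
                    for w in english:  # add the language-model score word by word
                        new_score += lm_logprob(prev, w)
                        prev = w
                    stacks[j].append((new_score, prev, output + english))
    # Add the end-of-sentence transition and pick the best complete hypothesis.
    finals = [(s + lm_logprob(last, "</s>"), out) for s, last, out in stacks[-1]]
    return max(finals)[1] if finals else None
```

With a small phrase table for "das Haus ist klein" and a language model that favours "the house is small", the decoder returns ("the", "house", "is", "small"); it returns None when some source word has no phrase-table entry, because no hypothesis can cover the whole sentence.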

Encoding Linguistic Information in an SMT system
Techniques for including morphological, syntactic and semantic knowledge in an SMT system

Assessment Breakdown
Continuous Assessment (%)    Examination Weight (%)
Course Work Breakdown
Type    Description    % of total    Assessment Date
Reassessment Requirement
Resit arrangements are explained by the following categories:
1 = A resit is available for all components of the module
2 = No resit is available for a 100% continuous assessment module
3 = No resit is available for the continuous assessment component
Indicative Reading List

  • Philipp Koehn, Statistical Machine Translation, ISBN 0521874157