
Module Specifications

Current Academic Year 2023 - 2024

Please note that this information is subject to change.

Module Title: Machine Translation
Module Code: CA681I
School: School of Computing
Module Co-ordinator: Semester 1: Andrew Way; Semester 2: Andrew Way; Autumn: Andrew Way
Module Teachers: Andrew Way, Brian Davis, Maja Popovic, Kolawole John Adebayo
NFQ Level: 9
Credit Rating: 7.5
Pre-requisite: None
Co-requisite: None
Compatibles: None
Incompatibles: None
Coursework Only
Repeat of all or individual Continuous Assessment components as required.
Description

This course covers the basics of machine translation (MT). It explains what machine translation is, how it can be useful, and what its potential and limitations are. It gives a brief history covering the three main approaches: rule-based, statistical and neural. Students will learn about both human and automatic evaluation of machine-generated translations. By the end of the course, students will be equipped with strong theoretical knowledge of statistical and neural network approaches to MT.

Learning Outcomes

1. Explain the concept of machine translation, including approaches and the importance of language data.
2. Appraise the advantages and challenges of machine translation.
3. Demonstrate an understanding of machine translation system evaluation (both manual and automatic).
4. Explain translation and language models in Statistical Machine Translation (SMT).
5. Demonstrate an understanding of the key concepts, techniques and challenges in SMT, such as generative and discriminative models, word alignment and the decoding problem.
6. Explain the fundamental machinery of neural machine translation, i.e. neural networks, neural language and translation models, and word representations.
7. Describe modern Transformer-based neural machine translation architectures.
8. Describe advanced topics such as multilingual neural machine translation and the use of monolingual data to improve NMT.



Workload: Full-time hours per semester
Type | Hours | Description
Online activity | 36 | Completion of online asynchronous lectures (FutureLearn courses)
Assignment Completion | 25 | Completion of five fortnightly Loop quizzes, each assessing one of the five course components of the module
Independent Study | 126.5 | Independent reading, online activities, formative assessment quizzes, online discussions, and revision for the fortnightly summative quizzes
Total Workload: 187.5

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities

Course 1: Introduction to Machine Translation
Topic 1 - Overview of Machine Translation: What is machine translation? How people use machine translation. What about human translators?
Topic 2 - Approaches: How much knowledge about the languages is needed? Rule-based machine translation; statistical machine translation; neural machine translation.
Topic 3 - Importance of data: Learning a language from data; language pair and domain; sentence alignment and preprocessing.
Topic 4 - Challenges: Availability of appropriate data; translating user-generated content; going beyond sentence level; evaluation.

Course 2: Machine Translation Evaluation
Topic 1 - Translation Quality: What is a good translation? Overall assessment; error classification and analysis.
Topic 2 - Manual (human) evaluation: Quality criteria (adequacy, fluency and comprehensibility); methods (ranking, scores, marking, error classification); advantages and disadvantages.
Topic 3 - Automatic (computer) evaluation: What is a good automatic metric? Comparing with a reference translation (matching and edit distance); automatic error classification; evaluating without references (quality estimation).
Topic 4 - Test suites (challenge test sets): What are test suites? How are they used? How are they created?
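Course 2, Topic 3 lists comparison with a reference translation by edit distance. As an illustration only (not part of the official module materials), the following minimal Python sketch computes word-level Levenshtein distance between a hypothesis and a single reference, plus a WER-style rate normalised by reference length. The function names are hypothetical, tokenisation is naive whitespace splitting, and real metrics such as TER additionally allow block shifts.

```python
def word_edit_distance(hypothesis: str, reference: str) -> int:
    """Minimum number of word insertions, deletions and substitutions
    needed to turn the hypothesis into the reference (Levenshtein distance)."""
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] = distance between the first i-1 hypothesis words and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i in range(1, len(hyp) + 1):
        curr = [i] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete hypothesis word i
                curr[j - 1] + 1,     # insert reference word j
                prev[j - 1] + cost,  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[-1]


def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word edit distance divided by the number of reference words."""
    ref_len = len(reference.split())
    return word_edit_distance(hypothesis, reference) / max(ref_len, 1)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(word_edit_distance(hyp, ref))  # 1 (one missing word)
    print(round(word_error_rate(hyp, ref), 3))  # 0.167
```

Dividing by the reference length rather than the hypothesis length is the usual WER convention; one consequence is that a long, mostly wrong hypothesis can score above 1.0.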

Course 3: Statistical Machine Translation
Topic 1 - Overview: Probability model for translation; translation model; language model; decoding.
Topic 2 - Word alignments: Sentence vs. word probabilities; word alignments; IBM models; Expectation Maximisation (EM) algorithm for word alignments.
Topic 3 - Phrase-based translation: From words to phrases; symmetric alignments; phrase probabilities; modern SMT systems.
Topic 4 - Language models: n-gram language models; smoothing; evaluation (perplexity).
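Course 3 names the IBM models and the EM algorithm for word alignment. The following is a minimal Python sketch, under stated assumptions, of EM training for IBM Model 1 lexical translation probabilities t(f | e). It prepends a NULL source token, treats alignment positions as uniform (as Model 1 does), and uses a hypothetical function name and a toy corpus in the style of the common textbook example; it is not taken from the course materials.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """EM training of IBM Model 1 lexical probabilities t(f | e).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (foreign_word, english_word) -> probability.
    """
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    # Start from a uniform t(f | e) over the foreign vocabulary.
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each foreign word among all possible source words.
        for fs, es in corpus:
            sources = ["NULL"] + es  # NULL lets a foreign word align to nothing
            for f in fs:
                norm = sum(t[(f, e)] for e in sources)
                for e in sources:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: turn expected counts into probabilities normalised per English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)


if __name__ == "__main__":
    toy_corpus = [
        (["das", "Haus"], ["the", "house"]),
        (["das", "Buch"], ["the", "book"]),
        (["ein", "Buch"], ["a", "book"]),
    ]
    t = train_ibm_model1(toy_corpus, iterations=10)
    for e in ["the", "house", "book", "a"]:
        p, f = max((p, f) for (f, e2), p in t.items() if e2 == e)
        print(f"{e}: most likely foreign word {f!r} (p = {p:.2f})")
```

After the first iteration "das" already receives half of the probability mass for "the", because it co-occurs with "the" twice; further iterations sharpen this, which is the pigeonholing behaviour EM exploits for word alignment.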

Course 4: Introduction to Neural Machine Translation
Topic 1 - Neural networks: What are neural networks? Architectures; training a neural network.
Topic 2 - Word representations: Why convert words into numbers? Types of word representations; representations in neural networks.
Topic 3 - Neural language models: Feed-forward neural LMs; recurrent neural LMs; comparison with count-based n-gram language models.
Topic 4 - Neural translation models: Sequence-to-sequence models; encoders; decoders (language models); comparison with phrase-based machine translation.

Course 5: Further Topics in Neural Machine Translation
Topic 1 - Recurrent neural machine translation: Architecture; vanishing gradient problem; recurrent neural networks (RNNs) with attention; efficiency problem.
Topic 2 - Transformer neural machine translation: Self-attention; advantages over RNNs; applications; remaining challenges.
Topic 3 - Using monolingual data in neural machine translation: Choice of monolingual data; back- and forward-translation (BT and FT); using different machine translation systems for BT and FT; unsupervised MT.
Topic 4 - Multilingual neural machine translation: Separate encoders/decoders; joint encoder and decoder; zero-shot translation; advantages and disadvantages.
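Course 5, Topic 2 introduces self-attention, the core operation of the Transformer. As a minimal, illustrative sketch, scaled dot-product self-attention computes softmax(QK^T / sqrt(d_k)) V, where Q, K and V are linear projections of the same input sequence. Assumptions: a single attention head, no masking, no positional encodings, plain Python lists instead of a tensor library, and hypothetical function names.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention.

    x holds one vector per token; every token attends to every token in the
    same sequence (no masking, no multi-head split).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = matmul(q, k_t)               # (n x n) token-to-token scores
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)             # weighted sum of value vectors


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d token vectors
    for row in self_attention(tokens, identity, identity, identity):
        print([round(val, 3) for val in row])
```

Each output row is a convex combination of the value vectors, so every token can draw on every other token in the sentence in a single step; this is one of the usual arguments for the Transformer's advantages over recurrent networks.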

Assessment Breakdown
Continuous Assessment: 100% | Examination Weight: 0%
Course Work Breakdown
Type | Description | % of total | Assessment Date
Loop Quiz | Loop Quiz assessing topics in Course 1: Overview of Machine Translation, Approaches to MT, Importance of data, Challenges | 20% | Week 2
Loop Quiz | Loop Quiz assessing topics in Course 2, Evaluating MT systems: Translation Quality, Manual (human) evaluation, Automatic (computer) evaluation, Test suites (challenge test sets) | 20% | Week 4
Loop Quiz | Loop Quiz assessing topics in Course 3: Overview, Word alignments, Phrase-based translation, Language models | 20% | Week 6
Loop Quiz | Loop Quiz assessing topics in Course 4: Neural networks, Word representations, Neural language models, Neural translation models | 20% | Week 8
Loop Quiz | Loop Quiz assessing topics in Course 5: Recurrent neural machine translation, Transformer neural machine translation, Using monolingual data in neural machine translation, Multilingual neural machine translation | 20% | Week 10
Reassessment Requirement Type
Resit arrangements are explained by the following categories:
1 = A resit is available for all components of the module
2 = No resit is available for 100% continuous assessment module
3 = No resit is available for the continuous assessment component
This module is category 1
Indicative Reading List

  • Philipp Koehn: 2009, Statistical Machine Translation, 1st ed., Cambridge University Press, ISBN 978-052187415
  • Philipp Koehn: Neural Machine Translation, 1st ed., ISBN 9781108608480
Other Resources

Online: Dan Jurafsky and James H. Martin, 2020, Chapter 11: Machine Translation, Speech and Language Processing (3rd ed. draft), https://web.stanford.edu/~jurafsky/slp3/11.pdf
Programme or List of Programmes
MCM: M.Sc. in Computing
MCMV: M.Sc. in Computing
