Module: Machine Translation

Module Specifications.

Current Academic Year 2024 - 2025

Date posted: September 2024

Module Title	Machine Translation
Module Code	CA681I (ITS) / CSC1179 (Banner)
Faculty	Engineering & Computing	School	Computing
Module Co-ordinator	-
Module Teachers	-
NFQ level	9	Credit Rating	7.5
Pre-requisite	Not Available
Co-requisite	Not Available
Compatibles	Not Available
Incompatibles	Not Available

Coursework Only
Repeat of all or individual Continuous Assessment components as required.

Description

This course covers the basics of machine translation (MT). It explains what machine translation is, how it can be useful, and what its potential and limits are. It gives a brief history covering the three main approaches: rule-based, statistical and neural. Students will learn about the evaluation of machine-generated translations, both human and automatic. By the end of the course, students will have a strong theoretical knowledge of statistical and neural network approaches to MT.

Learning Outcomes

1. Explain the concept of machine translation including approaches and the importance of language data.
2. Appraise the advantages and challenges of machine translation.
3. Demonstrate an understanding of machine translation system evaluation (both manual and automatic).
4. Explain translation and language models in Statistical Machine Translation (SMT)
5. Demonstrate an understanding of the key concepts, techniques and challenges in SMT such as generative and discriminative models, word alignment and the decoding problem.
6. Explain the fundamental machinery of neural machine translation, i.e. neural networks, neural language and translation models, and word representations.
7. Describe modern Transformer-based neural machine translation architectures.
8. Describe advanced topics such as multilingual neural machine translation and the use of monolingual data to improve NMT.

Workload: Full-time hours per semester
Type	Hours	Description
Online activity	36	Completion of online asynchronous lectures (FutureLearn courses)
Assignment Completion	25	Completion of five fortnightly Loop quizzes, each assessing one of the module's five courses
Independent Study	126.5	Independent reading, online activities and discussions, formative assessment quizzes, and revision for the fortnightly summative quizzes
Total Workload: 187.5

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities

Course 1: Introduction to Machine Translation
Topic 1 - Overview of Machine Translation: What is machine translation? How people use machine translation; What about human translators?
Topic 2 - Approaches: How much knowledge about the languages is needed? Rule-based machine translation; Statistical machine translation; Neural machine translation
Topic 3 - Importance of data: Learning a language from data; Language pair and domain; Sentence alignment and preprocessing
Topic 4 - Challenges: Availability of appropriate data; Translating user-generated content; Going beyond sentence level; Evaluation

Course 2: Machine Translation Evaluation
Topic 1 - Translation Quality: What is a good translation? Overall assessment; Error classification and analysis
Topic 2 - Manual (human) evaluation: Quality criteria (adequacy, fluency and comprehensibility); Methods (ranking, scores, marking, error classification); Advantages and disadvantages
Topic 3 - Automatic (computer) evaluation: What is a good automatic metric? Comparing with a reference translation (matching and edit distance); Automatic error classification; Evaluating without references (quality estimation)
Topic 4 - Test suites (challenge test sets): What are test suites? How are they used? How are they created?

Course 3: Statistical Machine Translation
Topic 1 - Overview: Probability model for translation; Translation model; Language model; Decoding
Topic 2 - Word alignments: Sentence vs. word probabilities; Word alignments; IBM models; Expectation Maximisation (EM) algorithm for word alignments
Topic 3 - Phrase-based translation: From words to phrases; Symmetric alignments; Phrase probabilities; Modern SMT systems
Topic 4 - Language models: n-gram language model; Smoothing; Evaluation (perplexity)

Course 4: Introduction to Neural Machine Translation
Topic 1 - Neural networks: What are neural networks? Architectures; Training a neural network
Topic 2 - Word representations: Why convert words into numbers? Types of word representations; Representations in neural networks
Topic 3 - Neural language models: Feed-forward neural LMs; Recurrent neural LMs; Comparison with count-based n-gram language models
Topic 4 - Neural translation models: Sequence-to-sequence models; Encoders; Decoders (language models); Comparison with phrase-based machine translation

Course 5: Further Topics in Neural Machine Translation
Topic 1 - Recurrent neural machine translation: Architecture; Vanishing gradient problem; Recurrent Neural Networks (RNN) with attention; Efficiency problem
Topic 2 - Transformer neural machine translation: Self-attention; Advantages over Recurrent Neural Networks (RNN); Applications; Remaining challenges
Topic 3 - Using monolingual data in neural machine translation: Choice of monolingual data; Back- and forward-translation (BT and FT); Using different machine translation systems for BT and FT; Unsupervised MT
Topic 4 - Multilingual neural machine translation: Separated encoders/decoders; Joint encoder and decoder; Zero-shot translation; Advantages and disadvantages

Assessment Breakdown
Continuous Assessment	100%	Examination Weight	0%

Course Work Breakdown
Type	Description	% of total	Assessment Date
Loop Quiz	Loop Quiz assessing topics in Course 1: Overview of Machine Translation, Approaches to MT, Importance of data, Challenges	20%	Week 2
Loop Quiz	Loop Quiz assessing topics in Course 2: Translation Quality, Manual (human) evaluation, Automatic (computer) evaluation, Test suites (challenge test sets)	20%	Week 4
Loop Quiz	Loop Quiz assessing topics in Course 3: Overview, Word alignments, Phrase-based translation, Language models	20%	Week 6
Loop Quiz	Loop Quiz assessing topics in Course 4: Neural networks, Word representations, Neural language models, Neural translation models	20%	Week 8
Loop Quiz	Loop Quiz assessing topics in Course 5: Recurrent neural machine translation, Transformer neural machine translation, Using monolingual data in neural machine translation, Multilingual neural machine translation	20%	Week 10

Reassessment Requirement Type
Resit arrangements are explained by the following categories:
Resit category 1: A resit is available for both* components of the module.
Resit category 2: No resit is available for a 100% continuous assessment module.
Resit category 3: No resit is available for the continuous assessment component where there is a continuous assessment and examination element.
* ‘Both’ is used in the context of the module having a Continuous Assessment/Examination split; where the module is 100% continuous assessment, there will also be a resit of the assessment.
Resit category for this module is temporarily unavailable.

Indicative Reading List

Philipp Koehn: 2009, Statistical Machine Translation, 1st, Cambridge University Press, 978-052187415
Philipp Koehn: 2020, Neural Machine Translation, 1st, Cambridge University Press, 9781108608480

Other Resources

None