Module Specifications.
Current Academic Year 2024 - 2025
All module information is indicative, and this portal is an interim interface pending the full upgrade of Coursebuilder and its subsequent integration with the new DCU Student Information System (DCU Key).
As such, this is a point-in-time view of data, which will be refreshed periodically. Some fields or data may not yet be available pending completion of the full Coursebuilder upgrade and integration project. We will post status updates as they become available. Thank you for your patience and understanding.
Date posted: September 2024
Coursework Only: Repeat of all or individual Continuous Assessment components as required.
Description
This course covers the basics of machine translation. It explains what machine translation is, how it can be useful, and what its potential and limits are. It gives a brief history covering the three main approaches: rule-based, statistical and neural. Students will learn about both human and automatic evaluation of machine-generated translations. By the end of the course, students will be equipped with a strong theoretical knowledge of statistical and neural network approaches to MT.
Learning Outcomes
1. Explain the concept of machine translation, including approaches and the importance of language data.
2. Appraise the advantages and challenges of machine translation.
3. Demonstrate an understanding of machine translation system evaluation (both manual and automatic).
4. Explain translation and language models in Statistical Machine Translation (SMT).
5. Demonstrate an understanding of the key concepts, techniques and challenges in SMT, such as generative and discriminative models, word alignment and the decoding problem.
6. Explain the fundamental machinery of neural machine translation, i.e. neural networks, neural language and translation models, and word representations.
7. Describe modern Transformer-based neural machine translation architectures.
8. Describe advanced topics such as multilingual neural machine translation and the use of monolingual data to improve NMT.
All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml
Indicative Content and Learning Activities
Course 1: Introduction to Machine Translation
Topic 1 - Overview of Machine Translation
    What is machine translation?
    How people use machine translation
    What about human translators?
Topic 2 - Approaches
    How much knowledge about the languages is needed?
    Rule-based machine translation
    Statistical machine translation
    Neural machine translation
Topic 3 - Importance of data
    Learning a language from data
    Language pair and domain
    Sentence alignment and preprocessing
Topic 4 - Challenges
    Availability of appropriate data
    Translating user-generated content
    Going beyond sentence level
    Evaluation

Course 2: Machine Translation Evaluation
Topic 1 - Translation Quality
    What is a good translation?
    Overall assessment
    Error classification and analysis
Topic 2 - Manual (human) evaluation
    Quality criteria (adequacy, fluency and comprehensibility)
    Methods (ranking, scores, marking, error classification)
    Advantages and disadvantages
Topic 3 - Automatic (computer) evaluation
    What is a good automatic metric?
    Comparing with a reference translation (matching and edit distance)
    Automatic error classification
    Evaluating without references: quality estimation
Topic 4 - Test suites (challenge test sets)
    What are test suites?
    How are they used?
    How are they created?

Course 3: Statistical Machine Translation
Topic 1 - Overview
    Probability model for translation
    Translation model
    Language model
    Decoding
Topic 2 - Word alignments
    Sentence vs. word probabilities
    Word alignments
    IBM models
    Expectation Maximisation (EM) algorithm for word alignments
Topic 3 - Phrase-based translation
    From words to phrases
    Symmetric alignments
    Phrase probabilities
    Modern SMT systems
Topic 4 - Language models
    n-gram language model
    Smoothing
    Evaluation: perplexity

Introduction to Neural Machine Translation
Topic 1 - Neural networks
    What are neural networks?
    Architectures
    Training a neural network
Topic 2 - Word representations
    Why convert words into numbers?
    Types of word representations
    Representations in neural networks
Topic 3 - Neural language models
    Feed-forward neural LMs
    Recurrent neural LMs
    Comparison with count-based n-gram language models
Topic 4 - Neural translation models
    Sequence-to-sequence models
    Encoders
    Decoders (language models)
    Comparison with phrase-based machine translation

Further topics in Neural Machine Translation
Topic 1 - Recurrent neural machine translation
    Architecture
    Vanishing gradient problem
    Recurrent Neural Networks (RNN) with attention
    Efficiency problem
Topic 2 - Transformer neural machine translation
    Self-attention
    Advantages over Recurrent Neural Networks (RNN)
    Applications
    Remaining challenges
Topic 3 - Using monolingual data in neural machine translation
    Choice of monolingual data
    Back- and forward-translation (BT and FT)
    Using different machine translation systems for BT and FT
    Unsupervised MT
Topic 4 - Multilingual neural machine translation
    Separated encoders/decoders
    Joint encoder and decoder
    Zero-shot translation
    Advantages and disadvantages
Indicative Reading List
Other Resources
61301, Online, Dan Jurafsky and James H. Martin, 2020, Chapter 11 Machine Translation, Speech and Language Processing (3rd ed. draft), https://web.stanford.edu/~jurafsky/slp3/11.pdf