
Module Specifications

Current Academic Year 2023 - 2024

Please note that this information is subject to change.

Module Title Deep Learning for Natural Language Processing
Module Code CA6011
School School of Computing
Module Co-ordinator Semester 1: Anya Belz
Semester 2: Anya Belz
Autumn: Anya Belz
Module Teachers Brian Davis, Anya Belz
NFQ Level 9
Credit Rating 7.5
Pre-requisite None
Co-requisite None
Compatibles None
Incompatibles None
Repeat examination
Description

Neural natural language processing (NLP) underpins some of the most important technologies of the information age. It is found in tools for web search, advertising, email, customer service, translation, and virtual agents, among many other applications. Most recently, large language models (LLMs) like the ones powering ChatGPT have been shown to have surprisingly varied knowledge and abilities far beyond the tasks they were trained for, and this has opened up new and potentially very important application possibilities for NLP. This module will introduce students to the neural network architectures that power modern NLP, including LLMs like GPT. Students will learn how such networks function and will be given the opportunity to train NLP systems using popular open-source neural NLP toolkits and libraries.

The module will progress through three main learning blocks. The first block will impart a theoretical understanding of the principal neural network architectures used for NLP, including feed-forward, recurrent and transformer network architectures, graph-based neural networks, and large-scale pretrained language models. Students will be introduced to the mathematical foundations of the relevant machine learning models and their associated optimisation algorithms.

In the second learning block, students will gain practical understanding and skills in solving a number of NLP tasks by applying end-to-end neural architectures, fine-tuning existing neural language models on specific problems, and other approaches, covering a range of applications including analysing latent dimensions in text, transcribing speech to text, translating between languages, and answering questions. Students will learn about the challenges, risks, and opportunities arising from the application of deep learning techniques to such tasks.

The third learning block will cover recent applications of neural networks, including LLMs, to multimodal and multilingual tasks that were largely infeasible before the emergence of modern neural network architectures.

Learning Outcomes

1. Reflect on and assess the theoretical underpinnings and practical applications of a range of different neural models used to solve NLP tasks, and how to select and apply optimisation algorithms for them.
2. Design, test and implement neural attention mechanisms and sequence embedding models, and combine these modular components to build state-of-the-art NLP systems.
3. Critically assess the range of commonly used toolkits, libraries, reusable trained models and datasets available in neural NLP, understand their possible uses, and assess their limitations.
4. Critically assess and choose appropriate neural architectures for different NLP tasks, taking into account computational requirements, and adapting techniques from different subfields, languages and domains.
5. Design, test and implement common neural network models for NLP tasks, including those first introduced in the Foundations of NLP module (CA6010).
6. Critically assess and apply in practice reusable word and higher-level representations in neural NLP, including the difference between non-contextualised word vectors (word2vec, GloVe, etc.) and contextualised word vectors (ELMo, BERT, etc.), and the methods used to produce them.
7. Reflect on the challenges posed by pre-trained neural language models, including issues of bias and factual correctness in generated text.
8. Reflect on and apply in practice knowledge about the possibilities opened up by modern neural architectures in enabling learning across languages and modalities.
9. Reflect on and apply in practice learning relating to working and communicating effectively in a team to design and implement solutions for new domains or unfamiliar contexts, justifying the proposed design and development strategy.



Workload: Full-time hours per semester
Type                    Hours   Description
Lecture                 24      Twice-weekly lecture
Laboratory              24      2-hour lab once a week
Assignment Completion   80      Project work
Independent Study       59.5    Work carried out by students outside lectures (reading background material, finishing lab assignments)
Total Workload:         187.5

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities

I. Neural Network Architectures for Natural Language Processing
In the first learning block, students will gain a theoretical understanding of the principal neural network architectures used for NLP, including feed-forward, recurrent, encoder-decoder and transformer network architectures, and large-scale pretrained language models. Students will be introduced to the mathematical definitions of the relevant machine learning models and their associated optimisation algorithms.
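
As an illustration of the level at which these architectures are treated, the sketch below implements the scaled dot-product attention at the heart of transformer models. It is a minimal example written in PyTorch; the choice of framework and the toy dimensions are assumptions for illustration, not prescribed course materials.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(query, key, value, mask=None):
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
        d_k = query.size(-1)
        scores = torch.matmul(query, key.transpose(-2, -1)) / d_k ** 0.5
        if mask is not None:
            # Positions where mask == 0 receive zero attention weight.
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)   # attention distribution over keys
        return torch.matmul(weights, value), weights

    # Toy self-attention: batch of 1, sequence of 4 tokens, model dimension 8.
    x = torch.randn(1, 4, 8)
    output, weights = scaled_dot_product_attention(x, x, x)
    print(output.shape, weights.shape)        # (1, 4, 8) and (1, 4, 4)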

II. Applying Neural NLP Methods
In the second learning block, students will explore selected neural NLP methods in depth, including applying them in practical exercises. Topics covered will include learning neural word vectors, fine-tuning language models for NLP tasks and Neural Language Generation, as below.

Learning neural word vectors
Building on concepts introduced in CA6010 Foundations of NLP, students will learn about neural word vectors and the differences between static, non-contextualised word vectors (word2vec, GloVe, etc.) and contextualised word vectors (ELMo, BERT, etc.), and the methods used to produce them.
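
The contrast between static and contextualised vectors can be made concrete in a few lines of code. The sketch below uses the Hugging Face transformers library and the public bert-base-uncased checkpoint (both illustrative assumptions, not prescribed course materials): the same surface word receives different contextualised vectors in different sentences, whereas a static word2vec or GloVe vector would be identical in both.

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def contextual_vector(sentence, word):
        # Return the vector BERT assigns to `word` as it occurs in `sentence`.
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]     # (seq_len, 768)
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
        return hidden[tokens.index(word)]

    # "bank" gets a different contextualised vector in each sentence.
    v_river = contextual_vector("she sat on the bank of the river", "bank")
    v_money = contextual_vector("she paid the money into the bank", "bank")
    print(torch.cosine_similarity(v_river, v_money, dim=0).item())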

Fine-tuning language models for NLP tasks
Students will learn about one of the most common approaches in modern NLP, namely building NLP systems by fine-tuning large-scale language models (BERT, GPT, T5, XLNet), and how this can be used to build solutions for a range of different NLP tasks.
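
A minimal sketch of this workflow, using the Hugging Face transformers and datasets libraries with BERT and the IMDB sentiment dataset; the task, checkpoint and hyperparameters are assumptions chosen purely for illustration.

    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    # Illustrative downstream task: binary sentiment classification.
    dataset = load_dataset("imdb")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=256)

    dataset = dataset.map(tokenize, batched=True)

    # Pretrained encoder plus a freshly initialised classification head.
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)

    args = TrainingArguments(output_dir="bert-imdb-demo",
                             num_train_epochs=1,
                             per_device_train_batch_size=16)

    # Small subsets keep the demonstration quick to run.
    trainer = Trainer(model=model, args=args,
                      train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
                      eval_dataset=dataset["test"].shuffle(seed=42).select(range(500)))
    trainer.train()
    print(trainer.evaluate())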

Neural Language Generation
Basic concepts in NLG will be recapped, including task construals, applications and history, before introducing the main current neural techniques, architectures and resources in use in NLG. The power and shortcomings of large language models will be examined, and the issues of controllability, bias and transparency will be discussed.
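
A minimal sketch of open-ended neural generation, using the Hugging Face text-generation pipeline with the openly available GPT-2 model as a small stand-in for the much larger LLMs discussed in lectures; the model and prompt are illustrative assumptions.

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    prompt = "Neural language generation systems can"
    for sample in generator(prompt, max_new_tokens=40,
                            do_sample=True, num_return_sequences=3):
        print(sample["generated_text"])

    # Sampled continuations are typically fluent but can be factually wrong or
    # biased, which is why controllability, bias and transparency are discussed.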

III. Neural Methods for Multilingual and Multimodal NLP
In the third learning block, students will be introduced to multilingual systems which can work for several languages, including those for which training data is severely limited. Techniques explored will include transfer learning and multilingual language models (XLM-R, mBERT). Students will also explore developing foundational systems which use neural network architectures to integrate multiple modalities, including text, speech, image and video.
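
As an illustration of cross-lingual transfer, the sketch below uses the Hugging Face zero-shot-classification pipeline with a publicly shared XLM-R checkpoint fine-tuned for natural language inference (joeddav/xlm-roberta-large-xnli); the checkpoint and example text are illustrative assumptions. Because the encoder is multilingual, the classifier can label German text against English candidate labels.

    from transformers import pipeline

    classifier = pipeline("zero-shot-classification",
                          model="joeddav/xlm-roberta-large-xnli")

    # German input text, English candidate labels.
    result = classifier("Das neue Smartphone hat eine hervorragende Kamera.",
                        candidate_labels=["technology", "politics", "sport"])
    print(result["labels"][0], round(result["scores"][0], 3))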

Assessment Breakdown
Continuous Assessment 70%    Examination Weight 30%
Course Work Breakdown
Type: Laboratory Portfolio
Description: During the laboratory sessions, students will explore and learn how to implement neural systems for a selection of NLP problems.
% of total: 20%
Assessment Date: Every second week

Type: Group Project
Description: Working in groups, students will address a real research problem in the field involving an NLP task using neural architectures, e.g. sentiment analysis on Twitter, fake news detection, question answering, cross-lingual text summarisation, caption generation, etc. The assessment is split into 20% for the group project work and 10% for a short individual critical reflective scientific report on the student's contribution to the project (challenges as a learner, technical and scientific, and the solutions that worked for them).
% of total: 30%
Assessment Date: Once per semester
Reassessment Requirement Type
Resit arrangements are explained by the following categories:
1 = A resit is available for all components of the module
2 = No resit is available for 100% continuous assessment module
3 = No resit is available for the continuous assessment component
This module is category 1
Indicative Reading List

  • Ian Goodfellow, Yoshua Bengio, Aaron Courville: 2016, Deep Learning, MIT Press, ISBN 0262035618
  • Yoav Goldberg: 2017, Neural Network Methods in Natural Language Processing (Synthesis Lectures on Human Language Technologies), Morgan & Claypool Publishers, ISBN 1627052984
  • Shashi Narayan, Claire Gardent: 2020, Deep Learning Approaches to Text Production (Synthesis Lectures on Human Language Technologies), Morgan & Claypool Publishers, ISBN 9781681737607
Other Resources

  • Blog: Andrej Karpathy, 2015, The Unreasonable Effectiveness of Recurrent Neural Networks, http://karpathy.github.io/2015/05/21/rnn-effectiveness
  • Technical Report: Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., et al., 2021, On the Opportunities and Risks of Foundation Models, Center for Research on Foundation Models (CRFM), https://crfm.stanford.edu/report.html
  • Open-source neural NLP technologies: Hugging Face, https://huggingface.co/
Programme or List of Programmes
MCM    M.Sc. in Computing