Module: Natural Language Technologies

Module Specifications.

Current Academic Year 2024 - 2025

Date posted: September 2024

Module Title	Natural Language Technologies
Module Code	CA4023 (ITS) / CSC1110 (Banner)
Faculty	Engineering & Computing	School	Computing
Module Co-ordinator	Jennifer Foster
Module Teachers	-
NFQ level	8	Credit Rating	7.5
Pre-requisite	Not Available
Co-requisite	Not Available
Compatibles	Not Available
Incompatibles	Not Available

Repeat examination

Description

This module provides students with a practical and theoretical grounding in the following core topics in modern Natural Language Processing: language modelling, language analysis, information extraction and language understanding. Students will learn how these problems are tackled using supervised and semi-supervised machine learning, and they will gain hands-on experience developing machine-learning solutions during the laboratory sessions. Popular benchmark datasets will be employed, including ‘noisy’ datasets containing text from sources such as Twitter and Reddit.

Learning Outcomes

1. Describe the applications of Natural Language Processing (NLP) in Data Science
2. Illustrate how neural word embeddings underpin modern NLP systems
3. Develop an English language model
4. Evaluate an English language model
5. Develop an English part-of-speech tagger
6. Evaluate an English part-of-speech tagger
7. Develop a sentiment analysis system for English
8. Evaluate a sentiment analysis system for English
9. Develop an English question answering/reading comprehension system
10. Evaluate an English question answering/reading comprehension system
11. Explain the unsolved problems in NLP research
12. Summarize the ethical issues surrounding modern data-driven NLP

Workload (full-time hours per semester)
Type	Hours	Description
Lecture	24	Formal lectures introducing NLP for data science
Laboratory	12	Series of laboratories introducing Python-based machine learning techniques for NLP
Assignment Completion	21.5	No Description
Independent Study	130	No Description
Total Workload: 187.5

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities

Language Modelling
Next-word prediction using n-gram language models. Modern word embeddings, including word2vec (Mikolov et al. 2013), and contextualised word embeddings such as BERT (Devlin et al. 2019). Generating text using language models.
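As an illustration of next-word prediction with an n-gram model, the following is a minimal sketch of a bigram language model with add-one smoothing, together with a perplexity calculation of the kind used to evaluate language models. It is not taken from the module materials: the toy corpus, the function names (train_bigram, prob, predict_next, perplexity) and the choice of add-one smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Toy corpus (hypothetical; not a dataset used in the module).
CORPUS = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


def tokenize(sentence):
    """Whitespace tokenisation with boundary markers added."""
    return [BOS] + sentence.split() + [EOS]


def train_bigram(sentences):
    """Count bigrams and collect the vocabulary."""
    bigram_counts = defaultdict(Counter)
    vocab = set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigram_counts[prev][word] += 1
    return bigram_counts, vocab


def prob(bigram_counts, vocab, prev, word):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    context = bigram_counts.get(prev, Counter())
    return (context[word] + 1) / (sum(context.values()) + len(vocab))


def predict_next(bigram_counts, prev):
    """Most frequent word observed after `prev`, or None if `prev` is unseen."""
    context = bigram_counts.get(prev)
    if not context:
        return None
    return context.most_common(1)[0][0]


def perplexity(bigram_counts, vocab, sentence):
    """Per-token perplexity of one sentence under the smoothed bigram model."""
    tokens = tokenize(sentence)
    log_prob = sum(
        math.log(prob(bigram_counts, vocab, prev, word))
        for prev, word in zip(tokens, tokens[1:])
    )
    return math.exp(-log_prob / (len(tokens) - 1))


counts, vocab = train_bigram(CORPUS)
print(predict_next(counts, "sat"))  # on
print(perplexity(counts, vocab, "the cat sat on the mat"))
print(perplexity(counts, vocab, "mat the on sat cat the"))
```

Lower perplexity means the model finds a sentence less surprising, so the sentence seen in training scores lower than its scrambled version. Neural language models are evaluated with the same measure; only the way the probabilities are estimated changes.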

Language analysis and data extraction
Part-of-speech tagging, dependency parsing, named-entity recognition, semantic parsing. Sequence labelling and structured prediction using recurrent neural networks (LSTMs) and transformer networks.
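To make the tagging task and its evaluation concrete, here is a minimal sketch of a most-frequent-tag baseline tagger with token-level accuracy. This is a deliberately simple baseline, not the LSTM or transformer approaches named above, and it is not taken from the module materials. The toy training and test sentences, the Universal-style tag labels and the function names are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Toy tagged data (hypothetical), using Universal-style POS labels.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]
TEST = [
    [("the", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
    [("a", "DET"), ("mouse", "NOUN"), ("runs", "VERB")],
]


def train_baseline(tagged_sentences):
    """Learn each word's most frequent tag and the overall most frequent tag."""
    word_tags = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_tags[word.lower()][tag] += 1
            all_tags[tag] += 1
    best_tag = {w: c.most_common(1)[0][0] for w, c in word_tags.items()}
    default_tag = all_tags.most_common(1)[0][0]  # used for unknown words
    return best_tag, default_tag


def tag_words(words, best_tag, default_tag):
    """Tag each word independently, ignoring its context."""
    return [best_tag.get(w.lower(), default_tag) for w in words]


def accuracy(tagged_sentences, best_tag, default_tag):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sentence in tagged_sentences:
        words = [w for w, _ in sentence]
        gold = [t for _, t in sentence]
        predicted = tag_words(words, best_tag, default_tag)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total


best_tag, default_tag = train_baseline(TRAIN)
print(tag_words(["the", "cat", "barks"], best_tag, default_tag))
print(accuracy(TEST, best_tag, default_tag))
```

On this toy data the unknown word "runs" receives the default tag NOUN and is counted as an error. Handling such words through context is the gap that sequence models are meant to close.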

Language understanding
Sentiment analysis, automatic reading comprehension and question answering. Using information retrieval techniques in question answering. Using language analysis and extraction tools (see above) to improve baseline models for language understanding applications.
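As an example of the kind of baseline model mentioned here, the following is a minimal sketch of a bag-of-words multinomial Naive Bayes sentiment classifier. The choice of Naive Bayes is an assumption for illustration; the module description does not say which baseline is used. The toy reviews, the "pos"/"neg" labels and the function names are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy labelled reviews (hypothetical; not a benchmark dataset).
TRAIN = [
    ("i loved this film", "pos"),
    ("great acting and a great story", "pos"),
    ("what a wonderful surprise", "pos"),
    ("i hated this film", "neg"),
    ("terrible acting and a boring story", "neg"),
    ("what a dreadful mess", "neg"),
]


def train_nb(examples):
    """Collect document counts per label and word counts per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(text, label_counts, word_counts, vocab):
    """Return the label with the highest log posterior score."""
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothed log likelihood of the word given the label.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train_nb(TRAIN)
print(predict("a great film", *model))
print(predict("a boring mess", *model))
```

A bag-of-words model like this ignores word order and negation, so "not great" is scored much like "great". Features derived from language analysis tools such as taggers or parsers are one way to improve on such a baseline.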

Assessment Breakdown
Continuous Assessment	40%	Examination Weight	60%

Course Work Breakdown
Type	Description	% of total	Assessment Date
Assignment	Language modelling	10%	Week 21
Assignment	Part-of-speech tagging	10%	Week 24
Assignment	Sentiment analysis	10%	Week 27
Assignment	Question Answering/Machine Reading Comprehension	10%	Week 30

Reassessment Requirement Type
Resit arrangements are explained by the following categories:
Resit category 1: A resit is available for both* components of the module.
Resit category 2: No resit is available for a 100% continuous assessment module.
Resit category 3: No resit is available for the continuous assessment component where there is a continuous assessment and examination element.
* ‘Both’ is used in the context of the module having a Continuous Assessment/Examination split; where the module is 100% continuous assessment, there will also be a resit of the assessment.
Resit category for this module is temporarily unavailable.

Indicative Reading List

Jurafsky and Martin: Speech and Language Processing, Prentice Hall
Manning and Schütze: 1999, Foundations of Statistical Natural Language Processing, MIT Press
Goldberg: Neural Network Methods for Natural Language Processing, Morgan and Claypool

Other Resources

None