
Latest Module Specifications

Current Academic Year 2025 - 2026

Module Title: Foundations of Natural Language Processing
Module Code: CSC1122 (ITS: CA6010)
School: Computing | Faculty: Engineering & Computing
NFQ Level: 9 | Credit Rating: 7.5
Description

A central goal of Natural Language Processing (NLP) is to develop automated systems capable of understanding and generating natural language, with wide-ranging applications including machine translation, dialogue systems, information extraction, opinion mining and grammar checking. This module will provide students with a theoretical and practical grounding in the core areas, tasks and methods of NLP. Students will be introduced to the challenges of developing modern, data-driven NLP systems against a background of the history of NLP and its rapidly evolving technologies. Through lectures, assignments and a group project, students will learn the necessary background and skills to design, implement, evaluate and understand their own NLP models, using commonly used off-the-shelf toolkits and programming libraries. Topics covered in depth include different NLP tasks such as syntactic and semantic analysis, language generation, and text classification, as well as methods for evaluating NLP systems using automated metrics and human assessment.

Learning Outcomes

1. Demonstrate awareness of the history of NLP and the emergence in the field over time of increasingly varied and complex tasks and methods.
2. Apply in practice knowledge of a range of sequence labelling tasks and available methods and tools to solve them.
3. Apply in practice knowledge of word vectors, their typical uses and methods for creating them.
4. Apply in practice knowledge of a range of text classification tasks and methods for solving them.
5. Demonstrate an understanding of the main machine learning algorithms used in NLP, including Naive Bayes, logistic regression, support vector machines, hidden Markov models, conditional random fields, and simple neural networks.
6. Design systems to perform core NLP tasks (document classification, tagging, parsing, information extraction, generation) and test alternative solutions on datasets used by the NLP research community.
7. Critically review, select and apply appropriate evaluation methods for a range of different NLP tasks.
8. Demonstrate an understanding of the many challenges that remain in the field of NLP, including in relation to evaluation, technical capability, ethics and wider societal responsibilities.


Workload: Full-time hours per semester
Lecture: 24 hours (lecture delivered twice weekly)
Laboratory: 24 hours (2-hour lab once a week)
Assignment Completion: 80 hours (group assignment)
Independent Study: 50 hours (work carried out by students outside the lecture: reading background material, finishing lab assignments)
Total Workload: 178 hours
Section Breakdown
CRN: 12020 | Part of Term: Semester 1
Coursework: 0% | Examination Weight: 0%
Grade Scale: 40PASS | Pass Both Elements: Y
Resit Category: RC1 | Best Mark: N
Module Co-ordinator: Ellen Rushe | Module Teacher:
Assessment Breakdown
Type (% of total, Assessment Date): Description
Loop Exam (10%, Week 6): A lab exam to test students’ understanding of the course content.
Loop Exam (10%, Week 12): A lab exam to test students’ understanding of the course content.
Project (20%, once per semester): In this assignment, students tackle a real research problem in the field involving an NLP task that interests them most. They complete a project (codebase + report) in which they implement, describe and compare four solutions to the problem: 1) a heuristic rule-based solution, 2) a traditional machine-learning solution, 3) a solution based on fine-tuning a moderately sized language model, and 4) a solution produced by prompting a Large Language Model. Students also conduct a thorough error analysis that provides insights into the strengths and weaknesses of each solution.
Formal Examination (60%, End-of-Semester): End-of-semester summative assessment.
Reassessment Requirement Type
Resit arrangements are explained by the following categories:
RC1: A resit is available for both* components of the module.
RC2: No resit is available for a 100% coursework module.
RC3: No resit is available for the coursework component where there is a coursework and summative examination element.

* ‘Both’ is used in the context of the module having a coursework/summative examination split; where the module is 100% coursework, there will also be a resit of the assessment

Pre-requisite None
Co-requisite None
Compatibles None
Incompatibles None

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities

Introductory Concepts and History of NLP
In the first part of the course, students will be introduced to the field of NLP via a survey of subfields, linguistic concepts, and tasks prevalent in NLP. The field’s history will be traced through the increasingly varied and complex tasks and methods that it has addressed since its beginnings in the middle of the 20th century. This part will introduce the four clusters of NLP problems, and the methods developed to solve them, that are explored in more detail in the next five parts of the module.

Syntactic Analysis and Structured Prediction
The sequence labelling tasks of part-of-speech tagging, chunking and named entity recognition will be introduced. Hidden Markov models and conditional random fields will be described and compared. Different features used in sequence labelling tasks will be discussed. The problem of dependency parsing will be described, and the two main approaches, namely transition-based and graph-based dependency parsing, will be covered.
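Tagging with a hidden Markov model comes down to Viterbi decoding: finding the most probable tag sequence given transition and emission probabilities. The sketch below illustrates the idea with toy, hand-set probabilities (all numbers and the three-tag tagset are illustrative, not drawn from any corpus or from the module materials):

```python
# A minimal sketch of Viterbi decoding for an HMM part-of-speech tagger.
# Toy probabilities; a real tagger estimates these from annotated data.

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under the HMM."""
    # best[i][t] = probability of the best path ending in tag t at position i
    best = [{t: start_p[t] * emit_p[t].get(words[0], 1e-8) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Choose the previous tag that maximises the path probability.
            prev = max(tags, key=lambda p: best[i - 1][p] * trans_p[p][t])
            best[i][t] = (best[i - 1][prev] * trans_p[prev][t]
                          * emit_p[t].get(words[i], 1e-8))
            back[i][t] = prev
    # Trace back from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET":  {"DET": 0.01, "NOUN": 0.9,  "VERB": 0.09},
    "NOUN": {"DET": 0.05, "NOUN": 0.15, "VERB": 0.8},
    "VERB": {"DET": 0.5,  "NOUN": 0.4,  "VERB": 0.1},
}
emit_p = {
    "DET":  {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "cat": 0.5},
    "VERB": {"barks": 0.6, "sleeps": 0.4},
}

print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# → ['DET', 'NOUN', 'VERB']
```

In practice, students would use off-the-shelf taggers such as those in spaCy or Stanza (listed under Other Resources) rather than hand-coding the decoder.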

Distributional Semantics and Word Vectors
Students will be introduced to count-based word vectors to represent word meaning, and different methods for creating them will be explored. Students will learn about the range of applications word vectors are used for, in particular their role in semantic reasoning tasks (e.g. word sense disambiguation, paraphrase detection). Word vectors will be situated in the context of related vector-based representations of words and text.
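The core idea of count-based word vectors is that words occurring in similar contexts get similar vectors. A minimal sketch, using a made-up three-sentence corpus and a ±1-token context window (both illustrative assumptions):

```python
from collections import Counter
from math import sqrt

# A minimal sketch of count-based word vectors: each word is represented
# by the counts of words appearing within a +/-1 token context window.
# The tiny corpus below is illustrative only.

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat chased a dog".split(),
]

def cooccurrence_vectors(sentences, window=1):
    """Build a sparse co-occurrence count vector for each word."""
    vectors = {}
    for sent in sentences:
        for i, word in enumerate(sent):
            ctx = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    ctx[sent[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

vecs = cooccurrence_vectors(corpus)
# "cat" and "dog" share contexts ("the _ sat", "a _"), so their vectors
# are more similar to each other than either is to "mat".
print(cosine(vecs["cat"], vecs["dog"]))
```

Dense, learned vectors (e.g. those from the Mikolov et al. paper in the reading list) replace raw counts with low-dimensional embeddings, but the similarity computation is the same.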

Natural Language Generation
Common tasks, approaches and methods in NLG will be introduced. The core tasks of natural language generation from structured data, meaning representations and syntactic representations will be explored, and template-based and statistical approaches to the problem will be studied.
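The template-based approach mentioned above can be shown in a few lines: a record of structured data is realised as text by slotting values into a fixed template. The field names and weather domain here are illustrative assumptions, not taken from the module:

```python
# A minimal sketch of template-based generation from structured data:
# values from a record are slotted into a fixed surface template.

def realise(record):
    """Realise a weather record (illustrative schema) as a sentence."""
    template = "On {day}, expect {condition} conditions with a high of {high} degrees."
    return template.format(**record)

print(realise({"day": "Monday", "condition": "sunny", "high": 21}))
# → On Monday, expect sunny conditions with a high of 21 degrees.
```

A statistical generator would instead choose among templates, words or structures using probabilities learned from data, trading the reliability of templates for fluency and variety.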

Document/Text Classification
Students will be introduced to NLP applications which involve classifying documents into discrete classes. These include sentiment analysis, language identification and topic classification. A range of text classification methods will be studied in practical contexts.
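As an illustration of one such method, the sketch below implements multinomial Naive Bayes with add-one smoothing for a two-class sentiment task. The four training documents are made up for illustration; in the labs, students would more likely use library implementations such as scikit-learn (listed under Other Resources) on a real dataset:

```python
from collections import Counter, defaultdict
from math import log

# A minimal from-scratch multinomial Naive Bayes sentiment classifier
# with add-one (Laplace) smoothing. Toy training data, illustrative only.

train = [
    ("a great and enjoyable film", "pos"),
    ("wonderful acting and a great plot", "pos"),
    ("a dull and boring film", "neg"),
    ("terrible plot and dull acting", "neg"),
]

class_docs = defaultdict(int)       # document count per class
class_words = defaultdict(Counter)  # token counts per class
vocab = set()
for text, label in train:
    class_docs[label] += 1
    for w in text.split():
        class_words[label][w] += 1
        vocab.add(w)

def predict(text):
    """Return the class with the highest log-posterior for `text`."""
    scores = {}
    for label in class_docs:
        # Log prior + smoothed log likelihood of each token.
        score = log(class_docs[label] / len(train))
        total = sum(class_words[label].values())
        for w in text.split():
            score += log((class_words[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("a great film"))   # → pos
print(predict("a boring plot"))  # → neg
```

The same bag-of-words setup carries over directly to topic classification and language identification; only the labels and training data change.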

Evaluation
The evaluation methods introduced in earlier parts of the module will be placed in a wider context by exploring the purposes of evaluation in NLP, and surveying prevalent evaluation methods. Intrinsic vs. extrinsic, absolute vs. relative, and objective vs. subjective modes of evaluation will be distinguished. Basic concepts in human evaluation of NLP systems will be introduced (covered in more depth in CA6012 Human Factors in NLP). Automated evaluation metrics will be critically reviewed and explored in practical exercises.
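The most prevalent automated metrics, precision, recall and F1, can be sketched for a binary labelling task such as entity tagging. The gold and predicted label sequences below are illustrative, as is the "ENT"/"O" label scheme:

```python
# A minimal sketch of precision, recall and F1 for a binary labelling
# task. Gold and predicted labels below are illustrative only.

def precision_recall_f1(gold, pred, positive="ENT"):
    """Compute precision, recall and F1 for the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["ENT", "O", "ENT", "ENT", "O", "O"]
pred = ["ENT", "ENT", "O", "ENT", "O", "O"]
p, r, f = precision_recall_f1(gold, pred)
print(p, r, f)  # 2 of 3 predicted entities are correct, 2 of 3 gold entities found
```

Such metrics are objective and cheap but intrinsic; extrinsic and human evaluation, introduced later in this part, assess whether those gains matter in a downstream task.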

Indicative Reading List

Books:
  • Christopher Manning and Hinrich Schütze (1999), Foundations of Statistical Natural Language Processing (select chapters), MIT Press, 720 pp., ISBN 0-262-13360-1
  • Dan Jurafsky and James H. Martin (2009), Speech and Language Processing (select chapters), Prentice Hall, 988 pp., ISBN 9780131873216
  • Sandra Kübler, Ryan McDonald and Joakim Nivre (2009), Dependency Parsing (select chapters), Morgan & Claypool Publishers, 115 pp., ISBN 1598295969
  • Noah A. Smith (2011), Linguistic Structure Prediction (Synthesis Lectures on Human Language Technologies), Morgan & Claypool Publishers, ISBN 1608454053
  • Bing Liu (2012), Sentiment Analysis and Opinion Mining (select chapters), Morgan & Claypool Publishers, 167 pp., ISBN 9781608458844


Articles:
  • Andrew McCallum and Wei Li (2003), Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons, Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL (CoNLL 2003), https://aclanthology.org/W03-0430.pdf
  • Bo Pang and Lillian Lee (2008), Opinion Mining and Sentiment Analysis, Foundations and Trends in Information Retrieval, Volume 2, Issue 1-2, https://doi.org/10.1561/1500000011
  • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado and Jeffrey Dean (2013), Distributed Representations of Words and Phrases and their Compositionality, Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS'13), https://proceedings.neurips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev and Percy Liang (2016), SQuAD: 100,000+ Questions for Machine Comprehension of Text, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), https://aclanthology.org/D16-1264.pdf
  • Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers and Daniel Zeman (2020), Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection, Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), https://aclanthology.org/2020.lrec-1.497.pdf
Other Resources

  • Software: spaCy open-source library for Natural Language Processing in Python, https://spacy.io/
  • Software: Stanza – A Python NLP Package for Many Human Languages, https://stanfordnlp.github.io/stanza/
  • Software: scikit-learn, simple and efficient tools for predictive data analysis, https://scikit-learn.org/stable/
  • Online Book: Dan Jurafsky and James H. Martin, 2021, Speech and Language Processing (3rd ed. draft), https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
