Module: Data at Speed & Scale

Latest Module Specifications

Current Academic Year 2025 - 2026

Module Title	Data at Speed & Scale
Module Code	CSC1109 (ITS: CA4022)
Faculty	Engineering & Computing	School	Computing
NFQ level	8	Credit Rating	7.5

Description

This module addresses the three Vs of Big Data: Volume, Velocity and Variety. It equips students with detailed knowledge of mining massive data sets, processing streams of data in real time, and extracting knowledge from complex information. The module introduces the theory and practice of massively parallel data processing, leveraging different hardware and software infrastructures, including cloud-based infrastructures. It includes a practical component in which students develop Big Data analytics on suitable publicly available test data using high-level languages and suitable libraries.

Learning Outcomes

1. Understand the nature and consequences of Big Data for processing and analytics
2. Design and Implement data-intensive applications using existing best-of-breed big data libraries and frameworks
3. Discuss the role of cloud services in the design of big data systems
4. Apply machine learning techniques to Big Data
5. Explore and curate large, complex datasets for use in analytics
6. Configure and deploy data analytics infrastructure
7. Understand and discuss some of the design considerations for high-performance analytics
8. Gain detailed knowledge of map-reduce, related distributed file systems and their open-source implementations

Workload	Full time hours per semester
Type	Hours	Description
Lecture	36	Lectures and tutorials presenting the key theoretical aspects of the course. Lecture material will be provided in the form of online notes, research papers, technical documentation and multimedia content as applicable.
Laboratory	24	Hands-on programming laboratory work and tutorials incorporating problem-based learning tasks, formative assessments, and student-led discussions. This will include significant technical work to configure, deploy, program and execute data analysis software.
Independent Study	127.5	Significant individual work including reading and understanding technical papers, research material and documentation. Preparation of continuous assessment, discussion of coursework with peers, and group assignment.
Total Workload: 187.5

Section Breakdown
CRN	10609	Part of Term	Semester 1
Coursework	100%	Examination Weight	0%
Grade Scale	40PASS	Pass Both Elements	N
Resit Category	RC1	Best Mark	N
Module Co-ordinator	Alessandra Mileo	Module Teacher

Section Breakdown
CRN	11819	Part of Term	Semester 1
Coursework	100%	Examination Weight	0%
Grade Scale	40PASS	Pass Both Elements	N
Resit Category	RC1	Best Mark	N
Module Co-ordinator	Alessandra Mileo	Module Teacher

Assessment Breakdown
Type	Description	% of total	Assessment Date
Assignment	Obtain, explore, curate, and analyse a massive dataset, applying relevant analytical approaches and reporting on the results	30%	Week 4
Loop Quiz	Supervised MCQ assessment via closed-book Loop quiz	20%	Week 7
Project	Design and implement a data-driven application, optionally using machine learning and cloud infrastructure	50%	Week 10

Reassessment Requirement Type
Resit arrangements are explained by the following categories: RC1: A resit is available for both* components of the module. RC2: No resit is available for a 100% coursework module. RC3: No resit is available for the coursework component where there is a coursework and summative examination element. * ‘Both’ is used in the context of the module having a coursework/summative examination split; where the module is 100% coursework, there will also be a resit of the assessment.

Pre-requisite	None
Co-requisite	None
Compatibles	None
Incompatibles	None

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities

Big Data Processing and MapReduce
Introduction to Big Data, and why Big Data analytics is different to conventional approaches. Introduction to the MapReduce algorithm and its open-source implementation. The Hadoop ecosystem and how it can be used to analyse data.
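
As an illustration of the MapReduce model named above, the following is a minimal single-process sketch in Python. It is not Hadoop code and is not taken from the module materials: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and the word-count task is assumed here only because it is a common introductory example. A real framework would run the map and reduce steps in parallel across machines and perform the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, mimicking the step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce sequentially over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling word_count on a list of strings returns a dictionary mapping each lower-cased word to its number of occurrences across all strings.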

Finding Similar Items
Theoretical topics include Locality-sensitive Hashing, Minhashing, Similarity-preserving summaries, Distance measures. These form the basis for organising and exploring Big Data.
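
To make the idea of a similarity-preserving summary concrete, here is a small sketch of minhashing in Python. It assumes Jaccard similarity as the distance measure and uses seeded BLAKE2b hashes from the standard library to stand in for random permutations. The helper names (shingles, jaccard, minhash_signature, estimate_similarity) and the choice of 128 hash functions are assumptions made for illustration, not part of the module content.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-character shingles of a string."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def _hash(item, seed):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=128):
    """Signature = minimum hash value of the (non-empty) set under each seed."""
    return [min(_hash(item, seed) for item in items) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

The signatures have a fixed length regardless of set size, so two large sets can be compared by their signatures alone. The estimate becomes more accurate as num_hashes grows, at the cost of longer signatures.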

Stream Processing
Handling real-time / stream data through the use of Filtering, Sampling, Estimation of Moments, and other techniques. Practical aspects include programming with Spark, Storm or a similar library.
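
As one example of the sampling techniques mentioned above, the sketch below shows reservoir sampling, which keeps a uniform random sample of fixed size from a stream whose length is not known in advance. This is plain Python rather than Spark or Storm code, and the function name reservoir_sample and its rng parameter are hypothetical names chosen for this illustration.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of up to k items from an iterable stream.

    Only the reservoir is held in memory, so the stream can be arbitrarily long.
    """
    rng = rng or random.Random()
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (index + 1).
            j = rng.randint(0, index)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Passing a seeded random.Random instance as rng makes the sample reproducible, which can help when testing a streaming pipeline.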

Large-scale Machine Learning
Key topics include Item Similarity, Clustering, and evaluating performance.

Big Data Cloud
Configuring and using Amazon EC2, Elastic MapReduce, Microsoft Azure and similar technologies. The students will deploy applications to these platforms as part of their assignments.

Indicative Reading List

Books:
None

Articles:
None

Other Resources

None