
Module Specifications.

Current Academic Year 2024 - 2025

All Module information is indicative, and this portal is an interim interface pending the full upgrade of Coursebuilder and subsequent integration to the new DCU Student Information System (DCU Key).

As such, this is a point in time view of data which will be refreshed periodically. Some fields/data may not yet be available pending the completion of the full Coursebuilder upgrade and integration project. We will post status updates as they become available. Thank you for your patience and understanding.

Date posted: September 2024

Module Title: Data at Speed & Scale
Module Code: CA4022 (ITS) / CSC1109 (Banner)
Faculty: Engineering & Computing
School: Computing
Module Co-ordinator: Alessandra Mileo
Module Teachers: -
NFQ level: 8
Credit Rating: 7.5
Pre-requisite: Not Available
Co-requisite: Not Available
Compatibles: Not Available
Incompatibles: Not Available
None
Description

This module addresses the three Vs of Big Data: Volume, Velocity and Variety. It equips students with detailed knowledge of mining massive data sets, processing streams of data in real time, and extracting knowledge from complex information. The module introduces the theory and practice of massively parallel data processing, leveraging different hardware and software infrastructures, including cloud-based infrastructures. It includes a practical component in which students develop Big Data analytics on suitable publicly available test data using high-level languages and suitable libraries.

Learning Outcomes

1. Understand the nature and consequences of Big Data for processing and analytics
2. Design and Implement data-intensive applications using existing best-of-breed big data libraries and frameworks
3. Discuss the role of cloud services in the design of big data systems
4. Apply machine learning techniques to Big Data
5. Explore and curate large, complex datasets for use in analytics
6. Configure and deploy data analytics infrastructure
7. Understand and discuss some of the design considerations for high-performance analytics
8. Gain detailed knowledge of MapReduce, related distributed file systems and their open-source implementations



Workload: Full-time hours per semester
Type | Hours | Description
Lecture | 36 | Lectures and tutorials presenting the key theoretical aspects of the course. Lecture material will be provided in the form of online notes, research papers, technical documentation and multimedia content as applicable.
Laboratory | 24 | Hands-on Programming laboratory work and tutorials incorporating problem-based learning tasks, formative assessments, and student-led discussions. This will include significant technical work to configure, deploy, program and execute data analysis software.
Independent Study | 190 | Significant individual work including reading and understanding technical papers, research material, documentation. Preparation of continuous assessment, discussion of coursework with peers and group assignment.
Total Workload: 250

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities

Big Data Processing and MapReduce
Introduction to Big Data, and why Big Data analytics differs from conventional approaches. Introduction to the MapReduce programming model and its open-source implementation. The Hadoop ecosystem and how it can be used to analyse data.
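
As an illustration of the map, shuffle and reduce phases, here is a minimal single-process sketch of word counting in plain Python. It is not Hadoop code and is not taken from the module materials; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example. A real framework would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key between the two phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    """Run the three phases in sequence on an in-memory list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["big data is big", "data at speed and scale"]
    print(word_count(docs))
```

Because each call to `map_phase` sees only one record and each call to `reduce_phase` sees only one key, both phases can be distributed without shared state, which is the property the programming model relies on.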

Finding Similar Items
Theoretical topics include Locality-Sensitive Hashing, Minhashing, Similarity-preserving summaries, Distance measures. These form the basis for organising and exploring Big Data.
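
The idea behind Minhashing can be sketched as follows: the fraction of positions at which two MinHash signatures agree estimates the Jaccard similarity of the underlying sets. This is a small illustrative sketch, not the module's own code. The character-shingle size, the number of hash functions, the seed, and the use of `zlib.crc32` with random linear hash functions modulo a Mersenne prime are all assumptions made for the example.

```python
import random
import zlib

MERSENNE_PRIME = (1 << 61) - 1  # modulus for the random linear hash functions


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of character k-shingles (substrings of length k)."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity: |intersection| / |union|."""
    return len(a & b) / len(a | b)


def minhash_signature(items: set[str], num_hashes: int = 128, seed: int = 42) -> list[int]:
    """Signature = minimum of each hash function h(x) = (a*x + b) mod p over the set."""
    rng = random.Random(seed)  # same seed gives the same hash functions for every set
    coeffs = [(rng.randrange(1, MERSENNE_PRIME), rng.randrange(MERSENNE_PRIME))
              for _ in range(num_hashes)]
    # Map each shingle to an integer with a hash that is stable across runs.
    ids = [zlib.crc32(item.encode("utf-8")) for item in items]
    return [min((a * x + b) % MERSENNE_PRIME for x in ids) for a, b in coeffs]


def estimate_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    s1 = shingles("processing streams of data in real time")
    s2 = shingles("processing streams of data in near real time")
    print("exact Jaccard:", jaccard(s1, s2))
    print("MinHash estimate:", estimate_similarity(minhash_signature(s1), minhash_signature(s2)))
```

Signatures are much smaller than the shingle sets they summarise, which is what makes them a similarity-preserving summary. Locality-sensitive hashing would then band these signatures to find candidate pairs without comparing every pair; that step is omitted here.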

Stream Processing
Handling real-time / stream data through the use of Filtering, Sampling, Estimation of Moments, and other techniques. Practical aspects include programming with Spark, Storm or a similar library.
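
One common way to sample from a stream of unknown length is reservoir sampling, sketched below in plain Python. The module description does not say which sampling method is taught, so treat this as one illustrative technique rather than the course's method; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> list[T]:
    """Keep a uniform random sample of k items from a stream in one pass.

    Only k items are held in memory, however long the stream is.
    """
    rng = random.Random(seed)
    reservoir: list[T] = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # The n-th item replaces a random slot with probability k / n.
            j = rng.randrange(n)  # uniform integer in [0, n)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Simulate a stream with a generator so the full data is never materialised.
    readings = (i * i for i in range(1_000_000))
    print(reservoir_sample(readings, k=5, seed=1))
```

The same one-pass, bounded-memory constraint applies to the filtering and moment-estimation techniques listed above; libraries such as Spark or Storm supply the distributed runtime around this kind of per-item logic.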

Large-scale machine learning
Key topics include Item Similarity, Clustering, and evaluating performance.
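
As a concrete instance of clustering, the sketch below implements k-means (Lloyd's algorithm) in plain Python. The module lists clustering without naming an algorithm, so k-means is an assumption made for this example, as are the names `kmeans`, `nearest` and `centroid_of` and the default parameters. At Big Data scale the assignment step would be distributed, for example as a map phase with the centroid update as a reduce phase.

```python
import random
from math import dist

Point = tuple[float, ...]


def centroid_of(points: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def nearest(point: Point, centroids: list[Point]) -> int:
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points: list[Point], k: int, max_iterations: int = 50, seed: int = 0) -> list[Point]:
    """Return k centroids found by alternating assignment and update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k randomly chosen points
    for _ in range(max_iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [centroid_of(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centres = kmeans(data, k=2)
    print(centres)
    print([nearest(p, centres) for p in data])  # cluster label for each point
```

One simple way to evaluate the result is the sum of squared distances from each point to its assigned centroid, compared across different values of `k`.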

Big Data Cloud
Configuring and using Amazon EC2, Elastic MapReduce, Microsoft Azure and similar technologies. Students will deploy applications to these platforms as part of their assignments.

Assessment Breakdown
Continuous Assessment: 100%
Examination Weight: 0%
Course Work Breakdown
Type | Description | % of total | Assessment Date
Assignment | Obtain, explore, curate, and analyse a massive dataset, applying relevant analytical approaches and reporting on the results | 30% | Week 4
Research Paper | Short Research Paper on a new trend, novel approach or new application in the area of Big Data Analytics | 20% | Week 7
Project | Design and Implement a Data-driven application, optionally using machine learning and cloud infrastructure | 50% | Week 10
Reassessment Requirement Type
Resit arrangements are explained by the following categories:
Resit category 1: A resit is available for both* components of the module.
Resit category 2: No resit is available for a 100% continuous assessment module.
Resit category 3: No resit is available for the continuous assessment component where there is a continuous assessment and examination element.
* ‘Both’ is used in the context of the module having a Continuous Assessment/Examination split; where the module is 100% continuous assessment, there will also be a resit of the assessment
This module is category 1
Indicative Reading List

    Other Resources

    None
