Registry
Module Specifications
Archived Version 2023 - 2024
Description
This module addresses the three Vs of Big Data: Volume, Velocity and Variety. It equips students with detailed knowledge of mining massive data sets, processing streams of data in real time, and extracting knowledge from complex information. The module introduces the theory and practice of massively parallel data processing, leveraging different hardware and software infrastructures, including cloud-based infrastructures. It includes a practical component in which students develop Big Data analytics on suitable publicly available test data using high-level languages and suitable libraries.
Learning Outcomes
1. Understand the nature and consequences of Big Data for processing and analytics
2. Design and Implement data-intensive applications using existing best-of-breed big data libraries and frameworks
3. Discuss the role of cloud services in the design of big data systems
4. Apply machine learning techniques to Big Data
5. Explore and curate large, complex datasets for use in analytics
6. Configure and deploy data analytics infrastructure
7. Understand and discuss some of the design considerations for high-performance analytics
8. Gain detailed knowledge of map-reduce, related distributed file systems and their open-source implementations
All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml
Indicative Content and Learning Activities

Big Data Processing and MapReduce
Introduction to Big Data, and why Big Data analytics is different to conventional approaches. Introduction to the MapReduce algorithm and its open-source implementation. The Hadoop ecosystem and how it can be used to analyse data.

Finding Similar Items
Theoretical topics include Locality-sensitive Hashing, Minhashing, Similarity-preserving summaries, and Distance measures. These form the basis for organising and exploring Big Data.

Stream Processing
Handling real-time / stream data through the use of Filtering, Sampling, Estimation of Moments, and other techniques. Practical aspects include programming with Spark, Storm or a similar library.

Large-scale Machine Learning
Key topics include Item Similarity, Clustering, and evaluating performance.

Big Data Cloud
Configuring and using Amazon EC2, Elastic MapReduce, Microsoft Azure and similar technologies. The students will deploy applications to these platforms as part of their assignments.
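As an illustration of the MapReduce model named in the indicative content above, the following is a minimal single-process sketch of a word count in Python. It is not taken from the module materials; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework such as Hadoop would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all counts emitted for one word."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence (illustrative, not distributed)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count` on a list of strings returns a dictionary that maps each lower-cased word to its total number of occurrences.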
Indicative Reading List
Other Resources: None
Programme or List of Programmes