Module Specifications

Current Academic Year 2023 - 2024

Please note that this information is subject to change.

Module Title Programming for Data Analysis
Module Code CA274
School School of Computing
Module Co-ordinator Semester 1: Stephen Blott
Semester 2: Stephen Blott
Autumn: Stephen Blott
Module Teachers Stephen Blott
NFQ level 8
Credit Rating 5
Pre-requisite None
Co-requisite None
Compatibles None
Incompatibles None
Coursework Only
Description

This module aims to give the student a background in using a programming language such as R to deliver a competent analysis of both structured and unstructured data.

Learning Outcomes

1. R Basics: The student should be able to read data into R data frames and R tables and manipulate it.
2. R Objects: The student should be able to create a library in R and use class objects.
3. Parallel Programming: The student should be able to develop parallel code to handle computationally intensive analysis.
4. Visualisation: The student should be able to produce basic plots, geographic maps, multi-dimensional reduction, 3D plots and dynamic graphics.
5. Big Data in R: The student should be able to handle large data sets and demonstrate the techniques and libraries available in R for analysing big data sets.
6. Differential Equations and Linear Algebra in R: The student should understand the packages used for linear algebra and differential calculus.



Workload Full-time hours per semester
Type | Hours | Description
Lecture | 12 | Lectures will cover the material required for the course.
Laboratory | 24 | Laboratories will be used to demonstrate the techniques and R packages taught in class.
Assignment Completion | 89 | There will be 4 assignments throughout the term. Assignments will vary in weight from 10% to 40% of the final mark. See Module Content and Assessment.
Total Workload: 125

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities

R Basics
Introduction to R. Declaring variables and functions, creating data frames and matrices, and reading data from outside sources such as databases, CSV files and XML files.

R objects
Library development in R. How to get the best out of CRAN. Using R objects and classes to handle data.

Parallel programming in R
Developing parallel code to handle simple but computationally intensive analysis. The following packages will be examined: pbdMPI/OpenMP, pbdSLAP, snowfall, foreach, future, Rborist, randomForestSRC.

R visualisation
Basic plots, Geographic Maps, Multi-Dimensional reduction, 3D plotting, Dynamic Graphics

R Big Data
Packages: rJava, Rcpp, pqR, pddR

Maths in R
Covers some linear algebra and differential calculus techniques in R.

Assessment Breakdown
Continuous Assessment 100% | Examination Weight 0%
Course Work Breakdown
Type | Description | % of total | Assessment Date
Assignment | Data manipulation with a specified data set. The student should demonstrate how to pivot a table and provide detailed univariate statistics with supplementary graphics. | 10% | Week 3
Assignment | Create an R library with a number of class objects that can be used to create new features for a chosen data set. These features should include transformations such as moving averages, exponential averages and other potential smoothing utilities. | 20% | Week 6
Assignment | Demonstrate how simple linear regression can be implemented using parallel programming in R. | 30% | Week 8
Assignment | Complete a data analysis on a big data set that combines both text and numeric data. | 40% | Week 12
Reassessment Requirement Type
Resit arrangements are explained by the following categories:
1 = A resit is available for all components of the module
2 = No resit is available for a 100% continuous assessment module
3 = No resit is available for the continuous assessment component
This module is category 1
Indicative Reading List

  • Jane M. Horgan: 2009, Probability with R, Wiley, Hoboken, N.J., 9780470280737
  • Robert Kabacoff: R in Action, Manning Publications, 9781935182399
  • Tony Fischetti: 2015, Data Analysis with R, Packt Publishing, 9781785288142
Other Resources

None
Programme or List of Programmes
DS: BSc in Data Science
ECSA: Study Abroad (Engineering & Computing)
ECSAO: Study Abroad (Engineering & Computing)