Registry

Module Specifications

Archived Version 2021 - 2022

Module Title
Module Code
School

NFQ level 8
Credit Rating 7.5
Pre-requisite None
Co-requisite None
Compatibles None
Incompatibles None
Description

A Data Warehouse is the model or structure that supports data mining and decision support. This module teaches students how to build Data Warehouses by understanding their structures and the concept of multi-dimensional modelling. It also covers Data Mining to teach students how to extract knowledge from data warehouses using three different approaches: clustering, association rule mining and classification.

Learning Outcomes

1. Be able to build Data Warehouses for different application types.
2. Be able to deploy the Data Warehouse Bus Matrix to create individual data marts.
3. Be able to design a multi-dimensional schema model.
4. Analyse the different strategies and techniques involved in Data Mining, and choose the correct approach for each dataset.
5. Be able to construct and deploy data mining algorithms.
6. Be able to determine the predictive accuracy of data mining algorithms.



Workload (full-time hours per semester)
Type | Hours | Description
Lecture | 24 | No description
Group work | 40 | Construct datasets
Independent Study | 120 | Build Data Mining algorithms
Total Workload: 184

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities

Data Mining Concepts
Introduction to terminology and basic concepts.

Classification
In this section, we describe two classification algorithms: one that can be used when all the attributes are categorical, the other when attributes are continuous.

Association Rule Mining
Unlike classification, the left- and right-hand sides of rules can potentially include tests on the value of any attribute or combination of attributes. Rules of this more general kind represent an association between the values of certain attributes and those of others and are called association rules. The process of extracting such rules from a given dataset is called association rule mining. In this section, algorithms for efficient rule generation are described.
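
As an illustration of what an association rule measures, here is a minimal Python sketch. It assumes the usual support and confidence measures, which this outline does not name, and it uses a brute-force enumeration rather than the efficient algorithms covered in the section. The transactions, function names and thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical toy transactions; not taken from the module materials.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(lhs union rhs) / support(lhs); 0.0 if lhs never occurs."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0
    return support(set(lhs) | set(rhs), transactions) / lhs_support


def rules(transactions, min_support, min_confidence):
    """Enumerate single-item -> single-item rules that pass both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, transactions)
            c = confidence({lhs}, {rhs}, transactions)
            if s >= min_support and c >= min_confidence:
                found.append((lhs, rhs, s, c))
    return found
```

Calling `rules(TRANSACTIONS, 0.5, 0.6)` lists every rule whose item pair appears in at least half of the transactions and whose confidence is at least 0.6. Checking every pair like this becomes expensive as the number of items grows, which is why efficient rule generation is treated as a topic in its own right.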

Clustering
Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. We will describe two methods for which the similarity between objects is based on a measure of the distance between them.
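
To make the idea of distance-based similarity concrete, the sketch below shows one assignment and update step of a centre-based method in the style of k-means. The outline does not name the two methods, so the choice of method, the Euclidean distance and all function names are assumptions made for illustration.

```python
import math


def nearest_centre(point, centres):
    """Index of the centre closest to point under Euclidean distance."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))


def assign(points, centres):
    """Group points by their nearest centre."""
    clusters = [[] for _ in centres]
    for p in points:
        clusters[nearest_centre(p, centres)].append(p)
    return clusters


def update(clusters, centres):
    """Move each centre to the mean of its cluster; keep it if the cluster is empty."""
    new_centres = []
    for cluster, old in zip(clusters, centres):
        if cluster:
            new_centres.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
        else:
            new_centres.append(old)
    return new_centres
```

Repeating `assign` and `update` until the centres stop moving gives a complete, if naive, clustering loop.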

Predictive Accuracy
Two approaches to determining the quality of our data mining predictions are covered.
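
The outline does not say which two approaches are meant. As a sketch only, the code below assumes two common ones: accuracy on a held-out test set, and k-fold cross-validation, where every instance is used for testing exactly once. The function names are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual class labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n instances."""
    # The first n % k folds get one extra instance so all n are covered.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

In practice the instances would usually be shuffled before splitting; the contiguous folds here keep the example easy to follow.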

Overfitting Decision Trees
Many data mining methods suffer from the problem of overfitting to the training data, resulting in some cases in excessively large rule sets and/or rules with very low predictive power for previously unseen data. In this section, we look at ways of adjusting a decision tree either while it is being generated, or afterwards, in order to increase its predictive accuracy.

Data Warehouse Characteristics
An overview of the terminology, background and motivation for constructing a data warehouse.

Multidimensional Modelling
Students will cover the concepts of dimensions, pivots, fact table granularity, roll-up and drill-down functions.
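
Roll-up and drill-down can be read as the same aggregation run at different levels of a dimension. The sketch below assumes a small set of hypothetical sales facts with a year/month date hierarchy; the data, column names and `aggregate` function are illustrative and not part of the module materials.

```python
from collections import defaultdict

# Hypothetical fact rows at (year, month, store) granularity.
SALES = [
    {"year": 2021, "month": 1, "store": "Dublin", "amount": 100.0},
    {"year": 2021, "month": 1, "store": "Cork", "amount": 50.0},
    {"year": 2021, "month": 2, "store": "Dublin", "amount": 70.0},
    {"year": 2022, "month": 1, "store": "Dublin", "amount": 30.0},
]


def aggregate(facts, dims, measure="amount"):
    """Sum the measure over all fact rows that share the same values for dims."""
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


# Roll-up: summarise at the coarser year level.
by_year = aggregate(SALES, ("year",))

# Drill-down: return to the finer year/month level.
by_month = aggregate(SALES, ("year", "month"))
```

The granularity of the fact rows sets the finest level that a drill-down can reach: with these rows, nothing below month and store can be recovered.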

Building the Data Warehouse
A step-by-step case study in creating fact and dimension tables.
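
As a rough idea of what such a case study produces, the sketch below builds a tiny star schema in an in-memory SQLite database using Python's standard `sqlite3` module. The table names, columns and rows are hypothetical and are not the module's case study.

```python
import sqlite3


def build_star_schema(conn):
    """Create two dimension tables and one fact table that references them."""
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY, name TEXT
        );
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );
    """)


def sales_by_month(conn):
    """Join the fact table to the date dimension and total sales per month."""
    return conn.execute("""
        SELECT d.year, d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON f.date_key = d.date_key
        GROUP BY d.year, d.month
        ORDER BY d.year, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
# Hypothetical rows: dimensions describe context, facts hold the measures.
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2021, 1), (2, 2021, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "widget"), (2, "gadget")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.0)],
)
```

Each fact row stores only keys and a numeric measure; the descriptive attributes live in the dimension tables and are reached through joins such as the one in `sales_by_month`.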

Web Data Warehouses
In this section, we describe how clickstream data can be incorporated into a traditional data warehouse using new dimensions and a clickstream fact table.

Assessment Breakdown
Continuous Assessment % | Examination Weight %
Course Work Breakdown
Type | Description | % of total | Assessment Date
Reassessment Requirement
Resit arrangements are explained by the following categories:
1 = A resit is available for all components of the module
2 = No resit is available for 100% continuous assessment module
3 = No resit is available for the continuous assessment component
Unavailable
Indicative Reading List

  • Jiawei Han: 2011, Data Mining: Concepts & Techniques, Morgan Kaufmann
  • Max Bramer: Principles of Data Mining, Springer
  • Ralph Kimball: The Data Warehouse Toolkit, Wiley
Other Resources

None
Programme or List of Programmes