
Module Specifications.

Current Academic Year 2024 - 2025

All Module information is indicative, and this portal is an interim interface pending the full upgrade of Coursebuilder and subsequent integration to the new DCU Student Information System (DCU Key).

As such, this is a point in time view of data which will be refreshed periodically. Some fields/data may not yet be available pending the completion of the full Coursebuilder upgrade and integration project. We will post status updates as they become available. Thank you for your patience and understanding.

Date posted: September 2024

Module Title: Data Warehousing & Data Mining
Module Code: CA4010 (ITS) / CSC1104 (Banner)
Faculty: Engineering & Computing
School: Computing
Module Co-ordinator: Mark Roantree
Module Teachers: -
NFQ level: 8
Credit Rating: 7.5
Pre-requisite: Not Available
Co-requisite: Not Available
Compatibles: Not Available
Incompatibles: Not Available
None
No Repeat allowed.
Description

A Data Warehouse is the model or structure that supports data mining and decision support. This module teaches students how to build Data Warehouses by understanding their structures and the concept of multi-dimensional modelling. It also covers Data Mining to teach students how to extract knowledge from data warehouses using three different approaches: clustering, association rule mining and classification.

Learning Outcomes

1. Be able to build Data Warehouses for different application types.
2. Be able to deploy the Data Warehouse Bus Matrix to create individual data marts.
3. Be able to design a multi-dimensional schema model.
4. Analyse the different strategies and techniques involved in Data Mining, and choose the correct approach for each dataset.
5. Be able to construct and deploy data mining algorithms.
6. Be able to determine the predictive accuracy of data mining algorithms.



Workload: Full-time hours per semester
Type | Hours | Description
Lecture | 24 | No Description
Group work | 40 | Construct datasets
Independent Study | 120 | Build Data Mining algorithms
Total Workload: 184

All module information is indicative and subject to change. For further information, students are advised to refer to the University's Marks and Standards and Programme Specific Regulations at: http://www.dcu.ie/registry/examinations/index.shtml

Indicative Content and Learning Activities

Data Mining Concepts
Introduction to terminology and basic concepts.

Classification
In this section, we describe two classification algorithms: one that can be used when all the attributes are categorical, the other when attributes are continuous.
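
The two algorithms are not named here. As one illustrative possibility for the continuous-attribute case, the sketch below classifies a record by a majority vote among its k nearest neighbours under Euclidean distance. The function names, the toy training set and the choice of k are all assumptions made for this example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training, query, k=3):
    """Return the majority label among the k training rows closest to query.

    training is a list of (attributes, label) pairs (hypothetical layout).
    """
    nearest = sorted(training, key=lambda row: euclidean(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data invented for illustration: two continuous attributes per record.
training = [
    ((1.0, 1.2), "low"),
    ((0.8, 0.9), "low"),
    ((1.1, 0.7), "low"),
    ((5.0, 5.5), "high"),
    ((5.2, 4.8), "high"),
    ((4.9, 5.1), "high"),
]

prediction = knn_classify(training, (1.0, 1.0), k=3)
```

Choosing an odd k for a two-class problem avoids tied votes.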

Association Rule Mining
Unlike classification, the left- and right-hand sides of rules can potentially include tests on the value of any attribute or combination of attributes. Rules of this more general kind represent an association between the values of certain attributes and those of others and are called association rules. The process of extracting such rules from a given dataset is called association rule mining. In this section, algorithms for efficient rule generation are described.
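
Association rules are commonly ranked by support and confidence. The sketch below computes both for a single rule, assuming each record is reduced to a set of items (a simplification of the attribute-value tests described above). The transactions and function names are invented for illustration, and this is not one of the efficient rule generation algorithms covered in the section.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Support of (lhs and rhs together) divided by support of lhs alone."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(transactions, set(lhs) | set(rhs)) / lhs_support


# Hypothetical market-basket style data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
```

Here the rule "bread implies milk" holds in two of the three transactions that contain bread.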

Clustering
Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. We will describe two methods for which the similarity between objects is based on a measure of the distance between them.
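
The two methods are not named here. As one example of distance-based clustering, the following is a minimal k-means sketch: each point is assigned to its nearest centroid, and each centroid is then moved to the mean of its members until nothing changes. The starting centroids are supplied by the caller so the run is repeatable; the data and names are assumptions for illustration.

```python
import math


def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def k_means(points, centroids, max_iterations=100):
    """Return (centroids, clusters) after iterating until the centroids settle."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[closest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Toy two-dimensional points invented for illustration.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, [(1.0, 1.0), (8.0, 8.0)])
```

In practice the result depends on the initial centroids, so runs are often repeated with different starting points.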

Predictive Accuracy
Two approaches to determining the quality of our data mining predictions are covered.
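
The two approaches are not named here. A common pair is a single held-out test set and k-fold cross-validation, and the sketch below assumes those. A trivial majority-class predictor stands in for a real data mining algorithm, and all names and data are hypothetical.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Proportion of predictions that match the actual labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for each of k folds over n records."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def majority_class(labels):
    """Stand-in classifier: always predict the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k):
    """Mean accuracy of the stand-in classifier over k folds."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        guess = majority_class([labels[i] for i in train])
        scores.append(accuracy([guess] * len(test), [labels[i] for i in test]))
    return sum(scores) / len(scores)


# Toy labels invented for illustration.
labels = ["yes"] * 6 + ["no"] * 3
mean_accuracy = cross_validate(labels, k=3)
```

Every record is used for testing exactly once, which makes the estimate less dependent on one particular train/test split.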

Overfitting Decision Trees
Many data mining methods suffer from the problem of overfitting to the training data, resulting in some cases in excessively large rule sets and/or rules with very low predictive power for previously unseen data. In this section, we look at ways of adjusting a decision tree either while it is being generated, or afterwards, in order to increase its predictive accuracy.

Data Warehouse Characteristics
An overview of the terminology, background and motivation for constructing a data warehouse.

Multidimensional Modelling
Students will cover the concepts of dimensions, pivots, fact table granularity, roll-up and drill-down functions.
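
Roll-up and drill-down can be viewed as aggregating the same facts over fewer or more dimensions. The sketch below assumes a small list of sales facts stored at month and store granularity; the rows, column names and aggregate function are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows; the granularity is one row per month per store.
sales = [
    {"year": 2024, "month": 1, "store": "Dublin", "amount": 100.0},
    {"year": 2024, "month": 1, "store": "Cork", "amount": 50.0},
    {"year": 2024, "month": 2, "store": "Dublin", "amount": 70.0},
    {"year": 2025, "month": 1, "store": "Dublin", "amount": 30.0},
]


def aggregate(facts, dimensions, measure="amount"):
    """Sum the measure for each distinct combination of the given dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Roll-up: drop month and store to see yearly totals.
by_year = aggregate(sales, ["year"])

# Drill-down: add month back in to see finer detail.
by_year_month = aggregate(sales, ["year", "month"])
```

A roll-up can never show more detail than the fact table granularity allows, which is why that granularity is fixed early in the design.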

Building the Data Warehouse
A step-by-step case study in creating fact and dimension tables.
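
The case study itself is not reproduced here. As a rough sketch of what fact and dimension tables look like, the example below builds a tiny star schema in an in-memory SQLite database using Python's standard sqlite3 module. The table names, columns and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two dimension tables and one fact table that references them by key.
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year INTEGER,
    month INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name TEXT,
    category TEXT
);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL
);
""")

conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240101, 2024, 1), (20240201, 2024, 2)],
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Kettle", "Kitchen"), (2, "Lamp", "Lighting")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(20240101, 1, 2, 50.0), (20240101, 2, 1, 30.0), (20240201, 1, 1, 25.0)],
)

# Join the fact table to a dimension and total the measure per category.
rows = conn.execute("""
SELECT p.category, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_product AS p ON f.product_key = p.product_key
GROUP BY p.category
ORDER BY p.category
""").fetchall()
```

The fact table holds only keys and numeric measures, while descriptive attributes such as category live in the dimension tables.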

Web Data Warehouses
In this section, we describe how click stream data can be incorporated into a traditional data warehouse using new dimensions and a click stream fact table.

Assessment Breakdown
Continuous Assessment: 25%
Examination Weight: 75%
Course Work Breakdown
Type | Description | % of total | Assessment Date
Group assignment | Create and prepare a dataset suitable for data mining algorithms. | 10% | Week 4
Assignment | Develop data mining algorithms to generate a result set. Analyse the results and write a critique of them. | 20% | Week 8
Reassessment Requirement Type
Resit arrangements are explained by the following categories:
Resit category 1: A resit is available for both* components of the module.
Resit category 2: No resit is available for a 100% continuous assessment module.
Resit category 3: No resit is available for the continuous assessment component where there is a continuous assessment and examination element.
* ‘Both’ is used in the context of the module having a Continuous Assessment/Examination split; where the module is 100% continuous assessment, there will also be a resit of the assessment.
This module is category 3.
Indicative Reading List

  • Jiawei Han: 2011, Data Mining: Concepts & Techniques, Morgan Kaufmann
  • Max Bramer: Principles of Data Mining, Springer
  • Ralph Kimball: The Data Warehouse Toolkit, Wiley
Other Resources

None
