Practical Report
Assignment 3 Written Practical Report
Modules 4–11 are particularly relevant for this assignment. Assignment 3 relates to the specific course learning objectives 1, 2, 3 and 4:
1. apply knowledge of people, markets, finances, technology and management in a global context of business intelligence practice (data warehousing and big data architecture, data mining process, data visualization and performance management) and resulting organizational change and understand how these apply to the implementation of business intelligence in organization systems and business processes
2. identify and solve complex organizational problems creatively and practically through the use of business intelligence and critically reflect on how evidence based decision making and sustainable business performance management can effectively address real-world problems
3. comprehend and address complex ethical dilemmas that arise from evidence based decision making and business performance management
4. communicate effectively in a clear and concise manner in written report style for senior Management with the correct and appropriate acknowledgment of the main ideas presented and discussed.
Note: you must use RapidMiner Studio for Task 2 and Tableau Desktop for Task 3 in this Assignment 3. Failure to do so may result in Task 2 and/or Task 3 not being marked and zero marks being awarded. Your Assignment 3 submission is automatically submitted to and checked in Turnitin for academic integrity when you submit it via the course study Assignment 3 submission link. Note carefully the University policy on Academic Misconduct such as plagiarism, collusion and cheating. If any of these occur, they will be found and dealt with under the USQ Academic Integrity Procedures. If proven, Academic Misconduct may result in failure of an individual assessment or the entire course, or exclusion from a University program or programs.
Assignment 3 consists of three main tasks and a number of sub tasks.
Task 1 (Worth 20 marks)
Task 1.1 Choose a large organisation located within Australia that is publicly listed on the Australian Securities Exchange (ASX) and is already actively engaged in the Information Age. Briefly describe your chosen organisation, include the URL of its corporate website, and explain why you have chosen this organisation for Task 1 (about 250 words).
Task 1.2 Conduct desktop research to analyse your chosen organisation in terms of the security and privacy policy statements available on its website. Provide the URLs of the security and privacy policy statements available online in your answer to Task 1.2, and then discuss how governance of privacy and security of data is addressed in this organisation, drawing on the nine core principles of the Australian Data Governance Draft Code of Practice to guide your analysis and discussion (about 1250 words):
1. No-harm rule
2. Honesty & transparency
3. Fairness
4. Choice
5. Accuracy and access
6. Accountability
7. Stewardship
8. Security
9. Enforcement
See "Your Data: balancing act of consumer protection and benefit of data innovation" and visit the Australian Data Governance web site http://datagovernanceaus.com.au/
Task 2 (Worth 35 Marks)
The goal of Task 2 is to predict whether a person has diabetes or not based on data collected on 768 female Pima Indians contained in the diabetes.csv data set provided for Assignment 3 Task 2 (see Table 2.1 below for the Data Dictionary for the diabetes.csv data set). It is important that you understand this data set in order to complete Task 2 and its four sub tasks.
Table 2.1 Data Dictionary for diabetes.csv
Variable Name | Data Type | Description
Pregnancies | Integer | Number of Times Pregnant - Gestational Diabetes - age 25+
Glucose | Integer | Plasma glucose concentration after 2 hours in an oral glucose tolerance test, normal when less than/equal to 110 mg/dL
Blood Pressure | Integer | Diastolic blood pressure (mm Hg): 60-80 mm normal
Skin Thickness | Integer | Triceps skin fold thickness (mm) used to determine body fat percent - Normal 23mm
Insulin | Integer | 2-Hour serum insulin (mu U/ml). Greater than 150 mu U/ml relates to insulin therapy
BMI | Real | BMI: Body mass index (weight in kg/(height in m)^2). Ideal Range between 18.5 and 24.9, less than 18.5 underweight, over 24.9 overweight – there is a link between obesity and diabetes
Diabetes Pedigree Function | Real | Diabetes Pedigree Function equates to History of diabetes in family: (a) 0.5 (50%) for parent, full sibling (b) 0.25 (25%) half sibling, grandparent, aunt, or uncle (c) 0.125 (12.5%) half aunt, half uncle, or first cousin
Age | Integer | Age in years
Outcome | Integer | Class variable (0 or 1) classification prediction of diabetes: 0 = False, 1 = True
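The BMI definition and ranges in the data dictionary can be sketched in a few lines of Python. This is only an illustration of the formula; the function names `bmi` and `bmi_category` and the example weight and height are assumptions, not part of the assignment materials.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kg divided by (height in m) squared."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value onto the ranges stated in the data dictionary."""
    if value < 18.5:
        return "underweight"
    if value <= 24.9:
        return "ideal"
    return "overweight"


# Hypothetical person, used only to exercise the formula.
example_bmi = bmi(70.0, 1.75)
example_category = bmi_category(example_bmi)
```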
Task 2.1 Conduct an exploratory data analysis (EDA) of the diabetes.csv data set using the RapidMiner Studio data mining tool.
Provide the following for Task 2.1:
(i) A screen capture of your final EDA process and a brief description of your final EDA process.
(ii) Summarise the key results of your exploratory data analysis in a table named Table 2.1 Results of Exploratory Data Analysis for Diabetes.csv
(iii) Discuss the key results of your exploratory data analysis and provide a rationale for selecting your top 5-6 variables for predicting diabetes as the outcome, based on the results of your exploratory data analysis and a review of the relevant literature on key factors contributing to the likelihood of developing diabetes (about 500 words).
Table 2.1 should include the key characteristics of each variable in the diabetes.csv data set, such as maximum and minimum values, average, standard deviation, most frequent values (mode), missing values and invalid values.
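The characteristics listed for Table 2.1 can be illustrated outside RapidMiner with a short Python sketch. It is not a substitute for the required RapidMiner process. The helper name `summarise`, the sample values, and the choice to treat a zero reading as invalid are all assumptions made for illustration; check any such rule against your own EDA.

```python
import statistics


def summarise(values, zero_is_invalid=False):
    """Return descriptive statistics for one numeric column.

    `values` may contain None for missing entries. When
    `zero_is_invalid` is True, zeros are counted as invalid readings
    (an assumption, e.g. for a measurement that cannot be zero).
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    invalid = sum(1 for v in present if v == 0) if zero_is_invalid else 0
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "std_dev": statistics.stdev(present),  # sample standard deviation
        "mode": statistics.mode(present),
        "missing": missing,
        "invalid": invalid,
    }


# Made-up values for illustration only; they are NOT taken from diabetes.csv.
glucose = [148, 85, 183, 0, 137, None, 85]
glucose_stats = summarise(glucose, zero_is_invalid=True)
```

Note that the minimum here is 0, which is exactly how an invalid reading tends to show up in a summary table.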
Hint: The Statistics tab and the Charts tab in RapidMiner provide a lot of descriptive statistical information and the ability to create useful charts such as bar charts and scatter plots for the EDA. You might also like to run some correlations and chi-square tests on the diabetes.csv data set to indicate which variables you consider to be the top 5-6 key variables that contribute most to predicting diabetes as an outcome.
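As a rough illustration of the correlation idea in the hint, the sketch below ranks candidate variables by the absolute Pearson correlation with the outcome. The function name `pearson` and the tiny sample columns are assumptions chosen for illustration; the values are made up and are not from diabetes.csv.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Made-up toy columns; NOT values from diabetes.csv.
columns = {
    "Glucose": [90, 100, 150, 160],
    "Age": [50, 25, 30, 45],
}
outcome = [0, 0, 1, 1]

# Rank variables from strongest to weakest absolute correlation.
ranking = sorted(
    columns,
    key=lambda name: abs(pearson(columns[name], outcome)),
    reverse=True,
)
```

Correlation with a 0/1 outcome is only one screening signal; the task also asks for support from the literature when selecting variables.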
Task 2.2 Build a Decision Tree model for predicting diabetes based on the diabetes.csv data set using RapidMiner, an appropriate set of data mining operators, and a reduced set of variables from diabetes.csv determined by your exploratory data analysis in Task 2.1.
Provide the following for Task 2.2:
(i) (1) Final Decision Tree Model process, (2) Final Decision Tree diagram, and (3) Decision Tree rules.
(ii) Briefly explain your final Decision Tree Model process, and discuss the results of the Final Decision Tree Model, drawing on the key outputs (Decision Tree diagram, Decision Tree rules) for predicting diabetes. This discussion should be based on the contribution of each of the top five variables to the Final Decision Tree Model and relevant supporting literature on the interpretation of decision trees (about 250 words).
Task 2.3 Build a Logistic Regression model for predicting diabetes based on the diabetes.csv data set using RapidMiner, an appropriate set of data mining operators, and a reduced set of variables determined by your exploratory data analysis in Task 2.1.
Provide the following for Task 2.3:
(i) (1) Final Logistic Regression Model process, (2) Coefficients, and (3) Odds Ratios. Hint: you will need to install the Weka Extension in RapidMiner and use the W-Logistic Regression Operator for this Task 2.3.
(ii) Briefly explain your final Logistic Regression Model process and discuss the results of the Final Logistic Regression Model, drawing on the key outputs (Coefficients, Odds Ratios) for predicting diabetes. This discussion should be based on the contribution of each of the top five variables to the Final Logistic Regression Model and relevant supporting literature on the interpretation of logistic regression models (about 250 words).
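When interpreting the Task 2.3 outputs, it may help to recall that an odds ratio is the exponential of the corresponding logistic regression coefficient. The sketch below shows that relationship, and how coefficients combine into a predicted probability. The function names and the coefficient values are hypothetical; they are not results from RapidMiner or from diabetes.csv.

```python
import math


def odds_ratio(coefficient: float) -> float:
    """Odds ratio for a one-unit increase in a predictor: exp(coefficient)."""
    return math.exp(coefficient)


def predicted_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(b_i * x_i))))."""
    z = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only.
example_coefficients = {"Glucose": 0.03, "BMI": 0.08}
example_odds_ratios = {
    name: odds_ratio(b) for name, b in example_coefficients.items()
}
```

An odds ratio above 1 means the odds of the positive outcome rise as the predictor increases, and a value below 1 means they fall.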
Task 2.4 Conduct a comparative performance evaluation of your Final Decision Tree Model and your Final Logistic Regression Model for predicting diabetes. Note: you will need to use the Cross Validation Operator, the Apply Model Operator and the Performance (Binominal Classification) Operator in your final data mining process models (Decision Tree, Logistic Regression) to generate the required model performance metrics (Accuracy, Misclassification Rate, True Positive Rate, False Positive Rate, Area Under the ROC Curve (AUC), Precision, Recall, Lift, Sensitivity, F Measure) for Task 2.4.
Provide the following for Task 2.4:
(i) A screen snapshot of the Confusion Matrix and AUC for each Final Model (Decision Tree, Logistic Regression).
(ii) A table named Table 2.2 Results of Model Performance Evaluation (Decision Tree, Logistic Regression) that compares the key results of the performance evaluation for the Final Decision Tree Model and Final Logistic Regression Model in terms of Model Accuracy, Misclassification Rate, True Positive Rate, False Positive Rate, Precision, Recall, Lift, Sensitivity and F Measure.
(iii) Discuss and compare the key results of your performance evaluation of the two final models (Decision Tree, Logistic Regression) presented in parts (i) and (ii) of Task 2.4, indicate which model is better and explain why (about 500 words).
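Most of the metrics named for Table 2.2 can be derived from the four cells of a confusion matrix, which the following sketch illustrates. The function name and the example counts are assumptions. Lift is computed here as precision divided by the share of actual positives, which is one common definition and may differ from what RapidMiner reports. True Positive Rate, Recall and Sensitivity are the same quantity.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive common binary classification metrics from confusion matrix counts.

    Assumes at least one actual positive, one actual negative and one
    predicted positive, so that no denominator is zero.
    """
    total = tp + fp + tn + fn
    recall = tp / (tp + fn)      # true positive rate / sensitivity
    precision = tp / (tp + fp)
    base_rate = (tp + fn) / total  # share of actual positives
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "true_positive_rate": recall,
        "false_positive_rate": fp / (fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "lift": precision / base_rate,  # one common definition (assumption)
    }


# Hypothetical counts, for illustration only.
example_metrics = classification_metrics(tp=40, fp=10, tn=40, fn=10)
```

AUC is not included because it depends on the ranked prediction scores rather than on a single confusion matrix.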
All important outputs from data mining analyses conducted using RapidMiner for Task 2 should be included in your Assignment 3 report to provide support for the conclusions reached regarding each analysis conducted for Task 2.1, Task 2.2, Task 2.3 and Task 2.4.
Note: export the important outputs from RapidMiner as jpg image files and include these screenshots in the relevant Task 2 sections and/or appendices of your Assignment 3 Report.
Note: you will find the Sharda et al. (2018) and North textbooks useful references for the data mining process activities conducted in Task 2 in relation to the exploratory data analysis, decision tree analysis, logistic regression analysis and evaluation of the comparative performance of the Final Decision Tree Model and the Final Logistic Regression Model.
Task 3 (Worth 30 marks)
The aviation-wildlife.xlsx file lists historical data recorded for the USA aviation industry regarding wildlife strikes with aircraft for the years 2000 to 2011. See Table 3.1, which provides the Data dictionary for the aviation-wildlife.csv Data set. It is important that you understand the variables in this data set in order to build the required Aircraft Wildlife Strikes (AWS) dashboard with four specified Tableau views.
Table 3.1 Data dictionary for aviation-wildlife.csv Data set
No. | Variable Name | Data Type | Description
1. | Aircraft:Type | Categorical | Aircraft, Helicopter
2. | Airport:Name | Categorical | Name of Airport
3. | Altitude-Bin | Categorical | < 1000 Metres, > 1000 Metres, Unknown
4. | Aircraft:Make/Model | Categorical | Make and Model of Aircraft
5. | Wildlife: Number struck | Categorical | Range of numbers
6. | Effect: Impact to flight | Categorical | None, Aborted Take-off, Engine Shut Down, Precautionary Landing, Other
7. | Effect: Other | Categorical | Text remarks recorded for flight
8. | Location: Nearby if en route | Categorical | State Abbreviation
9. | Aircraft: Flight Number | Real |
10. | FlightDate | Date | Date of Flight
11. | Record ID | Integer | Record ID – unique integer number
12. | Effect: Indicated Damage | Categorical | No Damage, Caused Damage
13. | Location: Freeform en route | Categorical | Text remark recorded for flight
14. | Aircraft: Number of engines? | Integer | 1, 2, 3 or 4
15. | Aircraft: Airline/Operator | Categorical | Airline Operator
16. | Origin State | Categorical | Flight Origin State
17. | When: Phase of flight | Categorical | Take-off run, Approach, Climb, En-route, Landing Roll
18. | Conditions: Precipitation | Categorical | Fog, None, Rain, Snow
19. | Remains of wildlife collected? | Categorical | False, True
20. | Remains of wildlife sent to Smithsonian | Categorical | False, True
21. | Remarks | Categorical | Text remarks recorded regarding aviation-wildlife collision
22. | Reported: Date | Date | Date aircraft collision with wildlife reported
23. | Wildlife:Size | Categorical | Small, Medium, Large
24. | Conditions: Sky | Categorical | No Cloud, Overcast, Some Cloud
25. | Wildlife: Species | Categorical | Different types of wildlife, mainly birds
26. | When: Time (HHMM) | Categorical | 24 hour format
27. | When: Time of day | Categorical | Dawn, Day, Night, Dusk
28. | Pilot warned of birds or wildlife? | Categorical | Y = Yes, N = No
29. | Cost: Aircraft time out of service (hours) | Integer |
30. | Cost: Other (inflation adj) | Integer |
31. | Cost: Repair (inflation adj) | Integer |
32. | Cost: Total $ | Integer |
33. | Miles from airport | Integer |
34. | Feet above ground | Integer |
35. | Number of human fatalities | Integer |
36. | Number of people injured | Integer |
37. | Speed (IAS) in knots | Integer |
Task 3 requires you to build a Tableau dashboard which includes four different views of the aviation-wildlife.csv data set for the years 2000-2011, as specified in sub Tasks 3.1, 3.2, 3.3 and 3.4.
Task 3.1 Create a Tableau View of the impact of wildlife strikes with aircraft over time for a specific origin state. Provide a screen capture of and describe the Tableau view you have created, and comment on the different types of impact to aircraft from wildlife strikes over time and whether this differs much for different origin states (about 125 words).
Task 3.2 Create a Tableau View of flight phase by time of day which shows when wildlife strikes with aircraft occur. Provide a screen capture of and describe the Tableau view you have created, and comment on which phase of a flight and time of day wildlife strikes with aircraft are more likely to occur (about 125 words).
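The view in Task 3.2 is essentially a cross-tabulation of "When: Phase of flight" against "When: Time of day". The Python sketch below illustrates that counting logic only; the view itself must still be built in Tableau. The function name `cross_tab` and the sample records are made up for illustration and are not taken from the data set.

```python
from collections import Counter


def cross_tab(records):
    """Count strikes for each (phase of flight, time of day) pair."""
    return Counter((phase, time_of_day) for phase, time_of_day in records)


# Made-up records; NOT taken from aviation-wildlife.csv.
sample_records = [
    ("Approach", "Day"),
    ("Approach", "Day"),
    ("Approach", "Night"),
    ("Climb", "Day"),
    ("Landing Roll", "Dusk"),
]

strike_counts = cross_tab(sample_records)

# The cell with the highest count is the most common combination.
busiest_cell, busiest_count = strike_counts.most_common(1)[0]
```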
Task 3.3 Create a Tableau View that compares wildlife species in order of aircraft strike frequency and the chance of damage occurring. Provide a screen capture of the Tableau view you have created and comment on which wildlife species are most frequently involved in aircraft strikes and which wildlife species are most likely to have the most impact in terms of damage (total cost) when an aircraft strike occurs (about 125 words).
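The comparison asked for in Task 3.3 combines three aggregates per species: number of strikes, share of strikes that caused damage, and total cost. The sketch below illustrates that aggregation under stated assumptions: records are simplified to (species, indicated damage, total cost) tuples, the species names are placeholders, and the values are made up rather than taken from the data set.

```python
from collections import defaultdict


def species_summary(records):
    """Summarise strikes per species, ordered by strike frequency.

    Each record is (species, indicated_damage, total_cost), where
    indicated_damage is "No Damage" or "Caused Damage" as listed in
    the data dictionary.
    """
    totals = defaultdict(lambda: {"strikes": 0, "damaged": 0, "total_cost": 0})
    for species, indicated_damage, total_cost in records:
        row = totals[species]
        row["strikes"] += 1
        if indicated_damage == "Caused Damage":
            row["damaged"] += 1
        row["total_cost"] += total_cost

    summary = [
        {
            "species": species,
            "strikes": row["strikes"],
            "damage_rate": row["damaged"] / row["strikes"],
            "total_cost": row["total_cost"],
        }
        for species, row in totals.items()
    ]
    summary.sort(key=lambda item: item["strikes"], reverse=True)
    return summary


# Placeholder species and made-up values, for illustration only.
sample_strikes = [
    ("Species A", "No Damage", 0),
    ("Species A", "No Damage", 0),
    ("Species A", "Caused Damage", 1000),
    ("Species B", "Caused Damage", 50000),
]
summary = species_summary(sample_strikes)
```

In this toy data the most frequently struck species is not the most costly one, which is the kind of contrast the task asks you to comment on.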
Task 3.4 Create a Tableau GeoMap View of flights by origin state that displays the number of wildlife strikes and total monetary cost for each origin state for different periods of time. Provide a screen capture of and describe the Tableau view you have created, and comment on this Tableau GeoMap View in relation to the number of wildlife strikes by origin state and total monetary cost over time. A number of origin states cannot be plotted on the GeoMap view as they are outside the USA; comment on how you can deal with this issue (about 125 words).
Note: you need to copy the four Text Table / Graph views and the dashboard you have created in Tableau using the Worksheet menu Copy or Export Image option and include them in the Task 3 section where relevant or in Appendix 3 of your Assignment 3 report.
Task 3.5 Provide a screen snapshot of your AWS Dashboard and an accompanying rationale (drawing on the relevant literature for good dashboard design) for the graphic design and functionality provided by your AWS Dashboard for the four specified Tableau views for sub Tasks 3.1, 3.2, 3.3 and 3.4 (about 500 words).
Note: Stephen Few is considered to be the guru of good dashboard design and has written a number of books on this topic. It is worth having a look at his website https://www.perceptualedge.com/about.php and in particular his examples of poorly designed dashboard views and his suggestions for better dashboard views.
Presentation: Cover page, table of contents, page numbers, headings, sub headings, tables and diagrams, use of formatting, spacing, paragraphs
Writing style: Use of English (correct use of language and grammar; is there evidence of spell-checking and proofreading?)
Quality of research evident by appropriate referencing: Appropriate level of referencing in text where required for a sub task, reference list provided, Harvard Referencing Style used correctly
Assignment 3 Report should be structured as follows:
Assignment 3 Cover page
Table of Contents
Task 1 Main Heading
Task 1 Sub Tasks – Sub headings for Tasks 1.1 and 1.2
Task 2 Main Heading
Task 2 Sub Tasks – Sub headings for Tasks 2.1, 2.2, 2.3 and 2.4
Task 3 Main Heading
Task 3 Sub Tasks – Sub headings for Tasks 3.1, 3.2, 3.3, 3.4 and 3.5
List of References
List of Appendices
You must submit two files for Assignment 3:
1. Assignment 3 Report for Tasks 1, 2 and 3 in Word document format with the extension .docx
2. Tableau packaged workbook with the extension .twbx, which must contain the required four Text Table / Graph views and a dashboard which consolidates these four Tableau views for Task 3
You must use the following file naming convention:
1. Studentno-Studentname-CIS8008Ass3.docx
2. Studentno-Studentname-CIS8008Ass3.twbx
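The naming convention can be checked with a small helper such as the one below. The function name and the sample student number and name are hypothetical, and the sketch assumes the name is written without spaces, which the convention itself does not specify.

```python
def submission_file_names(student_no: str, student_name: str) -> list:
    """Build the two required file names from the stated convention."""
    stem = f"{student_no}-{student_name}-CIS8008Ass3"
    return [f"{stem}.docx", f"{stem}.twbx"]


# Hypothetical student details, for illustration only.
example_names = submission_file_names("0061234567", "JaneCitizen")
```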
You must use Harvard referencing style – Harvard referencing resources:
Install a bibliography referencing tool such as EndNote, which integrates with your word processor (http://www.usq.edu.au/library/referencing/endnote-bibliographic-software), or alternatively use an online citation tool such as Zotero or Cite This For Me.
USQ Library - how to reference correctly using the Harvard referencing system: https://www.usq.edu.au/library/referencing/harvard-agps-referencing-guide