ST1051 - Introduction to Probability and Statistics ST3905

ST1051-ST3905-ST5005-ST6030
ST1051 - Introduction to Probability and Statistics
ST3905 - Applied Probability and Statistics
ST5005 - Introduction to Probability and Statistics
ST6030 - Foundations of Statistical Data Analytics
Maeve McGillycuddy
Department of Statistics
School of Mathematical Sciences
University College Cork, Ireland
2016-2017
Course information
Tentative timetable (TBC with Maeve)
This module is taught in Period 1
Lectures:
Mondays 3-4pm in BHSC G01
Fridays 3-4pm in WGB G05
Tutorials:
Fridays 1-2pm in WGB G03
Practicals:
ST1051
Monday 4-5pm in lab WGB G34 (TBC)
Tuesday 3-4pm in lab WGB G34 (TBC)
Alternative slots for tutorials:
Monday 5pm, Wednesday 12pm, Friday 4pm (WGB G05)
Assessment (to be confirmed with Maeve)
ST1051/ST3905:
2 home assignments (10 + 10 marks)
+ 90-minute exam (80 marks)
ST5005/ST6030:
3 home assignments (10 + 10 + 30 marks)
+ 90-minute exam (50 marks)
Module objective
To provide an understanding of fundamental notions of Probability
and Statistics, and explore basic probability and statistical notions
underlying hypothesis-driven data analytic methods.
Outline
1. Motivation
2. Elements of Probability Theory
3. Discrete Random Variables
4. Continuous Random Variables
5. Limit Theorems
6. Statistical Inference
7. Estimation
8. Hypothesis Testing
References
[1] J. A. Rice, Mathematical Statistics and Data Analysis, 2nd Edition, ITP Duxbury Press, 1995.
[2] J. L. Devore, Probability and Statistics for Engineering and the Sciences, 3rd Edition, Brooks-Cole, 1991.
[3] J. D. Gibbons and S. Chakraborti, Nonparametric Statistical Inference, 4th Edition, Dekker, 2014.
[4] B. S. Everitt and T. Hothorn, A Handbook of Statistical Analyses Using R, 2nd Edition, Chapman & Hall, 2010.
[5] M. J. Crawley, Statistics: An Introduction Using R, Wiley, 2005.
[6] F. M. Dekking, C. Kraaikamp, H. P. Lopuhaä and L. E. Meester, A Modern Introduction to Probability and Statistics, Springer, 2005.
[7] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2014. URL http://www.R-project.org/.
Section I
Motivation
General concepts
Probability? Statistics?
Focus on random or unpredictable phenomena
Goal is usually to understand, represent, describe or predict
Probability theory aims to describe reality: it is a mathematical
framework for representing real-life phenomena
Statistics aims to provide models and techniques for analysing
observations: it is a data-driven approach
The central feature is always the information (data).
Statistics consists of the collection and analysis of data.
Probability theory provides a mathematical foundation for
statistics.
Examples
Typical examples
Business, financial mathematics and actuarial science:
decision making, investment strategies
trading (high-probability trading, return plans, strategies, ...)
insurance / pensions (premium pricing, risk assessment, ...)
Engineering:
tracking mobile terminals in wireless networks
image and video processing
Medical and biostatistics:
clinical trials
diagnostic and prognostic analyses
genomics
Why probability and statistics: space shuttle Challenger
[Dekking et al 2005]
On 28th January 1986, the space shuttle Challenger exploded
about one minute after it had taken off from the launch pad
at Kennedy Space Center in Florida
Root cause of the disaster: failure of O-rings (the seals in the
joints between segments of the rocket boosters)
Apparently, a “management decision” was made to overrule
the engineers’ recommendation not to launch
The Challenger launch was the 25th of the space shuttle
program, and we can look at the data on the number of failed
O-rings, available from the 24 previous launches
Each rocket booster has three O-rings, and two rocket boosters
are used per launch, giving six O-rings per launch
Because low temperatures are known to adversely affect the
O-rings, we also look at the corresponding launch temperature
Figure: number of failed O-rings per mission, plotted against launch temperature
There are 23 dots, one per earlier launch; on one occasion the
boosters could not be recovered from the ocean. Temperatures are
rounded to the nearest degree Fahrenheit, and where two or more
data points coincide they are shifted slightly.
Modelling...
The probability p(t) that an individual O-ring fails should depend
on the launch temperature t. Model the number of failures among the
six O-rings as Binomial with parameters 6 and p(t), use the data to
calibrate p(t), and estimate the expected number of failures, 6p(t).
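A minimal sketch of this calculation in Python. It assumes a logistic form p(t) = exp(a + b t) / (1 + exp(a + b t)), which the slides do not specify. The coefficients a and b are illustrative placeholders, not values fitted to the launch data shown here, and the function names are hypothetical.

```python
import math

def failure_probability(t, a, b):
    """Assumed logistic model: probability that one O-ring fails at temperature t (deg F)."""
    return 1.0 / (1.0 + math.exp(-(a + b * t)))

def expected_failures(t, a, b, n_rings=6):
    """Mean of a Binomial(n_rings, p(t)) count, i.e. n_rings * p(t)."""
    return n_rings * failure_probability(t, a, b)

# Illustrative placeholder coefficients (assumptions, not taken from the slides).
A, B = 5.085, -0.1156

# Expected number of failed O-rings at a few example launch temperatures.
for temp in (31, 53, 70, 81):
    print(temp, round(expected_failures(temp, A, B), 2))
```

With a negative b, the expected number of failures rises as the launch temperature falls, which is the qualitative behaviour the model is meant to capture.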
Aftermath...
Combining the estimated O-ring failure probabilities with the
estimated probabilities of the other events needed for a complete
failure of a joint, the estimated probability of failure of a
single joint is 0.023...
With six field joints, assumed to fail independently, the probability
of at least one complete failure is 1 − (1 − 0.023)^6 ≈ 0.13
Would you hop on the shuttle?
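The arithmetic above can be checked with a few lines of Python. This sketch assumes the six joints fail independently, each with the same probability 0.023; the function name is hypothetical.

```python
def prob_at_least_one_failure(p_joint, n_joints):
    """P(at least one failure) = 1 - P(no failure), assuming independent joints."""
    return 1.0 - (1.0 - p_joint) ** n_joints

# Six field joints, each with estimated failure probability 0.023.
print(round(prob_at_least_one_failure(0.023, 6), 2))
```

Working with the complement (no joint fails) avoids summing the probabilities of one, two, ..., six failures separately.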