# STAT2011 Lecture Notes


### Contents

- Table of discrete distributions
- Introduction
- Mathematical theory of probability: sample space, sample points, events
- Probability as a measure
- Sigma algebra; example of a sigma algebra
- Probability space; e.g. a model of a fair die
- Venn diagrams
- De Morgan's laws (for set theory)
- Equiprobable spaces; claim: P(A) = |A|/|Ω|
- Conditional probability
- Independence
- Law of total probability; distributive law for set theory
- Bayes' rule
- Random variables; discrete RVs; the probability mass function of a discrete RV; examples of random variables
- Joint probability mass functions; independence
- Poisson distribution; assumptions of the Poisson
- Distribution of a sum of random variables; convolution
- Expectation: definition; Bernoulli, Poisson, binomial, geometric, negative binomial and hypergeometric examples; expectation of a function of a RV
- Variance: definition; computation via expectations; examples
- Table of discrete distributions
- Expectation as a linear operator; expected value of X + Y; random vectors; expectation of a product
- L1 and L2; example of X ∈ L1 but not in L2
- V(X + Y); shifting V(X)
- Covariance; variance of a sum of X_i's
- Chebyshev's inequality; lemma: Markov's inequality
- Law of large numbers; convergence in probability; weak and strong laws of large numbers
- The multinomial distribution; pmf of a multinomial random vector
- Estimation: the method of moments
- Maximum likelihood estimation (MLE); the likelihood function
- Hardy-Weinberg equilibrium; comparing different estimators
- Delta method
- Parametric bootstrap method: binomial and Hardy-Weinberg examples
- Conditional expectation/variance; random sums
- Conditional expectation (the random variable); total expectation law; conditional variance
- Continuous random variables; the CDF (cumulative distribution function); the PDF
- Examples of PDF/CDF distributions: uniform, exponential, gamma, normal
- Quantiles; the quantile function
- Functions of random variables for continuous RVs; sampling
- Joint distributions (continuous analogue of the joint pmf); joint density; marginal distributions; gamma/exponential, uniform-on-a-region and bivariate normal examples
- Independent RVs (continuous)
- Conditional distributions; conditional density; sampling in 2D
- Sums of random variables (convolution); quotients of RVs
- Functions of jointly continuous RVs; the Jacobian; Box-Muller sampling
- Visual comparison of distributions; QQ plots; empirical data
- Extrema and order statistics; expectation of order statistics
- Expectation of a continuous RV; expectation of a function of a continuous RV
- Variance for continuous RVs; covariance; standard deviation; correlation coefficient; standard normal, general normal N(μ, σ²), gamma and bivariate standard normal examples
- Markov's and Chebyshev's inequalities (continuous); weak and strong laws of large numbers
- Estimation for continuous RVs: method of moments; maximum likelihood estimation (MLE)
- Conditional expectation (continuous)
- Mixed distributions; expectation and variance of random sums; law of total probability
- Prediction; predictors
- Moment generating function (MGF)
- Weak convergence/convergence in distribution; characteristic function (CF)
- Confidence intervals; constructing confidence intervals; approximate CIs based on the MLE; CIs for Bernoulli θ

### Table of discrete distributions

| Distribution | Model | pmf | E(X) | V(X) |
|---|---|---|---|---|
| Bernoulli(p) | Success or failure, with success probability p. Eg: coin toss | P(X=1) = p, P(X=0) = 1 − p | p | p(1 − p) |
| Geometric(p) | Number of Bernoulli(p) trials till the first success. Eg: coin tosses till the first head | (1 − p)^{k−1} p | 1/p | (1 − p)/p² |
| Binomial(n, p) | n Bernoulli trials. Eg: n coin tosses | C(n, k) p^k (1 − p)^{n−k} | np | np(1 − p) |
| Poisson(λ) | Number of events in a certain time interval. Eg: number of radioactive particles emitted in a certain time | e^{−λ} λ^k / k! | λ | λ |
| Hypergeometric(r, n, m) | Number of red balls in a sample of m balls drawn without replacement from an urn with r red and n − r black balls | C(r, k) C(n − r, m − k) / C(n, m) | mr/n | (mr/n) · ((n − r)/n) · ((n − m)/(n − 1)) |
| Negative binomial(r, p) | Number of Bernoulli trials till the r-th success | C(k − 1, r − 1) (1 − p)^{k−r} p^r | r/p | r(1 − p)/p² |

(C(n, k) = n!/(k!(n − k)!) denotes the binomial coefficient.)

Uri Keich; Carslaw 821; Monday 5-6

### Introduction

- Probability in general, and this course in particular, has 2 components:

o Mathematical theory of probability

 Definitions, theorems and proofs

 Abstraction of experiments whose outcome is random

o Modelling: argued rather than proved; applications can be confusing as well as ill defined

 Useful for describing/summarising the data and making accurate predictions

### Sample space

The set of all possible outcomes, denoted Ω, is the sample space

### Sample point

A point 𝜔 ∈ Ω is a sample point

Examples:

Die:

Ω = {1,2,3,4,5,6}

Coin:

Ω = {𝐻, 𝑇}

- Complication: do we model the possibility that the coin can land on its side?

### Events:

- Events are subsets of Ω, for which we can assign a probability.

- We say an event 𝐴 occurred if the outcome, or sample point 𝜔 ∈ Ω satisfies 𝜔 ∈ 𝐴


### Probability as a measure

𝑃(𝐴) is the probability of the event 𝐴, which intuitively is the rate at which 𝐴 occurs if we repeat the experiment many times.

Mathematically: 𝑃 is a probability measure function if:

1. 𝑃(Ω) = 1

2. 𝑃(𝐴) ≥ 0 for any event 𝐴

3. If A₁, A₂, … are mutually disjoint (Aᵢ ∩ Aⱼ = ∅ for all i ≠ j) then P(⋃_{n=1}^{∞} Aₙ) = Σ_{n=1}^{∞} P(Aₙ). For this to make sense, we need to know that a union of a sequence of events is ALSO an event.

- Why do we bother with determining events? Why can’t any subset of Ω be an event?

o Imagine choosing a point at random in a large cube C ⊂ ℝ³, where the probability of the point lying in any set A ⊂ C is proportional to its volume |A|

o Clearly, if A, B ⊂ C are related through a rigid motion (translation + rotation), then |A| = |B|, so their probabilities are the same

o Similarly, if A ⊂ C can be split into a disjoint union A₁ ∪̇ A₂, then P(A) = P(A₁) + P(A₂)

o Banach-Tarski paradox:

 The unit ball B in ℝ³ can be decomposed into 5 pieces B₁, …, B₅ which can be reassembled, using only rigid motions, into two balls of the same size as B.

 But then P(B) = P(B₁) + ⋯ + P(B₅), while the two reassembled balls together have probability P(B₁) + ⋯ + P(B₅) = 2P(B), a contradiction.

 This is why we cannot assign probabilities to EVERY subset of Ω. Probabilities can't be assigned to arbitrary sets, rather to sets which are "measurable".

The collection of measurable sets is captured by the notion of 𝜎 algebra (𝜎 −field)

### Sigma algebra

Definition:

A collection F of subsets of Ω is a σ-algebra if:

1. Ω ∈ F

2. A ∈ F ⟹ Aᶜ ∈ F

3. A₁, A₂, … ∈ F ⟹ ⋃_{n=1}^{∞} Aₙ ∈ F

Property 2 says that F is closed under complementation; property 3 says that F is closed under countable unions.

- (countable means the elements can be enumerated) A = {1, 3, 5} is a finite countable set; ℕ, ℤ and ℚ are infinite countable sets; ℝ is not countable

### Example of sigma algebra

Ω = {1, 2, …, 6}

F = the power set of Ω = the set of all subsets of Ω = {∅, {all singletons}, {all pairs}, …, Ω}, denoted 2^Ω

Questions (with answers):

- What is the cardinality of F, i.e. how many subsets of Ω does it contain? 2^|Ω|
- Is 2^Ω ALWAYS a σ-algebra? Yes
- What is the smallest σ-algebra, regardless of the sample space? {∅, Ω}
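For a finite Ω, the σ-algebra axioms can be verified for the power set by brute force. A minimal sketch (the helper name `power_set` is my own):

```python
from itertools import chain, combinations

def power_set(omega):
    """All subsets of omega, as frozensets."""
    s = list(omega)
    subsets = chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
    return {frozenset(c) for c in subsets}

omega = frozenset({1, 2, 3})
F = power_set(omega)

assert omega in F                             # axiom 1: Omega itself is in F
assert all(omega - A in F for A in F)         # axiom 2: closed under complements
assert all(A | B in F for A in F for B in F)  # axiom 3 (finite case): closed under unions
print(len(F))  # 2^|Omega| = 8 subsets
```

Since F is finite here, checking pairwise unions suffices for closure under countable unions.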

### Probability space:

A probability space consists of:

1. A sample space Ω

2. A σ-algebra F of subsets of Ω

3. A probability measure P: F → ℝ

Eg: model of a fair die

Ω = {1, 2, …, 6}; F = 2^Ω

P({i}) = 1/6 for i ∈ {1, …, 6}

P(A) = |A|/6 (for A ∈ F)

### Venn diagrams:

Useful to visualise relations between sets. They help plan proofs, but don't use them as PROOFS in this course.


### De Morgan's laws (for set theory):

(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ

(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

Eg:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

- Why are A ∪ B and A ∩ B ∈ F? (why are they events) F is closed under complements and countable unions, and by De Morgan A ∩ B = (Aᶜ ∪ Bᶜ)ᶜ.

A ∪ B = A ∪̇ (B \ A), where ∪̇ denotes a disjoint union and B \ A = B ∩ Aᶜ ∈ F

∴ P(A ∪ B) = P(A) + P(B \ A)

(using countable additivity P(⋃_{n=1}^{∞} Aₙ) = Σ_{n=1}^{∞} P(Aₙ) with A, B \ A and infinitely many copies of ∅)

(the probability of the empty set is 0: taking every Aₙ = ∅ gives P(∅) = Σ_{n=1}^{∞} P(∅), which forces P(∅) = 0)

Also P(B) = P(B \ A) + P(B ∩ A), so P(B \ A) = P(B) − P(A ∩ B)

→ P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

### Equiprobable spaces

- An equiprobable space consists of a finite sample space such that ∀ω ∈ Ω, P({ω}) = c > 0 (all outcomes equally likely)

Note: ω ∈ Ω is a sample point; {ω} is an event in F

- Why does the sample space have to be finite?

o Otherwise, take distinct points {ωₙ}_{n=1}^{∞} ⊂ Ω; then P(⋃ₙ {ωₙ}) = Σₙ P({ωₙ}) = Σ_{n=1}^{∞} c = ∞, but ∀A ∈ F, P(A) ∈ [0, 1] (why? → show)

Claim: P(A) = |A|/|Ω|

Proof:

A = ⋃_{ω∈A} {ω}

∴ P(A) = P(⋃_{ω∈A} {ω}) = Σ_{ω∈A} P({ω}) = |A| × c

Take A = Ω: P(Ω) = 1 (by definition) = |Ω|c → c = 1/|Ω|

→ P(A) = |A|/|Ω|

This means that (equiprobable) probability is just combinatorics‼

Examples of probability as combinatorics:

1. A fair die is rolled: P({ω}) = 1/6

2. A group of n people meet at a party; what is the probability that at least 2 of them share a birthday?

o Model: no leap years, no association between birthdays

Ω = {(i₁, …, iₙ) | iₖ ∈ {1, …, 365}} → |Ω| = 365ⁿ, so 1/|Ω| = 365⁻ⁿ

A = {(i₁, …, iₙ) ∈ Ω | ∃ j ≠ k, iⱼ = iₖ}

Let's rather compute the complement:

Aᶜ = {(i₁, …, iₙ) ∈ Ω | ∀ j ≠ k, iⱼ ≠ iₖ}

P(Aᶜ) = |Aᶜ|/|Ω| = (365 × 364 × ⋯ × (365 − n + 1))/365ⁿ = ∏_{i=1}^{n} (365 − (i − 1))/365 = ∏_{i=1}^{n} (1 − (i − 1)/365)   (for n ≤ 365)

P(A) = 1 − ∏_{i=1}^{n} (1 − (i − 1)/365)

Sidenote: for n = 23, P(A) ≈ 1/2

3. What is the probability that one of the guests shares YOUR birthday?

B = {(i₁, …, iₙ) ∈ Ω | ∃ j, iⱼ = x}   (x = your birthday)

∴ Bᶜ = {(i₁, …, iₙ) ∈ Ω | ∀ j, iⱼ ≠ x}, so |Bᶜ| = 364ⁿ

→ P(Bᶜ) = 364ⁿ/365ⁿ = (1 − 1/365)ⁿ ≈ e^{−n/365}

Sidenote: for n = 253, P(B) ≈ 1/2
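Both birthday probabilities are easy to evaluate exactly; a quick sketch:

```python
def p_shared_birthday(n):
    """P(at least two of n people share a birthday) = 1 - prod_{i=1}^{n} (1 - (i-1)/365)."""
    p_all_distinct = 1.0
    for i in range(1, n + 1):
        p_all_distinct *= 1 - (i - 1) / 365
    return 1 - p_all_distinct

def p_shares_yours(n):
    """P(at least one of n guests has your birthday) = 1 - (364/365)^n."""
    return 1 - (364 / 365) ** n

print(round(p_shared_birthday(23), 3))  # just over 1/2 at n = 23
print(round(p_shares_yours(253), 3))    # about 1/2 at n = 253
```

Note how much larger n must be for the "your birthday" event: any-pair matches grow roughly like n², while matches with a fixed date grow like n.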

### Conditional probability:

If A, B ∈ F and P(B) > 0 we can define the probability of A given B:

P(A|B) = P(A ∩ B)/P(B)

Example:

Ω = {1, …, 6} (a fair die); A = {1, 2}; B = {1, 3, 5}

P(A|B) = P({1})/P(B) = (1/6)/(3/6) = 1/3

### Independence

Event A ∈ F is independent (ind.) of B ∈ F if knowing whether or not A occurred gives no information on whether or not B occurred:

P(B|A) = P(B)

⟺ P(B ∩ A)/P(A) = P(B)

⟺ P(B ∩ A) = P(A)P(B)

- The product form of the definition is more robust: it is symmetric in A and B, needs no assumption that P(A) > 0, and is easier to generalise.

In general, A₁, …, Aₙ are (mutually) independent if for every subcollection A_{i₁}, …, A_{iₖ}:

P(A_{i₁} ∩ ⋯ ∩ A_{iₖ}) = ∏_{j=1}^{k} P(A_{iⱼ})

- stronger than pairwise independence‼

NOTE: independent is NOT the same as disjoint (disjointness gives you total knowledge: if one event occurred, the other did not)

### Law of total probability

- The probability of an event may be computed by summing over all eventualities

Example: 3 machines

| | A | B | C |
|---|---|---|---|
| Production rate | .5 | .2 | .3 |
| Failure rate | .01 | .02 | .005 |

P(failed product) = .5(.01) + .2(.02) + .3(.005) = .0105

Generally: if Bⱼ ∈ F form a partition of Ω, i.e. Ω = ⋃̇ⱼ Bⱼ, then

P(A) = P(A ∩ Ω) = P(A ∩ (⋃̇ⱼ Bⱼ)) = P(⋃̇ⱼ (A ∩ Bⱼ)) = Σⱼ P(A ∩ Bⱼ) = Σⱼ P(Bⱼ)P(A|Bⱼ)

### Distributive law for set theory:

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

### Bayes' rule:

Diagnostic: which of the events Bⱼ triggered the event A? (e.g. which machine caused the failed product?)

P(Bⱼ|A) = P(A ∩ Bⱼ)/P(A) = P(A|Bⱼ)P(Bⱼ) / Σᵢ P(A|Bᵢ)P(Bᵢ)

(using the definition of conditional probability and the law of total probability)
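The machine example can be run through both formulas. A minimal sketch, assuming production rates of .5/.2/.3 (rates must sum to 1; all numbers here are the illustrative ones from the example):

```python
prior = {'A': 0.5, 'B': 0.2, 'C': 0.3}     # P(machine), assumed to sum to 1
fail = {'A': 0.01, 'B': 0.02, 'C': 0.005}  # P(failed | machine)

# Law of total probability: P(failed) = sum_j P(B_j) P(failed | B_j)
p_fail = sum(prior[m] * fail[m] for m in prior)

# Bayes' rule: P(machine | failed), the diagnostic question
posterior = {m: prior[m] * fail[m] / p_fail for m in prior}

print(p_fail)     # ~0.0105
print(posterior)  # machine A is the most likely culprit
```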

### Random Variables

A RV is a measurable function 𝑋: Ω → ℝ

### Discrete RV

A RV is discrete if its range

X(Ω) = {X(ω): ω ∈ Ω} ⊂ ℝ

is a countable set (finite or infinite).

Probability mass function of Discrete RV:

The pmf of X is defined as:

p_X(x) = P(X = x) = P({ω: X(ω) = x})   (for x ∈ ℝ)

- Why is {ω: X(ω) = x} ∈ F?

Properties of pmf

Claim:

1. ∀x ∈ ℝ \ X(Ω), p_X(x) = 0 (the pmf of an unattainable value is 0)

2. With {xᵢ} = X(Ω), Σᵢ p_X(xᵢ) = 1 (the pmf sums to 1 over all attainable values)

The distribution of a discrete RV is completely specified by its pmf. Indeed, ∀A ⊂ ℝ:

P(X ∈ A) = Σ_{i: xᵢ ∈ A} p_X(xᵢ)

- We can thus specify the distribution of a RV X by specifying its pmf p_X

Question:

Can any function p: ℝ → [0, 1] with countable support {x: p(x) > 0}, such that Σ_{i: p(xᵢ)>0} p(xᵢ) = 1, be the pmf of some random variable?

Claim: if p is as above, then there exist a probability space (Ω, F, P) and a RV X: Ω → ℝ such that p_X = p.

Proof: take Ω = {x: p(x) > 0}; F = 2^Ω; P(A) = Σ_{x∈A} p(x); X(ω) = ω.

Examples of random variables

Bernoulli random variable

Defined by the pmf:

p(x) = 1 − p for x = 0 (failure); p for x = 1 (success)

Eg: Ω = {H, T} with P(H) = p, and X(ω) = 1 if ω = H, 0 if ω = T

Binomial random variable

Sₙ models the number of successes in n iid (independent and identically distributed) Bernoulli trials. A 2-parameter family of distributions: (n, p).

Note: if Xᵢ = 1{success of trial i} (1 if success, 0 if failure), then Xᵢ ∼ Bernoulli(p) and Sₙ = Σ_{i=1}^{n} Xᵢ.

Generally, for an event A, the random variable 1_A is the indicator function of A:

1_A(ω) = 1 if ω ∈ A, 0 if ω ∉ A

Sₙ(Ω) = {0, 1, 2, …, n}

For k ∈ Sₙ(Ω), P(Sₙ = k) is found as follows:

- Consider one configuration of the Bernoulli trials with k successes: s, s, …, s, f, f, …, f
- Its probability is p^k(1 − p)^{n−k}, and there are C(n, k) such configurations

Binomial RV pmf:

∴ P(Sₙ = k) = C(n, k) p^k (1 − p)^{n−k}

Geometric random variable:

Models the number of iid Bernoulli(p) trials it takes till the first success

X(Ω) = {1, 2, 3, …} ∪ {∞}

A 1-parameter family: p

Pmf of geometric random variable:

P(X = k) = (1 − p)^{k−1} p

P(X > k) = (1 − p)^k (the first k trials all fail)

P(X > n + k | X > k) = P(X > n + k, X > k)/P(X > k)   (conditional probability)

= P(X > n + k)/P(X > k) = (1 − p)^{n+k}/(1 − p)^k = (1 − p)^n = P(X > n)

This property is called "memorylessness" (the distribution does not remember what has happened before).

- The geometric is the ONLY discrete memoryless distribution

If Xᵢ = 1{success of trial i}, then X = min{i: Xᵢ = 1}
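The memoryless property can be sanity-checked numerically from the tail formula P(X > k) = (1 − p)^k; a quick sketch:

```python
p = 0.3  # any success probability in (0, 1)

def tail(k):
    """P(X > k) for X ~ Geometric(p): the first k trials must all fail."""
    return (1 - p) ** k

# P(X > n + k | X > k) = P(X > n + k) / P(X > k) should equal P(X > n)
for n, k in [(1, 1), (4, 6), (10, 3)]:
    cond = tail(n + k) / tail(k)
    assert abs(cond - tail(n)) < 1e-12
print("memoryless property holds")
```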

Negative binomial RV

Models the number of iid Bernoulli trials till the r-th success, where r ∈ ℕ

- A 2-parameter distribution: (r, p)

- Note: for r = 1 this is a geometric random variable

X_r(Ω) = {r, r + 1, …} ∪ {∞}

Pmf: for k ∈ X_r(Ω),

- Consider a configuration of Bernoulli trials ending in the r-th success, with r − 1 successes among the first k − 1 trials

P(X_r = k) = C(k − 1, r − 1) (1 − p)^{k−r} p^r

In terms of the trial indicators: X_r = min{m: Σ_{i=1}^{m} Xᵢ = r}

Hypergeometric RV

X models the number of red balls in a sample of m balls drawn without replacement from an urn with r red balls and n − r black balls

X(Ω) ⊂ {0, 1, …, r}

pmf: for k ∈ X(Ω),

P(X = k) = C(r, k) C(n − r, m − k) / C(n, m)

A 3-parameter family: (r, n, m)

(not phrased in terms of iid Bernoulli RVs)

- A famous application is Fisher's exact test

Fisher's exact test:

30 convicted criminals each had a same-sex twin; 13 of the twins were identical and 17 were non-identical.

- Is there evidence of a genetic link?

The data:

| | Twin convicted | Twin not convicted | Total |
|---|---|---|---|
| Identical | 10 | 3 | 13 |
| Non-identical | 2 | 15 | 17 |
| Total | 12 | 18 | 30 |

- Assuming that whether or not the twin of the criminal is also convicted does not depend on the biological type of the twin, we have a sample from a hypergeometric distribution. Indeed: there are 13 red (monozygotic) balls and 17 black (dizygotic) balls. We randomly sample 12 balls (the convicted twins); what is the probability of seeing 10 or more red balls in the sample?

Let X ∼ Hypergeometric(r = 13, n = 30, m = 12)

P(X ≥ 10) = Σ_{k=10}^{12} C(13, k) C(17, 12 − k) / C(30, 12) ≈ 0.000465

So it seems very unlikely that there is no relation between the conviction of the twin and its type, but:

- Did we establish that a criminal mind is inherited? There are problems with this statement:

o Conviction vs truth ("that face is up to no good??")

o Sampling/ascertainment bias

o Identical twins may simply have a tighter connection
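The tail probability above can be computed directly from the hypergeometric pmf using `math.comb`; a minimal sketch:

```python
from math import comb

def hyper_pmf(k, r, n, m):
    """P(X = k) for X ~ Hypergeometric(r, n, m): k red balls in a sample of m
    drawn without replacement from r red and n - r black balls."""
    return comb(r, k) * comb(n - r, m - k) / comb(n, m)

r, n, m = 13, 30, 12        # identical twins, all twins, convicted twins
p_value = sum(hyper_pmf(k, r, n, m) for k in range(10, m + 1))
print(round(p_value, 6))  # 0.000465
```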

### Joint probability mass function:

The joint pmf of the RVs X and Y specifies their interaction:

p_{XY}(x, y) = P(X = x, Y = y)   (x, y ∈ ℝ)

Note: {X = x, Y = y} = {ω ∈ Ω | X(ω) = x, Y(ω) = y}

- If X and Y are discrete random variables with {xᵢ} = X(Ω) and {yⱼ} = Y(Ω), then (keeping x fixed and summing over the values of Y):

p_X(x) = P(X = x) = Σⱼ P(X = x, Y = yⱼ) = Σⱼ p_{XY}(x, yⱼ)

Similarly:

p_Y(y) = Σᵢ p_{XY}(xᵢ, y)

p_X and p_Y are referred to as the marginal pmf's.

Example:

A fair coin is tossed 3 times.

X = 1{H on first toss} (the number of heads on the first toss); Y = total number of heads

Ω = {HHH, HHT, …}; |Ω| = 8

Joint pmf (rows: X; columns: Y), with the marginals in the margins:

| | Y = 0 | Y = 1 | Y = 2 | Y = 3 | p_X |
|---|---|---|---|---|---|
| X = 0 | 1/8 | 2/8 | 1/8 | 0 | 1/2 |
| X = 1 | 0 | 1/8 | 2/8 | 1/8 | 1/2 |
| p_Y | 1/8 | 3/8 | 3/8 | 1/8 | |

So, for example, P(Y = 0) = 1/8 (note: called "marginal" as it sits in the margins).

NOTE: X ∼ Bernoulli(1/2) and Y ∼ Binomial(3, 1/2).

Are X and Y independent RVs?

- The random variable X is independent of Y if "knowing the value of Y does not change the distribution of X". Here, for instance, P(X = 1, Y = 0) = 0 while P(X = 1)P(Y = 0) = 1/16, so NO: they are not independent.

Independence:

P(X = xᵢ | Y = yⱼ) = P(X = xᵢ)

→ P(X = xᵢ, Y = yⱼ) = P(X = xᵢ)P(Y = yⱼ)   (definition)

p_{XY}(x, y) = p_X(x) p_Y(y)

The joint pmf factors into the product of the marginals.

More generally, the random variables X₁, …, Xₙ are independent if:

P(X₁ = x₁, …, Xₙ = xₙ) = p_{X₁,…,Xₙ}(x₁, …, xₙ) = ∏_{i=1}^{n} p_{Xᵢ}(xᵢ)   (∀xᵢ ∈ ℝ)

Note: take caution that PAIRWISE INDEPENDENCE DOES NOT IMPLY INDEPENDENCE.
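The coin-toss joint pmf above can be reproduced by enumerating the 8 equally likely outcomes; a sketch using exact fractions:

```python
from itertools import product
from fractions import Fraction

joint = {}  # joint pmf of (X, Y)
for toss in product('HT', repeat=3):
    x = 1 if toss[0] == 'H' else 0   # X = 1{H on first toss}
    y = toss.count('H')              # Y = total number of heads
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 8)

# marginal pmf's: sum the joint pmf over the other variable
p_x = {x: sum(v for (a, _), v in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in joint.items() if b == y) for y in range(4)}

# X and Y are not independent: P(X=1, Y=0) = 0 but P(X=1)P(Y=0) = 1/16
assert joint.get((1, 0), Fraction(0)) != p_x[1] * p_y[0]
print(p_y)  # marginal of Y: 1/8, 3/8, 3/8, 1/8, i.e. Binomial(3, 1/2)
```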

### Poisson Distribution:

Recall: X ∼ Poisson(λ) if

p_X(k) = e^{−λ} λ^k / k!   (k = 0, 1, 2, …)

- A 1-parameter family, λ > 0

- Where does it come from?

o Models the number of "events" registered in a certain time interval, eg the number of:

 Particles emitted by a radioactive source in an hour

 Incoming calls to a service centre between 1-2 pm

 Light bulbs burnt out in a year

 Fatalities from horse kicks in the Prussian cavalry over 200 corps-years (von Bortkiewicz, 1898): 20 corps × 10 years = 200 corps-years

| Number of deaths | Observed count | Frequency | Poisson approximation |
|---|---|---|---|
| 0 | 109 | .545 | .543 |
| 1 | 65 | .325 | .331 |
| 2 | 22 | .110 | .101 |
| 3 | 3 | .015 | .021 |
| 4 | 1 | .005 | .003 |
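The Poisson column can be reproduced by matching λ to the sample mean of the horse-kick data; a sketch:

```python
import math

counts = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}      # deaths -> corps-years observed
n = sum(counts.values())                         # 200 corps-years
lam = sum(k * c for k, c in counts.items()) / n  # sample mean: 122/200 = 0.61

for k in counts:
    freq = counts[k] / n
    pois = math.exp(-lam) * lam ** k / math.factorial(k)
    print(k, round(freq, 3), round(pois, 3))  # matches the table's last two columns
```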

So, how did we come up with the Poisson distribution?

Assumptions of Poisson:

1. The distribution of the number of events in any time interval depends only on its length or duration (eg the number of horse kicks depends only on the fact that it is a day, not on the particular day)

2. The numbers of events recorded in two disjoint time intervals are independent of one another (the number of horse kicks today is independent of yesterday's)

3. No two events are recorded at exactly the same time point (you can't have 2 horse kicks at exactly the same instant; they must differ slightly in time)

So:

Let X_{t,s} denote the number of events in the time interval (t, s]

- Denote X_t = X_{0,t}

Our goal is to find the distribution of X₁:

- Let f(t) = P(X_t = 0); then

f(t + s) = P(X_{t+s} = 0)

= P(X_t = 0, X_{t,t+s} = 0)   (no events in (0, t+s] means no events in either subinterval)

= P(X_t = 0)P(X_{t,t+s} = 0)   (disjoint time intervals, assumption 2)

= P(X_t = 0)P(X_s = 0)   (assumption 1: only the length matters)

= f(t)f(s)

→ f(t + s) = f(t)f(s)   ∀t, s > 0

One type of solution of this equation is:

f(t) = e^{αt}   (α ∈ ℝ)

(note: other solutions exist, but they are a mess, in that they are unbounded, while f(t) ∈ [0, 1], so they can't arise for probabilities)

For the same reason α < 0 (to remain bounded); so write:

α = −λ   (for λ > 0)

→ P(X_t = 0) = e^{−λt}, and for t = 1: p_{X₁}(0) = P(X₁ = 0) = e^{−λ}

Let Yₙ = the number of the n subintervals ((k−1)/n, k/n] of (0, 1] in which an event occurred:

Yₙ = Σ_{k=1}^{n} 1{X_{(k−1)/n, k/n} ≥ 1}

The n indicators are iid Bernoulli(pₙ) (the subintervals are disjoint and of equal length, so by assumptions 1 and 2), with

pₙ = P(X_{(k−1)/n, k/n} ≥ 1) = P(X_{1/n} ≥ 1)   (at least 1 event)

= 1 − P(X_{1/n} = 0)   (complement)

= 1 − e^{−λ/n}

→ Yₙ ∼ Binomial(n, pₙ = 1 − e^{−λ/n})

Note: Yₙ ≤ X₁, and Yₙ < X₁ if two events occur in the same subinterval.

However:

lim_{n→∞} Yₙ = X₁

(X₁ counts ALL events in (0, 1], while Yₙ counts at most one event per subinterval; by assumption 3 no two events occur at the same time, so for n large enough each subinterval contains at most one event.)

The expected value of a Binomial(n, p) RV is np. For Yₙ this is npₙ = n(1 − e^{−λ/n}); what is lim_{n→∞} npₙ?

npₙ = n(1 − (1 − λ/n + R₁(−λ/n)))   (Taylor expansion of e^{−λ/n} with remainder R₁)

= λ − nR₁(−λ/n) = λ + (R₁(−λ/n)/(−λ/n))·λ → λ   (since R₁(x)/x → 0 as x → 0)

Therefore, the limit of the expected value of the binomial is λ.

Claim: if Yₙ ∼ Binomial(n, pₙ) with npₙ → λ as n → ∞, then for any fixed k ∈ ℤ⁺:

P(Yₙ = k) → e^{−λ} λ^k / k!   as n → ∞

(the binomial pmf converges to the Poisson pmf)

Corollary: X₁ ∼ Poisson(λ), since P(X₁ = k) = lim_{n→∞} P(Yₙ = k).

Proof of claim:

P(Yₙ = k) = C(n, k) pₙ^k (1 − pₙ)^{n−k} = [n(n−1)⋯(n−k+1)/k!] · pₙ^k (1 − pₙ)^{n−k}
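The claim can also be checked numerically: with pₙ = 1 − e^{−λ/n} (so that npₙ → λ), the binomial pmf at a fixed k approaches the Poisson pmf. A sketch:

```python
import math

lam, k = 2.0, 3

def binom_pmf(n, p, j):
    return math.comb(n, j) * p ** j * (1 - p) ** (n - j)

poisson = math.exp(-lam) * lam ** k / math.factorial(k)  # target value, ~0.1804

for n in (10, 100, 10_000):
    p_n = 1 - math.exp(-lam / n)  # so that n * p_n -> lam
    print(n, binom_pmf(n, p_n, k))  # approaches the Poisson value as n grows
```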
