Distribution (Continuous)
continuous uniform distribution: A continuous random variable whose probability distribution is the uniform distribution is often called a uniform random variable. If we know nothing about a random variable apart from the fact that it has a lower bound $l$ and an upper bound $u$, then a uniform distribution is a natural model.
- mean: $\frac{u+l}{2}$
- variance: $\frac{(u-l)^2}{12}$
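A minimal simulation sketch (assuming Python with numpy; the bounds below are hypothetical) that checks these mean and variance formulas:

```python
# Sketch: sample from a uniform distribution on [l, u] and compare the
# empirical mean/variance with (u + l)/2 and (u - l)^2/12.
import numpy as np

l, u = 2.0, 10.0                       # hypothetical bounds
rng = np.random.default_rng(0)
x = rng.uniform(l, u, size=100_000)

print(np.mean(x), (u + l) / 2)         # both close to 6.0
print(np.var(x), (u - l) ** 2 / 12)    # both close to 5.33
```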
exponential distribution: We assume that failures form a Poisson process in time with rate $\lambda$; then the time to the next failure is exponentially distributed.
- mean: $\frac{1}{\lambda}$
- variance: $\frac{1}{\lambda ^ 2}$
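A similar sketch for the exponential case (assuming numpy; the rate is hypothetical, and note that numpy parameterizes by scale $=1/\lambda$):

```python
# Sketch: draw exponential waiting times with rate lambda and check that
# the empirical mean and variance match 1/lambda and 1/lambda^2.
import numpy as np

lam = 0.5                              # hypothetical failure rate
rng = np.random.default_rng(0)
t = rng.exponential(scale=1 / lam, size=100_000)

print(np.mean(t), 1 / lam)             # both close to 2.0
print(np.var(t), 1 / lam ** 2)         # both close to 4.0
```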
Normal Distribution
a probability distribution that is symmetric about the mean, in which data near the mean occur more frequently than data far from the mean
$$ p(x)=(\frac{1}{\sqrt{2\pi}\sigma})\text{exp}(\frac{-(x-\mu)^2}{2\sigma^2}) $$
- mean: $\mu$
- variance: $\sigma ^2$
- about 68% of the data lie within $\sigma$ of the mean, 95% within $2\sigma$, and 99.7% within $3\sigma$ (checked empirically in the sketch after this list)
- A continuous random variable is a normal random variable if its probability density function is a normal distribution.
- another name for the normal distribution is the Gaussian distribution.
- central limit theorem (CLT): under some not very worrying technical conditions, the sum of a large number of independent random variables will be very close to normal.
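A quick empirical check of the 68/95/99.7 rule (a sketch assuming numpy; $\mu$ and $\sigma$ are hypothetical):

```python
# Sketch: estimate the fraction of normal samples within 1, 2, and 3 sigma.
import numpy as np

mu, sigma = 3.0, 2.0                   # hypothetical parameters
rng = np.random.default_rng(0)
x = rng.normal(mu, sigma, size=1_000_000)

for k in (1, 2, 3):
    frac = np.mean(np.abs(x - mu) <= k * sigma)
    print(f"within {k} sigma: {frac:.3f}")   # roughly 0.683, 0.954, 0.997
```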
standard normal distribution:
$$ p(x)=(\frac{1}{\sqrt{2\pi}})\text{exp}(\frac{-x^2}{2}) $$
- mean: $0$
- variance: $1$
- A continuous random variable is a standard normal random variable if its probability density function is a standard normal distribution.
- Any probability density function that is a standard normal distribution when written in standard coordinates is a normal distribution; equivalently, converting a normal random variable to standard coordinates, $z=\frac{x-\mu}{\sigma}$, gives a standard normal random variable.
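A minimal sketch of that conversion to standard coordinates (assuming numpy; the parameters are hypothetical):

```python
# Sketch: standardizing a normal random variable, z = (x - mu) / sigma,
# yields samples with mean ~0 and variance ~1.
import numpy as np

mu, sigma = 3.0, 2.0                   # hypothetical parameters
rng = np.random.default_rng(0)
x = rng.normal(mu, sigma, size=100_000)

z = (x - mu) / sigma                   # standard coordinates
print(np.mean(z), np.var(z))           # close to 0 and 1
```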
binomial distribution approximation:
when $N$ is huge, we can approximate binomial distribution with normal distribution to reduce calculation cost.
Assume $h$ follows the binomial distribution with parameters $N$ and $p$ (and $q = 1 - p$). Write:
$$ x=\frac{h-Np}{\sqrt{Npq}} $$
Then, for large $N$, the probability distribution $P(x)$ can be approximated by the standard normal probability density function:
$$ P(\{x\in [a,b]\})\approx \int^b_a\left(\frac{1}{\sqrt{2\pi}}\right)\text{exp}\left(\frac{-x^2}{2}\right)dx $$
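A sketch comparing the exact binomial probability with this normal approximation (assuming scipy and numpy; $N$, $p$, and the range of $h$ are hypothetical):

```python
# Sketch: P(lo <= h <= hi) computed exactly from the binomial CDF versus
# the normal approximation applied to x = (h - Np) / sqrt(Npq).
import numpy as np
from scipy import stats

N, p = 10_000, 0.3
q = 1 - p
lo, hi = 2950, 3050                          # hypothetical range for h

exact = stats.binom.cdf(hi, N, p) - stats.binom.cdf(lo - 1, N, p)

a = (lo - N * p) / np.sqrt(N * p * q)        # standardized endpoints
b = (hi - N * p) / np.sqrt(N * p * q)
approx = stats.norm.cdf(b) - stats.norm.cdf(a)

print(exact, approx)                         # the two values are close
```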
Experiment
population and sample: the population is everything we could have seen, if we could have seen everything. I will write populations like random variables, with capital letters, to emphasize that we don't actually know the whole population. The data we actually have is the sample.
sample mean: the mean computed from the sample, usually written $X^{(N)}$ for the sample mean when the sample size is $N$ (see the simulation sketch after this list)
- it is a random variable
- $\mathbb{E}[X^{(N)}]=\text{popmean(\{X\})}$
- $var[X^{(N)}]=\frac{\text{popsd(\{X\})}^2}{N}$
- $std[X^{(N)}]=\frac{\text{popsd(\{X\})}}{\sqrt{N}}$
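A simulation sketch of these properties (assuming numpy; the population itself is hypothetical):

```python
# Sketch: draw many size-N samples from a population, compute each sample
# mean, and check E[X^(N)] ~ popmean and var[X^(N)] ~ popvar / N.
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=1_000_000)  # hypothetical population
N = 50

samples = rng.choice(population, size=(10_000, N))       # 10,000 samples of size N
sample_means = samples.mean(axis=1)

print(sample_means.mean(), population.mean())            # close to each other
print(sample_means.var(), population.var() / N)          # close to popvar / N
```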
confidence interval:
- confidence interval for a population mean: Choose some fraction $f$. An $f$ confidence interval for a population mean is an interval constructed using the sample mean. It has the property that for that fraction $f$ of all samples, the population mean will lie inside the interval constructed from each sample's mean.
- centered confidence interval for a population mean: Choose some $0<\alpha<0.5$. A $1-2\alpha$ centered confidence interval for a population mean is an interval $[a,b]$ constructed using the sample mean, with the property that for $1-2\alpha$ of all samples the population mean lies inside $[a,b]$, for $\alpha$ of samples it lies below $a$, and for $\alpha$ of samples it lies above $b$.
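A minimal sketch of constructing a centered interval (assuming numpy and scipy; the sample is hypothetical, and a normal quantile is used here, which is appropriate for a large sample as discussed at the end of this section):

```python
# Sketch: a (1 - 2*alpha) centered confidence interval for the population
# mean, built from the sample mean and the standard error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(10.0, 3.0, size=200)            # hypothetical sample

alpha = 0.025                                  # gives a 95% centered interval
mean = np.mean(x)
stderr = np.std(x, ddof=1) / np.sqrt(len(x))
z = stats.norm.ppf(1 - alpha)                  # ~1.96

print(mean - z * stderr, mean + z * stderr)    # the interval [a, b]
```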
unbiased standard deviation: used to estimate the population standard deviation
$$ \text{stdunbiased}(\{x\})=\sqrt{\frac{\sum_i(x_i-\text{mean}(\{x\}))^2}{N-1}} $$
standard error: The standard deviation of the estimate of the mean
$$ \text{stderr}(\{x\})=\frac{\text{stdunbiased(\{x\})}}{\sqrt{N}} $$
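Both quantities are one-liners in numpy (a sketch; the data values are hypothetical):

```python
# Sketch: stdunbiased is np.std with ddof=1 (divide by N - 1), and stderr
# divides that by sqrt(N).
import numpy as np

x = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9])   # hypothetical sample

N = len(x)
std_unbiased = np.sqrt(np.sum((x - np.mean(x)) ** 2) / (N - 1))
assert np.isclose(std_unbiased, np.std(x, ddof=1))

stderr = std_unbiased / np.sqrt(N)
print(std_unbiased, stderr)
```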
Random variable distribution
we have learned two distributions for this random variable so far: the t-distribution and the normal distribution. Which one to use depends on the sample size: if $N<30$, we use the t-distribution; otherwise we use the normal distribution.
t-distribution:
$$ T=\frac{\text{mean}(\{x\})-\text{popmean(\{X\})}}{\text{stderr(\{x\})}} $$
degrees of freedom: $N-1$
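A sketch of using the t-distribution for a small sample (assuming scipy; the data are hypothetical), here to build a 95% centered interval for the population mean:

```python
# Sketch: for a small sample (N < 30), use the t-distribution with N - 1
# degrees of freedom.
import numpy as np
from scipy import stats

x = np.array([12.1, 9.8, 11.4, 10.9, 13.0, 10.2, 11.7, 9.5])  # hypothetical sample
N = len(x)

mean = np.mean(x)
stderr = np.std(x, ddof=1) / np.sqrt(N)
t_crit = stats.t.ppf(0.975, df=N - 1)          # 95% centered interval
print(mean - t_crit * stderr, mean + t_crit * stderr)
```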
normal distribution:
$$ Z=\frac{\text{mean}(\{x\})-\text{popmean(\{X\})}}{\text{stderr(\{x\})}} $$
degrees of freedom: $N$
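A matching sketch for the large-sample case (assuming scipy; the sample and the hypothesized population mean are hypothetical), treating the statistic as a standard normal $Z$:

```python
# Sketch: for a large sample, the standardized statistic is treated as a
# standard normal Z; here it is used to assess a hypothesized population mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(10.3, 2.0, size=200)            # hypothetical large sample
popmean_hypothesis = 10.0

z = (np.mean(x) - popmean_hypothesis) / (np.std(x, ddof=1) / np.sqrt(len(x)))
tail_prob = 2 * (1 - stats.norm.cdf(abs(z)))   # two-sided tail probability under N(0, 1)
print(z, tail_prob)
```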