GIS for Environmental Modeling
Geog 479/559, Spring 2009, Tu Th 2:00-3:20pm
Instructor: Ling Bian
Office: 120 Wilkeson Quad, Office Hours: Tu Th 12:30-1:30pm
322 Fillmore
Lab: T 12:30-1:50pm or W 11am-12:20pm, Wilkeson 145
TA: Liang Mao
Statistics I
1. Basic statistics
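Parameters (for populations): mean $\mu$, variance $\sigma^2$, standard deviation $\sigma$
Statistics (for samples): mean $\bar{x}$, variance $S^2$, standard deviation $S$
Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Standard deviation: $S = \sqrt{S^2}$
Normal distribution
Significance level: $\alpha$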
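A minimal Python sketch of the sample statistics, using only the standard library. The function name `sample_stats` and the data are hypothetical; the point is the $n - 1$ denominator, which distinguishes the sample variance $S^2$ from the population variance $\sigma^2$.

```python
import math

def sample_stats(xs):
    """Return the sample mean, variance S^2 (n - 1 denominator) and standard deviation S."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    return xbar, s2, math.sqrt(s2)

data = [2, 4, 4, 4, 5, 5, 7, 9]
xbar, s2, s = sample_stats(data)
print(xbar, s2, s)  # mean 5.0; S^2 = 32/7, whereas dividing by n would give 4.0
```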
Parametric statistics
assume distributions with known parameters
Non-parametric statistics
used when the parameters are unknown
suited to non-normal distributions and small sample sizes
use lower-level data such as nominal and ordinal data
Parametric tests are more powerful when the parameters are known;
otherwise non-parametric tests are more powerful
2. t test
Test for equality of means of two samples
Assumptions: random samples, normally distributed populations, equal variances
Null hypothesis: $H_0: \mu_1 = \mu_2$
$t = \frac{\bar{X}_1 - \bar{X}_2}{S_e}$, where $S_e = S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ and $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$
Compare the computed t value to the t table value (two-tailed) for the specified degrees of freedom ($n_1 + n_2 - 2$) and level of significance
If $t >$ +critical value or $t <$ -critical value, reject $H_0$;
otherwise accept the null hypothesis that the two means are from the same population.
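The pooled-variance t statistic can be sketched in a few lines of Python. The function names and the toy samples are hypothetical; the critical value 2.306 is the tabled two-tailed t for 8 degrees of freedom at $\alpha = 0.05$ and would normally be looked up for your own degrees of freedom.

```python
import math
from statistics import mean, variance

def pooled_t_test(x1, x2):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(x1), len(x2)
    # statistics.variance uses the n - 1 denominator, i.e. the sample variance S^2
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / ((n1 - 1) + (n2 - 1))
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t = (mean(x1) - mean(x2)) / se
    return t, n1 + n2 - 2

def reject_h0(t, critical):
    """Two-tailed decision: reject H0 if t lies outside [-critical, +critical]."""
    return t > critical or t < -critical

t, df = pooled_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df, reject_h0(t, 2.306))  # t is about -1.0 with df = 8, so H0 is not rejected
```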
3. Mann-Whitney Test
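Nonparametric substitute for the t test of the equality of two means
Null hypothesis: the two samples come from the same population
Combine the two samples (sizes n and m) and rank all values from 1 to n + m
$T = \sum_{i=1}^{n} R(X_i) - \frac{n(n+1)}{2}$, where $R(X_i)$ and $R(Y_j)$ are the ranks of $X_i$ and $Y_j$ in the combined sample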
Compare the computed T value to the T table values (two-tailed) for the specified sample sizes (n, m) and level of significance
For the upper critical value: $T_{1-\alpha} = nm - T_{\alpha}$
Tied values are assigned the average of the ranks they would otherwise occupy, e.g. if $X_i$ and $Y_j$ tie for ranks 8 and 9, $R(X_i) = R(Y_j) = (8+9)/2 = 8.5$
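A small Python sketch of the ranking step, the T statistic and the upper critical value. The helper names are hypothetical, and the lower critical value $T_\alpha$ still has to come from a Mann-Whitney table; the code only applies $T_{1-\alpha} = nm - T_\alpha$.

```python
def average_ranks(values):
    """Rank values from 1 to len(values); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_t(x, y):
    """T = sum of the ranks of X in the combined sample minus n(n+1)/2."""
    n = len(x)
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[:n]) - n * (n + 1) / 2

def upper_critical(n, m, t_lower):
    """Upper critical value from the tabled lower value: T_(1-a) = nm - T_a."""
    return n * m - t_lower

print(mann_whitney_t([1, 3, 5], [2, 4, 6, 7]))  # 3.0
print(average_ranks([1, 2, 3, 3, 4]))           # the tied 3s both get rank 3.5
```

T ranges from 0 (all X ranked below all Y) to nm, which is why the upper critical value mirrors the lower one around $nm/2$.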
4. Chi-square ($\chi^2$) Test
Test for goodness of fit between a sample and a predefined distribution
can be used for nominal and ordinal data, i.e. count data
can be used as a nonparametric test
Null hypothesis (here, testing for normality): the sample comes from a normal distribution
$\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$, where $O_j$ is the number observed and $E_j$ the number expected in category $j$
Standardize the data: $Z_i = \frac{X_i - \bar{X}}{S}$
Divide the standard normal distribution evenly into k categories
Assign the standardized sample values to the same k categories
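Compare the computed $\chi^2$ value to the $\chi^2$ table values (one-tailed) for the specified degrees of freedom and level of significance; the degrees of freedom are k - 1 minus the number of estimated parameters, i.e. k - 3 when the mean and standard deviation are estimated from the sample
If the $\chi^2$ value > critical value, reject the null hypothesis;
otherwise accept the null that the sample has a normal distribution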
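A minimal Python sketch of the normality version of the test, assuming that "evenly" means k categories of equal probability under the standard normal curve (so every expected count is n/k). It uses `statistics.NormalDist` (Python 3.8+) for the category boundaries; the function name and the toy data are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, stdev

def chi_square_normality(xs, k):
    """Chi-square statistic comparing a sample with a normal distribution
    split into k equal-probability categories (an assumption of this sketch)."""
    n = len(xs)
    xbar, s = mean(xs), stdev(xs)
    z = [(x - xbar) / s for x in xs]  # standardize the data
    # category boundaries: standard normal quantiles at 1/k, 2/k, ..., (k-1)/k
    bounds = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    observed = [0] * k
    for zi in z:
        observed[bisect_right(bounds, zi)] += 1
    expected = n / k  # the same expected count in every category
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, observed

chi2, observed = chi_square_normality([0, 0, 0, 0, 0, 0, 0, 8], k=4)
print(observed, chi2)  # [0, 7, 0, 1] 17.0
```

With k = 4 and both parameters estimated, df = 4 - 3 = 1, whose tabled critical value at $\alpha = 0.05$ is 3.841, so this skewed toy sample would be rejected. Real data should have more observations per category; a common rule of thumb asks for expected counts of at least 5.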
5. Kolmogorov-Smirnov Test
Nonparametric substitute for the $\chi^2$ test
It does not group data into categories
It is more sensitive to deviations in the tails
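Fit a sample to a normal distribution of unspecified $\mu$ and $\sigma$ (estimated from the sample)
Null hypothesis: the sample comes from a normal distribution
Standardize the data: $Z_i = \frac{X_i - \bar{X}}{S}$
Plot the standard normal distribution and the standardized sample in cumulative form
Find the maximum absolute difference between the two curves:
$K\text{-}S = \max \left| F_{\text{normal}}(Z) - F_{\text{sample}}(Z) \right|$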
compare the computed K-S value to K-S table values (one/two-tailed)
for specified sample size and level of significance
If the K-S value > critical value, reject the null hypothesis
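The maximum-difference step can be sketched in Python with `statistics.NormalDist` (Python 3.8+) as the normal curve. Because the sample's cumulative curve is a step function, the sketch checks the difference just before and just after each step; the function name and data are hypothetical, and the critical value still comes from a K-S table.

```python
from statistics import NormalDist, mean, stdev

def ks_normality(xs):
    """Maximum absolute difference between the standard normal CDF and the
    cumulative (step) curve of the standardized sample."""
    n = len(xs)
    xbar, s = mean(xs), stdev(xs)
    z = sorted((x - xbar) / s for x in xs)
    phi = NormalDist().cdf
    d = 0.0
    for i, zi in enumerate(z, start=1):
        # the sample curve jumps from (i-1)/n to i/n at zi; check both sides of the step
        d = max(d, i / n - phi(zi), phi(zi) - (i - 1) / n)
    return d

print(ks_normality([0, 2]))  # about 0.26
```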
Statistics II Regression Analysis
1. Joint variation of two variables
Covariance: the joint variation of two variables about their respective means
$\mathrm{Cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
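A one-function Python sketch of the sample covariance, assuming the usual $\frac{1}{n-1}\sum(x_i - \bar{x})(y_i - \bar{y})$ form. The function name and data are hypothetical; a positive value means the two variables tend to lie on the same side of their means together.

```python
def covariance(x, y):
    """Sample covariance: joint variation of x and y about their means (n - 1 denominator)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)

print(covariance([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # 1.5
print(covariance([1, 2, 3], [3, 2, 1]))              # -1.0 (opposite directions)
```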
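2. Simple regression and least squares methods
Regression: models the relationship between variables
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, where $\beta_0$ is the intercept and $\beta_1$ the slope
Least squares methods: $\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$ = minimum; the residuals satisfy $\sum_{i=1}^{n} e_i = 0$
Parameter estimates: $b_0$, $b_1$
$\hat{Y}_i = b_0 + b_1 X_i$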
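For a single predictor, minimizing the sum of squared residuals has the standard closed-form solution $b_1 = \frac{\sum(X_i - \bar{X})(Y_i - \bar{Y})}{\sum(X_i - \bar{X})^2}$ and $b_0 = \bar{Y} - b_1\bar{X}$. A minimal Python sketch with hypothetical data, which also checks that the residuals sum to zero:

```python
def least_squares(x, y):
    """Least-squares estimates b0 (intercept) and b1 (slope) for Y = b0 + b1 X."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar  # the fitted line passes through (xbar, ybar)
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1, sum(residuals))  # roughly 2.2, 0.6 and 0 (up to rounding)
```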
3. Goodness of fit (coefficient of determination)
Total sum of squares: $SS_t = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$
Regression sum of squares: $SS_r = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$
Residual sum of squares: $SS_e = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$
$SS_t = SS_r + SS_e$
Coefficient of determination (goodness of fit): $R^2 = SS_r/SS_t$
Coefficient of correlation: $R = \sqrt{R^2} = \sqrt{SS_r/SS_t}$; for two variables, $r = \mathrm{Cov}(x,y)/(s_x s_y)$
Adjusted $R^2$: $R^2_a = R^2 - \frac{k(1 - R^2)}{N - k - 1}$, where N is the sample size and k the number of independent variables
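A Python sketch of the goodness-of-fit quantities for a simple regression, with k counted as the number of independent variables (k = 1 here), so that adjusted $R^2 = R^2 - k(1 - R^2)/(N - k - 1)$. The function name and data are hypothetical; the last line checks the identities $SS_t = SS_r + SS_e$ and $r^2 = R^2$.

```python
import math

def goodness_of_fit(x, y):
    """SSt, SSr, SSe, R^2, adjusted R^2 and Pearson r for a simple least-squares line."""
    n, k = len(x), 1  # k = number of independent variables
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((fi - ybar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    r2 = ssr / sst
    r2_adj = r2 - k * (1 - r2) / (n - k - 1)
    # Pearson r = Cov(x, y) / (s_x * s_y); the 1/(n - 1) factors cancel
    r = sxy / math.sqrt(sxx * sst)
    return sst, ssr, sse, r2, r2_adj, r

sst, ssr, sse, r2, r2_adj, r = goodness_of_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(sst, ssr + sse, r2, r2_adj, r ** 2)  # SSt = SSr + SSe and r^2 = R^2, up to rounding
```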
4. Test of regression model
General F test: equality of two variances
Null hypothesis: $\sigma_1^2 = \sigma_2^2$
$F = S_1^2 / S_2^2$
Compare the computed F value to the F table values for the specified degrees of freedom of both variances ($n_1 - 1$ and $n_2 - 1$) and level of significance
If the computed F > critical F, reject the null; otherwise accept it
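A short Python sketch of the variance-ratio F statistic and its two degrees of freedom. The function name and samples are hypothetical; many printed F tables assume the larger variance is placed in the numerator, a convention worth checking against the table you use.

```python
from statistics import variance

def f_ratio(sample1, sample2):
    """F = S1^2 / S2^2 with its numerator and denominator degrees of freedom."""
    return variance(sample1) / variance(sample2), len(sample1) - 1, len(sample2) - 1

f, df1, df2 = f_ratio([1, 2, 3, 4, 5], [2, 3, 4])
print(f, df1, df2)  # 2.5 4 2
```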
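F test for regression model:
Null hypothesis: $\beta_1 = \beta_2 = \dots = \beta_k = 0$, i.e. $SS_r/k$ and $SS_e/(N - k - 1)$ estimate the same variance
$F = \frac{SS_r/k}{SS_e/(N - k - 1)}$, where k is the number of parameters excluding $b_0$ and N is the sample size; the degrees of freedom are k and N - k - 1
t test for individual parameters $b_i$
Null hypothesis: $\beta_i = 0$
$t = b_i / S_{b_i}$, where $S_{b_i}$ is the standard error of $b_i$; the degrees of freedom are N - k - 1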
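Both tests can be sketched in Python for the one-predictor case (k = 1), where the standard error of the slope is $S_{b_1} = \sqrt{\frac{SS_e/(N - k - 1)}{\sum(X_i - \bar{X})^2}}$. The function name and data are hypothetical; for a single predictor $t^2 = F$, which the example shows.

```python
import math

def regression_tests(x, y):
    """F statistic for the simple regression model and t statistic for the slope b1."""
    n, k = len(x), 1  # k = number of parameters excluding b0
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - ybar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = sse / (n - k - 1)  # residual mean square
    f = (ssr / k) / mse
    s_b1 = math.sqrt(mse / sxx)  # standard error of the slope (simple regression only)
    t = b1 / s_b1
    return f, t, (k, n - k - 1)

f, t, df = regression_tests([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f, t, df)  # F about 4.5, t about 2.12, df (1, 3)
```

Neither value exceeds the tabled critical values at $\alpha = 0.05$ (about 10.13 for F with (1, 3) degrees of freedom and 3.182 for a two-tailed t with 3 degrees of freedom), so the null hypothesis would not be rejected for this toy data.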
5. Multiple regression
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_m X_{mi} + \varepsilon_i$
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i} + \dots + b_m X_{mi}$
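With several predictors, the least-squares estimates solve the normal equations $(X^\top X)\,b = X^\top Y$, where X has a column of 1s for $b_0$. A minimal pure-Python sketch using Gaussian elimination; the function names and the data (generated from an exact plane) are hypothetical, and in practice a library least-squares routine such as NumPy's `numpy.linalg.lstsq` is numerically safer.

```python
def solve(a, rhs):
    """Solve the linear system a @ beta = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta

def multiple_regression(predictors, y):
    """Least-squares estimates [b0, b1, ..., bm]; predictors is a list of columns X1..Xm."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]  # 1s column for b0
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve(xtx, xty)

x1 = [0, 1, 0, 1, 2]
x2 = [0, 0, 1, 1, 1]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]  # exact plane Y = 1 + 2 X1 + 3 X2
print(multiple_regression([x1, x2], y))  # close to [1.0, 2.0, 3.0]
```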
6. Other regressions