GIS for Environmental Modeling

Geog 479/559 Spring 2009                           Tu Th 2:00 - 3:20pm 
Instructor: Ling Bian
Office: 120 Wilkeson Quad
Office Hours: Tu Th 12:30-1:30pm
                         322 Fillmore
                          Lab: T 12:30-1:50pm or W 11am-12:20pm, Wilkeson 145
                          TA: Liang Mao

Statistics I
1. Basic statistics
        Parameters (for populations)    $\mu$, $\sigma^2$, $\sigma$
        Statistics (for samples)        $\bar{x}$, $S^2$, $S$

        Variance $S^2$
        Standard deviation $S$
        Normal distribution
        Significance level $\alpha$

        Parametric statistics
                for tests on distributions with known parameters
        Non-parametric statistics
                parameters are unknown
                non-normal distributions, small sample sizes
                can use low-level data such as nominal and ordinal
        Parametric tests are more powerful when the distribution and its parameters are known
        Otherwise non-parametric tests are more powerful

2. t test
Test for equality of the means of two samples
Assumptions: random samples, normal distributions, equal variances
Null hypothesis: H0: $\mu_1 = \mu_2$ (the two population means are equal)

        $t = \frac{\bar{X}_1 - \bar{X}_2}{S_e}$,   $S_e = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$,   $S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$

Compare the computed t value to the t table value (two-tailed) for
specified degrees of freedom and level of significance

If t > +critical value or t < -critical value, reject H0;
otherwise do not reject the null hypothesis that the two samples come from
populations with the same mean.
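
As a minimal sketch of the pooled t statistic defined above, the following Python computes t and its degrees of freedom using only the standard library. The function name `pooled_t`, the sample values, and the choice of 2.306 as the tabled two-tailed critical value (alpha = 0.05, 8 degrees of freedom) are illustrative assumptions, not part of the course material.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Return (t, degrees of freedom) for the equal-variance two-sample t test."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance Sp^2: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (
        (n1 - 1) + (n2 - 1)
    )
    # Standard error Se = Sp * sqrt(1/n1 + 1/n2)
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t = (mean(sample1) - mean(sample2)) / se
    return t, n1 + n2 - 2


# Hypothetical data, for illustration only
a = [5.1, 4.9, 5.6, 5.8, 6.0]
b = [4.1, 4.5, 4.0, 4.8, 4.6]
t, df = pooled_t(a, b)
critical = 2.306  # two-tailed t table value for alpha = 0.05, df = 8
reject_h0 = t > critical or t < -critical
print(round(t, 3), df, reject_h0)
```

The critical value is passed in from a table rather than computed, which mirrors the table lookup described in the notes.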

3. Mann-Whitney Test
Nonparametric substitute for the t test of the equality of two means
Null hypothesis: the two samples come from the same population
Combine the two sets of data (sizes n and m) and rank them from 1 to n+m

        $T = \sum_{i=1}^{n} R(X_i) - \frac{n(n+1)}{2}$,   where $R(X_i)$ and $R(Y_i)$ are the ranks of $X_i$ and $Y_i$

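Compare the computed T value to the T table values (two-tailed)
for the specified sample sizes (n, m) and level of significance

The table gives the lower critical value $T_\alpha$; the upper critical value is
        $T_{1-\alpha} = nm - T_\alpha$
Tied data are assigned averaged ranks, e.g. two values tied for ranks 8 and 9
each receive $R(X_i) = R(Y_i) = (8+9)/2 = 8.5$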
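
The ranking and summation steps above can be sketched in Python as follows. The helper names `average_ranks` and `mann_whitney_T` and the small data sets are hypothetical; critical values still have to be read from a Mann-Whitney table.

```python
def average_ranks(values):
    """Rank values from 1 to len(values); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = ((pos + 1) + (end + 1)) / 2  # mean of the ranks the run occupies
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def mann_whitney_T(x, y):
    """T = sum of the ranks of x in the combined sample, minus n(n+1)/2."""
    n = len(x)
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[:n]) - n * (n + 1) / 2


# Hypothetical data: the value 2 appears in both samples and gets rank 2.5
print(mann_whitney_T([1, 2], [2, 3]))
```

When every x is smaller than every y, T is 0; when every x is larger, T equals nm, which is why the upper critical value can be obtained as nm minus the lower one.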

4. Chi-square ($\chi^2$) Test
Test for goodness of fit between a sample and a predefined distribution
It can be used for nominal and ordinal data, i.e. count data
It can be used as a nonparametric test

Null hypothesis: the sample has a normal distribution

        $\chi^2 = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j}$,   $O_j$ - observed count in category j,   $E_j$ - expected count in category j

Standardize the data:
        $Z_i = \frac{X_i - \bar{X}}{S}$
Divide the normal distribution evenly into k categories
Assign the sample values to the same k categories
Compare the computed $\chi^2$ value to the $\chi^2$ table values (one-tailed) for
specified degrees of freedom and level of significance

If the $\chi^2$ value > critical value, reject the null hypothesis;
otherwise do not reject the null hypothesis that the sample has a normal distribution
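
A minimal Python sketch of this procedure is given below. It assumes that dividing the normal distribution "evenly" means categories of equal probability, so every category has the same expected count; the function name `chi_square_normal` and the sample are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def chi_square_normal(sample, k):
    """Chi-square statistic for fit to a normal distribution, using k
    equal-probability categories (an assumption about how to divide it)."""
    n = len(sample)
    xbar, s = mean(sample), stdev(sample)
    z = [(x - xbar) / s for x in sample]  # standardized values Zi
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # k - 1 interior boundaries, each category holding probability 1/k
    cuts = [std_normal.inv_cdf(j / k) for j in range(1, k)]
    observed = [0] * k
    for zi in z:
        category = sum(1 for c in cuts if zi > c)
        observed[category] += 1
    expected = n / k  # same expected count in every category
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, observed


# Hypothetical sample: the integers 1..20
chi2, observed = chi_square_normal(list(range(1, 21)), k=4)
print(chi2, observed)
```

The returned statistic is then compared with the tabled critical value for the chosen significance level and degrees of freedom.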

5. Kolmogorov-Smirnov Test
Nonparametric substitute for the $\chi^2$ test
It does not group data into categories
It is more sensitive to deviations in the tails

Fit a sample to a normal distribution of unspecified $\mu$ and $\sigma$
Null hypothesis: the sample has a normal distribution
Standardize the data
        $Z_i = \frac{X_i - \bar{X}}{S}$
Plot a normal distribution and the sample in cumulative form
Find the maximum absolute difference between the two curves

        $K\text{-}S = \max |F_{normal}(Z) - F_{sample}(Z)|$,   F - cumulative form

compare the computed K-S value to K-S table values (one/two-tailed)
for specified sample size and level of significance

If the K-S value > critical value, reject the null hypothesis
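
The steps above can be sketched in Python as follows. The function name `ks_normal` is hypothetical, and the sketch checks the difference on both sides of each step of the sample's cumulative curve, which is one common way to locate the maximum.

```python
from statistics import NormalDist, mean, stdev


def ks_normal(sample):
    """Maximum absolute difference between the cumulative standard normal
    curve and the cumulative curve of the standardized sample."""
    n = len(sample)
    xbar, s = mean(sample), stdev(sample)
    z = sorted((x - xbar) / s for x in sample)
    std_normal = NormalDist()
    d = 0.0
    for i, zi in enumerate(z, start=1):
        f = std_normal.cdf(zi)
        # The sample curve jumps from (i-1)/n to i/n at zi; check both sides
        d = max(d, abs(f - i / n), abs(f - (i - 1) / n))
    return d


# Hypothetical data, for illustration only
print(ks_normal([2.3, 3.1, 3.8, 4.0, 4.4, 5.2, 5.9, 7.5]))
```

The result is compared with the tabled K-S critical value for the sample size and significance level.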
 

Statistics II  Regression Analysis
1. Joint variation of two variables
      
  Joint variation of two variables about their common mean
        Covariance
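
As a small illustration, the covariance of two variables can be computed as the average product of their deviations from their means. The sketch below assumes the sample form with an n - 1 divisor, which the notes do not state; the function name and data are hypothetical.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance of paired observations (n - 1 divisor assumed)."""
    xbar, ybar = mean(x), mean(y)
    products = [(a - xbar) * (b - ybar) for a, b in zip(x, y)]
    return sum(products) / (len(x) - 1)


# Hypothetical data: y rises with x, so the covariance is positive
print(covariance([1, 2, 3, 4], [2, 4, 6, 8]))
```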

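2. Simple regression and the least squares method

Regression: model relationships between variables
        $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$,   $\beta_0$ - intercept,   $\beta_1$ - slope
        Least squares method:    $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ = minimum;   $\sum e_i = 0$,   $e_i = Y_i - \hat{Y}_i$ - residuals
        Parameter estimates: $b_0$, $b_1$

        $\hat{Y}_i = b_0 + b_1 X_i$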
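
A minimal Python sketch of the least squares estimates for a single independent variable follows. It uses the standard closed-form solution (slope = sum of cross-products over sum of squared X deviations), which the notes do not write out; the function name `fit_line` and the data are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Return least squares estimates (intercept, slope) for y = b0 + b1*x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]
b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(b0, 4), round(b1, 4), round(sum(residuals), 10))
```

The residuals sum to zero (up to rounding), which matches the second least squares condition listed above.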

3. Goodness of fit (coefficient of determination)
                                                          
        Total sum of squares:          $SS_t = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
        Sum of squares of regression:  $SS_r = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
        Sum of squares of residuals:   $SS_e = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
        $SS_t = SS_r + SS_e$

        Coefficient of determination (goodness of fit): $R^2 = SS_r / SS_t$

        Coefficient of correlation: $R = \sqrt{R^2} = \sqrt{SS_r / SS_t}$
                                    $r = Cov(x,y) / (s_x s_y)$

        Adjusted $R^2$:  $R_a^2 = R^2 - \frac{(k-1)(1 - R^2)}{N - k}$
        N - sample size, k - number of estimated parameters including the intercept
        (number of independent variables + 1)
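
The sums of squares and the coefficient of determination can be sketched in Python for a simple regression as below. The adjusted value is computed with the common textbook form 1 - (1 - R^2)(N - 1)/(N - p - 1), where p is the number of independent variables; this is an assumption about how the parameters are counted, and it agrees with the formula above when k = p + 1. The function name and data are hypothetical.

```python
from statistics import mean


def goodness_of_fit(x, y):
    """Return (SSt, SSr, SSe, R2, adjusted R2) for a simple linear regression."""
    n, p = len(x), 1  # p: number of independent variables
    xbar, ybar = mean(x), mean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * a for a in x]
    sst = sum((b - ybar) ** 2 for b in y)                # total
    ssr = sum((f - ybar) ** 2 for f in fitted)           # regression
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))   # residuals
    r2 = ssr / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return sst, ssr, sse, r2, r2_adj


# Hypothetical data, for illustration only
sst, ssr, sse, r2, r2_adj = goodness_of_fit([1, 2, 3, 4, 5], [2.2, 4.1, 6.2, 7.9, 10.1])
print(round(sst, 3), round(ssr, 3), round(sse, 3), round(r2, 4), round(r2_adj, 4))
```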

4. Test of regression model
General F test: equality of two variances
        Null hypothesis: $\sigma_1^2 = \sigma_2^2$
        $F = \frac{S_1^2}{S_2^2}$
        Compare the computed F value to the F table values for the specified
        degrees of freedom of both variances and level of significance

        If the computed F > critical F, reject the null hypothesis; otherwise do not reject it
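
A minimal Python sketch of the variance-ratio F statistic is shown below. It assumes each variance carries n - 1 degrees of freedom, the usual convention for sample variances; the function name `f_ratio` and the data are hypothetical.

```python
from statistics import variance


def f_ratio(sample1, sample2):
    """Return (F, df1, df2) where F is the ratio of the two sample variances."""
    f_value = variance(sample1) / variance(sample2)
    # Degrees of freedom of the numerator and denominator variances
    return f_value, len(sample1) - 1, len(sample2) - 1


# Hypothetical data, for illustration only
f_value, df1, df2 = f_ratio([5.1, 4.9, 5.6, 5.8, 6.0], [4.1, 4.5, 4.0, 4.8, 4.6])
print(round(f_value, 3), df1, df2)
```

The two degrees-of-freedom values are what the F table lookup needs alongside the significance level.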

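F test for regression model:
        Null hypothesis: the regression explains no more variance than the residuals,
        i.e. $SS_r / k = SS_e / (N - k - 1)$
        $F = \frac{SS_r / k}{SS_e / (N - k - 1)}$,   k - number of parameters excluding $b_0$,   N - sample size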

t test for individual parameters $b_i$
        Null hypothesis: $\beta_i = 0$
        $t = \frac{b_i}{S_{b_i}}$,   $S_{b_i}$ - standard error of $b_i$
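
Both regression tests can be sketched in Python for the one-variable case (k = 1). The standard error of the slope is taken as the square root of the residual mean square divided by the sum of squared X deviations, the usual simple-regression formula, which the notes do not spell out; the function name and data are hypothetical.

```python
import math
from statistics import mean


def regression_tests(x, y):
    """Return (F for the model, t for the slope) in a simple linear regression."""
    n, k = len(x), 1  # k: number of parameters excluding the intercept
    xbar, ybar = mean(x), mean(y)
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * a for a in x]
    ssr = sum((f - ybar) ** 2 for f in fitted)
    sse = sum((b - f) ** 2 for b, f in zip(y, fitted))
    mse = sse / (n - k - 1)           # residual mean square
    f_value = (ssr / k) / mse
    se_slope = math.sqrt(mse / sxx)   # standard error of the slope
    t_value = b1 / se_slope
    return f_value, t_value


# Hypothetical data, for illustration only
f_value, t_value = regression_tests([1, 2, 3, 4, 5], [2.2, 4.1, 6.2, 7.9, 10.1])
print(round(f_value, 2), round(t_value, 2))
```

With a single independent variable the two tests are equivalent: the squared t value equals F.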

5. Multiple regression
        $Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_m X_m + \varepsilon_i$

        $\hat{Y}_i = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_m X_m$
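
One way to obtain the multiple regression estimates is to solve the least squares normal equations (X'X)b = X'Y. The pure-Python sketch below does this with Gaussian elimination; the approach, the function names `solve` and `fit_multiple`, and the data are illustrative assumptions rather than the method used in the course.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_multiple(rows, y):
    """Return [b0, b1, ..., bm] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [9, 8, 19, 18, 29]
print([round(b, 6) for b in fit_multiple(rows, y)])
```

For larger or nearly collinear problems a numerical library would be a safer choice than hand-written elimination.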
 
6. Other regressions 
 
 
