Lab 1: Data Preparation and Exploration

 

Geo 597: Geostatistics

Spring 2009

 

Exercise

 

Note:

1. Preparing the Data File

 

First, the point data must be prepared in table format. The first row is the header, containing the column names for the x-coordinate (or longitude), the y-coordinate (or latitude), and the attributes (values). For accuracy, specify the coordinates to at least 6 decimal places. Coordinates in longitude and latitude must be given in decimal degrees. Enter the values in the rows directly below the header, in the corresponding columns. For example:

 

X              Y            Attribute1    Attribute2
-123.202600    39.144700    33            45
-120.180800    39.929100    40            75

 

Data entry is easily done in an Excel spreadsheet. When you have finished, save the table in either Excel or text (*.txt) format.
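Before loading the table into ArcMap, its layout can be checked with a short script. The Python sketch below is an illustration outside the ArcGIS workflow; it assumes a tab-delimited text file whose header is X, Y, then the attribute columns, and that X holds longitude and Y holds latitude in decimal degrees. The function name read_point_table is hypothetical. Its range check catches swapped coordinate columns only when the longitude magnitude exceeds 90 degrees, as it does for the example values.

```python
import csv
import io

def read_point_table(text, delimiter="\t"):
    """Read a point table whose first row is the header (X, Y, attributes...).

    Hypothetical helper: assumes X is longitude and Y is latitude in decimal
    degrees. Raises ValueError when a coordinate is outside its valid range,
    which often means the X and Y columns were swapped.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for line_no, record in enumerate(reader, start=2):  # line 1 is the header
        row = {name: float(value) for name, value in record.items()}
        if not -180.0 <= row["X"] <= 180.0:
            raise ValueError(f"line {line_no}: X={row['X']} is not a valid longitude")
        if not -90.0 <= row["Y"] <= 90.0:
            raise ValueError(f"line {line_no}: Y={row['Y']} is not a valid latitude")
        rows.append(row)
    return rows
```

For a text file on disk, pass the file contents, for example `read_point_table(open("points.txt").read())`, where points.txt is a placeholder name.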

 

2. Converting the Table into a Point Shapefile

·        Launch ArcMap via Start -> Programs -> ArcGIS -> ArcMap.

·        Add your table to the map. Go to the menu Tools -> Add XY Data. In the pop-up window, choose the correct table and specify the X Field and Y Field. Click OK; the new points are created as an event layer. Right-click the event layer in the Table of Contents (TOC) and select Data -> Export Data. Give the new point shapefile a name.

3. Projection

 

It is important to have your point shapefile in a projected coordinate system, because distances between points must be calculated in real units (such as meters). If the point shapefile is in decimal degrees (longitude/latitude), its coordinate system must first be defined as such and the data must then be projected into a planar (rectangular) coordinate system. If your point data are already projected, just check that the projection is defined correctly.
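To see why decimal degrees are unsuitable for distance calculations, compare a distance measured in degrees with the corresponding ground distance. The Python sketch below uses the haversine formula on a spherical Earth (mean radius 6371 km, an approximation); it only illustrates the problem and does not replace defining and projecting the data in ArcMap. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Earth is treated as a sphere

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def km_per_degree(lat):
    """Approximate ground length (km) of one degree of latitude and of longitude at `lat`."""
    km_lat = math.pi * EARTH_RADIUS_KM / 180.0
    km_lon = km_lat * math.cos(math.radians(lat))  # meridians converge toward the poles
    return km_lat, km_lon
```

For the points (-123.2026, 39.1447) and (-120.1808, 39.9291), the Euclidean distance in degrees is about 3.12, while haversine_km gives roughly 273 km. At latitude 39.5, km_per_degree returns about 111.2 km per degree of latitude but only about 85.8 km per degree of longitude, so a "degree" is not a fixed length and distances computed in degrees would distort the semivariogram.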

 

How to define a coordinate system:

 

Otherwise select Projected Coordinate System/….

 

How to project into another coordinate system:

 

 

4. Producing maps

·        First, activate the Geostatistical Analyst extension by following the menu path ArcMap -> Tools -> Extensions. In the "Extensions" window, check "Geostatistical Analyst" and click Close.

·        Add the Geostatistical Analyst toolbar by following the path View -> Toolbars, then check Geostatistical Analyst.

·        Load the shapefile by following the menu path File -> Add Data. Browse to your folder, select the shapefile (.shp) you created, and click "Add". Make the layer visible by checking its name in the Table of Contents of ArcMap.

5. Generate a histogram of the point data

 

Access the histogram: Geostatistical Analyst -> Explore Data -> Histogram.

At the bottom of the "Histogram" window, select the shapefile "Layer" and the appropriate "Attribute" for generating the histogram. Change the number of histogram bars in the "Bar" box. If the data are not normally distributed, choose a "Transformation" method to bring them closer to a normal distribution.
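The quantities behind the histogram can be reproduced outside ArcGIS. The Python sketch below computes equal-width bar counts, the moment coefficient of skewness, and the Box-Cox transformation (whose λ = 0 case is the natural log). The function names are hypothetical, and this is a generic illustration: the bin edges and skewness formula used by the Geostatistical Analyst may differ in detail.

```python
import math
import statistics

def histogram_counts(values, bars=10):
    """Count values in `bars` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bars or 1.0  # guard against all values being equal
    counts = [0] * bars
    for v in values:
        i = min(int((v - lo) / width), bars - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return counts

def skewness(values):
    """Moment coefficient of skewness: > 0 right-skewed, < 0 left-skewed, 0 symmetric.

    Assumes the values are not all equal (standard deviation > 0).
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def box_cox(values, lam):
    """Box-Cox transform with parameter `lam`; lam = 0 is the natural log."""
    if min(values) <= 0:
        raise ValueError("log and Box-Cox transforms require strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]
```

A strongly right-skewed attribute, common for concentrations, typically has a clearly positive skewness that drops toward zero after a log transform; comparing skewness before and after a transformation is one way to judge whether it helped.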

 

Before printing the histogram, enlarge the window by dragging its edges. Then capture the screen by pressing "Ctrl + Alt + Print Screen" on the keyboard and paste the capture into Word (Edit -> Paste). You may also experiment with the "Add to layout" option to place the graph in the map layout.

 

6. Normal QQPlot

 

Access Normal QQPlot: Geostatistical Analyst -> Explore Data -> Normal QQPlot.

 

A general QQ plot is a graph on which the quantiles of two distributions are plotted against each other. For two identical distributions, the QQ plot is a straight line. Therefore, you can check the normality of your data by plotting the quantiles of the data against the quantiles of a standard normal distribution: the closer the points lie to a straight line, the closer the data are to a normal distribution. If the data do not exhibit a normal distribution in either the histogram or the Normal QQ Plot, it may be necessary to transform the data to make them conform to a normal distribution before using certain kriging interpolation techniques.
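The quantile pairs behind a Normal QQ Plot can be computed with Python's standard library (statistics.NormalDist, Python 3.8 or later). The sketch below is illustrative: it uses the plotting position (i - 0.5)/n, one common convention that ArcGIS may not share, and summarizes straightness with the correlation between theoretical and observed quantiles, a numeric check that is not part of the ArcGIS tool. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot.

    Observed quantiles are the sorted data; theoretical quantiles come from the
    standard normal distribution at plotting positions p_i = (i - 0.5) / n.
    """
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(data, start=1)]

def qq_correlation(values):
    """Pearson correlation of the QQ points; values near 1 mean a nearly straight line."""
    pts = normal_qq_points(values)
    xs = [t for t, _ in pts]
    ys = [o for _, o in pts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

Data drawn from any normal distribution give points on a straight line, though not necessarily the 1:1 line, because the mean and standard deviation only shift and stretch the observed axis. A long right tail shows up as points curving upward at the right end and a lower correlation.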

 

7. Trend analysis

 

Take a 3-D view of your data via Geostatistical Analyst -> Explore Data -> Trend Analysis, and change the "Perspective" bars to see different views of your data. Experiment with "Grid", "Projected Data", "Trend on Projections", "Sticks", "Axes", "Input Data Points", "Rotate", and "Perspective" to see what each option does.

 

Each vertical stick in the trend analysis plot represents the location and value (height) of a data point. The points are projected onto two perpendicular planes, an east-west plane and a north-south plane. A best-fit line (a polynomial) is drawn through the projected points on each plane to model the trend in that direction. If a line is flat, there is no trend in that direction.
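The projection-and-fit idea can be sketched in plain Python with ordinary least squares: values are regressed on the x-coordinate (east-west plane) and separately on the y-coordinate (north-south plane). The names below are hypothetical, the default second-order polynomial is an assumption, and the sketch does not reproduce ArcGIS's plotting. Points are (x, y, value) tuples in projected units, with x increasing eastward and y northward.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients; coef[k] multiplies x**k.

    Solves the normal equations by Gaussian elimination with partial pivoting.
    Needs at least degree + 1 distinct x values, otherwise the system is singular.
    """
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        coef[i] = (aty[i] - sum(ata[i][j] * coef[j] for j in range(i + 1, n))) / ata[i][i]
    return coef

def trend_projections(points, degree=2):
    """Fit value-vs-x (east-west) and value-vs-y (north-south) polynomials.

    Coordinates are centered on their means first, which keeps the normal
    equations well conditioned for large projected coordinates.
    Returns (east_west_coefs, north_south_coefs); coefs[k] multiplies (coord - mean)**k.
    """
    xs, ys, zs = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    east_west = polyfit([x - mx for x in xs], zs, degree)
    north_south = polyfit([y - my for y in ys], zs, degree)
    return east_west, north_south
```

If every coefficient after the constant term is close to zero relative to the scale of the data, the fitted curve on that plane is flat and there is no trend in that direction; a clearly nonzero second-order coefficient indicates a U-shaped or inverted-U trend.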

 

8. Explore the semivariogram of the point data

 

Access the semivariogram function by following the menu path Geostatistical Analyst -> Explore Data -> Semivariogram. In the "Semivariogram/Covariance Cloud" window, select the appropriate "Layer" and "Attribute".

 

The Semivariogram/Covariance Cloud allows you to examine the spatial autocorrelation between the measured sample points. Spatial autocorrelation rests on the assumption that things that are close to one another are more alike than things that are farther apart. To examine this relationship, a semivariogram value, which is half the squared difference between the values of a pair of locations, is plotted on the y-axis against the distance separating the pair on the x-axis.
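The cloud itself is simple to compute. The Python sketch below assumes the points are already in a projected coordinate system and given as (x, y, value) tuples; semivariogram_cloud is a hypothetical helper, not a Geostatistical Analyst function. Each entry keeps the indices of its two points so that a suspicious pair can be traced back to the map.

```python
import math
from itertools import combinations

def semivariogram_cloud(points):
    """Return (i, j, distance, semivariance) for every pair of sample points.

    points: list of (x, y, value) tuples in projected (planar) units.
    The semivariance of a pair is half the squared difference of its values.
    """
    cloud = []
    for (i, (x1, y1, z1)), (j, (x2, y2, z2)) in combinations(enumerate(points), 2):
        distance = math.hypot(x2 - x1, y2 - y1)
        cloud.append((i, j, distance, 0.5 * (z2 - z1) ** 2))
    return cloud
```

With n points the cloud has n(n - 1)/2 entries, so it grows quickly for large data sets. Pairs that are close together but have an unexpectedly large semivariance can be listed with `[(i, j) for i, j, h, g in cloud if h < max_dist and g > min_gamma]`, where max_dist and min_gamma are thresholds you choose.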

 

Each red dot in the Semivariogram/Covariance Cloud represents a pair of locations. Because closer locations should be more alike, pairs that are close together (far left on the x-axis) should have small semivariogram values (low on the y-axis). As the distance between the pairs of locations increases (moving right on the x-axis), the semivariogram values should also increase. Eventually, however, a certain distance (the range) is reached where the cloud flattens out, indicating that pairs of locations farther apart than this distance are no longer correlated.
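Averaging the cloud within distance bins (lags) gives the empirical semivariogram, in which the rise and flattening are easier to see. In the standard estimator below, N(h) is the number of pairs whose separation falls in the lag bin around h, and z(s_i) is the measured value at location s_i. This is the textbook form; the exact binning used by ArcGIS may differ.

```latex
\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{(i,j)\,:\,\lVert \mathbf{s}_i - \mathbf{s}_j \rVert \approx h} \left[ z(\mathbf{s}_i) - z(\mathbf{s}_j) \right]^2
```

Each red dot in the cloud is a single-pair term, (1/2)[z(s_i) - z(s_j)]^2; the range is the lag distance h beyond which the estimate stops increasing.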

 

Looking at the semivariogram, if some pairs of locations that are close together (near zero on the x-axis) have a higher semivariogram value than you would expect, investigate them by clicking the red dot in question; the two locations will be highlighted on the map.

 

Besides the global trends discussed earlier, there may also be directional influences affecting the data. The reasons for these directional influences may not be known, but they can be quantified statistically. Such directional influences will affect the accuracy of the surface you will create in later lab exercises; however, once you know that one exists, the Geostatistical Analyst provides tools to account for it in the surface-creation process. To explore for a directional influence in the semivariogram cloud, use the Search Direction tools.

 

Check the "Show Search Direction" option and experiment with the semivariogram parameters (Angle Direction, Angle Tolerance, Bandwidth, Lag Size, and Number of Lags) by entering values in the corresponding boxes, or adjust these parameters interactively in the "Semivariogram/Covariance Surface" box.
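The effect of these search parameters can be illustrated with a small Python sketch of a directional empirical semivariogram. It is an illustration under stated assumptions, not ArcGIS's implementation: the function and its arguments are hypothetical stand-ins for Angle Direction (here an azimuth measured clockwise from north, an assumed convention), Angle Tolerance (half-width of the angular window), Bandwidth (maximum perpendicular offset from the search axis), Lag Size, and Number of Lags. Points are (x, y, value) tuples in projected units.

```python
import math
from itertools import combinations

def directional_semivariogram(points, lag_size, n_lags, azimuth=None,
                              tolerance=22.5, bandwidth=math.inf):
    """Empirical semivariogram averaged within lag bins [k*lag, (k+1)*lag).

    azimuth: search direction in degrees clockwise from north (assumed
        convention); None means omnidirectional, so every pair is used.
    tolerance: half-width of the angular search window, in degrees.
    bandwidth: maximum perpendicular distance between a pair's separation
        vector and the search axis.
    Coincident points (distance 0) are skipped.
    Returns (mean distance, semivariance, pair count) for each non-empty bin.
    """
    sums = [[0.0, 0.0, 0] for _ in range(n_lags)]  # [sum of h, sum of gamma, count]
    theta = math.radians(azimuth) if azimuth is not None else 0.0
    ux, uy = math.sin(theta), math.cos(theta)  # unit vector along the search axis
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        dx, dy = x2 - x1, y2 - y1
        h = math.hypot(dx, dy)
        k = int(h // lag_size)
        if h == 0 or k >= n_lags:
            continue
        if azimuth is not None:
            along = dx * ux + dy * uy  # component along the search axis
            across = abs(dx * uy - dy * ux)  # perpendicular offset from the axis
            # Pairs are unordered, so the axis counts in both directions.
            angle = math.degrees(math.acos(min(1.0, abs(along) / h)))
            if angle > tolerance or across > bandwidth:
                continue
        acc = sums[k]
        acc[0] += h
        acc[1] += 0.5 * (z2 - z1) ** 2
        acc[2] += 1
    return [(s_h / n, s_g / n, n) for s_h, s_g, n in sums if n > 0]
```

Comparing the results for two perpendicular azimuths (for example 0 and 90) is one way to look for anisotropy: the direction in which the semivariance rises more slowly has greater continuity. Passing azimuth=None gives the omnidirectional semivariogram.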

 

You may also explore other tools, such as "Create Subsets", which is useful if you have a large number of data points and would like to set aside a subset of the data for independent validation.
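The idea of setting aside a validation subset can be sketched with Python's random module. The helper create_subsets and its test_fraction and seed parameters are hypothetical; this is not how the Geostatistical Analyst's Create Subsets tool is implemented, only a random hold-out split of the same kind.

```python
import random

def create_subsets(points, test_fraction=0.25, seed=None):
    """Randomly split points into (training, test) lists.

    test_fraction: share of points set aside for validation.
    seed: fixing it makes the split reproducible.
    """
    rng = random.Random(seed)
    n_test = round(len(points) * test_fraction)
    test_idx = set(rng.sample(range(len(points)), n_test))
    training = [p for i, p in enumerate(points) if i not in test_idx]
    test = [p for i, p in enumerate(points) if i in test_idx]
    return training, test
```

The training subset would be used to build the surface, and the withheld test points to check its predictions.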

 

 

Assignments (due: February 19)

 

1) Print a histogram of your original data, including the summary statistics. Check for outliers. Describe the distribution: does it look close to a normal distribution, and is it symmetric or skewed? Do you think a transformation is necessary? If so, what kind of transformation would you use?

2) Print a map of your sample data. Display the value ranges with color or shades of gray (data posting). Highlight any outliers in the map if necessary.  

3) Explore the Semivariogram/Covariance Cloud with various lag sizes and directions. Can you detect spatial autocorrelation in your data set? Is there anisotropy in your point data? If so, identify the major and minor directions of continuity. Print the omnidirectional Semivariogram/Covariance Surface map.

Free software for randomly selecting points from a large point data set: http://www.spatialecology.com/htools/rndsel.php