# Spatial Analysis

Spatial analysis is the process of studying patterns and processes in the world through their locations and their relationships in space. Spatial analysts use spatial statistics and GIS to analyze data and explain spatial patterns. Spatial analysis differs from other data analysis because it focuses on interactions in space, so location, distance, and neighboring values become variables in the analysis. Some common types of analysis involve spatial autocorrelation, regression, and interpolation. Working with spatial data also involves certain problems. Common problems that spatial analysts have to be aware of are the Modifiable Areal Unit Problem (MAUP), ecological fallacy, and edge effects.

## Spatial autocorrelation

Spatial autocorrelation measures how similar the values of objects are in relation to how close those objects are to one another. This concept is based on what is commonly known as Tobler’s first law of geography: “Everything is related to everything else, but near things are more related than distant things.” [1] It violates a basic assumption of many classical statistical methods, that every observation is independent of every other observation. Rather, objects that are close to each other spatially tend to have similar characteristics, while objects that are farther from each other are more likely to have different characteristics. Spatial autocorrelation is generally classified as positive, negative, or zero. Positive spatial autocorrelation results in a clustered pattern, where similar values appear close to each other. Negative spatial autocorrelation results in a dispersed pattern, where differing values are close together and similar values are far apart. [2] Zero spatial autocorrelation is a random pattern, in which the values show no relationship based on their locations in space.
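
The three cases above can be illustrated with global Moran's I, one widely used measure of spatial autocorrelation. The following is a minimal sketch rather than a reference implementation: the function names `morans_i` and `rook_weights`, the 4 x 4 grid, and the sample values are all assumptions made for illustration, and the weights are simple binary "shares an edge" neighbors.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighboring pairs.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def rook_weights(rows, cols):
    """Binary weights for a grid: cells are neighbors if they share an edge."""
    n = rows * cols
    w = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[i][rr * cols + cc] = 1
    return w


w = rook_weights(4, 4)

# Hypothetical grids, listed row by row.
clustered = [1, 1, 0, 0] * 4                                       # high values on the left half
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]   # alternating values

print(morans_i(clustered, w))     # positive (about 0.67): similar values are neighbors
print(morans_i(checkerboard, w))  # -1.0: every neighbor has the opposite value
```

Values near zero would indicate a pattern that is indistinguishable from random under this weighting scheme.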

Figure: Examples of linear correlation used in regression.

## Regression

Regression analysis is a way to quantify spatial patterns and investigate the factors underlying them. Regression is largely based on how strongly two variables, known as the dependent and independent variables, are correlated. The dependent variable, which is graphed on the y-axis, represents a process or pattern the analyst is attempting to understand, while the independent variable, graphed on the x-axis, is used by the regression model to predict the value of the dependent variable. When the independent variable predicts the individual values of the dependent variable well, the two variables have a strong positive or negative correlation. If the variables have a strong positive correlation, then as the value of the independent variable increases, so does the dependent variable. With strong negative correlation, as the independent variable increases, the dependent variable decreases. If there is no correlation, then the independent variable does not predict the dependent variable.

Regression models are more complex than simple linear correlation because they use p-values, r-squared values, and residuals to assess how accurate a model is. A p-value is the probability of seeing a relationship at least as strong as the one observed if the independent variable actually had no effect; a low p-value suggests that the variable is a statistically significant predictor. R-squared values describe the fit of the model, specifically, the proportion of the variation in the dependent variable that the independent variable explains. The residuals are the differences between the known and predicted values, that is, the variation the model cannot explain. [3]
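
To make the roles of the fitted line, r-squared, and residuals concrete, here is a minimal sketch of an ordinary least squares fit with one independent variable. The function names `fit_line` and `fit_quality` and the sample data are hypothetical, chosen only for illustration; p-values are omitted because they require a t-distribution, which the Python standard library does not provide.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_quality(x, y, slope, intercept):
    """Return (r_squared, residuals) for a fitted line."""
    mean_y = sum(y) / len(y)
    # Residual = known value minus the value the model predicts.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, residuals


# Hypothetical data: independent variable x, dependent variable y.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
r2, residuals = fit_quality(x, y, slope, intercept)
print(slope, intercept)  # positive slope: y increases as x increases
print(r2)                # close to 1: x explains most of the variation in y
print(residuals)         # what the line leaves unexplained at each point
```

In spatial work, mapping the residuals is a common next step, since residuals that cluster in space suggest that the model is missing a spatially structured variable.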

## Interpolation

Figure: Average precipitation in the lower 48 states of the USA. This map used interpolation to estimate values across the US.

Interpolation is the process of estimating unknown values based on known points. This technique is often used to predict precipitation or wind speed in between sample points, usually weather stations. Interpolation models use equations to predict values, usually based on how far the unknown location is from the known points. Many different interpolation models exist. Some of the most common ones are inverse distance weighting (IDW), spline, trend surface analysis, and kriging. These models use different equations, or combinations of multiple equations, to predict values. Because each model is slightly different, spatial analysts need to choose carefully which model fits the specific problem they are working on. [4]
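
As an illustration of distance-based prediction, the sketch below implements a basic form of inverse distance weighting, in which each known point contributes in proportion to 1 / distance^power. The function name `idw`, the default `power=2`, and the station coordinates and values are assumptions for the example, not taken from any particular GIS package.

```python
import math


def idw(known, target, power=2):
    """Estimate a value at target (x, y) from known (x, y, value) points."""
    num = 0.0
    den = 0.0
    for x, y, value in known:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0:
            return value  # target coincides with a known point
        weight = 1 / d ** power  # nearer points get larger weights
        num += weight * value
        den += weight
    return num / den


# Hypothetical weather stations: (x, y, precipitation).
stations = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 30.0)]

print(idw(stations, (0, 0)))  # 10.0, the value at the station itself
print(idw(stations, (5, 0)))  # pulled mostly toward the two nearest stations
```

Raising `power` makes the estimate follow the nearest station more closely, while lowering it produces a smoother surface. Choosing this parameter is part of fitting the model to the problem.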

## Spatial analysis issues

Working with spatial data brings with it some inherent problems that spatial analysts must work around. Three common problems are the Modifiable Areal Unit Problem (MAUP), ecological fallacy, and edge effects.

### Modifiable Areal Unit Problem

The Modifiable Areal Unit Problem (MAUP) arises from humans’ propensity for partitioning space, such as creating artificial boundaries for countries, states, counties, and other administrative districts. Analysts regularly use administrative boundaries as a basis for spatial analysis because that is how most data are grouped. However, using these types of districts and zones can affect the analysis: the same underlying data can produce different results depending on how large the zones are and where their boundaries are drawn. The zones can also be biased in some way, either through politics or simply through how population is distributed within them. [5]
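
The effect of zone boundaries can be shown with a toy example in which the same eight observations are aggregated under three different zoning schemes. The function name `zone_means`, the values, and the zone assignments are all hypothetical.

```python
from collections import defaultdict


def zone_means(values, zone_ids):
    """Average the values that fall in each zone."""
    groups = defaultdict(list)
    for value, zone in zip(values, zone_ids):
        groups[zone].append(value)
    return {zone: sum(vs) / len(vs) for zone, vs in sorted(groups.items())}


# Hypothetical observations along a transect, in spatial order.
values = [1, 1, 9, 9, 1, 1, 9, 9]

zones_a = [0, 0, 1, 1, 2, 2, 3, 3]  # four small zones
zones_b = [0, 0, 0, 0, 1, 1, 1, 1]  # two large zones (change of scale)
zones_c = [0, 1, 1, 2, 2, 3, 3, 3]  # four zones with shifted boundaries

print(zone_means(values, zones_a))  # zone means alternate between 1.0 and 9.0
print(zone_means(values, zones_b))  # both zone means are 5.0
print(zone_means(values, zones_c))  # the alternation is largely smoothed away
```

None of the three summaries is wrong, yet each suggests a different spatial pattern, which is why results should be reported together with the zoning scheme that produced them.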

### Ecological fallacy

The ecological fallacy is often considered a part of MAUP. It occurs when a statistical value calculated for a group is applied to a specific part or member of that group. For example, many analysts will calculate the average income or cancer rate within a county. However, the analysts cannot report that an individual in that county has that income or cancer rate. They also cannot say that any township within the county has the same income levels or cancer rates as the whole county. [6]
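
A small numeric sketch shows why a group statistic may describe no individual member. The incomes below are invented for illustration, and `share_near_mean` is a hypothetical helper that reports how many members fall within a chosen tolerance of the group mean.

```python
def share_near_mean(values, tolerance=0.2):
    """Return the group mean and the share of members within +/- tolerance of it."""
    mean = sum(values) / len(values)
    near = [v for v in values if abs(v - mean) <= tolerance * mean]
    return mean, len(near) / len(values)


# Hypothetical individual incomes within one county.
county_incomes = [20_000, 22_000, 25_000, 30_000, 203_000]

mean_income, share = share_near_mean(county_incomes)
print(mean_income)  # 60000.0, the county average
print(share)        # 0.0: nobody earns within 20% of the average
```

Here the county average is a valid statement about the county as a whole, but attributing it to any one resident would be wrong for every resident in the sample.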

### Edge effects

Edge effects occur because space must be partitioned to make analysis practical. It is not possible for spatial analysts to analyze a problem throughout all of space, and analysts often want to focus on one area in particular. As a result, analysts place boundaries on their study areas to make calculations possible. However, the placement of the boundary can be problematic. Often, the boundary is imposed by politics, such as country, state, or county boundaries. Other times, the boundary is placed by researchers in the field. No matter how the boundary is placed, there is a possibility that values outside the study area boundary affect what is inside the boundary, even though they are not technically a part of the study. Spatial analysts must be aware of this possibility and find ways to work around it when needed.
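
One way to see an edge effect is to compute a nearest-neighbor distance for a point near the boundary, first using only points inside the study area and then including a point just outside it. This is a minimal sketch: the study area (x between 0 and 10), the point coordinates, and the helper names are assumptions for illustration.

```python
import math


def nearest_neighbor_distance(point, candidates):
    """Distance from point to the closest other point in candidates."""
    return min(math.dist(point, c) for c in candidates if c != point)


def in_study_area(point, x_min=0.0, x_max=10.0):
    """Hypothetical study area bounded only in the x direction."""
    return x_min <= point[0] <= x_max


# Hypothetical point locations; the last one lies just outside the boundary.
all_points = [(1.0, 5.0), (5.0, 5.0), (9.0, 5.0), (10.5, 5.0)]
inside = [p for p in all_points if in_study_area(p)]

edge_point = (9.0, 5.0)
d_clipped = nearest_neighbor_distance(edge_point, inside)   # ignores the outside point
d_full = nearest_neighbor_distance(edge_point, all_points)  # includes the outside point

print(d_clipped)  # 4.0
print(d_full)     # 1.5
```

Clipping the data at the boundary makes the point near the edge look far more isolated than it is. Collecting data in a buffer zone around the study area is one common way to reduce this distortion.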