Section Six: Statistical Analysis

One of the most powerful advantages of a GIS over paper maps is the ability to perform statistical analysis on layers: both statistics on the values in the attribute table and spatial statistics, the process of describing spatial patterns in statistical terms.  Data laid out in table form may reveal some patterns through inspection, through totaling columns or rows, or through figures such as the standard deviation and the median (all of which ArcGIS can calculate), but the real power comes from spreading that data out over the landscape, placing each value where it belongs in the world.  The pattern the data creates when we do this is often more revealing than the numeric statistics alone.  The GIS has several built-in tools designed specifically for examining the spatial pattern of data. None of them are used in an introductory class; they belong in a Geospatial Statistics class, where the focus can be on the concepts behind the tools, since operating them is no harder than operating any other GIS tool.

7.6.2: Table Statistics

ArcGIS and similar GIS software packages fall into the category of Relational Database Management Systems, meaning they can calculate table-based statistics using built-in formulas and tools.  Within the attribute table of any given layer, the provided tools allow a user to populate a single field with calculated values, summarize a single field containing numeric values, export a summary table for a single field, and export the entire table to a format that more complex software can read.

Before we review the tasks the Field Calculator can do, let's review field types, as Field Calculator's job is to populate records in a field with a variety of values and calculations. ArcGIS groups all fields into one of six ‘types’: short integer, long integer, double, float, text (or string), and date.  Short and long integer fields contain whole numbers without any decimals; float and double fields contain values with decimals; date fields contain dates in one of several available formats; and text fields contain values that are taken at face value.  That is to say, if a text field contains the number "3", the software sees it as a number-shaped letter, no different than if the record contained "three".  Number fields (short and long integer, float, and double) can only contain numbers and sort their values in numeric order; text fields sort their values alphabetically, character by character; and date fields can only contain dates, sorted in chronological order.  If a number field contains the values 1, 110, 200, and 1000 and is sorted ascending, the numbers are placed in exactly that order. If those same values were stored in a text field and sorted ascending, they would instead read 1, 1000, 110, 200, because a text field compares the values one character at a time rather than by magnitude.
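The difference between numeric and text sorting can be reproduced outside the GIS. The following is a minimal sketch in plain Python (not ArcGIS itself), assuming a text field compares values character by character the way Python strings do; the list of values is the one used in the paragraph above.

```python
# The four example values, deliberately out of order.
values = [200, 1000, 1, 110]

# As a number field: values are compared by magnitude.
numeric_sorted = sorted(values)

# As a text field: each value becomes a string and is compared
# one character at a time, so "1000" sorts before "110".
text_sorted = sorted(str(v) for v in values)

print(numeric_sorted)  # [1, 110, 200, 1000]
print(text_sorted)     # ['1', '1000', '110', '200']
```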

Field Calculator

In GIS 101, Field Calculator is used in a variety of ways: to populate an entire field with a single value; to populate selected records with a single value, which, repeated across different selections, fills the field with a variety of values; to copy the contents of one field to another in order to convert the field type, such as converting a text field of number-shaped letters to actual numbers or converting decimals to integers via rounding; and to solve simple mathematical equations using a different field as the input.  The Field Calculator is far more powerful than what we experience in GIS 101. It can calculate very complex equations that use several fields and constant values as inputs, populate one field using logic based on another field, and apply built-in algebraic, geometric, trigonometric, and statistical functions.  With a bit of practice and some basic knowledge of Python, a technician can use the Field Calculator to their advantage. Pair the strengths of Field Calculator with the Join function, and the ability to compute complex table statistics is limited only by the technician’s imagination.

Figure 7.14:
Using Field Calculator to populate selected records with a single value (also works with populating the entire field with a single value when no records are selected)
Using Field Calculator to copy and round values from one field (type: float) to another field (type: long integer).
Using Field Calculator to complete a linear unit conversion by multiplying one field [elev] by the conversion value (1 meter = 3.2808399 feet).
Field Calculator contains a series of built-in algebraic, trigonometric, geometric, and statistical functions (screenshot shows just a few functions).
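The calculations shown in the figure are simple enough to sketch as ordinary Python functions. The function names below are hypothetical and chosen only for illustration; the conversion value is the one given in the caption. This is a sketch of the arithmetic, not a record of what the Field Calculator runs internally.

```python
# Conversion value from the figure caption (1 meter = 3.2808399 feet).
FEET_PER_METER = 3.2808399

def meters_to_feet(elev_m):
    """Linear unit conversion: multiply the input by a constant."""
    return elev_m * FEET_PER_METER

def to_whole_number(value):
    """Round a decimal value to the nearest whole number for an integer field.
    Exact halves (such as 2.5) may round differently between Python versions."""
    return int(round(value))

def text_to_number(text_value):
    """Turn number-shaped text such as "3" into an actual number.
    Returns None when the text cannot be read as a number (such as "three")."""
    try:
        return float(text_value)
    except (TypeError, ValueError):
        return None

print(to_whole_number(meters_to_feet(100)))  # 328
print(text_to_number("3"))                   # 3.0
print(text_to_number("three"))               # None
```

In ArcMap's Field Calculator with the Python parser selected, functions like these would typically be placed in the code block and called with a field name wrapped in exclamation marks, for example meters_to_feet(!elev!).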

First introduced in Chapter Five, Section 3.3, the Statistics and Summarize tools examine basic statistics for a single numeric field (Statistics) or produce an output table summarizing one or more fields (Summarize).  Statistics gives the user a quick pop-up window showing the sum, mean (average), standard deviation, and minimum and maximum values within the field. The Summarize tool provides the same data but writes it to an output table, and it can cover one or more fields within a single attribute table.

Statistics and Summarize are designed for numeric fields (short and long integer, float, and double) and do not treat numbers stored in a text field as numbers.  If you are attempting a Summarize or Statistics operation on a field that appears to contain numbers and the options are grayed out, check the field type to make sure the field is actually numeric.  If it is a text field, Field Calculator will be needed first to convert the values from text to numbers by copying them to a new numeric field. If Summarize is run on a text field, the only outputs are "first" and "last", which refer to the alphabetical order of whatever is stored in the field.

Figure 7.15: Statistics and Summarize within ArcMap
Within ArcMap, the Field Header menu for each field in the Attribute table contains options for Statistics and Summarize, both of which provide statistical summaries for numeric fields. The difference lies in the output: Statistics provides a quick pop-up with a few main statistics, while Summarize creates a new output table, which offers more options and can be opened in table-based calculation software such as Microsoft Excel.
An example of the quick pop-up box provided by the Statistics tool found in the Field Header menu.  In this example, the tool is finding the total number of features, minimum value, maximum value, sum, mean (average), standard deviation, and the count of Null values within the Shape Area field, a single field within the attribute table of a US States layer, including the District of Columbia.
The Summarize tool, similar to the Statistics tool but not identical, outputs a data table with the same information. The Summarize tool can summarize one or many fields within the same attribute table.
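The figures reported in the Statistics pop-up are ordinary descriptive statistics, so they can be reproduced on any list of numbers. The sketch below uses Python's standard library rather than ArcGIS. The function name and the sample values are hypothetical, None stands in for a Null record, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import statistics

def field_statistics(values):
    """Summarize one numeric field; None stands in for a Null record."""
    numbers = [v for v in values if v is not None]
    return {
        "count": len(numbers),
        "minimum": min(numbers),
        "maximum": max(numbers),
        "sum": sum(numbers),
        "mean": statistics.mean(numbers),
        # Assumption: sample standard deviation (divides by n - 1).
        "std_dev": statistics.stdev(numbers),
        "nulls": len(values) - len(numbers),
    }

# Hypothetical field values, with one Null record.
area = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, None]
summary = field_statistics(area)

for name, value in summary.items():
    print(name, value)
```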

Counting Unique Values

One action you might expect to find in the Field Header menu, but which doesn’t actually appear there, is the ability to count the number of unique values in each field. For example, if you had a field for road types (freeway, major road, and local road) and wanted to know how many features of each type there are, you would need a method other than Summarize and Statistics to find the answer. If you are only looking for a quick count of a value or two, it will most likely be fastest to perform a Select by Attribute and read the number of selected records. However, there are times when you want to count all of the values in a single field (especially with fields of type “String/Text”, where Summarize and Statistics only return the first and last value), count all of the values in multiple fields, or create a record of all the values.

This is where the Frequency (Analysis Toolbox) tool comes into play. This tool takes an input table (spatial or non-spatial) and asks for a name and save location for the output table, the fields whose unique values you’d like to count, and, optionally, any fields you’d like summed.  The output table is saved in the location you designate and will most likely need to be added to the MXD manually, as it is not added automatically by default.

Figure 7.16: The Frequency Tool and its Output
The input table is the table on which the Frequency tool is going to be run.
The Frequency tool allows the user to select the fields for the frequency count.  If more than one field is selected, the tool will look for the frequency of features (rows) which have matching values in all of the checked fields.
The output table shows the count of unique features.  This example shows the frequency of the "Feature" field, which groups population.
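The counting that the Frequency tool performs can be illustrated with a short sketch in plain Python. The rows and field names below are hypothetical, and the standard library's Counter is used in place of the ArcGIS tool. The sketch shows the difference between counting one field and counting combinations of two fields.

```python
from collections import Counter

# Hypothetical attribute table rows: (road_type, lanes).
rows = [
    ("freeway", 4),
    ("local road", 2),
    ("major road", 4),
    ("local road", 2),
    ("local road", 1),
    ("freeway", 4),
]

# Frequency of a single field: how many rows share each road type.
by_type = Counter(road_type for road_type, _ in rows)

# Frequency over two fields: rows are counted together only when
# the values match in both fields.
by_type_and_lanes = Counter(rows)

print(by_type["local road"])                 # 3
print(by_type_and_lanes[("local road", 2)])  # 2
```

Counting on two fields splits "local road" into separate groups for one-lane and two-lane roads, which mirrors how selecting more than one field in the tool produces one output row per matching combination.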

7.6.3: Spatial Statistics

Since most table statistics can be performed in a spreadsheet program such as Microsoft Excel, what makes GIS exciting and powerful is the ability to perform spatial statistics calculations, independent of the values found in the attribute table.  When data is displayed in ArcMap, the technician can often see patterns in the data, such as clusters of incidents near population centers, the feature that sits visually at the center of a series of features, or which features interact with other features (think back to Select by Location).  While a technician can sometimes see these patterns (and sometimes cannot), it takes GIS tools to quantify and record the clusters as they occur spatially.  While Spatial Statistics is not used in lab during an introductory class, knowing that this class of tools is available in the GIS is important to grasping the overall purpose and capabilities of the software.

Figure 7.17: An Example of Spatial Statistics - Hot Spot Analysis
This example of Hot Spot Analysis shows the concentration patterns of a crime occurrence point layer (not shown) overlaid on a street network.  Performing spatial statistics revealed clear patterns in the data that otherwise may have been hidden or only visually (vs. numerically) apparent.

Hot Spot Analysis (seen in Figure 7.x) is just one example of the tools found in the Spatial Statistics toolbox.  Other examples include tools which:

measure geographic distributions of data.
    Central Feature: identifies the most centrally located feature in a data set
    Mean Center: finds the geographic center (the average x and y location) of a set of features
    Standard Distance: measures how concentrated or dispersed features are around the mean center
define a geographic pattern.
    Average Nearest Neighbor Distance: compares the average distance from each feature to its nearest neighbor against the distance expected for a random pattern
    High/Low Clustering: measures how strongly high or low values cluster over the entire study area
    Spatial Autocorrelation: measures spatial correlation based on both feature locations and attribute values
lead to a better understanding of geographic clustering.
    Hot Spot Analysis: identifies where statistically significant clusters of high values (hot spots) and low values (cold spots) occur
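To make the first group of tools more concrete, the sketch below computes a mean center, a standard distance, and a central feature for a handful of points using their textbook definitions. It is written in plain Python and assumes unweighted points in planar (x, y) coordinates. The function names and sample points are hypothetical, and the ArcGIS tools offer options (such as weights) that are not reproduced here.

```python
import math

def mean_center(points):
    """Average x and average y of all points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Root of the mean squared distance from each point to the mean center."""
    cx, cy = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

def central_feature(points):
    """The point with the smallest total distance to all other points."""
    def total_distance(p):
        return sum(math.hypot(p[0] - q[0], p[1] - q[1]) for q in points)
    return min(points, key=total_distance)

# Hypothetical point locations: four corners of a square plus its middle.
points = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]

print(mean_center(points))        # (1.0, 1.0)
print(standard_distance(points))  # about 1.26
print(central_feature(points))    # (1, 1)
```

Note that the mean center is a calculated location that need not coincide with any feature, while the central feature is always one of the input features.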