Section Two - Scanning and Digitizing Data

Even as the amount of freely available digital GIS data grows at an exponential rate, there are still many times a GIS technician must create new data. Sometimes historical data is only available in paper form and the technician is required to digitize it. Other times, data is collected directly by a field technician with GPS units and tablets utilizing apps and ArcPad, and must then be imported into the GIS for further analysis. Still other times, project-specific data must be created in order to complete specific tasks. In the following section, different methods of getting fresh data into the GIS will be explored.

6.2.2: Manual Digitizing

More prominent in the early days of computerized GIS, manual or hardcopy digitization was the best way to get maps from paper to digital form. The process uses either a stylus, a pen-like object for tracing the map without leaving ink behind, or a puck (also called a cursor), a computer mouse-like object. The puck has a plastic window with a printed crosshair and buttons with various controls to assist the digitizer - the person making a digital copy of the map. The puck or stylus is attached to and used in conjunction with a digitizing table, a special grid-covered table which responds to the moves and clicks of the puck.

Figure 6.1: Digitizing a Paper Map
In this image, we see a digitizer using a puck and a digitizing table to create a digital version of the paper map. As she follows a line in the map (likely a road or a river), she keeps the feature in the crosshairs of the puck's window, using the proper buttons to create the feature on the computer. Points, polylines, and polygons can all be digitized in the software using a digitizing table. As the result is a shapefile or feature class, the dot-to-dot analogy works for the idea of digitizing. The digitizer uses the puck to establish the "dots" while the software "draws" the image.

In order to complete the digitization process, the digitizer tapes a paper map to the digitizing table and traces or digitizes the map features, using the appropriate button combinations to record the clicks and movements to the attached computer and its software. With the table grid beneath the paper map responding to the puck, the software is able to convert the movements to digital features. Think about the game “Battleship”, where the object is to hide your Navy vessels on a grid from your opponent. Your opponent attempts to locate where on the grid your battleship lies by calling out game coordinates like B1 (column B, row 1), and you reply with Hit or Miss. Your opponent then places a Hit or Miss peg on their grid, noting if there was an object at that location or not. When the digitizer marks a point along the grid of the digitizing table as a Hit, the computer marks the same point in the software’s grid with a “Hit peg”. After enough “hits”, the software can connect the points to create a digital version of the paper map’s features.

Figure 6.2: The Board Game Battleship
Similar to the board game Battleship, manual digitizing records the "hits" the digitizer makes with the puck or stylus while turning a paper map into a digital product.
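
The "hits" idea can be sketched in a few lines of Python. This is only an illustration, not how any particular digitizing software works: the function names, the table origin, and the scale factor are all made up for the example. Each button press is recorded in table inches, converted to a map coordinate, and the ordered list of converted points becomes a polyline.

```python
# Hypothetical sketch: a digitizing table reports puck positions in table
# units (here, inches from the table's lower-left corner). All numbers are
# invented for illustration.

def table_to_map(hit, origin, units_per_inch):
    """Convert one puck "hit" from table inches to map coordinates."""
    x_in, y_in = hit
    x0, y0 = origin
    return (x0 + x_in * units_per_inch, y0 + y_in * units_per_inch)

def connect_the_dots(hits, origin, units_per_inch):
    """Turn an ordered list of hits into a polyline (a list of vertices)."""
    return [table_to_map(hit, origin, units_per_inch) for hit in hits]

# Three button presses made while tracing a river on the paper map.
hits = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0)]

# Assumed: the table's lower-left corner sits at map coordinate
# (500000, 4480000) and one inch of paper covers 2000 map units.
river = connect_the_dots(hits, origin=(500000.0, 4480000.0), units_per_inch=2000.0)
print(river)
```

The order of the hits matters: the software "draws" from one dot to the next in the order they were recorded.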

6.2.3: Heads-Up or On-Screen Computer Digitizing

Manual digitization is not completely obsolete, as it is still handy when paper maps are too large or too damaged to scan into the GIS. Large maps can be challenging to piece back together within the GIS without introducing some error, and damaged maps can lead to distortion of the map objects when scanned, such as if the map is torn and taped back together or if the map is very wrinkled. As computers gained more speed and power and fell in cost, they became easier to use and more common within companies and agencies. As a result, the need to use a digitizing table and trained digitizer began to diminish as heads-up or on-screen digitizing became the more common way to create new digital layers, with its speed, ease of use, and short training time. Similar to the process of manual digitizing, with on-screen digitizing, the GIS technician converts raster images into vector features by looking at the image loaded into the GIS, then using a mouse (with a “traditional” computer setup) or stylus (with a touchscreen) to click and trace each feature or object in the image into a point, polygon, or polyline feature class, utilizing a method called “Creating Features” (clever, I know!). Each mouse click places a single vertex on the screen directly on top of the image, and we've already learned that the only purpose of a vertex is to mark a geographic coordinate pair as a single building block within a vector feature. The GIS technician places one vertex for each point feature, two or more vertices connected automatically by a line for each polyline feature, and three or more vertices for each closed polygon feature.
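
The vertex counts described above can be written down as a simple rule check. The sketch below is a teaching aid with made-up names (`MIN_VERTICES`, `is_valid_feature`); it is not part of ArcGIS or any other GIS package.

```python
# Minimum number of vertices for each geometry type, as described in the text.
MIN_VERTICES = {"point": 1, "polyline": 2, "polygon": 3}

def is_valid_feature(geometry_type, vertices):
    """Return True if the clicked vertices are enough to build the feature."""
    if geometry_type == "point":
        # A point feature is exactly one vertex, never more.
        return len(vertices) == 1
    return len(vertices) >= MIN_VERTICES[geometry_type]

# Each tuple stands for one mouse click: an (x, y) coordinate pair.
print(is_valid_feature("point", [(10.0, 20.0)]))
print(is_valid_feature("polyline", [(10.0, 20.0)]))
print(is_valid_feature("polygon", [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0)]))
```

A single click is enough for a point but not for a polyline, and three clicks are the fewest that can close a polygon.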

Digitizing features comes in two varieties: adding features to an existing shapefile or feature class and adding features to a brand new shapefile or feature class, and a GIS technician completes both tasks about equally often. The process of either adding features to an existing vector file or creating all new features for a brand new vector file is exactly the same, with the exception that for a new file, the technician must create a new shapefile or feature class first.

When it comes time to create new vector layers, there really aren't any specific rules about the right way to organize the data, but there are some rough, expected standards. In general, separate vector layers are often created for each feature or group of features to be collected. Examples could include a Rivers polyline layer, a Roads polyline layer, a Rails polyline layer with railways, and a Counties polygon layer. With the small size of vector files, it’s a good idea to go ahead and break things out as far as you can initially, since you can always combine layers together later on. For example, if you end up with separate Roads, Railways, and Freeways layers which contain eight features each, a single "Transportation" layer with 24 features could organize the data without confusing the end-user, as roads, rails, and freeways are all types of transportation.
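
Combining layers later on is easy as long as each feature remembers what it was. The sketch below models layers as plain Python lists of dictionaries, which is an assumption made for illustration (real layers are shapefiles or feature classes, and a GIS would use a merge tool). The helper `make_layer` and the `type` field are hypothetical.

```python
def make_layer(kind, count):
    """Build a stand-in layer: a list of features tagged with their type."""
    return [{"name": f"{kind} {i + 1}", "type": kind} for i in range(count)]

# Three separate polyline layers with eight features each, as in the text.
roads = make_layer("Road", 8)
railways = make_layer("Railway", 8)
freeways = make_layer("Freeway", 8)

# Combine them into a single Transportation layer.
transportation = roads + railways + freeways
print(len(transportation))

# The "type" attribute still lets the end-user pull one group back out.
only_rail = [f for f in transportation if f["type"] == "Railway"]
print(len(only_rail))
```

Going the other direction, splitting one mixed layer into several, is only this easy if an attribute like `type` was recorded, which is one reason to break things out early.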

Figure 6.3: On-Screen or Heads-Up Digitizing
Heads-up digitizing using a "traditional" mouse and keyboard setup. Heads-up digitizing utilizing a touch screen and stylus.

6.2.4: Aerial and Satellite Photos

The concept of digitizing features really is not that complicated - look at an image and trace features into a new or existing vector layer to later be used for vector analysis. But where do those images come from, and how do technicians get them into the GIS? These raster layers come from a separate but related geospatial science, Remote Sensing, and the products are aerial and satellite images. At this point, it's a fairly safe bet to assume that almost everyone who is taking this class has exposure to web maps such as Google Maps and Google Earth, both of which utilize satellite images and classified rasters known as basemaps, which show things like streets, buildings, parks, and schools in an illustrative manner - no longer an image, but also not vector features. These satellite images and basemaps are a product of Remote Sensing which Google pays for to add context to their website and software.

Remote sensing, defined as collecting images from a distance without actually physically interacting with the landscape, is the primary way a GIS technician obtains imagery to use with heads-up digitizing. Remotely sensed data is collected via aircraft or orbiting satellites, and the quality and type of image collected vary based on the vehicle carrying the sensor collecting the data and the planned goal of the output. Spatial resolution is the ground distance shown along the side of one pixel, so a smaller distance means a higher resolution. Some paid satellite imagery has a very high spatial resolution, like one would see in Google Earth, while free imagery has a lower spatial resolution. High spatial resolution means objects for digitizing seen in the image are shown in more detail and the result of creating classified rasters is a more accurate representation of the landscape, while in lower spatial resolution images it may be more challenging to resolve (that is, to recognize, identify, and digitize) features.
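
A quick calculation shows why spatial resolution matters for digitizing. The sketch below asks how many pixels wide an object appears at two resolutions. The 30-meter value matches the Landsat color resolution quoted later in this section; the 0.5-meter value and the 15-meter-wide house are assumptions chosen only to make the contrast clear.

```python
def pixels_across(object_width_m, resolution_m):
    """How many pixels wide an object appears at a given spatial resolution."""
    return object_width_m / resolution_m

HOUSE_WIDTH_M = 15.0  # assumed size of a house, for illustration only

# Coarse imagery: each pixel covers 30 m of ground along one side.
print(pixels_across(HOUSE_WIDTH_M, 30.0))

# Fine imagery (assumed 0.5 m pixels): the same house spans many pixels.
print(pixels_across(HOUSE_WIDTH_M, 0.5))
```

At 30 meters the house fills only half of one pixel, so there is nothing to trace. At 0.5 meters it is 30 pixels wide and its outline can be digitized.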

Both satellite and aerial sensors can collect different kinds of energy, some visible, like you see and experience the world every day, and some invisible, like the heat coming off your stove - you can feel the heat if you hold your hand over the burner but you can't see the actual heat (you can see effects of the heat, like the metal turning red or the air above becoming visibly wavy, but those are not the actual heat). The ability to collect these different kinds of energy and apply a visible color to them can teach us so many things about the world that we could not normally see, as our eyes are only capable of detecting and processing a very small amount of the energy that surrounds us every second of every day. Once we can look at the world through various lenses beyond our built-in ones, we can digitize all kinds of data and make decisions about things we never thought possible before.

Some of the most commonly used free satellite imagery comes from a long-running US Government program by the name of Landsat, or the Land Satellite (there is also a program called SeaSat). With the first satellite launched in July of 1972 and the eighth launched in February of 2013, the photographic history of almost the entire surface of the Earth (to get the best images, the satellites are set to focus on the mid-latitudes and not take images of the poles) is both fascinating and a great addition to scientific history. Landsat images have a spatial resolution of 30 meters for color images and a spatial resolution of 15 meters for panchromatic images. Satellite images have some awesome advantages over aerial images:

    • Temporal Resolution → since the satellite passes the same place on the Earth's surface on a set schedule, there is the ability to see change over a long period of time.  For example, Landsat takes a picture of the same spot every 16 days.
    • Synoptic view → Since the satellite is so far away, it gives a whole new meaning to the phrase "bird's eye view", meaning the satellite can collect an image of a very large area at one time.  For example, Landsat collects an image that is 185 km wide.
    • Low-cost → Once the satellite launches, as long as it functions properly, the cost per image is relatively low and falls every time the satellite orbits the Earth.  If a satellite cost 855 million dollars to build and it collects only one image, that image cost 855 million dollars.  By the time it collects 855 million images, they cost only a dollar apiece.
    • Collects different kinds of images at once → satellites can collect color images, panchromatic images, and other categories such as infrared and thermal images of the same area, all at the same time, using different sensors.
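
Two of the numbers in the list above lend themselves to a quick calculation: the falling cost per image and the 16-day revisit schedule. The sketch below uses only the figures given in the text; the function names are made up, and spreading the build cost evenly over the images is a simplification that ignores launch and operating costs.

```python
def cost_per_image(build_cost_dollars, images_collected):
    """Spread a one-time build cost evenly over every image collected so far."""
    return build_cost_dollars / images_collected

def passes_per_year(revisit_days, days_in_year=365):
    """Count the complete revisit cycles that fit in one year."""
    return days_in_year // revisit_days

BUILD_COST = 855_000_000  # dollars, the example figure used in the text

# The cost per image falls as the number of images grows.
for images in (1, 1_000, 855_000_000):
    print(images, cost_per_image(BUILD_COST, images))

# With a 16-day revisit, how many looks at the same spot in one year?
print(passes_per_year(16))
```

With one image the cost is the full 855 million dollars, with 855 million images it is one dollar each, and a 16-day schedule gives 22 complete passes over the same spot per year.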
Figure 6.4: Examples of Remotely Sensed Images
Remotely sensed images of different coastlines. A. Interferometric Synthetic Aperture Radar (IFSAR) data (using microwaves to collect data for creating representative images), B. topographic and bathymetric lidar data (height and elevation data represented by colors), C. hyperspectral imagery (adding color to invisible energy), and D. digital photography (human eye visible colors represented as themselves)

Aerial photos, images collected by aircraft such as planes and helicopters, have some advantages and disadvantages when compared to satellite images, even though the products are the same. Since the camera is carried on an aircraft flown with the intent of collecting images, the scenes collected are targeted and the end-user doesn't need to wait for a particular orbit. But on the flip side, aerial images have a very high cost due to the fact that someone has to pay the pilot, the person operating the sensors, and for the fuel. Airplanes and helicopters can collect images with a much higher spatial resolution, but they can only fly when the weather is desirable and, if they are not approved to fly at night, only during the day. Satellites, by comparison, keep collecting images 24 hours a day, 365 days a year, no matter the weather or time of day, although clouds can still hide the ground from sensors that record visible light. Another advantage of aircraft shows up with natural disasters and other emergencies: aircraft can respond immediately to collect images for first responders, whereas satellites sometimes collect really nice images of the event, but only by chance.

Neither satellite images nor aerial photos are the "right" choice, nor is one better than the other. Each one has its advantages and disadvantages, so the choice to use one over the other really comes not from some pre-established set of rules, but from what is the more correct choice for a project, what funding is available to a particular GIS shop, and what sources of data that shop might have at its disposal.

GIS 101 will use some satellite imagery from time to time, mostly in the capacity of a basemap or as a source to digitize new vector features (one topic of this chapter), but this class will not go too far into how to get that data or the many, many uses of said data. There are entire classes dedicated to the topics of collecting and using remotely sensed data.

6.2.5: Scanning and Georeferencing

For many years, paper maps were the heart of cartography and spatial analysis. While computers have become the norm for their ease of use and ability to store large amounts of related data at one time, past paper maps still contain a wealth of information that has yet to be input into the computer. This is where digitizing a scanned image comes into play. Using a flatbed or large rolling scanner, paper maps and other hard copy images can be scanned into the GIS and then georeferenced for use with digitization.

Figure 6.5: Three Styles of Scanners
Smaller maps such as plat maps can be scanned with a consumer-sized scanner. Drum scanners can handle larger maps and scan faster. Large-format rolling scanners work similarly to a photocopy machine for very large maps.

6.2.6: Georeferencing

Once maps have been scanned into digital images, they are ready to be brought into the GIS, but they have a huge problem - they have no idea where in the world they live. When spatial raster and vector data are added to a GIS project, the first thing which happens is they are placed exactly where they belong, meaning each pixel corner or each vertex is placed in the proper spot within the GIS's coordinate system. After a map is scanned, it can be added to the GIS project, but right off the bat, the image is no better than an image of Grumpy Cat - each one is just a raster image.

In order to "tell" a raster map where it "lives" in the world, a GIS technician must go through a process of georeferencing. Georeferencing is the process of finding distinct objects in the scanned map and connecting each one to a known location within another raster or a vector layer by creating control points. We saw the term "control points" back in Chapter Two when we learned how geoids and reference ellipsoids are connected, and the meaning is no different in this case. Control points are simply matched locations between two surfaces, sometimes two mathematical models such as a geoid and a reference ellipsoid and sometimes an image and a coordinate pair, such as a freshly scanned map and either a vector layer which matches some boundary in said map or a spatial raster depicting the same area. Either way, we are using the term "control" to note that coordinates in one of the layers or mathematical models are a constant and unchanging standard of comparison.

For example, if a map of the western United States was scanned, the process of georeferencing it would go as follows: first, the technician locates a place in the image (the unknown) which is clear and distinct, such as the corner of a state if it is visible in the image; second, the technician utilizes the georeferencing control point tool to mark the location of the selected visible place (the candidate control point); third, the technician uses the georeferencing control point tool to mark the selected location (the destination control point) on the spatially referenced raster or vector layer (the known); fourth, the technician repeats these steps until the unknown image visually lines up with the known location.

The process of georeferencing within ArcMap uses a constantly updating method; that is to say, as the technician marks features with control points, the candidate control point immediately snaps to the destination control point. At first, the image seems very stretched and extremely distorted, but as the process continues, the image begins to become legible again. When georeferencing, it is important to only put as many control points as are needed to set the image in place. This method is definitely a case of “less is more”.
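
To see how a handful of control points can place an entire image, consider one common choice of math: a first-order (affine) transformation, which shifts, scales, rotates, and skews the image and needs at least three control points that do not sit on one straight line. The sketch below is a pure-Python illustration under that assumption. It is not ArcMap's implementation, the function names are hypothetical, and the pixel and map coordinates are invented.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if d == 0:
        raise ValueError("control points are collinear; pick a better spread")
    solution = []
    for col in range(3):
        # Swap one column for the right-hand side, as Cramer's rule requires.
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution

def fit_affine(control_points):
    """Fit x = a*col + b*row + c and y = d*col + e*row + f.

    control_points: exactly three ((col, row), (x, y)) pairs, where (col, row)
    is the candidate point on the scanned image and (x, y) is the destination
    point on the known layer.
    """
    m = [[float(col), float(row), 1.0] for (col, row), _dest in control_points]
    xs = [x for _src, (x, _y) in control_points]
    ys = [y for _src, (_x, y) in control_points]
    return solve3(m, xs), solve3(m, ys)

def apply_affine(params, col, row):
    """Convert any pixel position on the scanned image to map coordinates."""
    (a, b, c), (d, e, f) = params
    return (a * col + b * row + c, d * col + e * row + f)

# Three made-up control points: image pixel -> known map coordinate.
control_points = [
    ((0, 0), (1000.0, 2000.0)),
    ((100, 0), (1200.0, 2000.0)),
    ((0, 100), (1000.0, 1800.0)),
]
params = fit_affine(control_points)

# Every other pixel in the image can now be placed, not just the three marked.
print(apply_affine(params, 50, 50))
```

The collinearity check mirrors the advice about spread: three points along a single road cannot pin the image down, while three points spread across the map can.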

It's important to note that the term georeferencing has only a single meaning in GIS - defining the spatial location of an unknown image by creating control points between distinct objects in said image and a known raster or vector layer. It does not mean to search for and download imagery, nor to add any raster imagery to the GIS, nor does it mean to find locations within known imagery, nor any sort of geoprocessing or spatial analysis. It is incorrect to use the term in reference to any other task within the GIS beyond the singular meaning.

Figure 6.6: An Example of a Single Control Point in ArcGIS
In this image, we see the unknown raster image being georeferenced to a vector roads layer, seen in thick red and thin yellow lines.  The candidate control point (green cross, upper middle) is marked first, followed by the destination control point (yellow cross, lower left).  The process of georeferencing is a repetitive puzzle, looking for distinct locations such as road intersections in the unknown raster image, and finding those in the known raster or vector layer.  The finesse comes from selecting the correct locations across the correct spread to make the unknown image pop into place. 

6.2.7: Required reading: Scan the Contents of this Article

6.2.8: Creating New Vector Layers in ArcMap

Many times in GIS, we are digitizing features by adding them to an existing layer, meaning we take a shapefile or feature class that we already have and add new features to it so the total number of features increases. For example, if you downloaded a road centerlines feature class from the City of Fort Collins and realized that an entire neighborhood was missing, you would use an image of the city as a reference and edit the existing layer, adding the missing roads via digitizing. The feature class, when downloaded, may have 325 features (rows in the attribute table) and after you've digitized the missing neighborhood, the count might increase to 412 because you found 87 missing roads.

The rest of the time, in order to digitize features which do not exist into a shapefile or feature class which does not exist, you would first need to create a new vector file. Just like when you sit down to write a paper for English class or send an email to your Grammy, the first step is to create a new, blank document or email. You cannot complete the paper or write the email if you do not first click the "New" button. GIS is no different - if you do not have a vector layer in which to digitize the new features, you cannot complete the task.

Creating a new shapefile or feature class is a fairly simple process and is practically the same for both, with a few slight differences. Remembering that shapefiles live in folders and feature classes live in geodatabases, the technician needs to identify where the new file will be created, give it a unique name which follows the GIS naming convention rules, pick a geographic or projected coordinate system, and decide which of the three geometry types the new shapefile or feature class needs to be, because remember, polygon vector files can only contain polygons, polyline files can only contain polylines, and point files can only contain points. If you need to digitize features of more than one geometry type, you'll need to create more than one feature class or shapefile.
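
The one-geometry-type rule can be modeled with a small stand-in class. This is a teaching sketch, not ArcGIS code: `VectorLayer` is a hypothetical class, the layer lives only in memory, and the coordinate system is stored as a plain text label rather than a real spatial reference.

```python
class VectorLayer:
    """A stand-in for a shapefile or feature class with one geometry type."""

    GEOMETRY_TYPES = ("point", "polyline", "polygon")

    def __init__(self, name, geometry_type, coordinate_system):
        if geometry_type not in self.GEOMETRY_TYPES:
            raise ValueError(f"unknown geometry type: {geometry_type}")
        self.name = name
        self.geometry_type = geometry_type
        self.coordinate_system = coordinate_system  # just a label in this sketch
        self.features = []

    def add_feature(self, geometry_type, vertices):
        """Add a digitized feature, refusing any other geometry type."""
        if geometry_type != self.geometry_type:
            raise TypeError(
                f"{self.name} holds {self.geometry_type} features only"
            )
        self.features.append(list(vertices))

# Create the empty layer first, then digitize into it.
roads = VectorLayer("roads", "polyline", "WGS 1984")
roads.add_feature("polyline", [(0.0, 0.0), (5.0, 5.0)])

try:
    # A lake outline does not belong in a polyline layer.
    roads.add_feature("polygon", [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
except TypeError as error:
    print(error)

print(len(roads.features))
```

The lake needs its own polygon layer, which is why digitizing more than one geometry type means creating more than one shapefile or feature class.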

Take a few minutes to look over the following table showing you how to create new shapefiles and feature classes.  Remember, you don't need to memorize the steps, but it's a good thing to review the process before you are asked to complete it in a lab situation. 

Figure 6.7: Overview of Creating New Shapefiles and Feature Classes
    1. Right-click on the folder (shapefile) or geodatabase/feature dataset (feature class) where the new layer will be added
    2. Mouse down to New...
       [Screenshots: New Shapefile; New Feature Class]
    3. Find and select “Shapefile” from the list (or “Feature Class” in the case of a geodatabase)
    4. In the New Shapefile dialog box, give your new shapefile or feature class a name and assign a geometry type. The New Feature Class dialog box differs slightly: it has a series of screens, advanced through with a "Next >" button, and a few more options, including the ability to add fields before the layer is created, not after as with a shapefile (via Add Field in the attribute table’s Table Options menu)
    5. Click “Edit” to assign a geographic or projected coordinate system to a new shapefile OR "Next" in the New Feature Class window to advance to the Coordinate Systems page
       [Screenshots: Shapefile Coordinate System Selection Box; New Feature Class Coordinate System Box]
    6. Optionally, use the “Add Coordinate System” button at the top to Import or “borrow” a coordinate system from any other layer on the computer:
        1. Click Add Coordinate System
        2. Click Import
        3. Find the layer you’d like to borrow from by navigating to the file and highlighting it (single click)
        4. Click Okay
        5. The New Shapefile (or feature class) dialog box will populate with the identical coordinate system

6.2.9: Geocoding

We know that geographic coordinate systems are a worldwide "address" system, marking locations on the Earth's surface in order to record or navigate to a location.  We also know that in the postal system, actual postal addresses are part of a similar system, with streets running more or less North/South and East/West and the building number marking the location along that street.  What you might not know, however, is these two systems are actually correlated.

When you use a mapping app such as Google Maps or Apple Maps, the system is actually looking up the address in a stored table that lists the building address as you input it into the system and the latitude/longitude of that address as found in a geographic coordinate system. When you ask the app to navigate you to your destination, the app accepts the address as you've entered it, looks it up in a table of known addresses, compares what you've entered to what is considered "correct" by the Post Office, offers you any suggestions or corrections if needed, then finds the corresponding latitude and longitude coordinate pair, to which the app will actually navigate, not the address as typed. This process of converting addresses to geographic coordinates, both with a navigation app and in the GIS, is called geocoding, while reverse geocoding is the process of taking geographic coordinates and finding the associated address.

Many times in the GIS, we need to create a point layer based on addresses, and we accomplish this task with geocoding. Similar to the process the navigation app uses, in order to create an address-based point layer, we need to first create a table of addresses and use either a web-based or built-in address locator to match each address to its geographic coordinates. Address locators are lookup tables consisting of the house/building numbers, street names, block numbers, the odd/even sides of the street, the pattern of house/building numbers in that area, the zip codes, and the associated geographic coordinates. These address locators are the key to pairing the addresses as listed by the postal service with the geographic coordinate pair.
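
At its simplest, an address locator behaves like a lookup table. The sketch below is a toy version built on that idea: the addresses and coordinates are invented, the function names are hypothetical, and a real locator also handles street ranges, odd/even sides, and zip codes, which this example leaves out. Reverse geocoding here simply picks the nearest stored address using straight-line distance in degrees, which is crude but enough to show the idea.

```python
import math

# Hypothetical address locator: every address and coordinate is made up.
LOCATOR = {
    "100 EXAMPLE ST": (40.5850, -105.0840),
    "200 SAMPLE AVE": (40.5900, -105.0760),
}

def normalize(address):
    """Tidy up what the user typed so it matches the stored form."""
    return " ".join(address.upper().replace(".", "").split())

def geocode(address):
    """Address in, (latitude, longitude) out, or None if not found."""
    return LOCATOR.get(normalize(address))

def reverse_geocode(lat, lon):
    """Coordinates in, nearest stored address out."""
    return min(LOCATOR, key=lambda addr: math.dist(LOCATOR[addr], (lat, lon)))

print(geocode("100 example st."))
print(reverse_geocode(40.5851, -105.0839))
```

Running `geocode` over a whole table of addresses and keeping each returned coordinate pair as one vertex is, in spirit, how an address-based point layer gets built.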

Geocoding and reverse geocoding can be completed with either web-based services or built-in GIS tools. Web-based services have everything ready to go, without the intermediary steps that built-in GIS tools require; however, they often have limitations such as single-country or single-continent address locators, a maximum number of addresses which can be geocoded each day without cost, or only current addresses. Built-in geocoding tools require the technician to create the address locator before geocoding the addresses, but there are no such limitations, meaning large numbers of addresses, historic addresses, or multi-country addresses can be geocoded. The technician guides the process, but the task takes longer overall.

Figure 6.8: Geocoding
A list of Colorado coffee shop addresses has been geocoded: the addresses have been converted to geographic coordinates and now populate a point shapefile.