ArcGIS is an example of a relational database management system (RDBMS) that uses geodatabases, a specific kind of relational database. This means ArcGIS works like a relational database in order to store, organize, edit, and analyze data. To understand what a geodatabase is, we first need to understand what a relational database is. Simply put, a relational database is an electronic storage container with a top-down structure in which the items contained are related to each other and that relationship allows for the data to be quickly and efficiently queried and retrieved for use.
Let's take a minute to look at all of those parts one at a time.
Relational database structure is based upon organizational fields called keys, which allow the contents to be related to each other in a meaningful way. Relational databases, in reality, are larger and more complex than the example we are about to look at, but starting with a very simple example is a good way to begin to understand the basic structure and rules of a database.
For an example, let’s build a database to organize your intramural softball team. The first step is deciding what information you want to store. Let’s go with player name and position played, to keep it simple. The next step is to make a list.
But Sally and Jim both play a second position, while Sam doesn’t. For simplicity, we will just say that databases do not operate correctly with empty cells, so you couldn’t just add a second “Position” field. To overcome this empty cell problem, you make three new tables: “Players”, “PositionOne”, and “PositionTwo”.
Okay, wait. You solved your empty cell problem, but you created a new problem: who plays what position? You could add a "Player" field to each of the position tables, and carry that method into the player address table, the player emergency contact table, and the player birthday table. While this is a viable solution, it does not use the power of the relational database. What if Sally moved away and you needed to reassign all of her positions to Evan, the new guy on the team? If you have Sally's name in every place where she could be replaced with another person, you'd have to manually update all of the data in all of those tables. Or you could assign each player a number. Then, when it came time to replace Sally in the database, you would really just need to re-assign the player number "3" from Sally to Evan. This is what makes the relational database such a powerful tool in data organization. We call this "player number" the primary key, and it is the source of the power in the relational database. By assigning each player a number in the database, it becomes so much easier to query and update the tables when needed. This is also where we get the part of the definition:
...the items contained are related to each other and that relationship allows for the data to be quickly and efficiently queried and retrieved for use.
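The reassignment described above can be sketched in a few lines of Python using the standard-library `sqlite3` module. This is only an illustration under stated assumptions: the positions, and the player numbers for Jim and Sam, are invented (the text only fixes Sally as player number 3), and SQLite stands in for whatever database engine is actually in use.

```python
import sqlite3

# In-memory database; positions and most player numbers are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players     (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE PositionOne (player_id INTEGER REFERENCES Players(player_id),
                              position  TEXT NOT NULL);
    CREATE TABLE PositionTwo (player_id INTEGER REFERENCES Players(player_id),
                              position  TEXT NOT NULL);
""")
conn.executemany("INSERT INTO Players VALUES (?, ?)",
                 [(1, "Jim"), (2, "Sam"), (3, "Sally")])
conn.executemany("INSERT INTO PositionOne VALUES (?, ?)",
                 [(1, "Pitcher"), (2, "Catcher"), (3, "First Base")])
# Sam has no second position, so he simply has no row here (no empty cell).
conn.executemany("INSERT INTO PositionTwo VALUES (?, ?)",
                 [(1, "Shortstop"), (3, "Left Field")])


def positions_for(player_id):
    """Return (name, [positions]) for one player number."""
    name = conn.execute(
        "SELECT name FROM Players WHERE player_id = ?", (player_id,)
    ).fetchone()[0]
    positions = []
    for table in ("PositionOne", "PositionTwo"):  # fixed, trusted table names
        rows = conn.execute(
            f"SELECT position FROM {table} WHERE player_id = ?", (player_id,)
        )
        positions += [row[0] for row in rows]
    return name, positions


before = positions_for(3)
# Sally moves away: re-assign player number 3 to Evan with one single-row update.
conn.execute("UPDATE Players SET name = ? WHERE player_id = ?", ("Evan", 3))
after = positions_for(3)

print(before)
print(after)
```

Neither position table is touched by the update; they only ever stored the number 3, so Evan inherits both of Sally's positions automatically.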
This concept is, in a nutshell, how databases work. The key allows us to have a theoretically endless number of tables, to populate those tables with less data, and to update them with greater ease than if we used an alphanumeric key (i.e., the player's name). Another benefit of the key value is the ability to break data down into categories and have only one table per category. When it comes time to update the data, you would only need to update the data for that specific category, and since each category of data has its own table, finding the fields to update is a boatload easier. It's also easier for a database to retrieve the data when queried if each table has a single purpose: player's position, player's address, player's phone number, player's emergency contact, etc. This leads into why we are looking at database structure and rules in an introduction to GIS course. The organization ArcGIS uses is based on database concepts, which is why we see shapefiles with unique IDs and the use of geodatabases.
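To illustrate the one-table-per-category idea, here is a minimal Python sketch using the standard-library `sqlite3` module. The table names `Phones` and `EmergencyContacts`, the phone numbers, and the contact names are all hypothetical; only the pattern (each category in its own table, everything tied together by the player number) comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players           (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Phones            (player_id INTEGER REFERENCES Players(player_id),
                                    phone TEXT NOT NULL);
    CREATE TABLE EmergencyContacts (player_id INTEGER REFERENCES Players(player_id),
                                    contact_name TEXT NOT NULL);
""")
# Hypothetical sample data, one table per category.
conn.executemany("INSERT INTO Players VALUES (?, ?)", [(1, "Jim"), (2, "Sam")])
conn.executemany("INSERT INTO Phones VALUES (?, ?)",
                 [(1, "555-0101"), (2, "555-0102")])
conn.executemany("INSERT INTO EmergencyContacts VALUES (?, ?)",
                 [(1, "Pat"), (2, "Lee")])


def contact_card(player_id):
    """Gather one player's details from every category table via the key."""
    return conn.execute(
        """
        SELECT p.name, ph.phone, ec.contact_name
        FROM Players AS p
        JOIN Phones AS ph            ON ph.player_id = p.player_id
        JOIN EmergencyContacts AS ec ON ec.player_id = p.player_id
        WHERE p.player_id = ?
        """,
        (player_id,),
    ).fetchone()


card_before = contact_card(1)
# Jim gets a new phone number: only the Phones table needs to change.
conn.execute("UPDATE Phones SET phone = ? WHERE player_id = ?", ("555-0199", 1))
card_after = contact_card(1)

print(card_before)
print(card_after)
```

The query never searches by name; it follows the player number from table to table, which is the behavior the key is meant to enable.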
Geodatabases, represented in ArcGIS with a file icon, are simply databases which store and relate both spatial and non-spatial data, such as feature classes (spatial vector files which live in a geodatabase, not to be confused with shapefiles, which are similar but live in folders instead), feature datasets (sub-containers which store a collection of similar feature classes all with the same coordinate system), raster catalogs (sub-containers for storing similar and related raster data), annotations (spatially related labels), topologies (spatial rules that define how and where vector data, both the vertices and the lines which connect those vertices, can and can't interact with each other), address locators (data tables of addresses and their corresponding geographic coordinates, used with geocoding), network datasets (fastest route, shortest route, etc.), geometric networks (define the flow of feature classes such as water, oil, and traffic), parcel fabrics (used to create and organize parcel, or property, data), spatial and non-spatial data tables, and terrains (like TINs).
Even though you may not understand what some (or all) of the items listed above are, looking at their jobs (in parentheses) shows how they interact with each other or how they represent rules of travel, boundary definition, labeling, and intradata interactions. Geodatabases are much more powerful than anything we accomplish in GIS 101. In fact, of the items listed above, we only look at five: feature classes, feature datasets, raster catalogs, address locators, and spatial/non-spatial data tables. And even then, we don't use the relationship properties of the geodatabase. We really just use the geodatabase as a storage folder, since we are just getting used to the purpose and the power of geodatabases.
Figure 4.13: Looking at the Nesting Nature of Geodatabases as Represented by Star Wars Nesting Dolls
On a Side Note...
For the scope of this class, you don’t need to memorize everything which can go into a geodatabase, nor would you have to design one on a test, like the example above.
You do, however, need to understand and be able to define the terms: database, geodatabase, feature classes, feature datasets, and raster catalogs.
Feature datasets, represented with a file icon, are collections of vector feature classes which all share the same geographic or projected coordinate system. They can also include topologies, network datasets, terrains, geometric networks, and parcel fabrics. By storing these specific vector files (feature classes, terrains, and parcel fabrics) with files which can define rules and interactions between said vector files (topologies, network datasets, and geometric networks), spatial problems can be solved that couldn’t be solved without those interactions and rule definitions. It's the geodatabase, the defined relationships between the files, and the sub-containers that allow for some really awesome and really powerful data interaction.
While feature datasets can be organized in any manner the user sees fit, most often, they are organized by some sort of theme. One analyst might use categories like "transportation", "water", and "urban", while another might use "buildings", "streets", and "natural_resources". How the data is organized is most often defined by the place of employment, since that company or agency has a predetermined data model that keeps everyone on track and data organized in a manner logical for the projects and people who work for, or are serviced by, that specific company or agency.
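The nesting described above (a geodatabase holding themed feature datasets, each holding feature classes that share one coordinate system) can be modeled with a few plain Python classes. This is a conceptual sketch only: the class names, the feature class names, and the EPSG codes are hypothetical, none of this is the ArcGIS API, and rejecting a mismatched coordinate system outright is a simplification rather than a description of how ArcGIS behaves.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureClass:
    name: str
    coordinate_system: str  # a coordinate system label, e.g. an EPSG code


@dataclass
class FeatureDataset:
    """Themed sub-container whose members must share one coordinate system."""
    name: str
    coordinate_system: str
    feature_classes: dict = field(default_factory=dict)

    def add(self, fc: FeatureClass) -> None:
        # Toy rule: refuse anything that does not match the dataset's system.
        if fc.coordinate_system != self.coordinate_system:
            raise ValueError(
                f"{fc.name} uses {fc.coordinate_system}, "
                f"but {self.name} requires {self.coordinate_system}"
            )
        self.feature_classes[fc.name] = fc


@dataclass
class Geodatabase:
    """Top-level container holding feature datasets, keyed by theme name."""
    name: str
    feature_datasets: dict = field(default_factory=dict)

    def add(self, dataset: FeatureDataset) -> None:
        self.feature_datasets[dataset.name] = dataset


gdb = Geodatabase("city")
transportation = FeatureDataset("transportation", "EPSG:26913")
gdb.add(transportation)

transportation.add(FeatureClass("streets", "EPSG:26913"))      # accepted
try:
    transportation.add(FeatureClass("bus_stops", "EPSG:4326"))  # rejected
    rejected = False
except ValueError as err:
    rejected = True
    print(err)

print(sorted(gdb.feature_datasets["transportation"].feature_classes))
```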
Figure 4.14: The File Icons Associated with Feature Datasets
Raster catalogs, represented with a file icon, are collections of rasters which are organized and defined by a key. Much like our example of the softball database, a raster catalog has a series of “players” (each individual raster) and “positions” (where they are in the world), all organized by the “player key”. Raster catalogs often hold images which relate to each other, such as images over time (change detection, such as bare ground to forest) or images which partially or totally overlap. Using a raster catalog and its accompanying key allows related rasters to be grouped logically.
We don't get into the use of raster catalogs in GIS 101, but it's important to be aware that geodatabase relationships don't stop with vector and vector-related files. GIS 101 is a vector-based course in which rasters don't really come into play. Because of this, many GIS 101 students do not fully grasp that rasters are a huge player in GIS, remote sensing, and spatial analysis. Being able to define and understand the basic purpose of raster catalogs is plenty for this course.
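As a rough illustration of a keyed raster collection, the Python sketch below stores one record per raster under an integer key and then answers the two kinds of questions mentioned above: which rasters overlap an area, and which rasters cover the same area over time. The record fields, raster names, years, and extents are all hypothetical, and this is a conceptual model rather than how ArcGIS stores a raster catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RasterEntry:
    raster_id: int   # the catalog key (the "player number")
    name: str
    year: int
    extent: tuple    # (xmin, ymin, xmax, ymax) in some shared coordinate system


# Hypothetical catalog: two images of the same site, a neighbor, and a far tile.
entries = [
    RasterEntry(1, "site_2000", 2000, (0, 0, 10, 10)),
    RasterEntry(2, "site_2020", 2020, (0, 0, 10, 10)),
    RasterEntry(3, "neighbor_2020", 2020, (8, 8, 20, 20)),
    RasterEntry(4, "far_away_2020", 2020, (100, 100, 110, 110)),
]
catalog = {entry.raster_id: entry for entry in entries}


def extents_overlap(a, b):
    """True when two (xmin, ymin, xmax, ymax) boxes share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def rasters_overlapping(extent):
    """Keys of every catalog raster that overlaps the given extent."""
    return sorted(
        rid for rid, entry in catalog.items()
        if extents_overlap(entry.extent, extent)
    )


def time_series(extent):
    """Names of rasters with exactly this extent, oldest first."""
    matches = [e for e in catalog.values() if e.extent == extent]
    return [e.name for e in sorted(matches, key=lambda e: e.year)]


overlapping = rasters_overlapping((9, 9, 12, 12))
series = time_series((0, 0, 10, 10))
print(overlapping)
print(series)
```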