Section Five: What Are Relational Databases and Geodatabases

ArcGIS is an example of a relational database management system (RDBMS) utilizing geodatabases - a specific kind of relational database. This means ArcGIS works like a relational database in order to store, organize, edit, and analyze data. If we want to understand what a geodatabase is, we first need to understand what a relational database is. Simply put, a relational database is an electronic storage container with a top-down structure in which the items contained are related to each other, and that relationship allows the data to be quickly and efficiently queried and retrieved for use.

Let's take a minute to look at all of those parts one at a time. 

A relational database is an electronic storage container... The database is a method of storing related objects on the computer. Databases are based on paper storage and organizational methods, but the computer, as we know, makes it all so much faster and more efficient.
...with a top-down structure... The database uses a structure where the highest organizational item is the largest unit, which allows a cascading organization with regard to size. An example of a top-down structure is how the company or agency you work for is (most likely) set up. Companies and agencies are overseen by a director or CEO. Below that person is some sort of upper management. They oversee some sort of middle management, who in turn oversee the general lower-level workers. If you have a problem with your direct supervisor, you most likely need to follow the chain of command to file a complaint. Unless your supervisor's supervisor is the CEO or director, you need to pass through middle or upper management first.
The database is set up in a similar manner. The largest organizational unit houses the next smaller unit, which holds all the participating materials. Keep in mind this is a simplified view, but it is a good way to start to understand the structure of the relational database.
...in which the items contained are related to each other... In order to have a high-quality relational database, it's important to store only items which are part of the project. Much like the company structure example, all of the people who work for a given company or agency are working towards a common goal of producing a product or service. There are lots of moving parts to a company or agency, which requires a large number of people (in a medium to large sized company or agency). Yet there is not one huge company that everyone works for with the goal of accomplishing all the things that make the world go round. We have many companies and many agencies to accomplish many goals. Databases are the same way - each one has a single goal or project. It would be confusing and disorganized to have all of your projects in all of your classes stored in one database. You wouldn't find anything, things would be unrelated, and the database would take up a huge amount of space on a disk. Imagine trying to back up your day's work and needing to back up 875 GB each time. You'd be waiting forever!
...and that relationship allows for the data to be quickly and efficiently queried and retrieved for use. Storing all of the data you need for a project in a single database does not, by itself, make use of the power of relational databases. The key comes from that word relational. The power of the database comes from the ability to query, or ask questions of, the database. If you store related data and you want to ask a question of the database, you would query it using a query language such as SQL, the Structured Query Language (which we will get really, really good at soon enough). For example, which positions on my intramural softball team does Sally play?
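The question about Sally can be sketched as an actual SQL query. The snippet below is a minimal illustration using Python's built-in sqlite3 module rather than ArcGIS; the table name "roster" and the specific positions are made up for the example and are not taken from the course data.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per (player, position) pair.
conn.execute("CREATE TABLE roster (player TEXT, position TEXT)")
conn.executemany(
    "INSERT INTO roster VALUES (?, ?)",
    [
        ("Sally", "pitcher"),      # positions are invented for illustration
        ("Sally", "shortstop"),
        ("Jim", "catcher"),
        ("Jim", "first base"),
        ("Sam", "left field"),
    ],
)

# The query: which positions does Sally play?
rows = conn.execute(
    "SELECT position FROM roster WHERE player = ? ORDER BY position",
    ("Sally",),
).fetchall()
sally_positions = [row[0] for row in rows]
print(sally_positions)
```

The SELECT statement is the "question"; the database does the work of finding the matching rows.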

Relational database structure is based upon organizational fields called keys, which allow the contents to be related to each other in a meaningful way. Relational databases, in reality, are larger and more complex than the example we are about to look at, but starting with a very simple example is a good way to begin to understand the basic structure and rules of a database.

For an example, let’s build a database to organize your intramural softball team. The first step is deciding what information you want to store. Let’s go with player name and position played, to keep it simple. The next step is to make a list.


But Sally and Jim both play a second position, while Sam doesn’t. For simplicity, we will just state databases do not operate correctly with empty cells, so you couldn’t just add a second “Position” field. To overcome this empty cell problem, you make three new tables - “Players”, “PositionOne”, and “PositionTwo”.


Okay, wait. You solved your empty cell problem, but you created a new problem - who plays what position? You could add a "Player" field to each of the position tables, and carry that method into the player address table, the player emergency contact table, and the player birthday table. While this is a viable solution, it is not using the power of the relational database. What if Sally moved away and you need to reassign all of her positions to Evan, the new guy on the team? If you have Sally's name in every place where she could be replaced with another person, you'd have to manually update the data in all of those tables. Or you could assign each player a number; then, when it came time to replace Sally in the database, you would really just need to re-assign the player number "3" from Sally to Evan. This is what makes the relational database such a powerful tool in data organization. We call this "player number" the primary key, and it is the source of the power in the relational database. By assigning each player a number in the database, it becomes so much easier to query and update the tables when needed. This is also where we get the part of the definition:
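As a rough sketch of the player-number idea, the following uses Python's built-in sqlite3 module. The table and column names (players, positions, player_id) and the positions themselves are assumptions made for this illustration; only the idea of re-assigning player number 3 from Sally to Evan comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "players" holds the primary key; "positions" refers to players by number only.
conn.executescript("""
CREATE TABLE players (
    player_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE positions (
    player_id INTEGER REFERENCES players(player_id),
    position  TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [(1, "Jim"), (2, "Sam"), (3, "Sally")],
)
conn.executemany(
    "INSERT INTO positions VALUES (?, ?)",
    [(1, "catcher"), (1, "first base"), (2, "left field"),
     (3, "pitcher"), (3, "shortstop")],  # invented positions
)

# Sally moves away: one change in one table re-assigns player number 3.
conn.execute("UPDATE players SET name = ? WHERE player_id = ?", ("Evan", 3))

# The positions table was never touched, yet Evan now holds Sally's positions.
evan_positions = [
    row[0]
    for row in conn.execute(
        """
        SELECT pos.position
        FROM positions AS pos
        JOIN players AS p ON p.player_id = pos.player_id
        WHERE p.name = ?
        ORDER BY pos.position
        """,
        ("Evan",),
    )
]
print(evan_positions)
```

Because every other table stores only the number 3, the name has to change in exactly one place.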

...the items contained are related to each other and that relationship allows for the data to be quickly and efficiently queried and retrieved for use.


This concept is, in a nutshell, how databases work. The key allows us to have a theoretically endless number of tables, to populate those tables with less data, and to update them with greater ease than if we used an alphanumeric key (i.e., the player's name). Another benefit of the key value is the ability to break data down into categories, and have only one table per category. When it comes time to update the data, you would only need to update the data for that specific category - and since each category of data has its own table, finding the fields to update is a boatload easier. It's also easier for a database to retrieve the data when queried if each table has a purpose - player's position, player's address, player's phone number, player's emergency contact, etc. This leads into why we are looking at database structure and rules in an introduction to GIS course. The organization ArcGIS uses is based on database concepts, which is why we see shapefiles with unique IDs and the use of geodatabases.
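The one-table-per-category idea might look something like the sketch below, again using Python's built-in sqlite3 module. All table names, phone numbers, and contact names here are hypothetical placeholders. The point is that a phone number change touches only the phone table, while a query can still pull everything about one player together through the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per category of data, all tied together by player_id.
conn.executescript("""
CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE phones (
    player_id INTEGER REFERENCES players(player_id),
    phone     TEXT
);
CREATE TABLE emergency_contacts (
    player_id INTEGER REFERENCES players(player_id),
    contact   TEXT
);
""")
conn.executemany("INSERT INTO players VALUES (?, ?)",
                 [(1, "Jim"), (2, "Sam"), (3, "Sally")])
conn.executemany("INSERT INTO phones VALUES (?, ?)",
                 [(1, "555-0101"), (2, "555-0102"), (3, "555-0103")])
conn.executemany("INSERT INTO emergency_contacts VALUES (?, ?)",
                 [(1, "Pat"), (2, "Lee"), (3, "Kim")])

# Sam gets a new phone number: only the phones table changes.
conn.execute("UPDATE phones SET phone = ? WHERE player_id = ?", ("555-0199", 2))

# The key lets a single query gather one player's data from every category.
profile = conn.execute(
    """
    SELECT p.name, ph.phone, ec.contact
    FROM players AS p
    JOIN phones AS ph ON ph.player_id = p.player_id
    JOIN emergency_contacts AS ec ON ec.player_id = p.player_id
    WHERE p.player_id = ?
    """,
    (2,),
).fetchone()
print(profile)
```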

4.5.2: Geodatabases

Geodatabases, represented in ArcGIS with a database file icon, are simply databases which store and relate both spatial and non-spatial data, such as: feature classes (spatial vector files which live in a geodatabase; not to be confused with shapefiles, which are similar but live in folders instead); feature datasets (sub-containers which store a collection of similar feature classes, all with the same coordinate system); raster catalogs (sub-containers for storing similar and related raster data); annotations (spatially related labels); topologies (spatial rules that define how and where vector data, both the vertices and the lines which connect those vertices, can and can't interact with each other); address locators (data tables of addresses and their corresponding geographic coordinates, used with geocoding); network datasets (fastest route, shortest route, etc.); geometric networks (define the flow of feature classes such as water, oil, and traffic); parcel fabrics (used to create and organize parcel, or property, data); spatial and non-spatial data tables; and terrains (like TINs).

Even though you may not understand what some (or all) of the items listed above are, looking at their jobs (in parentheses) shows how they interact with each other or how they represent rules of travel, boundary definition, labeling, and intradata interactions. Geodatabases are much more powerful than anything we accomplish in GIS 101. In fact, of the items listed above, we only look at five - feature classes, feature datasets, raster catalogs, address locators, and spatial/non-spatial data tables. And even then, we don't use the relationship properties of the geodatabase. We really just use the geodatabase as a storage folder, since we are just getting used to the purpose and the power of geodatabases.

Figure 4.13: Looking at The Nesting Nature of Geodatabases as Represented by Star Wars Nesting Dolls
On a Side Note...
For the scope of this class, you don’t need to memorize everything which can go into a geodatabase, nor would you have to design one on a test, like the example above.

You do, however, need to understand and be able to define the terms: database, geodatabase, feature classes, feature datasets, and raster catalogs.

Feature Datasets

Feature datasets, represented with a feature dataset file icon, are a collection of vector feature classes which all share the same geographic or projected coordinate system. They can also include topologies, network datasets, terrains, geometric networks, and parcel fabrics. By storing these specific vector files (feature classes, terrains, and parcel fabrics) with files which can define rules and interactions between said vector files (topologies, network datasets, and geometric networks), spatial problems can be solved that wouldn’t otherwise be possible without the interaction and definitions of rules. It's the geodatabase, the defined relationships between the files, and the sub-containers that allow for some really awesome and really powerful data interaction.

While feature datasets can be organized in any manner the user sees fit, most often, they are organized by some sort of theme.  One analyst might use categories like "transportation", "water", and "urban", while another might use "buildings", "streets", and "natural_resources".  How the data is organized is most often defined by the place of employment, since that company or agency has a predetermined data model that keeps everyone on track and data organized in a manner logical for the projects and people who work for, or are serviced by, that specific company or agency.
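To make the "same coordinate system" rule concrete, here is a toy model in plain Python. It is not ArcGIS code and does not use any Esri API; the class names, the add method, and the coordinate system labels are all invented for illustration, and how ArcGIS itself handles a mismatched feature class is not covered here.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureClass:
    """A stand-in for a vector layer; only the name and coordinate system matter here."""
    name: str
    coordinate_system: str


@dataclass
class FeatureDataset:
    """A themed sub-container whose members must all share one coordinate system."""
    name: str
    coordinate_system: str
    feature_classes: list = field(default_factory=list)

    def add(self, feature_class: FeatureClass) -> None:
        # Toy rule: refuse anything stored in a different coordinate system.
        if feature_class.coordinate_system != self.coordinate_system:
            raise ValueError(
                f"{feature_class.name} uses {feature_class.coordinate_system}, "
                f"but {self.name} requires {self.coordinate_system}"
            )
        self.feature_classes.append(feature_class)


# A hypothetical "transportation" theme, as one analyst might organize it.
transportation = FeatureDataset("transportation", "NAD83 / UTM zone 10N")
transportation.add(FeatureClass("streets", "NAD83 / UTM zone 10N"))

try:
    transportation.add(FeatureClass("bike_paths", "WGS 84"))
    rejected = False
except ValueError as error:
    rejected = True
    print(error)

stored_names = [fc.name for fc in transportation.feature_classes]
print(stored_names)
```

The design choice in this sketch is to enforce the rule at the moment something is added, so the container can never end up holding layers in mixed coordinate systems.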

Figure 4.14: The File Icons Associated with Feature Datasets

Raster Catalogs

Raster catalogs, represented with a raster catalog file icon, are collections of rasters which are organized and defined by a key. Much like our example of the softball database, a raster catalog has a series of “players” (each individual raster) and “positions” (where they are in the world), all organized by the “player key”. Raster catalogs often hold images which relate to each other, such as images over time (change detection - bare ground to forest) or images which partially or totally overlap. Using a raster catalog and its accompanying key allows related rasters to be grouped logically.

We don't get into the use of raster catalogs in GIS 101, but it's important to be aware that geodatabase relationships don't stop with vector and vector-related files. GIS 101 is a vector-based course, and rasters don't really come into play at all. Because of this, many GIS 101 students do not fully grasp that rasters are a huge player in GIS, remote sensing, and spatial analysis. Being able to define and understand the basic purpose of raster catalogs is plenty for this course.