Spatial Databases in a Big Data Environment

Spatial databases can be an important tool in your big data project. Spatial data itself is standardized through the efforts of the Open Geospatial Consortium (OGC), which establishes OpenGIS (Geographic Information System) and a number of other standards for spatial data.

Whether you know it or not, you probably interact with spatial data every day. If you use a smartphone or a Global Positioning System (GPS) device to get directions to a particular place, or if you ask a search engine for seafood restaurants near a physical address or landmark, you are using applications that rely on spatial data.

This matters because spatial databases implement the OGC standards, and those standards may cover specific needs your company has. A spatial database becomes important when an organization begins to work with several different dimensions of data. For example, a meteorologist researching a hurricane might want to store and evaluate temperature, wind speed, and humidity readings and model the results in three dimensions.

In their simplest form, spatial databases store data about 2-dimensional, 2.5-dimensional, and 3-dimensional objects. You are probably familiar with 2D and 3D objects. A 2D object has length and width. A 3D object adds depth to the length and width. A page from a book is a 2D object, while an entire book is a 3D object.

What about 2.5D? 2.5D objects are a special type of spatial data. They are 2D objects with elevation as the extra “half” dimension. Most 2.5D spatial databases contain mapping information and are often referred to as Geographic Information Systems (GISs).
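The difference between 2.5D and true 3D can be made concrete with a small data structure. The sketch below assumes a 2.5D surface is modeled as a mapping from each 2D location to a single elevation value. `Point2D`, `Point3D`, and `Surface25D` are hypothetical names used only for illustration and do not come from any GIS library. Because each location holds only one elevation, the model cannot describe a cave floor and its ceiling at the same spot. A 3D model can.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Point2D:
    x: float
    y: float


@dataclass(frozen=True)
class Point3D:
    x: float
    y: float
    z: float


class Surface25D:
    """A 2.5D surface: stores at most one elevation (z) for each 2D location."""

    def __init__(self) -> None:
        self._elevation: Dict[Point2D, float] = {}

    def set_elevation(self, p: Point2D, z: float) -> None:
        # A second z for the same (x, y) overwrites the first, so the surface
        # cannot describe overhangs, caves, or multi-story buildings.
        self._elevation[p] = z

    def elevation(self, p: Point2D) -> Optional[float]:
        return self._elevation.get(p)

    def to_3d(self) -> List[Point3D]:
        # Lift each (x, y, elevation) sample into a true 3D point.
        return [Point3D(p.x, p.y, z) for p, z in self._elevation.items()]


# Usage: two elevations recorded at the same location collapse into one.
ground = Surface25D()
ground.set_elevation(Point2D(0.0, 0.0), 120.0)
ground.set_elevation(Point2D(0.0, 0.0), 95.0)
print(ground.elevation(Point2D(0.0, 0.0)))  # 95.0
```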

The atomic elements of spatial databases are points, lines, and polygons. They can be combined in any fashion to represent any object constrained by 2, 2.5, or 3 dimensions. Because spatial data objects are so specialized, designers created dedicated indexing mechanisms (spatial indices) to support ad hoc queries and visual representations of the contents of the database.
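Production spatial databases typically use tree-structured indexes such as R-trees; PostGIS, for instance, builds its spatial indexes on PostgreSQL's GiST framework. The sketch below substitutes a much simpler uniform-grid index to show the same two-step idea. First, a cheap cell lookup narrows the candidates. Then an exact bounding-box test filters them. `GridIndex`, the cell size, and the `(min_x, min_y, max_x, max_y)` box layout are illustrative assumptions, not any product's API.

```python
from __future__ import annotations

import math
from collections import defaultdict
from typing import Dict, Iterator, Set, Tuple

# Axis-aligned bounding box: (min_x, min_y, max_x, max_y)
BBox = Tuple[float, float, float, float]


def _overlaps(a: BBox, b: BBox) -> bool:
    # Two boxes overlap when they overlap on both the x and y axes.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GridIndex:
    """Uniform-grid spatial index: each object is filed under every cell its box touches."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self._cells: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
        self._boxes: Dict[str, BBox] = {}

    def _cells_for(self, box: BBox) -> Iterator[Tuple[int, int]]:
        min_x, min_y, max_x, max_y = box
        c = self.cell_size
        for i in range(math.floor(min_x / c), math.floor(max_x / c) + 1):
            for j in range(math.floor(min_y / c), math.floor(max_y / c) + 1):
                yield (i, j)

    def insert(self, obj_id: str, box: BBox) -> None:
        self._boxes[obj_id] = box
        for cell in self._cells_for(box):
            self._cells[cell].add(obj_id)

    def query(self, box: BBox) -> Set[str]:
        """Return the ids of objects whose bounding boxes overlap `box`."""
        candidates: Set[str] = set()
        for cell in self._cells_for(box):
            # .get avoids creating empty cells for areas with no data
            candidates |= self._cells.get(cell, set())
        # Sharing a cell is not enough; confirm with an exact overlap test.
        return {oid for oid in candidates if _overlaps(self._boxes[oid], box)}


idx = GridIndex(cell_size=10.0)
idx.insert("park", (0, 0, 5, 5))
idx.insert("lake", (12, 12, 18, 18))
idx.insert("road", (0, 9, 30, 11))
print(idx.query((4, 4, 6, 6)))  # {'park'}
```

In this example, "road" shares a grid cell with the query box and becomes a candidate, but the exact test then discards it. That separation between a coarse index lookup and a precise check is the pattern spatial indexes rely on.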

For example, a spatial index helps answer queries such as "What is the distance between one point and another point?" or "Does a specific line intersect with a particular set of polygons?" If this seems like a huge problem, that's because it is. Spatial data may well represent the biggest big data challenge of all.
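The two example queries come down to a few geometric predicates. The sketch below assumes planar (Cartesian) coordinates. Distances on the Earth's surface need geodetic formulas, which spatial databases such as PostGIS also provide. The function names are hypothetical. The intersection test uses the standard orientation (cross-product) method for segments, plus a ray-casting point-in-polygon test for a line that lies entirely inside a polygon.

```python
from __future__ import annotations

import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def distance(p: Point, q: Point) -> float:
    """Planar Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def _orient(a: Point, b: Point, c: Point) -> float:
    # Cross product sign: > 0 counter-clockwise, < 0 clockwise, 0 collinear.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _on_segment(a: Point, b: Point, c: Point) -> bool:
    # Assumes c is collinear with a-b; checks that c lies between a and b.
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    # Proper crossing: each segment's endpoints lie on opposite sides of the other.
    if ((d1 > 0 > d2) or (d1 < 0 < d2)) and ((d3 > 0 > d4) or (d3 < 0 < d4)):
        return True
    # Touching or overlapping cases where an endpoint is collinear.
    return ((d1 == 0 and _on_segment(q1, q2, p1)) or (d2 == 0 and _on_segment(q1, q2, p2))
            or (d3 == 0 and _on_segment(p1, p2, q1)) or (d4 == 0 and _on_segment(p1, p2, q2)))


def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    # Ray casting: count edge crossings of a ray pointing in the +x direction.
    x, y = pt
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def line_intersects_polygon(a: Point, b: Point, poly: List[Point]) -> bool:
    n = len(poly)
    if any(segments_intersect(a, b, poly[i], poly[(i + 1) % n]) for i in range(n)):
        return True
    # No edge is crossed, so the segment is either wholly inside or wholly outside.
    return point_in_polygon(a, poly)


def intersecting_polygons(a: Point, b: Point, polygons: Dict[str, List[Point]]) -> List[str]:
    return [name for name, poly in polygons.items() if line_intersects_polygon(a, b, poly)]


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(distance((0.0, 0.0), (3.0, 4.0)))                    # 5.0
print(intersecting_polygons((-1.0, 2.0), (5.0, 2.0), {"square": square}))  # ['square']
```

In a real query, a spatial index would first narrow the set of polygons, and these exact predicates would then run only on the surviving candidates.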

PostGIS is an open source project maintained by Refractions Research and is licensed under the GNU General Public License (GPL). PostGIS is also supplied as part of the OpenGeo Suite community edition and is offered and supported by OpenGeo under an enterprise license.

PostGIS is a specialized, layered implementation running on the workhorse RDBMS PostgreSQL. This approach offers the best of both worlds. You get all the benefits of an SQL RDBMS (such as transactional integrity and ACID) and support for the specialized operations needed for spatial applications (reprojection, geodetic support, geometry conversion, and so on).

Although the database itself is very important, you will also need other pieces of technology to address spatial application requirements. Fortunately, PostGIS is part of an ecosystem of components designed to work together to meet these needs. In addition to PostGIS, the OpenGeo Suite consists of the following:

  • GeoServer: Implemented in Java, GeoServer can publish spatial information from several of the major sources of spatial data on the web. It can integrate with Google Earth and also has an excellent web-based administrative front end.

  • OpenLayers: A library for JavaScript that is useful for displaying maps and other representations of spatial data in a web browser. It can manipulate images from most of the mapping sources on the web, including Bing Maps, Google Maps, Yahoo! Maps, OpenStreetMap, and so on.

  • GeoExt: Designed to make the map information from OpenLayers readily available to the web application developer. GeoExt widgets can be used to create editing, viewing, styling, and other interactive web experiences.

  • GeoWebCache: After you have the data in a server and can display it in a browser, you need to make it fast. GeoWebCache is the accelerator. It caches chunks of image data (called tiles) and makes them available for rapid delivery to the display device.
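Tile caching works because a map view can be split into fixed, addressable tiles that are rendered once and then reused. The sketch below makes two assumptions. First, it uses the widely used Web Mercator "XYZ" tile numbering; GeoWebCache itself supports several configurable grid sets. Second, it keeps tiles in a simple in-memory least-recently-used cache. `lonlat_to_tile`, `TileCache`, and the `render` callback are hypothetical names, not GeoWebCache's API, and its real storage and expiry settings are configured differently.

```python
from __future__ import annotations

import math
from collections import OrderedDict
from typing import Callable, Tuple

TileKey = Tuple[int, int, int]  # (zoom, x, y)


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Web Mercator XYZ tile containing (lon, lat) at the given zoom level."""
    n = 2 ** zoom  # the world is n x n tiles at this zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or extreme latitudes stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


class TileCache:
    """In-memory LRU cache: render a tile once, then serve the stored copy."""

    def __init__(self, render: Callable[[int, int, int], bytes], capacity: int = 1000) -> None:
        self._render = render
        self._capacity = capacity
        self._tiles: "OrderedDict[TileKey, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, z: int, x: int, y: int) -> bytes:
        key = (z, x, y)
        if key in self._tiles:
            self._tiles.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._tiles[key]
        self.misses += 1
        tile = self._render(z, x, y)  # the expensive step the cache avoids
        self._tiles[key] = tile
        if len(self._tiles) > self._capacity:
            self._tiles.popitem(last=False)  # evict the least recently used tile
        return tile


print(lonlat_to_tile(0.0, 0.0, 1))  # (1, 1)
```

A usage note: because neighboring users often request the same tiles, most requests after the first are served as cache hits. Heavy rendering is needed only on misses.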

While many uses of spatial data involve maps and locations, spatial data has many other current and future applications, including:

  • Precise 3D modeling of the human body, buildings, the atmosphere, and so on

  • Gathering and analysis of data from sensor networks

  • Integration with historical data to examine 3D space/objects over time
