Are your Spatial Data Infrastructure tools ready for the big data deluge?

April 9, 2019

Gordon Plunkett

It is estimated that there are about 2.7 zettabytes (ZB) of digital data in existence today, of which approximately 2.1ZB is geographically related. As geospatial data gets easier to collect and manipulate for creating web products, it becomes imperative for the geospatial community to ensure that there are spatial data infrastructure (SDI) tools to manage all this data, not only today but also into the future. Read this blog post to find out why the growth in digital data volume will be staggering and why SDIs will become the key technology for managing this data deluge.

I recently read an article in a trade magazine about the ongoing digital transformation of businesses and organizations. The article indicated that there are currently about 2.7ZB (equivalent to 2.7 billion terabytes) of data in existence in the digital world today. The article also indicated that 90 percent of this data was generated in the past two years. Wow, that’s a lot of data, and the growth rate is phenomenal. But where is all this data coming from, who is generating it, and more specifically–how can we manage it?

I certainly don’t know where these monstrous amounts of new data are coming from, but I suspect that satellite, airborne, drone and ground-based imagery, mobile phone multi-media files and general business data like emails and web sites are contributing significantly to this remarkable digital data volume growth.

To see where this is going, let’s assume that the data volume figures quoted in the article are roughly correct and that growth continues at the same pace of about 1.2ZB per year (90 percent of 2.7ZB, spread over two years). Then in just four years (by 2023), there will be about 7.5ZB of data existing somewhere in the world’s digital infrastructure.
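For readers who want to check the arithmetic, here is a minimal Python sketch of that straight-line estimate. The 2.7ZB total and the 90 percent share come from the article quoted above; the function name and the assumption that the yearly increase stays constant are mine, for illustration only.

```python
# Figures quoted in the post; treat them as rough estimates.
CURRENT_ZB = 2.7        # total digital data in existence in 2019, in zettabytes
SHARE_RECENT = 0.90     # share of that data generated in the past two years
WINDOW_YEARS = 2        # length of the "recent" window, in years


def linear_projection(current_zb, years_ahead,
                      share_recent=SHARE_RECENT, window_years=WINDOW_YEARS):
    """Project total volume assuming the yearly increase stays constant."""
    # Volume added per year: 90 percent of 2.7ZB over two years, about 1.2ZB.
    annual_growth_zb = current_zb * share_recent / window_years
    return current_zb + annual_growth_zb * years_ahead


linear_2023_zb = linear_projection(CURRENT_ZB, years_ahead=4)
print(f"Linear estimate for 2023: {linear_2023_zb:.2f} ZB")
```

Without rounding the yearly increase down to 1.2ZB, the estimate comes out slightly above the 7.5ZB quoted here, which is well within the uncertainty of the input figures.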

The volume of digital data stored around the world is growing at an enormous rate that will make today’s volumes pale in comparison to the volume generated in just the next decade.

But what happens to the total data volume if we use a compounded growth rate of 90 percent of existing data every two years as per that article? Let’s do the math once again.

In the next four years, a compounded growth rate of 90 percent every two years (almost doubling each time) will result in more than 9.7ZB of data in existence by 2023. If this growth trend holds, then some of you reading this article could conceivably witness over 1 yottabyte (1000ZB) of data in existence during your career, somewhere around 2037 to 2039, or roughly 20 years from now. Given that many organizations are just beginning their digital transformation journey, these gigantic data volumes do not seem unreasonable at this time.
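The compounded version can be sketched the same way. This is only an illustration under the assumption used in this post, namely that the total grows by 90 percent every two years; the function names are hypothetical, and real growth will not follow a smooth curve.

```python
import math

CURRENT_ZB = 2.7          # total digital data in 2019, in zettabytes
GROWTH_PER_PERIOD = 0.90  # assumed growth of the total per period
PERIOD_YEARS = 2          # length of one growth period, in years


def compound_projection(current_zb, years_ahead,
                        growth=GROWTH_PER_PERIOD, period_years=PERIOD_YEARS):
    """Project total volume with growth compounded once per period."""
    periods = years_ahead / period_years
    return current_zb * (1 + growth) ** periods


def years_to_reach(target_zb, current_zb,
                   growth=GROWTH_PER_PERIOD, period_years=PERIOD_YEARS):
    """Years needed for the compounded total to reach target_zb."""
    periods = math.log(target_zb / current_zb) / math.log(1 + growth)
    return periods * period_years


compound_2023_zb = compound_projection(CURRENT_ZB, years_ahead=4)
years_to_yottabyte = years_to_reach(1000.0, CURRENT_ZB)

print(f"Compounded estimate for 2023: {compound_2023_zb:.2f} ZB")
print(f"Years until 1 yottabyte: {years_to_yottabyte:.1f}")
print(f"Approximate year: {2019 + years_to_yottabyte:.0f}")
```

Under these assumptions the one-yottabyte mark falls a little over 18 years out, which is consistent with the late-2030s range given above.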

Well, it’s exciting to toss around these big numbers, which you may have heard of before but never used. However, there’s a reason why I’m sharing these calculations with you. Do you remember the old maxim that 80 percent of data contains a geospatial (location) component? Now, apply this maxim: take 80 percent of the big numbers determined above and imagine the volumes of geospatial data that will need to be managed in the future.
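Applying the maxim is a one-line multiplication, sketched below. The totals are the rounded figures from this post, the 80 percent share is the rule of thumb rather than a measured value, and the dictionary labels are mine.

```python
GEOSPATIAL_SHARE = 0.80  # the "80 percent" rule of thumb, not a measured value

# Rounded total volumes from this post, in zettabytes.
total_volumes_zb = {
    "today (2019)": 2.7,
    "2023, linear growth": 7.5,
    "2023, compounded growth": 9.7,
    "1 yottabyte milestone": 1000.0,
}

# Geospatial portion of each total under the 80 percent assumption.
geospatial_zb = {
    label: GEOSPATIAL_SHARE * total
    for label, total in total_volumes_zb.items()
}

for label, volume in geospatial_zb.items():
    print(f"{label}: about {volume:.1f} ZB of geospatial data")
```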

GIS, unlike some other technologies, often requires large volumes of data. The mainstays of geospatial data include imagery, sensor readings, graphics, attributes, web maps and analysis results–all of which can be large datasets despite data compression.

Everything that happens, happens somewhere. Humans are busy collecting enormous amounts of data about the geography and goings-on of our planet.

As an example of data volume growth, Canada’s soon-to-be-launched RADARSAT Constellation Mission (RCM) consists of three identical C-band Synthetic Aperture Radar (SAR) Earth observation satellites. The three-satellite configuration will provide daily revisits of Canada’s landmass, with coverage of the Arctic up to four times a day, as well as daily access to any point on 90 percent of the world's surface. The Government of Canada alone is expected to use approximately 250,000 RCM images per year, or nearly 700 images daily.

Canada’s RADARSAT Constellation Mission consists of three satellites that will soon be collecting huge amounts of geospatial data on a daily basis. Picture Courtesy: RADARSAT Constellation

There are currently about 100 satellites launched each year, and many of these provide global positioning information (GPS, Galileo, GLONASS). Many others are Earth-observing satellites for weather, environmental monitoring and mapping applications. The trend to collect space-based geospatial data is likely to continue unabated for the foreseeable future, especially as building and launching satellites becomes easier and cheaper.

Drone imagery, LiDAR, Internet of Things (IoT), specialty maps–the list of new technologies that are creating new geospatial data continues to grow.

Drones that carry cameras, LiDAR and other sensors are increasingly being used to collect geospatial data for many applications, including mapping, engineering, inspection, utilities, geology and water resources.

No one really knows how quickly the IoT is going to grow, but it’s clear that there will be billions of sensors in daily use in the near future, especially with Intelligent Transportation Systems (ITS) and Smart City initiatives. The location of each sensor will need to be known, and each sensor’s data stream will need to be managed.

The question then arises–how is your organization going to store, manage, catalogue, transmit, analyze and present huge volumes of geospatial data? First of all, organizations need a Geospatial Information Officer (GIO) to oversee data management development and activities. They also need robust SDI technology for managing vector data, raster data, point (sensor) data, internal data, external data, new data and archived data.

Currently, organizations often use different systems and technologies for managing these diverse types of data. This practice often leads to issues such as duplicate data storage, poor cataloguing practices that make it hard to know what data is available and where it's stored, and inefficient data extraction and transmission from one system to another. These inefficiencies are the reason why big data tools are crucial to the continued development of SDIs. Because SDIs treat all types of data in the same way, it’s easier to manage all of the data within one SDI.

While creating massive amounts of geospatial data is becoming easier, how does an agency manage and use all this data to ensure a good return on investment from both the infrastructure and the data itself? How do users find data, and what are the most efficient ways for an agency to disseminate the data to the appropriate users? These are issues that will need to be addressed when designing SDIs because once the data volumes become really large, it will be easy to misplace or misinterpret data.

Spatial Data Infrastructure practitioners are just beginning to recognize some of these big data issues that are now upon us. Many big data “collection, management and use” issues will have to be addressed in the very near future to make sure that the geospatial community can make the best use of all this data.