How do you assess the quality of geospatial data available over the Internet?

August 13, 2018

Gordon Plunkett

Interoperability and web service improvements are making it easier to share geospatial data, perform analytics and make better decisions based on data obtained from multiple sources and different data perspectives. But to make better decisions, the retrieved data needs to be current, valid, accurate and precise. Data quality is one area where GIS practitioners have an important role to play. Read this blog post to find out what constitutes good quality data and how to decide if the data you retrieved from an SDI meets your data quality requirements.

SDIs, web services and common exchange formats are making it much easier to share spatial data. These days, organizations are increasingly providing their geospatial data online for others to access and use. Internet-accessible data-sharing sites allow users to find and download numerous types of geospatial data. More advanced data-sharing sites can also provide data as web services by using technology such as Esri REST endpoints, Web Feature Services (WFS) and Web Coverage Services (WCS). Online data sharing is not quite ubiquitous yet, but there is certainly momentum in the geospatial community for organizations to implement technology such as ArcGIS Hub or ArcGIS Online to provide online access to data.

With all this data being made available easily, users must recognize that not all data is of good quality. But how does one find that out? For starters, the reputation of the organization that is publishing the data is an important indicator. Organizations that have good quality data often provide sites that collect, curate and publish data, such as the Living Atlas of the World – Canada Edition. Although most government open data sites offer good quality data, it’s advisable to verify the quality before using it.

By selecting Canada as the region (see red ellipse) in the Living Atlas on ArcGIS Online, users can browse, find and use high-quality data from and for all parts of Canada. More than 130 data sets are available.

Not all data is created equal, and no one expects geospatial data to be perfect, so quality will vary. What matters is that the data in question is fit for your purpose. The question then arises: how do you assess data quality? For accounting data, you can check if the numbers add up; for textual data, you can check for spelling or grammar errors. While there is no simple solution for geospatial data, tools such as the ArcGIS Data Reviewer application can help. ArcGIS Data Reviewer has processes for automated data review, semiautomated data review, error management and data quality reporting.

Another option is to follow data review best practices. To find the most obvious errors, perform a visual inspection of the map data to verify that objects are in the right locations. For example, automobile accident locations displayed away from any road, or buildings displayed in lakes, are a strong sign that the data isn’t reliable. Such errors are easy to spot, but they are often random.
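A visual check like the accident example can also be backed up with a simple automated test. The sketch below is only an illustration, not part of any tool mentioned in this post: it assumes the coordinates are in a projected system measured in metres, and the function names, sample coordinates and 10 m tolerance are all made up for the example.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b are the same point
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def points_far_from_roads(points, roads, max_distance):
    """Return the ids of points farther than max_distance from every road.

    Assumes at least one road with two or more vertices.
    """
    flagged = []
    for point_id, xy in points.items():
        nearest = min(
            point_segment_distance(xy, a, b)
            for road in roads
            for a, b in zip(road, road[1:])
        )
        if nearest > max_distance:
            flagged.append(point_id)
    return flagged

# Hypothetical sample data: two road centrelines and three accident points.
roads = [[(0, 0), (100, 0)], [(100, 0), (100, 100)]]
accidents = {"A1": (50, 3), "A2": (60, 80), "A3": (104, 50)}

suspect = points_far_from_roads(accidents, roads, max_distance=10)
print(suspect)  # ids of accident points that deserve a closer look
```

A flagged point is not necessarily wrong; it is simply a candidate for the kind of closer visual inspection described above.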

The image below shows data recently downloaded from Winnipeg’s open data site. You’ll notice that the road network for one area of the city doesn’t correspond to the base topographic map or the basemap imagery. There could be two reasons: either the basemap and imagery have not been updated recently, or the road network is planned but not yet constructed.

Top left – downloaded road network; Top right – topographic basemap; Bottom left – road network displayed over the topographic basemap; Bottom right – road network displayed over the basemap imagery. These images clearly indicate that at the time of image acquisition, this area was under construction.

Assessing geospatial data quality is a complex effort that involves examining multiple metrics such as precision, accuracy, consistency, completeness, integrity, accessibility, validity, timeliness, currency, authoritativeness, compatibility and conformance.
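Some of these metrics can be turned into simple numbers. As a rough sketch only, the example below scores completeness as the share of records with a value for a given attribute, and currency as the number of days since the dataset was last updated. The field names, sample records and dates are hypothetical, and real projects may define these metrics differently.

```python
from datetime import date

def completeness(records, field):
    """Fraction of records that have a non-empty value for the field."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def age_in_days(last_updated, today):
    """Currency expressed as days elapsed since the last update."""
    return (today - last_updated).days

# Hypothetical attribute table for a road dataset.
records = [
    {"name": "Main St", "speed_limit": 50},
    {"name": "", "speed_limit": 60},
    {"name": "Oak Ave", "speed_limit": None},
    {"name": "Elm Rd", "speed_limit": 40},
]

name_score = completeness(records, "name")
speed_score = completeness(records, "speed_limit")
# Assumed "last updated" date, as might be read from the dataset's metadata.
age = age_in_days(date(2018, 1, 15), today=date(2018, 8, 13))

print(f"name: {name_score:.0%}, speed_limit: {speed_score:.0%}, age: {age} days")
```

What counts as an acceptable score depends on your purpose: a dataset that is several months old may be fine for a regional overview but not for routing through a fast-growing suburb.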

Here are a few tips and tricks to determine geospatial data suitability for your project.

Treat any errors or warnings as a red flag, whether they appear when accessing or transferring the data from the online web portal or when processing or loading the data into your system. Then load the data and perform a quick visual inspection by viewing it in map format and looking for anomalies.
Check if point data locations align appropriately with the basemap. For example, look for address points in a lake or vehicle accident points that are not near a road.
Check if linear data such as roads, railways and hydrographic features line up properly with the basemap.
Check visible polygon features such as parcel boundaries or park boundaries to make sure they align properly with the basemap.
Verify that imagery aligns correctly with both the basemap and any additional vector data layers that are available.
Scan the attribute table to see if any attribute values seem incorrect. Select features by a few attribute values, one at a time, and check their locations. For example, if there is a province attribute, select all the elements for a certain province and verify that the points, lines or polygons are displayed in the correct province.
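The province check in the last tip can be sketched in code as a point-in-polygon test. This is a minimal illustration rather than a production method: the region names and square boundaries are invented stand-ins for real province polygons, the function names are hypothetical, and the ray-casting test shown here handles a single ring without holes.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if the point lies inside the polygon ring."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray running right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def mislabelled_features(features, regions, field):
    """Return ids of features whose attribute value does not match their location."""
    bad = []
    for feature in features:
        ring = regions.get(feature[field])
        # Unknown region names are flagged as well as misplaced points.
        if ring is None or not point_in_polygon(feature["xy"], ring):
            bad.append(feature["id"])
    return bad

# Hypothetical regions: two adjacent squares standing in for province boundaries.
regions = {
    "West": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "East": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
features = [
    {"id": 1, "province": "West", "xy": (3, 4)},
    {"id": 2, "province": "East", "xy": (5, 5)},    # labelled East, located in West
    {"id": 3, "province": "East", "xy": (15, 2)},
    {"id": 4, "province": "North", "xy": (1, 1)},   # no such region
]

bad = mislabelled_features(features, regions, "province")
print(bad)  # ids to review against the map
```

In a GIS, the same check is usually done interactively with a select-by-attribute followed by a look at the map; the code simply makes the logic explicit.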

The vast majority of the data published through an SDI or on a data distribution portal is of good quality, but mistakes can happen despite all the QA/QC that the data provider performs. In addition, a file may become corrupted due to a system or IT issue. The onus is on you (the user) to verify that the data is fit for your purpose. If you notice an issue, inform the data stewards so that it can be remedied. Use any feedback mechanisms that are available.

To summarize, when you are using SDI or online geospatial data, take the time to do some quick quality control checks to verify that the data is suitable. If the data is deemed unacceptable, search around for a more suitable dataset, if one exists. On the other hand, if the data passes your quality tests, you can confidently use it in your decision-making processes.