
How to prepare your data for reliable project results

Are you confident in your data quality? If the answer is “kind of”, you might want to go a step further in preparing your data before your next project. I understand it’s hard to resist jumping into a new project with both feet when you’re looking forward to it, but preparing well can pay off a great deal in the end. Read on for a refresher on the basic data preparation steps you should follow by default. I’ll also talk about a brand-new course that could help you dive deeper into validating whether your data meet your project and organizational standards and needs.

Preparing your data is key to the success of any GIS project. The preparation process can involve very little effort or extensive reviews, depending on the current state of the data, your familiarity with them and the requirements of the project itself.

At minimum, the data source should be validated. If you created the data yourself, you know what you’re working with. If they come from an external source or were collected by someone else, review them carefully to make sure they meet your standards.

A well-documented dataset should provide the details you need in the metadata to validate its trustworthiness. Take the time to review what’s included and decide if you’re happy with what you see. Limited metadata, on the other hand, should make you question the dataset further and apply the additional validation steps I introduce later in this post.

Once you know where the dataset comes from, when it was created and hopefully what kind of methodology was used to build it, you can decide if it will work for you or not. But that’s just the beginning.
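As a rough illustration, the metadata review described above can be turned into a small repeatable check. This is only a sketch: the key names (`source`, `created`, `methodology`), the five-year age threshold and the sample values are all assumptions made for the example, not part of any metadata standard or ArcGIS tool.

```python
from datetime import date

# Hypothetical key names: adapt them to whatever your metadata actually uses.
REQUIRED_KEYS = ("source", "created", "methodology")


def review_metadata(metadata, max_age_years=5, today=None):
    """Return a list of concerns found in a metadata dictionary."""
    today = today or date.today()
    # Flag every required key that is absent or empty.
    concerns = [f"missing {key}" for key in REQUIRED_KEYS if not metadata.get(key)]
    # Flag datasets older than the (assumed) age threshold.
    created = metadata.get("created")
    if created and (today - created).days > max_age_years * 365:
        concerns.append("older than the chosen age threshold")
    return concerns


# Made-up metadata for a dataset received from an external source.
meta = {"source": "Municipal open data portal", "created": date(2015, 6, 1)}
concerns = review_metadata(meta, today=date(2024, 1, 1))
print(concerns)
```

An empty list does not prove the dataset is trustworthy; it only tells you the basic details are there for you to read and judge.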

Next you’ll have to start modifying the data for your own needs. Here are a few steps that are typically involved in this process:

  1. Inspect the data visually for any irregularities (geometry and attributes) and make corrections as needed.
  2. Align your data to match other data sources so things are where they belong. You can validate this visually.
  3. Select and clip (a bit like cutting a shape out of paper with scissors) the data to keep only the area you need for your project. Many datasets will cover an entire province or even the country. If you focus on a city extent, why slow down your process by using the entire dataset?
  4. Delete unwanted information contained in the dataset’s attribute table.
  5. Create new records or calculate new fields to add valuable details to your project.
  6. Clean up the attribute table by removing blank fields, finding and correcting typos, removing unwanted characters, deleting duplicates, etc.
  7. Save this new subset of the data as a new layer you’ll work on and keep an additional copy, just in case.
  8. Don’t forget to create your own detailed metadata for this new layer!
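To make steps 3, 4 and 6 more concrete, here is a minimal sketch in plain Python. It assumes the records are simple point features stored as dictionaries with hypothetical `x`, `y`, `name` and `notes` fields, and it "clips" with a rectangular extent only. A real clip in GIS software cuts geometries against a polygon, so treat this as an illustration of the logic rather than a replacement for your GIS tools.

```python
def clip_to_extent(records, extent):
    """Keep only point records that fall inside a rectangular extent (step 3)."""
    min_x, min_y, max_x, max_y = extent
    return [r for r in records
            if min_x <= r["x"] <= max_x and min_y <= r["y"] <= max_y]


def clean_record(record, keep_fields):
    """Keep only the wanted fields and tidy text values (steps 4 and 6)."""
    cleaned = {}
    for field in keep_fields:
        value = record.get(field)
        if isinstance(value, str):
            # Collapse repeated whitespace and trim both ends.
            value = " ".join(value.split())
        cleaned[field] = value
    return cleaned


def drop_duplicates(records):
    """Remove exact duplicate records, keeping the first occurrence (step 6)."""
    seen = set()
    unique = []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Made-up sample data: two spellings of the same feature and one far away.
raw = [
    {"name": "  City  Hall ", "x": 2.0, "y": 3.0, "notes": ""},
    {"name": "City Hall", "x": 2.0, "y": 3.0, "notes": ""},
    {"name": "Airport", "x": 50.0, "y": 60.0, "notes": ""},
]
extent = (0.0, 0.0, 10.0, 10.0)  # min_x, min_y, max_x, max_y

subset = clip_to_extent(raw, extent)
subset = [clean_record(r, ["name", "x", "y"]) for r in subset]
subset = drop_duplicates(subset)
print(subset)  # [{'name': 'City Hall', 'x': 2.0, 'y': 3.0}]
```

Note the order: the text is cleaned before duplicates are removed, so two records that differ only by stray spaces are recognized as the same feature.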

At this point, you have the data you really need. However, the dataset might not be perfect yet. It would be best to go a few steps further and ask yourself the following questions:

  • Does it meet statistical standards?
  • Is the dataset complete?
  • Does it have logical consistency?
  • How about its positional accuracy?
  • Does it make sense when you evaluate how it evolved over time?
  • If working with raster data, does the original classification match what your project requires?
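Some of these questions can be checked with a few lines of code before you reach for dedicated tools. The sketch below covers three of them under stated assumptions: completeness is measured as the share of filled values per field, logical consistency as user-supplied rules per field, and positional accuracy as the root mean square error (RMSE) between dataset points and trusted reference points in the same projected coordinate system. The field names, rules and sample values are hypothetical.

```python
import math


def completeness(records, fields):
    """Return the share of non-missing values for each field (0.0 to 1.0)."""
    report = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = filled / len(records) if records else 0.0
    return report


def consistency_errors(records, rules):
    """Return (record index, field) pairs where a value breaks its rule."""
    errors = []
    for i, r in enumerate(records):
        for field, is_valid in rules.items():
            value = r.get(field)
            if value in (None, ""):
                continue  # missing values are counted by completeness()
            if not is_valid(value):
                errors.append((i, field))
    return errors


def positional_rmse(pairs):
    """RMSE of the distances between (measured, reference) point pairs."""
    squared = [math.hypot(mx - rx, my - ry) ** 2
               for (mx, my), (rx, ry) in pairs]
    return math.sqrt(sum(squared) / len(squared))


# Made-up parcel records for illustration.
parcels = [
    {"id": 1, "zoning": "RES", "area_m2": 450.0},
    {"id": 2, "zoning": "", "area_m2": 300.0},
    {"id": 3, "zoning": "XYZ", "area_m2": -20.0},
    {"id": 4, "zoning": "COM", "area_m2": 1200.0},
]
rules = {
    "zoning": lambda v: v in {"RES", "COM", "IND"},  # assumed valid codes
    "area_m2": lambda v: v > 0,
}

scores = completeness(parcels, ["id", "zoning", "area_m2"])
errors = consistency_errors(parcels, rules)
rmse = positional_rmse([((3.0, 4.0), (0.0, 0.0)), ((10.0, 10.0), (10.0, 10.0))])

print(scores)  # {'id': 1.0, 'zoning': 0.75, 'area_m2': 1.0}
print(errors)  # [(2, 'zoning'), (2, 'area_m2')]
```

What counts as an acceptable score or RMSE depends on your project and organizational standards, so the thresholds are deliberately left out.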

If you’re not sure how to go about answering some or all of these questions, we now offer a new instructor-led course to guide you in this process, leveraging many ArcGIS Pro tools.

Preparing Data for GIS Applications

I know this process can sound overwhelming, but once your data are validated, you can rest assured that your project results can be trusted. By learning the process, you’ll be able to apply it to many more datasets in the future.

As a last note, I’d like to emphasize that others might want to use your data down the road, especially if you’re making them discoverable on a Web GIS portal. Make sure your metadata document everything you feel could be useful to the next person who uses the data, so they can save time and trust your work.

This post was translated to French and can be viewed here.

About the Author

Carole Arseneau is a Market Research Specialist at Esri Canada. Over the years, she has advised customers from all industries on how to leverage GIS in their organizations. More recently, she’s been conducting market research to uncover details about the various jobs our customers do each day in various industries. This will help inform Esri Canada’s corporate strategy and better support our customers. Carole holds a Market Research certificate from the University of California, Davis, a GIS certificate from Florida State College in Jacksonville and a bachelor’s degree in Kinesiology from Laval University in Québec City. Being by the water has always made her feel at home and has given her inspiration to keep a positive outlook in life.
