How to leverage the distributed computing power of GeoAnalytics Server
Do you have a lot of data to process on a regular basis? Do you ever wish the processing time were shorter? Beginning with ArcGIS 10.5, you can perform feature analysis using distributed computing with the tools provided by ArcGIS GeoAnalytics Server. Analyses that took minutes or hours can now be done within seconds. Using a crime scenario, this blog demonstrates how to leverage ArcGIS GeoAnalytics Server for big data analyses.
When I first heard the term ‘big data’, I was confused. How big is ‘big data’? As I got exposure to some so-called big datasets, I got a better understanding. Big data doesn’t necessarily have to contain billions of points, lines or polygons. It is simply any dataset so large that you have a hard time analyzing it, or cannot analyze it at all, using traditional data processing tools.
At ArcGIS 10.5, we introduced ArcGIS GeoAnalytics Server. You can now leverage the distributed computing power of multiple machines and cores for big data analysis through ArcGIS Pro and Portal for ArcGIS. In this blog, we will walk through a scenario where we detect patterns in nearly half a million crime points by aggregating them into hexbins. To show the difference, I’ll run the same analysis using two different methods: a traditional ModelBuilder workflow and a GeoAnalytics tool. We will then compare the results and speeds of the two. Both workflows take place within ArcGIS Pro. Since this is a big dataset and my ArcGIS Pro is directly connected to our Portal for ArcGIS, my analysis reaches out to the cloud for the data.
For the sake of this analysis, let’s take the city of Vancouver as an example. We’ve downloaded the crime dataset from the Vancouver Open Data Catalogue, which covers 2003 to 2016. We’d like to make sense of this crime data and identify which geographical areas have a higher crime rate. The layer has almost half a million points, and simply throwing it onto a map won’t tell us anything.
Vancouver crime points from 2003 to 2016
One way to uncover crime patterns is to aggregate crime points into polygons and look for point clusters. Those of you who are familiar with the ArcGIS for Desktop world have been performing this analysis locally, using a model like the one in the image below. With the Generate Tessellation tool, the model creates 500-metre hexbins based on the city’s boundary. It then summarizes crime points into those hexbins, makes a feature layer, applies preset symbology, packages the result and publishes it to our Portal for ArcGIS.
An example of crime analysis using ModelBuilder
I ran the model myself. What I’ve highlighted here is how long it took to run the Summarize Within tool. This tool aggregates the crime points into the previously generated hexbins. It is the heavy-lifting part of the analysis, and this one tool alone took 25 minutes. Image 4 below is the result, which isn’t bad for 25 minutes.
Model process
Model result
The difference GeoAnalytics Server can make
Let’s now perform the same analysis using a GeoAnalytics tool, Aggregate Points. This single tool replaces the whole chain of tools used in the model. You can use your own polygon layer for the aggregation process. If you don’t, you can aggregate points into one of two bin types, hexagon or square. Unlike the Generate Tessellation tool, it doesn’t require a city boundary to generate matching hexbins. Instead, it generates hexbins at every location where points exist.
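To make the idea of "bins only where points exist" concrete, here is a minimal pure-Python sketch of hexbin aggregation. It is not the Aggregate Points implementation; the function names are hypothetical, and it assumes planar coordinates in metres, pointy-top hexagons, and a bin size measured as the distance between parallel sides.

```python
import math
from collections import Counter

def hex_cell(x, y, flat_to_flat=500.0):
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y).

    Assumption for this sketch: flat_to_flat is the distance between
    parallel sides, in the same planar units as x and y (metres here).
    """
    size = flat_to_flat / math.sqrt(3)  # centre-to-corner distance
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    # Cube rounding: round all three, then repair the one with the largest error
    # so that q + r + s == 0 still holds.
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

def aggregate_points(points, flat_to_flat=500.0):
    """Count points per hexagon; only hexagons that contain points appear."""
    return Counter(hex_cell(x, y, flat_to_flat) for x, y in points)

# Four hypothetical crime locations, in metres.
counts = aggregate_points([(0, 0), (10, 10), (505, 5), (495, -5)])
```

Because the result is keyed by the cells that points fall into, no boundary polygon is needed up front, which mirrors the behaviour described above.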
Another advantage of this tool is that it leverages the distributed computing power of GeoAnalytics Server. From the image below, you can see that the tool created 12 different tasks and distributed the job to multiple machines. Moreover, it published my analysis result directly to our Portal for ArcGIS. With the exact same parameters of aggregating points into 500-metre hexbins, the analysis took only 18.5 seconds, roughly 80 times faster than the 25 minutes Summarize Within needed. And no, I’m not kidding.
Process of the GeoAnalytics Tool, Aggregate Points
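The split-into-tasks pattern can be sketched in a few lines of Python. This is only an illustration of the general idea (partition the points, count each partition independently, merge the partial counts), not how GeoAnalytics Server works internally. The function names are hypothetical, threads stand in for separate machines, and square bins are used to keep the example short.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def square_cell(x, y, size=500.0):
    """Index of the square bin containing (x, y); size is in planar units."""
    return (int(x // size), int(y // size))

def count_partition(points, size=500.0):
    """Count points per bin for one partition of the data."""
    return Counter(square_cell(x, y, size) for x, y in points)

def distributed_count(points, n_tasks=12, size=500.0):
    """Split the points into n_tasks partitions, count each, merge the results."""
    chunks = [points[i::n_tasks] for i in range(n_tasks)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        # Each task is independent, so tasks could run on different workers.
        for partial in pool.map(lambda chunk: count_partition(chunk, size), chunks):
            total.update(partial)  # merging is just adding the per-bin counts
    return total

# Hypothetical sample data: 1,000 synthetic points in a 2 km square.
points = [(i * 37 % 2000, i * 91 % 2000) for i in range(1000)]
merged = distributed_count(points)
```

Counting is a good fit for this pattern because per-bin counts from separate partitions can simply be added together, so the merged result matches a single-pass count.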
Image 6 below shows the result from the GeoAnalytics tool. It is similar to the one generated by our model. However, as I mentioned earlier, the tool generates hexbins at all locations where points exist. This result captures more detail about our crime locations, including the two spots in the water, which the model’s result (image 4) didn’t have.
Result of the GeoAnalytics Tool, Aggregate Points
This was just a small sample of what GeoAnalytics Server can do, showcasing a single tool. Imagine the potential once you start using more of the tools available from GeoAnalytics Server. By harnessing distributed processing power across multiple GeoAnalytics Server machines and cores, what used to take several steps, multiple tools, and minutes or even hours to process can now be completed within seconds. Now it’s your turn to explore!