Pretrained Deep Learning Models: Why are they useful and how to use them?

July 6, 2023

Mohamed Ahmed

There are currently over 60 pretrained deep learning models in the ArcGIS Living Atlas of the World that can accelerate your geospatial workflows for image feature extraction and detection, pixel classification, object tracking, point cloud classification, and image redaction - and the number of models keeps growing. In this blog post, we will explore the benefits of using these ready-to-use pretrained models and how to start using them in the ArcGIS system.

Artificial Intelligence (AI), machine learning, and deep learning (DL) are improving the world in various ways. For instance, in agriculture, precision farming powered by AI technologies improves crop productivity, ensuring efficient resource allocation and maximizing yields. In the field of law enforcement, predictive policing models driven by AI aid in combating crime proactively, identifying patterns and potential hotspots to prevent criminal activities. In the domain of weather forecasting and disaster management, DL algorithms help predict severe weather events accurately, allowing preparations to be made in time to mitigate their impact. If you are new to the terms AI, machine learning, and DL, I recommend reading this blog post first to learn how to differentiate between them.

One area of AI where deep learning has performed exceedingly well is computer vision, or the ability for computers to see. We can use DL models in ArcGIS to perform tasks such as object detection, image classification, semantic (or pixel) classification, and instance segmentation (precise feature extraction). Figure 1 shows some of the most important computer vision tasks and how they can be applied to GIS.

Figure 1: Computer vision tasks include image classification (for example, damaged versus undamaged houses), pixel classification (land cover classification), object detection (identifying palm trees), and instance segmentation (outlining the boundaries of buildings).

Image classification involves assigning a single label or category to an entire image, such as determining whether an image represents a ‘damaged’ or ‘undamaged’ house. Pixel classification, in contrast, aims to label individual pixels within an image and is often used in tasks like land cover classification. Object detection goes a step further by identifying and localizing multiple objects within an image, such as detecting buildings or trees in aerial imagery. Lastly, instance segmentation involves not only detecting objects but also precisely delineating the boundaries of each instance, like identifying and outlining individual houses in satellite imagery. Each of these techniques serves a unique purpose in extracting valuable information from geographic and remotely sensed data.
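
To make the difference between these four tasks concrete, here is a minimal Python sketch of what each one returns for the same tiny image. No deep learning is involved: hand-written rules stand in for a trained model, and the 4x4 grid, the function names, and the labels are all made up for illustration. Only the shape of each output matters.

```python
# Toy 4x4 "image": 1 marks a building pixel, 0 marks background.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]


def classify_image(img):
    """Image classification: one label for the whole image."""
    return "has_building" if any(any(row) for row in img) else "no_building"


def classify_pixels(img):
    """Pixel classification: one label for every pixel."""
    return [["building" if v else "background" for v in row] for row in img]


def find_instances(img):
    """Instance segmentation: one pixel mask per connected object."""
    seen, instances = set(), []
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if not value or (r, c) in seen:
                continue
            # Flood fill over 4-connected neighbours to collect one object.
            stack, mask = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                mask.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < len(img) and 0 <= nx < len(img[0])
                    if inside and img[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            instances.append(sorted(mask))
    return instances


def detect_objects(img):
    """Object detection: one box (row_min, col_min, row_max, col_max) per object."""
    boxes = []
    for mask in find_instances(img):
        rows = [p[0] for p in mask]
        cols = [p[1] for p in mask]
        boxes.append((min(rows), min(cols), max(rows), max(cols)))
    return boxes


print(classify_image(image))       # a single label
print(classify_pixels(image)[0])   # labels for the first row of pixels
print(detect_objects(image))       # one bounding box per building
print(len(find_instances(image)))  # number of separate building masks
```

Image classification yields one answer per image, pixel classification yields a grid of answers, object detection yields a list of boxes, and instance segmentation yields a list of exact outlines.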

Examples of the applications of pretrained DL models in the ArcGIS system can be viewed in the following story map:

Thanks to the sensors all around us and the numerous satellites that are imaging the whole world every day, spatial data has become more abundant and diverse than ever before. Extracting meaningful insights from this vast amount of geospatial information can be a daunting task. However, with the advent of DL techniques and the integration of pretrained models into the ArcGIS system (Figure 2), analyzing this spatial data and automating feature extraction from it has become more accessible to GIS users who have limited DL expertise, including little knowledge of model architectures or training techniques. In the following sections, we will explore the ready-to-use pretrained DL models in ArcGIS Living Atlas of the World and delve into their applications and benefits.

Figure 2: A list of available pretrained models can be found by opening ArcGIS Living Atlas of the World in your browser and typing “dlpk” or “deep learning models” in the Living Atlas search bar.

Why do we need to use pretrained models?

Let's say we want to create a model that can find houses in Calgary, Alberta. We can begin by showing the model pictures of houses of different sizes and shapes. These pictures will be used to train the model. As the model learns, it builds layers of knowledge starting from simple things like edges and colours and moving on to more complex details like the structure of the houses. Each layer assigns different levels of importance to these details through numerical values called weights, which are learned during training and stored at each neural network layer in the DL model. With each successive layer, the model's representation of a house becomes richer. Creating such a model from scratch would require a massive amount of data (often millions of labelled samples). These data can be pricey and challenging to obtain, but compromising on data can lead to poor performance of the model.

Instead of building a DL model from scratch, we can use pretrained models and fine-tune them to solve a similar new problem. A pretrained AI model is a deep learning model that’s trained on large datasets to accomplish a specific task, and it can be used as is or customized to suit specific case studies or application requirements. Because of the rich knowledge acquired during pretraining, the fine-tuning phase allows the new model to effectively handle new tasks even with a small amount of data. That's why pretrained models with precomputed weights are useful.
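
The idea of reusing precomputed weights can be illustrated with a minimal, pure-Python sketch. It is a toy under stated assumptions, not how ArcGIS trains models: the "pretrained" weights are made-up numbers, the four training samples are invented, and the new task is handled by a tiny logistic-regression head trained on top of the frozen layer.

```python
import math

# Frozen "pretrained" layer. These made-up weights stand in for knowledge
# learned earlier on a large dataset; they come from no real model.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x):
    """Apply the frozen layer: weighted sums followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def predict(head, bias, x):
    """New task-specific head: logistic regression on the frozen features."""
    z = sum(w * f for w, f in zip(head, extract_features(x))) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune(samples, epochs=500, lr=0.5):
    """Train only the head; PRETRAINED_WEIGHTS are never modified."""
    head, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in samples:
            error = predict(head, bias, x) - label  # log-loss gradient w.r.t. z
            features = extract_features(x)
            head = [w - lr * error * f for w, f in zip(head, features)]
            bias -= lr * error
    return head, bias


# A deliberately small labelled dataset (1 = building, 0 = not a building).
samples = [([2.0, 0.5], 1), ([1.5, 0.2], 1), ([0.2, 1.5], 0), ([0.1, 2.0], 0)]
head, bias = fine_tune(samples)

for x, label in samples:
    print(label, round(predict(head, bias, x), 3))
```

Only the small head is learned from the four samples. The frozen layer supplies the features, which is why so little new data is needed.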

By using a high-quality pretrained model with accurate weights, we increase the chances of success of our model deployments. We can also modify the weights and add more data to further customize or fine-tune the model. To transfer learning from a pretrained model, we can use the Train Deep Learning Model tool. Its inputs are the pretrained model and some training samples for the class or object of interest. The samples can be created using Label Objects for Deep Learning. It's like getting a ready-made dress or shirt and then making small adjustments to have it fit perfectly, rather than starting from scratch with fabric, thread, and a needle. Figure 3 illustrates the transfer learning process for DL models using the example of extracting building footprints. The weights are transferred from a DL model previously trained on a larger dataset that includes many types of objects to a new DL model trained on a smaller dataset that includes only buildings. In short, using pretrained AI models saves time, money, and effort, as it eliminates the need for huge volumes of training data, massive computing resources, and extensive AI knowledge while still allowing us to tailor the model to our specific needs.

Figure 3: A conceptual diagram of transfer learning for deep learning models: a pretrained model trained on a large dataset passes its weights to a new model trained on a small dataset for a new task, such as building footprint extraction.
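
For readers who prefer scripting, the fine-tuning workflow described above could be sketched with the Python form of the Train Deep Learning Model tool, arcpy.ia.TrainDeepLearningModel. Treat this as a sketch under assumptions rather than a tested recipe: the folder and file paths are hypothetical, the keyword names and the FREEZE_MODEL value reflect my understanding of the tool's Python signature, and both should be checked against the tool documentation for your ArcGIS Pro version. The arcpy import is kept inside the function so that the parameters can be reviewed on a machine without ArcGIS Pro.

```python
def build_training_parameters(samples_folder, output_folder, pretrained_model,
                              max_epochs=20):
    """Collect keyword arguments for arcpy.ia.TrainDeepLearningModel.

    Keyword names are assumed from the tool's documented Python signature.
    """
    return {
        "in_folder": samples_folder,           # exported training samples
        "out_folder": output_folder,           # where the fine-tuned model is written
        "max_epochs": max_epochs,
        "pretrained_model": pretrained_model,  # downloaded .dlpk (or .emd) file
        "validation_percentage": 10,           # share of samples held out for validation
        "freeze": "FREEZE_MODEL",              # keep the pretrained backbone layers fixed
    }


def fine_tune(params):
    """Run the tool. Requires ArcGIS Pro with the Image Analyst extension."""
    import arcpy  # imported here so the sketch loads without ArcGIS Pro

    arcpy.CheckOutExtension("ImageAnalyst")
    arcpy.ia.TrainDeepLearningModel(**params)


# Hypothetical paths, for illustration only.
params = build_training_parameters(
    r"C:\data\building_samples",
    r"C:\models\buildings_finetuned",
    r"C:\models\pretrained_model.dlpk",
)
print(params)
# fine_tune(params)  # uncomment inside an ArcGIS Pro Python environment
```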

How to use pretrained models in ArcGIS?

We can use these pretrained deep learning models in the ArcGIS system through ArcGIS Image, including ArcGIS Image for ArcGIS Online, ArcGIS Image Analyst for ArcGIS Pro, and ArcGIS Image Server for ArcGIS Enterprise, and also with the arcgis.learn module included in the ArcGIS API for Python. For example, you can specify a pretrained model deep learning package (*.dlpk) that you’ve downloaded as the model definition for geoprocessing tools such as the Detect Objects Using Deep Learning tool in ArcGIS Pro (Figure 4). Then you can customize the model parameters based on your specific case study needs.

Figure 4: A screenshot of the Detect Objects Using Deep Learning tool in ArcGIS Pro (version 3.1.1), one of the Deep Learning tools that allows the use of a pretrained model, showing where the downloaded pretrained model should be added.
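
If you are curious about what a downloaded package contains, a deep learning package can be opened like a zip archive, and it includes an Esri model definition (.emd) file written in JSON. The following sketch reads that file using only the Python standard library. The function name and the example path are hypothetical, and the exact keys inside the .emd file (such as ModelType) vary from model to model, so the lookup below is an assumption.

```python
import json
import zipfile


def read_model_definition(dlpk_path):
    """Return the parsed .emd JSON stored inside a .dlpk archive."""
    with zipfile.ZipFile(dlpk_path) as package:
        emd_names = [name for name in package.namelist()
                     if name.lower().endswith(".emd")]
        if not emd_names:
            raise ValueError(f"No .emd file found in {dlpk_path}")
        # Use the first model definition found in the package.
        with package.open(emd_names[0]) as emd_file:
            return json.load(emd_file)


# Example usage with a hypothetical path:
# definition = read_model_definition(r"C:\models\pretrained_model.dlpk")
# print(definition.get("ModelType"))
```

Looking at the model definition before running a tool is a quick way to see which kind of task the model was built for.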

One of the pretrained models recently added to the ArcGIS Living Atlas of the World is Meta's Segment Anything Model (SAM) for segmenting objects in any imagery. This model has been trained on a diverse range of images (about 11 million images and over 1 billion masks) and is capable of segmenting various objects and elements within an image with high accuracy. SAM has strong zero-shot performance on a variety of segmentation tasks. Zero-shot means that SAM can segment objects without any additional training or fine-tuning on a specific task or domain. For example, SAM can segment cars, roads, trees, houses, and water bodies without any prior knowledge or supervision. In other words, SAM has learned a general notion of what objects are, and it can generate masks for these objects in any image or any video, even including objects and image types that it had not encountered during training. The model is open source under Apache License 2.0 and can be accessed and downloaded from GitHub. You can learn more about the SAM model training data and architecture in the paper Segment Anything.

Here are the steps that you can follow to use pretrained models in ArcGIS Pro:

Step 1) Install the deep learning libraries. Go to the Deep Learning Libraries Installers for ArcGIS page and download the installer that matches the ArcGIS Pro version you are using.

Step 2) Browse to ArcGIS Living Atlas of the World. Search for the desired pretrained model and open the item page from the search results. Click the Download button to download the model.

Step 3) Add your imagery layer in ArcGIS Pro and zoom to an area of interest.

Step 4) Browse to Tools under the Analysis tab. Then expand Image Analyst Tools (the ArcGIS Image Analyst extension for ArcGIS Pro is required) and, under Deep Learning, select the proper geoprocessing tool for your task (such as Classify Pixels Using Deep Learning or Detect Objects Using Deep Learning).

Step 5) Set the variables under the Parameters tab accordingly. The pretrained model can be used as is or fine-tuned. Don’t forget to change the processor type to GPU (if possible) under the Environments tab. Then, click Run to execute. As soon as processing finishes, the output layer will be added to the map.

Steps to use the pretrained model in ArcGIS Pro to extract building footprints.
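
Steps 3 to 5 can also be scripted. The sketch below assumes the Python form of the Detect Objects Using Deep Learning tool, arcpy.ia.DetectObjectsUsingDeepLearning, together with the processorType environment setting. The paths are hypothetical, and the model arguments (padding, batch_size, threshold) differ between models, so the values shown are placeholders rather than recommendations. As before, arcpy is imported only when the tool is actually run.

```python
def build_detection_parameters(imagery, output_features, model_dlpk,
                               padding=56, batch_size=4, threshold=0.5):
    """Collect keyword arguments for arcpy.ia.DetectObjectsUsingDeepLearning.

    Keyword names are assumed from the tool's documented Python signature.
    """
    # Model arguments are passed as one "name value;name value" string.
    arguments = f"padding {padding};batch_size {batch_size};threshold {threshold}"
    return {
        "in_raster": imagery,
        "out_detected_objects": output_features,
        "in_model_definition": model_dlpk,
        "arguments": arguments,
        "run_nms": "NMS",  # non-maximum suppression removes overlapping duplicates
    }


def run_detection(params, use_gpu=True):
    """Run the tool. Requires ArcGIS Pro with the Image Analyst extension."""
    import arcpy  # imported here so the sketch loads without ArcGIS Pro

    arcpy.CheckOutExtension("ImageAnalyst")
    # Same choice as the processor type under the Environments tab.
    arcpy.env.processorType = "GPU" if use_gpu else "CPU"
    arcpy.ia.DetectObjectsUsingDeepLearning(**params)


# Hypothetical paths, for illustration only.
params = build_detection_parameters(
    r"C:\data\imagery.tif",
    r"C:\data\results.gdb\building_footprints",
    r"C:\models\pretrained_model.dlpk",
)
print(params["arguments"])
# run_detection(params)  # uncomment inside an ArcGIS Pro Python environment
```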

You can find more information about the Building Footprints - USA deep learning model, and see its deployment in other parts of the world, in the story map Building Footprint Extraction.

In summary, the integration of pretrained deep learning models into ArcGIS has opened up new possibilities for spatial data analysis. With these models, users can extract valuable insights from geospatial data in a time-efficient manner while achieving high accuracy. As the field of deep learning continues to evolve, the availability and application of pretrained models in ArcGIS will further empower GIS professionals to tackle complex geospatial challenges and unlock the full potential of imagery and big spatial data.