Part 1: Can AI Extract Planning Data from Zoning By-laws Accurately?
Big tech often markets AI products as tools that make life easier and more efficient, but building trust in AI requires approaching it critically. In this three-part blog series, I will explore AI tools for extracting planning data from zoning by-laws, and share what works, what doesn’t, and how you can approach these technologies with confidence and critical thinking.
Introduction
In the realm of urban planning, reading through By-laws can be incredibly time-consuming. Finding the relevant section(s), digesting all the legal wording, checking for other requirements, provisions and amendments… that all takes time!
In the interest of speeding up the retrieval of relevant information from by-laws, we need to ask: can AI extract planning data from Zoning By-laws accurately, and what are the risks and limitations?
Outside of academic research papers, there are few easy-to-read resources explaining how AI models work, yet knowledge about these tools should be accessible to everyone. This blog shares my learning journey by breaking technical concepts down in an easy-to-understand way. The “further reading” sections point readers to external resources if they are curious to learn more.
This series asks, more specifically: how accurate can AI models be at handling this type of legal information, where high accuracy is essential?
This blog is part 1 in a three-part series:
- Part 1 – Choosing the Right AI Model for the Task: Selecting the right model is critical, as the choice of model architecture impacts accuracy, performance, and resources.
- Part 2 – Metrics! Evaluating the Accuracy of LLMs: The two models chosen in part 1 will be tested to see how accurately they can extract data from Zoning By-laws.
- Part 3 – Fine-Tuning a Language Model on Zoning By-laws: The best-performing model from part 2 will be fine-tuned with data from Zoning By-laws to see if its accuracy can be improved further.
Part 1: Choosing the Right Model for the Task
The field of AI is large, and many types of models exist. Since the main problem to address is the extraction of zoning data from an unstructured, text-based legal document (a Zoning By-law), the natural focus is on LLMs.
What are LLMs? A short definition: they are machine learning models that can parse and generate human-language text. Machine learning is a subset of AI in which models “learn” the patterns of training data and make accurate “conclusions” about new data (see image below). As mentioned before, we need to take a complex legal text and extract information from it by asking specific questions. This is a classic question-answering (QA) task in Natural Language Processing (NLP).
Esri Diagram – Common concepts within the field of AI
Esri diagram – Components/Areas of AI
Further Reading:
Natural Language Processing (NLP) vs Large Language Models (LLMs)
- Natural Language Processing (NLP): “a broader field focused on enabling computers to understand, interpret, and generate human language. It encompasses many techniques and tasks such as sentiment analysis, named entity recognition, machine translation, and extractive question answering.” (Hugging Face)
- Large Language Models (LLMs): “A powerful subset of NLP models characterized by their massive size, extensive training data, and ability to perform a wide range of language tasks.” (Hugging Face) Examples: Llama, GPT, Claude, etc.
Further Reading:
Comparing Different Natural Language Processing (NLP) Tasks
Besides generating, summarizing, and translating text content, below are some of the most common Natural Language Processing (NLP) tasks:
- Text Classification: Assigns a label to a chunk of text, such as determining its sentiment or detecting whether an email is spam.
- Named Entity Recognition (NER): Classifies each word in a sentence to find predefined entities in the text (names, people, locations, etc.).
- Question Answering (QA): Extracts a text span from context given a natural-language question.
The main problem with text classification is that it does not extract data; it assigns a category instead. The problem with Named Entity Recognition (NER) is that it restricts the questions you can ask, and the data you can extract from the Zoning By-law, to a few predefined entities, such as maximum building height and lot coverage. Zoning By-laws often do not have a single numerical value as an answer, and it is difficult to define entity spans cleanly. Question answering is therefore the most suitable task, because it extracts a span of text from the given context.
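To make “extracts a span of text from the given context” concrete, here is a toy sketch in Python. This is not a real QA model (real extractive models, covered in part 2, predict answer start and end positions in the text); it simply scores each sentence of the context by word overlap with the question and returns the best-matching sentence as the answer span. The by-law snippet and the scoring rule are invented for illustration.

```python
import re

def toy_extractive_qa(question: str, context: str) -> str:
    """Return the context sentence that best overlaps with the question.

    A real extractive QA model (e.g. BERT fine-tuned on a QA dataset)
    predicts start/end positions of the answer instead; this toy version
    only illustrates pulling an answer span out of the given context.
    """
    q_words = set(re.findall(r"\w+", question.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", context.strip())
    # Score each candidate sentence by how many question words it shares.
    return max(
        sentences,
        key=lambda s: len(q_words & set(re.findall(r"\w+", s.lower()))),
    )

# Invented by-law snippet for illustration only.
context = (
    "The minimum lot frontage in an R1 zone is 15 metres. "
    "The maximum building height in an R1 zone is 11 metres. "
    "Accessory structures require a separate permit."
)
print(toy_extractive_qa("What is the maximum building height?", context))
# → The maximum building height in an R1 zone is 11 metres.
```

Note that, like a real extractive model, this can only return text that already exists in the context; it cannot invent an answer.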
There are many types of LLMs that can be specialized for tasks like question answering, and the choice of model architecture impacts accuracy, performance, and resources.
The pros and cons of different model architectures are highlighted here:
- Encoder-only models: Best for understanding and extracting information (for example, BERT: Bidirectional Encoder Representations from Transformers). Faster, lighter, and trained on smaller datasets for highly specialized tasks; because they have no generative component, they do not hallucinate.
- Decoder-only models: Best for generating text (for example, GPT: Generative Pre-Trained Transformer). More resource-intensive, trained on larger datasets for more general-purpose use, and can hallucinate.
- Encoder-decoder models: Good for tasks like translation or summarization.
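One way to picture the encoder/decoder difference is through their attention masks: an encoder lets every token attend to every other token (bidirectional), while a decoder masks out future tokens (a causal mask) so it can generate text left to right. A minimal sketch of the two mask shapes, with no real model involved:

```python
def attention_mask(n_tokens: int, causal: bool) -> list:
    """Build an n x n mask where 1 means token i may attend to token j.

    Encoder-only models (e.g. BERT) use the full bidirectional mask;
    decoder-only models (e.g. GPT) use the causal mask, so each token
    only sees itself and earlier tokens when predicting the next one.
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(n_tokens)]
        for i in range(n_tokens)
    ]

# Three tokens, e.g. "maximum building height"
print(attention_mask(3, causal=False))  # encoder: all ones
print(attention_mask(3, causal=True))   # decoder: lower-triangular
```

Bidirectional attention is why encoder-only models are strong at understanding a passage as a whole, which suits extractive question answering.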
These three architecture types also differ in how they handle context; in other words, each model can handle different lengths of input and output text. For example, there is a limit to how long your question prompt can be. Because Zoning By-laws can run to hundreds of pages, context length is a critical factor!
- Short-context models: Faster and cheaper, but the data must be split into chunks before being fed to the model.
- Long-context models: Handle longer texts (great for legal documents like Zoning By-laws), but are slower and more expensive.
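In practice, “split into chunks” usually means a sliding window with overlap, so that an answer straddling a chunk boundary still appears whole in at least one chunk. A minimal word-level sketch (real pipelines typically chunk by tokens rather than words, and the window sizes here are arbitrary):

```python
def chunk_text(text: str, window: int = 100, overlap: int = 20) -> list:
    """Split text into overlapping word windows for a short-context model."""
    words = text.split()
    step = window - overlap  # advance by less than a full window
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reached the end of the text
    return chunks

# A 250-word by-law would yield chunks covering words 0-99, 80-179, 160-249.
```

Each chunk is then fed to the model separately, and the best-scoring answer across chunks is kept.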
Further Reading:
- Hugging Face – How do Transformers work? Diving deeper into concepts of attention and encoder-decoder architecture
- Hugging Face – How Transformers Solve Tasks
- Hugging Face – Transformer Architectures. More on the three main architectural variants
With this knowledge, encoder-only models focused on understanding sentences, like BERT (Bidirectional Encoder Representations from Transformers) and RoBERTa (Robustly Optimized BERT Approach), appear to be the most suitable models for testing the extraction of planning data from Zoning By-laws (more on these two models in part 2).
The Benefits of Using Open Source/Open Weight Models
Luckily, many of these popular LLM base models are open-weight models. Open weight, similar to open source, means they are:
- Free to use, with lots of quality tutorials teaching the public how to access and use them.
- Open weight: Model weights are the learned numerical parameters that determine the importance of features in a dataset, or how input data is transformed into output. The weights are like knobs that control how much influence an input (an image or some text) has on the AI’s final output. When training or fine-tuning a model, these weights are adjusted as the model learns from the data. With open-weight models, anyone can see these parameters, download and re-use the model, or further train or fine-tune it for their own purposes. Further reading about model weights
- Transparent about performance and accuracy
- Transparent about training data: Open-weight models often disclose the datasets they were trained on. Often, these datasets are open source as well and can be downloaded by others to train and fine-tune their own models.
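The “knobs” analogy for weights can be made concrete with a toy model that has a single weight: training repeatedly nudges the knob to reduce the error between the model’s predictions and the known answers. The numbers and learning rate below are arbitrary; real LLMs adjust billions of such weights in the same spirit.

```python
def train_one_weight(pairs, steps=200, lr=0.05):
    """Fit y = w * x by nudging the single weight w to reduce squared error."""
    w = 0.0  # the 'knob', starting with zero influence
    for _ in range(steps):
        for x, y in pairs:
            error = w * x - y          # how wrong the current prediction is
            w -= lr * error * x        # turn the knob toward less error
    return w

# Toy data following y = 2x; the learned weight should approach 2.
w = train_one_weight([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(round(w, 2))  # ≈ 2.0
```

Releasing a model “open weight” means publishing the final settings of all these knobs, so others can run or further adjust them.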
As Zoning By-laws are publicly available legal documents, it makes sense to harness open weight models as a tool to extract planning data from them!
Next in Part 2!
Now that we have narrowed down a few LLMs to use, the next part focuses on evaluating how accurately they extract planning information from Zoning By-laws, diving into the results of a case study comparing the accuracy of different models.
Some topics to be covered include:
- What evaluation metrics can we use to evaluate LLMs for this question answering task?
- Choosing the right evaluation metric
- What is Zero Shot?
- Data Quality Matters! How to use question-answering models and prepare data to prompt and evaluate them
Don’t miss what’s next! Join the Planning & Housing LinkedIn group to get notified when Part 2 goes live.