Since the launch of ChatGPT in November 2022, there’s been a lot of hype around how it and other Large Language Models (LLMs) will impact P&C insurance.
Being an AI startup in the Intelligent Document Processing space, one of the most frequently asked questions we get from P&C insurers is:
“Can we use ChatGPT or Large Language Models (LLMs) to extract data from documents?”
I wanted to answer this question and share some of the insights we’ve previously only given to our insurance customers. The conclusions in this article are based on the following:
- Interviews with SortSpoke customers testing ChatGPT and LLMs
- SortSpoke’s experience experimenting with ChatGPT and LLMs
- SortSpoke’s deep in-house technical experience with LLMs
But before we dive into this article, it’s important for everyone to understand how ChatGPT, Google Bard, and other LLMs work, as well as discuss their strengths and limitations.
What are Large Language Models?
ChatGPT, Google Bard, etc., are based on Large Language Models (LLMs). In this article, we’ll refer to LLMs as the more general term that includes vendor-specific implementations like ChatGPT. These models are trained on vast amounts of data on the internet, such as web pages, Wikipedia, literature, news and trade publications, public forums, etc. One of the strengths of LLMs is their ability to learn patterns in how words and phrases relate to each other and use that to generate text that is convincingly human-like.
For example, when you ask an LLM to write a blog post, the text it returns is produced by taking in a series of words (often called a “prompt”) and predicting which words should come next, based on similar combinations of words it has seen in its training data (i.e. on the internet).
The amazing and surprising outcome is that this simple mechanism can produce text that is remarkably fluent and sometimes even correct. The downside is that ChatGPT doesn’t understand the meaning of any of the words and will confidently make claims that aren’t true. Unfortunately, this can lead to issues if you need to rely on the correctness of ChatGPT’s output.
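To make the "predict the next word" idea concrete, here is a deliberately tiny sketch. It is not how ChatGPT is implemented (real LLMs use large neural networks over tokens, not word counts); it only illustrates prediction driven by what was seen in training text. The corpus and the function names `train_bigram_counts` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Toy "training data" (invented for this example).
corpus = "colony on the moon . moon colony plans . the moon base . the moor"
model = train_bigram_counts(corpus)

# "the" was followed by "moon" twice and "moor" once, so "moon" wins.
print(predict_next(model, "the"))  # moon
```

Notice that the model picks the statistically common continuation, whether or not it is the correct one for the document in front of it.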
So can I use ChatGPT and LLMs for Data Extraction?
The answer to this question really depends on your use case. If you’re looking to pull short snippets of information (e.g. name, account number, etc.) from simple documents, and you can live with significant errors, then ChatGPT can do an acceptable job.
However, there are two BIG challenges you should keep in mind when using LLMs for data extraction.
Challenge #1 Hallucinations: One area where ChatGPT and other LLMs fall short is that they do not have a contextual understanding of documents as a whole. This can lead to LLMs generating false answers, also known as hallucinations.
Sometimes, these errors can be obvious to the human eye. In other instances, these hallucinations can be very subtle, like in Example #1.
Hallucination Example 1: ChatGPT Changed the Original Address Name
Original Address in PDF
2222 COLONY ROAD, MOORCROFT
Address Name Extracted by ChatGPT
2222 COLONY ROAD, MOONCROFT
Note: The word “Mooncroft” (with an “n”) didn’t appear anywhere in the original document. This is an example of ChatGPT hallucinating what it thought the answer should be instead of what was actually present, likely because on the internet the presence of the word “COLONY” occurs more often near “MOON” than “MOOR”. (Source Article)
While the example above is a small error, these errors can quickly stack up if you aren’t reviewing ChatGPT’s output. In addition, the more text you input into or generate from ChatGPT and other LLMs, the greater your chances of hallucinations.
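One inexpensive guard against this kind of error is to check that every extracted value actually appears in the source text. The sketch below is a minimal illustration under our own assumptions: the helper names are hypothetical, the "Insured location:" prefix in the sample source is invented, and a real pipeline would also need to handle OCR noise and reformatted values such as dates and amounts.

```python
import re


def normalize(text):
    """Collapse whitespace and uppercase so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().upper()


def appears_verbatim(extracted, source_text):
    """True if the extracted value occurs as-is somewhere in the source."""
    return normalize(extracted) in normalize(source_text)


def unsupported_words(extracted, source_text):
    """List words in the extracted value that never occur in the source."""
    source_words = set(re.findall(r"[A-Z0-9]+", normalize(source_text)))
    extracted_words = re.findall(r"[A-Z0-9]+", normalize(extracted))
    return [word for word in extracted_words if word not in source_words]


# Sample source line (prefix invented) and the value from Example 1.
source = "Insured location: 2222 COLONY ROAD, MOORCROFT"
extracted = "2222 COLONY ROAD, MOONCROFT"

print(appears_verbatim(extracted, source))   # False
print(unsupported_words(extracted, source))  # ['MOONCROFT']
```

A check like this only catches values that are absent from the document. It cannot tell you whether a value that does appear was pulled from the right place.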
For example, when extracting longer text from legal documents like clauses, LLMs summarize or paraphrase instead of extracting the complete text.
Hallucination Example 2: Extracting a “Governing Law Clause”
When we asked ChatGPT to extract the governing law clause section from two legal documents, it correctly returned 90% of the text. However, no matter what prompts we used, ChatGPT would not extract the entire Governing Law Clause section.
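If you are running your own evaluation, a figure like this can be measured by comparing the model's output against a hand-labeled reference clause. The sketch below uses Python's standard `difflib` module; the `coverage` function and the sample clause text are our own illustrative assumptions, not the documents or the method used in the test described above.

```python
from difflib import SequenceMatcher


def coverage(reference, extracted):
    """Fraction of the reference clause's characters that appear, in order, in the extraction."""
    if not reference:
        return 1.0
    matcher = SequenceMatcher(None, reference, extracted, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


# Invented sample clause and a truncated extraction of it.
reference_clause = (
    "This Agreement shall be governed by the laws of the State of New York, "
    "without regard to conflict of law principles."
)
model_output = "This Agreement shall be governed by the laws of the State of New York."

score = coverage(reference_clause, model_output)
print(f"{score:.0%} of the reference clause was returned")
if score < 1.0:
    print("Incomplete extraction: route to human review")
```

Because this needs a reference answer, it is useful for testing a vendor or a prompt on labeled samples, not for checking production output.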
Unfortunately, when it comes to data extraction, LLMs will generate answers that are incorrect or incomplete, which brings us to another limitation of LLMs.
Challenge #2 Explainability: LLMs provide no traceability from a source document to their output. If you want to verify the output from an LLM, you must redo all of the work manually and compare the results. This means you’re replacing one form of work with another.
In addition, model risk management is becoming an increasingly important requirement for compliance (see EU AI Act and NAIC bulletin). Without the ability to explain where an LLM got its answers from, compliance with model risk management policies may not be possible.
Challenges #3-6 of Using LLMs in Insurance Operations
Recently, there have been solution providers touting LLMs as a data extraction tool. If you are currently evaluating these solutions, we suggest asking these vendors how they deal with hallucinations and explainability, as well as the following:
Domain-Specific Knowledge: Insurers can fine-tune LLMs using their existing domain-specific data, but it’s a time-consuming and expensive process requiring data preparation, human review, hard computing costs, and ongoing maintenance. Furthermore, the latest generation of LLMs contains significant bias from the human reviewers at the organizations that created the original LLMs.
Limited Context: One challenge with LLMs is that they cannot process longer documents (5+ pages) effectively. Furthermore, in documents where the layout or formatting is important to finding the correct information (e.g. tables), LLMs do not consider this (see page 11 of this Deloitte report).
Prompt Engineering: Significant R&D is required to create prompts (much of which is trial and error) because there is no practical or scientific understanding of how LLMs work. This requires further investment in hiring data scientists or the newly coined job category of “prompt engineers.”
Difficulty in Extracting Structured Data: ChatGPT was designed to generate long-form text and may not be suitable for extracting data from documents in a structured format for import into other systems. This is especially problematic when extracting many data points (e.g. application forms) or multiple instances of data (e.g. loss runs) from a single document.
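To illustrate the structured-data challenge, here is a minimal sketch of the kind of validation you would need before importing an LLM's response into another system. It assumes you have prompted the model to reply with a JSON object; the field names in `REQUIRED_FIELDS` and the `parse_extraction` helper are hypothetical and would depend on your own forms.

```python
import json

# Hypothetical fields for a single loss-run row.
REQUIRED_FIELDS = {
    "insured_name": str,
    "policy_number": str,
    "total_incurred": (int, float),
}


def parse_extraction(raw_response):
    """Parse an LLM response as JSON and return (record, list_of_problems)."""
    try:
        record = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return None, ["expected a JSON object"]

    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[field], bool) or not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    return record, problems


# A chatty reply with no JSON fails immediately.
print(parse_extraction("Sure! Here is the data you asked for."))

# A partial reply parses but is flagged as incomplete.
print(parse_extraction('{"insured_name": "Acme Ltd"}'))
```

Passing this check only means the response has the right shape. It says nothing about whether the values themselves are correct.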
Our Conclusion on Using LLMs for Data Extraction
LLMs are powerful tools for generating text, but they were not built for extracting data from structured and unstructured documents.
Furthermore, while LLMs may produce impressive results out of the box that may seem effortless in a vendor demo, the technology is full of risks that won’t become apparent until you are months or even hundreds of thousands of dollars into your LLM-powered pilot.
Our stance on using LLMs to extract data is that they are a fantastic tool for quickly doing ad hoc analytics on historical data where results will be aggregated and mistakes can be tolerated. But in areas where every data point matters, such as underwriting, servicing, claims, etc., insurers should use machine learning models that are battle-tested and purpose-built for insurance data extraction.
Why SortSpoke’s approach is the best of both worlds
Before founding SortSpoke, I worked as a consultant at PwC on large data projects across various industries, and I saw firsthand the struggles of accurately extracting data from unstructured documents.
This is why I created SortSpoke and the reason I believe the best way to pull information from highly varied and complex unstructured documents is by using built-for-purpose AI models + humans in the loop.
This powerful combination allows SortSpoke’s customers to train our models within a few days (no templates or IT involvement required). Not only is this method more accurate, but it’s also fully explainable and allows our customers to ensure 100% data quality.
Over the years, we’ve helped insurers and service providers reduce operational costs and augment their workforce by reducing administrative tasks in underwriting and claims. Some of our customer success stories include:
- TAI Reduces the Treaty Review Process From 8 Hours to 35 Minutes
- SCM Insurance Services Reduces the effort to setup a claim by 80%
- RGA Insurance Selects SortSpoke to Turn Documents into High-Quality Data
Looking to Uncover More About the Latest Trends in AI?
One of the benefits of working with insurers of all shapes and sizes is that it allows us to gain a holistic view of the latest trends in P&C insurance. To discover how other insurers and service providers are leveraging AI, contact us to schedule a 30-minute consultation today!