Tutorial: Using LLMs to get accurate data from earnings calls with Llama 3.1 and Lamini

Lamini

Thanks to SEC filing regulations, there is a goldmine of freely available financial data that, in theory, any company could take advantage of. But getting data out of earnings call transcripts, public filings, and white papers is easier said than done.

Currently, a multibillion-dollar financial services data industry exists to help companies extract, quantify, and analyze data like this. But what if, leveraging the power of your own custom-tuned LLM, you could extract those insights programmatically, at large scale?

The challenge of using LLMs in finance

There are, of course, very good reasons that finance companies are hesitant to leverage LLMs for mission-critical work like extracting important financial data:

  • LLMs can get things wrong. High-profile “hallucinations” like Google’s AI telling search users to eat glue and rocks have understandably made businesses wary of leveraging LLMs when data correctness matters (which, in finance, it almost always does).
  • Custom models and fine-tuning, which can be used to eliminate hallucinations and make LLMs more accurate, can be difficult and time-consuming.
  • Many companies feel they simply don’t have the data to leverage custom LLMs, or don’t have the resources that would be required to effectively clean and transform the large amounts of data LLMs require for training and fine-tuning. 
  • Some companies are also concerned about progress being stymied, or products being hobbled, by restrictive LLM rate limits. 

But what if it were possible for finance companies to leverage LLMs without having to worry about hallucinations or rate limits?

What if any company could build with LLMs because the LLMs themselves can help with tasks like generating training data, and cleaning and transforming existing data into a more LLM-friendly format?

Making that dream into reality

In this tutorial, we’re going to build a proof of concept for doing exactly that. 

Specifically, we’ll be building a data pipeline that intakes transcripts from earnings calls and outputs question-and-answer pairs in a structured JSONL format like so:

{
  "company": "WPP",
  "question": "What is the percentage growth rate of WPP's business in Germany in Q1, according to Mark Read?",
  "answer": "16%"
}
{
  "company": "GDOT",
  "question": "What is the size of the asset size that GDOT aims to maintain to protect its revenue",
  "answer": "According to the transcript, GDOT aims to maintain an asset size of $10 billion or less to protect its revenue"
}

This data can subsequently be used for training and fine-tuning more advanced models, or for a variety of other purposes. It’s a good illustration of how LLMs can take a relatively small amount of unstructured data and, with very little human effort, transform it into a large amount of structured data, getting the details right instead of hallucinating.

Here’s what we’ll be doing:

  1. Imports and setup
    • Importing necessary modules
    • Setting up logging
    • Defining our main() function
  2. Loading data and creating the pipeline
    • Loading the earnings calls data
    • Creating the QuestionAnswerPipeline class
  3. Breaking up transcripts and extracting info
    • Breaking transcripts into chunks
    • Pulling relevant metadata for each chunk
  4. Building the question and answer generators
    • Creating question and answer generators and relevant methods
    • Defining the prompts for each generator
  5. Saving the outputs and running the code

Don’t worry if that looks like a lot; as we’ll see, the Lamini modules we’ll be importing have already done quite a lot of the work for us! 

What we’ll be using

For this tutorial, the primary technologies we’ll be using are the Python programming language, Meta’s open-source Llama 3.1 LLM, and the Lamini platform, which will help us increase accuracy and automate aspects of building the pipeline. 

We’ll also be using the following Python modules:

  • typing (for type hints)
  • asyncio (for asynchronous processing)
  • tqdm (so we can see progress bars)
  • jsonlines (to help create the JSONL output)
  • collections (for useful container data types)
  • logging (for logging)

Over the course of this tutorial, we’ll write a Python script called generate_data.py that we can use to generate question and answer outputs from real earnings call transcripts, just like the example provided above.

Step 1: Imports, logging, and main()

We’ll start by importing all of the modules we need to run the script. If you don’t have them all installed already, you’ll first want to install any missing third-party packages (lamini, jsonlines, and tqdm; the rest are part of Python’s standard library) on whatever machine you’ll be using to run the code. For example:

pip install lamini jsonlines tqdm

Once everything’s installed, we’ll start our script by importing everything we’ll need:

from lamini.generation.generation_node import GenerationNode
from lamini.generation.generation_pipeline import GenerationPipeline
from lamini.generation.base_prompt_object import PromptObject

import jsonlines

import collections
import asyncio
from tqdm import tqdm

from typing import Union, Iterator, AsyncIterator

import logging

Next, we’ll set up our logging. Here, we’re configuring our logs so they’ll store:

  • The time the log message was generated
  • The name of the logger object
  • The severity level of the message
  • The log message itself

logger = logging.getLogger(__name__)

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
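
With this configuration, each log line will look something like the following (the timestamp here is just an illustration):

2024-08-01 12:00:00,000 - __main__ - INFO - Loaded earnings call for WPP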

Now we’ll define our main() function. Using asyncio to enable asynchronous processing, this function loads the earnings call data, generates answers using the pipeline we’ll create, and then saves those answers using a function we’ll define later.

Asynchronous processing is necessary here for a variety of reasons, but the primary one is efficiency. Async processing allows us to do parallel processing, avoid the I/O bottlenecks that would otherwise be associated with reading and writing JSONL files, and process numerous transcripts at once without running afoul of rate limits.

async def main():
    earnings_calls = load_earnings_calls()

    answers = QuestionAnswerPipeline().call(earnings_calls)

    await save_answers(answers)

Now that we’ve got some of the basic elements required for our script, let’s move on to pulling in the data we need and setting up the pipeline.

Step 2: Load the earnings calls and define the pipeline

First, we’ll create a function to load the earnings calls. Note that in the code below, the value of path will need to be changed to reflect the location of the call transcripts (in JSONL format) on your system.

async def load_earnings_calls():
    path = "/app/lamini-earnings-sdk/data/test_set_transcripts.jsonl"

    with jsonlines.open(path) as reader:
        for line in reader:
            logger.info(f"Loaded earnings call for {line['ticker']}")
            yield PromptObject(prompt="", data=line)
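
For reference, each line of the transcripts file should be a JSON object containing the fields the rest of the script reads; something like this (the field names come from the code, but the values here are invented):

{"ticker": "WPP", "exchange": "NYSE", "date": "2023-04-25", "q": "Q1", "transcript": "Good morning, everyone, and welcome to our first quarter trading update..."}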

Next, we’ll create a class for the pipeline itself. This class will inherit from GenerationPipeline (one of the things we imported from the Lamini module), which enables asynchronous processing and manages API connections to ensure processing is possible before generation begins. 

The class contains a question generator and an answer generator (we’ll create these in the next step). It also contains the forward() method, which takes in data from the call transcripts, generates questions from that data, and then uses those questions to generate answers, returning newly generated question and answer pairs:

class QuestionAnswerPipeline(GenerationPipeline):
    def __init__(self):
        super(QuestionAnswerPipeline, self).__init__()

        self.question_generator = QuestionGenerator()
        self.answer_generator = AnswerGenerator()

    def forward(self, x):
        x = self.question_generator(x)
        x = self.answer_generator(x)
        return x

Now that we’ve created the pipeline, we’ll move on to some basic data preparation to help with our question and answer generation. 

Step 3: Break up the transcripts and extract company info

Here, we’ll be creating two short functions that our question and answer generators will use to help define what section of a transcript the LLM is processing at any given time, and extract information about the company the transcript is discussing.

Specifically, chunk_prompt splits the transcripts into smaller chunks for processing and then outputs these chunks as PromptObjects:

def chunk_prompt(prompt):
    transcript = prompt.data["transcript"]
    chunk_size = 4096
    chunk_step = 2048

    for i in range(0, len(transcript), chunk_step):
        chunk = transcript[i : i + chunk_size]
        chunked_data = prompt.data.copy()
        chunked_data["transcript"] = chunk
        prompt_object = PromptObject(prompt=prompt.prompt, data=chunked_data)

        yield prompt_object
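
Because chunk_step is half of chunk_size, consecutive chunks overlap by 50%, so a fact that straddles a chunk boundary still appears intact in at least one chunk. Here’s a toy illustration of the same slicing logic, with the sizes scaled down purely for readability:

text = "abcdefghij"  # stand-in for a transcript
chunk_size = 4
chunk_step = 2

chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_step)]
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']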

Then, get_company_info pulls the relevant metadata attached to each chunk: the exchange the company is listed on, its stock ticker, the call date, and the financial quarter being discussed. This context is provided to the LLM with each prompt chunk so that it can be factored into the generated responses.

def get_company_info(chunk):
    info = f"Company: {chunk.data['exchange']}\n"
    info += f"Ticker: {chunk.data['ticker']}\n"
    info += f"Date: {chunk.data['date']}\n"
    info += f"Quarter: {chunk.data['q']}\n"
    return info

Now it’s finally time to get to the meat of this project: building the question and answer generators!

Step 4: Build the question and answer generators

Now that we’ve got the basic pipeline built, it’s time to create the question and answer generators that power it. 

Let’s start with the question generator class. We create the class (inheriting from GenerationNode, one of our Lamini imports) and define the model we’ll be using, which in this case is Llama 3.1.

class QuestionGenerator(GenerationNode):
    def __init__(self):
        super(QuestionGenerator, self).__init__(
            model_name="meta-llama/Meta-Llama-3.1-8B", max_new_tokens=150
        )

Next, we’ll create the generate method to generate the questions. After adding a template (we’ll cover this shortly), we’ll call the superclass method generate from GenerationNode. This method uses the templated prompt to generate three questions in string format, which are then returned as the method’s output.

    def generate(
        self,
        prompt: Union[Iterator[PromptObject], AsyncIterator[PromptObject]],
        *args,
        **kwargs,
    ):
        prompt = self.add_template(prompt)

        results = super(QuestionGenerator, self).generate(
            prompt,
            output_type={
                "question_1": "string",
                "question_2": "string",
                "question_3": "string",
            },
            *args,
            **kwargs,
        )
        return results
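
With this output_type, each result that comes back carries its three questions under the keys we declared, which the process_results method below unpacks. Conceptually, a response looks like this (the question text is invented):

{
    "question_1": "What was the revenue growth rate in Q1?",
    "question_2": "...",
    "question_3": "..."
}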

Next, we’ll create a method to process this output. It checks that all three questions are present in each response and then creates a PromptObject from each question; these will subsequently be used as the prompts for answer generation.

    async def process_results(self, results):
        async for result in results:
            logger.debug(f"Generated question for {result}")
            if result is None:
                continue

            if "question_1" not in result.response:
                continue

            if "question_2" not in result.response:
                continue

            if "question_3" not in result.response:
                continue

            questions = (
                result.response["question_1"],
                result.response["question_2"],
                result.response["question_3"],
            )
            for question in questions:
                result = PromptObject(prompt=question, data=result.data.copy())
                yield result

Now we’ll define the add_template method so that we can template the incoming data. This method takes prompts built from the earnings call data as input, breaks them into chunks using the chunk_prompt function we created in Step 3, and then outputs those templated chunks for use by the generate method we just created, as sketched below.
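
Here’s a minimal sketch of what that method can look like, assuming it chunks each incoming prompt with the chunk_prompt function from Step 3 and templates each chunk with make_prompt, much as the AnswerGenerator’s add_template (shown later) templates its prompts:

    async def add_template(self, prompts):
        # For each incoming earnings call, split the transcript into
        # overlapping chunks and template each chunk as its own prompt.
        async for prompt in prompts:
            for chunk in chunk_prompt(prompt):
                chunk.prompt = self.make_prompt(chunk)
                yield chunk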

Finally, we’ll create the initial prompt for Llama 3.1 to process. We’re telling the model to act like a financial analyst and generate questions that can be answered by the text. Note that we’re specifically giving the LLM both the company data for the company in question and a specific chunk of the transcript to work with.

Note also that we’re using some of Llama 3.1’s special tokens in this prompt. These are used to make the prompt easier for the model to parse.

  • <|begin_of_text|> marks the beginning of the text prompt.
  • <|start_header_id|>user<|end_header_id|> specifies that this section of the prompt is a user message, i.e., instructions from a human user.
  • <|eot_id|> at the end marks the end of the user message section.
  • <|start_header_id|>assistant<|end_header_id|> specifies that the user-provided prompt has ended and the model should begin to generate a response.

    def make_prompt(self, chunk):
        prompt = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"

        prompt += (
            "You are a financial analyst with extensive experience at Goldman Sachs. "
        )
        prompt += "You are reading the earnings call transcript for the following company:\n\n"
        prompt += "====================\n\n"
        prompt += get_company_info(chunk) + "\n"
        prompt += "====================\n\n"
        prompt += (
            "You are reading the following section of the earnings call transcript:\n\n"
        )
        prompt += "====================\n\n"
        prompt += chunk.data["transcript"]
        prompt += "====================\n\n"
        prompt += "Consider the numbers in the transcript. "
        prompt += "Ask three questions about the numbers in the transcript that require precise answers. "
        prompt += "Only ask questions that can be answered using the transcript."
        prompt += "<|eot_id|>"
        prompt += "<|start_header_id|>assistant<|end_header_id|>"

        return prompt

Now that we’ve built the question generator, we just need to repeat the same process for the answer generator, making a few minor adjustments so that it generates answers to the QuestionGenerator’s questions using the same snippet of the earnings call transcript. Note that its add_template method stashes each incoming question in prompt.data["question"] before overwriting prompt.prompt with the full templated prompt; that’s what lets save_answers (in the next step) write out both the question and its answer.

class AnswerGenerator(GenerationNode):
    def __init__(self):
        super(AnswerGenerator, self).__init__(
            model_name="meta-llama/Meta-Llama-3.1-8B", max_new_tokens=150
        )

    def generate(
        self,
        prompt: Union[Iterator[PromptObject], AsyncIterator[PromptObject]],
        *args,
        **kwargs,
    ):
        prompt = self.add_template(prompt)
        results = super(AnswerGenerator, self).generate(
            prompt,
            output_type={"answer": "str"},
            *args,
            **kwargs,
        )
        return results

    async def process_results(self, results):
        async for result in results:
            logger.info(f"Generated answer for {result}")
            if result is None:
                continue
            yield result

    async def add_template(self, prompts):
        async for prompt in prompts:
            logger.info(
                f"Generating answer for {prompt.data['ticker']}, {prompt.data['q']}, {prompt.prompt}"
            )
            prompt.data["question"] = prompt.prompt
            prompt.prompt = self.make_prompt(prompt)
            yield prompt

    def make_prompt(self, chunk):
        prompt = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"

        prompt += (
            "You are a financial analyst with extensive experience at Goldman Sachs. "
        )
        prompt += "You are reading the earnings call transcript for the following company:\n\n"
        prompt += "====================\n\n"
        prompt += get_company_info(chunk)
        prompt += "====================\n\n"
        prompt += (
            "You are reading the following section of the earnings call transcript:\n\n"
        )
        prompt += "====================\n\n"
        prompt += chunk.data["transcript"] + "\n"
        prompt += "====================\n\n"
        prompt += "Consider the numbers in the transcript. "
        prompt += "If the answer to the question cannot be found in the transcript, reply that you do not know. "
        prompt += "Answer the following questions about the numbers in the transcript. "
        prompt += chunk.prompt
        prompt += "<|eot_id|>"
        prompt += "<|start_header_id|>assistant<|end_header_id|>"

        return prompt

Note that both the question and answer generators can easily be modified by tweaking the text of the prompts in their make_prompt methods.

Step 5: Save the outputs and run it!

The last step is also the most straightforward. All we need to do now is create the save_answers function we called from main() so that our generated questions and answers are saved to a JSONL file. This code simply creates a file (at path; change the value of this variable to your desired filepath) and writes each question and answer pair to it, along with relevant metadata such as the company’s stock ticker, the date, and so on.

async def save_answers(answers):
    path = "/app/lamini-earnings-sdk/data/generated_q_a.jsonl"

    with jsonlines.open(path, "w") as writer:
        pbar = tqdm(desc="Saving answers", unit=" answers")
        async for answer in answers:
            answer = {
                "ticker": answer.data["ticker"],
                "q": answer.data["q"],
                "date": answer.data["date"],
                "transcript": answer.data["transcript"],
                "prompt": answer.prompt,
                "question": answer.data["question"],
                "answer": answer.response["answer"],
            }
            writer.write(answer)
            pbar.update()
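
Each saved record carries the full set of fields from the dict above, not just the question and answer shown in the intro example. A line in generated_q_a.jsonl will look roughly like this (the keys come straight from save_answers; the values here are placeholders):

{"ticker": "WPP", "q": "Q1", "date": "2024-04-25", "transcript": "...", "prompt": "<|begin_of_text|>...", "question": "What is the percentage growth rate of WPP's business in Germany in Q1?", "answer": "16%"}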

Finally, there’s only one thing left to do: call main() and run the script to begin generating questions and answers:

asyncio.run(main())
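
If you’d prefer the script to start generating only when it’s executed directly (rather than whenever it’s imported), you can wrap this call in Python’s usual entry-point guard:

if __name__ == "__main__":
    asyncio.run(main())

Either way, run the script from your terminal with python generate_data.py and watch the progress bar as the question and answer pairs roll in.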

Next steps

Building a dataset of accurate questions and answers from earnings call data is valuable in its own right, but the broader point is that using Lamini and an open-source LLM, we were able to:

  1. Generate a large amount of structured data from a comparatively small amount of unstructured data.
  2. Generate outputs that are highly accurate by focusing the LLM on small chunks of source data, with very clear guidelines in the prompt shaping its output.
  3. Avoid hassles such as manual data cleaning and issues with rate limits.

Accomplishing all of this so simply is possible in large part thanks to the Lamini platform. But this is really just the tip of the iceberg when it comes to what Lamini can do. Contact us or try it for free on Lamini Cloud!