Build High-precision Function Calling Agents

What is function calling?

Function calling refers to a model's capability to interact with external tools or APIs to perform specific tasks. The LLM identifies the appropriate function to fulfill the request, extracts the function's parameters from the request, and then the function is executed. Ideally, the result is returned as valid JSON so it can be processed and summarized back into natural language. In practice, however, the output often contains syntactic or semantic errors that lead to incorrect results. To get perfectly structured JSON, you typically have to build a custom parser or use a tool like LlamaIndex or LangChain.
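
To make that flow concrete, here is a minimal, provider-agnostic sketch in Python. Everything in it is illustrative: the `get_current_weather` tool, the advertised schema, and the hard-coded `model_output` standing in for what a real LLM API would return.

```python
import json

# Illustrative tool the model is allowed to call.
def get_current_weather(city: str) -> dict:
    # A real implementation would query a weather API.
    return {"city": city, "temp_c": 21, "conditions": "clear"}

TOOLS = {"get_current_weather": get_current_weather}

# Schema advertised to the model so it knows what it may call.
TOOL_SCHEMA = {
    "name": "get_current_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Simulated model output: the JSON the LLM is expected to emit.
model_output = '{"function": "get_current_weather", "arguments": {"city": "Paris"}}'

try:
    call = json.loads(model_output)   # syntactic errors surface here
    fn = TOOLS[call["function"]]      # hallucinated function names surface here
    print(fn(**call["arguments"]))    # bad parameter names/types surface here
except (json.JSONDecodeError, KeyError, TypeError) as err:
    print(f"Could not execute function call: {err}")
```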

Function calling extends an LLM's utility beyond generating text to performing dynamic, context-aware operations. It also lets you retrieve current data that was not available when the LLM was trained. Function calling enables developers to build:

  • Responsive Customer Support Chatbots: These use function calling to retrieve user account information, process service requests, and provide real-time support. For instance, a user might ask, "What is the status of my order?" The chatbot identifies the intent, calls the function to check the order status, and then returns the information to the user (see the sketch after this list).
  • Virtual Assistants: Assistants like Siri or Google Assistant use function calling to perform tasks such as setting reminders, sending messages, or providing weather updates. When a user asks, "Set a reminder for 3 PM," the assistant processes the request and calls the function to create the reminder.
  • E-commerce Recommendations: Online stores use function calling to provide personalized product recommendations. When a user browses products, the recommendation system calls functions to analyze user behavior and suggest relevant items.
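
As an illustration of the chatbot flow in the first bullet, the sketch below wires a simulated model decision to an order-status lookup and turns the result back into a reply. `lookup_order` and the canned `model_decision` are hypothetical stand-ins for a real LLM call and a real order system.

```python
import json

def lookup_order(order_id: str) -> dict:
    # Hypothetical stand-in for a real order-management API.
    return {"order_id": order_id, "status": "out for delivery", "eta": "today, 6 PM"}

# What the LLM would emit after reading "What is the status of my order #98321?"
model_decision = '{"function": "lookup_order", "arguments": {"order_id": "98321"}}'

call = json.loads(model_decision)
result = lookup_order(**call["arguments"])

# The result is handed back to the model (or a template) to phrase the answer.
reply = f"Your order {result['order_id']} is {result['status']}, arriving {result['eta']}."
print(reply)
```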

Challenges with function calling

There are four main challenges when it comes to function calling with LLMs.

  • Scalability and reliability. Traditional large language models like GPT-4 often encounter challenges in scalability and reliability, particularly when handling more than 20 functions. These models can invoke unnecessary or incorrect functions, or generate incorrect parameters, leading to unreliable outcomes. Moreover, as the number of functions or the complexity of sequential function calls increases, the accuracy of these models tends to diminish.
  • Accuracy of JSON output. Accurate, structured JSON output is critical for production use cases. In addition to hallucinating functions, LLMs can also hallucinate JSON output, particularly with more complex schemas. Mistakes in JSON structure, such as incorrect data types or misnamed keys, can lead to errors when the data is processed downstream (a validation sketch follows this list).
  • Latency. Function calling often involves interacting with external APIs or databases, which can introduce significant delays, especially if the data retrieval process is complex or the external system is slow. High latency can be particularly problematic in real-time applications where timely information is crucial.  
  • Cost. Executing external function calls can be costly, especially if the function relies on APIs or services that charge based on the amount of data retrieved or the number of calls made. Frequent or complex function calls can quickly become expensive.
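
One common mitigation for the JSON-accuracy problem above is to validate every model output against the declared schema before acting on it. The sketch below uses the third-party `jsonschema` package (`pip install jsonschema`); the schema and the faulty model output are illustrative.

```python
import json

from jsonschema import ValidationError, validate  # pip install jsonschema

schema = {
    "type": "object",
    "properties": {
        "function": {"type": "string"},
        "arguments": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
    "required": ["function", "arguments"],
}

# Faulty model output: misnamed key ("args") and wrong type for order_id.
model_output = '{"function": "lookup_order", "args": {"order_id": 98321}}'

try:
    validate(instance=json.loads(model_output), schema=schema)
except ValidationError as err:
    # Caught here, before the bad call reaches any downstream system.
    print(f"Schema violation: {err.message}")
```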

Our solution

  • Precise Recall with Lamini Memory Tuning. Lamini's Memory Tuning helps teams achieve >95% accuracy, even with thousands of functions.
  • Guaranteed JSON Output. The Lamini Schema Generator guarantees accurate JSON output based on user-defined schemas, reducing downstream errors (see the client sketch after this list).
  • High inference throughput, low latency. Lamini delivers 52x more queries per second compared to vLLM. This ensures that your users experience minimal wait times, even when dealing with large-scale function calling tasks.
  • Affordable and flexible deployment options. Lamini-powered models can run anywhere, including air-gapped environments, on-premises, or in the cloud. Lamini is GPU-agnostic and supports training and inference on both Nvidia and AMD GPUs. We’ve optimized our entire stack for high performance, so less compute is needed during inference and tuning.
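
For reference, here is a minimal sketch of requesting schema-constrained output through Lamini's Python client (`pip install lamini`). The `Lamini` class and the `generate(..., output_type=...)` pattern follow Lamini's published examples, but the model name, prompt, and declared keys here are assumptions; check the current Lamini docs for the exact signature.

```python
from lamini import Lamini  # pip install lamini; requires a Lamini API key

# Model name is illustrative; pick one available on your Lamini deployment.
llm = Lamini(model_name="meta-llama/Meta-Llama-3-8B-Instruct")

# output_type pins the shape of the response, so the returned JSON
# matches the declared keys and types instead of free-form text.
result = llm.generate(
    "What is the status of order 98321?",
    output_type={"function": "str", "order_id": "str"},
)
print(result)  # e.g. {"function": "lookup_order", "order_id": "98321"}
```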

How CopyAI categorizes vast amounts of data with 100% accuracy

"Once [the LLM built with Lamini] was ready, we tested it, and it was so easy to deploy to production. It allowed us to move really rapidly." - Chris Lu, Co-founder

Learn more