AI in 2025: What to expect in the year ahead

Sharon Zhou, CEO

As the reality of a new year sets in, I want to take a minute to share my thoughts on what it will take for enterprise AI developers to achieve high levels of accuracy and customization with LLMs.

Specialized agents will be easy to build

Open-source models have reached the frontier, closing the performance gap with the best closed models. The sentiment within the enterprise has flipped from “Open-source will never catch up” to “Open-source is a cost-effective and viable option.”

In 2025, at least one industry will have an LLM built on an open model (looking at you, Meta) that is significantly better than the closed alternatives. Open models will be much easier to leverage than closed ones! As a result, it'll be easier to adapt and steer open models than to wait for closed ones to catch up, because the tools to steer them will be in the hands of millions of developers rather than a thousand AI researchers.

This will require greater ease of use of LoRA finetuning, which, to date, has only been accessible to ML experts. Developer tools with simple interfaces for non-ML experts are needed to make model adaptation more accessible (along with better docs and examples for devs).
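
To ground this, here is a minimal sketch of what LoRA finetuning looks like today with Hugging Face's peft library; the model name and hyperparameters are illustrative choices, not recommendations.

```python
# A minimal LoRA finetuning setup with peft + transformers.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-3.1-8B"  # illustrative; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA trains small low-rank adapter matrices instead of the full weights,
# so finetuning fits on far less hardware than full finetuning.
config = LoraConfig(
    r=16,                                 # rank of the adapter matrices
    lora_alpha=32,                        # scaling applied to the adapter output
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

Even this "simple" version assumes familiarity with model internals like target modules, which is exactly the kind of detail better developer tools should hide.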

Finetuning will be easier than advanced RAG

RAG has gotten extremely complex, and that complexity ekes out only incremental improvements in accuracy and customization. Finetuning, on the other hand, has started to gain traction because of its ability to achieve high levels of accuracy, especially on domain-specific data. In 2025, finetuning will take over as the significantly more powerful tool. New forms of finetuning will abound, addressing issues like hallucinations and tuning qualities like creativity.

Again, this will require tools and interfaces that transform grueling, traditional MLOps workflows into something more automated and developer-friendly. Saying goodbye to manual data preparation, cleaning, and labeling pipelines will be a refreshing change, and it will make finetuning (and other post-training techniques) far more approachable.

LLM inference latency and costs will drop to “zero”

No surprises here — this is the trend we observed throughout 2024, with the cost of inference steadily dropping toward the cost of compute. That's because it's easier to build inference systems than it is to build training systems, and with open-source options like vLLM, inference has become a commodity building block.
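
As an illustration of just how commoditized this has become, here is a minimal sketch of serving an open model with vLLM's offline inference API; the model name and prompt are illustrative.

```python
# Batched generation with vLLM: a few lines to run an open model.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # illustrative model choice
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize our Q3 support tickets in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```

Behind these few lines, vLLM handles batching, KV-cache management, and GPU scheduling: the pieces that used to require a dedicated inference team.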

Now that inference is more or less a solved problem, the playing field has moved back to better intelligence. Steering models toward new and better forms of intelligence is what enterprises will pay for. As mentioned earlier, specialized agents will be easier and more valuable to build.

What it will take in 2025:

  • Tools to define and manage agent behaviors, goals, and constraints reliably (a minimal sketch follows this list)
  • Better tools for managing different types of agent memory, when that memory is both inside the LLM (e.g. via finetuning) and outside the LLM (e.g. RAG)
  • Better debuggers for agent decision-making
  • Clearer metrics and pricing models for measuring agent effectiveness, at least within some verticals
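
As a hedged illustration of the first item, here is what a developer-facing interface for agent goals and constraints might look like. AgentSpec and its fields are hypothetical, not an existing library's API.

```python
# A hypothetical spec for declaring agent behavior up front,
# so the runtime can enforce constraints instead of trusting the prompt.
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    goal: str                                   # what the agent is trying to achieve
    allowed_tools: list[str] = field(default_factory=list)  # hard constraint on actions
    max_steps: int = 10                         # budget that bounds runaway loops
    refusal_policy: str = "escalate_to_human"   # behavior when a constraint is hit

support_agent = AgentSpec(
    goal="Resolve billing questions using only verified account data",
    allowed_tools=["lookup_invoice", "open_ticket"],
    max_steps=5,
)

def step_allowed(spec: AgentSpec, tool: str, steps_taken: int) -> bool:
    """Check the runtime would perform before every agent action."""
    return tool in spec.allowed_tools and steps_taken < spec.max_steps
```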

Say bye to manual data labeling 

Enterprises have realized that they hold vastly more data than it took to train GPT-4, and that this data is incredibly valuable, if it can be transformed and cleaned into the right format. But manual labeling sounds awful.

Enterprise intelligence will become a new category that builds on open models, so the data derivative (model weights) will be owned by the enterprise and governed like their data. Data labeling will be heavily automated and transformed within organizations, both for and because of this new era of LLMs.

What it will take:

  • Infrastructure to detect and classify data assets across the organization, stratify them by importance, and auto-label / pre-label that data at scale (see the sketch after this list)
  • New validation mechanisms to ensure auto-labeled data maintains quality
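
Here is a hedged sketch of what auto-labeling with a validation gate could look like; classify() is a toy stand-in for whatever labeling model you call, and the confidence threshold is an illustrative assumption.

```python
# LLM-assisted auto-labeling with a confidence-based validation gate.
from dataclasses import dataclass

@dataclass
class Label:
    value: str
    confidence: float  # self-reported or calibrated confidence

def classify(text: str) -> Label:
    """Toy stand-in for a call to an LLM labeling endpoint."""
    value = "invoice" if "invoice" in text.lower() else "other"
    return Label(value, confidence=0.95 if value == "invoice" else 0.6)

def auto_label(records: list[str], threshold: float = 0.9):
    accepted, needs_review = [], []
    for text in records:
        label = classify(text)
        # Validation mechanism: only high-confidence labels skip human review.
        if label.confidence >= threshold:
            accepted.append((text, label.value))
        else:
            needs_review.append(text)  # routed to a much smaller manual queue
    return accepted, needs_review

done, queue = auto_label(["Invoice #123 overdue", "Meeting notes from Tuesday"])
print(done, queue)  # one auto-accepted label, one record left for review
```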

It takes a lot more than The Model

People have been lauding The Model, but last year they realized that it's more than the model: it's the cycle that combines The Model with The Data and The Evaluation (all running on The Compute) — and no step can be done in isolation.

This year, companies will build robust infrastructure around this cycle, with automated pipelines that continuously gather data from model interactions, evaluate performance, and finetune models. The focus will shift from "getting a model working" to "maintaining a healthy model lifecycle" with metrics and processes more akin to traditional software development practices.

What it will take:

  • Open-source standards for logging model interactions and outcomes
  • Automated evaluation systems that can detect model drift and performance degradation (sketched below)
  • Pipeline orchestration tools that can handle the complexity of continuous model updates
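
To make the second item concrete, here is a minimal, hedged sketch of drift detection that compares recent evaluation scores against a baseline; the scores, tolerance, and trigger are all illustrative.

```python
# Flag model drift by comparing recent eval scores to a shipped baseline.
from statistics import mean

def detect_drift(baseline_scores: list[float],
                 recent_scores: list[float],
                 tolerance: float = 0.05) -> bool:
    """True when recent accuracy falls meaningfully below the baseline."""
    return mean(recent_scores) < mean(baseline_scores) - tolerance

baseline = [0.91, 0.92, 0.90]  # regression-suite scores when the model shipped
recent = [0.84, 0.85, 0.83]    # scores from last night's automated eval run

if detect_drift(baseline, recent):
    print("Degradation detected: trigger the re-finetuning pipeline")
```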

A hybrid intelligence approach

Enterprises are using the cloud in two ways: buying big models hosted by tier-1 clouds, namely GPT on Azure, and buying NVIDIA compute on AWS to run open-source models.

Next, we’ll see a "hybrid intelligence" approach emerge in production, where enterprises maintain a portfolio of models — some hosted by major providers, others run in-house on their own infrastructure. Their agentic workflows will use hybrid intelligence to be most successful.

What it will take:

  • Standardized APIs to abstract away the differences between hosted and local models (a minimal sketch follows this list)
  • Security frameworks to manage data privacy across multiple model deployments, with varying levels of security (in-house being more secure)
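
As a hedged sketch of such a standardized API, here is one way a common interface with security-aware routing could look; the class names, canned replies, and routing rule are illustrative, not a real library.

```python
# One interface over a portfolio of hosted and local models.
from abc import ABC, abstractmethod

class ChatModel(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class HostedModel(ChatModel):
    def complete(self, prompt: str) -> str:
        # Stand-in for a provider API call (e.g. GPT hosted on Azure).
        return f"[hosted reply to: {prompt}]"

class LocalModel(ChatModel):
    def complete(self, prompt: str) -> str:
        # Stand-in for an in-house deployment (e.g. a vLLM server in your VPC).
        return f"[local reply to: {prompt}]"

def route(contains_sensitive_data: bool) -> ChatModel:
    # Security-aware routing: sensitive prompts never leave your infrastructure.
    return LocalModel() if contains_sensitive_data else HostedModel()

print(route(contains_sensitive_data=True).complete("Summarize patient record 42"))
```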

The rise of mini-models

People around the world are more creative than the model creators in how they use these models.

We will see this creative energy channeled into specialized "mini-models": smaller, focused models trained for specific domains or tasks that outperform general models in their niche. Communities will develop around sharing and combining these mini-models, leading to a large ecosystem of specialized AI capabilities that can be mixed and matched like software libraries.

What it will take:

  • Package management systems for model/adapter/LoRA sharing and versioning (a minimal sketch follows this list)
  • Tools for measuring and validating specialized model performance
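
As a hedged preview of what mixing and matching mini-models could feel like, here is a sketch that loads LoRA adapters onto a shared base model with Hugging Face's peft; the adapter repo names are hypothetical placeholders.

```python
# Swap specialized LoRA adapters on one base model, like installing libraries.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# Pull a domain-specific adapter ("acme/legal-contracts-lora" is hypothetical).
model = PeftModel.from_pretrained(base, "acme/legal-contracts-lora")

# Load a second specialization alongside it and switch per task.
model.load_adapter("acme/tax-forms-lora", adapter_name="tax")  # hypothetical repo
model.set_adapter("tax")  # activate the tax specialist for the next request
```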

I hope you enjoyed this post! Please let me know what you think and stay tuned for part 2 where we discuss the broader implications of AI in 2025. 
