Introducing Lamini On-Demand + $300 in free credit
We’re pleased to announce our new and improved self-service offering, ✨Lamini On-Demand✨. With Lamini On-Demand, you can run your tuning and inference jobs on our high-performance GPU cluster. You can use Lamini Memory Tuning to embed expertise and memory layers on top of any open LLM, turning any LLM into a mixture of experts! Our on-demand deployment option is great for people who don’t have access to GPUs or who want to test a use case, like our Meta Notebook to Tune Llama 3 for Text-to-SQL, without making any long-term or costly commitments.
We wanted to make Lamini On-Demand pricing simple, cost-effective, and easy to scale. That’s why we’re offering both new and existing users $300 in free credit. After that, it’s only $0.50 per million inference tokens or $1 per tuning step. Here’s a detailed breakdown of Lamini On-Demand pricing:
- Inference Costs 💬: It’s $0.50 per million inference tokens. The pricing is the same for input, output, and JSON structured output. Now you can confidently and cost-efficiently deploy your LLMs with guaranteed JSON output.
- Tuning Costs 🔧: The base rate is $1 per tuning step, where a step is a single update of the model’s weights. You can set the number of steps and GPUs per job, and the cost scales linearly with the number of GPUs: $1 per step on 1 GPU, $2 per step on 2 GPUs, and so on. This lets you trade off the speed and cost of your tuning jobs based on your needs.
Credits can be purchased in $100 increments from your account page: https://app.lamini.ai/account.
With Lamini On-Demand, you have full control over the length and speed of tuning jobs, and you can selectively burst tuning across multiple GPUs for faster performance. Here’s a handy calculator to help you accurately size your tuning jobs.
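If you’d rather budget a job yourself, the arithmetic is simple enough to sketch. Here’s a minimal Python estimate using the published rates above (the function and constant names are our own illustration, not part of the Lamini SDK):

```python
# Rough cost estimator for Lamini On-Demand, based on the published rates above.
# Illustrative only -- these names are not part of the Lamini SDK.

TUNING_RATE_PER_STEP = 1.00   # $ per step on a single GPU
INFERENCE_RATE_PER_M = 0.50   # $ per million tokens (input, output, or JSON)

def tuning_cost(steps: int, gpus: int = 1) -> float:
    """Cost scales linearly with both steps and GPUs: $1 x steps x GPUs."""
    return TUNING_RATE_PER_STEP * steps * gpus

def inference_cost(tokens: int) -> float:
    """$0.50 per million tokens, same rate for input, output, and JSON output."""
    return INFERENCE_RATE_PER_M * tokens / 1_000_000

# Example: a 500-step tuning job burst across 4 GPUs, then 20M inference tokens.
print(tuning_cost(500, gpus=4))    # 2000.0 -> $2,000
print(inference_cost(20_000_000))  # 10.0   -> $10
```

Swap in your own step counts, GPU counts, and token volumes to estimate a job before you launch it.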
If you’re a current customer on the Pro Plan, you should have already been transferred to On-Demand and received your free credit. You can still tune and run inference on the models you’re currently using, with the added benefit of more flexibility in how you use your credits. You don’t need to do anything other than start tuning models!
For maximum flexibility, we offer two additional deployment options:
- Reserved. If you don't have your own GPUs, you can reserve dedicated GPUs from our cluster.
- Self-Managed. Purchase licenses to run Lamini in your own secure environment — VPC, on-prem, even air-gapped.
Check out our pricing here. If you have any questions, or just want to chat about how to get the most out of our platform, please contact us at https://www.lamini.ai/contact 🦙.