Pricing

Deploy accurate, fast, secure, and cost-efficient models tuned to the data that matters most to your business. We offer three deployment options to give you full control over your data and models.  
On-demand
Pay as you go; new users get $300 in free credit
$0.50/1M inference tokens - one price for input, output, and JSON output
$1/tuning step - scale the number of steps based on your data
Linear price multiplier - burst tuning across multiple GPUs or nodes for faster performance (e.g., 3x price for 3 GPUs)
Try the full lifecycle: choose a model, RAG/prompt tune, memory tune, evaluate, and run inference
Access to top open source models like Llama 3.1, Mistral v0.3, and Phi 3
Runs on Lamini’s optimized compute platform, generating state-of-the-art MoME models
Get Started
Reserved
Don't have your own GPUs? Get dedicated GPUs from Lamini's cluster
Custom
Run on reserved GPUs from Lamini
Unlimited tuning and inference
Unmatched inference throughput
Full evaluation suite
Access to world-class ML experts
Enterprise support
Contact us
Self-managed
Run Lamini in your own secure environment (VPC, on-prem, air-gapped)
Custom
Run Lamini on your own GPUs
No internet access needed
Pay per software license
Full evaluation suite
Access to world-class ML experts
Enterprise support
Contact us
Trusted by Fortune 500 companies & leading startups


Lamini Pricing FAQ

What hardware do you use in your cluster?
Lamini On-Demand currently uses MI250s, but we have MI300s available for our Lamini Reserved plans. Please contact us to learn more about Lamini Reserved and our MI300 cluster.
How do I size the number of GPUs?
Increasing the number of GPUs will speed up your job by approximately 1.5x per additional GPU. Lamini will automatically reschedule your long-running jobs, even if they're scheduled on only 1 GPU.
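As a rough back-of-envelope, assuming the ~1.5x speedup compounds per additional GPU (an approximation, not a guarantee) and a hypothetical baseline duration:

```python
def estimated_speedup(num_gpus: int) -> float:
    # Assumes the ~1.5x speedup compounds per additional GPU;
    # real scaling varies with model size and data.
    return 1.5 ** (num_gpus - 1)

baseline_hours = 2.0  # hypothetical duration of the job on 1 GPU
for gpus in (1, 2, 4):
    hours = baseline_hours / estimated_speedup(gpus)
    print(f"{gpus} GPU(s): ~{hours:.1f} h, price multiplier x{gpus}")
```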
Is there a difference in price between input and output tokens?
For Lamini On-Demand, the price for both input and output tokens is $0.50 per million tokens.
Do you offer any volume discounts?
Not for Lamini On-Demand. If you want to run a large volume of jobs or data, contact us about Lamini Reserved or Self-managed for better pricing.
How do you license?
For Lamini Reserved and Self-Managed, we license based on the number and type of GPU(s). Please contact us for a quote.
Do you offer special pricing for startups?
Yes, we do. Please contact us.
How much data do you need to start?
For an initial evaluation data set, you will need about 20-40 input-output pairs to start. As you iterate, you will add more data until you achieve the level of accuracy required for your use case.
How long does it take to run a tuning job, and about how much will it cost?
It takes approximately 50 steps for every 100 datapoints you want to train on, though this will vary significantly with the size and complexity of your datapoints. We calculate tuning job cost as: $1 per step * number of GPUs. Example: memory tuning 100 datapoints with 50 steps → $50 on 1 GPU, or $50 * 2 = $100 on 2 GPUs.
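In code, a minimal sketch of that formula (the $1/step rate and the linear GPU multiplier come from this page; the function itself is illustrative):

```python
PRICE_PER_STEP = 1.00  # USD, Lamini On-Demand tuning price

def tuning_cost(steps: int, num_gpus: int = 1) -> float:
    # $1 per step, multiplied linearly by the number of GPUs.
    return PRICE_PER_STEP * steps * num_gpus

print(tuning_cost(50, num_gpus=1))  # 50.0  -> $50 on 1 GPU
print(tuning_cost(50, num_gpus=2))  # 100.0 -> $100 on 2 GPUs
```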
What are steps?
In the context of tuning models, a "step" refers to a single update of the model's weights, i.e., one training iteration. You can set the number of steps you want per job when you submit it.
Can I run the Meta Llama Text-to-SQL Memory Tuning Notebook?
Yes! Our free $300 in credits is enough to run the Meta Llama Notebook and tuning jobs from scratch.
What if I made my account earlier, do I still get free credits?
Yes, if you created an account earlier, you should have received $300 in free credit. If you didn’t receive your credit, please contact us.
My job is too slow. How can I speed it up?
You can request more GPUs for your job. Each additional GPU will improve performance by about 1.5x. Requesting more GPUs will increase the cost of the job.
What is your inference speed?
We built our inference engine to be highly performant. We run on AMD MI250 and MI300 GPUs and NVIDIA H100 GPUs, whose single-stream memory walls are 200, 331, and 209 tokens/sec respectively. Learn more about evaluating the performance of inference frameworks here.
What is a datapoint?
A datapoint is a single instance of data used in training. For example, in a text classification task, each sentence or document would be a datapoint. The number of datapoints affects the overall training time and cost.
How are steps calculated?
Steps are provided by the user when submitting a job. By default, we assume 50 steps per 100 datapoints, but this can be adjusted based on your specific needs. More complex tasks or larger models might require more steps per datapoint.
Why use multiple GPUs?
Using multiple GPUs can significantly speed up the training process. Each additional GPU provides approximately a 1.5x speed increase, allowing you to train your model faster. This can be particularly beneficial for large datasets or complex models.
How accurate is this cost estimate?
This calculator provides a rough estimate based on typical usage. Actual costs depend on the number of steps and GPUs provided.

Tuning Job Cost Calculator

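The interactive calculator is not reproduced here; the sketch below works through the same breakdown using the defaults stated in the FAQ (50 steps per 100 datapoints, $1 per step, linear GPU multiplier). Rounding datapoints up to the next 100 is our assumption:

```python
import math

PRICE_PER_STEP = 1.00  # USD, from the On-Demand pricing above

def default_steps(num_datapoints: int) -> int:
    # FAQ default: 50 steps per 100 datapoints
    # (rounding up to the next 100 is an assumption).
    return math.ceil(num_datapoints / 100) * 50

def cost_breakdown(num_datapoints: int, num_gpus: int = 1) -> None:
    steps = default_steps(num_datapoints)
    cost = PRICE_PER_STEP * steps * num_gpus
    print(f"datapoints:      {num_datapoints}")
    print(f"steps (default): {steps}")
    print(f"GPUs:            {num_gpus} (linear cost multiplier)")
    print(f"estimated cost:  ${cost:.2f}")

cost_breakdown(100, num_gpus=1)  # 50 steps  -> $50.00
cost_breakdown(300, num_gpus=2)  # 150 steps -> $300.00
```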