Best open-source LLMs in 2025: What everybody is running now

Written by Denis Walker
Originally published: October 31, 2025
Updated: November 18, 2025

In 2025, almost every startup has AI at its core. The shift is clear. From internal tools to customer products, artificial intelligence isn't a side feature anymore. It’s the product. And within a few years, the share of startups built on AI looks set to approach 100%. That means every industry - finance, healthcare, education, logistics - is now experimenting, deploying, or scaling with LLMs.

But this new normal isn’t powered by a handful of corporate labs. It’s driven by open models that anyone can use, modify, and run. Engineers aren’t waiting for access. They’re downloading, fine-tuning, and shipping with open-weight LLMs that match or outperform closed models.

Whether you’re building an AI agent, a translation tool, or an internal knowledge system, you now have serious choices. And you can run them on your own hardware.


Hardware Setup: What Equipment Is Best for Running These Models


Running open‑source large language models (LLMs) in 2025 requires a clear view of hardware trade‑offs. Here’s a comparison of three typical setups:


Mac Studio M4 Max (128 GB Unified Memory)

  • Can run models up to ~30B parameters with 4-bit or 8-bit quantisation.
  • Unified memory means GPU and CPU share 128 GB, giving flexibility for mid-size models.
  • 40-core GPU and high bandwidth (546 GB/s) offer decent throughput for local inference.
  • Best for prototyping, fine-tuning, or local deployments that don’t need large-scale infrastructure.
  • Heavy models (70B+) may run with aggressive quantisation but will be slow.
  • Ideal if you want to experiment with modern LLMs on Apple Silicon without using cloud GPUs.
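
If you want to try this on Apple Silicon, the sketch below uses the llama-cpp-python bindings (which support Metal acceleration). It assumes the package is installed and that a 4-bit GGUF checkpoint has already been downloaded; the file path and model name are placeholders, not a specific recommendation.

  # Minimal local-inference sketch with llama-cpp-python on Apple Silicon.
  # Assumes: `pip install llama-cpp-python` (built with Metal support) and a
  # 4-bit GGUF file already on disk; the path below is a placeholder.
  from llama_cpp import Llama

  llm = Llama(
      model_path="./models/qwen3-8b-q4_k_m.gguf",  # hypothetical local file
      n_ctx=4096,        # context window to allocate
      n_gpu_layers=-1,   # offload all layers to the Metal GPU
  )

  out = llm.create_chat_completion(
      messages=[{"role": "user", "content": "What does unified memory mean for local LLMs?"}],
      max_tokens=200,
  )
  print(out["choices"][0]["message"]["content"])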


Desktop PC with RTX 4090 or RTX 5090

  • Strong GPU compute and 24 GB (RTX 4090) or 32 GB (RTX 5090) of VRAM give a good balance of cost vs capability.
  • Can run mid‑sized models (10B‑30B parameters) with quantisation, and smaller ones even unquantised, depending on memory and batching; a 4‑bit loading sketch follows below.
  • Good for research, local inference, and development of custom apps.
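
As a concrete sketch of what running a quantised model on such a card looks like, the snippet below loads an open checkpoint in 4-bit with Hugging Face transformers and bitsandbytes. The model ID is only an example; any similar checkpoint that fits the card's VRAM would work.

  # 4-bit loading sketch for a single RTX 4090/5090 class GPU.
  # Assumes: transformers, accelerate and bitsandbytes are installed, and the
  # example model ID below is reachable on the Hugging Face Hub.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

  model_id = "mistralai/Mistral-7B-Instruct-v0.3"  # example checkpoint

  quant_cfg = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, weights stay 4-bit
  )

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      quantization_config=quant_cfg,
      device_map="auto",  # place layers on the GPU automatically
  )

  inputs = tokenizer("Explain quantisation in one sentence.", return_tensors="pt").to(model.device)
  output = model.generate(**inputs, max_new_tokens=60)
  print(tokenizer.decode(output[0], skip_special_tokens=True))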


Enterprise‑grade GPU servers and clusters (e.g., NVIDIA DGX systems)

  • High‑end multi‑GPU setup designed for the largest models (hundreds of billions of parameters or Mixture‑of‑Experts setups).
  • Enables full precision or large context windows, higher throughput, production scale.
  • Best suited for teams, cloud/infrastructure environments.


Key considerations

  • VRAM (or GPU memory) and system RAM both matter. Large models (>100B params) need multi‑GPU setups or MoE routing; a rough sizing sketch follows this list.
  • Inference speed depends on architecture (dense vs sparse/MoE), quantisation, context length, and hardware.
  • Deployment environment (desktop vs server vs edge) dictates hardware choice.
  • OS and framework compatibility: most tooling targets Linux with CUDA/TensorRT GPU acceleration; macOS is supported through tools like llama.cpp (Metal) and MLX, but with fewer options.
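
For quick sizing, a useful rule of thumb is weight memory ≈ parameters × bytes per weight, plus some margin for the KV cache and activations. The sketch below is back-of-the-envelope arithmetic only; the flat 20% overhead factor is an assumption, not a measurement.

  # Back-of-the-envelope memory sizing for local inference.
  # Assumption: weights dominate, and KV cache / activation overhead is folded
  # into a flat 20% margin (a rough guess, not a measured figure).
  def estimate_memory_gb(params_billions: float, bits_per_weight: int = 4,
                         overhead: float = 0.20) -> float:
      weight_bytes = params_billions * 1e9 * bits_per_weight / 8
      return weight_bytes * (1 + overhead) / 1e9

  # Dense models: the full parameter count must be resident in memory.
  print(f"8B  at 4-bit ~ {estimate_memory_gb(8):.0f} GB")    # fits a 24 GB consumer GPU
  print(f"70B at 4-bit ~ {estimate_memory_gb(70):.0f} GB")   # multi-GPU or a 128 GB unified-memory Mac
  # MoE models: only ~37B of DeepSeek-R1's 671B are active per token, which cuts
  # compute, but all 671B of weights still need to live somewhere in memory.
  print(f"671B at 4-bit ~ {estimate_memory_gb(671):.0f} GB")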


Bottom line: If you’re testing or building with smaller open‑source LLMs, a good desktop GPU setup works. For heavy production or the largest models you’ll need serious infrastructure.


Best Open‑Source LLMs in 2025


Qwen (Alibaba Cloud)


Qwen is Alibaba’s open model series known for massive scale and multilingual support. Qwen3 includes dense and MoE versions, with up to 235B total parameters. It’s trained on 36T tokens in 119 languages, making it strong in global tasks. Its MoE design activates fewer parameters per token, allowing better efficiency. Qwen excels in reasoning, code, and tool use. It differs from others by offering very large open models with commercial use allowed, and is heavily adopted in China. The scale, multilingual range, and MoE structure make it stand out from smaller, more focused models like Phi or Gemma.


  • Creator: Alibaba Cloud 
  • Developer country most‑active: China (Alibaba).
  • Parameter range: from 0.6B to 235B total parameters. Qwen3 dense models: 0.6B, 1.7B, 4B, 8B, 14B, 32B; MoE models: 30B total with 3B active, and 235B total with 22B active (the “Qwen3‑235B” variant).
  • Speed: Smaller dense models run faster locally; the MoE variants activate fewer parameters per token (e.g., 22B active out of 235B total), so inference can be more efficient relative to total size.
  • Training data: Qwen3 models were reportedly trained on 36 trillion tokens across 119 languages and dialects.
  • Best uses: Multilingual tasks, coding, and large reasoning or agent‑capable tasks; multimodal extensions (Qwen‑VL, Qwen‑Audio) are also available.
  • Community support: Strong within China and within Alibaba’s ecosystem; open weights (for many variants) released under Apache 2.0 license. 
  • Popularity: Widely downloaded (report cites “more than 40 million downloads”) and used in cloud (Alibaba Cloud Model Studio) and local deployments. 
  • Notes: Because of scale and multilingual focus, Qwen is attractive to enterprises and developers needing languages beyond English.


Gemma (Google / DeepMind)


Gemma is Google’s compact model family designed for easy use and broad access. Built from the same research as Gemini, it supports multilingual and multimodal tasks. Sizes range from 1B to 27B, optimized for efficiency and developer use. Gemma models are fully open, permissively licensed, and tuned for instruction tasks. Compared to Qwen or Llama, Gemma trades raw power for accessibility. It’s ideal for fine-tuning, fast inference, and local deployment. While not the largest or strongest, its simplicity and ease-of-use make it appealing for early prototyping, mobile integration, or edge use where resources are limited.


  • Creator: Google DeepMind 
  • Developer country: United States / United Kingdom (Google / DeepMind)
  • Parameter sizes: Gemma 3 family: 1B, 4B, 12B, 27B parameters. 
  • Speed: Relatively modest parameter sizes permit faster inference and easier deployment; ideal for smaller to mid‑sized tasks.
  • Training data: Built on same research/tech as Google’s Gemini models; details include multilingual/multimodal support. 
  • Best uses: Lightweight models for multilingual generation, summarisation, reasoning, multimodal tasks (image+text) in smaller form‑factors. 
  • Community support: Google released open weights and tools; open‑model family intended for developers with permissive licensing for commercial use. 
  • Popularity: Gaining traction in the smaller‑model category; less hype at the highest scale compared to some peers but strong backing.
  • Notes: Gemma offers a developer‑friendly path when you don’t need the full largest model scale.
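
Since easy fine-tuning is one of Gemma's main selling points, here is a minimal LoRA setup sketch using the peft library. The model ID, target modules, and ranks are illustrative assumptions rather than a recommended recipe, and the checkpoint may require accepting Google's licence on Hugging Face.

  # LoRA fine-tuning setup sketch for a small Gemma checkpoint.
  # Assumes: transformers and peft are installed; the model ID and hyperparameters
  # below are illustrative, not tuned values.
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import LoraConfig, get_peft_model

  model_id = "google/gemma-3-1b-it"  # example small instruction-tuned checkpoint

  tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

  lora_cfg = LoraConfig(
      r=8,                                  # low-rank adapter dimension
      lora_alpha=16,
      target_modules=["q_proj", "v_proj"],  # attention projections to adapt
      task_type="CAUSAL_LM",
  )
  model = get_peft_model(model, lora_cfg)
  model.print_trainable_parameters()  # typically well under 1% of total weights
  # From here, train with transformers' Trainer or TRL's SFTTrainer on your own data.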


Mistral (Mistral AI)


Mistral is a European open model built for compact speed and clean performance. The 7B model is dense, fast, and well-tuned for general tasks. Newer versions (22B–24B) offer stronger reasoning, while the 123B “Mistral Large” enters enterprise territory. Mistral is known for minimalism, few dependencies, and solid training. It’s optimized for fast token generation and lower latency. Unlike the flagship MoE models from Qwen or DeepSeek, the Mistral models covered here are refined dense designs. Mistral is widely used for production inference, chatbots, and developer tools needing speed and reliability without high GPU requirements.


  • Creator: Mistral AI 
  • Developer country: France (European startup)
  • Parameter sizes: Mistral 7B, 22B, 24B, 123B
  • Speed: Smaller size supports fast inference; good “daily driver” model for general‑purpose.
  • Training data: Less publicly detailed than some peers; Mistral emphasises the balance of speed, size, and performance.
  • Best uses: General purpose, cost‑efficient inference, tasks where lower latency is important and you don’t need gigantic model scale.
  • Community support: Strong open‑model positioning, fresh European entrant, active community interest.
  • Popularity: Becoming one of the go‑to open models for those wanting a compromise of size vs performance.
  • Notes: As a European open model, interesting for regulatory and regional deployment contexts.


Phi (Microsoft)


Phi, built by Microsoft, focuses on efficient, small-scale models with strong reasoning and math skills. Phi-2 is 2.7B, while Phi-3.5 and Phi-4 mini reach ~3.8B. Phi-4 offers a 14B variant for heavier tasks. These models are designed for local, private inference and small deployments. They use synthetic and curated data for clean performance. Phi stands out by delivering impressive accuracy in a small footprint. Compared to DeepSeek or Llama, Phi is lightweight and meant for tight environments like mobile or embedded systems. It’s ideal for developers needing intelligence without the cost of large hardware.


  • Creator: Microsoft Research 
  • Developer country: United States
  • Parameter sizes: Phi‑2 is 2.7B; Phi‑3.5 and Phi‑4 mini are ~3.8B; Phi‑4 is 14B (per the Ollama library and Hugging Face).
  • Speed: Compact design means low latency and deployment on modest hardware (even edge/desktop) is feasible.
  • Training data: Mix of synthetic datasets, filtered public websites, academic books / Q&A data for Phi‑4. 
  • Best uses: Lightweight reasoning, educational/edge deployments, scenarios where you need good performance without huge infrastructure.
  • Community support: Microsoft positions these as small language models (SLMs) aimed at edge, real‑time, and resource‑constrained deployments.
  • Popularity: Gains among developers targeting smaller parameter budgets, deployment across devices.
  • Notes: While “small” compared to 100‑B+ models, Phi illustrates the value of efficient smaller models.
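
Because Phi targets modest hardware, a plain transformers pipeline is often enough. The sketch below assumes the example model ID is available on the Hugging Face Hub and a recent transformers version; it runs on CPU by default.

  # Small-model inference sketch for modest hardware (no large GPU required).
  # Assumes: transformers is installed; the model ID is an example ~3.8B Phi checkpoint.
  from transformers import pipeline

  generator = pipeline(
      "text-generation",
      model="microsoft/Phi-4-mini-instruct",  # example small checkpoint
      # runs on CPU by default; pass device=0 to use a GPU if available
  )

  prompt = "Question: What is 17 * 24? Answer briefly."
  print(generator(prompt, max_new_tokens=64)[0]["generated_text"])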


DeepSeek (China)


DeepSeek is China’s response to massive-scale models. DeepSeek-R1 has 671B total parameters, with only 37B active per pass thanks to MoE. It focuses on math, logic, code, and general reasoning. Smaller variants (down to 1.5B) make it flexible across devices. DeepSeek uses massive training corpora and aims to rival top proprietary models. It’s aggressive in size and release cadence. Compared to others, DeepSeek offers the largest open models with efficient MoE use and a focus on Chinese and English support. It’s a go-to for high-scale deployment and large-context applications without full GPU costs.


  • Creator: DeepSeek AI
  • Developer country: China
  • Parameter sizes: The flagship DeepSeek‑R1 series: 671B total parameters with 37B activated per token. Distilled models range from 1.5B up to 70B.
  • Speed: Mixture‑of‑Experts architecture (MoE) means fewer active parameters per token which helps inference efficiency. 
  • Training data: Large multilingual corpora (e.g., DeepSeek‑V3 trained on 14.8 trillion tokens). 
  • Best uses: High reasoning, math, programming tasks. Intended to compete with top closed models at lower cost.
  • Community support: Open weights released under MIT licence for R1‐distill series, active interest in Chinese and global communities. 
  • Popularity: Notable surge, especially in China; its release was described as a “Sputnik moment” for open models.
  • Notes: Although very large, availability of distilled smaller variants makes it relevant for community deployment too.
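
For local experimentation, the distilled variants are the practical entry point. The sketch below uses the ollama Python client against a locally running Ollama server; the model tag is an example and should match whatever variant you have pulled.

  # Chat with a distilled DeepSeek-R1 variant through a local Ollama server.
  # Assumes: Ollama is installed and running, the model has been pulled
  # (e.g. `ollama pull deepseek-r1:7b`), and `pip install ollama` is done.
  import ollama

  response = ollama.chat(
      model="deepseek-r1:7b",  # example distilled tag; adjust to what you pulled
      messages=[{"role": "user", "content": "Show that the sum of two even numbers is even."}],
  )
  print(response["message"]["content"])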


Llama (Meta)


Llama, developed by Meta, is the most widely adopted open-weight model series. Llama 1 started with 7B to 65B models. Llama 2 expanded to 70B, and Llama 3.1 now reaches 405B parameters. Llama 4 introduces MoE variants with up to 400B. Its strengths lie in community support, availability, and fine-tune quality. Llama is used in research, products, and startups worldwide. Unlike DeepSeek or GPT-oss, Llama emphasizes developer tooling, open access, and a wide base of finetuned variants. It’s the backbone of many local AI stacks. It differs from Qwen in that it’s more Western-developer focused and widely integrated.


  • Creator: Meta AI
  • Developer country: United States
  • Parameter sizes: 8B, 70B and 405B (Llama 3.x), plus Llama 4 MoE variants such as Scout at ~109B total parameters.
  • Speed: Smaller sizes (8B) good for rapid inference; large 405B variant requires heavy infrastructure.
  • Training data: Llama series trained on large token counts (e.g., Llama 3 claims ~15 trillion tokens). 
  • Best uses: Broad foundation model purposes, fine‑tuning, research, multilingual dialogue, tool use.
  • Community support: Very strong. The Llama ecosystem spawned thousands of fine‑tuned variants; widely adopted in open‑source community.
  • Popularity: Among the most widely used open (or “open‑weight”) model families.
  • Notes: Although labelled open, there are some licensing/use‑restrictions that make it less “pure” open‑source by some definitions.


GPT‑oss (OpenAI)


GPT-oss is OpenAI’s move into open-weight territory. The models, including GPT-oss-20B and GPT-oss-120B, offer top-tier reasoning, agentic capabilities, and strong tool use. They’re trained with OpenAI infrastructure but released under Apache 2.0, enabling wide adoption. GPT-oss uses a Mixture-of-Experts design, is tuned for OpenAI-style workflows, and is easy to integrate into existing OpenAI-based APIs. Compared to Llama or Mistral, GPT-oss brings stronger closed-model-style performance in an open package. It lacks training transparency but performs well across benchmarks. Developers who like GPT-4’s behavior but want local control are adopting GPT-oss quickly. It stands out for its quality-to-size ratio and clean licensing.


  • Creator: OpenAI 
  • Developer country: United States
  • Parameter sizes: Two core models: gpt‑oss‑20b (~21B) and gpt‑oss‑120b (~117‑120B) parameters. 
  • Speed: 20B variant meant for desktops/local; 120B variant designed for high‑end infrastructure; MoE architecture helps efficiency.
  • Training data: Details less public; emphasis on reasoning, agent workflows; released as “open‑weight” under Apache 2.0. 
  • Best uses: Developer‑friendly general‑purpose model for reasoning, tool‑use, fine‑tuning, local deployment.
  • Community support: Significant because of OpenAI’s brand and the fact weights are open under permissive licence; many developers exploring local deployment.
  • Popularity: High interest regardless of size, due to OpenAI’s move into open‑weight territory and ecosystem integration.
  • Notes: Distinction: “open‐weight” rather than fully open source (training data/methods less disclosed) but still a major step.
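
Because many local runners expose an OpenAI-compatible endpoint, moving existing OpenAI-based code to a locally hosted gpt-oss model can be as small as changing the base URL. The sketch below assumes a local server (Ollama or a llama.cpp server, for example) is listening on the URL shown and already serves a gpt-oss model; the model tag is an example, not a guaranteed identifier.

  # Point the standard OpenAI Python client at a local OpenAI-compatible server.
  # Assumes: a local runner is listening at the base_url below and serves a
  # gpt-oss model; the tag name is an example.
  from openai import OpenAI

  client = OpenAI(
      base_url="http://localhost:11434/v1",  # e.g. Ollama's OpenAI-compatible endpoint
      api_key="not-needed-locally",          # local servers typically ignore the key
  )

  resp = client.chat.completions.create(
      model="gpt-oss:20b",  # example local tag
      messages=[{"role": "user", "content": "Outline a 3-step agent workflow for triaging bug reports."}],
  )
  print(resp.choices[0].message.content)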


Comparative Summary


Size vs speed: Smaller models (Phi 14B, Gemma up to ~27B) run faster, need less infrastructure. Larger models (Qwen up to 235B/22B active, DeepSeek 671B/37B active, Llama 405B, GPT‑oss 120B) need heavier hardware.

Use‑case fit:

  • Lightweight or edge: Phi, Gemma.
  • General‑purpose local or enterprise use: Mistral, mid‑sized Qwen models, Llama 8B‑70B.
  • Large‑scale reasoning, production, agent workflows: Qwen MoE, DeepSeek, Llama 405B, GPT‑oss 120B.

Community and ecosystem:

  • Llama and Qwen have broad ecosystems and many derivative models.
  • Gemma and Phi are newer but well‑backed by Google/Microsoft.
  • DeepSeek is gaining fast, especially in China and making global open‑model waves.
  • GPT‑oss benefits from brand and open‑weight release, so developer interest is high.

Licensing & openness:

  • All claim open or open‑weight status, but degrees vary (data disclosure, usage restrictions).
  • For pure open‑source credentials (transparent training data, full code release) there remain gaps across many models.

Global developer base:

  • US dominates many (OpenAI, Microsoft, Meta).
  • China strongly represented (Alibaba/Qwen, DeepSeek).
  • Europe has entrants like Mistral (France).
  • Regional and cultural diversity among model developers is therefore improving.


Final Thoughts


In 2025, developers have strong, clear choices across the open LLM space. If you need small, efficient models for edge or mobile, Phi and Gemma offer clean performance with low overhead. Mistral is ideal when you need fast, compact models for real-time use without sacrificing too much power. Llama and GPT-oss cover general-purpose and production use with strong ecosystems and solid performance. Qwen and DeepSeek dominate large-scale multilingual and reasoning tasks, especially when MoE efficiency or deep context is critical.

Hardware matters. Macs with Apple Silicon and 128 GB unified memory can run models up to ~30B with 4-bit quantisation. For anything larger, multi-GPU desktops or clusters are still required.

Each model has its tradeoffs—choose based on your use case, hardware, and openness needs. The open-weight movement is now the default for builders, and the tools are finally strong enough to run locally, privately, and efficiently.