What If the Next AI Diagnostic Tool Didn't Need the Cloud?

Why small language models could be the key to unlocking AI adoption in healthcare

The Concentration of AI Power

We’re witnessing an unprecedented concentration of artificial intelligence capabilities. Frontier labs such as OpenAI, Google DeepMind, Anthropic, and a handful of others are locked in an escalating race to build ever-larger models. These behemoths require massive cloud infrastructure, consuming billions of dollars in training costs and operating at scales that remain out of reach for most organizations.

This is the AI supremacy race, playing out most visibly between the United States and China. It has created a landscape dominated by closed, proprietary systems. The latest models boast hundreds of billions or even trillions of parameters, trained on datasets comprising vast swaths of the internet. They’re undeniably powerful, but they come with strings attached.

For healthcare systems, this concentration of power raises fundamental questions of trust and alignment. Technology companies operate on fundamentally different incentives than hospitals and clinics. One is driven by shareholder value, market dominance, and quarterly earnings. The other is oriented toward patient outcomes, cost reduction, and evidence-based practice.

When healthcare organizations adopt closed-source large language models, they’re placing enormous trust in providers whose priorities may not align with patient care. They’re subject to the whims of pricing changes, feature deprecation, and terms of service that can shift without notice. Most critically, they’re sending patient data to third-party cloud servers.

This tension has created a paradox: the healthcare sector desperately needs AI’s capabilities to improve diagnosis, reduce administrative burden, and accelerate research. Yet the dominant model of AI deployment—cloud-based, closed-source, controlled by a handful of tech giants—runs counter to the fundamental principles of medical practice.

Who This Is For

This article is written for healthcare professionals curious about AI technology, and tech practitioners moving into the medical space. If you’re a physician wondering whether AI tools are trustworthy, a hospital administrator evaluating deployment options, or a technologist building solutions for healthcare, this exploration of small language models will help you understand a practical path forward for AI adoption in medicine.

The Other Side of the Coin: Small Language Models

But there’s another movement happening, one that offers a compelling alternative to this centralized paradigm. It’s the rise of small language models (SLMs) and the open-source AI community—a grassroots counterweight to the closed-source giants.

Before diving deeper, let’s clarify some terminology. “Open-source” AI models are typically open-weight models, meaning the model weights are publicly available for download and inspection. You can see the internal structure and behaviour of the model, though typically not the complete dataset used for training.

Crucially, this means anyone can take the model and host it on their own hardware—a local server, a consumer-grade GPU, even edge or mobile devices—as long as they comply with the licensing terms.

Small language models (SLMs) represent a practical subset of this movement, generally defined as models with roughly 9 billion parameters or fewer. This size threshold isn’t arbitrary: it’s chosen because these models can run efficiently on a single consumer-grade graphics card. It’s a loose definition, but an immensely practical one: an SLM is a model you can actually deploy without requiring massive infrastructure.

The distinction matters enormously for healthcare. A 7-billion parameter model for medical reasoning can run on hardware that many research hospitals already own, or on commodity GPUs that cost a few thousand dollars rather than requiring million-dollar supercomputer clusters.
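To make the hardware claim concrete, here is a back-of-the-envelope sketch of the memory needed just to hold a model’s weights at different quantization levels. The figures are my own rough estimates for illustration; they ignore activation memory and the KV cache, which add real overhead in practice.

```python
# Rough VRAM needed to hold an SLM's weights at various quantization levels.
# This deliberately ignores activation memory and KV-cache overhead.

def vram_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory (decimal GB) to store the model weights alone."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

for bits in (16, 8, 4):
    print(f"7B model @ {bits}-bit: ~{vram_gb(7, bits):.1f} GB")
```

At 4-bit quantization, a 7B model needs roughly 3.5 GB for weights alone, which is why it fits comfortably on a consumer card with 8–12 GB of VRAM.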

Why Small Language Models Matter for Healthcare

The advantages of SLMs in medical settings are compelling, particularly when viewed through the lens of healthcare’s unique constraints and requirements.

Data Privacy and Sovereignty

The most obvious benefit is data privacy. When a model runs entirely on-premises, patient data never leaves the hospital network. This addresses GDPR requirements, HIPAA compliance in the United States, and similar regulations worldwide. It also resolves thorny questions around data governance, ownership, and sovereignty—who controls the intelligence derived from patient data, and who benefits from it?

For many healthcare systems, this alone is sufficient reason to explore SLMs. The alternative—sending patient records to external cloud providers—creates legal risk and conflicts with the fundamental medical principle of confidentiality.

Cost Efficiency and Existing Infrastructure

Many healthcare organizations already have on-premises infrastructure. While it may not be the most scalable option, it’s already paid for and can be cost-effective for deploying SLMs. Older-generation GPUs that have depreciated in value can run modern small models remarkably efficiently, turning what might otherwise be underutilized hardware into powerful AI inference engines.

The cost structure is fundamentally different too. Instead of paying per API call—a model that can become prohibitively expensive at scale—an on-premises SLM has predictable costs tied to hardware and electricity. For high-volume applications, this can represent enormous savings.

Speed and Specialization

Because SLMs are smaller, they’re faster. Latency drops dramatically when you’re running a 7B model locally instead of sending data to a remote 175B model and waiting for a response. In clinical settings where every second counts, this responsiveness matters.

The smaller size also makes fine-tuning practical. Healthcare organisations can create highly specialised models tailored to specific medical domains, e.g. cardiology, radiology, or oncology. This mirrors how hospitals themselves are organised into specialised departments. A general-purpose large model might know something about everything, but a fine-tuned small model can be an expert in its narrow domain, providing more reliable and contextually appropriate responses.

Independence from Cloud Providers

Perhaps most strategically, SLMs free healthcare organisations from dependence on cloud providers whose objectives may not align with long-term patient care. Pricing, feature availability, model behaviour—all of these are controlled by the provider when you’re using a closed-source API. With an open-weight SLM, the organisation retains control. The model won’t be deprecated, its pricing won’t suddenly change, and its behaviour remains consistent and auditable.

This independence is particularly valuable in healthcare, where tools need to remain stable and predictable over years or even decades, not subject to the rapid iteration cycles and market pressures of the tech industry.

The Current Landscape: Small Language Models in 2025

To understand the practical reality of SLMs, let’s examine the current landscape of available models. Several major technology companies and open-source communities have released small language models that are reshaping what’s possible with on-device and on-premises AI.

General-Purpose Small Language Models

| Model Family | Variants | Parameters | Context Window | Key Features | Target Use Case |
| --- | --- | --- | --- | --- | --- |
| Microsoft Phi-3.5 | Mini, Vision, MoE | 3.8B, 4.2B | 128k | Synthetic data training, on-device quantization, mobile OCR support | Mobile reasoning, on-device document analysis |
| Google Gemma 3 | 270M, 1B, 4B, 9B, 27B | Various | 128k | Sliding window attention, logit soft-capping, native multimodal (audio/vision) | Multilingual edge AI, real-time audio agents |
| Meta Llama 3.1 | 8B Instruct | 8B | 128k | Grouped Query Attention (GQA), extensive safety tuning, massive pre-training corpus | General-purpose chat, tool use, community fine-tuning |
| Mistral Small | Small, NeMo | ~12B–24B | 32k | High efficiency, focus on RAG performance, low-latency inference | Enterprise RAG, batch processing, cost-sensitive workflows |
| OpenAI GPT-OSS | 20B | 20B (3.6B active) | 128k | Mixture-of-Experts (MoE), MXFP4 quantization, native agentic tools, configurable reasoning | Complex reasoning on consumer GPUs, agentic swarms |
| Apple OpenELM | 270M–3B | 0.27B–3B | N/A | Layer-wise scaling, on-device optimization for Apple Silicon Neural Engine | Background iOS tasks, privacy-centric inference |

These models represent the foundation upon which medical-specific variants are built. They demonstrate that sophisticated language understanding and generation don’t require hundreds of billions of parameters. With the right architecture and training approach, models under 10 billion parameters can achieve remarkable performance.

The Medical Model Landscape

By 2025, a specialised class of medical SLMs has emerged, fine-tuned on biomedical corpora including PubMed, clinical notes, and medical textbooks. These models are optimised specifically for clinical reasoning, medical question answering, and healthcare workflows.

BioMistral 7B: The Open-Source Medical Benchmark

BioMistral 7B, based on the Mistral architecture, has become a benchmark for open medical AI. It’s fine-tuned on PubMed Central (3 billion tokens) and other medical databases, creating a model with deep biomedical knowledge.

Performance: In recent evaluations, BioMistral 7B achieved 57.3% accuracy across 10 medical question-answering tasks, significantly outperforming previous open models like MedAlpaca (51.5%) and MediTron-7B (42.7%). Notably, in rare disease queries, it hallucinates less frequently than GPT-4, likely due to its highly specific training data focused on peer-reviewed medical literature.

However, the model has limitations. Studies noted that BioMistral’s responses were often too technical or curt for direct patient interaction, lacking the empathetic tone that larger models can generate. This suggests its best application is as a clinician-facing tool—a decision support aid for physicians rather than a patient-facing chatbot.

Deployment: Its open-source nature allows it to be quantized and run locally, making it a favorite among research institutions building secure internal tools. A hospital can deploy BioMistral on a single GPU and have a powerful medical reasoning engine that never sends data outside their network.

Google MedGemma: Multimodal Medical AI

Google expanded its Health AI Developer Foundations with MedGemma, a suite of models based on Gemma 3 that brings multimodal capabilities to medical AI.

Architecture: MedGemma comes in 4B and 27B variants. The 4B model is small enough to run on mobile hardware but still scores 64.4% on MedQA, a challenging medical exam benchmark. The 27B text variant achieves 87.7% on MedQA, approaching the performance of leading proprietary models but at a fraction of the inference cost and deployable on modest hardware.

Multimodal Capabilities: A key innovation is MedSigLIP, a lightweight image encoder paired with MedGemma. This allows the model to interpret chest X-rays and histopathology slides. In blinded studies, 81% of MedGemma 4B’s chest X-ray reports were judged by radiologists to be accurate enough for patient management decisions.

Think about what this means: a diagnostic aid capable of analysing medical images can now run on a single GPU or even a high-end tablet. This represents a significant democratisation of medical imaging AI, making it accessible to smaller clinics and hospitals in resource-constrained settings.

Agentic Performance: MedGemma 27B has also been evaluated on AgentClinic-MedQA, a benchmark for agentic medical reasoning where models must autonomously gather information and make decisions. It demonstrated superior performance to the base Gemma 3 27B and exceeded human physician performance on specific sub-tasks, highlighting its potential in autonomous medical workflows.

OpenAI GPT-OSS-20B: Community-Driven Medical Fine-Tunes

While OpenAI’s flagship models remain closed, the release of GPT-OSS-20B has sparked community-driven medical adaptations. Variants like Tonic/med-gpt-oss-20b have been optimized for medical reasoning and instruction following using LoRA (Low-Rank Adaptation), a technique that allows efficient fine-tuning without retraining the entire model.
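LoRA’s efficiency is easy to see with a little arithmetic: instead of retraining every weight, it learns two small low-rank factors per adapted matrix. The sketch below is illustrative only; the layer count, hidden size, and rank are assumptions I’ve picked for the example, not the actual configuration of med-gpt-oss-20b.

```python
# How few parameters LoRA actually trains, compared to the full model.
# Each adapted d x d weight matrix gains two low-rank factors: d x r and r x d.
# All concrete numbers below are assumptions for illustration.

def lora_trainable_params(n_matrices: int, d: int, rank: int) -> int:
    """Total trainable parameters across all LoRA-adapted matrices."""
    return n_matrices * 2 * d * rank

base_params = 20_000_000_000  # a 20B-parameter base model
# Assume 24 layers x 4 attention projections, hidden size 4096, rank 16.
trainable = lora_trainable_params(n_matrices=24 * 4, d=4096, rank=16)
fraction = trainable / base_params
print(f"LoRA trains ~{trainable:,} params ({fraction:.4%} of the base model)")
```

Under these assumptions, LoRA touches well under 0.1% of the model’s parameters, which is what makes fine-tuning feasible on a single GPU.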

These models utilize the “chain of thought” capabilities inherent in the base model to decompose complex clinical diagnoses into step-by-step reasoning traces. This “glass box” reasoning is critical for clinical trust—doctors need to see why a model reached a conclusion, not just what conclusion it reached. This adds a degree of transparency to the black box, though it is by no means a silver bullet for fully explainable AI.

Llama 3 Medical Derivatives: The Open Foundation

Meta’s Llama 3 8B has served as the foundation for numerous medical adaptations. Fine-tuned versions using datasets like FreedomIntelligence/medical-o1-reasoning-SFT have demonstrated the ability to preserve reasoning styles and improve interpretability, all while running on consumer-grade GPUs.

There are various flavours of Llama 3, such as Bio-Medical-Llama3 or MMed-Llama3, but I didn’t find many prevailing medical models built on the base model.

Comparing Medical SLMs

The following table summarizes the key medical small language models available in 2025:

| Model | Base Architecture | Parameters | MedQA Accuracy |
| --- | --- | --- | --- |
| BioMistral 7B | Mistral 7B | 7B | 57.3% (10-task avg) |
| MedGemma 4B | Gemma 3 | 4B | 64.4% |
| MedGemma 27B | Gemma 3 | 27B | 87.7% |
| GPT-OSS-20B (Medical) | GPT-OSS-20B (MoE) | 20B | ? |
| Llama 3 Medical | Llama 3 8B | 8B | 66% |

The Path Forward: How SLMs Will Accelerate AI Adoption in Healthcare

Despite these promising advantages, I still do not see medical SLMs taking over frontier models. But they remain a viable and potent alternative in settings where data privacy is paramount.

Building Trust Through Transparency

Healthcare is fundamentally conservative, and for good reason. The black-box nature of cloud-based AI systems creates a trust problem that no amount of performance benchmarking can fully overcome. When a model’s reasoning is opaque and its data handling is invisible, it is hard to push for clinical adoption.

Small language models offer a solution. When an internal team can inspect the model weights, audit the inference process, and guarantee that patient data never leaves their network, trust barriers begin to fall. SLMs allow healthcare organizations to experiment with AI in a controlled, transparent environment. They can validate performance on their own patient populations, understand failure modes, and build institutional knowledge about AI capabilities and limitations.

This hands-on experience is invaluable. It transforms AI from an abstract concept into a concrete tool that hopefully builds understanding and trust. And once that trust is established, organizations become more willing to explore hybrid approaches, e.g. using SLMs for privacy-sensitive tasks and larger cloud models for appropriate use cases.

Economics That Make Sense

The economic model of cloud-based AI doesn’t align well with healthcare budgets. Per-query pricing creates unpredictable costs that can balloon unexpectedly. For a hospital processing thousands of clinical notes daily or analysing hundreds of imaging studies, API costs can quickly become prohibitive.

SLMs flip the economics. The cost is upfront hardware investment rather than ongoing per-use fees. A $10,000 GPU server running a medical SLM can process unlimited queries with only electricity and maintenance costs. For high-volume applications, this becomes dramatically cheaper than cloud APIs, making AI economically feasible even for resource-constrained institutions.
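The break-even point is straightforward to estimate. The sketch below uses made-up numbers (query volume, token counts, API pricing, and operating costs are all assumptions for illustration, not real quotes) to show the shape of the calculation:

```python
# Break-even sketch: cloud API pricing vs. an upfront on-prem SLM server.
# Every concrete number here is an assumption for illustration only.

def monthly_api_cost(queries_per_day: int, tokens_per_query: int,
                     usd_per_million_tokens: float) -> float:
    """Approximate monthly spend on a per-token cloud API."""
    return queries_per_day * 30 * tokens_per_query * usd_per_million_tokens / 1e6

def breakeven_months(hardware_usd: float, monthly_api_usd: float,
                     monthly_opex_usd: float) -> float:
    """Months until upfront hardware pays for itself vs. per-call pricing."""
    return hardware_usd / (monthly_api_usd - monthly_opex_usd)

api = monthly_api_cost(queries_per_day=5_000, tokens_per_query=2_000,
                       usd_per_million_tokens=10.0)
months = breakeven_months(hardware_usd=10_000, monthly_api_usd=api,
                          monthly_opex_usd=300)  # power + maintenance, assumed
print(f"API: ${api:,.0f}/month; server pays for itself in {months:.1f} months")
```

Even with these conservative assumptions, the hardware amortizes in a matter of months for a high-volume workload, after which queries are effectively free at the margin.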

This cost structure also enables experimentation. Healthcare organizations can pilot AI applications without worrying about runaway costs, testing different use cases and refining workflows before committing to larger investments.

Specialization and Domain Expertise

The knowledge required for cardiology differs substantially from oncology or radiology. General-purpose large models know something about everything, but they’re not optimized for any specific medical specialty.

Small models, being easier to fine-tune, enable true domain specialisation. A hospital can create a cardiology-specific model trained on relevant literature and their own clinical data, a radiology model optimised for their imaging protocols, an emergency medicine model calibrated to their patient population. These specialised models can outperform general-purpose giants within their narrow domains, providing more accurate and contextually appropriate support.

This mirrors how hospitals actually organise medical practice around specialised departments with deep domain expertise. SLMs allow AI deployment to follow the same organisational logic.

The Hybrid Future

The future of medical AI is SLMs and large language models working in concert.

Imagine a healthcare AI architecture where:

  • Privacy-sensitive tasks (analysing patient records, generating clinical notes, preliminary diagnosis) run on local SLMs that never expose patient data
  • Complex research queries, rare disease analysis, or synthesis of the latest medical literature leverage large cloud models where appropriate
  • Edge devices (mobile ultrasound, portable diagnostic tools) run specialised SLMs for real-time analysis in resource-limited settings
  • Large models help create synthetic training data to improve SLMs without exposing real patient information

This hybrid approach combines the privacy, cost-efficiency, and specialisation of small models with the comprehensive knowledge and reasoning power of large ones. It’s not an either/or proposition—it’s a thoughtful architecture that uses the right tool for each specific task.
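The routing logic at the heart of such an architecture can be sketched in a few lines. The task labels and rules below are hypothetical, chosen to mirror the split described above; a real deployment would hang actual PHI-detection and model clients off this skeleton.

```python
# Minimal sketch of hybrid routing: privacy-sensitive work stays on a
# local SLM; everything else may use a larger cloud model.
# Task labels and routing rules are assumptions for illustration.

LOCAL_ONLY_TASKS = {"clinical_notes", "patient_records", "preliminary_diagnosis"}

def route(task_type: str, contains_phi: bool) -> str:
    """Decide which backend handles a request."""
    if contains_phi or task_type in LOCAL_ONLY_TASKS:
        return "local_slm"   # never leaves the hospital network
    return "cloud_llm"       # e.g. literature synthesis, rare-disease research

print(route("clinical_notes", contains_phi=True))      # -> local_slm
print(route("literature_review", contains_phi=False))  # -> cloud_llm
```

The key design choice is that the check is fail-safe: any request flagged as containing protected health information is routed locally regardless of task type.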

Conclusion: A Pragmatic Path Forward

The concentration of AI power in the hands of a few frontier labs creates real challenges for healthcare adoption. Privacy concerns, misaligned incentives, unpredictable costs, and lack of transparency have slowed the integration of AI into clinical practice, despite the technology’s obvious potential.

Small language models offer a pragmatic alternative—not a replacement for large models, but a complementary approach that addresses healthcare’s unique requirements. They provide the privacy, cost-efficiency, specialisation, and transparency that medical institutions need to trust AI systems and integrate them into clinical workflows.

The next great diagnostic tool might not need the cloud. It might run on a hospital’s own servers, process data locally, cost a fraction of cloud alternatives, and be tailored specifically to that institution’s patient population and clinical needs. And in doing so, it might just be the bridge that brings AI from the frontier labs into the bedside.

The choice isn’t between big and small, cloud and edge, proprietary and open. The smart path forward is recognising that different tools serve different purposes. Small language models are an opportunity for practical, privacy-preserving, economically viable pathways to accelerating AI adoption in the field where it matters most: improving human health.

If you enjoyed that, thanks for stopping by. This has been an accessible read, written and tailored for the TechBrews community. I’m working on a more technical and opinionated piece focusing on the medical open-source models.

David Tang
MBBS MRCP MSC FHEA

Principal clinical AI consultant @tangible.healthcare, embedding AI in healthcare workflows.