---
layout: default
title: Supported Models
---

Supported Models

NodeTool provides extensive support for AI models across multiple providers, from cutting-edge proprietary models to open-source alternatives. This comprehensive guide covers all supported models and their capabilities.

Provider Overview

NodeTool supports both cloud-based and local AI model providers:

Cloud Providers

Recommended: HuggingFace Inference Providers

HuggingFace Inference Providers offer the widest model support through a unified API, giving you access to hundreds of models across 17 specialized inference providers. This is the recommended choice for cloud-based inference when you need maximum flexibility and model access.

Why Choose HuggingFace Inference Providers:

  • Widest Model Selection: Access to thousands of models across all major AI tasks
  • 17 Specialized Providers: Cerebras, Cohere, Fal AI, Featherless AI, Fireworks, Groq, HF Inference, Hyperbolic, Nebius, Novita, Nscale, Public AI, Replicate, SambaNova, Scaleway, Together, and Z.ai
  • Zero Vendor Lock-in: Switch between providers seamlessly through one API
  • OpenAI-Compatible: Drop-in replacement for OpenAI SDK
  • Automatic Failover: Requests route to alternative providers if primary is unavailable
  • Unified Billing: Single token for all providers, no markup on provider rates
  • Production-Ready: Enterprise-grade reliability and performance

Supported Tasks:

  • Chat Completion (LLM & VLM)
  • Feature Extraction (Embeddings)
  • Text-to-Image Generation
  • Text-to-Video Generation
  • Speech-to-Text

Provider Capabilities Matrix: each provider covers a subset of the tasks above (LLM chat, VLM, embeddings, image generation, video generation, speech). Covered providers: Cerebras, Cohere, Fal AI, Featherless AI, Fireworks, Groq, HF Inference, Hyperbolic, Nebius, Novita, Nscale, Public AI, Replicate, SambaNova, Scaleway, Together, and Z.ai. See the HuggingFace Inference Providers documentation for the current per-provider matrix.

Key Features:

  • 🎯 All-in-One API: Single interface for text, image, video, embeddings, and speech tasks
  • 🔀 Multi-Provider Support: Seamlessly switch between 17 top-tier providers
  • 🚀 Scalable & Reliable: High availability and low-latency for production
  • 🔧 Developer-Friendly: OpenAI-compatible API with Python and JavaScript SDKs
  • 💰 Cost-Effective: Transparent pricing with no extra markup

Example Usage:

from huggingface_hub import InferenceClient

client = InferenceClient()
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Hello!"}]
)

For detailed documentation, visit HuggingFace Inference Providers.

Direct Provider APIs

  • OpenAI: Industry-leading language models with multimodal capabilities (GPT-5, GPT-4o, O3, Sora 2)
  • Anthropic: Advanced reasoning models with strong safety features (Claude 4.5, Claude 4)
  • Google: Multimodal models with excellent vision and reasoning capabilities (Gemini 2.5)

Local Inference Engines

NodeTool provides comprehensive local model support with 1,655+ models across multiple frameworks:

  • llama.cpp: Highly optimized C/C++ inference with GGUF format support (300+ quantized models)
  • Ollama: User-friendly local deployment with 23+ pre-configured models
  • vLLM: Production-grade high-throughput inference engine (PyTorch Foundation project)
  • MLX Framework: Apple Silicon-optimized inference (977+ models)
    • MLX-LM: Language model inference
    • MLX-VLM: Vision-language models with FastVLM
    • MFLUX: FLUX image generation for Apple Silicon
  • HuggingFace Transformers: Access to 500,000+ models with pipeline API

Local Benefits: Complete privacy, zero API costs, offline functionality, and full customization.

OpenAI Models

GPT-5 Series

  • GPT-5 (gpt-5): Next-generation flagship model with 400K context window
  • GPT-5 Mini (gpt-5-mini): Efficient variant of GPT-5

GPT-4o Series

  • GPT-4o (gpt-4o): Advanced multimodal model with vision capabilities
  • GPT-4o Mini (gpt-4o-mini): Efficient version of GPT-4o for cost-effective tasks
  • ChatGPT-4o (chatgpt-4o-latest): Conversational variant optimized for chat

GPT-4o Audio Series

  • GPT-4o Audio (gpt-4o-audio-preview-2024-12-17): Enhanced with audio processing
  • GPT-4o Mini Audio (gpt-4o-mini-audio-preview-2024-12-17): Compact audio model

GPT-4.1 Series

  • GPT-4.1 (gpt-4.1): Advanced reasoning model with 1M context window
  • GPT-4.1 Mini (gpt-4.1-mini): Efficient version of GPT-4.1

O-Series Reasoning Models

  • O3 (o3, o3-mini): Advanced reasoning models (no tool support)
  • O4 Mini (o4-mini): Specialized reasoning model with 200K context

Image Generation

  • GPT Image 1 (gpt-image-1): Latest image generation model
  • DALL-E 3 (dall-e-3): High-quality image generation (legacy)
  • DALL-E 2 (dall-e-2): Previous generation (legacy)

Video Generation

  • Sora 2 (sora-2): Text and image-to-video generation
  • Sora 2 Pro (sora-2-pro): Premium video generation with enhanced quality

Specialized Models

  • Codex Mini (codex-mini-latest): Code-focused model for programming tasks
  • Whisper (whisper-1): Speech recognition and transcription

Best for: General-purpose tasks, coding, multimodal applications, audio processing, video generation

Anthropic Models

Claude 4.5 Series

  • Claude Sonnet 4.5 (claude-sonnet-4-5-latest, claude-sonnet-4-5-20250929): Latest next-generation model with enhanced reasoning
  • Claude Haiku 4.5 (claude-4-5-haiku-latest): Fast, efficient latest-generation model

Claude 4 Series

  • Claude Sonnet 4 (claude-sonnet-4-latest, claude-sonnet-4-20250514): Next-generation reasoning model
  • Claude Opus 4 (claude-opus-4-latest, claude-opus-4-20250514): Premium model for the most complex tasks

Claude 3.7 Series

  • Claude 3.7 Sonnet (claude-3-7-sonnet-latest): Advanced reasoning capabilities

Claude 3.5 Series

  • Claude 3.5 Haiku (claude-3-5-haiku-latest): Fast, efficient model for everyday tasks
  • Claude 3.5 Sonnet (claude-3-5-sonnet-latest): Balanced model for complex reasoning

Best for: Complex reasoning, analysis, safety-critical applications, long-form content, extended context tasks

Google Gemini Models

Gemini 2.5 Series

  • Gemini 2.5 Pro Experimental (gemini-2.5-pro-exp-03-25): Cutting-edge experimental model
  • Gemini 2.5 Flash (gemini-2.5-flash-preview-04-17): Fast, efficient multimodal model

Gemini 2.0 Series

  • Gemini 2.0 Flash (gemini-2.0-flash): Optimized for speed and efficiency
  • Gemini 2.0 Flash Lite (gemini-2.0-flash-lite): Lightweight version for basic tasks
  • Gemini 2.0 Flash Exp Image Generation (gemini-2.0-flash-exp-image-generation): Specialized for image generation

Best for: Multimodal tasks, vision processing, image generation, speed-critical applications

Hugging Face Models

Advanced Reasoning Models

  • DeepSeek V3 0324 (deepseek-ai/DeepSeek-V3-0324): Advanced reasoning and code generation
  • DeepSeek TNG R1T2 Chimera (tngtech/DeepSeek-TNG-R1T2-Chimera): Hybrid reasoning model
  • DeepSeek R1 (deepseek-ai/DeepSeek-R1): Latest DeepSeek reasoning model
  • DeepSeek R1 Distill Qwen 1.5B (deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B): Distilled efficient version

Instruction-Tuned Models

  • Hunyuan A13B Instruct (tencent/Hunyuan-A13B-Instruct): Tencent's instruction-tuned model
  • Meta Llama 3.1 8B Instruct (meta-llama/Meta-Llama-3.1-8B-Instruct): Meta's powerful instruction model
  • Qwen 2.5 7B Instruct 1M (Qwen/Qwen2.5-7B-Instruct-1M): Extended context length model

Specialized Models

  • DeepSWE Preview (agentica-org/DeepSWE-Preview): Specialized for software engineering tasks
  • Qwen 2.5 Coder 32B Instruct (Qwen/Qwen2.5-Coder-32B-Instruct): Code-specialized model
  • Qwen 2.5 VL 7B Instruct (Qwen/Qwen2.5-VL-7B-Instruct): Vision-language model

Compact Models

  • SmolLM3 3B (HuggingFaceTB/SmolLM3-3B): Compact, efficient language model
  • Gemma 2 2B IT (google/gemma-2-2b-it): Google's efficient instruction-tuned model
  • Phi 4 (microsoft/phi-4): Microsoft's latest compact model

Best for: Open-source applications, specialized tasks, cost-effective deployment, research

Hugging Face Groq Models

High-performance models optimized for speed through Groq's inference infrastructure:

Meta Llama Series

  • Meta Llama 3 70B Instruct (meta-llama/Meta-Llama-3-70B-Instruct): Large-scale instruction model
  • Llama 3.3 70B Instruct (meta-llama/Llama-3.3-70B-Instruct): Enhanced version with improved capabilities
  • Llama Guard 4 12B (meta-llama/Llama-Guard-4-12B): Safety and content moderation model

Llama 4 Preview Series

  • Llama 4 Scout 17B 16E Instruct (meta-llama/Llama-4-Scout-17B-16E-Instruct): Preview of next-generation Llama
  • Llama 4 Maverick 17B 128E Instruct (meta-llama/Llama-4-Maverick-17B-128E-Instruct): Extended context Llama 4 variant

Best for: High-throughput applications, real-time inference, production deployments

Hugging Face Cerebras Models

Models optimized for Cerebras' specialized hardware:

  • Cerebras GPT 2.5 12B Instruct (cerebras/Cerebras-GPT-2.5-12B-Instruct): Cerebras' proprietary model
  • Llama 3.3 70B Instruct (meta-llama/Llama-3.3-70B-Instruct): Optimized for Cerebras hardware
  • Llama 4 Scout 17B 16E Instruct (meta-llama/Llama-4-Scout-17B-16E-Instruct): Next-gen Llama on Cerebras

Best for: Ultra-fast inference, specialized hardware optimization, high-performance computing


HuggingFace Inference Providers (Unified Multi-Provider Access)

HuggingFace Inference Providers is a unified proxy service that gives you access to hundreds of models across 17 specialized inference providers through a single API. This is the recommended cloud solution for NodeTool when you need maximum flexibility and model access.

Why Use Inference Providers?

Zero Vendor Lock-in: Instead of committing to a single provider's model catalog, access models from Cerebras, Groq, Together AI, Replicate, SambaNova, Fireworks, and 11 more through one consistent interface.

Instant Access to Cutting-Edge Models: Go beyond mainstream providers to access thousands of specialized models across multiple AI tasks - language models, image generators, embeddings, speech processing, and more.

Production-Ready Performance: Built for enterprise workloads with high availability, automatic failover, and low-latency infrastructure.

OpenAI-Compatible API: Drop-in replacement for OpenAI SDK - migrate existing code with minimal changes while gaining access to hundreds more models.

Supported Providers & Capabilities

Each of the 17 providers supports a subset of the capability columns (chat LLM, vision VLM, embeddings, image generation, video generation, speech): Cerebras, Cohere, Fal AI, Featherless AI, Fireworks, Groq, HF Inference, Hyperbolic, Nebius, Novita, Nscale, Public AI, Replicate, SambaNova, Scaleway, Together, and Z.ai. Refer to the HuggingFace Inference Providers documentation for the up-to-date per-provider capability matrix.

Key Features

  • 🎯 All-in-One API: Single interface for text generation, image generation, embeddings, NER, summarization, speech recognition, and 50+ more tasks
  • 🔀 Multi-Provider Support: Seamlessly run models from 17 top-tier providers without changing your code
  • 🚀 Scalable & Reliable: Built for high availability and low-latency performance in production
  • 🔧 Developer-Friendly: Simple requests, fast responses, consistent experience across Python and JavaScript
  • 💰 Cost-Effective: Transparent pricing with no markup on provider rates
  • 🔄 Automatic Failover: Requests automatically route to alternative providers if primary is unavailable
  • 🔐 Unified Authentication: Use a single HuggingFace token for all 17 providers

Getting Started

Python Example:

from huggingface_hub import InferenceClient

# Initialize client
client = InferenceClient()

# Chat completion
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Image generation
image = client.text_to_image(
    prompt="A serene lake at sunset, photorealistic",
    model="black-forest-labs/FLUX.1-dev"
)
image.save("output.png")

# Embeddings
embedding = client.feature_extraction(
    text="NodeTool is an AI workflow platform",
    model="sentence-transformers/all-MiniLM-L6-v2"
)

JavaScript Example:

import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

// Chat completion
const completion = await client.chatCompletion({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  messages: [{ role: "user", content: "Hello!" }],
});

// Automatic provider selection
const result = await client.chatCompletion({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  provider: "auto", // Default: chooses best available provider
  messages: [{ role: "user", content: "Hello!" }],
});

// Explicit provider selection
const specific = await client.chatCompletion({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  provider: "groq", // Force specific provider
  messages: [{ role: "user", content: "Hello!" }],
});

OpenAI-Compatible (Drop-in Replacement):

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)

Provider Selection

  • Automatic (provider="auto"): System selects the first available provider based on your preference order
  • Explicit (provider="groq"): Force use of a specific provider for consistency
  • Server-Side (OpenAI endpoint): Router automatically chooses best available provider

Use Cases

What You Can Build:

  • Chatbots & Assistants: Access to latest LLMs with tool-calling support
  • Image & Video Generation: FLUX, Stable Diffusion, and custom style generation with LoRAs
  • Search & RAG Systems: State-of-the-art embeddings for semantic search and recommendations
  • Speech Processing: Transcription, speech-to-text, and audio analysis
  • Traditional ML: Classification, NER, summarization, and 50+ specialized tasks

Pricing & Authentication

  • Generous Free Tier: Start with free credits for testing and development
  • PRO Users: Additional credits included with HuggingFace PRO subscription
  • Enterprise: Custom pricing for high-volume workloads
  • No Markup: Direct provider pricing with transparent billing
  • Single Token: One HuggingFace token authenticates across all 17 providers

Best For

  • Maximum Flexibility: Need access to models from multiple providers
  • Avoiding Lock-in: Don't want to commit to a single provider
  • Production Workloads: Need reliability with automatic failover
  • Cost Optimization: Compare pricing across providers easily
  • Rapid Development: Test multiple models without integrating 17 different APIs

For complete documentation, visit HuggingFace Inference Providers.

Local Models

NodeTool provides extensive support for running models locally with complete privacy and no external API dependencies. Multiple inference engines are supported, each optimized for different hardware and use cases.

llama.cpp & GGUF Format

llama.cpp is a highly optimized C/C++ inference library that enables efficient LLM inference on CPU and GPU hardware. It supports the GGUF (GGML Universal File) format, a binary format designed for fast loading and memory-efficient storage.

Key Features:

  • Quantization Support: 1.5-bit through 8-bit integer quantization for reduced memory usage
  • Advanced Methods: AWQ scaling and importance matrix techniques for quality preservation
  • Cross-Platform: Optimized kernels for x86, ARM CPUs, and various GPU backends
  • Memory Efficient: Significantly reduced RAM requirements through quantization

Models Available: 300+ GGUF quantized models including Qwen, Llama, Gemma, DeepSeek, and GPT variants in multiple quantization levels (Q2_K, Q3_K_S, Q4_K_M, Q5_K_M, Q8_0).
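
A minimal sketch of loading one of these GGUF models with the llama-cpp-python bindings (the model path and filename below are illustrative; any downloaded GGUF file works):

from llama_cpp import Llama

# Load a quantized GGUF model from disk; n_ctx sets the context window,
# n_gpu_layers offloads layers to the GPU when a GPU backend is built in.
llm = Llama(
    model_path="./models/qwen2.5-7b-instruct-q4_k_m.gguf",  # illustrative path
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 offloads all layers when possible
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF quantization does."}]
)
print(response["choices"][0]["message"]["content"])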

Ollama

Ollama is an open-source platform that simplifies running LLMs locally with a simple CLI and REST API. Powered by llama.cpp under the hood, Ollama makes model deployment as easy as ollama run model-name.

Key Features:

  • Easy Installation: Single-command setup with automatic model management
  • Model Library: Access to 23+ pre-configured models including Llama, Qwen, DeepSeek, Gemma
  • Performance: Recent updates deliver up to 12x faster inference speeds
  • Streaming Support: Real-time response streaming with tool call integration
  • Advanced Quantization: Pioneering INT4 and INT2 quantization for 2025

Popular Models: llama3.1:8b, qwen3:14b, deepseek-r1:7b, gemma3:4b, mistral-small:latest
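
For a rough sketch of how a locally running Ollama server is queried, the REST API listens on http://localhost:11434 by default (the model must already be pulled, e.g. ollama pull llama3.1:8b):

import requests

# Non-streaming chat request against a local Ollama server
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,
    },
)
print(response.json()["message"]["content"])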

Best for: Quick experimentation, development workflows, and easy local deployment.

vLLM

vLLM is a high-throughput, memory-efficient inference and serving engine designed for production LLM workloads. Now a PyTorch Foundation project, vLLM delivers enterprise-grade performance.

Key Features:

  • PagedAttention: Revolutionary memory management for efficient KV cache
  • Continuous Batching: Dynamic batching for optimal GPU utilization
  • V1 Engine (2025): 1.7x speedup with enhanced multimodal support and zero-overhead prefix caching
  • Blackwell Optimization: Up to 4x higher throughput on latest NVIDIA GPUs
  • Throughput: Reports of 20-24× higher requests/second vs traditional serving
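
As a minimal offline-inference sketch with vLLM's Python API (assuming vllm is installed and a supported GPU is available; the model ID is illustrative):

from vllm import LLM, SamplingParams

# Load the model once; vLLM manages the KV cache with PagedAttention
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
sampling = SamplingParams(temperature=0.7, max_tokens=256)

# Continuous batching lets a single call serve many prompts efficiently
outputs = llm.generate(["Explain PagedAttention in one paragraph."], sampling)
for output in outputs:
    print(output.outputs[0].text)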

Best for: Production deployments, high-throughput applications, batch processing, and scalable inference.

MLX Framework (Apple Silicon)

MLX is Apple's open-source machine learning framework specifically optimized for Apple Silicon's unified memory architecture. Released by Apple ML Research, MLX enables efficient on-device AI.

Core Components:

MLX-LM (Language Models)

  • Optimized Inference: Native Apple Silicon optimization for LLMs
  • Models: Llama, Qwen, Mistral, and 977+ models with 4-bit/8-bit quantization
  • Python API: NumPy-like interface with lazy computation
  • Performance: Leverages unified memory for efficient model execution
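
A minimal MLX-LM sketch (assuming the mlx-lm package is installed on an Apple Silicon Mac; the model ID is one of the 4-bit community conversions listed in the reference below):

from mlx_lm import load, generate

# Download (or reuse the cached copy of) a 4-bit model and its tokenizer
model, tokenizer = load("mlx-community/Llama-3.1-8B-Instruct-4bit")

# Generation runs on the Apple Silicon GPU via unified memory
text = generate(model, tokenizer, prompt="Write a haiku about unified memory.", max_tokens=100)
print(text)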

MLX-VLM (Vision-Language Models)

  • Vision AI: Multimodal models combining vision and language understanding
  • FastVLM (CVPR 2025): Apple's latest research with up to 85x faster time-to-first-token (TTFT)
  • Pre-quantized: Optimized models ready for Apple Silicon
  • iOS/macOS Support: Run VLMs directly on devices

MFLUX (FLUX Image Generation)

  • Image Generation: FLUX models ported to MLX for Apple Silicon
  • Performance: Up to 25% faster than alternative implementations on M-series chips
  • Models: FLUX.1-dev, FLUX.1-schnell with 4-bit quantization
  • Hardware Requirements: M1/M2/M3/M4 with 24GB+ RAM recommended

Best for: Mac users with M-series chips, on-device inference, iOS/macOS applications, and privacy-focused deployments.

HuggingFace Transformers

Transformers is the de facto standard library for working with state-of-the-art ML models across text, vision, audio, and multimodal tasks.

Key Features:

  • Pipeline API: High-level interface for instant inference across all modalities
  • 56+ Tasks: Text generation, image classification, speech recognition, VQA, and more
  • Device Support: Automatic GPU/Apple Silicon/CPU detection with device_map="auto"
  • Optimization: FP16/BF16 precision, batch processing, and efficient memory management
  • Model Hub: Access to 500,000+ pre-trained models

Example Usage:

from transformers import pipeline
pipe = pipeline("text-generation", model="Qwen/Qwen2.5-7B", device_map="auto")
result = pipe("Once upon a time...")

Best for: Research, prototyping, fine-tuning, and accessing the latest models from HuggingFace Hub.

Comparison Matrix

Framework | Throughput | Memory Efficiency | Ease of Use | Best Hardware | Use Case
llama.cpp | Medium | Excellent | Medium | CPU, GPU | Quantized models, edge devices
Ollama | Medium | Good | Excellent | CPU, GPU | Development, quick testing
vLLM | Excellent | Excellent | Medium | NVIDIA GPU | Production, high-scale
MLX | Good | Excellent | Good | Apple Silicon | Mac, iOS, privacy
Transformers | Medium | Good | Excellent | Any | Research, flexibility

Overall Benefits:

  • Privacy: Complete data privacy with no external API calls
  • Cost Control: Zero inference costs after initial hardware investment
  • Customization: Full control over model selection, parameters, and fine-tuning
  • Offline: Works without internet connectivity
  • Low Latency: No network round-trips for faster response times

Model Capabilities

Multimodal Support

  • Vision Models: GPT-4o, GPT-5, Gemini 2.0/2.5, Qwen 2.5 VL
  • Audio Models: GPT-4o Audio, GPT-4o Mini Audio, Whisper
  • Video Models: Sora 2, Sora 2 Pro (text-to-video, image-to-video)
  • Text-to-Image: GPT Image 1, DALL-E 3, Gemini 2.0 Flash Exp

Specialized Capabilities

  • Code Generation: Codex Mini, Qwen 2.5 Coder, DeepSWE Preview
  • Advanced Reasoning: O3, Claude 4.5, GPT-4.1, DeepSeek R1
  • Long Context: GPT-4.1 (1M), GPT-5 (400K), O4 Mini (200K)
  • Safety: Llama Guard 4, Claude models with constitutional AI

Performance Characteristics

  • Speed Optimized: Groq models, Cerebras models, Flash variants, Claude Haiku 4.5
  • Efficiency: Mini/Lite variants (GPT-5 Mini, GPT-4o Mini, Claude Haiku 4.5)
  • Quality: Flagship models like GPT-5, Claude Sonnet 4.5, Claude Opus 4, Gemini 2.5 Pro

Choosing the Right Model

For Maximum Model Access (Recommended for Cloud)

  • HuggingFace Inference Providers: Access to thousands of models across 17 providers with unified API
    • Best for: Flexibility, avoiding vendor lock-in, production workloads
    • Features: Automatic failover, OpenAI-compatible, all modalities (LLM, VLM, image, video, speech)
    • Providers: Cerebras, Groq, Together, Replicate, SambaNova, Fireworks, and 11 more

For General Tasks

  • GPT-5: Next-generation flagship with 400K context
  • Claude Sonnet 4.5: Latest reasoning model with enhanced capabilities
  • GPT-4o: Best overall performance and multimodal capabilities
  • Claude 3.5 Sonnet: Excellent reasoning and safety features
  • Gemini 2.5 Flash: Fast, efficient multimodal processing
  • Via HF Inference Providers: Access to all above models plus hundreds more

For Specialized Applications

  • Coding: Codex Mini, Qwen 2.5 Coder, DeepSWE Preview
  • Reasoning: O3, Claude 4.5, GPT-4.1, DeepSeek R1 (via HF Providers)
  • Vision: GPT-4o, Gemini 2.0/2.5, Qwen 2.5 VL
  • Audio: GPT-4o Audio variants, Whisper
  • Video: Sora 2, Sora 2 Pro, or via Fal AI/Replicate/Novita (HF Providers)
  • Image Generation: FLUX, Stable Diffusion via HF Inference Providers (Replicate, Together, Fal AI)

For Cost-Effective Solutions

  • HuggingFace Inference Providers: No markup, transparent pricing across providers
  • Hugging Face Models: Open-source alternatives
  • Mini Variants: GPT-5 Mini, GPT-4o Mini, Claude Haiku 4.5
  • Local Models: Ollama, llama.cpp for zero API costs

For High Performance

  • Via HF Providers: Groq (ultra-fast LLM), Cerebras (specialized hardware), Fireworks (optimized)
  • Groq Models: Ultra-fast inference
  • Cerebras Models: Specialized hardware optimization
  • Flash Variants: Speed-optimized models

For Extended Context

  • GPT-4.1: 1M tokens context window
  • GPT-5: 400K tokens context window
  • O4 Mini: 200K tokens context window
  • Claude 4.5: Extended context capabilities

For Privacy & Offline

  • Local Inference: llama.cpp (300+ models), Ollama (23+ models), vLLM (production), MLX (Apple Silicon)
  • Complete Privacy: No external API calls, all data stays local
  • Zero Costs: One-time hardware investment, unlimited inference

Configuration and Setup

API Requirements

  • OpenAI: OpenAI API key
  • Anthropic: Anthropic API key
  • Google: Google AI API key
  • Hugging Face: Hugging Face API token
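
As a rough, hypothetical sketch (NodeTool's own settings may use different names), most provider SDKs read their keys from conventional environment variables, which you can verify before launching a workflow:

import os

# Conventional environment variable names used by each provider's SDK.
# These names are assumptions based on common SDK defaults, not NodeTool-specific settings.
required_keys = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google AI": "GOOGLE_API_KEY",
    "Hugging Face": "HF_TOKEN",
}

for provider, var in required_keys.items():
    print(f"{provider}: {var} {'is set' if os.environ.get(var) else 'is missing'}")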

Local Setup

  • Ollama: Download and install Ollama
  • Hardware: Sufficient RAM and optional GPU for local models
  • Storage: Adequate disk space for model files

Usage Considerations

  • Rate Limits: Different providers have different rate limits
  • Costs: Pricing varies significantly between providers and models
  • Availability: Some models may have limited availability or regions
  • Context Length: Models have different maximum context windows

This comprehensive model support makes NodeTool one of the most flexible AI platforms available, allowing you to choose the perfect model for your specific needs while maintaining the ability to experiment with cutting-edge options as they become available.


Complete Model Reference

NodeTool supports 1,655+ models across multiple categories. Below is a comprehensive reference organized by model type and capability.

Language Models (LLMs)

Ollama Models (23 models)

Local models served through Ollama with full privacy and offline support:

  • all-minilm:latest - Lightweight embedding model
  • deepseek-r1:1.5b, deepseek-r1:7b, deepseek-r1:14b - DeepSeek reasoning models
  • gemma3:270m, gemma3:1b, gemma3:4b, gemma3n:latest - Google Gemma models
  • gpt-oss:20b - Open-source GPT variant
  • granite3.1-moe:1b, granite3.1-moe:3b - IBM Granite MoE models
  • llama3.1:8b, llama3.2:3b - Meta Llama models
  • mistral-small:latest - Mistral AI model
  • nomic-embed-text:latest - Nomic embedding model
  • phi3.5:latest - Microsoft Phi model
  • qwen2.5-coder:3b, qwen2.5-coder:7b - Qwen coding models
  • qwen3:0.6b, qwen3:1.7b, qwen3:4b, qwen3:8b, qwen3:14b - Qwen 3 series

HuggingFace Text Generation (56 models)

Production-ready text generation models from HuggingFace:

Small Efficient Models:

  • Qwen/Qwen3-0.6B-MLX-4bit - Ultra-compact Qwen model
  • gpt2 - Classic GPT-2 baseline
  • distilgpt2 - Distilled GPT-2
  • Qwen/Qwen2-0.5B-Instruct - Tiny instruction model
  • bigcode/starcoder - Code-specialized model

Mid-Size Models:

  • mlx-community/Llama-3.2-1B-Instruct-4bit - Compact Llama
  • mlx-community/Llama-3.2-3B-Instruct-4bit - Balanced Llama
  • Qwen/Qwen3-4B-MLX-4bit - Mid-range Qwen

Large Models:

  • unsloth/Qwen3-14B-GGUF - Full Qwen 14B
  • ggml-org/Qwen2.5-Coder-0.5B-Q8_0-GGUF - Quantized coder model

GGUF Quantized Models (300+ models)

Efficient quantized models via llama.cpp with various quantization levels (Q2_K, Q3_K_S, Q4_K_M, Q5_K_M, Q8_0):

Featured Models:

  • ggml-org/gemma-3-1b-it-GGUF - Gemma instruction-tuned
  • ggml-org/Kimi-VL-A3B-Thinking-2506-GGUF - Vision-language reasoning
  • unsloth/Qwen3-30B-A3B-GGUF - Large Qwen model
  • unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF - Large coding model
  • unsloth/gpt-oss-20b-GGUF - 20B open GPT
  • unsloth/gpt-oss-120b-GGUF - 120B flagship model
  • unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF - DeepSeek R1 distill
  • unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF - Larger DeepSeek distill
  • unsloth/Phi-4-reasoning-plus-GGUF - Microsoft Phi-4 with reasoning
  • unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF - Mistral 24B
  • unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF - Latest Qwen 30B
  • unsloth/gemma-3-4b-it-GGUF, unsloth/gemma-3-12b-it-GGUF - Gemma variants
  • unsloth/Magistral-Small-2509-GGUF - Magistral model
  • unsloth/GLM-4.5-Air-GGUF - GLM model family
  • unsloth/Qwen2.5-VL-7B-Instruct-GGUF - Vision-language Qwen

MLX Models (977 models)

Apple Silicon optimized models using MLX framework for M-series chips:

Text Generation:

  • mlx-community/Llama-3.1-8B-Instruct-4bit
  • mlx-community/Qwen2.5-7B-Instruct-4bit
  • mlx-community/Mistral-7B-Instruct-v0.2-4bit

Vision-Language:

  • mlx-community/llava-1.5-7b-4bit
  • mlx-community/Qwen2-VL-2B-Instruct-4bit

Audio:

  • mlx-community/whisper-tiny-mlx, mlx-community/whisper-tiny.en-mlx
  • mlx-community/whisper-base-mlx, mlx-community/whisper-base.en-mlx
  • mlx-community/whisper-small-mlx, mlx-community/whisper-small.en-mlx
  • mlx-community/whisper-medium-mlx, mlx-community/whisper-medium.en-mlx
  • mlx-community/whisper-large-v3-mlx

Text-to-Speech:

  • mlx-community/Kokoro-82M-bf16, mlx-community/Kokoro-82M-4bit, mlx-community/Kokoro-82M-6bit, mlx-community/Kokoro-82M-8bit
  • mlx-community/Spark-TTS-0.5B-8bit, mlx-community/Spark-TTS-0.5B-bf16
  • mlx-community/csm-1b-8bit, mlx-community/csm-1b

Image Generation Models

FLUX Models (20 models)

State-of-the-art image generation from Black Forest Labs:

Base Models:

  • black-forest-labs/FLUX.1-dev - Main development model
  • black-forest-labs/FLUX.1-schnell - Fast generation variant

Specialized FLUX:

  • black-forest-labs/FLUX.1-Fill-dev - Inpainting model
  • black-forest-labs/FLUX.1-Canny-dev - Canny edge conditioning
  • black-forest-labs/FLUX.1-Depth-dev - Depth-based generation
  • black-forest-labs/FLUX.1-Redux-dev - Image variation
  • black-forest-labs/FLUX.1-Kontext-dev - Context-aware generation
  • Freepik/flux.1-lite-8B-alpha - Lightweight variant

Quantized FLUX (GGUF):

  • city96/FLUX.1-dev-gguf:flux1-dev-Q2_K.gguf through Q5_K_S.gguf - Various quantization levels
  • city96/FLUX.1-schnell-gguf:flux1-schnell-Q2_K.gguf through Q5_K_S.gguf

MLX-Optimized FLUX:

  • dhairyashil/FLUX.1-dev-mflux-4bit
  • dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit
  • filipstrand/FLUX.1-Krea-dev-mflux-4bit
  • akx/FLUX.1-Kontext-dev-mflux-4bit

Alternative FLUX:

  • Kwai-Kolors/Kolors-diffusers - Kolors variant
  • lodestones/Chroma - Chroma FLUX variant

ControlNet FLUX:

  • InstantX/FLUX.1-dev-Controlnet-Canny - Canny edge control
  • jasperai/Flux.1-dev-Controlnet-Upscaler - Upscaling control
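
As an illustrative sketch, the base FLUX models can be run locally with the diffusers library (a GPU with substantial VRAM, or CPU offloading, is assumed):

import torch
from diffusers import FluxPipeline

# Load FLUX.1-schnell (the fast variant) in bfloat16
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trades speed for much lower VRAM usage

image = pipe(
    "A serene lake at sunset, photorealistic",
    num_inference_steps=4,   # schnell is tuned for very few steps
    guidance_scale=0.0,
).images[0]
image.save("flux_output.png")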

Stable Diffusion 1.5 (20 models)

Classic Stable Diffusion models and fine-tunes:

Base & Popular Models:

  • stable-diffusion-v1-5/stable-diffusion-v1-5 - Original SD 1.5
  • SG161222/Realistic_Vision_V5.1_noVAE - Photorealistic variant
  • Lykon/DreamShaper - DreamShaper series
  • Lykon/dreamshaper-8 - Version 8

Specialized SD 1.5:

  • XpucT/Deliberate:Deliberate_v6.safetensors - Deliberate model
  • Lykon/AbsoluteReality:AbsoluteReality_1.8.1_pruned.safetensors - Photorealistic
  • gsdf/Counterfeit-V2.5:Counterfeit-V2.5_fp16.safetensors - Anime style
  • philz1337x/epicrealism:epicrealism_naturalSinRC1VAE.safetensors - Epic realism
  • digiplay/majicMIX_realistic_v7:majicmixRealistic_v7.safetensors - MajicMIX
  • 526christian/526mix-v1.5 - 526Mix
  • imagepipeline/epiC-PhotoGasm - PhotoGasm variant

Community Models:

  • stablediffusionapi/realistic-vision-v51 - API-ready realistic
  • stablediffusionapi/anything-v5 - Anime variant
  • Yntec/Deliberate2 - Deliberate v2
  • guoyww/animatediff-motion-adapter-v1-5-2 - Animation adapter

Stable Diffusion XL (9 models)

High-resolution SDXL models:

Base Models:

  • stabilityai/stable-diffusion-xl-base-1.0:sd_xl_base_1.0.safetensors
  • stabilityai/stable-diffusion-xl-refiner-1.0:sd_xl_refiner_1.0.safetensors

Optimized SDXL:

  • stabilityai/sdxl-turbo:sd_xl_turbo_1.0_fp16.safetensors - Fast generation
  • Lykon/dreamshaper-xl-turbo:DreamShaperXL_Turbo_v2_1.safetensors - DreamShaper turbo
  • Lykon/dreamshaper-xl-lightning:DreamShaperXL_Lightning.safetensors - Lightning fast

Specialized SDXL:

  • RunDiffusion/Juggernaut-XL-v9:Juggernaut-XL_v9_RunDiffusionPhoto_v2.safetensors - Photorealistic
  • playgroundai/playground-v2.5-1024px-aesthetic:playground-v2.5-1024px-aesthetic.fp16.safetensors
  • dataautogpt3/ProteusV0.5:proteusV0.5.safetensors - Proteus
  • Lykon/AAM_XL_AnimeMix:AAM_XL_Anime_Mix.safetensors - Anime style

Qwen Image Models (10 models)

Text-to-image generation from Qwen:

Base Model:

  • Qwen/Qwen-Image - Main model
  • Qwen/Qwen-Image-Edit - Image editing variant

GGUF Quantized (various levels):

  • city96/Qwen-Image-gguf:qwen-image-Q2_K.gguf
  • city96/Qwen-Image-gguf:qwen-image-Q3_K_S.gguf
  • city96/Qwen-Image-gguf:qwen-image-Q4_K_S.gguf
  • city96/Qwen-Image-gguf:qwen-image-Q5_K_S.gguf
  • city96/Qwen-Image-gguf:qwen-image-Q6_K.gguf
  • city96/Qwen-Image-gguf:qwen-image-Q8_0.gguf
  • city96/Qwen-Image-gguf:qwen-image-BF16.gguf

Lightning Variant:

  • lightx2v/Qwen-Image-Lightning:Qwen-Image-Lightning-8steps-V1.1.safetensors

OmniGen

  • Shitao/OmniGen-v1-diffusers - Unified image generation model

Vision & Multimodal Models

Vision-Language Models (7 models)

Image understanding and captioning:

  • mlx-community/llava-1.5-7b-4bit - LLaVA multimodal
  • mlx-community/Qwen2-VL-2B-Instruct-4bit - Qwen vision-language
  • ucaslcl/GOT-OCR2_0 - OCR model
  • HuggingFaceTB/SmolVLM-Instruct - Compact VLM
  • zai-org/GLM-4.5V - GLM vision model
  • Qwen/Qwen2.5-VL-3B-Instruct - Qwen 2.5 VL
  • llava-hf/llava-interleave-qwen-0.5b-hf - Interleaved LLaVA

Image-to-Text / Captioning (8 models)

Image understanding and description:

  • Salesforce/blip-image-captioning-base - BLIP base
  • Salesforce/blip-image-captioning-large - BLIP large
  • Salesforce/blip2-opt-2.7b - BLIP-2 with OPT
  • Salesforce/blip-vqa-base - Visual question answering
  • nlpconnect/vit-gpt2-image-captioning - ViT-GPT2
  • microsoft/git-base - GIT base
  • microsoft/git-base-coco - GIT COCO-trained
  • microsoft/trocr-small-printed - OCR for printed text

Object Detection (7 models)

Object localization and recognition:

  • facebook/detr-resnet-50 - DETR with ResNet-50
  • facebook/detr-resnet-101 - DETR with ResNet-101
  • hustvl/yolos-tiny - Tiny YOLO-S
  • hustvl/yolos-small - Small YOLO-S
  • valentinafeve/yolos-fashionpedia - Fashion-specialized

Zero-Shot Object Detection (6 models):

  • google/owlvit-base-patch32 - OWL-ViT base-32
  • google/owlvit-base-patch16 - OWL-ViT base-16
  • google/owlvit-large-patch14 - OWL-ViT large
  • google/owlv2-base-patch16 - OWL-ViT v2
  • google/owlv2-base-patch16-ensemble - OWL-ViT v2 ensemble
  • IDEA-Research/grounding-dino-tiny - Grounding DINO

Image Classification (9 models)

Image categorization and recognition:

  • timm/resnet50.a1_in1k - ResNet-50 ImageNet
  • microsoft/resnet-18 - ResNet-18
  • microsoft/resnet-50 - ResNet-50
  • google/vit-base-patch16-224 - Vision Transformer
  • apple/mobilevit-small - MobileViT small
  • apple/mobilevit-xx-small - MobileViT xx-small
  • nateraw/vit-age-classifier - Age classification
  • Falconsai/nsfw_image_detection - NSFW detection
  • rizvandwiki/gender-classification-2 - Gender classification

Zero-Shot Image Classification (4 models):

  • openai/clip-vit-base-patch32 - CLIP base-32
  • openai/clip-vit-base-patch16 - CLIP base-16
  • laion/CLIP-ViT-H-14-laion2B-s32B-b79K - Large CLIP
  • laion/CLIP-ViT-g-14-laion2B-s12B-b42K - Giant CLIP

Image Segmentation (3 models)

Pixel-level segmentation:

  • mattmdjaga/segformer_b2_clothes - Clothing segmentation
  • facebook/sam2-hiera-large - Segment Anything 2
  • nvidia/segformer-b3-finetuned-ade-512-512 - ADE20K segmentation

Depth Estimation (5 models)

Monocular depth prediction:

  • depth-anything/Depth-Anything-V2-Small-hf
  • depth-anything/Depth-Anything-V2-Base-hf
  • depth-anything/Depth-Anything-V2-Large-hf
  • LiheYoung/depth-anything-base-hf
  • Intel/dpt-large - Dense Prediction Transformer

Audio Models

Text-to-Speech (8 models)

Speech synthesis:

  • suno/bark - Bark TTS (multilingual)
  • suno/bark-small - Compact Bark
  • hexgrad/Kokoro-82M - Kokoro TTS
  • prince-canuma/Kokoro-82M - Kokoro variant
  • facebook/mms-tts-eng - English MMS
  • facebook/mms-tts-fra - French MMS
  • facebook/mms-tts-deu - German MMS
  • facebook/mms-tts-kor - Korean MMS

Text-to-Audio / Music Generation (11 models)

Music and sound generation:

  • facebook/musicgen-small - MusicGen small
  • facebook/musicgen-medium - MusicGen medium
  • facebook/musicgen-large - MusicGen large
  • facebook/musicgen-melody - MusicGen with melody
  • facebook/musicgen-stereo-small - Stereo small
  • facebook/musicgen-stereo-large - Stereo large
  • cvssp/audioldm-s-full-v2 - AudioLDM
  • cvssp/audioldm2 - AudioLDM 2
  • harmonai/maestro-150k - Maestro
  • ucsd-reach/musicldm - MusicLDM
  • stabilityai/stable-audio-open-1.0 - Stable Audio
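
As a rough sketch, the smaller MusicGen checkpoints can be run through the transformers text-to-audio pipeline (the output is raw audio samples plus a sampling rate):

import scipy.io.wavfile
from transformers import pipeline

# Text-to-audio pipeline with the small MusicGen checkpoint
synthesiser = pipeline("text-to-audio", model="facebook/musicgen-small")

music = synthesiser("lo-fi beat with a gentle piano melody")
scipy.io.wavfile.write("musicgen_out.wav", rate=music["sampling_rate"], data=music["audio"])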

Automatic Speech Recognition (7 models)

Speech-to-text transcription:

  • openai/whisper-small - Whisper small
  • openai/whisper-medium - Whisper medium
  • openai/whisper-large-v2 - Whisper large v2
  • openai/whisper-large-v3 - Whisper large v3
  • openai/whisper-large-v3-turbo - Whisper turbo
  • Systran/faster-whisper-large-v3 - Optimized Whisper
  • ggerganov/whisper.cpp - Whisper C++ implementation
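
A minimal transcription sketch using the transformers pipeline (the audio file path is illustrative; pass device=0 to run on a GPU):

from transformers import pipeline

# Automatic speech recognition with a Whisper checkpoint
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

result = asr("meeting_recording.wav")  # illustrative local audio file
print(result["text"])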

Audio Classification (3 models)

Audio understanding:

  • MIT/ast-finetuned-audioset-10-10-0.4593 - Audio Spectrogram Transformer
  • ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition - Emotion recognition
  • laion/clap-htsat-unfused - CLAP audio-text

Video Models

Text-to-Video (9 models)

Video generation from text:

  • THUDM/CogVideoX-2b - CogVideo 2B
  • THUDM/CogVideoX-5b - CogVideo 5B
  • Wan-AI/Wan2.1-T2V-14B-Diffusers - Wan 2.1 T2V
  • Wan-AI/Wan2.2-T2V-A14B-Diffusers - Wan 2.2 T2V
  • Wan-AI/Wan2.2-TI2V-5B-Diffusers - Wan 2.2 TI2V

Image-to-Video Models

  • stabilityai/stable-video-diffusion-img2vid-xt - Stable Video Diffusion
  • Wan-AI/Wan2.1-I2V-14B-480P-Diffusers - Wan I2V 480P
  • Wan-AI/Wan2.1-I2V-14B-720P-Diffusers - Wan I2V 720P
  • Wan-AI/Wan2.2-I2V-A14B-Diffusers - Wan 2.2 I2V

LoRA Models (60+ models)

Style adapters for Stable Diffusion:

Character Styles:

  • danbrown/loras:lucy_cyberpunk.safetensors
  • danbrown/loras:aqua_konosuba.safetensors
  • danbrown/loras:paimon_genshinimpact.safetensors
  • danbrown/loras:princesszelda.safetensors
  • danbrown/loras:jacksparrow.safetensors
  • danbrown/loras:gigachad.safetensors
  • danbrown/loras:harold.safetensors
  • danbrown/loras:pepefrog.safetensors

Art Styles:

  • danbrown/loras:ghibli_scenery.safetensors
  • danbrown/loras:arcane_style.safetensors
  • danbrown/loras:persona5_style.safetensors
  • danbrown/loras:onepiece_style.safetensors
  • danbrown/loras:myheroacademia_style.safetensors
  • danbrown/loras:akiratoriyama_style.safetensors
  • danbrown/loras:jimlee_style.safetensors
  • danbrown/loras:wlop_style.safetensors
  • danbrown/loras:discoelysium_style.safetensors
  • danbrown/loras:sokolov_style.safetensors
  • danbrown/loras:peanutscomics_style.safetensors

Visual Effects:

  • danbrown/loras:fire_vfx.safetensors
  • danbrown/loras:lightning_vfx.safetensors
  • danbrown/loras:water_vfx.safetensors
  • danbrown/loras:flamingeye.safetensors

Specialized:

  • danbrown/loras:2d_sprite.safetensors - 2D sprites
  • danbrown/loras:pixhell.safetensors - Pixel art
  • danbrown/loras:add_detail.safetensors - Detail enhancement
  • danbrown/loras:animeoutlineV4.safetensors - Anime outlines
  • danbrown/loras:thickeranimelines.safetensors - Thick anime lines
  • danbrown/loras:cyberpunk_tarot.safetensors - Tarot style
  • danbrown/loras:gacha_splash.safetensors - Gacha game art
  • danbrown/loras:sxz_game_assets.safetensors - Game assets
  • danbrown/loras:twitch_emotes.safetensors - Twitch emotes
  • danbrown/loras:3Danaglyph.safetensors - 3D effect

SDXL LoRAs (7 models):

  • CiroN2022/toy-face:toy_face_sdxl.safetensors - Toy face style
  • nerijs/pixel-art-xl:pixel-art-xl.safetensors - Pixel art
  • goofyai/3d_render_style_xl:3d_render_style_xl.safetensors - 3D render
  • artificialguybr/CuteCartoonRedmond-V2:CuteCartoonRedmond-CuteCartoon-CuteCartoonAF.safetensors - Cute cartoon
  • blink7630/graphic-novel-illustration:Graphic_Novel_Illustration-000007.safetensors - Graphic novel
  • robert123231/coloringbookgenerator:ColoringBookRedmond-ColoringBook-ColoringBookAF.safetensors - Coloring book
  • Linaqruf/anime-detailer-xl-lora:anime-detailer-xl-lora.safetensors - Anime detail
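
An illustrative sketch of applying one of the SDXL LoRAs above on top of the SDXL base model with diffusers (a CUDA GPU is assumed):

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load a style LoRA on top of the base checkpoint
pipe.load_lora_weights("nerijs/pixel-art-xl", weight_name="pixel-art-xl.safetensors")

image = pipe("pixel art of a cozy cabin in the woods", num_inference_steps=30).images[0]
image.save("pixel_art_cabin.png")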

Specialized Components

ControlNet Models (7+ models)

Conditional image generation:

SD 1.5 ControlNets:

  • lllyasviel/control_v11p_sd15_canny:diffusion_pytorch_model.fp16.safetensors - Canny edges
  • lllyasviel/control_v11p_sd15_inpaint:diffusion_pytorch_model.fp16.safetensors - Inpainting
  • lllyasviel/control_v11p_sd15_mlsd:diffusion_pytorch_model.fp16.safetensors - Line detection
  • lllyasviel/control_v11p_sd15_lineart:diffusion_pytorch_model.fp16.safetensors - Line art
  • lllyasviel/control_v11p_sd15_scribble:diffusion_pytorch_model.fp16.safetensors - Scribbles
  • lllyasviel/control_v11p_sd15_openpose:diffusion_pytorch_model.fp16.safetensors - OpenPose
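
A condensed sketch of pairing an SD 1.5 checkpoint with the Canny ControlNet listed above (the edge-map image is assumed to have been prepared already):

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Attach the Canny ControlNet to a Stable Diffusion 1.5 pipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

canny_image = load_image("canny_edges.png")  # illustrative pre-computed edge map
image = pipe("a futuristic city street at night", image=canny_image).images[0]
image.save("controlnet_output.png")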

IP-Adapter Models (6 models)

Image prompt adapters:

SD 1.5 IP-Adapters:

  • h94/IP-Adapter:models/ip-adapter_sd15.bin
  • h94/IP-Adapter:models/ip-adapter_sd15_light.bin
  • h94/IP-Adapter:models/ip-adapter_sd15_vit-G.bin

SDXL IP-Adapters:

  • h94/IP-Adapter:sdxl_models/ip-adapter_sdxl.bin
  • h94/IP-Adapter:sdxl_models/ip-adapter_sdxl_vit-h.bin
  • h94/IP-Adapter:sdxl_models/ip-adapter-plus_sdxl_vit-h.bin

VAE Models (4 models)

Variational autoencoders for SD:

  • stabilityai/sd-vae-ft-mse - SD 1.5 VAE
  • stabilityai/sd-vae-ft-ema - SD 1.5 VAE EMA
  • stabilityai/sdxl-vae - SDXL VAE
  • madebyollin/sdxl-vae-fp16-fix - SDXL VAE FP16

Upscaler Models (5+ models)

Image super-resolution:

  • ai-forever/Real-ESRGAN:RealESRGAN_x2.pth - 2x upscale
  • ai-forever/Real-ESRGAN:RealESRGAN_x4.pth - 4x upscale
  • ai-forever/Real-ESRGAN:RealESRGAN_x8.pth - 8x upscale
  • ximso/RealESRGAN_x4plus_anime_6B:RealESRGAN_x4plus_anime_6B.pth - Anime 4x
  • stabilityai/stable-diffusion-x4-upscaler - SD upscaler
  • stabilityai/sd-x2-latent-upscaler - Latent upscaler
  • caidas/swin2SR-classical-sr-x2-64 - Swin2SR 2x
  • caidas/swin2SR-classical-sr-x4-64 - Swin2SR 4x
  • caidas/swin2SR-lightweight-x2-64 - Lightweight 2x
  • caidas/swin2SR-compressed-sr-x4-48 - Compressed 4x
  • caidas/swin2SR-realworld-sr-x4-64-bsrgan-psnr - Real-world 4x

NLP & Text Models

Text Classification (2 models)

Sentiment and classification:

  • cardiffnlp/twitter-roberta-base-sentiment-latest - Sentiment analysis
  • michellejieli/emotion_text_classifier - Emotion classification

Zero-Shot Classification (7 models)

Flexible text classification:

  • facebook/bart-large-mnli - BART MNLI
  • MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli - DeBERTa NLI
  • MoritzLaurer/mDeBERTa-v3-base-mnli-xnli - Multilingual DeBERTa
  • tasksource/ModernBERT-base-nli - ModernBERT
  • cross-encoder/nli-deberta-v3-base - Cross-encoder
  • microsoft/deberta-v2-xlarge-mnli - DeBERTa XL
  • roberta-large-mnli - RoBERTa MNLI
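
A short sketch of zero-shot classification with the transformers pipeline, using the BART MNLI checkpoint from the list above:

from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "NodeTool chains image generation and transcription nodes into one workflow.",
    candidate_labels=["machine learning", "cooking", "sports"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score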

Question Answering (4 models)

Extractive QA:

  • distilbert-base-cased-distilled-squad - DistilBERT SQuAD
  • distilbert-base-uncased-distilled-squad - DistilBERT uncased
  • bert-large-uncased-whole-word-masking-finetuned-squad - BERT large
  • deepset/roberta-base-squad2 - RoBERTa SQuAD 2.0

Table Question Answering (3 models)

Structured data QA:

  • google/tapas-base-finetuned-wtq - TAPAS base
  • google/tapas-large-finetuned-wtq - TAPAS large
  • microsoft/tapex-large-finetuned-tabfact - TAPEX large

Fill Mask / MLM (5 models)

Masked language modeling:

  • bert-base-uncased - BERT base uncased
  • bert-base-cased - BERT base cased
  • roberta-base - RoBERTa base
  • distilbert-base-uncased - DistilBERT
  • albert-base-v2 - ALBERT v2

Text Generation / Summarization (4 models)

Text transformation:

  • Falconsai/text_summarization - General summarization
  • Falconsai/medical_summarization - Medical text
  • imvladikon/het5_summarization - HE-T5 summarization

Text2Text Generation (4 models)

Sequence-to-sequence:

  • google-t5/t5-small - T5 small
  • google-t5/t5-base - T5 base
  • google-t5/t5-large - T5 large
  • google/flan-t5-small, google/flan-t5-base, google/flan-t5-large - FLAN-T5 variants

Translation (3 models)

Language translation models

Sentence Similarity / Embeddings (7 models)

Semantic embeddings:

  • sentence-transformers/all-MiniLM-L6-v2 - All-MiniLM
  • sentence-transformers/all-mpnet-base-v2 - All-MPNet
  • sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 - Multilingual
  • mixedbread-ai/mxbai-embed-large-v1 - MixedBread embedding
  • BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5 - BGE embeddings
  • BAAI/bge-m3 - BGE-M3
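
A minimal sketch of computing embeddings with the sentence-transformers library, using one of the checkpoints above:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

embeddings = model.encode([
    "NodeTool is an AI workflow platform",
    "Local models keep data private",
])
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity of the two sentences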

Reranking (3 models)

Search result reranking:

  • BAAI/bge-reranker-base - BGE reranker base
  • BAAI/bge-reranker-large - BGE reranker large
  • BAAI/bge-reranker-v2-m3 - BGE reranker v2
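
A brief sketch of reranking retrieved passages with a BGE reranker via the sentence-transformers CrossEncoder interface (higher scores mean more relevant):

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-base")

query = "How do I run models locally?"
candidates = [
    "Ollama provides a simple CLI for local LLM inference.",
    "FLUX generates photorealistic images from text prompts.",
]
scores = reranker.predict([(query, doc) for doc in candidates])
print(scores)  # the first passage should score higher for this query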

Feature Extraction (3 models)

Dense retrieval:

  • facebook/contriever - Contriever
  • gokaygokay/Flux-Prompt-Enhance - Prompt enhancement

Model Type Summary

Category | Count | Use Cases
MLX Models | 977 | Apple Silicon optimized inference
GGUF/llama.cpp | 300+ | Quantized efficient LLMs
HuggingFace Text Gen | 56 | General text generation
LoRA Adapters | 60+ | Style transfer for SD models
Ollama Models | 23 | Local privacy-first LLMs
FLUX Models | 20 | State-of-the-art image generation
Stable Diffusion | 20 | Classic image generation
Various Specialized | 200+ | Audio, vision, NLP tasks

Total: 1,655+ models across all categories