---
layout: default
title: Supported Models
---
Supported Models
NodeTool provides extensive support for AI models across multiple providers, from cutting-edge proprietary models to open-source alternatives. This comprehensive guide covers all supported models and their capabilities.
Provider Overview
NodeTool supports both cloud-based and local AI model providers:
Cloud Providers
Recommended: HuggingFace Inference Providers
HuggingFace Inference Providers offer the widest model support through a unified API, giving you access to hundreds of models across 17 specialized inference providers. This is the recommended choice for cloud-based inference when you need maximum flexibility and model access.
Why Choose HuggingFace Inference Providers:
- Widest Model Selection: Access to thousands of models across all major AI tasks
- 17 Specialized Providers: Cerebras, Cohere, Fal AI, Featherless AI, Fireworks, Groq, HF Inference, Hyperbolic, Nebius, Novita, Nscale, Public AI, Replicate, SambaNova, Scaleway, Together, and Z.ai
- Zero Vendor Lock-in: Switch between providers seamlessly through one API
- OpenAI-Compatible: Drop-in replacement for OpenAI SDK
- Automatic Failover: Requests route to alternative providers if primary is unavailable
- Unified Billing: Single token for all providers, no markup on provider rates
- Production-Ready: Enterprise-grade reliability and performance
Supported Tasks:
- Chat Completion (LLM & VLM)
- Feature Extraction (Embeddings)
- Text-to-Image Generation
- Text-to-Video Generation
- Speech-to-Text
Provider Capabilities Matrix:
| Provider | LLM | VLM | Embeddings | Image Gen | Video Gen | Speech |
|---|---|---|---|---|---|---|
| Cerebras | ✅ | | | | | |
| Cohere | ✅ | ✅ | | | | |
| Fal AI | ✅ | ✅ | ✅ | | | |
| Featherless AI | ✅ | ✅ | | | | |
| Fireworks | ✅ | ✅ | | | | |
| Groq | ✅ | ✅ | | | | |
| HF Inference | ✅ | ✅ | ✅ | ✅ | ✅ | |
| Hyperbolic | ✅ | ✅ | | | | |
| Nebius | ✅ | ✅ | ✅ | ✅ | | |
| Novita | ✅ | ✅ | ✅ | | | |
| Nscale | ✅ | ✅ | ✅ | | | |
| Public AI | ✅ | | | | | |
| Replicate | ✅ | ✅ | ✅ | | | |
| SambaNova | ✅ | ✅ | | | | |
| Scaleway | ✅ | ✅ | | | | |
| Together | ✅ | ✅ | ✅ | | | |
| Z.ai | ✅ | ✅ | | | | |
Key Features:
- 🎯 All-in-One API: Single interface for text, image, video, embeddings, and speech tasks
- 🔀 Multi-Provider Support: Seamlessly switch between 17 top-tier providers
- 🚀 Scalable & Reliable: High availability and low-latency for production
- 🔧 Developer-Friendly: OpenAI-compatible API with Python and JavaScript SDKs
- 💰 Cost-Effective: Transparent pricing with no extra markup
Example Usage:
```python
from huggingface_hub import InferenceClient

client = InferenceClient()
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
For detailed documentation, visit HuggingFace Inference Providers.
Direct Provider APIs
- OpenAI: Industry-leading language models with multimodal capabilities (GPT-5, GPT-4o, O3, Sora 2)
- Anthropic: Advanced reasoning models with strong safety features (Claude 4.5, Claude 4)
- Google: Multimodal models with excellent vision and reasoning capabilities (Gemini 2.5)
Local Inference Engines
NodeTool provides comprehensive local model support with 1,655+ models across multiple frameworks:
- llama.cpp: Highly optimized C/C++ inference with GGUF format support (300+ quantized models)
- Ollama: User-friendly local deployment with 23+ pre-configured models
- vLLM: Production-grade high-throughput inference engine (PyTorch Foundation project)
- MLX Framework: Apple Silicon-optimized inference (977+ models)
- MLX-LM: Language model inference
- MLX-VLM: Vision-language models with FastVLM
- MFLUX: FLUX image generation for Apple Silicon
- HuggingFace Transformers: Access to 500,000+ models with pipeline API
Local Benefits: Complete privacy, zero API costs, offline functionality, and full customization.
OpenAI Models
GPT-5 Series
- GPT-5 (`gpt-5`): Next-generation flagship model with 400K context window
- GPT-5 Mini (`gpt-5-mini`): Efficient variant of GPT-5
GPT-4o Series
- GPT-4o (`gpt-4o`): Advanced multimodal model with vision capabilities
- GPT-4o Mini (`gpt-4o-mini`): Efficient version of GPT-4o for cost-effective tasks
- ChatGPT-4o (`chatgpt-4o-latest`): Conversational variant optimized for chat
GPT-4o Audio Series
- GPT-4o Audio (`gpt-4o-audio-preview-2024-12-17`): Enhanced with audio processing
- GPT-4o Mini Audio (`gpt-4o-mini-audio-preview-2024-12-17`): Compact audio model
GPT-4.1 Series
- GPT-4.1 (`gpt-4.1`): Advanced reasoning model with 1M context window
- GPT-4.1 Mini (`gpt-4.1-mini`): Efficient version of GPT-4.1
O-Series Reasoning Models
- O3 (`o3`, `o3-mini`): Advanced reasoning models (no tool support)
- O4 Mini (`o4-mini`): Specialized reasoning model with 200K context
Image Generation
- GPT Image 1 (`gpt-image-1`): Latest image generation model
- DALL-E 3 (`dall-e-3`): High-quality image generation (legacy)
- DALL-E 2 (`dall-e-2`): Previous generation (legacy)
Video Generation
- Sora 2 (`sora-2`): Text- and image-to-video generation
- Sora 2 Pro (`sora-2-pro`): Premium video generation with enhanced quality
Specialized Models
- Codex Mini (`codex-mini-latest`): Code-focused model for programming tasks
- Whisper (`whisper-1`): Speech recognition and transcription
Best for: General-purpose tasks, coding, multimodal applications, audio processing, video generation
Anthropic Models
Claude 4.5 Series
- Claude Sonnet 4.5 (`claude-sonnet-4-5-latest`, `claude-sonnet-4-5-20250929`): Latest next-generation model with enhanced reasoning
- Claude Haiku 4.5 (`claude-4-5-haiku-latest`): Fast, efficient latest-generation model
Claude 4 Series
- Claude Sonnet 4 (`claude-sonnet-4-latest`, `claude-sonnet-4-20250514`): Next-generation reasoning model
- Claude Opus 4 (`claude-opus-4-latest`, `claude-opus-4-20250514`): Premium model for the most complex tasks
Claude 3.7 Series
- Claude 3.7 Sonnet (`claude-3-7-sonnet-latest`): Advanced reasoning capabilities
Claude 3.5 Series
- Claude 3.5 Haiku (`claude-3-5-haiku-latest`): Fast, efficient model for everyday tasks
- Claude 3.5 Sonnet (`claude-3-5-sonnet-latest`): Balanced model for complex reasoning
Best for: Complex reasoning, analysis, safety-critical applications, long-form content, extended context tasks
Google Gemini Models
Gemini 2.5 Series
- Gemini 2.5 Pro Experimental (`gemini-2.5-pro-exp-03-25`): Cutting-edge experimental model
- Gemini 2.5 Flash (`gemini-2.5-flash-preview-04-17`): Fast, efficient multimodal model
Gemini 2.0 Series
- Gemini 2.0 Flash (`gemini-2.0-flash`): Optimized for speed and efficiency
- Gemini 2.0 Flash Lite (`gemini-2.0-flash-lite`): Lightweight version for basic tasks
- Gemini 2.0 Flash Exp Image Generation (`gemini-2.0-flash-exp-image-generation`): Specialized for image generation
Best for: Multimodal tasks, vision processing, image generation, speed-critical applications
Hugging Face Models
Advanced Reasoning Models
- DeepSeek V3 0324 (`deepseek-ai/DeepSeek-V3-0324`): Advanced reasoning and code generation
- DeepSeek TNG R1T2 Chimera (`tngtech/DeepSeek-TNG-R1T2-Chimera`): Hybrid reasoning model
- DeepSeek R1 (`deepseek-ai/DeepSeek-R1`): Latest DeepSeek reasoning model
- DeepSeek R1 Distill Qwen 1.5B (`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`): Distilled efficient version
Instruction-Tuned Models
- Hunyuan A13B Instruct (`tencent/Hunyuan-A13B-Instruct`): Tencent's instruction-tuned model
- Meta Llama 3.1 8B Instruct (`meta-llama/Meta-Llama-3.1-8B-Instruct`): Meta's powerful instruction model
- Qwen 2.5 7B Instruct 1M (`Qwen/Qwen2.5-7B-Instruct-1M`): Extended context length model
Specialized Models
- DeepSWE Preview (`agentica-org/DeepSWE-Preview`): Specialized for software engineering tasks
- Qwen 2.5 Coder 32B Instruct (`Qwen/Qwen2.5-Coder-32B-Instruct`): Code-specialized model
- Qwen 2.5 VL 7B Instruct (`Qwen/Qwen2.5-VL-7B-Instruct`): Vision-language model
Compact Models
- SmolLM3 3B (`HuggingFaceTB/SmolLM3-3B`): Compact, efficient language model
- Gemma 2 2B IT (`google/gemma-2-2b-it`): Google's efficient instruction-tuned model
- Phi 4 (`microsoft/phi-4`): Microsoft's latest compact model
Best for: Open-source applications, specialized tasks, cost-effective deployment, research
Hugging Face Groq Models
High-performance models optimized for speed through Groq's inference infrastructure:
Meta Llama Series
- Meta Llama 3 70B Instruct (`meta-llama/Meta-Llama-3-70B-Instruct`): Large-scale instruction model
- Llama 3.3 70B Instruct (`meta-llama/Llama-3.3-70B-Instruct`): Enhanced version with improved capabilities
- Llama Guard 4 12B (`meta-llama/Llama-Guard-4-12B`): Safety and content moderation model
Llama 4 Preview Series
- Llama 4 Scout 17B 16E Instruct (`meta-llama/Llama-4-Scout-17B-16E-Instruct`): Preview of next-generation Llama
- Llama 4 Maverick 17B 128E Instruct (`meta-llama/Llama-4-Maverick-17B-128E-Instruct`): Extended-context Llama 4 variant
Best for: High-throughput applications, real-time inference, production deployments
Hugging Face Cerebras Models
Models optimized for Cerebras' specialized hardware:
- Cerebras GPT 2.5 12B Instruct (`cerebras/Cerebras-GPT-2.5-12B-Instruct`): Cerebras' proprietary model
- Llama 3.3 70B Instruct (`meta-llama/Llama-3.3-70B-Instruct`): Optimized for Cerebras hardware
- Llama 4 Scout 17B 16E Instruct (`meta-llama/Llama-4-Scout-17B-16E-Instruct`): Next-gen Llama on Cerebras
Best for: Ultra-fast inference, specialized hardware optimization, high-performance computing
HuggingFace Inference Providers (Unified Multi-Provider Access)
HuggingFace Inference Providers is a unified proxy service that gives you access to hundreds of models across 17 specialized inference providers through a single API. This is the recommended cloud solution for NodeTool when you need maximum flexibility and model access.
Why Use Inference Providers?
Zero Vendor Lock-in: Instead of committing to a single provider's model catalog, access models from Cerebras, Groq, Together AI, Replicate, SambaNova, Fireworks, and 11 more through one consistent interface.
Instant Access to Cutting-Edge Models: Go beyond mainstream providers to access thousands of specialized models across multiple AI tasks - language models, image generators, embeddings, speech processing, and more.
Production-Ready Performance: Built for enterprise workloads with high availability, automatic failover, and low-latency infrastructure.
OpenAI-Compatible API: Drop-in replacement for OpenAI SDK - migrate existing code with minimal changes while gaining access to hundreds more models.
Supported Providers & Capabilities
| Provider | Chat (LLM) | Vision (VLM) | Embeddings | Image Gen | Video Gen | Speech |
|---|---|---|---|---|---|---|
| Cerebras | ✅ | | | | | |
| Cohere | ✅ | ✅ | | | | |
| Fal AI | ✅ | ✅ | ✅ | | | |
| Featherless AI | ✅ | ✅ | | | | |
| Fireworks | ✅ | ✅ | | | | |
| Groq | ✅ | ✅ | | | | |
| HF Inference | ✅ | ✅ | ✅ | ✅ | ✅ | |
| Hyperbolic | ✅ | ✅ | | | | |
| Nebius | ✅ | ✅ | ✅ | ✅ | | |
| Novita | ✅ | ✅ | ✅ | | | |
| Nscale | ✅ | ✅ | ✅ | | | |
| Public AI | ✅ | | | | | |
| Replicate | ✅ | ✅ | ✅ | | | |
| SambaNova | ✅ | ✅ | | | | |
| Scaleway | ✅ | ✅ | | | | |
| Together | ✅ | ✅ | ✅ | | | |
| Z.ai | ✅ | ✅ | | | | |
Key Features
- 🎯 All-in-One API: Single interface for text generation, image generation, embeddings, NER, summarization, speech recognition, and 50+ more tasks
- 🔀 Multi-Provider Support: Seamlessly run models from 17 top-tier providers without changing your code
- 🚀 Scalable & Reliable: Built for high availability and low-latency performance in production
- 🔧 Developer-Friendly: Simple requests, fast responses, consistent experience across Python and JavaScript
- 💰 Cost-Effective: Transparent pricing with no markup on provider rates
- 🔄 Automatic Failover: Requests automatically route to alternative providers if primary is unavailable
- 🔐 Unified Authentication: Use a single HuggingFace token for all 17 providers
Getting Started
Python Example:
```python
from huggingface_hub import InferenceClient

# Initialize client
client = InferenceClient()

# Chat completion
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Image generation
image = client.text_to_image(
    prompt="A serene lake at sunset, photorealistic",
    model="black-forest-labs/FLUX.1-dev"
)
image.save("output.png")

# Embeddings
embedding = client.feature_extraction(
    text="NodeTool is an AI workflow platform",
    model="sentence-transformers/all-MiniLM-L6-v2"
)
```
JavaScript Example:
```javascript
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient(process.env.HF_TOKEN);

// Chat completion
const completion = await client.chatCompletion({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  messages: [{ role: "user", content: "Hello!" }],
});

// Automatic provider selection
const result = await client.chatCompletion({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  provider: "auto", // Default: chooses best available provider
  messages: [{ role: "user", content: "Hello!" }],
});

// Explicit provider selection
const specific = await client.chatCompletion({
  model: "meta-llama/Llama-3.1-8B-Instruct",
  provider: "groq", // Force specific provider
  messages: [{ role: "user", content: "Hello!" }],
});
```
OpenAI-Compatible (Drop-in Replacement):
```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
Provider Selection
- Automatic (`provider="auto"`): System selects the first available provider based on your preference order
- Explicit (`provider="groq"`): Force use of a specific provider for consistency
- Server-Side (OpenAI endpoint): Router automatically chooses the best available provider
Use Cases
What You Can Build:
- Chatbots & Assistants: Access to latest LLMs with tool-calling support
- Image & Video Generation: FLUX, Stable Diffusion, and custom style generation with LoRAs
- Search & RAG Systems: State-of-the-art embeddings for semantic search and recommendations
- Speech Processing: Transcription, speech-to-text, and audio analysis
- Traditional ML: Classification, NER, summarization, and 50+ specialized tasks
Pricing & Authentication
- Generous Free Tier: Start with free credits for testing and development
- PRO Users: Additional credits included with HuggingFace PRO subscription
- Enterprise: Custom pricing for high-volume workloads
- No Markup: Direct provider pricing with transparent billing
- Single Token: One HuggingFace token authenticates across all 17 providers
Best For
- Maximum Flexibility: Need access to models from multiple providers
- Avoiding Lock-in: Don't want to commit to a single provider
- Production Workloads: Need reliability with automatic failover
- Cost Optimization: Compare pricing across providers easily
- Rapid Development: Test multiple models without integrating 17 different APIs
For complete documentation, visit HuggingFace Inference Providers.
Local Models
NodeTool provides extensive support for running models locally with complete privacy and no external API dependencies. Multiple inference engines are supported, each optimized for different hardware and use cases.
llama.cpp & GGUF Format
llama.cpp is a highly optimized C/C++ inference library that enables efficient LLM inference on CPU and GPU hardware. It supports the GGUF (GGML Universal File) format, a binary format designed for fast loading and memory-efficient storage.
Key Features:
- Quantization Support: 1.5-bit through 8-bit integer quantization for reduced memory usage
- Advanced Methods: AWQ scaling and importance matrix techniques for quality preservation
- Cross-Platform: Optimized kernels for x86, ARM CPUs, and various GPU backends
- Memory Efficient: Significantly reduced RAM requirements through quantization
Models Available: 300+ GGUF quantized models including Qwen, Llama, Gemma, DeepSeek, and GPT variants in multiple quantization levels (Q2_K, Q3_K_S, Q4_K_M, Q5_K_M, Q8_0).
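As a rough illustration of how a GGUF file is consumed, the sketch below uses the `llama-cpp-python` bindings; the model path is hypothetical, and NodeTool's own llama.cpp nodes handle loading for you.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load a quantized GGUF file from disk; the path below is only an example
llm = Llama(
    model_path="models/Qwen3-4B-Q4_K_M.gguf",
    n_ctx=8192,        # context window in tokens
    n_gpu_layers=-1,   # offload all layers if a GPU backend is compiled in
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF quantization does."}]
)
print(response["choices"][0]["message"]["content"])
```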
Ollama
Ollama is an open-source platform that simplifies running LLMs locally with a simple CLI and REST API. Powered by llama.cpp under the hood, Ollama makes model deployment as easy as `ollama run model-name`.
Key Features:
- Easy Installation: Single-command setup with automatic model management
- Model Library: Access to 23+ pre-configured models including Llama, Qwen, DeepSeek, Gemma
- Performance: Recent updates deliver up to 12x faster inference speeds
- Streaming Support: Real-time response streaming with tool call integration
- Advanced Quantization: Pioneering INT4 and INT2 quantization for 2025
Popular Models: `llama3.1:8b`, `qwen3:14b`, `deepseek-r1:7b`, `gemma3:4b`, `mistral-small:latest`
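For reference, here is a minimal call against Ollama's local REST API (it listens on `localhost:11434` by default); this sketch assumes the model has already been pulled with `ollama pull llama3.1:8b`.

```python
import requests

# Ollama's chat endpoint; stream=False returns a single JSON response
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,
    },
    timeout=120,
)
print(response.json()["message"]["content"])
```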
Best for: Quick experimentation, development workflows, and easy local deployment.
vLLM
vLLM is a high-throughput, memory-efficient inference and serving engine designed for production LLM workloads. Now a PyTorch Foundation project, vLLM delivers enterprise-grade performance.
Key Features:
- PagedAttention: Revolutionary memory management for efficient KV cache
- Continuous Batching: Dynamic batching for optimal GPU utilization
- V1 Engine (2025): 1.7x speedup with enhanced multimodal support and zero-overhead prefix caching
- Blackwell Optimization: Up to 4x higher throughput on latest NVIDIA GPUs
- Throughput: Reports of 20-24× higher requests/second vs traditional serving
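A minimal offline-batching sketch using vLLM's Python API (assuming a CUDA GPU with enough memory for the chosen model):

```python
from vllm import LLM, SamplingParams  # pip install vllm

# Load the model once, then generate for a whole batch of prompts
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    [
        "Explain PagedAttention in one paragraph.",
        "List three benefits of continuous batching.",
    ],
    params,
)
for output in outputs:
    print(output.outputs[0].text)
```

For request-based workloads, vLLM also ships an OpenAI-compatible HTTP server alongside this offline API.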
Best for: Production deployments, high-throughput applications, batch processing, and scalable inference.
MLX Framework (Apple Silicon)
MLX is Apple's open-source machine learning framework specifically optimized for Apple Silicon's unified memory architecture. Released by Apple ML Research, MLX enables efficient on-device AI.
Core Components:
MLX-LM (Language Models)
- Optimized Inference: Native Apple Silicon optimization for LLMs
- Models: Llama, Qwen, Mistral, and 977+ models with 4-bit/8-bit quantization
- Python API: NumPy-like interface with lazy computation
- Performance: Leverages unified memory for efficient model execution
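A minimal MLX-LM sketch (Apple Silicon only, assuming the `mlx-lm` package is installed; generation arguments vary slightly between releases):

```python
from mlx_lm import load, generate  # pip install mlx-lm

# Download and load a 4-bit model from the mlx-community organization
model, tokenizer = load("mlx-community/Llama-3.1-8B-Instruct-4bit")

text = generate(
    model,
    tokenizer,
    prompt="Write a haiku about unified memory.",
    max_tokens=128,
)
print(text)
```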
MLX-VLM (Vision-Language Models)
- Vision AI: Multimodal models combining vision and language understanding
- FastVLM (CVPR 2025): Apple's latest research with 85x faster TTFT
- Pre-quantized: Optimized models ready for Apple Silicon
- iOS/macOS Support: Run VLMs directly on devices
MFLUX (FLUX Image Generation)
- Image Generation: FLUX models ported to MLX for Apple Silicon
- Performance: Up to 25% faster than alternative implementations on M-series chips
- Models: FLUX.1-dev, FLUX.1-schnell with 4-bit quantization
- Hardware Requirements: M1/M2/M3/M4 with 24GB+ RAM recommended
Best for: Mac users with M-series chips, on-device inference, iOS/macOS applications, and privacy-focused deployments.
HuggingFace Transformers
Transformers is the de facto standard library for working with state-of-the-art ML models across text, vision, audio, and multimodal tasks.
Key Features:
- Pipeline API: High-level interface for instant inference across all modalities
- 56+ Tasks: Text generation, image classification, speech recognition, VQA, and more
- Device Support: Automatic GPU/Apple Silicon/CPU detection with `device_map="auto"`
- Optimization: FP16/BF16 precision, batch processing, and efficient memory management
- Model Hub: Access to 500,000+ pre-trained models
Example Usage:
```python
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/Qwen2.5-7B", device_map="auto")
result = pipe("Once upon a time...")
```
Best for: Research, prototyping, fine-tuning, and accessing the latest models from HuggingFace Hub.
Comparison Matrix
| Framework | Throughput | Memory Efficiency | Ease of Use | Best Hardware | Use Case |
|---|---|---|---|---|---|
| llama.cpp | Medium | Excellent | Medium | CPU, GPU | Quantized models, edge devices |
| Ollama | Medium | Good | Excellent | CPU, GPU | Development, quick testing |
| vLLM | Excellent | Excellent | Medium | NVIDIA GPU | Production, high-scale |
| MLX | Good | Excellent | Good | Apple Silicon | Mac, iOS, privacy |
| Transformers | Medium | Good | Excellent | Any | Research, flexibility |
Overall Benefits:
- Privacy: Complete data privacy with no external API calls
- Cost Control: Zero inference costs after initial hardware investment
- Customization: Full control over model selection, parameters, and fine-tuning
- Offline: Works without internet connectivity
- Low Latency: No network round-trips for faster response times
Model Capabilities
Multimodal Support
- Vision Models: GPT-4o, GPT-5, Gemini 2.0/2.5, Qwen 2.5 VL
- Audio Models: GPT-4o Audio, GPT-4o Mini Audio, Whisper
- Video Models: Sora 2, Sora 2 Pro (text-to-video, image-to-video)
- Text-to-Image: GPT Image 1, DALL-E 3, Gemini 2.0 Flash Exp
Specialized Capabilities
- Code Generation: Codex Mini, Qwen 2.5 Coder, DeepSWE Preview
- Advanced Reasoning: O3, Claude 4.5, GPT-4.1, DeepSeek R1
- Long Context: GPT-4.1 (1M), GPT-5 (400K), O4 Mini (200K)
- Safety: Llama Guard 4, Claude models with constitutional AI
Performance Characteristics
- Speed Optimized: Groq models, Cerebras models, Flash variants, Claude Haiku 4.5
- Efficiency: Mini/Lite variants (GPT-5 Mini, GPT-4o Mini, Claude Haiku 4.5)
- Quality: Flagship models like GPT-5, Claude Sonnet 4.5, Claude Opus 4, Gemini 2.5 Pro
Choosing the Right Model
For Maximum Model Access (Recommended for Cloud)
- HuggingFace Inference Providers: Access to thousands of models across 17 providers with unified API
- Best for: Flexibility, avoiding vendor lock-in, production workloads
- Features: Automatic failover, OpenAI-compatible, all modalities (LLM, VLM, image, video, speech)
- Providers: Cerebras, Groq, Together, Replicate, SambaNova, Fireworks, and 11 more
For General Tasks
- GPT-5: Next-generation flagship with 400K context
- Claude Sonnet 4.5: Latest reasoning model with enhanced capabilities
- GPT-4o: Best overall performance and multimodal capabilities
- Claude 3.5 Sonnet: Excellent reasoning and safety features
- Gemini 2.5 Flash: Fast, efficient multimodal processing
- Via HF Inference Providers: Access to all above models plus hundreds more
For Specialized Applications
- Coding: Codex Mini, Qwen 2.5 Coder, DeepSWE Preview
- Reasoning: O3, Claude 4.5, GPT-4.1, DeepSeek R1 (via HF Providers)
- Vision: GPT-4o, Gemini 2.0/2.5, Qwen 2.5 VL
- Audio: GPT-4o Audio variants, Whisper
- Video: Sora 2, Sora 2 Pro, or via Fal AI/Replicate/Novita (HF Providers)
- Image Generation: FLUX, Stable Diffusion via HF Inference Providers (Replicate, Together, Fal AI)
For Cost-Effective Solutions
- HuggingFace Inference Providers: No markup, transparent pricing across providers
- Hugging Face Models: Open-source alternatives
- Mini Variants: GPT-5 Mini, GPT-4o Mini, Claude Haiku 4.5
- Local Models: Ollama, llama.cpp for zero API costs
For High Performance
- Via HF Providers: Groq (ultra-fast LLM), Cerebras (specialized hardware), Fireworks (optimized)
- Groq Models: Ultra-fast inference
- Cerebras Models: Specialized hardware optimization
- Flash Variants: Speed-optimized models
For Extended Context
- GPT-4.1: 1M tokens context window
- GPT-5: 400K tokens context window
- O4 Mini: 200K tokens context window
- Claude 4.5: Extended context capabilities
For Privacy & Offline
- Local Inference: llama.cpp (300+ models), Ollama (23+ models), vLLM (production), MLX (Apple Silicon)
- Complete Privacy: No external API calls, all data stays local
- Zero Costs: One-time hardware investment, unlimited inference
Configuration and Setup
API Requirements
- OpenAI: OpenAI API key
- Anthropic: Anthropic API key
- Google: Google AI API key
- Hugging Face: Hugging Face API token
Local Setup
- Ollama: Download and install Ollama
- Hardware: Sufficient RAM and optional GPU for local models
- Storage: Adequate disk space for model files
Usage Considerations
- Rate Limits: Different providers have different rate limits
- Costs: Pricing varies significantly between providers and models
- Availability: Some models may have limited availability or regions
- Context Length: Models have different maximum context windows
This comprehensive model support makes NodeTool one of the most flexible AI platforms available, allowing you to choose the perfect model for your specific needs while maintaining the ability to experiment with cutting-edge options as they become available.
Complete Model Reference
NodeTool supports 1,655+ models across multiple categories. Below is a comprehensive reference organized by model type and capability.
Language Models (LLMs)
Ollama Models (23 models)
Local models served through Ollama with full privacy and offline support:
- `all-minilm:latest` - Lightweight embedding model
- `deepseek-r1:1.5b`, `deepseek-r1:7b`, `deepseek-r1:14b` - DeepSeek reasoning models
- `gemma3:270m`, `gemma3:1b`, `gemma3:4b`, `gemma3n:latest` - Google Gemma models
- `gpt-oss:20b` - Open-source GPT variant
- `granite3.1-moe:1b`, `granite3.1-moe:3b` - IBM Granite MoE models
- `llama3.1:8b`, `llama3.2:3b` - Meta Llama models
- `mistral-small:latest` - Mistral AI model
- `nomic-embed-text:latest` - Nomic embedding model
- `phi3.5:latest` - Microsoft Phi model
- `qwen2.5-coder:3b`, `qwen2.5-coder:7b` - Qwen coding models
- `qwen3:0.6b`, `qwen3:1.7b`, `qwen3:4b`, `qwen3:8b`, `qwen3:14b` - Qwen 3 series
HuggingFace Text Generation (56 models)
Production-ready text generation models from HuggingFace:
Small Efficient Models:
- `Qwen/Qwen3-0.6B-MLX-4bit` - Ultra-compact Qwen model
- `gpt2` - Classic GPT-2 baseline
- `distilgpt2` - Distilled GPT-2
- `Qwen/Qwen2-0.5B-Instruct` - Tiny instruction model
- `bigcode/starcoder` - Code-specialized model
Mid-Size Models:
- `mlx-community/Llama-3.2-1B-Instruct-4bit` - Compact Llama
- `mlx-community/Llama-3.2-3B-Instruct-4bit` - Balanced Llama
- `Qwen/Qwen3-4B-MLX-4bit` - Mid-range Qwen
Large Models:
- `unsloth/Qwen3-14B-GGUF` - Full Qwen 14B
- `ggml-org/Qwen2.5-Coder-0.5B-Q8_0-GGUF` - Quantized coder model
GGUF Quantized Models (300+ models)
Efficient quantized models via llama.cpp with various quantization levels (Q2_K, Q3_K_S, Q4_K_M, Q5_K_M, Q8_0):
Featured Models:
- `ggml-org/gemma-3-1b-it-GGUF` - Gemma instruction-tuned
- `ggml-org/Kimi-VL-A3B-Thinking-2506-GGUF` - Vision-language reasoning
- `unsloth/Qwen3-30B-A3B-GGUF` - Large Qwen model
- `unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF` - Large coding model
- `unsloth/gpt-oss-20b-GGUF` - 20B open GPT
- `unsloth/gpt-oss-120b-GGUF` - 120B flagship model
- `unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF` - DeepSeek R1 distill
- `unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF` - Larger DeepSeek distill
- `unsloth/Phi-4-reasoning-plus-GGUF` - Microsoft Phi-4 with reasoning
- `unsloth/Mistral-Small-3.2-24B-Instruct-2506-GGUF` - Mistral 24B
- `unsloth/Qwen3-30B-A3B-Instruct-2507-GGUF` - Latest Qwen 30B
- `unsloth/gemma-3-4b-it-GGUF`, `unsloth/gemma-3-12b-it-GGUF` - Gemma variants
- `unsloth/Magistral-Small-2509-GGUF` - Magistral model
- `unsloth/GLM-4.5-Air-GGUF` - GLM model family
- `unsloth/Qwen2.5-VL-7B-Instruct-GGUF` - Vision-language Qwen
MLX Models (977 models)
Apple Silicon optimized models using MLX framework for M-series chips:
Text Generation:
- `mlx-community/Llama-3.1-8B-Instruct-4bit`
- `mlx-community/Qwen2.5-7B-Instruct-4bit`
- `mlx-community/Mistral-7B-Instruct-v0.2-4bit`
Vision-Language:
- `mlx-community/llava-1.5-7b-4bit`
- `mlx-community/Qwen2-VL-2B-Instruct-4bit`
Audio:
- `mlx-community/whisper-tiny-mlx`, `mlx-community/whisper-tiny.en-mlx`
- `mlx-community/whisper-base-mlx`, `mlx-community/whisper-base.en-mlx`
- `mlx-community/whisper-small-mlx`, `mlx-community/whisper-small.en-mlx`
- `mlx-community/whisper-medium-mlx`, `mlx-community/whisper-medium.en-mlx`
- `mlx-community/whisper-large-v3-mlx`
Text-to-Speech:
- `mlx-community/Kokoro-82M-bf16`, `mlx-community/Kokoro-82M-4bit`, `mlx-community/Kokoro-82M-6bit`, `mlx-community/Kokoro-82M-8bit`
- `mlx-community/Spark-TTS-0.5B-8bit`, `mlx-community/Spark-TTS-0.5B-bf16`
- `mlx-community/csm-1b-8bit`, `mlx-community/csm-1b`
Image Generation Models
FLUX Models (20 models)
State-of-the-art image generation from Black Forest Labs:
Base Models:
- `black-forest-labs/FLUX.1-dev` - Main development model
- `black-forest-labs/FLUX.1-schnell` - Fast generation variant
Specialized FLUX:
- `black-forest-labs/FLUX.1-Fill-dev` - Inpainting model
- `black-forest-labs/FLUX.1-Canny-dev` - Canny edge conditioning
- `black-forest-labs/FLUX.1-Depth-dev` - Depth-based generation
- `black-forest-labs/FLUX.1-Redux-dev` - Image variation
- `black-forest-labs/FLUX.1-Kontext-dev` - Context-aware generation
- `Freepik/flux.1-lite-8B-alpha` - Lightweight variant
Quantized FLUX (GGUF):
- `city96/FLUX.1-dev-gguf:flux1-dev-Q2_K.gguf` through `Q5_K_S.gguf` - Various quantization levels
- `city96/FLUX.1-schnell-gguf:flux1-schnell-Q2_K.gguf` through `Q5_K_S.gguf`
MLX-Optimized FLUX:
- `dhairyashil/FLUX.1-dev-mflux-4bit`
- `dhairyashil/FLUX.1-schnell-mflux-v0.6.2-4bit`
- `filipstrand/FLUX.1-Krea-dev-mflux-4bit`
- `akx/FLUX.1-Kontext-dev-mflux-4bit`
Alternative FLUX:
- `Kwai-Kolors/Kolors-diffusers` - Kolors variant
- `lodestones/Chroma` - Chroma FLUX variant
ControlNet FLUX:
- `InstantX/FLUX.1-dev-Controlnet-Canny` - Canny edge control
- `jasperai/Flux.1-dev-Controlnet-Upscaler` - Upscaling control
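One way to run the FLUX weights listed above locally is through the `diffusers` library; this is only a sketch (it assumes a GPU and the `diffusers`, `transformers`, and `accelerate` packages), not NodeTool's own image-generation nodes.

```python
import torch
from diffusers import FluxPipeline

# FLUX.1-schnell is distilled for few-step generation
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # trade speed for lower VRAM usage

image = pipe(
    "A serene lake at sunset, photorealistic",
    num_inference_steps=4,
    guidance_scale=0.0,  # schnell does not use classifier-free guidance
).images[0]
image.save("flux_schnell.png")
```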
Stable Diffusion 1.5 (20 models)
Classic Stable Diffusion models and fine-tunes:
Base & Popular Models:
- `stable-diffusion-v1-5/stable-diffusion-v1-5` - Original SD 1.5
- `SG161222/Realistic_Vision_V5.1_noVAE` - Photorealistic variant
- `Lykon/DreamShaper` - DreamShaper series
- `Lykon/dreamshaper-8` - Version 8
Specialized SD 1.5:
- `XpucT/Deliberate:Deliberate_v6.safetensors` - Deliberate model
- `Lykon/AbsoluteReality:AbsoluteReality_1.8.1_pruned.safetensors` - Photorealistic
- `gsdf/Counterfeit-V2.5:Counterfeit-V2.5_fp16.safetensors` - Anime style
- `philz1337x/epicrealism:epicrealism_naturalSinRC1VAE.safetensors` - Epic realism
- `digiplay/majicMIX_realistic_v7:majicmixRealistic_v7.safetensors` - MajicMIX
- `526christian/526mix-v1.5` - 526Mix
- `imagepipeline/epiC-PhotoGasm` - PhotoGasm variant
Community Models:
- `stablediffusionapi/realistic-vision-v51` - API-ready realistic
- `stablediffusionapi/anything-v5` - Anime variant
- `Yntec/Deliberate2` - Deliberate v2
- `guoyww/animatediff-motion-adapter-v1-5-2` - Animation adapter
Stable Diffusion XL (9 models)
High-resolution SDXL models:
Base Models:
- `stabilityai/stable-diffusion-xl-base-1.0:sd_xl_base_1.0.safetensors`
- `stabilityai/stable-diffusion-xl-refiner-1.0:sd_xl_refiner_1.0.safetensors`
Optimized SDXL:
- `stabilityai/sdxl-turbo:sd_xl_turbo_1.0_fp16.safetensors` - Fast generation
- `Lykon/dreamshaper-xl-turbo:DreamShaperXL_Turbo_v2_1.safetensors` - DreamShaper turbo
- `Lykon/dreamshaper-xl-lightning:DreamShaperXL_Lightning.safetensors` - Lightning fast
Specialized SDXL:
- `RunDiffusion/Juggernaut-XL-v9:Juggernaut-XL_v9_RunDiffusionPhoto_v2.safetensors` - Photorealistic
- `playgroundai/playground-v2.5-1024px-aesthetic:playground-v2.5-1024px-aesthetic.fp16.safetensors`
- `dataautogpt3/ProteusV0.5:proteusV0.5.safetensors` - Proteus
- `Lykon/AAM_XL_AnimeMix:AAM_XL_Anime_Mix.safetensors` - Anime style
Qwen Image Models (9 models)
Text-to-image generation from Qwen:
Base Model:
- `Qwen/Qwen-Image` - Main model
- `Qwen/Qwen-Image-Edit` - Image editing variant
GGUF Quantized (various levels):
- `city96/Qwen-Image-gguf:qwen-image-Q2_K.gguf`
- `city96/Qwen-Image-gguf:qwen-image-Q3_K_S.gguf`
- `city96/Qwen-Image-gguf:qwen-image-Q4_K_S.gguf`
- `city96/Qwen-Image-gguf:qwen-image-Q5_K_S.gguf`
- `city96/Qwen-Image-gguf:qwen-image-Q6_K.gguf`
- `city96/Qwen-Image-gguf:qwen-image-Q8_0.gguf`
- `city96/Qwen-Image-gguf:qwen-image-BF16.gguf`
Lightning Variant:
- `lightx2v/Qwen-Image-Lightning:Qwen-Image-Lightning-8steps-V1.1.safetensors`
OmniGen
- `Shitao/OmniGen-v1-diffusers` - Unified image generation model
Vision & Multimodal Models
Vision-Language Models (7 models)
Image understanding and captioning:
- `mlx-community/llava-1.5-7b-4bit` - LLaVA multimodal
- `mlx-community/Qwen2-VL-2B-Instruct-4bit` - Qwen vision-language
- `ucaslcl/GOT-OCR2_0` - OCR model
- `HuggingFaceTB/SmolVLM-Instruct` - Compact VLM
- `zai-org/GLM-4.5V` - GLM vision model
- `Qwen/Qwen2.5-VL-3B-Instruct` - Qwen 2.5 VL
- `llava-hf/llava-interleave-qwen-0.5b-hf` - Interleaved LLaVA
Image-to-Text / Captioning (7 models)
Image understanding and description:
- `Salesforce/blip-image-captioning-base` - BLIP base
- `Salesforce/blip-image-captioning-large` - BLIP large
- `Salesforce/blip2-opt-2.7b` - BLIP-2 with OPT
- `Salesforce/blip-vqa-base` - Visual question answering
- `nlpconnect/vit-gpt2-image-captioning` - ViT-GPT2
- `microsoft/git-base` - GIT base
- `microsoft/git-base-coco` - GIT COCO-trained
- `microsoft/trocr-small-printed` - OCR for printed text
Object Detection (7 models)
Object localization and recognition:
- `facebook/detr-resnet-50` - DETR with ResNet-50
- `facebook/detr-resnet-101` - DETR with ResNet-101
- `hustvl/yolos-tiny` - Tiny YOLO-S
- `hustvl/yolos-small` - Small YOLO-S
- `valentinafeve/yolos-fashionpedia` - Fashion-specialized
Zero-Shot Object Detection (6 models):
- `google/owlvit-base-patch32` - OWL-ViT base-32
- `google/owlvit-base-patch16` - OWL-ViT base-16
- `google/owlvit-large-patch14` - OWL-ViT large
- `google/owlv2-base-patch16` - OWL-ViT v2
- `google/owlv2-base-patch16-ensemble` - OWL-ViT v2 ensemble
- `IDEA-Research/grounding-dino-tiny` - Grounding DINO
Image Classification (9 models)
Image categorization and recognition:
- `timm/resnet50.a1_in1k` - ResNet-50 ImageNet
- `microsoft/resnet-18` - ResNet-18
- `microsoft/resnet-50` - ResNet-50
- `google/vit-base-patch16-224` - Vision Transformer
- `apple/mobilevit-small` - MobileViT small
- `apple/mobilevit-xx-small` - MobileViT xx-small
- `nateraw/vit-age-classifier` - Age classification
- `Falconsai/nsfw_image_detection` - NSFW detection
- `rizvandwiki/gender-classification-2` - Gender classification
Zero-Shot Image Classification (4 models):
- `openai/clip-vit-base-patch32` - CLIP base-32
- `openai/clip-vit-base-patch16` - CLIP base-16
- `laion/CLIP-ViT-H-14-laion2B-s32B-b79K` - Large CLIP
- `laion/CLIP-ViT-g-14-laion2B-s12B-b42K` - Giant CLIP
Image Segmentation (2 models)
Pixel-level segmentation:
- `mattmdjaga/segformer_b2_clothes` - Clothing segmentation
- `facebook/sam2-hiera-large` - Segment Anything 2
- `nvidia/segformer-b3-finetuned-ade-512-512` - ADE20K segmentation
Depth Estimation (5 models)
Monocular depth prediction:
- `depth-anything/Depth-Anything-V2-Small-hf`
- `depth-anything/Depth-Anything-V2-Base-hf`
- `depth-anything/Depth-Anything-V2-Large-hf`
- `LiheYoung/depth-anything-base-hf`
- `Intel/dpt-large` - Dense Prediction Transformer
Audio Models
Text-to-Speech (7 models)
Speech synthesis:
- `suno/bark` - Bark TTS (multilingual)
- `suno/bark-small` - Compact Bark
- `hexgrad/Kokoro-82M` - Kokoro TTS
- `prince-canuma/Kokoro-82M` - Kokoro variant
- `facebook/mms-tts-eng` - English MMS
- `facebook/mms-tts-fra` - French MMS
- `facebook/mms-tts-deu` - German MMS
- `facebook/mms-tts-kor` - Korean MMS
Text-to-Audio / Music Generation (11 models)
Music and sound generation:
- `facebook/musicgen-small` - MusicGen small
- `facebook/musicgen-medium` - MusicGen medium
- `facebook/musicgen-large` - MusicGen large
- `facebook/musicgen-melody` - MusicGen with melody
- `facebook/musicgen-stereo-small` - Stereo small
- `facebook/musicgen-stereo-large` - Stereo large
- `cvssp/audioldm-s-full-v2` - AudioLDM
- `cvssp/audioldm2` - AudioLDM 2
- `harmonai/maestro-150k` - Maestro
- `ucsd-reach/musicldm` - MusicLDM
- `stabilityai/stable-audio-open-1.0` - Stable Audio
Automatic Speech Recognition (6 models)
Speech-to-text transcription:
- `openai/whisper-small` - Whisper small
- `openai/whisper-medium` - Whisper medium
- `openai/whisper-large-v2` - Whisper large v2
- `openai/whisper-large-v3` - Whisper large v3
- `openai/whisper-large-v3-turbo` - Whisper turbo
- `Systran/faster-whisper-large-v3` - Optimized Whisper
- `ggerganov/whisper.cpp` - Whisper C++ implementation
Audio Classification (2 models)
Audio understanding:
- `MIT/ast-finetuned-audioset-10-10-0.4593` - Audio Spectrogram Transformer
- `ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition` - Emotion recognition
- `laion/clap-htsat-unfused` - CLAP audio-text
Video Models
Text-to-Video (9 models)
Video generation from text:
- `THUDM/CogVideoX-2b` - CogVideo 2B
- `THUDM/CogVideoX-5b` - CogVideo 5B
- `Wan-AI/Wan2.1-T2V-14B-Diffusers` - Wan 2.1 T2V
- `Wan-AI/Wan2.2-T2V-A14B-Diffusers` - Wan 2.2 T2V
- `Wan-AI/Wan2.2-TI2V-5B-Diffusers` - Wan 2.2 TI2V
Image-to-Video Models
- `stabilityai/stable-video-diffusion-img2vid-xt` - Stable Video Diffusion
- `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` - Wan I2V 480P
- `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers` - Wan I2V 720P
- `Wan-AI/Wan2.2-I2V-A14B-Diffusers` - Wan 2.2 I2V
LoRA Models (60+ models)
Style adapters for Stable Diffusion:
Character Styles:
- `danbrown/loras:lucy_cyberpunk.safetensors`
- `danbrown/loras:aqua_konosuba.safetensors`
- `danbrown/loras:paimon_genshinimpact.safetensors`
- `danbrown/loras:princesszelda.safetensors`
- `danbrown/loras:jacksparrow.safetensors`
- `danbrown/loras:gigachad.safetensors`
- `danbrown/loras:harold.safetensors`
- `danbrown/loras:pepefrog.safetensors`
Art Styles:
- `danbrown/loras:ghibli_scenery.safetensors`
- `danbrown/loras:arcane_style.safetensors`
- `danbrown/loras:persona5_style.safetensors`
- `danbrown/loras:onepiece_style.safetensors`
- `danbrown/loras:myheroacademia_style.safetensors`
- `danbrown/loras:akiratoriyama_style.safetensors`
- `danbrown/loras:jimlee_style.safetensors`
- `danbrown/loras:wlop_style.safetensors`
- `danbrown/loras:discoelysium_style.safetensors`
- `danbrown/loras:sokolov_style.safetensors`
- `danbrown/loras:peanutscomics_style.safetensors`
Visual Effects:
- `danbrown/loras:fire_vfx.safetensors`
- `danbrown/loras:lightning_vfx.safetensors`
- `danbrown/loras:water_vfx.safetensors`
- `danbrown/loras:flamingeye.safetensors`
Specialized:
- `danbrown/loras:2d_sprite.safetensors` - 2D sprites
- `danbrown/loras:pixhell.safetensors` - Pixel art
- `danbrown/loras:add_detail.safetensors` - Detail enhancement
- `danbrown/loras:animeoutlineV4.safetensors` - Anime outlines
- `danbrown/loras:thickeranimelines.safetensors` - Thick anime lines
- `danbrown/loras:cyberpunk_tarot.safetensors` - Tarot style
- `danbrown/loras:gacha_splash.safetensors` - Gacha game art
- `danbrown/loras:sxz_game_assets.safetensors` - Game assets
- `danbrown/loras:twitch_emotes.safetensors` - Twitch emotes
- `danbrown/loras:3Danaglyph.safetensors` - 3D effect
SDXL LoRAs (7 models):
- `CiroN2022/toy-face:toy_face_sdxl.safetensors` - Toy face style
- `nerijs/pixel-art-xl:pixel-art-xl.safetensors` - Pixel art
- `goofyai/3d_render_style_xl:3d_render_style_xl.safetensors` - 3D render
- `artificialguybr/CuteCartoonRedmond-V2:CuteCartoonRedmond-CuteCartoon-CuteCartoonAF.safetensors` - Cute cartoon
- `blink7630/graphic-novel-illustration:Graphic_Novel_Illustration-000007.safetensors` - Graphic novel
- `robert123231/coloringbookgenerator:ColoringBookRedmond-ColoringBook-ColoringBookAF.safetensors` - Coloring book
- `Linaqruf/anime-detailer-xl-lora:anime-detailer-xl-lora.safetensors` - Anime detail
Specialized Components
ControlNet Models (7+ models)
Conditional image generation:
SD 1.5 ControlNets:
- `lllyasviel/control_v11p_sd15_canny:diffusion_pytorch_model.fp16.safetensors` - Canny edges
- `lllyasviel/control_v11p_sd15_inpaint:diffusion_pytorch_model.fp16.safetensors` - Inpainting
- `lllyasviel/control_v11p_sd15_mlsd:diffusion_pytorch_model.fp16.safetensors` - Line detection
- `lllyasviel/control_v11p_sd15_lineart:diffusion_pytorch_model.fp16.safetensors` - Line art
- `lllyasviel/control_v11p_sd15_scribble:diffusion_pytorch_model.fp16.safetensors` - Scribbles
- `lllyasviel/control_v11p_sd15_openpose:diffusion_pytorch_model.fp16.safetensors` - OpenPose
IP-Adapter Models (6 models)
Image prompt adapters:
SD 1.5 IP-Adapters:
- `h94/IP-Adapter:models/ip-adapter_sd15.bin`
- `h94/IP-Adapter:models/ip-adapter_sd15_light.bin`
- `h94/IP-Adapter:models/ip-adapter_sd15_vit-G.bin`
SDXL IP-Adapters:
- `h94/IP-Adapter:sdxl_models/ip-adapter_sdxl.bin`
- `h94/IP-Adapter:sdxl_models/ip-adapter_sdxl_vit-h.bin`
- `h94/IP-Adapter:sdxl_models/ip-adapter-plus_sdxl_vit-h.bin`
VAE Models (4 models)
Variational autoencoders for SD:
- `stabilityai/sd-vae-ft-mse` - SD 1.5 VAE
- `stabilityai/sd-vae-ft-ema` - SD 1.5 VAE EMA
- `stabilityai/sdxl-vae` - SDXL VAE
- `madebyollin/sdxl-vae-fp16-fix` - SDXL VAE FP16
Upscaler Models (5+ models)
Image super-resolution:
- `ai-forever/Real-ESRGAN:RealESRGAN_x2.pth` - 2x upscale
- `ai-forever/Real-ESRGAN:RealESRGAN_x4.pth` - 4x upscale
- `ai-forever/Real-ESRGAN:RealESRGAN_x8.pth` - 8x upscale
- `ximso/RealESRGAN_x4plus_anime_6B:RealESRGAN_x4plus_anime_6B.pth` - Anime 4x
- `stabilityai/stable-diffusion-x4-upscaler` - SD upscaler
- `stabilityai/sd-x2-latent-upscaler` - Latent upscaler
- `caidas/swin2SR-classical-sr-x2-64` - Swin2SR 2x
- `caidas/swin2SR-classical-sr-x4-64` - Swin2SR 4x
- `caidas/swin2SR-lightweight-x2-64` - Lightweight 2x
- `caidas/swin2SR-compressed-sr-x4-48` - Compressed 4x
- `caidas/swin2SR-realworld-sr-x4-64-bsrgan-psnr` - Real-world 4x
NLP & Text Models
Text Classification (2 models)
Sentiment and classification:
- `cardiffnlp/twitter-roberta-base-sentiment-latest` - Sentiment analysis
- `michellejieli/emotion_text_classifier` - Emotion classification
Zero-Shot Classification (7 models)
Flexible text classification:
- `facebook/bart-large-mnli` - BART MNLI
- `MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli` - DeBERTa NLI
- `MoritzLaurer/mDeBERTa-v3-base-mnli-xnli` - Multilingual DeBERTa
- `tasksource/ModernBERT-base-nli` - ModernBERT
- `cross-encoder/nli-deberta-v3-base` - Cross-encoder
- `microsoft/deberta-v2-xlarge-mnli` - DeBERTa XL
- `roberta-large-mnli` - RoBERTa MNLI
Question Answering (4 models)
Extractive QA:
- `distilbert-base-cased-distilled-squad` - DistilBERT SQuAD
- `distilbert-base-uncased-distilled-squad` - DistilBERT uncased
- `bert-large-uncased-whole-word-masking-finetuned-squad` - BERT large
- `deepset/roberta-base-squad2` - RoBERTa SQuAD 2.0
Table Question Answering (3 models)
Structured data QA:
- `google/tapas-base-finetuned-wtq` - TAPAS base
- `google/tapas-large-finetuned-wtq` - TAPAS large
- `microsoft/tapex-large-finetuned-tabfact` - TAPEX large
Fill Mask / MLM (5 models)
Masked language modeling:
- `bert-base-uncased` - BERT base uncased
- `bert-base-cased` - BERT base cased
- `roberta-base` - RoBERTa base
- `distilbert-base-uncased` - DistilBERT
- `albert-base-v2` - ALBERT v2
Text Generation / Summarization (4 models)
Text transformation:
- `Falconsai/text_summarization` - General summarization
- `Falconsai/medical_summarization` - Medical text
- `imvladikon/het5_summarization` - HE-T5 summarization
Text2Text Generation (4 models)
Sequence-to-sequence:
- `google-t5/t5-small` - T5 small
- `google-t5/t5-base` - T5 base
- `google-t5/t5-large` - T5 large
- `google/flan-t5-small`, `google/flan-t5-base`, `google/flan-t5-large` - FLAN-T5 variants
Translation (3 models)
Language translation models
Sentence Similarity / Embeddings (4 models)
Semantic embeddings:
- `sentence-transformers/all-MiniLM-L6-v2` - All-MiniLM
- `sentence-transformers/all-mpnet-base-v2` - All-MPNet
- `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` - Multilingual
- `mixedbread-ai/mxbai-embed-large-v1` - MixedBread embedding
- `BAAI/bge-base-en-v1.5`, `BAAI/bge-large-en-v1.5` - BGE embeddings
- `BAAI/bge-m3` - BGE-M3
Reranking (3 models)
Search result reranking:
- `BAAI/bge-reranker-base` - BGE reranker base
- `BAAI/bge-reranker-large` - BGE reranker large
- `BAAI/bge-reranker-v2-m3` - BGE reranker v2
Feature Extraction (3 models)
Dense retrieval:
- `facebook/contriever` - Contriever
- `gokaygokay/Flux-Prompt-Enhance` - Prompt enhancement
Model Type Summary
| Category | Count | Use Cases |
|---|---|---|
| MLX Models | 977 | Apple Silicon optimized inference |
| GGUF/llama.cpp | 300+ | Quantized efficient LLMs |
| HuggingFace Text Gen | 56 | General text generation |
| LoRA Adapters | 60+ | Style transfer for SD models |
| Ollama Models | 23 | Local privacy-first LLMs |
| FLUX Models | 20 | State-of-the-art image generation |
| Stable Diffusion | 20 | Classic image generation |
| Various Specialized | 200+ | Audio, vision, NLP tasks |
Total: 1,655+ models across all categories