Chutes
Chutes is a serverless compute platform designed for deploying, scaling, and running open-source artificial intelligence (AI) models. Developed by Rayon Labs, it operates on a decentralized, open-source infrastructure to provide AI inference and other computational services for developers and enterprises. [1] [2]
Overview
Chutes provides a platform for developers to access and utilize a wide range of open-source AI models without managing the underlying infrastructure. The service is built on a serverless architecture, meaning developers can execute code and run models on demand. The platform's infrastructure is decentralized, which is intended to enhance flexibility and scalability. Chutes states that its system is capable of processing trillions of tokens per month for its users. [1]
The core value proposition of the platform is to simplify access to high-performance AI computation. It maintains a library of popular and state-of-the-art (SOTA) open-source models that are kept "permanently hot," ensuring they are initialized and ready for immediate inference requests. This approach is designed to reduce latency and provide stable performance for applications that rely on these models. The platform also allows users to add and run their own custom models, providing flexibility for specialized use cases. [1]
Technology and Infrastructure
Chutes operates as a serverless compute layer for AI tasks. This model abstracts away server management, allowing developers to focus on their application code. The platform is responsible for allocating resources, scaling, and managing the execution environment for each job. [1]
The infrastructure is described as decentralized and open-source. This suggests a distributed network of compute resources rather than a centralized data center model. This design can contribute to resilience and potentially lower operational costs. The platform is engineered to handle various AI workloads, with a primary focus on model inference. [1]
A key technical feature is the "permanently hot" model system. By keeping frequently used AI models loaded and active, Chutes aims to minimize the cold-start delays often associated with serverless functions, making it suitable for real-time applications. The platform's team monitors for new open-source model releases and works to integrate them quickly, often making them available on the platform within a short time after their public release. [1]
Services and Features
Chutes offers a range of services centered around AI model execution and plans to expand its capabilities. [1]
AI Model Inference
The primary service is high-performance inference for a variety of AI models. Users can access these models via an API to integrate AI capabilities into their own applications. The platform provides analytics for monitoring usage and performance. [1]
Model Support
Chutes supports a diverse set of AI model types, allowing for a wide array of applications. The platform categorizes its supported models into several groups:
- Large Language Models (LLMs): For tasks involving text generation, summarization, and conversation.
- Image Generation: For creating images from text prompts (diffusion models).
- Video, Speech, and Music: For processing and generating multimedia content.
- Embedding Models: To convert data into numerical representations for search, recommendation, and classification tasks.
- Content Moderation: For detecting hate speech, NSFW content, and other undesirable material.
- 3D Generation: For creating 3D models and animations.
- Custom Models: Users can deploy their own open models on the platform.
This range of support indicates the platform's goal to be a comprehensive resource for various AI development needs. [1]
Planned Services
Chutes has announced several upcoming features to broaden its service offerings:
- Long Jobs: This feature is intended for long-running, asynchronous tasks such as batch processing, data analysis, and model training.
- TEE/Secure Compute: A planned service that will use Trusted Execution Environments (TEEs) to provide secure, private, and isolated compute environments for sensitive data and proprietary models.
- Consumer Applications: The company plans to release its own consumer-facing apps, named Chutes Chat and Chutes Studio.
These planned additions suggest a strategy to serve a wider range of computational needs, from individual developers to enterprise clients with security-sensitive workloads. [1]
Available Models and Integrations
Chutes provides access to numerous open-source models from various research labs and companies. The platform highlights its ability to quickly host new and popular models. Some of the model providers featured on the platform include DeepSeek, Mistral AI, Microsoft, Google (Gemma), Qwen (Alibaba), and Moonshot AI (Kimi). Specific models available include variants of DeepSeek V3, Mistral Small, and NousResearch's DeepHermes. [1] [2]
The company's website lists several projects and companies that use or integrate with its services, including:
- Bittensor
- OpenRouter
- Cline
- Kilo
- Roo Code
- Fetch.ai
- DeepFakeAI
These integrations demonstrate the platform's adoption within the AI and decentralized technology ecosystems. [1]
Development
Chutes is developed and operated by Rayon Labs. The platform's official X (formerly Twitter) account was created in November 2024, and it actively posts updates regarding new features, model availability, and pricing changes. [1] [2]
Pricing Model
Chutes utilizes a subscription-based pricing model with several tiers, supplemented by a pay-as-you-go (PAYG) option for usage that exceeds plan limits. The pricing structure was updated in August 2025 to provide fixed monthly plans. [2]
The available subscription tiers are:
- Base: A low-cost plan designed for individuals or small projects, offering up to 300 requests per day.
- Plus: A mid-tier plan that includes up to 2,000 requests per day and standard email support.
- Pro: A higher-tier plan providing up to 5,000 requests per day and priority support.
- Enterprise: A custom plan with unlimited requests, dedicated support, Service Level Agreement (SLA) guarantees, and custom billing options.
All paid tiers include unlimited API keys, access to all available models, and use of the Chutes Chat and Chutes Studio applications. The platform also offers a program for startups, providing up to $20,000 in free credits to eligible companies. [1]