Baseten REST API
Deploy and scale ML models with serverless infrastructure
Baseten is a serverless platform for deploying, managing, and scaling machine learning models in production. It provides infrastructure for serving models with autoscaling, GPU support, and low-latency inference. Developers use Baseten to deploy models from popular frameworks like PyTorch, TensorFlow, and Hugging Face without managing servers or Kubernetes clusters.
Base URL: https://api.baseten.co/v1
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /models/{model_id}/deployments | Deploy a new version of a machine learning model |
| GET | /models/{model_id}/deployments | List all deployments for a specific model |
| GET | /deployments/{deployment_id} | Get detailed information about a specific deployment |
| POST | /models/{model_id}/predict | Run inference on a deployed model with input data |
| POST | /models/{model_id}/predict_async | Submit an asynchronous inference request for long-running predictions |
| GET | /predictions/{prediction_id} | Retrieve the status and results of an async prediction |
| GET | /models | List all models in your workspace |
| POST | /models | Create a new model in your workspace |
| PATCH | /deployments/{deployment_id} | Update deployment configuration including scaling and hardware settings |
| DELETE | /deployments/{deployment_id} | Delete a model deployment and release resources |
| GET | /deployments/{deployment_id}/logs | Retrieve logs from a specific deployment for debugging |
| GET | /deployments/{deployment_id}/metrics | Get performance metrics including latency, throughput, and error rates |
| POST | /models/{model_id}/secrets | Add environment secrets for model deployment |
| GET | /workspace/usage | Get workspace resource usage and billing information |
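As a rough illustration of how the endpoints in the table might be called from code, the sketch below builds requests with Python's standard library. The paths and the `Api-Key` authorization scheme are taken from this page; the `build_request` and `send` helpers are illustrative names, and the `PATCH` body fields (`min_replicas`, `max_replicas`) are assumptions, since this page does not document request schemas.

```python
import json
import urllib.request

BASE_URL = "https://api.baseten.co/v1"  # base URL as listed on this page


def build_request(method, path, api_key, body=None):
    """Build (but do not send) a request for one of the endpoints in the table."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Api-Key {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# GET /models: list all models in the workspace
list_models = build_request("GET", "/models", "YOUR_API_KEY")

# PATCH /deployments/{deployment_id}: update scaling settings.
# ASSUMPTION: the body field names below are hypothetical placeholders.
update = build_request(
    "PATCH",
    "/deployments/DEPLOYMENT_ID",
    "YOUR_API_KEY",
    body={"min_replicas": 1, "max_replicas": 3},
)
```

Calling `send(list_models)` would perform the actual request. Keeping request construction separate from sending makes it easy to inspect the URL, method, and headers before anything goes over the network.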
Code Examples
curl -X POST https://api.baseten.co/v1/models/MODEL_ID/predict \
  -H 'Authorization: Api-Key YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "instances": [
      {"text": "Classify this sentiment", "max_length": 100}
    ]
  }'
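The same call can be sketched in Python, along with the asynchronous flow the endpoint table describes (`/predict_async` followed by `GET /predictions/{prediction_id}`). This is a minimal sketch under stated assumptions: the `instances` payload mirrors the curl example, while the `status` field and its `queued`/`running` values are hypothetical, because this page does not show the response schema. All function names are illustrative.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.baseten.co/v1"  # base URL as listed on this page


def predict_request(model_id, api_key, instances, asynchronous=False):
    """Build the POST request for /predict or /predict_async."""
    suffix = "predict_async" if asynchronous else "predict"
    req = urllib.request.Request(
        f"{BASE_URL}/models/{model_id}/{suffix}",
        data=json.dumps({"instances": instances}).encode("utf-8"),
        method="POST",
    )
    req.add_header("Authorization", f"Api-Key {api_key}")
    req.add_header("Content-Type", "application/json")
    return req


def fetch_prediction(prediction_id, api_key):
    """GET /predictions/{prediction_id} and decode the JSON body."""
    req = urllib.request.Request(f"{BASE_URL}/predictions/{prediction_id}")
    req.add_header("Authorization", f"Api-Key {api_key}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_prediction(fetch, interval=2.0, max_attempts=30):
    """Poll fetch() until the result is no longer pending."""
    for _ in range(max_attempts):
        result = fetch()
        # ASSUMPTION: responses carry a "status" field with these pending values.
        if result.get("status") not in ("queued", "running"):
            return result
        time.sleep(interval)
    raise TimeoutError("prediction did not finish in time")
```

A caller might poll with `wait_for_prediction(lambda: fetch_prediction("PREDICTION_ID", "YOUR_API_KEY"))`. Passing the fetch step in as a callable keeps the polling loop independent of the HTTP details, so it can be exercised with canned responses.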
Use Baseten from Claude / Cursor / ChatGPT
Get a hosted MCP endpoint for Baseten. Paste your Baseten API key, copy back one URL, drop it into Claude Desktop, Cursor, or any AI client that supports remote MCP. Your AI calls Baseten directly with your credentials — no local install, works on mobile.
deploy_ml_model
Deploy a machine learning model to Baseten infrastructure with specified hardware and scaling configuration
run_inference
Execute model inference with input data and return predictions synchronously or asynchronously
monitor_deployment
Get real-time metrics and logs for deployed models including latency, error rates, and resource utilization
manage_model_versions
List, update, or roll back model deployments across different versions
optimize_deployment_config
Analyze usage patterns and recommend optimal scaling and hardware configurations for cost and performance
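For context, an MCP client invokes tools like these by sending a JSON-RPC 2.0 `tools/call` message to the server. The sketch below builds such a message for `run_inference`. The envelope follows the Model Context Protocol; the argument names (`model_id`, `instances`) are hypothetical, since this page does not publish the tools' input schemas, and `tool_call` is an illustrative helper.

```python
import json


def tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# ASSUMPTION: the argument names below are placeholders, not a documented schema.
message = tool_call(
    1,
    "run_inference",
    {"model_id": "MODEL_ID", "instances": [{"text": "Classify this sentiment"}]},
)
print(json.dumps(message, indent=2))
```

In practice the AI client constructs and sends these messages itself; the sketch only shows what travels over the connection.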
Connect in 60 seconds
Paste your Baseten key → get an MCP URL → paste into Claude/Cursor. Hosted by IOX, encrypted at rest.