Baseten REST API
Category: Machine Learning Infrastructure
Authentication: API Key
Deploy and scale ML models with serverless infrastructure
Baseten is a serverless platform for deploying, managing, and scaling machine learning models in production. It provides infrastructure for serving models with autoscaling, GPU support, and low-latency inference. Developers use Baseten to deploy models built with popular frameworks and libraries like PyTorch, TensorFlow, and Hugging Face Transformers without managing servers or Kubernetes clusters.
Base URL
https://api.baseten.co/v1
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /models/{model_id}/deployments | Deploy a new version of a machine learning model |
| GET | /models/{model_id}/deployments | List all deployments for a specific model |
| GET | /deployments/{deployment_id} | Get detailed information about a specific deployment |
| POST | /models/{model_id}/predict | Run inference on a deployed model with input data |
| POST | /models/{model_id}/predict_async | Submit an asynchronous inference request for long-running predictions |
| GET | /predictions/{prediction_id} | Retrieve the status and results of an async prediction |
| GET | /models | List all models in your workspace |
| POST | /models | Create a new model in your workspace |
| PATCH | /deployments/{deployment_id} | Update deployment configuration including scaling and hardware settings |
| DELETE | /deployments/{deployment_id} | Delete a model deployment and release resources |
| GET | /deployments/{deployment_id}/logs | Retrieve logs from a specific deployment for debugging |
| GET | /deployments/{deployment_id}/metrics | Get performance metrics including latency, throughput, and error rates |
| POST | /models/{model_id}/secrets | Add environment secrets for model deployment |
| GET | /workspace/usage | Get workspace resource usage and billing information |
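For long-running predictions, the async pair of endpoints above (`POST /models/{model_id}/predict_async` and `GET /predictions/{prediction_id}`) forms a submit-then-poll loop. The sketch below shows that loop in Python with the HTTP calls injected as plain callables, so it works with any client library. The response field names (`prediction_id`, `status`) and terminal status values are assumptions for illustration, not confirmed by the table:

```python
import time

BASE_URL = "https://api.baseten.co/v1"


def predict_async(post, get, model_id, payload, poll_interval=2.0, max_polls=30):
    """Submit an async prediction and poll until it reaches a terminal state.

    `post` and `get` are injected HTTP callables (e.g. thin wrappers around
    requests.post / requests.get) that return parsed JSON dicts. The field
    names "prediction_id", "status", and the "SUCCEEDED"/"FAILED" values are
    illustrative assumptions about the response shape.
    """
    # Submit the long-running inference request.
    submitted = post(f"{BASE_URL}/models/{model_id}/predict_async", json=payload)
    prediction_id = submitted["prediction_id"]

    # Poll the predictions endpoint until the job finishes or we give up.
    for _ in range(max_polls):
        result = get(f"{BASE_URL}/predictions/{prediction_id}")
        if result["status"] in ("SUCCEEDED", "FAILED"):
            return result
        time.sleep(poll_interval)
    raise TimeoutError(f"prediction {prediction_id} did not finish in time")
```

Injecting the transport keeps the polling logic independent of any particular HTTP library and makes it easy to unit-test against canned responses.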
Code Examples
curl -X POST https://api.baseten.co/v1/models/MODEL_ID/predict \
  -H 'Authorization: Api-Key YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
        "instances": [
          {"text": "Classify this sentiment", "max_length": 100}
        ]
      }'
Connect Baseten to AI
Deploy a Baseten MCP server on IOX Cloud and connect it to Claude, ChatGPT, Cursor, or any AI client. Your AI assistant gets direct access to Baseten through these tools:
deploy_ml_model
Deploy a machine learning model to Baseten infrastructure with specified hardware and scaling configuration
run_inference
Execute model inference with input data and return predictions synchronously or asynchronously
monitor_deployment
Get real-time metrics and logs for deployed models including latency, error rates, and resource utilization
manage_model_versions
List, update, or roll back model deployments across different versions
optimize_deployment_config
Analyze usage patterns and recommend optimal scaling and hardware configurations for cost and performance
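Once the MCP server is deployed, point your AI client at it. The exact configuration shape varies by client; a hypothetical entry for a client that accepts remote MCP servers might look like the following (the server URL is a placeholder for your own IOX deployment, and the header name follows the API's `Api-Key` scheme):

```json
{
  "mcpServers": {
    "baseten": {
      "url": "https://your-iox-deployment.example.com/mcp",
      "headers": {
        "Authorization": "Api-Key YOUR_API_KEY"
      }
    }
  }
}
```

Consult your AI client's documentation for its specific MCP server configuration format.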
Deploy in 60 seconds
Describe what you need, AI generates the code, and IOX deploys it globally.
Deploy Baseten MCP Server →