Category: Machine Learning Infrastructure | Authentication: API Key

Baseten REST API

Deploy and scale ML models with serverless infrastructure

Baseten is a serverless platform for deploying, managing, and scaling machine learning models in production. It provides infrastructure for serving models with autoscaling, GPU support, and low-latency inference. Developers use Baseten to deploy models from popular frameworks like PyTorch, TensorFlow, and Hugging Face without managing servers or Kubernetes clusters.

Base URL: https://api.baseten.co/v1

API Endpoints

Method	Endpoint	Description
POST	`/models/{model_id}/deployments`	Deploy a new version of a machine learning model
GET	`/models/{model_id}/deployments`	List all deployments for a specific model
GET	`/deployments/{deployment_id}`	Get detailed information about a specific deployment
POST	`/models/{model_id}/predict`	Run inference on a deployed model with input data
POST	`/models/{model_id}/predict_async`	Submit an asynchronous inference request for long-running predictions
GET	`/predictions/{prediction_id}`	Retrieve the status and results of an async prediction
GET	`/models`	List all models in your workspace
POST	`/models`	Create a new model in your workspace
PATCH	`/deployments/{deployment_id}`	Update deployment configuration including scaling and hardware settings
DELETE	`/deployments/{deployment_id}`	Delete a model deployment and release resources
GET	`/deployments/{deployment_id}/logs`	Retrieve logs from a specific deployment for debugging
GET	`/deployments/{deployment_id}/metrics`	Get performance metrics including latency, throughput, and error rates
POST	`/models/{model_id}/secrets`	Add environment secrets for model deployment
GET	`/workspace/usage`	Get workspace resource usage and billing information
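As a rough illustration of how the path templates in the table combine with the base URL, the sketch below fills in a template and builds the request headers. The helper names `build_url` and `auth_headers` are hypothetical and not part of any Baseten SDK; the `Api-Key` authorization scheme is the one used in the code examples on this page.

```python
from urllib.parse import quote

BASE_URL = "https://api.baseten.co/v1"  # base URL as listed on this page


def build_url(path_template: str, **params: str) -> str:
    """Fill a template such as '/models/{model_id}/predict' and prefix the base URL.

    Hypothetical helper for illustration only.
    """
    # Percent-encode each value so an unexpected character cannot alter the path.
    safe = {name: quote(str(value), safe="") for name, value in params.items()}
    return BASE_URL + path_template.format(**safe)


def auth_headers(api_key: str) -> dict:
    """Return the headers used by the examples on this page (Api-Key scheme)."""
    return {
        "Authorization": f"Api-Key {api_key}",
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    # 'abc123' is a placeholder deployment ID, not a real one.
    print(build_url("/deployments/{deployment_id}/logs", deployment_id="abc123"))
```

Keeping the templates as plain strings means each row of the table can be passed to `build_url` unchanged, with the placeholder names becoming keyword arguments.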

Code Examples

cURL

curl -X POST https://api.baseten.co/v1/models/MODEL_ID/predict \
  -H 'Authorization: Api-Key YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "instances": [
      {"text": "Classify this sentiment", "max_length": 100}
    ]
  }'

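JavaScript

// Send a synchronous prediction request (top-level await requires an ES module).
const response = await fetch('https://api.baseten.co/v1/models/MODEL_ID/predict', {
  method: 'POST',
  headers: {
    'Authorization': 'Api-Key YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    instances: [
      { text: 'Classify this sentiment', max_length: 100 }
    ]
  })
});

// fetch does not reject on HTTP error statuses, so check explicitly.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const result = await response.json();
console.log(result.predictions);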

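Python

import requests

url = 'https://api.baseten.co/v1/models/MODEL_ID/predict'
headers = {
    'Authorization': 'Api-Key YOUR_API_KEY',
    'Content-Type': 'application/json'
}
data = {
    'instances': [
        {'text': 'Classify this sentiment', 'max_length': 100}
    ]
}

# The timeout keeps the call from hanging indefinitely on a stalled connection.
response = requests.post(url, headers=headers, json=data, timeout=30)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()
print(result['predictions'])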
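The examples above cover only the synchronous `/predict` endpoint. For the asynchronous pair in the endpoint table (`/models/{model_id}/predict_async` and `/predictions/{prediction_id}`), a submit-then-poll loop might look like the sketch below. The response field names (`prediction_id`, `status`) and the terminal status values are assumptions, since this page does not document the response schema, and `run_async_prediction` is a hypothetical helper. The HTTP calls are passed in as functions so the polling logic can be exercised without network access.

```python
import time
from typing import Any, Callable, Dict

BASE_URL = "https://api.baseten.co/v1"  # base URL as listed on this page

# Assumed field names and status values; verify against the real response schema.
ID_FIELD = "prediction_id"
STATUS_FIELD = "status"
DONE_STATES = {"completed", "failed"}


def run_async_prediction(
    post: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    get: Callable[[str], Dict[str, Any]],
    model_id: str,
    payload: Dict[str, Any],
    interval: float = 2.0,
    max_polls: int = 30,
) -> Dict[str, Any]:
    """Submit to /predict_async, then poll /predictions/{id} until a terminal state."""
    submitted = post(f"{BASE_URL}/models/{model_id}/predict_async", payload)
    prediction_url = f"{BASE_URL}/predictions/{submitted[ID_FIELD]}"
    for _ in range(max_polls):
        result = get(prediction_url)
        if result.get(STATUS_FIELD) in DONE_STATES:
            return result
        time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"prediction not finished after {max_polls} polls")
```

With the `requests` library, `post` could be `lambda url, body: requests.post(url, headers=headers, json=body, timeout=30).json()` and `get` could be `lambda url: requests.get(url, headers=headers, timeout=30).json()`, where `headers` carries the `Api-Key` authorization value.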

Use Baseten from Claude / Cursor / ChatGPT

Get a hosted MCP endpoint for Baseten: paste your Baseten API key, copy the generated URL, and add it to Claude Desktop, Cursor, or any AI client that supports remote MCP. The client then calls Baseten directly with your credentials. Nothing is installed locally, and it also works on mobile.

deploy_ml_model: Deploy a machine learning model to Baseten infrastructure with a specified hardware and scaling configuration.

run_inference: Execute model inference on input data and return predictions synchronously or asynchronously.

monitor_deployment: Get real-time metrics and logs for deployed models, including latency, error rates, and resource utilization.

manage_model_versions: List, update, or roll back model deployments across versions.

optimize_deployment_config: Analyze usage patterns and recommend scaling and hardware configurations that balance cost and performance.

Connect in 60 seconds

Paste your Baseten key → get an MCP URL → paste into Claude/Cursor. Hosted by IOX, encrypted at rest.

