
Baseten REST API

Deploy and scale ML models with serverless infrastructure

Baseten is a serverless platform for deploying, managing, and scaling machine learning models in production. It provides infrastructure for serving models with autoscaling, GPU support, and low-latency inference. Developers use Baseten to deploy models from popular frameworks like PyTorch, TensorFlow, and Hugging Face without managing servers or Kubernetes clusters.

Base URL: https://api.baseten.co/v1
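All endpoints live under this base URL and authenticate with an `Authorization: Api-Key` header, as shown in the curl example below. A minimal Python sketch of building such a request; the `build_request` helper name is illustrative, not part of any official SDK:

```python
import urllib.request

BASE_URL = "https://api.baseten.co/v1"

def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build an authenticated request against the Baseten REST API.

    Assumes the `Api-Key` authorization scheme shown in the curl example.
    """
    req = urllib.request.Request(f"{BASE_URL}{path}", method=method)
    req.add_header("Authorization", f"Api-Key {api_key}")
    req.add_header("Content-Type", "application/json")
    return req

# Example: a request that would list all models in the workspace
req = build_request("/models", "YOUR_API_KEY")
```

Sending it with `urllib.request.urlopen(req)` returns the JSON response body; only the request construction is shown here.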

API Endpoints

Method  Endpoint                                Description
POST    /models/{model_id}/deployments          Deploy a new version of a machine learning model
GET     /models/{model_id}/deployments          List all deployments for a specific model
GET     /deployments/{deployment_id}            Get detailed information about a specific deployment
POST    /models/{model_id}/predict              Run inference on a deployed model with input data
POST    /models/{model_id}/predict_async        Submit an asynchronous inference request for long-running predictions
GET     /predictions/{prediction_id}            Retrieve the status and results of an async prediction
GET     /models                                 List all models in your workspace
POST    /models                                 Create a new model in your workspace
PATCH   /deployments/{deployment_id}            Update deployment configuration, including scaling and hardware settings
DELETE  /deployments/{deployment_id}            Delete a model deployment and release its resources
GET     /deployments/{deployment_id}/logs       Retrieve logs from a specific deployment for debugging
GET     /deployments/{deployment_id}/metrics    Get performance metrics, including latency, throughput, and error rates
POST    /models/{model_id}/secrets              Add environment secrets for model deployment
GET     /workspace/usage                        Get workspace resource usage and billing information

Code Examples

curl -X POST https://api.baseten.co/v1/models/MODEL_ID/predict \
  -H 'Authorization: Api-Key YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "instances": [
      {"text": "Classify this sentiment", "max_length": 100}
    ]
  }'
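For long-running predictions, the async pair of endpoints (POST `/models/{model_id}/predict_async`, then GET `/predictions/{prediction_id}`) combine into a submit-and-poll flow. A hedged Python sketch of building those two requests; the payload mirrors the curl example above, but the response field names (`prediction_id`, `status`) are assumptions, not documented fields:

```python
import json
import urllib.request

BASE_URL = "https://api.baseten.co/v1"

def submit_async_prediction(model_id: str, api_key: str, instances: list) -> urllib.request.Request:
    """Build the POST request that submits an async inference job."""
    body = json.dumps({"instances": instances}).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/models/{model_id}/predict_async",
        data=body,
        method="POST",
    )
    req.add_header("Authorization", f"Api-Key {api_key}")
    req.add_header("Content-Type", "application/json")
    return req

def prediction_status_url(prediction_id: str) -> str:
    """URL to poll for the status and result of an async prediction."""
    return f"{BASE_URL}/predictions/{prediction_id}"

# Usage sketch (network calls omitted):
#   resp = urllib.request.urlopen(submit_async_prediction("MODEL_ID", key, [{"text": "..."}]))
#   pid = json.loads(resp.read())["prediction_id"]   # assumed response field
#   then poll GET prediction_status_url(pid) until the status is terminal
req = submit_async_prediction("MODEL_ID", "YOUR_API_KEY", [{"text": "Classify this sentiment"}])
```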

Connect Baseten to AI

Deploy a Baseten MCP server on IOX Cloud and connect it to Claude, ChatGPT, Cursor, or any AI client. Your AI assistant gets direct access to Baseten through these tools:

deploy_ml_model: Deploy a machine learning model to Baseten infrastructure with specified hardware and scaling configuration
run_inference: Execute model inference with input data and return predictions synchronously or asynchronously
monitor_deployment: Get real-time metrics and logs for deployed models, including latency, error rates, and resource utilization
manage_model_versions: List, update, or roll back model deployments across different versions
optimize_deployment_config: Analyze usage patterns and recommend scaling and hardware configurations that balance cost and performance

Deploy in 60 seconds

Describe what you need, AI generates the code, and IOX deploys it globally.

Deploy Baseten MCP Server →

Related APIs