
Managed Inference Job

Managed Inference runs open-source language models on CosmicAC-managed infrastructure. CosmicAC provisions the model server and manages authentication, request routing, and load balancing.


What "managed" means

Running a model for inference involves more than the model itself. It requires a server to handle requests, authenticate callers, and balance traffic across available capacity.

CosmicAC handles all of this. You send requests and receive responses; you do not configure or manage the infrastructure. This contrasts with a GPU Container job, where you control the full environment and are responsible for what runs inside it.


How requests are handled

CosmicAC authenticates every request using your API key and routes it to an available model server. If multiple servers are running the same model, requests are distributed across them. Streaming is supported: tokens can be delivered as they are generated, so you do not have to wait for the complete output.
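The flow described above can be pictured with a small conceptual model. This is an illustrative sketch only, not CosmicAC's implementation: the `Gateway` and `ModelServer` classes are hypothetical, the round-robin distribution strategy is an assumption (the actual load-balancing policy is not documented here), and the "model" simply echoes the prompt word by word to stand in for token streaming.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import cycle
from typing import Iterator


class Unauthorized(Exception):
    """Raised when a request carries no valid API key."""


@dataclass
class ModelServer:
    """Hypothetical stand-in for one model server."""
    name: str
    handled: int = 0  # number of requests routed to this server

    def generate(self, prompt: str) -> Iterator[str]:
        # Stand-in for a real model: yield the prompt one word at a time,
        # so the caller can consume output before it is complete.
        for token in prompt.split():
            yield token


class Gateway:
    """Hypothetical front door: authenticate first, then route."""

    def __init__(self, valid_keys: set[str], servers: list[ModelServer]):
        self._valid_keys = valid_keys
        # Assumption: round-robin. The real policy may differ.
        self._next_server = cycle(servers)

    def handle(self, api_key: str, prompt: str) -> Iterator[str]:
        # The key is checked before any server is selected.
        if api_key not in self._valid_keys:
            raise Unauthorized("invalid or missing API key")
        server = next(self._next_server)
        server.handled += 1
        return server.generate(prompt)


if __name__ == "__main__":
    servers = [ModelServer("server-a"), ModelServer("server-b")]
    gateway = Gateway({"key-123"}, servers)
    # Tokens are printed as they arrive rather than after the full response.
    for token in gateway.handle("key-123", "hello from the model"):
        print(token, end=" ", flush=True)
    print()
```

Because `handle` returns an iterator, the caller can process each token as soon as it is produced, which is the essential idea behind streaming responses.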

Authentication

Requests require a valid API key. CosmicAC rejects requests without a valid key before they reach the model.
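On the client side, attaching the key might look like the sketch below. Everything specific here is an assumption made for illustration: the endpoint URL, the `COSMICAC_API_KEY` environment variable name, the `Authorization: Bearer` header scheme, and the JSON body shape are all hypothetical, so check the CosmicAC API reference for the actual values. The code only builds the request and does not send it.

```python
from __future__ import annotations

import json
import os
import urllib.request

# Hypothetical endpoint; replace with the URL from the CosmicAC documentation.
ENDPOINT = "https://inference.example.invalid/v1/generate"


def build_request(prompt: str, api_key: str | None = None) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries the API key."""
    # Assumption: the key is read from an environment variable when not
    # passed explicitly. The variable name is hypothetical.
    key = api_key or os.environ.get("COSMICAC_API_KEY")
    if not key:
        # Fail locally instead of sending a request that would be rejected.
        raise ValueError("no API key provided")
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            # Assumption: bearer-token scheme; the real header may differ.
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Reading the key from the environment rather than hard-coding it keeps the secret out of source control.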

To create a key, see Get your Managed Inference API key.


