# Bring Your Own Model

Developers with proprietary or fine-tuned models can deploy them on Cluster's compute infrastructure and serve inference through the same unified API gateway.

The process:

1. Upload the model weights and configuration
2. Cluster provisions GPU compute and loads the model
3. The model is assigned an endpoint on the inference API
4. Consumers call it via the same `/v1/chat/completions` endpoint
5. Billing is per-token, settled in the $CP

The developer's model benefits from Cluster's existing infrastructure: authentication, rate limiting, billing, x402 support, and SDK compatibility: without building any of it from scratch.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://cluster-protocol.gitbook.io/whitepaper/custom-model-hosting-and-fine-tuning/bring-your-own-model.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
