# 500+ Models, One API

The inference API follows the chat completions format, the common industry standard. Any application built for the OpenAI API can switch to Cluster by changing the base URL and API key; no other code changes are required.

`POST /v1/chat/completions`\
`Authorization: Bearer `

`{`\
`"model": "meta-llama/llama-3.1-70b-instruct",`\
`"messages": [`\
`{"role": "user", "content": "Explain how tokenized datasets work."}`\
`]`\
`}`

The model catalog includes families from Meta (Llama), Mistral AI, DeepSeek, Qwen, Google (Gemma), and others, with sizes ranging from 7B to 405B parameters. Models are categorized by task type: general chat, code generation, reasoning, instruction following, and domain-specific applications.

Billing is per token, with separate rates for input and output tokens. There are no subscription tiers or minimum commitments.
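The request format and per-token billing described on this page can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the official client: the base URL, the API key, and the token rates are hypothetical placeholders (the page does not state real values), and the helper names `build_chat_request` and `estimate_cost` are invented for this example. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder values: the documentation does not state the real base URL.
BASE_URL = "https://api.example.invalid"  # hypothetical base URL
API_KEY = "YOUR_API_KEY"                  # hypothetical API key


def build_chat_request(base_url, api_key, model, user_message):
    """Build (but do not send) a POST request to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def estimate_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Estimate cost with separate input and output rates.

    Rates are assumed to be quoted per one million tokens; that unit is an
    assumption of this sketch, not something the documentation specifies.
    """
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


request = build_chat_request(
    BASE_URL,
    API_KEY,
    "meta-llama/llama-3.1-70b-instruct",
    "Explain how tokenized datasets work.",
)
# To send it, pass `request` to urllib.request.urlopen and JSON-decode the body.
```

Switching an existing OpenAI-style integration would then amount to changing `BASE_URL` and `API_KEY`. If the response follows the OpenAI chat completions shape, the reply text would be found at `choices[0].message.content`, though this page does not show a response body.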