# Multi-Modal APIs

Beyond text completions, Cluster serves five other modalities through the same gateway:

**Embeddings** (<mark style="color:$warning;">`POST /v1/embeddings`</mark>) Convert text into dense vector representations for semantic search, clustering, and retrieval-augmented generation (RAG). Supports multiple embedding models with configurable dimensions.
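As a rough illustration of the semantic-search use case, the sketch below builds an embeddings request body and ranks documents by cosine similarity. The field names (`model`, `input`, `dimensions`), the response shape (`data[i].embedding`), and the model name `example-embedding-model` are assumptions modeled on the common OpenAI-style convention that the path suggests; they are not confirmed by this page. It runs against a mock response and makes no network call.

```python
import json
import math

def build_embeddings_body(texts, model="example-embedding-model", dimensions=None):
    """Build a JSON body for POST /v1/embeddings (field names are assumed)."""
    body = {"model": model, "input": list(texts)}
    if dimensions is not None:
        body["dimensions"] = dimensions  # assumed name for the configurable size
    return json.dumps(body)

def extract_vectors(response):
    """Read vectors from an assumed shape: {"data": [{"index": i, "embedding": [...]}]}."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vec, doc_vecs):
    """Return the position of the document vector closest to the query."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return max(range(len(scores)), key=scores.__getitem__)
```

In a retrieval pipeline, the query and the documents would each be embedded once, and `most_similar` (or a vector index playing the same role) would select the context to pass on.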

**Image Generation** (<mark style="color:$warning;">`POST /v1/images/generations`</mark>) Generate images from text prompts using open-source diffusion models. Returns URLs or base64-encoded images.
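Because the endpoint can return either a URL or base64 data, client code typically has to handle both. The helper below is a minimal sketch of that branch; the per-image keys `url` and `b64_json` are assumed from the OpenAI-style convention and may differ in Cluster's actual response.

```python
import base64

def image_bytes_or_url(item):
    """Return ("bytes", data) or ("url", link) for one image entry.

    The keys "b64_json" and "url" are assumptions, not confirmed field names.
    """
    if item.get("b64_json"):
        return "bytes", base64.b64decode(item["b64_json"])
    if item.get("url"):
        return "url", item["url"]
    raise ValueError("image entry has neither 'b64_json' nor 'url'")
```

A caller would write the bytes straight to disk in the first case and issue a follow-up download in the second.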

**Text-to-Speech** (<mark style="color:$warning;">`POST /v1/audio/speech`</mark>) Convert text to natural-sounding audio across multiple voices and languages.
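A speech request might look like the following sketch, which uses only the Python standard library. The base URL `https://api.example.invalid` is a placeholder, and the body fields (`model`, `input`, `voice`), the model name, and the `Bearer` authorization scheme are all assumptions rather than documented behavior. Building the request is separate from sending it, so the first function can be inspected without network access.

```python
import json
import urllib.request

BASE_URL = "https://api.example.invalid"  # placeholder, not the real gateway host

def build_speech_request(text, voice, api_key, model="example-tts-model"):
    """Prepare (but do not send) a POST /v1/audio/speech request."""
    # Field names below are assumed, following the OpenAI-style convention.
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/audio/speech",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )

def save_audio(request, path):
    """Send the request and write the raw audio bytes to a file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```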

**Speech-to-Text** (<mark style="color:$warning;">`POST /v1/audio/transcriptions`</mark>) Transcribe audio files into text. Supports multiple audio formats and languages.
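Audio uploads are usually sent as `multipart/form-data` rather than JSON. The sketch below encodes such a body by hand with the standard library; the form field names (`file`, plus whatever is passed in `fields`, such as `model` or `language`) are assumed and should be checked against the actual endpoint.

```python
import uuid

def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one file part as multipart/form-data.

    Returns (body_bytes, content_type_header). The "file" part name is assumed.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode("utf-8"))
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8"))
    parts.append(file_bytes)  # raw audio bytes, sent unmodified
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type would go in the `Content-Type` header of a `POST /v1/audio/transcriptions` request, with the returned bytes as the body.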

**Document Reranking** (<mark style="color:$warning;">`POST /v1/rerank`</mark>) Reorder a set of documents by relevance to a query. Used in RAG pipelines to improve retrieval quality before sending context to the language model.
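To show where reranking sits in a RAG pipeline, the sketch below applies a rerank response to the original document list and keeps the top results. The response shape (`results` entries with `index` and `relevance_score`) is an assumption borrowed from other rerank APIs; this page does not specify it.

```python
def apply_rerank(documents, response, top_n=None):
    """Reorder documents using an assumed rerank response.

    Assumed shape: {"results": [{"index": int, "relevance_score": float}, ...]},
    where "index" points into the documents list that was submitted.
    """
    ranked = sorted(
        response["results"],
        key=lambda result: result["relevance_score"],
        reverse=True,  # most relevant first
    )
    if top_n is not None:
        ranked = ranked[:top_n]
    return [documents[result["index"]] for result in ranked]
```

Only the documents returned by `apply_rerank` would then be placed in the language model's context.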

All modalities are served through the same authentication, billing, and settlement infrastructure. A single API key and a single balance cover everything.
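One way to picture the shared-key model is a thin client that attaches the same credential to every endpoint listed above. This is a sketch only: `GatewayClient`, the placeholder base URL, and the `Bearer` header scheme are hypothetical, and requests are prepared but not sent.

```python
import urllib.request

# Paths are taken from this page; the dictionary keys are illustrative labels.
ENDPOINTS = {
    "embeddings": "/v1/embeddings",
    "images": "/v1/images/generations",
    "speech": "/v1/audio/speech",
    "transcriptions": "/v1/audio/transcriptions",
    "rerank": "/v1/rerank",
}

class GatewayClient:
    """Hypothetical client that reuses one API key across all modalities."""

    def __init__(self, api_key, base_url="https://api.example.invalid"):
        self.api_key = api_key
        self.base_url = base_url.rstrip("/")  # placeholder host, not the real one

    def request(self, modality, body, content_type="application/json"):
        """Prepare a POST request for the given modality without sending it."""
        return urllib.request.Request(
            self.base_url + ENDPOINTS[modality],
            data=body,
            method="POST",
            headers={
                "Authorization": f"Bearer {self.api_key}",  # assumed auth scheme
                "Content-Type": content_type,
            },
        )
```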

