LLM Gateway

Category: AI & Agents

This page is generated from the Air Pipe marketplace. Browse it live to install into your organization.

A configurable LLM completion and content moderation endpoint. System prompt, model, and API key live in managed variables — swap providers or update prompts without touching the config or redeploying.

What's included

| File | Purpose |
| --- | --- |
| `config.yml` | AirPipe config with `docs: true` |

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| `POST` | `/ai/complete` | Send a message, receive a completion |
| `POST` | `/ai/moderate` | Check text against OpenAI's moderation API |

Setup

1. Get an OpenAI API key

2. Set managed variables

| Name | Example value |
| --- | --- |
| `OPENAI_API_KEY` | `sk-proj-...` |
| `SYSTEM_PROMPT` | `You are a helpful assistant.` |
| `LLM_MODEL` | `gpt-4o-mini` |

gpt-4o-mini is the recommended starting model — fast, cheap, and capable for most tasks.

Testing

BASE=https://your-airpipe-host

# Completion
curl -X POST $BASE/ai/complete \
  -H "Content-Type: application/json" \
  -d '{"message": "What is AirPipe in one sentence?", "max_tokens": 100}'

# Moderation check
curl -X POST $BASE/ai/moderate \
  -H "Content-Type: application/json" \
  -d '{"text": "Some user-generated content to check"}'
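The same two requests can also be issued from a script. The following is a minimal Python sketch using only the standard library. The host is the placeholder from the curl example, and `build_request` and `post_json` are illustrative helper names, not part of AirPipe.

```python
import json
import urllib.request

BASE = "https://your-airpipe-host"  # placeholder host, as in the curl example


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for one of the gateway endpoints."""
    return urllib.request.Request(
        url=f"{BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload)) as resp:
        return json.load(resp)


# Requests are only constructed here; nothing is sent until post_json is called.
completion_req = build_request(
    "/ai/complete",
    {"message": "What is AirPipe in one sentence?", "max_tokens": 100},
)
moderation_req = build_request(
    "/ai/moderate",
    {"text": "Some user-generated content to check"},
)
```

Against a real deployment, set `BASE` to your host and call `post_json("/ai/complete", {"message": "..."})`.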

Response shape

`/ai/complete`

The response mirrors the OpenAI API directly:

{
  "id": "chatcmpl-...",
  "choices": [
    {
      "message": { "role": "assistant", "content": "AirPipe is..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 18,
    "total_tokens": 42
  }
}

Extract the completion text from choices[0].message.content.
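As a rough illustration, a client could pull the text out as follows. This is a sketch that assumes the response has already been decoded into a Python dict. `extract_completion` is a hypothetical helper name, and the sample mirrors the response shown above.

```python
def extract_completion(response: dict) -> str:
    """Return the assistant text from an OpenAI-style completion response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Sample response, copied from the shape documented above.
sample = {
    "id": "chatcmpl-...",
    "choices": [
        {
            "message": {"role": "assistant", "content": "AirPipe is..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 24, "completion_tokens": 18, "total_tokens": 42},
}

text = extract_completion(sample)
# OpenAI reports finish_reason "length" when the output hit the token limit.
was_truncated = sample["choices"][0]["finish_reason"] == "length"
tokens_used = sample["usage"]["total_tokens"]
```

Checking `finish_reason` is worthwhile when the request sets a low max_tokens, because a truncated answer otherwise looks like a complete one.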

`/ai/moderate`

{
  "results": [
    {
      "flagged": false,
      "categories": { "hate": false, "violence": false, ... },
      "category_scores": { "hate": 0.0001, "violence": 0.0003, ... }
    }
  ]
}

Check results[0].flagged for a simple pass/fail.
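A guard built on this response might look like the sketch below. It assumes the response is already a Python dict. The helper names and the stricter custom threshold are illustrative assumptions, not something the gateway provides.

```python
def is_flagged(response: dict) -> bool:
    """Simple pass/fail: True when OpenAI flagged the text."""
    return bool(response["results"][0]["flagged"])


def categories_over(response: dict, threshold: float) -> list:
    """List category names whose score meets a caller-chosen threshold."""
    scores = response["results"][0]["category_scores"]
    return sorted(name for name, score in scores.items() if score >= threshold)


# Sample response using the two categories shown above.
sample = {
    "results": [
        {
            "flagged": False,
            "categories": {"hate": False, "violence": False},
            "category_scores": {"hate": 0.0001, "violence": 0.0003},
        }
    ]
}

allowed = not is_flagged(sample)
# Hypothetical stricter policy: review anything scoring 0.0002 or higher.
needs_review = categories_over(sample, threshold=0.0002)
```

The `flagged` field is enough for most uses; the score-based check only matters if you want a policy stricter or looser than OpenAI's own cut-off.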

Changing the system prompt

Update the SYSTEM_PROMPT managed variable in the AirPipe dashboard. The next request will use the new prompt immediately — no redeploy needed. This makes it straightforward to:

- Tune the assistant's persona or knowledge scope
- A/B test different system prompts by maintaining two deployments with different variables
- Run environment-specific prompts (stricter in production, more permissive in staging)
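For the A/B case, the split between the two deployments has to happen on the caller's side. One possible approach, sketched below, is to hash a stable user identifier so each user always reaches the same deployment. The two hostnames and the helper names are hypothetical.

```python
import hashlib

# Hypothetical hosts for two deployments that differ only in SYSTEM_PROMPT.
DEPLOYMENTS = {
    "A": "https://gateway-a.example.com",
    "B": "https://gateway-b.example.com",
}


def pick_variant(user_id: str) -> str:
    """Deterministically assign a user to variant A or B."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def base_url_for(user_id: str) -> str:
    """Return the deployment host that should serve this user."""
    return DEPLOYMENTS[pick_variant(user_id)]


url = base_url_for("user-42") + "/ai/complete"
```

Hashing instead of picking at random keeps a user's experience consistent across requests, which makes the two prompts easier to compare.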

Using Anthropic instead of OpenAI

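Add an ANTHROPIC_API_KEY managed variable, set LLM_MODEL to an Anthropic model name, and update the CallLLM action's URL, headers, and body. Anthropic's Messages API takes the system prompt as a top-level system field and requires max_tokens, so the example sets a fixed value: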

- name: CallLLM
  http:
    url: https://api.anthropic.com/v1/messages
    method: POST
    headers:
      content-type: application/json
      x-api-key: "a|ap_var::ANTHROPIC_API_KEY|"
      anthropic-version: "2023-06-01"
    body: |
      {
        "model": a|double_quote(ap_var::LLM_MODEL)|,
        "max_tokens": 1024,
        "system": a|double_quote(ap_var::SYSTEM_PROMPT)|,
        "messages": [
          { "role": "user", "content": a|double_quote(ValidateBody::message)| }
        ]
      }

The completion text is then at content[0].text instead of choices[0].message.content.
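A client that has to work with either provider could branch on the response shape. The sketch below assumes the gateway passes the provider's body through unchanged, as it does for OpenAI. `completion_text` is a hypothetical helper name.

```python
def completion_text(response: dict) -> str:
    """Return the completion text from an OpenAI- or Anthropic-style body."""
    if "choices" in response:
        # OpenAI chat completions shape
        return response["choices"][0]["message"]["content"]
    if "content" in response:
        # Anthropic Messages shape
        return response["content"][0]["text"]
    raise ValueError("unrecognised response shape")


openai_style = {
    "choices": [{"message": {"role": "assistant", "content": "AirPipe is..."}}]
}
anthropic_style = {
    "role": "assistant",
    "content": [{"type": "text", "text": "AirPipe is..."}],
}

from_openai = completion_text(openai_style)
from_anthropic = completion_text(anthropic_style)
```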

Building a RAG pipeline

Add a vector search step before CallLLM to retrieve relevant context and inject it into the prompt:

- name: RetrieveContext
  run_when_succeeded: [ValidateBody]
  database: main
  query: |
    SELECT content FROM documents
    ORDER BY embedding <-> $1::vector
    LIMIT 3;
  params:
    - a|ValidateBody::embedding|

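Then compose the enriched prompt in the CallLLM body using the retrieved context. The query assumes a PostgreSQL database with the pgvector extension (<-> is its distance operator) and a request body that carries a precomputed embedding of the message.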
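The composition step itself is simple string assembly. The Python sketch below shows only the logic, not AirPipe syntax. It assumes the retrieved rows arrive as dicts with a content key, matching the SELECT above, and the prompt wording and the `compose_prompt` name are illustrative.

```python
def compose_prompt(question: str, rows: list) -> str:
    """Combine retrieved passages and the user question into one prompt."""
    # Number each passage so the model can refer back to it.
    context = "\n\n".join(
        f"[{i}] {row['content']}" for i, row in enumerate(rows, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical rows, shaped like the result of the RetrieveContext query.
rows = [
    {"content": "AirPipe configs are written in YAML."},
    {"content": "Managed variables can be changed without redeploying."},
]
prompt = compose_prompt("How do I change the system prompt?", rows)
```

Whatever form the composition takes in the config, the composed text still needs to pass through double_quote() before it is placed in the JSON body.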

Notes

- double_quote() is an AirPipe interpolation wrapper that escapes the resolved value and wraps it in double quotes, so it is safe to use directly in JSON string positions.
- The moderation endpoint uses OpenAI's free moderation API, which does not count against your token quota.
- If max_tokens is omitted from the request, it is sent to OpenAI as null, which lets the model use its default maximum.
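To see why the escaping matters, compare naive string concatenation with proper JSON encoding. In the sketch below, Python's `json.dumps` stands in for double_quote(). It is an analogy for the behaviour described above, not AirPipe's implementation.

```python
import json


def naive_body(message: str) -> str:
    """Paste the raw value between quotes (breaks on quotes and newlines)."""
    return '{"content": "' + message + '"}'


def escaped_body(message: str) -> str:
    """Escape and quote the value first, as double_quote() is described to do."""
    return '{"content": ' + json.dumps(message) + "}"


message = 'She said "hi"\nthen left'

try:
    json.loads(naive_body(message))
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

parsed = json.loads(escaped_body(message))

# An omitted max_tokens is encoded as JSON null rather than a quoted string.
null_body = '{"max_tokens": ' + json.dumps(None) + "}"
```

User messages routinely contain quotes and line breaks, so any value interpolated into a JSON string position needs this treatment.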

Configuration

config.yml

name: LlmGateway
description: >
  A configurable LLM completion endpoint. System prompt, model, and API key
  are managed variables — swap providers or update prompts without redeploying.

docs: true

# Required managed variables:
#   OPENAI_API_KEY  — OpenAI API key (or your provider's equivalent)
#   SYSTEM_PROMPT   — System prompt sent to the model on every request
#   LLM_MODEL       — Model name, e.g. gpt-4o-mini or gpt-4o

interfaces:

  # POST /ai/complete
  # Send a message and receive a completion.
  # Body: { "message": "Summarise this text: ...", "max_tokens": 500 }
  ai/complete:
    output: http
    method: POST
    summary: LLM completion
    description: >
      Send a message to the configured LLM. The system prompt and model are
      set via managed variables — no redeploy needed to change them.
    tags: [ai]
    request_example:
      message: "Explain what an API is in one sentence"
      max_tokens: 200
    notes: |
      Response shape mirrors the OpenAI API: `choices[0].message.content` contains
      the completion text. `usage` contains token counts.

    actions:
      - name: ValidateBody
        input: a|body|
        hide_data_on_success: true
        assert:
          http_code_on_error: 400
          tests:
            - value: message
              is_not_null: true
              is_not_empty: true
              description: "The user message to send to the model"

      - name: CallLLM
        run_when_succeeded:
          actions: [ValidateBody]
          http_code_on_error: 400
        http:
          url: https://api.openai.com/v1/chat/completions
          method: POST
          headers:
            content-type: application/json
            authorization: "Bearer a|ap_var::OPENAI_API_KEY|"
          body: |
            {
              "model": a|double_quote(ap_var::LLM_MODEL)|,
              "max_tokens": a|ValidateBody::max_tokens|,
              "messages": [
                { "role": "system", "content": a|double_quote(ap_var::SYSTEM_PROMPT)| },
                { "role": "user",   "content": a|double_quote(ValidateBody::message)|  }
              ]
            }
        assert:
          http_code_on_error: 502
          error_message: "LLM provider error"
          tests:
            - value: status
              is_equal_to: 200

      - name: ExtractResponse
        run_when_succeeded: [CallLLM]
        input: a|CallLLM|
        post_transforms:
          - extract_value: body

      - name: TrackCompletion
        run_when_succeeded: [ExtractResponse]
        hide_data_on_success: true
        emit_metric:
          name: app_llm_completions_total
          type: counter
          labels:
            model: a|ap_var::LLM_MODEL|

  # POST /ai/moderate
  # Check whether a piece of text violates OpenAI's content policy.
  # Useful as a guard before storing user-generated content.
  # Body: { "text": "..." }
  ai/moderate:
    output: http
    method: POST
    summary: Content moderation
    description: >
      Check text against OpenAI's moderation API. Returns `flagged: true` if
      the content violates usage policies, along with per-category scores.
    tags: [ai]
    request_example:
      text: "Some user-generated content to check"
    notes: |
      Uses the OpenAI Moderation API which is free regardless of your plan.

    actions:
      - name: ValidateBody
        input: a|body|
        hide_data_on_success: true
        assert:
          http_code_on_error: 400
          tests:
            - value: text
              is_not_null: true
              is_not_empty: true
              description: "Text to moderate"

      - name: CallModeration
        run_when_succeeded:
          actions: [ValidateBody]
          http_code_on_error: 400
        http:
          url: https://api.openai.com/v1/moderations
          method: POST
          headers:
            content-type: application/json
            authorization: "Bearer a|ap_var::OPENAI_API_KEY|"
          body: |
            {
              "input": a|double_quote(ValidateBody::text)|
            }
        assert:
          http_code_on_error: 502
          error_message: "Moderation API error"
          tests:
            - value: status
              is_equal_to: 200

      - name: ExtractResult
        run_when_succeeded: [CallModeration]
        input: a|CallModeration|
        post_transforms:
          - extract_value: body

      - name: TrackModeration
        run_when_succeeded: [ExtractResult]
        hide_data_on_success: true
        emit_metric:
          name: app_llm_moderations_total
          type: counter
