Request transformations
Use LLM request transformations to dynamically compute and set fields in LLM requests using Common Expression Language (CEL) expressions. CEL is a simple expression language used throughout agentgateway to enable flexible configuration; CEL expressions can access request context, JWT claims, and other variables to make dynamic decisions. Transformations let you enforce policies such as capping token usage or conditionally modifying request parameters, without changing client code.
To learn more about CEL, see the Common Expression Language documentation.
Before you begin
Configure LLM request transformations
Create an AgentgatewayPolicy resource to apply an LLM request transformation. The following example limits `max_completion_tokens` to no more than 10. If the client requests fewer than 10 tokens, the requested number is applied. If the client requests more than 10 tokens, the maximum of 10 is applied.

```yaml
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: cap-max-tokens
  namespace: agentgateway-system
  labels:
    app: agentgateway
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  backend:
    ai:
      transformations:
      - field: max_completion_tokens
        expression: "min(llmRequest.max_completion_tokens, 10)"
EOF
```

| Setting | Description |
| --- | --- |
| `backend.ai.transformations` | A list of LLM request field transformations. |
| `field` | The name of the LLM request field to set. Maximum 256 characters. |
| `expression` | A CEL expression that computes the value for the field. Use the `llmRequest` variable to access the original LLM request body. Maximum 16,384 characters. |

ℹ️ You can specify up to 64 transformations per policy. Transformations take priority over `overrides` for the same field. If an expression fails to evaluate, the field is silently removed from the request.

Thinking budget fields, such as `reasoning_effort` and `thinking_budget_tokens`, can also be set or capped by using transformations. This way, operators can enforce reasoning limits centrally without requiring client changes. For example, use `"field": "reasoning_effort"` with the expression `"medium"` to cap all requests to medium reasoning effort regardless of what the client sends.

Verify that the AgentgatewayPolicy is accepted.
```shell
kubectl get AgentgatewayPolicy cap-max-tokens -n agentgateway-system -o jsonpath='{.status.ancestors[0].conditions[?(@.type=="Accepted")].status}'
```

Send a request with `max_completion_tokens` set to a value greater than 10. The transformation limits it to 10 before the request reaches the LLM provider. Verify that the `completion_tokens` value in the response is 10 or fewer and the `finish_reason` is set to `length`.

ℹ️ Some older OpenAI models use `max_tokens` instead of `max_completion_tokens`. If the transformation does not appear to take effect, check the model’s API documentation for the correct field name and update the transformation’s `field` value accordingly.

```shell
curl "$INGRESS_GW_ADDRESS/v1/chat/completions" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "max_completion_tokens": 5000,
    "messages": [
      {
        "role": "user",
        "content": "Tell me a short story"
      }
    ]
  }' | jq
```

Or, if you access the gateway on localhost:

```shell
curl "localhost:8080/v1/chat/completions" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "max_completion_tokens": 5000,
    "messages": [
      {
        "role": "user",
        "content": "Tell me a short story"
      }
    ]
  }' | jq
```

Example output:
```json
{
  "model": "gpt-3.5-turbo-0125",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 10,
    "total_tokens": 22,
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    }
  },
  "choices": [
    {
      "message": {
        "content": "Once upon a time, in a small village nestled",
        "role": "assistant",
        "refusal": null,
        "annotations": []
      },
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  ...
}
```
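The reasoning-limit enforcement mentioned earlier (for fields such as `reasoning_effort`) follows the same policy shape. A minimal sketch, assuming the inner quotes are needed so that the expression evaluates as a CEL string literal rather than a variable reference:

```yaml
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  backend:
    ai:
      transformations:
      - field: reasoning_effort
        expression: '"medium"' # CEL string literal: every request is pinned to medium effort
```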
Inject LLM model information as response headers
Use CEL expressions to inject LLM model information as response headers. This strategy is useful for detecting silent fallbacks, where a request is redirected to a different model without the client being notified. However, this setup might not be suitable for streaming responses.
Inject model headers from request and response bodies
Parse the `model` field from the incoming request body and the upstream response body using `json()`, then inject them as response headers. This configuration lets you compare which model was requested against which model actually responded.

- `json(request.body).model`: Reads the `model` field from the incoming request body.
- `json(response.body).model`: Reads the `model` field from the upstream response body.
Create an AgentgatewayPolicy resource that targets the OpenAI provider’s HTTPRoute and injects the model fields as response headers.
```yaml
kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: llm-model-headers
  namespace: agentgateway-system
  labels:
    app: agentgateway
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  traffic:
    transformation:
      response:
        set:
        - name: x-requested-model
          value: 'string(json(request.body).model)'
        - name: x-actual-model
          value: 'string(json(response.body).model)'
EOF
```

Send a chat completion request through the gateway and inspect the response headers.
```shell
curl -vi "http://$INGRESS_GW_ADDRESS/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]}'
```

Or, if you access the gateway on localhost:

```shell
curl -vi "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]}'
```

Example output:
```
< HTTP/1.1 200 OK
< content-type: application/json
< x-requested-model: gpt-4
< x-actual-model: gpt-3.5-turbo-0125
...
```

Actual model values might differ slightly from the requested model, even if the same model is used. Some responses might include a unique identifier as part of the model name. In these circumstances, you might use the `contains()` function to verify.

When a fallback model handles the request, `x-actual-model` differs from `x-requested-model`:

```
< x-requested-model: gpt-4o
< x-actual-model: gpt-4o-mini
```

ℹ️ When sending traffic to the gateway with traffic compression enabled, such as `gzip` or `br`, the CEL expression could fail. If a header is missing from a response, try a different `accept-encoding` header in your request.
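If you prefer a single header that encodes the comparison directly, the `contains()` check can be computed in the transformation itself. A sketch with a hypothetical `x-model-match` header, assuming the standard CEL string `contains()` function and bool-to-string conversion are available:

```yaml
traffic:
  transformation:
    response:
      set:
      - name: x-model-match # hypothetical: "true" when the actual model name contains the requested one
        value: 'string(json(response.body).model.contains(json(request.body).model))'
```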
Cleanup
You can remove the resources that you created in this guide.

```shell
kubectl delete AgentgatewayPolicy cap-max-tokens -n agentgateway-system --ignore-not-found
kubectl delete AgentgatewayPolicy llm-model-headers -n agentgateway-system --ignore-not-found
```