Documentation Index
Fetch the complete documentation index at: https://docs.tracia.io/llms.txt
Use this file to discover all available pages before exploring further.
The run_local() method lets you execute prompts directly against OpenAI, Anthropic, Google, or Amazon Bedrock while keeping your prompts in your codebase. You get full observability through Tracia without any added latency.
from tracia import Tracia

client = Tracia(api_key="tr_your_api_key")

result = client.run_local(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(result.text)
Why run_local()?
Some teams prefer managing prompts in their codebase rather than in an external dashboard. This keeps prompts:
Version-controlled with your application code
Reviewed through your standard PR process
Deployed alongside the code that uses them
Constructed programmatically when needed (see the sketch below)
run_local() gives you full Tracia observability while respecting this workflow.
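To illustrate the last point, here is a minimal sketch of a programmatically assembled prompt. The build_messages() helper and its arguments are hypothetical, not part of the SDK:

# Minimal sketch: assemble messages from application context.
# build_messages() is a hypothetical helper, not a Tracia API.
def build_messages(question: str, history: list[dict]) -> list[dict]:
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages.extend(history)  # prior turns, if any
    messages.append({"role": "user", "content": question})
    return messages

result = client.run_local(
    model="gpt-4o",
    messages=build_messages("What is LiteLLM?", history=[]),
)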
How It Works
When you call run_local(), the SDK:
Calls the provider via LiteLLM - Your request goes to OpenAI, Anthropic, Google, or Amazon Bedrock through LiteLLM. Tracia is not in the request path.
Sends the trace asynchronously - After the LLM responds, trace data is sent to Tracia in the background. This is non-blocking and adds zero latency to your application.
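Because delivery happens in the background, a short-lived script can exit before its traces are sent. A hedged sketch, assuming the client exposes the flush() mentioned on the Tracing page (exact signature may differ):

result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(result.text)  # already returned; trace delivery continues in the background
client.flush()  # assumption: blocks until pending traces are sent (see Tracing)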
                    prompts.run()                run_local()
Prompts stored in   Tracia dashboard             Your codebase
LLM call            Routed through Tracia API    Direct to provider via LiteLLM
Trace creation      Automatic (server-side)      Async, non-blocking
When to Use run_local() vs prompts.run()
Use run_local() when you want to:
Keep prompts in your codebase, version-controlled with git
Build prompts programmatically (e.g., assembling messages based on context)
Prototype quickly without dashboard setup
Use Tracia purely for observability
Use prompts.run() when you want to:
Edit prompts without code deployments
A/B test prompt versions from the dashboard
Let non-engineers manage prompt content
Track prompt versions separately from code versions
Use Case                               Recommended Method
Prompts managed in Tracia dashboard    prompts.run()
Prompts defined in code                run_local()
Prompts reviewed in PRs                run_local()
Quick prototyping                      run_local()
A/B testing prompt versions            prompts.run()
Programmatically constructed prompts   run_local()
Non-technical prompt editors           prompts.run()
Quick Examples
Examples for OpenAI, Anthropic, Google, Amazon Bedrock, and streaming all follow the same shape; here is the OpenAI version:
result = client.run_local(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
)
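Streaming works the same way: pass stream=True and iterate the returned LocalStream (documented under Types below). A minimal sketch:

stream = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about autumn."}],
    stream=True,
)
for chunk in stream:  # yields text chunks as they arrive
    print(chunk, end="")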
Async Variants
Use arun_local() for async code:
result = await client.arun_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Async streaming
stream = await client.arun_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True,
)
async for chunk in stream:
    print(chunk, end="")
Available Methods
Basic Usage - Getting started with each provider
Streaming - Real-time streaming responses
Sessions - Automatic trace chaining for multi-turn conversations
Parameters - Complete run_local() parameter reference
Response - RunLocalResult fields and usage
Providers - OpenAI, Anthropic, Google, and Bedrock setup
Models - 94+ supported models by provider
Variables - Template interpolation syntax
Tracing - Background traces, flush(), error handling
Advanced - Error handling and concurrent requests
Types
LLMProvider
class LLMProvider(str, Enum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    GOOGLE = "google"
    AMAZON_BEDROCK = "amazon_bedrock"
RunLocalInput
class RunLocalInput(BaseModel):
    # Required
    messages: list[LocalPromptMessage]
    model: str

    # Streaming
    stream: bool = False

    # Provider override (for custom/new models)
    provider: LLMProvider | None = None

    # LLM configuration
    temperature: float | None = None
    max_output_tokens: int | None = None  # alias: "maxOutputTokens"
    top_p: float | None = None  # alias: "topP"
    stop_sequences: list[str] | None = None  # alias: "stopSequences"
    timeout_ms: int | None = None  # alias: "timeoutMs"

    # Tool calling
    tools: list[ToolDefinition] | None = None
    tool_choice: ToolChoice | None = None  # alias: "toolChoice"

    # Variable interpolation
    variables: dict[str, str] | None = None

    # Provider API key override
    provider_api_key: str | None = None  # alias: "providerApiKey"

    # Span options
    tags: list[str] | None = None
    user_id: str | None = None  # alias: "userId"
    session_id: str | None = None  # alias: "sessionId"
    send_trace: bool | None = None  # alias: "sendTrace"
    span_id: str | None = None  # alias: "spanId"
    trace_id: str | None = None  # alias: "traceId"
    parent_span_id: str | None = None  # alias: "parentSpanId"
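These fields map onto run_local() keyword arguments, as in the examples above. A hedged sketch combining a provider override with span options (the field names come from RunLocalInput; the model name and session ID are hypothetical):

result = client.run_local(
    model="my-finetuned-model",   # hypothetical name the SDK can't auto-detect
    provider=LLMProvider.OPENAI,  # explicit override, per the comment above
    messages=[{"role": "user", "content": "Hello!"}],
    max_output_tokens=256,
    tags=["experiment-a"],
    session_id="sess_123",        # hypothetical session identifier
)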
RunLocalResult
class RunLocalResult(BaseModel):
    text: str
    span_id: str  # alias: "spanId"
    trace_id: str  # alias: "traceId"
    latency_ms: int  # alias: "latencyMs"
    usage: TokenUsage
    cost: float | None
    provider: LLMProvider
    model: str
    tool_calls: list[ToolCall]  # alias: "toolCalls"
    finish_reason: FinishReason  # alias: "finishReason"
    message: LocalPromptMessage
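A short usage sketch reading the fields above:

result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(result.text)                      # model output
print(result.latency_ms, result.cost)   # timing and (possibly None) cost
print(result.trace_id, result.span_id)  # IDs for finding the call in Tracia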
LocalStream
When stream=True is set, run_local() returns a LocalStream:
class LocalStream:
    # Span ID available immediately
    span_id: str

    # Trace ID (always present)
    trace_id: str

    # Iterate to receive text chunks
    def __iter__(self) -> Iterator[str]: ...

    # Final result after the stream completes (call .result() on the Future)
    result: Future[StreamResult]

    # Cancel the stream
    def abort(self) -> None: ...
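Putting it together, a sketch of consuming a LocalStream and then retrieving the final result; since result is a Future, calling .result() blocks until the stream has completed:

stream = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
print(stream.span_id)  # available immediately, before the first chunk
for chunk in stream:
    print(chunk, end="")
final = stream.result.result()  # StreamResult, once iteration finishes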
LocalPromptMessage
class LocalPromptMessage(BaseModel):
    role: Literal["system", "developer", "user", "assistant", "tool"]
    content: str | list[ContentPart]
    tool_call_id: str | None = None  # alias: "toolCallId", required for "tool" role
    tool_name: str | None = None  # alias: "toolName", required for "tool" role

# Content parts for assistant messages with tool calls
ContentPart = TextPart | ToolCallPart

class TextPart(BaseModel):
    type: Literal["text"]
    text: str

class ToolCallPart(BaseModel):
    type: Literal["tool_call"]
    id: str
    name: str
    arguments: dict[str, Any]
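For example, a tool round-trip expressed with these shapes, using plain dicts that mirror the models above (that dicts are accepted wherever a model is expected is an assumption, and the get_weather tool is hypothetical):

messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    # Assistant turn whose content is a list of parts, including a tool call
    {"role": "assistant", "content": [
        {"type": "tool_call", "id": "call_1", "name": "get_weather",
         "arguments": {"city": "Paris"}},
    ]},
    # Tool result; tool_call_id and tool_name are required for the "tool" role
    {"role": "tool", "tool_call_id": "call_1", "tool_name": "get_weather",
     "content": '{"temp_c": 18}'},
]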