The run_local() method executes prompts directly against OpenAI, Anthropic, or Google while keeping them in your codebase. You get full observability through Tracia with no added latency.
from tracia import Tracia

client = Tracia(api_key="tr_your_api_key")

result = client.run_local(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(result.text)

Why run_local()?

Some teams prefer managing prompts in their codebase rather than in an external dashboard. This keeps prompts:
  • Version-controlled with your application code
  • Reviewed through your standard PR process
  • Deployed alongside the code that uses them
  • Constructed programmatically when needed
run_local() gives you full Tracia observability while respecting this workflow.
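
For example, the "constructed programmatically" case above can be as simple as building the message list from application state before handing it to run_local(). This is a minimal sketch; the build_messages helper and its arguments are illustrative, not part of the SDK:
def build_messages(question: str, history: list[dict]) -> list[dict]:
    # Assemble the prompt from code-owned pieces plus runtime context.
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages.extend(history)  # prior turns kept in your own storage
    messages.append({"role": "user", "content": question})
    return messages

result = client.run_local(
    model="gpt-4o",
    messages=build_messages(
        "What changed in the latest release?",
        history=[
            {"role": "user", "content": "Hi"},
            {"role": "assistant", "content": "Hello!"},
        ],
    ),
)
print(result.text)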

How It Works

When you call run_local(), the SDK:
  1. Calls the provider via LiteLLM - Your request goes to OpenAI, Anthropic, or Google through LiteLLM. Tracia is not in the request path.
  2. Sends the trace asynchronously - After the LLM responds, trace data is sent to Tracia in the background. This is non-blocking and adds zero latency to your application.
                          prompts.run()             run_local()
Prompts stored in         Tracia dashboard          Your codebase
LLM call routed through   Tracia API                Direct to provider via LiteLLM
Trace creation            Automatic (server-side)   Async, non-blocking
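
Because the trace is sent in the background, you can attach span metadata without adding latency to the request. A short sketch using the span options listed under RunLocalInput below, assuming they are accepted as keyword arguments like the other options; the tag and ID values are placeholders:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    tags=["checkout-flow"],     # placeholder tag
    user_id="user_123",         # placeholder user ID
    session_id="session_abc",   # placeholder session ID
)

# The response is available here; the trace upload continues in the background.
print(result.span_id, result.trace_id)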

When to Use run_local() vs prompts.run()

Use run_local() when you want to:
  • Keep prompts in your codebase, version-controlled with git
  • Build prompts programmatically (e.g., assembling messages based on context)
  • Prototype quickly without dashboard setup
  • Use Tracia purely for observability
Use prompts.run() when you want to:
  • Edit prompts without code deployments
  • A/B test prompt versions from the dashboard
  • Let non-engineers manage prompt content
  • Track prompt versions separately from code versions
Use Case                               Recommended Method
Prompts managed in Tracia dashboard    prompts.run()
Prompts defined in code                run_local()
Prompts reviewed in PRs                run_local()
Quick prototyping                      run_local()
A/B testing prompt versions            prompts.run()
Programmatically constructed prompts   run_local()
Non-technical prompt editors           prompts.run()

Quick Examples

result = client.run_local(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
)
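
Streaming works in the synchronous API as well: pass stream=True and iterate the returned LocalStream (see Types below) to receive text chunks as they arrive:
stream = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    stream=True,
)

for chunk in stream:
    print(chunk, end="")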

Async Variants

Use arun_local() for async code:
result = await client.arun_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Async streaming
stream = await client.arun_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True,
)

async for chunk in stream:
    print(chunk, end="")

Available Methods

  • run_local() - synchronous execution; pass stream=True to stream the response
  • arun_local() - async execution; pass stream=True to stream the response

Types

LLMProvider

class LLMProvider(str, Enum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    GOOGLE = "google"

RunLocalInput

class RunLocalInput(BaseModel):
    # Required
    messages: list[LocalPromptMessage]
    model: str

    # Streaming
    stream: bool = False

    # Provider override (for custom/new models)
    provider: LLMProvider | None = None

    # LLM configuration
    temperature: float | None = None
    max_output_tokens: int | None = None   # alias: "maxOutputTokens"
    top_p: float | None = None             # alias: "topP"
    stop_sequences: list[str] | None = None  # alias: "stopSequences"
    timeout_ms: int | None = None          # alias: "timeoutMs"

    # Tool calling
    tools: list[ToolDefinition] | None = None
    tool_choice: ToolChoice | None = None  # alias: "toolChoice"

    # Variable interpolation
    variables: dict[str, str] | None = None

    # Provider API key override
    provider_api_key: str | None = None    # alias: "providerApiKey"

    # Span options
    tags: list[str] | None = None
    user_id: str | None = None             # alias: "userId"
    session_id: str | None = None          # alias: "sessionId"
    send_trace: bool | None = None         # alias: "sendTrace"
    span_id: str | None = None             # alias: "spanId"
    trace_id: str | None = None            # alias: "traceId"
    parent_span_id: str | None = None      # alias: "parentSpanId"
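
For a model the SDK does not recognize yet, provider tells LiteLLM which provider to route to, and provider_api_key supplies that provider's key for this call. A hedged sketch; the model name and key are placeholders, and if LLMProvider is not exported by the package, the plain string "anthropic" should be equivalent since the enum subclasses str:
result = client.run_local(
    model="claude-next-preview",        # placeholder model name
    provider=LLMProvider.ANTHROPIC,     # route directly to Anthropic via LiteLLM
    provider_api_key="sk-ant-...",      # per-call provider key override
    messages=[{"role": "user", "content": "Hello!"}],
    max_output_tokens=256,
)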

RunLocalResult

class RunLocalResult(BaseModel):
    text: str
    span_id: str              # alias: "spanId"
    trace_id: str             # alias: "traceId"
    latency_ms: int           # alias: "latencyMs"
    usage: TokenUsage
    cost: float | None
    provider: LLMProvider
    model: str
    tool_calls: list[ToolCall]  # alias: "toolCalls"
    finish_reason: FinishReason  # alias: "finishReason"
    message: LocalPromptMessage
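
Beyond .text, the result exposes the data recorded on the span, which you can log or assert on directly:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(result.latency_ms, "ms")        # measured call latency
print(result.usage)                   # TokenUsage for the call
print(result.cost)                    # estimated cost, or None
print(result.finish_reason)           # why generation stopped
print(result.provider, result.model)  # resolved provider and model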

LocalStream

When stream=True is set, run_local() returns a LocalStream:
class LocalStream:
    # Span ID available immediately
    span_id: str

    # Trace ID (always present)
    trace_id: str

    # Iterate to receive text chunks
    def __iter__(self) -> Iterator[str]: ...

    # Final result after stream completes (call .result() on the Future)
    result: Future[StreamResult]

    # Cancel the stream
    def abort(self) -> None: ...
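
In practice, span_id is available before the first chunk arrives, and once iteration completes the final StreamResult can be read from the result future. A short sketch:
stream = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document."}],
    stream=True,
)
print("span:", stream.span_id)   # available immediately

for chunk in stream:
    print(chunk, end="")

final = stream.result.result()   # blocks until the stream has finished
print(final)                     # final StreamResult metadata

# stream.abort() cancels a stream you no longer need.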

LocalPromptMessage

class LocalPromptMessage(BaseModel):
    role: Literal["system", "developer", "user", "assistant", "tool"]
    content: str | list[ContentPart]
    tool_call_id: str | None = None   # alias: "toolCallId", required for "tool" role
    tool_name: str | None = None      # alias: "toolName", required for "tool" role

# Content parts for assistant messages with tool calls
ContentPart = TextPart | ToolCallPart

class TextPart(BaseModel):
    type: Literal["text"]
    text: str

class ToolCallPart(BaseModel):
    type: Literal["tool_call"]
    id: str
    name: str
    arguments: dict[str, Any]
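
To continue a conversation after a tool call, the assistant turn can be written with content parts and the tool output as a "tool" role message. A minimal sketch with a hypothetical get_weather tool your application has already executed; it uses the snake_case field names from LocalPromptMessage, though depending on the SDK's validation settings the camelCase aliases (toolCallId, toolName) may be required instead:
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        # Assistant turn that requested the tool, expressed as content parts
        "role": "assistant",
        "content": [
            {"type": "text", "text": "Let me check."},
            {"type": "tool_call", "id": "call_1", "name": "get_weather", "arguments": {"city": "Paris"}},
        ],
    },
    {
        # Tool result: tool_call_id and tool_name are required for this role
        "role": "tool",
        "tool_call_id": "call_1",
        "tool_name": "get_weather",
        "content": "18°C and sunny",
    },
]

result = client.run_local(model="gpt-4o", messages=messages)
print(result.text)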