By default, run_local() automatically sends spans to Tracia in the background. This gives you observability without blocking your application.

How Tracing Works

  1. run_local() completes the LLM call
  2. Returns the result immediately
  3. Submits the span to Tracia in the background using a thread pool
  4. Retries failed span submissions automatically

For example:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Result available immediately
print(result.text)
print(result.span_id)  # "sp_1234567890abcdef"

# Span is being sent in the background

Span Metadata

Add metadata to help filter and analyze spans:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    tags=["production", "chat", "v2"],
    user_id="user_abc123",
    session_id="session_xyz789",
)

Span Fields

Field               Description
span_id             Unique identifier for the span
trace_id            Session ID if part of a multi-turn conversation
parent_span_id      Parent span ID for chained conversations
model               Model used for the request
provider            Provider (openai, anthropic, google)
input.messages      Messages sent (with variables interpolated)
variables           Original variables passed
output              Generated response
status              SUCCESS or ERROR
latency_ms          Request duration in milliseconds
input_tokens        Input token count
output_tokens       Output token count
tags                User-defined tags
user_id             End user identifier
session_id          Session identifier
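
A quick sketch of reading a few of these fields from a span returned by the SDK; aside from span_id and latency_ms (used later on this page), the attribute names are assumed to mirror the field names above:
span = client.spans.get("sp_1234567890abcdef")

print(span.span_id)       # "sp_1234567890abcdef"
print(span.latency_ms)    # request duration in milliseconds
print(span.status)        # assumed attribute: "SUCCESS" or "ERROR"
print(span.input_tokens)  # assumed attribute: input token count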

Custom Span ID

Provide your own span ID for correlation with external systems:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    span_id="sp_a1b2c3d4e5f67890",
)

print(result.span_id)  # "sp_a1b2c3d4e5f67890"
Custom span IDs must match the format: sp_ followed by exactly 16 hexadecimal characters (e.g., sp_1234567890abcdef).
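
If you need to mint IDs in this format yourself, the standard library is enough (a minimal sketch):
import secrets

# "sp_" followed by exactly 16 hexadecimal characters
custom_span_id = f"sp_{secrets.token_hex(8)}"  # token_hex(8) yields 16 hex characters

result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    span_id=custom_span_id,
)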

Waiting for Spans

Use flush() to wait for all pending spans before shutdown:
client = Tracia(api_key="tr_your_api_key")

# Run multiple requests
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [
        executor.submit(client.run_local, model="gpt-4o", messages=[{"role": "user", "content": f"Request {i}"}])
        for i in range(3)
    ]
    results = [f.result() for f in futures]

# Wait for all background spans to complete
client.flush()
print("All spans submitted")

Async Flush

In async applications, use aflush() to wait for pending spans without blocking the event loop:
await client.aflush()
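
A minimal sketch of flushing at shutdown inside an event loop, assuming the same client object as above:
import asyncio

async def shutdown():
    # Wait for all pending background spans without blocking the event loop
    await client.aflush()

asyncio.run(shutdown())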

Graceful Shutdown with Context Manager

with Tracia(api_key="tr_your_api_key") as client:
    result = client.run_local(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
# flush() and close() called automatically on exit

Error Handling

on_span_error Callback

Handle span submission failures without affecting your main application:
def handle_span_error(error: Exception, span_id: str):
    print(f"Span {span_id} failed: {error}")
    # Log to monitoring system, send alert, etc.


client = Tracia(
    api_key="tr_your_api_key",
    on_span_error=handle_span_error,
)

result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
# LLM response returned even if span fails

Retry Behavior

Span submissions are automatically retried (see the sketch after this list):
  • Up to 2 retry attempts
  • Exponential backoff (500ms, 1000ms)
  • on_span_error called only after all retries fail
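
The SDK handles this internally; conceptually, each submission behaves like this illustrative sketch (not the actual implementation):
import time

def submit_with_retries(send_span, span, on_span_error, span_id):
    delays = [0.5, 1.0]  # exponential backoff: 500ms, then 1000ms
    for attempt in range(3):  # one initial attempt plus up to 2 retries
        try:
            send_span(span)
            return
        except Exception as error:
            if attempt < len(delays):
                time.sleep(delays[attempt])
            else:
                # Called only after all retries fail
                on_span_error(error, span_id)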

Disabling Tracing

Disable tracing for specific requests:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    send_trace=False,
)

print(result.span_id)  # "sp_..." (still populated locally, just not sent to Tracia)
Use cases for disabling tracing (see the sketch after this list):
  • Development and testing
  • Sensitive data that shouldn’t be logged
  • High-volume, low-value requests
  • Reducing costs on non-critical paths
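
A common pattern is to drive send_trace from your environment so development and testing traffic is never sent (a sketch; the APP_ENV variable name is just an example):
import os

# Only send spans from production; skip tracing in development and testing
tracing_enabled = os.environ.get("APP_ENV") == "production"

result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    send_trace=tracing_enabled,
)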

Viewing Spans

Access spans in the Tracia dashboard or via the SDK:
# Get a specific span
span = client.spans.get("sp_1234567890abcdef")
print(span)

# List recent spans
from tracia import ListSpansOptions
result = client.spans.list(ListSpansOptions(tags=["production"], limit=10))
for span in result.spans:
    print(span.span_id, span.latency_ms)

Span Storage

Spans include:
  • Full input messages (after variable interpolation)
  • Original variables (for filtering)
  • Complete output text
  • Token usage and latency
  • LLM configuration (temperature, max_output_tokens, top_p)
Spans are stored securely and retained according to your plan’s data retention policy.
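
For example, generation settings passed on the request are captured as the span's LLM configuration (a sketch; it assumes run_local accepts these parameters under the names listed above):
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.2,         # assumed parameter; recorded on the span
    max_output_tokens=256,   # assumed parameter
    top_p=0.9,               # assumed parameter
)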