By default, run_local() automatically sends spans to Tracia in the background. This gives you observability without blocking your application.
How Tracing Works
- run_local() completes the LLM call
- Returns the result immediately
- Submits the span to Tracia in the background using a thread pool
- Retries failed span submissions automatically
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Result available immediately
print(result.text)
print(result.span_id) # "sp_1234567890abcdef"
# Span is being sent in the background
Adding Metadata
Add metadata to help filter and analyze spans:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    tags=["production", "chat", "v2"],
    user_id="user_abc123",
    session_id="session_xyz789",
)
Span Fields
| Field | Description |
|---|---|
| span_id | Unique identifier for the span |
| trace_id | Session ID if part of a multi-turn conversation |
| parent_span_id | Parent span ID for chained conversations |
| model | Model used for the request |
| provider | Provider (openai, anthropic, google) |
| input.messages | Messages sent (with variables interpolated) |
| variables | Original variables passed |
| output | Generated response |
| status | SUCCESS or ERROR |
| latency_ms | Request duration in milliseconds |
| input_tokens | Input token count |
| output_tokens | Output token count |
| tags | User-defined tags |
| user_id | End user identifier |
| session_id | Session identifier |
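For example, once a span has been recorded you can read these fields from the object returned by client.spans.get (see Viewing Spans below). Attribute access for status, tags, and token counts is assumed here to mirror the field names in the table:
# Fetch a recorded span and inspect its fields
span = client.spans.get("sp_1234567890abcdef")
print(span.status)                            # "SUCCESS" or "ERROR"
print(span.latency_ms)                        # request duration in milliseconds
print(span.input_tokens, span.output_tokens)  # token counts
print(span.tags)                              # e.g. ["production", "chat", "v2"]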
Custom Span ID
Provide your own span ID for correlation with external systems:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    span_id="sp_a1b2c3d4e5f67890",
)
print(result.span_id) # "sp_a1b2c3d4e5f67890"
Custom span IDs must match the format: sp_ followed by exactly 16 hexadecimal characters (e.g., sp_1234567890abcdef).
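If you don't already have an ID in that format from your external system, a conforming value can be generated with Python's standard library (a minimal sketch; any source of 16 hexadecimal characters works):
import secrets

# token_hex(8) returns 8 random bytes encoded as 16 hexadecimal characters
custom_span_id = "sp_" + secrets.token_hex(8)

result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    span_id=custom_span_id,
)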
Waiting for Spans
Use flush() to wait for all pending spans before shutdown:
import concurrent.futures

client = Tracia(api_key="tr_your_api_key")

# Run multiple requests
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [
        executor.submit(
            client.run_local,
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Request {i}"}],
        )
        for i in range(3)
    ]
    results = [f.result() for f in futures]

# Wait for all background spans to complete
client.flush()
print("All spans submitted")
Async Flush
Graceful Shutdown with Context Manager
with Tracia(api_key="tr_your_api_key") as client:
    result = client.run_local(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
# flush() and close() called automatically on exit
Error Handling
on_span_error Callback
Handle span submission failures without affecting your main application:
def handle_span_error(error: Exception, span_id: str):
    print(f"Span {span_id} failed: {error}")
    # Log to monitoring system, send alert, etc.

client = Tracia(
    api_key="tr_your_api_key",
    on_span_error=handle_span_error,
)

result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
# LLM response returned even if span fails
Retry Behavior
Span submissions are automatically retried:
- Up to 2 retry attempts
- Exponential backoff (500ms, 1000ms)
- on_span_error is called only after all retries fail
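Conceptually, the submission logic behaves like the sketch below. This is illustrative only, not the SDK's internal code; submit, payload, and on_span_error are placeholders for the background submission machinery:
import time

def submit_with_retries(submit, payload, span_id, on_span_error):
    # One initial attempt plus up to 2 retries, with 500ms and 1000ms
    # of backoff between attempts, matching the documented behavior.
    delays = [0.5, 1.0]
    for attempt in range(1 + len(delays)):
        try:
            submit(payload)
            return
        except Exception as error:
            if attempt < len(delays):
                time.sleep(delays[attempt])
            else:
                # All retries exhausted: report the failure to the callback.
                on_span_error(error, span_id)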
Disabling Tracing
Disable tracing for specific requests:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    send_trace=False,
)
print(result.span_id) # "sp_..." (still populated locally, just not sent to Tracia)
Use cases for disabling tracing:
- Development and testing
- Sensitive data that shouldn’t be logged
- High-volume, low-value requests
- Reducing costs on non-critical paths
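For example, you can key send_trace off an environment flag so that development traffic is never traced (the TRACIA_TRACING_ENABLED variable name is just an illustration):
import os

# Trace only when explicitly enabled, e.g. in production
tracing_enabled = os.environ.get("TRACIA_TRACING_ENABLED", "false") == "true"

result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    send_trace=tracing_enabled,
)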
Viewing Spans
Access spans in the Tracia dashboard or via the SDK:
# Get a specific span
span = client.spans.get("sp_1234567890abcdef")
print(span)

# List recent spans
from tracia import ListSpansOptions

result = client.spans.list(ListSpansOptions(tags=["production"], limit=10))
for span in result.spans:
    print(span.span_id, span.latency_ms)
Span Storage
Spans include:
- Full input messages (after variable interpolation)
- Original variables (for filtering)
- Complete output text
- Token usage and latency
- LLM configuration (temperature, max_output_tokens, top_p)
Spans are stored securely and retained according to your plan’s data retention policy.