By default, run_local() automatically sends spans to Tracia in the background. This gives you observability without blocking your application.
How Tracing Works
- run_local() completes the LLM call
- Returns the result immediately
- Submits the span to Tracia in the background using a thread pool
- Retries failed span submissions automatically
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Result available immediately
print(result.text)
print(result.span_id) # "sp_1234567890abcdef"
# Span is being sent in the background
Adding Metadata
Add metadata to help filter and analyze spans:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    tags=["production", "chat", "v2"],
    user_id="user_abc123",
    session_id="session_xyz789",
)
Span Fields
| Field | Description |
|---|---|
| span_id | Unique identifier for the span |
| trace_id | Session ID if part of a multi-turn conversation |
| parent_span_id | Parent span ID for chained conversations |
| model | Model used for the request |
| provider | Provider (openai, anthropic, google) |
| input.messages | Messages sent (with variables interpolated) |
| variables | Original variables passed |
| output | Generated response |
| status | SUCCESS or ERROR |
| latency_ms | Request duration in milliseconds |
| input_tokens | Input token count |
| output_tokens | Output token count |
| tags | User-defined tags |
| user_id | End user identifier |
| session_id | Session identifier |
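For example, once a span has been recorded you can read these fields from the object returned by client.spans.get (see Viewing Spans below). Attribute access for status, tags, and token counts is assumed here to mirror the field names in the table:
# Fetch a recorded span and inspect its fields
span = client.spans.get("sp_1234567890abcdef")
print(span.status)                            # "SUCCESS" or "ERROR"
print(span.latency_ms)                        # request duration in milliseconds
print(span.input_tokens, span.output_tokens)  # token counts
print(span.tags)                              # e.g. ["production", "chat", "v2"]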
Custom Span ID
Provide your own span ID for correlation with external systems:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    span_id="sp_a1b2c3d4e5f67890",
)
print(result.span_id) # "sp_a1b2c3d4e5f67890"
Custom span IDs must match the format: sp_ followed by exactly 16 hexadecimal characters (e.g., sp_1234567890abcdef).
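If you don't already have an ID in that format from your external system, a conforming value can be generated with Python's standard library (a minimal sketch; any source of 16 hexadecimal characters works):
import secrets

# token_hex(8) returns 8 random bytes encoded as 16 hexadecimal characters
custom_span_id = "sp_" + secrets.token_hex(8)

result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    span_id=custom_span_id,
)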
Waiting for Spans
Use flush() to wait for all pending spans before shutdown:
import concurrent.futures

client = Tracia(api_key="tr_your_api_key")

# Run multiple requests
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [
        executor.submit(
            client.run_local,
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Request {i}"}],
        )
        for i in range(3)
    ]
    results = [f.result() for f in futures]

# Wait for all background spans to complete
client.flush()
print("All spans submitted")
Async Flush
Graceful Shutdown with Context Manager
with Tracia(api_key="tr_your_api_key") as client:
    result = client.run_local(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
# flush() and close() called automatically on exit
Error Handling
on_span_error Callback
Handle span submission failures without affecting your main application:
def handle_span_error(error: Exception, span_id: str):
    print(f"Span {span_id} failed: {error}")
    # Log to monitoring system, send alert, etc.

client = Tracia(
    api_key="tr_your_api_key",
    on_span_error=handle_span_error,
)

result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
# LLM response returned even if span fails
Retry Behavior
Span submissions are automatically retried:
- Up to 2 retry attempts
- Exponential backoff (500ms, 1000ms)
- on_span_error is called only after all retries fail
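Conceptually, the submission logic behaves like the sketch below. This is illustrative only, not the SDK's internal code; submit, payload, and on_span_error are placeholders for the background submission machinery:
import time

def submit_with_retries(submit, payload, span_id, on_span_error):
    # One initial attempt plus up to 2 retries, with 500ms and 1000ms
    # of backoff between attempts, matching the documented behavior.
    delays = [0.5, 1.0]
    for attempt in range(1 + len(delays)):
        try:
            submit(payload)
            return
        except Exception as error:
            if attempt < len(delays):
                time.sleep(delays[attempt])
            else:
                # All retries exhausted: report the failure to the callback.
                on_span_error(error, span_id)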
Disabling Tracing
Disable tracing for specific requests:
result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    send_trace=False,
)
print(result.span_id) # "sp_..." (still populated locally, just not sent to Tracia)
Use cases for disabling tracing:
- Development and testing
- Sensitive data that shouldn’t be logged
- High-volume, low-value requests
- Reducing costs on non-critical paths
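For example, you can key send_trace off an environment flag so that development traffic is never traced (the TRACIA_TRACING_ENABLED variable name is just an illustration):
import os

# Trace only when explicitly enabled, e.g. in production
tracing_enabled = os.environ.get("TRACIA_TRACING_ENABLED", "false") == "true"

result = client.run_local(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    send_trace=tracing_enabled,
)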
Viewing Spans
Access spans in the Tracia dashboard or via the SDK:
# Get a specific span
span = client.spans.get("sp_1234567890abcdef")
print(span)

# List recent spans
from tracia import ListSpansOptions

result = client.spans.list(ListSpansOptions(tags=["production"], limit=10))
for span in result.spans:
    print(span.span_id, span.latency_ms)
Span Storage
Spans include:
- Full input messages (after variable interpolation)
- Original variables (for filtering)
- Complete output text
- Token usage and latency
- LLM configuration (temperature, max_output_tokens, top_p)
Spans are stored securely and retained according to your plan’s data retention policy.