How to use DataPrime to combine datasets and correlate logs
Goal
By the end of this guide you should be able to:
- Use join to combine logs and traces by shared fields
- Use union to merge datasets with compatible schemas
- Use identifier-based filtering to correlate logs without a formal join
Why it matters
Real-world debugging rarely involves a single service. To understand the full picture, you often need to combine data from multiple sources (logs, traces, or metrics) based on shared identifiers such as request_id, trace_id, or user_id. This guide helps you unify fragmented data into a cohesive timeline for triage, monitoring, and root cause analysis.
Combining datasets using join
Description
The join command combines two datasets by matching a common field (e.g., trace_id, request_id). It's useful for enriching logs with related events from another source.
Joins can be resource-intensive, so narrow each dataset with filters as much as possible before joining.
Syntax
<query1>
| join (
<query2>
) on <join_condition>
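To make the join semantics concrete, here is a minimal Python model. It is not DataPrime. It assumes an inner join on a single shared key and shows why filtering first helps: only the records that survive the filter reach the join. The `inner_join` function, the nested `right` field, and the sample records are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Any

Record = dict[str, Any]


def inner_join(left: list[Record], right: list[Record], key: str) -> list[Record]:
    """Pair each left record with every right record that has the same
    value for `key`. Records missing `key` on either side are dropped."""
    # Index the right side by key value so each lookup is a dict access.
    index: dict[Any, list[Record]] = defaultdict(list)
    for right_rec in right:
        if key in right_rec:
            index[right_rec[key]].append(right_rec)

    joined: list[Record] = []
    for left_rec in left:
        if key not in left_rec:
            continue
        for right_rec in index.get(left_rec[key], []):
            # Nest the right record under its own field so that fields with
            # the same name on both sides do not overwrite each other.
            joined.append({**left_rec, "right": right_rec})
    return joined


logs = [
    {"request_id": "a1", "severity": "INFO", "msg": "checkout started"},
    {"request_id": "b2", "severity": "ERROR", "msg": "upstream timeout"},
    {"request_id": "c3", "severity": "INFO", "msg": "ok"},
]
spans = [
    {"request_id": "b2", "service": "payments", "duration_ms": 5012},
    {"request_id": "a1", "service": "cart", "duration_ms": 40},
]

# Filter first: only 1 of the 3 log records reaches the join.
errors = [rec for rec in logs if rec["severity"] == "ERROR"]
result = inner_join(errors, spans, "request_id")
print(result[0]["msg"], result[0]["right"]["service"])
# prints: upstream timeout payments
```

Indexing one side by key keeps the matching step linear in the number of records. The join still does work proportional to what it receives, which is why pre-filtering matters.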
Merging datasets using union
Description
The union command merges two datasets into a single stream. Both sources should have compatible schemas; if they do not, normalize their fields with choose first.
Syntax
<query1>
| union (
<query2>
)
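The following Python model illustrates union with normalization. It is not DataPrime. Under these assumptions, union simply concatenates records without matching or deduplication, and a hypothetical `normalize` helper stands in for the role choose plays: it projects each source onto the same field names before the merge. The source names and fields are made up for illustration.

```python
from typing import Any

Record = dict[str, Any]


def normalize(record: Record, mapping: dict[str, str]) -> Record:
    """Project a record onto a shared schema: each output field (key) is
    read from the source field named by its value; missing fields -> None."""
    return {target: record.get(source) for target, source in mapping.items()}


def union(*datasets: list[Record]) -> list[Record]:
    """Concatenate datasets into one stream. No matching or deduplication."""
    merged: list[Record] = []
    for dataset in datasets:
        merged.extend(dataset)
    return merged


# Two hypothetical sources that name the same concepts differently.
proxy_logs = [{"ts": 1, "req": "a1", "status": 502}]
app_logs = [{"timestamp": 2, "request_id": "a1", "level": "ERROR"}]

merged = union(
    [normalize(r, {"timestamp": "ts", "request_id": "req"}) for r in proxy_logs],
    [normalize(r, {"timestamp": "timestamp", "request_id": "request_id"}) for r in app_logs],
)
print(merged)
# [{'timestamp': 1, 'request_id': 'a1'}, {'timestamp': 2, 'request_id': 'a1'}]
```

Fields that only one source has (status, level) are dropped here. Keeping them would require adding them to both mappings, with None filling the gap on the side that lacks them.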
Common pitfalls
- Unfiltered joins: Always apply filter before join to avoid performance issues.
- Mismatched schemas: Use choose to normalize fields before union.
- Missing correlation keys: Without a shared ID like request_id, correlation is not possible.
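Identifier-based correlation skips the join entirely. You select every record that carries the same ID from each source, then order the results by time. The Python model below sketches that idea. It is not DataPrime, and the `correlate` function, the `timestamp` field, and the sample records are hypothetical. It also illustrates the last pitfall: a record without request_id can never appear in the timeline.

```python
from typing import Any

Record = dict[str, Any]


def correlate(sources: dict[str, list[Record]], key: str, value: Any) -> list[Record]:
    """Gather records whose `key` equals `value` from every source, tag each
    with its source name, and return them ordered by timestamp."""
    timeline: list[Record] = []
    for source, records in sources.items():
        for rec in records:
            # A record without the correlation key can never be matched.
            if key in rec and rec[key] == value:
                timeline.append({"source": source, **rec})
    return sorted(timeline, key=lambda r: r["timestamp"])


sources = {
    "gateway": [{"timestamp": 3, "request_id": "a1", "msg": "502 returned"}],
    "orders": [
        {"timestamp": 1, "request_id": "a1", "msg": "order received"},
        {"timestamp": 2, "msg": "db pool exhausted"},  # no request_id
    ],
}

timeline = correlate(sources, "request_id", "a1")
for event in timeline:
    print(event["timestamp"], event["source"], event["msg"])
# 1 orders order received
# 3 gateway 502 returned
```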