groupby
Description
The groupby command aggregates documents that share one or more common values or calculated expressions, allowing you to compute metrics such as sum, avg, max, min, and count. It is the cornerstone of DataPrime’s analytical capabilities, enabling powerful summarization and insight generation from raw event data.
Each unique combination of grouping expressions produces a single output document. The aggregate or agg keyword specifies which aggregation functions to apply within each group.
You can group by both keypaths and calculated expressions. When grouping by an expression, DataPrime evaluates it dynamically for each document before grouping.
Syntax
groupby <grouping_expression> [as <alias>] [, <grouping_expression_2> [as
<alias_2>], ...] [aggregate|agg] <aggregate_function> [as <result_keypath>] [,
<aggregate_function_2> [as <result_keypath_2], ...]
Example 1
Use case: Calculate average request duration per HTTP path
The groupby command is most commonly used to aggregate numerical metrics by
a categorical field. In this example, we group by path and calculate the
average request duration, producing one result per endpoint.
Example data
{ "path": "/home", "duration": 400 },
{ "path": "/home", "duration": 600 },
{ "path": "/about", "duration": 512 },
{ "path": "/store", "duration": 33 }
Example query
groupby path agg avg(duration) as average_duration_for_path
Example output
| path | average_duration_for_path |
|---|---|
| /home | 500 |
| /about | 512 |
| /store | 33 |
The groupby command collapses all documents with the same path value into
a single record, applying the avg() aggregation function to compute the mean
duration. Only the grouping fields and calculated aggregates are retained in
the output.
Example 2
Group by a transformed key:
Example query
groupby username.toLowerCase() agg count() as activity_count
Example 3
Cluster IPs into subnets:
Example query
groupby ipPrefix(process.tags['k8s.pod.ip'], 24) as subnet_range agg count() as activity_count
Example 4
Compute 99th percentile from traces:
Example query
filter $l.operationName == 'Lambda.Invoke'
| groupby $l.operationName agg percentile(0.99, $m.duration)