engine.queries

Purpose

The engine.queries dataset logs detailed information about query execution in your environment. It captures both semantic details (e.g., query structure, labels, joins) and execution-level statistics (e.g., performance metrics, errors, resource usage). Use it to investigate slow queries, diagnose failures, and track usage patterns, and to find inefficient queries that are worth optimizing.

Some of the many use cases for engine.queries include:

Schema description

| Full JSON path | Field data type | Field data example | Description |
| --- | --- | --- | --- |
| `clientInfo.userEmail` | String | `"alex@acme.io"` | The email address of the user who issued the query. |
| `clientInfo.originatingTeamId` | String | `"42"` | Internal team ID that owns or initiated the query. |
| `queryInfo.semanticLabels` | Object | `{...}` | Aggregated booleans describing structural/query features. |
| `queryInfo.semanticLabels.containsFreeTextSearch` | Boolean | `true` | Whether the query includes full-text search elements. |
| `queryInfo.semanticLabels.containsUnions` | Boolean | `false` | Whether SQL-style UNION clauses are present. |
| `queryInfo.semanticLabels.containsAggregations` | Boolean | `true` | Whether aggregate functions (e.g., SUM, COUNT, AVG) are used. |
| `queryInfo.semanticLabels.nonDefaultLimitRequested` | Boolean | `true` | Indicates a non-default result size was requested. |
| `queryInfo.semanticLabels.containsWildTextSearch` | Boolean | `false` | Whether wildcard search terms (e.g., `*`, `%`) are used. |
| `queryInfo.semanticLabels.containsGroupingSets` | Boolean | `false` | Whether grouping sets or similar constructs are used. |
| `queryInfo.semanticLabels.containsJoins` | Boolean | `true` | Whether the query joins multiple datasets/tables. |
| `queryInfo.semanticLabels.containsExtract` | Boolean | `true` | Whether extract operations (e.g., parsing time/strings) are used. |
| `queryInfo.semanticLabels.containsWriteto` | Boolean | `false` | Whether the query uses a writeto clause to materialize results into a dataset. |
| `queryInfo.semanticLabels.extractBeforeFilter` | Boolean | `false` | Whether the extract occurs before filtering in execution order. |
| `queryInfo.querySyntax` | Enum | `"dataprime"` | Syntax format used by the query: dataprime, lucene, or opensearch. |
| `queryInfo.interfaceType` | Enum | `"dataprime"` | Interface that the query was submitted against: dataprime or opensearch. |
| `queryInfo.tier` | Enum | `"low"` | Query priority tier: high, medium, or low. |
| `queryInfo.sources` | Array | `[{"fqDataset":"default/logs","teamId":"42","timeFrame":{...}}]` | Source datasets and configurations used in the query. |
| `queryInfo.sources.fqDataset` | String | `"default/logs"` | Fully-qualified dataset name (`<dataspace>/<dataset>`). |
| `queryInfo.sources.teamId` | String | `"42"` | Team responsible for the dataset (optional). |
| `queryInfo.sources.timeFrame` | Object | `{"start":1777545819954000000,"end":1777841999999000000,"durationMs":296180045}` | Selected time range for the source. |
| `queryInfo.sources.timeFrame.start` | Number | `1777545819954000000` | Start of the source's time range, in epoch nanoseconds. |
| `queryInfo.sources.timeFrame.end` | Number | `1777841999999000000` | End of the source's time range, in epoch nanoseconds. |
| `queryInfo.sources.timeFrame.durationMs` | Number | `296180045` | Total timeframe duration in milliseconds. |
| `queryInfo.sources.scopeExpression` | String | `"service='api' AND env='prod'"` | Expression that filters or scopes the source data. |
| `queryInfo.queryOutcome` | Object | `{...}` | Final execution outcome details. |
| `queryInfo.queryOutcome.errorMessage` | String | `"Syntax error near 'FROM'"` | Optional error message if the query failed. |
| `queryInfo.queryOutcome.outputRowCount` | Number | `124` | Number of rows returned. |
| `queryInfo.queryOutcome.status` | Enum | `"Completed"` | Final status: Completed, Failed, Cancelled, TimedOut, or Incomplete. |
| `queryInfo.queryOutcome.storage` | Object | `{...}` | Storage and execution metadata: where the query ran, which storage locations it touched, what it produced, and resource statistics. |
| `queryInfo.queryOutcome.storage.home` | Object | `{"cloud":"aws","aws":{"region":"eu-west-1"}}` | Cloud and region where the query was executed. |
| `queryInfo.queryOutcome.storage.home.cloud` | Enum | `"aws"` | Cloud provider hosting the query engine. |
| `queryInfo.queryOutcome.storage.home.aws.region` | String | `"eu-west-1"` | AWS region where the query was executed. Present when cloud is aws. |
| `queryInfo.queryOutcome.storage.locations` | Array | `[{"type":"objectStore","locationOwner":"customer","objectStore":{...}}]` | Storage locations the query read from or wrote to during execution. Each item is one of two variants: objectStore (customer-owned) or high (provider-owned). |
| `queryInfo.queryOutcome.storage.locations.type` | Enum | `"objectStore"` | Location variant: objectStore or high. |
| `queryInfo.queryOutcome.storage.locations.locationOwner` | Enum | `"customer"` | customer for objectStore locations; provider for high locations. |
| `queryInfo.queryOutcome.storage.locations.objectStore` | Object | `{...}` | Object-store metadata. Present when type is objectStore. |
| `queryInfo.queryOutcome.storage.locations.objectStore.type` | Enum | `"aws_s3"` | Object-store type: aws_s3, gcp_gs, azure_blobStorage, or ibm_cos. |
| `queryInfo.queryOutcome.storage.locations.objectStore.aws_s3.bucket` | String | `"customer-archive-bucket"` | S3 bucket name. Present when objectStore.type is aws_s3. |
| `queryInfo.queryOutcome.storage.locations.objectStore.aws_s3.region` | String | `"eu-west-1"` | S3 bucket region. Present when objectStore.type is aws_s3. |
| `queryInfo.queryOutcome.storage.locations.objectStore.stats` | Object | `{...}` | Object-store I/O stats for this location. |
| `queryInfo.queryOutcome.storage.locations.objectStore.stats.bytesRead` | Number | `0` | Bytes read from this object-store location. |
| `queryInfo.queryOutcome.storage.locations.objectStore.stats.getRequests` | Number | `0` | Number of GET requests issued to this object-store location. |
| `queryInfo.queryOutcome.storage.locations.objectStore.stats.headRequests` | Number | `0` | Number of HEAD requests issued to this object-store location. |
| `queryInfo.queryOutcome.storage.locations.high` | Object | `{...}` | Provider-owned storage metadata. Present when type is high. |
| `queryInfo.queryOutcome.storage.locations.high.stats.bytesRead` | Number | `0` | Bytes read from provider-owned storage. |
| `queryInfo.queryOutcome.storage.outputs` | Array | `[{"type":"writeto","writeto":{...}}]` | Outputs produced by the query (e.g., results materialized via writeto). |
| `queryInfo.queryOutcome.storage.outputs.type` | Enum | `"writeto"` | Output type. |
| `queryInfo.queryOutcome.storage.outputs.writeto` | Object | `{...}` | writeto output details. Present when type is writeto. |
| `queryInfo.queryOutcome.storage.outputs.writeto.dataset.stats.totalBytesWritten` | Number | `87893` | Total compressed bytes written across all targets. |
| `queryInfo.queryOutcome.storage.outputs.writeto.dataset.stats.totalUncompressedBytes` | Number | `24469` | Total uncompressed bytes written across all targets. |
| `queryInfo.queryOutcome.storage.outputs.writeto.dataset.targets` | Array | `[{...}]` | Per-target writeto results. |
| `queryInfo.queryOutcome.storage.outputs.writeto.dataset.targets.dataset` | String | `"materialized_events"` | Target dataset name. |
| `queryInfo.queryOutcome.storage.outputs.writeto.dataset.targets.dataspace` | String | `"default"` | Target dataspace. |
| `queryInfo.queryOutcome.storage.outputs.writeto.dataset.targets.writeMode` | Enum | `"append"` | Write mode used: append or overwrite. |
| `queryInfo.queryOutcome.storage.outputs.writeto.dataset.targets.stats.bytesWritten` | Number | `87893` | Compressed bytes written to this target. |
| `queryInfo.queryOutcome.storage.outputs.writeto.dataset.targets.stats.uncompressedBytes` | Number | `24469` | Uncompressed bytes written to this target. |
| `queryInfo.queryOutcome.storage.stats` | Object | `{...}` | Execution-level resource statistics for the query. |
| `queryInfo.queryOutcome.storage.stats.bytesRead` | Number | `7271615` | Bytes read from customer-exposed storage (customer bucket or provider-owned storage). Excludes cache and staging bucket reads. |
| `queryInfo.queryOutcome.storage.stats.crossRegionBytesRead` | Number | `0` | Bytes read from a different region than the DataPrime Query Engine (DQE) cluster's region. |
| `queryInfo.queryOutcome.storage.stats.limits` | Object | `{...}` | Per-resource limits. Each entry includes reached (whether the limit was hit) and, where applicable, limit (the threshold) and value (the measured amount). Threshold values vary by tier and account configuration; the limit examples in the rows below are illustrative only. |
| `queryInfo.queryOutcome.storage.stats.limits.scan.reached` | Boolean | `false` | true if the scan limit was reached. |
| `queryInfo.queryOutcome.storage.stats.limits.scan.limit` | Number | `1073741824` | Scan-byte threshold. |
| `queryInfo.queryOutcome.storage.stats.limits.scan.value` | Number | `14522` | Bytes the engine scanned (may include data from in-memory cache, not just storage). |
| `queryInfo.queryOutcome.storage.stats.limits.shuffleSize.reached` | Boolean | `false` | true if the shuffle size limit was reached. |
| `queryInfo.queryOutcome.storage.stats.limits.shuffleSize.limit` | Number | `1073741824` | Shuffle-byte threshold. |
| `queryInfo.queryOutcome.storage.stats.limits.shuffleSize.value` | Number | `83456` | Bytes shuffled by the query. |
| `queryInfo.queryOutcome.storage.stats.limits.filesRead.reached` | Boolean | `false` | true if the files-read limit was reached. |
| `queryInfo.queryOutcome.storage.stats.limits.filesRead.limit` | Number | `10000` | Files-read threshold. |
| `queryInfo.queryOutcome.storage.stats.limits.filesRead.value` | Number | `0` | Number of files the query read. |
| `queryInfo.queryOutcome.storage.stats.limits.aggBuckets.reached` | Boolean | `false` | true if the aggregation-buckets limit was reached. |
| `queryInfo.queryOutcome.storage.stats.limits.aggBuckets.limit` | Number | `10000` | Aggregation-buckets threshold. |
| `queryInfo.queryOutcome.storage.stats.limits.aggBuckets.value` | Number | `0` | Number of aggregation buckets used by the query. |
| `queryInfo.queryOutcome.storage.stats.limits.column.reached` | Boolean | `false` | true if the column limit was reached. The column count itself is not emitted because it is computed at ingest, not at query time. |
| `queryInfo.queryOutcome.storage.stats.limits.scrollTimeout.reached` | Boolean | `false` | true if the scroll operation timed out. |
| `queryInfo.queryOutcome.failureType` | Enum | `"bad request"` | Failure reason (if applicable). |
| `queryInfo.queryOutcome.failureClass` | Enum | `"clientError"` | Error class: clientError or serverError. |
| `queryInfo.queryOutcome.e2eDurationMs` | Number | `842` | End-to-end execution time (ms). |
| `queryInfo.queryId` | String | `"q-2025-09-04-abc123"` | Unique identifier for the query execution. |
| `queryInfo.queryBlueprints` | Object | `{...}` | Normalized representations of query components. |
| `queryInfo.queryBlueprints.queryTextSearchFilters` | String | `"text:\"payment failed\""` | Representation of text-based filters. |
| `queryInfo.queryBlueprints.queryNoLiterals` | String | `"SELECT * FROM logs WHERE status = ?"` | Query string with literals removed. |
| `queryInfo.queryBlueprints.queryLabelFilters` | String | `"service=api, env=prod"` | Normalized label-based filters. |
| `queryInfo.defaultTimeFrame` | Object | `{"start":1777801523000000000,"end":1777805123000000000,"durationMs":3600000}` | Default time range if none is specified. |
| `queryInfo.defaultTimeFrame.start` | Number | `1777801523000000000` | Default start time, in epoch nanoseconds. |
| `queryInfo.defaultTimeFrame.end` | Number | `1777805123000000000` | Default end time, in epoch nanoseconds. |
| `queryInfo.defaultTimeFrame.durationMs` | Number | `3600000` | Default time range duration in milliseconds. |
| `queryInfo.queryText` | String | `"source default/logs \| limit 10"` | Original raw query issued by the user. |
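
To make the dotted paths and the mixed time units concrete, here is a minimal Python sketch that assembles a partial record from the example values listed above and reads fields by path. The helpers `get_path` and `ns_to_datetime` are hypothetical names introduced for illustration, not part of any product API, and indexing array fields such as `sources` with a numeric path segment is an assumption about how a consumer might walk the exported JSON.

```python
from datetime import datetime, timezone

# Partial record built only from the example values in the table above.
record = {
    "clientInfo": {"userEmail": "alex@acme.io", "originatingTeamId": "42"},
    "queryInfo": {
        "querySyntax": "dataprime",
        "tier": "low",
        "sources": [
            {
                "fqDataset": "default/logs",
                "teamId": "42",
                "timeFrame": {
                    "start": 1777545819954000000,
                    "end": 1777841999999000000,
                    "durationMs": 296180045,
                },
            }
        ],
        "queryOutcome": {
            "status": "Completed",
            "outputRowCount": 124,
            "e2eDurationMs": 842,
        },
    },
}


def get_path(obj, path, default=None):
    """Follow a dotted path; numeric segments index into lists (hypothetical helper)."""
    current = obj
    for segment in path.split("."):
        if isinstance(current, list):
            if not segment.isdigit() or int(segment) >= len(current):
                return default
            current = current[int(segment)]
        elif isinstance(current, dict) and segment in current:
            current = current[segment]
        else:
            return default
    return current


def ns_to_datetime(epoch_ns):
    """Convert epoch nanoseconds to a UTC datetime (sub-microsecond digits are dropped)."""
    seconds, remainder_ns = divmod(epoch_ns, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=remainder_ns // 1_000
    )


frame = get_path(record, "queryInfo.sources.0.timeFrame")
# start and end are epoch nanoseconds, while durationMs is in milliseconds.
computed_ms = (frame["end"] - frame["start"]) // 1_000_000

print(get_path(record, "queryInfo.queryOutcome.status"))
print(ns_to_datetime(frame["start"]).isoformat())
print(computed_ms == frame["durationMs"])
```

The last line checks that the nanosecond `start`/`end` pair and the millisecond `durationMs` in the example describe the same interval, which is an easy unit mix-up to make when consuming these fields.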

engine.queries schema

{ `clientInfo`
`userEmail`

type: string
The email address of the user who issued the query.

`originatingTeamId`

type: string
The internal team ID that owns or initiated the query.

}
{ `queryInfo`
{ `semanticLabels`
`containsFreeTextSearch`

type: boolean
Whether the query includes full-text search elements.

`containsUnions`

type: boolean
Whether the query includes SQL-style UNION clauses.

`containsAggregations`

type: boolean
Whether the query uses aggregation functions like SUM, COUNT, AVG.

`nonDefaultLimitRequested`

type: boolean
Indicates if the query requests a result size different from the default.

`containsWildTextSearch`

type: boolean
Whether the query includes wildcard search terms (e.g., *, %).

`containsGroupingSets`

type: boolean
Whether grouping sets or similar constructs are used in the query.

`containsJoins`

type: boolean
Whether the query involves joining multiple datasets or tables.

`containsExtract`

type: boolean
Whether the query includes extract operations (e.g., parsing time or string fields).

`containsWriteto`

type: boolean
Whether the query uses a writeto clause to materialize results into a dataset.

`extractBeforeFilter`

type: boolean
Whether the extract operation occurs before filtering in execution order.

}
`querySyntax`

Enum: dataprime, lucene, opensearch
Syntax format used by the query.

`interfaceType`

Enum: dataprime, opensearch
Interface that the query was submitted against.

`tier`

Enum: high, medium, low
Query priority tier.

{ `sources`

type: array
A list of source datasets and their configurations used in the query.

{ items
`fqDataset`

type: string
The fully qualified name of the dataset (<dataspace>/<dataset>).

`teamId`

type: string
The team responsible for the dataset (optional).

{ `timeFrame`
`start`

type: number
Start of the source's time range, in epoch nanoseconds.

`end`

type: number
End of the source's time range, in epoch nanoseconds.

`durationMs`

type: number
Total duration of the selected timeframe in milliseconds.

}
`scopeExpression`

type: string
Expression that filters or scopes the source data (e.g., labels or conditions).

}
}
{ `queryOutcome`
`errorMessage`

type: string
Optional error message if the query failed.

`outputRowCount`

type: number
Number of rows returned by the query.

`status`

Enum: Completed, Failed, Cancelled, TimedOut, Incomplete
The final status of the query execution.

{ `storage`
{ `home`
`cloud`

Enum: aws
Cloud provider hosting the query engine.

{ `aws`
`region`

type: string
AWS region where the query was executed. Present when cloud is aws.

}
}
{ `locations`

type: array
Storage locations the query read from or wrote to. Each item is one of two variants: objectStore (customer-owned) or high (provider-owned).

{ items
`type`

Enum: objectStore, high
Location variant.

`locationOwner`

Enum: customer, provider
customer for objectStore locations; provider for high locations.

{ `objectStore`

Present when type is objectStore.

`type`

Enum: aws_s3, gcp_gs, azure_blobStorage, ibm_cos
Object-store type.

{ `aws_s3`

Present when objectStore.type is aws_s3.

`bucket`

type: string
S3 bucket name.

`region`

type: string
S3 bucket region.

}
{ `stats`
`bytesRead`

type: number
Bytes read from this object-store location.

`getRequests`

type: number
Number of GET requests issued to this object-store location.

`headRequests`

type: number
Number of HEAD requests issued to this object-store location.

}
}
{ `high`

Present when type is high.

{ `stats`
`bytesRead`

type: number
Bytes read from provider-owned storage.

}
}
}
}
{ `outputs`

type: array
Outputs produced by the query (e.g., results materialized via writeto).

{ items
`type`

Enum: writeto
Output type.

{ `writeto`
{ `dataset`
{ `stats`
`totalBytesWritten`

type: number
Total compressed bytes written across all targets.

`totalUncompressedBytes`

type: number
Total uncompressed bytes written across all targets.

}
{ `targets`

type: array
Per-target writeto results.

{ items
`dataset`

type: string
Target dataset name.

`dataspace`

type: string
Target dataspace.

`writeMode`

Enum: append, overwrite
Write mode used for this target.

{ `stats`
`bytesWritten`

type: number
Compressed bytes written to this target.

`uncompressedBytes`

type: number
Uncompressed bytes written to this target.

}
}
}
}
}
}
}
{ `stats`
`bytesRead`

type: number
Bytes read from customer-exposed storage (customer bucket or provider-owned storage). Excludes cache and staging bucket reads.

`crossRegionBytesRead`

type: number
Bytes read from a different region than the DataPrime Query Engine (DQE) cluster's region.

{ `limits`

Per-resource limit signals. Each entry includes reached (whether the limit was hit) and, where applicable, limit (the threshold) and value (the measured amount). Threshold values vary by tier and account configuration.

{ `scan`
`reached`

type: boolean
true if the scan limit was reached.

`limit`

type: number
Scan-byte threshold.

`value`

type: number
Bytes the engine scanned (may include in-memory cache, not just storage).

}
{ `shuffleSize`
`reached`

type: boolean
true if the shuffle size limit was reached.

`limit`

type: number
Shuffle-byte threshold.

`value`

type: number
Bytes shuffled by the query.

}
{ `filesRead`
`reached`

type: boolean
true if the files-read limit was reached.

`limit`

type: number
Files-read threshold.

`value`

type: number
Number of files the query read.

}
{ `aggBuckets`
`reached`

type: boolean
true if the aggregation-buckets limit was reached.

`limit`

type: number
Aggregation-buckets threshold.

`value`

type: number
Number of aggregation buckets used.

}
{ `column`
`reached`

type: boolean
true if the column limit was reached. The column count itself is not emitted because it is computed at ingest, not at query time.

}
{ `scrollTimeout`
`reached`

type: boolean
true if the scroll operation timed out.

}
}
}
}
`failureType`

Enum: bad request, rate limit reached, business timeout, not found, permission denied, internal, resource exhausted, internal death, query failed, query timed out
The reason the query failed, if applicable.

`failureClass`

Enum: clientError, serverError
Classification of the error as client-side or server-side.

`e2eDurationMs`

type: number
Total execution time from request to response in milliseconds.

}
`queryId`

type: string
A unique identifier for the query execution.

{ `queryBlueprints`
`queryTextSearchFilters`

type: string
A representation of the query's text-based filters.

`queryNoLiterals`

type: string
The query string with literals removed for comparison/normalization.

`queryLabelFilters`

type: string
A normalized form of the label-based filters in the query.

}
{ `defaultTimeFrame`
`start`

type: number
Default time range start, in epoch nanoseconds.

`end`

type: number
Default time range end, in epoch nanoseconds.

`durationMs`

type: number
Total default time range duration in milliseconds.

}
`queryText`

type: string
The original raw query as issued by the user.

}
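
The `limits` and `locations` structures in the schema above lend themselves to small post-processing helpers. The following Python sketch assumes records have already been exported as dictionaries; it reports how close a query came to each limit and totals bytes read per location variant. The function names and the `storage` sample are hypothetical, the sample values are taken from the examples on this page, and real thresholds vary by tier and account configuration.

```python
def limit_usage(limits):
    """Return {name: (reached, fraction_of_limit_or_None)} for each limits entry.

    Entries such as column and scrollTimeout carry only `reached`,
    so their fraction is None.
    """
    usage = {}
    for name, entry in limits.items():
        limit = entry.get("limit")
        value = entry.get("value")
        fraction = value / limit if limit and value is not None else None
        usage[name] = (entry.get("reached", False), fraction)
    return usage


def bytes_read_by_variant(locations):
    """Sum stats.bytesRead per location variant (objectStore or high)."""
    totals = {}
    for location in locations:
        variant = location.get("type")
        # The variant name doubles as the key of the variant-specific object.
        stats = location.get(variant, {}).get("stats", {})
        totals[variant] = totals.get(variant, 0) + stats.get("bytesRead", 0)
    return totals


# Sample shaped like queryInfo.queryOutcome.storage, using this page's example values.
storage = {
    "locations": [
        {
            "type": "objectStore",
            "locationOwner": "customer",
            "objectStore": {
                "type": "aws_s3",
                "stats": {"bytesRead": 0, "getRequests": 0, "headRequests": 0},
            },
        },
        {"type": "high", "locationOwner": "provider", "high": {"stats": {"bytesRead": 0}}},
    ],
    "stats": {
        "limits": {
            "scan": {"reached": False, "limit": 1073741824, "value": 14522},
            "shuffleSize": {"reached": False, "limit": 1073741824, "value": 83456},
            "filesRead": {"reached": False, "limit": 10000, "value": 0},
            "aggBuckets": {"reached": False, "limit": 10000, "value": 0},
            "column": {"reached": False},
            "scrollTimeout": {"reached": False},
        }
    },
}

usage = limit_usage(storage["stats"]["limits"])
for name, (reached, fraction) in usage.items():
    shown = "n/a" if fraction is None else f"{fraction:.6%}"
    print(f"{name}: reached={reached}, used={shown}")

print(bytes_read_by_variant(storage["locations"]))
```

Treating a missing `limit` or `value` as "no fraction available" rather than zero keeps the `column` and `scrollTimeout` entries, which only emit `reached`, from looking like unused limits.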

Next steps

Track field-level schema evolution over time with engine.schema_fields.
