- Installation
- Documentation
- Getting Started
- Connect
- Data Import
- Overview
- Data Sources
- CSV Files
- JSON Files
- Overview
- Creating JSON
- Loading JSON
- Writing JSON
- JSON Type
- JSON Functions
- Format Settings
- Installing and Loading
- SQL to / from JSON
- Caveats
- Multiple Files
- Parquet Files
- Partitioning
- Appender
- INSERT Statements
- Client APIs
- Overview
- ADBC
- C
- Overview
- Startup
- Configuration
- Query
- Data Chunks
- Vectors
- Values
- Types
- Prepared Statements
- Appender
- Table Functions
- Replacement Scans
- API Reference
- C++
- CLI
- Dart
- Go
- Java (JDBC)
- Julia
- Node.js (Deprecated)
- Node.js (Neo)
- ODBC
- Python
- Overview
- Data Ingestion
- Conversion between DuckDB and Python
- DB API
- Relational API
- Function API
- Types API
- Expression API
- Spark API
- API Reference
- Known Python Issues
- R
- Rust
- Swift
- Wasm
- SQL
- Introduction
- Statements
- Overview
- ANALYZE
- ALTER TABLE
- ALTER VIEW
- ATTACH and DETACH
- CALL
- CHECKPOINT
- COMMENT ON
- COPY
- CREATE INDEX
- CREATE MACRO
- CREATE SCHEMA
- CREATE SECRET
- CREATE SEQUENCE
- CREATE TABLE
- CREATE VIEW
- CREATE TYPE
- DELETE
- DESCRIBE
- DROP
- EXPORT and IMPORT DATABASE
- INSERT
- LOAD / INSTALL
- PIVOT
- Profiling
- SELECT
- SET / RESET
- SET VARIABLE
- SUMMARIZE
- Transaction Management
- UNPIVOT
- UPDATE
- USE
- VACUUM
- Query Syntax
- SELECT
- FROM and JOIN
- WHERE
- GROUP BY
- GROUPING SETS
- HAVING
- ORDER BY
- LIMIT and OFFSET
- SAMPLE
- Unnesting
- WITH
- WINDOW
- QUALIFY
- VALUES
- FILTER
- Set Operations
- Prepared Statements
- Data Types
- Overview
- Array
- Bitstring
- Blob
- Boolean
- Date
- Enum
- Interval
- List
- Literal Types
- Map
- NULL Values
- Numeric
- Struct
- Text
- Time
- Timestamp
- Time Zones
- Union
- Typecasting
- Expressions
- Overview
- CASE Expression
- Casting
- Collations
- Comparisons
- IN Operator
- Logical Operators
- Star Expression
- Subqueries
- TRY
- Functions
- Overview
- Aggregate Functions
- Array Functions
- Bitstring Functions
- Blob Functions
- Date Format Functions
- Date Functions
- Date Part Functions
- Enum Functions
- Interval Functions
- Lambda Functions
- List Functions
- Map Functions
- Nested Functions
- Numeric Functions
- Pattern Matching
- Regular Expressions
- Struct Functions
- Text Functions
- Time Functions
- Timestamp Functions
- Timestamp with Time Zone Functions
- Union Functions
- Utility Functions
- Window Functions
- Constraints
- Indexes
- Meta Queries
- DuckDB's SQL Dialect
- Overview
- Indexing
- Friendly SQL
- Keywords and Identifiers
- Order Preservation
- PostgreSQL Compatibility
- SQL Quirks
- Samples
- Configuration
- Extensions
- Overview
- Installing Extensions
- Advanced Installation Methods
- Distributing Extensions
- Versioning of Extensions
- Troubleshooting of Extensions
- Core Extensions
- Overview
- AutoComplete
- Avro
- AWS
- Azure
- Delta
- Encodings
- Excel
- Full Text Search
- httpfs (HTTP and S3)
- Iceberg
- Overview
- Iceberg REST Catalogs
- Amazon S3 Tables
- Amazon SageMaker Lakehouse (AWS Glue)
- Troubleshooting
- ICU
- inet
- jemalloc
- MySQL
- PostgreSQL
- Spatial
- SQLite
- TPC-DS
- TPC-H
- UI
- VSS
- Guides
- Overview
- Data Viewers
- Database Integration
- File Formats
- Overview
- CSV Import
- CSV Export
- Directly Reading Files
- Excel Import
- Excel Export
- JSON Import
- JSON Export
- Parquet Import
- Parquet Export
- Querying Parquet Files
- File Access with the file: Protocol
- Network and Cloud Storage
- Overview
- HTTP Parquet Import
- S3 Parquet Import
- S3 Parquet Export
- S3 Iceberg Import
- S3 Express One
- GCS Import
- Cloudflare R2 Import
- DuckDB over HTTPS / S3
- Fastly Object Storage Import
- Meta Queries
- Describe Table
- EXPLAIN: Inspect Query Plans
- EXPLAIN ANALYZE: Profile Queries
- List Tables
- Summarize
- DuckDB Environment
- ODBC
- Performance
- Overview
- Environment
- Import
- Schema
- Indexing
- Join Operations
- File Formats
- How to Tune Workloads
- My Workload Is Slow
- Benchmarks
- Working with Huge Databases
- Python
- Installation
- Executing SQL
- Jupyter Notebooks
- marimo Notebooks
- SQL on Pandas
- Import from Pandas
- Export to Pandas
- Import from Numpy
- Export to Numpy
- SQL on Arrow
- Import from Arrow
- Export to Arrow
- Relational API on Pandas
- Multiple Python Threads
- Integration with Ibis
- Integration with Polars
- Using fsspec Filesystems
- SQL Editors
- SQL Features
- Snippets
- Creating Synthetic Data
- Dutch Railway Datasets
- Sharing Macros
- Analyzing a Git Repository
- Importing Duckbox Tables
- Copying an In-Memory Database to a File
- Troubleshooting
- Glossary of Terms
- Browsing Offline
- Operations Manual
- Overview
- DuckDB's Footprint
- Logging
- Securing DuckDB
- Non-Deterministic Behavior
- Limits
- Development
- DuckDB Repositories
- Profiling
- Building DuckDB
- Overview
- Build Configuration
- Building Extensions
- Android
- Linux
- macOS
- Raspberry Pi
- Windows
- Python
- R
- Troubleshooting
- Unofficial and Unsupported Platforms
- Benchmark Suite
- Testing
- Internals
- Why DuckDB
- FAQ
- Code of Conduct
- Release Calendar
- Roadmap
- Sitemap
- Live Demo
The SELECT
statement retrieves rows from the database.
Examples
Select all columns from the table tbl
:
SELECT * FROM tbl;
Select the rows from tbl
:
SELECT j FROM tbl WHERE i = 3;
Perform an aggregate grouped by the column i
:
SELECT i, sum(j) FROM tbl GROUP BY i;
Select only the top 3 rows from the tbl
:
SELECT * FROM tbl ORDER BY i DESC LIMIT 3;
Join two tables together using the USING
clause:
SELECT * FROM t1 JOIN t2 USING (a, b);
Use column indexes to select the first and third column from the table tbl
:
SELECT #1, #3 FROM tbl;
Select all unique cities from the addresses table:
SELECT DISTINCT city FROM addresses;
Return a STRUCT
by using a row variable:
SELECT d
FROM (SELECT 1 AS a, 2 AS b) d;
Syntax
The SELECT
statement retrieves rows from the database. The canonical order of a SELECT
statement is as follows, with less common clauses being indented:
SELECT select_list
FROM tables
USING SAMPLE sample_expression
WHERE condition
GROUP BY groups
HAVING group_filter
WINDOW window_expression
QUALIFY qualify_filter
ORDER BY order_expression
LIMIT n;
Optionally, the SELECT
statement can be prefixed with a WITH
clause.
As the SELECT
statement is so complex, we have split up the syntax diagrams into several parts. The full syntax diagram can be found at the bottom of the page.
SELECT
Clause
The SELECT
clause specifies the list of columns that will be returned by the query. While it appears first in the clause, logically the expressions here are executed only at the end. The SELECT
clause can contain arbitrary expressions that transform the output, as well as aggregates and window functions. The DISTINCT
keyword ensures that only unique tuples are returned.
Column names are case-insensitive. See the Rules for Case Sensitivity for more details.
FROM
Clause
The FROM
clause specifies the source of the data on which the remainder of the query should operate. Logically, the FROM
clause is where the query starts execution. The FROM
clause can contain a single table, a combination of multiple tables that are joined together, or another SELECT
query inside a subquery node.
SAMPLE
Clause
The SAMPLE
clause allows you to run the query on a sample from the base table. This can significantly speed up processing of queries, at the expense of accuracy in the result. Samples can also be used to quickly see a snapshot of the data when exploring a data set. The SAMPLE
clause is applied right after anything in the FROM
clause (i.e., after any joins, but before the where clause or any aggregates). See the Samples page for more information.
WHERE
Clause
The WHERE
clause specifies any filters to apply to the data. This allows you to select only a subset of the data in which you are interested. Logically the WHERE
clause is applied immediately after the FROM
clause.
GROUP BY
and HAVING
Clauses
The GROUP BY
clause specifies which grouping columns should be used to perform any aggregations in the SELECT
clause. If the GROUP BY
clause is specified, the query is always an aggregate query, even if no aggregations are present in the SELECT
clause.
WINDOW
Clause
The WINDOW
clause allows you to specify named windows that can be used within window functions. These are useful when you have multiple window functions, as they allow you to avoid repeating the same window clause.
QUALIFY
Clause
The QUALIFY
clause is used to filter the result of WINDOW
functions.
ORDER BY
, LIMIT
and OFFSET
Clauses
ORDER BY
, LIMIT
and OFFSET
are output modifiers.
Logically they are applied at the very end of the query.
The ORDER BY
clause sorts the rows on the sorting criteria in either ascending or descending order.
The LIMIT
clause restricts the amount of rows fetched, while the OFFSET
clause indicates at which position to start reading the values.
VALUES
List
A VALUES
list is a set of values that is supplied instead of a SELECT
statement.
Row IDs
For each table, the rowid
pseudocolumn returns the row identifiers based on the physical storage.
CREATE TABLE t (id INTEGER, content VARCHAR);
INSERT INTO t VALUES (42, 'hello'), (43, 'world');
SELECT rowid, id, content FROM t;
rowid | id | content |
---|---|---|
0 | 42 | hello |
1 | 43 | world |
In the current storage, these identifiers are contiguous unsigned integers (0, 1, …) if no rows were deleted. Deletions introduce gaps in the rowids which may be reclaimed later:
CREATE OR REPLACE TABLE t AS (FROM range(10) r(i));
DELETE FROM t WHERE i % 2 = 0;
SELECT rowid FROM t;
rowid |
---|
1 |
3 |
5 |
7 |
9 |
The rowid
values are stable within a transaction.
Bestpractice It is strongly advised to avoid using rowids as identifiers.
If there is a user-defined column named
rowid
, it shadows therowid
pseudocolumn.
Common Table Expressions
Full Syntax Diagram
Below is the full syntax diagram of the SELECT
statement: