Sitting Duck v1.7.0

Release date: 2026-04-10

This release brings four user-visible changes and a large round of internal refactoring that prepares ast_select for per-row (column-valued) dispatch.

Highlights

  • parse_ast_list() — a scalar variant of parse_ast() that returns LIST<STRUCT>. Usable inside CTEs, scalar expressions, and lateral joins where a table function isn't allowed.
  • +schema extraction suffix — any extraction-level parameter (context, source, structure, peek) can take a +schema suffix that keeps the full column set in the output schema but skips computing the underlying data. Enables SQL macros to declare stable output schemas independent of which columns they actually use.
  • qualified_name is now a structured path, LIST(STRUCT(semantic_type, name, index)), instead of a bracket string. Each segment carries the full semantic type (composable with is_function_definition, etc.), the identifier name, and an explicit 1-based occurrence index that disambiguates same-scope duplicates. This makes qualified_name unique within a file and trivially queryable via DuckDB's list/struct functions. A new ast_qualified_name_as_string() macro renders the list as C[User] F[__init__]-style strings for display, logging, and LIKE matching.
  • :match / :contains pseudo-classes (replacing :matches) — ast_select now distinguishes "the current node IS the parsed pattern" (:match, strict) from "some descendant IS the parsed pattern" (:contains, lenient).

Breaking changes

qualified_name column: VARCHAR → LIST<STRUCT>

Before (legacy, pre-v1.7.0 dev):

qualified_name: VARCHAR
Values: 'C/User F/__init__ V/name'

After:

qualified_name: LIST(STRUCT(semantic_type SEMANTIC_TYPE, name VARCHAR, index INTEGER))
Values: [{semantic_type: DEFINITION_CLASS,    name: User,     index: 1},
         {semantic_type: DEFINITION_FUNCTION, name: __init__, index: 1},
         {semantic_type: DEFINITION_VARIABLE, name: name,     index: 1}]

Why: structured scope paths compose cleanly with DuckDB's list/struct operators, integrate with the existing semantic type predicates, and make the explicit per-scope collision index queryable without parsing. The bracket string format it replaces was a transitional step that had a short shelf life in internal v1.7.0 builds and never shipped to the community extension index.

Migration: if you wrote tooling against the bracket string format during v1.7.0 development, either migrate to structural queries or wrap column references in the new display helper:

Old pattern                   | New pattern
------------------------------|------------------------------------------------------------
qualified_name LIKE '%F[%'    | len(list_filter(qualified_name, s -> is_function_definition(s.semantic_type))) > 0
qualified_name LIKE '%[foo]%' | list_contains(list_transform(qualified_name, s -> s.name), 'foo')
Display / debug output        | ast_qualified_name_as_string(qualified_name)
Composite join key            | USING (file_path, qualified_name); struct list equality works as a join key unchanged

Note that ast_qualified_name_as_string() preserves the old LIKE-idiom workflow: it renders the list as the same C[User] F[__init__] V[name] strings the interim format used, so ast_qualified_name_as_string(qualified_name) LIKE '%F[%' is a drop-in replacement for the old bracket-string LIKE queries.
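The structural patterns can be tried on a literal value in plain DuckDB. This is a minimal sketch, not the extension's own test: semantic_type is modeled as a plain VARCHAR (an assumption, so the query runs without Sitting Duck loaded), and a string comparison stands in for is_function_definition().

```sql
-- Stand-in row: semantic_type is VARCHAR here (assumption), not SEMANTIC_TYPE.
WITH t AS (
    SELECT [
        {'semantic_type': 'DEFINITION_CLASS',    'name': 'User',     'index': 1},
        {'semantic_type': 'DEFINITION_FUNCTION', 'name': '__init__', 'index': 1},
        {'semantic_type': 'DEFINITION_VARIABLE', 'name': 'name',     'index': 1}
    ] AS qualified_name
)
SELECT
    -- was: qualified_name LIKE '%F[%'
    len(list_filter(qualified_name,
                    s -> (s.semantic_type = 'DEFINITION_FUNCTION'))) > 0 AS has_function_scope,
    -- was: qualified_name LIKE '%[User]%'
    list_contains(list_transform(qualified_name, s -> (s.name)), 'User') AS mentions_user
FROM t;
```

With the extension loaded, the string comparison would be replaced by is_function_definition(s.semantic_type) as shown in the migration table.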

:matches() pseudo-class renamed to :contains(), :match() added with new semantics

In CSS, :matches / :is has historically operated on the current element, but Sitting Duck's :matches performed a subtree scan, which contradicted that convention. This is now fixed:

  • :match("code") — the current node IS the root of the parsed pattern. Strict: target's node type must equal the pattern root's type. .func:match("db.execute()") returns zero rows (a function is not a call).
  • :contains("code") — some descendant IS the root of the parsed pattern. Equivalent to :has(:match("code")) — the old :matches behavior.

Migration: every existing :matches("...") usage can be replaced with :contains("...") with identical semantics. :matches is removed, not aliased — typos fail loudly (see unknown-pseudo-class detection below).

The :match / :contains split unlocks cleaner selectors:

-- Exact call pattern (was impossible before):
call:match("db.execute()")

-- Functions that contain the pattern (old :matches behavior):
.func:contains("db.execute()")
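As a rough sketch of the migration in a full query, the two statements below should select the same rows, given the equivalence stated above. The call shape ast_select(source, selector) and the output columns file_path and name are assumptions here, so check the API reference for the exact signature.

```sql
-- Assumption: ast_select(source, selector) is a table function that returns
-- read_ast-style rows. The signature is not spelled out in these notes.

-- Functions whose subtree contains the call (the old :matches behavior):
SELECT file_path, name
FROM ast_select('src/**/*.py', '.func:contains("db.execute()")');

-- The same query written with :has(:match(...)):
SELECT file_path, name
FROM ast_select('src/**/*.py', '.func:has(:match("db.execute()"))');
```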

New features expanded

parse_ast_list() scalar function

-- Count nodes in a snippet
SELECT length(parse_ast_list('def hello(): pass', 'python'));

-- Use inside a CTE where parse_ast() table function isn't allowed
WITH nodes AS (
    SELECT unnest(parse_ast_list('x = 1; y = 2', 'python')) AS n
)
SELECT n.type, n.name FROM nodes WHERE n.name IS NOT NULL;

Closes #62. Returns LIST(STRUCT(...)) whose element type matches the default read_ast flat schema. Internally used by ast_select for pattern parsing in :match/:contains.
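The lateral-join case mentioned in the highlights might look like the sketch below. The snippets table is hypothetical, and the example assumes parse_ast_list() accepts a column value for its source argument.

```sql
-- Hypothetical input table; only parse_ast_list() comes from this release.
CREATE TEMP TABLE snippets AS
SELECT * FROM (VALUES
    (1, 'def hello(): pass'),
    (2, 'x = 1; y = 2')
) AS v(id, code);

-- One output row per named node, per snippet.
SELECT s.id, nodes.n.type, nodes.n.name
FROM snippets AS s,
     LATERAL (SELECT unnest(parse_ast_list(s.code, 'python')) AS n) AS nodes
WHERE nodes.n.name IS NOT NULL;
```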

+schema extraction suffix

Any of the extraction-level parameters can take a +schema suffix that keeps all columns in the output schema as NULLs without computing them:

-- Skip peek computation but keep the peek column in the schema
SELECT name, peek FROM read_ast('file.py', peek := 'none+schema');
-- peek is always NULL, but the column still exists and has the right type.

-- Keep native columns in the schema, compute only up to normalized level
SELECT * FROM read_ast('file.py', context := 'normalized+schema');
-- parameters, modifiers, annotations, signature_type are all NULL

Closes #61. The primary use case is SQL macros that need a stable output schema regardless of which columns they populate. ast_select uses peek := 'none+schema' internally to skip peek computation while keeping the column available for selectors that reference it.
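A macro that relies on this could be sketched as follows. The macro name named_nodes is hypothetical; read_ast and peek := 'none+schema' are taken from the examples above.

```sql
-- Hypothetical macro: callers always see a peek column, but it is never computed.
CREATE OR REPLACE MACRO named_nodes(path) AS TABLE
    SELECT *
    FROM read_ast(path, peek := 'none+schema')
    WHERE name IS NOT NULL;

-- peek is NULL in every row, yet the SELECT list still binds.
SELECT name, peek FROM named_nodes('file.py');
```

The design point is that downstream queries can reference peek unconditionally, so switching the macro to a computed peek level later would not change its output schema.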

qualified_name as a structured scope path

qualified_name is now LIST(STRUCT(semantic_type SEMANTIC_TYPE, name VARCHAR, index INTEGER)). Each element of the list represents one scope level, from outermost to innermost, with an explicit index that disambiguates same-scope collisions:

counter = 0    # [{DEFINITION_VARIABLE, counter, 1}]
counter = 1    # [{DEFINITION_VARIABLE, counter, 2}]
counter = 2    # [{DEFINITION_VARIABLE, counter, 3}]

Counters reset at scope boundaries — each function/class gets a fresh counter, so [{function, foo}, {variable, x}] and [{function, bar}, {variable, x}] both use index = 1 for their x.
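The per-scope numbering rule can be modeled in plain DuckDB with a window function. This only illustrates the rule and is not the extension's implementation; the defs table and its columns are hypothetical.

```sql
-- Hypothetical input: one row per definition, in source order.
WITH defs(pos, scope, name) AS (
    VALUES (1, 'foo', 'x'),
           (2, 'foo', 'x'),
           (3, 'bar', 'x'),
           (4, 'foo', 'y')
)
SELECT pos, scope, name,
       -- The counter restarts for every (scope, name) pair.
       row_number() OVER (PARTITION BY scope, name ORDER BY pos) AS "index"
FROM defs
ORDER BY pos;
-- index column, in pos order: 1, 2, 1, 1
```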

Structured queries become trivial:

-- Innermost enclosing class of each function
SELECT name,
       (list_reverse(list_filter(qualified_name,
                                 s -> is_class_definition(s.semantic_type))))[1].name
FROM read_ast('src/**/*.py')
WHERE is_function_definition(semantic_type);

-- Depth-based filter (top-level definitions only)
SELECT name FROM read_ast('src/**/*.py')
WHERE len(qualified_name) = 1 AND is_function_definition(semantic_type);

-- Innermost segment directly
SELECT qualified_name[-1].name, qualified_name[-1].index
FROM read_ast('src/**/*.py');

For display, logging, or LIKE-style queries, ast_qualified_name_as_string(qualified_name) renders the list as C[User] F[__init__] V[name][2] — the same format that was briefly the native column type during v1.7.0 development. The single-letter prefixes are F / C / V / M / I / E, and the [N] suffix is omitted for index = 1 so the common case stays clean.
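The rendering rule (prefix, bracketed name, and an [N] suffix only when index > 1) can be approximated in plain DuckDB. This is a sketch of the format, not the macro's source: each segment carries a precomputed prefix letter, which sidesteps the semantic-type-to-letter mapping that these notes do not spell out.

```sql
-- Assumption: 'prefix' is a stand-in for the letter derived from semantic_type.
WITH t AS (
    SELECT [
        {'prefix': 'C', 'name': 'User',     'index': 1},
        {'prefix': 'F', 'name': '__init__', 'index': 1},
        {'prefix': 'V', 'name': 'name',     'index': 2}
    ] AS segments
)
SELECT array_to_string(
           list_transform(segments, s -> (
               s.prefix || '[' || s.name || ']' ||
               -- The suffix is omitted for the first occurrence.
               CASE WHEN s.index > 1
                    THEN '[' || CAST(s.index AS VARCHAR) || ']'
                    ELSE '' END)),
           ' ') AS rendered
FROM t;
-- rendered: C[User] F[__init__] V[name][2]
```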

qualified_name is now unique within a file for all named definition nodes, which lets tooling use it as a composite join key without any extra disambiguation machinery.
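For instance, two snapshots of the same file could be matched definition by definition. The sketch below runs in plain DuckDB; the snapshot tables and the line_count column are hypothetical, and semantic_type is again modeled as VARCHAR.

```sql
-- Hypothetical snapshots of one definition before and after a change.
WITH before_defs AS (
    SELECT 'app.py' AS file_path,
           [{'semantic_type': 'DEFINITION_FUNCTION', 'name': 'load', 'index': 1}] AS qualified_name,
           10 AS line_count
),
after_defs AS (
    SELECT 'app.py' AS file_path,
           [{'semantic_type': 'DEFINITION_FUNCTION', 'name': 'load', 'index': 1}] AS qualified_name,
           14 AS line_count
)
SELECT file_path,
       qualified_name[-1].name       AS name,
       a.line_count - b.line_count   AS growth
FROM before_defs AS b
JOIN after_defs  AS a USING (file_path, qualified_name);  -- list-of-struct equality
```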

Bug fixes (carried from v1.6.1)

v1.6.1 was tagged for internal history but the community extensions repo jumped straight from v1.6.0 to v1.7.0 — these fixes are included in v1.7.0:

  • :not(:pseudo-class) now works. Previously :not() only handled :not(:has(...)), so :not(:is-called), :not(:is-referenced), and the dead-code patterns documented in the v1.6.0 selector guide were silently dropped from the filter. Queries returned the unfiltered set instead of the filtered one.
  • Unknown pseudo-classes now raise clear errors instead of silently matching everything. Typos like :unreferenced or :callers(2) used to fall through to ELSE true in the dispatch CASE. There's now an enumerated allow-list and an error() call.
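The allow-list-plus-error() pattern can be sketched in plain DuckDB. The list of names below is illustrative and only includes pseudo-classes mentioned in these notes; it is not the extension's actual allow-list.

```sql
-- Illustrative dispatch: known names pass, anything else raises.
WITH pseudo(name) AS (VALUES ('has'), ('not'), ('is-called'))
SELECT name,
       CASE
           WHEN name IN ('has', 'not', 'is-called', 'is-referenced', 'match', 'contains')
               THEN true
           -- Previously the equivalent branch was ELSE true, so typos matched everything.
           ELSE error('unknown pseudo-class: :' || name)
       END AS accepted
FROM pseudo;
```

Adding a row such as ('unreferenced') to the VALUES list makes the query fail with the error message instead of returning true.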

Documentation

  • Four broken example selectors fixed in tutorial.md and examples.md (they used pseudo-classes that didn't exist).
  • Eight previously-undocumented semantic type aliases added to the reference table (.external, .typedef, .pattern, .statement, .syntax, .transform, .label, .comp).
  • ::previous-sibling noted as an alias for ::prev-sibling.
  • qualified_name format documented with LIKE-matching properties and the collision-suffix rule.
  • New parse_ast_list() and +schema sections in docs/api/core-functions.md and docs/api/parameters.md.

Internal / infrastructure

css_selectors.sql received a large round of internal refactoring: several CTEs were rewritten from correlated subqueries to flat joins. User-visible behavior is unchanged; the goal is to unblock column-valued dispatch for a future ast_select_rules() / ast_select_list() API (currently pending an upstream DuckDB planner fix).

  • sel CTEs split into typed views (sel_tag_names, sel_class_names, sel_pseudo_classes, …) so each references sel exactly once.
  • root_type, combinator_parts, simple_type, simple_id, simple_class, :has, :not, attribute, and additional-pseudo-class extraction all rewritten as flat joins.
  • pseudo_element unwrap logic refactored to avoid nested correlated subqueries.
  • New parse_ast_list_table() table macro wrapping the parse_ast_list() scalar, so ast_select's sel CTE can accept column-valued selector arguments (pending upstream fix).

These refactors don't change any behavior from the outside — all 98+ existing tests pass unchanged — but they were prerequisites for the row-shaped dispatch we need for multi-rule selector APIs.
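The general shape of such a rewrite, from correlated subquery to flat join, is sketched below. The tables are hypothetical stand-ins and do not reflect the real css_selectors.sql schema.

```sql
-- Hypothetical stand-ins for the selector tables.
CREATE TEMP TABLE sel AS
SELECT * FROM (VALUES (1, '.func'), (2, 'call')) AS v(sel_id, selector);

CREATE TEMP TABLE sel_parts AS
SELECT * FROM (VALUES (1, 'class', 'func'), (2, 'tag', 'call')) AS v(sel_id, kind, value);

-- Correlated form: the subquery is logically evaluated once per outer row.
SELECT s.sel_id,
       (SELECT p.value FROM sel_parts AS p
        WHERE p.sel_id = s.sel_id AND p.kind = 'class') AS class_name
FROM sel AS s
ORDER BY s.sel_id;

-- Flat form: a single LEFT JOIN yields the same rows.
SELECT s.sel_id, p.value AS class_name
FROM sel AS s
LEFT JOIN sel_parts AS p
       ON p.sel_id = s.sel_id AND p.kind = 'class'
ORDER BY s.sel_id;
```

The two forms agree as long as each selector has at most one matching part; with several matches the scalar subquery would raise an error while the join would return multiple rows.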

Test counts

  • v1.6.0: 93 tests / 4671 assertions
  • v1.7.0: 99 tests / 4786 assertions

Not in this release (but coming)

  • ast_select_rules(source, query, language): parses a multi-rule CSS query once, extracts per-rule selector text and declarations, and dispatches ast_select per rule. The implementation is committed in its "as-designed" form but hits a DuckDB v1.5.1 planner bug in ColumnBindingResolver during dependent-join flattening (duckdb/duckdb#21890). It will light up once the upstream fix lands; no Sitting Duck changes are required, just a submodule bump.

Changelog since v1.6.1

  • 343cfae Add collision-only [N] suffix for unique qualified_name within a file
  • 2a07640 Add ast_select_rules / ast_select_list (WIP pending upstream DuckDB fix)
  • 7bba630 Refactor pseudo_element to avoid nested correlated subqueries
  • eb79a0f Refactor :has, :not, attribute, pseudo-class extraction to flat joins
  • 69112d3 Refactor ast_select root resolution to avoid correlated subqueries
  • 86587bc WIP: parse_ast_list_table wrapper + partial column-valued ast_select fix
  • a963034 Split :matches into :match (current-node) and :contains (subtree)
  • 5c8b5b9 feat: +schema extraction suffix, parse_ast_list(), qualified_name format (#66)