Output Schema¶
Complete reference for read_ast() output columns.
Quick Reference¶
Default output:
| Column | Type | Description |
|---|---|---|
node_id |
BIGINT | Unique node identifier within file |
type |
VARCHAR | Tree-sitter AST node type |
semantic_type |
SEMANTIC_TYPE | Universal semantic category |
flags |
UTINYINT | Node property flags |
name |
VARCHAR | Extracted identifier name |
signature_type |
VARCHAR | Type/return type information |
parameters |
STRUCT[] | Function parameters with names and types |
modifiers |
VARCHAR[] | Access modifiers and keywords |
annotations |
VARCHAR | Decorator/annotation text |
qualified_name |
LIST<STRUCT> | Scope-based path as segment list (semantic_type + name + index), unique within a file |
scope |
STRUCT | Scope info: {current, function, class, module, stack}. See scope below. |
file_path |
VARCHAR | Source file path |
language |
VARCHAR | Programming language |
start_line |
UINTEGER | Starting line (1-based) |
end_line |
UINTEGER | Ending line (1-based) |
parent_id |
BIGINT | Parent node ID |
depth |
UINTEGER | Tree depth (0 for root) |
sibling_index |
UINTEGER | Position among siblings |
children_count |
UINTEGER | Direct child count |
descendant_count |
UINTEGER | Total descendant count |
peek |
VARCHAR | Source code snippet |
Additional columns with source := 'full':
| Column | Type | Description |
|---|---|---|
start_column |
UINTEGER | Starting column (1-based) |
end_column |
UINTEGER | Ending column (1-based) |
-- Get column positions with source := 'full'
SELECT name, start_line, start_column, end_line, end_column
FROM read_ast('test/data/python/sample_app.py', source := 'full')
WHERE is_function_definition(semantic_type);
Identity Columns¶
node_id¶
Type: BIGINT
Unique identifier for each AST node within a file.
- Starts at 0 for the root node
- Sequential within each file
- Use with
parent_idfor tree traversal
type¶
Type: VARCHAR
Language-specific AST node type from Tree-sitter.
-- Common types vary by language
SELECT DISTINCT type
FROM read_ast('test/data/python/sample_app.py')
ORDER BY type;
Python examples: function_definition, class_definition, identifier
JavaScript examples: function_declaration, class_declaration, identifier
semantic_type¶
Type: SEMANTIC_TYPE (custom logical type)
Universal semantic category. This is a custom DuckDB type that:
- Displays as string - Shows DEFINITION_FUNCTION instead of numeric code
- Supports string comparison - WHERE semantic_type = 'DEFINITION_FUNCTION'
- Stores efficiently - Underlying storage is UTINYINT for fast comparisons
-- Direct string comparison (natural syntax)
SELECT * FROM read_ast('test/data/python/sample_app.py')
WHERE semantic_type = 'DEFINITION_FUNCTION';
-- Group by semantic type
SELECT semantic_type, COUNT(*)
FROM read_ast('test/data/python/sample_app.py')
GROUP BY semantic_type
ORDER BY COUNT(*) DESC;
- Cross-language semantic classification
- See Semantic Types for values
flags¶
Type: UTINYINT
Bitfield containing node properties. Use helper predicates to check flags:
| Flag | Predicate | Description |
|---|---|---|
IS_CONSTRUCT |
is_construct(flags) |
Semantic construct (not punctuation) |
IS_EMBODIED |
is_embodied(flags) / has_body(flags) |
Has implementation body |
IS_DECLARATION_ONLY |
is_declaration_only(flags) |
Forward declaration without body |
IS_SYNTAX_ONLY |
is_syntax_only(flags) |
Pure syntax token (keyword, punctuation) |
-- Find function definitions with implementations (not just declarations)
SELECT name, file_path
FROM read_ast('**/*.cpp', ignore_errors := true)
WHERE is_function_definition(semantic_type)
AND has_body(flags);
-- Find forward declarations only
SELECT name, file_path
FROM read_ast('**/*.h', ignore_errors := true)
WHERE is_function_definition(semantic_type)
AND is_declaration_only(flags);
Name and Extraction Columns¶
name¶
Type: VARCHAR (nullable)
Extracted identifier or name for the node.
SELECT name, type, start_line
FROM read_ast('test/data/python/sample_app.py')
WHERE name IS NOT NULL;
- NULL for nodes without meaningful names
- Populated for definitions, identifiers, and named constructs
- Extraction depends on language-specific rules
signature_type¶
Type: VARCHAR (nullable)
Type information extracted from the node. Meaning varies by semantic type:
| Semantic Type | signature_type Contains |
|---|---|
DEFINITION_FUNCTION |
Return type (int, void, *big.Int) |
DEFINITION_CLASS |
Class kind (class, interface, trait, struct) |
COMPUTATION_CALL |
Full call expression (obj.method, pkg.func) |
DEFINITION_VARIABLE |
Variable type |
-- Find functions with their return types
SELECT name, signature_type as return_type, start_line
FROM read_ast('src/**/*.go')
WHERE is_function_definition(semantic_type);
-- Find method calls by signature
SELECT file_path, start_line, signature_type
FROM read_ast('src/**/*.cpp')
WHERE is_function_call(semantic_type)
AND signature_type LIKE '%.empty';
parameters¶
Type: STRUCT("name" VARCHAR, "type" VARCHAR)[]
Function parameters as an array of structs containing parameter name and type.
-- List functions with their parameters
SELECT
name,
parameters,
array_length(parameters) as param_count
FROM read_ast('test/data/python/sample_app.py')
WHERE is_function_definition(semantic_type);
-- Find functions with specific parameter names
SELECT name, parameters
FROM read_ast('src/**/*.py')
WHERE is_function_definition(semantic_type)
AND list_contains([p.name FOR p IN parameters], 'self');
- Empty array
[]for functions with no parameters - Parameter types may be NULL for dynamically-typed languages
modifiers¶
Type: VARCHAR[]
Access modifiers, keywords, and other declarative attributes.
-- Find public static methods in Java
SELECT name, modifiers
FROM read_ast('src/**/*.java')
WHERE is_function_definition(semantic_type)
AND list_contains(modifiers, 'public')
AND list_contains(modifiers, 'static');
-- Find async functions
SELECT name, file_path
FROM read_ast('src/**/*.js')
WHERE is_function_definition(semantic_type)
AND list_contains(modifiers, 'async');
Common modifier values by language:
- Java: public, private, protected, static, final, abstract
- JavaScript: async, const, let, var
- Go: var
- Rust: pub, mut, async
annotations¶
Type: VARCHAR (nullable)
Decorator or annotation text for the node.
-- Find decorated Python functions
SELECT name, annotations
FROM read_ast('src/**/*.py')
WHERE is_function_definition(semantic_type)
AND annotations IS NOT NULL;
qualified_name¶
Type: LIST(STRUCT(semantic_type SEMANTIC_TYPE, name VARCHAR, index INTEGER)) (nullable)
Scope-based definition path that uniquely identifies a node within its file. Each element of the list is one segment of the path from outermost scope to innermost. Built during AST traversal by tracking nested definition scopes.
Segment fields:
| Field | Type | Meaning |
|---|---|---|
semantic_type |
SEMANTIC_TYPE |
The semantic role of this scope level (e.g. DEFINITION_FUNCTION, DEFINITION_CLASS, DEFINITION_VARIABLE, DEFINITION_MODULE, EXTERNAL_IMPORT, EXTERNAL_EXPORT). Cast to VARCHAR for the long-form name. |
name |
VARCHAR |
The identifier name at that scope level. |
index |
INTEGER |
1-based occurrence index for disambiguating same-name collisions within a scope. The first x = ... in a scope gets index = 1; the second gets index = 2; and so on. Always explicit. |
-- Inspect the raw segment list
SELECT name, qualified_name
FROM read_ast('src/models.py')
WHERE name = 'validate';
-- validate [{semantic_type: DEFINITION_CLASS, name: User, index: 1},
-- {semantic_type: DEFINITION_FUNCTION, name: validate, index: 1}]
-- Filter by semantic role at any depth (e.g. all definitions inside a class)
SELECT name
FROM read_ast('src/**/*.py')
WHERE len(list_filter(qualified_name,
s -> is_class_definition(s.semantic_type))) > 0;
-- Innermost enclosing class name
SELECT name,
(list_reverse(list_filter(qualified_name,
s -> is_class_definition(s.semantic_type))))[1].name AS in_class
FROM read_ast('src/**/*.py')
WHERE is_function_definition(semantic_type);
-- Top-level definitions only (depth = 1)
SELECT name
FROM read_ast('src/**/*.py')
WHERE len(qualified_name) = 1 AND is_function_definition(semantic_type);
-- Innermost segment
SELECT name, qualified_name[-1].name, qualified_name[-1].index
FROM read_ast('src/**/*.py')
WHERE is_definition(semantic_type);
-- Composite join key across parses
SELECT a.name, a.start_line, b.start_line
FROM read_ast('v1/**/*.py') a
JOIN read_ast('v2/**/*.py') b USING (file_path, qualified_name);
Collision disambiguation. Within a single file, each (semantic_type, name) pair in a given scope gets a monotonically increasing index. The first occurrence has index = 1, the second has index = 2, and so on. This makes qualified_name unique within a file for every named definition.
counter = 0 # [{variable, counter, 1}]
counter = 1 # [{variable, counter, 2}]
counter = 2 # [{variable, counter, 3}]
Counters reset at scope boundaries — each function/class gets a fresh counter, so [{function, foo}, {variable, x}] and [{function, bar}, {variable, x}] both use index = 1 for their x.
String form. For display, logging, or test assertions, ast_qualified_name_as_string(qualified_name) renders the list into the legacy bracket form:
SELECT name, ast_qualified_name_as_string(qualified_name)
FROM read_ast('src/models.py')
WHERE name = 'validate';
-- validate C[User] F[validate]
The bracket form uses single-letter prefixes (F/C/V/M/I/E) and omits the [N] suffix for index = 1, so V[x] means "the first x" and V[x][2] means "the second x in that scope".
Key properties:
- NULL for non-definition nodes (the
qualified_namecolumn holds a NULL list, not an empty one) - Only populated for named definition nodes (functions, classes, variables, modules, imports, exports)
- Available at
context := 'normalized'and above - Unique within a file — every named definition node gets a distinct segment list
- Joinable:
USING (file_path, qualified_name)works as a cross-parse composite key
Source Location Columns¶
file_path¶
Type: VARCHAR
Path to the source file.
- Relative paths preserved from input
- Absolute paths if provided as input
language¶
Type: VARCHAR
Detected or specified programming language.
SELECT language, COUNT(*) as node_count
FROM read_ast(['**/*.py', '**/*.js'], ignore_errors := true)
GROUP BY language;
start_line / end_line¶
Type: UINTEGER
Line numbers (1-based).
SELECT name, start_line, end_line, end_line - start_line + 1 as line_count
FROM read_ast('test/data/python/sample_app.py')
WHERE type = 'function_definition';
start_column / end_column¶
Type: UINTEGER
Column positions (1-based). Only available with source := 'full'.
-- Must use source := 'full' to get column positions
SELECT name, start_line, start_column, end_line, end_column
FROM read_ast('test/data/python/sample_app.py', source := 'full')
WHERE type = 'identifier';
Tree Structure Columns¶
parent_id¶
Type: BIGINT (nullable)
Node ID of the parent node.
-- Find children of a specific node
SELECT type, name
FROM read_ast('test/data/python/sample_app.py')
WHERE parent_id = 5;
- NULL for root node
- Use for tree traversal
depth¶
Type: UINTEGER
Tree depth (0 for root).
-- Find deeply nested code
SELECT type, name, depth
FROM read_ast('test/data/python/sample_app.py')
WHERE depth > 5
ORDER BY depth DESC;
sibling_index¶
Type: UINTEGER
Position among siblings (0-based).
SELECT type, sibling_index
FROM read_ast('test/data/python/sample_app.py')
WHERE parent_id = 0
ORDER BY sibling_index;
children_count¶
Type: UINTEGER
Number of direct child nodes.
SELECT type, name, children_count
FROM read_ast('test/data/python/sample_app.py')
WHERE children_count > 10;
descendant_count¶
Type: UINTEGER
Total number of descendant nodes. Useful as a complexity metric.
-- Find complex functions
SELECT name, descendant_count as complexity
FROM read_ast('test/data/python/sample_app.py')
WHERE type = 'function_definition'
ORDER BY complexity DESC;
Scope Column¶
scope¶
Type: STRUCT<current BIGINT, function BIGINT, class BIGINT, module BIGINT, stack LIST<STRUCT<id BIGINT, kind SEMANTIC_TYPE>>>
Unified scope information for each node. Replaces the pre-v1.7.4 scope_id and scope_stack columns.
Named shortcuts (populated on every node):
| Field | Type | Description |
|---|---|---|
scope.current |
BIGINT | Nearest enclosing scope's node_id. NULL at root. |
scope.function |
BIGINT | Nearest enclosing function ancestor's node_id. NULL if not inside a function. |
scope.class |
BIGINT | Nearest enclosing class/struct/trait ancestor's node_id. NULL if not inside a class. |
scope.module |
BIGINT | Nearest enclosing module/namespace ancestor's node_id. NULL if not inside a named module. |
Scope stack (populated on scope-boundary nodes only):
| Field | Type | Description |
|---|---|---|
scope.stack |
LIST<STRUCT> | Full chain of enclosing scopes from outermost to this node. Each entry has id (node_id) and kind (SEMANTIC_TYPE). NULL on non-scope nodes. |
-- "What function is this call inside?" — single field read, no join needed
SELECT name, scope.function AS enclosing_fn
FROM read_ast('src/**/*.py')
WHERE semantic_type = 'COMPUTATION_CALL';
-- "All class methods (inside a class but not inside a nested function)"
SELECT name, scope.class, scope.function
FROM read_ast('src/**/*.py')
WHERE semantic_type = 'DEFINITION_FUNCTION'
AND scope.class IS NOT NULL;
-- "Find calls inside a try block (kind-filtered stack walk)"
SELECT name FROM read_ast('src/**/*.py') c
JOIN read_ast('src/**/*.py') s
ON s.node_id = c.scope.current AND s.file_path = c.file_path
WHERE c.semantic_type = 'COMPUTATION_CALL'
AND s.scope.stack IS NOT NULL
AND len(list_filter(s.scope.stack, e -> e.kind = 'ERROR_TRY')) > 0;
Content Columns¶
peek¶
Type: VARCHAR (nullable)
Source code snippet for the node.
SELECT type, name, peek
FROM read_ast('test/data/python/sample_app.py', peek := 100)
WHERE type = 'function_definition';
- Size controlled by
peekparameter - NULL if
peek := 'none' - Use
peek := 'full'for complete source text
Column Availability by Parameter¶
By context Parameter¶
| Column | 'none' |
'node_types_only' |
'normalized' |
'native' |
|---|---|---|---|---|
node_id |
Yes | Yes | Yes | Yes |
type |
Yes | Yes | Yes | Yes |
semantic_type |
No | Yes | Yes | Yes |
flags |
No | Yes | Yes | Yes |
name |
No | No | Yes | Yes |
signature_type |
No | No | No | Yes |
parameters |
No | No | No | Yes |
modifiers |
No | No | No | Yes |
annotations |
No | No | No | Yes |
qualified_name |
No | No | Yes | Yes |
file_path |
Yes | Yes | Yes | Yes |
language |
Yes | Yes | Yes | Yes |
start_line |
Yes | Yes | Yes | Yes |
end_line |
Yes | Yes | Yes | Yes |
parent_id |
Yes* | Yes* | Yes* | Yes |
depth |
Yes* | Yes* | Yes* | Yes |
sibling_index |
Yes* | Yes* | Yes* | Yes |
children_count |
Yes* | Yes* | Yes* | Yes |
descendant_count |
Yes* | Yes* | Yes* | Yes |
peek |
Yes** | Yes** | Yes** | Yes** |
* Depends on structure parameter
** Depends on peek parameter
By source Parameter¶
| Column | 'none' |
'path' |
'lines_only' |
'lines' |
'full' |
|---|---|---|---|---|---|
file_path |
No | Yes | Yes | Yes | Yes |
start_line |
No | No | Yes | Yes | Yes |
end_line |
No | No | Yes | Yes | Yes |
start_column |
No | No | No | No | Yes |
end_column |
No | No | No | No | Yes |
-- Default: line positions only
SELECT start_line, end_line FROM read_ast('test/data/python/sample_app.py');
-- With source := 'full': includes column positions
SELECT start_line, start_column, end_line, end_column
FROM read_ast('test/data/python/sample_app.py', source := 'full');
Working with Output¶
Create Table from Results¶
CREATE TABLE ast_cache AS
SELECT * FROM read_ast('src/**/*.py', ignore_errors := true);
-- Query cached data
SELECT type, COUNT(*) FROM ast_cache GROUP BY type;
Export to Parquet¶
COPY (
SELECT file_path, type, name, start_line, semantic_type
FROM read_ast('src/**/*.py')
) TO 'ast_export.parquet';
Join with Other Data¶
-- Join with file metadata
SELECT
ast.file_path,
ast.name,
files.last_modified
FROM read_ast('src/**/*.py') ast
JOIN file_metadata files ON ast.file_path = files.path
WHERE ast.type = 'function_definition';
Next Steps¶
- Core Functions - Function reference
- Parameters - Parameter reference
- Semantic Types - Type system
- Native Extraction Semantics - Cross-language field behavior