Parameters Reference¶
Complete reference for all read_ast() parameters.
Required Parameters¶
File Patterns¶
Type: VARCHAR or LIST(VARCHAR)
Specify which files to parse.
-- Single file
SELECT * FROM read_ast('main.py');
-- Glob pattern
SELECT * FROM read_ast('src/**/*.py');
-- File array
SELECT * FROM read_ast(['main.py', 'utils.py']);
-- Mixed array
SELECT * FROM read_ast(['src/**/*.py', 'lib/**/*.js', 'main.cpp']);
Glob Syntax¶
| Pattern | Matches |
|---|---|
| * | Any characters except / |
| ** | Any path (recursive) |
| ? | Single character |
| [abc] | Character set |
| {a,b} | Alternatives |
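The patterns in the table can be combined within a single path. The queries below are a minimal sketch of each one; the directory and file names are hypothetical and only illustrate where each pattern would sit.

```sql
-- ? matches a single character: test_a.py, test_1.py, but not test_ab.py
SELECT * FROM read_ast('tests/test_?.py');

-- [abc] matches one character from the set
SELECT * FROM read_ast('lib/[abc]*.py');

-- {a,b} selects between alternatives, here combined with a recursive **
SELECT * FROM read_ast('src/**/*.{js,ts}');
```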
Language Parameter¶
Type: VARCHAR
Default: 'auto'
Override automatic language detection.
-- Auto-detect (default)
SELECT * FROM read_ast('script.py');
-- Explicit language
SELECT * FROM read_ast('script.txt', 'python');
-- For arrays (applies to all files)
SELECT * FROM read_ast(['**/*.txt'], 'javascript');
Supported Languages¶
| Language | Identifier | Extensions |
|---|---|---|
| Python | 'python' | .py |
| JavaScript | 'javascript' | .js, .jsx |
| TypeScript | 'typescript' | .ts, .tsx |
| Java | 'java' | .java |
| C | 'c' | .c, .h |
| C++ | 'cpp' | .cpp, .hpp, .cc |
| C# | 'csharp' | .cs |
| Go | 'go' | .go |
| Rust | 'rust' | .rs |
| Ruby | 'ruby' | .rb |
| PHP | 'php' | .php |
| Swift | 'swift' | .swift |
| Kotlin | 'kotlin' | .kt, .kts |
| Lua | 'lua' | .lua |
| R | 'r' | .r, .R |
| Dart | 'dart' | .dart |
| Zig | 'zig' | .zig |
| SQL | 'sql' | .sql |
| DuckDB | 'duckdb' | .duckdb |
| Markdown | 'markdown' | .md, .markdown |
| HTML | 'html' | .html, .htm |
| CSS | 'css' | .css |
| JSON | 'json' | .json |
| Bash | 'bash' | .sh, .bash |
| HCL | 'hcl' | .hcl, .tf, .tfvars |
| GraphQL | 'graphql' | .graphql, .gql |
| TOML | 'toml' | .toml |
Optional Parameters¶
ignore_errors¶
Type: BOOLEAN
Default: false
Continue processing when files fail to parse.
-- Stop on first error (default)
SELECT * FROM read_ast('**/*.py');
-- Continue despite errors
SELECT * FROM read_ast('**/*.py', ignore_errors := true);
context¶
Type: VARCHAR
Default: 'native'
Control semantic analysis depth.
| Value | Description | Performance |
|---|---|---|
| 'none' | Raw AST only | Fastest |
| 'node_types_only' | + Semantic types | Fast |
| 'normalized' | + Names | Medium |
| 'native' | Full extraction | Detailed |
-- Fastest (raw AST)
SELECT * FROM read_ast('file.py', context := 'none');
-- Full analysis (default)
SELECT * FROM read_ast('file.py', context := 'native');
source¶
Type: VARCHAR
Default: 'lines'
Control source text extraction.
| Value | Description |
|---|---|
| 'none' | No source text |
| 'path' | File path only |
| 'lines_only' | Line numbers only |
| 'lines' | Line-based info |
| 'full' | Complete source |
-- No source extraction
SELECT * FROM read_ast('file.py', source := 'none');
-- Full source text
SELECT * FROM read_ast('file.py', source := 'full');
structure¶
Type: VARCHAR
Default: 'full'
Control tree structure extraction.
| Value | Description |
|---|---|
| 'none' | No structure info |
| 'minimal' | Basic structure |
| 'full' | Complete structure |
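As with the other extraction-level parameters, structure is passed as a named argument. A short sketch using the three values from the table (file.py is a placeholder path):

```sql
-- No structure info
SELECT * FROM read_ast('file.py', structure := 'none');

-- Basic structure
SELECT * FROM read_ast('file.py', structure := 'minimal');

-- Complete structure (default)
SELECT * FROM read_ast('file.py', structure := 'full');
```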
peek¶
Type: ANY
Default: 'smart'
Control source code snippet extraction.
| Value | Description |
|---|---|
| 'none' | No peek |
| 'smart' | Intelligent truncation |
| 'full' | Complete source |
| Integer | Character limit |
-- No peek
SELECT * FROM read_ast('file.py', peek := 'none');
-- Custom size
SELECT * FROM read_ast('file.py', peek := 200);
-- Smart truncation (default)
SELECT * FROM read_ast('file.py', peek := 'smart');
+schema Extraction Suffix¶
Available on: context, source, structure, peek
Introduced: v1.7.0
Any of the extraction-level parameters accepts a +schema suffix. The suffix keeps every column belonging to that parameter in the output schema, and columns above the requested level are returned as NULL instead of being computed. SQL macros and downstream queries therefore get a stable schema even when they need only a subset of columns: the expensive data is skipped, but the columns are still present.
-- Skip peek computation (the expensive part) but keep the peek column
-- in the schema so macros that reference it don't break.
SELECT name, peek FROM read_ast('file.py', peek := 'none+schema');
-- peek column is always NULL, but it exists.
-- Keep all context columns in the schema, compute only up to normalized level.
-- native-level fields (parameters, modifiers, annotations, signature_type)
-- will be NULL even though they appear in the output.
SELECT * FROM read_ast('file.py', context := 'normalized+schema');
Why use it: the primary use case is SQL macros that need to declare a stable output schema regardless of which columns they actually populate. ast_select uses peek := 'none+schema' internally to skip peek computation while keeping the column available for selectors that reference it.
What it affects: only the data is suppressed — the column remains in the output schema, typed normally, with NULL values. Any query that references a suppressed column sees NULL without erroring.
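The macro use case described above might look like the following sketch. The macro name ast_names is hypothetical, and it is an assumption that read_ast accepts a macro parameter as its file argument; the column names name, type, and peek are taken from the examples elsewhere on this page.

```sql
-- Hypothetical table macro: the peek column is always part of the result,
-- but its values are NULL because 'none+schema' skips the computation.
CREATE OR REPLACE MACRO ast_names(path) AS TABLE
    SELECT name, type, peek
    FROM read_ast(path, peek := 'none+schema');

-- Callers can reference peek without the query failing.
SELECT name, peek FROM ast_names('file.py');
```

Without the +schema suffix, a macro written this way would depend on whether the chosen extraction level happens to emit the peek column.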
peek_size¶
Type: INTEGER
Default: 120
Custom peek size in characters.
peek_mode¶
Type: VARCHAR
Default: 'smart'
Peek extraction mode.
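Both peek_size and peek_mode are passed as named arguments like the other optional parameters. The sketch below uses only the documented types and the documented default mode; how these two parameters interact with peek is not stated here, so treat the combination as an assumption.

```sql
-- Custom peek size in characters
SELECT * FROM read_ast('file.py', peek_size := 200);

-- Explicit peek mode ('smart' is the documented default)
SELECT * FROM read_ast('file.py', peek_mode := 'smart');
```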
max_depth¶
Type: INTEGER
Default: -1 (unlimited)
Introduced: v1.8.0
Limit AST tree depth at parse time. Nodes beyond the specified depth are not emitted. Boundary nodes at the depth limit have children_count = 0 and descendant_count = 0.
-- Root node only
SELECT * FROM read_ast('file.py', max_depth := 0);
-- Root + direct children
SELECT * FROM read_ast('file.py', max_depth := 1);
-- Unlimited (default)
SELECT * FROM read_ast('file.py', max_depth := -1);
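The boundary rule can be checked directly. In this sketch, with max_depth := 1 the direct children of the root sit at the depth limit, so by the rule above they should report children_count = 0 and descendant_count = 0 even if they have children in the full tree. The selected columns are the ones named on this page.

```sql
-- List nodes that report no children after depth limiting
SELECT type, name, children_count, descendant_count
FROM read_ast('file.py', max_depth := 1)
WHERE children_count = 0;
```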
prune¶
Type: LIST(VARCHAR)
Default: [] (no pruning)
Introduced: v1.8.0
Remove categories of nodes at parse time. The tree is automatically healed — parent_id, children_count, descendant_count, and sibling_index stay valid after pruning.
-- Remove syntax-only nodes
SELECT * FROM read_ast('file.py', prune := ['syntax']);
-- Remove comments and literals
SELECT * FROM read_ast('file.py', prune := ['comments', 'literals']);
-- Combine with max_depth
SELECT * FROM read_ast('file.py', prune := ['syntax'], max_depth := 3);
Available policies:
| Policy | Removes | Mode |
|---|---|---|
| syntax | Syntax-only nodes (keywords, operators, brackets) | Re-parents children |
| comments | Comment nodes | Re-parents children |
| punctuation | Parser punctuation tokens | Re-parents children |
| unnamed | Nodes with empty names | Re-parents children |
| literals | Literal value nodes | Drops subtree |
| imports | Import/use statements | Drops subtree |
| types | Type annotation nodes | Drops subtree |
| leaves | Leaf nodes (no children) | Re-parents children |
| internal | Non-exported internal definitions | Drops subtree |
Re-parents children: The pruned node is removed and its children are attached to the pruned node's parent (their former grandparent).
Drops subtree: The node and all of its descendants are removed entirely.
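One way to see the effect of the two modes is to compare node counts for the same file. This is a minimal sketch; file.py is a placeholder, and the actual counts depend entirely on the file's contents.

```sql
SELECT
    -- Unpruned tree
    (SELECT COUNT(*) FROM read_ast('file.py')) AS all_nodes,
    -- Re-parenting policy: only the comment nodes themselves disappear
    (SELECT COUNT(*) FROM read_ast('file.py', prune := ['comments'])) AS without_comments,
    -- Subtree-dropping policy: literal nodes and all their descendants disappear
    (SELECT COUNT(*) FROM read_ast('file.py', prune := ['literals'])) AS without_literals;
```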
batch_size¶
Type: INTEGER
Batch size for streaming large file sets.
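A sketch of passing batch_size when scanning many files. The value 50 is an arbitrary illustration; no default or recommended value is given here, and the glob path is a placeholder.

```sql
-- Count nodes per file across a large file set, streamed in batches
SELECT file_path, COUNT(*) AS node_count
FROM read_ast('src/**/*.py', batch_size := 50, ignore_errors := true)
GROUP BY file_path;
```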
Parameter Combinations¶
Maximum Performance¶
SELECT file_path, type, COUNT(*)
FROM read_ast(
    '**/*.py',
    context := 'none',
    source := 'none',
    structure := 'none',
    peek := 'none',
    ignore_errors := true
)
GROUP BY file_path, type;
Full Analysis¶
SELECT *
FROM read_ast(
    'src/**/*.py',
    context := 'native',
    source := 'full',
    structure := 'full',
    peek := 'full'
);
Balanced¶
SELECT file_path, type, name, start_line
FROM read_ast(
    'src/**/*.py',
    context := 'normalized',
    source := 'lines',
    peek := 120,
    ignore_errors := true
);
Next Steps¶
- Output Schema - Column reference
- Core Functions - Function reference
- Semantic Types - Type system