Parameters Reference¶

Complete reference for all read_ast() parameters.

Required Parameters¶

File Patterns¶

Type: VARCHAR or LIST(VARCHAR)

Specify which files to parse.

-- Single file
SELECT * FROM read_ast('main.py');

-- Glob pattern
SELECT * FROM read_ast('src/**/*.py');

-- File array
SELECT * FROM read_ast(['main.py', 'utils.py']);

-- Mixed array
SELECT * FROM read_ast(['src/**/*.py', 'lib/**/*.js', 'main.cpp']);

Glob Syntax¶

Pattern	Matches
`*`	Any characters except `/`
`**`	Any path (recursive)
`?`	Single character
`[abc]`	Character set
`{a,b}`	Alternatives
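
The less common operators in the table can be sketched as follows. The paths are hypothetical and only illustrate the matching rules; the queries assume each operator behaves exactly as the table describes.

```sql
-- Hypothetical paths, for illustration only.

-- `?` matches exactly one character:
-- tests/test_1.py and tests/test_a.py match, tests/test_10.py does not.
SELECT * FROM read_ast('tests/test_?.py');

-- `[abc]` matches one character from the set:
-- src/mod_a.py, src/mod_b.py, and src/mod_c.py match.
SELECT * FROM read_ast('src/mod_[abc].py');

-- `{a,b}` chooses between alternatives; combined with `**` this
-- recursively picks up both .js and .ts files under src/.
SELECT * FROM read_ast('src/**/*.{js,ts}');
```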

Language Parameter¶

Type: VARCHAR Default: 'auto'

Override automatic language detection.

-- Auto-detect (default)
SELECT * FROM read_ast('script.py');

-- Explicit language
SELECT * FROM read_ast('script.txt', 'python');

-- For arrays (applies to all files)
SELECT * FROM read_ast(['**/*.txt'], 'javascript');

Supported Languages¶

Language	Identifier	Extensions
Python	`'python'`	`.py`
JavaScript	`'javascript'`	`.js`, `.jsx`
TypeScript	`'typescript'`	`.ts`, `.tsx`
Java	`'java'`	`.java`
C	`'c'`	`.c`, `.h`
C++	`'cpp'`	`.cpp`, `.hpp`, `.cc`
C#	`'csharp'`	`.cs`
Go	`'go'`	`.go`
Rust	`'rust'`	`.rs`
Ruby	`'ruby'`	`.rb`
PHP	`'php'`	`.php`
Swift	`'swift'`	`.swift`
Kotlin	`'kotlin'`	`.kt`, `.kts`
Lua	`'lua'`	`.lua`
R	`'r'`	`.r`, `.R`
Dart	`'dart'`	`.dart`
Zig	`'zig'`	`.zig`
SQL	`'sql'`	`.sql`
DuckDB	`'duckdb'`	`.duckdb`
Markdown	`'markdown'`	`.md`, `.markdown`
HTML	`'html'`	`.html`, `.htm`
CSS	`'css'`	`.css`
JSON	`'json'`	`.json`
Bash	`'bash'`	`.sh`, `.bash`
HCL	`'hcl'`	`.hcl`, `.tf`, `.tfvars`
GraphQL	`'graphql'`	`.graphql`, `.gql`
TOML	`'toml'`	`.toml`

Optional Parameters¶

`ignore_errors`¶

Type: BOOLEAN Default: false

Continue processing when files fail to parse.

-- Stop on first error (default)
SELECT * FROM read_ast('**/*.py');

-- Continue despite errors
SELECT * FROM read_ast('**/*.py', ignore_errors := true);

`context`¶

Type: VARCHAR Default: 'native'

Control semantic analysis depth.

Value	Description	Performance
`'none'`	Raw AST only	Fastest
`'node_types_only'`	+ Semantic types	Fast
`'normalized'`	+ Names	Medium
`'native'`	Full extraction	Slowest (most detail)

-- Fastest (raw AST)
SELECT * FROM read_ast('file.py', context := 'none');

-- Full analysis (default)
SELECT * FROM read_ast('file.py', context := 'native');

`source`¶

Type: VARCHAR Default: 'lines'

Control source text extraction.

Value	Description
`'none'`	No source text
`'path'`	File path only
`'lines_only'`	Line numbers only
`'lines'`	Line-based info
`'full'`	Complete source

-- No source extraction
SELECT * FROM read_ast('file.py', source := 'none');

-- Full source text
SELECT * FROM read_ast('file.py', source := 'full');

`structure`¶

Type: VARCHAR Default: 'full'

Control tree structure extraction.

Value	Description
`'none'`	No structure info
`'minimal'`	Basic structure
`'full'`	Complete structure

-- Minimal structure
SELECT * FROM read_ast('file.py', structure := 'minimal');

`peek`¶

Type: ANY Default: 'smart'

Control source code snippet extraction.

Value	Description
`'none'`	No peek
`'smart'`	Intelligent truncation
`'full'`	Complete source
Integer	Character limit

-- No peek
SELECT * FROM read_ast('file.py', peek := 'none');

-- Custom size
SELECT * FROM read_ast('file.py', peek := 200);

-- Smart truncation (default)
SELECT * FROM read_ast('file.py', peek := 'smart');

`+schema` Extraction Suffix¶

Available on: context, source, structure, peek Introduced: v1.7.0

Any of the extraction-level parameters can take a +schema suffix. The suffix keeps every column in the output schema but fills the columns that were not computed with NULL. This gives SQL macros and downstream queries a stable schema even when they need only a subset of columns: the expensive work is skipped, but the columns are still present.

-- Skip peek computation (the expensive part) but keep the peek column
-- in the schema so macros that reference it don't break.
SELECT name, peek FROM read_ast('file.py', peek := 'none+schema');
-- peek column is always NULL, but it exists.

-- Keep all context columns in the schema, compute only up to normalized level.
-- native-level fields (parameters, modifiers, annotations, signature_type)
-- will be NULL even though they appear in the output.
SELECT * FROM read_ast('file.py', context := 'normalized+schema');

Why use it: the primary use case is SQL macros that need to declare a stable output schema regardless of which columns they actually populate. ast_select uses peek := 'none+schema' internally to skip peek computation while keeping the column available for selectors that reference it.

What it affects: only the data is suppressed. The column remains in the output schema with its normal type and NULL values, so any query that references a suppressed column sees NULL instead of raising an error.
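
A minimal sketch of the macro use case, assuming DuckDB's table-macro syntax (CREATE MACRO ... AS TABLE) can forward its argument to read_ast(). The macro name named_nodes and its file_pattern argument are hypothetical and not part of the extension.

```sql
-- Hypothetical table macro: returns named nodes only, but keeps a `peek`
-- column in its output so callers can reference it safely.
CREATE OR REPLACE MACRO named_nodes(file_pattern) AS TABLE
    SELECT file_path, name, start_line, peek
    FROM read_ast(file_pattern, peek := 'none+schema')
    WHERE name <> '';

-- Callers may still select peek; per the description above it is NULL
-- for every row, because the snippet was never computed.
SELECT file_path, name, peek
FROM named_nodes('src/**/*.py');
```

The point of the suffix here is that the second query keeps working if the macro later switches to a peek level that does compute snippets.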

`peek_size`¶

Type: INTEGER Default: 120

Custom peek size in characters.

SELECT * FROM read_ast('file.py', peek_size := 250);

`peek_mode`¶

Type: VARCHAR Default: 'smart'

Peek extraction mode.

SELECT * FROM read_ast('file.py', peek_mode := 'smart');

`max_depth`¶

Type: INTEGER Default: -1 (unlimited) Introduced: v1.8.0

Limit the depth of the AST at parse time. Nodes deeper than the limit are not emitted. Nodes at the limit report children_count = 0 and descendant_count = 0, because their children are never produced.

-- Root node only
SELECT * FROM read_ast('file.py', max_depth := 0);

-- Root + direct children
SELECT * FROM read_ast('file.py', max_depth := 1);

-- Unlimited (default)
SELECT * FROM read_ast('file.py', max_depth := -1);
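
To see the effect of the limit, the following sketch compares node counts and inspects the boundary nodes. It uses only columns mentioned on this page (type, name, children_count, descendant_count); file.py is a placeholder path.

```sql
-- How many nodes does the depth limit remove?
SELECT
    (SELECT COUNT(*) FROM read_ast('file.py'))                 AS all_nodes,
    (SELECT COUNT(*) FROM read_ast('file.py', max_depth := 2)) AS shallow_nodes;

-- With max_depth := 1 only the root and its direct children are emitted.
-- Per the description above, the direct children sit at the limit and
-- should report 0 for both counts.
SELECT type, name, children_count, descendant_count
FROM read_ast('file.py', max_depth := 1);
```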

`prune`¶

Type: LIST(VARCHAR) Default: [] (no pruning) Introduced: v1.8.0

Remove categories of nodes at parse time. The tree is automatically healed — parent_id, children_count, descendant_count, and sibling_index stay valid after pruning.

-- Remove syntax-only nodes
SELECT * FROM read_ast('file.py', prune := ['syntax']);

-- Remove comments and literals
SELECT * FROM read_ast('file.py', prune := ['comments', 'literals']);

-- Combine with max_depth
SELECT * FROM read_ast('file.py', prune := ['syntax'], max_depth := 3);

Available policies:

Policy	Removes	Mode
`syntax`	Syntax-only nodes (keywords, operators, brackets)	Re-parents children
`comments`	Comment nodes	Re-parents children
`punctuation`	Parser punctuation tokens	Re-parents children
`unnamed`	Nodes with empty names	Re-parents children
`literals`	Literal value nodes	Drops subtree
`imports`	Import/use statements	Drops subtree
`types`	Type annotation nodes	Drops subtree
`leaves`	Leaf nodes (no children)	Re-parents children
`internal`	Non-exported internal definitions	Drops subtree

Re-parents children: the pruned node is removed and its children are attached to the pruned node's parent (their former grandparent).

Drops subtree: the node and all of its descendants are removed entirely.
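
One rough way to observe the two modes is to compare node counts per policy. This is a sketch against a placeholder file.py; the actual numbers depend on the file.

```sql
-- A "re-parents children" policy (comments) removes only the matching
-- nodes, while a "drops subtree" policy (literals) also removes everything
-- beneath them, so it can shrink the tree by more than the match count.
SELECT 'unpruned' AS variant, COUNT(*) AS node_count
FROM read_ast('file.py')
UNION ALL
SELECT 'without comments', COUNT(*)
FROM read_ast('file.py', prune := ['comments'])
UNION ALL
SELECT 'without literals', COUNT(*)
FROM read_ast('file.py', prune := ['literals']);
```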

`batch_size`¶

Type: INTEGER

Batch size for streaming large file sets.

SELECT * FROM read_ast(['**/*.py', '**/*.js'], batch_size := 10);

Parameter Combinations¶

Maximum Performance¶

SELECT file_path, type, COUNT(*) AS node_count
FROM read_ast(
    '**/*.py',
    context := 'none',
    source := 'none',
    structure := 'none',
    peek := 'none',
    ignore_errors := true
)
GROUP BY file_path, type;

Full Analysis¶

SELECT *
FROM read_ast(
    'src/**/*.py',
    context := 'native',
    source := 'full',
    structure := 'full',
    peek := 'full'
);

Balanced¶

SELECT file_path, type, name, start_line
FROM read_ast(
    'src/**/*.py',
    context := 'normalized',
    source := 'lines',
    peek := 120,
    ignore_errors := true
);

Next Steps¶

Output Schema - Column reference
Core Functions - Function reference
Semantic Types - Type system
