Parameters Reference

Complete reference for all read_ast() parameters.

Required Parameters

File Patterns

Type: VARCHAR or LIST(VARCHAR)

Specify which files to parse.

-- Single file
SELECT * FROM read_ast('main.py');

-- Glob pattern
SELECT * FROM read_ast('src/**/*.py');

-- File array
SELECT * FROM read_ast(['main.py', 'utils.py']);

-- Mixed array
SELECT * FROM read_ast(['src/**/*.py', 'lib/**/*.js', 'main.cpp']);

Glob Syntax

| Pattern | Matches |
|---------|---------|
| * | Any characters except / |
| ** | Any path (recursive) |
| ? | Single character |
| [abc] | Character set |
| {a,b} | Alternatives |
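
As a rough sketch of how these patterns combine, the queries below use only the syntax listed in the table above. The directory layout and file names (src/, tests/, test_?.py) are hypothetical; substitute paths that exist in your project.

```sql
-- Alternatives: JavaScript and TypeScript sources anywhere under src/ (hypothetical path)
SELECT DISTINCT file_path FROM read_ast('src/**/*.{js,ts}');

-- Single character: would match test_1.py or test_a.py, but not test_10.py
SELECT DISTINCT file_path FROM read_ast('tests/test_?.py');

-- Character set: Python files in the current directory whose names start with a, b, or c
SELECT DISTINCT file_path FROM read_ast('[abc]*.py');
```

Selecting DISTINCT file_path is a convenient way to check which files a pattern matched before running a heavier query.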

Language Parameter

Type: VARCHAR Default: 'auto'

Override automatic language detection.

-- Auto-detect (default)
SELECT * FROM read_ast('script.py');

-- Explicit language
SELECT * FROM read_ast('script.txt', 'python');

-- For arrays (applies to all files)
SELECT * FROM read_ast(['**/*.txt'], 'javascript');

Supported Languages

| Language | Identifier | Extensions |
|----------|------------|------------|
| Python | 'python' | .py |
| JavaScript | 'javascript' | .js, .jsx |
| TypeScript | 'typescript' | .ts, .tsx |
| Java | 'java' | .java |
| C | 'c' | .c, .h |
| C++ | 'cpp' | .cpp, .hpp, .cc |
| C# | 'csharp' | .cs |
| Go | 'go' | .go |
| Rust | 'rust' | .rs |
| Ruby | 'ruby' | .rb |
| PHP | 'php' | .php |
| Swift | 'swift' | .swift |
| Kotlin | 'kotlin' | .kt, .kts |
| Lua | 'lua' | .lua |
| R | 'r' | .r, .R |
| Dart | 'dart' | .dart |
| Zig | 'zig' | .zig |
| SQL | 'sql' | .sql |
| DuckDB | 'duckdb' | .duckdb |
| Markdown | 'markdown' | .md, .markdown |
| HTML | 'html' | .html, .htm |
| CSS | 'css' | .css |
| JSON | 'json' | .json |
| Bash | 'bash' | .sh, .bash |
| HCL | 'hcl' | .hcl, .tf, .tfvars |
| GraphQL | 'graphql' | .graphql, .gql |
| TOML | 'toml' | .toml |
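
One case where an explicit language may help is an extension that maps to a different language than the one you want. The sketch below assumes a hypothetical include/ directory of C++ headers: because the table lists .h under C, auto-detection would presumably treat them as C, so the query passes 'cpp' instead.

```sql
-- .h is listed under C above, so auto-detection would presumably parse these as C.
-- Passing 'cpp' asks for the C++ parser instead. The include/ path is hypothetical.
SELECT file_path, type, name
FROM read_ast('include/**/*.h', 'cpp');
```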

Optional Parameters

ignore_errors

Type: BOOLEAN Default: false

Continue processing when files fail to parse.

-- Stop on first error (default)
SELECT * FROM read_ast('**/*.py');

-- Continue despite errors
SELECT * FROM read_ast('**/*.py', ignore_errors := true);

context

Type: VARCHAR Default: 'native'

Control semantic analysis depth.

| Value | Description | Performance |
|-------|-------------|-------------|
| 'none' | Raw AST only | Fastest |
| 'node_types_only' | + Semantic types | Fast |
| 'normalized' | + Names | Medium |
| 'native' | Full extraction | Detailed |

-- Fastest (raw AST)
SELECT * FROM read_ast('file.py', context := 'none');

-- Full analysis (default)
SELECT * FROM read_ast('file.py', context := 'native');

source

Type: VARCHAR Default: 'lines'

Control source text extraction.

| Value | Description |
|-------|-------------|
| 'none' | No source text |
| 'path' | File path only |
| 'lines_only' | Line numbers only |
| 'lines' | Line-based info |
| 'full' | Complete source |

-- No source extraction
SELECT * FROM read_ast('file.py', source := 'none');

-- Full source text
SELECT * FROM read_ast('file.py', source := 'full');

structure

Type: VARCHAR Default: 'full'

Control tree structure extraction.

| Value | Description |
|-------|-------------|
| 'none' | No structure info |
| 'minimal' | Basic structure |
| 'full' | Complete structure |

-- Minimal structure
SELECT * FROM read_ast('file.py', structure := 'minimal');

peek

Type: ANY Default: 'smart'

Control source code snippet extraction.

| Value | Description |
|-------|-------------|
| 'none' | No peek |
| 'smart' | Intelligent truncation |
| 'full' | Complete source |
| Integer | Character limit |

-- No peek
SELECT * FROM read_ast('file.py', peek := 'none');

-- Custom size
SELECT * FROM read_ast('file.py', peek := 200);

-- Smart truncation (default)
SELECT * FROM read_ast('file.py', peek := 'smart');

+schema Extraction Suffix

Available on: context, source, structure, peek Introduced: v1.7.0

Any of the extraction-level parameters can take a +schema suffix, which keeps every column in the output schema even when the chosen level does not compute it; the uncomputed columns are returned as NULL. This gives SQL macros and downstream queries a stable schema when they need only a subset of columns: the expensive data is skipped, but the columns are still present.

-- Skip peek computation (the expensive part) but keep the peek column
-- in the schema so macros that reference it don't break.
SELECT name, peek FROM read_ast('file.py', peek := 'none+schema');
-- peek column is always NULL, but it exists.

-- Keep all context columns in the schema, compute only up to normalized level.
-- native-level fields (parameters, modifiers, annotations, signature_type)
-- will be NULL even though they appear in the output.
SELECT * FROM read_ast('file.py', context := 'normalized+schema');

Why use it: the primary use case is SQL macros that need to declare a stable output schema regardless of which columns they actually populate. ast_select uses peek := 'none+schema' internally to skip peek computation while keeping the column available for selectors that reference it.

What it affects: only the data is suppressed — the column remains in the output schema, typed normally, with NULL values. Any query that references a suppressed column sees NULL without erroring.
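
To illustrate the macro use case, here is a minimal sketch of a DuckDB table macro built on read_ast. The macro name named_nodes and the path in the usage query are hypothetical, and the sketch assumes read_ast accepts a macro argument as its file pattern.

```sql
-- Hypothetical table macro: list named nodes without paying for peek extraction.
-- peek := 'none+schema' skips the computation but keeps the column selectable.
CREATE OR REPLACE MACRO named_nodes(pattern) AS TABLE
    SELECT file_path, type, name, start_line, peek
    FROM read_ast(pattern, peek := 'none+schema')
    WHERE name != '';

-- The peek column is present in the result, with NULL on every row.
SELECT * FROM named_nodes('src/**/*.py');
```

With plain peek := 'none' the macro body could not reference peek at all, which is the situation the suffix is meant to avoid.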


peek_size

Type: INTEGER Default: 120

Custom peek size in characters.

SELECT * FROM read_ast('file.py', peek_size := 250);

peek_mode

Type: VARCHAR Default: 'smart'

Peek extraction mode.

SELECT * FROM read_ast('file.py', peek_mode := 'smart');

max_depth

Type: INTEGER Default: -1 (unlimited) Introduced: v1.8.0

Limit AST tree depth at parse time. Nodes beyond the specified depth are not emitted. Boundary nodes at the depth limit have children_count = 0 and descendant_count = 0.

-- Root node only
SELECT * FROM read_ast('file.py', max_depth := 0);

-- Root + direct children
SELECT * FROM read_ast('file.py', max_depth := 1);

-- Unlimited (default)
SELECT * FROM read_ast('file.py', max_depth := -1);
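
A shallow depth limit can serve as a quick file outline. The sketch below uses only columns that appear elsewhere on this page; file.py is the same placeholder used in the examples above.

```sql
-- Top-level outline: the root node plus its direct children, in source order.
-- Nodes at depth 1 are boundary nodes, so their children_count is reported as 0.
SELECT type, name, start_line, children_count
FROM read_ast('file.py', max_depth := 1)
ORDER BY start_line;
```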

prune

Type: LIST(VARCHAR) Default: [] (no pruning) Introduced: v1.8.0

Remove categories of nodes at parse time. The tree is automatically healed — parent_id, children_count, descendant_count, and sibling_index stay valid after pruning.

-- Remove syntax-only nodes
SELECT * FROM read_ast('file.py', prune := ['syntax']);

-- Remove comments and literals
SELECT * FROM read_ast('file.py', prune := ['comments', 'literals']);

-- Combine with max_depth
SELECT * FROM read_ast('file.py', prune := ['syntax'], max_depth := 3);

Available policies:

| Policy | Removes | Mode |
|--------|---------|------|
| syntax | Syntax-only nodes (keywords, operators, brackets) | Re-parents children |
| comments | Comment nodes | Re-parents children |
| punctuation | Parser punctuation tokens | Re-parents children |
| unnamed | Nodes with empty names | Re-parents children |
| literals | Literal value nodes | Drops subtree |
| imports | Import/use statements | Drops subtree |
| types | Type annotation nodes | Drops subtree |
| leaves | Leaf nodes (no children) | Re-parents children |
| internal | Non-exported internal definitions | Drops subtree |

Re-parents children: The pruned node is removed but its children are attached to the grandparent.

Drops subtree: The node and all its descendants are removed entirely.
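
One way to get a feel for how much each policy removes is to compare node counts with and without pruning. This is only a sketch: file.py is a placeholder, and the counts depend entirely on the file being parsed.

```sql
-- Compare total node counts for one file under different prune settings.
SELECT 'unpruned' AS variant, COUNT(*) AS node_count
FROM read_ast('file.py')
UNION ALL
SELECT 'syntax pruned' AS variant, COUNT(*) AS node_count
FROM read_ast('file.py', prune := ['syntax'])
UNION ALL
SELECT 'literals pruned' AS variant, COUNT(*) AS node_count
FROM read_ast('file.py', prune := ['literals']);
```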


batch_size

Type: INTEGER

Batch size for streaming large file sets.

SELECT * FROM read_ast(['**/*.py', '**/*.js'], batch_size := 10);

Parameter Combinations

Maximum Performance

SELECT file_path, type, COUNT(*)
FROM read_ast(
    '**/*.py',
    context := 'none',
    source := 'none',
    structure := 'none',
    peek := 'none',
    ignore_errors := true
)
GROUP BY file_path, type;

Full Analysis

SELECT *
FROM read_ast(
    'src/**/*.py',
    context := 'native',
    source := 'full',
    structure := 'full',
    peek := 'full'
);

Balanced

SELECT file_path, type, name, start_line
FROM read_ast(
    'src/**/*.py',
    context := 'normalized',
    source := 'lines',
    peek := 120,
    ignore_errors := true
);

Next Steps