Core Functions¶
Complete reference for Sitting Duck's main functions.
read_ast()¶
Parse source code files and return AST data as a table.
Signatures¶
-- Single file
read_ast(file_path VARCHAR) -> TABLE
-- Single file with language
read_ast(file_path VARCHAR, language VARCHAR) -> TABLE
-- File array
read_ast(file_patterns LIST(VARCHAR)) -> TABLE
-- File array with language
read_ast(file_patterns LIST(VARCHAR), language VARCHAR) -> TABLE
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
file_path |
VARCHAR | required | File path or glob pattern |
file_patterns |
LIST(VARCHAR) | required | Array of file paths or glob patterns |
language |
VARCHAR | 'auto' |
Language name or 'auto' for detection |
ignore_errors |
BOOLEAN | false |
Continue on parse errors |
context |
VARCHAR | 'native' |
'none', 'node_types_only', 'normalized', 'native' |
source |
VARCHAR | 'lines' |
'none', 'path', 'lines_only', 'lines', 'full' |
structure |
VARCHAR | 'full' |
'none', 'minimal', 'full' |
peek |
ANY | 'smart' |
'none', 'smart', 'full', or integer size |
peek_size |
INTEGER | 120 |
Custom peek size in characters |
peek_mode |
VARCHAR | 'smart' |
Peek extraction mode |
batch_size |
INTEGER | - | Batch size for streaming |
Output Schema¶
| Column | Type | Description |
|---|---|---|
node_id |
BIGINT | Unique node identifier |
type |
VARCHAR | Tree-sitter AST node type |
semantic_type |
SEMANTIC_TYPE | Universal semantic category |
flags |
UTINYINT | Node property flags |
name |
VARCHAR | Extracted identifier name |
signature_type |
VARCHAR | Type/return type information |
parameters |
STRUCT[] | Function parameters (name and type) |
modifiers |
VARCHAR[] | Access modifiers and keywords |
annotations |
VARCHAR | Decorator/annotation text |
qualified_name |
LIST<STRUCT> | Scope path as segment list of {semantic_type, name, index}. Use ast_qualified_name_as_string() to render as C[User] F[__init__] for display. See Output Schema for full details. |
scope |
STRUCT | Scope info bundle: {current, function, class, module, stack}. current is the nearest enclosing scope's node_id; function/class/module are precomputed shortcuts to the nearest ancestor of that kind (so a.scope.function gives "what function is this inside?" as a single field read, no range join needed). stack is a typed LIST<STRUCT<id, kind SEMANTIC_TYPE>> populated only on scope boundary nodes. See Scope structure below. |
file_path |
VARCHAR | Source file path |
language |
VARCHAR | Detected language |
start_line |
UINTEGER | Starting line (1-based) |
end_line |
UINTEGER | Ending line (1-based) |
start_column |
UINTEGER | Starting column (only with source := 'full') |
end_column |
UINTEGER | Ending column (only with source := 'full') |
parent_id |
BIGINT | Parent node ID (NULL for root) |
depth |
UINTEGER | Tree depth (0 for root) |
sibling_index |
UINTEGER | Position among siblings |
children_count |
UINTEGER | Direct children count |
descendant_count |
UINTEGER | Total descendants |
peek |
VARCHAR | Source code snippet |
Scope structure¶
The scope column is a STRUCT with five fields, letting common scope
questions resolve to a single field read instead of a range join:
scope: STRUCT<
current BIGINT, -- every node: nearest enclosing scope's node_id
function BIGINT, -- every node: nearest function ancestor's node_id
class BIGINT, -- every node: nearest class/struct/trait ancestor
module BIGINT, -- every node: nearest module/namespace ancestor
stack LIST<STRUCT<id BIGINT, kind SEMANTIC_TYPE>>
-- scope nodes only: full ancestor chain with kinds
>
Common access patterns:
-- "what function is this inside?"
SELECT name FROM nodes WHERE scope.function = <target_function_node_id>;
-- "all module-level definitions"
SELECT * FROM nodes WHERE scope.current IS NULL OR scope.current <= 0;
-- "all calls inside any try block (kind-filtered scope walk)"
SELECT * FROM nodes c
WHERE c.semantic_type = 'COMPUTATION_CALL'
AND list_filter(c.scope.stack, s -> s.kind = 'ERROR_TRY') != [];
Replaces the pre-v1.7.4 scope_id and scope_stack columns:
- scope_id → scope.current
- scope_stack → scope.stack (now typed; project .id for the scalar node_id)
See Output Schema for detailed column documentation and parameter effects.
Examples¶
-- Basic usage
SELECT * FROM read_ast('main.py');
-- With language override
SELECT * FROM read_ast('script.txt', 'python');
-- Multiple files
SELECT * FROM read_ast(['src/**/*.py', 'lib/**/*.js']);
-- With options
SELECT * FROM read_ast(
'src/**/*.py',
context := 'native',
ignore_errors := true,
peek := 200
);
parse_ast()¶
Parse source code from a string.
Signature¶
Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
source_code |
VARCHAR | Yes | Source code to parse |
language |
VARCHAR | Yes | Programming language |
Output Schema¶
Same as read_ast() but with synthetic file_path.
Examples¶
-- Parse Python code
SELECT * FROM parse_ast('def hello(): pass', 'python');
-- Parse JavaScript
SELECT type, name FROM parse_ast('function greet() { return "hi"; }', 'javascript');
-- Find specific constructs
SELECT name, start_line
FROM parse_ast('
class MyClass:
def method1(self):
pass
def method2(self):
pass
', 'python')
WHERE type = 'function_definition';
parse_ast_list()¶
Parse source code from a string and return the AST as a LIST<STRUCT> — the scalar equivalent of parse_ast(). Useful inside CTEs, lateral joins, and scalar expressions where a table function isn't allowed.
Signature¶
Parameters¶
| Parameter | Type | Required | Description |
|---|---|---|---|
source_code |
VARCHAR | Yes | Source code to parse |
language |
VARCHAR | Yes | Programming language |
Returns NULL on invalid input language; other parse errors are raised as InvalidInputException.
Output¶
A LIST(STRUCT(...)) whose element struct type matches the default flat schema of read_ast() (type, name, semantic_type, qualified_name, start_line, end_line, ...).
Examples¶
-- Count nodes in a snippet
SELECT length(parse_ast_list('def hello(): pass', 'python'));
-- → 6
-- Use inside a CTE / scalar context where parse_ast() table function isn't allowed
WITH ast_nodes AS (
SELECT unnest(parse_ast_list('x = 1; y = 2', 'python')) AS node
)
SELECT node.type, node.name FROM ast_nodes WHERE node.name IS NOT NULL;
-- Per-row parsing of a column (lateral-friendly)
SELECT file.path, length(parse_ast_list(file.content, 'python')) AS node_count
FROM my_files file;
Introduced in v1.7.0 specifically to unblock CTE/lateral-join use cases that can't call table functions. Internally used by ast_select's pattern-matching pseudo-classes (:match, :contains).
parse_ast_list_table()¶
Table-macro wrapper around parse_ast_list(). Presents the same row-shaped interface as parse_ast() (so it can be used as a drop-in replacement in FROM clauses) while internally calling parse_ast_list() so it accepts column-valued arguments.
Signature¶
Parameters¶
| Parameter | Type | Description |
|---|---|---|
source_code |
VARCHAR | Source code to parse. May be a column reference or expression (unlike parse_ast(), which requires a literal). |
language |
VARCHAR | Programming language |
Output¶
Same row shape as parse_ast(): node_id, parent_id, type, name, start_line, end_line, depth, sibling_index, descendant_count, semantic_type, flags, peek, and the other flat schema columns.
Example¶
-- Drop-in replacement for parse_ast() in FROM clauses
SELECT node_id, type, name
FROM parse_ast_list_table('.fn#main {}', 'css')
WHERE type = 'class_selector';
Why this exists¶
parse_ast() is a table function and, like all DuckDB table functions, its arguments must be literals at bind time. parse_ast_list_table() is the macro-level escape hatch: it wraps the scalar parse_ast_list() inside a table-shaped relation, which allows downstream SQL macros to substitute column-valued arguments at macro-expansion time and still produce a row-shaped result.
ast_select()'s internal sel CTE uses parse_ast_list_table rather than parse_ast for exactly this reason — it's the plumbing that enables per-row selector dispatch for features like ast_select_rules().
ast_supported_languages()¶
List all supported languages.
Signature¶
Output Schema¶
| Column | Type | Description |
|---|---|---|
language |
VARCHAR | Language identifier |
display_name |
VARCHAR | Human-readable name |
extensions |
VARCHAR[] | Supported file extensions |
Example¶
Error Handling¶
Invalid File Path¶
SELECT * FROM read_ast('nonexistent.py');
-- Error: File not found: nonexistent.py
-- Use ignore_errors to skip
SELECT * FROM read_ast('nonexistent.py', ignore_errors := true);
-- Returns empty result set
Empty Array¶
NULL Values¶
Parse Errors¶
-- With ignore_errors, parse errors create ERROR nodes
SELECT * FROM read_ast('broken.py', ignore_errors := true)
WHERE type = 'ERROR';
Next Steps¶
- Utility Functions - Predicates, file utilities, and helper functions
- Parameters Reference - Detailed parameter documentation
- Output Schema - Column details
- Semantic Types - Type system reference