Context Extraction¶
Control the depth of semantic analysis with context extraction levels.
Context Levels¶
The context parameter controls how much semantic information is extracted:
| Level | Speed | Detail | Use Case |
|---|---|---|---|
| 'none' | Fastest | Raw AST only | Simple node counting |
| 'node_types_only' | Fast | Semantic types | Basic classification |
| 'normalized' | Medium | + Names | Cross-language analysis |
| 'native' | Slower | Full extraction | Detailed analysis |
Using Context Levels¶
None - Raw AST Only¶
Fastest processing, minimal information:
SELECT type, COUNT(*)
FROM read_ast('src/**/*.py', context := 'none')
GROUP BY type
ORDER BY COUNT(*) DESC;
Output includes:
- type - Raw AST node type
- file_path, start_line, end_line, etc.
- No semantic_type, name, or native fields
Node Types Only¶
Adds semantic type classification:
SELECT
    type,
    semantic_type,
    COUNT(*)
FROM read_ast('src/**/*.py', context := 'node_types_only')
WHERE semantic_type IS NOT NULL
GROUP BY type, semantic_type
ORDER BY COUNT(*) DESC;
Output includes:
- Everything from 'none'
- semantic_type - Universal semantic category
Normalized¶
Adds name extraction:
SELECT type, name, semantic_type
FROM read_ast('src/**/*.py', context := 'normalized')
WHERE type = 'function_definition'
AND name IS NOT NULL;
Output includes:
- Everything from 'node_types_only'
- name - Extracted identifier names
Native (Default)¶
Full semantic analysis with language-specific context:
SELECT type, name, semantic_type, signature_type, parameters, modifiers
FROM read_ast('src/**/*.py', context := 'native')
WHERE type = 'function_definition';
Output includes:
- Everything from 'normalized'
- signature_type - Return type or type annotation
- parameters - Function parameter list (name and type)
- modifiers - Access modifiers and keywords (public, static, async, etc.)
- annotations - Decorator/annotation text
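The column lists above can be checked directly by asking DuckDB for the result schema at a given level. This is a minimal sketch that reuses the `src/**/*.py` pattern from the examples above. `DESCRIBE` is standard DuckDB. Whether a lighter level omits a column entirely or returns it as NULL is not stated here, and this comparison is one way to find out.

```sql
-- Inspect the result schema at the lightest and the heaviest level.
-- Run each statement separately and compare the column lists.
DESCRIBE SELECT * FROM read_ast('src/**/*.py', context := 'none');

DESCRIBE SELECT * FROM read_ast('src/**/*.py', context := 'native');
```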
Native Context Details¶
Function Extraction¶
For function definitions, native context extracts:
-- Python function with parameters and return type
SELECT name, signature_type, parameters
FROM read_ast('test/data/python/sample_app.py', context := 'native')
WHERE type = 'function_definition';
signature_type contains the return type annotation. parameters is a STRUCT[] with name and type fields per parameter.
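Because parameters is a list of structs, it can be flattened to one row per parameter with DuckDB's standard `UNNEST`. The following is a sketch, assuming the struct fields are named `name` and `type` as described above; the aliases `function_name`, `param_name`, and `param_type` are illustrative only.

```sql
-- One row per (function, parameter) pair.
SELECT
    function_name,
    p.name AS param_name,
    p.type AS param_type
FROM (
    SELECT
        name AS function_name,
        UNNEST(parameters) AS p   -- expands the STRUCT[] into one struct per row
    FROM read_ast('test/data/python/sample_app.py', context := 'native')
    WHERE type = 'function_definition'
) AS params
ORDER BY function_name;
```

Functions whose parameter list is empty or NULL produce no rows in this form, so it suits parameter-level analysis rather than a complete function inventory.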
Class Extraction¶
For class definitions:
SELECT name, signature_type, modifiers
FROM read_ast('Example.java', context := 'native')
WHERE type = 'class_declaration';
Method Signatures¶
-- Java method signatures
SELECT name, signature_type, parameters, modifiers
FROM read_ast('MyClass.java', context := 'native')
WHERE type = 'method_declaration';
Choosing the Right Level¶
Use 'none' When:¶
- Counting nodes quickly
- File-level statistics only
- Maximum performance needed
-- Fast file size comparison
SELECT
    file_path,
    COUNT(*) AS node_count
FROM read_ast('**/*.py', context := 'none', ignore_errors := true)
GROUP BY file_path
ORDER BY node_count DESC;
Use 'node_types_only' When:¶
- Cross-language type analysis
- Semantic categorization without names
- Moderate performance needs
-- Semantic type distribution
SELECT
    semantic_type_to_string(semantic_type) AS type_name,
    language,
    COUNT(*)
FROM read_ast(['**/*.py', '**/*.js'], context := 'node_types_only', ignore_errors := true)
GROUP BY semantic_type, language
ORDER BY COUNT(*) DESC;
Use 'normalized' When:¶
- Name-based analysis
- Symbol extraction
- No need for detailed signatures
-- Find all named definitions
SELECT name, type, file_path
FROM read_ast('**/*.py', context := 'normalized')
WHERE is_definition(semantic_type)
AND name IS NOT NULL;
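As an illustration of name-based analysis at this level, the same filter can be grouped by name to find symbols defined in more than one file. This is a sketch that uses only the functions and columns shown above; the aliases `file_count` and `definition_count` are illustrative.

```sql
-- Names that are defined in more than one file.
SELECT
    name,
    COUNT(DISTINCT file_path) AS file_count,
    COUNT(*) AS definition_count
FROM read_ast('**/*.py', context := 'normalized', ignore_errors := true)
WHERE is_definition(semantic_type)
  AND name IS NOT NULL
GROUP BY name
HAVING COUNT(DISTINCT file_path) > 1
ORDER BY file_count DESC, name;
```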
Use 'native' (Default) When:¶
- Full semantic analysis
- API documentation
- Detailed code understanding
-- Complete function inventory
SELECT
    name,
    signature_type,
    parameters,
    file_path,
    start_line
FROM read_ast('src/**/*.py', context := 'native')
WHERE semantic_type_to_string(semantic_type) = 'DEFINITION_FUNCTION'
ORDER BY file_path, start_line;
Performance Comparison¶
-- Benchmark different context levels
-- (Run each separately and compare times)
-- Fastest
SELECT COUNT(*) FROM read_ast('**/*.py', context := 'none', ignore_errors := true);
-- Fast
SELECT COUNT(*) FROM read_ast('**/*.py', context := 'node_types_only', ignore_errors := true);
-- Medium
SELECT COUNT(*) FROM read_ast('**/*.py', context := 'normalized', ignore_errors := true);
-- Detailed (default)
SELECT COUNT(*) FROM read_ast('**/*.py', context := 'native', ignore_errors := true);
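One way to obtain comparable timings without an external stopwatch is DuckDB's `EXPLAIN ANALYZE`, which executes the query and reports the total time along with per-operator timings. The sketch below wraps two of the queries above. In the DuckDB CLI, `.timer on` is an alternative that prints the elapsed time after every statement.

```sql
-- Executes the query and prints the plan annotated with timings.
EXPLAIN ANALYZE
SELECT COUNT(*) FROM read_ast('**/*.py', context := 'none', ignore_errors := true);

EXPLAIN ANALYZE
SELECT COUNT(*) FROM read_ast('**/*.py', context := 'native', ignore_errors := true);
```

The first run over a directory tree may include filesystem cache warm-up, so it is worth repeating each query a few times before comparing levels.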
Combining with Other Parameters¶
With Source Control¶
-- Full context, full source text
SELECT type, name, signature_type, parameters, peek
FROM read_ast('test/data/python/sample_app.py', context := 'native', source := 'full');
With Structure Control¶
-- Minimal structure, full context
SELECT type, name, semantic_type
FROM read_ast('test/data/python/sample_app.py', context := 'native', structure := 'minimal');
Optimized for Large Codebases¶
-- Balance performance and information
SELECT
    file_path,
    name,
    semantic_type_to_string(semantic_type) AS type_name,
    start_line
FROM read_ast(
    '**/*.py',
    context := 'normalized',  -- Faster than 'native'
    source := 'none',         -- Skip source extraction
    structure := 'minimal',   -- Minimal tree info
    ignore_errors := true
)
WHERE is_definition(semantic_type);
Next Steps¶
- Cross-Language Analysis - Comparing across languages
- Semantic Types - Type system details
- API Reference - All parameters