Native Extraction Field Semantics
This document describes what each native extraction field means for each language and semantic type.
Native Extraction Fields
| Field | Type | Description |
|---|---|---|
| name | VARCHAR | The identifier name of the node (function name, class name, variable name) |
| signature_type | VARCHAR | Type information (return type for functions, class type, variable type) |
| parameters | VARCHAR[] | Parameter names for functions, argument values for calls |
| modifiers | VARCHAR[] | Access modifiers, keywords, inheritance info |
| annotations | VARCHAR | Decorator/annotation text |
| qualified_name | LIST<STRUCT> | Scope path as segment list [{semantic_type, name, index}, ...], unique within a file. Top-level column, not part of native struct. |
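As a minimal sketch of how these fields surface in a query, the following lists all of them for function definitions. It assumes the fields are exposed as top-level columns when read_ast is called with context := 'native', as in the query patterns later in this document. The glob and the LIMIT are placeholders.

```sql
-- Sketch: inspect the native extraction fields for function definitions.
-- 'src/**/*.py' is a placeholder glob; adjust it to your project.
SELECT
    name,
    signature_type,
    parameters,
    modifiers,
    annotations,
    qualified_name   -- top-level column, not part of the native struct
FROM read_ast('src/**/*.py', context := 'native')
WHERE semantic_type_to_string(semantic_type) = 'DEFINITION_FUNCTION'
LIMIT 10;
```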
DEFINITION_FUNCTION (Functions/Methods)
Field Semantics
| Field | Meaning |
|---|---|
| name | Function/method name |
| signature_type | Return type (language-specific) |
| parameters | List of parameter names |
| modifiers | Access modifiers (public, static, async, etc.) |
Cross-Language Comparison
| Language | signature_type | parameters | modifiers | Notes |
|---|---|---|---|---|
| Python | NULL | ✓ param names | [] | No return type in AST without annotations |
| Rust | ✓ return type (u64, ()) | ✓ param names | [] | Full return type extraction |
| C++ | ✓ return type (int, void) | ✓ param names | [] | Return type from declaration |
| Java | ✓ return type (void, BigInteger) | ✓ param names | ✓ [public, static] | Full modifier extraction |
| Go | ✓ return type (*big.Int, float64) | ✓ param names | [] | Supports pointer/complex types |
| JavaScript | function (literal) | ✓ param names | [] | No type system |
| Lua | NULL | [] | [] | Minimal extraction |
Examples
┌──────────┬─────────────────────┬────────────────┬────────────┬──────────────────┬────────────────────────────────────────┐
│ Language │ name │ signature_type │ parameters │ modifiers │ peek │
├──────────┼─────────────────────┼────────────────┼────────────┼──────────────────┼────────────────────────────────────────┤
│ python │ factorial │ NULL │ [n] │ [] │ def factorial(n): │
│ rust │ factorial_recursive │ u64 │ [n] │ [] │ fn factorial_recursive(n: u64) -> u64 │
│ cpp │ factorial │ int │ [n] │ [] │ int factorial(int n) │
│ java │ main │ void │ [args] │ [public, static] │ public static void main(String[] args) │
│ go │ factorial │ *big.Int │ [n] │ [] │ func factorial(n int64) *big.Int │
│ js │ factorial │ function │ [n] │ [] │ function factorial(n) │
└──────────┴─────────────────────┴────────────────┴────────────┴──────────────────┴────────────────────────────────────────┘
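Because modifiers is a VARCHAR[] column, it can be filtered with DuckDB's list functions. The sketch below looks for Java methods that are both public and static, matching the main row above. The glob is a placeholder, and the query assumes each modifier is stored as its own list element, as the [public, static] example suggests.

```sql
-- Sketch: Java methods carrying both the 'public' and 'static' modifiers.
-- list_contains() is a built-in DuckDB list function.
SELECT name, signature_type AS return_type, parameters, peek
FROM read_ast('src/**/*.java', context := 'native')
WHERE semantic_type_to_string(semantic_type) = 'DEFINITION_FUNCTION'
  AND list_contains(modifiers, 'public')
  AND list_contains(modifiers, 'static');
```

The same filter returns nothing for languages whose modifiers column is [] in the comparison table (Python, Rust, C++, Go, JavaScript, Lua).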
DEFINITION_CLASS (Classes/Types)
Field Semantics
| Field | Meaning |
|---|---|
| name | Class/type name |
| signature_type | Class kind (class, interface, abstract_class, trait, enum) |
| parameters | [] (unused for classes) |
| modifiers | Inheritance info (extends X, implements Y), access modifiers |
Cross-Language Comparison
| Language | signature_type | modifiers | Notes |
|---|---|---|---|
| Python | class, abstract_class | ✓ inheritance, has_dunder_methods | Detects ABC subclasses |
| Java | class, interface, abstract_class | ✓ extends, implements, access | Full OOP support |
| C++ | NULL | [] | Limited extraction |
| Rust | trait, struct, enum | [] | Trait detection works |
| Go | struct, interface | [] | Basic type detection |
Examples
┌──────────┬───────────────────────┬────────────────┬──────────────────────────────────────┬────────────────────────────────────────┐
│ Language │ type │ signature_type │ modifiers │ peek │
├──────────┼───────────────────────┼────────────────┼──────────────────────────────────────┼────────────────────────────────────────┤
│ python │ class_definition │ class │ [extends_object, has_dunder_methods] │ class BaseQueue(object): │
│ python │ class_definition │ abstract_class │ [abstract, has_dunder_methods] │ class BaseQueue(): │
│ java │ interface_declaration │ interface │ [interface] │ interface Example { │
│ java │ class_declaration │ class │ [implements Example] │ class ExampleImpl implements Example { │
│ java │ class_declaration │ abstract_class │ [abstract] │ abstract class Example { │
│ rust │ trait_item │ trait │ [] │ trait Shape { fn area(self) -> i32; } │
└──────────┴───────────────────────┴────────────────┴──────────────────────────────────────┴────────────────────────────────────────┘
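One way to use the class kind and inheritance modifiers together is sketched below: it lists Java interfaces, abstract classes, and classes that implement something. The glob is a placeholder, and the LIKE pattern assumes inheritance is stored as a single element such as 'implements Example', as in the rows above.

```sql
-- Sketch: Java interfaces, abstract classes, and implementing classes.
-- array_to_string() flattens the modifiers list so LIKE can search it.
SELECT name, signature_type AS class_kind, modifiers, peek
FROM read_ast('src/**/*.java', context := 'native')
WHERE semantic_type_to_string(semantic_type) = 'DEFINITION_CLASS'
  AND (
        signature_type IN ('interface', 'abstract_class')
        OR array_to_string(modifiers, ' | ') LIKE '%implements %'
      );
```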
COMPUTATION_CALL (Function/Method Calls)
Field Semantics
| Field | Meaning |
|---|---|
| name | Function name (for simple calls) OR empty for method calls |
| signature_type | Full call expression (e.g., obj.method, pkg.func) |
| parameters | Argument values/expressions |
| modifiers | [] |
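Important Note: Method Calls
For method calls like obj.method():
- name is empty in most languages (FIND_IDENTIFIER doesn't traverse member expressions)
- signature_type contains the full call: obj.method
- Use signature_type LIKE '%.methodname' to find method calls
- Exception: in the Java examples below, method_invocation nodes populate name with the method (println) and put only the receiver in signature_type (System.out), so match on name for Java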
Cross-Language Comparison
| Language | name (simple call) | signature_type | parameters | Notes |
|---|---|---|---|---|
| Python | ✓ print | ✓ sys.stdout.write | ✓ arg values | Method calls have empty name |
| Java | ✓ println | ✓ System.out | ✓ arg values | Method invocations captured |
| C++ | ✓ std::print | ✓ full qualified | ✓ arg values | Namespace-qualified calls |
| Go | empty for pkg calls | ✓ fmt.Println | ✓ arg values | Package calls use signature_type |
| Rust | ✓ macro names | ✓ macro name | [] | Macros captured separately |
Examples
┌──────────┬───────────────────┬────────────┬──────────────────┬────────────┬────────────────────────────────────┐
│ Language │ type │ name │ signature_type │ parameters │ peek │
├──────────┼───────────────────┼────────────┼──────────────────┼────────────┼────────────────────────────────────┤
│ python │ call │ print │ print │ [''] │ print("Hello world!") │
│ python │ call │ │ sys.stdout.write │ [''] │ sys.stdout.write("Hello world!\n") │
│ java │ method_invocation │ println │ System.out │ [''] │ System.out.println("Hello world!") │
│ cpp │ call_expression │ std::print │ std::print │ [''] │ std::print("Hello world!\n") │
│ go │ call_expression │ │ fmt.Println │ [''] │ fmt.Println("Hello world!") │
│ rust │ macro_invocation │ println │ println │ [] │ println!("Hello world!") │
└──────────┴───────────────────┴────────────┴──────────────────┴────────────┴────────────────────────────────────┘
Finding Method Calls
```sql
-- Find all calls to a method named 'empty'.
-- The predicate is language-agnostic; change the glob to target other languages.
SELECT file_path, start_line, signature_type, peek
FROM read_ast('src/**/*.cpp', context := 'native')
WHERE semantic_type_to_string(semantic_type) = 'COMPUTATION_CALL'
  AND (
        name = 'empty'                    -- Simple function call
        OR signature_type LIKE '%.empty'  -- Method call via dot
        OR signature_type LIKE '%->empty' -- C++ arrow notation
      );
```
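When the callee name is needed as a column rather than as a filter, it can be derived from name and signature_type. The sketch below runs on inline rows copied from the examples table, so it does not call read_ast. The callee alias is hypothetical, and treating an empty name as either '' or NULL is an assumption.

```sql
-- Sketch: derive a callee name when name is empty.
-- The rows mirror the examples table above; no AST is read here.
WITH calls(lang, name, signature_type) AS (
    VALUES
        ('python', 'print',   'print'),
        ('python', '',        'sys.stdout.write'),
        ('java',   'println', 'System.out'),
        ('go',     '',        'fmt.Println')
)
SELECT
    lang,
    signature_type,
    -- Prefer name; otherwise take the last dot-separated segment.
    COALESCE(NULLIF(name, ''),
             string_split(signature_type, '.')[-1]) AS callee
FROM calls;
```

For these rows the derived callee is print, write, println, and Println. C++ arrow calls (ptr->size) contain no dot in the final segment and would need a separate split on '->'.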
DEFINITION_VARIABLE (Variables/Fields)
Field Semantics
| Field | Meaning |
|---|---|
| name | Variable name |
| signature_type | Variable type (when available) |
| parameters | [] |
| modifiers | Declaration keywords (var, let, const, final) |
Cross-Language Comparison
| Language | signature_type | modifiers | Notes |
|---|---|---|---|
| Go | ✓ type (int, etc.) | ✓ [var] | var_spec has best extraction |
| Java | ✓ type (boolean[]) | [] | local_variable_declaration |
| Rust | ✓ type | ✓ [let], [mut] | Pattern-based |
| Python | NULL | [] | Dynamic typing |
| JavaScript | NULL | ✓ [const], [let], [var] | Declaration keyword captured |
Examples
┌──────────┬────────────────────────────┬─────────────┬────────────────┬───────────┬─────────────────────────────────────┐
│ Language │ type │ name │ signature_type │ modifiers │ peek │
├──────────┼────────────────────────────┼─────────────┼────────────────┼───────────┼─────────────────────────────────────┤
│ go │ var_spec │ door │ int │ [var] │ door int = 1 │
│ go │ var_spec │ incrementer │ NULL │ [var] │ incrementer = 0 │
│ java │ local_variable_declaration │ │ boolean[] │ [] │ boolean[] doors = new boolean[101]; │
│ java │ variable_declarator │ doors │ NULL │ [] │ doors = new boolean[101] │
└──────────┴────────────────────────────┴─────────────┴────────────────┴───────────┴─────────────────────────────────────┘
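In the Java rows above, the type sits on the local_variable_declaration node and the name on the variable_declarator node. A possible way to recombine them is a self-join on parent_id, sketched below. It assumes the declarator is a direct child of the declaration (not confirmed by this document) and joins on file_path as well, in case node_id values are only unique per file.

```sql
-- Sketch: pair each Java variable name with the type on its parent declaration.
-- Assumes variable_declarator is a direct child of local_variable_declaration.
SELECT
    d.file_path,
    d.name,
    p.signature_type AS declared_type,
    d.start_line
FROM read_ast('src/**/*.java', context := 'native') d
JOIN read_ast('src/**/*.java', context := 'native') p
  ON  p.node_id   = d.parent_id
  AND p.file_path = d.file_path
WHERE d.type = 'variable_declarator'
  AND p.type = 'local_variable_declaration';
```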
Known Inconsistencies
1. Call Parameters Contain Empty Strings
For COMPUTATION_CALL, the parameters field contains [''] (array with empty string) instead of:
- Empty array [] when no arguments, OR
- Actual argument expressions when there are arguments
Current behavior:
│ go │ call_expression │ fmt.Println │ [''] │ fmt.Println("Hello world!") │
│ java │ method_invocation │ println │ [''] │ System.out.println("Hello") │
Expected: parameters should contain ['"Hello world!"'] or []
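To gauge how often this occurs in a codebase, the affected rows can be selected by comparing parameters against a one-element list holding an empty string. This is a sketch only; the glob is a placeholder.

```sql
-- Sketch: list calls whose parameters column is [''].
SELECT file_path, start_line, name, signature_type, peek
FROM read_ast('src/**/*.go', context := 'native')
WHERE semantic_type_to_string(semantic_type) = 'COMPUTATION_CALL'
  AND parameters = [''];   -- DuckDB compares lists element by element
```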
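2. Method Call Names Are Empty
For method calls like obj.method():
- name is empty in most languages (Java method_invocation is an exception in the examples above: name = println, signature_type = System.out)
- signature_type contains the full expression obj.method
Workaround: Use signature_type LIKE '%.methodname', and also match name = 'methodname' to cover Java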
3. Lua Parameter Extraction Missing
Lua functions show parameters = [] even when they have parameters:
│ lua │ function_declaration │ fact │ [] │ function fact(n) │
│ lua │ function_declaration │ fact │ [] │ function fact(n, acc) │
Expected: parameters should be ['n'] and ['n', 'acc']
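Until extraction is fixed, parameter names can be approximated from peek with a regular expression. The sketch below runs on inline rows copied from the examples above. The recovered_parameters alias is hypothetical, and the approach is a text-level workaround, not part of the extractor: it assumes the parameter list fits inside peek and contains no nested parentheses.

```sql
-- Sketch: recover Lua parameter names from the peek text.
-- The rows mirror the examples above; no AST is read here.
WITH lua_functions(name, peek) AS (
    VALUES
        ('fact', 'function fact(n)'),
        ('fact', 'function fact(n, acc)')
)
SELECT
    name,
    peek,
    -- Capture the text between the first pair of parentheses, then split on commas.
    regexp_split_to_array(
        regexp_extract(peek, '\(([^)]*)\)', 1),
        '\s*,\s*'
    ) AS recovered_parameters
FROM lua_functions;
```

A function with no parameters yields a single empty string here, so filter that case separately if it matters.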
4. qualified_name: Scope-Based Definition Path
The qualified_name field is populated for all named definition nodes across all languages. It provides a scope-based path that disambiguates nodes with the same name.
Format: LIST(STRUCT(semantic_type SEMANTIC_TYPE, name VARCHAR, index INTEGER)). One element per scope level, outermost → innermost.
| Example Code | qualified_name (as struct list) | ast_qualified_name_as_string() |
|---|---|---|
| Top-level def process(): | [{function, process, 1}] | F[process] |
| class User: def __init__(): | [{class, User, 1}, {function, __init__, 1}] | C[User] F[__init__] |
| Nested class → class → init | [{class, Account, 1}, {class, Settings, 1}, {function, __init__, 1}] | C[Account] C[Settings] F[__init__] |
| def outer(): def inner(): | [{function, outer, 1}, {function, inner, 1}] | F[outer] F[inner] |
| Two x = ... in same scope | [{variable, x, 1}], [{variable, x, 2}] | V[x], V[x][2] |
Key properties:
- NULL for non-definition nodes (calls, identifiers, operators, etc.)
- Available at context := 'normalized' and above (not just native)
- Unique within a file — explicit index field disambiguates same-name collisions within a scope
- Excludes file_path — use USING (file_path, qualified_name) for cross-file joins
- Language-agnostic format (same semantic type values across all 27 languages)
- Queryable structurally via DuckDB list/struct functions: len(), list_filter(), list_transform(), qualified_name[-1].name, etc.
- ast_qualified_name_as_string() renders the list as a bracket string for display, logging, or LIKE-style matching
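The properties above can be combined in a single query. The sketch below lists definitions that reuse a name within the same scope (index greater than 1), together with their nesting depth and display string. It assumes ast_qualified_name_as_string takes the qualified_name column as its only argument; the glob is a placeholder.

```sql
-- Sketch: definitions whose name is repeated within one scope.
SELECT
    file_path,
    len(qualified_name)                          AS depth,
    qualified_name[-1].name                      AS innermost_name,
    qualified_name[-1]."index"                   AS occurrence,  -- quoted because INDEX is an SQL keyword
    ast_qualified_name_as_string(qualified_name) AS display_path
FROM read_ast('src/**/*.py', context := 'normalized')
WHERE qualified_name IS NOT NULL
  AND qualified_name[-1]."index" > 1;
```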
Summary: Extraction Quality by Language
| Language | Functions | Classes | Calls | Variables | Body Detection | Overall |
|---|---|---|---|---|---|---|
| Java | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | Excellent |
| Rust | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | Very Good |
| Go | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | Very Good |
| C++ | ⭐⭐⭐ | ⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐ | Good |
| Python | ⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐ | ⭐⭐⭐ | Good |
| JavaScript | ⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | Good |
| TypeScript | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | Very Good |
| Dart | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐ | Very Good |
| Kotlin | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | Very Good |
| Swift | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | Good |
| C# | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐⭐⭐ | Very Good |
| PHP | ⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐ | ⭐⭐⭐ | Good |
| Ruby | ⭐⭐ | ⭐⭐ | ⭐⭐ | ⭐ | ⭐⭐⭐ | Good |
| R | ⭐⭐ | ⭐ | ⭐⭐ | ⭐ | ⭐⭐ | Basic |
| Lua | ⭐ | ⭐ | ⭐ | ⭐ | ⭐⭐⭐ | Needs Work |
| Bash | ⭐⭐ | N/A | ⭐⭐ | ⭐ | ⭐⭐⭐ | Basic |
Body Detection Notes
The Body Detection column rates how well we detect function bodies vs declaration-only:
- ⭐⭐⭐ = Runtime body detection works reliably (>95% accuracy)
- ⭐⭐ = Body detection works but with limitations (>80% accuracy)
- ⭐ = Body detection has significant issues
Implementation Details:
- Uses IS_DECLARATION_ONLY flag for forward declarations (abstract methods, interface methods, signatures)
- Uses IS_SYNTAX_ONLY flag for pure syntax tokens (keywords, punctuation)
- Runtime HasBodyChild() detection for languages with abstract methods (Java, C#, TypeScript)
- Body types detected: block, compound_statement, statement_block, function_body, body, body_statement, braced_expression, constructor_body
Language-Specific Notes:
- Dart: Uses sibling structure (signature and body are siblings), requires explicit IS_DECLARATION_ONLY marking
- R: Lambda expressions use braced_expression bodies; ~84% detection accuracy
- TypeScript: Interface method signatures marked as IS_DECLARATION_ONLY
Common Query Patterns
Find a specific function definition
```sql
SELECT name, signature_type, parameters, start_line, peek
FROM read_ast('src/**/*.py', context := 'native', peek := 'full')
WHERE semantic_type_to_string(semantic_type) = 'DEFINITION_FUNCTION'
  AND name = 'my_function';
```
Find a method within a class (Python)
```sql
-- Step 1: pair each class with its body block.
WITH class_blocks AS (
    SELECT c.name AS class_name, c.node_id AS class_id, b.node_id AS block_id
    FROM read_ast('myfile.py', context := 'native') c
    JOIN read_ast('myfile.py', context := 'native') b ON b.parent_id = c.node_id
    WHERE c.type = 'class_definition' AND b.type = 'block'
)
-- Step 2: find function definitions directly inside that block.
SELECT
    cb.class_name || '.' || m.name AS qualified_name,
    m.signature_type,
    m.parameters,
    m.start_line
FROM class_blocks cb
JOIN read_ast('myfile.py', context := 'native') m ON m.parent_id = cb.block_id
WHERE m.type = 'function_definition'
  AND m.name = 'my_method';
```
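The same lookup can probably be written without joins by using the qualified_name column described earlier, since the enclosing class appears as the second-to-last segment. This is a sketch under that assumption; 'MyClass' is a hypothetical class name, and ast_qualified_name_as_string is assumed to take the column as its only argument.

```sql
-- Sketch: find MyClass.my_method through the scope path instead of parent_id joins.
SELECT
    ast_qualified_name_as_string(qualified_name) AS display_path,
    signature_type,
    parameters,
    start_line
FROM read_ast('myfile.py', context := 'native')
WHERE semantic_type_to_string(semantic_type) = 'DEFINITION_FUNCTION'
  AND len(qualified_name) >= 2
  AND qualified_name[-1].name = 'my_method'   -- innermost segment: the method
  AND qualified_name[-2].name = 'MyClass';    -- enclosing segment: the class
```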
Find all calls to a method (any object)
```sql
SELECT file_path, start_line, signature_type, peek
FROM read_ast('src/**/*.cpp', context := 'native', peek := 60)
WHERE semantic_type_to_string(semantic_type) = 'COMPUTATION_CALL'
  AND (
        name = 'size'                    -- Simple function call
        OR signature_type LIKE '%.size'  -- obj.size()
        OR signature_type LIKE '%->size' -- ptr->size()
      );
```