Native Extraction Field Semantics

This document describes the semantic meaning of each native extraction field across different languages and semantic types.

Native Extraction Fields

| Field | Type | Description |
|---|---|---|
| name | VARCHAR | The identifier name of the node (function name, class name, variable name) |
| signature_type | VARCHAR | Type information (return type for functions, class type, variable type) |
| parameters | VARCHAR[] | Parameter names for functions, argument values for calls |
| modifiers | VARCHAR[] | Access modifiers, keywords, inheritance info |
| annotations | VARCHAR | Decorator/annotation text |
| qualified_name | LIST<STRUCT> | Scope path as segment list [{semantic_type, name, index}, ...], unique within a file. Top-level column, not part of native struct. |

DEFINITION_FUNCTION (Functions/Methods)

Field Semantics

| Field | Meaning |
|---|---|
| name | Function/method name |
| signature_type | Return type (language-specific) |
| parameters | List of parameter names |
| modifiers | Access modifiers (public, static, async, etc.) |

Cross-Language Comparison

| Language | signature_type | parameters | modifiers | Notes |
|---|---|---|---|---|
| Python | NULL | ✓ param names | [] | No return type in AST without annotations |
| Rust | ✓ return type (u64, ()) | ✓ param names | [] | Full return type extraction |
| C++ | ✓ return type (int, void) | ✓ param names | [] | Return type from declaration |
| Java | ✓ return type (void, BigInteger) | ✓ param names | ✓ [public, static] | Full modifier extraction |
| Go | ✓ return type (*big.Int, float64) | ✓ param names | [] | Supports pointer/complex types |
| JavaScript | function (literal) | ✓ param names | [] | No type system |
| Lua | NULL | [] | [] | Minimal extraction |

Examples

┌──────────┬─────────────────────┬────────────────┬────────────┬──────────────────┬────────────────────────────────────────┐
│ Language │ name                │ signature_type │ parameters │ modifiers        │ peek                                   │
├──────────┼─────────────────────┼────────────────┼────────────┼──────────────────┼────────────────────────────────────────┤
│ python   │ factorial           │ NULL           │ [n]        │ []               │ def factorial(n):                      │
│ rust     │ factorial_recursive │ u64            │ [n]        │ []               │ fn factorial_recursive(n: u64) -> u64  │
│ cpp      │ factorial           │ int            │ [n]        │ []               │ int factorial(int n)                   │
│ java     │ main                │ void           │ [args]     │ [public, static] │ public static void main(String[] args) │
│ go       │ factorial           │ *big.Int       │ [n]        │ []               │ func factorial(n int64) *big.Int       │
│ js       │ factorial           │ function       │ [n]        │ []               │ function factorial(n)                  │
└──────────┴─────────────────────┴────────────────┴────────────┴──────────────────┴────────────────────────────────────────┘
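
As the Java row shows, modifiers is a list, so it can be filtered with DuckDB's list functions. The following is a minimal sketch, not a canonical query: the glob 'src/**/*.java' is a placeholder path, and it assumes modifiers holds bare keywords such as 'static', as in the example row above.

```sql
-- Sketch: find static Java methods via the modifiers list.
-- 'src/**/*.java' is a placeholder; adjust to your source tree.
SELECT name, signature_type, parameters, modifiers
FROM read_ast('src/**/*.java', context := 'native')
WHERE semantic_type_to_string(semantic_type) = 'DEFINITION_FUNCTION'
  AND list_contains(modifiers, 'static');  -- list_contains is a DuckDB built-in
```

For languages whose modifiers column is [] in the comparison table (Python, Rust, C++, Go, JavaScript, Lua), this filter matches nothing.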

DEFINITION_CLASS (Classes/Types)

Field Semantics

| Field | Meaning |
|---|---|
| name | Class/type name |
| signature_type | Class kind (class, interface, abstract_class, trait, enum) |
| parameters | [] (unused for classes) |
| modifiers | Inheritance info (extends X, implements Y), access modifiers |

Cross-Language Comparison

| Language | signature_type | modifiers | Notes |
|---|---|---|---|
| Python | class, abstract_class | ✓ inheritance, has_dunder_methods | Detects ABC subclasses |
| Java | class, interface, abstract_class | extends, implements, access | Full OOP support |
| C++ | NULL | [] | Limited extraction |
| Rust | trait, struct, enum | [] | Trait detection works |
| Go | struct, interface | [] | Basic type detection |

Examples

┌──────────┬───────────────────────┬────────────────┬──────────────────────────────────────┬────────────────────────────────────────┐
│ Language │ type                  │ signature_type │ modifiers                            │ peek                                   │
├──────────┼───────────────────────┼────────────────┼──────────────────────────────────────┼────────────────────────────────────────┤
│ python   │ class_definition      │ class          │ [extends_object, has_dunder_methods] │ class BaseQueue(object):               │
│ python   │ class_definition      │ abstract_class │ [abstract, has_dunder_methods]       │ class BaseQueue():                     │
│ java     │ interface_declaration │ interface      │ [interface]                          │ interface Example {                    │
│ java     │ class_declaration     │ class          │ [implements Example]                 │ class ExampleImpl implements Example { │
│ java     │ class_declaration     │ abstract_class │ [abstract]                           │ abstract class Example {               │
│ rust     │ trait_item            │ trait          │ []                                   │ trait Shape { fn area(self) -> i32; }  │
└──────────┴───────────────────────┴────────────────┴──────────────────────────────────────┴────────────────────────────────────────┘
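
Because the class kind lives in signature_type, abstract types can be selected without inspecting the source text. This is a sketch under assumptions: the file glob is a placeholder, and the kind strings are the ones listed in the tables above.

```sql
-- Sketch: list interfaces, abstract classes, and traits by their class kind.
-- 'src/**/*.java' is a placeholder; any supported language glob can be used.
SELECT file_path, name, signature_type, modifiers
FROM read_ast('src/**/*.java', context := 'native')
WHERE semantic_type_to_string(semantic_type) = 'DEFINITION_CLASS'
  AND signature_type IN ('interface', 'abstract_class', 'trait')
ORDER BY file_path, name;
```

C++ classes would not appear here, since the comparison table reports signature_type as NULL for C++.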

COMPUTATION_CALL (Function/Method Calls)

Field Semantics

| Field | Meaning |
|---|---|
| name | Function name (for simple calls) OR empty for method calls |
| signature_type | Full call expression (e.g., obj.method, pkg.func) |
| parameters | Argument values/expressions |
| modifiers | [] |

Important Note: Method Calls

For method calls like obj.method():

- name is empty (FIND_IDENTIFIER doesn't traverse member expressions)
- signature_type contains the full call: obj.method
- Use signature_type LIKE '%.methodname' to find method calls

Cross-Language Comparison

| Language | name (simple call) | signature_type | parameters | Notes |
|---|---|---|---|---|
| Python | print | sys.stdout.write | ✓ arg values | Method calls have empty name |
| Java | println | System.out | ✓ arg values | Method invocations captured |
| C++ | std::print | ✓ full qualified | ✓ arg values | Namespace-qualified calls |
| Go | empty for pkg calls | fmt.Println | ✓ arg values | Package calls use signature_type |
| Rust | ✓ macro names | ✓ macro name | [] | Macros captured separately |

Examples

┌──────────┬───────────────────┬────────────┬──────────────────┬────────────┬────────────────────────────────────┐
│ Language │ type              │ name       │ signature_type   │ parameters │ peek                               │
├──────────┼───────────────────┼────────────┼──────────────────┼────────────┼────────────────────────────────────┤
│ python   │ call              │ print      │ print            │ ['']       │ print("Hello world!")              │
│ python   │ call              │            │ sys.stdout.write │ ['']       │ sys.stdout.write("Hello world!\n") │
│ java     │ method_invocation │ println    │ System.out       │ ['']       │ System.out.println("Hello world!") │
│ cpp      │ call_expression   │ std::print │ std::print       │ ['']       │ std::print("Hello world!\n")       │
│ go       │ call_expression   │            │ fmt.Println      │ ['']       │ fmt.Println("Hello world!")        │
│ rust     │ macro_invocation  │ println    │ println          │ []         │ println!("Hello world!")           │
└──────────┴───────────────────┴────────────┴──────────────────┴────────────┴────────────────────────────────────┘

Finding Method Calls

-- Find all calls to a method named 'empty' (works across all languages)
SELECT file_path, start_line, signature_type, peek
FROM read_ast('src/**/*.cpp', context := 'native')
WHERE semantic_type_to_string(semantic_type) = 'COMPUTATION_CALL'
  AND (
    name = 'empty'                    -- Simple function call
    OR signature_type LIKE '%.empty'  -- Method call via dot
    OR signature_type LIKE '%->empty' -- C++ arrow notation
  );

DEFINITION_VARIABLE (Variables/Fields)

Field Semantics

| Field | Meaning |
|---|---|
| name | Variable name |
| signature_type | Variable type (when available) |
| parameters | [] |
| modifiers | Declaration keywords (var, let, const, final) |

Cross-Language Comparison

| Language | signature_type | modifiers | Notes |
|---|---|---|---|
| Go | ✓ type (int, etc.) | [var] | var_spec has best extraction |
| Java | ✓ type (boolean[]) | [] | local_variable_declaration |
| Rust | ✓ type | [let], [mut] | Pattern-based |
| Python | NULL | [] | Dynamic typing |
| JavaScript | NULL | [const], [let], [var] | Declaration keyword captured |

Examples

┌──────────┬────────────────────────────┬─────────────┬────────────────┬───────────┬─────────────────────────────────────┐
│ Language │ type                       │ name        │ signature_type │ modifiers │ peek                                │
├──────────┼────────────────────────────┼─────────────┼────────────────┼───────────┼─────────────────────────────────────┤
│ go       │ var_spec                   │ door        │ int            │ [var]     │ door int = 1                        │
│ go       │ var_spec                   │ incrementer │ NULL           │ [var]     │ incrementer = 0                     │
│ java     │ local_variable_declaration │             │ boolean[]      │ []        │ boolean[] doors = new boolean[101]; │
│ java     │ variable_declarator        │ doors       │ NULL           │ []        │ doors = new boolean[101]            │
└──────────┴────────────────────────────┴─────────────┴────────────────┴───────────┴─────────────────────────────────────┘
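
In the Java rows above, the type sits on the local_variable_declaration node while the name sits on the variable_declarator node. One way to see both on a single row is to join the two on parent_id, sketched below. It assumes the declarator is a direct child of the declaration (as in the tree-sitter Java grammar) and that node_id is only unique per file; the glob is a placeholder.

```sql
-- Sketch: pair each Java variable name with the type from its parent declaration.
SELECT v.file_path, v.start_line, v.name, d.signature_type AS declared_type
FROM read_ast('src/**/*.java', context := 'native') d
JOIN read_ast('src/**/*.java', context := 'native') v
  ON v.file_path = d.file_path      -- assumed: node_id is scoped to a file
 AND v.parent_id = d.node_id        -- assumed: declarator is a direct child
WHERE d.type = 'local_variable_declaration'
  AND v.type = 'variable_declarator';
```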


Known Inconsistencies

1. Call Parameters Contain Empty Strings

For COMPUTATION_CALL, the parameters field contains [''] (array with empty string) instead of:

- Empty array [] when no arguments, OR
- Actual argument expressions when there are arguments

Current behavior:

│ go   │ call_expression   │ fmt.Println │ [''] │ fmt.Println("Hello world!") │
│ java │ method_invocation │ println     │ [''] │ System.out.println("Hello") │

Expected: parameters should contain ['"Hello world!"'] or []
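
Until this is fixed, one possible stopgap is to detect the [''] placeholder and recover the argument text from peek. The sketch below runs on inline sample rows modeled on the ones above, so it needs no source files; in practice the rows would come from read_ast. It assumes peek contains the complete call expression, which depends on the peek setting.

```sql
-- Sketch: flag placeholder parameters and fall back to the text between the
-- outermost parentheses of peek. Sample rows are illustrative only.
WITH calls(language, signature_type, parameters, peek) AS (
    VALUES
        ('go',   'fmt.Println', [''], 'fmt.Println("Hello world!")'),
        ('java', 'System.out',  [''], 'System.out.println("Hello")')
)
SELECT
    language,
    signature_type,
    parameters = [''] AS has_placeholder,               -- list equality
    regexp_extract(peek, '\((.*)\)', 1) AS args_from_peek
FROM calls;
```

The extracted text is a single string, not a parsed argument list; splitting on commas would break on nested calls or string literals containing commas.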

2. Method Call Names Are Empty

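For method calls like obj.method():

- name is empty in most languages (Java is an exception in the examples above: name holds println and signature_type holds only the receiver, System.out)
- signature_type contains the full expression obj.method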

Workaround: Use signature_type LIKE '%.methodname'
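
When the method name is needed as a column rather than as a filter, it can be derived from signature_type. The following sketch uses inline rows copied from the COMPUTATION_CALL examples earlier in this document; the callee column name is hypothetical. It treats both NULL and '' as an empty name, since the document does not say which is stored.

```sql
-- Sketch: use name when populated, otherwise the trailing identifier of
-- signature_type (whatever follows the last '.', '->', or '::').
WITH calls(language, name, signature_type) AS (
    VALUES
        ('python', 'print',   'print'),
        ('python', '',        'sys.stdout.write'),
        ('java',   'println', 'System.out'),
        ('go',     '',        'fmt.Println')
)
SELECT
    language,
    signature_type,
    COALESCE(
        NULLIF(name, ''),
        regexp_extract(signature_type, '([A-Za-z_][A-Za-z0-9_]*)$', 1)
    ) AS callee
FROM calls;
```

Preferring name over signature_type matters for the Java row, where signature_type ends in the receiver rather than the method.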

3. Lua Parameter Extraction Missing

Lua functions show parameters = [] even when they have parameters:

│ lua │ function_declaration │ fact │ [] │ function fact(n)      │
│ lua │ function_declaration │ fact │ [] │ function fact(n, acc) │

Expected: parameters should be ['n'] and ['n', 'acc']
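
As an interim workaround, parameter names can sometimes be recovered from peek. The sketch below is only a heuristic run on inline sample rows; it assumes peek includes the full parameter list on one line and that the list contains plain comma-separated names.

```sql
-- Sketch: take the text inside the first pair of parentheses and split on commas.
WITH lua_functions(name, peek) AS (
    VALUES
        ('fact', 'function fact(n)'),
        ('fact', 'function fact(n, acc)')
)
SELECT
    name,
    peek,
    string_split_regex(
        regexp_extract(peek, '\(([^)]*)\)', 1),  -- text between ( and )
        '\s*,\s*'                                -- comma with optional spaces
    ) AS parameters_from_peek
FROM lua_functions;
```

A function with no parameters yields a list containing one empty string under this approach, so it would need separate handling.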

4. qualified_name — Scope-Based Definition Path

The qualified_name field is populated for all named definition nodes across all languages. It provides a scope-based path that disambiguates nodes with the same name.

Format: LIST(STRUCT(semantic_type SEMANTIC_TYPE, name VARCHAR, index INTEGER)). One element per scope level, outermost → innermost.

| Example code | qualified_name (as struct list) | ast_qualified_name_as_string() |
|---|---|---|
| Top-level def process(): | [{function, process, 1}] | F[process] |
| class User: def __init__(): | [{class, User, 1}, {function, __init__, 1}] | C[User] F[__init__] |
| Nested class → class → init | [{class, Account, 1}, {class, Settings, 1}, {function, __init__, 1}] | C[Account] C[Settings] F[__init__] |
| def outer(): def inner(): | [{function, outer, 1}, {function, inner, 1}] | F[outer] F[inner] |
| Two x = ... in same scope | [{variable, x, 1}], [{variable, x, 2}] | V[x], V[x][2] |

Key properties:

- NULL for non-definition nodes (calls, identifiers, operators, etc.)
- Available at context := 'normalized' and above (not just native)
- Unique within a file: the explicit index field disambiguates same-name collisions within a scope
- Excludes file_path; use USING (file_path, qualified_name) for cross-file joins
- Language-agnostic format (same semantic type values across all 27 languages)
- Queryable structurally via DuckDB list/struct functions: len(), list_filter(), list_transform(), qualified_name[-1].name, etc.
- ast_qualified_name_as_string() renders the list as a bracket string for display, logging, or LIKE-style matching
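
The structural access mentioned above can be tried without any source files. This sketch builds sample qualified_name values by hand from the table of examples; it uses plain strings for semantic_type where the real column uses SEMANTIC_TYPE, and the file name 'a.py' and the output column names are made up for illustration.

```sql
-- Sketch: inspect scope paths with list indexing and struct field access.
WITH defs(file_path, name, qualified_name) AS (
    VALUES
        ('a.py', 'process',
         [{'semantic_type': 'function', 'name': 'process', 'index': 1}]),
        ('a.py', '__init__',
         [{'semantic_type': 'class', 'name': 'User', 'index': 1},
          {'semantic_type': 'function', 'name': '__init__', 'index': 1}]),
        ('a.py', '__init__',
         [{'semantic_type': 'class', 'name': 'Account', 'index': 1},
          {'semantic_type': 'class', 'name': 'Settings', 'index': 1},
          {'semantic_type': 'function', 'name': '__init__', 'index': 1}])
)
SELECT
    name,
    len(qualified_name)     AS depth,            -- number of scope levels
    qualified_name[1].name  AS outermost_scope,  -- first segment
    qualified_name[-2].name AS enclosing_scope   -- segment just above the leaf
FROM defs
WHERE len(qualified_name) >= 2                   -- skip top-level definitions
  AND qualified_name[-1].name = '__init__';
```

The filter keeps the two nested __init__ definitions and drops the top-level process function, which shows how the path separates definitions that share a name.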


Summary: Extraction Quality by Language

Language Functions Classes Calls Variables Body Detection Overall
Java ⭐⭐⭐ ⭐⭐⭐ ⭐⭐⭐ ⭐⭐ ⭐⭐⭐ Excellent
Rust ⭐⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐⭐ Very Good
Go ⭐⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐⭐ ⭐⭐⭐ Very Good
C++ ⭐⭐⭐ ⭐⭐⭐ ⭐⭐ ⭐⭐⭐ Good
Python ⭐⭐ ⭐⭐⭐ ⭐⭐ ⭐⭐⭐ Good
JavaScript ⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐⭐ Good
TypeScript ⭐⭐⭐ ⭐⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐⭐ Very Good
Dart ⭐⭐⭐ ⭐⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐ Very Good
Kotlin ⭐⭐⭐ ⭐⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐⭐ Very Good
Swift ⭐⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐⭐ Good
C# ⭐⭐⭐ ⭐⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐⭐ Very Good
PHP ⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐⭐ Good
Ruby ⭐⭐ ⭐⭐ ⭐⭐ ⭐⭐⭐ Good
R ⭐⭐ ⭐⭐ ⭐⭐ Basic
Lua ⭐⭐⭐ Needs Work
Bash ⭐⭐ N/A ⭐⭐ ⭐⭐⭐ Basic

Body Detection Notes

The Body Detection column rates how reliably function bodies are distinguished from declaration-only nodes:

- ⭐⭐⭐ = Runtime body detection works reliably (>95% accuracy)
- ⭐⭐ = Body detection works but with limitations (>80% accuracy)
- ⭐ = Body detection has significant issues

Implementation Details:

- Uses IS_DECLARATION_ONLY flag for forward declarations (abstract methods, interface methods, signatures)
- Uses IS_SYNTAX_ONLY flag for pure syntax tokens (keywords, punctuation)
- Runtime HasBodyChild() detection for languages with abstract methods (Java, C#, TypeScript)
- Body types detected: block, compound_statement, statement_block, function_body, body, body_statement, braced_expression, constructor_body

Language-Specific Notes:

- Dart: Uses sibling structure (signature and body are siblings), requires explicit IS_DECLARATION_ONLY marking
- R: Lambda expressions use braced_expression bodies; ~84% detection accuracy
- TypeScript: Interface method signatures marked as IS_DECLARATION_ONLY


Common Query Patterns

Find a specific function definition

SELECT name, signature_type, parameters, start_line, peek
FROM read_ast('src/**/*.py', context := 'native', peek := 'full')
WHERE semantic_type_to_string(semantic_type) = 'DEFINITION_FUNCTION'
  AND name = 'my_function';

Find a method within a class (Python)

-- Step 1: pair each class with the block node that holds its body.
WITH class_blocks AS (
    SELECT c.name AS class_name, c.node_id AS class_id, b.node_id AS block_id
    FROM read_ast('myfile.py', context := 'native') c
    JOIN read_ast('myfile.py', context := 'native') b ON b.parent_id = c.node_id
    WHERE c.type = 'class_definition' AND b.type = 'block'
)
-- Step 2: select functions that are direct children of that block.
SELECT
    -- Display string only; aliased so it does not shadow the qualified_name column.
    cb.class_name || '.' || m.name AS method_path,
    m.signature_type,
    m.parameters,
    m.start_line
FROM class_blocks cb
JOIN read_ast('myfile.py', context := 'native') m ON m.parent_id = cb.block_id
WHERE m.type = 'function_definition'
  AND m.name = 'my_method';

Find all calls to a method (any object)

SELECT file_path, start_line, signature_type, peek
FROM read_ast('src/**/*.cpp', context := 'native', peek := 60)
WHERE semantic_type_to_string(semantic_type) = 'COMPUTATION_CALL'
  AND (
    name = 'size'                    -- Simple function call
    OR signature_type LIKE '%.size'  -- obj.size()
    OR signature_type LIKE '%->size' -- ptr->size()
  );

Compare function signatures across languages

SELECT language, name, signature_type, parameters
FROM read_ast([
    'src/main.py',
    'src/main.rs',
    'src/main.go'
], context := 'native')
WHERE semantic_type_to_string(semantic_type) = 'DEFINITION_FUNCTION'
ORDER BY name, language;