
Semantic Types Reference

Complete reference for the semantic type system.

Overview

The semantic type system uses an 8-bit encoding to classify AST nodes into universal categories that work across all 27 supported languages.

The SEMANTIC_TYPE Column

The semantic_type column uses a custom DuckDB logical type called SEMANTIC_TYPE:

-- Displays as human-readable string
SELECT semantic_type FROM read_ast('file.py') LIMIT 1;
-- Returns: DEFINITION_FUNCTION (not 240)

-- Direct string comparison works
SELECT * FROM read_ast('file.py')
WHERE semantic_type = 'DEFINITION_FUNCTION';

-- Check the type
SELECT typeof(semantic_type) FROM read_ast('file.py') LIMIT 1;
-- Returns: SEMANTIC_TYPE

Internally, each value is stored in a single byte as a UTINYINT; it is displayed as its readable name.

Quick Reference

Code Name Description
32 METADATA_COMMENT Comments and documentation
36 METADATA_ANNOTATION Decorators, annotations
48 EXTERNAL_IMPORT Import/include statements
52 EXTERNAL_EXPORT Export statements
64 LITERAL_NUMBER Numeric values
68 LITERAL_STRING String values
72 LITERAL_ATOMIC Boolean, null
80 NAME_IDENTIFIER Simple identifiers
84 NAME_QUALIFIED Dotted names
112 TYPE_PRIMITIVE Basic types
144 FLOW_CONDITIONAL If/switch
148 FLOW_LOOP For/while
152 FLOW_JUMP Return/break/continue
160 ERROR_TRY Try blocks
164 ERROR_CATCH Catch blocks
208 COMPUTATION_CALL Function calls
212 COMPUTATION_ACCESS Member access
220 COMPUTATION_LAMBDA Anonymous functions
240 DEFINITION_FUNCTION Function definitions
244 DEFINITION_VARIABLE Variable definitions
248 DEFINITION_CLASS Class definitions
252 DEFINITION_MODULE Module definitions
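As a rough analogy for how SEMANTIC_TYPE behaves (one small integer underneath, shown by name), the sketch below models the Quick Reference codes as a Python `IntEnum`. The class and the Python `semantic_type_to_string` are illustrative stand-ins, not the extension's implementation, and they cover only the codes listed in the table above.

```python
from enum import IntEnum


class SemanticType(IntEnum):
    """Illustrative analogy: an integer code that displays as its name.
    Codes are copied from the Quick Reference table."""

    METADATA_COMMENT = 32
    METADATA_ANNOTATION = 36
    EXTERNAL_IMPORT = 48
    EXTERNAL_EXPORT = 52
    LITERAL_NUMBER = 64
    LITERAL_STRING = 68
    LITERAL_ATOMIC = 72
    NAME_IDENTIFIER = 80
    NAME_QUALIFIED = 84
    TYPE_PRIMITIVE = 112
    FLOW_CONDITIONAL = 144
    FLOW_LOOP = 148
    FLOW_JUMP = 152
    ERROR_TRY = 160
    ERROR_CATCH = 164
    COMPUTATION_CALL = 208
    COMPUTATION_ACCESS = 212
    COMPUTATION_LAMBDA = 220
    DEFINITION_FUNCTION = 240
    DEFINITION_VARIABLE = 244
    DEFINITION_CLASS = 248
    DEFINITION_MODULE = 252

    def __str__(self) -> str:
        # Show the readable name instead of the number.
        return self.name


def semantic_type_to_string(code: int) -> str:
    """Hypothetical Python mirror of the SQL helper, limited to the codes above."""
    return SemanticType(code).name


print(str(SemanticType(240)))              # DEFINITION_FUNCTION
print(int(SemanticType.DEFINITION_CLASS))  # 248
print(semantic_type_to_string(208))        # COMPUTATION_CALL
```

Every code in the table is a multiple of 4, so its two language-specific bits are zero. Codes outside this subset raise `ValueError` here, whereas the real helper presumably covers the full 0-255 range.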

Encoding Structure

8-bit encoding: [ss kk tt ll]
  ss (bits 6-7): Super Kind (4 categories)
  kk (bits 4-5): Kind (4 per super kind, 16 in total)
  tt (bits 2-3): Super Type (4 per kind)
  ll (bits 0-1): Language-specific
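The layout above can be checked by packing the four 2-bit fields into one byte. In this hypothetical `encode` helper, `kind` is the 2-bit kind *within* its super kind (0-3), not the 0-15 index that `get_kind` returns.

```python
def encode(super_kind: int, kind: int, super_type: int, lang: int = 0) -> int:
    """Pack four 2-bit fields into one byte laid out as [ss kk tt ll]."""
    for field in (super_kind, kind, super_type, lang):
        if not 0 <= field <= 3:
            raise ValueError("each field must fit in 2 bits (0-3)")
    return (super_kind << 6) | (kind << 4) | (super_type << 2) | lang


# COMPUTATION (3), DEFINITION (kind 3 within it), first super type (0)
print(encode(3, 3, 0))  # 240 -> DEFINITION_FUNCTION
# CONTROL_EFFECTS (2), FLOW_CONTROL (kind 1 within it), super type 1
print(encode(2, 1, 1))  # 148 -> FLOW_LOOP
```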

Super Kinds

META_EXTERNAL (0x00-0x3F)

Metadata, parser constructs, and external references.

Kind Code Range Description
PARSER_SPECIFIC 0-15 Syntax, delimiters
RESERVED 16-31 Future use
METADATA 32-47 Comments, annotations
EXTERNAL 48-63 Imports, exports

DATA_STRUCTURE (0x40-0x7F)

Data representation and naming.

Kind Code Range Description
LITERAL 64-79 Values
NAME 80-95 Identifiers
PATTERN 96-111 Patterns
TYPE 112-127 Type info

CONTROL_EFFECTS (0x80-0xBF)

Program flow and execution.

Kind Code Range Description
EXECUTION 128-143 Statements
FLOW_CONTROL 144-159 Conditionals, loops
ERROR_HANDLING 160-175 Try/catch
ORGANIZATION 176-191 Blocks, structure

COMPUTATION (0xC0-0xFF)

Operations and definitions.

Kind Code Range Description
OPERATOR 192-207 Operators
COMPUTATION_NODE 208-223 Calls, access
TRANSFORM 224-239 Queries, iteration
DEFINITION 240-255 Functions, classes
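Because each kind spans exactly 16 codes and each super kind spans 64, both names can be recovered from a code with a right shift. The lookup lists below are copied from the four tables above. `classify` is a hypothetical helper for illustration, not part of the extension.

```python
# The 16 kinds in code order, copied from the tables above.
KIND_NAMES = [
    "PARSER_SPECIFIC", "RESERVED", "METADATA", "EXTERNAL",          # META_EXTERNAL
    "LITERAL", "NAME", "PATTERN", "TYPE",                           # DATA_STRUCTURE
    "EXECUTION", "FLOW_CONTROL", "ERROR_HANDLING", "ORGANIZATION",  # CONTROL_EFFECTS
    "OPERATOR", "COMPUTATION_NODE", "TRANSFORM", "DEFINITION",      # COMPUTATION
]
SUPER_KIND_NAMES = ["META_EXTERNAL", "DATA_STRUCTURE", "CONTROL_EFFECTS", "COMPUTATION"]


def classify(code: int) -> tuple[str, str]:
    """Return (super kind, kind) names for an 8-bit semantic type code."""
    if not 0 <= code <= 255:
        raise ValueError("semantic type codes are 8-bit (0-255)")
    # code >> 6 keeps bits 6-7; code >> 4 keeps bits 4-7 as a 0-15 index.
    return SUPER_KIND_NAMES[code >> 6], KIND_NAMES[code >> 4]


print(classify(164))  # ('CONTROL_EFFECTS', 'ERROR_HANDLING')
print(classify(84))   # ('DATA_STRUCTURE', 'NAME')
```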

Helper Functions

semantic_type_to_string(code)

Convert a numeric code to its type name:

SELECT semantic_type_to_string(240);
-- Returns: 'DEFINITION_FUNCTION'

get_super_kind(code)

Return the super kind (bits 6-7, 0-3) of a code:

SELECT get_super_kind(240);
-- Returns: 3 (COMPUTATION)

get_kind(code)

Return the kind of a code as a 0-15 index (bits 4-7 taken together):

SELECT get_kind(240);
-- Returns: 15 (DEFINITION)
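The documented results (3 and 15 for code 240) are consistent with plain bit shifts over the encoding. The Python functions below are a sketch under that assumption and are not the extension's source.

```python
def get_super_kind(code: int) -> int:
    """Top two bits (ss), giving 0-3."""
    return (code >> 6) & 0b11


def get_kind(code: int) -> int:
    """Top four bits (ss kk) read as one index, giving 0-15."""
    return (code >> 4) & 0b1111


print(get_super_kind(240), get_kind(240))  # 3 15
```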

is_definition(code)

Check whether a code is a definition:

SELECT is_definition(240);  -- true
SELECT is_definition(208);  -- false

is_call(code)

Check whether a code is a function call:

SELECT is_call(208);  -- true
SELECT is_call(240);  -- false

is_control_flow(code)

Check whether a code is a control-flow construct:

SELECT is_control_flow(144);  -- true (CONDITIONAL)
SELECT is_control_flow(148);  -- true (LOOP)
SELECT is_control_flow(152);  -- true (JUMP)

is_identifier(code)

Check whether a code is an identifier (simple or qualified):

SELECT is_identifier(80);   -- true
SELECT is_identifier(84);   -- true
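The example results above fit a simple model in which each predicate tests membership in one kind: DEFINITION (240-255), FLOW_CONTROL (144-159), and NAME (80-95). The sketch below assumes that model; the real SQL macros may draw the boundaries differently.

```python
# Assumption: each predicate checks whole-kind membership (code >> 4).
DEFINITION_KIND = 15    # codes 240-255
FLOW_CONTROL_KIND = 9   # codes 144-159
NAME_KIND = 5           # codes 80-95


def kind_of(code: int) -> int:
    return code >> 4


def is_definition(code: int) -> bool:
    return kind_of(code) == DEFINITION_KIND


def is_control_flow(code: int) -> bool:
    return kind_of(code) == FLOW_CONTROL_KIND


def is_identifier(code: int) -> bool:
    return kind_of(code) == NAME_KIND


print(is_definition(240), is_definition(208))            # True False
print(all(is_control_flow(c) for c in (144, 148, 152)))  # True
print(is_identifier(80), is_identifier(84))              # True True
```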

Specific Type Predicates

Convenience macros for common semantic type checks:

Definition Predicates

-- Check for function definitions
SELECT * FROM read_ast('file.py') WHERE is_function_definition(semantic_type);

-- Check for class definitions
SELECT * FROM read_ast('file.py') WHERE is_class_definition(semantic_type);

-- Check for variable definitions
SELECT * FROM read_ast('file.py') WHERE is_variable_definition(semantic_type);

-- Check for module definitions
SELECT * FROM read_ast('file.py') WHERE is_module_definition(semantic_type);

-- Check for type definitions (typedef, type alias)
SELECT * FROM read_ast('file.py') WHERE is_type_definition(semantic_type);

Computation Predicates

-- Check for function/method calls
SELECT * FROM read_ast('file.py') WHERE is_function_call(semantic_type);

-- Check for member/property access
SELECT * FROM read_ast('file.py') WHERE is_member_access(semantic_type);

Literal Predicates

-- Check for string literals
SELECT * FROM read_ast('file.py') WHERE is_string_literal(semantic_type);

-- Check for number literals
SELECT * FROM read_ast('file.py') WHERE is_number_literal(semantic_type);

-- Check for boolean literals
SELECT * FROM read_ast('file.py') WHERE is_boolean_literal(semantic_type);

-- Check for any literal
SELECT * FROM read_ast('file.py') WHERE is_literal(semantic_type);

Control Flow Predicates

-- Check for conditionals (if/switch/match)
SELECT * FROM read_ast('file.py') WHERE is_conditional(semantic_type);

-- Check for loops (for/while/do)
SELECT * FROM read_ast('file.py') WHERE is_loop(semantic_type);

-- Check for jumps (return/break/continue/throw)
SELECT * FROM read_ast('file.py') WHERE is_jump(semantic_type);

Operator Predicates

-- Check for assignments
SELECT * FROM read_ast('file.py') WHERE is_assignment(semantic_type);

-- Check for comparisons
SELECT * FROM read_ast('file.py') WHERE is_comparison(semantic_type);

-- Check for arithmetic operations
SELECT * FROM read_ast('file.py') WHERE is_arithmetic(semantic_type);

-- Check for logical operations (and/or/not)
SELECT * FROM read_ast('file.py') WHERE is_logical(semantic_type);

External/Import Predicates

-- Check for import statements (import, from...import, use, require)
SELECT * FROM read_ast('file.py') WHERE is_import(semantic_type);

-- Check for export statements
SELECT * FROM read_ast('file.js') WHERE is_export(semantic_type);

-- Check for foreign function interface declarations
SELECT * FROM read_ast('file.rs') WHERE is_foreign(semantic_type);

Metadata Predicates

-- Check for comments
SELECT * FROM read_ast('file.py') WHERE is_comment(semantic_type);

-- Check for annotations/decorators
SELECT * FROM read_ast('file.py') WHERE is_annotation(semantic_type);

-- Check for preprocessor directives (#include, #define)
SELECT * FROM read_ast('file.c') WHERE is_directive(semantic_type);

Organization Predicates

-- Check for blocks/scopes
SELECT * FROM read_ast('file.py') WHERE is_block(semantic_type);

-- Check for lists/arrays/containers
SELECT * FROM read_ast('file.py') WHERE is_list(semantic_type);

Type Predicates

-- Check for primitive types (int, string, bool)
SELECT * FROM read_ast('file.go') WHERE is_type_primitive(semantic_type);

-- Check for composite types (struct, union, tuple)
SELECT * FROM read_ast('file.go') WHERE is_type_composite(semantic_type);

-- Check for reference/pointer types
SELECT * FROM read_ast('file.rs') WHERE is_type_reference(semantic_type);

-- Check for generic/template types
SELECT * FROM read_ast('file.ts') WHERE is_type_generic(semantic_type);

Filtering Patterns

By Exact Type

SELECT * FROM read_ast('file.py')
WHERE semantic_type = 'DEFINITION_FUNCTION';  -- Functions only

By Super Kind

-- All COMPUTATION types
SELECT * FROM read_ast('file.py')
WHERE semantic_type >= 192;

By Kind

-- All DEFINITION types (240-255)
SELECT * FROM read_ast('file.py')
WHERE semantic_type >= 240;
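The single `>=` bound works here only because COMPUTATION and DEFINITION sit at the top of the code space. Any other super kind or kind needs both a lower and an upper bound. These hypothetical helpers compute the inclusive range for each.

```python
def super_kind_range(super_kind: int) -> tuple[int, int]:
    """Inclusive code range covered by a super kind (0-3)."""
    low = super_kind << 6
    return low, low + 63


def kind_range(kind: int) -> tuple[int, int]:
    """Inclusive code range covered by a kind index (0-15, as from get_kind)."""
    low = kind << 4
    return low, low + 15


print(super_kind_range(3))  # (192, 255): COMPUTATION
print(kind_range(15))       # (240, 255): DEFINITION
print(kind_range(9))        # (144, 159): FLOW_CONTROL
```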

Using Helper Functions

-- All definitions
SELECT * FROM read_ast('file.py')
WHERE is_definition(semantic_type);

-- All control flow
SELECT * FROM read_ast('file.py')
WHERE is_control_flow(semantic_type);

Semantic Refinements

Some types include refinements for more specific categorization.

Function Refinements

Refinement Description
REGULAR Standard function
LAMBDA Anonymous function
CONSTRUCTOR Class constructor
GETTER Property getter
SETTER Property setter
ASYNC Async function

Variable Refinements

Refinement Description
MUTABLE Mutable variable
IMMUTABLE Constant
PARAMETER Function parameter
FIELD Class field

Class Refinements

Refinement Description
REGULAR Standard class
ABSTRACT Abstract class/interface
ENUM Enumeration
STRUCT Struct type

Loop Refinements

Refinement Description
ITERATOR For/foreach loop
CONDITIONAL While loop
INFINITE Infinite loop

Conditional Refinements

Refinement Description
BINARY If/else
MULTIWAY Switch/match
TERNARY Ternary expression

Universal Flags

In addition to semantic types, each node has a flags field for orthogonal properties.

Flag Values

Flag Value Description
IS_CONSTRUCT 0x01 Semantic language construct (not punctuation)
IS_EMBODIED 0x02 Has body/implementation (definition vs declaration)

Flag Helper Functions

-- Check if node is a semantic construct
SELECT is_construct(flags) FROM read_ast('file.py');

-- Check if node has implementation body (definition vs declaration)
SELECT is_embodied(flags) FROM read_ast('file.cpp');
SELECT has_body(flags) FROM read_ast('file.cpp');  -- alias
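Since the flag values are distinct bits (0x01 and 0x02), each helper amounts to a bitwise AND. The Python sketch below mirrors the helper names. The two example flag values are hypothetical and do not come from real parser output.

```python
IS_CONSTRUCT = 0x01  # semantic language construct, not punctuation
IS_EMBODIED = 0x02   # has a body / implementation


def is_construct(flags: int) -> bool:
    return bool(flags & IS_CONSTRUCT)


def is_embodied(flags: int) -> bool:
    return bool(flags & IS_EMBODIED)


has_body = is_embodied  # alias, mirroring the SQL helper name

# Hypothetical values: a definition with a body sets both bits (0x03);
# a forward declaration is a construct without a body (0x01).
definition_flags = IS_CONSTRUCT | IS_EMBODIED
declaration_flags = IS_CONSTRUCT

print(is_embodied(definition_flags), has_body(declaration_flags))  # True False
```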

Distinguishing Definitions from Declarations

-- Find only function definitions (with body), not forward declarations
SELECT name, file_path
FROM read_ast('**/*.cpp', ignore_errors := true)
WHERE semantic_type = 'DEFINITION_FUNCTION'
  AND is_embodied(flags)        -- Has implementation
  AND name IS NOT NULL;

-- Find forward declarations only
SELECT name, file_path
FROM read_ast(['**/*.h', '**/*.hpp'], ignore_errors := true)
WHERE semantic_type = 'DEFINITION_FUNCTION'
  AND NOT has_body(flags)       -- Declaration only
  AND name IS NOT NULL;

-- Using is_definition() for all definition types (functions, classes, variables, modules)
SELECT name, semantic_type, file_path
FROM read_ast('**/*.cpp', ignore_errors := true)
WHERE is_definition(semantic_type)
  AND has_body(flags)
  AND name IS NOT NULL;

Cross-Language Examples

Functions Across Languages

-- Python: def, async def
-- JavaScript: function, arrow functions
-- Java: method_declaration
-- Go: function_declaration
-- All have semantic_type = 'DEFINITION_FUNCTION'

SELECT language, name, type
FROM read_ast(['**/*.py', '**/*.js', '**/*.java'], ignore_errors := true)
WHERE semantic_type = 'DEFINITION_FUNCTION'
ORDER BY language, name;

Classes Across Languages

-- Python: class_definition
-- Java: class_declaration
-- TypeScript: class_declaration
-- C++: class_specifier
-- All have semantic_type = 'DEFINITION_CLASS'

SELECT language, name
FROM read_ast(['**/*.py', '**/*.java', '**/*.cpp'], ignore_errors := true)
WHERE semantic_type = 'DEFINITION_CLASS';

Next Steps