FROM AGPEDIA — AGENCY THROUGH KNOWLEDGE

Semantic Lifting

Semantic lifting is the process of algorithmically transforming a low-level, implementation-level representation — such as source code, a binary, a protocol parser, or a runtime program state — into a high-level, declarative, or semantically rich representation suitable for formal reasoning, verification, or downstream analysis. The transformed output is typically expressed in a specification language or knowledge representation formalism, and the transformation is accompanied by a correctness guarantee establishing that the two representations are semantically equivalent.

The term arises across several research communities, including program analysis, scientific computing, network security, and cyber-physical systems, but the underlying structure is consistent: a formal mapping from an implementation layer to a semantic or specification layer, often decomposed through intermediate representations with proof obligations at each step.

Relationship to lowering

Semantic lifting is the inverse of lowering (also called code generation or compilation), which takes a high-level specification and produces an optimized low-level implementation. Where lowering is well-studied and largely automated, lifting is structurally harder: a given low-level artifact may have many semantically equivalent high-level descriptions, so lifting is formalized as a search problem over a space of candidate specifications, constrained by correctness witnesses.

The distinction is not merely directional. Lowering discards semantic structure by necessity — loops become machine instructions, mathematical operators become memory accesses — so lifting must reconstruct that structure from operational behavior. In practice, lifting pipelines decompose the problem into a chain of increasingly abstract intermediate representations, inverting one layer at a time with a formal equivalence check at each transition ^[1:1].

Applications

Scientific computing

In scientific computing, semantic lifting addresses the challenge of verifying or recovering the mathematical intent of low-level numerical kernels. This is increasingly relevant as large language models (LLMs) are used to generate scientific code: the generated code may be syntactically valid but difficult to analyze for correctness or performance without knowing the high-level algorithm it implements.

One approach, developed in the SPIRAL system, uses a chain of domain-specific languages as intermediate representations: raw C or LLVM code is lifted stepwise through an internal representation (icode), then through Σ-SPL (which captures loop abstractions), and finally into SPL (Signal Processing Language), which expresses the mathematical semantics of the kernel as a matrix operator. At each step, rewrite rules are applied and correctness is checked using the GAP computer algebra system ^[1:2]. The result is a high-level specification that can be compared against known algorithmic identities, enabling both verification and optimization.

This approach was demonstrated by successfully lifting GPT-4-generated fast Fourier transform (FFT) code to its SPL specification, confirming the generated code implements the Cooley-Tukey FFT algorithm ^[1:1].

Legacy code modernization

A related application is the migration of legacy scientific code to modern domain-specific languages (DSLs) that support better optimization and hardware portability. Verified lifting systems automatically translate low-level code to high-level DSL summaries while producing a formal proof of semantic equivalence — unlike syntax-based pattern matching, which offers no such guarantee.

The STNG system pioneered this approach for stencil computations, lifting Fortran source code into a predicate language and then into Halide, achieving median speedups of 4.1× and up to 24× over the original code ^[2:1]. Later work (Stencil-Lifting) improved the efficiency of this process substantially by replacing search-based synthesis with a hierarchical recursive lifting algorithm using invariant subgraphs, achieving 31.62× and 5.8× speedups over STNG and its successor Dexter respectively, while retaining full semantic equivalence ^[3:1].

Network protocol security

In network security, semantic lifting is used to recover formal packet format specifications from protocol parser implementations. Existing approaches to protocol format inference tend to rely on dynamic analysis — observing network traffic — and therefore miss any message format features not present in the observed packets. A static lifting approach instead analyzes the parser source code directly.

The NetLifter system (Shi et al., 2023) uses abstract interpretation to construct an abstract format graph (AFG) from a protocol parser, capturing both field boundaries and semantic constraints among packet fields. The AFG is a transformed control-flow graph that suppresses irrelevant statements while retaining sufficient structure for localized path-sensitive analysis, resolving the competing challenges of path explosion and field reordering that complicate naive static analysis ^[4:1]. Protocol formats inferred this way can be applied to security tools: in evaluation, the lifted formats improved fuzzer code coverage by 20–260% and enabled discovery of 53 zero-day vulnerabilities with 47 assigned CVEs ^[4:1].

Digital twins and cyber-physical systems

In the context of digital twins and cyber-physical systems, semantic lifting refers to the process of serializing a running program's state as a knowledge graph, connecting that graph to domain ontologies, and enabling the program to query its own state using semantic technologies. Kamburjan et al. define it as a three-step process: (1) serializing the digital entity as a knowledge graph, (2) connecting this graph to domain knowledge necessary to operate or interpret the entity, and (3) accessing the combined graph from within the program itself ^[5:1].

In this context the knowledge graph is expressed in RDF (Resource Description Framework) and enriched with OWL (Web Ontology Language) axioms that encode domain relationships. This allows queries expressed in domain vocabulary — for example, asking which rooms in a smart house are spatially adjacent — to be answered by reasoning over the lifted state, even though the program's internal data structures use no such vocabulary. The third step, querying the lifted state from within the running program, is called semantic reflection. Together, semantic lifting and semantic reflection enable digital twins to inspect and adapt their own configurations in terms of domain concepts rather than internal implementation details ^[5:1].

Common structure

Despite the diversity of application domains, semantic lifting instances share a common formal structure. In each case there is an implementation-level object (a program, a kernel, a parser, a heap state) and a target semantic specification (a DSL expression, a predicate summary, a packet grammar, a knowledge graph). The lifting algorithm produces a mapping between the two and a correctness witness — a proof, a formal equivalence check, or an ontological consistency condition — establishing that the mapping preserves the intended semantics.

A recurring architectural pattern is the use of intermediate representations as stepping stones: rather than attempting a direct transformation, the algorithm lifts through a chain of representations of increasing abstraction, with a manageable correctness obligation at each step. This makes the problem tractable while enabling modular verification.

Open challenges

Several challenges limit the scope and automation of current semantic lifting approaches.

Scientific kernels pose particular difficulties for static analysis due to floating-point arithmetic, pointer aliasing, recursion, and transcendental functions, all of which complicate the construction of correctness witnesses ^[1:1]. Most existing verified lifting systems handle only data-independent control flow and regular memory access patterns, excluding irregular or non-recursive kernels.

In the legacy code domain, search-based synthesis (as used in STNG and Dexter) can fail to find valid summaries within bounded time for complex kernels; the shift to algorithm-driven lifting (as in Stencil-Lifting) addresses this but requires careful specification of the lifting theory for each class of computation ^[3:1].

In protocol analysis, handling protocols with complex state machines or non-top-down parsers remains an open problem ^[4:1].

More broadly, reducing dependence on human-specified rewrite rules and lifting heuristics — moving toward fully automated lifting — is an active research direction across all domains.

^a ^b ^c ↗ def-stepwise-lifting ^ ↗ spiral-gap-correctness Zhang, Naifeng; Rao, Sanil; Franusich, Mike; Franchetti, Franz (2025-01-21). Towards Semantics Lifting for Scientific Computing: A Case Study on FFT. ACM. https://users.ece.cmu.edu/~franzf/papers/2025_TPSA_Lifting.pdf.
^ ↗ stng-speedup Kamil, Shoaib; Cheung, Alvin; Itzhaky, Shachar; Solar-Lezama, Armando (2016-06). Verified Lifting of Stencil Computations. ACM SIGPLAN Notices. https://doi.org/10.1145/2980983.2908117 https://www.semanticscholar.org/paper/Verified-lifting-of-stencil-computations-Kamil-Cheung/84ca47db1b18083006d42ba111b9846aba258cea.
^a ^b ↗ stencil-lifting-speedup Li, Mingyi; Xiao, Junmin; Chen, Siyan; Ma, Hui; et al. (2025-10). Stencil-Lifting: Hierarchical Recursive Lifting System for Extracting Summary of Stencil Kernel in Legacy Codes. Proceedings of the ACM on Programming Languages. https://doi.org/10.1145/3763159 https://dl.acm.org/doi/10.1145/3763159.
^a ^b ^c ↗ protocol-lifting-results Shi, Qingkai; Shao, Junyang; Ye, Yapeng; Zheng, Mingwei; et al. (2023-11). Lifting Network Protocol Implementation to Precise Format Specification with Security Applications. ACM. https://doi.org/10.1145/3576915.3616614 https://qingkaishi.github.io/public_pdfs/CCS23.pdf.
^a ^b ↗ dt-semantic-lifting-def Kamburjan, Eduard; Pferscher, Andrea; Schlatte, Rudolf; Sieve, Riccardo; et al. (2024). Semantic Reflection and Digital Twins: A Comprehensive Overview. Springer. https://ebjohnsen.org/publication/24-tiziana/24-tiziana.pdf.

Available in

en - English