FROM AGPEDIA — AGENCY THROUGH KNOWLEDGE

Semantic Lifting

Semantic lifting is a class of techniques by which low-level, implementation-level representations — such as source code, a binary, a protocol parser, or a runtime program state — are algorithmically transformed into high-level, declarative, or semantically rich representations suitable for formal reasoning, verification, or downstream analysis ^[1:1]. The transformed output is typically expressed in a specification language or knowledge representation formalism, and the transformation is accompanied by a correctness guarantee establishing that the two representations are semantically equivalent.

The term arises across several research communities, including program analysis, scientific computing, network security, and cyber-physical systems. In each case the underlying structure is a formal mapping from an implementation layer to a semantic or specification layer, decomposed through intermediate representations with proof obligations at each step ^[2:1].

Relationship to lowering

Semantic lifting is the inverse of lowering (also called code generation or compilation), which takes a high-level specification and produces an optimized low-level implementation. Lifting is structurally harder: a given low-level artifact may have many semantically equivalent high-level descriptions, so lifting is formalized as a search problem over a space of candidate specifications, constrained by correctness witnesses ^[2:2].

The distinction is not merely directional. Lowering discards semantic structure by necessity — loops become machine instructions, mathematical operators become memory accesses — so lifting must reconstruct that structure from operational behavior. In practice, lifting pipelines decompose the problem into a chain of increasingly abstract intermediate representations, inverting one layer at a time with a formal equivalence check at each transition ^[2:1].

Applications

Scientific computing

Scientific computing kernels — routines that implement mathematical operations such as Fourier transforms or matrix products — are often written in low-level C or Fortran for performance. Knowing what mathematical operation a kernel computes, rather than just how it computes it, matters for both verification and optimization. This has become a practical problem with the rise of LLM-generated code: such code may be syntactically correct but hard to verify, since the high-level algorithm it implements is not explicit in the source ^[2:3].

One approach, developed in the SPIRAL system, lifts low-level code by reversing the system's normal code-generation pipeline. SPIRAL represents scientific kernels through three layers: SPL (Signal Processing Language), which captures the mathematical operation; Σ-SPL, which adds loop structure; and icode, an abstract form of the implementation. Normally SPIRAL generates code by lowering from SPL down to icode; lifting runs this process in reverse, starting from C or LLVM code and working upward. At each step, rewrite rules are applied and correctness is checked using the GAP computer algebra system ^[2:4].

This approach was demonstrated by successfully lifting GPT-generated FFT code to its SPL specification, confirming that the generated code implements the Cooley-Tukey FFT algorithm ^[2:1].

Legacy code modernization

A related application is the migration of legacy scientific code to modern domain-specific languages (DSLs) that support better optimization and hardware portability. Verified lifting systems automatically translate low-level code to high-level DSL summaries while producing a formal proof of semantic equivalence — unlike syntax-based pattern matching, which relies on syntax rules and therefore struggles with limited pattern coverage and a lack of semantic guarantees ^[3:1].

The STNG system pioneered this approach for stencil computations, lifting Fortran source code into a predicate language and then into Halide, achieving median speedups of 4.1× and up to 24× over the original code ^[4:1]. A limitation of STNG and similar systems is their reliance on search-based synthesis, which may fail to find valid summaries within bounded time ^[3:2]. Later work (Stencil-Lifting) addressed this by replacing search with a hierarchical recursive lifting algorithm using invariant subgraphs, achieving 31.62× and 5.8× speedups over STNG and its successor Dexter respectively, while retaining full semantic equivalence ^[3:3].

Network protocol security

In network security, semantic lifting is used to recover formal packet format specifications from protocol parser implementations. Existing approaches to protocol format inference almost all rely on dynamic analysis driven by a limited number of network packets, meaning that any message format feature not present in the observed traffic will be missed in the resulting formats ^[5:1]. A static lifting approach instead analyzes the parser source code directly, without depending on observed traffic.

The NetLifter system (Shi et al., 2023) uses abstract interpretation to construct an abstract format graph (AFG) from a protocol parser, capturing both field boundaries and semantic constraints among packet fields. The AFG is a transformed control-flow graph that suppresses irrelevant statements while retaining sufficient structure for localized path-sensitive analysis, resolving the competing challenges of path explosion and disordered path constraints that complicate naive static analysis ^[5:2]. Protocol formats inferred this way can be applied to security tools: in evaluation, the lifted formats improved fuzzer code coverage by 20–260% and enabled discovery of 53 zero-day vulnerabilities with 47 assigned CVEs ^[5:2].

Digital twins and cyber-physical systems

In the context of digital twins and cyber-physical systems, semantic lifting is defined as the process of (1) serializing a digital entity as a knowledge graph, (2) connecting this graph to domain knowledge necessary to operate or interpret the entity, and (3) accessing the combined graph from within the program itself ^[1:1].

The knowledge graph is expressed in RDF (Resource Description Framework) and enriched with OWL (Web Ontology Language) axioms that encode domain relationships. This allows queries expressed in domain vocabulary — for example, asking which rooms in a smart house are spatially adjacent — to be answered by reasoning over the lifted state, even though the program's internal data structures use no such vocabulary ^[1:1]. The third step, querying the lifted state from within the running program, is called semantic reflection. Together, semantic lifting and semantic reflection enable digital twins to inspect and adapt their own configurations in terms of domain concepts rather than internal implementation details ^[1:1].

Open challenges

Several challenges limit the scope and automation of current semantic lifting approaches.

Scientific kernels pose particular difficulties for static analysis due to floating-point arithmetic, pointer aliasing, recursion, and transcendental functions, all of which complicate the construction of correctness witnesses ^[2:5]. The stepwise SPIRAL approach is described as an early-stage experimental solution within constrained boundaries, not yet applicable to arbitrary scientific code ^[2:3].

In the legacy code domain, search-based synthesis can fail to find valid summaries within bounded time for complex kernels ^[3:2]. The shift to algorithm-driven lifting addresses this for regular stencil computations, but requires careful specification of the lifting theory for each class of computation.

In protocol analysis, the NetLifter approach targets top-down protocol parsers; handling protocols with complex state machines or non-top-down parsers is not addressed by this technique ^[5:2].

^a ^b ^c ^d ↗ dt-semantic-lifting-def Kamburjan, Eduard; Pferscher, Andrea; Schlatte, Rudolf; Sieve, Riccardo; et al. (2024). Semantic Reflection and Digital Twins: A Comprehensive Overview. Springer. https://ebjohnsen.org/publication/24-tiziana/24-tiziana.pdf.
^a ^b ^c ↗ def-stepwise-lifting ^ ↗ lifting-as-search ^a ^b ↗ llm-code-limitations ^ ↗ spiral-gap-correctness ^ ↗ scientific-computing-challenges Zhang, Naifeng; Rao, Sanil; Franusich, Mike; Franchetti, Franz (2025-01-21). Towards Semantics Lifting for Scientific Computing: A Case Study on FFT. ACM. https://users.ece.cmu.edu/~franzf/papers/2025_TPSA_Lifting.pdf.
^ ↗ pattern-matching-vs-verified-lifting ^a ^b ↗ search-synthesis-limitation ^ ↗ stencil-lifting-speedup Li, Mingyi; Xiao, Junmin; Chen, Siyan; Ma, Hui; et al. (2025-10). Stencil-Lifting: Hierarchical Recursive Lifting System for Extracting Summary of Stencil Kernel in Legacy Codes. Proceedings of the ACM on Programming Languages. https://doi.org/10.1145/3763159 https://dl.acm.org/doi/10.1145/3763159.
^ ↗ stng-speedup Kamil, Shoaib; Cheung, Alvin; Itzhaky, Shachar; Solar-Lezama, Armando (2016-06). Verified Lifting of Stencil Computations. ACM SIGPLAN Notices. https://doi.org/10.1145/2980983.2908117 https://www.semanticscholar.org/paper/Verified-lifting-of-stencil-computations-Kamil-Cheung/84ca47db1b18083006d42ba111b9846aba258cea.
^ ↗ dynamic-analysis-limitation ^a ^b ^c ↗ protocol-lifting-results Shi, Qingkai; Shao, Junyang; Ye, Yapeng; Zheng, Mingwei; et al. (2023-11). Lifting Network Protocol Implementation to Precise Format Specification with Security Applications. ACM. https://doi.org/10.1145/3576915.3616614 https://qingkaishi.github.io/public_pdfs/CCS23.pdf.

Available in

en - English