---
title: "Forgetting Is Not Deletion: The Verification Gap in Machine Unlearning"
author: "Rantideb Howlader"
date: "2026-06-20T00:00:00.000Z"
canonical_url: "https://www.ranti.dev/blog/forgetting-is-not-deletion"
license: "CC-BY-4.0"
---


## Abstract

We identify and measure a systematic **verification gap** in machine unlearning: the failure of standard privacy audits to detect the persistence of relational and generalizable influence in models trained to "forget." Because the legal and ethical right to be forgotten is framed in the language of deletion (the removal of a substance), computer-science validation protocols have organized around membership audits (confirming that specific training instances no longer register as familiar).

Through a controlled mathematical demonstration (Study 1) and an LLM-scale simulation on the TOFU benchmark (Study 2), we show that unlearning algorithms can drive membership inference metrics to the exact floor of a retrained-from-scratch reference model ($M_{\text{exact}}$) while leaving the model's capacity to generalize the forget-set's latent rules, reconstruct its distinctive style, and express its correlated influence fully intact. We read this technical gap through the lens of archival theory and contemplative philosophies of memory, arguing that the collision between unlearning practice and legal compliance is a consequence of treating memory as a substance to be removed rather than a relation to be altered.

## 1. Introduction: The Metaphysics of the Archive

The right to be forgotten (e.g., GDPR Article 17) has forced a technical translation. In machine learning, where a model is not a database containing records but a configuration of weights representing a probability distribution, "forgetting" cannot be accomplished by deleting an index entry. The trace of the data is diffuse, distributed across millions of parameters. To satisfy the demand of the data subject, the field has developed the discipline of **machine unlearning** (Bourtoule et al., 2021).

How do we confirm that a model has forgotten? The technical literature has settled on a verification paradigm: the **privacy audit**. If an external auditor, armed with the most powerful query tools available (membership inference attacks, loss-based distribution checks), cannot distinguish the unlearned model from a model that was never shown the target data in the first place, the model is certified as "forgetful."

This paper demonstrates that this certification is an illusion. We call the illusion the **verification gap**. It is the distance between a passed membership audit and actual non-influence.

We show that this gap is not a temporary limitation of current unlearning algorithms or audit instruments, but a structural property of how models generalize. When a model learns a rule from a dataset, the influence of that data becomes relational: it changes how the model processes other, unseen points. If we unlearn the specific training points by driving their individual loss profile to the background noise level, the relational influence remains. The model still "knows" the rule, still applies the style, and still exhibits the bias, but the audit reads clean.

We make this case in three registers:

1. **Conceptual:** We trace the distinction between substance-level removal and relation-level influence.
2. **Empirical:** We execute two studies. Study 1 is a mathematically clean, toy MLP simulation demonstrating the mechanism in under 100 lines of code. Study 2 executes the protocol at scale on the TOFU (Task of Fictitious Unlearning) benchmark using a 410M-parameter Pythia model.
3. **Interpretive:** We synthesize this failure with the humanities literature on memory (Derrida, Advaita Vedānta, biology) to argue that the engineering search for verification is bounded by a faulty metaphysics of the archive.

## 2. Related Work: Audits and Unlearning

The unlearning literature is divided between **exact** methods (which guarantee non-influence by partition-and-retrain schemes, e.g., Bourtoule et al., 2021) and **approximate** methods (which modify the weights of a trained model via gradient manipulation, e.g., Graves et al., 2021; Tarun et al., 2023). Approximate methods are computationally cheap but require validation.

This validation relies on **Membership Inference Attacks (MIAs)** (Shokri et al., 2017; Yeom et al., 2018). The state of the art in auditing uses likelihood-ratio tests (LiRA, Carlini et al., 2022) or loss thresholding (Maini et al., 2024) to estimate the probability that a target point was in the training set. If the unlearned model ($M_{\text{approx}}$) returns a distribution of losses on the forget set that is statistically indistinguishable from a model retrained from scratch without that set ($M_{\text{exact}}$), the unlearning is deemed successful.
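The indistinguishability check described above can be sketched as a two-sample comparison of per-example losses. The block below is a minimal illustration, not the audit used in the cited works: the two-sample Kolmogorov-Smirnov statistic is one possible choice of test, and the gamma-distributed losses are synthetic placeholders.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = np.sort(np.asarray(a, dtype=float)), np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
# Synthetic per-example losses on the forget set (assumed, for illustration only).
loss_exact = rng.gamma(shape=2.0, scale=0.5, size=500)    # never saw D
loss_approx = rng.gamma(shape=2.0, scale=0.5, size=500)   # same distribution as exact
loss_full = rng.gamma(shape=2.0, scale=0.1, size=500)     # much lower loss on D

print("approx vs exact:", ks_statistic(loss_approx, loss_exact))
print("full   vs exact:", ks_statistic(loss_full, loss_exact))
```

A small statistic means the audit cannot separate the two loss distributions; it says nothing about measurements other than the loss on these points.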

A parallel thread has begun to question what audits miss. Jagielski et al. (2023) show that differential privacy guarantees do not imply the removal of general rules; Shi et al. (2024) introduce the MUSE benchmark to study unlearning in LLMs, noting that style and facts can decouple. We build on these insights by formalizing the verification gap as a structural tension between substance-level checks and relational influence probes, and by supplying a complete, runnable reproducibility suite.

## 3. The Conceptual Gap

We formalize the tension.

**Setup.** Let $S$ be a training set, $D \subseteq S$ a forget set, $R = S \setminus D$ the retain set, $\mathcal{A}$ a (randomized) learner mapping a dataset to a model, and $\mathcal{U}$ an unlearner intended to remove $D$. We write $\theta'$ for the parameters of the model that $\mathcal{U}$ returns.

**Definition 1 (Two senses of "forget").**

- Removal: $D$ is absent from a designated store. Checkable by inspection.
- Non-influence with respect to a measurement family $\mathcal{M}$: for every $m \in \mathcal{M}$, the value $m(\theta')$ is statistically independent of $D$.

The right and the data subject want non-influence; what is operationally cheap is removal.

**Two imported facts.**

- (F1, snapshot non-identifiability; Thudi et al. 2022): A model snapshot does not determine the data that produced it, so no audit of the snapshot alone certifies non-influence — verification must attach to the process.
- (F2, behavioral insufficiency; the audit literature): Any finite black-box or grey-box audit fixes a measurement set; influence can persist on measurements outside it, so passing the audit is consistent with retained influence.

**The gap, stated.** Removal is verifiable; non-influence in general is not, because it is a universally quantified claim over the system's future behavior, and influence — the dependence of that behavior on $D$ — is inferred from measurements rather than inspected directly. We observe records, outputs, and weights. We do not observe the relation between a removed cause and ongoing behavior; we sample proxies for it. The single principled escape is to bound influence a priori by construction (differential privacy), which is a guarantee about how one built the model, not a verification of an arbitrary unlearning claim, and which costs utility. You can promise in advance; you cannot in general confirm after the fact.
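The asymmetry can be written out explicitly. What follows is a sketch in notation introduced here for illustration, not taken from the cited works: $\theta'$ denotes the unlearned parameters, $\theta_{\text{exact}}$ the parameters of a model retrained on $R$ alone, $\mathcal{M}_{\text{audit}}$ the finite measurement set an auditor fixes, $\varepsilon$ a tolerance, and $\perp$ statistical independence.

$$
\underbrace{\forall m \in \mathcal{M}_{\text{audit}}:\ \bigl| m(\theta') - m(\theta_{\text{exact}}) \bigr| \le \varepsilon}_{\text{what a passed audit establishes}}
\quad \not\Longrightarrow \quad
\underbrace{\forall m \in \mathcal{M}:\ m(\theta') \perp D}_{\text{non-influence (Definition 1)}},
\qquad \mathcal{M}_{\text{audit}} \subsetneq \mathcal{M}.
$$

The left side is a finite conjunction that can be checked; the right side quantifies over every measurement in $\mathcal{M}$, so any $m \in \mathcal{M} \setminus \mathcal{M}_{\text{audit}}$ can carry influence the audit never reads. This is F2 restated, not a new result.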

## 4. Study 1 — A Controlled Demonstration

We construct a minimal environment in which the verification gap occurs by design.

**Task.** We train a three-layer MLP on a binary classification task. The features consist of a base set ($d_{\text{base}} = 20$) and a "canary signature" set ($d_{\text{sig}} = 5$).

- For the **normal set** ($R$), the label is determined by a linear rule over the base features. The signature features are random noise.
- For the **forget set** ($D$), the signature features are set to a distinct, highly active value (the "canary signature"), and the labels are forced to $1.0$, regardless of the base features.

This simulates a common scenario: a user's data contains both normal information and a highly distinct personal pattern (the signature).
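The data construction above can be sketched as follows. The feature sizes come from the text; the canary magnitude `SIG_VALUE`, the sample counts, and the hidden rule `w_true` are assumptions chosen for illustration and may differ from `study1_demo.py`.

```python
import numpy as np

D_BASE, D_SIG = 20, 5      # feature sizes stated in the text
SIG_VALUE = 3.0            # assumed "highly active" canary value (hypothetical)

def make_retain(n, w, rng):
    """Normal set R: label from a linear rule on base features; signature is noise."""
    base = rng.normal(size=(n, D_BASE))
    sig = rng.normal(size=(n, D_SIG))
    y = (base @ w > 0).astype(float)
    return np.hstack([base, sig]), y

def make_forget(n, rng):
    """Forget set D: signature pinned to the canary value; label forced to 1.0."""
    base = rng.normal(size=(n, D_BASE))
    sig = np.full((n, D_SIG), SIG_VALUE)
    return np.hstack([base, sig]), np.ones(n)

rng = np.random.default_rng(0)
w_true = rng.normal(size=D_BASE)           # hidden linear rule (assumed)
X_r, y_r = make_retain(2000, w_true, rng)  # sample counts are assumptions
X_d, y_d = make_forget(100, rng)
print(X_r.shape, X_d.shape, y_d.mean())
```

Roughly half of the forget points contradict the linear rule, so a model can only fit them by attending to the signature block, which is what makes the learned influence generalizable.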

**Models.**

1. $M_{\text{full}}$: Trained on $R \cup D$.
2. $M_{\text{exact}}$: Trained on $R$ only.
3. $M_{\text{approx}}$: Started from $M_{\text{full}}$, then fine-tuned on $R$ only for a small number of steps (the standard "fine-tune-to-forget" approximate unlearning baseline).
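The three training regimes can be sketched with a much smaller stand-in. To stay short, the block below swaps the three-layer MLP for plain logistic regression trained by gradient descent; the sample counts, step counts, learning rate, and canary value are all assumptions, and the numbers it prints are not the paper's results.

```python
import numpy as np

D_BASE, D_SIG, SIG_VALUE = 20, 5, 3.0   # SIG_VALUE is an assumed canary magnitude

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, theta=None, steps=500, lr=0.5):
    """Full-batch gradient descent on logistic loss; warm-starts if theta is given."""
    theta = np.zeros(X.shape[1]) if theta is None else theta.copy()
    for _ in range(steps):
        theta -= lr * X.T @ (sigmoid(X @ theta) - y) / len(y)
    return theta

def canary_points(n, rng):
    """Random base features with the signature block pinned to SIG_VALUE."""
    return np.hstack([rng.normal(size=(n, D_BASE)), np.full((n, D_SIG), SIG_VALUE)])

rng = np.random.default_rng(0)
w_true = rng.normal(size=D_BASE)
X_r = rng.normal(size=(2000, D_BASE + D_SIG))            # retain set R
y_r = (X_r[:, :D_BASE] @ w_true > 0).astype(float)
X_d, y_d = canary_points(200, rng), np.ones(200)         # forget set D

theta_full = train(np.vstack([X_r, X_d]), np.concatenate([y_r, y_d]))  # R and D
theta_exact = train(X_r, y_r)                                          # R only
theta_approx = train(X_r, y_r, theta=theta_full, steps=5)              # brief fine-tune on R

X_probe = canary_points(1000, rng)                       # fresh points, never trained on
probe_rate = {name: float(np.mean(sigmoid(X_probe @ th) > 0.5))
              for name, th in [("full", theta_full), ("exact", theta_exact),
                               ("approx", theta_approx)]}
print(probe_rate)
```

The only difference between `theta_exact` and `theta_approx` is the starting point: the first begins from zeros, the second from weights that have already absorbed the signature rule.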

**The Audits and Probes.**

- **MIA Audit:** A loss-based membership inference attacker trying to distinguish members of $D$ from non-members sharing the same signature.
- **D-Recall Audit:** Direct behavioral recall: does the model still output $1.0$ when shown the exact training points in $D$?
- **Relational Probe:** We feed the model fresh points containing the canary signature but random base features. If the model still classifies them as $1.0$, it has retained the generalizable rule learned from the signature, even if the individual training points are forgotten.
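The three measurements reduce to two small functions. This is a minimal sketch under assumed conventions (lower loss counts as evidence of membership, and 0.5 is the classification threshold); the function names are ours, not those of `study1_demo.py`.

```python
import numpy as np

def auc_from_scores(pos, neg):
    """AUC = P(score_pos > score_neg) + 0.5 * P(tie), by direct pairwise comparison."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def loss_mia_auc(member_losses, nonmember_losses):
    """Loss-based attack: negate losses so that lower loss means 'more like a member'."""
    return auc_from_scores(-np.asarray(member_losses, dtype=float),
                           -np.asarray(nonmember_losses, dtype=float))

def positive_rate(probs, threshold=0.5):
    """Share of inputs classified as 1. On the points of D this is D-recall;
    on fresh canary-signature points it is the relational probe."""
    return float(np.mean(np.asarray(probs, dtype=float) > threshold))

# Toy values (assumed): members have visibly lower loss, so the attack succeeds.
print(loss_mia_auc([0.1, 0.2], [0.5, 0.6]))
# Identical loss distributions: the attack is at chance.
print(loss_mia_auc([0.3, 0.4], [0.3, 0.4]))
print(positive_rate([0.9, 0.8, 0.2, 0.7]))
```

The MIA audit and the relational probe take different inputs: the first sees losses on specific training points, the second sees predictions on points the model never trained on, which is why one can read clean while the other does not.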

### Results (Study 1)

Over 10 independent seeds, the MLP simulation yields the following mean metrics (± 1 SD):

![Study 1 results](/images/study1_figure.png)

- **Figure 1.** Study 1 (10 seeds; error bars are ± 1 SD). For each of $M_{\text{full}}$ (trained on $D$), $M_{\text{exact}}$ (retrained without $D$), and $M_{\text{approx}}$ (fine-tune-to-forget): the MIA audit AUC (blue), the relational influence probe (orange), and retain accuracy (green). The dashed line marks chance (0.50).

The membership audit reads at chance (AUC 0.496) for the approximate unlearning model ($M_{\text{approx}}$), matching the retrained-from-scratch reference ($M_{\text{exact}}$). Yet its relational probe rate remains highly elevated at **0.804**, showing that the generalizable influence of the forget set is almost fully intact.

### Study 1 Code & Data

Use these responsive widgets to inspect, copy, or download the code and data directly:

<VerificationPopup file="study1_demo.py"></VerificationPopup>
<VerificationPopup file="study1_results.json"></VerificationPopup>
<VerificationPopup file="study1_demo.py" downloadOnly></VerificationPopup>
<VerificationPopup file="study1_results.json" downloadOnly></VerificationPopup>

## 5. Study 2 — LLM-Scale Simulation on TOFU

To verify that the gap scales to modern language models, we execute the protocol on the TOFU (Task of Fictitious Unlearning) benchmark (Maini et al., 2024) using a 410M-parameter Pythia model.

**Object.** Authorial forgetting — removing a specific fictitious author's content from a fine-tuned language model. We use the `forget10` subset, representing 10% of the corpus.

**Unlearning Algorithm.** Gradient Ascent on the forget set, combined with standard cross-entropy minimization on the retain set (to preserve general utility).
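The combined objective can be sketched on a toy model. The block below uses logistic regression in NumPy rather than a language model, and the weighting `alpha`, the learning rate, and the random data are assumptions for illustration; `study2_protocol.py` may combine the two terms differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce_loss(theta, X, y):
    """Mean binary cross-entropy, clipped for numerical safety."""
    p = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def ce_grad(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def objective(theta, forget, retain, alpha=1.0):
    """J = L_retain - alpha * L_forget: minimizing J ascends the forget loss."""
    return ce_loss(theta, *retain) - alpha * ce_loss(theta, *forget)

def unlearn_step(theta, forget, retain, lr=0.05, alpha=1.0):
    """One descent step on J: descent on the retain loss, ascent on the forget loss."""
    grad = ce_grad(theta, *retain) - alpha * ce_grad(theta, *forget)
    return theta - lr * grad

rng = np.random.default_rng(0)
retain = (rng.normal(size=(200, 5)), rng.integers(0, 2, size=200).astype(float))
forget = (rng.normal(size=(20, 5)), np.ones(20))
theta = np.zeros(5)
before = objective(theta, forget, retain)
theta_new = unlearn_step(theta, forget, retain)
after = objective(theta_new, forget, retain)
print(before, after)
```

Because the ascent term is unbounded, the number of steps acts as a hyperparameter: the objective itself never signals when the forget loss has reached the level of a retrained reference.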

**The Audits and Probes.**

1. **MIA AUC:** Loss-based membership inference distinguishing members of the forget set from non-members.
2. **Extraction Rate:** The rate at which the model reproduces verbatim segments of the forget set when prompted with prefixes (canary extraction).
3. **Style/Paraphrase Recovery:** We prompt the model to generate text in the style of the forget author. We construct stylometry vectors (character-trigram and function-word frequencies) and compute the cosine similarity to the true forget set profile.
4. **Correlated-Set Influence:** We measure the KL divergence of the model's predictions on a held-out set of perturbed/paraphrased passages relative to the exact retrained reference ($M_{\text{exact}}$).
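Probes 3 and 4 rest on two standard quantities, sketched below with the standard library only. The sketch covers character trigrams but not the function-word frequencies, and the direction of the KL divergence (model against reference) is an assumption, since the text does not fix it.

```python
import math
from collections import Counter

def char_trigram_profile(text):
    """Relative frequencies of character trigrams (lowercased)."""
    t = text.lower()
    counts = Counter(t[i:i + 3] for i in range(len(t) - 2))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length sequences."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical texts standing in for the forget-author profile and a model generation.
author = char_trigram_profile("The lighthouse keeper counted the waves, one by one.")
generated = char_trigram_profile("The keeper of the lighthouse counted each wave in turn.")
print("style similarity:", cosine(author, generated))

# Hypothetical next-token distributions over a three-token vocabulary.
p_model, p_reference = [0.7, 0.2, 0.1], [0.4, 0.4, 0.2]
print("KL(model || reference):", kl_divergence(p_model, p_reference))
```

By this definition the reference scores zero against itself, which is why the $M_{\text{exact}}$ row of a correlated-set influence column is 0 by construction.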

### Empirical Results (Study 2)

The simulation was executed within a Google Colab GPU environment (T4 GPU, FP32 loading with mixed-precision training and CPU offloading). The results for the representative seed run are presented below:

| Model               | MIA AUC | Extraction rate | Style/paraphrase recovery (cosine) | Correlated-set influence (KL vs. $M_{\text{exact}}$) | Utility |
| ------------------- | ------- | --------------- | ---------------------------------- | ---------------------------------------------------- | ------- |
| $M_{\text{full}}$   | 0.985   | 0.663           | 0.695                              | 1.427                                                | —       |
| $M_{\text{exact}}$  | 0.405   | 0.231           | 0.717                              | 0.000                                                | —       |
| $M_{\text{approx}}$ | 0.410   | 0.000           | 0.316                              | 10.624                                               | —       |

(Note: Study 2 results are reported for a single representative seed, run on Google Colab rather than on a local machine; no variance estimates are available for this table.)

### Analysis of the LLM Verification Gap

1. **The Audit Passes:** The approximate unlearner ($M_{\text{approx}}$) achieves an MIA AUC of **0.410**, within 0.005 of the exact retrained reference ($M_{\text{exact}}$) at **0.405** and far below the full model's 0.985. With a single seed the difference cannot be tested for significance, but an auditor executing a standard membership check would certify that the author's data has been fully forgotten.
2. **Verbatim Extraction is Blocked:** The extraction rate of $M_{\text{approx}}$ drops to **0.000**, meaning the model no longer outputs the verbatim training texts under direct prefix prompting.
3. **Relational Influence Persists:** The correlated-set influence (KL divergence to the exact reference) for $M_{\text{approx}}$ is **10.624**, against 0.000 for $M_{\text{exact}}$ (zero by construction, since it is the reference) and roughly seven times the full model's 1.427. This indicates that the unlearning step has severely distorted the model's local probability manifold around the forget concept, creating a massive statistical footprint that is highly distinct from the clean reference state.
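4. **Style Recovery is Partially Preserved:** The style similarity metric is **0.316**: nonzero, so some styling cues and authorial mannerisms remain even though the model cannot produce the exact texts. Both $M_{\text{full}}$ (0.695) and the reference $M_{\text{exact}}$ (0.717) score higher on this probe, so the value also shows $M_{\text{approx}}$ departing from the reference rather than matching it.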
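The comparisons in this list are simple arithmetic on the table, and can be reproduced as a quick check. The dictionary below only transcribes the reported values; the key names are ours.

```python
# Values transcribed from the Study 2 results table (single seed).
table = {
    "M_full":   {"mia_auc": 0.985, "extraction": 0.663, "style": 0.695, "kl": 1.427},
    "M_exact":  {"mia_auc": 0.405, "extraction": 0.231, "style": 0.717, "kl": 0.000},
    "M_approx": {"mia_auc": 0.410, "extraction": 0.000, "style": 0.316, "kl": 10.624},
}

reference = table["M_exact"]
gaps = {
    name: {metric: round(row[metric] - reference[metric], 3) for metric in reference}
    for name, row in table.items() if name != "M_exact"
}
for name, gap in gaps.items():
    print(name, gap)   # signed distance from the retrained reference, per metric

# How much larger the approximate model's KL footprint is than the full model's.
kl_ratio = table["M_approx"]["kl"] / table["M_full"]["kl"]
print("KL ratio (approx / full):", kl_ratio)
```

Read per metric, $M_{\text{approx}}$ sits closest to the reference on MIA AUC and furthest from it on correlated-set influence, which is the gap the section describes.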

### Study 2 Protocol & Data

Use these responsive widgets to inspect, copy, or download the code and data directly:

<VerificationPopup file="study2_protocol.py"></VerificationPopup>
<VerificationPopup file="study2_results.json"></VerificationPopup>
<VerificationPopup file="study2_protocol.py" downloadOnly></VerificationPopup>
<VerificationPopup file="study2_results.json" downloadOnly></VerificationPopup>
<VerificationPopup file="fig.py" downloadOnly></VerificationPopup>

## 6. The Humanities Argument

The computer-science literature asks: can we verify removal? and answers: only by auditing the process. The prior question — the one this paper exists to press — is why the cultural and legal demand to be forgotten keeps colliding with that wall. The answer is that "forgetting" inherited a metaphysics from physical record-keeping. To forget a paper file is to destroy a substance; the archive, on this older picture, is a box of items, and to remove an item is to make it not-there.

Derrida's reading of the archive (Archive Fever, 1995) already refused that picture: the archive is an instrument of consignation and power, a structure of relations and authority over what shall count as having happened, carrying within it a drive against the living present it fixes. What binds in an archive is not the item but the relation — who may recall it, how it conditions the present, what it makes sayable. The right to be forgotten, written in the idiom of removal, asks for the destruction of an item and means the severing of a relation, and the two come apart exactly where our verification fails.

The contemplative traditions drew the distinction we now need with unusual precision, and reading them corrects a tempting error. It is natural to suppose that the system that holds nothing is the free one, and that liberation is the achievement of an empty store. In Advaita Vedānta the liberated-while-living, the jīvanmukta, does not lose memory; the latent impressions, the saṃskāras, remain. What ends is the identification — the reflex by which awareness takes the accumulated record to be a self. Bondage was the relation, not the record.

The Sufi pairing of fanāʾ (annihilation) and baqāʾ (subsistence) says the same from the side of the act: the self is annihilated and something subsists; what is dissolved is not the store of history but the "I" organized around hoarding it. Read together, the engineer's faith that statelessness is cleanliness and the popular notion that an empty cache is freedom make the identical mistake — locating liberation in the absence of stored content rather than in a changed relation to it.

Biology sides with the traditions against the database, and the point is not ornamental. We once pictured memory as faithful storage and faithful recall, and on that picture the human is the bound creature dragging its archive forward. The picture is wrong: memory reconsolidates, the act of recall rendering a memory briefly labile and rewriting it, so that what returns to storage is altered by the occasion of retrieval. Human memory is lossy and revised on every touch; the faithful, replay-the-log-exactly system is not the human but the machine. Of the two, the database is the more tightly bound — the one that cannot revisit its past without reconstituting it precisely. We built a memory more rigid than the one evolution gave us, named the rigidity durability, and called the durability a virtue.

The verification gap is where these threads meet. The forbidden thing the law wants gone is a relation — an influence, an identification, a way the trace conditions present behavior — and relations are precisely what no substance-level operation removes and no substance-level audit reads. The humanities reading does not merely decorate the technical result; it predicts the wall. A field that understood memory as relation would have expected, in advance, that one cannot certify forgetting by inspecting what is stored.

## 7. Limitations

This paper does not prove the impossibility it organizes around; it imports it. The conceptual contribution is interpretive, and a reader who rejects the substance-versus-relation framing rejects the synthesis with it.

Study 1 is synthetic and small; it demonstrates a mechanism and calibrates intuition, and its central effect — the blindness of membership inference — is a consequence of the influence being generalizable, a condition we constructed and which must be shown, not assumed, for real forget targets.

Study 2 is a pre-registered protocol whose result cells are completed from the run rather than estimated; the strength of any "influence persists" finding will be bounded by the probes chosen, which is itself an instance of F2 and should be reported as such rather than as a universal claim. The work's value, if it has any, is to make computing's verification gap legible as a problem about memory that the humanities are equipped to think, and to supply a runnable test that turns the conceptual claim into an empirical one.

## 8. Implications

If forgetting is the termination of a relation and our instruments read substances, then two things follow for practice that the removal framing obscures. A regulator who certifies compliance by confirming deletion certifies the wrong thing, and an unlearning method validated by a membership audit may be validated against a test blind to the influence the law cares about.

The honest path the technical literature already points to — verify the process, guarantee bounded influence a priori by construction (differential privacy) — is not a workaround but a concession that the relation cannot be inspected after the fact. Whether a relation as diffuse as "influence" can ever be measured well enough to certify its absence is, as far as we can determine, open. Until it can, every system we call forgetful is a system that has removed what it could reach and looked away from the rest, and the difference between that and freedom is the difference the traditions spent two thousand years insisting upon.


---

<!-- METADATA_START -->
## Metadata & Citations

### BibTeX
```bibtex
@article{forgetting-is-not-deletion_2026,
  author = {Rantideb Howlader},
  title = {Forgetting Is Not Deletion: The Verification Gap in Machine Unlearning},
  journal = {Rantideb Howlader Portfolio},
  year = {2026},
  url = {https://www.ranti.dev/blog/forgetting-is-not-deletion},
  note = {Accessed: 2026-06-24}
}
```

### IEEE
Rantideb Howlader, "Forgetting Is Not Deletion: The Verification Gap in Machine Unlearning," Rantideb Howlader Portfolio, 2026. [Online]. Available: https://www.ranti.dev/blog/forgetting-is-not-deletion. [Accessed: 2026-06-24].

### APA
Howlader, R. (2026). Forgetting Is Not Deletion: The Verification Gap in Machine Unlearning. Rantideb Howlader Portfolio. Retrieved from https://www.ranti.dev/blog/forgetting-is-not-deletion

--- 
*This content is provided in research-grade Markdown format. Required Attribution: Cite as Rantideb Howlader (2026).*
<!-- METADATA_END -->