Algorithmic Dysfluency: Why AI Cannot Hear the Stammering Subject
I. Introduction: The Error Rate of Existence
"To stammer is to interrupt the flow of capital." - Investigational Thesis
When I attempt to interface with my computational devices, the system engages in a process of severe temporal disciplining. It listens for precisely 1.5 seconds, and if I do not produce a phoneme within that rigorously defined window, it assumes the request is finished and unceremoniously terminates the session, because it cannot conceive of a speaker who requires time to think.
I stammer. My speech is characterized by Temporal Gaps - silences, repetitions, and blocks that defy the standard Wait Time heuristics of Voice User Interfaces (VUIs). To Siri, Alexa, and GPT-4o, my silence is empty, a null value in the audio stream. To me, that silence is profoundly full of meaning: it is the sound of my cognitive labor, the friction of my thought process, and the audible artifact of my survival in a hostile temporal architecture.
This disjuncture - between my Biological Clock and the Algorithmic Clock - is not a mere technical bug to be patched but a fundamental epistemological exclusion. Epistemological Violence is the disregard for, and the silencing of, the knowledge systems of marginalized groups. The Artificial Intelligence industry is building what Neil Postman would call a Technopoly of Fluency, enforcing a medicalized hegemony of smoothness that renders disabled bodies legally and technologically illegible.
In this critique, I argue that the Stammer is not a biological defect to be cured, but a Temporal Resistance - a "Glitch" in the capitalist production of time that reveals the hegemony of the normative clock. I posit that until we build systems that can listen to the silence, we are building systems that only listen to power.
1.1 The Technical Audit: Why Whisper Fails
To move beyond auto-ethnographic anecdote, let us examine the underlying architecture of these exclusion mechanisms. State-of-the-art ASR models (like OpenAI Whisper) are trained on massive datasets of clean speech (e.g., audiobooks, podcasts, YouTube videos) which have been sanitized of human imperfection. These datasets are fundamentally Discriminatory:
- Selection Bias: Podcasts are edited to remove "ums," "ahs," and stammers, representing a Super-Fluency that exists nowhere in natural human biology.
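- The Tokenizer: Whisper pairs a byte-level Byte-Pair Encoding (BPE) vocabulary, of the kind used by GPT models, with an autoregressive decoder trained to predict the next most probable token. BPE itself only splits text into subword units; it is the decoder's learned probabilities that punish the low-probability sequence of a stammer (e.g., "wh-wh-what").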
- The Context Window: When Whisper encounters a block (silence), it often hallucinates a completed sentence. Nothing in its training objective, or in evaluation by Word Error Rate (WER) against fluent references, rewards leaving the gap open, so the model prefers to invent a coherent lie rather than transcribe a broken truth.
This is Algorithmic Dysmorphia. The machine smooths out my stutter in the transcript to fix me, but in doing so, it erases the texture of my voice. As discussed in my previous work on Glitch Poetics, this erasure is a form of digital eugenics.
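The probability argument in the list above can be made concrete with a toy model. The Python sketch below is an illustration only: it trains an add-one-smoothed bigram model on three invented fluent sentences and compares the average per-token log-probability of a fluent request with a stammered one. The corpus, the function names, and the whitespace tokenization (with "wh-wh-what" written as separate tokens) are assumptions made for the example. Whisper's neural decoder is far more complex, but it shares the tendency to score familiar sequences higher.

```python
import math
from collections import Counter

# Invented "clean" training corpus: every sentence is fluent.
FLUENT_CORPUS = [
    "what time is it",
    "what is the time",
    "is it time",
]

def train_bigrams(sentences):
    """Count bigrams and their left contexts, with <s> marking sentence start."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        vocab.update(tokens[1:])
        for left, right in zip(tokens, tokens[1:]):
            bigrams[(left, right)] += 1
            contexts[left] += 1
    return bigrams, contexts, vocab

def avg_logprob(sentence, bigrams, contexts, vocab):
    """Average per-token log-probability under add-one smoothing."""
    v = len(vocab) + 1  # one extra slot for tokens never seen in training
    tokens = ["<s>"] + sentence.split()
    total = 0.0
    for left, right in zip(tokens, tokens[1:]):
        p = (bigrams[(left, right)] + 1) / (contexts[left] + v)
        total += math.log(p)
    return total / (len(tokens) - 1)

model = train_bigrams(FLUENT_CORPUS)
fluent_score = avg_logprob("what time is it", *model)
stammered_score = avg_logprob("wh wh what time is it", *model)
```

Under these assumptions the stammered request receives the lower score purely because the training data never contained a repetition, not because it carries less meaning.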
II. The Theory: Stammering as Temporal Refusal
We must ground this technical failure in Critical Theory and Phenomenology and ask why the machine hates the pause. As Franco "Bifo" Berardi argues, modern capitalism is Semio-Capitalism - the production of value through the high-speed exchange of signs (info-labor). In this economy, Time is Liquidity. Attention is the scarcest commodity, and any hesitation, any Latency, is simply friction that lowers the rate of profit. Marshall McLuhan famously declared that "the medium is the message." In the age of AI, the medium is Velocity, and it conveys the message that "Intelligence equals Speed." The stammerer, by breaking the velocity, breaks the message itself.
2.1 The Stammer as a "Glitch"
In my analysis of The Physics of Trauma, I established that trauma creates a distinct inertial reference frame. Building upon that, I argue here that the Stammer is a Temporal Glitch: the stammering subject refuses to synchronize with the Master Clock of neoliberalism.
- The Fluent Speaker: Delivers information in linear time.
- The Stammerer: Delivers information in recursive time.
Society pathologizes this recursion as anxiety or stupidity, but I propose we reframe it as Resistance. When I block on a word, I am forcing the listener (and the machine) to Wait, thereby reclaiming the present moment from the relentless rush of the future. In a world of Instant Answers (Perplexity, ChatGPT), the ability to pause - to not answer instantly - is a radical political act of Hermeneutics.
2.2 The Orphan's Code (Auto-Ethnography)
My life has been defined by two great interruptions, which constitute the genealogy of my resistance. The first was the loss of my mother when I was six years old. I became an orphan in a world that did not care to explain itself to me, forced to earn my bread and butter before I could read. In that state of high-stakes survival, I learned that Speed Kills: if you answer too fast, you agree to things you simply do not understand.
The second interruption was the accident that took my fluency, which forced me to live in Crip Time (Kafer). We must reframe this not as failure, but as what Donna Haraway might call a Cyborg Resistance. Haraway taught us that the cyborg is "oppositional, utopian, and completely without innocence," and the stammering subject is the ultimate cyborg: half biological struggle, half technical interface. While I work to regain my speech, I have found that the machines I work with (Siri, Whisper) have no patience for my recovery. They demand I be "fixed" now, enforcing a Frantz Fanon-esque Zone of Non-Being upon the dysfluent. To the machine, I am not a user, but an unhandled exception.
III. The Architecture of Erasure
If the stammer is a form of resistance, the current AI stack is a machine of containment and erasure. The industry's failure to accommodate dysfluency is not merely an oversight in Edge Case testing; it is a fundamental architectural decision to prioritize Order over Integrity and Teleology.
3.1 The Clean Data Fallacy
Deep Learning models are only as good as their ground truth, yet the industry relies on datasets like Common Voice or LibriSpeech, which are built from read speech and are nearly free of disfluencies. Engineers treat stammers, repetitions, and pauses as Noise to be filtered out before training, assuming that the purified signal contains the truth. This is a Category Error. For the stammering subject, the noise is the signal. By training models to predict the next fluent token, we are effectively training them to hallucinate a normative speaker where none exists. We are building systems that prefer a coherent lie to a broken truth, echoing the argument I explored in Theology of Code.
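To show what "filtering the noise" can look like in practice, here is a minimal Python sketch of a transcript normalizer. It is hypothetical: the function name `scrub_disfluencies`, the filler list, and the three regular expressions are assumptions for illustration, not the documented preprocessing of Common Voice, LibriSpeech, or any named pipeline.

```python
import re

# Filled pauses such as "um," or "uh" (an illustrative, incomplete list).
FILLERS = re.compile(r"\b(?:um+|uh+|ah+|er+)\b[,.]?\s*", re.IGNORECASE)
# Part-word repetitions: "wh-wh-" in front of "what".
PART_WORD = re.compile(r"\b(?:(\w{1,3})-)+(?=\1)", re.IGNORECASE)
# Whole-word repetitions: "I... I... I" collapses to "I".
# Caveat: this also collapses legitimate repeats such as "had had".
WORD_REPEAT = re.compile(r"\b(\w+)(?:\W+\1\b)+", re.IGNORECASE)

def scrub_disfluencies(transcript: str) -> str:
    """Return the 'cleaned' transcript a fluency-first pipeline would keep."""
    text = FILLERS.sub("", transcript)
    text = PART_WORD.sub("", text)
    text = WORD_REPEAT.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()

cleaned_block = scrub_disfluencies("um, I... I... I love you")
cleaned_repeat = scrub_disfluencies("wh-wh-what time is it")
```

Every character the function deletes is exactly the material the stammering subject would call signal: after scrubbing, both utterances are indistinguishable from fluent speech.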
3.2 The Tyranny of the Timeout
Consider the End-of-Speech (EOS) detection logic in modern Voice User Interfaces (VUIs), where the standard threshold is often set between 700 ms and 1500 ms. This is an arbitrary temporal border that enforces a Normative Pace of thought upon the entire user base. When the stammerer hits a block, they are not done; they are actively working to produce speech. But the machine interprets silence as absence. By hard-coding these timeout thresholds, developers have inadvertently encoded a Time-Out mechanism that deports the dysfluent user from the digital public square every time they pause to breathe.
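The timeout logic described above can be sketched in a few lines of Python. This is a simplified model, not any vendor's implementation: the function name `session_cutoff_ms`, the 100 ms frame size, and the boolean voice-activity frames are assumptions chosen for the example.

```python
def session_cutoff_ms(vad_frames, frame_ms=100, timeout_ms=1500):
    """Return the time (ms) at which a fixed-timeout endpointer closes the
    session, or None if the silence never reaches the timeout.

    vad_frames is a sequence of booleans: True where speech was detected.
    """
    silent_ms = 0
    for index, is_speech in enumerate(vad_frames):
        if is_speech:
            silent_ms = 0  # any detected speech resets the clock
        else:
            silent_ms += frame_ms
            if silent_ms >= timeout_ms:
                return (index + 1) * frame_ms
    return None

# 0.5 s of speech, a 3 s block, then 1 s of speech the user fought to produce.
utterance = [True] * 5 + [False] * 30 + [True] * 10

cut_default = session_cutoff_ms(utterance, timeout_ms=1500)
cut_patient = session_cutoff_ms(utterance, timeout_ms=5000)
```

With the 1500 ms threshold the session closes 2000 ms in, halfway through the block and before the final second of speech; with a 5000 ms threshold the same utterance is heard in full. The difference between exclusion and access here is a single configurable number.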
3.3 Towards Integrity Retention
The Goal of ASR is currently defined as minimizing the Word Error Rate (WER), a metric that assumes that the perfect transcript is one that matches a fluent script. A more humane technology would optimize for Integrity Retention. It would ask: Does the transcript capture the texture of the speech? If a user fights through a 10-second block to say "I... I... I love you," a "fluent" transcript that reads "I love you" has destroyed the emotional labor of the utterance. A truly smart AI would learn to listen to the struggle, not just the syntax.
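The bias of the metric can be shown directly. The Python sketch below uses the standard definition of WER (word-level edit distance divided by the number of reference words); the helper names and the punctuation-stripping normalization are assumptions for the example. It does not implement Integrity Retention, which this essay proposes but does not formalize; it only shows how the choice of reference decides which transcript counts as "correct."

```python
import re

def words(text):
    """Lowercase and strip punctuation so only the word sequence is compared."""
    return re.findall(r"[a-z']+", text.lower())

def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = words(reference), words(hypothesis)
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)

FLUENT = "I love you"
VERBATIM = "I... I... I love you"

# Scored against a fluent reference, the faithful transcript is "wrong".
verbatim_vs_fluent_ref = wer(FLUENT, VERBATIM)
smoothed_vs_fluent_ref = wer(FLUENT, FLUENT)
# Scored against a verbatim reference, the smoothed transcript is "wrong".
smoothed_vs_verbatim_ref = wer(VERBATIM, FLUENT)
```

Against the fluent reference, the verbatim transcript is charged two insertions (a WER of 2/3) while the smoothed one scores a perfect 0. Against the verbatim reference the ranking reverses, and the smoothed transcript is charged two deletions (a WER of 0.4).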
IV. Conclusion: The Right to Lag
Édouard Glissant demanded the Right to Opacity - the right not to be understood or transparent to the colonial gaze. I demand the Right to Lag. The right to process information at my own speed, without penalty. The right to be out of sync with the centralized server of capitalist production.
As AI becomes the Universal Interface for banking, healthcare, and law, we face an imminent civil rights crisis. If these systems cannot hear the stammering subject, then the stammering subject will be locked out of the digital polity entirely. We are building a world where only the fluent are citizens, and the dysfluent are relegated to the margins. My research is a refusal of that world. I am here to glitch the timeline.
Bibliography
- Berardi, Franco "Bifo". The Soul at Work: From Alienation to Autonomy. Semiotext(e), 2009.
- Benjamin, Ruha. Race After Technology: Abolitionist Tools for the New Jim Code. Polity, 2019.
- Fanon, Frantz. Black Skin, White Masks. Grove Press, 1967.
- Glissant, Édouard. Poetics of Relation. University of Michigan Press, 1997.
- Haraway, Donna. Simians, Cyborgs, and Women: The Reinvention of Nature. Routledge, 1991.
- Kafer, Alison. Feminist, Queer, Crip. Indiana University Press, 2013.
- McLuhan, Marshall. Understanding Media: The Extensions of Man. McGraw-Hill, 1964.
- Postman, Neil. Technopoly: The Surrender of Culture to Technology. Vintage, 1993.
- Whittaker, Meredith. "The Steep Cost of Capture." Interactions 28, no. 6 (2021).
- Hugging Face. "The State of AI 2024: Open Source vs. Closed Models." (2024).
- OpenAI. "GPT-4o System Card: Safety and Limitations." (2024).
- Anthropic. "Claude 3.5 Sonnet: System Card and Safety Evaluation." (2025).
Key Definitions
Algorithmic Dysfluency refers to the systemic and architectural failure of AI speech recognition models to accurately recognize, transcribe, and accommodate non-normative speech patterns such as stammering, stuttering, and aphasia, resulting in the exclusion of disabled users.
Crip Technoscience is an interdisciplinary field of study that merges Critical Disability Studies with Science and Technology Studies (STS) to challenge the ableist assumptions embedded in the design of technological systems and to propose crip-centric alternatives.
Temporal Glitch refers to a theoretical framework arguing that the "glitch" or the "stammer" is not a failure of the system, but a form of political resistance that disrupts the accelerated "time is money" logic of neoliberal capitalism.
Semio-Capitalism is a concept developed by Franco "Bifo" Berardi describing a mode of production where the creation of value is driven by the accumulation and exchange of signs, information, and attention rather than material goods.
Integrity Retention is a proposed metric for evaluating AI transcription systems that prioritizes the preservation of the speaker's original dysfluency, struggle, and emotional texture over the smoothing or sanitizing of speech into normative fluency.

