Alexander Kilpatrick

In Defence of Small Effects: Why Any Signal in Language is Amazing

2026-06-01T00:00:00+00:00

A few months ago, a paper of mine came back from peer review with a comment I have seen in various forms throughout my career. The reviewer acknowledged that the results were statistically significant, then added, almost as a dismissal: “but the effect sizes are small.” The implication was clear: if the variance explained is modest, the finding is modest. I want to push back on that, not defensively (well, perhaps a little defensively), but because I think this criticism, applied to research on the internal structure of language, fundamentally misunderstands what it would mean to find a large effect, and why finding any effect at all is pretty amazing.

The study (Kilpatrick & Bundgaard-Nielsen, 2026) that prompted this reflection examined whether phonemic Surprisal (how statistically unexpected the sound sequences in a word are) is systematically elevated in words with vivid meanings: concrete (grounded in sensory experience), imaginable (easy to picture), or specific (like sparrow rather than bird). The hypothesis, which we call the Attentional Optimization Hypothesis, is that language communities may unconsciously converge on encoding conceptually rich meanings in phonologically marked forms. We proposed that words for things that demand attention are built from sound sequences that are themselves attentionally disruptive: rare, harder to process in the moment, but more memorable for it. The results partially supported this. Even after controlling for a wide range of confounds, Concreteness and Imaginability were both significantly associated with higher Surprisal—with simple regression models returning modest R² values of .024 and .002 respectively—while Specificity, despite its theoretical plausibility, did not reach significance. Words high in Surprisal and vividness were also more accurately recalled in memory recognition tasks. These are small effect sizes. Does that mean they don’t matter?
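For readers who want the measure made concrete, the following is a minimal sketch of one common way to compute a word's phonemic surprisal: the average of -log2 P(phoneme | previous phoneme) across the word. The bigram model, the add-one smoothing, the toy lexicon, and the function names `train_bigrams` and `mean_surprisal` are all assumptions made for illustration. They are not taken from the paper, which may define and estimate Surprisal differently.

```python
import math
from collections import Counter

def train_bigrams(lexicon):
    """Count phoneme bigrams, using '#' as a word-boundary symbol."""
    bigrams, contexts = Counter(), Counter()
    for phones in lexicon:
        padded = ["#"] + list(phones) + ["#"]
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts

def mean_surprisal(phones, bigrams, contexts, vocab_size):
    """Average -log2 P(cur | prev) over the word, with add-one smoothing."""
    padded = ["#"] + list(phones) + ["#"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total += -math.log2(p)
    return total / (len(padded) - 1)

# Hypothetical toy lexicon: letters stand in for phonemes.
lexicon = ["kat", "kap", "tap", "pat", "tak", "zof"]
bigrams, contexts = train_bigrams(lexicon)
vocab_size = len({s for w in lexicon for s in w} | {"#"})

# "kat" reuses frequent sequences; "zof" is built from rare ones.
common = mean_surprisal("kat", bigrams, contexts, vocab_size)
rare = mean_surprisal("zof", bigrams, contexts, vocab_size)
print(f"kat: {common:.2f} bits, zof: {rare:.2f} bits")
```

In this toy lexicon the word built from rarely attested sequences comes out with the higher average surprisal, which is the sense in which a form can be called phonologically marked.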

To understand why small effects here are meaningful, it helps to start with the theoretical weight pressing against them. Saussure’s foundational claim, that the relationship between a sign and its meaning is arbitrary, that there is no natural connection between the word tree and what it encodes, became one of the most durable axioms of twentieth-century linguistics. Arbitrariness was not merely descriptive; it was elevated into a defining principle, and it created a lasting intellectual headwind for anyone interested in form–meaning systematicity. It is worth noting that Saussure himself was more nuanced than his reputation suggests: he distinguished between absolute arbitrariness and relative arbitrariness, acknowledging that languages contain motivated signs alongside unmotivated ones, and he described the limiting of arbitrariness as “the best possible basis for approaching the study of language as a system” (Saussure, 1916/1959, p. 133). The doctrine of total arbitrariness is, to some extent, a simplification of what he actually claimed. Nevertheless, that simplification has been enormously influential, and sound symbolism research has spent decades pushing against it. The Saussurean framing still shapes how results are interpreted: small effects are read as confirmation that arbitrariness dominates, any residual iconicity a footnote. But dominance is not totality, and that framing may be asking the wrong question.

Across contemporary iconicity research, the pattern is consistent: significant effects, modest sizes. Naïve listeners can guess ideophone meanings above chance, but performance is substantially lower than in pseudoword tasks, supporting a view of ideophones as combining substantial arbitrariness with only weak iconic cues (Dingemanse et al., 2016). A meta-analysis of infant bouba–kiki studies reports a moderate but not large effect, with sensitivity asymmetrical and age-dependent (Fort et al., 2018). Tests of vowel-based size symbolism using actual nouns find no reliable facilitation, suggesting classic nonword patterns may not generalise to ordinary lexical access (Sidhu & Pexman, 2022). Critical reviews conclude that broad, system-wide processing advantages for iconicity are not yet supported (Nielsen & Dingemanse, 2021). The prevailing summary is that arbitrariness dominates. But consider what these effects are being asked to survive: the phonological erosion that strips perceptual features over time (the “iconic treadmill”; Flaksman, 2017), morphological layering, cross-linguistic borrowing, and the combinatorial pressure of packing tens of thousands of meanings into a finite phonological space with no overarching design. That any signal remains detectable after all of that is the remarkable fact. The question is not why the effects are small. It is why they exist at all.

Cohen’s conventions for effect size were developed for controlled experiments, not for corpus linguistics, where the dependent variable is the accumulated product of an unbounded historical process and the predictors carry their own measurement error and cultural specificity. An R² of .024 should not be read as saying that Concreteness barely matters to a word’s Surprisal. It says that Concreteness accounts for 2.4% of the variance in a measure whose remaining variance is dominated by historical contingency that no model can capture. A large effect in this domain would imply near-deterministic form–meaning mappings, which is not the language we have and not what any account of lexical history could sustain. Small effects are exactly what we should expect if real pressures operate on a system submerged in irreducible noise. Finding a small effect in lexical research is less like finding a large object and more like detecting a faint signal from a distant galaxy. The signal may account for only a tiny fraction of the data reaching the detector, but that is precisely what makes it remarkable. It has travelled across immense distances, survived interference and distortion, and remains detectable nonetheless.
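The statistical side of this argument can be illustrated with a small simulation. The sketch below builds a synthetic lexicon in which a predictor truly accounts for 2.4% of the variance in an outcome and the rest is noise, then asks how often shuffled data produce an R² as large as the one observed. Everything in it is an assumption chosen for illustration: the sample size of 5,000 words, the Gaussian noise, the 200 permutations, and the function names. None of it reproduces the paper's data or analysis.

```python
import random

def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R² for a one-predictor regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def simulate(n_words, true_r2, seed):
    """Outcome = weak real signal + noise, scaled so population R² = true_r2."""
    rng = random.Random(seed)
    slope = true_r2 ** 0.5
    noise_sd = (1 - true_r2) ** 0.5
    xs = [rng.gauss(0, 1) for _ in range(n_words)]
    ys = [slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def permutation_p(xs, ys, n_perm, seed):
    """Share of shuffles whose R² matches or beats the observed R²."""
    rng = random.Random(seed)
    observed = r_squared(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if r_squared(xs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical settings: 5,000 words, true R² of .024, 200 permutations.
xs, ys = simulate(n_words=5000, true_r2=0.024, seed=1)
observed, p = permutation_p(xs, ys, n_perm=200, seed=2)
print(f"observed R² = {observed:.3f}, permutation p = {p:.3f}")
```

Under these assumptions the observed R² stays small while shuffled data essentially never match it. A weak signal and a reliably detectable signal are not the same thing as a weak finding.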

Words are the residue of history: noisy, contingent, shaped by forces operating at timescales no dataset can fully capture. Saussure was right that the sign is largely arbitrary; but largely is doing a lot of work in that sentence. When we find that concrete words are, on average, phonologically more surprising than abstract ones, we are finding something that had to survive centuries of attrition to be detectable at all. That is not a small result. It is a signal from a very long way away, and it deserves more than a shrug.

This post draws on research conducted with Rikke Bundgaard-Nielsen (University of Melbourne), supported by a grant from the University of Aizu. The full paper is published in Cognition (2026). A plain-language version of the study is also available.

Dingemanse, M., Schuerman, W. L., Reinisch, E., Tufvesson, S., & Mitterer, H. (2016). What sound symbolism can and cannot do: Testing the iconicity of ideophones from five languages. Language, 92(2), e117–e133.

Flaksman, M. (2017). Iconic treadmill hypothesis: The reasons behind continuous onomatopoeic coinage. In A. Zirker, M. Bauer, O. Fischer, & C. Ljungberg (Eds.), Dimensions of iconicity (pp. 15–38). John Benjamins.

Fort, M., Lammertink, I., Peperkamp, S., Guevara-Rukoz, A., Fikkert, P., & Tsuji, S. (2018). SymBouKi: A meta-analysis on the emergence of sound symbolism in early language acquisition. Developmental Science, 21(5), e12659.

Kilpatrick, A., & Bundgaard-Nielsen, R. (2026). Say it like you mean it: Linguistic vividness and the attentional optimization hypothesis. Cognition, 269, 106406.

Lockwood, G., & Dingemanse, M. (2015). Iconicity in the lab: A review of behavioral, developmental, and neuroimaging research into sound-symbolism. Frontiers in Psychology, 6, 1246.

Nielsen, A. K. S., & Dingemanse, M. (2021). Iconicity in word learning and beyond: A critical review. Language and Speech, 64(1), 52–72.

Saussure, F. de. (1959). Course in general linguistics (W. Baskin, Trans.). Philosophical Library. (Original work published 1916)

Sidhu, D. M., & Pexman, P. M. (2022). Is a boat bigger than a ship? Null results in the investigation of vowel sound symbolism on size judgements in real language. Quarterly Journal of Experimental Psychology, 75(10), 1820–1837.

The Lesson of Chelmsford

2026-05-26T00:00:00+00:00

Before I knew him, my father was an addict. What follows is not a tragic backstory, nor a tale of overcoming difficulty to become a scientist. It is about what I learned later, as an adult, about the treatment he received, and about how that discovery has shaped how I think about science, evidence, and responsibility in my own research.

My father’s family emigrated from Scotland to Australia and settled in Sydney. I do not know how or why he began using drugs, but I do know that he received treatment at Chelmsford.

Chelmsford Private Hospital, under the direction of psychiatrist Dr Harry Bailey, became known for the use of what was termed Deep Sleep Therapy (DST) from the 1960s through the late 1970s. The treatment involved placing patients into prolonged drug-induced unconsciousness for days or weeks at a time, using high doses of barbiturates and other sedatives, often in combination with electroconvulsive therapy. Patients were frequently immobilised, tube-fed, and monitored with minimal medical staffing. Bailey justified the practice as a way of allowing the brain to “reset,” despite the absence of robust clinical evidence supporting its safety or effectiveness.

Over time, serious concerns emerged about the treatment’s outcomes and governance. Whistleblowers, former staff, and bereaved families raised alarms about poor record-keeping, lack of informed consent, and the suppression of adverse results. These concerns culminated in the 1988–1990 New South Wales Royal Commission into Deep Sleep Therapy, which found that DST was dangerous, lacked any sound scientific basis, and was administered in circumstances that amounted to gross professional misconduct. Dr Bailey died by suicide in 1985, three years before the Royal Commission began, but Chelmsford has since become a central case study in medical failure, regulatory breakdown, and the human cost of unchallenged authority.

I don’t want to do my father a disservice by speculating about the long-term effects Chelmsford may or may not have had on him. I cannot say with certainty which aspects of his later life, behaviour, or health should be attributed to the treatment, and which should not. What I can say is this: as a former addict and as a man who underwent treatment at Chelmsford, the effort required of him just to function must, at times, have been superhuman.

What can be said with confidence is what the Royal Commission established about Deep Sleep Therapy itself. The Commission found that DST was administered to more than a thousand patients at Chelmsford, with 27 deaths directly associated with the treatment and a further 24 suicides occurring in the same year as discharge. Hundreds of surviving patients reported severe and lasting harm, including brain damage, persistent cognitive impairment, memory loss, personality change, and profound psychological injury. Many described long-term difficulties with concentration, emotional regulation, and basic functioning that endured well beyond their treatment. The Commission described the events at Chelmsford as deplorable, citing serious medical negligence, obstruction of justice, and fraudulent conduct.

Dr Bailey did not set out to harm his patients. By his account, he believed Deep Sleep Therapy was a legitimate medical intervention that could help people recover from severe psychiatric conditions. While this reasoning was fundamentally flawed and unsupported by robust evidence, it reflects a form of professional hubris rather than deliberate malice.

I don’t know how much my father’s story informs my research in any direct or traceable way. There is no neat causal line I can draw, and I don’t trust accounts that pretend otherwise. What I do know is how I work. I am careful, especially with human participants. I publish my data alongside my research and try to make my reasoning transparent. Not because linguistics is dangerous — it isn’t — but because the lesson of Chelmsford is not about risk alone. It is about how easily certainty hardens and how quickly weak evidence can be mistaken for insight.

Much of my research examines how the human mind is susceptible to illusion: to patterns that feel real whether or not they are, and to confidence that survives in the absence of warrant. Those findings do not place me above the system I study; they place me inside it. I am not exempt from the cognitive machinery I analyse; I am its owner. That means the responsibility is not to be right, or even persuasive, but to work from the assumption that I am wrong.