In Defence of Small Effects: Why Any Signal in Language Is Amazing

A few months ago, a paper of mine came back from peer review with a comment I have seen in various forms throughout my career. The reviewer acknowledged that the results were statistically significant, then added, almost as a dismissal: “but the effect sizes are small.” The implication was clear: if the variance explained is modest, the finding is modest. I want to push back on that, not defensively (well, perhaps a little defensively), but because I think this criticism, applied to research on the internal structure of language, fundamentally misunderstands what it would mean to find a large effect, and why finding any effect at all is pretty amazing.

The study (Kilpatrick & Bundgaard-Nielsen, 2026) that prompted this reflection examined whether phonemic Surprisal (how statistically unexpected the sound sequences in a word are) is systematically elevated in words with vivid meanings: concrete (grounded in sensory experience), imaginable (easy to picture), or specific (like sparrow rather than bird). The hypothesis, which we call the Attentional Optimization Hypothesis, is that language communities may unconsciously converge on encoding conceptually rich meanings in phonologically marked forms. We proposed that words for things that demand attention are built from sound sequences that are themselves attentionally disruptive: rare, harder to process in the moment, but more memorable for it. The results partially supported this. Even after controlling for a wide range of confounds, Concreteness and Imaginability were both significantly associated with higher Surprisal, with simple regression models returning modest R² values of .024 and .002 respectively, while Specificity, despite its theoretical plausibility, did not reach significance. Words high in Surprisal and vividness were also recognised more accurately in memory tasks. These are small effect sizes. Does that mean they don't matter?
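For readers unfamiliar with the measure, here is a minimal sketch of one common way phonemic Surprisal can be operationalised: the average of -log2 P(phoneme | preceding phoneme) across a word, with probabilities estimated from phoneme bigram counts in a lexicon. This is not the procedure used in the paper. The bigram order, the add-one smoothing, the "#" word-boundary symbol, and the toy lexicon (letters standing in for phonemes) are all assumptions made for illustration.

```python
import math
from collections import Counter

BOUNDARY = "#"  # assumed word-edge symbol, treated as an extra "phoneme"


def train_bigrams(lexicon):
    """Count phoneme bigrams (including word edges) across a toy lexicon."""
    bigrams = Counter()
    contexts = Counter()
    inventory = set()
    for word in lexicon:
        segs = [BOUNDARY] + list(word) + [BOUNDARY]
        inventory.update(segs)
        for prev, nxt in zip(segs, segs[1:]):
            bigrams[(prev, nxt)] += 1
            contexts[prev] += 1
    return bigrams, contexts, inventory


def mean_surprisal(word, bigrams, contexts, inventory):
    """Average -log2 P(phoneme | previous phoneme), add-one smoothed."""
    segs = [BOUNDARY] + list(word) + [BOUNDARY]
    v = len(inventory)
    bits = []
    for prev, nxt in zip(segs, segs[1:]):
        p = (bigrams[(prev, nxt)] + 1) / (contexts[prev] + v)
        bits.append(-math.log2(p))
    return sum(bits) / len(bits)


# Hypothetical toy lexicon; real work would use phonemic transcriptions.
lexicon = ["kat", "kap", "tap", "pat", "bat", "tak", "pak"]
model = train_bigrams(lexicon)
for w in ["pat", "tpa"]:
    # "tpa" contains bigrams never seen in the lexicon, so it scores higher.
    print(w, round(mean_surprisal(w, *model), 2))
```

A word built from frequent sound sequences gets low Surprisal; a word built from rare or unattested ones gets high Surprisal, which is the sense in which such forms are "phonologically marked."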

To understand why small effects here are meaningful, it helps to start with the theoretical weight pressing against them. Saussure's foundational claim that the relationship between a sign and its meaning is arbitrary (that there is no natural connection between the word tree and what it encodes) became one of the most durable axioms of twentieth-century linguistics. Arbitrariness was not merely descriptive; it was elevated into a defining principle, and it created a lasting intellectual headwind for anyone interested in form–meaning systematicity. It is worth noting that Saussure himself was more nuanced than his reputation suggests: he distinguished between absolute arbitrariness and relative arbitrariness, acknowledging that languages contain motivated signs alongside unmotivated ones, and he described the limiting of arbitrariness as "the best possible basis for approaching the study of language as a system" (Saussure, 1916/1959, p. 133). The doctrine of total arbitrariness is, to some extent, a simplification of what he actually claimed. Nevertheless, that simplification has been enormously influential, and sound symbolism research has spent decades pushing against it. The Saussurean framing still shapes how results are interpreted: small effects are read as confirmation that arbitrariness dominates, with any residual iconicity treated as a footnote. But dominance is not totality, and that framing may be asking the wrong question.

Across contemporary iconicity research, the pattern is consistent: significant effects, modest sizes. Naïve listeners can guess ideophone meanings above chance, but performance is substantially lower than in pseudoword tasks, supporting a view of ideophones as combining substantial arbitrariness with only weak iconic cues (Dingemanse et al., 2016). A meta-analysis of infant bouba–kiki studies reports a moderate but not large effect, with sensitivity asymmetrical and age-dependent (Fort et al., 2018). Tests of vowel-based size symbolism using actual nouns find no reliable facilitation, suggesting that classic nonword patterns may not generalise to ordinary lexical access (Sidhu & Pexman, 2022). Critical reviews conclude that broad, system-wide processing advantages for iconicity are not yet supported (Nielsen & Dingemanse, 2021). The prevailing summary is that arbitrariness dominates; but consider what these effects are being asked to survive: the phonological erosion that strips perceptual features over time (the "iconic treadmill"; Flaksman, 2017), morphological layering, cross-linguistic borrowing, and the combinatorial pressure of packing tens of thousands of meanings into a finite phonological space with no overarching design. That any signal remains detectable after all of that is the remarkable fact. The question is not why the effects are small. It is why they exist at all.

Cohen's conventions for effect size were developed for controlled experiments, not for corpus linguistics, where the dependent variable is the accumulated product of an unbounded historical process and the predictors carry their own measurement error and cultural specificity. An R² of .024 does not mean that Concreteness is nearly irrelevant to a word's Surprisal. It means that Concreteness explains 2.4% of the variance in a measure whose remaining variance is dominated by unmeasurable historical contingency. A large effect in this domain would imply near-deterministic form–meaning mappings, which is not the language we have and not what any account of lexical history could sustain. Small effects are exactly what we should expect if real pressures operate on a system submerged in irreducible noise. A small effect in lexical research is less like finding a large object and more like detecting a faint astronomical signal from a distant galaxy. The signal may account for only a tiny fraction of the data reaching the detector, but that is what makes it remarkable: it has travelled across immense distances, survived interference and distortion, and remains detectable nonetheless.

Words are the residue of history: noisy, contingent, shaped by forces operating at timescales no dataset can fully capture. Saussure was right that the sign is largely arbitrary; but "largely" is doing a lot of work in that sentence. When we find that concrete words are, on average, phonologically more surprising than abstract ones, we are finding something that had to survive centuries of attrition to be detectable at all. That is not a small result. It is a signal from a very long way away, and it deserves more than a shrug.


This post draws on research conducted with Rikke Bundgaard-Nielsen (University of Melbourne), supported by a grant from the University of Aizu. The full paper is published in Cognition (2026). A plain language version of this study is available here.


Dingemanse, M., Schuerman, W. L., Reinisch, E., Tufvesson, S., & Mitterer, H. (2016). What sound symbolism can and cannot do: Testing the iconicity of ideophones from five languages. Language, 92(2), e117–e133.

Flaksman, M. (2017). Iconic treadmill hypothesis: The reasons behind continuous onomatopoeic coinage. In A. Zirker, M. Bauer, O. Fischer, & C. Ljungberg (Eds.), Dimensions of iconicity (pp. 15–38). John Benjamins.

Fort, M., Lammertink, I., Peperkamp, S., Guevara-Rukoz, A., Fikkert, P., & Tsuji, S. (2018). SymBouKi: A meta-analysis on the emergence of sound symbolism in early language acquisition. Developmental Science, 21(5), e12659.

Kilpatrick, A., & Bundgaard-Nielsen, R. (2026). Say it like you mean it: Linguistic vividness and the attentional optimization hypothesis. Cognition, 269, 106406.

Lockwood, G., & Dingemanse, M. (2015). Iconicity in the lab: A review of behavioral, developmental, and neuroimaging research into sound-symbolism. Frontiers in Psychology, 6, 1246.

Nielsen, A. K. S., & Dingemanse, M. (2021). Iconicity in word learning and beyond: A critical review. Language and Speech, 64(1), 52–72.

Saussure, F. de. (1959). Course in general linguistics (W. Baskin, Trans.). Philosophical Library. (Original work published 1916)

Sidhu, D. M., & Pexman, P. M. (2022). Is a boat bigger than a ship? Null results in the investigation of vowel sound symbolism on size judgements in real language. Quarterly Journal of Experimental Psychology, 75(10), 1820–1837.