Mind Alchemy Metrics

The Problematic History of Psychometrics

I’ve been spending time lately with the early figures in psychometrics. Not skimming timelines, but lingering. Reading their correspondence, their arguments, their assumptions. Trying to understand not just what they built, but what they believed they were building for. It’s a strange feeling, revisiting a field you genuinely love and discovering, again and again, how much history it carries.

Psychometrics, at its core, is about how we decide what counts as evidence. How we translate human experience into numbers. How we agree, collectively, that a score means something rather than nothing. That work has always required judgment. It has never been purely technical.

Measurement as a Way of Seeing

We’re surrounded by metrics now, so the basic idea feels familiar. Social media platforms track engagement. Clinics track outcomes. Health services track utilisation. Each metric depends on a set of decisions about what to count, how to count it, and how much weight to give each component.

Change the definition, and the number changes.

Change the weights, and the story shifts.

Nothing about this is mysterious. It’s how measurement works.

Psychometrics is simply the disciplined study of that process in psychological domains: how constructs are operationalised, how reliability is established, how validity claims are justified, and how differences are interpreted.

For clinicians, this is well-trodden ground.

What’s easier to forget is that measurement is also a way of seeing people. And ways of seeing always reflect the values of their time.

A History We Inherited

Here’s the part that still stings, even if you already know it.

Early psychometrics did not emerge in a vacuum. Intelligence testing, in particular, developed alongside explicit efforts to classify, rank, and sort human beings in ways that aligned with eugenic thinking. Measurement was often used to justify conclusions that had been decided in advance, rather than to explore uncertainty (Wijsen, Borsboom, & Alexandrova, 2022).

This history is not incidental. It shaped what was measured, how differences were interpreted, and which outcomes were treated as desirable.

Later, testing was taken up for other purposes. Military placement. Educational sorting. Administrative efficiency. Some of those uses were pragmatic. Some were well intentioned. Many carried unintended consequences.

As Shepard (2024) notes, the concern is not simply that testing was misused in the past, but that historical patterns of use continue to shape contemporary policy and practice when context is ignored.

Values have always been present in psychometrics. They just weren’t always named.

The Long Echo

It would be comforting to say that the problematic uses of psychometric testing are firmly behind us. That history belongs to another era.

The reality is more complicated.

Contemporary measures are more sophisticated. Our psychometric standards are stronger. Our ethical frameworks are clearer. And yet, the basic structure remains the same: we measure differences, compare people to reference groups, and make decisions based on those comparisons.

The risk identified in both philosophical and historical analyses of psychometrics is not wholesale repetition, but quiet drift. When populations are treated as interchangeable, when context drops out, when scores are allowed to stand in for understanding, older patterns can resurface.

Not with malice. With efficiency.

Living with the Tension

I find myself returning to this history not because it makes me like psychometrics less, but because it makes me take it more seriously.

I still love measurement theory. I still enjoy arguing about reliability, validity, and construct definition. I still find the mathematics elegant. There is something deeply satisfying about trying to measure psychological phenomena well, knowing how difficult that task actually is.

What has shifted, over time, is not my commitment to psychometrics, but my relationship to it.

There’s a kind of adult affection that comes from seeing a field clearly, limitations and all. From recognising that the tools we inherit carry memory, whether we attend to it or not.

A Quiet Responsibility

For clinicians, this history doesn’t resolve into a single response. It tends to linger, quietly shaping how measures are held.

It shows up in questions we ask without always naming them:

What assumptions are embedded here?

Who was this measure built for?

What might it miss?

As both Wijsen et al. (2022) and Shepard (2024) argue in different ways, historical awareness doesn’t weaken psychometrics. It strengthens it by keeping interpretive choices visible rather than implicit.

That tension isn’t something to resolve once and for all. It’s something to live with.

Closing Reflection

Revisiting the history of psychometrics has been less like uncovering a scandal and more like listening to an old echo. Familiar, a little unsettling, and instructive if you don’t rush past it.

The field doesn’t need to be redeemed.

It needs to be remembered.

And perhaps that remembering is part of what it means to practise measurement with care.

References

Shepard, L. A. (2024). What should psychometricians know about the history of testing and testing policy? Educational Measurement: Issues and Practice, 43(4), 46–61. https://doi.org/10.1111/emip.12650

Wijsen, L. D., Borsboom, D., & Alexandrova, A. (2022). Values in psychometrics. Perspectives on Psychological Science, 17(3), 788–804. https://doi.org/10.1177/17456916211014183
