Studies in Mediumship

As someone who spends a fair amount of time reading philosophy, I find that I have pretty strong commitments to physicalism/materialism (i.e. the physical world is all that there is, and all non-physical things can be reduced to physical objects). I’m also extremely confident that any forces that can affect objects over long distances have already been discovered (i.e. gravity, and electromagnetism), and as such whether psychic powers (of any kind) exist is an answered question: psychic powers are not a thing. Neither is speaking with the dead. Given my stance on physicalism, the concept of a consciousness surviving the death of the brain is just not at all viable.

That said, I’m open to being proven wrong. Not hugely open: anyone attempting to demonstrate the existence of psychic powers or mediumship is effectively claiming that every study to date that has found only four physical forces is in error, and that there is an additional force/energy/whatever that the lump of carbon, hydrogen and oxygen known as the human brain can access (but only in some people), while other lumps of carbon, hydrogen and oxygen cannot. In terms of proportioning the evidence to the claim, this particular claim is going to require quite the mountain of evidence.

However, before worrying about *how* these kinds of things work, first we need evidence *that* they work. Over the last week or so, I got sucked into a website that believers in mediumship frequent, and tried to chase down some evidence. Of course, many of them believe that evidence isn’t possible and that simply attesting to their abilities should be all that’s necessary (and you’re a terrible, *terrible* person if you don’t take them at their word). Many of them see ‘skeptic’ as a bad word and... frankly, I don’t blame them. There are plenty of assholes in the skeptic community who believe that it’s appropriate to abuse and bully people who are wrong in their beliefs about the world. Make no mistake: anyone working as a medium is, in my opinion, a fraud (intentional or otherwise), but abusing them isn’t going to help anyone.

One of the guys on that forum, however, pointed me at what I think is a fairly rigorous and well-thought-out study. I think that there are a couple of flaws in it, but it’s otherwise solid. Funnily enough, I think it shows exactly the opposite of what the fanboys (and the investigators) think.

The study is Anomalous Information Reception By Research Mediums Under Blinded Conditions II: Replication And Extension by J. Beischel et al., and here’s a link to the PDF (the full citation is at the bottom of the post).

I’m going to summarise and skip some of the details here, just to outline what I think is a pretty fantastic methodology: people who are allegedly mediums (the “readers”) are given the name of a deceased person (the “discarnate”, which means “a bodiless entity” or some such) chosen by the person the reading is for (the “sitter”). They are given five standardised questions to answer, followed by a ‘Free-Form’ section in which they can say whatever they want.

“The audio-recordings of the readings were then transcribed, formatted into lists of definitive statements, and blinded to remove any references to the discarnatesʼ names. They were then e-mailed to a blinded experimenter who e-mailed them to the blinded sitters for scoring. Each sitter received two blinded readings: a target reading intended for the named discarnate they chose and a decoy reading for another sitter’s discarnate.”

The experiment was run twice (I’m ignoring the exploratory run), with slightly different results. I’ve summarised them below, with some commentary.

Experiment 1

In the first experiment, the sitters were asked to estimate how accurate (as a percentage) both sections were. There’s no explanation of the directions the sitters were given, so it’s unknown how these numbers were determined. Moreover, we are not provided with the data (either the readings, or the actual facts about the “discarnates”), so it’s impossible for anyone outside of this experiment to judge the accuracy of these assessments. In and of itself, this is a serious flaw in this study.

However, let’s put that to one side, take the experiment to be methodologically flawless, and assume that the sitters are absolutely correct in their ratings.

Estimated Accuracy

It’s important to note here that the sitters were asked to estimate the accuracy of the readings in Experiment 1. Even the experimenters didn’t actually check the answers against the facts.

The accuracy of the 5 Questions section of the targeted readings was estimated to be 47.4% +/- 5.2% versus the accuracy of the decoy readings being 35.6% +/- 5.7%. The difference is not significant.
The accuracy of the Free-Form section of the targeted readings was lower at 37.0% +/- 4.8% versus the accuracy of the decoy readings being 24.5% +/- 4.9% (statistically significant).

First off, the difference in estimated accuracy between the targeted readings and the decoy readings for the 5 Questions turns out not to be statistically distinguishable from chance. Bear in mind that these questions are very specific, and there’s not a lot of wiggle room here. The 5 Questions were:

What did the discarnate look like in his/her physical life? Provide a physical description of the discarnate.
Describe the discarnateʼs personality.
What were the discarnateʼs hobbies or interests? How did she/he spend her/his time?
What was the discarnateʼs cause of death?
Does the discarnate have any specific messages for the sitter?

At best, the readers were only able to answer these questions with 50% accuracy. Bearing in mind that the readings that were about somebody else were rated at 40% accuracy (at best), one has to ask: even if mediums *really do* have magic powers, just what are they bringing to the table? If they’re equally likely to get information wrong as right, of what use are they?

Now, given how much room they have to be as vague as they like, I would have expected higher ratings of accuracy in the Free-Form section. Alas, that isn’t the case: the accuracy falls from ~47% to ~37%. So instead of being half-right, they’re two-thirds wrong. And sure, the difference between the targeted readings here and the decoys is statistically significant, but there’s a more relevant question: if I can win a bet against the house one time in three versus the usual one time in four, I’m still a crappy gambler. If that’s the best I can do, I’m clearly bad at this. Ultimately, though, even that presupposes some level of actual skill. Given the low level of accuracy here, and how close the numbers are, I don’t see any reason to rule out random chance. Sure, the values for the Free-Form section are “statistically significant” at P < 0.05, but that just means that if there were no real difference at all, a gap this large would still turn up by chance less than 5% of the time. This may well have been generated by chance, and only replication could tell us for sure.
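To make the “less than 5% of the time” point concrete, here’s a small simulation sketch. It invents a world where mediumship does nothing at all: target and decoy scores come from the same distribution, and each simulated study runs a simple z-test on the difference in mean scores. The sample size, means and spread are made-up illustrative assumptions, not the study’s data, and treating the two sets of scores as independent is a simplification (the real study compared readings rated by the same sitters).

```python
import math
import random


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def null_study(n: int, sd: float, rng: random.Random) -> float:
    """One simulated study in which target and decoy scores share the SAME distribution.

    Returns the two-sided p-value of a z-test on the difference in mean scores.
    The sd is treated as known, which keeps the sketch dependency-free.
    """
    target = [rng.gauss(30.0, sd) for _ in range(n)]  # hypothetical accuracy scores (%)
    decoy = [rng.gauss(30.0, sd) for _ in range(n)]
    diff = sum(target) / n - sum(decoy) / n
    se = sd * math.sqrt(2.0 / n)  # standard error of a difference of two independent means
    return two_sided_p_from_z(diff / se)


def false_positive_rate(trials: int = 10_000, n: int = 30, sd: float = 25.0, seed: int = 1) -> float:
    """Fraction of no-effect studies that still come out 'significant' at P < 0.05."""
    rng = random.Random(seed)
    hits = sum(null_study(n, sd, rng) < 0.05 for _ in range(trials))
    return hits / trials


print(f"no-effect studies reaching P < 0.05: {false_positive_rate():.3f}")
```

Roughly one in twenty of these do-nothing studies clears the bar, which is exactly why a single significant comparison needs replicating before anyone celebrates.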

Global Score

Additionally, the various readings were rated (overall) from 0 to 6, where 0 is that the reading contained “no correct information or communication”, and 6 is an “excellent reading, including strong aspects of communication, and with essentially no incorrect information”. The pro-mediumship folk I was discussing this with really seemed to focus in on this section as the thing that ‘proved’ mediumship, and I’m really at a loss to understand why.

Firstly, again because of the lack of data being provided, we have no way to evaluate this for ourselves.

Secondly, even if the scores are entirely accurate, they’re terrible. The targeted readings received a score of 2.78 +/- 0.26, and the decoy readings received a score of 2.04 +/- 0.26. So let’s round up and say that the targeted readings received a score of 3, which is a “mixture of correct and incorrect information, but enough correct information to indicate that communication with the discarnate occurred”.

Let’s just pause for a moment to take in just how low the bar is here: in an experiment to check whether “anomalous information reception” is a real thing, the sitters were basically invited to say “yup, this is a real thing” as part of the data collection. This is exactly where biases creep in: instead of merely judging whether the information provided is accurate, the scale creates a ‘gate’ for sitters who feel ‘this is something that’s real’. A sitter who feels that a particular reading is genuine (regardless of its “objective” accuracy) will be biased towards rating it higher than they otherwise would: bad methodology.

In any case, even though the scores for the targeted readings are statistically significantly higher than for the decoy readings, the choice is still between a set of readings that are a mix of correct and incorrect information, and readings that are mostly wrong: the gap here is tiny. This is not a result to cheer about.

Reading Choice

Finally, the sitters were asked to pick out which reading was the targeted one, and which was the decoy. They got it right 63% of the time, which was not statistically significant. So one could go on and on about how the sitters subjectively rated the targeted readings higher than the decoy readings (and the pro-mediumship folk do), but the bottom line is that the difference is so small that, when asked to tell the two readings apart, the sitters were unable to do so.

If it’s not possible to tell the difference between a real thing and a fake thing, then there’s a serious problem at hand.

Experiment 2

The initial setup for this experiment is the same as for Experiment 1: the readers do not have direct access to the sitters, 5 questions have to be answered, and there’s a Free-Form section to be filled in. The only significant change in methodology happens when we get to the sitters.

Calculated Accuracy

In this experiment, in contrast with the first, the sitters are asked to rate each individual answer from 1 to 5, where 1 is ‘this answer doesn’t fit at all’ and 5 is ‘this answer fits perfectly, no interpretation needed’. There is also a 0 to be “used if the rater does not understand the item or does not have enough information to judge its accuracy”. Now, this is problematic, because it affects the calculations for accuracy later. The experimenters anticipated this kind of criticism, however, so we’ll discuss this later.

To determine the percentage accuracy, the experimenters tallied the number of mostly-hits (answers that scored either a 4 or a 5), and divided that number by the total number of answers minus the items that scored a 0, presumably because “don’t know” could be an answer that is yet to come to pass, or impossible to verify, or some other nonsense. Of course, by excluding the items that scored a 0 rather than counting them as misses, the percentage accuracy is inflated.
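Here’s a rough sketch of that calculation, showing how much dropping the 0s can move the number. The ratings below are a hypothetical ten-item reading I made up for illustration; they are not from the study, and the function name is my own.

```python
def percent_accuracy(ratings: list[int], exclude_unknown: bool = True) -> float:
    """Share of items rated 4 or 5 ("mostly hits").

    exclude_unknown=True drops items rated 0 from the denominator, which is how
    the experimenters are described as calculating it; False counts 0s as misses.
    Assumes at least one item survives the filter (otherwise this divides by zero).
    """
    hits = sum(1 for r in ratings if r >= 4)
    considered = [r for r in ratings if r != 0] if exclude_unknown else ratings
    return hits / len(considered)


# Hypothetical ratings for a single ten-item reading (NOT data from the study).
example = [5, 4, 1, 2, 0, 0, 3, 5, 1, 0]
print(f"excluding 0s: {percent_accuracy(example):.1%}")  # 42.9%
print(f"counting 0s as misses: {percent_accuracy(example, exclude_unknown=False):.1%}")  # 30.0%
```

Same three hits either way; the only thing that changes is how many items we quietly stop counting.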

As reported by the experimenters, the accuracy of the targeted readings was significantly higher than that of the decoy readings (52.8% +/- 3.9% vs. 36.6% +/- 3.8%, P = .002). Bear in mind that this number is inflated, as discussed above. But put that to one side: at best, mediums are scoring roughly 55% accuracy, whereas a reading that is clearly not about the discarnate in question is still scoring 40%. How could it possibly be that a reading about someone who is not me is roughly 70% as accurate as a reading that is specifically about me? In short, if there is an effect here (and I dispute that there is), it’s underwhelming in the extreme.
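For anyone who wants to check the “how accurate is the wrong reading, relative to the right one?” arithmetic across both experiments, here’s a quick sketch. It only uses the mean accuracies quoted in this post (the point estimates, not the “at best” upper bounds), and the labels are mine.

```python
# Mean accuracy (%) quoted in this post as (target, decoy); point estimates only.
REPORTED = {
    "Exp 1, 5 Questions (estimated)": (47.4, 35.6),
    "Exp 1, Free-Form (estimated)": (37.0, 24.5),
    "Exp 2, calculated accuracy": (52.8, 36.6),
    "Exp 2, Free-Form": (43.0, 35.6),
}


def decoy_to_target_ratio(target: float, decoy: float) -> float:
    """How accurate the decoy reading was, as a fraction of the target reading's accuracy."""
    return decoy / target


for label, (target, decoy) in REPORTED.items():
    ratio = decoy_to_target_ratio(target, decoy)
    print(f"{label}: decoy is {ratio:.0%} as accurate as the target")
```

Using the point estimates, the Experiment 2 decoys come out at about 69% of the targets; using the “at best” figures (40.4% vs. 56.7%) it’s about 71%. Either way, the reading about the wrong person gets most of the way there.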

The Free-Form section, an area I’d expect a real medium with access to real powers to *shine*, is a complete disaster in this test: 43.0% +/- 4.8% vs. 35.6% +/- 5.5%, P = .10. For all intents and purposes, the targeted reading was essentially interchangeable with the decoy reading.

Direct Hits vs Complete Misses

Regarding the inflated numbers, the experimenters did a complicated statistical analysis (none of which we can verify given that we don’t have access to the data) to claim that if they only compared definite hits (5) with definite misses (1) there were significantly more hits than misses for the targeted readings as compared with the decoy readings. I guess we’ll just have to take their word on this, eh?

Global Score

The scores here were almost identical to (but marginally higher than) those in Experiment 1. Bear in mind that this is just a subjective evaluation of the readings. While the experimenters claim that the scores for the targeted readings were “significantly higher” than for the decoy readings, that’s only true if we’re talking about “significant” in the statistical sense. In the real-world sense, they’re kinda terrible (rating from 0-6, with 6 being the best): 2.97 +/- 0.26 vs. 2.13 +/- 0.26, P = .007.

At best, the targeted readings scored “Mixture of correct and incorrect information, but enough correct information to indicate that communication with the discarnate occurred.”, while the decoys scored “Some correct information, but not enough to suggest beyond chance that communication with the discarnate occurred”. As I mentioned above, notice that both of these answers are likely to bias a sitter who may presuppose that one of the readings is ‘real’ but the other ‘fake’.

In any case, as a subjective measure, I don’t value this particular part of the experiments much at all. Even if true, the results are terrible for mediumship.

Reading Choice

Here’s where the experimenters really want to celebrate: here in Experiment 2, a statistically significant 67.7% chose the targeted reading as ‘their’ reading over the decoy reading!

Not to burst their bubble, but this could simply be an artifact of having a slightly larger sample size. In Experiment 1, only 63% (or 17 out of 27) of the targeted readings were chosen. If one more person had chosen the right reading, it would have bumped right up to 67%.

In Experiment 2, 21 out of 31 picked the targeted readings, scoring 67.7%. If only 20 had, would it still have been statistically significant?

Let me put it another way: if your data hinges on just one participant making a single choice in your favour, you are not measuring a robust phenomenon. This is the issue with small sample sizes (and yeah, 31 readings is a small sample size): it’s prone to distortion by one or two acts of chance.
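Here’s a sketch of how fragile that is. It assumes the reading-choice result is judged with a one-sided exact binomial test against a 50% chance rate (each sitter picking one of two readings); I haven’t confirmed that this is the exact test the paper used, so treat the numbers as illustrative.

```python
from math import comb


def binomial_tail(hits: int, n: int, p: float = 0.5) -> float:
    """P(X >= hits) for X ~ Binomial(n, p): one-sided exact p-value against chance rate p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))


# (hits, sitters): the two reported results plus the "one sitter different" counterfactuals.
CASES = [(17, 27), (18, 27), (20, 31), (21, 31)]

for hits, n in CASES:
    p_value = binomial_tail(hits, n)
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{hits}/{n} = {hits / n:.1%} chose the target: p = {p_value:.3f} ({verdict} at 0.05)")
```

Under those assumptions, 21 of 31 comes out at about p = 0.035, while 20 of 31 lands near 0.075, and 18 of 27 (the same ~67% hit rate, just with four fewer sitters) only gets to about 0.061. One sitter, either way, flips the headline.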

Conclusion

I think this is actually a pretty damn good study that could use some significant tightening up should someone decide to replicate it. I think the controls are quite well arranged, and the methodology (if not the wording of the ratings) is quite solid.

Unfortunately for the experimenters, I think it shows quite the opposite of what they conclude: that mediumship isn’t a thing. The experimenters want to trumpet the subjective ratings, the “global score”, as being ‘statistically significant’, but what those global ratings say is that mediumship, if real, is terrible. Given the size of the samples and the poor word choices in the rating scales, I can’t see the difference in score between the targeted readings and the decoy readings as anything more than an artifact of the experiment: nothing real is being measured here.

How would I improve this study? For one, I’d want a minimum of 100 targeted readings. That would help account for any skewing due to small sample sizes. Secondly: why only one decoy reading? In any sequence of coin flips, one can easily find a series of multiple heads or tails. I would like to see at least 4 decoys provided for every targeted reading, and this would significantly reduce the risk of ‘false positives’ occurring.
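One way to see what extra decoys buy you: with one decoy, a sitter guessing at random picks the target half the time; with four decoys, only a fifth of the time. This sketch assumes my proposed design (100 targeted readings, each sitter picks exactly one reading from the set) and a one-sided exact binomial test; the function names are hypothetical, and it shows how the “better than chance” bar moves rather than anything the original study reported.

```python
from math import comb


def binomial_tail(hits: int, n: int, p: float) -> float:
    """P(X >= hits) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))


def min_hits_for_significance(n: int, decoys: int, alpha: float = 0.05) -> int:
    """Smallest number of correct picks out of n that a pure guesser reaches with probability <= alpha.

    With `decoys` decoys per target, a random guess picks the target with probability 1 / (decoys + 1).
    The search includes n + 1 (tail probability 0), so it always finds an answer.
    """
    chance = 1 / (decoys + 1)
    return next(h for h in range(n + 2) if binomial_tail(h, n, chance) <= alpha)


N_READINGS = 100  # the minimum suggested above
for decoys in (1, 4):
    chance = 1 / (decoys + 1)
    needed = min_hits_for_significance(N_READINGS, decoys)
    print(f"{decoys} decoy(s): chance hit rate {chance:.0%}, "
          f"{needed}/{N_READINGS} correct picks needed for p <= 0.05")
```

With a 20% baseline, a sitter who is merely guessing has far less room to stumble onto an impressive-looking hit rate, so any genuine effect would stand out much more clearly.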

Finally, I agree with the experimenters that any further studies should be done by “open-minded investigators”. Such open-mindedness would have to include, of course, the possibility of apophenia, of seeing patterns where no patterns exist. Cheating, fraud, and the rest being off the table does not entail that the only possible explanation is non-local information sources. “Open-minded” doesn’t mean “jumping to evidence-free conclusions at the earliest opportunity”, as much as the pro-mediumship crowd would like to think that it does.

On a personal note, I spent quite a bit of time amongst the pro-mediumship crowd, and I have to admit it was one of the more shitty experiences I’ve had in a long time. When, in response to your asking someone to demonstrate their abilities (and I meant in a controlled environment...), they start telling you that your dead sister misses you and loves you even if you don’t believe the person claiming to be a medium, that’s some pretty fucked up shit right there. Never mind that I don’t have a sister (living or dead), that kind of blatant attempt at emotional manipulation just isn’t ok. But these are the pieces of shit out there who claim to be mediums. /shrug

*Anomalous Information Reception By Research Mediums Under Blinded Conditions II: Replication And Extension
Julie Beischel, PhD; Mark Boccuzzi, BS; Michael Biuso, MA; and Adam J. Rock, PhD.
EXPLORE, March/April 2015, Vol. 11, No. 2, pp. 137-142

Follow Brian on Twitter!
