Tag Archives: RCT

Significantly p’d

I may be a pee scientist, but today is brought to you by the letter “P” – not the product.  “P” is something all journalists, lay readers of science articles, teachers, medical practitioners, and scientists should know about.  Alas, in my experience many don’t, and as a consequence “P” is abused.  Hence this post.  Even more abused is the word “significant” often associated with P; more about that later.

P is short for probability.  Stop! – don’t stop reading just because statistics was a bit boring at school; understanding may be the difference between saving lives and losing them.  If nothing so dramatic, it may save you from making a fool of yourself.

P is a probability.  It is normally reported as a fraction (eg 0.03) rather than a percentage (3%).  You will be familiar with it from tossing a coin.  You know there is a 50%, or one half, or 0.5 chance of obtaining heads with any one toss.  If you work out all the possible combinations of two tosses then you will see that there are four possibilities, one of which is two heads in a row.  So the prior (to tossing) probability of two heads in a row is 1 out of 4, or P=0.25.  You will see P in press releases from research institutes, blog posts, abstracts, and research articles – this from today:
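The two-toss arithmetic can be checked by brute force.  A quick sketch in Python (my illustration, not anything from the studies discussed here) that simply enumerates every outcome:

```python
from itertools import product

# Every possible sequence of two coin tosses
outcomes = list(product("HT", repeat=2))
print(outcomes)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

# Prior probability of two heads in a row: 1 favourable outcome out of 4
p_two_heads = outcomes.count(("H", "H")) / len(outcomes)
print(p_two_heads)  # 0.25
```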

“..there was significant improvement in sexual desire among those on  testosterone (P=0.05)” [link]

So, P is easy, but interpreting P depends on the context.  This is hugely important.  What I am going to concentrate on is the typical medical study that is reported.  There is also a lesson for the classroom.

One kind of study reporting a P value is a trial where one group of patients is compared with another.  Usually one group has received an intervention (eg a new drug) and the other receives regular treatment or a placebo (eg a sugar pill).  If the study is done properly a primary outcome should have been decided beforehand.  The primary outcome must measure something – perhaps the number of deaths in a one year period, or the mean change in concentration of a particular protein in the blood.  The primary outcome is how what is measured differs between the group getting the new intervention and the group not getting it.  Associated with it is a P value, eg:

“CoQ10 treated patients had significantly lower cardiovascular mortality (p=0.02)” [link]

To interpret the P we must first understand what the study was about and, in particular, understand the “null hypothesis.”  The null hypothesis is simply the idea the study was trying to test (the hypothesis) expressed in a particular way.  In this case, the idea is that CoQ10 may reduce the risk of cardiovascular mortality.  Expressed as a null hypothesis we don’t assume that it could only decrease rates; we allow for the possibility that it may increase them as well (this does happen in some trials!).  So, we express the hypothesis in a neutral fashion.  Here that would be something like: the risk of cardiovascular death is the same in the population of patients who take CoQ10 as in the population which does not.  If we think about it for a minute, if the proportion of patients who died of a cardiovascular event was exactly the same in the two groups then the risk ratio (the CoQ10 group proportion divided by the non-CoQ10 group proportion) would be exactly 1.  The P value, then, answers the question:

If the null hypothesis were true (the risk of cardiovascular death really is the same in both groups), what is the probability (ie P) that the measured risk ratio would differ from 1 by as much as was observed, simply by chance?

The “by chance” is because when the patients were selected for the trial there is a chance that they don’t fairly represent the true population of every patient in the world (with whatever condition is being studied) either in their basic characteristics or their reaction to the treatment. Because not every patient in the population can be studied, a sample must be taken.  We hope that it is “random” and representative, but it is not always.  For teachers, you may like to do the lesson at the bottom of the page to explain this to children.  Back to our example, some numbers may help.

Suppose we have 1000 patients receiving Drug X and 2000 receiving a placebo.  If, say, 100 patients in the Drug X group die in 1 year, then the risk of dying in 1 year is 100/1000 or 0.1 (or 10%).  If in the placebo group 500 patients die in 1 year, then the risk is 500/2000 or 0.25 (25%).  The risk ratio is 0.1/0.25 = 0.4.  The difference between this and 1 is 0.6.  What is the probability that we arrived at a difference this large simply by chance?  I did the calculation and got P<0.0001.  This means there is less than a 1 in 10,000 chance that a difference this large would arise by chance alone.  Another way of thinking about this: if we did the study 10,000 times, and the null hypothesis were true, we’d expect to see a result like the one we saw about one time.

What is crucial to realise is that the P value depends on the number of subjects in each group.  If instead of 1000 and 2000 patients we had 10 and 20, and instead of 100 and 500 deaths we had 1 and 5, then the risks and the risk ratio would be exactly the same, but the P value would be 0.63 – a 63% chance of observing a difference at least as large as the one we observed, even if there is really no difference at all.  If studies are reported without P values then at best take them with a grain of salt.  Better, ignore them totally.
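The post doesn’t say which test produced these numbers, but a two-sided Fisher’s exact test reproduces the P of 0.63 for the small version of the trial.  A sketch using only the Python standard library (the function name and layout are mine):

```python
from math import comb

def fisher_exact_two_sided(deaths_a, alive_a, deaths_b, alive_b):
    """Two-sided Fisher's exact test on a 2x2 table: sums the
    probabilities of every table with the same margins that is no
    more likely than the one observed."""
    n_a = deaths_a + alive_a           # size of group A
    n_b = deaths_b + alive_b           # size of group B
    k_total = deaths_a + deaths_b      # total deaths across both groups
    n = n_a + n_b

    def p_table(k):
        # Hypergeometric probability that k of the deaths fall in group A
        return comb(k_total, k) * comb(n - k_total, n_a - k) / comb(n, n_a)

    p_obs = p_table(deaths_a)
    lo, hi = max(0, k_total - n_b), min(k_total, n_a)
    probs = [p_table(k) for k in range(lo, hi + 1)]
    # Tiny tolerance guards against floating-point ties
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))

# The small trial from the text: 1/10 deaths on Drug X vs 5/20 on placebo
print(round(fisher_exact_two_sided(1, 9, 5, 15), 2))   # → 0.63
```

Running the same function on the large version of the trial (100/1000 deaths vs 500/2000) gives a P far below 0.0001, matching the text: same risk ratio, very different P, purely because of the group sizes.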

It is also important to realise that within any one study, if lots of things are measured and compared between the two groups, then simply because of random sampling (ie by chance) some of the P values will be low.  This leads me to my next point…
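The arithmetic behind that point is worth a line.  If a study measures, say, 20 independent things and none of the effects are real, the chance that at least one comparison still comes out “significant” at P<0.05 is already better than even (the 20 is my hypothetical, not a figure from any study quoted here):

```python
# Chance of at least one "significant" result among m independent
# null comparisons, each tested at the 0.05 level
m = 20
p_at_least_one = 1 - 0.95 ** m
print(round(p_at_least_one, 2))  # → 0.64
```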

The myth of significance

You will often see the word “significant” used with respect to studies, for example:

“Researchers found there was a significant increase in brain activity while talking on a hands-free device compared with the control condition.” [Link]

It is wrong to interpret this as “The increase in brain activity while talking on a hands-free device is important” or “The increase in brain activity while talking on a hands-free device is meaningful.”

“Significant” does not equal “meaningful” in this context.  All it means is that the P value under the null hypothesis is less than 0.05.  If I had my way I’d ban the word significant.  It is simply a lazy habit of researchers to use this shorthand for p<0.05.  It has come about simply because someone somewhere started doing it (and called it “significance testing”) and the sheep have followed.  As I say to my students, “Simply state the P value; that has meaning.”*



For the teachers

Materials needed:

  • Coins
  • Paper
  • The ability to count and divide

Ask the children what the chances of getting “Heads” are.  Have a discussion and try to get them to think that there are two possible outcomes, each equally probable.

Get each child to toss their coin 4 times and get them to write down whether they got a head or tail each time.

Collate the number of heads in a table like this:

#heads             #children getting this number of heads
0                      ?
1                      ?
2                      ?
3                      ?
4                      ?

If your class size is 24 or larger then you may well have someone with 4 heads or 0 (4 tails).
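If you want to check that claim before the lesson, a quick simulation (the class size of 24 is from the text above; the number of simulated classrooms is my choice) suggests roughly 96% of classes of 24 will contain at least one child with all heads or all tails, in line with the theoretical 1 − (7/8)^24:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def class_has_extreme(n_children=24, tosses=4):
    """Simulate one classroom: does any child toss all heads or all tails?"""
    for _ in range(n_children):
        heads = sum(random.randint(0, 1) for _ in range(tosses))
        if heads in (0, tosses):
            return True
    return False

trials = 10_000
share = sum(class_has_extreme() for _ in range(trials)) / trials
print(round(share, 2))  # close to 1 - (7/8)**24 ≈ 0.96
```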

Ask the children whether they think this is amazing or just accidental.

Then, get the children to continue tossing their coins until they get either 4 heads or 4 tails in a row.  Perhaps make it a competition to see how fast they can get there.  They need to continue to write down each head and tail.

You may then get them to add up all their heads and all their tails.  Now work out the proportions (get them to divide the number of heads by the total number of tosses).  If you like, go one step further and collate all the data for the whole class.  The proportion of heads should be approaching 0.5.

Discuss the idea that getting 4 heads or 4 tails in a row was simply due to chance (randomness).

For more advanced classes, you may talk about statistics in medicine and in the media.  You may want to use some specific examples of one-off trials that appeared to show a difference, but when repeated later the difference was found to be accidental.


*For the pedantic.  In a controlled trial the numbers in the trial are selected on the basis of pre-specifying a (hopefully) meaningful difference in the outcome between the case and control arms and probabilities of Type I (alpha) and Type II (beta) errors.  The alpha is often 0.05.  In this specific situation, if P<0.05 then it may be reasonable to talk about a significant difference, because the alpha was pre-specified and used to calculate the number of participants in the study.

Faith justified? – a vital tale

Expensive pee or elixir of life?  The two extreme views of multivitamins.  I’ve been taking multivitamins for a number of years now.  I’ve taken them on faith backed by a little evidence.  This week, I think for the first time, a randomised controlled trial has provided high quality evidence that my faith is justified.  More on that in a minute.

Most trials of vitamin supplements to date have tested vitamins in isolation.  The trials were justified on the observation that people with certain diseases lacked specific vitamins and/or the scientists’ understanding of biochemical pathways that require the vitamin in question to work well.  This is well and good.  From what I understand most of these trials have failed to show a clinical difference (ie in health outcomes) (see, eg, my report on the Vitamin D trial in Christchurch).

Vitamins (and trace minerals), of course, do not exist in us in isolation.  They work together with each other and along with all the other chemicals in us with names that only a biochemist could love.  The theory, which I’ve accepted largely by faith, is that vitamin supplementation works best when it is multiple vitamins together.  Studies of multivitamin supplementation have largely been short term or retrospective observational.  That is, scientists have surveyed people on vitamin use and drawn conclusions based on that.  One such study, the Iowa Women’s Health Study (1), caused me to pause and reassess last year when it seemed to indicate that supplementation including copper increased mortality in post-menopausal women.  Being neither a woman nor post-menopausal I did not panic.

The prospective randomised controlled trial (RCT) is regarded as a much higher level of evidence than the retrospective observational study.  Published this week in the Journal of the American Medical Association (JAMA) is an RCT of multivitamin supplementation in men (2).  Briefly, 14641 men aged 50+ were enrolled in a trial in 1997 and followed until 2011.  Participants were randomly chosen to receive either a multivitamin or a placebo.  Neither the participants nor the people running the study knew who received placebo and who received multivitamin.  This is known as “double-blind.”  Only a statistician knew, and he or she did not reveal anything until all the data were in.  The primary outcome was to compare the rates of cancer and cardiovascular disease between the two groups.  Secondary outcomes (ie ones about which the statistics cannot be so precise, because of the smaller numbers) were the rates of some specific cancers (eg prostate cancer).  Amongst the 14641 men was a subgroup of about 1300 with a pre-existing history of cancer.

The results:

Men taking multivitamins had a modest reduction in total cancer incidence (HR, 0.92; 95% CI, 0.86- 0.998; P = .04)

My interpretation:  Those taking multivitamins were about 8% less likely to get cancer.  The statistics show that we can be 95% confident that, amongst all men with the same characteristics as the men in their sample, the true reduction in the probability of getting cancer over the 11 year follow-up period is between 0.2% and 14%.
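The percentages in that interpretation come straight from the quoted hazard ratio and its confidence bounds; the arithmetic, sketched out:

```python
# Figures quoted from the JAMA abstract
hr, ci_low, ci_high = 0.92, 0.86, 0.998

point = round((1 - hr) * 100, 1)       # central estimate of the reduction
best = round((1 - ci_low) * 100, 1)    # most optimistic end of the 95% CI
worst = round((1 - ci_high) * 100, 1)  # least optimistic end

print(point, best, worst)  # 8.0 14.0 0.2
```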

A little frighteningly, whilst major cardiovascular events were mentioned as part of the primary outcomes, they were not reported on!

The strengths of the study are its size, that it is an RCT and double-blind, that it has good length, that all participants who received the multivitamin received the same one and that the multivitamin manufacturer had no role in designing or running the study, or analysing the data.

The weaknesses are that it is all men, all over the age of 50, and all physicians.

So, is my faith justified?  If by that you think I mean “proven”, then think again.  Proof or proven are words that should never be used in the company of good scientists.  Rather, I think there is some more good quality evidence to support the taking of multivitamins – so I shall continue to do so.  I must, though, remain open to evidence of the opposite variety and be aware that, as with all studies, there is a probability that the conclusions will not be backed up by future studies.

Of course not all multivitamins are created equal (beware of fillers); they have different compositions and some are less likely to be absorbed than others, so do some homework before you rush out and buy some.

(1)  Mursu J, Robien K, Harnack LJ, Park K, Jacobs DR. Dietary supplements and mortality rate in older women: the Iowa Women’s Health Study. Arch Intern Med 2011;171(18):1625–33.

(2)  Gaziano JM, et al. Multivitamins in the Prevention of Cancer in Men: The Physicians’ Health Study II Randomized Controlled Trial. JAMA 2012;308(18):1871–80.

[Conflict of interest:  My wife’s business includes the selling of multivitamin supplements]

Vitamin D: “Silver bullet or fool’s gold?”

Vitamin D has had big raps lately.  We know that low levels of it correlate with higher levels of some diseases, but does taking a supplement help?  An article in the Herald this morning by Martin Johnson nicely outlines a study being undertaken by Professor Robert Scragg of the University of Auckland.  The quote in the title is his.

Why is there need for an expensive trial when lots of observational studies show that low levels of Vit D mean you are more likely to get cardiovascular (and other) diseases, and high levels mean you are less likely?  Isn’t it obvious that by taking supplements health outcomes will improve?  Sadly, no it isn’t.  Correlation does not mean causation (strictly “cum hoc ergo propter hoc” for you latinistas out there – a cousin of the “post hoc ergo propter hoc” I learnt from a re-run of West Wing this week).  What this means is that there is more than one possible explanation for the correlation, ie:

  1.  The illnesses arise because Vit D is an essential component in the biochemical pathways that provide a defense against them (causation), or
  2.  Low Vit D is a consequence of something else that has gone wrong which also causes the diseases (ie Vit D is a “flag” or “marker” for something else).

If 1 is true, then raising Vit D levels may help.  If 2 is true, then raising levels probably won’t help.  For the moment assume 1 is true; then the next question is “does supplementation help?”  Again, most would think “Of course.”  However, it is possible that by bypassing the mechanism by which the body makes its own Vit D (ie beginning with exposure to the sun) the body’s response to the increased Vit D is different.  These, and other reasons, are why a Randomised Controlled Trial (RCT), in which some participants get Vit D and some get a placebo (in this case sunflower lecithin), is being conducted.  There is some information about the trial in the Herald article; more can be found on the Aust NZ Clinical Trials Registry here.  Briefly, participants (50 to 84 years of age) will receive 1 capsule a month for 4 years.  The incidence rate of fatal and non-fatal cardiovascular disease is the primary outcome.  Secondary outcomes include the incidence of respiratory disease and fractures.  They need to recruit 5100 people (so get involved!).

Why so many people?  This is because they want to avoid making two kinds of mistake.  First, they want to know with high certainty that if they see a difference in the rates of cardiovascular disease between the Vit D and placebo groups, it is not a difference that occurred randomly (ie seeing a difference when there really is no difference).  It is most common to accept a 5% chance of this (for comparison, tossing 4 heads in a row is about a 6% chance).  The second mistake would be for the trial to show no difference between the groups, but for this to be a false conclusion (ie not seeing a difference when there really is a difference).  It is common to accept about a 10% chance of this happening.  Notice, I have talked about “difference”, not Vit D being “better” than placebo.  This is very important, because it is possible that Vit D is worse, and scientists must take that possibility into account.  That is why scientists also start with what we call the “null hypothesis” – the presumption, in this case, that there is no difference in the rates of cardiovascular disease between those taking Vit D and those taking placebo.
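The trial’s actual power calculation isn’t given here, but the standard two-proportion formula shows how those two accepted error rates (5% and 10%), together with the assumed event rates, drive the required numbers.  The event rates below are purely illustrative placeholders of my own, not the trial’s real assumptions:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.90):
    """Approximate participants per arm needed to detect a difference
    between two event rates (normal approximation, two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ≈ 1.96: the 5% chance of a false positive
    z_beta = z.inv_cdf(power)           # ≈ 1.28: the 10% chance of a false negative
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * var / (p1 - p2) ** 2)

# Hypothetical rates: 10% events on placebo vs 8% on Vit D
print(n_per_group(0.10, 0.08))
```

Note how the sample size blows up as the difference to be detected shrinks – halving the assumed difference roughly quadruples the numbers – which is why trials like this need thousands of participants.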

I liked the quote of Prof Scragg in the Herald:

“GPs are very supportive of it and I know they are prescribing it extensively to patients. Hospital specialists are sceptical. Me, I’m in the middle. My heart says I want it to work. My head says I have to keep an open mind.”

I, too, often find myself in the “middle” – hoping with my heart that something works for the good of all, but working with my head so that we don’t end up peddling false hope or worse.