Statistics used in scientific studies

Websites about health, nutrition and the environment increasingly reference scientific papers, usually to lend credibility to the site and to the claims in its articles. The growth in online references has come with easy online access to the scientific papers themselves, which is great for those with the time and interest to check what may seem like questionable opinions and claims.

The problem with references is that it is not always easy to understand what they are showing us. Some things you may want to consider are:

  • The website author who referenced a paper may not even have checked that the scientific paper agreed with the point they were trying to make.
  • The website author may not have correctly understood the conclusion of the scientific paper.
  • The website author may be using statistics from the paper that were not highlighted in the paper's conclusion. This is quite valid if the author has correctly interpreted the data and associated statistics.
  • The authors of the scientific paper may have drawn a conclusion that is not warranted by their results. This may be due to bias, or a misunderstanding of the meaning of the statistics. Both of these problems are quite common.
  • The scientific paper may have set up the experiment incorrectly. This is very common as it is virtually impossible to allow for all factors that can affect results.

To give meaning to the above, here are some examples.

Measures of statistical significance

  • Relative risk - In medicine this is the ratio of the risk of a certain outcome in a group exposed to some environmental or genetic factor to the risk in a group that is not exposed. For instance, the risk of cardiovascular events for those with diabetes compared with those without. It could also apply to groups taking and not taking a drug, or those with a certain type of diet and those without. This statistic is used a lot with cohort studies. Always remember that a relative risk only matters in practice if the absolute risk is high enough to care about. For example, if a golfer is 3 times more likely to be struck by lightning than a non-golfer, does that really matter if the chance is 1 in a million over a 10 year period?
  • Odds ratio - This is the odds of an event occurring in one group divided by the odds of it occurring in another, where the odds are the number of times the event occurs divided by the number of times it does not. It relies on a dichotomous classification, e.g. men versus women or sample group versus controls. It can also be used when a group is split into more than two, such as tertiles. It then shows the relative chance of a parameter being seen in each of the three groups, usually compared against one of them as a reference. An odds ratio of 1 shows equal odds of an event in either group. Greater than 1 means the event is more likely in the first group, and less than 1 means it is less likely. This statistic is mostly used with case-control studies. Descriptive cross-sectional studies may also use it.
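The two ratios above can be worked through with a small sketch. The function names and all of the counts are invented for illustration: the lightning figures simply assume 3 strikes per million golfers against 1 per million non-golfers, which is one way to read the example in the text, and the second set of counts is a made-up common outcome.

```python
# Illustrative 2x2 calculations. All counts are invented for the example.

def risk(events, total):
    """Proportion of a group that experienced the event."""
    return events / total

def relative_risk(exp_events, exp_total, ctl_events, ctl_total):
    """Risk in the exposed group divided by risk in the comparison group."""
    return risk(exp_events, exp_total) / risk(ctl_events, ctl_total)

def absolute_risk_difference(exp_events, exp_total, ctl_events, ctl_total):
    """Extra risk carried by the exposed group, as a plain difference."""
    return risk(exp_events, exp_total) - risk(ctl_events, ctl_total)

def odds(events, total):
    """Events divided by non-events."""
    return events / (total - events)

def odds_ratio(exp_events, exp_total, ctl_events, ctl_total):
    """Odds in the exposed group divided by odds in the comparison group."""
    return odds(exp_events, exp_total) / odds(ctl_events, ctl_total)

# Hypothetical lightning example: 3 strikes per million golfers,
# 1 strike per million non-golfers over the same period.
rr = relative_risk(3, 1_000_000, 1, 1_000_000)
diff = absolute_risk_difference(3, 1_000_000, 1, 1_000_000)
print(f"lightning relative risk: {rr:.1f}")
print(f"lightning absolute difference: {diff * 1_000_000:.1f} per million")

# Hypothetical common outcome: 20 of 100 exposed people, 10 of 100 unexposed.
print(f"common outcome relative risk: {relative_risk(20, 100, 10, 100):.2f}")
print(f"common outcome odds ratio: {odds_ratio(20, 100, 10, 100):.2f}")
```

With these invented counts the golfers' relative risk is 3 while the absolute difference is only 2 in a million. For the common outcome the relative risk is 2.0 and the odds ratio is 2.25, which shows that the two statistics are close only when the event is rare.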
  • The normal distribution, or bell-shaped curve, is used a lot in scientific studies. It comes with its own set of maths, but is always based on a few assumptions, which sometimes are not correct. However, when you take repeated samples from a large population, the averages of those samples for a given measurement, say the age of death, will approximately follow a normal distribution (this is the central limit theorem). That is not to say that the age of death itself follows the normal distribution, but the average ages of death from a number of samples will form a normal distribution centred on the population's average age of death. As a rule of thumb, a sample size of 30 is considered enough to justify using this normal pattern.
  • The t-distribution is similar to the normal distribution, but has a lower peak and heavier tails on either side. Why? If you were taking small samples of, say, 3, you would have a higher chance of picking 3 outlying results all on one side of the average. Say your random sample of 3 people who died in the UK last year just happened to include only people over 100. It is clearly possible. If your sample size was 30, it would be extremely unlikely.
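The behaviour of sample averages described in the last two points can be checked with a small simulation. This is only a sketch under stated assumptions: the "population" is a made-up skewed set of numbers (not real mortality data), the seed and sizes are arbitrary, and `sample_means` is a hypothetical helper name.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, rng):
    """Average of each of n_samples random samples drawn from the population."""
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

rng = random.Random(42)  # fixed seed so repeated runs give the same numbers

# Made-up, strongly skewed population with an average of about 10.
population = [rng.expovariate(1 / 10) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

results = {}
for n in (3, 30):
    means = sample_means(population, n, 2_000, rng)
    # Spread of the sample averages around the population average.
    spread = statistics.pstdev(means)
    # Share of sample averages more than one population standard
    # deviation away from the population average.
    far = sum(abs(m - pop_mean) > pop_sd for m in means) / len(means)
    results[n] = {"spread": spread, "far": far}
    print(f"n={n}: spread of averages={spread:.2f}, far-off averages={far:.3%}")
```

Even though the individual values are far from bell-shaped, the averages of samples of 30 cluster much more tightly around the population average than the averages of samples of 3, and far-off averages become very rare. That extra chance of an extreme result in small samples is what the heavier tails of the t-distribution allow for.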
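  • p-values - A p-value is the probability of getting a result at least as extreme as the one observed if there were really no effect (the null hypothesis). Studies typically use thresholds of p<0.05 or p<0.01, meaning such a result would arise by chance alone less than 5% or 1% of the time; these correspond to confidence levels of 95% and 99%. It is not the probability that the result is correct. The significance level should be selected before carrying out a test/study, in order to prevent the results influencing the threshold that the study authors select. If you inspect the normal curve below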