Statistical fallacies in orthopedic research

Background: A large number of statistical fallacies occur in medical research literature. These are mostly inadvertent and occur due to lack of understanding of statistical concepts and terminology. Many researchers do not fully appreciate the consequences of such fallacies for the credibility of their reports. Materials and Methods: This article provides a general review of the issues that could give rise to statistical fallacies, with a focus on orthopedic research. Some of this is based on real-life literature and some on the actual experiences of the author in dealing with medical research over the past three decades. The text is in teaching mode rather than research mode. Statistical fallacies occur due to an inadequate sample that is used for a generalized conclusion; incomparable groups presented as comparable; mixing of two or more distinct groups that in fact require separate consideration; misuse of percentages, means and graphs; incomplete reporting that suppresses facts; ignoring reality and depending instead on oversimplification; forgetting baseline values that affect the outcome; misuse of computer packages and use of a black-box approach; misuse of P-values that compromises conclusions; confusing correlation with cause-effect; and interpreting statistical significance as medical significance. Mere awareness of the situations where statistical fallacies can occur may be adequate for researchers to sit up and take note while trying to provide a credible report.
Key words: differential analysis, P-values, statistical fallacies

Fallacies are anomalies that considerably reduce the credibility of a report. Statistical fallacies are common in medical research literature. This article enumerates a large number of such fallacies, particularly in orthopedic research, in the hope of creating awareness about situations where these can occur. Such awareness by itself may be enough for a researcher to try to produce a more credible report.
Statistics show that more people die in hospital than at home. Also, there is a strong association between dying and being in bed. The nonsense of such associations is apparent. No one would advocate avoiding a hospital or bed to prolong life! These might seem like extreme [...] are unintentionally misinterpreted due to lack of comprehension. The fault in either case cannot be ascribed to statistics. It lies with the user. The difficulty, however, is that the adverse effects of wrong biostatistical methods are slow to surface. And that makes these methods even more vulnerable to wrong uses.

PROBLEMS WITH THE SAMPLE
The subjects of investigation should provide sufficient and valid data to take a decision one way or the other. Sometimes this does not happen for a variety of reasons.
A sample is considered unbiased when it truly represents the target population. A frequent source of error in statistical conclusions is a biased sample. This can happen even when the selection is random or even when random allocation is made in experimental studies. A large sample tends to magnify these errors rather than to control them.
Survivors: Consider the relationship of bone mineral density (BMD) with age in women after the age of 60 years. It is well known that this density declines due to hormonal changes but the gradient differs from population to population. How do we find the exact effect of age in a particular population? Those with lower density are likely to be in poor health and thus may have less longevity. Any estimation based on survivors is bound to be biased in this situation. It may not be easy to find the correct gradient in this case.
Similarly, a study based only on hospital cases will exclude those who die before admission. Serious cases, those residing in remote areas and those who are too poor to afford hospitalization tend to be excluded. Similar bias occurs in a variety of other situations where the study is on prevalent cases rather than on incident cases. Both these fallacies may have occurred in investigating age influence on the ability of femoral BMD to predict hip fracture where the subjects were women aged 75 and over. 2
Indrayan: Statistical fallacies
Volunteers: Early phases of clinical trials are in any case done on volunteers. Volunteers tend to be very different from the general class of subjects. Many of them are either hopeless terminal cases or are subjects with exceptional courage. Both affect the response. Notwithstanding this limitation, volunteer studies have a definite place in medicine as they do provide important clues on the toxicity of the regimen under test, the dose level that can be tolerated and the potential for further testing of the modality.
Medical ethics require that the subject's consent must be taken for the research. Real consent, after fully explaining the underlying uncertainties, is difficult to obtain. Many researchers in developing countries just get the consent form signed, which a sick person may do under the force of circumstances, without properly understanding the consequences. When real consent is taken, the subjects again self-select themselves and the sample may become biased. Some of this is eliminated if and when a random allocation strategy is adopted but that is feasible only for trials. Even then generalizability suffers. Conclusions based on all such research can be misleading. Not many researchers appreciate this fallacy and tend to wonder why their results do not apply to the practical situation.
Clinic subjects: Clinic and hospital subjects too form a biased sample, as they tend to have a more severe form of disease and include mostly those who can afford these services. Mild cases tend to ignore their condition or self-treat. An interesting example is migraine, which was once believed to be more common in the intelligent professional class. 3 An epidemiological study on a random sample could not substantiate such a relationship when the role of confounding factors was eliminated. Those in the more intelligent professional class perhaps seek medical assistance in the early phases of the disease and with greater frequency than their nonprofessional counterparts. Despite such limitations, clinic-based studies do give important information on the presenting symptoms, their correspondence with laboratory and radiological findings, response to various therapeutic procedures, prognostic features, etc. But the results are seldom applicable to the type of cases that do not show up in clinics. It is sometimes believed that consecutive cases coming to a clinic would be free from bias. Such cases could be truly representative of clinic subjects, but the general bias in all clinic-based subjects still remains.
Inadequate size of the sample: Statisticians are notorious for advising large samples. The number of subjects in a study should be adequate to generate sufficient confidence in the results. Considering this aspect, a larger sample is not harmful but a small sample can be a waste of resources. A small sample may fail to detect the differences really present in the target population. There is a growing concern among the medical community regarding the failure of many randomized controlled trials (RCTs) in detecting a clinically relevant difference because they do not have sufficient power. The driving force for power is the size of the trial and the inter-individual variability. In only 3% of studies published in the year 1997 in American and British volumes of Clinical Orthopedics and Related Research, the power was adequate to detect a small difference. 4
In addition, a small sample has a high likelihood of being not fully representative and thus of producing biased results. Statistical methods have an inbuilt provision to take care of the larger sampling fluctuations in small samples but they are not equipped to take care of the lack-of-representativeness bias that is more likely to creep into small samples.
In a rare situation, an exceedingly large sample could also be a problem. Schnitzer et al 5 studied the prescribing pattern for rofecoxib and celecoxib in 47,935 patients with osteoarthritis and 10,639 patients with rheumatoid arthritis. The objective was to find the most frequently prescribed dose. They realized the futility of the statistical test of significance since clinically unimportant small differences would turn out to be statistically highly significant for such a large sample. Despite such a big sample, their conclusion had limitations due to lack of clinical information, inability to ascertain actual use and potential for selection bias.
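The dependence of power on trial size described above can be illustrated with a rough calculation. The sketch below uses the normal approximation for a two-sided comparison of two group means; the function name, the standardized effect size of 0.2 (a "small" difference) and the group sizes are illustrative assumptions, not values from any of the cited studies.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two group means.

    effect_size is the true difference divided by the common SD
    (an assumed quantity). The normal approximation is used, so the
    result is slightly optimistic for small groups.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability of rejecting in either tail when the difference is real.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical trial sizes for a small standardized difference of 0.2.
for n in (50, 400):
    print(n, "per group -> power", round(approx_power(0.2, n), 2))
```

Under these assumptions, 50 subjects per group give a power of about 17%, whereas roughly 400 per group are needed to reach about 80%. This is why a small trial can easily miss a difference that is really present.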
Nonresponse: Statistical nonresponse occurs when some cases are lost to follow-up. A large nonresponse can render an otherwise good sample ineffective in drawing a valid conclusion, not only because of the bias it tends to introduce but also because it considerably reduces the size of the [...]
Publication bias: Meta-analysis, which tends to provide more valid and reliable conclusions by pooling results from different studies, is surely an accepted statistical technique. However, two points deserve attention in this regard. First, the literature is generally unduly loaded with "positive" results. Negative or indifferent results are either not sent for publication or are not published by the journals with the same frequency. Any conclusion based on commonality in publications thus can magnify this bias.
Second, it must be realized that many consistent results increase confidence but do not necessarily provide confirmatory evidence. A single contradictory finding with credence can put a question mark on the whole conclusion.

Incomparable groups
If sufficient care is not exercised with regard to various features of the groups under comparison, fallacies can easily occur. For comparability, the control should undergo the same clinical maneuvers, such as transfer from one unit to another and duration and frequency of examination, which are used for the treatment group. This is what makes blinding so important. If a group of women receiving arthroscopic surgery of the knee is being observed for deep venous thrombosis, the same set of observations should be done on the women with other knee surgeries for a valid conclusion regarding excess incidence in the arthroscopic group. In practice, the group of women with other knee surgeries may not be observed for thrombosis with similar attention.
Differential in group composition: Matching or randomization is advised as a strategy to minimize bias in results based on comparison of groups. Comparability should not be merely for age and gender but also for all those prognostic factors that might possibly affect the outcome. These include severity of disease, coexisting diseases, care of the subjects, etc. This equivalence is difficult to achieve in practice. Also, all prognostic factors may not be fully known. Randomization is a good strategy to average out the factors and to obtain two equivalent groups, and it works well in the long run. It may fail in a particular study. It is necessary that similarity of groups is checked as a post hoc procedure even when the groups are formed by genuine randomization. This similarity has to be cautiously interpreted because a large difference could be statistically not significant if the sample size is small.
In a trial for performance of a new design trochanteric Gamma nail in comparison to compression hip screws in trochanteric fractures, 7 patients were randomized but no specific matching was done for the number of stable and unstable fractures. The study did find that postoperative walking ability was better in patients with unstable fractures when treated with the nail. Perhaps such a trial should be carried out separately for patients with stable and unstable fractures. In this study, they were mixed and separated afterwards for the purpose of analysis, which might have affected the validity of the conclusion.
Differential definitions: In a study on the trend in contribution of osteoporosis to total morbidity over the past 50-60 years, an important factor is the change in the definition and detection of osteoporosis from 1950 to today. The disease is much better understood now than 50 years ago. The diagnostic procedures have greatly improved after the emergence of DEXA and the awareness is high. Thus, the detection rate has increased. This will surely affect the trend.
Defining osteoporosis as BMD at least 2.5SD below the normal for 25-year-old females 8 implies under Gaussian conditions that six in 1000 'healthy' 25-year-old females would be osteoporotic! In fact, the number may be much smaller since they are healthy. Such an anomaly exists with all statistical definitions that identify the level below which (or above which) the risk dramatically increases. Most medical measurements follow a continuum and a valid cutoff is difficult to identify. Thus fallacies occur and they are tolerated.
Differential compliance: In a clinical trial setup, subjects in the treatment group may drop out more because of discomfort or poor taste of the drug, even when the placebo looks like the drug and the trial is randomized. On the other hand, the subjects in the treatment group may stay if they see improvement in their condition whereas the placebo group can become noncompliant. The compliance rate in this case is related to the efficacy of the regimen and the comparison can be jeopardized. Diabetes and gallstones may appear to be associated because diabetics are regularly checked for gallstones but the nondiabetics are rarely checked.
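The figure of six in 1000 quoted above for the 2.5SD definition of osteoporosis follows directly from the Gaussian distribution. A minimal check, using only the standard normal cumulative distribution (the function name is an illustrative choice):

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard Gaussian distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Proportion of a Gaussian reference population lying 2.5 SD or more below its mean.
p_below = normal_cdf(-2.5)
per_thousand = 1000 * p_below
print(f"P(Z <= -2.5) = {p_below:.4f}, i.e. about {per_thousand:.0f} per 1000")
```

The same calculation applies to any cutoff defined as a multiple of the SD, which is why every such statistical definition labels a fixed fraction of the reference population as abnormal.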
Improper denominator: If the distribution of ankylosing spondylosis in different blood groups is O, 45%; A, 35%; B, 15%; and AB, 5%, it would be naive to conclude that this disease occurs more commonly in subjects with blood group O. These percentages only describe how the cases are distributed over the blood groups; how common the disease is in a blood group can be judged only against the number of subjects with that blood group in the population.
Another kind of problem with the denominator occurs when organs or episodes are counted instead of individuals. Modern medicine indeed seems to have fragmented the human being into a conglomerate of different parts: organs, tissues, etc. An orthopedic surgeon may count the average number of vertebrae in a group of cases of spinal tuberculosis 9 or number of knees 10 or number of replaced hips. 11 Although this might be adequate in some instances, it may be inadequate to assess the magnitude of the problem. Organs belonging to the same person are not statistically independent in the way that organs of different individuals are.
A similar problem arises when a subject is repeatedly counted in case of recurrent episodes. This can happen, for example, for low back pain. 12 Recurrent attacks tend to have the same origin. Extra caution may be required in stating and interpreting results based on such repeated counts of individuals.

Mixing of distinct groups
Mixing of distinct groups can give fallacious results in some situations. Two are illustrated next.
Effect on regression: An annoying feature of regression (and correlation) is that it can be influenced by a single [...]
Mixing of two distinct groups can give a false sense of regression. Consider a survey on boys of Grade IX belonging to three randomly selected schools from each of two strata: schools in slums catering to a low socioeconomic (SE) class and schools in posh localities catering to an upper class. The objective was to find health correlates. Among the measurements made were body mass index (BMI) and BMD in the left wrist. When these are plotted, a scatter of the type shown in Figure 1b is obtained. This contains two distinct clusters of points, the lower left belonging to the low SE class and the upper right belonging to the upper SE class.
The points within each cluster are randomly scattered and indicate no relationship whatsoever between BMD and BMI in either class. That is, there is no evidence that a high BMI in either class is accompanied by a high value of BMD. When the data for the two SE classes are mixed, a distinct relationship emerges. But that is false because neither group has that kind of relationship, unless one wants to draw a composite conclusion for the two groups combined.
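The effect of mixing the two SE classes can be reproduced with simulated data. In the sketch below all numbers (group means, SDs, sample sizes, the random seed) are invented for illustration and are not the survey values; BMI and BMD are generated independently within each class, so any correlation in the pooled data arises purely from mixing.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_class(rng, n, bmi_mean, bmd_mean):
    """BMI and BMD drawn independently: no relationship within the class."""
    bmi = [rng.gauss(bmi_mean, 1.0) for _ in range(n)]
    bmd = [rng.gauss(bmd_mean, 0.02) for _ in range(n)]
    return bmi, bmd


rng = random.Random(1)
low_bmi, low_bmd = simulate_class(rng, 200, bmi_mean=17.0, bmd_mean=0.30)
high_bmi, high_bmd = simulate_class(rng, 200, bmi_mean=23.0, bmd_mean=0.40)

r_low = pearson_r(low_bmi, low_bmd)
r_high = pearson_r(high_bmi, high_bmd)
r_mixed = pearson_r(low_bmi + high_bmi, low_bmd + high_bmd)

print(f"low SE class:  r = {r_low:.2f}")
print(f"high SE class: r = {r_high:.2f}")
print(f"both mixed:    r = {r_mixed:.2f}")
```

The within-class correlations stay close to zero while the pooled correlation is strongly positive, because the pooled coefficient mostly reflects the difference between the two cluster centers.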

ERRORS IN PRESENTATION OF FINDINGS
Out of ignorance or deliberately, the presentation of data in medical reports sometimes lacks propriety. This can happen in a variety of ways.

Misuse of percentages
Percentages can mislead if calculations are (a) based on small n or (b) based on an inappropriate total. If two patients out of five respond to a surgery, is it correct to say that the response rate is 40%? In another group of five, if three respond, the rate jumps to 60%. A difference of 20% [...]
[...] surgery, keeping silent about the others, it has the risk of being interpreted as saying that the remaining 58% preferred medication. This obviously is wrong. The nonrespondents or the neutrals should always be stated so that no bias occurs in the interpretation.
If 12% of critical patients in the control group die within one week and 8% in the treatment group, is the risk reduction 4% or 33%? Results are sometimes stated to magnify the effect without the corresponding explanation regarding the base of calculation.

Misuse of means
A popular saying by detractors of statistics is, "Head in an oven, feet in a freezer and the person is comfortable!" There is no doubt that an inference on the mean alone can sometimes be very misleading, particularly when it is based on just two values as in this case. A mean should not be calculated unless there are at least four readings. In addition, it must always be accompanied by the standard deviation (SD) so that an indication is available about the dispersion of the values on which the mean is based. In the quoted phrase, the SD would be exceedingly high, showing that the mean is useless. Sometimes, the standard error (SE) is stated in place of SD but that too might mislead unless its implications in the context are fully explained. Also, n must always be stated when reporting a mean. These two, n and SD, should be considered together when drawing any conclusion based on the mean. Statistical procedures such as confidence intervals and tests of significance have a built-in provision to take care of both of them. A mean based on large n naturally commands more confidence than one based on small n. Similarly, a smaller SD makes the mean more reliable.
General practice is to state mean and SD with a ± sign in between, such as mean BMD in lumbar spine in healthy premenopausal women is 0.98 ± 0.075 g/cm². 13 Opinion is now building against the use of the ± sign because it tends to give a false impression that the variation is from 0.905 to 1.055 g/cm² in this case. The variation is much more, even more than the ±2SD limits, 0.830 to 1.130 g/cm². Thus it is better to state clearly that the mean is 0.980 and the SD is 0.075 g/cm² without using a ± sign.
The mean may not be an appropriate indicator for a particular data set. If in a group of eight persons, seven do not take alcohol and one consumes 200 mL per day, how correct is it to say that the average consumption in this group is 25 mL per person per day? If extreme values or outliers are present, the mean is not a proper measure. Either use the median or recalculate the mean after excluding the outliers. If exclusion is done, this must be clearly stated.

Misuse of graphs
Some fallacies can occur due to inadequate choice of scale in a graph. A steep slope can be represented as mild and vice versa. Similarly, a wide scatter may be shown as compact. Also, means in different groups or means over time can be shown without corresponding SDs. They can be shown to indicate a trend that does not really exist or is not statistically significant.
One of the main sources of fallacies in graphs is their insensitivity to the size of n. A mean or a percentage based on n = 2 is represented the same way as one based on n = 100. The perception, and possibly cognition, received from a graph is not affected even when n is explicitly stated. One such example is the box-and-whiskers plot drawn for the time elapsed between cancer and acquired immunodeficiency syndrome (AIDS) diagnoses among homosexual men with cancer diagnosed before or concurrently with AIDS in San Francisco during 1978 to 1990, 14 where n is 4 in one group yet maximum, minimum, first quartile, third quartile and median are shown.

Problems in Reporting
Among many problems that can occur with the reporting, two requiring specific attention are incomplete reporting and overreporting.
Incomplete reporting: All reports should state not only the truth but the whole truth. There is a growing concern in the medical fraternity that part of the information in reports is sometimes intentionally suppressed and sometimes unknowingly missed. If so, the reader gets a biased picture. This is easily illustrated by the bias of many medical journals for reporting "positive" findings and ignoring "negative" reports. Both should be reported in a balanced manner. Similarly, properly designed studies that do not reject a particular null hypothesis deserve a respectable place in the literature. That, at present, is sadly lacking although awareness of a balanced approach is increasing. Recently a Journal of Negative Results in Biomedicine has started to partly address this problem.
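The arithmetic behind three of the examples in the subsections on percentages and means can be laid out explicitly. The sketch below uses only the numbers quoted in the text; the variable names are illustrative.

```python
from statistics import mean, median

# Risk reduction: 12% mortality in controls versus 8% under treatment.
control_risk, treated_risk = 0.12, 0.08
absolute_reduction = control_risk - treated_risk          # base: all patients
relative_reduction = absolute_reduction / control_risk    # base: control risk
print(f"absolute: {absolute_reduction:.0%} (percentage points), "
      f"relative: {relative_reduction:.0%}")

# Alcohol example: seven abstainers and one person consuming 200 mL per day.
alcohol_ml = [0] * 7 + [200]
mean_ml = mean(alcohol_ml)
median_ml = median(alcohol_ml)
print(f"mean = {mean_ml} mL, median = {median_ml} mL")

# BMD example: what mean 0.98 and SD 0.075 g/cm2 imply for 1SD and 2SD limits.
bmd_mean, bmd_sd = 0.98, 0.075
limits_1sd = (bmd_mean - bmd_sd, bmd_mean + bmd_sd)
limits_2sd = (bmd_mean - 2 * bmd_sd, bmd_mean + 2 * bmd_sd)
print(f"1SD limits: {limits_1sd[0]:.3f} to {limits_1sd[1]:.3f}")
print(f"2SD limits: {limits_2sd[0]:.3f} to {limits_2sd[1]:.3f}")
```

Both the 4% and the 33% are arithmetically correct for the risk reduction; they differ only in the base of calculation, which is why the base must be stated.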
A very serious problem is reporting only that part of a study that supports a particular hypothesis. The other parts are suppressed. An example is a series of studies on the carcinogenic effect of asbestos. According to an analysis of different studies, deliberate attempts were made by the industry to suppress information on the carcinogenicity of asbestos that affected millions of workers. 15
Over reporting: The statement of results in a report should generally be confined to the aspects for which the study was originally designed. If there are any unanticipated findings, these should be reported with caution. These can be labeled as "interesting" or "worthy of further investigation" but not presented as conclusions. A new study, specifically designed to investigate these findings, can then be conducted.
Self-reporting versus objective measurement: Self-perception of health may be very different from an assessment based on measurements. A person with an amputated leg may consider himself absolutely healthy. This can particularly happen with social and psychological domains of quality of life. Besides such discrepancies, it has been observed, for example, that people tend to report lower than actual weight but higher height. This could make a substantial difference when BMI is calculated. The percentage of subjects with BMI ≥ 25 may thus become much lower than that obtained by actual measurements. Thus, only those characteristics should be self-reported for which self-report is required for the fulfillment of the objectives of the study. All others should be objectively measured.

INADEQUATE ANALYSIS
Among the most common sources of fallacies in data-based conclusions is the use of an inappropriate method of analysis.

Ignoring reality
Most statistical methods take a simplified view of the complex biological process. Because computers are available for intricate calculations, the statistical need to simplify is really not as great now as it used to be in the precomputer era. But simplification is still required for easy comprehension. This should not be done to an extent that could distort the essential features of a biological process.
Looking for linearity: There is no doubt that hardly any relationship in medicine is linear. Yet, the linear relationship is the most commonly studied form of relationship in health and medicine. This simplification seems to work fairly well in many situations but can destroy an otherwise very clear relationship in others. The rise and fall of lung function with increase in age is aptly represented by a parabolic curve but the relationship vanishes if only linearity is considered. Another example is the relationship between glomerular filtration rate and creatinine level. Kidney function can influence bone health at least in older people 16 and needs to be properly studied. A linear relationship is medically unsatisfactory in this example even if R² is high. 17 There is a definite need to curb the tendency to linearize a clear nonlinear relationship.
Statistical linearity also includes a parabola with a square term of the regressor because the model is still linear in its coefficients. McQuellon et al 18 found that the overall quality of life trajectory in bone marrow transplant recipients is parabolic. This shows that terms such as square and log should be included where indicated by the scatter plot.
Assumptions overlooked: A Gaussian form of distribution of various quantitative measurements is so ingrained in the minds of some workers that they take it for granted. For large n, the central limit theorem can be invoked for inference on means but nonparametric methods should be used when n is small and the distribution is far from Gaussian. At the same time, note also that most parametric methods, such as t and F, are quite robust to mild deviation from the Gaussian pattern. Their use in such cases does limited harm so long as the distribution has a single mode.
Transformations such as logarithm and square root sometimes help to "Gaussianize" a positively skewed distribution. But these also make interpretation difficult and unrealistic. If the logarithm of blood lead concentration is weakly correlated with the log of bone lead concentration, 19 what sort of conclusion can be drawn for the lead concentration itself? Despite this limitation, such transformations are in vogue and seem to lead to correct conclusions in many cases, particularly if n is not too small.
The assumptions of independence of observations and of equality of variance across groups are more important than that of a Gaussian pattern. Independence is threatened when the measurements are serial or longitudinal. Uniformity of variance is lost when, for example, SD varies with the mean. And this is not so uncommon. For example, as the cell densities of cultured osteoblast-like MG63 cells increase, their SDs also increase. 20 Persons with higher BMI also tend to exhibit greater variability in BMI than those with lower BMI. Thus, care is needed in using methods that require uniformity of variance such as the ANOVA F-test.
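The point made under "Looking for linearity", that a clear parabolic relationship can vanish when only linearity is examined, can be shown with a few made-up numbers. The ages and the lung function index below are hypothetical, chosen to be exactly parabolic and symmetric about age 40 so that the arithmetic is transparent; the peak age is an assumption of the sketch, not a physiological claim.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: an index that rises to a peak at age 40 and then falls.
ages = list(range(20, 65, 5))                        # 20, 25, ..., 60
index = [100 - 0.05 * (a - 40) ** 2 for a in ages]   # exact parabola

r_linear = pearson_r(ages, index)                    # straight-line view
squared_term = [(a - 40) ** 2 for a in ages]
r_quadratic = pearson_r(squared_term, index)         # view with a square term

print(f"correlation with age:           {r_linear:.3f}")
print(f"correlation with (age - 40)^2:  {r_quadratic:.3f}")
```

The linear correlation is zero although the index is completely determined by age, while the squared term captures the relationship fully. A model containing the squared term is still linear in its coefficients, so ordinary regression methods apply.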
In the case of Chi-square for proportions, the basic assumption is that the expected frequency in most cells is at least five. Fisher's exact test for a 2x2 table has become an integral part of most statistical packages of repute but the multinomial test required for larger tables with expected frequencies less than five has not found a similar place.
Anomalous person-years: It is customary to calculate person-years of follow-up and use this for various inference purposes. Person-years of exposure is a valid epidemiological tool only when each year of exposure has the same risk. Calculation of the mortality rate per thousand person-years after fracture in patients on dialysis 21 presumes that the risk of mortality in the first year after fracture is the same as in, say, the tenth year after fracture. This obviously is not true. If nothing else, ageing will make an impact in this case.

Mean or proportion?
The blame lies mostly with statisticians who fancy quantity more than quality. Although quantitation in many cases does help in achieving exactitude in thinking and in conclusions, it can sometimes suppress important findings.
Consider a rise in BMD in lumbar spine after a high single dose of vitamin D (60,000 IU) given orally to 10 children of age 5 to 13 months diagnosed with nutritional rickets. 22 The authors have not given the exact values but suppose they were as follows:
BMD before the drug (g/cm²): 0.25 0.23 0.19 0.27 0.20 0.32 0.24 0.30 0.24 0.26
BMD after one month of single dose of vitamin D (g/cm²): 0.28 0.29 0.27 0.27 0.28 0.31 0.28 0.29 0.29 0.24
Increase (g/cm²): 0.03 0.06 0.08 0.00 0.08 -0.01 0.04 -0.01 0.05 -0.02
The mean increase from 0.25 g/cm² to 0.28 g/cm² in this example is statistically significant by paired t-test (P = 0.034). However, the increase occurred in six cases and not in the other four cases. A marginal fall in BMD one month after the dose can occur in some cases due to biological variation or instrumentation. If the dose is not effective, the probability of increase in BMD is the same as the probability of decrease. Under this null hypothesis (H0: P = 1/2), the exact probability of decrease in three or fewer subjects out of 10 by binomial is 0.17. This is more than 0.05. Thus the null cannot be rejected.
The proportion gives a negative result about the effect of the dose in this example whereas the mean difference gives a positive result. Depending upon the preference of the investigator, the finding can be presented either way.

Forgetting the baseline values
The conclusion in the preceding example is based on the absolute increase in BMD in different subjects. Critical examination of the data reveals that the increase occurred mostly in the subjects with low baseline values. The pattern generally is: the lower the baseline BMD, the higher the rise. The above analysis, based first on the paired t-test and second on the binomial, ignores this important aspect. The conclusion in this example is that the dose generally helps children with nutritional rickets whose BMD is ≤0.25 g/cm². This can be statistically established by running a regression.

Misuse of statistical packages
Computers have revolutionized the use of statistical methods for empirical inferences. Methods requiring complex calculations are now done in seconds. This is a definite boon when appropriately used but is a bane in the hands of nonexperts. Understanding of the statistical techniques has not kept pace with the spread of their use, particularly among medical and health professionals.
Overanalysis: Data are sometimes overanalyzed, particularly in the form of post hoc analysis. A study may be designed to investigate the relationship between two specific measurements but correlations between pairs of a large number of other variables, which happen to be available, are calculated and examined. If each correlation is tested for statistical significance at α = 0.05, the total error rate increases enormously. Also, α = 0.05 implies that one in 20 correlations can be concluded to be significant when it actually is not. If measurements on 16 variables are available, the total number of pair-wise correlations is 16×15/2 = 120. At the error rate of 5%, six of these 120 can turn out to be falsely significant. Hofacker 23 illustrated this problem with the help of randomly generated data. He suggests that the data on half the subjects be kept aside for use in a validation exercise. This is feasible when the data for a large number of subjects are available but not otherwise. There is a tendency to find the age-sex or severity group that benefited more from the treatment than the others.
Numerous such analyses are sometimes done in the hope of finding some statistical significance somewhere.
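The numbers quoted in the vitamin D example under "Mean or proportion?" and "Forgetting the baseline values", and the count of correlations under "Overanalysis", can be verified with a short calculation. The sketch below uses the supposed BMD values given in the text. The helper for the t-distribution P-value is an illustrative numerical integration written for this sketch, and the sign-test count follows the text in treating only the three negative changes as decreases.

```python
from math import comb, gamma, pi, sqrt
from statistics import mean, stdev

before = [0.25, 0.23, 0.19, 0.27, 0.20, 0.32, 0.24, 0.30, 0.24, 0.26]
after = [0.28, 0.29, 0.27, 0.27, 0.28, 0.31, 0.28, 0.29, 0.29, 0.24]
increase = [a - b for a, b in zip(after, before)]
n = len(increase)

# Paired t-test on the mean increase.
t_stat = mean(increase) / (stdev(increase) / sqrt(n))


def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value of Student's t, by Simpson integration of its density."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * (total * h / 3)


p_paired_t = t_two_sided_p(t_stat, n - 1)

# Binomial (sign) test: probability of three or fewer decreases when P = 1/2.
decreases = sum(d < 0 for d in increase)
p_binomial = sum(comb(n, k) for k in range(decreases + 1)) / 2 ** n

# Regression of the increase on the baseline value.
mx, md = mean(before), mean(increase)
slope = (sum((x - mx) * (d - md) for x, d in zip(before, increase))
         / sum((x - mx) ** 2 for x in before))

# Overanalysis: pairwise correlations among 16 variables tested at 5%.
n_pairs = comb(16, 2)
expected_false_positives = 0.05 * n_pairs

print(f"paired t = {t_stat:.2f}, P = {p_paired_t:.3f}")
print(f"decreases = {decreases}, binomial P = {p_binomial:.2f}")
print(f"slope of increase on baseline = {slope:.2f}")
print(f"{n_pairs} correlations, about {expected_false_positives:.0f} falsely significant")
```

The negative slope expresses the pattern noted in the text: the lower the baseline BMD, the larger the rise. The same data therefore support three different statements depending on whether the mean, the proportion or the baseline-adjusted change is examined.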
Data dredging: Because of the availability of computer packages, it is now easy to reanalyze data after deleting some inconvenient observations. Valid reasons for this exercise, such as outliers, are sometimes present, but this can be misused by excluding some data that do not fit the hypothesis of the investigator. It is extremely difficult to get any evidence of this happening in a finished report. Integrity of the workers is not and cannot be suspected unless evidence to the contrary is available. Thus, data dredging can go unnoticed.

Quantitative analysis of codes: Most computer programs, for the time being, do not have the capability to distinguish numeric codes from quantitative data. If disease severity is coded as 0, 1, 2, 3, 4 for none, mild, moderate, serious and critical conditions respectively, statistical calculations may treat them as the usual quantitative measurements. This runs the risk of considering three mild cases equal to one serious case and so on. That is, codes can be mistreated as scores. This can happen even with nominal categories such as signs and symptoms when coded as 1, 2, 3, etc. Extra caution is needed in analyzing such data so that codes do not become quantities.

Indrayan: Statistical fallacies

MISINTERPRETATION

Misinterpretation of statistical results mostly occurs due to failure to comprehend them in their totality and inability to juxtapose them with the realities of the situation. This can happen either because of the limited knowledge of many medical professionals about statistical concepts 24 or because of inadequate understanding of medical issues by the statisticians associated with medical projects.

Misuse of P-values

Statistical P-values seem to be gaining acceptance as a gold standard for data-based conclusions. However, biological plausibility should not be abandoned in favor of P-values. Inferences based on P-values can also produce a biased or incorrect result.

'Magic' threshold 0.05: A threshold 0.05 of Type I error is customary in health and medicine. Except for convention, there is no specific sanctity of this threshold. A result with P = 0.051 is statistically almost as significant as one with P = 0.049, yet the conclusion reached would be very different if P = 0.05 is used as the threshold. Borderline values always need additional precaution.

The practice now generally followed is to state exact P-values so that the reader can draw his own conclusion. A value of P around 0.10 can possibly be considered weak evidence against the null hypothesis and a small P, say less than 0.01, as strong evidence. This way it acquires continuum.

One-tail or two-tail P-values: If a regimen can do no harm, such as iron supplementation for augmentation of low Hb level, one-tail test is appropriate. Use of two-tail test in this case makes it unnecessarily restrictive and makes rejection of H0 more difficult. Most statistical packages provide two-tail P-values as a default and many workers would not worry too much about this aspect. Scientifically, a conservative test does not do much harm, although some real differences may fail to be detected when a two-tail test is used instead of a one-tail test. Our advice is to use a one-tail test wherever clear indication is available but not otherwise.

Dramatic P-values: Attempts are sometimes made to dramatize the P-values. James et al 25 stated P < 0.000,000,000,01 for difference in seroconversion rate against Epstein-Barr virus between lupus patients and controls. Such accuracy is redundant. It really does not matter whether P < 0.001 or P < 0.000,001 as far as its practical implication is concerned.

'Conclusion' with respect to several parameters: Consider individual persons instead of groups. The reference range for most quantitative medical parameters is obtained as mean ± 2SD of healthy subjects. These statistical limits carry a risk of excluding 5% healthy subjects who have levels in the two extremes. When such limits are applied on several parameters, it becomes very unlikely that all parameters in any person are within such statistical range, even if he is fully healthy. This anomaly is sometimes forgotten while devising inclusion and exclusion criteria for subjects in the study.

This fallacy is similar to multivariate conclusion based on several univariate analyses and the one inherent in multiple comparisons. When the study is based on more than two groups, the Type I error swells when each group is compared with each of the others. To control this, methods such as Tukey and Dunnett are used. Few trials in surgery and medicine consider such adjustment for multiple comparison. 26 Multivariate conclusion refers to joint conclusion on the basis of several outcomes. The same sort of increase in Type I error occurs in this case. A similar problem occurs when, for example, Student's t-test is used to compare osteoblast-like cells at several points of time when exposed and unexposed to a static magnetic field. 20 When each comparison is made at 5% level of significance, the total Type I error for all comparison together could be unbearably large.
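How quickly the error accumulates over several reference ranges or several comparisons can be illustrated with a short calculation. The sketch below is not from the original article: the function names are hypothetical, and it assumes that the parameters (or comparisons) are statistically independent, which is rarely exactly true in practice. Correlated parameters would inflate the error more slowly, so the figures are indicative only.

```python
def prob_all_within_range(k: int, coverage: float = 0.95) -> float:
    """Chance that a healthy person lies inside all k reference ranges.

    Assumes (for illustration only) that the k parameters are independent
    and that each range covers `coverage` of healthy subjects.
    """
    return coverage**k


def total_type1_error(m: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive in m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def pairwise_comparisons(groups: int) -> int:
    """Number of comparisons when each group is compared with each of the others."""
    return groups * (groups - 1) // 2


for k in (1, 5, 10, 20):
    print(
        f"{k:2d} parameters/tests: "
        f"P(all within range) = {prob_all_within_range(k):.3f}, "
        f"total Type I error = {total_type1_error(k):.3f}"
    )

# Four groups give six pairwise comparisons, each made at the 5% level.
print(pairwise_comparisons(4), round(total_type1_error(pairwise_comparisons(4)), 3))
```

Under these assumptions, a fully healthy person has only about a 60% chance of being within range on all of 10 parameters, and 10 tests each made at the 5% level carry a total Type I error of about 40%.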

Correlation versus cause-effect relationship
An association or a correlation in health can arise due to a large number of intervening factors. It is rarely a cause-effect type of relationship unless established by a carefully designed study that rules out the possibility of a significant role of any confounding factor.
A strong correlation between heights of siblings in a family exists not because one is the cause of the other but because both are affected by parental height. Similarly, correlation between visual acuity (VA) and metacarpal index in subjects of age 50 years and above is not cause-effect type but arises because both are products of the same degeneration process. We do not expect VA to improve if metacarpal index is improved by some therapy.
IJO - January - March 2007 / Volume 41 / Issue 1
An unusual confounding factor may provide useful information in some cases. Comparison of surgical and nonsurgical treatment of nondisplaced acute scaphoid fractures 27 thankfully considered poverty as a confounding factor, where the outcome is time taken to return to active work. Poverty can force people to return early but is many times forgotten as a confounding factor in such studies.
Distinction may be made between a necessary cause and a sufficient cause. Sexual intercourse is necessary for initiating pregnancy but it is not sufficient. In fact, the correlation between the number of intercourses and number of pregnancies, even without barriers, is negligible.

Sundry issues
The list of statistical fallacies seems to never end. Some of those not discussed above are as follows.
Medical significance versus statistical significance: An improvement of one point in quality of life after a surgery can be statistically significant if n is large but may not have any medical significance in terms of condition of the patient or in terms of management of the condition. Statistical methods check statistical significance only and medical significance needs to be examined separately.

LAST WORD!
No decision is more important than the one concerning the life and health of people. The medical fraternity has a tall order. They are expected to prolong life and reduce suffering that occurs as a consequence of complex and often poorly understood interaction of a large number of factors. Some of these factors are explicit but many remain obscure and some behave in a very unpredictable manner. Uncertainties in health and medicine are indeed profound. Knowingly for some but unknowingly for many, statistical methods play a vital role in all empirical inferences. Statistical methods can be a dangerous tool when used carelessly. Computer-based statistical packages have not yet been given expertise to decide the correct method although they sometimes generate a warning message when the data are not adequate. The user of the package decides the method. If you are not sufficiently confident, do not hesitate to consult an expert biostatistician.
Medical journals too have a responsibility to ensure that the results of dubious quality are not published. Statistical refereeing is a norm for some journals but some are lax on this issue.
Our last advice is not to rely solely on statistical evidence. Statistical tools are surely good as an aid but rarely as a master. Depend on your intuition more than science. If scientific results fail intuitional judgment, look for gaps. They would most likely lie with 'science' than with intuition.