Sample Size Calculation is the fundamental requirement before carrying out any Inferential Study
Requires an understanding of the study design and a rough estimate of the desired parameters
Population is usually hypothetical, often not well quantified
Our aim is to find out about some attribute of the population
Prevalence of Heart Attack, Chance of surviving to the end of 1 year after Lung Cancer
Mean age of males having 1st episode of Acute Coronary Syndrome
We estimate it by studying a sample from the population
POPULATION \(\rightarrow\) SAMPLE
ESTIMATE POPULATION \(\leftarrow\) SAMPLE
What we get from studying a sample is an estimate of the true population value
All entities in the underlying population have an equal chance of being selected for inclusion in the sample.
Prior inclusion of any entity does not influence the chance of any other entity being selected later.
Duration of remission after chemotherapy (exponential distribution)
Mean Duration: 5 months
Our aim is to estimate mean remission duration of the underlying population
The two random samples are different
Blue Sample (Mean): 4.8525941
Red Sample (Mean): 4.9722859
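The two sample means above can be imitated with a minimal sketch. The sample size of 100 and the seeds are assumptions made for illustration (the text does not state them), so the printed means will not reproduce the blue and red values exactly; the point is only that two random samples from the same population give different estimates of the same mean.

```python
import random
import statistics

MEAN_REMISSION_MONTHS = 5.0  # population mean stated above
SAMPLE_SIZE = 100            # assumed for illustration; not given in the text


def sample_mean(seed, n=SAMPLE_SIZE, mean=MEAN_REMISSION_MONTHS):
    """Draw n exponential remission durations and return their mean."""
    rng = random.Random(seed)
    # expovariate takes the rate (1 / mean) of the exponential distribution
    durations = [rng.expovariate(1.0 / mean) for _ in range(n)]
    return statistics.fmean(durations)


blue_mean = sample_mean(seed=1)  # hypothetical "blue" sample
red_mean = sample_mean(seed=2)   # hypothetical "red" sample
print(f"Blue sample mean: {blue_mean:.2f}")
print(f"Red sample mean:  {red_mean:.2f}")
```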
Distribution of the sample mean moves nearer to normal (Central Limit Theorem)
The narrower the spread of the distribution, the more precise the estimate.
No change in the mean value of the distribution
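A small simulation can illustrate these three points. This is a sketch under assumed settings (sample sizes of 10, 40 and 160, 2000 repetitions, a fixed seed), not the setup behind the original plots: it repeatedly draws samples from the exponential population with mean 5 months and summarises the resulting sample means.

```python
import random
import statistics


def sampling_distribution(n, replications=2000, mean=5.0, seed=0):
    """Return the means of `replications` random samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(replications):
        sample = [rng.expovariate(1.0 / mean) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


summary = {}
for n in (10, 40, 160):  # assumed sample sizes, for illustration only
    means = sampling_distribution(n)
    # centre and spread of the distribution of sample means
    summary[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n = {n:3d}: centre = {summary[n][0]:.2f}, spread = {summary[n][1]:.2f}")
```

The centre stays close to 5 months for every sample size, while the spread shrinks roughly in proportion to 1/sqrt(n).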
Minimum desired precision (Margin of error) (\(\Delta\))
Underlying standard deviation (\(sd\)) (measure of spread, may require pilot study)
Confidence level of the interval (95% CI, 90% CI, 99% CI)
For 95% CI
\[ n = \left( \frac{sd \times Z_{0.975}}{\Delta} \right)^{2}, \qquad Z_{0.975} = 1.96 \]
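Where the formula comes from, as a short sketch: the half-width of the 95% CI for a mean is \(Z_{0.975}\) times the standard error \(sd / \sqrt{n}\). Setting that half-width equal to the desired margin of error and solving for \(n\) gives
\[ \Delta = Z_{0.975} \times \frac{sd}{\sqrt{n}} \quad \Longrightarrow \quad \sqrt{n} = \frac{sd \times Z_{0.975}}{\Delta} \quad \Longrightarrow \quad n = \left( \frac{sd \times Z_{0.975}}{\Delta} \right)^{2} \]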
A survey is carried out to estimate the mean height of a population in a city
A pilot survey of 50 people was carried out and a standard deviation of 50 cm was estimated
We want to estimate the mean height with a margin of error of 10 cm
n = ((50 * 1.96) / 10)^2 = 96.04, i.e. about 96 people (97 when rounded up to the next whole person, as is conventional)
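The same calculation as a minimal Python sketch. The function name `sample_size_for_mean` and its parameters are hypothetical, chosen for illustration; the Z value is taken from the standard normal distribution instead of being hard-coded as 1.96. The raw result is about 96.04, and rounding up to the next whole person gives 97.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sd, margin_of_error, confidence=0.95):
    """Return (raw n, n rounded up) for estimating a mean.

    Implements n = (sd * Z / margin_of_error)^2, where Z is the
    two-sided critical value for the chosen confidence level.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    raw_n = (sd * z / margin_of_error) ** 2
    return raw_n, math.ceil(raw_n)


# Height survey: pilot sd = 50 cm, desired margin of error = 10 cm
raw_n, n = sample_size_for_mean(sd=50, margin_of_error=10)
print(f"raw n = {raw_n:.2f}, people to recruit = {n}")
```

Asking for a higher confidence level (for example `confidence=0.99`) or a smaller margin of error increases the required sample size.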
A new drug (Drug 2) has been invented as chemotherapy which is being tested against standard of care (Drug 1)
Outcome of interest is the duration of remission
Say, duration of remission for standard of care (Drug 1) is exponentially distributed with mean of 5 months
Duration of remission for new drug (Drug 2) is exponentially distributed with mean of 10 months
Difference between means between Drug 2 and Drug 1 (Drug 2 - Drug 1)
Drug 2 is better by 5 months than Drug 1 (A VALUE WHICH IS NOT KNOWN IN REAL LIFE)
We define a clinically significant difference between Drug 1 and Drug 2 as 3 months (POPULATION CHARACTERISTICS)
Population under Null hypothesis: difference = 0 (5m, 5m)
Population under the clinically significant difference hypothesis (the population with the minimum clinically significant difference): difference = 3 (5m, 8m)
Green dashed line: sample mean
Red line: 97.5th percentile of the null population (determines the region of rejection)
Sample in the region of rejection: we conclude that the sample does not belong to the null population
Assuming that the null population is the truth, the probability of wrongly concluding that the sample does not belong to it is the Type I error (Alpha) (5%)
Upper panel: Population belonging to minimum clinically significant difference
Assuming that the above population is the truth, the probability of correctly inferring that the sample belongs to it is the POWER (80%)
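The alpha and power above can be turned into a sample size per group with a rough sketch. It rests on assumptions not stated in the text: a normal approximation for the difference in sample means, a two-sided alpha of 5%, equal group sizes, and the fact that the standard deviation of an exponential distribution equals its mean (so 5 and 8 months here). The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(mean_1, mean_2, sd_1, sd_2, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect mean_2 - mean_1.

    Normal approximation:
    n = (Z_(1 - alpha/2) + Z_power)^2 * (sd_1^2 + sd_2^2) / difference^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # sets the region of rejection
    z_power = NormalDist().inv_cdf(power)          # sets the desired power
    difference = mean_2 - mean_1
    raw_n = (z_alpha + z_power) ** 2 * (sd_1 ** 2 + sd_2 ** 2) / difference ** 2
    return math.ceil(raw_n)


# Drug 1: mean 5 months; minimum clinically significant alternative: mean 8 months.
# For exponential durations the sd equals the mean.
n = n_per_group(mean_1=5, mean_2=8, sd_1=5, sd_2=8)
print(f"Approximate patients needed per group: {n}")
```

Because remission durations are skewed, a simulation or dedicated software would give a more dependable figure; the sketch only shows how alpha, power, spread and the clinically significant difference combine.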
The p value falls to a so-called statistically significant level just by increasing the sample size
Power of the study increases with increasing sample size
The estimate of the population parameter becomes more precise with increasing sample size
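These three effects of sample size can be sketched numerically. The code assumes a normal approximation, equal group sizes, a two-sided alpha of 5%, standard deviations of 5 and 8 months (the exponential means used earlier), and a fixed observed difference of 3 months; the function names and the sample sizes are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
SD_1, SD_2 = 5.0, 8.0  # exponential populations: sd equals mean (5m, 8m)
DIFFERENCE = 3.0       # clinically significant difference in months


def standard_error(n):
    """Standard error of the difference in means with n patients per group."""
    return math.sqrt((SD_1 ** 2 + SD_2 ** 2) / n)


def power(n, alpha=0.05):
    """Approximate power to detect DIFFERENCE (the far rejection tail is ignored)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(DIFFERENCE / standard_error(n) - z_crit)


def p_value(n, observed_difference=DIFFERENCE):
    """Two-sided p value if the same difference is observed with n per group."""
    z = abs(observed_difference) / standard_error(n)
    return 2 * (1 - STD_NORMAL.cdf(z))


for n in (20, 40, 80, 160):
    print(f"n = {n:3d}: SE = {standard_error(n):.2f}, "
          f"power = {power(n):.2f}, p = {p_value(n):.4f}")
```

With the observed difference held at 3 months, the standard error and the p value shrink and the power grows as n increases.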
Downloadable from http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize as the file pssetup3.exe
Dupont WD, Plummer WD: ‘Power and Sample Size Calculations: A Review and Computer Program’, Controlled Clinical Trials 1990; 11: 116-28
or
Dupont WD, Plummer WD: ‘Power and Sample Size Calculations for studies involving Linear Regression’, Controlled Clinical Trials 1998; 19: 589-601
https://sumprain.netlify.com/files/html/sample_ahrr/presentation_ahrr.html