Essentials of Sample Size Calculation

Suman Kumar Pramanik

09 Aug 2019

Introduction

  • Sample size calculation is a fundamental requirement before carrying out any inferential study

  • Requires an understanding of the study design and a rough estimate of the desired parameters

Population and Sample

  • Population is usually hypothetical, often not well quantified

    • Males in India, Elderly females in India
  • Our aim is to learn about some attribute of the population

    • Prevalence of Heart Attack, Chances of surviving after Lung Cancer at the end of 1 year

    • Mean age of males having 1st episode of Acute Coronary Syndrome

  • We estimate it by studying a sample from the population

Central Dogma of Inferential Statistics

  • POPULATION \(\rightarrow\) SAMPLE

  • ESTIMATE POPULATION \(\leftarrow\) SAMPLE

  • Studying a sample gives only an estimate of the true population value

What is a Random Sample?

  • All entities in the underlying population have an equal chance of being selected for inclusion in the sample.

  • Prior inclusion of any entity does not influence the chance of any other entity being selected later.

Simulation: One Population

Scenario

  • Duration of remission after chemotherapy (exponential distribution)

  • Mean Duration: 5 months

  • Our aim is to estimate mean remission duration of the underlying population

Two random samples (size = 100)

  • Both the random samples are different

  • Blue Sample (Mean): 4.8525941

  • Red Sample (Mean): 4.9722859
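The two-sample comparison above can be reproduced with a minimal sketch like the one below. It uses only the Python standard library; the seed, the helper name `draw_sample` and the variable names are assumptions made for illustration, so the printed means will not match the Blue and Red values on the slide.

```python
import random
import statistics

rng = random.Random(2019)  # arbitrary fixed seed so the run is repeatable

TRUE_MEAN = 5.0    # mean remission duration in months, from the scenario
SAMPLE_SIZE = 100  # size of each random sample, from the slide


def draw_sample(n, mean):
    """Draw n remission durations from an exponential distribution."""
    # random.expovariate expects the rate (1 / mean), not the mean itself
    return [rng.expovariate(1.0 / mean) for _ in range(n)]


blue = draw_sample(SAMPLE_SIZE, TRUE_MEAN)
red = draw_sample(SAMPLE_SIZE, TRUE_MEAN)

blue_mean = statistics.fmean(blue)
red_mean = statistics.fmean(red)

print(f"Blue sample mean: {blue_mean:.3f}")
print(f"Red sample mean:  {red_mean:.3f}")
```

Both means should land near 5 months without being equal to it or to each other, which is the point of the slide: every random sample gives a slightly different estimate.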

Distribution (1000) of sample means (Sample size: 10)

Distribution of Sample Means with Sample Size

Larger the sample size

  • Distribution nearer to normal (Central Limit Theorem)

  • Narrower spread of the distribution, so the estimate is more precise

  • No change in the mean value of the distribution
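A rough sketch of this behaviour, assuming sample sizes of 10, 100 and 1000 (the sizes are an assumption for illustration) and 1000 simulated samples per size. For an exponential population with mean 5 the standard deviation is also 5, so the spread of the sample means should shrink roughly as 5 / sqrt(n).

```python
import random
import statistics

rng = random.Random(2019)  # arbitrary fixed seed so the run is repeatable

TRUE_MEAN = 5.0   # mean remission duration in months
N_REPEATS = 1000  # number of simulated samples per sample size


def sample_mean(n, mean):
    """Mean of one random sample of size n from an exponential population."""
    return statistics.fmean([rng.expovariate(1.0 / mean) for _ in range(n)])


summary = {}
for n in (10, 100, 1000):
    means = [sample_mean(n, TRUE_MEAN) for _ in range(N_REPEATS)]
    # Centre and spread of the distribution of sample means
    summary[n] = (statistics.fmean(means), statistics.stdev(means))
    centre, spread = summary[n]
    print(f"n={n:5d}  mean of sample means={centre:.2f}  SD={spread:.3f}")
```

The centre stays close to 5 for every sample size, while the SD of the sample means drops by about a factor of sqrt(10) each time the sample size grows tenfold.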

Requirements for sample size calculation for 1 sample

  • Minimum desired precision (Margin of error) (\(\Delta\))

  • Underlying standard deviation (\(sd\)) (measure of spread, may require pilot study)

  • Confidence level of the interval (95% CI, 90% CI, 99% CI)

  • For 95% CI

\[ n = \left( \frac{sd \cdot Z_{0.975}}{\Delta} \right)^{2}, \qquad Z_{0.975} = 1.96 \]
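The formula follows from requiring the half-width of the 95% confidence interval for the mean to be no larger than \(\Delta\). A short derivation, assuming the sample mean is approximately normal and that \(sd\) is a reasonable estimate of the population standard deviation:

\[ Z_{0.975} \cdot \frac{sd}{\sqrt{n}} \le \Delta \;\Longrightarrow\; \sqrt{n} \ge \frac{sd \cdot Z_{0.975}}{\Delta} \;\Longrightarrow\; n \ge \left( \frac{sd \cdot Z_{0.975}}{\Delta} \right)^{2} \]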

Example

  • A survey is carried out to estimate the mean height of a population in a city

  • A pilot survey was carried out with 50 people and standard deviation of 50 cm was estimated

  • We want to estimate the mean height with a margin of error of 10 cm

  • n = ((50 * 1.96) / 10)^2 = 96.04, rounded up to 97
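The same calculation as a small Python sketch. The function name `sample_size_for_mean` and its parameters are hypothetical; the Z value comes from the standard normal quantile function in the standard library rather than the rounded 1.96, and the result is rounded up to the next whole participant, so (9.8)^2 = 96.04 becomes 97.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sd, margin, confidence=0.95):
    """Smallest n whose confidence interval half-width is at most `margin`."""
    # Two-sided interval: 95% confidence uses the 0.975 quantile (about 1.96)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, because a fractional participant cannot be recruited
    return math.ceil((z * sd / margin) ** 2)


# Height example: sd = 50 cm from the pilot survey, margin of error = 10 cm
for level in (0.90, 0.95, 0.99):
    n = sample_size_for_mean(sd=50, margin=10, confidence=level)
    print(f"{level:.0%} CI: n = {n}")
```

Asking for a higher confidence level with the same margin of error raises the required sample size.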

Simulation for difference between two populations

Scenario

  • A new drug (Drug 2) has been invented as chemotherapy which is being tested against standard of care (Drug 1)

  • Outcome of interest is the duration of remission

  • Say, duration of remission for standard of care (Drug 1) is exponentially distributed with mean of 5 months

  • Duration of remission for new drug (Drug 2) is exponentially distributed with mean of 10 months

Assessing difference between Drug 1 and Drug 2

  • Difference in mean remission duration between Drug 2 and Drug 1 (Drug 2 - Drug 1)

  • Drug 2 is better by 5 months than Drug 1 (A VALUE WHICH IS NOT KNOWN IN REAL LIFE)

  • We define a clinically significant difference between Drug 1 and Drug 2 as 3 months (POPULATION CHARACTERISTICS)

Simulation

  • Population under Null hypothesis: difference = 0 (5m, 5m)

  • Population under the clinically significant difference hypothesis (the true difference equals the minimum clinically significant difference): difference = 3 (5m, 8m)

Sample size = 10

  • Green dashed line: sample mean

  • Red line: 97.5th percentile of the null distribution (marks the region of rejection)

  • Sample in the region of rejection: we conclude that the sample does not belong to the null population

  • Assuming that the null population is the truth, the probability of wrongly concluding that the sample does not belong to the null population is the Type I error (Alpha) (5%)

  • Upper panel: population with the minimum clinically significant difference

  • Assuming that the above population is the truth, the probability that the sample falls in the region of rejection (correctly rejecting the null population) is the POWER (80%)

  • Assuming that the null population is the truth, the probability of obtaining a sample as extreme as, or more extreme than, the present sample is the p value
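These quantities can be approximated by simulation. The sketch below assumes 10 patients per arm, 5000 simulated trials per hypothesis and a hypothetical observed difference of 3 months; the function and variable names are illustrative and not taken from the slides. It uses only the upper 97.5th percentile as the rejection threshold, so the p value it reports is one-sided.

```python
import random
import statistics

rng = random.Random(2019)  # arbitrary fixed seed so the run is repeatable

N_SIM = 5000     # simulated trials per hypothesis (assumed)
N_PER_ARM = 10   # patients per drug


def mean_difference(n, mean_drug1, mean_drug2):
    """Difference in sample means (Drug 2 - Drug 1) for one simulated trial."""
    drug1 = [rng.expovariate(1.0 / mean_drug1) for _ in range(n)]
    drug2 = [rng.expovariate(1.0 / mean_drug2) for _ in range(n)]
    return statistics.fmean(drug2) - statistics.fmean(drug1)


# Null population (5m, 5m) and clinically significant population (5m, 8m)
null_diffs = sorted(mean_difference(N_PER_ARM, 5, 5) for _ in range(N_SIM))
alt_diffs = [mean_difference(N_PER_ARM, 5, 8) for _ in range(N_SIM)]

# Red line: 97.5th percentile of the simulated null distribution
critical = null_diffs[round(0.975 * N_SIM)]

# Power: share of trials from the clinically significant population
# that fall in the region of rejection
power = sum(d > critical for d in alt_diffs) / N_SIM

# One-sided p value for a hypothetical observed difference of 3 months:
# share of null trials at least as extreme as the observation
observed = 3.0
p_value = sum(d >= observed for d in null_diffs) / N_SIM

print(f"Critical difference: {critical:.2f} months")
print(f"Power at n = {N_PER_ARM} per arm: {power:.2f}")
print(f"p value for an observed difference of {observed}: {p_value:.3f}")
```

The 5% Type I error and 80% power quoted above are conventional targets; the simulation reports what a given sample size actually delivers for this scenario.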

Sample size = 100

  • The p value decreases to a so-called statistically significant level just by increasing the sample size

    • FALLACY of P VALUE: Some Other Day!!
  • Power of study increases by increasing the sample size

  • The population parameter is estimated more precisely as the sample size increases
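A sketch of how these three effects show up when the simulation is repeated for 10, 100 and 1000 patients per arm. The number of simulated trials (1000) and the fixed observed difference of 3 months are assumptions for illustration, and the p value is one-sided.

```python
import random
import statistics

rng = random.Random(2019)  # arbitrary fixed seed so the run is repeatable

N_SIM = 1000     # simulated trials per hypothesis and sample size (assumed)
OBSERVED = 3.0   # hypothetical observed difference in months


def mean_difference(n, mean_drug1, mean_drug2):
    """Difference in sample means (Drug 2 - Drug 1) for one simulated trial."""
    drug1 = [rng.expovariate(1.0 / mean_drug1) for _ in range(n)]
    drug2 = [rng.expovariate(1.0 / mean_drug2) for _ in range(n)]
    return statistics.fmean(drug2) - statistics.fmean(drug1)


powers, p_values, spreads = {}, {}, {}
for n in (10, 100, 1000):
    null_diffs = sorted(mean_difference(n, 5, 5) for _ in range(N_SIM))
    alt_diffs = [mean_difference(n, 5, 8) for _ in range(N_SIM)]

    # Rejection threshold: 97.5th percentile of the null distribution
    critical = null_diffs[round(0.975 * N_SIM)]

    powers[n] = sum(d > critical for d in alt_diffs) / N_SIM
    p_values[n] = sum(d >= OBSERVED for d in null_diffs) / N_SIM
    # Spread of the estimated difference: smaller means more precise
    spreads[n] = statistics.stdev(alt_diffs)

    print(f"n={n:5d}  power={powers[n]:.2f}  "
          f"p value={p_values[n]:.3f}  SD of estimate={spreads[n]:.2f}")
```

The same observed difference of 3 months yields a smaller p value as the sample size grows, while power rises and the spread of the estimate shrinks.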

Sample size = 1000

PS Power and Sample Size Calculator (Vanderbilt University)

Download …

Downloadable from http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize as the file pssetup3.exe

Cite the package …

Dupont WD, Plummer WD: ‘Power and Sample Size Calculations: A Review and Computer Program’, Controlled Clinical Trials 1990; 11: 116-28

or

Dupont WD, Plummer WD: ‘Power and Sample Size Calculations for Studies Involving Linear Regression’, Controlled Clinical Trials 1998; 19: 589-601

Slides can be obtained from …

https://sumprain.netlify.com/files/html/sample_ahrr/presentation_ahrr.html