@ Biostatistics & Hands on Practices on Medical Data Using SPSS, ILBS
SUMAN KUMAR
18 August 2017
|   | gos6 | outcome | gender | age | wfns | s100b  | ndka  |
|---|------|---------|--------|-----|------|--------|-------|
| 1 | 5    | Good    | Female | 42  | 1    | -2.040 | 1.102 |
| 2 | 5    | Good    | Female | 37  | 1    | -1.966 | 2.145 |
| 3 | 5    | Good    | Female | 42  | 1    | -2.303 | 2.091 |
| 4 | 5    | Good    | Female | 27  | 1    | -3.219 | 2.344 |
| 5 | 1    | Poor    | Female | 42  | 3    | -2.040 | 2.856 |
| 6 | 1    | Poor    | Male   | 48  | 2    | -2.303 | 2.546 |
This is Bayes' theorem, the crux of all diagnostic tests:
\[ P(\text{Class} \mid \text{Test}) = \frac{P(\text{Test} \mid \text{Class})\, P(\text{Class})}{P(\text{Test})} \]
In real life, we apply a chain of diagnostic tests to assign a particular class to a patient.
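A minimal numeric sketch of the update Bayes' theorem performs. The sensitivity (0.90), specificity (0.95), and prevalence (1%) here are hypothetical numbers chosen for illustration, not values from the data above:

```python
# Bayes' theorem: post-test probability of disease given a positive test,
# using hypothetical test characteristics.
sens, spec, prev = 0.90, 0.95, 0.01

# Total probability of a positive test: true positives + false positives
p_pos = sens * prev + (1 - spec) * (1 - prev)

# Posterior (post-test) probability of disease given a positive test
p_disease_given_pos = sens * prev / p_pos
print(round(p_disease_given_pos, 3))  # → 0.154
```

Even with a good test, a rare disease yields a low post-test probability, which is why the pretest prevalence matters so much in what follows.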
Diagnostic Test 1: S100 calcium-binding protein B (s100b assay)

|    | s100b  | outcome |
|----|--------|---------|
| 1  | -2.040 | Good    |
| 2  | -1.966 | Good    |
| 3  | -2.303 | Good    |
| 4  | -3.219 | Good    |
| 5  | -2.040 | Poor    |
| 6  | -2.303 | Poor    |
| 7  | -0.755 | Good    |
| 8  | -1.833 | Poor    |
| 9  | -1.715 | Good    |
| 10 | -2.303 | Good    |
Higher values of s100b are associated with the Poor class
The diagnostic test makes mistakes (misclassifications)
| pred_outcome | Good | Poor |
|--------------|------|------|
| Pred_Good    | 63   | 24   |
| Pred_Poor    | 9    | 17   |
The confusion matrix depicts the discriminatory performance of the test
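The counts in the confusion matrix above yield the familiar test metrics. A minimal Python sketch, taking "Poor" as the positive class:

```python
# Counts from the confusion matrix above ("Poor" = positive class)
tp, fn = 17, 24   # Poor patients predicted Poor / predicted Good
tn, fp = 63, 9    # Good patients predicted Good / predicted Poor

sensitivity = tp / (tp + fn)              # 17/41
specificity = tn / (tn + fp)              # 63/72
pos_lr = sensitivity / (1 - specificity)  # positive likelihood ratio
neg_lr = (1 - sensitivity) / specificity  # negative likelihood ratio

print(round(sensitivity, 4), round(specificity, 4),
      round(pos_lr, 4), round(neg_lr, 3))
# → 0.4146 0.875 3.3171 0.669
```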
Post-test odds of being in class 1, given the test is positive for class 1 = pretest odds of being in class 1 × positive LR
Post-test odds of being in class 1, given the test is negative for class 1 = pretest odds of being in class 1 × negative LR
A higher positive LR (e.g., 10) increases the chance that the patient belongs to class 1 when the test is positive for class 1
A lower negative LR (e.g., 0.1) decreases the chance that the patient belongs to class 1 when the test is negative for class 1
We need to know the pretest prevalence of disease to obtain the post-test prevalence of disease (OUR PRIMARY INTEREST)
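The odds-updating rule can be checked numerically with the counts from the confusion matrix above (41 Poor and 72 Good patients) and the positive LR of 3.3171:

```python
# Pretest -> post-test updating via the positive likelihood ratio
poor, good = 41, 72                   # class counts from the confusion matrix
pretest_odds = poor / good            # pretest odds of Poor, 41/72
pos_lr = 3.3171                       # positive LR of s100b for "Poor"

posttest_odds = pretest_odds * pos_lr            # odds of Poor | positive test
posttest_prob = posttest_odds / (1 + posttest_odds)
print(round(posttest_prob, 3))        # → 0.654, i.e. the PPV 17/26
```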
|    | cut_off | sensitivity | specificity |
|----|---------|-------------|-------------|
| 1  | -3.3627 | 0.9756      | 0           |
| 2  | -3.1073 | 0.9756      | 0.0694      |
| 3  | -2.9046 | 0.9756      | 0.1111      |
| 4  | -2.1638 | 0.7561      | 0.5417      |
| 5  | -2.0802 | 0.7317      | 0.5417      |
| 6  | -2.0032 | 0.6829      | 0.5833      |
| 7  | -1.0087 | 0.4146      | 0.875       |
| 8  | -0.9296 | 0.4146      | 0.8889      |
| 9  | -0.8678 | 0.3902      | 0.8889      |
| 10 | -0.0958 | 0.0488      | 1           |
| 11 | 0.3434  | 0.0244      | 1           |
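A table like the one above is built by sweeping candidate cut-offs and recomputing sensitivity and specificity at each one. A sketch of the idea on the 10 observations listed earlier (the table itself was computed on the full dataset, so the numbers differ):

```python
# Sensitivity/specificity at a given cut-off: s100b > cut-off => "Poor"
s100b = [-2.040, -1.966, -2.303, -3.219, -2.040,
         -2.303, -0.755, -1.833, -1.715, -2.303]
outcome = ["Good", "Good", "Good", "Good", "Poor",
           "Poor", "Good", "Poor", "Good", "Good"]

def sens_spec(cut):
    tp = sum(1 for v, o in zip(s100b, outcome) if v > cut and o == "Poor")
    fn = sum(1 for v, o in zip(s100b, outcome) if v <= cut and o == "Poor")
    tn = sum(1 for v, o in zip(s100b, outcome) if v <= cut and o == "Good")
    fp = sum(1 for v, o in zip(s100b, outcome) if v > cut and o == "Good")
    return tp / (tp + fn), tn / (tn + fp)

for cut in (-2.5, -2.0, -1.5):
    sens, spec = sens_spec(cut)
    print(f"cut-off {cut}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

As the cut-off rises, sensitivity falls and specificity rises, exactly the trade-off the table displays.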
|    | outcome | predicted_prob |
|----|---------|----------------|
| 1  | Good    | 0.2847         |
| 2  | Good    | 0.3019         |
| 3  | Good    | 0.2288         |
| 4  | Good    | 0.0961         |
| 5  | Poor    | 0.2847         |
| 6  | Poor    | 0.2288         |
| 7  | Good    | 0.6267         |
| 8  | Poor    | 0.3343         |
| 9  | Good    | 0.3643         |
| 10 | Good    | 0.2288         |
ROC curves should never be used to find cut-offs
Diagnostic Test 2: Nucleoside diphosphate kinase A (NDKA)
Let us say the cut-off is 2.5
|   | var   | sensitivity | specificity | pos_lr | neg_lr |
|---|-------|-------------|-------------|--------|--------|
| 1 | s100b | 0.4146      | 0.875       | 3.3171 | 0.669  |
| 2 | ndka  | 0.6098      | 0.5556      | 1.372  | 0.7024 |
##
## DeLong's test for two correlated ROC curves
##
## data: roc_s100b and roc_ndka
## Z = 1.3908, p-value = 0.1643
## alternative hypothesis: true difference in AUC is not equal to 0
## sample estimates:
## AUC of roc1 AUC of roc2
## 0.7313686 0.6119580
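The AUCs that DeLong's test compares have a useful probabilistic reading: the AUC is the probability that a randomly chosen Poor patient has a higher marker value than a randomly chosen Good patient (ties count one half). A sketch on the 10 s100b observations listed earlier; on this tiny subsample the AUC happens to come out at exactly 0.5, while the full-sample values are the 0.731 and 0.612 reported above:

```python
# AUC = P(marker value of a Poor patient > that of a Good patient),
# counting ties as 1/2 -- the Mann-Whitney interpretation of the AUC.
poor = [-2.040, -2.303, -1.833]                            # Poor patients
good = [-2.040, -1.966, -2.303, -3.219, -0.755, -1.715, -2.303]  # Good

wins = sum(1.0 if p > g else 0.5 if p == g else 0.0
           for p in poor for g in good)
auc = wins / (len(poor) * len(good))   # 10.5 / 21
print(auc)                             # → 0.5
```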
Repeated measurements of a given attribute are made on different entities (subjects)
Measurements can be made by different entities (raters) or can be repeated measurements by the same entity (rater)
Raters can be humans or machines
Raters may be fixed or may be a random sample from a larger population; subjects are always a random sample from a larger population
Single rater, multiple measurements: the rater effect is always fixed
Measurements can be nominal, ordinal, or continuous
Aim: to quantify the magnitude of inter-measurement variation
\[ Val_{obs} = Val_{true} + Error \]
\[ Error = Error_{rater} + Error_{instrument} + Error_{unexplainable} \]
Variance is measure of dispersion of individual measurements from mean
Mean can be of total measurements, measurements within each subject and measurements by each rater
\[ Var_{total} = Var_{between-subject} + Var_{between-rater} + Var_{rest} \]
\[ ICC = Var_{between-subject}/(Var_{between-subject} + Var_{between-rater} + Var_{rest}) \]
ICC: between 0 and 1
ICC < 0.5 indicates poor reliability
Data: Anxiety scores given by three raters on 20 subjects (first 10 subjects shown below)
|    | rater1 | rater2 | rater3 |
|----|--------|--------|--------|
| 1  | 3      | 3      | 2      |
| 2  | 3      | 6      | 1      |
| 3  | 3      | 4      | 4      |
| 4  | 4      | 6      | 4      |
| 5  | 5      | 2      | 3      |
| 6  | 5      | 4      | 2      |
| 7  | 2      | 2      | 1      |
| 8  | 3      | 4      | 6      |
| 9  | 5      | 3      | 1      |
| 10 | 2      | 3      | 1      |
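The ICC formula above is a ratio of variance components. As an illustration of the decomposition, a one-way ANOVA sketch on the 10 subjects shown. (The output below is the two-way agreement ICC(A,1) computed on all 20 subjects, so its value will differ from this simplified one-way version.)

```python
# One-way ICC via ANOVA mean squares, on the 10 subjects shown above
from statistics import mean

ratings = [  # rows: subjects; columns: rater1, rater2, rater3
    [3, 3, 2], [3, 6, 1], [3, 4, 4], [4, 6, 4], [5, 2, 3],
    [5, 4, 2], [2, 2, 1], [3, 4, 6], [5, 3, 1], [2, 3, 1],
]
n, k = len(ratings), len(ratings[0])
grand = mean(v for row in ratings for v in row)

# Between-subject and within-subject sums of squares
ss_between = k * sum((mean(row) - grand) ** 2 for row in ratings)
ss_within = sum((v - mean(row)) ** 2 for row in ratings for v in row)
ms_between = ss_between / (n - 1)
ms_within = ss_within / (n * (k - 1))

# One-way ICC: share of total variance due to between-subject variance
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(round(icc, 3))
```

Like the ICC(A,1) of 0.198 below, this one-way value falls well under 0.5, i.e. poor reliability.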
## Single Score Intraclass Correlation
##
## Model: twoway
## Type : agreement
##
## Subjects = 20
## Raters = 3
## ICC(A,1) = 0.198
##
## F-Test, H0: r0 = 0 ; H1: r0 > 0
## F(19,39.7) = 1.83 , p = 0.0543
##
## 95%-Confidence Interval for ICC Population Values:
## -0.039 < ICC < 0.494
Data: ndka measured by an old and a new method (first 10 of 113 pairs shown)

|    | old    | new    |
|----|--------|--------|
| 1  | 1.1019 | 1.1015 |
| 2  | 2.1448 | 2.2711 |
| 3  | 2.0906 | 2.1748 |
| 4  | 2.3437 | 2.6211 |
| 5  | 2.8565 | 2.9799 |
| 6  | 2.5455 | 2.7093 |
| 7  | 1.7918 | 1.7754 |
| 8  | 2.5802 | 2.8231 |
| 9  | 2.7434 | 2.6784 |
| 10 | 1.7934 | 1.8215 |
##
## Pearson's product-moment correlation
##
## data: ndka_df$old and ndka_df$new
## t = 36.283, df = 111, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9428688 0.9725325
## sample estimates:
## cor
## 0.9603319
Linear correlation does not imply agreement
We have perfect agreement only if the points lie along the line of equality (no difference between the methods), but we have perfect correlation if the points lie along any straight line
A change in the scale of measurement does not affect the correlation, but it does affect the agreement
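The distinction can be demonstrated in a few lines: rescaling a measurement (here a hypothetical second method y = 2x + 1) leaves the correlation at exactly 1 while destroying agreement:

```python
# Perfect correlation without agreement: y is a pure rescaling of x
from statistics import mean

x = [1.0, 2.0, 3.0, 4.0, 5.0]       # "method 1"
y = [2 * v + 1 for v in x]          # "method 2": change of scale and offset

# Pearson correlation, computed directly from its definition
mx, my = mean(x), mean(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
sx = sum((a - mx) ** 2 for a in x) ** 0.5
sy = sum((b - my) ** 2 for b in y) ** 0.5
r = cov / (sx * sy)

mean_diff = mean([b - a for a, b in zip(x, y)])  # bias between methods
print(r, mean_diff)                  # r is 1, yet the bias is far from 0
```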
##
## Shapiro-Wilk normality test
##
## data: ba_df$diffs
## W = 0.98909, p-value = 0.5018
The differences are consistent with a normal distribution (Shapiro-Wilk p = 0.50, so normality is not rejected)
The difference between the new and old methods lies between -0.291 and 0.49 in about 95% of cases (the 95% limits of agreement)
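The 95% limits quoted above are Bland-Altman limits of agreement: mean difference ± 1.96 × SD of the differences. A sketch on the 10 pairs shown earlier (the quoted interval of -0.291 to 0.49 was computed on all 113 pairs, so the numbers here differ):

```python
# Bland-Altman 95% limits of agreement on the 10 pairs shown above
from statistics import mean, stdev

old = [1.1019, 2.1448, 2.0906, 2.3437, 2.8565,
       2.5455, 1.7918, 2.5802, 2.7434, 1.7934]
new = [1.1015, 2.2711, 2.1748, 2.6211, 2.9799,
       2.7093, 1.7754, 2.8231, 2.6784, 1.8215]

diffs = [n - o for n, o in zip(new, old)]
bias = mean(diffs)                       # systematic difference (new - old)
sd = stdev(diffs)                        # spread of the differences
lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
print(f"bias={bias:.4f}, LoA=({lower:.4f}, {upper:.4f})")
```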