Interpretacio curta
15/10/12

The role of statistics

“Thus statistical methods are no substitute for common sense and objectivity. They should never aim to confuse the reader, but instead should be a major contributor to the clarity of a scientific argument.”
The role of statistics. Pocock SJ. Br J Psychiat 1980; 137:188-190

Population and Samples

[Diagram labels: Target Population; Population of the Study; Sample; Study Results; Inferential analysis (Statistical Tests, Confidence Intervals); Extrapolation; “Conclusions”]

P-value

• The p-value is a “tool” to answer the question:
– Could the observed results have occurred by chance*?
– Remember:
  • Decision given the observed results in a SAMPLE
  • Extrapolating results to POPULATION
• p < .05: “statistically significant”
*: accounts exclusively for the random error, not bias

P-value: an intuitive definition

• The p-value is the probability of observing data at least as extreme as ours when the null hypothesis is true (no differences exist)
• Steps:
1) Calculate the treatment difference in the sample (A-B)
2) Assume that both treatments are equal (A=B) and then…
3) …calculate the probability of obtaining a difference at least as large as the one observed, given assumption 2
4) Conclude according to that probability:
a. p<0.05: the difference is unlikely to be explained by chance
– we assume that the treatment explains the difference
b. p>0.05: the difference could be explained by chance
– we cannot rule out chance as the explanation

Confidence interval

Intuitively: the true value lies within the interval with 95% confidence

Confidence interval for assessing superiority trials

[Figure: superiority study; 95% CIs plotted against the difference d. Axis labels: d<0 (- effect), d=0 (no differences), d>0 (+ effect); “Control better” / “Test better”. Example intervals labelled “Superiority observed” and “Superiority not observed”.]

Factors influencing statistical significance

• Signal: difference
• Noise (background): variance (SD)
• Quantity: quantity of data

Random vs Systematic error

[Figure: two panels, “Random” and “Bias”, each shown with ↑ sample size]

Observed difference

• False:
– Biases
  • E.g. in selection (non-representative sample)
– Error in measurement or data transcription
– Chance
• Real:
– reflects a difference in the population

Usefulness of Believing in the Existence of God (according to Pascal)

H0: God does not exist
H1: God exists

                        Reality
Pascal's decision       God exists           God does not exist
God exists              Correct              No penalty
God does not exist      Eternal damnation    Correct

Type I & II Error & Power

                        Reality (Population)
Conclusion (sample)     A=B                  A≠B
“A=B” (p>0.05)          OK                   Type II error (β)
“A≠B” (p<0.05)          Type I error (α)     OK (POWER)

Type I & II Error & Power

• Type I Error (α)
– False positive
– Rejecting the null hypothesis when in fact it is true
– Standard: α=0.05
– In words, chance of finding statistical significance when in fact there truly was no effect
• Type II Error (β)
– False negative
– Failing to reject the null hypothesis when in fact the alternative is true
– Standard: β=0.20 or 0.10
– In words, chance of not finding statistical significance when in fact there was an effect

Sample Size

• The planned number of participants is calculated on the basis of:
– Expected effect of treatment(s): ↗ effect ↘ number
– Variability of the chosen endpoint: ↗ variability ↗ number
– Accepted risks in conclusion: ↗ risk ↘ number

[Figure: three histograms of ALTURA (height; y-axis: frequency), each with N = 2000, means of about 165 (165.0, 165.1, 165.1) and standard deviations of 25.54, 32.27 and 26.94]

MULTIPLICITY

Roland Garros Tournament 1999, 1st Round: Carlos Moyá vs Markus Hipfl

                                   Carlos Moyá        Markus Hipfl
Total games won                    22                 24
Total points won                   147                146
1st serve                          62%                69%
Aces                               5                  3
Double faults                      4                  5
% won on 1st serve                 63 of 95 = 66%     61 of 96 = 64%
% won on 2nd serve                 25 of 58 = 43%     20 of 44 = 45%
Winners (including serve)          30                 56
Unforced errors                    62                 75
Break points won                   6 of 21 = 29%      6 of 27 = 22%
Net approaches                     48 of 71 = 68%     29 of 41 = 71%
Fastest serve speed                200 KPH            193 KPH
Average 1st-serve speed            157 KPH            141 KPH
Average 2nd-serve speed            132 KPH            126 KPH

Set             1  2  3  4  5
Carlos Moyá     3  1  6  6  6
Markus Hipfl    6  6  4  4  4

Torturing data…

• To say it colloquially, torture the data until they speak...
– Investigators examine additional endpoints, manipulate group comparisons, do many subgroup analyses, and undertake repeated interim analyses.
– Investigators should report all analytical comparisons implemented. Unfortunately, they sometimes hide the complete analysis, handicapping the reader’s understanding of the results.
Lancet 2005; 365: 1591–95

Multiplicity

[Diagram labels: Design, Conduction, Results]

K independent hypotheses: H01, H02, ..., H0K
S significant results (p<α)

$$\Pr(S \ge 1 \mid H_{01} \cap H_{02} \cap \dots \cap H_{0K} = H_{0\cdot}) = 1 - \Pr(S = 0 \mid H_{0\cdot}) = 1 - (1 - \alpha)^{K}$$

K     Pr(S>=1|H0.)      K     Pr(S>=1|H0.)
1     0.0500            10    0.4013
2     0.0975            15    0.5367
3     0.1426            20    0.6415
4     0.1855            25    0.7226
5     0.2262            30    0.7854

Some examples

          Variables   Times   Subgroups   Comparisons   Total   False positive rate
case A    2           2       2           1             8       33.66%
case B    5           4       3           1             60      95.39%
case C    5           4       3           3             180     99.99%

Handling Multiplicity in Variables

• Scenario 1: One Primary Variable
– Identify one primary variable; other variables are secondary
– Trial is positive if and only if the primary variable shows significant (p < 0.05), positive results

Multiplicity

• Bonferroni correction (simplified version)
– K tests with a global significance level of α
– Each test can be tested at the α/K level
• Example:
– 5 independent tests
– Global level of significance = 5%
– Each test should be tested at the 1% level: 5% / 5 => 1%

Subgroups

• Indiscriminate subgroup analyses pose serious multiplicity concerns. Problems reverberate throughout the medical literature.
Even after many warnings, some investigators doggedly persist in undertaking excessive subgroup analyses.
Lancet 2000; 355: 1033–34
Lancet 2005; 365: 1657–61

Interaction / Confounding factors

[Figures: overall (ALL) differences shown as d=5% and d=6%. Interaction panel: age >= 45 years and age < 45 years, with d=11.5% and d=0.7%. Confounding panel: smokers and non-smokers, with d=0% and d=0%.]

Subgroups & Simpson’s Paradox

ALL
            Experimental n (%)   Control n (%)
Success     70 (70%)             60 (60%)
Failure     30 (30%)             40 (40%)
Total       100                  100

Subgroups & Simpson’s Paradox (cont.)

By sex (MALE, FEMALE):

            Experimental n (%)   Control n (%)
Success     10 (33%)             24 (40%)
Failure     20 (67%)             36 (60%)
Total       30                   60

            Experimental n (%)   Control n (%)
Success     60 (86%)             36 (90%)
Failure     10 (14%)             4 (10%)
Total       70                   40

Subgroups

ISIS-2: Vascular death by star signs

                    Vascular death / Total (%)
                    Aspirin                Placebo                d       p
Gemini/Libra        150 / 1357 (11.1%)     147 / 1442 (10.2%)     -0.9    0.42045
Other star signs    654 / 7228 (9.0%)      868 / 7157 (12.1%)     3.1     <0.0001

Interaction p = 0.019
Lancet 1988; 2: 349–60.

Changes from ISIS-2 results

[Figure] Lancet 2005; 365: 1657–61

• “The answer to a randomized controlled trial that does not confirm one’s beliefs is not the conduct of several subanalyses until one can see what one believes. Rather, the answer is to reexamine one’s beliefs carefully.”
– BMJ 1999; 318: 1008–09.

[Figure] Lancet 2005; 365: 176–86

Let's be critical

• Sometimes things are not what they seem

Let's be critical: can I trust the value?

• Claims without stated results
• Percentages without the denominator
• Means without a confidence interval

Let's be critical

• If data are available... (p<0.05!!!)
• ... they must not be wasted. Well-‘tortured’ data end up singing.

... And what about the denominator?

The famous fantastic dog

Let's be critical: one more example

• A patient is advised to have surgery and asks about the probability of surviving.
• The surgeon answers that in the 30 operations he has performed, no patient has died.
• Which values of P(death) are compatible with this information, with 95% confidence?

Let's be critical: solution

• The approximate solution does not work.
• Exact solution, based on the binomial: [0; 0.116]
• Even if mortality is 11.6%, no death will be observed in 30 operations with Pr=0.025
• Upper limit of the 95% CI for p=0 with n=30: Pr(X=0; n=30, ps) = 0.025

Because afterwards, what happens is what happens
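
The exact upper limit in the surgeon example (0 deaths in 30 operations) can be checked numerically. The sketch below assumes the limit ps is defined as on the slide, by Pr(X=0; n=30, ps) = 0.025, so that ps = 1 - 0.025^(1/30). The function names are illustrative and do not come from the deck. It also shows one reason the approximate (normal) interval is of no use here: with zero observed events its standard error is zero, so the interval collapses to a single point.

```python
import math


def exact_upper_limit_zero_events(n: int, confidence: float = 0.95) -> float:
    """Upper limit ps of the two-sided exact binomial CI when 0 events occur in n trials.

    Solves Pr(X = 0; n, ps) = (1 - ps)**n = (1 - confidence) / 2 for ps.
    """
    tail = (1.0 - confidence) / 2.0  # 0.025 for a 95% interval
    return 1.0 - tail ** (1.0 / n)


def normal_approx_interval(events: int, n: int, z: float = 1.96) -> tuple:
    """Normal-approximation interval p +/- z * sqrt(p * (1 - p) / n)."""
    p = events / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return (p - half_width, p + half_width)


ps = exact_upper_limit_zero_events(30)
# Chance of seeing no deaths in 30 operations if the true mortality were ps
prob_no_deaths = (1.0 - ps) ** 30

print(f"exact 95% interval: [0; {ps:.3f}]")
print(f"Pr(no deaths | mortality = ps) = {prob_no_deaths:.3f}")
print(f"normal approximation: {normal_approx_interval(0, 30)}")
```

Under these assumptions the exact limit rounds to 0.116, while the normal approximation returns an interval of zero width.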
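
The multiplicity formula from the Multiplicity slides, Pr(S>=1|H0.) = 1 - (1 - α)^K, is easy to evaluate directly. The following is a minimal sketch that assumes, as the slide does, K independent tests each run at level α; `familywise_error` and `bonferroni_level` are hypothetical helper names. It recomputes the K table, the total number of tests in cases A to C (variables × times × subgroups × comparisons), and the simplified Bonferroni example with 5 tests.

```python
import math


def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Pr(at least one significant result | all K nulls true), independent tests."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_level(k: int, alpha: float = 0.05) -> float:
    """Per-test level alpha / K (simplified Bonferroni correction)."""
    return alpha / k


# K table from the Multiplicity slide
for k in (1, 2, 3, 4, 5, 10, 15, 20, 25, 30):
    print(f"K={k:>2}  Pr(S>=1|H0.)={familywise_error(k):.4f}")

# Cases A-C: (variables, times, subgroups, comparisons)
cases = {"A": (2, 2, 2, 1), "B": (5, 4, 3, 1), "C": (5, 4, 3, 3)}
totals = {name: math.prod(factors) for name, factors in cases.items()}
for name, total in totals.items():
    print(f"case {name}: {total} tests, false positive rate {familywise_error(total):.2%}")

# Bonferroni example: 5 tests, global level 5%
per_test = bonferroni_level(5)
print(f"per-test level: {per_test:.2%}")
print(f"global rate with 5 tests at that level: {familywise_error(5, per_test):.4f}")
```

With the per-test level set to α/K, the global false positive rate stays just under the intended 5%, which is the point of the correction.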
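
The reversal in the Subgroups & Simpson’s Paradox tables can be verified from the counts alone. This sketch uses the success counts and group sizes from those tables; the keys `subgroup 1` and `subgroup 2` are placeholder names for the two sex-specific tables, and the helper functions are illustrative.

```python
# (successes, total) per arm, taken from the two subgroup tables
subgroups = {
    "subgroup 1": {"experimental": (10, 30), "control": (24, 60)},
    "subgroup 2": {"experimental": (60, 70), "control": (36, 40)},
}


def rate(successes: int, total: int) -> float:
    """Success proportion."""
    return successes / total


def pooled(arm: str) -> tuple:
    """Sum successes and totals for one arm across all subgroups."""
    successes = sum(groups[arm][0] for groups in subgroups.values())
    total = sum(groups[arm][1] for groups in subgroups.values())
    return (successes, total)


for name, arms in subgroups.items():
    exp_rate = rate(*arms["experimental"])
    ctl_rate = rate(*arms["control"])
    print(f"{name}: experimental {exp_rate:.0%}, control {ctl_rate:.0%}")

all_exp = pooled("experimental")
all_ctl = pooled("control")
print(f"ALL: experimental {rate(*all_exp):.0%}, control {rate(*all_ctl):.0%}")
```

Control does better within each subgroup, yet the experimental arm does better overall, because the two arms are split very unevenly between the subgroups.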