Resources

Slides

Exercise Slides

Recording Lecture, Exercise

Replicability Crisis

Despite the scientific method and statistical tools, many research findings have been found to be non-replicable. In psychology, Brian Nosek found a replicability of only 36%, and even in experimental economics only 61% of studies could be replicated successfully. This has led to a “replicability crisis” in various scientific fields.

A significant factor to ensure replicability is to set up studies in a transparent and reproducible manner, which is the focus of Open Science. This includes coming up with a hypothesis and registering it before data collection, sharing data and code, and publishing in open access formats.

Questionable Research Practices

When researchers deviate from the ideal scientific method, it can lead to questionable research practices that inflate false positive rates. Two common practices are:

HARKing

Hypothesizing After the Results are Known. This practice can lead to false positives and is discouraged in scientific research. If collected data changes the hypothesis, a new study should be conducted to test the new hypothesis.

p-hacking

Manipulating data analysis until statistically significant results are obtained. This can involve trying multiple analyses and only reporting those that yield significant results, or collecting more data until significance is reached. It may also involve selective reporting of studies or rounding p-values to appear significant.

Degrees of Freedom in Data Collection

Researchers have many degrees of freedom in how they collect and analyze data, which can lead to biased results. Examples include:

Deciding when to stop data collection based on interim results
Defining and exluding outliers post hoc
Choosing covariates and moderators after seeing the data
Reported measures and conditions based on what yields significant results
Data transformations and statistical models chosen after looking at the data

Ideally, all these decisions should be decided a priori and pre-registered to avoid bias.

Result-contingent decisions can increase the false positive rate significantly. For a study comparing two groups on 20 observations, the false positive rate can rise from 5% to:

Degree of Freedom	False Positive Rate
A: Adding an independent variable	9.5%
B: Adding 10 more observations per condition, if original sample size is non-significant	7.7%
C: Controlling for a third binary variable, e.g. gender	11.7%
D: Including an additional condition and dropping one if it yields non-significant results	12.6%
A + B	14.4%
A + B + C	30.9%
A + B + C + D	60.7%

Preregistration

To solve the issue of questionable post-hoc decisions, researchers can preregister their studies. Preregistration helps to distinguish between confirmatory and exploratory analyses and reduces the risk of biased results. Once submitted to a platform like AsPredicted or OSF, the plan is timestamped and cannot be changed. Such a plan includes:

Research question and hypotheses
Dependent and independent variables and how they connect to the hypotheses
Study design, including sample size (based on power analysis) and data collection methods
Statistical models to adress the hypotheses, including transformations
Criteria for data exclusion and handling of outliers

If there are deviations from the preregistered plan, they should be transparently reported in the publication. Additional analyses can be conducted but should be labeled as exploratory.

Open Data

To allow for verification and replication, researchers should share their data and code on platforms like OSF. This may include materials used for data collection, preregistration files, collected data and analysis code. Once submitted, everyone should be able to access and reproduce the results.

Open data also allows other researchers to conduct meta-analyses and build upon existing work.

Bayesian Analysis

Traditional null hypothesis significance testing (NHST) has limitations, such as the arbitrary p-value threshold and the inability to provide evidence for the null hypothesis. When $H_{0}$ and $H_{1}$ are equally likely, under a p-value of $0.05$ , $H_{1}$ is correct in 94% of cases. However, if $H_{1}$ is only true in 10% of cases, the probability that $H_{1}$ is correct given a p-value of $0.05$ drops to 64%.

To better understand the evidence provided by data, or the trustworthiness of results, Bayesian statistics can be used. Bayesian analysis allows for the incorporation of prior beliefs and provides a more intuitive interpretation of results.

Instead of calculating the probability of the data given the null hypothesis (-value), Bayesian analysis calculates the probability of the hypothesis given the data.

p (H ∣ D) = \frac{p ( D ∣ H ) \times p ( H )}{p ( D )}

Posterior = (Likelihood × Prior) / Evidence

Here, the prior represents our belief about the hypothesis before seeing the data, and the posterior represents our bayed belief after considering the data.

Bayes Factor

The Bayes Factor (BF) quantifies the evidence provided by the data in favor of one hypothesis over another. It is calculated as:
$B F_{10} = \frac{p ( D ∣ H _{1} )}{p ( D ∣ H _{0} )} = \frac{1}{B F _{01}}$
It is a likelihood ratio of the data under two competing hypotheses, or the difference of the density for the reference value under the posterior and prior distribution. Contrary to p-values, Bayes Factors can provide evidence for both $H_{1}$ and $H_{0}$ .

Bayesian t-tests are more consistent than traditional t-tests, especially with small sample sizes.

Flo's Notes

Explorer

Open Science and Replicability

Replicability Crisis

Questionable Research Practices

Degrees of Freedom in Data Collection

Preregistration

Bayesian Analysis

Graph View

Table of Contents

Backlinks