Resources

This part just covered data visualization, so it isn't re-noted here; the slides should be self-explanatory.

Variable Association

Apparently, there is no need to learn these formulas by heart.

These can also be used to identify outliers or determine if a value is typical for a distribution. Spearman is more robust to outliers than Pearson.

Pearson: Product-Moment Correlation

At least interval data. [-1 ≤ r ≤ 1] for perfect negative to perfect positive linear relationship

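$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where n is the number of cases, x_i and y_i are the values of case i on the two variables, and x̄ and ȳ are their means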

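r measures the strength and direction of the linear relationship, e.g. how closely the points in a scatter plot follow a straight line. It is not the slope itself: the slope of the regression line is r · s_y / s_x, where s_x and s_y are the standard deviations of the two variables.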
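A minimal sketch of how r could be computed by hand, using only the Python standard library. The function name `pearson_r` and the example data are made up for illustration; the slides may present the formula in a different but equivalent form.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)

# Made-up example data (not from the lecture)
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 3))
```

A value near +1 or -1 means the points lie close to a straight line; a value near 0 means there is little linear relationship.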

Spearman: Rank Correlation

At least ordinal data. [-1 ≤ ρ ≤ 1] for perfect negative to perfect positive monotonic relationship

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where d_i is the difference between the ranks of each observation on the two variables and n is the number of cases
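A small sketch of the rank-difference computation, assuming the usual form ρ = 1 − 6 Σ d_i² / (n(n² − 1)), which is only exact when there are no tied values. The helper names `ranks` and `spearman_rho` and the data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. This sketch assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch assumes no tied values")
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

def spearman_rho(xs, ys):
    """Spearman rank correlation via the rank-difference formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rank_x, rank_y = ranks(xs), ranks(ys)
    # d_i is the difference between the two ranks of observation i
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Made-up data with one extreme y value (400)
xs = [10, 20, 30, 40, 50]
ys = [1, 3, 2, 5, 400]
rho = spearman_rho(xs, ys)
print(round(rho, 3))
```

Because only the ranks enter the formula, the extreme value 400 counts simply as "largest", which is why Spearman reacts less to outliers than Pearson.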

Cramér’s V

For when both variables are nominal (categorical). [0 ≤ V ≤ 1] for no association to perfect association

Cramér’s V can be used to determine the effect size of a Chi-Square Test.
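As a rough sketch of that link: V is commonly computed from the chi-square statistic as V = sqrt(χ² / (n · min(k − 1, r − 1))), where n is the total number of cases and k and r are the numbers of categories of the two variables. The function name `cramers_v` and the contingency table below are made up, and no continuity correction is applied.

```python
from math import sqrt

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n_rows, n_cols = len(table), len(table[0])
    if n_rows < 2 or n_cols < 2:
        raise ValueError("need at least a 2x2 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one case")
    n = sum(row_totals)
    # Chi-square statistic: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Made-up 2x2 table of observed counts
observed_counts = [[10, 20],
                   [20, 10]]
v = cramers_v(observed_counts)
print(round(v, 3))
```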

Interpretation

Interpreting Cramér’s V depends on the degrees of freedom (df), which is calculated as df = min(k − 1, r − 1), where k is the number of categories in one variable and r is the number of categories in the other variable.

df   Small Effect   Medium Effect   Large Effect
1    0.10           0.30            0.50
2    0.07           0.21            0.35
3    0.06           0.17            0.29
4    0.05           0.15            0.26
5    0.04           0.13            0.23
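One way to read the table is as a lookup: take df = min(k − 1, r − 1), then compare V against the thresholds in that row. The sketch below copies the thresholds from the table as given; the function name `effect_size_label`, the "below small" label, and treating each threshold as a lower bound are assumptions made for illustration.

```python
# Thresholds (small, medium, large) per df, copied from the table in these notes
THRESHOLDS = {
    1: (0.10, 0.30, 0.50),
    2: (0.07, 0.21, 0.35),
    3: (0.06, 0.17, 0.29),
    4: (0.05, 0.15, 0.26),
    5: (0.04, 0.13, 0.23),
}

def effect_size_label(v, k, r):
    """Label Cramér's V for variables with k and r categories."""
    df = min(k - 1, r - 1)
    if df not in THRESHOLDS:
        raise ValueError("the table only covers df from 1 to 5")
    small, medium, large = THRESHOLDS[df]
    # Assumption: a threshold is the lower bound of its effect-size band
    if v >= large:
        return "large"
    if v >= medium:
        return "medium"
    if v >= small:
        return "small"
    return "below small"

# Hypothetical example: V = 0.33 for a 3-category by 4-category table (df = 2)
print(effect_size_label(0.33, 3, 4))
```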