Resources
Bro just explained the data visualization stuff, so I'm not re-noting it here; the slides should be self-explanatory.
Variable Association
Apparently there's no need to learn these formulas by heart.
These can also be used to identify outliers or determine if a value is typical for a distribution. Spearman is more robust to outliers than Pearson.
Pearson: Product-Moment Correlation
At least interval data. [-1 ≤ r ≤ 1] for perfect negative to perfect positive linear relationship
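$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where n is the number of cases, and $\bar{x}$, $\bar{y}$ are the means of the two variables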
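r measures the strength and direction of the linear relationship, i.e. how tightly the points in e.g. a scatter plot cluster around a straight line. It is not the slope itself: the slope of the regression line is r · (s_y / s_x), where s_x and s_y are the standard deviations of the two variables.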
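A minimal sketch of computing Pearson's r by hand, assuming the usual deviation-from-the-mean form of the formula. The function name `pearson_r` and the sample data (`hours`, `score`) are made up for illustration; only the standard library is used.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of cross-products of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator parts: sums of squared deviations for each variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pearson_r(hours, score)
print(round(r, 3))  # close to +1: strong positive linear relationship
```

In practice a library routine such as `scipy.stats.pearsonr` would normally be used instead; the hand-written version is only here to make the formula concrete.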
Spearman: Rank Correlation
At least ordinal data. [-1 ≤ ρ ≤ 1] for perfect negative to perfect positive monotonic relationship
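$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where d_i is the difference between the ranks of each observation on the two variables, and n is the number of cases. This short form is exact only when there are no tied ranks.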
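A small sketch of the rank-difference calculation, assuming no tied values (with ties, the d_i shortcut is no longer exact and average ranks would be needed). The names `ranks` and `spearman_rho` and the data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman rank correlation via rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    if len(set(xs)) != n or len(set(ys)) != n:
        raise ValueError("this sketch does not handle tied values")
    rank_x, rank_y = ranks(xs), ranks(ys)
    # d_i is the rank difference of observation i on the two variables.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Monotonic but non-linear relationship (y = x^3): the ranks agree perfectly.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
rho = spearman_rho(xs, ys)
print(rho)  # 1.0
```

Because only the ranks enter the calculation, one extreme value changes ρ no more than any other value holding the same rank position.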
Cramér’s V
For when both variables are nominal (categorical). [0 ≤ V ≤ 1] for no association to perfect association
Cramér’s V can be used to determine the effect size of a Chi-Square Test.
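A rough sketch of how V falls out of a Chi-Square statistic, assuming the standard definition V = sqrt(χ² / (n · min(k − 1, r − 1))) with no continuity correction. The function name `cramers_v` and the contingency table are invented for illustration, and every expected cell count is assumed to be non-zero.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Normalise by n and the smaller of (rows - 1) and (columns - 1).
    smaller_dim = min(len(row_totals) - 1, len(col_totals) - 1)
    return math.sqrt(chi2 / (n * smaller_dim))

# Hypothetical 2x2 table: rows = group A / group B, columns = yes / no.
observed = [[20, 30],
            [30, 20]]
v = cramers_v(observed)
print(round(v, 2))  # 0.2
```

For real data, `scipy.stats.contingency.association(table, method="cramer")` computes the same statistic.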
Interpretation
Interpreting Cramér’s V depends on the degrees of freedom (df), which is calculated as $df = \min(k - 1, r - 1)$, where k is the number of categories in one variable and r is the number of categories in the other variable. The thresholds below are the minimum V for each effect size.
| df | Small Effect | Medium Effect | Large Effect |
|---|---|---|---|
| 1 | 0.10 | 0.30 | 0.50 |
| 2 | 0.07 | 0.21 | 0.35 |
| 3 | 0.06 | 0.17 | 0.29 |
| 4 | 0.05 | 0.15 | 0.26 |
| 5 | 0.04 | 0.13 | 0.23 |
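The table can be turned into a simple lookup. This is only a sketch: the thresholds are copied as-is from the table in these notes, each value is assumed to be a lower bound for its effect size, and the function name `effect_label` and the "negligible" label for values below the small threshold are my own additions.

```python
# df -> (small, medium, large) thresholds, copied from the table in these notes.
THRESHOLDS = {
    1: (0.10, 0.30, 0.50),
    2: (0.07, 0.21, 0.35),
    3: (0.06, 0.17, 0.29),
    4: (0.05, 0.15, 0.26),
    5: (0.04, 0.13, 0.23),
}

def effect_label(v, df):
    """Classify Cramér's V for df = min(k - 1, r - 1) between 1 and 5."""
    if df not in THRESHOLDS:
        raise ValueError("the table only covers df from 1 to 5")
    small, medium, large = THRESHOLDS[df]
    if v >= large:
        return "large"
    if v >= medium:
        return "medium"
    if v >= small:
        return "small"
    return "negligible"  # hypothetical label, not part of the table

# The same V reads differently depending on df.
print(effect_label(0.20, 1))  # small
print(effect_label(0.20, 3))  # medium
```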