Resources
The lecturer just explained the data visualization material, so I'm not re-noting it here; the slides should be self-explanatory.
Levels of Measurement
- Nominal: Categories without order
- Ordinal: Categories with order
- Interval: Numeric scales with equal intervals, no true zero
- Ratio: Numeric scales with equal intervals and a true zero
Variable Association
Apparently there is no need to learn these formulas by heart.
These can also be used to identify outliers or determine if a value is typical for a distribution. Spearman is more robust to outliers than Pearson.
Pearson: Product-Moment Correlation
At least interval data.
[-1 ≤ r ≤ 1] for perfect negative to perfect positive linear relationship
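$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where n is the number of cases and x̄, ȳ are the sample means of the two variables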
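r describes the strength and direction of a linear relationship, e.g. how closely the points in a scatter plot follow a straight line. It is not the slope itself: the slope of the fitted line is r · (s_y / s_x), where s_x and s_y are the standard deviations of the two variables.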
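A minimal sketch of how r can be computed by hand, assuming the usual deviation-from-the-mean form of the formula. The function name `pearson_r` and the sample data are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the squared-deviation sums.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: study hours vs. exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]
r = pearson_r(hours, score)
```

A perfectly linear pair such as `[1, 2, 3]` and `[2, 4, 6]` gives r = 1, and reversing one of them gives r = -1.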
Spearman: Rank Correlation
At least ordinal data.
[-1 ≤ ρ ≤ 1] for perfect negative to perfect positive monotonic relationship
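$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where d_i is the difference between ranks of each observation on the two variables, n is the number of cases (this form assumes there are no tied ranks)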
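A small sketch of the rank-difference calculation, assuming the common shortcut ρ = 1 − 6·Σd_i² / (n(n² − 1)), which only holds when no values are tied. The helper names `ranks` and `spearman_rho` are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this simple formula assumes no tied values")
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman rank correlation via the d_i shortcut formula (no ties)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    # d_i is the rank difference of observation i on the two variables.
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Made-up data with one extreme x value: the relationship is monotonic
# but not linear, so the ranks still agree perfectly.
xs = [1, 2, 3, 4, 100]
ys = [2, 4, 6, 8, 9]
rho = spearman_rho(xs, ys)
```

Because only the ranks enter the calculation, the extreme value 100 counts simply as "largest", which is why Spearman reacts less to outliers than Pearson.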
Cramér’s V
For when both variables are nominal (categorical).
[0 ≤ V ≤ 1] for no association to perfect association
Cramér’s V can be used to determine the effect size of a Chi-Square Test.
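As a rough sketch of the link to the Chi-Square Test, the snippet below computes χ² from a contingency table and then V, assuming the usual definition V = √(χ² / (n · min(k − 1, r − 1))). The function name `cramers_v` and the example counts are made up.

```python
from math import sqrt

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    df = min(len(row_totals) - 1, len(col_totals) - 1)
    if df < 1 or 0 in row_totals or 0 in col_totals:
        raise ValueError("need at least a 2x2 table with no empty row or column")
    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * df))

# Hypothetical 2x2 tables of counts.
no_association = cramers_v([[5, 5], [5, 5]])
perfect_association = cramers_v([[10, 0], [0, 10]])
```

The first table has identical rows, so V is 0; in the second, each row category maps to exactly one column category, so V is 1.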
Interpretation
Interpreting Cramér’s V depends on the degrees of freedom (df), calculated as $df = \min(k - 1, r - 1)$, where k is the number of categories in one variable and r is the number of categories in the other variable.
| df | Small Effect | Medium Effect | Large Effect |
|---|---|---|---|
| 1 | 0.10 | 0.30 | 0.50 |
| 2 | 0.07 | 0.21 | 0.35 |
| 3 | 0.06 | 0.17 | 0.29 |
| 4 | 0.05 | 0.15 | 0.25 |
| 5 | 0.04 | 0.13 | 0.22 |
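The thresholds in the table look like the benchmarks 0.10, 0.30 and 0.50 divided by √df. That derivation is an assumption on my part, not something stated in these notes, so computed values may differ slightly from the table through rounding. Under that assumption, a lookup can be sketched as follows; `effect_size_label` and the "negligible" label are hypothetical.

```python
from math import sqrt

def effect_size_label(v, k, r):
    """Label Cramér's V as small/medium/large for a k x r table."""
    df = min(k - 1, r - 1)
    if df < 1:
        raise ValueError("both variables need at least two categories")
    # Assumed rule: the df = 1 thresholds scaled down by sqrt(df).
    small, medium, large = (w / sqrt(df) for w in (0.10, 0.30, 0.50))
    if v >= large:
        return "large"
    if v >= medium:
        return "medium"
    if v >= small:
        return "small"
    return "negligible"

# The same V counts as a larger effect when df is higher.
label_2x2 = effect_size_label(0.32, 2, 2)  # df = 1
label_4x5 = effect_size_label(0.32, 4, 5)  # df = 3
```

With df = 1 a V of 0.32 sits just above the medium threshold, while with df = 3 the same value already exceeds the large threshold.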