Please take your seats. >> Good afternoon and welcome to week six of the module on research methods in management and economics. This week we are entering the world of regression analysis, and as you can see from the schedule, we will stay in that world for a couple of weeks, because regression analysis is a very flexible and very powerful framework that is part of many statistical analyses and tools you can use to analyze data. Before we get into that, let’s quickly recap the issues we talked about last week, when we took a first step into applying the techniques of frequentist statistical inference and got to know a couple of prominent approaches that have been developed to test hypotheses. The first self-quiz question that I posed at the end of last week was: which statistical test is indicated for each of the following situations? I’m going to present you with three situations, and for each test, also give the test statistic that is used to compute a p-value. First, comparing the means of three or more groups across one or several factors or independent variables. What is the statistical tool we talked about that is applicable in this situation? Yes, we talked about one-way ANOVA and factorial ANOVA. The analysis of variance, ANOVA for short, is applicable when you have the means of several groups and/or several factors, and you want to see to what extent these factors contribute to explaining the data, or whether they might also be interacting with each other. Okay, here’s the second situation. We are interested in analyzing the association of two nominal-level variables. Which statistical test is applicable here? Yes, please. Chi-square, exactly. Chi-square is applicable in this situation. And what is the test statistic that’s used for the chi-square test? Yeah? - Chi-square? - Well, it’s the chi-square value. Sorry, I think I forgot to ask this question for the previous one: if we’re comparing the means of three or more groups across one or several factors using an analysis of variance, what is the test statistic we’re using? Yes, it’s the F statistic, exactly. So F is our test statistic; we discussed how to compute it, and we used the empirically observed F value to compute a p-value. Okay, and the third situation: when we are comparing the means of two independent groups, what is the statistical test we’re applying? Yeah? - T-test. - It’s a t-test, exactly. We talked about several variants of the t-test. We also talked about the situation where we have two dependent measurements for each individual; we can use a t-test there as well, which would then be a dependent-samples t-test. And we also talked about the one-sample t-test, when we are comparing the mean observed in one group against a reference value. And what is the test statistic that the t-test uses to compute a p-value? It’s t. This might sound trivial, but it’s always good to remember what these names stand for, and also how we actually get to a p-value. Great. The second, bigger self-quiz question I posed at the end of last week was: give an effect size measure for each of these three tests. So maybe let’s start with the ANOVA.
What kind of effect size or effect sizes did we talk about? We talked about eta squared and omega squared, exactly. In general, remember that in an ANOVA the p-value gives us an indication of whether there’s a significant relationship between the dependent and the independent variables. It doesn’t tell us anything about the size of the effect, and this is why, in addition to a p-value, it’s also very helpful and instructive to compute and report an effect size measure. That’s why we discussed them. Okay, so those were the effect sizes for the analysis of variance. What about the effect size for a chi-square test? How can we express it? Cramér’s V, exactly, that’s the one we talked about. And, as for the effect sizes for the ANOVA, we talked about conventions that have been established as benchmarks for defining what a small, a medium, and a large effect is. Okay, and finally, what about the t-test? What is the frequently used effect size measure for a t-test, irrespective of whether it’s an independent-samples, dependent-samples, or one-sample t-test? Yes, please. Cohen’s d is the most frequently used effect size measure for comparing the means of two groups, for instance. Okay, and then the final self-quiz question was: imagine that in a factorial ANOVA you obtain a p-value of 0.02 for the interaction between two factors. How do you interpret this result? So this really gets at the issue of what an interaction means. Yes, please. - So in an ANOVA we have two factors, and each factor has different levels. - Mm-hmm. - In the exercise example we had clinics and suppliers as two factors, with four clinics and four suppliers. The null hypothesis states that there is no interaction, meaning the effect of supplier is the same for all clinics and the effect of clinic is the same for all suppliers. Here the p-value is 0.02, and let’s say we have the usual threshold of alpha, which is 5%; that is the type I error rate we are willing to accept if the null hypothesis is true. In this case p is smaller than 0.05, so we reject the null hypothesis. That means the effect of supplier is not the same for all clinics, and the effect of clinic is not the same for all suppliers. And as I remember, there is one important point: if one of these levels is different and the other three are the same, it means there is still an interaction; you could conclude that. - Yes, thank you very much. Exactly. So, in short, there’s one rule of thumb you can remember to make sense of an interaction: an interaction expresses, or tests, whether the effect of one factor depends on the level of the other factor. So when you have an interaction in front of you, that usually means you have at least two factors in front of you, okay? And if that interaction, like in the present case, is significant, that means that the magnitude of the effect of one factor depends on the level of the other factor. So there’s an interaction between the two. We’re going to continue our discussion of interactions next week, where you’ll see how useful it is to test for interactions, because that can tell you something about, for instance, a moderation effect between several variables.
Great, are there any remaining questions on the issues we talked about last week? Yes, please. (audience member speaking off microphone) So the question is whether it makes any difference if we first calculate an effect size and then a p-value. Is that the question? Basically, it doesn’t really matter in which order you calculate these different things. But maybe the background of the question is: can it occur that we have a large effect size but a non-significant p-value? And that can happen. For instance, if we have a relatively small sample size, we might observe a medium or even large effect, but because we have only a relatively limited basis for evaluating our null hypothesis, we might still get a non-significant result. At the same time, we can also get a significant result, like a significant t-test or a significant analysis of variance, but observe a very small effect size, for instance when we have a very large sample size. And then of course we have to think about whether we’re still interested in this significant result, because if the effect is actually pretty small, we have to consider whether it is practically important for us to interpret the difference we observed. So it’s always very informative to look both at the p-value and at the effect size, and then, depending on the size of the effect, to consider whether it is relevant for the practical case in front of us. It’s always good practice to evaluate and comment on the effects that you observe. Other questions? Okay, that’s not the case. So let’s start our journey into the world of regression analysis. My goals for this week are as follows. First, that you know why you’re actually computing a regression analysis, what kinds of purposes we might have in mind when conducting one. Second, that you have understood the parameters of a simple linear regression model and how these parameters are estimated. Usually a software package will provide you with the estimates, but I also want to give you a general sense of the computations underlying them. A third goal is that you know how to evaluate the results of a regression analysis statistically, so how we can make sense of, for instance, estimated regression coefficients, and you won’t be surprised that for these statistical evaluations we will be using p-values. A fourth goal is that you know the assumptions underlying simple linear regression, and that you have understood how to check these assumptions and see to what extent they might be violated. That’s our program for today. So here is again our position in the tree, or the forest, of different statistical tools. Today we’re going to be visiting the upper branch of this tree, where we’re talking about situations with a continuous dependent variable, where we have a continuous or mixed set of independent variables, and where we have one predictor, one independent variable. This is just to give you a general orientation; next week we’re going to look at the somewhat extended situation where we have more than one predictor. And the key difference, as you’re going to see, compared to last week is that we now have a continuous (or mixed) independent variable.
Last week, when we talked about the t-test and the analysis of variance, we had categorical or nominal-level independent variables in front of us. So, to recap: last week we discussed the situation, in the context of the t-test, where we compared Chinese and Japanese business leaders in terms of their decision style. In this situation we had a nominal independent variable, namely nationality, Japanese versus Chinese, and we had a continuous dependent variable, namely decision style: the score measured for each participant in terms of the preference for an analytic decision style. By the way, in terms of terminology, we can also refer, and we will be referring, to the independent variable as the predictor, and to the dependent variable as the outcome variable; it’s the same thing, so don’t get confused, okay? Whenever we talk about a predictor, we are referring to the independent variable, and whenever we talk about an outcome variable, we mean the dependent variable. We can use these terms interchangeably. And basically the only change, or the only extension, that we’re looking at today is a situation where the independent variable is continuous. Here’s an example. Let’s assume we’re working for a music company and we’re interested in album sales, and we want to predict these album sales as a function of how much money we pump into the advertising machine. That could be different amounts of euros or dollars, and that is then a continuous variable. In this case we would not be able to use a t-test or an analysis of variance, but because we have a continuous, interval-level predictor or independent variable, we’re going to be using a regression model. Now, of course, you could ask: why should we be using a regression model? What kinds of purposes, what kinds of questions, does a regression analysis answer? One key reason we might be interested in running a regression analysis is simply that we want to describe the relationship between a dependent or outcome variable and a set of predictors or independent variables. So, simply to describe what we have in our data, to describe a regularity that we observe in the given set of observations, purely for descriptive purposes. I’m going to give you an example in a second. The second goal of a regression analysis is to fit a model that estimates the relationship between a dependent variable and an independent variable, or a set of independent variables. Multiple regression is the case where we have multiple independent variables, and that is what we’ll talk about next week. But we can also use the model not only to describe what we have in our set of observations, but to make predictions for a new set of cases. Here’s one example. Let’s assume we’re interested in how a person’s risk-taking is associated with how old the person is, how wealthy the person is, and what emotional level the person usually experiences, so how much positive and negative affect. Risk-taking, a person’s willingness to take risks, would be our dependent variable, and the person’s age, wealth, and affect would be our independent variables, our predictors.
For instance, we could describe the association between risk-taking on the one hand and age, wealth, and affect on the other hand; that could be one purpose of our regression analysis. But maybe we also want to be able to predict how willing a person is to take risks, even though we haven’t measured that person’s willingness to take risks, because we do have information about the person’s age, wealth, and affect. Based on a regression model, we can make a prediction for this new person, for whom we have observations on the predictor variables, and make a good guess at how risk-taking the person might be. And this is something we could do based on our estimated regression model. Second example, coming back to our music sales. We might be interested in the association between the number of music albums sold or downloaded, how much advertising budget we invest in a particular album, and how much we make sure that radio stations play that album. In this case, the sales of the music albums would be our dependent variable, and the size of the advertising budget and the amount of airplay would be our independent variables, our predictors. And for a new album that’s just coming onto the market, we might consider how much money we would like to invest in promoting it and how much we would like to make sure it is played on the radio, and then, based on these values on the predictors, we might want to predict how successful the album will be. We could do this using a fitted regression model that was trained, or estimated, on the previous dataset. So these are two main purposes of running a regression analysis, and it can be very useful to make predictions for new cases using a regression model. Today, as I said, we’re going to be looking at the case of so-called simple linear regression. It’s called simple linear regression because the regression model we’re discussing as a starting point is simple: it has only one predictor, and that one predictor predicts the one outcome variable, or dependent variable, okay? So let’s take a simple example to illustrate simple regression analysis. Let’s assume we’re looking at the association between two variables: the weight of a person, how heavy the person is, and the person’s height, how tall the person is. And let’s assume we have a bunch of observations, let’s say 100 observations from different people, and you can see these measurements shown here in this figure. Does anyone remember what this type of plot is called? It’s a scatter plot, exactly. Very simple; we’re working with things we have talked about previously. And we already talked about how we can quantify the association between these two variables. Does anyone remember how we could quantify this? (audience member speaking off microphone) Yes, this looks like a positive association. Does anyone disagree? Probably not. But could we quantify the strength of the association? We could, for instance, compute a Spearman correlation if we’re interested in the rank correlation between these two variables, or we could compute a Pearson correlation.
Let’s assume we computed a Pearson correlation and we obtain a pretty high correlation of 0.92, which actually makes sense, because heavier people are usually also taller. Not perfectly, but the association is pretty strong. Okay. Now, we might want to model this relationship between height and weight, and also be able to make a prediction for someone else: just knowing how heavy the person is, make a prediction about how tall the person might be. How could we do this? Well, we could build a very simple statistical, mathematical model of this relationship, basically by drawing a straight line through these data points. It’s a statistical model, a very simple model that we can understand mathematically pretty easily. And by the way, just in terms of terminology, for this model we assume for the time being that weight in kilograms is our independent variable, our predictor, usually referred to as x, and the person’s height is our dependent variable, usually referred to as y; that’s our outcome variable, okay? So we are trying to predict height based on the weight of a person. Now, if we have a model that’s formulated geometrically as a straight line, we can express it mathematically as an additive function with two parameters, b0 and b, where b is used to multiply our predictor variable x. In this simple equation, ŷ = b0 + b·x, the y-hat is our predicted value of y; whenever you see a hat, that means we have an estimated or predicted value of something. We express this predicted value of y as a function of b0 plus b times x, where x is the value of our predictor variable and b is the slope of our regression line. When we have a very steep slope, we have a higher value of b than when we have a relatively shallow slope. Now, importantly, this value of b, the multiplier of x, expresses how strongly our prediction of y changes if x, our predictor, increases by one unit. Because we’re multiplying b with x, it indicates how much our prediction changes when the value of x increases by one unit. So whenever you see a regression coefficient, a slope, try to understand what it means in terms of changes; we’re going to do this quite a bit in the exercise, too. And then the intercept, that’s the name for the second parameter, b0: that’s the value of our prediction when the value of the predictor x is zero, which makes sense, because if x is zero, the entire b·x part of the equation is zero, and then b0 is the only thing that determines the prediction of y. Okay? So what does this all mean in the context of predicting a person’s height from a person’s weight? For this dataset we have an estimated intercept of 125.2 and an estimated slope, a regression coefficient b, of 0.6, and that value is used to multiply the respective value of the predictor weight; I’m going to tell you in a second how these values were estimated. Applying what I pointed out above for the interpretation of the slope, this 0.6 means that if we increase the value of our predictor, weight, by one kilogram, because kilograms are the unit of our predictor, then for a person who is one kilogram heavier than another person, our prediction of that person’s height increases by 0.6 relative to the other person, in the units of our dependent variable, height, which is centimeters. So we increase weight by one kilogram, our unit, and the predicted value on the outcome variable increases by 0.6 units of the dependent variable: 0.6 centimeters. Okay, got it? Great.
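To make this concrete, here is a minimal sketch in R of fitting such a simple linear regression. The actual 100 weight/height observations from the slides are not reproduced here, so the data below are simulated stand-ins, tuned so that the estimates come out close to the values just discussed (intercept around 125, slope around 0.6); the 80 kg person in the last line is just a hypothetical example.

```r
set.seed(1)
weight <- rnorm(100, mean = 72, sd = 12)           # predictor x: weight in kg (simulated)
height <- 125 + 0.6 * weight + rnorm(100, sd = 3)  # outcome y: height in cm (simulated)

fit <- lm(height ~ weight)   # least-squares fit of height = b0 + b * weight
coef(fit)                    # intercept close to 125, slope close to 0.6
# Slope interpretation: a person who is 1 kg heavier is predicted to be about 0.6 cm taller.
predict(fit, newdata = data.frame(weight = 80))    # predicted height for a (hypothetical) 80 kg person
```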
Actually, if you think about this regression equation, you’re absolutely entitled to say that it’s nonsense, for the following reason. The intercept, as I introduced it previously, is the value of our prediction when the value of the predictor is zero. In other words, in this specific case, it means that for a person who is weightless, where the value of x is zero, we predict that the person is 125 centimeters tall. That’s nonsense. But strictly speaking, that’s the prediction that falls out of our model, and our model is based on the observations we have. What I wanted to say is that the value of this intercept is difficult to interpret, because it doesn’t really make sense in this specification. For that reason, it’s often useful to transform the values of the predictor such that we have an easier time interpreting the intercept. The transformation we’re doing here is so-called centering of the predictor. Centering a variable is really simple: we simply subtract from each value the mean value in our sample, okay? So that, among the transformed values, a value of zero on the predictor means someone has exactly the mean value in our sample. If you transform the values like this, you see that the distribution looks exactly the same as with the non-transformed values, but now our predictor has a mean of zero, and a value of zero corresponds to the mean value on the predictor, namely the mean weight. And if we then use these centered values of the predictor and compute our regression model again, we get exactly the same slope, exactly the same value of 0.6, but we get a different intercept. And now this intercept actually makes more sense, because it expresses what our prediction of a person’s height is when the person has a value of zero on the predictor, and we have transformed the values such that a value of zero means the mean weight. That basically means this 168.3 is our prediction of a person’s height if the person has the mean weight. So sometimes it can be very useful to simply transform the values of the predictor to have an easier time interpreting the intercept; here you can see in words what the intercept means. It’s not a trick, it’s a simple convenience to make better sense of the estimated parameters. There was a question: do you also center the y variable, the height, and would the coefficient of weight change? Well, we could of course also center the height; that would just mean that a value of zero on height is the mean height, and the slope would stay the same. But in this specific case, I think it’s actually more informative to know what the predicted height is if someone has the average weight.
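Here is the centering step as a minimal R sketch, again on the simulated stand-in data from the block above: the slope stays the same, only the intercept changes its meaning.

```r
set.seed(1)
weight <- rnorm(100, mean = 72, sd = 12)
height <- 125 + 0.6 * weight + rnorm(100, sd = 3)

weight_c <- weight - mean(weight)   # centered predictor: 0 now means "average weight"

coef(lm(height ~ weight))     # intercept: predicted height at weight = 0 kg (hard to interpret)
coef(lm(height ~ weight_c))   # same slope; intercept: predicted height at the mean weight (~168 cm)
```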
There was another question over there. Well, yes, you’re right, this is a wrong model; every model is wrong. A model, whether it’s a statistical model, a cognitive model, or an economic model, is a simplification of the world. We have to simplify things in order to make them understandable for us, whether that is for laypeople out there, for an econometrician, or for any other kind of scientist. So any model we have is wrong, but most models are still useful, because they simplify in a very efficient way and still allow us to say something about the world. Of course, we’re throwing things away, we’re squinting a little bit and ignoring things that are not so interesting for us. Yes, please. [INAUDIBLE] So the question is when we actually use these centered predictors, when do we center the predictor. Some people would argue that it’s always useful to center the predictor, because then you’re able to interpret the intercept in a meaningful way. That doesn’t necessarily mean that all the predictions make sense, because if you extrapolate outside of the observations that we have, there might still be nonsensical predictions. But that, again, is the case with most models we work with. Okay, other questions? Okay. I pulled these estimates for the intercept and the slope a little bit out of a hat, but let me tell you at least briefly and roughly how they are estimated. To illustrate that, I simplified the dataset a little bit, so I’m only working with 20 data points here, to illustrate the so-called method of least squares, sometimes also called ordinary least squares. That’s a mathematical procedure that allows us to estimate the parameters that do the best job of describing the data. We are making errors when we estimate these parameters, and our model will make wrong predictions, but the method of least squares makes sure that the error we’re making is the smallest possible. So what is the method of least squares? What the method of least squares is based on is the deviation of the model, our straight line, from the observations. Here I drew these vertical lines that indicate how strongly the prediction of the model, represented by the straight line of y-hat, deviates from the observed value of y. The longer the vertical lines, the larger the error. And this error, the deviation between the model’s predicted value of y, represented by the straight line, and the observation, is called the residual. It’s basically the error we’re making when we come up with our simpler model, and of course it doesn’t capture the world perfectly; after all, it’s a simple model, a simple linear regression with one predictor. What the method of least squares does is that, across all the different lines we could draw here, with different slopes and different intercepts, it identifies the line that makes these errors, these residuals, as small as possible. And conveniently, there’s a simple mathematical approach for identifying which values of the intercept and of the slope give us the line that minimizes the residuals, the deviations.
And we get this, and you don’t have to remember it perfectly: the slope is calculated as the covariance between the predictor x and the outcome variable y, divided by the variance of our predictor x. That gives us the value of the slope that minimizes the residuals. And for the value of the intercept, b0, we take the mean value of the outcome variable, the dependent variable y, minus b, the slope value we just identified, times the mean value of the predictor x. So there are simple equations with which, in closed form, you can calculate the value of b, the slope, and b0, the intercept. This is where these numbers come from. And good for you, computers usually do this for you; you don’t have to do it by hand. That wasn’t the case 80 years ago, but now you can rely on computers doing the method of least squares for you. Any questions? Yeah. - Suppose we have an outlier, a point that is very extreme. - Yeah. - Since we are squaring the errors, it counts a lot. - Yes. - Is there a method for dealing with outliers in that case? - So the question was what happens with an outlier, a really strange value that doesn’t fit the pattern of the other data. Very good question. That can actually distort our estimates of the slope and the intercept quite a bit, also because, as you can see here, when we compute the least squares we are squaring the difference between the model prediction and the observations. But the good thing is that there are also methods, which we’re not discussing here, that are relatively robust against the influence of such outliers. These are robust methods for doing a regression; if you look at a textbook on regression analysis, you will sometimes find a chapter on robust regression, which goes exactly in this direction. Good question. So if you see visually, in a scatter plot for instance, that you have an outlying data point, you might have to consider to what extent this outlier could distort your estimates of the regression model. Always get a sense of the data that you have in front of you.
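As a minimal sketch, the least-squares formulas just described can be computed by hand in R and compared against lm(); the data are the same simulated stand-ins used above.

```r
set.seed(1)
weight <- rnorm(100, mean = 72, sd = 12)
height <- 125 + 0.6 * weight + rnorm(100, sd = 3)

b  <- cov(weight, height) / var(weight)   # slope: covariance(x, y) / variance(x)
b0 <- mean(height) - b * mean(weight)     # intercept: mean(y) - b * mean(x)
c(intercept = b0, slope = b)

coef(lm(height ~ weight))                 # lm() gives the same two numbers
```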
Okay. Let’s assume we have estimated our regression model, consisting of an intercept and a slope, and the model captures our data to some extent, to the extent that it captures a linear trend in the data. There might also be situations, and we’re going to come to that, where the association between the predictor and the outcome variable is not linear, so our model is wrong; the question is how we can identify that. So let’s assume we have our estimated model; we’re going back to our originally nonsensical intercept of 125 centimeters for a weightless person and a slope of 0.6, and we have a visual representation of our model in terms of the straight line. How good is this model? First, how much variance of the outcome variable is accounted for, is explained, by the predictor? And second, to what extent is the value of the regression coefficient b, the slope, significantly different from zero? These are two statistical things we might be interested in. Short question: imagine a regression coefficient that exactly equals zero; what would this represent? What would this indicate? (student speaking off mic) Exactly. And this is, sensibly, in the context of regression analysis, our null hypothesis. So we’re trying to evaluate to what extent, for a given estimated value of the slope, we can reject the null hypothesis that the slope is zero; that’s our null hypothesis. And you can also think of the b-equals-zero situation graphically: it would be the case where we have a completely flat line. You can also think about what this means in terms of the prediction: it would mean that the independent and the dependent variable are completely unrelated to each other, and that, using our regression model, we would predict exactly the same height for everybody, irrespective of their weight. That’s what b equal to zero would mean. Okay, let’s return to our statistical evaluation of these two aspects of a regression model: how much variance does the model account for, and is our regression coefficient significantly different from zero? By the way, we’re usually really only interested in statistically evaluating the value of the slope, the regression coefficient. Most of the time, statistically evaluating the intercept is not very interesting, it’s not so sexy, because whether it differs from zero or not is usually not part of our substantive research question. You can interpret it, and there can be situations where testing it might be interesting, but in many cases we’re really focusing on statistically evaluating the estimated slope, not the estimated intercept. Okay, let’s turn to the first question: how much variance does our model account for? What’s the amount of explained variance? One way of thinking about this is that we can compute the Pearson correlation between the predictor and the outcome variable, and we again get a value of .923. Now we can simply square it and get an R-squared of .853. Squaring the Pearson correlation gives us a very simple indication of how much variance we account for with our model: here we explain 85.3% of the variance, which is actually quite a lot. So this is how you can get very simply to the amount of explained variance: simply square the Pearson correlation, if you have one predictor and one outcome variable. I think there was a question somewhere. (audience member speaking off mic) Yes, actually it’s on my next slide. So this is one way of getting to a numerical expression of the amount of variance that we explain with our model. But what does it actually mean, and how can we visualize it? Here’s one way of visualizing the amount of explained variance. Let’s assume we have the different heights in our sample, and we indicate the mean height, the mean value of y. We can see how the individual heights differ from this mean value. If we compute these deviations of the individual heights from the mean height, square them, and sum them up, that is the total variation in the dependent variable, the outcome variable. And the question is how much of this variance, computed in this way, our model predicts or accounts for.
And we can also compute the amount of variance that is produced, predicted, by our model, by again taking the mean value of the dependent variable; here we now also have our regression line. We can compute, for each observation, the difference between the predicted value of y, the y-hat based on our regression model, and the mean value. We again square these differences and sum them up, and then we have the sum of squares of our model. By the way, this is very similar, actually it’s the same thing, as what we talked about last week with the sums of squares in the context of the analysis of variance. So we compute the difference between the predicted value and the mean value of y, square it, sum it up, and that gives us the sum of squares of our model. Just one more step: we now relate the total variation that we have and the sum of squares of the model, and that gives us our R-squared. It’s exactly the same value. But this is how you can think of the amount of explained variance: we look at the total variance we have in our sample, we look at the variance that is produced, predicted, by our model, and then we relate the two. That gives us an indication of the amount of variance in our data that is explained by our model. There was a question. (audience member speaking off microphone) You’re actually right, yes, sorry, that’s a typo; that should be a y, thanks for pointing it out. Okay, but I hopefully explained it correctly. So we can do the computations: we have a large number here for the model, and an even larger number here for the total, and that gives us exactly the same value for the R-squared as we had previously when we squared the Pearson correlation. Yeah. - When you look at the R-squared of 85%... - Yeah. - Can we conclude, in the example of weights and heights, that when we look at people with different heights, 85% of the reason why their heights differ is explained by the differences in their weights? - Yes, exactly. We just got a wonderful interpretation of this amount of explained variance: we can explain 85% of the variance in observed height based on the observed variance in weight. Basically, that also means that a large factor in explaining how heavy people are is their height. Yeah. >> Why didn’t we just write y-i instead of x-i, as we defined it? >> Because we calculate here how strongly the values on the dependent variable deviate from the mean value of the dependent variable. >> Okay. >> Yeah, I’m going to correct this on the slide and upload a new version, or you can correct it on your copy yourself. Yes, please. (audience member speaking off mic) The question is how we can increase our R-squared. Well, that really depends on the data. Often we would like to increase our R-squared by building a good model of the dependent variable. Sometimes that means that if we have, for instance, a nonlinear relationship between the dependent and the independent variable, we look for a transformation that takes care of this nonlinearity. And oftentimes we would like to increase our R-squared by adding additional predictors, which is what we’re going to be looking at next week. That is exactly why we often add additional predictors; here we have only one, and we want to increase our R-squared.
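To tie the two routes to R-squared together, here is a minimal R sketch on the simulated stand-in data: the squared Pearson correlation and the ratio of the model sum of squares to the total sum of squares give the same value.

```r
set.seed(1)
weight <- rnorm(100, mean = 72, sd = 12)
height <- 125 + 0.6 * weight + rnorm(100, sd = 3)
fit <- lm(height ~ weight)

cor(weight, height)^2                            # route 1: squared Pearson correlation

ss_total <- sum((height - mean(height))^2)       # total variation of y around its mean
ss_model <- sum((fitted(fit) - mean(height))^2)  # variation of the model predictions around the mean
ss_model / ss_total                              # route 2: same value, about .85

summary(fit)$r.squared                           # what the software reports
```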
Okay, so we’ve now got a handle on how much variance in our dependent variable we can explain based on our predictor variable, and we’ve seen how to calculate this R-squared. One thing I almost forgot: this shows again what I showed on the previous slide, the amount of variance predicted by our model and the amount of variance in our dependent variable. Sometimes in statistical programs you will also find an adjusted R-squared, which is a function of the original R-squared. You can see here that this adjusted R-squared also takes into account the number of observations we have and the number of predictors we have, okay? This can be useful because, if we have only a few observations, so when n is relatively small, we shouldn’t take our model too seriously; we shouldn’t claim that based on our model we can actually describe the population, the larger world. And this is why we attenuate our estimate of the amount of explained variance, and why the attenuated, adjusted, R-squared includes the number of observations. When we have a relatively small n, and at the same time a relatively high number of predictors k, the value gets attenuated. We don’t really see it here: if we compute the adjusted R-squared for our example, we get 0.852 rather than 0.853, so it’s only slightly attenuated. That is due to the fact that the model we have is relatively simple, with only a single predictor, and we have a relatively large number of observations, namely 100. If we had only, let’s say, 10 or 20 observations, we would get a stronger attenuation. You can try this yourself at home tonight, for instance. Okay, so this takes care of the first aspect of the regression model that we wanted to evaluate, namely how much variance in the dependent variable our model accounts for.
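A minimal sketch of the adjusted R-squared, assuming the standard formula 1 − (1 − R²)(n − 1)/(n − k − 1), which is also what R’s summary() reports; with n = 100 and k = 1 the attenuation is tiny, as in the lecture example.

```r
set.seed(1)
weight <- rnorm(100, mean = 72, sd = 12)
height <- 125 + 0.6 * weight + rnorm(100, sd = 3)
fit <- lm(height ~ weight)

r2 <- summary(fit)$r.squared
n  <- length(weight)                    # number of observations (100)
k  <- 1                                 # number of predictors
1 - (1 - r2) * (n - 1) / (n - k - 1)    # adjusted R-squared: slightly smaller than r2
summary(fit)$adj.r.squared              # matches
```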
Let’s now look at the second aspect we’re interested in, namely evaluating the regression coefficient b: whether our estimated value for the slope is different from zero. The good news is that we’re going to use a test statistic we have used before, the t statistic. So we’re going to use a t-distribution to evaluate to what extent our estimated slope, the regression coefficient b, is significantly different from zero. We get the value of the test statistic for a given regression coefficient b simply by dividing it by the standard error of the estimated slope. Now you might wonder what the standard error of the estimated slope is. Here’s the formula; you don’t need to learn it by heart, just understand it conceptually. In the numerator we have the standard deviation of the differences between the observed values of y and the predicted values of y, that is, of the residuals, the errors we’re making; in principle we could calculate by hand how strongly our predicted values of y differ from the observed ones. And in the denominator, we take the square root of the summed squared differences of the values of the predictor x from the mean value of the predictor x: we square those deviations, sum them up, and take the square root. That gives us an estimate of the standard error of the regression coefficient. Basically, it expresses how much we believe in the position of our estimate of the slope: the smaller the standard error, the more precise our estimate. You can see this: if, for instance, the residuals are very small, then we have a pretty good indication that we have a good model and a precise estimate of the coefficient. And because we are evaluating the t value based on a t-distribution, we also need, as in the case of the t-test, the degrees of freedom, because depending on the degrees of freedom we get different t-distributions. The degrees of freedom for this specific case are calculated as the number of observations minus (k + 1), where k is the number of predictors; in our case, with one predictor, k equals one, okay? Once we have calculated our degrees of freedom, we can plot our t-distribution. Here it is; it’s centered around zero, because we’re interested in testing whether the slope differs from zero. Based on the shape of the t-distribution, we can calculate our critical t-value from our alpha level. Here, by the way, we put 0.025 into each tail, because in principle we could also be testing the lower end of the distribution; it’s a two-tailed test, although we already know that the estimated regression coefficient of 0.6 is positive. The critical value of the t statistic here is 1.98. And we can also calculate our empirically observed value of t, the t value that we get for the estimated slope of 0.6, and you read that right, it’s a very large value: 23.87. That means that the p-value for our coefficient of 0.6 is very small, and we can confidently reject the null hypothesis, because it would be very unlikely to observe a t value that large if the t-distribution were actually centered around zero. So we conclude that our estimated regression coefficient, our estimated slope of 0.6, is significantly different from zero. Which actually makes sense if you think about it, because the regression coefficient reflects the strength of the association between weight and height, and we’ve seen that we get a very large Pearson correlation of about 0.9. So the 0.6 in this specific case is, in effect, a very large value, and therefore we get a highly significant p-value. Yeah? Yes, exactly: you can say that in our case, where we’re predicting a person’s height from the person’s weight, the predictor weight significantly predicts the person’s height. Other questions? Yeah. - Does it also mean that our estimated slope of the regression line is correct, because we have a very small p-value? - Well, we don’t know whether it’s correct or not. We can just say, or we act as if, the true value is not zero; we can confidently reject the null hypothesis that the actual value of b is zero. Other questions?
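Here is the t test for the slope as a minimal R sketch, computed once by hand from the standard-error formula described above and once via summary(); the data are again the simulated stand-ins, so the exact t value differs somewhat from the 23.87 on the slide.

```r
set.seed(1)
weight <- rnorm(100, mean = 72, sd = 12)
height <- 125 + 0.6 * weight + rnorm(100, sd = 3)
fit <- lm(height ~ weight)

n  <- length(weight); k <- 1
b  <- coef(fit)["weight"]
s  <- sqrt(sum(resid(fit)^2) / (n - 2))            # standard deviation of the residuals
se_b <- s / sqrt(sum((weight - mean(weight))^2))   # standard error of the slope
t_obs <- b / se_b                                  # empirically observed t value
df <- n - (k + 1)                                  # degrees of freedom: 100 - 2 = 98

qt(0.975, df)                    # critical t for alpha = .05, two-tailed (about 1.98)
2 * pt(-abs(t_obs), df)          # p-value: tiny, so we reject the null hypothesis b = 0
summary(fit)$coefficients        # same t and p as reported by the software
```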
Yeah. (audience member speaking off mic) - Can you say it again? - So the question is whether the adjusted R-squared can ever increase the value. No, usually it attenuates the value. - So it penalizes predictors that might actually hurt the reliability of the model? - Yeah, well, thanks for raising this; we’ll come back to it in subsequent weeks. Next week, when we talk about multiple regression, you’ll see that the adjusted R-squared basically attenuates the R-squared if you have many, many predictors. And it attenuates the explained variance because we might be doing something very undesirable: we might be overfitting. We might overfit the noise that we have in our data, and we have a higher chance of overfitting if we have a very complex regression model. What does it mean to have a very complex regression model? It means we have many, many predictors. In our present case there’s a very low chance of overfitting, because we have only one predictor. But next week, when we look at the situation with multiple predictors, there is a real chance that some of the variance we are explaining with our model is actually overfitting. We’re going to come to that next week in more detail. Now you might wonder how much we can actually rely on the predictions from the model that we have come up with by estimating a value for the slope and a value for the intercept. And there are, thankfully, ways to express how much confidence we can have in the predictions derived from our estimated regression model. What you can often get from software is two kinds of intervals. One is the confidence interval. This is really the confidence interval that we already talked about two weeks ago: it gives us an indication of the precision of the estimated average y, the average value on the dependent variable, for people with a given value on the predictor x. With our model we can make a prediction for a given x, and we might ask: for everybody who has this value of x on the predictor, what is the mean value on the dependent variable across all these people? The confidence interval gives us an indication of how well we are doing at capturing this mean value across people. So the confidence interval always refers to the precision of capturing the mean value for people with a given value on the predictor. It can be calculated with a formula that you don’t need to learn by heart, but you can see some familiar components in it. First, there is the standard deviation of the residuals: the deviations of the observed values of y from the predicted values of y, squared, summed up, divided by n minus two, and then we take the square root. Then, inside another square root, we have the squared deviation of the predictor value we’re looking at from the mean value of the predictor, divided by the sum of the squared deviations of the predictor values from their mean, which is n minus one times the variance of the predictor x, together with a one-over-n term, okay? And we multiply all of this by a critical value of t. If you go back to the definition of the confidence interval that we talked about two weeks ago, it’s very much related. With this expression you get the size of the interval; you add it to and subtract it from the predicted value of y, and then you get a confidence interval for the position of the estimate of the average y for people with a given value on the predictor.
In addition to this confidence interval, there’s also something called the prediction interval, and this is a bit more demanding, or maybe a bit more sobering: it is informative about the accuracy of your prediction not for the average value across different people with a given x, but for an individual with a given value of x. So here we’re really talking about how confident you can be in making a correct prediction for an individual with a given value on the predictor. The calculation of this prediction interval is very similar to that of the confidence interval, except that there is an additional "one plus" term inside the square root. What this means, eventually, you can see here for the regression line that we estimated earlier today, the line predicting a person’s height from a person’s weight. Here we have our regression line, and this tiny thing, which might be difficult to see, this light gray area, is our confidence interval. You can see that the confidence interval is actually pretty narrow. And this larger band, indicated by the dotted lines, is the prediction interval. You can easily see, and this is always the case, that the prediction interval is much wider than the confidence interval. This also makes intuitive sense, because it means that when we’re making a prediction for an individual person, we always need to be much more modest than when we’re saying, "for people like you, on average, we’re making the following prediction." For the latter, we have the confidence interval; but when we’re making a prediction for a very specific person’s height, given that specific person’s weight, we need to be more careful, and therefore the prediction interval is wider. And I wouldn’t tell you this if JASP didn’t give you both the confidence interval and the prediction interval; it’s often useful to report both, just to make sure that you convey the due caution that is associated with your prediction model. Okay, any questions?
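A minimal R sketch of the two intervals for a new observation, using predict() on the simulated stand-in data; the 80 kg person is, again, just a hypothetical example.

```r
set.seed(1)
weight <- rnorm(100, mean = 72, sd = 12)
height <- 125 + 0.6 * weight + rnorm(100, sd = 3)
fit <- lm(height ~ weight)

new_person <- data.frame(weight = 80)
predict(fit, new_person, interval = "confidence")  # precision for the *average* height of 80 kg people
predict(fit, new_person, interval = "prediction")  # much wider: uncertainty for one individual at 80 kg
```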
Now, very importantly, let me also talk about the assumptions underlying a regression analysis, because there are some that we need to take care of and that we need to check. There are three assumptions that are important when we conduct a regression analysis. The first assumption, and we already touched on this point briefly earlier today, is that the relationship between the outcome variable, the dependent variable, and the predictor variable is linear. Remember, our regression model is a straight line; our regression model is linear. If we impose this linear model on our data, we are assuming that we can capture the relationship between the dependent and the independent variable reasonably well with a linear relationship. So that’s our first assumption: that the dependent and the independent variable are related in a linear fashion. I’m going to say more about this in a second. The second assumption has a name you probably need to get used to, because it’s not part of your everyday language: the assumption of homoscedasticity. When we have a violation of homoscedasticity, we have so-called heteroscedasticity. So what is homoscedasticity? Homoscedasticity refers to the variance of the errors that we’re making with our model, the variance of the residuals. Homoscedasticity means that we expect, or hope, that the spread of the errors we’re making in our prediction is the same for all the different levels of our predictor variable. I’m going to show you how to check this assumption visually in a second. And the final assumption is that the errors we’re making, the distribution of the residuals, is normal; we don’t have wildly skewed distributions. So, how do we check these three assumptions: the assumption of linearity, the assumption of homoscedasticity, and the assumption that our residuals are normally distributed? For the first assumption, that the dependent and the independent variable are related in a linear fashion, we usually do a visual check, where we look at a plot like this, and JASP gives you such a plot, showing, for the different values of the predictor x, the size and direction of the residual, of the error that we’re making. So you plot, for each value of the predictor, the error you’re making, and what you’re hoping for is that on average there is no pattern. If you don’t see any kind of pattern, you’re good: it means that the average error you’re making is the same irrespective of whether you have a low value or a high value of the predictor. Across the different values of x we’re making errors that are roughly equally sized and centered around zero, so we don’t have a problem; this indicates that our assumption of linearity between the dependent and independent variable is fulfilled. Here’s a situation where you do have a problem. You’re doing exactly the same thing as before, plotting the residuals, usually the standardized residuals, against the values of the predictor variable, and you clearly see that you make different kinds of errors for different values of the predictor: the residuals are, say, positive for low values of the predictor, negative for medium values, and positive again for high values of the predictor. If you see a pattern like this, it strongly indicates that the actual relationship between the predictor and the outcome variable is nonlinear, so the assumption of linearity is violated. Here’s the case we’ve talked about previously, the association between weight and height. We already saw that, by and large, we can capture this data cloud relatively well, not perfectly but relatively well, with the assumption of a straight line. And this is the residual plot for our case, where we plot, for each of the different values of the predictor weight, the deviation of the observations from the straight line, and you can see that there’s no pattern.
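A minimal sketch of that residual plot in R, for the simulated stand-in version of the weight/height model; with a truly linear relationship the points should scatter around zero without any pattern.

```r
set.seed(1)
weight <- rnorm(100, mean = 72, sd = 12)
height <- 125 + 0.6 * weight + rnorm(100, sd = 3)
fit <- lm(height ~ weight)

plot(weight, rstandard(fit),                     # standardized residuals against the predictor
     xlab = "Weight (kg)", ylab = "Standardized residual")
abline(h = 0, lty = 2)                           # residuals should scatter evenly around this line
```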
Here’s another example, where we’re predicting, for different countries, each point being a country, the happiness of people in that country as a function of the wealth of the country, and you can see that a straight line doesn’t do a good job of capturing the data. So how does the residual plot that we talked about look in this case? You can clearly see that there is a pattern: we have relatively evenly distributed errors for low values of the predictor, but largely negative residuals for large values of the predictor. That means we have a pretty strong violation of the assumption of linearity. Now, what can we do when we have such a violation? Here is a histogram of the distribution of the predictor, the GDP, the wealth of the different countries, and we can clearly see that this is a non-normal distribution: many, many small values and then very few large values. What we can do with such a predictor is transform the values using a so-called logarithmic transformation, which squashes the large values together. We are absolutely entitled to do these kinds of transformations. And if we transform the original values using this logarithmic transformation, we get a distribution of the predictor that is not perfectly, but more nearly, normally distributed than before. Now, based on these transformed values, we can rerun our regression analysis and see whether that improves things with respect to the assumed linear relationship between predictor and outcome variable. So we are again predicting happiness, but now based on the log-transformed GDP values, so we have a very different unit here, and you can see that the assumption of a linear relationship between these two variables is now much more in line with our expectations. You can also see this when we plot the residuals: now things look much better. So we can respond to violations of linearity by using a log transformation; it’s oftentimes a very helpful thing to do when we have this kind of violation in front of us. - Does clustering the residuals help in correcting the heteroscedasticity? - What do you mean by clustering? - Clustering the standard errors, for example, establishing clusters in our regression model. - I’m not exactly sure what you mean, but in general clustering doesn’t help with getting rid of violations of linearity. There are different situations where you might have a problem because your residuals are clustered, but that’s a different story; we can talk about it later if you’re interested. Okay, so this is how you can check the assumption of linearity, the assumption that the dependent and independent variable are associated in a linear fashion, and I also showed you one way, often a good starting point, to react when you have a violation of linearity.
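The country-level happiness/GDP data are not reproduced here, so the following minimal R sketch simulates stand-in data with a similar, roughly logarithmic shape and shows the remedy just described: refit the model on log-transformed GDP and compare the residual plots.

```r
set.seed(2)
gdp       <- exp(rnorm(150, mean = 9, sd = 1.2))        # right-skewed wealth variable (simulated)
happiness <- 2 + 0.5 * log(gdp) + rnorm(150, sd = 0.4)  # outcome rises with log(GDP) (simulated)

fit_raw <- lm(happiness ~ gdp)       # linear in raw GDP: residuals show a clear pattern
fit_log <- lm(happiness ~ log(gdp))  # linear in log(GDP): the pattern largely disappears

par(mfrow = c(1, 2))
plot(gdp, rstandard(fit_raw), xlab = "GDP", ylab = "Std. residual"); abline(h = 0, lty = 2)
plot(log(gdp), rstandard(fit_log), xlab = "log(GDP)", ylab = "Std. residual"); abline(h = 0, lty = 2)
```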
Okay, let's talk about the second assumption, the assumption of homoscedasticity. How can we check whether our residuals are homoscedastic? In words, the assumption of homoscedasticity means that for different levels of the predictor, the variability of the residuals is the same. Again we have a visual check for this, and it looks very similar to the previous one, but now we're looking for something else. We're again plotting the residuals, the errors that our prediction model makes, against the different values of the predictor, but we're not looking, as in the previous case, at whether the mean value of the residuals varies across predictor values; we're looking at whether the variability, the variance, of the residuals is approximately the same across the different values of the predictor. And that's roughly the case here: the residuals vary about as strongly for low values of the predictor as for medium values and for high values. That's good, so we have homoscedastic errors.

Here's a situation where the assumption of homoscedasticity is violated: for instance, when you have low variability of the residuals for low values of the predictor and then increasing, or decreasing, in general changing variability of the residuals across the different values of the predictor. Then your assumption of homoscedasticity is violated and you have heteroscedasticity in front of you.

What problems does this create? A violation of homoscedasticity usually doesn't distort your estimated regression coefficients, the betas, but it can distort your estimates of the standard errors of the beta, of the slope. That basically means that the statistical test of the regression coefficient can be unreliable. But the good news is that there are also ways to deal with violations of homoscedasticity: the conventional standard errors of your slope might not be reliable under heteroscedasticity, but you can compute so-called robust standard errors, for instance with the statistical software R. So there's hope when you have heteroscedasticity. In any case, it's very important to check whether the assumption of homoscedasticity is violated and then take the respective steps.
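As a small illustration of such robust standard errors in R, here is a minimal sketch; the add-on packages `sandwich` and `lmtest` are one common way to compute them, and the data frame `dat` with columns `y` and `x` is hypothetical.

```r
# Minimal sketch of heteroscedasticity-robust ("sandwich") standard errors.
# `dat`, `y`, and `x` are hypothetical names.
library(sandwich)
library(lmtest)

model <- lm(y ~ x, data = dat)

summary(model)  # conventional standard errors (can be unreliable under heteroscedasticity)

# Same coefficients, tested with heteroscedasticity-consistent standard errors:
coeftest(model, vcov = vcovHC(model, type = "HC3"))
```

Note that the slope estimates themselves stay the same; only the standard errors, and hence the test statistics and p-values, change.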
Okay, let's turn to the third assumption, namely the assumption that our residuals, the errors that our regression model makes, are normally distributed. So we have a roughly balanced distribution: a lot of values around zero, and high and low values that are roughly equally frequent. To evaluate this assumption of normally distributed errors, we again take a visual approach and produce a so-called QQ plot, a quantile-quantile plot. Let me tell you in words what a QQ plot does: it plots the quantiles of two distributions against each other. Maybe I'll show you a QQ plot to give you a sense. A quantile is a point in a distribution that expresses how many data points in the distribution are smaller than that value. We actually already talked about one particular quantile some time ago, when we talked about the median: the median is the 50th percentile, the point in our distribution below which 50% of the data lie. If the median is the 50th percentile, you can also easily think of the 20th percentile, the point in the distribution below which 20% of the data lie, or the 80th percentile, the value below which 80% of the data lie. So you can go through a distribution and determine, for any value, how many data points are smaller than it.

And you can do this for two distributions. You can determine, say, the 5th, 10th, 20th, and 50th percentiles for one distribution and do the same for another distribution, and if you z-standardize the values of both distributions, you can plot the quantiles of the two distributions against each other and see what comes out. That is exactly what a QQ plot does. On the x-axis it shows the theoretical quantiles, the quantiles you would expect if the distribution were normal, and on the y-axis the quantiles that you actually observe when you z-standardize your residuals. You then check whether the points in the QQ plot fall on a straight line. If they do, that indicates that the distribution of your residuals roughly follows the expectation of a normal distribution. So, on the x-axis you have the expected quantiles under normality, on the y-axis the quantiles of your data, and you hope that they fall on a straight line. Here you can see that we have a somewhat, not perfectly, but somewhat balanced, symmetrical distribution, and here is the corresponding QQ plot; there are some deviations up here, but roughly, in this case, I would say you're good.

Let's turn to the residuals and the QQ plot for our regression model where we're predicting the height of a person based on the person's weight. Here we see the residuals; they look okay-ish, not perfect, a little right-skewed, and here you can see the QQ plot. In this case, too, we're roughly okay. But let me also show you some cases where the assumption of normally distributed residuals is violated. Here you have a case where you can see, in blue, the expected normal distribution and, in front of it, the observed distribution of the residuals, and you can see that some data points are missing relative to a normal distribution; the distribution is a little bit right-skewed, and then you get a U-shaped QQ plot. Just to get some of the prototypical deviations into your head. Here's a second prototypical deviation you might see in a QQ plot, where you have a left-skewed distribution: there are somewhat fewer observations on the right, for the large residuals, and a bit of an overshoot of small residuals, and then you get an inverted U-shaped QQ plot, which indicates a left-skewed distribution of the residuals. Then you might have a fat-tailed distribution of your residuals, where the kurtosis is a bit too high, so the distribution is too peaked relative to the normal distribution; then you get this inverse S-shaped QQ plot. And finally, when you have thin tails, when the distribution of your residuals is a little flattened, you get this S-shaped QQ plot. So I've presented four prototypical patterns of deviations in a QQ plot that can give you some orientation when you look at an empirically observed QQ plot and want to diagnose to what extent your assumption of normally distributed residuals might be violated. And if you have problems with non-normally distributed residuals, you can also, for instance, use data transformations.
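As a small illustration, here is a minimal sketch of this QQ-plot check in R, assuming `model` is a fitted lm() object as in the sketches above.

```r
# Minimal sketch of the QQ-plot check for normally distributed residuals,
# assuming `model` is a fitted lm() object as in the earlier sketches.
res <- rstandard(model)  # standardized residuals

qqnorm(res)  # observed quantiles against theoretical normal quantiles
qqline(res)  # reference line; the points should fall close to it

# Base R's built-in diagnostic plot performs the same check:
plot(model, which = 2)
```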
Okay, let me close by summarizing what we've talked about today in a couple of self-quiz questions. So give me a final minute. First self-quiz question: what are the key purposes of estimating a regression model? Why are we doing this, and what do we hope to get out of a regression model? Second, what are the key parameters of a regression model as we've talked about it today? What are the parameters that are estimated? Third, how are the parameters of a regression model estimated, at least according to the method that we talked about today? Fourth, why can it be helpful to center a predictor? What difference does it make to our regression coefficients or to our regression equation? Fifth, how is a regression model evaluated statistically? What are the aspects that we're considering here, both in terms of the overall model and in terms of the regression coefficients? What are the two approaches that we've talked about today? And finally, what are the key assumptions of a simple linear regression, and how can you check whether these assumptions are fulfilled?

So hopefully this gave you the basics of this very important, very flexible, very powerful framework of regression models, which we talked about today in the case of a simple regression, where we have only one predictor. Next week we're going to look at more complex cases where we have multiple predictors, that is, multiple regression. Here's the background reading for the next session on multiple regression. And in the exercises today or tomorrow and on Wednesday, we're going to look at how to implement a simple regression in JASP. Hope to see you there, have a good evening. Thank you. (audience applauding)