# What Is the Zero Conditional Mean Assumption

Another instructive example: imagine regressing ice cream sales over time on the number of people wearing shorts over time. You are likely to get a large and statistically significant parameter estimate. However, you would not run to the executives of Häagen-Dazs and tell them to start advertising summer clothes. One variable is obviously missing: temperature. This violates the strict exogeneity assumption, because the number of people wearing shorts ($X$) is correlated with the omitted variable temperature, which is contained in the error term ($\epsilon$).

A second example: regress the total football score on the number of touchdowns and field goals. You will not estimate a value of 6 for $b_1$; you will almost certainly estimate that touchdowns are worth more than 6 points, closer to 7. The error term $e$ in this case contains the points scored from extra points and two-point conversions, and its mean is almost certainly not zero conditional on the number of touchdowns.

The i.i.d. assumption can fail as well. In a time series of a company's number of employees, the observations cannot be independent: today's employment level is correlated with tomorrow's. Thus, the i.i.d. assumption is violated.
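The ice cream example can be made concrete with a small simulation. The following is a minimal Python sketch using only the standard library, not code from the original source. The data-generating process is entirely hypothetical: the coefficients, noise levels, sample size, seed, and the helper names `demean` and `dot` are assumptions chosen for illustration. By construction, shorts have no direct effect on sales, and temperature drives both variables.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000                # assumption: sample size

# Hypothetical data-generating process: temperature drives both variables,
# and shorts have NO direct effect on ice cream sales.
temperature = [rng.uniform(10, 35) for _ in range(n)]
shorts = [20 + 5 * t + rng.gauss(0, 10) for t in temperature]
sales = [100 + 10 * t + rng.gauss(0, 20) for t in temperature]


def demean(values):
    """Subtract the sample mean; this absorbs the regression intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    """Sum of element-wise products of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


y, x1, x2 = demean(sales), demean(shorts), demean(temperature)

# Short regression: sales on shorts only (temperature stays in the error term).
slope_short = dot(x1, y) / dot(x1, x1)

# Long regression: sales on shorts and temperature, via the 2x2 normal equations.
det = dot(x1, x1) * dot(x2, x2) - dot(x1, x2) ** 2
slope_shorts_long = (dot(x2, x2) * dot(x1, y) - dot(x1, x2) * dot(x2, y)) / det
slope_temp_long = (dot(x1, x1) * dot(x2, y) - dot(x1, x2) * dot(x1, y)) / det

print(f"shorts coefficient, temperature omitted:  {slope_short:.2f}")
print(f"shorts coefficient, temperature included: {slope_shorts_long:.2f}")
print(f"temperature coefficient:                  {slope_temp_long:.2f}")
```

Under these assumptions the shorts coefficient is clearly positive when temperature is omitted and close to zero once temperature enters the model, which is the pattern the example describes.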

OLS performs well under a wide variety of circumstances. However, certain assumptions must hold to ensure that the estimates are normally distributed in large samples (discussed in Chapter 4.5).

My question is: how can this assumption be violated if the errors are equal to the actual observations of $Y$ minus their conditional means (i.e., the means for the part of the sample that shares the same value of $X$)? The zero conditional mean of the error term is one of the key conditions for the regression coefficients to be unbiased.

It is easy to come up with situations where extreme observations, that is, observations that deviate considerably from the usual range of the data, can occur. Such observations are called outliers. Technically, Assumption 3 requires that $X$ and $Y$ have finite kurtosis. What does that mean? It can be shown that extreme observations receive heavy weight in the estimation of the unknown regression coefficients when using OLS. Therefore, outliers can lead to strongly distorted estimates of the regression coefficients.

The code behind Figure 4.5 of the book illustrates this. As described above, it uses sample data generated with R's random number functions rnorm() and runif(). Two simple regression models are estimated, one based on the original data set and another using a modified set in which one observation is changed to be an outlier, and the results are then plotted. To understand the complete code, you should be familiar with the function sort(), which sorts the entries of a numeric vector in ascending order. To get a better idea of the problem, consider an application with sample data on $X$ and $Y$ that are highly correlated.

A prominent example where the i.i.d. assumption is violated is time series data, where we observe the same unit over time. For example, take $X$ as the number of workers in a manufacturing company over time. Due to business restructuring, the company regularly cuts jobs by a certain share, but there are also non-deterministic influences related to the economy, politics, and so on. With R, we can easily simulate and plot such a process.
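The employment process can be simulated in a few lines. The following is a minimal sketch in Python (standard library only) rather than the R code the text refers to. The starting level of 5000 workers, the intercept of -5 and the autoregressive coefficient of 0.98 are the values quoted in this text; the number of periods, the error standard deviation, the seed, and the helper name `correlation` are assumptions made for illustration.

```python
import random

rng = random.Random(123)  # fixed seed so the sketch is reproducible

n_periods = 50     # assumption: number of observed periods
error_sd = 200.0   # assumption: standard deviation of the normal errors u_t

# employment_t = -5 + 0.98 * employment_{t-1} + u_t, starting at 5000 workers
employment = [5000.0]
for _ in range(n_periods - 1):
    u_t = rng.gauss(0.0, error_sd)
    employment.append(-5 + 0.98 * employment[-1] + u_t)


def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Correlation between each observation and its predecessor.
lag1_corr = correlation(employment[1:], employment[:-1])
print(f"first-order autocorrelation: {lag1_corr:.2f}")
```

A first-order autocorrelation far from zero is a direct sign that consecutive observations are not independent, so a sample like this cannot be treated as i.i.d.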

In the outlier application with highly correlated sample data, the relationship between $X$ and $Y$ seems to be explained quite well by the plotted regression line: all the white data points lie close to the red regression line and we have $R^2 = 0.92$.

After generating the data, we estimate both a simple regression model and a quadratic model that also includes the regressor $X^2$ (this is a multiple regression model, see Chapter 6). Finally, we plot the simulated data and add the estimated regression line of the simple regression model, as well as the predictions made with the quadratic model, to compare the fit graphically.

Most sampling schemes used when collecting data from populations produce i.i.d. samples. For example, we could use R's random number generator to randomly select student IDs from a university's enrollment list and record the age $X$ and earnings $Y$ of the corresponding students. This is a typical example of simple random sampling and ensures that all $(X_i, Y_i)$ are drawn at random from the same population.

For the time series example, we start the series with a total of 5000 workers and simulate the reduction in employment with an autoregressive process that exhibits a downward movement in the long run and has normally distributed errors:

$$\text{employment}_t = -5 + 0.98 \cdot \text{employment}_{t-1} + u_t$$

For more information on autoregressive processes and time series analysis in general, see Chapter 14.

In the football example, the estimated model is

$$\text{TotalFootballScore} = b_1 \cdot \text{touchdowns} + b_2 \cdot \text{fieldgoals} + e$$

whereas the score is actually determined by

$$\text{TotalFootballScore} = 6 \cdot \text{Touchdowns} + 1 \cdot \text{ExtraPoints} + 2 \cdot \text{TwoPointConversions} + 2 \cdot \text{Safeties} + 3 \cdot \text{FieldGoals}$$

More formally, the expected value of the OLS estimator is

$$E(\hat{\beta}) = \beta + (X'X)^{-1}E(X'\epsilon)$$
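The expression $E(\hat{\beta}) = \beta + (X'X)^{-1}E(X'\epsilon)$ follows in a few steps from the OLS formula. The sketch below assumes the linear model $y = X\beta + \epsilon$, where $y$ (a symbol introduced here for illustration) denotes the stacked dependent variable, and it treats $X'X$ as fixed when taking expectations:

$$
\begin{aligned}
\hat{\beta} &= (X'X)^{-1}X'y \\
&= (X'X)^{-1}X'(X\beta + \epsilon) \\
&= \beta + (X'X)^{-1}X'\epsilon, \\
E(\hat{\beta}) &= \beta + (X'X)^{-1}E(X'\epsilon).
\end{aligned}
$$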

The bias is therefore $(X'X)^{-1}E(X'\epsilon)$, which vanishes when $E(X'\epsilon) = 0$. In our simple ice cream sales model, the parameter estimate for the number of shorts is biased: once temperature is included in the model, the coefficient on the number of shorts changes.

Common cases where we want to exclude or (if possible) correct outliers are when they are apparently typos, conversion errors or measurement errors. Even if it seems that the extreme observations were recorded correctly, it is advisable to exclude them before estimating a model, since OLS is sensitive to outliers.

Now suppose a further observation is added to the highly correlated sample at, say, $(18, 2)$. This observation is clearly an outlier. The result is quite striking: the estimated regression line differs greatly from the one we found to fit the data well. The slope is strongly biased downward and $R^2$ drops to only $29\%$.
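The sensitivity of OLS to a single extreme observation can be checked with a small simulation. The following is a minimal Python sketch using only the standard library; it is neither the book's R code nor the interactive application. The sample itself is hypothetical (the data-generating values, sample size, seed, and the helper name `simple_ols` are assumptions), and only the outlier coordinates $(18, 2)$ are taken from the text, so the resulting numbers will differ from the $R^2$ values quoted above.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is reproducible


def simple_ols(x, y):
    """Return (intercept, slope, R^2) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical, highly correlated sample: y = 1 + x + small noise.
x = sorted(rng.uniform(1, 10) for _ in range(20))
y = [1 + xi + rng.gauss(0, 0.5) for xi in x]

_, slope_clean, r2_clean = simple_ols(x, y)

# Add the single extreme observation mentioned in the text.
_, slope_outlier, r2_outlier = simple_ols(x + [18.0], y + [2.0])

print(f"without outlier: slope = {slope_clean:.2f}, R^2 = {r2_clean:.2f}")
print(f"with outlier:    slope = {slope_outlier:.2f}, R^2 = {r2_outlier:.2f}")
```

Because the added point lies far from the other $X$ values, it has high leverage: one observation is enough to flatten the fitted line and reduce $R^2$ substantially in this sketch.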