Chapter 12: Multiple Linear Regression and Certain Nonlinear Regression Models
MINITAB Project


STATISTICS EXPLORATION #3: CORRELATION AND REGRESSION

PURPOSE - to use MINITAB to

BACKGROUND INFORMATION

Some terms and background information that are associated with correlation and regression are explained below.

Image452.gif

Scatter Plot of Diameter versus Volume

Image453.gif

Scatter Plot Displaying Little or No Correlation

Image454.gif

Image457.gif

Scatter Plot Displaying a superimposed Linear Regression Line

Image458.gif

Scatter Plot Displaying a superimposed Quadratic Regression Line

Image459.gif

PROCEDURES

First, load the MINITAB (Windows version) software as described in Exploration #0.

NOTE: The procedures presented in these explorations may not be the only way to achieve the end results. Also, whenever graphs are presented, only the MINITAB graphics features will be used.

  1. OBSERVING CORRELATION AND PATTERNS THROUGH SCATTER PLOTS

In this section we will present examples that will help you understand the concept of correlation through scatter plots.

Example 1: Consider the following table, which contains measurements on two variables for ten people: the number of hours the person spent riding a bicycle in the past week and the number of months the person has owned the bicycle. Present a scatter plot for this information with the number of hours along the vertical axis and the number of months owned along the horizontal axis.

Person            1    2    3    4    5    6    7    8    9   10
Hours Exercised   5    2    8    3    8    5    5    7   10    3
Months Owned      5   10    4    8    2    7    9    6    1   12

Enter the number of hours in column C1 and the number of months owned in column C2. Rename column C1 as HOURS and column C2 as MONTHS. Next, we will present a scatter plot with the values in C1 along the vertical axis and the values in C2 along the horizontal axis. To achieve this, select Graph > Plot and the Plot dialog box will be displayed. Enter the appropriate Y and X variables as shown in Figure 3.1. Note that the Display option that was selected is Symbol.

Image460.gif

Figure 3.1: Display of the plot selections

Click on the OK button and the plot will be displayed. Figure 3.2 shows the resulting plot. Since this is a plot of the ordered pairs (MONTHS, HOURS), this will represent a scatter plot for the two variables.

Image461.gif

Figure 3.2: Display of the Scatter plot of the number of hours vs. the number of months owned

From Figure 3.2, you can see a definite trend. The points appear to form a line that slopes from the upper left to the lower right of the screen. As you move along that (imaginary) line from left to right, the values on the vertical axis (hours riding) get smaller, while the values on the horizontal axis (months owned) get larger. Another way to express this is to say that the two variables are inversely related: the longer the bike was owned, the less the person tends to ride it.

We say that these two variables are correlated. More specifically, they are negatively correlated: as one variable increases, the other tends to decrease.
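The direction seen in Figure 3.2 can also be checked numerically outside MINITAB. The following is a minimal Python sketch, not part of the MINITAB procedure; the function name sample_covariance is our own. It computes the sample covariance of the Example 1 data, whose sign indicates the direction of the association.

```python
# Data from the Example 1 table.
hours = [5, 2, 8, 3, 8, 5, 5, 7, 10, 3]
months = [5, 10, 4, 8, 2, 7, 9, 6, 1, 12]


def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


cov = sample_covariance(months, hours)
direction = "negative" if cov < 0 else "positive" if cov > 0 else "none"
print(f"sample covariance = {cov:.3f} ({direction} direction)")
```

A negative covariance agrees with the downward trend in the scatter plot. The covariance only gives the direction; the strength of the association is measured by the correlation coefficient discussed in the next section.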

Example 2: Consider the following table, which contains the noise levels as measured by two different instruments. Present a scatter plot for this information with the variable NOISE1 along the vertical axis and NOISE2 along the horizontal axis.

Noise1     Noise2
0.97299    0.98150
1.93680    2.13277
3.04045    3.13164
1.71018    2.06533
3.92119    4.46499
5.92306    6.20214
0.78743    1.24267
1.98965    2.18766
2.92915    3.35408
1.49930    2.10889
3.61674    4.47986
5.68941    5.72906

The plot is shown in Figure 3.3. The pattern for this scatter plot suggests a positive correlation.

Image462.gif

Figure 3.3: Display of scatter plot with positive correlation

Sometimes there may be little or no relationship between the variables. For example, Figure 3.4 is a scatter plot of a person's cholesterol after two days on a special diet and after two days on a control diet. Observe that the scatter plot displays no particular pattern. That is, there is little or no correlation between these two variables.

Image463.gif

Figure 3.4: Display of scatter plot with little or no correlation

  2. LINEAR REGRESSION

In this section, we will compute the strength of the association between variables. That is, we will compute the correlation coefficient. However, we will first observe the scatter plots before computing the correlation coefficient.

Example 3: The table below shows the average weight, by height, of American men between the ages of 20 and 24.

Height (inches)    Weight (pounds)
62                 130
64                 139
66                 148
68                 157
70                 167
72                 176

Source: Grossman, Stanley. Applied Calculus. 2nd Ed Wm.C. Brown Publishers.

  1. Use MINITAB to present a scatter plot with the weight being the dependent variable (y) and height being the independent variable (x).

    Using the MINITAB procedures presented earlier in the exploration, the scatter plot is constructed and displayed in Figure 3.5.

    Image464.gif

    Figure 3.5: Display of scatter plot with almost a perfect positive correlation

    Observe from Figure 3.5, that the points are almost on a straight line with positive slope. Hence, one would expect a strong positive correlation value.

  2. Use MINITAB to compute the correlation coefficient r.

    To compute the correlation between the two variables, select Stat > Basic Statistics > Correlation. The Correlation dialog box will appear. Select the two variables for the Variables box as shown in Figure 3.6.

    Image465.gif

    Figure 3.6: Display of the correlation (coefficient) dialog box

    Click on the OK button and the correlation coefficient will be computed and displayed in the Session window. Figure 3.7 shows the output in the Session window.

    Image466.gif

    Figure 3.7: Correlation value for the variables Height and Weight

    Observe that the correlation coefficient reported by MINITAB is +1 to the displayed precision. We observed from the scatter plot that the points are almost, but not exactly, on a straight line with positive slope. Thus, for all practical purposes, there is a perfect positive correlation between these two variables; that is, r is approximately +1.
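As a cross-check on the Session window output, the correlation coefficient can be computed directly from its definition. The following Python sketch is an illustration only and is not part of the MINITAB procedure; the function name pearson_r is our own.

```python
import math

# Data from the Example 3 table.
height = [62, 64, 66, 68, 70, 72]
weight = [130, 139, 148, 157, 167, 176]


def pearson_r(x, y):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(height, weight)
print(f"r = {r:.5f}")
print(f"r rounded to three decimals = {round(r, 3)}")
```

The unrounded value is just under 1 (about 0.9999), which is why a three-decimal display shows a perfect correlation even though the points are not exactly collinear.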

    Example 4: Determine a linear regression model for the data given in Example 3.

    In order to get the equation for the linear regression model, select Stat > Regression > Regression and the Regression dialog box will appear. In the dialog box, the Response variable corresponds to the dependent (y) variable and the Predictors variable corresponds to the independent variables. Here we have only one independent (x) variable, which is HEIGHT, and the response variable is WEIGHT. Complete the dialog box as shown in Figure 3.8.

    Image467.gif

    Figure 3.8: Regression dialog box for the dependent variable Weight and the independent variable Height

    Click on the OK button and the analysis for the regression will be displayed in the Session window. Figure 3.9 displays the output. From the output we see that the regression equation (also called the prediction equation, the line of best fit, or the least squares regression line) that relates WEIGHT to HEIGHT is WEIGHT = -156 + 4.61 × HEIGHT. Other information is also given in the Session window, but we will ignore it for the time being.

    Example 5: What is the predicted WEIGHT for a person whose HEIGHT is 69 inches?

    We can use the regression equation to predict WEIGHT values for a given HEIGHT value. For instance, in this example, HEIGHT = 69 and so by substituting this value into the regression equation, we have the predicted WEIGHT = -156 + 4.61 × 69 = 162.09 pounds. That is, based on this model, the predicted weight for a person who is 69 inches (5 feet 9 inches) tall is approximately 162 pounds.

    NOTE: This model will work well for independent values within the observed range of values. The range of values for the HEIGHT variable was from 62 inches to 72 inches. Thus, one should not rely on the model to make accurate predictions outside this range of values for the independent variable HEIGHT.
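The fit and the prediction in Examples 4 and 5 can be reproduced by hand with the least squares formulas. The Python sketch below is an illustration under our own naming (fit_line and predict are hypothetical helpers, not MINITAB commands). It also refuses to predict outside the observed HEIGHT range, in the spirit of the NOTE above.

```python
# Data from the Example 3 table.
height = [62, 64, 66, 68, 70, 72]
weight = [130, 139, 148, 157, 167, 176]


def fit_line(x, y):
    """Least squares line: slope = Sxy / Sxx, intercept = ybar - slope * xbar."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(value, intercept, slope, low, high):
    """Predict y at value, rejecting inputs outside the observed x range."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the observed range [{low}, {high}]")
    return intercept + slope * value


b0, b1 = fit_line(height, weight)
pred_69 = predict(69, b0, b1, min(height), max(height))
print(f"WEIGHT = {b0:.2f} + {b1:.4f} * HEIGHT")
print(f"predicted WEIGHT at 69 inches = {pred_69:.2f} pounds")
```

Because the sketch uses unrounded coefficients, its prediction (about 162.06 pounds) differs slightly from the 162.09 pounds obtained above with the rounded equation. Both round to 162 pounds.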

    Image468.gif

    Figure 3.9: Regression Analysis session window output for Example 4

    Example 6: What is the coefficient of determination for the model in Example 4?

    Recall that the coefficient of determination, denoted by r^2 or R^2, measures the proportion of the variation in the dependent variable that is explained by the regression line and the independent variable. This value lies between 0 (0%) and 1 (100%). Thus, the closer the value is to 100%, the better the model fits the data. From Figure 3.9, R^2 = 100.0%. Thus, from a practical standpoint, the model has captured all the variation in the dependent variable.
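The coefficient of determination can be computed from its definition, R^2 = 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares about the mean. The Python sketch below is an illustration only; the variable names are our own.

```python
# Data from the Example 3 table.
height = [62, 64, 66, 68, 70, 72]
weight = [130, 139, 148, 157, 167, 176]

n = len(height)
mean_x = sum(height) / n
mean_y = sum(weight) / n

# Least squares slope and intercept.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(height, weight))
         / sum((x - mean_x) ** 2 for x in height))
intercept = mean_y - slope * mean_x

# SSE: variation left unexplained by the line; SST: total variation in y.
fitted = [intercept + slope * x for x in height]
sse = sum((y - f) ** 2 for y, f in zip(weight, fitted))
sst = sum((y - mean_y) ** 2 for y in weight)

r_squared = 1 - sse / sst
print(f"R^2 = {100 * r_squared:.2f}%")
print(f"R^2 to one decimal = {round(100 * r_squared, 1)}%")
```

The unrounded value is slightly below 100%, so the 100.0% in the output reflects rounding to one decimal place rather than an exact fit.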

    NOTE: We can use MINITAB to superimpose the regression line onto the scatter plot. To achieve this, select Stat > Regression > Fitted Line Plot and the dialog box will be displayed. Fill out as in Figure 3.10 and select the OK button.

    Image471.gif

    Figure 3.10: Fitted Line Plot dialog box for Example 4

    The resulting plot is shown in Figure 3.11. Observe that the regression equation is given on the output as well as the coefficient of determination R-sq.

    Image472.gif

    Figure 3.11: Fitted Line Plot output for Example 4

  3. NONLINEAR REGRESSION

In this section we will investigate patterns and models that are non-linear in nature.

Example 7: Alcohol absorption and the risk of having an accident have been studied for years. Extensive research has provided the following data relating the risk of having an automobile accident to the blood alcohol level. Use MINITAB to present a scatter plot for the data. We will assume that the independent variable (x) is blood alcohol level and the dependent variable (y) is relative risk of accident.

Blood Alcohol Level (%)    Relative Risk of Accident (%)
0.00                        1.00
0.05                        2.90
0.10                        8.50
0.15                       24.8
0.20                       72.2
0.21                       89.5

First, we need to enter the data values into MINITAB. Follow the procedure in Example 3 to present a scatter plot. The scatter plot is presented in Figure 3.12.

Image473.gif

Figure 3.12: Scatter Plot for Example 7

The plot indicates that the pattern is non-linear. The next example will allow us to determine a model for the data.

Example 8: Fit an appropriate model for the data in Example 7. The two other options we have in the Fitted Line Plot are Quadratic and Cubic. See Figure 3.10. Using the procedure for the NOTE in Example 6, select Quadratic for the Fitted Line Plot procedure. The quadratic model superimposed on the scatter plot is shown in Figure 3.13.

Image474.gif

Figure 3.13: Quadratic Fitted Line Plot output for Example 8

The equation for the quadratic model is

Relative Risk = 4.34362 - 309.293 Blood Alcohol + 3295.02 Blood Alcohol**2

If we let y = Relative Risk and x = Blood Alcohol, then we can write the equation as

y = 4.34362 - 309.293x + 3295.02x^2

Observe that because of the square term in the equation, this will be a quadratic model. The R-Sq = 98.2%. Thus, the model explains 98.2% of the variability of the Relative Risk variable. Since this number is close to 100%, we can assume that the model is quite appropriate to describe the pattern of the scatter plot.
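To see where such coefficients come from, the quadratic can be fitted by ordinary least squares outside MINITAB. The Python sketch below solves the normal equations with Gaussian elimination; the helper names solve and polyfit are our own, and we assume (but the source does not state) that the Fitted Line Plot uses ordinary least squares.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit(x, y, degree):
    """Least squares polynomial coefficients, lowest power first."""
    k = degree + 1
    # Normal equations: (A^T A) c = A^T y, where A has columns 1, x, x^2, ...
    ata = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    aty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve(ata, aty)


# Data from the Example 7 table.
bac = [0.00, 0.05, 0.10, 0.15, 0.20, 0.21]
risk = [1.00, 2.90, 8.50, 24.8, 72.2, 89.5]

c0, c1, c2 = polyfit(bac, risk, 2)
print(f"y = {c0:.5f} + ({c1:.3f})x + ({c2:.2f})x^2")
```

Calling polyfit(bac, risk, 3) in the same way gives the coefficients of a cubic model.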

If we use the Cubic option, the fitted line plot as shown in Figure 3.14 will be generated.

Image475.gif

Figure 3.14: Cubic Fitted Line Plot output for Example 8

The equation for the cubic model can be written in terms of x and y as

y = 0.688590 + 171.944x - 3119.78x^2 + 20415.2x^3

Observe that because of the cubic term in the equation, this will be a cubic model. The R-Sq = 99.9%. Thus, the model explains 99.9% of the variability of the Relative Risk variable. Since this number is closer to 100%, we can conclude that the cubic model is more appropriate to describe the pattern of the scatter plot.

Note: One can use these models to predict the Relative Risk of an Accident for a given Blood Alcohol Level. Again these input values for the Blood Alcohol Levels should be within the range of observed values since the model was derived from this range of values.

NOTE: Here we have two models: the quadratic and the cubic. The cubic model is a better fit for the data, with an R-sq value (99.9%) that is slightly greater than that for the quadratic model (98.2%). From a practical standpoint, such a slight improvement in the R-sq value may not compensate for the increase in the complexity (the addition of the cubic term) of the model. Thus, when modeling data, one should look at all aspects and give a rationale for why the model was chosen.
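The two R-sq values can be checked by evaluating each reported equation at the observed blood alcohol levels and applying R^2 = 1 - SSE/SST. The Python sketch below is an illustration only; the function names are our own, and the coefficients are copied from the equations reported above.

```python
# Data from the Example 7 table.
bac = [0.00, 0.05, 0.10, 0.15, 0.20, 0.21]
risk = [1.00, 2.90, 8.50, 24.8, 72.2, 89.5]


def quadratic(x):
    """Quadratic model as reported in Example 8."""
    return 4.34362 - 309.293 * x + 3295.02 * x ** 2


def cubic(x):
    """Cubic model as reported in Example 8."""
    return 0.688590 + 171.944 * x - 3119.78 * x ** 2 + 20415.2 * x ** 3


def r_squared(model, x, y):
    """Proportion of the variation in y explained by the model."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


r2_quad = r_squared(quadratic, bac, risk)
r2_cubic = r_squared(cubic, bac, risk)
print(f"quadratic R-sq = {100 * r2_quad:.1f}%")
print(f"cubic R-sq     = {100 * r2_cubic:.1f}%")
```

With only six data points, the cubic model uses four coefficients, which is worth keeping in mind when weighing the gain in R-sq against the added complexity.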

  4. MULTIPLE LINEAR REGRESSION MODELS

In this section we will investigate multiple linear regression models.

Example 9: An experiment was conducted to determine if the weight of an animal can be predicted after a given period of time on the basis of the initial weight of the animal and the amount of feed that was eaten. The following data, measured in kilograms, were recorded:

Final weight, y    Initial weight, x1    Feed weight, x2
 95                42                    272
 77                33                    226
 80                33                    259
100                45                    292
 97                39                    311
 70                36                    183
 50                32                    173
 80                41                    236
 92                40                    230
 84                38                    235

Use MINITAB to display scatter plots for the dependent variable versus the two independent variables. We will assume that the independent variables are x1 (initial weight) and x2 (feed weight) and the dependent variable is y (the final weight). First, we need to enter the data values into MINITAB. Follow the procedure in Example 3 to present a scatter plot for both independent variables. The scatter plots are presented in Figure 3.15 and Figure 3.16.

Image476.gif

Figure 3.15: Scatter Plot for Example 9

Image477.gif

Figure 3.16: Scatter Plot for Example 9

The plots indicate an approximate linear association between the dependent and the independent variables. The next example will allow us to determine a model for the data.

Example 10: Fit an appropriate model for the data in Example 9. Select Stat > Regression > Regression and fill in the dialog box as shown in Figure 3.17. Observe that in Figure 3.17, we have two Predictors or independent variables.

Image478.gif

Figure 3.17: Regression Dialog box for the Multiple Regression model for Example 10

Select the OK button and the resulting output will be displayed in the Session window, as shown below.

Image479.gif

The equation for the multiple linear regression model is

Final weight y = - 23.0 + 1.40 Initial weight x1 + 0.218 Feed weight x2

Observe that the equation is linear in both predictors, x1 and x2, so this is a multiple linear regression model. The R-Sq = 87.3%. Thus, the model explains 87.3% of the variability of the Final weight variable. Since this number is rather close to 100%, we can assume that the model is quite appropriate to describe the relationship between the variables.
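The coefficients of a model with two predictors can be obtained by least squares in the same way as for a single predictor, by solving the normal equations. The Python sketch below is an illustration only; solve and fit_multiple are our own helper names, not MINITAB commands.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple(predictors, y):
    """Least squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + b2*x2 + ..."""
    design = [[1.0, *row] for row in predictors]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Data from the Example 9 table (kilograms).
final_weight = [95, 77, 80, 100, 97, 70, 50, 80, 92, 84]
initial_weight = [42, 33, 33, 45, 39, 36, 32, 41, 40, 38]
feed_weight = [272, 226, 259, 292, 311, 183, 173, 236, 230, 235]

b0, b1, b2 = fit_multiple(list(zip(initial_weight, feed_weight)), final_weight)
print(f"Final weight = {b0:.2f} + {b1:.3f} x1 + {b2:.4f} x2")
```

The printed coefficients should agree with the Session window equation once rounded to the same number of digits.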

Note: One can use the model to predict the Final weight of an animal for a given initial weight and feed weight. Again these input values for the independent variables should be within the range of observed values since the model was derived from these ranges.

Example 11: Select the Graphs option in the Regression dialog box as shown in Figure 3.17 to display residual plots and a normality plot for the data given in Example 9. Click on the Graphs option and in the resulting dialog box, select the options as shown in Figure 3.18.

Image480.gif

Figure 3.18: Regression-Graphs Dialog box for the Multiple Regression model for Example 9

Two of the resulting graphs are shown in Figures 3.19 and 3.20.

Figure 3.19 shows the normality plot for the residuals for the model. Observe that the plot displays a linear pattern, which indicates that the normality assumption for the regression model has not been violated.

Figure 3.20 shows the plot for the residuals versus time order for the model. This plot is usually used to help determine visually whether the independence assumption for the model has been violated. Observe that the plot displays no apparent pattern, which indicates that the independence assumption for the regression model has not been violated.
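The quantities plotted in these graphs are the residuals, that is, the observed values minus the fitted values. The Python sketch below lists them in observation order using the rounded equation reported in Example 10; it is an illustration only, and the function name predicted is our own.

```python
# Data from the Example 9 table (kilograms), in the order recorded.
final_weight = [95, 77, 80, 100, 97, 70, 50, 80, 92, 84]
initial_weight = [42, 33, 33, 45, 39, 36, 32, 41, 40, 38]
feed_weight = [272, 226, 259, 292, 311, 183, 173, 236, 230, 235]


def predicted(x1, x2):
    """Fitted final weight from the rounded equation reported in Example 10."""
    return -23.0 + 1.40 * x1 + 0.218 * x2


# Residual = observed value minus fitted value.
residuals = [y - predicted(x1, x2)
             for y, x1, x2 in zip(final_weight, initial_weight, feed_weight)]

print("Order  Residual")
for order, e in enumerate(residuals, start=1):
    print(f"{order:5d}  {e:8.3f}")
```

Because the coefficients are rounded, these residuals do not sum exactly to zero as they would for the unrounded least squares fit. A run of residuals with the same sign, or a steady drift with observation order, would be a warning sign for the independence assumption.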

Image481.gif

Figure 3.19: Normality Plot for the Residuals for the Multiple Regression model in Example 9

Note: Refer to your text for detailed discussions on the assumptions for a regression model.

Image482.gif

Figure 3.20: Residual Plot versus Order of Observation for the Residuals for the Multiple Regression model in Example 9

    NOTES

    EXPLORATION #3: HOMEWORK ASSIGNMENT

    Name: _____________________ Date: ______________________

    Course #: ___________________ Instructor: _________________

    1. State in each of the following cases whether you would expect the relationship between the given variables to be positive or negative or neither.
      1. An individual's height and weight. ________________________
      2. The number of hours that a runner practices and his time for a 1-mile race.
        ____________________
      3. A college student's cumulative grade point average and the number of hours per week that she works at a job. _____________________
      4. The number of automobile accidents and the amount of insurance premiums that the driver has to pay. ___________________
      5. The percentage of nitrogen in a fertilizer and the height to which a treated plant will grow. ______________________
      6. The amount of alcohol consumed by an individual and the length of time in which he responds to a given stimulus. ____________________
      7. The amount of alcohol consumed by an individual and the number of teachers at that individual's previous high school. _____________________
    2. The table below gives the chirping frequency for the striped ground cricket and the corresponding temperature.

      Chirps / second    Temperature in degrees F
      20.0               88.6
      16.0               71.6
      19.8               93.3
      18.4               84.3
      17.1               80.6
      15.5               75.2
      14.7               69.7
      17.1               82.0
      15.4               69.4
      16.2               83.3
      15.0               79.6
      17.2               82.6
      16.0               80.6
      17.0               83.5
      14.4               76.3

      Source: http://ericir.syr.edu/Virtual/Lessons/Mathematics/Statistics/STA0002.html

      1. Produce a scatter plot for the data. Let chirps/second be your x values (independent) and temperature in degrees Fahrenheit be your y values (dependent). Label appropriately and turn in a hard copy with your work.

        Note: You may want to review Example 1 for directions on how to produce a scatter plot.

      2. Based on the scatter plot, what type of correlation, if any, do you observe? Discuss.
      3. What is the value of the correlation coefficient between these two variables?
        r = __________________________
      4. Interpret the value of the correlation coefficient. Discuss.
      5. What is the coefficient of determination (R^2) for the correlation type you chose?
        R^2 = ___________________%
      6. Interpret the value of the coefficient of determination. Discuss.
      7. What is the regression equation? _____________________________________
      8. Provide a print out of the graph of the scatter plot and the regression equation together.
      9. Use your graph to estimate the number of chirps/second a striped ground cricket would make at a temperature of 98° Fahrenheit.
        Estimated number of chirps: __________________________
      10. Using the regression equation, what is the predicted temperature, in Fahrenheit, if the number of chirps/second made by a striped ground cricket was 19?
        Predicted temperature: __________________________
      11. Using the regression equation, what is the predicted temperature, in Fahrenheit, if the number of chirps/second made by a striped ground cricket was 50?
        Predicted temperature: __________________________
      12. Discuss the results obtained in part (i).
    3. The following table shows the resident population, in thousands, of 85+ year olds for the United States from 1960 to 1996.

      Year (t)     Population (in thousands)
      1960 (0)      567
      1961 (1)      592
      1962 (2)      608
      1963 (3)      627
      1964 (4)      653
      1965 (5)      684
      1966 (6)      718
      1967 (7)      760
      1968 (8)      800
      1969 (9)      848
      1970 (10)     969
      1971 (11)     977
      1972 (12)    1020
      1973 (13)    1069
      1974 (14)    1141
      1975 (15)    1227
      1976 (16)    1290
      1977 (17)    1365
      1978 (18)    1443
      1979 (19)    1521
      1980 (20)    1559
      1981 (21)    1649
      1982 (22)    1720
      1983 (23)    1786
      1984 (24)    1848
      1985 (25)    1906
      1986 (26)    1963
      1987 (27)    2025
      1988 (28)    2075
      1989 (29)    2137
      1990 (30)    2180
      1991 (31)    2279
      1992 (32)    2347
      1993 (33)    2467
      1994 (34)    2542
      1995 (35)    2661
      1996 (36)    2692

      1. Construct a scatter plot for the population table. Use the t values (0, 1, 2, ..., 36) for the independent x values and the population values for the dependent y values. Label appropriately and present a hard copy with your work.
      2. What type of pattern, if any exists, do you think best fits the data? Discuss.
      3. What is the coefficient of determination for the regression that you chose?
        R^2 = __________________%
      4. Interpret the value of the coefficient of determination. Discuss.
      5. What is an appropriate regression equation for the data?
        Regression equation: ____________________________________________
      6. Justify why you chose the model in part (e). Discuss.
      7. Estimate, using the regression equation in part (e), the resident population of 85+ year olds in the year 1999. (Here, t = 39).
        Estimated population: _____________________________
    4. The following table contains data points representing the height of a model rocket at various times during its flight after its rocket motor has burned out.

      Seconds since rocket launched    Height of rocket in feet
      1                                230
      2                                310
      3                                350
      4                                360
      5                                350
      6                                300
      7                                220

      1. Use MINITAB to construct a scatter plot of the data. Let the x values be the values for the number of seconds since the rocket was launched and let the y values be the values for the height above ground of the rocket in feet. Label appropriately and present a hard copy with your work.
      2. Discuss the shape of the scatter plot and whether there seems to be a correlation between the two variables.
      3. What is the best regression equation for the data?
        Regression equation: ________________________________________________
      4. What is the value of the coefficient of determination, R^2?
        R^2 = _____________________%
      5. Interpret the value of the coefficient of determination. Discuss.
      6. Discuss why you chose the model in part (c).
      7. Construct a fitted line plot for the data and use the curve to estimate the following.
        • When did the rocket reach its maximum height? ____________________
        • What was the rocket's maximum height? ____________________
        • When will the rocket hit the ground? ___________________
    5. The following table shows the Length (in inches) and Weight (in pounds) of alligators.
      1. Create a scatter plot for the length and weight of the alligators. Let the values for the length be along the x-axis and the values for the weight be along the y-axis. Include a hard copy of the plot with your work.
      2. What type of correlation best fits the scatter plot?
        ___________________________

        Note: To answer this question you may want to check several different model possibilities and choose the one with the best coefficient of determination. Discuss your reasoning.

        Length (in inches)    Weight (in pounds)
         94                   130
         74                    51
        147                   640
         58                    28
         86                    80
         94                   110
         63                    33
         86                    90
         69                    36
         72                    38
        128                   366
         85                    84
         82                    80
         86                    83
         88                    70
         72                    61
         74                    54
         61                    44
         90                   106
         89                    84
         68                    39
         76                    42
        114                   197
         90                   102
         78                    57

      3. What is the value of the coefficient of determination for the type of regression that you chose?
        R^2: _____________________%
      4. What is the equation of the best fitting model?
        __________________________________________________________
      5. Discuss the strengths and weaknesses of the model.
      6. Present a hard copy of the fitted line plot superimposed on the scatter plot for your model from part (d) with your work.
      7. Using your fitted line plot, what is the approximate weight of an alligator that is 100 inches long?
        Weight: ____________________pounds
      8. Using your regression equation, what is the predicted weight of an alligator that is 100 inches long?
        Weight: ____________________pounds
    6. The following table gives the percent of refillable soft-drink containers sold out of the total soft drinks sold from 1960-1990 in five-year increments.

      Year (t)     Percent of Total Sold
      1960 (0)     96
      1965 (5)     84
      1970 (10)    65
      1975 (15)    57
      1980 (20)    34
      1985 (25)    23
      1990 (30)     7

      Source: Prentice Hall, Algebra (1998)

      1. Create a scatter plot for the percent of refillable soft drinks sold out of the total soft drinks sold from 1960-1990 in five year increments. Turn in a hard copy of the plot with your work.
      2. What type of correlation, if any, exists between the percent of refillable soft drinks sold and the years 1960-1990?
        ____________________________
      3. What is the value of the coefficient of determination R^2?
        R^2: ____________________________%
      4. What is the regression equation? __________________________________
      5. Graph the scatter plot and the regression curve on the same display. Sketch the display below. Turn in a hard copy with your work.
      6. From the regression equation, estimate the percent of refillable soft drinks sold for the year 1995 (t = 35).
        ____________________________%
    7. The following table shows the population (in millions) of 15-19 year olds from 1960-1996.

      Year (t)     Population (in millions) 15-19 year olds
      1960 (0)      6586
      1961 (1)      6794
      1962 (2)      7376
      1963 (3)      7647
      1964 (4)      8008
      1965 (5)      8386
      1966 (6)      8842
      1967 (7)      8836
      1968 (8)      9013
      1969 (9)      9234
      1970 (10)     9437
      1971 (11)     9740
      1972 (12)     9988
      1973 (13)    10193
      1974 (14)    10349
      1975 (15)    10465
      1976 (16)    10582
      1977 (17)    10581
      1978 (18)    10555
      1979 (19)    10498
      1980 (20)    10413
      1981 (21)    10096
      1982 (22)     9809
      1983 (23)     9515
      1984 (24)     9287
      1985 (25)     9174
      1986 (26)     9206
      1987 (27)     9139
      1988 (28)     9029
      1989 (29)     8840
      1990 (30)     8709
      1991 (31)     8371
      1992 (32)     8324
      1993 (33)     8410
      1994 (34)     8580
      1995 (35)     8779
      1996 (36)     9043

      1. Create a scatter plot for the population (in millions) of 15-19 year olds from 1960 to 1996. Let the x-values be the t values (year) and the y-values be the population. Provide a hard copy of the plot.
      2. What is the value of the coefficient of determination for the best model?
        R^2: _______________________%
      3. What is the regression equation? ______________________________________
      4. Discuss why you chose the model in part (c).
      5. Graph the scatter plot and the regression curve on the same window and present a hard copy with your work.
      6. Using your regression equation, what is the estimated number of 15-19 year olds in the year 2000 (t = 40)?
        ___________________________________
      7. Using your fitted line plot, in what year did the population of 15-19 year olds reach its maximum thus far?
        (Remember to transfer the time value back into the appropriate year).
        Year: ___________________________
    8. The following table displays temperature data for the years 1988-1990 taken from the middle of the mouth of the Chesapeake Bay.

      Month    1988     1989     1990
      Jan       1.56     5.76     5.28
      Feb       4.68     5.28     7.20
      Mar       7.20     5.88     9.72
      Apr      11.40    11.50    12.50
      May      17.30    16.70    18.40
      Jun      21.80    22.50    21.20
      Jul      24.70    26.00    25.10
      Aug      22.80    25.10    26.30
      Sep      21.60    22.70    24.00
      Oct      18.00    18.20    21.10
      Nov      12.80    13.60    13.00
      Dec       9.72     7.70     8.64

      1. Construct scatter plots with the months along the horizontal (x) axis and the temperature along the vertical (y) axis for the three different years. Note: you should use dummy values for the months along the x-axis. That is, you can recode Jan = 1, Feb = 2, Mar = 3, etc. and let these values (1, 2, 3, ...) be the values along the x-axis.
      2. Describe the shape of the graphs.

        Plot for 1988

        Plot for 1989

        Plot for 1990

      3. Discuss any observations about these scatter plots that you have made.
      4. What are the best fitting regression equations for the plots? Discuss why you made your choice.

        Equation for 1988: ___________________________________________________

        Equation for 1989: ___________________________________________________

        Equation for 1990: ___________________________________________________

    9. The following table shows the funding for support technology for the Ballistic Missile Defense Organization (BMDO) from 1985 to 1999.

      Year (t)     Budget for Support Technology (in millions)
      1985 (1)      748
      1986 (2)     1606
      1987 (3)     2025
      1988 (4)     2005
      1989 (5)     1865
      1990 (6)     1857
      1991 (7)     1431
      1992 (8)     1194
      1993 (9)      718
      1994 (10)     529
      1995 (11)     382
      1996 (12)     381
      1997 (13)     393
      1998 (14)     408
      1999 (15)     637

      Source: BMDOFACTSHEETPO-99-02

      1. Create a scatter plot for the budget from 1985 to 1999. Let the x-values be the t values (year) and the y-values be the amount of the budget in millions. Provide a hard copy of the plot.
      2. What is the value of the coefficient of determination for the best model?

        R^2: _______________________%

      3. What is the best regression equation for the data?

        _____________________________________________________________

      4. Discuss why you chose the model in part (c).
      5. Graph the scatter plot and the regression curve on the same window and present a hard copy with your work.
      6. Using your regression equation, what is the estimated budget for technology support for BMDO for the year 2000?

        Estimated budget ($ millions) :___________________________________

    10. Twenty-three student teachers took part in an evaluation program designed to measure teacher effectiveness and determine what factors are important. Eleven female instructors took part. The response measure was a quantitative evaluation made on the cooperating teacher. The independent variables were scores on four standardized tests given to each instructor. The data were as follows:

      Y      X1    X2     X3       X4
      410    69    125    59.00    55.66
      569    57    131    31.75    63.97
      425    77    141    80.50    45.32
      344    81    122    75.00    46.67
      324     0    141    49.00    41.21
      505    53    152    49.35    43.83
      235    77    141    60.75    41.61
      501    76    132    41.25    64.57
      400    65    157    50.75    42.41
      584    97    166    32.25    57.95
      434    76    141    54.50    57.90

      1. Use MINITAB to draw scatter plots for Y versus the independent variables. Provide hard copies of these plots.
      2. Discuss any observations from these graphs.
      3. Use MINITAB to fit an appropriate multiple linear regression model. Discuss why you think this is the most appropriate model.
      4. Present residual plots to determine whether the assumptions for the model were violated. (Note: Check your text for the assumptions for a multiple linear regression model). To obtain the residual plots you need to select the Graphs option in the Regression dialog box. See Figure 3.17.


    © 1995-2002 by Prentice-Hall, Inc.
    A Pearson Company