INTRODUCTION
The availability of microcomputers and appropriate software makes the use of numerical and statistical methods by undergraduates readily feasible. The important question is how useful and meaningful the use of such software by undergraduates is. There is a risk that either too little or too much information about a numerical or statistical method is provided to the student. If too little information is provided, the student goes through a process which provides some computer output, but obtains no real understanding of what the information obtained means and how it is to be used. If too much information is provided, the student may become confused and frustrated. The level of instruction must be appropriate, considering the abilities and needs of the students.
Linear least squares is one of the most useful statistical methods for beginning students. In the pre-computer days it was common to include an experiment in the junior or senior level physical chemistry or instrumental analysis laboratory course which involved a linear least squares calculation. This exposed the student to the technique. Because of the amount of time required to do the calculations either by hand or calculator, it wasn't really feasible to have the student carry out more than one or a few least squares fits.
At Clarkson where every student has a computer (currently an IBM PS/2) in his (or her) own room and where there are computers in the laboratory and elsewhere on campus, I have asked students to use linear least squares software in a second semester freshman laboratory course and a second semester junior combined analytical physical chemistry laboratory course.
In this article I will describe how linear least squares can be introduced at an elementary level. In my view it is not essential that the least squares equations are derived by the instructor. What the student needs to understand is how the method is to be used and how the results of linear least squares calculations are to be interpreted.
Linear Equations
There are TWO different linear equations which the software must be able to fit:
Y=AX + B (1), and
Y=AX (2)
Some software will only fit data to equation (1). Equation (1) is really not appropriate in some instances. For example, the dependence of absorbance upon the concentration of a single absorbing constituent is given by the Beer-Lambert Law (A = abc) which has the form of equation (2). In chromatography it is usually expected that peak height or peak area (Y) is related to concentration or amount (X) by equation (2). These are only a few examples of where equation (2) rather than equation (1) is appropriate. In many other cases equation (1) should be used to fit experimental data. If a substance R reacts to produce a product P and the reaction is first order in R, the integrated form of the rate expression is:
ln[Rt] = ln[R0] - kt (3)
where [Rt] is the concentration of R at time t, and [R0] is the concentration initially (at t=0). This is of the same form as equation (1) where Y = ln[Rt] and X = t, so that the slope A is -k and the intercept B is ln[R0].
Many students will use equation (1) indiscriminately because it gives a "better" fit to the data. If the purpose of the experiment is to test the validity of a theoretical or an accepted relationship which is like equation (2) (e.g. Beer's Law or chromatography equations), then equation (2) and not equation (1) should be fitted.
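The point that equation (1) always appears to give a "better" fit can be checked numerically. The following Python sketch is an illustration only and is not the program used at Clarkson; the function names and the Beer's law data are hypothetical. It fits the same data to both equations and compares the sum of the squares of the deviations.

```python
def fit_with_intercept(xs, ys):
    """Return slope A and intercept B for Y = A*X + B (equation (1))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_through_origin(xs, ys):
    """Return slope A for Y = A*X (equation (2))."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def ssd(xs, ys, a, b=0.0):
    """Sum of squared deviations between experimental and calculated Y."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical Beer's law data: concentration (X) and absorbance (Y).
conc = [1.0, 2.0, 3.0, 4.0, 5.0]
absorbance = [0.11, 0.19, 0.32, 0.39, 0.51]

a1, b1 = fit_with_intercept(conc, absorbance)
a2 = fit_through_origin(conc, absorbance)

print("Equation (1): A =", a1, "B =", b1, "SSD =", ssd(conc, absorbance, a1, b1))
print("Equation (2): A =", a2, "SSD =", ssd(conc, absorbance, a2))
```

Because equation (1) has one more adjustable constant, its SSD can never be larger than that of equation (2) for the same data. A smaller SSD is therefore not, by itself, a reason to prefer equation (1).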
THE THEORY AND ASSUMPTIONS OF LINEAR LEAST SQUARES
In the least squares fitting of experimental data to equation (1) or (2), values of A (and B, for equation (1)) are found such that the sum of the squares of the deviations (SSD) is a minimum.
SSD = Sum (Y(experimental) - Y(calculated))**2 (4)
Y(calculated) is the value of Y calculated from equation (1) (or (2)). Indeed, least squares means least sum of squares of deviations (SSD).
While students may not realize it, they are already familiar with a least squares calculation. The average of a series of measurements is a least squares value.
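A short Python sketch can make this concrete for students. The measurements below are hypothetical. The sketch computes the sum of the squares of the deviations about the average and about two nearby trial values.

```python
def ssd_about(values, c):
    """Sum of squared deviations of the values from a trial value c."""
    return sum((v - c) ** 2 for v in values)


# Hypothetical repeated measurements of one quantity.
measurements = [10.2, 9.8, 10.1, 10.4, 9.9]
mean = sum(measurements) / len(measurements)

# Compare the SSD at the average with the SSD at values just below and above it.
for trial in (mean - 0.1, mean, mean + 0.1):
    print("trial value =", round(trial, 3), " SSD =", ssd_about(measurements, trial))
```

The SSD is smallest when the trial value is the average, which is the sense in which the average is a least squares value.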
Implicit in the assumption that the least squares calculation gives a "best fit" to the data are the assumptions that there is little or no error in X and that the error in Y is independent of the value of Y and is normally distributed. (Normally distributed errors give rise to the typical bell-shaped curve. Small errors in Y are more likely than large errors and positive and negative errors are equally likely.) In discussing errors in Y we are considering random errors. Where equation (1) is being fitted, residual errors in Y will affect the Y intercept, B, but not the least squares value of the slope, A.
The assumption that the magnitude of the error in Y is independent of the value of Y is implicit in the method of unweighted least squares. Where this assumption is not valid, it is possible to use weighted least squares methods. While such calculations are relatively simple to perform, the topic and related techniques are too confusing to introduce at a beginning level.
INFORMATION WHICH SHOULD BE OBTAINED FROM THE LEAST SQUARES CALCULATIONS AND HOW THIS INFORMATION CAN BE USED
In addition to obtaining the least squares slope, A, and intercept, B (for equation (1)), there are a number of other statistics which a satisfactory least squares program should provide. Typical output obtained with the program used at Clarkson is shown below:
This is simulated kinetic data least squares fitted using equation (3), where EXPT X is the time in seconds and EXPT Y is ln[Rt] obtained from experimental measurements. Ideally, more experimental points would be obtained. Only a few points are used in this calculation to conserve space and make it easier to check some of the calculations.
(If the data had been fitted to equation (2), which is obviously inappropriate, the SLOPE would be -2.80E-03, S.D. SLOPE would be 5.41E-04 and S.D. REG would be 2.963.)
One of the most important statistics obtained is the standard deviation from regression (S.D. REG). The standard deviation from regression is the square root of the sum of the squares of the deviations (SSD) divided by the number of degrees of freedom and can be represented by the equation:
S.D. REG = SQR(SSD/(N - NLSP)) (5)
where N is the number of data points, and NLSP is the number of least squares parameters (constants), which would be 2 (A and B) for equation (1) and 1 (A) for equation (2). For the data given above SSD is 4.30E-04 and S.D. REG is the square root of 4.30E-04 divided by 2, which is 1.47E-02.
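The arithmetic of equation (5) can be checked with a few lines of Python. This is a sketch only. The function name is hypothetical, and the value N = 4 is inferred from the statement that SSD is divided by 2 when NLSP is 2; it is not stated directly in the text.

```python
import math


def sd_regression(ssd, n_points, n_params):
    """Standard deviation from regression, equation (5)."""
    return math.sqrt(ssd / (n_points - n_params))


SSD = 4.30e-04   # quoted in the text for the sample kinetic data
N = 4            # assumption: inferred from N - NLSP = 2
NLSP = 2         # A and B, since equation (1) is being fitted

sd_reg = sd_regression(SSD, N, NLSP)
print("S.D. REG =", format(sd_reg, ".2E"))
```

Fitting the same four points to equation (2) would leave 3 degrees of freedom instead of 2, because only one constant is determined.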
The standard deviation from regression is to be understood as the "average" difference between the experimental and calculated value of Y. Perhaps the most important question to be answered from the results of any linear least squares calculation is: "Is there really a linear relationship between Y and X?" Is the reaction really first order? Is absorbance linearly related to concentration? Is peak height linearly related to the amount of a particular constituent present? One means of deciding whether the results of the least squares calculation are consistent with the hypothesis that the reaction is first order is to compare the standard deviation from regression with an estimate of the random error in Y (ln[Rt]). If the standard deviation from regression is comparable to the estimated random error in Y, this is consistent with the hypothesis that there is a linear relationship. If S.D. REG is much larger than the estimated error in Y, the data do not support the idea of a linear relationship. (The F test (reference (1)) can be used to provide probabilities but this is unnecessarily complicating at an elementary level.)
A linear least squares program can be used to fit a straight line to ANY set of data. The fact that a least squares fit is obtained does not prove there is a linear relationship. The standard deviation from regression provides some information. Another source of information is the deviation pattern. Suppose the relationship between Y and X is really logarithmic and the data are fitted to a straight line. There will be positive (or negative) deviations at small and large values of X and negative (or positive) deviations in the middle. The presence of such a pattern suggests that there really isn't a linear relationship. Thus, looking at the signs and magnitudes of the deviations may suggest a non-linear relationship. Conclusions based upon the deviation pattern are generally only possible when many points are available.
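The deviation pattern described here can be demonstrated with a small Python sketch. The data are hypothetical and error free: Y is taken to be exactly ln(X) for X = 1 to 10 and is then fitted to equation (1). The function name is an assumption made for the illustration.

```python
import math


def fit_with_intercept(xs, ys):
    """Return slope A and intercept B for Y = A*X + B (equation (1))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Hypothetical data that really follow a logarithmic relationship.
xs = [float(i) for i in range(1, 11)]
ys = [math.log(x) for x in xs]

a, b = fit_with_intercept(xs, ys)

# Deviation = Y(experimental) - Y(calculated), as in equation (4).
deviations = [y - (a * x + b) for x, y in zip(xs, ys)]
for x, d in zip(xs, deviations):
    print("X =", x, " deviation =", round(d, 4), "+" if d > 0 else "-")
```

The deviations at the smallest and largest X have one sign and those in the middle have the other, even though the straight line was obtained without difficulty.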
Students should be asked to plot the data and least squares line in addition to performing least squares calculations. Such plots should initially be done by hand. Students have a lot to learn about the proper plotting of data (scaling, labelling, representing experimental points and least squares lines). The plots help to identify points which are far removed from the "best" line or curve. Systematic deviations or curvature may be revealed by such plots. At a later stage students should learn to use appropriate plotting programs. In the junior level course students were required to use a plotting program. Such programs save time, and computer-generated plots are generally used in the real world.
If it has been determined that the data are adequately fitted by a linear equation, the least squares constants and their standard deviations are then examined. For the kinetic data, once it has been established that the reaction is first order in R, the slope gives the first order rate constant, which has a value for the sample data of 7.07E-04 1/sec. S.D. SLOPE can be used to estimate the random error. At the 90% level of certainty, the random error in the rate constant is 0.19E-04 1/sec (2.92 X 6.56E-06; t-test, reference (1)). (The half life is 980 ± 26 sec.) The Y intercept is -6.285 ([R0] = 1.86E-03 M). S.D. INTERCEPT is 0.0180. The random error in the Y intercept is 0.053 at the 90% level.
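The numbers quoted for the sample data can be reproduced with the following Python sketch. The variable names are hypothetical. The error in the half life is estimated by assuming that its relative error equals the relative error in the rate constant, which is a first order approximation and is my assumption about how the quoted value was obtained.

```python
import math

T_90 = 2.92             # t value quoted in the text (90% level)
k = 7.07e-04            # rate constant from the slope, 1/sec
sd_slope = 6.56e-06     # S.D. SLOPE, 1/sec
intercept = -6.285      # Y intercept, ln[R0]
sd_intercept = 0.0180   # standard deviation of the intercept

# Random error in the rate constant at the 90% level.
k_error = T_90 * sd_slope

# Half life of a first order reaction, with the same relative error as k.
half_life = math.log(2) / k
half_life_error = half_life * (k_error / k)

# Initial concentration and random error in the intercept.
r0 = math.exp(intercept)
intercept_error = T_90 * sd_intercept

print("k =", k, "+/-", k_error, "1/sec")
print("half life =", half_life, "+/-", half_life_error, "sec")
print("[R0] =", r0, "M;  error in intercept =", intercept_error)
```

The results should agree with the values in the text to within rounding of the last digit.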
Sometimes it is the slope and sometimes it is the Y intercept which is the more important result of the least squares calculation. Generally, when equation (3) is used, it is the rate constant (or slope) which is of most interest.
In the junior level course at Clarkson students prepare Nylon 66. The viscosity of several different Nylon 66 solutions is measured. A function of the measured viscosity is plotted versus concentration (reference (2)). Extrapolation to zero concentration (the Y intercept) gives the intrinsic viscosity. In the experiment, the Y intercept (intrinsic viscosity) was determined by extrapolation from two different least squares calculations. The average molecular weight of the polymer is calculated from the intrinsic viscosity. S.D. INTERCEPT can be used to estimate the random error in the molecular weight.
As explained above, S.D. REG is used to decide whether the data are satisfactorily fitted by the expected linear equation. If S.D. REG is much larger than the estimated error in Y, it is concluded either that the data are not adequately fitted by the linear equation or that the estimated error in Y is incorrect. Perhaps some other equation is appropriate and/or some of the assumptions leading to the expectation of a linear relationship are not valid. For example, in the kinetics experiment the forward reaction might not be first order, there may be a back reaction, the temperature may not have been maintained constant, or there may be unsuspected errors in the estimation of [Rt]. It may not be appropriate to consider too many complications, particularly when students are at an elementary level.
For one kinetics experiment performed at Clarkson, students are asked to determine whether a reaction is zeroth order, first order or second order in R. A different linear equation can be written for each case and a least squares calculation performed. Students are asked to draw conclusions using the results of each least squares calculation.
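A sketch of such a comparison in Python follows. It assumes the usual integrated rate laws, in which [R], ln[R] and 1/[R] are each linear in time for zeroth, first and second order reactions respectively. The data are hypothetical and error free, generated for a first order reaction with an assumed rate constant and initial concentration; the function names are also assumptions.

```python
import math


def fit_with_intercept(xs, ys):
    """Return slope A and intercept B for Y = A*X + B (equation (1))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def sd_regression(xs, ys, a, b):
    """Standard deviation from regression for a two-constant fit."""
    ssd = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(ssd / (len(xs) - 2))


# Hypothetical first order data: assumed k (1/sec) and [R0] (M).
K_ASSUMED = 7.0e-04
R0_ASSUMED = 2.0e-03
times = [0.0, 500.0, 1000.0, 1500.0, 2000.0, 2500.0]
conc = [R0_ASSUMED * math.exp(-K_ASSUMED * t) for t in times]

# One linear equation per assumed reaction order.
transforms = {
    "zeroth order, Y = [R]": conc,
    "first order, Y = ln[R]": [math.log(c) for c in conc],
    "second order, Y = 1/[R]": [1.0 / c for c in conc],
}

results = {}
for label, ys in transforms.items():
    a, b = fit_with_intercept(times, ys)
    results[label] = sd_regression(times, ys, a, b)
    print(label, ": slope =", a, " S.D. REG =", results[label])
```

Each S.D. REG is in the units of its own Y, so the three values should not be compared with each other directly. Each one is compared with the estimated random error in the corresponding Y, and the deviation patterns are examined for curvature.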
SUMMARY: SOME IMPORTANT QUESTIONS
Some questions which need to be considered by students in performing and analyzing the results of experiments:
(1) What is the expected or theoretical relationship between the variables (X and Y)?
(2) Are there other variables or equations which need to be considered? For example, the temperature may need to be measured and kept constant in a kinetics experiment. Absorbance may be used to monitor concentration and it must be confirmed (or assumed) that the system conforms to Beer's law.
(3) What are the values of the least squares constants and their standard deviations? What is the standard deviation from regression?
(4) What are the assumptions made in performing the least squares calculations? Are these assumptions correct?
(5) What is the estimate of the random error in Y? Have all important sources of random error been considered in estimating the random error in Y?
(6) Is the theoretical or expected equation consistent with the results of the least squares fit?
(7) If the answer to (6) is yes, what can be concluded from the results? For example, the reaction is first order in R. The numerical value of the rate constant is --- and the random error in the rate constant is ---.
(8) If the answer to (6) is no, what conclusions can be drawn? This is usually a more difficult question to answer.
(9) What additional experiments might be useful?
THE LINEAR LEAST SQUARES PROGRAM
Any least squares program which will fit either of the two equations and provides the other statistics is satisfactory. The linear least squares program used at Clarkson was developed over a period of years. It was originally written in FORTRAN for a mainframe computer. The current version is available in BASIC and runs on a microcomputer. For anyone interested in the program, a source listing will be available on the Chemistry Education Discussion List (CHEMED-L) on June 1 and September 1, 1992. (For information on CHEMED-L look elsewhere in this Newsletter.) Those without access to CHEMED-L may obtain a listing of the program and an ASCII source file on either a 5 1/4 inch or a 3 1/2 inch disk (if desired) by sending $3 to me.
REFERENCES
(1) G. W. Snedecor and W. G. Cochran, "Statistical Methods", 8th edition; Iowa State University Press, Ames IA, 1989.
(2) D. P. Shoemaker, C. W. Garland, and J. W. Nibler, "Experiments in Physical Chemistry", 5th edition, p. 370-380; McGraw Hill, New York, 1989.