Volume 12 Number 1
1984
Nucleic Acids Research
Molecular weight determination program
Claude V.Maina, Garry P.Nolan and Aladar A.Szalay*
Boyce Thompson Institute for Plant Research, Cornell University, Tower Road, Ithaca, NY 14853,
USA
Received 19 March 1983
ABSTRACT
A computer program is described that will determine the molecular weight
of DNA, RNA or protein molecules separated according to size by gel
electrophoresis. It uses the sizes and migration distances of known molecules
in a reference lane to compute a second or third order equation whose curve
best fits the data points. It then computes the sizes of all molecules from
this equation. Migration distances are measured and entered using an analog
tablet. The program is written in Apple Pascal and designed to run on an
Apple II Plus computer.
INTRODUCTION
Gel electrophoresis has become a common and important tool for the
molecular biologist to determine the size of macromolecules such as DNA, RNA
and proteins. The procedure itself is simple, but the analysis of the data,
although simple as well, is time consuming and tedious. A plot of migration
distances versus the logarithm of the size of known marker molecules (size
standards) is made, and a curve that best fits the data points is then drawn.
A theoretical plot of the log of the size versus the migration distance
results in a straight line (a first order equation) (1, 2, 3). However, due
to perturbations in the gel, a positive curvature exists at the upper end of
the plot and, in most cases, a negative curvature at the lower end. The
result is a curve that mathematically resembles either a second or a third
order equation. Because of this mathematical relationship between the size
and migration distances it is possible to write a computer program to
calculate the coefficients of the equation which best describes the plot of
migration distance versus size for known molecules. Such a program has been
written (4) and it computes the coefficients of a second order equation. We
set out to write a program for a common microcomputer that would measure
migration distances, compute sizes using either a second or third order
equation and require no knowledge of computers by the user. This paper
describes the algorithm and usage of the program.
© IRL Press Limited, Oxford, England.
COMPUTATION ALGORITHM
The program uses the method of least squares to calculate the
coefficients of the second or third order equation which best describes the
data points determined from the standard lane. Its premise is to determine
the coefficients of an equation such that the sum of the squares of the
vertical deviations between each data point and the curve is a minimum.
Setting the partial derivative of this sum with respect to each coefficient
to zero generates n + 1 equations (the normal equations), where n is the order
of the equation. A third order equation therefore generates four equations in
four unknowns. These simultaneous equations are solved by Gaussian
elimination, a matrix solution procedure.
The sizes of unknown molecules are then computed by substituting their
migration distances into the equation.
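The fitting step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Apple Pascal code. The function name `fit_polynomial` is hypothetical. Coefficients are returned constant term first, which is the order used in the Fig. 3 printout. Partial pivoting is added to the Gaussian elimination for numerical safety, although the text does not mention it. For gel data, `xs` would be the standard mobilities in centimeters and `ys` the base-10 logarithms of the standard sizes.

```python
def fit_polynomial(xs, ys, order):
    """Least-squares fit of y = a0 + a1*x + ... + an*x**n (n = order).

    Forms the n + 1 normal equations and solves them by Gaussian
    elimination with partial pivoting. Returns [a0, a1, ..., an].
    """
    n = order + 1
    if len(xs) < n:
        raise ValueError("need at least order + 1 data points")
    # Power sums S[k] = sum of x**k for k = 0 .. 2*order.
    s = [sum(x ** k for x in xs) for k in range(2 * order + 1)]
    # Augmented matrix: row i is  sum_j S[i+j]*a_j = sum of y*x**i.
    m = [[s[i + j] for j in range(n)] + [sum(y * x ** i for x, y in zip(xs, ys))]
         for i in range(n)]

    # Forward elimination, swapping in the largest pivot in each column.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution yields the coefficients from a_n down to a_0.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (m[row][n] - known) / m[row][row]
    return coeffs
```

For example, `fit_polynomial(mobilities, [math.log10(s) for s in sizes], 3)` would return the four coefficients of a cubic standard curve. Solving the normal equations directly is the classical approach described above. Library routines such as NumPy's `polyfit` avoid forming the normal equations explicitly, which is more robust at higher orders.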
USAGE
The program is written in Apple Pascal and designed to be used on an
Apple II Plus computer with a Saturn 128K Ram Board. The computer is
interfaced to a Zeiss Mop3 Analog Tablet, an IDS 560 dot matrix printer and a
Sanyo DMC 6013 13-inch monitor. With minor changes the program can be run on
an Apple II Plus computer with an Apple Language Card and with any analog
tablet or digitizer compatible with the Apple II Plus computer.
The program is structured as follows (Fig. 1):
Procedure Instructions - shows a list of instructions.
Procedure Get Data - inputs migration data from the Zeiss Mop3 Analog
Tablet and sizes for the standards from the keyboard. Migration distances are
measured in the following way: a photograph of the gel is placed on the
analog tablet, and the x and y coordinates of the gel origin and of each band
are determined and transmitted to the computer. The program then calculates
the migration distances in centimeters. The sub-procedure that performs this
task is written specifically to accept input from a Zeiss Mop3 Analog Tablet
but can easily be modified to accept input from any other analog tablet or
digitizer. Since the program is designed primarily to calculate the size of
small DNA molecules, a table of the standards most commonly used in the
authors' laboratory is provided in another sub-procedure. The program allows
a maximum of 5 lanes of standards and 20 lanes of sample molecules, with 20
bands per lane, for each run of the program.
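The distance calculation can be illustrated as follows. This is a hypothetical sketch. The text does not say how the Zeiss tablet reports coordinates, so two things are assumptions: the scale factor `counts_per_cm` (tablet units per centimeter), and the use of the straight-line distance from origin to band. The 20-band limit is the one quoted above.

```python
import math

MAX_BANDS_PER_LANE = 20  # per-lane limit quoted in the text


def migration_distances(origin, bands, counts_per_cm):
    """Convert tablet readings to migration distances in centimeters.

    origin: (x, y) reading taken at the gel origin for this lane.
    bands: list of (x, y) readings, one per band.
    counts_per_cm: hypothetical tablet resolution in units per centimeter.
    """
    if len(bands) > MAX_BANDS_PER_LANE:
        raise ValueError(f"at most {MAX_BANDS_PER_LANE} bands per lane")
    ox, oy = origin
    # Straight-line (Euclidean) distance from the origin to each band.
    return [math.hypot(x - ox, y - oy) / counts_per_cm for x, y in bands]
```

For instance, `migration_distances((0, 0), [(30, 40)], 10)` returns `[5.0]`. A different tablet would only need a different way of producing the `(x, y)` pairs.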
Procedure Change Standard Data - shows the migration distances and size
values for all standards in a table format and allows the user to change,
delete or add any value(s) in the table.
Fig. 1. Flow chart showing the major pathways through the program.
Procedure Change Sample Data - shows the migration distances for the
sample bands in a table format and allows the user to change, delete or add
any value(s) in the table.
Procedure Curve Fit - computes the coefficients for a second or third
order equation (as chosen by the user) by the method of least squares for
each standard lane. It then substitutes the migration distances for the size
standards into the equation and calculates the size of each band. By
comparing the calculated sizes with the published values, the procedure
computes the error value for each coefficient, the coefficient of correlation
and the average percent error for all of the bands.
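The statistics in this step can be sketched as below, given a coefficient list such as the one printed in Fig. 3 (constant term first). The helper names are hypothetical. Two details are also assumptions, because the text gives no formulas:

* The percent error of each band is taken relative to the entered size.
* The correlation coefficient is the Pearson correlation between the entered and calculated log sizes.

The per-coefficient error values are omitted from this sketch.

```python
import math


def poly_eval(coeffs, x):
    """Evaluate a0 + a1*x + a2*x**2 + ... by Horner's rule (constant first)."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def pearson_r(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def fit_statistics(coeffs, mobilities, input_sizes):
    """Back-calculate the standard sizes and summarize the fit.

    Assumes y = log10(size). Percent errors are relative to the entered
    sizes, and the correlation is taken between entered and calculated
    log sizes; both choices are assumptions, not the paper's formulas.
    """
    calc_sizes = [10 ** poly_eval(coeffs, x) for x in mobilities]
    residuals = [c - s for c, s in zip(calc_sizes, input_sizes)]
    pct_errors = [abs(r) / s * 100 for r, s in zip(residuals, input_sizes)]
    avg_pct_error = sum(pct_errors) / len(pct_errors)
    r = pearson_r([math.log10(s) for s in input_sizes],
                  [math.log10(c) for c in calc_sizes])
    return calc_sizes, residuals, r, avg_pct_error
```

The printed coefficients are rounded to four decimal places. Sizes recomputed from them can therefore differ slightly from the printout, most noticeably at large mobilities where the x^3 term dominates.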
Procedure Size Calculations - computes the sizes of all bands in the
sample lane using the equation determined in Procedure Curve Fit and sums the
sizes of the bands in each lane. If more than one standard lane is used, the
procedure computes the sizes for a sample lane from the closest standard
lane. This assumes that the lanes have been entered consecutively from one
end of the gel to the other. The user may change the standard lane assignment
in Procedure Change Sample Data.
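The lane assignment rule can be sketched as follows, assuming each lane is identified by the order in which it was entered. This is the consecutive ordering the text requires. The function names and the tie-breaking rule, which prefers the standard lane entered first, are hypothetical choices, not taken from the paper.

```python
def nearest_standard_lane(sample_position, standard_positions):
    """Pick the standard lane closest to a sample lane.

    Positions are lane entry numbers (0, 1, 2, ...), assumed to run
    consecutively across the gel as the text requires. Ties go to the
    lane entered first; that tie rule is a hypothetical choice.
    """
    return min(standard_positions, key=lambda s: (abs(s - sample_position), s))


def lane_total(sizes):
    """Sum of the calculated band sizes in one lane."""
    return sum(sizes)


# Hypothetical layout: standards entered as lanes 0 and 5, samples 1-4.
assignments = {lane: nearest_standard_lane(lane, [0, 5]) for lane in range(1, 5)}
```

Here sample lanes 1 and 2 are assigned to standard lane 0, and sample lanes 3 and 4 to standard lane 5.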
Procedure Print Answer - prints the computed information in the format
shown in Fig. 3. If more than one standard lane is used, the procedure
prints the information for one standard lane followed by the information for
all sample lanes computed from it. It then prints the next standard lane.
Procedure Curve Plot - shows a semi-log plot of the standard curves on
the screen and prints the plot if requested (Fig. 4).
Fig. 2. λ DNA digests electrophoresed on a 0.7% agarose gel. Lane 1: 0.5 µg
of λ DNA digested separately with HindIII and SmaI, then combined; 12 bands
result, although the bottom band may not be visible here. Lane 2: 0.5 µg of
λ DNA digested with EcoRI. Electrophoresis conditions: 30 V for 13 hrs in
40 mM Tris, 20 mM Na acetate, 4 mM EDTA, pH 8.2.
After the program is run, the user may return to Procedure Change Standard
Data to change any values and run the program again. The user may also return
to Procedure Get Data to run the program with a new set of data.
Fig. 2 shows bacteriophage λ DNA digested separately with HindIII and
SmaI, combined and separated on an agarose gel (Lane 1). Lane 2 contains λ DNA
digested with EcoRI. Lane 1 is used as a size standard to determine the size (in
kilobase pairs) of each band in Lane 2. Fig. 3 shows the resulting printout
from the program. It is divided into three sections. The first and third
sections show the migration distances and sizes (in kilobase pairs) for the
bands in the standard and sample lanes, respectively. The second section
shows information relevant to the computed equation. "Cubic Equation" is used
to denote a third order equation ("Parabolic" is used to denote a second order
equation). The coefficients of the equation are listed under the term
"Coefficients". The computed equation in this example is then
$y = -0.0045x^3 + 0.0829x^2 - 0.6324x + 2.2401$, where x is the migration
distance (cm) and y is the base-10 logarithm of the size (kbp). The standard
deviation for each coefficient is listed under the term "Errors". The
correlation coefficient is a measure of the fit of the curve to the data
points, with a value of 1.0 indicating a perfect fit.
The average percent error is the average of the percent error for each band
of the standard lane. Fig. 4 shows the printed semi-log plot of the standard
curve. By looking at the coefficient of correlation (0.99936), the average
percent error (3.06) and the curve itself, one can determine how well the
curve fits the data points. Table 1 compares the calculated sizes with the
published sizes for each band of Lane 2 (Fig. 2). As can be seen, the
calculated values agree well with the published values, and the average
percent error for these values (3.34) is consistent with the average percent
error calculated by the program for the standard lane (3.06).
Fig. 3. Program printout using data from Fig. 2.
STANDARD LANE NUMBER 1
BAND  MOBILITY  INPUT SIZE  CALC SIZE  RESID
  1     1.62      27.76       25.95    -1.83
  2     1.73      23.47       23.50     0.03
  3     1.89      19.94       20.46     0.52
  4     2.47      12.04       13.08     1.04
  5     2.92       9.65        9.75     0.10
  6     3.10       8.62        8.77     0.15
  7     3.20       8.38        8.30    -0.08
  8     3.73       6.62        6.35    -0.27
  9     4.65       4.31        4.10    -0.21
 10     6.76       2.25        2.36     0.11
 11     7.18       2.10        2.08    -0.02
 12    10.14       0.49        0.49    -0.00
CUBIC EQUATION
COEFFICIENTS              ERRORS
  2.2401  CONSTANT TERM    0.0734
 -0.6324  X^1              0.3554
  0.0829  X^2              2.7143
 -0.0045  X^3             24.5173
CORRELATION COEFFICIENT IS 0.99936
AVERAGE PER CENT ERROR IS 3.06
WELL # 1
BAND #  MOBILITY  SIZE
  1       1.80    22.09
  2       3.43     7.34
  3       4.03     5.56
  4       4.14     5.32
  5       4.56     4.53
  6       5.48     3.38
TOTAL             48.23
Mobility: migration distance in centimeters.
Input Size: size of each band entered by the user.
Calc Size: size of each band calculated by the program.
Resid: difference between the calculated and entered sizes.
Cubic Equation: third order equation.
Coefficients: coefficients for each power of x and the standard error for each.
Correlation Coefficient: an estimate of the fit of the curve.
Average Per Cent Error: the average of the percent errors for the bands of the standard lane.
Fig. 4. Semi-log plot of the standard curve generated by the program (Standard No. 1: size versus mobility, with the input points marked).
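As a worked check (our own illustration, not part of the original printout), take the mobility of band 1 of the sample lane, x = 1.80 cm. Substitute it into the cubic whose coefficients are printed in Fig. 3: constant term 2.2401, then -0.6324, 0.0829 and -0.0045 for x, x^2 and x^3. Taking the base-10 antilogarithm recovers the 22.09 kbp reported by the program. The last line reproduces the Table 1 percent error for that band.

```latex
\begin{aligned}
y &= -0.0045(1.80)^3 + 0.0829(1.80)^2 - 0.6324(1.80) + 2.2401 \\
  &= -0.02624 + 0.26860 - 1.13832 + 2.24010 = 1.34414 \\
\text{size} &= 10^{y} = 10^{1.34414} \approx 22.09\ \text{kbp} \\
\%\,\text{error} &= \frac{22.09 - 21.81}{21.81} \times 100 \approx 1.3
\end{aligned}
```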
Table 1. Analysis of Program Calculations
Band No.  Published Size  Calculated Size  Difference  % Error
   1          21.81           22.09           0.28        1.3
   2           7.54            7.34           0.20        2.6
   3           5.93            5.56           0.37        6.2
   4           5.54            5.32           0.22        4.0
   5           4.83            4.53           0.30        6.2
   6           3.38            3.38           0.00        0.0
 Total        49.03           48.23
Average % Error: 3.34
Column 1: Published sizes for an EcoRI digest of λ DNA (Bethesda Research
Laboratories 1982 Catalog).
Column 2: Calculated sizes for the EcoRI digest of λ DNA (Fig. 2).
Column 3: Difference between calculated and published sizes.
Column 4: Percent difference between calculated and published sizes.
Column 5: Average of column 4.
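The Table 1 arithmetic can be re-checked with a short script. This is a sketch under two assumptions. First, the difference is taken as an absolute value. Second, the percent error is taken relative to the published size, which matches Table 1 to within rounding. The names are illustrative.

Computed from the two-decimal sizes shown, band 2 comes out at 2.7% and the average at about 3.39. These differ slightly from the tabulated 2.6% and 3.34, presumably because the program worked from unrounded sizes.

```python
# Sizes in kilobase pairs, copied from Table 1 (bands 1-6 of Lane 2).
PUBLISHED = [21.81, 7.54, 5.93, 5.54, 4.83, 3.38]
CALCULATED = [22.09, 7.34, 5.56, 5.32, 4.53, 3.38]


def compare_sizes(published, calculated):
    """Return (differences, percent errors, average percent error).

    Differences are absolute values, and each percent error is taken
    relative to the published size.
    """
    diffs = [abs(c - p) for p, c in zip(published, calculated)]
    pct = [d / p * 100 for d, p in zip(diffs, published)]
    return diffs, pct, sum(pct) / len(pct)


diffs, pct, avg = compare_sizes(PUBLISHED, CALCULATED)
for band, (d, e) in enumerate(zip(diffs, pct), start=1):
    print(f"band {band}: difference {d:.2f}, error {e:.1f}%")
print(f"average error {avg:.2f}%")
```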
We have also used the program with DNA fragments in the range of 100 to
1000 base pairs separated on a 4.0% acrylamide, 0.11% bisacrylamide gel and on
protein in the size range of 14.3 to 68.0 kilodaltons separated on a 12.5%
acrylamide, 0.33% bisacrylamide gel, with similar results (data not shown).
DISCUSSION
We have described here a program that measures migration distances and
calculates the molecular weight of macromolecules separated according to size
by gel electrophoresis. The program uses the method of least squares to
calculate the coefficients of a second or third order equation whose curve
best fits the points of a standard lane. It then uses this equation to
calculate the size of molecules in the sample lanes.
The program is much faster, easier and more accurate than the current
method of interpolation by hand. All of the data analysis, including
measurement of migration distances, is carried out by the computer. The other
major advantage is error determination. In the past, percent errors could only
be roughly estimated; now they can be calculated exactly. This error
calculation allows one to compare different methods of gel electrophoresis to
determine which yields the most accurate results.
In our laboratory we have found the following conditions to be important
for DNA electrophoresis: the electrophoresis apparatus, especially the glass
plates, must be rigorously clean; care must be taken in pouring the gel so that
the agarose cools evenly; all samples must be in the same buffer and of the
same volume; the sample buffer must have a lower salt concentration than the
running buffer; the standard lane should have at least 10 bands; and
electrophoresis should be run at between 10 and 30 volts.
If these conditions are not met, the percent errors are higher, the
coefficient of correlation is lower and, in general, the results are
inconsistent. We believe these findings hold even when the size
calculations are done by hand, but the effect goes unnoticed because
specific error calculations are lacking.
We have observed in our laboratory that in some instances a second order
equation gave better results than a third order equation for electrophoretic
conditions other than the ones listed above. This may be because a second
order equation is more mathematically constrained than a third order
equation, and this constraint compensated for inconsistencies in the
electrophoresis due to the specific conditions under which it was run.
However, this was not reproducible for the specific conditions used, nor was
it the case for other conditions. We are now pursuing other curve fitting
algorithms in an effort to make our program useful under all electrophoretic
conditions.
We have found that the use of this program in our laboratory has
significantly reduced both the time spent analyzing electrophoretic data and
the errors involved. We feel that computer analysis of electrophoretic data
will be the method of choice in the future.
ACKNOWLEDGEMENTS
The authors would like to thank: Dr. Ray G. Hadley for establishing
electrophoresis conditions; Dr. J. Lee Compton and Karen Kolowsky for their
advice and criticism throughout the writing of the program and the manuscript;
Dr. J. Robert Cooke and Dr. John Dill for additional help and criticism
throughout the writing of the program; Dr. Marc Krauss for help in preparing
the figures; and Mrs. Julie Ruocco for her patient typing.
*To whom reprint requests should be sent
REFERENCES
1) Pettersson, U., Mulder, C., Delius, H., Sharp, P. (1973) Proc. Nat. Acad.
Sci. USA 70:200-204.
2) Maniatis, T., Jeffrey, A., van de Sande, H. (1975) Biochem. 14:3787-3794.
3) Sharp, P., Sugden, B., Sambrook, J. (1973) Biochem. 12:3055-3063.
4) Duggleby, R., Kinns, H., Rood, J. (1981) Anal. Biochem. 110:49-55.