The Web's Multiple Regression Home Page

A U T O F I T


AutoFit is a Multiple Regression program that automatically builds a model or regression equation for you. You merely supply the dependent and independent variables and it does the rest. It will find which variables are important enough to include in the model, determine the proper transformation of each of those variables, then look for 2-way and 3-way interaction terms important enough to include in the model, and transform them appropriately. And it does it for free.

You can run AutoFit below these instructions. Supply the URL of your input data file (or enter your data on this page), set any desired options, click Run Regression, and the program will develop the model for you. The dependent variable can be in any column of the input data file; the first column is assumed if none is indicated. Enter each variable as a column of data, not as a row. Separate the numbers in each record with a comma or a space. The first record of the input data file can optionally be a list of variable names, each followed by a comma or space. Variable names can be up to 10 characters long and may contain only letters and numbers (embedded spaces are not allowed). If variable names are omitted, then y, x1, x2, etc. will be used. Missing values are not permitted; each data record must be complete.

If you do not wish to complete the form below, just email your data to the email address below and the regression will be run for you, and the results emailed back to you.

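AutoFit will attempt to transform the independent variable(s) (but not the dependent variable) so as to ensure a linear relationship between the independent variable(s) and the dependent variable. The currently possible transformations are the types listed under "Force transformation" below: x to a power, log of x, e to the x power, e to the minus x, sine of x, and b to the x power.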

AutoFit does a stepwise solution in finding which variables to enter into the model, but provides an option to find a simultaneous system solution (SSS) as well. The SSS will use the variables found with the stepwise procedure, but will try all combinations of transformations using a simultaneous system approach, resulting in a very lengthy process. If your sole purpose is to derive a prediction equation, then this option is not necessary. However, if the relationships between the independent variables and the dependent variable are important, then use SSS for the equation. Due to constraints of resources at Hawaiian Net, SSS is limited to models of three independent variables only. Future plans call for SSS models with four or more independent variables when computer resources become available.

If you choose to use SSS and your model has one or more sine waves, then you must specify the wavelength(s) of those independent variables. Likewise, if you choose to use SSS and your model has one or more "b to the x" transformations, then you must specify the "b" value of those independent variables.

If you are not using the SSS option, then you do not need to supply wavelength values for sine transformations or "b" values for "b to the x" transformations, since AutoFit will calculate them for you.

Because most regressions take several minutes to run, the program does not display the output on your screen when you run the regression. Instead, it submits the regression for processing and emails you the results when complete, usually within 24 hours.

Your input data does not need to have an intercept term since AutoFit supplies one for you. Your input data must have more records than independent variables; otherwise the equation cannot be determined. The data must also be in text format, not a spreadsheet. If your data is in a spreadsheet, save it as a ".txt" or ".csv" file. Only numerical values are allowed. Percent signs, currency symbols, temperature scales, weight scales, etc. are not permitted. Commas are permitted as delimiters. The letter E is permitted if it is part of a numerical value in scientific notation. A sample input data file might look like the following:

percentile,    ranking, noiselevel
0.486578286340026    1  21 
0.246711000721213    2  23 
0.13629187548097     3  33 
0.092626329983846    4  41 
0.0836961246335616   5  39 
0.0823257870395134   6  37 
0.0839260785153895   7  35 
0.0907101513194246   8  31 
0.0948171785407239   9  29 
0.0995963399120694  10  27 
0.105077813876941   11  25 
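
The input rules above can be checked before submitting a file. The following is only a minimal sketch in Python, not AutoFit's own reader: the function name parse_autofit_data is hypothetical, and it assumes the dependent variable is in the first column when default names are generated.

```python
import re

def parse_autofit_data(text):
    """Parse comma- or space-delimited columns with an optional header record.

    Hypothetical helper; it mirrors the rules described in the text,
    not AutoFit's actual implementation.
    """
    rows = [re.split(r"[,\s]+", line.strip().rstrip(","))
            for line in text.splitlines() if line.strip()]

    def is_number(token):
        try:
            float(token)
            return True
        except ValueError:
            return False

    # A first record containing any non-numeric token is treated as names.
    if not all(is_number(t) for t in rows[0]):
        names, rows = rows[0], rows[1:]
    else:
        names = ["y"] + [f"x{i}" for i in range(1, len(rows[0]))]

    data = []
    for row in rows:
        if len(row) != len(names):
            raise ValueError("missing values are not permitted")
        data.append([float(t) for t in row])

    # There must be more records than independent variables.
    if len(data) <= len(names) - 1:
        raise ValueError("need more records than independent variables")
    return names, data
```

Calling it on the sample file above would return the three variable names and eleven numeric records.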



                           
   URL of input file:  
                        examples: http://www.eskimo.com/~brainy/sine.data
                                  http://www.eskimo.com/~brainy/exponential.data
                       ...OR...
Enter inputdata here:  
                        If your data is not stored on the World Wide Web,
                        you must enter your data above.  To enter your data,
                        follow each number by a space or comma, being sure
                        to press the "Enter" key at the end of each line.
                          
 List your data-file:  
                        Enter a "y" to list or print input data.
 
   Dependent var col:  
                        Enter the column number of the dependent variable,
                        eg. 1, 2, 3 etc. (1 is assumed if left blank).
                        This is the only time column number is used to indicate
                        which vector of data is being referenced.  All other
                        references to vectors of data are via independent
                        variable number, not column number.

    Linear variables:  
                        Enter the independent variable number(s) of the
                        variables you do not want transformed, separated by
                        commas, eg. 1,4,5.  Use the letter "a" if you want
                        all independent variables of the data file to be
                        linear.  Use the letter "i" if you want all 
                        interaction terms to be linear.  The dependent
                        variable is always linear by default.

   Exclude variables:  
                        Enter the independent variable number(s) of the
                        variables you do not want included in the model,
                        separated by commas, eg. 1,3.  Use the letter "i"
                        if you want all interaction terms excluded.

     Force variables:  
                        Enter the independent variable number(s) of the
                        variables you want forced into the model, separated
                        by commas, eg. 1,3.  Use the letter "a" if you want
                        all independent variables of the data file forced
                        into the equation.  Use the letter "i" if you want
                        all interaction terms forced into the equation.

Force transformation:  
                        Independent variables can have a particular
                        transformation forced.  This should only be used
                        where the transformation of the variable is known.
                        Enter the independent variable number followed by
                        an equals sign (=), followed by the transformation
                        type where transformation types are
                        1  x to a power      e.g. 1=1=2.2 var 1 to the 2.2 power 
                        2  log of x          e.g. 1=2     log of var 1 
                        3  e to the x power  e.g. 1=3     e to the var 1 power 
                        4  e to the minus x  e.g. 1=4     e to the minus var 1 power
                        5  sine of x         e.g. 1=5     sine of var 1 
                        6  b to the x power  e.g. 1=6=7.1 7.1 to the var 1 power  
                        If the transformation type is type 1, then follow
                        the 1 with an equals sign (=) and then the power,
                        eg. 2=1=1.8 which means independent variable 2 is
                        to have a power transformation to power 1.8.  
                        If the transformation type is type 6, then follow
                        the 6 with an equals sign (=) and then the "b" value,
                        eg. 2=6=7.1 which means independent variable 2 is
                        to have a "b to x" transformation where "b" is 7.1. 
                        If there is more than one forced transformation,
                        separate them with commas.
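
As an illustration of the syntax above, here is a minimal Python sketch that parses a forced-transformation entry and applies one transformation to a single value. The function names are hypothetical. The sketch assumes the log is the natural log and that the sine is taken of x directly, with no wavelength; AutoFit's actual conventions are not stated here.

```python
import math

def parse_forced(spec):
    """Parse e.g. "1=1=2.2, 2=6=7.1, 3=2" into {variable: (type, parameter)}."""
    forced = {}
    for item in spec.split(","):
        if not item.strip():
            continue
        parts = [p.strip() for p in item.split("=")]
        var, ttype = int(parts[0]), int(parts[1])
        param = float(parts[2]) if len(parts) > 2 else None
        # Types 1 (power) and 6 (b to the x) require a third field.
        if ttype in (1, 6) and param is None:
            raise ValueError(f"type {ttype} needs a value after a second '='")
        forced[var] = (ttype, param)
    return forced

def apply_transform(x, ttype, param=None):
    """Apply one of the six transformation types to a single value."""
    if ttype == 1:
        return x ** param        # x to a power
    if ttype == 2:
        return math.log(x)       # log of x (natural log assumed)
    if ttype == 3:
        return math.exp(x)       # e to the x power
    if ttype == 4:
        return math.exp(-x)      # e to the minus x
    if ttype == 5:
        return math.sin(x)       # sine of x (wavelength ignored in this sketch)
    if ttype == 6:
        return param ** x        # b to the x power
    raise ValueError(f"unknown transformation type {ttype}")
```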

 Bottom of new scale:  
                        Enter the bottom of the new scale you are using for 
                        the re-scaled variables specified immediately below.
                        Any valid number is permitted here, but if none is  
                        entered the default value is 1 for all re-scaled     
                        variables.            

    Top of new scale:  
                        Enter the top of the new scale you are using for the  
                        re-scaled variables specified immediately below.
                        Any valid number is permitted here, but if none is  
                        entered the default value is 5 for all re-scaled     
                        variables.            

 Scale dependent var:  
                        Enter a "y" for scaling of the dependent variable to a
                        scale you specified.  Dependent variable scaling can
                        be used to eliminate negative values in the dependent
                        variable so that certain independent variable
                        transformations are possible.

 Re-scaled variables:  
                        Enter the independent variable number(s) of the
                        variables you want to be re-scaled, separated
                        by commas, eg. 1,3.  Use the letter "a" if you want
                        all independent variables of the data file to be
                        re-scaled.  Use the letter "i" if you want all
                        interaction terms to be re-scaled.  Re-scaling
                        puts independent variables on a new scale of 1
                        to 5, unless you specified a new top and bottom  
                        of the scale above, e.g. 0.1 to 9.  Re-scaling
                        should be used when an independent variable's scale
                        does not lend itself to a particular transformation,
                        e.g. e to the minus x, where x is very large for all
                        values of the independent variable, rendering the
                        transformed values close to zero.  Re-scaling is
                        generally recommended since transformations like
                        logs and powers will achieve higher predictive
                        capability when re-scaled before transformation.
                        Enter a "q" to cause all independent variables and 
                        the dependent variable to have the mean subtracted  
                        before any scaling of the data.  When "q" is specified 
                        the interaction terms do not have their mean subtracted
                        although they would be comprised of independent variables
                        whose mean was subtracted before creating the interaction         
                        terms.  The "q" option centers the independent variables          
                        and the dependent variable which sometimes improves the
                        model.                                                     
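
The re-scaling and centering described above can be sketched in Python as follows. This assumes a plain linear (min-max) mapping of each variable onto the new scale, which is a common choice but is not confirmed as AutoFit's exact formula. The function names are hypothetical.

```python
def rescale(values, bottom=1.0, top=5.0):
    """Linearly map values onto [bottom, top] (assumed min-max mapping)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant variable cannot be re-scaled")
    return [bottom + (v - lo) * (top - bottom) / (hi - lo) for v in values]

def center(values):
    """Subtract the mean, as the "q" option is described as doing."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]
```

With the default scale, the values 10, 20, 30 would map to 1, 3, 5, which keeps a transformation such as e to the minus x away from values that are all close to zero.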

 Dummy var variables:  
                        Enter the independent variable number(s) of the
                        variables you want to be dummy variables, separated
                        by commas, eg. 1,3.  Use the letter "a" if you want
                        all independent variables of the data file to be
                        dummy variables.  Use the letter "i" if you want all
                        interaction terms to be dummy variables.  Dummy
                        variables will not be re-scaled nor transformed.
 
  Residuals to print:  
                        Enter the number of residuals you would like printed
                        in your output.  Leave blank to print all residuals.

  Excel the equation:  
                        Enter a "y" to create an Excel spreadsheet of the final
                        regression equation.  The Excel spreadsheet can be used
                        to automatically calculate predictions using different 
                        independent variable values.  The spreadsheet will 
                        appear at the very end of the printout and will be in 
                        HTML format which can then be imported into Excel as
                        a spreadsheet using the "Data" menu item of Excel when 
                        importing a WWW file such as "reg.html".  Merely copy  
                        the HTML to a separate disk file which can be accessed 
                        via the World Wide Web and import the file into Excel. 

  Correlation matrix:  
                        Enter a "y" to print correlation matrix of variables.
                        An upper case "Y" excludes interaction terms.
                        A lower case "y" includes interaction terms.
                        A word of caution here: listing the correlation
                        matrix can be lengthy due to the interaction terms,
                        e.g. a regression of 15 independent variables takes
                        83519 lines to print out.

  Ridge regression K:  
                        Enter a "k" value to print out a ridge regression
                        solution in addition to the OLS solution.  The "k"
                        value can be from 0 (zero) to 1.0, with zero indicating
                        to use the maximum VIF K value.

 Simultaneous system:  
                        Enter a "y" for simultaneous system solution.
                        A simultaneous system solution is when you have two
                        independent variables in the model and every possible
                        permutation of transformations of those two 
                        independent variables is evaluated to see which set of
                        transformations is best.  Four variables or more in
                        the model are not permitted due to the excessive amount
                        of time required to run the simultaneous system option.
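
To give a feel for why this option is slow, the following Python sketch enumerates every assignment of transformation types to the variables in the model. It is only an illustration: it uses the six types listed under "Force transformation", the labels and function name are hypothetical, and it ignores the additional search over powers, "b" values and wavelengths that a real run would need.

```python
from itertools import product

# Labels for the six transformation types (names are illustrative only).
TYPES = ["power", "log", "exp", "exp_minus", "sine", "b_to_x"]

def transformation_sets(n_variables, types=TYPES):
    """Every way of assigning one transformation type to each variable."""
    return list(product(types, repeat=n_variables))
```

Under these assumptions there are 36 sets to evaluate for two variables and 216 for three, each requiring its own fit.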

  Joint simultaneous:  
                        Enter a "y" for joint simultaneous system solution.
                        It will find the best two independent variables of all
                        independent variables.  A simultaneous system solution
                        is then calculated for the two best independent variables.
                        The joint simultaneous is similar to stepwise regression
                        except that it finds the two best independent variables
                        instead of just one as in the stepwise procedure, and it
                        does a simultaneous solution for those two independent variables.
                        In finding the two best independent variables, the software
                        looks at every possible 2 way combination of independent
                        variables.  This means it looks at 1 combination for 2
                        independent variables, 3 combinations for 3 independent
                        variables, 6 combinations for 4 independent variables, 10
                        combinations for 5 independent variables, 15 combinations
                        for 6 independent variables, 21 combinations for 7 independent
                        variables,...etc.  Entering a "y" will allow the program to
                        continue finding variables to add to the model after finding
                        the "joint" solution, while entering a "Y" (upper case) will 
                        cause the software to stop with the "joint" solution.
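
The combination counts quoted above are the number of ways to choose 2 items from n, that is n(n-1)/2. A short Python sketch (the function name is hypothetical) lists the pairs that would be examined:

```python
from itertools import combinations
from math import comb

def candidate_pairs(n_independent):
    """All 2-way combinations of independent variable numbers 1..n."""
    return list(combinations(range(1, n_independent + 1), 2))

# Pair counts for 2 through 7 independent variables.
counts = [len(candidate_pairs(n)) for n in range(2, 8)]
# Cross-check against the closed form n*(n-1)/2.
assert counts == [comb(n, 2) for n in range(2, 8)]
```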

  Sine wavelength(s):  
                        Enter the independent variable number(s) of the sine
                        variables followed by an equals sign, then their
                        respective wavelengths, separated by commas,
                        e.g. 1=365.2564, 2=24.0.
                        Must be entered if simultaneous system solution.
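
A minimal Python sketch of reading this field is given below. The function names are hypothetical, and the sine term assumes the usual form sin(2*pi*x / wavelength) with no phase shift; the exact form AutoFit uses is not stated here.

```python
import math

def parse_wavelengths(spec):
    """Parse e.g. "1=365.2564, 2=24.0" into {variable number: wavelength}."""
    wavelengths = {}
    for item in spec.split(","):
        var, wavelength = item.split("=")
        wavelengths[int(var)] = float(wavelength)
    return wavelengths

def sine_term(x, wavelength):
    """Sine of x for a given wavelength (assumed form, no phase shift)."""
    return math.sin(2 * math.pi * x / wavelength)
```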

 


To email the author of AutoFit use the email address below