Gpufit Documentation¶
Introduction¶
Gpufit is a GPU-accelerated CUDA implementation of the Levenberg-Marquardt algorithm. It was developed to meet the need for a high performance, general- purpose nonlinear curve fitting software library which is publicly available and open source.
Optimization algorithms are ubiquitous tools employed in many field of science and technology. One such algorithm for numerical, non-linear optimization is the Levenberg-Marquardt algorithm (LMA). The LMA combines elements of the method of steepest descent and Newton’s method, and has become a standard algorithm for least-squares fitting. Box constraints on parameter values can be added as suitable projections during the optimization steps.
Although the LMA is, in itself, an efficient optimization algorithm, applications requiring many iterations of this procedure may encounter limitations due to the sheer number of calculations involved. The time required for the convergence of a fit, or a set of fits, can determine an application’s feasibility, e.g. in the context of real-time data processing and feedback systems. Alternatively, in the case of very large datasets, the time required to solve a particular optimization problem may prove impractical.
In recent years, advanced graphics processing units (GPUs) and the development of general purpose GPU programming have enabled fast and parallelized computing by shifting calculations from the CPU to the GPU. The large number of independent computing units available on a modern GPU enables the rapid execution of many instructions in parallel, with an overall computation power far exceeding that of a CPU. Languages such as CUDA C and OpenCL allow GPU- based programs to be developed in a manner similar to conventional software, but with an inherently parallelized structure. These developments have led to the creation of new GPU-accelerated tools, such as the Gpufit.
Gpufit supports cubic spline functions that can be used to approximate arbitrary (smooth) fit model functions. In order to use them a spline representation of the model function must be provided (as an array of suitable spline coefficients). See Gpuspline on Github for details on how to compute these spline representations.
This manual describes how to install and build the Gpufit library and its external bindings. Furthermore it details how to extend Gpufit by adding custom model functions as well as custom fit estimator functions.
The documentation includes:
Instructions for building and installing Gpufit
A detailed description of the C interface
A description of the built-in model functions
A description of the built-in goodness-of-fit estimator functions
A detailed description of the external bindings to Matlab and Python
Usage examples for C, Matlab, and Python
Instructions for adding custom model functions or custom estimator functions
The current version of the Gpufit library is 1.2.0 (see homepage). This manual was compiled on Aug 31, 2023.
How to cite Gpufit¶
Gpufit was created by Mark Bates, Adrian Przybylski, Björn Thiel, and Jan Keller-Findeisen at the Max Planck Institute for Biophysical Chemistry, in Göttingen, Germany.
The development and maintenance of open-source software projects, such as Gpufit, requires significant time and resources from the project team. If you use Gpufit in your research, please cite our publication. A paper describing the Gpufit software was published in the journal Scientific Reports, and is available from the Scientific Reports website (open-access), [here](https://www.nature.com/articles/s41598-017-15313-9).
The citation for the Gpufit paper is as follows:
Gpufit: An open-source toolkit for GPU-accelerated curve fitting
Adrian Przybylski, Björn Thiel, Jan Keller-Findeisen, Bernd Stock, and Mark Bates
Scientific Reports, vol. 7, 15722 (2017); doi: https://doi.org/10.1038/s41598-017-15313-9
Hardware requirements¶
Because the fit algorithm is implemented in CUDA C, a CUDA-compatible graphics card is required to run Gpufit. The minimum supported compute capability is 2.0. More advanced GPU hardware will result in higher fitting performance.
Software requirements¶
In addition to a compatible GPU, the graphics card driver installed on the host computer must be compatible with the version of the CUDA toolkit which was used to compile Gpufit. This may present an issue for older graphics cards or for computers running outdated graphics drivers.
At the time of its initial release in 2017, Gpufit was compiled with CUDA toolkit version 8.0. Therefore, the Nvidia graphics driver installed on the host PC must be at least version 367.48 (released July 2016) in order to be compatible with the binary files generated in this build.
When compatibility issues arise, there are two possible solutions. The best option is to update the graphics driver to a version which is compatible with the CUDA toolkit used to build Gpufit. The second option is to re-compile Gpufit from source code, using an earlier version of the CUDA toolkit which is compatible with the graphics driver in question. However, this solution is likely to result in slower performance of the Gpufit code, since older versions of the CUDA toolkit are not as efficient.
Note that all CUDA-supported graphics cards should be compatible with CUDA toolkit version 6.5. This is the last version of CUDA which supported GPUs with compute capability 1.x. In other words, an updated Nvidia graphics driver should be available for all CUDA-enabled GPUs which is compatible with toolkit version 6.5.
If you are unsure if your graphics card is CUDA-compatible, a lists of CUDA supported GPUs can be found here.
Installation and Testing¶
The Gpufit library can be used in several ways. When using a pre-compiled binary version of Gpufit, the Gpufit functions may be accessed directly via a dynamic linked library (e.g. Gpufit.dll) or via the external bindings to Gpufit (e.g. the Matlab or Python bindings). For more information on the Gpufit interface, see Gpufit API description, or for details of the external bindings see External bindings.
This section describes how to compile Gpufit, including generating its external bindings, from source code. Building from source is necessary when a fit model function is added or changed, or if a new fit estimator is required. Building the library may also be useful for compiling the code using a specific version of the CUDA toolkit, or for a particular CUDA compute capability.
Gpufit binary distribution¶
A binary distribution of the Gpufit library is available for Windows. Use of this distribution requires only a CUDA-capable graphics card, and an updated Nvidia graphics driver. The binary package contains:
The Gpufit SDK, which consists of the 32-bit and 64-bit DLL files, and the Gpufit header file which contains the function definitions. The Gpufit SDK is intended to be used when calling Gpufit from an external application written in e.g. C code.
The performance test application, which serves to test that Gpufit is correctly installed, and to check the performance of the CPU and GPU hardware.
Matlab 32 bit and 64 bit bindings, with Matlab examples.
Python version 2.x and version 3.x bindings (compiled as wheel files) and Python examples.
Java 8 32 bit and 64 bit bindings, with Java examples.
This manual in PDF format.
To re-build the binary distribution, see the instructions located in package/README.md.
Building from source code¶
This section describes how to build Gpufit from source code. Note that as of the initial release of Gpufit, the source code has been tested only with the Microsoft Visual Studio compiler.
Prerequisites¶
The following tools are required in order to build Gpufit from source.
Required
CMake 3.11 or later
A C/C++ Compiler
Linux: GCC 4 (tested with 4-6)
Windows: Visual Studio 2013 (tested with 2013 - 2019)
Optional
Boost 1.58 or later (required if you want to build the tests)
MATLAB if building the MATLAB bindings (minimum version Matlab 2012a)
Python if building the Python bindings (Python version 2.x or 3.x). Note that the “wheel” package is required when building the Python binding.
Java if building the Java bindings (minimum Java JDK version 8)
PDF Latex installation (like Miktex) if converting the documentation from Latex to PDF.
Source code availability¶
The source code is available in an open repository hosted at Github, at the following URL.
https://github.com/gpufit/Gpufit.git
To obtain the code, Git may be used to clone the repository, or a current snapshot may be downloaded directly from Github as Gpufit-master.zip.
Compiler configuration via CMake¶
CMake is an open-source tool designed to build, test, and package software. It is used to control the software compilation process using compiler independent configuration files, and generate native makefiles and workspaces that can be used in the compiler environment. In this section we provide a simple example of how to use CMake in order to generate the input files for the compiler (e.g. the Visual Studio solution file), which can then be used to compile Gpufit.
First, identify the directory which contains the Gpufit source code (for example, on a Windows computer the Gpufit source code may be stored in C:\Sources\Gpufit). Next, create a build directory outside the source code source directory (e.g. C:\Sources\Gpufit-build-64). Finally, run cmake to configure and generate the compiler input files.
Using the CMake Graphical User Interface¶
There is a graphical user interface available for CMake, which simplifies the configuration and generation steps. For further details, see Running CMake. The following steps outline how to use the basic features of the CMake GUI.
First, select the source code directory (the top level directory where the Gpufit source code is located), and the build directory (where the binaries will be built). For this example, the source directory might be C:\Sources\Gpufit, and the build directory might be C:\Sources\Gpufit-build-64.
Next, click the “Configure” button, and select the desired compiler from the drop down list (e.g. Visual Studio 12 2013). Under Optional platform for Generator, select the desired architecture (e.g. x64 to compile 64-bit binaries).
Once configuration is complete, CMake will have automatically found the Matlab installation, and the installation directories will be listed in the NAME and VALUE columns. If the Matlab installation was not found, the entries in the VALUE column can be manually edited.
Next, click on Generate to generate the Visual Studio solution files, which will be used to build the Gpufit package.
Running CMake from the command line¶
The following commands, executed from the command prompt, assume that the cmake executable (e.g. C:\Program Files\CMake\bin\cmake.exe) is automatically found via the PATH environment variable (if not, the full path to cmake.exe must be specified). This example also assumes that the source and build directories have been set up as specified above.
cd C:\Sources\Gpufit-build-64
cmake -G "Visual Studio 12 2013 Win64" C:\Sources\Gpufit
Note that in this example the -G flag has been used to specify the 64-bit version of the Visual Studio 12 compiler. This flag should be changed depending on the compiler used, and the desired architecture (e.g. 32- or 64-bit). Further details of the CMake command line arguments can be found here.
There is also a graphical user interface available for CMake, which simplifies the configuration and generation steps. For further details, see Running CMake.
Common issues encountered during CMake configuration¶
It’s strongly recommended to use the latest available CMake version. Especially the find CUDA, Matlab, Python capabilities of CMake get updated often.
Boost NOT found - skipping tests!
If you want to build the tests and Boost is not found automatically, set the CMake variable BOOST_ROOT to the corresponding directory, and configure again.
JAVA JNI NOT found - skipping Gpufit Java binding!
If you want to build the Java binding and CMake cannot find Java, you can set the CMake variable JAVA_HOME to specify a Java installation explicitely.
Specify CUDA_ARCHITECTURES set
If you need a specific CUDA architecture, set CUDA_ARCHITECTURES according to CUDA_SELECT_NVCC_ARCH_FLAGS.
CMake finds last installed CUDA toolkit version by default
If there are multiple CUDA toolkits installed on the computer, CMake 3.7.1 seems to find by default the lowest installed version. In this case set the desired CUDA version manually (e.g. by editing the CUDA_TOOLKIT_ROOT_DIR variable in CMake).
Specify CUDA version to use
Set CUDA_BIN_PATH before running CMake or CUDA_TOOLKIT_ROOT_DIR after first CMAKE configuration to the installation folder of the desired CUDA version.
Required CUDA version
When using Microsoft Visual Studio 2015, the minimum required CUDA Toolkit version is 8.0.
No suitable Matlab and/or MX_Library version found - skipping Gpufit Matlab binding!
CMake might not be able to locate Matlab, in which case this message will be shown. Try to set the Matlab_ROOT_DIR variable manually (for example “C:/Program Files/MATLAB/R2020b” or “/usr/local/Matlab/2020b” on Linux) and run CMake again.
Python launcher
Set Python_WORKING_DIRECTORY to a valid directory, it will be added to the Python path.
Matlab launcher
Set Matlab_WORKING_DIRECTORY to a valid directory, it will be added to the Matlab path.
Documentation build issues
Note that the several Python packages are required to run the “documentation_create_latex.bat” script on Windows systems. Please ensure that the “sphinx” and “sphinx_rtd_style” packages are installed in your Python distribution.
PDFLATEX not found
When using Miktex, if the PDFLATEX package is not automatically found, the path to pdflatex.exe can be specified to CMake with the MIKTEX_BINARY_PATH variable (available under Advanced options).
Compiling Gpufit on Windows¶
After configuring and generating the solution files using CMake, go to the desired build directory and open Gpufit.sln using Visual Studio. Select the “Debug” or “Release” build options, as appropriate. Select the build target “ALL_BUILD”, and build this target. If the build process completes without errors, the Gpufit binary files will be created in the corresponding “Debug” or “Release” folders in the build directory.
The unit tests can be executed by building the target “RUN_TESTS” or by starting the created executables in the output directory from the command line.
To install the python gpufit package go to “Release” folder and here you find Gpufit resp Cpufit python packages.
Compiling Gpufit on Linux¶
A successful build has been verified on Ubuntu 18.04 LTS with gcc 5.5 and CUDA 9.1 following the instructions on the NVidia website. CMake needs to be at least version 3.11. To perform the tests, a development version of Boost should be installed (e.g. libboost-all-dev).
The following commands were executed.
git clone https://github.com/gpufit/Gpufit.git Gpufit
mkdir Gpufit-build
cd Gpufit-build
cmake -DCMAKE_BUILD_TYPE=RELEASE ../Gpufit
make
In case, during make there is an error “unsupported GNU version! gcc versions later than X are not supported”, it means that CUDA needs an older version of gcc. Provided that such a version is installed on the system you can choose it with the -DCMAKE_C_COMPILER option to cmake. For example, for CUDA 9 one should add -DCMAKE_C_COMPILER=gcc-5 in the call to cmake.
The tests can be run for example by “make test”. Run the performance comparison with
./Gpufit_Cpufit_performance_comparison
To install the Python package
cd pyGpufit/dist
pip install pyGpufit-X.Y.Z-py2.py3-none-any.whl
To run the Matlab package you need to tell Matlab to use a newer version of the C++ standard library
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
matlab
Then in Matlab add the matlab output directory to the path and execute some examples.
addpath('XX/Gpufit-build/matlab');
cd('XX/Gpufit/Gpufit/matlab/examples');
gauss2d();
MacOS¶
Gpufit has not yet been officially tested on a computer running MacOS with a CUDA capable graphics card. However, satisfying the Prerequisites and using CMake, we estimate that the library should build in principle and one should also be able to run the examples on MacOS.
Using cuBLAS¶
Optionally, Gpufit may be configured to make use of the cuBLAS library. This library is used for solving the equation system during the fit process.
Several benefits are obtained by using cuBLAS:
Moderately faster fitting performance.
Improved numerical stability.
The number of fit model parameters may be as high as the number of threads per GPU thread block (1024). Without cuBLAS, the maximum number of model parameters is limited to 31.
To include cuBLAS functionality in Gpufit, the USE_CUBLAS flag may be set during the CMAKE configuration step. Note that the Gpufit cuBLAS feature is only available for 64-bit architectures, and for code built with CUDA toolkit version >= 7.0. Finally, it is important to note that cuBLAS is not statically linked in the Gpufit DLL, but rather it is dynamically linked. Hence, when distributing applications, the cuBLAS DLL (e.g. cublas64_80.dll for CUDA toolkit version 8) must be packaged along with Gpufit.dll.
Running the performance test¶
The Gpufit performance test is a program which verifies the correct function of Gpufit, and tests the fitting speed in comparison with the same algorithm executed on the CPU.
If Gpufit was built from source, running the build target GPUFIT_CPUFIT_Performance_Comparison will run the test, which executes the fitting process multiple times, varying the number of fits per function call. The execution time is measured in each case and the relative speed improvement between the GPU and the CPU is calculated. A successful run of the performance test also indicates also that Gpufit is functioning correctly.
The performance comparison is also included in the Gpufit binary distribution as a console application. An example of the program’s output is shown in Fig. 1.

Output of the Gpufit vs Cpufit performance comparison¶
Gpufit API description¶
The Gpufit source code compiles to a dynamic-link library (DLL), providing a C interface. In the sections below, the C interface and its arguments are described in detail.
C Interface¶
The C interface is defined in the Gpufit header file: gpufit.h.
gpufit()¶
This is the main fit function. A single call to the gpufit()
function executes a block of N fits.
The inputs to gpufit()
are scalars and pointers to arrays, and the outputs are also array pointers.
The inputs to the gpufit()
function are:
the number of fits (N),
the number of data points per fit (each fit has equal size),
the fit data,
an array of weight values that are used to weight the individual data points in the fit (optional),
an ID number which specifies the fit model function,
an array of initial parameters for the model functions,
a tolerance value which determines when the fit has converged,
the maximum number of iterations per fit,
an array of flags which allow one or more fit parameters to be held constant,
an ID number which specifies the fit estimator (e.g. least squares, etc.),
the size of the user info data,
the user info data, which may have multiple uses, for example to pass additional parameters to the fit functions, or to include independent variables (e.g. X values) with the fit data.
The outputs of gpufit()
are:
the best fit model parameters for each fit,
an array of flags indicating, for example, whether each fit converged,
the final value of \(\chi^2\) for each fit,
the number of iterations needed for each fit to converge.
The gpufit()
function call is defined below.
int gpufit
(
size_t n_fits,
size_t n_points,
float * data,
float * weights,
int model_id,
float * initial_parameters,
float tolerance,
int max_n_iterations,
int * parameters_to_fit,
int estimator_id,
size_t user_info_size,
char * user_info,
float * output_parameters,
int * output_states,
float * output_chi_squares,
int * output_n_iterations
) ;
Description of input parameters¶
- n_fits:
Number of fits to be performed
- type:
size_t
- n_points:
Number of data points per fit
Gpufit is designed such that each fit must have the same number of data points per fit.
- type:
size_t
- data:
Pointer to data values
A pointer to the data values. The data must be passed in as a 1D array of floating point values, with the data for each fit concatenated one after another. In the case of multi-dimensional data, the data must be flattened to a 1D array. The number of elements in the array is equal to the product n_fits * n_points.
- type:
float *
- length:
n_points * n_fits
- weights:
Pointer to weights
The weights array includes unique weighting values for each fit. It is used only by the least squares estimator (LSE). The size of the weights array and its organization is identical to that for the data array. For statistical weighting, this parameter should be set equal to the inverse of the variance of the data (i.e. weights = 1.0 / variance ). The weights array is an optional input.
- type:
float *
- length:
n_points * n_fits
- special:
Use a NULL pointer to indicate that no weights are provided. In this case all data values will be weighted equally.
- model_id:
Model ID
Determines the model which is used for all fits in this call. See Fit Model functions for more details.
As defined in constants.h:
- 0:
GAUSS_1D
- 1:
GAUSS_2D
- 2:
GAUSS_2D_ELLIPTIC
- 3:
GAUSS_2D_ROTATED
- 4:
CAUCHY_2D_ELLIPTIC
- 5:
LINEAR_1D
- type:
int
- initial_parameters:
Pointer to initial parameter values
A 1D array containing the initial model parameter values for each fit. If the number of parameters of the fit model is defined by n_parameters, then the size of this array is n_fits * n_parameters.
The parameter values for each fit are concatenated one after another. If there are M parameters per fit, the parameters array is organized as follows: [(parameter 1), (parameter 2), …, (parameter M), (parameter 1), (parameter 2), …, (parameter M), …].
- type:
float *
- length:
n_fits * n_parameters
- tolerance:
Fit tolerance threshold
The fit tolerance determines when the fit has converged. After each fit iteration, the change in the absolute value of \(\chi^2\) is calculated. The fit has converged when one of two conditions are met. First, if the change in the absolute value of \(\chi^2\) is less than the tolerance value, the fit has converged. Alternatively, if the change in \(\chi^2\) is less than the product of tolerance and the absolute value of \(\chi^2\) [tolerance * abs(\(\chi^2\))], then the fit has converged.
Setting a lower value for the tolerance results in more precise values for the fit parameters, but requires more fit iterations to reach convergence.
A typical value for the tolerance settings is between 1.0E-3 and 1.0E-6.
- type:
float
- max_n_iterations:
Maximum number of iterations
The maximum number of fit iterations permitted. If the fit has not converged after this number of iterations, the fit returns with a status value indicating that the maximum number of iterations was reached.
- type:
int
- parameters_to_fit:
Pointer to array indicating which model parameters should be held constant during the fit
This is an array of ones or zeros, with a length equal to the number of parameters of the fit model function. Each entry in the array is a flag which determines whether or not the corresponding model parameter will be held constant during the fit. To allow a parameter to vary during the fit, set the entry in parameters_to_fit equal to one. To hold the value constant, set the entry to zero.
An array of ones, e.g. [1,1,1,1,1,…] will allow all parameters to vary during the fit.
- type:
int *
- length:
n_parameters
- estimator_id:
Estimator ID
Determines the fit estimator which is used. See Estimator functions for more details.
As defined in constants.h:
- 0:
LSE
- 1:
MLE
- type:
int
- user_info_size:
Size of user information data
Size of the user information data array, in bytes.
- type:
size_t
- user_info:
Pointer to user information data
This parameter is intended to provide flexibility to the Gpufit interface. The user information data is a generic block of memory which is passed in to the
gpufit()
function, and which is accessible in shared GPU memory by the fit model functions and the estimator functions. Possible uses for the user information data are to pass in values for independent variables (e.g. X values) or to supply additional data to the fit model function or estimator. The documentation of the fit model function or estimator must specify the composition of the user info data. For a coded example which makes use of the user information data, see Linear Regression Example. The user information data is an optional parameter - if no user information is required this parameter may be set to NULL.- type:
char *
- length:
user_info_size
- special:
Use a NULL pointer to indicate that no user information is available. The interpretation of the user info depends completely on the used fit model function or estimator.
Description of output parameters¶
- output_parameters:
Pointer to array of best-fit model parameters
For each fit, this array contains the best-fit model parameters. The array is organized identically to the input parameters array.
- type:
float *
- length:
n_fits * n_parameters
- output_states:
Pointer to array of fit result state IDs
For each fit the result of the fit is indicated by a state ID. The state ID codes are defined below. A state ID of 0 indicates that the fit converged successfully.
As defined in constants.h:
- 0:
The fit converged, tolerance is satisfied, the maximum number of iterations is not exceeded
- 1:
Maximum number of iterations exceeded
- 2:
During the Gauss-Jordan elimination the Hessian matrix is indicated as singular
- 3:
Non-positive curve values have been detected while using MLE (MLE requires only positive curve values)
- 4:
State not read from GPU Memory
- type:
int *
- length:
n_fits
- output_chi_squares:
Pointer to array of \(\chi^2\) values
For each fit, this array contains the final \(\chi^2\) value, as returned by the estimator function (see Estimator functions).
- type:
float *
- length:
n_fits
- output_n_iterations:
Pointer to array of iteration counts
For each fit, this array contains the number of fit iterations which were performed.
- type:
int *
- length:
n_fits
- return value:
Status code
The return value of the function call indicates whether an error occurred. As defined in constants.h.
- 0:
No error
- -1:
Error
gpufit_constrained()¶
This is very similar to the gpufit()
function but with the additional possibility to add box constraints on the
allowed parameter ranges.
The gpufit_constrained()
function call is defined below.
int gpufit_constrained
(
size_t n_fits,
size_t n_points,
float * data,
float * weights,
int model_id,
float * initial_parameters,
float * constraints,
int * constraint_types,
float tolerance,
int max_n_iterations,
int * parameters_to_fit,
int estimator_id,
size_t user_info_size,
char * user_info,
float * output_parameters,
int * output_states,
float * output_chi_squares,
int * output_n_iterations
) ;
In order to not repeat the same information all input and output parameters in gpufit_constrained()
that also
exist in gpufit()
have exactly the same definition and interpretation. Below only the additional input parameter
regarding the constraints are explained.
Description of constraints input parameters¶
- constraints:
Pointer to model parameter constraint intervals
A 1D array containing the model parameter constraint lower and upper bounds for all parameters (including fixed parameters) and for all fits. Order is lower, upper bound first, then parameters, then number of fits.
- type:
float *
- length:
n_fits * n_parameters * 2
- constraint_types:
Pointer to constraint types for each parameter
A 1D array containing the constraint types for each parameter (including fixed parameters). The constraint type is defined by an int with 0 - no constraint, 1 - only constrain lower bound, 2 - only constrain upper bound, 3 - constrain both lower and upper bounds.
- type:
int *
- length:
n_parameters
gpufit_cuda_interface()¶
This function performs the fitting without transferring the input and output data between CPU and GPU memory. The
allocation of GPU memory for input and output data is skipped, as well. The structures of input and output arrays are
equal to the main interface function gpufit()
. There are no separate arrays for initial and best-fit parameter
values. The argument gpu_fit_parameters
points to initial parameter values at start of the routine and to
best-fit parameter values at the end.
int gpufit_cuda_interface
(
size_t n_fits,
size_t n_points,
float * gpu_data,
float * gpu_weights,
int model_id,
float tolerance,
int max_n_iterations,
int * parameters_to_fit,
int estimator_id,
size_t user_info_size,
char * gpu_user_info,
float * gpu_fit_parameters,
int * gpu_output_states,
float * gpu_output_chi_squares,
int * gpu_output_n_iterations
) ;
Description of input parameters¶
- n_fits:
Number of fits to be performed
- type:
size_t
- n_points:
Number of data points per fit
- type:
size_t
- gpu_data:
Pointer to data values stored on GPU
- type:
float *
- length:
n_points * n_fits
- gpu_weights:
Pointer to weights stored on GPU
- type:
float *
- length:
n_points * n_fits
- special:
Use a NULL pointer to indicate that no weights are provided. In this case all data values will be weighted equally.
- model_id:
Model ID
- type:
int
- tolerance:
Fit tolerance threshold
- type:
float
- max_n_iterations:
Maximum number of iterations
- type:
int
- parameters_to_fit:
Pointer to array indicating which model parameters should be held constant during the fit
- type:
int *
- length:
n_parameters
- estimator_id:
Estimator ID
- type:
int
- user_info_size:
Size of user information data
- type:
size_t
- gpu_user_info:
Pointer to user information data stored on GPU
- type:
char *
- length:
user_info_size
- special:
Use a NULL pointer to indicate that no user information is available.
Description of input/output parameters¶
- gpu_fit_parameters:
Pointer to array of model parameters stored on GPU
input: initial parameter values
output: best-fit parameter values
- type:
float *
- length:
n_fits * n_parameters
Description of output parameters¶
- gpu_output_states:
Pointer to array of fit result state IDs stored on GPU
- type:
int *
- length:
n_fits
- gpu_output_chi_squares:
Pointer to array of \(\chi^2\) values stored on GPU
- type:
float *
- length:
n_fits
- gpu_output_n_iterations:
Pointer to array of iteration counts stored on GPU
- type:
int *
- length:
n_fits
- return value:
Status code
- 0:
No error
- -1:
Error
gpufit_constrained_cuda_interface()¶
This function is very similar to the gpufit_cuda_interface()
function but with the additional possibility to add box constraints on the
allowed parameter ranges.
int gpufit_constrained_cuda_interface
(
size_t n_fits,
size_t n_points,
float * gpu_data,
float * gpu_weights,
int model_id,
float tolerance,
int max_n_iterations,
int * parameters_to_fit,
float * gpu_constraints,
int * constraint_types,
int estimator_id,
size_t user_info_size,
char * gpu_user_info,
float * gpu_fit_parameters,
int * gpu_output_states,
float * gpu_output_chi_squares,
int * gpu_output_n_iterations
) ;
In order to not repeat the same information all input and output parameters in gpufit_constrained_cuda_interface()
that also
exist in gpufit_cuda_interface()
have exactly the same definition and interpretation. Below only the additional input parameter
regarding the constraints are explained.
Description of constraint input parameters¶
- gpu_constraints:
Pointer to model parameter constraint intervals stored on the GPU
A 1D array containing the model parameter constraint lower and upper bounds for all parameters (including fixed parameters) and for all fits. Order is lower, upper bound first, then parameters, then number of fits.
- type:
float *
- length:
n_fits * n_parameters * 2
- constraint_types:
Pointer to constraint types for each parameter
A 1D array containing the constraint types for each parameter (including fixed parameters). The constraint type is defined by an int with 0 - no constraint, 1 - only constrain lower bound, 2 - only constrain upper bound, 3 - constrain both lower and upper bounds.
- type:
int *
- length:
n_parameters
gpufit_portable_interface()¶
This function is a simple wrapper around the gpufit()
function, providing an alternative means of passing the function parameters.
int gpufit_portable_interface(int argc, void *argv[]);
Description of parameters¶
- argc:
The length of the argv pointer array
- argv:
Array of pointers to gpufit parameters, as defined above. For reference, the type of each element of the argv array is listed below.
- argv[0]:
Number of fits
- type:
size_t *
- argv[1]:
Number of points per fit
- type:
size_t *
- argv[2]:
Fit data
- type:
float *
- argv[3]:
Fit weights
- type:
float *
- argv[4]:
Fit model ID
- type:
int *
- argv[5]:
Initial parameters
- type:
float *
- argv[6]:
Fit tolerance
- type:
float *
- argv[7]:
Maximum number of iterations
- type:
int *
- argv[8]:
Parameters to fit
- type:
int *
- argv[9]:
Fit estimator ID
- type:
int *
- argv[10]:
User info size
- type:
size_t *
- argv[11]:
User info data
- type:
char *
- argv[12]:
Output parameters
- type:
float *
- argv[13]:
Output states
- type:
int *
- argv[14]:
Output \(\chi^2\) values
- type:
float *
- argv[15]:
Output number of iterations
- type:
int *
- return value:
This function simply returns the
gpufit()
return status code.
gpufit_constrained_portable_interface()¶
This function is a simple wrapper around the gpufit_constrained()
function, providing an alternative means of passing the function parameters.
int gpufit_constrained_portable_interface(int argc, void *argv[]);
Description of parameters¶
- argc:
The length of the argv pointer array
- argv:
Array of pointers to gpufit_constrained parameters, as defined above.
- return value:
This function simply returns the
gpufit()
return status code.
gpufit_get_last_error()¶
A function that returns a string representation of the last error.
char const * gpufit_get_last_error();
- return value:
Error message corresponding to the most recent error, or an empty string if no error occurred.
- ‘CUDA driver version is insufficient for CUDA runtime version’
The graphics driver version installed on the computer is not supported by the CUDA Toolkit version which was used to build Gpufit.dll. Update the graphics driver or re-build Gpufit using a compatible CUDA Toolkit version.
- ‘too many resources requested for launch’
Exceeded number of available registers per thread block. Adding model functions to models.cuh can increase the number of registers per thread used by the kernel cuda_calc_curve_values(). If this error occurs, comment out unused models in function calculate_model() in file models.cuh.
gpufit_cuda_available()¶
A function that calls a simple CUDA function to check if CUDA is available.
int gpufit_cuda_available();
- return value:
Returns 0 if CUDA is not available (no suitable device found, or driver version insufficient). Use the function gpufit_get_last_error() to check the error message. Returns 1 if CUDA is available.
gpufit_get_cuda_version()¶
A function that returns the CUDA runtime version in runtime_version and the installed CUDA driver version in driver_version.
int gpufit_get_cuda_version(int * runtime_version, int * driver_version);
- runtime_version:
Pointer to the CUDA runtime version number. Format is Minor version times 10 plus Major version times 1000. (is 0 if the CUDA runtime version is incompatible with the installed CUDA driver version)
- driver_version:
Pointer to the CUDA driver version number. Format is Minor version times 10 plus Major version times 1000. (is 0 if no CUDA enabled graphics card was detected)
- return value:
Status code
The return value of the function call indicates whether an error occurred.
- 0:
No error
- -1:
Error. Use the function gpufit_get_last_error() to check the error message.
Fit Model functions¶
This section describes the fit model functions which are included with the Gpufit library. The model IDs usable in the call of the Gpufit C Interface are defined in constants.h. Note that additional model functions may be added as described in the documentation, see Customization.
Note: Handling of independent variables¶
We note that, in the current version of the Gpufit library, the indepenent variables (e.g. X values) corresponding
to the fit data are not passed into the gpufit()
function. The default behavior of the fit models functions
is to assume the coordinates of the data values are uniformly spaced and monotonically increasing (see below).
However, the user_info
parameter of the Gpufit interface allows arbitrary data to be passed to the model
functions and estimators. Hence, this mechanism may be used to supply independent variables to the fit. In the
current release, two model functions support this method: the linear regression model and the one-dimensional Gaussian
model. This is described in more detail in the sections below, and an example is given in the Examples section of the
documentation.
The Linear regression and 1D Gaussian function models provide an option to pass in custom X coordinate values, using the user information parameter. In this case, the data type of the values must be single precision floating point. The user has two options for how to use this mechanism: one set of X values may be provided and used for all fits, or unique X values may be provided for all fits. The first option allows for faster calculations, since it requires fewer data transfer operations. If no X values are provided, the models assume that the data coordinates start with zero, as described above.
When calling Gpufit by its C Interface, the user information size parameter must be set to the product of the number of values in the user information array and the size of the data type in bytes. The number of the X coordinate values must be equal to the total number of data points, or the number of data points per fit.
Linear regression¶
A 1D linear function defined by two parameters (offset and slope). The model ID of this function is LINEAR_1D
, and
it is implemented in linear_1d.cuh.
Optional: The X coordinate of each data point may be specified via the user information data parameter of the Gpufit interface. The user information should then contain X coordinate values of type float in increasing order.
- Default X coordinates:
If user information is not provided, the X coordinate of the first data value is assumed to be (0.0). In this case, for a fit size of M data points, the X coordinates of the data are set equal to the indices of the data array, starting from zero (i.e. \(0, 1, 2, ..., M-1\)).
- Identical X coordinate values for all fits:
If the number of values in the user information array is equal to the number of data points per fit, the same X coordinate values are used for all fits.
- Unique X coordinate values for each fit:
If the number of values in the user information array is equal to the total number of data points, unique X coordinate values are used for each fit.
- x:
(independent variable) X coordinate
The X coordinate values may be specified in the user information data. For details, see the linear regression code example, Linear Regression Example.
- p_0:
offset
- p_1:
slope
1D Gaussian function¶
A 1D Gaussian function defined by four parameters. Its model ID is GAUSS_1D
and it is implemented in gauss_1d.cuh.
The user information data may be used to specify the X coordinate of each data point. Here, p
is the vector of parameters (p0..p3)
and the model function g
exists for each X coordinate of the input data.
Optional: The X coordinate of each data point may be specified via the user information data parameter of the Gpufit interface.
- Default X coordinates:
If user information is not provided, the X coordinate of the first data value is assumed to be (0.0). In this case, for a fit size of M data points, the X coordinates of the data are set equal to the indices of the data array, starting from zero (i.e. \(0, 1, 2, ..., M-1\)).
- Identical X coordinate values for all fits:
If the number of values in the user information array is equal to the number of data points per fit, the same X coordinate values are used for all fits.
- Unique X coordinate values for each fit:
If the number of values in the user information array is equal to the total number of data points, unique X coordinate values are used for each fit.
- x:
(independent variable) X coordinate
The X coordinate values may be specified in the user information data. For details on how to do this, see the linear regression code example, Linear Regression Example.
- p_0:
amplitude
- p_1:
center coordinate
- p_2:
width (standard deviation)
- p_3:
offset
2D Gaussian function (cylindrical symmetry)¶
A 2D Gaussian function defined by five parameters. Its model ID is GAUSS_2D
and it is implemented in gauss_2d.cuh.
Here, p
is the vector of parameters (p0..p4) and the model function g
exists for each x,y coordinate of the input data.
- x,y:
(independent variables) X,Y coordinates
No independent variables are passed to this model function. Hence, the (X,Y) coordinates of the first data value are assumed to be (\(0.0, 0.0\)). The fit size is M x M data points (M*M=number of data points in the interface), the (X,Y) coordinates of the data are simply the corresponding 2D array indices of the data array, starting from zero.
- p_0:
amplitude
- p_1:
center coordinate x
- p_2:
center coordinate y
- p_3:
width (standard deviation; equal width in x and y dimensions)
- p_4:
offset
2D Gaussian function (elliptical)¶
A 2D elliptical Gaussian function defined by six parameters. Its model ID is GAUSS_2D_ELLIPTIC
and it is implemented
in gauss_2d_elliptic.cuh. Here, p
is the vector of parameters (p0..p5) and the model function g
exists for each x,y coordinate of the input data.
- x,y:
(independent variables) X,Y coordinates
No independent variables are passed to this model function. Hence, the (X,Y) coordinates of the first data value are assumed to be (\(0.0, 0.0\)). The fit size is M x M data points (M*M=number of data points in the interface), the (X,Y) coordinates of the data are simply the corresponding 2D array indices of the data array, starting from zero.
- p_0:
amplitude
- p_1:
center coordinate x
- p_2:
center coordinate y
- p_3:
width x (standard deviation)
- p_4:
width y (standard deviation)
- p_5:
offset
2D Gaussian function (elliptical, rotated)¶
A 2D elliptical Gaussian function whose principal axis may be rotated with respect to the X and Y coordinate axes,
defined by seven parameters. Its model is GAUSS_2D_ROTATED
and it is implemented in gauss_2d_rotated.cuh.
Here, p
is the vector of parameters (p0..p6) and the model function g
exists for each x,y coordinate of the input data.
- x,y:
(independent variables) X,Y coordinates
No independent variables are passed to this model function. Hence, the (X,Y) coordinates of the first data value are assumed to be (\(0.0, 0.0\)). The fit size is M x M data points (M*M=number of data points in the interface), the (X,Y) coordinates of the data are simply the corresponding 2D array indices of the data array, starting from zero.
- p_0:
amplitude
- p_1:
center coordinate x
- p_2:
center coordinate y
- p_3:
width x (standard deviation)
- p_4:
width y (standard deviation)
- p_5:
offset
- p_6:
rotation angle [radians]
2D Cauchy function (elliptical)¶
A 2D elliptical Cauchy function defined by six parameters. Its model ID is CAUCHY_2D_ELLIPTIC
and it is implemented
in cauchy_2d_elliptic.cuh. Here, p
is the vector of parameters (p0..p5) and the model function g
exists for each x,y
coordinate of the input data.
- x,y:
(independent variables) X,Y coordinates
No independent variables are passed to this model function. Hence, the (X,Y) coordinates of the first data value are assumed to be (\(0.0, 0.0\)). The fit size is M x M data points (M*M=number of data points in the interface), the (X,Y) coordinates of the data are simply the corresponding 2D array indices of the data array, starting from zero.
- p_0:
amplitude
- p_1:
center coordinate x
- p_2:
center coordinate y
- p_3:
width x (standard deviation)
- p_4:
width y (standard deviation)
- p_5:
offset
1D Spline function¶
A 1D cubic spline function defined by 3 parameters and a set of cubic spline coefficients. See Gpuspline on Github for details
on how to compute the set of cubic spline coefficients from a data set so that it can be used here. The model ID is SPLINE_1D
and it is implemented in spline_1d.cuh. Here, p
is the vector of parameters (p0..p2) and c
the vector
of spline coefficients. The model function g
exists for each x coordinate of the input data.
- x:
(independent variables) X coordinates
No independent variables are passed to this model function. Hence, the X coordinate of the first data value is assumed to be \(0.0\). The fit size is M data points (M=number of data points in the interface), the X coordinates of the data are simply the corresponding array indices of the data array, starting from zero.
- p_0:
amplitude
- p_1:
center coordinate
- p_2:
offset
2D Spline function¶
A 2D cubic spline function defined by 4 parameters and a set of cubic spline coefficients. See Gpuspline on Github for details
on how to compute the set of cubic spline coefficients from a data set so that it can be used here. The model ID is SPLINE_2D
and it is implemented in spline_2d.cuh. Here, p
is the vector of parameters (p0..p3) and c
the vector
of spline coefficients. The model function g
exists for each x,y coordinate of the input data.
- x,y:
(independent variables) X,Y coordinates
No independent variables are passed to this model function. Hence, the (X,Y) coordinates of the first data value are assumed to be (\(0.0, 0.0\)). The fit size is M x N data points (M*N=number of data points in the interface), the (X,Y) coordinates of the data are simply the corresponding 2D array indices of the data array, starting from zero.
- p_0:
amplitude
- p_1:
center coordinate x
- p_2:
center coordinate y
- p_3:
offset
3D Spline function¶
A 3D cubic spline function defined by 5 parameters and a set of cubic spline coefficients. See Gpuspline on Github for details
on how to compute the set of cubic spline coefficients from a data set so that it can be used here. The model ID is SPLINE_3D
and it is implemented in spline_3d.cuh. Here, p
is the vector of parameters (p0..p4) and c
the vector
of spline coefficients. The model function g
exists for each x,y,z coordinate of the input data.
- x,y,z:
(independent variables) X,Y,Z coordinates
No independent variables are passed to this model function. Hence, the (X,Y,Z) coordinates of the first data value are assumed to be (\(0.0, 0.0, 0.0\)). The fit size is M x N x O data points (M*N*O=number of data points in the interface), the (X,Y,Z) coordinates of the data are simply the corresponding 3D array indices of the data array, starting from zero.
- p_0:
amplitude
- p_1:
center coordinate x
- p_2:
center coordinate y
- p_3:
center coordinate z
- p_4:
offset
3D Multichannel Spline function¶
A 3D cubic spline function with multiple channels defined by 5 parameters and a set of cubic spline coefficients.
See Gpuspline on Github for details on how to compute the set of cubic spline coefficients from a data set so that
it can be used here. The model ID is SPLINE_3D_MULTICHANNEL
and it is implemented in spline_3d_multichannel.cuh.
Here, p
is the vector of parameters (p0..p4) and c
the vector of spline coefficients.
The model function g
exists for each x,y,z coordinate of the input data.
- x,y,z:
(independent variables) X,Y,Z coordinates
No independent variables are passed to this model function. Hence, the (X,Y,Z) coordinates of the first data value are assumed to be (\(0.0, 0.0, 0.0\)). The fit size is M x N x O data points (M*N*O=number of data points in the interface), the (X,Y,Z) coordinates of the data are simply the corresponding 3D array indices of the data array, starting from zero.
- p_0:
amplitude
- p_1:
center coordinate x
- p_2:
center coordinate y
- p_3:
center coordinate z
- p_4:
offset
3D Multichannel Spline function with variable phase¶
A 3D cubic spline function with multiple channels and a variable phase defined by 6 parameters and a set of cubic spline
coefficients. See Gpuspline on Github for details on how to compute the set of cubic spline coefficients from a data set so that
it can be used here. The model ID is SPLINE_3D_PHASE_MULTICHANNEL
and it is implemented in
spline_3d_phase_multichannel.cuh. Here, p
is the vector of parameters (p0..p5) and c
the vector of
spline coefficients. The model function g
exists for each x,y,z coordinate of the input data.
- x,y,z:
(independent variables) X,Y,Z coordinates
No independent variables are passed to this model function. Hence, the (X,Y,Z) coordinates of the first data value are assumed to be (\(0.0, 0.0, 0.0\)). The fit size is M x N x O data points (M*N*O=number of data points in the interface), the (X,Y,Z) coordinates of the data are simply the corresponding 3D array indices of the data array, starting from zero.
- p_0:
amplitude
- p_1:
center coordinate x
- p_2:
center coordinate y
- p_3:
center coordinate z
- p_4:
offset
- p_5:
phase
Estimator functions¶
Least squares estimator¶
The least squares estimator computes the weighted sum of the squared deviation between the data values and the model at
the positions of the data points. The ID for this estimator is LSE
. It’s implemented in lse.cuh.
Least squares estimation is a common method, and the standard Levenberg-Marquardt algorithm described by Marquardt makes use of minimal least squares. The estimator is described as follows.
- n:
The index of the data points (\(0,..,N-1\))
- f_n:
The model function values at data position \(n\)
- z_n:
Data values at data position \(n\)
- vec{p}:
Fit model function parameters
- w_n:
Weight values for data at position \(n\)
Maximum likelihood estimator for data subject to Poisson statistics¶
The maximum likelihood estimator (MLE) for Poisson distributed noise is relatively simple to implement. In the case of data with Poisson noise
is provides a more precise estimate when compared to an LSE estimator. The ID for this estimator is MLE
. It’s implemented in mle.cuh.
The estimator is described as follows.
- n:
The index of the data points (\(0,..,N-1\))
- f_n:
The model function values at data position \(n\)
- z_n:
Data values at data position \(n\)
- vec{p}:
Actual model function parameters
Note that this estimator does not provide any means to weight the data values. Rather, noise in the data is assumed to be purely Poissonian.
Examples in C++¶
Example programs, written in C++, are part of the Gpufit project and can be built and run through the project environment. Here the example programs are described and the important steps in each program are highlighted.
Please note, that additionally, the C++ boost tests, and the Gpufit/Cpufit performance comparison test, may also be of value as example code demonstrating the use of Gpufit. However, a detailed description of these tests programs is not provided.
Simple example (minimal call to gpufit()
)¶
This example demonstrates a simple, minimal program containing all of the required parameters for a call to the Gpufit function. The example is contained in the file Simple_Example.cpp and it can be built and executed within the project environment. Please note that it this code does not actually do anything other than make a single call to gpufit().
In the first section of the code, the model ID is set, memory space for initial parameters and data values is allocated, the fit tolerance is set, the maximum number of iterations is set, the estimator ID is set, and the parameters to fit array is initialized. Note that in most applications, the data array will already exist and it will be unnecessary to allocate additional space for data. In this example, the parameters to fit array is initialized to all ones, indicating that all model parameters should be adjusted in the fit.
// number of fits, number of points per fit
size_t const n_fits = 10;
size_t const n_points_per_fit = 10;
// model ID and number of model parameters
int const model_id = GAUSS_1D;
size_t const n_model_parameters = 5;
// initial parameters
std::vector< float > initial_parameters(n_fits * n_model_parameters);
// data
std::vector< float > data(n_points_per_fit * n_fits);
// tolerance
float const tolerance = 0.001f;
// maximum number of iterations
int const max_number_iterations = 10;
// estimator ID
int const estimator_id = LSE;
// parameters to fit (all of them)
std::vector< int > parameters_to_fit(n_model_parameters, 1);
In the next section of code, sufficient memory is allocated for the fit results, output states, chi-square, and number of iterations arrays.
// output parameters
std::vector< float > output_parameters(n_fits * n_model_parameters);
std::vector< int > output_states(n_fits);
std::vector< float > output_chi_square(n_fits);
std::vector< int > output_number_iterations(n_fits);
Finally, a call to the C interface of Gpufit is made. In this example, the optional inputs weights and user_info are not used. The program then checks the return status from Gpufit. If an error occurred, the last error message is obtained and an exception is thrown.
// call to gpufit (C interface)
int const status = gpufit
(
n_fits,
n_points_per_fit,
data.data(),
0,
model_id,
initial_parameters.data(),
tolerance,
max_number_iterations,
parameters_to_fit.data(),
estimator_id,
0,
0,
output_parameters.data(),
output_states.data(),
output_chi_square.data(),
output_number_iterations.data()
);
// check status
if (status != STATUS_OK)
{
throw std::runtime_error(gpufit_get_last_error());
}
In summary, the above example illustrates the basic details of the parameters which are passed to the gpufit()
function, such
as the size of the input and output variables, etc. This example could be adapted for real applications by:
choosing a model ID
choosing an estimator ID
setting the fit tolerance and maximum number of iterations
using a data variable containing the data values to be fit
providing initial parameters with suitable estimates of the true parameters
processing the output data
In the following sections, examples are provided in which Gpufit is used to fit simulated datasets.
Example of 2D Gaussian fits¶
This example demonstrates the use of Gpufit to fit a dataset consisting of 2D Gaussian peaks. The example is contained in the file Gauss_Fit_2D_Example.cpp and it can be built and executed within the project environment. The optional inputs to gpufit(), weights and user_info, are not used.
This example features:
Noisy data and random initial guesses for the fit parameters
Use of the maximum likelihood estimator which is appropriate for data subject to Poisson noise
In this example, a set of simulated data is generated, consisting of 104 individual Gaussian peaks, with a size of 30 x 30 points. Random noise is added to the data. The model function and the model parameters are described in 2D Gaussian function (cylindrical symmetry).
In this example the true parameters used to generate the Gaussian data are defined in the following code block.
// true parameters
std::vector< float > true_parameters{ 10.f, 14.5f, 14.5f, 3.f, 10.f}; // amplitude, center x/y positions, width, offset
These parameters define a 2D Gaussian peak centered at the middle of the grid (position 14.5, 14.5), with a width (standard deviation) of 3.0, an amplitude of 10 and a background of 10. Note that, since we are not providing the independent variables (X values) in the call to Gpufit, the X and Y coordinates of the first data point are assumed to be 0.0, and increasing linearly from this point (i.e. \(0, 1, 2, ...\)).
The guesses for the initial parameters are drawn from the true parameters with a uniformly distributed deviation of about 20%. The initial guesses for the center coordinates are chosen with a deviation relative to the width of the Gaussian.
// initial parameters (randomized)
std::vector< float > initial_parameters(n_fits * n_model_parameters);
for (size_t i = 0; i < n_fits; i++)
{
for (size_t j = 0; j < n_model_parameters; j++)
{
if (j == 1 || j == 2)
{
initial_parameters[i * n_model_parameters + j] = true_parameters[j] + true_parameters[3] * (-0.2f + 0.4f * uniform_dist(rng));
}
else
{
initial_parameters[i * n_model_parameters + j] = true_parameters[j] * (0.8f + 0.4f*uniform_dist(rng));
}
}
}
The 2D grid of X and Y values (each ranging from 0 to 29 with an increment of 1) is computed using a double for loop.
// generate x and y values
std::vector< float > x(n_points_per_fit);
std::vector< float > y(n_points_per_fit);
for (size_t i = 0; i < size_x; i++)
{
for (size_t j = 0; j < size_x; j++) {
x[i * size_x + j] = static_cast<float>(j);
y[i * size_x + j] = static_cast<float>(i);
}
}
Next, a 2D Gaussian peak function (without noise) is calculated, once, using the true parameters.
void generate_gauss_2d(
std::vector<float> const & x_coordinates,
std::vector<float> const & y_coordinates,
std::vector<float> const & gauss_params,
std::vector<float> & output_values)
{
// Generates a Gaussian 2D function at a set of X and Y coordinates. The Gaussian is defined by
// an array of five parameters.
// x_coordinates: Vector of X coordinates.
// y_coordinates: Vector of Y coordinates.
// gauss_params: Vector of function parameters.
// output_values: Output vector containing the values of the Gaussian function at the
// corresponding X, Y coordinates.
// gauss_params[0]: Amplitude
// gauss_params[1]: Center X position
// guass_params[2]: Center Y position
// gauss_params[3]: Gaussian width (standard deviation)
// gauss_params[4]: Baseline offset
// This code assumes that x_coordinates.size == y_coordinates.size == output_values.size
for (size_t i = 0; i < x_coordinates.size(); i++)
{
float arg = -((x_coordinates[i] - gauss_params[1]) * (x_coordinates[i] - gauss_params[1])
+ (y_coordinates[i] - gauss_params[2]) * (y_coordinates[i] - gauss_params[2]))
/ (2.f * gauss_params[3] * gauss_params[3]);
output_values[i] = gauss_params[0] * exp(arg) + gauss_params[4];
}
}
The variable temp_gauss is used to store the values of the Gaussian peak. This variable is then used as a template to generate a set of Gaussian peaks with random, Poisson-distributed noise.
// generate data with noise
std::vector< float > temp_gauss(n_points_per_fit);
// compute the model function
generate_gauss_2d(x, y, true_parameters.begin(), temp_gauss);
std::vector< float > data(n_fits * n_points_per_fit);
for (size_t i = 0; i < n_fits; i++)
{
// generate Poisson random numbers
for (size_t j = 0; j < n_points_per_fit; j++)
{
std::poisson_distribution< int > poisson_dist(temp_gauss[j]);
data[i * n_points_per_fit + j] = static_cast<float>(poisson_dist(rng));
}
}
Thus, in this example, the data for each fit differs only in the random noise. This, and the randomized initial guesses for each fit, result in each fit returning slightly different best-fit parameters.
Next, the model and estimator IDs are set, corresponding to the 2D Gaussian fit model function, and the MLE estimator.
// estimator ID
int const estimator_id = MLE;
// model ID
int const model_id = GAUSS_2D;
Next, the gpufit function is called via the C Interface. Parameters weights, user_info and user_info_size are set to 0, indicating that they are not used in this example.
// call to gpufit (C interface)
int const status = gpufit
(
n_fits,
n_points_per_fit,
data.data(),
0,
model_id,
initial_parameters.data(),
tolerance,
max_number_iterations,
parameters_to_fit.data(),
estimator_id,
0,
0,
output_parameters.data(),
output_states.data(),
output_chi_square.data(),
output_number_iterations.data()
);
// check status
if (status != STATUS_OK)
{
throw std::runtime_error(gpufit_get_last_error());
}
After the fits are complete, the return value is checked to ensure that no error occurred.
Output statistics¶
The last part of this example obtains statistics describing the fit results, and testing whether the fits converged, etc.
The output_states variable contains a state code which indicates whether the fit converged, or if an error occured (see the Gpufit API documentation, Description of output parameters, for details). In this example, a histogram of all possible fit states is obtained by iterating over the state of each fit.
// get fit states
std::vector< int > output_states_histogram(5, 0);
for (std::vector< int >::iterator it = output_states.begin(); it != output_states.end(); ++it)
{
output_states_histogram[*it]++;
}
In computing the mean and standard deviation of the results, only the converged fits are taken into account. The following code contains an example of the calculation of the means of the output parameters, iterating over all fits and all model parameters.
// compute mean of fitted parameters for converged fits
std::vector< float > output_parameters_mean(n_model_parameters, 0);
for (size_t i = 0; i != n_fits; i++)
{
if (output_states[i] == STATE_CONVERGED)
{
for (size_t j = 0; j < n_model_parameters; j++)
{
output_parameters_mean[j] += output_parameters[i * n_model_parameters + j];
}
}
}
// normalize
for (size_t j = 0; j < n_model_parameters; j++)
{
output_parameters_mean[j] /= output_states_histogram[0];
}
In summary, the above example illustrates a simple call to gpufit()
using a real dataset. When the fit is complete, the
fit results are obtained and the output states are checked. Additionally, this example calculates some basic statistics
describing the results. The code also illustrates how the input and output parameters are organized in memory.
Linear Regression Example¶
This example demonstrates the use of Gpufit to compute linear fits to a randomly generated dataset. The example is contained
in the file Linear_Regression_Example.cpp and it can be built and executed within the project environment. This example
illustrates how independent variables may be used in the fitting process, by taking advantage of the user_info parameter.
In this example, a set of 104 individual fits are calculated. Each simulated dataset consists of 20 randomly generated
data values. The X coordinates of the data points do not have a uniform spacing, but increase non-linearly.
The user information data is used to pass the X values to gpufit()
. The fits are unweighted, and the model function
and model parameters are described in Linear regression.
For details of how user_info is used to store the values of the independent variable for this fit model function, see the section of the Gpufit documentation describing the model functions, Fit Model functions.
This example features:
Noisy data and random initial guesses for the parameters
Unequally spaced X position values, passed to
gpufit()
using the user_info parameter.
The following code illustrates how the X positions of the data points are stored in the user_info variable, for this model function. The user_info points at a vector of float values. Note, however, that the way in which user_info is used by a model function may vary from function to function.
// custom x positions for the data points of every fit, stored in user_info
std::vector< float > user_info(n_points_per_fit);
for (size_t i = 0; i < n_points_per_fit; i++)
{
user_info[i] = static_cast<float>(pow(2, i));
}
// size of user_info in bytes
size_t const user_info_size = n_points_per_fit * sizeof(float);
By providing the data coordinates for only one fit in user_info, the model function will use the same coordinates for all fits in the dataset, as described in Fit Model functions.
In the next section, the initial parameters for each fit are set to random values, uniformly distributed around the true parameter value.
// true parameters
std::vector< float > true_parameters { 5, 2 }; // offset, slope
// initial parameters (randomized)
std::vector< float > initial_parameters(n_fits * n_model_parameters);
for (size_t i = 0; i != n_fits; i++)
{
// random offset
initial_parameters[i * n_model_parameters + 0] = true_parameters[0] * (0.8f + 0.4f * uniform_dist(rng));
// random slope
initial_parameters[i * n_model_parameters + 1] = true_parameters[0] * (0.8f + 0.4f * uniform_dist(rng));
}
The data is then generated as the value of a linear function plus additive, normally distributed, random noise.
// generate data
std::vector< float > data(n_points_per_fit * n_fits);
for (size_t i = 0; i != data.size(); i++)
{
size_t j = i / n_points_per_fit; // the fit
size_t k = i % n_points_per_fit; // the position within a fit
float x = user_info[k];
float y = true_parameters[0] + x * true_parameters[1];
data[i] = y + normal_dist(rng);
}
In the following code, the model and estimator IDs for the fit are initialized.
// estimator ID
int const estimator_id = LSE;
// model ID
int const model_id = LINEAR_1D;
Finally, a call is made to gpufit()
(C Interface). The weights parameter is set to 0, indicating that
the fits are unweighted.
// call to gpufit (C interface)
int const status = gpufit
(
n_fits,
n_points_per_fit,
data.data(),
0,
model_id,
initial_parameters.data(),
tolerance,
max_number_iterations,
parameters_to_fit.data(),
estimator_id,
user_info_size,
reinterpret_cast< char * >( user_info.data() ),
output_parameters.data(),
output_states.data(),
output_chi_square.data(),
output_number_iterations.data()
);
After the fits have been executed and the return value is checked to ensure that no error occurred, statistics describing the fit results are calculated and displayed, as in the previous example (see Output statistics).
Customization¶
This sections explains how to add custom fit model functions and custom fit estimators within the Gpufit library. Functions calculating the estimator and model values are CUDA device functions that are defined in CUDA header files using the C syntax. For each function and estimator there exists a separate file. Therefore, to add an additional model or estimator a new CUDA header file containing the new model or estimator function must be created and included in the library.
Please note, that in order to add a model function or estimator, it is necessary to rebuild the Gpufit library from source. In future releases of Gpufit, it may be possible to include new fit functions or estimators at runtime.
Add a new fit model function¶
To add a new fit model, the model function itself as well as analytic expressions for its partial derivatives must to be known. A function calculating the values of the model as well as a function calculating the values of the partial derivatives of the model, with respect to the model parameters and possible grid coordinates, must be implemented.
Additionally, a new model ID must be defined and included in the list of available model IDs, and the number of model parameters and dimensions must be specified as well.
Detailed step by step instructions for adding a model function are given below.
Define an additional model ID in file constants.h. When using the language bindings, the model ID must also be added for Python (in gpufit.py), Matlab (in ModelID.m), Java (in Model.java).
Implement a CUDA device function within a newly created .cuh file in folder Gpufit/Gpufit/models according to the following template.
__device__ void ... ( // ... = function name
float const * parameters,
int const n_fits,
int const n_points,
float * value,
float * derivative,
int const point_index,
int const fit_index,
int const chunk_index,
char * user_info,
std::size_t const user_info_size)
{
///////////////////////////// value //////////////////////////////
value[point_index] = ... ; // formula calculating fit model values
/////////////////////////// derivative ///////////////////////////
float * current_derivative = derivative + point_index;
current_derivative[0 * n_points] = ... ; // formula calculating derivative values with respect to parameters[0]
current_derivative[1 * n_points] = ... ; // formula calculating derivative values with respect to parameters[1]
.
.
.
}
This code can be used as a pattern, where the placeholders “…” must be replaced by user code which calculates model function values and partial derivative values of the model function for a particular set of parameters. See for example linear_1d.cuh.
Include the newly created .cuh file in models.cuh
Add a switch case in the CUDA device function
calculate_model()
in file models.cuh to allow calling the newly added model function.
switch (model_id)
{
case GAUSS_1D:
calculate_gauss1d(parameters, n_fits, n_points, value, derivative, point_index, fit_index, chunk_index, user_info, user_info_size);
break;
.
.
.
case ...: // model ID
... // function name
(parameters, n_fits, n_points, value, derivative, point_index, fit_index, chunk_index, user_info, user_info_size);
break;
.
.
.
default:
break;
}
Add a switch case in function
configure_model()
in file models.cuh.
switch (model_id)
{
case GAUSS_1D: n_parameters = 4; n_dimensions = 1; break;
.
.
.
case ...: // model ID
n_parameters = ...; // number of model parameters
n_dimensions = ...; // number of model dimensions
break;
default: break;
}
After adding a new model function, if CMake is being used to configure the compiler, then CMake must be run again. If not using CMake, the new model function file (the .cuh file) must be included in the project.
Re-build the Gpufit project.
Add a new fit estimator¶
To extend the Gpufit library with additional estimators, three CUDA device functions must be defined and integrated. The sections requiring modification are the functions which calculate the estimator function values, and its gradient and hessian values. Also, a new estimator ID must be defined. Detailed step by step instructions for adding an additional estimator is given below.
Define an additional estimator ID in constants.h When using the language bindings, the estimator ID must be added also for Python (in gpufit.py), Matlab (in EstimatorID.m), Java (in Estimator.java).
Implement three functions within a newly created .cuh file in the folder Gpufit/Gpufit/estimators calculating \(\chi^2\) values and its gradient and hessian according to the following template.
///////////////////////////// Chi-square /////////////////////////////
__device__ void ... ( // ... = function name Chi-square
volatile float * chi_square,
int const point_index,
float const * data,
float const * value,
float const * weight,
int * state,
char * user_info,
std::size_t const user_info_size)
{
chi_square[point_index] = ... ; // formula calculating Chi-square summands
}
////////////////////////////// gradient //////////////////////////////
__device__ void ... ( // ... = function name gradient of Chi-square
volatile float * gradient,
int const point_index,
int const parameter_index,
float const * data,
float const * value,
float const * derivative,
float const * weight,
char * user_info,
std::size_t const user_info_size)
{
gradient[point_index] = ... ; // formula calculating summands of the gradient of Chi-square
// model derivates are stored in derivative[parameter_index]
}
////////////////////////////// hessian ///////////////////////////////
__device__ void ... ( // function name hessian
double * hessian,
int const point_index,
int const parameter_index_i,
int const parameter_index_j,
float const * data,
float const * value,
float const * derivative,
float const * weight,
char * user_info,
std::size_t const user_info_size)
{
*hessian += ... ; // formula calculating summands of the hessian of Chi-square
}
This code can be used as a pattern, where the placeholders “…” must be replaced by user code which calculates the estimator and the gradient and hessian values of the estimator given. For a concrete example, see lse.cuh.
Include the newly created .cuh file in estimators.cuh.
#include "....cuh" // filename
Add a switch case in three CUDA device functions in the file estimators.cuh.
4a. Calculation of Chi-square:
switch (estimator_id) { case LSE: calculate_chi_square_lse(chi_square, point_index, data, value, weight, state, user_info, user_info_size); break; . . . case ...: // estimator ID ... // function name Chi-square (chi_square, point_index, data, value, weight, state, user_info, user_info_size); break; default: break; }4b. Calculation of the gradients of Chi-square:
switch (estimator_id) { case LSE: calculate_gradient_lse(gradient, point_index, parameter_index, data, value, derivative, weight, user_info, user_info_size); break; . . . case ...: // estimator ID ... // function name gradient (gradient, point_index, parameter_index, data, value, derivative, weight, user_info, user_info_size); break; default: break; }4c. Calculation of the Hessian:
switch (estimator_id) { case LSE: calculate_hessian_lse (hessian, point_index, parameter_index_i, parameter_index_j, data, value, derivative, weight, user_info,user_info_size); break; . . . case ...: // estimator ID ... // function name hessian (hessian, point_index, parameter_index_i, parameter_index_j, data, value, derivative, weight, user_info, user_info_size); break; default: break; }
After adding a new estimator, if CMake is being used to configure the compiler, then CMake must be run again. If not using CMake, the new estimator file (the .cuh file) must be included in the project.
Re-build the Gpufit project.
Future releases¶
A current disadvantage of the Gpufit library, when compared with established CPU-based curve fitting packages, is that in order to add or modify a fit model function or a fit estimator, the library must be recompiled. We anticipate that this limitation can be overcome in future releases of the library, by employing run-time compilation of the CUDA code.
External bindings¶
This sections describes the Gpufit bindings to other programming languages. The bindings (to Python, Matlab or Java) aim to emulate the C Interface as closely as possible.
Most high level languages feature multidimensional numeric arrays. In the bindings implemented for Matlab and Python, we adopt the convention that the input data should be organized as a 2D array, with one dimension corresponding to the number of data points per fit, and the other corresponding to the number of fits. Internally, in memory, these arrays should always be ordered such that the data values for each fit are kept together. In Matlab, for example, this means storing the data in an array with dimensions [number_points_per_fit, number_fits]. In this manner, the data in memory is ordered in the same way that is expected by the Gpufit C interface, and there is no need to copy or otherwise re-organize the data before passing it to the GPU. The same convention is used for the weights, the initial model parameters, and the output parameters.
In Java we pre-allocate one dimensional FloatBuffers or IntBuffers for the data and the fit results. The user is responsible for copying data into these buffers.
Unlike the C interface, the external bindings do not require the number of fits and the number of data points per fit to be specified explicitly. Instead, these numbers are inferred from the dimensions of the 2D input arrays.
Optional parameters with default values¶
The external bindings make some input parameters optional. The optional parameters are shown here. They are kept the same for all bindings.
- tolerance:
default value 1e-4
- max_n_iterations:
default value 25 iterations
- estimator_id:
the default estimator is LSE as defined in constants.h
- parameters_to_fit:
by default all parameters are fit
For instructions on how to specify these parameters explicitly, see the sections below.
Python¶
The Gpufit binding for Python is a project named pyGpufit. This project contains a Python package named pygpufit, which
contains a module gpufit, and this module implements a method called fit. Calling this method is equivalent to
calling the C interface function gpufit()
of the Gpufit library. The package expects the input data to be
stored as NumPy array. NumPy follows row-major order by default.
Installation¶
Wheel files for Python 2.X and 3.X on Windows 32/64 bit are included in the binary package. NumPy is required.
Install the wheel file with.
pip install --no-index --find-links=LocalPathToWheelFile pyGpufit
Python Interface¶
fit¶
The signature of the fit method (equivalent to calling the C interface function gpufit()
) is
def fit(data, weights, model_id:ModelID, initial_parameters, tolerance:float=None, max_number_iterations:int=None, parameters_to_fit=None, estimator_id:EstimatorID=None, user_info=None):
Optional parameters are passed in as None. The numbers of points, fits and parameters is deduced from the dimensions of the input data and initial parameters arrays.
Input parameters
- data:
Data 2D NumPy array of shape (number_fits, number_points) and data type np.float32
- weights:
Weights 2D NumPy array of shape (number_fits, number_points) and data type np.float32 (same as data)
- special:
None indicates that no weights are available
- tolerance:
Fit tolerance
- type:
float
- special:
If None, the default value will be used.
- max_number_iterations:
Maximal number of iterations
- type:
int
- special:
If None, the default value will be used.
- estimator_id:
estimator ID
- type:
EstimatorID which is an Enum in the same module and defined analogously to constants.h.
- special:
If None, the default value is used.
- model_id:
model ID
- type:
ModelID which is an Enum in the same module and defined analogously to constants.h.
- initial_parameters:
Initial parameters 2D NumPy array of shape (number_fits, number_parameter)
- array data type:
np.float32
- parameters_to_fit:
parameters to fit 1D NumPy array of length number_parameter A zero indicates that this parameter should not be fitted, everything else means it should be fitted.
- array data type:
np.int32
- special:
If None, the default value is used.
- user_info:
user info 1D NumPy array of arbitrary type. The length in bytes is deduced automatically.
- special:
If None, no user_info is assumed.
Output parameters
- parameters:
Fitted parameters for each fit 2D NumPy array of shape (number_fits, number_parameter) and data type np.float32
- states:
Fit result states for each fit 1D NumPy array of length number_parameter of data type np.int32 As defined in constants.h:
- chi_squares:
\(\chi^2\) values for each fit 1D NumPy array of length number_parameter of data type np.float32
- n_iterations:
Number of iterations done for each fit 1D NumPy array of length number_parameter of data type np.int32
- time:
Execution time of call to fit In seconds.
Errors are raised if checks on parameters fail or if the execution of fit failed.
fit_constrained¶
The fit_constrained
method is very similar to the fit
method with the additional possibility to
specify parameter constraints.
The signature of the fit_constrained
method (equivalent to calling the C interface function gpufit_constrained()
) is
def fit_constrained(data, weights, model_id:ModelID, initial_parameters, constraints=None, constraint_types=None, tolerance:float=None, max_number_iterations:int=None, parameters_to_fit=None, estimator_id:EstimatorID=None, user_info=None):
Constraint input parameters
- constraints:
Constraint bound intervals for every parameter and every fit. 2D NumPy array of shape (number_fits, 2*number_parameter) and data type np.float32
- contraint_types:
Constraint types for every parameter 1D NumPy array of length number_parameter Valid values are defined in gf.ConstraintType
get_last_error¶
The signature of the get_last_error method (equivalent to calling the C interface function gpufit_get_last_error) is
def get_last_error():
Returns a string representing the error message of the last occurred error.
cuda_available¶
The signature of the cuda_available method (equivalent to calling the C interface function gpufit_cuda_available) is
def cuda_available():
Returns True if CUDA is available and False otherwise.
get_cuda_version¶
The signature of the get_cuda_version method (equivalent to calling the C interface function gpufit_get_cuda_version) is
def get_cuda_version():
Output parameters
- runtime version:
Tuple of (Major version, Minor version)
- driver version:
Tuple of (Major version, Minor version)
An error is raised if the execution failed (i.e. because CUDA is not available).
Python Examples¶
2D Gaussian peak example¶
An example can be found at Python Gauss2D example. It is equivalent to Example of 2D Gaussian fits.
The essential imports are:
import numpy as np
import pygpufit.gpufit as gf
First we test for availability of CUDA as well as CUDA driver and runtime versions.
# cuda available checks
print('CUDA available: {}'.format(gf.cuda_available()))
print('CUDA versions runtime: {}, driver: {}'.format(*gf.get_cuda_version()))
The true parameters describing an example 2D Gaussian peak functions are:
# true parameters
true_parameters = np.array((10, 5.5, 5.5, 3, 10), dtype=np.float32)
A 2D grid of x and y positions can conveniently be generated using the np.meshgrid function:
# generate x and y values
g = np.arange(size_x)
yi, xi = np.meshgrid(g, g, indexing='ij')
xi = xi.astype(np.float32)
yi = yi.astype(np.float32)
Using these positions and the true parameter values a model function can be calculated as
def generate_gauss_2d(p, xi, yi):
"""
Generates a 2D Gaussian peak.
http://gpufit.readthedocs.io/en/latest/api.html#gauss-2d
:param p: Parameters (amplitude, x,y center position, width, offset)
:param xi: x positions
:param yi: y positions
:return: The Gaussian 2D peak.
"""
arg = -(np.square(xi - p[1]) + np.square(yi - p[2])) / (2*p[3]*p[3])
y = p[0] * np.exp(arg) + p[4]
return y
The model function can be repeated and noise can be added using the np.tile and np.random.poisson functions.
# generate data
data = generate_gauss_2d(true_parameters, xi, yi)
data = np.reshape(data, (1, number_points))
data = np.tile(data, (number_fits, 1))
# add Poisson noise
data = np.random.poisson(data)
data = data.astype(np.float32)
The model and estimator IDs can be set as
# estimator ID
estimator_id = gf.EstimatorID.MLE
# model ID
model_id = gf.ModelID.GAUSS_2D
When all input parameters are set we can call the C interface of Gpufit.
# run Gpufit
parameters, states, chi_squares, number_iterations, execution_time = gf.fit(data, None, model_id, initial_parameters, tolerance, max_number_iterations, None, estimator_id, None)
And finally statistics about the results of the fits can be displayed where the mean and standard deviation of the fitted parameters are limited to those fits that converged.
# print fit results
# get fit states
converged = states == 0
number_converged = np.sum(converged)
print('ratio converged {:6.2f} %'.format(number_converged / number_fits * 100))
print('ratio max it. exceeded {:6.2f} %'.format(np.sum(states == 1) / number_fits * 100))
print('ratio singular hessian {:6.2f} %'.format(np.sum(states == 2) / number_fits * 100))
print('ratio neg curvature MLE {:6.2f} %'.format(np.sum(states == 3) / number_fits * 100))
print('ratio gpu not read {:6.2f} %'.format(np.sum(states == 4) / number_fits * 100))
# mean, std of fitted parameters
converged_parameters = parameters[converged, :]
converged_parameters_mean = np.mean(converged_parameters, axis=0)
converged_parameters_std = np.std(converged_parameters, axis=0)
for i in range(number_parameters):
print('p{} true {:6.2f} mean {:6.2f} std {:6.2f}'.format(i, true_parameters[i], converged_parameters_mean[i], converged_parameters_std[i]))
# print summary
print('model ID: {}'.format(model_id))
print('number of fits: {}'.format(number_fits))
print('fit size: {} x {}'.format(size_x, size_x))
print('mean chi_square: {:.2f}'.format(np.mean(chi_squares[converged])))
print('iterations: {:.2f}'.format(np.mean(number_iterations[converged])))
print('time: {:.2f} s'.format(execution_time))
2D Gaussian peak constrained fit example¶
An example for a constrained fit can be found at Python Gauss2D constrained fit example. It differs from the previous example only in that constraints are specified additionally (as 2D array of lower and upper bounds on parameters for every fit) as well as constraint types (for all parameters including fixed parameters) that can take a value of ConstraintType (FREE, LOWER, UPPER or LOWER_UPPER) in order to either do not enforce the constraints for a parameter or enforce them only at the lower or upper or both bounds.
The following code block demonstrates how the sigma of a 2D Gaussian peak can be constrained to the interval [2.9, 3.1] and the background and ampltiude to non-negative values.
# set constraints
constraints = np.zeros((number_fits, 2*number_parameters), dtype=np.float32)
constraints[:, 6] = 2.9
constraints[:, 7] = 3.1
constraint_types = np.array([gf.ConstraintType.LOWER, gf.ConstraintType.FREE, gf.ConstraintType.FREE, gf.ConstraintType.LOWER_UPPER, gf.ConstraintType.LOWER], dtype=np.int32)
# run constrained Gpufit
parameters, states, chi_squares, number_iterations, execution_time = gf.fit_constrained(data, None, model_id,
initial_parameters, constraints, constraint_types,
tolerance, max_number_iterations, None,
estimator_id, None)
Matlab¶
The Matlab binding for Gpufit is a Matlab script (gpufit.m). This script checks the input data, sets default parameters, and calls the C interface of the Gpufit library, via a compiled .mex file.
Please note, that before using the Matlab binding, the path to gpufit.m must be added to the Matlab path.
If other GPU-based computations are to be performed with Matlab in the same session, please use the Matlab GPU computing functionality first (for example with a call to gpuDevice or gpuArray) before calling the Gpufit Matlab binding. If this is not done, Matlab will throw an error (Error using gpuArray An unexpected error occurred during CUDA execution. The CUDA error was: cannot set while device is active in this process).
Matlab Interface¶
gpufit¶
Optional parameters are passed in as empty matrices ([]
). The numbers of points, fits and parameters is deduced from the dimensions of
the input data and initial parameters matrices.
The signature of the gpufit
function is
function [parameters, states, chi_squares, n_iterations, time] = gpufit(data, weights, model_id, initial_parameters, tolerance, max_n_iterations, parameters_to_fit, estimator_id, user_info)
Input parameters
- data:
Data 2D matrix of size [number_points, number_fits] and data type single
- weights:
Weights 2D matrix of size [number_points, number_fits] and data type single (same as data)
- special:
None indicates that no weights are available
- tolerance:
Fit tolerance
- type:
single
- special:
If empty ([]), the default value will be used.
- max_number_iterations:
Maximal number of iterations Will be converted to int32 if necessary
- special:
If empty ([]), the default value will be used.
- estimator_id:
estimator ID
- type:
EstimatorID which is defined in EstimatorID.m analogously to constants.h.
- special:
If empty ([]), the default value is used.
- model_id:
model ID
- type:
ModelID which is defined in ModelID.m analogously to constants.h.
- initial_parameters:
Initial parameters 2D matrix of size: [number_parameter, number_fits]
- type:
single
- parameters_to_fit:
parameters to fit vector of length number_parameter, will be converted to int32 if necessary A zero indicates that this parameter should not be fitted, everything else means it should be fitted.
- special:
If empty ([]), the default value is used.
- user_info:
user info vector of suitable type (correct type is not checked and depends on the chosen fit model function or estimator). The length of user_info in bytes is determined automatically.
Output parameters
- parameters:
Fitted parameters for each fit 2D matrix of size: [number_parameter, number_fits] of data type single
- states:
Fit result states for each fit vector of length number_parameter of data type int32 As defined in constants.h:
- chi_squares:
\(\chi^2\) values for each fit vector of length number_parameter of data type single
- n_iterations:
Number of iterations done for each fit vector of length number_parameter of data type int32
- time:
Execution time of call to gpufit In seconds.
Errors are raised if checks on parameters fail or if the execution of gpufit fails.
gpufit_constrained¶
The gpufit_constrained
function is very similar to the gpufit
function with the additional possibility to specify
parameter constraints.
The signature of the gpufit_constrained
function is
function [parameters, states, chi_squares, n_iterations, time] = gpufit_constrained(data, weights, model_id, initial_parameters, constraints, constraint_types, tolerance, max_n_iterations, parameters_to_fit, estimator_id, user_info)
Constraint input parameters
- constraints:
Constraint bound intervals for every parameter and every fit 2D matrix of size [2*number_parameter, number_fits] of data type single
- contraint_types:
Constraint types for every parameter Vector of length number_parameter, will be converted to int32 if necessary. Valid values are defined in ConstraintType.m.
gpufit_cuda_available¶
The signature of the gputfit_cuda_available method (equivalent to calling the C interface function gpufit_cuda_available) is
function r = gpufit_cuda_available():
Returns True if CUDA is available and False otherwise.
Matlab Examples¶
Simple example¶
The most simple example is the Matlab simple example. It is equivalent to Simple example (minimal call to gpufit()) and additionally relies on default values for optional arguments.
2D Gaussian peak example¶
An example can be found at Matlab Gauss2D example. It is equivalent to Example of 2D Gaussian fits.
The true parameters describing an example 2D Gaussian peak functions are:
% true parameters
true_parameters = single([10, 5.5, 5.5, 3, 10]);
A 2D grid of x and y positions can conveniently be generated using the ndgrid function:
% generate x and y values
g = single(0 : size_x - 1);
[x, y] = ndgrid(g, g);
Using these positions and the true parameter values a model function can be calculated as
function g = gaussian_2d(x, y, p)
% Generates a 2D Gaussian peak.
% http://gpufit.readthedocs.io/en/latest/api.html#gauss-2d
%
% x,y - x and y grid position values
% p - parameters (amplitude, x,y center position, width, offset)
g = p(1) * exp(-((x - p(2)).^2 + (y - p(3)).^2) / (2 * p(4)^2)) + p(5);
end
The model function can be repeated and noise can be added using the repmat and poissrnd functions.
% generate data with Poisson noise
data = gaussian_2d(x, y, true_parameters);
data = repmat(data(:), [1, number_fits]);
data = poissrnd(data);
The model and estimator IDs can be set as
% estimator id
estimator_id = EstimatorID.MLE;
% model ID
model_id = ModelID.GAUSS_2D;
When all input parameters are set we can call the C interface of the Gpufit library.
%% run Gpufit
[parameters, states, chi_squares, n_iterations, time] = gpufit(data, [], model_id, initial_parameters, tolerance, max_n_iterations, [], estimator_id, []);
And finally statistics about the results of the fits can be displayed where the mean and standard deviation of the fitted parameters are limited to those fits that converged.
%% displaying results
% get fit states
converged = states == 0;
number_converged = sum(converged);
fprintf(' ratio converged %6.2f %%\n', number_converged / number_fits * 100);
fprintf(' ratio max it. exceeded %6.2f %%\n', sum(states == 1) / number_fits * 100);
fprintf(' ratio singular hessian %6.2f %%\n', sum(states == 2) / number_fits * 100);
fprintf(' ratio neg curvature MLE %6.2f %%\n', sum(states == 3) / number_fits * 100);
fprintf(' ratio gpu not read %6.2f %%\n', sum(states == 4) / number_fits * 100);
% mean and std of fitted parameters
converged_parameters = parameters(:, converged);
converged_parameters_mean = mean(converged_parameters, 2);
converged_parameters_std = std(converged_parameters, [], 2);
for i = 1 : number_parameters
fprintf(' p%d true %6.2f mean %6.2f std %6.2f\n', i, true_parameters(i), converged_parameters_mean(i), converged_parameters_std(i));
end
% print summary
fprintf('model ID: %d\n', model_id);
fprintf('number of fits: %d\n', number_fits);
fprintf('fit size: %d x %d\n', size_x, size_x);
fprintf('mean chi-square: %6.2f\n', mean(chi_squares(converged)));
fprintf('iterations: %6.2f\n', mean(n_iterations(converged)));
fprintf('time: %6.2f s\n', time);
2D Gaussian peak constrained fit example¶
An example for a constrained fit can be found at Matlab Gauss2D constrained fit example. It differs from the previous example only in that constraints are specified additionally (as 2D array of lower and upper bounds on parameters for every fit) as well as constraint types (for all parameters including fixed parameters) that can take a value of ConstraintType (FREE, LOWER, UPPER or LOWER_UPPER) in order to either do not enforce the constraints for a parameter or enforce them only at the lower or upper or both bounds.
The following code block demonstrates how the sigma of a 2D Gaussian peak can be constrained to the interval [2.9, 3.1] and the background and amplitude to non-negative values.
%% set constraints
constraints = zeros([2*number_parameters, number_fits], 'single');
constraints(7, :) = 2.9;
constraints(8, :) = 3.1;
constraint_types = int32([ConstraintType.LOWER, ConstraintType.FREE, ConstraintType.FREE, ConstraintType.LOWER_UPPER, ConstraintType.LOWER]);
%% run constrained Gpufit
[parameters, states, chi_squares, n_iterations, time] = gpufit_constrained(data, [], ...
model_id, initial_parameters, constraints, constraint_types, tolerance, max_n_iterations, [], estimator_id, []);
Java¶
The Gpufit binding for Java consists of a small adapter C library named GpufitJNI and a Gpufit jar archive containing
a com.github.gpufit package. In these the class Gpufit has static methods largely equivalent to calling the C interface
function gpufit()
of the Gpufit library. The fit method expects the input to be given as a FitModel instance,
which among other things specifies the model and the estimator as enums. The results are returned as a FitResult instance.
Installation¶
Build the Gpufit library and the GpufitJNI library from source as documented in Installation and Testing. Make sure both libraries are in the Java library path, for example by using the -Djava.library.path comman line switch for the VM.
Build the Gpufit.jar from the sources using Gradle on Gpufit/java/gpufit/build.gradle. Make sure this jar is in the Java class path of your application, e.g. by adding it as a dependency to your project
Java Interface¶
For a more complete description, see the Javadoc output of the Gpufit Java binding project.
Gpufit.fit¶
The signature of the fit method (calls the C interface function gpufit()
) is
public static FitResult fit(FitModel fitModel, FitResult fitResult)
Input parameters are given as a FitModel, output parameters are stored in a FitResult. A FitResult can be re-used if the number of fits and the number of parameters of the model didn’t change. It must then also be given as second parameter.
Input of the fit - Filling the FitModel
public FitModel(int numberFits, int numberPoints, boolean withWeights, Model model, Float tolerance, Integer maxNumberIterations, Boolean[] parametersToFit, Estimator estimator, int userInfoSize)
- numberFits:
Number of fits
- numberPoints:
Number of data points per fit
- widthWeights:
If true, a buffer for giving weights is pre-allocated, otherwise not
- model:
An enum describing the model. See class Model for more information. Naming and id is equivalent to the C code.
- tolerance:
Fit tolerance
- special:
If null, the default value will be used.
- maxNumberIterations:
Maximal number of iterations
- special:
If null, the default value will be used.
- parametersToFit:
Boolean array indicating which parameters should be fitted
- special:
If null, the default value will be used.
- estimator:
Enum describing the estimator function. See class Estimator for more information. Naming and id is equivalent to the C code.
- special:
If None, the default value is used.
- userInfoSize:
The size of the user info (in bytes).
- special:
Must be positive, otherwise the buffer for user info is not pre-allocated.
Afterwards the buffers for data, weights (if desired), initial parameters and user info (if desired) must be filled with the appropriate content. The internal layout is the same as in the C part of Gpufit, i.e. the data represents an 1D number array of length of number fits times number data points per fit with an order of data points followed one after another for all fits. In this batch. The initial parameters are number fits times number of parameters in the model with the parameters for each fit changing fastest and the number of fits slowest.
Fit output - The FitResult
Memory for the fit output is either created automatically or a previous instance of FitResult can be reused to avoid recreation.
public class FitResult {
public final FloatBuffer parameters;
public final IntBuffer states;
public final FloatBuffer chiSquares;
public final IntBuffer numberIterations;
public float fitDuration;
- parameters:
Fitted parameters for each fit
- states:
Fit result states for each fit As defined in constants.h:
- chi_squares:
\(\chi^2\) values for each fit
- n_iterations:
Number of iterations done for each fit
- time:
Execution time of call to fit In seconds.
Errors are raised if checks on parameters fail or if the execution of fit failed.
Gpufit.getLastError¶
The signature of the get_last_error method (equivalent to calling the C interface function gpufit_get_last_error) is
public static native String getLastError()
Returns a string representing the error message of the last occurred error.
Gpufit.isCudaAvailable¶
The signature of the cuda_available method (equivalent to calling the C interface function gpufit_cuda_available) is
public static native boolean isCudaAvailable()
Returns True if CUDA is available and False otherwise.
get_cuda_version¶
The signature of the get_cuda_version method (equivalent to calling the C interface function gpufit_get_cuda_version) is
public static CudaVersion getCudaVersion()
The output is a CudaVersion instance with two simple member variables.
- runtime version:
String of “Major version.Minor version”
- driver version:
String of “Major version.Minor version”
An error is raised if the execution failed (i.e. because CUDA is not available).
Java Example¶
2D Gaussian peak example¶
An example can be found at Java Gauss2D example. It is equivalent to Example of 2D Gaussian fits.
First we test for availability of CUDA as well as CUDA driver and runtime versions.
// print general CUDA information
System.out.println(String.format("CUDA available: %b", Gpufit.isCudaAvailable()));
CudaVersion cudaVersion = Gpufit.getCudaVersion();
System.out.println(String.format("CUDA versions runtime: %s, driver: %s", cudaVersion.runtime, cudaVersion.driver));
The model and estimator IDs can be set as
Model model = Model.GAUSS_2D;
Estimator estimator = Estimator.MLE;
The true parameters describing an example 2D Gaussian peak functions are:
// true parameters (order: amplitude, center-x, center-y, width, offset)
float[] trueParameters = new float[]{10, 5.5f, 5.5f, 3, 10};
A 2D grid of x and y positions can conveniently be generated:
// generate x and y values
float[] xi = new float[numberPoints];
float[] yi = new float[numberPoints];
for (int i = 0; i < sizeX; i++) {
for (int j = 0; j < sizeX; j++) {
xi[i * sizeX + j] = i;
yi[i * sizeX + j] = j;
}
}
Using these positions and the true parameter values a model function can be calculated as
/**
* Computes a 2D Gaussian peak given x and y values and parameters.
*
* See also: http://gpufit.readthedocs.io/en/latest/api.html#gauss-2d
*
* @param p Parameter array
* @param x x values array
* @param y y values array
* @return Model values array
*/
private static float[] generateGauss2D(float[] p, float[] x, float[] y) {
// checks
assert(x.length == y.length);
assert(p.length == 5);
// calculate data
float[] data = new float[x.length];
for (int i = 0; i < x.length; i++) {
float arg = -((x[i] - p[1]) * (x[i] - p[1]) + (y[i] - p[2]) * (y[i] - p[2])) / (2 * p[3] * p[3]);
data[i] = p[0] * (float)Math.exp(arg) + p[4];
}
return data;
}
The model function can be repeated and Poisson noise can be added.
// generate data
float[] gauss2D = generateGauss2D(trueParameters, xi, yi);
float[] data = new float[numberFits * numberPoints];
for (int i = 0; i < numberFits; i++) {
System.arraycopy(gauss2D, 0, data, i * numberPoints, numberPoints);
}
// add Poisson noise
for (int i = 0; i < numberFits * numberPoints; i++) {
data[i] = nextPoisson(data[i], rand);
}
A FitModel containing all the input data including copying the data values from an array to a Java buffer can be done via
// assemble FitModel
FitModel fitModel = new FitModel(numberFits, numberPoints, false, model, tolerance, maxNumberIterations, null, estimator, 0);
// fill data and initial parameters in the fit model
fitModel.data.clear();
fitModel.data.put(data);
fitModel.initialParameters.clear();
fitModel.initialParameters.put(initialParameters);
When all input parameters are set we can call the C interface of Gpufit.
// fun Gpufit
FitResult fitResult = Gpufit.fit(fitModel);
And finally statistics about the results of the fits can be displayed where the mean and standard deviation of the fitted parameters are limited to those fits that converged.
// count FitState outcomes and get a list of those who converged
boolean[] converged = new boolean[numberFits];
int numberConverged = 0, numberMaxIterationExceeded = 0, numberSingularHessian = 0, numberNegativeCurvatureMLE = 0;
for (int i = 0; i < numberFits; i++) {
FitState fitState = FitState.fromID(fitResult.states.get(i));
converged[i] = fitState.equals(FitState.CONVERGED);
switch (fitState) {
case CONVERGED:
numberConverged++;
break;
case MAX_ITERATIONS:
numberMaxIterationExceeded++;
break;
case SINGULAR_HESSIAN:
numberSingularHessian++;
break;
case NEG_CURVATURE_MLE:
numberNegativeCurvatureMLE++;
}
}
// get mean and std of converged parameters
float [] convergedParameterMean = new float[]{0, 0, 0, 0, 0};
float [] convergedParameterStd = new float[]{0, 0, 0, 0, 0};
for (int i = 0; i < numberFits; i++) {
for (int j = 0; j < model.numberParameters; j++) {
if (converged[i]) {
convergedParameterMean[j] += fitResult.parameters.get(i * model.numberParameters + j);
}
}
}
for (int i = 0; i < model.numberParameters; i++) {
convergedParameterMean[i] /= numberConverged;
}
for (int i = 0; i < numberFits; i++) {
for (int j = 0; j < model.numberParameters; j++) {
if (converged[i]) {
float dev = fitResult.parameters.get(i * model.numberParameters + j) - convergedParameterMean[j];
convergedParameterStd[j] += dev * dev;
}
}
}
for (int i = 0; i < model.numberParameters; i++) {
convergedParameterStd[i] = (float)Math.sqrt(convergedParameterStd[i] / numberConverged);
}
// print fit results
System.out.println("*Gpufit*");
System.out.println(String.format("Model: %s", model.name()));
System.out.println(String.format("Number of fits: %d", numberFits));
System.out.println(String.format("Fit size: %d x %d", sizeX, sizeX));
System.out.println(String.format("Mean Chi²: %.2f", meanFloatBuffer(fitResult.chiSquares, converged)));
System.out.println(String.format("Mean number iterations: %.2f", meanIntBuffer(fitResult.numberIterations, converged)));
System.out.println(String.format("Time: %.2fs", fitResult.fitDuration));
System.out.println(String.format("Ratio converged: %.2f %%", (float) numberConverged / numberFits * 100));
System.out.println(String.format("Ratio max it. exceeded: %.2f %%", (float) numberMaxIterationExceeded / numberFits * 100));
System.out.println(String.format("Ratio singular Hessian: %.2f %%", (float) numberSingularHessian / numberFits * 100));
System.out.println(String.format("Ratio neg. curvature MLE: %.2f %%", (float) numberNegativeCurvatureMLE / numberFits * 100));
System.out.println("\nParameters of 2D Gaussian peak");
for (int i = 0; i < model.numberParameters; i++) {
System.out.println(String.format("parameter %d, true: %.2f, mean %.2f, std: %.2f", i, trueParameters[i], convergedParameterMean[i], convergedParameterStd[i]));
}
Appendix¶
Levenberg-Marquardt algorithm¶
A flowchart of the implementation of the Levenberg-Marquardt algorithm is given in Fig. 2.

Levenberg-Marquardt algorithm flow as implemented in the Gpufit library.¶
Performance comparison to other GPU benchmarks¶
Using the bundled application (initial release created with CUDA runtime 8.0) to estimate the fitting speed per second of 10 million fits for various CUDA capable graphics cards of various architectures (on different computers with different versions of graphics drivers) we can compare to the results of Passmark G3D. By and large, the results seem to correlate, i.e. a high Passmark G3D score also relates to a high Gpufit fitting speed.

Performance of Gpufit vs Passmark G3D¶
Gpufit software license¶
MIT License
Copyright (c) 2017 Mark Bates, Adrian Przybylski, Björn Thiel, and Jan Keller-Findeisen
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.