A web-based application for data visualisation and non-linear regression analysis including error calculation for laboratory classes in natural and life sciences

Titus Keller and Danny Kowerko
Chemnitz University of Technology,
Endowed Professorship Media Computing,
D-09111 Chemnitz, Germany
Email: titus.keller@s2012.tu-chemnitz.de, danny.kowerko@informatik.tu-chemnitz.de
Abstract—In practical laboratory classes students traditionally
receive data by reading from a measurement device (ruler,
clock, voltmeter, etc.) or digitally as files in exchange formats
such as CSV (comma separated value). In many cases these
data have to be processed later using non-linear regression,
here referred to as curve fitting. Therefore, analog data first
have to be digitalised and imported to a data analysis and
visualisation program, which is often commercial and requires
installation. In this paper we present an alternative concept
fusing open-source community tools into a single page web
application facilitating data acquisition, visualisation, analysis via
non-linear regression and further post processing usable for error
calculations. We demonstrate the e-learning potential of this web
application accessible at curvefit.tu-chemnitz.de in the context of
acquired data as typically obtained in physical laboratory classes
from undergraduate studies. A prototype workflow for the topic
’specific electric resistance determination’ is presented along with
a technical description of the underlying web technology.
Restrictions of traditional software applications, such as limited
portability or cumbersome ways to share results electronically
between student and supervisor, are overcome by
enabling export via URL.
The discussion is complemented by a thorough comparison of
curve fitting web applications with focus on their capability to
be adaptable to user-specific models (equations) as faced by
(undergraduate) students in the context of their education in
laboratory classes in natural and life sciences, such as physics,
biology and chemistry.
I. INTRODUCTION
The term regression analysis (often also referred to as curve
fitting) describes mathematical methods which determine the
relationship between the dependent and independent variables of a
mathematical model (typically an explicit equation). The most
commonly used approaches to this problem are based on the
method of least squares [1].
$$y = a \cdot x + b \qquad (1)$$
As an example, consider Eq. (1) as the model, with y as the
dependent and x as the independent variable. The least squares
approach determines the values of a and b which
minimise the sum of squared distances between the function
and the data points of the respective data set [2].
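For the linear model of Eq. (1), the least squares solution can be written in closed form. The following minimal Python sketch illustrates this; the function names and the data values are assumptions made for illustration and are not part of the presented web application.

```python
def fit_line(xs, ys):
    """Return (a, b) minimising sum((a*x + b - y)**2) in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centred sums: Sxx and Sxy
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical distances between the line and the data."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


# Illustrative (made-up) data set
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

a, b = fit_line(xs, ys)
print("a =", a, "b =", b, "SSE =", sum_of_squares(xs, ys, a, b))
```

Any other choice of a and b yields a larger sum of squares for the same data, which is the defining property of the least squares solution.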
Curve fitting algorithms solving this problem can generally be
divided into two categories:
1) Linear regression algorithms, which are only applicable
to models that are linear in their parameters, but produce a
deterministic result.
2) Non-linear regression algorithms (for example the
Levenberg-Marquardt algorithm), which are applicable
to generic models, but proceed iteratively, so that the
result depends on the chosen start values [3].
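To illustrate the iterative character of the second category, the following sketch implements a strongly simplified Levenberg-Marquardt-style iteration in plain Python. The exponential model, the synthetic data, the start values and the damping schedule are assumptions chosen for illustration; this is not the implementation used by the regression libraries mentioned in this paper.

```python
import math


def model(x, a, b):
    """Example non-linear model y = a * exp(b * x) (an assumption)."""
    return a * math.exp(b * x)


def sse(xs, ys, a, b):
    return sum((model(x, a, b) - y) ** 2 for x, y in zip(xs, ys))


def levenberg_marquardt(xs, ys, a, b, damping=1e-3, max_iter=200, tol=1e-12):
    """Iteratively refine the start values (a, b); returns (a, b, sse)."""
    current = sse(xs, ys, a, b)
    for _ in range(max_iter):
        # Accumulate J^T J (2x2) and J^T r (2x1) from the analytic Jacobian.
        jtj_aa = jtj_ab = jtj_bb = g_a = g_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = a * e - y        # residual
            da = e               # d(model)/da
            db = a * x * e       # d(model)/db
            jtj_aa += da * da
            jtj_ab += da * db
            jtj_bb += db * db
            g_a += da * r
            g_b += db * r
        # Solve (J^T J + damping * diag(J^T J)) * step = -J^T r (Cramer's rule).
        m_aa = jtj_aa * (1.0 + damping)
        m_bb = jtj_bb * (1.0 + damping)
        det = m_aa * m_bb - jtj_ab * jtj_ab
        if det == 0.0:
            break
        step_a = (-g_a * m_bb + g_b * jtj_ab) / det
        step_b = (-g_b * m_aa + g_a * jtj_ab) / det
        trial = sse(xs, ys, a + step_a, b + step_b)
        if trial < current:
            # Accept the step and trust the local linearisation more.
            a, b = a + step_a, b + step_b
            improvement = current - trial
            current = trial
            damping /= 10.0
            if improvement < tol:
                break
        else:
            # Reject the step and move towards a cautious gradient step.
            damping *= 10.0
    return a, b, current


# Synthetic, noise-free data generated from a = 2.0 and b = 0.5
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

a_fit, b_fit, final_sse = levenberg_marquardt(xs, ys, a=1.5, b=0.4)
print(a_fit, b_fit, final_sse)
```

In contrast to the closed-form linear case, the outcome depends on the start values: poorly chosen start values may require many more iterations or lead to a different local minimum.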
The generic use case of curve fitting can be described
as statistical analysis. Accordingly, it is used in a wide
range of disciplines, such as natural, life, human, social
and economic sciences or data mining [4], [5], [6], [7]. An
explicit example of non-linear regression using equations with
6 or more parameters is thermal melting curve analysis, a
widespread method used in biochemistry to study stability
of DNA (deoxyribonucleic acid) and proteins [4], [8]. By
means of the web application presented herein, it was recently
demonstrated that such complex multi-parameter equations
can be fitted to experimental RNA (ribonucleic acid) thermal
melting curve data including the required post-processing
calculations derived from the fit parameters [9].
In view of this broad variety of use cases, this
paper discusses the practicability of the curve fitting web
application as an e-learning tool employed in practical lab classes
which are part of basic studies, e.g. in physics. This proof of
concept is exemplified using data and equations from a
real lab class [10]. Existing open-source or open-access
computation methods are merged into a browser-based web
application including data import, visualisation, regression
analysis and URL-based export of data and results, which can be
used to quickly and systematically share results between supervisor
and student/user. Using non-proprietary resources makes the
application attractive to be offered by computing centers such
as the ’Universitätsrechenzentrum’ of the Chemnitz University
of Technology as a university-wide service for students and
academic staff.

TABLE I
E-LEARNING RELEVANT CRITERIA AND THEIR DESCRIPTION USED TO
COMPARE THE FUNCTIONALITY OF WEB-BASED CURVE FITTING
APPLICATIONS.
Criteria             Description
1. Non-commercial    Is the application non-commercial?
2. Help              Is further information about the use of the software available?
3. Input options     Can the user define explicit equations and choose between regression algorithms?
4. Graphic output    Can the result and the data set be plotted as a graph?
5. Export            Are there options to export the results, for example as PDF?
6. Error measures    Are error measures displayed which can help to determine the goodness of the fit?
7. Post-processing   Is it directly possible to perform further calculations using the regression results?
II. ANALYSIS OF EXISTING CURVE FITTING WEB APPLICATIONS
Widely used programs for the calculation and visualisation of
results in practical lab classes are often commercial, such as
MS Office/Excel (Microsoft Corporation), OriginPro (OriginLab)
or Igor Pro (Wavemetrics), and require installation. In
e-learning, non-commercial and installation-free applications
are of particular relevance, as both properties save money: (i) on
software licenses and (ii) on maintenance (installation, upgrades,
...). Accordingly, we studied freely available web
applications in a structured and systematic manner according
to the criteria summarised in Table I.
More than twenty functional curve fitting web applications were
identified. Although more may exist, we focus
on four representatives that fulfil the largest share of the relevant
functionality from Table I, namely fitteia¹, WolframAlpha²,
mycurvefit³ and statpages⁴. The respective analysis results are
summarised in Table II.
It can be concluded that several regression web applications
exist, each of which has various limitations. Aside from these
quantifiable results, it has to be noted that there are other
important criteria which are not straightforward to measure,
such as ease of use or GUI (graphical user interface) design.
For example, WolframAlpha, as a more generic mathematical
software, requires knowledge of its syntax and of the existence and
use of its functions. An approach to assess such aspects could
be based on software ergonomics standards, such as
ISO 9241 [11].
¹ http://fitter.ist.utl.pt/, 17.03.2017
² https://www.wolframalpha.com/, 17.03.2017
³ http://mycurvefit.com/, 17.03.2017
⁴ http://statpages.info/nonlin.html, 17.03.2017
TABLE II
A COMPARISON OF REPRESENTATIVE WEBSITES WHICH CAN BE USED TO
SOLVE CURVE FITTING PROBLEMS, BASED ON THE CRITERIA DEFINED IN
TABLE I.
Criteria fitteia WolframAlpha mycurvefit statpages
1. - -
2.
3. partly partly partly
4. -
5. partly partly -
6. - partly partly
7. - - -
III. RESULTS AND DISCUSSION
The web application developed by the authors (available
online⁵) consists of two main components: a GUI to execute regular
curve fitting functionality and a curve fit evaluation tool, which is
not discussed here in detail.
With regard to the criteria in Table I, the web application
presented in this paper fulfils all of them at least on an
elementary functional level. A special property of the software
is the possibility to choose between various implementations
of regression algorithms, such as solutions developed in MATLAB’s
curve fitting toolbox, Java⁶ or GNU Octave (optim
package⁷). Note that the latter two are open-source,
open-access tools and thus free to use in education.
A. User interface for curve fitting
The graphical user interface covering the full curve fitting
workflow is separated into five elements, as shown in Fig. 1.
They implement the following concept:
• (left, top) Definition of the data set and options for further
processing and data import (for example of a CSV file).
• (left, middle) Input of the mathematical model via ASCII
characters and the resulting rendered formula.
• (left, bottom) Parameters of the function with their start
values, results and confidence values.
• (right, top) The result function and the data set plotted as
an exportable graph.
• (right, bottom) Post-processing of curve fit and other
parameters can be conducted through this element. Alternatively,
the residuals, i.e. the differences between
the data set and the result function, can be plotted.
B. Technical background of the web application
The back end of the application is written in Java and
based on Jetty as HTTP server and servlet container. Due to
the lack of implementations of regression algorithms written
in JavaScript, the respective functionality is executed by
the server (alternatively, a compiler targeting JavaScript, such as
Emscripten, could be utilised). Hence, the application programming
interface (API) for the necessary asynchronous calls is
implemented based on REST (Representational State Transfer)
using Jersey as a servlet. For the permanent storage of data on
the server side (for example to share results by URL), a MongoDB
database is connected to the back end.
⁵ http://curvefit.tu-chemnitz.de/, 17.03.2017
⁶ https://www.ee.ucl.ac.uk/mflanaga/java/, 20.03.2017
⁷ https://octave.sourceforge.io/optim/, 20.03.2017
Fig. 1. Screenshot representing the graphical user interface for the curve fitting functionality of the developed application. Data taken from a template protocol of
a lab class in physics at TU Chemnitz (available under https://www.tu-chemnitz.de/physik/PGP/allgemein.php, 20.03.2017). For a further description, see
Section III-C.
The front end uses the standard web technologies HTML,
CSS and JavaScript (JS). To simplify development, the code is mainly
written using the MVC framework AngularJS. Its advantages
include the possibility to write reusable
components and the use of data binding to connect HTML and
JavaScript content [12].
Other important libraries which have been utilised are:
• JS Expression Evaluator to parse and evaluate mathematical
functions in a secure manner.
• MathJax to render the formulas entered by the
user.
• JSXGraph to plot the data set and regression results in a
graph.
• Plotly to display the heatmap of the evaluation tool.
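The idea of evaluating user-supplied formulas in a secure manner can be sketched as follows. The example is a Python analogue built on the standard ast module and is not the API of JS Expression Evaluator; the whitelist of operators and functions, the treatment of ^ as a power operator and all names are assumptions made for illustration.

```python
import ast
import math
import operator

# Whitelists: anything not listed here is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}
_FUNCTIONS = {"exp": math.exp, "log": math.log, "sqrt": math.sqrt,
              "sin": math.sin, "cos": math.cos}
_CONSTANTS = {"pi": math.pi}


def evaluate(expression, variables):
    """Evaluate an ASCII formula using only whitelisted syntax."""
    # Assumption: '^' in the user input denotes exponentiation.
    tree = ast.parse(expression.replace("^", "**"), mode="eval")
    return _eval(tree.body, variables)


def _eval(node, variables):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        if node.id in variables:
            return variables[node.id]
        if node.id in _CONSTANTS:
            return _CONSTANTS[node.id]
        raise ValueError("unknown symbol: " + node.id)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        left = _eval(node.left, variables)
        right = _eval(node.right, variables)
        return _BINARY[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand, variables))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCTIONS and not node.keywords):
        args = [_eval(arg, variables) for arg in node.args]
        return _FUNCTIONS[node.func.id](*args)
    raise ValueError("unsupported syntax in expression")


# A formula typed as ASCII text, evaluated with user-supplied values
value = evaluate("m*pi/4*d^2", {"m": 2.0, "d": 2.0})
print(value)
```

Because the expression is only walked as a syntax tree and never executed as code, inputs such as function calls outside the whitelist are rejected with an error instead of being run.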
C. Determination of specific electrical resistance as model application used in laboratory classes
Among the multitude of potential applications of the
presented curve fitting web tool is its use in education, for
example in laboratory classes where experiments are carried
out producing data that have (i) to be visualised and (ii) to
be evaluated using mathematical models or used to test the
correctness of such models. This setting is well known from physics or
chemistry courses in secondary school or at college/university.
Especially at the higher levels of education, the use of dedicated
curve fitting software is indispensable [10].
Here, a typical use case was chosen
from a lab course in physics, namely the determination
of the specific electrical resistance ρ of a wire from the
correlation between its length l and its resistance R. A concrete
workflow can be described as follows:
• Wires of different lengths l are probed.
• An electric circuit is built to measure the current I and
voltage U with the respective devices.
• The electrical resistance is calculated from the measured
current and voltage according to the following equation:
$$R = U / I \qquad (2)$$
• The data set of wire lengths and respective resistances is
entered into a data visualisation and analysis software.
• Regression analysis using a linear function is applied to
the data to calculate an average ratio of resistance and
length. This is the slope m of the fit equation R = m·l:
$$m = \frac{dR}{dl} \qquad (3)$$
• From the regression analysis, the slope m is used to calculate
the specific electrical resistance according to:
$$\rho = m \cdot A = m \cdot \frac{\pi}{4} \cdot d^2, \qquad (4)$$
where A is the cross-sectional area of the wire, defined by its
diameter d in units of meters.
• Regression also provides the 95% confidence value of m,
which is here denoted Δm. Together with the error Δd of the
diameter measurement, the relative error of the specific
electrical resistance is calculated according to:
$$\frac{\Delta\rho}{\rho} = \frac{\Delta m}{m} + \frac{2\Delta d}{d}. \qquad (5)$$
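The workflow of Eqs. (2) to (5) can be sketched in a few lines of Python. All measurement values below are invented for illustration and are not the data shown in Fig. 1; the confidence value Δm is approximated here as the standard error of the slope multiplied by the tabulated two-sided 95% Student-t quantile for four degrees of freedom (2.776), which is an assumption about how such a value may be obtained.

```python
import math

T_95_DOF_4 = 2.776  # two-sided 95% Student-t quantile, 4 degrees of freedom


def fit_slope_through_origin(lengths, resistances):
    """Least squares slope m of R = m*l and its standard error."""
    sxx = sum(length * length for length in lengths)
    m = sum(length * r for length, r in zip(lengths, resistances)) / sxx
    dof = len(lengths) - 1  # one fitted parameter
    rss = sum((r - m * length) ** 2
              for length, r in zip(lengths, resistances))
    std_err = math.sqrt(rss / dof / sxx)
    return m, std_err


# Hypothetical measurements (illustrative values only)
lengths = [0.2, 0.4, 0.6, 0.8, 1.0]          # l in m
voltages = [0.26, 0.51, 0.78, 1.03, 1.30]    # U in V
currents = [0.50, 0.50, 0.50, 0.50, 0.50]    # I in A
d = 0.50e-3                                  # wire diameter in m
delta_d = 0.01e-3                            # error of the diameter in m

resistances = [u / i for u, i in zip(voltages, currents)]      # Eq. (2)
m, std_err = fit_slope_through_origin(lengths, resistances)   # Eq. (3)
delta_m = T_95_DOF_4 * std_err
rho = m * math.pi / 4 * d ** 2                                 # Eq. (4)
rel_err = delta_m / m + 2 * delta_d / d                        # Eq. (5)

print("m =", m, "Ohm/m")
print("rho =", rho * 1e6, "micro-Ohm m")
print("relative error =", 100 * rel_err, "%")
```

The factor 2 in front of Δd/d reflects that the diameter enters Eq. (4) quadratically, so its relative error contributes twice.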
The experimentalist may use the post-processing module
shown at the bottom right of Fig. 1 to provide d and Δd as
values, while equations (4) and (5) have to be typed into the GUI
using their ASCII representations. Results are generated automatically
and can be used for the protocol. In the above-mentioned
example, the specific electrical resistance can be read directly from
Fig. 1 (bottom, right), giving ρ_el = (0.51 ± 0.03) µΩ·m and yielding
a relative error of 5.7%. Note that values may be fixed to
a user-defined number of digits using the command fixed().
The example provided in Fig. 1 is available online⁸ and
can be used as an exchange format between supervisor and student, e.g.
to evaluate the correctness of the equations and calculations
entered by the students into a previously empty GUI.
IV. CONCLUSION
The web application available at curvefit.tu-chemnitz.de was
presented in the context of its application in e-learning. We
exemplified how typical tasks which are part of
laboratory classes are fully covered, i.e. data import and
visualisation as a scatter plot, regression analysis using
topic-specific functions and post-processing of user-defined (given)
parameters and of parameters obtained from curve fitting. The latter
allows for mathematically reproducible error calculation. Compared
to other web applications, we have overcome various limitations
to provide a generic, easy-to-use single page tool that can
be widely used in laboratory classes.
V. OUTLOOK
The presented curve fit tool also offers a comprehensive
solution to evaluate regression algorithms utilising simulated
data, which is not discussed here in detail. However, as
regression-based errors are determined numerically, it may be used in
teaching to visualise in practice the influence of measurement
uncertainties or statistical noise on data and their consequences
for the accuracy of regression-based parameter determination.
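The influence of statistical noise on a fitted parameter can be made tangible with a small simulation. The following sketch repeatedly adds Gaussian noise to simulated data of a line through the origin and records the scatter of the fitted slope; the true slope, the x values, the noise levels and the function names are assumptions for illustration and do not describe the evaluation tool of the web application.

```python
import random
import statistics


def slope_through_origin(xs, ys):
    """Least squares slope of y = m*x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def slope_spread(noise_sigma, trials=2000, true_slope=2.5, seed=1):
    """Fit noisy simulated data many times; return mean and spread of m."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    xs = [0.2, 0.4, 0.6, 0.8, 1.0]
    slopes = []
    for _ in range(trials):
        ys = [true_slope * x + rng.gauss(0.0, noise_sigma) for x in xs]
        slopes.append(slope_through_origin(xs, ys))
    return statistics.mean(slopes), statistics.stdev(slopes)


for sigma in (0.01, 0.05, 0.10):
    mean_m, spread_m = slope_spread(sigma)
    print("noise", sigma, "-> mean slope", mean_m, "spread", spread_m)
```

For this simple model the spread of the slope is expected to grow proportionally to the noise level, which students can verify by comparing the printed values.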
To increase the scope of the curve fit application, a sustainable
data and result storage management system is currently
in preparation. The input for data sets is at present based on
text boxes and could benefit from a change to the familiar
worksheet-like table structure known from Excel, Google
Spreadsheets or Apache OpenOffice Calc. More comprehensive
data management would also include multiple columns and
multiple worksheets including cross-calculations, e.g. for
data pre-processing. In the example given in Section III-C, the
electrical resistance R could then be calculated automatically
from the measured voltage U and current I according to
Eq. (2). The common issue of outlier values could be addressed by
an assisted detection method or by fully automatic support
using methods such as random sample consensus [13].
⁸ http://curvefit.tu-chemnitz.de/#?58d14390468c6a0adc2ee4c2, 20.03.2017
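The random sample consensus idea [13] can be sketched for a straight-line model as follows. This is a minimal illustration under stated assumptions (synthetic data, a fixed residual threshold, a fixed number of iterations, hypothetical function names) and not a feature of the current web application.

```python
import random


def fit_line(points):
    """Closed-form least squares fit of y = a*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def ransac_line(points, threshold, iterations=200, seed=0):
    """Return the line fitted to the largest consensus set and that set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # 1) Draw a minimal sample: two points define a candidate line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # 2) Collect all points that agree with the candidate line.
        inliers = [(x, y) for x, y in points
                   if abs(a * x + b - y) <= threshold]
        # 3) Keep the largest consensus set found so far.
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # 4) Final least squares fit using the consensus set only.
    return fit_line(best_inliers), best_inliers


# Synthetic data on y = 2x + 1 with two gross outliers
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points[3] = (3.0, 20.0)
points[7] = (7.0, -5.0)

(a, b), inliers = ransac_line(points, threshold=0.5)
print("a =", a, "b =", b, "inliers:", len(inliers))
```

In an assisted variant, the points outside the consensus set would merely be highlighted so that the user decides whether to exclude them.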
As a generic navigation structure similar to file systems, a
tree view can be used to permanently save and organise
data. In combination with user management, this makes it
possible to offer cross-platform web storage, which could
be used for collaborative work.
To systematically improve the application, user evaluations
in multiple iterations are in preparation, using, for example,
questionnaires or observation studies.
Acknowledgments.
This work was partially accomplished within the project
localizeIT (funding code 03IPT608X) funded by the Federal
Ministry of Education and Research (BMBF, Germany) in the
program of Entrepreneurial Regions InnoProfile-Transfer.
REFERENCES
[1] K. Backhaus, Multivariate Analysemethoden eine anwendungsorientierte
Einführung. Berlin: Springer, 2006. [Online]. Available:
http://dx.doi.org/10.1007/3-540-29932-7
[2] H. Skala, “Will the real best fit curve please stand up?” The College
Mathematics Journal, vol. 27, no. 3, pp. 220–223, 1996.
[3] M. I. Lourakis, “A brief description of the Levenberg-Marquardt
algorithm implemented by levmar,” Foundation of Research and Technology,
vol. 4, no. 1, 2005.
[4] A. Böttcher, D. Kowerko, and R. K. Sigel, “Explicit analytic
equations for multimolecular thermal melting curves,” Biophysical
Chemistry, vol. 202, pp. 32–39, 2015. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0301462215000757
[5] D. G. Kleinbaum, L. L. Kupper, A. Nizam, and E. S. Rosenberg, Applied
regression analysis and other multivariable methods, fifth edition ed.
Boston, MA: Cengage Learning, 2013.
[6] R. Ramcharan, “Regressions: Why Are Economists Obsessed with
Them?” Finance Dev, vol. 43, 2006. [Online]. Available:
http://www.ecostat.unical.it/aiello/didattica/Econometria/Regressions %20IMF.pdf
[7] D. J. Hand, H. Mannila, and P. Smyth, Principles
of data mining. MIT Press, 2001. [Online]. Available:
https://books.google.de/books?hl=de&lr=&id=SdZ-bhVhZGYC&oi=fnd&pg=PR17&dq=Data-Mining++curve+fitting&ots=yxP8BjqumY&sig=ZRnkbwFJ2edTfds 6LrMPF9ZGg
[8] J.-L. Mergny and L. Lacroix, “Analysis of Thermal Melting
Curves,” Oligonucleotides, vol. 13, no. 6, pp. 515–537, Dec.
2003. [Online]. Available:
http://www.liebertonline.com/doi/abs/10.1089/154545703322860825
[9] T. Keller, D. Kowerko, and M. Ritter, “Entwicklung eines webbasierten
Curve-fitting Tools für komplexe Multiparameter-Funktionen,” in
Studierendensymposium Informatik 2016 der TU Chemnitz. Chemnitz:
Univ.-Verl, May 2016, pp. 75–85. [Online]. Available:
http://nbn-resolving.de/urn:nbn:de:bsz:ch1-qucosa-201104
[10] W. Schenk, F. Kremer, G. Beddies, T. Franke, P. Galvosas, and
P. Rieger, Physikalisches Praktikum, W. Schenk and F. Kremer,
Eds. Wiesbaden: Springer Fachmedien Wiesbaden, 2014. [Online].
Available: http://link.springer.com/10.1007/978-3-658-00666-2
[11] C.-C. E. de Normalisation, Ergonomische Anforderungen für
Bürotätigkeiten mit Bildschirmgeräten Teil 10: Grundsätze der
Dialoggestaltung. Februar, 1995.
[12] M. Heinrich and M. Gaedke, “Data binding for standard-based web
applications,” in Proceedings of the 27th Annual ACM Symposium on
Applied Computing. ACM, 2012, pp. 652–657.
[13] M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm
for model fitting with applications to image analysis and automated
cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395,
1981.