Reliable computations of knee point for a curve and introduction of a unit
invariant estimation
Demetris T. Christopoulos (a, b)
(a) National and Kapodistrian University of Athens, Department of Economics
(b) dchristop@econ.uoa.gr
Abstract
We investigate the problem of finding the knee of a curve when only a set of points from it is available. We use the Menger curvature together with the Extremum Distance Estimator in order to compute the knee point. The problems of noisy and rescaled data are discussed with respect to whether knee points and their estimations are invariant or not. A unit invariant knee point estimator is proposed.
Keywords: knee point estimation, discrete curvature, EDE, unit invariance
1. Introduction
The concept of knee point is used in many fields, such as fatigue damage theories [1], [2], [3], detecting the number of clusters [7], [8], the botnet detection problem [9] and system behavior [10]. Although it is broadly used, simple codes for computing it are not easily found, while many problems, like the lack of invariance under scaling, have not been mentioned or investigated until now. A main result of the paper is that after changing the units of the x or y-axis (rescaling) the knee point is essentially lost, since the new curve has a totally different one and it is not possible to recover the initial knee by applying the inverse transformation. Another result is that we can define a measure of the 'knee property', based on the Extremum Distance Estimator (EDE method), which is invariant under the commonly used unit transformations.
The structure of the paper is as follows: initial definitions are given in Section 2, the two methods and numerical examples are presented in Section 3, the noisy data problem is treated in Section 4, the rescaling problems in Section 5, the definition of the unit invariant knee in Section 6, the discussion in Section 7 and the R codes in Appendix 9.
2. Knee point
First of all, the knee point of a curve that is plotted using $y = f(x)$ is defined as follows.
Definition 2.1. Let f be at least $C^{(4)}([a,b])$. The knee point of f in [a,b] is the unique extreme point of the curvature
$$k(f) = \frac{f''(x)}{\left(1 + (f'(x))^2\right)^{3/2}}$$
in [a,b].
Since we can always drop the non-zero and positive denominator $\left(1 + f'(x)^2\right)^{3/2}$ and all of its powers when computing roots and signs, we obtain by elementary Calculus the next
Corollary 2.1. The knee point $\chi$ is the algebraic solution in [a,b] of the characteristic equation
$$E_1(f,\chi) = f'''(\chi)\left(1 + f'(\chi)^2\right) - 3 f''(\chi)^2 f'(\chi) = 0 \qquad (1)$$
provided that at $\chi$ the second derivative
$$E_2(f,\chi) = 3\left(4 f'(\chi)^2 - 1\right) f''(\chi)^3 - 9 f'(\chi)\left(1 + f'(\chi)^2\right) f'''(\chi) f''(\chi) + \left(1 + f'(\chi)^2\right)^2 f''''(\chi) \qquad (2)$$
is negative or positive, for a maximum or a minimum of the curvature respectively.
Corresponding author: Tel.: +306979210251 December 1, 2014
Figure 1: True knee point and estimation, $f(x) = -\frac{1}{x} + 5$, $x \in [0.2, 2.1]$
Example 2.1. For $f(x) = -\frac{1}{x} + 5$, $x \in [0.2, 2]$ we compute
$$E_1(f,\chi) = \frac{6\left(1 + \frac{1}{\chi^4}\right)}{\chi^4} - 12\,\frac{1}{\chi^8} = 0 \;\Rightarrow\; \chi = 1$$
while it also holds
$$E_2(f,1) = 48 > 0$$
and we indeed have minimum curvature, since the function is strictly concave; see Figure 1, where we have plotted the relevant circles with radii of curvature and the curvature k(f).
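The computation of Example 2.1 can be cross-checked numerically. The following is a minimal sketch in base R, assuming the hand-written derivatives of $f(x) = -1/x + 5$; the helper names `f1`, ..., `f4`, `E1` and `E2` are ours and simply mirror Eqs. 1 and 2.

```r
# Derivatives of f(x) = -1/x + 5 (Example 2.1), written out by hand
f1 <- function(x)  1 / x^2     # f'
f2 <- function(x) -2 / x^3     # f''
f3 <- function(x)  6 / x^4     # f'''
f4 <- function(x) -24 / x^5    # f''''

# Characteristic function of Eq. 1 and second derivative test of Eq. 2
E1 <- function(x) f3(x) * (1 + f1(x)^2) - 3 * f2(x)^2 * f1(x)
E2 <- function(x) 3 * (4 * f1(x)^2 - 1) * f2(x)^3 -
  9 * f1(x) * (1 + f1(x)^2) * f3(x) * f2(x) +
  (1 + f1(x)^2)^2 * f4(x)

# Root of E1 inside the interval of Example 2.1
knee <- uniroot(E1, interval = c(0.2, 2), tol = 1e-12)$root
print(knee)        # numerical knee, to be compared with chi = 1
print(E2(knee))    # value of the second derivative test at the knee
```

A root finder is enough here because E1 changes sign exactly once in [0.2, 2]; for other functions the interval would have to be chosen so that it brackets a single root.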
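Example 2.2. For $f(x) = a\,e^{b x}$ we compute
$$E_1(f,\chi) = a b^3 e^{b\chi}\left(1 + a^2 b^2 \left(e^{b\chi}\right)^2\right) - 3 a^3 b^5 \left(e^{b\chi}\right)^3 = 0 \;\Rightarrow\; \chi = -\frac{1}{2b}\ln\left(2 a^2 b^2\right) \qquad (3)$$
while it also holds
$$E_2\left(f, -\frac{1}{2b}\ln\left(2 a^2 b^2\right)\right) = -\frac{3 a b^4}{\sqrt{2 a^2 b^2}}$$
and we have the maximum curvature at $x = \chi$ if $a > 0$.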
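As a sanity check of Eq. 3, the closed form can be compared with a direct numerical maximisation of the curvature of Definition 2.1. This is only a sketch: the function names `knee_exp` and `curv_exp` and the parameter values are our own illustrative choices.

```r
# Closed-form knee of f(x) = a * exp(b * x), Eq. 3
knee_exp <- function(a, b) -log(2 * a^2 * b^2) / (2 * b)

# Curvature of a * exp(b * x) according to Definition 2.1
curv_exp <- function(x, a, b) {
  d1 <- a * b * exp(b * x)      # first derivative
  d2 <- a * b^2 * exp(b * x)    # second derivative
  d2 / (1 + d1^2)^(3 / 2)
}

a <- 0.5; b <- 2 / 3            # illustrative parameter values
chi <- knee_exp(a, b)

# Independent check: maximise the curvature numerically
num <- optimize(curv_exp, interval = c(-5, 5), a = a, b = b,
                maximum = TRUE, tol = 1e-10)$maximum
print(c(closed_form = chi, numerical = num))
```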
3. Methods to compute the knee point
Several methods have been proposed for this task: the use of Menger curvature [4], [5], [6], an angle-based method for the Bayesian information criterion [7], the Kneedle algorithm [10] and the dynamic determination of the knee [11].
3.1. Using the Menger curvature
Let us first define some useful concepts.
Definition 3.1. The discrete curvature at point $x_i$ of a curve is the Menger curvature of the points $\{(x_{i-1},y_{i-1}),(x_i,y_i),(x_{i+1},y_{i+1})\}$, computed by use of Heron's formula for the area of the corresponding triangle.
Lemma 3.1. The discrete curvature at point $(x_i,y_i)$ of a curve is the Menger curvature
$$
\begin{aligned}
DC(x_i) &= \frac{\sqrt{A - B^2}}{\|pq\|\,\|qr\|\,\|rp\|} \\
A &= 4\,\|pq\|^2\,\|qr\|^2 \\
B &= \|pq\|^2 + \|qr\|^2 - \|rp\|^2 \\
\|pq\| &= \sqrt{(x_{i-1}-x_i)^2 + (y_{i-1}-y_i)^2} \\
\|qr\| &= \sqrt{(x_i-x_{i+1})^2 + (y_i-y_{i+1})^2} \\
\|rp\| &= \sqrt{(x_{i+1}-x_{i-1})^2 + (y_{i+1}-y_{i-1})^2}
\end{aligned}
\qquad (4)
$$
Proof
The radius of the uniquely defined circle that passes through three non-collinear points $p=(x_{i-1},y_{i-1})$, $q=(x_i,y_i)$, $r=(x_{i+1},y_{i+1})$ is given in [4], while its inverse gives the relevant curvature
$$C(p,q,r) = \frac{1}{R(p,q,r)} = \frac{4\,A(p,q,r)}{|pq|\,|qr|\,|rp|}.$$
If we also use Heron's formula, well known since ancient times, for the area of a triangle,
$$A(p,q,r) = \frac{1}{4}\sqrt{4\,|pq|^2\,|qr|^2 - \left(|pq|^2 + |qr|^2 - |rp|^2\right)^2},$$
then we directly obtain Eq. 4.
Definition 3.2. The discrete knee point for the set $\{(x_i,y_i),\ i=1,2,\ldots,n\}$ of curve points is
$$
\begin{aligned}
DK_{convex} &= \max\{DC(x_i),\ i=2,\ldots,n-1\} \\
DK_{concave} &= \min\{DC(x_i),\ i=2,\ldots,n-1\}
\end{aligned}
\qquad (5)
$$
for a convex or concave curve, respectively.
Short R codes for computing Eqs. 4 and 5 are given in Appendix 9.
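For readers who prefer a vectorised form, Eq. 4 can also be evaluated for all interior points at once. The sketch below is ours (the name `menger_all` is hypothetical and it is not the code of Appendix 9); it clips tiny negative rounding errors with `pmax` before taking the square root, and checks the result on points of a circle, where the curvature must equal the inverse radius.

```r
# Menger curvature (Eq. 4) at the interior points i = 2, ..., n-1 of a polyline
menger_all <- function(x, y) {
  n <- length(x)
  i <- 2:(n - 1)
  pq2 <- (x[i - 1] - x[i])^2 + (y[i - 1] - y[i])^2          # |pq|^2
  qr2 <- (x[i] - x[i + 1])^2 + (y[i] - y[i + 1])^2          # |qr|^2
  rp2 <- (x[i + 1] - x[i - 1])^2 + (y[i + 1] - y[i - 1])^2  # |rp|^2
  A <- 4 * pq2 * qr2
  B <- pq2 + qr2 - rp2
  sqrt(pmax(A - B^2, 0)) / sqrt(pq2 * qr2 * rp2)
}

# Sanity check: points on a circle of radius 2 have curvature 1/2
th <- seq(0.1, 3, length.out = 50)
dc_circle <- menger_all(2 * cos(th), 2 * sin(th))
print(range(dc_circle))

# Discrete knee in the sense of Eq. 5 for the convex curve exp(x) on [-2, 3]
x <- seq(-2, 3, by = 0.05)
dc <- menger_all(x, exp(x))
x_knee <- x[which.max(dc) + 1]   # +1 because dc starts at the second point
print(x_knee)
```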
3.2. Using the Extremum Distance Estimator method
We can find an approximation of knee points by using the Extremum Distance Estimator (EDE) method, as defined in [12]. For a convex/concave curve we can find two knee points, approximately given by $\chi_{F_1}$, $\chi_{F_2}$ respectively. If we have a strictly convex or concave curve, then the knee point is close to $\chi_{F_1}$. The above points can be easily computed using [13], by calling the function findiplist(x,y,index) with index = 0 (for a convex/concave or strictly convex curve) and index = 1 (for a concave/convex or strictly concave curve). A short R code illustrating the use can be found in Appendix 9.
The differences between our method and the Kneedle method of [10] are: (i) in EDE there is no data smoothing of any kind, like the smoothing splines used in Kneedle, (ii) in EDE there is no conversion to the simplex [0,1] × [0,1] that is used in Kneedle and (iii) in EDE there is no threshold value as defined and used in Kneedle.
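The geometric idea behind the estimator can be sketched in a few lines of base R: measure the signed distance of every data point from the total chord and keep the two extreme points. This is only our illustration of the idea under that assumption; the helper names `chord_dist` and `ede_sketch` are hypothetical, the sign convention is our choice, and the code is not the implementation of the inflection package [13].

```r
# Signed distance of every data point from the chord joining the first
# and the last point; positive values lie above the chord (for increasing x).
chord_dist <- function(x, y) {
  n <- length(x)
  dx <- x[n] - x[1]
  dy <- y[n] - y[1]
  (dx * (y - y[1]) - dy * (x - x[1])) / sqrt(dx^2 + dy^2)
}

# Indices of the points farthest below and farthest above the chord
ede_sketch <- function(x, y) {
  d <- chord_dist(x, y)
  c(below = which.min(d), above = which.max(d))
}

x <- seq(0, 10, by = 0.1)
y <- 5 + 5 * tanh(x - 5)        # a convex/concave sigmoid
idx <- ede_sketch(x, y)
print(x[idx])                   # grid points of the two extreme distances
```

No smoothing, rescaling or threshold enters the computation, which is the contrast with Kneedle drawn in the list above.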
3.3. Numerical examples
A sigmoid curve. Let us take for example a convex/concave sigmoid curve which is the plot of the function
$$f(x) = 5 + 5\tanh(x-5), \quad x \in [0,10] \qquad (6)$$
Then it is easy to find that there exist two knee points, positioned symmetrically around the inflection point p = 5,
$$\chi_1 = 5 - \operatorname{arctanh}\left(\frac{1}{15}\sqrt{195}\right) = 3.334536724 \qquad (7)$$
$$\chi_2 = 5 + \operatorname{arctanh}\left(\frac{1}{15}\sqrt{195}\right) = 6.665463276 \qquad (8)$$
with the confirmation for curvature extrema
$$E_2(f,\chi_1) = -\frac{16016}{18225}\sqrt{195} = -12.27167454 < 0 \qquad (9)$$
$$E_2(f,\chi_2) = +\frac{16016}{18225}\sqrt{195} = +12.27167454 > 0 \qquad (10)$$
Figure 2: Knee points for $f(x) = 5 + 5\tanh(x-5)$
The relevant EDE approximations can theoretically be found using Lemma 1.4 of [12]; they are close to the knee points, with an error of about ±0.22:
$$x_{F_1} = 5 - \operatorname{arctanh}\left(\frac{\sqrt{10}\,\sqrt{3 + 2e^{20} + 5e^{10}}}{5e^{10} + 5}\right) = 3.556313768 \qquad (11)$$
$$x_{F_2} = 5 + \operatorname{arctanh}\left(\frac{\sqrt{10}\,\sqrt{3 + 2e^{20} + 5e^{10}}}{5e^{10} + 5}\right) = 6.443686232 \qquad (12)$$
The curve and the points of interest are plotted in Figure 2, where we have also drawn the relevant circles of curvature, with radius equal to the inverse of the computed discrete curvature.
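The closed forms of Eqs. 7, 8, 11 and 12 can be reproduced with a few lines of base R. In this sketch the chord points are obtained from the condition that the derivative $5\,\mathrm{sech}^2(x-5)$ equals the slope $\tanh(5)$ of the total chord on [0,10], which is algebraically the same expression as in Eqs. 11 and 12; the function name `curv` is ours.

```r
# Exact knees of f(x) = 5 + 5*tanh(x - 5), Eqs. 7 and 8
chi <- 5 + c(-1, 1) * atanh(sqrt(195) / 15)

# Points where f'(x) equals the slope of the total chord on [0, 10],
# an equivalent form of Eqs. 11 and 12
xF <- 5 + c(-1, 1) * atanh(sqrt(1 - tanh(5) / 5))

# Curvature of Definition 2.1, used to confirm that chi are its extrema
curv <- function(x) {
  s <- 1 - tanh(x - 5)^2                           # sech^2(x - 5)
  (-10 * s * tanh(x - 5)) / (1 + 25 * s^2)^(3 / 2)
}
k_max <- optimize(curv, c(0, 5), maximum = TRUE, tol = 1e-10)$maximum
k_min <- optimize(curv, c(5, 10), tol = 1e-10)$minimum

print(rbind(exact = chi, numerical = c(k_max, k_min), chord = xF))
print(xF - chi)    # signed offsets between chord points and knees
```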
A strictly concave curve. We study the function of Example 2.1 with $x \in [0.2, 2.1]$ and knee $\chi = 1$, which is taken from [10], in order to show that the maximum of a rotated graph does not give us the true knee of the initial curve. We convert the x and y data of an equidistant partition of [0.2, 2.1] with N = 101 points to the [0,1] × [0,1] range by using the transformation $T_1(a,b,x) = \frac{x-a}{b-a}$, with a, b the min and max values respectively for x and y. When we apply our methods to the simplex we find $DK = \chi_{F_1} = x^{(s)}_{25} = 0.24 \to 0.656$, while after rotating with θ = 90° we find again $DK = \chi_{F_1} = x^{(s,rot)}_{25} = 0.7129705936 \to 0.24 \to 0.656$, see Figure 3. So by applying the conversion to the simplex and then the rotation of 90° we just compute, in a complicated way, the relevant point of the EDE method and not the knee point; thus the Kneedle method of [10] cannot find the true knee of the curve, which here was $\chi = 1$, see again Figure 1.
Figure 3: EDE estimations for the knee point of $f(x) = -\frac{1}{x} + 5$, $x \in [0.2, 2.1]$, after partially applying the procedure of the Kneedle method [10]
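The rescaling step discussed above is easy to reproduce. The sketch below applies $T_1$ to both axes and then looks for the point that is farthest from the diagonal of the unit square; it illustrates only this part of the procedure (no smoothing and no threshold), so it should not be read as an implementation of Kneedle. The helper name `to_unit` is ours.

```r
# f(x) = -1/x + 5 sampled at N = 101 equidistant points of [0.2, 2.1]
x <- seq(0.2, 2.1, length.out = 101)
y <- -1 / x + 5

# Transformation T1 applied to both axes
to_unit <- function(u) (u - min(u)) / (max(u) - min(u))
xs <- to_unit(x)
ys <- to_unit(y)

# On the unit square the total chord is the diagonal, so the distance from
# it is proportional to ys - xs; its maximum marks the extreme point
i_ext <- which.max(ys - xs)
print(c(index = i_ext, scaled = xs[i_ext], original = x[i_ext]))
```

The point found this way sits where the slope of the curve equals the slope of the total chord, far from the curvature extremum at $\chi = 1$.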
4. The problem of noisy data
It is worth noting that when we have a noisy data set the above EDE estimations remain close to the true knee points, while the direct Menger curvature computation totally fails, since it is highly sensitive to errors. As an example, we add a uniformly distributed error term $U(-0.1, +0.1)$ to the function of Eq. 6, using an equidistant partition of [a,b] = [0,10] into 100 intervals, and we find using Eq. 5 the values $DK = \{x_{22}, x_{91}\} = \{2.1, 9.0\}$, which are far away from the true knee points, while the EDE estimations $\chi_{F_{1,2}} = \{x_{36}, x_{67}\} = \{3.5, 6.6\}$ are still close to the true knees, see Figure 4.
Figure 4: Knee points for noisy data of $f(x) = 5 + 5\tanh(x-5)$
Thus we can use the EDE method to find the knee points of noisy data with an acceptable error, in contrast to the total divergence of the Discrete Curvature approach.
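The experiment can be repeated with any random seed. The sketch below is our own base R illustration: the seed is arbitrary, the chord-distance extremes stand in for the EDE points (they are not computed with the inflection package), and the Menger curvature of Eq. 4 is maximised separately on each half of the interval.

```r
set.seed(1)   # arbitrary seed, only for reproducibility of the sketch
x <- seq(0, 10, by = 0.1)
y <- 5 + 5 * tanh(x - 5) + runif(length(x), -0.1, 0.1)
n <- length(x)

# Extremes of the (unnormalised) signed distance from the total chord
d <- (x[n] - x[1]) * (y - y[1]) - (y[n] - y[1]) * (x - x[1])
x_chord <- x[c(which.min(d), which.max(d))]

# Menger curvature (Eq. 4) at the interior points
i <- 2:(n - 1)
pq2 <- (x[i - 1] - x[i])^2 + (y[i - 1] - y[i])^2
qr2 <- (x[i] - x[i + 1])^2 + (y[i] - y[i + 1])^2
rp2 <- (x[i + 1] - x[i - 1])^2 + (y[i + 1] - y[i - 1])^2
dc <- sqrt(pmax(4 * pq2 * qr2 - (pq2 + qr2 - rp2)^2, 0)) /
  sqrt(pq2 * qr2 * rp2)

# Largest discrete curvature on the left and on the right half
left <- x[i] <= 5
x_menger <- c(x[i][left][which.max(dc[left])],
              x[i][!left][which.max(dc[!left])])

print(rbind(chord = x_chord, menger = x_menger))
```

The chord-based positions move only slightly with the noise, because the distance from the chord is an aggregate quantity, whereas the three-point curvature reacts to every local perturbation.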
5. The problem of rescaling data
Many times, especially in Physics and Engineering, it is common practice to rescale data by changing units or by dividing by maximum values, in order to obtain the so-called arbitrary units. Then the knee point is, in most cases, not invariant. We will study the subclass of affine transformations of the form $u \to \lambda u + \mu$, $\lambda > 0$, because they do not change the order of our data. We will also study the common logarithmic axis rescaling.
5.1. Affine axis transformations
5.1.1. Rescaling y–axis
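If we study, instead of the data $D = \{(x_i, y_i = f(x_i)),\ i=1,2,\ldots,n\}$, the transformed data
$$D^{(\lambda y+\mu)} = \{(x_i,\ y_i^{*} = \lambda f(x_i) + \mu),\ i=1,2,\ldots,n\}, \quad \lambda > 0$$
then the curvature of our observed curve is
$$k(\lambda f+\mu) = \frac{\lambda f''(x)}{\left(1 + \lambda^2 (f'(x))^2\right)^{3/2}} \qquad (13)$$
The characteristic equation now is
$$E_1(\lambda f+\mu,\chi) = f'''(\chi)\left(1 + \lambda^2 f'(\chi)^2\right) - 3\lambda^2 f''(\chi)^2 f'(\chi) = 0 \qquad (14)$$
The second derivative is
$$E_2(\lambda f+\mu,\chi) = 3\left(4\lambda^2 f'(\chi)^2 - 1\right)\lambda^3 f''(\chi)^3 - 9\lambda^3 f'(\chi)\left(1 + \lambda^2 f'(\chi)^2\right) f'''(\chi) f''(\chi) + \lambda\left(1 + \lambda^2 f'(\chi)^2\right)^2 f''''(\chi) \qquad (15)$$
It is obvious that we cannot find the same values for the knee point, except if $\lambda = 1$, so we have proven the next
Lemma 5.1. The knee point of a curve is not invariant under the y-axis transformation $y \to \lambda y + \mu$, $\lambda > 0$, $\lambda \neq 1$.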
Example 5.1. For $f(x) = e^x$, $x \in [a,b] = [-2,3]$, using Eq. 3 we find the knee point $\chi = -\frac{1}{2}\ln(2) = -0.34657359$. We want to convert the y-axis to an axis with units of the maximum value $e^3$, so $\lambda = e^{-3}$, $\mu = 0$. Then we find
$$E_1(e^{-3}f,\chi) = e^{\chi}\left(1 + \left(e^{-3}\right)^2 \left(e^{\chi}\right)^2\right) - 3\left(e^{-3}\right)^2 \left(e^{\chi}\right)^3 = 0 \;\Rightarrow\; \chi = -\frac{1}{2}\ln(2) + 3 = 2.65342641$$
while it also holds
$$E_2\left(e^{-3}f,\ -\frac{1}{2}\ln(2) + 3\right) = -\frac{3}{2}\sqrt{2} < 0$$
and we indeed have a maximum. Of course the knee point is now different. By taking an equidistant partition of [−2,3] with 100 subintervals we can compute using Eq. 5 that $DK_{\lambda f+\mu} = x_{94} = 2.65$ correctly, but it has changed.
When we apply the EDE estimation we find the same position for the knee, $x = \chi_{F_1} = x_{69} = 1.4$, which, although not close to the knee, remains the same after rescaling.
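The two behaviours of Example 5.1 can be reproduced side by side. In the sketch below, `knee_scaled_exp` solves the characteristic equation of Eq. 14 for $\lambda e^x$ with a root finder, and `chord_est` returns the grid point farthest below the total chord, which we use as a stand-in for the EDE point; both names are ours.

```r
# Knee of lambda * exp(x): root of the characteristic equation (Eq. 14)
knee_scaled_exp <- function(lambda) {
  E1 <- function(x) exp(x) * (1 + lambda^2 * exp(x)^2) -
    3 * lambda^2 * exp(x)^3
  uniroot(E1, c(-2, 3), tol = 1e-12)$root
}

# Grid point farthest below the total chord (unnormalised signed distance;
# the normalisation does not change the position of the minimum)
chord_est <- function(x, y) {
  n <- length(x)
  d <- (x[n] - x[1]) * (y - y[1]) - (y[n] - y[1]) * (x - x[1])
  x[which.min(d)]
}

x <- seq(-2, 3, by = 0.05)
lambda <- exp(-3)

knees <- c(original = knee_scaled_exp(1), rescaled = knee_scaled_exp(lambda))
chords <- c(original = chord_est(x, exp(x)),
            rescaled = chord_est(x, lambda * exp(x)))
print(knees)    # the knee moves when the y-axis is rescaled
print(chords)   # the chord-based estimate stays at the same grid point
```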
Example 5.2. For the function of Eq. 6 we want to resize the y-axis, converting it to the [0,1] range, so we need $\lambda = \frac{1}{10\tanh(5)}$ and $\mu = \frac{-1+\tanh(5)}{2\tanh(5)}$. Then we can find $\chi_1 = 4.25526516$, $\chi_2 = 5.74473484$, again symmetrical with respect to the inflection point p = 5, which is unaltered, and we observe that they are different from the initial ones. By taking an equidistant partition of [0,10] with 100 subintervals we can compute using Eq. 5, for the initial data and for the transformed data, that $DK_{f,1} = x_{34} = 3.3$ and $DK_{\lambda f+\mu,1} = x_{44} = 4.3$, correctly in both cases, but different. So after rescaling the y-axis we actually compute a totally different value for the knee point, i.e. the knee point is not y-scaling invariant.
But if we apply the EDE estimation we compute for the two cases the same results, $\chi_{F_1} = x_{37} = 3.6$, $\chi_{F_2} = x_{65} = 6.4$, so they have not changed.
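Now we can prove the next remarkable
Lemma 5.2. The estimation of a knee point by the EDE method is invariant under rescaling of the y-axis by an affine transformation.
Proof
By using Lemma 1.4 of [12] for the transformed function $g(x) = \lambda f(x) + \mu$ we have that
$$
\begin{aligned}
x_{F_{1,2}}(g) &= \arg_{x\in[a-\delta_1,\,b+\delta_2]}\left\{ g'(x) = \frac{g(b)-g(a)}{b-a} \right\} \\
&= \arg_{x\in[a-\delta_1,\,b+\delta_2]}\left\{ \lambda f'(x) = \frac{\lambda f(b)+\mu-\lambda f(a)-\mu}{b-a} \right\} \\
&= \arg_{x\in[a-\delta_1,\,b+\delta_2]}\left\{ f'(x) = \frac{f(b)-f(a)}{b-a} \right\} \\
&= x_{F_{1,2}}(f)
\end{aligned}
$$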
5.1.2. Rescaling x–axis
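We now want to study, instead of $D = \{(x_i, y_i = f(x_i)),\ i=1,2,\ldots,n\}$, the transformed data $D^{(\lambda x+\mu)} = \{(t_i = \lambda x_i + \mu,\ y_i),\ i=1,2,\ldots,n\}$, $\lambda > 0$. If we invert the transformation we find $x = \frac{t}{\lambda} - \frac{\mu}{\lambda} = \Lambda t + M$ with $\Lambda = \frac{1}{\lambda}$ and $M = -\frac{\mu}{\lambda}$. So by using the inverse transform we find that the curvature as a function of t is
$$k(f,t) = \frac{\Lambda^2 f''(\Lambda t+M)}{\left(1 + \Lambda^2 (f'(\Lambda t+M))^2\right)^{3/2}} = \frac{\Lambda^2 f''(x)}{\left(1 + \Lambda^2 (f'(x))^2\right)^{3/2}} \qquad (16)$$
The characteristic equation is
$$E_1(f,t) = f'''(x) + \Lambda^2 f'''(x) f'(x)^2 - 3\Lambda^2 f''(x)^2 f'(x) = 0 \qquad (17)$$
The second derivative is
$$E_2(f,t) = f^{(4)}(x) + 2\Lambda^2 f^{(4)}(x) f'(x)^2 + \Lambda^4 f^{(4)}(x) f'(x)^4 - 9\Lambda^2 f'''(x) f''(x) f'(x) - 9\Lambda^4 f'''(x) f'(x)^3 f''(x) + 12\Lambda^4 f''(x)^3 f'(x)^2 - 3\Lambda^2 f''(x)^3 \qquad (18)$$
using $x = \Lambda t + M$. Of course we directly observe that the new values for the knee points are different from the initial ones, provided that $\lambda \neq 1$.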
Example 5.3. For $f(x) = \frac{1}{2} e^{\frac{2}{3}x}$, $x \in [a,b] = [-1.5, 3.5]$, there exists a knee point that is found using Eq. 3 to be $\chi = \frac{3}{4}\ln\left(\frac{9}{2}\right) = 1.12805805$. We want to convert the x-axis to [−1,1], so we need $\lambda = \frac{2}{5}$, $\mu = -\frac{2}{5}$, thus $\Lambda = \frac{5}{2}$, $M = 1$. Then it is
$$E_1(f,t) = \frac{4}{27}\, e^{\frac{5}{3}t+\frac{2}{3}} - \frac{50}{243}\left(e^{\frac{5}{3}t+\frac{2}{3}}\right)^3 = 0 \;\Rightarrow\; t = -\frac{2}{5} + \frac{3}{10}\ln\left(\frac{18}{25}\right) = -0.49855122$$
while it also holds
$$E_2\left(f,\ -\frac{2}{5} + \frac{3}{10}\ln\left(\frac{18}{25}\right)\right) = -\frac{8}{45}\sqrt{2} < 0$$
and we indeed have maximum curvature. By taking an equidistant partition of [−1.5, 3.5] with 100 subintervals we can compute using Eq. 5 that $DK = t_{26} = -0.5$ correctly. If we apply the inverse transform in order to return to the initial unit scale, then we find the value $\chi = -0.25$, which is absolutely incorrect. Thus after changing our units we have essentially lost the knee point, i.e. the knee point is not x-scaling invariant.
On the contrary, if we apply the EDE estimation we compute for the two cases the same positions for the knee: in the first case $x = \chi_{F_1} = x_{64} = 1.65$, in the second case $t = \chi_{F_1} = t_{64} = 0.26$, which after inversion gives again $x = 1.65$.
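The numbers of Example 5.3 can be checked with the closed form of Eq. 3, since the curve seen in the t units is again an exponential. The sketch below is ours: `knee_exp` and `chord_est` are hypothetical helper names, and the chord-based grid point is used as a stand-in for the EDE estimate.

```r
f <- function(x) 0.5 * exp(2 * x / 3)

# Affine change of the x-axis: tt = lambda * x + mu maps [-1.5, 3.5] to [-1, 1]
lambda <- 2 / 5; mu <- -2 / 5
g <- function(tt) f((tt - mu) / lambda)   # the same curve in the t units

# Closed-form knee of a * exp(b * x), Eq. 3
knee_exp <- function(a, b) -log(2 * a^2 * b^2) / (2 * b)
chi_x <- knee_exp(1 / 2, 2 / 3)                       # knee in x units
chi_t <- knee_exp(f(-mu / lambda), 2 / (3 * lambda))  # knee in t units
back  <- (chi_t - mu) / lambda                        # chi_t mapped back to x

# Grid point farthest below the total chord
chord_est <- function(x, y) {
  n <- length(x)
  d <- (x[n] - x[1]) * (y - y[1]) - (y[n] - y[1]) * (x - x[1])
  x[which.min(d)]
}
x  <- seq(-1.5, 3.5, by = 0.05)
tt <- lambda * x + mu
est_x <- chord_est(x, f(x))
est_t <- chord_est(tt, g(tt))

print(c(chi_x = chi_x, chi_t = chi_t, mapped_back = back))
print(c(est_x = est_x, est_t = est_t, est_t_back = (est_t - mu) / lambda))
```

Mapping the knee found in the t units back to x does not return the original knee, while the chord-based estimate maps back to the same grid point.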
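We see that the EDE estimations of the knee point are scale invariant for this kind of x-axis transformation, thus we can prove the next
Lemma 5.3. The estimation of a knee point by the EDE method is invariant under rescaling of the x-axis by an affine transformation.
Proof
By using Lemma 1.4 of [12] for the transformed function $f(x) = f(\Lambda t + M) = g(t)$, $t_a = \lambda a + \mu$, $t_b = \lambda b + \mu$, we have that
$$
\begin{aligned}
t_{F_{1,2}}(g) &= \arg_{t\in[t_a-\Delta_1,\,t_b+\Delta_2]}\left\{ g'(t) = \frac{g(t_b)-g(t_a)}{t_b-t_a} \right\} \\
&= \arg_{x\in[a-\delta_1,\,b+\delta_2]}\left\{ \Lambda f'(x) = \frac{f(\Lambda t_b+M)-f(\Lambda t_a+M)}{\lambda b+\mu-\lambda a-\mu} \right\} \\
&= \arg_{x\in[a-\delta_1,\,b+\delta_2]}\left\{ \Lambda f'(x) = \frac{1}{\lambda}\,\frac{f(\Lambda(\lambda b+\mu)+M)-f(\Lambda(\lambda a+\mu)+M)}{b-a} \right\} \\
&= \arg_{x\in[a-\delta_1,\,b+\delta_2]}\left\{ \Lambda f'(x) = \Lambda\,\frac{f(b)-f(a)}{b-a} \right\} \\
&= \arg_{x\in[a-\delta_1,\,b+\delta_2]}\left\{ f'(x) = \frac{f(b)-f(a)}{b-a} \right\} \\
&= x_{F_{1,2}}(f)
\end{aligned}
$$
since it holds that $\Lambda = \frac{1}{\lambda}$ and $M = -\frac{\mu}{\lambda}$.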
5.2. Logarithmic transformations
5.2.1. Log-Y rescaling
We study, instead of the data $D = \{(x_i, y_i = f(x_i)),\ i=1,2,\ldots,n\}$, the transformed data
$$D^{(\ln y)} = \{(x_i,\ y_i^{*} = \ln(f(x_i))),\ i=1,2,\ldots,n\}, \quad f(x_i) > 0$$
The curvature of our observed curve is
$$k(\ln f) = \frac{\dfrac{f''(x) f(x) - f'(x)^2}{f(x)^2}}{\left(1 + \dfrac{(f'(x))^2}{f(x)^2}\right)^{3/2}}$$
The characteristic equation, if we omit the $\chi$ variable, is
$$E_1(\ln f) = \left(f^4 + f^2 f'^2\right) f''' + 3\left(f f'^3 - f^3 f'\right) f'' - 3 f^2 f' f''^2 + 2 f^2 f'^3 - f'^5 = 0 \qquad (19)$$
We will also omit the second derivative test, due to its large size. It is obvious that we cannot find the same values for the knee point, so we have proven the next
Lemma 5.4. The knee point of a curve is not invariant under the Log-Y axis transformation.
5.2.2. Log-X rescaling
We now want to study, instead of $D = \{(x_i, y_i = f(x_i)),\ i=1,2,\ldots,n\}$, the transformed data $D^{(\ln x)} = \{(t_i = \ln(x_i),\ y_i),\ i=1,2,\ldots,n\}$, $x_i > 0$. If we invert the transformation we find $x = e^t$. So by using the inverse transform we find that the curvature as a function of t is
$$k(f,t) = \frac{e^{2t} f''(e^t) + e^t f'(e^t)}{\left(1 + e^{2t} (f'(e^t))^2\right)^{3/2}} = \frac{x^2 f''(x) + x f'(x)}{\left(1 + x^2 (f'(x))^2\right)^{3/2}} \qquad (20)$$
The characteristic equation is
$$E_1(f,t) = \left(f'^2 f''' - 3 f' f''^2\right) x^4 - 3 f'^2 f'' x^3 + \left(-2 f'^3 + f'''\right) x^2 + 3 f'' x + f' = 0 \qquad (21)$$
with $x = e^t$, and we omit the second derivative test. Again we find different values for the knee points, so we have proven the next
Lemma 5.5. The knee point of a curve is not invariant under the Log-X axis transformation.
Example 5.4. We choose an equidistant partition of [a,b] = [5,10] into 100 subintervals for the function of Eq. 6, thus we have a strictly concave curve.
Log-Y. By taking logarithms of the y-axis we find $\chi = x_{11} = 5.5 \approx 5.518384$, which is the correct value from Eq. 19, but different from the initial one of 6.665463, found in Eq. 8. The EDE estimation is now $\chi_{F_1} = x_{27} = 6.3$, closer to the initial value and slightly different from $x_{30} = 6.45$, which was the result without taking log y. So, even if EDE is no longer invariant, the new perturbed value is close to the initial one, thus it is approximately invariant.
Log-X. By taking logarithms of the x-axis we find $t_{55} = 2.041220 \approx 2.043988$, which is the correct value from Eq. 21, but after inversion it gives $\chi = x_{55} = 7.7$, different from the initial one of 6.665463, found in Eq. 8. The EDE estimation is now $t_{28} = 1.848454$, i.e. $\chi_{F_1} = x_{28} = 6.35$, closer to the initial value and slightly different from $x_{30} = 6.45$, which was the result without taking log x. So, again EDE is not log x invariant, but it is approximately invariant.
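The approximate invariance under logarithmic rescaling can be observed directly on the grid of Example 5.4. The following base R sketch is ours; `chord_idx` is a hypothetical helper that returns the index of the point farthest above the total chord, which we use in place of the EDE estimate for this concave branch.

```r
# Index of the point farthest above the total chord (concave data)
chord_idx <- function(x, y) {
  n <- length(x)
  d <- (x[n] - x[1]) * (y - y[1]) - (y[n] - y[1]) * (x - x[1])
  which.max(d)
}

x <- seq(5, 10, by = 0.05)
y <- 5 + 5 * tanh(x - 5)          # strictly concave branch of the sigmoid

est <- c(plain = x[chord_idx(x, y)],
         logy  = x[chord_idx(x, log(y))],
         logx  = x[chord_idx(log(x), y)])
print(est)                        # positions expressed in the original x units
```

The three positions differ by a few grid steps only, in contrast to the knee itself, which moves to 5.5 and 7.7 respectively in the example above.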
6. A unit invariant estimation of knee point
We found that, although the concept of knee point is directly connected to the curvature, which is invariant under reparametrizations, it is not by itself invariant under the most commonly used rescalings of the x and y-axis, thus it proves to be a not so useful measure. But we have also found that the two $\chi_{F_{1,2}}$ points that are defined when we apply the EDE method are invariant under affine axis rescaling and approximately invariant under logarithmic axis rescaling. So, even if we are not sure about the validity of our EDE knee point estimation (since that method depends on the data interval $[X_{(1)}, X_{(\nu)}] = [a,b]$), we can always be sure that our estimation is unit invariant, or approximately invariant for logarithmic plots. This is a remarkable result for data handling in the natural sciences. For example, in symmetrical sigmoid curves we could define the $\chi_{F_{1,2}}$ points to be the scale invariant approximations of the knee points, giving the next
Definition 6.1. The Unit Invariant Knee (UIK) of a curve is the proper point given by the Extremum Distance Estimator method according to the convexity, concavity or sigmoidicity classification.
The UIK is invariant under unit transformations of the x and y-axis and it is approximately invariant under logarithmic axis rescaling. For a strictly convex or concave curve $UIK = \chi_{F_1}$, while for a convex/concave sigmoid curve $UIK_{1,2} = \chi_{F_{1,2}}$.
A visual interpretation of the $\chi_{F_{1,2}}$ points is that they are actually slant extrema, i.e. local minima or maxima relative to the (slant) total chord which connects the initial and ending points of the curve. If we compute the angle
$$\theta = \arctan\left(\frac{f(b)-f(a)}{b-a}\right) \qquad (22)$$
and rotate all our points by using the rotation matrix
$$R(\theta) = \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} \qquad (23)$$
then we convert our graph to another one which has the EDE points as extrema, as can be easily verified from Figure 5, which presents the data of Figure 2 after such a rotation. Note that the EDE estimations for the rotated data are again at the same positions as before: $\chi^{rot}_{F_{1,2}} = \{x^{rot}_{36}, x^{rot}_{67}\} = \{2.7549, 11.500\} \to \{3.5, 6.6\} = \{x_{36}, x_{67}\}$.
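The 'slant extrema' interpretation can be verified numerically on noise-free data. In the sketch below, which is ours, the rotation is taken clockwise by the chord angle of Eq. 22 so that the total chord becomes horizontal; this orientation is an assumption of the sketch. The ordinary extrema of the rotated ordinate are then compared with the extremes of the signed distance from the chord.

```r
x <- seq(0, 10, by = 0.1)
y <- 5 + 5 * tanh(x - 5)                       # convex/concave sigmoid
n <- length(x)

theta <- atan((y[n] - y[1]) / (x[n] - x[1]))   # angle of the total chord, Eq. 22

# Clockwise rotation by theta, written row by row
R <- matrix(c( cos(theta), sin(theta),
              -sin(theta), cos(theta)), nrow = 2, byrow = TRUE)
rot <- R %*% rbind(x, y)                       # row 1: rotated x, row 2: rotated y

# Ordinary extrema of the rotated ordinate
i_rot <- c(which.min(rot[2, ]), which.max(rot[2, ]))

# Extremes of the signed distance from the chord in the original data
d <- (x[n] - x[1]) * (y - y[1]) - (y[n] - y[1]) * (x - x[1])
i_chord <- c(which.min(d), which.max(d))

print(rbind(rotated = x[i_rot], chord = x[i_chord]))
```

After the rotation the first and last points share the same ordinate, so the chord is horizontal and the slant extrema become ordinary minima and maxima.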
7. Discussion
We studied extensively the concept of knee point, giving definitions for functions and for discrete data. We proved that it is neither unit nor logarithmic scale invariant. The EDE method was proved to give estimations of the knee point that either are unit invariant or show a small departure after rescaling. Thus we defined the unit invariant knee (UIK) estimation of the knee property, and we can use it for more reliable non-parametric knee point computations.
Figure 5: EDE points for noisy data of $f(x) = 5 + 5\tanh(x-5)$ before and after rotation
8. Acknowledgment
The author of this article would like to thank Marek W.
Gutowski for his valuable comments about the concept of
knee point.
9. Appendix
Here is the R code for computing the Menger discrete curvature of data using Eq. 4.
# Menger curvature of the three points (x1,y1), (x2,y2), (x3,y3), Eq. 4
cmenger<-function(x1,y1,x2,y2,x3,y3){
pq2<-(x1-x2)^2+(y1-y2)^2   # squared side lengths of the triangle
qr2<-(x2-x3)^2+(y2-y3)^2
rp2<-(x3-x1)^2+(y3-y1)^2
# abs() guards against tiny negative values caused by rounding
sqrt(abs(4*pq2*qr2-(pq2+qr2-rp2)^2))/(sqrt(pq2)*sqrt(qr2)*sqrt(rp2))
}
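The R code for finding knee points using Eq. 5.
findknee<-function(x,y){
n<-length(x)
cv<-rep(0,n-2)   # curvature at the interior points
for (j in 1:(n-2))
{
cv[j]<-cmenger(x[j],y[j],x[j+1],y[j+1],x[j+2],y[j+2])
}
jm<-which.max(cv)   # position of the maximum curvature
ikn<-jm+1           # index of the knee in the original data
# returns: maximum curvature, index of the knee, x value of the knee
c(max(cv),ikn,x[ikn])
}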
The R code for finding the UIK of the function of Eq. 6 using the EDE method described in Section 3.2.
#load package inflection
library(inflection)
#convex/concave sigmoid curve:
x=cbind(seq(0,10,by=0.05));y=cbind(5+5*tanh(x-5));
plot(x,y,cex=0.5,main='y=5+5tanh(x-5) as sigmoid')
#knee of the convex (left) half
i1<-1;i2<-round(length(x)/2);
kp1<-findknee(x[i1:i2],y[i1:i2])[3];
kp1;abline(v=kp1,col='green')
#knee of the concave (right) half
i1<-round(length(x)/2);i2<-length(x);
kp2<-findknee(x[i1:i2],y[i1:i2])[3];
kp2;abline(v=kp2,col='green')
#EDE points of the whole curve
b<-findiplist(x,y,0);x[b[2,1:2]];
abline(v=x[b[2,1:2]],col='red')
legend('right',col=c('green','red'),lty=c(1,1),
legend=c('knees','UIKs'))
#rescaling y-axis in units of maximum value:
ymax=max(y);ymax;yr<-y/ymax;
plot(x,yr,cex=0.5,
main='y=5+5tanh(x-5) as sigmoid: Rescaling y-axis')
abline(v=c(kp1,kp2),col='green')
i1<-1;i2<-round(length(x)/2);
kp1r<-findknee(x[i1:i2],yr[i1:i2])[3];
kp1r;abline(v=kp1r,lty=2,col='green')
i1<-round(length(x)/2);i2<-length(x);
kp2r<-findknee(x[i1:i2],yr[i1:i2])[3];
kp2r;abline(v=kp2r,lty=2,col='green')
b<-findiplist(x,yr,0);x[b[2,1:2]];
abline(v=x[b[2,1:2]],col='red')
legend('right',col=c('green','green','red'),
lty=c(1,2,1),legend=c('knees (initial)',
'knees (y-rescale)','UIKs'))
#observe the unit invariant knee points (UIKs)
#strictly convex:
x=cbind(seq(0,5,by=0.05));y=cbind(5+5*tanh(x-5));
plot(x,y,cex=0.5,
main='y=5+5tanh(x-5) as strictly convex')
kp<-findknee(x,y)[3];kp;abline(v=kp,col='green')
b<-findiplist(x,y,0);x[b[2,1]];
abline(v=x[b[2,1]],col='red')
legend('left',col=c('green','red'),lty=c(1,1),
legend=c('knee','UIK'))
#strictly concave:
x=cbind(seq(5,10,by=0.05));y=cbind(5+5*tanh(x-5));
plot(x,y,cex=0.5,
main='y=5+5tanh(x-5) as strictly concave')
kp<-findknee(x,y)[3];kp;abline(v=kp,col='green')
b<-findiplist(x,y,1);x[b[2,1]];
abline(v=x[b[2,1]],col='red')
legend('right',col=c('green','red'),lty=c(1,1),
legend=c('knee','UIK'))
References
[1] A. Fatemi and L. Yang, Cumulative fatigue damage and life prediction theories: a survey of the state of the art for homogeneous materials, International Journal of Fatigue, 20 (1), January 1998, pp 9–34, DOI:10.1016/S0142-1123(97)00081-9, 1998.
[2] L. Franke and G. Dierkes, A non-linear fatigue damage rule with an exponent based on a crack growth boundary condition, International Journal of Fatigue, 21 (8), September 1999, pp 761–767, DOI:10.1016/S0142-1123(99)00045-6, 1999.
[3] K. Endo and H. Goto, Initiation and propagation of fretting fatigue cracks, Wear, Volume 38, Issue 2, pp 311–324, DOI:10.1016/0043-1648(76)90079-X, 1976.
[4] Pawel Strzelecki and Heiko von der Mosel, Integral Menger curvature for surfaces, in Advances in Mathematics, 226 (3), pp 2233–2304, DOI:10.1016/j.aim.2010.09.016, 2011.
[5] K. Menger, Untersuchungen über allgemeine Metrik, in Mathematische Annalen, 103 (1), pp 466–501, DOI:10.1007/BF01455705, 1930.
[6] K. Menger, Untersuchungen über allgemeine Metrik. Vierte Untersuchung. Zur Metrik der Kurven, in Selecta Mathematica, Springer Vienna, pp 333–368, DOI:10.1007/978-3-7091-6110-4_22, 2002.
[7] Zhao, Q., Hautamaki, V. and Fränti, P., Knee Point Detection in BIC for Detecting the Number of Clusters, in Advanced Concepts for Intelligent Vision Systems, Springer Berlin Heidelberg, pp 664–673, DOI:10.1007/978-3-540-88458-3_60, 2008.
[8] Krishnapuram, R., Frigui, H. and Nasraoui, O., The Fuzzy C Quadric Shell clustering algorithm and the detection of second-degree curves, in Pattern Recognition Letters, 14, Issue 7, pp 545–552, DOI:10.1016/0167-8655(93)90103-K, 1993.
[9] Karasaridis, A., Rexroad, B. and Hoeflin, D., Wide-scale Botnet Detection and Characterization, in Proceedings of the First Conference on First Workshop on Hot Topics in Understanding Botnets, USENIX Association, http://dl.acm.org/citation.cfm?id=1323128.1323135, 2007.
[10] Ville Satopaa, Jeannie Albrecht, David Irwin and Barath Raghavan, Finding a "Kneedle" in a Haystack: Detecting Knee Points in System Behavior, in Proceedings of the 2011 31st International Conference on Distributed Computing Systems Workshops (ICDCSW '11), pp 166–171, DOI:10.1109/ICDCSW.2011.20, 2011.
[11] Albrecht, J., Tuttle, C., Snoeren, A. C. and Vahdat, A., Loose Synchronization for Large-scale Networked Systems, in Proceedings of the Annual Conference on USENIX '06 Annual Technical Conference, USENIX Association, pp 28–28, https://www.usenix.org/legacy/events/usenix06/tech/full_papers/albrecht/albrecht_html/, 2006.
[12] DT Christopoulos, Developing methods for identifying the inflection point of a convex/concave curve, in arXiv:1206.5478 [math.NA], 2012.
[13] DT Christopoulos, R package inflection: Finds the inflection point of a curve, in http://CRAN.R-project.org/package=inflection, 2013.
9

File (1)

Content uploaded by Demetris Christopoulos
Author content
... The most standard approach for such a judgment is by taking the so-called 'elbow point,' which is virtually the point where a severely decreasing or increasing curve begins to turn 'flat enough' (12,13,(16)(17)(18). Thus, this study considered the function of the rank factorization curve and used the function uik() from the R package inflection to select the optimal rank (17,19,20). The uik() function detects the factorization rank when the curve begins to climb faster (start point) and the point beyond which the curve flattens out (ending point), which are generally known as the knee points of a curve (Figure 1). Figure 1 selected as the emergence of factorization rank of Golub gene expression data set on the rank survey plot. ...
Article
Full-text available
Background There is a great need to develop a computational approach to analyze and exploit the information contained in gene expression data. Recent utilization of non-negative matrix factorization (NMF) in computational biology has served its capability to derive essential details from a high amount of data in particular gene expression microarrays. Objective A common problem in NMF is finding the proper number rank (r) of factors. Thus, various techniques have been suggested to select the optimal value of rank factorization (r). Method This study focused on the unit invariant knee (UIK) method to calculate factorization rank (basis vector) of the non-negative matrix factorization (NMF) of gene expression data sets is employed. Because the UIK method requires an extremum distance estimator (EDE) that is eventually employed for inflection and identification of a knee point, this study finds the first inflection point of curvature of RSS of the proposed algorithms using the UIK method on gene expression datasets as a target matrix. Results Computation was conducted for the UIK task using the esGolub data set of R studio, and consequently, the distinct results of NMF was subjected to compare on different algorithms. The proposed UIK method is easy to perform, free of a priori rank value input, and does not require initial parameters that significantly influence the model’s functionality. Conclusion This study demonstrates that the UIK method provides a credible prediction for both gene expression data and precisely estimating of simulated mutational processes data with known dimensions.
... This estimate requires a small correction, which must be computed numerically, so that x c is independent of changes in axis scaling. For the tanh function, this correction reduces x c by about 3% (Christopoulos 2014), which is negligible compared to our fit errors. We define T flex as the T bol corresponding to x c , i.e., T flex = 0.919T 0 + T offset . ...
Preprint
Full-text available
We present a comprehensive analysis on the evolution of envelopes surrounding protostellar systems in the Perseus molecular cloud using data from the MASSES survey. We focus our attention to the C$^{18}$O(2--1) spectral line, and we characterize the shape, size, and orientation of 54 envelopes and measure their fluxes, velocity gradients, and line widths. To look for evolutionary trends, we compare these parameters to the bolometric temperature Tbol, a tracer of protostellar age. We find evidence that the angular difference between the elongation angle of the C$^{18}$O envelope and the outflow axis direction generally becomes increasingly perpendicular with increasing Tbol, suggesting the envelope evolution is directly affected by the outflow evolution. We show that this angular difference changes at Tbol = $53 \pm 20$ K, which includes the conventional delineation between Class 0 and I protostars of 70K. We compare the C$^{18}$O envelopes with larger gaseous structures in other molecular clouds and show that the velocity gradient increases with decreasing radius ($|\mathcal{G}| \sim R^{-0.72 \pm 0.06}$). From the velocity gradients we show that the specific angular momentum follows a power law fit $J/M \propto R^{1.83 \pm 0.05}$ for scales from 1pc down to $\sim$500 au, and we cannot rule out a possible flattening out at radii smaller than $\sim$1000 au.
... To fulfill this purpose, it is intended to use a changing point automatic detection method to assist the water absorption analysis. In the literature, there are several methods of knee or jump point detection (Satopaa et al., 2011;Christopoulos, 2014). The following two methods were considered to develop the algorithm described in this work. ...
Article
building materials is relevant in building construction to avoid damage; for example, the drying process plays an important role in the available moisture both inside the material and at its surface. Drying can be defined as the process by which water leaves a porous building material. Understanding and knowledge of the process necessary to predict the performance of those materials in service. This experimental study analyzed the interface influence on the drying and wetting processes of ceramic blocks with perfect contact interface at different interface highs. The results showed an increase in the dry time constant for the materials with perfect contact interface compared to the monolithic materials, and the study found that the farther away from the base the interface is located, the greater is the drying time constant. The interface could significantly retard the moisture transport, ie, the discontinuity of moisture content across the interface indicated that there was a difference in capillary pressure across the interface. Finally, the hydric resistance (HR) values, in multilayer building components, with perfect contact interface, are calculated using a new methodology proposed. This methodology is based on knee point detection and allows determining more correctly the HR values.
... In the literature, there are several methods of knee or jump point detection [5]. Since the goal is to address the effect of the interface in water absorption, the water resistance is measured immediately after the first changing point, which is the time interval of interest. ...
Conference Paper
This work presents the results of an experimental campaign in order to determine the hygric resistance in multilayered building components, with different interface types. The results show a slowing of the wetting process due to the interfaces hygric resistance. The samples with hydraulic contact interface (cement mortar) present lower absorption rate than the samples with lime mortar. The influence of air space between layers was also demonstrated, i.e., the air space interfaces increase the coefficients of capillary significantly, as the distances from the contact with water increase. The hygric resistance was calculated by three different methods: gravimetric and gamma-ray methods, and the new methodology proposed, an automatic calculation method without human opinion/criteria. The "knee point" was detected, numerically, in water absorption curves and the moisture-dependent interface resistance was quantified and validated for transient conditions. The methodology proposed to detect the "knee point" can be also used in the future for different multilayer materials with an interface, in order to obtain more correct hygric resistance values to be used in future numerical simulations.
... In the literature, there are several methods of knee or jump point detection [5]. Since the goal is to address the effect of the interface in water absorption, the water resistance is measured immediately after the first changing point, which is the time interval of interest. ...
Article
This work presents the results of an experimental campaign to determine the hygric resistance in multilayered building components with different interface types. The results show a slowing of the wetting process due to the interfaces' hygric resistance. The samples with hydraulic contact interface (cement mortar) present a lower absorption rate than the samples with lime mortar. The influence of air space between layers was also demonstrated, i.e., the air space interfaces increase the coefficients of capillary significantly, as the distances from the contact with water increase. The hygric resistance was calculated by three different methods: gravimetric and gamma-ray methods, and the new methodology proposed, an automatic calculation method without human opinion/criteria. The “knee point” was detected numerically in water absorption curves, and the moisture-dependent interface resistance was quantified and validated for transient conditions. The methodology proposed to detect the “knee point” can also be used in the future for different multilayer materials with an interface, in order to obtain more accurate hygric resistance values for use in future numerical simulations.
... More theoretical aspects and estimation techniques are available in [3]. In a few non-mathematical words, it is the point shown in Fig. 2 as 'knee', where we find the maximum curvature or the smallest tangent circle. ...
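The "maximum curvature" description in the snippet above can be illustrated for a discrete set of points with the Menger curvature of three consecutive points, i.e. the reciprocal of the radius of the circle through them. The following is only a minimal sketch under stated assumptions: the function names `menger_curvature` and `knee_index` are hypothetical, the data are assumed noise-free and ordered along the curve, and the result changes if the x or y axis is rescaled.

```python
import math

def menger_curvature(p, q, r):
    """Menger curvature of three 2D points: 4 * triangle area / product of side lengths."""
    # Twice the (unsigned) triangle area via the cross product.
    twice_area = abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]))
    sides = math.dist(p, q) * math.dist(q, r) * math.dist(p, r)
    # Coincident points give a zero product; treat the curvature as 0 in that case.
    return 0.0 if sides == 0 else 2.0 * twice_area / sides

def knee_index(points):
    """Index of the interior point with the largest Menger curvature (hypothetical helper)."""
    curvatures = [menger_curvature(*points[i - 1:i + 2]) for i in range(1, len(points) - 1)]
    return 1 + max(range(len(curvatures)), key=curvatures.__getitem__)

# Three points on a circle of radius 2 have curvature 1/2.
circle_curvature = menger_curvature((2, 0), (0, 2), (-2, 0))

# An L-shaped polyline has its sharpest bend at the corner (index 2).
corner = knee_index([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)])
```

With noisy data the pointwise maximum of this quantity is unstable, so it should be read as an illustration of the definition rather than as a robust estimator.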
Technical Report
We present basic R commands for the Unit Invariant Knee (UIK) as an objective elbow estimator in PCA, FA, AA and CA. Our computations are based on the Extremum Distance Estimator (EDE) method and its implementation in the inflection package.
Article
Introduction In March 2021, this journal published the paper “Measurement of the Hygric Resistance of Concrete Blocks with Perfect Contact Interface: Influence of the Contact Area”. This commentary aims to provide readers with a set of complementary comments that seek to clarify a few issues that can be raised. Methods The analysis was based on the original paper “Measurement of the Hygric Resistance of Concrete Blocks with Perfect Contact Interface: Influence of the Contact Area”. The purpose was to complete and comment on the work developed in the original paper, and to clarify some points that might be less well understood. Results and Discussion Some interesting questions are presented, and the analysis results intend to clarify them, namely: (1) the magnitude of the quantified post-interface flows; (2) the distinguishability of the moisture absorption in the monolithic and perfect contact samples; (3) the robustness of the knee-point identification algorithm; (4) the dependability of the capillary absorption measurements; (5) the consistency of the capillary absorption processing; (6) the number and “quality” of samples that should be used.
Conclusion The conclusions to highlight are the following: the hygric resistance results would be different, as they consider different methodologies for the knee point detection and a different number of data points after the knee to calculate the slope; the monolithic samples reached the highest moisture masses, Mw, and the Mw values became lower when an interface was present; for the knee-point identification, only the third of the three algorithms described in Section 2.3 was considered valid; the taping of the samples was carefully done, and absorption tests using epoxy resin are considered a better solution; the C calculation was made for all monolithic samples, but only the 3 most representative experimental results for each contact area were represented (as mentioned); and the A_w of the 10×10 cm² cross-section should be 0.1013 kg/(m²·s^0.5), which does not influence the conclusions/findings.
Article
We present a comprehensive analysis of the evolution of envelopes surrounding protostellar systems in the Perseus molecular cloud using data from the MASSES survey. We focus our attention to the C¹⁸O(2–1) spectral line, and we characterize the shape, size, and orientation of 54 envelopes and measure their fluxes, velocity gradients, and line widths. To look for evolutionary trends, we compare these parameters to the bolometric temperature T_bol, a tracer of protostellar age. We find evidence that the angular difference between the elongation angle of the C¹⁸O envelope and the outflow axis direction generally becomes increasingly perpendicular with increasing T_bol, suggesting the envelope evolution is directly affected by the outflow evolution. We show that this angular difference changes at T_bol = 53 ± 20 K, which includes the conventional delineation between Class 0 and I protostars of 70 K. We compare the C¹⁸O envelopes with larger gaseous structures in other molecular clouds and show that the velocity gradient increases with decreasing radius (its magnitude scales as ∼ R^(−0.72 ± 0.06)). From the velocity gradients we show that the specific angular momentum follows a power-law fit J/M ∝ R^(1.83 ± 0.05) for scales from 1 pc down to ∼500 au, and we cannot rule out a possible flattening out at radii smaller than ∼1000 au.
Article
Introduction Concrete sealing blocks are used not only in Brazil but worldwide. Knowledge of the material properties in the presence of moisture is necessary to study the durability of buildings. Methods An experimental study was carried out to analyse the effect of contact area on the capillary absorption coefficient of concrete samples used in sealing blocks, according to several standards: NBR 9779 (2012), EN 1015-18 (2002), ISO 15148 (2002) and ASTM C1794 (2015). Two types of specimens were analysed: monolithic samples and samples with a perfect contact interface. The monolithic samples were also subjected to axial and radial compression in order to enhance the capacity of masonry. Results The experimental results for the samples with perfect contact interface indicate that the water absorption before the interface presents similar behaviour to the monolithic samples. However, it is possible to observe a reduction of the absorption rate when water reaches the interface due to the hygric resistance. In other words, the moisture transport is significantly retarded by the existence of an interface, i.e., the discontinuity of moisture content across the interface indicated that there was a difference in capillary pressure across the interface. Also, the interface contact area does not greatly influence the water-resistance values. Conclusion Finally, the Hygric Resistance (HR) values in multilayer building components with perfect contact interface are calculated using the “knee point” methodology.
Chapter
The moisture transfer process in multilayered building components with an interface is very different from the moisture transfer considered when having different materials/layers separately. Quantifying moisture transfer in multi-layered systems through numerical simulations is essential to predict the real behaviour of those building materials in contact with moisture, which depends on the climatic conditions. Unfortunately, the contact phenomenon is neglected in numerical simulations, which compromises the feasibility of the results. In this work, the moisture transfer in multi-layered building components is analysed in detail, for perfect contact and hydraulic contact interfaces. The “knee point” was detected numerically in water absorption curves, and the moisture-dependent interface resistance was quantified and validated for transient conditions. The methodology proposed to detect the “knee point” can also be used in the future for different multilayer materials with an interface, in order to obtain more accurate maximum hygric resistance values for use in future numerical simulations.
Technical Report
Implementation of the methods Extremum Surface Estimator (ESE) and Extremum Distance Estimator (EDE), together with their iterative versions BESE and BEDE, in order to identify the inflection point of a curve.
Article
We introduce two methods for revealing the true inflection point of data that may or may not contain error. The starting point is a set of geometrical properties that follow from the existence of an inflection point p for a smooth function. These properties connect the concept of convexity/concavity before and after p, respectively, with three properly defined chords. Finally, a set of experiments is presented for the class of sigmoid curves and for third order polynomials.
Chapter
In one of the classical definitions of arc length, one proceeds as follows: given an arc B between the points a and c, one fixes a direction on B, say from a to c. Then, if E is a finite subset of B containing, say, n points, one arranges the points of E in the order in which they are encountered when traversing B in the fixed direction, and numbers them in this order as b_1, b_2, …, b_n. Next, with b_i b_{i+1} denoting the distance between the points b_i and b_{i+1}, one forms the number $$ l(E, B) = \sum\limits_{i = 1}^{n - 1} b_i b_{i + 1} $$ and defines the length of the arc B as the number sup l(E, B), i.e. the least upper bound of all numbers l(E, B), where E ranges over all finite subsets $$E \subset B$$ of B.
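The polygonal sum l(E, B) in the definition above can be computed directly for any finite, ordered sample of an arc. The sketch below is only an illustration: the names `polygonal_length` and `sample_quarter_circle` are hypothetical, and the quarter of the unit circle is chosen merely because its true length, π/2, is known. Refining E increases the sum toward that upper bound.

```python
import math

def polygonal_length(points):
    """l(E, B): sum of distances between consecutive points b_1, ..., b_n,
    assumed already ordered along the arc."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def sample_quarter_circle(n):
    """n equally spaced points on the unit-circle arc from (1, 0) to (0, 1)."""
    step = (math.pi / 2) / (n - 1)
    return [(math.cos(i * step), math.sin(i * step)) for i in range(n)]

coarse = polygonal_length(sample_quarter_circle(5))    # few points: shorter polygon
fine = polygonal_length(sample_quarter_circle(500))    # many points: close to pi / 2
```

Every inscribed polygon is no longer than the arc itself, which is why the arc length is defined as the least upper bound of these sums rather than as any single one of them.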
Article
Fretting fatigue tests of a carbon steel were carried out. Fatigue cracks were measured by means of electrical resistance and observed with a scanning electron microscope. The mechanism of fretting fatigue failure is discussed from the experimental results. Small fatigue cracks are initiated early in life and some grow to be propagating cracks. Cracks grow to a given depth by tangential stress combined with repeated stress and then propagate with repeated stress alone, causing a knee point in the propagation curve. Fretting fatigue damage is saturated in the first 20-25 % of life which coincides with the knee point. The condition of non-propagating cracks is also known.
Article
Malicious botnets are networks of compromised computers that are controlled remotely to perform large-scale distributed denial-of-service (DDoS) attacks, send spam, trojan and phishing emails, distribute pirated media or conduct other usually illegitimate activities. This paper describes a methodology to detect, track and characterize botnets on a large Tier-1 ISP network. The approach presented here differs from previous attempts to detect botnets by employing scalable non-intrusive algorithms that analyze vast amounts of summary traffic data collected on selected network links. Our botnet analysis is performed mostly on transport layer data and thus does not depend on particular application layer information. Our algorithms produce alerts with information about controllers. Alerts are followed up with analysis of application layer data, which indicates a false positive rate of less than 2%.
Article
For the life prediction of cyclic loaded specimens several damage accumulation theories exist. Some use a power law rule to convert the damage from one load level to another. The difference between these rules is the calculation of the exponent. In this article, a new exponent is proposed based on a crack growth boundary condition. In order to improve life prediction accuracy, the variation of the crack growth speed with the load level is needed. The new theory is compared with experimental data from the literature and the results are presented. Further developments of the theory are proposed.
Conference Paper
Bayesian Information Criterion (BIC) is a promising method for detecting the number of clusters. It is often used in model-based clustering in which a decisive first local maximum is detected as the number of clusters. In this paper, we re-formulate the BIC in partitioning based clustering algorithm, and propose a new knee point finding method based on it. Experimental results show that the proposed method detects the correct number of clusters more robustly and accurately than the original BIC and performs well in comparison to several other cluster validity indices.
Conference Paper
Computer systems often reach a point at which the relative cost to increase some tunable parameter is no longer worth the corresponding performance benefit. These "knees" typically represent beneficial points that system designers have long selected to best balance inherent trade-offs. While prior work largely uses ad hoc, system-specific approaches to detect knees, we present Kneedle, a general approach to online and offline knee detection that is applicable to a wide range of systems. We define a knee formally for continuous functions using the mathematical concept of curvature and compare our definition against alternatives. We then evaluate Kneedle's accuracy against existing algorithms on both synthetic and real data sets, and evaluate its performance in two different applications.
Article
Fatigue damage increases with applied load cycles in a cumulative manner. Cumulative fatigue damage analysis plays a key role in life prediction of components and structures subjected to field load histories. Since the introduction of the damage accumulation concept by Palmgren about 70 years ago and the ‘linear damage rule’ by Miner about 50 years ago, the treatment of cumulative fatigue damage has received increasingly more attention. As a result, many damage models have been developed. Even though early theories on cumulative fatigue damage have been reviewed by several researchers, no comprehensive report has appeared recently to review the considerable efforts made since the late 1970s. This article provides a comprehensive review of cumulative fatigue damage theories for metals and their alloys, emphasizing the approaches developed between the early 1970s and the early 1990s. These theories are grouped into six categories: linear damage rules; nonlinear damage curve and two-stage linearization approaches; life curve modification methods; approaches based on crack growth concepts; continuum damage mechanics models; and energy-based theories.
Article
This paper introduces a new fuzzy clustering algorithm called the Fuzzy C Quadric Shells algorithm which is expressly designed to seek clusters that can be described by segments of second-degree curves, or more generally by segments of shells of hyperquadrics.