Anomaly Detection In Lean Manufacturing
In this study, we have investigated the application of several regression techniques to real industrial data: the measured times of a production assembly line. The relevance of this study stems from the fact that, in Italy, SMEs (Small and Medium Enterprises) generally have a high turnover of operators, and we wanted to understand whether Data Science could help us find the 'best' operators for a specific manufacturing phase.
Defining what we mean by 'best' in manufacturing is not easy, as we also need to take the quality of the manufactured product into account. However, this study resulted in an algorithm useful for anomaly detection on operator-driven assembly-line processes.
While anomaly detection has been widely studied, and while regression is often used for anomaly detection, in manufacturing the focus is generally placed on the product and/or on the machines. In our study, instead, we focused on the operators. Moreover, we studied a Lean Manufacturing assembly line driven by operators: this means that the results we obtained are not tied to a particular case (a particular industrial process or a particular product), but can be applied to any operator-driven Lean Manufacturing assembly line.
We have investigated a Lean production assembly line that follows the 'one-piece-flow' production philosophy. In such cases, we define tc as the cycle time, that is, the time needed to complete one manufacturing phase. If the assembly line has n manufacturing phases, we define the process lead time as tp = tc * n, because we try to divide the whole manufacturing process into phases that need (nearly) the same time to be completed.
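As a quick numeric illustration of the relation tp = tc * n (the numbers below are made up for the example, not taken from the study):

```python
# Illustration of the lead-time relation t_p = t_c * n.
# The values are invented for the example.
t_c = 90          # cycle time of one phase, in seconds
n = 3             # number of manufacturing phases
t_p = t_c * n     # process lead time of the whole line
print(t_p)        # 270 seconds
```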
Analyzing times therefore becomes a crucial activity: we need phases that can be processed in the same amount of time but, in manual assembly lines like the one we studied, we also need operators who can assemble the product in a given amount of time. Moreover, we want that time to remain as constant as possible, so that no bottlenecks arise on the assembly line; this is the objective of this study.
We have studied a three-phase assembly line with two assembly phases followed by a testing phase. The testing phase was not studied because it is automated and it is the one that 'dictates' the pace of the whole line: since it is automated and cannot be improved in terms of time, the two assembly phases were designed so that their cycle times equal that of the testing phase.
The methodologies used in this study are linear, polynomial, and spline regression. We subdivided the cycle times per operator and found the four operators that influence the cycle time of each phase the most (these are the operators that work most frequently in a particular manufacturing phase). In the end, we were able to find the operators whose cycle times remain constant, giving stability to the assembly line.
Here we describe the methodology developed for the first assembly phase: since we studied a lean manufacturing assembly line following the 'one-piece-flow' method, the methodology for the second phase is identical (and, of course, so are the results).
First of all, the cycle times were normalized (as also suggested in the literature) and we found a left-skewed distribution. This behavior is mostly due to errors when the operators register the time, or to pause times not being respected. The same holds for the times with a value near 0: it is physically impossible to complete a manufacturing phase in nearly 0 seconds. For this reason, the 'too high' and 'too low' registered cycle times could be deleted. We found that cleaning the data to the interval μ ± σ (where μ is the mean value and σ is the standard deviation) gave good results.
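The cleaning step can be sketched in a few lines of Python; the data here are synthetic stand-ins for the registered cycle times:

```python
import random
import statistics as st

random.seed(0)
# Synthetic cycle times (seconds); real data would come from the line logs.
times = [random.gauss(90, 10) for _ in range(1000)]
# Inject implausible registrations: near-zero and far-too-high values.
times += [0.5, 1.0, 400.0, 500.0]

mu = st.mean(times)
sigma = st.stdev(times)

# Keep only observations inside [mu - sigma, mu + sigma], as in the cleaning step.
cleaned = [t for t in times if mu - sigma <= t <= mu + sigma]
print(len(times) - len(cleaned), "observations removed")
```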
At this point, the cycle times were divided per operator. Among all the operators who worked on that manufacturing phase, we found that only 4 made a relevant number of observations; numerically, out of a total of 5,000 registered times, we decided that a relevant operator is one who registered at least 400 times, meaning they assembled the product in the first assembly phase at least 400 times. We have called these operators 'A1', 'D1', 'N1', and 'O1', and we have taken their cycle times into account in temporal order.
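The selection of 'relevant' operators can be sketched as follows. The operator codes beyond the four named in the study, and all frequencies, are invented for the example; only the 5,000-observation total and the 400-registration threshold come from the text:

```python
import random
from collections import Counter

random.seed(1)
# Hypothetical log: one operator code per registered cycle time.
operators = ["A1", "D1", "N1", "O1", "X1", "Y1"]
weights   = [0.25, 0.24, 0.22, 0.21, 0.05, 0.03]  # X1, Y1 rarely work this phase
log = random.choices(operators, weights=weights, k=5000)

counts = Counter(log)
# The study's threshold: an operator is 'relevant' with at least 400 registrations.
relevant = sorted(op for op, n in counts.items() if n >= 400)
print(relevant)
```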
Let us recall the equation of the linear regression model in the case of a 2-dimensional problem:

Y = b0 + b1·X

where Y is the dependent variable and X is the independent one; b1 is the weight (the slope) and b0 is the bias (the intercept). In our case, Y represents the cycle times and X the observations through the months.
We also recall how the regression line behaves depending on the sign of b1: increasing for b1 > 0, decreasing for b1 < 0, and horizontal for b1 = 0.
In the case of a linear regression model, our aim is to find the line that best fits the data. Using the Ordinary Least Squares (OLS) method, we have to find the values of b0 and b1 that minimize the sum of the squared residuals. We recall that these values are:

b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
b0 = ȳ − b1·x̄

where x̄ is the mean of the xi and ȳ is the mean of the yi.
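These closed-form OLS estimates translate directly into code; a minimal sketch, checked on exactly linear toy data:

```python
def ols_fit(x, y):
    """Closed-form OLS estimates for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    b1 = num / den
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Toy check on exactly linear data y = 1 + 2x.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
b0, b1 = ols_fit(x, y)
print(b0, b1)  # 1.0 2.0
```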
The following are the graphical results we found when we applied the linear regression model:
So, we found 4 horizontal lines. When performing the residual analysis, we found that the 'Multiple R-squared' values were all close to zero. As we know, this means that there is no linear correlation between the two variables, the cycle time and the observation index as the months go by. This suggested that we investigate other regression techniques, such as polynomial regression and spline regression.
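To see why a near-zero Multiple R-squared signals the absence of a linear relation: for simple linear regression it equals the squared Pearson correlation between x and y. A sketch on synthetic trend-free data (the numbers are invented for the example):

```python
import random

random.seed(3)
# Trend-free 'cycle times': the observation index carries no information.
x = list(range(100))
y = [90 + random.gauss(0, 3) for _ in x]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# For simple linear regression, Multiple R-squared equals the squared
# Pearson correlation between x and y.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
r2 = sxy ** 2 / (sxx * syy)
print(round(r2, 3))  # close to zero: no linear relation
```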
Also in the case of polynomial regression we went both ways: the graphical and the analytical one. By analyzing the multiple R-squared for polynomials of different degrees, we found the polynomials that best approximate the data.
Let us recall the equation of the polynomial regression model (written here for degree 2, with coefficients a, b, c):

y = a + b·x + c·x²

To find the polynomial of degree n that best fits the data, we can calculate the probability of obtaining the observed values y1, ..., yN as follows:

P = ∏i (1 / (σ√(2π))) · exp(−(yi − a − b·xi − c·xi²)² / (2σ²))

where σ is the standard deviation. Maximizing P is equivalent to minimizing the sum of squared deviations:

S(a, b, c) = Σi (yi − a − b·xi − c·xi²)²

To find a, b, c we differentiate S with respect to each coefficient. Setting the derivatives equal to zero, we have to solve a matrix problem like the following:

| N     Σxi    Σxi²  |   | a |   | Σyi     |
| Σxi   Σxi²   Σxi³  | · | b | = | Σxi·yi  |
| Σxi²  Σxi³   Σxi⁴  |   | c |   | Σxi²·yi |

Solving it, we find a, b, c.
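The degree-2 matrix problem just described can be assembled and solved directly; a minimal sketch using numpy, checked on exact quadratic toy data:

```python
import numpy as np

def quad_fit(x, y):
    """Solve the degree-2 normal equations for y ≈ a + b*x + c*x**2.

    The matrix rows hold the sums N, Σx, Σx², ... and the right-hand
    side holds Σy, Σxy, Σx²y, as in the normal-equations system.
    """
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = len(x)
    A = np.array([
        [n,             x.sum(),        (x**2).sum()],
        [x.sum(),       (x**2).sum(),   (x**3).sum()],
        [(x**2).sum(),  (x**3).sum(),   (x**4).sum()],
    ])
    rhs = np.array([y.sum(), (x * y).sum(), (x**2 * y).sum()])
    return np.linalg.solve(A, rhs)  # a, b, c

# Toy check on exact quadratic data y = 2 + 0*x + 3*x^2.
x = [0, 1, 2, 3, 4]
y = [2 + 3 * xi**2 for xi in x]
a, b, c = quad_fit(x, y)
print(round(a, 6), round(b, 6), round(c, 6))
```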
The following are the graphical results we found when we applied the polynomial regression model:
The curves approximate straight lines very well; this is particularly true in the region far from the borders. The only graph in which the polynomial curve does not approximate a horizontal line well is the one for operator A1: this means that these cycle times fluctuate considerably. To be sure that the polynomials are really lines, we performed an analysis with splines.
When trying to fit the spline for operators D1, N1, and O1, 'R' (the software we used for this analysis) returned an error, telling us it could not fit the spline. This indicates that, indeed, the (horizontal) linear regression model is the best one to describe the data of operators 'D1', 'N1', and 'O1'.
We recall that a spline is a function defined piecewise by polynomials of degree k, whose aim is to interpolate a set
of points (called nodes of the spline) in an interval.
For operator A1 we found the following spline:
We partition the interval [a, b] into m + 1 subintervals defined by the nodes:

a = x0 < x1 < ... < xm < xm+1 = b,    Ii = [xi, xi+1]

We call a spline of degree n relative to the partition {xi} of [a, b] a function S(x) that satisfies the following:
●S(x) is a polynomial of degree not higher than n in every subinterval Ii.
●S(x) and its first (n − 1) derivatives are continuous in [a, b].
Then, the spline is written piecewise as:

S(x) = Si(x) for x ∈ Ii, where each Si is a polynomial of degree at most n.
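The definition can be illustrated with the simplest case, a degree-1 (piecewise linear) spline. This is only a didactic sketch with invented nodes; the study relied on R's spline fitting:

```python
import bisect

def linear_spline(nodes):
    """Build a degree-1 spline S(x) through the given (x, y) nodes.

    On every subinterval [x_i, x_(i+1)] S is a polynomial of degree <= 1,
    and S itself is continuous on [a, b]: exactly the two defining
    properties of a spline, with n = 1.
    """
    xs = [p[0] for p in nodes]
    ys = [p[1] for p in nodes]

    def S(x):
        # Locate the subinterval containing x, then interpolate linearly.
        i = max(0, min(bisect.bisect_right(xs, x) - 1, len(xs) - 2))
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return (1 - t) * ys[i] + t * ys[i + 1]

    return S

# Invented nodes (observation index, cycle time in seconds).
S = linear_spline([(0, 90.0), (1, 95.0), (2, 88.0), (3, 92.0)])
print(S(1), S(1.5))  # 95.0 at the node, 91.5 halfway between nodes 1 and 2
```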
This methodology leads us to find operators who 'produce' cycle times that have no correlation with the number of observations. If, after investigating the polynomial regression model and the spline model, the best model to describe the data turns out to be a horizontal regression line, it means that the operator 'produces' a generally constant cycle time.
This means that we found an operator who completes a manufacturing phase in a constant amount of time as the months go by. This is important because, since in lean manufacturing assembly lines we want to avoid bottlenecks, we prefer constant operators over fast operators. Of course, this method is also useful to find the fastest operators but, from a manufacturing perspective, we prefer constant ones.
So, we found the operators who, more than others, complete a manufacturing phase in a constant cycle time over the months, giving stability to the assembly line. In our case, operator A1 is not one of these operators and, since the majority of the operators (3 out of 4) remain constant, operator A1 is our anomaly in the process.
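One simple way to operationalize the 'constant vs. fluctuating' distinction is to compare the spread of each operator's cycle times. A sketch on synthetic data that mirrors the study's finding (three stable operators and one fluctuating 'A1'; all values invented):

```python
import random
import statistics as st

random.seed(2)
# Hypothetical per-operator cycle times: three stable operators and one
# fluctuating one ('A1'), mirroring the anomaly found in the study.
ops = {
    "A1": [random.gauss(90, 12) for _ in range(400)],  # fluctuating
    "D1": [random.gauss(90, 3) for _ in range(400)],
    "N1": [random.gauss(90, 3) for _ in range(400)],
    "O1": [random.gauss(90, 3) for _ in range(400)],
}

spread = {op: st.stdev(ts) for op, ts in ops.items()}
# Flag as anomalous any operator whose spread is far above the median spread.
median = st.median(spread.values())
anomalies = [op for op, s in spread.items() if s > 2 * median]
print(anomalies)  # ['A1']
```

The 2x-median threshold is an assumption for the example; the study instead compared linear, polynomial, and spline fits per operator.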
CONCLUSIONS AND DISCUSSIONS
In summary, this methodology leads us to find the operators who complete their manufacturing work in a constant time as the months go by.
As stated, the innovation of this study lies in the fact that the majority of data-driven studies in Manufacturing are centered on the product or on the machines, whereas this one is centered on the operators. A good way to make operators 'the center' of the Manufacturing system is to give them the right data-based tools so that they can make data-driven decisions. But, even in that case, the center of the study is the product, not the operators.
A possible interesting use of the results of this study is helping schedulers allocate operators based on data. In other words, schedulers may plan the production by allocating people to the manufacturing phases where they give their best performance in terms of time.
However, while in the introduction we stated that the focus of this study is interesting, we also said that the methodology itself is not innovative. For this reason, further studies will be performed on the methodology itself; this also opens new horizons for study, for example on data ethics.
In fact, while we are happy and proud that the EU Commission has started a wide regulation effort on AI, we are also aware that Manufacturing is a field that needs urgent regulation on AI and data usage, for several reasons.
One of the first reasons is that, historically, many companies have used production data (quality and timing) to economically reward operators. This should be ethically regulated, because managers may 'raise the bar' on timing in order not to reward operators, or to make operators always work a little faster.
Finally, for a broader overview of this study, the reader can consult the references.
: Chenxi Li, Ziyu Wang, Jiahai Yang, ShiZe Zhang. July 2017. Robust regression for anomaly detection. IEEE.
: Ashkan Hafezalkotob, Hamid Ketabian, Hesam Rahimi. January 2014. Balancing the Production Line by the Simulation
and Statistics Techniques: A Case Study. Research Journal of Applied Sciences, Engineering and Technology.
: Biao Wang, Zhizhong Mao. March 2019. Outlier detection based on Gaussian process with application to industrial
processes. Applied Soft Computing, Volume 76.
: Federico Trotta. September 2021. How to use Data Science in industrial production environments. Towards Data Science.
Retrieved from https://towardsdatascience.com/how-to-use-data-science-in-industrial-production-environments-
: Hao Wang, Wenqiang Cui. October 2017. Anomaly detection and visualization of school electricity consumption data.
IEEE. Retrieved from https://ieeexplore.ieee.org/abstract/document/8078707
: Per Sieverts Nielsen, Xiufeng Liu. June 2016. Regression-based Online Anomaly Detection for Smart Grid Data. ArXiv.
Retrieved from https://arxiv.org/abs/1606.05781
: Jonathan Habermacher, Kris Villez. 2015. Shape Constrained Splines with Discontinuities for Anomaly Detection in a
Batch Process. 12th International Symposium on Process Systems Engineering and 25th European Symposium on Computer
Aided Process Engineering, vol 37.
: Kharitonov A, Nahhas A, Pohl M. 2022. Comparative analysis of machine learning models for anomaly detection in manufacturing. 3rd International Conference on Industry 4.0 and Smart Manufacturing, volume 200.
: Moldaschl T, Pittino F, Puggl M. 2020. Automatic anomaly detection on in-production manufacturing machines using
statistical learning methods. MDPI. Retrieved from https://www.mendeley.com/catalogue/b5f1a7a4-b4f3-3cf9-9f22-
: Wang S, Wang J, Zhan P. 2021. Temporal anomaly detection on IIoT-enabled manufacturing. Journal of Intelligent
Manufacturing. Retrieved from https://www.mendeley.com/catalogue/9fb95f4f-9bd8-3423-9c44-6ea5ac87502e/
: Ada Bagozi, Devis Bianchini, Valeria De Antonellis, Alessandro Marini. 2018. A Relevance-Based Data Exploration
Approach to Assist Operators in Anomaly Detection. Part of the Lecture Notes in Computer Science book series (LNPSE,
volume 11229). Retrieved from https://link.springer.com/chapter/10.1007/978-3-030-02610-3_20
: The European Commission. June 2023. Artificial intelligence act.
: Federico Trotta. June 2023. Building a Responsible Future: Why AI Regulation is Vital for Manufacturing. Medium.
Retrieved from https://artificialcorner.com/building-a-responsible-future-why-ai-regulation-is-vital-for-manufacturing-