All content in this area was uploaded by Carlo Drago on Oct 05, 2023

Operators-based Anomaly Detection In Lean Manufacturing Assembly Lines

Federico Trotta

Carlo Drago

ABSTRACT

In this study, we investigated the application of regression techniques to real industrial data measuring the times in a production assembly line. The study matters because, in Italy, SMEs (Small and Medium-sized Enterprises) generally have a high rotation of operators, and we wanted to understand whether Data Science could help us find the 'best' operators for a specific manufacturing phase.

Defining what we mean by 'best' in manufacturing is not easy, as we also need to take into account the quality of the manufactured product. However, this study resulted in an algorithm useful for anomaly detection on the processes in operator-driven assembly lines.

Studies have been made on anomaly detection ([5],[7],[8],[10]), and regression is often used for anomaly detection ([1],[6]); when considering manufacturing, however, the focus is generally on the product [2] and/or on machines [9]. In our study, we have instead focused on operators. Moreover, we have studied a Lean Manufacturing assembly line driven by operators: this means that the results are not tied to a particular case (a particular industrial process or a particular product), but can be applied to all Lean Manufacturing assembly lines driven by operators.

INTRODUCTION

We have investigated a Lean production assembly line that follows the 'one-piece-flow' production philosophy. In such cases, we define $t_c$ as the cycle time, that is, the time needed to complete a manufacturing phase. If the assembly line has $n$ manufacturing phases, we define the process lead time as $t_p = t_c \cdot n$, because we try to divide the whole manufacturing process into phases that need roughly the same time to be completed.
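
As a small numeric illustration of the lead-time relation above, the following Python sketch computes the lead time of a balanced line and identifies the phase that paces an unbalanced one. The phase names and all times are hypothetical values chosen for the example, not data from the study.

```python
# Hypothetical cycle times in seconds for a three-phase line (illustrative only).
cycle_times = {"assembly_1": 62.0, "assembly_2": 60.0, "testing": 60.0}


def ideal_lead_time(t_c: float, n_phases: int) -> float:
    """Process lead time t_p = t_c * n for a perfectly balanced line."""
    return t_c * n_phases


def bottleneck(phase_times: dict) -> str:
    """Return the phase with the longest cycle time, i.e. the one that paces the line."""
    return max(phase_times, key=phase_times.get)


t_p = ideal_lead_time(60.0, 3)      # balanced line: every phase takes 60 s
slowest = bottleneck(cycle_times)   # unbalanced line: one phase is slower
print(t_p, slowest)
```

In this made-up case the first assembly phase is slower than the automated testing phase, which is exactly the situation a balanced one-piece-flow line tries to avoid.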

Analyzing times therefore becomes a crucial activity: we need phases that can be processed in the same amount of time but, in manual assembly lines like the one we have studied, we also need operators who can assemble the product in a given amount of time. We also want that time to remain as constant as possible, so that bottlenecks are not created on the assembly line. Finding the operators who achieve this is the objective of this study.

We have studied a three-phase assembly line that has two assembly phases and, at the end, a testing phase. The testing phase has not been studied because it is automated and it is the one that 'dictates' the time of the whole line: since it is automated and cannot be improved in terms of time, the two assembly phases have been 'built' so that their cycle times equal that of the testing phase.

The methodologies we have used in this study are linear, polynomial, and spline regression. We have subdivided the cycle times by operator and found the four operators that most influence the cycle time of each phase (these are the operators that work most frequently in a particular manufacturing phase). In the end, we were able to find the operators whose cycle time remains constant, giving stability to the process.

THE METHODOLOGY

Here we describe the methodology developed for the first assembly phase. Since we have studied a lean manufacturing assembly line following the 'one-piece-flow' method, the methodology for the second phase is identical (and, of course, so are the results).

First of all, the cycle times have been normalized (as [3] also suggests), and we found a left-skewed curve. This behavior is mostly due to errors made when the operators register the time, or to pause times that were not respected. The same holds for the times with a value near 0: it is physically impossible for a manufacturing phase to be completed in a time near 0 seconds. For this reason, the 'too high' and 'too low' registered cycle times could be deleted. We found that restricting the data to the interval $\mu \pm \sigma$ (where $\mu$ is the mean value and $\sigma$ is the standard deviation) gave good results.
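
The cleaning step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study used R, the function name `clean_cycle_times` and the sample values are hypothetical, and the sample standard deviation is used because the text does not say which estimator was applied.

```python
from statistics import mean, stdev


def clean_cycle_times(times, k=1.0):
    """Keep only the cycle times that fall within mean ± k * standard deviation.

    k = 1.0 reproduces the one-sigma interval described in the text;
    stdev() is the sample standard deviation (an assumption).
    """
    mu = mean(times)
    sigma = stdev(times)
    low, high = mu - k * sigma, mu + k * sigma
    return [t for t in times if low <= t <= high]


# Made-up cycle times in seconds: 0.5 mimics a registration error,
# 240.0 mimics a pause that was not recorded as such.
raw = [58.0, 61.0, 60.0, 59.0, 62.0, 0.5, 240.0, 60.5]
cleaned = clean_cycle_times(raw)
print(cleaned)
```

With these values, both the near-zero time and the very long time fall outside the interval and are dropped, while the six plausible times are kept.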

At this point, the cycle times have been divided per operator. Among all the operators that worked on that manufacturing phase, we found that only 4 made a relevant number of observations; numerically, out of a total of 5,000 registered times, we decided that a relevant operator is one that registered at least 400 times, meaning that they assembled the product in the first assembly phase at least 400 times. We have called these operators 'A1', 'D1', 'N1', and 'O1'. We have taken into account the cycle times of these four operators in temporal order.
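
The selection of relevant operators can be sketched as follows. The 400-observation threshold comes from the text; the record layout (pairs of operator ID and cycle time, already in temporal order), the function name, and the operator 'Z1' are assumptions made for the example.

```python
from collections import defaultdict

MIN_OBSERVATIONS = 400  # threshold stated in the text


def relevant_operators(records, min_obs=MIN_OBSERVATIONS):
    """Group (operator, cycle_time) records by operator, preserving their order,
    and keep only the operators with at least `min_obs` observations."""
    by_operator = defaultdict(list)
    for operator, cycle_time in records:
        by_operator[operator].append(cycle_time)
    return {op: ts for op, ts in by_operator.items() if len(ts) >= min_obs}


# Synthetic records: 'Z1' is a made-up occasional operator, all times are made up.
records = [("A1", 60.0)] * 450 + [("Z1", 61.0)] * 30 + [("D1", 59.5)] * 400
selected = relevant_operators(records)
print(sorted(selected))
```

Because each operator's list is filled in the order the records arrive, the temporal order needed for the regression step is preserved.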

Let’s recall the equation of the linear regression model in the case of a 2-dimensional problem:

$$Y = b_0 + b_1 X$$

where “Y” is the dependent variable and “X” is the independent one; “b1” is the weight (the slope) and “b0” the bias (the intercept).

In our case, “Y” represents the cycle times and “X” the observations through the months.

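We also recall how a line is displayed, depending on the sign of b1:

[Figure: three panels showing an increasing line for b1 > 0, a decreasing line for b1 < 0, and a horizontal line for b1 = 0.]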

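In the case of a linear regression model, our aim is to find the line that best fits the data. To do so, using the Ordinary Least Squares (OLS) method, we have to find the values of b0 and b1 that minimize the sum of the squared deviations between the observed values and the fitted line. We recall that these values are:

$$b_1 = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1\,\bar{x}$$

where $\bar{x}$ is the mean of x and $\bar{y}$ is the mean of y.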
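
For illustration, the closed-form OLS estimates for a single predictor can be computed directly. This is a minimal Python sketch, not the R code used in the study; the function name `ols_fit` and the sample data are hypothetical.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (b0, b1) for the line y = b0 + b1 * x fitted by ordinary least squares.

    b1 = sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    """
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data: observation index vs. cycle time in seconds.
x = [1, 2, 3, 4, 5]
y = [60.0, 61.0, 59.0, 60.0, 60.0]
b0, b1 = ols_fit(x, y)
print(b0, b1)
```

A slope b1 close to zero, as in this example, corresponds to the nearly horizontal line of an operator whose cycle time does not drift over the observations.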

The following are the graphical results we found when we applied the linear regression model:

So, we found 4 horizontal lines. When performing the residual analysis, we found that the values of the 'Multiple R-squared' were all close to zero. This means that there is no linear correlation between the two variables: the cycle time and the number of observations as the months go by. This prompted us to investigate other regression techniques, namely polynomial regression and spline regression.
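
The 'Multiple R-squared' reported by R is the coefficient of determination, R² = 1 - SS_res / SS_tot. The Python sketch below computes it for a simple linear fit on made-up cycle times that fluctuate around a constant value; the data and names are hypothetical.

```python
from statistics import mean


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined when all y are equal (SS_tot = 0); the example data avoid that case.
    """
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up cycle times (seconds) fluctuating around 60 s with no trend.
x = [1, 2, 3, 4, 5, 6]
y = [60.0, 61.0, 59.0, 61.0, 59.0, 60.0]

# Simple OLS fit y = b0 + b1 * x.
x_bar, y_bar = mean(x), mean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

r2 = r_squared(y, [b0 + b1 * xi for xi in x])
print(round(r2, 3))
```

For a regression with a single predictor, R² equals the squared correlation coefficient between x and y, so a value near zero indicates that the fitted line explains almost none of the variation in cycle time.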

For polynomial regression too, we went both ways: the graphical and the analytical one. By analyzing the multiple R-squared for polynomials of different degrees, we found the polynomials that best approximate the data.
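
A comparison of R² across polynomial degrees can be sketched as below. This is a self-contained Python illustration, not the authors' R procedure: the helper names, the normal-equations solver, and the synthetic data are all assumptions. Note that for nested polynomial models R² cannot decrease as the degree grows, so in practice one would look for the lowest degree beyond which the improvement is negligible.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def polyfit(x, y, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] via the normal equations."""
    k = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(a, b)


def r_squared(x, y, coeffs):
    """R^2 of the polynomial with the given coefficients on the data (x, y)."""
    y_bar = sum(y) / len(y)
    y_hat = [sum(c * xi ** p for p, c in enumerate(coeffs)) for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic, exactly quadratic cycle times (made up for the example).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [60.0 + 0.5 * (xi - 2.5) ** 2 for xi in x]
r2_by_degree = {d: r_squared(x, y, polyfit(x, y, d)) for d in (1, 2, 3)}
print(r2_by_degree)
```

On this synthetic series the straight line explains essentially nothing, while the second-degree polynomial fits the data almost perfectly and the third degree adds no real improvement.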

Let’s recall the polynomial regression model, in which the dependent variable is expressed as a polynomial of degree “n” in the independent variable.

To find the polynomial of degree “n” that best fits the data, we can write the probability of obtaining the values “y1,...,yN” (N being the number of observations) given the polynomial, where σ is the standard deviation of the observations around it.

To find the coefficients (“a, b, c” for a second-degree polynomial) we differentiate this expression with respect to each coefficient. Setting the derivatives equal to zero, we obtain a linear system in matrix form; solving it, we find “a, b, c”.
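
The derivation outlined above can be written out for the second-degree case. The following is a sketch of the standard maximum-likelihood argument, not a reproduction of the original slides: it assumes the model $y = a + bx + cx^2$, independent Gaussian errors with common standard deviation $\sigma$, and $N$ observations; the ordering of the coefficients and the symbol $N$ are assumptions.

```latex
% Probability of the observed values under Gaussian errors
P(y_1, \dots, y_N \mid a, b, c) \;\propto\;
  \prod_{i=1}^{N} \exp\!\left[-\frac{\left(y_i - a - b x_i - c x_i^2\right)^2}{2\sigma^2}\right]
  = \exp\!\left(-\frac{\chi^2}{2}\right),
\qquad
\chi^2 = \sum_{i=1}^{N} \frac{\left(y_i - a - b x_i - c x_i^2\right)^2}{\sigma^2}

% Maximizing P is equivalent to minimizing chi^2
\frac{\partial \chi^2}{\partial a} = 0, \qquad
\frac{\partial \chi^2}{\partial b} = 0, \qquad
\frac{\partial \chi^2}{\partial c} = 0

% Resulting linear system (normal equations)
\begin{pmatrix}
  N          & \sum x_i   & \sum x_i^2 \\
  \sum x_i   & \sum x_i^2 & \sum x_i^3 \\
  \sum x_i^2 & \sum x_i^3 & \sum x_i^4
\end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} \sum y_i \\ \sum x_i y_i \\ \sum x_i^2 y_i \end{pmatrix}
```

Since $\sigma$ is the same for every observation, it cancels out of the normal equations, so the maximum-likelihood coefficients coincide with the least-squares ones.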

The following are the graphical results we found when we applied the polynomial regression model:

The curves approximate straight lines very well, particularly in the region far from the borders. The only graph in which the polynomial curve does not approximate a horizontal line well is that of operator A; this means that this operator's cycle times fluctuate considerably.

To be sure that the polynomials are really lines, we have to perform an analysis with splines. When trying to plot the spline for operators D, N, and O, R (the software we used for this analysis) gave us an error, telling us it could not plot the spline. We take this as confirmation that the (horizontal) linear regression model is the best one to describe the data of operators 'D', 'N', and 'O'.

We recall that a spline is a function defined piecewise by polynomials of degree “n”, whose aim is to interpolate a set of points (called nodes of the spline) in an interval.

For operator A we found the following spline:

We partition the interval [a,b] into “m+1” subintervals defined as:

$$I_i = [x_i, x_{i+1}], \quad i = 0, \dots, m, \qquad a = x_0 < x_1 < \dots < x_{m+1} = b$$

We define a spline of degree “n” related to the partition “xi” of [a,b] as a function S(x) that satisfies the following:

● S(x) is a polynomial of degree not higher than “n” in every subinterval “Ii”.

● S(x) and its first (n-1) derivatives are continuous in [a,b].

Then, by the definition above, the spline can be written piecewise as:

$$S(x) = S_i(x) = \sum_{k=0}^{n} c_{i,k}\,x^k \quad \text{for } x \in I_i, \quad i = 0, \dots, m$$
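
As a minimal illustration of the spline definition, the Python sketch below builds the simplest case, a spline of degree n = 1: it is a polynomial of degree at most 1 on each subinterval, and S itself is continuous on [a,b]. It is not the spline fitted in R for operator A; the function name, the nodes, and the values are hypothetical.

```python
from bisect import bisect_right


def linear_spline(knots, values):
    """Return S(x), the degree-1 spline through the points (knots[i], values[i]).

    `knots` must be strictly increasing; knots[0] = a and knots[-1] = b.
    """
    def s(x):
        if not knots[0] <= x <= knots[-1]:
            raise ValueError("x is outside the interval [a, b]")
        # Index of the subinterval [x_i, x_{i+1}] that contains x.
        i = min(bisect_right(knots, x) - 1, len(knots) - 2)
        x0, x1 = knots[i], knots[i + 1]
        y0, y1 = values[i], values[i + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return s


# Made-up nodes: observation index vs. cycle time in seconds.
knots = [0.0, 10.0, 20.0, 30.0]
values = [60.0, 62.0, 59.0, 61.0]
S = linear_spline(knots, values)
print(S(5.0), S(15.0))
```

Higher-degree splines follow the same piecewise idea but add continuity conditions on the derivatives at the interior nodes.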

RESULTS

This methodology leads us to find operators that 'produce' cycle times that have no correlation with the number of observations. After investigating the polynomial regression model and the spline model, if the best model to describe the data is a horizontal regression line, it means that this operator 'produces' a cycle time that is generally constant.

This means that we found an operator that concludes a manufacturing phase in a constant amount of time as the months go by. This is important because, since in lean manufacturing assembly lines we want to avoid bottlenecks, we prefer constant operators to fast operators. Of course, this method is also useful for finding the fastest operators but, from a manufacturing perspective, we prefer constant operators.

So, we found the operators that, more than others, conclude a manufacturing phase in a constant cycle time over the months, giving stability to the assembly line. In our case, operator A is not one of these operators and, as the majority of the operators (3 out of 4) remain constant, operator A is the anomaly in the process.

CONCLUSIONS AND DISCUSSIONS

This methodology leads us to find the operators that conclude their manufacturing work in a constant time as the months go by.

As we have stated, the innovation of this study lies in the fact that the majority of data-based studies in Manufacturing are centered on the product or on machines, whereas this one is centered on operators. A good way to make operators “the center” of the Manufacturing system is to give them the right data-based tools so that they can make data-driven decisions [11]. But even in that case, the center of the study is the product, not the operators themselves.

A possible use of the results of this study is helping schedulers allocate operators based on data. In other words, schedulers may plan production by allocating people to the Manufacturing phases where they give their best performance in terms of time.

While the focus of this study is of interest, as stated in the introduction, the methodology itself is not innovative. For this reason, further studies will be performed on the methodology itself; the work also opens new avenues of study, for example on data ethics.

In fact, while we are happy and proud that the European Commission has started a wide regulation on AI [12], we are also aware that Manufacturing is a field that urgently needs regulation on AI and data usage, for several reasons.

One of the first reasons is that, historically, many companies use production data (quality and timing) to reward operators economically [13]. This should be ethically regulated because managers may “raise the bar” on timing in order not to reward operators, or to make operators always work a little faster.

Finally, for a broader overview of this study, the reader can consult [4].

BIBLIOGRAPHY

[1]: Chenxi Li, Ziyu Wang, Jiahai Yang, Zhang ShiZe. July 2017. Robust regression for anomaly detection. IEEE. Retrieved from https://ieeexplore.ieee.org/abstract/document/7997373

[2]: Ashkan Hafezalkotob, Hamid Ketabian, Hesam Rahimi. January 2014. Balancing the Production Line by the Simulation and Statistics Techniques: A Case Study. Research Journal of Applied Sciences, Engineering and Technology. Retrieved from https://www.researchgate.net/publication/263848502_Balancing_the_Production_Line_by_the_Simulation_and_Statistics_Techniques_A_Case_Study

[3]: Biao Wang, Zhizhong Mao. March 2019. Outlier detection based on Gaussian process with application to industrial processes. Applied Soft Computing, Volume 76. Retrieved from https://www.sciencedirect.com/science/article/abs/pii/S156849461830718X

[4]: Federico Trotta. September 2021. How to use Data Science in industrial production environments. Towards Data Science. Retrieved from https://towardsdatascience.com/how-to-use-data-science-in-industrial-production-environments-6accf24afeb2

[5]: Hao Wang, Wenqiang Cui. October 2017. Anomaly detection and visualization of school electricity consumption data. IEEE. Retrieved from https://ieeexplore.ieee.org/abstract/document/8078707

[6]: Per Sieverts Nielsen, Xiufeng Liu. June 2016. Regression-based Online Anomaly Detection for Smart Grid Data. ArXiv. Retrieved from https://arxiv.org/abs/1606.05781

[7]: Jonathan Habermacher, Kris Villez. 2015. Shape Constrained Splines with Discontinuities for Anomaly Detection in a Batch Process. 12th International Symposium on Process Systems Engineering and 25th European Symposium on Computer Aided Process Engineering, vol 37. Retrieved from https://www.sciencedirect.com/science/article/abs/pii/B9780444635778501467

[8]: Kharitonov A, Nahhas A, Pohl M. 2022. Comparative analysis of machine learning models for anomaly detection in manufacturing. 3rd International Conference on Industry 4.0 and Smart Manufacturing, volume 200. Retrieved from https://www.sciencedirect.com/science/article/pii/S1877050922003398

[9]: Moldaschl T, Pittino F, Puggl M. 2020. Automatic anomaly detection on in-production manufacturing machines using statistical learning methods. MDPI. Retrieved from https://www.mendeley.com/catalogue/b5f1a7a4-b4f3-3cf9-9f22-3d4afb4e69a9/

[10]: Wang S, Wang J, Zhan P. 2021. Temporal anomaly detection on IIoT-enabled manufacturing. Journal of Intelligent Manufacturing. Retrieved from https://www.mendeley.com/catalogue/9fb95f4f-9bd8-3423-9c44-6ea5ac87502e/

[11]: Ada Bagozi, Devis Bianchini, Valeria De Antonellis, Alessandro Marini. 2018. A Relevance-Based Data Exploration Approach to Assist Operators in Anomaly Detection. Part of the Lecture Notes in Computer Science book series (LNPSE, volume 11229). Retrieved from https://link.springer.com/chapter/10.1007/978-3-030-02610-3_20

[12]: The European Commission. June 2023. Artificial intelligence act. Retrieved from https://www.europarl.europa.eu/RegData/etudes/BRIE/2021/698792/EPRS_BRI(2021)698792_EN.pdf

[13]: Federico Trotta. June 2023. Building a Responsible Future: Why AI Regulation is Vital for Manufacturing. Medium. Retrieved from https://artificialcorner.com/building-a-responsible-future-why-ai-regulation-is-vital-for-manufacturing-d123343b91b6