Anomaly Detection - Science topic
Explore the latest questions and answers in Anomaly Detection, and find Anomaly Detection experts.
Questions related to Anomaly Detection
In the UK a very large research project has started, and it is hoped that 5 million people over the age of 18 will get involved.
I have joined it and my husband is there to help me. I am now waiting for my appointment in a caravan to take a blood test for screening, blood pressure recording, and measurement of cholesterol levels. They are screening for risk of future disease.
I've never encountered such a large project. My PhD and its follow-ups (at 5 and 10 years) originally involved only 22 participants, and there were fewer in the follow-ups. Constructivist Grounded Theory. So, I'm looking at both ends of the continuum!
According to the letter that came through the letterbox this morning, 1,341,669 people had got involved. It is the largest ever UK health research programme. Another communication indicated that the participants were very varied.
Federated learning supports privacy-sensitive applications and has reshaped the path of AI models in autonomous vehicles, traffic prediction and monitoring, telecommunications, IoT, cyber-security, pharmaceutics, industrial management, industrial IoT, and healthcare and medical AI. Federated learning remains a comparatively new field with many research possibilities for achieving privacy-preserving AI.
It is extremely hard to parse across disciplines for this information, so have you encountered any anomalies or inconsistencies in SR in your field/research?
Hello everyone,
I’m looking for a motivated collaborator to join an ongoing research paper focusing on applying Retrieval-Augmented Generation (RAG) pipelines to a novel real-world use case. The project explores advanced ML techniques for information retrieval and pattern detection with significant practical impact.
I’m seeking someone with:
- Experience in NLP, RAG pipelines, and anomaly detection
- Strong experimental and model development skills
- Familiarity with academic writing and publications
As for my background: I'm a Master's student in Information Technology at Worcester Polytechnic Institute (WPI), specializing in data science and machine learning. My work includes SBERT-based innovation analysis, a Streamlit-deployed recommender system, and a RAG pipeline for legal document analysis. I have also published a research paper in IoT, available on my ResearchGate profile.
Currently, I am also interning for a marketing agency in Chicago where I am working on building sentiment analysis models to help drive their social media strategy.
I’m currently collaborating with a researcher who brings 2 years of work experience as a software engineer at a top tech startup in Europe, adding strong engineering expertise to our team.
This collaboration offers shared authorship on a high-impact paper and the chance to work on cutting-edge ML applications alongside passionate researchers.
If you’re interested, I’d love to connect and discuss further.
Best regards,
Harsh Deshpande
I am researching how to enhance anomaly detection in edge computing using deep learning. The main challenge is the limited computational resources available on edge devices.
I am considering a hybrid model that combines lightweight neural networks with traditional anomaly detection algorithms. Are there any recent studies or research papers on this topic?
Additionally, what techniques or tools could help optimize deep learning models for edge computing? Any recommendations for frameworks or efficient architectures would be highly appreciated!
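One widely used family of techniques here is post-training quantization, which shrinks model size and speeds up inference on constrained edge hardware. Below is a minimal numpy sketch of symmetric 8-bit weight quantization, illustrative only; in practice frameworks such as TensorFlow Lite or PyTorch offer production implementations:

```python
import numpy as np

# Hypothetical example: post-training 8-bit quantization of one weight matrix,
# one of the simplest ways to shrink a model for edge deployment.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(64, 32)).astype(np.float32)

# Symmetric linear quantization to int8: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to measure the accuracy cost of the 4x memory saving.
dequant = q.astype(np.float32) * scale
max_err = np.abs(weights - dequant).max()

print(f"memory: {weights.nbytes} B -> {q.nbytes} B")
print(f"max abs error: {max_err:.5f}")
```

The same idea extends to activations, and it combines well with pruning and knowledge distillation for further savings on edge devices.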
Cybersecurity Specialist | Security Architect | AI for Financial Systems
About:
I am Lizell, a seasoned Security Architect with extensive experience in banking systems, specializing in designing and implementing secure architectures. I hold SANS certifications and have a strong background in anomaly detection for financial transactions using artificial intelligence.
Currently, I am exploring advanced methods in fraud prevention and the efficacy of multi-factor authentication (MFA) in banking systems, particularly in the Peruvian market. I am open to collaborative research opportunities in these areas or related topics within cybersecurity and AI.
Research Interests:
Anomaly detection in financial systems
Cybersecurity for banking platforms
Multi-factor authentication (MFA) efficacy
Artificial intelligence applications in fraud detection
Contact Information:
Feel free to connect if you are interested in collaborating on projects related to cybersecurity, financial fraud detection, or AI applications in secure systems.
Dear all,
I am conducting a marine magnetic survey at a jetty/barge, where the seabed is scattered with various dumped materials (proven by the side-scan sonar mosaic). After producing the QAS grid, I found that the anomaly patches show a "survey line-following" trend, meaning you could easily tell the survey line orientation just by looking at the QAS result. The result looks unreal, and I couldn't figure out the main cause. I have made an assumption to try to explain it (see picture 7 attached), and tried a larger iteration number when producing the residual grid.
I have attached the detailed processing steps, together with illustrations, to make this easy and clear to understand. If you need more information, please leave a comment and I will update you soon. I would really appreciate it if you could help me understand this. Thank you in advance.
AI-driven anomaly detection systems can significantly enhance real-time threat identification and prevention in distributed networks by leveraging advanced machine learning algorithms and data analysis techniques. Here's how:
- Behavioural Analysis: AI can monitor network traffic and user behaviour patterns continuously, identifying deviations from normal behaviour that may indicate potential threats such as malware, phishing attempts, or insider attacks.
- Real-Time Detection: Traditional methods often rely on predefined rules or signature-based detection, which can miss new or evolving threats. AI systems, however, can detect anomalies in real-time by analysing patterns and flagging unusual activities as soon as they occur.
- Scalability and Adaptability: Distributed networks generate vast amounts of data, which can be overwhelming for human analysts or rule-based systems. AI can process this data at scale, adapting to changes in network architecture or traffic patterns without manual intervention.
- Reduced False Positives: AI models can differentiate between legitimate anomalies (e.g., a new software update rollout) and actual threats, reducing the number of false positives and allowing security teams to focus on real issues.
- Proactive Threat Prevention: By identifying early indicators of potential attacks, such as unusual login attempts or data transfers, AI systems can trigger preventive measures like isolating affected devices or blocking suspicious IPs before a breach occurs.
- Continuous Learning: AI systems can learn from past incidents, refining their detection models to improve accuracy over time. This ability makes them highly effective in evolving threat landscapes, where attackers frequently change tactics.
AI-driven anomaly detection enhances network security by offering faster, more accurate, and scalable solutions for identifying and mitigating threats in real time, ultimately strengthening the resilience of distributed networks.
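As a minimal, concrete illustration of the detection step (a sketch with synthetic traffic features, not a production intrusion detection system), scikit-learn's Isolation Forest can flag unusual records without any labels:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic "traffic" records with two made-up features,
# e.g. bytes transferred and login attempts per session.
rng = np.random.default_rng(42)
normal = rng.normal(loc=[500.0, 2.0], scale=[50.0, 1.0], size=(500, 2))
attacks = rng.normal(loc=[5000.0, 40.0], scale=[200.0, 5.0], size=(10, 2))
traffic = np.vstack([normal, attacks])

# Unsupervised fit; contamination is the expected anomaly fraction.
model = IsolationForest(contamination=0.02, random_state=0).fit(traffic)
labels = model.predict(traffic)          # -1 = anomaly, 1 = normal
n_flagged = int((labels == -1).sum())
print(f"flagged {n_flagged} of {len(traffic)} records")
```

In a real deployment the same model would be refit periodically on recent traffic so it adapts as the network's normal behaviour drifts.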
Anomaly detection in scanned image data set
Dear all,
One of the key points in TSAD (Time Series Anomaly Detection) when using a sliding window is the size of the window. For (quasi)-periodic time series, methods such as AutoPeriod, Autocorrelation etc. can be used.
But for non-periodic time series, what do you think are the best methods to use (with bibliographic references please)?
thanks :-)
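To make the question concrete, here is a small numpy sketch of one simple heuristic I have in mind (an assumption on my part, not an established standard): for a non-periodic series, pick the sliding-window size as the de-correlation length, i.e. the first lag where the autocorrelation drops below 1/e.

```python
import numpy as np

# AR(1) process: autocorrelated but non-periodic, as a stand-in
# for a real non-periodic time series.
rng = np.random.default_rng(1)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()

# Empirical autocorrelation function, normalized so acf[0] == 1.
xc = x - x.mean()
acf = np.correlate(xc, xc, mode="full")[len(x) - 1:]
acf /= acf[0]

# First lag where correlation falls below 1/e -> suggested window size.
window = int(np.argmax(acf < 1.0 / np.e))
print(f"suggested window size: {window}")
```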
Which anomaly detection methods can be applied to slit lamp examination?
I am tackling an industrial research problem in which massive-scale data, mostly streaming, is to be processed for outlier detection. The problem is that although there are some labels for the desired outliers in the data, they are not reliable, and we should therefore discard them.
My approach mainly revolves around unsupervised techniques, although my employer insists on finding a trainable supervised technique, which would require an outlier label for each individual data point. In other words, he has trust issues with unsupervised techniques.
Now, my concern is whether there is any established and valid approach to generating outlier labels, at least to some meaningful extent, especially for massive-scale data. I have done some research in this regard and also have experience in outlier/anomaly detection; nevertheless, it would be an honor to learn from other scholars here.
Much appreciated
Is there any formula to find the sample size needed to create machine learning or deep learning models for the detection, localization, segmentation, and classification of colon polyps?
Computers, Materials & Continua new special issue “Deep Learning based Object Detection and Tracking in Videos” is open for submission now.
At link: https://www.techscience.com/cmc/special_detail/object-detection
Guest Editors
Dr. Sultan Daud Khan, National University of Technology, Pakistan.
Prof. Saleh Basalamah, Umm Al-Qura University, Saudi Arabia.
Dr. Farhan Riaz, University of Lincoln, UK.
Summary
Object detection and tracking in videos has become an increasingly important area of research due to its potential applications in a variety of domains such as video surveillance, autonomous driving, robotics, and healthcare. With the growing popularity of deep learning techniques, computer vision researchers have made significant strides in developing novel approaches for object detection and tracking.
This special issue will provide a platform for researchers to present their latest findings, exchange ideas, and discuss challenges related to object detection and tracking in videos. We invite original research articles, reviews, and surveys related to this topic. Additionally, this issue will also welcome topics on action recognition, anomaly detection, and behavior understanding in videos.

I am working on energy forecasting and anomaly detection. I need to find an energy consumption dataset.
The SCPS Lab (https://www.scpslab.org/) is hiring for two Ph.D. positions in the following areas:
- Federated Defense Against Adversarial Attacks in IIoT.
- Threat and Anomaly Detection for Cloud Security.
The required skills for potential graduate students include:
- Strong background in cyber security.
- Strong background in machine learning and data analytic techniques.
- Background in detection and estimation theory.
- Strong oral and written communication skills.
To apply, please contact Dr. Hadis Karimipour (hadis.karimipour@ucalgary.ca) with your most recent C.V. and a list of two references.
For anomaly detection I am trying to use an ensemble learning technique, and for more accurate results I want to add one more technique. Which one would you suggest?
Hi folks, is there any public or private dataset for IC chip defect detection or classification? I'm working on a project on IC chip defect/anomaly detection and am in dire need of access to such a dataset. Please help recommend one. Many thanks!
Hi,
I have two values such as:
Normal values: the steering angles produced by the DNN model under normal conditions.
Anomalous values: the steering angles produced by the DNN model under adversarial attack.
I just want to see the impact of the adversarial attack on the steering angle, i.e., the change in the steering angle. So, my question is: can we calculate the deviation (change) in steering as the difference between the anomalous and normal values? For example, let's say we have a steering angle of -0.16184 for the 4th image frame produced by the DNN model under the normal scenario, while the DNN model produced a steering angle of -0.38242 for the same 4th image frame under adversarial attack. In this case, the difference between -0.38242 and -0.16184 is -0.22058, which means that the adversarial attack deviated the actual steering by -0.22058.
In this case, my question is: can we say the deviation in steering is -0.22058, or is there a more well-defined term that can be used to express the change?
How can logs (i.e., small data items such as packet headers in a network) be classified for the purpose of anomaly detection?
Hello, can anyone outline the various ways anomaly detection can be applied to logs?
Analysis and anomaly detection tools are continually evolving. Machine learning resources provide weightings and estimates in advance, anticipating possible failures and unavailability of systems and applications.
Hello everybody,
I am currently writing my final thesis about data quality, in particular consistency. I am therefore looking for a labeled IoT time-series dataset for consistency checking. Does anybody know where I can find such a dataset?
Or does anybody know where I can get a labeled IoT time-series dataset for anomaly detection?
Thank you for your help!
Hi,
I would like to know about any publicly available dataset for predicting anomalies based on operational data or system logs. Please recommend some safety-critical system log datasets or operational data.
Thank you
For attack detection in a Cloud environment using machine learning, can I use a traditional network dataset as a training set?
Hello everyone,
I would like to ask you all about shapelet discovery and transformation: is there any solid implementation in Python for discovering shapelets and using them to detect anomalies in time series?
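To make the request concrete, the core shapelet computation is small enough to sketch directly in numpy (libraries such as tslearn and pyts offer full shapelet transforms; this toy example, on synthetic series, only illustrates the minimum-distance idea behind the shapelet transform):

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any window of the series."""
    m = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    return float(np.sqrt(((windows - shapelet) ** 2).sum(axis=1)).min())

rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 400)
shapelet = np.sin(t[:50])                       # candidate pattern from normal data

normal = np.sin(t) + rng.normal(scale=0.05, size=400)   # contains the pattern
anomalous = rng.normal(size=400)                        # lacks the pattern entirely

d_norm = shapelet_distance(normal, shapelet)
d_anom = shapelet_distance(anomalous, shapelet)
print(f"distance to normal series: {d_norm:.2f}, to anomalous series: {d_anom:.2f}")
```

In a shapelet-transform pipeline these minimum distances become features: series whose distance to the learned shapelets is large are candidates for anomalies.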
Isolation Forest is a popular algorithm to detect anomalies in a data set. When an anomaly is labelled, we are interested to know what features cause it to be an anomaly. Are there any existing ways to help deciding which features contribute the most to an anomaly?
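One simple model-agnostic heuristic (an illustration, not an established method; SHAP-style explainers are a more principled route) is to patch one feature of the flagged point at a time with the training median and see how much the anomaly score recovers:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # synthetic training data
model = IsolationForest(random_state=0).fit(X)

point = np.array([0.1, 8.0, -0.2])            # planted anomaly: feature 1 is the culprit
base = model.score_samples(point.reshape(1, -1))[0]

# Replace each feature with its median and measure the score recovery.
medians = np.median(X, axis=0)
recovery = []
for j in range(3):
    patched = point.copy()
    patched[j] = medians[j]
    recovery.append(model.score_samples(patched.reshape(1, -1))[0] - base)

culprit = int(np.argmax(recovery))
print(f"most influential feature: {culprit}")
```

The feature whose replacement recovers the score the most is the one contributing most to the anomaly; this extends naturally to patching subsets of features.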
I'm trying to use some machine learning (autoencoder?) to help classify various data for pass/fail. This data will be RF data such as frequency response etc. Does anyone have any recommendations on methods to best accomplish this?
Regards,
Adam
If I use an autoencoder for anomaly detection based on reconstruction error, and I have two or more different classes or types of anomaly, the autoencoder can only detect an anomaly and cannot classify it. How can I classify, or give a probabilistic classification, after detection? Please give me some ideas on this. Thank you.
I have time series data from different sensor locations. I am trying to use an autoencoder for outlier/anomaly detection based on reconstruction error. However, I am new to the field of machine learning and only know pandas, basic Python, and some basic machine learning algorithms. What approaches should I take to train the autoencoder model?
Please suggest the relevant parameters and model types. Should I go through all the theoretical details before starting to train the model?
I am looking for an open source Machine Learning based tool to find anomalies, such as DDoS events that takes input in netflow format.
Any suggestions are appreciated. I am looking for some research based open source code that I can download and run on some remote machines containing netflow data files of an ISP.
Swamping and masking are caused by input data that is too large for the purposes of anomaly detection.
I want to detect anomalies in streaming data. Using an FFT or DWT history, is it possible to detect anomalies on the fly (online)? It would help a lot if anybody could suggest some related resources.
Thanks.
For example, in supply chain or diagnostic, what is the importance of outlier or anomaly detection?
Could anyone suggest some good resources on online unsupervised anomaly detection for streaming data?
Hello Everyone,
We have recently investigated a Semi-Supervised Deep Learning approach for anomaly detection of Wind Turbine generators based on vibration signals in the paper entitled "Anomaly Detection on Wind Turbines based on a Deep Learning Analysis of Vibration Signals" [1]. We found that vibration data enables a promising mechanism to detect abnormal behavior on wind turbines with a careful Machine Learning pipeline design. The paper presents an IoT-ready Machine Learning pipeline that encompasses data gathering, preprocessing, feature extraction, and classification. The approach is based on a Semi-Supervised Deep Learning approach, using Deep Autoencoders combined with a normality threshold selection based on the F1-Score analysis for a labeled data-set. The vibration data is preprocessed with band-pass filters and DC-component removal, rotation speed relation, and FFT. Finally, 11 features are extracted, from minimum, maximum, RMS, and standard deviation, to kurtosis, shape factor, energy, and entropy. The trained detection model achieved accuracy >99%, precision >97%, and recall of 100% for the evaluated data-set.
Moreover, with the IoT integration, the proposed workflow can notify users whenever abnormal behavior is noticed. What do you think about our findings? For more details, check the full paper at . If you have similar experiences with anomaly detection in rotating machinery, or if you have any comments or questions, feel free to leave a comment so we can start a fascinating discussion.
[1] José Luis Conradi Hoffmann, Leonardo Passig Horstmann, Mateus Martínez Lucena, Gustavo Medeiros de Araujo, Antônio Augusto Fröhlich, and Marcos Hisashi Napoli Nishioka, Anomaly Detection on Wind Turbines Based on a Deep Learning Analysis of Vibration Signals, In Applied Artificial Intelligence:9, 2021. DOI: 10.1080/08839514.2021.1966879.
Best Regards,
José Luis Conradi Hoffmann
Software/Hardware Integration Lab
Federal University of Santa Catarina, Florianópolis - SC, Brasil
I am looking for X-ray baggage screening dataset for Anomaly detection in X-ray security screening systems.
Thanks for your help.

Does a tool for Explainable AI exist that can be used to identify anomalies in data streams based on AI models? So far, I know only LIME. Maybe you have other suggestions.
I want to use AI models (black box) to identify anomalies in business processes. In your opinion, what are the advantages and disadvantages compared to common approaches such as conformance checking (white box)? (Is there any existing research about this?)
I'm working on fraud detection in online product review systems. I want to work with dynamic graphs, and I read many papers about fraud detection and anomaly detection. But I couldn't find any dynamic graph-based dataset for review systems. Does anyone know such a dataset?
I am working on a project on video anomaly detection, and I have decided to apply one of these algorithms: autoencoders, RNNs, or LSTMs.
Kindly guide me as to which of them is best for video anomaly detection and why. It would be a great favor.
Regards,
I am looking for an overview of methods to detect univariate contextual outliers in time series data. One example application is data from industrial plants in different (unknown) operation modes or slow trends in the time series, but no seasonal effects. Visually those outliers can be seen easily by a human.
In the attached graph visually the contextual outliers above and below the trend can be identified clearly.
Most global outlier detection methods can be used with a window-based approach, but a method that automatically considers the size of the context would be beneficial.
Are there any suggestions which methods are recommended for that purpose?

Hello Researchers!
Please comment your views below.
I intend to work on IoT security using machine learning approaches.
As a novice in ML, I'm a bit puzzled about the possible algorithms for this problem.
1. Is this an anomaly detection problem?
2. What are the possible ML algorithms that can be used for this problem?
When taking the standard deviation and mean as the reference, a value more than 2 or 3 standard deviations away could be an anomaly, which works well in most cases. When the quartile concept is used instead, a value smaller than Q1 - 1.5 * IQR or bigger than Q3 + 1.5 * IQR is considered an anomaly. The former assumes that the data is normally distributed; what about the latter? What are their respective advantages over each other? Please provide some real examples if convenient.
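The practical difference shows up on skewed data: the quartile rule makes no normality assumption, and its fences are robust because Q1 and Q3 are barely affected by extreme values, unlike the mean and standard deviation. A small numpy sketch on a lognormal (right-skewed) sample:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=1.0, size=10000)   # right-skewed sample

# 3-sigma rule: assumes approximate normality.
mean, std = data.mean(), data.std()
z_flags = int((np.abs(data - mean) > 3 * std).sum())

# Tukey fences: distribution-free, built from robust quartiles.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_flags = int(((data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)).sum())

print(f"3-sigma rule flags {z_flags}, IQR rule flags {iqr_flags}")
```

On this skewed sample the IQR rule flags far more of the heavy right tail, while the 3-sigma cutoff is dragged outward by the very extremes it is supposed to catch; on clean Gaussian data the two rules behave similarly.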
I am working on a video anomaly detection project in MATLAB. I have a set of 28 different videos and have to check them for anomalies. The algorithm is working fine, but I have an issue with setting threshold values, so that values greater than the threshold are marked as anomalous and values less than the threshold as normal frames. Each video has a different range of anomaly values, so I am confused about how to set the threshold accordingly. Or can I update it per video at runtime?
I am doing a project on anomaly detection in videos using MATLAB. I have to perform data association with clusters using JPDA, but unfortunately it isn't working well. I have gone through several papers on JPDA, but they are all about tracking objects.
Kindly guide me on how to proceed, or point me to any research paper in which JPDA is used to perform data association rather than tracking.
Regards
Dear Dr.,
I have carefully read your paper on anomaly detection in smart meters.
Unfortunately, the paper describes neither the greedy windowed algorithm, nor its complexity, nor the execution time of the simulation, which would allow future research to improve on it.
I have sent you an email and await answers to my many questions.
Regards,
Actually I am asking this question especially for subsurface pressure data of an oil well (pressure will vary according to the flow rate). Nonetheless, any kind of method that works for non-stationary time series data will be sufficient. For the sake of the question, we can assume the data is noiseless.
I have already read about a method of evaluation for unsupervised anomaly detection using excess-mass and mass-volume curves (https://www.researchgate.net/publication/304859477), but was wondering if there are other possibilities.
I was reading "Time-Series Anomaly Detection Service at Microsoft" (https://arxiv.org/pdf/1906.03821.pdf) recently, and I have some problems with the programming part.
The first picture shows the algorithm; the general idea is to perform a fast Fourier transform on the time-series sequence, calculate the spectral residual, and perform an inverse fast Fourier transform at the end. When I checked the official code of this paper, before the inverse fast Fourier transform the transformed signal ('trans' in the code) was multiplied by the spectral residual and then divided by its amplitude (lines 212-215 in the second picture), which is confusing. Can someone explain this part? Thanks.
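To pin the question down, here is a compact numpy reimplementation of the Spectral Residual transform (my own sketch under stated assumptions, not the official code). The puzzling step, `trans * residual / amplitude`, simply keeps the original phase: `trans / amplitude` is the unit-phase spectrum, so the product is exp(R) * e^{i*phase}, i.e. a spectrum whose amplitude is the exponentiated residual R while the phase is unchanged.

```python
import numpy as np

def spectral_residual(x, win=3):
    trans = np.fft.fft(x)
    amplitude = np.abs(trans)
    log_amp = np.log(amplitude + 1e-8)
    # Average-filtered log spectrum = the "expected" log amplitude.
    avg_log = np.convolve(log_amp, np.ones(win) / win, mode="same")
    residual = np.exp(log_amp - avg_log)
    # Residual amplitude combined with the ORIGINAL phase, then back to time domain.
    saliency = np.abs(np.fft.ifft(trans * residual / (amplitude + 1e-8)))
    return saliency

x = np.sin(np.linspace(0, 20 * np.pi, 500))
x[250] += 5.0                                   # injected spike
sal = spectral_residual(x)
print("most salient index:", int(np.argmax(sal)))
```

The saliency map peaks at the injected spike: the smooth (expected) part of the log spectrum is subtracted away, leaving the "innovation" that the spike introduced.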
Hello everyone,
I’m working on a project in which I need to detect anomalies in a particular scene (two background scenes). The anomaly could be anything (bolts, pliers, glasses, etc.). I have generated synthetic training data with Unity because I have very few realistic images, and here comes the problem. I have looked through different techniques like Domain Adaptation, since I need a solution that involves training on synthetic images and doing inference on real ones, but each of them seems to be focused on the class of the image and thus could fail at finding a particular anomaly. I’m not an expert in this field and I’d like to hear an expert’s opinion, since I am a little bit stuck :(
thank you very much for you answers!
In SPSS there is an Anomaly Detection procedure that searches for unusual cases based on deviations from the norms of their cluster groups.
Where can I read in detail the algorithm by which this procedure works? How are clusters created?
I would appreciate your help.
Dear colleagues,
I recently read a publication on an anomaly detection algorithm called HBOS (Histogram-Based Outlier Score). The idea is to assign each sample xi an outlier score expressing how strongly that single sample deviates in comparison to the other samples of the given training data (the higher the value, the more likely it is an abnormal training sample; adding thresholding yields an anomaly detection method). This idea is very interesting; nonetheless, the score does not seem to be normalized, e.g. to [0,1], but I think the method can still be used to check, sample by sample, whether a training filter has led to an improvement. The question I want to open up: which metrics do you normally use when you apply a training filter to compare outlierness, and what are your experiences? There are distance-metric approaches and also binary classification metrics (P/R, AUC, ...), but I am looking for a metric that compares the outlierness of a sample with regard to the other samples. HBOS looks promising, though.
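For reference, the scoring idea is compact enough to sketch in numpy (a toy reimplementation under my own assumptions, not the authors' code): per-feature histograms estimate density, and the score sums the negative log densities, so higher means more outlierness. As noted above the raw score is unnormalized; ranks or a min-max rescaling over the sample are what make scores comparable across runs.

```python
import numpy as np

def hbos_scores(X, bins=10):
    """Histogram-based outlier score, summed over independent features."""
    n, d = X.shape
    scores = np.zeros(n)
    for j in range(d):
        hist, edges = np.histogram(X[:, j], bins=bins, density=True)
        # Map each value to its bin (inner edges -> indices 0..bins-1).
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, bins - 1)
        scores += -np.log(hist[idx] + 1e-12)
    return scores

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
X[0] = [6.0, -6.0]                        # planted outlier
scores = hbos_scores(X)
print("top outlier index:", int(np.argmax(scores)))
```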
Kindest Regards,
Adnan
We are using CMU-CERT r4.2 dataset for performing Insider threat detection. Are there any alternative to this data set?
The expected answer can be narrowed to anomaly detection algorithms in medical images.
How can a Long Short-Term Memory (LSTM) or CNN model be optimized for anomaly detection? Is there any possibility of preparing a lightweight model?
Hi all,
I'm using an autoencoder to detect anomalies in my dataset.
I used a normal dataset for training and detect anomalies on a fraud+normal dataset.
As I have observed so far, the autoencoder performed best at an optimal training dataset size; the ROC and PR AUC went down when the training dataset became too big. Is this because of overfitting?
Please refer to the attached photo below
Thanks in advance!

I have 99 scratch images and one partly blurred scratch image.
How do I detect the partly blurred image?
I tried one-class SVM and Isolation Forest, but they couldn't detect it.
What is the best method to handle imbalanced datasets in network anomaly detection?
Looking for the best DL model and platform for anomaly detection.
I am thinking about a Transformer or an LSTM. Do you think this is the right choice?
Thanks.
For network anomaly detection using artificial intelligence, which dataset and algorithm are advisable to use?
Dear all,
I am trying to collect vibration data from two similar motors, one of which is healthy and new while the other has a bearing problem. I do not have any historical data or the degradation path of the motors. What can I do with this data to predict the bearing failure? Can I implement RUL (remaining useful life) estimation, or should I set a threshold for anomaly detection? I would be thankful if someone could help, since I do not have much experience with machine learning algorithms.
I am working on anomaly detection in real time; my data looks as attached and is unlabelled. I want to detect whether the next minute's timestamp is an anomaly or not. My main column is num_part, which is the number of players; all other columns are supporting metrics.
Issues:
1.) I have this kind of data for 50 different domains.
2.) There is different seasonality in different domains; for example, domain 1 goes down for maintenance on Wednesday, domain b on Tuesday, etc.
The approach I am using:
As of now, I am calculating a Z-score for every five-minute interval and comparing it with historical data for alerting, but I am getting lots of false positives.
I am treating it as streaming data, like we have from sensors, not as a time series.
I want to alert my team in real time if the incoming data point is an anomaly.
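One direction I am considering (a sketch with made-up numbers, not a finished solution) is to score each new point against the history of the same weekday and time-of-day bucket for that domain, and to use a robust z-score (median/MAD) so that past maintenance spikes in the history do not inflate the threshold:

```python
import numpy as np

def robust_z(history, value):
    """Robust z-score against the history of the same (domain, weekday, slot) bucket."""
    med = np.median(history)
    mad = np.median(np.abs(history - med)) + 1e-9
    return 0.6745 * (value - med) / mad   # 0.6745 makes MAD comparable to std for normal data

rng = np.random.default_rng(0)
history = rng.normal(100, 5, size=200)    # same weekday+slot values from past weeks

print(robust_z(history, 104.0))           # typical value -> small score
print(robust_z(history, 160.0))           # genuine anomaly -> large score
```

Bucketing by weekday and time slot absorbs the per-domain seasonality (the Tuesday/Wednesday maintenance windows), and the robust statistics keep one past incident from silencing future alerts.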

What methods can we use to analyze the accuracy of unsupervised anomaly detection algorithms like HBOS or Isolation Forest?
In the literature we see primary longitudinal waves and secondary transversal waves. Both are used for material inspection, but can anyone say which is the better technique, or is it a case of multiple data giving higher confidence? I have been searching the literature for a good review paper on different waveforms for detecting different anomalies, but it does not seem to exist; can anyone point me in that direction?
Context:
Suppose we are given a baseline population (B) and an anomalous population (A), consisting of entities described by a set of numerical and categorical features (X). Let T denote a statistic computed over a population (count of entities, percentage of entities having a specific property, etc). Based on a distance metric D between T(B) and T(A), population A is labelled as anomalous.
Question:
How can one identify the subgroups of entities within A that lead to the observed difference between T(A) and T(B) (e.g. entities with X1 in {w11,w12,w13} and X2 > w22)? Equivalently, what are the subgroups of A whose removal leads to no statistically significant difference between T(A) and T(B)?
Reviewed literature:
Applying the Chi2 test of homogeneity on each feature coupled with the Cramer's V score could lead to a ranking of these features; however this approach provides a unidimensional segmentation of the population and doesn't account for combinations of features.
Example:
B can represent the patients admitted to a given hospital during a given month and A the patients admitted to the same hospital a month later, with X representing the demographic features of a patient (age, gender, income, ethnicity, etc.). Let T denote the total number of patients admitted with flu.
Given that T(A) is statistically larger than T(B), how to localize the subgroup of patients (in terms of their demographic features) that lead to the observed difference?
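To make the flu example concrete, here is a toy numpy sketch (names and sizes invented) of the brute-force version for a small categorical space: score every feature-value combination by its excess count in A over B, and the subgroup with the largest excess is the one driving the gap. Real subgroup-discovery methods scale this search rather than enumerating all combinations.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

def sample(n, extra_young_males=0):
    """Synthetic admissions: two categorical features per patient."""
    age = rng.choice(["young", "old"], n)
    sex = rng.choice(["male", "female"], n)
    # Population A gets an inflated ("young", "male") subgroup.
    age = np.concatenate([age, np.full(extra_young_males, "young")])
    sex = np.concatenate([sex, np.full(extra_young_males, "male")])
    return age, sex

age_b, sex_b = sample(2000)                          # baseline B
age_a, sex_a = sample(2000, extra_young_males=400)   # anomalous A

# Excess count in A over B for each feature-value combination.
best = max(
    product(["young", "old"], ["male", "female"]),
    key=lambda g: ((age_a == g[0]) & (sex_a == g[1])).sum()
                - ((age_b == g[0]) & (sex_b == g[1])).sum(),
)
print("subgroup driving the gap:", best)
```

Removing the winning subgroup from A brings the count statistic T(A) back in line with T(B), which is exactly the "no longer significantly different after removal" criterion in the question.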
I am looking to identify the following on high dimensional data.
1. Clusters
2. Outliers
I have tried different dimension-reduction approaches and used the reduced dimensions to plot the data and identify patterns graphically.
I have identified the outlier data points through other approaches, but not through clustering. The data contains user activities and my objective is to find the similar group of users and anomalous data points (rows).
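One combined route (a sketch on synthetic data; the eps and min_samples values are placeholders that would need tuning on the real activity data) is PCA for dimensionality reduction followed by DBSCAN, which returns clusters and flags outliers (label -1) in a single pass:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

# Synthetic high-dimensional data: two user groups plus a few anomalous rows.
rng = np.random.default_rng(0)
group1 = rng.normal(0, 0.3, size=(100, 20))
group2 = rng.normal(3, 0.3, size=(100, 20))
outliers = rng.uniform(-10, 10, size=(5, 20))
X = np.vstack([group1, group2, outliers])

# Reduce to a few components, then cluster; DBSCAN labels noise points -1.
Z = PCA(n_components=5, random_state=0).fit_transform(X)
labels = DBSCAN(eps=2.0, min_samples=5).fit_predict(Z)

n_out = int((labels == -1).sum())
print("clusters:", sorted(set(labels) - {-1}), "outliers:", n_out)
```

Density-based clustering fits this objective well because it gives both outputs at once: similar users fall into dense clusters, and rows that belong to no dense region come back as noise.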
Hello and good day/night to all colleagues!
Suppose I have data: attributes of requests (e.g. from history of online bank system)
- client IP (with IP info - continent, country, city, latitude, longitude)
- client agent (Operating System, Browser)
- some info about account (company, user)
- some info about the request (date, time, duration, size, from which url)
Could I build a fraud detection system based on analysis of these data (or just an anomaly detection system)?
Are there any common or specific approaches to such systems?
I'm interested in any information (books, articles, keywords to search, ideas, similar projects).
Hi all, I am about to start my master's thesis on anomaly detection in multivariate time series data. I am interested in references to papers where related work has been done in this area, or suggestions of techniques, especially unsupervised approaches.
Dear All,
Why do we need samples of both classes for the training of binary classification algorithms if one-class algorithms can do the job with only samples from one class? I know that one-class algorithms (like one-class svm) were proposed with the absence of negative data in mind and that they seek to find decision boundaries that separate positive samples (A) from negative ones (Not A). Hence the traditional binary classification problem (between (A) and (B) for example) can be formulated as a classification of (A) and (not A = B).
Is it about better classification results or am I missing something?
Thank you in advance
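A small demonstration of the usual answer: a one-class model only knows where A lives, while a binary classifier also uses where B lies, so it can place the boundary between the classes rather than tightly around A. A sketch under assumed synthetic data (scikit-learn, two Gaussian blobs):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM, SVC
from sklearn.metrics import accuracy_score

# Two nearby classes A (label 0) and B (label 1).
X, y = make_blobs(n_samples=400, centers=[[0, 0], [3, 3]],
                  cluster_std=1.0, random_state=0)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# One-class SVM: trained on class A only; predicts +1 (A) / -1 (not A).
ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(X_train[y_train == 0])
pred_oc = (ocsvm.predict(X_test) == -1).astype(int)   # -1 -> "B"

# Binary SVM: trained on both classes.
svc = SVC(random_state=0).fit(X_train, y_train)
pred_bin = svc.predict(X_test)

print(accuracy_score(y_test, pred_oc), accuracy_score(y_test, pred_bin))
```

The one-class model pays twice: it rejects some genuine A points near its tight boundary, and it accepts B points that happen to fall inside A's support. When negatives are available, using them typically improves results; one-class methods are for when (Not A) is unavailable or unrepresentative.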
An anomaly detection method can have certain formal properties. The idea presented here is to apply an anomaly detection algorithm to a dataset, remove the discovered anomalous points, and then apply the same algorithm again to the remaining data.
Let S be a set of points, F an outlier detection method, and A the set of outliers of S discovered by F. In this case we can write A=F(S). If F(S-A)={} (the empty set), then F is an invariant algorithm for S (or S is invariant with respect to F). In this case, F finds all the outliers of S in one fell swoop. If F is invariant for every set, we simply say that F is invariant.
Do you know a method F that is invariant for each set?
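For intuition, here is a small demonstration (an illustrative sketch, not an answer to the open question) that the common 3-sigma rule is *not* invariant: removing the first outlier shrinks the standard deviation, which can expose a second one.

```python
import numpy as np

def f_3sigma(s):
    """Outlier set under the 3-sigma rule: points more than 3 standard
    deviations from the mean of the set."""
    s = np.asarray(s, dtype=float)
    mask = np.abs(s - s.mean()) > 3 * s.std()
    return set(s[mask])

# A set with one extreme point (100) and one moderately extreme point (8):
# the 100 masks the 8 on the first pass by inflating the std.
S = set(np.concatenate([np.linspace(-1, 1, 50), [8.0, 100.0]]))

A = f_3sigma(list(S))          # first pass: finds only 100
A2 = f_3sigma(list(S - A))     # second pass: now 8 stands out
print(A, A2)
```

So F(S-A) is non-empty here. Note that iterating any F to a fixed point (remove outliers, re-run, repeat until F returns the empty set) trivially yields an invariant procedure, though whether its output is still a sensible outlier set is exactly the interesting question.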
I am familiar with using autoencoders to detect fraud in credit card transactions, but my data is a time series.
Thanks
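A common way to reuse an ordinary autoencoder on a time series is the sliding-window trick: each overlapping window becomes one fixed-length sample, the autoencoder is trained on windows from a normal period, and the per-window reconstruction error is the anomaly score. A minimal sketch on synthetic data, using scikit-learn's `MLPRegressor` fit to reproduce its own input as a stand-in autoencoder (a dedicated Keras/PyTorch model would be the usual choice):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
# Synthetic series: a noisy sine wave with a large spike injected at t = 500.
t = np.arange(1000)
series = np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.1, size=t.size)
series[500] += 8.0

# Slide a window over the series so each row is one fixed-length snippet.
w = 20
X = np.lib.stride_tricks.sliding_window_view(series, w)

# Train the "autoencoder" only on the early, anomaly-free part of the series.
ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
ae.fit(X[:400], X[:400])

# Anomaly score = per-window reconstruction error over the whole series.
err = ((ae.predict(X) - X) ** 2).mean(axis=1)
print(int(err.argmax()))  # a window covering the spike scores highest
```

Windows covering t = 500 (start indices 481–500) reconstruct poorly, so the spike stands out; for genuinely sequential structure, an LSTM/GRU autoencoder replaces the windowing.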
Specifically, I want to know real-time machine learning models that are capable of identifying anomalies considering streams of data with multiple features (i.e. multivariate time series).
I have found the Numenta Anomaly Benchmark (NAB), a scoring mechanism for real-time machine learning models. However, most of the models compared there are for univariate time series and have not been extended to multivariate cases. Therefore, I am searching for similar real-time machine learning models that can handle multivariate input data streams.
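For reference, even a very simple streaming baseline can score multivariate points one at a time: keep exponentially weighted per-feature means and variances, and flag points whose worst feature z-score is large. A minimal sketch (this is a hand-rolled baseline, not one of the NAB models):

```python
import numpy as np

class StreamingZScore:
    """Online anomaly score for multivariate streams using exponentially
    weighted per-feature means and variances."""
    def __init__(self, n_features, alpha=0.02):
        self.alpha = alpha
        self.mean = np.zeros(n_features)
        self.var = np.ones(n_features)

    def update(self, x):
        x = np.asarray(x, dtype=float)
        # Score the point against the current estimates...
        z = np.abs(x - self.mean) / np.sqrt(self.var)
        score = z.max()                      # the worst feature dominates
        # ...then fold the point into the running statistics.
        self.mean = (1 - self.alpha) * self.mean + self.alpha * x
        self.var = (1 - self.alpha) * self.var + self.alpha * (x - self.mean) ** 2
        return score

det = StreamingZScore(n_features=3)
rng = np.random.RandomState(0)
scores = [det.update(rng.normal(0, 1, 3)) for _ in range(500)]
spike_score = det.update([0.0, 12.0, 0.0])   # sudden jump in one feature
```

Because `update` is O(number of features) per point, this runs at stream rate; it misses cross-feature correlation anomalies, which is where the more sophisticated streaming models earn their keep.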
For anomaly detection, how can the false-positive rate be improved using feedback in deep learning, given an unbalanced dataset?
I'm trying to calculate rainfall anomaly from monthly rainfall measurements taken between 1997 and 2017 at a single location in Panama (subjected to highly seasonal rainfall as well as El Niño-derived fluctuations in rainfall). My data is one measurement per month per year; 20 values for January (1997-2017), 20 for February, and so on. I have attached a screen shot of the first few rows of data if further clarification is needed.
I am aware of the common anomaly technique of subtracting the long term mean from the actual value and dividing by the standard deviation [(x-xbar)/stdev], but as I understand it this should only be used on normally distributed data.
My issue is that while SOME of my months have normally distributed (gaussian) rainfall over the sampling period, others do not - some are uniformly distributed, some are lognormal. I'm not clear on whether I can use the same anomaly calculation for these different distributions, or if it's appropriate to transform the data. It seems inappropriate to use different anomaly calculations on different months, since I want to eventually compare anomalies from different months together.
I am new to this type of data so I am feeling a bit out of my depth. Any advice would be appreciated - thank you!
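One distribution-free alternative that stays comparable across months is a rank-based (normal-score) anomaly: within each month, convert each value to its empirical percentile and map that through the standard normal quantile function. This makes no assumption about the monthly distribution (lognormal, uniform, or Gaussian) and puts every month on the same standardized scale. A sketch on hypothetical data shaped like yours (20 years x 12 months; the lognormal draws stand in for your skewed months):

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.RandomState(0)
df = pd.DataFrame({
    "year": np.repeat(np.arange(1997, 2017), 12),
    "month": np.tile(np.arange(1, 13), 20),
    "rain": rng.lognormal(mean=4, sigma=0.6, size=240),
})

def normal_score(x):
    # Rank within the month, convert to a plotting position in (0, 1),
    # then map through the standard normal quantile function.
    p = x.rank() / (len(x) + 1)
    return pd.Series(stats.norm.ppf(p), index=x.index)

# Distribution-free standardized anomaly, comparable across months.
df["anomaly"] = df.groupby("month")["rain"].transform(normal_score)
```

The resulting anomalies behave like z-scores (roughly N(0,1) within each month), so a dry January and a dry September can be compared directly, at the cost of losing the absolute magnitude of the departure in mm.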

I've sketched the following basic metrics categories that one can get from a cloud-native application monitoring solution and use for e.g. anomaly detection:
- structural (e.g. length of a request trace)
- temporal (e.g. traces as distributed on the timescale, duration of the trace and calls within it)
- performance (e.g. 99th-percentile response time, throughput)
- resource (e.g. CPU utilization, memory utilization)
- workload (e.g. request rate by endpoint)
- capacity (e.g. number of pods for service)
What do you think about it? Is there something missing? Can some more coarse-granular grouping be introduced?
What would be a good search range for the nu and gamma values in OneClassSVM if I want to do a grid search? I have ~17,000 training samples of one class, with each sample consisting of around 300 dimensions (I might apply PCA or another method to reduce dimensionality).
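Since nu upper-bounds the training-outlier fraction, (0, 0.5] is a sensible range for it, and gamma is usually searched on a log scale around scikit-learn's `'scale'` heuristic, 1/(n_features * variance). Without labeled outliers, `GridSearchCV` has no natural score, so a manual loop with a surrogate criterion is common; the sketch below (smaller synthetic data standing in for your 17,000 x 300 set, and a simple "held-out inlier acceptance should match 1 - nu" criterion, which is one heuristic among several) shows the shape of such a search:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = rng.normal(0, 1, size=(2000, 30))
X_train, X_val = X[:1500], X[1500:]   # held-out inliers for model selection

nus = [0.01, 0.05, 0.1, 0.2, 0.5]
g0 = 1.0 / (X.shape[1] * X.var())     # the 'scale' heuristic
gammas = g0 * np.logspace(-2, 2, 5)   # two decades either side of it

results = []
for nu in nus:
    for gamma in gammas:
        model = OneClassSVM(nu=nu, gamma=gamma).fit(X_train)
        # A well-behaved model should accept ~(1 - nu) of held-out inliers;
        # large deviations signal a boundary that is too tight or too loose.
        acc = (model.predict(X_val) == 1).mean()
        results.append((abs(acc - (1 - nu)), nu, gamma))

best = min(results)   # tuples sort by the criterion first
print("best nu=%.2f gamma=%.4g" % (best[1], best[2]))
```

With 300 raw dimensions, reducing first (as you suggest) also shrinks the useful gamma range, since `g0` moves with the feature count and variance.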
We are working on anomaly detection, where we want to quantify the influence of a subset of attributes (or features) for making a data set anomalous.
Let us consider an illustrative example as follows.
We have an anomalous data set D having N attributes and M instances. Now, we execute some deterministic anomaly detection algorithm to retrieve the anomalous instances from D. Let us consider that this algorithm has retrieved Q (Q << M) instances as anomalous.
In the next phase, we wish to quantify the influence of a subset of attributes, say P (P << N), in making Q instances of data set D anomalous. How can we quantify this?
Any help is well appreciated in advance.
Thanks & Regards
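One simple, model-agnostic way to quantify this is a permutation-style score: for the Q flagged instances, replace only the attributes in P with values drawn from the rest of D, re-score, and measure how much the anomaly scores move toward normal. A sketch with an IsolationForest as the (stand-in) deterministic detector and synthetic data in which the anomalies are, by construction, caused by two attributes:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
# Data set D: M=500 instances, N=6 attributes; anomalies are extreme only
# in attributes 0 and 1 (the "responsible" subset).
D = rng.normal(0, 1, size=(500, 6))
D[:10, :2] += 6.0                       # Q = first 10 rows made anomalous

model = IsolationForest(random_state=0).fit(D)
base = model.decision_function(D[:10])  # scores of the Q flagged instances

def influence(P):
    """Mean score increase on Q when the attributes in P are replaced by
    values resampled from the non-anomalous part of the data."""
    D_perm = D[:10].copy()
    D_perm[:, P] = D[rng.randint(10, 500, size=10)][:, P]
    return (model.decision_function(D_perm) - base).mean()

print(influence([0, 1]), influence([4, 5]))
```

Replacing the truly responsible subset raises the scores far more than replacing an irrelevant one, so the score gap quantifies P's influence. SHAP values on the anomaly score are the more principled (but heavier) version of the same idea.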
My goal is to detect intranet DDoS attacks and their variants through log datasets. I have other insider-threat log datasets for anomaly detection, like CERT or LANL; however, it is not certain that their anomalies are caused by DDoS. Where can I get a new log dataset for insider threat detection that was demonstrably attacked by DDoS?
What's the best open-source (i.e., free) approach/library/tool for unsupervised/semi-supervised (i.e., with limited to no training data) time-series anomaly detection, for data like this: https://github.com/numenta/nupic/blob/master/src/nupic/datafiles/extra/nycTaxi/nycTaxi.csv?
I’m currently using features built with statistics over a certain window. This takes, e.g., 10 data points and combines them into one using PCC, KL divergence, or a simple average (see link). The predictions are also made over a sliding window, meaning one anomaly will be present in multiple windows.
If you have two classes, ‘normal’ and ‘anomaly’, how do I best score performance on the test set?
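One common option is to collapse the overlapping window predictions back to point level before scoring, so each anomaly is counted once regardless of how many windows cover it. A sketch (the OR-vote and the toy numbers are illustrative choices, not the only convention; a majority vote, or event-level metrics like NAB's, are alternatives):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def windows_to_points(window_pred, n_points, w):
    """Map per-window predictions back to per-point predictions: a point
    is flagged if ANY window covering it was flagged (an OR vote)."""
    point_pred = np.zeros(n_points, dtype=int)
    for i, p in enumerate(window_pred):
        if p:
            point_pred[i:i + w] = 1       # window i covers points [i, i+w)
    return point_pred

# Hypothetical example: 100 points, window length 10, true anomaly at 40..42.
n, w = 100, 10
y_true = np.zeros(n, dtype=int); y_true[40:43] = 1
window_pred = np.zeros(n - w + 1, dtype=int)
window_pred[35:41] = 1                     # detector fires on windows 35..40

y_pred = windows_to_points(window_pred, n, w)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", zero_division=0)
```

Note how the OR vote trades precision for recall: here every true point is caught (recall 1.0), but the flagged region is w-1 points wider than the anomaly, which depresses precision; this is exactly the effect to decide on before reporting numbers.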
Greetings. In the ISCX-2012 dataset, there is a labelled-flow file and pcap files for each day. Could someone tell me how to generate a .csv for use in machine learning algorithms? Can I use labelled-flows.xml, or do I have to generate the .csv from the pcap files? Any link on how to do this?
The statistics-based approaches can be HMMs or the non-parametric cumulative sum (CUSUM) algorithm, but I am not able to understand how to implement these ideas.
Any help would be highly appreciated.
Thanks
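The CUSUM part is short to implement: track the cumulative excess of each observation over the in-control mean plus an allowance k, clamp at zero, and alarm when the statistic crosses a threshold h. A minimal one-sided sketch on synthetic data (the in-control mean, k, and h below are illustrative choices; h is set conservatively here):

```python
import numpy as np

def cusum(x, mu0, k, h):
    """One-sided CUSUM for an upward mean shift: track
    S_t = max(0, S_{t-1} + (x_t - mu0 - k)) and alarm when S_t > h.
    mu0 = in-control mean, k = allowance (~half the shift to detect),
    h = decision threshold."""
    s, alarms = 0.0, []
    for t, xt in enumerate(x):
        s = max(0.0, s + (xt - mu0 - k))
        if s > h:
            alarms.append(t)
            s = 0.0           # restart after an alarm
    return alarms

rng = np.random.RandomState(0)
# 200 in-control samples, then the mean shifts from 0 to 2.
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(2, 1, 100)])
alarms = cusum(x, mu0=0.0, k=0.5, h=8.0)
print(alarms[0])  # first alarm, shortly after the shift at t = 200
```

Choosing k ≈ half the shift you care about and h from a desired in-control run length is the standard tuning recipe; a mirrored statistic with (mu0 - k - x_t) handles downward shifts.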
I want to find the expected (normal) value in a setting where I already know which points are anomalies.
How can pattern-learning anomaly detection be carried out for SQL injection attacks?
Which learning/anomaly detection tools can be used to classify a SQL injection dataset?
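For the classification side, a common pattern is character n-gram features over the query text plus an off-the-shelf classifier, since injection artifacts (quotes, `--` comments, `1=1` tautologies) show up as distinctive n-grams. A sketch with a tiny hypothetical labeled set (real work would use a public SQLi corpus):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical labeled set: 0 = benign query, 1 = injection attempt.
queries = [
    "SELECT name FROM users WHERE id = 42",
    "SELECT email FROM users WHERE id = 7",
    "UPDATE users SET name = 'bob' WHERE id = 3",
    "SELECT price FROM items WHERE item_id = 19",
    "SELECT name FROM users WHERE id = 1 OR 1=1",
    "SELECT * FROM users WHERE name = '' OR '1'='1' --",
    "admin' --",
    "1; DROP TABLE users; --",
] * 10
labels = [0, 0, 0, 0, 1, 1, 1, 1] * 10

# Character n-grams pick up the injection artifacts directly.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
)
clf.fit(queries, labels)
pred = clf.predict(["SELECT id FROM users WHERE id = 5",
                    "x' OR '1'='1' --"])
```

For the pattern-learning (one-class) variant, the same n-gram features can feed a one-class model trained on benign queries only, flagging queries whose n-gram profile departs from the learned normal patterns.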
I use a dataset of activities that an elderly person performed over a year. It has features of start time, end time, and activity name, as below:
08:52:12 - 08:55:38 - Washing hand/face
08:57:36 - 09:05:53 - Make coffee
09:07:38 - 09:12:52 - Washing hand/face
09:13:57 - 09:21:10 - Make sandwich
09:23:08 - 09:43:11 - Eating
..
I want to insert abnormal situations in which an activity lasts longer than usual, or to increase the frequency of an activity during a day.
I'm programming in Python. What should I do?
If I insert an abnormal record, should I change the times of all the records that come after it?
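On the last question: yes, if you lengthen or insert an activity, every later record that day should shift by the same amount, otherwise records will overlap and the day becomes inconsistent. A minimal sketch, assuming the record format shown above (the function name and the 40-minute delta are just illustrations):

```python
from datetime import datetime, timedelta

records = [
    ("08:52:12", "08:55:38", "Washing hand/face"),
    ("08:57:36", "09:05:53", "Make coffee"),
    ("09:07:38", "09:12:52", "Washing hand/face"),
    ("09:13:57", "09:21:10", "Make sandwich"),
    ("09:23:08", "09:43:11", "Eating"),
]

FMT = "%H:%M:%S"

def stretch_activity(records, index, extra_minutes):
    """Make the activity at `index` last `extra_minutes` longer and shift
    every later record by the same amount, so the day stays consistent."""
    delta = timedelta(minutes=extra_minutes)
    out = []
    for i, (start, end, name) in enumerate(records):
        s = datetime.strptime(start, FMT)
        e = datetime.strptime(end, FMT)
        if i == index:
            e += delta            # the anomalously long activity
        elif i > index:
            s += delta            # everything afterwards starts/ends later
            e += delta
        out.append((s.strftime(FMT), e.strftime(FMT), name))
    return out

abnormal = stretch_activity(records, index=1, extra_minutes=40)
```

Inserting an extra occurrence of an activity (the frequency anomaly) works the same way: splice in the new record and shift everything after it by its duration, taking care at midnight boundaries if your times wrap around.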
In general, to detect an anomaly in multivariate data (not necessarily time series), do we need to check which distribution the data is drawn from? This is straightforward for the univariate case, but how do we find a mixture distribution when the individual variables in the data have been drawn from different distributions (other than Gaussian)? How should one approach such a situation to detect anomalies? Transforming the data to near-normal is an option, but wouldn't that distort the very properties of the underlying probability distribution and lead to false anomalies?
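One standard way to sidestep the "which distribution?" question is to fit a flexible density model, such as a Gaussian mixture, to the joint data and flag low-likelihood points: the mixture approximates the joint density without assuming any single parametric family per variable, so no normalizing transform is needed. A sketch on synthetic data with deliberately non-Gaussian marginals (scikit-learn; the component count and 1st-percentile threshold are illustrative choices):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Multivariate data whose marginals are non-Gaussian
# (one lognormal, one uniform).
normal_data = np.column_stack([
    rng.lognormal(0, 0.5, 1000),
    rng.uniform(-1, 1, 1000),
])

# A mixture of Gaussians approximates the joint density directly.
gmm = GaussianMixture(n_components=5, random_state=0).fit(normal_data)

# Anomaly score = log-likelihood; threshold at the 1st percentile
# of the training scores.
train_ll = gmm.score_samples(normal_data)
threshold = np.percentile(train_ll, 1)

test_points = np.array([[1.0, 0.0],     # typical point
                        [50.0, 5.0]])   # far outside the support
flagged = gmm.score_samples(test_points) < threshold
```

Kernel density estimation and isolation forests are the other common distribution-free routes; all three avoid the distortion worry you raise, since nothing is transformed toward normality.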
Recent experiments showed the application of statistical methods to detect shifts in in-home activity routines. These methods considered each type of house activity, at the initial time, to be normally distributed, with the distribution segmented into several regions of differing degrees of abnormality, achieving accuracy beyond 90%.
Hoque et al. focused on reducing false alarms in clustering-based anomaly detection of in-home activities with a rule-based approach.
Source: E. Hoque, R. F. Dickerson, S. M. Preum, M. Hanson, A. Barth and J. A. Stankovic, “Holmes: A Comprehensive Anomaly Detection System for Daily In-home Activities,” 2015 International Conference on Distributed Computing in Sensor Systems, Fortaleza, pp. 40-51, 2015.
With respect to network anomaly detection in smart home environments using machine learning: what are the constraints, such as the small datasets available for training, or the few labeled training examples needed?
Hi Everyone,
I need your advice on using a multivariate Gaussian distribution for multiple attributes (more than two). For example, my input data set contains multiple attributes {A1, A2, A3, A4, ..., An}. I want to use a Gaussian distribution to observe the trend in the input data and use it for techniques like anomaly detection.
In all the examples I have seen on the internet, the multivariate Gaussian distribution is used for only 2 dimensions, i.e. x1, x2. Could you all let me know whether using a multivariate Gaussian distribution for more than two attributes is a valid case?
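Yes, it is entirely valid: the multivariate Gaussian is defined for any number of dimensions; the 2-D examples are popular only because they can be plotted. A minimal sketch in 10 dimensions using SciPy (the feature count, synthetic data, and 1st-percentile threshold are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.RandomState(0)
n_features = 10        # works for any number of attributes, not just 2
X = rng.normal(0, 1, size=(5000, n_features))

# "Fitting" = estimating the mean vector and covariance matrix.
mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
dist = multivariate_normal(mean=mu, cov=cov)

# Flag points whose density falls below the 1st percentile of the
# training densities.
logpdf = dist.logpdf(X)
threshold = np.percentile(logpdf, 1)

x_new = np.zeros(n_features)          # near the mean: most typical point
x_odd = np.full(n_features, 4.0)      # far out in every attribute
print(dist.logpdf(x_new) > threshold, dist.logpdf(x_odd) > threshold)
```

The practical caveats in higher dimensions are estimating the n x n covariance matrix reliably (you need enough samples relative to n, or shrinkage) and keeping it invertible, not any limit in the distribution itself.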
I am working on time-series data where the phenomenon sometimes changes rapidly. I would like to learn about methods that have been tried on such data. I am also interested in classification and anomaly detection for time-series data.
I need the HYDICE Urban data set, or another similar data set, for hyperspectral anomaly detection. Can anyone help me? Please let me know; this is an urgent need.
I want to know a good way of implementing this (either supervised or unsupervised), working code with a dataset, or a good approach to solving this problem.