Anomaly Detection - Science topic

Questions related to Anomaly Detection
  • asked a question related to Anomaly Detection
Question
2 answers
A very large research project has started in the UK, and it is hoped that 5 million people over the age of 18 will take part.
I have joined it, and my husband is there to help me. I am now waiting for my appointment in a caravan for a screening blood test, a blood pressure reading, and a cholesterol measurement. They are screening for risk of future disease.
I've never encountered such a large project. My PhD and its follow-ups (at 5 and 10 years) originally had only 22 participants, and there were fewer in the follow-ups. The method was Constructivist Grounded Theory. So I'm looking at both ends of the continuum!
According to the letter that came through the letterbox this morning, 1,341,669 people have got involved so far. It is the largest ever UK health research programme. Another communication indicated that the participants are very varied.
Relevant answer
Answer
Dear Doctor
[A bold agenda to advance health equity and build resilience in a turbulent world. This new strategy for global health, WHO’s Fourteenth General Programme of Work, 2025–2028 (GPW 14), sets a bold agenda to get the world back on track to achieve the health-related Sustainable Development Goals (SDGs) while advancing health equity and building health systems resilience in our increasingly turbulent world. Anchored in WHO’s mission to promote, provide and protect health and well-being for all, and WHO’s constitutional commitment to gender equality, universality and human rights, the new strategy has six strategic objectives that respond to the major health challenges and crises of our time: tackling health risks associated with our rapidly changing climate; preventing disease through joint action on the determinants of health; advancing primary health care (PHC) and essential health system capacities in order to accelerate efforts to achieve common goals and progress towards universal health coverage (UHC); improving health service coverage and financial protection; and strengthening prevention of, preparedness for and response to health emergencies.]
  • asked a question related to Anomaly Detection
Question
1 answer
Federated learning supports privacy-sensitive applications and has reshaped how AI models are built for autonomous vehicles, traffic prediction, monitoring, healthcare and medical AI, telecommunications, IoT and industrial IoT, cybersecurity, pharmaceutics, and industrial management. Federated learning remains a comparatively new field with many research possibilities for accomplishing privacy-preserving AI.
Relevant answer
Answer
For anomaly detection in real-time environmental data using Federated Learning (FL), the most beneficial methods are those that support efficient, privacy-preserving, and adaptive learning across distributed nodes (e.g., edge devices with gas sensors, anemometers, etc.).
Here’s a breakdown of the most suitable FL methods and why they work well for anomaly detection in this context:
Best Federated Learning Methods for Real-Time Anomaly Detection
1. Federated Averaging (FedAvg) with Lightweight Models
  • How it works: Clients train local models on their data, send model updates to the server, and the server averages them.
  • Why it's good: Efficient and widely supported. Compatible with lightweight anomaly detection models (e.g., LSTM, autoencoders). Easy to implement for edge devices collecting real-time environmental data.
💡 Use with LSTM-Autoencoders or One-Class SVM for time series anomaly detection.
2. Federated Learning with Differential Privacy (DP-FedAvg)
  • Why it's better: Adds differential privacy to ensure that environmental data (e.g., sensitive pollution levels near factories) remains confidential.
  • Anomaly detection benefit: Maintains privacy while allowing centralized detection of outliers without revealing raw data.
3. Federated Anomaly Detection (FAD) with Model Personalization
  • How it works: Each client has its own personalized model, while a shared global model learns general patterns.
  • Why it's good: Real-time environmental data often varies by location—personalized models adapt to local context. Anomalies (e.g., a spike in gas levels) can be locally specific.
  • Techniques used: Meta-learning (e.g., FedMeta). Multi-task learning.
4. Hierarchical Federated Learning
  • How it works: Introduces an intermediate aggregation layer (e.g., regional aggregators before the central server).
  • Why it’s good for real-time use: Reduces communication overhead. Allows faster, localized anomaly detection at the edge or regionally. Scales well for large sensor networks.
5. Federated Online Learning
  • How it works: Models update incrementally as new data arrives (as opposed to batch updates).
  • Why it fits real-time needs: Supports continuous adaptation to new environmental patterns or sensor drift. Reduces latency in anomaly detection.
💡 Combine with streaming-friendly models like online k-means, online PCA, or incremental autoencoders.
🧠 Choosing the Right Approach Depends On:
Factor | Suggested Approach
Many heterogeneous sensors | FAD with model personalization
High privacy requirement | DP-FedAvg or Secure Aggregation
Need for real-time updates | Federated Online Learning
Regional data patterns | Hierarchical FL
Simpler, general patterns | FedAvg with Autoencoder
🛠️ Real-World Toolkits & Frameworks
  • TensorFlow Federated (TFF) – Good for prototyping FL with anomaly detection models.
  • PySyft + PyTorch – Supports advanced FL with differential privacy.
  • Flower (FL) – Scalable and supports heterogeneous clients.
  • FedLearn – Includes methods tailored for anomaly detection.
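To make the aggregation step of FedAvg (method 1 above) concrete, here is a minimal sketch in plain Python. It assumes each client has already trained locally and sends a flat list of model parameters along with its number of training samples. The function name `fed_avg` and the example values are illustrative and do not come from any of the frameworks listed above.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Average flat parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # A client holding more data contributes proportionally more.
            global_weights[i] += w * n_samples / total
    return global_weights


# Three hypothetical sensor nodes with different amounts of local data.
clients = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 30, 60]
print(fed_avg(clients, sizes))
```

In a real deployment the parameter vectors would come from the local anomaly detection model (for example an autoencoder), and the averaged vector would be sent back to the clients for the next round.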
  • asked a question related to Anomaly Detection
Question
7 answers
It is extremely hard to search across disciplines for this information, so: have you encountered any anomalies or inconsistencies in SR (special relativity) in your field or research?
Relevant answer
Answer
Today is A. Einstein's birthday. The sooner the fallacy of the theory of relativity is recognized, the better it will be for this remarkable scientist. Let's not forget that he received the Nobel Prize NOT FOR the theory of relativity (in this case, the Nobel Committee showed amazing foresight).
  • asked a question related to Anomaly Detection
Question
1 answer
Hello everyone,
I’m looking for a motivated collaborator to join an ongoing research paper focusing on applying Retrieval-Augmented Generation (RAG) pipelines to a novel real-world use case. The project explores advanced ML techniques for information retrieval and pattern detection with significant practical impact.
I’m seeking someone with:
- Experience in NLP, RAG pipelines, and anomaly detection
- Strong experimental and model development skills
- Familiarity with academic writing and publications
As for my background: I’m a Master’s student in Information Technology at Worcester Polytechnic Institute (WPI), specializing in data science and machine learning. My work includes SBERT-based innovation analysis, a Streamlit-deployed recommender system, and a RAG pipeline for legal document analysis. I have also published a research paper on IoT, available on my ResearchGate profile.
Currently, I am also interning for a marketing agency in Chicago where I am working on building sentiment analysis models to help drive their social media strategy.
I’m currently collaborating with a researcher who brings 2 years of work experience as a software engineer at a top tech startup in Europe, adding strong engineering expertise to our team.
This collaboration offers shared authorship on a high-impact paper and the chance to work on cutting-edge ML applications alongside passionate researchers.
If you’re interested, I’d love to connect and discuss further.
Best regards,
Harsh Deshpande
Relevant answer
Answer
Are you still looking for collaboration?
  • asked a question related to Anomaly Detection
Question
3 answers
I am researching how to enhance anomaly detection in edge computing using deep learning. The main challenge is the limited computational resources available on edge devices.
I am considering a hybrid model that combines lightweight neural networks with traditional anomaly detection algorithms. Are there any recent studies or research papers on this topic?
Additionally, what techniques or tools could help optimize deep learning models for edge computing? Any recommendations for frameworks or efficient architectures would be highly appreciated!
Relevant answer
Answer
Kubilay Ayturan, thank you so much.
  • asked a question related to Anomaly Detection
Question
7 answers
Cybersecurity Specialist | Security Architect | AI for Financial Systems
About:
I am Lizell, a seasoned Security Architect with extensive experience in banking systems, specializing in designing and implementing secure architectures. I hold SANS certifications and have a strong background in anomaly detection for financial transactions using artificial intelligence.
Currently, I am exploring advanced methods in fraud prevention and the efficacy of multi-factor authentication (MFA) in banking systems, particularly in the Peruvian market. I am open to collaborative research opportunities in these areas or related topics within cybersecurity and AI.
Research Interests:
Anomaly detection in financial systems
Cybersecurity for banking platforms
Multi-factor authentication (MFA) efficacy
Artificial intelligence applications in fraud detection
Contact Information:
Feel free to connect if you are interested in collaborating on projects related to cybersecurity, financial fraud detection, or AI applications in secure systems.
Relevant answer
Answer
Quite interesting. I can always be your research partner in areas of fraud detection and MFA
  • asked a question related to Anomaly Detection
Question
1 answer
Dear all,
I am doing a marine magnetic survey at a jetty/barge, where the seabed is scattered with various dumped materials (confirmed by the side-scan sonar mosaic). After producing the QAS grid, I found that the anomaly patches show a "survey line-following" trend: you can easily tell the survey line orientation just by looking at the QAS result. The result looks unrealistic, and I couldn't figure out the main cause. I have made a tentative assumption to try to explain it (see picture 7 attached), and I have tried a larger number of iterations when producing the residual grid.
I have attached the detailed processing steps, together with illustrations, to make the problem easy to follow. If you need more information, please leave a comment and I will update you very soon. I would really appreciate it if you could help me understand this. Thank you in advance.
Relevant answer
Answer
Try to represent your data with contouring software such as Surfer (by Golden Software).
Good luck
Rainer
  • asked a question related to Anomaly Detection
Question
2 answers
AI-driven anomaly detection systems can significantly enhance real-time threat identification and prevention in distributed networks by leveraging advanced machine learning algorithms and data analysis techniques. Here's how:
  1. Behavioural Analysis: AI can monitor network traffic and user behaviour patterns continuously, identifying deviations from normal behaviour that may indicate potential threats such as malware, phishing attempts, or insider attacks.
  2. Real-Time Detection: Traditional methods often rely on predefined rules or signature-based detection, which can miss new or evolving threats. AI systems, however, can detect anomalies in real-time by analysing patterns and flagging unusual activities as soon as they occur.
  3. Scalability and Adaptability: Distributed networks generate vast amounts of data, which can be overwhelming for human analysts or rule-based systems. AI can process this data at scale, adapting to changes in network architecture or traffic patterns without manual intervention.
  4. Reduced False Positives: AI models can differentiate between legitimate anomalies (e.g., a new software update rollout) and actual threats, reducing the number of false positives and allowing security teams to focus on real issues.
  5. Proactive Threat Prevention: By identifying early indicators of potential attacks, such as unusual login attempts or data transfers, AI systems can trigger preventive measures like isolating affected devices or blocking suspicious IPs before a breach occurs.
  6. Continuous Learning: AI systems can learn from past incidents, refining their detection models to improve accuracy over time. This ability makes them highly effective in evolving threat landscapes, where attackers frequently change tactics.
AI-driven anomaly detection enhances network security by offering faster, more accurate, and scalable solutions for identifying and mitigating threats in real time, ultimately strengthening the resilience of distributed networks.
Relevant answer
Answer
well written!
  • asked a question related to Anomaly Detection
Question
3 answers
Anomaly detection in scanned image data set
Relevant answer
Answer
The choice of approach totally depends on the specific dataset, available resources, research objectives, and most importantly the desired level of accuracy. You can experiment with different methods to find the one that best suits your needs.
There are several approaches you can explore to detect anomalies in image datasets. You can explore different machine and deep learning algorithms, such as Autoencoders, Isolation Forests, Support Vector Machine, Convolutional Neural Networks, and Contrastive Learning. Additionally, you can explore hybrid approaches that combine machine learning, computer vision, and deep learning techniques to extract relevant features from images to detect anomalies.
  • asked a question related to Anomaly Detection
Question
3 answers
Dear all,
One of the key points in TSAD (Time Series Anomaly Detection) when using a sliding window is the size of the window. For (quasi)-periodic time series, methods such as AutoPeriod, Autocorrelation etc. can be used.
But for non-periodic time series, what do you think are the best methods to use (with bibliographic references please)?
thanks :-)
Relevant answer
Answer
I have a few references in the enclosed 2014 poster. See on p. 3, section "Outliers". I have never investigated the kind of data you mentioned.
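For the (quasi-)periodic case mentioned in the question, the autocorrelation approach can be sketched as follows. This is only an illustration, assuming the window size is taken as the lag of the first clear peak of the sample autocorrelation function. The function names and the `min_corr` cut-off are hypothetical choices, not a published method.

```python
import math
from typing import List, Optional


def autocorrelation(series: List[float], lag: int) -> float:
    """Biased sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


def estimate_window(series: List[float], max_lag: int,
                    min_corr: float = 0.5) -> Optional[int]:
    """Return the lag of the first local ACF maximum above `min_corr`, if any."""
    acf = [autocorrelation(series, lag) for lag in range(max_lag + 2)]
    for lag in range(1, max_lag + 1):
        if acf[lag] > min_corr and acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]:
            return lag
    return None


# Synthetic example: a sine wave that repeats every 20 samples.
signal = [math.sin(2 * math.pi * t / 20) for t in range(200)]
print(estimate_window(signal, max_lag=50))
```

For a series with no clear periodicity the function returns None, which is exactly the situation the question is about: in that case another criterion for the window size is needed.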
  • asked a question related to Anomaly Detection
Question
7 answers
Which anomaly detection methods are suitable for slit lamp examination?
Relevant answer
Answer
Thanks to all
  • asked a question related to Anomaly Detection
Question
3 answers
I am facing an industrial research problem in which massive-scale data, mostly streaming data, must be processed for outlier detection. The problem is that the data contains some labels for the outliers of interest, but they are not reliable, so we should discard them.
My approach mainly revolves around unsupervised techniques, but my employer insists on a trainable supervised technique, which would require an outlier label for each individual data point. In other words, he does not trust unsupervised techniques.
My question is whether there is any established, valid approach for generating outlier labels, at least to some meaningful extent, especially for massive-scale data. I have done some research on this and have experience in outlier/anomaly detection; nevertheless, it would be an honor to learn from other scholars here.
Much appreciated
Relevant answer
Answer
You are welcome, Sayyed Ahmad Naghavi Nozad .
I see. A potential direction could involve leveraging techniques from active learning or human-in-the-loop approaches. These methods allow for iterative improvement of models by selectively labeling data points that are most informative or uncertain. By strategically annotating a small subset of your data and iteratively refining your model, you may be able to achieve reliable outlier detection without relying solely on predefined labels.
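As a rough illustration of the "most informative or uncertain" selection step, here is a minimal uncertainty-sampling sketch. It assumes an unsupervised detector has already produced an anomaly score per point, and that a score near the decision threshold means the detector is unsure. The function name, scores, and threshold are hypothetical.

```python
from typing import List, Sequence


def select_for_labeling(scores: Sequence[float], threshold: float,
                        budget: int) -> List[int]:
    """Return indices of the `budget` points whose score is closest to the threshold."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - threshold))
    return ranked[:budget]


# Hypothetical anomaly scores in [0, 1] from any unsupervised detector.
scores = [0.05, 0.48, 0.95, 0.52, 0.30, 0.70]
to_label = select_for_labeling(scores, threshold=0.5, budget=2)
print(to_label)  # indices to send to a human annotator
```

The annotated points can then be used to train or recalibrate a supervised model, and the loop repeats with fresh scores.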
I hope that helps.
Kind regards,
Dr. Samer Sarsam
  • asked a question related to Anomaly Detection
Question
2 answers
Is there any formula to find the sample size needed to create machine learning or deep learning models for the detection, localization, segmentation, and classification of colon polyps?
Relevant answer
Answer
Thank you
  • asked a question related to Anomaly Detection
Question
5 answers
..
Relevant answer
Answer
Machine learning techniques can greatly enhance anomaly detection in network traffic and improve cybersecurity defenses by leveraging their ability to analyze vast amounts of data and identify patterns that may indicate malicious activity. Here are some ways machine learning can be applied:
1. Feature extraction: Machine learning algorithms can analyze various network traffic features, such as packet size, protocols, header fields, and timings, to identify normal patterns and create a baseline for comparison. Any deviation from this baseline can be flagged as a potential anomaly.
2. Unsupervised anomaly detection: Unsupervised machine learning algorithms, like clustering or dimensionality reduction techniques, can identify outliers or unusual patterns in network traffic without relying on labeled training data.
3. Supervised anomaly detection: Machine learning models can be trained using labeled datasets to classify network traffic as normal or malicious. This requires a training phase where the model learns from known patterns of attacks and then applies that knowledge to detect similar attacks in real-time.
4. Behavior-based detection: Machine learning can develop models that learn the behavior of normal network traffic over time. Any deviation from this learned behavior can be flagged as an anomaly, even if the specific attack hasn't been encountered before.
5. Real-time threat intelligence: By incorporating machine learning with threat intelligence feeds, cybersecurity defenses can benefit from up-to-date information about known threats and attack patterns, enabling faster detection and response.
6. Adaptive defenses: Machine learning models can continuously learn from new data and adapt their detection capabilities to evolving attack techniques, making them more effective in combating emerging threats.
It's important to note that while machine learning can enhance anomaly detection, it's not a foolproof solution. Cybersecurity requires a multi-layered and comprehensive approach that combines machine learning techniques with expert analysis, human oversight, and other security measures.
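The baseline idea in point 1 can be sketched with a simple z-score check. This is a deliberately minimal stand-in, assuming a single numeric feature (packet size in bytes) and a sample of traffic known to be normal. The function names, values, and the threshold of 3 are illustrative assumptions.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_baseline(normal_values: Sequence[float]) -> Tuple[float, float]:
    """Learn the mean and sample standard deviation of normal traffic."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def flag_anomalies(values: Sequence[float], mean: float, std: float,
                   z_threshold: float = 3.0) -> List[int]:
    """Return indices of values more than `z_threshold` deviations from the baseline."""
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > z_threshold]


# Hypothetical packet sizes (bytes) observed during normal operation.
normal = [500, 510, 495, 505, 490, 500, 515, 485]
mean, std = fit_baseline(normal)

# New traffic to check against the baseline.
print(flag_anomalies([502, 1500, 498, 460], mean, std))
```

Real systems track many features at once and update the baseline over time, but the principle of flagging deviations from learned normal behavior is the same.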
  • asked a question related to Anomaly Detection
Question
1 answer
The new special issue of Computers, Materials & Continua, “Deep Learning based Object Detection and Tracking in Videos”, is now open for submission: https://www.techscience.com/cmc/special_detail/object-detection
Guest Editors
Dr. Sultan Daud Khan, National University of Technology, Pakistan. Prof. Saleh Basalamah, Umm Al-Qura University, Saudi Arabia. Dr. Farhan Riaz, University of Lincoln, UK.
Summary
Object detection and tracking in videos has become an increasingly important area of research due to its potential applications in a variety of domains such as video surveillance, autonomous driving, robotics, and healthcare. With the growing popularity of deep learning techniques, computer vision researchers have made significant strides in developing novel approaches for object detection and tracking.
This special issue will provide a platform for researchers to present their latest findings, exchange ideas, and discuss challenges related to object detection and tracking in videos. We invite original research articles, reviews, and surveys related to this topic. Additionally, this issue will also welcome topics on action recognition, anomaly detection, and behavior understanding in videos.
Relevant answer
Answer
Elaine Lu, great, thanks for sharing.
  • asked a question related to Anomaly Detection
Question
10 answers
I am working on energy forecasting and anomaly detection. I am looking for an energy consumption dataset.
Relevant answer
Answer
Nz Kh, could you please share the links to such datasets?
  • asked a question related to Anomaly Detection
Question
6 answers
The SCPS Lab (https://www.scpslab.org/) is hiring for two Ph.D. positions in the following areas:
  • Federated Defense Against Adversarial Attacks in IIoT.
  • Threat and Anomaly Detection for Cloud Security.
The required skills for potential graduate students include:
  • Strong background in cyber security.
  • Strong background in machine learning and data analytic techniques.
  • Background in detection and estimation theory.
  • Strong oral and written communication skills.
To apply, please contact Dr. Hadis Karimipour (hadis.karimipour@ucalgary.ca) with your most recent C.V. and a list of two references.
Relevant answer
Answer
Great to know
  • asked a question related to Anomaly Detection
Question
4 answers
For anomaly detection, I am trying to use an ensemble learning technique, and for more accurate results I want to add one more technique. Which one would you suggest?
  • asked a question related to Anomaly Detection
Question
4 answers
Hi folks, is there any public or private dataset for IC chip defect detection or classification? I'm working on an IC chip defect/anomaly detection project and am in dire need of access to such a dataset. Please help by recommending one. Many thanks!
Relevant answer
Answer
@Qamar Ul Islam, Hi Qamar, can you please provide links or other means of access to the datasets you mentioned? I couldn't find some of them myself. Many thanks.
  • asked a question related to Anomaly Detection
Question
3 answers
Hi,
I have two sets of values:
Normal values: the steering angles produced by the DNN model under normal conditions.
Anomalous values: the steering angles produced by the DNN model under adversarial attack.
I just want to see the impact of the adversarial attack on the steering angle, i.e., the change in the steering angle. So my question is: can we calculate the deviation (change) in steering as the difference between the anomalous and normal values? For example, say the DNN model produces a steering angle of -0.16184 for the 4th image frame under the normal scenario, and a steering angle of -0.38242 for the same frame under adversarial attack. The difference, -0.38242 - (-0.16184), is -0.22058, which means the adversarial attack shifted the actual steering angle by -0.22058.
In this case, can we say that the deviation in steering is -0.22058, or is there a better-defined term for expressing the change?
Relevant answer
Answer
Manzoor Hussain Yes, you can calculate the deviation (or change) in steering as the difference between the anomalous values and the normal values. In your example, the deviation in steering is -0.22058, which means that the adversarial attack deviated the actual steering by -0.22058. This is a common way to express the change in a value due to an adversarial attack or any other factor. Depending on the context, you may also see this value referred to as the "perturbation" or "offset" in the steering angle.
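The per-frame calculation described above can be written out as a short sketch. It assumes the two lists hold steering angles for the same frames in the same order. The function names are illustrative, and the mean absolute deviation is just one possible summary over all frames.

```python
from typing import List, Sequence


def steering_deviation(normal: Sequence[float],
                       attacked: Sequence[float]) -> List[float]:
    """Per-frame deviation: attacked angle minus normal angle."""
    if len(normal) != len(attacked):
        raise ValueError("both sequences must cover the same frames")
    return [a - n for n, a in zip(normal, attacked)]


def mean_absolute_deviation(deviations: Sequence[float]) -> float:
    """Average magnitude of the deviation across frames."""
    return sum(abs(d) for d in deviations) / len(deviations)


# The 4th-frame values quoted in the question.
deviations = steering_deviation([-0.16184], [-0.38242])
print(round(deviations[0], 5))
print(round(mean_absolute_deviation(deviations), 5))
```

Keeping the sign shows the direction in which the attack pushed the steering, while the absolute value is more convenient when averaging over many frames.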
  • asked a question related to Anomaly Detection
Question
2 answers
How can logs (i.e., small data items, e.g., packet headers in a network) be classified for the purpose of anomaly detection?
Relevant answer
Answer
Thank you :)
  • asked a question related to Anomaly Detection
Question
3 answers
Hello, can anyone tell me the various ways in which anomaly detection can be used on logs?
Relevant answer
Answer
What Is Anomaly Detection in Log File Analysis?
Logging is vital to the success of any IT project. With solid logging practice, you can troubleshoot errors, find patterns, calculate statistics, and communicate information easily. With the size and complexity of modern systems, performing these actions involves various analysis activities.
One of these important analysis activities is anomaly detection. What is anomaly detection, and where does it fit in all of this? That’s what this post is about. I’ll first present a succinct definition of what anomaly detection in log file analysis is. I’ll then explain the definition in detail, before discussing why it’s important for your business and introducing how it works.
_____
Log analysis for anomaly detection
Anomaly detection plays an important role in the management of modern large-scale distributed systems. Logs are widely used for anomaly detection, recording system runtime information, and errors.
Traditionally, operators have to go through the logs manually with keyword searching and rule matching. The increasing scale and complexity of modern systems, however, make the volume of logs explode, which renders manual inspection infeasible. To reduce manual effort, we need anomaly detection methods based on automated log analysis.
Raw log messages are usually unstructured texts. To enable automated mining of unstructured logs, the first step is to perform log parsing, whereby unstructured raw log messages can be transformed into a sequence of structured events. Then we are able to do anomaly detection based on these sequences.
The process of log analysis for anomaly detection involves four main steps:
  1. Log collection
  2. Log parsing
  3. Feature extraction
  4. Anomaly detection
Important: The Python code to run the last three steps of the anomaly detection pipeline, as well as the log file used for the experiment, can be found on GitHub.
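As a rough illustration of steps 2 to 4 (this is not the code referred to above), the sketch below parses raw lines into templates by masking variable fields, counts how often each template occurs, and flags templates that are rare. The regular expressions, the function names, and the 5% rarity cut-off are all assumptions made for the example. Real pipelines typically use a dedicated log parser and richer features such as per-session event count vectors.

```python
import re
from collections import Counter
from typing import List, Sequence


def parse_template(line: str) -> str:
    """Log parsing: mask variable fields so similar messages share one template."""
    line = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "<IP>", line)  # IPv4 addresses
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<HEX>", line)          # hex identifiers
    line = re.sub(r"\b\d+\b", "<NUM>", line)                     # remaining numbers
    return line


def event_counts(lines: Sequence[str]) -> Counter:
    """Feature extraction: count occurrences of each template."""
    return Counter(parse_template(line) for line in lines)


def rare_events(lines: Sequence[str], max_fraction: float = 0.05) -> List[str]:
    """Anomaly detection: templates making up at most `max_fraction` of all lines."""
    counts = event_counts(lines)
    total = len(lines)
    return sorted(t for t, c in counts.items() if c / total <= max_fraction)


# Hypothetical log lines for demonstration.
logs = (["Connected to 10.0.0.1 port 22"] * 30
        + ["Connected to 10.0.0.2 port 443"] * 29
        + ["Disk failure on device 7"])
print(rare_events(logs))
```

Masking IP addresses before plain numbers matters here, since otherwise each octet would be replaced separately and the address pattern would no longer match.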
_____
Log-based Anomaly Detection with Deep Learning: How Far Are We?
Software-intensive systems produce logs for troubleshooting purposes. Recently, many deep learning models have been proposed to automatically detect system anomalies based on log data. These models typically claim very high detection accuracy. For example, most models report an F-measure greater than 0.9 on the commonly used HDFS dataset. To achieve a profound understanding of how far we are from solving the problem of log-based anomaly detection, in this paper, we conduct an in-depth analysis of five state-of-the-art deep learning-based models for detecting system anomalies on four public log datasets. Our experiments focus on several aspects of model evaluation, including training data selection, data grouping, class distribution, data noise, and early detection ability. Our results point out that all these aspects have significant impact on the evaluation, and that all the studied models do not always work well. The problem of log-based anomaly detection has not been solved yet. Based on our findings, we also suggest possible future work.
_____
  • asked a question related to Anomaly Detection
Question
5 answers
Analysis and anomaly detection tools are continually evolving. The machine learning resource provides weightings and estimates in advance, anticipating possible failures and unavailability of systems and applications.
Relevant answer
Answer
SmartSignal can detect, diagnose, predict, and prevent critical failures. These analytics are built on unrivaled deep industry expertise and proven across the world’s largest energy organizations. Unlike generic AI/ML solutions, SmartSignal provides users access to powerful Digital Twin blueprints that accelerate time-to-value across your investments.
Regards,
Shafagat
  • asked a question related to Anomaly Detection
Question
3 answers
Hello everybody,
I am currently writing my final thesis on data quality, in particular on consistency. I am therefore looking for a labeled IoT time series dataset for consistency detection. Does anybody know where I can find such a dataset?
Or does anybody know where I can get a labeled IoT time series dataset for anomaly detection?
Thank you for your help!
Relevant answer
Answer
Hello,
For the general case of time series anomaly detection, several benchmarks have been recently proposed. These two benchmarks contain labeled time series from different domains (for instance, room occupancy detection from temperature, CO2, light, and humidity [1] or accelerometer sensors of a wearable assistant for Parkinson's disease patients [2]).
You may find the links to the benchmarks below. Good luck with your thesis!
[1] Luis M. Candanedo and Véronique Feldheim. 2016. Accurate occupancy detection of an office room from light, temperature, humidity, and CO2 measurements using statistical learning models. Energy and Buildings 112 (2016), 28–39. https://doi.org/10.1016/j.enbuild.2015.11.071
[2] Marc Bächlin, Meir Plotnik, Daniel Roggen, Inbal Maidan, Jeffrey M. Hausdorff, Nir Giladi, and Gerhard Tröster. 2010. Wearable Assistant for Parkinson's Disease Patients With the Freezing of Gait Symptom. IEEE Transactions on Information Technology in Biomedicine 14, 2 (2010), 436–446. https://doi.org/10.1109/TITB.2009.2036165
  • asked a question related to Anomaly Detection
Question
3 answers
Hi,
I would like to know about any publicly available datasets for anomaly prediction based on operational data or system logs. Please recommend some log datasets or operational data from safety-critical systems.
Thank you
Relevant answer
Answer
This dataset has different data sources, including system logs. Check it out:
(PDF) X-IIoTID: A Connectivity-and Device-agnostic Intrusion Dataset for Industrial Internet of Things (researchgate.net)
(PDF) X-IIoTID-intrusion-dataset-ready-for-machine-learning (researchgate.net)
  • asked a question related to Anomaly Detection
Question
3 answers
For attack detection in a cloud environment using machine learning, can I use a traditional network dataset as the training set?
Relevant answer
Answer
Hi Amer.
Check this dataset, which includes cloud network traffic.
(PDF) X-IIoTID-intrusion-dataset-ready-for-machine-learning (researchgate.net)
  • asked a question related to Anomaly Detection
Question
3 answers
Hello everyone,
I would like to ask about shapelet discovery and transformation: is there a solid Python implementation for discovering shapelets and using them to detect anomalies in time series?
Relevant answer
Answer
Hello,
There are many solid libraries for shapelet discovery and transformation, depending on the specific method of interest, e.g., Learned Shapelets, Shapelet Forests, etc. My personal favorites are tslearn (which provides a great implementation of learned shapelets): https://tslearn.readthedocs.io/en/stable/index.html and wildboar (which provides an excellent implementation of shapelet forests): https://github.com/isaksamsten/wildboar .
In terms of using shapelets for anomaly detection, I believe that wildboar offers this or related functionality out of the box. Please visit this page for more information and a code example: https://isaksamsten.github.io/wildboar/master/examples/unsupervised.html#outlier-detection
Hope this helps.
  • asked a question related to Anomaly Detection
Question
3 answers
Isolation Forest is a popular algorithm for detecting anomalies in a data set. When a point is labelled as an anomaly, we are interested in knowing which features cause it to be an anomaly. Are there any existing ways to help decide which features contribute the most to an anomaly?
Relevant answer
Answer
Jingchen Wu Did you find an answer to your question? I am looking to do something similar.
  • asked a question related to Anomaly Detection
Question
8 answers
I'm trying to use some machine learning (autoencoder?) to help classify various data for pass/fail. This data will be RF data such as frequency response etc. Does anyone have any recommendations on methods to best accomplish this?
Regards,
Adam
Relevant answer
Answer
Dear Adam Ww ,
Anomaly detection is one of the most common use cases of machine learning. Finding and identifying outliers helps to prevent fraud, adversary attacks, and network intrusions that can compromise your company’s future.
Regards,
Shafagat
  • asked a question related to Anomaly Detection
Question
8 answers
Suppose I use an autoencoder for anomaly detection based on reconstruction error. If I have two or more different classes or types of anomaly, my autoencoder can only detect anomalies and cannot classify them. How can I classify them, or produce a probabilistic classification, afterwards? Please give me some ideas on this. Thank you.
Relevant answer
Answer
Georgi Tancev
Thank you. Can you please clarify what 'all k classes' means in your first sentence? Do I also have to train the autoencoder on my anomalous data? Also, please let me know if there is any resource I can follow to figure out this process. Thank you.
  • asked a question related to Anomaly Detection
Question
4 answers
I have time series data from different sensor locations. I am trying to use an autoencoder for outlier/anomaly detection based on reconstruction error. However, I am new to the field of machine learning and only know pandas, basic Python, and some basic machine learning algorithms. What approach should I take to train the autoencoder model?
Please suggest the relevant parameters and model types. Should I go through all the theory in detail before starting to train the model?
Relevant answer
Answer
Dear Utsav,
In MATLAB, training an autoencoder is easy. The link below shows how to define and train an autoencoder:
  • asked a question related to Anomaly Detection
Question
4 answers
I am looking for an open-source, machine-learning-based tool that takes input in NetFlow format and finds anomalies such as DDoS events.
Any suggestions are appreciated. I am looking for some research based open source code that I can download and run on some remote machines containing netflow data files of an ISP.
Relevant answer
Answer
Actually I am looking for some research based open source code that I can download and run on some remote machines containing netflow data files of an ISP.
  • asked a question related to Anomaly Detection
Question
1 answer
Swamping and masking are caused by input data that is too large for the purposes of anomaly detection.
Relevant answer
Answer
I think Subsampling would be the better approach.
  • asked a question related to Anomaly Detection
Question
4 answers
I want to detect anomalies in streaming data. Is it possible to detect anomalies on the fly (online) using the FFT or DWT of the signal history? It would help a lot if anybody could suggest some related resources.
Thanks.
Relevant answer
Answer
Why not consider using the S-transform, as it combines the properties of the FFT and the wavelet transform?
  • asked a question related to Anomaly Detection
Question
3 answers
For example, in supply chains or diagnostics, what is the importance of outlier or anomaly detection?
Relevant answer
Answer
Ever hear of quality control? An out-of-control point is an outlier. David Booth
  • asked a question related to Anomaly Detection
Question
5 answers
Could anyone suggest some good resources on online unsupervised anomaly detection for streaming data?
Relevant answer
Answer
I have worked on anomaly detection using autoencoders. However, I used semi-supervised learning. Here is the link to the work:
  • asked a question related to Anomaly Detection
Question
4 answers
Hello Everyone,
We have recently investigated a Semi-Supervised Deep Learning approach for anomaly detection of Wind Turbine generators based on vibration signals in the paper entitled "Anomaly Detection on Wind Turbines based on a Deep Learning Analysis of Vibration Signals" [1]. We found that vibration data enables a promising mechanism to detect abnormal behavior on wind turbines with a careful Machine Learning pipeline design. The paper presents an IoT-ready Machine Learning pipeline that encompasses data gathering, preprocessing, feature extraction, and classification. The approach is based on a Semi-Supervised Deep Learning approach, using Deep Autoencoders combined with a normality threshold selection based on the F1-Score analysis for a labeled data-set. The vibration data is preprocessed with band-pass filters and DC-component removal, rotation speed relation, and FFT. Finally, 11 features are extracted, from minimum, maximum, RMS, and standard deviation, to kurtosis, shape factor, energy, and entropy. The trained detection model achieved accuracy >99%, precision >97%, and recall of 100% for the evaluated data-set.
Moreover, with the IoT integration, the proposed workflow can notify users whenever abnormal behavior is noticed. What do you think about our findings? For more details, check the full paper at . If you have similar experiences with anomaly detection in rotating machinery, or if you have any comments or questions, feel free to leave a comment so we can start a fascinating discussion.
[1] José Luis Conradi Hoffmann, Leonardo Passig Horstmann, Mateus Martínez Lucena, Gustavo Medeiros de Araujo, Antônio Augusto Fröhlich, and Marcos Hisashi Napoli Nishioka, Anomaly Detection on Wind Turbines Based on a Deep Learning Analysis of Vibration Signals, In Applied Artificial Intelligence:9, 2021. DOI: 10.1080/08839514.2021.1966879.
Best Regards,
José Luis Conradi Hoffmann
Software/Hardware Integration Lab
Federal University of Santa Catarina, Florianópolis - SC, Brasil
Relevant answer
Answer
Dear Dr Hoffmann,
Thank you for sharing this interesting topic. I would be happy to participate in the next steps of this project. The paper below might be interesting to read:
  • asked a question related to Anomaly Detection
Question
3 answers
I am looking for X-ray baggage screening dataset for Anomaly detection in X-ray security screening systems.
Thanks for your help.
Relevant answer
Answer
see
https://arxiv.org/pdf/2001.01293.pdf
  • asked a question related to Anomaly Detection
Question
3 answers
Does a tool for Explainable AI exist that you can use if you want to identify anomalies in data streams based on AI models? So far, I only know LIME. Maybe you have other suggestions.
  • asked a question related to Anomaly Detection
Question
3 answers
I want to use AI models (black box) to identify anomalies in business processes. In your opinion, what are the advantages and disadvantages compared to common approaches such as conformance checking (white box)? (Is there any existing research about this?)
  • asked a question related to Anomaly Detection
Question
3 answers
I'm working on fraud detection in online product review systems. I want to work with dynamic graphs, and I read many papers about fraud detection and anomaly detection. But I couldn't find any dynamic graph-based dataset for review systems. Does anyone know such a dataset?
  • asked a question related to Anomaly Detection
Question
10 answers
I am working on a video anomaly detection project, and I have decided to apply one of these algorithms: autoencoders, RNN, or LSTM.
Kindly advise me which of them is best for video anomaly detection, and why. It would be a great favor.
Regards,
Relevant answer
Answer
In most case studies, the LSTM is applied.
  • asked a question related to Anomaly Detection
Question
9 answers
I am looking for an overview of methods to detect univariate contextual outliers in time series data. One example application is data from industrial plants in different (unknown) operation modes, or with slow trends in the time series, but no seasonal effects. Visually, those outliers can easily be seen by a human.
In the attached graph, the contextual outliers above and below the trend can be identified clearly by eye.
Most global outlier detection methods can be used with a window-based approach. But a method that automatically determines the size of the context would be beneficial.
Are there any suggestions which methods are recommended for that purpose?
Relevant answer
Answer
I agree.
  • asked a question related to Anomaly Detection
Question
4 answers
Hello Researchers!
Please comment with your views below.
I intend to work on IoT security using machine learning approaches.
As a novice in ML, I'm a bit puzzled about the possible algorithms for this problem.
1. Is this an anomaly detection problem?
2. What are the possible ML algorithms that can be used for this problem?
Relevant answer
Answer
Thank you for your responses! I will look into them.
  • asked a question related to Anomaly Detection
Question
3 answers
When the mean and standard deviation are used as the reference, a value more than 2 or 3 standard deviations from the mean may be considered an anomaly, which works well in most cases. With quartiles, a value smaller than Q1 - 1.5 * IQR or larger than Q3 + 1.5 * IQR is considered an anomaly. The former assumes that the data is normally distributed; what about the latter? What are their respective advantages? Please provide some real examples if convenient.
Relevant answer
Answer
In addition to these helpful answers, I would not recommend adding or subtracting a multiple of the IQR to Q1 or Q3 as this might create non-physical values for non-normal data in the same way that SD differences from the mean can.
It would make more sense to divide your data into a greater number of percentage intervals, such as every 10% of the data, then use the values nearer the lower and upper limits.
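To make the comparison in this thread concrete, here is a small sketch in plain Python (standard library only). The sample values and the helper names `zscore_outliers` and `iqr_outliers` are made up for illustration. The quartiles come from `statistics.quantiles` with its default "exclusive" method; other quartile conventions give slightly different fences.

```python
import statistics

def zscore_outliers(values, k=3.0):
    """Flag values more than k sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * std]

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: nine ordinary readings and one extreme value.
data = [10, 11, 12, 12, 13, 13, 14, 15, 16, 120]
print(zscore_outliers(data, k=3.0))
print(iqr_outliers(data))
```

On this sample the 3-sigma rule flags nothing, because the single extreme value inflates the standard deviation it is measured against, while the quartile rule flags 120. That is one practical advantage of the quartile rule on small or contaminated samples: the quartiles barely move when one extreme value is present.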
  • asked a question related to Anomaly Detection
Question
5 answers
I am working on a video anomaly detection project in MATLAB. I have a set of 28 different videos and have to check each video for anomalies. The algorithm is working fine, but I have trouble setting the threshold so that frames with values greater than the threshold are marked as anomalous and frames below it as normal. Each video has a different range of anomaly values, so I am unsure how to set it. Or can I update the threshold at runtime, based on each video?
Relevant answer
Answer
Tariq Sm, you need to use adaptive thresholding techniques. I would suggest doing it in Python using OpenCV, which would be easier than MATLAB. Here is a short article on using OpenCV for adaptive thresholding.
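One simple way to adapt the threshold per video, sketched in plain Python rather than MATLAB or OpenCV, is to derive it from each video's own score distribution using the median and the median absolute deviation (MAD). The function names, the multiplier `k=3.0`, and the sample scores are assumptions for illustration and are not taken from the original algorithm.

```python
import statistics

def robust_threshold(scores, k=3.0):
    """Return median + k * scaled MAD of one video's frame scores."""
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data.
    return med + k * 1.4826 * mad

def label_frames(scores, k=3.0):
    """Mark a frame True (anomalous) if its score exceeds the video's own threshold."""
    threshold = robust_threshold(scores, k)
    return [s > threshold for s in scores]

# Two hypothetical videos whose anomaly scores live on very different scales.
video_a = [0.10, 0.12, 0.11, 0.13, 0.95, 0.12]
video_b = [10.0, 11.0, 10.5, 10.2, 30.0, 10.8]
print(label_frames(video_a))
print(label_frames(video_b))
```

Because the threshold is computed from each video's own scores, the same code handles both scales without a hand-tuned constant per video. The multiplier `k` still has to be chosen, ideally on a few videos with known anomalies.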
  • asked a question related to Anomaly Detection
Question
4 answers
I am doing a project on anomaly detection in videos using MATLAB. I have to associate data with clusters using JPDA, but unfortunately it isn't working well. I have gone through various papers on JPDA, but they are all about object tracking.
Kindly advise me how to proceed, or point me to a research paper in which JPDA is used for data association rather than for tracking.
Regards
Relevant answer
Answer
Ijaz Durrani
Thanks for your help. This paper is about pedestrian tracking with JPDAF, but I need a paper in which JPDA is used only for association, not for tracking.
  • asked a question related to Anomaly Detection
Question
3 answers
Hi Dear Ph. D.,
I have carefully read your paper on anomaly detection in smart meters.
Unfortunately, the paper does not describe the greedy windowed algorithm, its complexity, or the execution time of the simulation, which future research would need in order to improve on it.
I have sent you an email and look forward to answers to my many concerns.
Regards,
  • asked a question related to Anomaly Detection
Question
5 answers
Actually I am asking this question especially for subsurface pressure data of an oil well (pressure will vary according to the flow rate). Nonetheless, any kind of method that works for non-stationary time series data will be sufficient. For the sake of the question, we can assume the data is noiseless.
Relevant answer
Answer
Fehmi Özbayrak Nonstationarity also depends on the kernel density estimates, and may not reject stationarity when m is small. Log-spline bivariate density estimation and LLDM by Loader may also be considered to improve tail estimation, in the presence of moderate to low values of m.
  • asked a question related to Anomaly Detection
Question
5 answers
I have already read about a method of evaluation for unsupervised anomaly detection using excess-mass and mass-volume curves (https://www.researchgate.net/publication/304859477), but was wondering if there are other possibilities.
  • asked a question related to Anomaly Detection
Question
3 answers
Quick response is highly appreciated.
Thank you.
Relevant answer
Answer
That's what the videos are for
  • asked a question related to Anomaly Detection
Question
1 answer
I have been reading "Time-Series Anomaly Detection Service at Microsoft" (https://arxiv.org/pdf/1906.03821.pdf) recently, and I ran into a problem with the programming part.
The first picture shows the algorithm. The general idea is to apply the fast Fourier transform to a time series, calculate the spectral residual, and apply the inverse fast Fourier transform at the end. When I checked the official code for this paper, I saw that before the inverse fast Fourier transform, the transformed signal ('trans' in the code) is multiplied by the spectral residual and then divided by its amplitude (lines 212-215 in the second picture), which confuses me. Could someone explain this part? Thanks.
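A possible reading of that step, offered as a hedged sketch: multiplying each Fourier coefficient by the spectral residual and dividing by its amplitude replaces the coefficient's amplitude with the residual while keeping its phase, which matches the paper's saliency map being built from the residual and the original phase. The snippet below only checks this identity on one made-up coefficient. `z` and `sr` are illustrative values, with `sr` standing for the already exponentiated residual, and nothing here is copied from the official code.

```python
import cmath

z = complex(3.0, -4.0)   # one made-up Fourier coefficient
amplitude = abs(z)       # its amplitude
phase = cmath.phase(z)   # its phase
sr = 0.7                 # made-up (exponentiated) spectral residual for this frequency

# What the question describes: scale the coefficient by sr / amplitude.
scaled = z * sr / amplitude

# The same quantity written as "amplitude sr, original phase".
rebuilt = cmath.rect(sr, phase)

print(abs(scaled - rebuilt) < 1e-12)   # the two forms agree
print(abs(abs(scaled) - sr) < 1e-12)   # the new amplitude is sr
```

So the division by the amplitude is just a normalisation to a unit-length complex number carrying the phase, before the residual is applied as the new amplitude.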
  • asked a question related to Anomaly Detection
Question
4 answers
Hello everyone,
I’m working on a project in which I need to detect anomalies in a particular scene (two background scenes). The anomaly could be anything (bolts, pliers, glasses, etc.). I have generated synthetic training data with Unity because I have very few real images, and here comes the problem. I have looked at different techniques such as domain adaptation, since I need a solution that trains on synthetic images and runs inference on real ones, but each of them seems to focus on the class of the image and could therefore fail to find a particular anomaly. I’m not an expert in this field and I’d like to hear an expert’s opinion since I am a little bit stuck :(
Thank you very much for your answers!
  • asked a question related to Anomaly Detection
Question
4 answers
In SPSS there is an Anomaly Detection procedure that searches for unusual cases based on deviations from the norms of their cluster groups.
Where can I read in detail the algorithm by which this procedure works? How are clusters created?
I would appreciate your help.
Relevant answer
Answer
You can use boxplot in SPSS to identify outliers
  • asked a question related to Anomaly Detection
Question
1 answer
Dear colleagues,
I recently read a publication about an anomaly detection algorithm called HBOS (Histogram-Based Outlier Score). The idea is to compute, for each sample xi, an outlier score that indicates how strongly that sample deviates from the other samples in the given training data. The higher the value, the more likely it is an abnormal training sample, and adding a threshold turns the score into an anomaly detection method. This idea is very interesting. The score does not seem to be normalized (e.g., to [0,1]), but I think the method can still be used to check, sample by sample, whether a training filter has led to an improvement. The question I want to open up: which metrics do you normally use for comparing outlierness when you apply a training filter? And what are your experiences? There are distance-metric approaches and also binary classification metrics (P/R, AUC, ...), but I am looking for a metric that compares the "outlierness with regard to other samples". HBOS looks promising, though.
Kindest Regards,
Adnan
Relevant answer
Answer
As a metric you can use the variance and see how it varies when eliminating one data-point from the batch while keeping all the others. Scan the batch and test eliminating each data-point, one at a time. When an eliminated data-point is an outlier the variance of the batch decreases a lot. This would work with the histogram method.
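The leave-one-out idea described in this answer can be sketched in a few lines of plain Python. The function name `variance_drop` and the sample batch are made up for illustration, and the sketch uses the sample variance from the standard `statistics` module.

```python
import statistics

def variance_drop(values):
    """For each point, return how much the sample variance falls when that point is removed."""
    full = statistics.variance(values)
    drops = []
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]
        drops.append(full - statistics.variance(rest))
    return drops

# Hypothetical batch with one suspicious reading.
batch = [5.0, 5.2, 4.9, 5.1, 9.0, 5.0]
drops = variance_drop(batch)
most_outlying = max(range(len(batch)), key=lambda i: drops[i])
print(batch[most_outlying])
```

Removing a typical point leaves the variance almost unchanged or slightly raises it, so a large positive drop singles out the point that dominates the spread. The drop values are on the scale of the data, so they are comparable within one batch but not across differently scaled features.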
  • asked a question related to Anomaly Detection
Question
6 answers
Suggestions needed.
Relevant answer
Answer
You can develop a new anomaly detection scheme by doing the following:
1- Reduce the communication power overhead.
2- Reduce the processing complexity.
3- Reduce the data collected from the sensors.
4- Try to design non-centralized (distributed) communication.
5- Try to use a lightweight algorithm to classify the gathered data.
  • asked a question related to Anomaly Detection
Question
5 answers
We are using CMU-CERT r4.2 dataset for performing Insider threat detection. Are there any alternative to this data set?
Relevant answer
Answer
Dear all, thank you very much for sharing this information.
Is there any paper comparing these datasets with the CMU-CERT r4.2 dataset?
  • asked a question related to Anomaly Detection
Question
14 answers
The expected answer can be narrowed to anomaly detection algorithms in medical images.
Relevant answer
Answer
  • asked a question related to Anomaly Detection
Question
6 answers
Hi all,
I'm using an autoencoder to detect anomalies in my dataset.
I used a normal dataset for training and detected anomalies on a fraud+normal dataset.
From what I have observed so far, the autoencoder performed best at a particular training dataset size, and the ROC and PR AUC dropped when the training dataset was too big. Is this because of overfitting?
Please refer to the attached photo below
Thanks in advance!
Relevant answer
Answer
In my opinion, the autoencoder can be built on an RNN architecture. I suggest that, if you can, you use a CNN architecture; maybe that can give you good results without overfitting.
I hope that is clear for you. @ Anh Tram Nguyen
  • asked a question related to Anomaly Detection
Question
1 answer
I have 99 images like scratch
and I have a part - blurred scratch image
How do I detect a part - blurred image?
I tried one-class SVM and Isolation Forest but they couldn't detect
Relevant answer
Answer
Hi Joo,
This is interesting research. Is it possible for you to share a sample image so I can see what you are actually looking for? Then I may be able to share some ideas with you.
regards
  • asked a question related to Anomaly Detection
Question
4 answers
What is the best method to handle imbalanced datasets in network anomaly detection?
Relevant answer
Answer
Thomas W Kelsey Thank you very much for sharing valuable information
  • asked a question related to Anomaly Detection
Question
3 answers
Looking for the best DL model and platform for Anomaly detection.
I am thinking about a Transformer or an LSTM. Do you think this is the right choice?
Thanks.
  • asked a question related to Anomaly Detection
Question
5 answers
For detecting network anomalies using artificial intelligence, which dataset and algorithm would you advise using?
Relevant answer
Answer
There are many datasets available online, especially for anomaly detection.
One of the best websites for finding different datasets is that of the Canadian Institute for Cybersecurity.
Here is the link:
  • asked a question related to Anomaly Detection
Question
3 answers
Dear all,
I am trying to collect vibration data from two similar motors, one of which is healthy and new while the other has a bearing problem. I do not have any historical data or degradation path for the motors. What can I do with this data to predict bearing failure? Can I estimate RUL (remaining useful life), or should I set a threshold for anomaly detection? I would be thankful if someone could help me, since I do not have much experience with machine learning algorithms.
Relevant answer
Answer
If you don't have historical data for both motors, estimate the motor-to-motor variation by comparing two healthy, new motors. This gives you an educated guess of how much individual motors differ when they are produced. Create an anomaly detection model from healthy data only, and use the known variation among healthy motors to determine a proper classification cutoff for anomalous results.
  • asked a question related to Anomaly Detection
Question
3 answers
I am working on real-time anomaly detection. My data looks like the attached sample, and it is unlabelled. I want to detect whether the next minute's data point is an anomaly or not. My main column is num_part, which is the number of players; all other columns are supporting metrics.
Issues:
1.) I have this kind of data for 50 different domains.
2.) There are different kinds of seasonality in different domains; for example, domain 1 goes down for maintenance on Wednesday, domain b on Tuesday, etc.
The approach I am using:
As of now, I am calculating the Z-score for every five-minute interval, comparing it with historical data, and alerting, but I am getting lots of false positives.
I am even considering treating it as streaming data, like sensor data, rather than as a time series.
I want to alert my team in real time if the next incoming data point is an anomaly.
Relevant answer
Answer
Reza Fotohi, that is my own question on Stack Exchange, and it doesn't have any answers. I'm not sure why you are sharing it with me.
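One common way to cut false positives when the seasonality differs per domain is to compare each new point only against history from the same seasonal slot, and to use a robust score instead of the plain Z-score. The sketch below is plain Python under stated assumptions: the class name `SeasonalBaseline`, the (weekday, hour) slot, the `min_history` setting, and the sample values are all hypothetical, and one such object would be kept per domain.

```python
import statistics
from collections import defaultdict

class SeasonalBaseline:
    """Keep history per (weekday, hour) slot and score new points with a robust z-score."""

    def __init__(self, min_history=5):
        self.history = defaultdict(list)
        self.min_history = min_history

    def score(self, weekday, hour, value):
        """Return the robust z-score of `value`, or None while the slot has too little history."""
        past = self.history[(weekday, hour)]
        result = None
        if len(past) >= self.min_history:
            med = statistics.median(past)
            mad = statistics.median([abs(p - med) for p in past])
            scale = 1.4826 * mad  # MAD rescaled to be comparable to a standard deviation
            if scale > 0:
                result = (value - med) / scale
            else:
                result = 0.0 if value == med else float("inf")
        # Note: this sketch stores every point, including anomalous ones.
        past.append(value)
        return result

baseline = SeasonalBaseline()
warmup = [baseline.score("Tue", 10, v) for v in [100, 102, 98, 101, 99]]
drop = baseline.score("Tue", 10, 40)
print(warmup)
print(drop is not None and abs(drop) > 3.5)
```

A maintenance window that recurs every Tuesday would then become part of the Tuesday baseline instead of raising an alert each week. The cutoff of 3.5 used in the last line is only a conventional starting point and would need tuning per domain.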
  • asked a question related to Anomaly Detection
Question
3 answers
What are the ways we can analyze the accuracy of unsupervised anomaly detection algorithms like HBOS or Isolation Forest?
Relevant answer
Answer
Since anomaly detection algorithms aim to classify whether the target is an anomaly or not, evaluation falls under binary classification. The evaluation metrics most commonly used are therefore accuracy, precision, recall, and the area under the ROC curve (ROC AUC). Usually there are far fewer anomalous samples than normal samples, so accuracy alone is not a suitable evaluation metric. The most suitable evaluation metric for anomaly detection algorithms is ROC AUC.
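These metrics need at least a small labelled evaluation set, even when the detector itself is unsupervised. As a rough sketch in plain Python, ROC AUC can be computed as the fraction of (anomaly, normal) pairs in which the anomaly receives the higher score. The function names, labels, and scores below are made up for illustration, and higher scores are assumed to mean "more anomalous".

```python
def roc_auc(labels, scores):
    """ROC AUC as the chance that a random anomaly outscores a random normal point."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def precision_recall(labels, scores, threshold):
    """Precision and recall when scores above `threshold` are called anomalies."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s <= threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels (1 = anomaly) and detector scores.
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.25, 0.9, 0.35, 0.2]
print(roc_auc(labels, scores))
print(precision_recall(labels, scores, threshold=0.3))
```

ROC AUC is threshold-free, which is convenient for comparing detectors such as HBOS and Isolation Forest, while precision and recall describe the behaviour at the one threshold that would actually be deployed.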
  • asked a question related to Anomaly Detection
Question
5 answers
In the literature we see primary longitudinal waves and secondary transverse waves. Both are used for material inspection, but can anyone say which is the better technique, or is it a case of multiple data sources giving higher confidence? I have been searching the literature for a good review paper on different waveforms for detecting different anomalies, but I have not found one. Can anyone point me in that direction?
Relevant answer
Answer
The anomaly detection with ultrasonic waves depends on different parameters like crystallisation, geometry, thermic history, internal friction and others for the inspected materials and, on the other hand, on the ultrasonic equipment like voltage excitation of the probes, beam forming, transducer properties and others. Furthermore, some post-processing techniques extract the interesting information from the data matrices.
Which anomalies do you want to detect?
  • asked a question related to Anomaly Detection
Question
4 answers
Context:
Suppose we are given a baseline population (B) and an anomalous population (A), consisting of entities described by a set of numerical and categorical features (X). Let T denote a statistic computed over a population (count of entities, percentage of entities having a specific property, etc). Based on a distance metric D between T(B) and T(A), population A is labelled as anomalous.
Question:
How can one identify the subgroups of entities within A that lead to the observed difference between T(A) and T(B) (e.g. entities with X1 in {w11,w12,w13} and X2 > w22)? Equivalently, what are the subgroups of A that, once removed, lead to no statistically significant difference between T(A) and T(B)?
Reviewed literature:
Applying the Chi2 test of homogeneity on each feature coupled with the Cramer's V score could lead to a ranking of these features; however this approach provides a unidimensional segmentation of the population and doesn't account for combinations of features.
Example:
B can represent the patients admitted to a given hospital during a given month, and A the patients admitted to the same hospital a month later, with X representing the demographic features of a patient (age, gender, income, ethnicity, etc.). Let T denote the total number of patients admitted to a hospital with flu.
Given that T(A) is statistically larger than T(B), how to localize the subgroup of patients (in terms of their demographic features) that lead to the observed difference?
Relevant answer
Answer
You can do patient matching (e.g., propensity-score-based methods or kNN) and look at the patients who do not match.
  • asked a question related to Anomaly Detection
Question
4 answers
I am looking to identify the following on high dimensional data.
1. Clusters
2. Outliers
I have tried different dimension reduction approaches and used the reduced dimension to plot the data to identify the patterns graphically.
I have identified the outlier data points through other approaches, but not through clustering. The data contains user activities and my objective is to find the similar group of users and anomalous data points (rows).
Relevant answer
Answer
Thank you Duc P. Truong David Morse Cristian Ramos-Vera for the response.
My experiment is based on user activities inside the network. I am trying to group users into similar groups (clusters) based on their activities. Also, identify outlier data points as anomalies.
The shared papers look useful, I am going through these.
Furthermore, there could be something on these lines for me.
  • Sometimes, if the data needs more than 2 or 3 dimensions to be separable, visualization might not show the clusters.
  • (b) you may well have a variable set that doesn't lend itself well to condensation (there are some preliminary checks you can run to help inform this determination)
  • asked a question related to Anomaly Detection
Question
2 answers
Hello and good day/night to all colleagues!
Suppose I have data: attributes of requests (e.g. from history of online bank system)
- client IP (with IP info - continent, country, city, latitude, longitude)
- client agent (Operating System, Browser)
- some info about account (company, user)
- some info about the request (date, time, duration, size, from which url)
Could I build a fraud detection system (or just an anomaly detection system) based on analysis of these data?
Are there any common or specific approaches to such systems?
I'm interested in any information (books, articles, keywords to search, ideas, similar projects).
Relevant answer
Answer
These attributes are being used to raise alerts and incidents in some of the commercial cyber security solutions such as Microsoft CloudApp Security (MCAS), Microsoft Azure AD Identity Protection (MAADIP), and Dark Web Security Solution from Vology. At least MS Solutions build up reputation for users and devices based on multiple attributes and then rate risk level of them or generate an alert or an incident using Machine Learning and AI. I suggest you take a look at the existing solutions and possible white papers of them.
  • asked a question related to Anomaly Detection
Question
3 answers
Hi all, I am about to start my master thesis in Anomaly detection on a multivariate time series data. I am interested in knowing reference to the papers where related work has been done in this area or suggestion to some of the techniques, especially unsupervised approaches.
  • asked a question related to Anomaly Detection
Question
5 answers
Dear All,
Why do we need samples of both classes for the training of binary classification algorithms if one-class algorithms can do the job with only samples from one class? I know that one-class algorithms (like one-class svm) were proposed with the absence of negative data in mind and that they seek to find decision boundaries that separate positive samples (A) from negative ones (Not A). Hence the traditional binary classification problem (between (A) and (B) for example) can be formulated as a classification of (A) and (not A = B).
Is it about better classification results or am I missing something?
Thank you in advance
Relevant answer
Answer
When you perform a binary classification, you are actually trying to learn a separation function that can separate two distributions. However, having samples of only one class means that your classifier really needs to learn the distribution of data samples on that class, which is much more difficult (same with GANs).
  • asked a question related to Anomaly Detection
Question
3 answers
An anomaly detection method can enjoy some properties. The idea presented here is to apply an algorithm for anomaly detection to a dataset, to remove the discovered anomalous points, and then to apply to the remaining data the same algorithm again.
Let S be a set of points, F an outlier detection method, and A the set of outliers of S discovered by F. In this case we can write A=F(S). If F(S-A)={} (empty set), then F is an invariant algorithm for S (or S is invariant with respect to F). In this case, F finds all the outliers of S in one fell swoop. If F is invariant for every set, we simply say that F is invariant.
Do you know a method F that is invariant for each set?
Relevant answer
Answer
Interesting question!
Following
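The definition in the question is easy to test empirically for a given detector and data set. The sketch below, in plain Python, does not settle whether some F is invariant for every set; it only shows how to check F(S-A)={} and that a plain z-score rule fails the check on one made-up set. The names `zscore_detector` and `is_invariant_on`, the cutoff `k=2.0`, and the sample values are illustrative assumptions.

```python
import statistics

def zscore_detector(points, k=2.0):
    """A toy detector F: values more than k sample standard deviations from the mean."""
    if len(points) < 3:
        return []
    mean = statistics.fmean(points)
    std = statistics.stdev(points)
    if std == 0:
        return []
    return [p for p in points if abs(p - mean) > k * std]

def is_invariant_on(detector, points):
    """Check F(S - A) == {} where A = F(S)."""
    outliers = detector(points)
    remaining = [p for p in points if p not in outliers]
    return detector(remaining) == []

# Hypothetical set with one extreme and one moderate outlier.
s = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 5.0, 50.0]
print(zscore_detector(s))
print(is_invariant_on(zscore_detector, s))
```

Here the first pass flags only 50.0, because that value inflates the standard deviation and masks 5.0. Once 50.0 is removed, a second pass flags 5.0, so the z-score rule is not invariant for this S.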
  • asked a question related to Anomaly Detection
Question
2 answers
I am familiar with using autoencoders to detect Fraud in credit card transactions, But my data is a time series one.
Thanks
Relevant answer
Answer
  • asked a question related to Anomaly Detection
Question
3 answers
to know technique for outlier
Relevant answer
Answer
Any effective supervised learning method is my choice
  • asked a question related to Anomaly Detection
Question
4 answers
know about outliers in detail
Relevant answer
Answer
Anomaly detection is also known as outlier detection.
Anomaly detection is mainly a data-mining process and is used to determine the types of anomalies occurring in a given data set and to determine details about their occurrences. It is applicable in domains such as fraud detection, intrusion detection, fault detection, system health monitoring and event detection systems in sensor networks. In the context of fraud and intrusion detection, the anomalies or interesting items are not necessarily the rare items but those unexpected bursts of activities. These types of anomalies do not conform to the definition of anomalies or outliers as rare occurrences, so many anomaly detection methods do not work in these instances unless they have been appropriately aggregated or trained. So, in these cases, a cluster analysis algorithm may be more suitable for detecting the microcluster patterns created by these data points.
Techniques for anomaly detection include:
One-class support vector machines
Determination of records that deviate from learned association rules
Distance-based techniques
Replicator neural networks
Cluster analysis-based anomaly detection
Specific techniques for anomaly detection in security applications include:
Profiling methods
Statistical methods
Rule-based systems
Model-based approaches
Distance based methods
  • asked a question related to Anomaly Detection
Question
4 answers
Specifically, I want to know real-time machine learning models that are capable of identifying anomalies considering streams of data with multiple features (i.e. multivariate time series).
I have found a benchmark from Numenta (NAB) that scores real-time machine learning models. However, most of the models compared there are for univariate time series and have not been extended to multivariate cases. Therefore, I am searching for similar real-time machine learning models that can handle multivariate input data streams.
Relevant answer
Answer
See the link. It is the standard work in the field. Best, D. Booth
  • asked a question related to Anomaly Detection
Question
2 answers
For anomaly detection, how can the false positive (FP) rate be improved using feedback, with deep learning and an unbalanced dataset?
Relevant answer
Answer
false positive
  • asked a question related to Anomaly Detection
Question
7 answers
I'm trying to calculate rainfall anomaly from monthly rainfall measurements taken between 1997 and 2017 at a single location in Panama (subjected to highly seasonal rainfall as well as El Niño-derived fluctuations in rainfall). My data is one measurement per month per year; 20 values for January (1997-2017), 20 for February, and so on. I have attached a screen shot of the first few rows of data if further clarification is needed.
I am aware of the common anomaly technique of subtracting the long term mean from the actual value and dividing by the standard deviation [(x-xbar)/stdev], but as I understand it this should only be used on normally distributed data.
My issue is that while SOME of my months have normally distributed (gaussian) rainfall over the sampling period, others do not - some are uniformly distributed, some are lognormal. I'm not clear on whether I can use the same anomaly calculation for these different distributions, or if it's appropriate to transform the data. It seems inappropriate to use different anomaly calculations on different months, since I want to eventually compare anomalies from different months together.
I am new to this type of data so I am feeling a bit out of my depth. Any advice would be appreciated - thank you!
Relevant answer
Answer
Dear Emma Young,
In the attached pdf, there is a lot of useful information I believe can help you.
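For readers following the question, the standardized anomaly it describes, (x - xbar)/stdev computed separately for each calendar month, can be sketched as below. The percentile rank is included only as one possible distribution-free alternative; it is an assumption of this sketch and not something recommended in the thread. The function names and sample values are made up for illustration.

```python
from statistics import mean, stdev


def monthly_standardized_anomalies(records):
    """records: list of (year, month, rainfall) tuples.

    Returns {(year, month): z}, where z uses the mean and sample standard
    deviation of that calendar month across all years.
    """
    by_month = {}
    for _year, month, value in records:
        by_month.setdefault(month, []).append(value)
    stats = {m: (mean(v), stdev(v)) for m, v in by_month.items()}

    result = {}
    for year, month, value in records:
        mu, sd = stats[month]
        result[(year, month)] = (value - mu) / sd if sd > 0 else 0.0
    return result


def percentile_rank(value, sample):
    """Fraction of the sample that is <= value (no distribution assumed)."""
    return sum(1 for s in sample if s <= value) / len(sample)


# Made-up illustrative values, not the asker's data.
records = [(1997, 1, 10.0), (1998, 1, 20.0), (1999, 1, 30.0),
           (1997, 7, 200.0), (1998, 7, 300.0), (1999, 7, 400.0)]
anoms = monthly_standardized_anomalies(records)
```

Because each month is scaled by its own statistics, a dry-season and a wet-season value end up on the same unitless scale. Whether that scale is comparable across months with differently shaped distributions is exactly the open issue raised in the question.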
  • asked a question related to Anomaly Detection
Question
3 answers
I've sketched the following basic metrics categories that one can get from a cloud-native application monitoring solution and use for e.g. anomaly detection:
  • structural (e.g. length of a request trace)
  • temporal (e.g. traces as distributed on the timescale, duration of the trace and calls within it)
  • performance (e.g. 99%-tile response time, throughput)
  • resource (e.g. CPU utilization, memory utilization)
  • workload (e.g. request rate by endpoint)
  • capacity (e.g. number of pods for service)
What do you think about it? Is there something missing? Can some more coarse-granular grouping be introduced?
Relevant answer
Answer
I agree. This matter may be considered in the broader context of control-room-based network or system supervision. This applies to transport networks (Deutsche Bahn, a main railway operator in Europe, supervises rails and trains), electricity (production, transport, or distribution operators supervise their infrastructure), and telecom (AT&T supervises its networks and services).
In all these cases you have:
-service layer
-asset layer (assets using infrastructure underneath)
-infrastructure layer
-physical system assets composing the infrastructure.
For each layer you have parameters monitored and alarms.
If you want to go deeper, look for ETSI-ISG-NFV; this is about network function virtualisation and going back and forth between real systems and logical representations.
  • asked a question related to Anomaly Detection
Question
3 answers
What would be a good search range for Nu and Gamma values in OneClassSVM if I want to do a grid search? I have ~17000 training samples of one class, with each sample consisting of around 300 dimensions (I might do PCA or another method to reduce dimensionality).
Relevant answer
Answer
Ravi,
let us take each parameter one by one.
1. Nu is the parameter that controls the training errors (and the number of SVs). This parameter is always within the range (0,1].
2. The Gamma parameter determines how far the influence of a single training sample reaches in the kernel. The range of this parameter depends on your data and application.
For example, in the article, the values are chosen as:
  • Nu = [2^-10 to 2^-6] with steps of 2^0.1
  • Gamma = [2^-40 to 2^-13] with steps of 2^1
As you already mentioned, grid search is the best way to get the optimal Nu-Gamma pair for classification. One suggestion is that you could use Hinge-Loss rather than RMSE to determine optimal parameters if you're performing binary detection [-1, +1]. However, if you aim to do a probabilistic detection, you could go for RMSE.
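The search grid described above can be generated in a few lines. This is a minimal sketch that assumes the quoted ranges are powers of two (Nu from 2^-10 to 2^-6 in exponent steps of 0.1, Gamma from 2^-40 to 2^-13 in exponent steps of 1). The helper name `power_of_two_grid` is hypothetical.

```python
def power_of_two_grid(exp_start, exp_stop, exp_step):
    """Return [2**e] for e from exp_start to exp_stop (inclusive)."""
    n_steps = round((exp_stop - exp_start) / exp_step)
    return [2.0 ** (exp_start + i * exp_step) for i in range(n_steps + 1)]


# Assumed reading of the ranges quoted in the answer.
nu_grid = power_of_two_grid(-10, -6, 0.1)
gamma_grid = power_of_two_grid(-40, -13, 1)

# Every (nu, gamma) pair to evaluate in the grid search.
param_pairs = [(nu, gamma) for nu in nu_grid for gamma in gamma_grid]
```

Each pair could then be passed to a one-class SVM implementation of your choice and scored on held-out data. Spacing the values evenly in the exponent rather than in the value keeps the grid equally dense across orders of magnitude.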
  • asked a question related to Anomaly Detection
Question
4 answers
We are working on anomaly detection, where we want to quantify the influence of a subset of attributes (or features) for making a data set anomalous.
Let us consider an illustrative example as follows.
We have an anomalous data set D having N attributes and M instances. Now, we execute some deterministic anomaly detection algorithm to retrieve the anomalous instances from D. Let us consider that this algorithm has retrieved Q (Q << M) instances as anomalous.
In the next phase, we wish to quantify the influence of a subset of attributes, say P (P << N), in making Q instances of data set D anomalous. How can we quantify this?
Any help is well appreciated in advance.
Thanks & Regards
  • asked a question related to Anomaly Detection
Question
9 answers
My goal is to detect intranet DDoS attacks and their variants through a log dataset. I have other insider-threat log datasets for anomaly detection, like CERT or LANL; however, it is not certain that their anomalies are caused by DDoS. Where can I get a new log dataset for insider threat detection that was demonstrably attacked by DDoS?
Relevant answer
Answer
Have you attempted to use CIDDS?
  • asked a question related to Anomaly Detection
Question
5 answers
What's the best open-source (i.e., free) approach/library/tool for unsupervised/semi-supervised [i.e., with limited to no training data] time-series [like this - https://github.com/numenta/nupic/blob/master/src/nupic/datafiles/extra/nycTaxi/nycTaxi.csv] anomaly detection?
Relevant answer
Answer
R and Python
  • asked a question related to Anomaly Detection
Question
3 answers
I’m currently using features that are built with statistics over a certain window. This takes in, e.g., 10 data points and turns them into one using PCC, KL, or a simple average (see link). The predictions are also made over a sliding window, meaning one anomaly will be present in multiple windows.
If you have two classes, ‘normal’ and ‘anomaly’, how do I best score performance on the test set?
Relevant answer
Answer
I'm not familiar with R, since I work with Python (Keras, TensorFlow, ...).
I can find some functions for calculating ROC curves but that's not the problem.
I can't seem to find in any literature or documentation how the performance is measured when one anomaly is present in different windows for sliding window algorithms.
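One way to handle an anomaly that appears in several overlapping windows is to score at the event level: an anomaly counts as detected if any flagged window overlaps it, and a flagged window counts as a true alarm if it overlaps any anomaly. The sketch below illustrates that idea only. It is an assumption about how one might score such predictions, not an established standard from the thread, and all names and values are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def event_level_scores(windows, flagged, events):
    """Return (precision over flagged windows, recall over anomaly events)."""
    flagged_windows = [w for w, f in zip(windows, flagged) if f]
    detected = sum(
        1 for e in events if any(overlaps(w, e) for w in flagged_windows)
    )
    true_alarms = sum(
        1 for w in flagged_windows if any(overlaps(w, e) for e in events)
    )
    recall = detected / len(events) if events else 0.0
    precision = true_alarms / len(flagged_windows) if flagged_windows else 0.0
    return precision, recall


# Windows of length 10 with stride 5: (0,10), (5,15), (10,20), (15,25), (20,30)
windows = [(s, s + 10) for s in range(0, 25, 5)]
flagged = [False, True, True, False, True]  # made-up model output
events = [(12, 13)]                         # one anomaly at index 12

precision, recall = event_level_scores(windows, flagged, events)
```

In this toy case the single anomaly is caught by two of the three flagged windows, so recall is 1 while precision is 2/3. Counting every overlapping window as a separate true positive would instead inflate the scores in proportion to the window overlap.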
  • asked a question related to Anomaly Detection
Question
2 answers
Greetings, in the iscx-2012 dataset, there is a labelled-flow file and pcap files per day. Could someone tell me how to generate .CSV for use in machine learning algorithms? Can I use the labelled-flows.xml or do I have to generate a .CSV from the PCAP? Any link how to do this?
Relevant answer
Answer
Hi, I would say use Python. You can use Python to read the pcap file, extract your desired features (splitting datetimes or concatenating strings, for example), then put this new data into rows and columns (construct a DataFrame). Then export the DataFrame to a CSV file. Current tools, I guess, mostly do not extract information in the way you want for ML.
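If the labelled-flow XML already contains the fields you need, the export step can be done with the Python standard library alone. The sketch below assumes a flat layout in which every child of the root is one flow and every grandchild is one field. The element names (`dataroot`, `flow`, `source`, `destination`, `Tag`) are made up for illustration and are not taken from the actual ISCX files, so check them against your data.

```python
import csv
import io
import xml.etree.ElementTree as ET


def flows_xml_to_csv(xml_text):
    """Convert flat <root><flow><field>..</field></flow></root> XML to CSV text."""
    root = ET.fromstring(xml_text)
    rows = [
        {field.tag: (field.text or "").strip() for field in flow}
        for flow in root
    ]
    # Collect the union of field names in first-seen order for the header.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample; real files will have different tags and more fields.
sample = """<dataroot>
  <flow><source>192.168.1.10</source><destination>10.0.0.5</destination><Tag>Normal</Tag></flow>
  <flow><source>192.168.1.11</source><destination>10.0.0.5</destination><Tag>Attack</Tag></flow>
</dataroot>"""
csv_text = flows_xml_to_csv(sample)
```

For large files, a streaming parser such as `xml.etree.ElementTree.iterparse` would avoid loading the whole document into memory.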
  • asked a question related to Anomaly Detection
Question
1 answer
The statistical approaches can be an HMM or the non-parametric cumulative sum (CUSUM) algorithm, but I am not able to understand how to implement these ideas.
Any help would be highly appreciated.
Thanks
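As a starting point for the cumulative sum idea mentioned in the question, here is a minimal one-sided CUSUM sketch. It accumulates deviations above a baseline and raises an alarm when the running sum crosses a threshold. The parameter names (`target`, `drift`, `threshold`) and the sample series are illustrative assumptions; in practice the baseline would be estimated from attack-free data.

```python
def cusum_alarms(values, target, drift, threshold):
    """One-sided (upward) CUSUM.

    S_t = max(0, S_{t-1} + x_t - target - drift)
    An alarm is recorded when S_t > threshold, after which S is reset to 0.
    """
    s = 0.0
    alarms = []
    for i, x in enumerate(values):
        s = max(0.0, s + x - target - drift)
        if s > threshold:
            alarms.append(i)
            s = 0.0
    return alarms


# Made-up series: baseline around 10, sustained increase from index 5 to 8.
series = [10, 11, 9, 10, 10, 15, 16, 15, 17, 10]
alarms = cusum_alarms(series, target=10.0, drift=1.0, threshold=8.0)
```

The `drift` term absorbs small fluctuations so that only sustained shifts accumulate, and `threshold` trades detection delay against false alarms.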
  • asked a question related to Anomaly Detection
Question
4 answers
I want to find the expected value when I know what the anomaly is.
Relevant answer
Answer
  • asked a question related to Anomaly Detection
Question
3 answers
How can pattern-learning anomaly detection be carried out for SQL injection attacks?
Which learning-based anomaly detection tools can be used to classify an SQL injection dataset?
Relevant answer
Answer
  • asked a question related to Anomaly Detection
Question
5 answers
I use a dataset of the activities that an elderly person performed during a year. It has the features start time, end time, and activity name, as below:
08:52:12 - 08:55:38 - Washing hand/face
08:57:36 - 09:05:53 - Make coffee
09:07:38 - 09:12:52 - Washing hand/face
09:13:57 - 09:21:10 - Make sandwich
09:23:08 - 09:43:11 - Eating
..
I want to insert abnormal situations in which an activity lasts longer than usual or is performed more frequently during a day.
I'm programming in Python. What should I do?
If I want to insert an abnormal record, should I change the times of all the records that come after that record?
Relevant answer
Answer
Insert records of activities with start time and NULL value for end time
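On the question of whether later records need to change: one option, sketched below, is to lengthen the chosen activity and shift every later record on the same day by the same amount, so the timeline stays free of overlaps. This is an illustrative assumption rather than the approach suggested in the answer above, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"


def stretch_activity(records, index, extra_minutes):
    """Lengthen records[index] and shift all later records by the same amount.

    records: list of (start, end, name) with 'HH:MM:SS' strings for one day,
    sorted by start time. Shifts that cross midnight are not handled here.
    """
    delta = timedelta(minutes=extra_minutes)
    result = []
    for i, (start, end, name) in enumerate(records):
        s = datetime.strptime(start, FMT)
        e = datetime.strptime(end, FMT)
        if i == index:
            e += delta          # the abnormal, longer activity
        elif i > index:
            s += delta          # keep later activities in sequence
            e += delta
        result.append((s.strftime(FMT), e.strftime(FMT), name))
    return result


day = [("08:52:12", "08:55:38", "Washing hand/face"),
       ("08:57:36", "09:05:53", "Make coffee"),
       ("09:07:38", "09:12:52", "Washing hand/face")]
abnormal_day = stretch_activity(day, index=1, extra_minutes=30)
```

Keeping a separate list of the injected indices gives you ground-truth labels for evaluating a detector afterwards.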
  • asked a question related to Anomaly Detection
Question
5 answers
In general, to detect an anomaly in multivariate data (not necessarily time series), do we need to check which distribution the data is drawn from? This is straightforward for univariate data, but how can one identify a mixture distribution when the individual variables have been drawn from different distributions (other than Gaussian)? How should one approach anomaly detection in such a situation? Transforming the data to near-normal is an option, but wouldn't that distort the very properties of the underlying probability distribution and lead to false anomalies?
Relevant answer
Answer
Hi Jack. Thanks for answering. Yes, this is what I expected, as I have seen many instances where the data is normalized before the anomalies are found. My case, though, is different. I am getting data from different sensors in a machine, and these sensors together define the health of the machine in a given time interval. I am interested in conducting some unsupervised analysis to find out whether one of the sensors behaves differently than normal. Transforming to normality doesn't make sense, as the underlying data is strictly non-normal.
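For a per-sensor check that does not transform the data towards normality, one option is a score based on the median and the median absolute deviation (MAD). This is only a sketch of one possible approach and is not proposed anywhere in the thread. The cutoff of 5 and the readings are arbitrary illustrative values.

```python
from statistics import median


def robust_scores(values):
    """Distance of each value from the median, in units of the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the readings are identical; no spread to scale by.
        return [0.0 for _ in values]
    return [abs(v - med) / mad for v in values]


# Made-up readings from a single sensor; the last one is unusual.
readings = [1.0, 1.1, 0.9, 1.0, 5.0]
scores = robust_scores(readings)
flagged = [i for i, s in enumerate(scores) if s > 5.0]  # arbitrary cutoff
```

Because the median and MAD are barely affected by a few extreme values, the anomalies do not inflate the scale they are measured against. The score treats each sensor separately, so it would miss anomalies that only show up in the relationship between sensors.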
  • asked a question related to Anomaly Detection
Question
2 answers
Recent experiments applied statistical methods to detect shifts in in-home activity routines. These methods treated each type of household activity as initially normally distributed, with the distribution segmented into several regions of differing degrees of abnormality, and reported accuracy beyond 90%.
Hoque et al. focused on reducing false alarms in clustering-based anomaly detection on in-home activities with a rule-based approach.
Source: E. Hoque, R. F. Dickerson, S. M. Preum, M. Hanson, A. Barth and J. A. Stankovic, “Holmes: A Comprehensive Anomaly Detection System for Daily In-home Activities,” 2015 International Conference on Distributed Computing in Sensor Systems, Fortaleza, pp. 40-51, 2015.
  • asked a question related to Anomaly Detection
Question
3 answers
With respect to network anomaly detection in smart home environments using machine learning: for example, is only a small dataset needed for training? Is little labeled training data needed?
Relevant answer
Answer
Hey,
maybe a 1-SVM (one-class SVM) can help. The 1-SVM is a semi-supervised anomaly detector. You have data that is classified as "normal" and the goal is to find the unlabeled anomalies. By fitting an SVM to only one class, all the anomalies will fall on the other side of the hyperplane that constrains the normal class, thus identifying the anomalies. After finding those anomalies you can analyze the features. Support vector machines usually work fine with small datasets.
  • asked a question related to Anomaly Detection
Question
1 answer
Hi Everyone,
I need your advice on using a multivariate Gaussian distribution for multiple attributes (more than two). For example, my input data set contains multiple attributes {A1, A2, A3, A4, ..., An}. I want to use a Gaussian distribution to observe the trend in the input data and use it for techniques like anomaly detection.
In all the examples on the internet (at least those I have seen), the multivariate Gaussian distribution is used only for 2 dimensions, i.e. x1, x2. Please let me know whether using a multivariate Gaussian distribution for multiple attributes is a valid case.
Relevant answer
Answer
As far as I know you can use as many dimensions for your probability distributions as needed. Only the visualization of results will become difficult.
A simple example would be a machine that produces cubes of some alloy. The cubes have tolerances on length, width, height and density. This gives you 4 probability density functions (PDFs) and would yield a 4-dimensional distribution for your cubes.
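For reference, the general n-dimensional Gaussian density can be written as follows, with mean vector \boldsymbol{\mu} and covariance matrix \Sigma. Nothing in the formula is specific to n = 2. The second line shows the special case in which the attributes are independent, which is an assumption that the cube example above appears to make implicitly.

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\top}
  \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right),
  \qquad \mathbf{x},\boldsymbol{\mu}\in\mathbb{R}^{n}

% Independent attributes: \Sigma = \mathrm{diag}(\sigma_1^2,\dots,\sigma_n^2)
p(\mathbf{x}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i}
  \exp\!\left(-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}\right)
```

For anomaly detection, a point is typically flagged when p(\mathbf{x}) falls below a chosen threshold. With correlated attributes the full covariance form is needed, since the product form ignores the correlations.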
  • asked a question related to Anomaly Detection
Question
3 answers
I am working on time-series data where the phenomenon sometimes changes rapidly. I would like to learn about methods that have been tried on such data. I am also interested in classification and anomaly detection for time-series data.
Relevant answer
I hope the classification below is useful for your case.
Bauwens et al. (2006) categorized the multivariate GARCH models and distinguished three non-mutually exclusive approaches for constructing them: (i) direct generalization of the univariate GARCH model of Bollerslev (1986); (ii) linear combinations of univariate GARCH models; (iii) nonlinear combinations of univariate GARCH models. In the first category, they list Vech, BEKK, and factor models. The authors include generalized orthogonal models in the second category. In the third category, they have Constant Conditional Correlation (CCC) and Dynamic Conditional Correlation (DCC) models.
Bauwens, L., Laurent, S., and Rombouts, J. V. K. (2006). Multivariate GARCH models: A survey. Journal of Applied Econometrics 21: 79-109.
  • asked a question related to Anomaly Detection
Question
4 answers
I need the HYDICE Urban data set or another similar data set for hyperspectral anomaly detection. Can anyone help me? Please let me know, as this is an urgent need.
Relevant answer
Answer
The permanent link to the Urban HYDICE data set is:
By the way, this data looks like it's been atmospherically corrected, but certainly has spectral mismatches. Interesting that people use it unmoving when comparing to better quality measured spectral libraries.
  • asked a question related to Anomaly Detection
Question
4 answers
I want to know a good way of implementation (either Supervised or Unsupervised), working code with dataset or a good approach on how to solve this problem.
Relevant answer
Answer
@Christos
Can you suggest any implementation of a one-class SVM on text data?