Unsupervised Learning - Science topic

Questions related to Unsupervised Learning
Question
7 answers
I have an idea of these models:
1. RNN Embedding
2. RNN with pack_padded_sequence
3. FastText
4. Bi-LSTM
5. CNN
6. CNN-LSTM
7. BERT Transformer
I am looking for models apart from these.
Relevant answer
Answer
You can use MobileBERT, a compact BERT model open-sourced on GitHub, ... or the other Google open-source models using projection methods, namely SGNN, PRADO and pQRNN. pQRNN is much smaller than BERT but quantized, and can nearly achieve BERT-level performance despite being 300x smaller and being trained only on supervised data.
Question
4 answers
I have recently been studying clustering quality metrics such as Normalized Mutual Information and the Fowlkes-Mallows score.
Both of these metrics seem to summarise the quality of the entire clustering. I am wondering whether there is a standard way, or a variant of the metrics above, to measure the quality of a single cluster or a single class. The basic idea is that even if the overall result looks good but a certain cluster is problematic, the metric should still give a warning.
PS: I am not looking for any intrinsic methods. More precisely, let's assume that what I have is, for each data point x_i belonging to dataset X, a ground-truth class mapping x_i -> y_i and a clustering x_i -> z_i, where y_i and z_i indicate the memberships and don't necessarily have the same cardinality. Besides, I would like to further assume there is no distance measure d(x_i, x_j) defined.
Question
8 answers
Suppose I have collected data regarding, say, food preferences from multiple sources and merged them.
How can I decide what kind of clustering to do if I want to find related preferences?
Should I go for k-means, hierarchical, density-based, etc.?
Is there any process for selecting the clustering technique?
Relevant answer
Answer
If you know (or have a rough idea of) how many clusters you need, then you may try k-means or some of its variants such as k-means++ or the MinMax k-means clustering algorithm. If you do not have prior knowledge of the number of clusters, you can try different values of k and assess the goodness of the results using cluster validity indices such as the DB index, C index, CH index, Dunn index, etc., as in the sketch below. Otherwise you can use the DBSCAN clustering algorithm.
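As a rough illustration of scanning k with internal validity indices, here is a minimal sketch assuming scikit-learn and a numeric feature matrix X (the data and the range of k are placeholders):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score, calinski_harabasz_score

X = np.random.rand(200, 5)  # placeholder for the merged preference data

for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k,
          silhouette_score(X, labels),         # higher is better
          davies_bouldin_score(X, labels),     # lower is better
          calinski_harabasz_score(X, labels))  # higher is better

A simple rule of thumb is to pick the k where the indices agree on a clear optimum, keeping in mind that each index has its own bias.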
Question
13 answers
On an online website, some users may create multiple fake accounts to promote (like/comment on) their own comments/posts, for example on Instagram, to make their comment appear at the top of the list of comments.
This action is called Sockpuppetry. https://en.wikipedia.org/wiki/Sockpuppet_(Internet)
What are some general algorithms in unsupervised learning to detect these users/behaviors?
Relevant answer
Answer
Issa Annamoradnejad, this is a good question. A colleague from Canada told me there is a list of Twitter users marked as bot/fake accounts; if you can identify the pattern they follow, it could be useful for your work.
Question
3 answers
While I am intrigued by the fact that unsupervised learning algorithms don't require labels yet compute astounding results, I wonder what the stopping point in AI is. How do we know when the machine 'learning' is out of our hands and we can't decode what we originally created?
Is there some method to know what our algorithm is learning and on what basis?
Relevant answer
Answer
If you read the attached book carefully, I hope you will get a very good grasp of supervised learning (learning via labeled data) and unsupervised learning (learning via unlabeled data). You will also need some examples, preferably in MATLAB, to see how it works. You may follow some examples from here: https://github.com/hrzafer/machine-learning-class and https://www.coursera.org/learn/machine-learning
Question
1 answer
The use of cascaded neural networks for the inverse design of metamaterials and nanophotonics can effectively alleviate the problems caused by one-to-many mapping, but the intermediate layer of the cascaded network is trained in an unsupervised way, and an effective method is needed to limit the output range of the intermediate layer.
Relevant answer
Answer
Multi-layer networks can't and don't alleviate the problems caused by one-to-many mappings. What they do is allow the representation of associations that aren't linearly separable.
Since the output of any unit of an intermediate layer becomes an input to the next layer, the range of input/output values is determined by the activation function of each unit. So it is simple to limit the range: just choose the activation function accordingly, as in the sketch below.
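A minimal sketch of that idea, assuming PyTorch; the layer sizes and the target interval [low, high] are made up for illustration:

import torch
import torch.nn as nn

class BoundedIntermediate(nn.Module):
    def __init__(self, in_dim=16, mid_dim=8, out_dim=4, low=0.0, high=1.0):
        super().__init__()
        self.encoder = nn.Linear(in_dim, mid_dim)
        self.decoder = nn.Linear(mid_dim, out_dim)
        self.low, self.high = low, high

    def forward(self, x):
        # sigmoid maps to (0, 1); rescale it to the desired (low, high) interval
        mid = self.low + (self.high - self.low) * torch.sigmoid(self.encoder(x))
        return self.decoder(mid), mid

model = BoundedIntermediate()
output, intermediate = model(torch.randn(3, 16))  # intermediate values lie in (low, high)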
Question
4 answers
UDA (https://github.com/google-research/uda) can achieve good accuracy with only 20 labeled training examples on text classification.
But I find it hard to reproduce the result on my own dataset.
So I want to know why UDA works, and what the most important factor is for reproducing the result.
Relevant answer
Answer
There is a long discussion about the work (i.e., the paper) reported in the GitHub link you provided; it concerns the paper's rejection at a conference:
Question
11 answers
Supervised learning is the basis of deep learning, but human and animal learning are largely unsupervised. In order to make deep learning more effective in human life, we need to discover approaches that use deep learning to handle unsupervised learning. How much progress has been made in this direction so far?
Relevant answer
Answer
I think the statement "human and animal learning are unsupervised" is not exactly correct; we do need labeled data during our learning, just as you constantly correct your children when they learn at a young age. The limitation of unsupervised learning is not just the lack of labeled data, but also the difficulty of converting abstract tasks into a machine-readable format.
Currently there is some progress, such as self-supervised learning (using properties of the data as pseudo-labels), semi-supervised learning, etc.
Question
6 answers
I have an input data set as a 5x100 matrix, where 5 is the number of variables and 100 is the number of samples. I also have a target data set as a 1x100 matrix of continuous values. I want to design a model from the input and target data using a deep learning method. How can I enter my data (input and target) in this toolbox? Is it similar to the neural fitting (nftool) toolbox?
Relevant answer
Answer
Agree with Aparna Sathya Murthy
Question
3 answers
Hi.
I'm dealing with clustering of data where the resulting clusters are, in general, non-spherical. Some of them are not convex.
What are the best internal metrics for evaluating these kinds of clusters?
I know the silhouette index is very common for evaluating the result of a clustering process. However, it seems that the silhouette index is biased towards spherical clusters.
Relevant answer
Answer
Hello,
I think the meaning of "good clustering" is subjective and its interpretation varies across applications; the notion is relative and a question of point of view.
However, there are a few well-known measures such as the silhouette width (SW), the Davies-Bouldin index (DB), the Calinski-Harabasz index (CH), the Normalized Mutual Information (NMI) and the Dunn index.
From my point of view, single-link clustering is flexible in the sense that it can find non-isotropic and even concentric clusters; it works best for well-separated, non-spherical clusters.
Good luck.
Question
3 answers
Normalized Mutual Information (NMI) and B3 are used as extrinsic clustering evaluation metrics when each instance (sample) has only one label.
What are the equivalent metrics when each instance (sample) has multiple labels?
For example, in the first image we see [apple, orange, pears], in the second image we see [orange, lime, lemon], in the third image we see [apple], and in the fourth image we see [orange]. Then, putting the first and fourth images in one cluster is good, and putting the third and fourth images in one cluster is bad.
Application: many popular datasets for object detection or image segmentation have multiple labels per image. If we use this data for classification (not detection and not segmentation), we have multiple labels for each image.
Note: my task is unsupervised clustering, not supervised classification. I know that for supervised classification we can use top-5 or top-10 scores, but I do not know what the equivalent is in unsupervised clustering.
Question
69 answers
Dear researchers,
let's gather data regarding the coronavirus.
This could be used for analysis in a second step.
My first ideas:
1. Create predictive models
2. Search for similarities with Unsupervised Learning
3. Use Explainable AI for new insights.
What are your ideas?
Where did you find data?
Relevant answer
Answer
European scientists have identified 31 medications that may help in SARS-CoV-2 treatment, see https://www.pharmazeutische-zeitung.de/31-wirkstoffe-haben-potenzial-gegen-covid-19/
Question
4 answers
It's an era of deep learning, and especially unsupervised deep learning, which basically revolves around ART, SOM and autoencoders. Now the question arises: which issues do deep-learning-based methods address that are not properly handled by traditional unsupervised learning techniques?
Relevant answer
Answer
Kopal Rastogi Both PCA and autoencoders are dimensionality reduction techniques, but autoencoders are used more for detecting anomalies than for reducing data. Using an autoencoder, you try to reconstruct your data and then use the reconstruction error to identify anomalous data points, as in the sketch below. This reconstruction is something that PCA is not very good at, especially when you have non-linear relationships among variables. Autoencoders, on the other hand, can use a deep architecture to learn complex non-linear relationships as well.
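A minimal sketch of the reconstruction-error idea, assuming scikit-learn; the PCA reconstruction is compared with a small MLP trained to reproduce its input (a stand-in for a real autoencoder), and all data and sizes are placeholders:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

X = np.random.rand(500, 10)  # placeholder data

# Linear reconstruction with PCA
pca = PCA(n_components=3).fit(X)
pca_err = np.mean((X - pca.inverse_transform(pca.transform(X))) ** 2, axis=1)

# Non-linear reconstruction with an MLP trained to map X back to X
ae = MLPRegressor(hidden_layer_sizes=(3,), max_iter=2000).fit(X, X)
ae_err = np.mean((X - ae.predict(X)) ** 2, axis=1)

# Points with the largest reconstruction error are candidate anomalies
print(np.argsort(pca_err)[-5:], np.argsort(ae_err)[-5:])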
Question
11 answers
Can AI learn from processes instead of data? The question is valid for both supervised and unsupervised learning. If so, are there algorithms or approaches for learning from process executions?
Relevant answer
Answer
Sounds like a case for Hidden Markov models.
The algorithm of choice is
Regards,
Joachim
Question
10 answers
Hello, I'm a biologist interested in machine learning applications to genomic data; specifically, I'm trying to apply clustering techniques to differential gene expression data.
I started by understanding the basics of unsupervised learning and clustering algorithms with random datasets, but now I need to apply some of those algorithms (k-means, PAM, CLARA, SOM, DBSCAN...) to differential gene expression data and, honestly, I don't know where to begin, so I'd be grateful if someone could recommend some tutorials or textbooks, or give me some tips.
Thank you for your time!
PS: I'm mainly using the R language, but Python tutorials are also OK for me.
Relevant answer
Regards,
Antonio
Question
6 answers
I work on graph-based knowledge representation. I would like to know how we can apply deep learning to Resource Description Framework (RDF) data and what we can infer this way. Thanks in advance for your help!
Relevant answer
Answer
I always wonder why we need deep learning on RDF or OWL-based data. By the nature of the semantic web, graph data with specific relations already represents the outcome of learning. Thus, a query language or rule-based query language should enable answering the questions directly.
Question
9 answers
Hi,
I am pursuing a PhD and my area of work is pattern recognition with machine learning. I have covered both supervised and unsupervised learning (deep learning) during my PhD because of my topic. I have completed all my research work and am waiting to submit the thesis. I hope I'll be able to complete my thesis within 3 years. I have published 5 articles (4 conference and 1 Scopus journal) and have 5 unpublished articles.
Could you suggest what options I could pursue after completing my PhD, and why those options are good in your view (based on my profile)?
Thank you for your time.
Relevant answer
Answer
Alternatively, a postdoc can also be recommendable.
Question
12 answers
What are the links in their definitions? How do you interconnect them? What are their similarities or differences? ...
I would be grateful if you could reply by referring to valid scientific literature sources.
Relevant answer
Answer
AI is the overall discipline covering several subdisciplines like computer vision, language understanding and translation; even object-oriented programming was included in the early stages of AI. Other areas covered are voice recognition, machine learning, expert systems, business intelligence or rule-based systems, ATMS (assumption-based truth maintenance systems, like in KEE), genetic programming, fuzzy-logic-based expert systems or decision support systems, decision trees, etc.
The most powerful AI systems are hybrid systems that combine different technologies and algorithms to solve a certain problem. Neural networks are the traditional machine learning algorithms, and the computing power of modern CPUs, or even specialized CPUs, GPUs or APUs, allows extremely fast evaluation, making them useful even in cheap consumer products (like digital cameras, headphones, etc.). The specialized AI processors integrated in many smartphones today are typically very fast parallel math units supporting ML or even DL, making them useful in real-time applications.
Regarding literature, you will find tons of valid sources. I don't think a single one is outdated; the algorithms from early AI are still valid and I would look for
Just my 5 cents. ;-)
Good luck.
Question
1 answer
I have PCAP files captured from network traffic. What should be done so that the PCAP files can be processed with machine learning tools? What steps are needed so that the data can be analyzed with one of the unsupervised methods? Does the data have to be converted to CSV format?
Relevant answer
Answer
I think you should parse the PCAP file and convert each packet to a record before using the data with machine learning algorithms; a rough sketch is shown below. I suggest reading about the KDD Cup 99 Intrusion Detection dataset to understand what information can be extracted for each packet.
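As an illustration only, here is a minimal sketch assuming the scapy library; the file name and the chosen fields are placeholders, and real feature engineering (flows, statistics, etc.) would go further:

import csv
from scapy.all import rdpcap, IP

packets = rdpcap("capture.pcap")  # placeholder file name

with open("packets.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "length", "src", "dst", "proto"])
    for pkt in packets:
        if pkt.haslayer(IP):
            # one flat record per IP packet; these rows can then feed a clustering algorithm
            writer.writerow([float(pkt.time), len(pkt), pkt[IP].src, pkt[IP].dst, pkt[IP].proto])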
Question
9 answers
Suppose we have users and, for each user, we have: user_id, user_name, user_job title, user_skills, user_workExperience. I need to cluster the users based on their skills and work experience (long text data) and put them into groups. I have been searching for how to cluster text data but still haven't found a good example to follow step by step. Based on the data I have, I think I should use an unsupervised approach (as the data is not labeled). I found that I can use k-means or hierarchical clustering, but I'm stuck on how to find K, the number of clusters for k-means. Also, I don't know what the best way is to prepare the long text before feeding it into the clustering algorithm. Any idea or example that could help me would be very much appreciated. Thanks in advance.
Relevant answer
Answer
In addition to what was recommended above, I suggest that you first think about how best to create the data vectors. As I understand it, you have categorical data; for example, "user_job title" is a list of professions expressed as words, not numbers. So, instead of using N digits to represent these professions in a vector, you can use a binary scheme called "one-hot encoding" (see the sketch below).
Concerning the number of clusters, it is actually an important issue that depends on what result you want to get and what exactly you want to know about your data. You can ask experts how many categories they expect to see in the result, or create a scale of "skill and work experience" and divide it into segments. Or you can ask experts to label a small group of users and use the labeled data as a reference. You can also try to analyze your data mathematically or statistically; however, the method will again depend on what you want to know about your data in the end.
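A minimal sketch of one-hot encoding categorical user fields before clustering, assuming scikit-learn and pandas; the column names and values are invented for illustration:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.cluster import KMeans

users = pd.DataFrame({
    "job_title": ["engineer", "nurse", "engineer", "teacher"],
    "top_skill": ["python", "triage", "java", "grading"],
})

# Each category becomes a separate binary column
X = OneHotEncoder(handle_unknown="ignore").fit_transform(users)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)

For the long free-text fields (skills, work experience), a TF-IDF or embedding representation would replace the one-hot step; the clustering part stays the same.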
Question
11 answers
Which method extracts better features for unsupervised learning: PCA or an autoencoder?
Relevant answer
Answer
PCA will result in slower processing compared with an autoencoder, and note that a non-linear AE will behave non-linearly except when the input data is spanned linearly.
Question
9 answers
I am presently working on unsupervised learning for text classification.
The data is entered by end users in the business domain and can be on varying subjects.
Any new subject can get triggered at any point in time - hence continuous learning, creating new clusters/classes based on the entered text, is required.
Thus I want to avoid having any seed values such as density/epsilon/number of clusters, etc.
Is there any known algorithm to find the number of clusters and cluster the data incrementally? (So far I have tried a Gaussian measure along with other basic clustering algorithms - k-means, DBSCAN, etc.)
Relevant answer
Answer
My papers (about HiDoclus and OhDoclus) and the thesis proposal are available on my profile page with full text.
If you have any problem please let me know and I can send them to you.
Rui Encarnação
Question
6 answers
Hello,
Has anyone already worked with unsupervised image segmentation? If so, please give me your suggestions. I am using an autoencoder for unsupervised image segmentation and someone suggested I use normalized cut to segment the image. Is there any such algorithm other than normalized cut? Also, please suggest some reconstruction losses that are efficient to use.
Thanks in advance,
Dhanunjaya, Mitta
Relevant answer
Answer
You can take a look here as well :
a very interesting lecture.
Question
4 answers
Hello,
Has anyone already worked with an MR image dataset? If so, is there any model to remove motion artifacts from the MR images if they contain them? What should we do if we have an MR image with motion artifacts? Please give me your suggestions on whether it is possible to remove artifacts once the scan has been produced.
Thanks in advance,
Dhanunjaya, Mitta
Relevant answer
Answer
Motion artifacts produce a systematic motion-induced blurring error, which is different from, e.g., out-of-focus errors in visible-light images or artifacts generated by metallic objects in CT images, which are not systematic. The correction of this systematic blurring type of error has been studied with a special type of GAN (generative adversarial convolutional neural networks) called DeblurGAN.
In this article you'll find references to datasets and code and more explanations. The example datasets are street views but I think you'll be able to apply them to MRI.
Aldo
Question
6 answers
Hello,
I want to know how reinforcement learning differs from supervised and unsupervised learning. There is a reinforcement learning technique called Q-learning; could anybody please explain the working concept of the Q-learning method? Looking forward to useful comments on this.
Thanks
Question
5 answers
I'm a newbie in the field of deep reinforcement learning with a background in linear algebra, calculus, probability, data structures and algorithms. I have 2+ years of software development experience. As an undergraduate, I worked on tracking live objects from a camera using C++ and OpenCV. Currently, I'm intrigued by the work being done at Berkeley DeepDrive (https://deepdrive.berkeley.edu/project/deep-reinforcement-learning). How do I gain the knowledge to build a theoretical model of a self-driving car? What courses should I take? What projects should I do?
Relevant answer
Answer
Hi Aniruddha,
If you are able to spend some money on acquiring the knowledge, then Udacity's Self Driving Course is one of the best places to get started. More info at https://in.udacity.com/course/self-driving-car-engineer-nanodegree--nd013
The best part is they have open sourced some part of the codes which can be a great starting point. The codes are available at https://github.com/udacity/self-driving-car
To write software for self-driving cars, I would recommend using ROS (http://www.ros.org/). ROS has many inbuilt functionalities like object detection, path planning, node controls, etc., which can get you started easily. The ROS Wiki (https://wiki.ros.org/) can offer you a glimpse of what ROS is capable of.
ROS turtlebot autonomous navigation (https://wiki.ros.org/turtlebot_navigation/Tutorials/Autonomously%20navigate%20in%20a%20known%20map) will be a great tutorial to start with.
Though I have never used it, https://www.duckietown.org/independent/guide-for-learners is also an interesting platform to start with.
Regards,
Vishnu Raj
PS: If you find this answer useful, don't forget to upvote.
Question
9 answers
What should we expect in a few years?
Is it unsupervised learning?
Question
17 answers
For one of my studies, I designed an unsupervised predictive clustering model, and I am now searching for modification steps and post-processing to use that clustering model for classification in a reliable way.
Relevant answer
Answer
For supervised learning we need to have a labeled data set. If we do not, it is useful to run unsupervised learning algorithms to automatically label the unlabeled data. Once the data is labelled using clustering algorithms, it is possible to use supervised learning algorithms. To link the two tasks, a simple script can be written that connects the output of the clustering as the input for the classification task, as in the sketch below.
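A minimal sketch of that pipeline, assuming scikit-learn; the data, the choice of k and the decision tree are placeholders:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(300, 4)  # unlabeled data (placeholder)

# Unsupervised step: cluster assignments become pseudo-labels
pseudo_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Supervised step: train a classifier on the pseudo-labels
clf = DecisionTreeClassifier().fit(X, pseudo_labels)
print(clf.predict(X[:5]))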
Question
8 answers
In MATLAB, clustering data using kmeans can be achieved as shown below:
L = kmeans(X,k,Name,Value), where L contains the cluster index of each data point.
This implies that if I have 307 data points I should get a 307 x 1 array (L) containing the cluster index of each data point.
However, while using a SOM for clustering, I found that to get the indices you use the code snippet below:
net = selforgmap([dimension1 dimension2]);
% Train the Network
[net,tr] = train(net,X);
%get indices
L = vec2ind(net(X))';
For a network with 5 x 5 dimensions, it returns L as an array of dimension 25 x 1 instead of 307 x 1; for a network with 10 x 10 dimensions, it returns L as an array of dimension 100 x 1 instead of 307 x 1.
What am I doing wrong?
Or, to put it simply, how do I compute the class vector of each of the training inputs?
Relevant answer
Answer
Dear M. Awad,
The purpose of my previous answer was just to provide a small example. Obviously, the random training samples of my previous answer have to be substituted by the actual training samples of the corresponding problem.
Kind regards,
Carlos.
Question
6 answers
1. What is the best approach to assign weights to queries?
2. Is it conventional to use the same weights for the entire query set?
Thanks
Relevant answer
Answer
You may be interested in the following publications:
Question
12 answers
I'm new to MATLAB, and I'm wondering if someone can help me get started with a machine learning task.
I would like to perform Linear discriminant analysis (LDA) or support vector machine (SVM) classification on my small data set (matrix of features extracted from ECG signal), 8 features (attributes). The task is binary classification into a preictal state (class 1) and interictal state (class 2).
In MATLAB, I found the Classification Learner app, which enables using different kinds of classifiers including SVM, but I don't know if I can use the input data that I have to train the classifier in this app. I'm not sure how to start. Do you have any idea about this app? Please help!
Relevant answer
Answer
I made a youtube video for this. See https://www.youtube.com/watch?v=Db9Bnss8b-8&t=68s
Thanks
Anselm
Question
4 answers
Could anybody tell me how to use word embeddings to expand an initial query after getting the most similar words for each query term?
Relevant answer
Answer
The number of terms to join depends on two factors:
  1. Technical limitations of the retrieval engine. If the expanded query or the results get too large, the engine may have problems coping with it; I experienced this once with a search engine project.
  2. What kind of errors you are willing to accept. If you take too few terms into account your recall might be too low; if you take too many, your precision might suffer.
Actually, it all depends on which similar terms you still consider relevant. Unfortunately, this relevance can't be decided or measured without information about the user's interests. I myself am experimenting with some clustering techniques, but haven't found a definitive answer yet.
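For illustration, here is a minimal sketch of embedding-based query expansion assuming the gensim library and a pre-trained vector file; the file name, similarity cut-off and topn values are placeholders:

from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # placeholder file

def expand(query_terms, topn=5, min_sim=0.6):
    expanded = list(query_terms)
    for term in query_terms:
        if term in vectors:
            # keep only neighbours whose cosine similarity passes the cut-off
            expanded += [w for w, sim in vectors.most_similar(term, topn=topn) if sim >= min_sim]
    return expanded

print(expand(["heart", "attack"]))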
Question
3 answers
I'm having a concrete problem I'm trying to solve but I'm not sure in which direction I should go.
  • Goal: Identify formation of a soccer team based on a static positional data (x,y coordinates of each player) frame
  • Input: Dataframe with player positions + possible other features
  • Output: Formation for the given frame
  • Limited, predefined formations (5-10), e.g. 5-3-2 (5 defenders, 3 midfielders, 2 strikers)
  • Possible to manually label a few examples per formation
I already tried k-means clustering on single frames, only considering the x-axis to identify defense, midfield and offense players which works ok but fails in some situations.
Since I don't have (many) labels, I'm looking for unsupervised neural network architectures (like self-organizing maps) which might be able to solve this problem better than simple k-means clustering on single frames.
I'm looking for an architecture which could utilize the additional information I have about the problem (number and type of formations, few already labeled frames, ..).
Relevant answer
Answer
Dear Blust,
Please follow the papers given below:
1. Bialkowski, A., Lucey, P., Carr, P., Yue, Y., Sridharan, S., & Matthews, I. (2014, December). Large-scale analysis of soccer matches using spatiotemporal tracking data. In Data Mining (ICDM), 2014 IEEE International Conference on (pp. 725-730). IEEE.
2. Link, D. (2018). Data Analytics in Professional Soccer: Performance Analysis Based on Spatiotemporal Tracking Data. Springer.
Question
3 answers
I applied supervised and unsupervised learning algorithms to the data set which is available in the UCI repository. I want to know further whether I can find a dataset based on the location of the user, the history of previous transactions and the time span between two consecutive transactions.
Question
12 answers
I have a dataset which contains normal as well as abnormal (counter data) behavior.
Relevant answer
Answer
You can use clustering algorithms like
1. k-means/k-medoids
2. Fuzzy c-means
3. partition-based clustering, etc.
for the classification issue; a rough sketch of using the cluster structure to flag abnormal points is given below.
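As an illustration, here is a minimal sketch of a cluster-based anomaly score, assuming scikit-learn; the data, number of clusters and percentile threshold are placeholders:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(400, 6)  # placeholder counter features

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
dist = np.min(km.transform(X), axis=1)   # distance of each point to its closest centroid
threshold = np.percentile(dist, 95)      # e.g. treat the top 5% as abnormal
anomalies = np.where(dist > threshold)[0]
print(anomalies)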
Question
7 answers
I did not use English but one of the under-resourced languages of Africa. The challenge is testing the unsupervised learning.
I am looking for a way to test/evaluate this model.
Please refer me to links and tutorials about testing/evaluating unsupervised learning.
Relevant answer
Answer
The basic idea is that semantic vectors (such as the ones provided by Word2Vec) should preserve most of the relevant information about a text while having relatively low dimensionality, which allows better machine learning treatment than straight one-hot encoding of words. Another advantage of topic models is that they are unsupervised, so they can help when labeled data is scarce. Say you only have one thousand manually classified blog posts but a million unlabeled ones. A high-quality topic model can be trained on the full set of one million. If you can use topic-modeling-derived features in your classification, you will be benefitting from your entire collection of texts, not just the labeled ones.
Getting the embedding
Ok, word embeddings are awesome, how do we use them? Before we do anything we need to get the vectors. We can download one of the great pre-trained models from GloVe:
The (python) meat
We got ourselves a dictionary mapping word -> 100-dimensional vector. Now we can use it to build features. The simplest way to do that is by averaging word vectors for all words in a text. We will build a sklearn-compatible transformer that is initialised with a word -> vector dictionary.
These vectorizers can now be used almost the same way as CountVectorizer or TfidfVectorizer from sklearn.feature_extraction.text. Almost - because sklearn vectorizers can also do their own tokenization - a feature which we won’t be using anyway because the benchmarks we will be using come already tokenized. In a real application I wouldn’t trust sklearn with tokenization anyway - rather let spaCy do it.
Now we are ready to define the actual models that will take tokenised text, vectorize and learn to classify the vectors with something fancy like Extra Trees. sklearn’s Pipeline is perfect for this:
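As a minimal sketch of the averaging transformer described above (the word -> vector dictionary is assumed, not taken from a specific library), which can then be placed at the start of such a Pipeline:

import numpy as np

class MeanEmbeddingVectorizer:
    """Map a tokenised text to the mean of its word vectors (sklearn-style fit/transform)."""
    def __init__(self, word2vec):
        self.word2vec = word2vec                       # dict: word -> np.array
        self.dim = len(next(iter(word2vec.values())))

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.array([
            np.mean([self.word2vec[w] for w in tokens if w in self.word2vec]
                    or [np.zeros(self.dim)], axis=0)   # all-zero vector for unknown-only texts
            for tokens in X
        ])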
Question
4 answers
Nowadays there are plenty of core technologies for TC (Text Classification). Among all the ML approaches, which one would you suggest for training models for a new language and a vertical domain (like Sports, Politics or Economy)?
Relevant answer
Answer
The state-of-the-art for most text classification applications relies on embedding your text in real-valued vectors:
The gensim package is popular for training word vectors on data: https://radimrehurek.com/gensim/models/word2vec.html
This method relies on having rich, diverse collections of words and contexts, which your data may not have on its own. Thus it's popular to initialize your embedding matrix using pre-trained word vectors like word2vec or fasttext; in some cases, these will work out of the box, in some you'll want to continue training the vectors on your dataset, in others it's better to just train on your data alone.
The great thing about embedding methods is they don't care about language; you can create an embedding for any language or really any sequential data that endows discrete data with a sort of 'meaning'.
Once you have richer features from your embedding matrix, you can use these as inputs to a classifier, which can be as simple as softmax regression, which assigns probabilities to discrete classes, or as complex as an RNN/LSTM, which ultimately can do the same but typically for sequential data.
The choices you make here depend more heavily on what specific problem you're trying to solve, but here are a few examples:
Question
7 answers
Dear all respectful researchers,
I am working on a structured biomedical dataset that contains many data-type inconsistencies, outliers and missing values (instances) across seven independent variables (attributes). I am considering performing pre-processing such as data standardization and imputation to address the issues mentioned above. However, there are two versions of these pre-processing methods, that is, supervised and unsupervised ones.
My main two questions regarding the common practice are:
1. Should I perform an unsupervised discretisation method on the dataset to solve the data-type issue when, subsequently, I conduct cluster analysis using the k-means clustering algorithm?
2. After completing the first clustering task above, should I perform a supervised discretisation method on the same dataset when I train the model for the classification task using supervised machine learning algorithms?
Thank you for your information and experience.
Relevant answer
Answer
Pekka already answered your question to me.
Naturally both your development data and validation data require pre-processing. What I mean is that you should under no circumstance pool both data sets and pre-process them as a whole for classification. Take, for instance, normalisation as a pre-processing step: the development data is used to determine the scale and shift parameters, and both the development and validation data are then normalised using these parameters. If we were to use the entire combined data set for normalisation, we would bias our classification models. The other alternative (a separate normalisation based on the validation set) is not as problematic, but may decrease classification performance if the validation set is small. This does not only apply to normalisation, but to other pre-processing steps (imputation, PCA, clustering etc.) as well.
Thus, you may perform one analysis on the combined data set, but the information generated should generally not be used for building models. In your case, you may perform a segmentation study on the whole data set, and also perform classification if, and only if, segmentation is not population-based (i.e. no information about other data is used when segmenting one data set), or if the segmentation results are not used for classification.
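A minimal sketch of the normalisation example above, assuming scikit-learn; the data shapes are placeholders:

import numpy as np
from sklearn.preprocessing import StandardScaler

X_dev = np.random.rand(100, 7)   # development set (placeholder)
X_val = np.random.rand(30, 7)    # validation set (placeholder)

scaler = StandardScaler().fit(X_dev)    # scale/shift parameters come from the development data only
X_dev_scaled = scaler.transform(X_dev)
X_val_scaled = scaler.transform(X_val)  # validation data reuses those parameters, no refitting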
Question
2 answers
In top journal papers, there are many works on accelerometer signals. Most of them follow these steps:
1. Handling signals with different lengths --- never mentioned in any paper
2. Pre-processing (filtering out noise) - optional
3. Signal segmentation
4. Feature extraction
5. Supervised (or) unsupervised learning
Nevertheless, none of the papers mention the technique they used to handle signals of different lengths, for example a variation from 600 s to 13,200 s (with the same sampling rate of 100 Hz). Since such missing information can lead to inaccurate comparisons, I'm surprised that top journals didn't give importance to this issue. I would like to know the best technique to handle varied signal lengths. Please don't consider the sampling rate issue, since all signals have the same sampling rate; I would like to know the most commonly used technique to handle signals with different lengths.
Relevant answer
Answer
Hi,
Prior work in signal processing and machine learning for time series problems (including accelerometer signals) has strongly emphasised regularly sampled, equal-length time series, resulting in fewer methods that exist specifically for analysing irregularly sampled signals and, in particular, very few methods for unequal-length signals.
Irregularly sampled time signals: for analysing irregularly sampled time series directly, many techniques have already been proposed, for example spectral analysis [3, 4] and kernel-based methods [5]. These techniques have been used for extracting causal structures and statistics from data in fields such as astronomy [6, 7], palaeontology [5] and economics [8]. Alternatively, irregular time series data are transformed into regularly spaced data through some form of interpolation. For supervised learning tasks such as classification, this has the added advantage of enabling the time series to be equalised to a given length by sampling from the interpolated function, thereby enabling standard classification algorithms to be used.
Unequal-length time signals: to tackle the problem of unequal-length time series, we recently proposed a new machine learning regression algorithm known as the probabilistic broken-stick model [1]. Using a set of locally linear line segments, our novel algorithm can model any complex, non-linear function, catering for both short-term interpretability and long-term flexibility of any irregularly sampled or unequal-length time series simultaneously. This article is freely available for over a month (until 1st Jan 2018); if you are interested, please find it here: https://lnkd.in/e3-Wd6Y . The paper is also available on arXiv: https://arxiv.org/pdf/1612.01409.pdf
In another paper [2], we proposed Gaussian process regression to make unequal-length time series equal in length for supervised learning, i.e., classification; this was published at the Machine Learning for Signal Processing workshop. That paper is also available on arXiv: https://arxiv.org/pdf/1605.05142.pdf
From references [1,2], you may find citations to other researchers' works that you may find interesting.
All the best
Santosh
References:
[1] Norman Poh, Santosh Tirunagari, Nicholas Cole, Simon de Lusignan, Probabilistic broken-stick model: A regression algorithm for irregularly sampled and unequal length data with application to eGFR, In Journal of Biomedical Informatics, Volume 76, 2017, Pages 69-77, ISSN 1532-0464, https://doi.org/10.1016/j.jbi.2017.10.006.
[2] Tirunagari, Santosh, Simon Bull, and Norman Poh. "Automatic classification of irregularly sampled time series with unequal lengths: A case study on estimated glomerular filtration rate." In Machine Learning for Signal Processing (MLSP), 2016 IEEE 26th International Workshop on, pp. 1-6. IEEE, 2016.
[3] Michael Schulz and Karl Stattegger, “Spectrum: Spectral analysis of unevenly spaced paleoclimatic time series,” Computers & Geosciences, vol. 23, no. 9, pp. 929–945, 1997.
[4] Petre Stoica, Prabhu Babu, and Jian Li, “New method of sparse parameter estimation in separable models and its use for spectral analysis of irregularly sampled data,” Signal Processing, IEEE Transactions on, vol. 59, no. 1, pp. 35–47, 2011.
[5] Kira Rehfeld, Norbert Marwan, Jobst Heitzig, and Jürgen Kurths, "Comparison of correlation analysis techniques for irregularly sampled time series," Nonlinear Processes in Geophysics, vol. 18, no. 3, pp. 389–404, 2011.
[6] Piet Broersen, “Time series models for spectral analysis of irregular data far beyond the mean data rate,” Measurement Science and Technology, vol. 19, no. 1, pp. 14, 2008.
[7] C. Thiebaut and S. Roques, “Time-scale and time-frequency analyses of irregularly sampled astronomical time series,” EURASIP J. Appl. Signal Process., vol. 2005, pp. 2486–2499, 2005.
[8] Ulrich Müller, "Specially weighted moving averages with repeated application of the EMA operator," 2000.
Question
10 answers
With respect to unsupervised learning such as clustering, are there any metrics to evaluate its performance, as there are for supervised learning?
Relevant answer
Answer
There are various index measures available in the literature for evaluating clusters; also go through the book by Prof. A. K. Jain regarding cluster validity.
Question
5 answers
The deep multilayer perceptron is based on supervised learning while the deep belief network is based on unsupervised learning. Looking at the malware detection scenario, which method would be best?
Relevant answer
Answer
I think this question has two aspects to answer correctly. First, the detection rate is higher with supervised learning than with unsupervised learning. Second, unsupervised learning can detect both known and unknown attacks, and in this respect it is better than the supervised one. Ultimately, you can apply ensemble deep learning to improve your malware detection scheme.
Question
3 answers
Hi,
Please, when should I opt for unsupervised learning, and what is the benefit of unsupervised learning techniques?
Thanks & Best wishes
Osman
Relevant answer
Answer
Hey Osman,
In unsupervised learning you do not have a supervisor telling you what is right and what is wrong, i.e. you do not have input data paired with example output data from which to learn the input/output relation.
Normally you use unsupervised learning when you want to find general structure in your data, for example major trends, a low-dimensional embedding or clusters.
Do you have a particular problem, or can you refine your question?
Best,
Christoph
Question
4 answers
I have implemented 3 bootstrapping models for learning product details, then compared them with several different performance measures and found out which model learned best. Now I want to do something like optimization/ensembling (is this possible with the models' results?), or please suggest some other simple process to conclude my work. Moreover, the work was performed in an unsupervised manner, so please help me with how to improve my models' results (like TP, TN, FP, FN or the learned product details). Thanks in advance.
Relevant answer
Answer
Thanks for your ideas. I will try these suggestions. Once again, thanks all.
Question
2 answers
As the basic concepts used in association rule learning are related to conditional probability and the ratio to independence, I was wondering if correspondence analysis has been used with this in the literature. I understand the main motivation in association rule learning is efficiency in CPU time and memory usage, but these days SVD (singular value decomposition) is pretty fast and some algorithms are very sparing in memory usage.
Relevant answer
Answer
.
MCA on binary data will lead to a continuous factorial space (admittedly with a much smaller dimensionality if strong correlations are present)
  • to run AR on this space, you'll need to discretize the factorial space (not so easy)
  • to interpret the AR results, you'll need to have first a good interpretation of your factors (not easy at all, except in textbooks, maybe)
one advantage of AR is that the results can directly be interpreted in the description space; this is lost after MCA
however, MCA has been used together with AR so as to rank/select the "best" rules:
.
Question
8 answers
Is there any way to compare the accuracy or cost of these two methods with each other: SVM and k-means clustering?
Relevant answer
Answer
Hi Sepideh,
As you mentioned previously, they belong to different Machine Learning concepts. K-means is utilized when we don't know the labels of our training samples (Unsupervised Learning), whereas SVMs are used for Supervised Learning, in which we know the class that each training sample belongs to.
Having said that, and as Samer Sarsam mentioned, you can in some way convert the clustering problem to a classification one. One method can be the following:
After running the K-means algorithm, we will have each of the training samples assigned to a specific cluster. Then, what we can do is assign or "classify" the centroid of each cluster to the class that is most "voted for" by the members of the cluster.
Let's see this concept with an example. Assume that we have 100 training examples, 4 clusters (C1, C2, C3 and C4), and we know the label of each of the 100 training examples (say class 1 or 2). After running the 4-means algorithm we arrive at the following configuration: C1 has been assigned 20 samples, C2 another 20, C3 30 and C4 another 30:
                   class1     class2
  • C1:  15              5
  • C2:  19              1
  • C3:   5              25
  • C4:   2              28
Now, we can assign C1 and C2 to class 1 and C3 and C4 to class 2. At test time, when a new sample arrives, we can classify the test sample to the class defined by the closest centroid.
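A minimal sketch of that voting step, assuming scikit-learn; the data, the labels and the number of clusters are placeholders:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 2)             # placeholder training data
y = np.random.randint(1, 3, size=100)  # known labels: class 1 or 2

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
# Map each cluster to the majority class among its members
cluster_to_class = {c: np.bincount(y[km.labels_ == c]).argmax() for c in range(4)}

x_test = np.random.rand(1, 2)
predicted_class = cluster_to_class[km.predict(x_test)[0]]  # class of the closest centroid
print(predicted_class)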
A similar concept can be reviewed when using Self-Organizing Networks (SOM) for classification purposes.
Regards,
Carlos.
Question
3 answers
Kindly tell me: I am using a dataset that produces binary class attributes, like a student result of "Pass" or "Fail". My data has 80% pass students and 20% fail. In reality this is correct, because the same ratio is observed in real life; however, because the classifier tends to lean towards the majority class, I think the data needs to be balanced. The question is whether, in this case, 50/50 balancing would be considered right (even though it is unrealistic), or whether 60/40 or some other ratio would be right.
Relevant answer
Answer
I think class distribution is not an issue in itself; its effect may vary depending on the nature of the classifier. One thing to keep in mind: when you prepare the development and validation sets, try to keep the class distributions consistent, so do some stratification before modeling.
Question
2 answers
The system should use the context of the item to select relevant data (PCA), then use k-means for clustering, and finally use IBCF to generate top-n recommendations. I need a detailed algorithm for this task (PCA + k-means + IBCF).
Relevant answer
Answer
Although PCA is not a clustering method, it can help to reveal clusters; it is quite good at reducing dimensionality as a feature extractor and at visualizing clusters. You can run a classifier directly on your data and record the performance, but in case you are not satisfied, try PCA, selecting the number of components at the bend of the sorted eigenvalue plot. Then run k-means. If it produces good clusters, then PCA and the classifier could do the magic.
The number of clusters is determined by the 'elbow' approach according to the within-groups sum of squares: basically, you repeat the k-means algorithm for different numbers of clusters and calculate this sum of squares (if the number of clusters equals the number of data points, the sum of squares equals 0). A sketch is given below.
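A minimal sketch of the elbow heuristic after a PCA step, assuming scikit-learn and matplotlib; the data and the number of PCA components are placeholders:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.rand(300, 20)                       # placeholder data
X_reduced = PCA(n_components=5).fit_transform(X)  # dimensionality reduction step

# Within-cluster sum of squares (inertia) for a range of k values
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_reduced).inertia_
            for k in range(1, 11)]

plt.plot(range(1, 11), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("within-cluster sum of squares")
plt.show()  # look for the bend ('elbow') in the curve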
Question
4 answers
In the training phase, I applied the k-means clustering algorithm to a dataset (k=10). I want to apply a decision tree to each cluster and generate a separate model for each cluster.
In the testing phase, I want to compare each test instance with the centroid of each cluster and use the appropriate model to classify the instance.
Is there a way I can achieve this in WEKA or the WEKA API? Any link/resource would be appreciated.
Relevant answer
Answer
Hi Johnson,
Several ways to do that; for example, you can use WEKA's KnowledgeFlow as follows:
The training set is passed into SimpleKMeans; after that, a prediction model is built using J48 based on the generated clusters. The whole process is evaluated using cross-validation.
NB: The attached image shows the flow process, where FilteredClassifier contains both the clustering and classification algorithms.
HTH.
Samer
Question
10 answers
Hello dear members, does anyone have an idea about feature selection? I would be very grateful for your ideas and information.
Relevant answer
Answer
Dear Og Ann
This may help guide you.
A high level overview of feature selection
The machine learning community classifies feature selection into 3 different categories: Filter methods, Wrapper based methods and embedded methods.
Filter methods
These include simple statistical tests to determine whether a feature is statistically significant, for example the p-value of a t-test to decide whether the null hypothesis should be accepted and the feature rejected. This does not take feature interactions into account and is generally not a recommended way of doing feature selection, as it can lead to loss of information.
Wrapper based methods
This involves using a learning algorithm to report the optimal subset of features. An example is how RandomForest is widely used by the competitive data science community to determine the importance of features by looking at the information gain. This can give a rather quick-and-dirty overview of which features are important, which can help provide some informal validation of engineered features. Tree-based models like RandomForest are also robust against issues like multi-collinearity, missing values, outliers, etc., as well as being able to discover some interactions between features. However, this can be rather computationally expensive.
Embedded Methods
This involves carrying out feature selection and model tuning at the same time. Some methods include greedy algorithms like forward and backward selection, as well as Lasso (L1) and Elastic Net (L1+L2) based models. This will probably require some experience to know where to stop for backward and forward selection, as well as for tuning the parameters of the regularization-based models.
------------------------------------------------------------
The simplest method is probably Univariate Feature Selection where a statistical test is applied to each feature individually. You retain only the best features according to the test outcome scores; see the scikit-learn documentation for the list of commonly used statistical tests:
A more robust but more "brute-force" method where the impact of combined features are evaluated together is named "Recursive Feature Elimination":
First, train a model with all the feature and evaluate its performance on held out data.
Then drop let say the 10% weakest features (e.g. the feature with least absolute coefficients in a linear model) and retrain on the remaining features.
Iterate until you observe a sharp drop in the predictive accuracy of the model.
Here is an example, again using scikit-learn:
And here is the implementation:
If you want to train a (generalized) model, be it for classification (e.g. logistic regression) or regression (e.g. ridge regression, a.k.a. L2-penalized least squares), you should consider adding an L1 regularizer that will promote feature selection *while* learning: coefficients for the weakest features are set to zero by the learning algorithm itself. The usual name for an L1-penalized linear method is "the Lasso":
The optimal value of the strength of the regularizer can be found by cross validation. To do this efficiently you can either use the LARS method as described here:
LARS allows to compute the values of all the coefficient for different values of the penalization strength (a.k.a. the regularization path) very efficiently:
It is also possible to compute a regularization path efficiently with other algorithms such as Coordinate Descent (and maybe also Stochastic Gradient Descent) using the "warm restarts" trick:
To account for the instability of Lasso when dealing with highly correlated features, you should either consider combining the L1 penalty with L2 (the compound penalty is called Elastic Net) which will globally squash the coefficients but avoid randomly zeroing one out of 2 highly correlated relevant features:
Stability in feature selection models for the pure Lasso can also be achieved by bootstrapping several Lasso models on dataset folds and selecting the intersection (or union, I am not sure) of the non zero-ed features. This method is called BoLasso and is not yet implemented in the scikit (pull request always appreciated): the reference paper is:
Other ways of "fixing the Lasso" (taken from F. Bach NIPS 2009 tutorial):
adaptive Lasso (Zou, 2006),
relaxed Lasso (Meinshausen, 2008),
thresholding (Lounici, 2008),
stability selection (Meinshausen Buhlmann, 2008), Wasserman and Roeder (2009)
Edit: stability selection has been implemented in scikit-learn more recently.
Finally if you are in a Bayesian mood, smart priors such as the one used in Automatic Relevance Determination (ARD) should give you similar results:
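For illustration, here is a minimal sketch of two of the approaches above (recursive feature elimination and L1-based selection), assuming scikit-learn; the synthetic data and parameter values are placeholders:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)

# Wrapper style: recursively drop the features with the weakest coefficients
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
print("RFE keeps features:", np.where(rfe.support_)[0])

# Embedded style: an L1 penalty zeroes out weak features while fitting
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("L1 keeps features:", np.where(np.abs(l1.coef_[0]) > 1e-6)[0])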
Question
3 answers
I get the number of components for my dataset using BIC, but I want to know if the silhouette coefficient is the right option to validate my results.
Thanks!
Relevant answer
Answer
Silhouette analysis is based on distances between data points, so it is very friendly to distance-based clustering such as k-means. As a measuring strategy you can try it, but its performance on data with a density-based structure might not be ideal.
Meanwhile, if you want to know whether the number of components is a correct choice, maybe you can try a variational Bayesian Gaussian mixture. This is a variant of the traditional GMM which can automatically infer the effective number of components, as in the sketch below.
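A minimal sketch, assuming scikit-learn; the data, the upper bound on components and the weight threshold are placeholders:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

X = np.random.rand(500, 4)  # placeholder data

bgmm = BayesianGaussianMixture(n_components=10, weight_concentration_prior=0.01,
                               random_state=0).fit(X)
# Components with near-zero weight are effectively switched off by the variational prior
effective = np.sum(bgmm.weights_ > 0.01)
print(effective, "components carry almost all of the weight")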
Question
3 answers
Thanks to Prof. Erkki Oja it is well known that a neuron using simple Hebbian Learning (including a weight decay) learns to extract the first principal component of the input data.
However, I'd like to validate my intuitions about how this generalizes to a Hebbian network which makes use of competitive learning/lateral inhibition between neurons within a layer.
So, considering I have a competitive neural network model with multiple Hebbian neurons (arranged in a layer) I would assume that the neurons roughly learn to differentiate along the first principal component.
Could anybody please validate or reject this supposition or/and provide any literature regarding that topic? Most sources only consider single or chained (Sanger's rule) Hebbian neurons.
Relevant answer
Answer
Thanks a lot for that answer, Amir. I don't have access to this book, but discriminating optimally among an ensemble of inputs sounds pretty much like what I expected :)
Question
5 answers
Hi there, I am trying to perform clustering on some multivariate continuous numerical data. I'm just wondering if anyone has tried to use R for this with deep learning algorithms? I only found autoencoders as the deep unsupervised learning algorithm.
Relevant answer
Answer
An autoencoder maps the input space to a feature space and can simultaneously cluster the data. If you can use the reconstruction error together with minimum variance in the clustering, you will have reached a milestone.
Question
3 answers
I'm developing a detector that searches for road signs with MATLAB, and the camera on the vehicle is moving. A HoG cascade is first used to detect road sign candidates, as you can see in the attached image, but it has lots of mis-detections because, as far as I can tell, it simply looks like a rectangle. So I trained a HoG-SVM classifier to detect the arrow signs, "<" and ">", which are in the road sign. The classifier detects the arrow signs with a sliding window of fixed size. The problem I'm facing is that the camera is moving, so the arrow signs get larger as the road sign gets closer to the camera (vehicle). Now I'd like to do a multi-scale search with the SVM classifier, but I couldn't find any functions for that... Any help would be welcome!
Relevant answer
Answer
From the problem you have posted, I can suggest you try using a Haar cascade classifier for multi-scale search. You can get very good results.
Question
20 answers
Including examples and also the applicable fields in technology.
Relevant answer
Answer
In supervised learning, the decision on unlabeled data is made after learning a classifier from available training samples; examples of supervised classifiers are Support Vector Machines (SVM) and Artificial Neural Networks (ANN). In an unsupervised system, by contrast, the classifier does not have any labeled samples. In this latter case, the classification is done by exploiting criteria such as Euclidean distance or the Fisher separability measure; a common example of an unsupervised classification method is the k-means clustering algorithm.
Question
9 answers
I have the city of Edmonton property assessment data. The values were mostly strings so I basically grouped them into numbers (like each neighbourhood now is represented by a number from 1 to 391).
I'm trying unsupervised learning to see what I can learn, but the data seem to be incorrect: the plots aren't normal, mostly straight vertical lines. Should I proceed with supervised learning and hope for the best? Or is it just that the features aren't good enough (not as good as the Boston housing prices dataset), or that my mapping of neighbourhoods to numbers, and of residential-or-not to 1s and 0s, is the problem?
Relevant answer
Answer
1) Given your objective, i.e., predicting the expected price of a house, regression analysis is the better fit. If instead the objective is to identify the range in which the price lies, e.g., more than $1000 (+1 class) versus less than $1000 (-1 class), then it is a classification problem.
2) If the street number actually affects the price, it can be used as a learning feature. Normalizing a dataset means rescaling each feature to the (0 to 1) or (-1 to 1) range. So if street numbers run from 10 to 400, they can be normalized to (0 to 1) with simple arithmetic:
Xnew = (X - Xmin) / (Xmax - Xmin)
where Xmax = 400 and Xmin = 10 for the data above.
3) The above normalization applies to numerical (statistical) data. If the dataset is in a database format, a different kind of normalization is used, where values are binned by class, i.e., the price range in which a house falls is mapped to a category number.
Still, to train a model successfully it is advisable to use statistically normalized data, so other data formats should first be converted into this form.
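A small Python sketch of the min-max normalization in point 2, on made-up street numbers; the hand computation and scikit-learn's MinMaxScaler give the same result.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

street = np.array([[10.0], [120.0], [250.0], [400.0]])      # invented street numbers

by_hand = (street - street.min()) / (street.max() - street.min())   # Xnew = (X - Xmin) / (Xmax - Xmin)
by_sklearn = MinMaxScaler().fit_transform(street)                   # same result, column-wise

print(by_hand.ravel())        # approximately [0.    0.282 0.615 1.   ]
print(by_sklearn.ravel())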
Question
6 answers
Hello. I am reasonably new to programming in general, so I'm not looking for detailed advice (code examples). I'm currently learning Python so would prefer answers to my question that are possible with Python (although I'll happily consider other suggestions that don't work with Python, too).
I have two separate numerical datasets. Specific trends in the first dataset cause a change in the second dataset some minutes, hours, days, or weeks later. 
I would like to develop a program that will teach itself what these patterns are, with the ultimate goal of being able to predict changes in the second dataset.
Specifically, this would need to perform unsupervised learning, not supervised learning, as I hope it will be able to detect patterns in the datasets that I am not already aware of.
Could anyone please recommend appropriate tools to use or any good introductory books/websites/tutorials?
Thank you in advance for your advice. :)
Relevant answer
Answer
Dear David,
the scikit-learn package in Python is what you are searching for. There are several algorithms for unsupervised learning (see first link attached) and it is very easy to use. You can start with the simple tutorials on clustering provided on the scikit-learn website (see second link attached).
Best regards,
Luca
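To give a feel for a starting point, here is a brief, hedged sketch; the lag-feature construction is only one assumption about how the two datasets might be joined, and series_a / series_b are random stand-ins for the real data.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
series_a = rng.normal(size=200)
series_b = np.roll(series_a, 5) + 0.1 * rng.normal(size=200)   # a delayed, noisy copy

lag = 5
X = np.column_stack([series_a[:-lag], series_b[lag:]])   # pair each value with the delayed response

labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)  # groups of similar (cause, delayed effect) pairs
print(labels[:20])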
Question
20 answers
When an algorithm like a genetic algorithm has converged to a solution with a certain minimum value of the merit function, how do we know that the solution obtained is the global minimum?
Relevant answer
Answer
Mohamed S. Eid: yes, EAs are guaranteed to converge to the global optimum under some weak conditions (elitism and ergodicity, which is typically the case). Check:
Sunita Parinam: there's no general way you can know that. However, in some cases the fitness of the global optimum, or some bound on its value, may be known, so you can check the optimality of the solutions found just by inspecting their fitness values.
  • asked a question related to Unsupervised Learning
Question
6 answers
Hi all, 
I am wondering which machine learning model would fit my problem here? I am not sure "online sequence prediction" is the correct term for my problem. Basically I would like to train a model that can predict the label of the current instance based on the features of this current instance and also its preceding labels. 
For example, my training data would be sequences like this:
instance1 -> instance2 -> instance3 -> instance4 -> instance5
And for these instances, I have the features f_i and the label l_i for each instance.
Now for the test data, the sequence of instances is fed to the model one by one.
For instance1, the model would predict the label l1 from the fact that instance1 is first in the sequence and from its features f1. Then for instance2, the model should predict the label l2 from the features f2 and the previous label l1.
So the model feels a little like a CRF, but during testing I do not have the whole sequence: I need to make a prediction for each item of the sequence as it is fed into the system. Any idea what model would suit this task? Thanks in advance!
Relevant answer
Answer
There are many, many ways of doing that, hence do not rush into the first fashionable solution. Despite my 30 years of experience in machine learning, I find it impossible to give a reasonable answer to your question, given the small amount of information you provide. What exactly is your problem : what is the nature of the items you want to classify (assuming that you have a classification problem) ? What are the features, how many of them ? How many classes ? Do the features vary widely from one item to the next ? What is the computation time allowed between two successive items ? 
  • asked a question related to Unsupervised Learning
Question
8 answers
I am trying to build a training model using an unlabeled  dataset, therefore looking for some "unsupervised learning algorithms". However, the data contain some categorical features, therefore conventional clustering methods, such as k-means, cannot be applied. Are there other clustering algorithms which can be used for categorical data?
Relevant answer
Answer
Thank you very much for the suggestions. I will try the different options suggested. However, I don't understand how k-means clustering can be applied: even if the data are converted to numerical values, the result may be incorrect, as the algorithm treats 'discrete' data as 'continuous', which is a wrong assumption.
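For what it's worth, one route that avoids treating categorical codes as continuous is to use a mismatch-based distance. Here is a hedged SciPy sketch using the Hamming distance between integer-coded categories and average-linkage agglomerative clustering; the toy records are invented.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

records = np.array([
    [0, 1, 2],   # e.g. colour, shape, material encoded as integer codes
    [0, 1, 1],
    [2, 0, 2],
    [2, 0, 0],
])

D = pdist(records, metric="hamming")            # fraction of mismatching attributes per pair
Z = linkage(D, method="average")                # agglomerative clustering on that distance
labels = fcluster(Z, t=2, criterion="maxclust") # cut the tree into 2 clusters
print(labels)                                   # e.g. [1 1 2 2]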
  • asked a question related to Unsupervised Learning
Question
5 answers
The localization and autonomous navigation of robots in unknown environment is still a developing area. What is the best technique developed so far? What are the advantages and disadvantages of unsupervised learning technique for navigation?
Relevant answer
Answer
I think you can refer to SLAM (Simultaneous Localization and Mapping) and its various variants: Visual-SLAM, EKF-SLAM, FastSLAM 1.0, FastSLAM 2.0, Graph-SLAM, ORB-SLAM, Mono-SLAM, etc.
For the basics, start with EKF-SLAM.
You can also refer to PTAM (Parallel Tracking and Mapping).
You can find tutorials for SLAM online (one of the best being Joan Sola's on EKF-SLAM); for the basics, also see the book "Autonomous Mobile Robots" by Roland Siegwart.
  • asked a question related to Unsupervised Learning
Question
4 answers
Looking for a combination of supervised and unsupervised learning algorithms best suited to analysing accelerometer and GPS data from mobile devices for gait recognition.
Relevant answer
Answer
From your question, it sounds like you want to consider a semi-supervised approach. This approach is advised when you do not have enough labeled data but have more unlabeled data that you can learn from by clustering (or vice versa); it can also be used when additional relevant information can be obtained from either approach. I suggest using Weka or RapidMiner to test your dataset and build a model to see how effective and feasible it could be; this will also help you decide on the best algorithms for your final implementation.
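A minimal, hedged sketch of the semi-supervised idea with scikit-learn's LabelSpreading (rather than Weka or RapidMiner): unlabeled samples are marked with -1 and the model propagates the few known labels. The feature matrix and label counts below are invented placeholders.

import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))          # stand-in for accelerometer/GPS features
y = np.full(300, -1)                   # -1 marks the unlabeled samples
y[:30] = rng.integers(0, 2, size=30)   # a small labeled subset (e.g. two subjects)

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y)
print(model.transduction_[:10])        # labels inferred for the first few samples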
Question
4 answers
I am working on a project which involves segmenting citation metadata into separate parts such as author, title, volume, page numbers, and other useful information. Recently I came across an article (link provided below) which seems to offer the most efficient solution to my problem, so I would like to implement it. However, being new to machine learning, I don't know where to start.
Questions
1) What tools, programming languages, or libraries are preferable for implementing the logic given in the article?
2) How can I implement it?
ARTICLE TITLE :FLUX-CiM: Flexible Unsupervised Extraction of Citation Metadata
ARTICLE DOI : 10.1145/1255175.1255219
Relevant answer
Answer
Thank you for your reply, Pengbo Zhang. Actually, no code was provided by them, and a link (given in the paper) to their working tool has expired. And yes, I've e-mailed the author.
  • asked a question related to Unsupervised Learning
Question
55 answers
E-learning is a reality now. What will be the fate of e-learning in higher education? Will it survive?
Relevant answer
Answer
I think, and hope, that in the near future several good teaching methods will complement each other, used in synergy, towards the goal of better education. (In the more distant future, I hope a better system will emerge to satisfy the goal of better higher education for all.)
This is my belief, and my high hopes.
  • asked a question related to Unsupervised Learning
Question
4 answers
Classification techniques can be categorized either into supervised and unsupervised learning or into linear, non-linear and decision tree classifications. Kindly help in categorizing the above mentioned classification techniques into the mentioned categories (Supervised, unsupervised, linear and nonlinear classifications). 
Relevant answer
Answer
So should we put CART into nonlinear classification, or should we keep it in a separate category called decision-tree classification?
  • asked a question related to Unsupervised Learning
Question
10 answers
With respect to the Reggio Emilia Approach as a product of Italy, and the professionals who have worked so hard to provide such an informative model for the rest of the world:
U. S. schools continue to utilize teacher-directed, highly structured, and assessment-oriented instruction, even for very young children. 
How can we seek more balance in our classrooms? Through strategies demonstrated by the Reggio Emilia municipal schools: more child-directed learning, in-depth project work, larger blocks of time for children to explore and ask questions, parents becoming more integral to our classrooms, and valuing the learning process more than the final product or outcome?
Relevant answer
Answer
Hi there
I think these two books would be a good starting point with practical applications and a variety of perspectives:
Abbot, L. & Nutbrown, C. (Eds.) (2001).  Experiencing Reggio Emilia: Implications for Pre-School Provision.  Open University Press: Philadelphia, PA.
Edwards, C., Gandini, L. & Forman, G. (Eds.) (1993).  The Hundred Languages of Children: The Reggio Emilia Approach - Advanced Reflections (2nd ed.).  London: Ablex.
Also, there is currently a big move in New Zealand primary schools to shift back to a more play-based, child-centered approach (rather than the teacher-directed approach you describe as common in US schools).  Some articles referencing the NZ context include:
Soler, J. & Miller, L. (2003). The Struggle for Early Childhood Curricula: A comparison of the English Foundation Stage Curriculum, Te Whāriki and Reggio Emilia.  International Journal of Early Years Education, 11(1)
Prochner, L. (2004). Early childhood education programs for indigenous children in Canada, Australia and New Zealand: an historical review.  Australian Journal of Early Childhood, 29(4),
  • asked a question related to Unsupervised Learning
Question
6 answers
Hello,
I'm learning Unsupervised learning and I would like to see a practical example of it in matlab to get a better understanding of it.
Can anyone direct me to any sample available online?
Relevant answer
Hello,
One interesting approach for unsupervised learning is using neural networks. Among them, the self-organizing map (SOM) and the adaptive resonance theory (ART) are two good models to study.
You can search for practical examples (implementations) in the MATLAB reference. Below I share a link with some examples using SOM.
Hope it can help.
  • asked a question related to Unsupervised Learning
Question
2 answers
Hi guys ,
I've used a structural distance-based measure to compute similarity between each pair of nodes in an undirected graph. I calculated a distance matrix D such that the distance value Dij is simply the shortest path between node i and node j. However, the obtained distance values are absolute (e.g. 5, 19, 3, etc.) and I'd like to normalize them such that 0 <= Dij <= 1.
The normalized distance value must finally be converted to a similarity value S such that Sij = 1 - Dij.
Can anyone guide me to an appropriate function for normalizing the absolute distances?
Relevant answer
Answer
There are many transformations between resemblance measures.
For example, you can use:
s(i,j) = 1/(1+D(i,j))                     
d(i,j) = 1-s(i,j) = D(i,j)/(1+D(i,j))
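In NumPy, applied to a toy shortest-path matrix, these transformations look like this (dividing by the largest distance is another common option, noted in a comment):

import numpy as np

D = np.array([[0.0, 5.0, 19.0],
              [5.0, 0.0, 3.0],
              [19.0, 3.0, 0.0]])      # invented shortest-path distances

S = 1.0 / (1.0 + D)                   # similarity in (0, 1], equal to 1 on the diagonal
d = D / (1.0 + D)                     # normalized distance, d = 1 - S

# Alternative: d = D / D.max(), then S = 1 - d, which maps the largest
# shortest path exactly to similarity 0.
print(S)
print(d)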
  • asked a question related to Unsupervised Learning
Question
15 answers
I'm using the SOM tool in SAS Enterprise Miner to classify a dataset of 666 records into clusters. If I change the order of the observations, the SOM algorithm produces completely different clusters. I can't tell whether this is a convergence problem or whether I did something wrong in setting the options.
Has anyone had similar problems?
Relevant answer
Answer
Dear Selene,
I agree with Oussama. When the order is randomized, several results should be available.
The following article addresses these questions.
  • asked a question related to Unsupervised Learning
Question
5 answers
I want to create sentiment analysis polarity (positive, negative) based on multiple-choice questions.
What is the best algorithm to use? Please also recommend a good article on this.
Thanks
Relevant answer
Answer
You don't really need to carry out sentiment analysis if you are using multiple choice questions. "How do you rate our food: good, bad, neutral?" does the job by defining the possible responses. How about asking open questions such as "What do you think about our food" and then using sentiment analysis on the responses. An automated approach to evaluating responses to open questions (there, I've even given you the name of the article to write...)
  • asked a question related to Unsupervised Learning
Question
4 answers
In unsupervised learning, how can agglomerative hierarchical clustering be implemented, and how can the result be evaluated?
Relevant answer
Answer
Thanks Mr. Ali..
  • asked a question related to Unsupervised Learning
Question
4 answers
Hi all
I know this replacement is made because of the complexity of spiking-neuron computations and because many supervised learning algorithms use gradient-based methods, which makes it difficult to use such complex neuron models. I have two questions:
1) If we use a simple model (like the Izhikevich model), do we still have to use such a substitution?
2) Is this replacement only needed for supervised learning algorithms, or is it also necessary in unsupervised learning? I ask because, if I understand correctly, unsupervised learning involves no gradient or back-propagation.
please help me.
Relevant answer
Answer
thank you
  • asked a question related to Unsupervised Learning
Question
4 answers
 Thanks in advance for your replies.
Relevant answer
Answer
.
In addition to Ziemowit's excellent answer above:
this is a result of the on-line training as he describes it.
If, for some reason, this (fairly small) dependence is a problem for you, you can always resort to "batch training", but its main drawbacks are
  • getting stuck more easily in local minima
  • being more dependent on the weights initialization
See the linked reference for details.
I stress that this dependence on the order for on-line training is very small as far as the final result (neuron weights) is concerned; I have never considered it a practical problem in any way when the map is properly (slowly enough) trained.
.
  • asked a question related to Unsupervised Learning
Question
1 answer
I am currently looking at various methods like transfer spectral clustering, self-taught clustering, etc and was wondering if someone who has some expertise with these methods could provide some more intuition to these methods.
Relevant answer
Answer
InsyAllah I will explain in detail soon.
  • asked a question related to Unsupervised Learning
Question
1 answer
SOM weight update
Relevant answer
Answer
In almost any ANN structure, and for any distance measure used, weight-vector updating is reasonable: it is the basis of the iterative training algorithm of the ANN.
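As a sketch of what such an update looks like, here is one SOM-style step in NumPy: the best-matching unit is found by Euclidean distance and every weight vector is pulled toward the input, weighted by a Gaussian neighborhood factor. The map size, learning rate, and neighborhood width are arbitrary illustration values.

import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((10, 3))     # 10 map units, 3-dimensional inputs
x = rng.random(3)                 # one input sample
lr, sigma = 0.1, 2.0              # learning rate and neighborhood width

bmu = np.argmin(np.linalg.norm(weights - x, axis=1))   # best-matching (winning) unit
grid_dist = np.abs(np.arange(10) - bmu)                # distance on a 1-D map
h = np.exp(-grid_dist**2 / (2 * sigma**2))             # neighborhood function

weights += lr * h[:, None] * (x - weights)             # w <- w + lr * h * (x - w)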
Question
1 answer
Hi all, I am attempting to measure magnification (see for instance www.mitpressjournals.org/doi/abs/10.1162/089976606775093918) in practical experiments with neural gas applied to geometric inference.
Does anyone have any suggestions or comments about the measurement method?
Relevant answer
Answer
Contact Thomas Villmann or his excellent colleague Barbara Hammer. Barbara will certainly give good comments! Another possibility is to look at the papers of Timo Honkela and his former team at Aalto University.
  • asked a question related to Unsupervised Learning
Question
8 answers
How can we collect a dataset similar to the KDDcup 99 dataset in a real environment? We want to check the performance of an unsupervised anomaly-detection algorithm on real collected data.
Relevant answer
Answer
The KDDcup99 dataset consists of connection-based features on the one hand, and host-based features on the other hand (additionally, there are statistical features, which can be calculated out of the others). In a research project I used to work for, we built a large test-network where we wanted to collect a KDDcup99-like dataset in near-real-time.
Most of the connection-based features in KDDcup99 can easily be collected using TCPdump, as mentioned by Andrey Ferriyan. Actually, in our research using TCPdump together with a self-written normalization/correlation tool, we were able to collect all of the connection-based and statistical features. For calculation of features like "flag" or "*_rate", we were required to keep track of a number of x connections or x seconds at any time.
However, collecting the host-based features - like "num_failed_login", "root_shell", and so on - requires another mechanism and can get extremely difficult. In our case, there were only three host-based features we were interested in, all of which related to authentication. To be able to collect those features, we utilized Syslog and SNMP, allowing us to extract it directly from the hosts in question. Correlation with the connection-based features were done using the self-written tool mentioned earlier.
A last problem, which I assume cannot be solved by any automatic framework when collecting data in the wild, is the labelling of the data.
There are some publications regarding our framework, that might be of interest for you - none of them allowed to be uploaded here. If you're interested in receiving a copy of (at least the one I was involved in), please let me know.
M. Salem, U. Buehler, “Reinforcing Network Security by Converting Massive Data Flow to Continuous Connections for IDS,” International Conference for Internet Technology and Secured Transactions, IEEE, pp. 570-575, London, UK, 2013.
M. Salem, U. Buehler, S. Reissmann, “Persistent Dataset Generation using Real-Time Operative Framework,” IEEE International Conference on Computing, Networking and Communications (ICNC), CNC Workshop, pp. 1023-1027, Hawaii, USA, 2014.
M. Salem, U. Buehler, “Transforming Voluminous Data Flow into Continuous Connection Vectors for IDS,” International Journal for Internet Technology and Secured Transactions (IJITST), ISSN: 1748-5703 (Online), 2014.  - submitted article -
Question
9 answers
I've been reading about Fisher-jenks natural breaks algorithm where the author describes it as an 'image segmentation' algorithm, while a method like Isodata is described as an image classification method. Is there a difference between these two terms? Why can't the Fisher-jenks algorithm be considered an image classification method as well?
Relevant answer
Answer
Segmentation methods divide a digital image into (usually small) groups of connected pixels. Each group (aka segment, or image-object) has a unique numeric ID (e.g., 67897) in the resulting raster (aka partition). That is, pixels belonging to that group will all have that particular Digital Number (e.g., DN=67897), but no other pixels outside that group will have that DN. In contrast, classification methods assign a class to each element, be it individual pixels or segments. Given a per-pixel classification, there will be groups of connected pixels sharing the same class, but pixels belonging to separate groups will have the same DN in the classified raster (e.g, DN=8). There are labeling algorithms that can assign a unique ID to each group, so you can derive a segmentation (aka partition) from a classification, but you cannot derive a classification from a segmentation, for you don't know (yet) what the different segments have in common (i.e., you have an ID, but you don't have a class).
As for the Fisher-Jenks algorithm, from what I read, it looks like it is actually an unsupervised classification method:
'The Jenks optimization method is a data classification method designed to determine the best arrangement of values into different classes. This is done by seeking to minimize each class’s average deviation from the class mean, while maximizing each class’s deviation from the means of the other groups. In other words, the method seeks to reduce the variance within classes and maximize the variance between classes" (Wikipedia)
This is very similar to what ISODATA does, except that the final number of classes is user-defined (in ISODATA, the final number of classes can be less than the maximum number specified by the user). We would need the reference of the paper you were reading to tell whether the author got it wrong, but it looks like it.
Finally, to avoid ambiguity, it is better to reserve the term 'segmentation' for the process and 'partition' for the output.
Bottom line: segmentation divides the image into internally homogeneous chunks, but doesn't tell you what the chunks are made of. Classification does.
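As a rough illustration (not the exact Jenks algorithm), 1-D k-means minimizes the same within-class variance criterion, so a few lines of scikit-learn give a feel for the behavior on invented values:

import numpy as np
from sklearn.cluster import KMeans

values = np.array([1, 2, 2, 3, 10, 11, 12, 30, 31, 33], dtype=float)   # invented pixel values

km = KMeans(n_clusters=3, n_init=10).fit(values.reshape(-1, 1))
print(km.labels_)                              # class assigned to each value
print(np.sort(km.cluster_centers_.ravel()))    # approximate class centers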
Question
15 answers
I'm looking for a method for unsupervised classification of big data with an unknown number of clusters. Can you suggest a robust method? Is there any Matlab toolbox dedicated to this purpose?
Relevant answer
Answer
Hello there,
Regardless the generic learning method adopted for the given classification task, BigData (where by BigData I understand a scale comparable with data from social networks like facebook, twitter, linkedin, data from web blogs, comments and personal documents, data from public image repositories like instagram, flickr, picasa etc. and from movie repositories like youtube etc., data from internet searches, or from large prime numbers searches etc.) requires some specific adaptations such as negotiating a good balance between online learning, partial learning, parallel&distributed learning. The results of this negotiation should be compatible, of course, with the manner in which you choose to express and test the levels of intra-class similarity and inter-class non-similarity, which , on the other hand are very much data-specific. These are the critical aspects when designing classification algorithms for BigData. Ready to run algorithms for a specific problem - I'm not so optimistic. A nice inventory of BigData techniques is here: http://www.mapr.com/blog/big-data-zz-%E2%80%93-glossary-my-favorite-data-science-things#.UzAwkaiSwsA
Kind regards,
Nicolaie Popescu-Bodorin,
  • asked a question related to Unsupervised Learning
Question
5 answers
Let's say I have computed the SVD of a matrix of terms and documents, X = U S V', where U and V are orthogonal matrices. I am not sure how I can interpret this in a scatter plot. Explanations in terms of 2 dimensions are highly appreciated.
Relevant answer
Answer
Hello Arun,
For me, performing SVD on a matrix of data is very close to principal component analysis (PCA), well known in statistics. As far as I understand, you have a data matrix X in which the documents are the rows and the terms are the columns; perhaps each element of the matrix is the occurrence count of a term in a document.
In that situation, if you carry out an SVD on the centred X, you obtain a matrix S that is proportional to the PCA scores, and a matrix D of the PCA loadings (the eigenvectors of X'X).
By plotting the columns of S (for example columns 1 and 2) you obtain a map in which each point is a document. Documents that are close on the map share some kind of similarity. There is no particular reason to assume that the rank of X is 2, so it is relevant and interesting to look at other pairs of columns of S as well.
In the same way, the columns of D can be plotted: a similarity between terms appears as proximity on this second map. There are some details on the standardization of S and D that I have not explained here; please consult a statistics book dealing with PCA.
By the way, if X really is a matrix of occurrences, I would suggest testing a variant of PCA called correspondence analysis, which is adapted to that kind of data.
Dominique
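Here is a short NumPy sketch of the scores/loadings maps described above, using a random stand-in for the centred term-document matrix (matplotlib is assumed for the plot):

import numpy as np
import matplotlib.pyplot as plt

X = np.random.rand(50, 8)                  # 50 documents x 8 terms (stand-in data)
Xc = X - X.mean(axis=0)                    # centre the columns

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                             # document coordinates (PCA scores)
loadings = Vt.T                            # term coordinates (PCA loadings)

plt.scatter(scores[:, 0], scores[:, 1])    # each point is a document
plt.xlabel("component 1")
plt.ylabel("component 2")
plt.show()
# A second scatter of loadings[:, 0] vs loadings[:, 1] maps the terms the same way.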
  • asked a question related to Unsupervised Learning
Question
8 answers
I'm using a feedforward neural network with 100 inputs, 10 units in the hidden layer, and one output. I train the network several times using the same training data and the same architecture/settings but with random initialization. I understand that the weights produced will differ each time and that no two trained networks will be identical, but what can I try so that the networks are more consistent from one training run to the next, given identical data?
Relevant answer
Answer
Typically, languages such as Matlab or R allow you to fix the random-number generator seed. Depending on your code, this may have to be done just before the neural-network training; often it is a single number.
As Raphael said, it is unclear whether there are any benefits in doing that. You will ensure replicability, but not necessarily good performance of the network. My suggestion is to use an ensemble of networks: given enough members, its performance will be more or less stable. You may still wish to fix the initialization seed to have full replicability, but at least in that case the network's performance will not be severely handicapped.
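A hedged scikit-learn sketch of both suggestions, using MLPRegressor as the example network and random placeholder data sized like the question (100 inputs, 10 hidden units, one output): fixing random_state makes a single run replicable, while averaging an ensemble stabilizes performance.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))   # placeholder inputs
y = rng.normal(size=200)          # placeholder targets

# Replicable single network: identical weights and results on every run.
net = MLPRegressor(hidden_layer_sizes=(10,), random_state=42, max_iter=500).fit(X, y)

# Ensemble: several networks with different seeds, predictions averaged.
nets = [MLPRegressor(hidden_layer_sizes=(10,), random_state=s, max_iter=500).fit(X, y)
        for s in range(5)]
y_hat = np.mean([m.predict(X) for m in nets], axis=0)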
  • asked a question related to Unsupervised Learning
Question
1 answer
I'm implementing a baseline system that uses a vocabulary tree, i.e. a BoF model based on hierarchical k-means (HKM), for image classification. Currently I'm obtaining low recognition quality due to the poor quality of the quantization structure resulting from the HKM.
I'm obtaining 100,000 words in my final vocabulary with a tree of depth 6 and branch factor 10, where the theoretical number of words is 10^6 = 1,000,000.
In several papers, like those of Zisserman on large-scale landmark recognition, the authors claim to be using a 1,000,000-word vocabulary, something I find difficult to understand since this number is theoretical, while in practice there is no guarantee of reaching it.
Am I misunderstanding something? If not, what should I do to increase the vocabulary size, other than using more descriptors to train the tree?
PS: my only hint so far is to use a different seeding algorithm for cluster initialization, such as k-means++ or Gonzales.
Relevant answer
Answer
Maybe I'm wrong, but I'll try to answer.
They use the nice tool VLFeat, but in my experience there is a memory limitation on the dictionary size (dictionary size x feature dimension).
I suggest using soft voting (soft assignment of descriptors to visual words).
Question
8 answers
I am a bit confused about the "number of clusters" and the "number of seeds" in k-means clustering algorithms. Kindly provide an example to clarify the distinction. What is the effect of changing either?
Relevant answer
Answer
Deciding the best number of clusters is a different problem from deciding how to set the values of the seeds.
The first problem is how to decide the value of k in k-means (k = number of clusters): any additional cluster improves the quality of the clustering, but at a decreasing rate, and having too many clusters may be useless for decision makers, data comprehension, data explanation, etc.
The number of initial seeds (initial cluster centers) is the same as the number of clusters (at least in the original k-means). The problem of the VALUES of the seeds is different from the problem of the number of clusters: normally you would use random cluster centers, but some research points to better ways of choosing them. With better seeds, k-means converges faster and the quality of the clusters is good.
I remember that there are variations of k-means mixed with hierarchical methods; in those, you use more than k seeds and later collapse (merge) some clusters, as in hierarchical clustering, until the number of clusters is reduced to k. In that method, the final number of clusters is not equal to the initial number of seeds.
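In standard k-means the number of seeds equals k, so the practical knobs are how the seed values are chosen and how many restarts are tried. A scikit-learn sketch on placeholder data:

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(300, 4)    # placeholder data

random_seeds = KMeans(n_clusters=5, init="random", n_init=10).fit(X)
plusplus_seeds = KMeans(n_clusters=5, init="k-means++", n_init=10).fit(X)

# Lower inertia (within-cluster sum of squares) means a better local optimum;
# k-means++ seeding usually reaches a good one in fewer iterations.
print(random_seeds.inertia_, plusplus_seeds.inertia_)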
  • asked a question related to Unsupervised Learning
Question
2 answers
Given two groups of data (blue & red line in the figure), what's the most efficient unsupervised classifier that can locate the blue line in the figure?
Relevant answer
Answer
Are these 2 time series, i.e. 2 y-values for each x-value? In your example the data separate completely (the blue line is higher than the red line), so just take the minimum of the per-x maximums (= a) and the maximum of the per-x minimums (= b); your "classifier" is then the threshold (a + b) / 2.
I suspect you have data where the 2 lines "mix". In that case, just a thought, maybe try to model them as 2 time series and use the models to separate the data?
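For the fully separated case, the rule above takes only a few NumPy lines (the series are invented):

import numpy as np

y1 = np.array([5.0, 5.5, 6.0, 5.8])   # upper (blue) series
y2 = np.array([2.0, 2.4, 2.1, 2.6])   # lower (red) series

a = np.maximum(y1, y2).min()    # minimum of the per-x maximums
b = np.minimum(y1, y2).max()    # maximum of the per-x minimums
threshold = (a + b) / 2         # 3.8 here: values above it belong to the upper line
print(threshold)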