Science topic

Graph Data Mining - Science topic

Explore the latest questions and answers in Graph Data Mining, and find Graph Data Mining experts.
Questions related to Graph Data Mining
  • asked a question related to Graph Data Mining
Question
4 answers
The excellent survey of graph mining by Chakrabarti and Faloutsos (ACM Computing Surveys, 2006) is getting a little old. Does anyone know of a more recent (but good) one?
Relevant answer
Answer
If you are looking for graph embedding surveys, here are some recent ones:
1. Graph embedding techniques, applications, and performance: A survey (https://arxiv.org/pdf/1705.02801.pdf)
2. A comprehensive survey of graph embedding: Problems, techniques, and applications (https://arxiv.org/pdf/1709.07604.pdf)
3. A comprehensive survey on graph neural networks (https://arxiv.org/pdf/1901.00596.pdf)
Also, if you are looking for graph mining focused on anomaly detection:
1. Graph based anomaly detection and description: a survey (https://arxiv.org/pdf/1404.4679.pdf)
  • asked a question related to Graph Data Mining
Question
2 answers
A question has been on my mind for a few days: is there a correlation between the power-law distribution and the Pareto principle (the 80/20 rule) in natural phenomena? Do they describe the same thing?
Relevant answer
Answer
same
  • asked a question related to Graph Data Mining
Question
13 answers
I have data in which the change in the weight of materials is recorded over time. Unfortunately, because of special conditions, I cannot record the weight during the first 75 seconds.
- Is there any way to predict the initial missing data (I mean the change in the weight during the first 75 seconds)?
- How can I find the equation of the curve that fits the data points?
Any solution with MATLAB, SPSS, or Excel software is appreciated.
Relevant answer
Answer
Hello,
Please try the standard statistical forecasting techniques and check which one is suitable for your data.
Good luck!
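As a rough illustration of the curve-fitting route, here is a minimal sketch in Python/SciPy (the same idea works with MATLAB's lsqcurvefit or Excel trendlines); the exponential-approach model and the data values are only assumptions, so substitute a functional form that matches the physics of your material:

import numpy as np
from scipy.optimize import curve_fit

def model(t, w_inf, dw, k):
    # weight approaches w_inf exponentially; w(0) = w_inf - dw (assumed model, not a given)
    return w_inf - dw * np.exp(-k * t)

# hypothetical observations: t in seconds (recorded from 75 s onward), w = weights
t_obs = np.array([75, 100, 150, 200, 300, 450, 600], dtype=float)
w_obs = np.array([10.2, 10.9, 11.8, 12.4, 13.1, 13.6, 13.8])

params, _ = curve_fit(model, t_obs, w_obs, p0=[14.0, 5.0, 0.01])
t_missing = np.linspace(0, 75, 16)
w_predicted = model(t_missing, *params)   # estimate of the unrecorded first 75 seconds
print(dict(zip(["w_inf", "dw", "k"], params)))

Keep in mind that extrapolating backwards is only as trustworthy as the chosen model; comparing a few candidate models on the observed range is a good sanity check.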
  • asked a question related to Graph Data Mining
Question
2 answers
I tried to perform a sensitivity analysis on an ANP problem using the node sensitivity option in the Super Decisions software, but I'm still not quite sure what the parameter value (the x-axis on the graph) means. Why does the 0.5 point on the x-axis always correspond to the original normalized-by-cluster value for each supplier? Thank you.
Relevant answer
Answer
Dear Deorita,
Please follow the steps in the paper below:
Sipahi, S., & Timor, M. (2010). The analytic hierarchy process and analytic network process: an overview of applications. Management Decision, 48(5), 775-808.
  • asked a question related to Graph Data Mining
Question
7 answers
My question concerns graph database programs like Neo4j and/or JanusGraph. I am new to graph databases. Say I have a dataset made of lots of small (sub)graphs and a few large ones, and all of these mini graphs are self-contained. Once the entire dataset is loaded into the graph database program, do programs like JanusGraph offer a means of rapidly finding all of these natural sub-graphs? I know that JanusGraph and Neo4j offer graph partitioning, but I am talking about finding ALL the "natural" sub-graphs within the dataset WITHOUT breaking any of them into smaller pieces. How does one use JanusGraph and/or Neo4j to find these sub-graphs?
Relevant answer
Answer
Dear Scott,
Michael is right in his views.
Follow the link:
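For what it's worth, the "natural" self-contained sub-graphs you describe are exactly the connected components of the dataset. A quick sketch of the idea with NetworkX (used here only to keep the example short, not as a substitute for Neo4j/JanusGraph):

import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"),                 # one small component
                  ("x", "y"),                             # another
                  ("p", "q"), ("q", "r"), ("r", "p")])    # and a third

components = [G.subgraph(nodes).copy() for nodes in nx.connected_components(G)]
for i, comp in enumerate(components):
    print(f"component {i}: {comp.number_of_nodes()} nodes, {comp.number_of_edges()} edges")

Graph databases typically expose the same operation through their analytics layers (usually under a name like "weakly connected components"), which returns every natural sub-graph without splitting any of them apart.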
  • asked a question related to Graph Data Mining
Question
14 answers
Currently I am working on a plugin for the Gephi platform that imports citation networks from Crossref resources.
Below you can watch it in action:
These networks can then be further analyzed with Gephi's functions.
As we know Crossref is a valuable data store of metadata of scientific works.
What other open access valuable bibliographic data sources (in terms of completeness, quality, well defined web access) of scientific papers citations (that could be used for bibliometric, webometric and scientometric analysis) would you recommend?
Relevant answer
  • asked a question related to Graph Data Mining
Question
3 answers
See the following link for a description of graph databases https://en.wikipedia.org/wiki/Graph_database
It seems that such databases, and the benefits they provide for analysis, have primarily been used in business, basic science, and the examination of social interactions, e.g. social networks. It would be helpful to know about other uses, such as in the humanities.
Relevant answer
Answer
Hello again Dibakar,
It is helpful to learn of the Switzerland Digital Humanities Project to store related information (earlier UK efforts and the European Digital Humanities Association) as well as its use of RDF graphing. Below is a link to an article describing differences between RDF and Property Graph Databases such as with Neo4j:
In searching the “Digital Scholarship in the Humanities” publication, there seem to be at least a couple of articles that employ graph databases, with the second making mention of Neo4j:
The link to the article about HuNI (Humanities Networked Infrastructure) in Australia is “even sweeter”, as it describes how this site provides graph database capabilities to be used with its holdings, along with some captivating examples and links to additional information (e.g. the ‘Technologies’ part of the HuNI site’s ‘About’ section lists the technologies included, such as Neo4j). Very cool!
So good to get this information about graph database use.
Thank you once again, Fred
  • asked a question related to Graph Data Mining
Question
3 answers
Actually, I want to implement the Gauss-Seidel method to solve linear systems with sparse matrices, but I am stuck on the dependency within every iteration and am not getting any solution. Please point me to some resources so that I can implement it.
Relevant answer
Answer
I hope this work could help you:
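In case a concrete reference implementation helps, here is a minimal Gauss-Seidel sketch in Python/NumPy. The in-place update of x (each unknown uses the newest values of the others) is exactly the per-iteration dependency you mention; the example system is a made-up diagonally dominant matrix, since convergence is only guaranteed for e.g. diagonally dominant or symmetric positive definite systems:

import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=1000):
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # sum over already-updated entries (j < i) and not-yet-updated entries (j > i)
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old, ord=np.inf) < tol:
            break
    return x

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])   # diagonally dominant example
b = np.array([15.0, 10.0, 10.0])
print(gauss_seidel(A, b))          # should agree with np.linalg.solve(A, b)

For genuinely sparse matrices you would store A in a scipy.sparse format and loop over the nonzeros of each row instead of slicing dense rows.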
  • asked a question related to Graph Data Mining
Question
13 answers
Dear Researchers, I have downloaded the NSL-KDD dataset (train + test) and applied J48 to the KDD 20% dataset, which contains 42 attributes; one of the attributes is the class (normal & anomaly). When I apply J48 it does not give me the attack categories (DoS, U2R, R2L, Probe); it only gives me the anomaly and normal categories. Can anyone help me with how to get category-wise output, as in the attached snapshot from a paper where the author obtains the proper result with the same dataset?
I will be very thankful.
Relevant answer
Answer
  • asked a question related to Graph Data Mining
Question
4 answers
If data is distributed across different machines, is the use of a graph database efficient or not?
Relevant answer
Answer
If by data you refer to the use of "linked data", then using RDF databases coupled with SPARQL to query them can be efficient, since SPARQL supports data and query federation in real time, especially with SPARQL 1.1.
In general, the "efficiency" depends on a lot of factors. As others have pointed out in previous answers, these include indexing schemes, storage and retrieval schemes, and also the characteristics of the data in use, in my opinion. Beyond these, the intended application of the graph database is also relevant; applications range from search engines, question answering (which involves intensive use of knowledge graphs), recommendation systems and pattern matching to graph-specific tasks such as mining and analysis.
Hope it helps!
  • asked a question related to Graph Data Mining
Question
1 answer
The dataset is an incomplete and potentially biased representation of academic journal and conference publishing, just like any other known dataset that attempts to fulfil a similar mission. Herrmannova and Knoth (2016) describe both the aspects that can be relied upon and those that need to be used with care, but are there specific, operational ways to improve the quality of Microsoft Academic Graph data?
Relevant answer
Answer
Aleksandr, please explain how the citation analysis can be done, how it is related to data mining, and what the aim of this research question is.
  • asked a question related to Graph Data Mining
Question
9 answers
I'm working on "community detection in networks considering node attributes". In this regard, I need some benchmark networks for testing my proposed algorithm by comparing the predicted labels (community assignments) with the real ones (ground truth). These networks should be undirected, include non-overlapping communities, range from small to big sizes, have edges showing the relations between nodes, have nodes with some personal features likely affecting their community memberships, and finally have the true labels of the nodes known as ground truth for evaluating my predicted labels. Despite an extensive search, I unfortunately couldn't find any networks with these characteristics.
I would really appreciate it if anybody could point me to references or network benchmarks that satisfy my requirements.
Thank you in advance for your time and cooperation.
Best regards,
Esmaeil
Relevant answer
Answer
Try the network data sets at these sites:
(i) University of Michigan, (ii) UCI, (iii) Gephi, (iv) Washington State University, (v) LAW, (vi) Yahoo graphs, (vii) SNAP, (viii) KONECT
  • asked a question related to Graph Data Mining
Question
5 answers
As per the definition of a power law, the fraction P(k) of nodes with degree k, for large values of k, is given by
P(k) ~ k^(-r).
In this definition, the term "large value" is not clearly defined. Does large mean 100, 10^3 or 10^6?
Does the definition imply that vertices in power-law graphs have an almost constant degree, with only a few exceptions called hub nodes whose degree may be, say, 100 times higher than that of the other vertices?
If the above statement is true, then why does the degree distribution of power-law graphs not show just a few high-degree points detached from the rest of the curve, instead of the closely spaced points on a decaying curve as in this yahoo_web_graph plot (http://bickson.blogspot.in/2011/12/preview-for-graphlab-v2-new-features.html)?
Is this definition enough to obtain the number of hub vertices present in the graph, or the fraction of edges incident on them, given the number of vertices in the graph?
Relevant answer
Answer
Hi,
I think you are following the right line of understanding. The Barabási–Albert model explains how preferential attachment leads to a power law and thereby helps us simulate scale-free networks. However, social networks are not just any form of scale-free network. For example, the r you have mentioned in k^-r is 3 in the Barabási–Albert model, which may not be true for most real-world networks. Similarly, for a real-world network, typical features such as the average path length or the clustering coefficient may not be exactly as the Barabási–Albert model predicts.
So if you want to simulate a real-world network using the Barabási–Albert model, you have to choose the parameters in such a way that its features match the real-world network's features (as you want them to be). Tuning the parameters to match multiple features at the same time may be a non-trivial task and may lead to an optimization problem.
Hubs are more than just high-degree nodes; they have a recursive definition. You may have a look at the HITS algorithm to understand more about them.
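To see this concretely, here is a small sketch that generates a Barabási–Albert graph with NetworkX and inspects its degree distribution (the parameters are arbitrary):

import collections
import networkx as nx

G = nx.barabasi_albert_graph(n=10000, m=2, seed=42)
degrees = [d for _, d in G.degree()]
hist = collections.Counter(degrees)

for k in sorted(hist)[:10]:
    print(f"degree {k}: {hist[k]} nodes")
print("max degree:", max(degrees))

The counts fall off smoothly (roughly like k^-3 for this model), which is why a power-law degree distribution shows a decaying curve with many intermediate degrees rather than an isolated cluster of hubs.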
  • asked a question related to Graph Data Mining
Question
2 answers
I want to find the tightest lower and upper bounds for frequent subgraphs in uncertain graph data, and also the densest frequent subgraph. Please advise.
Relevant answer
Answer
Dear all,
I would like a large graph dataset for graph mining in MATLAB.
Thanks a lot.
  • asked a question related to Graph Data Mining
Question
3 answers
I have a collection of graphs and need to see which patterns (sub-graphs) occur frequently across the different graphs.
Relevant answer
Answer
Refer to this link.
  • asked a question related to Graph Data Mining
Question
2 answers
Is anyone working on the all-pairs shortest path problem? I need to know the best, most time-efficient approach in the literature.
Relevant answer
Answer
Hello Waqas Nawaz,
Good question. This paper may be useful:
"A Comparative Analysis for Determining the Optimal Path using PSO and GA"
  • asked a question related to Graph Data Mining
Question
5 answers
I need to work with re-labeling of nodes on time-evolving graphs to identify frequent patterns. All help is appreciated. Thanks.
Relevant answer
Answer
Instead of re-labeling in the sense of replacing one label with another, why not keep a set of labels associated with each node for each perspective? You can also attach probabilities to the labels. I guess credal sets (or something similar) could be used.
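A tiny sketch of what that could look like; the snapshot structure and attribute names are hypothetical, just to illustrate accumulating a set of labels per node over time instead of overwriting a single label:

import networkx as nx

snapshots = {0: {"v1": "A", "v2": "B"},     # label observed at time 0
             1: {"v1": "B", "v2": "B"},     # v1 changes label at time 1
             2: {"v1": "B", "v2": "C"}}

G = nx.Graph()
G.add_edge("v1", "v2")
for node in G.nodes:
    G.nodes[node]["labels"] = {}            # time -> set of labels seen so far

seen = {node: set() for node in G.nodes}
for t in sorted(snapshots):
    for node, label in snapshots[t].items():
        seen[node].add(label)
        G.nodes[node]["labels"][t] = set(seen[node])

print(G.nodes["v1"]["labels"])   # {0: {'A'}, 1: {'A', 'B'}, 2: {'A', 'B'}}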
  • asked a question related to Graph Data Mining
Question
3 answers
I need to work with multiple attributes on time-evolving graphs to identify frequent patterns. All help is appreciated. Thanks.
  • asked a question related to Graph Data Mining
Question
8 answers
Hello everyone, I am starting new research on relationships on Twitter. However, I am confused about how to map a large number of links onto one picture quickly. I have a 250000*250000 sparse 0/1 matrix that represents the relationships on Twitter. I tried software like Gephi, but it is too slow to draw the picture and often fails to work. I tried the igraph package in R; it ran for a whole night and ended up with a picture, but a very ugly one... Can you give me some advice on how to deal with this using igraph, or can you recommend some more powerful software? Thank you very much!
Relevant answer
Answer
With such a large network, you cannot see all nodes and how they are organized with a single picture, this is just too messy.
I would say that the method you should use to visualize your network depends on what you want to see. If you're interested in seeing some influential nodes, for instance, why not filter some of the less influential nodes out of your network? Because the graph is probably of the small-world type, the large majority of your nodes probably have only a few edges. You could remove them; your graph becomes much simpler, but you keep all the big players nonetheless.
If, on the contrary, you're interested in the organization of the network, why not use a method such as community detection? Using a method such as Louvain or Infohiermap, you'll obtain a hierarchical decomposition of your network, and you can much more easily display these "bags of nodes" as a network, that is, the network of relations between the well-defined clusters of nodes in your network.
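A rough sketch of both suggestions, using NetworkX and a generated stand-in graph purely for illustration (the same filtering and Louvain steps are available in igraph as well):

import networkx as nx

G = nx.barabasi_albert_graph(5000, 2, seed=1)          # stand-in for the Twitter graph

# (1) keep only nodes with degree >= 10: far fewer nodes, but the big players remain
core = G.subgraph([n for n, d in G.degree() if d >= 10]).copy()
print(core.number_of_nodes(), "nodes left out of", G.number_of_nodes())

# (2) community detection (Louvain; available as nx.community.louvain_communities in networkx >= 2.8),
#     then collapse each community to a single meta-node
communities = nx.community.louvain_communities(G, seed=1)
membership = {n: i for i, comm in enumerate(communities) for n in comm}
meta = nx.Graph()
meta.add_nodes_from(range(len(communities)))
for u, v in G.edges():
    cu, cv = membership[u], membership[v]
    if cu != cv:
        meta.add_edge(cu, cv)
print(len(communities), "communities;", meta.number_of_edges(), "inter-community links")

Either the filtered core or the small community-level meta-graph can then be laid out in Gephi or igraph without the clutter of the full network.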
  • asked a question related to Graph Data Mining
Question
13 answers
Big data is huge in content and dimensions, and graphs are computationally expensive. Is there any possibility that a big data problem could be addressed with a graph-based solution while keeping the complexity manageable?
Thanks
Relevant answer
Answer
Yes. One place to start would be to look for solutions to problems using GraphLab. This was a project developed to use graphs in the analysis of data sets, from which a company now called Dato (dato.com) was started. Dato builds tools for more than graphs, but this is one place that may lead you in the direction you are asking about. In addition, there are graph databases such as Neo4j and Titan in which data is organized as a graph rather than in a tabular structure.
  • asked a question related to Graph Data Mining
Question
13 answers
Are there any graph mining tools for finding frequent subgraphs in a graph dataset? Please suggest a tool for graph mining. I know data mining tools such as Weka, RapidMiner, R, etc.; I need graph mining tools along those lines.
Can the Weka tool handle graph datasets?
Relevant answer
Answer
Hi,
I have used Gephi so far. It is good, provided you supply the input in the required format, which is easy.
Hope it helps.
Mostafa
  • asked a question related to Graph Data Mining
Question
2 answers
Thanks in advance for your replies.
Relevant answer
Neo4j is an open source graph database. Cypher is a declarative query language for Neo4j. Some modeling examples are given here: http://neo4j.com/docs/stable/data-modeling-examples.html.
  • asked a question related to Graph Data Mining
Question
4 answers
Besides classification accuracy, I want to plot ROC and DET curves.
Relevant answer
Answer
LibSVM only supports probability outputs for SVC and SVR models; one-class SVM is not supported. You could do this manually to some extent by Platt scaling the decision value you obtain from the one-class SVM (e.g. running the decision values through the logistic function, optionally scaling them first).
Note that you don't need probabilities to plot ROC and DET curves. You can use the ranking produced by the decision values directly. The fact that these curves only need a ranking (not probabilities) is one of the key reasons they are used for SVM classifiers, rather than proper scoring rules like log loss or the Brier score, which are more common for probabilistic models like logistic regression.
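A short sketch of that ranking-based approach, using scikit-learn's OneClassSVM (which wraps libsvm) and synthetic data purely for illustration:

import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_curve, auc

rng = np.random.RandomState(0)
X_train = rng.normal(0, 1, size=(200, 2))                    # "normal" training data only
X_test = np.vstack([rng.normal(0, 1, size=(100, 2)),         # normal test points
                    rng.uniform(-6, 6, size=(100, 2))])      # outliers
y_test = np.r_[np.ones(100), np.zeros(100)]                  # 1 = normal, 0 = outlier

clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(X_train)
scores = clf.decision_function(X_test)                       # higher score = more "normal"

fpr, tpr, thresholds = roc_curve(y_test, scores)             # a ranking is all that's required
print("AUC:", auc(fpr, tpr))

A DET curve comes from the same sweep of thresholds: plot the false-negative rate (1 - tpr) against fpr, usually on normal-deviate scaled axes.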
  • asked a question related to Graph Data Mining
Question
3 answers
Can anyone help me regarding a link or suitable examples to better understand label propagation?
Relevant answer
Hi Kenth,
Thanks for your great support.
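For anyone looking for a runnable toy example, here is the community-detection flavour of label propagation on the classic karate club graph with NetworkX (scikit-learn's LabelPropagation is the semi-supervised variant of the same idea): every node starts with its own label and repeatedly adopts the label most common among its neighbours until the labels stabilise.

import networkx as nx

G = nx.karate_club_graph()   # classic small benchmark graph
communities = list(nx.community.label_propagation_communities(G))

for i, comm in enumerate(communities):
    print(f"community {i}: {sorted(comm)}")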
  • asked a question related to Graph Data Mining
Question
3 answers
The following data are available for mining and pattern recognition:
X coordinate of the cursor with system time
Y coordinate of the cursor with system time
(system time sampled every 100 ms)
Each user's mouse movement data is available for 30 minutes.
How can I identify mouse movement patterns, if they exist, among these users?
Relevant answer
Answer
The first decision is whether you're trying to find a pattern that you know something about (the supervised case), or trying to find a pattern without knowing anything about it (the unsupervised case).
I don't have much experience in the unsupervised task, which is what interests you probably.
I would start my search by combining terms like "time series" and "motif discovery".
A quick search gives these papers that seem like they might be what you're looking for:
I don't have enough experience with these types of tools or your type of data to give you a single recommendation.
  • asked a question related to Graph Data Mining
Question
9 answers
The large graph has to be partitioned into sub-partitions by edge weight.
Relevant answer
Answer
Have a look at METIS and ParMETIS. Both libraries support edge weights and partition the graph to minimize edge cuts, taking into account both the number of inter-partition edges and the edges' weights.
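METIS itself is a C library (with Python wrappers such as pymetis). Just to illustrate weight-aware partitioning with something lighter, here is a Kernighan-Lin bisection in NetworkX, which also minimizes the total weight of cut edges; the toy graph is hypothetical, and for multi-way partitions of large graphs METIS remains the better tool:

import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 10), ("b", "c", 9), ("a", "c", 8),     # heavy cluster 1
    ("d", "e", 10), ("e", "f", 9), ("d", "f", 8),     # heavy cluster 2
    ("c", "d", 1),                                    # light bridge that should be cut
])

part1, part2 = nx.community.kernighan_lin_bisection(G, weight="weight", seed=0)
print(sorted(part1), sorted(part2))
cut = sum(d["weight"] for u, v, d in G.edges(data=True)
          if (u in part1) != (v in part1))
print("total cut weight:", cut)   # expected: 1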
  • asked a question related to Graph Data Mining
Question
3 answers
I need a benchmark data for minimum spanning tree problems. Can any researcher who works in this field share a data set with me?
Relevant answer
Answer
My email address is: sasan.barak@gmail.com
  • asked a question related to Graph Data Mining
Question
6 answers
I want to implement PageRank and various improved PageRank algorithms on graph data, but I am unable to find a simulator or a real implementation of a PageRank algorithm.
Relevant answer
Answer
If you are interested in small to medium scale graphs that evolve in time (e.g. according to local characteristics of nodes and some rules), then a multi-agent simulation toolkit may be useful. I can recommend two of them for this purpose:
- Repast Simphony (or Repast for HPC) - http://repast.sourceforge.net/
None of these tools will work for really big datasets.
Regards,
Radek
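As a concrete starting point, the basic PageRank power iteration fits in a few lines and is easy to modify into improved variants; here is a minimal sketch in Python, cross-checked against NetworkX's built-in nx.pagerank:

import networkx as nx

def pagerank_power_iteration(G, alpha=0.85, tol=1e-10, max_iter=100):
    nodes = list(G.nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_deg = dict(G.out_degree())
    for _ in range(max_iter):
        new = {v: (1.0 - alpha) / n for v in nodes}
        for u in nodes:
            if out_deg[u] == 0:                       # dangling node: spread its rank uniformly
                for v in nodes:
                    new[v] += alpha * rank[u] / n
            else:
                share = alpha * rank[u] / out_deg[u]
                for v in G.successors(u):
                    new[v] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

G = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
print(pagerank_power_iteration(G))
print(nx.pagerank(G, alpha=0.85))   # should be very close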
  • asked a question related to Graph Data Mining
Question
73 answers
There is a variety of software packages which provide graph algorithms and network analysis capabilities. As a developer of network analysis algorithms and software, I wonder which tools are most popular with researchers working on real-world data. What are your requirements with respect to usability, scalability etc.? What are the desired features? Are there analysis tasks which you would like to do but are beyond the capabilities of your current tools?
Relevant answer
Answer
NetworkX is a very strong tool for generating graphs, and it provides algorithms to analyze them: http://networkx.github.io/
  • asked a question related to Graph Data Mining
Question
13 answers
There is a lot of open data around, but for many purposes, especially in an enterprise context, indicators and automatic measurements to determine data quality are a must-have.
Relevant answer
Answer
True, I am currently working on a survey on data quality assessment methodologies, dimensions and metrics (www.semantic-web-journal.net/content/quality-assessment-methodologies-linked-open-data, under review).
In the meanwhile, you could take a look at http://stats.lod2.eu/, which gives a high level overview of "quality" of data sets.
  • asked a question related to Graph Data Mining
Question
3 answers
I want to develop an application in Java which takes as input the log file of a mail server in CSV format and outputs a graph for it. The graph should be generated by Gephi. Then I want to feed that CSV file to KruskalMST.java (a program for finding the minimum spanning tree of the graph generated above). But before giving the CSV file to KruskalMST.java, I will convert all the weights to negative so that instead of the minimum spanning tree I get the maximum spanning tree, because I'm interested in the maximum spanning tree. KruskalMST.java will give me a file containing three columns (Source, Target, Weight). This file will again be input to Gephi, which will render the maximum spanning tree graphically; then I want to do some analysis on these two graphs.
Until now I have done all these things manually and separately. I want to integrate everything into a single application that has Gephi embedded in it.
Please give me some suggestions on how to proceed. Will gephi toolkit be helpful?
Relevant answer
Answer
Looks like you will be parsing a big log. If that's the case, use a BufferedReader that wraps a FileReader so that you don't run out of memory as you read the input csv file.
You will then read the file line by line and split each line by whatever separator the .csv file uses, usually a comma. Your main logic will look something like this (imports needed: java.io.BufferedReader, java.io.FileReader, java.io.IOException):
try (BufferedReader reader = new BufferedReader(new FileReader("myfile.csv"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        String[] parts = line.split(","); // all the fields of each row will live inside this "parts" array
        // In here you will probably have to convert some fields into numbers (integers, floats, etc.,
        // depending on what's in your log) and feed that data into your graph so you can perform your walk,
        // then feed that to your Gephi objects, if Gephi gives you a Java API; I'm not familiar with Gephi.
    }
} catch (IOException e) {
    e.printStackTrace();
}
  • asked a question related to Graph Data Mining
Question
5 answers
I have a directed graph in which every node depicts a user, and an edge between two users indicates that mail has been exchanged between them. The weight of an edge shows the number of mails exchanged between the two users. I want to find the most heavily weighted path from a node in this directed graph. I have used Gephi to generate the attached graph. The graph is divided into different communities based on the weight of the edges; each community is represented by a different color.
Relevant answer
Answer
Hello Ankit,
How about this approach:
1. Set the "cost" of each edge equal to its negative weight. For instance, if between node i and node j you currently have 5 emails exchanged, set c_{ij} = -5.
2. At the node you are interested in starting, let's call it the root node, create an artificial commodity with a supply of 1 unit.
3. At the node you are interested in ending (or if you are interested in all other nodes, this piece will be embedded in a loop) create an artificial demand of 1 for the commodity created in step 2.
4. Solve a minimum cost network flow (MCNF) problem with the root node as the origin, your other node of interest as its destination and all other nodes acting as transshipment nodes (i.e. flow balance constraints at those nodes equal 0).
Even if you use a generic solver, due to the network structure, network simplex should perform much faster than a typical linear programming solver. Moreover, due to the total unimodularity of the constraint matrix, you're guaranteed to get an integer solution. This means that the unit of flow that needs to travel from origin to destination will not get split up along the way, so your solution will be of the form "use this edge/don't use this edge" for all edges in your graph.
There are other graph optimization libraries (LEDA, Goblin, LEMON) which have even faster specialized MCNF algorithms implemented, so large-scale problems should not be problematic.
Note: If you are familiar with network algorithms, you will recognize what I've just described in steps 1-4 as a simple tweak on the shortest-path problem. The tweak is to make all weights negative (which, typically, a specialized shortest-path algorithm like Dijkstra's can't handle but Bellman-Ford can), such that minimizing this negative total cost is equivalent to maximizing your total sum of weights.
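For completeness, here is a small sketch of steps 1-4 using NetworkX's min_cost_flow. The mail counts and node names are hypothetical, edge capacities are set to 1 so the problem stays bounded and the solution integral, and the toy graph is acyclic so negative-cost cycles are not a concern:

import networkx as nx

mails = {("root", "a"): 5, ("root", "b"): 2, ("a", "c"): 1,
         ("b", "c"): 7, ("c", "dest"): 4, ("a", "dest"): 2}

G = nx.DiGraph()
for (u, v), count in mails.items():
    G.add_edge(u, v, weight=-count, capacity=1)   # step 1: cost = negated mail count

G.nodes["root"]["demand"] = -1                    # step 2: supply one unit at the root
G.nodes["dest"]["demand"] = 1                     # step 3: demand one unit at the destination

flow = nx.min_cost_flow(G)                        # step 4: network simplex under the hood
path_edges = [(u, v) for u in flow for v, f in flow[u].items() if f > 0]
print(path_edges)   # edges carrying the unit of flow, i.e. the maximum-weight route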