Graph Data Mining - Science topic
Questions related to Graph Data Mining
The excellent survey of graph mining by Chakrabarti and Faloutsos (ACM Computing Surveys, 2006) is getting a little old. Does anyone know of a more recent (but good) one?
A question came up for me a few days ago: is there a relationship between power-law distributions and the Pareto principle (the 80/20 rule) in natural phenomena? Do they describe the same thing?
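One way to make the link concrete is a small sketch under a stated assumption: the quantity follows a Pareto distribution (a power-law tail) with tail index alpha > 1, so its density decays like x^-(alpha + 1). In that case the share of the total held by the top fraction p of items is p^(1 - 1/alpha), and the 80/20 rule is the special case alpha = log 5 / log 4. The names `top_share` and `ALPHA_80_20` are illustrative only.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p of items, assuming a
    Pareto distribution with tail index alpha (> 1 so the mean is finite)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for a finite mean")
    return p ** (1.0 - 1.0 / alpha)

# Tail index at which the top 20% hold exactly 80% of the total.
ALPHA_80_20 = math.log(5) / math.log(4)  # roughly 1.16

share_80_20 = top_share(0.2, ALPHA_80_20)
share_lighter_tail = top_share(0.2, 3.0)  # a lighter tail concentrates less
```

Under this assumption the 80/20 rule is one particular power law, not a different phenomenon; other exponents give other splits.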
I have a dataset in which the change in the weight of a material is recorded over time. Unfortunately, because of the experimental conditions, I cannot record the weight during the first 75 seconds.
- Is there any way to predict the missing initial data (that is, the change in weight during the first 75 seconds)?
- How can I find the equation of the curve that fits the data points?
Any solution using MATLAB, SPSS, or Excel is appreciated.
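A minimal sketch of the idea, written in Python rather than the tools named above and resting on a strong assumption: that the weight follows an exponential law w(t) = a * exp(b * t). The model choice, the function names, and the synthetic numbers are all illustrative; the real data may need a different curve. The fit is a straight-line least-squares fit through (t, ln w), which is then extrapolated back to t < 75 s.

```python
import math

def fit_exponential(times, weights):
    """Least-squares fit of w(t) = a * exp(b * t), done as a straight-line
    fit through the points (t, ln w). All weights must be positive."""
    n = len(times)
    if n < 2 or n != len(weights):
        raise ValueError("need at least two matching (t, w) pairs")
    logs = [math.log(w) for w in weights]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    b = sxy / sxx                      # slope of the line = rate constant
    a = math.exp(y_mean - b * t_mean)  # intercept of the line = ln(a)
    return a, b

def predict(a, b, t):
    """Evaluate the fitted curve at time t (including t before the first sample)."""
    return a * math.exp(b * t)

# Synthetic data only: measurements start at t = 75 s.
times = [75, 100, 150, 200, 300]
weights = [10.0 * math.exp(-0.004 * t) for t in times]

a, b = fit_exponential(times, weights)
estimated_initial_weight = predict(a, b, 0.0)
```

Extrapolating outside the measured range is only as trustworthy as the assumed model, so it is worth checking the residuals on the measured points before relying on the values for the first 75 seconds.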
I tried to perform a sensitivity analysis on an ANP problem using the node sensitivity option in the Super Decisions software, but I am still not sure what the parameter value (the x-axis of the graph) means. Why does the 0.5 point on the x-axis always correspond to the original normalized-by-cluster value for each supplier? Thank you.
My question concerns graph database programs like Neo4j and/or JanusGraph. I am new to graph databases. Say I have a dataset made of lots of small (sub-)graphs and a few large ones. All these mini graphs are self-contained. Once the entire dataset is loaded into the graph database, do programs like JanusGraph offer a means of rapidly finding all these natural sub-graphs? I know that JanusGraph and Neo4j offer graph partitioning, but I am talking about finding ALL the "natural" sub-graphs within the dataset WITHOUT breaking any of them into smaller pieces. How does one use JanusGraph and/or Neo4j to find these sub-graphs?
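If the "natural" sub-graphs are read as connected components (ignoring edge direction), the task can be sketched outside any database with a union-find structure. This is a plain Python illustration with hypothetical names, not the Neo4j or JanusGraph API; graph platforms commonly offer the same computation under the name "weakly connected components".

```python
from collections import defaultdict

def connected_components(edges, nodes=()):
    """Group nodes into connected components, ignoring edge direction.
    `edges` is an iterable of (u, v) pairs; `nodes` may list isolated nodes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for n in nodes:
        find(n)                            # register isolated nodes
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                # merge the two components

    groups = defaultdict(set)
    for n in list(parent):
        groups[find(n)].add(n)
    # Largest components first.
    return sorted(groups.values(), key=len, reverse=True)

components = connected_components([(1, 2), (2, 3), (4, 5)], nodes=[6])
```

No component is ever split, because two nodes end up in the same group exactly when some chain of edges links them.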
Currently I am working on a plugin for the Gephi platform that imports citation networks from Crossref resources.
These networks can then be analyzed further with Gephi's functions.
As we know, Crossref is a valuable store of metadata on scientific works.
What other valuable open-access bibliographic data sources (in terms of completeness, quality, and well-defined web access) for citations of scientific papers, usable for bibliometric, webometric, and scientometric analysis, would you recommend?


See the following link for a description of graph databases: https://en.wikipedia.org/wiki/Graph_database
It seems that such databases, and the benefits they provide for analysis, have primarily been used in business, basic science, and the examination of social interactions, e.g. social networks. It would be helpful to know about other uses, such as in the humanities.
I want to implement the Gauss-Seidel method to solve a linear system with a sparse matrix, but I am stuck on the dependency between unknowns within every iteration and am not getting a solution. Please point me to some resources so that I can implement it.
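A minimal sketch of Gauss-Seidel for a sparse system, assuming each row is stored as a dictionary of its nonzero entries (the storage format and names are illustrative). The dependency inside a sweep is inherent to the method: each unknown is updated in place and the rows after it use the new value immediately. Convergence is guaranteed for strictly diagonally dominant or symmetric positive definite matrices, and not in general.

```python
def gauss_seidel(rows, b, tol=1e-10, max_iter=10_000):
    """Solve A x = b where rows[i] is a dict {column: value} holding the
    nonzero entries of row i of A."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            diag = rows[i].get(i, 0.0)
            if diag == 0.0:
                raise ZeroDivisionError(f"zero diagonal entry in row {i}")
            # Only the stored nonzeros are visited, so a sweep costs O(nnz).
            off_diagonal = sum(v * x[j] for j, v in rows[i].items() if j != i)
            new_xi = (b[i] - off_diagonal) / diag
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi  # updated in place: later rows see the new value
        if max_change < tol:
            return x
    raise RuntimeError("Gauss-Seidel did not converge within max_iter sweeps")

# Small strictly diagonally dominant example; the exact solution is [1, 1, 1].
A_rows = [{0: 4.0, 1: 1.0}, {0: 1.0, 1: 4.0, 2: 1.0}, {1: 1.0, 2: 4.0}]
rhs = [5.0, 6.0, 5.0]
solution = gauss_seidel(A_rows, rhs)
```

If the iteration never settles, the first thing to check is whether the matrix meets one of the convergence conditions; reordering rows to strengthen the diagonal sometimes helps.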
Dear Researchers, I have downloaded the NSL-KDD dataset (train + test). I applied J48 to the KDD 20% dataset, which contains 42 attributes, one of which is the class (normal or anomaly). J48 does not give me the attack category (DoS, U2R, R2L, Probe); it only gives me the anomaly and normal categories. Can anyone help me obtain category-wise output? I have attached a snapshot from a paper whose author gets the proper result with the same dataset.
I will be very thankful.



If the data is distributed across different machines, is using a graph database still efficient?
The dataset is an incomplete and potentially biased representation of academic journal and conference publishing, just like any other known dataset that attempts to fulfil a similar mission. Herrmannova and Knoth (2016) describe both the aspects that can be relied upon and those that need to be used with care, but are there specific, operational ways to improve the quality of Microsoft Academic Graph data?
I'm working on "community detection in networks considering node attributes". I need some benchmark networks to test my proposed algorithm by comparing the predicted labels (community assignments) with the real ones (ground truth). These networks should be undirected, contain non-overlapping communities, and range from small to large in size. Edges should represent relations between nodes, nodes should have individual features likely to affect their community membership, and the true labels of the nodes should be known as ground truth for evaluating my predicted labels. Although I have searched extensively, I could not find any networks with these characteristics.
I would really appreciate it if anybody could point me to references or benchmark networks that satisfy these requirements.
Thank you in advance for your time and cooperation.
Best regards,
Esmaeil
By the definition of a power law, the fraction P(k) of nodes with degree k, for large values of k, is given by
P(k) ~ k^(-r).
In this definition, the term "large value" is not clearly defined. Does "large" imply 100, 10^3, or 10^6?
Does the definition imply that vertices in power-law graphs have almost constant degree, with only a few exceptions, called hub nodes, whose degree is very high, maybe 100 times higher than that of other vertices?
If the above statement is true, then why does the degree distribution plot for power-law graphs not show just a few isolated points at high degree, detached from the end of the curve, instead of closely spaced points along a decaying curve, as in this plot: yahoo_web_graph (http://bickson.blogspot.in/2011/12/preview-for-graphlab-v2-new-features.html)?
Is this definition enough to determine the number of hub vertices present in the graph, or the fraction of edges incident on them, given the number of vertices in the graph?
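These questions can be explored numerically on any degree sequence. The sketch below, with illustrative function names, computes the empirical P(k), estimates the exponent with the standard approximate maximum-likelihood formula for discrete data, r ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)), and measures the share of edge endpoints that falls on the highest-degree nodes. The cutoff `k_min`, where the "large k" regime starts, is a parameter the analyst has to choose; the definition itself does not fix it.

```python
import math
from collections import Counter

def degree_distribution(degrees):
    """Empirical P(k): fraction of nodes having each degree k."""
    n = len(degrees)
    return {k: c / n for k, c in sorted(Counter(degrees).items())}

def estimate_exponent(degrees, k_min=1):
    """Approximate maximum-likelihood estimate of r in P(k) ~ k^(-r),
    using only nodes with degree >= k_min (k_min must be >= 1)."""
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum

def hub_edge_share(degrees, top_fraction):
    """Fraction of all edge endpoints attached to the top_fraction of
    nodes when nodes are ranked by degree."""
    ordered = sorted(degrees, reverse=True)
    top_n = max(1, int(round(top_fraction * len(ordered))))
    return sum(ordered[:top_n]) / sum(ordered)

# Toy degree sequence, for illustration only.
toy_degrees = [1, 1, 1, 1, 16]
toy_distribution = degree_distribution(toy_degrees)
toy_hub_share = hub_edge_share(toy_degrees, 0.2)
```

On real data the degrees fill in a continuum between the bulk and the hubs, which is why the plotted points lie along a decaying curve instead of forming two separate groups.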
I want to find the tightest lower and upper bounds for frequent subgraphs in uncertain graph data, and also the densest frequent subgraph. Please suggest an approach.
I have a collection of graphs and need to find which patterns (sub-graphs) occur frequently across the different graphs.
Is anyone working on the all-pairs shortest path problem? I need to know the best, most time-efficient approach in the literature.
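As a baseline for comparison, not a claim about the best method in the literature, here is a sketch of the classic Floyd-Warshall algorithm. It runs in O(n^3) time and O(n^2) memory and assumes there are no negative cycles. For sparse graphs, running Dijkstra from every node, or Johnson's algorithm when some weights are negative, is usually faster. The function name and input format are illustrative.

```python
import math

def floyd_warshall(n, edges):
    """All-pairs shortest path lengths for nodes 0..n-1.
    `edges` is an iterable of directed (u, v, weight) triples.
    Unreachable pairs keep the distance math.inf."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # keep the lightest parallel edge
    for k in range(n):                   # allow k as an intermediate node
        dist_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue                 # no path i -> k, nothing to relax
            dist_i = dist[i]
            for j in range(n):
                candidate = d_ik + dist_k[j]
                if candidate < dist_i[j]:
                    dist_i[j] = candidate
    return dist

example_dist = floyd_warshall(3, [(0, 1, 3), (1, 2, 1), (0, 2, 10), (2, 0, 2)])
```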
I need to work with re-labeling of nodes on time-evolving graphs to identify frequent patterns. All help is appreciated. Thanks.
I need to work with multiple attributes on time-evolving graphs to identify frequent patterns. All help is appreciated. Thanks.
Hello everyone, I am starting new research on relationships on Twitter, but I am unsure how to draw a large number of links on one map quickly. I have a 250000 x 250000 sparse 0/1 matrix that represents the relationships on Twitter. I tried software such as Gephi, but it is too slow to draw the picture and often fails. I also tried the igraph package in R; it ran for a whole night and finally produced a picture, but a very ugly one. Can you give me some advice on handling this with igraph, or recommend more powerful software? Thank you very much!
Big data is huge in volume and dimensionality, and graph algorithms are computationally expensive. Is it possible to address a big data problem with a graph-based solution while keeping the complexity manageable?
Thanks
Are there any graph mining tools for finding frequent subgraphs in a graph dataset? Please suggest a tool for graph mining. I know data mining tools such as Weka, RapidMiner, and R; I need graph mining tools of that kind.
Can Weka handle graph datasets?
Thanks in advance for your replies.
Besides classification accuracy, I want to plot ROC and DET curves.
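The points for both curves can be computed from classifier scores and binary labels before handing them to any plotting tool. In this sketch (names are illustrative) a ROC point is (false positive rate, true positive rate) at a given score threshold, and the matching DET point is (false positive rate, false negative rate). DET curves are conventionally drawn on normal-deviate axes, which is left to the plotting step.

```python
def roc_points(scores, labels):
    """ROC points (fpr, tpr), one per distinct score threshold, starting
    at (0, 0). `labels` are truthy for positives, falsy for negatives."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with equal scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def det_points(points):
    """Convert ROC points to DET points (fpr, fnr), where fnr = 1 - tpr."""
    return [(fpr, 1.0 - tpr) for fpr, tpr in points]

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

example_roc = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
example_auc = auc(example_roc)
```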
Can anyone point me to a link or suitable examples to better understand label propagation?
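A small example may help. What follows is a simplified, deterministic majority-vote variant, not a specific published algorithm: a few seed nodes carry known labels that stay fixed, and every other node repeatedly adopts the label most common among its labeled neighbors until nothing changes. The function name, the tie-breaking rule (smallest label wins), and the toy graph are assumptions made for illustration.

```python
from collections import Counter

def propagate_labels(adjacency, seeds, max_iter=100):
    """adjacency: dict node -> list of neighbors (undirected graph).
    seeds: dict node -> known label; seed labels are never changed."""
    labels = dict(seeds)
    for _ in range(max_iter):
        changed = False
        for node in sorted(adjacency):       # fixed order keeps it deterministic
            if node in seeds:
                continue                     # seed labels stay clamped
            votes = Counter(labels[n] for n in adjacency[node] if n in labels)
            if not votes:
                continue                     # no labeled neighbor yet
            best = max(votes.values())
            choice = min(l for l, c in votes.items() if c == best)
            if labels.get(node) != choice:
                labels[node] = choice
                changed = True
        if not changed:
            break                            # labels are stable
    return labels

# Two triangles (1-2-3 and 4-5-6) joined by the single edge 3-4.
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
result = propagate_labels(graph, seeds={1: "A", 5: "B", 6: "B"})
```

In this toy graph the labels spread within each triangle and stop at the bridge, which is the intuition behind using label propagation for both semi-supervised classification and community detection.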
The following data is available for mining and pattern recognition:
x coordinate of the cursor with system time
y coordinate of the cursor with system time
(system time sampled every 100 ms)
Each user's mouse movement data is available for 30 minutes.
How can I identify mouse movement patterns, if any exist, among these users?
A large graph has to be partitioned into sub-partitions by edge weight.
I need benchmark data for minimum spanning tree problems. Can any researcher who works in this field share a dataset with me?
I want to implement PageRank and various improved PageRank algorithms on graph data, but I am unable to find a simulator or a real implementation of a PageRank algorithm.
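As a starting point for experimenting with variants, here is a sketch of basic PageRank by power iteration. The damping factor 0.85 is the commonly used default; the function name, the input format, and the choice to spread the rank of dangling nodes (nodes without out-links) uniformly over all nodes are assumptions of this sketch.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """out_links: dict node -> list of nodes it links to.
    Returns a dict node -> rank; the ranks sum to 1."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is shared by everyone.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

cycle_ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
star_ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

Many improved variants, such as personalized or weighted PageRank, change only the teleport term or the way rank is split among out-links, so they can be tried by editing those two lines.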
There is a variety of software packages which provide graph algorithms and network analysis capabilities. As a developer of network analysis algorithms and software, I wonder which tools are most popular with researchers working on real-world data. What are your requirements with respect to usability, scalability etc.? What are the desired features? Are there analysis tasks which you would like to do but are beyond the capabilities of your current tools?
There is a lot of open data around, but for many purposes, especially in an enterprise context, indicators and automatic measurements to determine data quality are a must-have.
I want to develop an application in Java that takes the log file of a mail server in CSV format as input and outputs a graph for it. The graph should be generated by Gephi. I then want to feed the same CSV file to KruskalMST.java (a program that finds the minimum spanning tree of the graph generated above). Before giving the CSV file to KruskalMST.java, I will negate all the weights so that I get a maximum spanning tree instead of a minimum spanning tree, because the maximum spanning tree is what I am interested in. KruskalMST.java will give me a file containing three columns (Source, Target, Weight). This file will again be input to Gephi, which will draw the maximum spanning tree, and then I want to do some analysis on the two graphs.
Until now I have done all these things manually and separately. I want to integrate them into a single application that has Gephi embedded in it.
Please give me some suggestions on how to proceed. Will the Gephi Toolkit be helpful?
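The spanning-tree step of this pipeline can be sketched as follows. This is in Python for illustration, not the poster's KruskalMST.java and not Gephi Toolkit code, and it assumes the CSV has a header row with the columns Source, Target, Weight as described above. Sorting the edges by descending weight gives the same tree as negating the weights and running the minimum-spanning-tree version, without the extra conversion.

```python
import csv
import io

def read_edges(csv_text):
    """Parse CSV text with header Source,Target,Weight into (u, v, w) triples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Source"], row["Target"], float(row["Weight"])) for row in reader]

def maximum_spanning_tree(edges):
    """Kruskal's algorithm with edges taken heaviest first.
    Returns the chosen (u, v, w) edges; a disconnected input yields a forest."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components: keep it
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

# Made-up mail counts, for illustration only.
sample_csv = "Source,Target,Weight\na,b,5\nb,c,3\na,c,1\nc,d,4\n"
mst_edges = maximum_spanning_tree(read_edges(sample_csv))
total_weight = sum(w for _, _, w in mst_edges)
```

The returned triples can be written back out with the same three columns, which matches the file format the later Gephi import step expects.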
I have a directed graph in which every node represents a user and an edge between two users indicates that mail has been exchanged between them. The weight of an edge is the number of mails exchanged between the two users. I want to find the heaviest (maximum total weight) path starting from a given node in this directed graph. I used Gephi to generate the attached graph. The graph is divided into communities based on the weights of the edges, and each community is shown in a different color.