Question

# Which open source software is best for network data analysis?

I want to start analyzing data to illustrate relationships between persons and institutions, but I am not sure which software to choose. It has to be free and open source. Please advise.

29th Sep, 2021
Girne American University
Use Origin and GraphPad for that.

13th Oct, 2015
Andrew Pitts
Polinode
In my opinion the best software to use for network analysis will really depend on a number of factors, including:
1. What skills you have, especially do you have development skills and if so what languages?
2. How large the network in question is?
3. Are you more focussed on visualization or the computation of metrics such as centrality, betweenness, etc.?
4. What is your budget, if any?
It's a bit of a generalization and the below is not an exhaustive list, but in my mind the available tools can be divided into four categories:
Focused Desktop Tools
1. Gephi: Probably the most popular network visualization package out there. Gephi doesn't require any programming knowledge. Its strength is that it is able to produce very high quality visualizations. It can also handle relatively large graphs - the actual size will depend on your infrastructure (particularly RAM) but you should be able to go up to 100,000 nodes without a problem. It does have the ability to calculate a few of the more common metrics such as degree, centrality, etc. but it's a stronger tool for visualization than analysis. [Open Source]
2. NodeXL: NodeXL is an Excel add-in so you will need Excel to use it, which is a bit of a limitation for Mac users for example. It doesn't have all of the flexibility of Gephi in terms of visualization but can produce some quality visualizations. It also interfaces directly with the SNAP library for analysis, which gives it access to a nice set of efficient algorithms for metric calculations. The main advantage of NodeXL though is neither in its visualization nor its analysis functionality but rather in its data collection - it interfaces with the Twitter API nicely for example and many of the use cases for NodeXL involve the visualization and analysis of social media data in my experience. [Open Source but as I write this they have announced the Open Source functionality will be limited and that a Commercial option will be introduced]
3. Cytoscape: There is actually both a desktop version as well as a javascript version for developers (see cytoscape.js). In my experience it's primarily used in the biology domain but can certainly be used outside of it and is capable of producing high quality visualizations. [Open Source]
4. Ucinet: In my experience Ucinet is most widely used in academic circles. It's very strong on analytics with a large number of metrics. However, it is quite weak on visualization in my view (really thinking about its cousin Netdraw here), i.e. it can calculate both the common metrics as well as some quite arcane metrics but it's not great at turning those results into a well presented visualization. It also requires Windows for installation so Mac users have to be creative by using an emulator for example. [Commercial]
5. Pajek: At a high-level, not too dissimilar to Ucinet in that it is quite strong on analytics but relatively weak on visualization. There is also a version called Pajek-XXL which is specifically designed for very large graphs - if you have a network with millions of nodes and no programming skills and would like to analyse it then this wouldn't be a bad starting place. [Commercial]
General Desktop / On-Premise Solutions
1. Palantir: Very expensive on-premise solution. Not specifically designed for traditional network analysis but rather making sense of network data in a much more general way. Used by intelligence agencies and the like. [Commercial]
2. IBM i2 Analyst's Workstation: Similar to Palantir. There is a whole history here - i2 used to be a standalone company that was acquired by IBM and there was some interesting litigation between Palantir and IBM prior to this. [Commercial]
Cloud Based Tools
1. Polinode: Polinode is software-as-a-service for network analysis, i.e. you can upload networks to the cloud and then visualize them there like you do with Gephi but with the key advantage that you can share them with other people without having them download software. It also includes analysis capabilities including 20 of the most common metrics for network analysis - it doesn't have all the metrics of say a Ucinet but for most use cases should have enough. Since it runs in your browser, if your network is very large (e.g. >50,000 nodes) then you are probably better off using one of the developer tools or a desktop tool designed for very large networks. Since it's cloud-based, Polinode is also able to integrate relationship-based surveys for the collection of network data. [Commercial - full disclosure: I'm the founder]
Developer Tools
1. NetworkX: An active community and terrific if you have some Python knowledge. If you have a large dataset (>100,000 nodes say) then this is a great place to start as many of the computationally intensive metrics now make use of sparse matrices. [Open Source]
2. iGraph: Also good for large graphs and if you prefer R over Python for your data analysis and have a solid knowledge of R then you may want to use iGraph. [Open Source]
3. SNAP: Written in C++ but with a Python interface this is the Stanford Network Analysis Project. Not for the faint of heart! A great framework to build on if you have something very technical / custom to build that needs the speed that C++ can provide but be prepared to invest a lot of time in getting up to speed. [Open Source]
4. sigma.js: For all the web developers. sigma.js is a JavaScript library that provides flexible functionality for visualization. It's light on analysis - you would need to calculate centrality, etc. externally. There are actually a lot of other JavaScript libraries out there - see ngraph, cytoscape.js, d3, arbor, alchemy and dracula for example. [Open Source]
By no means an exhaustive list but rather a few of the more commonly used applications to illustrate the trade-offs.
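To make the metrics mentioned above (degree, centrality, betweenness) a little more concrete, here is a minimal pure-Python sketch of how two of them can be computed on a tiny person-institution network. The names and ties in `edges` are hypothetical illustration data, and the helper functions are my own naming, not part of any of the tools listed; in practice a library such as NetworkX or igraph would do this for you. Betweenness uses Brandes' algorithm for an undirected, unweighted graph and is left unnormalized.

```python
from collections import defaultdict, deque

# Hypothetical person-institution ties (illustration only, not real data).
edges = [
    ("Alice", "University A"),
    ("Bob", "University A"),
    ("Bob", "Institute B"),
    ("Carol", "Institute B"),
]


def build_adjacency(edge_list):
    """Build an undirected adjacency mapping: node -> set of neighbours."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (undirected, unweighted)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []                      # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies back towards s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


adj = build_adjacency(edges)
degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)
for node in adj:
    print(f"{node}: degree={degree[node]:.2f}, betweenness={betweenness[node]:.1f}")
```

In this toy network Bob links the two institutions, so he ends up with the highest betweenness even though his degree is the same as that of either institution.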
41 Recommendations

21st Jan, 2013
Technical University of Denmark
My favorite is igraph package in R/Python. But it requires knowledge of R or Python...
If you are familiar with Python, there is also NetworkX.
There is an easier, but less powerful, alternative: NodeXL, which is an Excel add-in.
1 Recommendation
21st Jan, 2013
Christian Moewes
Otto-von-Guericke-Universität Magdeburg
You might also wanna try igraph http://igraph.sourceforge.net/ which is available in R and in Python. It is easy to combine networkx http://networkx.lanl.gov/ with igraph. For pure illustration purposes I recommend http://gephi.org/.
12 Recommendations
21st Jan, 2013
Marco Picone
Università degli Studi di Modena e Reggio Emilia
I would suggest NetworkX. I have also tried Network Workbench (http://nwb.cns.iu.edu/), which is a good tool but unfortunately is not updated as regularly as NetworkX, iGraph and Gephi.
22nd Jan, 2013
Joshua S White
State University of New York Institute of Technology at Utica/Rome
I would recommend Gephi. As long as you have a decent system to run it on, it can visualize 100,000+ nodes easily.
2 Recommendations
22nd Jan, 2013
PAF Karachi Institute of Economics & Technology
23rd Jan, 2013
Sevinc Rende
Isik University
I am teaching a course on SNA. The students use Pajek - it is a good introduction to SNA and the program has an extensive user manual, a book actually. For my research, I am leaning toward Gephi, but in a previous research we relied on Cytoscape.
For a quick check of the data, I would recommend Pajek or Gephi.
24th Jan, 2013
Henk Kelderman
Institute of Psychology, Leiden University
BTW there is a good free online course on Social Network Analysis coming up in March by Lada Adamic. She uses Gephi and iGraph. https://www.coursera.org/course/sna
2 Recommendations
31st Jan, 2013
Amit Rechavi
Pajek
6th Feb, 2013
Juan Daniel Soto Díaz
The London School of Economics and Political Science
I recommend the sna and igraph packages in R.
6th Feb, 2013
Erick Stattner
Université des Antilles
I use gephi (free), ucinet, netdraw
10th Feb, 2013
Tabriz University of Medical Sciences
I want to say thanks to all friends that answer my question and guide me.
12th Feb, 2013
Craig Valli
Edith Cowan University
If you have budget i2 Workstation has some very good tools for SNA and analysis in general. I also use Gephi as well.
14th Feb, 2013
Robert Levy
There are a few I work with: NodeXL is a Microsoft Excel template and is very useful. Get it from here: http://nodexl.codeplex.com/. The other is Social Networks Visualizer (http://socnetv.sourceforge.net/). Both are free and very easy to use.
1 Recommendation
19th Feb, 2013
John H Heinrichs
Wayne State University
I like NodeXL -- it works with Microsoft Excel -- an excellent book is at ... http://www.amazon.com/Analyzing-Social-Media-Networks-NodeXL/dp/0123822297
2 Recommendations
14th Mar, 2013
Erick Stattner
Université des Antilles
Gephi is very good software that produces nice illustrations.
1 Recommendation
2nd Apr, 2013
Alexander Struck
Cluster of Excellence Matters of Activity at Humboldt-Universitaet zu Berlin
The R framework is popular with many network data analysts. http://www.r-project.org/. And these two packages provide many useful functions: http://cran.r-project.org/web/packages/sna/index.html and http://cran.r-project.org/web/packages/igraph/index.html
1 Recommendation
3rd Apr, 2013
Henk Kelderman
Institute of Psychology, Leiden University
There is also a nice course by Prof. Albert-László Barabási on the web:
1 Recommendation
5th Sep, 2015
Hemchandracharya North Gujarat University
If you are asking about the biological data analysis side, then I suggest CellDesigner and Cytoscape for data analysis.
1 Recommendation
14th Oct, 2015
Tabriz University of Medical Sciences
Dear Andrew Pitts,
22nd Mar, 2017
Ashis Talukder
University of Dhaka
I am confused about NetworkX after reading the slides of Prof Salvatore Scellato (Lecture slides 7, 8 - page 5): http://www.cl.cam.ac.uk/teaching/1617/L109/materials.html
In another place they mention (see attachment):
When should I AVOID NetworkX to perform network analysis?
• Large-scale problems that require faster approaches (i.e. massive networks with 100M/1B edges)
• Better use of memory/threads than Python (large objects, parallel computation)
So what should I use to analyze a social network with millions of nodes? :(
22nd Mar, 2017
Andrew Pitts
Polinode
Ashis that is more of a question than an answer. But for truly large networks (> 1m nodes) you should look at Pajek-XXL or Pajek-3XL: http://mrvar.fdv.uni-lj.si/pajek/PajekXXL.htm. From a development perspective (i.e. code required) Apache Spark's GraphX is also an option (not for visualisation though): http://spark.apache.org/graphx/. You can also look at neo4j but you won't find out-of-the-box algorithms going down that route. Oh, and if your network isn't that large then, depending on the resources on your computer, Gephi can be workable up to a few million nodes.
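As a side note on the memory point raised above: some simple metrics do not need a full in-memory graph object at all. The sketch below is only an illustration under stated assumptions (a whitespace-separated edge list with one edge per line and `#` comment lines; the function name `stream_degrees` is hypothetical). It counts node degrees in a single pass, so only the per-node counters are held in memory rather than the whole edge set. It is not a substitute for the tools mentioned above when you need path-based metrics.

```python
import io
from collections import Counter


def stream_degrees(lines):
    """Count undirected node degrees from an iterable of 'source target' lines."""
    degrees = Counter()
    for line in lines:
        parts = line.split()
        # Skip blank, malformed and comment lines.
        if len(parts) < 2 or line.startswith("#"):
            continue
        source, target = parts[0], parts[1]
        degrees[source] += 1
        degrees[target] += 1
    return degrees


# Usage with an in-memory stand-in for a large file; with a real file you
# would pass the open file object instead, which is read line by line.
sample = io.StringIO("# source target\n1 2\n2 3\n\n3 1\n3 4\n")
degrees = stream_degrees(sample)
print(degrees.most_common(2))
```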
23rd Mar, 2017
Joshua S White
State University of New York Institute of Technology at Utica/Rome
Agreed, I would use Apache Spark, or Oracle PGX, or Apache Titan to do this. You may find advantages with a PGX approach over Spark if you don't have a lot of high-end nodes to run things on.
7th Jan, 2019
Christopher Westphal
DataWalk
Andrew Pitts provides a great response with lots of reference systems. I'd like to add one more to the mix - although this is not an "open source" -- but rather more of an "open standards" platform -- DataWalk. https://datawalk.com/ There are a lot of foundational data representation capabilities in "link charts" and it is not simply presenting a network diagram, but rather, how the underlying representation is defined, the ability to create the objects (and links). Many of these concepts are overviewed in my books. However, to transition the theory into real-work practice is not always efficient, practical, or consistent. Therefore, how accurate, true, complete, and reliable are the results generated? There are some good videos @ https://datawalk.com/resources/
1 Recommendation
28th Oct, 2019
Maria Perez
Doshisha University
Is there any software for social media analysis, similar to the SNA tool NodeXL, that you would recommend for academic use?
2nd Nov, 2019
Héctor Tuy
Sara, you might also want to evaluate this one: https://onodo.org/locale/EN
22nd Dec, 2019
University of Bouira
I think Gephi is the best for you.
4th Mar, 2020
Gephi is best because of its visual encoding and plenty of features for network statistics etc. So many builtin features.
6th Mar, 2020
Dilraj Kaur
Nencki Institute of Experimental Biology
I have used Gephi, Cytoscape, igraph and Pajek.
In my experience Gephi is the easiest to use. Cytoscape is also easy to use, and igraph is available in Python and R.
I found Pajek not so friendly, as you need to convert the input file to .net format.
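For anyone stuck on that conversion step, here is a minimal sketch of turning a plain edge list into Pajek's .net text format, assuming an undirected, unweighted network with a `*Vertices` section of 1-based ids and quoted labels followed by an `*Edges` section. The function name `to_pajek_net` and the sample ties are hypothetical; for directed ties Pajek uses an `*Arcs` section instead, which this sketch does not cover.

```python
def to_pajek_net(edge_list):
    """Return Pajek .net text for an undirected list of (source, target) labels."""
    # Pajek refers to vertices by 1-based integer ids; assign them in first-seen order.
    ids = {}
    for source, target in edge_list:
        for label in (source, target):
            if label not in ids:
                ids[label] = len(ids) + 1

    lines = [f"*Vertices {len(ids)}"]
    lines += [f'{vertex_id} "{label}"' for label, vertex_id in ids.items()]
    lines.append("*Edges")
    lines += [f"{ids[source]} {ids[target]}" for source, target in edge_list]
    return "\n".join(lines) + "\n"


# Hypothetical sample ties (illustration only).
net_text = to_pajek_net([("Alice", "University A"), ("Bob", "University A")])
print(net_text)
```

The returned string can be written to a file with a `.net` extension and opened in Pajek.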
6th Mar, 2020
Nick Jorgensen
N/A
I used to be a HUGE fan of Gephi -- it was easy to use, had a pretty intuitive UI, generated all sorts of handy SNA metrics, etc. However, it has not worked properly with Windows 10 for years (check the various user forums if you don't believe me). I've tried every suggested fix (most of which have to do with making sure that the Java home path is specified correctly)--edit the config file, uninstall and reinstall, clear out the temp folder, etc., etc. Nothing has worked, and this has been an ongoing issue for at least four years if not longer. NodeXL is an *okay* workaround, but the free version is more limited in terms of metrics (won't calculate modularity class, for example) and the graphical output is nowhere near as nice as Gephi. BLERG.
9th May, 2020
Byron Bignell
Nepal Community Development Foundation
We've been using Gephi, when it isn't crashing for no reason - we're looking for an alternative - might just have to go back to python & coding it all.
23rd May, 2020
Sourav Mukherjee
Indian Institute of Technology Bombay
Hi Sara, I used Cytoscape for network analysis. It is easy to use and a good starting point for first-time users.
Best wishes.
24th Oct, 2020
Luis Grochocki
Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira
My favorites are Gephi (easy and beautiful graphs), iGraph (vast number of resources, good for larger networks), Statnet (great option if you want to run Exponential Random Graph Models - ERGMs), Pajek and Ucinet (good for smaller networks, but their graphs are not as pretty as the previous options).
