Machine Learning/Kaggle Social Network Contest/load data: Difference between revisions
Revision as of 23:19, 18 November 2010
How to load the network into networkx
networkx is a network analysis package for Python. It can be installed using easy_install.
The network can be loaded either with the read_edgelist function in networkx (Method 1) or by reading the CSV file and adding the edges manually (Method 2).
Method 1
import networkx as nx
DG = nx.read_edgelist('social_train.csv', create_using=nx.DiGraph(), nodetype=int, delimiter=',')
Method 2
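import networkx as nx
import csv
import time

# time.clock() was removed in Python 3.8; perf_counter() is the replacement.
t0 = time.perf_counter()
DG = nx.DiGraph()
# Each row of the CSV is one directed edge: source,target
with open('social_train.csv', newline='') as f:
    netcsv = csv.reader(f, delimiter=',')
    for row in netcsv:
        tmp1 = int(row[0])
        tmp2 = int(row[1])
        DG.add_edge(tmp1, tmp2)
print("Loaded in", time.perf_counter() - t0, "s")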
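As a quick sanity check that the two methods build the same graph, here is a minimal sketch on a tiny made-up edge list. The sample data, the temporary file, and the helper names load_with_read_edgelist and load_with_csv are assumptions for illustration only; the contest file social_train.csv is not needed to run it.

```python
import csv
import os
import tempfile

import networkx as nx

# Hypothetical sample data: each line is "source,target".
SAMPLE = "1,2\n1,3\n2,3\n3,1\n"


def load_with_read_edgelist(path):
    # Method 1: let networkx parse the file.
    return nx.read_edgelist(path, create_using=nx.DiGraph(),
                            nodetype=int, delimiter=',')


def load_with_csv(path):
    # Method 2: parse with the csv module and add edges one at a time.
    graph = nx.DiGraph()
    with open(path, newline='') as f:
        for row in csv.reader(f, delimiter=','):
            graph.add_edge(int(row[0]), int(row[1]))
    return graph


# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(SAMPLE)
    sample_path = tmp.name

g1 = load_with_read_edgelist(sample_path)
g2 = load_with_csv(sample_path)
os.remove(sample_path)

# Both graphs should contain exactly the same directed edges.
print(sorted(g1.edges()) == sorted(g2.edges()))
```

Running the same comparison on a small slice of the real file is a cheap way to confirm that switching methods for speed does not change the resulting graph.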
| Rows loaded | 1M | 2M | 3M |
|---|---|---|---|
| Method 1 | 20s | 53s | 103s |
| Method 2 | 15s | 41s | 86s |
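To make the comparison concrete, the timings in the table can be turned into throughput and a relative speedup. This is a small sketch that uses only the numbers from the table; the variable names are illustrative.

```python
# Timings in seconds, copied from the table above.
rows = [1_000_000, 2_000_000, 3_000_000]
method1 = [20, 53, 103]
method2 = [15, 41, 86]

speedups = []
for n, t1, t2 in zip(rows, method1, method2):
    speedup = t1 / t2  # how many times faster Method 2 is than Method 1
    speedups.append(speedup)
    print(f"{n:>9,} rows: Method 1 {n / t1:,.0f} rows/s, "
          f"Method 2 {n / t2:,.0f} rows/s, speedup {speedup:.2f}x")
```

On these measurements Method 2 is roughly 1.2 to 1.3 times faster than Method 1, and the advantage shrinks as the number of rows grows.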