Originally published at: https://developer.nvidia.com/blog/accelerated-production-ready-graph-analytics-for-networkx-users/
NetworkX is a popular, easy-to-use Python library for graph analytics. However, its performance and scalability may be unsatisfactory for medium-to-large-sized networks, which can significantly hinder user productivity. NVIDIA and ArangoDB have collectively addressed these performance and scaling issues with a solution that requires zero code changes to NetworkX. This solution integrates three main components: The…
The NVIDIA and ArangoDB teams put in a lot of effort to make this blog as clear and useful to you as possible. Please let us know if you have any questions, comments, or feedback.
Hi Team,
Following the directions, I ran the code from the Accelerated, Production-Ready Graph Analytics for NetworkX Users blog, but I hit an error in the second code block.
Median Time: 90 seconds
import pandas as pd
import networkx as nx

# Read into Pandas
pandas_edgelist = pd.read_csv(
    "cit-Patents.txt",
    skiprows=4,
    delimiter="\t",
    names=["src", "dst"],
    dtype={"src": "int32", "dst": "int32"},
)

# Create NetworkX Graph from Edgelist
G_nx = nx.from_pandas_edgelist(
    pandas_edgelist, source="src", target="dst", create_using=nx.DiGraph
)
Got an error.
ValueError Traceback (most recent call last)
in <cell line: 7>()
5
6 # Read into Pandas
----> 7 pandas_edgelist = pd.read_csv(
8 "cit-Patents.txt",
9 skiprows=4,
3 frames
/usr/local/lib/python3.10/dist-packages/pandas/io/parsers/c_parser_wrapper.py in read(self, nrows)
232 try:
233 if self.low_memory:
--> 234 chunks = self._reader.read_low_memory(nrows)
235 # destructive to chunks
236 data = _concatenate_chunks(chunks)
parsers.pyx in pandas._libs.parsers.TextReader.read_low_memory()
parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
ValueError: Integer column has NA values in column 1
I downloaded the cit-Patents.txt file correctly, but it didn't work. Could you please help me out?
Regards,
Haroon
Hi Haroon!
Thanks for reaching out.
Interesting find. The stack trace you’ve shared shows that we’re hitting the following lines in pandas: pandas/pandas/io/parsers/c_parser_wrapper.py at main · pandas-dev/pandas · GitHub.
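The error message says that pandas found a missing value in column 1 (`dst`) while the call forces that column to `int32`. One possible cause, which this thread does not confirm, is an incomplete or altered copy of `cit-Patents.txt`. As a minimal sketch, the standard-library helper below scans the file for rows that are not two tab-separated integers. The name `find_malformed_rows` and its `limit` parameter are hypothetical and introduced only for illustration; `skiprows=4` and the tab delimiter mirror the `read_csv` call above.

```python
def _is_int(text):
    """Return True if `text` parses as a base-10 integer."""
    try:
        int(text)
        return True
    except ValueError:
        return False


def find_malformed_rows(path, skiprows=4, delimiter="\t", limit=10):
    """Hypothetical helper: return up to `limit` (line_number, raw_line) pairs
    for rows that are not exactly two integer fields."""
    bad = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if line_number <= skiprows:
                continue  # header lines, skipped as in the read_csv call
            raw = line.rstrip("\r\n")
            if raw == "":
                continue  # empty lines are ignored, as pandas does by default
            fields = raw.split(delimiter)
            if len(fields) != 2 or not all(_is_int(f) for f in fields):
                bad.append((line_number, raw))
                if len(bad) >= limit:
                    break
    return bad
```

Calling `find_malformed_rows("cit-Patents.txt")` returns an empty list when every data row is well formed. Any reported line numbers point at the rows worth inspecting, and a malformed final row would be consistent with a download that stopped early.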
Some starter questions for you:
- What is your version of networkx & pandas?
- How much memory do you have on your machine?
- What is the OS of your machine?
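The three answers above can be collected in one go. This is a rough sketch using only the Python standard library; the function names are hypothetical, and the memory lookup relies on `os.sysconf`, which is assumed to be available (Linux and most Unix-like systems) and otherwise falls back to `None`.

```python
import os
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or a placeholder if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def total_memory_gib():
    """Return total physical memory in GiB, or None where sysconf is unavailable."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    return page_size * page_count / 2**30


def environment_report():
    """Gather the details asked for in the questions above."""
    return {
        "networkx": package_version("networkx"),
        "pandas": package_version("pandas"),
        "memory_gib": total_memory_gib(),
        "os": platform.platform(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into a reply covers the library versions, available memory, and operating system in a single step.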
I’ve just confirmed that these lines run without error on a Google Colab CPU instance. You can check it out here: Google Colab
Happy to investigate further, just let me know.
Anthony
Hi Anthony,
Thanks for your detailed reply and the Google Colab notebook. My issue has been resolved. Going forward, I will use a GPU and Colab Enterprise to run the code.
Best Regards,
Haroon