I want to move to a graph analytics platform. I'm trying to pick between MapD, Blazegraph, and some of those very cool NoSQL graph databases like ArangoDB/Neo4j.
I'd like to keep track of various cost functions over all simple paths of a directed graph with fixed topology, in real time. My edge weights are updated ~100 times/second. My graphs are small (5-10 nodes) but densely connected: in total there are anywhere between 5,000 and 20,000 simple paths and cycles. Right now I'm using RethinkDB (a NoSQL database) and Python. I precompute all possible simple paths, store them in a hash map, and brute-force recalculate them every time an edge weight is updated. The hash map points a given edge (whose weight has just been updated) to all the simple paths and cycles it is part of; I then go and recalculate all of those.
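To make the setup concrete, here's a minimal sketch of what I'm doing (illustrative names, not my actual code; path enumeration by brute-force permutation checking is only feasible because the graphs are tiny):

```python
from itertools import permutations

def all_simple_paths(nodes, edges):
    """Enumerate all simple paths (>= 1 edge) over a fixed edge set.
    O(n!) enumeration -- fine only because n is 5-10."""
    paths = []
    for r in range(2, len(nodes) + 1):
        for perm in permutations(nodes, r):
            hops = list(zip(perm, perm[1:]))
            if all(h in edges for h in hops):
                paths.append(perm)
    return paths

def build_edge_index(paths):
    """Hash map: each edge -> the paths that traverse it."""
    index = {}
    for p in paths:
        for edge in zip(p, p[1:]):
            index.setdefault(edge, []).append(p)
    return index

def on_weight_update(edge, weights, index, cost):
    """Brute force: recompute the cost of every path containing `edge`."""
    return {p: cost(p, weights) for p in index.get(edge, [])}

def path_sum(path, weights):
    """One example cost function: sum of edge weights along the path."""
    return sum(weights[e] for e in zip(path, path[1:]))
```

At 100 edge updates/second, with thousands of paths per edge, this full recomputation is the part that doesn't scale.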
Well, this is very slow and doesn’t scale!
The cost functions themselves are commutative/associative (sums and products), but I'm struggling to work out what kind of algorithm I should use to compute them efficiently in parallel. The gather-apply-scatter (GAS) approach feels like the right direction, except that I'm not sure how I'd actually use it to keep track only of simple paths.
Can someone help me understand how this problem has been solved before, and how I might find more information about doing this on a GPU graph analytics platform? Is MapD suited to this kind of thing? Is GAS the right approach? How would I google this problem effectively?