Tuesday, June 20, 2017

Neo4j - Motivation for Graph Databases

I'll start the story very simply. Suppose you're asked to find:
"Which friend of a given person has the most mutual friends with that person?"
Simply put, you have to go through each and every friend of that person, count the number of mutual friends they share, and then sort the counts to find the maximum, right? Simple?

What is the RDBMS solution for this?

  1. Have a People table.
  2. Have a Relationship table.

Now, what is the query if all you have is that person's name? Take a whole day, and I hope you solve it. I know you're an SQL expert, but anyway, once you solve it, please come back to this blog, check the much simpler solution, and die reading the following three reasons:
  1. Even though you've solved it, it has taken a lot of time.
  2. It's not human readable.
  3. Most importantly, it tends to run slowly, because every extra hop between people is another join.
Where did things go wrong, and why?

Even though relational databases are called relational, relationships are hard to handle in them because the table structure is so rigid. A column in a row holds a single value, so one tuple cannot simply list several related items. Even to handle a one-to-many relationship, you MUST store the id again as a foreign key in some other table. If it's a many-to-many relationship, there is no solution without a separate join table that exists only to hold pairs of ids.
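To make the two-table design concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample people are my own assumptions for illustration, and this is just one way to write the query, not necessarily the one you would come up with. Notice how the friendship table has to be joined three times to answer the question.

```python
import sqlite3

# Hypothetical schema and data: all names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Join table: one row per friendship, stored once with person_a < person_b.
    CREATE TABLE friendship (
        person_a INTEGER NOT NULL REFERENCES people(id),
        person_b INTEGER NOT NULL REFERENCES people(id),
        PRIMARY KEY (person_a, person_b)
    );
""")

names = ["Danu", "Ann", "Bob", "Cara", "Dev"]
conn.executemany("INSERT INTO people (name) VALUES (?)", [(n,) for n in names])
ids = {name: pid for pid, name in conn.execute("SELECT id, name FROM people")}

pairs = [("Danu", "Ann"), ("Danu", "Bob"), ("Danu", "Cara"),
         ("Ann", "Bob"), ("Ann", "Cara"), ("Bob", "Dev")]
conn.executemany(
    "INSERT INTO friendship VALUES (?, ?)",
    [tuple(sorted((ids[a], ids[b]))) for a, b in pairs],
)

QUERY = """
WITH edges AS (              -- view each friendship from both ends
    SELECT person_a AS src, person_b AS dst FROM friendship
    UNION ALL
    SELECT person_b, person_a FROM friendship
)
SELECT friend.name, COUNT(*) AS mutual
FROM people AS me
JOIN edges AS e1 ON e1.src = me.id                       -- me -> mutual friend
JOIN edges AS e2 ON e2.src = e1.dst                      -- mutual friend -> candidate
JOIN edges AS e3 ON e3.src = me.id AND e3.dst = e2.dst   -- candidate is my friend too
JOIN people AS friend ON friend.id = e2.dst
WHERE me.name = ?
GROUP BY friend.id
ORDER BY mutual DESC, friend.name
"""

def friends_by_mutual_count(name):
    """Return (friend_name, mutual_friend_count) rows, highest count first."""
    return conn.execute(QUERY, (name,)).fetchall()
```

Calling friends_by_mutual_count("Danu") on this sample data ranks Danu's friends by how many friends they share with Danu. It works, but the intent is buried under the joins.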

The problem is that a table structure offers very limited flexibility. That's why people started looking for a simpler solution instead of wrestling with complicated queries every day for simple real-world scenarios. So, here come graph databases.

Graphs and Graph Databases


What is a graph? A graph is simply a set of nodes that relate to each other. Those nodes can be anything: people, sports, foods, phones, etc. And a relationship is simply a connection between two nodes. Simple? Yes, it is actually going to be simple this time.
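If you like to think in code, here is a minimal sketch of that idea in plain Python: nodes are just names, and each relationship connects two of them. The sample names and the build_graph helper are hypothetical, and this only illustrates the data structure, not how Neo4j stores things internally.

```python
from collections import defaultdict

# Hypothetical sample data; the names are illustrative only.
friendships = [("Danu", "Ann"), ("Danu", "Bob"), ("Ann", "Bob"), ("Bob", "Dev")]

def build_graph(edges):
    """Return {node: set of neighbours} for an undirected graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)  # a relationship links two nodes...
        graph[b].add(a)  # ...and an undirected one is visible from both ends
    return dict(graph)

graph = build_graph(friendships)
```

Here graph["Bob"] gives the set of everyone Bob is connected to, with no join table in sight.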

Before anything else, I'll explain everything using the question I raised at the beginning of this blog. Maybe you took two weeks to solve it with an RDBMS and have only now come back. It's alright, you're still not too late, since you've come back from your legacy shit.. lol

Here is the simple people graph for the problem above, and of course, it takes only one graph to cover all that complicated shit!


It looks like a coordination complex, but this is not chemistry, so it's a graph. Is this a complex graph? Nope, it's very simple. Just by looking at this graph, you can find all the mutual friends of any two people yourself and count them.

How amazing would it be if the query were just as straightforward, without a complicated set of logic in the middle, unlike the weeks you spent solving it with an RDBMS?

Yes, and it's not magic. Here is the query to find the friend of "Danu" with the most mutual friends.

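// p = Danu, f = a mutual friend, r = a friend of Danu who also knows f
MATCH (p:Person {name: "Danu"})-[:friend]-(f:Person)-[:friend]-(r:Person)
MATCH (p)-[:friend]-(r)
// count mutual friends per r, highest first (append LIMIT 1 to keep only the top friend)
RETURN r, count(f) AS c ORDER BY c DESC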

Difficult? Nope, it's very simple.
  1. p, f and r are variables that name nodes so they can be referred to again in the same query; c is an alias for the count.
  2. () is a node, {} holds properties, and [] is a relationship.
  3. -- depicts the connection between nodes through a relationship; adding an arrowhead (--> or <--) gives it a direction (I am not using direction in this example).
So, the meaning of the above query, which you can pretty much guess just by reading it, is:
"From Danu's friends of friends, keep the ones who are also Danu's friends, count the intermediate friends between each of them and Danu, and sort in descending order."
Hope you all enjoyed!
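One more thing for the curious: if you want to check the logic of the query by hand, here is a rough Python sketch that walks the same pattern over a small in-memory graph. The sample data and the function name are hypothetical, and this only mirrors the idea of the query; it is not how Neo4j actually executes it.

```python
# Hypothetical friendship graph as {person: set of friends}; data is illustrative.
graph = {
    "Danu": {"Ann", "Bob", "Cara"},
    "Ann": {"Danu", "Bob", "Cara"},
    "Bob": {"Danu", "Ann", "Dev"},
    "Cara": {"Danu", "Ann"},
    "Dev": {"Bob"},
}

def friends_by_mutual_count(graph, person):
    """List (friend, mutual_count) pairs for `person`, highest count first."""
    mine = graph[person]
    counts = []
    for friend in mine:                # r: a direct friend of p
        mutual = mine & graph[friend]  # f: friends that p and r share
        if mutual:                     # the pattern needs at least one f to match
            counts.append((friend, len(mutual)))
    # Sort by count descending; ties are broken by name to keep the order stable.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))

ranking = friends_by_mutual_count(graph, "Danu")
```

The first entry of ranking is the friend who shares the most mutual friends with Danu in this sample data.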

References:
  1. https://neo4j.com/developer/get-started/
  2. http://neo4j.com/docs/developer-manual/current/cypher/
  3. https://youtu.be/UJ81zWBMguc?list=PLAWPhrZnH759YHRieMBzsQRvr56JcYx5l