Capabilities of the Neo4j Graph Database with Real-life Examples
- Jul 05, 2018
A database is an integral part of any application.
Not only does a database store information, it also impacts the overall performance of software. So selecting a database suitable for your project is crucial. Lots of applications rely on a relational database such as MySQL or PostgreSQL. Despite the many advantages of relational databases, however, they aren’t efficient at coping with ever-growing amounts of connected data.
To handle a growing volume of connected data, you can go for Neo4j, a non-relational graph database that’s optimized for managing relationships. The Neo4j database can help you build high-performance and scalable applications that use large volumes of connected data.
Many software developers know little about the capabilities of graph databases and Neo4j in particular. In this article, we explain the essence of this graph database, show when you can use it, and give examples of how to implement Neo4j in your project.
Neo4j database: Concepts and principles
Before taking an in-depth look at how to implement the Neo4j database in a real project, you should clearly understand how this technology works, what business purposes you can use it for, and what differentiates Neo4j from other databases.
Graph databases are the best solution for handling connected data
If you’ve worked only with relational databases in your career as a developer, you might be asking whether there’s any point in going for a non-relational model. Everything seems clear and familiar in the relational databases you’re used to, doesn’t it? Yet relational databases have several substantial drawbacks:
- Volume limitations − Relational data stores aren’t optimized to handle very large amounts of highly connected data.
- Velocity − The performance of relational stores suffers when they need to deal with huge numbers of read/write operations.
- Lack of relationships − Relational data stores don’t store relationships as first-class data; they can only express standard one-to-one, one-to-many, and many-to-many associations through foreign keys and join tables.
- Variety − Relational databases lack flexibility when dealing with data that can’t be described by a fixed database schema. They also aren’t efficient at handling large binary objects and semi-structured data such as JSON and XML.
- Scalability − Relational data stores are hard to scale horizontally.
To overcome these limitations, a number of non-relational databases have been created. However, most of them don’t store relationships either: they associate pieces of data with each other through references (much like foreign keys in the relational model). References make it difficult to query data, particularly connected data, because they describe relationships between entities only indirectly.
Unlike other data storage and management technologies, graph databases are built around relationships and store data that is already connected. That’s why graph databases prove the most efficient for handling large amounts of connected data.
Neo4j as a graph database
Graph databases are based on mathematical graph theory. Graphs are structures that contain vertices (which represent entities, such as people or things) and edges (which represent connections between vertices). Edges can have numerical values called weights.
This structure enables developers to model any scenario defined by relationships. For instance, a graph database allows you to model a social network where nodes are users and relationships are connections between them. Or you can build a road network where vertices are cities, towns, or villages, while edges are roads that connect them with weights indicating distances.
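To make weighted edges concrete, here is a minimal Python sketch of the road-network idea: an adjacency list in which every edge carries a distance, plus Dijkstra’s algorithm to find the shortest route. The town names and distances are hypothetical, and this is plain graph theory rather than Neo4j’s storage format.

```python
import heapq

# Hypothetical road network: town -> list of (neighboring town, distance in km).
# Roads are two-way, so each road appears under both of its towns.
ROADS = {
    "Alton": [("Brook", 20), ("Cove", 45)],
    "Brook": [("Alton", 20), ("Cove", 15), ("Dale", 60)],
    "Cove": [("Alton", 45), ("Brook", 15), ("Dale", 30)],
    "Dale": [("Brook", 60), ("Cove", 30)],
}


def shortest_distance(graph, start, goal):
    """Return the smallest total edge weight from start to goal (Dijkstra)."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        dist, town = heapq.heappop(queue)
        if town == goal:
            return dist
        if dist > best.get(town, float("inf")):
            continue  # stale queue entry: a shorter path was already found
        for neighbor, weight in graph.get(town, []):
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None  # goal is unreachable


print(shortest_distance(ROADS, "Alton", "Dale"))  # 65
```

The result is 65 because the route through Brook and Cove (20 + 15 + 30) beats both direct roads into Dale.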
Neo4j provides its own implementation of graph theory concepts. Let’s take an in-depth look at the Labeled Property Graph Model in the Neo4j database. It has the following components:
- Nodes (equivalent to vertices in graph theory). These are the main data elements that are interconnected through relationships. A node can have one or more labels (that describe its role) and properties (i.e. attributes).
- Relationships (equivalent to edges in graph theory). A relationship connects two nodes and always has a type and a direction; a node, in turn, can have many relationships. Relationships can also have properties.
- Labels. These are used to group nodes, and each node can be assigned multiple labels. Labels are indexed to speed up finding nodes in a graph.
- Properties. These are attributes of both nodes and relationships. Neo4j stores properties as key-value pairs, where values can be strings, numbers, booleans, or other supported types (for example, arrays of these values).
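The four building blocks above can be sketched as a toy in-memory model in Python. This is only an illustration under our own assumptions (the class and method names are hypothetical, and Neo4j’s storage engine works differently), but it shows how labels group nodes, how a label lookup can go through an index, and how both nodes and relationships carry key-value properties.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set                        # e.g. {"Person"}; a node may have several
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str                      # a relationship has exactly one type
    start_id: int                      # relationships are directed: start -> end
    end_id: int
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy labeled property graph (hypothetical; not Neo4j's API)."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []
        self.label_index = defaultdict(set)  # label -> ids of nodes carrying it

    def add_node(self, *labels, **properties):
        node = Node(len(self.nodes), set(labels), properties)
        self.nodes[node.node_id] = node
        for label in labels:
            self.label_index[label].add(node.node_id)
        return node

    def add_relationship(self, start, rel_type, end, **properties):
        rel = Relationship(rel_type, start.node_id, end.node_id, properties)
        self.relationships.append(rel)
        return rel

    def nodes_with_label(self, label):
        # Uses the label index instead of scanning every node.
        return [self.nodes[i] for i in sorted(self.label_index[label])]


graph = PropertyGraph()
dana = graph.add_node("Person", name="Dana")
eli = graph.add_node("Person", "Employee", name="Eli", age=34)
graph.add_relationship(dana, "KNOWS", eli, since=2015)

print([n.properties["name"] for n in graph.nodes_with_label("Person")])  # ['Dana', 'Eli']
```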
The graph data structure might seem unusual, but it’s simple and natural. Here’s an example of a simple graph data model in Neo4j:
As you can see, this graph contains two nodes (Alice and Bob) that are connected by relationships. Both nodes share the same label, Person. In the graph, only Bob’s node has properties, but in Neo4j every node and relationship can have properties.
A graph model is intuitive and easy for people to interpret. After all, the human brain doesn’t think in terms of tables and rows but in terms of abstract objects and connections. In fact, anything you can draw on a blackboard can be displayed with a graph.
How Neo4j compares to relational and other NoSQL databases
Having learned about the graph data model and the Neo4j database, you’re probably wondering how this data store differs from relational data stores. And although Neo4j belongs to the category of NoSQL tools, it’s quite different from other NoSQL databases.
So let’s briefly compare Neo4j to relational and other non-relational databases:
| | Neo4j | Relational databases | NoSQL databases |
|---|---|---|---|
| Data storage | Graph storage structure | Fixed, predefined tables with rows and columns | Connected data not supported at the database level |
| Data modeling | Flexible data model | Database model must be developed from a logical model | Not suitable for enterprise architectures |
| Query performance | Great performance regardless of number and depth of connections | Data processing speed slows with growing number of joins | Relationships must be created at the application level |
| Query language | Cypher: native graph query language | SQL: complexity grows as the number of joins increases | Different languages are used but none is tailored to express relationships |
| Transaction support | Retains ACID transactions | ACID transaction support | BASE transactions prove unreliable for data relationships |
| Processing at scale | Inherently scalable for pattern-based queries | Scales through replication, but it’s costly | Scalable, but data integrity isn’t trustworthy |
Advantages of Neo4j databases
Designed specifically to deal with huge amounts of connected data, the Neo4j database provides the following advantages:
- Performance − In relational databases, performance suffers as the number and depth of relationships increase. In graph databases like Neo4j, performance remains high even if the total amount of data grows significantly, because a query only touches the part of the graph it traverses.
- Flexibility − The structure and schema of a graph model can be easily adjusted to changes in an application, and you can extend the data structure without damaging existing functionality.
- Agility − The structure of a Neo4j database is easy to upgrade, so the data store can evolve along with your application.
Neo4j database use cases
Now that you know how a Neo4j database works, you’re probably wondering what you can use this data store technology for. It might seem that graph databases can be applied to solve any problem, but that isn’t quite the case. Just like any technology, Neo4j should be used when it’s suitable.
Let’s take a look at several Neo4j database use cases:
Fraud detection and analytics
Businesses lose billions of dollars every year because of fraud. Despite extensive fraud prevention methods, fraudsters come up with increasingly sophisticated ways to steal money and identities. Thanks to its graph data model, a Neo4j database allows you to enhance your application’s fraud detection capabilities and detect financial crimes such as credit card fraud, ecommerce fraud, and money laundering.
Network and database infrastructure monitoring
As the complexity of your network and IT infrastructure grows, you need a more powerful configuration management database than a relational database can provide. The Neo4j graph database allows you to connect your network, data center, and IT assets in order to get important insights into the relationships between different operations within your network. For example, Neo4j can help you manage dependencies and monitor microservices.
Recommendation engines
It’s hard to find an online business that doesn’t use a recommendation engine to suggest relevant products or services to customers. A good recommendation engine should correlate a lot of data and be able to quickly detect new interests shown by clients. Because it focuses on entities and the relationships between them, a Neo4j database handles recommendations well, often outperforming relational and other non-relational databases.
Social networks
Social networks are about connections between people, so they are graph structures by nature. Graph databases like Neo4j are therefore a natural fit for social networks: they speed up the development of social network applications, enhance an app’s overall performance, and allow you to better understand your data.
Contextual search
As your business grows, it requires a more powerful contextual search solution. Neo4j can enhance your application’s search capabilities to deliver relevant results. The graph data model can improve simple keyword search by also returning results related to the keywords.
Identity and access management
Managing constantly changing roles, groups, and identities can be a complex task for businesses. A graph database like Neo4j allows you to monitor identity and access authorizations.
Privacy and risk compliance
Neo4j facilitates personal data storage and management: it allows you to track where private information is stored and which systems, applications, and users access it. The graph data model helps visualize personal data and allows for data analysis and pattern detection. Neo4j also comes in handy for financial risk reporting and compliance.
Master data management
To deliver the most pleasant customer experience, businesses need to analyze lots of data. Graph databases help to unify master data, such as information about customers, products, suppliers, and logistics. Neo4j allows you to organize master data and model it in a graph, revealing connections and relationships. Neo4j can provide important insights so that you can make relevant business decisions.
Building an email targeting system with Neo4j
Now that you know what the Neo4j database is and what opportunities it provides to businesses, you’re ready to take a look at a real-life example of how you can apply this data storage technology. We’ve decided to build a simple email targeting system with a Neo4j database, as an email targeting system is an important feature for lots of online businesses, such as online stores and marketplaces.
Our email targeting system will help analyze customer behavior and decide which offers to target audiences with. Thanks to this targeting system, businesses can offer relevant products to people and, therefore, increase conversions and contribute to overall customer satisfaction (since people expect to receive relevant offers).
Step #1: Installing Neo4j
For our sample email targeting system, we only need to download and install Neo4j Server. We could use Neo4j Desktop, but it contains extra functions, most of which we don’t need.
Installing Neo4j Server is quite simple. We’re going to use Neo4j 3.4.1 Community Edition for Ubuntu.
Step #2: Launching the Neo4j Browser
After installing Neo4j Server, start it with the command <NEO4J_HOME>/bin/neo4j start, where NEO4J_HOME is the top-level installation directory. Then you can open the interactive console called Neo4j Browser, which is installed by default with Neo4j Server: go to http://localhost:7474/browser/ in your web browser and sign in with the default login and password (neo4j for both).
Once you’ve signed in, change the password. Then sign in with your new password to establish a connection with the database server.
The Neo4j Browser has an interactive console with a number of commands (:play start, :play concepts, and :play cypher). You can take a training tour and learn more about how to use the Neo4j database, check out sample graphs (such as the Movie Graph), and examine the state of the active database.
Now you have the toolkit for building an email targeting system.
Step #3: Data modeling
Before you start modeling your data, spend some time analyzing the business purpose of your email targeting system. Such systems are used to offer the most relevant products or services to customers, so marketers and analysts need to monitor customer behavior to launch efficient email marketing campaigns.
Our email targeting system is going to have the following entities (with attributes in parentheses):
- Category (title)
- Product (title, description, price, availability, shippability)
- Customer (name, email, registration date)
- Promotional Offer (type, content)
In our graph database, each of these entities will be represented by nodes with the corresponding label. All entities will be connected via relationships (for the sake of simplicity, we’re only going to consider relationships between two entities). Note that we use singular names for all entities, even though a node can take part in one-to-many connections like those commonly used in relational databases. Also, just like entities, some relationships in our model are going to have properties, which we write in parentheses.
So here’s what we’ve got:
- Product is_in Category
- Customer added_to_wish_list Product
- Customer bought Product
- Customer viewed (clicks_count) Product (it doesn’t matter where the information about clicks comes from; let’s just assume we have this data)
- Promotional Offer used_to_promote Product
Note that in Neo4j, there’s no need to model bidirectional relationships (such as Product is_in Category and Category has_many Product). Graph databases allow us to follow edges in both directions.
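Because a relationship can be followed from either end, the model needs only one stored edge per connection. The minimal Python sketch below illustrates this with hypothetical sample data (all product, customer, and offer names are placeholders) and relationship types written in upper case, as in the Cypher queries later in this article. It scans a plain list, which is far simpler than how Neo4j actually navigates relationships.

```python
from collections import namedtuple

# Each relationship is stored once, in one direction: (start, TYPE, end, properties).
Rel = namedtuple("Rel", "start rel_type end props")

# Hypothetical sample data following the model above.
relationships = [
    Rel("Laptop X", "IS_IN", "Notebooks", {}),
    Rel("Laptop Y", "IS_IN", "Notebooks", {}),
    Rel("Phone Z", "IS_IN", "Smartphones", {}),
    Rel("Customer A", "VIEWED", "Laptop X", {"clicks_count": 3}),
    Rel("Customer A", "BOUGHT", "Phone Z", {}),
    Rel("Customer B", "ADDED_TO_WISH_LIST", "Laptop Y", {}),
    Rel("Offer 1", "USED_TO_PROMOTE", "Laptop Y", {}),
]


def outgoing(node, rel_type):
    """Follow relationships of rel_type in their stored direction."""
    return [r.end for r in relationships if r.start == node and r.rel_type == rel_type]


def incoming(node, rel_type):
    """Follow the same relationships backwards; no reverse edge is stored."""
    return [r.start for r in relationships if r.end == node and r.rel_type == rel_type]


print(outgoing("Laptop X", "IS_IN"))   # ['Notebooks']
print(incoming("Notebooks", "IS_IN"))  # ['Laptop X', 'Laptop Y']
```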
And… that’s it.
Modeling entities and relationships in a graph database is that simple and intuitive because we don’t need to switch from a logical model (how entities are connected from the perspective of the task we need to solve) to a physical model (how we store data in our database). It’s also easy to add new entities and relationships, or modify and delete existing ones, without bothering with foreign keys (as in relational databases) or links (as in NoSQL databases).
That’s an amazing advantage of graph databases.
Step #4: Working with the database
Now it’s time to fill our Neo4j database according to the model we defined in the previous step. There are two ways to do it:
- Use Gremlin, a domain-specific language created specifically for graph traversal and originally implemented in Groovy. Though Gremlin is concise and has a narrow focus, it’s overly mathematical (as it uses concepts from graph theory). In the Neo4j ecosystem, Gremlin is considered somewhat outdated and has largely been replaced by Cypher.
- Use Cypher, a declarative language like SQL that has distinctive semantics and allows you to write flexible and easy-to-read queries. Cypher syntax emphasizes directions in relationships between entities. Recently, Cypher became an open source project that’s maintained and upgraded by a community of contributors.
Needless to say, we’re going to use Cypher to work with the Neo4j database.
For the sake of convenience, we’re going to add nodes and relationships step by step. First, let’s introduce Categories and Products:
Now we should add customers and establish relationships between them and the products in our database (this part is a continuation of the previous query):
Now the database contains all necessary entities and relationships. As you can see, Cypher is so declarative that you can guess exactly what every piece of code does.
To visualize the graph, execute the MATCH (n) RETURN n query, which returns all nodes in our graph. If everything is correct, you’ll get this graph:
Because queries only traverse the relevant part of the graph, this model should keep working fast even with far bigger datasets.
As you can see, the Neo4j Browser allows you not only to create a graph but also to visualize data; this is really helpful when it comes to creating an efficient email targeting campaign. Neo4j helps you model your data and gain valuable insights.
Now it’s time to apply our data to a real-life business problem.
Example #1: Using Neo4j to determine customer preferences
Suppose we need to learn our customers’ preferences to create a promotional offer for a specific product category, such as notebooks. First, Neo4j allows us to quickly obtain a list of notebooks that customers have viewed or added to their wish lists. We can use this code to select all such notebooks:
Now that we have a list of notebooks, we can easily include them in a promotional offer. Let’s make a few modifications to the code above:
We can track the changes in the graph with the following query:
There’s no need to link a promotional offer to specific customers: customers are already connected to the products they viewed or added to their wish lists, so the graph lets us reach them from the offer through those products. For example, we can collect emails for a newsletter by analyzing the products in our promotional offer.
When creating a promotional offer, it’s important to know what products customers have viewed or added to their wish lists. We can find out with this query:
This example is simple, and we could have implemented the same functionality in a relational database. But our goal is to show the intuitiveness of Cypher and to demonstrate how simple it is to write queries in Neo4j.
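The logic of this example can also be sketched outside Neo4j. The Python below works on hypothetical in-memory data (all product names, customers, and emails are placeholders) and mirrors the steps described above: pick the notebooks that customers viewed or wish-listed, attach them to an offer, then walk from the offer’s products back to interested customers to collect newsletter emails. It illustrates the traversal only; it is not a translation of the article’s Cypher.

```python
# Hypothetical sample data; names and emails are placeholders.
category_of = {
    "Notebook A": "Notebooks",
    "Notebook B": "Notebooks",
    "Notebook C": "Notebooks",
    "Phone Z": "Smartphones",
}
emails = {
    "Customer 1": "c1@example.com",
    "Customer 2": "c2@example.com",
    "Customer 3": "c3@example.com",
}
# (customer, relationship type, product)
interactions = [
    ("Customer 1", "VIEWED", "Notebook A"),
    ("Customer 2", "ADDED_TO_WISH_LIST", "Notebook B"),
    ("Customer 2", "VIEWED", "Notebook A"),
    ("Customer 3", "VIEWED", "Phone Z"),
]
INTEREST = ("VIEWED", "ADDED_TO_WISH_LIST")


def products_of_interest(category):
    """Distinct products in `category` that some customer viewed or wish-listed."""
    return sorted({p for _c, rel, p in interactions
                   if rel in INTEREST and category_of.get(p) == category})


def newsletter_recipients(offer_products):
    """Emails of customers who viewed or wish-listed any promoted product."""
    return sorted({emails[c] for c, rel, p in interactions
                   if rel in INTEREST and p in offer_products})


# (offer)-[:USED_TO_PROMOTE]->(product), kept as a simple list here.
offer = {"type": "discount_offer", "products": products_of_interest("Notebooks")}
print(offer["products"])                         # ['Notebook A', 'Notebook B']
print(newsletter_recipients(offer["products"]))  # ['c1@example.com', 'c2@example.com']
```

Notebook C never appears because no customer interacted with it, and Customer 3 is left out because their only interaction is with a smartphone.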
Example #2: Using Neo4j to devise promotional offers
Now let’s imagine that we need to develop a more efficient promotional campaign. To increase conversion rates, we should offer alternative products to our customers. For example, if a customer shows interest in a certain product but doesn’t buy it, we can create a promotional offer that contains alternative products.
To show how this works, let’s create a promotional offer for a specific customer:
This query searches for products that have no ADDED_TO_WISH_LIST, VIEWED, or BOUGHT relationship with a client named Alex McGyver. Next, we perform the opposite query, which finds all products that Alex McGyver has viewed, added to his wish list, or bought. It’s also crucial to narrow down recommendations, so we make sure that these two queries select products in the same categories. Finally, we specify that only products priced within 20 percent (above or below) of an item Alex has shown interest in should be recommended to the customer.
Now let’s check if this query works correctly.
The product variable is supposed to contain the following items:
- Xiaomi Mi Mix 2 (price: $420.87). Price range for recommendations: from $336.70 to $505.04.
- Sony Xperia XA1 Dual G3112 (price: $229.50). Price range for recommendations: from $183.60 to $275.40.
The free_product variable is expected to have these items:
- Apple iPhone 8 Plus 64GB (price: $874.20)
- Huawei P8 Lite (price: $191.00)
- Samsung Galaxy S8 (price: $784.00)
- Sony Xperia Z22 (price: $765.00)
Note that both product and free_product variables contain items that belong to the same category, which means that the [:IS_IN]->()<-[:IS_IN] constraint has worked.
As you can see, none of the products except for the Huawei P8 Lite fits in the price range for recommendations, so only the P8 Lite will be shown on the recommendations list after the query is executed.
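The price arithmetic above is easy to double-check. This short Python sketch uses the prices listed in this example and a 20 percent tolerance; the function names are our own. It leaves out the category constraint because, as noted above, all of these items already belong to the same category.

```python
# Prices taken from the example above.
alex_products = {
    "Xiaomi Mi Mix 2": 420.87,
    "Sony Xperia XA1 Dual G3112": 229.50,
}
free_products = {
    "Apple iPhone 8 Plus 64GB": 874.20,
    "Huawei P8 Lite": 191.00,
    "Samsung Galaxy S8": 784.00,
    "Sony Xperia Z22": 765.00,
}


def price_window(price, tolerance=0.20):
    """Return (low, high): `tolerance` below and above `price`, rounded to cents."""
    return round(price * (1 - tolerance), 2), round(price * (1 + tolerance), 2)


def replacements(candidates, reference_prices, tolerance=0.20):
    """Candidates whose price falls inside the window of any reference product."""
    windows = [price_window(p, tolerance) for p in reference_prices.values()]
    return sorted(
        name for name, price in candidates.items()
        if any(low <= price <= high for low, high in windows)
    )


for name, price in alex_products.items():
    print(name, price_window(price))
print(replacements(free_products, alex_products))  # ['Huawei P8 Lite']
```

The windows come out as (336.7, 505.04) and (183.6, 275.4), and only the Huawei P8 Lite ($191.00) falls inside one of them.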
Now we can create our promotional offer. It’s going to be different from the previous one (personal_replacement_offer instead of discount_offer), and this time we’re going to store a customer’s email as a property of the USED_TO_PROMOTE relationship as the products contained in the free_product variable aren’t connected to specific customers. Here’s the full code for the promotional offer:
Let’s take a look at the result of this query:
- In the form of a graph
- In the form of a table
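Storing the email on the USED_TO_PROMOTE relationship, rather than linking the offer to a customer node, can be sketched in a few lines of Python. The data structures, the PromotionalOffer label spelling, the offer id, and the email address are all hypothetical; only the offer type, the relationship name, and the recommended product come from this example.

```python
from dataclasses import dataclass, field


@dataclass
class Rel:
    start: str
    rel_type: str
    end: str
    props: dict = field(default_factory=dict)


def create_replacement_offer(graph, offer_id, products, customer_email):
    """Add a personal_replacement_offer node and link it to each product.

    The customer's email lives on each USED_TO_PROMOTE relationship, because
    the recommended products are not connected to that customer.
    """
    graph["nodes"][offer_id] = {
        "labels": {"PromotionalOffer"},  # hypothetical label spelling
        "properties": {"type": "personal_replacement_offer"},
    }
    for product in products:
        graph["rels"].append(
            Rel(offer_id, "USED_TO_PROMOTE", product, {"email": customer_email})
        )


def offer_recipients(graph, offer_id):
    """Read the target emails back from the offer's outgoing relationships."""
    return sorted({r.props["email"] for r in graph["rels"]
                   if r.start == offer_id and r.rel_type == "USED_TO_PROMOTE"})


graph = {"nodes": {}, "rels": []}
create_replacement_offer(graph, "offer-1", ["Huawei P8 Lite"], "alex@example.com")
print(offer_recipients(graph, "offer-1"))  # ['alex@example.com']
```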
Example #3: Building a recommendation system with Neo4j
The Neo4j database proves useful for building a recommendation system.
Imagine we want to recommend products to Alex McGyver according to his interests. Neo4j allows us to easily track the products Alex is interested in and find other customers who have also expressed interest in these products. Afterward, we can check out these customers’ preferences and suggest new products to Alex.
First, let’s take a look at all customers and the products they’ve viewed, added to their wish lists, and bought:
As you can see, Alex has two touch points with other customers: the Sony Xperia XA1 Dual G3112 (purchased by Allison York) and the Nikon D7500 Kit 18–105mm VR (viewed by Joe Baxton). Therefore, in this particular case, our product recommendation system should offer Alex the products that Allison and Joe are interested in, excluding the products Alex is already interested in. We can implement this simple recommendation system with the help of the following query:
We can further improve this recommendation system by adding new conditions, but the takeaway is that Neo4j helps you build such systems quickly and easily.
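The idea behind this recommendation query (find customers who share a product with Alex, then suggest their other interests minus Alex’s own) can be sketched in Python. The shared products come from the example above; “Hypothetical product 1” and “Hypothetical product 2” are placeholders for Allison’s and Joe’s other interests, which aren’t listed here. The sketch ignores relationship types and treats any view, wish-list addition, or purchase as interest.

```python
# Interests = products a customer viewed, added to a wish list, or bought.
# The "Hypothetical product" entries are placeholders, not data from the article.
interests = {
    "Alex McGyver": {
        "Xiaomi Mi Mix 2",
        "Sony Xperia XA1 Dual G3112",
        "Nikon D7500 Kit 18–105mm VR",
    },
    "Allison York": {"Sony Xperia XA1 Dual G3112", "Hypothetical product 1"},
    "Joe Baxton": {"Nikon D7500 Kit 18–105mm VR", "Hypothetical product 2"},
}


def recommend(customer, interests):
    """Products liked by customers with overlapping interests, minus the customer's own."""
    mine = interests[customer]
    # Customers who share at least one product with `customer`.
    similar = [other for other, theirs in interests.items()
               if other != customer and mine & theirs]
    # Everything those customers are interested in, except what `customer` already has.
    suggestions = set().union(*(interests[o] for o in similar)) - mine
    return sorted(suggestions)


print(recommend("Alex McGyver", interests))
# ['Hypothetical product 1', 'Hypothetical product 2']
```

The shared products themselves (the Sony Xperia XA1 and the Nikon D7500) drop out of the result because Alex is already interested in them.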
Step #5: Using Neo4j with Ruby
We’ve shown just some of the capabilities of the Neo4j database, but so far we’ve only used the interactive console. You might be wondering how to work with Neo4j from a real-life application. There are two options:
- REST API
- Drivers and libraries for different programming languages
The list of libraries includes Neo4j.rb, a library for using Neo4j with Ruby applications. Neo4j.rb contains several gems:
- neo4j − an Object-Graph-Mapper (OGM) for Neo4j that tries to follow API conventions that are established by ActiveRecord and therefore known to most Ruby developers.
- neo4j-core − a low-level API that can access both a server and an embedded Neo4j database; this library is automatically included in the neo4j gem.
- neo4j-rake_tasks − a set of rake tasks for starting, stopping, and configuring a Neo4j database in your project; this gem is used by the neo4j-core library.
These gems allow you to easily wrap Cypher code and use it in your Ruby applications.
Modern applications face a challenge of handling large amounts of interconnected data, and you need to pick an efficient technology to cope with it. Neo4j allows you to build applications capable of providing valuable real-time insights into connected data for further analysis and decision-making.