Insurance fraud detection with Neo4j

How graphs and Neo4j can help the insurance industry combat fraud

Insurance fraud emerged together with insurance policies themselves. Ever since, every type of insurance coverage has been accompanied by attempts, with varying degrees of success, to defraud insurers. In times of recession and economic crisis, insurance fraud tends to increase, especially in countries like Italy, where most people fail to perceive it as a crime against the rights and interests of citizens.

In reality, insurance fraud first and foremost damages the honest policyholder who, just as with tax evasion, must bear the economic cost of other people's illicit conduct.

In this kind of economic situation, insurance fraud spreads beyond organized crime, which uses the proceeds to finance other illegal activities.

In the past, fraud detection was left to claims agents who had to rely on a few facts and a large amount of intuition. Now insurance companies have a large amount of data available from underwriting, claims, law enforcement and even other insurers.

Today, companies have to deal with both structured and unstructured data and analyze it quickly to obtain valuable information. Insurance companies should avoid information silos by integrating all their systems. They also need to integrate open data and data sources that are uncommon in the insurance industry, such as social data.

Two important goals are predictive analysis, in order to stop a fraudulent action before it occurs, and the discovery of recurrent patterns and client behaviours in the data.

How a graph database can fit into fraud detection

When we talk about searching for recurrent patterns and behaviours, we are also talking about relationships. When we consider a fraud, we are taking into account different subjects that are linked together in some way. Those relationships can give us valuable insights into our data. When we need to represent entities or subjects and their relationships, the best fit is a graph.

Searching for a pattern in a graph means searching for a subgraph, and recurrent client behaviours can be viewed as subgraphs as well. Discovering “strange” behaviours or anomalous links among subjects requires us to overcome the computational complexity associated with traversing data relationships.

Whether we are building an automated fraud detection system that can detect and prevent fraud as it occurs or we are providing a tool to our analysts to help with manual fraud detection, real-time traversal of a complex and highly interconnected dataset is essential.

Another important aspect to take into account is that traditional fraud analytics looks for outliers, but fraudsters try to act normally to avoid detection. We therefore need to detect fraudulent links by analysing normal behaviours as well.

Why Neo4j?

Neo4j stores interconnected data that is neither purely linear nor purely hierarchical, making it easier to detect links of fraudulent activity regardless of the depth or the shape of the data.

  • Neo4j’s versatile property graph model makes it easier for organizations to evolve fraud detection data models and rules, helping security teams match the pace of ever-advancing fraudsters.
  • Neo4j’s native graph processing engine supports high-performance graph queries on large datasets to enable real-time fraud detection.

The built-in high-availability features of Neo4j help keep your mission-critical fraud detection applications available.

Fraud insurance scenarios

We consider a simple data model with which we want to analyze two common scenarios. 1) Groups of people who are involved in several accidents, with a different role in each accident. 2) Anomalous relationships among the first aid location, the healthcare facility location and the place where the injured people live.

According to our simple model, we have many subjects linked to an accident: drivers, pedestrians, doctors, lawyers and witnesses. People can drive or be passengers of cars. Accidents can involve cars. Lawyers and doctors can be linked to people they work for.
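
The model described above can be sketched as a small property graph. The Python snippet below is only an illustration in plain data structures, not the Neo4j API: the node labels, the relationship names (DRIVES, INVOLVED_IN, WITNESS_OF, REPRESENTS) and the sample people are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical nodes: id -> (label, properties).
nodes = {
    "p1": ("Person", {"name": "Anna"}),
    "p2": ("Person", {"name": "Bruno"}),
    "l1": ("Lawyer", {"name": "Carla"}),
    "c1": ("Car", {"plate": "AA000AA"}),
    "a1": ("Accident", {"city": "Milan"}),
}

# Hypothetical relationships: (start node, relationship type, end node).
relationships = [
    ("p1", "DRIVES", "c1"),
    ("c1", "INVOLVED_IN", "a1"),
    ("p2", "WITNESS_OF", "a1"),
    ("l1", "REPRESENTS", "p1"),
]

# Index the relationships in both directions so they can be walked either way.
adjacency = defaultdict(set)
for start, rel_type, end in relationships:
    adjacency[start].add((rel_type, end))
    adjacency[end].add((rel_type, start))


def neighbours(node_id):
    """Return the sorted ids of all nodes directly connected to node_id."""
    return sorted(other for _, other in adjacency[node_id])


print(neighbours("a1"))  # nodes directly linked to the accident
```

In a graph database the same structure would be stored natively as nodes and relationships. The adjacency index here only mimics that idea for a handful of records.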

All of these subjects seem to act normally in each accident. But what if two or more of them are present in more than one accident with different roles? What if a doctor lives near a driver who gets first aid in a healthcare facility where that doctor works, and that facility is also far from the driver’s home?

As said above, these two scenarios are common but difficult to discover with traditional analytics. They are, instead, easy to find using a graph. Relationships become the key to identifying fraud rings or fraudsters.

Fraud examples

Let’s see some examples. As fraud investigators, we want to get all the entities (subjects) that are linked to a particular accident. (We are taking into account all the accidents within 1 km of Piazza Duca D’Aosta in Milan.)
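
As a rough sketch of this first step, the snippet below selects the accidents that fall within a given radius of a reference point and returns the subjects linked to them. It uses the haversine formula for the distance. The coordinates of the square are approximate, and the accidents and subjects are invented for illustration.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    d_lat = radians(lat2 - lat1)
    d_lon = radians(lon2 - lon1)
    h = (sin(d_lat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(d_lon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(h))


# Approximate position of Piazza Duca D'Aosta (assumed for this example).
CENTER = (45.4850, 9.2030)

# Hypothetical accidents with their position and the subjects involved.
accidents = {
    "a1": {"lat": 45.4880, "lon": 9.2050,
           "subjects": {"Anna": "driver", "Bruno": "witness"}},
    "a2": {"lat": 45.4400, "lon": 9.1500,
           "subjects": {"Carla": "driver"}},
}


def subjects_near(center, radius_km):
    """Return {accident_id: subjects} for accidents within radius_km of center."""
    found = {}
    for accident_id, accident in accidents.items():
        distance = haversine_km(center[0], center[1],
                                accident["lat"], accident["lon"])
        if distance <= radius_km:
            found[accident_id] = accident["subjects"]
    return found


print(subjects_near(CENTER, 1.0))
```

With this sample data only the first accident lies inside the 1 km radius. The second one is several kilometres away and is left out.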

But that information alone is not sufficient to identify a fraud or any anomalous link among subjects. The question we need to ask is: are the cars and people of the first accident involved in other accidents? For example, a lawyer who represents a driver involved in one accident is also involved, as a driver, in another accident where the previous driver is now a witness.

This is only a simple relationship between two subjects. But what if this relationship occurs more than twice with different roles (driver, witness, pedestrian, etc.)? Something is suspicious and we need to investigate further.
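
The role-swapping pattern just described can be sketched with a few lines of Python. This is a minimal illustration on invented records, not a Neo4j query: the names, the accident ids and the `min_shared` threshold are all assumptions. It looks for pairs of people who appear together in several accidents while at least one of them changes role.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical involvement records: (person, accident, role).
involvements = [
    ("Anna", "a1", "driver"),
    ("Carla", "a1", "lawyer"),
    ("Carla", "a2", "driver"),
    ("Anna", "a2", "witness"),
    ("Dario", "a2", "pedestrian"),
    ("Elena", "a3", "driver"),
]


def suspicious_pairs(records, min_shared=2):
    """Find pairs sharing at least min_shared accidents with changing roles."""
    roles_by_accident = defaultdict(dict)
    for person, accident, role in records:
        roles_by_accident[accident][person] = role

    # (person_a, person_b) -> list of (accident, role_a, role_b)
    shared = defaultdict(list)
    for accident, roles in sorted(roles_by_accident.items()):
        for a, b in combinations(sorted(roles), 2):
            shared[(a, b)].append((accident, roles[a], roles[b]))

    result = {}
    for pair, hits in shared.items():
        roles_a = {hit[1] for hit in hits}
        roles_b = {hit[2] for hit in hits}
        role_changed = len(roles_a) > 1 or len(roles_b) > 1
        if len(hits) >= min_shared and role_changed:
            result[pair] = hits
    return result


for pair, hits in suspicious_pairs(involvements).items():
    print(pair, hits)
```

Here only Anna and Carla are reported, because they share two accidents and both change role between them. Raising `min_shared` makes the rule stricter, at the cost of missing smaller rings.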

Another interesting example focuses on geographical location. We can discover some “hidden links” among subjects by exploiting this information. We want to find a doctor who lives near a driver (the driver is the doctor’s neighbour) who plays a part in an accident. This driver goes to a healthcare facility where this doctor works, but this facility is also far from the driver’s home.

This use case raises some questions. Why would anybody choose a healthcare facility far from home when there is a closer one? Maybe the farther facility is better than the nearest one. However, the neighbour relationship between the two subjects looks suspicious. So, again, we need to investigate further.
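
The geographical check can be expressed as three conditions that must hold together. The sketch below is a simplified illustration: positions are given on a flat grid in kilometres rather than as real coordinates, and the thresholds `neighbour_km` and `far_factor`, like all names and values, are assumptions chosen for the example.

```python
from math import hypot

# Hypothetical positions on a flat grid, in kilometres.
homes = {"doctor": (0.0, 0.0), "driver": (0.3, 0.4)}
facilities = {"near_clinic": (1.0, 1.0), "far_clinic": (30.0, 40.0)}

doctor_workplace = "far_clinic"   # where the doctor works
driver_treated_at = "far_clinic"  # where the driver got first aid


def distance_km(a, b):
    """Straight-line distance between two grid points."""
    return hypot(a[0] - b[0], a[1] - b[1])


def is_suspicious(neighbour_km=1.0, far_factor=2.0):
    """True when all three warning signs described in the text are present."""
    # 1) The doctor and the driver live close to each other.
    are_neighbours = distance_km(homes["doctor"], homes["driver"]) <= neighbour_km
    # 2) The driver was treated where the doctor works.
    same_facility = doctor_workplace == driver_treated_at
    # 3) The chosen facility is much farther than the nearest available one.
    chosen = distance_km(homes["driver"], facilities[driver_treated_at])
    nearest = min(distance_km(homes["driver"], pos) for pos in facilities.values())
    bypassed_closer = chosen > far_factor * nearest
    return are_neighbours and same_facility and bypassed_closer


print(is_suspicious())
```

A positive result is a reason to look closer, not proof of fraud: as noted above, the farther facility may simply be the better one.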

Social open data can play an important role in discovering these “hidden links”, adding more value to our data. Gathering information from Facebook, Twitter, Foursquare and other social applications can provide us with further relationships among our entities (friendships, geolocation data, etc.), enriching our dataset.


Following a trail in search of connections among entities is a very fast operation in Neo4j. This technology allows us to ask sophisticated questions about the connections in our data. Thanks to its fast graph traversal, Neo4j also enables real-time detection, stopping fraud before it happens.
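
To give an idea of what following a trail means, the snippet below runs a breadth-first search over a tiny in-memory graph and returns the shortest chain of connections between two subjects. It is a sketch on invented data and says nothing about how Neo4j implements traversal internally. The names and accident ids are hypothetical.

```python
from collections import deque

# Hypothetical undirected links between subjects and accidents.
edges = [
    ("Anna", "accident_1"),
    ("accident_1", "Bruno"),
    ("Bruno", "accident_2"),
    ("accident_2", "Carla"),
    ("Dario", "accident_3"),
]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def shortest_trail(start, goal):
    """Breadth-first search: shortest list of nodes from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(shortest_trail("Anna", "Carla"))
print(shortest_trail("Anna", "Dario"))
```

In this sample Anna and Carla never meet in the same accident, yet they are connected through Bruno. Dario has no link to either of them, so no trail is found.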

Our model can evolve easily and become even more complex in the future by integrating other internal or external sources. Regardless of the complexity of our model, the queries remain clear, concise and fast.

Insurance companies can add graph queries and new rules to their standard checks. At appropriate points in time, such as when a claim is filed, they can use them to flag suspected fraud rings in real time.

Modern fraud detection tools can improve by looking beyond individual data points to the connections that link them. The best solution for doing this is a native graph database like Neo4j.