Fulltext search with Apache Ignite

Apache Ignite is a memory-centric platform for storage, analytics and computation. Hypi builds on top of Ignite to provide a low-code serverless platform that enables rapid application development.

To do this, Hypi makes heavy use of Ignite’s key-value APIs. Ignite itself is very flexible and provides several APIs for interacting with it and the data within it. These include Ignite’s ANSI-99 SQL API, which gives you access to data via standard SQL, among several others.

Ignite also features machine learning APIs and, as of Ignite 2.7, TensorFlow integration, all of which Hypi will be making available over time.

In this post, we will focus on one specific feature in Hypi: fulltext search. We’ve previously published an introduction to Hypi’s query language in the HypiQL post (the language has since been renamed to ArcQL). It covers the syntax and grammar but not much else.

Here, we’ll discuss briefly how that syntax maps down to actually finding your data in the Hypi platform.

You can skip some of this and jump into the slides we presented at the Paris and London in-memory computing meetups in February 2019.

Hypi GraphQL Fulltext

Read on for a breakdown.

First, it helps to become acquainted with how Hypi knows what data you want to put into the platform. At its core is GraphQL. The slides use a model that looks like this:

Hypi GraphQL TODO model

You’ll have noticed the use of @field in the type declarations. This is a Hypi directive that allows the developer of an app to customise some aspect of how Hypi deals with the field to which it is applied. In this image, two things are being done.

  1. indexed: true – This will cause Hypi to index the field, making its contents searchable via ArcQL
  2. type: Keyword – Where this is applied, the field will be used for exact matches. This is good for things like IDs or emails, which must match exactly. If this is not set, it defaults to Text, which allows partial matches (as in search engines).
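
To make the difference between the two index types concrete, here is a minimal Python sketch. It is not Hypi's or Lucene's implementation: the function names keyword_match and text_match and the simple tokenizer are assumptions made purely for illustration.

```python
import re
from typing import List


def keyword_match(stored: str, query: str) -> bool:
    """Keyword semantics: the whole stored value must equal the query."""
    return stored == query


def tokenize(value: str) -> List[str]:
    """Hypothetical analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]


def text_match(stored: str, query: str) -> bool:
    """Text semantics: every token in the query must occur in the stored value."""
    stored_tokens = set(tokenize(stored))
    return all(tok in stored_tokens for tok in tokenize(query))
```

With these definitions, keyword_match("alice@example.com", "alice") is false because the value is compared as a whole, while text_match("Buy milk and eggs", "milk") is true because the field was broken into searchable words.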

The slides demonstrate that this model would produce a GraphQL API that looks like this:

Default GraphQL CRUD API generated by Hypi

Hypi accepts an ArcQL parameter that filters any data inserted with these models. ArcQL, being implemented on top of Apache Lucene, supports many different types of queries. At present there are seven types:

Types of queries supported by ArcQL
  1. Term queries – these are used for matching on Keyword fields i.e. exact matches
  2. Phrase queries – enable partial matching (on fields indexed with Text as their type)
  3. Prefix queries – enable matching the start of the contents of a field
  4. Wildcard queries – enable matching any single character with ? or any number of characters with *
  5. Fuzzy queries – enable matching words that differ by only a few characters, e.g. name, tame and dame could all match the same fuzzy query
  6. Range queries – allow finding values within a specific range, mostly numeric but can be used for strings as well
  7. Match all queries – used when you don’t know what to search for and just want to paginate all the data.
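
The matching rules in the list above can be sketched in a few lines of Python. This only illustrates the semantics of each query type against a single value. It does not use ArcQL syntax, it is not how Lucene evaluates queries against an index, and the function names and the max_edits default are assumptions. Phrase queries are left out because they depend on how the field is tokenized.

```python
from fnmatch import fnmatchcase


def term(value: str, query: str) -> bool:
    """Term query: exact match on the whole value."""
    return value == query


def prefix(value: str, query: str) -> bool:
    """Prefix query: the value starts with the query."""
    return value.startswith(query)


def wildcard(value: str, pattern: str) -> bool:
    """Wildcard query: ? matches one character, * matches any number.

    fnmatchcase also understands [seq] character classes, which go beyond
    the two wildcards described in the text.
    """
    return fnmatchcase(value, pattern)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy(value: str, query: str, max_edits: int = 1) -> bool:
    """Fuzzy query: match if at most max_edits single-character edits apart."""
    return edit_distance(value, query) <= max_edits


def in_range(value, low, high) -> bool:
    """Range query: inclusive bounds; works for numbers and strings."""
    return low <= value <= high


def match_all(value) -> bool:
    """Match-all query: every value matches."""
    return True
```

For instance, fuzzy("tame", "name") and fuzzy("dame", "name") both hold because each word is one substitution away from name.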

Hypi uses what’s called an affinity function in Ignite to determine the set of Ignite nodes on which it will keep the index for a cache. In general, both the index and the raw data share nodes; in the future it will be possible to have dedicated nodes for storing indices only.

Using the affinity function, Hypi consistently maps data to nodes around the cluster by means of rendezvous hashing.
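
For readers unfamiliar with rendezvous (highest random weight) hashing, the following Python sketch shows the general idea. It is a generic illustration under assumed names (score, owner, the node-a style identifiers), not Ignite's affinity function, which additionally maps keys to partitions before assigning partitions to nodes.

```python
import hashlib
from typing import Iterable


def score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, key) pair."""
    digest = hashlib.sha256(f"{node}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(nodes: Iterable[str], key: str) -> str:
    """Rendezvous hashing: the node with the highest weight owns the key."""
    return max(nodes, key=lambda node: score(node, key))
```

Because every node scores each key independently, any client that knows the node list computes the same owner without coordination, and removing a node only moves the keys that node owned. That is what makes it suitable for routing both data and queries to the same place.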

Hypi query and data routing based on Affinity function

Hypi implements a graph system on top of Ignite. This, combined with the search capabilities, enables the automatic resolution of links in the relationships found in the GraphQL models. When referencing an object, there’s no need to store a “foreign key” and perform another query to resolve the foreign object. Instead, Hypi allows you to express the relationship naturally using the GraphQL SDL type system, as demonstrated in the simple todo app model.

ArcQL fully supports this. In the query types above, the field name is shown as a simple one-word field. In reality, ArcQL supports implicit JOIN queries that are the equivalent of doing a LEFT JOIN in a relational database.

For example, suppose you added comments to some items and wanted to find Item objects that had specific comments. A simple ArcQL query can do this, and may look like this:

findItem(arcql: "comments.text ~ 'the text to search for'")

This simple but powerful query will find all items with comments matching the search text. What’s more, given that it’s a GraphQL query, if the GraphQL selection includes the comments field as in the below, then Hypi will automatically resolve the comments belonging to the matching Item objects.

Notice the second arcql parameter here as well. It is entirely optional but means that you can perform a sub-query on comments that match the items that are being returned.
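
The behaviour described above can be pictured with a small in-memory Python sketch. Everything in it is an assumption made for illustration: find_items, comment_query and comment_filter are hypothetical names, case-insensitive substring matching stands in for the ~ operator, and plain dictionaries stand in for Item and comment objects. It is not how Hypi executes the query.

```python
from typing import Dict, List, Optional


def matches(text: str, needle: str) -> bool:
    """Stand-in for a fulltext match: case-insensitive substring test."""
    return needle.lower() in text.lower()


def find_items(items: List[Dict],
               comment_query: str,
               comment_filter: Optional[str] = None) -> List[Dict]:
    """Return items having at least one comment that matches comment_query.

    If comment_filter is given, the comments resolved on each returned item
    are narrowed to those matching it (the optional sub-query).
    """
    results = []
    for item in items:
        comments = item.get("comments", [])
        # Outer filter on the dotted path comments.text (the implicit join).
        if not any(matches(c["text"], comment_query) for c in comments):
            continue
        # Optional sub-query applied to the comments being resolved.
        if comment_filter is not None:
            comments = [c for c in comments if matches(c["text"], comment_filter)]
        results.append({**item, "comments": comments})
    return results
```

The outer filter decides which items come back, while the optional inner filter only affects which of their comments are included in the result.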

With this, Hypi gives you the ability to do both JOINs and sub-queries in a simple, concise and clear way. It even marshals them for you, a luxury afforded by GraphQL.

In a follow-up post, we will discuss modelling with Hypi.
We thank GridGain, who organised the meetups, and Oracle, who also presented and demonstrated their new TimesTen + Kubernetes integration.

