RELEVANT SPELLING SUGGESTION WITH TRIGRAM INDEXING IN POSTGRESQL

Spelling suggestion is a convenient way for users to search for similar content. It accepts a search query with typing mistakes and suggests the correct search query.

For example; a search box that supports multiple “did you mean?” similar to how Google search handles potential spelling errors.

To do this, we’ll use trigram indexing for individual existing words in the DB fields that you want to support suggestion.

Trigram is a technique that splits up a phrase into 3-character groups, for example the word English is split up to {" e"," en",eng,gli,ish,lis,ngl,"sh "}

Trigram then uses similarity functions to compare how close 2 phrases are. It provides the % operator that returns true when similarity is higher than a certain threshold.

With this, we can first index a field, and then use % to return a suggestion list. This list can be ordered by similarity.

Trigram is available as an extension, first we need to enable:

create extension pg_trgm;

That’s all we need to set up before we can use its features.

Split the words into a separate table

Let’s say in the DB we have a posts(title: text) table. The first step is to split the title into words so that we know what is already available to compare to.

CREATE TABLE words AS SELECT word FROM
        ts_stat('SELECT to_tsvector(''simple'', title) FROM posts');

This will create a words table with all of the single words. We use simple dictionary to create original and unstemmed records.

Index the words table with trigram

CREATE INDEX words_idx ON words USING gin(word gin_trgm_ops);

or

CREATE INDEX words_idx ON words USING gist(word gin_trgm_ops);

Query and order by similarity

SELECT t, similarity(t, 'word') AS sml
  FROM words
  WHERE t % 'word'
  ORDER BY sml DESC, t;

The function similarity(t, 'word') returns a number between 0 and 1. We then order by that so relevant words will show up first.

The % operator returns true when the similarity is higher than a threshold. By default it’s 30%, or 0.3 . You can change this by calling set_limit. For example, if you want it to suggest loosely related words then you can select set_limit(0.1)

Keep the words table up-to-date

posts table many change. You might want to update the words table accordingly. This can be done by running the ts_stat query above again from time to time using a crontab.

Use with Rails

First, create a migration to enable pg_trgm extension.

class AddTrigramExtension < ActiveRecord::Migration[5.0]
  def change
    enable_extension 'pg_trgm'
  end
end

Then in ActiveRecord just create a class method (or a scope)

class Word < ActiveRecord::Base
  def self.suggest(query)
    select(['*', "similarity(word, #{sanitize(query)}) as sml"])
      .where("word % ?”, query)
      .order('sml DESC')
      .first
  end
end

Word.suggest('english')

This doesn’t limit you to spelling suggestion, it can be used for flexible search as well

Khac Anh is a full-stack engineer with an extensive background in web and mobile development. He has built and lead tech teams in several outsourcing and startup companies, one of which was successfully publicly listed.
You might also like...
To stay productive in programming you need to receive feedback for your code as fast as possible. For example, when you want to test out some styling effects with CSS...
Basically it's where you deploy your code to. Using a PaaS has several benefits over Infrastructure-as-a-Service (IaaS) or setting your own physical servers...
React Native provides its own version of ListView, it is essentially different from Android's ListView or iOS'es UITableView...
contact codelink

Contact

Let us know how we can help!