NoSQL Search Features to Consider - dummies

By Adam Fowler

Many NoSQL databases support query capabilities and certain search capabilities. Choosing the right one often comes down to understanding the features you need to support.

Although they’re related, query and search are quite different. A query returns only the results that match all the terms in it. Search, on the other hand, can include optional terms and typically provides results ordered by a relevancy calculation.
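The difference can be sketched in a few lines of Python. This is only an illustration: the sample records, the field name text, and the toy relevancy score (a count of matched terms) are assumptions, not how any particular database works.

```python
# Hypothetical sample records; the "text" field name is an assumption.
RECORDS = [
    {"id": 1, "text": "adam fowler writes a blog about nosql"},
    {"id": 2, "text": "a blog about cooking"},
    {"id": 3, "text": "adam likes databases"},
]

def tokens(text):
    """Split text into a set of lowercase words."""
    return set(text.lower().split())

def query(records, terms):
    """Return ids of records that contain every term (no ranking)."""
    wanted = set(terms)
    return [r["id"] for r in records if wanted <= tokens(r["text"])]

def search(records, terms):
    """Return ids of records that contain at least one term, best match first."""
    wanted = set(terms)
    scored = []
    for r in records:
        score = len(wanted & tokens(r["text"]))  # toy relevancy: matched-term count
        if score:
            scored.append((score, r["id"]))
    # Highest score first; ties broken by id so the order is predictable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rid for _, rid in scored]

print(query(RECORDS, ["adam", "blog"]))             # [1]
print(search(RECORDS, ["adam", "fowler", "blog"]))  # [1, 2, 3]
```

The query returns only the record that contains both terms, whereas the search returns every record with at least one term and puts the fullest match first.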

Relevancy calculations enable many more-flexible search interactions. The users doing the searches make the final call about which result is a match for them — the search engine just provides ordered hints.

Both search and query can support exact value matches and range queries (for example, where a date field value in a record lies between two values). Many NoSQL databases and search engines don’t support range queries, though, so if you need them, be sure to check for this early in your selection process.
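As a rough sketch of what a range query involves, the following Python keeps a sorted index of values so that a lookup between two dates doesn’t need to scan every record. The field name published_on and the sample data are hypothetical, and real databases implement range indexes in their own ways.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical records; "published_on" is an assumed field name.
RECORDS = [
    {"id": 1, "published_on": date(2014, 1, 15)},
    {"id": 2, "published_on": date(2014, 6, 1)},
    {"id": 3, "published_on": date(2015, 3, 10)},
]

def build_range_index(records, field):
    """Sort (value, id) pairs once so that range lookups avoid a full scan."""
    return sorted((r[field], r["id"]) for r in records)

def range_query(index, low, high):
    """Return ids whose indexed value lies in the inclusive range [low, high]."""
    values = [value for value, _ in index]
    start = bisect_left(values, low)    # first position with value >= low
    end = bisect_right(values, high)    # first position with value > high
    return [rid for _, rid in index[start:end]]

index = build_range_index(RECORDS, "published_on")
print(range_query(index, date(2014, 1, 1), date(2014, 12, 31)))  # [1, 2]
```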

Most search engines can search entire records or limit query terms to specific fields (such as a “published on” date). Typically, multiple free-text query methods are available, including these:

  • Word query, where the words are OR’ed together: So “adam fowler blog” is evaluated as adam OR fowler OR blog, and a record that matches all three words gets a higher relevancy score than a record that matches only one of them.

  • Phrase query, where the whole phrase is treated as one: So “Of Mice and Men” is evaluated such that the result must have all the words, in the same order, to be a match.

  • Wildcard: Searching for “run*” returns results for “run,” “runs,” “running,” and “runner.”

  • Stemming: A search for “run” also returns results for other forms of the same word, such as “runs” and “running”; searching for “cat” also returns results for “cats.” Exactly which forms match (for instance, whether “ran” or “runner” is included) depends on the stemming algorithm.

  • Lemmatization: This is a more advanced method than stemming. For example, lemmatization of the term “better” results in its lemma (the base or dictionary form of a word), which in this case is “good.”
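To make a few of these methods concrete, here is a minimal Python sketch of phrase, wildcard, and stemmed matching. The function names are made up for illustration, and naive_stem is a deliberately crude suffix stripper; the stemmer in a real search engine will group words differently.

```python
import fnmatch
import re

def words(text):
    """Lowercase the text and split it into a list of words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_match(text, phrase):
    """True if all words of the phrase appear consecutively and in order."""
    doc, wanted = words(text), words(phrase)
    n = len(wanted)
    return any(doc[i:i + n] == wanted for i in range(len(doc) - n + 1))

def wildcard_match(text, pattern):
    """True if any word matches a shell-style pattern such as 'run*'."""
    return any(fnmatch.fnmatchcase(w, pattern.lower()) for w in words(text))

def naive_stem(word):
    """Toy stemmer: strips '-ing' or a plural '-s'. Not a real algorithm."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]  # collapse a doubled letter: "runn" -> "run"
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word

def stem_match(text, term):
    """True if any word in the text shares a (toy) stem with the term."""
    return naive_stem(term.lower()) in {naive_stem(w) for w in words(text)}

print(phrase_match("I read Of Mice and Men last year", "Of Mice and Men"))  # True
print(wildcard_match("She is running late", "run*"))                        # True
print(stem_match("Two cats sat on the mat", "cat"))                         # True
```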

Most people prefer typing search terms in a Google-style search box because it’s so user-friendly.

Search engines support a text format called search grammar. In Google, for example, typing “site:uk AND Adam Fowler AND London” matches all documents from United Kingdom websites that mention the word London and the phrase Adam Fowler.

You can also use parentheses to nest terms within terms, which is particularly useful with Boolean operators such as AND, OR, and NOT.
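The following Python sketch shows how a small search grammar with AND, OR, NOT, parentheses, and quoted phrases might be evaluated against a piece of text. The grammar here is a hypothetical one invented for illustration; it isn’t Google’s grammar, and it doesn’t handle field operators such as site:.

```python
import re

# A token is a parenthesis, a "quoted phrase", or a run of other characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()]+')

def matches(query, text):
    """Evaluate a Boolean query against text. Operators must be uppercase."""
    toks = TOKEN.findall(query)
    doc = text.lower()
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def advance():
        nonlocal pos
        tok = toks[pos]
        pos += 1
        return tok

    def parse_or():                      # lowest precedence
        result = parse_and()
        while peek() == "OR":
            advance()
            rhs = parse_and()            # always parse, then combine
            result = result or rhs
        return result

    def parse_and():
        result = parse_not()
        while peek() == "AND":
            advance()
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not():
        if peek() == "NOT":
            advance()
            return not parse_not()
        return parse_term()

    def parse_term():
        if peek() is None:
            raise ValueError("unexpected end of query")
        tok = advance()
        if tok == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return result
        term = tok.strip('"').lower()
        # Whole-word (or whole-phrase) match within the text.
        return re.search(r"\b" + re.escape(term) + r"\b", doc) is not None

    result = parse_or()
    if pos != len(toks):
        raise ValueError(f"unexpected token: {toks[pos]}")
    return result

q = '"adam fowler" AND (london OR paris) AND NOT berlin'
print(matches(q, "Adam Fowler spoke in London"))  # True
```

Nesting falls out of the recursion: a parenthesized group is parsed as a complete expression of its own before its result is combined with the surrounding operators.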

At times, a more structured query mechanism is required. Typically, this is a tailored and very fine-grained query format expressed in JSON or XML. A good example is found in geospatial queries. Specifying a bounding box of an area you want a search to match is typically best done on a map, by drawing a box rather than typing coordinates in a free-text search bar.
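As an illustration, a structured query might look something like the JSON below, which an application could generate from a box drawn on a map. The node names (and, or, term, bounding-box) and the record fields are invented for this sketch; every database defines its own structured query format.

```python
import json

# A hypothetical structured query: author must match AND the location
# must fall inside the box the user drew on a map.
QUERY_JSON = """
{
  "and": [
    {"term": {"field": "author", "value": "Adam Fowler"}},
    {"bounding-box": {"field": "location",
                      "south": 51.2, "west": -0.6,
                      "north": 51.7, "east": 0.3}}
  ]
}
"""

def evaluate(node, record):
    """Recursively evaluate one query node against a record."""
    kind, body = next(iter(node.items()))
    if kind == "and":
        return all(evaluate(child, record) for child in body)
    if kind == "or":
        return any(evaluate(child, record) for child in body)
    if kind == "term":
        return record.get(body["field"]) == body["value"]
    if kind == "bounding-box":
        lat, lon = record[body["field"]]
        return (body["south"] <= lat <= body["north"]
                and body["west"] <= lon <= body["east"])
    raise ValueError(f"unknown query node: {kind}")

structured_query = json.loads(QUERY_JSON)
record = {"author": "Adam Fowler", "location": [51.5, -0.12]}
print(evaluate(structured_query, record))  # True
```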

More and more, people use geospatial queries to limit their searches, including the following:

  • Point: Matches an exact point.

  • Point radius: Matches within a distance of a point (a circular area).

  • Bounding box: Matches a rectangular area (as in a map’s displayed area).

  • Polygon: Matches a freehand or irregular shape. A good example is the shape of a county or state. A polygon is basically a large array of coordinates that trace the boundary of an area.

  • Polygon-polygon intersection: Matches polygons within records (as opposed to points within records as in the preceding query types) with a query polygon. Does the query polygon touch the one in a record, or completely contain it, or miss it entirely? This type of query requires a lot of processing and isn’t widely supported, but it’s needed in the defense industry.
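Two of these query types can be sketched briefly in Python: a point-radius test that uses the haversine great-circle distance, and a point-in-polygon test that uses ray casting. Both are simplified assumptions for illustration: the first treats the Earth as a sphere, the second treats latitude and longitude as flat coordinates, and neither attempts polygon-polygon intersection.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(point, center, radius_km):
    """Point-radius query: is the point inside the circle around center?"""
    return haversine_km(*point, *center) <= radius_km

def in_polygon(point, polygon):
    """Polygon query by ray casting; polygon is a list of (lat, lon) vertices."""
    lat, lon = point
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Does this edge straddle the point's latitude?
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which the edge crosses that latitude.
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < lon_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside

london, paris = (51.5074, -0.1278), (48.8566, 2.3522)
square = [(0, 0), (0, 10), (10, 10), (10, 0)]
print(within_radius(paris, london, 400))  # True
print(in_polygon((5, 5), square))         # True
```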

It’s important to note that geospatial queries rely heavily on complex mathematics about how a particular coordinate reference system (called a CRS) represents the world. For example, GPS devices use a system called WGS84 (also called EPSG:4326), whereas most online maps use the Web Mercator CRS, EPSG:3857 (informally known as EPSG:900913). Being aware of how a database stores its geospatial data, and of any necessary conversions, is very important; otherwise, you might not get a match at all, or you might end up several hundred yards away from your intended result.
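To give a feel for what such a conversion involves, the sketch below converts WGS84 latitude and longitude (in degrees) to the spherical Web Mercator projection commonly used by online maps (registered as EPSG:3857), and back again. The function names are made up, and in practice you would normally rely on a dedicated projection library rather than hand-written formulas.

```python
import math

# Web Mercator treats the Earth as a sphere whose radius is the
# WGS84 semi-major axis, in metres.
R = 6378137.0

def wgs84_to_web_mercator(lat, lon):
    """Convert degrees of latitude/longitude to Web Mercator metres (x, y)."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

def web_mercator_to_wgs84(x, y):
    """Convert Web Mercator metres (x, y) back to degrees (lat, lon)."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lat, lon

x, y = wgs84_to_web_mercator(51.5074, -0.1278)  # central London
lat, lon = web_mercator_to_wgs84(x, y)
print(round(lat, 4), round(lon, 4))  # 51.5074 -0.1278
```

The two systems use entirely different units (degrees versus metres), which is why comparing coordinates from one CRS against data stored in another produces no matches or badly misplaced ones.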

Some search engines also support methods that help users enter their queries. This is most obvious in Google’s autocomplete functionality. You start typing a query, and Google suggests the most likely queries for you to select from. Many search engines support similar functionality.
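A very simple version of this idea can be sketched as a prefix lookup over previously seen queries, ranked by how often each was used. The query log below is invented sample data, and production autocomplete systems rely on far more sophisticated data structures and ranking signals.

```python
# Hypothetical log of past queries and how often each was submitted.
QUERY_COUNTS = {
    "nosql database": 50,
    "nosql search": 80,
    "nosql vs sql": 30,
    "node js": 90,
}

def suggest(prefix, query_counts, limit=3):
    """Return up to `limit` past queries starting with prefix, most popular first."""
    prefix = prefix.lower()
    hits = [(q, n) for q, n in query_counts.items() if q.startswith(prefix)]
    # Most frequent first; alphabetical order breaks ties.
    hits.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in hits[:limit]]

print(suggest("nos", QUERY_COUNTS))
# ['nosql search', 'nosql database', 'nosql vs sql']
```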

Many more search functions are available, so be sure to do your research before selecting a search engine.