Visualizing NoSQL - dummies

By Adam Fowler

Storing and retrieving large amounts of data quickly is great, but once your data is managed in a NoSQL database, you can do much more with it: extract and enrich entities, search and alert, aggregate, and chart.

Entity extraction and enrichment

You can use database triggers, alert actions, and external systems to analyze source data. Perhaps it’s mostly free text but mentions known subjects. These triggers and alert actions could mark a piece of text as being a Person or Organization, effectively tagging both the content itself and the document it lies within.

A good example is the content in a news article. You can use a tool like Apache Stanbol or OpenCalais to identify key terms. These tools may see “President Putin” and decide this relates to a person called Vladimir Putin, who is Russian, and is the current president of the Russian Federation.

Other examples include disease and medication names, organizations, topics of conversation, products mentioned, and whether a comment was positive or negative.

These are all examples of entity extraction (which is the process of automatically identifying named objects, and their types, in text). By identifying key terms, you can tag them or wrap them in an XML element, which helps you to search content more effectively.

Entity enrichment means adding information based on the original text in addition to identifying it. In the Putin example, you can turn the plain text “President Putin” into <Person uid="Vladimir-Putin">President Putin</Person>. Alternatively, you can turn “London” into <Place lon="-0.15" lat="51.5">London</Place>.
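As a rough illustration of the tagging step, here is a minimal Python sketch that wraps known terms in XML elements. It assumes a small hand-written lookup table (the GAZETTEER name and its entries are hypothetical); tools such as Apache Stanbol or OpenCalais do far more sophisticated analysis than a literal lookup.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lookup table: surface text -> (element name, attributes).
GAZETTEER = {
    "President Putin": ("Person", {"uid": "Vladimir-Putin"}),
    "London": ("Place", {"lon": "-0.15", "lat": "51.5"}),
}

# Try longer names first and match whole words only.
_names = sorted(GAZETTEER, key=len, reverse=True)
_pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in _names) + r")\b")


def enrich(text):
    """Return text with every known term wrapped in an XML element."""
    parts = []
    pos = 0
    for match in _pattern.finditer(text):
        # Escape the untouched text so the result stays valid XML content.
        parts.append(escape(text[pos:match.start()]))
        tag, attrs = GAZETTEER[match.group(0)]
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        parts.append(f"<{tag}{attr_text}>{escape(match.group(0))}</{tag}>")
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


print(enrich("President Putin visited London."))
```

In a real deployment, a function like this would typically run inside a database trigger or alert action rather than as a standalone script.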

You can show this data in a user interface as highlighted text with a link to further information about each subject.

You can provide enrichment by using free‐text search, alerting, database triggers, and integrations to external software such as TEMIS Luxid and SmartLogic.

Search and alerting

Once you store your information, you may want to search it. Free‐text search is straightforward, but after performing entity extraction, you have more options. You can search specifically for a person named “Orange” (as in William of Orange) rather than search records that mention the term orange — which, of course, is also a color and a fruit.

Doing so results in a more granular search. It also allows faceted navigation. If you go to Amazon and search for Harry Potter, you’ll see categories for books, movies, games, and so on. The product category is an example of a facet, which shows you an aspect of data within the search results — that is, the most common values of each facet across all search results, even those not on the current page.
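To make the idea of a facet concrete, the following Python sketch counts the most common values of one field across an entire result set, not only the current page. The record layout and the facet_counts name are assumptions for illustration; a real database computes these counts from its indexes.

```python
from collections import Counter


def facet_counts(results, field, top=10):
    """Return the most common values of `field` across all results."""
    counts = Counter(r[field] for r in results if field in r)
    return counts.most_common(top)


# Invented sample search results.
results = [
    {"title": "Book 1", "category": "Books"},
    {"title": "Book 2", "category": "Books"},
    {"title": "Book 3", "category": "Books"},
    {"title": "Film 1", "category": "Movies"},
    {"title": "Film 2", "category": "Movies"},
    {"title": "Game 1", "category": "Games"},
]

for value, count in facet_counts(results, "category"):
    print(f"{value} ({count})")
```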

User interfaces can support rich explorations into data (as well as basic Google‐esque searches). Users can also utilize them to save and load previous searches.

You can set up saved search criteria so that alerts are activated when newly added records match those criteria. So, if a new record arrives that matches your search criteria, an action occurs. Perhaps “Putin” becomes <Person>Putin</Person>, or perhaps an email lets you know a new scientific article has been published.
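The alerting pattern can be sketched in a few lines of Python. This is a toy in-memory store, not any particular database’s API: the AlertingStore class and its method names are hypothetical, and a saved search is modeled as a plain predicate function.

```python
class AlertingStore:
    """Toy record store that runs saved searches against each new record."""

    def __init__(self):
        self.records = []
        self.saved_searches = []  # list of (name, predicate, action)

    def save_search(self, name, predicate, action):
        """Register criteria (predicate) and what to do on a match (action)."""
        self.saved_searches.append((name, predicate, action))

    def add(self, record):
        """Store the record, then fire the action of every matching search."""
        self.records.append(record)
        for name, predicate, action in self.saved_searches:
            if predicate(record):
                action(name, record)


store = AlertingStore()
store.save_search(
    "putin-news",
    lambda r: "Putin" in r.get("text", ""),
    lambda name, r: print(f"alert {name}: record {r['id']} matched"),
)
store.add({"id": 1, "text": "Weather report"})
store.add({"id": 2, "text": "President Putin spoke today"})
```

The action could just as easily send an email or rewrite the record with entity tags.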

Not all search engines are capable of making every query term an alert. Some are limited to text fields; others can’t do geospatial criteria. Be sure yours can handle the alerts you need to configure.

Aggregate functions

Once you find relevant information, you may want to dig deeper. Depending on the source, you might ask how many countries have a GDP of greater than $400 billion, or what’s the average age of all the members in your family tree, or where do the most snake bites occur in Australia. These examples illustrate how analytics are performed over a set of search results. These are count, mean average, and geospatial heat map calculations, respectively.
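The three kinds of calculation can be sketched in Python as follows. The helper names and the sample values are invented for illustration, and the heat map is approximated by counting points per grid cell; a database would run the equivalent work against its indexes instead of scanning records.

```python
import math
from collections import Counter
from statistics import mean


def count_where(records, predicate):
    """Count the records that satisfy the predicate."""
    return sum(1 for r in records if predicate(r))


def mean_of(records, field):
    """Mean average of a numeric field across the records."""
    return mean([r[field] for r in records])


def heat_map(points, cell_size=1.0):
    """Count (lat, lon) points per grid cell of cell_size degrees."""
    return Counter(
        (math.floor(lat / cell_size), math.floor(lon / cell_size))
        for lat, lon in points
    )


# Invented sample data, not real statistics.
countries = [
    {"name": "A", "gdp_billion": 500},
    {"name": "B", "gdp_billion": 120},
    {"name": "C", "gdp_billion": 950},
]
family = [{"age": 30}, {"age": 40}, {"age": 50}]
bites = [(-33.9, 151.2), (-33.1, 151.8), (-27.5, 153.0)]

print(count_where(countries, lambda c: c["gdp_billion"] > 400))
print(mean_of(family, "age"))
print(heat_map(bites).most_common(1))
```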

Being able to make such calculations next to the data offers several advantages. The first advantage is that you can use the indexes to speed things up. Secondly, these indexes are likely to be cached in memory, making them even faster. Thirdly, in‐memory indexes are particularly useful for a NoSQL database using Hadoop Distributed File System (HDFS) storage. HDFS doesn’t do native indexing or in‐memory column stores for fast aggregation calculations itself; it requires a NoSQL database on top to do this.

Faceted navigation is an example of count‐based aggregations over search results that show up in a user interface. The same is true for a timeline showing the number of records that mention a particular point in time. For example, do you want to show results from this year, this month, or this hour?
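A timeline is the same kind of count, grouped by a time bucket. The Python sketch below assumes each record carries a datetime value; the timeline function and the FORMATS table are hypothetical names chosen for this example.

```python
from collections import Counter
from datetime import datetime

# Bucket label format for each supported granularity.
FORMATS = {"year": "%Y", "month": "%Y-%m", "hour": "%Y-%m-%d %H:00"}


def timeline(timestamps, granularity):
    """Count timestamps per bucket, returned in chronological order."""
    fmt = FORMATS[granularity]
    counts = Counter(ts.strftime(fmt) for ts in timestamps)
    return sorted(counts.items())


# Invented sample timestamps.
stamps = [
    datetime(2023, 12, 31, 23, 59),
    datetime(2024, 1, 5, 10, 15),
    datetime(2024, 1, 5, 10, 45),
    datetime(2024, 2, 1, 9, 0),
]

for bucket, count in timeline(stamps, "month"):
    print(bucket, count)
```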

If you want this functionality, be sure your database has the ability to calculate aggregates efficiently next to the data. Most NoSQL databases do, but some don’t.

Charting and business intelligence

The next obvious user‐interface extension involves charting and viewing table summaries for live management information and historical business intelligence analysis.

Most NoSQL databases provide an easy‐to‐integrate REST API. This means you can plug in a range of application tiers, or even directly connect JavaScript applications to these databases. A variety of excellent charting libraries are available for JavaScript. You can even use the R ecosystem to create charts based on data held in these databases, after installing an appropriate database connector.

Some NoSQL databases even provide an ODBC or JDBC relational database plug‐in. Creating indexes over fields within a given record and showing them as a relational view is a neat way to turn unstructured data in a NoSQL document database into data that can be analyzed with a business intelligence tool.
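The relational‐view idea can be imitated in Python with the standard sqlite3 module: selected fields are pulled out of nested documents and exposed as rows that ordinary SQL can query. This is only a sketch of the concept, not how any vendor’s ODBC or JDBC plug‐in works; the function names, the view_rows table, and the sample documents are all assumptions.

```python
import sqlite3


def get_path(doc, path):
    """Follow a dotted path through nested dicts; None if any key is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def relational_view(docs, columns):
    """Load chosen document fields into an in-memory table named view_rows.

    `columns` maps a column name to a dotted path inside each document.
    Column names are trusted here; do not build SQL from user input this way.
    """
    names = list(columns)
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE view_rows ({', '.join(names)})")
    placeholders = ", ".join("?" for _ in names)
    rows = [tuple(get_path(d, columns[n]) for n in names) for d in docs]
    conn.executemany(f"INSERT INTO view_rows VALUES ({placeholders})", rows)
    return conn


# Invented sample documents.
docs = [
    {"order": {"region": "EU", "amount": 10}},
    {"order": {"region": "EU", "amount": 20}},
    {"order": {"region": "US", "amount": 5}},
    {"note": "a document without an order"},
]

conn = relational_view(docs, {"region": "order.region", "amount": "order.amount"})
query = (
    "SELECT region, SUM(amount) FROM view_rows "
    "WHERE region IS NOT NULL GROUP BY region ORDER BY region"
)
print(conn.execute(query).fetchall())
```

Documents that lack a field simply produce NULL in that column, which mirrors how a view over loosely structured data has to tolerate missing values.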

Check whether your NoSQL database vendor provides visualization tools or has business partners with tools that can connect to these databases. Popular tools include Tableau Server, which is a modern shared business intelligence server that supports publishing interactive reports over data in a variety of databases, including NoSQL databases.