10 Killer NoSQL Applications - dummies

By Adam Fowler

Often people purchase a particular platform because of the killer apps that run on it. Many NoSQL-based applications fall into the killer app category. These applications could not have become a reality using existing relational database technologies.

Facebook messaging platform

Facebook created Apache Cassandra to power search in its Inbox, which it did for a number of years. Cassandra supported this in two ways:

  • Cassandra indexed users’ messages and the terms (words, and so on) in the messages and drove a search over all the content in those messages. The user ID was the primary key. Each term became a super column, and the message IDs were the column names.

  • Cassandra provided the ability to list all messages sent to and from a particular user. Here the user ID was the primary key, the recipient IDs were the super columns, and the message IDs were the column names.
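The two layouts above can be pictured as nested maps. The following Python sketch only illustrates the shape of the data; it is not Cassandra code, and the names `term_index`, `interaction_index`, `index_message`, `search`, and `conversation` are hypothetical.

```python
from collections import defaultdict

# Row key (user ID) -> super column (term) -> column names (message IDs).
term_index = defaultdict(lambda: defaultdict(set))
# Row key (user ID) -> super column (other party's user ID) -> message IDs.
interaction_index = defaultdict(lambda: defaultdict(set))


def index_message(message_id, sender, recipient, text):
    """Record one message in both indexes, for sender and recipient alike."""
    for owner, other in ((sender, recipient), (recipient, sender)):
        for term in set(text.lower().split()):
            term_index[owner][term].add(message_id)
        interaction_index[owner][other].add(message_id)


def search(user_id, term):
    """Return the sorted IDs of a user's messages that contain the term."""
    return sorted(term_index[user_id][term.lower()])


def conversation(user_id, other_id):
    """Return the sorted IDs of messages exchanged between two users."""
    return sorted(interaction_index[user_id][other_id])


index_message("m1", "alice", "bob", "Lunch on Friday")
index_message("m2", "bob", "alice", "Friday works")
print(search("alice", "friday"))       # ['m1', 'm2']
print(conversation("alice", "bob"))    # ['m1', 'm2']
```

Because the user ID is the outermost key in both maps, every lookup for one user touches a single row, which is the property the layout described above relies on.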

The original Facebook Cassandra paper is annotated with recent information and is maintained by DataStax, the commercial company promoting Cassandra today.

Amazon DynamoDB

Amazon originally published the Dynamo paper, thereby launching the concept of NoSQL key-value stores. Since then, Amazon has created a separate database called DynamoDB as a service offered on the Amazon Web Services marketplace site.

Although DynamoDB gets its name from the original Dynamo, it takes a different approach: DynamoDB replicates data synchronously across multiple data centers in order to guarantee the consistency and durability that are essential in enterprise applications.

With DynamoDB, you pay by the hour for the throughput capacity you use, rather than simply for the amount of data you store. New application developers may find this model appealing. At the time of writing, a free tier is also available that includes 25GB of storage and a number of write and read capacity units.
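To get a feel for throughput-based pricing, it helps to estimate how many capacity units a workload needs. The sketch below is a rough estimate only. It assumes the unit sizes that AWS has published for provisioned capacity (one read unit per strongly consistent read per second of up to 4 KB, half that for eventually consistent reads, and one write unit per write per second of up to 1 KB). These figures do not come from this article, so check them against current AWS documentation. The function names are hypothetical.

```python
import math

READ_UNIT_KB = 4   # assumption: one read unit covers up to 4 KB per read
WRITE_UNIT_KB = 1  # assumption: one write unit covers up to 1 KB per write


def read_capacity_units(item_kb, reads_per_second, eventually_consistent=False):
    """Estimate the read units needed for items of a given size."""
    units = math.ceil(item_kb / READ_UNIT_KB) * reads_per_second
    if eventually_consistent:
        # Assumption: eventually consistent reads cost half as much.
        return math.ceil(units / 2)
    return units


def write_capacity_units(item_kb, writes_per_second):
    """Estimate the write units needed for items of a given size."""
    return math.ceil(item_kb / WRITE_UNIT_KB) * writes_per_second


print(read_capacity_units(6, 10))                              # 20
print(read_capacity_units(6, 10, eventually_consistent=True))  # 10
print(write_capacity_units(2.5, 5))                            # 15
```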

Google Mail

Google’s Bigtable was created to provide wide-column storage for a range of Google’s applications, including Orkut, Google Earth, web indexing, Google Maps, Google Books, YouTube, blogger.com, Google Code, and Google Mail.

Bigtable clones provide index lookup tables for very large sets of information.
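An index lookup table of this kind can be pictured as a map that is kept sorted by row key, so that related rows sit next to each other and can be read with one range scan. The sketch below is a toy in-memory illustration, not Bigtable or any clone's API. The class `SortedTable`, its methods, and the sample row keys are all hypothetical.

```python
import bisect


class SortedTable:
    """Toy wide-column table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, row_key, column, value):
        """Store one cell, creating the row if it does not exist yet."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Return all columns of one row, or an empty dict."""
        return self._rows.get(row_key, {})

    def scan_prefix(self, prefix):
        """Yield (row key, columns) for every row key starting with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]


table = SortedTable()
table.put("com.example/index", "title", "Home")
table.put("org.sample/index", "title", "Sample")
table.put("com.example/about", "title", "About")

print([key for key, _ in table.scan_prefix("com.example/")])
# ['com.example/about', 'com.example/index']
```

Choosing row keys so that related items share a prefix is what turns a large table into a usable lookup index.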

LinkedIn

LinkedIn has used Hadoop to churn through information about relationships overnight and to push the latest graph information to the Voldemort key-value NoSQL store for querying the next day. In this way, LinkedIn maintained a rolling view of all data in the service.
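The pattern described here, a heavy batch job that produces a fresh data set and a key-value store that serves it, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not LinkedIn's pipeline or Voldemort's API. The function `build_snapshot`, the class `ReadOnlyStore`, and the choice of second-degree connections as the computed result are all hypothetical.

```python
from collections import defaultdict


def build_snapshot(edges):
    """Batch step: derive each member's second-degree connections."""
    direct = defaultdict(set)
    for a, b in edges:
        direct[a].add(b)
        direct[b].add(a)

    snapshot = {}
    for member, friends in direct.items():
        second = set()
        for friend in friends:
            second |= direct[friend]
        # Exclude the member and anyone they already know directly.
        snapshot[member] = sorted(second - friends - {member})
    return snapshot


class ReadOnlyStore:
    """Serving step: key-value lookups against the latest snapshot."""

    def __init__(self):
        self._data = {}

    def swap(self, snapshot):
        # Replace the whole data set with one reference assignment.
        self._data = snapshot

    def get(self, key):
        return self._data.get(key, [])


store = ReadOnlyStore()
store.swap(build_snapshot([("ann", "bob"), ("bob", "cat"), ("cat", "dan")]))
print(store.get("ann"))  # ['cat']
print(store.get("bob"))  # ['dan']
```

Queries during the day only ever read a finished snapshot, so the expensive graph computation never competes with lookups.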

BBC iPlayer online media catalog

The British Broadcasting Corporation (BBC) offers UK citizens a free online catch-up service, called the iPlayer, for BBC television and radio shows.

The information for episodes, series, and brands is updated by a different team from the one responsible for scheduling episodes for TV.

The BBC moved multiple MySQL systems to a single MarkLogic Server 6 repository to provide access to program metadata. This operation included creating a data services API called Nitro and embedding it in MarkLogic Server.

Nitro now powers an increasing number of BBC services. Nitro started by replacing functionality in iPlayer to help stabilize the performance of that platform. In the future, Nitro will include feeds to partner organizations and have a public-facing API.

BBC Sport and Olympics platforms

In 2011, the BBC realized that its journalists were spending a lot of time deciding where to publish stories on the BBC Sport website. This cost a lot of time and money, and stories weren’t consistently available to users in different areas of the sports website.

The BBC created an entirely new type of solution called Dynamic Semantic Publishing (DSP) to automate much of this process. By using a combination of MarkLogic Server 6 (the version without a triple store) and Ontotext’s GraphDB (formerly BigOWLIM), the BBC was able to suggest topics for stories to its journalists.

This approach also allowed the BBC to use the relationships inherent in the subjects mentioned in the stories to determine where to publish the data, rather than rely on the journalists.

If you go to the BBC Sport home page and click the link for the England football team, you see not only stories about the England football team, but also stories about players who play for England and stories about those players’ spouses, even when the stories don’t explicitly mention the England football team.
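The idea of publishing by relationship can be sketched with a handful of subject-predicate-object facts. The example below is a toy illustration and does not show how the BBC's DSP platform, MarkLogic Server, or GraphDB work internally. Every entity, predicate, and function name in it is made up.

```python
# (subject, predicate, object) facts; all names are invented for illustration.
triples = {
    ("player_a", "plays_for", "england"),
    ("person_b", "spouse_of", "player_a"),
    ("player_c", "plays_for", "scotland"),
}

# Story ID -> the entities a journalist (or a suggestion tool) tagged it with.
story_tags = {
    "story-1": {"england"},
    "story-2": {"player_a"},
    "story-3": {"person_b"},
    "story-4": {"player_c"},
}


def related_entities(topic):
    """Collect the topic plus every entity linked to it, directly or not."""
    found = {topic}
    frontier = [topic]
    while frontier:
        current = frontier.pop()
        for subject, _predicate, obj in triples:
            if obj == current and subject not in found:
                found.add(subject)
                frontier.append(subject)
    return found


def stories_for(topic):
    """Return the stories tagged with the topic or anything related to it."""
    entities = related_entities(topic)
    return sorted(s for s, tags in story_tags.items() if tags & entities)


print(stories_for("england"))  # ['story-1', 'story-2', 'story-3']
```

Only `story-1` mentions the team itself; the other two appear because the stored relationships connect their subjects back to it.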

HealthCare.gov

HealthCare.gov has been called the most complex IT system implementation of all time. Building it required several systems, with the most visible one being the HealthCare.gov marketplace.

Behind the scenes, many other systems provide supporting functions, including stores for information from other agencies, such as IRS data and information about coverage that states already offer to their residents. Also, insurers submit the policies they want to offer to citizens on the federal marketplace website.

Communication between the various systems also requires storage of messages for safety (so they’re not lost) and later delivery. Although HealthCare.gov provides coverage to citizens in thirty-four states, the back-end systems support all fifty states through the database and feed the states’ own marketplaces.
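Storing a message first and delivering it later is often called store and forward. The sketch below shows the general pattern only and says nothing about how the HealthCare.gov systems implement it. The class `StoreAndForward`, its methods, and the sample destinations are hypothetical, and the in-memory dictionary stands in for the durable database a real system would need.

```python
class StoreAndForward:
    """Keep every accepted message until its delivery is confirmed."""

    def __init__(self):
        # Message ID -> (destination, payload). A real system would persist
        # this in a database so that messages survive a restart.
        self._stored = {}
        self._next_id = 1

    def accept(self, destination, payload):
        """Store a message before any delivery attempt and return its ID."""
        message_id = self._next_id
        self._next_id += 1
        self._stored[message_id] = (destination, payload)
        return message_id

    def deliver_pending(self, send):
        """Try every stored message; drop only those that send() confirms."""
        delivered = []
        for message_id, (destination, payload) in list(self._stored.items()):
            if send(destination, payload):
                del self._stored[message_id]
                delivered.append(message_id)
        return delivered

    def pending(self):
        """Return the IDs of messages still waiting for delivery."""
        return sorted(self._stored)


queue = StoreAndForward()
queue.accept("state-a", "<policy id='1'/>")
queue.accept("state-b", "<policy id='2'/>")

# Pretend that only the link to state-a is currently available.
first_pass = queue.deliver_pending(lambda dest, _payload: dest == "state-a")
print(first_pass, queue.pending())   # [1] [2]

# Later, every destination is reachable again.
second_pass = queue.deliver_pending(lambda _dest, _payload: True)
print(second_pass, queue.pending())  # [2] []
```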

The Centers for Medicare & Medicaid Services (CMS) selected MarkLogic to provide the back-end database for all these systems’ data. MarkLogic Server stores an anonymized version of all the XML content flowing between these systems and provides the capability to match requirements of citizens with insurance coverage available.

The subsystem that tracks and analyzes all message traffic in real time has proven to be the most visible success of a NoSQL system that directly affects citizens’ lives. Although the project experienced public difficulties, it handled a high level of complexity and ultimately delivered a successful rollout to more than seven million newly covered Americans.

UK NHS Spine 2 Backbone

The UK National Health Service comprises hundreds of organizations, all under one national umbrella. For example, general practice surgeries and hospitals each have their own systems.

The UK NHS provides a set of services called the Spine. For example, the Spine includes a service that assigns each newborn a unique NHS number that stays with that person for life. The Spine also has a system, called Spine Core, that acts as a messaging backbone for a variety of other systems.

Spine 2 also includes communication with the Summary Care Record (SCR) systems that enable personnel in hospital emergency rooms to locate individual medical records throughout the country.

The Spine 2 backbone is built on top of Riak and replaces a very costly system built on top of older relational technology.

Secure information sharing

In many situations, you need to provide access to information while also maintaining its security. Here are several examples:

  • A book publisher providing access to summaries so that you can verify the relevance of a book before purchase, but can view the full book only after purchase

  • A multiagency social care application with different access rights for child protection officers, medical staff, educators, and law enforcement agencies

  • An intelligence-sharing application where high-level information on an intelligence report is shared for discovery, but where all access must be applied for and granted on a case-by-case basis

These situations share a common approach: they require security set at the record level as a minimum, so that you can show or hide a record to different users of the system.

To provide secure access to specific sections within a record, you also need denormalization, cell-based access control, or label-based access control (LBAC). LBAC enforces record security based on the content of that record rather than on explicit permissions set for that record.
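The difference between explicit record-level permissions and label-based access control can be shown with a small sketch. This is a simplified illustration, not the security model of Accumulo, MarkLogic Server, or AllegroGraph. The `Record` class, the `classification` field, the clearance names, and both check functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    content: dict
    # Explicit record-level permissions: the roles allowed to read it.
    read_roles: set = field(default_factory=set)


def can_read_explicit(record, user_roles):
    """Record-level check: the user needs one of the listed roles."""
    return bool(record.read_roles & user_roles)


# Hypothetical clearance levels, from least to most restrictive.
CLEARANCE_ORDER = ["public", "restricted", "secret"]


def required_label(record):
    """Label-based check, step 1: derive the label from the content itself."""
    # A record with no label is treated as the most restrictive level.
    return record.content.get("classification", CLEARANCE_ORDER[-1])


def can_read_by_label(record, user_clearance):
    """Label-based check, step 2: compare the user's clearance to the label."""
    needed = CLEARANCE_ORDER.index(required_label(record))
    return CLEARANCE_ORDER.index(user_clearance) >= needed


report = Record(
    content={"title": "Case summary", "classification": "restricted"},
    read_roles={"analyst"},
)

print(can_read_explicit(report, {"analyst"}))  # True
print(can_read_explicit(report, {"guest"}))    # False
print(can_read_by_label(report, "secret"))     # True
print(can_read_by_label(report, "public"))     # False
```

With the label-based check, changing the `classification` value in the content changes who can read the record, without anyone editing a permission list.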

In these scenarios, NoSQL databases that support record or cell/element/triple level security, such as Accumulo, MarkLogic Server, and AllegroGraph, are good options.

Citizen engagement

Governments use NoSQL databases to empower citizens with information about how their country is governed. A good example is Fairfax County in Virginia, which uses MarkLogic Server to provide geospatial information through an online browse and search interface to government agencies and residents. The service covers a range of information, such as geographic points in the county and police-related events.

In the UK, the award-winning legislation.gov.uk website provides information on UK laws dating back more than one thousand years! If you want to know the laws about theft of property in Wales in 1542, just visit the website!

You can also find laws currently being debated by Parliament, and upcoming legal clause activations are available as annotations for current legislation. This service provides citizens as well as lawmakers with a very rich reference on legal matters throughout the UK.

This website is powered by MarkLogic Server 5 (a version without a triple store) and Ontotext’s GraphDB.