Managing Different Data Types with NoSQL - dummies

By Adam Fowler

NoSQL databases aren’t restricted to a rows‐and‐columns approach. They are designed to handle a great variety of data, including data whose structure changes over time and whose interrelationships aren’t yet known.

NoSQL databases come in four core types — one for each type of data the database is expected to manage:

  • Columnar: An extension of traditional table structures. Supports variable sets of columns (column families) and is optimized for column‐wide operations (such as count, sum, and mean).

  • Key‐value: A very simple structure. Each named key maps to one or more values, typically a chunk of data that the database doesn’t interpret. That value may itself be a JSON or binary document.

  • Triple: A single fact represented by three elements:

    • The subject you’re describing

    • The name of its property or relationship to another subject

    • The value — either an intrinsic value (such as an integer) or the unique ID of another subject (if it’s a relationship)

    For example, Adam likes Cheese. Adam is the subject, likes is the predicate (the name of the property or relationship), and Cheese is the object (the value).

  • Document: XML, JSON, text, or binary blob. Any treelike structure can be represented as an XML or JSON document, including things such as an order that includes a delivery address, billing details, and a list of products and quantities.

    Some document NoSQL databases support storing a separate list (or document) of properties about the document, too.
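To make the four structures concrete, here is a minimal sketch of each one using plain Python containers. It is an illustration only, not how any particular NoSQL product stores data. The sample keys, field names, and values, and the helper functions column_sum and objects_of, are all hypothetical.

```python
# Columnar: each row key holds its own variable set of columns.
# (Hypothetical sample data; the second row has an extra "coupon" column.)
columnar_rows = {
    "order:1": {"customer": "Adam", "total": 12.50},
    "order:2": {"customer": "Adam", "total": 7.25, "coupon": "CHEESE10"},
}


def column_sum(rows, column):
    """Column-wide operation: sum one column over the rows that have it."""
    return sum(row[column] for row in rows.values() if column in row)


# Key-value: the store sees only a key and a value it does not interpret.
# The value here happens to be JSON text stored as raw bytes.
key_value_store = {"session:42": b'{"user": "Adam"}'}

# Triple: one fact per (subject, predicate, object) tuple.
triples = [
    ("Adam", "likes", "Cheese"),
    ("Adam", "likes", "Bread"),
]


def objects_of(facts, subject, predicate):
    """Return every object recorded for a given subject and predicate."""
    return [o for s, p, o in facts if s == subject and p == predicate]


# Document: a treelike order with a delivery address, billing details,
# and a list of products and quantities.
order_document = {
    "id": "order:1",
    "delivery_address": {"city": "London"},
    "billing": {"method": "card"},
    "products": [{"name": "Cheese", "quantity": 2}],
}
```

Calling column_sum(columnar_rows, "total") adds the total column across both rows even though the rows have different column sets, and objects_of(triples, "Adam", "likes") collects every object linked to Adam by the likes predicate.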

Most data problems can be described in terms of the preceding data structures. Indeed, the data handled by nearly all computer programs ever written falls into these categories. It is therefore important to understand how best to store, retrieve, and query that data.

The good news is that there’s now a set of databases designed to manage each type of data properly, so you don’t have to shred data into a fixed relational schema. (To shred means to convert complex data structures into simple, Excel‐like tables linked by relationships, which has always seemed like the wrong thing to do.)
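As a rough illustration of what shredding involves, the sketch below flattens one nested order into rows for two flat tables. The table layout, the field names, and the shred_order function are assumptions made up for this example.

```python
def shred_order(order):
    """Flatten one nested order into rows for two hypothetical tables.

    Returns (order_row, item_rows): one row for an "orders" table and one
    row per product for an "order_items" table, linked by order_id.
    """
    order_row = {
        "order_id": order["id"],
        "delivery_city": order["delivery_address"]["city"],
        "billing_method": order["billing"]["method"],
    }
    item_rows = [
        {
            "order_id": order["id"],
            "product": item["name"],
            "quantity": item["quantity"],
        }
        for item in order["products"]
    ]
    return order_row, item_rows


# Hypothetical nested order, as a document database might hold it.
sample_order = {
    "id": "order:1",
    "delivery_address": {"city": "London"},
    "billing": {"method": "card"},
    "products": [
        {"name": "Cheese", "quantity": 2},
        {"name": "Bread", "quantity": 1},
    ],
}

order_row, item_rows = shred_order(sample_order)
```

The single document becomes one orders row plus two order_items rows, and reading the order back means joining them again on order_id. A document database can store and return sample_order as it is.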

In addition to the preceding NoSQL data types, here are two other developments worth mentioning:

  • Search engines: If you’re storing information that has a variable structure or copious text, you need a common way across structures to find relevant information, which search engines provide.

  • Hybrid NoSQL databases: These databases provide a mix of the core features of multiple NoSQL database types — such as key‐value, document, and triple stores — all in the same product.

Several search engines and hybrid databases apply general themes present in NoSQL products — namely, allowing variable data types and being horizontally scalable on commodity hardware. The internal designs of search engines and hybrid NoSQL databases are similar and complementary.
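One way to picture how a search engine offers a common way to find information across differing structures is an inverted index built from every piece of text in every record, whatever its shape. The following is a minimal sketch of that idea and not the design of any specific search product. The function names and sample records are hypothetical.

```python
import re
from collections import defaultdict


def iter_text(value):
    """Yield every string found anywhere inside a nested dict or list."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_text(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_text(child)


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of record IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for text in iter_text(doc):
            for term in tokenize(text):
                index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of records that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


# Two hypothetical records with completely different structures.
records = {
    "note:1": {"title": "Cheese tasting", "body": "Adam likes cheese"},
    "order:7": {"products": [{"name": "Cheese"}, {"name": "Bread"}]},
}
index = build_index(records)
```

Here search(index, "cheese") matches both records even though one is a free-text note and the other is an order, while search(index, "adam cheese") matches only the note.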