
NoSQL For Dummies Cheat Sheet

From NoSQL For Dummies

By Adam Fowler

As a NoSQL developer, your first step is selecting the right product category and then the right product within it. These guides compare the most important features of some of the most popular NoSQL databases.

Bigtable/Wide Column Store Features in NoSQL Databases

Bigtable clones are a type of NoSQL database that emerged from Google’s seminal Bigtable paper. They provide a highly distributed way to manage tabular data. Unlike tables in a traditional Relational Database Management System (RDBMS), these tables of data are not related to each other. Here are the most important features of popular database choices.

| Feature Area | Accumulo | Cassandra | HBase | Hypertable |
| --- | --- | --- | --- | --- |
| ACID or BASE | ACID | BASE | BASE | ACID |
| HA Replicas | Yes, Sync | Yes, Async | Yes, Sync | TBD |
| DR Replicas | As HDFS | Yes, Async | As HDFS | TBD |
| Data types | No data type support. | Yes, schema must be defined up front. | No data type support. | No data type support. |
| Data indexing | No secondary indexing. | Not a true “secondary index” feature: it only allows columns to be used in queries and doesn’t speed up data retrieval. Supports Bloom filters. | No indexing. Supports Bloom filters. | Full secondary indexes. |
| Query and search | Uses Map/Reduce for accessing data. | CQL query language similar to SQL. | Uses Map/Reduce for accessing data. Can be used with Hive query engine. | Value exact match and string “starts with” queries. Column exists query term support. No range. |
| Commercials | Apache 2. Used in government for secure Bigtable needs. | Commercial version from DataStax. | Apache 2. Available from a number of Hadoop providers. | GPL v3 licensed. |
| Other | Role based access control (RBAC) and cell (per value) level security useful for government use cases. Custom authentication and authorization plug-ins available. Partial encryption at rest of data in Accumulo 1.6. (Intermediate recovery files not encrypted.) | 0.5–1.0TB of data recommended per node. SSD storage recommended. 32GB RAM and 4/8 cores recommended. Recommended AWS system for 1TB of data is 2.2xlarge (60GB RAM + SSD storage), or smaller c3.2large for 100GB of data. Support for encryption of data at rest (but not journal logs). | Viewed as the slower of the Hadoop-based NoSQL databases. “Endpoints” provide functionality similar to stored procedures. | Adaptive memory allocation feature automatically tunes RAM usage for write-heavy and read-heavy applications. |
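
The table above lists Bloom filter support for Cassandra and HBase. As a rough illustration of why that matters, a Bloom filter lets a store decide cheaply that a row key is definitely not in a given file, so the file never has to be read. The following is a minimal pure-Python sketch, not either product's implementation; the class name, bit count, hash scheme, and row keys are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical row keys held in one on-disk data file.
row_filter = BloomFilter()
for row_key in ("user:1001", "user:1002", "user:1003"):
    row_filter.add(row_key)

print(row_filter.might_contain("user:1002"))  # True: the key was added
```

A lookup for a key that was never added will usually return False, which is the case where the disk read is skipped. A True answer still requires checking the file, because false positives are possible.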

Key-Value Store NoSQL Database Features

Key-value stores are no-frills NoSQL databases that generally delegate all value handling to the application code itself. These are the key features of common key-value store databases.

[Image: feature comparison table for common key-value store databases]
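
To make "delegating value handling to the application" concrete, here is a minimal sketch in which the store only ever sees opaque bytes and the application decides how to serialize and interpret them. The `KeyValueStore` class, the `profile:` key prefix, and the helper functions are hypothetical stand-ins, not the API of any particular product.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store: keys and values are opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes) -> None:
        if not isinstance(value, bytes):
            raise TypeError("the store only accepts bytes")
        self._data[key] = value

    def get(self, key: bytes):
        # Returns None when the key is missing.
        return self._data.get(key)


# Application code owns the value format (JSON here, by assumption).
def save_profile(store, user_id, profile):
    key = f"profile:{user_id}".encode("utf-8")
    store.put(key, json.dumps(profile).encode("utf-8"))


def load_profile(store, user_id):
    raw = store.get(f"profile:{user_id}".encode("utf-8"))
    return None if raw is None else json.loads(raw.decode("utf-8"))


store = KeyValueStore()
save_profile(store, 42, {"name": "Ada", "langs": ["en", "fr"]})
profile = load_profile(store, 42)
print(profile["name"])  # Ada
```

Because the store cannot look inside a value, any query other than "get by key" has to be built in application code or on top of additional keys.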

Document NoSQL Database Features

Document NoSQL databases are flexible and schema agnostic, which means you can load any type of document without the database needing to know the document’s structure up front. Document NoSQL databases support these important features.

| Feature Area | Couchbase | Microsoft DocumentDB | MarkLogic Server | MongoDB |
| --- | --- | --- | --- | --- |
| ACID or BASE | BASE | BASE, client driver consistency selection | ACID, fully serializable | BASE, client driver consistency selection |
| HA Replicas | No | Managed by Azure platform. | Yes, Sync | Yes, Async (default) |
| DR Replicas | Yes, master-master, Async | Managed by Azure platform. | Yes, Async | Yes, Async |
| Data types | JSON document model | JSON document model. Same types supported as JSON: String, numbers (IEEE754), and Booleans. Extended date-time, guid, Int64 types supported. | XML, JSON, text, and binary documents supported. All W3C XML schema data types supported. | JSON document model. Same types as JSON. Support for 2D geospatial data. |
| Data indexing | Secondary indexes supported. Views supported. No universal index. Indexes updated asynchronously. | Universal index for all JSON documents. Universal index includes automatic range index detection. Indexes eventually consistent, by default. | Universal index for all text, XML, and JSON documents. Views not supported. Supports range indexes. Indexes updated within the ACID transaction. Geospatial 2D indexes. | No universal index. Secondary indexes configurable on named properties. |
| Query and search | Memcached API fully supported. Queries over documents and views supported. | Uses SQL over HTTP for queries. No free text search grammar support. Projection and range queries supported. | Free text (similar to Google search box) search grammar and structured queries both supported. Range queries supported. Aggregates can be calculated during a search. Geospatial queries supported. | Custom JSON query format with support for range queries. No free text search grammar support. Text and Geospatial (GeoJSON) queries supported. |
| Commercials | Commercial-only model. | Provided only on Microsoft’s Azure platform. | Commercial-only model. | AGPL licensed. Commercial licenses available. |
| Other | | Microsoft’s Azure platform hides many of the complexities of scaling out a large database across multiple geographies. | Provides meetups at some MarkLogic offices worldwide. Document-level security model implemented. | Strong support for local meetups at many MongoDB offices worldwide. 10 official and 32 community client drivers. |
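
Two ideas from this section, loading documents without declaring their structure up front and configuring a secondary index on a named property, can be sketched in a few lines. This is a toy in-memory model only; the `DocumentStore` class and its methods are hypothetical and do not mirror the API of any product in the table. Indexed values are assumed to be hashable scalars such as strings and numbers.

```python
from collections import defaultdict


class DocumentStore:
    """Toy schema-agnostic document store with optional secondary indexes."""

    def __init__(self):
        self._docs = {}
        self._indexes = {}  # property name -> {value -> set of doc ids}

    def create_index(self, prop):
        # Build the index from whatever documents already exist.
        index = defaultdict(set)
        for doc_id, doc in self._docs.items():
            if prop in doc:
                index[doc[prop]].add(doc_id)
        self._indexes[prop] = index

    def insert(self, doc_id, doc):
        # No schema check: any dict is accepted. Keep indexes in step,
        # dropping stale entries if the document is being replaced.
        old = self._docs.get(doc_id)
        for prop, index in self._indexes.items():
            if old is not None and prop in old:
                index[old[prop]].discard(doc_id)
            if prop in doc:
                index[doc[prop]].add(doc_id)
        self._docs[doc_id] = doc

    def find(self, prop, value):
        if prop in self._indexes:
            ids = self._indexes[prop].get(value, set())  # index lookup
        else:
            ids = {i for i, d in self._docs.items() if d.get(prop) == value}  # full scan
        return sorted(ids)


store = DocumentStore()
store.insert("a", {"name": "Ada", "city": "London"})
store.insert("b", {"title": "Invoice 7", "total": 19.5, "city": "Paris"})
store.insert("c", {"name": "Grace", "city": "London", "tags": ["navy"]})
store.create_index("city")

print(store.find("city", "London"))  # ['a', 'c'] via the index
print(store.find("total", 19.5))     # ['b'] via a full scan
```

The three documents share no common structure, yet all load without complaint. Only queries on `city` benefit from the index; every other property falls back to scanning all documents, which is the trade-off a "no universal index" entry in the table implies.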

Triple Store and Graph NoSQL Database Features

You can use a triple store or graph NoSQL database if you have a web of interconnected data, or you can simply tag your data and infer relationships according to the records that share the same tags. These database products support these important features.

| Feature Area | AllegroGraph | MarkLogic Server | Neo4j | OrientDB |
| --- | --- | --- | --- | --- |
| ACID or BASE | ACID, fully serializable | ACID, fully serializable | ACID, read committed | ACID, fully serializable or read committed |
| HA Replicas | No | Yes, Sync | No | Yes, Sync |
| DR Replicas | Yes, Async | Yes, Async | Yes, Sync (when available) | TBD |
| Data types | Supports integers, unsigned integers, floating point, decimals, and time and dates. | JSON, binary, XML, free text storage supported. All W3C RDF and XML schema types supported. | Java data types supported. | JSON, binary, and RDF storage supported. |
| Data indexing | Triple indexes optimized for graph style queries. 7 SPOGI indexes. | Triple index optimized for known depth triple store style queries. 4 SPOGI indexes. | Triple indexes optimized for graph style queries (shortest path, subgraph, and so on). 7 SPOGI indexes. | Has own triple index. Optimized for triple store style queries. |
| Query and search | SPARQL 1.0 and 1.1 supported. SPARQL Inferencing Notation (SPIN) API supported. | SPARQL 1.0 compliance, SPARQL 1.1 partial compliance (will be nearly compliant in upcoming version 8). Inferencing support in version 8. | Cypher query language provided, resembling SQL. No standards support. Shortest path, Dijkstra, and A* graph algorithms supported. | No W3C SPARQL or GraphStore protocol support for storing or querying RDF data. Has own query language. |
| Commercials | Commercial-only model. Available from Franz, Inc. Free version available limited to 5 million triples. Developer version available limited to 50 million triples. | Commercial-only model. Entry level “Essential Enterprise” edition for small clusters, and “Global Enterprise” for large clusters. | Provided under AGPL. Commercial license available. Discounted start-up license available. | Favorable commercial terms available for startups. Commercial support available for Apache 2 licensed edition, although feature limited. All features are available only in commercial version. |
| Other | Triple-level security supported. Online backups with point-in-time recovery supported. CLIF++ and RDFS++ supported. Includes a Social Network Analysis (SNA) library. | Record-level (Graph) security support. Provides meetups at some MarkLogic offices worldwide. | Neo Technology recommends SSDs for good performance. | Record-level (Graph) security support. |
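
The idea of tagging data and inferring relationships from shared tags can be sketched with plain subject-predicate-object triples. This is an illustrative toy, not how any product above stores or queries triples; the predicate names `taggedWith` and `relatedTo`, the identifiers, and both helper functions are assumptions made for the example.

```python
from collections import defaultdict

# Each fact is one (subject, predicate, object) triple.
triples = [
    ("article:1", "taggedWith", "nosql"),
    ("article:2", "taggedWith", "nosql"),
    ("article:2", "taggedWith", "graphs"),
    ("article:3", "taggedWith", "graphs"),
    ("article:1", "writtenBy", "author:1"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def infer_related(triples, tag_predicate="taggedWith"):
    """Infer a 'relatedTo' triple for every pair of subjects sharing a tag."""
    by_tag = defaultdict(set)
    for s, _, tag in match(triples, predicate=tag_predicate):
        by_tag[tag].add(s)
    inferred = set()
    for subjects in by_tag.values():
        for a in subjects:
            for b in subjects:
                if a != b:
                    inferred.add((a, "relatedTo", b))
    return sorted(inferred)


related = infer_related(triples)
for triple in related:
    print(triple)
```

Here `article:1` and `article:3` are never linked directly, but both become related to `article:2` through the tags they share with it. A real triple store would express the pattern match in a query language such as SPARQL rather than a Python list comprehension.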