Bigtable/Wide Column Store Features in NoSQL Databases

By Adam Fowler

Part of NoSQL For Dummies Cheat Sheet

Bigtable clones are a type of NoSQL database that emerged from Google’s seminal Bigtable paper. They provide a highly distributed way to manage tabular data. Unlike the tables in a traditional Relational Database Management System (RDBMS), these tables are not related to each other. Here are the most important features of four popular Bigtable clones.
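
To make the tabular model a little more concrete, here is a minimal Python sketch of how a Bigtable-style store addresses a value. It follows the Bigtable paper's description of a sparse, sorted map keyed by row key, column, and timestamp. The class and method names (`WideColumnTable`, `put`, `get`, `scan`) are invented for illustration and are not the API of any database compared here.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy in-memory model: (row key, "family:qualifier", timestamp) -> value."""

    def __init__(self):
        # row key -> column name -> list of (timestamp, value), oldest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        """Store a new version of a cell; older versions are kept."""
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0])

    def get(self, row, column):
        """Return the most recent value for a cell, or None if absent."""
        versions = self._rows.get(row, {}).get(column)
        return versions[-1][1] if versions else None

    def scan(self, start_row, end_row):
        """Yield (row, column, latest value) for row keys in [start_row, end_row)."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                for column in sorted(self._rows[row]):
                    yield row, column, self.get(row, column)


# Rows are sparse: each row stores only the columns it actually uses.
table = WideColumnTable()
table.put("user#001", "profile:name", "Ada", timestamp=1)
table.put("user#001", "profile:name", "Ada L.", timestamp=2)
table.put("user#002", "profile:email", "bob@example.com", timestamp=1)
print(table.get("user#001", "profile:name"))  # Ada L.
```

Note that nothing in this model links one table to another: there are no foreign keys or joins, only lookups and range scans by row key.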

| Feature Area | Accumulo | Cassandra | HBase | Hypertable |
| --- | --- | --- | --- | --- |
| HA Replicas | Yes, Sync | Yes, Async | Yes, Sync | TBD |
| DR Replicas | As HDFS | Yes, Async | As HDFS | TBD |
| Data types | No data type support. | Yes, schema must be defined up front. | No data type support. | No data type support. |
| Data indexing | No secondary indexing. | Not a true “secondary index” feature: it only allows columns to be used in queries and doesn’t speed up data retrieval. Supports Bloom filters. | No indexing. Supports Bloom filters. | Full secondary indexes. |
| Query and search | Uses Map/Reduce for accessing data. | CQL query language similar to SQL. | Uses Map/Reduce for accessing data. Can be used with Hive query | Value exact match and string “starts with” queries. Column exists query term support. No range queries. |
| Commercials | Apache 2. Used in government for secure Bigtable needs. | Commercial version from DataStax. | Apache 2. Available from a number of Hadoop providers. | GPL v3 licensed. |

Other
Accumulo: Role-based access control (RBAC) and cell (per value) level security useful for government use cases. Custom authentication and authorization plug-ins available. Partial encryption at rest of data in Accumulo 1.6. (Intermediate recovery files not encrypted.)
Cassandra: 0.5–1.0TB of data recommended per node. SSD storage recommended. 32GB RAM and 4/8 cores recommended. Recommended AWS system for 1TB of data is 2.2xlarge (60GB RAM + SSD storage), or smaller c3.2large for 100GB of data. Support for encryption of data at rest (but not journal logs).
HBase: Viewed as the slower of the Hadoop-based NoSQL databases. “Endpoints” provide functionality similar to stored procedures.
Hypertable: Adaptive memory allocation feature automatically tunes RAM usage for write-heavy and read-heavy applications.
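
Several entries in the comparison mention Bloom filters. As a rough illustration of why they matter, the Python sketch below shows the general technique: a compact bit array that can report that a key is definitely not in a data file, so the file does not have to be read from disk. This is a generic, simplified Bloom filter with hypothetical names (`BloomFilter`, `add`, `might_contain`) and arbitrary default sizes; it is not the implementation used by any of these databases.

```python
import hashlib


class BloomFilter:
    """Simplified Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bf = BloomFilter()
for row_key in ("user#001", "user#002"):
    bf.add(row_key)
print(bf.might_contain("user#001"))  # True: an added key is never reported absent
```

A lookup for a key that was never added usually returns False, but it can occasionally return True (a false positive). In that case the store simply reads the file and finds nothing, so the filter saves work without affecting correctness.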