Coding Jobs for Storing Data in SQL and NoSQL Databases
Website and applications use databases to permanently store data to be retrieved at a later time. The data to be stored could be user credentials, pictures, product information, status updates, tweets, or just about anything else. The key consideration for any type of database is its scalability, or the capability to handle more data with increasing frequency. Over the last forty years, two types of databases have emerged for storing data — SQL databases and NoSQL databases.
SQL, or Structured Query Language, is the standard language for relational databases. A relational database stores data in tables of columns and rows, with a unique key assigned to each row. This unique key is then used to create relationships between data in related tables. For example, if a university were building a student database, they might create the following:
Student table with student first and last names, address information, phone number, and emergency contact information. Each student would also have a unique key (student ID).
Student_Courses table with each student’s unique key, the course’s unique key (course ID), and the student’s final grade.
Courses table with the course’s unique key, course name, professor’s name, location, and meeting time.
The tables and the predefined relationship between data elements make a database relational. You might wonder why multiple tables are needed, when instead you could represent all the information in one big table. Two reasons are data consistency and data accuracy.
If the name of a course changed and you were using one big table, you would have to find and change all instances of the course name. With one course table, you could change the course name just once. SQL databases work best when data is not repeated and is instead linked using multiple tables.
The SQL programming language accesses and modifies databases by using SQL queries. A SQL query is inserted into the code of the programming language you’re using (Ruby, Python, or PHP), and the results of the query are then output to your website or data analytics program. If you’re setting up the database yourself, you might want to install a popular database management system such as MySQL or PostgreSQL. Alternatively, if you’re using a BAAS provider for your application, your database can be installed automatically.
NoSQL databases take advantage of advances in hardware to build faster and more efficient databases. Historically, CPUs and storage were expensive, so databases were designed to minimize cost and resources used. If the database grew too large, or the queries started running slowly, the database was moved to a more powerful computer with additional storage.
Today, two things have happened that challenge assumptions used by SQL databases. First, hardware prices of storage and CPUs have dropped, while hard drives have become bigger and CPUs have become more powerful. Second, the amount of data that needs to be stored has increased, and it is not always clear in advance what data will be collected and stored.
For example, Google processes 24 petabytes (1000 terabytes) of data daily — the average consumer hard drive is a little over 1 terabyte.
NoSQL databases are distributed, which means they are stored and processed across multiple servers. Popular NoSQL databases include MongoDB, CouchDB, and Redis. A NoSQL database system has the following advantages:
No schema: Unlike with a SQL database, you do not define the data relationships beforehand. NoSQL databases allow you to change the format or type of data being stored at any time.
Scalable: If your database becomes too large, you can easily add another relatively cheap server without any downtime. For SQL databases, scaling your database requires buying an expensive server and experiencing downtime to migrate and install the database on a new system.
Cheap: You can use inexpensive consumer‐level hardware for a NoSQL database. For even greater ease, you can use a product such as Amazon web Services (AWS) to cheaply rent storage space and set up a NoSQL database with just a click, without buying physical servers.
NoSQL databases also spread the processing load across multiple CPUs. In a process called MapReduce, a large data set is divided into smaller data sets, which are each processed independently. The results from these individual data sets are then reduced back into a single data set or result.
For example, suppose you want to find the Facebook user with the most friends. Instead of using one computer to search Facebook’s 1.5 billion users, you could split the users into fifteen data sets of 100,000 users each, and then map each data set to one of fifteen computers.
Each computer would process its list to find the user with the most friends, after which each result would be reduced into one smaller data set of fifteen names. The final operation would find the user with the most friends from this smaller list of fifteen users.
NoSQL databases have many advantages, but there are some obstacles as well. The database system is relatively new, so there is less technical support and fewer people with expertise to set up, administer, and develop NoSQL databases. Additionally, because running simple queries in NoSQL databases can be difficult, doing analytics and business reporting for NoSQL databases is not as developed and is much harder than doing similar work for SQL databases.