What You Should Know about Relational Databases and SQL to Get a Big Data Job - dummies

By Jason Williamson

Relational database management systems (RDBMSs) form the foundation of any big data project. RDBMSs were originally used for online transaction processing (OLTP) systems, before the advent of data warehousing. A relational database organizes data into tables of rows and columns, much like a Microsoft Excel spreadsheet.

These systems are built around tables, columns, and unique keys. Data is stored in rows and accessed using a language called Structured Query Language (SQL). People use RDBMSs to store structured data.
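To make tables, columns, and unique keys concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory SQLite database. SQLite is an assumption chosen only because it needs no server; the table name, columns, and sample names are hypothetical. The sketch shows that a column declared as the unique key (PRIMARY KEY) rejects a second row with the same value.

```python
import sqlite3

# In-memory SQLite database; table, columns, and names are hypothetical.
conn = sqlite3.connect(":memory:")

# A table is a named set of columns; each row is one record.
# "id" is the unique key: no two rows may share the same value.
conn.execute("""
    CREATE TABLE student (
        id    INTEGER PRIMARY KEY,
        first TEXT NOT NULL,
        last  TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO student (id, first, last) VALUES (?, ?, ?)",
             (1, "Mark", "Brown"))

# A second row reusing id 1 violates the unique key and is rejected.
try:
    conn.execute("INSERT INTO student (id, first, last) VALUES (?, ?, ?)",
                 (1, "Jane", "Doe"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT id, first, last FROM student").fetchall()
print(rows)                # [(1, 'Mark', 'Brown')]
print(duplicate_rejected)  # True
```

Because the key is unique, any single row can be found again by its id alone, which is what the join in the query example relies on.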

To access these databases, programmers use SQL to write a query that asks for information.

The following code is an example of SQL. This SQL statement, or query, names the columns it needs and links the student and grade tables through a shared unique ID, returning the two students whose grades are above 90. The key here is that unique ID, which identifies each discrete row within the database system.


SELECT student.first, student.last
FROM student
JOIN grade ON student.id = grade.id  -- link rows that share the same unique ID
WHERE grade.grade > 90;
Result:
Mark Brown
John Good
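To try this query yourself, the sketch below runs it against a throwaway SQLite database through Python's standard sqlite3 module. It names both tables explicitly with JOIN ... ON, so the grade table appears in the FROM clause alongside student. The schema is an assumption inferred from the column names in the query, and apart from Mark Brown and John Good, the rows and all grade values are invented for illustration. The ORDER BY clause is also an addition: SQL does not guarantee row order without one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: grade.id holds the same unique ID as student.id.
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    CREATE TABLE grade   (id INTEGER PRIMARY KEY, grade INTEGER);
""")

# Hypothetical sample data: only Mark Brown and John Good come from the article;
# the third student and all grade values are made up.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Mark", "Brown"), (2, "Ann", "Lee"), (3, "John", "Good")])
conn.executemany("INSERT INTO grade VALUES (?, ?)",
                 [(1, 95), (2, 82), (3, 91)])

# Match each student to a grade row by the shared id, keep grades above 90.
query = """
    SELECT student.first, student.last
    FROM student
    JOIN grade ON student.id = grade.id
    WHERE grade.grade > 90
    ORDER BY student.id
"""
results = conn.execute(query).fetchall()
for first, last in results:
    print(first, last)
# Mark Brown
# John Good
```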

This example illustrates two important points about database systems:

  • Storage: Data must be collected and stored in a defined, or structured, format.

  • Access: You must have a programmatic method to access that data, and that method is SQL. SQL is not a database management system; it’s a standard language for accessing data.
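Both points can be seen in a few lines of application code. As a minimal sketch, again assuming Python's sqlite3 module and an in-memory SQLite database, the structure is declared before any rows go in (storage), and a hypothetical helper, students_above, sends the grade query from a program (access). It passes the threshold through a ? placeholder rather than pasting it into the SQL string, which is the usual way to keep input values from being read as SQL. All names and grades are illustrative.

```python
import sqlite3

# Storage: tables and columns with declared types are defined before data arrives.
# Table and column names are hypothetical, mirroring the article's example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, first TEXT NOT NULL, last TEXT NOT NULL);
    CREATE TABLE grade   (id INTEGER PRIMARY KEY, grade INTEGER NOT NULL);
""")
# Hypothetical sample rows and grades.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Mark", "Brown"), (2, "Ann", "Lee"), (3, "John", "Good")])
conn.executemany("INSERT INTO grade VALUES (?, ?)", [(1, 95), (2, 82), (3, 91)])


def students_above(conn: sqlite3.Connection, threshold: int) -> list[str]:
    """Hypothetical helper: full names of students whose grade is above
    threshold, highest grade first."""
    # Access: "?" binds threshold as a value, not as part of the SQL text.
    rows = conn.execute(
        """
        SELECT student.first || ' ' || student.last
        FROM student
        JOIN grade ON student.id = grade.id
        WHERE grade.grade > ?
        ORDER BY grade.grade DESC
        """,
        (threshold,),
    ).fetchall()
    return [name for (name,) in rows]


print(students_above(conn, 90))  # ['Mark Brown', 'John Good']
print(students_above(conn, 80))  # ['Mark Brown', 'John Good', 'Ann Lee']
```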

This extremely simple example reflects the basic framework for how most of the world’s data is stored and accessed. Even if you don’t plan on becoming a database programmer, you’ll need a good grasp of SQL for any big data work.

Products from a few key vendors store most of the world’s data today. IBM Db2, Oracle, and Microsoft SQL Server together account for almost 90 percent of the market for commercial database management systems. Open-source products include MySQL, which is maintained by Oracle, and PostgreSQL.