Articles From Allen G. Taylor
Filter Results
Cheat Sheet / Updated 04-12-2024
SQL is a popular and useful programming language. You can make SQL even more useful if you know the phases of SQL development, the criteria for normal forms, the data types used by SQL, a little bit about set and value functions, as well as some tips on how to filter tables with WHERE clauses.
View Cheat SheetCheat Sheet / Updated 04-12-2024
A wide range of tools is available that are designed to help big businesses and small take advantage of the data science revolution. Among the most essential of these tools are Microsoft Power BI, Tableau, SQL, and the R and Python programming languages.
View Cheat SheetArticle / Updated 09-30-2022
Depending on their histories, different SQL implementations support a variety of data types. The SQL specification recognizes nine predefined general types, shown in the lists below. Exact Numerics: INTEGER SMALLINT BIGINT NUMERIC DECIMAL DECFLOAT Approximate Numerics: REAL DOUBLE PRECISION FLOAT Boolean: BOOLEAN Character Strings: CHARACTER (CHAR) CHARACTER VARYING (VARCHAR) NATIONAL CHARACTER (NCHAR) NATIONAL CHARACTER VARYING (NVARCHAR) Datetimes: DATE TIME TIMESTAMP TIME WITH TIMEZONE TIMESTAMP WITH TIMEZONE Intervals: INTERVAL DAY INTERVAL YEAR Large Objects: BLOB CLOB Collection Types: ARRAY MULTISET Other Types: ROW XML
View ArticleArticle / Updated 09-30-2022
To ensure that database tables are designed in such a way that they will hold your data reliably, you need to be sure that they are not subject to modification anomalies. Normalizing your databases will give you that assurance. Compare the SQL criteria in the following list to the tables in your database. Doing so will alert you to the possibility of anomalies, when you find that your database is not sufficiently normalized. First Normal Form (1NF): Table must be two-dimensional, with rows and columns. Each row contains data that pertains to one thing or one portion of a thing. Each column contains data for a single attribute of the thing being described. Each cell (intersection of row and column) of the table must be single-valued. All entries in a column must be of the same kind. Each column must have a unique name. No two rows may be identical. The order of the columns and of the rows does not matter. Second Normal Form (2NF): Table must be in first normal form (1NF). All non-key attributes (columns) must be dependent on the entire key. Third Normal Form (3NF): Table must be in second normal form (2NF). Table has no transitive dependencies. Domain-Key Normal Form (DK/NF): Every constraint on the table is a logical consequence of the definition of keys and domains.
View ArticleCheat Sheet / Updated 01-27-2022
This Cheat Sheet consists of several helpful tables and lists, containing information that comes up repeatedly when working with structured query language (SQL). In one place, you can get a quick answer to a number of different questions that frequently arise during an SQL development effort.
View Cheat SheetArticle / Updated 01-18-2022
SQL gives you options for retrieving, analyzing, and displaying the information you need with the GROUP BY, HAVING, and ORDER BY clauses. Here are some examples of how you can use them. GROUP BY clauses Sometimes, rather than retrieving individual records, you want to know something about a group of records. The GROUP BY clause is the tool you need. Suppose you’re the sales manager of another location, and you want to look at the performance of your sales force. If you do a simple SELECT, such as the following query: SELECT InvoiceNo, SaleDate, Salesperson, TotalSale FROM SALES; This result gives you some idea of how well your salespeople are doing because so few total sales are involved. However, in real life, a company would have many more sales — and it wouldn’t be so easy to tell whether sales objectives were being met. To do the real analysis, you can combine the GROUP BY clause with one of the aggregate functions (also called set functions) to get a quantitative picture of sales performance. For example, you can see which salesperson is selling more of the profitable high-ticket items by using the average (AVG) function as follows: SELECT Salesperson, AVG(TotalSale) FROM SALES GROUP BY Salesperson; Running the query with a different database management system would retrieve the same result, but might appear a little different. The average value of Bennett’s sales is considerably higher than that of the other two salespeople. You compare total sales with a similar query: SELECT Salesperson, SUM(TotalSale) FROM SALES GROUP BY Salesperson; Bennett also has the highest total sales, which is consistent with having the highest average sales. HAVING clauses You can analyze the grouped data further by using the HAVING clause. The HAVING clause is a filter that acts similar to a WHERE clause, but on groups of rows rather than on individual rows. To illustrate the function of the HAVING clause, suppose the sales manager considers Bennett to be in a class by himself. His performance distorts the overall data for the other salespeople. (Aha — a curve-wrecker.) You can exclude Bennett’s sales from the grouped data by using a HAVING clause as follows: SELECT Salesperson, SUM(TotalSale) FROM SALES GROUP BY Salesperson HAVING Salesperson <>'Bennett'; Only rows where the salesperson is not Bennett are considered. ORDER BY clauses Use the ORDER BY clause to display the output table of a query in either ascending or descending alphabetical order. Whereas the GROUP BY clause gathers rows into groups and sorts the groups into alphabetical order, ORDER BY sorts individual rows. The ORDER BY clause must be the last clause that you specify in a query. If the query also contains a GROUP BY clause, the clause first arranges the output rows into groups. The ORDER BY clause then sorts the rows within each group. If you have no GROUP BY clause, then the statement considers the entire table as a group, and the ORDER BY clause sorts all its rows according to the column (or columns) that the ORDER BY clause specifies. To illustrate this point, consider the data in the SALES table. The SALES table contains columns for InvoiceNo, SaleDate, Salesperson, and TotalSale. If you use the following example, you see all the data in the SALES table — but in an arbitrary order: SELECT * FROM SALES ; In one implementation, this may be the order in which you inserted the rows in the table; in another implementation, the order may be that of the most recent updates. The order can also change unexpectedly if anyone physically reorganizes the database. That’s one reason it’s usually a good idea to specify the order in which you want the rows. You may, for example, want to see the rows in order by the SaleDate like this: SELECT * FROM SALES ORDER BY SaleDate ; This example returns all the rows in the SALES table in order by SaleDate. For rows with the same SaleDate, the default order depends on the implementation. You can, however, specify how to sort the rows that share the same SaleDate. You may want to see the sales for each SaleDate in order by InvoiceNo, as follows: SELECT * FROM SALES ORDER BY SaleDate, InvoiceNo ; This example first orders the sales by SaleDate; then for each SaleDate, it orders the sales by InvoiceNo. But don’t confuse that example with the following query: SELECT * FROM SALES ORDER BY InvoiceNo, SaleDate ; This query first orders the sales by INVOICE_NO. Then for each different InvoiceNo, the query orders the sales by SaleDate. This probably won’t yield the result you want, because it’s unlikely that multiple sale dates will exist for a single invoice number. The following query is another example of how SQL can return data: SELECT * FROM SALES ORDER BY Salesperson, SaleDate ; This example first orders by Salesperson and then by SaleDate. After you look at the data in that order, you may want to invert it, as follows: SELECT * FROM SALES ORDER BY SaleDate, Salesperson ; This example orders the rows first by SaleDate and then by Salesperson. All these ordering examples are in ascending (ASC) order, which is the default sort order. The last SELECT shows earlier sales first — and, within a given date, shows sales for ‘Adams’ before ‘Baker’. If you prefer descending (DESC) order, you can specify this order for one or more of the order columns, as follows: SELECT * FROM SALES ORDER BY SaleDate DESC, Salesperson ASC ; This example specifies a descending order for sale dates, showing the more recent sales first, and an ascending order for salespeople, putting them in alphabetical order. That should give you a better picture of how Bennett’s performance stacks up against that of the other salespeople.
View ArticleArticle / Updated 01-14-2022
After you create a database in SQL, the next step is to fill it with data. The next step after that is to wait around until there's a need for some particular bit of information contained somewhere in that data, like a needle in a haystack. When the time comes and you want to find that needle, you can use SQL's Data Manipulation Language (DML) to perform a virtual needle extraction. Sometimes the data you want is not stored in any single table in your database, but instead pieces of it are scattered across multiple tables. SQL offers several methods of gathering such far-flung data and presenting it to you integrated together in a nice compact result set, some of which are described here: Relational operators have the ability to combine information from multiple sources in a variety of ways. For example, the UNION operator will return all the rows that appear in either of two compatible tables. In contrast, the INTERSECT operator will return only the rows that appear in both of two compatible tables. When you use the EXCEPT operator, all the rows from one table will be returned, except for rows that match rows in the second table. In addition, a wide variety of join operators will enable you to fine-tune your retrievals, pulling just the data you want from whichever tables it might reside in. Nested queries allow you to retrieve data from multiple tables. These are queries on one table that include a subquery on a different table. The subquery is executed first and its result is then passed to the main query to provide a result that contains information from both. Recursive queries have the rather amazing ability to call themselves. This ability can be a major time saver, enabling a search tree to be traversed in a lot less time that an exhaustive search would take. Bill of Materials databases are places where recursive queries can deliver major performance gains.
View ArticleArticle / Updated 01-14-2022
There are three sources of modification anomalies in SQL These are defined as first, second, and third normal forms (1NF, 2NF, 3NF). These normal forms act as remedies to modification anomalies. First normal form To be in first normal form (1NF), a table must have the following qualities: The table is two-dimensional with rows and columns. Each row contains data that pertains to some thing or portion of a thing. Each column contains data for a single attribute of the thing it’s describing. Each cell (intersection of a row and a column) of the table must have only a single value. Entries in any column must all be of the same kind. If, for example, the entry in one row of a column contains an employee name, all the other rows must contain employee names in that column, too. Each column must have a unique name. No two rows may be identical (that is, each row must be unique). The order of the columns and the order of the rows are not significant. A table (relation) in first normal form is immune to some kinds of modification anomalies but is still subject to others. The SALES table is in first normal form, and the table is subject to deletion and insertion anomalies. First normal form may prove useful in some applications but unreliable in others. Second normal form To appreciate second normal form, you must understand the idea of functional dependency. A functional dependency is a relationship between or among attributes. One attribute is functionally dependent on another if the value of the second attribute determines the value of the first attribute. If you know the value of the second attribute, you can determine the value of the first attribute. Suppose, for example, that a table has attributes (columns) StandardCharge, NumberOfTests, and TotalCharge that relate through the following equation: TotalCharge = StandardCharge * NumberOfTests TotalCharge is functionally dependent on both StandardCharge and NumberOfTests. If you know the values of StandardCharge and NumberOfTests, you can determine the value of TotalCharge. Every table in first normal form must have a unique primary key. That key may consist of one or more than one column. A key consisting of more than one column is called a composite key. To be in second normal form (2NF), all non-key attributes must depend on the entire key. Thus, every relation that is in 1NF with a single attribute key is automatically in second normal form. If a relation has a composite key, all non-key attributes must depend on all components of the key. If you have a table where some non-key attributes don’t depend on all components of the key, break the table up into two or more tables so that — in each of the new tables — all non-key attributes depend on all components of the primary key. Sound confusing? Look at an example to clarify matters. Consider the SALES table. Instead of recording only a single purchase for each customer, you add a row every time a customer buys an item for the first time. An additional difference is that charter customers (those with Customer_ID values of 1001 to 1007) get a discount off the normal price. Customer_ID does not uniquely identify a row. In two rows, Customer_ID is 1001. In two other rows, Customer_ID is 1010. The combination of the Customer_ID column and the Product column uniquely identifies a row. These two columns together are a composite key. If not for the fact that some customers qualify for a discount and others don’t, the table wouldn’t be in second normal form, because Price (a non-key attribute) would depend only on part of the key (Product). Because some customers do qualify for a discount, Price depends on both CustomerID and Product, and the table is in second normal form. Third normal form Tables in second normal form are especially vulnerable to some types of modification anomalies — in particular, those that come from transitive dependencies. A transitive dependency occurs when one attribute depends on a second attribute, which depends on a third attribute. Deletions in a table with such a dependency can cause unwanted information loss. A relation in third normal form is a relation in second normal form with no transitive dependencies. Look again at the SALES table, which you know is in first normal form. As long as you constrain entries to permit only one row for each Customer_ID, you have a single-attribute primary key, and the table is in second normal form. However, the table is still subject to anomalies. What if customer 1010 is unhappy with the chlorine bleach, for example, and returns the item for a refund? You want to remove the third row from the table, which records the fact that customer 1010 bought chlorine bleach. You have a problem: If you remove that row, you also lose the fact that chlorine bleach has a price of $4. This situation is an example of a transitive dependency. Price depends on Product, which, in turn, depends on the primary key Customer_ID. Breaking the SALES table into two tables solves the transitive dependency problem. The two tables make up a database that’s in third normal form.
View ArticleArticle / Updated 01-14-2022
Every column contains one value for each row of a table. SQL statements often refer to such values. A fully qualified column reference consists of the table name, a period, and then the column name (for example, PRICING.Product). Consider the following statement: SELECT PRICING.Cost FROM PRICING WHERE PRICING.Product = 'F-35' ; Here PRICING.Product is a column reference. This reference contains the value ‘F-35’. PRICING.Cost is also a column reference, but you don’t know its value until the preceding SELECT statement executes. Because it only makes sense to reference columns in the current table, you don’t generally need to use fully qualified column references. The following statement, for example, is equivalent to the previous one: SELECT Cost FROM PRICING WHERE Product = 'F-35' ; Sometimes you may be dealing with more than one table — say, when two tables in a database contain one or more columns with the same name. In such a case, you must fully qualify column references for those columns to guarantee that you get the column you want. For example, suppose that your company maintains facilities in both Kingston and Jefferson, and you maintain separate employee records for each site. You name the Kingston employee table EMP_KINGSTON, and you name the Jefferson employee table EMP_JEFFERSON. You want a list of employees who work at both sites, so you need to find the employees whose names appear in both tables. The following SELECT statement gives you what you want: SELECT EMP_KINGSTON.FirstName, EMP_KINGSTON.LastName FROM EMP_KINGSTON, EMP_JEFFERSON WHERE EMP_KINGSTON.EmpID = EMP_JEFFERSON.EmpID ; Because each employee’s ID number is unique and remains the same regardless of the work site, you can use this ID as a link between the two tables. This retrieval returns only the names of employees who appear in both tables.
View ArticleArticle / Updated 01-14-2022
Referential integrity involves maintaining consistency in a multitable SQL database. You can lose integrity by adding a row to a child table that doesn’t have a corresponding row in the child’s parent table. You can cause similar problems by deleting a row from a parent table if rows corresponding to that row exist in a child table. Suppose your business has a CUSTOMER table that keeps track of all your customers and a SALES table that records all sales transactions. You don’t want to add a row to SALES until after you enter the customer making the purchase into the CUSTOMER table. You also don’t want to delete a customer from the CUSTOMER table if that customer made purchases that exist in the SALES table. Before you perform an insertion or a deletion, you may want to check the candidate row to make sure that inserting or deleting that row doesn’t cause integrity problems. The MATCH predicate can perform such a check. Say you have a CUSTOMER table and a SALES table. CustomerID is the primary key of the CUSTOMER table and acts as a foreign key in the SALES table. Every row in the CUSTOMER table must have a unique CustomerID that isn’t null. CustomerID isn’t unique in the SALES table, because repeat customers buy more than once. This situation is fine; it doesn’t threaten integrity because CustomerID is a foreign key rather than a primary key in that table. Seemingly, CustomerID can be null in the SALES table, because someone can walk in off the street, buy something, and walk out before you get a chance to enter his or her name and address into the CUSTOMER table. This situation can create trouble — a row in the child table with no corresponding row in the parent table. To overcome this problem, you can create a generic customer in the CUSTOMER table and assign all anonymous sales to that customer. Say that a customer steps up to the cash register and claims that she bought an F-35 Strike Fighter on December 18, 2012. Although she has lost her receipt, she now wants to return the plane because it shows up like an aircraft carrier on opponents’ radar screens. You can verify whether she bought an F-35 by searching your SALES database for a match. First, you must retrieve her CustomerID into the variable vcustid; then you can use the following syntax: ... WHERE (:vcustid, 'F-35', '2017-12-18') MATCH (SELECT CustomerID, ProductID, SaleDate FROM SALES) If the MATCH predicate returns a True value, the database contains a sale of the F-35 on December 18, 2017, to this client’s CustomerID. Take back the defective product and refund the customer’s money. (Note: If any values in the first argument of the MATCH predicate are null, a True value always returns.) SQL’s developers added the MATCH predicate and the UNIQUE predicate for the same reason — they provide a way to explicitly perform the tests defined for the implicit referential integrity (RI) and UNIQUE constraints. The general form of the MATCH predicate is as follows: <i>Row_value</i>MATCH [UNIQUE] [SIMPLE| PARTIAL | FULL ] <i>Subquery</i> The UNIQUE, SIMPLE, PARTIAL, and FULL options relate to rules that come into play if the row value expression R has one or more columns that are null. The rules for the MATCH predicate are a copy of corresponding referential integrity rules.
View Article