Collecting Structured and Unstructured Data - dummies

By Mico Yuk, Stephanie Diamond

Data collected in the past was structured and could fit into neat rows and columns. An example of this would be an Excel spreadsheet with delimited data (data that is separated by a specific character, such as a comma). Most internal information specialists were content to display this data (such as customer records) in long spreadsheets. They were tasked with reporting what the data said, and everyone used the same results.
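To make the idea of delimited data concrete, here is a minimal sketch in Python that reads comma-delimited customer records into rows and columns. The field names and values (`customer_id`, `name`, `product`) are invented for illustration and are not taken from any real data set.

```python
import csv
import io

# Hypothetical customer records; each field is separated by a comma.
raw = "customer_id,name,product\n101,Ana,Widget\n102,Ben,Gadget\n"

# DictReader uses the first line as column names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    print(row["customer_id"], row["name"], row["product"])
```

Each record lands in the same fixed set of columns, which is what makes this kind of data easy to display in a spreadsheet.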

With structured data, there was no opportunity to visualize the story that the data told in order to extract valuable insights. The data wasn’t interactive and didn’t allow for customization. It was valuable to a point, but there was no way to understand what the customer thought about the product after they bought it. You would only know that the product had been bought. And that data is only one part of the puzzle.

Today, companies are facing a mountain of a new type of data: unstructured data, which doesn’t always come in a neat package. Following are a few examples of this type of data:

  • Opinions: Opinions are gathered by review sites such as Yelp, shown in the following figure. You can access the reviews directly or use a tool that scrapes the data from the site so that you can put that data in your own data-viz tool.


  • Visuals: Visuals are chosen by users of sites such as Pinterest, shown in the figure below. In the case of Pinterest, you can access the site to see which images about and by your company have been pinned by customers who are searching for your company’s name. You may have data about which of your company’s own pins are being re-pinned by others, as well as data about people who have seen your company’s product or image elsewhere on the web and have pinned it directly to Pinterest for others to find.


  • Smartphone data: Phone records, e-mails, and search data are available from your phone.

This unstructured content represents data that’s incredibly valuable to any online business. The key to using the data is software (such as SAP) that enables you to combine structured data with unstructured data to gain a greater understanding of the business and its customers. From this analysis, companies can begin to make predictions about customer behavior and revenue generation.

Typically, organizations that use unstructured data use natural language processing software to analyze it.
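
As a rough illustration of combining the two kinds of data, the sketch below joins structured purchase records with free-text reviews. Everything in it is an assumption made for the example: the products, the review text, and the word lists are invented, and the simple keyword count is only a stand-in for real natural language processing software, not a description of how any particular product works.

```python
# Structured data: purchase records with fixed columns (hypothetical values).
purchases = [
    {"product": "Widget", "units_sold": 120},
    {"product": "Gadget", "units_sold": 45},
]

# Unstructured data: free-text reviews paired with a product (hypothetical).
reviews = [
    ("Widget", "Love it, works great and arrived fast."),
    ("Widget", "Broke after a week, very disappointed."),
    ("Gadget", "Great value, I love the design."),
]

# Tiny illustrative word lists; real NLP tools are far more sophisticated.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"broke", "disappointed", "slow"}


def score(text):
    """Crude keyword count: positive words minus negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def combine(purchases, reviews):
    """Attach a summed review score to each structured purchase record."""
    totals = {}
    for product, text in reviews:
        totals[product] = totals.get(product, 0) + score(text)
    return [dict(p, review_score=totals.get(p["product"], 0)) for p in purchases]


report = combine(purchases, reviews)
for line in report:
    print(line)
```

The structured records say only how many units were sold; the added `review_score` column hints at what customers thought afterward, which is the part of the puzzle that sales figures alone leave out.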