Understanding Unstructured Data
Unstructured data is different than structured data in that its structure is unpredictable. Examples of unstructured data include documents, e-mails, blogs, digital images, videos, and satellite imagery. It also includes some data generated by machines or sensors. In fact, unstructured data accounts for the majority of data that’s on your company’s premises as well as external to your company in online private and public sources such as Twitter and Facebook.
In the past, most companies weren’t able to either capture or store this vast amount of data. It was simply too expensive or too overwhelming. Even if companies were able to capture the data, they didn’t have the tools to easily analyze the data and use the results to make decisions. Very few tools could make sense of these vast amounts of data. The tools that did exist were complex to use and did not produce results in a reasonable time frame.
In the end, those who really wanted to go to the enormous effort of analyzing this data were forced to work with snapshots of data. This has the undesirable effect of missing important events because they were not in a particular snapshot.
One approach that is becoming increasingly valued as a way to gain business value from unstructured data is text analytics, the process of analyzing unstructured text, extracting relevant information, and transforming it into structured information that can then be leveraged in various ways. The analysis and extraction processes take advantage of techniques that originated in computational linguistics, statistics, and other computer science disciplines.