Text Analytics Tools for Big Data
Here is an overview of some of the players in the big data text analytics market. Some are small, while others are household names. Some call what they do big data text analytics, while others simply call it text analytics.
Attensity for big data
Attensity, one of the original text analytics companies, began developing and selling products more than ten years ago. At this time, it has over 150 enterprise customers and one of the world’s largest NLP development groups. Attensity offers several engines for text analytics, including Auto-Classification, Entity Extraction, and Exhaustive Extraction. Exhaustive Extraction, Attensity’s flagship technology, automatically extracts facts from parsed text and organizes this information.
The company focuses on social and multichannel analytics and engagement: it analyzes text from internal and external sources for reporting and then routes it to business users for engagement. It recently purchased Biz360, a company that aggregates huge streams of social media content. It has also developed a grid computing system that provides high-performance capabilities for processing massive amounts of real-time text.
Attensity uses a Hadoop framework to store data. It also has a data-queuing system that orchestrates processing: it recognizes spikes in inbound data and spreads the work across more or fewer servers as needed.
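The general idea of scaling processing to match the inbound queue can be sketched in a few lines. This is a minimal illustration, not Attensity's implementation: the function name, the parameters, and the rule itself (size the pool so the current backlog drains within a target time) are all assumptions made for the example.

```python
import math

def workers_needed(queue_depth, per_worker_rate, target_drain_seconds,
                   min_workers=1, max_workers=32):
    """Hypothetical scaling rule: how many workers drain the backlog in time.

    queue_depth          -- documents currently waiting in the queue
    per_worker_rate      -- documents one worker processes per second
    target_drain_seconds -- how quickly the backlog should be cleared
    """
    if per_worker_rate <= 0 or target_drain_seconds <= 0:
        raise ValueError("rate and target time must be positive")
    # Documents a single worker can clear within the target window.
    capacity_per_worker = per_worker_rate * target_drain_seconds
    needed = math.ceil(queue_depth / capacity_per_worker)
    # Clamp to the pool limits so a spike cannot request unbounded servers.
    return max(min_workers, min(max_workers, needed))
```

An orchestrator would call a rule like this periodically with the current queue depth and add or release servers to match the result.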
Clarabridge for big data
Another pure-play text analytics vendor, Clarabridge is a spin-off of a business intelligence (BI) consulting firm called Claraview, which realized the need to deal with unstructured data. Its goal is to help companies drive measurable business value by looking at the customer holistically, pinpointing key experiences and issues, and helping everyone in an organization take action and collaborate in real time.
Its capabilities include determining sentiment and classifying customer feedback text in real time, and staging the verbatim comments for future processing in the Clarabridge system.
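To make the terms concrete, here is a toy sketch of scoring sentiment, assigning categories, and staging a verbatim comment. It is not how Clarabridge works; the word lists, the category names, and the `analyze` function are hypothetical, and a production system would use far richer linguistic models than keyword matching.

```python
import re

# Hypothetical lexicons, chosen only for illustration.
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "rude", "terrible"}
CATEGORIES = {
    "billing": {"invoice", "charge", "refund"},
    "support": {"agent", "call", "helpdesk"},
}

def analyze(verbatim):
    """Return a sentiment label and categories alongside the original text."""
    tokens = re.findall(r"[a-z']+", verbatim.lower())
    # Net count of positive versus negative words decides the label.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    # A category applies when any of its keywords appears in the comment.
    token_set = set(tokens)
    categories = sorted(name for name, words in CATEGORIES.items() if words & token_set)
    # Keep the verbatim with its labels so it can be staged for later processing.
    return {"verbatim": verbatim, "sentiment": sentiment, "categories": categories}
```

Each returned record pairs the untouched comment with its labels, which is the sense in which a verbatim is staged for later, deeper analysis.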
At this time, Clarabridge offers its customers some sophisticated features, including single-click root cause analysis to identify what is causing a change in the volume of text feeds, sentiment, or satisfaction associated with emerging issues. It also offers its solution as Software as a Service (SaaS).
IBM for big data
Software giant IBM offers several solutions in the text analytics space under its Smarter Planet strategy umbrella. Aside from Watson and IBM SPSS, IBM also offers IBM Content Analytics with Enterprise Search. IBM Content Analytics was developed based on work done at IBM Research.
IBM Content Analytics transforms content into analyzed information, which is then available for detailed analysis in much the same way that structured data would be analyzed in a BI toolset. IBM Content Analytics and Enterprise Search were once two separate products.
The converged solution targets both enhanced enterprise search that uses text analytics and stand-alone content analytics needs. IBM Content Analytics with Enterprise Search (ICAES) has tight integration with the IBM InfoSphere BigInsights platform, enabling very large search and content analytics collections.
OpenText for big data
OpenText, a Canada-based company, is probably best known for its leadership in enterprise information management solutions. Its vision revolves around managing, securing, and extracting value from the unstructured data of enterprises. It provides what it terms semantic middleware.
According to the company, its semantic technology evolution is rooted in its capability to enable real-time analytics with high accuracy on large data sets across languages, formats, and industry domains. The idea behind semantic middleware is that semantics can be exposed at different levels and work with different technologies to address business issues.
In other words, the text analytics can be enabled and utilized where needed.
SAS for big data
SAS has been solving complex big data problems for a long time. Several years ago, it purchased text analytics vendor Teragram to enhance its strategy to use both structured and unstructured data in analysis and to integrate this data for descriptive and predictive modeling. Now, its text analytics capabilities are part of its overall analytics platform and text data is viewed as simply another source of data.
SAS continues to innovate in the area of high-performance analytics to ensure that performance meets customer expectations. The goal is to take problems that used to take weeks to solve and solve them in days, and problems that used to take days and solve them in minutes.
For example, the SAS High Performance Analytics Server is an in-memory solution that allows you to develop analytical models using complete data, not just a subset or an aggregate of the data. SAS says that you can use thousands of variables and millions of documents as part of this analysis. The solution runs on EMC Greenplum or Teradata appliances as well as on commodity hardware using the Hadoop Distributed File System (HDFS).