Best Practices for Big Data Integration
Many companies are exploring big data problems and coming up with some innovative solutions. Now is the time to pay attention to some best practices, or basic principles, that will serve you well as you begin your big data journey.
In reality, big data integration fits into the overall process of integration of data across your company. Therefore, you can't simply toss aside everything you have learned from data integration of traditional data sources. The same rules apply whether you are thinking about traditional data management or big data management.
Keep these key issues at the top of your priority list for big data integration:
Keep data quality in perspective. Your emphasis on data quality depends on the stage of your big data analysis. Don't expect to be able to control data quality when you do your initial analysis on huge volumes of data. However, once you narrow down your big data to the subset that is most meaningful to your organization, you need to focus on data quality.
Ultimately, data quality becomes important if you want your results to be understood in context with your historical data. As your company relies more and more on analytics as a key planning tool, data quality can mean the difference between success and failure.
Consider real-time data requirements. Big data will bring streaming data to the forefront. Therefore, you need a clear understanding of how to integrate data in motion into your environment so that your analysis remains predictable.
Don't create new silos of information. While so much of the emphasis around big data is focused on Hadoop and on unstructured and semi-structured sources, remember that you have to manage this data in context with the business. You will therefore need to integrate these sources with your line of business data and your data warehouse.
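As a rough illustration of the points above, the following minimal sketch narrows a semi-structured feed to a meaningful subset, applies data quality checks only to that subset, and joins it with line of business records instead of leaving it in its own silo. Everything here is a hypothetical assumption made for the example: the sample events, the `customers` lookup, the field names, and the `integrate` function are not taken from any particular product or pipeline.

```python
import json

# Hypothetical semi-structured events, one JSON object per line
# (for example, the output of a batch job over raw logs).
raw_events = """\
{"customer_id": "C1", "event": "page_view"}
{"customer_id": "C2", "event": "purchase", "amount": 40.0}
{"customer_id": null, "event": "purchase", "amount": 15.0}
{"customer_id": "C9", "event": "purchase", "amount": 25.0}
"""

# Hypothetical line of business records keyed by customer ID.
customers = {"C1": {"region": "EMEA"}, "C2": {"region": "APAC"}}


def integrate(lines, customers):
    """Narrow events to purchases, check quality, and join with customer data.

    Returns a (joined, rejected) pair. Rejected events are kept with a reason
    so that quality problems stay visible rather than silently disappearing.
    """
    joined, rejected = [], []
    for line in lines.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        # Narrow to the subset that matters before worrying about quality.
        if event.get("event") != "purchase":
            continue
        customer_id = event.get("customer_id")
        if customer_id is None:
            rejected.append((event, "missing customer_id"))
        elif customer_id not in customers:
            rejected.append((event, "unknown customer_id"))
        else:
            # Enrich the event with business context from the customer record.
            joined.append({**event, **customers[customer_id]})
    return joined, rejected


joined, rejected = integrate(raw_events, customers)
```

In this sketch, the page view is never quality-checked because it falls outside the subset of interest, while purchases that cannot be tied back to a known customer are set aside with a reason instead of being loaded into the warehouse.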