Using Scrum for Big Data and Large-Scale Migration

By Mark C. Layton, David Morrow

Scrum can help manage the heaping mounds of data in the modern world. The sheer scale of data is astounding, and it’s only getting bigger. Trying to get your head around the size of Big Data is much like trying to picture a huge mathematical phenomenon such as the speed of the expanding universe. Big Data is so big that it goes beyond what most people can imagine.

The following numbers are just a few examples of how big Big Data can be:

  • Walmart conducts 1 million transactions per hour, feeding databases of more than 2.5 petabytes (about 167 times the size of the data in all the books in the Library of Congress).
  • Facebook houses more than 40 billion photographs.
  • eBay handles 50 petabytes of information every day.
  • Decoding the human genome requires analyzing 3 billion base pairs. The first time, the process took ten years; now it takes one week.
  • Cisco estimates that in 2016, global Internet traffic ran at an annual rate of 1.2 zettabytes (ZB), or 96 exabytes (EB) per month. Annual global Internet traffic is expected to reach 3.3 ZB, or 278 EB per month, by 2021.
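
The monthly and annual traffic figures in the last item can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 ZB = 1,000 EB); the function names are illustrative and do not come from Cisco's report.

```python
# Decimal (SI) units: 1 zettabyte (ZB) = 1,000 exabytes (EB).
EB_PER_ZB = 1000
MONTHS_PER_YEAR = 12


def monthly_eb_to_annual_zb(eb_per_month: float) -> float:
    """Convert a monthly traffic figure in EB to an annual figure in ZB."""
    return eb_per_month * MONTHS_PER_YEAR / EB_PER_ZB


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


traffic_2016 = monthly_eb_to_annual_zb(96)    # 1.152 ZB, quoted as 1.2 ZB
traffic_2021 = monthly_eb_to_annual_zb(278)   # 3.336 ZB, quoted as 3.3 ZB
growth = compound_annual_growth(traffic_2016, traffic_2021, years=5)

print(f"2016: {traffic_2016:.3f} ZB/year")
print(f"2021: {traffic_2021:.3f} ZB/year")
print(f"Implied growth: {growth:.1%} per year")
```

The monthly figures are consistent with the rounded annual ones, and together they imply that traffic grows by roughly a quarter every year over the five-year span.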

The importance of Big Data can’t be overstated. Much of this data is highly personal and sensitive, and it can affect lives as well as bottom lines. The challenge is to gather, manage, and interpret data quickly, effectively, and correctly. Also, possible future uses for this data must be considered in the design of storage and retrieval processes. This data needs to become useful intelligence, not just data.

A significant challenge is that 80 percent of this data is unstructured (such as emails, blogs, spreadsheets, text documents, images, video, voice, and search logs). This unstructured segment is growing faster than structured data. In other words, the majority of data is a huge mess.

Data security and protecting privacy are more important than ever; at the same time, security is more difficult to ensure than ever. Traditional data management frameworks and processes aren’t capable of processing data at this scale. Speed, flexibility, and instant feedback are needed. Six months is too long to wait to find out whether a new, untested system works. And chances are that in six months, the requirements will have changed, or a new gap will have been identified.

To deal with this tsunami of data, many firms and organizations are moving to the cloud. Many organizations have their own in-house clouds or virtualized environments.

Data warehouse project management

Data warehouse projects are traditionally thought to be difficult to manage. Although each segment or phase of the project may have a discernible beginning and end, the data warehouse itself is never finished; it’s continually growing and changing.

A data warehouse isn’t some barbed-wire-fenced building on the outskirts of town. Rather, it’s a process or framework for handling data within a firm or organization, or a knowledge-based applications architecture that facilitates strategic and tactical decision-making.

A further complexity is that continuous merger and acquisition activity creates enterprisewide data-integration issues. New assets and groups are acquired or spun off, and corresponding data and processes need to be managed. Maintaining diverse legacy applications that don’t integrate well can be costlier than conversion projects.

Enterprise resource planning

Enterprise resource planning (ERP) is a suite of integrated and dynamic software applications that organizations and corporations use to gather, manage, interpret, and integrate business processes, including planning, purchasing, manufacturing, inventory, marketing, sales, distribution, finance, and human resources.

Implementing an ERP system usually means doing simultaneous development across various functional areas (such as marketing, sales, inventory, and purchasing) to conduct a specific business transaction. Implementation involves the design and configuration of many modules simultaneously. These modules, while being developed individually, must also be designed for cross-functional application. During design of the sales module, for example, careful consideration is given to both upstream and downstream processes.

Think about how sales fits in the overall end-to-end process. You start with inventory, and subsequently, you need to be able to bill your orders. Therefore, the sales module must seamlessly integrate with your inventory module and your finance module (and your inventory module must integrate with your manufacturing and purchasing modules, which must integrate with your finance modules, and so on).

Unfortunately, designing and building these individual modules traditionally takes years before the integrated testing phase begins. By this time, any gaps between modules require even more time to identify and fix. One small gap between sales and finance can result in months of extra work. Often, the fix doesn’t integrate perfectly with another module somewhere else in the process. When everyone works in silos until the integrated testing phase, early detection of gaps and misfits is difficult.

Traditionally, ERP providers handled this interdependency by locking in a specific development sequence. In fact, even parameters that weren’t going to be used needed to be configured in the order defined by the ERP provider.

Now, with scaled scrum teams, you can do that customization in parallel, with each scrum team focusing on a specific functional area and using automated integration testing to ensure that the business transaction works across the modules. Following agile techniques allows integration testing to occur every day (from the first day) as opposed to months or years into the project.
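
As a rough illustration of what an automated integration test for one business transaction might look like, the sketch below wires together three toy modules and checks an order from stock reservation through billing. All class names, method names, and values (Inventory, Sales, Finance, the SKU, the prices) are hypothetical stand-ins, not part of any real ERP product.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    """Hypothetical inventory module: tracks units on hand per SKU."""
    stock: dict[str, int] = field(default_factory=dict)

    def reserve(self, sku: str, qty: int) -> None:
        if self.stock.get(sku, 0) < qty:
            raise ValueError(f"insufficient stock for {sku}")
        self.stock[sku] -= qty


@dataclass
class Finance:
    """Hypothetical finance module: records invoices as (order_id, amount)."""
    invoices: list[tuple[str, float]] = field(default_factory=list)

    def bill(self, order_id: str, amount: float) -> None:
        self.invoices.append((order_id, amount))


@dataclass
class Sales:
    """Hypothetical sales module: inventory is upstream, finance is downstream."""
    inventory: Inventory
    finance: Finance

    def place_order(self, order_id: str, sku: str, qty: int, unit_price: float) -> None:
        self.inventory.reserve(sku, qty)               # upstream: stock must exist
        self.finance.bill(order_id, qty * unit_price)  # downstream: order must be billed


def test_order_to_invoice() -> None:
    """One business transaction exercised end to end across all three modules."""
    inventory = Inventory(stock={"WIDGET": 10})
    finance = Finance()
    sales = Sales(inventory, finance)

    sales.place_order("SO-1", "WIDGET", qty=3, unit_price=20.0)

    assert inventory.stock["WIDGET"] == 7
    assert finance.invoices == [("SO-1", 60.0)]


def test_order_rejected_when_out_of_stock() -> None:
    """Gap check: a failed reservation must not produce an invoice."""
    inventory = Inventory(stock={"WIDGET": 1})
    finance = Finance()
    sales = Sales(inventory, finance)

    try:
        sales.place_order("SO-2", "WIDGET", qty=5, unit_price=20.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected the order to be rejected")

    assert inventory.stock["WIDGET"] == 1
    assert finance.invoices == []


if __name__ == "__main__":
    test_order_to_invoice()
    test_order_rejected_when_out_of_stock()
    print("integration checks passed")
```

If each scrum team owns one module and checks like these run daily, a mismatch between sales and finance shows up in the sprint where it is introduced rather than in a late integrated testing phase.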

Although these modular interdependencies may seem to be liabilities, they make it easier to divide the work into chunks that fit separate scrum teams running synchronized sprints. Product backlog prioritization is set at the program level, and incremental requirement changes are minimized. Sprint backlog prioritization also falls in line. You maintain the flexibility of scrum and dramatically accelerate the pace of implementation.

ERP systems architecture is increasingly oriented toward Software as a Service (SaaS), which means that monolithic components are more modular than they used to be for client installations.

Also, the tasks required to configure ERP systems are usually repetitive, so cadence and estimation can be established early and provide accurate sizing and timing predictions.
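
One simple way to turn an established cadence into a timing prediction is to divide the remaining work by the team's average recent velocity. The sketch below assumes work is sized in story points and that sprints last two weeks; the numbers and the function name are made up for illustration.

```python
import math
from statistics import mean


def forecast_sprints(remaining_points: int, recent_velocities: list[int]) -> int:
    """Estimate the sprints still needed from the average of recent velocities."""
    if not recent_velocities:
        raise ValueError("need at least one completed sprint")
    velocity = mean(recent_velocities)
    if velocity <= 0:
        raise ValueError("average velocity must be positive")
    # Round up: a partly used sprint still occupies a whole sprint.
    return math.ceil(remaining_points / velocity)


# Hypothetical figures for a team configuring one ERP module.
velocities = [21, 23, 22]      # points completed in the last three sprints
remaining = 180                # points left in the product backlog
sprint_length_weeks = 2        # assumed sprint length

sprints = forecast_sprints(remaining, velocities)
print(f"{sprints} sprints, about {sprints * sprint_length_weeks} weeks")
```

Because configuration tasks repeat, velocity tends to stabilize after a few sprints, which is what makes a forecast like this worth recalculating at each sprint review.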

To speed delivery by tackling more of an ERP implementation project at once, multiple teams may work on each business-function component at the same time. Effective use of scaled scrum enables multiple scrum teams to structure their definition of done to include integration, regression, performance, and security testing at the sprint level rather than at the release level. Alignment of the definition of done is required because ERP systems are difficult to correct when conflicts are introduced into production. Teams learn to be disciplined in their definition of done.
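
A shared definition of done can be treated as a checklist that every team's backlog items must satisfy before the sprint ends. The following is a minimal sketch under that assumption; the BacklogItem class, the team names, and the item titles are hypothetical.

```python
from dataclasses import dataclass, field

# Test types named in the text; every team is held to the same set.
DEFINITION_OF_DONE = frozenset({"integration", "regression", "performance", "security"})


@dataclass
class BacklogItem:
    """Hypothetical sprint backlog item with the checks it has passed so far."""
    title: str
    team: str
    passed_checks: set[str] = field(default_factory=set)

    def missing_checks(self) -> set[str]:
        """Checks required by the shared definition of done but not yet passed."""
        return set(DEFINITION_OF_DONE - self.passed_checks)

    def is_done(self) -> bool:
        return not self.missing_checks()


items = [
    BacklogItem("Configure tax codes", "finance",
                {"integration", "regression", "performance", "security"}),
    BacklogItem("Configure price lists", "sales",
                {"integration", "regression"}),
]

# Items that cannot be counted as done this sprint, with what is still missing.
not_done = {item.title: sorted(item.missing_checks())
            for item in items if not item.is_done()}
print(not_done)
```

Keeping the checklist in one place, rather than per team, is one way to make sure no team quietly defers a test type to the release.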

Scrum works well with these types of projects when they focus on delivering business intelligence for the organization. Visual reports of data have a clear business-focused requirement for users. The work of preparing the data (such as aggregation and manipulation) makes up the tasks supporting the delivery of a report to the specified user (such as an executive or manager).

Multiyear ERP implementations used to be common, but organizations can’t wait that long in today’s fast-paced market. Organizations need solutions faster and cheaper. Customers want to see a return on their investment as quickly as possible, with improved customer satisfaction.

Iterating, inspecting, and adapting through scrum make shortened implementations possible.