Data Science Strategy For Dummies

Author: Ulrika Jägare
Published: July 11, 2019

Overview

All the answers to your data science questions

Over half of all businesses are using data science to generate insights and value from big data. How are they doing it? Data Science Strategy For Dummies answers all your questions about how to build a data science capability from scratch, starting with the “what” and the “why” of data science and covering what it takes to lead and nurture a top-notch team of data scientists.

With this book, you’ll learn how to incorporate data science as a strategic function into any business, large or small. Find solutions to your real-life challenges as you uncover the stories and value hidden within data.

  • Learn exactly what data science is and why it’s important
  • Adopt a data-driven mindset as the foundation to success
  • Understand the processes and common roadblocks behind data science
  • Keep your data science program focused on generating business value
  • Nurture a top-quality data science team

In non-technical language, Data Science Strategy For Dummies outlines new perspectives and strategies to effectively lead analytics and data science functions to create real value.

Data Science Strategy For Dummies Cheat Sheet

A revolutionary change is taking place in society, and it involves data science. Everyone from small local companies to global enterprises is starting to realize the potential of data science and to see the value in digitizing their data assets and becoming data driven. Regardless of industry, companies have embarked on a similar journey to explore how to drive new business value by utilizing analytics, machine learning (ML), and artificial intelligence (AI) techniques and by introducing data science as a new discipline. However, although utilizing these new technologies will help companies simplify their operations and drive down costs, nothing is simple about getting the strategic approach right for your data science investment. This cheat sheet gives you a peek at the fundamental concepts you need to be on top of when building your data science strategy. It looks not only at investing in a top-performing data science team but also at what to consider in your data architecture and how to approach the commercial aspects of data science.

Articles From The Book

Data Science Techniques You Can Use for Successful Change Management

For your data science investment to succeed, the data science strategy you adopt should include well-thought-out strategies for managing the fundamental change that data science solutions impose on an organization. One effective and efficient way to tackle these data science challenges is by using data-driven change management techniques to drive the transformation itself — in other words, drive the change by “practicing what you preach.” Here are some examples of how to do this in practice.

Using digital engagement tools for change management

For companies, there is a new generation of real-time employee opinion tools that is starting to replace old-fashioned employee opinion surveys. These tools can help you manage your data and tell you far more than simply what employees are thinking about once a year. In some companies, employees are surveyed weekly using a limited number of questions. The questions and models are constructed in such a way that management can follow fluctuations in important metrics as they happen, rather than the usual once or twice a year. These tools have obvious relevance for change management and can help answer questions like these:
  • Is a change being equally well received across locations?
  • Are certain managers better than others at delivering messages to employees?
Assume that you have a large travel-and-tourism firm that is using one of these tools for real-time employee feedback. One data-driven approach to use in such a situation is to experiment with different change management strategies within selected populations in the company. After a few changes in the organization, you can use the data collected to identify which managers prove to be more effective in leading change than others. After that has been established, you can observe those managers to determine what they’re doing differently in their change management approach. You can then share successful techniques with other managers.

This type of real-time feedback offers an opportunity to learn rapidly how communication events or engagement tactics have been received, thus optimizing your actions in days (rather than in weeks, which is typical of traditional approaches). The data can then feed into a predictive model, helping you determine with precision which actions will help accelerate adoption of a new practice, process, or behavior by a given employee group.
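
As a rough illustration of this kind of analysis, the sketch below aggregates hypothetical weekly pulse-survey results by manager to see whose teams adopt a change fastest. The column names and the adoption_score metric are assumptions made for the example, not part of any specific tool.

```python
import pandas as pd

# Hypothetical weekly pulse-survey extract: one row per employee response.
# Column names (manager, week, adoption_score) are illustrative assumptions.
responses = pd.DataFrame({
    "manager": ["Ana", "Ana", "Bo", "Bo", "Cruz", "Cruz"],
    "week": [1, 2, 1, 2, 1, 2],
    "adoption_score": [3.1, 3.8, 2.9, 3.0, 3.4, 4.2],  # 1-5 self-reported scale
})

# Average adoption score per manager and per week.
by_manager = (
    responses.groupby(["manager", "week"])["adoption_score"]
    .mean()
    .unstack("week")
)

# Week-over-week improvement highlights the managers who move the needle fastest.
by_manager["improvement"] = by_manager[2] - by_manager[1]
print(by_manager.sort_values("improvement", ascending=False))
```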

You can find some commercial tools out there — CultureIQ polls, for example — that support this kind of data collection. These kinds of polls sample groups of employees daily or weekly via a smartphone app to generate real-time insights in line with whatever scope you have defined. Another tool, Waggl.com, has more advanced functionality, allowing you to have an ongoing conversation with employees about a change effort as well as allowing change managers to tie this dialogue to the progress of the initiatives they’re undertaking.

These different types of digital engagement tools can have a vast impact on change management programs, but the data stream they create could be even more important. The data that’s generated can be used to build predictive models of change. Using and deploying these models on real transformation projects and then sharing your findings helps to ensure a higher success rate with data-driven change initiatives in the future.

Applying social media analytics to identify stakeholder sentiment for change management

Change managers can also look beyond the boundaries of the enterprise for insights about the impact of change initiatives. Customers, channel partners, suppliers, and investors are all key stakeholders when it comes to change programs. They are also more likely than employees to comment on social media about changes a company is making, thus giving potentially vital insight into how they’re responding. Ernst & Young (now known as EY) is using a tool for social media analytics called SMAART, which can interpret sentiment within consumer and influencer groups. In a project for a pharmaceutical company, EY was able to isolate the specific information sources that drove positive and negative sentiment toward the client’s brand. The company is now starting to apply these techniques to understand the external impact of change management efforts, and it’s a simple leap to extend these techniques within the enterprise. Advances in the linguistic analysis of texts mean that clues about behavior can now be captured from a person’s word choices; even the use of articles and pronouns can help reveal how someone feels.

Applying sentiment analysis tools to data in anonymized company email or to the dialogue in tools like Waggl.com can give fresh insight into your organization's change readiness and the reactions of employees to different initiatives. And the insights gained from analyzing internal communication will be stronger when combined with external social media data.
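
As a minimal sketch of what such internal sentiment scoring could look like, the snippet below runs NLTK's VADER analyzer over a few invented, anonymized message snippets. A production setup would of course need proper anonymization, consent, and aggregation safeguards.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# VADER is a lexicon-based sentiment model bundled with NLTK.
nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

# Invented, anonymized snippets standing in for internal dialogue or email.
messages = [
    "The new reporting tool saves me a lot of time.",
    "Nobody explained why the process changed again.",
    "Excited about the pilot, happy to give feedback.",
]

# The compound score runs from -1 (most negative) to +1 (most positive).
for text in messages:
    score = analyzer.polarity_scores(text)["compound"]
    print(f"{score:+.2f}  {text}")
```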

Capturing reference data in change projects

Have you ever worked in an organization where different change management programs or projects were compared to one another in terms of how efficiently they made the change happen? Or one where a standard set of measurements was used across different change initiatives? No? Most people haven’t. Why is it that organizations often seem obsessed with measuring fractional shifts in operational performance and with capturing data on sales, inventory turns, and manufacturing efficiency, yet show no interest in tracking how well their change projects and change management perform, beyond knowing which ones have met their goals?

Some people may claim that you can't compare change projects or change management within an organization; it would be like comparing apples to oranges. But that’s not accurate: Different projects may have unique features, but you'll find more similarities than differences between different types of projects. Capturing information about the team involved, the population engaged in the change, how long it took to implement, what tactics were used, and so on is a good idea. It enables you to build a reference data set for future learning, reuse, and efficiency benchmarking. Although this may not yield immediate benefit, as the overall data set grows it will become easier to build accurate predictive models of organizational change going forward.
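
One lightweight way to start capturing such reference data is to agree on a shared record format for every change project. The fields below are a suggested starting point for illustration, not a standard.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import List

@dataclass
class ChangeProjectRecord:
    """One reference-data row per change project (fields are illustrative)."""
    name: str
    team_size: int
    population_affected: int      # number of employees touched by the change
    start: date
    end: date
    tactics: List[str]            # e.g., ["town halls", "champions network"]
    goal_met: bool

    @property
    def duration_days(self) -> int:
        return (self.end - self.start).days

# Example entry; as records accumulate they become a benchmarking data set.
record = ChangeProjectRecord(
    name="CRM rollout",
    team_size=6,
    population_affected=450,
    start=date(2019, 1, 14),
    end=date(2019, 5, 3),
    tactics=["town halls", "weekly pulse survey"],
    goal_met=True,
)
print(asdict(record), record.duration_days)
```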

Using data science to select people for change roles

For quite a long time, companies have been using data-driven methods to select candidates for senior change management positions. And today some businesses, such as retailers, are starting to use predictive analytics for hiring frontline staff. Applying these tools when building a change team can both improve project performance significantly and help to build another new data set.

If every change leader and team member underwent testing and evaluation before a process change project started, that data could provide important variables to include as you search for an underlying model of what leads to a successful change program. This can even be extended to more informal change roles, allowing organizations to optimize selection based on what they know about successful personalities for these types of roles.
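
A minimal sketch of how such pre-project assessments could feed a predictive model is shown below, using scikit-learn's logistic regression on made-up assessment scores. A real model would need far more data and careful validation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up pre-project assessment scores for past change leaders:
# columns = [communication score, resilience score, prior change experience (years)]
X = np.array([
    [7, 6, 2], [9, 8, 5], [4, 5, 1], [8, 9, 4],
    [5, 4, 0], [6, 7, 3], [9, 9, 6], [3, 5, 1],
])
# 1 = the change program met its goals, 0 = it did not.
y = np.array([1, 1, 0, 1, 0, 1, 1, 0])

model = LogisticRegression().fit(X, y)

# Estimate the success probability for a candidate change leader.
candidate = np.array([[8, 7, 2]])
print(f"Predicted success probability: {model.predict_proba(candidate)[0, 1]:.2f}")
```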

Along these lines, the California start-up LEDR Technologies is pioneering techniques to predict team performance. It integrates data sources and uses them to help teams anticipate the challenges they may face with team dynamics so that the team can prevent them before they occur.

Automating change metrics

Picture a company or an organization that has a personalized dashboard it has developed in partnership with the firm’s leadership team — one that reflects the company’s priorities, competitive position, and future plans. These dashboards should also be used to offer insights related to the different transformation investments you've made. Keep in mind that much of the data that could act as interesting indicators for change management already exists today — it's just not being collected.

When a company builds a dashboard for identifying recruitment and attrition, it’s teaching the executive team to use data to make people-related decisions. However, it can take quite some time to set it up correctly and iron out the bugs. Want a suggestion? Don't wait. Start building these types of dashboards as soon as possible and, where possible, automate them. Why the automation? Change dashboards are vulnerable to version control issues, human error, and internal politics. Automating data management and dashboard generation makes the process more transparent and helps you maintain data integrity.
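
As a sketch of what "automate them" might mean in practice, the snippet below rebuilds a simple recruitment-and-attrition summary from an HR extract and publishes it as a static HTML page. The column names and the output file name are assumptions for the example.

```python
import pandas as pd

def build_people_dashboard(events: pd.DataFrame,
                           output_path: str = "people_dashboard.html") -> pd.DataFrame:
    """Aggregate hires and leavers per department and month, then publish as HTML."""
    summary = (
        events.assign(month=events["event_date"].dt.to_period("M"))
              .groupby(["department", "month", "event_type"])
              .size()
              .unstack("event_type", fill_value=0)
    )
    # A regenerated static page avoids manual copy-paste and version drift.
    summary.to_html(output_path)
    return summary

# Tiny illustrative HR extract; in practice this would come from the HR system.
events = pd.DataFrame({
    "event_date": pd.to_datetime(["2019-01-15", "2019-01-20", "2019-02-03", "2019-02-18"]),
    "department": ["Sales", "Sales", "R&D", "Sales"],
    "event_type": ["hire", "leave", "hire", "hire"],
})
print(build_people_dashboard(events))

# Schedule this script (cron, Airflow, and so on) so the dashboard refreshes itself.
```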

Current Trends in Data

Big data was definitely the thing just a couple of years ago, but now there's much more of a buzz around the idea of data value — more specifically, how analysis can turn data into value. The following information examines some of the trends related to utilizing data to capture new value.

Data monetization

One trend in data that has taken hold is monetization. Monetizing data refers to how companies can utilize their domain expertise to turn the data they own or have access to into real, tangible business value or new business opportunities. Data monetization can refer to the act of generating measurable economic benefits from available data sources by way of analytics, or, less commonly, it may refer to the act of monetizing data services. In the case of data analytics, typically these benefits appear as revenue or cost savings, but they may also include market share or corporate market value gains.

One could argue that data monetization for increased company revenue or cost savings is simply the result of being a data-driven organization. Though that argument isn’t totally wrong, company leaders are taking an increasing interest in exploring how data monetization can drive the innovation of entirely new business models across various business segments.

One good example of how this process can work is when telecom operators sell data on the positions of rapidly forming clusters of users (picture the conclusion of a sporting event or a concert by the latest YouTube sensation) to taxi companies. This allows taxis to be positioned proactively in the right area when one will most likely be needed. This is a completely new type of business model and customer base for a traditional telecom operator, opening up new types of business and revenue based on available data.
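
A toy version of the cluster-detection step in that example might look like the sketch below, which uses DBSCAN to spot dense groups of device positions. The coordinates are invented, and a real system would work with properly anonymized, aggregated data and a distance metric suited to latitude/longitude.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Invented, anonymized device positions (latitude, longitude).
positions = np.array([
    [59.3251, 18.0710], [59.3253, 18.0712], [59.3250, 18.0715],  # dense group (event venue)
    [59.3360, 18.0630],                                          # isolated device
    [59.3249, 18.0709], [59.3252, 18.0711],
])

# eps is in degrees for simplicity; roughly 0.0005 degrees is about 50 m here.
clustering = DBSCAN(eps=0.0005, min_samples=3).fit(positions)

# Label -1 marks noise; other labels mark clusters a taxi fleet could be routed toward.
for label in set(clustering.labels_):
    members = positions[clustering.labels_ == label]
    tag = "noise" if label == -1 else f"cluster {label}"
    print(tag, "size:", len(members), "centroid:", members.mean(axis=0).round(4))
```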

Responsible AI

AI (artificial intelligence) has become a leader in data trends in recent years. Responsible AI systems are characterized by transparency, accountability, and fairness, where users have full visibility into which data is being used and how. It also assumes that companies communicate the possible consequences of using the data, including both the potential positive and negative impact. Responsible AI is also about generating customer and stakeholder trust by following communicated policies and principles over time, including the ability to maintain control over the AI system environment itself. Strategically designing your company’s data science infrastructure and solutions with responsible AI in mind is not only wise but could also turn out to be a real business differentiator going forward.

Just look at how the opposite approach, taken by Facebook and Cambridge Analytica, turned into a scandal that ended by putting Cambridge Analytica out of business. You might remember that Cambridge Analytica gained access to the private and personal information of more than 50 million Facebook users in the US and then offered tools that could use that data to identify the personalities of American voters and influence their behavior. Facebook, rather than being hacked, was a willing participant in allowing its users' data to be used for other purposes without explicit user consent. The data included details on users’ identities, friend networks, and “likes.” The idea was to map personality traits based on what people had liked on Facebook, and then use that information to target audiences with digital ads. Facebook has also been accused of spreading Russian propaganda and fake news, which, together with the Cambridge Analytica incident, has severely damaged the Facebook brand over the last couple of years. This type of severe privacy invasion has not only opened many people's eyes to how their data is used but has also damaged the companies' brands.

Cloud-based data architectures

Cloud-based computing is a data trend that is sweeping the business world. More and more companies are moving away from on-premises data infrastructure investments toward virtualized and cloud-based data architectures. The driving force behind this move is that traditional data environments are feeling the pressure of increasing data volumes and are unable to scale up and down to meet constantly changing demands. On-premises infrastructure simply lacks the flexibility to dynamically optimize and address the challenges of new digital business requirements.

Re-architecting these traditional, on-premises data environments for greater access and scalability provides data platform architectures that seamlessly integrate data and applications from various sources. Using cloud-based compute and storage capacity enables a flexible layer of artificial intelligence and machine learning tools to be added as a top layer in the architecture so that you can accelerate the value obtained from large amounts of data.

Computation and intelligence in the edge

Let’s take a look at a truly edgy data trend. Edge computing describes a computing architecture in which data processing is done closer to where the data is created — by Internet of Things (IoT) devices like connected luggage, drones, and connected vehicles such as cars and bicycles, for example. There is a difference between pushing computation to the edge (edge compute) and pushing analytics or machine learning to the edge (edge analytics or machine learning at the edge). Edge compute can be executed as a separate task at the edge, allowing data to be preprocessed in a distributed manner before it’s collected and transferred to a central or semi-centralized environment where analytics methods or machine learning/artificial intelligence technologies are applied to achieve insights. Just remember that running analytics and machine learning at the edge requires some form of edge compute to also be in place so that the insight and action can happen directly at the edge. The trend to execute more at the edge is driven mainly by factors such as connectivity limitations, low-latency use cases where millisecond response times are needed to perform an immediate analysis and make a decision (in the case of self-driving cars, for example), and bandwidth constraints on transferring data to a central point for analysis. Strategically, computing at the edge is an important aspect to consider from an infrastructure-design perspective, particularly for companies with significant IoT elements.
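
The split between edge compute and central analysis can be illustrated with the sketch below: raw sensor readings are reduced to a compact summary at the edge, and only the summary is sent onward. The send_summary_to_cloud function is a hypothetical stand-in for whatever uplink a real device would use.

```python
import statistics
from typing import Dict, List

def summarize_at_edge(readings: List[float]) -> Dict[str, float]:
    """Edge compute step: reduce raw samples to a small summary before transfer."""
    return {
        "count": len(readings),
        "mean": statistics.fmean(readings),
        "max": max(readings),
        "min": min(readings),
    }

def send_summary_to_cloud(summary: Dict[str, float]) -> None:
    # Hypothetical uplink; a real device would use MQTT, HTTP, or a vendor SDK.
    print("uploading summary:", summary)

# One minute of raw vibration samples stays on the device...
raw_samples = [0.41, 0.39, 0.44, 0.40, 1.92, 0.43, 0.42]
summary = summarize_at_edge(raw_samples)

# ...and only a few bytes cross the constrained network link.
send_summary_to_cloud(summary)
```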

When it comes to infrastructure design, it’s also worth considering how the edge compute and intelligence solutions will work with the centralized (usually cloud-based) architecture. Many view cloud and edge as competing approaches, but cloud is a style of computing in which elastically scalable technology capabilities are delivered as a service, and it can offer a supporting environment for the edge part of the infrastructure. Not everything, however, can be solved at the edge; many use cases and needs are system- or network-wide and therefore need a higher level of aggregation in order to perform the analysis. Just performing the analysis at the edge might not give enough context to make the right decision. Those types of computational challenges and insights are best solved in a cloud-based, centralized model.

The cloud setup can also be done in a decentralized manner, and these decentralized instances are referred to as cloud-edge. For a larger setup on a regional or global scale, the decentralized model can be used to support edge implementations at the IoT device level in a certain country, or to support a telecom operator in its efforts to include all connected devices in the network. This is useful for keeping response times low and for avoiding moving raw data across country borders.

Digital twins

This particular trend in data will have you seeing double. A digital twin refers to a digital representation of a real-world entity or system — a digital view of a city's telecommunications network built up from real data, for example. Digital twins in the context of IoT projects are a particularly promising area and are currently driving much of the interest in the concept. It’s most likely an area that will grow significantly over the next three to five years. Well-designed digital twins are assets that have the potential to significantly improve enterprise control and decision-making going forward. Digital twins integrate artificial intelligence, machine learning, and analytics with data to create living digital simulation models that update and change as their physical counterparts change. A digital twin continuously learns and updates itself from multiple sources to represent its near real-time status, working condition, or position.

Digital twins are linked to their real-world counterparts and are used to understand the state of the system, respond to changes, improve operations, and add value. Digital twins start out as simple digital views of the real system and then evolve over time, improving their ability to collect and visualize the right data, apply the right analytics and rules, and respond in ways that further your organization's business objectives. You can also use a digital twin to run predictive models or simulations that look for patterns in the data making up the twin which might lead to problems. Those insights can then be used to prevent problems proactively.
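
A digital twin can be sketched very roughly as an object that mirrors the latest state of its physical counterpart and flags patterns that might lead to problems, as below. The thresholds, field names, and the pump scenario are illustrative assumptions.

```python
from collections import deque
from typing import Deque, Dict

class PumpTwin:
    """A toy digital twin of a pump that mirrors sensor state and flags risk."""

    def __init__(self, max_temp_c: float = 80.0, history_len: int = 100):
        self.max_temp_c = max_temp_c
        self.history: Deque[Dict[str, float]] = deque(maxlen=history_len)

    def update(self, reading: Dict[str, float]) -> None:
        """Keep the twin in near real-time sync with its physical counterpart."""
        self.history.append(reading)

    def at_risk(self) -> bool:
        """Simple predictive check: temperature trending toward the limit."""
        if len(self.history) < 3:
            return False
        recent = [r["temp_c"] for r in list(self.history)[-3:]]
        rising = recent[0] < recent[1] < recent[2]
        return rising and recent[-1] > 0.9 * self.max_temp_c

twin = PumpTwin()
for temp in (68.0, 73.5, 77.0):
    twin.update({"temp_c": temp, "rpm": 1480})
print("maintenance needed soon?", twin.at_risk())
```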

Adding the automated ability to make decisions based on predefined and preapproved policies would be a great capability to add to the digital-twin concept from any operational perspective — when managing an IoT system such as a smart city, for example.

Blockchain

Blockchain is a trend in data that holds promise for future innovations. The blockchain concept has evolved from a digital currency infrastructure into a platform for digital transactions. A blockchain is a growing list of records (blocks) that are linked using cryptography. Each block contains a cryptographic hash of the previous block, a timestamp, and transaction data. By design, a blockchain is resistant to modification of the data. It’s an open and public ledger that can record transactions between two parties efficiently and in a verifiable and permanent way. A blockchain is also a decentralized and distributed digital ledger that is used to record transactions across many computers, so that no record can be altered retroactively without the alteration of all subsequent blocks. Blockchain technologies offer a significant step away from the current centralized, transaction-based mechanisms and can work as a foundation for new digital business models for both established enterprises and start-ups.
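
The linking described above can be made concrete with a minimal sketch: each block stores the hash of its predecessor, so altering any block breaks every later link. This is only a toy chain for illustration, with no consensus mechanism or distribution.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

@dataclass
class Block:
    index: int
    data: str                 # transaction payload, simplified to a string
    prev_hash: str
    timestamp: float = field(default_factory=time.time)

    def hash(self) -> str:
        payload = json.dumps(
            [self.index, self.data, self.prev_hash, self.timestamp]
        ).encode()
        return hashlib.sha256(payload).hexdigest()

# Build a tiny chain: each block commits to the hash of the previous one.
chain = [Block(0, "genesis", prev_hash="0" * 64)]
for i, tx in enumerate(["Alice pays Bob 5", "Bob pays Carol 2"], start=1):
    chain.append(Block(i, tx, prev_hash=chain[-1].hash()))

def is_valid(chain: list) -> bool:
    """Tampering with any block invalidates every later prev_hash link."""
    return all(
        chain[i].prev_hash == chain[i - 1].hash() for i in range(1, len(chain))
    )

print("valid before tampering:", is_valid(chain))
chain[1].data = "Alice pays Bob 500"   # retroactive alteration
print("valid after tampering:", is_valid(chain))
```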

Although the hype surrounding blockchains was originally focused on the financial services industry, blockchains have many potential areas of usage, including government, healthcare, manufacturing, identity verification, and supply chain. Although blockchain holds long-term promise and will undoubtedly create disruption, its promise has yet to be proven in reality: Many of the associated technologies are too immature to use in a production environment and will remain so for the next two to three years.

Conversational platforms

Conversational AI is a form of artificial intelligence that allows people to communicate with applications, websites, and devices in everyday, humanlike natural language via voice, text, touch, or gesture input. For users, it allows fast interaction using their own words and terminology. For enterprises, it offers a way to build a closer connection with customers via personalized interaction and to receive a huge amount of vital business information in return.

This trend in data will most likely drive the next paradigm shift in how humans interact with the digital world, as the responsibility for translating intent shifts from humans to machines. The platform takes a question or command from the user and then responds by executing some function, presenting some content, or asking for additional input. Over the next few years, conversational interfaces will become a primary design goal for user interaction and will be delivered in dedicated hardware, core OS features, platforms, and applications. Check out the following list for some potential areas where one could benefit from applying conversational platforms by way of bots:
  • Informational: Chatbots that aid in research, informational requests, and status requests of different types
  • Productivity: Bots that can connect customers to commerce, support, advisory, or consultative services
  • B2E (business-to-employee): Bots that enable employees to access data, applications, resources, and activities
  • Internet of Things (IoT): Bots that enable conversational interfaces for various device interactions, like drones, appliances, vehicles, and displays
Using these different types of conversational platforms, you can expect increased productivity (because employees can concentrate on the most valuable interactions), a 24/7 automated workforce, increased customer loyalty and satisfaction, new insights into customer interactions, and reduced operational expenses.
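
At its simplest, the "take a question, map it to an intent, respond" loop could look something like the sketch below. Real conversational platforms use trained language models rather than keyword matching; the intents and replies here are invented for illustration.

```python
# Invented intents mapped to trigger keywords and canned responses.
INTENTS = {
    "order_status": (("order", "delivery", "shipped"),
                     "Your order is on its way and should arrive in 2-3 days."),
    "opening_hours": (("open", "hours", "closing"),
                      "We are open 9:00-18:00 on weekdays."),
}

def respond(utterance: str) -> str:
    """Map a user utterance to an intent via keyword matching, then reply."""
    text = utterance.lower()
    for keywords, reply in INTENTS.values():
        if any(keyword in text for keyword in keywords):
            return reply
    return "Sorry, I didn't catch that. Could you rephrase?"

print(respond("When will my order be shipped?"))
print(respond("What time do you open on Saturday?"))
```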

Conversational platforms have now reached a tipping point in terms of understanding language and basic user intent, but they still aren’t good enough to fully take off. The challenge that conversational platforms face is that users must communicate in a structured way, and this is often a frustrating experience in real life. A primary differentiator among conversational platforms is the robustness of their models and the application programming interfaces (APIs) and event models used to access, attract, and orchestrate third-party services to deliver complex outcomes.

The Ethics of Artificial Intelligence

So, what does artificial intelligence (AI) ethics actually refer to, and which areas are important to address to generate trust around your data and algorithms? Well, there are many aspects to this concept, but there are five cornerstones to rely on when it comes to the ethics of artificial intelligence.

An additional ethical consideration, which is more technical in nature, relates to the reproducibility of results outside the lab environment. AI is still immature, and most research and development is exploratory by nature. There is still little standardization in place for machine learning and artificial intelligence. De facto rules for AI development are emerging, but slowly, and they are still very much community driven. Therefore, you must ensure that any results from an algorithm are actually reproducible — meaning that you get the same results not only in the real, target environment as in the lab environment but also between different target environments (between different operators within the telecommunications sector, for example).

How to ensure trustworthy artificial intelligence

If the data you need access to in order to realize your business objectives can be considered ethically incorrect, how do you manage that? It’s easy enough to say that applications should not collect data about race, gender, disabilities, or other protected classes. But the fact is that if you do not gather that type of data, you'll have trouble testing whether your applications are in fact fair to minorities.

Machine learning algorithms that learn from data will be only as good as the data they’re trained on. Unfortunately, many algorithms have proven to be quite good at figuring out their own proxies for race and other protected classes, in ways that run counter to what many would consider proper human ethical thinking. Your application would not be the first system that could turn out to be unfair, despite the best intentions of its developers. But, to be clear, at the end of the day your company will be held responsible for the performance of its algorithms, and (hopefully) bias-related legislation will be stricter in the future than it is today. If a company isn’t following laws, regulations, or ethical boundaries, the financial cost could be significant — and, perhaps even worse, people could lose trust in the company altogether. That could have serious consequences, ranging from customers abandoning the brand to employees losing their jobs to folks going to jail.

To avoid these types of scenarios, you need to put ethical principles into practice, and for that to happen, employees must be allowed and encouraged to be ethical in their daily work. They should be able to have conversations about what ethics actually means in the context of the business objectives and what costs to the company can be weathered in its name. They must also be able to at least discuss what would happen if a solution cannot be implemented in an ethically correct manner. Would such a realization be enough to terminate it? Data scientists in general find it important to share best practices and scientific papers by presenting at conferences, writing blog posts, and developing open source technologies and algorithms. However, problems such as how to obtain informed consent aren’t discussed quite as often. It's not as if the problems aren’t recognized or understood; they’re merely seen as less worthy of discussion. Rather than let such a mindset persist, companies should actively encourage (rather than just allow) more discussions about fairness, the proper use of data, and the harm that can be done by the inappropriate use of data.

Recent scandals involving computer security breaches have shown the consequences of sticking your head in the sand: Many companies that never took the time to implement good security practices and safeguards are now paying for that neglect with damages to their reputations and their finances. It is important to exercise the same due diligence now accorded security matters when thinking about issues like fairness, accountability, and unintended consequences of your data use. It will never be possible to predict all unintended consequences of such usage and, yes, the ability to foresee the future is limited. But plenty of unintended consequences could easily have been foreseen. (Facebook’s Year in Review feature, which seemed to go out of its way to remind Facebook users of deaths in the family and other painful events, is a prime example.)

Mark Zuckerberg's famous motto, "Move fast and break things," is unacceptable if it hasn’t been thought through in terms of what is likely to break. Company leaders should insist that they be allowed to ponder such aspects — and stop the production line whenever something goes wrong. This idea dates back to Toyota’s Andon manufacturing method: Any assembly line worker can stop the line if they see something going wrong. The line doesn’t restart until the problem is fixed. Workers don’t have to fear consequences from management for stopping the line; they are trusted, and are expected to behave responsibly.

What would it mean if you could do this with product features or AI/ML algorithms? If anyone at Facebook could have said, “Wait, we’re getting complaints about Year in Review” and pulled it out of production, Facebook would now be in a much better position from an ethical perspective. Of course, it’s a big, complicated company, with a big, complicated product. But so is Toyota, and it worked there. The issue lurking behind all these concerns is, of course, corporate culture. Corporate environments can be hostile to anything other than short-term profitability. However, in a time when public distrust and disenchantment are running at an all-time high, ethics is turning into a good corporate investment. Upper-level management is only starting to see this, and changes to corporate culture won’t happen quickly, but it’s clear that users want to deal with companies that treat them and their data responsibly, not just as potential profit or as engagements to be maximized.

The companies that will succeed with AI ethics are the ones that create space for ethics within their organizations. This means allowing data scientists, data engineers, software developers, and other data professionals to “do ethics” in practical terms. It isn’t a question of hiring trained ethicists and assigning them to these teams; it’s about living ethical values every single day, not just talking about them. That’s what it means to “do good data science.”

Introducing ethics by design for artificial intelligence and data science

What's the best way to approach implementing AI ethics by design? Might there be a checklist available to use? Now that you mention it, there is one, and you'll find it in the United Kingdom. The government there has launched a data ethics framework featuring a data ethics workbook. As part of the initiative, it has isolated seven distinct principles around AI ethics. The workbook is built around a number of open-ended questions designed to probe your compliance with these principles. Admittedly, it's a lot of questions — 46, to be exact, which is rather too many for a data scientist to continuously keep track of and incorporate efficiently into a daily routine. For such questions to be truly useful, then, they need to be embedded not only in the development ways of working but also in the data science infrastructure and systems support.
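
One way to embed such questions in the infrastructure rather than only in people's heads is to ship a machine-readable checklist with every model and have a pipeline step refuse to promote models whose checklist is incomplete. The sketch below illustrates the idea; the checklist items are paraphrased examples, not the framework's actual 46 questions.

```python
# Paraphrased example items; not the official framework questions.
REQUIRED_ITEMS = [
    "training_data_reviewed_for_bias",
    "user_consent_documented",
    "fairness_tested_across_groups",
    "rollback_plan_in_place",
]

def checklist_complete(checklist: dict) -> bool:
    """Return True only if every required item is explicitly answered True."""
    missing = [item for item in REQUIRED_ITEMS if not checklist.get(item, False)]
    if missing:
        print("Blocking deployment; unanswered items:", ", ".join(missing))
        return False
    return True

# Example: a model's checklist as it might be stored alongside its artifacts.
model_checklist = {
    "training_data_reviewed_for_bias": True,
    "user_consent_documented": True,
    "fairness_tested_across_groups": False,
    "rollback_plan_in_place": True,
}
print("cleared for deployment:", checklist_complete(model_checklist))
```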

It isn’t merely a question of making it possible as a practical matter to follow ethical principles in daily work and to prove how the company is ethically compliant — the company must also stand behind these ambitions and embrace them as part of its code of conduct. However, when a company talks about adding AI ethics to its code of conduct, the value doesn't come from the pledge itself, but rather emerges from the process people undergo in developing it. People who work with data are now starting to have discussions on a broad scale that would never have taken place just a decade ago. But discussions alone won’t get the hard work done. It is vital to not just talk about how to use data ethically but also to use data ethically. Principles must be put into practice!

Here’s a shorter list of questions to consider as you and your data science teams work together to gain a common and general understanding of what is needed to address AI ethical concerns:
  • Hacking: To what extent is an intended AI technology vulnerable to hacking, and thus potentially vulnerable to being abused?
  • Training data: Have you tested your training data to ensure that it is fair and representative?
  • Bias: Does your data contain possible sources of bias?
  • Team composition: Does the team composition reflect a diversity of opinions and backgrounds?
  • Consent: Do you need user consent to collect and use the data? Do you have a mechanism for gathering consent from users? Have you explained clearly what users are consenting to?
  • Compensation: Do you offer reimbursement if people are harmed by the results of your AI technology?
  • Emergency brake: Can you shut down this software in production if it’s behaving badly?
  • Transparency and fairness: Do the data and AI algorithms used comply with corporate values for technology, such as moral behavior, respect, fairness, and transparency? Have you tested for fairness with respect to different user groups?
  • Error rates: Have you tested for different error rates among diverse user groups?
  • Model performance: Do you monitor model performance to ensure that your software remains fair over time? Can it be trusted to perform as intended, not just during the initial training or modelling but also throughout its ongoing “learning” and evolution?
  • Security: Do you have a plan to protect and secure user data?
  • Accountability: Is there a clear line of accountability to an individual and clarity on how the AI operates, the data that it uses, and the decision framework that is applied?
  • Design: Did the AI design consider local and macro social impact, including its impact on the financial, physical, and mental well-being of humans and our natural environment?
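
Several of the questions above (fairness, error rates, model performance) can be checked with a few lines of analysis. The sketch below compares false positive rates across a hypothetical group attribute; the data is synthetic and the attribute name is an assumption for the example.

```python
import pandas as pd

# Synthetic evaluation results: true label, model prediction, and a group attribute.
results = pd.DataFrame({
    "group":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "actual":    [0,   0,   1,   1,   0,   0,   1,   1],
    "predicted": [0,   1,   1,   1,   0,   0,   1,   0],
})

def false_positive_rate(df: pd.DataFrame) -> float:
    """Share of true negatives that the model incorrectly flagged as positive."""
    negatives = df[df["actual"] == 0]
    if negatives.empty:
        return float("nan")
    return (negatives["predicted"] == 1).mean()

# A large gap between groups is a signal to investigate before deployment.
fpr_by_group = results.groupby("group")[["actual", "predicted"]].apply(false_positive_rate)
print(fpr_by_group)
```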