Articles From Lillian Pierson
Cheat Sheet / Updated 04-12-2024
A wide range of tools is available to help businesses big and small take advantage of the data science revolution. Among the most essential of these tools are Microsoft Power BI, Tableau, SQL, and the R and Python programming languages.
Article / Updated 07-27-2023
In growth, you use testing methods to optimize your web design and messaging so that it performs at its absolute best with the audiences to which it's targeted. Although testing and web analytics methods are both intended to optimize performance, testing goes one layer deeper than web analytics. You use web analytics to get a general idea about the interests of your channel audiences and how well your marketing efforts are paying off over time. After you have this information, you can then go in deeper to test variations on live visitors in order to gain empirical evidence about which designs and messaging your visitors actually prefer.

Testing tactics can help you optimize your website design or brand messaging for increased conversions in all layers of the funnel. Testing is also useful when optimizing your landing pages for user activations and revenue conversions.

Checking out common types of testing in growth

When you use data insights to increase growth for e-commerce businesses, you're likely to run into the following three testing tactics: A/B split testing, multivariate testing, and mouse-click heat map analytics.

An A/B split test is an optimization tactic you can use to split variations of your website or brand messaging between sets of live audiences in order to gauge responses and decide which of the two variations performs best. A/B split testing is the simplest testing method you can use for website or messaging optimization.

Multivariate testing is, in many ways, similar to the multivariate regression analysis that I discuss in Chapter 5. Like multivariate regression analysis, multivariate testing allows you to uncover relationships, correlations, and causations between variables and outcomes. In the case of multivariate testing, you're testing several conversion factors simultaneously over an extended period in order to uncover which factors are responsible for increased conversions. Multivariate testing is more complicated than A/B split testing, but it usually provides quicker and more powerful results.

Lastly, you can use mouse-click heat map analytics to see how visitors are responding to your design and messaging choices. In this type of testing, you use the mouse-click heat map to help you make optimal website design and messaging choices to ensure that you're doing everything you can to keep your visitors focused and converting.

Landing pages are meant to offer visitors few or no options, except to convert or to exit the page. Because a visitor has so few options on a landing page, you don't really need to use multivariate testing or website mouse-click heat maps. Simple A/B split tests suffice.

Data scientists working in growth hacking should be familiar with (and know how to derive insight from) the following testing applications:

Webtrends: Offers a conversion-optimization feature that includes functionality for A/B split testing and multivariate testing.

Optimizely: A popular product among the growth-hacking community. You can use Optimizely for multipage funnel testing, A/B split testing, and multivariate testing, among other things.

Visual Website Optimizer: An excellent tool for A/B split testing and multivariate testing.

Testing for acquisitions

Acquisitions testing provides feedback on how well your content performs with prospective users in your assorted channels. You can use acquisitions testing to compare your message's performance in each channel, helping you optimize your messaging on a per-channel basis. If you want to optimize the performance of your brand's published images, you can use acquisitions testing to compare image performance across your channels as well. Lastly, if you want to increase your acquisitions through increases in user referrals, use testing to help optimize your referrals messaging for the referral channels. Acquisitions testing can help you begin to understand the specific preferences of prospective users on a channel-by-channel basis.

You can use A/B split testing to improve your acquisitions in the following ways:

Social messaging optimization: After you use social analytics to deduce the general interests and preferences of users in each of your social channels, you can further optimize your brand messaging along those channels by using A/B split testing to compare your headlines and social media messaging within each channel.

Brand image and messaging optimization: Compare and optimize the respective performances of images along each of your social channels.

Optimized referral messaging: Test the effectiveness of your email messaging at converting new user referrals.

Testing for activations

Activation testing provides feedback on how well your website and its content perform in converting acquired users to active users. The results of activation testing can help you optimize your website and landing pages for maximum sign-ups and subscriptions. Here's how you'd use testing methods to optimize user activation growth:

Website conversion optimization: Make sure your website is optimized for user activation conversions. You can use A/B split testing, multivariate testing, or a mouse-click heat map data visualization to help you optimize your website design.

Landing pages: If your landing page has a simple call to action that prompts guests to subscribe to your email list, you can use A/B split testing for simple design optimization of this page and the call-to-action messaging.

Testing for retentions

Retention testing provides feedback on how well your blog post and email headlines are performing among your base of activated users. If you want to optimize your headlines so that active users want to continue engaging with your brand, test the performance of your user-retention tactics. Here's how you can use testing methods to optimize user retention growth:

Headline optimization: Use A/B split testing to optimize the headlines of your blog posts and email marketing messages. Test different headline varieties within your different channels, and then use the varieties that perform the best. Email open rates and RSS view rates are ideal metrics for tracking the performance of each headline variation.

Conversion rate optimization: Use A/B split testing on the messaging within your emails to decide which messaging variety more effectively gets your activated users to engage with your brand. The more effective your email messaging is at getting activated users to take a desired action, the greater your user retention rates.

Testing for revenue growth

Revenue testing gauges the performance of revenue-generating landing pages, e-commerce pages, and brand messaging. Revenue testing methods can help you optimize your landing and e-commerce pages for sales conversions. Here's how you can use testing methods to optimize revenue growth:

Website conversion optimization: You can use A/B split testing, multivariate testing, or a mouse-click heat map data visualization to help optimize your sales page and shopping cart design for revenue-generating conversions.

Landing page optimization: If you have a landing page with a simple call to action that prompts guests to make a purchase, you can use A/B split testing for design optimization; a sketch of how you might evaluate such a test follows below.
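To make the A/B split-testing idea concrete, here is a minimal Python sketch of how you might check whether the difference between two variations is statistically meaningful. The visitor and conversion counts are invented for illustration, and the tools listed above handle this kind of math for you; the sketch simply shows the reasoning behind the scenes, using a standard two-proportion z-test.

```python
from statistics import NormalDist

# Hypothetical A/B split-test results: visitors and conversions per variation.
a_visitors, a_conversions = 5_000, 210   # variation A
b_visitors, b_conversions = 5_000, 260   # variation B

p_a = a_conversions / a_visitors
p_b = b_conversions / b_visitors

# Two-proportion z-test: is the difference in conversion rates significant?
p_pool = (a_conversions + b_conversions) / (a_visitors + b_visitors)
std_err = (p_pool * (1 - p_pool) * (1 / a_visitors + 1 / b_visitors)) ** 0.5
z = (p_b - p_a) / std_err
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(f"Conversion rate A: {p_a:.2%}, B: {p_b:.2%}")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("The difference is statistically significant at the 5% level.")
else:
    print("Keep the test running; the difference may just be noise.")
```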
Article / Updated 06-09-2023
If statistics has been described as the science of deriving insights from data, then what's the difference between a statistician and a data scientist? Good question! While many tasks in data science require a fair bit of statistical know-how, the scope and breadth of a data scientist's knowledge and skill base are distinct from those of a statistician. The core distinctions are outlined below.

Subject matter expertise: One of the core features of data scientists is that they offer a sophisticated degree of expertise in the area to which they apply their analytical methods. Data scientists need this so that they're able to truly understand the implications and applications of the data insights they generate. A data scientist should have enough subject matter expertise to identify the significance of their findings and independently decide how to proceed in the analysis. In contrast, statisticians usually have an incredibly deep knowledge of statistics, but very little expertise in the subject matters to which they apply statistical methods. Most of the time, statisticians are required to consult with external subject matter experts to truly get a firm grasp on the significance of their findings, and to decide the best way to move forward in an analysis.

Mathematical and machine learning approaches: Statisticians rely mostly on statistical methods and processes when deriving insights from data. In contrast, data scientists are required to pull from a wide variety of techniques to derive data insights. These include statistical methods, but also approaches that are not based in statistics, such as those found in mathematics, clustering, classification, and non-statistical machine learning.

Seeing the importance of statistical know-how

You don't need to go out and get a degree in statistics to practice data science, but you should at least get familiar with some of the more fundamental methods that are used in statistical data analysis. These include:

Linear regression: Linear regression is useful for modeling the relationships between a dependent variable and one or several independent variables. The purpose of linear regression is to discover (and quantify the strength of) important correlations between dependent and independent variables.

Time-series analysis: Time-series analysis involves analyzing a collection of data on attribute values over time in order to predict future instances of the measure based on past observational data.

Monte Carlo simulations: The Monte Carlo method is a simulation technique you can use to test hypotheses, generate parameter estimates, predict scenario outcomes, and validate models. The method is powerful because it can be used to very quickly simulate anywhere from 1 to 10,000 (or more) samples for any process you're trying to evaluate. (A small sketch of this idea appears after this list.)

Statistics for spatial data: One fundamental and important property of spatial data is that it's not random: it's spatially dependent and autocorrelated. When modeling spatial data, avoid statistical methods that assume your data is random. Kriging and krige are two statistical methods that you can use to model spatial data. These methods enable you to produce predictive surfaces for entire study areas based on sets of known points in geographic space.
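To show what the Monte Carlo bullet above looks like in practice, here is a minimal Python sketch (using NumPy) that simulates 10,000 daily-revenue scenarios for a hypothetical online store; the visitor, conversion, and order-value figures are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Invented assumptions: ~500 visitors per day, a 4% chance that any one
# visitor converts, and a $40 average order value.
n_simulations = 10_000
visitors = rng.poisson(lam=500, size=n_simulations)
conversions = rng.binomial(n=visitors, p=0.04)
daily_revenue = conversions * 40.0

# Summarize the simulated scenario outcomes.
print(f"Expected daily revenue: ${daily_revenue.mean():,.2f}")
print(f"5th-95th percentile range: ${np.percentile(daily_revenue, 5):,.2f} "
      f"to ${np.percentile(daily_revenue, 95):,.2f}")
```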
Working with clustering, classification, and machine learning methods

Machine learning is the application of computational algorithms to learn from (or deduce patterns in) raw datasets. Clustering is a particular type of machine learning (unsupervised machine learning, to be precise), meaning that the algorithms must learn from unlabeled data and, as such, must use inferential methods to discover correlations. Classification, on the other hand, is called supervised machine learning, meaning that the algorithms learn from labeled data. The following descriptions introduce some of the more basic clustering and classification approaches:

k-means clustering: You generally deploy k-means algorithms to subdivide the data points of a dataset into clusters based on nearest mean values. To determine the optimal division of your data points into clusters, such that the distance between points in each cluster is minimized, you can use k-means clustering.

Nearest neighbor algorithms: The purpose of a nearest neighbor analysis is to search for and locate either a nearest point in space or a nearest numerical value, depending on the attribute you use for the basis of comparison.

Kernel density estimation: An alternative way to identify clusters in your data is to use a density smoothing function. Kernel density estimation (KDE) works by placing a kernel (a weighting function that is useful for quantifying density) on each data point in the dataset, and then summing the kernels to generate a kernel density estimate for the overall region.

Keeping mathematical methods in the mix

Lots gets said about the value of statistics in the practice of data science, but applied mathematical methods are seldom mentioned. To be frank, mathematics is the basis of all quantitative analyses, and its importance should not be underestimated. The two following mathematical methods are particularly useful in data science.

Multi-criteria decision making (MCDM): MCDM is a mathematical decision modeling approach that you can use when you have several criteria or alternatives that you must simultaneously evaluate when making a decision.

Markov chains: A Markov chain is a mathematical method that chains together a series of randomly generated variables representing the present state in order to model how changes in the present state affect future states. A small sketch of this idea follows below.
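As a small illustration of the Markov chain bullet above, the following Python sketch simulates how one user's state might evolve from month to month. The three states and their transition probabilities are invented for the example; the point is simply that each step depends only on the present state.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Invented states and transition probabilities for a hypothetical user base.
states = ["active", "dormant", "churned"]
transition_matrix = np.array([
    [0.80, 0.15, 0.05],  # from "active"
    [0.30, 0.50, 0.20],  # from "dormant"
    [0.00, 0.00, 1.00],  # from "churned" (an absorbing state)
])

# Simulate how one user's state evolves over 12 months.
state = 0  # start in the "active" state
for month in range(1, 13):
    state = rng.choice(len(states), p=transition_matrix[state])
    print(f"Month {month:2d}: {states[state]}")
```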
Cheat Sheet / Updated 09-24-2021
"Data science" is the big buzzword these days, and most folks who have come across the term realize that data science is a powerful force that is in the process of revolutionizing scores of major industries. Not many folks, however, are aware of the range of tools currently available that are designed to help big businesses and small take advantage of the data science revolution. Take a peek at these tools and see how they fit in to the broader context of data science.
Article / Updated 04-18-2017
A data-journalism piece is only as good as the data that supports it. To publish a compelling story, you must find compelling data on which to build. That isn't always easy, but it's easier if you know how to use scraping and autofeeds to your advantage.

Scraping data

Web scraping involves setting up automated programs to scour and extract the exact and custom datasets that you need straight from the Internet so you don't have to do it yourself. The data you generate from this process is commonly called scraped data.

Most data journalists scrape source data for their stories because it's the most efficient way to get datasets for unique stories. Datasets that are easily accessible have usually already been exploited and mined by teams of data journalists who were looking for stories. To generate unique data sources for your data-driven story, scrape the data yourself. If you find easy-to-access data, beware that most of the stories in that dataset have probably been told by a journalist who discovered that data before you.

To illustrate how you'd use data scraping in data journalism, imagine the following example: You're a data journalist living in a U.S. state that directly borders Mexico. You've heard rumors that the local library's selection of Spanish-language children's books is woefully inadequate. You call the library, but its staff fear negative publicity and won't share any statistics with you about the topic. Because the library won't budge on its data-sharing, you're forced to scrape the library's online catalog to get the source data you need to support this story. Your scraping tool is customized to iterate over all possible searches and keep track of the results.

After scraping the site, you discover that 25 percent of children's books at the library are Spanish-language books. Spanish speakers make up 45 percent of the primary-school population; is this difference significant enough to form the basis of a story? Maybe, maybe not. To dig a little deeper and possibly discover a reason behind this difference, you decide to scrape the catalog once a week for several weeks, and then compare patterns of borrowing. When you find that a larger proportion of Spanish books are being checked out, this indicates that there is, indeed, a high demand for children's books in Spanish. This finding, coupled with the results from your previous site scrape, gives you all the support you need to craft a compelling article around the issue.

Setting up data alerts

To generate hot stories, data journalists must have access to the freshest, newest data releases coming from the most credible organizations. To stay on top of which datasets are being released where, data journalists subscribe to alert systems that send them notifications every time potentially important data is released. These alert systems often issue notifications via RSS feeds or via email. It's also possible to set up a custom application, like DataStringer, to send push notifications when significant modifications or updates are made to source databases.

After you subscribe to data alerts and form a solid idea about the data-release schedule, you can begin planning for data releases in advance. For example, if you're doing data journalism in the business analytics niche and know that a particularly interesting quarterly report is to be released in one week, you can use the time you have before its release to formulate a plan on how you'll analyze the data when it does become available.
Many times, after you're alerted to important new data releases, you still need to scrape the source site in order to get that data. In particular, if you're pulling data from a government department, you're likely to need to scrape the source site. Although most government organizations in western countries are legally obligated to release data, they aren't required to release it in a format that's readily consumable. Don't expect them to make it easy for you to get the data you need to tell a story about their operations.
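As a rough illustration of what a catalog scraper like the one described above might look like, here is a minimal Python sketch using the widely used requests and BeautifulSoup libraries. The URL, query parameters, and HTML class names are entirely hypothetical; a real scraper has to be written against the actual markup of the target site, and should respect its terms of use.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical catalog URL; a real target site will differ.
BASE_URL = "https://catalog.example-library.org/search"

spanish_titles, total_titles = 0, 0
for page in range(1, 6):  # iterate over the first five result pages
    response = requests.get(
        BASE_URL, params={"audience": "children", "page": page}, timeout=10
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # The CSS selectors below are invented for illustration.
    for record in soup.select("div.record"):
        total_titles += 1
        language = record.select_one("span.language")
        if language and "Spanish" in language.get_text():
            spanish_titles += 1

print(f"{spanish_titles} of {total_titles} children's titles are in Spanish.")
```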
Article / Updated 04-18-2017
By thinking through the how of a story, you are putting yourself in position to craft better data-driven stories. Looking at your data objectively and considering factors like how it was created helps you to discover interesting insights that you can include in your story. Also, knowing how to quickly find stories in potential data sources helps you to quickly sift through the staggering array of options. And, how you present your data-driven story determines much about how well that story is received by your target audience.

You could have done everything right — really taken the time to get to know who your audience is, boiled your story down so that it says exactly what you intend, published it at just the right time, crafted your story around what you know about why people care, and even published it to just the right venue — but if your data visualization looks bad, or if your story layout makes it difficult for readers to quickly gather useful information, then your positive response rates are likely to be low.

Integrating how as a source of data and story context

You need to think about how your data was generated because that line of thinking often leads you into more interesting and compelling storylines. Before drawing up a final outline for your story, brainstorm about how your source data was generated. If you find startling or attention-grabbing answers that are relevant to your story, consider introducing those in your writing or data visualization.

Finding stories in your data

If you know how to quickly and skillfully find stories in datasets, you can use this set of skills to save time when you're exploring the array of stories that your datasets offer. If you want to quickly analyze, understand, and evaluate the stories in datasets, then you need to have solid data analysis and visualization skills. With these skills, you can quickly discover which datasets to keep and which to discard. Getting up to speed in relevant data science skills also helps you quickly find the most interesting, relevant stories in the datasets you select to support your story.

Presenting a data-driven story

How you present your data-driven story determines much about whether it succeeds or fails with your target audience. Should you use an infographic? A chart? A map? Should your visualization be static or interactive? You have to consider countless aspects when deciding how to best present your story.
Article / Updated 04-18-2017
The human capacity to question and understand why things are the way they are is a clear delineation point between the human species and other highly cognitive mammals. Answers to questions about why help you to make better-informed decisions. These answers help you to better structure the world around you and help you develop reasoning beyond what you need for mere survival.

In data journalism, as in all other types of business, answers to the question why help you predict how people and markets respond. These answers help you know how to proceed to achieve an outcome of most probable success. Knowing why your story matters helps you write and present it in a way that achieves the most favorable outcomes — presumably, that your readers enjoy and take tremendous value from consuming your content.

Asking why in order to generate and augment a storyline

No matter what topic you're crafting a story around, it's incredibly important to generate a storyline around the wants and needs of your target audience. After you know who your audience is and what needs they most often try to satisfy by consuming content, use that knowledge to help you craft your storyline. If you want to write a story and design a visualization that precisely targets the needs and wants of your readership, take the time to pinpoint why people would be interested in your story, and create a story that directly meets that desire in as many ways as possible.

Why your audience should care

People care about things that matter to them and that affect their lives. Generally, people want to feel happy and safe. They want to have fulfilling relationships. They want to have good status among their peers. People like to learn things, particularly things that help them earn more money. People like possessions and things that bring them comfort, status, and security. People like to feel good about themselves and what they do. This is all part of human nature.

The desires just described summarize why people care about anything — from the readers of your story to the person down the street. People care because it does something for them, it fills one of their core desires. Consequently, if your goal is to publish a high-performing, well-received data journalism piece, make sure to craft it in a way that fulfills one or two core desires of your target readership.
Article / Updated 04-18-2017
Data and stories are always more relevant to some places than others. From where is a story derived, and where is it going? If you keep these important facts in mind, the publications you develop are more relevant to their intended audience. The where aspect in data journalism is a bit ambiguous because it can refer to a geographical location or a digital location, or both.

Where is the story relevant?

You need to focus on where your story is most relevant so that you can craft the most compelling story by reporting on the most relevant trends. If your story is location independent — you're reporting on a trend that's irrelevant to location — of course you want to use data sources that most clearly demonstrate the trend on which you're reporting. Likewise, if you're reporting a story that's tied to a specific geographic location, you probably want to report statistics that are generated from regional areas demonstrating the greatest degree of extremes — either as greatest value fluxes or as greatest value differences for the parameters on which you're reporting.

Sometimes you find multiple geographic or digital locations that exemplify extreme trends and unusual outliers. In other words, you find more than one excellent information source. In these cases, consider using all of them by creating and presenting a data mashup — a combination of two or more data sources that are analyzed together in order to provide readers with a more complete view of the situation at hand.

Where should the story be published?

Another important question to consider in data journalism is, "Where do you intend to publish your story?" This where can be a geographical place, a particular social media platform, or a certain series of digital platforms associated with a particular brand — Facebook, Twitter, Pinterest, and Instagram accounts, as well as blogs, all tied together to stream data from one branded source. Just as you need to have a firm grasp on who your audience is, you should clearly understand the implications of where your publication is distributed. Spelling out where you'll be publishing helps you conceptualize to whom you're publishing, what you should publish, and how you should present that publication. If your goal is to craft high-performing data journalism articles, your headlines and storylines should cater to the interests of the people who are subscribed to the channels in which you're distributing. Because the collective interests of the people on each channel may differ slightly, make sure to adapt to those differences before posting your work.
Article / Updated 04-18-2017
As the old adage goes, timing is everything. It's a valuable skill to know how to refurbish old data so that it's interesting to a modern readership. Likewise, in data journalism, it's imperative to keep an eye on contextual relevancy and know when is the optimal time to craft and publish a particular story.

When as the context to your story

If you want to craft a data journalism piece that really garners a lot of respect and attention from your target audience, consider when — over what time period — your data is relevant. Stale, outdated data usually doesn't help a story make breaking news, and unfortunately you can find tons of old data out there. But if you're skillful with data, you can create data mashups that take trends in old datasets and present them in ways that are interesting to your present-day readership. For example, take gender-based trends in 1940s employment data and do a mashup — an integration, comparison, or contrast — of that data and employment data trends from the five years just prior to the current one. You could then use this combined dataset to support a truly dramatic story about how much things have changed, or how little things have changed, depending on the angle you're after with your piece.

Returning once again to the issue of ethical responsibilities in journalism, as a data journalist you walk a fine line between finding datasets that most persuasively support your storyline and finding facts that support a factually challenged story you're trying to push. Journalists have an ethical responsibility to convey an honest message to their readers. When building a case to support your story, don't take things too far — in other words, don't take the information into the realm of fiction. There are a million facts that could be presented in countless ways to support any story you're looking to tell. Your story should be based in reality, and not be some divisive or fabricated story that you're trying to promote because you think your readers will like it.

You may sometimes have trouble finding interesting or compelling datasets to support your story. In these situations, look for ways to create data mashups that tie your less-interesting data into some data that's extremely interesting to your target audience. Use the combined dataset as a basis for your data-driven story.

When does the audience care the most?

If your goal is to publish a data journalism piece that goes viral, then you certainly want to consider the story's timeliness: When would be the prime time to publish an article on this particular topic? For obvious reasons, you're not going to do well by publishing a story in 2017 about who won the 1984 election for U.S. president; everyone knows, and no one cares. Likewise, if a huge, present-day media scandal has already piqued the interest of your readership, it's not a bad idea to ride the tailwinds of that media hype and publish a related story. The story would likely perform pretty well, if it's interesting. As a recent example, you could have created a data journalism piece on Internet user privacy assumptions and breaches thereof and then published it in the days just after news of the Edward Snowden/NSA controversy broke. Keeping relevant and timely publishing schedules is one way to ensure that your stories garner the attention they need to keep you employed.
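Returning to the employment-data mashup described above, here is a minimal Python sketch (using pandas) of what combining an old dataset with a recent one might look like. The figures are invented placeholders; the point is only that the two sources are stacked into one table so the trend can be compared across eras.

```python
import pandas as pd

# Invented placeholder figures: share of women in the labor force by year.
historical = pd.DataFrame({
    "year": [1940, 1945],
    "women_share_pct": [27.0, 36.0],
})
recent = pd.DataFrame({
    "year": [2019, 2023],
    "women_share_pct": [57.0, 57.5],
})

# The "mashup" is simply the two sources combined into one table so that
# trends can be compared or contrasted across eras.
mashup = pd.concat([historical, recent], ignore_index=True)
mashup["change_pct_points"] = mashup["women_share_pct"].diff()
print(mashup)
```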
Article / Updated 04-18-2017
The Washington Post story "The Black Budget" is an incredible example of data science in journalism. When former NSA contractor Edward Snowden leaked a trove of classified documents, he unleashed a storm of controversy not only among the public but also among the data journalists who were tasked with analyzing the documents for stories. The challenge for data journalists in this case was to discover and disclose data insights that were relevant to the public without compromising the safety of ordinary citizens.

Among the documents leaked by Snowden was the so-called Black Budget for fiscal year 2013, a 178-page line-by-line breakdown of the funds that were earmarked for 16 various U.S. federal intelligence agencies. Through the Washington Post's "The Black Budget," the American public was informed that $52.6 billion in taxpayer dollars had been spent on mostly covert federal intelligence services in 2013 alone.

The Washington Post did a phenomenal job in its visual presentation of the data. The opening title is a somber visual pun: The words The Black Budget are written in a huge black box contrasted only with gray and white. This layout visually implies the serious and murky nature of the subject matter. The only touch of color is a navy blue, which conjures a vaguely military image and barely contrasts with the black. This limited palette is continued throughout the visual presentation of the data.

Washington Post data journalists used unusual blocky data graphics — an unsettling, strangely horizontal hybrid of a pie chart, a bar graph, and a tree map — to hint at the surreptitious and dangerous nature of the topic, as well as the shady manner in which the information was obtained. The data graphics used in the piece exhibited a low data-to-ink ratio — in other words, only a little information is conveyed with a lot of screen space. Although normally a low data-to-ink ratio indicates bad design, the data-to-ink ratio here effectively hints that mountains of data lie underneath the layers being shown, and that these layers remain undisclosed so as not to endanger intelligence sources and national security.

Traditional infographic elements used in this piece include stark, light gray seals of the top five intelligence agencies, only three of which the average person would have ever seen. Simple bar charts outlined funding trends, and people-shaped icons represented the army of personnel involved in intelligence gathering.

A lot of thought went into the collection, analysis, and presentation of this story. The ensemble is an unsettling, yet overwhelmingly informative, piece of data journalism. Although this sort of journalism was in its infancy even just a decade ago, now the data and tools required for this type of work are widely available for journalists to use to quickly develop high-quality data journalism articles.