The Duties for a Data Analysis Coding Job

By Nikhil Abraham

Data analysts sift through large volumes of data, looking for insights that help drive the product or business forward. This coding role marries programming and statistics in the search for patterns in the data. Popular examples of data analysis in action include the recommendation engines used by Amazon to make product suggestions to users based on previous purchases and by Netflix to make movie suggestions based on movies watched.

The data analyst’s first challenge is simply importing, cleaning, and processing the data. A website can generate millions of database entries of user data each day, a volume that calls for complicated techniques, referred to as machine learning, to create classifications and predictions from the data.

For example, half a billion messages are sent per day using Twitter; some hedge funds analyze this data and classify whether a person talking about a stock is expressing a positive or negative sentiment. These sentiments are then aggregated to see whether a company has a positive or negative public opinion before the hedge fund purchases or sells any stock.
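
The classify-then-aggregate idea can be sketched in a few lines of Python. This is only a toy illustration, not how any hedge fund actually does it: the word lists, the function names, and the sample messages are all made up for the example, and a real system would rely on a trained machine learning model rather than keyword matching.

```python
import string
from collections import Counter

# Hypothetical word lists, invented for this sketch.
POSITIVE_WORDS = {"buy", "great", "strong", "love", "up"}
NEGATIVE_WORDS = {"sell", "weak", "bad", "hate", "down"}


def classify(message):
    """Label one message as 'positive', 'negative', or 'neutral'."""
    # Lowercase each word and trim surrounding punctuation before matching.
    words = [w.strip(string.punctuation) for w in message.lower().split()]
    score = sum(w in POSITIVE_WORDS for w in words) - sum(
        w in NEGATIVE_WORDS for w in words
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def overall_opinion(messages):
    """Aggregate per-message labels into a single overall sentiment."""
    counts = Counter(classify(m) for m in messages)
    if counts["positive"] > counts["negative"]:
        return "positive"
    if counts["negative"] > counts["positive"]:
        return "negative"
    return "neutral"


# Made-up sample messages about one company's stock.
sample_messages = [
    "Love this stock, strong earnings",
    "Time to sell, weak outlook",
    "Great quarter, going up",
    "Shares opened today",
]
opinion = overall_opinion(sample_messages)
```

With these sample messages, two are labeled positive, one negative, and one neutral, so the overall opinion comes out positive.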

Any programming language can be used to analyze data, but the most popular programming languages used for the task are R, Python, and SQL. Publicly shared code in these three languages makes it easier for individuals entering the field to build on another person’s work. While crunching the data is important, employers also look for data analysts with skills in the following:

  • Visualization: Just as important as finding insight in the data is communicating that insight. Data visualization uses charts, graphs, dashboards, infographics, and maps, which can be interactive, to display data and reduce the complexity such that one or two conclusions appear obvious. Common data visualization tools include D3.js, a JavaScript graphing library, and ArcGIS for geographic data.

    The two Manhattan addresses farthest away from Starbucks.

  • Distributed storage and processing: Processing large amounts of data on one computer can be time‐intensive. One option is to purchase a single faster computer. Another option, called distributed storage and processing, is to purchase multiple machines and divide the work among them. For example, imagine that you want to count the number of people living in Manhattan. In the distributed storage and processing approach, you might ring the doorbells of odd‐numbered homes while someone else rings those of even‐numbered homes, and when everyone finishes you would sum the counts.
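
The divide-and-sum approach from the doorbell example might look like the following Python sketch. It makes several simplifying assumptions: threads on one computer stand in for separate machines, the function names are hypothetical, and the list of homes is made-up data where each entry is a house number paired with its number of residents.

```python
from concurrent.futures import ThreadPoolExecutor


def count_residents(homes):
    """One worker's job: total the residents in its share of the homes."""
    return sum(residents for _house_number, residents in homes)


def distributed_count(homes, workers=2):
    """Split the homes among workers, count in parallel, then sum."""
    # Divide the work by house number. With two workers, one gets the
    # even-numbered homes and the other gets the odd-numbered homes.
    shares = [
        [home for home in homes if home[0] % workers == i]
        for i in range(workers)
    ]
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_residents, shares))
    # When every worker has finished, add up the partial counts.
    return sum(partial_counts)


# Made-up data: (house number, number of residents).
homes = [(1, 3), (2, 4), (3, 1), (4, 2), (5, 5)]
total = distributed_count(homes)
```

Each worker only ever sees its own share of the homes, which is what lets the same idea scale to many machines: the final step is just adding a handful of partial counts.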

Data analysts work with back‐end developers to gather the data needed for their work. After the data analysts have drawn conclusions from the data and come up with ideas for improving the existing product, they meet with the entire team to help design prototypes that test those ideas on existing customers.