10 Free Resources for Data Science - dummies

By Lillian Pierson

One excellent thing about data science is that you don’t need to spend a lot of money on software applications and source data to start reaping its benefits. You can start with your own data and use open source applications or programming languages to begin deriving valuable insights.

Even contextual data can be gathered for free from open data sources. You’ll likely get better, more precise results (with less effort) if you have money to spend on expensive tools and precisely targeted data sources, but that is by no means an absolute requirement.

Many of the resources presented here are made available as part of the donating organization’s contribution to the open movement — a movement that advocates the free exchange of intellectual property in the areas of art, education, software, data, and technology, among others. Three major components of the open movement are open data, open source, and open knowledge.

Open data initiatives are based on the premise that data should be made freely and openly available to the general public for use, reuse, and republishing. Open data might be issued under some sort of open licensing requirement, but it’s generally free of copyright and patent restrictions.

The open source software movement started in the late 1970s and is based on the premise that software users should be able to freely share software. The movement also advocates that developers be permitted to build derivative works, whether open source or proprietary, on top of that software.

Lastly, the open knowledge movement has positioned itself on top of other open movements and sets forth the idea that knowledge in all forms should be made freely and openly available to the general public. The open knowledge movement also advocates that people should be free to use, reuse, share, and republish that knowledge.

Access open data through Open Knowledge

The largest proponent of the open knowledge movement is the nonprofit organization Open Knowledge. This organization defines open knowledge as the useful, meaningful, and valuable insights derived from open data sources. The Open Knowledge website is a terrific place to meet and collaborate with open data and data science enthusiasts.

Find free data at World Bank Open Data

World Bank Open Data is an incredibly robust source of data and demographics on developing nations. Topics include health, infrastructure, poverty, trade, urban development, agriculture and rural development, and the environment, among many others. Data on the World Bank’s indicator metrics is also publicly available, and these indicators make a great basis for comparative analysis between developing nations.
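If you’d rather pull indicator data programmatically than download it by hand, the following minimal Python sketch shows one way to do it. It assumes the World Bank’s public API v2 URL layout (`https://api.worldbank.org/v2/country/<codes>/indicator/<indicator>`) and its JSON response shape: a two-element array of paging metadata followed by a list of records. The helper names are hypothetical, and `SP.POP.TOTL` (total population) is used only as an example indicator code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: World Bank API v2 base URL and response layout.
WB_BASE = "https://api.worldbank.org/v2"


def indicator_url(countries, indicator, year):
    """Build a URL for one indicator, several countries (ISO codes), one year."""
    codes = ";".join(countries)  # the API accepts semicolon-separated country codes
    query = urlencode({"format": "json", "date": str(year), "per_page": 500})
    return f"{WB_BASE}/country/{codes}/indicator/{indicator}?{query}"


def parse_indicator(payload):
    """Map country name -> value, skipping countries with no reported value."""
    # payload[0] is paging metadata; payload[1] holds the records (absent on errors).
    records = payload[1] if len(payload) > 1 and payload[1] else []
    return {r["country"]["value"]: r["value"] for r in records if r["value"] is not None}


def rank(values):
    """Order countries from highest to lowest value for a quick comparison."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


def fetch_indicator(countries, indicator, year):
    """Download and parse one indicator (requires network access)."""
    with urlopen(indicator_url(countries, indicator, year)) as resp:
        return parse_indicator(json.load(resp))
```

For example, `rank(fetch_indicator(["BRA", "IND", "KEN"], "SP.POP.TOTL", 2020))` would list those countries ordered by reported population, provided the API is reachable and behaves as assumed here.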

Source free spatial data from OpenStreetMap

OpenStreetMap (OSM) is an excellent source of free, open geographic data that you can use for analyzing data in Geographic Information Systems (GIS) or for developing location-aware web applications. At the OSM Export page you can download the entire (massive) database of OSM spatial data. (To download data extracts from only the continents, countries, and cities you want, be sure to use the Geofabrik Downloads option; you can also use OSM’s Metro Extracts feature to download metropolitan data on the world’s largest cities.)
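Once you’ve downloaded an extract, a few lines of Python can summarize it. This sketch assumes the extract is in OSM’s XML format (`.osm`, or `.osm.bz2` opened with `bz2.open`); the binary `.osm.pbf` format needs a dedicated library instead. It streams the file with `xml.etree.ElementTree.iterparse` and clears each element after use so memory stays modest, counting how often each value of a tag key such as `amenity` appears. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tag_values(source, key="amenity"):
    """Count values of one tag key across nodes, ways, and relations.

    `source` may be a filename or a binary file object containing OSM XML.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            # OSM stores attributes as <tag k="..." v="..."/> children of each element.
            for tag in elem.findall("tag"):
                if tag.get("k") == key:
                    counts[tag.get("v")] += 1
            elem.clear()  # free the element's children once it has been counted
    return counts
```

With a hypothetical file name, `count_tag_values(bz2.open("my-city.osm.bz2")).most_common(10)` would list the ten most common amenity types in the extract.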

Gather free government data from Data.gov

Data.gov is a tremendous source if you’re looking for free data on business, environment, public health, and research and development (R&D) in the USA. Data.gov is awesome because, in addition to standard tabular datasets, it also offers spatial data sources that you can download and use for analysis in GIS.
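Data.gov’s catalog runs on the open source CKAN platform, so it can also be searched programmatically. The sketch below assumes CKAN’s standard `package_search` action at `https://catalog.data.gov/api/3/action/package_search` and the usual CKAN response shape (`result`, then `results`, then each dataset’s `resources`). The format labels it filters on (`GeoJSON`, `SHP`, `KML`) are assumptions, since publishers label files inconsistently, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Data.gov exposes the standard CKAN v3 action API at this URL.
SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"
SPATIAL_FORMATS = ("GeoJSON", "SHP", "KML")  # assumed format labels


def search_url(query, rows=10):
    """Build a CKAN package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{SEARCH_URL}?{params}"


def spatial_resources(response, formats=SPATIAL_FORMATS):
    """Return (dataset title, format, download URL) for GIS-friendly files."""
    wanted = {f.lower() for f in formats}
    hits = []
    for dataset in response["result"]["results"]:
        for res in dataset.get("resources", []):
            fmt = (res.get("format") or "").strip()
            if fmt.lower() in wanted:
                hits.append((dataset["title"], fmt, res.get("url")))
    return hits


def search(query, rows=10):
    """Run the search (requires network access) and return the parsed JSON."""
    with urlopen(search_url(query, rows)) as resp:
        return json.load(resp)
```

Calling `spatial_resources(search("public health", rows=20))` would then list candidate files you could load into a GIS.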

Compute with languages born of the open source movement

Programming languages suited to data science, like R and Python, are open source and free to use for analysis or application development. Both languages are well suited to data analysis and visualization. R has the advantage of offering more sophisticated statistical and data visualization capabilities, while Python’s advantage is that it is generally far easier to learn than R.
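To give a sense of how little code a basic analysis takes, here is a minimal Python sketch that uses only the standard library’s `csv` and `statistics` modules to summarize one numeric column. The function name, and the assumption that your data arrives as CSV text with a header row, are illustrative.

```python
import csv
import io
import statistics


def summarize(csv_text, column):
    """Return count, mean, median, and sample standard deviation for one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip blank cells rather than failing on them.
    values = [float(row[column]) for row in reader if (row[column] or "").strip()]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # needs at least two values
    }
```

In day-to-day work, many analysts would use pandas in Python or `summary()` in R for the same job; the standard-library version simply keeps the example dependency-free.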

Analyze data for free with the Data Applied application

If you don’t want to write custom code for your analysis, you can use a free web-based application instead. Data Applied offers robust functionality for creating pivot tables, tree maps, and predictive forecasts, along with features for correlation, outlier, and association analysis.
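If you later want to reproduce some of these analyses in code, the core ideas are small. The following plain-Python sketch (an illustration of the general techniques, not how Data Applied works internally) covers three of them: a sum pivot table, Pearson correlation, and a simple z-score outlier test. The default threshold of 2 standard deviations is an arbitrary assumption, and all names are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, like a spreadsheet pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[row_key]][r[col_key]] += r[value_key]
    return {row: dict(cols) for row, cols in table.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def zscore_outliers(values, threshold=2.0):
    """Values lying more than `threshold` population standard deviations from the mean."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sd > threshold]
```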

Make maps with Geocommons

Geocommons is a free web-based mapping application. Its most salient feature is its integration of related charts and Google Earth functionality. With Geocommons, you can upload data and automatically geocode your identifiers. You can visualize data either as a choropleth (for which you can devise a number of automated or manual coloring schemes) or with bubbles and markers (such as flags and droplets that convey data by size or by color).
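Behind a choropleth’s automated coloring schemes is a classification step: each area’s value is sorted into a class, and each class gets a color. This sketch shows one common automated method, equal-interval breaks, and also accepts manually chosen breaks. It illustrates the general technique rather than Geocommons’ own algorithm, and the function names and five-color palette are assumptions.

```python
import bisect

# Hypothetical light-to-dark sequential palette (one color per class).
PALETTE = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]


def equal_interval_breaks(values, k):
    """Split the data range into k equal-width classes; return the k-1 inner breaks."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def classify(value, breaks):
    """Class index 0..len(breaks); a value exactly on a break goes to the upper class."""
    return bisect.bisect_right(breaks, value)


def color_areas(values_by_area, breaks=None, palette=PALETTE):
    """Map each area to a color, using manual breaks if given, else equal intervals."""
    if breaks is None:
        breaks = equal_interval_breaks(list(values_by_area.values()), len(palette))
    if len(breaks) + 1 > len(palette):
        raise ValueError("need one palette color per class")
    return {area: palette[classify(v, breaks)] for area, v in values_by_area.items()}
```

Quantile classes, which put roughly the same number of areas in each class, are the other common automated choice; `statistics.quantiles(values, n=k)` returns the cut points for that scheme, and you could pass them as `breaks`.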

Geocommons also lets you insert a simple line chart, bar chart, column chart, or scatter plot (generally not easy to do on a map) to show the relationship between two quantitative variables. Finally, Geocommons integrates marker or categorical area visualizations with Google Earth so that you can view your spatial data on a spherical, rotatable, zoomable depiction of the planet. (Note that to use this feature, you need to install the Google Earth plug-in.)

Make maps with Datawrapper

Datawrapper is a free, open source tool that you can use to create simple visualizations. It has two features not found elsewhere: an Election Donut data graphic that resembles the seating layout of various houses of parliament or congresses, and an easy-to-use highlighting feature that emphasizes a visualization’s core message by highlighting a single data point in a graph. Datawrapper offers the following data graphic types: column charts, pie charts, donut charts, election donut charts, maps, and bar charts.
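The highlighting idea is simple to express in code: draw every data point in a muted color except the one you want readers to notice. This Python sketch only computes the color assignment, which you could pass to any charting tool; it isn’t Datawrapper’s implementation, and the function name and color values are assumptions.

```python
def highlight_colors(labels, focus, accent="#d62728", muted="#c7c7c7"):
    """Return one color per label: accent for the focus label, muted for the rest."""
    if focus not in labels:
        raise ValueError(f"{focus!r} is not one of the chart's labels")
    return [accent if label == focus else muted for label in labels]
```

Keeping the non-highlighted points visible but gray preserves context while still directing attention to the single point that carries the message.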

Make cool interactive infographics with Infoactive.co

Infoactive.co is a low-cost, web-based data visualization design application. It offers an attractive template in the minimalist “flat design” style of text and visualization that is all the rage these days. The application can add mouse-over events, data filtering, and even data connections to live-updating online sources such as Google Sheets.

Connecting your Infoactive.co infographic to data that’s stored in a Google Sheet can produce dynamic results. Every time the spreadsheet is updated, the Infoactive visualization is updated at the same time.
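Live-updating visualizations like this generally work by re-reading the source on a schedule and redrawing only when something has changed. The Python sketch below illustrates that polling pattern; it is not Infoactive.co’s mechanism. It assumes the sheet is shared publicly and can be read through Google Sheets’ CSV export URL (`https://docs.google.com/spreadsheets/d/<sheet-id>/export?format=csv&gid=<tab-id>`), and `render` stands in for whatever hypothetical function redraws your chart.

```python
import csv
import hashlib
import io
import time
from urllib.request import urlopen


def export_url(sheet_id, gid=0):
    """CSV export URL for one tab of a publicly shared Google Sheet (assumed format)."""
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={gid}"


class ChangeDetector:
    """Remembers a hash of the last content seen and reports when it differs."""

    def __init__(self):
        self._last_digest = None

    def changed(self, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        is_new = digest != self._last_digest
        self._last_digest = digest
        return is_new


def poll(sheet_id, render, interval_s=60):
    """Fetch the sheet every interval_s seconds; call render(rows) when it changes."""
    detector = ChangeDetector()
    while True:  # runs until interrupted
        with urlopen(export_url(sheet_id)) as resp:
            text = resp.read().decode("utf-8")
        if detector.changed(text):
            render(list(csv.reader(io.StringIO(text))))
        time.sleep(interval_s)
```

Hashing the whole download keeps the change check cheap and avoids redrawing when nothing in the sheet has moved; a hosted service would more likely rely on push notifications, but polling is the simplest pattern to reason about.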

Infoactive.co offers many data graphic types, including column charts, pie charts, donut charts, percentage icons, line charts, area charts, maps, and bar charts. And as a little tidbit about Infoactive.co: the startup was born out of a successful Kickstarter campaign! That just goes to show that you can never really predict where success will come from.

Chart with ChartBuilder

ChartBuilder is amazing because building charts in the ChartBuilder application is even easier than in Microsoft Excel, and ChartBuilder charts can look more attractive and professional than those designed in standard Excel. You can use ChartBuilder for free to make line charts, column charts, bar charts, and scatter plots.

The application also lets you import your data with a simple copy and paste and generate usable results in mere seconds. Although the site doesn’t offer data storage, you can download your visualizations with a single click, as either bitmap images or SVG vector graphics.
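To see why SVG output is handy (it scales without blurring and can be edited in any vector editor), here is a minimal Python sketch of the same copy-and-paste workflow. It parses tab-separated text, the form cells usually take when copied from a spreadsheet, and writes a column chart as SVG. It isn’t ChartBuilder’s code: it assumes a header row, label-then-value columns, and non-negative values, and the function names are hypothetical.

```python
from xml.sax.saxutils import escape


def parse_pasted(text):
    """Parse tab-separated 'label<TAB>value' rows, skipping the header row."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    rows = [line.split("\t") for line in lines[1:]]
    return [(label.strip(), float(value)) for label, value, *_ in rows]


def column_chart_svg(data, width=300, height=150, gap=4):
    """Render (label, value) pairs as a simple SVG column chart."""
    peak = max(value for _, value in data) or 1  # avoid dividing by zero
    bar_w = (width - gap * (len(data) + 1)) / len(data)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for i, (label, value) in enumerate(data):
        bar_h = value / peak * (height - gap)  # tallest bar leaves a small top margin
        x = gap + i * (bar_w + gap)
        parts.append(
            f'<rect x="{x:.1f}" y="{height - bar_h:.1f}" width="{bar_w:.1f}" '
            f'height="{bar_h:.1f}" fill="#4c72b0"><title>{escape(label)}: {value:g}</title></rect>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Writing the result to a `.svg` file produces a chart you can open in a browser or refine in a vector editor.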