Exploring Exversion Data - dummies

By Lillian Pierson

Modeled after GitHub, the cloud-hosted platform on which programmers collaboratively share and review code, Exversion aims to provide the same collaborative functionality around data that GitHub provides around code. The Exversion platform offers version control and hosting services that let you upload and share your data.

To see how Exversion works, imagine a platform that lets you fork (copy) a dataset and then change the copy however you want. Exversion keeps track of how your copy differs from the original and records every change that you make to it. Exversion also lets users rate, review, and comment on datasets.
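The fork-and-track idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Exversion's implementation or API: the function name diff_rows, the id key column, and the sample rows are all hypothetical, and the sketch assumes each row carries a unique identifier.

```python
import copy

def diff_rows(original, fork, key="id"):
    """Compare two versions of a dataset (lists of dicts) by a key column.

    Hypothetical helper for illustration; not part of any Exversion API.
    """
    old = {row[key]: row for row in original}
    new = {row[key]: row for row in fork}
    added = [new[k] for k in new if k not in old]
    removed = [old[k] for k in old if k not in new]
    changed = [(old[k], new[k]) for k in new if k in old and old[k] != new[k]]
    return {"added": added, "removed": removed, "changed": changed}

# Made-up sample data standing in for an original dataset.
original = [
    {"id": 1, "name": "alpha", "value": 10},
    {"id": 2, "name": "beta", "value": 20},
]

# A fork is an independent copy that you are free to edit.
fork = copy.deepcopy(original)
fork[1]["value"] = 25                                  # edit an existing row
fork.append({"id": 3, "name": "gamma", "value": 30})   # add a new row

changes = diff_rows(original, fork)
print(changes)
```

Because the fork is a deep copy, edits to it never touch the original, which is what makes it possible to report exactly what was added, removed, or changed.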

Datasets hosted on the Exversion platform are either provided by a user or created by a spider that crawls and indexes open data to make it searchable from a single application programming interface (API). As on GitHub, all the data you upload to Exversion with a free user account is public. With a paid account, you can create your own private data repositories, and you also get the option to share your data with selected users for collaborative projects.

When you work on collaborative projects, version control becomes vitally important. Instead of learning this lesson the hard way, start your project on a version-enabled application or platform. Doing so will save you a lot of trouble later.

Exversion is extremely useful in the data-cleanup stage. Most developers are familiar with data-cleanup hassles. Imagine you want to use a particular dataset, but in order to do so, you must put tabs in all the right places to make the columns line up correctly.

Meanwhile, the other 100 developers working with that dataset are doing exactly the same thing. In contrast, if you download the data, clean it, and then upload it to Exversion, other developers can use the cleaned version and don’t have to repeat the work later. In this way, everyone benefits from everyone else’s work, and each person can spend more time analyzing data and less time cleaning it.
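As a rough picture of the kind of cleanup described above, here is a minimal Python sketch that puts tabs in the right places for a file whose columns are separated by inconsistent whitespace. The function name to_tsv and the sample rows are hypothetical, and the sketch assumes that columns are separated by a tab or by two or more whitespace characters, while a single space belongs inside a value. Real datasets often need more care than this.

```python
import re

def to_tsv(lines):
    """Normalize messy column separators to single tabs.

    Assumption for this sketch: a tab, or a run of 2+ whitespace
    characters, separates columns; a single space is part of a value.
    """
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = re.split(r"\t+|\s{2,}", line)
        cleaned.append("\t".join(field.strip() for field in fields))
    return cleaned

# Made-up rows with a mix of spaces and tabs between columns.
raw = [
    "id   name    score",
    "1\tfirst item   10",
    "",
    "2  second item\t 20",
]

clean = to_tsv(raw)
for row in clean:
    print(row)
```

Once a cleaned file like this is uploaded and shared, the normalization only has to be done once rather than by every developer who uses the data.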