Core Data Curators

The Core Data Curators curate the core data.

Curation involves identifying and locating core (public) data, and packaging it up as high-quality data packages.

New team members wanted: We are always seeking volunteers to join the Data Curators team. Be part of a crack team and hone your data wrangling skills while helping to provide high-quality data to the community.

What Roles and Skills are Needed

We have a variety of roles, from identifying new "core" data, to collecting and packaging the data, to performing quality control.

Core Skills – at least one of these skills is strongly recommended:

  • Data Wrangling Experience. Much of our source data is not complex (just an Excel file or similar) and can be "wrangled" in a spreadsheet program. We therefore recommend at least one of:
    • Experience with a spreadsheet application such as Excel or (preferably) Google Docs, including use of formulas and (desirably) macros (you should at least know how to quickly convert a cell containing '2014' to '2014-01-01' across 1000 rows)
    • Coding for data processing (especially scraping) in one or more of Python, JavaScript, Bash
  • Data sleuthing: the ability to dig up data on the web (specific desirable skills: you know how to search by filetype in Google, you know where the developer tools are in Chrome or Firefox, you know how to find the URL a form posts to)
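
For those taking the coding route, the '2014' to '2014-01-01' conversion mentioned above might look something like the following. This is only a minimal sketch using the Python standard library; the function names and the column name `year` are illustrative assumptions, not part of any team tooling.

```python
import csv
import io


def year_to_iso_date(value: str) -> str:
    """Turn a bare four-digit year such as '2014' into '2014-01-01'."""
    value = value.strip()
    if len(value) == 4 and value.isascii() and value.isdigit():
        return f"{value}-01-01"
    return value  # anything that is not a bare year is left untouched


def convert_column(csv_text: str, column: str) -> str:
    """Apply year_to_iso_date to one column of a CSV given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[column] = year_to_iso_date(row[column])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data; a real file would simply have more rows.
sample = "year,value\n2014,1\n2015,2\n"
converted = convert_column(sample, "year")
print(converted)
```

In a spreadsheet, a formula along the lines of `=A1&"-01-01"` filled down the column achieves the same result.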

Desirable Skills (the more the better!):

  • Data vs Metadata: know the difference between data and metadata
  • Familiarity with Git (and GitHub)
  • Familiarity with a command line (preferably bash)
  • Know what JSON is
  • Mac or Unix is your default operating system (will make access to relevant tools that much easier)
  • Knowledge of Web APIs and/or HTML
  • Use of curl or similar command line tool for accessing Web APIs or web pages
  • Scraping using a command line tool or (even better) by coding yourself
  • Know what a Data Package and a Tabular Data Package are
  • Know what a text editor is (e.g. Notepad, TextMate, Vim, Emacs, …) and know how to use it (useful both for working with data and for editing Data Package metadata)
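
As a rough illustration of the data versus metadata distinction and of Data Package metadata: a Data Package is, broadly, a set of data files plus a `datapackage.json` descriptor that describes them. The sketch below builds a minimal descriptor for a single CSV resource. The package name, file path, and field list are hypothetical, and real packages usually carry more metadata (title, licenses, sources).

```python
import json


def build_descriptor(name, csv_path, fields):
    """Build a minimal Data Package descriptor for one CSV resource.

    `fields` is a list of (column_name, type) pairs for the table schema.
    """
    return {
        "name": name,
        "resources": [
            {
                "name": name,
                "path": csv_path,  # the data lives in this file
                "schema": {
                    "fields": [{"name": n, "type": t} for n, t in fields]
                },
            }
        ],
    }


# Hypothetical example values, for illustration only.
descriptor = build_descriptor(
    "example-gdp",
    "data/gdp.csv",
    [("year", "date"), ("value", "number")],
)

# The descriptor is the metadata; it would be saved as datapackage.json
# alongside the CSV file it describes.
print(json.dumps(descriptor, indent=2))
```

Here the CSV file is the data, and the JSON descriptor is the metadata that a text editor would be used to maintain.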

Get Involved - Sign Up Now!

Here's what you need to know when you sign up:

  • Time commitment: Members of the team commit to at least 8-16 hours per month (this is an average, so if you are especially busy one month and do less, that is fine)
  • Schedule: There is no schedule, so you can contribute at any time that suits you: evenings, weekends, lunchtimes, etc.
  • Location: all activity will be carried out online so you can be based anywhere in the world
  • Skills: see above

To register your interest, please get in touch via the issue tracker: https://github.com/datasets/awesome-data/issues
