Google this week announced version 2.0 of Google Refine, a free tool with some attractive features that further expands the options for organisations wishing to get started with data quality.
According to Google's open source blog, Google Refine 2.0 offers the following:
"Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases."
It has already made data quality headlines: reporters have been using it to uncover discrepancies in government and healthcare records. For an open source product it has some extremely powerful features. In particular, the geocoding and language detection features tap into Google's web services capabilities.
Given Google's size and open source reach, it is clearly a product to keep tabs on.
Check out the videos and resource links at the end of this post.
(Thanks to member Paul Young at Capita for sharing this story.)
- Google Refine: main developer website (download code, view documentation, etc.)
- New features in version 2.0
- Sample Datasets: some sample data to get you started
- Related Software: other open source products (as listed by Google)
- Chicago Tribune write-up: Freebase Gridworks overview (the precursor to Refine)