ODI Toolbox application experiments: from Comma Chameleon to Data Curator

ODI Code Fellow Stephen Fortune explains how the ODI Australian Network – an ODI Node – built on ODI Toolbox application Comma Chameleon’s code to enhance the application while working in the spirit of open source software development

Since my previous post, I’ve been continuing much-needed housekeeping of the ODI Toolbox applications while providing as much assistance to the ODI Australian Network (previously ODI Queensland) as possible. As hinted at last time, the ODI Australian Network – one of our Pioneer Nodes – have also been busy, thinking through future directions for our ODI Toolbox application, Comma Chameleon.

Comma Chameleon is one of our ‘alpha stage’ applications – a desktop CSV editor, designed to make creating machine-readable tabular data easy through a minimal interface. Comma Chameleon is like a pared-back version of spreadsheet software with integrations to some of our other toolbox apps, namely CSV Lint (for validation) and Octopub (for publishing your well-formed CSV dataset to GitHub as open data).

Stephen Gates, an active contributor to several ODI Toolbox applications – most notably Open Data Pathway – and the Australian team expressed an interest in taking Comma Chameleon further. They proposed developing it into an application that open data creators could use as part of their data publishing workflow. A proposal to fund the further development of Comma Chameleon was prepared and accepted.

Needless to say, this was an exciting development. I’d arrived at the ODI to treat the ODI Toolbox as a maintenance effort, creating a hospitable environment in which a community could establish itself. But here was both the energy and resources from an ODI Node to build on the existing code and enhance the application, all the while working in the spirit of OSS development.

Enter, Data Curator

As discussions between the ODI Australian Network and ODI HQ progressed, a consensus emerged that the ODI Australian Network were taking Comma Chameleon in a slightly different direction than ODI HQ had intended. There were additional concerns that the ODI Labs team, with its currently reduced capacity, would be unable to assist with the additional overheads the proposal required. Given those differences in approach, both ODI HQ and the ODI Australian Network agreed it would make more sense to fork the existing code and develop a separate project under a new name, Data Curator.

In terms of open source development, forking is a big deal. As Karl Fogel – who literally wrote the book on open source software development – says: ‘at its simplest, a fork is when one copy of a project diverges from another copy’. All forks therefore have the potential to split the attention of the contributor community (though in Fogel’s outline, social forks – when a disaffected group of developers decides to create a divergent version of a project more in line with their own vision – are the ones most likely to do so).

I think it is important to understand why this fork had to happen, and what it means for the respective futures of Comma Chameleon and Data Curator.

First off, what does the change in direction amount to? The first, and most significant, aspect is creating an editor that is both dataset and schema-oriented. Schemas are metadata files that define the format and structure of data. For tabular data, such as CSVs, a schema describes what the contents of cells in a CSV file should contain, on a column-by-column basis.

Data Curator is intended to help users working on a CSV dataset to create an accompanying metadata file for the CSV file they’re currently editing and then use that schema to validate the data. Sharing the schema with open data consumers helps them better understand and use the data.
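To make the idea of column-by-column validation concrete, here is a minimal sketch in Python. It is not how Data Curator, CSV Lint or the Frictionless Data libraries work internally; the schema layout, the `SCHEMA` and `CHECKS` names and the `validate` function are all hypothetical, chosen only to show a schema describing each column and a check of every cell against it.

```python
import csv
import io
from datetime import date

# Hypothetical schema: one entry per column, in the order the columns appear.
# The layout is an assumption for illustration, not a real schema standard.
SCHEMA = {
    "fields": [
        {"name": "id", "type": "integer", "required": True},
        {"name": "name", "type": "string", "required": True},
        {"name": "opened", "type": "date", "required": False},
    ]
}

# Each type maps to a parser that raises ValueError on a bad cell.
CHECKS = {
    "integer": int,
    "number": float,
    "date": date.fromisoformat,  # expects YYYY-MM-DD
    "string": str,
}


def validate(csv_text, schema):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    expected = [field["name"] for field in schema["fields"]]
    if reader.fieldnames != expected:
        errors.append(f"header: expected {expected}, found {reader.fieldnames}")
        return errors
    # The header is line 1, so the first data row is line 2
    # (assuming no cell contains an embedded line break).
    for row_number, row in enumerate(reader, start=2):
        for field in schema["fields"]:
            name = field["name"]
            value = row[name]
            if value is None or value == "":
                if field.get("required"):
                    errors.append(f"row {row_number}, column '{name}': value is required")
                continue
            try:
                CHECKS[field["type"]](value)
            except ValueError:
                errors.append(
                    f"row {row_number}, column '{name}': "
                    f"'{value}' is not a valid {field['type']}"
                )
    return errors


if __name__ == "__main__":
    sample = "id,name,opened\n1,Library,2017-01-31\nx,Pool,2017-13-01\n"
    for problem in validate(sample, SCHEMA):
        print(problem)
```

Because the schema is plain data kept separate from the CSV, the same file could in principle be shared with data consumers so that they can see what each column is expected to contain.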

When I initially worked on Comma Chameleon in 2015, I was interested in more closely coupling the schema and the CSV file, though I obviously had no idea of the level of work it would require (as the fossil file schemawizard.js attests). Data Curator’s schema emphasis envisions a different end-user, one with more tightly defined user needs than the existing Comma Chameleon application is intended for.

The second distinguishing feature of Data Curator is its implementation. Offline operation is a crucial requirement as open data creators need approval before publishing the data and supporting documentation online. Offline data validation is already included in Comma Chameleon – implemented by the Ruby gem CSVLint, incorporated via the Travelling Ruby library. The ODI Australian Network's preference is to have an all-JavaScript solution – entirely sensible, given that this is the language of the Electron runtime. CSV validation will be accomplished by reusing a JavaScript library from Open Knowledge International's (OKI) Frictionless Data project. OKI have been upgrading their Frictionless Data specification (and supporting libraries) to a version 1.0 over the summer months.

CSV Lint supports a prior version of the Frictionless Data specification, with some variations. It also supports the W3C CSV for the Web standard. Removing the CSVLint gem from Comma Chameleon would mean that it no longer supported both schema standards. The ODI considers multiple supported standards a measure of healthy data infrastructure, and we typically aspire to have our tools support several standards, rather than one to the exclusion of others in the data ecosystem. The ODI Australian Network's view was that the broad software support available for the Frictionless Data specification outweighed the value of supporting multiple standards.

Lastly, there was an aspiration to release Comma Chameleon on both the Mac and Windows app stores to extend its reach and confidence in its use. As one of our Comma Chameleon contributors remarked, this would result in people having different expectations around the product’s support. ODI HQ had to concede that we didn’t currently have the capacity to satisfactorily meet these expectations. That same lack of capacity meant we couldn’t support the infrastructure required to distribute the application to the Mac and Windows stores.

The agreement ODI HQ and the ODI Australian Network reached speaks directly to the question that animated my fellowship:

How can software tools useful to data literacy and data practice be maintained and sustained?

Short fork, social fork or no fork at all?

Comma Chameleon is – and, given that a fork has occurred, will remain – a very ‘rough around the edges’ experiment. To get it to the point where it can even be appraised as a tool useful to best data practices requires funding, which the ODI Australian Network has been able to unlock. It has also required vim, vigour and affection. Stephen Gates and his seemingly indefatigable reserves of enthusiasm deserve so much credit for what's happened for Comma Chameleon, and for the future of Data Curator.

In Karl Fogel’s forking terminology, there are a couple of ways of appraising these developments. Data Curator may end up as a short fork – where a public copy of the project's master repository is modified and later merged back into the codebase, because ODI HQ will increase capacity and may then be able to contribute to Data Curator. Alternatively, it may be deemed a social fork, because some of the motivation is based on substantially diverging views on core features.

While forks are generally to be avoided if possible, they are as much an unavoidable aspect of open source software (OSS) dynamics as funding challenges. In time, we may even retrospectively consider that this wasn’t a fork at all. Comma Chameleon – in its current state – amounts to a proof of concept or experiment. The fact that the ODI Australian Network want to take that experiment further – fork or no fork – indicates that ODI Labs’ experimental foray was successful. And because Comma Chameleon is being forked, it can continue to be an experiment.

So what happens next? Comma Chameleon is going to remain part of the ODI Toolbox and remain in its Alpha state. The ODI Australian Network are now working on Data Curator, which has some initial releases planned and funded, with much more on the drawing board. If you're curious, check it out.

Stephen Fortune is an ODI Code Fellow. Follow @stephenfortune on Twitter
