3:AM – Beyond the article: research data and software

This post is contributed by Joe Wass, tech lead of Crossref Event Data.

Martin Fenner opened the session by reminding us that questions surrounding the tracking, citation and use of research objects beyond articles are at least as old as altmetrics.

Nicolás Robinson-García, Daniel S Katz, Heather Piwowar and Elena Zudilova-Seinstra looked variously at the methods and practice of citing datasets and software, and offered some observations. I think the session could be broken out into the following themes:

 – what are we tracking?

 – why are we tracking it?

 – how is it being tracked?

Nicolás opened the ‘what’ by showing that the world isn’t just divided into articles and datasets. Of the items he found in DataCite, most were datasets, but there were also text, images, collections and a lot of ‘other’. The most tweeted ‘data’ he found was in fact a poster.

Heather introduced the ‘why’ with a quote from a demotivated PhD student who felt their contribution to software development went unrecognised.

Daniel recognised that a significant amount of research effort goes into writing software, but that the scholarly record fails to reflect this. Without proper citation practices, development effort will continue to go undocumented. As Elena pointed out, software is a research method in its own right and deserves full academic recognition. An increasing amount of research is ‘born digital’, yet traditional citation has failed to keep up with the activities that now comprise research. Not only do authors go uncredited; without proper citation, readers are unable to locate the software being mentioned.

Apart from capturing activity appropriately, Daniel also pointed out that a lack of formal recognition of software’s role could undermine sustainable software development, which is his ultimate goal.

Daniel presented us with two choices for the ‘how’ of software citation: either we jam software citations into a system that doesn’t really suit them, or we completely rework the citation system. He considered the first option the more feasible.

Daniel described the Force11 Software Citation group’s effort to improve traditional citation of software, and outlined its six software citation principles. He also had some direct advice for making software citable and for citing it: publish the software, get a DOI, and mention both in the README or a CITATION file. And when using software, always look for this information so you can cite the software directly, not a paper about it.
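
To make that advice concrete, here is one possible shape for a CITATION file: a BibTeX entry for an entirely hypothetical package. The author names, DOI and URL are all invented for illustration, and the Force11 principles don’t mandate any particular format.

    % hypothetical entry: every field below is invented for illustration
    @misc{analysis_toolkit,
      author = {Jane Doe and John Smith},
      title  = {analysis-toolkit: scripts for cleaning experiment data},
      year   = {2016},
      doi    = {10.5281/zenodo.0000000},
      url    = {https://github.com/example/analysis-toolkit},
      note   = {Version 1.2.0}
    }

A reader who finds this in the repository can then cite the software itself, version and all, rather than a paper about it.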

Heather described Depsy, an experimental system that analyses software dependencies within the PyPI and CRAN repositories (for Python and R respectively) and combines them with usage metrics drawn from those repositories. After a year of operation she has had a lot of positive feedback, including validation of the assumptions the team made. The most prominent conclusion seemed to be “we want more, and we want it more real time”.
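
Depsy’s own pipeline is considerably more involved, but as a rough sketch of the raw material such a system starts from, the snippet below asks PyPI’s public JSON API which dependencies a package declares. The endpoint is real; the function name and the example package are just illustrative.

    import json
    from urllib.request import urlopen

    def pypi_dependencies(package):
        """Return the dependencies a package declares in its PyPI metadata."""
        # PyPI serves machine-readable metadata for every package it hosts
        with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
            info = json.load(response)["info"]
        # requires_dist is a list of requirement strings, or None if absent
        return info.get("requires_dist") or []

    print(pypi_dependencies("requests"))

Repeated across a whole repository, lookups like this yield the dependency graph that a system such as Depsy can weight and combine with download figures.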

Elena described SoftwareX, a journal of software launched two years ago. Authors describe software they have written in a short article, which includes some metadata and a permanent link to a ‘frozen’ copy of the software version described. Authors are invited to publish updates to the software. Articles are peer reviewed, but the review covers usability and scientific impact rather than being a full code review.

The session gave us a glimpse into a sub-field that is sometimes less talked about than article metrics and showed us the diversity of research objects that are out there.