2:AM Standards in altmetrics
9:45 – 11:00: Standards in altmetrics
Chair: Fiona Murphy, Senior Associate at Maverick Publishing Specialists
Geoff Bilder, CrossRef
Zohreh Zahedi, CWTS-Leiden University
Challenges in altmetric data collection: are there differences among different altmetric providers/aggregators?
This project studies the consistency of data collection across three altmetrics providers or aggregators: Altmetric.com, Mendeley and the open source software Lagotto (used by PLOS, CrossRef and others). The aim is to explore whether metrics for the same set of publications are consistent across them. A random sample of 30,000 DOIs from 2013 was considered, 15,000 from Crossref and 15,000 from WoS. The data were collected from all three at the same date and time, starting at 2 PM CEST on July 23, 2015, using the Mendeley REST API, the Altmetric.com dump file and the Lagotto open source application.
Several discrepancies were found in the metrics these altmetrics data providers report. Regarding the coverage of DOIs per provider, Mendeley has the highest coverage with 20,677 (69%), followed by Lagotto with 20,364 (68%) and Altmetric.com with 6,946 (21%). As expected, Mendeley provides higher readership counts than Lagotto and Altmetric.com. Lagotto provides the highest number of Facebook counts, Reddit mentions and CiteULike counts. Altmetric.com provides the highest number of tweets. Regarding ‘intensity’ (average counts for the papers with at least one event), there are differences across the data providers in the common data sources (tweets, Facebook, CiteULike, Reddit and Mendeley counts). Regarding overlapping papers with metrics, Altmetric.com has higher Twitter coverage (21%) and Facebook coverage (5%) than Lagotto (Twitter 0.1%, Facebook 4%). For CiteULike and Reddit, Lagotto has higher coverage (2.5% and 36%) than Altmetric.com (1.9% and 26%). There are also some differences in Mendeley readership, including a small set of DOIs for which Lagotto and Altmetric.com report higher Mendeley reader counts than Mendeley itself.
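As a rough illustration of the two measures used in the comparison, the sketch below computes coverage and intensity for one data source as reported by two providers, and lists the DOIs on which they disagree. It is not the study's code. The function names, the DOIs and the counts are invented for illustration, and coverage is assumed here to mean the share of sampled DOIs with at least one event, which may differ from the definition used in the talk.

```python
from statistics import mean

def coverage(counts, sample):
    """Share of sampled DOIs with at least one event (assumed definition)."""
    covered = sum(1 for doi in sample if counts.get(doi, 0) > 0)
    return covered / len(sample)

def intensity(counts):
    """Average count over the papers that have at least one event."""
    nonzero = [c for c in counts.values() if c > 0]
    return mean(nonzero) if nonzero else 0.0

def discrepancies(counts_a, counts_b, sample):
    """Sampled DOIs for which the two providers report different counts.

    A DOI missing from a provider is treated as a count of zero.
    """
    return [doi for doi in sample
            if counts_a.get(doi, 0) != counts_b.get(doi, 0)]

# Invented example data: tweet counts per DOI from two hypothetical providers.
sample = ["10.1234/a", "10.1234/b", "10.1234/c", "10.1234/d"]
provider_a = {"10.1234/a": 4, "10.1234/b": 0, "10.1234/c": 2}
provider_b = {"10.1234/a": 1, "10.1234/c": 5, "10.1234/d": 3}

for name, counts in (("A", provider_a), ("B", provider_b)):
    print(name, coverage(counts, sample), intensity(counts))
print(discrepancies(provider_a, provider_b, sample))
```

In this toy data both providers have the same intensity but different coverage, and they disagree on three of the four DOIs, which is why the two measures are worth reporting separately.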
The reasons for the different metrics relate to the different methods the providers use to collect and process them. How each provider queries the sources also matters: whether it uses the DOI or other metadata, which APIs it uses (for example for Facebook and Twitter), and whether there are time lags in data collection or updating. Furthermore, whether a data provider reports only public Facebook counts or public tweets, and whether it compiles retweets and favorites into one metric or reports them as separate values, causes differences in the counts. There are also issues with tracking DOIs from different registration agencies. Moreover, there are issues with the quality of the metadata for which altmetrics are collected, for example differences in publication dates between WoS and Crossref. Other problems include accessibility issues (e.g. with Twitter) and issues with resolving DOIs to URLs (e.g. differences across publisher platforms in resolving DOIs to journal landing pages). These results emphasize the need for both altmetric providers and publishers to adhere to best practices in altmetric data collection. Future steps include developing guidelines and recommendations for altmetric data collection to introduce transparency and consistency across providers. In 2015, NISO initiated a working group on altmetrics data quality, and the group has developed a draft code of conduct for the collection, processing, dissemination and reuse of altmetric data.
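The effect of aggregation choices on reported counts can be made concrete with a small sketch. Below, the same events for one paper yield different numbers depending on whether retweets and favorites are folded into a single metric or reported separately. The event labels, the data and the function names are hypothetical and do not describe how any specific provider actually counts.

```python
from collections import Counter

# Invented event log for a single paper.
events = ["tweet", "retweet", "retweet", "tweet", "favorite"]

def combined_count(events):
    """Assumed policy 1: tweets, retweets and favorites as one number."""
    return len(events)

def separate_counts(events):
    """Assumed policy 2: each event type reported as its own value."""
    return dict(Counter(events))

def original_tweets_only(events):
    """Assumed policy 3: only original tweets are counted."""
    return sum(1 for e in events if e == "tweet")

print(combined_count(events))
print(separate_counts(events))
print(original_tweets_only(events))
```

Three providers applying these three policies to identical underlying data would report 5, a breakdown of 2/2/1, and 2 respectively, so a comparison of raw counts is only meaningful when the counting policy is documented.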
Martin Fenner, NISO
Gregg Gordon, SSRN
Altmetric Data Integrity is not a Game
Interest in and use of article-level metrics (ALMs) have grown rapidly across the research community, among researchers, publishers, funders, and research institutions. As this happens, it is critical to ensure that the data are secure, reliable and trustworthy, and can be used by all.