By Kobe Desender and Anne Urai.
During a recent lab outing to the beach of the Elbe, we discussed how we want to establish open science practices as a lab, and what issues can arise in doing so. We read McKiernan et al. (2016) as a starting point for our discussion.
Beach, beers and open science
There was unanimous agreement that research articles should be freely available to read. The formal way to accomplish this is to publish our work as open access articles. We all agreed this is desirable, primarily because the results of taxpayer-funded science should be available openly, but also because open access publications are more frequently read, receive more citations, and generate more media coverage (McKiernan et al. 2016).
Many, but not all, of the journals we usually publish in are fully open access or offer a hybrid open access option.
One obstacle is funding: open access journals and hybrid options are expensive. Some of our funding sources cover open access fees, but not all grants do. In such cases, an open access article means less money for research costs or conference participation. However, our department at the UKE generally supports open access publications, and we agreed that open access fees should be incorporated into grant budgets whenever possible. Another difficulty arises with papers targeted at a specific audience for which no open access journals exist so far (e.g., theoretical and mathematical psychology as well as experimental psychology journals). We agreed that the fit to the journal and its target audience should be the leading factor in choosing where to submit. When choosing between two journals with an equally good fit, the open access option can be given more weight.
Lastly, we discussed the very common practice of making pdfs available on our own lab website. These are usually indexed by Google Scholar and by the Unpaywall browser extension. While this practice is extremely widespread, posting the final pdf can be prohibited by the journal’s copyright agreement – what exactly you are allowed to share is listed in the Sherpa Romeo database. We agreed that it is the first author’s responsibility to be aware of the copyright agreement they signed. An interesting case study played out recently, when the APA requested that researchers take down pdfs from their personal websites. We also agreed that open access publications are preferable to pdfs on the lab website in terms of impact: many of us shared the experience of not reading a paper if it is not quickly accessible (but see here), especially when it is interesting but not immediately relevant to one’s own project.
We discussed whether it should be a lab guideline to put our papers on preprint repositories, specifically bioRxiv. Preprints are particularly useful for demonstrating work that has not yet reached the publication stage, which can be an advantage in job and grant applications. An additional positive effect of posting preprints might be that authors refrain from drawing overly strong conclusions (or even overselling their work) in the first submission and then toning them down during revision – a practice that seems common at many high-impact journals. Lastly, preprints establish priority of your findings or ideas, which can protect against scooping.
We concluded that uploading papers to bioRxiv is something our lab will generally aim for in the future, and our forthcoming submissions will follow that path (our first two preprints: Braun et al. and Pfeffer et al.). However, we won’t set this as a fixed rule, but will decide on a project-by-project basis; all co-authors should agree before a paper is posted on a preprint server. If we decide to post a preprint, we will do so around the time of submission to a journal (Kriegeskorte, 2016).
Open data & code
Apart from the article itself, we agreed that the data and code of a project should also be freely available after publication. For many of us, this will require more effort during the analysis stage: code should be written in a cleaner and more efficient style, and should be well documented, since other people must be able to understand it. The obvious advantage is that long-term availability of code and data aids reproducibility (other researchers can check your analysis) and replication (other researchers can easily repeat your experiment). Moreover, adopting this practice can be very useful to your future self, and reduces the likelihood of errors caused by sloppy programming. Many journals now require a statement about the availability of data and code. If not, a link should be provided in the final article itself to make readers aware of the materials.
A concern often voiced about freely sharing data is that other researchers might free-ride on data collected with great effort and patience. In our case, this concern applies in particular to large pharmacological MEG data sets, which several of our lab members have acquired over the past years, but also to large (7T) fMRI data sets. Both are relatively rare in the community and have required substantial logistical effort and time investment by lab members. Our lab often uses such data sets to test a number of different hypotheses, with separate analyses reported in separate papers. In general, our lab is committed to making all our data available to the community at some point during each project. Whether this happens before the primary publication, at the moment of publication, or after a later publication by our lab will be decided on a project-by-project basis.
One interesting possibility with rather unique and large datasets, which we will consider from now on, is to publish these as a data paper. Niklas has published a large amount of eye-movement data in Scientific Data (Wilming et al. 2017), which attracted quite some media attention and led to several inquiries from researchers interested in collaborating. Thus, freely providing the data and code of our experiments might increase our visibility and spark new collaborations. For less unique data (such as behavioural datasets), simply uploading the data is not much work, and allows others to reproduce the exact analyses.
Preregistration
Preregistration is generally considered good practice. However, some concerns were voiced that it might not be well suited for all projects. For replication studies, student projects, or behavioral studies addressing a simple question (of which we have quite a few), preregistering the analysis plan is commendable. For other projects, however, it might be less suitable. Many good ideas for analyses that dive deeper into the mechanism of interest actually arise after looking at the first simple analyses. Also, our lab aims to always use the most advanced analyses for any problem we address. We continue to develop analysis tools and monitor the methodological publications of other labs interested in cutting-edge data analysis. When a new, better analysis tool becomes available (or comes to our attention) during the evolution of a project (i.e., after the initial analysis plan), this might call for changes in the analysis approach. We are against discouraging such changes, since this would unnecessarily slow down scientific progress. Such analyses could be presented together with the preregistered ones in the final publication, while making the distinction between preregistered and non-preregistered analyses explicit.
In addition to preregistration, or as an alternative, we aim to self-replicate our results, both across and within papers, to demonstrate the reproducibility of our findings. As an example of the former, Anke’s new paper (Braun et al.) explicitly documents how the findings of Anne’s previous work (Urai et al. 2017) are replicated. As an example of the latter, in the Supplementary Materials of his recent paper, Kobe reanalysed four old datasets in the same way as the main experiment (Desender et al. 2017). In Jan-Willem’s latest paper, the main behavioural findings were also replicated in several (old and newly collected) datasets (de Gee et al. 2017). Our lab feels strongly that approaches like these increase our confidence in our own findings, and demonstrate to the community that our reported findings are robust and reproducible.
Overall, we did not reach a consensus to preregister studies in general. Everyone agreed that preregistration is useful in some cases, but there won’t be a general rule, at least for now; the question will be decided on a case-by-case basis among all members of a project team. However, every lab member is asked to actively consider this option before starting a new project.
Cross-checks of code
A final issue we discussed (inspired by Ana Todorovic) is the concept of frenemies: a lab member who is not a direct collaborator and actively thinks about alternative scenarios that might explain the patterns we see in our data. In the process of analyzing data, one easily gets stuck in a confirmatory mindset, analyzing the data in ways that confirm one’s current hypotheses. Some people indicated that they sometimes felt uncomfortable directly criticizing the work of fellow lab members. We agreed that this would become easier if they were explicitly appointed to the role of frenemy, whose goal then becomes to provide alternative interpretations of the findings, think about further analysis steps that follow from the current hypothesis, and so on.
We aim to establish a similar system for analysis code. A so-called coding buddy will check code for potential errors, and/or try to reproduce the analyses independently of the lead author. Currently this happens in some projects (e.g. Braun et al.), but clearly not in all. Everyone agreed this would be helpful in general. However, practical constraints may limit the usefulness of this idea: it implies a major time investment, and it may sometimes not match the level of expertise of individual lab members, since we use a variety of techniques and not every lab member fully understands the most advanced analyses or modeling approaches used by others. In some specific cases, another role for a frenemy or coding buddy could be to simulate data from a model that contains everything except the relevant factor we are investigating, and then give these data to the lead author.
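To make that last idea concrete, here is a minimal sketch (in Python; the variable names and the choice of reaction-time-like data are hypothetical, not taken from any of our projects). A frenemy simulates a dataset from a "null" model that carries a condition label but no true effect of that condition; when the lead author runs the usual analysis blind on these data, it should fail to recover the factor:

```python
import numpy as np

rng = np.random.default_rng(2017)  # fixed seed for a reproducible simulation

def simulate_null_dataset(n_trials=500):
    """Simulate trial-wise 'reaction times' from a model that includes a
    condition label but no true condition effect (the null model)."""
    condition = rng.integers(0, 2, size=n_trials)  # e.g. easy (0) vs hard (1)
    # RTs are drawn from the same lognormal distribution regardless of condition:
    rt = rng.lognormal(mean=-0.5, sigma=0.3, size=n_trials)
    return condition, rt

condition, rt = simulate_null_dataset()

# The lead author, blind to how the data were generated, runs the usual
# analysis; here simply the difference of condition means:
effect = rt[condition == 1].mean() - rt[condition == 0].mean()
print(f"estimated condition effect: {effect:.3f}")
```

The point of the fixed seed is only reproducibility of the simulation itself; any apparent "effect" the pipeline reports on such data flags a bug or an overly confirmatory analysis choice.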
Everyone in the lab agreed on the usefulness of frenemies and coding buddies, and we will try to implement this practice. One important question this brought up is how lab members get credit for this rather challenging role. According to current publishing rules, the role of a frenemy does not warrant co-authorship on a paper. Some of us would like to change this and explicitly signal that our lab considers this practice an important aspect of scientific research. However, others disagreed, and so far there is no unanimous agreement on this issue.
As a lab, we aim to integrate open science practices into our work whenever possible. Open science and reproducibility are currently hot topics, and we expect scientific communities to integrate many of the above into their standard practices over the years. We will continue to assess our own work in the light of these debates, and update our lab policies.