Open Code refers to custom, author-generated code used in a scientific research study (often during data collection, interpretation, or analysis) and subsequently made publicly available under an Open Access license via a linked repository, or as Supporting Information.
Reference: Public Library of Science (PLOS). Open Code.
If you wish to find out more about Open Source, or to participate in it, please see here.
You may like to refer to this checklist when making your code open:
The code should be well documented and actually executable. This is influenced by:
Tell the story of your data by versioning it across stages of transformation (e.g. raw, interim, processed). This allows stakeholders to validate that the logic is sound and that the data can be trusted, and it lets the analysis be extended or reverted as necessary.
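The staged layout above can be sketched as a small helper that snapshots data at each transformation stage. This is a minimal illustration, not a prescribed tool: the directory names, file names, and cleaning steps are hypothetical, and in practice a dedicated system such as DVC or Git LFS would track these versions.

```python
from pathlib import Path
import csv

# Hypothetical stage names for versioned data.
STAGES = ["raw", "interim", "processed"]

def init_stages(base="data"):
    """Create one sub-directory per transformation stage."""
    for stage in STAGES:
        Path(base, stage).mkdir(parents=True, exist_ok=True)

def save_stage(rows, stage, name, base="data"):
    """Write a snapshot of the data at a given stage, so each
    transformation can be inspected, validated, or reverted."""
    path = Path(base, stage, name)
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path

init_stages()

# Raw data is stored untouched, warts and all.
raw = [["id", "value"], ["1", " 42 "], ["2", ""]]
save_stage(raw, "raw", "measurements.csv")

# Interim stage: strip stray whitespace.
interim = [[cell.strip() for cell in row] for row in raw]
save_stage(interim, "interim", "measurements.csv")

# Processed stage: drop incomplete records.
processed = [row for row in interim if all(row)]
save_stage(processed, "processed", "measurements.csv")
```

Because every stage is persisted separately, a reviewer can diff `raw` against `processed` to check that only the intended records were dropped.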
Same random numbers
Random numbers are a routine part of machine learning workflows: train/test splits, cross-validation, and optimization all rely on them, to name a few. You can control them with a seed. The seed is the starting point of a pseudo-random sequence, and the guarantee is that starting from the same seed always produces the same sequence of numbers. Random seeds introduce reproducibility into your model outputs, which allows quick troubleshooting as the pipeline is built out. This is especially important when you use a learning algorithm with random effects, such as neural networks or random forests: without seeds, you cannot tell whether a change in model outputs, standard errors, etc. is due to random effects or to a change in the hyper-parameters. Setting a random seed keeps this randomness at least temporarily consistent while you build out your product, eliminating random deviation from your machine learning pipeline.
Reference: Carlos Brown (2020). Reproducibility in Data Science. Medium.
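The seeding idea above can be sketched with a reproducible train/test split. The helper function is a simplified stand-in (libraries such as scikit-learn expose this via a `random_state` parameter); the sizes and seed values are arbitrary.

```python
import numpy as np

def train_test_split_indices(n, test_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed so the split is reproducible."""
    rng = np.random.default_rng(seed)  # seed fixes the pseudo-random sequence
    idx = rng.permutation(n)
    n_test = int(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # train indices, test indices

# Same seed -> identical split every run.
train_a, test_a = train_test_split_indices(100, seed=42)
train_b, test_b = train_test_split_indices(100, seed=42)
assert (train_a == train_b).all() and (test_a == test_b).all()

# A different seed gives a different, but equally reproducible, split.
train_c, test_c = train_test_split_indices(100, seed=7)
```

With the seed pinned, any change in model outputs between runs can be attributed to code or hyper-parameter changes rather than to the split itself.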
To improve the sharing and reuse of research software, the FAIR for Research Software (FAIR4RS) Working Group has applied the FAIR Guiding Principles for scientific data management and stewardship to research software.
Adoption and implementation of the FAIR for Research Software principles will create significant benefits for many stakeholders, including:
- increased research reproducibility for research organizations;
- better practices and more software reuse for developers;
- clarity for funders around their own policies and requirements for software investments;
- guidelines for publishers on sharing requirements.
Anonymous code sharing as part of peer review editorial process:
Anonymous GitHub: Allows you to easily anonymize your GitHub repository. Several anonymization options are available to ensure that you do not break double anonymization, such as removing links, images, or specific terms. You retain control of your repository and can define an expiration date after which it becomes unavailable to reviewers.
We have brought together over 150 repositories of open standards, data and source code, tackling some of the most important challenges in wrangling multi-modal data and generating replicable insights.