Models, Software, and Code
Instructions on contributing model output, software, and code to BCO-DMO
If you have code or software you developed as part of your project, we can publish it from a supplemental files section of relevant Dataset Metadata Pages. If your code has already been documented and archived (e.g. citable with a Zenodo DOI), we can link to it as a Related Publication from your Dataset Landing Page.
Check your funding agency's requirements for making your code public. If you were funded by NSF's OCE division, you are required to make your code public within two years of developing it.
You can send us your code any way you choose: by attaching files to an email, uploading them to the submission tool (https://submit.bco-dmo.org/), sending us a repository link (e.g. GitHub), or sending us a Zenodo DOI.
Code and software should be documented and commented well enough that others can understand them. Full reproducibility isn't always possible, but you should include enough detail that someone could follow what was done and understand how the code works and how the results were produced. The goal is transparency and transferable knowledge: the knowledge gained and techniques employed should be reproducible even if the exact workflow can't be.
We do not require you to get a DOI for your code before submitting to BCO-DMO. However, if you would like to make your code persistent and citable you can archive your code using Zenodo. You will then have a DOI and formal citation for your code.
You can upload your files directly to Zenodo or link your GitHub repository if you have one. For code in a GitHub repository, see "Making Your Code Citable": https://guides.github.com/activities/citable-code/. Pay particular attention to the part of the guidance that explains how to link Zenodo to your GitHub repository before making a GitHub release.
Examples of GitHub releases archived at Zenodo:
- Dykman, L. (2021). ldykman/FD_EPR: Github repository release associated with Dykman et al. (2021) Ecology (Version v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.4625160
- Schenck, F. R. (2022). schenckf/BWE_Experiment: The effect of warming on seagrass wasting disease depends on host genotypic identity and diversity - Analyses (Version V2.0.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.7129500
Please include these topics in your code documentation to the greatest extent possible:
- Provide a general description of what your code does and how it works.
- Describe dependencies and prerequisites. Don't forget to include the version of the programming language you used.
- Provide information about settings and configurations, if applicable.
- Provide a description to go along with each file.
- Describe what the file does if it is code.
- If it is an input data file, describe where it came from, what is in it (e.g. CTD data from cruise KN1818 obtained from R2R, DOI: ####), and provide parameter names, descriptions, and units (e.g. NH4, Pore water dissolved ammonium, micromolar (uM)).
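One way to keep parameter names, descriptions, and units together with an input data file is a small machine-readable table. The following is a minimal sketch, not a BCO-DMO requirement: the column headers, the function name `write_parameter_table`, and the suggested file name are assumptions made for illustration. Only the NH4 row comes from the example above.

```python
import csv

# Hypothetical parameter table for one input data file.
# Only the NH4 row is taken from the guidance; add one row per column in your file.
PARAMETERS = [
    {
        "parameter": "NH4",
        "description": "Pore water dissolved ammonium",
        "units": "micromolar (uM)",
    },
]


def write_parameter_table(path, parameters):
    """Write parameter names, descriptions, and units to a CSV file."""
    fieldnames = ["parameter", "description", "units"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(parameters)
```

Calling `write_parameter_table("parameters.csv", PARAMETERS)` would produce a file that can be archived next to the input data or pasted into a README.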
For more information and examples, you can refer to "How to Write Good Documentation": https://guides.lib.berkeley.edu/how-to-write-good-documentation.
Comment lines within code are a recommended best practice; however, they do not take the place of the above recommendations.
Where do you put documentation? If you have a GitHub repository you can put documentation in your README.md file. Or you can put it in a supplemental file archived with your code. You can also provide it to BCO-DMO along with your data submission's methodology section or as a supplemental file.
- Example of documentation as part of the GitHub README.md file (a README.md archived at Zenodo with package DOI: 10.5281/zenodo.4625160)
- Example of code documentation as a supplemental file attached to a BCO-DMO dataset.
- Include a general description of how your dataset was produced using your code.
- Example: "...the column Functional Guild was generated by hierarchical clustering in R using the function gowdis in the package FD as described in Dykman et al. (2021). Scripts to run these analyses are available at Zenodo (Dykman, 2021, DOI: 10.5281/zenodo.4625160)."
- Document the version of your code used to produce the dataset you are submitting to BCO-DMO. This lets us connect the exact version of your code to the exact data version we publish at BCO-DMO.
- Provide a version number, commit, release, or DOI.
- If it is related code used to analyze (or plot) your dataset for a subsequent journal publication, state what version of the code was used for the journal publication.
- Example: "...All code was written and run in R (version 3.6.1, www.R-project.org). Github repository https://github.com/schenckf/BWE_Experiment V2.0.0 archived at Zenodo (DOI: 10.5281/zenodo.7129500). A general description of the code is included in the repository release in file 'Analysis Description.docx'."
- Supply any settings and configurations used to produce your dataset.
- Document the versions of the language and packages you used, e.g. R version 3.4.1 (2017-06-30), packages: vegan v2.5.4 (Oksanen et al. 2019), ggplot2 v3.2.1 (Wickham 2016).
- [If applicable] Provide input/config files. We can publish these as supplemental files attached to the dataset.
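If your code lives in a Git repository, the commit or tag that produced a dataset can be captured programmatically rather than copied by hand. The following is a minimal sketch, assuming the `git` command-line tool is installed and the function is pointed at a directory inside the repository; the function name `describe_code_version` is hypothetical.

```python
import subprocess


def describe_code_version(repo_dir="."):
    """Return the output of `git describe --tags --always --dirty`, or None.

    When tags exist, the result is based on the nearest tag; otherwise it is
    an abbreviated commit hash. A "-dirty" suffix means the working tree had
    uncommitted changes. None means git is not installed, repo_dir does not
    exist, or repo_dir is not inside a Git repository with commits.
    """
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--always", "--dirty"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None
```

The returned string can be pasted into your methodology text next to the repository link or DOI. A "-dirty" suffix is a signal to commit your changes before recording the version.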
Where possible provide the exact package and software version numbers you used to generate your data and the recommended citation for your software.
Example of citing the language version:
These data were produced using code run with R version 3.4.1 (2017-06-30).
Example of software cited in methodology text, from the dataset "Tidal study of seawater microbial communities" (https://www.bco-dmo.org/dataset/783679):
To understand the variability in microbial communities over time at all sites, Bray-Curtis dissimilarity was calculated between each sample in the R package vegan v2.5.4 (Oksanen et al. 2019) and illustrated using non-metric multidimensional scaling (NMDS) in the R package, ggplot2 v3.2.1 (Wickham 2016).
- Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org. https://doi.org/10.1007/978-3-319-24277-4
- Oksanen J, Blanchet FG, Friendly M, Kindt R, Legendre P, McGlinn D, Minchin PR, O’Hara RB, Simpson GL, Solymos P, Stevens HH, Szoecs E, Wagner H (2019) Vegan: Community Ecology Package. R package version 2.5-4. Available from https://cran.r-project.org/package=vegan https://cran.r-project.org/src/contrib/Archive/vegan/vegan_2.5-4.tar.gz
Note that the reference Wickham (2016) for ggplot2 doesn't include the version. But the methods text does include the exact version number that was used to produce the dataset.
If you have a lot of installed packages you can supply a supplemental file with the packages and version numbers you used instead of citing them all in your methods text.
To print the R version associated with a project in RStudio:
> R.version.string
[1] "R version 4.1.0 (2021-05-18)"
To get the version of a package you imported (dplyr is the name of an example package):
> packageVersion("dplyr")
To print information about all installed packages (including package version), use the installed.packages() function.
This function returns a matrix containing one row per installed package, with columns such as Package, LibPath, Version, and Priority.
Since this is a matrix, you can select which columns, and therefore which metadata, are returned. For example, to get the name and version of each package that is not a base or recommended package (those whose Priority is NA), you would run:
> package_metadata <- as.data.frame(installed.packages()[,c(1,3:4,11)])
> rownames(package_metadata) <- NULL
> package_metadata <- package_metadata[is.na(package_metadata$Priority),1:2,drop=FALSE]
> print(package_metadata, row.names=FALSE)
To print all high-level system information in RStudio:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4
Matrix products: default
attached base packages:
 stats graphics grDevices utils datasets methods base
other attached packages:
 fuzzyjoin_0.1.6 stringr_1.4.1 dplyr_1.0.7 lubridate_1.7.10
loaded via a namespace (and not attached):
 Rcpp_1.0.9 rstudioapi_0.13 magrittr_2.0.3 tidyselect_1.1.1
 R6_2.5.1 rlang_1.0.6 fansi_0.5.0 tools_4.1.0
 utf8_1.2.1 cli_3.4.1 DBI_1.1.1 ellipsis_0.3.2
 assertthat_0.2.1 tibble_3.1.2 lifecycle_1.0.3 crayon_1.5.2
 purrr_0.3.5 vctrs_0.3.8 glue_1.6.2 stringi_1.7.8
 compiler_4.1.0 pillar_1.6.1 cellranger_1.1.0 generics_0.1.0
In this output, the standard or base packages are listed under "attached base packages." These packages are loaded by R by default and do not need to be called into the project.
"Other attached packages" are those that are loaded into a project using the library() function.
Packages listed under "loaded via a namespace" are those imported automatically by R as dependencies of "other attached packages."
To get your Python version from the command line (use whichever command is available on your system):
$ python3 --version
$ python --version
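To get your Python version from within a notebook (Jupyter, Colab, etc.):
>>> import sys
>>> sys.version
To get the version of a package you imported, you can use its __version__ attribute, which most packages (including numpy) provide:
>>> import numpy
>>> numpy.__version__
There are several ways to get all package versions in your environment. See the "list", "freeze", and "show" commands in the pip documentation.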
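For Python projects, the language version and all installed package versions can also be collected from within Python and saved as a supplemental file. The following is a minimal sketch using only the standard library (`importlib.metadata`, available in Python 3.8 and later). The function names and the default file name `python_environment.txt` are assumptions for illustration.

```python
import platform
from importlib import metadata


def installed_package_versions():
    """Return a sorted list of (name, version) for installed distributions."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages.add((name, dist.version))
    return sorted(packages, key=lambda item: item[0].lower())


def environment_report():
    """Build a plain-text report of the Python version and package versions."""
    lines = [f"Python version {platform.python_version()}"]
    lines += [f"{name}=={version}" for name, version in installed_package_versions()]
    return "\n".join(lines) + "\n"


def write_environment_report(path="python_environment.txt"):
    """Save the report so it can be submitted as a supplemental file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(environment_report())
```

Calling `write_environment_report()` in the environment used to produce the dataset writes the file. Running `pip freeze` on the command line gives a similar list of package versions.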
To get your MATLAB version and the versions of installed toolboxes:
>> ver
MATLAB Version: 8.2.0.701 (R2013b)
MATLAB License Number: 234567
Operating System: Microsoft Windows 7 Version 6.1 (Build 7601: Service Pack 1)
Java Version: Java 1.7.0_11-b21 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
MATLAB Version 8.2 (R2013b)
Simulink Version 8.2 (R2013b)
Control System Toolbox Version 9.6 (R2013b)
Please include these topics in your model documentation to the greatest extent possible:
- Provide a general description of what the model does.
- Supply input data and files, settings, and configurations.
- Describe the results and output of model runs.
- Describe the format of the output files, what kind of data are in them, and the parameter descriptions and units (e.g. temperature of the surface layer, degrees Celsius).
- If you used a community-developed model (e.g. ROMS), please cite the model you used along with the version number, if possible. Provide a link where documentation can be found for the model.
- If you developed your model as part of your project, refer to the above guidance for submitting software/code to BCO-DMO.
We can publish the full set of model output files. However, if the full output can be easily recreated from the files and methodology you provide, it is fine to publish only one output file as an example. If having the full output available would make other researchers' work easier, that is a good reason to publish all of it.
GitHub and Bitbucket are code repositories, not archives. Code repositories are great for sharing and collaborating, but they can be taken down at any time by the authors or the repository host. If you archive your code repository using Zenodo (or another long-term archive), it will be preserved and you will get a digital object identifier (DOI) for your code. See "Making Your Code Citable" https://guides.github.com/activities/citable-code/
While Zenodo does not have the same support for connecting to Bitbucket as it does for GitHub, you can still archive a copy of your Bitbucket repository at Zenodo by uploading your repository files directly. It is a good idea to make a tag in Bitbucket to document your code version and include the version information in your documentation when creating a Zenodo DOI.
We understand that datasets are sometimes produced using code that isn't publicly available, whether it is proprietary software from an instrument manufacturer or code provided by a colleague that isn't published in a way that follows best-practice recommendations for long-term archiving. When you submit a dataset to BCO-DMO, provide whatever information you have about the code:
- Who are the authors? Provide real names, not just usernames, if you have them.
- Briefly describe what the code does.
- How do you get the code/software? Is it publicly accessible? If so, where can it be found? Or is it available upon request or purchase from the authors or company?
- What version of the code did you use? Or, lacking that, on what date did you use the code? If it is public on GitHub, provide the link to the repository and the version (or commit) you used, if available.
If you know the authors of public but unarchived code, encourage them to publish their code with documentation, a release, and a license (see "Making Your Code Citable" https://guides.github.com/activities/citable-code/). After archiving it, they get a formal citation they can put on their CV, which is a great incentive, along with enhanced discoverability and usage.