Publication Catalog

Important

The workflow described here remains a valid description for dataset publication on Sciebo. However, as a means for submitting datasets to the SFB1451 catalog, it has been deprecated in favour of the procedure described in Register data in the data catalog.

If publications are to be shown in the catalog, information about them needs to be included directly in the dataset (metadata) files, preferably using a standard metadata format (e.g., the references field of the Citation File Format) or a datalad-tabby publications table. All datasets with project descriptions and publication metadata created according to the workflow described below (either by INF or by the respective projects) have been republished to GitHub as the sfb1451/all-projects superdataset.

Publication Catalog from Sciebo: a walkthrough

1. Create a dataset containing:
   - a project description (project title, involved people, and abstract) in a CITATION.cff file;
   - information on the project's publications in one or more files in .ris or .nbib format.
2. Push the dataset to Sciebo in export mode.
3. Share your Sciebo project folder with INF and Z01.

Preliminary: Sciebo access

We will use Sciebo to publish the datasets. Sciebo is a file sharing service (much like Dropbox or Google Drive) for scientific institutions in NRW. It is a federated service, and can be accessed under different addresses, depending on your institution.

Sciebo accounts are available for most higher-education institutions in NRW. If you don't have an account, you can register at https://hochschulcloud.nrw/ through your institutional login. If your institution is not on the list, you will need to contact INF for a guest account.

All content will be placed in project folders within the SFB1451 project box. INF will share project folders with PIs of all projects, who will be able to manage permissions for their group members. Although nothing in the SFB data management strategy requires us to have a central storage, using a Sciebo project box for this particular task will help us establish a common baseline.

Preliminary: DataLad installation

You will need a working DataLad installation with the DataLad-NEXT extension (see DataLad and DataLad-NEXT).

Instructions below are written for command line usage. If you prefer a graphical user interface, you may also use the DataLad-Gooey extension, which provides just that (and comes with DataLad-NEXT already).

Create a dataset

On your computer, create a new empty DataLad dataset, and navigate into it:

datalad create my-project
cd my-project
datalad run-procedure cfg_text2git

Info

The cfg_text2git procedure configures the dataset so that text files are not annexed but stored directly in Git. This is a useful setting for the files we will be adding.

Create a citation file

Create a CITATION.cff file with the following fields: cff-version, title, message, type, authors, abstract. This file will contain the general information about your project, which will be shown in the catalog. For title, authors, and abstract, we suggest that you reuse the information from the SFB website.

We recommend using the cffinit generator, which will guide you through adding all the content (hint: after completing the mandatory section, you need to click "add more" to add the abstract), and let you copy or download the file. Alternatively, you can write the file from scratch in any text editor.

This is what the file for INF could look like:

cff-version: 1.2.0
title: 'INF: Data Management for Computational Modelling'
message: This is our project description
type: dataset
authors:
  - given-names: Michael
    family-names: Hanke
    affiliation: 'INM-7, Forschungszentrum Jülich'
    orcid: 'https://orcid.org/0000-0003-3456-2493'
  - given-names: Michał
    family-names: Szczepanik
    affiliation: 'INM-7, Forschungszentrum Jülich'
    orcid: 'https://orcid.org/0000-0002-4028-2087'
abstract: >-
  This project will provide expertise for access,
  description, and modelling of the data collected in
  the individual projects as well as Z02 and Z03. INF
  will continuously assess general workflows,
  resource requirements, and data analysis processes
  to capture between-project differences that may
  impact data comparability and re-usability across
  projects. INF will provide tools, services and
  training to help projects align their research
  output to (i) facilitate data analysis for
  extracting common activity patterns and mechanisms
  underlying motor behaviours across species, and
  (ii) promote data-driven computational modelling.

Place this file in the my-project directory and save the addition in DataLad:

datalad save -m "Add citation file" CITATION.cff

Info

Citation File Format is a standard for plain text files with human- and machine-readable information for datasets (and software). You can read more on the CFF website.
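
To confirm that the citation file contains the fields listed above, a small shell function can help. This is a minimal sketch, assuming the keys appear at the start of a line as in the example file. `check_cff` is a hypothetical helper name, not part of DataLad or the CFF tooling, and it only looks for key names; it does not validate the file against the CFF schema.

```shell
# check_cff: hypothetical helper (not a DataLad or CFF command).
# Reports every expected top-level key that is missing from the given file.
check_cff() {
    cff_file="$1"
    cff_status=0
    for key in cff-version title message type authors abstract; do
        # Top-level YAML keys start at the beginning of a line
        if ! grep -q "^${key}:" "$cff_file"; then
            echo "missing field: ${key}" >&2
            cff_status=1
        fi
    done
    return "$cff_status"
}
```

Running `check_cff CITATION.cff` inside the dataset prints one line per missing field and returns a non-zero exit status if any field is missing.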

Add bibliographic information to the dataset

Next, add your project's publication information to the dataset. This information will be displayed in the Publications tab of the catalog. Store the information in RIS (.ris) or MEDLINE/PubMed (.nbib) format. You can put all publications into one file, or use multiple files.

These file formats are widely used, and can be exported from bibliography management software, Google Scholar (RIS format available as RefMan) or PubMed.

Place the file(s) into the dataset folder, in the publications subdirectory, and save the addition with DataLad:

datalad save -m "Add bibliographic info"
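
As a quick check for this step, the helper below lists the files in a directory that carry one of the two expected extensions. It is only a sketch: the function name `list_bib_files` is hypothetical, while the `publications` directory name is taken from the text above.

```shell
# list_bib_files: hypothetical helper that lists .ris and .nbib files
# found under the given directory (default: publications), sorted by name.
list_bib_files() {
    find "${1:-publications}" -type f \( -name '*.ris' -o -name '*.nbib' \) | sort
}
```

Calling `list_bib_files` from the dataset root should print every bibliography file you placed there; an empty result suggests the files are in a different folder or use another extension.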

Configure Sciebo "sibling"

To allow uploading to Sciebo, we must configure a dataset "sibling". In this case, we will need the extended functionality provided by the DataLad-NEXT extension. A complete walkthrough can be found on the WebDAV walkthrough page. In short, once you enable DataLad-NEXT, the configuration can be done with the following command (replace <WEBDAV URL> with the URL pointing to your folder; see the note below):

datalad create-sibling-webdav \
--dataset . \
--name sciebo \
--mode filetree \
<WEBDAV URL>

If this is the first time you are using DataLad with Sciebo, you will be prompted for your credentials.

Info

Sciebo uses your entire e-mail address (name@example.com) as its username.

Info

The --dataset option lets us explicitly specify the dataset for which we configure a sibling. The --name option sets the name which we will use later when publishing. The --mode filetree option enables filetree mode, meaning that the sibling will have the same, human-readable, file tree layout as the folder on your drive.

The WebDAV URL can be obtained from Sciebo's web UI by clicking "Settings" in the lower left. This URL points to the user's home directory, and any subfolders must be appended "by hand". Nonexistent subfolders will be created. The URLs differ between instances and users. For example, with FZJ's instance, the full URL to the dataset folder would look like this (<USER> and <PROJECT> are placeholders):

https://fz-juelich.sciebo.de/remote.php/dav/files/<USER>/<PROJECT>/pub-dataset
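
Appending the subfolders by hand can be sketched as follows. The function name `sciebo_webdav_url` and the example values are hypothetical; the base address is the one from the FZJ example above, and your own instance, user name, and folder names will differ.

```shell
# sciebo_webdav_url: hypothetical helper that joins the base WebDAV URL
# (as shown in Sciebo's settings) with a path inside the home directory.
sciebo_webdav_url() {
    base="${1%/}"      # drop a trailing slash from the base URL, if any
    subpath="${2#/}"   # drop a leading slash from the subfolder path, if any
    printf '%s/%s\n' "$base" "$subpath"
}

# Example values are placeholders, not real accounts or folders
WEBDAV_URL=$(sciebo_webdav_url \
    "https://fz-juelich.sciebo.de/remote.php/dav/files/USER/" \
    "PROJECT/pub-dataset")
echo "$WEBDAV_URL"
```

The resulting value can then be passed as the last argument of the datalad create-sibling-webdav command shown earlier.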

Publish to Sciebo

To publish this dataset to Sciebo, use:

datalad push --to sciebo

This completes the walkthrough! Any time you want to update the content, edit the files, then run datalad save followed by datalad push.
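
The update cycle can be wrapped in a small shell function. This is only a sketch: `publish_update` is a hypothetical name, and it assumes that the sibling is called sciebo, as configured above, and that the function is run from inside the dataset.

```shell
# publish_update: hypothetical helper combining the two commands from this walkthrough.
# $1 is the commit message describing the change.
publish_update() {
    # Save all modifications in the dataset, then push them to the sciebo sibling;
    # the push is skipped if saving fails.
    datalad save -m "$1" && datalad push --to sciebo
}
```

For example, after adding a new .ris file you could run `publish_update "Add new publication"`.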

We expect that in the SFB data catalog, this dataset's entry can become the landing page for your project. You will then be able to use this dataset as a "registry" for your other datasets, by adding them as subdatasets.

Appendix: cloning from Sciebo

Datasets published (pushed) to Sciebo can also be consumed (cloned) by users with whom the Sciebo folders are shared. There is one caveat: because each user has their own URL pointing to the shared folder, consumers will need to reconfigure their clones. An example command would look as follows:

git annex initremote mysciebo \
--private \
--sameas=<annex UUID> \
type=webdav \
"url=https://fz-juelich.sciebo.de/remote.php/dav/files/<USER>/<PROJECT>/pub-dataset" \
exporttree=yes