Publication Catalog
Caution
The workflow described here remains a valid description for dataset publication on Sciebo. However, as a means for submitting datasets to the SFB1451 catalog, it has been deprecated in favour of the procedure described in Register data in the data catalog.
If publications are to be shown in the catalog, information about them needs to be included directly in the dataset (metadata) files, preferably using standard metadata formats (e.g. as the references field of a Citation File Format file) or as a datalad-tabby publications table. All datasets with project descriptions and publication metadata created according to the workflow description below (either by INF or by the respective projects) have been republished to GitHub as the sfb1451/all-projects superdataset.
Publication Catalog from Sciebo: a walkthrough
- Create a dataset containing:
  - the project description (project title, involved people, and abstract) in a CITATION.cff file;
  - information on the project's publications in one or more files in .ris or .nbib format.
- Push the dataset to Sciebo in export mode.
- Share your Sciebo project folder with INF and Z01.
Preliminary: Sciebo access
We will use Sciebo to publish the datasets. Sciebo is a file sharing service (much like Dropbox or Google Drive) for scientific institutions in NRW. It is a federated service, and can be accessed under different addresses, depending on your institution.
Sciebo accounts are available for most higher-education institutions in NRW. If you don't have an account, you can register at https://hochschulcloud.nrw/ through your institutional login. If your institution is not on the list, you will need to contact INF for a guest account.
All content will be placed in project folders within the SFB1451 project box. INF will share project folders with the PIs of all projects, who will be able to manage permissions for their group members. Although nothing in the SFB data management strategy requires central storage, using a Sciebo project box for this particular task will help us establish a common baseline.
Preliminary: DataLad installation
You will need a working DataLad installation with the DataLad-NEXT extension (see DataLad and DataLad-NEXT).
Instructions below are written for command line usage. If you prefer a graphical user interface, you may also use the DataLad-Gooey extension, which provides just that (and comes with DataLad-NEXT already).
Create a dataset
On your computer, create a new empty DataLad dataset, and navigate into it:
datalad create my-project
cd my-project
datalad run-procedure cfg_text2git
Info
The run-procedure cfg_text2git command configures the dataset so that text files are not annexed, i.e. they are stored directly in Git. This is a useful setting for the files we will be adding.
Create a citation file
Create a CITATION.cff file with the following fields: cff-version, title, message, type, authors, abstract. This file will contain the general information about your project, which will be shown in the catalog. For title, authors, and abstract, we suggest that you reuse the information from the SFB website.
We recommend using the cffinit generator, which will guide you through adding all the content (hint: after completing the mandatory section, you need to click "add more" to add the abstract), and let you copy or download the file. Alternatively, you can write the file from scratch in any text editor.
This is what the file for INF could look like:
cff-version: 1.2.0
title: 'INF: Data Management for Computational Modelling'
message: This is our project description
type: dataset
authors:
  - given-names: Michael
    family-names: Hanke
    affiliation: 'INM-7, Forschungszentrum Jülich'
    orcid: 'https://orcid.org/0000-0003-3456-2493'
  - given-names: Michał
    family-names: Szczepanik
    affiliation: 'INM-7, Forschungszentrum Jülich'
    orcid: 'https://orcid.org/0000-0002-4028-2087'
abstract: >-
  This project will provide expertise for access,
  description, and modelling of the data collected in
  the individual projects as well as Z02 and Z03. INF
  will continuously assess general workflows,
  resource requirements, and data analysis processes
  to capture between-project differences that may
  impact data comparability and re-usability across
  projects. INF will provide tools, services and
  training to help projects align their research
  output to (i) facilitate data analysis for
  extracting common activity patterns and mechanisms
  underlying motor behaviours across species, and
  (ii) promote data-driven computational modelling.
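Before saving, it can help to confirm that all six fields listed above are present. The following is a minimal sketch, not part of DataLad or the CFF tooling: check_cff is a hypothetical helper name, and it only looks for the top-level keys with grep, so it does not validate the file against the CFF schema.

```shell
# Hypothetical helper (not a DataLad or CFF command): report which of the
# required top-level keys are missing from a citation file.
check_cff() {
    file="$1"
    missing=""
    for key in cff-version title message type authors abstract; do
        # Top-level YAML keys start in the first column and end with a colon
        if ! grep -q "^${key}:" "$file"; then
            missing="${missing} ${key}"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Missing fields:${missing}"
        return 1
    fi
    echo "All required fields present"
}
```

Once the function is defined in your shell session, it could be called as check_cff CITATION.cff from inside the dataset directory.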
Place this file in the my-project directory and save the addition in DataLad:
datalad save -m "Add citation file" CITATION.cff
Info
Citation File Format is a standard for plain text files with human- and machine-readable information for datasets (and software). You can read more on the CFF website.
Add bibliographic information to the dataset
Next, add your project's publication information to the dataset. This information will be displayed in the Publications tab of the catalog. Store the information in RIS (.ris) or MEDLINE/PubMed (.nbib) format. You can put all publications into one file, or use multiple files.
These file formats are widely used, and can be exported from bibliography management software, Google Scholar (RIS format available as RefMan) or PubMed.
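For orientation, the sketch below writes a single RIS record by hand. It is only an illustration of the file layout: the file name example.ris and every field value are placeholders rather than a real reference, and in practice you would export the file from your reference manager instead of typing it.

```shell
# Create the publications subdirectory inside the dataset (no-op if it exists)
mkdir -p publications

# Write one RIS record; every value below is a placeholder, not a real reference.
# Each RIS line is a two-letter tag, two spaces, a hyphen, a space, then the value.
# A record starts with a TY (type) tag and ends with an ER (end of record) tag.
cat > publications/example.ris <<'EOF'
TY  - JOUR
AU  - Lastname, Firstname
TI  - Placeholder article title
JO  - Placeholder Journal
PY  - 2023
ER  - 
EOF

# Count the records in the file (each record starts with a TY tag)
grep -c '^TY  - ' publications/example.ris
```

Several records can be appended to the same file one after another, or kept in separate files.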
Place the file(s) into the dataset folder, in the publications
subdirectory, and save the addition with DataLad:
datalad save -m "Add bibliographic info"
Configure Sciebo "sibling"
To allow uploading to Sciebo, we must configure a dataset "sibling". In this case, we will need the extended functionality provided by the DataLad-NEXT extension. A complete walkthrough can be found in the WebDAV walkthrough page. In short, once you enable DataLad-NEXT, the configuration can be done with the following command (replace <WEBDAV URL> with the URL pointing to your folder; see the note below):
datalad create-sibling-webdav \
--dataset . \
--name sciebo \
--mode filetree \
<WEBDAV URL>
If this is the first time you are using DataLad with Sciebo, you will be prompted for your credentials.
Info
Sciebo uses your entire e-mail address (name@example.com) as its username.
Info
The --dataset option lets us explicitly specify the dataset for which we configure a sibling. The --name option sets the name which we will use later when publishing. The --mode filetree option enables filetree mode, meaning that the sibling will have the same human-readable file tree layout as the folder on your drive.
The WebDAV URL can be obtained from Sciebo's web UI by clicking on "Settings" in the lower left. This URL points to the user's home directory, and any subfolders must be appended by hand. Nonexistent subfolders will be created. The URLs will be different for different instances and users. For example, with FZJ's instance, the full URL to the dataset folder would look like this (<USER> and <PROJECT> are placeholders):
https://fz-juelich.sciebo.de/remote.php/dav/files/<USER>/<PROJECT>/pub-dataset
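Appending the subfolders by hand can be sketched in the shell as follows. The variable names are illustrative, and <USER> and <PROJECT> remain placeholders that you would replace with the values from your own Sciebo settings and project folder.

```shell
# Base URL as copied from Sciebo's settings; <USER> stays a placeholder here
SCIEBO_BASE='https://fz-juelich.sciebo.de/remote.php/dav/files/<USER>/'
# Placeholder for the shared project folder, and the dataset folder inside it
PROJECT_FOLDER='<PROJECT>'
DATASET_FOLDER='pub-dataset'

# Drop a trailing slash from the base (if any) before appending the subfolders
WEBDAV_URL="${SCIEBO_BASE%/}/${PROJECT_FOLDER}/${DATASET_FOLDER}"
echo "$WEBDAV_URL"
```

The resulting value can then be passed, in double quotes, as the final argument of the create-sibling-webdav command shown earlier.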
Publish to Sciebo
To publish this dataset to Sciebo, use:
datalad push --to sciebo
This completes the walkthrough! Any time you want to update the content, you can edit the files, then run datalad save followed by datalad push.
We expect that in the SFB data catalog, this dataset's entry can become the landing page for your project. You will then be able to use this dataset as a "registry" for your other datasets, by adding them as subdatasets.
Appendix: cloning from Sciebo
Datasets published (pushed) to Sciebo can also be consumed (cloned) by users with whom the Sciebo folders are shared. There is one caveat: because each user has their own URL pointing to the shared folder, consumers will need to reconfigure their clones. An example command would look as follows:
git annex initremote mysciebo --private --sameas=<annex UUID> type=webdav "url=https://fz-juelich.sciebo.de/remote.php/dav/files/<USER>/<PROJECT>/pub-dataset" exporttree=yes