Accessing the motor assessment data collection on Windows

This note explains in detail how to set up your local copy of the aggregated motor assessment data, and helps you deal with potential configuration-dependent obstacles.

The encrypted workflow and its prerequisites are explained on the encrypted workflow page. It is highly recommended that you read it first.

This note assumes that you are working on Windows and using the cmd shell. It also assumes that you have access to the shared Sciebo folder, and that the data are encrypted to your GPG key.

Before you begin

GPG

If you install Git (which is a requirement for using DataLad), it comes with its own gpg executable. If you install gpg4win (which is the solution recommended for digital signatures in our docs on Input signing), it naturally comes with a separate gpg executable. This means it is possible to have two GPG programs, which store their data (e.g. keys) in completely different locations on your computer.

What’s more, in this case, git-annex operations will default to using the one which came with Git, while user operations (be it via the Kleopatra GUI or the command line) will more than likely [1] use the one which came with gpg4win. This can become confusing. Fortunately, git-annex behaviour is configurable. The goal here is to have both the user and git-annex interact with the same gpg executable.

If you do not have gpg4win: you probably do not need to do anything.

If you have gpg4win and used it to generate (or import) your private key, you need to configure Git and git-annex to use this one.

First, find out the path to the gpg executable. The path components will tell you whether it came with Git or gpg4win; the latter is more likely.

where gpg

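Then (if it is gpg4win) tell Git to use this one. Git-annex obeys the same setting. Replace PATH/TO/GPG with the path reported by the previous command, enclosing it in double quotes if it contains spaces. Note that without --global, git config only works inside a repository, so either run the command after cloning the dataset or add --global to apply the setting to all your repositories.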

git config gpg.program PATH/TO/GPG
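
Since DataLad is Python-based, a Python interpreter is normally available alongside it, so the check above can also be scripted. The following is only a minimal sketch, not part of the documented workflow: it lists every gpg executable found on PATH in search order (similar to where gpg) and guesses where each one came from. The function names, and the path fragments used for guessing (Git, GnuPG, Gpg4win), are assumptions based on typical install locations and may not match your system.

```python
import os
from pathlib import Path


def find_gpg(path_value=None, names=("gpg.exe", "gpg")):
    """List gpg executables on PATH in search order, similar to `where gpg`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        for name in names:
            candidate = Path(directory) / name
            if candidate.is_file():
                found.append(str(candidate))
                break  # report each directory at most once
    return found


def guess_origin(gpg_path):
    """Guess the installation a gpg path belongs to (heuristic, assumed path fragments)."""
    lowered = gpg_path.replace("\\", "/").lower()
    if "/git/" in lowered:
        return "Git"
    if "gnupg" in lowered or "gpg4win" in lowered:
        return "Gpg4win"
    return "unknown"


if __name__ == "__main__":
    for path in find_gpg():
        print(f"{guess_origin(path)}: {path}")
```

The first entry printed is the one that a plain gpg call in cmd would run; if it is labelled Gpg4win, that is the path to pass to git config gpg.program.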

GPG-agent

Operations which use your secret key (like downloading encrypted data with git-annex or listing your secret keys with gpg --list-secret-keys) require interactions with a program called gpg-agent [2]. Normally, gpg-agent should be running in the background, and there is nothing to do.

If you notice that gpg-agent is not running (e.g. get fails with an error message which says that gpg-agent is not running, or you cannot list your secret keys), you can start it with:

gpg-connect-agent reloadagent /bye

Sciebo token (app password)

Instead of using your password (the one you use to log in to Sciebo), it is recommended to generate a token for use with DataLad & git-annex. Tokens are like passwords, but they are generated by the website rather than created by the user. A token can be revoked when it is no longer used or when there is a risk that it has been exposed.

To generate a token:

  • log in to Sciebo
  • click your username on the upper right and go to settings
  • select Security
  • scroll down to App passwords / tokens
  • enter a token name and click “Create new app passcode”
  • copy (and store safely) the username and token

The token name can be anything - it is only for your information, and will be displayed in the list of tokens, with the date of last use and an option to delete (revoke) the token.

WebDAV URLs

When using DataLad & git-annex, you will connect to Sciebo through a protocol called WebDAV. For this, you need to know your WebDAV URL, which is slightly different from the URL you use to visit Sciebo’s web interface.

Base WebDAV URL

To find your WebDAV address:

  • go to Sciebo (file view)
  • click ⚙️ Settings in the lower left corner
  • a “drawer” will slide up, showing your WebDAV URL

The WebDAV URL will look like this:

https://fz-juelich.sciebo.de/remote.php/dav/files/j.doe%40fz-juelich.de/

WebDAV URLs to folders

The URL above is pointing to your home directory. To access a specific folder, you will need to add some path components manually.

Most likely, a folder named Z03 will be shared with you. It will contain several folders, including motor_export_repo and motor_export_data; the latter is the relevant one here.

The folder path relative to your home directory will depend on whether you move the shared folder or not. Assuming that Z03 is in your home directory on Sciebo, the relative paths will be Z03/motor_export_repo and Z03/motor_export_data, respectively.

For use with DataLad, you will need to put the base URL and the relative paths together. Using the example above, and assuming that the shared folder is in the top-level directory, the data folder URL (for initremote) is:

https://fz-juelich.sciebo.de/remote.php/dav/files/j.doe%40fz-juelich.de/Z03/motor_export_data
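
Putting the pieces together by hand is error-prone, mostly because the @ in the username has to appear as %40. The sketch below shows one way to assemble the URL in Python. The function names are hypothetical, and the URL pattern is inferred from the single example above; the base URL shown in your Sciebo settings remains the authoritative source, and the host name differs between institutions.

```python
from urllib.parse import quote


def webdav_base_url(instance, username):
    """Build the base WebDAV URL (home directory) for a Sciebo account."""
    # safe="" makes quote() percent-encode "@" (as %40) and "/" as well
    return f"https://{instance}/remote.php/dav/files/{quote(username, safe='')}/"


def webdav_folder_url(base_url, relative_path):
    """Append a folder path (relative to the home directory) to the base URL."""
    parts = [quote(p, safe="") for p in relative_path.split("/") if p]
    return base_url.rstrip("/") + "/" + "/".join(parts)


base = webdav_base_url("fz-juelich.sciebo.de", "j.doe@fz-juelich.de")
data_url = webdav_folder_url(base, "Z03/motor_export_data")
print(data_url)
```

If you moved the shared folder, pass the new relative path instead (for example "Projects/Z03/motor_export_data"); each path component is percent-encoded separately, so folder names with spaces are handled too.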

Distributed storage

There are two components to the dataset:

  • The Git repository (which we can call a “lightweight dataset”) contains information about content identity and availability, but does not contain the actual data.
  • The content managed by git-annex (“annexed data”), i.e. the actual data.

For technical reasons they are stored and shared separately. In the chart below, a square box symbolizes a Git remote, and cylinders symbolize Git-annex special remotes.

flowchart TD
    A(Your local dataset clone)
    A ---o |clone, update| B["**JuGit** (public)"]
    A ---o |initremote, get| C[("**Sciebo**
    (restricted, encrypted)")]
    A ---o |"🚫"| D[("**INM-7 storage**
    (restricted, encrypted)")]

Retrieving data requires interacting with both the Git remote and the git-annex special remote (key commands are shown over the lines). Because no data are stored in JuGit, the repository can be made public, and fetching information from JuGit requires no credentials. However, access to the data stored in Sciebo requires not only credentials, but also possession of a GPG key to which the data are encrypted. Note that the Git repository also indexes data stored in INM-7 storage, but that special remote is not accessible outside INM-7.

Initial setup

Let git-annex know your token

Git-annex needs WebDAV credentials to access the stored repository and data. It expects them to be provided via environment variables. Set the environment variables with the set command (substitute your Sciebo username and token):

set WEBDAV_USERNAME=j.doe@fz-juelich.de
set WEBDAV_PASSWORD=XXXXX-XXXXX-XXXXX-XXXXX

Important: do not put the password in quotes (""), because the set command treats them literally, i.e. they become part of the value.

Tip: you can check the value of an environment variable with echo, e.g. echo %WEBDAV_USERNAME%.

The environment variables persist until you close your cmd window. However, these credentials will later be cached in an owner-read-only file in the clone, so re-entering the credentials should not be needed for subsequent updates.
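
Stray quotes or spaces in these variables are easy to miss with echo. As an optional sanity check, a small Python script started from the same cmd window (so that it inherits the variables) could look for the most common mistakes. This is a sketch under the assumption that neither the username nor the token legitimately contains quote characters or surrounding spaces; the function name is hypothetical.

```python
import os


def check_webdav_credentials(env=None):
    """Return a list of likely problems with the WebDAV credential variables."""
    env = os.environ if env is None else env
    problems = []
    for name in ("WEBDAV_USERNAME", "WEBDAV_PASSWORD"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        if value != value.strip():
            problems.append(f"{name} has leading or trailing whitespace")
        if '"' in value or "'" in value:
            problems.append(f"{name} contains a quote character")
    return problems


if __name__ == "__main__":
    # Only variable names are printed, never the values themselves.
    for problem in check_webdav_credentials():
        print(problem)
```

No output means that none of these particular problems were found; it does not prove that the credentials are accepted by Sciebo.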

Clone the dataset

Clone the (lightweight) dataset from JuGit (using the URL below) into a directory with a name of your choice (here: motor_export):

datalad clone https://jugit.fz-juelich.de/m.szczepanik/motor-assesment.git motor_export

Then change your working directory to go into the dataset:

cd motor_export

Initialize sciebo remote

Git-annex special remotes represent methods of accessing data, and a special remote's configuration contains settings such as the access URL and the encryption type to be used.

One weakness of the Sciebo setup is that WebDAV URLs contain user names and instance names, so they differ for each user (even though the shared folder is, in principle, the same). The dataset already has a sciebo remote, but it points to FZJ’s sciebo and Michał Szczepanik’s user account. There is no easy way to reconfigure it, other than each user initializing a new remote pointing to their own sciebo URL.

The trick is to initialize a special remote, and mark it same as the one above (meaning that we expect to reach exactly the same place, just through different means).

This is by far the most involved part of the setup.

(optional): check available remotes

Each special remote has its own ID and assigned name, which can be used interchangeably with git-annex commands. To list remotes known to the dataset, run:

git annex info

Pending future changes, the remote we want to enable is going to be named sciebo-storage, i.e. it will be this one:

325dd4e6-e71b-4512-bce6-0eb574b6ed15 -- sciebo-storage
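
If you want to pick the ID out of the listing programmatically, a simple pattern match is enough. The sketch below assumes that the relevant lines look like the one shown above (a UUID, then " -- ", then the name); the real git annex info output contains other lines as well and may format the name differently, so treat the pattern as a starting point. The names REMOTE_LINE and parse_remotes are hypothetical.

```python
import re

# Matches lines of the form "<uuid> -- <name or description>"
REMOTE_LINE = re.compile(
    r"^[ \t]*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}) -- (.+?)[ \t]*$",
    re.MULTILINE,
)


def parse_remotes(info_text):
    """Map the text after ' -- ' to the UUID for every matching line."""
    return {label: uuid for uuid, label in REMOTE_LINE.findall(info_text)}


sample = "325dd4e6-e71b-4512-bce6-0eb574b6ed15 -- sciebo-storage"
print(parse_remotes(sample)["sciebo-storage"])
```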

Initialize your own private special remote

As explained above, you will need to initialize a special remote for yourself, and mark it “same as” the one above.

Here’s the command broken up into multiple lines with a caret (^) for readability.

git annex initremote my-sciebo-storage^
  --private^
  --sameas sciebo-storage^
  type=webdav^
  url="<url of motor_export_storage>"

And this is the same as a one-liner:

git annex initremote my-sciebo-storage --private --sameas sciebo-storage type=webdav url="<url of motor_export_storage>"

Replace <url of motor_export_storage> with your own data folder URL (the one which ends with motor_export_data)!

Note

In the example above, we enclosed the URL in double quotes ("..."). Keep in mind that cmd distinguishes between single and double quotes and treats them differently! Find out more at https://ss64.com/nt/syntax-esc.html
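
One way to sidestep cmd quoting rules altogether is to start the command from Python with an argument list, which is handed to git without being parsed by cmd. The following is a sketch of that alternative, not the documented procedure: the function names are hypothetical, and it assumes the script is run from inside the dataset clone, in the cmd window where WEBDAV_USERNAME and WEBDAV_PASSWORD are set.

```python
import subprocess


def initremote_args(data_url, name="my-sciebo-storage", sameas="sciebo-storage"):
    """Build the argument list for the initremote call shown above."""
    return [
        "git", "annex", "initremote", name,
        "--private",
        "--sameas", sameas,
        "type=webdav",
        f"url={data_url}",
    ]


def init_private_remote(data_url):
    """Run initremote in the current directory (expected to be the dataset clone)."""
    # check=True raises CalledProcessError if git-annex exits with an error
    subprocess.run(initremote_args(data_url), check=True)
```

Calling init_private_remote with your data folder URL (the one ending in motor_export_data) then performs the same step as the one-liner; printing initremote_args(...) first is a cheap way to review the arguments before anything is run.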

Troubleshooting

  • If the error message complains about credentials (bearer authentication), most likely the credentials (environment variables) are incorrect.
  • If the error message complains about gpg-agent not running, it is either not running or git-annex is using a different gpg than intended. See sections on gpg and gpg-agent above.
  • If there is seemingly no error but data cannot be downloaded, then the URL given to initremote was probably incorrect. Make sure that the URL points to motor_export_data (not motor_export_repo).

Data access

Note

This assumes that your working directory is inside the dataset (i.e. that you have cd'ed into it).

To download data, use:

datalad get output

To remove your local copy of the data, use drop, which does the opposite of get:

datalad drop output

To obtain the latest version of the data, use:

datalad update --how merge
datalad get output

To see when the data (according to your local state) last changed, use git log (up/down arrows scroll, Q quits):

git log

  1. It would, at least in theory, be possible to modify the PATH variable so that running gpg would use the Git-bundled gpg instead of the gpg4win gpg.

  2. You call datalad get, DataLad calls git annex get; git-annex calls gpg --decrypt, and gpg asks gpg-agent for a matching secret key.