Accessing motor assessment data collection on Windows
This note explains in detail how to set up your local copy of the aggregated motor assessment data, and how to deal with obstacles that depend on your configuration.
An overview of the encrypted workflow and its prerequisites is given on the encrypted workflow page. It is highly recommended that you read it first.
The note assumes that you are working on Windows and using the cmd shell. It also assumes that you have access to the shared Sciebo folder and that the data are encrypted to your GPG key.
Before you begin
GPG
If you install Git (which is a requirement for using DataLad), it comes with its own gpg executable. If you install gpg4win (the solution recommended for digital signatures in our docs on Input signing), it comes with a separate gpg executable. This means you can end up with two GPG programs, which store their data (e.g. keys) in completely different locations on your computer.
Moreover, in this case git-annex operations default to the gpg that came with Git, while your own operations (whether via the Kleopatra GUI or the command line) will most likely[1] use the one that came with gpg4win. This can be confusing. Fortunately, git-annex's behaviour is configurable: the goal is to have both you and git-annex use the same gpg executable.
If you do not have gpg4win: you probably do not need to do anything.
If you have gpg4win and used it to generate (or import) your private key, you need to configure Git and git-annex to use the gpg4win gpg.
First, find out the path to the gpg executable. Its path components tell you whether it came with Git or with gpg4win (the latter is more likely).
where gpg
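Then, if it is the gpg4win one, tell Git to use it; git-annex obeys the same setting. Replace PATH/TO/GPG with the path reported by the previous command, keeping the double quotes if the path contains spaces:
git config gpg.program "PATH/TO/GPG"
Without --global, this setting applies only to the repository in your current directory, so run it inside the dataset once you have cloned it (or add --global to apply it to all your repositories).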
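If you are unsure which installation the path reported by where gpg belongs to, the check can be sketched in Python as below. It is only a heuristic under assumptions this note does not state: that Git for Windows keeps its gpg under a Git\usr\bin folder, and that gpg4win installs GnuPG into a folder named GnuPG (or Gpg4win). guess_gpg_origin is a hypothetical helper name.

```python
from pathlib import PureWindowsPath


def guess_gpg_origin(path: str) -> str:
    """Guess which installation a gpg.exe path (e.g. from `where gpg`) belongs to.

    Heuristic based on assumed default install layouts:
    - Git for Windows: ...\\Git\\usr\\bin\\gpg.exe
    - gpg4win:         ...\\GnuPG\\bin\\gpg.exe (or a folder named Gpg4win)
    Returns "gpg4win", "git" or "unknown".
    """
    # PureWindowsPath splits backslash paths on any OS; compare case-insensitively
    parts = [part.lower() for part in PureWindowsPath(path.strip()).parts]
    if "gnupg" in parts or "gpg4win" in parts:
        return "gpg4win"
    if "git" in parts and "usr" in parts:
        return "git"
    return "unknown"
```

For example, guess_gpg_origin(r"C:\Program Files (x86)\GnuPG\bin\gpg.exe") returns "gpg4win". If where gpg prints several lines, check each of them.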
GPG-agent
Operations which use your secret key (like downloading encrypted data with git-annex, or listing your secret keys with gpg --list-secret-keys) require interaction with a program called gpg-agent[2]. Normally, gpg-agent runs in the background, and there is nothing to do.
If gpg-agent is not running (e.g. datalad get fails with an error message saying so, or you cannot list your secret keys), you can start it with:
gpg-connect-agent reloadagent /bye
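The check-then-restart step above can also be scripted. The Python sketch below is an illustration, not part of the documented workflow: it assumes that a non-zero exit status of gpg --list-secret-keys signals an agent problem (it can have other causes too). ensure_gpg_agent and its run parameter are hypothetical names; run defaults to subprocess.run and can be replaced so the logic can be tried without gpg installed.

```python
import subprocess
from typing import Callable, List

LIST_KEYS = ["gpg", "--list-secret-keys"]
RELOAD_AGENT = ["gpg-connect-agent", "reloadagent", "/bye"]


def ensure_gpg_agent(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[List[str]]:
    """List secret keys; if that fails, (re)start gpg-agent and try once more.

    Returns the commands that were run, in order.
    Assumption: a non-zero exit status of the listing means an agent problem.
    """
    executed = [LIST_KEYS]
    if run(LIST_KEYS, capture_output=True).returncode == 0:
        return executed
    executed.append(RELOAD_AGENT)
    run(RELOAD_AGENT, check=True)  # the command shown in this section
    executed.append(LIST_KEYS)
    # with subprocess.run, check=True raises CalledProcessError if listing still fails
    run(LIST_KEYS, check=True)
    return executed
```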
Sciebo token (app password)
Instead of using your password (the one you use to log in to Sciebo), it is recommended to generate a token for use with DataLad & git-annex. Tokens are like passwords, but they are generated by the website rather than created by the user. A token can be revoked when it is no longer used or when it may have been exposed.
To generate a token:
- log in to Sciebo
- click your username on the upper right and go to settings
- select Security
- scroll down to App passwords / tokens
- enter a token name and click “Create new app passcode”
- copy (and store safely) the username and token
The token name can be anything - it is only for your information, and will be displayed in the list of tokens, with the date of last use and an option to delete (revoke) the token.
WebDAV URLs
When using DataLad & git-annex, you will connect to Sciebo through a protocol called WebDAV. For this, you need to know your WebDAV URL, which is slightly different from the URL you use to visit Sciebo's web interface.
Base WebDAV URL
To find your WebDAV address:
- go to Sciebo (file view)
- click ⚙️ Settings in the lower left corner
- a “drawer” will slide up, showing your WebDAV URL
The WebDAV URL will look like this:
https://fz-juelich.sciebo.de/remote.php/dav/files/j.doe%40fz-juelich.de/
WebDAV URLs to folders
The URL above is pointing to your home directory. To access a specific folder, you will need to add some path components manually.
Most likely, a folder named Z03 will be shared with you. It contains several folders, of which motor_export_data is the relevant one.
The folder path relative to your home directory depends on whether you moved the shared folder. Assuming that Z03 is in your home directory on Sciebo, the relative path is Z03/motor_export_data (not Z03/motor_export_repo).
For use with DataLad, you need to put the base URL and the relative path together. Using the example above, and assuming that the shared folder is in your top-level directory, the data folder URL (for initremote) is:
https://fz-juelich.sciebo.de/remote.php/dav/files/j.doe%40fz-juelich.de/Z03/motor_export_data
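Putting the base URL and the relative path together can be sketched in Python as below. It assumes that your Sciebo WebDAV URL follows the layout of the example above (host, then /remote.php/dav/files/, then the percent-encoded username); webdav_folder_url is a hypothetical helper, and the URL shown in the Settings drawer remains the authoritative base.

```python
from urllib.parse import quote


def webdav_folder_url(host: str, username: str, relative_path: str) -> str:
    """Build a WebDAV folder URL in the layout of the example in this note.

    https://<host>/remote.php/dav/files/<user>/<path>, where the @ in the
    username becomes %40.
    """
    user = quote(username, safe="")  # percent-encode everything except unreserved chars
    # encode each folder name separately so the slashes between them are kept
    segments = relative_path.strip("/").split("/")
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"https://{host}/remote.php/dav/files/{user}/{path}"
```

For example, webdav_folder_url("fz-juelich.sciebo.de", "j.doe@fz-juelich.de", "Z03/motor_export_data") reproduces the data folder URL above. Encoding each folder name separately also covers names with spaces, which become %20.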
Distributed storage
There are two components to the dataset:
- The Git repository (which we can call a “lightweight dataset”) contains information about content identity and availability, but does not contain the actual data.
- The content managed by git-annex (“annexed data”), i.e. the actual data.
For technical reasons they are stored and shared separately. In the chart below, a square box symbolizes a Git remote, and cylinders symbolize Git-annex special remotes.
flowchart TD
    A(Your local dataset clone)
    A ---o |clone, update| B["**JuGit** (public)"]
    A ---o |initremote, get| C[("**Sciebo** (restricted, encrypted)")]
    A ---o |"🚫"| D[("**INM-7 storage** (restricted, encrypted)")]
Retrieving data requires interacting with both the Git remote and the git-annex special remote (the key commands are shown on the connecting lines). Because no data are stored in JuGit, the repository can be made public, and fetching information from JuGit requires no credentials. Access to the data stored in Sciebo, however, requires not only credentials but also possession of a GPG key to which the data are encrypted. Note that the Git repository also indexes data stored in INM-7 storage, but that special remote is not accessible outside INM-7.
Initial setup
Let git-annex know your token
Git-annex needs WebDAV credentials to access the stored repository and data.
It expects them to be provided via environment variables.
Set the environment variables with the set command (substitute your Sciebo username and token):
set WEBDAV_USERNAME=j.doe@fz-juelich.de
set WEBDAV_PASSWORD=XXXXX-XXXXX-XXXXX-XXXXX
Important: do not put the password in quotes (""), because the set command treats them literally, i.e. the quotes become part of the value.
Tip: you can check the value of an environment variable with echo, e.g. echo %WEBDAV_USERNAME%.
The environment variables persist until you close the cmd window. These credentials will later be cached in an owner-read-only file in the clone, so you should not need to re-enter them for subsequent updates.
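Because git-annex reads the credentials from the environment, a quick check can catch the two mistakes mentioned above: a variable that is not set, or quotes that set kept as part of the value. The Python sketch below is an illustration only, and check_webdav_env is a hypothetical name. Environment variables set in a cmd window are visible only to programs started from that same window, so Python would have to be launched from it.

```python
import os
from typing import List, Mapping, Optional


def check_webdav_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return problems found with WEBDAV_USERNAME / WEBDAV_PASSWORD.

    An empty list means both variables are set and contain no stray quotes.
    Defaults to the current process environment.
    """
    env = os.environ if env is None else env
    problems = []
    for name in ("WEBDAV_USERNAME", "WEBDAV_PASSWORD"):
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif value[0] in "\"'" or value[-1] in "\"'":
            # cmd's set keeps quote characters as part of the value
            problems.append(f"{name} starts or ends with a quote character")
    return problems
```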
Clone the dataset
Clone the (lightweight) dataset from JuGit, using the URL below, into a directory with a name of your choice (here: motor_export):
datalad clone https://jugit.fz-juelich.de/m.szczepanik/motor-assesment.git motor_export
Then change your working directory to go into the dataset:
cd motor_export
Initialize the Sciebo remote
Git-annex special remotes represent methods of accessing data; a special remote's configuration contains things like the access URL or the encryption type to be used.
One weakness of the Sciebo setup is that WebDAV URLs contain user names and instance names, so they differ for each user (even though the shared folder is, in principle, the same). The dataset already has a Sciebo remote, but it points to FZJ's Sciebo and Michał Szczepanik's user account. There is no easy way to reconfigure it; instead, each user initializes a new remote pointing to their own Sciebo URL.
The trick is to initialize a new special remote and mark it as the same as the existing one (meaning that we expect to reach exactly the same place, just through different means).
This is by far the most involved part of the setup.
(optional): check available remotes
Each special remote has its own ID and assigned name, which can be used interchangeably in git-annex commands. To list the remotes known to the dataset, run:
git annex info
Unless this changes in the future, the existing remote is named sciebo-storage, i.e. it is this one:
325dd4e6-e71b-4512-bce6-0eb574b6ed15 -- sciebo-storage
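To look up a remote's ID by name from a script, the lines of the git annex info output can be searched. The Python sketch below assumes the "<uuid> -- <name>" line format shown above; this human-readable output is not guaranteed to stay the same across git-annex versions, and find_remote_uuid is a hypothetical helper.

```python
import re
from typing import Optional

# One remote per line: "<uuid> -- <name or description>", possibly indented
REMOTE_LINE = re.compile(
    r"^\s*([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) -- (.+)$"
)


def find_remote_uuid(info_output: str, name: str) -> Optional[str]:
    """Return the UUID listed next to `name` in `git annex info` output, or None."""
    for line in info_output.splitlines():
        match = REMOTE_LINE.match(line)
        if match is None:
            continue
        words = match.group(2).split()
        # accept the name on its own or in square brackets (assumed formats)
        if name in words or f"[{name}]" in words:
            return match.group(1)
    return None
```

The text to search could be captured with subprocess.run(["git", "annex", "info"], capture_output=True, text=True).stdout, run from inside the dataset.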
Initialize your own private special remote
As explained above, you will need to initialize a special remote for yourself, and mark it “same as” the one above.
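Here is the command broken up into multiple lines with a caret (^) for readability. Note the space before each caret: cmd appends the next line directly to the current one, so without the space the arguments would run together.
git annex initremote my-sciebo-storage ^
--private ^
--sameas sciebo-storage ^
type=webdav ^
url="<url of motor_export_storage>"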
And this is the same as a one-liner:
git annex initremote my-sciebo-storage --private --sameas sciebo-storage type=webdav url="<url of motor_export_storage>"
Substitute <url of motor_export_storage> with your data URL (the one which ends with motor_export_data)!
Note
In the example above, we enclosed the URL in double quotes ("...").
Keep in mind that cmd treats single and double quotes differently: only double quotes group text, while single quotes are passed on as ordinary characters!
Find out more at https://ss64.com/nt/syntax-esc.html
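If you prefer to run this step from a script rather than typing it into cmd, passing the arguments as a list sidesteps the caret and quoting rules, because with subprocess's default shell=False no shell parses the command line. A minimal Python sketch using only the arguments shown above; initremote_command is a hypothetical helper, and the URL check simply encodes the advice that the URL must end with motor_export_data.

```python
from typing import List


def initremote_command(url: str, name: str = "my-sciebo-storage",
                       sameas: str = "sciebo-storage") -> List[str]:
    """Build the `git annex initremote` call from this section as an argument list.

    Each list item is passed to git as one argument, so no cmd quoting is needed.
    """
    if not url.rstrip("/").endswith("/motor_export_data"):
        # the data URL must point to motor_export_data (not motor_export_repo)
        raise ValueError(f"expected a URL ending in motor_export_data, got {url!r}")
    return [
        "git", "annex", "initremote", name,
        "--private",
        "--sameas", sameas,
        "type=webdav",
        f"url={url}",
    ]
```

From inside the dataset you would run it with subprocess.run(initremote_command(url), check=True), with WEBDAV_USERNAME and WEBDAV_PASSWORD set in the environment Python was started from.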
Troubleshooting
- If the error message complains about credentials (bearer authentication), most likely the credentials (environment variables) are incorrect.
- If the error message complains about gpg-agent not running, then either it is not running or git-annex is using a different gpg than intended. See the GPG and GPG-agent sections above.
- If there is seemingly no error but data cannot be downloaded, then the URL given to initremote was probably incorrect. Make sure that the URL points to motor_export_data (not motor_export_repo).
Data access
Note
This assumes that your working directory is the dataset (i.e. that you cd'ed into it).
To download data, use:
datalad get output
To remove your local copy of the data, use drop, which does the opposite of get:
datalad drop output
To obtain the latest version of the data, use:
datalad update --how merge
datalad get output
To see when the data (according to your local state) last changed, use git log (up/down arrows scroll, Q quits):
git log
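The update step can also be scripted. The Python sketch below runs the two update commands from this section in order and stops at the first failure. update_and_get and its run parameter are hypothetical names; run defaults to subprocess.run and can be replaced so the sequence can be checked without DataLad installed. It assumes datalad is on your PATH and that dataset_dir points to your clone (e.g. motor_export).

```python
import subprocess
from typing import Callable, List


def update_and_get(dataset_dir: str = ".", path: str = "output",
                   run: Callable[..., object] = subprocess.run) -> List[List[str]]:
    """Update the clone from JuGit, then download the latest version of `path`.

    Returns the commands in the order they were run.
    """
    steps = [
        ["datalad", "update", "--how", "merge"],  # fetch and merge changes to the lightweight dataset
        ["datalad", "get", path],                 # retrieve (and decrypt) the annexed content
    ]
    for cmd in steps:
        # with subprocess.run, check=True raises CalledProcessError if a step fails
        run(cmd, cwd=dataset_dir, check=True)
    return steps
```

For example, update_and_get("motor_export") would run both commands inside the clone.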
[1] It would, at least in theory, be possible to modify the PATH variable so that running gpg would use the Git-bundled gpg instead of the gpg4win gpg.
[2] You call datalad get, DataLad calls git annex get, git-annex calls gpg --decrypt, and gpg asks gpg-agent for a matching secret key.