This page describes how to add, retrieve, modify, and tag data files in the FreeSurfer source code repository, where they are managed with git-annex.

Initial Setup

Based on the information on the Freesurfer_github page, the remotes in your freesurfer working directory should look something like this:

git remote -v 
 
  datasrc       file:///space/freesurfer/repo/annex.git (fetch)
  datasrc       file:///space/freesurfer/repo/annex.git (push)
  origin        git@github.com:zkaufman/freesurfer.git (fetch)
  origin        git@github.com:zkaufman/freesurfer.git (push)
  upstream      git@github.com:freesurfer/freesurfer.git (fetch)
  upstream      git@github.com:freesurfer/freesurfer.git (push)
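The check below is a minimal sketch, assuming a POSIX shell run from inside the working directory. It confirms that the `datasrc` remote shown above exists and points at the expected URL. Registering a missing remote with `git remote add` is an assumption of this sketch, so follow the Freesurfer_github page if it describes the setup differently. The function name `ensure_datasrc` is hypothetical.

```shell
#!/bin/sh
# Sketch only: verify that the annex data remote from the listing above is set up.
# Assumption: the remote name and URL match the `git remote -v` output on this page.
DATASRC_URL="file:///space/freesurfer/repo/annex.git"

# Hypothetical helper; run it from inside the freesurfer working directory.
ensure_datasrc() (
    # `git remote get-url` exits non-zero when the remote does not exist (git >= 2.7).
    if url=$(git remote get-url datasrc 2>/dev/null); then
        if [ "$url" != "$DATASRC_URL" ]; then
            echo "warning: datasrc points to $url, expected $DATASRC_URL" >&2
        fi
    else
        # Assumption: adding the remote by hand is an acceptable fix-up.
        git remote add datasrc "$DATASRC_URL"
    fi
)
```

Running `ensure_datasrc` prints nothing when the remote is already configured as expected.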

Adding a data file

The following example assumes we want to add a data file called 'testdata.tar.gz' to the 'distribution' directory:
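The commands for this case are not listed here. A plausible sequence, sketched by analogy with the modify workflow further down, is to copy the file into place, add it to the annex, commit and push, and then upload its content. The helper name `annex_add_file` and the explicit `git annex copy --to datasrc` step are assumptions, not part of the original instructions. They are shown because `git annex sync` may not transfer file content by default (that depends on its `--content` option and the `annex.synccontent` setting).

```shell
#!/bin/sh
# Sketch only: add a new data file to the annex.
# Assumptions: the file is copied into the repo first, and its content is
# uploaded to the `datasrc` remote explicitly with `git annex copy`.
annex_add_file() (
    set -e
    src=$1    # the new data file somewhere on disk
    dest=$2   # its path inside the repo, e.g. distribution/testdata.tar.gz
    cp "$src" "$dest"
    git annex add "$dest"                # by default the file becomes an annex symlink
    git commit -m "Add $dest"
    git push
    git annex copy --to datasrc "$dest"  # upload the file content itself
    git annex sync                       # sync the git-annex branch with the remotes
)
```

For the example above: `annex_add_file /path/to/testdata.tar.gz distribution/testdata.tar.gz`, run from the top of the working directory.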

Getting a data file

To retrieve the contents of a data file:

git annex get mri_em_register/testdata.tar.gz

Retrieve everything under the current directory (not recommended, as it downloads every annexed file below that directory)

git annex get .
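When a `get` fails, it can help to see which repositories hold a copy before fetching. The sketch below uses `git annex whereis` to list those locations and then requests the content from the `datasrc` remote explicitly. Pinning the source to `datasrc` is an assumption based on the remote listing above, and the helper name `fetch_from_datasrc` is hypothetical.

```shell
#!/bin/sh
# Sketch only: show where each file's content lives, then fetch it from datasrc.
fetch_from_datasrc() (
    set -e
    for f in "$@"; do
        # Informational: whereis lists every repository known to have the content.
        git annex whereis "$f" || true
        # Assumption: datasrc holds the content (see the remote listing above).
        git annex get --from datasrc "$f"
    done
)
```

For example: `fetch_from_datasrc mri_em_register/testdata.tar.gz`.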

Modifying a data file

To modify the contents of a data file, first unlock it (which replaces the symlink with a writable copy of the file), then modify it, and finally re-add it to the annex:

git annex unlock mri_em_register/testdata.tar.gz
<modify contents of tar file>
git annex add mri_em_register/testdata.tar.gz
git commit -am "New test data"
git push
git annex sync
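Because the file is edited by hand between the unlock and the re-add, the steps above split naturally into two helpers. This is a minimal sketch assuming a POSIX shell. The function names are hypothetical, while the git commands are exactly those listed above.

```shell
#!/bin/sh
# Sketch only: the modify workflow above, split around the manual edit.

# Hypothetical helper: make the annexed file writable.
annex_begin_edit() {
    git annex unlock "$1"   # replaces the symlink with a copy of the content
}

# Hypothetical helper: re-add, commit, push and sync once the edit is done.
annex_finish_edit() (
    set -e
    f=$1
    msg=${2:-"New test data"}   # default commit message from the example above
    git annex add "$f"
    git commit -am "$msg"
    git push
    git annex sync
)
```

Typical use: run `annex_begin_edit mri_em_register/testdata.tar.gz`, modify the tar file, then run `annex_finish_edit mri_em_register/testdata.tar.gz "New test data"`.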

Tagging a data file

The data files are divided into the following three categories. It is essential that each data file gets the proper tag(s), so that our servers and disk space are not overwhelmed when only a known subset of the data is required:

  1. Files required for build-time checks (tagged makecheck)

  2. Files required for a local installation (tagged makeinstall)

  3. Everything else (untagged)
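Which of the three categories a file falls into can be read from its `fstags` field. The sketch below assumes the only meaningful values are `makecheck` and `makeinstall`, and relies on `git annex metadata --get` printing the values of a single field one per line. The helper name `fstag_category` is hypothetical.

```shell
#!/bin/sh
# Sketch only: report which of the three categories a data file belongs to.
# Assumption: the only meaningful fstags values are makecheck and makeinstall.
fstag_category() (
    # Stop if the metadata lookup itself fails (e.g. the file is not annexed).
    tags=$(git annex metadata --get fstags "$1") || exit 1
    if [ -z "$tags" ]; then
        echo "untagged"                       # category 3: everything else
    else
        # A file may carry both tags; join them on one line.
        printf '%s\n' "$tags" | paste -sd ' ' -
    fi
)
```

For example, `fstag_category mri_em_register/testdata.tar.gz` prints `makecheck` for a file tagged as in the examples below.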

Display metadata

To show all the metadata associated with a file:

git annex metadata mri_em_register/testdata.tar.gz

Assign metadata

To assign a tag to an existing data file (-s sets the field, replacing any values it already had):

git annex metadata mri_em_register/testdata.tar.gz -s fstags=makecheck
git annex sync

To append a tag while keeping the existing values, use += instead:

git annex metadata mri_em_register/testdata.tar.gz -s fstags+=makeinstall
git annex sync
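Since `git annex metadata` accepts more than one path, several files can be tagged in one call followed by a single sync. The sketch below also rejects tag values other than the two defined above. The helper name `tag_data_files` and that validation step are assumptions, not part of the original workflow.

```shell
#!/bin/sh
# Sketch only: append one fstags value to several files, then sync once.
tag_data_files() (
    set -e
    tag=$1
    shift
    case $tag in
        makecheck|makeinstall) ;;   # the two tags defined on this page
        *) echo "unknown tag '$tag' (expected makecheck or makeinstall)" >&2
           exit 1 ;;
    esac
    # += keeps any existing fstags values on each file.
    git annex metadata -s "fstags+=$tag" "$@"
    git annex sync
)
```

For example: `tag_data_files makeinstall mri_em_register/testdata.tar.gz distribution/testdata.tar.gz`.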

List all files with a given tag

git annex find --metadata fstags=makecheck
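The same matching options can be inverted to find files in the third category, which helps spot new data files that may still need a tag. This sketch rests on two assumptions: that `--metadata 'fstags=*'` matches any file with at least one `fstags` value, and that `--include '*'` is worth adding so that files whose content is not present locally are listed too (with no matching options, `git annex find` lists only files whose content is present). The helper name `list_untagged` is hypothetical.

```shell
#!/bin/sh
# Sketch only: list annexed files that carry no fstags value (category 3).
# Assumption: the glob '*' in --metadata matches any fstags value.
list_untagged() (
    dir=${1:-.}
    # --include '*' selects every annexed file, present locally or not;
    # --not inverts only the --metadata option that follows it.
    git annex find --include '*' --not --metadata 'fstags=*' "$dir"
)
```

Piping the result through `wc -l`, as in `list_untagged mri_em_register | wc -l`, counts the untagged files under a directory.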

Retrieve all files with a given tag

Get only the data files required for build-time checks (1.9 GB)

git annex get --metadata fstags=makecheck .

Get only the data files required for local installation (4.3 GB)

git annex get --metadata fstags=makeinstall .
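The two retrieval commands differ only in the tag value, so they can share one helper that refuses unknown tags. When both subsets are needed, git-annex matching options can be combined with `--or`. The helper name `get_tagged_data` and the `both` mode are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: fetch the data needed for one or both build stages.
# Usage: get_tagged_data makecheck|makeinstall|both [path]
get_tagged_data() (
    set -e
    dir=${2:-.}
    case $1 in
        makecheck|makeinstall)
            git annex get --metadata "fstags=$1" "$dir" ;;
        both)
            # --or matches files with either tag instead of requiring both.
            git annex get --metadata fstags=makecheck --or --metadata fstags=makeinstall "$dir" ;;
        *)
            echo "usage: get_tagged_data makecheck|makeinstall|both [path]" >&2
            exit 1 ;;
    esac
)
```

For example, `get_tagged_data makecheck` is equivalent to the first retrieval command above.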