Differences between revisions 6 and 19 (spanning 13 versions)
Revision 6 as of 2017-04-18 11:06:16
Size: 2932
Editor: AndrewHoopes
Comment:
Revision 19 as of 2019-02-03 12:18:17
Size: 0
Editor: AndrewHoopes
Comment: moved to GitAnnex
#acl LcnGroup:read,write,delete,revert

<<TableOfContents>>

This page describes how to deal with adding and tagging data files in the freesurfer source code repository.

== Initial Setup ==

Based on the information in the [[Freesurfer_github|Freesurfer_github]] page, the remotes of your freesurfer repo working directory should look something like this:

{{{
git remote -v
 
  datasrc file:///space/freesurfer/repo/annex.git (fetch)
  datasrc file:///space/freesurfer/repo/annex.git (push)
  origin git@github.com:zkaufman/freesurfer.git (fetch)
  origin git@github.com:zkaufman/freesurfer.git (push)
  upstream git@github.com:freesurfer/freesurfer.git (fetch)
  upstream git@github.com:freesurfer/freesurfer.git (push)
}}}
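If the `datasrc` remote is missing from that listing, it can be added by hand. The following is a minimal sketch, assuming the annex URL shown above is still the right one for your site; the function name `ensure_datasrc_remote` is illustrative and not part of the freesurfer tooling.

```shell
# Add the datasrc remote only if it is not already configured.
# Assumption: the URL below is copied from the listing above.
ensure_datasrc_remote() {
    if git remote | grep -qx 'datasrc'; then
        echo "datasrc already configured"
    else
        git remote add datasrc file:///space/freesurfer/repo/annex.git
        echo "datasrc added"
    fi
}
```

Run `ensure_datasrc_remote` from inside the working directory, then re-check with `git remote -v`.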

== Adding a data file ==

The following example assumes we want to add a data file called 'testdata.tar.gz' to the 'distribution' directory:

{{{
git annex add distribution/testdata.tar.gz
git commit -a -m "Added new file"
git annex copy --to datasrc
}}}

== Getting a data file ==

To retrieve the contents of a data file:

{{{
git fetch datasrc    # 'git annex sync' might also work here
git annex get mri_em_register/testdata.tar.gz
}}}

To retrieve everything under the current directory (not recommended):
{{{
git annex get .
}}}

== Modifying a data file ==

To modify the contents of a data file, first unlock it (which eliminates the symlink), then modify it, then re-add it to the annex:
{{{
git annex unlock mri_em_register/testdata.tar.gz
# ... modify the contents of the tar file here ...
git annex add mri_em_register/testdata.tar.gz
git commit -am "New test data"
git push
git annex copy --to datasrc
}}}
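Before editing, it can help to confirm whether a file is still locked. The sketch below assumes the default locked layout, in which an annexed file is a symlink whose target lies under `.git/annex/objects`; the helper name `annex_is_locked` is hypothetical.

```shell
# Succeed (exit 0) if the file is still a locked annex symlink.
# Assumption: locked files are symlinks pointing into .git/annex/objects.
annex_is_locked() {
    [ -L "$1" ] || return 1
    case "$(readlink "$1")" in
        *.git/annex/objects/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `annex_is_locked mri_em_register/testdata.tar.gz && echo "unlock it first"` prints the reminder only while the file is still a symlink into the annex.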

== Tagging a data file ==

The data files fall into the following three categories. It is essential that each data file gets the proper tag(s), so that our servers and disk space are not overwhelmed when only a known subset of the data is required:

 1. Those required for build-time checks (tagged '''makecheck''')
 1. Those required for a local installation (tagged '''makeinstall''')
 1. Everything else (untagged)

=== Display metadata ===

To show all the metadata associated with a file:

{{{
git annex metadata mri_em_register/testdata.tar.gz
}}}

=== Assign metadata ===

To assign a tag to an existing data file:

{{{
git annex metadata mri_em_register/testdata.tar.gz -s fstags=makecheck
git annex sync
}}}

We can also append tags:
{{{
git annex metadata mri_em_register/testdata.tar.gz -s fstags+=makeinstall
git annex sync
}}}
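Since only two tag values are meaningful here, a small wrapper can guard against typos. This is a sketch rather than an established script: `tag_data_file` is a hypothetical name, and it simply appends the tag with the same `fstags+=` form shown above before syncing.

```shell
# Append one of the two known tags to a data file, then sync the metadata.
# tag_data_file is a hypothetical helper, not part of the repository.
tag_data_file() {
    file=$1
    tag=$2
    case "$tag" in
        makecheck|makeinstall) ;;
        *) echo "unknown tag: $tag (expected makecheck or makeinstall)" >&2
           return 1 ;;
    esac
    git annex metadata "$file" -s "fstags+=$tag" && git annex sync
}
```

A call such as `tag_data_file mri_em_register/testdata.tar.gz makeinstall` runs the two commands above, while a misspelled tag is rejected before git-annex is invoked.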

=== List all files with a given tag ===
{{{
git annex find --metadata fstags=makecheck
}}}

=== Retrieve all files with a given tag ===

To get only the data files required for build-time checks (1.9 GB):
{{{
git annex get --metadata fstags=makecheck .
}}}

To get only the data files required for a local installation (4.3 GB):
{{{
git annex get --metadata fstags=makeinstall .
}}}
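After a tagged retrieval, it may be useful to check whether anything is still missing. The sketch below assumes that git-annex's `--not` and `--in=here` matching options combine with `--metadata` as expected; `missing_tagged_files` is a hypothetical helper name.

```shell
# List files carrying the given tag whose content is not present locally.
# Assumption: --not --in=here restricts the match to content missing from this repository.
missing_tagged_files() {
    git annex find --metadata "fstags=$1" --not --in=here
}
```

Running `missing_tagged_files makecheck` should print nothing once every '''makecheck''' file has been retrieved.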