Differences between revisions 2 and 10 (spanning 8 versions)
Revision 2 as of 2017-04-13 14:12:31
Size: 2544
Editor: ZekeKaufman
Comment:
Revision 10 as of 2017-05-19 11:59:21
Size: 4242
Editor: ZekeKaufman
Comment:
This page describes how to work with data files in the freesurfer source code repository, which are managed with git-annex.

Initial Setup

Based on the information in the Freesurfer_github page, the remotes of your freesurfer working directory should look something like:

git remote -v 
 
  datasrc       file:///space/freesurfer/repo/annex.git (fetch)
  datasrc       file:///space/freesurfer/repo/annex.git (push)
  origin        git@github.com:zkaufman/freesurfer.git (fetch)
  origin        git@github.com:zkaufman/freesurfer.git (push)
  upstream      git@github.com:freesurfer/freesurfer.git (fetch)
  upstream      git@github.com:freesurfer/freesurfer.git (push)

Users outside the Martinos Center, who do not have access to the local filesystem, should instead point the datasrc remote at the public-facing server:

git remote -v

  datasrc       https://surfer.nmr.mgh.harvard.edu/pub/dist/freesurfer/repo/annex.git (fetch)
  datasrc       https://surfer.nmr.mgh.harvard.edu/pub/dist/freesurfer/repo/annex.git (push)
  origin        git@github.com:zkaufman/freesurfer.git (fetch)
  origin        git@github.com:zkaufman/freesurfer.git (push)
  upstream      git@github.com:freesurfer/freesurfer.git (fetch)
  upstream      git@github.com:freesurfer/freesurfer.git (push)
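
The choice between the two datasrc URLs above can be scripted. The following is a minimal sketch and not part of the Freesurfer tooling: the function names datasrc_url and configure_datasrc are hypothetical, and it assumes that seeing the local annex directory is a good enough test for being inside the Martinos Center.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Freesurfer): pick the datasrc URL
# depending on whether the local annex directory is visible.

LOCAL_ANNEX=/space/freesurfer/repo/annex.git
PUBLIC_ANNEX=https://surfer.nmr.mgh.harvard.edu/pub/dist/freesurfer/repo/annex.git

# Print the URL to use; an optional argument overrides the local path.
datasrc_url() {
    annex_dir=${1:-$LOCAL_ANNEX}
    if [ -d "$annex_dir" ]; then
        echo "file://$annex_dir"
    else
        echo "$PUBLIC_ANNEX"
    fi
}

# Create the datasrc remote, or repoint it if it already exists.
# Run this from inside your freesurfer working directory.
configure_datasrc() {
    url=$(datasrc_url "$@")
    if git remote get-url datasrc >/dev/null 2>&1; then
        git remote set-url datasrc "$url"
    else
        git remote add datasrc "$url"
    fi
}
```

After sourcing the file, running configure_datasrc and then git remote -v should show one of the two listings above.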

Adding a data file

The following example assumes we want to add a data file called 'testdata.tar.gz' to the 'distribution' directory. Replace <filename> with the path to that file:

git annex add <filename>
git commit -a -m "Added new file"
git annex copy --to datasrc
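
For the 'testdata.tar.gz' case described above, the three steps could be wrapped as shown below. This is only a sketch: the helper names run and annex_add_file and the DRY_RUN variable are hypothetical, and passing the file path to git annex copy is an assumption made here to limit the copy to the new file.

```shell
#!/bin/sh
# Sketch only: add one data file to the annex and send it to datasrc.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

annex_add_file() {
    file=$1
    # Hand the file to git-annex (it becomes a symlink into the annex).
    run git annex add "$file" &&
    # Record the new symlink in git history.
    run git commit -a -m "Added new file" &&
    # Upload the file content to the datasrc remote.
    run git annex copy --to datasrc "$file"
}
```

For example, DRY_RUN=1 followed by annex_add_file distribution/testdata.tar.gz prints the three commands without touching the repository.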

Getting a data file

To retrieve the contents of a data file:

git fetch datasrc    # 'git annex sync' may also work here
git annex get mri_em_register/testdata.tar.gz

Get only the data files required for build time checks (1.9 GB)

git annex get --metadata fstags=makecheck .

Get only the data files required for local installation (4.3 GB)

git annex get --metadata fstags=makeinstall .

Retrieve everything under the current directory (not recommended)

git annex get .
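
The three retrieval variants above differ only in the metadata filter. A small sketch that maps a subset name to the matching command is shown below; the function name annex_get_command and the subset name 'all' are hypothetical.

```shell
#!/bin/sh
# Sketch only: print the 'git annex get' command for a named subset.
annex_get_command() {
    case $1 in
        makecheck|makeinstall)
            # Tagged subsets are selected through the fstags metadata field.
            echo "git annex get --metadata fstags=$1 ." ;;
        all)
            # Everything under the current directory (not recommended).
            echo "git annex get ." ;;
        *)
            echo "usage: annex_get_command makecheck|makeinstall|all" >&2
            return 1 ;;
    esac
}
```

The printed command can be inspected first and then run, for example with eval "$(annex_get_command makecheck)".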

Modifying a data file

To modify the contents of a data file, first unlock it (which replaces the symlink with a regular, writable file), then modify it, then re-add it to the annex:

git annex unlock mri_em_register/testdata.tar.gz
<modify contents of tar file>
git annex add mri_em_register/testdata.tar.gz
git commit -am "New test data"
git push
git annex copy --to datasrc
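
The "modify contents of tar file" step above is left open. One possible way to do it, once the file has been unlocked, is to unpack the archive, run an edit command inside it, and pack it again. The helper name repack_tarball is hypothetical, and the sketch assumes the archive layout may be rewritten with a leading "./" on each entry.

```shell
#!/bin/sh
# Sketch only: rewrite a .tar.gz in place after running a command
# inside its unpacked contents.
# usage: repack_tarball <tarball> <command> [args...]
repack_tarball() {
    tarball=$1; shift
    workdir=$(mktemp -d) || return 1
    # Unpack the current contents into a scratch directory.
    tar -xzf "$tarball" -C "$workdir" || return 1
    # Run the caller's edit command inside the unpacked tree.
    ( cd "$workdir" && "$@" ) || return 1
    # Re-create the archive from the edited tree.
    tar -czf "$tarball" -C "$workdir" . || return 1
    rm -rf "$workdir"
}
```

It would be called between git annex unlock and git annex add, for example repack_tarball mri_em_register/testdata.tar.gz sh -c 'rm -f obsolete.log', where obsolete.log is a made-up file name.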

Tagging

git-annex provides the ability to tag data files. Freesurfer uses tags so that subsets of the data can be retrieved without having to download everything. The data files have been broken down into the following three categories:

  1. Those required for build time checks (tagged makecheck)

  2. Those required for a local installation (tagged makeinstall)

  3. Everything else (untagged)

It is essential that data files get the proper tag(s) so that our servers and disk space are not overwhelmed when only a known subset of the data is required.

Display metadata

To show all the metadata associated with a file:

git annex metadata mri_em_register/testdata.tar.gz

Assign metadata

To assign a tag to an existing data file:

git annex metadata mri_em_register/testdata.tar.gz -s fstags=makecheck
git annex sync

We can also append tags:

git annex metadata mri_em_register/testdata.tar.gz -s fstags+=makeinstall
git annex sync
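
When several files need the same tag, the append form above can be applied in a loop. The sketch below also rejects tag names other than the two used on this page, to guard against typos. The function names is_known_fstag and tag_files and the DRY_RUN variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: append one of the two known fstags values to several files.
# Set DRY_RUN=1 to print the commands instead of running them.

is_known_fstag() {
    case $1 in
        makecheck|makeinstall) return 0 ;;
        *) return 1 ;;
    esac
}

# usage: tag_files <tag> <file>...
tag_files() {
    tag=$1; shift
    if ! is_known_fstag "$tag"; then
        echo "unknown tag: $tag" >&2
        return 1
    fi
    for f in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "git annex metadata $f -s fstags+=$tag"
        else
            git annex metadata "$f" -s "fstags+=$tag" || return 1
        fi
    done
    # Share the metadata change with the other repositories.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "git annex sync"
    else
        git annex sync
    fi
}
```

Using += rather than = keeps any tag a file already carries, so a file needed for both build time checks and installation ends up with both values.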

List all files with a given tag

git annex find --metadata fstags=makecheck

Mirroring

The git annex repo exists on the local file system in the following directory:

/space/freesurfer/repo/annex.git

The public-facing git annex repo exists on the local file system in the following directory (mounted by our server):

/cluster/pubftp/dist/freesurfer/repo/annex.git

Currently we "mirror" the two repos daily using the following commands:

ssh pinto   # the commands below must be run on the machine pinto
rsync -av /space/freesurfer/repo/annex.git/* /cluster/pubftp/dist/freesurfer/repo/annex.git
git update-server-info

The proper way to mirror would be as follows: