Freesurfer Test Plan

WORK IN PROGRESS

Introduction

This page documents the Freesurfer software test plan. A formal software test plan (see Wikipedia reference) describes a systematic approach to testing a software application (or suite), and includes these elements:

Also, tests should cover the following categories of testing:

The Freesurfer test plan is a work in progress. It was not developed top-down, but has grown bottom-up as necessity and time have dictated. The goal is to build a test suite that meets the criteria of a formal test plan. This will take time.

The current test suite is an ad hoc collection of test scripts and C/C++ code providing rudimentary testing of most of the Freesurfer code base, consisting of unit, module and system tests. The primary aim of these tests is simple: the output files produced by the recon-all stream (as documented here) must be 'correct' relative to reference files which are known to be 'correct', as determined by manual inspection or some formal method (for example, a table of precalculated results from another program). The word 'correct' is in quotes because Freesurfer, being a research tool, is constantly evolving, and because any complex scientific software application has inherent variability.

Unit tests

The term 'unit test' is defined in our Freesurfer test plan to mean a test of a Freesurfer binary (such as mri_ca_register) or of something smaller (a subroutine). The framework for these tests is the 'make check' target, a convention supported by the 'automake' tools on top of the 'make' utility. The 'check' target builds and runs the tests written by the developer to test whatever the 'all' target of a Makefile builds. In Freesurfer, there are a number of 'make check' tests, and 'make check' is run after 'make' on each nightly build platform (see the section "How the nightly build works" for details).
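As an illustration of how a single 'make check' test can report its result, here is a minimal sketch in Python. It assumes the automake test harness convention that an exit status of 0 means pass, 77 means skipped, and any other ordinary non-zero status means fail. The function name `run_check` and the idea of comparing standard output to an expected string are assumptions made for this example, not a description of the existing Freesurfer tests.

```python
import subprocess

# Exit statuses as interpreted by the automake test harness.
EXIT_PASS = 0
EXIT_FAIL = 1
EXIT_SKIP = 77


def run_check(command, expected_stdout, timeout=60):
    """Run `command` and map the outcome to a 'make check' exit status.

    `command` is a list such as ["some_binary", "arg"]; both the command
    and the expected output are placeholders chosen by the test author.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except FileNotFoundError:
        # The binary does not exist on this platform: skip, do not fail.
        return EXIT_SKIP
    except subprocess.TimeoutExpired:
        return EXIT_FAIL

    if result.returncode != 0:
        return EXIT_FAIL
    # Compare the trimmed standard output with the expected text.
    if result.stdout.strip() == expected_stdout:
        return EXIT_PASS
    return EXIT_FAIL
```

A wrapper script registered as a test would typically end by passing the returned value to `sys.exit()`, so that 'make check' sees the status.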

Future - To formalize the unit tests, a wiki page should be created which lists (1) all the binaries used in recon-all, (2) other important binaries not in the stream, and (3) the critical subroutines, as identified either by name (see Bruce Fischl and Doug Greve) or by profiling the binaries during a run of the recon-all stream. For each of these, the page would list the name of the corresponding test (as run by 'make check'). A table of this sort makes it possible to assess coverage and to identify tests that still need to be developed.
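The coverage table described above could also be kept in a machine-readable form. The following is a small sketch, assuming the table is stored as a mapping from each binary or subroutine name to the list of 'make check' tests that exercise it. The function name and the test names in the usage note are hypothetical.

```python
def coverage_report(targets, tests_by_target):
    """Split `targets` into those with at least one test and those without.

    `targets` is the list of binaries or subroutines that should be tested;
    `tests_by_target` maps a target name to a list of test names.
    """
    covered = {
        name: tests_by_target[name]
        for name in targets
        if tests_by_target.get(name)  # missing or empty list counts as untested
    }
    untested = [name for name in targets if name not in covered]
    return covered, untested
```

For example, calling `coverage_report(["mri_ca_register", "other_binary"], {"mri_ca_register": ["hypothetical_test"]})` returns the one covered entry and reports `other_binary` as untested, which is the list of tests still to be written.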

Module tests

The term 'module test' is defined in our Freesurfer test plan to apply, at this time, to the atlases used by the recon-all stream. The AtlasSubjects page describes how these atlases are built and tested. So there are two module tests, summarized (from AtlasSubjects) here:

Future - These tests ought to be run automatically on a regular schedule, say once a month. The results should also be evaluated automatically.

System tests

The term 'system test' is defined in our Freesurfer test plan to apply to the recon-all stream as a whole. In the current setup, different test platforms, representing the various operating systems (Linux and Mac, 32-bit and 64-bit), each run the recon-all stream, and then each output file is compared against a known-good reference set for one subject (bert). This is briefly described in the section "How the daily testing works".
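The comparison of output files against a reference set can be sketched as follows. This is a simplified illustration that assumes a byte-for-byte comparison using SHA-256 digests. The actual test scripts may compare files differently, for instance with tolerances, since results can legitimately vary between platforms. The function names and the directory layout are assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_to_reference(output_dir, reference_dir):
    """Compare every file under `reference_dir` with its counterpart.

    Returns two sorted lists of relative paths: files missing from
    `output_dir`, and files whose contents differ from the reference.
    """
    output_dir = Path(output_dir)
    reference_dir = Path(reference_dir)
    missing, different = [], []
    reference_files = sorted(p for p in reference_dir.rglob("*") if p.is_file())
    for ref_path in reference_files:
        relative = ref_path.relative_to(reference_dir)
        out_path = output_dir / relative
        if not out_path.is_file():
            missing.append(relative.as_posix())
        elif file_digest(out_path) != file_digest(ref_path):
            different.append(relative.as_posix())
    return missing, different
```

A system test built on this sketch would pass only when both returned lists are empty. Files present in the output but absent from the reference set are deliberately ignored here.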

Future - A 64-bit Mac OS platform needs to be set up. Additionally, and more importantly, a bigger set of test subjects needs to be included in the test suite. Currently, just 'bert' is used, but Freesurfer by its very nature can react quite differently to different scan parameters, brain pathologies, and subject ages. An automatic test of the Buckner40 set of subjects is necessary, and is documented here: Buckner40Adni60Testing.

Another future item is to regularly run 'valgrind' on each binary, to check for memory corruption or large memory leaks.
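A possible shape for such a valgrind run is sketched below. It uses valgrind's `--leak-check=full` and `--error-exitcode` options. The chosen exit code, the function names, and the decision to treat any reported error as a failure are assumptions for this example.

```python
import subprocess

# Arbitrary status chosen for this sketch; valgrind exits with it when it
# reports errors, otherwise it passes through the program's own exit status.
VALGRIND_ERROR_EXIT = 42


def valgrind_command(binary, args=()):
    """Build the argument list for running `binary` under valgrind."""
    return [
        "valgrind",
        "--leak-check=full",
        f"--error-exitcode={VALGRIND_ERROR_EXIT}",
        binary,
        *args,
    ]


def passes_valgrind(binary, args=()):
    """Return True unless valgrind reported errors for this run.

    Note: a binary that itself exits with VALGRIND_ERROR_EXIT would be
    indistinguishable from a valgrind failure in this simple sketch.
    """
    result = subprocess.run(
        valgrind_command(binary, args), capture_output=True, text=True
    )
    return result.returncode != VALGRIND_ERROR_EXIT
```

Running each binary with small, representative inputs keeps these checks practical, since programs run considerably slower under valgrind.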

GUI tests

Here, 'GUI tests' means testing of the GUI apps tkmedit, tksurfer, tkregister2, scuba, freeview and qdec. There are no formal tests for these apps at this time. We rely on users reporting problems.

Future - Create a wiki page listing all the steps necessary to test the major functionality of each GUI app. This would be a manual test. It would be conducted prior to issuing a new release, or before a course.

Reporting

Test results need to be reported first to those who can determine the causes of failures (and fix them), and second to users of the software, so that they are aware of the general state of health of Freesurfer.

Currently, test results are reported by email only, and only to specific developers (NickSchmansky and KrishSubramaniam).

Future - A Dart Dashboard needs to be created to allow intuitive and global (public) reporting of test results. This application is a reporting manager, not a test framework, so existing unit, module and system test scripts would report to it (in place of, or in addition to, emailing results).

TestPlan (last edited 2018-02-09 18:01:44 by AndrewHoopes)