WARNING: This documentation is a work in progress and does not cover the full potential of the FTI plugin.
The FTI plugin makes it possible to perform checkpoints with FTI. It does not support the full FTI feature set, but it offers a simple declarative interface to a large subset of it.
The FTI plugin uses the MPI_Comm datatype and the MPI_COMM_WORLD descriptor, which are defined in the mpi plugin. For now, the mpi plugin must therefore be loaded alongside the FTI plugin.
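A minimal sketch of the plugins section of a PDI specification tree that loads both plugins. The file name fti_config.ini is a placeholder, not a name required by the plugin. Since neither init_on nor communicator is given here, the plugin's initialization rules suggest that FTI would be initialized at plugin initialization, using the default MPI_COMM_WORLD descriptor provided by the mpi plugin.

```yaml
plugins:
  mpi:                           # defines MPI_Comm and the MPI_COMM_WORLD descriptor
  fti:
    config_file: fti_config.ini  # hypothetical FTI configuration file name
```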
The FTI plugin initializes FTI according to the following rules:
If init_on is specified, the FTI plugin initializes FTI on that exact event and expects the communicator and the configuration file name to be available at that point; otherwise initialization fails.
If init_on is not specified and the communicator is available at plugin initialization (e.g. when using the mpi plugin), the FTI plugin tries to initialize FTI at plugin initialization and expects the configuration file name to be available; otherwise initialization fails.
If init_on is not specified and the communicator is not available at plugin initialization, the FTI plugin tries to initialize FTI when the communicator descriptor is exposed and expects the configuration file name to be available; otherwise initialization fails.
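To illustrate the third rule, the sketch below omits init_on and points communicator at a descriptor that only becomes available when the application shares it. The descriptor name app_comm and the configuration file name are hypothetical. Under these assumptions, FTI would be initialized when app_comm is exposed, and if the application shares it with write access, the plugin can also write FTI_COMM_WORLD into it.

```yaml
data:
  app_comm: MPI_Comm            # hypothetical communicator descriptor, shared by the application
plugins:
  mpi:                          # provides the MPI_Comm datatype
  fti:
    config_file: fti_config.ini # hypothetical FTI configuration file name
    communicator: app_comm      # no init_on: FTI is initialized when app_comm is exposed
```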
After FTI has been successfully initialized, the FTI plugin loads a predefined descriptor named FTI_COMM_WORLD (or FTI_COMM_WORLD_F for Fortran). This descriptor is treated as metadata. Its content can be accessed using
At its root, the FTI configuration is made of several nodes:
checkpoint specifies a list of checkpoints to execute. It is given as a key/value map that may contain the following keys: L1_on, L2_on, L3_on and L4_on. Each key specifies the names of the events that trigger the corresponding checkpoint level; its value can be either a string or a list of strings:
L1_on: executes FTI L1 checkpoint on specified PDI events,
L2_on: executes FTI L2 checkpoint on specified PDI events,
L3_on: executes FTI L3 checkpoint on specified PDI events,
L4_on: executes FTI L4 checkpoint on specified PDI events.
communicator is a string that specifies the name of the descriptor containing the MPI communicator to use for FTI initialization. Additionally, if this descriptor is shared with write access, the FTI plugin can write FTI_COMM_WORLD into it. It defaults to MPI_COMM_WORLD, a predefined descriptor from the mpi plugin.
config_file is a string that can contain $-expressions. It specifies the name of the FTI configuration file.
dataset is a key/value map whose keys are integers specifying dataset IDs in FTI and whose values are the names of the protected descriptors. Alternatively, a value may itself be a map with the following keys:
name is the name of the protected descriptor,
size is the name of a descriptor into which the FTI plugin writes the size of the protected descriptor when it is shared with write access.
init_on is a string or a list of strings that specifies the names of the PDI events on which FTI is initialized.
recover_on is a string or a list of strings that specifies the names of the PDI events on which FTI recovery is executed.
recover_var is a list of maps with the following keys:
on_event defines the event (or list of events) on which the variable recovery is done,
var defines the ID (or list of IDs) of the variables to recover.
send_file is a key/value map, or a list of key/value maps, each describing the source and destination paths of the files to send, the event or list of events that trigger the transfer and, optionally, the name of a descriptor in which status information is stored when it is exposed.
snapshot_on is a string or a list of strings that specifies the names of the PDI events on which FTI_Snapshot is executed.
status is a string that specifies the name of a descriptor into which FTI writes the FTI_Status code when the descriptor is shared.
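Putting the keys above together, the following excerpt of a PDI specification tree (its data and plugins sections) sketches one possible FTI configuration. It is a minimal illustration under stated assumptions: every descriptor name (temperature, particles, particles_size, fti_status), every event name and the configuration file name are hypothetical, and the scalar types chosen for particles_size and fti_status are assumptions rather than types required by the plugin. send_file is left out because its key names are not listed in this section.

```yaml
data:
  temperature: { type: array, subtype: double, size: [100, 100] }  # hypothetical protected array
  particles: { type: array, subtype: double, size: 1000 }           # hypothetical protected array
  particles_size: int64   # receives the size of "particles" when shared (type assumed)
  fti_status: int         # receives the FTI_Status code when shared (type assumed)
plugins:
  mpi:                            # provides MPI_Comm and MPI_COMM_WORLD
  fti:
    config_file: fti_config.ini   # hypothetical FTI configuration file
    communicator: MPI_COMM_WORLD  # the default value, written out explicitly
    init_on: init_fti             # FTI is initialized on this event
    dataset:
      0: temperature              # dataset ID 0, plain form
      1:                          # dataset ID 1, map form with size written back
        name: particles
        size: particles_size
    checkpoint:
      L1_on: iteration_end        # FTI L1 checkpoint on a single event
      L4_on: [phase_end, run_end] # FTI L4 checkpoint on any of these events
    recover_on: restart           # FTI recovery on this event
    recover_var:
      - on_event: restart_temperature  # single event, single ID
        var: 0
      - on_event: [restart_a, restart_b] # list of events, list of IDs
        var: [0, 1]
    snapshot_on: step             # FTI_Snapshot is executed on this event
    status: fti_status            # FTI_Status code is written here
```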