trixi.experiment

Experiment

class trixi.experiment.experiment.Experiment(n_epochs=0)

Bases: object

An abstract Experiment which can be run for a number of epochs.

The basic life cycle of an experiment is:

setup()
prepare()

while epoch < n_epochs:
    train()
    validate()
    epoch += 1

end()

If you want to use a stopping criterion other than the number of epochs, e.g. stopping based on the validation loss, implement it in your validate() method and call stop() once the criterion is met to break the loop. In that case, set n_epochs to a high number or np.inf.
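A minimal, self-contained sketch of this pattern follows. It does not import trixi: StoppableLoop is a hypothetical stand-in for the documented loop (its `_stop` flag and the precomputed `losses` list are assumptions for illustration), and EarlyStopping shows the validate()-plus-stop() idea with n_epochs set to math.inf.

```python
import math


class StoppableLoop:
    """Hypothetical stand-in for the documented Experiment loop (not trixi's source)."""

    def __init__(self, n_epochs=math.inf):
        self.n_epochs = n_epochs
        self._epoch_idx = 0
        self._stop = False  # assumed flag, flipped by stop()

    def stop(self):
        self._stop = True

    def train(self, epoch):
        pass

    def validate(self, epoch):
        pass

    def run(self):
        # Same shape as the documented life cycle, plus the stop() check.
        while self._epoch_idx < self.n_epochs and not self._stop:
            self.train(self._epoch_idx)
            self.validate(self._epoch_idx)
            self._epoch_idx += 1


class EarlyStopping(StoppableLoop):
    def __init__(self, losses, patience=2):
        super().__init__(n_epochs=math.inf)  # rely on stop() instead of an epoch count
        self.losses = losses  # hypothetical precomputed validation losses
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def validate(self, epoch):
        loss = self.losses[epoch]
        if loss < self.best:
            self.best, self.bad_epochs = loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.stop()  # the loop exits after the current epoch


exp = EarlyStopping([1.0, 0.8, 0.9, 0.85, 0.7])
exp.run()
```

With a patience of 2, the loss fails to improve at epochs 2 and 3, so the loop stops after epoch 3 and the last loss value is never looked at.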

The reason there are both setup() and prepare() is that an internal _setup_internal() method is called between them, which subclasses can use for their own internal logic. For example, trixi.experiment.pytorchexperiment.PytorchExperiment uses it to restore checkpoints. Think of setup() as an __init__() that is only called when the Experiment is actually asked to do something. Then use prepare() to modify the fully instantiated Experiment if you need to.

To write a new Experiment, simply inherit from the Experiment class and override its methods. You can then start your Experiment by calling run().

In addition, the Experiment has a test phase. Calling run_test() calls test() and then end_test() internally; if you pass setup=True to run_test() (the default), it first calls setup() and prepare() as well.
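To make the call order concrete, here is a small self-contained sketch in plain Python (not trixi's source; RunTestSketch and its `calls` list are hypothetical) that records which hooks a run_test()-style method invokes with and without setup.

```python
class RunTestSketch:
    """Records the order in which a run_test()-like method calls the hooks."""

    def __init__(self):
        self.calls = []

    def setup(self):
        self.calls.append("setup")

    def prepare(self):
        self.calls.append("prepare")

    def test(self):
        self.calls.append("test")

    def end_test(self):
        self.calls.append("end_test")

    def run_test(self, setup=True):
        if setup:
            # Optional setup, mirroring the documented behavior.
            self.setup()
            self.prepare()
        self.test()
        self.end_test()


with_setup = RunTestSketch()
with_setup.run_test()  # calls: setup, prepare, test, end_test

without_setup = RunTestSketch()
without_setup.run_test(setup=False)  # calls: test, end_test
```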

Each Experiment also stores its current state in _exp_state, its start time in _time_start, its end time in _time_end and the current epoch index in _epoch_idx.

Parameters:	n_epochs (int) – The number of epochs in the Experiment (how often the train and validate method will be called)

end(): Is called at the end of each experiment.

end_test(): Is called at the end of each experiment test.

epoch: Convenience property for accessing self._epoch_idx.

prepare(): This method is called directly before the experiment training starts.

process_err(e)

This method is called if an error occurs during the execution of an experiment. By default, it simply re-raises the exception.

Parameters:	e (Exception) – The exception which was raised during the experiment life cycle
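One way to picture how process_err() fits in is a run method that catches any exception and hands it to process_err(). The sketch below is self-contained and hypothetical (ErrorHandlingSketch and LoggingExperiment are illustrative names, and a single simulated train() call stands in for the whole life cycle); it shows the default re-raise and an override that records errors instead.

```python
class ErrorHandlingSketch:
    def train(self, epoch):
        raise RuntimeError("diverged")  # simulated failure during training

    def process_err(self, e):
        raise e  # default behavior described above: re-raise

    def run(self):
        try:
            self.train(0)
        except Exception as e:
            self.process_err(e)


class LoggingExperiment(ErrorHandlingSketch):
    def __init__(self):
        self.errors = []

    def process_err(self, e):
        # Override: record the error instead of raising it.
        self.errors.append(repr(e))


exp = LoggingExperiment()
exp.run()  # does not raise; exp.errors holds the recorded error
```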

run(setup=True)

This method runs the Experiment. It goes through the basic life cycle of an Experiment:

setup()
prepare()

while epoch < n_epochs:
    train()
    validate()
    epoch += 1

end()

run_test(setup=True)

This method runs the test phase of the Experiment.

The test consists of an optional setup, followed by calls to test() and end_test().

Parameters:	setup – If True, it will execute the `setup()` and `prepare()` methods, as the run method does, before calling `test()`.

setup(): Is called at the beginning of each Experiment run to set up the basic components needed for a run.

stop(): If called, the Experiment will stop after the current epoch and not continue training.

test(): The testing part of the Experiment.

train(epoch)

The training part of the Experiment. It is called once for each epoch.

Parameters:	epoch (int) – The current epoch the train method is called in

validate(epoch)

The evaluation/validation part of the Experiment. It is called once for each epoch (after the training part).

Parameters:	epoch (int) – The current epoch the validate method is called in

PytorchExperiment

class trixi.experiment.pytorchexperiment.PytorchExperiment(config=None, name=None, n_epochs=None, seed=None, base_dir=None, globs=None, resume=None, ignore_resume_config=False, resume_save_types=('model', 'optimizer', 'simple', 'th_vars', 'results'), resume_reset_epochs=True, parse_sys_argv=False, checkpoint_to_cpu=True, save_checkpoint_every_epoch=1, explogger_kwargs=None, explogger_freq=1, loggers=None, append_rnd_to_name=False, default_save_types=('model', 'optimizer', 'simple', 'th_vars', 'results'), save_checkpoints_default=True)

Bases: trixi.experiment.experiment.Experiment

A PytorchExperiment extends the basic functionality of the Experiment class with convenience features for PyTorch (and general logging) such as creating a folder structure, saving, plotting results and checkpointing your experiment.

The basic life cycle of a PytorchExperiment is the same as Experiment:

setup()
prepare()

for epoch in range(n_epochs):
    train()
    validate()

end()

where the distinction between the first two is that, between them, PytorchExperiment automatically restores checkpoints and saves the _config_raw in _setup_internal(). Please see below for more information on this. To write your own experiment, simply inherit from PytorchExperiment and override the setup(), prepare(), train() and validate() methods (or use the very experimental decorator experimentify() to convert your class into an experiment). Then run your experiment by calling its run() method.

Internally PytorchExperiment will provide a number of member variables which you can access.

n_epochs

Number of epochs.

exp_name

Name of your experiment.

config

The (initialized) Config of your experiment. You can access the uninitialized one via _config_raw.

result

A dict in which you can store your result values. If a PytorchExperimentLogger is used, result will be a ResultLogDict that automatically writes to a file and also stores the last N entries for each key for quick access (e.g. to quickly compute the running mean).

elog (if base_dir is given)

A PytorchExperimentLogger instance which can log your results to a given folder. Will automatically be created if a base_dir is available.

loggers

Contains all loggers you provide, including the experiment logger, accessible by the names you provide.

clog

A CombinedLogger instance which logs to all loggers, each with its own frequency. The frequency is the last entry of the tuple you provide for each logger: 1 means every call and N means every Nth call, which is useful e.g. if you only want to send data to Visdom every 10th time.
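The frequency semantics can be sketched as one counter per logger. FrequencyGatedLogger below is a hypothetical, self-contained illustration, not trixi's CombinedLogger; in particular, forwarding on the 1st, (N+1)th, (2N+1)th, ... call is an assumption about where counting starts.

```python
class FrequencyGatedLogger:
    """Forward a call to each sub-logger only every N-th time (N = its frequency)."""

    def __init__(self, loggers):
        # loggers: list of (callable, frequency) tuples
        self.loggers = loggers
        self.counts = [0] * len(loggers)

    def log(self, value):
        for i, (logger, freq) in enumerate(self.loggers):
            if self.counts[i] % freq == 0:  # assumed: the first call is forwarded
                logger(value)
            self.counts[i] += 1


every, tenth = [], []
clog = FrequencyGatedLogger([(every.append, 1), (tenth.append, 10)])
for v in range(25):
    clog.log(v)
# every receives all 25 values, tenth only 0, 10 and 20
```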

The most important attribute is certainly config, which is the initialized Config for the experiment. To understand how it needs to be structured to allow for automatic instantiation of types, please refer to its documentation. If you decide not to use this functionality, config and _config_raw are identical. Beware, however, that by default PytorchExperiment only saves the raw config after setup(). If you modify config during setup() and want the modified config to be saved, override _setup_internal() yourself:

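from trixi.experiment.pytorchexperiment import PytorchExperiment

class YourExperiment(PytorchExperiment):
    def _setup_internal(self):
        super(YourExperiment, self)._setup_internal()  # calls .prepare_resume()
        self.elog.save_config(self.config, "config")  # save the modified config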

Parameters:

config (dict or Config) – Configures your experiment. If name, n_epochs, seed or base_dir are given in the config, their values automatically override the corresponding args/kwargs. In addition, if parse_config_sys_argv is enabled, the config parses the argv arguments and updates its values wherever a key matches a console argument.
name (str) – The name of the PytorchExperiment.
n_epochs (int) – The number of epochs (number of times the training cycle will be executed).
seed (int) – A random seed (which will set the random, numpy and torch seed).
base_dir (str) – A base directory in which the experiment result folder will be created. A PytorchExperimentLogger instance will be created if this is given.
globs – The globals() of the script which is run. This is necessary to get and save the executed files in the experiment folder.
resume (str or PytorchExperiment) – Another PytorchExperiment or path to the result dir from another PytorchExperiment from which it will load the PyTorch modules and other member variables and resume the experiment.
ignore_resume_config (bool) – If True it will not resume with the config from the resume experiment but take the current/own config.
resume_save_types (list or tuple) –
A list which defines which values to restore when resuming. Choices are:
- "model" <– PyTorch models
- "optimizer" <– Optimizers
- "simple" <– Simple python variables (basic types and lists/tuples)
- "th_vars" <– torch tensors/variables
- "results" <– The result dict
resume_reset_epochs (bool) – Set epoch to zero if you resume an existing experiment.
parse_sys_argv (bool) – Parse the console arguments (argv) to get a config path and/or resume path.
parse_config_sys_argv (bool) – Parse argv to update the config (if the keys match).
checkpoint_to_cpu (bool) – When checkpointing, transfer all tensors to the CPU beforehand.
save_checkpoint_every_epoch (int) – Determines after how many epochs a checkpoint is stored.
explogger_kwargs (dict) – Keyword arguments for elog instantiation.
explogger_freq (int) – The frequency x (meaning one in x) with which the clog will call the elog.
loggers (dict) –
Specify additional loggers. Entries should have one of these formats:
```
"name": "identifier" (will default to a frequency of 10)
"name": ("identifier"(, kwargs, frequency)) (last two are optional)
```
"identifier" is one of "telegram", "tensorboard", "visdom", "slack".
append_rnd_to_name (bool) – If True, a random six-digit string is appended to the experiment name.
save_checkpoints_default (bool) – Whether to save the current and the last checkpoint by default.
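The two accepted formats of the loggers argument can be normalized into a single shape, as sketched below. normalize_logger_spec is a hypothetical helper, not part of trixi; applying the default frequency of 10 also to tuples that omit the frequency is an assumption.

```python
DEFAULT_FREQUENCY = 10  # documented default when only an identifier is given
KNOWN_IDENTIFIERS = {"telegram", "tensorboard", "visdom", "slack"}


def normalize_logger_spec(loggers):
    """Hypothetical helper: map each name to (identifier, kwargs, frequency)."""
    normalized = {}
    for name, spec in (loggers or {}).items():
        if isinstance(spec, str):
            spec = (spec,)  # "name": "identifier"
        identifier = spec[0]
        kwargs = spec[1] if len(spec) > 1 else {}
        frequency = spec[2] if len(spec) > 2 else DEFAULT_FREQUENCY  # assumed default
        if identifier not in KNOWN_IDENTIFIERS:
            raise ValueError(f"unknown logger identifier: {identifier!r}")
        normalized[name] = (identifier, kwargs, frequency)
    return normalized


spec = normalize_logger_spec({"v": "visdom", "tb": ("tensorboard", {}, 1)})
# {"v": ("visdom", {}, 10), "tb": ("tensorboard", {}, 1)}
```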

add_result(value, name, counter=None, tag=None, label=None, plot_result=True, plot_running_mean=False)

Saves a result and adds it to the result dict. This is similar to result[name] = value, but in addition the value is logged to the combined logger (and stored in the results log file).

This should be your preferred method for logging numeric values.

Parameters:

value – The value of your variable
name (str) – The name/key of your variable
counter (int or float) – A counter which can be seen as the x-axis of your value. Normally you would just use the current epoch for this.
tag (str) – A label/tag which groups similar values; values with the same tag are plotted in the same plot.
label – Deprecated label.
plot_result (bool) – By default True, will also log all your values to the combined logger (with show_value).
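As described for the result attribute, a ResultLogDict keeps the last N entries per key, which makes running means cheap. The sketch below mimics that together with an add_result()-like method; RunningResults, its `n` parameter and the in-memory `log` list (standing in for the results log file) are hypothetical, and no combined logger is involved.

```python
from collections import defaultdict, deque


class RunningResults:
    """Hypothetical stand-in: keeps the last `n` values per key for running means."""

    def __init__(self, n=5):
        self.history = defaultdict(lambda: deque(maxlen=n))  # oldest values drop out
        self.log = []  # stands in for the results log file

    def add_result(self, value, name, counter=None):
        self.history[name].append(value)
        self.log.append({"name": name, "value": value, "counter": counter})

    def running_mean(self, name):
        values = self.history.get(name)  # .get() does not create a new entry
        if not values:
            raise KeyError(name)
        return sum(values) / len(values)


r = RunningResults(n=3)
for epoch, loss in enumerate([4.0, 3.0, 2.0, 1.0]):
    r.add_result(loss, "loss", counter=epoch)
# only the last three losses (3.0, 2.0, 1.0) enter the running mean
```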

add_result_without_epoch(val, name)

A faster method to store your results: it has less overhead and does not call the combined logger. It only stores the value in the results dictionary.

Parameters:

val – The value you want to add.
name (str) – The name/key of your value.

at_exit_func(): Stores the results and checkpoint at the end (if not already stored). This method is also called if an error occurs.

get_pytorch_modules(from_config=True)

Returns all torch.nn.Modules stored in the experiment in a dict (even child dicts are stored).

Parameters:	from_config (bool) – Also get modules that are stored in the `config` attribute.
Returns:	Dictionary of PyTorch modules
Return type:	dict

get_pytorch_optimizers(from_config=True)

Returns all torch.optim.Optimizers stored in the experiment in a dict.

Parameters:	from_config (bool) – Also get optimizers that are stored in the `config` attribute.
Returns:	Dictionary of PyTorch optimizers
Return type:	dict

get_pytorch_tensors(ignore=())

Returns all torch.Tensor objects in the experiment in a dict.

Parameters:	ignore (list or tuple) – Iterable of names which will be ignored
Returns:	Dictionary of PyTorch tensors
Return type:	dict

get_pytorch_variables(ignore=()): Same as get_pytorch_tensors().

get_result(name)

Similar to result[name], this returns the value stored in the results dict under the given name/key.

Parameters:	name (str) – the name/key for which a value is stored.
Returns:	The value with the key ‘name’ in the results dict.

get_result_without_epoch(name)

Similar to result[name], this returns the value stored in result under the given name/key.

Parameters:	name (str) – The name/key for which a value is stored.
Returns:	The value with the key ‘name’ in the results dict.

get_simple_variables(ignore=())

Returns all standard variables in the experiment in a dict. Specifically, this looks for types int, float, bytes, bool, str, set, list, tuple.

Parameters:	ignore (list or tuple) – Iterable of names which will be ignored
Returns:	Dictionary of variables
Return type:	dict
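The type filter can be sketched as a dict comprehension over an object's instance attributes. This standalone function illustrates the documented behavior and is not trixi's implementation; the Dummy class in the usage is hypothetical.

```python
SIMPLE_TYPES = (int, float, bytes, bool, str, set, list, tuple)


def get_simple_variables(obj, ignore=()):
    """Sketch: collect instance attributes whose values have one of the simple types."""
    return {
        name: value
        for name, value in vars(obj).items()
        if isinstance(value, SIMPLE_TYPES) and name not in ignore
    }


class Dummy:
    def __init__(self):
        self.lr = 0.1
        self.name = "run"
        self.history = [1, 2]
        self.model = object()  # not a simple type, so it is skipped


simple = get_simple_variables(Dummy(), ignore=("name",))
# {"lr": 0.1, "history": [1, 2]}
```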

load_checkpoint(name='checkpoint', save_types=('model', 'optimizer', 'simple', 'th_vars', 'results'), n_iter=None, iter_format='{:05d}', prefix=False, path=None)

Loads a checkpoint and restores the experiment.

Make sure your torch objects are already on the right devices beforehand; otherwise this can lead to errors, e.g. when making an optimizer step (for example because the Adam states are not already on the GPU: https://discuss.pytorch.org/t/loading-a-saved-model-for-continue-training/17244/3 ).

Parameters:

name (str) – The name of the checkpoint file
save_types (list or tuple) – What kind of member variables should be loaded? Choices are: “model” <– Pytorch models, “optimizer” <– Optimizers, “simple” <– Simple python variables (basic types and lists/tuples), “th_vars” <– torch tensors, “results” <– The result dict
n_iter (int) – Number of iterations. Together with the name, defined by the iter_format, a file name will be created and searched for.
iter_format (str) – Defines how the name and the n_iter will be combined.
prefix (bool) – If True, the formatted n_iter will be prepended, otherwise appended.
path (str) – If no path is given, the checkpoint file is looked up in the current experiment dir under the formatted name; otherwise, the given path and the formatted name define the checkpoint file.
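How name, n_iter, iter_format and prefix combine can be sketched as follows. The underscore separator and the absence of a file extension are assumptions for illustration; checkpoint_name is a hypothetical helper, so check trixi's source for the exact file names it produces.

```python
def checkpoint_name(name="checkpoint", n_iter=None, iter_format="{:05d}", prefix=False):
    """Hypothetical helper: combine name and formatted n_iter into a file name stem."""
    if n_iter is None:
        return name
    formatted = iter_format.format(n_iter)  # e.g. 12 -> "00012"
    # The "_" separator is an assumption made for this sketch.
    return f"{formatted}_{name}" if prefix else f"{name}_{formatted}"


checkpoint_name("checkpoint", n_iter=12)               # "checkpoint_00012"
checkpoint_name("checkpoint", n_iter=12, prefix=True)  # "00012_checkpoint"
```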

load_pytorch_models(): Loads all model files from the experiment checkpoint folder.

load_simple_vars(): Restores all simple python member variables from the ‘simple_vars.json’ file in the log folder.

log_simple_vars(): Logs all simple python member variables as a json file in the experiment log folder. The file will be named ‘simple_vars.json’.

prepare_resume(): Tries to resume the experiment by using the defined resume path or PytorchExperiment.

print(*args): Calls ‘print’ on the experiment logger or uses the builtin ‘print’ if the former is not available.

process_err(e)

This method is called if an error occurs during the execution of an experiment. By default, it simply re-raises the exception.

Parameters:	e (Exception) – The exception which was raised during the experiment life cycle

save_checkpoint(name='checkpoint', save_types=('model', 'optimizer', 'simple', 'th_vars', 'results'), n_iter=None, iter_format='{:05d}', prefix=False)

Saves a checkpoint of the current experiment state.

Parameters:

name (str) – The name of the checkpoint file
save_types (list or tuple) – What kind of member variables should be stored? Choices are: “model” <– Pytorch models, “optimizer” <– Optimizers, “simple” <– Simple python variables (basic types and lists/tuples), “th_vars” <– torch tensors, “results” <– The result dict
n_iter (int) – Number of iterations. Together with the name, defined by the iter_format, a file name will be created.
iter_format (str) – Defines how the name and the n_iter will be combined.
prefix (bool) – If True, the formatted n_iter will be prepended, otherwise appended.

save_end_checkpoint(): Saves the current checkpoint as checkpoint_last.

save_pytorch_models(): Saves all torch.nn.Modules as model files in the experiment checkpoint folder.

save_results(name='results.json')

Saves the result dict as a json file in the result dir of the experiment logger.

Parameters:	name (str) – The name of the json file in which the results are written.

save_temp_checkpoint(): Saves the current checkpoint as checkpoint_current.

slog

tblog

tlog

txlog

update_attributes(var_dict, ignore=())

Updates the member attributes with the values given in var_dict.

Parameters:

var_dict (dict) – Dict in which the update values are stored. If a key matches a member attribute name, that member attribute will be updated.
ignore (list or tuple) – Iterable of keys to ignore.
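The update rule can be sketched with hasattr() and setattr(). This standalone function illustrates the documented behavior (only keys matching an existing attribute are applied, ignored keys are skipped) and is not trixi's implementation; the Settings class is hypothetical.

```python
def update_attributes(obj, var_dict, ignore=()):
    """Sketch: overwrite existing attributes of obj with values from var_dict."""
    for key, value in var_dict.items():
        if key in ignore:
            continue
        if hasattr(obj, key):  # only keys that match an existing member attribute
            setattr(obj, key, value)


class Settings:
    def __init__(self):
        self.lr = 0.1
        self.epoch = 3


cfg = Settings()
update_attributes(cfg, {"lr": 0.01, "epoch": 0, "unknown": 1}, ignore=("epoch",))
# cfg.lr is now 0.01, cfg.epoch stays 3, and no "unknown" attribute is created
```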

vlog

trixi.experiment.pytorchexperiment.experimentify(setup_fn='setup', train_fn='train', validate_fn='validate', end_fn='end', test_fn='test', **decoargs)

Experimental decorator which monkey patches your class into a PytorchExperiment. You can then call run() on an instance of your new PytorchExperiment class.

Parameters:

setup_fn – The name of your setup() function
train_fn – The name of your train() function
validate_fn – The name of your validate() function
end_fn – The name of your end() function
test_fn – The name of your test() function

trixi.experiment.pytorchexperiment.get_last_file(dir_, name=None)

Returns the most recently created file in the folder which matches the supplied name.

Parameters:

dir_ – The base directory to start the search in
name – The name pattern to match with the files
Returns:	the path to the most recent file
Return type:	str
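A possible shape for such a search is sketched below. It walks dir_ recursively, treats name as a plain substring (an assumption; the actual matching rule may differ) and picks the newest candidate by os.path.getctime. Note that on Unix ctime is the last metadata change rather than the creation time. This is an illustration, not trixi's implementation.

```python
import os


def get_last_file(dir_, name=None):
    """Sketch: newest file under dir_ whose file name contains `name` (substring match assumed)."""
    candidates = []
    for root, _dirs, files in os.walk(dir_):
        for fname in files:
            if name is None or name in fname:
                candidates.append(os.path.join(root, fname))
    if not candidates:
        return None
    # getctime: creation time on Windows, last metadata change on Unix.
    return max(candidates, key=os.path.getctime)
```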

trixi.experiment.pytorchexperiment.get_vars_from_sys_argv()

Parses the command line args (argv), looks for --config_path and --resume_path and returns them if found.

Returns:	A tuple (config_path, resume_path); an entry is None if it is not found.
Return type:	tuple
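The same lookup can be sketched with argparse.parse_known_args(), which picks out the two options and leaves all other arguments alone. The optional `argv` parameter is a hypothetical addition for testability and is not part of the documented signature.

```python
import argparse
import sys


def get_vars_from_sys_argv(argv=None):
    """Sketch: extract --config_path and --resume_path, ignoring any other arguments."""
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--config_path", default=None)
    parser.add_argument("--resume_path", default=None)
    # parse_known_args() returns the unrecognized arguments instead of failing on them.
    args, _unknown = parser.parse_known_args(sys.argv[1:] if argv is None else argv)
    return args.config_path, args.resume_path


get_vars_from_sys_argv(["--config_path", "cfg.json", "--lr", "0.1"])  # ("cfg.json", None)
```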