trixi.experiment

Experiment

class trixi.experiment.experiment.Experiment(n_epochs=0)

Bases: object

An abstract Experiment which can be run for a number of epochs.

The basic life cycle of an experiment is:

    setup()
    prepare()
    while epoch < n_epochs:
        train()
        validate()
        epoch += 1
    end()
If you want a stopping criterion other than the number of epochs, e.g. early stopping based on the validation loss, you can implement that in your validation method and just call .stop() at some point to break the loop. Simply set your n_epochs to a high number or np.inf.
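A minimal sketch of this pattern, assuming only the API documented here (setup()/train()/validate(), .stop(), and the run() loop); the toy loss values are illustrative and not part of trixi:

    import numpy as np
    from trixi.experiment.experiment import Experiment

    class EarlyStopExperiment(Experiment):
        def setup(self):
            self.best_loss = np.inf
            self._toy_losses = iter([1.0, 0.5, 0.6])  # stand-in for real validation losses

        def train(self, epoch=None):
            pass  # your training step would go here

        def validate(self, epoch=None):
            val_loss = next(self._toy_losses, 0.0)
            if val_loss >= self.best_loss:
                self.stop()  # breaks the epoch loop shown above
            self.best_loss = min(self.best_loss, val_loss)

    exp = EarlyStopExperiment(n_epochs=1000)  # large n_epochs; stop() ends the run early
    exp.run()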
The reason there is both setup() and prepare() is that internally there is also a _setup_internal() method for hidden magic in classes that inherit from this. For example, the trixi.experiment.pytorchexperiment.PytorchExperiment uses this to restore checkpoints. Think of setup() as an __init__() that is only called when the Experiment is actually asked to do anything. Then use prepare() to modify the fully instantiated Experiment if you need to.

To write a new Experiment, simply inherit from the Experiment class and overwrite the methods. You can then start your Experiment by calling run().
In addition, the Experiment also has a test function. If you call the run_test() method, it will call the test() and end_test() methods internally (and if you pass setup=True to run_test, it will again call setup() and prepare()).

Each Experiment also stores its current state in _exp_state, its start time in _time_start, its end time in _time_end, and the current epoch index in _epoch_idx.
Parameters:
- n_epochs (int) – The number of epochs in the Experiment (i.e. how often the train and validate methods will be called).

- epoch
  Convenience access property for self._epoch_idx.
- process_err(e)
  This method is called if an error occurs during the execution of an experiment. By default it simply re-raises the exception.
  Parameters:
  - e (Exception) – The exception which was raised during the experiment life cycle.
- run(setup=True)
  This method runs the Experiment. It runs through the basic life cycle of an Experiment:

      setup()
      prepare()
      while epoch < n_epochs:
          train()
          validate()
          epoch += 1
      end()
- run_test(setup=True)
  This method runs the test part of the Experiment.
  The test consists of an optional setup and then calls test() and end_test().
  Parameters:
  - setup – If True, it will execute setup() and prepare(), similar to the run method, before calling test().
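A short usage sketch of the two entry points, reusing the EarlyStopExperiment sketch from above (it assumes test()/end_test() are overridden or safe to call as-is):

    exp = EarlyStopExperiment(n_epochs=10)
    exp.run()                  # setup/prepare, the train/validate loop, then end
    exp.run_test(setup=False)  # calls test()/end_test() on the already set-up experiment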
- setup()
  Is called at the beginning of each Experiment run to set up the basic components needed for a run.
PytorchExperiment

class trixi.experiment.pytorchexperiment.PytorchExperiment(config=None, name=None, n_epochs=None, seed=None, base_dir=None, globs=None, resume=None, ignore_resume_config=False, resume_save_types=('model', 'optimizer', 'simple', 'th_vars', 'results'), resume_reset_epochs=True, parse_sys_argv=False, checkpoint_to_cpu=True, save_checkpoint_every_epoch=1, explogger_kwargs=None, explogger_freq=1, loggers=None, append_rnd_to_name=False, default_save_types=('model', 'optimizer', 'simple', 'th_vars', 'results'), save_checkpoints_default=True)

Bases: trixi.experiment.experiment.Experiment
A PytorchExperiment extends the basic functionality of the Experiment class with convenience features for PyTorch (and general logging), such as creating a folder structure, saving, plotting results, and checkpointing your experiment.

The basic life cycle of a PytorchExperiment is the same as that of an Experiment:

    setup()
    prepare()
    while epoch < n_epochs:
        train()
        validate()
        epoch += 1
    end()

The distinction between the first two is that between them the PytorchExperiment will automatically restore checkpoints and save the _config_raw in _setup_internal(). Please see below for more information on this.

To get your own experiment, simply inherit from PytorchExperiment and overwrite the setup(), prepare(), train(), and validate() methods (or use the very experimental decorator experimentify() to convert your class into an experiment). Then you can run your own experiment by calling the run() method.

Internally, PytorchExperiment provides a number of member variables which you can access:
- n_epochs
  Number of epochs.

- exp_name
  Name of your experiment.

- config
  The (initialized) Config of your experiment. You can access the uninitialized one via _config_raw.

- result
  A dict in which you can store your result values. If a PytorchExperimentLogger is used, results will be a ResultLogDict that automatically writes directly to a file and also stores the N last entries for each key for quick access (e.g. to quickly get the running mean).

- elog (if base_dir is given)
  A PytorchExperimentLogger instance which can log your results to a given folder. Will automatically be created if a base_dir is available.

- loggers
  Contains all loggers you provide, including the experiment logger, accessible by the names you provide.

- clog
  A CombinedLogger instance which logs to all loggers with different frequencies (specified by the last entry in the tuple you provide for each logger, where 1 means every time and N means every Nth time, e.g. if you only want to send things to Visdom every 10th time).
The most important attribute is certainly config, which is the initialized Config for the experiment. To understand how it needs to be structured to allow for automatic instantiation of types, please refer to its documentation. If you decide not to use this functionality, config and _config_raw are identical. Beware, however, that by default the PytorchExperiment only saves the raw config after setup(). If you modify config during setup, make sure to implement _setup_internal() yourself should you want the modified config to be saved:

    def _setup_internal(self):
        super(YourExperiment, self)._setup_internal()  # calls .prepare_resume()
        self.elog.save_config(self.config, "config")
Parameters:
- config (dict or Config) – Configures your experiment. If name, n_epochs, seed, or base_dir are given in the config, it will automatically overwrite the other args/kwargs with the values from the config. In addition (controlled by parse_config_sys_argv), the config automatically parses the argv arguments and updates its values if a key matches a console argument.

- name (str) – The name of the PytorchExperiment.

- n_epochs (int) – The number of epochs (the number of times the training cycle will be executed).

- seed (int) – A random seed (which will set the random, numpy, and torch seeds).

- base_dir (str) – A base directory in which the experiment result folder will be created. A PytorchExperimentLogger instance will be created if this is given.

- globs – The globals() of the script which is run. This is necessary to get and save the executed files in the experiment folder.

- resume (str or PytorchExperiment) – Another PytorchExperiment, or the path to the result dir of another PytorchExperiment, from which it will load the PyTorch modules and other member variables and resume the experiment.

- ignore_resume_config (bool) – If True, it will not resume with the config from the resumed experiment but keep the current/own config.

- resume_save_types (list or tuple) – A list which defines which values to restore when resuming. Choices are:
  - "model" <– PyTorch models
  - "optimizer" <– Optimizers
  - "simple" <– Simple Python variables (basic types and lists/tuples)
  - "th_vars" <– Torch tensors/variables
  - "results" <– The result dict

- resume_reset_epochs (bool) – Set the epoch to zero if you resume an existing experiment.

- parse_sys_argv (bool) – Parse the console arguments (argv) to get a config path and/or resume_path.

- parse_config_sys_argv (bool) – Parse argv to update the config (if the keys match).

- checkpoint_to_cpu (bool) – When checkpointing, transfer all tensors to the CPU beforehand.

- save_checkpoint_every_epoch (int) – Determines after how many epochs a checkpoint is stored.

- explogger_kwargs (dict) – Keyword arguments for the elog instantiation.

- explogger_freq (int) – The frequency x (meaning one in x) with which the clog will call the elog.

- loggers (dict) – Specify additional loggers. Entries should have one of these formats (see the construction sketch after this list):

      "name": "identifier"                         (will default to a frequency of 10)
      "name": ("identifier"(, kwargs, frequency))  (the last two are optional)

  "identifier" is one of "telegram", "tensorboard", "visdom", "slack".

- append_rnd_to_name (bool) – If True, will append a random six-digit string to the experiment name.

- save_checkpoints_default (bool) – Whether to save the current and the last checkpoint by default.
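A hedged construction sketch under these parameters. It assumes the documented API plus standard PyTorch; the network, the "lr" config key, and the Visdom frequency are illustrative choices, not trixi defaults:

    import torch
    from trixi.experiment.pytorchexperiment import PytorchExperiment

    class ToyExperiment(PytorchExperiment):
        def setup(self):
            # Members set here are candidates for checkpointing
            # ("model", "optimizer", "simple", ...) as described above.
            self.model = torch.nn.Linear(784, 10)
            self.optimizer = torch.optim.Adam(self.model.parameters(),
                                              lr=self.config["lr"])

        def train(self, epoch):
            pass  # your training step would go here

        def validate(self, epoch):
            pass  # your validation step would go here

    exp = ToyExperiment(
        config={"lr": 1e-3},
        name="toy_experiment",
        n_epochs=5,
        seed=42,
        base_dir="./experiment_dir",          # enables elog
        loggers={"vis": ("visdom", {}, 10)},  # clog sends to Visdom every 10th call
    )
    exp.run()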
- add_result(value, name, counter=None, tag=None, label=None, plot_result=True, plot_running_mean=False)
  Saves a result and adds it to the result dict. This is similar to results[key] = val, but in addition it also logs the value to the combined logger (and stores it in the results log file).
  This should be your preferred method for logging numeric values.
  Parameters:
  - value – The value of your variable.
  - name (str) – The name/key of your variable.
  - counter (int or float) – A counter which can be seen as the x-axis of your value. Normally you would just use the current epoch for this.
  - tag (str) – A label/tag which can group similar values; values with the same tag will be plotted in the same plot.
  - label – Deprecated.
  - plot_result (bool) – By default True; will also log all your values to the combined logger (with show_value).
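For example, inside the validate() of a PytorchExperiment subclass, logging might look like this (a sketch; the loss value is illustrative):

    def validate(self, epoch):
        val_loss = 0.123  # stand-in for a real validation loss
        # counter acts as the x-axis; tag groups curves into one plot
        self.add_result(value=val_loss, name="val_loss",
                        counter=epoch, tag="loss")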
- add_result_without_epoch(val, name)
  A faster method to store your results; it has less overhead and does not call the combined logger. It will only store to the results dictionary.
  Parameters:
  - val – The value you want to add.
  - name (str) – The name/key of your value.
- at_exit_func()
  Stores the results and the checkpoint at the end (if not already stored). This method is also called if an error occurs.
- get_pytorch_modules(from_config=True)
  Returns all torch.nn.Modules stored in the experiment in a dict (modules nested in child dicts are included as well).
  Parameters:
  - from_config (bool) – Also get modules that are stored in the config attribute.
  Returns: Dictionary of PyTorch modules
  Return type: dict
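A small usage sketch (exp is assumed to be a set-up PytorchExperiment instance; printing parameter counts is just a convenient example):

    modules = exp.get_pytorch_modules()
    for mod_name, module in modules.items():
        n_params = sum(p.numel() for p in module.parameters())
        print(mod_name, n_params)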
- get_pytorch_optimizers(from_config=True)
  Returns all torch.optim.Optimizers stored in the experiment in a dict.
  Parameters:
  - from_config (bool) – Also get optimizers that are stored in the config attribute.
  Returns: Dictionary of PyTorch optimizers
  Return type: dict
- get_pytorch_tensors(ignore=())
  Returns all torch.Tensors in the experiment in a dict.
  Parameters:
  - ignore (list or tuple) – Iterable of names which will be ignored.
  Returns: Dictionary of PyTorch tensors
  Return type: dict
- get_pytorch_variables(ignore=())
  Same as get_pytorch_tensors().
- get_result(name)
  Similar to result[key], this will return the value stored in the results dictionary under the given name/key.
  Parameters:
  - name (str) – The name/key for which a value is stored.
  Returns: The value with the key 'name' in the results dict.
- get_result_without_epoch(name)
  Similar to result[key], this will return the value stored in the results dict under the given name/key.
  Parameters:
  - name (str) – The name/key for which a value is stored.
  Returns: The value with the key 'name' in the results dict.
- get_simple_variables(ignore=())
  Returns all standard variables in the experiment in a dict. Specifically, this looks for the types int, float, bytes, bool, str, set, list, and tuple.
  Parameters:
  - ignore (list or tuple) – Iterable of names which will be ignored.
  Returns: Dictionary of variables
  Return type: dict
- load_checkpoint(name='checkpoint', save_types=('model', 'optimizer', 'simple', 'th_vars', 'results'), n_iter=None, iter_format='{:05d}', prefix=False, path=None)
  Loads a checkpoint and restores the experiment.
  Make sure your torch objects are already on the right devices beforehand; otherwise loading can lead to errors, e.g. when making an optimizer step while, for some reason, the Adam states are not yet on the GPU (see https://discuss.pytorch.org/t/loading-a-saved-model-for-continue-training/17244/3).
  Parameters:
  - name (str) – The name of the checkpoint file.
  - save_types (list or tuple) – Which kinds of member variables should be loaded? Choices are: "model" <– PyTorch models, "optimizer" <– Optimizers, "simple" <– Simple Python variables (basic types and lists/tuples), "th_vars" <– Torch tensors, "results" <– The result dict.
  - n_iter (int) – Number of iterations. Together with the name, as defined by iter_format, a file name will be created and searched for.
  - iter_format (str) – Defines how the name and n_iter will be combined.
  - prefix (bool) – If True, the formatted n_iter will be prepended to the name, otherwise appended.
  - path (str) – If no path is given, the current experiment dir and the formatted name will be used; otherwise the given path and the formatted name define the checkpoint file.
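To illustrate how these parameters interact (the exact separator between name and iteration number is an assumption, not confirmed by this page):

    # With iter_format="{:05d}" and prefix=False, n_iter=10 is formatted
    # as "00010" and appended, yielding something like "checkpoint_00010";
    # with prefix=True it would be prepended instead.
    exp.load_checkpoint(name="checkpoint", n_iter=10)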
- load_simple_vars()
  Restores all simple Python member variables from the 'simple_vars.json' file in the log folder.
- log_simple_vars()
  Logs all simple Python member variables as a json file in the experiment log folder. The file will be named 'simple_vars.json'.
- prepare_resume()
  Tries to resume the experiment by using the defined resume path or PytorchExperiment.
- print(*args)
  Calls 'print' on the experiment logger, or uses the builtin 'print' if the former is not available.
- process_err(e)
  This method is called if an error occurs during the execution of an experiment. By default it simply re-raises the exception.
  Parameters:
  - e (Exception) – The exception which was raised during the experiment life cycle.
- save_checkpoint(name='checkpoint', save_types=('model', 'optimizer', 'simple', 'th_vars', 'results'), n_iter=None, iter_format='{:05d}', prefix=False)
  Saves a checkpoint of the current state of the experiment.
  Parameters:
  - name (str) – The name of the checkpoint file.
  - save_types (list or tuple) – Which kinds of member variables should be stored? Choices are: "model" <– PyTorch models, "optimizer" <– Optimizers, "simple" <– Simple Python variables (basic types and lists/tuples), "th_vars" <– Torch tensors, "results" <– The result dict.
  - n_iter (int) – Number of iterations. Together with the name, as defined by iter_format, a file name will be created.
  - iter_format (str) – Defines how the name and n_iter will be combined.
  - prefix (bool) – If True, the formatted n_iter will be prepended to the name, otherwise appended.
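For instance, to keep an extra model-only snapshot at a given epoch (a sketch, assuming the documented parameters, called from inside a PytorchExperiment subclass):

    # inside train()/validate():
    self.save_checkpoint(name="checkpoint", save_types=("model",), n_iter=epoch)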
- save_pytorch_models()
  Saves all torch.nn.Modules as model files in the experiment checkpoint folder.
- save_results(name='results.json')
  Saves the result dict as a json file in the result dir of the experiment logger.
  Parameters:
  - name (str) – The name of the json file in which the results are written.
- slog
  Convenience access property for the Slack logger (cf. the loggers parameter above).

- tblog
  Convenience access property for the Tensorboard logger (cf. the loggers parameter above).

- tlog
  Convenience access property for the Telegram logger (cf. the loggers parameter above).

- txlog
  Convenience access property for the text file logger.
- update_attributes(var_dict, ignore=())
  Updates the member attributes with the attributes given in the var_dict.
  Parameters:
  - var_dict (dict) – Dict of attribute names (keys) and the values to set.
  - ignore (list or tuple) – Iterable of attribute names which will be ignored.
- vlog
  Convenience access property for the Visdom logger (cf. the loggers parameter above).
trixi.experiment.pytorchexperiment.experimentify(setup_fn='setup', train_fn='train', validate_fn='validate', end_fn='end', test_fn='test', **decoargs)

Experimental decorator which monkey-patches your class into a PytorchExperiment. You can then call run() on your new PytorchExperiment class.

Parameters:
- setup_fn – The name of your setup() function.
- train_fn – The name of your train() function.
- validate_fn – The name of your validate() function.
- end_fn – The name of your end() function.
- test_fn – The name of your test() function.
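A hedged usage sketch (the decorator is labelled experimental above, and the exact call form and constructor arguments of the patched class are assumptions based on the signature):

    from trixi.experiment.pytorchexperiment import experimentify

    @experimentify(setup_fn="initialize", train_fn="fit")
    class Trainer(object):
        def initialize(self):
            pass  # set up models, data, ... here

        def fit(self, epoch):
            pass  # one training cycle

        def validate(self, epoch):
            pass  # uses the default validate_fn name

    Trainer(n_epochs=3).run()  # the patched class behaves like a PytorchExperiment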
trixi.experiment.pytorchexperiment.get_last_file(dir_, name=None)

Returns the most recently created file in the folder which matches the supplied name pattern.

Parameters:
- dir_ – The base directory to start the search in.
- name – The name pattern to match against the files.
Returns: The path to the most recent file
Return type: str
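For example (a sketch; the directory layout and the exact pattern semantics are assumptions):

    import os
    from trixi.experiment.pytorchexperiment import get_last_file

    last_ckpt = get_last_file("./experiment_dir/my_exp/checkpoint", name="checkpoint*")
    print("most recent checkpoint file:", os.path.basename(last_ckpt))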