AIModels.ClimFormer module

ClimFormer class, a subclass of InformerForPrediction.

This module contains dataset classes for time series and future time series, together with model classes for training and prediction built on the transformers library: subclasses of InformerForPrediction, TimeSeriesTransformerForPrediction, and InformerModel.

Utilities

class AIModels.ClimFormer.TimeSeriesDataset(datasrc, datatgt, TIN, MIN, T, K, time_features=None)[source]

Bases: Dataset

Dataset class for time series. Includes time features for transformers.

Parameters:
  • datasrc (numpy array) -- Source data

  • datatgt (numpy array) -- Target data

  • TIN (int) -- Input time steps

  • MIN (int) -- Input variables size

  • T (int) -- Prediction time steps

  • K (int) -- Output variables size

  • time_features (numpy array (optional)) -- If not None, contains the time features

Variables:
  • datasrc (numpy array) -- Source data

  • datatgt (numpy array) -- Target data

  • time_features (numpy array) -- Time features

  • TIN (int) -- Input time steps

  • MIN (int) -- Input variables

  • T (int) -- Output time steps

  • K (int) -- Output variables
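
A minimal usage sketch (array shapes and hyper-parameter values are hypothetical; only the constructor signature above comes from the module):

```python
import numpy as np
from torch.utils.data import DataLoader
from AIModels.ClimFormer import TimeSeriesDataset

# Hypothetical data: 1000 time steps, 10 source variables, 5 target variables
datasrc = np.random.randn(1000, 10).astype("float32")
datatgt = np.random.randn(1000, 5).astype("float32")

# 24 input steps, 12 prediction steps; time features omitted (default None)
dataset = TimeSeriesDataset(datasrc, datatgt, TIN=24, MIN=10, T=12, K=5)
loader = DataLoader(dataset, batch_size=32, shuffle=True)
```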

class AIModels.ClimFormer.TimeSeriesFuture(datasrc, datatgt, TIN, MIN, T, K, Tpredict, time_features=None)[source]

Bases: Dataset

Dataset class for time series. Includes future time features for prediction with Informer.

Parameters:
  • datasrc (numpy array) -- Source data

  • datatgt (numpy array) -- Target data

  • TIN (int) -- Input time steps

  • MIN (int) -- Input variables size

  • T (int) -- Prediction time steps

  • K (int) -- Output variables size

  • time_features (numpy array (optional)) -- If not None, contains the time features

  • shift -- Overlap between source and target: 0 for transformers; for LSTM it should be TIN - T

Variables:
  • datasrc (numpy array) -- Source data

  • datatgt (numpy array) -- Target data

  • time_features (numpy array) -- Time features

  • TIN (int) -- Input time steps

  • MIN (int) -- Input variables

  • T (int) -- Output time steps

  • K (int) -- Output variables
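
A sketch of building the dataset with future time features (the shapes, the feature encoding, and the choice of Tpredict are illustrative assumptions):

```python
import numpy as np
from AIModels.ClimFormer import TimeSeriesFuture

datasrc = np.random.randn(1000, 10).astype("float32")
datatgt = np.random.randn(1000, 5).astype("float32")
# Assumed encoding: two Fourier time features per time step
time_features = np.random.randn(1000, 2).astype("float32")

dataset = TimeSeriesFuture(datasrc, datatgt, TIN=24, MIN=10, T=12, K=5,
                           Tpredict=12, time_features=time_features)
```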

class AIModels.ClimFormer.ClimFormer(config)[source]

Bases: InformerForPrediction

Class for training and prediction with the InformerForPrediction model from the transformers library

class AIModels.ClimFormer.TrasFormer(*args)[source]

Bases: TimeSeriesTransformerForPrediction

Class for training and prediction with the TimeSeriesTransformerForPrediction model from the transformers library
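
Both wrappers are configured like their transformers base classes. A minimal sketch (the configuration values are placeholders, and the mapping to the dataset sizes TIN, MIN, T is an assumption):

```python
from transformers import InformerConfig, TimeSeriesTransformerConfig
from AIModels.ClimFormer import ClimFormer, TrasFormer

config = InformerConfig(
    prediction_length=12,  # assumed to match T
    context_length=24,     # assumed to match TIN
    input_size=10,         # assumed to match MIN
    num_time_features=2,
)
model = ClimFormer(config)

# TrasFormer is built analogously on TimeSeriesTransformerForPrediction
ts_config = TimeSeriesTransformerConfig(
    prediction_length=12, context_length=24, input_size=10, num_time_features=2
)
ts_model = TrasFormer(ts_config)
```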

class AIModels.ClimFormer.ClimFormerDeter(config)[source]

Bases: InformerModel

Class for training and prediction with the InformerModel model, deterministic version. It projects the model's last hidden state onto output_size with a fully connected layer.
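
A minimal sketch of the deterministic head described above (an illustration of the idea, not the actual ClimFormerDeter code):

```python
import torch
import torch.nn as nn

class DeterministicHead(nn.Module):  # hypothetical name, for illustration only
    """Project the decoder's last hidden state onto output_size."""

    def __init__(self, d_model: int, output_size: int):
        super().__init__()
        self.proj = nn.Linear(d_model, output_size)

    def forward(self, last_hidden_state: torch.Tensor) -> torch.Tensor:
        # (batch, prediction_length, d_model) -> (batch, prediction_length, output_size)
        return self.proj(last_hidden_state)
```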

forward(past_values: Tensor, past_time_features: Tensor, past_observed_mask: Tensor, static_categorical_features: Tensor | None = None, static_real_features: Tensor | None = None, future_values: Tensor | None = None, future_time_features: Tensor | None = None, decoder_attention_mask: LongTensor | None = None, head_mask: Tensor | None = None, decoder_head_mask: Tensor | None = None, cross_attn_head_mask: Tensor | None = None, encoder_outputs: List[FloatTensor] | None = None, past_key_values: List[FloatTensor] | None = None, output_hidden_states: bool | None = None, output_attentions: bool | None = None, use_cache: bool | None = None, return_dict: bool | None = None)[source]

The [InformerModel] forward method overrides the __call__ special method.

Tip: Although the recipe for the forward pass needs to be defined within this function, one should call the [Module] instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Args:
past_values (torch.FloatTensor of shape (batch_size, sequence_length) or (batch_size, sequence_length, input_size)):

Past values of the time series, that serve as context in order to predict the future. The sequence size of this tensor must be larger than the context_length of the model, since the model will use the larger size to construct lag features, i.e. additional values from the past which are added in order to serve as "extra context".

The sequence_length here is equal to config.context_length + max(config.lags_sequence), which, if no lags_sequence is configured, is equal to config.context_length + 7 (since by default the largest look-back index in config.lags_sequence is 7). The property _past_length returns the actual length of the past.

The past_values is what the Transformer encoder gets as input (with optional additional features, such as static_categorical_features, static_real_features, past_time_features and lags).

Missing values can optionally be replaced with zeros, in which case they must be indicated via the past_observed_mask.

For multivariate time series, the input_size > 1 dimension is required and corresponds to the number of variates in the time series per time step.
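
As a quick sanity check, the required past length can be computed from the configuration; a sketch assuming a standard InformerConfig:

```python
from transformers import InformerConfig

config = InformerConfig(context_length=24, lags_sequence=[1, 2, 3, 4, 5, 6, 7])
past_length = config.context_length + max(config.lags_sequence)
print(past_length)  # 31: past_values must cover context_length + the largest lag
```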

past_time_features (torch.FloatTensor of shape (batch_size, sequence_length, num_features)):

Required time features, which the model internally will add to past_values. These could be things like "month of year", "day of the month", etc. encoded as vectors (for instance as Fourier features). These could also be so-called "age" features, which basically help the model know "at which point in life" a time-series is. Age features have small values for distant past time steps and increase monotonically the more we approach the current time step. Holiday features are also a good example of time features.

These features serve as the "positional encodings" of the inputs. So contrary to a model like BERT, where the position encodings are learned from scratch internally as parameters of the model, the Time Series Transformer requires to provide additional time features. The Time Series Transformer only learns additional embeddings for static_categorical_features.

Additional dynamic real covariates can be concatenated to this tensor, with the caveat that these features must be known at prediction time.

The num_features here is equal to config.num_time_features + config.num_dynamic_real_features.
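
For example, a "month of year" signal could be encoded as Fourier features along these lines (a sketch; the appropriate features depend on the data frequency):

```python
import numpy as np

steps = np.arange(1000)  # hypothetical monthly time axis
month = steps % 12
time_features = np.stack(
    [np.sin(2 * np.pi * month / 12),  # Fourier encoding of "month of year"
     np.cos(2 * np.pi * month / 12)],
    axis=-1,
).astype("float32")  # shape (1000, 2), i.e. num_time_features = 2
```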

past_observed_mask (torch.BoolTensor of shape (batch_size, sequence_length) or (batch_size, sequence_length, input_size), optional):

Boolean mask to indicate which past_values were observed and which were missing. Mask values selected in [0, 1]:

  • 1 for values that are observed,

  • 0 for values that are missing (i.e. NaNs that were replaced by zeros).
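
A sketch of deriving the mask from data containing NaNs (tensor shapes are hypothetical):

```python
import torch

values = torch.randn(4, 31, 10)            # (batch, sequence_length, input_size)
values[0, :5] = float("nan")               # pretend the first steps are missing
past_observed_mask = ~torch.isnan(values)  # 1 for observed, 0 for missing
past_values = torch.nan_to_num(values)     # NaNs replaced by zeros, as required
```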

static_categorical_features (torch.LongTensor of shape (batch_size, number of static categorical features), optional):

Optional static categorical features for which the model will learn an embedding, which it will add to the values of the time series.

Static categorical features are features which have the same value for all time steps (static over time).

A typical example of a static categorical feature is a time series ID.

static_real_features (torch.FloatTensor of shape (batch_size, number of static real features), optional):

Optional static real features which the model will add to the values of the time series.

Static real features are features which have the same value for all time steps (static over time).

A typical example of a static real feature is promotion information.

future_values (torch.FloatTensor of shape (batch_size, prediction_length) or (batch_size, prediction_length, input_size), optional):

Future values of the time series, that serve as labels for the model. The future_values is what the Transformer needs during training to learn to output, given the past_values.

The sequence length here is equal to prediction_length.

See the demo notebook and code snippets for details.

During training, missing values can optionally be replaced with zeros, in which case they must be indicated via the future_observed_mask.

For multivariate time series, the input_size > 1 dimension is required and corresponds to the number of variates in the time series per time step.

future_time_features (torch.FloatTensor of shape (batch_size, prediction_length, num_features)):

Required time features for the prediction window, which the model internally will add to future_values. These could be things like "month of year", "day of the month", etc. encoded as vectors (for instance as Fourier features). These could also be so-called "age" features, which basically help the model know "at which point in life" a time-series is. Age features have small values for distant past time steps and increase monotonically the more we approach the current time step. Holiday features are also a good example of time features.

These features serve as the "positional encodings" of the inputs. So contrary to a model like BERT, where the position encodings are learned from scratch internally as parameters of the model, the Time Series Transformer requires to provide additional time features. The Time Series Transformer only learns additional embeddings for static_categorical_features.

Additional dynamic real covariates can be concatenated to this tensor, with the caveat that these features must be known at prediction time.

The num_features here is equal to config.num_time_features + config.num_dynamic_real_features.

future_observed_mask (torch.BoolTensor of shape (batch_size, sequence_length) or (batch_size, sequence_length, input_size), optional):

Boolean mask to indicate which future_values were observed and which were missing. Mask values selected in [0, 1]:

  • 1 for values that are observed,

  • 0 for values that are missing (i.e. NaNs that were replaced by zeros).

This mask is used to filter out missing values for the final loss calculation.

attention_mask (torch.Tensor of shape (batch_size, sequence_length), optional):

Mask to avoid performing attention on certain token indices. Mask values selected in [0, 1]:

  • 1 for tokens that are not masked,

  • 0 for tokens that are masked.

[What are attention masks?](../glossary#attention-mask)

decoder_attention_mask (torch.LongTensor of shape (batch_size, target_sequence_length), optional):

Mask to avoid performing attention on certain token indices. By default, a causal mask will be used, to make sure the model can only look at previous inputs in order to predict the future.

head_mask (torch.Tensor of shape (encoder_layers, encoder_attention_heads), optional):

Mask to nullify selected heads of the attention modules in the encoder. Mask values selected in [0, 1]:

  • 1 indicates the head is not masked,

  • 0 indicates the head is masked.

decoder_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional):

Mask to nullify selected heads of the attention modules in the decoder. Mask values selected in [0, 1]:

  • 1 indicates the head is not masked,

  • 0 indicates the head is masked.

cross_attn_head_mask (torch.Tensor of shape (decoder_layers, decoder_attention_heads), optional):

Mask to nullify selected heads of the cross-attention modules. Mask values selected in [0, 1]:

  • 1 indicates the head is not masked,

  • 0 indicates the head is masked.

encoder_outputs (tuple(torch.FloatTensor), optional):

Tuple consists of last_hidden_state, hidden_states (optional) and attentions (optional). last_hidden_state, of shape (batch_size, sequence_length, hidden_size), is a sequence of hidden-states at the output of the last layer of the encoder, used in the cross-attention of the decoder.

past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True):

Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

If past_key_values are used, the user can optionally input only the last decoder_input_ids (those that don't have their past key value states given to this model) of shape (batch_size, 1) instead of all decoder_input_ids of shape (batch_size, sequence_length).

inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional):

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model's internal embedding lookup matrix.

use_cache (bool, optional):

If set to True, past_key_values key value states are returned and can be used to speed up decoding (see past_key_values).

output_attentions (bool, optional):

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

output_hidden_states (bool, optional):

Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

return_dict (bool, optional):

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

Returns:

[transformers.modeling_outputs.Seq2SeqTSModelOutput] or tuple(torch.FloatTensor): A [transformers.modeling_outputs.Seq2SeqTSModelOutput] or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration ([InformerConfig]) and inputs.

  • last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size)) -- Sequence of hidden-states at the output of the last layer of the decoder of the model.

    If past_key_values is used only the last hidden-state of the sequences of shape (batch_size, 1, hidden_size) is output.

  • past_key_values (tuple(tuple(torch.FloatTensor)), optional, returned when use_cache=True is passed or when config.use_cache=True) -- Tuple of tuple(torch.FloatTensor) of length config.n_layers, with each tuple having 2 tensors of shape (batch_size, num_heads, sequence_length, embed_size_per_head) and 2 additional tensors of shape (batch_size, num_heads, encoder_sequence_length, embed_size_per_head).

    Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention blocks) that can be used (see past_key_values input) to speed up sequential decoding.

  • decoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the decoder at the output of each layer plus the optional initial embedding outputs.

  • decoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

  • cross_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the weighted average in the cross-attention heads.

  • encoder_last_hidden_state (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional) -- Sequence of hidden-states at the output of the last layer of the encoder of the model.

  • encoder_hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) -- Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the encoder at the output of each layer plus the optional initial embedding outputs.

  • encoder_attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) -- Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the self-attention heads.

  • loc (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) -- Shift values of each time series' context window, which are used to give the model inputs of the same magnitude and then to shift back to the original magnitude.

  • scale (torch.FloatTensor of shape (batch_size,) or (batch_size, input_size), optional) -- Scaling values of each time series' context window, which are used to give the model inputs of the same magnitude and then to rescale back to the original magnitude.

  • static_features (torch.FloatTensor of shape (batch_size, feature size), optional) -- Static features of each time series in a batch, which are copied to the covariates at inference time.

Examples:

```python
>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import InformerModel

>>> file = hf_hub_download(
...     repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)
>>> model = InformerModel.from_pretrained("huggingface/informer-tourism-monthly")
>>> # during training, one provides both past and future values
>>> # as well as possible additional features
>>> outputs = model(
...     past_values=batch["past_values"],
...     past_time_features=batch["past_time_features"],
...     past_observed_mask=batch["past_observed_mask"],
...     static_categorical_features=batch["static_categorical_features"],
...     static_real_features=batch["static_real_features"],
...     future_values=batch["future_values"],
...     future_time_features=batch["future_time_features"],
... )
>>> last_hidden_state = outputs.last_hidden_state
```