AIModels.LocalInformer module

Modified version of the InformerForPrediction class

This module contains routines modified from the original InformerForPrediction class in the Hugging Face Transformers library. The modifications allow the InformerModel class to be used for training and prediction on time series data with weights and deterministic loss functions. The original Informer prediction classes are documented at https://huggingface.co/docs/transformers/main/model_doc/informer.

Main Modifications:

1. The forward method of the InformerForPrediction class has been replaced with a version that supports deterministic loss functions and weights when training on time series data. The new forward method also allows the InformerModel class to be used for training and prediction with weights and deterministic loss functions (a sketch of such a loss follows this list).

2. The generate method of the InformerForPrediction class has been replaced with a version that supports deterministic loss functions and weights.

3. The output_distribution method of the InformerForPrediction class has been replaced with a version compatible with deterministic loss functions and weights.
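
As an illustration of the kind of loss these modifications enable, the sketch below shows one possible weighted deterministic loss. It is a minimal, hypothetical example: the function name, the choice of mean squared error, and the way weights and the observed mask are combined are assumptions made for illustration, not the module's actual implementation.

python:

import torch

def weighted_mse_loss(predictions, targets, weights=None, observed_mask=None):
    # Hypothetical deterministic loss: squared error, optionally weighted and
    # restricted to observed targets (not the module's exact code).
    error = (predictions - targets) ** 2
    if weights is not None:
        error = error * weights              # e.g. per-sample or per-step weights
    if observed_mask is not None:
        error = error * observed_mask        # ignore missing targets
        denom = observed_mask.sum().clamp(min=1)
    else:
        denom = torch.tensor(error.numel(), dtype=error.dtype)
    return error.sum() / denom

A loss of this form can be used in place of the negative log-likelihood that the original probabilistic head minimizes.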

Utilities

class AIModels.LocalInformer.InformerModel(config: InformerConfig)[source]

Bases: InformerPreTrainedModel

get_lagged_subsequences(sequence: Tensor, subsequences_length: int, shift: int = 0) Tensor[source]

Returns lagged subsequences of a given sequence

Parameters:
  • sequence (Tensor) -- The sequence from which lagged subsequences should be extracted. Shape: (N, T, C).

  • subsequences_length (int) -- Length of the subsequences to be extracted.

  • shift (int) -- Shift the lags back by this amount.

Returns:

Returns a tensor of shape (N, S, C, I), where S = subsequences_length and I = len(indices), containing lagged subsequences. Specifically, lagged[i, j, :, k] = sequence[i, -indices[k]-S+j, :].

Return type:

Tensor
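
The slicing implied by the formula above can be illustrated with a small standalone sketch. This is a hypothetical re-implementation for illustration only: the indices argument stands in for the lag indices that the actual method derives from config.lags_sequence, and the real implementation may differ in details.

python:

import torch

def lagged_subsequences_sketch(sequence, subsequences_length, indices, shift=0):
    # sequence: (N, T, C) -> returns (N, S, C, I) with S = subsequences_length, I = len(indices)
    lags = [lag - shift for lag in indices]
    slices = []
    for lag in lags:
        begin = -lag - subsequences_length
        end = -lag if lag > 0 else None
        # lagged[i, j, :, k] = sequence[i, -lags[k] - S + j, :]
        slices.append(sequence[:, begin:end, ...])
    return torch.stack(slices, dim=-1)

# e.g. a sequence of 20 steps, lags (1, 7), subsequence length 5 -> shape (1, 5, 1, 2)
lagged = lagged_subsequences_sketch(torch.arange(20.0).reshape(1, 20, 1), 5, indices=[1, 7])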

create_network_inputs(past_values: Tensor, past_time_features: Tensor, static_categorical_features: Tensor | None = None, static_real_features: Tensor | None = None, past_observed_mask: Tensor | None = None, future_values: Tensor | None = None, future_time_features: Tensor | None = None)[source]
get_encoder()[source]
get_decoder()[source]
forward(past_values: Tensor, past_time_features: Tensor, past_observed_mask: Tensor, static_categorical_features: Tensor | None = None, static_real_features: Tensor | None = None, future_values: Tensor | None = None, future_time_features: Tensor | None = None, decoder_attention_mask: LongTensor | None = None, head_mask: Tensor | None = None, decoder_head_mask: Tensor | None = None, cross_attn_head_mask: Tensor | None = None, encoder_outputs: List[FloatTensor] | None = None, past_key_values: List[FloatTensor] | None = None, output_hidden_states: bool | None = None, output_attentions: bool | None = None, use_cache: bool | None = None, return_dict: bool | None = None) Seq2SeqTSModelOutput | Tuple[source]

Examples:

python:

>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import InformerModel
>>> file = hf_hub_download(
...     repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)
>>> model = InformerModel.from_pretrained("huggingface/informer-tourism-monthly")
>>> # during training, one provides both past and future values
>>> # as well as possible additional features
>>> outputs = model(
...     past_values=batch["past_values"],
...     past_time_features=batch["past_time_features"],
...     past_observed_mask=batch["past_observed_mask"],
...     static_categorical_features=batch["static_categorical_features"],
...     static_real_features=batch["static_real_features"],
...     future_values=batch["future_values"],
...     future_time_features=batch["future_time_features"],
... )
>>> last_hidden_state = outputs.last_hidden_state
class AIModels.LocalInformer.InformerForPrediction(config: InformerConfig)[source]

Bases: InformerPreTrainedModel

output_params(dec_output)[source]
get_encoder()[source]
get_decoder()[source]
output_distribution(params, loc=None, scale=None, trailing_n=None) Distribution[source]
forward(past_values: Tensor, past_time_features: Tensor, past_observed_mask: Tensor, static_categorical_features: Tensor | None = None, static_real_features: Tensor | None = None, future_values: Tensor | None = None, future_time_features: Tensor | None = None, future_observed_mask: Tensor | None = None, decoder_attention_mask: LongTensor | None = None, head_mask: Tensor | None = None, decoder_head_mask: Tensor | None = None, cross_attn_head_mask: Tensor | None = None, encoder_outputs: List[FloatTensor] | None = None, past_key_values: List[FloatTensor] | None = None, output_hidden_states: bool | None = None, output_attentions: bool | None = None, use_cache: bool | None = None, return_dict: bool | None = None) Seq2SeqTSModelOutput | Tuple[source]

Returns:

Seq2SeqTSModelOutput, or a plain tuple when return_dict=False (see the signature above).

Examples:

python:

>>> from huggingface_hub import hf_hub_download
>>> import torch
>>> from transformers import InformerForPrediction
>>> file = hf_hub_download(
...     repo_id="hf-internal-testing/tourism-monthly-batch", filename="train-batch.pt", repo_type="dataset"
... )
>>> batch = torch.load(file)
>>> model = InformerForPrediction.from_pretrained("huggingface/informer-tourism-monthly")
>>> # during training, one provides both past and future values
>>> # as well as possible additional features
>>> outputs = model(
...     past_values=batch["past_values"],
...     past_time_features=batch["past_time_features"],
...     past_observed_mask=batch["past_observed_mask"],
...     static_categorical_features=batch["static_categorical_features"],
...     static_real_features=batch["static_real_features"],
...     future_values=batch["future_values"],
...     future_time_features=batch["future_time_features"],
... )
>>> loss = outputs.loss
>>> loss.backward()
>>> # during inference, one only provides past values
>>> # as well as possible additional features
>>> # the model autoregressively generates future values
>>> outputs = model.generate(
...     past_values=batch["past_values"],
...     past_time_features=batch["past_time_features"],
...     past_observed_mask=batch["past_observed_mask"],
...     static_categorical_features=batch["static_categorical_features"],
...     static_real_features=batch["static_real_features"],
...     future_time_features=batch["future_time_features"],
... )
>>> mean_prediction = outputs.sequences.mean(dim=1)
generate(past_values: Tensor, past_time_features: Tensor, future_time_features: Tensor, past_observed_mask: Tensor | None = None, static_categorical_features: Tensor | None = None, static_real_features: Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None) SampleTSPredictionOutput[source]

Greedily generate sequences of sample predictions from a model with a probability distribution head.

Parameters:
past_values (torch.FloatTensor of shape (batch_size, sequence_length) or (batch_size, sequence_length, input_size)):

Past values of the time series, that serve as context in order to predict the future. The sequence size of this tensor must be larger than the context_length of the model, since the model will use the larger size to construct lag features, i.e. additional values from the past which are added in order to serve as "extra context".

The sequence_length here is equal to config.context_length + max(config.lags_sequence); if no lags_sequence is configured, this is config.context_length + 7 (by default, the largest look-back index in config.lags_sequence is 7). The property _past_length returns the actual length of the past. A worked sketch of this calculation appears at the end of this section, after the Return description.

The past_values is what the Transformer encoder gets as input (with optional additional features, such as static_categorical_features, static_real_features, past_time_features and lags).

Missing values, if any, should be replaced with zeros and indicated via the past_observed_mask.

For multivariate time series, the input_size > 1 dimension is required and corresponds to the number of variates in the time series per time step.

past_time_features (torch.FloatTensor of shape (batch_size, sequence_length, num_features)):

Required time features, which the model internally will add to past_values. These could be things like "month of year", "day of the month", etc. encoded as vectors (for instance as Fourier features). These could also be so-called "age" features, which basically help the model know "at which point in life" a time-series is. Age features have small values for distant past time steps and increase monotonically the more we approach the current time step. Holiday features are also a good example of time features.

These features serve as the "positional encodings" of the inputs. So, in contrast to a model like BERT, where the position encodings are learned from scratch internally as parameters of the model, the Time Series Transformer requires additional time features to be provided. The Time Series Transformer only learns additional embeddings for static_categorical_features.

Additional dynamic real covariates can be concatenated to this tensor, with the caveat that these features must be known at prediction time.

The num_features here is equal to config.num_time_features + config.num_dynamic_real_features.

future_time_features (torch.FloatTensor of shape (batch_size, prediction_length, num_features)):

Required time features for the prediction window, which the model internally will add to sampled predictions. These could be things like "month of year", "day of the month", etc. encoded as vectors (for instance as Fourier features). These could also be so-called "age" features, which basically help the model know "at which point in life" a time-series is. Age features have small values for distant past time steps and increase monotonically the more we approach the current time step. Holiday features are also a good example of time features.

These features serve as the "positional encodings" of the inputs. So, in contrast to a model like BERT, where the position encodings are learned from scratch internally as parameters of the model, the Time Series Transformer requires additional time features to be provided. The Time Series Transformer only learns additional embeddings for static_categorical_features.

Additional dynamic real covariates can be concatenated to this tensor, with the caveat that these features must be known at prediction time.

The num_features here is equal to config.num_time_features + config.num_dynamic_real_features.

past_observed_mask (torch.BoolTensor of shape (batch_size, sequence_length) or (batch_size, sequence_length, input_size), optional):

Boolean mask to indicate which past_values were observed and which were missing. Mask values selected in [0, 1]:

  • 1 for values that are observed,

  • 0 for values that are missing (i.e. NaNs that were replaced by zeros).

static_categorical_features (torch.LongTensor of shape (batch_size, number of static categorical features), optional):

Optional static categorical features for which the model will learn an embedding, which it will add to the values of the time series.

Static categorical features are features which have the same value for all time steps (static over time).

A typical example of a static categorical feature is a time series ID.

static_real_features (torch.FloatTensor of shape (batch_size, number of static real features), optional):

Optional static real features which the model will add to the values of the time series.

Static real features are features which have the same value for all time steps (static over time).

A typical example of a static real feature is promotion information.

output_attentions (bool, optional):

Whether or not to return the attentions tensors of all attention layers.

output_hidden_states (bool, optional):

Whether or not to return the hidden states of all layers.

Return:

SampleTSPredictionOutput, where the output's sequences tensor will have shape (batch_size, number of samples, prediction_length), or (batch_size, number of samples, prediction_length, input_size) for multivariate predictions.
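
To make the required length of past_values and the construction of past_observed_mask described above concrete, the sketch below works through the calculation. The configuration values and the simple "age" time feature are hypothetical and chosen only for illustration.

python:

import torch
from transformers import InformerConfig

config = InformerConfig(
    prediction_length=12,
    context_length=24,
    lags_sequence=[1, 2, 3, 7],  # largest look-back index is 7
)

# past_values must cover context_length + max(lags_sequence) time steps
required_past_length = config.context_length + max(config.lags_sequence)  # 24 + 7 = 31

# build past_values and past_observed_mask from raw data that may contain NaNs
raw = torch.randn(4, required_past_length)    # (batch_size, sequence_length)
raw[0, 3] = float("nan")                      # pretend one value is missing
past_observed_mask = ~torch.isnan(raw)        # 1 where observed, 0 where missing
past_values = torch.nan_to_num(raw, nan=0.0)  # missing values replaced with zeros

# a simple "age" time feature that increases toward the present
age = torch.arange(required_past_length, dtype=torch.float32) / required_past_length
past_time_features = age.expand(4, -1).unsqueeze(-1)  # (batch_size, sequence_length, num_features=1)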