AIModels.AIutil module¶
Utility Routines for ML Prediction Models¶
This module contains a set of routines for preparing data for insertion into the transformer models used for prediction of multivariate time series data. The classes are contained in the companion file AIClasses.py and are imported in this module.
Utilities¶
- AIModels.AIutil.copy_dict(data, strip_values=False, remove_keys=[])[source]¶
Copy dictionary
- Parameters:
data (dict) -- Dictionary to be copied
strip_values (boolean) -- If True, strip values
remove_keys (list) -- List of keys to be removed
- Returns:
out -- Copied dictionary without the keys in remove_keys
- Return type:
dict
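A minimal usage sketch, assuming the behavior documented above; the configuration dictionary and its keys are hypothetical:

```python
from AIModels.AIutil import copy_dict

# Hypothetical configuration dict; 'scaler' holds a non-serializable object
cfg = {'var': 'SST', 'level': 'surface', 'scaler': object()}

# Copy without the 'scaler' entry, e.g. before writing the config to disk
clean = copy_dict(cfg, remove_keys=['scaler'])
print(sorted(clean))  # ['level', 'var']
```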
- AIModels.AIutil.get_arealat_arealon(area)[source]¶
Obtain lat lon extent for various choices of areas
- Parameters:
area (string) -- Area to be analyzed, possible values are
'TROPIC': Tropics (-35,35), [0,360]
'GLOBAL': Global (-60,60), [120,290]
'PACTROPIC': Pacific Tropics (-25,25), [180,290]
'WORLD': World (-70,70), [0,360]
'EUROPE': Europe (30,70), [-15,50]
'NORTH_AMERICA': North America (25,70), [200,310]
'NH-ML': Northern Hemisphere Mid-Latitudes (20,90), [0,360]
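A usage sketch, assuming the function returns the (arealat, arealon) pair listed above for each area name:

```python
from AIModels.AIutil import get_arealat_arealon

arealat, arealon = get_arealat_arealon('EUROPE')
# Expected, per the list above: arealat -> (30, 70), arealon -> (-15, 50)
```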
- AIModels.AIutil.select_field(INX, outfield, verbose=False)[source]¶
Select field for outfield in dict
- Parameters:
INX (dict) -- Dictionary with the fields to be analyzed
outfield (string) -- Field to be analyzed
verbose (boolean) -- If True, print the field information
- Returns:
out (numpy array) -- Field data
arealat (numpy array) -- Latitude limits of the field
arealon (numpy array) -- Longitude limits of the field
centlon (numpy array) -- Central longitude of the field
- AIModels.AIutil.select_field_key(INX, outfield, dataname)[source]¶
Select field for outfield in dict according to dataname
- Parameters:
INX (dict) -- Dictionary with the fields to be analyzed
outfield (string) -- Field to be analyzed
dataname (string) -- Key to be analyzed
- Returns:
out -- Field data
- Return type:
numpy array
- AIModels.AIutil.select_field_eof(INX, outfield)[source]¶
Select EOF-related fields for outfield in dict
- Parameters:
INX (dict) -- Dictionary with the fields to be analyzed
outfield (string) -- Field to be analyzed
- Returns:
U (numpy array) -- EOF modes
S (numpy array) -- Singular values
V (numpy array) -- EOF coefficients
- AIModels.AIutil.select_area(area, ZZ)[source]¶
Select area for dataset ZZ
- Parameters:
area (string) -- Area to be analyzed; possible values are listed below (only 'EUROPE' is currently implemented)
'TROPIC': Tropics
'GLOBAL': Global
'PACTROPIC': Pacific Tropics
'WORLD': World
'EUROPE': Europe
'NORTH_AMERICA': North America
'NH-ML': Northern Hemisphere Mid-Latitudes
ZZ (xarray dataset) -- Dataset from which the area is selected
- Returns:
Z -- Dataset restricted to the selected area
- Return type:
xarray dataset
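An illustrative sketch of the kind of lat-lon subsetting select_area performs, here for a 'EUROPE'-like window; the coordinate names 'lat' and 'lon' and the synthetic dataset are assumptions, not the library's internals:

```python
import numpy as np
import xarray as xr

ZZ = xr.Dataset(
    {'t2m': (('lat', 'lon'), np.random.rand(181, 360))},
    coords={'lat': np.arange(-90, 91), 'lon': np.arange(0, 360)},
)
# EUROPE latitudes per the list above; longitudes clipped to the [0, 360) grid
Z = ZZ.sel(lat=slice(30, 70), lon=slice(0, 50))
```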
- AIModels.AIutil.make_matrix(S, SMOOTH, normalization, shift, **kwargs)[source]¶
Make matrix for SVD analysis
- Parameters:
S (xarray dataset) -- Dataset with the field to be analyzed
SMOOTH (boolean) -- If True, smooth the data
normalization (string) -- Type of normalization to be applied
shift (string) -- Type of shift to be applied to select the data
dropnan (string) -- Option to drop NaN values from Xmat: True uses the indexed dropna in Xmat; False uses the standard dropna from xarray
detrend (boolean) -- If True, detrend the data
- Returns:
X -- Data matrix with EOF coefficients
- Return type:
xarray dataset
- AIModels.AIutil.make_eof(X, mr, eof_interval=None)[source]¶
Compute EOF analysis on data matrix
- Parameters:
X (X matrix) -- Data matrix in X format
mr (int) -- Number of modes to be retained
eof_interval (list) -- List with the starting and ending date for EOF analysis
- Returns:
mr (int) -- Number of modes actually retained, which may be fewer than requested if the request exceeds the matrix rank. Set the input mr to math.inf to keep all modes.
var_retained (float) -- Percentage of variance retained
udat (numpy array) -- Matrix of EOF modes
vdat (numpy array) -- Matrix of EOF coefficients
sdat (numpy array) -- Vector of singular values
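A minimal sketch of an EOF decomposition via SVD under the same naming conventions (udat: modes, sdat: singular values, vdat: coefficients); this illustrates the computation, not the library's exact implementation, and the (space, time) orientation of X is an assumption:

```python
import numpy as np

X = np.random.rand(500, 240)   # data matrix, here assumed (space, time)
udat, sdat, vdat = np.linalg.svd(X, full_matrices=False)
# udat: spatial EOF modes, sdat: singular values, vdat: time coefficients

mr = 20                        # modes to retain
var_retained = 100 * (sdat[:mr] ** 2).sum() / (sdat ** 2).sum()
```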
- AIModels.AIutil.make_field(*args, **kwargs)[source]¶
Make field for analysis
- Parameters:
area (string (positional argument)) -- Area to be analyzed, possible values are
'TROPIC': Tropics
'GLOBAL': Global
'PACTROPIC': Pacific Tropics
'WORLD': World
'EUROPE': Europe
'NORTH_AMERICA': North America
'NH-ML': Northern Hemisphere Mid-Latitudes
var (string (keyword argument)) -- Variable to be analyzed
level (string (keyword argument)) -- Level to be analyzed
period (string (keyword argument)) -- Period to be analyzed
version (string (keyword argument)) -- Version of the dataset
loc (string (keyword argument)) -- Location of the dataset
- Returns:
Z (xarray dataset) -- Field to be analyzed
arealat (numpy array) -- Latitude limits of the field
arealon (numpy array) -- Longitude limits of the field
- AIModels.AIutil.make_field_V5(area, **kwargs)[source]¶
Make field for analysis
- Parameters:
area (string) -- Area to be analyzed, possible values are
'TROPIC': Tropics
'GLOBAL': Global
'PACTROPIC': Pacific Tropics
'WORLD': World
'EUROPE': Europe
'NORTH_AMERICA': North America
'NH-ML': Northern Hemisphere Mid-Latitudes
var (string) -- Variable to be analyzed
level (string) -- Level to be analyzed
period (string) -- Period to be analyzed
version (string) -- Version of the dataset
loc (string) -- Location of the dataset
- Returns:
Z (xarray dataset) -- Field to be analyzed
arealat (numpy array) -- Latitude limits of the field
arealon (numpy array) -- Longitude limits of the field
- AIModels.AIutil.make_field_HAD(area, **kwargs)[source]¶
Make field for analysis for HADSST
- Parameters:
area (string) -- Area to be analyzed, possible values are
'TROPIC': Tropics
'GLOBAL': Global
'PACTROPIC': Pacific Tropics
'WORLD': World
'EUROPE': Europe
'NORTH_AMERICA': North America
'NH-ML': Northern Hemisphere Mid-Latitudes
var (string) -- Variable to be analyzed
level (string) -- Level to be analyzed
period (string) -- Period to be analyzed
version (string) -- Version of the dataset
loc (string) -- Location of the dataset
- Returns:
Z (xarray dataset) -- Field to be analyzed
arealat (numpy array) -- Latitude limits of the field
arealon (numpy array) -- Longitude limits of the field
- AIModels.AIutil.normalize_training_data(params, vdat, period='train', scaler=None, feature_scale=1)[source]¶
Normalize training data
- Parameters:
params (dict) -- Dictionary with the parameters for the analysis
vdat (numpy array) -- Data to be normalized
period (string) -- Period to be analyzed
scaler (object) -- Scaler object
feature_scale (float) -- Feature scale
- Returns:
datatr (numpy array) -- Normalized data
sstr (object) -- Scaler object
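A hedged sketch of the fit-once, reuse-elsewhere pattern this routine supports; StandardScaler is an assumption (the actual scaler is driven by params):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

vdat = np.random.rand(120, 20)             # (time, features)

# 'train' period: fit the scaler and transform the data
sstr = StandardScaler().fit(vdat)
datatr = sstr.transform(vdat)

# other periods: reuse the fitted scaler passed back in via `scaler`
datate = sstr.transform(np.random.rand(36, 20))
```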
- AIModels.AIutil.make_data(INX, params)[source]¶
Prepare data for analysis and concatenate as needed. Modifies the input INX dictionary by adding, for each field, the scaler used and the index of the field's data in the concatenated matrix.
The convention for indices is that they point to the real date; if Python ranges need to be defined, they must account for the extra 1 at the end of the range.
- Parameters:
INX (dict) -- Dictionary with the fields to be analyzed
params (dict) -- Dictionary with the parameters for the analysis
- Returns:
datain (numpy array) -- Matrix with the data to be analyzed
INX (dict) -- Input dictionary updated with the information from the data analysis
- AIModels.AIutil.make_features(INX)[source]¶
Prepare features for analysis and compute features boundaries
- Parameters:
INX (dict) -- Dictionary with the fields to be analyzed
- Returns:
num_features (int) -- Number of features
m_limit (list) -- List with the boundaries of the features
- AIModels.AIutil.make_data_base(InputVars, period='ANN', version='V5', SMOOTH=False, normalization='STD', eof_interval=None, detrend=False, shift='ERA5', case=None, datatype='Source_data', location='DDIR')[source]¶
Organize data variables in data base INX
- Parameters:
InputVars (list) -- List of variables to be analyzed
period (string) -- Period to be analyzed
version (string) -- Version of the dataset
SMOOTH (boolean) -- If True, smooth the data
normalization (string) -- Type of normalization to be applied
eof_interval (list) -- List with the starting and ending date for EOF analysis
detrend (boolean) -- If True, detrend the data
shift (string) -- Type of shift to be applied to select the data
case (string) -- Case to be analyzed
datatype (string) -- Type of data to be analyzed, either Source_data or Target_data
location (string) -- Data directory
- Returns:
INX -- Dictionary with the fields to be analyzed. The dictionary is organized as follows: INX = {'id1': {'case': case, 'datatype': datatype, 'field': invar.name, 'level': inlevel, 'centlon': centlon, 'arealat': arealat, 'arealon': arealon, 'X': X, 'xdat': xdat, 'mr': mr, 'var_retained': varr, 'udat': udat, 'vdat': vdat, 'sdat': sdat}}
- Return type:
dict
- AIModels.AIutil.create_time_features(data_time, start, device)[source]¶
Create the past features for monthly means
- Parameters:
data_time (xarray dataset) -- Time data
start (datetime) -- Starting date
device (string) -- Device to be used for computation
- Returns:
pasft -- Tensor with the past features, on three levels:
The first level is the month sequence
The second level is the seasonal cycle
The third is the year
- Return type:
torch tensor
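A hedged sketch of the three feature levels named above (month sequence, seasonal cycle, year) for a monthly series; the exact encodings are assumptions:

```python
import numpy as np
import pandas as pd
import torch

times = pd.date_range('2000-01-01', periods=24, freq='MS')
month_seq = np.arange(len(times))                     # running month index
season = np.sin(2 * np.pi * (times.month - 1) / 12)   # one seasonal harmonic
year = times.year - times.year[0]                     # years since start

pasft = torch.tensor(np.stack([month_seq, season, year], axis=-1),
                     dtype=torch.float32)             # (time, 3)
```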
- AIModels.AIutil.rescale(params, PDX, out_train0, out_val0, out_test0, verbose=True)[source]¶
Rescale data to original values, according to the scaling choice in params
- Parameters:
params (dict) -- Dictionary with the parameters for the analysis
PDX (dict) -- Dictionary with the information for the fields to be analyzed
out_train0 (numpy array) -- Training data
out_val0 (numpy array) -- Validation data
out_test0 (numpy array) -- Test data
- Returns:
out_train (numpy array) -- Rescaled training data
out_val (numpy array) -- Rescaled validation data
out_test (numpy array) -- Rescaled test data
true (numpy array) -- Original data
- AIModels.AIutil.matrix_rank_light(X, S)[source]¶
Compute the rank of a matrix using the singular values
- Parameters:
X (numpy array) -- Matrix to be analyzed
S (numpy array) -- Singular values of the matrix
- Returns:
rank -- Rank of the matrix
- Return type:
int
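An illustrative sketch of computing rank from precomputed singular values, using the default tolerance rule of numpy.linalg.matrix_rank; the cutoff actually used by matrix_rank_light is an assumption:

```python
import numpy as np

def rank_from_singular_values(X, S):
    # numpy's default tolerance: S.max() * max(X.shape) * machine epsilon
    tol = S.max() * max(X.shape) * np.finfo(S.dtype).eps
    return int((S > tol).sum())
```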
- AIModels.AIutil.CPRSS(observations, climate, forecasts)[source]¶
Compute the CRPSS score for ensemble forecasts with respect to the climate reference.
- Positive CRPSS (0 < CRPSS ≤ 1): the forecast model has better skill than the reference model; closer to 1 means greater improvement over the reference.
- Zero CRPSS (CRPSS = 0): the forecast model performs equally to the reference model.
- Negative CRPSS (CRPSS < 0): the forecast model performs worse than the reference model; more negative means greater underperformance.
- Parameters:
observations (numpy array) -- Observations data
climate (numpy array) -- Reference (climatology) data
forecasts (numpy array) -- Forecasts data
- Returns:
score -- CRPSS score
- Return type:
float
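A hedged sketch of the skill-score relation implied by the sign conventions above, CRPSS = 1 - CRPS_forecast / CRPS_reference, with the standard kernel estimator of the ensemble CRPS; this is not necessarily the library's implementation:

```python
import numpy as np

def crps_ensemble(obs, ens):
    """Kernel CRPS estimate: obs has shape (n,), ens has shape (n, m)."""
    spread_to_obs = np.abs(ens - obs[:, None]).mean(axis=1)
    spread_within = 0.5 * np.abs(ens[:, :, None] - ens[:, None, :]).mean(axis=(1, 2))
    return (spread_to_obs - spread_within).mean()

def crpss(observations, climate, forecasts):
    return 1.0 - crps_ensemble(observations, forecasts) / crps_ensemble(observations, climate)
```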
- AIModels.AIutil.transform_strings(strings)[source]¶
Transforms a list of strings by extracting the repeating pattern (2, 3, or 7 characters long) from each string.
- Parameters:
strings (list of str) -- List of input strings.
- Returns:
out -- Transformed strings containing only the repeating pattern.
- Return type:
list of str
- AIModels.AIutil.make_fcst_array(startdate, enddate, leads, data)[source]¶
Make forecast array for verification. The input uses the xarray DataArray format of dimension (ntim, lead, z) where z is a stacked coordinate.
The output is an xarray DataArray with the time, lead, and z dimensions, and the valid_time coordinate as a 2D array.
Lead times run from 0 (corresponding to the last element of the input sequence) to leads-1.
- Parameters:
startdate (string) -- Starting date for the forecast
enddate (string) -- Ending date for the forecast
leads (int) -- Number of leads, including the IC
data (xarray DataArray) -- DataArray with the forecast data
- Returns:
out -- Forecast array for verification
- Return type:
xarray DataArray
- AIModels.AIutil.eof_to_grid(field, forecasts, startdate, enddate, params=None, INX=None, truncation=None)[source]¶
Transform from the EOF representation to the grid representation, putting data in the format (Np, Tpredict, gridpoints) in stacked format, where Np is the total number of cases, given by $N_p = L - T_{IN} - T_{predict} + 1$, L is the total length of the test period, Tpredict is the number of lead times, and gridpoints is the number of grid points in the field. All fields start at month 1 of the prediction.
- Parameters:
field (string) -- Field to be analyzed
forecasts (numpy array) -- Forecasts data from the network
startdate (int) -- Starting date for the forecast (IC)
enddate (int) -- Ending date for the forecast
params (dict) -- Dictionary with the parameters for the analysis
INX (dict) -- Dictionary with the information for the fields to be analyzed
truncation (int) -- Number of modes to be retained in the observations
- Returns:
The routine returns arrays as xarray datasets with dims (Np, lead, gridpoints) in stacked format.
Fcst (xarray dataset) -- Forecast data in grid representation with dims (Np, lead, gridpoints)
Per (xarray dataset) -- Persistence data in grid representation with dims (Np, lead, gridpoints)
- AIModels.AIutil.advance_months(ts, n=1)[source]¶
Advance or reduce a Timestamp by n months
- Parameters:
ts (pd.Timestamp or str) -- Timestamp to be modified. If str, it should be in the YYYY-MM-DD format.
n (int) -- Number of months to advance (positive) or reduce (negative)
- Returns:
Modified timestamp. If input was a string, output will be a string in the YYYY-MM-DD format.
- Return type:
pd.Timestamp or str
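A minimal sketch consistent with the documented behavior, using pandas month offsets (which clip to month end); the string round-trip mirrors the description above:

```python
import pandas as pd

def advance_months_sketch(ts, n=1):
    as_str = isinstance(ts, str)
    out = pd.Timestamp(ts) + pd.DateOffset(months=n)
    return out.strftime('%Y-%m-%d') if as_str else out

advance_months_sketch('2000-01-31', 1)    # '2000-02-29' (clipped to month end)
advance_months_sketch('2000-03-15', -2)   # '2000-01-15'
```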
- AIModels.AIutil.project_dyn(data, INX, field, truncation=None)[source]¶
Project the dynamic data into the EOF space
- Parameters:
data (xarray dataset) -- Dynamic data
INX (dict) -- Dictionary with the information for the fields to be analyzed
field (string) -- Field to be analyzed
truncation (int) -- Number of modes to be retained in the observations
- Returns:
out -- Projected data, in stacked format with NaN values
- Return type:
xarray dataset
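An illustrative sketch of projecting a stacked grid-point field onto a truncated orthonormal EOF basis, which is the core operation described here; the shapes and names are assumptions:

```python
import numpy as np

U = np.linalg.qr(np.random.rand(500, 50))[0]   # stand-in orthonormal EOF modes
x = np.random.rand(500)                        # stacked grid-point field
k = 20                                         # truncation

coeffs = U[:, :k].T @ x                        # EOF-space representation
```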
- AIModels.AIutil.make_dyn_verification_new(ver_f, area, dyn_cases, dynddr, times, filever)[source]¶
Make dynamic verification data, reading all time levels of the GCM data.
- Parameters:
ver_f (numpy array) -- Verification data
area (string) -- Area to be analyzed
dyn_cases (list) -- List of cases to be analyzed
dynddr (string) -- Address of the dynamic data
times (numpy array) -- Time data
filever (string) -- Name of the file to be written
- Returns:
ver_d -- Verification data in numpy format
- Return type:
numpy array
- AIModels.AIutil.compute_increments(tensor, axis=0)[source]¶
Take the differences of a torch tensor along the specified axis and also return the initial value along that axis
- Parameters:
tensor (torch tensor) -- Tensor to be analyzed
axis (int) -- Axis along which to compute the differences
- Returns:
diff (torch tensor) -- Tensor with the differences along the specified axis
init_value (torch tensor) -- Initial value along the specified axis
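A hedged sketch matching the documented outputs, using torch.diff plus the leading slice along the chosen axis:

```python
import torch

def compute_increments_sketch(tensor, axis=0):
    diff = torch.diff(tensor, dim=axis)      # successive differences
    init_value = tensor.select(axis, 0)      # first slice along the axis
    return diff, init_value
```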
- AIModels.AIutil.cumsum_with_init(differences, init_value)[source]¶
Compute cumulative sum with initial values for a PyTorch tensor.
This function calculates the cumulative sum of a tensor containing differences, while incorporating specific initial values. The input tensor differences has dimensions (time, nlead, neof), and the initial values tensor init_value has dimensions (neof). The initial values are added to the first time step of each lead slice, and the cumulative sum is computed along the time dimension.
- Parameters:
differences (torch.Tensor) -- Tensor containing the differences with dimensions (time, nlead, neof).
init_value (torch.Tensor) -- 1D tensor representing the initial values, with size matching the last dimension (neof) of differences.
- Returns:
Tensor after computing the cumulative sum with initial values, having the same shape as the input differences.
- Return type:
torch.Tensor
- Raises:
ValueError -- If the size of init_value does not match the last dimension of differences.
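A hedged sketch of the add-then-accumulate logic described above for a (time, nlead, neof) tensor; it reproduces the documented behavior but is not the library source:

```python
import torch

def cumsum_with_init_sketch(differences, init_value):
    if init_value.shape[0] != differences.shape[-1]:
        raise ValueError("init_value must match the last (neof) dimension")
    x = differences.clone()
    x[0] += init_value               # broadcast (neof,) across the nlead axis
    return torch.cumsum(x, dim=0)    # accumulate along the time dimension
```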
- AIModels.AIutil.select_fcst(IC, my_data)[source]¶
Select the forecast data for the given initial condition IC from the xarray my_data dataset.
- Parameters:
IC (string) -- Initial condition for the forecast
my_data (xarray dataset) -- Forecast data
- Returns:
ds_single_init -- Forecast data for verification with coordinate "time" as the valid_time
- Return type:
xarray dataset
- AIModels.AIutil.variance_features(INX)[source]¶
Return the variance of the features
- Parameters:
INX (dict) -- Dictionary with the information for the fields to be analyzed
- Returns:
ssvar -- Variance of the features
- Return type:
numpy array
- AIModels.AIutil.create_subdirectory(parent_dir, subdirectory_name)[source]¶
Create a subdirectory within the specified parent directory if it does not exist.
- Parameters:
parent_dir (str) -- The path to the parent directory.
subdirectory_name (str) -- The name of the subdirectory to create.
- Returns:
The full path of the created or existing subdirectory.
- Return type:
str
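A sketch of the documented behavior using the standard library; the real implementation may differ:

```python
import os

def create_subdirectory_sketch(parent_dir, subdirectory_name):
    path = os.path.join(parent_dir, subdirectory_name)
    os.makedirs(path, exist_ok=True)   # no error if it already exists
    return path
```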
- AIModels.AIutil.eof_to_grid_new(field, forecasts, startdate, enddate, params=None, INX=None, truncation=None)[source]¶
Transform from the EOF representation to the grid representation. The routine now supports forecasts arrays with an extra ensemble dimension.
For the original case, forecasts has shape (Np, T, n_eofs) and the output forecast dataset has dims (Np, lead, gridpoints). Now, if forecasts has shape (Np, K, T, n_eofs), the output forecast dataset will have dims (Np, member, lead, gridpoints).
- Parameters:
field (string) -- Field to be analyzed
forecasts (numpy array) -- Forecast data from the network. Expected shape is either (Np, T, n_eofs) or (Np, K, T, n_eofs), where K is the ensemble size.
startdate (datetime-like) -- Starting date for the forecast (initial condition)
enddate (datetime-like) -- Ending date for the forecast
params (dict) -- Dictionary with the parameters for the analysis. Must include 'Tpredict', 'TIN', and 'T'
INX (dict) -- Dictionary with information for the fields to be analyzed (including the EOF patterns)
truncation (int, optional) -- Number of modes to be retained in the observations
- Returns:
Fcst (xarray DataArray) -- Forecast data in grid representation: if forecasts is 3D, dims are (time, lead, z); if forecasts is 4D, dims are (time, member, lead, z).
Per (xarray DataArray) -- Persistence data in grid representation, with the same dims as Fcst.
Obs (xarray DataArray) -- Observation data in grid representation, with dims (time, z)
- AIModels.AIutil.get_common_dates(Ft, Dt)[source]¶
Get common dates between two xarray datasets and their indices.
- Parameters:
Ft (xarray.DataArray) -- First dataset with a time dimension.
Dt (xarray.DataArray) -- Second dataset with a time dimension.
- Returns:
common_dates (numpy.ndarray) -- Array of common dates.
Ft_indices (numpy.ndarray) -- Indices of common dates in the first dataset.
Dt_indices (numpy.ndarray) -- Indices of common dates in the second dataset.
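An illustrative sketch of the intersection logic via numpy.intersect1d, which returns the common values together with the indices into both inputs; reading the dates from a 'time' coordinate is an assumption:

```python
import numpy as np

def get_common_dates_sketch(Ft, Dt):
    common_dates, Ft_indices, Dt_indices = np.intersect1d(
        Ft.time.values, Dt.time.values, return_indices=True)
    return common_dates, Ft_indices, Dt_indices
```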