AIModels.AIutil module

Utility Data-Treatment Routines for ML Prediction Models

This module contains a set of routines for preparing data for use in transformer models for the prediction of multivariate time series. The supporting classes are contained in the companion file AIClasses.py and are imported by this module.

Utilities

AIModels.AIutil.func_name()[source]
Returns:

Name of the calling function

AIModels.AIutil.copy_dict(data, strip_values=False, remove_keys=[])[source]

Copy a dictionary, optionally stripping values and removing selected keys

Parameters:
  • data (dict) -- Dictionary to be copied

  • strip_values (boolean) -- If True, strip values

  • remove_keys (list) -- List of keys to be removed

Returns:

out -- Copied dictionary without the keys in remove_keys

Return type:

dict
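
A usage sketch (assuming the module is importable; the exact effect of strip_values is not documented here, so it is left at its default):

    from AIModels.AIutil import copy_dict

    params = {'field': 'SST', 'level': 'surface', 'scaler': None}
    # Copy the dictionary, dropping the 'scaler' entry
    clean = copy_dict(params, remove_keys=['scaler'])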

AIModels.AIutil.get_arealat_arealon(area)[source]

Obtain the latitude and longitude extent for various predefined areas

Parameters:

area (string) -- Area to be analyzed. Latitude limits are listed as (lat_min, lat_max) and longitude limits as [lon_min, lon_max]; possible values are

  • 'TROPIC': Tropics (-35,35), [0,360]

  • 'GLOBAL': Global (-60,60), [120,290]

  • 'PACTROPIC': Pacific Tropics (-25,25), [180,290]

  • 'WORLD': World (-70,70), [0,360]

  • 'EUROPE': Europe (30,70), [-15,50]

  • 'NORTH_AMERICA': North America (25,70), [200,310]

  • 'NH-ML': Northern Hemisphere Mid-Latitudes (20,90), [0,360]
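
A usage sketch; based on the companion routines (e.g. select_field), the return values are taken to be the latitude and longitude limits of the area. This is an assumption, since the Returns section is missing above:

    from AIModels.AIutil import get_arealat_arealon

    # 'EUROPE' should give latitude limits (30, 70) and longitude limits [-15, 50]
    arealat, arealon = get_arealat_arealon('EUROPE')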

AIModels.AIutil.select_field(INX, outfield, verbose=False)[source]

Select the field outfield from the dictionary INX

Parameters:
  • INX (dict) -- Dictionary with the fields to be analyzed

  • outfield (string) -- Field to be analyzed

  • verbose (boolean) -- If True, print the field information

Returns:

  • out (numpy array) -- Field data

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

  • centlon (numpy array) -- Central longitude of the field

AIModels.AIutil.select_field_key(INX, outfield, dataname)[source]

Select the data stored under key dataname for the field outfield in the dictionary INX

Parameters:
  • INX (dict) -- Dictionary with the fields to be analyzed

  • outfield (string) -- Field to be analyzed

  • dataname (string) -- Key to be analyzed

Returns:

out -- Field data

Return type:

numpy array

AIModels.AIutil.select_field_eof(INX, outfield)[source]

Select the EOF-related fields for outfield in the dictionary INX

Parameters:
  • INX (dict) -- Dictionary with the fields to be analyzed

  • outfield (string) -- Field to be analyzed

Returns:

  • U (numpy array) -- EOF modes

  • S (numpy array) -- Singular values

  • V (numpy array) -- EOF coefficients

AIModels.AIutil.select_area(area, ZZ)[source]

Select area for dataset ZZ

Parameters:
  • area (string) -- Area to be analyzed (currently only 'EUROPE' is implemented); possible values are

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • ZZ (xarray dataset) -- Dataset to be subset

Returns:

Z -- Dataset restricted to the selected area

Return type:

xarray dataset

AIModels.AIutil.make_matrix(S, SMOOTH, normalization, shift, **kwargs)[source]

Make matrix for SVD analysis

Parameters:
  • S (xarray dataset) -- Dataset with the field to be analyzed

  • SMOOTH (boolean) -- If True, smooth the data

  • normalization (string) -- Type of normalization to be applied

  • shift (string) -- Type of shift to be applied to select the data

  • dropnan (string) -- Option to drop NaN values from Xmat: True uses the indexed dropna in Xmat; False uses the standard dropna from xarray

  • detrend (boolean) -- If True, detrend the data

Returns:

X -- Data matrix with EOF coefficients

Return type:

xarray dataset

AIModels.AIutil.make_eof(X, mr, eof_interval=None)[source]

Compute EOF analysis on data matrix

Parameters:
  • X (X matrix) -- Data matrix in X format

  • mr (int) -- Number of modes to be retained

  • eof_interval (list) -- List with the starting and ending date for EOF analysis

Returns:

  • mr (int) -- Number of modes retained, which may be less than requested if the requested number exceeds the matrix rank. Set to math.inf to keep all modes

  • var_retained (float) -- Percentage of variance retained

  • udat (numpy array) -- Matrix of EOF modes

  • vdat (numpy array) -- Matrix of EOF coefficients

  • sdat (numpy array) -- Vector of singular values

AIModels.AIutil.make_field(*args, **kwargs)[source]

Make field for analysis

Parameters:
  • area (string (positional argument)) -- Area to be analyzed, possible values are

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • var (string (keyword argument)) -- Variable to be analyzed

  • level (string (keyword argument)) -- Level to be analyzed

  • period (string (keyword argument)) -- Period to be analyzed

  • version (string (keyword argument)) -- Version of the dataset

  • loc (string (keyword argument)) -- Location of the dataset

Returns:

  • Z (xarray dataset) -- Field to be analyzed

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

AIModels.AIutil.make_field_V5(area, **kwargs)[source]

Make field for analysis

Parameters:
  • area (string) -- Area to be analyzed, possible values are

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • var (string) -- Variable to be analyzed

  • level (string) -- Level to be analyzed

  • period (string) -- Period to be analyzed

  • version (string) -- Version of the dataset

  • loc (string) -- Location of the dataset

Returns:

  • Z (xarray dataset) -- Field to be analyzed

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

AIModels.AIutil.make_field_HAD(area, **kwargs)[source]

Make field for analysis for HADSST

Parameters:
  • area (string) -- Area to be analyzed, possible values are

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • var (string) -- Variable to be analyzed

  • level (string) -- Level to be analyzed

  • period (string) -- Period to be analyzed

  • version (string) -- Version of the dataset

  • loc (string) -- Location of the dataset

Returns:

  • Z (xarray dataset) -- Field to be analyzed

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

AIModels.AIutil.normalize_training_data(params, vdat, period='train', scaler=None, feature_scale=1)[source]

Normalize training data

Parameters:
  • params (dict) -- Dictionary with the parameters for the analysis

  • vdat (numpy array) -- Data to be normalized

  • period (string) -- Period to be analyzed

  • scaler (object) -- Scaler object

  • feature_scale (float) -- Scale factor applied to the features

Returns:

  • datatr (numpy array) -- Normalized data

  • sstr (object) -- Scaler object
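
The signature suggests the usual fit-on-train, reuse-elsewhere scaler pattern. Below is a minimal sketch of that pattern with scikit-learn; it is an illustration of the convention, not the routine's actual implementation, and normalize_sketch is a hypothetical name:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    def normalize_sketch(vdat, period='train', scaler=None, feature_scale=1):
        # vdat: (samples, features). Fit the scaler on the training
        # period only; reuse the fitted scaler for validation/test.
        if period == 'train':
            scaler = StandardScaler().fit(vdat)
        datatr = scaler.transform(vdat) * feature_scale
        return datatr, scaler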

AIModels.AIutil.make_data(INX, params)[source]

Prepare data for analysis and concatenate as needed. Modifies the input INX dictionary by adding scaler and index values for each field: scaler is the scaler used, and index is the position of the field's data in the concatenated matrix.

The convention for indices is that they point to the real date. If Python ranges need to be defined, they must account for the extra 1 at the end of the range.

Parameters:
  • INX (dict) -- Dictionary with the fields to be analyzed

  • params (dict) -- Dictionary with the parameters for the analysis

Returns:

  • datain (numpy array) -- Matrix with the data to be analyzed

  • INX (dict) -- Input dictionary updated with the information from the data analysis

AIModels.AIutil.make_features(INX)[source]

Prepare features for analysis and compute features boundaries

Parameters:

INX (dict) -- Dictionary with the fields to be analyzed

Returns:

  • num_features (int) -- Number of features

  • m_limit (list) -- List with the boundaries of the features

AIModels.AIutil.make_data_base(InputVars, period='ANN', version='V5', SMOOTH=False, normalization='STD', eof_interval=None, detrend=False, shift='ERA5', case=None, datatype='Source_data', location='DDIR')[source]

Organize data variables in data base INX

Parameters:
  • InputVars (list) -- List of variables to be analyzed

  • period (string) -- Period to be analyzed

  • version (string) -- Version of the dataset

  • SMOOTH (boolean) -- If True, smooth the data

  • normalization (string) -- Type of normalization to be applied

  • eof_interval (list) -- List with the starting and ending date for EOF analysis

  • detrend (boolean) -- If True, detrend the data

  • shift (string) -- Type of shift to be applied to select the data

  • case (string) -- Case to be analyzed

  • datatype (string) -- Type of data to be analyzed, either Source_data or Target_data

  • location (string) -- Data directory

Returns:

INX -- Dictionary with the fields to be analyzed, organized as follows: INX = {'id1': {'case': case, 'datatype': datatype, 'field': invar.name, 'level': inlevel, 'centlon': centlon, 'arealat': arealat, 'arealon': arealon, 'X': X, 'xdat': xdat, 'mr': mr, 'var_retained': varr, 'udat': udat, 'vdat': vdat, 'sdat': sdat}}

Return type:

dict

AIModels.AIutil.init_weights(m)[source]

Initialize weights with a uniform distribution

AIModels.AIutil.count_parameters(model)[source]

Count Model Parameters

AIModels.AIutil.epoch_time(start_time, end_time)[source]

Compute Execution time per epoch
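
count_parameters and epoch_time follow common PyTorch training-loop idioms; the sketches below (with hypothetical _sketch names) show the usual behavior, which the actual routines may refine:

    import torch.nn as nn

    def count_parameters_sketch(model: nn.Module) -> int:
        # Count only trainable parameters
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    def epoch_time_sketch(start_time: float, end_time: float):
        # Split elapsed seconds into whole minutes and seconds
        elapsed = end_time - start_time
        return int(elapsed // 60), int(elapsed % 60)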

AIModels.AIutil.create_time_features(data_time, start, device)[source]

Create the past features for monthly means

Parameters:
  • data_time (xarray dataset) -- Time data

  • start (datetime) -- Starting date

  • device (string) -- Device to be used for computation

Returns:

pasft -- Tensor with the past features, organized in three levels:

  • The first level is the month sequence

  • The second level is the seasonal cycle

  • The third is the year

Return type:

torch tensor

AIModels.AIutil.rescale(params, PDX, out_train0, out_val0, out_test0, verbose=True)[source]

Rescale data to original values, according to the scaling choice in params

Parameters:
  • params (dict) -- Dictionary with the parameters for the analysis

  • PDX (dict) -- Dictionary with the information for the fields to be analyzed

  • out_train0 (numpy array) -- Training data

  • out_val0 (numpy array) -- Validation data

  • out_test0 (numpy array) -- Test data

Returns:

  • out_train (numpy array) -- Rescaled training data

  • out_val (numpy array) -- Rescaled validation data

  • out_test (numpy array) -- Rescaled test data

  • true (numpy array) -- Original data

AIModels.AIutil.matrix_rank_light(X, S)[source]

Compute the rank of a matrix using the singular values

Parameters:
  • X (numpy array) -- Matrix to be analyzed

  • S (numpy array) -- Singular values of the matrix

Returns:

rank -- Rank of the matrix

Return type:

int
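
A rank computed from precomputed singular values typically counts the values above a tolerance tied to the matrix size and machine precision. A sketch of that convention, following the numpy.linalg.matrix_rank default tolerance (an assumption about this routine's choice):

    import numpy as np

    def matrix_rank_light_sketch(X, S):
        # Tolerance as in numpy.linalg.matrix_rank: eps scaled by the
        # largest matrix dimension and the largest singular value
        tol = S.max() * max(X.shape) * np.finfo(S.dtype).eps
        return int((S > tol).sum())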

AIModels.AIutil.CPRSS(observations, climate, forecasts)[source]

Compute the CRPSS score for ensemble forecasts with respect to the climate reference.

  • Positive CRPSS (0 < CRPSS ≤ 1): the forecast model has better skill than the reference model; values closer to 1 indicate greater improvement over the reference.

  • Zero CRPSS (CRPSS = 0): the forecast model performs equally to the reference model.

  • Negative CRPSS (CRPSS < 0): the forecast model performs worse than the reference model; more negative values indicate greater underperformance.

Parameters:
  • observations (numpy array) -- Observations data

  • forecasts (numpy array) -- Forecasts data

  • climate (numpy array) -- Reference data

Returns:

score -- CRPSS score

Return type:

float
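
The standard definition is CRPSS = 1 - CRPS_forecast / CRPS_reference, with the ensemble CRPS estimated as E|X - y| - 0.5 E|X - X'|. A self-contained numpy sketch of that definition (an illustration with hypothetical names; the routine's actual implementation may differ):

    import numpy as np

    def crps_ensemble(obs, ens):
        # obs: (n,) observations; ens: (n, m) ensemble members
        term1 = np.abs(ens - obs[:, None]).mean(axis=1)
        term2 = 0.5 * np.abs(ens[:, :, None] - ens[:, None, :]).mean(axis=(1, 2))
        return (term1 - term2).mean()

    def crpss_sketch(observations, climate, forecasts):
        # Skill relative to the climate reference
        return 1.0 - crps_ensemble(observations, forecasts) / \
                     crps_ensemble(observations, climate)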

AIModels.AIutil.transform_strings(strings)[source]

Transforms a list of strings by extracting the repeating pattern (2, 3, or 7 characters long) from each string.

Parameters:

strings (list of str) -- List of input strings

Returns:

Transformed strings containing only the repeating pattern

Return type:

list of str
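
A per-string sketch of detecting a repeating pattern of length 2, 3, or 7 (assumed semantics, based on the description; repeating_pattern_sketch is a hypothetical helper):

    def repeating_pattern_sketch(s: str) -> str:
        # Return the repeating unit of length 2, 3, or 7, if the whole
        # string is an exact repetition of it
        for k in (2, 3, 7):
            unit = s[:k]
            if len(s) % k == 0 and unit * (len(s) // k) == s:
                return unit
        return s  # assumed fallback when no pattern is found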

AIModels.AIutil.make_fcst_array(startdate, enddate, leads, data)[source]

Make forecast array for verification. The input uses the xarray DataArray format of dimension (ntim, lead, z) where z is a stacked coordinate.

The output is an xarray DataArray with the time, lead, and z dimensions, and the valid_time coordinate as a 2D array.

The lead time runs from 0 (the last element of the input sequence) to leads-1.

Parameters:
  • startdate (string) -- Starting date for the forecast

  • enddate (string) -- Ending date for the forecast

  • leads (int) -- Number of leads, including the IC

  • data (xarray DataArray) -- DataArray with the forecast data

Returns:

out -- Forecast array for verification

Return type:

xarray DataArray

AIModels.AIutil.eof_to_grid(field, forecasts, startdate, enddate, params=None, INX=None, truncation=None)[source]

Transform from the EOF representation to the grid representation, putting data in the stacked format (Np, Tpredict, gridpoints), where Np is the total number of cases, given by $N_p = L - TIN - Tpredict + 1$, L is the total length of the test period, Tpredict is the number of lead times, and gridpoints is the number of grid points in the field. All fields start at month 1 of the prediction.

Parameters:
  • field (string) -- Field to be analyzed

  • forecasts (numpy array) -- Forecasts data from the network

  • startdate (int) -- Starting date for the forecast (IC)

  • enddate (int) -- Ending date for the forecast

  • params (dict) -- Dictionary with the parameters for the analysis

  • INX (dict) -- Dictionary with the information for the fields to be analyzed

  • truncation (int) -- Number of modes to be retained in the observations

Returns:

The routine returns xarray datasets in stacked format with dims (Np, lead, grid points):

  • Fcst (xarray dataset) -- Forecast data in grid representation with dims (Np, lead, grid points)

  • Per (xarray dataset) -- Persistence data in grid representation with dims (Np, lead, grid points)

AIModels.AIutil.advance_months(ts, n=1)[source]

Advance or reduce a Timestamp by n months

Parameters:
  • ts (pd.Timestamp or str) -- Timestamp to be modified. If str, it should be in the YYYY-MM-DD format.

  • n (int) -- Number of months to advance (positive) or reduce (negative)

Returns:

Modified timestamp. If input was a string, output will be a string in the YYYY-MM-DD format.

Return type:

pd.Timestamp or str
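
A usage sketch (assuming the month arithmetic preserves the day of month, as pandas month offsets do):

    import pandas as pd
    from AIModels.AIutil import advance_months

    advance_months(pd.Timestamp('2000-01-15'), n=2)   # expected: Timestamp('2000-03-15')
    advance_months('2000-01-15', n=-1)                # expected: '1999-12-15' (string in, string out)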

AIModels.AIutil.project_dyn(data, INX, field, truncation=None)[source]

Project the dynamic data into the EOF space

Parameters:
  • data (xarray dataset) -- Dynamic data

  • INX (dict) -- Dictionary with the information for the fields to be analyzed

  • field (string) -- Field to be analyzed

  • truncation (int) -- Number of modes to be retained in the observations

Returns:

out -- Projected data, in stacked format with NaN values

Return type:

xarray dataset

AIModels.AIutil.make_dyn_verification_new(ver_f, area, dyn_cases, dynddr, times, filever)[source]

Make dynamic verification data. Read all time levels for the GCM data

Parameters:
  • ver_f (numpy array) -- Verification data

  • area (string) -- Area to be analyzed

  • dyn_cases (list) -- List of cases to be analyzed

  • dynddr (string) -- Address of the dynamic data

  • times (numpy array) -- Time data

  • filever (string) -- Name of the file to be written

Returns:

ver_d -- Verification data in numpy format

Return type:

numpy array

AIModels.AIutil.compute_increments(tensor, axis=0)[source]

Take the differences of a torch tensor along the specified axis and also return the initial value along that axis

Parameters:
  • tensor (torch tensor) -- Tensor to be analyzed

  • axis (int) -- Axis along which to compute the differences

Returns:

  • diff (torch tensor) -- Tensor with the differences along the specified axis

  • init_value (torch tensor) -- Initial value along the specified axis
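
A minimal PyTorch sketch consistent with the description (the actual routine may differ in detail; compute_increments_sketch is a hypothetical name):

    import torch

    def compute_increments_sketch(tensor, axis=0):
        # The first slice along the axis is the initial value
        init_value = tensor.select(axis, 0)
        # Consecutive differences along the axis
        diff = torch.diff(tensor, dim=axis)
        return diff, init_value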

AIModels.AIutil.cumsum_with_init(differences, init_value)[source]

Compute cumulative sum with initial values for a PyTorch tensor.

This function calculates the cumulative sum of a tensor containing differences, while incorporating specific initial values. The input tensor differences has dimensions (time, nlead, neof), and the initial values tensor init_value has dimensions (neof). The initial values are added to the first time step of each lead slice, and the cumulative sum is computed along the time dimension.

Parameters:
  • differences (torch.Tensor) -- Tensor containing the differences with dimensions (time, nlead, neof).

  • init_value (torch.Tensor) -- 1D tensor representing the initial values, with size matching the last dimension (neof) of differences.

Returns:

Tensor after computing the cumulative sum with initial values, having the same shape as the input differences.

Return type:

torch.Tensor

Raises:

ValueError -- If the size of init_value does not match the last dimension of differences.
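
A minimal PyTorch sketch of the described behavior, under the stated dimension conventions (time, nlead, neof):

    import torch

    def cumsum_with_init_sketch(differences, init_value):
        if init_value.shape[0] != differences.shape[-1]:
            raise ValueError("init_value size must match the last dimension (neof)")
        result = differences.clone()
        # Add the initial values to the first time step of every lead slice;
        # init_value (neof,) broadcasts over the (nlead, neof) slice
        result[0] += init_value
        # Accumulate along the time dimension
        return torch.cumsum(result, dim=0)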

AIModels.AIutil.select_fcst(IC, my_data)[source]

Select the forecast data for the given initial condition IC from the xarray my_data dataset.

Parameters:
  • IC (string) -- Initial condition for the forecast

  • my_data (xarray dataset) -- Forecast data

Returns:

ds_single_init -- Forecast data for verification with coordinate "time" as the valid_time

Return type:

xarray dataset

AIModels.AIutil.variance_features(INX)[source]

Return the variance of the features

Parameters:

INX (dict) -- Dictionary with the information for the fields to be analyzed

Returns:

ssvar -- Variance of the features

Return type:

numpy array

AIModels.AIutil.create_subdirectory(parent_dir, subdirectory_name)[source]

Create a subdirectory within the specified parent directory if it does not exist.

Parameters:
  • parent_dir (str) -- The path to the parent directory.

  • subdirectory_name (str) -- The name of the subdirectory to create.

Returns:

The full path of the created or existing subdirectory.

Return type:

str
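
The description matches the standard os.makedirs idiom; a sketch (with a hypothetical _sketch name):

    import os

    def create_subdirectory_sketch(parent_dir, subdirectory_name):
        # Create the subdirectory if it does not exist and return its full path
        path = os.path.join(parent_dir, subdirectory_name)
        os.makedirs(path, exist_ok=True)
        return path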

AIModels.AIutil.eof_to_grid_new(field, forecasts, startdate, enddate, params=None, INX=None, truncation=None)[source]

Transform from the EOF representation to the grid representation. The routine now supports forecasts arrays with an extra ensemble dimension.

For the original case, forecasts has shape (Np, T, n_eofs) and the output forecast dataset has dims (Np, lead, gridpoints). Now, if forecasts has shape (Np, K, T, n_eofs), the output forecast dataset will have dims (Np, member, lead, gridpoints).

Parameters:
  • field (string) -- Field to be analyzed

  • forecasts (numpy array) -- Forecast data from the network. Expected shape is either (Np, T, n_eofs) or (Np, K, T, n_eofs), where K is the ensemble size.

  • startdate (datetime-like) -- Starting date for the forecast (initial condition)

  • enddate (datetime-like) -- Ending date for the forecast

  • params (dict) -- Dictionary with the parameters for the analysis. Must include 'Tpredict', 'TIN', and 'T'

  • INX (dict) -- Dictionary with information for the fields to be analyzed (including the EOF patterns)

  • truncation (int, optional) -- Number of modes to be retained in the observations

Returns:

  • Fcst (xarray DataArray) -- Forecast data in grid representation. * If forecasts is 3D, dims are (time, lead, z). * If forecasts is 4D, dims are (time, member, lead, z).

  • Per (xarray DataArray) -- Persistence data in grid representation, with the same dims as Fcst.

  • Obs (xarray DataArray) -- Observation data in grid representation, with dims (time, z)

AIModels.AIutil.get_common_dates(Ft, Dt)[source]

Get common dates between two xarray datasets and their indices.

Parameters:
  • Ft (xarray.DataArray) -- First dataset with a time dimension.

  • Dt (xarray.DataArray) -- Second dataset with a time dimension.

Returns:

  • common_dates (numpy.ndarray) -- Array of common dates.

  • Ft_indices (numpy.ndarray) -- Indices of common dates in the first dataset.

  • Dt_indices (numpy.ndarray) -- Indices of common dates in the second dataset.
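
The description matches numpy.intersect1d with return_indices=True; a sketch, assuming both inputs carry a time coordinate as documented:

    import numpy as np

    def get_common_dates_sketch(Ft, Dt):
        # Intersect the time coordinates and return the matching indices
        common_dates, Ft_indices, Dt_indices = np.intersect1d(
            Ft.time.values, Dt.time.values, return_indices=True)
        return common_dates, Ft_indices, Dt_indices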