AIModels.AIutil module

Utility Treatment Routines for ML Predicting models

This module contains a set of routines for preparing data for insertion into transformer models used for the prediction of multivariate time series data. The classes are contained in the companion file and are imported in this module.



AIModels.AIutil.copy_dict(data, strip_values=False, remove_keys=[])[source]

Copy dictionary

  • data (dict) -- Dictionary to be copied

  • strip_values (boolean) -- If True, strip values

  • remove_keys (list) -- List of keys to be removed


out -- Copied dictionary without the keys in remove_keys

Return type:

dict


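The copy-and-filter behaviour described for copy_dict can be sketched as follows. This is a minimal illustration, not the module's implementation: copy_dict_sketch is a hypothetical stand-in name, the effect of strip_values is not specified here and is left out, and the use of a deep copy is an assumption. The sketch uses None rather than a list as the default for remove_keys, because a mutable default argument is shared between calls in Python.

```python
import copy


def copy_dict_sketch(data, remove_keys=None):
    """Return a deep copy of `data` without the keys listed in `remove_keys`."""
    # Hypothetical stand-in, not AIModels.AIutil.copy_dict itself.
    remove = set(remove_keys or [])
    return {
        key: copy.deepcopy(value)
        for key, value in data.items()
        if key not in remove
    }


# Hypothetical entry with a few of the keys used in the INX layout.
original = {"X": [1, 2, 3], "mr": 5, "udat": [0.1, 0.2]}
trimmed = copy_dict_sketch(original, remove_keys=["udat"])
```

Because the values are deep-copied, later changes to trimmed do not propagate back to original.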
AIModels.AIutil.select_field(INX, outfield, verbose=False)[source]

Select field for outfield in dict

  • INX (dict) -- Dictionary with the fields to be analyzed

  • outfield (string) -- Field to be analyzed

  • verbose (boolean) -- If True, print the field information


  • out (numpy array) -- Field data

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

  • centlon (numpy array) -- Central longitude of the field

AIModels.AIutil.select_field_key(INX, outfield, dataname)[source]

Select field for outfield in dict according to dataname

  • INX (dict) -- Dictionary with the fields to be analyzed

  • outfield (string) -- Field to be analyzed

  • dataname (string) -- Key to be analyzed


out -- Field data

Return type:

numpy array
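A lookup of this kind might look like the sketch below. It assumes that INX is keyed first by field identifier and then by item name, as in the layout documented for make_data_base; select_field_key_sketch and the "SST" entry are hypothetical and only serve the illustration.

```python
import numpy as np


def select_field_key_sketch(INX, outfield, dataname):
    # Hypothetical stand-in: assumes INX[outfield] is a dict of named items.
    try:
        return INX[outfield][dataname]
    except KeyError as err:
        raise KeyError(
            f"{dataname!r} not available for field {outfield!r}"
        ) from err


# Hypothetical database with a single field entry.
INX = {"SST": {"level": "SURF", "sdat": np.array([3.0, 2.0, 1.0])}}
sdat = select_field_key_sketch(INX, "SST", "sdat")
```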

AIModels.AIutil.select_field_eof(INX, outfield)[source]

Select eof-related fields for outfield in dict

  • INX (dict) -- Dictionary with the fields to be analyzed

  • outfield (string) -- Field to be analyzed


  • U (numpy array) -- EOF modes

  • S (numpy array) -- Singular values

  • V (numpy array) -- EOF coefficients

AIModels.AIutil.select_area(area, ZZ)[source]

Select area for dataset ZZ

  • area (string) -- Area to be analyzed. The possible values are listed below (only EUROPE is implemented)

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • ZZ (xarray dataset)



Return type:

xarray dataset

AIModels.AIutil.make_matrix(S, SMOOTH, normalization, shift, **kwargs)[source]

Make matrix for SVD analysis

  • S (xarray dataset) -- Dataset with the field to be analyzed

  • SMOOTH (boolean) -- If True, smooth the data

  • normalization (string) -- Type of normalization to be applied

  • shift (string) -- Type of shift to be applied to select the data

  • dropnan (string) -- Option to drop NaN values from Xmat. True: uses indexed dropna in Xmat. False: uses standard dropna in Xmat from xarray

  • detrend (boolean) -- If True, detrend the data


X -- Math Matrix with EOF coefficients

Return type:

xarray dataset
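The general idea of building a data matrix for SVD analysis can be sketched with plain NumPy. The routine itself works on xarray datasets and its normalization, shift and smoothing options are not detailed here, so everything below is an assumption made for illustration: the (time, lat, lon) layout, the NaN mask and the single global standard deviation are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
field = rng.normal(size=(24, 4, 5))    # hypothetical (time, lat, lon) field
field[:, 0, 0] = np.nan                # one masked grid point, e.g. land

ntime = field.shape[0]
X = field.reshape(ntime, -1).T         # stack lat/lon: (space, time)

valid = ~np.isnan(X).any(axis=1)       # keep grid points without NaN
X = X[valid]

X = X - X.mean(axis=1, keepdims=True)  # remove the time mean at each point
X = X / X.std()                        # scale by one global std (assumption)
```

Keeping the valid mask makes it possible to put results back on the original grid afterwards.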

AIModels.AIutil.make_eof(X, mr, eof_interval=None)[source]

Compute EOF analysis on data matrix

  • X (X matrix) -- Data matrix in X format

  • mr (int) -- Number of modes to be retained

  • eof_interval (list) -- List with the starting and ending date for EOF analysis


  • mr (int) -- Number of modes retained, in case the number of modes is less than the rank. Set to math.inf to keep all modes

  • var_retained (float) -- Percentage of variance retained

  • udat (numpy array) -- Matrix of EOF modes

  • vdat (numpy array) -- Matrix of EOF coefficients

  • sdat (numpy array) -- Vector of singular values
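An EOF analysis by singular value decomposition can be sketched as below. This is an illustration with NumPy under stated assumptions, not the body of make_eof: the matrix is a random (space, time) array, the eof_interval selection is omitted, and the output names simply mirror the ones listed above.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(19, 24))            # hypothetical (space, time) matrix
X = X - X.mean(axis=1, keepdims=True)    # anomalies with respect to the time mean

mr = 5                                   # number of modes requested
U, s, Vt = np.linalg.svd(X, full_matrices=False)
mr = min(mr, len(s))                     # cannot retain more modes than exist

udat = U[:, :mr]                         # EOF modes (spatial patterns)
sdat = s[:mr]                            # singular values
vdat = Vt[:mr].T                         # EOF coefficients (time series)

# The variance carried by a mode is proportional to its squared singular value.
var_retained = 100.0 * np.sum(sdat**2) / np.sum(s**2)

# Rank-mr reconstruction of the data matrix from the retained modes.
X_trunc = udat @ np.diag(sdat) @ vdat.T
```

With this form, passing math.inf as mr reduces to keeping all available modes, since min(math.inf, len(s)) returns len(s).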

AIModels.AIutil.make_field(*args, **kwargs)[source]

Make field for analysis

  • area (string (positional argument)) -- Area to be analyzed, possible values are

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • var (string (keyword argument)) -- Variable to be analyzed

  • level (string (keyword argument)) -- Level to be analyzed

  • period (string (keyword argument)) -- Period to be analyzed

  • version (string (keyword argument)) -- Version of the dataset

  • loc (string (keyword argument)) -- Location of the dataset


  • Z (xarray dataset) -- Field to be analyzed

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

AIModels.AIutil.make_field_V5(area, **kwargs)[source]

Make field for analysis

  • area (string) -- Area to be analyzed, possible values are

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • var (string) -- Variable to be analyzed

  • level (string) -- Level to be analyzed

  • period (string) -- Period to be analyzed

  • version (string) -- Version of the dataset

  • loc (string) -- Location of the dataset


  • Z (xarray dataset) -- Field to be analyzed

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

AIModels.AIutil.make_field_HAD(area, **kwargs)[source]

Make field for analysis for HADSST

  • area (string) -- Area to be analyzed, possible values are

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • var (string) -- Variable to be analyzed

  • level (string) -- Level to be analyzed

  • period (string) -- Period to be analyzed

  • version (string) -- Version of the dataset

  • loc (string) -- Location of the dataset


  • Z (xarray dataset) -- Field to be analyzed

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

AIModels.AIutil.make_data(INX, params)[source]

Prepare data for analysis and concatenate as needed. Modify input INX dictionary by adding values for scaler and index for each field. scaler is the scaler used, index is the index of the data in the concatenated matrix INX.

The convention for indices is that they point to the real date. If Python ranges need to be defined, they must take into account the extra 1 at the end of the range.

  • INX (dict) -- Dictionary with the fields to be analyzed

  • params (dict) -- Dictionary with the parameters for the analysis


  • datain (numpy array) -- Matrix with the data to be analyzed

  • INX (dict) -- Input dictionary updated with the information from the data analysis
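The index convention can be illustrated with a short example. The dates and index values are hypothetical; the point is only that indices refer to the actual first and last dates, so a Python range or slice needs one extra step at the end.

```python
dates = ["2000-01", "2000-02", "2000-03", "2000-04", "2000-05", "2000-06"]

# Indices point at the real first and last dates of the interval (inclusive).
start, end = 1, 4

# Python ranges and slices exclude the stop value, hence the extra 1.
selected = [dates[i] for i in range(start, end + 1)]
same_as_slice = dates[start:end + 1]
```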


Prepare features for analysis and compute features boundaries


INX (dict) -- Dictionary with the fields to be analyzed


  • num_features (int) -- Number of features

  • m_limit (list) -- List with the boundaries of the features

AIModels.AIutil.make_data_base(InputVars, period='ANN', version='V5', SMOOTH=False, normalization='STD', eof_interval=None, detrend=False, shift='ERA5', case=None, datatype='Source_data', location='DDIR')[source]

Organize data variables in data base INX

  • InputVars (list) -- List of variables to be analyzed

  • period (string) -- Period to be analyzed

  • version (string) -- Version of the dataset

  • SMOOTH (boolean) -- If True, smooth the data

  • normalization (string) -- Type of normalization to be applied

  • eof_interval (list) -- List with the starting and ending date for EOF analysis

  • detrend (boolean) -- If True, detrend the data

  • shift (string) -- Type of shift to be applied to select the data

  • case (string) -- Case to be analyzed

  • datatype (string) -- Type of data to be analyzed, either Source_data or Target_data

  • location (string) -- Data directory


INX -- Dictionary with the fields to be analyzed. The dictionary is organized as follows: INX = {'id1': {'case': case, 'datatype': datatype, 'field', 'level': inlevel, 'centlon': centlon, 'arealat': arealat, 'arealon': arealon, 'X': X, 'xdat': xdat, 'mr': mr, 'var_retained': varr, 'udat': udat, 'vdat': vdat, 'sdat': sdat}}

Return type:

dict



Initialize weights with uniform distribution


Count Model Parameters

AIModels.AIutil.epoch_time(start_time, end_time)[source]

Compute Execution time per epoch
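A timing helper of this kind is often written as in the sketch below. The return format of epoch_time is not documented here, so the (minutes, seconds) pair is an assumption and epoch_time_sketch is a hypothetical stand-in.

```python
import time


def epoch_time_sketch(start_time, end_time):
    # Hypothetical stand-in: splits the elapsed time into whole minutes and seconds.
    elapsed = end_time - start_time
    minutes = int(elapsed // 60)
    seconds = int(elapsed - minutes * 60)
    return minutes, seconds


start = time.perf_counter()
# ... one training epoch would run here ...
end = time.perf_counter()
mins, secs = epoch_time_sketch(start, end)
```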

AIModels.AIutil.create_time_features(data_time, start, device)[source]

Create the past features for monthly means

  • data_time (xarray dataset) -- Time data

  • start (datetime) -- Starting date

  • device (string) -- Device to be used for computation


pasft -- Tensor with the past features, organized in three levels:

  • The first level is the month sequence

  • The second level is the seasonal cycle

  • The third is the year

Return type:

torch tensor
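One possible reading of the three levels is sketched below with NumPy. The exact encoding used by create_time_features is not given here, so the running month counter, the cosine form of the seasonal cycle and the raw calendar year are all assumptions; the conversion to a torch tensor on the chosen device is left out.

```python
import numpy as np

# 24 hypothetical monthly time stamps starting in January 2000.
months = np.arange("2000-01", "2002-01", dtype="datetime64[M]")

# Level 1: running month counter along the sequence.
month_seq = np.arange(len(months), dtype=float)

# Level 2: seasonal cycle; datetime64[M] counts months since 1970-01,
# so the remainder modulo 12 is the month of the year (0 = January).
month_of_year = months.astype("int64") % 12
seasonal = np.cos(2.0 * np.pi * month_of_year / 12.0)

# Level 3: calendar year; datetime64[Y] counts years since 1970.
year = (months.astype("datetime64[Y]").astype("int64") + 1970).astype(float)

features = np.stack([month_seq, seasonal, year], axis=1)   # (time, 3)
```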

AIModels.AIutil.rescale(params, PDX, out_train0, out_val0, out_test0)[source]

Rescale data to original values, according to the scaling choice in params

  • params (dict) -- Dictionary with the parameters for the analysis

  • PDX (dict) -- Dictionary with the information for the fields to be analyzed

  • out_train0 (numpy array) -- Training data

  • out_val0 (numpy array) -- Validation data

  • out_test0 (numpy array) -- Test data


  • out_train (numpy array) -- Rescaled training data

  • out_val (numpy array) -- Rescaled validation data

  • out_test (numpy array) -- Rescaled test data

  • true (numpy array) -- Original data
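Rescaling back to original units amounts to inverting whatever scaling was applied before training. The sketch below assumes a plain standardization (subtract the mean, divide by the standard deviation); the scaling choices actually available through params are not listed here, and the data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
true = rng.normal(loc=15.0, scale=4.0, size=(30, 3))   # hypothetical original data

# Forward scaling, as it would be applied before training.
mean = true.mean(axis=0)
std = true.std(axis=0)
scaled = (true - mean) / std

# Inverse scaling: back to the original units.
rescaled = scaled * std + mean
```

The same inverse would be applied separately to the training, validation and test outputs.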

AIModels.AIutil.make_dyn_verification(ver_f, area, dyn_cases, dynddr, times, dyn_startdate, dyn_enddate, filever)[source]

Make dynamic verification data

  • ver_f (numpy array) -- Verification data

  • area (string) -- Area to be analyzed

  • dyn_cases (list) -- List of cases to be analyzed

  • dynddr (string) -- Address of the dynamic data

  • times (numpy array) -- Time data

  • dyn_startdate (string) -- Starting date for the data

  • dyn_enddate (string) -- Ending date for the data

  • filever (string) -- Name of the file to be written


ver_d -- Verification data in numpy format

Return type:

numpy array

AIModels.AIutil.eof_to_grid_new(choice, field, verification, forecasts, times, INX=None, params=None, truncation=None)[source]

Transform from the EOF representation to the grid representation, putting data in the format (Np, Tpredict, gridpoints) in stacked format. Here Np is the total number of cases, given by $N_p = L - TIN - Tpredict + 1$, L is the total length of the test period, Tpredict is the number of lead times and gridpoints is the number of grid points in the field. All fields start at month 1 of the prediction.

  • choice (string) -- Choice of data to be analyzed, either test, validation or training

  • field (string) -- Field to be analyzed

  • verification (numpy array) -- Verification data from reanalysis

  • forecasts (numpy array) -- Forecasts data from the network

  • times (DateTime) -- Time data for entire period

  • INX (dict) -- Dictionary with the information for the fields to be analyzed

  • truncation (int) -- Number of modes to be retained in the observations


  • The routine returns arrays in the format of xarray datasets, as (Np, lead, grid points) in stacked format

  • Fcst (xarray dataset) -- Forecast data in grid representation

  • Obs (xarray dataset) -- Observations data in grid representation

  • Ver (xarray dataset) -- Verification data in grid representation

  • Per (xarray dataset) -- Persistence data in grid representation
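The count of cases $N_p = L - TIN - Tpredict + 1$ can be checked by enumerating the windows explicitly. The sketch assumes that TIN is the length of the input window that precedes each forecast, which is not stated above, and uses hypothetical lengths.

```python
L, TIN, Tpredict = 24, 6, 3      # hypothetical lengths, in months

Np = L - TIN - Tpredict + 1

# Enumerate every start position whose input and forecast windows fit in L.
# Each case is (start of input, start of forecast, end of forecast).
cases = [
    (s, s + TIN, s + TIN + Tpredict)
    for s in range(L)
    if s + TIN + Tpredict <= L
]
```

The number of enumerated windows matches the formula, and the last case ends exactly at the end of the test period.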

AIModels.AIutil.matrix_rank_light(X, S)[source]

Compute the rank of a matrix using the singular values

  • X (numpy array) -- Matrix to be analyzed

  • S (numpy array) -- Singular values of the matrix


rank -- Rank of the matrix

Return type:

int

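Computing the rank from singular values that are already available avoids a second decomposition. The sketch below is a hypothetical stand-in, not the body of matrix_rank_light; it assumes the same default tolerance as numpy.linalg.matrix_rank, namely the largest singular value times the largest dimension times machine epsilon.

```python
import numpy as np


def matrix_rank_from_singular_values(X, S):
    # Hypothetical stand-in using the default tolerance of numpy.linalg.matrix_rank.
    tol = S.max() * max(X.shape) * np.finfo(S.dtype).eps
    return int(np.sum(S > tol))


rng = np.random.default_rng(3)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 8))   # rank 2 by construction
S = np.linalg.svd(A, compute_uv=False)
rank = matrix_rank_from_singular_values(A, S)
```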