AIModels.AIutil module

Utility Treatment Routines for ML Predicting models

This module contains a set of routines for preparing data for input to the transformer models used to predict multivariate time series. The classes are contained in the companion file AIClasses.py and are imported in this module.

Utilities

AIModels.AIutil.func_name()[source]
Returns:

name of caller
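As an illustration of how a caller's name can be retrieved, here is a minimal sketch using the standard inspect module. The names func_name_sketch and load_fields are hypothetical, and the actual implementation in AIutil may use a different mechanism.

```python
import inspect


def func_name_sketch():
    """Return the name of the function that called this helper."""
    # f_back is the frame of the caller; co_name is that function's name.
    caller_frame = inspect.currentframe().f_back
    return caller_frame.f_code.co_name


def load_fields():
    # A hypothetical caller, used only to demonstrate the lookup.
    return func_name_sketch()


print(load_fields())
```

A helper of this kind is typically used to label log or error messages with the routine that produced them.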

AIModels.AIutil.copy_dict(data, strip_values=False, remove_keys=[])[source]

Copy dictionary

Parameters:
  • data (dict) -- Dictionary to be copied

  • strip_values (boolean) -- If True, strip values

  • remove_keys (list) -- List of keys to be removed

Returns:

out -- Copied dictionary without the keys in remove_keys

Return type:

dict
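The key-removal behaviour described above can be sketched as follows. This is an assumption-laden illustration, not the module's code: copy_dict_sketch is a hypothetical name, the copy is assumed to be deep, and strip_values is left out because its exact effect is not documented here.

```python
import copy


def copy_dict_sketch(data, remove_keys=None):
    """Return a deep copy of `data` without the keys listed in `remove_keys`."""
    # None is used instead of a mutable default list, so calls share no state.
    remove_keys = remove_keys or []
    return {
        key: copy.deepcopy(value)
        for key, value in data.items()
        if key not in remove_keys
    }


# Hypothetical entry, loosely modelled on the fields stored per variable.
original = {"field": "SST", "level": "SURF", "X": [1.0, 2.0]}
trimmed = copy_dict_sketch(original, remove_keys=["X"])

print(trimmed)
print("X" in original)
```

The original dictionary is left untouched, which is the usual reason for copying before dropping large arrays.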

AIModels.AIutil.select_field(INX, outfield, verbose=False)[source]

Select field for outfield in dict

Parameters:
  • INX (dict) -- Dictionary with the fields to be analyzed

  • outfield (string) -- Field to be analyzed

  • verbose (boolean) -- If True, print the field information

Returns:

  • out (numpy array) -- Field data

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

  • centlon (numpy array) -- Central longitude of the field

AIModels.AIutil.select_field_key(INX, outfield, dataname)[source]

Select field for outfield in dict according to dataname

Parameters:
  • INX (dict) -- Dictionary with the fields to be analyzed

  • outfield (string) -- Field to be analyzed

  • dataname (string) -- Key to be analyzed

Returns:

out -- Field data

Return type:

numpy array

AIModels.AIutil.select_field_eof(INX, outfield)[source]

Select eof-related fields for outfield in dict

Parameters:
  • INX (dict) -- Dictionary with the fields to be analyzed

  • outfield (string) -- Field to be analyzed

Returns:

  • U (numpy array) -- EOF modes

  • S (numpy array) -- Singular values

  • V (numpy array) -- EOF coefficients

AIModels.AIutil.select_area(area, ZZ)[source]

Select area for dataset ZZ

Parameters:
  • area (string) -- Area to be analyzed. Possible values are listed below (only EUROPE is implemented):

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • ZZ (xarray dataset)

Returns:

Z

Return type:

xarray dataset

AIModels.AIutil.make_matrix(S, SMOOTH, normalization, shift, **kwargs)[source]

Make matrix for SVD analysis

Parameters:
  • S (xarray dataset) -- Dataset with the field to be analyzed

  • SMOOTH (boolean) -- If True, smooth the data

  • normalization (string) -- Type of normalization to be applied

  • shift (string) -- Type of shift to be applied to select the data

  • dropnan (string) -- Option to drop NaN values from Xmat:

    • True: uses indexed dropna in Xmat

    • False: uses standard dropna in Xmat from xarray

  • detrend (boolean) -- If True, detrend the data

Returns:

X -- Math Matrix with EOF coefficients

Return type:

xarray dataset

AIModels.AIutil.make_eof(X, mr, eof_interval=None)[source]

Compute EOF analysis on data matrix

Parameters:
  • X (X matrix) -- Data matrix in X format

  • mr (int) -- Number of modes to be retained

  • eof_interval (list) -- List with the starting and ending date for EOF analysis

Returns:

  • mr (int) -- Number of modes retained; this can differ from the requested value when the number of modes is less than the rank. Set the input mr to math.inf to keep all modes

  • var_retained (float) -- Percentage of variance retained

  • udat (numpy array) -- Matrix of EOF modes

  • vdat (numpy array) -- Matrix of EOF coefficients

  • sdat (numpy array) -- Vector of singular values
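An EOF analysis of this kind is commonly computed with a singular value decomposition. The sketch below uses NumPy directly and is only an illustration under stated assumptions: make_eof_sketch is a hypothetical name, X is assumed to be a plain array of shape (space, time), eof_interval is ignored, and the orientation of the returned matrices may differ from the real routine.

```python
import math

import numpy as np


def make_eof_sketch(X, mr):
    """Truncated SVD of X, assumed here to have shape (space, time)."""
    # Economy-size SVD: X = U @ diag(s) @ Vh
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    # Never keep more modes than exist; math.inf keeps them all.
    mr = int(min(mr, len(s)))
    # Squared singular values are proportional to the variance of each mode.
    var_retained = 100.0 * np.sum(s[:mr] ** 2) / np.sum(s ** 2)
    return mr, var_retained, U[:, :mr], Vh[:mr, :].T, s[:mr]


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))  # 50 grid points, 20 time steps

mr, var_retained, udat, vdat, sdat = make_eof_sketch(X, math.inf)
# With all modes retained, the data matrix is recovered exactly.
X_rebuilt = udat @ np.diag(sdat) @ vdat.T
print(mr, np.allclose(X, X_rebuilt))
```

Passing a smaller mr keeps only the leading modes, and var_retained then reports how much of the total variance those modes explain.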

AIModels.AIutil.make_field(*args, **kwargs)[source]

Make field for analysis

Parameters:
  • area (string (positional argument)) -- Area to be analyzed, possible values are

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • var (string (keyword argument)) -- Variable to be analyzed

  • level (string (keyword argument)) -- Level to be analyzed

  • period (string (keyword argument)) -- Period to be analyzed

  • version (string (keyword argument)) -- Version of the dataset

  • loc (string (keyword argument)) -- Location of the dataset

Returns:

  • Z (xarray dataset) -- Field to be analyzed

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

AIModels.AIutil.make_field_V5(area, **kwargs)[source]

Make field for analysis

Parameters:
  • area (string) -- Area to be analyzed, possible values are

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • var (string) -- Variable to be analyzed

  • level (string) -- Level to be analyzed

  • period (string) -- Period to be analyzed

  • version (string) -- Version of the dataset

  • loc (string) -- Location of the dataset

Returns:

  • Z (xarray dataset) -- Field to be analyzed

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

AIModels.AIutil.make_field_HAD(area, **kwargs)[source]

Make field for analysis for HADSST

Parameters:
  • area (string) -- Area to be analyzed, possible values are

    • 'TROPIC': Tropics

    • 'GLOBAL': Global

    • 'PACTROPIC': Pacific Tropics

    • 'WORLD': World

    • 'EUROPE': Europe

    • 'NORTH_AMERICA': North America

    • 'NH-ML': Northern Hemisphere Mid-Latitudes

  • var (string) -- Variable to be analyzed

  • level (string) -- Level to be analyzed

  • period (string) -- Period to be analyzed

  • version (string) -- Version of the dataset

  • loc (string) -- Location of the dataset

Returns:

  • Z (xarray dataset) -- Field to be analyzed

  • arealat (numpy array) -- Latitude limits of the field

  • arealon (numpy array) -- Longitude limits of the field

AIModels.AIutil.make_data(INX, params)[source]

Prepare data for analysis and concatenate as needed. The input INX dictionary is modified by adding a scaler and an index entry for each field: scaler is the scaler used, and index is the position of that field's data in the concatenated matrix.

The convention for indices is that they point to the real date. If Python ranges are built from them, the end of the range must be increased by 1, because Python ranges exclude their end point.

Parameters:
  • INX (dict) -- Dictionary with the fields to be analyzed

  • params (dict) -- Dictionary with the parameters for the analysis

Returns:

  • datain (numpy array) -- Matrix with the data to be analyzed

  • INX (dict) -- Input dictionary updated with the information from the data analysis
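The index convention described for make_data can be illustrated with a small, self-contained example. The month list and the start and end values are made up for the illustration; only the "add 1 to the end" rule comes from the text.

```python
# Twelve monthly values, Jan..Dec, standing in for one field's time series.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Inclusive indices: both point at real dates (Mar and Jun).
start, end = 2, 5

# Python slices and ranges exclude the stop value, so add 1 to reach `end`.
selected = months[start:end + 1]
positions = list(range(start, end + 1))

print(selected)
print(positions)
```

Forgetting the extra 1 would silently drop the last date of the interval.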

AIModels.AIutil.make_features(INX)[source]

Prepare features for analysis and compute features boundaries

Parameters:

INX (dict) -- Dictionary with the fields to be analyzed

Returns:

  • num_features (int) -- Number of features

  • m_limit (list) -- List with the boundaries of the features
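One plausible way to compute feature boundaries is shown below. It rests on assumptions that the documentation does not confirm: that each field contributes as many features as its retained modes ('mr'), and that m_limit holds cumulative end positions. The INX contents and the name make_features_sketch are hypothetical.

```python
from itertools import accumulate

# Hypothetical INX entries; only the 'mr' key (modes retained) is used here.
INX = {
    "id1": {"field": "SST", "mr": 10},
    "id2": {"field": "T2M", "mr": 6},
    "id3": {"field": "Z500", "mr": 4},
}


def make_features_sketch(INX):
    """Return the total feature count and the end column of each field's block."""
    sizes = [entry["mr"] for entry in INX.values()]
    # Cumulative sums give the column where each field's block ends.
    m_limit = list(accumulate(sizes))
    return sum(sizes), m_limit


num_features, m_limit = make_features_sketch(INX)
print(num_features, m_limit)
```

With boundaries of this form, the columns of one field in a concatenated matrix can be recovered by slicing between consecutive limits.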

AIModels.AIutil.make_data_base(InputVars, period='ANN', version='V5', SMOOTH=False, normalization='STD', eof_interval=None, detrend=False, shift='ERA5', case=None, datatype='Source_data', location='DDIR')[source]

Organize data variables in data base INX

Parameters:
  • InputVars (list) -- List of variables to be analyzed

  • period (string) -- Period to be analyzed

  • version (string) -- Version of the dataset

  • SMOOTH (boolean) -- If True, smooth the data

  • normalization (string) -- Type of normalization to be applied

  • eof_interval (list) -- List with the starting and ending date for EOF analysis

  • detrend (boolean) -- If True, detrend the data

  • shift (string) -- Type of shift to be applied to select the data

  • case (string) -- Case to be analyzed

  • datatype (string) -- Type of data to be analyzed, either Source_data or Target_data

  • location (string) -- Data directory

Returns:

INX -- Dictionary with the fields to be analyzed. The dictionary is organized as follows: INX = {'id1': {'case': case, 'datatype': datatype, 'field': invar.name, 'level': inlevel, 'centlon': centlon, 'arealat': arealat, 'arealon': arealon, 'X': X, 'xdat': xdat, 'mr': mr, 'var_retained': varr, 'udat': udat, 'vdat': vdat, 'sdat': sdat}}

Return type:

dict

AIModels.AIutil.init_weights(m)[source]

Initialize weights with a uniform distribution

AIModels.AIutil.count_parameters(model)[source]

Count Model Parameters

AIModels.AIutil.epoch_time(start_time, end_time)[source]

Compute Execution time per epoch
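A minimal sketch of such a timing helper follows. The return format is not documented above, so a (minutes, seconds) pair is assumed here, and epoch_time_sketch is a hypothetical name.

```python
def epoch_time_sketch(start_time, end_time):
    """Split the elapsed time between two timestamps into minutes and seconds."""
    # Timestamps are assumed to be in seconds, e.g. from time.time().
    elapsed = int(end_time - start_time)
    minutes, seconds = divmod(elapsed, 60)
    return minutes, seconds


print(epoch_time_sketch(0.0, 125.4))
```

In a training loop the two timestamps would usually be taken with time.time() before and after each epoch.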

AIModels.AIutil.create_time_features(data_time, start, device)[source]

Create the past features for monthly means

Parameters:
  • data_time (xarray dataset) -- Time data

  • start (datetime) -- Starting date

  • device (string) -- Device to be used for computation

Returns:

pasft -- Tensor with the past features, organized in three levels:

  • The first level is the month sequence

  • The second level is the seasonal cycle

  • The third is the year

Return type:

torch tensor
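The three levels can be pictured with a plain-Python sketch. This is an assumption about the layout, not the module's encoding: the month sequence is taken as a running counter, the seasonal cycle as the calendar month, and the result is a list of lists rather than a torch tensor on a device. The name time_features_sketch is hypothetical.

```python
from datetime import date


def time_features_sketch(start, n_months):
    """Build three feature rows for n_months consecutive months from `start`."""
    month_seq, season, year = [], [], []
    for i in range(n_months):
        # Months elapsed since January of the starting year.
        total = (start.month - 1) + i
        month_seq.append(i)                    # running month counter
        season.append(total % 12 + 1)          # calendar month, 1..12
        year.append(start.year + total // 12)  # calendar year
    return [month_seq, season, year]


features = time_features_sketch(date(2000, 11, 1), 4)
print(features)
```

The example starts in November so that the year rollover is visible in the second and third rows.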

AIModels.AIutil.rescale(params, PDX, out_train0, out_val0, out_test0)[source]

Rescale data to original values, according to the scaling choice.

Parameters:
  • params (dict) -- Dictionary with the parameters for the analysis

  • PDX (dict) -- Dictionary with the information for the fields to be analyzed

  • out_train0 (numpy array) -- Training data

  • out_val0 (numpy array) -- Validation data

  • out_test0 (numpy array) -- Test data

Returns:

  • out_train (numpy array) -- Rescaled training data

  • out_val (numpy array) -- Rescaled validation data

  • out_test (numpy array) -- Rescaled test data

  • true (numpy array) -- Original data

AIModels.AIutil.make_dyn_verification(ver_f, area, dyn_cases, dynddr, times, dyn_startdate, dyn_enddate, filever)[source]

Make dynamic verification data

Parameters:
  • ver_f (numpy array) -- Verification data

  • area (string) -- Area to be analyzed

  • dyn_cases (list) -- List of cases to be analyzed

  • dynddr (string) -- Address of the dynamic data

  • times (numpy array) -- Time data

  • dyn_startdate (string) -- Starting date for the data

  • dyn_enddate (string) -- Ending date for the data

  • filever (string) -- Name of the file to be written

Returns:

ver_d -- Verification data in numpy format

Return type:

numpy array

AIModels.AIutil.eof_to_grid_new(choice, field, verification, forecasts, times, INX=None, params=None, truncation=None)[source]

Transform from the EOF representation to the grid representation, putting the data in the stacked format (Np, Tpredict, gridpoints). Np is the total number of cases, given by $N_p = L - TIN - Tpredict + 1$, where L is the total length of the test period, Tpredict is the number of lead times, and gridpoints is the number of grid points in the field. All fields start at month 1 of the prediction.

Parameters:
  • choice (string) -- Choice of data to be analyzed, either test, validation or training

  • field (string) -- Field to be analyzed

  • verification (numpy array) -- Verification data from reanalysis

  • forecasts (numpy array) -- Forecasts data from the network

  • times (DateTime) -- Time data for entire period

  • INX (dict) -- Dictionary with the information for the fields to be analyzed

  • truncation (int) -- Number of modes to be retained in the observations

Returns:

The routine returns arrays as xarray datasets of shape (Np, lead, grid points) in stacked format:

  • Fcst (xarray dataset) -- Forecast data in grid representation

  • Obs (xarray dataset) -- Observations data in grid representation

  • Ver (xarray dataset) -- Verification data in grid representation

  • Per (xarray dataset) -- Persistence data in grid representation
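The case count used by eof_to_grid_new can be checked with a short worked example. TIN is not defined in this section; the sketch assumes it is the length of the input window that precedes each forecast. The function names and the numbers are illustrative only.

```python
def count_cases(L, TIN, Tpredict):
    """Number of cases from the formula N_p = L - TIN - Tpredict + 1."""
    return L - TIN - Tpredict + 1


def enumerate_cases(L, TIN, Tpredict):
    """List (input window, forecast window) index ranges that fit inside L."""
    cases = []
    for start in range(L):
        stop = start + TIN + Tpredict
        if stop > L:
            break  # the forecast window would run past the end of the period
        cases.append((range(start, start + TIN), range(start + TIN, stop)))
    return cases


# A 24-month test period, 12 input months, 6 lead times.
L, TIN, Tpredict = 24, 12, 6
cases = enumerate_cases(L, TIN, Tpredict)
print(count_cases(L, TIN, Tpredict), len(cases))
```

Enumerating the windows explicitly gives the same count as the formula, and the last forecast window ends on the final month of the period.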

AIModels.AIutil.matrix_rank_light(X, S)[source]

Compute the rank of a matrix using the singular values

Parameters:
  • X (numpy array) -- Matrix to be analyzed

  • S (numpy array) -- Singular values of the matrix

Returns:

rank -- Rank of the matrix

Return type:

int
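When the singular values are already available, the rank can be obtained by counting those above a tolerance. The sketch below assumes the default tolerance used by numpy.linalg.matrix_rank; the tolerance actually used by matrix_rank_light is not stated here, and matrix_rank_sketch is a hypothetical name.

```python
import numpy as np


def matrix_rank_sketch(X, S):
    """Count singular values above a tolerance scaled by matrix size."""
    # Same default tolerance convention as numpy.linalg.matrix_rank.
    tol = S.max() * max(X.shape) * np.finfo(S.dtype).eps
    return int(np.sum(S > tol))


# A 4x3 matrix whose third column is the sum of the first two.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 1.0, 3.0],
              [1.0, 3.0, 4.0]])
S = np.linalg.svd(X, compute_uv=False)
print(matrix_rank_sketch(X, S))
```

Reusing singular values from an earlier decomposition avoids a second SVD, which is presumably why the routine takes S as an argument.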