AIModels.AIutil module¶
Utility Routines for ML Prediction Models¶
This module contains a set of routines for preparing data for use in transformer models for the prediction of multivariate time series. The supporting classes are contained in the companion file AIClasses.py and are imported by this module.
Utilities¶
- AIModels.AIutil.copy_dict(data, strip_values=False, remove_keys=[])[source]¶
Copy dictionary
- Parameters:
data (dict) -- Dictionary to be copied
strip_values (boolean) -- If True, strip values
remove_keys (list) -- List of keys to be removed
- Returns:
out -- Copied dictionary without the keys in remove_keys
- Return type:
dict
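A minimal usage sketch (the dictionary contents here are hypothetical):

    from AIModels.AIutil import copy_dict

    cfg = {'case': 'exp1', 'udat': None, 'sdat': None}
    # Copy the dictionary, dropping the EOF-related entries
    light = copy_dict(cfg, remove_keys=['udat', 'sdat'])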
- AIModels.AIutil.select_field(INX, outfield, verbose=False)[source]¶
Select the field outfield from the dictionary INX
- Parameters:
INX (dict) -- Dictionary with the fields to be analyzed
outfield (string) -- Field to be analyzed
verbose (boolean) -- If True, print the field information
- Returns:
out (numpy array) -- Field data
arealat (numpy array) -- Latitude limits of the field
arealon (numpy array) -- Longitude limits of the field
centlon (numpy array) -- Central longitude of the field
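A usage sketch, assuming INX was built by make_data_base and 'id1' is one of its (hypothetical) keys:

    from AIModels.AIutil import select_field

    out, arealat, arealon, centlon = select_field(INX, 'id1', verbose=True)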
- AIModels.AIutil.select_field_key(INX, outfield, dataname)[source]¶
Select the entry dataname for the field outfield in the dictionary INX
- Parameters:
INX (dict) -- Dictionary with the fields to be analyzed
outfield (string) -- Field to be analyzed
dataname (string) -- Key to be analyzed
- Returns:
out -- Field data
- Return type:
numpy array
- AIModels.AIutil.select_field_eof(INX, outfield)[source]¶
Select the EOF-related fields for outfield in the dictionary INX
- Parameters:
INX (dict) -- Dictionary with the fields to be analyzed
outfield (string) -- Field to be analyzed
- Returns:
U (numpy array) -- EOF modes
S (numpy array) -- Singular values
V (numpy array) -- EOF coefficients
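A sketch covering both key-based selection and EOF retrieval; the 'id1' key and the 'X' entry name follow the INX layout documented under make_data_base below:

    from AIModels.AIutil import select_field_key, select_field_eof

    # Pull one entry of the field's sub-dictionary
    X = select_field_key(INX, 'id1', 'X')
    # Retrieve the full EOF decomposition of the same field
    U, S, V = select_field_eof(INX, 'id1')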
- AIModels.AIutil.select_area(area, ZZ)[source]¶
Select area for dataset ZZ
- Parameters:
area (string) -- Area to be analyzed (currently only 'EUROPE' is implemented); possible values are
'TROPIC': Tropics
'GLOBAL': Global
'PACTROPIC': Pacific Tropics
'WORLD': World
'EUROPE': Europe
'NORTH_AMERICA': North America
'NH-ML': Northern Hemisphere Mid-Latitudes
ZZ (xarray dataset) -- Dataset to be subset
- Returns:
Z -- Dataset restricted to the selected area
- Return type:
xarray dataset
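A sketch of subsetting (only 'EUROPE' is implemented, per the note above; ZZ is assumed to be an xarray dataset with lat/lon coordinates):

    from AIModels.AIutil import select_area

    Z = select_area('EUROPE', ZZ)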
- AIModels.AIutil.make_matrix(S, SMOOTH, normalization, shift, **kwargs)[source]¶
Make matrix for SVD analysis
- Parameters:
S (xarray dataset) -- Dataset with the field to be analyzed
SMOOTH (boolean) -- If True, smooth the data
normalization (string) -- Type of normalization to be applied
shift (string) -- Type of shift to be applied to select the data
dropnan (boolean) -- Option to drop NaN values from Xmat. If True, uses the indexed dropna on Xmat; if False, uses the standard dropna from xarray
detrend (boolean) -- If True, detrend the data
- Returns:
X -- Data matrix prepared for EOF/SVD analysis
- Return type:
xarray dataset
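A call sketch; the option values shown ('STD', 'ERA5') are taken from the defaults documented for make_data_base, and dropnan/detrend are passed through **kwargs:

    from AIModels.AIutil import make_matrix

    X = make_matrix(S, SMOOTH=False, normalization='STD', shift='ERA5',
                    dropnan=True, detrend=True)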
- AIModels.AIutil.make_eof(X, mr, eof_interval=None)[source]¶
Compute EOF analysis on data matrix
- Parameters:
X (xarray dataset) -- Data matrix in X format, as produced by make_matrix
mr (int) -- Number of modes to be retained
eof_interval (list) -- List with the starting and ending date for EOF analysis
- Returns:
mr (int) -- Number of modes actually retained, which may be less than requested if the requested number exceeds the rank. Set to math.inf to keep all modes
var_retained (float) -- Percentage of variance retained
udat (numpy array) -- Matrix of EOF modes
vdat (numpy array) -- Matrix of EOF coefficients
sdat (numpy array) -- Vector of singular values
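A sketch retaining 20 modes (the mode count and the eof_interval dates are hypothetical):

    from AIModels.AIutil import make_eof

    mr, var_retained, udat, vdat, sdat = make_eof(
        X, 20, eof_interval=['1940-01-01', '1980-12-31'])
    # Pass mr=math.inf to keep all modes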
- AIModels.AIutil.make_field(*args, **kwargs)[source]¶
Make field for analysis
- Parameters:
area (string (positional argument)) -- Area to be analyzed, possible values are
'TROPIC': Tropics
'GLOBAL': Global
'PACTROPIC': Pacific Tropics
'WORLD': World
'EUROPE': Europe
'NORTH_AMERICA': North America
'NH-ML': Northern Hemisphere Mid-Latitudes
var (string (keyword argument)) -- Variable to be analyzed
level (string (keyword argument)) -- Level to be analyzed
period (string (keyword argument)) -- Period to be analyzed
version (string (keyword argument)) -- Version of the dataset
loc (string (keyword argument)) -- Location of the dataset
- Returns:
Z (xarray dataset) -- Field to be analyzed
arealat (numpy array) -- Latitude limits of the field
arealon (numpy array) -- Longitude limits of the field
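A usage sketch; the keyword values are illustrative, and the same keywords apply to the make_field_V5 and make_field_HAD variants documented below:

    from AIModels.AIutil import make_field

    Z, arealat, arealon = make_field('EUROPE', var='Z', level='500',
                                     period='ANN', version='V5', loc='DDIR')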
- AIModels.AIutil.make_field_V5(area, **kwargs)[source]¶
Make field for analysis
- Parameters:
area (string) -- Area to be analyzed, possible values are
'TROPIC': Tropics
'GLOBAL': Global
'PACTROPIC': Pacific Tropics
'WORLD': World
'EUROPE': Europe
'NORTH_AMERICA': North America
'NH-ML': Northern Hemisphere Mid-Latitudes
var (string) -- Variable to be analyzed
level (string) -- Level to be analyzed
period (string) -- Period to be analyzed
version (string) -- Version of the dataset
loc (string) -- Location of the dataset
- Returns:
Z (xarray dataset) -- Field to be analyzed
arealat (numpy array) -- Latitude limits of the field
arealon (numpy array) -- Longitude limits of the field
- AIModels.AIutil.make_field_HAD(area, **kwargs)[source]¶
Make field for analysis for HADSST
- Parameters:
area (string) -- Area to be analyzed, possible values are
'TROPIC': Tropics
'GLOBAL': Global
'PACTROPIC': Pacific Tropics
'WORLD': World
'EUROPE': Europe
'NORTH_AMERICA': North America
'NH-ML': Northern Hemisphere Mid-Latitudes
var (string) -- Variable to be analyzed
level (string) -- Level to be analyzed
period (string) -- Period to be analyzed
version (string) -- Version of the dataset
loc (string) -- Location of the dataset
- Returns:
Z (xarray dataset) -- Field to be analyzed
arealat (numpy array) -- Latitude limits of the field
arealon (numpy array) -- Longitude limits of the field
- AIModels.AIutil.make_data(INX, params)[source]¶
Prepare data for analysis and concatenate as needed. Modifies the input INX dictionary by adding scaler and index values for each field: scaler is the scaler used, and index is the index of the data in the concatenated matrix.
The convention for indices is that they point to the real date; Python ranges built from them must therefore account for the extra 1 at the end of the range (as shown in the sketch below).
- Parameters:
INX (dict) -- Dictionary with the fields to be analyzed
params (dict) -- Dictionary with the parameters for the analysis
- Returns:
datain (numpy array) -- Matrix with the data to be analyzed
INX (dict) -- Input dictionary updated with the information from the data analysis
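A call sketch, illustrating the index convention described above (the 'id1' key and the 'index' entry name are assumptions about the INX layout):

    from AIModels.AIutil import make_data

    datain, INX = make_data(INX, params)
    i0, i1 = INX['id1']['index']   # indices point to real dates
    block = datain[i0:i1 + 1]      # Python ranges need the extra 1 at the end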
- AIModels.AIutil.make_features(INX)[source]¶
Prepare features for analysis and compute features boundaries
- Parameters:
INX (dict) -- Dictionary with the fields to be analyzed
- Returns:
num_features (int) -- Number of features
m_limit (list) -- List with the boundaries of the features
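A minimal call sketch, assuming INX was built as above:

    from AIModels.AIutil import make_features

    num_features, m_limit = make_features(INX)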
- AIModels.AIutil.make_data_base(InputVars, period='ANN', version='V5', SMOOTH=False, normalization='STD', eof_interval=None, detrend=False, shift='ERA5', case=None, datatype='Source_data', location='DDIR')[source]¶
Organize data variables in data base INX
- Parameters:
InputVars (list) -- List of variables to be analyzed
period (string) -- Period to be analyzed
version (string) -- Version of the dataset
SMOOTH (boolean) -- If True, smooth the data
normalization (string) -- Type of normalization to be applied
eof_interval (list) -- List with the starting and ending date for EOF analysis
detrend (boolean) -- If True, detrend the data
shift (string) -- Type of shift to be applied to select the data
case (string) -- Case to be analyzed
datatype (string) -- Type of data to be analyzed, either Source_data or Target_data
location (string) -- Data directory
- Returns:
INX -- Dictionary with the fields to be analyzed, organized as follows: INX = {'id1': {'case': case, 'datatype': datatype, 'field': invar.name, 'level': inlevel, 'centlon': centlon, 'arealat': arealat, 'arealon': arealon, 'X': X, 'xdat': xdat, 'mr': mr, 'var_retained': varr, 'udat': udat, 'vdat': vdat, 'sdat': sdat}}
- Return type:
dict
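A sketch of assembling the data base with the documented defaults (InputVars is assumed to be the caller's list of variables):

    from AIModels.AIutil import make_data_base

    INX = make_data_base(InputVars, period='ANN', version='V5',
                         SMOOTH=False, normalization='STD', detrend=False,
                         shift='ERA5', datatype='Source_data', location='DDIR')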
- AIModels.AIutil.create_time_features(data_time, start, device)[source]¶
Create the past features for monthly means
- Parameters:
data_time (xarray dataset) -- Time data
start (datetime) -- Starting date
device (string) -- Device to be used for computation
- Returns:
pasft -- Tensor with the past features, organized on three levels:
The first level is the month sequence
The second level is the seasonal cycle
The third is the year
- Return type:
torch tensor
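A call sketch; the start date is hypothetical and device follows the usual torch convention:

    from datetime import datetime
    from AIModels.AIutil import create_time_features

    pasft = create_time_features(data_time, datetime(1990, 1, 1), 'cpu')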
- AIModels.AIutil.rescale(params, PDX, out_train0, out_val0, out_test0)[source]¶
Rescale data to original values, according to the scaling choice in params
- Parameters:
params (dict) -- Dictionary with the parameters for the analysis
PDX (dict) -- Dictionary with the information for the fields to be analyzed
out_train0 (numpy array) -- Training data
out_val0 (numpy array) -- Validation data
out_test0 (numpy array) -- Test data
- Returns:
out_train (numpy array) -- Rescaled training data
out_val (numpy array) -- Rescaled validation data
out_test (numpy array) -- Rescaled test data
true (numpy array) -- Original data
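A call sketch with the names used above:

    from AIModels.AIutil import rescale

    out_train, out_val, out_test, true = rescale(params, PDX, out_train0,
                                                 out_val0, out_test0)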
- AIModels.AIutil.make_dyn_verification(ver_f, area, dyn_cases, dynddr, times, dyn_startdate, dyn_enddate, filever)[source]¶
Make dynamic verification data
- Parameters:
ver_f (numpy array) -- Verification data
area (string) -- Area to be analyzed
dyn_cases (list) -- List of cases to be analyzed
dynddr (string) -- Address of the dynamic data
times (numpy array) -- Time data
dyn_startdate (string) -- Starting date for the data
dyn_enddate (string) -- Ending date for the data
filever (string) -- Name of the file to be written
- Returns:
ver_d -- Verification data in numpy format
- Return type:
numpy array
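A call sketch; the dates and the output file name are hypothetical:

    from AIModels.AIutil import make_dyn_verification

    ver_d = make_dyn_verification(ver_f, 'EUROPE', dyn_cases, dynddr, times,
                                  '2011-01-01', '2020-12-31', 'ver_dyn.nc')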
- AIModels.AIutil.eof_to_grid_new(choice, field, verification, forecasts, times, INX=None, params=None, truncation=None)[source]¶
Transform from the EOF representation to the grid representation, putting data in the stacked format (Np, Tpredict, gridpoints), where Np is the total number of cases, given by $N_p = L - T_{IN} - T_{predict} + 1$; L is the total length of the test period, Tpredict is the number of lead times, and gridpoints is the number of grid points in the field. All fields start at month 1 of the prediction.
- Parameters:
choice (string) -- Choice of data to be analyzed, either test, validation or training
field (string) -- Field to be analyzed
verification (numpy array) -- Verification data from reanalysis
forecasts (numpy array) -- Forecasts data from the network
times (DateTime) -- Time data for entire period
INX (dict) -- Dictionary with the information for the fields to be analyzed
params (dict) -- Dictionary with the parameters for the analysis
truncation (int) -- Number of modes to be retained in the observations
- Returns:
The routine returns arrays as xarray datasets in the stacked format (Np, lead, gridpoints):
Fcst (xarray dataset) -- Forecast data in grid representation
Obs (xarray dataset) -- Observations data in grid representation
Ver (xarray dataset) -- Verification data in grid representation
Per (xarray dataset) -- Persistence data in grid representation
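A sketch of mapping test-period forecasts back to grid space (the field name and truncation value are hypothetical):

    from AIModels.AIutil import eof_to_grid_new

    Fcst, Obs, Ver, Per = eof_to_grid_new('test', 'Z500', verification,
                                          forecasts, times, INX=INX,
                                          params=params, truncation=20)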