zapata.computation module¶
- zapata.computation.smooth_xarray(X, sigma=5, order=0, mode='wrap')[source]¶
Smooth xarray X with a Gaussian filter.
It uses a routine from scipy ndimage (ndimage.gaussian_filter); see its documentation page for full details. The filter is applied to all dimensions. The filter can be used for periodic fields; in that case the correct setting of mode is 'wrap'.
- Parameters:
X -- Input Xarray
sigma -- Standard deviation for the Gaussian kernel
order -- Order of the smoothing, 0 is a simple convolution
mode -- The mode parameter determines how the input array is extended when the filter overlaps a border. By passing a sequence of modes, with length equal to the number of dimensions of the input array, different modes can be specified along each axis. The default for this function is 'wrap' (scipy's own default is 'reflect').
The valid values and their behaviors are as follows:
‘reflect’ (d c b a | a b c d | d c b a) The input is extended by reflecting about the edge of the last pixel.
‘constant’ (k k k k | a b c d | k k k k) The input is extended by filling all values beyond the edge with the same constant value, defined by the cval parameter.
‘nearest’ (a a a a | a b c d | d d d d) The input is extended by replicating the last pixel.
‘mirror’ (d c b | a b c d | c b a) The input is extended by reflecting about the center of the last pixel.
‘wrap’ (a b c d | a b c d | a b c d) The input is extended by wrapping around to the opposite edge.
- Returns:
smooth_array
- Return type:
numpy array
Examples
Smooth a X[lat,lon] array with nearest repetition in lat and periodicity in lon
>>> smooth_xarray(X, sigma=5, order=0, mode=['nearest','wrap'])
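As a sketch of what this wraps, the same per-axis smoothing can be done directly with scipy's ndimage.gaussian_filter; the array below is a hypothetical stand-in for the values of an X[lat,lon] xarray:

```python
import numpy as np
from scipy import ndimage

# Hypothetical stand-in for the values of an X[lat, lon] xarray.
field = np.random.default_rng(0).normal(size=(90, 180))

# Per-axis modes, as in the example above: 'nearest' along lat (axis 0)
# and 'wrap' along lon (axis 1) for a field periodic in longitude.
smoothed = ndimage.gaussian_filter(field, sigma=5, order=0,
                                   mode=['nearest', 'wrap'])
```

Smoothing leaves the shape unchanged while reducing the variance of the field.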
- zapata.computation.anomaly(var, option='anom', freq='month')[source]¶
Compute Anomalies according to option
- Parameters:
var (xarray) -- array to compute anomalies
option --
- Option controlling the type of anomaly calculation
deviation
Subtract the time mean of the time series
deviation_std
Subtract the time mean and normalize by standard deviation
anom
Compute anomalies from monthly climatology
anomstd
Compute standardized anomalies from monthly climatology
None
Return unchanged data
freq -- Frequency of data
- Returns:
anom
- Return type:
xarray
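A minimal sketch of what the 'anom' option conceptually does, assuming monthly data; the series below is fabricated and the variable names are illustrative:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series with a pure seasonal cycle.
time = pd.date_range("2000-01-01", periods=48, freq="MS")
var = xr.DataArray(np.sin(2 * np.pi * time.month / 12),
                   coords={"time": time}, dims="time")

# 'anom': subtract the monthly climatology from each month.
clim = var.groupby("time.month").mean("time")
anom = var.groupby("time.month") - clim
# A purely seasonal series has zero anomalies.
```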
- class zapata.computation.Xmat(X, dims: Hashable | Sequence[Hashable] | None = None, option=None)[source]¶
Bases:
object
This class creates xarrays in vector mathematical form.
The xarray is stacked along the dims dimensions, with the spatial points as column vectors and time along the columns.
Specifying the parameter option as DropNaN drops all NaN values; the matrix can then be reconstructed using the expand method.
- Parameters:
X (xarray) -- xarray of at least two dimensions
dims -- Dimensions to be stacked, Default ('lat','lon')
option -- Options for Xmat creation. None : Keep NaN (default). DropNaN : Drop NaN values
- Variables:
A (xarray) -- Stacked matrix of type xarray
_ntime -- Number of time points
_npoints -- Number of spatial points
_F (xarray) -- Original matrix of type xarray, retained with its NaN values
Examples
Create a stacked data matrix along the 'lon' 'lat' dimension
>>> Z = Xmat(X, dims=('lat','lon'))
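Under stated assumptions, the stacking can be sketched directly with xarray's stack; the names below are illustrative, not the class internals:

```python
import numpy as np
import xarray as xr

# Hypothetical field: 2 time steps on a 3x4 lat-lon grid.
X = xr.DataArray(np.arange(24.0).reshape(2, 3, 4),
                 dims=("time", "lat", "lon"))

# Stack the spatial dimensions into one axis, then put space first:
# a (npoints x ntime) data matrix, spatial points as column vectors.
A = X.stack(z=("lat", "lon")).transpose("z", "time")
```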
- A¶
- expand()[source]¶
Unroll Xmat matrix to xarray
Examples
Unroll a stacked and NaN-dropped matrix X
>>> Xlatlon = X.expand()
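Conceptually, expand is the inverse of the stacking step; a sketch with plain xarray (hypothetical data and names):

```python
import numpy as np
import xarray as xr

X = xr.DataArray(np.arange(12.0).reshape(3, 4),
                 dims=("lat", "lon"),
                 coords={"lat": [0, 1, 2], "lon": [0, 1, 2, 3]})

stacked = X.stack(z=("lat", "lon"))   # roll the grid into one axis
restored = stacked.unstack("z")       # what expand() conceptually does
```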
- svd(N=10)[source]¶
Compute the SVD of the data matrix A.
The calculation is done so that the modes are equivalent to EOFs (Empirical Orthogonal Functions).
- Parameters:
N -- Number of modes desired. If it is larger than the number of time levels then it is set to the maximum
- Returns:
out --
- Dictionary including
Pattern
EOF patterns
Singular_Values
Singular Values
Coefficient
Time Coefficients
Varex
Variance Explained
- Return type:
dictionary
Examples
>>> out = Z.svd(N=10)
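A sketch of the equivalence between the SVD of a centred data matrix and EOF analysis; the dictionary keys mirror those listed above, but the data and variable names are fabricated for illustration:

```python
import numpy as np

# Hypothetical centred data matrix A: npoints x ntime.
rng = np.random.default_rng(1)
A = rng.normal(size=(100, 30))
A -= A.mean(axis=1, keepdims=True)

# Columns of U are the EOF patterns, s the singular values,
# and the rows of Vt the time coefficients.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

N = 10
out = {
    "Pattern": U[:, :N],
    "Singular_Values": s[:N],
    "Coefficient": Vt[:N, :],
    "Varex": s[:N] ** 2 / np.sum(s ** 2),  # fraction of variance explained
}
```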
- corr(y, Dim='time', option=None)[source]¶
Compute correlation of data matrix A with index y.
This method computes the correlation of the data matrix with an index of the same length as the time dimension of A.
The p-value returned by corr is two-sided: for a given sample with correlation coefficient r, it is the probability that the absolute value of the correlation of random samples x' and y', drawn from a population with zero correlation, would be greater than or equal to the computed correlation. The algorithm is taken from scipy.stats.pearsonr, which can be consulted for the full reference.
- Parameters:
y (xarray) -- Index; must have the same length as the time dimension
option (str) --
'probability' : Returns the probability (p-value) that an uncorrelated random sample would yield a correlation at least as large
'significance' : Returns the significance level (1 - p-value)
- Returns:
According to option
* None -- corr : Correlation array
* 'Probability' -- corr : Correlation array; prob : p-value array
* 'Significance' -- corr : Correlation array; prob : Significance array
Examples
Correlation of data matrix Z with index
>>> corr = Z.corr(index)
>>> corr, p = Z.corr(index, 'Probability')
>>> corr, s = Z.corr(index, 'Significance')
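The per-point calculation can be sketched with scipy.stats.pearsonr; the data below are synthetic, and the real method operates on the stacked data matrix:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 30))           # npoints x ntime data matrix
y = A[0] + 0.1 * rng.normal(size=30)    # index built from point 0

# Correlate every spatial point with the index; pearsonr returns the
# correlation and the two-sided p-value described above.
corr = np.empty(A.shape[0])
prob = np.empty(A.shape[0])
for i, row in enumerate(A):
    corr[i], prob[i] = stats.pearsonr(row, y)
```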
- zapata.computation.feature_to_input(k, num, PsiX, Proj, icstart=0.15)[source]¶
Transform from Feature space to input space.
It computes an approximate back-image for Gaussian kernels and an exact back-image for kernels based on a scalar product whose nonlinear map can be inverted.
Still under development.
- Parameters:
k (Kernel) -- Kernel to be used
num -- Number of RKHS vectors to transform
PsiX (array (npoints,ntime)) -- Original data defining the kernel
Proj -- Projection coefficients on the feature space
icstart -- Starting value for the iteration
- Returns:
back_image
- Return type:
array(npoints, num)
- zapata.computation.make_random_index(dskm, inda, X, arealat, arealon)[source]¶
Generate an index from random sampling of modes
It computes an index defined over an area using a random sampling of the modes.
Still under development.
- Parameters:
dskm (Data Set) -- Data set containing the Koopman decomposition
inda -- Number of modes to be considered in the random reconstruction
X -- Data array with geographical information
arealat -- Latitudinal boundaries of index calculation
arealon -- Longitudinal boundaries of index calculation
- Returns:
Correlation coefficient
- Return type:
c
- zapata.computation.zonal_average_era5(*, var: str = 'T', levels: List[int] | None = None, epoch: str = 'V5', root: str | Path = '.', time_mean: bool = False, output_file: str | Path | None = None, plot: bool = False, cmap: str = 'viridis', levels_cont: int | List[float] | None = None, vmin: float | None = None, vmax: float | None = None, title: str | None = None) Dataset [source]¶
Compute zonal‑mean ERA5 fields (optional time mean).¶
This helper reads a sequence of single‑pressure‑level ERA5 NetCDF files whose names follow the CMCC naming convention and stacks them into a latitude–pressure section.
` {root}/{var}/{var}_{level}_{epoch}.nc `
It performs longitude averaging (always) and, optionally, time averaging. The files are concatenated into a single xarray.Dataset whose dimensions are
Latitude – degrees north (inherited from the source files)
pressure – pressure level (hPa) – one entry per input file
time (optional) – kept only when no time‑mean is requested
The function is deliberately written without open_mfdataset to keep memory usage low: each file is opened, processed, and closed before moving to the next.
- Parameters:
var (str, default "T") -- Variable name (both inside the NetCDF files and in the filename).
levels (list[int] | None) -- Pressure levels (hPa). If None, the default CMCC set
[10, 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]
is used.
epoch (str, default "V5") -- Experiment / version tag that appears at the end of each filename.
root (str | Path, default ".") -- Base directory that contains one sub‑folder per variable.
time_mean (bool, default *False*) -- If True the field is averaged over the time dimension.
output_file (str | Path | None) -- Write the resulting dataset to this NetCDF file when provided.
plot (bool, default *False*) -- Produce a latitude–pressure plot (via mapping.zonal_plot).
cmap, levels_cont, vmin, vmax, title -- Customisation options forwarded to the plotting backend.
- Returns:
Dataset with coordinates Latitude and pressure (and optionally time), containing a single data variable named var.
- Return type:
xr.Dataset
Examples
Basic longitude mean (keep time):
>>> import zapata.computation as zcom
>>> ds = zcom.zonal_average_era5(var="T", root="/data/ERA5", plot=True)
Time and longitude mean:
>>> ds = zcom.zonal_average_era5(
...     var="U", levels=[100, 200, 500], epoch="V4",
...     root="/data/ERA5", time_mean=True,
...     output_file="U_zonal_mean.nc")
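The per-level loop described above can be sketched as follows; the random DataArrays are fabricated stand-ins for the per-level NetCDF files read from disk, and all names are illustrative:

```python
import numpy as np
import xarray as xr

lat = np.linspace(-90, 90, 19)
lon = np.linspace(0, 350, 36)
levels = [500, 850]

sections = []
for lev in levels:
    # The real function opens one {root}/{var}/{var}_{level}_{epoch}.nc
    # file here; a random field stands in for the file contents.
    da = xr.DataArray(np.random.default_rng(lev).normal(size=(19, 36)),
                      coords={"Latitude": lat, "Longitude": lon},
                      dims=("Latitude", "Longitude"))
    sections.append(da.mean("Longitude"))   # zonal (longitude) mean

# Concatenate the sections into a latitude-pressure dataset.
zonal = xr.concat(sections, dim="pressure").assign_coords(pressure=levels)
```

Each level is processed and released before the next one is touched, which keeps peak memory at one field at a time.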