cryptodatapy.transform.od
Classes
OutlierDetection – Detects outliers.
Module Contents
- class cryptodatapy.transform.od.OutlierDetection(raw_df: pandas.DataFrame, excl_cols: str | list | None = None, log: bool = False, window_size: int = 7, model_type: str = 'estimation', thresh_val: int = 5, plot: bool = False, plot_series: tuple = ('BTC', 'close'))
Detects outliers.
- raw_df
- excl_cols = None
- log = False
- window_size = 7
- model_type = 'estimation'
- thresh_val = 5
- plot = False
- plot_series = ('BTC', 'close')
- df
- yhat = None
- outliers = None
- filtered_df = None
- log_transform() None
Log transform the dataframe.
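A minimal sketch of the input layout, assuming raw_df follows the same shape the Returns entries describe (DatetimeIndex at level 0, tickers at level 1, fields as columns). The tickers, field names and values are made up, and the log step uses plain NumPy as a stand-in rather than calling log_transform() itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 3 days x 2 tickers, fields as columns.
dates = pd.date_range("2023-01-01", periods=3, freq="D")
idx = pd.MultiIndex.from_product([dates, ["BTC", "ETH"]], names=["date", "ticker"])
raw_df = pd.DataFrame(
    {
        "close": [100.0, 10.0, 110.0, 11.0, 121.0, 12.1],
        "volume": [5.0, 50.0, 6.0, 60.0, 7.0, 70.0],
    },
    index=idx,
)

# Natural log of every field; assumes all values are strictly positive.
log_df = np.log(raw_df)
```

Working in logs makes a move from 100 to 110 and a move from 10 to 11 the same size, which helps when one threshold is applied across tickers with very different price levels.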
- atr() pandas.DataFrame
Detects outliers using OHLC values and the high-low (H-L) range.
- Returns:
filtered_df – Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
- Return type:
pd.DataFrame - MultiIndex
- iqr() pandas.DataFrame
Detects outliers using interquartile range (IQR) method.
- Returns:
filtered_df – Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
- Return type:
pd.DataFrame - MultiIndex
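The IQR rule can be sketched on a single series as follows. This is an illustration of the general technique, not the library's implementation: the helper name `iqr_outliers`, the trailing window, and the multiplier `k = 1.5` are assumptions chosen for the example.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, window: int = 7, k: float = 1.5) -> pd.Series:
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] of a trailing window."""
    roll = s.rolling(window, min_periods=window)
    q1 = roll.quantile(0.25)
    q3 = roll.quantile(0.75)
    iqr = q3 - q1
    # Comparisons against NaN bounds (incomplete windows) evaluate to False.
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Made-up prices with one obvious spike at position 7.
prices = pd.Series([10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 50.0, 10.1, 10.0])
flags = iqr_outliers(prices)
filtered = prices.mask(flags)  # flagged values become NaN
```

Because quartiles ignore how far the extreme values lie from the centre, the spike barely moves the bounds of the windows that contain it.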
- mad() pandas.DataFrame
Detects outliers using the median absolute deviation (MAD) method, also known as the Hampel filter.
- Returns:
filtered_df – Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
- Return type:
pd.DataFrame - MultiIndex
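A minimal sketch of a Hampel filter on one series, assuming a centered window and the usual 1.4826 scale factor that makes the MAD comparable to a standard deviation under normality. The helper name `hampel_outliers` and its defaults are hypothetical; the class may compute its window and threshold differently.

```python
import numpy as np
import pandas as pd

def hampel_outliers(s: pd.Series, window: int = 7, thresh: float = 5.0) -> pd.Series:
    """Flag points more than `thresh` scaled MADs away from the window median."""
    roll = s.rolling(window, center=True, min_periods=1)
    med = roll.median()
    # MAD of each window: median of |x - median(window)|.
    mad = roll.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    return (s - med).abs() > thresh * 1.4826 * mad

# Made-up prices with one spike at position 7.
prices = pd.Series([10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 50.0, 10.1, 10.0, 10.2])
flags = hampel_outliers(prices)
filtered = prices.mask(flags)  # flagged values become NaN
```

Both the centre (median) and the scale (MAD) are robust, so the spike does not inflate its own threshold the way it would with a mean and standard deviation.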
- z_score() pandas.DataFrame
Detects outliers using a z-score method based on a simple moving average.
- Returns:
filtered_df – Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
- Return type:
pd.DataFrame - MultiIndex
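The rolling z-score idea can be sketched as below. The helper name `zscore_outliers` is hypothetical, and lagging the window by one observation is a choice made for this example, not something the source states about the class.

```python
import pandas as pd

def zscore_outliers(s: pd.Series, window: int = 7, thresh: float = 5.0) -> pd.Series:
    """Flag points whose z-score against the preceding window exceeds `thresh`."""
    # shift(1) keeps the current point out of its own mean and standard deviation.
    prior = s.shift(1).rolling(window, min_periods=window)
    z = (s - prior.mean()) / prior.std()
    return z.abs() > thresh

# Made-up prices with one spike at position 7.
prices = pd.Series([10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 50.0, 10.1, 10.0])
flags = zscore_outliers(prices)
filtered = prices.mask(flags)  # flagged values become NaN
```

If the current point were included in a 7-observation window, |z| computed with the sample standard deviation could never exceed (n - 1) / sqrt(n) ≈ 2.27, so a threshold of 5 would never fire; that is why this sketch uses only prior observations.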
- ewma() pandas.DataFrame
Detects outliers using an exponential moving average method.
- Returns:
filtered_df – Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
- Return type:
pd.DataFrame - MultiIndex
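A comparable sketch with exponentially weighted statistics, assuming the window size is used as the `span` of the weighting. The helper name `ewma_outliers` and the one-step lag are assumptions for illustration only.

```python
import pandas as pd

def ewma_outliers(s: pd.Series, span: int = 7, thresh: float = 5.0) -> pd.Series:
    """Flag points far from the exponentially weighted mean of earlier values."""
    prev = s.shift(1)  # use only information available before each point
    mean = prev.ewm(span=span, adjust=False).mean()
    std = prev.ewm(span=span, adjust=False).std()
    return ((s - mean) / std).abs() > thresh

# Made-up prices with one spike at position 7.
prices = pd.Series([10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 50.0, 10.1, 10.0])
flags = ewma_outliers(prices)
filtered = prices.mask(flags)  # flagged values become NaN
```

Compared with a simple moving average, the exponential weights let the estimate react faster to recent observations, at the cost of a spike lingering in the mean and standard deviation for several periods afterwards.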
- seasonal_decomp(period: int = 7, model: str = 'additive', filt: numpy.array | None = None, two_sided: bool | None = True, extrapolate_trend: int | None = 0) Dict[str, pandas.DataFrame]
Detects outliers using seasonal decomposition by moving averages from statsmodels.
- Parameters:
period (int, optional, default 7) – Periodicity of the sequence.
model (str, {'additive', 'multiplicative'}, default 'additive') – Type of seasonal component.
filt (array-like, optional, default None) – The filter coefficients for filtering out the seasonal component. The concrete moving average method used in filtering is determined by two_sided.
two_sided (bool, optional, default True) – The moving average method used in filtering. If True (default), a centered moving average is computed using the filt. If False, the filter coefficients are for past values only.
extrapolate_trend (int, optional, default 0) – If set to > 0, the trend resulting from the convolution is linear least-squares extrapolated on both ends (or the single one if two_sided is False) considering this many (+1) closest points. If set to ‘freq’, use freq closest points. Setting this parameter results in no NaN values in trend or resid components.
- Returns:
filtered_df – Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
- Return type:
pd.DataFrame - MultiIndex
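To show what an additive decomposition with a centered moving average does, here is a hand-rolled sketch in pandas. It is not the statsmodels routine the method wraps: the helper `additive_decomp` is hypothetical, it assumes an odd period, and the residual z-score rule with a threshold of 3 is an assumption chosen for this toy series.

```python
import numpy as np
import pandas as pd

def additive_decomp(s: pd.Series, period: int = 7) -> pd.DataFrame:
    """Hand-rolled additive decomposition; assumes odd period and len(s) >= period."""
    trend = s.rolling(period, center=True).mean()            # centered moving average
    detrended = s - trend
    pos = np.arange(len(s)) % period                         # position within each cycle
    seasonal_means = detrended.groupby(pos).mean()           # average effect per position
    seasonal_means = seasonal_means - seasonal_means.mean()  # centre the effects on zero
    seasonal = pd.Series(seasonal_means.to_numpy()[pos], index=s.index)
    resid = s - trend - seasonal
    return pd.DataFrame({"trend": trend, "seasonal": seasonal, "resid": resid})

# Made-up series: linear trend + weekly pattern + one injected spike at position 17.
t = np.arange(35)
pattern = np.array([3.0, -1.0, -2.0, 0.0, 1.0, -2.0, 1.0])
values = 100 + 0.5 * t + pattern[t % 7]
values[17] += 30.0
s = pd.Series(values, index=pd.date_range("2023-01-01", periods=35, freq="D"))

parts = additive_decomp(s)
z = (parts["resid"] - parts["resid"].mean()) / parts["resid"].std()
flags = z.abs() > 3.0  # NaN residuals at the edges compare as False
```

The first and last three trend values are NaN because the centered window is incomplete there, which is the gap that extrapolate_trend is described as filling.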
- stl(period: int | None = 7, seasonal: int | None = 7, trend: int | None = None, low_pass: int | None = None, seasonal_deg: int | None = 1, trend_deg: int | None = 1, low_pass_deg: int | None = 1, robust: bool | None = False, seasonal_jump: int | None = 1, trend_jump: int | None = 1, low_pass_jump: int | None = 1) pandas.DataFrame
Detects outliers using STL (seasonal-trend decomposition using LOESS) from statsmodels.
- Parameters:
period (int, optional, default 7) – Periodicity of the sequence.
seasonal (int, optional, default 7) – Length of the seasonal smoother. Must be an odd integer, and should normally be >= 7.
trend (int, optional, default None) – Length of the trend smoother. Must be an odd integer. If not provided uses the smallest odd integer greater than 1.5 * period / (1 - 1.5 / seasonal), following the suggestion in the original implementation.
low_pass (int, optional, default None) – Length of the low-pass filter. Must be an odd integer >=3. If not provided, uses the smallest odd integer > period.
seasonal_deg (int, optional, default 1) – Degree of seasonal LOESS. 0 (constant) or 1 (constant and trend).
trend_deg (int, optional, default 1) – Degree of trend LOESS. 0 (constant) or 1 (constant and trend).
low_pass_deg (int, optional, default 1) – Degree of low pass LOESS. 0 (constant) or 1 (constant and trend).
robust (bool, optional, default False) – Flag indicating whether to use a weighted version that is robust to some forms of outliers.
seasonal_jump (int, optional, default 1) – Positive integer determining the linear interpolation step. If larger than 1, the LOESS is used every seasonal_jump points and values between the fitted points are linearly interpolated. Higher values reduce estimation time.
trend_jump (int, optional, default 1) – Positive integer determining the linear interpolation step. If larger than 1, the LOESS is used every trend_jump points and values between the two are linearly interpolated. Higher values reduce estimation time.
low_pass_jump (int, optional, default 1) – Positive integer determining the linear interpolation step. If larger than 1, the LOESS is used every low_pass_jump points and values between the two are linearly interpolated. Higher values reduce estimation time.
- Returns:
filtered_df – Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
- Return type:
pd.DataFrame - MultiIndex
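The default smoother lengths described for trend and low_pass can be worked out by hand. The sketch below only restates those two rules in code; the helper names are hypothetical and nothing here calls statsmodels.

```python
import math

def default_trend_length(period: int, seasonal: int) -> int:
    """Smallest odd integer greater than 1.5 * period / (1 - 1.5 / seasonal)."""
    bound = 1.5 * period / (1 - 1.5 / seasonal)
    n = math.floor(bound) + 1  # smallest integer strictly greater than bound
    return n if n % 2 == 1 else n + 1

def default_low_pass_length(period: int) -> int:
    """Smallest odd integer greater than period."""
    n = period + 1
    return n if n % 2 == 1 else n + 1

trend_len = default_trend_length(period=7, seasonal=7)   # bound is about 13.36
low_pass_len = default_low_pass_length(period=7)
print(trend_len, low_pass_len)  # 15 9
```

With the method's defaults (period=7, seasonal=7), leaving trend and low_pass as None therefore corresponds to smoother lengths of 15 and 9 under these rules.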
- prophet(interval_width: float | None = 0.999) pandas.DataFrame
Detects outliers using Prophet, a time series forecasting algorithm published by Facebook.
- Parameters:
interval_width (float, optional, default 0.999) – Width of the uncertainty interval, estimated by Monte Carlo simulation. The larger the value, the wider the interval between the upper and lower thresholds used for outlier detection.
- Returns:
filtered_df – Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
- Return type:
pd.DataFrame - MultiIndex
- plot_outliers() None
Plots time series with outliers highlighted (red dots).