cryptodatapy.transform.od
=========================

.. py:module:: cryptodatapy.transform.od


Classes
-------

.. autoapisummary::

   cryptodatapy.transform.od.OutlierDetection


Module Contents
---------------

.. py:class:: OutlierDetection(raw_df: pandas.DataFrame, excl_cols: Optional[Union[str, list]] = None, log: bool = False, window_size: int = 7, model_type: str = 'estimation', thresh_val: int = 5, plot: bool = False, plot_series: tuple = ('BTC', 'close'))

   Detects outliers.


   .. py:attribute:: raw_df


   .. py:attribute:: excl_cols
      :value: None


   .. py:attribute:: log
      :value: False


   .. py:attribute:: window_size
      :value: 7


   .. py:attribute:: model_type
      :value: 'estimation'


   .. py:attribute:: thresh_val
      :value: 5


   .. py:attribute:: plot
      :value: False


   .. py:attribute:: plot_series
      :value: ('BTC', 'close')


   .. py:attribute:: df


   .. py:attribute:: yhat
      :value: None


   .. py:attribute:: outliers
      :value: None


   .. py:attribute:: filtered_df
      :value: None


   .. py:method:: log_transform() -> None

      Log-transforms the dataframe.


   .. py:method:: atr() -> pandas.DataFrame

      Detects outliers using OHLC values and the high-low (H-L) range.

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex


   .. py:method:: iqr() -> pandas.DataFrame

      Detects outliers using the interquartile range (IQR) method.

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex


   .. py:method:: mad() -> pandas.DataFrame

      Detects outliers using a median absolute deviation (MAD) method, aka Hampel filter.

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex


   .. py:method:: z_score() -> pandas.DataFrame

      Detects outliers using a z-score method, aka simple moving average.


      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex


   .. py:method:: ewma() -> pandas.DataFrame

      Detects outliers using an exponential moving average method.

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex


   .. py:method:: seasonal_decomp(period: int = 7, model: str = 'additive', filt: Optional[numpy.array] = None, two_sided: Optional[bool] = True, extrapolate_trend: Optional[int] = 0) -> Dict[str, pandas.DataFrame]

      Detects outliers with seasonal decomposition moving averages from statsmodels.

      https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html#statsmodels.tsa.seasonal.seasonal_decompose

      :param period: Periodicity of the sequence.
      :type period: int, optional, default 7
      :param model: Type of seasonal component.
      :type model: str, {'additive', 'multiplicative'}, default 'additive'
      :param filt: The filter coefficients for filtering out the seasonal component.
                   The concrete moving average method used in filtering is determined by two_sided.
      :type filt: array-like, optional, default None
      :param two_sided: The moving average method used in filtering.
                        If True (default), a centered moving average is computed using the filt.
                        If False, the filter coefficients are for past values only.
      :type two_sided: bool, optional, default True
      :param extrapolate_trend: If set to > 0, the trend resulting from the convolution is linear least-squares extrapolated on both ends (or the single one if two_sided is False) considering this many (+1) closest points. If set to 'freq', use freq closest points. Setting this parameter results in no NaN values in trend or resid components.
      :type extrapolate_trend: int, optional, default 0

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex


   .. py:method:: stl(period: Optional[int] = 7, seasonal: Optional[int] = 7, trend: Optional[int] = None, low_pass: Optional[int] = None, seasonal_deg: Optional[int] = 1, trend_deg: Optional[int] = 1, low_pass_deg: Optional[int] = 1, robust: Optional[bool] = False, seasonal_jump: Optional[int] = 1, trend_jump: Optional[int] = 1, low_pass_jump: Optional[int] = 1) -> pandas.DataFrame

      Detects outliers with seasonal-trend decomposition using LOESS (STL) from statsmodels.

      https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html

      :param period: Periodicity of the sequence.
      :type period: int, optional, default 7
      :param seasonal: Length of the seasonal smoother. Must be an odd integer, and should normally be >= 7.
      :type seasonal: int, optional, default 7
      :param trend: Length of the trend smoother. Must be an odd integer.
                    If not provided, uses the smallest odd integer greater than 1.5 * period / (1 - 1.5 / seasonal),
                    following the suggestion in the original implementation.
      :type trend: int, optional, default None
      :param low_pass: Length of the low-pass filter. Must be an odd integer >= 3.
                       If not provided, uses the smallest odd integer > period.
      :type low_pass: int, optional, default None
      :param seasonal_deg: Degree of seasonal LOESS. 0 (constant) or 1 (constant and trend).
      :type seasonal_deg: int, optional, default 1
      :param trend_deg: Degree of trend LOESS. 0 (constant) or 1 (constant and trend).
      :type trend_deg: int, optional, default 1
      :param low_pass_deg: Degree of low-pass LOESS. 0 (constant) or 1 (constant and trend).
      :type low_pass_deg: int, optional, default 1
      :param robust: Flag indicating whether to use a weighted version that is robust to some forms of outliers.
      :type robust: bool, optional, default False
      :param seasonal_jump: Positive integer determining the linear interpolation step.
                            If larger than 1, the LOESS is used every seasonal_jump points and values between the
                            fitted points are linearly interpolated. Higher values reduce estimation time.
      :type seasonal_jump: int, optional, default 1
      :param trend_jump: Positive integer determining the linear interpolation step.
                         If larger than 1, the LOESS is used every trend_jump points and values between the two
                         are linearly interpolated. Higher values reduce estimation time.
      :type trend_jump: int, optional, default 1
      :param low_pass_jump: Positive integer determining the linear interpolation step.
                            If larger than 1, the LOESS is used every low_pass_jump points and values between the
                            two are linearly interpolated. Higher values reduce estimation time.
      :type low_pass_jump: int, optional, default 1

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex


   .. py:method:: prophet(interval_width: Optional[float] = 0.999) -> pandas.DataFrame

      Detects outliers using Prophet, a time series forecasting algorithm published by Facebook.

      :param interval_width: Width of the uncertainty interval, estimated by Monte Carlo simulation.
                             The larger the value, the wider the interval between the upper and lower
                             thresholds used for outlier detection.
      :type interval_width: float, optional, default 0.999

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex


   .. py:method:: plot_outliers() -> None

      Plots time series with outliers highlighted (red dots).
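
The median absolute deviation approach named in ``mad()`` can be illustrated with a small standalone sketch on a single price series. This is not the library's implementation: the helper ``hampel_filter``, the trailing (non-centered) window, and the 1.4826 scale constant are assumptions made for illustration. Only the defaults ``window_size=7`` and ``thresh_val=5`` are taken from the class signature above.

```python
import numpy as np
import pandas as pd

# Standard constant that makes the MAD comparable to a standard deviation
# for normally distributed data. Whether the library applies it is an assumption.
MAD_SCALE = 1.4826


def hampel_filter(series: pd.Series, window_size: int = 7, thresh_val: float = 5.0) -> pd.Series:
    """Hypothetical helper: replace rolling-MAD outliers in `series` with NaN."""
    # Robust estimate of the local level over a trailing window.
    rolling = series.rolling(window_size)
    med = rolling.median()
    # MAD of each window: median of absolute deviations from that window's median.
    mad = rolling.apply(lambda x: np.median(np.abs(x - np.median(x))), raw=True)
    # A point is an outlier if it sits more than thresh_val scaled MADs from the median.
    # Comparisons against NaN (incomplete windows) are False, so early points are kept.
    is_outlier = (series - med).abs() > thresh_val * MAD_SCALE * mad
    return series.mask(is_outlier)


# Toy close prices with one obvious spike at position 7.
prices = pd.Series([100, 101, 99, 100, 102, 101, 100, 500, 101, 99, 100, 102], dtype=float)
filtered = hampel_filter(prices)
```

Because both the level and the spread are medians, a single spike barely shifts the threshold, which is the usual reason for preferring this filter over a mean and standard deviation based z-score on noisy price data.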