cryptodatapy.transform.od
=========================

.. py:module:: cryptodatapy.transform.od


Classes
-------

.. autoapisummary::

   cryptodatapy.transform.od.OutlierDetection


Module Contents
---------------

.. py:class:: OutlierDetection(raw_df: pandas.DataFrame, excl_cols: Optional[Union[str, list]] = None, log: bool = False, window_size: int = 7, model_type: str = 'estimation', thresh_val: int = 5, plot: bool = False, plot_series: tuple = ('BTC', 'close'))

   Detects and filters outliers in time series data using a range of statistical methods.


   .. py:attribute:: raw_df


   .. py:attribute:: excl_cols
      :value: None


   .. py:attribute:: log
      :value: False


   .. py:attribute:: window_size
      :value: 7


   .. py:attribute:: model_type
      :value: 'estimation'


   .. py:attribute:: thresh_val
      :value: 5


   .. py:attribute:: plot
      :value: False


   .. py:attribute:: plot_series
      :value: ('BTC', 'close')


   .. py:attribute:: df


   .. py:attribute:: yhat
      :value: None


   .. py:attribute:: outliers
      :value: None


   .. py:attribute:: filtered_df
      :value: None


   .. py:method:: log_transform() -> None

      Log-transforms the dataframe.


   .. py:method:: atr() -> pandas.DataFrame

      Detects outliers using OHLC values and the high-low (H-L) range.

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex


   .. py:method:: iqr() -> pandas.DataFrame

      Detects outliers using interquartile range (IQR) method.

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex
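
      As a rough illustration of the IQR idea, the sketch below flags values that fall outside the quartile fences of a plain list. It is not this method's implementation: the function name ``iqr_outliers``, the ``thresh`` multiplier of 1.5 and the sample prices are all assumptions made for the example.

      .. code-block:: python

         from statistics import quantiles

         def iqr_outliers(values, thresh=1.5):
             """Return indices of values outside [Q1 - thresh*IQR, Q3 + thresh*IQR]."""
             # Hypothetical helper; quartiles use the 'inclusive' method.
             q1, _, q3 = quantiles(values, n=4, method="inclusive")
             iqr = q3 - q1
             lower, upper = q1 - thresh * iqr, q3 + thresh * iqr
             return [i for i, v in enumerate(values) if v < lower or v > upper]

         # Made-up prices with one obvious spike at position 5.
         prices = [100, 101, 99, 102, 100, 250, 101, 98, 100, 99]
         outlier_idx = iqr_outliers(prices)

      A larger ``thresh`` widens the fences, so fewer points are flagged.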


   .. py:method:: mad() -> pandas.DataFrame

      Detects outliers using a median absolute deviation (MAD) method, also known as a Hampel filter.

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex
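
      The following is a minimal sketch of a Hampel-style filter on a plain list, meant only to show the idea. The centered window, the ``thresh`` of 3 and the function name ``hampel_outliers`` are assumptions for the example and may differ from what this method does internally.

      .. code-block:: python

         from statistics import median

         def hampel_outliers(values, window_size=7, thresh=3.0):
             """Flag points far from the local median, measured in scaled MADs."""
             k = 1.4826  # scales the MAD to a standard deviation for normal data
             half = window_size // 2
             flagged = []
             for i, v in enumerate(values):
                 # Centered window, truncated at the edges of the series.
                 window = values[max(0, i - half): i + half + 1]
                 med = median(window)
                 mad = median(abs(x - med) for x in window)
                 if abs(v - med) > thresh * k * mad:
                     flagged.append(i)
             return flagged

         # Made-up prices with one obvious spike at position 5.
         prices = [100, 101, 99, 102, 100, 250, 101, 98, 100, 99]
         outlier_idx = hampel_outliers(prices)

      Because both the center and the spread are medians, a single spike barely moves the thresholds of neighboring points.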


   .. py:method:: z_score() -> pandas.DataFrame

      Detects outliers using a z-score method based on a simple moving average.

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex
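
      One way to picture a moving-average z-score is sketched below on a plain list. Comparing each point against the mean and standard deviation of the preceding window (rather than a window that includes the point) is an assumption of this example, as are the name ``rolling_zscore_outliers`` and the ``thresh`` of 3.

      .. code-block:: python

         from statistics import mean, stdev

         def rolling_zscore_outliers(values, window_size=7, thresh=3.0):
             """Flag points whose z-score against the previous window exceeds thresh."""
             flagged = []
             for i in range(window_size, len(values)):
                 window = values[i - window_size:i]  # excludes the current point
                 mu, sigma = mean(window), stdev(window)
                 if sigma > 0 and abs(values[i] - mu) / sigma > thresh:
                     flagged.append(i)
             return flagged

         # Made-up prices with one obvious spike at position 7.
         prices = [100, 101, 99, 102, 100, 101, 98, 250, 100, 99, 101]
         outlier_idx = rolling_zscore_outliers(prices)

      Note that the spike inflates the standard deviation of the windows that follow it, which makes later points harder to flag.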


   .. py:method:: ewma() -> pandas.DataFrame

      Detects outliers using an exponentially weighted moving average (EWMA) method.

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex
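
      The sketch below shows one possible EWMA-based check on a plain list: each point is compared with the exponentially weighted mean and standard deviation of the points before it. The ``span`` to ``alpha`` conversion, the warm-up of ``span`` observations, the ``thresh`` of 3 and the name ``ewma_outliers`` are assumptions for illustration, not a description of this method's internals.

      .. code-block:: python

         def ewma_outliers(values, span=7, thresh=3.0):
             """Flag points far from the running exponentially weighted mean."""
             alpha = 2 / (span + 1)  # common span-to-alpha convention
             ew_mean, ew_var = values[0], 0.0
             flagged = []
             for i in range(1, len(values)):
                 diff = values[i] - ew_mean
                 # Only test once `span` points have been seen (assumed warm-up).
                 if i >= span and ew_var > 0 and abs(diff) > thresh * ew_var ** 0.5:
                     flagged.append(i)
                 # Incremental updates of the weighted mean and variance.
                 ew_mean += alpha * diff
                 ew_var = (1 - alpha) * (ew_var + alpha * diff * diff)
             return flagged

         # Made-up prices with one obvious spike at position 7.
         prices = [100, 101, 99, 102, 100, 101, 98, 250, 100, 99, 101]
         outlier_idx = ewma_outliers(prices)

      Recent observations carry more weight than older ones, so the thresholds adapt faster than with an equally weighted window.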


   .. py:method:: seasonal_decomp(period: int = 7, model: str = 'additive', filt: Optional[numpy.array] = None, two_sided: Optional[bool] = True, extrapolate_trend: Optional[int] = 0) -> Dict[str, pandas.DataFrame]

      Detects outliers with seasonal decomposition moving averages from statsmodels.

      https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.seasonal_decompose.html#statsmodels.tsa.seasonal.seasonal_decompose

      :param period: Periodicity of the sequence.
      :type period: int, optional, default 7
      :param model: Type of seasonal component.
      :type model: str, {'additive', 'multiplicative'}, default 'additive'
      :param filt: The filter coefficients for filtering out the seasonal component.
                   The concrete moving average method used in filtering is determined by two_sided.
      :type filt: array-like, optional, default None
      :param two_sided: The moving average method used in filtering. If True (default), a centered moving average is
                        computed using the filt. If False, the filter coefficients are for past values only.
      :type two_sided: bool, optional, default True
      :param extrapolate_trend: If set to > 0, the trend resulting from the convolution is linear least-squares extrapolated
                                on both ends (or the single one if two_sided is False) considering this many (+1) closest points.
                                If set to ``'freq'``, use ``freq`` closest points. Setting this parameter results in no NaN values in trend
                                or resid components.
      :type extrapolate_trend: int, optional, default 0

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex
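
      To make the additive model concrete, here is a toy decomposition in plain Python: a centered moving average gives the trend, per-phase averages of the detrended series give the seasonal component, and the remainder is the residual in which outliers stand out. It is a simplified sketch that assumes an odd ``period``; it is neither the statsmodels code nor this method's implementation, and ``additive_decompose`` and the sample series are made up.

      .. code-block:: python

         from statistics import mean

         def additive_decompose(values, period=7):
             """Toy additive decomposition; assumes an odd period."""
             half = period // 2
             n = len(values)
             # Trend: centered moving average, undefined (None) near the edges.
             trend = [None] * n
             for i in range(half, n - half):
                 trend[i] = mean(values[i - half:i + half + 1])
             # Seasonal: average detrended value for each position in the cycle.
             by_phase = [[] for _ in range(period)]
             for i in range(half, n - half):
                 by_phase[i % period].append(values[i] - trend[i])
             phase_means = [mean(p) for p in by_phase]
             centre = mean(phase_means)
             seasonal = [phase_means[i % period] - centre for i in range(n)]
             # Residual: what is left once trend and seasonal are removed.
             resid = [None if trend[i] is None else values[i] - trend[i] - seasonal[i]
                      for i in range(n)]
             return trend, seasonal, resid

         # Made-up weekly pattern around 100 with a spike injected at position 17.
         pattern = [0, 1, 2, 3, 2, 1, 0]
         series = [100 + pattern[i % 7] for i in range(35)]
         series[17] += 50
         trend, seasonal, resid = additive_decompose(series)
         valid = [i for i, r in enumerate(resid) if r is not None]
         spike_idx = max(valid, key=lambda i: abs(resid[i]))

      The ``None`` values at both ends mirror the NaN values that a centered moving average leaves when the trend is not extrapolated.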


   .. py:method:: stl(period: Optional[int] = 7, seasonal: Optional[int] = 7, trend: Optional[int] = None, low_pass: Optional[int] = None, seasonal_deg: Optional[int] = 1, trend_deg: Optional[int] = 1, low_pass_deg: Optional[int] = 1, robust: Optional[bool] = False, seasonal_jump: Optional[int] = 1, trend_jump: Optional[int] = 1, low_pass_jump: Optional[int] = 1) -> pandas.DataFrame

      Detects outliers with seasonal-trend decomposition using LOESS (STL) from statsmodels.

      https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html

      :param period: Periodicity of the sequence.
      :type period: int, optional, default 7
      :param seasonal: Length of the seasonal smoother. Must be an odd integer, and should normally be >= 7.
      :type seasonal: int, optional, default 7
      :param trend: Length of the trend smoother. Must be an odd integer.
                    If not provided uses the smallest odd integer greater than 1.5 * period / (1 - 1.5 / seasonal),
                    following the suggestion in the original implementation.
      :type trend: int, optional, default None
      :param low_pass: Length of the low-pass filter. Must be an odd integer >=3. If not provided,
                       uses the smallest odd integer > period.
      :type low_pass: int, optional, default None
      :param seasonal_deg: Degree of seasonal LOESS. 0 (constant) or 1 (constant and trend).
      :type seasonal_deg: int, optional, default 1
      :param trend_deg: Degree of trend LOESS. 0 (constant) or 1 (constant and trend).
      :type trend_deg: int, optional, default 1
      :param low_pass_deg: Degree of low pass LOESS. 0 (constant) or 1 (constant and trend).
      :type low_pass_deg: int, optional, default 1
      :param robust: Flag indicating whether to use a weighted version that is robust to some forms of outliers.
      :type robust: bool, optional, default False
      :param seasonal_jump: Positive integer determining the linear interpolation step. If larger than 1,
                            the LOESS is used every seasonal_jump points and linear interpolation is between fitted points.
                            Higher values reduce estimation time.
      :type seasonal_jump: int, optional, default 1
      :param trend_jump: Positive integer determining the linear interpolation step. If larger than 1,
                         the LOESS is used every trend_jump points and values between the two are linearly interpolated.
                         Higher values reduce estimation time.
      :type trend_jump: int, optional, default 1
      :param low_pass_jump: Positive integer determining the linear interpolation step. If larger than 1,
                            the LOESS is used every low_pass_jump points and values between the two are linearly interpolated.
                            Higher values reduce estimation time.
      :type low_pass_jump: int, optional, default 1

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex
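
      The default smoother lengths described for ``trend`` and ``low_pass`` can be worked out by hand. The sketch below follows the wording of those parameter descriptions literally (smallest odd integer strictly greater than the given value); the helper names are hypothetical and edge cases may be rounded differently by statsmodels itself.

      .. code-block:: python

         import math

         def smallest_odd_above(x):
             """Smallest odd integer strictly greater than x."""
             n = math.floor(x) + 1
             return n if n % 2 == 1 else n + 1

         def default_trend(period=7, seasonal=7):
             # 1.5 * period / (1 - 1.5 / seasonal), rounded up to an odd integer.
             return smallest_odd_above(1.5 * period / (1 - 1.5 / seasonal))

         def default_low_pass(period=7):
             # Smallest odd integer greater than the period.
             return smallest_odd_above(period)

         trend_len = default_trend()        # 10.5 / 0.7857... = 13.36..., giving 15
         low_pass_len = default_low_pass()  # giving 9 for period=7

      With the defaults ``period=7`` and ``seasonal=7``, leaving ``trend`` and ``low_pass`` as ``None`` therefore corresponds to lengths of 15 and 9 under this reading.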


   .. py:method:: prophet(interval_width: Optional[float] = 0.999) -> pandas.DataFrame

      Detects outliers using Prophet, a time series forecasting algorithm published by Facebook.

      :param interval_width: Width of the uncertainty interval, estimated by Monte Carlo simulation. The larger the value,
                             the wider the band between the upper and lower thresholds used for outlier detection.
      :type interval_width: float, optional, default 0.999

      :returns: **filtered_df** -- Filtered dataframe with DatetimeIndex (level 0), tickers (level 1) and fields (cols) with outliers removed.
      :rtype: pd.DataFrame - MultiIndex


   .. py:method:: plot_outliers() -> None

      Plots time series with outliers highlighted (red dots).