cryptodatapy.transform.filter
Classes
Filter – Filters dataframe in tidy format.
Module Contents
- class cryptodatapy.transform.filter.Filter(raw_df: pandas.DataFrame, excl_cols: str | list | None = None, plot: bool = False, plot_series: tuple = ('BTC', 'close'))
Filters dataframe in tidy format.
- raw_df
- excl_cols = None
- plot = False
- plot_series = ('BTC', 'close')
- df
- filtered_df = None
- avg_trading_val(thresh_val: int = 10000000, window_size: int = 30) → pandas.DataFrame
Filters out values whose average trading value (price * volume/size in quote currency) over a rolling lookback window falls below a threshold, replacing them with NaNs.
- Parameters:
thresh_val (int, default 10,000,000) – Threshold/cut-off for the average trading value.
window_size (int, default 30) – Size of rolling window.
- Returns:
filtered_df – Filtered DataFrame with DatetimeIndex (level 0), tickers (level 1) and fields (cols), with values below the threshold replaced by NaNs.
- Return type:
DataFrame - MultiIndex
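The rolling-threshold idea described above can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the function name `filter_avg_trading_val`, the `volume` column, the index level names `date` and `ticker`, and the choice to mask rows with an incomplete window are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical tidy frame: dates (level 0), tickers (level 1), fields as columns.
dates = pd.date_range("2023-01-01", periods=5, freq="D")
idx = pd.MultiIndex.from_product([dates, ["BTC", "XYZ"]], names=["date", "ticker"])
df = pd.DataFrame(
    {"close": [20000.0, 1.0] * 5, "volume": [1000.0, 500.0] * 5},
    index=idx,
)


def filter_avg_trading_val(df, thresh_val=10_000_000, window_size=30):
    """Illustrative only: NaN out rows with a low rolling average trading value."""
    trading_val = df["close"] * df["volume"]  # price * volume in quote currency
    # Rolling mean computed separately for each ticker; the result keeps df's index.
    avg_val = trading_val.groupby(level="ticker").transform(
        lambda s: s.rolling(window_size).mean()
    )
    # Rows with an incomplete window have a NaN average and are masked as well
    # (an assumption of this sketch).
    below = ~(avg_val >= thresh_val).to_numpy()
    out = df.copy()
    out.loc[below, :] = np.nan
    return out


# A short window is used here only because the toy data has five dates.
filtered = filter_avg_trading_val(df, window_size=2)
```

In this toy data the thinly traded `XYZ` ticker is masked entirely, while `BTC` keeps its values once a full window is available.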
- missing_vals_gaps(gap_window: int = 30) → pandas.DataFrame
Filters values that precede a large gap of missing values, replacing them with NaNs.
- Parameters:
gap_window (int, default 30) – Size of window where all values are missing (NaNs).
- Returns:
filtered_df – Filtered DataFrame with DatetimeIndex (level 0), tickers (level 1) and fields (cols), with values before missing-value gaps replaced by NaNs.
- Return type:
DataFrame - MultiIndex
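As a rough sketch of the gap filter for a single series: a gap is taken to be a run of at least `gap_window` consecutive NaNs, and everything up to the end of that window is masked. The helper name `mask_before_gap` and the choice to cut at the last such gap are assumptions of this example; the library may handle multiple gaps or columns differently.

```python
import numpy as np
import pandas as pd


def mask_before_gap(s: pd.Series, gap_window: int = 30) -> pd.Series:
    """Illustrative only: NaN out values that precede a run of >= gap_window NaNs."""
    is_nan = s.isna().astype(int)
    # A window is "all missing" when the rolling sum of NaN flags equals its size.
    full_gap = is_nan.rolling(gap_window).sum() == gap_window
    if not full_gap.any():
        return s.copy()
    # Position of the last row that closes an all-missing window.
    last_gap_end = np.flatnonzero(full_gap.to_numpy())[-1]
    out = s.copy()
    out.iloc[: last_gap_end + 1] = np.nan
    return out


s = pd.Series([1.0, 2.0, np.nan, np.nan, np.nan, 4.0, 5.0])
masked = mask_before_gap(s, gap_window=3)     # values before the 3-NaN gap are masked
unchanged = mask_before_gap(s, gap_window=4)  # no gap that long, so nothing changes
```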
- min_nobs(ts_obs=100, cs_obs=1) → pandas.DataFrame
Removes tickers from the dataframe if they have fewer than a minimum number of observations, and removes dates if they have fewer than a minimum number of tickers.
- Parameters:
ts_obs (int, default 100) – Minimum number of observations for a field/column over the time series.
cs_obs (int, default 1) – Minimum number of observations for tickers over the cross-section.
- Returns:
filtered_df – Filtered DataFrame with DatetimeIndex (level 0), tickers with minimum number of observations (level 1) and fields (cols).
- Return type:
DataFrame - MultiIndex
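The two-step filter (time-series count per ticker, then cross-sectional count per date) might look like the following in plain pandas. It is only a sketch: the function name `filter_min_nobs`, the `field` argument, and the index level names `date` and `ticker` are hypothetical, and counting on a single field is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical tidy frame with three tickers over four dates.
dates = pd.date_range("2023-01-01", periods=4, freq="D")
idx = pd.MultiIndex.from_product([dates, ["A", "B", "C"]], names=["date", "ticker"])
nan = np.nan
df = pd.DataFrame(
    {"close": [1, 1, 1, 2, nan, 2, 3, nan, nan, 4, nan, 4]},
    index=idx,
)


def filter_min_nobs(df, field="close", ts_obs=100, cs_obs=1):
    """Illustrative only: drop sparse tickers, then dates with too few tickers."""
    # Non-missing observations per ticker, broadcast back to every row.
    n_ts = df[field].groupby(level="ticker").transform("count")
    out = df[(n_ts >= ts_obs).to_numpy()]
    # Non-missing observations per date among the remaining tickers.
    n_cs = out[field].groupby(level="date").transform("count")
    return out[(n_cs >= cs_obs).to_numpy()]


result = filter_min_nobs(df, ts_obs=2, cs_obs=2)
```

Here ticker `B` has a single observation and is dropped first; the third date is then dropped because only `A` has a value on it.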
- delisted_tickers(method: str = 'replace') → pandas.DataFrame
Repairs delisted tickers by either removing them or replacing them with NaNs.
- Parameters:
method (str, {'replace', 'remove'}, default 'replace') – Method to repair delisted tickers: 'remove' removes them from the dataframe, 'replace' replaces them with NaNs.
- Returns:
filtered_df – Filtered DataFrame with DatetimeIndex (level 0), tickers (level 1) and fields (cols).
- Return type:
pd.DataFrame - MultiIndex
- tickers(tickers_list) → pandas.DataFrame
Removes specified tickers from the dataframe.
- Parameters:
tickers_list (str or list) – List of tickers to be removed. Can be used to remove tickers to be excluded from data analysis, e.g. stablecoins or indexes.
- Returns:
filtered_df – Filtered DataFrame with DatetimeIndex (level 0), tickers (level 1) and fields (cols).
- Return type:
pd.DataFrame - MultiIndex
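Dropping tickers from a tidy frame amounts to filtering on the ticker level of the MultiIndex. A minimal pandas-only sketch, assuming the level is named `ticker`; the helper `drop_tickers` and the toy prices are hypothetical and not part of the library:

```python
import pandas as pd

# Hypothetical tidy frame with made-up prices.
dates = pd.date_range("2023-01-01", periods=2, freq="D")
idx = pd.MultiIndex.from_product(
    [dates, ["BTC", "ETH", "USDT"]], names=["date", "ticker"]
)
df = pd.DataFrame({"close": [16600.0, 1200.0, 1.0, 16700.0, 1210.0, 1.0]}, index=idx)


def drop_tickers(df, tickers_list):
    """Illustrative only: remove rows whose ticker is in tickers_list."""
    # Accept a single ticker as a string, mirroring the 'str or list' parameter.
    if isinstance(tickers_list, str):
        tickers_list = [tickers_list]
    keep = ~df.index.get_level_values("ticker").isin(tickers_list)
    return df[keep]


no_stable = drop_tickers(df, "USDT")         # exclude a stablecoin
btc_only = drop_tickers(df, ["ETH", "USDT"])  # exclude several tickers
```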
- plot_filtered(plot_series: tuple | None = None) → None
Plots filtered time series.
- Parameters:
plot_series (tuple, optional, default None) – The (ticker, field) tuple identifying the time series to plot.