cryptodatapy.transform.filter

Classes

Filter

Filters a DataFrame in tidy format.

Module Contents

class cryptodatapy.transform.filter.Filter(raw_df: pandas.DataFrame, excl_cols: str | list | None = None, plot: bool = False, plot_series: tuple = ('BTC', 'close'))

Filters a DataFrame in tidy format.
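For orientation, the return descriptions in this module describe a frame indexed by date (level 0) and ticker (level 1), with fields as columns. Below is a minimal sketch of such a frame, assuming the input raw_df follows the same layout. The level names, tickers, dates and values are made up for illustration and are not taken from the library.

```python
import pandas as pd

# Illustrative data only: three daily observations for two tickers.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
index = pd.MultiIndex.from_product(
    [dates, ["BTC", "ETH"]], names=["date", "ticker"]  # level names are an assumption
)
raw_df = pd.DataFrame(
    {
        "close": [42000.0, 2300.0, 42500.0, 2350.0, 43000.0, 2400.0],
        "volume": [1500.0, 9000.0, 1600.0, 9500.0, 1700.0, 9800.0],
    },
    index=index,
)

# Hypothetical usage, assuming cryptodatapy is installed and the signature
# documented on this page:
# from cryptodatapy.transform.filter import Filter
# filtered = Filter(raw_df).avg_trading_val(thresh_val=10_000_000, window_size=30)
```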

raw_df
excl_cols = None
plot = False
plot_series = ('BTC', 'close')
df
filtered_df = None
avg_trading_val(thresh_val: int = 10000000, window_size: int = 30) pandas.DataFrame

Replaces values with NaNs where the average trading value (price * volume, or size, in quote currency) over a rolling lookback window falls below a threshold.

Parameters:
  • thresh_val (int, default 10,000,000) – Threshold (cut-off) for the average trading value.

  • window_size (int, default 30) – Size of rolling window.

Returns:

filtered_df – Filtered DataFrame with DatetimeIndex (level 0), tickers (level 1) and fields (cols), with values below the threshold replaced by NaNs.

Return type:

pd.DataFrame - MultiIndex
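The idea behind this filter can be sketched in plain pandas. This is not the library's implementation. The function name filter_avg_trading_val and the price_col and volume_col arguments are hypothetical, and the sketch assumes volume is quoted in base units, so that price * volume gives the value in quote currency.

```python
import numpy as np
import pandas as pd

def filter_avg_trading_val(df, thresh_val=10_000_000, window_size=30,
                           price_col="close", volume_col="volume"):
    """Sketch: set rows to NaN where the rolling mean of price * volume
    is below thresh_val. Column names are assumptions."""
    trading_val = df[price_col] * df[volume_col]
    # Rolling mean computed separately for each ticker (index level 1).
    avg_val = trading_val.groupby(level=1).transform(
        lambda s: s.rolling(window_size).mean()
    )
    keep = avg_val >= thresh_val  # NaN from the warm-up window compares as False
    out = df.copy()
    out.loc[~keep] = np.nan
    return out
```

In this sketch the first window_size - 1 rows of each ticker are also set to NaN, because the rolling mean is undefined there. Whether the library treats the warm-up period the same way is not stated in this reference.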

missing_vals_gaps(gap_window: int = 30) pandas.DataFrame

Filters values before a large gap of missing values, replacing them with NaNs.

Parameters:

gap_window (int, default 30) – Size of window where all values are missing (NaNs).

Returns:

filtered_df – Filtered DataFrame with DatetimeIndex (level 0), tickers (level 1) and fields (cols), with values before missing-value gaps replaced by NaNs.

Return type:

pd.DataFrame - MultiIndex
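One way to read the description above is shown below for a single series, such as one field of one ticker. The helper mask_before_gap is hypothetical. It treats a run of at least gap_window consecutive NaNs as a gap and blanks everything up to the end of the most recent gap. The library's exact rule may differ.

```python
import numpy as np
import pandas as pd

def mask_before_gap(s: pd.Series, gap_window: int = 30) -> pd.Series:
    """Sketch: replace all values before (and inside) the last run of
    gap_window consecutive NaNs with NaN."""
    is_na = s.isna().astype(int)
    # A window ending at position i is all-missing when its NaN count equals gap_window.
    gap_end = is_na.rolling(gap_window).sum() == gap_window
    if not gap_end.any():
        return s.copy()
    last = np.flatnonzero(gap_end.to_numpy())[-1]  # last position covered by a gap window
    out = s.copy()
    out.iloc[: last + 1] = np.nan
    return out
```

For a tidy frame, a helper like this would be applied per ticker and per field, for example by grouping on index level 1.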

min_nobs(ts_obs=100, cs_obs=1) pandas.DataFrame

Removes tickers from the DataFrame if they have fewer than a minimum number of observations, and removes dates if they have fewer than a minimum number of tickers.

Parameters:
  • ts_obs (int, default 100) – Minimum number of observations for field/column over time series.

  • cs_obs (int, default 1) – Minimum number of observations for tickers over the cross-section.

Returns:

filtered_df – Filtered DataFrame with DatetimeIndex (level 0), tickers with minimum number of observations (level 1) and fields (cols).

Return type:

pd.DataFrame - MultiIndex
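The two-step logic (first a time-series count per ticker, then a cross-sectional count per date) can be sketched as follows. The function filter_min_nobs and its field argument are hypothetical. The sketch counts non-missing values in a single column, whereas the library may count across all fields.

```python
import pandas as pd

def filter_min_nobs(df, ts_obs=100, cs_obs=1, field="close"):
    """Sketch: drop tickers with fewer than ts_obs non-missing values of `field`,
    then drop dates with fewer than cs_obs tickers that have data."""
    # Time-series step: observations per ticker (index level 1).
    obs_per_ticker = df[field].groupby(level=1).count()
    keep_tickers = obs_per_ticker[obs_per_ticker >= ts_obs].index
    out = df.loc[df.index.get_level_values(1).isin(keep_tickers)]
    # Cross-sectional step: tickers with data per date (index level 0).
    tickers_per_date = out[field].groupby(level=0).count()
    keep_dates = tickers_per_date[tickers_per_date >= cs_obs].index
    return out.loc[out.index.get_level_values(0).isin(keep_dates)]
```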

delisted_tickers(method: str = 'replace') pandas.DataFrame

Repairs delisted tickers by either removing them or replacing them with NaNs.

Parameters:

method (str, {'replace', 'remove'}, default 'replace') – Method to repair delisted tickers: 'remove' removes them, 'replace' replaces them with NaNs.

Returns:

filtered_df – Filtered DataFrame with DatetimeIndex (level 0), tickers (level 1) and fields (cols).

Return type:

pd.DataFrame - MultiIndex

tickers(tickers_list) pandas.DataFrame

Removes specified tickers from dataframe.

Parameters:

tickers_list (str or list) – Ticker or list of tickers to remove, e.g. stablecoins or indexes that should be excluded from the analysis.

Returns:

filtered_df – Filtered DataFrame with DatetimeIndex (level 0), tickers (level 1) and fields (cols).

Return type:

pd.DataFrame - MultiIndex
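Removing tickers from a frame with tickers on index level 1 amounts to a boolean mask on that level. A minimal pandas sketch follows. The helper drop_tickers is hypothetical and only mirrors the documented str-or-list argument.

```python
import pandas as pd

def drop_tickers(df, tickers_list):
    """Sketch: drop all rows whose ticker (index level 1) is in tickers_list."""
    if isinstance(tickers_list, str):
        tickers_list = [tickers_list]  # accept a single ticker as a string
    keep = ~df.index.get_level_values(1).isin(tickers_list)
    return df.loc[keep]
```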

plot_filtered(plot_series: tuple | None = None) None

Plots filtered time series.

Parameters:

plot_series (tuple, optional, default None) – Plots the time series of a specific (ticker, field) tuple.