base_wrangler

Attributes

logger

Classes

BaseDataWrangler

Base class for vendor-specific data wranglers.

Functions

_load_field_map(→ Dict[str, Dict[str, str]])

Loads fields.csv from the config path and creates a nested dictionary map of vendor field names to CryptoDataPy field names.

Module Contents

base_wrangler.logger
base_wrangler._load_field_map() Dict[str, Dict[str, str]]

Loads fields.csv from the config path and creates a nested dictionary map: {'vendor_name': {'vendor_field_lower': 'CRYPTODATAPY_FIELD'}}
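A minimal sketch of how a nested map of this shape might be built. The CSV layout used here (an `id` column holding the library field name, plus one `<vendor>_id` column per vendor) is an assumption for illustration, since the real layout of fields.csv is not shown in this reference. `load_field_map_sketch` and `FIELDS_CSV` are hypothetical names.

```python
import io
from typing import Dict

import pandas as pd

# Hypothetical stand-in for fields.csv; the real column layout may differ.
FIELDS_CSV = """id,vendor_a_id,vendor_b_id
close,Close,closePrice
volume,Volume,
"""


def load_field_map_sketch(csv_text: str) -> Dict[str, Dict[str, str]]:
    """Build {vendor: {vendor_field_lower: library_field}} from CSV text."""
    df = pd.read_csv(io.StringIO(csv_text), dtype=str)
    field_map: Dict[str, Dict[str, str]] = {}
    for col in df.columns:
        if not col.endswith("_id"):
            continue  # skips the plain "id" column
        vendor = col[: -len("_id")]
        # Vendors without an equivalent field have an empty cell (NaN).
        pairs = df[["id", col]].dropna()
        field_map[vendor] = {
            vendor_field.lower(): lib_field
            for lib_field, vendor_field in zip(pairs["id"], pairs[col])
        }
    return field_map


field_map = load_field_map_sketch(FIELDS_CSV)
```

Lower-casing the vendor keys once at load time lets later lookups ignore the capitalisation a vendor happens to use.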

class base_wrangler.BaseDataWrangler(data_req: cryptodatapy.extract.datarequest.DataRequest, data_resp: Dict | pandas.DataFrame)

Bases: abc.ABC

Base class for vendor-specific data wranglers. Handles common data cleaning, filtering, and field mapping operations.

_FIELD_MAP
_DEFAULT_AGG_MAP
data_req
data_resp
field_map
_convert_fields_to_lib(data_source: str) None

Convert columns from vendor field names to CryptoDataPy standard field names using the dictionary map. Mutates self.data_resp.
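The renaming step can be sketched with plain pandas. The map contents and column names below are made up for illustration, and leaving unmapped columns unchanged is an assumption rather than documented behaviour.

```python
import pandas as pd

# Hypothetical map for one vendor: lower-cased vendor field -> library field.
vendor_map = {"closeprice": "close", "vol": "volume"}

df = pd.DataFrame({"ClosePrice": [1.0, 2.0], "Vol": [10, 20], "extra": ["a", "b"]})

# Lower-case each column before the lookup; unmapped columns keep their name.
df = df.rename(columns=lambda c: vendor_map.get(c.lower(), c))
renamed_columns = list(df.columns)
```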

_set_index_and_sort(index_cols: str | List[str] = 'date') None

Sets the index and sorts the DataFrame by the index.

It ensures that if ‘date’ is part of the index, it is converted to a date-only Timestamp (time component set to 00:00:00) while retaining the datetime64[ns] dtype for optimal index performance.
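A small sketch of the date handling described here, using `Series.dt.normalize()` from pandas. The sample frame and the choice of a `('date', 'ticker')` index are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": ["2024-01-02 15:30:00", "2024-01-01 09:00:00"],
        "ticker": ["BTC", "BTC"],
        "close": [2.0, 1.0],
    }
)

# normalize() sets the time component to 00:00:00 but keeps a datetime64 dtype,
# unlike .dt.date, which would produce Python date objects (object dtype).
df["date"] = pd.to_datetime(df["date"]).dt.normalize()
df = df.set_index(["date", "ticker"]).sort_index()
```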

_filter_dates() None

Filters data response based on start and end dates in data_req.
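One way such a filter can be written is with label slicing on a sorted DatetimeIndex. The bounds below are hypothetical, and treating both ends as inclusive is an assumption, not something this reference states.

```python
import pandas as pd

df = pd.DataFrame(
    {"close": [1.0, 2.0, 3.0, 4.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="D", name="date"),
)

# Hypothetical request bounds; None leaves that side of the range open.
start_date, end_date = pd.Timestamp("2024-01-02"), None

# Label slicing on a sorted DatetimeIndex is inclusive at both ends.
filtered = df.loc[start_date:end_date]
```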

_resample(agg_func: str | Dict[str, str] | None = None) None

Resamples the DataFrame to the frequency in the data_req.

Logic:

1. If agg_func is a string (‘last’, ‘sum’), it applies to all columns.
2. If agg_func is None, it uses _DEFAULT_AGG_MAP for known columns.
3. If a column isn’t in the map, it defaults to ‘last’.

Parameters:

agg_func (str or dict, optional) – Aggregation function(s) to use during resampling. If a string is provided, it applies to all columns. If a dict is provided, it should map column names to aggregation functions. If None, the default aggregation map is used.
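The resolution rules can be sketched as a small helper that turns `agg_func` into a per-column dict before resampling. The contents of `DEFAULT_AGG_MAP` below are hypothetical (the real `_DEFAULT_AGG_MAP` is not listed in this reference), `resolve_agg` is an illustrative name, and falling back to the defaults for columns missing from a user-supplied dict is an assumption.

```python
import pandas as pd

# Hypothetical default map; the real _DEFAULT_AGG_MAP contents are not listed here.
DEFAULT_AGG_MAP = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}


def resolve_agg(columns, agg_func=None):
    """Return a {column: aggregation} dict for the given columns."""
    if isinstance(agg_func, str):
        # A single string applies to every column.
        return {col: agg_func for col in columns}
    overrides = agg_func or {}
    # Assumption: explicit overrides win, then the default map, then 'last'.
    return {col: overrides.get(col, DEFAULT_AGG_MAP.get(col, "last")) for col in columns}


df = pd.DataFrame(
    {"close": [1.0, 2.0, 3.0, 4.0], "volume": [10, 20, 30, 40], "funding_rate": [0.1, 0.2, 0.3, 0.4]},
    index=pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 12:00", "2024-01-02 00:00", "2024-01-02 12:00"]
    ),
)

# Resample the 12-hourly rows to daily frequency with per-column aggregations.
daily = df.resample("D").agg(resolve_agg(df.columns))
```

Here `funding_rate` is absent from the hypothetical map, so it falls back to ‘last’, while `volume` is summed.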

_reorder_columns(requested_fields: bool = False) None

Reorders the DataFrame columns, optionally keeping only the requested fields.

Parameters:

requested_fields (bool) – If True, only requested fields are kept and ordered. If False, all columns are kept.
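A rough sketch of the two modes. The `reorder` helper and the `requested` list are hypothetical, and placing the non-requested columns after the requested ones when the flag is False is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"volume": [10], "extra": ["x"], "close": [1.0], "open": [0.5]})

# Hypothetical requested fields, in the order the caller asked for them.
requested = ["open", "close", "volume"]


def reorder(frame, fields, requested_fields=False):
    """Put `fields` first; keep the remaining columns only if requested_fields is False."""
    ordered = [col for col in fields if col in frame.columns]
    if requested_fields:
        return frame[ordered]
    rest = [col for col in frame.columns if col not in ordered]
    return frame[ordered + rest]


only_requested = reorder(df, requested, requested_fields=True)
all_columns = reorder(df, requested, requested_fields=False)
```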

_clean_data() None

Removes duplicate rows, rows and columns that are entirely NaN, and 0 values.
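A sketch of one possible cleaning pass. How zeros are removed is not specified in this reference, so the sketch assumes they are replaced with NaN rather than dropping the whole row; the order of the steps is also an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"close": [1.0, 1.0, 0.0, np.nan], "volume": [10.0, 10.0, 5.0, np.nan], "empty": [np.nan] * 4},
    index=pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03"]),
)

# Drop rows that repeat both the index label and every value.
df = df[~df.reset_index().duplicated().to_numpy()]
# Assumption: zeros are treated as missing values.
df = df.replace(0, np.nan)
# Drop rows and columns that are entirely NaN.
cleaned = df.dropna(how="all", axis=0).dropna(how="all", axis=1)
```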

_convert_types() None

Converts columns to appropriate numeric types, explicitly excluding known string/metadata columns, and uses standard pandas dtypes.
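The exclusion pattern can be sketched with `pd.to_numeric`. The `STRING_COLS` set is hypothetical (the actual list of excluded columns is not given here), and coercing unparseable values to NaN is an assumption.

```python
import pandas as pd

# Hypothetical set of string/metadata columns to leave untouched.
STRING_COLS = {"ticker", "symbol"}

df = pd.DataFrame({"ticker": ["BTC", "ETH"], "close": ["1.5", "2.5"], "volume": ["10", "n/a"]})

for col in df.columns:
    if col not in STRING_COLS:
        # errors="coerce" turns unparseable values into NaN instead of raising.
        df[col] = pd.to_numeric(df[col], errors="coerce")
```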

abstractmethod wrangle() pandas.DataFrame

Abstract method for wrangling. Must be implemented by child classes.
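The subclassing contract can be illustrated with a self-contained stand-in. `SketchBase` and `ExampleVendorWrangler` are hypothetical names that only mirror the abstract-method pattern; they are not the real `BaseDataWrangler` and skip its `data_req` argument.

```python
from abc import ABC, abstractmethod

import pandas as pd


class SketchBase(ABC):
    """Stand-in for the pattern only; not the real BaseDataWrangler."""

    def __init__(self, data_resp):
        self.data_resp = pd.DataFrame(data_resp)

    @abstractmethod
    def wrangle(self) -> pd.DataFrame:
        ...


class ExampleVendorWrangler(SketchBase):
    def wrangle(self) -> pd.DataFrame:
        # A real subclass would chain the protected helpers documented above.
        return self.data_resp.sort_index()


result = ExampleVendorWrangler({"close": [1.0, 2.0]}).wrangle()
```

Because `wrangle` is abstract, instantiating the base class directly raises `TypeError`; only subclasses that implement it can be created.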