# Datasource

## CSVDataSource

Bases: `DataSource`

Data source for loading CSV files into the backtesting framework.

This class extends `DataSource` to specifically handle CSV file loading, providing convenient initialization from CSV files with automatic parsing and validation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `pathname` | `str` | Path to the CSV file to load. | *required* |
| `train_test_split` | `bool` | Whether to split data into train and test sets. | `False` |
| `mode` | `str` | Mode for data splitting (`'train'` or `'test'`). Only used when `train_test_split=True`. | `'train'` |
| `index_col` | `str` | Name of the column to use as index (the default `0` selects the first column). | `0` |
Example:

```python
source = CSVDataSource("data/AAPL.csv")
print(f"Loaded {len(source)} data points")
```
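A CSV data source of this kind typically delegates to pandas under the hood. The sketch below shows the kind of `pd.read_csv` call involved, using an in-memory CSV in place of a real `data/AAPL.csv`; it illustrates the parsing behaviour only, not the actual quantex internals.

```python
import io
import pandas as pd

# A small in-memory CSV standing in for a file such as "data/AAPL.csv".
csv_text = (
    "Date,Open,High,Low,Close,Volume\n"
    "2024-01-02,185.0,186.5,184.2,186.0,1000\n"
    "2024-01-03,186.0,187.1,185.0,185.5,1200\n"
)

# The kind of call a CSV loader makes: use the first column
# (index_col=0) as a parsed datetime index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(len(df))         # number of data points
print(df.index.dtype)  # datetime index dtype
```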
### `__init__(pathname, train_test_split=False, mode='train', index_col=0)`

Initialize CSV data source from file path.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `pathname` | `str` | Path to the CSV file. | *required* |
| `train_test_split` | `bool` | Whether to split data. | `False` |
| `mode` | `str` | Split mode (`'train'` or `'test'`). | `'train'` |
| `index_col` | `str` | Column to use as datetime index. | `0` |
## DataSource

Base class for handling market data sources in backtesting.

This class provides the interface for loading and accessing OHLCV (Open, High, Low, Close, Volume) market data. It supports time-series data supplied as a pandas DataFrame and provides efficient array-based access to price data during backtesting.

The class handles:

- Data validation (ensuring required columns are present)
- Optional train/test splitting for walk-forward analysis
- Efficient numpy array conversion for fast access
- Current data point tracking for time-series iteration
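The responsibilities listed above can be sketched as a minimal class. This is a simplified illustration under assumed semantics, not the actual quantex implementation; the class name `MiniDataSource` is hypothetical.

```python
import numpy as np
import pandas as pd

class MiniDataSource:
    """Simplified sketch of an OHLCV data source (illustration only)."""

    required_columns = ["Open", "High", "Low", "Close", "Volume"]

    def __init__(self, df: pd.DataFrame):
        # Validation: every required column must be present.
        missing = [c for c in self.required_columns if c not in df.columns]
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
        # Convert to numpy arrays once for fast access during backtesting.
        self._close = df["Close"].to_numpy()
        self._index = df.index
        # Track the current position for time-series iteration.
        self.current = len(df) - 1

    def __len__(self):
        return len(self._close)

    @property
    def CClose(self):
        # Scalar close price at the current index.
        return self._close[self.current]

    @property
    def Close(self):
        # Historical closes up to the current index.
        return self._close[: self.current + 1]

df = pd.DataFrame({
    "Open": [1.0, 2.0], "High": [1.5, 2.5], "Low": [0.5, 1.5],
    "Close": [1.2, 2.2], "Volume": [100, 200],
})
src = MiniDataSource(df)
print(len(src), src.CClose)
```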
Attributes:

| Name | Type | Description |
|---|---|---|
| `required_columns` | `list` | List of required column names: `['Open', 'High', 'Low', 'Close', 'Volume']`. |
Example:

```python
import pandas as pd

df = pd.read_csv('data.csv')
source = DataSource(df)
print(f"Data points: {len(source)}")
print(f"Current close: {source.CClose}")
```
### `CClose` *(property)*

Get the current close price at the current index.

Returns:

| Type | Description |
|---|---|
| `np.float64` | Close price for the current data point. |

### `CHigh` *(property)*

Get the current high price at the current index.

Returns:

| Type | Description |
|---|---|
| `np.float64` | High price for the current data point. |

### `CLow` *(property)*

Get the current low price at the current index.

Returns:

| Type | Description |
|---|---|
| `np.float64` | Low price for the current data point. |

### `COpen` *(property)*

Get the current open price at the current index.

Returns:

| Type | Description |
|---|---|
| `np.float64` | Open price for the current data point. |

### `CVolume` *(property)*

Get the current volume at the current index.

Returns:

| Type | Description |
|---|---|
| `np.float64` | Volume for the current data point. |

### `Close` *(property)*

Get the historical close prices up to the current index.

Returns:

| Type | Description |
|---|---|
| `np.ndarray` | Array of close prices for all historical data points up to the current iteration index. |

### `High` *(property)*

Get the historical high prices up to the current index.

Returns:

| Type | Description |
|---|---|
| `np.ndarray` | Array of high prices for all historical data points up to the current iteration index. |

### `Index` *(property)*

Get the datetime index of the data source.

Returns:

| Type | Description |
|---|---|
| `pd.Index` | Index containing the timestamps for each data point. |

### `Low` *(property)*

Get the historical low prices up to the current index.

Returns:

| Type | Description |
|---|---|
| `np.ndarray` | Array of low prices for all historical data points up to the current iteration index. |

### `Open` *(property)*

Get the historical open prices up to the current index.

Returns:

| Type | Description |
|---|---|
| `np.ndarray` | Array of open prices for all historical data points up to the current iteration index. |

### `Volume` *(property)*

Get the historical volume data up to the current index.

Returns:

| Type | Description |
|---|---|
| `np.ndarray` | Array of volume values for all historical data points up to the current iteration index. |
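The convention behind these properties can be shown with plain numpy slicing: the `C`-prefixed properties return a scalar at the current index, while the unprefixed ones return the history up to it. The sketch below assumes the slice includes the current bar; consult the source for the exact boundary behaviour.

```python
import numpy as np

close = np.array([10.0, 11.0, 12.0, 13.0])
current = 2  # hypothetical iteration position

# C-prefixed properties (e.g. CClose): scalar at the current index.
cclose = close[current]

# Unprefixed properties (e.g. Close): history up to the current index
# (assumed inclusive here).
history = close[: current + 1]

print(cclose, history)
```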
### `__init__(df, train_test_split=False, mode='train')`

Initialize the data source with a pandas DataFrame.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | DataFrame containing OHLCV data with columns `'Open'`, `'High'`, `'Low'`, `'Close'`, `'Volume'`. Index should be datetime values for proper period calculation. | *required* |
| `train_test_split` | `bool` | Whether to split data into train and test sets. | `False` |
| `mode` | `str` | Mode for data splitting (`'train'` or `'test'`). Only used when `train_test_split=True`. | `'train'` |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If the DataFrame doesn't contain the required columns. |

Note:

- Data is split 80/20 for train/test when enabled.
- All data is converted to numpy arrays for efficient access.
- The current index starts at the end for proper iteration.
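The 80/20 split mentioned above is, in the common chronological form, a single `iloc` boundary. This sketch shows that typical computation; the exact boundary handling in quantex may differ.

```python
import pandas as pd

df = pd.DataFrame({"Close": range(10)})

# Chronological 80/20 split: the first 80% of rows become the train
# set, the remaining 20% the test set (no shuffling for time series).
split = int(len(df) * 0.8)
train = df.iloc[:split]
test = df.iloc[split:]

print(len(train), len(test))  # 8 2
```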
### `__len__()`

Get the total number of data points in the source.

Returns:

| Type | Description |
|---|---|
| `int` | Number of rows in the DataFrame. |
## ParquetDataSource

Bases: `DataSource`

Data source for loading Parquet files into the backtesting framework.

This class extends `DataSource` to specifically handle Parquet file loading, providing efficient binary-format loading with automatic parsing and validation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `pathname` | `str` | Path to the Parquet file to load. | *required* |
| `train_test_split` | `bool` | Whether to split data into train and test sets. | `False` |
| `mode` | `str` | Mode for data splitting (`'train'` or `'test'`). Only used when `train_test_split=True`. | `'train'` |
Example:

```python
source = ParquetDataSource("data/market_data.parquet")
print(f"Loaded {len(source)} data points")
```

Note:

Parquet files are generally faster to load than CSV files and preserve data types more accurately.
### `__init__(pathname, train_test_split=False, mode='train')`

Initialize Parquet data source from file path.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `pathname` | `str` | Path to the Parquet file. | *required* |
| `train_test_split` | `bool` | Whether to split data. | `False` |
| `mode` | `str` | Split mode (`'train'` or `'test'`). | `'train'` |
Source code: `src/quantex/datasource.py` (`DataSource`, `CSVDataSource`, and `ParquetDataSource` classes)