
Datasource

CSVDataSource

Bases: DataSource

Data source for loading CSV files into the backtesting framework.

This class extends DataSource to specifically handle CSV file loading, providing convenient initialization from CSV files with automatic parsing and validation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `pathname` | `str` | Path to the CSV file to load. | *required* |
| `train_test_split` | `bool` | Whether to split data into train and test sets. | `False` |
| `mode` | `str` | Mode for data splitting (`'train'` or `'test'`); only used when `train_test_split=True`. | `'train'` |
| `index_col` | `int` | Position of the column to use as the index (0 is the first column). | `0` |
Example:

```python
source = CSVDataSource("data/AAPL.csv")
print(f"Loaded {len(source)} data points")
```
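The loader presumably wraps pandas. A minimal sketch of the equivalent CSV parsing, assuming the framework reads the file with pandas and parses the index column as datetimes (the inline CSV data here is illustrative only):

```python
import io

import pandas as pd

# Illustrative stand-in for the parsing CSVDataSource likely performs:
# read the file and use the first column as a datetime index.
csv_text = """Date,Open,High,Low,Close,Volume
2024-01-02,187.15,188.44,183.89,185.64,82488700
2024-01-03,184.22,185.88,183.43,184.25,58414500
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(len(df))         # number of data points
print(df.index.dtype)  # datetime64[ns]
```

Parsing the index up front is what makes period calculations on the timestamps possible later.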

__init__(pathname, train_test_split=False, mode='train', index_col=0)

Initialize CSV data source from file path.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `pathname` | `str` | Path to the CSV file. | *required* |
| `train_test_split` | `bool` | Whether to split data. | `False` |
| `mode` | `str` | Split mode (`'train'` or `'test'`). | `'train'` |
| `index_col` | `int` | Column to use as the datetime index. | `0` |

DataSource

Base class for handling market data sources in backtesting.

This class provides the interface for loading and accessing OHLCV (Open, High, Low, Close, Volume) market data. It supports time series data with pandas DataFrame input and provides efficient array-based access to price data during backtesting.

The class handles:

- Data validation (ensuring required columns are present)
- Optional train/test splitting for walk-forward analysis
- Efficient numpy array conversion for fast access
- Current data point tracking for time series iteration
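A minimal sketch of the required-column check described above, assuming a plain membership test against `required_columns` (the function name here is illustrative, not the framework's internals):

```python
import pandas as pd

# Matches the documented required_columns attribute.
REQUIRED_COLUMNS = ["Open", "High", "Low", "Close", "Volume"]

def validate_ohlcv(df: pd.DataFrame) -> None:
    """Raise ValueError if any required OHLCV column is missing."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"DataFrame is missing required columns: {missing}")

df = pd.DataFrame(
    {"Open": [1.0], "High": [2.0], "Low": [0.5], "Close": [1.5], "Volume": [100]}
)
validate_ohlcv(df)  # passes silently when all columns are present
```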

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `required_columns` | `list` | Required column names: `['Open', 'High', 'Low', 'Close', 'Volume']`. |

Example:

```python
import pandas as pd

df = pd.read_csv('data.csv')
source = DataSource(df)
print(f"Data points: {len(source)}")
print(f"Current close: {source.CClose}")
```

CClose property

Get the current close price at the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.float64` | Close price for the current data point. |

CHigh property

Get the current high price at the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.float64` | High price for the current data point. |

CLow property

Get the current low price at the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.float64` | Low price for the current data point. |

COpen property

Get the current open price at the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.float64` | Open price for the current data point. |

CVolume property

Get the current volume at the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.float64` | Volume for the current data point. |

Close property

Get the historical close prices up to the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.ndarray` | Array of close prices for all historical data points up to the current iteration index. |

High property

Get the historical high prices up to the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.ndarray` | Array of high prices for all historical data points up to the current iteration index. |

Index property

Get the datetime index of the data source.

Returns:

| Type | Description |
| --- | --- |
| `pd.Index` | Index containing the timestamps for each data point. |

Low property

Get the historical low prices up to the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.ndarray` | Array of low prices for all historical data points up to the current iteration index. |

Open property

Get the historical open prices up to the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.ndarray` | Array of open prices for all historical data points up to the current iteration index. |

Volume property

Get the historical volume data up to the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.ndarray` | Array of volume values for all historical data points up to the current iteration index. |
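The properties above come in two flavors: scalar "current" values (`CClose`, `CHigh`, ...) and growing historical windows (`Close`, `High`, ...). A rough sketch of how such properties can be built on numpy arrays, under the stated assumption that an internal index advances during iteration (the class and attribute names here are illustrative, not the framework's internals):

```python
import numpy as np

class PriceWindow:
    """Illustrative: scalar current value vs. historical slice up to an index."""

    def __init__(self, close):
        self._close = np.asarray(close, dtype=np.float64)
        self.current_index = 0  # advanced by the backtest loop

    @property
    def CClose(self) -> np.float64:
        # Single value at the current data point.
        return self._close[self.current_index]

    @property
    def Close(self) -> np.ndarray:
        # All points up to and including the current one.
        return self._close[: self.current_index + 1]

w = PriceWindow([10.0, 11.0, 12.0])
w.current_index = 1
print(w.CClose)  # 11.0
print(w.Close)   # [10. 11.]
```

Slicing a pre-converted numpy array this way avoids repeated DataFrame lookups inside the backtest loop, which matches the efficiency rationale stated above.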

__init__(df, train_test_split=False, mode='train')

Initialize the data source with a pandas DataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | DataFrame containing OHLCV data with columns 'Open', 'High', 'Low', 'Close', 'Volume'. The index should hold datetime values for proper period calculation. | *required* |
| `train_test_split` | `bool` | Whether to split data into train and test sets. | `False` |
| `mode` | `str` | Mode for data splitting (`'train'` or `'test'`); only used when `train_test_split=True`. | `'train'` |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the DataFrame doesn't contain the required columns. |

Note
  • Data is split 80/20 for train/test when enabled.
  • All data is converted to numpy arrays for efficient access.
  • Current index starts at the end for proper iteration.
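The 80/20 split mentioned in the note can be sketched as follows, assuming a plain chronological cutoff (whether the framework rounds the boundary exactly this way is an assumption; the function name is illustrative):

```python
import pandas as pd

def train_test_split_ohlcv(df: pd.DataFrame, mode: str = "train") -> pd.DataFrame:
    """Chronological 80/20 split: first 80% as train, last 20% as test."""
    cutoff = int(len(df) * 0.8)
    return df.iloc[:cutoff] if mode == "train" else df.iloc[cutoff:]

df = pd.DataFrame({"Close": range(10)})
print(len(train_test_split_ohlcv(df, "train")))  # 8
print(len(train_test_split_ohlcv(df, "test")))   # 2
```

A chronological (rather than shuffled) split is what makes the walk-forward analysis mentioned earlier meaningful: the test period always lies strictly after the training period.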

__len__()

Get the total number of data points in the source.

Returns:

| Type | Description |
| --- | --- |
| `int` | Number of rows in the DataFrame. |

ParquetDataSource

Bases: DataSource

Data source for loading Parquet files into the backtesting framework.

This class extends DataSource to specifically handle Parquet file loading, providing efficient binary format loading with automatic parsing and validation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `pathname` | `str` | Path to the Parquet file to load. | *required* |
| `train_test_split` | `bool` | Whether to split data into train and test sets. | `False` |
| `mode` | `str` | Mode for data splitting (`'train'` or `'test'`); only used when `train_test_split=True`. | `'train'` |
Example:

```python
source = ParquetDataSource("data/market_data.parquet")
print(f"Loaded {len(source)} data points")
```

Note

Parquet files are generally faster to load than CSV files and preserve data types more accurately.
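The dtype point can be seen from the CSV side: a plain CSV round-trip stringifies a datetime index unless it is re-parsed, whereas Parquet stores types natively. A small pandas demonstration of the CSV half (the data is illustrative; only CSV is exercised here):

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"Close": [185.64, 184.25]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)

# Round-trip through CSV without parse_dates: the index comes back as strings.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

print(df.index.dtype)        # datetime64[ns]
print(restored.index.dtype)  # object -- the datetime type was lost in text form
```

A Parquet round-trip (`to_parquet`/`read_parquet`) would restore `datetime64[ns]` directly, which is the accuracy advantage the note refers to.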

__init__(pathname, train_test_split=False, mode='train')

Initialize Parquet data source from file path.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `pathname` | `str` | Path to the Parquet file. | *required* |
| `train_test_split` | `bool` | Whether to split data. | `False` |
| `mode` | `str` | Split mode (`'train'` or `'test'`). | `'train'` |
