
Datasource

CSVDataSource

Bases: DataSource

Data source for loading CSV files into the backtesting framework.

This class extends DataSource to specifically handle CSV file loading, providing convenient initialization from CSV files with automatic parsing and validation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `pathname` | `str` | Path to the CSV file to load. | *required* |
| `train_test_split` | `bool` | Whether to split data into train and test sets. | `False` |
| `mode` | `str` | Mode for data splitting (`'train'` or `'test'`); only used when `train_test_split=True`. | `'train'` |
| `index_col` | `int` | Position of the column to use as the index (0 is the first column). | `0` |
Example:

```python
source = CSVDataSource("data/AAPL.csv")
print(f"Loaded {len(source)} data points")
```
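The loader presumably wraps pandas. A minimal sketch of the equivalent CSV parsing, assuming the framework reads the file with pandas and parses the index column as datetimes (the inline CSV data here is illustrative only):

```python
import io

import pandas as pd

# Illustrative stand-in for the parsing CSVDataSource likely performs:
# read the file and use the first column as a datetime index.
csv_text = """Date,Open,High,Low,Close,Volume
2024-01-02,187.15,188.44,183.89,185.64,82488700
2024-01-03,184.22,185.88,183.43,184.25,58414500
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(len(df))         # number of data points
print(df.index.dtype)  # datetime64[ns]
```

Parsing the index up front is what makes period calculations on the timestamps possible later.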

__init__(pathname, train_test_split=False, mode='train', index_col=0)

Initialize CSV data source from file path.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `pathname` | `str` | Path to the CSV file. | *required* |
| `train_test_split` | `bool` | Whether to split data. | `False` |
| `mode` | `str` | Split mode (`'train'` or `'test'`). | `'train'` |
| `index_col` | `int` | Column to use as the datetime index. | `0` |

DataSource

Base class for handling market data sources in backtesting.

This class provides the interface for loading and accessing OHLCV (Open, High, Low, Close, Volume) market data. It supports time series data with pandas DataFrame input and provides efficient array-based access to price data during backtesting.

The class handles:

- Data validation (ensuring required columns are present)
- Optional train/test splitting for walk-forward analysis
- Efficient numpy array conversion for fast access
- Current data point tracking for time series iteration
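A minimal sketch of the required-column check described above, assuming a plain membership test against `required_columns` (the function name here is illustrative, not the framework's internals):

```python
import pandas as pd

# Matches the documented required_columns attribute.
REQUIRED_COLUMNS = ["Open", "High", "Low", "Close", "Volume"]

def validate_ohlcv(df: pd.DataFrame) -> None:
    """Raise ValueError if any required OHLCV column is missing."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"DataFrame is missing required columns: {missing}")

df = pd.DataFrame(
    {"Open": [1.0], "High": [2.0], "Low": [0.5], "Close": [1.5], "Volume": [100]}
)
validate_ohlcv(df)  # passes silently when all columns are present
```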

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `required_columns` | `list` | Required column names: `['Open', 'High', 'Low', 'Close', 'Volume']`. |

Example:

```python
import pandas as pd

df = pd.read_csv('data.csv')
source = DataSource(df)
print(f"Data points: {len(source)}")
print(f"Current close: {source.CClose}")
```

CClose property

Get the current close price at the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.float64` | Close price for the current data point. |

CHigh property

Get the current high price at the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.float64` | High price for the current data point. |

CLow property

Get the current low price at the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.float64` | Low price for the current data point. |

COpen property

Get the current open price at the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.float64` | Open price for the current data point. |

CVolume property

Get the current volume at the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.float64` | Volume for the current data point. |

Close property

Get the historical close prices up to the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.ndarray` | Array of close prices for all historical data points up to the current iteration index. |

High property

Get the historical high prices up to the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.ndarray` | Array of high prices for all historical data points up to the current iteration index. |

Index property

Get the datetime index of the data source.

Returns:

| Type | Description |
| --- | --- |
| `pd.Index` | Index containing the timestamps for each data point. |

Low property

Get the historical low prices up to the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.ndarray` | Array of low prices for all historical data points up to the current iteration index. |

Open property

Get the historical open prices up to the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.ndarray` | Array of open prices for all historical data points up to the current iteration index. |

Volume property

Get the historical volume data up to the current index.

Returns:

| Type | Description |
| --- | --- |
| `np.ndarray` | Array of volume values for all historical data points up to the current iteration index. |
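The properties above come in two flavors: scalar "current" values (`CClose`, `CHigh`, ...) and growing historical windows (`Close`, `High`, ...). A rough sketch of how such properties can be built on numpy arrays, under the stated assumption that an internal index advances during iteration (the class and attribute names here are illustrative, not the framework's internals):

```python
import numpy as np

class PriceWindow:
    """Illustrative: scalar current value vs. historical slice up to an index."""

    def __init__(self, close):
        self._close = np.asarray(close, dtype=np.float64)
        self.current_index = 0  # advanced by the backtest loop

    @property
    def CClose(self) -> np.float64:
        # Single value at the current data point.
        return self._close[self.current_index]

    @property
    def Close(self) -> np.ndarray:
        # All points up to and including the current one.
        return self._close[: self.current_index + 1]

w = PriceWindow([10.0, 11.0, 12.0])
w.current_index = 1
print(w.CClose)  # 11.0
print(w.Close)   # [10. 11.]
```

Slicing a pre-converted numpy array this way avoids repeated DataFrame lookups inside the backtest loop, which matches the efficiency rationale stated above.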

__init__(df, train_test_split=False, mode='train')

Initialize the data source with a pandas DataFrame.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `df` | `DataFrame` | DataFrame containing OHLCV data with columns 'Open', 'High', 'Low', 'Close', 'Volume'. The index should hold datetime values for proper period calculation. | *required* |
| `train_test_split` | `bool` | Whether to split data into train and test sets. | `False` |
| `mode` | `str` | Mode for data splitting (`'train'` or `'test'`); only used when `train_test_split=True`. | `'train'` |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the DataFrame doesn't contain the required columns. |

Note
  • Data is split 80/20 for train/test when enabled.
  • All data is converted to numpy arrays for efficient access.
  • Current index starts at the end for proper iteration.
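The 80/20 split mentioned in the note can be sketched as follows, assuming a plain chronological cutoff (whether the framework rounds the boundary exactly this way is an assumption; the function name is illustrative):

```python
import pandas as pd

def train_test_split_ohlcv(df: pd.DataFrame, mode: str = "train") -> pd.DataFrame:
    """Chronological 80/20 split: first 80% as train, last 20% as test."""
    cutoff = int(len(df) * 0.8)
    return df.iloc[:cutoff] if mode == "train" else df.iloc[cutoff:]

df = pd.DataFrame({"Close": range(10)})
print(len(train_test_split_ohlcv(df, "train")))  # 8
print(len(train_test_split_ohlcv(df, "test")))   # 2
```

A chronological (rather than shuffled) split is what makes the walk-forward analysis mentioned earlier meaningful: the test period always lies strictly after the training period.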

__len__()

Get the total number of data points in the source.

Returns:

| Type | Description |
| --- | --- |
| `int` | Number of rows in the DataFrame. |

ParquetDataSource

Bases: DataSource

Data source for loading Parquet files into the backtesting framework.

This class extends DataSource to specifically handle Parquet file loading, providing efficient binary format loading with automatic parsing and validation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `pathname` | `str` | Path to the Parquet file to load. | *required* |
| `train_test_split` | `bool` | Whether to split data into train and test sets. | `False` |
| `mode` | `str` | Mode for data splitting (`'train'` or `'test'`); only used when `train_test_split=True`. | `'train'` |
Example:

```python
source = ParquetDataSource("data/market_data.parquet")
print(f"Loaded {len(source)} data points")
```

Note

Parquet files are generally faster to load than CSV files and preserve data types more accurately.
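The dtype point can be seen from the CSV side: a plain CSV round-trip stringifies a datetime index unless it is re-parsed, whereas Parquet stores types natively. A small pandas demonstration of the CSV half (the data is illustrative; only CSV is exercised here):

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"Close": [185.64, 184.25]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)

# Round-trip through CSV without parse_dates: the index comes back as strings.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

print(df.index.dtype)        # datetime64[ns]
print(restored.index.dtype)  # object -- the datetime type was lost in text form
```

A Parquet round-trip (`to_parquet`/`read_parquet`) would restore `datetime64[ns]` directly, which is the accuracy advantage the note refers to.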

__init__(pathname, train_test_split=False, mode='train')

Initialize Parquet data source from file path.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `pathname` | `str` | Path to the Parquet file. | *required* |
| `train_test_split` | `bool` | Whether to split data. | `False` |
| `mode` | `str` | Split mode (`'train'` or `'test'`). | `'train'` |
