Data Sources Guide

This guide explains how market data enters Quantex and how strategy code reads it during a backtest.

If you are new to the library, start with this mental model:

  • a DataSource wraps historical OHLCV data
  • a strategy reads that data one bar at a time
  • the backtester moves the visible point forward during SimpleBacktester.run()

The three data source classes

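The current codebase defines three classes:

  • DataSource, which wraps a pandas DataFrame that is already in memory
  • CSVDataSource, which loads a CSV file
  • ParquetDataSource, which loads a Parquet file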

Required data format

Every data source must provide these columns:

  • Open
  • High
  • Low
  • Close
  • Volume

This requirement is enforced in DataSource.__init__().

The index should be time-based if you want meaningful annualization in backtest metrics, because BacktestReport.periods_per_year infers data frequency from the index.
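To illustrate why a time-based index matters, here is a minimal sketch of one way a bar frequency could be inferred from timestamps. It is not the logic inside BacktestReport.periods_per_year; the function name estimate_periods_per_year and the median-gap approach are assumptions made for this example, and it counts calendar time rather than trading sessions.

```python
from datetime import datetime
from statistics import median

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def estimate_periods_per_year(timestamps):
    """Estimate bars per year from the median gap between timestamps.

    Hypothetical helper for illustration; not part of Quantex.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps to infer a frequency")
    gaps = [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
    typical_gap = median(gaps)
    if typical_gap <= 0:
        raise ValueError("timestamps must be increasing")
    return SECONDS_PER_YEAR / typical_gap


daily = [datetime(2024, 1, day) for day in (1, 2, 3, 4)]
hourly = [datetime(2024, 1, 1, hour) for hour in range(4)]

daily_estimate = estimate_periods_per_year(daily)    # one bar per calendar day
hourly_estimate = estimate_periods_per_year(hourly)  # one bar per hour
```

With a plain integer index there are no gaps to measure, so no frequency can be recovered this way.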

Loading CSV data

The simplest entry point is CSVDataSource.

from quantex import CSVDataSource

source = CSVDataSource("data.csv")

print(len(source))
print(source.Index[0])
print(source.Index[-1])

CSVDataSource.__init__() reads the file with pandas using the first column as the parsed index by default.

Example CSV layout:

Date,Open,High,Low,Close,Volume
2024-01-01,100,101,99,100.5,1500
2024-01-02,100.5,102,100,101.8,1700
2024-01-03,101.8,103,101,102.6,1650
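The layout above can be checked before loading it into Quantex. The following sketch uses only the Python standard library (not CSVDataSource itself) to confirm that the first column parses as dates and that the required columns are present. The variable names are illustrative.

```python
import csv
import io
from datetime import datetime

REQUIRED = ["Open", "High", "Low", "Close", "Volume"]

sample = """Date,Open,High,Low,Close,Volume
2024-01-01,100,101,99,100.5,1500
2024-01-02,100.5,102,100,101.8,1700
2024-01-03,101.8,103,101,102.6,1650
"""

reader = csv.DictReader(io.StringIO(sample))

# The first column plays the role of the index.
index_column = reader.fieldnames[0]
missing = [name for name in REQUIRED if name not in reader.fieldnames]

index = []
closes = []
for row in reader:
    # strptime raises ValueError if a cell is not an ISO date.
    index.append(datetime.strptime(row[index_column], "%Y-%m-%d"))
    closes.append(float(row["Close"]))
```

For a real file, replace io.StringIO(sample) with an open file handle.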

Loading Parquet data

Use ParquetDataSource when your historical data is stored as Parquet.

from quantex import ParquetDataSource

source = ParquetDataSource("data.parquet")
print(len(source))

Pass a Parquet file path here, not a CSV path as some earlier documentation showed. ParquetDataSource.__init__() calls pandas.read_parquet directly.

Creating a data source from a pandas DataFrame

If you already have data in memory, create a plain DataSource.

import pandas as pd
from quantex import DataSource

df = pd.DataFrame(
    {
        "Open": [100, 101, 102],
        "High": [101, 102, 103],
        "Low": [99, 100, 101],
        "Close": [100.5, 101.8, 102.6],
        "Volume": [1500, 1700, 1650],
    },
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)

source = DataSource(df)

Train/test split support

DataSource.__init__() has a built-in 80/20 split option.

train_source = DataSource(df, train_test_split=True, mode="train")
test_source = DataSource(df, train_test_split=True, mode="test")

This is simple dataset slicing inside the constructor, as tested in tests/test_datasource.py.
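As a rough picture of what an 80/20 slice looks like, the sketch below splits a list by position. It assumes the split is chronological, with the first 80% of rows used for training and the remainder for testing. The source does not spell out the ordering, so treat that as an assumption. split_rows is a hypothetical helper, not a Quantex function.

```python
def split_rows(rows, mode, train_fraction=0.8):
    """Return the leading or trailing part of rows, split by position.

    Hypothetical helper; assumes a chronological split.
    """
    if mode not in ("train", "test"):
        raise ValueError(f"mode must be 'train' or 'test', got {mode!r}")
    cut = int(len(rows) * train_fraction)
    return rows[:cut] if mode == "train" else rows[cut:]


closes = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]

train = split_rows(closes, "train")  # first 8 values
test = split_rows(closes, "test")    # last 2 values
```

Splitting by position rather than by random sampling keeps the test rows strictly later than the training rows, which is what you usually want for time series.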

What you can read from a data source

Historical arrays

The historical properties, such as Close and High, expose the visible history up to the current step.

Example:

recent_closes = source.Close[-20:]
latest_visible_high = source.High[-1]

Current-bar values

The current-bar properties, such as COpen and CClose, expose the single current bar.

Example:

current_open = source.COpen
current_close = source.CClose

Index access

The full index is available through DataSource.Index.

timestamp = source.Index[source.current_index]

How data becomes visible during a backtest

This is one of the most important pieces of the library.

During SimpleBacktester.run(), each source receives a new DataSource.current_index on every step.

That means:

  • DataSource.CClose points to the current bar
  • DataSource.Close includes only data up to that bar
  • future bars are not exposed through the normal visible-history properties

This is why strategy code can safely use slices like self.data["TEST"].Close[-20:] during a backtest.
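The idea can be sketched without Quantex at all. The class below is a simplified stand-in, not the library's implementation: VisibleSeries, history, and current are hypothetical names that mirror the roles of Close and CClose.

```python
class VisibleSeries:
    """Toy model of a column whose visible part grows with current_index."""

    def __init__(self, values):
        self._values = list(values)
        self.current_index = 0

    @property
    def history(self):
        # Everything up to and including the current bar; nothing after it.
        return self._values[: self.current_index + 1]

    @property
    def current(self):
        return self._values[self.current_index]


closes = VisibleSeries([100.5, 101.8, 102.6, 103.1])

seen = []
for step in range(4):
    # A backtester would advance the index once per bar.
    closes.current_index = step
    seen.append((closes.current, closes.history[-2:]))
```

Because the slice is taken from the visible part only, a negative slice such as [-2:] can never reach past the current bar.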

Using data sources inside a strategy

from quantex import Strategy, CSVDataSource


class MyStrategy(Strategy):
    def init(self):
        self.add_data(CSVDataSource("eurusd.csv"), "EURUSD")

    def next(self):
        price_now = self.data["EURUSD"].CClose
        recent_prices = self.data["EURUSD"].Close[-10:]

        if price_now > recent_prices.mean():
            self.positions["EURUSD"].buy(quantity=0.1)

Strategy.add_data() stores the source in Strategy.data and creates a matching Broker in Strategy.positions.

Multiple data sources in one strategy

Multiple symbols

from quantex import Strategy, CSVDataSource


class MyStrategy(Strategy):
    def init(self):
        self.add_data(CSVDataSource("eurusd.csv"), "EURUSD")
        self.add_data(CSVDataSource("gbpusd.csv"), "GBPUSD")

    def next(self):
        if self.data["EURUSD"].CClose > self.data["GBPUSD"].CClose:
            self.positions["EURUSD"].buy(quantity=0.1)

Multiple timeframes

from quantex import Strategy, CSVDataSource


class MyStrategy(Strategy):
    def init(self):
        self.add_data(CSVDataSource("eurusd_m1.csv"), "EURUSD_M1")
        self.add_data(CSVDataSource("eurusd_h1.csv"), "EURUSD_H1")

    def next(self):
        fast_price = self.data["EURUSD_M1"].CClose
        slower_context = self.data["EURUSD_H1"].CClose

When you attach multiple sources, remember that SimpleBacktester.run() splits starting cash evenly across the brokers created for those sources.
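The arithmetic of an even split is worth seeing once, because it also applies to sources you attach only for context. This is a sketch of the calculation, not the code inside SimpleBacktester.run(); split_cash is a hypothetical helper.

```python
def split_cash(starting_cash, symbols):
    """Divide starting cash evenly across the given symbols."""
    if not symbols:
        raise ValueError("at least one symbol is required")
    share = starting_cash / len(symbols)
    return {symbol: share for symbol in symbols}


allocation = split_cash(10_000.0, ["EURUSD_M1", "EURUSD_H1"])
```

In this example each broker receives 5,000, so a source that is only read and never traded would still hold half of the starting cash.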

Data validation tips

Before backtesting, check:

  1. required columns exist
  2. index ordering makes sense
  3. numeric values are numeric
  4. timestamps are what you expect

Example:

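def validate_dataframe(df):
    """Check required columns and return the frame sorted by index."""
    required = ["Open", "High", "Low", "Close", "Volume"]
    missing = [col for col in required if col not in df.columns]

    if missing:
        raise ValueError(f"missing columns: {missing}")

    # Sort instead of failing, so bars are replayed in chronological order.
    if not df.index.is_monotonic_increasing:
        df = df.sort_index()

    return df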
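The function above covers the first two checks. For the third, a minimal sketch of a row-level numeric check might look like the following. It works on plain dictionaries (for example rows from csv.DictReader) rather than a DataFrame, and find_bad_rows is a hypothetical helper. The High-below-Low test is an extra sanity check added here for illustration.

```python
import math

PRICE_COLUMNS = ("Open", "High", "Low", "Close", "Volume")


def find_bad_rows(rows):
    """Return (row_number, reason) pairs for rows that fail basic checks."""
    problems = []
    for number, row in enumerate(rows):
        try:
            values = {name: float(row[name]) for name in PRICE_COLUMNS}
        except (KeyError, TypeError, ValueError):
            problems.append((number, "non-numeric or missing value"))
            continue
        # float("nan") parses without error, so NaN needs its own test.
        if any(math.isnan(value) for value in values.values()):
            problems.append((number, "NaN value"))
        elif values["High"] < values["Low"]:
            problems.append((number, "High below Low"))
    return problems


rows = [
    {"Open": "100", "High": "101", "Low": "99", "Close": "100.5", "Volume": "1500"},
    {"Open": "n/a", "High": "102", "Low": "100", "Close": "101.8", "Volume": "1700"},
    {"Open": "101", "High": "100", "Low": "102", "Close": "101", "Volume": "10"},
]

problems = find_bad_rows(rows)
```

Reporting the offending row numbers instead of raising on the first failure makes it easier to inspect a raw file in one pass.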

Limitations to be aware of

The current data source API does not include:

  • a from_dataframe convenience constructor on CSVDataSource
  • built-in resampling helpers
  • built-in synchronization utilities for mismatched timelines

If you need those features, preprocess the pandas DataFrame first and then pass the result into DataSource.
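As one example of such preprocessing, the sketch below aggregates minute bars into hourly bars using only the standard library. It is an illustration of the resampling rule (first open, highest high, lowest low, last close, summed volume), not a Quantex feature. resample_to_hours and the tuple layout are assumptions made for this example. With pandas you would typically do the same thing on the DataFrame before constructing the DataSource.

```python
from datetime import datetime


def resample_to_hours(bars):
    """Aggregate (timestamp, open, high, low, close, volume) tuples by hour.

    Each output bar is labeled with the start of its hour.
    """
    hourly = {}
    for ts, open_, high, low, close, volume in sorted(bars, key=lambda bar: bar[0]):
        key = ts.replace(minute=0, second=0, microsecond=0)
        if key not in hourly:
            hourly[key] = [open_, high, low, close, volume]
        else:
            bar = hourly[key]
            bar[1] = max(bar[1], high)  # highest high
            bar[2] = min(bar[2], low)   # lowest low
            bar[3] = close              # last close wins
            bar[4] += volume            # volume accumulates
    return [(key, *values) for key, values in hourly.items()]


minute_bars = [
    (datetime(2024, 1, 1, 9, 0), 100.0, 101.0, 99.5, 100.5, 10),
    (datetime(2024, 1, 1, 9, 1), 100.5, 102.0, 100.0, 101.5, 20),
    (datetime(2024, 1, 1, 10, 0), 101.5, 101.8, 101.0, 101.2, 5),
]

hour_bars = resample_to_hours(minute_bars)
```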

Minimal preprocessing workflow

import pandas as pd
from quantex import DataSource


df = pd.read_csv("raw.csv", index_col=0, parse_dates=[0])
df = df.sort_index()
df = df.dropna(subset=["Open", "High", "Low", "Close", "Volume"])

source = DataSource(df)

Summary

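Use:

  • CSVDataSource to load historical data from a CSV file
  • ParquetDataSource to load historical data from a Parquet file
  • DataSource to wrap a pandas DataFrame that is already in memory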

For next steps, see Strategy guide and Backtesting guide.