


Python Backtesting Libraries For Quant Trading Strategies
Written by Khang Nguyen Vo, khangvo88@gmail.com, for the RobustTechHouse (Mobile App Development Singapore) blog. Khang is a graduate from the Masters of Quantitative and Computational Finance Program, John Von Neumann Institute 2014. He is passionate about research in machine learning, predictive modeling and backtesting of trading strategies.
Frequently Mentioned Python Backtesting Libraries
It is essential to backtest quant trading strategies before trading them with real money. Here, we review frequently used Python backtesting libraries. We examine them in terms of flexibility (can be used for backtesting, paper-trading as well as live-trading), ease of use (good documentation, good structure) and scalability (speed, simplicity, and compatibility with other libraries).
- Zipline: This is an event-driven backtesting framework used by Quantopian.
  - Zipline has a great community, good documentation, great support for Interactive Brokers (IB) and Pandas integration. The syntax is clear and easy to learn.
  - It has a lot of examples. If your main goal is trading US equities, then this framework might be the best candidate. Quantopian allows one to backtest, share, and discuss trading strategies in its community.
  - However, in our experiments, Zipline was extremely slow. This is the biggest disadvantage of this library. Quantopian has some workarounds, such as running the Zipline library in parallel in the cloud. You can take a look at this post if this interests you.
  - Zipline also seems to work poorly with local files and non-US data.
  - It is difficult to use this framework for different financial asset classes.
- PyAlgoTrade: This is another event-driven library which is active and supports backtesting, paper-trading and live-trading. It is well-documented and also supports TA-Lib integration (Technical Analysis library). It outperforms Zipline in terms of speed and flexibility. However, one big drawback of PyAlgoTrade is that it does not support Pandas objects and Pandas modules.
- pybacktest: A vectorized backtesting framework in Python that is very simple and lightweight. This project seems to have been revived recently, on May 21st, 2015.
- TradingWithPython: Jev Kuznetsov extended the pybacktest library and built his own backtester. This library seems to have been updated recently, in Feb 2015. However, the documentation and course for this library cost $395.
- Some other projects: ultra-finance
Python Backtesting Libraries are summarized in the following table:
Feature | Zipline | PyAlgoTrade | TradingWithPython | pybacktest |
Type | Event-driven | Event-driven | Vectorized | Vectorized |
Community | Great | Normal | No | No |
Cloud | Quantopian | No | No | No |
Interactive Brokers support | Yes | No | No | No |
Data feed | Yahoo, Google, NinjaTrader | Yahoo, Google, NinjaTrader, Xignite, Bitstamp realtime feed | | |
Documentation | Great | Great | $395 | Poor |
Event profile | Yes | Yes | | |
Speed | Slow | Fast | | |
Pandas supported | Yes | No | Yes | Yes |
Trading calendar | Yes | No | No | No |
TA-Lib support | Yes | Yes | Yes | |
Suitable for | US equity only | Real trading, paper-test trading | Paper-test trading | Paper-test trading |
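The "Type" row distinguishes event-driven from vectorized backtesters. As a rough, dependency-free illustration (not code taken from any of the libraries above), the sketch below runs the same toy rule both ways: hold one share over the next bar whenever the latest close is above the previous close. The function names and the price list are made up for the example, and plain Python lists stand in for the array operations a real vectorized framework would use.

```python
def backtest_event_driven(closes):
    """Walk through the bars one at a time, as an event-driven engine would."""
    pnl = 0.0
    position = 0  # shares held going into the current bar
    for i in range(1, len(closes)):
        # Realise the P&L of the position decided on the previous bar.
        pnl += position * (closes[i] - closes[i - 1])
        # Decide the position for the next bar using only data seen so far.
        position = 1 if closes[i] > closes[i - 1] else 0
    return pnl


def backtest_vectorized(closes):
    """Compute all signals at once, then combine them with shifted price changes."""
    changes = [b - a for a, b in zip(closes, closes[1:])]
    signals = [1 if c > 0 else 0 for c in changes]
    # signals[k] is known at the end of bar k+1 and earns changes[k+1].
    return sum(s * c for s, c in zip(signals, changes[1:]))


closes = [10.0, 11.0, 12.0, 11.5, 12.5, 13.0]  # illustrative prices
event_pnl = backtest_event_driven(closes)
vector_pnl = backtest_vectorized(closes)
```

Both functions give the same result on this data. The event-driven form is easier to extend with order handling and per-bar state, while the vectorized form is usually shorter and faster for simple rules.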
Zipline vs PyAlgoTrade Python Backtesting Libraries
We will focus on comparing the more popular Zipline and PyAlgoTrade Python Backtesting Libraries below.
1. Zipline:
The documentation can be found at http://www.zipline.io/tutorial/ and you can find some implementations on Quantopian. We do not go into detail on how to use this library here since the documentation is clear and concise. The sample script below just shows how this Python backtesting library works for a simple strategy.
The syntax for zipline is very clear and simple, and it is suitable for newcomers, who can focus on the trading strategy itself. Its other strengths include:
- Good documentation, great community
- IPython-compatible: supports the %%zipline cell magic
- Input and output for zipline are based on Pandas DataFrames. This is a big advantage since Pandas is one of the most widely used and easiest libraries for data analysis and modeling
- Support for a slippage model (also called an impact model: when you buy or sell, the order itself moves the price you actually get) and a commission model (the cost of each transaction). Modeling these makes backtested strategies more realistic.
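To make the slippage and commission idea concrete, here is a minimal, library-independent sketch. It is not zipline's implementation: the function name, the fixed basis-point slippage and the per-share commission are all assumptions chosen for illustration.

```python
def simulated_fill(price, shares, slippage_bps=5.0, commission_per_share=0.01):
    """Return (fill_price, total_cost) for a market order.

    shares > 0 is a buy, shares < 0 is a sell. The default slippage and
    commission values are placeholders, not defaults of any library.
    """
    direction = 1 if shares > 0 else -1
    # Slippage moves the price against the trader: up for buys, down for sells.
    fill_price = price * (1 + direction * slippage_bps / 10000.0)
    commission = abs(shares) * commission_per_share
    # Cash paid out (positive) or received (negative), commission included.
    total_cost = shares * fill_price + commission
    return fill_price, total_cost


buy_price, buy_cost = simulated_fill(100.0, 10)
sell_price, sell_cost = simulated_fill(100.0, -10)
```

With these placeholder settings, a round trip at an unchanged quoted price of 100 loses money: the buy costs slightly more than 1000 and the sell returns slightly less, which is the effect a backtest without these models would miss.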
    # Sample zipline script: buy 10 shares of AAPL on every bar.
    import time
    from datetime import datetime

    import pytz
    import zipline
    from zipline.api import order, record, symbol
    from zipline.algorithm import TradingAlgorithm
    from zipline.utils.factory import load_bars_from_yahoo

    # Load data manually from Yahoo! Finance
    start = datetime(2000, 1, 1, 0, 0, 0, 0, pytz.utc)
    end = datetime(2012, 1, 1, 0, 0, 0, 0, pytz.utc)
    data = load_bars_from_yahoo(stocks=['AAPL'], start=start, end=end)
    print(type(data["AAPL"]))
    print(data["AAPL"])

    # This creates the cache file for benchmarks. SHOULD ONLY RUN ONCE
    zipline.data.loader.dump_benchmarks('SPY')

    # Define algorithm
    def initialize(context):
        pass

    def handle_data(context, data):
        order(symbol('AAPL'), 10)
        record(AAPL=data[symbol('AAPL')].price)

    # Create algorithm object passing in initialize, handle_data functions
    algo_obj = TradingAlgorithm(initialize=initialize, handle_data=handle_data)

    start_time = time.time()  # calculate the running time
    for i in range(10):
        perf_manual = algo_obj.run(data)
    print("--- %s seconds ---" % (time.time() - start_time))
This trading strategy is simple: we buy 10 shares on each iteration. Note that zipline allows negative cash, so the order is always filled. The iteration occurs in the handle_data() function, and on each call the current bar is passed in through the data variable. Each bar is structured as follows:
    BarData({'AAPL': SIDData({
        'high': 3.8190101840271575,
        'open': 3.5603358942290511,
        'price': 3.8,
        'volume': 133949200,
        'low': 3.452045738788637,
        'sid': 'AAPL',
        'source_id': 'DataPanelSource-6d0572f7ed3cad6d52522c275aee663d',
        'close': 3.7999999999999998,
        'dt': Timestamp('2000-01-03 00:00:00+0000', tz='UTC'),
        'type': 4})})
The average running time (10 loops) for this script is about 66 seconds, which seems really long considering we are only fetching daily data and running a simple trading algorithm. We then tried using a local file instead of fetching from Yahoo Finance:
    import pandas as pd

    # start and end are the UTC datetimes defined in the script above
    data = pd.read_csv('AAPL.csv', header=0, index_col=0, parse_dates=True)
    data.sort_index(inplace=True)    # zipline expects ascending dates
    data = data.tz_localize('UTC')   # required to run
    data = data[(data.index >= start) & (data.index <= end)]
AAPL.csv is the local file downloaded from http://ichart.finance.yahoo.com/table.csv?s=AAPL. Sorting and localizing the data is mandatory because zipline treats the data as an ascending timeline and extracts each bar from it. The handle_data() function changes accordingly:
    def handle_data(context, data):
        order('Close', 10)
        record(AAPL=data['Close'].price)
Then the data changes as follows:
    BarData({
        'Volume': SIDData({'price': 151494000.0, 'volume': 1000, 'sid': 'Volume',
            'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b',
            'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4}),
        'Adj Close': SIDData({'price': 55.305234999999996, 'volume': 1000, 'sid': 'Adj Close',
            'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b',
            'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4}),
        'High': SIDData({'price': 421.58997, 'volume': 1000, 'sid': 'High',
            'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b',
            'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4}),
        'Low': SIDData({'price': 411.999977, 'volume': 1000, 'sid': 'Low',
            'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b',
            'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4}),
        'Close': SIDData({'price': 412.13998, 'volume': 1000, 'sid': 'Close',
            'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b',
            'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4}),
        'Open': SIDData({'price': 419.639992, 'volume': 1000, 'sid': 'Open',
            'source_id': 'DataFrameSource-f6bfb478831d7581226e6a4507bc386b',
            'dt': Timestamp('2011-09-21 00:00:00+0000', tz='UTC'), 'type': 4})})
* Note: We have to be careful with the volume field here. With this method, each data column (Open, Close, High, Low, Adj Close and Volume) is treated as an individual instrument, and the ‘volume’ field is set to 1000 by default. In a backtest, an order is filled or cancelled based on the available market volume (please see this reference), so we need to change the ‘volume’ field set here.
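Why the placeholder volume matters can be seen in a small sketch of volume-limited filling. This is an illustration only, not zipline's code: the function name and the 25% cap on the share of a bar's volume are assumptions made for the example.

```python
def volume_limited_fill(order_shares, bar_volume, max_volume_share=0.25):
    """Return (filled, unfilled) shares for one bar.

    max_volume_share is a hypothetical cap on the fraction of the bar's
    traded volume that a single order may consume.
    """
    max_fill = int(bar_volume * max_volume_share)
    filled = min(abs(order_shares), max_fill)
    sign = 1 if order_shares >= 0 else -1
    return sign * filled, sign * (abs(order_shares) - filled)


# With the placeholder volume of 1000, a 500-share order is only partly filled.
partial = volume_limited_fill(500, 1000)
# With the real daily volume from the first bar shown earlier, it fills completely.
full = volume_limited_fill(500, 133949200)
```

Under this assumed rule, a default volume of 1000 would silently cap larger orders, so backtest results could differ from a run on real volume data.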
The average running time is 61 seconds, which isn’t much better than the load_bars_from_yahoo() run we tried before. Performance is in fact a known issue for the zipline library. Even though we use local data files, zipline still needs to fetch data from Yahoo for the trading environment. This is due to the benchmark mechanism embedded in this library; for example, the get_raw_benchmark_data() function sends a request to Yahoo to get the data points for ^GSPC.
Of course, one can try to customize the code to use one’s own data rather than fetching data from other sources; however, this requires a lot of effort. Jason Swearingen deals with this problem (stated in this post) by writing his own library called QuantShim, which supports Zipline and Quantopian. However, this is out of scope here.
Also, it is really difficult to deal with higher-frequency trading data (hourly, minute or tick data) here. In order to work with data outside of the provided benchmark date range, one can either:
(1) supply your own benchmark (look at this suggestion and the answer for issue 271); or
(2) run without a benchmark and then not compute the risk metrics that require it (comment out some lines in risk.py or benchmark.py). This is mentioned in issue 13.
If your target market is the US market, then zipline is a decent choice for a Python backtesting library. But for backtesting different financial assets in all markets, zipline’s lack of flexibility and slow running time will cause issues.
2. PyAlgoTrade:
We use the following simple script to demonstrate how PyAlgoTrade works compared to Zipline. PyAlgoTrade’s documentation can be found here, including tutorial and sample strategies. For fair comparison, let’s try the same strategy we did above:
    from pyalgotrade import strategy
    from pyalgotrade.tools import yahoofinance

    instruments = ["AAPL"]


    class MyStrategy(strategy.BacktestingStrategy):
        def __init__(self, feed, instrument, useAdjustedClose=False):
            strategy.BacktestingStrategy.__init__(self, feed, cash_or_brk=100000)
            self.__instrument = instrument
            self.setUseAdjustedValues(useAdjustedClose)
            # We will allow buying more shares than cash allows.
            self.getBroker().setAllowNegativeCash(True)

        def onBars(self, bars):
            bar = bars[self.__instrument]
            self.marketOrder(self.__instrument, 10)  # buy 10
            self.info("BUY 10 %s, Portfolio value: %s" % (self.__instrument, self.getBroker().getEquity()))


    feed = yahoofinance.build_feed(instruments, fromYear=2000, toYear=2012, storage="data")

    # Evaluate the strategy with the feed's bars.
    myStrategy = MyStrategy(feed, instruments[0])
    myStrategy.run()
    print("Final portfolio value: $%.2f" % myStrategy.getResult())
This is also pretty simple. The script obtains data from Yahoo and iterates over the bars using onBars(). Unlike zipline, PyAlgoTrade does not allow negative cash by default, so we must enable it explicitly.
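The effect of the negative-cash setting can be illustrated with a toy broker. The class below is a made-up stand-in written for this example (it is not PyAlgoTrade's broker), and the cash amount and prices are arbitrary.

```python
class SimpleBroker:
    """Toy broker used only for illustration; not PyAlgoTrade's broker class."""

    def __init__(self, cash, allow_negative_cash=False):
        self.cash = cash
        self.shares = 0
        self.allow_negative_cash = allow_negative_cash

    def market_buy(self, price, quantity):
        """Buy at `price`; return True if the order was accepted."""
        cost = price * quantity
        if cost > self.cash and not self.allow_negative_cash:
            return False  # rejected: not enough cash
        self.cash -= cost
        self.shares += quantity
        return True


prices = [40.0, 50.0, 60.0]  # one buy of 10 shares per bar

strict = SimpleBroker(cash=1000.0)
strict_results = [strict.market_buy(p, 10) for p in prices]

lenient = SimpleBroker(cash=1000.0, allow_negative_cash=True)
lenient_results = [lenient.market_buy(p, 10) for p in prices]
```

The strict broker rejects the third order once cash runs out, while the lenient one keeps buying and ends with a negative balance. Without the explicit setting, a "buy 10 on every bar" strategy would therefore not be comparable between the two libraries.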
Changing the feed to a local file is very easy in PyAlgoTrade, which makes this library more suitable for paper backtests than zipline. In the example below, we also use the data file downloaded from Yahoo.
    # Load the yahoo feed from the CSV file
    from pyalgotrade.barfeed import yahoofeed

    feed = yahoofeed.Feed()
    feed.addBarsFromCSV(instrument="AAPL", path="AAPL.csv")
A CSV file that is not in Yahoo format can be loaded with the generic bar feed instead:
    import pytz
    from pyalgotrade.barfeed import csvfeed
    from pyalgotrade.bar import Frequency

    filename = '../../data/gold/gold3_1.csv'
    feed = csvfeed.GenericBarFeed(Frequency.DAY, pytz.utc)
    feed.addBarsFromCSV('gap', filename)
One thing I like about PyAlgoTrade is that it is more flexible than the zipline library for placing orders. Besides individual orders (e.g. market, limit, stop and stop-limit orders), PyAlgoTrade provides higher-level functions that wrap a pair of entry/exit orders (e.g. the enterLong, enterShort, enterLongLimit and enterShortLimit interfaces).
The strategy is then notified through event handlers. In most cases, we only work with six of them: onEnterOk, onEnterCanceled, onExitOk, onExitCanceled, onOrderUpdated and onBars.
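The idea of a position that wraps an entry/exit order pair and reports back through callbacks can be sketched without the library. The two classes below are hypothetical stand-ins: only the callback names are borrowed from the events listed above, and the `filled` flag replaces a real broker deciding whether an order executes.

```python
class ToyStrategy:
    """Records which callbacks fire; the names mirror the events listed above."""

    def __init__(self):
        self.log = []

    def onEnterOk(self, position):
        self.log.append("onEnterOk")

    def onEnterCanceled(self, position):
        self.log.append("onEnterCanceled")

    def onExitOk(self, position):
        self.log.append("onExitOk")

    def onExitCanceled(self, position):
        self.log.append("onExitCanceled")


class ToyPosition:
    """Pairs one entry order with one exit order and notifies the strategy."""

    def __init__(self, strategy, quantity):
        self.strategy = strategy
        self.quantity = quantity
        self.is_open = False

    def enter(self, filled):
        if filled:
            self.is_open = True
            self.strategy.onEnterOk(self)
        else:
            self.strategy.onEnterCanceled(self)

    def exit(self, filled):
        if not self.is_open:
            return  # nothing to exit
        if filled:
            self.is_open = False
            self.strategy.onExitOk(self)
        else:
            self.strategy.onExitCanceled(self)


strategy = ToyStrategy()
position = ToyPosition(strategy, quantity=10)
position.enter(filled=True)   # entry order fills
position.exit(filled=False)   # first exit attempt is canceled
position.exit(filled=True)    # second exit attempt fills
```

In this toy run the strategy sees onEnterOk, onExitCanceled and onExitOk in that order, which is where a real strategy would typically resubmit a canceled exit or record the finished trade.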
However, PyAlgoTrade provides its own DataSeries and Bar classes, and these classes do not work with the Pandas library. This is frustrating since Pandas is commonly used for data analysis and modeling. Let’s look at the bar object defined in each iteration:
    <class 'pyalgotrade.bar.BasicBar'>
    ['_BasicBar__adjClose', '_BasicBar__close', '_BasicBar__dateTime', '_BasicBar__frequency',
     '_BasicBar__high', '_BasicBar__low', '_BasicBar__open', '_BasicBar__useAdjustedValue',
     '_BasicBar__volume', '__abstractmethods__', '__class__', '__delattr__', '__dict__',
     '__doc__', '__format__', '__getattribute__', '__getstate__', '__hash__', '__init__',
     '__metaclass__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
     '__setattr__', '__setstate__', '__sizeof__', '__slots__', '__str__', '__subclasshook__',
     '__weakref__', '_abc_cache', '_abc_negative_cache', '_abc_negative_cache_version',
     '_abc_registry', 'getAdjClose', 'getAdjHigh', 'getAdjLow', 'getAdjOpen', 'getClose',
     'getDateTime', 'getFrequency', 'getHigh', 'getLow', 'getOpen', 'getPrice',
     'getTypicalPrice', 'getUseAdjValue', 'getVolume', 'setUseAdjustedValue']
Without Pandas support, you will likely spend more time learning PyAlgoTrade than the zipline library. Zipline provides a simple interface and a familiar data type (Pandas), so the user can focus on the strategy itself rather than spend time on technical plumbing.
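One possible workaround is to copy the values out of the bar objects through their getters and build a table from the result. The sketch below assumes only the getter names visible in the listing above; StubBar is a stand-in class written for this example so that the code runs without PyAlgoTrade, and the sample values are illustrative.

```python
from datetime import datetime


class StubBar:
    """Stand-in with a few of the getters shown above; not the real BasicBar."""

    def __init__(self, date_time, open_, high, low, close, volume):
        self._date_time = date_time
        self._open = open_
        self._high = high
        self._low = low
        self._close = close
        self._volume = volume

    def getDateTime(self):
        return self._date_time

    def getOpen(self):
        return self._open

    def getHigh(self):
        return self._high

    def getLow(self):
        return self._low

    def getClose(self):
        return self._close

    def getVolume(self):
        return self._volume


def bars_to_rows(bars):
    """Flatten bar objects into plain dicts, one per bar."""
    return [
        {
            "datetime": bar.getDateTime(),
            "open": bar.getOpen(),
            "high": bar.getHigh(),
            "low": bar.getLow(),
            "close": bar.getClose(),
            "volume": bar.getVolume(),
        }
        for bar in bars
    ]


bars = [
    StubBar(datetime(2000, 1, 3), 3.56, 3.82, 3.45, 3.80, 133949200),
    StubBar(datetime(2000, 1, 4), 3.70, 3.75, 3.40, 3.48, 128094400),
]
rows = bars_to_rows(bars)
# If pandas is installed: frame = pandas.DataFrame(rows).set_index("datetime")
```

Collecting rows like this inside onBars() costs an extra copy per bar, but it lets the analysis afterwards happen in Pandas.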
However, compared to zipline, PyAlgoTrade clearly outperforms in terms of running time. With the same algorithm, the average running time is only 2 seconds while the zipline script above takes about a minute.
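For readers who want to repeat the timing comparison, a small helper along these lines gives a per-run average. It is a sketch written for Python 3 (the scripts in this article target older Python 2 era library versions), and dummy_backtest is a placeholder workload, not one of the strategies above.

```python
import time


def average_runtime(func, repeats=10):
    """Call func() `repeats` times and return the mean wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


def dummy_backtest():
    # Stand-in workload; swap in a callable that runs the real backtest.
    return sum(i * i for i in range(10000))


seconds_per_run = average_runtime(dummy_backtest, repeats=5)
```

Wrapping each library's run call in a function and passing it to the helper keeps the measurement identical for both, so only the backtest itself differs.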
Summary of Zipline vs PyAlgoTrade Python Backtesting Libraries
I would rate these two Python Backtesting Libraries as follows:
Criterion | Zipline | PyAlgoTrade | Description |
Paper-trading | ♦ | ♦ ♦ ♦ | Zipline doesn’t seem to work well with non-US and local data, while PyAlgoTrade works with any type of data |
Real-trading | ♦ ♦ | ♦ ♦ | Both are good, but cloud programming in Quantopian is really impressive |
Flexibility | ♦ ♦ | ♦ ♦ ♦ | PyAlgoTrade supports higher-level order types and more events in transactions. Zipline, on the other hand, provides a simple slippage model |
Speed | ♦ | ♦ ♦ ♦ | Zipline is really slow compared to PyAlgoTrade |
Ease of use | ♦ ♦ ♦ | ♦ ♦ | PyAlgoTrade does not support Pandas |
Each Python Backtesting library has its own strengths and weaknesses, and a lot of interesting functions which I didn’t bring up in this article. So I would suggest you choose the most suitable one based on what your requirements are and the pros and cons mentioned above.
Comments
Where do you see PyAlgoTrade supporting Interactive Brokers?
You are right. I think the article was just updated to state that PyAlgoTrade does not support IB.
Would you be willing to include “backtrader” in your comparison? (www.backtrader.com)