In the tutorial How To Perturb Your Simulation With The Random Function, I described a method of extracting statistics from a simulation that I believe are more realistic than those from the typical backtest, which consists of a single run whose results reflect either conscious or subconscious optimization. My method adds randomness to the buying decision, so that lower ranked stocks are sometimes bought instead of the highest ranked stocks. Multiple simulation runs are then performed using the Optimizer, and the resulting statistics are averaged. I will now refer to these averaged results as "sober statistics". (Folks, remember that you heard this term here first!) In this post, I will expand on this technique and explain why it is important both during system development and when evaluating the end product.
After I posted about perturbing your simulation, questions were raised as to why I go to the trouble of using sober statistics when the same results can supposedly be obtained by simply doubling the number of stock holdings and running the simulation once. My answer is that the premise isn't correct: the two approaches do not give the same results.
First of all, when you double the number of stock holdings, from 10 to 20 stocks for example, the general characteristics of the portfolio change. The annualized returns, volatility and maximum drawdown will all decrease. While the annualized returns decrease in a fashion similar to the sober annualized returns, the lower volatility and drawdown are undesirable because they do not reflect what a 10 stock portfolio would experience. In other words, if the end result is a 10 stock portfolio, then you should restrict your simulation to 10 stocks if you want results that are representative of a 10 stock portfolio.
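To make the diversification effect concrete, here is a minimal Python sketch based on the standard variance formula for an equally weighted portfolio. The per-stock volatility and the pairwise correlation are illustrative assumptions, not values taken from the simulations in this post, and real holdings will not be this uniform.

```python
from math import sqrt

def portfolio_volatility(n_stocks, stock_vol, correlation):
    """Volatility of an equally weighted portfolio in which every stock has
    the same volatility and every pair has the same correlation
    (both simplifying assumptions)."""
    variance = stock_vol ** 2 * (1.0 / n_stocks + (1.0 - 1.0 / n_stocks) * correlation)
    return sqrt(variance)

# Hypothetical inputs: 40% annual volatility per stock, 0.3 pairwise correlation.
vol_10 = portfolio_volatility(10, 0.40, 0.3)
vol_20 = portfolio_volatility(20, 0.40, 0.3)
print(f"10 stocks: {vol_10:.4f}   20 stocks: {vol_20:.4f}")
```

Under these assumptions the 20 stock figure comes out lower than the 10 stock figure for any correlation below 1, which is why a 20 stock run cannot stand in for the risk of a 10 stock portfolio.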
Shown below are the Probability Density Functions for the 20 stock simulation versus the 10 stock perturbed simulations.
While the 20 stock simulation has 100% certainty that the 20 top ranked stocks will be purchased, the 10 stock perturbed simulation does not. Instead, the probability of purchase follows a binomial distribution across the ranks. The more simulation runs that are performed and averaged, the more closely the stock weightings in the sober statistics will reflect the probability density function. Thus, the sober statistics will have a crude Gaussian average applied throughout.
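The averaging described above can be sketched with a small Monte Carlo experiment in Python. This is only an illustration under an assumed perturbation rule: each stock, taken in rank order, passes a random buy filter with probability `p_accept`, and the first `n_hold` stocks that pass are bought. The function names and the value of `p_accept` are hypothetical and are not Portfolio123 syntax.

```python
import random

def perturbed_picks(n_ranked, n_hold, p_accept, rng):
    """Walk down the ranking from best (rank 1) to worst and buy each stock
    that passes the random filter, until n_hold stocks are held."""
    picks = []
    for rank in range(1, n_ranked + 1):
        if rng.random() < p_accept:
            picks.append(rank)
            if len(picks) == n_hold:
                break
    return picks

def purchase_frequency(n_ranked, n_hold, p_accept, n_runs, seed=0):
    """Fraction of runs in which each rank was bought (index 0 is unused)."""
    rng = random.Random(seed)
    counts = [0] * (n_ranked + 1)
    for _ in range(n_runs):
        for rank in perturbed_picks(n_ranked, n_hold, p_accept, rng):
            counts[rank] += 1
    return [c / n_runs for c in counts]

# Assumed setup: 500 ranked stocks, 10 holdings, 50% chance of passing the filter.
freq = purchase_frequency(n_ranked=500, n_hold=10, p_accept=0.5, n_runs=2000)
for rank in (1, 10, 20, 30, 40):
    print(rank, round(freq[rank], 3))
```

Raising `n_runs` smooths the frequencies, which is the sense in which averaging more runs brings the effective stock weightings closer to the underlying distribution.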
Is the averaging too extreme? I don't believe so. A quick back of the napkin calculation shows that for a 10 stock S&P 500 simulation, 97% of the stock representation will come from the top 10% ranked stocks. If you need a top 2% system to get your desired performance as opposed to a top 10% system, then you should probably reconsider your goal because it is likely not achievable.
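A calculation of this kind can be sketched in Python as follows. The sketch assumes that each stock independently passes the random buy filter with probability `p_accept` and that the first `n_hold` passing stocks in rank order are bought. That rule and the value of `p_accept` are assumptions for illustration, so the printed share need not match the figure quoted above, which depends on the perturbation actually used.

```python
from math import comb

def prob_held(rank, n_hold, p_accept):
    """Probability that the stock at this rank ends up in the portfolio:
    it must pass the filter, and at most n_hold - 1 of the rank - 1
    better ranked stocks may have passed before it."""
    better = rank - 1
    room_left = sum(
        comb(better, k) * p_accept ** k * (1 - p_accept) ** (better - k)
        for k in range(n_hold)  # comb() returns 0 when k > better
    )
    return p_accept * room_left

def share_from_top(top_k, n_ranked, n_hold, p_accept):
    """Expected fraction of holdings drawn from the top_k ranks."""
    held = [prob_held(r, n_hold, p_accept) for r in range(1, n_ranked + 1)]
    return sum(held[:top_k]) / sum(held)

# Assumed setup: 500 stocks, 10 holdings, top 10% = 50 ranks, p_accept = 0.5.
print(share_from_top(top_k=50, n_ranked=500, n_hold=10, p_accept=0.5))
```

Lowering `p_accept` spreads the purchases further down the ranking, so it is the main knob controlling how severe the averaging is.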
Let's have a look at a simple example, based on the perturbing your simulation post. The ranking system used here is the SiRatio stock factor (not the adaptive version). Stocks with a ranking above a specified threshold are shuffled to the bottom of the ranks. The charts below show the annualized returns and maximum drawdown for both the 20 stock single simulation and the 10 stock perturbed simulations.
As can be seen from the Annualized Return chart, the 20 stock simulation plot is significantly more volatile than the perturbed 10 stock plot. This is because the Gaussian weighting is not applied to the 20 stock version. This leads me to believe that the sober statistics are more reliable.
The maximum drawdown is worse for the 10 stock perturbed simulations than for the 20 stock simulations in 5 of 7 cases. This suggests that the 20 stock simulations do indeed understate maximum drawdown.
Now, to end this post, I would like to answer the question "Why produce sober statistics at all?" The answer is that we human beings have a tendency to gravitate towards the best looking simulations, even though we don't have a clue what would happen if the best trades hadn't actually occurred, or if they were based on an anomaly, or even noise. Do you want to develop your portfolios that way? Do you want to invest your life savings in a car without checking under the hood?
And by the way, sober statistics are not the holy grail. If your ranking system is extremely optimized, you can take a cold shower and drink a cup of coffee, but your statistics won't sober up.