A Winning Portfolio Selection Algorithm

The subject of stock market prediction is of great interest to many, but rather than trying to predict what will happen next, is it possible to profit from existing statistical relationships in the market? We will examine an algorithm created by Borodin, El-Yaniv, and Gogan, called the anticor algorithm, that claims to do just that. The anticor algorithm performs sequential portfolio rebalancing. Perhaps the most widely known algorithm of this type is Markowitz's mean-variance portfolio optimization method, which minimizes the variance of the portfolio by placing sufficiently large weights on uncorrelated stocks. A similar concept is used here, hence the reference to anti-correlation.

Earlier we mentioned the use of statistical relationships to profit from the market. The core principle requires that, among stocks with similar long-term growth rates, periods of above-average returns are occasionally followed by periods of below-average returns. This phenomenon is a type of mean reversion and is a key component of the anticor algorithm. To see why this would work, consider a trivial example of a constant rebalancing strategy with a portfolio of two stocks over a period of four days. Assume that the price relatives (the current day's price level divided by the previous day's price level) for $\frac{\text{stock} \ 1}{\text{stock} \ 2}$ are as follows:

\begin{equation} \frac{1}{1}, \ \frac{1}{2}, \ \frac{2}{1}, \ \frac{1}{1} \end{equation}

Assuming constant weights of $\frac{.5}{.5}$, the returns by day are:

\begin{equation} 1, \ \frac{3}{2}, \frac{3}{2}, \ 1 \end{equation}

In this trivial example, our strategy has yielded a return of $\frac{9}{4}$ despite the fact that the stocks have essentially been flat from day one to day four.
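The arithmetic of this example can be checked with a few lines of NumPy (a standalone illustration; the array and weights mirror the numbers above):

```python
import numpy as np

# Price relatives for stock 1 and stock 2 over the four days
x = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [2.0, 1.0],
              [1.0, 1.0]])

b = np.array([0.5, 0.5])      # constant rebalanced weights
daily_returns = x @ b         # portfolio growth factor each day
total_return = np.prod(daily_returns)
print(daily_returns)          # [1.  1.5 1.5 1. ]
print(total_return)           # 2.25, i.e. 9/4
```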

At a high level, the algorithm separates the data into two windows, compares the growth rates of each stock in the second window, and measures the cross-correlations between the two windows. Let us begin by looking at the code for defining the two windows. Formally, $LX_{1}$ and $LX_{2}$ (windows 1 and 2) are defined as:

\begin{align} LX_{1} &= \left( \log(x_{t-2w+1}), \cdots, \log(x_{t-w}) \right)^{T} \\ LX_{2} &= \left( \log(x_{t-w+1}), \cdots, \log(x_{t}) \right)^{T} \end{align}

where $x$ is the market vector of price relatives, $t$ is the index of the last trading day, and $w$ is the window size. In our implementation, we found that the results sometimes improve if the log transformation is omitted, so we included a boolean switch that can be passed to the function to turn this feature on or off. Since anticor is a heuristic algorithm, we are not violating any theoretical assumptions by removing this transformation. Note that the log transformation is applied to the price relatives, not the actual price levels. More research could be done on whether this finding holds across other equities and on the potential causes of the difference.

#Retrieve data from the appropriate windows and convert the returns data to logs.  Interestingly, the algorithm sometimes works better without the log transformation.  The lt switch passed to the function determines whether we use this or not.
            
            if lt:
                LX1 = np.log(x[t-(2*w) + 1:t - w + 1,:])
                LX2 = np.log(x[t-w + 1:t + 1,:])
            else:    
                LX1 = x[t-(2*w) + 1:t - w + 1,:]
                LX2 = x[t-w + 1:t + 1,:]

Next we calculate the mean and standard deviation of each stock for each window, along with the tensor product of the standard deviations from windows 1 and 2:

#Calculate the mean and standard deviation of each column. mu1, sig1 is the mean, standard deviation of each stock from window 1. mu2, sig2 is the mean, standard deviation of each stock from window 2. sigma is the tensor product of sig1 and sig2 used for the correlation calculation below (mxm)
            
            mu1 = np.average(LX1, axis=0)
            sig1 = np.std(LX1, axis=0, ddof=1)
            mu2 = np.average(LX2, axis=0)
            sig2 = np.std(LX2, axis=0, ddof=1)
            sigma = np.outer(sig1,sig2)

We then calculate the cross-correlation matrix between $LX_{1}$ and $LX_{2}$, as defined by:

\begin{align} M_{cov}(i,j) &= \frac{1}{w - 1}\left(LX_{1}(i) - \mu_{1}(i) \right)^{T} \left(LX_{2}(j) - \mu_{2}(j) \right) \\ M_{cor}(i,j) &= \begin{cases} \frac{M_{cov}(i,j)}{\sigma_{1}(i) \sigma_{2}(j)} & \sigma_{1}(i), \sigma_{2}(j) \neq 0 \\ 0 & \text{otherwise} \end{cases} \end{align}
#Covariance matrix is the dot product of x - mu of window 1 and window 2 (mxm)
            
            mCov = (1.0/np.float64(w-1)) * np.dot(np.transpose(np.subtract(LX1,mu1)),np.subtract(LX2,mu2)) 

#Correlation matrix is mCov divided element wise by sigma (mxm), 0 if sig1, sig2 = 0

            mCorr = np.where(sigma != 0, np.divide(mCov,sigma), 0)

In the next section, we calculate what the authors [1] called the claims matrix, which will ultimately determine how much we transfer from stock $i$ to stock $j$. If the average return of $i$ in the second window is greater than the average return of $j$, and the correlation between the windows is positive, then we add the correlation between $i$ and $j$ to the claims matrix. Additionally, if either $i$ or $j$ is negatively correlated with itself between windows, that correlation is subtracted from the corresponding element of the claims matrix (subtracting a negative value increases the claim). In essence, we are looking for the condition where $i$ and $j$ are positively correlated with each other but negatively correlated with themselves across consecutive windows; if $j$ is underperforming $i$, then more weight is transferred to $j$. In the interest of efficiency, we chose a vectorized implementation instead of traditional control statements, so we are often multiplying several boolean matrices to capture the multiple criteria.

#Take the Hadamard product of the correlation matrix, the boolean matrix comparing mu2[i] to mu2[j] and the boolean matrix where the correlation matrix is greater than zero
            
            mu_matrix = mu2[:,np.newaxis] > mu2[np.newaxis,:]
            claim = np.multiply(mCorr,np.multiply(mCorr > 0,mu_matrix))

#The boolean claim matrix will be used to obtain only the entries that meet the criteria that mu2[i] > mu2[j], and mCorr is > 0 for the i_corr and j_corr matrices
            
            bool_claim = claim > 0
            
#If stock i is negatively correlated with itself we want to add that correlation to all instances of i.  To do this, we multiply a matrix of ones by the diagonal of the correlation matrix row wise
            
            i_corr = np.multiply(np.ones((mu1.shape[0],mu2.shape[0])),np.diagonal(mCorr)[:,np.newaxis])  
            
#Since our condition is when the correlation is negative, we zero out any positive values.  Also we want to multiply by the bool_claim matrix to obtain only valid entries 
            
            i_corr = np.where(i_corr > 0,0,i_corr)
            i_corr = np.multiply(i_corr,bool_claim)
            
#Subtracting out these negative correlations is essentially the same as adding them to the claims matrix
            
            claim -= i_corr
            
#We repeat the same process for stock j except this time we will multiply the diagonal of the correlation matrix column wise
            
            j_corr = np.multiply(np.ones((mu1.shape[0],mu2.shape[0])),np.diagonal(mCorr)[np.newaxis,:]) 

#Since our condition is when the correlation is negative, we zero out any positive values again multiplying by the bool_claim matrix
           
            j_corr = np.where(j_corr > 0,0,j_corr)
            j_corr = np.multiply(j_corr,bool_claim)            

#Once again subtract these to obtain our final claims matrix            

            claim -= j_corr           

Finally, we calculate the new portfolio weights by first calculating the wealth transfer matrix. The wealth transfer matrix is computed by first taking the claims matrix and dividing it by the sum of the claims matrix along the rows. We then multiply this matrix by the prior portfolio weights row wise and this yields our final wealth transfer matrix. To get the new portfolio weights we subtract the $i \rightarrow j$ element of the wealth transfer matrix from the prior portfolio weights and add the $j \rightarrow i$ element of the wealth transfer matrix.

\begin{align} b_{t+1}(i) = b_{t}(i) + \sum_{j \neq i} \left[ \text{transfer}_{j \rightarrow i} - \text{transfer}_{i \rightarrow j} \right] \end{align}
#Create the wealth transfer matrix first by summing the claims matrix along the rows
                  
            sum_claim = np.sum(claim, axis=1)
            
#Then divide each element of the claims matrix by the sum of its corresponding row
            
            transfer = np.divide(claim,sum_claim[:,np.newaxis])
            
#Multiply the original weights to get the transfer matrix row wise

            transfer = np.multiply(transfer,b[:,np.newaxis])
            
#Replace the nan with zeros in case the divide encountered any
            
            transfer = np.where(np.isnan(transfer),0,transfer)                        
 
#We don't transfer any stock to itself, so we zero out the diagonals     
      
            np.fill_diagonal(transfer,0)
                        
#Create the new portfolio weight adjustments by adding the incoming weights (the column sums) and subtracting the outgoing weights (the row sums)
                        
            adjustment = np.subtract(np.sum(transfer, axis=0),np.sum(transfer,axis=1))
            
#Finally, we adjust our original weights and we are done
            
            b += adjustment
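For convenience, the fragments above can be assembled into a single function. The following is a sketch of one full rebalancing step as we understand it; the function name `anticor_step` and its signature are our own, and it assumes `x` holds at least $2w$ days of price relatives (with $w \geq 2$):

```python
import numpy as np

def anticor_step(x, b, t, w, lt=True):
    """One anticor rebalancing step (sketch, not the authors' reference code).

    x  : (days, m) array of price relatives
    b  : current portfolio weights (length m, summing to 1)
    t  : index of the last trading day
    w  : window size (>= 2)
    lt : apply the log transformation if True
    """
    LX1 = x[t - 2*w + 1:t - w + 1, :]
    LX2 = x[t - w + 1:t + 1, :]
    if lt:
        LX1, LX2 = np.log(LX1), np.log(LX2)

    mu1, mu2 = LX1.mean(axis=0), LX2.mean(axis=0)
    sig1 = LX1.std(axis=0, ddof=1)
    sig2 = LX2.std(axis=0, ddof=1)
    sigma = np.outer(sig1, sig2)

    # cross-covariance and cross-correlation between the two windows
    mCov = (LX1 - mu1).T @ (LX2 - mu2) / (w - 1)
    mCorr = np.divide(mCov, sigma, out=np.zeros_like(mCov), where=sigma != 0)

    # claim(i, j) is positive when mu2[i] > mu2[j] and mCorr(i, j) > 0
    mu_matrix = mu2[:, np.newaxis] > mu2[np.newaxis, :]
    claim = mCorr * (mCorr > 0) * mu_matrix
    bool_claim = claim > 0

    # add back any negative autocorrelations of i (rows) and j (columns)
    d = np.diagonal(mCorr)
    claim = claim - np.minimum(d[:, np.newaxis], 0) * bool_claim
    claim = claim - np.minimum(d[np.newaxis, :], 0) * bool_claim

    # wealth transfer matrix: each row of claims scaled to the weight of i
    sum_claim = claim.sum(axis=1)
    with np.errstate(divide='ignore', invalid='ignore'):
        transfer = claim / sum_claim[:, np.newaxis] * b[:, np.newaxis]
    transfer = np.where(np.isnan(transfer), 0, transfer)
    np.fill_diagonal(transfer, 0)

    # incoming transfers (column sums) minus outgoing transfers (row sums)
    return b + transfer.sum(axis=0) - transfer.sum(axis=1)

# Toy example: 3 assets, 4 days of price relatives (arbitrary numbers)
x_demo = np.array([[1.00, 1.10, 0.90],
                   [1.05, 0.95, 1.00],
                   [0.98, 1.02, 1.07],
                   [1.01, 0.99, 1.00]])
b_new = anticor_step(x_demo, np.ones(3) / 3, t=3, w=2, lt=False)
print(b_new, b_new.sum())     # new weights still sum to 1
```

Because the transfers only move wealth between stocks, the new weights remain non-negative and still sum to one, regardless of the claims.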

The data set that we used was the adjusted closing prices for all of the major SPDR sector ETFs from December 22, 1998 to June 19, 2015. The symbols are: XLB, XLE, XLF, XLI, XLK, XLP, XLU, XLV, and XLY. Typically, one of the inherent risks in deploying a strategy that assigns more weight to the underperforming stock is that the underperforming stock experiences a material adverse event or becomes insolvent. However, we can mitigate this risk by using ETFs, each of which is essentially a portfolio of stocks that is periodically rebalanced.

The success of many strategies is often conditioned on some type of optimization parameter, and the anticor algorithm is no exception. The authors labeled the window size as a critical parameter, and our findings show that this is indeed the case. The following chart shows the return of the algorithm for a \$1,000 initial investment over window sizes ranging from 2 to 50 days. Consistent with other findings, the shorter windows appear to perform better, with an oscillating degradation as the window size increases.

Window Size Analysis

We chose a window size of 6, and the results we achieved are nothing short of exceptional. The graph below starts with an initial capital value of \$20,000 at the end of 1998. By June of 2015, the equity curve of the strategy reaches an astounding \$560,445, a 2,802% return. This translates to a 34.2% annualized return with a Sharpe ratio of .94.


While the results so far are very impressive, they are not realistic for many of us. Transaction costs have not been factored in, and many profitable trading strategies become unprofitable once market friction and fees are considered. We will assume a commission rate of \$.01 per share (\$1 for every 100 shares). While a \$.01 per share commission would have been unrealistic for 1999, the current rates from www.lightspeed.com and www.interactivebrokers.com are even more favorable at only \$.0045 and \$.005 per share, respectively.

What we find is that it is not how much the commissions add up to, but rather how they are taken out, that makes the most difference. By taking out a small portion of the earnings every day, the cumulative growth of the portfolio is diminished substantially. With only a \$.01 per share commission, our ending balance drops to \$94,938, a 475% overall return and 14.7% annualized return. What is even more interesting is that by roughly halving the commission rate to \$.0045, our ending balance increases to \$252,203, a 1,261% overall return and 25.1% annualized return. It is no wonder that well capitalized firms pay hundreds of thousands and even millions of dollars for exchange memberships in order to get even lower commission rates. Since this is an end of day strategy, membership at one exchange should be adequate. One consideration for intraday traders, however, is that because the markets are so fragmented, there is no guarantee that the member exchange has the National Best Bid and Offer (NBBO). In essence, one could save on the transaction cost but ultimately end up paying more by receiving a worse price.
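To see how a small daily cost compounds into a large difference, consider the following back-of-the-envelope sketch. The daily gross return and the cost fractions here are hypothetical round numbers, not figures from our backtest:

```python
# Illustration of how a small daily cost drag compounds over a long horizon.
# The 0.1% daily gross return and the cost fractions are made-up values.
days = 252 * 16          # roughly the 1999-2015 span in trading days
gross_daily = 1.001      # hypothetical 0.1% gross return per day

results = {}
for cost in (0.0, 0.0001, 0.0002):   # daily cost as a fraction of equity
    equity = 20000.0
    for _ in range(days):
        equity *= gross_daily * (1 - cost)
    results[cost] = equity
    print(f"daily cost {cost:.4%}: ending equity ${equity:,.0f}")
```

Even a cost of one or two basis points per day, barely visible on any single day, cuts the ending balance dramatically over sixteen years, which mirrors the sensitivity we observed when varying the per share commission.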

Analysis of Commissions

In closing, the anticor algorithm illustrates that it is possible to profit from the market without directly trying to predict market direction. Rather, we used the mean reversion properties of the market to derive an optimal portfolio of stocks. Our findings, as well as those of others [1], [3], [4], demonstrate the remarkable results that can be achieved without any prior knowledge of future prices. The key caveat with parametric algorithms is finding the correct parameters, and we have found that the anticor algorithm performs better with window sizes of about one week, depending on the dataset. We have also shown that even marginal transaction costs can have an overwhelming effect on the profitability of even low frequency trading strategies. The best performing ETF in our portfolio achieved a return of 431% over our timeframe, versus the 2,802% achieved by the anticor algorithm. Combined with an appropriate commission structure, the anticor algorithm clearly deserves the title of "a winning portfolio selection algorithm."

All of the code used in this paper is available via github at https://github.com/t6labs/anticor/.
