Categorizing Cryptoassets: A Return-Driven Cluster Analysis

What does cluster analysis reveal about Bitcoin, Ripple and other large digital assets?
Binance Research (Etienne)

KEY TAKEAWAYS

  • Using a correlation matrix along with hierarchical clustering, digital assets can be grouped into several sub-segments.
  • Based on weekly returns, large cryptoassets such as Bitcoin and Ethereum exhibit the highest correlations, but Ripple displays a lower correlation than in our previous report and is an exception as the best diversifier amongst digital assets with a market cap above $3 billion.
  • Bitcoin forks (Gold and Cash), Ethereum Classic and Litecoin form a single cluster, whereas other potential groups form around the following effects:
    • “Binance effect”: Tezos and Dogecoin, two assets not listed on Binance, each form a single child cluster.
    • Potential geographical effects such as a dichotomy between American and Asian cryptoassets.
    • “Coinbase listing effect”: some assets that were reported to be listed or investigated by Coinbase seemed to belong to similar clusters.
    • Some privacy coins (Dash and Monero) form a single cluster.
  • On the other hand, performing K-Means clustering on risk-return profiles of each cryptoasset did not return any meaningful results. One potential explanation is that return & volatility profiles are not related to underlying price co-movements over the study period.

In a previous report, we used a cross-sectional method to analyze the internal correlations of the cryptoasset market to observe the cyclical patterns of various assets. We found that low internal correlations between cryptoassets are often due to idiosyncratic factors, in addition to a coin’s consensus mechanism and a potential “Binance Effect”.

However, the overall correlation observed across the cryptoasset market has increased, which may be due to the rise of stablecoin volume, and the corresponding increase in pair offerings, in all cryptoasset markets. In this report, cluster analysis over the twelve previous trading months (March 2018 to March 2019) was conducted to determine whether cryptoasset clusters, constructed using an unsupervised learning approach, can be created and, more importantly, interpreted.

1. Methodology

Cluster analysis is a technique used to group sets of objects that share similar characteristics. It is common in statistics, and investors also use the approach to build a diversified portfolio. Stocks that exhibit high correlations in returns fall into one basket, those slightly less correlated in another, and so on, until each stock is placed into a category.

1.1 Dataset

For the top 30 cryptoassets by market capitalization, USD-equivalent prices were retrieved as reported by CoinMarketCap.

Stablecoins are excluded from the analysis along with any cryptoasset that is backed by other assets, whether they are digital or physical [1]. Specifically, 30-day rolling average market capitalizations were computed and the 30 largest “non-backed” cryptoassets (as of March 31st 2019) were selected.

The data collection period covers a full year between March 31st, 2018 and March 31st, 2019.
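
The selection rule from this section can be sketched as follows. This is a minimal illustration rather than the code used for the report: the function name `select_largest`, the tickers and the market capitalization values are all hypothetical, and the 30-day average is assumed to be a simple mean of the last 30 daily values.

```python
from statistics import mean


def select_largest(market_caps, backed, n, window=30):
    """Rank non-backed assets by the mean of their last `window` daily market caps."""
    averages = {
        asset: mean(series[-window:])
        for asset, series in market_caps.items()
        if asset not in backed  # stablecoins and other backed assets are excluded
    }
    return sorted(averages, key=averages.get, reverse=True)[:n]


# Hypothetical daily market caps in USD bn (illustrative values only).
market_caps = {
    "AAA": [70.0] * 30,
    "BBB": [14.0] * 29 + [15.0],
    "STABLE": [2.0] * 30,
    "CCC": [1.0] * 15 + [3.0] * 15,
    "DDD": [0.5] * 30,
}
selected = select_largest(market_caps, backed={"STABLE"}, n=3)
print(selected)
```

In the report the same idea is applied with n = 30 as of March 31st 2019.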

1.2 Algorithm selection

Clustering algorithms are a sub-segment of unsupervised learning algorithms. The table below highlights key differences between two of the most common clustering algorithms, k-means clustering and hierarchical clustering [2].

Table 1 - Comparison between K-Means and Hierarchical clustering methods

| | K-MEANS CLUSTERING | HIERARCHICAL CLUSTERING |
|---|---|---|
| Size of the dataset? | Large datasets | Small datasets |
| Selection of amount of clusters? | Predefined manually | Automatic |
| Partitionings? | Single | Multiple |
| Approach? | Heuristic | Bottom-up (Agglomerative), Top-down (Divisive) |

1.2.1 Hierarchical clustering methodology

  1. Data is pre-processed using a correlation matrix based on 1-year weekly returns, thus representing 52 observations. This correlation matrix captures variable interactions in a normalized way. As a result, it simplifies calculations by inherently eliminating a lot of irrelevant information.
  2. A dendrogram illustrates the composition of each cluster by drawing a U-shaped link between a non-singleton cluster and its children. The height of the top of the U-link represents the distance between its groups of children, which is also the cophenetic distance between the original observations in the two groups of children.
  3. Euclidean distance (i.e. L2 distance) is used and the linkage method relies on the “Ward” method which minimizes within-cluster variance.
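
The three steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation behind the report: the weekly returns are synthetic (two hypothetical factors driving four hypothetical assets), each row of the correlation matrix is treated as an observation, and Ward's criterion is applied naively on cluster centroids. In practice a library routine such as SciPy's `linkage` with `method="ward"` would typically be used instead of the hand-written `ward_merges` helper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long return series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ward_merges(points):
    """Agglomerative clustering; returns merges as (members_a, members_b, height)."""
    clusters = {i: ([i], list(p)) for i, p in enumerate(points)}  # id -> (members, centroid)
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
                # Ward criterion: increase in within-cluster sum of squares.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        ma, ca = clusters.pop(a)
        mb, cb = clusters.pop(b)
        size = len(ma) + len(mb)
        centroid = [(len(ma) * u + len(mb) * v) / size for u, v in zip(ca, cb)]
        clusters[next_id] = (ma + mb, centroid)
        # Scaled so that two singletons merge at their Euclidean distance.
        merges.append((ma, mb, math.sqrt(2 * cost)))
        next_id += 1
    return merges


# Synthetic weekly returns: 52 observations for four hypothetical assets.
rng = random.Random(0)
factor_1 = [rng.gauss(0, 0.10) for _ in range(52)]
factor_2 = [rng.gauss(0, 0.10) for _ in range(52)]
returns = {
    "AAA": [f + rng.gauss(0, 0.01) for f in factor_1],
    "BBB": [f + rng.gauss(0, 0.01) for f in factor_1],
    "CCC": [f + rng.gauss(0, 0.01) for f in factor_2],
    "DDD": [f + rng.gauss(0, 0.01) for f in factor_2],
}
names = list(returns)
corr = [[pearson(returns[a], returns[b]) for b in names] for a in names]
merges = ward_merges(corr)
for left, right, height in merges:
    print([names[i] for i in left], [names[i] for i in right], round(height, 3))
```

Each printed line corresponds to one U-link of a dendrogram: the two groups of children and the height at which they are joined.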

1.2.2 K-means clustering methodology

  1. Weekly returns are calculated for the largest 30 assets by market capitalization.
  2. From these weekly returns, the annualized volatility is computed along with the annualized average return. equation1
  3. Data is pre-processed using feature scaling, such that each value is standardized based on the mean (mu) and standard deviation (sigma) of its feature: $z = (x - \mu) / \sigma$
  4. Finally, the analysis relies on a three-step procedure:
    1. Initialization: k initial centroids are generated randomly.
    2. Assignment: k clusters are created by matching each observation with the nearest centroid.
    3. Update: the mean of each cluster becomes its new centroid.
  5. The second and third steps above are repeated until the assignments converge to a solution that (locally) minimizes the sum of squared errors between points and their respective centroids.
  6. The optimal number of clusters k is selected using the “elbow curve” method, which looks for the value that keeps the distance between points and their centroids low while also keeping the number of clusters small.
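
The steps above can be sketched in plain Python. Several choices here are assumptions rather than the report's exact method: weekly figures are annualized with 52 periods (mean times 52, standard deviation times the square root of 52), the initial centroids are drawn from the observations themselves, and the risk-return profiles are hypothetical values.

```python
import math
import random
from statistics import mean, pstdev, stdev

WEEKS_PER_YEAR = 52  # assumption: weekly figures are annualized over 52 periods


def risk_return_profile(weekly_returns):
    """Return (annualized average return, annualized volatility)."""
    ann_return = mean(weekly_returns) * WEEKS_PER_YEAR
    ann_vol = stdev(weekly_returns) * math.sqrt(WEEKS_PER_YEAR)
    return ann_return, ann_vol


def standardize(rows):
    """Z-score every column: (x - mu) / sigma."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, sum of squared errors)."""
    rng = random.Random(seed)
    # Initialization: k distinct observations serve as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment: each observation joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


ann_return, ann_vol = risk_return_profile([0.01, -0.02, 0.03, 0.00])

# Hypothetical (return %, volatility %) profiles, not the report's data.
profiles = [[150, 100], [160, 105], [170, 110], [-50, 60], [-40, 65], [-30, 70]]
scaled = standardize(profiles)
labels, centroids, sse = k_means(scaled, k=2)

# Sum of squared errors for k = 1..4, i.e. the points of an elbow curve.
elbow_inputs = [k_means(scaled, k)[2] for k in range(1, 5)]
print(labels, [round(v, 3) for v in elbow_inputs])
```

Because the result of k-means depends on the starting centroids, different seeds can produce different partitions on less clearly separated data.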

2. Results

2.1 Selection of assets

Applying the methodology described in 1.1 results in the following digital assets being selected:

Top 10 cryptoassets selected:

Bitcoin (BTC), Ethereum (ETH), Ripple (XRP), Litecoin (LTC), EOS, Bitcoin Cash (BCHABC), Binance Coin (BNB), Stellar (XLM), Tron (TRX), Cardano (ADA).

Top 11-20 cryptoassets selected:

Monero (XMR), IOTA, DASH, Maker (MKR), NEO, Ethereum Classic (ETC), Ontology (ONT), NEM, Tezos (XTZ), ZCash (ZEC).

Top 21-30 cryptoassets selected:

Waves (WAVES), Basic Attention Token (BAT), Dogecoin (DOGE), Bitcoin Gold (BTG), Qtum (QTUM), OmiseGo (OMG), Decred (DCR), Lisk (LSK), ChainLink (LINK), 0x (ZRX).

Chart 1 - 30-day 10 largest average market capitalizations (USDbn) as of March 31st 2019

chart1

Bitcoin, Ethereum and Ripple account for most of the industry total market capitalization.

Chart 2 - 30-day 11-30 largest average market capitalizations (USDbn) as of March 31st 2019

chart2

2.2 52-week returns correlation matrix

Figure 1 - Weekly return correlation matrix

figure1

As explained in the first section, the top 30 digital assets, based on the 30-day mean market capitalization as of March 31st 2019, were considered for this analysis.

As highlighted in our previous reports:

  • Correlations are extremely high among large-cap digital assets.
  • Ether and Bitcoin also exhibited an extremely high correlation with each other (0.872).
  • POW assets exhibited higher correlations with each other than with non-POW assets.
  • Observation of a potential “Binance effect”: Tezos and Dogecoin, the only two assets not listed on Binance, exhibited lower correlations with other cryptoassets.

However, a few additional observations were noted:

  • Dogecoin (DOGE), Tezos (XTZ), Ripple (XRP) exhibited the lowest correlations with other digital assets across this one-year period. Notably, Ripple is less correlated in the long-term than what our previous analysis suggested across several 3-month time-periods, using daily returns.
  • Ripple is highly correlated with Stellar (0.73). While Stellar was initially built on the Ripple protocol, its code was quickly forked and revamped. As of today, the Stellar and Ripple codebases do not rely on a common core. Yet these two digital assets still share several similarities as they both aim to "reshape the global remittance industry."

2.3 Hierarchical clustering results

Figure 2 - Dendrogram based on correlation matrix (based on squared euclidean distances)

figure2

Based on the above dendrogram, some clusters seem to share similar characteristics such as:

  • Geographical affinities: the popularity of the coins in specific countries and the team’s own location could affect the clusters. For instance, Qtum (QTUM), Cardano (ADA), NEO and OmiseGo (OMG) are projects based in Asia, and most of their coin-holders are located in this region [3]. Similarly, Ripple (XRP), Basic Attention Token (BAT) and Dogecoin (DOGE) are digital assets whose teams and investors are mostly located in America.
  • Privacy coins such as DASH and Monero (XMR) belong to the same sub-cluster.
  • Stellar (XLM) and NEM are payment systems and they ultimately form a single cluster.
  • “Coinbase listing effect”: Ripple (XRP) and Basic Attention Token (BAT) are two digital assets that were listed on Coinbase over the study period. Furthermore, ZCash (ZEC) was listed on Coinbase in November 2018, followed by Stellar (XLM) and Maker (MKR) a few months later [4]. These three digital assets reside in a common sub-group as well, supporting the notion that, depending on the timing of listings on the same exchanges, coins may exhibit similar trends in the same market conditions.
  • Hard forks and “code forks”: Litecoin (LTC), Ethereum Classic (ETC), Bitcoin Cash (BCHABC), Bitcoin Gold (BTG) all share a common history, whether it is on the chain or not, with the top 2 largest digital assets: Ethereum (ETH) and Bitcoin (BTC).
    • While Litecoin is not a chain fork of Bitcoin, its code was initially forked, with very few changes, from the Bitcoin GitHub repository. On the other hand, Bitcoin Cash and Bitcoin Gold were forked from the Bitcoin chain.
    • Conversely, Ethereum Classic and Ethereum share the same genesis block. Whereas Ethereum Classic may be the original chain, Ethereum has become, by far, the most used and active of the two.
    • Here, Bitcoin Cash is in the same child cluster with Bitcoin Gold whereas Litecoin is grouped with Ethereum Classic.
  • Binance Coin represents its own child group and first-degree parent group but is in a 2nd degree parent group with EOS, Tron (TRX), Lisk (LSK) and Decred (DCR).
  • Potential “Binance effect”: Dogecoin (DOGE) and Tezos (XTZ), the only two digital assets not listed on Binance, are each the only component of their child group.
  • “Largest market capitalization”: Bitcoin (BTC) and Ethereum (ETH), the two largest digital assets, belong to the same sub-cluster.

However, there are clear limitations to each of the interpretations above:

  • The timing of the listings on Coinbase does not match. For example, Ripple (XRP) and Basic Attention Token (BAT) were listed more than five months apart.
  • Matches of misfits: Waves (WAVES), Ontology (ONT) and Tezos (XTZ) fundamentally share little in common. They were grouped together by default, as they all exhibited lower correlations with other assets.
  • IOTA is grouped with Bitcoin (BTC) and Ethereum (ETH), but its market capitalization is not close to either of them. However, it is arguable that:
    • Bitcoin: the first Blockchain 1.0 for digital money
    • Ethereum: the first Blockchain 2.0 with the introduction of smart contracts
    • IOTA could be seen as the first Blockchain 3.0 for the Internet of Things
  • While EOS, Tron (TRX) and Lisk (LSK) all belong to a similar market segment (“blockchains with smart contracts”), they don’t have much in common with Decred (DCR), which defines itself as an “autonomous digital currency”.

2.4 K-means analysis

Figure 3 - Elbow curve - Selecting the optimal amount of clusters

figure3

Based on the above figure, the optimal amount of clusters appears to be around 6, as the marginal improvement in the total within-cluster sum of squared distances becomes very small beyond 6 clusters.

As a result, six clusters are selected.
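
Reading an elbow curve is a judgment call, but the rule of thumb used above (stop once the marginal improvement becomes very small) can be written down as a simple heuristic. The sketch below is only one possible formalization: the function `pick_elbow`, its 5% threshold and the inertia values are hypothetical and are not taken from Figure 3.

```python
def pick_elbow(inertias, min_gain=0.05):
    """Smallest k for which adding one more cluster removes less than
    `min_gain` of the initial inertia.

    inertias[i] is the total within-cluster sum of squares for k = i + 1.
    """
    total = inertias[0]
    for i in range(len(inertias) - 1):
        gain = (inertias[i] - inertias[i + 1]) / total
        if gain < min_gain:
            return i + 1
    return len(inertias)


# Hypothetical inertia values for k = 1..9 (illustrative, not the report's data).
inertias = [60.0, 34.0, 22.0, 15.0, 10.0, 6.0, 5.2, 4.6, 4.1]
best_k = pick_elbow(inertias)
print(best_k)
```

With these illustrative values, going from 6 to 7 clusters removes only about 1% of the initial inertia, so the heuristic stops at 6.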

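Table 2 - Risk-return profiles for first cluster (A)

| | BNB | BAT | MKR |
|---|---|---|---|
| Annualized return (%) | 134 | 242 | 186 |
| Annualized volatility (%) | 91 | 132 | 116 |
| Cluster | A | A | A |

Table 3 - Risk-return profiles for second cluster (B)

| | BTC | LSK | DCR | OMG | QTUM | BTG | ZEC | NEM | ETC | NEO | XLM | ETH | XMR | LTC | DASH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Annualized average return (%) | -26 | -58 | -18 | -55 | -58 | -44 | -47 | -57 | -43 | -61 | -4 | -35 | -50 | -19 | -38 |
| Annualized volatility (%) | 65 | 102 | 97 | 116 | 117 | 106 | 98 | 100 | 94 | 110 | 113 | 106 | 96 | 94 | 104 |
| Cluster | B | B | B | B | B | B | B | B | B | B | B | B | B | B | B |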

Cluster B consists of digital assets with high volatility but negative average returns over the period.

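Table 4 - Risk-return profiles for the 4 other clusters (C-F)

| | XRP | EOS | DOGE | IOTA | TRX | XTZ | ADA | WAVES | ZRX | BCHABC | LINK | ONT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Annualized average return (%) | 29 | 95 | 70 | -33 | 55 | -8 | 16 | 32 | 9 | 10 | 420 | 214 |
| Annualized volatility (%) | 136 | 152 | 150 | 130 | 132 | 129 | 128 | 126 | 119 | 192 | 143 | 208 |
| Cluster | C | C | C | C | C | C | C | C | C | D | E | F |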

Based on the traditional risk-return attributes of cryptoassets, many assets share the same risk-return profiles (clusters B and C). Cluster C consists of assets with extremely high volatility but annualized returns that are either positive or slightly negative (the exception being IOTA).

Clusters D, E and F each consist of a single digital asset:

  • Cluster D consists of Bitcoin Cash (BCHABC), the second highest-volatility asset in the group but with an extremely small positive return. This risk profile is explained by the Bitcoin Cash fork (in November 2018) that led to the creation of Bitcoin SV (BSV/BCHSV), creating a common event in the coins’ mutual histories.
  • ChainLink (LINK) forms cluster E.
  • Ontology (ONT) forms cluster F [5].

In general, the drawback of this method is that two assets might belong to the same cluster without being correlated at all. Specifically, two assets may display a negative correlation while simultaneously sharing exactly the same risk-return profile within the market, resulting in them being part of the same cluster.
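
A small constructed example makes this drawback concrete. The return series below are hypothetical: the second series is the first one mirrored around its own mean, so both share the same average return and volatility while moving in exactly opposite directions.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equally long return series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical weekly returns for asset A.
asset_a = [0.04, -0.02, 0.06, -0.03, 0.05]
center = mean(asset_a)
# Asset B mirrors A around A's mean: same mean, same dispersion, opposite moves.
asset_b = [2 * center - r for r in asset_a]

correlation = pearson(asset_a, asset_b)
print(round(mean(asset_a), 4), round(mean(asset_b), 4))
print(round(stdev(asset_a), 4), round(stdev(asset_b), 4))
print(round(correlation, 4))
```

A clustering based only on return and volatility would place these two series at the same point, even though their correlation is -1.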

In the future, our approach must be extended to include additional characteristics such as: market capitalization, volume and turnover ratio.

Native blockchain metrics such as hashrate, active addresses, on-chain transactions or the number of active nodes could also be used as inputs in cluster analysis to analyze the underlying dynamics among cryptoassets.

3. Conclusion

Cluster analysis is an unsupervised learning technique that provides flexibility in classification of objects in groups without introducing human bias. For digital assets, cluster analysis has not been extensively studied in research and this report represents one of the first attempts at classifying cryptoassets from an unsupervised approach.

Hierarchical cluster analysis revealed potential groups of cryptoassets based on characteristics such as asset function (e.g. privacy tokens), chain history (e.g. Bitcoin forks) or a potential dichotomy between Asian and US-based cryptoassets. Finally, a potential “Binance effect” appears again, along with a newer potential effect related to Coinbase-related news (e.g. Ripple, Basic Attention Token).

Whereas some of the results appear to be consistent with industry-defined fundamental approaches, the difficulty of finding trustworthy data may hold investors back from completing a thorough analysis on this topic. In comparison, traditional equity markets offer plenty of metrics (e.g. P/E ratio, turnover, ROE) that are routinely used in research reports.

Additional cluster analysis of the digital asset industry could be performed from different perspectives, and further research on different crypto market cap segments (e.g. mid and small caps) with alternative inputs such as hashrate or on-chain transactions may help paint a more complete picture for the cryptoasset market as a whole.

  1. Indices, stablecoins or cryptoassets claiming to be backed by collateral (e.g. gold) are excluded from this analysis.
  2. http://www.cs.utah.edu/~piyush/teaching/4-10-print.pdf
  3. For instance, Cardano’s ICO was aimed at Japanese investors who bought up to 95% of the total issue. Furthermore, Emurgo is based in Tokyo, Japan. https://www.worldcryptoindex.com/cardano/ https://emurgo.io/#/ja
  4. https://blog.coinbase.com/stellar-lumens-xlm-now-available-on-coinbase-37bc730ec79a
  5. As Ontology weekly returns are not normally distributed (with large extreme positive values), the annualized average return is strongly biased upward.