Crypto Code Watch

Frequently Asked Questions

What is this site about?

This site reports open source development activity of popular cryptocurrency projects on GitHub.

What about cryptocurrencies that don't release open source software?

We recognize that open source software may not make sense for some cryptocurrency models. Our aim is to make information available for projects that do embrace open source software development.

What do the symbols mean?

Stars generally denote the popularity of software repositories. GitHub users star projects of personal interest. When users star a repository, it can act as a personal bookmark. A GitHub user may star any particular repository only once.

A fork is a copy of another repository of code, managed by a different user or organization than the original repository. Forks are typically created when a user or organization wishes to change an existing codebase. Sometimes the changes are incorporated (merged) back into the original repository. In other cases, the forked code becomes an entity of its own; Litecoin, a fork of Bitcoin, is an example of the latter. The number of forks can indicate how many developers interact with a repository's code, either for personal use or to extend the original codebase.

Subscribers watch repositories to receive real-time notifications about new activity in a codebase. Subscribers tend to be actively interested in new developments in the codebase.

How do you calculate aggregate metrics? Aren't forks problematic?

The main page shows aggregate metrics over multiple repositories managed by a single user or organization. The aggregates include: (a) code additions and deletions, (b) commit activity, (c) contributors, (d) stars, (e) forks, (f) subscribers, and (g) the last updated time. For contributors (c), GitHub provides up to 100 contributors per repository. When a repository has 100+ developers, we add those 100 developers to the total count and append a '+' to indicate 'possibly more'. Since a single contributor may contribute to multiple repositories, we remove duplicates from the aggregate count. The number of developers is therefore conservatively reported as at least that many unique developers. For (g), we use the most recent update time across all repositories of the user or organization, including forks.
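The contributor rule above can be sketched as a set union over per-repository contributor lists. This is a minimal illustration, not the site's actual code: the `Repo` type, its fields, and the functions `aggregate_contributors` and `latest_update` are hypothetical, and the sketch assumes each repository's contributor list has already been fetched and holds at most 100 logins.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: at most 100 contributors are available per repository.
CONTRIBUTOR_CAP = 100


@dataclass
class Repo:
    """Hypothetical per-repository record; not a GitHub API type."""
    name: str
    contributors: list[str]  # contributor logins for this repository
    last_updated: datetime


def aggregate_contributors(repos: list[Repo]) -> str:
    """Count unique contributors across repos, appending '+' if any repo hit the cap."""
    unique: set[str] = set()
    possibly_more = False
    for repo in repos:
        logins = repo.contributors[:CONTRIBUTOR_CAP]
        if len(logins) >= CONTRIBUTOR_CAP:
            # This repository may have more contributors than we can see.
            possibly_more = True
        unique.update(logins)  # set union removes developers counted in several repos
    count = str(len(unique))
    return count + "+" if possibly_more else count


def latest_update(repos: list[Repo]) -> datetime:
    """Metric (g): the most recent update time across all repositories, forks included."""
    return max(repo.last_updated for repo in repos)


if __name__ == "__main__":
    repos = [
        Repo("core", ["alice", "bob"], datetime(2018, 3, 1)),
        Repo("wallet", ["bob", "carol"], datetime(2018, 3, 5)),
    ]
    print(aggregate_contributors(repos))  # 3 (bob is counted once)
    print(latest_update(repos))           # 2018-03-05 00:00:00
```

Using a set makes deduplication automatic: a developer active in several repositories contributes a single entry. The '+' marks the total as a lower bound, since any capped repository may hide additional developers.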

Whether forks are included in the aggregate data generally depends on the time window that we aggregate over. More recent aggregates (i.e., the last 24 hours or 7 days) include forks, while we exclude forks for older historic data. In general, this ensures that we do not include forked activity (e.g., a year of Bitcoin activity) for recent cryptocurrencies that fork Bitcoin. At the same time, we want to include new, independent development in the forked repositories of derivative cryptocurrencies. As another example, it would be unfair to exclude code additions or stars that are specific to Litecoin's forked Bitcoin repositories. While imperfect, selective inclusion gives a more balanced view of development activity on an individual codebase than either including all or none of the fork data in the aggregate.
Restricting fork data to a recent time window minimizes the effect of including historic, unrelated activity, while still attributing fresh activity in a forked repository to the appropriate project.

Commits (b): By default, we include commits to forks in the aggregate only for the last 24 hours and the last 7 days, and exclude them from the one-year total.

Contributors (c): Similar to commits, we include contributors to forks only in the last-7-days count and exclude them from the all-time count of developers.

Code changes (a), stars (d), and subscribers (f) include forks. Code additions and deletions (a) include forks since these cover only the most recent 7 days of development; unless a fork is fresh (i.e., active within the given week), it does not contribute to the aggregate counts. Stars (d) and subscriber counts (f) of forks are included, since these reset to 0 when a repository is forked. The current counts, as reported on GitHub, are therefore unique to the owner.

Fork counts (e) of forks are excluded: the number of forks of a fork is not reset to 0 when the repository is forked. Instead, the number is carried over from the original project. Since GitHub does not expose a fork's own fork count separately from the original project's, we do not include it in the aggregate.

Our intention is to be honest and fair in reflecting the open source activity of each unique cryptocurrency. On a case-by-case basis, we may manually revise the inclusion and exclusion of repositories depending on project maturity and development behavior. At the moment we have two case-specific rules: (1) exclude cloned and committed 'nixpkgs' repositories (in at least one problematic case, including nixpkgs made no sense) and (2) include all metrics for Litecoin's core (forked) repository.
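The per-metric fork rules and the two case-specific rules can be summarized as a lookup table plus a small predicate. This sketch is illustrative only: the window labels (`"24h"`, `"7d"`, `"1y"`, `"all_time"`, `"current"`), the `RepoInfo` type, the function `include_in_aggregate`, and the repository identifier used for the Litecoin override are all hypothetical, and the site's actual implementation may differ.

```python
from dataclasses import dataclass

# Windows in which activity on *forked* repositories counts toward the aggregate.
# Any window not listed for a metric excludes forks. Labels are hypothetical.
FORK_WINDOWS: dict[str, set[str]] = {
    "commits": {"24h", "7d"},     # forks excluded from the one-year window ("1y")
    "contributors": {"7d"},       # forks excluded from the all-time count ("all_time")
    "code_changes": {"7d"},       # only the most recent 7 days are tracked
    "stars": {"current"},         # reset to 0 on fork, so unique to the owner
    "subscribers": {"current"},   # reset to 0 on fork, so unique to the owner
    "forks": set(),               # carried over from the parent, never included
}

# Case-specific rules. The Litecoin repository identifier is an assumption.
EXCLUDED_NAMES = {"nixpkgs"}
ALWAYS_INCLUDED = {"litecoin-project/litecoin"}


@dataclass
class RepoInfo:
    """Hypothetical repository descriptor."""
    full_name: str  # "owner/name"
    is_fork: bool

    @property
    def name(self) -> str:
        return self.full_name.split("/", 1)[1]


def include_in_aggregate(repo: RepoInfo, metric: str, window: str) -> bool:
    """Return True if this repository's `metric` for `window` enters the aggregate."""
    if repo.name in EXCLUDED_NAMES:
        return False  # rule (1): cloned nixpkgs repositories are skipped entirely
    if not repo.is_fork or repo.full_name in ALWAYS_INCLUDED:
        return True   # original repositories, and rule (2), count for every metric
    return window in FORK_WINDOWS[metric]


if __name__ == "__main__":
    fork = RepoInfo("example-org/bitcoin", is_fork=True)  # hypothetical fork
    print(include_in_aggregate(fork, "commits", "7d"))  # True
    print(include_in_aggregate(fork, "commits", "1y"))  # False
```

Keeping the rules in a table makes each per-metric choice easy to audit, and case-by-case revisions become one-line changes to the override sets.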

Regardless of whether you agree with our aggregation choices, you can inspect the raw data for every individual repository in the detailed view, where no aggregation is applied.

Do you include commit data, changes, etc., in feature branches?

Currently, we do not: we only consider the default branch of a repository. Processing all branches of all repositories is prohibitively expensive for now, but this may change in the future.

I see some repositories are updated less than a day ago, but no commits or changes were made. What gives?

An update can include things like a change to a repository's description or wiki pages. We count such activities as updates.

Do these metrics reveal anything about software quality?

Many commits a good project does not (necessarily) make. Evaluating software quality is a longstanding and open research problem [1, 2, 3, 4]. Metrics such as lines of code and commit history may affect qualitative attributes such as software maintainability [5], but they do not speak directly to quality attributes. Quantitative measures (such as those on this site) also miss language-specific attributes (e.g., one line of Haskell may be more expressive than one line of JavaScript). However, research shows that quantitative metrics can be useful as covariates for software quality predictors [6]. At the least, our metrics are a good measure of developer activity in budding open source cryptocurrency projects.

Can't these code metrics be artificially influenced?

We use independent metrics to give a high level view of a project's activity. This helps to avoid data skew and makes it harder to influence the numbers artificially. For example, a single commit could be broken up into 10 smaller commits, but the number of lines would stay the same. Taking it up a level, using a variety of metrics (including number of developers and stars) gives greater confidence that the numbers reflect organic activity.
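The commit-splitting example can be checked with a little arithmetic. In this hypothetical sketch, one commit of 120 additions and 30 deletions is split into ten commits of 12 additions and 3 deletions each; the numbers are invented for illustration, and `summarize` is a hypothetical helper.

```python
def summarize(commits: list[tuple[int, int]]) -> dict[str, int]:
    """Summarize (additions, deletions) pairs into commit and line totals."""
    return {
        "commits": len(commits),
        "additions": sum(added for added, _ in commits),
        "deletions": sum(deleted for _, deleted in commits),
    }


single = [(120, 30)]    # one large commit
split = [(12, 3)] * 10  # the same change as ten smaller commits

print(summarize(single))  # {'commits': 1, 'additions': 120, 'deletions': 30}
print(summarize(split))   # {'commits': 10, 'additions': 120, 'deletions': 30}
```

Only the commit count changes; the line totals stay put, so a jump in one metric that is not matched by the others is a hint of inflation rather than organic activity.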

Some cryptocurrencies, like Bitcoin Cash, don't have a reference implementation. What do you do about that?

It is challenging to collect the distributed projects that implement a protocol. For example, Bitcoin Cash has at least four (1, 2, 3, 4) related client implementations. It is also unclear whether to include projects that relate to a protocol but are not strictly part of the reference organization (e.g., should we include the MetaMask project for Ethereum?). Due to these challenges, we (currently) only include projects with a dedicated reference organization or user on GitHub, and we are investigating improvements.

How often do you update the data?

Data is updated every 24 hours.

Who are you?

This site is developed by PhD Software Engineering students based in the US. As such, we have no vested interest in promoting any particular cryptocurrency, open source or otherwise, based on the contents of this site.

© Copyright CryptoCodeWatch 2018