On GitHub, we often come across fake users who are always enthusiastic and active. They give timely feedback to project maintainers and contributors and help developers with tasks that can be automated. Yes, what I want to discuss next is GitHub bots.
Overview
In the OSSInsight project, we have developed a number of metrics that provide insight into open source projects. When designing these metrics, we always consider whether bot-generated actions or events should be excluded from the calculation.
However, we can't ignore the contribution bots make to open source, and it's worth shifting our perspective to look at what bots are actually doing on GitHub.
Bots on GitHub take a lot of work off developers' hands:
- Issue triage and management (for example, stale[bot] and todo[bot]).
- Code review, security auditing, and quality inspection (for example, snyk-bot).
- Format checks, such as ensuring that the license agreement has been signed or that commit messages follow semantic conventions (for example, CLAassistant).
- Integration with third-party systems, including Jira, Slack, Jenkins, and so on.
- Acting as an agent that performs operations requiring repository permissions on behalf of contributors (for example, k8s-ci-bot and ti-chi-bot).
Historical trends
Looking at the historical data, we see that the number of GitHub bots has grown significantly faster since 2019 (on average, 20,000 new bots are created each year).
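One way to approximate this trend is to take the year in which each bot-like login first appears in the event data. The sketch below assumes that the `github_events` table queried later in this post also has a `created_at` timestamp column, and it treats any login ending in `[bot]` or `bot` as a bot, which is only a heuristic. First activity is not the same as account creation, so the yearly counts are an approximation.

```sql
-- Sketch: approximate "new bots per year" by the year each bot-like login first shows up.
-- Assumptions: github_events has a created_at column; the name-based bot filter is a heuristic.
WITH bot_first_seen AS (
    SELECT
        actor_login,
        MIN(created_at) AS first_seen_at   -- first recorded event of this login
    FROM github_events
    WHERE actor_login LIKE '%[bot]'
       OR actor_login LIKE '%bot'
    GROUP BY actor_login
)
SELECT
    YEAR(first_seen_at) AS first_seen_year,
    COUNT(*)            AS new_bots
FROM bot_first_seen
GROUP BY first_seen_year
ORDER BY first_seen_year;
```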
I looked into what happened in 2019 and found that GitHub invested heavily in its software development infrastructure, including bots, that year.
- On May 23, 2019, GitHub announced that it had acquired Dependabot (a.k.a. dependabot[bot]).
- On June 17, 2019, GitHub announced that it had acquired Pull Panda.
- On September 18, 2019, GitHub announced that it had acquired Semmle (the team that built lgtm-com[bot]).
That year, we humans were amazed to discover that bots could find problems, submit PRs, wait for CI to test the code, merge, and comment on PRs on their own, without any human involvement.
Rough statistics show that there are currently more than 95,620 bots on GitHub. That number may not seem like much, but wait...
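A bot count like this can be sketched as a distinct count of bot-like logins. The query below reuses the name-based filter from the gatsbyjs/gatsby query later in this post; it is an assumption, not necessarily the exact rule behind the 95,620 figure, and it misses bots with other names while possibly catching humans whose login happens to end in "bot".

```sql
-- Sketch: count distinct accounts whose login looks like a bot.
-- In MySQL-compatible LIKE patterns, '[' and ']' are literal characters,
-- so '%[bot]' matches logins such as 'stale[bot]'.
SELECT COUNT(DISTINCT actor_login) AS bot_accounts
FROM github_events
WHERE actor_login LIKE '%[bot]'
   OR actor_login LIKE '%bot';
```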
👀 These roughly 95,000 bot accounts generated 603 million events, accounting for 12.82% of all public events on GitHub.
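The two figures above imply a total of roughly 4.7 billion public events (603 million / 0.1282 ≈ 4,704 million). A share like this can be sketched with conditional aggregation, which counts bot events and all events in a single scan. The bot filter is the same name-based heuristic as before and is an assumption.

```sql
-- Sketch: what percentage of all public events were generated by bot-like accounts?
SELECT
    SUM(CASE WHEN actor_login LIKE '%[bot]' OR actor_login LIKE '%bot'
             THEN 1 ELSE 0 END)            AS bot_events,
    COUNT(*)                               AS all_events,
    ROUND(100.0 * SUM(CASE WHEN actor_login LIKE '%[bot]' OR actor_login LIKE '%bot'
                           THEN 1 ELSE 0 END) / COUNT(*), 2) AS bot_share_pct
FROM github_events;
```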
And these bots have served more than 18 million open source repositories.
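"Served" can be approximated as "received at least one event from a bot-like account". The sketch below counts distinct `repo_name` values under that assumption. If the table also has a numeric repository ID column, counting that instead would avoid counting a renamed repository twice.

```sql
-- Sketch: count repositories that received at least one event from a bot-like account.
SELECT COUNT(DISTINCT repo_name) AS repos_served_by_bots
FROM github_events
WHERE actor_login LIKE '%[bot]'
   OR actor_login LIKE '%bot';
```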
Case studies
Dependabot[bot]
dependabot[bot] is a hard-working bot that helps open source projects keep their dependencies up to date.
By analyzing the timestamps of dependabot's push events, we found that it likes to start its busy week at 8:00 on Monday (GMT).
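A weekly pattern like this can be sketched by bucketing dependabot's push events by weekday and hour. The query assumes a `created_at` column stored in UTC (as GH Archive timestamps are) and a MySQL-compatible dialect, where `DAYOFWEEK()` returns 1 for Sunday, 2 for Monday, and so on up to 7 for Saturday. Under these assumptions, the Monday rows (`day_of_week = 2`) around `hour_utc = 8` are where activity should pick up.

```sql
-- Sketch: dependabot[bot] push activity by weekday and hour (UTC).
SELECT
    DAYOFWEEK(created_at) AS day_of_week,  -- 1 = Sunday, 2 = Monday, ..., 7 = Saturday
    HOUR(created_at)      AS hour_utc,
    COUNT(*)              AS pushes
FROM github_events
WHERE actor_login = 'dependabot[bot]'
  AND type = 'PushEvent'
GROUP BY day_of_week, hour_utc
ORDER BY day_of_week, hour_utc;
```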
Commendably, after a series of log4j security vulnerabilities came to light, dependabot[bot] helped many Java repositories promptly update the dependency to a secure version.
Stale Bots
Stale bots are a controversial class of bots: they remind maintainers to follow up on issues that have gone stale for a long time, and they often close those issues automatically.
| Bad practice | Best practice |
|---|---|
| The user from Gatsby: | The user from NixOS: |
To check this, we ran the following SQL query:
```sql
-- Count distinct issues in gatsbyjs/gatsby closed by each bot-like account
SELECT actor_login, COUNT(DISTINCT pr_or_issue_id) AS cnt
FROM github_events ge
WHERE
    repo_name = 'gatsbyjs/gatsby'
    AND type = 'IssuesEvent'
    AND action = 'closed'
    AND (actor_login LIKE '%[bot]' OR actor_login LIKE '%bot')
GROUP BY actor_login
ORDER BY cnt DESC;
```
The results below show that many issues in the gatsbyjs/gatsby repository have been closed by stale bots.
```
+---------------------+------+
| actor_login         | cnt  |
+---------------------+------+
| gatsbot[bot]        | 1389 |
| github-actions[bot] |  777 |
| gatsbybot           |  265 |
| stale[bot]          |   50 |
| renovate[bot]       |    1 |
+---------------------+------+
5 rows in set
Time: 0.100s
```
I think we need to distinguish between what can be left to bots and what requires human involvement.
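Absolute counts are hard to compare across repositories of different sizes, so one way to put a number on this distinction is the share of closed issues that were closed by a bot-like account. The sketch below extends the gatsbyjs/gatsby query; `NixOS/nixpkgs` is only an example repository chosen for comparison, and the name-based bot filter is the same heuristic as above. An issue that was closed, reopened, and closed again by both a human and a bot is counted on both sides, so the percentage is approximate.

```sql
-- Sketch: per repository, what fraction of closed issues were closed by a bot-like account?
-- 'NixOS/nixpkgs' is an illustrative choice; replace or extend the list as needed.
SELECT
    repo_name,
    COUNT(DISTINCT CASE WHEN actor_login LIKE '%[bot]' OR actor_login LIKE '%bot'
                        THEN pr_or_issue_id END)  AS closed_by_bots,
    COUNT(DISTINCT pr_or_issue_id)                 AS closed_total,
    ROUND(100.0 * COUNT(DISTINCT CASE WHEN actor_login LIKE '%[bot]' OR actor_login LIKE '%bot'
                                      THEN pr_or_issue_id END)
                / COUNT(DISTINCT pr_or_issue_id), 2) AS bot_closed_pct
FROM github_events
WHERE repo_name IN ('gatsbyjs/gatsby', 'NixOS/nixpkgs')
  AND type = 'IssuesEvent'
  AND action = 'closed'
GROUP BY repo_name;
```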
Weird bots
Some weird bots on GitHub don't help people work or learn; instead, they act as data movers.
Some of them use GitHub as a free place to archive their data, for example, speedtracker-bot and newstools. Some periodically push a timestamp to a repository as a commit, for example, keihin00174. Some are even crazier: you can't even open their profile pages, because they have generated so many events that GitHub's database can't process them quickly, for example, mhutchinson-witness and direwolf-github.
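Several of these accounts don't have bot-like names, so a name filter won't find them. A behavioral heuristic is one alternative: look for accounts with a very large number of events that are almost all pushes and are concentrated in a handful of repositories. The thresholds in the sketch below (1,000,000 events, 95% pushes, at most 10 repositories) are illustrative guesses, not the criteria used for this post.

```sql
-- Sketch: find "data mover" accounts by behavior rather than by name.
-- Thresholds are hypothetical and should be tuned against real data.
SELECT
    actor_login,
    COUNT(*)                  AS events,
    ROUND(100.0 * SUM(CASE WHEN type = 'PushEvent' THEN 1 ELSE 0 END) / COUNT(*), 2) AS push_pct,
    COUNT(DISTINCT repo_name) AS repos
FROM github_events
GROUP BY actor_login
HAVING COUNT(*) > 1000000                                                   -- very high volume
   AND SUM(CASE WHEN type = 'PushEvent' THEN 1 ELSE 0 END) >= 0.95 * COUNT(*) -- almost only pushes
   AND COUNT(DISTINCT repo_name) <= 10                                      -- few target repositories
ORDER BY events DESC
LIMIT 20;
```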
Ranks
We ranked the bots according to their contributions.
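"Contribution" can be measured in several ways. As a sketch, assuming it is approximated by how many repositories a bot touched and how many events it produced, a ranking could look like the query below; the ranking criteria behind this post may differ, and the name-based bot filter remains a heuristic.

```sql
-- Sketch: rank bot-like accounts by repositories touched, then by total events.
SELECT
    actor_login,
    COUNT(DISTINCT repo_name) AS repos,
    COUNT(*)                  AS events
FROM github_events
WHERE actor_login LIKE '%[bot]'
   OR actor_login LIKE '%bot'
GROUP BY actor_login
ORDER BY repos DESC, events DESC
LIMIT 20;
```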