GitHub Action for advanced repository traffic analysis and reporting
This is a GitHub Action originally built to overcome the 14-day limitation of GitHub’s built-in traffic statistics.
Run this daily to collect potentially valuable data.
According to the motto: a data snapshot each day keeps the doctor away 🍎
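The idea behind daily snapshots can be sketched in a few lines of Python. This is only an illustration, not the Action's actual aggregation code: the function names, the made-up counts, and the rule of keeping the largest value seen for a day are assumptions of this sketch.

```python
from datetime import date, timedelta


def merge_snapshots(snapshots):
    """Merge overlapping per-day count snapshots into one time series.

    Each snapshot maps a calendar day to the count reported for that day
    at the time the snapshot was taken. When several snapshots cover the
    same day, keep the largest value: an earlier snapshot may have seen
    only part of that day. (The max rule is an assumption of this sketch.)
    """
    merged = {}
    for snapshot in snapshots:
        for day, count in snapshot.items():
            merged[day] = max(count, merged.get(day, 0))
    return dict(sorted(merged.items()))


def fake_snapshot(taken_on, window_days=14):
    """Build a made-up snapshot covering the `window_days` days up to `taken_on`."""
    days = [taken_on - timedelta(days=i) for i in range(window_days)]
    return {d: 10 for d in days}


# Two 14-day snapshots taken a week apart overlap by one week.
first = fake_snapshot(date(2024, 1, 14))
second = fake_snapshot(date(2024, 1, 21))
series = merge_snapshots([first, second])
print(len(series), min(series), max(series))
```

Each additional snapshot extends the merged series beyond what a single 14-day window could provide.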
See this Action in Marketplace.
High-level method description:
Looking for a quick start? Follow the simple tutorial in the Wiki.
Demo 1:
Demo 2:
For more use cases (and their setup), see the “Used by” section below.
This walks on the shoulders of giants:
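Naming is hard :-). Let’s define two concepts and their names:
- the stats repository: the repository to fetch stats for and to generate the report for.
- the data repository: the repository to store data and report files in. This is also the repository the workflow file lives in.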
Let me know if you can think of better names.
These two repositories can be the same. But they don’t have to be :-).
That is, you can for example set up this Action in a private repository but have it observe a public repository.
This section contains brief instructions for a scenario where the data repository is different from the stats repository.
For a more detailed walkthrough (showing how to create a personal access token, and also which git commands to use) please follow the Tutorial in the wiki.
Example scenario:
- stats repository: bob/nice-project
- data repository: bob/private-ghrs-data-repo

Create a GitHub Actions workflow file in the data repository (in the example this is the repo bob/private-ghrs-data-repo). Example path: .github/workflows/repostats-for-nice-project.yml.
Example workflow file content with code comments:
on:
  schedule:
    # Run this once per day, towards the end of the day for keeping the most
    # recent data point most meaningful (hours are interpreted in UTC).
    - cron: "0 23 * * *"
  workflow_dispatch: # Allow for running this manually.

jobs:
  j1:
    name: repostats-for-nice-project
    runs-on: ubuntu-latest
    steps:
      - name: run-ghrs
        uses: jgehrcke/github-repo-stats@RELEASE
        with:
          # Define the stats repository (the repo to fetch
          # stats for and to generate the report for).
          # Remove the parameter when the stats repository
          # and the data repository are the same.
          repository: bob/nice-project
          # Set a GitHub API token that can read the GitHub
          # repository traffic API for the stats repository,
          # and that can push commits to the data repository
          # (which this workflow file lives in, to store data
          # and the report files).
          ghtoken: ${{ secrets.ghrs_github_api_token }}
Note: the recommended way to run this Action is on a schedule, once per day. Really.
Note: defining ghtoken: ${{ secrets.ghrs_github_api_token }} is required. In the data repository (where the Action is executed) you need to have a secret defined, with the name GHRS_GITHUB_API_TOKEN (of course you can change the name in both places).
The content of the secret needs to be an API token that has the repo scope. Follow the tutorial for precise instructions.
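To make the role of the token more concrete, here is a minimal Python sketch that builds (but does not send) a request against the GitHub REST API traffic endpoint for views or clones. It is not taken from this Action's code; the function name build_traffic_request and the placeholder token are hypothetical.

```python
import urllib.request


def build_traffic_request(repo_spec, token, kind="views"):
    """Build (but do not send) a request for the GitHub traffic API.

    `repo_spec` is "<owner-or-org>/<reponame>"; `kind` is "views" or "clones".
    """
    if kind not in ("views", "clones"):
        raise ValueError(f"unsupported traffic kind: {kind}")
    url = f"https://api.github.com/repos/{repo_spec}/traffic/{kind}"
    headers = {
        "Accept": "application/vnd.github+json",
        # The token is sent with every request; never hard-code a real one.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers)


req = build_traffic_request("bob/nice-project", "not-a-real-token")
print(req.get_method(), req.full_url)
```

Sending such a request with a token that lacks access to the stats repository would be rejected by the API, which is why the secret has to be set up as described above.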
In the workflow file you can set various configuration parameters. They
are specified and documented in the action.yml file (the reference). Here
is a quick description, for convenience:
- ghtoken: GitHub API token for reading the GitHub repository traffic API for the stats repository, and for pushing commits to the data repository. Required.
- repository: Repository spec (<owner-or-org>/<reponame>) for the repository to fetch stats for. Default: ${{ github.repository }} (the repo this workflow runs in).
- databranch: Branch to push data to (in the data repo). Default: github-repo-stats
- ghpagesprefix: Set this if the data branch in the data repo is exposed via GitHub Pages. Example: https://jgehrcke.github.io/ghrs-test

It is recommended that you create the data branch and delete all files from that branch before setting this Action up in your repository, so that this data branch appears as a tidy environment.
You can of course remove files from that branch at any other point in time, too.
The GitHub Actions workflow specification language allows for defining a matrix of different job configurations through the jobs.<job_id>.strategy.matrix directive.
This can be used for efficiently tracking multiple stats repositories from within the same data repository.
Example workflow file:
name: fetch-repository-stats
concurrency: fetch-repository-stats

on:
  schedule:
    - cron: "0 23 * * *"
  workflow_dispatch:

jobs:
  run-ghrs-with-matrix:
    name: repostats-for-nice-projects
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # The repositories to generate reports for.
        statsRepo: ['bob/nice-project', 'alice/also-nice-project']
      # Do not cancel&fail all remaining jobs upon first job failure.
      fail-fast: false
      # Help avoid commit conflicts. Note(JP): this should not be
      # necessary anymore, feedback appreciated
      max-parallel: 1
    steps:
      - name: run-ghrs
        uses: jgehrcke/github-repo-stats@RELEASE
        with:
          repository: ${{ matrix.statsRepo }}
          ghtoken: ${{ secrets.ghrs_github_api_token }}
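
How a matrix turns into individual jobs can be illustrated with a small Python sketch. It only mimics the expansion conceptually (one job per combination of matrix values); the helper expand_matrix is hypothetical and not part of GitHub Actions or of this Action.

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict per combination of matrix values (one per job)."""
    keys = list(matrix)
    combos = product(*(matrix[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]


matrix = {"statsRepo": ["bob/nice-project", "alice/also-nice-project"]}
for job in expand_matrix(matrix):
    # Each job runs the Action once with its own `repository` input.
    print(f"repository: {job['statsRepo']}")
```

With a single matrix key, as in the workflow file above, this yields exactly one job per stats repository.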
Here is how to run bats-based checks from within a checkout:
$ git clone https://github.com/jgehrcke/github-repo-stats
$ cd github-repo-stats/
$ make clitests
...
1..5
ok 1 analyze.py: snapshots: some, vcagg: yes, stars: some, forks: none
ok 2 analyze.py: snapshots: some, vcagg: yes, stars: none, forks: some
ok 3 analyze.py: snapshots: some, vcagg: yes, stars: some, forks: some
ok 4 analyze.py: snapshots: some, vcagg: no, stars: some, forks: some
ok 5 analyze.py + pdf.py: snapshots: some, vcagg: no, stars: some, forks: some
$ make lint
...
All done! ✨ 🍰 ✨
...
Set environment variables, example:
export GITHUB_REPOSITORY=jgehrcke/ghrs-test
export GITHUB_WORKFLOW="localtesting"
export INPUT_DATABRANCH=databranch-test
export INPUT_GHTOKEN="c***1"
export INPUT_REPOSITORY=jgehrcke/covid-19-germany-gae
export INPUT_GHPAGESPREFIX="none"
export GHRS_FILES_ROOT_PATH="/home/jp/dev/github-repo-stats"
export GHRS_TESTING="true"
(for an up-to-date list of required env vars see .github/workflows/prs.yml)
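The INPUT_* variables above mirror the workflow parameters: GitHub Actions hands each input to the Action as an environment variable named INPUT_<NAME>. The following Python sketch shows one way such variables could be resolved into a configuration. It is not the logic of entrypoint.sh; the function read_inputs is hypothetical and the fallback values are assumptions (action.yml is the reference).

```python
import os


def read_inputs(environ=os.environ):
    """Collect Action inputs from INPUT_* variables, applying assumed defaults."""
    token = environ.get("INPUT_GHTOKEN")
    if not token:
        raise ValueError("INPUT_GHTOKEN is required")
    return {
        "ghtoken": token,
        # Fall back to the repository the workflow runs in.
        "repository": environ.get("INPUT_REPOSITORY") or environ["GITHUB_REPOSITORY"],
        # Assumed default branch name; check action.yml for the real one.
        "databranch": environ.get("INPUT_DATABRANCH") or "github-repo-stats",
        "ghpagesprefix": environ.get("INPUT_GHPAGESPREFIX", "none"),
    }


example_env = {
    "GITHUB_REPOSITORY": "jgehrcke/ghrs-test",
    "INPUT_GHTOKEN": "not-a-real-token",
    "INPUT_DATABRANCH": "databranch-test",
}
print(read_inputs(example_env))
```

Passing a plain dict instead of os.environ keeps the sketch easy to try out without touching the real environment.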
Run in empty directory. Example:
cd /tmp/ghrstest
rm -rf .* *; bash /home/jp/dev/github-repo-stats/entrypoint.sh
A few rather randomly picked use cases: