Step 2: Load Data to TiDB

1. Set up TiDB Cloud


We will use the TiDB Cloud Serverless Tier in this workshop. Serverless Tier resources are limited, and the tier is intended for development use.

First, sign up for TiDB Cloud, then create a Serverless Tier database.

At the top-right of the detail page for the cluster you created, click the Connect button. You will see a connection command like the one below:

mysql --connect-timeout 15 -u '3xxzxxxxx1xxxxr.root' -h -P 4000 -D test -p

The connection command contains several values that will be used later:

  • username: the value after the -u option; in this example, 39jzyT3RT1DWrAr.root
  • password: the password you entered when creating the database
  • host: the value after -h option, in this case, it's
  • port: 4000; by default, TiDB Cloud uses port 4000 instead of the MySQL default 3306
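
If you prefer not to copy these values by hand, the following is a minimal sketch of how they could be pulled out of the connection command in a POSIX shell. The helper name get_opt and the variable names TIDB_USER, TIDB_HOST, and TIDB_PORT are made up for this illustration; they are not part of the mysql client or of OSS Insight. The values in angle brackets are placeholders for your own cluster.

```shell
# Hypothetical helper (not part of the mysql client): print the value that
# follows option $1 in the remaining arguments.
get_opt() {
  opt=$1
  shift
  while [ "$#" -gt 1 ]; do
    if [ "$1" = "$opt" ]; then
      printf '%s\n' "$2"
      return 0
    fi
    shift
  done
  return 1
}

# Paste your own connection command after "set --".
# The values in angle brackets are placeholders, not real credentials.
set -- mysql --connect-timeout 15 -u '<your tidb username>' -h '<your tidb host>' -P 4000 -D test -p

TIDB_USER=$(get_opt -u "$@")
TIDB_HOST=$(get_opt -h "$@")
TIDB_PORT=$(get_opt -P "$@")

echo "user=$TIDB_USER host=$TIDB_HOST port=$TIDB_PORT"
```

The password is deliberately not handled here, because the connection command only contains -p, which makes the client prompt for it.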

2. Load real-time events to TiDB Cloud

a. Install Docker Compose


We assume you are familiar with Docker, containers, and Docker Compose.

If you haven't installed Docker Compose, please install it by following this doc, then verify the installation with:

docker-compose --version
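
Depending on how Docker was installed, Compose may be available as the standalone docker-compose binary used in this workshop or as the docker compose plugin. The sketch below checks for both; the function name find_compose is made up for this example, and the commands in the rest of this step still assume docker-compose.

```shell
# Hypothetical helper: print the Compose command available on this machine,
# or fail with a message if neither variant is found.
find_compose() {
  if command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  elif docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  else
    echo "Docker Compose not found; install it before continuing" >&2
    return 1
  fi
}

if COMPOSE=$(find_compose); then
  echo "Using: $COMPOSE"
fi
```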

b. Clone the OSS Insight repo to your local machine

git clone --depth=1;
cd ossinsight;

c. Start up the mini OSSInsight program

Configure the necessary environment variables, then start the mini OSSInsight with Docker Compose.

export GITHUB_TOKEN=<your personal access token>;
# e.g. DATABASE_URL=tidb://
export DATABASE_URL=tidb://<your tidb username>:<your tidb password>@<your tidb host>:4000/gharchive_dev;
docker-compose pull;
docker-compose up;
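
To avoid typos when filling in the URL, it can be assembled from its parts. The sketch below follows the tidb://user:password@host:port/database pattern shown above; the function name build_database_url is made up for this example, and the angle-bracket values are placeholders. If your password contains characters such as @, : or /, it may need percent-encoding. That depends on how the program parses the URL, which this workshop does not specify.

```shell
# Hypothetical helper: build the DATABASE_URL value from its parts.
# $1 = username, $2 = password, $3 = host,
# $4 = port (defaults to 4000), $5 = database (defaults to gharchive_dev)
build_database_url() {
  printf 'tidb://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "${4:-4000}" "${5:-gharchive_dev}"
}

# Replace the placeholders with the values from your own cluster.
DATABASE_URL=$(build_database_url '<your tidb username>' '<your tidb password>' '<your tidb host>')
export DATABASE_URL
echo "$DATABASE_URL"
```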

3. Load historical GitHub events to TiDB Cloud

Download and import the sample event data, which covers several active open-source database repositories (about 240k rows).

# e.g. mysql -u 39jzyT3RT1DWrAr.root -h -P 4000 -p gharchive_dev < gharchive_dev.github_events.000000000.sql
mysql --host <your tidb host> --port 4000 -u <your tidb username> -p gharchive_dev < gharchive_dev.github_events.000000000.sql

4. Test

At the top-right of the TiDB Cloud cluster detail page, click the Connect button and copy your connection command from the bottom of the dialog. It looks like this:

mysql --connect-timeout 15 -u '<your tidb username>' -h -P 4000 -D test -p

Execute the following SQL to check whether the data is actually ready:

SELECT count(*) FROM gharchive_dev.github_events;

Run the query again after a few seconds and make sure the result has changed, which indicates that new events are still being loaded.
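
The repeated check can also be scripted. The following is a minimal sketch, assuming the mysql client is installed; the function name output_changes and the WAIT_SECONDS variable are made up for this example. The function runs any command twice and succeeds only if the two outputs differ.

```shell
# Hypothetical helper: run the given command twice, WAIT_SECONDS apart
# (default 5), and succeed only if the two outputs differ.
# Returns 2 if the command itself fails.
output_changes() {
  first=$("$@") || return 2
  sleep "${WAIT_SECONDS:-5}"
  second=$("$@") || return 2
  echo "first: $first, second: $second"
  [ "$first" != "$second" ]
}

# Example usage (needs a reachable cluster; replace the placeholders).
# -N omits the column header and -e runs the statement non-interactively.
# Because of -p, the client prompts for the password on each of the two runs.
# output_changes mysql -h '<your tidb host>' -P 4000 -u '<your tidb username>' -p \
#   -N -e 'SELECT count(*) FROM gharchive_dev.github_events;'
```

A zero exit status means the count changed between the two runs; a non-zero status means it stayed the same or the query failed.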