Skip to main content

Step 2: Load Data to TiDB

1. Setup TiDB


You can use either TiDB Cloud or your local tidb cluster(on laptop or servers), by default, we use TiDB Cloud in this workshop.

  • With the TiDB Cloud mode, because the Developer Tier uses Shared Ram and provides only 1GB of disk, so we only provide 240k rows of data for importing.
  • But for the Local TiDB, we provide a dataset with 0.8 million of rows.

First, sign up TiDB Cloud, then create a Developer Tier database which is free for 1 year.

On top-right of the cluster detail page you created, click Connect button, you will see the connection string like below:

mysql --connect-timeout 15 -u '39jzyT3RT1DWrAr.root' -h -P 4000 -D test -p

There are several values we can get from the connection string that will be used later:

  • username: the value after -u option, in this case, it's 39jzyT3RT1DWrAr.root
  • password: the password you just inputted while creating database
  • host: the value after -h option, in this case, it's
  • port: 4000, by default, TiDB Cloud uses 4000 instead of 3306

2. Load realtime events to TiDB

a. Install Docker Compose


We suppose you have knowledge about Docker / Container / Docker Compose.

If you haven't installed Docker Compose, please install it with this doc, then verfiy it with:

docker-compose --version

b. Clone the OSS Insight repo to local

git clone --depth=1;
cd ossinsight;

c. Start up the mini OSSInsight program

Configure the necessary environment variables and start the mini OSSInsight through docker compose.

export GITHUB_TOKEN=<your personal access token>;
# e.g. DATABASE_URL=tidb://
export DATABASE_URL=tidb://<your tidb username>:<your tidb password>@<your tidb host>:4000/gharchive_dev;
docker-compose pull;
docker-compose up;

3. Load sample historical GitHub events to TiDB

Download and import the sample events data which contains several active OSS database repos, about 240k rows.

# e.g. mysql -u 39jzyT3RT1DWrAr.root -h -P 4000 -p gharchive_dev < gharchive_dev.github_events.000000000.sql
mysql --host <your tidb host> --port 4000 -u <your tidb username> -p gharchive_dev < gharchive_dev.github_events.000000000.sql

4. Test

Connect to TiDB by:

On top-right of the TiDB Cloud cluster detail page, click Connect button and get you connection command in bottom like below:

mysql --connect-timeout 15 -u '<your tidb username>' -h -P 4000 -D test -p

Execute the following SQL to check if it is ACTUALLY ready:

SELECT count(*) FROM gharchive_dev.github_events;

Try it again after a few seconds, make sure the results are different.