Skip to main content

Step 2: Load Data to TiDB

1. Setup TiDB

info

You can use either TiDB Cloud or your local tidb cluster(on laptop or servers), by default, we use TiDB Cloud in this workshop.

  • With the TiDB Cloud mode, because the Developer Tier uses Shared Ram and provides only 1GB of disk, so we only provide 240k rows of data for importing.
  • But for the Local TiDB, we provide a dataset with 0.8 million of rows.

First, sign up TiDB Cloud, then create a Developer Tier database which is free for 1 year.

On top-right of the cluster detail page you created, click Connect button, you will see the connection string like below:

mysql --connect-timeout 15 -u '39jzyT3RT1DWrAr.root' -h gateway01.us-west-2.prod.aws.tidbcloud.com -P 4000 -D test -p

There are several values we can get from the connection string that will be used later:

  • username: the value after -u option, in this case, it's 39jzyT3RT1DWrAr.root
  • password: the password you just inputted while creating database
  • host: the value after -h option, in this case, it's gateway01.us-west-2.prod.aws.tidbcloud.com
  • port: 4000, by default, TiDB Cloud uses 4000 instead of 3306

2. Load realtime events to TiDB

a. Install Docker Compose

note

We suppose you have knowledge about Docker / Container / Docker Compose.

If you haven't installed Docker Compose, please install it with this doc, then verfiy it with:

docker-compose --version

b. Clone the OSS Insight repo to local

git clone --depth=1 https://github.com/pingcap/ossinsight.git;
cd ossinsight;

c. Start up the mini OSSInsight program

Configure the necessary environment variables and start the mini OSSInsight through docker compose.

export GITHUB_TOKEN=<your personal access token>;
# e.g. DATABASE_URL=tidb://39jzyT3RT1DWrAr.root:xxxxxxxxx@gateway01.us-west-2.prod.aws.tidbcloud.com:4000/gharchive_dev
export DATABASE_URL=tidb://<your tidb username>:<your tidb password>@<your tidb host>:4000/gharchive_dev;
docker-compose pull;
docker-compose up;

3. Load sample historical GitHub events to TiDB

Download and import the sample events data which contains several active OSS database repos, about 240k rows.

wget https://github.com/pingcap/ossinsight/releases/download/sample-data/gharchive_dev.github_events.000000000.sql;
# e.g. mysql -u 39jzyT3RT1DWrAr.root -h gateway01.us-west-2.prod.aws.tidbcloud.com -P 4000 -p gharchive_dev < gharchive_dev.github_events.000000000.sql
mysql --host <your tidb host> --port 4000 -u <your tidb username> -p gharchive_dev < gharchive_dev.github_events.000000000.sql

4. Test

Connect to TiDB by:

On top-right of the TiDB Cloud cluster detail page, click Connect button and get you connection command in bottom like below:

mysql --connect-timeout 15 -u '<your tidb username>' -h gateway01.us-west-2.prod.aws.tidbcloud.com -P 4000 -D test -p

Execute the following SQL to check if it is ACTUALLY ready:

SELECT count(*) FROM gharchive_dev.github_events;

Try it again after a few seconds, make sure the results are different.