Step 2: Load Data to TiDB
1. Setup TiDB Cloud
We will use TiDB Cloud Serverless Tier in this workshop, The resource of Serverless Tier is limited, and it's provided for development usage.
First, sign up TiDB Cloud, then create a Serverless Tier
database.
On top-right of the cluster detail page you created, click Connect
button, you will see the connection string like below:
mysql --connect-timeout 15 -u '3xxzxxxxx1xxxxr.root' -h gateway01.us-west-2.prod.aws.tidbcloud.com -P 4000 -D test -p
There are several values we can get from the connection string that will be used later:
- username: the value after -u option, in this case, it's
39jzyT3RT1DWrAr.root
- password: the password you just inputted while creating database
- host: the value after -h option, in this case, it's
gateway01.us-west-2.prod.aws.tidbcloud.com
- port:
4000
, by default, TiDB Cloud uses4000
instead of3306
2. Load realtime events to TiDB Cloud
a. Install Docker Compose
We suppose you have knowledge about Docker / Container / Docker Compose.
If you haven't installed Docker Compose, please install it with this doc, then verfiy it with:
docker-compose --version
b. Clone the OSS Insight repo to local
- GitHub
- GHProxy
git clone --depth=1 https://github.com/pingcap/ossinsight.git;
cd ossinsight;
git clone --depth=1 https://ghproxy.com/https://github.com/pingcap/ossinsight.git;
cd ossinsight;
c. Start up the mini OSSInsight program
Configure the necessary environment variables and start the mini OSSInsight through docker compose.
export GITHUB_TOKEN=<your personal access token>;
# e.g. DATABASE_URL=tidb://39jzyT3RT1DWrAr.root:xxxxxxxxx@gateway01.us-west-2.prod.aws.tidbcloud.com:4000/gharchive_dev
export DATABASE_URL=tidb://<your tidb username>:<your tidb password>@<your tidb host>:4000/gharchive_dev;
docker-compose pull;
docker-compose up;
Learn how to create a personal access token
3. Load historical GitHub events to TiDB Cloud
Download and import the sample events data which contains several active OSS database repos, about 240k rows.
wget https://github.com/pingcap/ossinsight/releases/download/sample-data/gharchive_dev.github_events.000000000.sql;
# e.g. mysql -u 39jzyT3RT1DWrAr.root -h gateway01.us-west-2.prod.aws.tidbcloud.com -P 4000 -p gharchive_dev < gharchive_dev.github_events.000000000.sql
mysql --host <your tidb host> --port 4000 -u <your tidb username> -p gharchive_dev < gharchive_dev.github_events.000000000.sql
4. Test
On top-right of the TiDB Cloud cluster detail page, click Connect
button and get you connection command in bottom like below:
mysql --connect-timeout 15 -u '<your tidb username>' -h gateway01.us-west-2.prod.aws.tidbcloud.com -P 4000 -D test -p
Execute the following SQL to check if it is ACTUALLY ready:
SELECT count(*) FROM gharchive_dev.github_events;
Try it again after a few seconds, make sure the results are different.