
Data Explorer

Explore 5 billion GitHub events with no SQL or plotting skills. Reveal fascinating discoveries RIGHT NOW!

💡 Popular questions

GitHub data is not your focus?

Import any dataset


How it works

  1. Input your question
  2. Translate the question into SQL
  3. Visualize and output results

Can I use the AI-powered feature with my own dataset?

Yes! It's built with Chat2Query, an AI-powered SQL generator in TiDB Cloud. If you want to explore any other dataset, Chat2Query is an excellent choice.

What are the limitations of Data Explorer?

  1. AI is still a work in progress with limitations
    Its limitations include:
    • A lack of context and knowledge of the specific database structure
    • A lack of domain knowledge
    • Inability to produce the most efficient SQL statement for large and complex queries
    • Occasional service instability

    To help the AI understand your query intention, please use clear, specific phrases in your question. Check out our question optimization tips. We're constantly working on improving and optimizing it, so any feedback you have is greatly appreciated. Thanks for using Data Explorer!

  2. The dataset itself is a limitation for our tool
    All the data we use on this website is sourced from GH Archive, a non-profit project that records and archives all GitHub event data since 2011 (public data only). If a question falls outside the scope of the available data, it may be difficult for our tool to provide a satisfactory answer.

Why did it fail to generate an SQL query?

Potential reasons:
  • The AI could not understand your question, or misunderstood it, and so could not generate SQL. To learn more about the AI's limitations, see the previous question.
  • Network issues.
  • You sent too many requests. Note that you can ask up to 15 questions per hour.

To resolve this, rephrase your question in short, specific words related to GitHub, then try again. We also strongly recommend starting with the query templates near the search box.
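
For readers curious how a limit of 15 questions per hour can behave in practice, here is a minimal sketch of a sliding-window counter. It is only an illustration: how Data Explorer actually enforces its limit is not described on this page, and the class name `HourlyQuestionLimiter` and its methods are hypothetical.

```python
import time
from collections import deque


class HourlyQuestionLimiter:
    """Sliding-window limiter: at most `limit` questions per `window` seconds.

    Hypothetical helper for illustration; not the service's real mechanism.
    """

    def __init__(self, limit=15, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self._asked = deque()   # timestamps of accepted questions, oldest first

    def try_ask(self):
        """Return True and record the question if it fits in the window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._asked and now - self._asked[0] >= self.window:
            self._asked.popleft()
        if len(self._asked) >= self.limit:
            return False
        self._asked.append(now)
        return True

    def seconds_until_next(self):
        """Seconds to wait before another question would be accepted."""
        now = self.clock()
        # Count only timestamps that are still inside the window.
        live = [t for t in self._asked if now - t < self.window]
        if len(live) < self.limit:
            return 0.0
        # A slot frees up when the oldest live timestamp expires.
        return self.window - (now - live[0])
```

Under this model, a sixteenth question is rejected until the oldest of the previous fifteen is more than an hour old, so waiting and retrying is enough to recover.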

The query result is not satisfactory. How can I optimize my question?

We use AI to translate your question into SQL, but it is still a work in progress with limitations.
To help the AI understand your intention and return a useful result, rephrase your question using clear, specific phrases related to GitHub. We recommend:
  • Using a GitHub login account instead of a nickname. For example, change "Linus" to "torvalds."
  • Using a GitHub repository's full name. For example, change "react" to "facebook/react."
  • Using GitHub terms. For example, to find Python projects with the most forks in 2022, change your query "The most popular Python projects 2022" to "Python projects with the most forks in 2022."

You can also get inspiration from the suggested queries near the search box.
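
To see why specific GitHub terms help, consider what the rephrased question "Python projects with the most forks in 2022" might translate to. The sketch below is an assumption for illustration, not the SQL the tool actually generates: the table `github_events`, its columns, and every row are made up, and it runs against an in-memory SQLite database rather than TiDB. The only borrowed facts are the GitHub event type names `ForkEvent` and `WatchEvent`.

```python
import sqlite3

# Hypothetical, simplified schema; the real tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE github_events ("
    " type TEXT, repo_name TEXT, language TEXT, created_at TEXT)"
)

# Made-up rows, for illustration only.
rows = [
    ("ForkEvent", "example/alpha", "Python", "2022-03-01"),
    ("ForkEvent", "example/alpha", "Python", "2022-07-15"),
    ("ForkEvent", "example/beta", "Python", "2022-05-20"),
    ("ForkEvent", "example/beta", "Python", "2021-12-31"),   # outside 2022
    ("WatchEvent", "example/beta", "Python", "2022-06-01"),  # a star, not a fork
    ("ForkEvent", "example/gamma", "Go", "2022-02-02"),      # not Python
]
conn.executemany("INSERT INTO github_events VALUES (?, ?, ?, ?)", rows)

# Each phrase of the question maps onto one clause:
# "forks" -> type filter, "Python" -> language filter, "in 2022" -> date range,
# "the most" -> ORDER BY the count, descending.
QUERY = """
SELECT repo_name, COUNT(*) AS forks
FROM github_events
WHERE type = 'ForkEvent'
  AND language = 'Python'
  AND created_at >= '2022-01-01' AND created_at < '2023-01-01'
GROUP BY repo_name
ORDER BY forks DESC, repo_name
LIMIT 10
"""
top_forked = conn.execute(QUERY).fetchall()
print(top_forked)  # [('example/alpha', 2), ('example/beta', 1)]
```

The original wording, "the most popular", names no column to count or sort by, which is why the translation can go wrong; "most forks" removes that ambiguity.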

Why did it fail to generate a chart?

Potential reasons:
  • The SQL query was incorrect or could not be generated, so the answer could not be found in the database, and the chart could not be generated.
  • The answer was found, but the AI did not choose the correct chart template, so the chart could not be generated.
  • The SQL query was correct, but no answer was found, so the chart could not be displayed.
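
The three cases above can be read as a simple decision order. The following sketch is a hypothetical model, not the tool's real logic: the function `classify`, its parameters, and the enum names are invented for illustration, and a wrongly chosen chart template is simplified to "no template selected".

```python
from enum import Enum


class ChartOutcome(Enum):
    SQL_FAILED = "SQL incorrect or not generated"
    NO_ROWS = "SQL ran but returned no rows"
    NO_TEMPLATE = "rows found but no suitable chart template"
    OK = "chart can be rendered"


def classify(sql_ok, rows, template):
    """Map one request's state to the failure cases listed above.

    All parameters are hypothetical: whether the SQL ran successfully,
    the rows it returned, and the chosen chart template name (or None).
    """
    if not sql_ok:
        return ChartOutcome.SQL_FAILED
    if not rows:
        return ChartOutcome.NO_ROWS
    if template is None:
        return ChartOutcome.NO_TEMPLATE
    return ChartOutcome.OK
```

Checking in this order reflects the dependency between the steps: a chart template only matters once the query has run and returned rows.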

What technology is Data Explorer built on?

Its major technologies include:
  • Data source: GH Archive and GitHub event API
    GH Archive collects and archives public GitHub event data since 2011 and updates it hourly. By combining the GH Archive data with the GitHub event API, we get streaming, real-time data updates.
  • One database for all workloads: TiDB Cloud
    To handle a large and continuously growing volume of data (currently 5+ billion GitHub events), we need a database that can:
    • Store massive data
    • Handle complex analytical queries
    • Serve online traffic
    TiDB is an ideal solution. TiDB Cloud is its fully managed cloud Database as a Service. It lets users launch TiDB in seconds and offers a pay-as-you-go pricing model. That is why we chose TiDB Cloud as our backend database.
  • SQL generator: Chat2Query
    Chat2Query is an AI-powered SQL editor in the TiDB Cloud console. We use it to generate SQL for your queries.
  • AI engine: OpenAI
    To enable users without SQL knowledge to query with this tool, we use OpenAI to translate natural language into SQL.

Still having trouble? Contact us, we're happy to help!

Wonder how OSS Insight works?


How do we implement OSS Insight?

Blog: 10 min read


Use TiDB Cloud to Analyze GitHub Events in 10 Minutes

Tutorial: 10 min read


Join a Workshop to Set Up a Mini OSS Insight

Tutorial: 25 min


Follow us at @OSSInsight and join the conversation using the hashtags
#OSSInsight #TiDBCloud