Overview

At my day job, our development team is making future-impacting decisions on architectures and frameworks. I believe in collaboration and openness when it comes to such broad technical decisions, and I prefer not to hand down mandates to the team. Besides, developers will tend to push toward the tools they want anyway.

In particular, the team was vetting frontend UI frameworks to serve as the base for all new projects, and to move older projects onto, as part of shifting to an API-first mentality. By frontend UI framework, I mean CSS/HTML (and some JS) focused on presentation, responsiveness and cross-browser compatibility. We had three main contenders on the table: Bootstrap, Foundation and Semantic UI.

To help make a quantitative decision, I wanted some hard data on the usage and penetration of Semantic UI with respect to support, community and the long-term prospects of the project. There are plenty of other ways to gauge this, but I recalled an episode of The Changelog podcast (highly recommended)¹ and realized I could have a little fun digging deep into all of GitHub’s public repos. See the References section at the bottom for links to this episode.

You can use the same or a similar approach to answer questions about your own software problems or decisions. This post is an attempt to show one example use case.

Create Google Cloud Account, add billing and create project

A Google Cloud account with a billing account set up is required to get started. You will have to provide a credit card, but it’s free for first-time users: presently, Google gives you $300 in credit to start playing with Google Cloud (https://support.google.com/cloud/answer/6288653?hl=en).

After you have created an account following the how-to(s) from Google, go to: https://cloud.google.com/bigquery/quickstart-web-ui

The above link walks you through setting up a Google Cloud project. I created a project called “bigquery-semantic-ui-work”.

One hint: if you can’t see a link or screen mentioned in the docs, try going straight to the console at console.cloud.google.com for billing, etc., and/or https://bigquery.cloud.google.com for BigQuery.

While viewing the BigQuery console, click the small blue arrow beside your project to create a dataset to save your results to in the next step. A dataset can be thought of as a DB schema or tablespace.

[Screenshot: the project menu in the BigQuery console, used to create a dataset]

I created a dataset called “semantic_ui_bigquery_fun”.

[Screenshot: the create dataset dialog]

Phase 1 – Querying data

If you recall, my goal was to find all repos in the GitHub public data that used Semantic UI. The GitHub data and docs don’t list (at least not anywhere I could easily find) the exact columns you can search and query against, so here they are as of today:

{id, content, copies, sample_repo_name, sample_ref, sample_path, sample_mode, sample_symlink_target}

Here was my final query to get the first set of results:

Click the red “Compose Query” button, enter the query into the text area and run it. After a few seconds, the query returns a results table along with some more options. Pretty fast, too:

Query complete (6.7s elapsed, 23.9 GB processed)
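For illustration, here is a minimal Ruby sketch that builds a query of this kind to paste into the query editor. It is a hypothetical example, not necessarily the exact query used here: it assumes the public `bigquery-public-data.github_repos.sample_contents` table (which includes the columns listed above), BigQuery’s standard SQL dialect (selected with the `#standardSQL` prefix), and a plain case-insensitive substring match on `semantic-ui`, which would catch, for example, `package.json` dependency entries. The `semantic_ui_query` helper name is made up for this sketch.

```ruby
# Hypothetical helper: builds a standard SQL query over GitHub's public sample contents.
# Assumption: the table name below; verify it against the BigQuery GitHub dataset docs.
SAMPLE_CONTENTS = "`bigquery-public-data.github_repos.sample_contents`".freeze

def semantic_ui_query(term = "semantic-ui")
  # Lowercase the term for a case-insensitive match, and escape single quotes
  # so the term cannot terminate the SQL string literal early.
  # Note: LIKE wildcards (% and _) inside the term are NOT escaped.
  safe = term.downcase.gsub("'") { "\\'" }
  <<~SQL
    #standardSQL
    SELECT id, sample_repo_name, sample_path
    FROM #{SAMPLE_CONTENTS}
    WHERE LOWER(content) LIKE '%#{safe}%'
       OR LOWER(sample_path) LIKE '%#{safe}%'
  SQL
end

puts semantic_ui_query
```

Selecting `sample_repo_name` keeps the repo names in the saved results, which is what the deduplication step needs.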

Now save the results to a table. I created a table called “semantic_all_repo_results”, which effectively acts as a temporary table. Because follow-up queries scan this much smaller table instead of the full GitHub dataset, you don’t burn nearly as many resources getting at the data you care about.
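To put a rough number on those savings: BigQuery’s on-demand pricing bills by the bytes a query processes, so the 23.9 GB run above costs the same every time it is repeated against the full dataset. The minimal Ruby sketch below assumes a rate of $5 per TB processed and treats 1 TB as 1,000 GB; both are illustrative assumptions, since rates and units change, so check the current BigQuery pricing page.

```ruby
# ASSUMED on-demand rate in USD per TB processed; illustrative only.
ASSUMED_USD_PER_TB = 5.0

# Rough cost of one query, given the "GB processed" figure the UI reports.
# Treats 1 TB as 1,000 GB (a simplifying assumption).
def query_cost_usd(gb_processed, usd_per_tb: ASSUMED_USD_PER_TB)
  gb_processed / 1000.0 * usd_per_tb
end

# Cost of repeating the same full-dataset query several times while exploring.
def repeated_cost_usd(gb_processed, runs, usd_per_tb: ASSUMED_USD_PER_TB)
  runs * query_cost_usd(gb_processed, usd_per_tb: usd_per_tb)
end

puts format("One full scan:  $%.2f", query_cost_usd(23.9))
puts format("Ten full scans: $%.2f", repeated_cost_usd(23.9, 10))
```

At that assumed rate a single run comes to about 12 cents, which is small but adds up across iterations; a query against the saved table is billed only for the columns it reads from that much smaller table.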

Using this table, I got a distinct list of repos that included something related to Semantic UI.

Note: BigQuery’s legacy SQL dialect doesn’t support SELECT DISTINCT, so you have to use GROUP BY to accomplish the same task. (Standard SQL does support SELECT DISTINCT.)
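As a hypothetical sketch, a GROUP BY deduplication query over the saved table could be built like this. It assumes the table lives in the `semantic_ui_bigquery_fun` dataset of your default project and has a `sample_repo_name` column (true only if the first query selected it). It uses standard SQL and also counts matching files per repo, something a plain DISTINCT would not give you.

```ruby
# Hypothetical location of the saved results; adjust to your own dataset/table.
RESULTS_TABLE = "semantic_ui_bigquery_fun.semantic_all_repo_results".freeze

# Builds a query that returns each repo once, using GROUP BY to deduplicate.
# COUNT(*) records how many matching files each repo contributed.
def distinct_repos_query(table = RESULTS_TABLE, column: "sample_repo_name")
  <<~SQL
    #standardSQL
    SELECT #{column}, COUNT(*) AS matching_files
    FROM #{table}
    GROUP BY #{column}
    ORDER BY matching_files DESC
  SQL
end

puts distinct_repos_query
```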
Now that I had a list of repos, I was ready to get more info relevant to my basic business question: how much of a community and support surrounds Semantic UI?

Phase 2 – Obtaining metadata from GitHub on repos

The BigQuery data is great, but it doesn’t include things like “stargazer counts” or “fork counts”. So the last part was saving the unique repos from the above query results to a .txt file and using a little Ruby to pull that metadata from GitHub and analyze it.
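Here is a minimal sketch of that Ruby step. It is a hypothetical reconstruction, not the original script. It assumes a text file with one `owner/repo` name per line (the format of `sample_repo_name`), calls GitHub’s REST endpoint `GET /repos/{owner}/{repo}`, and keeps the `stargazers_count` and `forks_count` fields from the JSON response. Unauthenticated requests are tightly rate-limited, so it sends a token from a `GITHUB_TOKEN` environment variable when one is set (an assumed setup). The method and file names are illustrative.

```ruby
require "json"
require "net/http"
require "uri"

API_ROOT = "https://api.github.com/repos/".freeze

# One "owner/repo" name per line; blank lines and duplicate names are dropped.
def read_repo_names(text)
  text.each_line.map(&:strip).reject(&:empty?).uniq
end

# Fetches a repo's JSON from GitHub's REST API; returns nil on any non-2xx reply.
# Redirects (e.g. renamed repos) are not followed.
def fetch_repo_json(full_name, token: ENV["GITHUB_TOKEN"])
  uri = URI(API_ROOT + full_name)
  request = Net::HTTP::Get.new(uri)
  request["Accept"] = "application/vnd.github+json"
  request["Authorization"] = "Bearer #{token}" if token
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
end

# Keeps only the fields needed for the community/support metrics.
def repo_stats(json)
  data = JSON.parse(json)
  {
    name: data["full_name"],
    stars: data["stargazers_count"],
    forks: data["forks_count"]
  }
end

# Usage (hypothetical file names): ruby repo_stats.rb repos.txt > repo_stats.tsv
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  read_repo_names(File.read(ARGV[0])).each do |name|
    body = fetch_repo_json(name)
    if body.nil?
      warn "skipped #{name}"
      next
    end
    stats = repo_stats(body)
    puts [stats[:name], stats[:stars], stats[:forks]].join("\t")
  end
end
```

Keeping the parsing in `repo_stats` separate from the HTTP call makes it easy to check against a saved response without hitting the API.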

I could now inspect the metadata. It wasn’t perfect: there were still some duplicates, likely because of my quickly hacked-together script. But that’s not the point. I was able to cope with this easily in a spreadsheet and get some metrics on Semantic UI in about an hour. I shared my result file as a Google Sheet in case anyone wants to see the final result.
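If you’d rather compute a few metrics in Ruby than in a spreadsheet, here is a minimal sketch. It assumes tab-separated `name`, `stars`, `forks` lines (a hypothetical format), keeps the first row for each repo name to drop duplicates, and reports some simple totals; which metrics actually matter for a framework decision is a judgment call.

```ruby
# Parses "name<TAB>stars<TAB>forks" lines, keeping the first row for each repo name.
def load_stats(tsv)
  tsv.each_line
     .map { |line| line.chomp.split("\t") }
     .select { |fields| fields.size == 3 }
     .map { |name, stars, forks| { name: name, stars: stars.to_i, forks: forks.to_i } }
     .uniq { |row| row[:name] }
end

# Median of a list of numbers; nil for an empty list.
def median(values)
  return nil if values.empty?
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# A few simple community metrics, plus the most-starred repo names.
def summarize(rows, top: 3)
  stars = rows.map { |row| row[:stars] }
  {
    repos: rows.size,
    total_stars: stars.sum,
    median_stars: median(stars),
    total_forks: rows.sum { |row| row[:forks] },
    top: rows.max_by(top) { |row| row[:stars] }.map { |row| row[:name] }
  }
end

# Usage (hypothetical file name): ruby summarize.rb repo_stats.tsv
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  p summarize(load_stats(File.read(ARGV[0])))
end
```

The median is included alongside the totals because a handful of very popular repos can dominate a star count.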

https://docs.google.com/spreadsheets/d/1RpP79mhG-DY1W48-ryvjYi5Ti1BFzRApEOp0ZfoqWbQ/edit?usp=sharing

Wrapping up

Make sure you clean up your datasets when you are done, or you will burn credits! Now go big and query.

References

¹ The Changelog 209: GitHub and Google on Public Datasets & Google BigQuery (Changelog.com)
https://medium.com/google-cloud/github-on-bigquery-analyze-all-the-code-b3576fd2b150
https://cloud.google.com/bigquery/public-data/github
