Big Query: How to Fetch Data In JSON Format By Consuming BigQuery’s REST API(s)


BigQuery is revolutionizing the way Big Data is maintained and processed. Its “server-less” architecture is not just enterprise friendly but also developer friendly as it takes care of all the hardware configuration and scalability matrices on its own.

As a developer you just need to know the process of data extraction and that is all. All the performance related hard-work have been taken care by Google’s BigQuery.

Before you begin, I would suggest you to read following article:

This article will give you the entire picture of Big Query and why you should be using it in the first place to deal with Big Data.

Create a New Project

To do anything, you first need to create a Project under Google Cloud Platform.

Navigate to the Google Credentials Page. And select your project from the top navbar. Look at the image below:

If your project is not visible, then you will have to create it.

Click on select project drop-down and  New Project. Take a look at the below snapshot.

Upon clicking the New Project button, a new page will open where you will be asked to enter project details and create new project.

Create New GCP Project

Once your project is created, navigate back to the Credentials Page and look for Manage service Account Link at the bottom. Your application should use Service Account to interact with different Google Services.

Before moving forward, let me give you a quick overview of Service Account and its significance to better understand the tutorial.

Understanding Service Account

The first thing I tell everyone: Do not confuse Service Account with User Account.

Service accounts are not user specific, in-fact, these are special type of Google Accounts that belongs to your application instead of a User. Service account is a way to access Google Services with proper authentication and authorization.

Google assume identity of a Service account while calling Google APIs. Service accounts are made with the sole purpose to eliminate User from directly accessing the resources.

How does Service Account maintain authentication and authorization?

Service account have special keys that are used to authenticate themselves to google while accessing google services.

When you create Service Account, you decide:

  • Which resources must be accessible by the service account?
  • What permissions does the Service Account need in order to access a specific resource?
  • Where will the code that assumes the identity of the service account be running: on Google Cloud Platform or on-premises?

Use below flowchart to figure out how to configure service account:

How to Create Service Account

Login to your Google Cloud Platform dashboard. Navigate to IAM & admin link on the left side bar and click on Service Accounts link.

Access Service Account Link in Google Cloud Platform

Alternatively, you can Click on Manage Service Account link and that will lead you to Service Account Creation page.

Create new service account

After selecting project you will see all the listed accounts in the list format. And on the top navbar there is an option to create new service account. Click on Create New Service Account and provide the necessary information.

Make sure to choose the BigQuery role under Project role option. Checkout the image below:

Service Account Creation Page Google cloud Platform
Service Account Creation Page Google cloud Platform

Do not forget to check the furnish new private key checkbox. This will provide you file for download which we will be using later to access data from BigQuery tables.

Click save and your Service account will be created.

Note: Download the private key and save it on your local machine.

Great, you are done with all the boring service account creation part. Now, let’s jump into the development part. Here, you will be fetching the data from the BigQuery tables in JSON format.

Environment Setup

Set the path to the Private Key file in your System Environment. Follow following steps:

  • Right Click on PC in windows explore and click on properties.
PC Properties in windows explorer
  • Click on Advanced System Settings
Advanced System Settings Windows
Advanced System Settings
  • Click on Environment Variables button and create new environment variable with the key GOOGLE_APPLICATION_CREDENTIALS.
Environment Variables
GOOGLE_APPLICATION_CREDENTIALS
  • You are done setting up the environment and now can make calls to Google BigQuery.

JAVA Project Setup for BigQuery Data Access Using REST API(s)

In this project, you will query the Hacker News comments from the BigQuery database. This dataset is uploaded to BigQuery publicly and is accessible by everyone. To give you the overview, below is the query that you will be firing on the Bigquery database:

BigQuery Hacker News Comments Query Result
Query result for Hacker News comments on 3.41 GB dataset

You will first need to setup JAVA project and include all the required dependencies to make HTTP calls and retrieve data from the BigQuery database. For this project you will be using the pom.xml file provided by Google in their sample projects. You can copy and paste the file contents from the following link: https://github.com/vslala/BigQueryRestSample/blob/master/pom.xml. The main dependencies included in the pom.xml file are:

  • Google Cloud BigQuery Library (1.35.0)
  • Google API Client (1.23.0)
  • Google HTTP Client (1.23.0)
  • Google OAuth Client (1.23.0)

You can explore pom.xml file later. Once you have created a project and imported all the dependencies into your classpath, then its time for some action 😉

Project Structure

For this project, I would suggest you to use simple structure and try out the application.

  1. I assume you have already created a project with all the required dependencies.
  2. Create Package Structure: com.example.bigquery
  3. For simplicity, keep all the code inside this package.

Classes Used

Let’s take each class one-by-one and see what it does.

Dataset.java

Go through the below code:

package com.example.bigquery;

import java.util.HashMap;
import java.util.Map;

public class Dataset {
	
	private Map<String, Object> labels;
	
	public Dataset() {
		labels = new HashMap<>();
	}
	
	public Dataset addLabel(String labelKey, String labelValue) {
		labels.put(labelKey, labelValue);
		return this;
	}
	
	public Dataset removeLabel(String labelKey) {
		labels.remove(labelKey);
		return this;
	}
	
	public Map<String, Object> getLabels() {
		return labels;
	}
}

BigQueryResponseVO.java

This class is Response Value Object for the BigQuery Response. Take a look at its properties:

package com.example.bigquery;

import java.util.List;

import org.codehaus.jackson.annotate.JsonProperty;

import com.google.api.services.bigquery.model.JobReference;
import com.google.cloud.bigquery.BigQueryError;
import com.google.cloud.bigquery.Schema;

public class BigQueryResponseVO {
	@JsonProperty
	String kind;
	@JsonProperty
	String etag;
	@JsonProperty
	Schema schema;
	@JsonProperty
	JobReference jobReference;
	@JsonProperty
	long totalRows;
	@JsonProperty
	String pageToken;
	@JsonProperty
	boolean cacheHit;
	@JsonProperty
	long totalBytesProcessed;
	@JsonProperty
	List<BigQueryError> errors;
	@JsonProperty
	long numDmlAffectedRows;
	
	
}

LabelsSample.java

This class is the bootstrap class for the project. It contains the main() method that gives call to the BigQueryDemo.implicit() method. 

Take a look at the class:

/*
 * Copyright 2016 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.example.bigquery;

import java.io.IOException;

/** Sample demonstrating labeling a BigQuery dataset or table. */
public class LabelsSample {

	public static void main(String[] args) throws IOException, InterruptedException {
		BigQueryDemo.implicit();
	}
}

BigQueryDemo.java

Go through the below code and then read further:

package com.example.bigquery;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Arrays;

import org.apache.log4j.Logger;

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.http.GenericUrl;
import com.google.api.client.http.HttpContent;
import com.google.api.client.http.HttpHeaders;
import com.google.api.client.http.HttpRequest;
import com.google.api.client.http.HttpRequestFactory;
import com.google.api.client.http.HttpResponse;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.http.json.JsonHttpContent;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.common.io.CharStreams;

public class BigQueryDemo {
	
	private static final String QUERY_URL_FORMAT = "https://www.googleapis.com/bigquery/v2/projects/%s/queries" + "?access_token=%s";

	private static final String QUERY = "query";

	private static final String QUERY_HACKER_NEWS_COMMENTS = "SELECT * FROM [bigquery-public-data:hacker_news.comments] LIMIT 1000";

	private static final Logger logger = Logger.getLogger(BigQueryDemo.class);
	
	static GoogleCredential credential = null;
	static final HttpTransport HTTP_TRANSPORT = new NetHttpTransport();
	static final JsonFactory JSON_FACTORY = new JacksonFactory();
	static {
		// Authenticate requests using Google Application Default credentials.
		try {
			credential = GoogleCredential.getApplicationDefault();
			credential = credential.createScoped(Arrays.asList("https://www.googleapis.com/auth/bigquery"));
			credential.refreshToken();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	public static void implicit() {
		String projectId = credential.getServiceAccountProjectId();
		String accessToken = generateAccessToken();
		// Set the content of the request.
		Dataset dataset = new Dataset().addLabel(QUERY,	QUERY_HACKER_NEWS_COMMENTS);
		HttpContent content = new JsonHttpContent(JSON_FACTORY, dataset.getLabels());
		// Send the request to the BigQuery API.
		GenericUrl url = new GenericUrl(String.format(QUERY_URL_FORMAT, projectId, accessToken));
		logger.debug("URL: " + url.toString());
		String responseJson = getQueryResult(content, url);
		logger.debug(responseJson);
	}

	private static String getQueryResult(HttpContent content, GenericUrl url) {
		String responseContent = null;
		HttpRequestFactory requestFactory = HTTP_TRANSPORT.createRequestFactory();
		HttpRequest request = null;
		try {
			request = requestFactory.buildPostRequest(url, content);
			request.setParser(JSON_FACTORY.createJsonObjectParser());
			request.setHeaders(
					new HttpHeaders().set("X-HTTP-Method-Override", "POST").setContentType("application/json"));
			HttpResponse response = request.execute();
			InputStream is = response.getContent();
			responseContent = CharStreams.toString(new InputStreamReader(is));
		} catch (IOException e) {
			logger.error(e);
		}
		return responseContent;
	}

	private static String generateAccessToken() {
		String accessToken = null;
		if ((System.currentTimeMillis() > credential.getExpirationTimeMilliseconds())) {
			accessToken = credential.getRefreshToken();
		} else {
			accessToken = credential.getAccessToken();
		}
		System.out.println(accessToken);
		return accessToken;
	}
}

This is the main class that would do all the work.

The very first thing that executes in this class is the static block.

static {
		// Authenticate requests using Google Application Default credentials.
		try {
			credential = GoogleCredential.getApplicationDefault();
			credential = credential.createScoped(Arrays.asList("https://www.googleapis.com/auth/bigquery"));
			credential.refreshToken();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

The static block gets the Secret Json File of Service Account from the system environment properties. Then it sends a request to Google OAuth service to get the refresh token.

This refresh token is used to access the BigQuery with a scope as `https://www.googleapis.com/auth/bigquery`.

Form Request Body

Once the static block has executed successfully, the implicit method will be invoked.

The starting point for this class is its implicit() method. This implicit method is called from the main() method of LabelsSample.java.

Next, you will have to create the request body that will contain the query and BigQuery post URL.

public static void implicit() {
		String projectId = credential.getServiceAccountProjectId();
		String accessToken = generateAccessToken();
		// Set the content of the request.
		Dataset dataset = new Dataset().addLabel(QUERY,	QUERY_HACKER_NEWS_COMMENTS);
		HttpContent content = new JsonHttpContent(JSON_FACTORY, dataset.getLabels());
		// Send the request to the BigQuery API.
		GenericUrl url = new GenericUrl(String.format(QUERY_URL_FORMAT, projectId, accessToken));
		logger.debug("URL: " + url.toString());
		String responseJson = getQueryResult(content, url);
		logger.debug(responseJson);
	}

Send Request and Get Query Result

Once you have formed the request content, it is time to call the BigQuery’s REST API and fetch the results in JSON format.

Use request = requestFactory.buildPostRequest(url, content) (Request Factory) to create post request and pass in the URL and Content to it.

Set the appropriate headers (mandatory) and call the execute method.

request.setHeaders(
    new HttpHeaders().set("X-HTTP-Method-Override", "POST").setContentType("application/json"));

The response is provided in the InputStream. Convert the InputStream into Characters to retrieve the response in JSON.

That is all you need to do in order to fetch the data from BigQuery in JSON format.

Conclusion

Great! I hope you were able to get the response by following this tutorial.

This article was only intended to fetch the JSON response from BigQuery. Do not use the example as it is. Try to play around and modify the code to suit your need.

I will be modifying the code to make it more friendly and act as a framework for your project. So, do check the master branch of the repository as well.

If you have any question or if there is anything that you do not understand then please do comment below. I would be more than happy to interact with you and solve your queries.


  • Article By: Varun Shrivastava

  • Varun Shrivastava is an innovative full stack web developer. He is a passionate blogger and loves to write about philosophy, programming, tech and relationships. He works as a Software Developer. This is his den, you can get in touch with him here anytime you want.
  • Website Search
  • Free Ebook Download ($7.06 Value)



  • Grab the Deal Now!

    Hostgator  Starting @$3.95/mo

  • DigitalOcean Free Credits!

    DigitalOcean Credit Sqare Banner

%d bloggers like this: