Hands-On Machine Learning on Google Cloud Platform

Google BigQuery

Data is fundamental to the management and growth of companies. Ensuring that data is protected, available, and easily accessible is a core requirement of any IT department. Just as important is ensuring that data is used in the right way: to manage processes, to inform decision makers, and to respond intelligently to changing circumstances.

The way companies ensure data availability is rapidly changing. Cloud computing has seen impressive growth in recent years, both as a concept and as a practical component of the IT infrastructure.

Cloud computing is a technology that gives access, via remote servers, to software and hardware resources (such as mass storage for data), which a provider offers as a service, typically by subscription.

A particularly interesting cloud computing solution is Google BigQuery. BigQuery is a web service designed for running queries on large datasets. For example, it can perform selection and aggregation queries on tables with billions of records in a few seconds. This is a significant step forward: information that previously took days to compute can now be obtained interactively.

BigQuery enables companies and developers around the world to manage large amounts of data in real time, without any up-front investment in hardware or software. The service is useful, for example, to a large multinational company that has to optimize its daily spending based on sales and advertising data, and equally to a small online retailer that has to change the presentation of a product based on user clicks. According to Google, the system also aims to help companies cope with the prevailing global economic crisis.

By making BigQuery a public service, Google claims to have reached an important milestone in the effort to make big data analytics accessible to all businesses through the cloud. BigQuery is accessible through a simple user interface that lets you take advantage of the computing power offered by Google. The collected data is protected by multiple levels of security, replicated across multiple servers, and can be easily exported. Developers and businesses can subscribe to the service online and use up to 100 GB of data per month for free.

The main features of BigQuery are:

  • Scalability: One of the inherent advantages of cloud computing is the ability to expand the infrastructure on demand, so that application capacity scales dynamically as needs grow. This is particularly useful when the peak usage level of hosted applications changes significantly over time.
  • Interactivity: Performs selection or grouping queries on billions of records in a matter of seconds.
  • Familiarity: Uses an SQL dialect for writing queries.
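
The selection and grouping queries mentioned in the list above can be sketched in Python. This is a minimal illustration rather than a recommended setup: the helper names `build_top_names_query` and `run_query` are hypothetical, the public table `bigquery-public-data.usa_names.usa_1910_2013` and its columns are assumptions made for the example, and running the query assumes that the `google-cloud-bigquery` package is installed and that credentials are configured.

```python
def build_top_names_query(
    table="bigquery-public-data.usa_names.usa_1910_2013",  # assumed public table
    limit=5,
):
    """Return a SQL string that selects, groups, and aggregates rows."""
    return (
        "SELECT name, SUM(number) AS total\n"
        f"FROM `{table}`\n"
        "WHERE state = 'TX'\n"   # selection: keep only rows for one state
        "GROUP BY name\n"        # grouping: one output row per name
        "ORDER BY total DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql):
    """Send the SQL text to BigQuery and return the rows as dictionaries."""
    # Imported lazily so that the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumes default credentials and project
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    print(build_top_names_query(limit=3))
```

Only `run_query` contacts the service; the builder just produces the SQL text, so the query can be inspected before any data is processed.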

BigQuery also makes data sharing easy: using Google Storage, you can create a collaboration hub. Whenever you need to share data with other users, you can grant access to the appropriate people by setting up an access control list (ACL).

An ACL is a mechanism used to express rules, possibly complex ones, that determine whether a given user may access a given resource of an IT system.
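
To make the idea concrete, here is a minimal, generic sketch of an ACL lookup in Python. It is not BigQuery's API: the resource name, the user addresses, the role names, and the `is_allowed` helper are all hypothetical and serve only to show how a rule table can decide whether a user may access a resource.

```python
# Hypothetical ACL: resource -> user -> set of granted roles.
ACL = {
    "sales_dataset": {
        "alice@example.com": {"READER", "WRITER"},
        "bob@example.com": {"READER"},
    }
}


def is_allowed(acl, resource, user, role):
    """Return True if `user` holds `role` on `resource` according to `acl`."""
    # Unknown resources or users fall back to an empty grant set (deny by default).
    return role in acl.get(resource, {}).get(user, set())


if __name__ == "__main__":
    print(is_allowed(ACL, "sales_dataset", "bob@example.com", "READER"))
    print(is_allowed(ACL, "sales_dataset", "bob@example.com", "WRITER"))
```

The deny-by-default lookup is a design choice of this sketch: access is granted only when an explicit entry exists.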

BigQuery provides methods to create, populate, and delete tables, and to query them. Queries are written in a SQL dialect in which some SQL functions have been modified to speed up execution: where the precision of the results is not essential, they rely on statistical estimates and return an approximate value.
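
The trade-off between precision and speed can be illustrated with a small Python sketch of an approximate distinct count. It uses the k-minimum-values technique, chosen here only because it is short; it is an assumption made for illustration and not a description of how BigQuery computes its estimates (in BigQuery's standard SQL, the corresponding function is `APPROX_COUNT_DISTINCT`). The function names and the default `k` are hypothetical.

```python
import hashlib
import heapq


def _hash01(value):
    """Map a value to a pseudo-random float in [0, 1) using a stable hash."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def approx_count_distinct(values, k=1024):
    """Estimate the number of distinct values while storing at most k hashes."""
    kept = set()   # the (up to) k smallest distinct hashes seen so far
    heap = []      # same hashes, negated, so heap[0] is the largest kept hash
    for value in values:
        h = _hash01(value)
        if h in kept:
            continue  # duplicate of a value already tracked
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # Replace the current largest kept hash with the smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes: the count is exact
    # If n distinct hashes are spread uniformly over [0, 1), the k-th smallest
    # is close to k / n, so (k - 1) / kth_smallest estimates n.
    return int((k - 1) / -heap[0])


if __name__ == "__main__":
    data = [i % 20000 for i in range(100000)]
    print("exact:", len(set(data)))
    print("approximate:", approx_count_distinct(data))
```

The estimator keeps at most `k` numbers in memory regardless of the input size, which is why an approximate result can be cheaper to obtain than an exact one. A larger `k` gives a tighter estimate at the cost of more memory.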