Loading JSON data from Cloud Storage

Loading JSON files from Cloud Storage

You can load newline delimited JSON data from Cloud Storage into a new table or partition, or append to or overwrite an existing table or partition. When your data is loaded into BigQuery, it is converted into columnar format for Capacitor (BigQuery's storage format).

When you load data from Cloud Storage into a BigQuery table, the dataset that contains the table must be in the same regional or multi-regional location as the Cloud Storage bucket.

The newline delimited JSON format is the same format as the JSON Lines format.

For information about loading JSON data from a local file, see Loading data from local files.

Limitations

When you load JSON files into BigQuery, note the following:

  • JSON data must be newline delimited. Each JSON object must be on a separate line in the file.
  • If you use gzip compression, BigQuery cannot read the data in parallel. Loading compressed JSON data into BigQuery is slower than loading uncompressed data.
  • You cannot include both compressed and uncompressed files in the same load job.
  • The maximum size for a gzip file is 4 GB.
  • BigQuery does not support maps or dictionaries in JSON, due to potential lack of schema information in a pure JSON dictionary. For example, to represent a list of products in a cart, "products": {"my_product": 40.0, "product2": 16.5} is not valid, but "products": [{"product_name": "my_product", "amount": 40.0}, {"product_name": "product2", "amount": 16.5}] is valid (see the sketch after this list).

    If you need to keep the entire JSON object, then it should be put into a string column, which can be queried using JSON functions.

  • If you use the BigQuery API to load an integer outside the range of [-2^53+1, 2^53-1] (usually this means larger than 9,007,199,254,740,991) into an integer (INT64) column, pass it as a string to avoid data corruption. This issue is caused by a limitation on integer size in JSON/ECMAScript. For more information, see the Numbers section of RFC 7159.

  • When you load CSV or JSON data, values in DATE columns must use the dash (-) separator and the date must be in the following format: YYYY-MM-DD (year-month-day).
  • When you load JSON or CSV data, values in TIMESTAMP columns must use a dash (-) separator for the date portion of the timestamp, and the date must be in the following format: YYYY-MM-DD (year-month-day). The hh:mm:ss (hour-minute-second) portion of the timestamp must use a colon (:) separator.
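
To illustrate some of these points, the following is a minimal sketch in Python (the field names, such as order_number, are hypothetical). It writes a file in valid newline-delimited JSON: each object goes on its own line, the product map is expressed as an array of records, and an out-of-range integer is passed as a string.

import json

rows = [
    {
        "id": "1",
        # Integers outside the JSON-safe range are passed as strings.
        "order_number": "9007199254740993",
        # A repeated record instead of a dictionary keyed by product name.
        "products": [
            {"product_name": "my_product", "amount": 40.0},
            {"product_name": "product2", "amount": 16.5},
        ],
    },
]

# Each JSON object must be on its own line (newline-delimited JSON).
with open("mydata.json", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")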

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.

Required permissions

To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.

Permissions to load data into BigQuery

To load data into a new BigQuery table or partition, or to append to or overwrite an existing table or partition, you need the following IAM permissions:

  • bigquery.tables.create
  • bigquery.tables.updateData
  • bigquery.tables.update
  • bigquery.jobs.create

Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:

  • roles/bigquery.dataEditor
  • roles/bigquery.dataOwner
  • roles/bigquery.admin (includes the bigquery.jobs.create permission)
  • roles/bigquery.user (includes the bigquery.jobs.create permission)
  • roles/bigquery.jobUser (includes the bigquery.jobs.create permission)

Additionally, if you have the bigquery.datasets.create permission, you can create and update tables using a load job in the datasets that you create.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Permissions to load data from Cloud Storage

To load data from a Cloud Storage bucket, you need the following IAM permissions:

  • storage.objects.get
  • storage.objects.list (required if you are using a URI wildcard)

The predefined IAM role roles/storage.objectViewer includes all the permissions you need in order to load data from a Cloud Storage bucket.

Loading JSON data into a new table

You can load newline delimited JSON data from Cloud Storage into a new BigQuery table by using one of the following:

  • The Cloud Console
  • The bq command-line tool's bq load command
  • The jobs.insert API method and configuring a load job
  • The client libraries

To load JSON data from Cloud Storage into a new BigQuery table:

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to BigQuery

  2. In the Explorer panel, expand your project and select a dataset.

  3. Expand the Actions option and click Open.

  4. In the details panel, click Create table.

  5. On the Create table page, in the Source section:

    • For Create table from, select Cloud Storage.

    • In the source field, browse to or enter the Cloud Storage URI. You cannot include multiple URIs in the Cloud Console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you're creating.

      Select file.

    • For File format, select JSON (Newline delimited).

  6. On the Create table page, in the Destination section:

    • For Dataset name, choose the appropriate dataset.

      View dataset.

    • Verify that Table type is set to Native table.

    • In the Table name field, enter the name of the table you're creating in BigQuery.

  7. In the Schema section, for Auto detect, check Schema and input parameters to enable schema auto detection. Alternatively, you can manually enter the schema definition by:

    • Enabling Edit as text and entering the table schema as a JSON array.

      Add schema as JSON array.

    • Using Add field to manually input the schema.

      Add schema definition using the Add Field button.

  8. (Optional) To partition the table, choose your options in the Partition and cluster settings. For more information, see Creating partitioned tables.

  9. (Optional) For Partitioning filter, click the Require partition filter box to require users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter can reduce cost and improve performance. For more information, see Querying partitioned tables. This option is unavailable if No partitioning is selected.

  10. (Optional) To cluster the table, in the Clustering order box, enter between one and four field names.

  11. (Optional) Click Advanced options.

    • For Write preference, leave Write if empty selected. This option creates a new table and loads your data into it.
    • For Number of errors allowed, accept the default value of 0 or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job results in an invalid message and fails.
    • For Unknown values, check Ignore unknown values to ignore any values in a row that are not present in the table's schema.
    • For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
  12. Click Create table.

bq

Use the bq load command, specify NEWLINE_DELIMITED_JSON using the --source_format flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard. Supply the schema inline, in a schema definition file, or use schema auto-detect.

(Optional) Supply the --location flag and set the value to your location.

Other optional flags include:

  • --max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the --max_bad_records value.
  • --ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
  • --autodetect: When specified, enables schema auto-detection for CSV and JSON data.
  • --time_partitioning_type: Enables time-based partitioning on a table and sets the partition type. Possible values are HOUR, DAY, MONTH, and YEAR. This flag is optional when you create a table partitioned on a DATE, DATETIME, or TIMESTAMP column. The default partition type for time-based partitioning is DAY. You cannot change the partitioning specification on an existing table.
  • --time_partitioning_expiration: An integer that specifies (in seconds) when a time-based partition should be deleted. The expiration time evaluates to the partition's UTC date plus the integer value.
  • --time_partitioning_field: The DATE or TIMESTAMP column used to create a partitioned table. If time-based partitioning is enabled without this value, an ingestion-time partitioned table is created.
  • --require_partition_filter: When enabled, this option requires users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter can reduce cost and improve performance. For more information, see Querying partitioned tables.
  • --clustering_fields: A comma-separated list of up to four column names used to create a clustered table.
  • --destination_kms_key: The Cloud KMS key for encryption of the table data.

    For more information on partitioned tables, see:

    • Creating partitioned tables

    For more information on clustered tables, see:

    • Creating and using clustered tables

    For more information on table encryption, see:

    • Protecting data with Cloud KMS keys

To load JSON data into BigQuery, enter the following command:

bq --location=LOCATION load \
    --source_format=FORMAT \
    DATASET.TABLE \
    PATH_TO_SOURCE \
    SCHEMA

Replace the following:

  • LOCATION: your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • FORMAT: NEWLINE_DELIMITED_JSON.
  • DATASET: an existing dataset.
  • TABLE: the name of the table into which you're loading data.
  • PATH_TO_SOURCE: a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
  • SCHEMA: a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. If you use a schema file, do not give it an extension. You can also use the --autodetect flag instead of supplying a schema definition.

Examples:

The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is defined in a local schema file named myschema.

bq load \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata.json \
    ./myschema

The following command loads data from gs://mybucket/mydata.json into a new ingestion-time partitioned table named mytable in mydataset. The schema is defined in a local schema file named myschema.

bq load \
    --source_format=NEWLINE_DELIMITED_JSON \
    --time_partitioning_type=DAY \
    mydataset.mytable \
    gs://mybucket/mydata.json \
    ./myschema

The following command loads data from gs://mybucket/mydata.json into a partitioned table named mytable in mydataset. The table is partitioned on the mytimestamp column. The schema is defined in a local schema file named myschema.

bq load \
    --source_format=NEWLINE_DELIMITED_JSON \
    --time_partitioning_field mytimestamp \
    mydataset.mytable \
    gs://mybucket/mydata.json \
    ./myschema

The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is auto detected.

bq load \
    --autodetect \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata.json

The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is defined inline in the format FIELD:DATA_TYPE,FIELD:DATA_TYPE.

bq load \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata.json \
    qtr:STRING,sales:FLOAT,year:STRING

The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The Cloud Storage URI uses a wildcard. The schema is auto detected.

bq load \
    --autodetect \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata*.json

The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The command includes a comma-separated list of Cloud Storage URIs with wildcards. The schema is defined in a local schema file named myschema.

bq load \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    "gs://mybucket/00/*.json","gs://mybucket/01/*.json" \
    ./myschema

API

  1. Create a load job that points to the source data in Cloud Storage.

  2. (Optional) Specify your location in the location property in the jobReference section of the job resource.

  3. The source URIs property must be fully qualified, in the format gs://BUCKET/OBJECT. Each URI can contain one '*' wildcard character.

  4. Specify the JSON data format by setting the sourceFormat property to NEWLINE_DELIMITED_JSON.

  5. To check the job status, call jobs.get(JOB_ID*), replacing JOB_ID with the ID of the job returned by the initial request.

    • If status.state = DONE, the job completed successfully.
    • If the status.errorResult property is present, the request failed, and that object includes information describing what went wrong. When a request fails, no table is created and no data is loaded.
    • If status.errorResult is absent, the job finished successfully, although there might have been some nonfatal errors, such as problems importing a few rows. Nonfatal errors are listed in the returned job object's status.errors property.

API notes:

  • Load jobs are atomic and consistent; if a load job fails, none of the data is available, and if a load job succeeds, all of the data is available.

  • As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert to create a load job. This approach is more robust to network failure because the client can poll or retry on the known job ID.

  • Calling jobs.insert on a given job ID is idempotent. You can retry as many times as you like on the same job ID, and at most, one of those operations succeeds.
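
The following is a minimal sketch of that best practice using the Python client library; when calling the REST API directly, the equivalent is to set jobReference.jobId yourself in the jobs.insert request body. The table ID and source URI are placeholders.

import uuid

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
)

# Generate a unique, client-side job ID so the job can be polled or retried
# safely after a network failure.
job_id = "load_mytable_" + uuid.uuid4().hex

load_job = client.load_table_from_uri(
    "gs://mybucket/mydata.json",      # placeholder source URI
    "my-project.mydataset.mytable",   # placeholder destination table
    job_id=job_id,
    job_config=job_config,
)
load_job.result()  # Raises an exception if the job finished with errors.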

C#

Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.

Use the BigQueryClient.CreateLoadJob() method to start a load job from Cloud Storage. To use newline-delimited JSON, create a CreateLoadJobOptions object and set its SourceFormat property to FileFormat.NewlineDelimitedJson.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

Use the LoadJobConfiguration.builder(tableId, sourceUri) method to start a load job from Cloud Storage. To use newline-delimited JSON, use LoadJobConfiguration.setFormatOptions(FormatOptions.json()).

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

Use the Client.load_table_from_uri() method to start a load job from Cloud Storage. To use newline-delimited JSON, set the LoadJobConfig.source_format property to the string NEWLINE_DELIMITED_JSON and pass the job config as the job_config argument to the load_table_from_uri() method.
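
For example, a minimal sketch along those lines (the project, dataset, and bucket names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

table_id = "my-project.mydataset.mytable"  # placeholder destination table
uri = "gs://mybucket/mydata.json"          # placeholder source URI

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # or supply an explicit schema instead
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # Waits for the load job to complete.

table = client.get_table(table_id)
print("Loaded {} rows into {}.".format(table.num_rows, table_id))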

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

Use the Dataset.load_job() method to start a load job from Cloud Storage. To use newline-delimited JSON, set the format parameter to "json".

Loading nested and repeated JSON data

BigQuery supports loading nested and repeated data from source formats that support object-based schemas, such as JSON, Avro, ORC, Parquet, Firestore, and Datastore.

One JSON object, including any nested/repeated fields, must appear on each line.

The following example shows sample nested/repeated data. This table contains information about people. It consists of the following fields:

  • id
  • first_name
  • last_name
  • dob (date of birth)
  • addresses (a nested and repeated field)
    • addresses.status (current or previous)
    • addresses.address
    • addresses.city
    • addresses.state
    • addresses.zip
    • addresses.numberOfYears (years at the address)

The JSON data file would look like the following. Notice that the addresses field contains an array of values (indicated by [ ]).

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}
{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}

The schema for this table would look like the following:

[
    {
        "name": "id",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "first_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "last_name",
        "type": "STRING",
        "mode": "NULLABLE"
    },
    {
        "name": "dob",
        "type": "DATE",
        "mode": "NULLABLE"
    },
    {
        "name": "addresses",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {
                "name": "status",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "address",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "city",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "state",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "zip",
                "type": "STRING",
                "mode": "NULLABLE"
            },
            {
                "name": "numberOfYears",
                "type": "STRING",
                "mode": "NULLABLE"
            }
        ]
    }
]

For information on specifying a nested and repeated schema, see Specifying nested and repeated fields.
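
As an illustration, the same schema can also be expressed with the Python client library's SchemaField class. This is a minimal sketch, and the project, dataset, and bucket names are placeholders.

from google.cloud import bigquery

client = bigquery.Client()

# The nested and repeated "addresses" field is a REPEATED RECORD.
schema = [
    bigquery.SchemaField("id", "STRING"),
    bigquery.SchemaField("first_name", "STRING"),
    bigquery.SchemaField("last_name", "STRING"),
    bigquery.SchemaField("dob", "DATE"),
    bigquery.SchemaField(
        "addresses",
        "RECORD",
        mode="REPEATED",
        fields=[
            bigquery.SchemaField("status", "STRING"),
            bigquery.SchemaField("address", "STRING"),
            bigquery.SchemaField("city", "STRING"),
            bigquery.SchemaField("state", "STRING"),
            bigquery.SchemaField("zip", "STRING"),
            bigquery.SchemaField("numberOfYears", "STRING"),
        ],
    ),
]

job_config = bigquery.LoadJobConfig(
    schema=schema,
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)
client.load_table_from_uri(
    "gs://mybucket/mydata.json",      # placeholder source URI
    "my-project.mydataset.mytable",   # placeholder destination table
    job_config=job_config,
).result()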

Appending to or overwriting a table with JSON data

You can load additional data into a table either from source files or by appending query results.

In the Cloud Console, use the Write preference option to specify what action to take when you load data from a source file or from a query result.

You have the following options when you load additional data into a table:

  • Write if empty: not supported in the bq tool; BigQuery API property WRITE_EMPTY. Writes the data only if the table is empty.
  • Append to table: bq tool flag --noreplace or --replace=false (if --[no]replace is unspecified, the default is append); BigQuery API property WRITE_APPEND. (Default) Appends the data to the end of the table.
  • Overwrite table: bq tool flag --replace or --replace=true; BigQuery API property WRITE_TRUNCATE. Erases all existing data in a table before writing the new data. This action also deletes the table schema and removes any Cloud KMS key.

If you load data into an existing table, the load job can append the data or overwrite the table.

You can append to or overwrite a table by using one of the following:

  • The Cloud Console
  • The bq command-line tool's bq load command
  • The jobs.insert API method and configuring a load job
  • The client libraries

Console

  1. In the Cloud Console, open the BigQuery page.

    Go to BigQuery

  2. In the Explorer panel, expand your project and select a dataset.

  3. Expand the Actions option and click Open.

  4. In the details panel, click Create table.

  5. On the Create table page, in the Source section:

    • For Create table from, select Cloud Storage.

    • In the source field, browse to or enter the Cloud Storage URI. You cannot include multiple URIs in the Cloud Console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you're appending or overwriting.

      Select file.

    • For File format, select JSON (Newline delimited).

  6. On the Create table page, in the Destination section:

    • For Dataset name, choose the appropriate dataset.

      Select dataset.

    • In the Table name field, enter the name of the table you're appending or overwriting in BigQuery.

    • Verify that Table type is set to Native table.

  7. In the Schema section, for Auto detect, check Schema and input parameters to enable schema auto detection. Alternatively, you can manually enter the schema definition by:

    • Enabling Edit as text and entering the table schema as a JSON array.

      Add schema as JSON array.

    • Using Add field to manually input the schema.

      Add schema definition using the Add Field button.

  8. For Partition and cluster settings, leave the default values. You cannot convert a table to a partitioned or clustered table by appending or overwriting it. The Cloud Console does not support appending to or overwriting partitioned or clustered tables in a load job.

  9. Click Advanced options.

    • For Write preference, choose Append to table or Overwrite table.
    • For Number of errors allowed, accept the default value of 0 or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job results in an invalid message and fails.
    • For Unknown values, check Ignore unknown values to ignore any values in a row that are not present in the table's schema.
    • For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.

      Overwrite table.

  10. Click Create table.

bq

Use the bq load command, specify NEWLINE_DELIMITED_JSON using the --source_format flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard.

Supply the schema inline, in a schema definition file, or use schema auto-detect.

Specify the --replace flag to overwrite the table. Use the --noreplace flag to append data to the table. If no flag is specified, the default is to append data.

It is possible to modify the table's schema when you append or overwrite it. For more information on supported schema changes during a load operation, see Modifying table schemas.

(Optional) Supply the --location flag and set the value to your location.

Other optional flags include:

  • --max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the --max_bad_records value.
  • --ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
  • --autodetect: When specified, enables schema auto-detection for CSV and JSON data.
  • --destination_kms_key: The Cloud KMS key for encryption of the table data.

bq --location=LOCATION load \
    --[no]replace \
    --source_format=FORMAT \
    DATASET.TABLE \
    PATH_TO_SOURCE \
    SCHEMA

Replace the following:

  • LOCATION: your location. The --location flag is optional. You can set a default value for the location using the .bigqueryrc file.
  • FORMAT: NEWLINE_DELIMITED_JSON.
  • DATASET: an existing dataset.
  • TABLE: the name of the table into which you're loading data.
  • PATH_TO_SOURCE: a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
  • SCHEMA: a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.

Examples:

The following command loads data from gs://mybucket/mydata.json and overwrites a table named mytable in mydataset. The schema is defined using schema auto-detection.

bq load \
    --autodetect \
    --replace \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata.json

The following command loads data from gs://mybucket/mydata.json and appends data to a table named mytable in mydataset. The schema is defined using a JSON schema file named myschema.

bq load \
    --noreplace \
    --source_format=NEWLINE_DELIMITED_JSON \
    mydataset.mytable \
    gs://mybucket/mydata.json \
    ./myschema

API

  1. Create a load job that points to the source data in Cloud Storage.

  2. (Optional) Specify your location in the location property in the jobReference section of the job resource.

  3. The source URIs property must be fully qualified, in the format gs://BUCKET/OBJECT. You can include multiple URIs as a comma-separated list. Wildcards are also supported.

  4. Specify the data format by setting the configuration.load.sourceFormat property to NEWLINE_DELIMITED_JSON.

  5. Specify the write preference by setting the configuration.load.writeDisposition property to WRITE_TRUNCATE or WRITE_APPEND.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

Java

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

Python

To replace the rows in an existing table, set the LoadJobConfig.write_disposition property to the string WRITE_TRUNCATE.

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
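
A minimal sketch along those lines (the table ID and source URI are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    # WRITE_TRUNCATE replaces the existing rows; WRITE_APPEND would append
    # to them instead.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://mybucket/mydata.json",      # placeholder source URI
    "my-project.mydataset.mytable",   # placeholder destination table
    job_config=job_config,
)
load_job.result()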

Ruby

To replace the rows in an existing table, set the write parameter of Table.load_job() to "WRITE_TRUNCATE".

Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

Loading hive-partitioned JSON data

BigQuery supports loading hive-partitioned JSON data stored on Cloud Storage and populates the hive partitioning columns as columns in the destination BigQuery managed table. For more information, see Loading externally partitioned data.

Details of loading JSON data

This section describes how BigQuery parses various data types when loading JSON data.

Data types

Boolean. BigQuery can parse any of the following pairs for Boolean data: 1 or 0, true or false, t or f, yes or no, or y or n (all case insensitive). Schema autodetection automatically detects any of these except 0 and 1.

Bytes. Columns with BYTES types must be encoded as Base64.

Date. Columns with DATE types must be in the format YYYY-MM-DD.

Datetime. Columns with DATETIME types must be in the format YYYY-MM-DD HH:MM:SS[.SSSSSS].

Geography. Columns with GEOGRAPHY types must contain strings in one of the following formats:

  • Well-known text (WKT)
  • Well-known binary (WKB)
  • GeoJSON

If you use WKB, the value should be hex encoded.

The following list shows examples of valid data:

  • WKT: POINT(1 2)
  • GeoJSON: { "type": "Point", "coordinates": [1, 2] }
  • Hex encoded WKB: 0101000000feffffffffffef3f0000000000000040

Before loading GEOGRAPHY data, also read Loading geospatial data.

Interval. Columns with INTERVAL types must be in ISO 8601 format PYMDTHMS, where:

  • P = Designator that indicates that the value represents a duration. You must always include this.
  • Y = Year
  • M = Month
  • D = Day
  • T = Designator that denotes the time portion of the duration. You must always include this.
  • H = Hour
  • M = Minute
  • S = Second. Seconds can be denoted as a whole value or as a fractional value of up to six digits, at microsecond precision.

You can indicate a negative value by prepending a dash (-).

The following list shows examples of valid data:

  • P-10000Y0M-3660000DT-87840000H0M0S
  • P0Y0M0DT0H0M0.000001S
  • P10000Y0M3660000DT87840000H0M0S

To load INTERVAL data, you must use the bq load command and use the --schema flag to specify a schema. You can't upload INTERVAL data by using the console.

Time. Columns with TIME types must be in the format HH:MM:SS[.SSSSSS].

Timestamp. BigQuery accepts various timestamp formats. The timestamp must include a date portion and a time portion.

  • The date portion can be formatted as YYYY-MM-DD or YYYY/MM/DD.

  • The time portion must be formatted as HH:MM[:SS[.SSSSSS]] (seconds and fractions of seconds are optional).

  • The date and time must be separated by a space or 'T'.

  • Optionally, the date and time can be followed by a UTC offset or the UTC zone designator (Z). For more information, see Time zones.

For example, any of the following are valid timestamp values:

  • 2018-08-19 12:11
  • 2018-08-19 12:11:35
  • 2018-08-19 12:11:35.22
  • 2018/08/19 12:11
  • 2018-07-05 12:54:00 UTC
  • 2018-08-19 07:11:35.220 -05:00
  • 2018-08-19T12:11:35.220Z

If you provide a schema, BigQuery also accepts Unix epoch time for timestamp values. However, schema autodetection doesn't detect this case, and treats the value as a numeric or string type instead.

Examples of Unix epoch timestamp values:

  • 1534680695
  • 1.534680695e11

Array (repeated field). The value must be a JSON array or null. JSON null is converted to SQL NULL. The array itself cannot contain null values.

JSON options

To modify how BigQuery parses JSON data, specify additional options in the Cloud Console, the bq command-line tool, the API, or the client libraries.

  • Number of bad records allowed. Console option: Number of errors allowed; bq tool flag: --max_bad_records; BigQuery API property: maxBadRecords (Java, Python). (Optional) The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is `0`, which requires that all records are valid.
  • Unknown values. Console option: Ignore unknown values; bq tool flag: --ignore_unknown_values; BigQuery API property: ignoreUnknownValues (Java, Python). (Optional) Indicates whether BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. The `sourceFormat` property determines what BigQuery treats as an extra value. CSV: trailing columns. JSON: named values that don't match any column names.
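
For example, a minimal sketch that sets both options with the Python client library (the table ID and source URI are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    max_bad_records=10,          # tolerate up to 10 bad records
    ignore_unknown_values=True,  # skip values not present in the schema
)

client.load_table_from_uri(
    "gs://mybucket/mydata.json",      # placeholder source URI
    "my-project.mydataset.mytable",   # placeholder destination table
    job_config=job_config,
).result()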