
Convert or validate an Arrow schema for compatibility with base R column types
Source: R/utils-arrow-types.R
as_r_schema.Rd
These functions help convert or validate an arrow::Schema object (typically from a Parquet file or Arrow dataset) by translating Arrow types to R equivalents, extracting type strings, or checking for compatibility.
Usage
as_r_schema(arrow_schema, call = rlang::caller_env())
arrow_schema_to_string(arrow_schema)
is_supported_arrow_type(arrow_schema)
validate_arrow_schema(arrow_schema, call = rlang::caller_env())
Value
as_r_schema(): A named character vector mapping column names to base R type strings (e.g., "integer", "double", "logical").
arrow_schema_to_string(): A named character vector mapping column names to Arrow type strings.
is_supported_arrow_type(): A named logical vector indicating whether each column is supported.
validate_arrow_schema(): Returns the original schema (invisibly) if all column types are supported; otherwise throws an error.
Details
as_r_schema() maps Arrow types to base R types (e.g., "int32" → "integer"). It throws an error if unsupported column types are present.
arrow_schema_to_string() returns a named character vector of raw Arrow type strings (e.g., "int64", "date32[day]"), one for each schema field.
is_supported_arrow_type() returns a named logical vector indicating whether each schema field type is supported.
validate_arrow_schema() throws an error if any field has an unsupported Arrow type.
For a full list of supported types and their R mappings, see arrow_to_r_datatypes().
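The relationship between the four functions can be illustrated with a minimal base R sketch. It is not the package's implementation: `type_map` and `arrow_types` are hypothetical stand-ins, and the lookup table contains only the mappings that appear in the examples on this page. The authoritative list is whatever arrow_to_r_datatypes() returns.

```r
# Hypothetical lookup table (assumption): only mappings shown on this page.
# The package's real table comes from arrow_to_r_datatypes() and may differ.
type_map <- c(
  "int32"       = "integer",
  "double"      = "double",
  "string"      = "character",
  "date32[day]" = "Date"
)

# Stand-in for the result of arrow_schema_to_string():
# column name -> raw Arrow type string. The "extra" column is invented
# here to show what an unsupported type looks like.
arrow_types <- c(
  origin_date = "date32[day]",
  horizon     = "int32",
  value       = "double",
  extra       = "list<item: int32>"
)

# Analogue of is_supported_arrow_type(): one logical per column.
supported <- arrow_types %in% names(type_map)
names(supported) <- names(arrow_types)

# Analogue of as_r_schema(), restricted to supported columns.
# (The real function errors instead of silently dropping columns.)
r_types <- setNames(
  unname(type_map[arrow_types[supported]]),
  names(arrow_types)[supported]
)

# Analogue of validate_arrow_schema(): report the offending columns.
unsupported_cols <- names(supported)[!supported]
if (length(unsupported_cols) > 0) {
  message("Unsupported column(s): ", paste(unsupported_cols, collapse = ", "))
}
```

In this sketch the validation step only emits a message so that the script runs to completion; the documented functions throw an error instead.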
Examples
# Path to a single Parquet file
file_path <- system.file(
  "testhubs/parquet/model-output/hub-baseline/2022-10-01-hub-baseline.parquet",
  package = "hubUtils"
)
# Get schema from the file
file_schema <- arrow::read_parquet(file_path, as_data_frame = FALSE)$schema
# Convert to R types
as_r_schema(file_schema)
#>    origin_date         target        horizon       location    output_type
#>         "Date"    "character"      "integer"    "character"    "character"
#> output_type_id          value
#>       "double"      "integer"
# Get raw Arrow type strings
arrow_schema_to_string(file_schema)
#>    origin_date         target        horizon       location    output_type
#>  "date32[day]"       "string"        "int32"       "string"       "string"
#> output_type_id          value
#>       "double"        "int32"
# Check which columns are supported
is_supported_arrow_type(file_schema)
#>    origin_date         target        horizon       location    output_type
#>           TRUE           TRUE           TRUE           TRUE           TRUE
#> output_type_id          value
#>           TRUE           TRUE
# Validate schema (throws error if any unsupported types are present)
validate_arrow_schema(file_schema)
# From a multi-file dataset
dataset_path <- system.file(
  "testhubs/parquet/model-output/hub-baseline",
  package = "hubUtils"
)
ds <- arrow::open_dataset(dataset_path)
as_r_schema(ds$schema)
#>    origin_date         target        horizon       location    output_type
#>         "Date"    "character"      "integer"    "character"    "character"
#> output_type_id          value
#>       "double"      "integer"
arrow_schema_to_string(ds$schema)
#>    origin_date         target        horizon       location    output_type
#>  "date32[day]"       "string"        "int32"       "string"       "string"
#> output_type_id          value
#>       "double"        "int32"
is_supported_arrow_type(ds$schema)
#>    origin_date         target        horizon       location    output_type
#>           TRUE           TRUE           TRUE           TRUE           TRUE
#> output_type_id          value
#>           TRUE           TRUE
validate_arrow_schema(ds$schema)