Places
This guide is focused on the Overture places data — its content, scope, properties, and use cases. Please see the schema reference documentation for more information on the Overture places schema.
Overview
The Overture places theme has one feature type, called place, and contains more than 53 million point representations of real-world entities: schools, businesses, hospitals, religious organizations, landmarks, mountain peaks, and much more. The theme is derived from a conflation of Meta and Microsoft data and is available under a CDLA Permissive 2.0 license.
Overture places data, styled by data source: purple for Meta, orange for Microsoft.
Primary source | Feature count, July 2024 release |
---|---|
Meta | ~48 million |
Microsoft | ~5.5 million |
Dataset description
All Overture data, including places data, is distributed as GeoParquet, a column-oriented file format. Below you'll find a table with column-by-column descriptions of the properties in the place feature type. Of particular interest to users is the categories property; we offer a complete list of available categories here.
Schema for GeoParquet files in the places theme
- places
column | type | description |
---|---|---|
id | VARCHAR | A feature ID. This may be an ID associated with the Global Entity Reference System (GERS) if and only if the feature represents an entity that is part of GERS. |
geometry | BLOB | The point representation of the place's location. The geometry MUST be a Point as defined by the GeoJSON schema. |
bbox | STRUCT | Area defined by two longitudes and two latitudes: latitude is a decimal number between -90.0 and 90.0; longitude is a decimal number between -180.0 and 180.0. |
version | INTEGER | Version number of the feature, incremented in each Overture release where the geometry or attributes of this feature changed. |
sources | STRUCT | The array of source information for the properties of a given feature, with each entry being a source object which lists the property in JSON Pointer notation and the dataset that specific value came from. All features must have a root level source which is the default source if a specific property's source is not specified. |
names | STRUCT | Properties defining the names of a feature. |
categories | STRUCT | The categories of the place. Complete list is available on GitHub: https://github.com/OvertureMaps/schema/blob/main/task-force-docs/places/overture_categories.csv |
confidence | DOUBLE | The confidence of the existence of the place. It's a number between 0 and 1. 0 means that we're sure that the place doesn't exist (anymore). 1 means that we're sure that the place exists. If there's no value for the confidence, it means that we don't have any confidence information. |
websites | VARCHAR[] | The websites of the place. |
socials | VARCHAR[] | The social media URLs of the place. |
emails | VARCHAR[] | The email addresses of the place. |
phones | VARCHAR[] | The phone numbers of the place. |
brand | STRUCT | The brand of the place. A location with multiple brands is modeled as multiple separate places, each with its own brand. |
addresses | STRUCT | The addresses of the place. |
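To make the sources column more concrete, here is a minimal Python sketch of looking up which dataset a given property came from, with the root-level source as the fallback. It assumes each source entry is a dict with a `property` key (a JSON Pointer, where the empty string stands for the root) and a `dataset` key. Those key names, the helper `dataset_for`, and the sample values are assumptions for illustration, so please check them against the schema reference.

```python
from typing import Optional


def dataset_for(sources: list, pointer: str) -> Optional[str]:
    """Return the dataset that supplied `pointer`, else the root-level default."""
    root = None
    for entry in sources:
        if entry["property"] == pointer:
            return entry["dataset"]
        if entry["property"] == "":
            # Assumption: an empty pointer marks the root-level (default) source.
            root = entry["dataset"]
    return root


# Hypothetical sample, not real Overture data.
sample_sources = [
    {"property": "", "dataset": "meta"},
    {"property": "/properties/phones", "dataset": "msft"},
]

print(dataset_for(sample_sources, "/properties/phones"))  # msft
print(dataset_for(sample_sources, "/properties/names"))   # meta
```

A property without its own entry falls back to the root-level source, which mirrors the rule described for the sources column above.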
Data access and retrieval
The latest places data can be obtained from AWS or Azure as GeoParquet files at the following locations.
Provider | Location |
---|---|
Amazon S3 | |
Azure Blob Storage | |
More information can be found in the Getting Overture Data section of this documentation. You can download the entire dataset directly from the S3 or Azure locations above. Warning: the output will be a very large file. Depending on your use case, these methods might be more practical for you:
- Python Command-line Tool
- DuckDB
First, follow the setup guide for the Python Command-line Tool.
overturemaps download -f geoparquet --type=place -o places.geoparquet
First, follow the setup guide for DuckDB.
LOAD spatial;
LOAD httpfs;
-- Access the data on AWS in this example
SET s3_region='us-west-2';
COPY (
SELECT
*
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=places/*/*'))
TO 'places.parquet';
Data usage guidelines
We recommend downloading only the Overture data you need. If you have a particular geographic area of interest, there are several options for using a simple bounding box to extract places data and output a GeoJSON file.
- Overture Maps Explorer
- Python Command-line Tool
- DuckDB
To quickly view and download modest amounts of data, you can use the Overture Maps Explorer website.
To download data: pan to the area you are interested in, turn off the other layers, then click Download Visible. This will download the area visible on your screen.
First, follow the setup guide for the Python Command-line Tool.
Simply alter the bbox value to download a particular area.
overturemaps download --bbox=12.46,41.89,12.48,41.91 -f geojson --type=place -o rome.geojson
First, follow the setup guide for DuckDB.
Replace the bbox.xmin and bbox.ymin ranges with those of a new bounding box to run the query for a different area.
LOAD spatial;
LOAD httpfs;
-- Access the data on AWS in this example
SET s3_region='us-west-2';
COPY (
SELECT
id,
version,
-- We are casting these columns to JSON in order to ensure compatibility with our GeoJSON output.
-- These conversions may not be necessary for other output formats.
CAST(names AS JSON) AS names,
CAST(categories AS JSON) AS categories,
confidence,
CAST(websites AS JSON) AS websites,
CAST(socials AS JSON) AS socials,
CAST(emails AS JSON) AS emails,
CAST(phones AS JSON) AS phones,
CAST(brand AS JSON) AS brand,
CAST(addresses AS JSON) AS addresses,
CAST(sources AS JSON) AS sources,
ST_GeomFromWKB(geometry) AS geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=places/*/*')
WHERE
-- Point geometry doesn't require looking at both min and max:
bbox.xmin BETWEEN 12.46 AND 12.48 AND
bbox.ymin BETWEEN 41.89 AND 41.91
) TO 'rome_places.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON', SRS 'EPSG:4326');
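The bounding box logic used in both the command-line tool and the DuckDB query can be sketched in a few lines of Python. Because a place is a point, its bbox.xmin equals its bbox.xmax (and likewise for y), which is why the query above only tests the min values. The names `BBox`, `parse_bbox`, and `point_in_bbox` and the two test coordinates are hypothetical, chosen only for this illustration.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float  # west
    ymin: float  # south
    xmax: float  # east
    ymax: float  # north


def parse_bbox(text: str) -> BBox:
    """Parse 'xmin,ymin,xmax,ymax', the order used in the Rome example."""
    xmin, ymin, xmax, ymax = (float(part) for part in text.split(","))
    if xmin > xmax or ymin > ymax:
        raise ValueError(f"min exceeds max in bbox: {text!r}")
    return BBox(xmin, ymin, xmax, ymax)


def point_in_bbox(lon: float, lat: float, box: BBox) -> bool:
    """Inclusive test, like SQL BETWEEN."""
    return box.xmin <= lon <= box.xmax and box.ymin <= lat <= box.ymax


rome = parse_bbox("12.46,41.89,12.48,41.91")
print(point_in_bbox(12.4768, 41.8986, rome))  # True
print(point_in_bbox(12.4964, 41.9028, rome))  # False
```

Validating the box up front catches the common mistake of passing latitude before longitude or swapping the corners.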
Data manipulation and analysis
Querying by properties
These examples use the addresses, categories, and confidence columns to filter the data in useful ways using DuckDB.
- Query by address
- Query by category
- Query by confidence score
The addresses column can be used to quickly filter data down to a particular political unit. This example uses the country key to get all the data with addresses in Lithuania. The region key can likewise be used to extract data for smaller units such as US states.
LOAD spatial;
LOAD httpfs;
-- Access the data on AWS in this example
SET s3_region='us-west-2';
COPY (
SELECT
-- we can parse addresses into columns to make further filtering of the data simpler
addresses[1].freeform as street,
addresses[1].locality as locality,
addresses[1].postcode as postcode,
addresses[1].region as region,
addresses[1].country as country,
*
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=places/*/*')
WHERE
addresses[1].country = 'LT'
) TO 'lithuania_places.parquet';
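If you post-process the results in Python, the same flattening of the first address might look like the sketch below. Note that DuckDB lists are 1-indexed (addresses[1]) while Python lists are 0-indexed. The record layout as plain dicts, the helper names, and the sample values are assumptions for illustration only.

```python
ADDRESS_KEYS = ("freeform", "locality", "postcode", "region", "country")


def first_address(place: dict) -> dict:
    """Flatten the first address of a place; missing parts become None."""
    addresses = place.get("addresses") or []
    first = addresses[0] if addresses else {}
    return {key: first.get(key) for key in ADDRESS_KEYS}


def in_country(place: dict, code: str) -> bool:
    """True when the first address carries the given country code."""
    return first_address(place)["country"] == code


# Hypothetical records, not real Overture data.
places = [
    {"id": "a", "addresses": [{"freeform": "Gedimino pr. 1", "locality": "Vilnius", "country": "LT"}]},
    {"id": "b", "addresses": [{"freeform": "1 Main St", "region": "MA", "country": "US"}]},
    {"id": "c", "addresses": None},
]

print([p["id"] for p in places if in_country(p, "LT")])  # ['a']
```

Places without any address are simply skipped by the filter, just as rows with a NULL country are in the SQL query.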
To filter data by a particular type of place, we can use the categories column. In this example we'll extract all the places with a primary category of rice_mill or flour_mill. The full category list is available here.
LOAD spatial;
LOAD httpfs;
-- Access the data on AWS in this example
SET s3_region='us-west-2';
COPY (
SELECT
names.primary as primary_name,
confidence,
addresses,
websites,
geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=places/*/*')
WHERE
categories.primary IN ('flour_mill', 'rice_mill')
) TO 'mills.parquet';
Suppose you only want places that are very likely to exist. We can use the confidence score to filter out data below a certain threshold and remove suspect records.
LOAD spatial;
LOAD httpfs;
-- Access the data on AWS in this example
SET s3_region='us-west-2';
COPY (
SELECT
*
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=places/*/*')
WHERE
-- Only select data with a confidence score above .95
confidence > .95
-- Further filtering for data within Massachusetts to limit the size of this query
AND addresses[1].region = 'MA'
) TO 'MA_high_confidence_places.parquet';
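One detail worth keeping in mind: in SQL, comparing a NULL confidence with a threshold does not evaluate to true, so places with no confidence value are dropped by the query above. A minimal Python sketch of the same rule follows; the function name, the dict layout, and the sample records are hypothetical.

```python
def high_confidence(places: list, threshold: float = 0.95) -> list:
    """Keep places whose confidence is present and strictly above the threshold."""
    return [
        place
        for place in places
        if place.get("confidence") is not None and place["confidence"] > threshold
    ]


# Hypothetical records, not real Overture data.
sample = [
    {"id": "a", "confidence": 0.99},
    {"id": "b", "confidence": 0.95},  # not strictly above the threshold
    {"id": "c", "confidence": None},  # no confidence information
]

print([place["id"] for place in high_confidence(sample)])  # ['a']
```

If places without a confidence value matter for your use case, handle them explicitly rather than relying on the comparison.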
Advanced examples
These examples present some use cases that combine places data with other datasets.
- Conflate with OpenStreetMap
- Find building addresses
Overture Places can be a valuable source for conflating with or enhancing your own existing dataset.
In this example, suppose we want to use OpenStreetMap POIs for a project but would like to fill in any missing attributes such as addresses or phone numbers with Overture Place data.
Using some basic matching logic, we can join these two datasets together to create a more robust final product. By also joining the GERS ID to our output dataset, we could easily keep our now-conflated dataset synced with future Overture releases with a simple join.
To run this example yourself, an Oregon PBF can be obtained from Geofabrik.
Note: Joining data with a CDLA Permissive 2.0 license to OSM is permitted but the resulting data may need to carry the Open Database License (ODbL) if it is a derivative database. Please see the OSM Collective Database Guideline for information on this topic.
Query
LOAD spatial;
LOAD httpfs;
-- Access the data on AWS in this example
SET s3_region='us-west-2';
COPY (
-- We'll first select OSM data from Oregon with amenity = restaurant
WITH osm AS (
SELECT kind,
id,
tags->>'name' AS name,
tags->>'addr:housenumber' AS housenumber,
tags->>'addr:street' AS street,
tags->>'addr:postcode' AS postcode,
tags->>'addr:city' AS city,
tags->>'website' AS website,
tags->>'phone' AS phone,
lat,
lon,
tags
FROM st_readosm(
'oregon-latest.osm.pbf'
)
WHERE tags->>'amenity' = 'restaurant'
),
-- Then select Overture data in Oregon whose primary category contains the word restaurant.
overture AS (
SELECT id,
names.primary AS "names.primary",
websites[1] AS website,
socials[1] AS social,
emails[1] AS email,
phones[1] AS phone,
addresses[1].freeform AS freeform,
addresses[1].locality AS locality,
addresses[1].postcode AS postcode,
addresses[1].region AS region,
addresses[1].country AS country,
ST_GeomFromWKB(geometry) AS geometry
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=places/*/*')
WHERE addresses[1].region = 'OR'
AND addresses[1].country = 'US'
AND categories.primary ilike '%restaurant%'
)
-- Now that we have our input data we will join them together.
SELECT
-- With the GERS id joined to the final result this dataset can be quickly synced to future Overture releases
overture.id AS GERS_id,
osm.name,
-- Using CASE statements, we'll favor OSM data when it is present but use Overture data wherever there are gaps
CASE
WHEN osm.housenumber IS NOT NULL
OR osm.street IS NOT NULL THEN concat(osm.housenumber, ' ', osm.street)
ELSE overture.freeform
END AS address,
CASE
WHEN osm.city IS NOT NULL THEN osm.city
ELSE overture.locality
END AS city,
CASE
WHEN osm.postcode IS NOT NULL THEN osm.postcode
ELSE overture.postcode
END AS postcode,
CASE
WHEN osm.website IS NOT NULL THEN osm.website
ELSE overture.website
END AS website,
CASE
WHEN osm.phone IS NOT NULL THEN osm.phone
ELSE overture.phone
END AS phone,
overture.social,
overture.email,
ST_AsWKB(st_point(osm.lon, osm.lat)) AS geometry
FROM osm
-- To join the data, we'll first match features that have the same OR similar names
LEFT JOIN overture ON (
osm.name = overture."names.primary"
OR osm.name ilike concat('%', overture."names.primary", '%')
OR overture."names.primary" ilike concat('%', osm.name, '%')
OR damerau_levenshtein(osm.name, overture."names.primary") < 3
)
-- Then use a small buffer (0.003 degrees, roughly 330 m north to south) to match features that are near each other
AND st_intersects(
st_buffer(overture.geometry::geometry, 0.003),
st_point(osm.lon, osm.lat)
)
) TO 'oregon_restaurants_combined.parquet';
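The name-matching rule in the join can be easier to reason about outside SQL. The Python sketch below mirrors it: names match when one contains the other (case-insensitive) or when their edit distance is below 3. The distance function here is the optimal string alignment variant of Damerau-Levenshtein, written for illustration. It is an assumption that this behaves like DuckDB's damerau_levenshtein for typical names, and results may differ on some inputs. All function names are hypothetical.

```python
from typing import Optional


def osa_distance(a: str, b: str) -> int:
    """Edit distance allowing insert, delete, substitute, and adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition, e.g. "ca" -> "ac".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def names_match(osm_name: Optional[str], overture_name: Optional[str], max_distance: int = 3) -> bool:
    """Mirror the join condition: containment either way, or a small edit distance."""
    if osm_name is None or overture_name is None:
        return False
    a, b = osm_name.lower(), overture_name.lower()
    return a in b or b in a or osa_distance(osm_name, overture_name) < max_distance


print(osa_distance("ca", "ac"))                    # 1
print(names_match("Pizza Hut", "Pizza Hot"))       # True
print(names_match("Thai Palace", "Burger Barn"))   # False
```

Loose matching like this produces false positives for short or generic names, which is why the query also requires the two features to be close to each other.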
Suppose you are interested in having address data attached to buildings. The Overture addresses theme might be a good place to check, but let's assume it does not cover the area you are interested in.
The places theme has wide coverage and many of the place point features have addresses associated with them. Using an intersect we can find places that fall inside buildings and then join the place's address to the building polygon.
Query
LOAD spatial;
LOAD httpfs;
-- Access the data on AWS in this example
SET s3_region='us-west-2';
COPY (
-- First query places with address data in the area we are interested in
WITH places AS
(
SELECT *
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=places/*/*')
WHERE bbox.xmin BETWEEN 14.38 AND 14.44
AND bbox.ymin BETWEEN 50.07 AND 50.11
AND addresses[1].freeform IS NOT NULL
),
-- Then get buildings in the same area
buildings as
(
SELECT *
FROM read_parquet('s3://overturemaps-us-west-2/release/2024-07-22.0/theme=buildings/type=building/*')
WHERE bbox.xmin > 14.38 AND bbox.xmax < 14.44
AND bbox.ymin > 50.07 AND bbox.ymax < 50.11
)
-- Join the data using an intersect. DISTINCT removes fully identical rows, so a building that contains places with different addresses appears once per address
SELECT distinct(buildings.id), buildings.*, places.addresses
FROM buildings
LEFT JOIN places on st_intersects(ST_GeomFromWKB(places.geometry), ST_GeomFromWKB(buildings.geometry))
ORDER BY buildings.id
) TO 'prague_places_in_buildings.parquet';
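To show what the intersect is doing conceptually, here is a small pure-Python sketch that tests whether a place point falls inside a building footprint and collects the addresses per building. It uses even-odd ray casting on the outer ring only, ignores holes, and treats points exactly on the boundary differently from ST_Intersects, so it is an illustration rather than a replacement for the spatial extension. The function names, dict layout, and coordinates are hypothetical.

```python
def point_in_ring(lon: float, lat: float, ring: list) -> bool:
    """Even-odd ray casting against a closed ring of (lon, lat) pairs."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def addresses_by_building(buildings: list, places: list) -> dict:
    """Like the LEFT JOIN: every building is kept, even with no matching place."""
    return {
        building["id"]: [
            place["address"]
            for place in places
            if point_in_ring(place["lon"], place["lat"], building["ring"])
        ]
        for building in buildings
    }


# Hypothetical footprint and places, not real Overture data.
square = [(14.40, 50.08), (14.41, 50.08), (14.41, 50.09), (14.40, 50.09), (14.40, 50.08)]
buildings = [{"id": "b1", "ring": square}]
places = [
    {"address": "Example 1", "lon": 14.405, "lat": 50.085},
    {"address": "Example 2", "lon": 14.420, "lat": 50.085},
]

print(addresses_by_building(buildings, places))  # {'b1': ['Example 1']}
```

A building can end up with several addresses when it contains several places, which matches the multiple rows the query can return for one building.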
Tools and libraries
Rapid
Rapid, an OpenStreetMap editor, can display places data as a reference layer; the guide here explains how.
The license is compatible with OSM and this data can be used for mapping.