Data Extracts - Technical Details

The extract process

Once a day, around 21:00 CET, we update a locally held version of the planet file from the latest OpenStreetMap data and split it into a number of pre-defined regions. This is done with the Osmosis and Osmium programs: Osmosis downloads the updates, and Osmium applies them and does the splitting.

Every couple of months we re-initialise our update process with a new planet file just to make sure we're not carrying over potential replication errors forever.

The splitting is done in a cascading fashion - first we split the world in two halves, then we cut out the continents from each half, then countries, and so on.
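The cascade can be pictured as a tree in which every region is cut out of its parent's extract instead of out of the full planet. The Python sketch below assumes a made-up region tree and made-up file names; it walks the tree breadth-first and builds (but does not run) one `osmium extract` command per region, so a parent's extract is always produced before its children need it.

```python
from collections import deque

# Hypothetical region tree: each region is cut out of its parent's extract.
# The names are illustrative only, not the real list of regions.
REGION_TREE = {
    "planet": ["western-half", "eastern-half"],
    "western-half": ["north-america", "south-america"],
    "eastern-half": ["europe", "asia"],
    "europe": ["germany", "france"],
    "germany": ["bayern", "berlin"],
}


def cascade_plan(tree, root="planet"):
    """Return (source, target) pairs in breadth-first order.

    Every target is cut from its parent's extract, and a parent
    always appears as a target before any of its children.
    """
    plan = []
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in tree.get(parent, []):
            plan.append((parent, child))
            queue.append(child)
    return plan


def osmium_command(source, target):
    # Builds (does not run) an `osmium extract` call that clips
    # <source>.osm.pbf with <target>.poly; the file names are assumptions.
    return ["osmium", "extract", "-p", f"{target}.poly",
            f"{source}.osm.pbf", "-o", f"{target}.osm.pbf"]


if __name__ == "__main__":
    for source, target in cascade_plan(REGION_TREE):
        print(" ".join(osmium_command(source, target)))
```

The benefit of cascading is that each cut only has to read its (much smaller) parent file rather than the whole planet.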

We use polygonal boundaries for the splitting: boundaries that are sometimes derived and simplified from OSM data, and sometimes just hand-drawn. The boundaries usually follow country borders, but occasionally we take liberties and include a little more of a neighbouring country if this greatly simplifies the polygon. The Osmium extract function that we use keeps ways and multipolygon relations that cross an extract border complete, i.e. when a very large multipolygon crosses the border, an extract can occasionally contain a lot more than expected.
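To illustrate what keeping crossing ways complete means, here is a simplified Python sketch. It is not Osmium's implementation: it uses a plain ray-casting point-in-polygon test on a single ring and ignores relations entirely, but it shows why nodes lying outside the boundary can end up in an extract.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def extract_complete_ways(nodes, ways, ring):
    """Keep every way with at least one node inside the ring, and keep
    *all* nodes of such ways, even those outside the boundary.

    nodes: {node_id: (lon, lat)}, ways: {way_id: [node_id, ...]}
    """
    inside = {nid for nid, (lon, lat) in nodes.items()
              if point_in_polygon(lon, lat, ring)}
    kept_ways = {wid: refs for wid, refs in ways.items()
                 if any(ref in inside for ref in refs)}
    kept_nodes = set(inside)
    for refs in kept_ways.values():
        kept_nodes.update(refs)   # pulls in nodes beyond the border
    return kept_nodes, kept_ways
```

With a square boundary from (0, 0) to (10, 10), a way running from (5, 5) to (15, 5) is kept together with its outside node at (15, 5), while a way lying entirely outside is dropped.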

Polygon files

The .poly files that you can download reflect the exact clipping boundary that we use in generating the extract, and can be used with programs like Osmosis, osm-history-splitter, or osmconvert to generate the extract from a larger file. The KML files contain the same data, just in a different format. Please note that these files are not country boundaries but a buffer around countries - go to naturalearthdata.com if you want a simple set of country boundaries.
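The .poly files are plain text in the Osmosis polygon filter format: a name line, then one or more sections, each consisting of a section name, a list of longitude/latitude pairs and an END line, with a final END closing the file. Sections whose name starts with "!" describe holes that are subtracted from the area. A minimal Python reader, assuming well-formed input, might look like this:

```python
def parse_poly(text):
    """Parse an Osmosis-style .poly file into (name, rings).

    Each ring is a dict with its section name, an `is_hole` flag
    (sections whose name starts with "!" are subtracted) and a list
    of (lon, lat) tuples.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name = lines[0]
    rings = []
    i = 1
    while i < len(lines):
        if lines[i] == "END":        # the final END closes the file
            break
        section = lines[i]
        coords = []
        i += 1
        while lines[i] != "END":     # coordinate lines: "lon lat"
            lon, lat = (float(v) for v in lines[i].split())
            coords.append((lon, lat))
            i += 1
        rings.append({"section": section,
                      "is_hole": section.startswith("!"),
                      "coords": coords})
        i += 1                       # skip the section's own END
    return name, rings
```

Coordinates written in scientific notation (e.g. `0.1446763E+03`) are handled by `float` as well.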

pbf files

The .osm.pbf data format is the common format for the exchange of raw OpenStreetMap data. It is fast to read and write and can be directly processed by most programs dealing with OSM data. Our .osm.pbf files are 100% pure, un-filtered OSM and contain all data and metadata available in OSM for the region; the only thing they don't contain is history, i.e. information about past edits.

We do, however, keep a couple of older files around. They are not usually shown, but you can access them through the directory index; they are timestamped in the file name. We delete these older files after a while. If you are on a very slow and/or flaky internet connection, do not download the file named "...-latest"; download the timestamped file instead, so that you can resume the download even if the connection fails.
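Resuming works by asking the server only for the bytes you do not have yet, using an HTTP Range request. This is only safe if the remote file does not change between attempts; a "-latest" file that is replaced while you are downloading would leave you with a mix of two different files. The Python sketch below uses only the standard library and assumes the server honours Range requests; it falls back to a full download if the server does not answer with 206 Partial Content. The URL you pass in is your own choice.

```python
import os
import shutil
import urllib.request


def resume_request(url, local_path):
    """Build a request that continues a partial download.

    If part of the file is already on disk, ask only for the remaining
    bytes via an HTTP Range header.
    """
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    request = urllib.request.Request(url)
    if offset:
        request.add_header("Range", f"bytes={offset}-")
    return request, offset


def download_resumable(url, local_path):
    request, offset = resume_request(url, local_path)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the Range header was honoured;
        # any other status means we must start over from byte 0.
        mode = "ab" if offset and response.status == 206 else "wb"
        with open(local_path, mode) as out:
            shutil.copyfileobj(response, out)
```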

The .osh.pbf format is for history files. We keep one history file for each region that is on offer, and that file is only updated weekly, but it contains the full history of an area and can be used to synthesize a data file for the region for any timestamp in the past.
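To synthesize data for a past timestamp, you keep, for each object, the newest version that is not later than that timestamp, and drop objects whose newest such version is a deletion (marked `visible="false"` in history data). Tools such as `osmium time-filter` do this for you; the Python sketch below only illustrates the logic over a hypothetical in-memory list of versions.

```python
def snapshot(versions, at):
    """Reconstruct the state of the data at timestamp `at`.

    versions: iterable of dicts with keys "type", "id", "version",
    "timestamp" (timezone-aware datetime) and "visible" (False marks
    a deletion). Returns {(type, id): version_dict} for the objects
    that existed at `at`.
    """
    latest = {}
    for v in versions:
        if v["timestamp"] > at:
            continue                  # edit happened after the cut-off
        key = (v["type"], v["id"])
        if key not in latest or v["version"] > latest[key]["version"]:
            latest[key] = v
    # An object deleted before `at` has visible=False as its newest version
    return {k: v for k, v in latest.items() if v["visible"]}
```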

Changes from the previous download server: Files are now generally called "something-latest.osm.pbf" instead of just "something.pbf", and you will get an HTTP redirect if you request the old file name. This makes it likely that the file will be called "something-latest.osm.pbf" on the receiving end too! Also, the leading "/openstreetmap/" in the path name has gone, as have all underscores (replaced by dashes, so great-britain instead of great_britain).
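If you have scripts with old URLs hard-coded, the renaming rules above can be applied mechanically rather than relying on the redirect. A Python sketch, assuming old paths of the form /openstreetmap/europe/great_britain.pbf (the function name is hypothetical):

```python
def new_download_path(old_path):
    """Map an old-style download path to the new naming scheme:
    drop the /openstreetmap/ prefix, turn underscores into dashes
    and use the -latest.osm.pbf suffix."""
    prefix = "/openstreetmap/"
    if old_path.startswith(prefix):
        path = "/" + old_path[len(prefix):]
    else:
        path = old_path
    path = path.replace("_", "-")
    for suffix in (".osm.pbf", ".pbf"):
        if path.endswith(suffix):
            return path[: -len(suffix)] + "-latest.osm.pbf"
    raise ValueError(f"not a .pbf path: {old_path}")
```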

bz2 files

The .osm.bz2 files are bzip2-compressed versions of OSM XML data for the region. We generate these files from the .osm.pbf files with a low-priority background process which means that they will often be a day older than the .osm.pbf counterparts. They are also slower to process and larger.
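Because bzip2 can be decompressed as a stream, an .osm.bz2 file can be processed without unpacking it to disk first. The Python sketch below uses only the standard library (`bz2.open` and `xml.etree.ElementTree.iterparse`) to count nodes, ways and relations; for large regions, reading the .osm.pbf file with a dedicated OSM library will be much faster.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(path):
    """Count nodes, ways and relations in an .osm.bz2 file,
    decompressing it on the fly instead of unpacking it to disk."""
    counts = Counter()
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag in ("node", "way", "relation"):
                counts[elem.tag] += 1
                # Drop the element's children and attributes to keep
                # memory use down while parsing.
                elem.clear()
    return counts
```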

Shape files

The .shp.zip files contain a number of shape layers (.shp/.shx/.dbf combos). In contrast to the pbf/bz2 files, the shape files are not "complete" - we have made a selection of features and attributes. The shape files do not (yet) support multipolygon geometries, i.e. complex area features could be missing.
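Each layer in a .shp.zip consists of component files that share a base name. A small Python check, using only the standard library, groups the archive members by layer and lists any of the three mandatory components (.shp, .shx, .dbf) that are missing; it accepts a path or an open file object, and the layer names in the test data are made up.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

REQUIRED = {".shp", ".shx", ".dbf"}


def shape_layers(zip_path):
    """Return {layer_name: [missing component suffixes]} for a .shp.zip.
    Other members such as .prj or .cpg files are ignored."""
    parts = defaultdict(set)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            p = PurePosixPath(member)
            if p.suffix.lower() in REQUIRED:
                parts[p.stem].add(p.suffix.lower())
    return {layer: sorted(REQUIRED - found) for layer, found in parts.items()}
```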

The free shape files available here have a relatively simple structure that should be more or less self-explanatory. The structure is different from the shape files that we make to order.

.osc.gz files, or "diff updates"

Whenever we produce a new extract for a region, we also compute the difference between the new extract and the previous one, and make that available for download so that users can continuously update their own regional extract instead of having to download the full file. The file names for update files follow the convention used by Osmosis for the "read-replication-interval" task, so automatic updates are possible with Osmosis and osmupdate.
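In the Osmosis replication layout, a sequence number is zero-padded to nine digits and split into three directory levels, and each diff (.osc.gz) has a matching .state.txt holding its sequence number and timestamp in Java-properties syntax. The Python sketch below builds those URLs and parses a state file; the base URL used in the test is a placeholder, not a real download address.

```python
def replication_paths(base_url, sequence):
    """Return the (.osc.gz, .state.txt) URLs for a sequence number,
    e.g. 3456 -> .../000/003/456.osc.gz and .../000/003/456.state.txt."""
    digits = f"{sequence:09d}"
    stem = f"{base_url.rstrip('/')}/{digits[0:3]}/{digits[3:6]}/{digits[6:9]}"
    return f"{stem}.osc.gz", f"{stem}.state.txt"


def parse_state(text):
    """Parse a replication state.txt (Java properties: key=value lines,
    '#' comments, colons escaped as a backslash followed by ':')."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        state[key] = value.replace("\\:", ":")
    return state
```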

Please be aware that these diffs really only represent the changes between the previous and current versions of the Geofabrik extract. These changes will mainly be recent changes in OSM but not exclusively; it is for example possible that we modify the clip bounds for one country a little and therefore the next extract contains more or less data than the previous one, a change that would also be reflected in the diffs.