Data Extracts - Technical Details

The extract process

Once a day, around 21:00 CET, we update a locally held version of the planet file from the latest OpenStreetMap data and split it into a number of pre-defined regions. This is done using the Osmosis and osm-history-splitter programs.

Every couple of months we re-initialise our update process with a new planet file just to make sure we're not carrying over potential replication errors forever.

The splitting is done in a cascading fashion - first we split the world in two halves, then we cut out the continents from each half, then countries, and so on.
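
As an illustration only, the cascading order can be sketched in Python as a breadth-first walk over a region tree, where every extract is cut from its parent rather than from the full planet. The region names and the tree below are hypothetical and do not reflect the real configuration.

```python
from collections import deque

# Hypothetical region tree: each key is a source file, each value lists
# the regions that are cut out of it. Names are illustrative only.
REGION_TREE = {
    "planet": ["east-half", "west-half"],
    "east-half": ["europe", "asia"],
    "west-half": ["north-america"],
    "europe": ["germany", "france"],
}


def split_plan(root, tree):
    """Return (source, target) pairs in the order the cuts would run."""
    plan = []
    queue = deque([root])
    while queue:
        source = queue.popleft()
        for child in tree.get(source, []):
            plan.append((source, child))  # cut child out of its parent
            queue.append(child)           # the child may be split further
    return plan


plan = split_plan("planet", REGION_TREE)
for source, target in plan:
    print(f"{source} -> {target}")
```

Cutting from the parent means that each step reads a much smaller input file than the full planet.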

We use polygonal boundaries for the splitting - boundaries that are sometimes derived and simplified from OSM data, sometimes just hand-drawn. The boundaries usually follow country borders, but occasionally we take liberties and include a little more of a neighbouring country if this greatly simplifies the polygon. We also use a clipping process that attempts to keep OSM ways complete even if they are only partly within the clipping region. The same is not (yet) true for relations.
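
The idea of keeping ways complete can be sketched as follows. This is a minimal illustration, not the code used by Osmosis or osm-history-splitter: it assumes a single polygon ring, a plain ray-casting test, and hypothetical in-memory dictionaries for nodes and ways.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def clip_with_complete_ways(nodes, ways, ring):
    """nodes: {id: (lon, lat)}, ways: {id: [node ids]} (hypothetical layout)."""
    inside_nodes = {
        nid for nid, (lon, lat) in nodes.items()
        if point_in_polygon(lon, lat, ring)
    }
    # Keep a way as soon as one of its nodes lies inside the polygon ...
    kept_ways = {
        wid: refs for wid, refs in ways.items()
        if any(ref in inside_nodes for ref in refs)
    }
    # ... and then pull in all of its nodes, including those outside.
    kept_nodes = set(inside_nodes)
    for refs in kept_ways.values():
        kept_nodes.update(refs)
    return kept_nodes, kept_ways


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
nodes = {1: (5, 5), 2: (15, 5), 3: (20, 20), 4: (25, 25)}
ways = {100: [1, 2], 200: [3, 4]}
kept_nodes, kept_ways = clip_with_complete_ways(nodes, ways, square)
```

In this example, way 100 crosses the boundary and is kept whole, so node 2 ends up in the result although it lies outside the square.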

Polygon files

The .poly files that you can download reflect the exact clipping boundary that we use in generating the extract, and can be used with programs like Osmosis, osm-history-splitter, or osmconvert to generate the extract from a larger file. The KML files contain the same data in a different format. Please note that these files are not country boundaries but a buffer around countries - go to naturalearthdata.com if you want a simple set of country boundaries.
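
For readers who want to inspect a .poly file themselves, here is a small parser sketch in Python. It assumes the commonly used layout of the Osmosis polygon filter format: a name line, then one or more rings that each start with a ring name (prefixed with "!" for a hole), list "lon lat" pairs, and end with END, followed by a final END. The sample text is made up.

```python
SAMPLE_POLY = """example
1
   0.0   0.0
   10.0  0.0
   10.0  10.0
   0.0   0.0
END
!2
   2.0   2.0
   4.0   2.0
   4.0   4.0
   2.0   2.0
END
END
"""


def parse_poly(text):
    """Return (name, rings); each ring is a dict with name, hole, points."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name = lines[0]
    rings = []
    current = None
    for line in lines[1:]:
        if current is None:
            if line == "END":
                break  # final END: no more rings
            # A new ring starts; a leading "!" marks a hole.
            current = {
                "name": line.lstrip("!"),
                "hole": line.startswith("!"),
                "points": [],
            }
        elif line == "END":
            rings.append(current)
            current = None
        else:
            lon, lat = (float(value) for value in line.split()[:2])
            current["points"].append((lon, lat))
    return name, rings


name, rings = parse_poly(SAMPLE_POLY)
```

Coordinates in real files are often written in exponent notation, for example 0.1446763E+03, which `float()` handles as well.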

pbf files

The .osm.pbf data format is the common format for the exchange of raw OpenStreetMap data. It is fast to read and write and can be directly processed by most programs dealing with OSM data. Our .osm.pbf files are 100% pure, un-filtered OSM and contain all data and metadata available in OSM for the region; the only thing they don't contain is history, i.e. information about past edits.

We do, however, keep a couple of older files around. They are not usually shown but you can access them through the directory index; they are timestamped in the file name. We delete these older files after a while. If you are on a very slow and/or flaky internet connection, do not download the file named "...-latest". Download the timestamped file instead; its content does not change, so you can resume the download even if the connection fails.
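
Resuming a download can be sketched in Python with an HTTP Range request. This is a minimal sketch that assumes the server honours Range requests (status 206); the function names are our own, and the URL and local path are whatever you pass in.

```python
import os
import urllib.request


def range_header(partial_path):
    """Build a Range header asking only for the bytes we do not have yet."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={have}-"} if have else {}


def resume_download(url, partial_path, chunk_size=1 << 20):
    """Continue a partial download, or start over if ranges are not honoured."""
    request = urllib.request.Request(url, headers=range_header(partial_path))
    with urllib.request.urlopen(request) as response:
        # 206: the server sent only the missing part, so append to the file.
        # 200: the server sent the whole file again, so overwrite it.
        mode = "ab" if response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
```

Appending is only safe when the remote file is the same one as before, which is why the timestamped file is the better choice for this.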

Changes from the previous download server: Files are now generally called "something-latest.osm.pbf" instead of just "something.pbf", and you will get an HTTP redirect if you request the old file name. This makes it likely that the file will be called "something-latest.osm.pbf" on the receiving end too! Also, the leading "/openstreetmap/" in the path name has gone, as have all underscores (replaced by dashes, so great-britain instead of great_britain).
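
If you have scripts that still contain old-style paths, the renaming rules above can be applied on the client side. The helper below is a hypothetical sketch of those rules only; it does not describe how the server-side redirect is implemented, and the example path is made up.

```python
def new_style_path(old_path):
    """Translate an old-style path using the three rules described above."""
    path = old_path
    # Rule 1: the leading "/openstreetmap/" is gone.
    if path.startswith("/openstreetmap/"):
        path = path[len("/openstreetmap"):]
    # Rule 2: underscores became dashes.
    path = path.replace("_", "-")
    # Rule 3: files are called "...-latest.osm.pbf".
    for suffix in (".osm.pbf", ".pbf"):
        if path.endswith(suffix):
            stem = path[: -len(suffix)]
            if not stem.endswith("-latest"):
                stem += "-latest"
            return stem + ".osm.pbf"
    return path  # not a pbf file: leave the name alone


example = new_style_path("/openstreetmap/europe/great_britain.pbf")
print(example)
```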

bz2 files

The .osm.bz2 files are bzip2-compressed versions of OSM XML data for the region. We generate these files from the .osm.pbf files with a low-priority background process, which means that they will often be a day older than their .osm.pbf counterparts. They are also larger and slower to process than the .osm.pbf files.
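
A .osm.bz2 file can be read as a stream without unpacking it to disk first. The sketch below uses only the Python standard library and counts the top-level OSM elements; the function names are our own, and for large extracts a dedicated OSM library will be considerably faster.

```python
import bz2
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(stream):
    """Count node, way and relation elements in an OSM XML stream."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            counts[elem.tag] += 1
            elem.clear()  # drop tags and child elements we no longer need
    return counts


def count_in_bz2(path):
    """Open a .osm.bz2 file and count its elements while decompressing."""
    with bz2.open(path, "rb") as stream:
        return count_elements(stream)
```

A call such as `count_in_bz2("region.osm.bz2")` (file name assumed) returns a Counter keyed by element type.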

Shape files

The .shp.zip files contain a number of shape layers (.shp/.shx/.dbf combos). In contrast to the pbf/bz2 files, the shape files are not "complete" - we have made a selection of features and attributes. The shape files do not (yet) support multipolygon geometries, i.e. complex area features could be missing.
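
To see which layers a .shp.zip file contains, you can group the archive members by their base name. This is a small sketch using the Python standard library; the layer names in the usage example are invented, and the check only looks for the three file types mentioned above.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

REQUIRED = {".shp", ".shx", ".dbf"}


def list_layers(zip_source):
    """Map each layer name to True if its .shp/.shx/.dbf combo is complete."""
    parts = defaultdict(set)
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            path = PurePosixPath(member)
            parts[path.stem].add(path.suffix.lower())
    # Only names that come with a .shp file count as layers.
    return {
        stem: REQUIRED <= suffixes
        for stem, suffixes in parts.items()
        if ".shp" in suffixes
    }
```

For an archive holding roads.shp, roads.shx and roads.dbf, `list_layers` reports the layer "roads" as complete.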

The free shape files available here have a relatively simple structure that should be more or less self-explanatory. The structure is different from the shape files that we make to order.

.osc.gz files, or "diff updates"

Whenever we produce a new extract for a region, we also compute the difference between the new extract and the previous one, and make that available for download so that users can continuously update their own regional extract instead of having to download the full file. The file names for update files follow the convention used by Osmosis for the "read-replication-interval" task, so automatic updates are possible with Osmosis and osmupdate.
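
As a rough guide to that naming convention: to our understanding, Osmosis replication directories store each diff under a nine-digit, zero-padded sequence number split into three groups, next to a state file in Java properties format. The Python sketch below rests on that assumption; check the directory index of the update directory you use before relying on it.

```python
def sequence_to_path(sequence, suffix=".osc.gz"):
    """Turn a sequence number into a relative path such as 001/234/567.osc.gz."""
    digits = f"{sequence:09d}"
    return f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}{suffix}"


def parse_state(text):
    """Parse the key=value lines of a state file into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines and comments
        key, value = line.split("=", 1)
        # Java properties files escape colons as "\:".
        state[key] = value.replace("\\:", ":")
    return state


sample_state = "#comment\nsequenceNumber=42\ntimestamp=2020-01-01T21\\:21\\:03Z\n"
state = parse_state(sample_state)
diff_path = sequence_to_path(int(state["sequenceNumber"]))
```

Reading the newest state file and comparing its sequence number with the last one you applied tells you which diffs are still missing.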

Please be aware that these diffs really only represent the changes between the previous and current versions of the Geofabrik extract. These changes will mainly be recent changes in OSM but not exclusively; it is for example possible that we modify the clip bounds for one country a little and therefore the next extract contains more or less data than the previous one, a change that would also be reflected in the diffs.