Version 2.0.0 (in progress)#
Shapely 2.0 is a major release featuring a complete refactor of the internals and new vectorized (element-wise) array operations, providing considerable performance improvements (based on the developments in the PyGEOS package), along with several breaking API changes and many feature improvements.
For more background, see RFC 1: Roadmap for Shapely 2.0.
Refactor of the internals#
Shapely wraps the GEOS C++ library for use in Python. Before 2.0, Shapely
used ctypes to link to GEOS at runtime, but doing so resulted in extra
overhead and installation challenges. With 2.0, the internals of Shapely have
been refactored to expose GEOS functionality through a Python C extension
module that is compiled in advance.
The pointer to the actual GEOS Geometry object is stored in a lightweight Python extension type. A single Geometry Python extension type is defined in C wrapping a GEOSGeometry pointer. This extension type is further subclassed in Python to provide the geometry type-specific classes from Shapely (Point, LineString, Polygon, etc). The GEOS pointer is accessible from C as a static attribute of the Python object (an attribute of the C struct that makes up a Python object), which enables using vectorized functions within C and thus avoiding Python overhead while looping over an array of geometries (see next section).
Vectorized (element-wise) geometry operations#
Before the 2.0 release, Shapely only provided an interface for scalar (individual) geometry objects. Users had to loop over individual geometries within an array of geometries and call scalar methods or properties, which is both more verbose to use and has a large performance overhead.
Shapely 2.0 exposes GEOS operations as vectorized functions that operate on arrays of geometries using a familiar NumPy interface. Those functions are implemented as NumPy *universal functions* (or ufunc for short). A universal function is a function that operates on n-dimensional arrays in an element-by-element fashion and supports array broadcasting. All loops over geometries are implemented in C, which results in substantial performance improvements when performing operations using many geometries. This also allows operations to be less verbose.
NumPy is now a required dependency.
An example of this functionality using a small array of points and a single polygon:
>>> import shapely
>>> from shapely import Point, box
>>> import numpy as np
>>> geoms = np.array([Point(0, 0), Point(1, 1), Point(2, 2)])
>>> polygon = box(0, 0, 2, 2)
Before Shapely 2.0, a for loop was required to operate over an array of geometries:
>>> [polygon.contains(point) for point in geoms]
[False, True, False]
In Shapely 2.0, we can now compute whether the points are contained in the polygon directly with one function call:
>>> shapely.contains(polygon, geoms)
array([False, True, False])
This results in a considerable speedup, especially for larger arrays of
geometries, as well as a nicer user interface that avoids the need to write
for loops. Depending on the operation, this can give a performance increase
by a factor of 4x to 100x. In general, the greatest speedups are for
lightweight GEOS operations, such as contains, which would previously have
been dominated by the high overhead of for loops in Python. See
https://caspervdw.github.io/Introducing-Pygeos/ for more detailed examples.
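Because the vectorized functions are NumPy ufuncs, they also follow the usual broadcasting rules. The following is a minimal sketch of that idea, not taken from the release itself; the sample points and polygons are made up for illustration.

```python
import numpy as np
import shapely
from shapely import Point, box

# Illustrative inputs (not from the release notes).
points = np.array([Point(0.5, 0.5), Point(1.5, 1.5), Point(3, 3)])
polygons = np.array([box(0, 0, 1, 1), box(1, 1, 2, 2)])

# Broadcasting a (2, 1) array of polygons against a (3,) array of points
# gives a (2, 3) boolean matrix: one row per polygon, one column per point.
inside = shapely.contains(polygons[:, np.newaxis], points)

# A scalar geometry is broadcast against the whole array.
dist = shapely.distance(points, Point(0.5, 0.5))

print(inside)
print(dist)
```

The same pattern applies to the other element-wise functions: any mix of scalar geometries and arrays with compatible shapes can be passed.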
The new vectorized functions are available in the top-level shapely
namespace. All the familiar geospatial methods and attributes from the
geometry classes now have an equivalent top-level function (with some
small name deviations, such as the .wkt attribute being available as a
to_wkt() function). Some methods from submodules (for example, several
functions from the shapely.ops submodule) are also made available in a
vectorized version as top-level functions.
A full list of functions can be found in the API docs.
Vectorized constructor functions
Optionally output to a user-specified array (out keyword argument) when constructing geometries from
Enable bulk construction of geometries with different number of coordinates by optionally taking index arrays in all creation functions.
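As a rough illustration of bulk construction with index arrays, the sketch below builds two LineStrings with different numbers of vertices from one flat coordinate array. It assumes the `shapely.points()` and `shapely.linestrings()` creation functions and their `indices` keyword; the coordinates are invented for the example.

```python
import numpy as np
import shapely

# One flat (n, 2) coordinate array for all geometries (illustrative values).
coords = np.array(
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [5.0, 5.0], [6.0, 6.0]]
)

# indices[i] says which output geometry coordinate i belongs to, so the
# first line gets three vertices and the second line gets two.
indices = np.array([0, 0, 0, 1, 1])
lines = shapely.linestrings(coords, indices=indices)

# Without indices, each coordinate pair becomes its own Point.
points = shapely.points(coords)

print(lines)
print(shapely.get_num_coordinates(lines))
```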
Shapely 2.0 API changes (deprecated in 1.8)#
The Shapely 1.8 release included several deprecation warnings about API changes that would happen in Shapely 2.0 and that can be fixed in your code (making it compatible with both <=1.8 and >=2.0). See Migrating to Shapely 1.8 / 2.0 for more details on how to update your code.
It is highly recommended to first upgrade to Shapely 1.8 and resolve all deprecation warnings before upgrading to Shapely 2.0.
Summary of changes:
Geometries are now immutable and hashable.
Multi-part geometries such as MultiPolygon no longer behave as “sequences”. This means that they no longer have a length, are not iterable, and are not indexable anymore. Use the .geoms attribute instead to access individual parts of a multi-part geometry.
Geometry objects no longer directly implement the NumPy array interface to expose their coordinates. To convert to an array of coordinates, use the .coords attribute instead (
The following attributes and methods on the Geometry classes were previously deprecated and are now removed from Shapely 2.0:
asShape(), and the adapter classes to create geometry-like proxy objects (use
Some new deprecations have been introduced in Shapely 2.0:
Directly calling the base class BaseGeometry() constructor or the EmptyGeometry() constructor is deprecated and will raise an error in the future. To create an empty geometry, use one of the subclasses instead, for example
disable functions) is deprecated and will be removed in the future. The module no longer has any effect in Shapely >=2.0.
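A short sketch of what the changes summarized above look like in practice. The geometries are made up for illustration, and converting `.coords` with `np.asarray` is shown as one possible way to get a coordinate array, not as the only one.

```python
import numpy as np
from shapely import LineString, MultiPoint, Point

# Geometries are hashable, so they can be used in sets and as dict keys.
unique = {Point(1, 2), Point(1, 2), Point(3, 4)}

# Multi-part geometries are no longer sequences; go through .geoms instead.
multi = MultiPoint([(0, 0), (1, 1)])
parts = list(multi.geoms)
try:
    len(multi)
    has_len = True
except TypeError:
    has_len = False

# Coordinates are exposed through .coords rather than the array interface.
line = LineString([(0, 0), (1, 1), (2, 0)])
coords = np.asarray(line.coords)

print(len(unique), len(parts), has_len, coords.shape)
```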
Breaking API changes#
Some additional backwards incompatible API changes were included in Shapely 2.0 that were not deprecated in Shapely 1.8:
Consistent creation of empty geometries (for example, Polygon() now actually creates an empty Polygon instead of an empty geometry collection).
The .bounds attribute of an empty geometry now returns a tuple of NaNs instead of an empty tuple (#1023).
simplify() now defaults to
A GeometryCollection that consists of all empty sub-geometries now returns those empty geometries from its .geoms attribute instead of returning an empty list (#1420).
The Point(..) constructor no longer accepts a sequence of coordinates consisting of more than one coordinate pair (previously, subsequent coordinates were ignored) (#1600).
HeterogeneousGeometrySequence class are removed (#1421).
The __geom__ attribute has been removed. If necessary (although not recommended for use beyond experimentation), use the _geom attribute to access the raw GEOS pointer (#1417).
The logging functionality has been removed. All error messages from GEOS are now raised as Python exceptions (#998).
Several custom exception classes defined in shapely.errors that are no longer used internally have been removed. Errors from GEOS are now raised as
The STRtree interface has been substantially changed. See the STRtree API changes and improvements section below for more details.
Additionally, starting with GEOS 3.11 (which is included in the binary wheels
on PyPI), the behaviour of the
changed regarding the orientation of the resulting line. With GEOS < 3.11,
the line retains the same direction for a left offset (positive distance) or
has opposite direction for a right offset (negative distance), and this
behaviour was documented as such in previous Shapely versions. Starting with
GEOS 3.11, the function tries to preserve the orientation of the original line.
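The empty-geometry changes listed above can be checked with a few lines. This is only a small sketch of the first two items (consistent empty types and NaN bounds).

```python
import math
from shapely import Polygon

# Polygon() now creates an empty Polygon rather than an empty collection.
empty = Polygon()
kind = empty.geom_type

# The bounds of an empty geometry are a 4-tuple of NaN values.
bounds = empty.bounds

print(empty.is_empty, kind, bounds)
```

Code that previously tested `bounds == ()` to detect empty geometries would need to switch to `is_empty` or a NaN check.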
Geometry subclasses are now available in the top-level namespace#
Following the new vectorized functions in the top-level shapely
namespace, the Geometry subclasses (Point, LineString, Polygon, etc.) are
now available in the top-level namespace as well. Thus it is no longer
needed to import those from the shapely.geometry submodule. The following:
from shapely.geometry import Point
can be replaced with:
from shapely import Point
or:
import shapely
shapely.Point(...)
Note: for backwards compatibility (and being able to write code that works
for both <=1.8 and >=2.0), those classes still remain accessible from the
shapely.geometry submodule as well.
More informative repr with truncated WKT#
The repr (__repr__) of Geometry objects has been simplified and improved
to include a descriptive Well-Known-Text (WKT) formatting. Instead of showing
the class name and id:
>>> Point(0, 0)
<shapely.geometry.point.Point at 0x7f0b711f1310>
we now get:
>>> Point(0, 0)
<POINT (0 0)>
For large geometries with many coordinates, the output gets truncated to 80 characters.
Support for fixed precision model for geometries and in overlay functions#
GEOS 3.9.0 overhauled the overlay operations (union, intersection, (symmetric) difference). A complete rewrite, dubbed “OverlayNG”, provides a more robust implementation (no more TopologyExceptions even on valid input), the ability to specify the output precision model, and significant performance optimizations. When installing Shapely with GEOS >= 3.9 (which is the case for PyPI wheels and conda-forge packages), you automatically get these improvements (also for previous versions of Shapely) when using the overlay operations.
Shapely 2.0 also includes the ability to specify the precision model directly:
The set_precision() function can be used to conform a geometry to a certain grid size (may round and reduce coordinates), and this will then also be used by subsequent overlay methods. A get_precision() function is also available to inspect the precision model of geometries.
The grid_size keyword in the overlay methods can also be used to specify the precision model of the output geometry (without first conforming the input geometries).
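The two approaches described above might look as follows. This is a sketch that assumes GEOS >= 3.9 and uses arbitrary example coordinates and grid sizes.

```python
import shapely
from shapely import Point, box

# Conform a geometry to a 0.01 grid; coordinates are rounded to that grid.
snapped = shapely.set_precision(Point(0.123456, 0.987654), grid_size=0.01)
precision = shapely.get_precision(snapped)

# Alternatively, pass grid_size to an overlay function so that only the
# output is computed on the grid (the inputs are not conformed first).
a = box(0, 0, 2.4, 2.4)
b = box(1.3, 1.3, 5, 5)
overlap = shapely.intersection(a, b, grid_size=1)
overlap_coords = shapely.get_coordinates(overlap)

print(snapped, precision)
print(overlap)
```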
Releasing the GIL for multithreaded applications#
Shapely itself is not multithreaded, but its functions generally allow for multithreading by releasing the Global Interpreter Lock (GIL) during execution. Normally in Python, the GIL prevents multiple threads from computing at the same time. Shapely functions internally release this constraint so that the heavy lifting done by GEOS can be done in parallel, from a single Python process.
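One way to take advantage of this is to split an array of geometries into chunks and process them in a thread pool. The sketch below is an illustration under that assumption; the helper name `buffered_area`, the chunk count, and the random input data are all hypothetical, and the actual speedup depends on the operation and the machine.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import shapely

# Illustrative input: 2000 random points.
rng = np.random.default_rng(0)
points = shapely.points(rng.uniform(0, 100, size=(2000, 2)))


def buffered_area(chunk):
    # Hypothetical helper: buffer each geometry and return the areas.
    return shapely.area(shapely.buffer(chunk, 1.0))


# Each thread handles one chunk; the GEOS work can overlap because the
# vectorized functions release the GIL while they run.
chunks = np.array_split(points, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = np.concatenate(list(pool.map(buffered_area, chunks)))

# Same computation in a single thread, for comparison.
serial = buffered_area(points)
print(np.allclose(threaded, serial))
```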
STRtree API changes and improvements#
The biggest change in the STRtree interface is that all operations now
return indices of the input tree or query geometries, instead of the
geometries themselves. These indices can be used to index into anything
associated with the input geometries, including the input geometries
themselves, or custom items stored in another object of the same length and
order as the geometries.
In addition, Shapely 2.0 includes several improvements to STRtree:
Directly include predicate evaluation in STRtree.query() by specifying the predicate keyword. If a predicate is provided, tree geometries with bounding boxes that overlap the bounding boxes of the input geometries are further filtered to those that meet the predicate (using prepared geometries under the hood for efficiency).
Query multiple input geometries (spatial join style) with STRtree.query() by passing an array of geometries. In this case, the return value is a 2D array with shape (2, n) where the subarrays correspond to the indices of the input geometries and indices of the tree geometries associated with each.
The STRtree.query_nearest() method was added, returning the index of the nearest geometries in the tree for each input geometry. Compared to STRtree.nearest(), which only returns the index of a single nearest geometry for each input geometry, this new method allows for:
returning all equidistant nearest geometries,
excluding nearest geometries that are equal to the input,
specifying max_distance to limit the search radius, potentially increasing the performance,
optionally returning the distance.
STRtree creation to allow querying the tree in a multi-threaded context.
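The index-based interface and the query options described above can be sketched as follows. The tree contents and query windows are invented for the example; result order is not relied upon, so the indices are sorted before use.

```python
import numpy as np
import shapely
from shapely import Point, STRtree, box

# Illustrative tree of four points.
points = shapely.points([[0, 0], [1, 1], [2, 2], [3, 3]])
tree = STRtree(points)

# Queries return integer indices into the tree geometries.
window = box(0.5, 0.5, 2.5, 2.5)
hits = np.sort(tree.query(window))
contained = np.sort(tree.query(window, predicate="contains"))

# Indices can be used to look up the geometries (or any parallel array).
hit_geoms = tree.geometries.take(hits)

# Passing several geometries returns a (2, n) array:
# row 0 holds input indices, row 1 holds tree indices.
windows = [box(0.5, 0.5, 1.5, 1.5), box(2.5, 2.5, 3.5, 3.5)]
pairs = tree.query(windows)

# Nearest-neighbour query that also returns the distance.
nearest_idx, nearest_dist = tree.query_nearest(
    Point(0.9, 0.9), return_distance=True
)

print(hits, contained, pairs, nearest_idx, nearest_dist)
```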
Bindings for new GEOS functionalities#
Several (new) functions from GEOS are now exposed in Shapely:
build_area() (GEOS >= 3.8)
segmentize() (GEOS >= 3.10)
dwithin() (GEOS >= 3.10)
remove_repeated_points() (GEOS >= 3.11)
line_merge() added directed parameter (GEOS >= 3.11)
concave_hull() (GEOS >= 3.11)
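A brief sketch of three of the functions listed above. It assumes Shapely is built against GEOS >= 3.11 (the functions raise an error on older GEOS versions), and the inputs are chosen only for illustration.

```python
import shapely
from shapely import LineString, Point

# dwithin: is the distance between two geometries within a given value?
# The two points are exactly 5 units apart.
within_reach = shapely.dwithin(Point(0, 0), Point(3, 4), 5.0)

# segmentize: add vertices so that no segment is longer than 5 units.
dense = shapely.segmentize(LineString([(0, 0), (10, 0)]), 5)

# remove_repeated_points: drop consecutive duplicate vertices.
cleaned = shapely.remove_repeated_points(
    LineString([(0, 0), (0, 0), (1, 1)]), 0.0
)

print(within_reach, dense, cleaned)
```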
In addition some aliases for existing methods have been added to provide a method name consistent with GEOS or PostGIS:
Getting information / parts / coordinates from geometries#
A set of GEOS getter functions are now also exposed to inspect geometries:
Several functions are added to extract parts:
get_geometry() to get a geometry from a GeometryCollection or Multi-part geometry.
get_point() to get a point (vertex) of a linestring or linearring.
Methods to extract all parts or coordinates at once have been added:
The get_parts() function can be used to get individual parts of an array of multi-part geometries.
The get_rings() function, similar to get_parts() but specifically to extract the rings of Polygon geometries.
The get_coordinates() function to get all coordinates from a geometry or array of geometries as an array of floats.
Each of those three functions has an optional return_index keyword, which
also returns the indexes of the original geometries in the source array.
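The extraction functions and the return_index keyword might be used as in the sketch below; the geometries are made up for illustration.

```python
import numpy as np
import shapely
from shapely import MultiPoint, Polygon

geoms = np.array([MultiPoint([(0, 0), (1, 1)]), MultiPoint([(2, 2)])])

# Explode multi-part geometries; part_index maps each part to its source.
parts, part_index = shapely.get_parts(geoms, return_index=True)

# Flatten all coordinates into an (n, 2) float array, again with an index.
coords, coord_index = shapely.get_coordinates(geoms, return_index=True)

# A polygon with one hole has two rings: the exterior and one interior.
donut = Polygon(
    [(0, 0), (4, 0), (4, 4), (0, 4)],
    holes=[[(1, 1), (2, 1), (2, 2), (1, 2)]],
)
rings = shapely.get_rings(donut)

print(part_index, coords.shape, coord_index, len(rings))
```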
Prepared geometries are no longer separate objects; geometry objects
themselves can be prepared. The prepare() function generates a GEOS
prepared geometry, which is stored on the Geometry object itself. All binary
predicates (except equals) will make use of this if the input geometry has
already been prepared. Helper functions are also available.
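A minimal sketch of preparing a geometry in place. It assumes the top-level `shapely.prepare()`, `shapely.is_prepared()`, and `shapely.destroy_prepared()` functions; the polygon and points are illustrative.

```python
import numpy as np
import shapely
from shapely import box

polygon = box(0, 0, 10, 10)
before = shapely.is_prepared(polygon)

# prepare() works in place: the prepared form is stored on the object.
shapely.prepare(polygon)
after = shapely.is_prepared(polygon)

# Predicates with the prepared geometry as first argument can now use it.
points = shapely.points(np.array([[1.0, 1.0], [20.0, 20.0], [5.0, 5.0]]))
inside = shapely.contains(polygon, points)

# The prepared form can be released again when it is no longer needed.
shapely.destroy_prepared(polygon)

print(before, after, inside)
```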
New IO methods (GeoJSON, ragged arrays)#
Addition of a total_bounds() function to return the outer bounds of an array of geometries.
empty() to create a geometry array pre-filled with None or with empty geometries.
Performance improvement in constructing LineStrings or LinearRings from numpy arrays for GEOS >= 3.10.
box() ufunc to use internal C function for creating polygon (about 2x faster) and added ccw parameter to create polygon in counterclockwise (default) or clockwise direction.
Start of a benchmarking suite using ASV.
Fixed several corner cases in WKT and WKB serialization for varying GEOS versions, including:
Fixed the WKT serialization of single part 3D empty geometries to correctly include “Z” (for GEOS >= 3.9.0).
Handle empty points in WKB serialization by conversion to POINT (nan, nan) consistently for all GEOS versions (GEOS started doing this for >= 3.9.0).
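A few of the smaller additions in the list above can be tried out as follows. This is a sketch with made-up inputs; it assumes the `geom_type` keyword of `shapely.empty()` and the `shapely.GeometryType` enum.

```python
import numpy as np
import shapely
from shapely import Point

# total_bounds: a single [xmin, ymin, xmax, ymax] for a whole array.
geoms = np.array([Point(0, 0), Point(2, 3)])
extent = shapely.total_bounds(geoms)

# empty: pre-allocate an array of None values or of empty geometries.
placeholders = shapely.empty(3)
empty_points = shapely.empty(2, geom_type=shapely.GeometryType.POINT)

# box: choose the ring orientation with the ccw parameter.
ccw_box = shapely.box(0, 0, 1, 1)
cw_box = shapely.box(0, 0, 1, 1, ccw=False)

print(extent, placeholders, empty_points)
print(ccw_box.exterior.is_ccw, cw_box.exterior.is_ccw)
```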
Thanks to everyone who contributed to this release! People with a “+” by their names contributed a patch for the first time.
Adam J. Stewart +
Alan D. Snow +
Brendan Ward +
Casper van der Wel +
James Myatt +
Joris Van den Bossche
Keith Jenkins +
Kian Meng Ang +
Krishna Chaitanya +
Martin Fleischmann +
Martin Lackner +
Tanguy Ophoff +
Giorgos Papadokostakis +
Mattijn van Hoek +