Batch module
Batch processing utility functions.
geoapify.com offers several of its services in a batch version. They all work the same way: you start with a list of records and ask for each of them to be processed. Because each record is processed independently, the whole list can be POSTed as a batch instead of requesting each record separately, and Geoapify can distribute the processing across its servers. You then use GET requests to check whether a job is completed. Once it is, a GET request returns the results for the complete batch.
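Because records are processed independently, splitting the input into batches and concatenating the per-batch results gives the same output as processing one record at a time. The following is a minimal local sketch of that idea. split_into_batches and process_in_batches are hypothetical helpers, not part of geobatchpy, and no HTTP requests are made.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_batches(records: List[T], batch_len: int) -> List[List[T]]:
    """Split records into consecutive chunks of at most batch_len items."""
    if batch_len < 1:
        raise ValueError("batch_len must be a positive integer")
    return [records[i:i + batch_len] for i in range(0, len(records), batch_len)]


def process_in_batches(
    records: List[T], process_one: Callable[[T], R], batch_len: int
) -> List[R]:
    """Process every batch on its own and concatenate the results in input order."""
    results: List[R] = []
    for batch in split_into_batches(records, batch_len):
        # Local stand-in for one POSTed batch job: each record is handled independently.
        results.extend(process_one(record) for record in batch)
    return results


addresses = ["Berlin", "Paris", "Rome", "Madrid", "Vienna"]
print(split_into_batches(addresses, 2))
# [['Berlin', 'Paris'], ['Rome', 'Madrid'], ['Vienna']]
print(process_in_batches(addresses, str.upper, batch_len=2))
# ['BERLIN', 'PARIS', 'ROME', 'MADRID', 'VIENNA']
```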
BatchClient
Source code in geobatchpy/batch.py
geocode(locations, batch_len=1000, parameters=None, simplify_output=False)
Returns batch geocoding results as a list of dictionaries.
Note: this whole process may take a long time (hours), depending on the size of the input, the number of batches, and the level of your geoapify.com subscription. In that case, it may make more sense to store the job URLs on disk, stop there, and continue later with monitor_batch_jobs_and_get_results.
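Storing the job URLs can be as simple as writing them to a JSON file. This is a sketch under assumptions: save_job_urls, load_job_urls, the file name, and the "result_urls" key are all hypothetical choices, and the URLs below are placeholders rather than real Geoapify job URLs.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def save_job_urls(path: Path, result_urls: List[str]) -> None:
    """Persist batch job URLs as JSON so monitoring can resume in a later session."""
    path.write_text(json.dumps({"result_urls": result_urls}, indent=2), encoding="utf-8")


def load_job_urls(path: Path) -> List[str]:
    """Read back the job URLs written by save_job_urls."""
    return json.loads(path.read_text(encoding="utf-8"))["result_urls"]


# Placeholder URLs; real ones would come from post_batch_jobs_and_get_job_urls.
urls = ["https://example.com/batch/job-1", "https://example.com/batch/job-2"]
with tempfile.TemporaryDirectory() as tmp:
    job_file = Path(tmp) / "geocode_jobs.json"
    save_job_urls(job_file, urls)
    restored = load_job_urls(job_file)
print(restored == urls)  # True
```

In a later session, the restored list is what you would pass as result_urls to monitor_batch_jobs_and_get_results.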
Note: as of this writing, you need to pass parameters={'format': 'geojson'} to get results that are consistent with the standard single-location geocoding endpoint. We have not made this the default in the batch version, to stay consistent with Geoapify, but we strongly recommend using GeoJSON. See https://geojson.org/ for the GeoJSON specification, and see the third-party package geopandas to learn how to parse such objects for efficient analytics.
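One practical detail of GeoJSON is that Point coordinates are stored as [longitude, latitude], not the other way around. The sketch below reads a coordinate pair from a hand-written feature that follows the GeoJSON specification; point_coordinates is a hypothetical helper, and the feature's property names are illustrative, not a verbatim Geoapify response.

```python
from typing import Dict, Tuple


def point_coordinates(feature: Dict) -> Tuple[float, float]:
    """Return (lon, lat) from a GeoJSON Point feature (GeoJSON puts longitude first)."""
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError(f"expected a Point geometry, got {geometry['type']!r}")
    lon, lat = geometry["coordinates"][:2]
    return float(lon), float(lat)


# Hand-written example feature; property names are illustrative only.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [13.3777, 52.5163]},
    "properties": {"formatted": "Pariser Platz, Berlin"},
}
print(point_coordinates(feature))  # (13.3777, 52.5163)
```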
Parameters:
Name | Type | Description | Default |
---|---|---|---|
locations | List[Union[str, Dict]] | locations in a supported format, as validated by parse_geocoding_inputs. | required |
batch_len | int | split addresses into chunks of at most batch_len items for parallel processing. | 1000 |
parameters | Dict[str, str] | optional parameters as key-value pairs that apply to all locations. See the Geoapify docs. | None |
simplify_output | bool | if True, returns output in a simplified format that includes only the top match per address. | False |
Returns:
Type | Description |
---|---|
List[dict] | List of structured, geocoded, and enriched address records. |
get_sleep_time(number_of_items)
staticmethod
Choose an appropriate sleep time between GET requests for a batch job.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
number_of_items | int | original number of items/addresses/locations/etc. | required |
Returns:
Type | Description |
---|---|
int | Sleep time in seconds. |
isoline(geocodes, travel_range, travel_mode='drive', isoline_type='time', batch_len=1000, output_format='geojson')
Returns batch isoline results as a list of dictionaries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
geocodes | List[Union[Tuple[float, float], Dict[str, float]]] | list of input locations as geocodes in a format supported by self.parse_geocodes. | required |
travel_range | int | either travel time in seconds or travel distance in meters, depending on isoline_type. | required |
travel_mode | str | one of the many supported modes; see the Geoapify API docs. | 'drive' |
isoline_type | str | either 'time' or 'distance'. | 'time' |
batch_len | int | split locations into chunks of at most batch_len items for parallel processing. | 1000 |
output_format | str | one of 'geojson', 'topojson', 'geobuf'. | 'geojson' |
Returns:
Type | Description |
---|---|
List[dict] | List of structured isoline records. |
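The unit of travel_range depends on isoline_type: seconds for 'time', meters for 'distance'. This sketch builds one parameter dictionary per (lon, lat) pair to make that relationship explicit. build_isoline_inputs is hypothetical, and the keys lon, lat, type, mode, and range are assumed to mirror the Geoapify Isoline API query parameters; this is not the payload geobatchpy actually sends.

```python
from typing import Dict, List, Tuple


def build_isoline_inputs(
    geocodes: List[Tuple[float, float]],
    travel_range: int,
    travel_mode: str = "drive",
    isoline_type: str = "time",
) -> List[Dict[str, object]]:
    """Build one parameter dict per (lon, lat) pair (hypothetical payload shape)."""
    if isoline_type not in ("time", "distance"):
        raise ValueError("isoline_type must be 'time' or 'distance'")
    if travel_range <= 0:
        raise ValueError("travel_range must be positive")
    # travel_range is seconds when isoline_type == 'time', meters when 'distance'.
    return [
        {"lon": lon, "lat": lat, "type": isoline_type, "mode": travel_mode, "range": travel_range}
        for lon, lat in geocodes
    ]


berlin = (13.3777, 52.5163)
fifteen_minutes = build_isoline_inputs([berlin], travel_range=15 * 60)
five_km_walk = build_isoline_inputs(
    [berlin], travel_range=5_000, travel_mode="walk", isoline_type="distance"
)
print(fifteen_minutes[0]["range"])                        # 900
print(five_km_walk[0]["type"], five_km_walk[0]["range"])  # distance 5000
```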
monitor_batch_jobs_and_get_results(sleep_time, result_urls)
Monitors completion of each batch processing job and returns/stores results.
Previous POST requests started batch processing jobs on geoapify.com servers. Here we monitor their status and return/store the results once all jobs have succeeded.
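The monitoring step is essentially a polling loop: request every unfinished job, sleep, and repeat until all jobs report results, then flatten them into one list with one element per location. This is a minimal sketch, not geobatchpy's implementation. wait_for_jobs is hypothetical, and the injected fetch callable stands in for the real GET request, returning None while a job is still running.

```python
import time
from typing import Any, Callable, Dict, List, Optional


def wait_for_jobs(
    result_urls: List[str],
    fetch: Callable[[str], Optional[List[Any]]],
    sleep_time: float,
) -> List[Any]:
    """Poll all job URLs until each one returns results, then flatten them in URL order.

    fetch is a hypothetical stand-in for the GET request: it returns None while
    a job is still running and that job's list of per-location results once done.
    """
    finished: Dict[str, List[Any]] = {}
    pending = list(result_urls)
    while pending:
        still_pending = []
        for url in pending:
            outcome = fetch(url)
            if outcome is None:
                still_pending.append(url)
            else:
                finished[url] = outcome
        pending = still_pending
        if pending:
            time.sleep(sleep_time)  # back off before the next round of requests
    # One element per location, in the same order as the submitted batches.
    return [item for url in result_urls for item in finished[url]]


# Simulated server: every job reports "still running" for its first two polls.
polls = {"job-a": 0, "job-b": 0}
payloads = {"job-a": [{"id": 1}, {"id": 2}], "job-b": [{"id": 3}]}


def fake_fetch(url: str) -> Optional[List[Any]]:
    polls[url] += 1
    return payloads[url] if polls[url] > 2 else None


collected = wait_for_jobs(["job-a", "job-b"], fake_fetch, sleep_time=0)
print(collected)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Injecting fetch keeps the loop testable without network access. A real client would also need to handle failed jobs and request errors, which this sketch ignores.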
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sleep_time | int | time in seconds to sleep between consecutive requests for a single job. | required |
result_urls | List[str] | list of batch job URLs to be monitored. | required |
Returns:
Type | Description |
---|---|
List[dict] | Batch job results as a list, with one element per location. |
place_details(place_ids=None, geocodes=None, batch_len=1000, features=None, language=None)
Returns batch place details results as a list of dictionaries.
Note: this whole process may take a long time (hours), depending on the size of the input, the number of batches, and the level of your geoapify.com subscription. In that case, it may make more sense to store the job URLs on disk, stop there, and continue later with monitor_batch_jobs_and_get_results.
Use either place_ids or geocodes to encode your inputs. If both are provided (not None), place_ids takes priority.
See the Geoapify.com API docs for a list of available features.
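The input selection rule can be sketched as follows: place_ids wins when both are given, geocodes are the fallback, and features defaults to ["details"]. select_place_details_inputs is a hypothetical helper, and the per-item keys id, lon, and lat are assumed to mirror the Geoapify Place Details query parameters; the place ID is a placeholder.

```python
from typing import Dict, List, Optional, Tuple


def select_place_details_inputs(
    place_ids: Optional[List[str]] = None,
    geocodes: Optional[List[Tuple[float, float]]] = None,
    features: Optional[List[str]] = None,
) -> Tuple[List[Dict[str, object]], List[str]]:
    """Pick the input source (place_ids first) and apply the default feature list."""
    if features is None:
        features = ["details"]  # documented default
    if place_ids is not None:
        inputs: List[Dict[str, object]] = [{"id": pid} for pid in place_ids]
    elif geocodes is not None:
        inputs = [{"lon": lon, "lat": lat} for lon, lat in geocodes]
    else:
        raise ValueError("provide place_ids or geocodes")
    return inputs, features


print(select_place_details_inputs(place_ids=["abc123"], geocodes=[(2.35, 48.86)]))
# ([{'id': 'abc123'}], ['details'])
```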
Parameters:
Name | Type | Description | Default |
---|---|---|---|
place_ids | List[str] | list of place_id values. | None |
geocodes | List[Union[Tuple[float, float], Dict[str, float]]] | list of input locations as geocodes in a format supported by self.parse_geocodes. | None |
batch_len | int | split inputs into chunks of at most batch_len items for parallel processing. | 1000 |
features | List[str] | list of detail types. Defaults to just ["details"] if not specified. | None |
language | str | two-character ISO language code. | None |
Returns:
Type | Description |
---|---|
List[dict] | List of structured place details records. |
places(individual_parameters, parameters=None, batch_len=1000)
Returns batch places results as a list of dictionaries.
Every Places call is defined by a set of parameters; see the Geoapify API docs for an overview. In the batch version, these parameters can be provided through two arguments. individual_parameters is a list of dictionaries, one per call, each holding the parameters that apply to that call only. The parameters dictionary applies to all calls in the batch.
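Conceptually, each call's effective parameters are the shared dictionary combined with that call's own dictionary. The sketch below shows one way to combine them. expand_places_calls is hypothetical, the rule that individual values override shared ones is an assumption (the reference above does not state a precedence), and the parameter values are only illustrative of Geoapify Places queries.

```python
from typing import Any, Dict, List, Optional


def expand_places_calls(
    individual_parameters: List[Dict[str, Any]],
    parameters: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Combine shared parameters with each call's own parameters (one dict per call)."""
    common = parameters or {}
    # Assumption: a key set for an individual call overrides the shared value.
    return [{**common, **individual} for individual in individual_parameters]


calls = expand_places_calls(
    individual_parameters=[
        {"filter": "circle:13.38,52.52,1000"},
        {"filter": "circle:2.35,48.86,1000", "limit": 5},
    ],
    parameters={"categories": "catering.cafe", "limit": 20},
)
for call in calls:
    print(call)
# {'categories': 'catering.cafe', 'limit': 20, 'filter': 'circle:13.38,52.52,1000'}
# {'categories': 'catering.cafe', 'limit': 5, 'filter': 'circle:2.35,48.86,1000'}
```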
Parameters:
Name | Type | Description | Default |
---|---|---|---|
individual_parameters | List[dict] | one dictionary per Places call. | required |
parameters | dict | one dictionary with common parameters for all calls. | None |
batch_len | int | split calls into chunks of at most batch_len items for parallel processing. | 1000 |
Returns:
Type | Description |
---|---|
List[dict] | List of structured Places responses. |
post_batch_jobs_and_get_job_urls(api, inputs, parameters=None, batch_len=None)
Triggers batch process on server and returns URLs to be used in GET requests for obtaining results.
Each returned URL represents one batch. Batch size is limited to 1000, which usually means the workload has to be split into multiple batches. But even if the input has fewer than 1000 items, it can help to limit the batch size further: several smaller batches may be processed more quickly than a few large ones.
Available api values:
- geocoding: '/v1/geocode/search'
- reverse geocoding: '/v1/geocode/reverse'
- place details: '/v2/place-details'
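The api values and the 1000-item limit determine how many batch jobs a workload produces. This sketch maps the service names to the api values listed here and computes the number of jobs as the ceiling of inputs divided by batch_len. count_batches is hypothetical, and treating batch_len=None as "use the maximum of 1000" is an assumption rather than documented behavior.

```python
import math
from typing import Dict, Optional

# api values from the list above, keyed by the service they belong to.
BATCH_API_PATHS: Dict[str, str] = {
    "geocoding": "/v1/geocode/search",
    "reverse geocoding": "/v1/geocode/reverse",
    "place details": "/v2/place-details",
}
MAX_BATCH_LEN = 1000  # per-batch size limit stated above


def count_batches(n_inputs: int, batch_len: Optional[int] = None) -> int:
    """Number of batch jobs needed for n_inputs items (None is assumed to mean the maximum)."""
    if batch_len is None:
        batch_len = MAX_BATCH_LEN
    if not 1 <= batch_len <= MAX_BATCH_LEN:
        raise ValueError(f"batch_len must be between 1 and {MAX_BATCH_LEN}")
    return math.ceil(n_inputs / batch_len)


print(BATCH_API_PATHS["reverse geocoding"])  # /v1/geocode/reverse
print(count_batches(2_500))                  # 3
print(count_batches(2_500, batch_len=400))   # 7
```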
Parameters:
Name | Type | Description | Default |
---|---|---|---|
api | str | name of the batch-enabled API; see above. | required |
inputs | List[Any] | list of locations to be processed by batch jobs. | required |
parameters | dict | optional parameters; see the Geoapify API docs. | None |
batch_len | int | split inputs into chunks of at most batch_len items for parallel processing. | None |
Returns:
Type | Description |
---|---|
List[str] | List of batch job URLs. |
reverse_geocode(geocodes, batch_len=1000, parameters=None, simplify_output=False)
Returns batch reverse geocoding results as a list of dictionaries.
Note: this whole process may take a long time (hours), depending on the size of the input, the number of batches, and the level of your geoapify.com subscription. In that case, it may make more sense to store the job URLs on disk, stop there, and continue later with monitor_batch_jobs_and_get_results.
Note: as of this writing, you need to pass parameters={'format': 'geojson'} to get results that are consistent with the standard single-location geocoding endpoint. We have not made this the default in the batch version, to stay consistent with Geoapify, but we strongly recommend using GeoJSON. See https://geojson.org/ for the GeoJSON specification, and see the third-party package geopandas to learn how to parse such objects for efficient analytics.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
geocodes | List[Union[Tuple[float, float], Dict[str, float]]] | list of input locations as geocodes in a format supported by self.parse_geocodes. | required |
batch_len | int | split locations into chunks of at most batch_len items for parallel processing. | 1000 |
parameters | Dict[str, str] | optional parameters as a dictionary. See the geoapify.com API documentation. | None |
simplify_output | bool | if True, the output is provided in a slightly simplified format. | False |
Returns:
Type | Description |
---|---|
List[dict] | List of structured, reverse geocoded, and enriched address records. |
parse_geocodes(geocodes)
Validate and parse lists of geocoordinates.
Supported formats:
- List of (longitude, latitude) tuples of floats.
- List of dictionaries, each containing the attributes 'lon' and 'lat'.
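Both supported formats can be normalized into one shape. The following is a minimal sketch, assuming the parsed form is a list of {'lon': ..., 'lat': ...} dictionaries; normalize_geocodes is hypothetical, and the range check is an extra safeguard added for illustration that the reference above does not mention.

```python
from typing import Dict, List, Tuple, Union

Geocode = Union[Tuple[float, float], Dict[str, float]]


def normalize_geocodes(geocodes: List[Geocode]) -> List[Dict[str, float]]:
    """Turn (lon, lat) tuples or {'lon', 'lat'} dicts into uniform dicts."""
    parsed: List[Dict[str, float]] = []
    for item in geocodes:
        if isinstance(item, dict):
            lon, lat = item["lon"], item["lat"]
        elif isinstance(item, (tuple, list)) and len(item) == 2:
            lon, lat = item  # tuples are (longitude, latitude), longitude first
        else:
            raise ValueError(f"unsupported geocode: {item!r}")
        lon, lat = float(lon), float(lat)
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            raise ValueError(f"coordinates out of range: {item!r}")
        parsed.append({"lon": lon, "lat": lat})
    return parsed


print(normalize_geocodes([(13.4, 52.5), {"lat": 48.86, "lon": 2.35}]))
# [{'lon': 13.4, 'lat': 52.5}, {'lon': 2.35, 'lat': 48.86}]
```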
Parameters:
Name | Type | Description | Default |
---|---|---|---|
geocodes | List[Union[Tuple[float, float], Dict[str, float]]] | original input. | required |
Returns:
Type | Description |
---|---|
List[dict] | parsed geocodes. |
parse_geocoding_inputs(locations)
Validate and parse the input for the batch geocoding API.
Supported formats:
- List of free-text search strings.
- List of dictionaries with a structured location definition. See the Geoapify API docs for forward geocoding.
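A sketch of how the two input formats could be validated and brought into a common list-of-dicts shape. normalize_geocoding_inputs is hypothetical, wrapping free-text strings as {'text': ...} is an assumption about the parsed form, and the structured field names street and city are illustrative only.

```python
from typing import Dict, List, Union


def normalize_geocoding_inputs(locations: List[Union[str, Dict[str, str]]]) -> List[Dict[str, str]]:
    """Validate free-text strings or structured dicts and return a list of dicts."""
    parsed: List[Dict[str, str]] = []
    for item in locations:
        if isinstance(item, str):
            text = item.strip()
            if not text:
                raise ValueError("empty search string")
            parsed.append({"text": text})  # assumption: free text is wrapped under 'text'
        elif isinstance(item, dict):
            if not item:
                raise ValueError("empty structured location")
            parsed.append(dict(item))  # structured input is copied unchanged
        else:
            raise TypeError(f"unsupported location type: {type(item).__name__}")
    return parsed


print(normalize_geocoding_inputs(["  Pariser Platz 1, Berlin ", {"street": "Via del Corso", "city": "Rome"}]))
# [{'text': 'Pariser Platz 1, Berlin'}, {'street': 'Via del Corso', 'city': 'Rome'}]
```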
Parameters:
Name | Type | Description | Default |
---|---|---|---|
locations | List[Union[str, dict]] | original input. | required |
Returns:
Type | Description |
---|---|
List[dict] | Parsed locations. |
simplify_batch_geocoding_results(results, input_format)
Simplifies the output of a batch geocoding response.
This function takes the rather verbose response of the batch geocoding service as input and returns a more lightweight version that preserves the most relevant information. Try pandas.json_normalize on the output to see how easily the JSON can then be converted into a data table.
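pandas.json_normalize turns nested dictionaries into flat columns whose names join the nested keys with '.' by default. The stdlib sketch below mimics only that column naming, so the idea is visible without pandas. flatten is a hypothetical helper, and the record is a made-up example, not the actual output of simplify_batch_geocoding_results.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with sep (like json_normalize)."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


# Hypothetical simplified record; field names are illustrative only.
record = {
    "query": {"text": "Pariser Platz 1, Berlin"},
    "result": {"lon": 13.3777, "lat": 52.5163, "city": "Berlin"},
}
print(flatten(record))
# {'query.text': 'Pariser Platz 1, Berlin', 'result.lon': 13.3777, 'result.lat': 52.5163, 'result.city': 'Berlin'}
```

With pandas installed, pandas.json_normalize applied to a list of such records would produce a DataFrame with the same dotted column names.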
Parameters:
Name | Type | Description | Default |
---|---|---|---|
results | List[dict] | List of results as returned by Client.batch.geocode. | required |
input_format | str | Either 'json' (default) or 'geojson'. See the format parameter in the Geoapify API docs for geocoding. | required |
Returns:
Type | Description |
---|---|
List[dict] | A lightweight, reshaped version of the original results. |
simplify_batch_place_details_results(results)
Simplifies the output of a batch place details response to a list of lists of GeoJSON feature-like dicts.
The batch Place Details API takes as input locations plus feature types, and responds with Place details for every combination of the two inputs, provided there is a match.
This function parses the rather verbose results object into a more lightweight list of lists of dictionaries:
- A single element of the outer list corresponds to a single location.
- A single element of the inner list corresponds to a single feature type of a single location.
- Every such innermost element is a Python dictionary which effectively is a GeoJSON feature.
See https://geojson.org/ for the GeoJSON specification. In practice, this means you can use third-party packages such as geopandas and shapely to parse the output of this function and analyze these objects with sophisticated open-source tools for geographic data science.
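Because every inner element is effectively a GeoJSON feature, each location's inner list can be wrapped in a standard GeoJSON FeatureCollection. This is a minimal sketch. to_feature_collections and point are hypothetical helpers, and the input below is made-up data in the documented list-of-lists shape, not real Geoapify output.

```python
from typing import Dict, List


def to_feature_collections(results: List[List[Dict]]) -> List[Dict]:
    """Wrap each location's list of features in its own GeoJSON FeatureCollection."""
    return [{"type": "FeatureCollection", "features": list(features)} for features in results]


def point(lon: float, lat: float, **properties) -> Dict:
    """Build a minimal GeoJSON Point feature (longitude first)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


# Hypothetical simplified output: two locations, the second without any match.
simplified = [
    [point(13.3777, 52.5163, name="Pariser Platz"), point(13.3777, 52.5163, name="Brandenburger Tor")],
    [],
]
collections = to_feature_collections(simplified)
print([len(fc["features"]) for fc in collections])  # [2, 0]
```

With geopandas installed, geopandas.GeoDataFrame.from_features(collection["features"]) could turn one such collection into a GeoDataFrame.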
Parameters:
Name | Type | Description | Default |
---|---|---|---|
results | List[dict] | List of results as returned by Client.batch.place_details. | required |
Returns:
Type | Description |
---|---|
List[List[dict]] | A lightweight, reshaped version of the original results. |