Setting up Spark Cluster · Location Tools

This section will get you up and running on a distributed cluster (such as Cloudera CDH).

Spark Cluster Config

Versions on which batch-geocode has been tested:

Spark 2.3.3

Scala 2.11

Hive [TO DO]

Hadoop 2.7.7

Lucene Index

To run the Spark application, every node in the cluster needs to have the previously built Lucene index at the same path. This path is then set as the value of the lucene.index.dir key in the [batch-geocode] section of driver.ini.
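As a quick sanity check for the requirement above, the following minimal sketch tests whether the index directory exists and is non-empty on the node where it runs. The path /path/to/lucene/index, the INDEX_DIR variable, and the check_index_dir helper are hypothetical names used only for illustration, and the driver.ini fragment in the comments assumes a plain key = value layout.

```shell
# Hypothetical path; replace it with the directory where the Lucene index was built.
# The matching driver.ini entry is assumed to look like:
#   [batch-geocode]
#   lucene.index.dir = /path/to/lucene/index
INDEX_DIR="${INDEX_DIR:-/path/to/lucene/index}"

# Succeeds only if the directory exists and contains at least one entry.
check_index_dir() {
    dir="$1"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

if check_index_dir "$INDEX_DIR"; then
    echo "OK: index found at $INDEX_DIR"
else
    echo "MISSING: no index at $INDEX_DIR"
fi
```

Running the same check on every node is one way to confirm the index sits at an identical path across the cluster before submitting the job.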

libpostal and jpostal

In addition to the Lucene index, every node in the cluster needs to have libpostal installed, along with the jpostal Java bindings, which are found in jpostal/src/main/jniLibs after compilation. The jniLibs path is passed to spark2-submit as a config option: --conf spark.driver.extraLibraryPath=/path/to/jniLibs
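The option above can be assembled as in the following sketch, which only prints the command as a dry run instead of launching Spark. The JNI_LIBS variable and the library_path_conf helper are hypothetical names. The remaining spark2-submit arguments (application jar, main class, and so on) are not given in this section, so they are left elided.

```shell
# Placeholder from the text; replace it with the real jniLibs directory.
JNI_LIBS="${JNI_LIBS:-/path/to/jniLibs}"

# Prints the --conf argument for a given jniLibs directory.
library_path_conf() {
    printf '%s\n' "--conf spark.driver.extraLibraryPath=$1"
}

# Dry run: print the command rather than executing it.
# The trailing "..." stands for arguments this section does not specify.
echo "spark2-submit $(library_path_conf "$JNI_LIBS") ..."
```

Spark also has an analogous spark.executor.extraLibraryPath setting for executor JVMs. Whether batch-geocode needs it is not stated here, so treat it as something to try only if executors fail to load the native libraries.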

To work on all the nodes in the cluster simultaneously, you may want to use cluster ssh.