Location Tools

Setting up Spark Cluster

This section will get you up and running on a distributed cluster (such as Cloudera CDH).

Spark Cluster Config

Versions on which batch-geocode has been tested:

  • Spark 2.3.3
  • Scala 2.11
  • Hive [TO DO]
  • Hadoop 2.7.7

We use Spark on YARN (Cloudera).

Lucene Index

To run the Spark application, every node in the cluster needs the previously built Lucene index at the same path. That path is then set as the lucene.index.dir key in the [batch-geocode] section of driver.ini.
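As a sketch, the relevant part of driver.ini might look like the fragment below; the index path is an illustrative assumption, not a required location:

```ini
; driver.ini -- the path is illustrative; it must be identical on every node
[batch-geocode]
lucene.index.dir = /data/lucene-index
```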

libpostal and jpostal

In addition to the Lucene index, every node in the cluster needs libpostal installed, along with the jpostal Java bindings from jpostal/src/main/jniLibs after compilation. The jniLibs path is passed as a config option to spark2-submit via --conf spark.driver.extraLibraryPath=/path/to/jniLibs
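Putting the pieces together, a spark2-submit invocation might look like the sketch below. The JAR name and paths are illustrative assumptions; spark.executor.extraLibraryPath is a standard Spark option added here on the assumption that executors also need the jpostal native libraries, since the text says every node must have them:

```shell
# Illustrative invocation -- adjust the JAR name and paths to your build.
spark2-submit \
  --master yarn \
  --conf spark.driver.extraLibraryPath=/path/to/jniLibs \
  --conf spark.executor.extraLibraryPath=/path/to/jniLibs \
  batch-geocode-assembly.jar
```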

For working with all the nodes in the cluster simultaneously, you may want to use cluster SSH.
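For example, with the clusterssh tool installed, you can broadcast the same commands (such as installing libpostal or copying the Lucene index) to every node at once; the hostnames below are placeholders:

```shell
# Open synchronized terminals to all worker nodes (hostnames are placeholders)
cssh node1.example.com node2.example.com node3.example.com
```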

Last updated on 12/7/2019 by rahul-pande
Copyright © 2019 National Fire Protection Association