Spark launch JVM processes on worker nodes to execute tasks and store data in memory. Such processes are called executors. The number of executors per worker node is configurable and depends on:
- The total cores available on the worker node.
- The total memory available on the worker node.
- Your application’s configuration.
For example:
- On a machine with 8 CPU cores and 32 GB of RAM:
- You can run 1 executor with 8 cores and 30 GB of memory.
- Alternatively, you can run 2 executors, each with 4 cores and 15 GB of memory.
Key Configurations for Executors
spark.executor.cores:- Defines the number of CPU cores allocated to each executor.
- Determines how many tasks an executor can run concurrently.
- Example:
spark.executor.cores=4→ Each executor uses 4 cores.
spark.executor.memory:- Specifies the amount of memory allocated to each executor.
- Includes space for task execution and in-memory storage.
- Example:
spark.executor.memory=15g→ Each executor gets 15 GB of memory.
spark.executor.instances:- Controls the total number of executors launched for the application.
- Example:
spark.executor.instances=10→ Launches 10 executors across the cluster.
spark.task.cpus:- Specifies the number of cores each task requires.
- Example:
spark.task.cpus=2→ Each task consumes 2 cores, so an executor with 4 cores can run 2 tasks concurrently.
Memory best practice
Use 90% for Spark, 10% for other OS related stuff.
Dynamic Allocation
Dynamic allocation allows Spark to adjust the number of executors during job execution. Executors can be added or removed based on workload requirements, optimizing resource utilization in shared clusters. This feature is particularly useful when running Spark on resource managers like YARN, Kubernetes, or Mesos.
Key configuration options include:
spark.dynamicAllocation.enabled=truespark.dynamicAllocation.initialExecutorsspark.dynamicAllocation.maxExecutorsspark.dynamicAllocation.minExecutors