

Apache Spark


Spark interview questions: How to solve OOM (out-of-memory) errors in Apache Spark

  1. Increase executor memory: adjust the executor memory in your Spark job configuration, e.g. --executor-memory 4G (the first sketch after this list shows the equivalent SparkConf settings).
  2. If the OOM happens on the driver, raise driver memory instead: --driver-memory 4G.
  3. Tune memory fractions: Spark splits executor memory into execution and storage regions. Adjust these parameters: spark.memory.fraction=0.6, spark.memory.storageFraction=0.5.
  4. Give computation more room if tasks are spilling: spark.memory.fraction=0.8.
  5. Avoid expensive wide transformations: tune spark.sql.shuffle.partitions (default 200) and prefer efficient operations such as reduceByKey over groupByKey (second sketch below).
  6. If memory is limited, let cached data spill to disk, e.g. persist with StorageLevel.MEMORY_AND_DISK instead of keeping everything in memory.
  7. Optimize partitioning: with far too many small partitions, reduce them with coalesce; with too few or skewed partitions, rebalance with repartition (third sketch below).
  8. Broadcast the smaller table in joins so it is shipped to every executor instead of being shuffled (fourth sketch below).
  9. Optimize garbage collection: spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35
  10. Monitor and debug: enable event logging with spark.eventLog.enabled=true and inspect the Spark UI and executor logs (final sketch below).
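
A minimal sketch of points 1 to 4 applied through SparkConf when building a session. The 4g sizes and fraction values are illustrative only, not recommendations, and driver memory normally has to be set via spark-submit or spark-defaults.conf because the driver JVM is already running by this point:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values; size these to your cluster and workload.
val spark = SparkSession.builder()
  .appName("oom-memory-tuning-sketch")
  .config("spark.executor.memory", "4g")          // equivalent of --executor-memory 4G
  .config("spark.driver.memory", "4g")            // usually passed via spark-submit instead
  .config("spark.memory.fraction", "0.6")         // heap share for execution + storage (default 0.6)
  .config("spark.memory.storageFraction", "0.5")  // part of that share protected for cached data (default 0.5)
  .getOrCreate()
```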
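
A small sketch for point 5, contrasting groupByKey with reduceByKey on a toy word count. reduceByKey pre-aggregates inside each partition before the shuffle, so far less data crosses the network; the data and app name here are made up.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("wide-transform-sketch").getOrCreate()
val sc = spark.sparkContext

// Keep shuffle parallelism in line with data volume (200 is the default).
spark.conf.set("spark.sql.shuffle.partitions", "200")

val pairs = sc.parallelize(Seq("spark", "oom", "spark", "memory", "spark")).map(w => (w, 1))

// groupByKey ships every (word, 1) pair across the network before summing.
val viaGroup = pairs.groupByKey().mapValues(_.sum)

// reduceByKey combines partial sums map-side, shuffling far less data.
val viaReduce = pairs.reduceByKey(_ + _)

viaReduce.collect().foreach(println)
```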
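
A sketch for points 6 and 7: persisting with a storage level that is allowed to spill to disk, then shrinking or growing the partition count. The input path and target partition counts are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("partition-and-spill-sketch").getOrCreate()

val df = spark.read.parquet("/data/events")   // hypothetical input path

// Point 6: cache, but allow blocks that don't fit in memory to spill to disk.
df.persist(StorageLevel.MEMORY_AND_DISK)

println(df.rdd.getNumPartitions)

// Point 7: too many tiny partitions -> coalesce merges them without a full shuffle.
val fewer = df.coalesce(64)

// Too few (or skewed) partitions -> repartition does a full shuffle to rebalance.
val more = df.repartition(400)
```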
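
A sketch for point 8, using the broadcast hint from org.apache.spark.sql.functions so the small lookup table is copied to every executor instead of both sides being shuffled. Table paths and the join key are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("broadcast-join-sketch").getOrCreate()

val facts = spark.read.parquet("/data/fact_events")    // large table (hypothetical path)
val dims  = spark.read.parquet("/data/dim_countries")  // small lookup table (hypothetical path)

// Broadcasting the small side avoids shuffling the large table on the join key.
val joined = facts.join(broadcast(dims), Seq("country_id"))
joined.show(5)
```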
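
Finally, a sketch for points 9 and 10, wiring the G1 GC options and event logging into the session config. These usually live in spark-defaults.conf or on the spark-submit command line; the log directory is a placeholder.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("gc-and-logging-sketch")
  // G1 tends to give shorter pauses on large executor heaps than the default collector.
  .config("spark.executor.extraJavaOptions",
    "-XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35")
  // Event logs let the Spark UI / history server show spills, GC time and task memory.
  .config("spark.eventLog.enabled", "true")
  .config("spark.eventLog.dir", "/tmp/spark-events")    // placeholder directory
  .getOrCreate()
```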