Apache Spark Logging

Posted by Kaya Kupferschmidt • Saturday, December 13, 2014 • Category: Programming
I have just begun learning Apache Spark, a great tool for Big Data processing. But when I start the spark-shell, I get lots and lots of logging output, which I find really annoying:
kaya@dvorak:/opt/kaya$ spark-shell 
2014-12-13 17:59:59,652 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing view acls to: kaya
2014-12-13 17:59:59,657 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing modify acls to: kaya
2014-12-13 17:59:59,657 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kaya); users with modify permissions: Set(kaya)
2014-12-13 17:59:59,658 INFO  [main] spark.HttpServer (Logging.scala:logInfo(59)) - Starting HTTP Server
2014-12-13 17:59:59,712 INFO  [main] server.Server (Server.java:doStart(272)) - jetty-8.y.z-SNAPSHOT
2014-12-13 17:59:59,736 INFO  [main] server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SocketConnector@0.0.0.0:41602
2014-12-13 17:59:59,736 INFO  [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'HTTP class server' on port 41602.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
2014-12-13 18:00:03,792 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing view acls to: kaya
2014-12-13 18:00:03,793 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing modify acls to: kaya
2014-12-13 18:00:03,793 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kaya); users with modify permissions: Set(kaya)
2014-12-13 18:00:04,193 INFO  [sparkDriver-akka.actor.default-dispatcher-2] slf4j.Slf4jLogger (Slf4jLogger.scala:applyOrElse(80)) - Slf4jLogger started
2014-12-13 18:00:04,229 INFO  [sparkDriver-akka.actor.default-dispatcher-2] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Starting remoting
2014-12-13 18:00:04,416 INFO  [sparkDriver-akka.actor.default-dispatcher-3] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting started; listening on addresses :[akka.tcp://sparkDriver@dvorak.ffm.dimajix.net:54519]
2014-12-13 18:00:04,418 INFO  [sparkDriver-akka.actor.default-dispatcher-4] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting now listens on addresses: [akka.tcp://sparkDriver@dvorak.ffm.dimajix.net:54519]
2014-12-13 18:00:04,425 INFO  [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'sparkDriver' on port 54519.
2014-12-13 18:00:04,439 INFO  [main] spark.SparkEnv (Logging.scala:logInfo(59)) - Registering MapOutputTracker
[...]
scala> 
I searched the net for a hint on how to get rid of all those INFO messages, but most of the advice I found didn't quite work. Finally I found a way to calm down the output of spark-shell for the current user. You need to create a file called log4j.properties (or any other name) and store it in a convenient location. I put mine into my Linux home directory at /home/kaya/log4j.properties. The file should contain the following:
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
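The last four lines show the general pattern: individual loggers can be raised or lowered independently of the WARN root level set in the first line. As a purely hypothetical example, if you later wanted Spark's scheduler progress messages back while keeping everything else quiet, one extra per-logger entry would do it:

# Hypothetical extra entry (not part of my setup): bring back INFO output
# for one package while the root level stays at WARN
log4j.logger.org.apache.spark.scheduler=INFO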
That was the easy part. Now you need to tell spark-shell to actually use this file for its logging configuration. This is done by setting the environment variable SPARK_SUBMIT_OPTS to -Dlog4j.configuration=file:/home/kaya/log4j.properties, for example in bash:
export SPARK_SUBMIT_OPTS=-Dlog4j.configuration=file:/home/kaya/log4j.properties
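To make the setting permanent, the export can go into a shell startup file; a minimal sketch, assuming bash is the login shell and the file lives at /home/kaya/log4j.properties:

# ~/.bash_profile (sketch): set the option on every login so that
# spark-shell always picks up the custom log4j configuration
export SPARK_SUBMIT_OPTS=-Dlog4j.configuration=file:/home/kaya/log4j.properties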
I simply added the line to my .bash_profile, so that the environment variable gets set every time I log into my computer. And now Spark starts as follows:
kaya@dvorak:/opt/kaya$ spark-shell 
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
14/12/13 18:09:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as sc.
Much better. Plus I now see an important warning which I need to address...
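As a side note, the log level can also be adjusted from inside a running spark-shell session by talking to Log4J directly. This is only a sketch, assuming the Log4J 1.x backend that Spark 1.1.0 ships with, and it only affects the current session:

// Run inside spark-shell: raise the threshold for selected loggers at runtime
import org.apache.log4j.{Level, Logger}

Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
Logger.getLogger("org.eclipse.jetty").setLevel(Level.WARN)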
