Apache Spark Logging

Posted by Kaya Kupferschmidt • Saturday, December 13. 2014 • Category: Programming
I just began learning about Apache Spark, a great tool for Big Data processing. But when I start the spark-shell, I get lots and lots of logging output, which is really annoying to me:

kaya@dvorak:/opt/kaya$ spark-shell 
2014-12-13 17:59:59,652 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing view acls to: kaya
2014-12-13 17:59:59,657 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing modify acls to: kaya
2014-12-13 17:59:59,657 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kaya); users with modify permissions: Set(kaya)
2014-12-13 17:59:59,658 INFO  [main] spark.HttpServer (Logging.scala:logInfo(59)) - Starting HTTP Server
2014-12-13 17:59:59,712 INFO  [main] server.Server (Server.java:doStart(272)) - jetty-8.y.z-SNAPSHOT
2014-12-13 17:59:59,736 INFO  [main] server.AbstractConnector (AbstractConnector.java:doStart(338)) - Started SocketConnector@0.0.0.0:41602
2014-12-13 17:59:59,736 INFO  [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'HTTP class server' on port 41602.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
2014-12-13 18:00:03,792 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing view acls to: kaya
2014-12-13 18:00:03,793 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - Changing modify acls to: kaya
2014-12-13 18:00:03,793 INFO  [main] spark.SecurityManager (Logging.scala:logInfo(59)) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(kaya); users with modify permissions: Set(kaya)
2014-12-13 18:00:04,193 INFO  [sparkDriver-akka.actor.default-dispatcher-2] slf4j.Slf4jLogger (Slf4jLogger.scala:applyOrElse(80)) - Slf4jLogger started
2014-12-13 18:00:04,229 INFO  [sparkDriver-akka.actor.default-dispatcher-2] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Starting remoting
2014-12-13 18:00:04,416 INFO  [sparkDriver-akka.actor.default-dispatcher-3] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting started; listening on addresses :[akka.tcp://sparkDriver@dvorak.ffm.dimajix.net:54519]
2014-12-13 18:00:04,418 INFO  [sparkDriver-akka.actor.default-dispatcher-4] Remoting (Slf4jLogger.scala:apply$mcV$sp(74)) - Remoting now listens on addresses: [akka.tcp://sparkDriver@dvorak.ffm.dimajix.net:54519]
2014-12-13 18:00:04,425 INFO  [main] util.Utils (Logging.scala:logInfo(59)) - Successfully started service 'sparkDriver' on port 54519.
2014-12-13 18:00:04,439 INFO  [main] spark.SparkEnv (Logging.scala:logInfo(59)) - Registering MapOutputTracker
[...]
scala> 

I searched the net for a hint on how to get rid of all those INFO messages, but most of the advice didn't quite work. Finally I found a way to quiet the output of spark-shell for the current user. You need to create a file called log4j.properties (or any other name) and store it in a convenient location. I put mine into my Linux home directory as /home/kaya/log4j.properties. The file should contain the following:
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

That was the easy part. Now you need to tell spark-shell to actually use this file for its logging configuration. This is done by setting the environment variable SPARK_SUBMIT_OPTS to -Dlog4j.configuration=file:/home/kaya/log4j.properties. In bash, for example:
export SPARK_SUBMIT_OPTS=-Dlog4j.configuration=file:/home/kaya/log4j.properties
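
Both steps could also be scripted in one go. The following is only a sketch: it assumes bash, assumes that overwriting an existing file at the chosen location is acceptable, and uses a made-up variable name LOG4J_FILE (not something Spark itself reads) that defaults to log4j.properties in the home directory.

```shell
#!/usr/bin/env bash
# Sketch: write the logging configuration from this post and point spark-shell at it.
# LOG4J_FILE is a hypothetical helper variable; it must hold an absolute path.
LOG4J_FILE="${LOG4J_FILE:-$HOME/log4j.properties}"

# The quoted 'EOF' keeps the shell from expanding the $ signs in the class names.
cat > "$LOG4J_FILE" <<'EOF'
# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
EOF

# Tell spark-shell (via spark-submit) which log4j configuration to load.
export SPARK_SUBMIT_OPTS="-Dlog4j.configuration=file:$LOG4J_FILE"
```

The script has to be sourced rather than executed if the exported variable should survive in the current shell.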

I simply added the export line to my .bash_profile file, so that the environment variable gets set every time I log into my computer. And now Spark starts as follows:
kaya@dvorak:/opt/kaya$ spark-shell 
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Type :help for more information.
14/12/13 18:09:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context available as sc.

Much better. Plus I now see an important warning which I need to address...

Disable DHCPv6 on AVM Fritzbox

Posted by Kaya Kupferschmidt • Wednesday, January 25. 2012 • Category: Hardware
If you own a FritzBox router from AVM and use IPv6, this might be interesting for you. If IPv6 is enabled, all clients will get an IPv6 DNS server from the router. Although this might seem to be a nice feature, it creates problems if you run your own DNS server for your local net. All Windows clients will first ask the IPv6 DNS server announced by the FritzBox, and only then ask other IPv4 DNS servers. This is especially bad if you configured some hostnames in your own DNS server differently for your local net than for the internet (which makes sense if you run a server in your net that is also accessible from the internet). In such situations you really want to get rid of the DNS server announced by the FritzBox.

Unfortunately this is not possible from the GUI, but you can disable DHCPv6 (which is used for the announcement) by changing a config file on the FritzBox. To do so:

1. Enable telnet via #96*7*.
2. Log in to your FritzBox with telnet fritz.box (or whatever address the FritzBox has in your LAN)
3. # cd /var/flash
4. # nvi ar7.cfg
5. Change the setting dhcpv6lanmode to dhcpv6lanmode_off_stateless
6. Disable telnet via #96*8*
7. Reboot the FritzBox

This should completely turn off the DHCPv6 server in the FritzBox.

Serviio DLNA Server on Debian

Posted by Kaya Kupferschmidt • Wednesday, December 28. 2011 • Category: Hardware
If you want to share your media collection (that is music, videos and pictures) in your LAN with multimedia devices like tablets, smartphones, TVs and consoles, you end up using either DLNA or UPnP. Because my devices support DLNA, I decided to try installing a DLNA service on a Debian server. Googling around, I found several implementations, of which the Serviio media server looked most promising. Being implemented in Java, it surely uses more resources than a native C/C++ implementation, but it offers some nice features like plugins and device profiles. It also offers a pure server implementation without a GUI, which was very important to me for running it on a headless server.


Mercurial, finally!

Posted by Kaya Kupferschmidt • Saturday, August 13. 2011
When I started to work as a software developer, still during the time when I was studying, I made first contact with a source control system. Of course it was the highly respected Microsoft Source Safe. For me as a fresh developer, this was something really new, and I immediately started to like it. I even started to use it for some private projects and immediately saw the benefit of using it, even if you are the sole person working on a project. Some years later, I gained experience with the old and famous CVS, only to see it replaced by the far better new kid on the block called Subversion.

That was really something nice: support for branching, atomic commits, nice integration in all relevant IDEs and of course TortoiseSVN, which became the de-facto standard for accessing Subversion repositories on Windows. There was even a counterpart for CVS called TortoiseCVS. Everything was much better than with Source Safe or CVS.

But after some time, it became clear that branching (one of the best-selling features of Subversion) just doesn't work, simply because merging doesn't work. With those problems in mind, even Linus Torvalds said that "Subversion is the most pointless project ever started". He was working on something better called Git, which is a distributed source control system. At more or less the same time a second project called Mercurial was started with the same ideas as Git. Both of them work in a completely distributed manner, such that everyone has a copy of the complete repository including all the history. Of course in this situation merging becomes an essential operation, and that is the reason why they do it so much better than Subversion. Without robust merging and branch tracking a distributed version control system simply wouldn't work.

So today I finally made the switch to Mercurial with my private project, after having happily used Subversion for several years. Luckily it is quite easy to convert a Subversion repository to a Mercurial repository. I also chose Bitbucket as a public hosting platform, so from now on everyone is invited to clone the Magnum repository which is available at https://bitbucket.org/dimajix/magnum.
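
For reference, such a conversion can be done with the convert extension that ships with Mercurial. The sketch below is not necessarily the procedure used for Magnum. The function name svn_to_hg and the URL in the usage comment are made up for illustration, and it assumes that hg is installed together with the Subversion Python bindings the extension needs.

```shell
#!/usr/bin/env bash
# Sketch: convert a Subversion repository into a new Mercurial repository.
# svn_to_hg is a hypothetical helper name. "convert" is an extension bundled
# with Mercurial; it is enabled here for this single invocation only.
svn_to_hg() {
    local svn_source="$1"   # URL or path of the Subversion repository
    local hg_dest="$2"      # directory for the new Mercurial repository
    hg --config extensions.convert= convert "$svn_source" "$hg_dest"
}

# Usage (placeholder URL):
#   svn_to_hg http://svn.example.invalid/project project-hg
```

After the conversion the new repository can be pushed to a hosting platform like any other Mercurial repository.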

Looking back, I think it was still the right decision to move to Subversion first and only later to Mercurial/Git, simply because those projects weren't up to speed at that time.

Add NIS Client Support to ReadyNAS

Posted by Kaya Kupferschmidt • Saturday, February 21. 2009 • Category: Hardware
This guide is about how to set up probably any ReadyNAS device to act as a NIS/YP client. NIS/YP is a protocol that shares account information across the network. In such an environment it is important that the ReadyNAS knows about all Linux and Windows accounts, so it can keep access rights on files in sync. If users had different numerical IDs on Linux clients and on the ReadyNAS, all files created from these clients wouldn't be accessible on Windows machines any more, because the ReadyNAS wouldn't know which account the files belong to.

On Windows there is already a powerful solution, called Active Directory. This is already supported on the ReadyNAS, but there is no support for the corresponding UNIX protocol, which is NIS. Having a central account authority which manages both Windows and Linux accounts via Active Directory and NIS is very helpful in such mixed environments.


Phun with Physics

Posted by Kaya Kupferschmidt • Wednesday, January 21. 2009 • Category: General

While browsing on OpenGL.org, I found a really nice educational (?) 2D physics simulator called Phun. The physics engine is a commercial multibody simulator, which seems to be new in the physics scene (at least I had never heard of AgX before).

And now, don't waste your time on my blog, and grab Phun here.

A Simple Sidebar