I've recently been doing a lot of work with stack traces for a project, and I've had what I sometimes call a "BGO" - Blinding Glimpse of the Obvious: getting stack trace information in Java is dead slow.
JDK 1.4 introduced StackTraceElement and the getStackTrace() method on Throwable, and Java 5 added getStackTrace() to Thread. Both return an array of StackTraceElement objects. This is way better than what had to be done previously, which was to generate a stack trace dump to a String-backed output stream and parse the dump. Underneath it all, though (and you can see this if you download the source for the JDK), getting the trace for the current thread relies on native code (fillInStackTrace()) to capture the actual stack information. This is because threads are implemented natively using platform-dependent code that varies not only by platform but also by JVM vendor and version.
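For reference, here is a minimal sketch of the two standard ways to get the current stack as StackTraceElement objects. The class and method names (StackTraceDemo, viaThread, viaThrowable) are made up for illustration; only the JDK calls themselves are real.

```java
class StackTraceDemo {
    /** Current thread's stack via Thread.getStackTrace() (Java 5 and later). */
    static StackTraceElement[] viaThread() {
        return Thread.currentThread().getStackTrace();
    }

    /** Current stack via a freshly created Throwable (JDK 1.4 and later). */
    static StackTraceElement[] viaThrowable() {
        // The Throwable constructor captures the stack; nothing is thrown.
        return new Throwable().getStackTrace();
    }

    public static void main(String[] args) {
        // Print each frame as class.method:line, innermost frame first.
        for (StackTraceElement frame : viaThrowable()) {
            System.out.println(frame.getClassName() + "." + frame.getMethodName()
                    + ":" + frame.getLineNumber());
        }
    }
}
```

With the Throwable version, the first element is the frame that created the Throwable. The Thread version may include the getStackTrace() call itself at the top, so code that counts frames should not assume the two arrays line up.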
As it turns out, this native code, at least on my machine, is very slow. How slow? Rather than tell you in terms of milliseconds, I'll describe it in terms of work accomplished.
In slightly less than the time it took to get the stack trace for the current thread, I was able to iterate through all the elements of the returned array of StackTraceElement objects, use Class.forName() on each one to get the class's definition through reflection, test whether it was type-assignable to another class I was interested in, and check whether the method had the right name, all the while counting how far up the call stack I had to go before I found a match. Once I found one, I looked in a Map of Strings (package name) to another Map of Strings (class name) to yet another Map of Strings (method name) to an array of ints (creating any necessary structures along the way), and incremented the appropriate int found at the innermost level.
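To give a sense of how much work that is, here is a rough sketch of that kind of processing. It is not my original code: the class name CallSiteCounter and its methods are hypothetical, each int array is assumed to hold a single counter, and computeIfAbsent needs Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

class CallSiteCounter {
    // package name -> class name -> method name -> one-element counter array
    private final Map<String, Map<String, Map<String, int[]>>> counts = new HashMap<>();

    /**
     * Finds the first frame whose class is assignable to targetType and whose
     * method is named methodName, increments its counter, and returns how far
     * up the stack it was found (or -1 if no frame matches).
     */
    int record(StackTraceElement[] frames, Class<?> targetType, String methodName) {
        for (int depth = 0; depth < frames.length; depth++) {
            StackTraceElement frame = frames[depth];
            Class<?> frameClass;
            try {
                frameClass = Class.forName(frame.getClassName());
            } catch (ClassNotFoundException | LinkageError e) {
                continue; // class not loadable from here; skip this frame
            }
            if (targetType.isAssignableFrom(frameClass)
                    && methodName.equals(frame.getMethodName())) {
                increment(frameClass.getName(), methodName);
                return depth;
            }
        }
        return -1;
    }

    /** Splits the class name into package and simple name, then bumps the counter. */
    private void increment(String className, String methodName) {
        int dot = className.lastIndexOf('.');
        String pkg = dot < 0 ? "" : className.substring(0, dot);
        String cls = className.substring(dot + 1);
        counts.computeIfAbsent(pkg, k -> new HashMap<>())
              .computeIfAbsent(cls, k -> new HashMap<>())
              .computeIfAbsent(methodName, k -> new int[1])[0]++;
    }

    /** Returns the recorded count, or 0 if nothing was recorded for that method. */
    int count(String pkg, String cls, String methodName) {
        Map<String, Map<String, int[]>> byClass = counts.get(pkg);
        if (byClass == null) return 0;
        Map<String, int[]> byMethod = byClass.get(cls);
        if (byMethod == null) return 0;
        int[] counter = byMethod.get(methodName);
        return counter == null ? 0 : counter[0];
    }
}
```

A call such as record(new Throwable().getStackTrace(), SomeBaseClass.class, "someMethod") would exercise the whole path, where SomeBaseClass and someMethod stand in for whatever type and method you care about.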
This was even before I did any optimization to the code that processed the stack trace.
So, what I learned here was that when the Log4J manual tells you that getting method and line number information for your log.debug() statements is slow, they mean it. Any time you frequently get stack traces, through exceptions, Thread.dumpStack(), or Thread.currentThread().getStackTrace() -- particularly inside tight loops -- you are asking for a huge degradation in performance.
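If you want to see the effect on your own machine, a crude timing sketch along these lines will do. The class name StackTraceCost, the time helper, and the iteration count are all arbitrary choices for illustration, and this is not a rigorous benchmark (no warm-up, no JIT controls), so treat the output as a rough indication only.

```java
class StackTraceCost {
    // Written to on every iteration so the work cannot be optimized away.
    static int sink;

    /** Runs the task the given number of times and returns elapsed nanoseconds. */
    static long time(int iterations, Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 100_000;

        // Baseline: a loop body that does almost nothing.
        long baseline = time(iterations, () -> sink += 1);

        // Same loop, but capturing a stack trace on every pass.
        long withTrace = time(iterations,
                () -> sink += new Throwable().getStackTrace().length);

        System.out.println("baseline:   " + baseline / 1_000_000.0 + " ms");
        System.out.println("with trace: " + withTrace / 1_000_000.0 + " ms");
    }
}
```

The numbers will vary with the JVM, its version, and how deep the call stack is at the point of capture.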