[sakai-dev] GC memory leak with java.lang.ref.Finalizer / BaseAnnouncementService

classic Classic list List threaded Threaded
5 messages Options
Austin Nakasone Austin Nakasone
Reply | Threaded
Open this post in threaded view
|

[sakai-dev] GC memory leak with java.lang.ref.Finalizer / BaseAnnouncementService

Hello Sakai Devs,

Yesterday we experienced an issue where GC was looping over and over causing some performance issues on one of our servers.

We're running:

Sakai 11.4
Tomcat 8.0.47
Java 1.8.0_151
MySQL 5.6
RHEL 7

with these Java opts (these aren't all of them, but the ones that are mem. related):

-Xmx10280m 
-Xms10280m 
-XX:MaxMetaspaceSize=1024m 
-Xmn2500m 
-XX:+UseConcMarkSweepGC 
-XX:+UseParNewGC 
-XX:CMSInitiatingOccupancyFraction=80 
-XX:+DisableExplicitGC 
-XX:+DoEscapeAnalysis 
-verbose:gc 
-XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps 

The error in the logs that keep looping over and over are:

47863.032: [CMS-concurrent-preclean: 5.037/5.037 secs] [Times: user=5.35 sys=0.04, real=5.04 secs] 
47863.032: [CMS-concurrent-abortable-preclean-start]
47863.032: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
47863.033: [GC (CMS Final Remark) [YG occupancy: 1542954 K (2304000 K)]47863.033: [Rescan (parallel) , 0.3890201 secs]47863.423: [weak refs processing, 0.0000592 secs]47863.423: [class unloading, 0.2006707 secs]47863.623: [scrub symbol table, 0.0485713 secs]47863.672: [scrub string table, 0.0086044 secs][1 CMS-remark: 7966719K(7966720K)] 9509674K(10270720K), 0.6472521 secs] [Times: user=2.06 sys=0.02, real=0.65 secs] 
47863.682: [CMS-concurrent-sweep-start]
47865.226: [CMS-concurrent-sweep: 1.544/1.544 secs] [Times: user=1.58 sys=0.01, real=1.54 secs] 
47865.226: [CMS-concurrent-reset-start]
47865.241: [CMS-concurrent-reset: 0.015/0.015 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
06-Feb-2018 16:40:56.125 DEBUG [ajp-bio-8009-exec-309] edu.gatech.sakai.user.impl.GaTechUserDirectoryServiceImpl.authorizeUserLogin authorizeUserLogin; user.eid; gasconm; user.email: [hidden email]
47867.243: [GC (CMS Initial Mark) [1 CMS-initial-mark: 7966695K(7966720K)] 9965851K(10270720K), 0.3508320 secs] [Times: user=0.93 sys=0.00, real=0.35 secs] 
47867.594: [CMS-concurrent-mark-start]
47871.695: [CMS-concurrent-mark: 4.099/4.101 secs] [Times: user=8.48 sys=0.03, real=4.10 secs] 
47871.695: [CMS-concurrent-preclean-start]
47871.739: [CMS-concurrent-preclean: 0.041/0.044 secs] [Times: user=0.05 sys=0.00, real=0.05 secs] 
47871.739: [CMS-concurrent-abortable-preclean-start]
47871.739: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
47871.740: [GC (CMS Final Remark) [YG occupancy: 2083498 K (2304000 K)]47871.741: [Rescan (parallel) , 0.5782041 secs]47872.319: [weak refs processing, 0.0000354 secs]47872.319: [class unloading, 0.1000239 secs]47872.419: [scrub symbol table, 0.0407029 secs]47872.460: [scrub string table, 0.0067250 secs][1 CMS-remark: 7966695K(7966720K)] 10050194K(10270720K), 0.7260444 secs] [Times: user=2.95 sys=0.01, real=0.72 secs] 

While the problem was happening I took a heap dump with jmap and upon analyzing it with Eclipse Memory Analyzer, it said that there was a leak with

java.lang.ref.Finalizer

and it was taking up 61% of the heap.  Digging a little deeper, the first 10 or 20 of those objects (out of 8million?) were coming from

BaseAnnouncementService

and of those 10 or 20 they all seemed to coming from the same site, which appeared to have an announcement that included an <img> tag(s) with base64 image data.  I'm not sure if the content of the announcement had anything to do with it or if the instructor did something specific and the content is irrelevant. 

Has anyone seen anything like this?  Is it a memory leak in the Announcement tool?  Or do we need to tune our Memory settings to improve the GC?

Thanks,

Austin

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.

memory_leak1.png (131K) Download Attachment
memory_leak2.png (211K) Download Attachment
memory_leak3.png (536K) Download Attachment
Austin Nakasone Austin Nakasone
Reply | Threaded
Open this post in threaded view
|

Re: [sakai-dev] GC memory leak with java.lang.ref.Finalizer / BaseAnnouncementService

Thanks Hendrik!

On Wed, Feb 7, 2018 at 4:01 PM, Hendrik Steller <[hidden email]> wrote:
On Donnerstag, 8. Februar 2018 02:04:16 CET Austin wrote:

> Creating a "secret JSP" that runs System.runFinalization() sounds
> interesting, but I might need a little help getting started with that.

1. Copy the code below into a file with a top secret name of your choice
ending in ".jsp"  (e.g.  "thats_totally_secure.jsp")

2. Copy that file into the root folder of a webapp of your choice (e.g.
tomcat/webapps/foobar)

Now  you (well, and everyone else..) can call the JSP directly by opening the
URL http://server/foobar/thats_totally_secure.jsp
Or, if you copy the file into tomcat/webapps/ROOT, you can call it via
http://server/thats_totally_secure.jsp

(3.)
Delete the file again / move it out of the webapp's folder once you don't need
it anymore.

Hendrik

Code (also attached):


<%@page import="java.util.Date"%>
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<%!
        private static Date lastFinalize;
        private static Date lastGC;
%>
<%
        if ("post".equalsIgnoreCase(request.getMethod())) {
                final String action = request.getParameter("whatdo");
                if ("finalize".equals(action)) {
                        lastFinalize = new Date();
                        System.runFinalization();
                }
                else if ("gc".equals(action)) {
                        lastGC = new Date();
                        System.gc();
                }
        }
%>
<html>
        <head><title>Top secret JSP</title></head>
        <body>
                <div>
                        <form action="" method="POST">
                                <span>Last manual finalize run: <%=lastFinalize != null ?
lastFinalize : "never"%></span> <input type="submit"
                                        value="Run finalize"> <input type="hidden" name="whatdo"
value="finalize">
                        </form>
                </div>
                <div>
                        <form action="" method="POST">
                                <span>Last manual garbage collection: <%=lastGC != null ? lastGC :
"never"%></span> <input type="submit" value="Force GC">
                                <input type="hidden" name="whatdo" value="gc">
                        </form>
                </div>
        </body>
</html>

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.
Austin Nakasone Austin Nakasone
Reply | Threaded
Open this post in threaded view
|

Re: [sakai-dev] GC memory leak with java.lang.ref.Finalizer / BaseAnnouncementService

Does anyone know which comes first (or how I can find out) ... 

if the heap fills up with Finalizer objects, then the [CMS-concurrent-reset-start] is unable to process them causing it to keep looping?

Or...

does the [CMS-concurrent-reset-start] fail first (for some other reason), making the heap fill up with Finalizer objects?

On Wed, Feb 7, 2018 at 4:15 PM, Austin <[hidden email]> wrote:
Thanks Hendrik!

On Wed, Feb 7, 2018 at 4:01 PM, Hendrik Steller <[hidden email]> wrote:
On Donnerstag, 8. Februar 2018 02:04:16 CET Austin wrote:

> Creating a "secret JSP" that runs System.runFinalization() sounds
> interesting, but I might need a little help getting started with that.

1. Copy the code below into a file with a top secret name of your choice
ending in ".jsp"  (e.g.  "thats_totally_secure.jsp")

2. Copy that file into the root folder of a webapp of your choice (e.g.
tomcat/webapps/foobar)

Now  you (well, and everyone else..) can call the JSP directly by opening the
URL http://server/foobar/thats_totally_secure.jsp
Or, if you copy the file into tomcat/webapps/ROOT, you can call it via
http://server/thats_totally_secure.jsp

(3.)
Delete the file again / move it out of the webapp's folder once you don't need
it anymore.

Hendrik

Code (also attached):


<%@page import="java.util.Date"%>
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<%!
        private static Date lastFinalize;
        private static Date lastGC;
%>
<%
        if ("post".equalsIgnoreCase(request.getMethod())) {
                final String action = request.getParameter("whatdo");
                if ("finalize".equals(action)) {
                        lastFinalize = new Date();
                        System.runFinalization();
                }
                else if ("gc".equals(action)) {
                        lastGC = new Date();
                        System.gc();
                }
        }
%>
<html>
        <head><title>Top secret JSP</title></head>
        <body>
                <div>
                        <form action="" method="POST">
                                <span>Last manual finalize run: <%=lastFinalize != null ?
lastFinalize : "never"%></span> <input type="submit"
                                        value="Run finalize"> <input type="hidden" name="whatdo"
value="finalize">
                        </form>
                </div>
                <div>
                        <form action="" method="POST">
                                <span>Last manual garbage collection: <%=lastGC != null ? lastGC :
"never"%></span> <input type="submit" value="Force GC">
                                <input type="hidden" name="whatdo" value="gc">
                        </form>
                </div>
        </body>
</html>


--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.
Austin Nakasone Austin Nakasone
Reply | Threaded
Open this post in threaded view
|

Re: [sakai-dev] GC memory leak with java.lang.ref.Finalizer / BaseAnnouncementService

In reply to this post by Austin Nakasone
Hello Matthew,

Sorry, I'm new to MAT.  I'm not sure if I'm looking in the right place...

The Finalizer thread stats show:

Type|Name       |Value
-----------------------------------------------------------
ref |lock       |java.lang.Object @ 0x5e0e9aad0
ref |unfinalized|java.lang.ref.Finalizer @ 0x55ab72458
ref |queue      |java.lang.ref.ReferenceQueue @ 0x5d9c07f90
-----------------------------------------------------------

Also the thread details says that the thread name is NULL.  I'm not sure if that's expected?

Thanks,

Austin


On Thu, Feb 8, 2018 at 9:05 AM, Matthew Buckett <[hidden email]> wrote:
Are you able to look at a thread dump and see the state of the finalizer thread(s), it sounds like they may have become stuck and although the the announcement objects are in the queue are they the ones currently being processed?

As Hendrik mentioned Sakai does lots of scary stuff in finalizers (like making database calls) so this could well be where it's got stuck.

On 8 February 2018 at 18:20, Austin <[hidden email]> wrote:
Does anyone know which comes first (or how I can find out) ... 

if the heap fills up with Finalizer objects, then the [CMS-concurrent-reset-start] is unable to process them causing it to keep looping?

Or...

does the [CMS-concurrent-reset-start] fail first (for some other reason), making the heap fill up with Finalizer objects?

On Wed, Feb 7, 2018 at 4:15 PM, Austin <[hidden email]> wrote:
Thanks Hendrik!

On Wed, Feb 7, 2018 at 4:01 PM, Hendrik Steller <[hidden email]> wrote:
On Donnerstag, 8. Februar 2018 02:04:16 CET Austin wrote:

> Creating a "secret JSP" that runs System.runFinalization() sounds
> interesting, but I might need a little help getting started with that.

1. Copy the code below into a file with a top secret name of your choice
ending in ".jsp"  (e.g.  "thats_totally_secure.jsp")

2. Copy that file into the root folder of a webapp of your choice (e.g.
tomcat/webapps/foobar)

Now  you (well, and everyone else..) can call the JSP directly by opening the
URL http://server/foobar/thats_totally_secure.jsp
Or, if you copy the file into tomcat/webapps/ROOT, you can call it via
http://server/thats_totally_secure.jsp

(3.)
Delete the file again / move it out of the webapp's folder once you don't need
it anymore.

Hendrik

Code (also attached):


<%@page import="java.util.Date"%>
<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<%!
        private static Date lastFinalize;
        private static Date lastGC;
%>
<%
        if ("post".equalsIgnoreCase(request.getMethod())) {
                final String action = request.getParameter("whatdo");
                if ("finalize".equals(action)) {
                        lastFinalize = new Date();
                        System.runFinalization();
                }
                else if ("gc".equals(action)) {
                        lastGC = new Date();
                        System.gc();
                }
        }
%>
<html>
        <head><title>Top secret JSP</title></head>
        <body>
                <div>
                        <form action="" method="POST">
                                <span>Last manual finalize run: <%=lastFinalize != null ?
lastFinalize : "never"%></span> <input type="submit"
                                        value="Run finalize"> <input type="hidden" name="whatdo"
value="finalize">
                        </form>
                </div>
                <div>
                        <form action="" method="POST">
                                <span>Last manual garbage collection: <%=lastGC != null ? lastGC :
"never"%></span> <input type="submit" value="Force GC">
                                <input type="hidden" name="whatdo" value="gc">
                        </form>
                </div>
        </body>
</html>


--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.



--
  Matthew Buckett, Senior VLE Developer

  Systems and Applications, Technology Enhanced Learning
  IT Services, University of Oxford
  13 Banbury Road, OX2 6NN
  Tel: 01865 283349

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.

thread detail.png (113K) Download Attachment
thread stats.png (269K) Download Attachment
Mark Triggs Mark Triggs
Reply | Threaded
Open this post in threaded view
|

Re: [sakai-dev] GC memory leak with java.lang.ref.Finalizer / BaseAnnouncementService

Hi Austin,

Have you been able to get a thread dump out of Tomcat when it has run out of memory? I think that might be the missing piece here (I read back through the archives and didn't see one, but apologies if I missed it).

When an object with a finalize() method is to be garbage collected, a java.lang.ref.Finalizer object is created for it and added to a queue. That queue is processed by a "Finalizer" background thread, which takes care of running the appropriate finalize() method for each of those objects.

All it takes is for one object to have a badly implemented finalize() method for the whole thing to come crashing down. As a perverse example, creating an object with a method like:

 class UhOh {
     protected void finalize() throws Throwable {
         while (true) {}
     }
 }

is going to permanently lock up that Finalizer thread. At that point you're leaking memory and it's only a matter of time until you get an OOM error (which may take a while to manifest if you're running with a large heap).

The important thing here is that the counts of objects waiting to be finalized don't necessarily tell you the root cause—there may just be millions of ready-to-be-collect object stuck behind a single bad object, so you've got to find the needle in the haystack. My above example is a bit contrived, but I've seen this happen with single LDAP connections deadlocking, for instance.

If you can get a thread dump from the JVM, that will let you see what the Finalizer thread is actually doing, and hopefully give a hint as to what's blocking the queue. While the JVM is running, you can do:

 kill -3 <java pid>

and it will log a thread dump to your catalina.out file. Look for an entry called "Finalizer"--something like this:

 "Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007faac04d9000 nid=0x347a in Object.wait() [0x00007faaadf15000]
    java.lang.Thread.State: WAITING (on object monitor)
         at java.lang.Object.$$YJP$$wait(Native Method)
         at java.lang.Object.wait(Object.java)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
         - locked <0x00000000800466d0> (a java.lang.ref.ReferenceQueue$Lock)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

Mine's currently doing nothing, which in a healthy system is probably a good thing. If yours looks different, try running multiple thread dumps a few seconds apart and see how they change. That should give more an idea of where that thread is spending its time.

Cheers,

Mark

Austin <[hidden email]> writes:

Hello All,

Sorry to keep bugging you all with this problem. But here's some additional info from our heap dumps.

After comparing the java.lang.ref.Finalizer object between a bad and a normal heap dump (e.g. when the system is running normally), the number objects retained in Finalizer is inflated by the thousands (note these are just a handful of objects, I haven't listed them all). However, that makes sense because Sam Ottenhoff said,

*"If a "Finalizer" thread cannot keep up with the rate at which higher priority threads cause finalizable objects to be queued, the finalizer queue will keep growing and cause the Java heap to fill up. Eventually the Java heap will get exhausted and a java.lang.OutOfMemoryError will be thrown"*

However, we don't have an OutOfMemoryError when this happens, the GC just keeps looping over and over, but you can see below how many more objects are in the Finalizer when the problem happens.

 +-----------+----------+------------------------------------
 ----------------------------------------------------+
 |  BAD GC   |  GOOD GC |
 | #objects  | #objects |  Class
 +-----------+----------+------------------------------------
 ----------------------------------------------------+
 |   245,941 |   18,875 |  org.sakaiproject.announcement.impl.BaseAnnouncementService$BaseAnnouncementMessageEdit
 | 2,171,252 |    6,289 |  org.sakaiproject.content.impl.BaseContentService$BaseCollectionEdit
 |   444,086 |    2,443 |  org.sakaiproject.content.impl.BaseContentService$BaseResourceEdit
 |   248,318 |      200 |  java.util.zip.ZipFile\$ZipFileInputStream
 |   246,652 |    1,715 |  java.util.zip.Inflater
 |   248,030 |      200 |  java.util.zip.ZipFile\$ZipFileInflaterInputStream
 |   242,115 |      207 |  sun.net.www.protocol.jar.URLJarFile
 |    39,572 |       18 |  com.sun.crypto.provider.PBEKey
 |   341,342 |    3,236 |  java.util.jar.JarFile
 |   103,226 |   57,231 |  org.sakaiproject.user.impl.BaseUserDirectoryService$BaseUserEdit
 |    92,685 |       49 |  java.io.FileInputStream
 |     5,780 |        0 |  com.novell.ldap.LDAPConnection
 |    67,453 |       11 |  org.apache.commons.fileupload.disk.DiskFileItem
 +-----------+----------+------------------------------------
 ------------------------------------------------------+

The 2 million org.sakaiproject.content.impl.BaseContentService$BaseCollectionEdit objects above seemed suspicious, so looking into *that* object, there were 400K+ instances each of Content Collection objects that referenced "/" and "/group/". And normal site references seemed to have 1000s of references more than usual. (see screenshots). However, I wouldn't know if these increased number of "/" and "/group/" objects is *CAUSING* the Finalizer to keep growing, or if it's a *SYMPTOM* of it growing.

We also tried adding these to our JAVA config, but the problem still happened (I didn't get a chance to take a heap dump when the prob. happened with these settings)

-XX:CMSMaxAbortablePrecleanTime=10000
-XX:+CMSScavengeBeforeRemark

Thanks,

Austin

-- 
Mark Triggs
<[hidden email]>

--
You received this message because you are subscribed to the Google Groups "Sakai Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].
Visit this group at https://groups.google.com/a/apereo.org/group/sakai-dev/.