Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

The takeaway from this article is that, if you run Jira or ConfluenceServer, there is a config tweak you really ought to make, to allow the use of GCViewer to debug any future garbage collection problems:

For Jira,

...

edit /opt/atlassian/jira/bin/set-gc-params.sh and add

...

make this change:

Code Block
diff --git a/bin/set-gc-params.sh b/bin/set-gc-params.sh
--- a/bin/set-gc-params.sh
+++ b/bin/set-gc-params.sh
@@ -6,7 +6,7 @@
 then
     # In Java 9, GC logging has been re-implemented using the Unified GC logging framework.
     # See http://openjdk.java.net/jeps/158 or https://docs.oracle.com/javase/10/jrockit-hotspot/logging.htm
-    GC_JVM_PARAMETERS="-Xlog:gc*:file=$LOGBASEABS/logs/atlassian-jira-gc-%t.log:time,uptime:filecount=5,filesize=20M ${GC_JVM_PARAMETERS}"
+    GC_JVM_PARAMETERS="-Xlog:gc*:file=$LOGBASEABS/logs/atlassian-jira-gc-%t.log:time,uptime,tags,level:filecount=5,filesize=20M ${GC_JVM_PARAMETERS}"
     GC_JVM_PARAMETERS="${GC_JVM_PARAMETERS} ${JVM_GC_ARGS}"
 else
     # Set the JVM arguments used to start Jira. For a description of the options, see

...


What's this all about?

Java garbage collection is not something most people think about much, until one day your Jira server grinds to a halt. If you're luckyyou'll see an OutOfMemoryError  in your /opt/atlassian/jira/logs/catalina.out  log file, making the diagnosis easy. But more often Jira will just be really slow, high CPU usage, and thread dumps showing a variety of threads doing normal CPU-bound things.  In this case your Jira may be GC thrashing: i.e. under memory pressure, and spending most of its time (but not quite enough to trigger an OutOfMemoryError) on garbage collector threads trying to clear up memory.

Jira is written in Java, which is a "garbage-collected" language. This means that Java programmers don't need to worry about de-allocating memory; the Java virtual machine (JVM) periodically "garbage collects" any memory that is unused.

Garbage collection (GC) happens all the time in a running JVM, and typically takes only milliseconds. However if your JVM is under memory pressure, a full GC can't actually free up that much memory, and soon after another GC is needed, and another.. and you get a situation where the JVM is spending more time GC'ing than doing anything useful. This is called GC thrashing.

If you're lucky you'll see an OutOfMemoryError in your /opt/atlassian/jira/logs/catalina.out  log file, making the diagnosis easy. But sometimes Jira will just be really slow, high CPU usage, and thread dumps showing a variety of threads doing normal CPU-bound things. but not quite enough to trigger an OutOfMemoryError) on garbage collector threads trying to clear up memory.

The best way to diagnose GC thrashing, or any "what the hell is Jira so busy with?" situation, is to use top  (or htop  or atop ) with threads enabled, so you can see which Java thread(s) are the CPU consumers. I like atop for its ability to rewind and view past states. Here is atop  after The best way to diagnose GC thrashing, or any "what the hell is Jira so busy with?" situation, is to use top  (or htop  or atop ) with threads enabled, so you can see which Java thread(s) are the CPU consumers. I like atop for its ability to rewind and view past states. Here is atop  after pressing 'y' to enable threads:

...

See in the CPU column: Jira is consuming all CPU (1092%), and specifically the only threads active (79% each) are ParGC Thread  threads.

GC logs

Out the box, Jira is configured (in bin/set-gc-params.sh ) with (8.7.1+) is configured with these GC configuration parameters:

Code Block
# in bin/setenv.sh
JVM_GC_ARGS="-XX:+ExplicitGCInvokesConcurrent"
# in bin/set-gc-params.sh
GC_JVM_PARAMETERS="-Xlog:gc*:file=$LOGBASEABS/logs/atlassian-jira-gc-%t.log:time,uptime:filecount=5,filesize=20M $  {GC_JVM_PARAMETERS}"
    GC_JVM_PARAMETERS="-XX:+UseParallelGC ${GC_JVM_PARAMETERS}"

That means you should have a log file, /opt/atlassian/jira/logs/atlassian-jira-log-gc-*.log with GC logs looking like this:

 ${JVM_GC_ARGS}"

If you are using Java 9 or above, this means the G1GC algorithm is in use (see Disruptive Changes to GC Logging in Java 9). Earlier versions of Jira used the older Parallel GC (see

Jira
serverAtlassian JIRA
serverId144880e9-a353-312f-9412-ed028e8166fa
keyJRASERVER-64331
), and look like this:

Code Block
    GC_JVM_PARAMETERS="-Xlog:gc*:file=$LOGBASEABS/logs/atlassian-jira-gc-%t.log:time,uptime:filecount=5,filesize=20M ${GC_JVM_PARAMETERS}"
    GC_JVM_PARAMETERS="-XX:+UseParallelGC ${GC_JVM_PARAMETERS}"

Either way, you should have a log file, /opt/atlassian/jira/logs/atlassian-jira-log-gc-*.log. GC logs looking something like this:

Code Block
Code Block
[2020-09-22T03:34:05.289-0700][2465737.513s] GC(355175) Pause Full (Ergonomics) 5020M->4521M(5587M) 3608.541ms
[2020-09-22T03:34:05.289-0700][2465737.513s] GC(355175) User=41.39s Sys=0.00s Real=3.60s
[2020-09-22T03:34:05.524-0700][2465737.748s] GC(355176) Pause Full (Ergonomics)
[2020-09-22T03:34:05.524-0700][2465737.748s] GC(355176) Marking Phase
[2020-09-22T03:34:0605.432289-0700][24657382465737.656s513s] GC(355176355175) Marking Phase 908.653msPause Full (Ergonomics) 5020M->4521M(5587M) 3608.541ms
[2020-09-22T03:34:0605.432289-0700][24657382465737.656s513s] GC(355176355175) Summary PhaseUser=41.39s Sys=0.00s Real=3.60s
[2020-09-22T03:34:0605.432524-0700][24657382465737.657s748s] GC(355176) SummaryPause Phase 0.074msFull (Ergonomics)
[2020-09-22T03:34:0605.432524-0700][24657382465737.657s748s] GC(355176) AdjustMarking RootsPhase
[2020-09-22T03:34:06.592432-0700][2465738.816s656s] GC(355176) AdjustMarking RootsPhase 159908.736ms653ms
[2020-09-22T03:34:06.592432-0700][2465738.816s656s] GC(355176) CompactionSummary Phase
[2020-09-22T03:34:0906.213432-0700][24657412465738.437s657s] GC(355176) CompactionSummary Phase 26200.355ms074ms
[2020-09-22T03:34:0906.213432-0700][24657412465738.437s657s] GC(355176) PostAdjust CompactRoots
[2020-09-22T03:34:0906.238592-0700][24657412465738.462s816s] GC(355176) PostAdjust CompactRoots 25159.488ms736ms
[2020-09-22T03:34:0906.238592-0700][24657412465738.462s816s] GC(355176) PSYoungGen: 957440K->451409K(1527296K)Compaction Phase
[2020-09-22T03:34:09.238213-0700][2465741.462s437s] GC(355176) Compaction ParOldGen: 4194303K->4194162K(4194304K)Phase 2620.355ms
[2020-09-22T03:34:09.238213-0700][2465741.462s437s] GC(355176) Metaspace: 759906K->759906K(1824768K)Post Compact
[2020-09-22T03:34:09.238-0700][2465741.462s] GC(355176) PausePost Full (Ergonomics) 5030M->4536M(5587M) 3714.746msCompact 25.488ms
[2020-09-22T03:34:09.238-0700][2465741.462s] GC(355176) User=42.65s Sys=0.10s Real=3.71sPSYoungGen: 957440K->451409K(1527296K)
[2020-09-22T03:34:09.387238-0700][2465741.611s462s] GC(355177355176) Pause Full (ErgonomicsParOldGen: 4194303K->4194162K(4194304K)
[2020-09-22T03:34:09.387238-0700][2465741.611s462s] GC(355177355176) Marking PhaseMetaspace: 759906K->759906K(1824768K)
[2020-09-22T03:34:1009.306238-0700][24657422465741.531s462s] GC(355177355176) Marking Phase 919.544msPause Full (Ergonomics) 5030M->4536M(5587M) 3714.746ms
[2020-09-22T03:34:1009.306238-0700][24657422465741.531s462s] GC(355177) Summary Phase355176) User=42.65s Sys=0.10s Real=3.71s
[2020-09-22T03:34:1009.307387-0700][24657422465741.531s611s] GC(355177) SummaryPause Phase 0.078msFull (Ergonomics)
[2020-09-22T03:34:1009.307387-0700][24657422465741.531s611s] GC(355177) AdjustMarking RootsPhase
[2020-09-22T03:34:10.470306-0700][2465742.694s531s] GC(355177) AdjustMarking RootsPhase 163919.227ms544ms
[2020-09-22T03:34:10.470306-0700][2465742.694s531s] GC(355177) CompactionSummary Phase
[2020-09-22T03:34:1310.473307-0700][24657452465742.698s531s] GC(355177) CompactionSummary Phase 30030.617ms078ms
[2020-09-22T03:34:1310.474307-0700][24657452465742.698s531s] GC(355177) PostAdjust CompactRoots
[2020-09-22T03:34:1310.499470-0700][24657452465742.723s694s] GC(355177) PostAdjust CompactRoots 25163.643ms227ms
[2020-09-22T03:34:1310.499470-0700][24657452465742.723s694s] GC(355177) PSYoungGen: 957440K->512920K(1527296K)Compaction Phase
[2020-09-22T03:34:13.499473-0700][2465745.723s698s] GC(355177) Compaction ParOldGen: 4194162K->4194126K(4194304K)Phase 3003.617ms
[2020-09-22T03:34:13.499474-0700][2465745.723s698s] GC(355177) Metaspace: 759906K->759906K(1824768K)
Post Compact
[2020-09-22T03:34:13.499-0700][2465745.724s723s] GC(355177) PausePost Full (Ergonomics) 5030M->4596M(5587M) 4112.550msCompact 25.643ms
[2020-09-22T03:34:13.499-0700][2465745.724s723s] GC(355177) User=47.24s Sys=0.00s Real=4.11sPSYoungGen: 957440K->512920K(1527296K)
[2020-09-22T03:34:13.503499-0700][2465745.727s723s] GC(355178355177) Pause Full (ErgonomicsParOldGen: 4194162K->4194126K(4194304K)
[2020-09-22T03:34:13.503499-0700][2465745.727s723s] GC(355178355177) Marking PhaseMetaspace: 759906K->759906K(1824768K)
[2020-09-22T03:34:1413.417499-0700][24657462465745.641s724s] GC(355178355177) Marking Phase 914.147msPause Full (Ergonomics) 5030M->4596M(5587M) 4112.550ms
[2020-09-22T03:34:1413.417499-0700][24657462465745.641s724s] GC(355178355177) Summary PhaseUser=47.24s Sys=0.00s Real=4.11s
[2020-09-22T03:34:1413.417503-0700][24657462465745.641s727s] GC(355178) SummaryPause Phase 0.066msFull (Ergonomics)
[2020-09-22T03:34:1413.417503-0700][24657462465745.641s727s] GC(355178) AdjustMarking RootsPhase
[2020-09-22T03:34:14.555417-0700][2465746.779s641s] GC(355178) AdjustMarking RootsPhase 138914.314ms147ms
[2020-09-22T03:34:14.555417-0700][2465746.779s641s] GC(355178) CompactionSummary Phase
[2020-09-22T03:34:1714.136417-0700][24657492465746.361s641s] GC(355178) CompactionSummary Phase 25810.276ms066ms
[2020-09-22T03:34:1714.137417-0700][24657492465746.361s641s] GC(355178) PostAdjust CompactRoots
[2020-09-22T03:34:1714.161555-0700][24657492465746.385s779s] GC(355178) PostAdjust CompactRoots 24138.319ms314ms
[2020-09-22T03:34:1714.161555-0700][24657492465746.385s779s] GC(355178) PSYoungGen: 513664K->513419K(1527296K)Compaction Phase
[2020-09-22T03:34:17.161136-0700][2465749.385s361s] GC(355178) Compaction ParOldGen: 4194126K->4194120K(4194304K)Phase 2581.276ms
[2020-09-22T03:34:17.161137-0700][2465749.385s361s] GC(355178) Metaspace: 759906K->759906K(1824768K)Post Compact
[2020-09-22T03:34:17.161-0700][2465749.385s] GC(355178) PausePost Full (Ergonomics) 4597M->4597M(5587M) 3658.596msCompact 24.319ms
[2020-09-22T03:34:17.161-0700][2465749.385s] GC(355178) User=42.50s Sys=0.01s Real=3.66sPSYoungGen: 513664K->513419K(1527296K)
[2020-09-22T03:34:17.318161-0700][2465749.542s385s] GC(355179355178) Pause Full (ErgonomicsParOldGen: 4194126K->4194120K(4194304K)
[2020-09-22T03:34:17.318161-0700][2465749.542s385s] GC(355179355178) Marking PhaseMetaspace: 759906K->759906K(1824768K)
[2020-09-22T03:34:1817.222161-0700][24657502465749.446s385s] GC(355179355178) Marking Phase 903.470ms Pause Full (Ergonomics) 4597M->4597M(5587M) 3658.596ms
[2020-09-22T03:34:1817.222161-0700][24657502465749.446s385s] GC(355179355178) Summary PhaseUser=42.50s Sys=0.01s Real=3.66s
[2020-09-22T03:34:1817.222318-0700][24657502465749.446s542s] GC(355179) SummaryPause Phase 0.074msFull (Ergonomics)
[2020-09-22T03:34:17.318-0700][2465749.542s] GC(355179) Marking Phase
[2020-09-22T03:34:18.222-0700][2465750.446s] GC(355179) Adjust Roots

This is from a thrashing server. The lines to note are marked Pause Full. e.g.:

Code Block
Marking Phase 903.470ms
[2020-09-22T03:34:1318.499222-0700][24657452465750.724s446s] GC(355177355179) Pause Full (Ergonomics) 5030M->4596M(5587M) 4112.550msSummary Phase
[2020-09-22T03:34:18.222-0700][2465750.446s] GC(355179) Summary Phase 0.074ms
[2020-09-22T03:34:1318.499222-0700][24657452465750.724s446s] GC(355177355179) User=47.24s Sys=0.00s Real=4.11s

These are "full" stop-the-world events, at least for the default UseParallelGC implementation. This full GC took 4.11s, and occurred just 4.26s since the previous full GC. So not much non-GC work is getting done.

I personally find it quite tedious to interpret these GC logs - finding the relevant lines, and doing timestamp math to figure out the intervals. Hence I find atop with threads a much faster means of answering the simple question "is it thrashing?".

Confluence

Confluence (at least 7.1.2 till 7.4.4) defaults to these options:

Code Block
-XX:+IgnoreUnrecognizedVMOptions
-Xlog:gc+age=debug:file=$LOGBASEABS/logs/gc-`date +%F_%H-%M-%S`.log::filecount=5,filesize=2M
-Xloggc:$LOGBASEABS/logs/gc-`date +%F_%H-%M-%S`.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M ${CATALINA_OPTS}

Per Disruptive Changes to GC Logging in Java 9

...

Adjust Roots

The sample log above is from a thrashing server using the older ParallelGC algorithm. The lines to note are marked Pause Full. e.g.:

Code Block
[2020-09-22T03:34:09.238-0700][2465741.462s] GC(355176) Pause Full (Ergonomics) 5030M->4536M(5587M) 3714.746ms
.....
[2020-09-22T03:34:13.499-0700][2465745.724s] GC(355177) Pause Full (Ergonomics) 5030M->4596M(5587M) 4112.550ms
[2020-09-22T03:34:13.499-0700][2465745.724s] GC(355177) User=47.24s Sys=0.00s Real=4.11s

By subtracting the timestamps we see a 4.26s interval between full GCs (2465745.724 - 2465741.462), of which the GC occcupied 4.11s. This is what thrashing looks like.

Interpreting GC log files is exceptionally tedious - finding relevant lines, doing timestamp math, etc. If you did want to further analyze this GC log file, what tools could you use?

GCViewer

My tool of choice is GCViewer, because it has been around forever and does the job.

The problem alluded to in the introduction is that GCViewer cannot read the default format of GC logs generated by Jira and Confluence, when using Java 9+. You'll just get an error for each line parsed:

Code Block
WARNING [DataReaderSun1_6_0]: com.tagtraum.perf.gcviewer.imp.ParseException: com.tagtraum.perf.gcviewer.imp.UnknownGcTypeException: Unknown gc type: '-' Line 1: [2020-09-19T22:16:23.955-0700][2273876.179s] GC(326262) ParOldGen: 2230694K->2232246K(2380800K)


Image Added


After making the -Xlog:gc  tweak suggested in the intro, and restarting Jira, GCViewer should be able to load the /opt/atlassian/jira/logs/atlassian-jira-gc-*.log  files:

(example from a perfectly healthy server, sorry)

Image Added

GCEasy.io

Discovered after reading these GC tuning tips; https://gceasy.io is likely worth the $10/mo. Unlike GCViewer, no -Xlog:gc  tweaks are needed. The free tier is enough to give a general idea of GC activity:

Image Added