Strange running processes? 100% CPU use? Security emails? Odds are your server has been hacked. This guide is for system administrators who are not security experts, but who nevertheless need to recover from a hacked Jira/Confluence installation.
Something is fishy...
It began with some emails to root on the server, which are redirected to my mailbox: the confluence user was trying to run commands with sudo, which it is not authorized to do. Strange!
Looking at running processes with atop, two jump out as suspicious: khugepaged running as the Confluence user, and a curl command. A third process, kerberods, occasionally appeared too.
Yes, Confluence has been hacked
What to do?
At this point, the server is a crime scene. An attacker is running arbitrary commands as the confluence user, meaning they are able to access everything in Confluence, regardless of permissions. Think through what your Confluence instance contains. Passwords to external systems? Confidential data about your business? Confidential information about clients? The implications of a breach depend on what confidential information is stored, and on the laws of your country. In Australia, you may have legal obligations under the Notifiable Data Breaches scheme, and may want to report the intrusion at https://www.cyber.gov.au/report
The point being, a hacked server represents a problem way beyond your pay grade as a humble system administrator. The response must be at multiple levels:
- Initial response:
- gather "first responder" forensic evidence of what is happening
- prevent further damage, while modifying the system as little as possible
- understand the attack vector sufficiently in order to be able to block it
- allow normal business activities to resume as soon as possible
- Forensic - understanding more fully how the intrusion happened, and the extent of the breach. As far as possible this is best left to a security professional, because it's a specialized skillset and a lot is at stake.
- Organizational - management need to be in the loop to coordinate a response (e.g. engage security experts), deal with the fallout, and address the failures (e.g. IT understaffing) that led to the hack.
- Legal - as mentioned, there may be legal ramifications, particularly of leaking third-party confidential information.
This guide deals only with the initial response, but it is critical to be aware of the bigger picture. Get technical help if you are not confident (see shameless plug at the end). A panicky, botched initial response will make forensics hard or impossible, which in turn increases the management and legal headaches. There will be difficult decisions to make:
- while the attack is live, what forensic evidence do you gather, and how?
- at what point do you have enough forensics?
- shutting down services affects legitimate business users. This needs to be balanced against the need to stop ongoing damage from the hack.
- what is the extent of the breach? Is a full server rebuild required? Are other servers affected? What was the hack entry point, and how can we block it? These decisions need to be made fast if services are offline.
The technical response described here is, I think, appropriate for a small to medium business without extraordinarily sensitive data.
Some quick things to do before anything else:
Disable SSH agent forwarding
I know I shouldn't, but for some servers I have ForwardAgent yes set in SSH so I can easily jump between servers. Agent forwarding to a hacked server is a really bad idea, as the matrix.org experience illustrates. Turn agent forwarding off in your ~/.ssh/config before continuing.
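For example, an explicit override for the compromised host (the hostname is a placeholder). Note that ssh uses the first matching value for each option, so this stanza must appear above any Host * block:

```
Host confluence.example.com
    ForwardAgent no
```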
SSH in and become root
ssh in, and sudo su - if necessary.
Record your session
If we have to go tramping through a crime scene, let's at least record what we see. As soon as you SSH in to the server, run:
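If you have util-linux's script available (most Linux distributions do), a sketch:

```shell
# Record this terminal session (output plus timing data) under ~/hack/,
# the working directory we'll use for all evidence in this guide.
mkdir -p ~/hack
script -a -t 2>~/hack/session.timing ~/hack/session.log
```

Type exit (or Ctrl-D) to stop recording; the session can later be replayed with scriptreplay.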
Now everything you see, even ephemeral information like top output, is logged.
Log network activity
On the server, as root, run:
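For example (the interface name eth0 and the port-22 exclusion are assumptions; adjust for your server):

```shell
# Capture all traffic except our own SSH session, in the background,
# so the capture keeps running while we investigate.
mkdir -p ~/hack
nohup tcpdump -i eth0 -s0 -w ~/hack/traffic.pcap 'not port 22' \
    >~/hack/tcpdump.out 2>&1 &
```

The resulting pcap can later be examined with tcpdump -r or wireshark.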
This records all network activity on the server. It takes only a few seconds to set up, and may provide valuable evidence of e.g. data exfiltration.
Snapshot system activity
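Capture a point-in-time record of what the live system is doing, before anything changes. A sketch (the tool choices are mine; substitute whatever is installed):

```shell
mkdir -p ~/hack
ss -tupan > ~/hack/sockets.txt 2>&1             # open connections + owning processes
ls -la /proc/*/exe > ~/hack/proc-exe.txt 2>&1   # binary behind each PID (even if deleted on disk)
crontab -l -u confluence > ~/hack/confluence-cron.txt 2>&1   # persistence via cron?
ps auxwwf > ~/hack/ps.txt                       # full process tree
```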
Lock down the system
Do not shut down the server. Doing so would lose potentially critical information. In my case, the malicious scripts were running from /tmp/, so restarting the server would lose them.
Instead, cut off network access (incoming and outgoing) to all non-essential parties. This should be done at the management layer (e.g. network ACLs), to avoid trusting potentially compromised binaries on the server.
If you are confident that root has not been breached, this can be done with iptables rules on the server:
- Before locking down iptables, now is an excellent time to verify your out-of-band console access to the server (Linode provides lish, for instance).
Figure out what IP(s) to allow. If you are SSHed into the server, run:
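Your own client IP is available in the $SSH_CLIENT environment variable:

```shell
# $SSH_CLIENT holds "client-ip client-port server-port" for this session
echo "$SSH_CLIENT" | awk '{print $1}'
```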
Lock down the iptables rules. On Debian/Ubuntu, run:
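A sketch using ufw (203.0.113.5 is a placeholder for your own IP from the previous step). Writing the rules to a script first lets you review them before pulling the trigger:

```shell
mkdir -p ~/hack
cat > ~/hack/lockdown.sh <<'EOF'
#!/bin/sh -e
ufw --force reset               # start from a clean slate
ufw default deny incoming       # drop all inbound traffic...
ufw default deny outgoing       # ...and outbound (stops exfiltration / C&C callbacks)
ufw allow proto tcp from 203.0.113.5 to any port 22   # except our own SSH
ufw --force enable
EOF
sh -n ~/hack/lockdown.sh        # syntax check; then run: sh ~/hack/lockdown.sh
```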
The server is now completely locked down, except for (hopefully!) SSH connections from you.
Back up important files
If your VPS infrastructure allows you to take a snapshot of a running server, now is the time to do so. Who knows, perhaps there is a sleep 1000; rm -rf / time bomb ticking away.
If you can't snapshot the whole system, rsync off the important contents, including:
- the Postgres database (take a manual dump just prior if you don't trust Postgres WAL)
and files that will help you figure out what happened, such as:
- /var/log/journal (if systemd journaling is enabled)
- ~confluence, where confluence is the account running Confluence
- ~/hack/ (your terminal output and network captures so far)
Using rsync, this can be done with a command like:
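For instance (the hostname and paths are placeholders; run this from a trusted machine, pulling from the server):

```shell
SERVER=root@confluence.example.com   # the compromised server (placeholder)
mkdir -p ~/hack-backup
# -a archive, -H hardlinks, -A/-X ACLs + xattrs, -R keep full source paths
rsync -aHAXR "$SERVER":/var/log "$SERVER":/home/confluence ~/hack-backup/
du -sh ~/hack-backup                 # sanity-check what arrived
```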
Now if the server spontaneously combusts, you have at least salvaged what you could.
Consider locking down other affected systems
You now have a locked down, backed up server. It is time to consider whether other systems might have been hacked too:
Does Confluence store its user passwords on an external system, like AD, LDAP or Jira? Did Confluence have permission to instigate password resets? If yes and yes, that is bad news: your hacker may have reset passwords reused on other systems (e.g. Jira), and thereby accessed those other systems.
- log in to Confluence as an administrator
- go to the User Management admin section (type 'gg' then 'user management')
Confluence doesn't make our lives easy here. You'll probably see something like:
This doesn't tell us if our access to the 'JIRA Server' user directory is read-only or read/write. So click on the 'Directory Configuration Summary':
If you see a full set of 'Allowed Operations', as above, that means Confluence has permission to modify user passwords in Jira. A read-only Jira would have a much shorter list of allowed operations:
If your user directory is read-write for Confluence, then check on that system whether any user passwords were reset, e.g. in Jira's audit log:
What could a malicious confluence user see in the system? Check the permissions of user directories in /home. Are they world-readable/executable? If so, anything sensitive in those home directories may have been exposed.
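One way to enforce this (a sketch; the unit name confluence.service and the drop-in path are assumptions about your setup) is a systemd override:

```ini
# /etc/systemd/system/confluence.service.d/override.conf
[Service]
ProtectHome=true
```

Apply it with systemctl daemon-reload && systemctl restart confluence.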
This ensures Jira/Confluence cannot see directories they don't need to.
Do not set PrivateTmp=true, because that prevents jconsole and friends from communicating with the Java process.
- How are external backups done? Are credentials to the backup system compromised? If so, move to protect your backups before anything else.
In our case, external backups are stored on tarsnap. The backup process runs as root and the tarsnap key is only root-accessible. I was fairly confident root had not been compromised.
- Are there usernames and passwords stored as plaintext in Confluence? If so, consider those systems breached too.
Understand the attack vector
Once you have locked down all potentially affected systems, the damage should be contained. How did the attacker get in?
It is worth spending some time on these questions now, as an easy win will get all users back online sooner. However, you may not be so lucky, and as users complain and pressure mounts for restored services, you may want to proceed to the next section: restoring emergency access.
For reference, the kerberods binary I found had these signatures:
Attack vector 1: Application-level vulnerabilities
How do you figure out if you have been breached through a particular security vulnerability? Unfortunately it's not easy. Sometimes a hack will leave a characteristic stacktrace, but more often you have to trawl through the webserver access logs, looking for anything suspicious. "Suspicious" means requests from unusual IPs (e.g. in foreign countries) accessing URLs relating to the vulnerable resource. Sometimes the vulnerable resource URL is made explicit in Atlassian's vulnerability report (if the mitigation is "block /frobniz URLs" then you know /frobniz is the vulnerable resource), but sometimes there is no simple correlation. For instance, the 2019-03-20 security vulnerability was in the Widget Connector, but in the logs the only symptom is a series of anonymous requests to the macro preview URL:
This is one reason to install mod_security on your server: it gives you visibility into the contents of POST requests, for instance.
lnav is an invaluable aid to access log trawling, as it lets you run SQL queries on access logs. For example, we know rogue JSP files would be a sure sign of a breach. Here is a SQL query on your access logs that identifies requests to JSPs:
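A sketch of such a query, using the column names lnav derives from the combined access log format:

```sql
;SELECT c_ip, cs_method, cs_uri_stem, sc_status, count(*) AS hits
   FROM access_log
  WHERE cs_uri_stem LIKE '%.jsp%'
  GROUP BY c_ip, cs_method, cs_uri_stem, sc_status
  ORDER BY hits DESC
```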
In my example, there are some hits, but fortunately all with 404 or 301 responses, indicating the JSPs do not exist:
Attack vector 2: User account compromises
Perhaps a user's password has been guessed (e.g. by reusing it on other services - see https://haveibeenpwned.com), or the user succumbed to a phishing attack and clicked on an XSRFed resource. If the account had administrator-level privileges, the attacker has full Confluence access, and possibly OS-level access (through Groovy scripts or a custom plugin).
Things to do:
- Check the audit log for suspicious admin activity, but be aware that the audit log is not trustable at this point.
- Identify accounts whose passwords have recently changed, by comparing password hashes with those from a recent backup.
This command compares the cwd_user table from a monthly backup to that of the current database:
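A sketch, assuming Postgres with the backup restored into a second database (the names confluence_backup and confluence are placeholders):

```shell
# cwd_user.credential holds each account's (hashed) password
QUERY='SELECT user_name, credential FROM cwd_user ORDER BY user_name'
psql -At confluence_backup -c "$QUERY" > /tmp/hashes.old
psql -At confluence        -c "$QUERY" > /tmp/hashes.new
diff /tmp/hashes.old /tmp/hashes.new    # any output = a changed password
```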
(diffing database dumps like this is a generally useful technique, described here)
- Check for users logging in from strange IPs, e.g. foreign countries or VPSes.
This lnav command prints a summary of Confluence access counts, grouped by username and originating IP hostname:
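A sketch (gethostbyaddr is lnav's built-in reverse-DNS SQL function):

```sql
;SELECT cs_username, gethostbyaddr(c_ip) AS host, count(*) AS hits
   FROM access_log
  WHERE cs_username != '-'
  GROUP BY cs_username, host
  ORDER BY hits DESC
```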
The originating IPs do not look suspicious for a small Australian business:
Attack vector 3: Lower-level vulnerabilities
It is possible the hack was done through an SSH account compromise, a webserver vulnerability, a Java vulnerability or something more exotic. Check last for suspicious logins, as well as /var/log/*.log (with lnav) for errors.
Restoring critical user access
Finding the intruder's point of entry isn't always possible in a hurry. Often, though, we can say for certain that particular IPs and usernames are not the source of the hack, and they can safely be let back in, reducing the business impact of service unavailability.
Building on our ufw rules above, here is a script that grants two administrators SSH/HTTPS access, and then grants HTTP/HTTPS access to a list of safe IPs:
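A sketch (the administrator IPs are placeholders; valid_ips.txt is built in the next step):

```shell
mkdir -p ~/hack
cat > ~/hack/restore-access.sh <<'EOF'
#!/bin/sh -e
# Administrators get SSH + HTTPS (placeholder IPs -- use your admins')
for ip in 203.0.113.10 198.51.100.20; do
    ufw allow proto tcp from "$ip" to any port 22,443
done
# Known-good users get HTTP/HTTPS only
while read -r ip; do
    ufw allow proto tcp from "$ip" to any port 80,443
done < valid_ips.txt
EOF
sh -n ~/hack/restore-access.sh  # review, then run: sh ~/hack/restore-access.sh
```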
In my case, the attack was being launched through unauthenticated accesses, so all IPs that had successfully logged in to Jira or Confluence are safe. We can construct valid_ips.txt with lnav:
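One way (a sketch; the access log path is an assumption) is to have lnav write out the distinct authenticated client IPs:

```shell
# Distinct client IPs that made authenticated (non-anonymous) requests;
# trim the c_ip header line from the resulting file before use.
lnav -n -c ";SELECT DISTINCT c_ip FROM access_log WHERE cs_username != '-'" \
     -c ':write-csv-to /tmp/valid_ips.txt' /var/log/apache2/access.log*
cat /tmp/valid_ips.txt 2>/dev/null | head   # sanity check
```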
Resume normal business activities
Once an operating system account has been compromised, it's generally safest to assume that the attacker has also found a local privilege escalation, achieved root, and installed trojaned variants of system binaries. If so, it is game over: time to build a new server from scratch.
Or you may like to take a calculated risk that root has not been breached, and salvage the server by cleaning up the artifacts of the hack.
In the case of my khugepaged hack, I (in consultation with the client) went with the latter, and followed the 'LDS cleanup tool' procedure mentioned on the community.atlassian.com thread. If you go this route, double-check that the /opt/atlassian/confluence/bin/*.sh files have not been modified (they should be read-only to the confluence user).
Either way, the question arises, is the Confluence data itself safe? Must you restore from a pre-hack backup?
To answer this question, consider what an attacker might have done with complete access:
- Changed passwords of administrator accounts
- Created new administrator accounts
- Installed rogue plugins
- Deleted entries in the audit log to cover their tracks
- Deleted or corrupted Confluence content
- Installed application links to foreign systems
The attacker may now know the hashes of all user passwords, and can probably brute-force them. You should probably reset passwords globally. More importantly, if you were relying on passwords alone in a publicly exposed Confluence, you were Doing It Wrong. Install a 2FA plugin or implement an SSO system like Okta as a matter of urgency.
If you don't reset passwords, at a minimum I would check user passwords before and after the hack (diffing cwd_user against a backup, as above),
and check for unexpected plugins on the filesystem level:
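A sketch (CONF_HOME and the cutoff date are placeholders for your Confluence home directory and suspected intrusion date):

```shell
CONF_HOME=/var/atlassian/application-data/confluence   # placeholder path
# Plugin jars created or modified since the suspected intrusion stand out
find "$CONF_HOME" -name '*.jar' -newermt '2019-05-01' -ls 2>/dev/null | head -50
```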
And check additional tables against the backup, in accordance with your level of paranoia.
The aftermath of a hack is a golden time in which management are suddenly extremely security conscious. Take the opportunity to make long-term changes for the better!
Red Radish Consulting specializes in cost-effective remote upgrades-and-support solutions for self-hosted instances. We are flexible, with a particular affinity for the small/medium business market that other consultants don't want to touch.
Drop us a line to discuss whether this will work for you.