Posted by Arnon Erba in How-To Guides.

SELinux has a well-earned reputation for being hard to use. It’s infamous for causing strange, illogical faults that can’t be fixed via normal troubleshooting routines, and, as a consequence, many guides and blog posts recommend disabling it outright. However, SELinux is a great way to secure and harden Linux systems, and with a few simple steps it’s possible to fix most common problems you might encounter while using it.

Examples of Common Issues

Let’s start by looking at a few issues I’ve had in the past that turned out to be caused by SELinux:

  1. A user could no longer log in with an SSH key after their home directory was restored from a backup. Their authorized_keys file was configured correctly but was being ignored by SSH.
  2. A service wouldn’t start after replacing its config file with a modified version that had been uploaded via SFTP. The service complained about the config file being inaccessible even though its permissions were set correctly.
  3. Postfix couldn’t communicate with OpenDKIM when the latter was set to use a UNIX socket instead of a TCP/IP socket. The Postfix user was in the correct security group and the socket was configured correctly.

Without a general understanding of how SELinux works, you might guess that the issues above were caused by bad file permissions. That’s why it’s important to understand SELinux and to identify it as a possible culprit as early as possible in the troubleshooting process.
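
One quick early check, sketched below, is to confirm what mode SELinux is running in; if a problem disappears in permissive mode, SELinux is almost certainly the culprit:

getenforce      # prints Enforcing, Permissive, or Disabled
sestatus        # shows the loaded policy and current mode in more detail
setenforce 0    # temporarily switch to permissive mode (until reboot or setenforce 1)
# retest the failing operation, then re-enable enforcement:
setenforce 1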

What is SELinux, Exactly?

At its core, SELinux is a set of rules that tell applications what they can and can’t do. SELinux is separate from the regular Linux file permissions model and is therefore able to protect against issues like misconfigured permissions or privilege escalation exploits. In order for an operation to succeed on an SELinux-enabled system, it must be permitted by file permissions as well as by the active SELinux policy.

Regular file permissions are a form of discretionary access control, or DAC. On the other hand, SELinux is a form of mandatory access control, or MAC. With DAC, a user or service can do anything they have permission to do, even if it’s something undesirable or dangerous. With MAC, malicious or dangerous actions can be stopped, even if a DAC policy would otherwise permit them to happen.

Here’s an example of why you’d want to keep SELinux enabled. Normally, Apache shouldn’t be able to read /etc/shadow, and the default file permissions prevent that from happening. However, if those permissions were misconfigured and Apache was configured to serve files from /etc, it would be possible for anyone with a web browser to download /etc/shadow. A properly configured SELinux policy would override both misconfigurations and prevent Apache from serving sensitive system files from /etc.
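
For illustration, /etc/shadow carries a dedicated security context under the targeted policy used by RHEL-family systems, one that web server domains are not allowed to read (output is typical and may vary by distribution):

ls -Z /etc/shadow
# system_u:object_r:shadow_t:s0 /etc/shadow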

Putting Things in Context

Extra protection is great, but what happens when SELinux interferes when it shouldn’t? If SELinux is interfering with something “normal” that should otherwise work, chances are you have one simple problem: incorrect file security contexts. Security contexts are how SELinux categorizes files and decides which applications can access them. By default, security contexts are applied to files based on their location. For example, files in home directories get different security contexts from files in /etc or /tmp.
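
You can ask the policy what the default context for any path should be with the matchpathcon tool. The output below is illustrative of a RHEL-style targeted policy, and the user portion of each context may differ on your system:

matchpathcon /etc /tmp /home/alice
# /etc         system_u:object_r:etc_t:s0
# /tmp         system_u:object_r:tmp_t:s0
# /home/alice  system_u:object_r:user_home_dir_t:s0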

You can inspect a file’s security context with ls -Z, but you’re probably better off using restorecon to reset contexts to their default values if you suspect a problem. To save time, you can run restorecon -rv /path/to/directory to recursively reset the security contexts for an entire directory. If things are bad enough, you can relabel your entire filesystem by running touch /.autorelabel and then rebooting.

The restorecon command was the solution to problems #1 and #2 from the list at the beginning of this post. Incorrect security contexts can be applied when files are restored from a backup or copied from a nonstandard location.
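
As a hypothetical example for problem #1, resetting the contexts on a restored .ssh directory might look like this (the username is illustrative):

ls -Z /home/alice/.ssh/authorized_keys   # inspect the current, incorrect context
restorecon -rv /home/alice/.ssh          # reset the directory to its policy defaults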

Adjusting the Policy

In most mainstream Linux distributions, the default SELinux policy is carefully crafted by a group of upstream maintainers. Creating a perfect one-size-fits-all policy is impossible, so the maintainers provide built-in policy exceptions in the form of SELinux booleans. SELinux booleans can be easily enabled or disabled to cover common use cases where the default SELinux policy falls short. If you have an SELinux problem that can’t be fixed by restoring default file security contexts, you should check to see if an available SELinux boolean covers your use case.

You can use getsebool -a to retrieve a list of available booleans on your system and then use setsebool to enable or disable them. Alternatively, you can use the semanage tool to see more detailed information about available booleans. Examples of SELinux booleans include the following (see the usage sketch after the list):

  • use_nfs_home_dirs: Support NFS home directories.
  • httpd_can_network_connect: Allow HTTPD scripts and modules to connect to the network.
  • ftpd_full_access: Allow full filesystem access over FTP.
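
A minimal sketch of checking and persistently enabling one of these booleans; the -P flag is what makes the change survive a reboot:

getsebool httpd_can_network_connect         # check the current value
setsebool -P httpd_can_network_connect on   # enable it persistently
semanage boolean -l | grep httpd            # list httpd-related booleans with descriptions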

Rewriting the Policy

If fixing security contexts and enabling booleans hasn’t worked, ask yourself if you’re doing something abnormal. “Abnormal” in this context might include running a service on a nonstandard port, serving web files from an unconventional location, or moving config files out of their default directory. If you are, there’s a good chance your system’s default SELinux policy won’t cover your use case.
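
For common cases like these, the policy can often be extended rather than rewritten. As a sketch, assuming a hypothetical SSH port of 2222 and a hypothetical web root of /srv/web:

semanage port -a -t ssh_port_t -p tcp 2222                   # allow sshd to bind to port 2222
semanage fcontext -a -t httpd_sys_content_t "/srv/web(/.*)?" # label /srv/web as web content
restorecon -rv /srv/web                                      # apply the new file contexts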

Before you proceed, you should think hard about what benefit you’re getting from running a nonstandard configuration. Standards exist for good reasons: troubleshooting is easier, malicious activity is simpler to detect, and applications can be configured to behave more predictably. With that said, there’s plenty of vendor software out there that relies on an “abnormal” configuration to work properly.

If you’ve evaluated your configuration and decided to proceed, you have two options. First, you may have discovered a bug in your platform’s SELinux policy, which means you should submit a bug report so that the policy can be fixed upstream. This is the course I ended up pursuing for the OpenDKIM issue mentioned above, and Red Hat updated the upstream policy after a few months.

Alternatively, you can write and compile a custom SELinux policy module. This is not as difficult as it sounds, as audit2allow can generate SELinux modules directly from audit log entries. A brief description of how to make use of the audit log is below, but a full explanation is beyond the scope of this post.
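
A minimal sketch of that workflow, assuming a hypothetical module name (mylocal) and a denial logged for OpenDKIM:

grep opendkim /var/log/audit/audit.log | audit2allow -M mylocal
# review the generated mylocal.te file, then load the compiled module:
semodule -i mylocal.pp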

The Audit Log

By default, SELinux violations are logged to the audit log at /var/log/audit/audit.log. The best way to troubleshoot potential SELinux issues is to consult the audit log, but the default log format is not particularly user-friendly and raw entries are not always easy to understand. Instead of reading the audit log file directly, you can search the log with the ausearch tool or generate comprehensive, human-readable reports from it with the sealert tool. A full description of how to use those programs is provided by the documents in the “Read More” section at the bottom of this post.
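
For example, to pull recent denial records or generate a full report (sealert is provided by the setroubleshoot-server package on RHEL-family systems):

ausearch -m avc -ts recent            # show recent AVC (denial) records
sealert -a /var/log/audit/audit.log   # summarize denials with suggested fixes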

Wrapping Up

SELinux has been around for a long time, and many mainstream Linux distributions now ship with robust SELinux policies that cover a range of use cases. Additionally, configuration management tools like Puppet can automatically set SELinux contexts for you and help you avoid inadvertently mislabeling files.

That said, the default SELinux policy can't cover every possible use case, so you may still need to enable SELinux booleans or compile custom policy modules to make SELinux work for you. In any case, you should avoid disabling it outright, especially if you're running a Fedora derivative such as RHEL or CentOS, where SELinux is intended to be the primary form of mandatory access control.

Read More

The banner image for this post was created by The Worlds Beyond.

Posted by Arnon Erba in News.

Bitdefender Antivirus — the free edition, at least — appears to be interfering with Remote Desktop Protocol (RDP) connections on Windows. Affected users may receive the following error when they try to log on to a remote PC or server with Network Level Authentication (NLA) enabled:

An authentication error has occurred.

The Local Security Authority cannot be contacted.

This could be due to an expired password.

While an expired password or a server-side misconfiguration can cause this error, it may also indicate a client-side issue. In this case, the error appears to be caused by Bitdefender Antivirus replacing the remote computer’s certificate in order to inspect encrypted RDP traffic. This process breaks Network Level Authentication and causes the connection to fail.

One workaround is to add file-level exclusions in Bitdefender for both the 64-bit and 32-bit versions of the Windows RDP client:

  • C:\Windows\system32\mstsc.exe
  • C:\Windows\syswow64\mstsc.exe

This is not an ideal solution, but the free version of Bitdefender Antivirus has a limited control panel and does not provide alternative workarounds.


Posted by Arnon Erba in News.

Some iPhones and iPads appear to be having trouble updating to iOS 14 from older versions of iOS. If you receive an “Unable to Install Update” error after downloading iOS 14, it may be worth temporarily disabling your passcode before trying the update again. Make sure to re-enable your passcode once your device successfully updates.

It's unclear which devices and iOS versions are affected by this bug, but I confirmed the issue on an iPhone 7 running iOS 10. At the moment, several other users are reporting similar issues on the Apple Developer Forums.

If your device still won’t update with the passcode disabled, check if you have enough free storage space available. It’s also worth taking a look at Apple’s guide on what to do when your iOS device won’t update.

Posted by Arnon Erba in Meta.

It’s been about two years since my last meta post, so in keeping with tradition it would seem I’m due for another ground-up redesign of my blog and a retrospective post to go along with it. This time, though, I think I’ll skip the redesign and jump straight to the retrospective post.

I’ve written about the history of this blog before, and much of what I wrote in 2018 still holds true today. I’ve been blogging under the Arnon on Technology moniker for a while now and the past two years have seen the publication of some of my favorite posts. Still, I’ve managed to almost continuously break the cardinal rule of blogging by averaging, in many cases, less than a single post per month.

Why? To be honest, writing posts for this blog takes a lot of time. Some of the more in-depth how-to guides I've written have over a dozen hours invested in them and were composed over a period of weeks or months as time permitted. For example, I've been working on an article about Git off and on since February of this year, and it's still far from complete. In a way, it's a case of perfect being the enemy of good enough.

In the end, that’s OK, because this blog still isn’t about money or fame. I’ve never had the time or desire to pursue ad revenue and I’m not active on Twitter or other social media platforms. Instead, my main goal is to continue writing articles about poorly documented tech problems while giving back to the community of Internet bloggers who have helped me out time and again with similar niche issues.

For now, though, it’s time to finish a few more drafts.

Posted by Arnon Erba in How-To Guides.

On Windows and macOS, Stata can be configured to check for updates automatically with the set update_query command. However, there are a few drawbacks to this approach.

For one, this feature isn't present in the Linux version of Stata. For another, the command doesn't actually update Stata; it just enables update notifications. Stata will still need to be updated manually by someone with permission to do so.

If you’re running Stata on a standalone Linux server or an HPC cluster, you may be interested in having Stata update itself without any user interaction. This is especially useful if Stata users do not have permission to update the software themselves, as is often the case on shared Linux systems.

We can enable true automatic updates with a cron job and a Stata batch mode hack:

0 0 * * 0 echo 'update all' | /usr/local/stata16/stata > /dev/null

Adding this line to root's crontab runs the update all command every Sunday at midnight. Standard output is redirected to /dev/null to prevent cron from sending unnecessary emails.
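
If you would rather keep a record of each run instead of discarding the output, you could append it to a log file (the path is just an example):

0 0 * * 0 echo 'update all' | /usr/local/stata16/stata >> /var/log/stata-update.log 2>&1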

As always, think carefully before enabling automatic updates for mission-critical pieces of software. However, this approach can save time over updating Stata manually.

Posted by Arnon Erba in General.

Life as a sysadmin is constantly entertaining. Some days, even when you think you’ve accounted for every possible contingency, something happens that still manages to take you by surprise. Wednesday was one of those days.

I manage a few production Red Hat Enterprise Linux servers that, until Wednesday of this week, were all running RHEL 7. RHEL 7 is still well within its support window, but ever since RHEL 8 came out in May of last year I've been preparing to proactively upgrade my systems. By chance, I finished my preparations this week, so I scheduled an in-place rebuild of one of my less critical servers for Wednesday the 29th.

The Upgrade Begins

Because true hardware-based RAID controllers are blisteringly expensive, I like to run simple software RAID arrays with mdadm where possible. The RHEL installer makes it easy to place all your partitions — even the EFI system partition — on an mdadm array during the installation process, so I started my server rebuild by creating a few RAID 1 arrays on the server’s dual HDDs. Later, with the installation complete, I rebooted the server and was greeted by a fresh RHEL 8 login prompt at the console.

Shortly after that, things went sideways. I ran yum update to pull down security patches and discovered a kernel/firmware update and a GRUB2 update. During the update process, I noticed that the server had slowed to a crawl, so I checked /proc/mdstat and realized that mdadm was still building the RAID 1 arrays and was eating up all the bandwidth my HDDs could muster while doing so. Impatient, and eager to get out of the loud server room and back to a desk, I decided to reboot the server to apply the kernel update so I could finish setting things up over SSH.

No Boot?

Two minutes later, I was staring at a frozen BIOS splash screen. As I’d just installed a new network card, I immediately suspected hardware problems, so I powered the server down and checked things over. Nothing helped: The hardware seemed fine, but it still wouldn’t boot.

Mdadm is pretty resilient, but since I’d shut the server down mid-verification I hastily assumed I’d somehow broken my RAID setup. Because I hadn’t gotten very far post-installation, I decided to wipe the server and reinstall RHEL 8 to rule out any issues. This time, I let mdadm sit for an hour or so before I touched anything, and then patched and rebooted the server again. Cue the frozen BIOS splash screen.

In hindsight, the common factor was clearly the updates, but as I’d just updated my RHEL 8 development server the day before with no ill effects I didn’t immediately consider a bad update as a possibility. Instead, I reset the BIOS to factory defaults and reviewed all my settings. When that didn’t help, I rummaged through my drawer of spare parts and grabbed an unused NVMe SSD to replace the server’s frustratingly slow HDDs in case they or the RAID configuration was the source of the problem. After installing RHEL 8 on the new drive, I rebooted the server several times to verify everything worked before applying updates. Once again, everything was fine until I applied the GRUB2 updates.

Verifying the Problem

Faced with what now seemed to be a bootloader issue, I went back to my RHEL 8 development server and updated it again. Sure enough, a new GRUB2 update popped up, and when I rebooted after applying it I got stuck at a black screen. Confident that I’d narrowed the issue down to a bugged update, I reinstalled RHEL 8 one last time on my production server — this time, skipping the update step — and set about reinstalling software on it.

When I finished later that night, I got the Red Hat daily digest email summarizing the latest RHEL updates. As it turned out, Red Hat had released patches for the BootHole vulnerability just a few hours before I arrived on-site in the afternoon ready to rebuild my server. (For reference, the RHEL 8 patch is RHSA-2020:3216 and the RHEL 7 patch is RHSA-2020:3217.) I quickly disabled automatic updates on the rest of my servers and wrote up a hasty bug report at 10:15pm.

The Results

I woke up on Thursday morning to 50+ email notifications from Bugzilla and a tweet from @nixcraft linking to the bug report. As the day went on, it became apparent that RHEL 7 was also affected and certain Ubuntu systems were suffering from the fallout of a similar patch.

As of the writing of this post, it seems like the specific issue lies with shim rather than GRUB2 itself. Right now, Red Hat is advising that people avoid the broken updates, and they’ve published various workarounds that may come in handy if you’ve already applied them. For the moment, I still have automatic updates disabled, and I’m hoping that Red Hat will publish fixed versions of GRUB2 and shim soon.

In the end, I spent four hours reinstalling RHEL 8, submitted a hastily written bug report, and became the anonymous “user” mentioned in the first paragraph of an Ars Technica article:

Early this morning, an urgent bug showed up at Red Hat’s bugzilla bug tracker—a user discovered that the RHSA_2020:3216 grub2 security update and RHSA-2020:3218 kernel security update rendered an RHEL 8.2 system unbootable.