Understanding Capabilities in Linux

For some time now the Linux kernel has supported a capabilities(7)-based permission control model. In theory this allows assigning fine-grained permissions to processes, so that processes which previously required UID 0/root permissions no longer need them. In practice, though, uptake of this feature is relatively low, and actually trying to use it is hampered by confusing vocabulary and non-intuitive semantics.

So what’s the story?

All special access permission exemptions that were previously exclusively attached to UID 0 are now associated with a capability. Examples for these are: CAP_FOWNER (bypass file permission checks), CAP_KILL (bypass permission checks for sending signals), CAP_NET_RAW (use raw sockets), CAP_NET_BIND_SERVICE (bind a socket to Internet domain privileged ports).

Capabilities can be bestowed on execution (similar to how SUID operates) or be inherited from a parent process. So in theory it should be possible to, for example, start an Apache web server on port 80 as a normal user with no root access at all, if you can provide it with the CAP_NET_BIND_SERVICE capability. Another example: Wireshark only needs the CAP_NET_RAW and CAP_NET_ADMIN capabilities. It is highly undesirable to run the main UI and protocol parsers as root, and slightly less desirable to run dumpcap, which is the helper tool that Wireshark actually uses to sniff traffic, as root. Instead, the preferred installation method on Debian systems is to set the dumpcap binary up so that it automatically gains the required privileges on execution, and then limit execution of the binary to a certain group of users.
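
On Debian this is typically done with the setcap(8) utility. The same file capabilities can also be attached programmatically through libcap; the following is a minimal sketch only (the /usr/bin/dumpcap path is an assumption, and writing file capabilities itself requires root, i.e. CAP_SETFCAP):

#include <stdio.h>
#include <sys/capability.h>   /* libcap; link with -lcap */

int main(void)
{
    /* Roughly equivalent to: setcap cap_net_raw,cap_net_admin+eip /usr/bin/dumpcap */
    cap_t caps = cap_from_text("cap_net_raw,cap_net_admin+eip");
    if (caps == NULL) {
        perror("cap_from_text");
        return 1;
    }
    /* Writing file capabilities requires CAP_SETFCAP, so run this as root. */
    if (cap_set_file("/usr/bin/dumpcap", caps) != 0) {
        perror("cap_set_file");
        cap_free(caps);
        return 1;
    }
    cap_free(caps);
    return 0;
}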

Gaining and giving capabilities

This is the most confusing part, because a) it doesn’t behave intuitively in the “just like suid-root” mental model, and b) it uses the same words for completely different functions.

Conceptually capabilities are maintained in sets, which are represented as bit masks. For all running processes capability information is maintained per thread; for binaries in the file system it’s stored in extended attributes. Thread capability sets are copied on fork() and specially transformed on execve(), as discussed below.

Several different capability sets and related variables exist. In the documentation these are treated as somewhat symmetrical for files and threads, but in reality they are not, so I’ll describe them one by one:

Thread permitted set
This is a limiting superset of the capabilities that the thread may add to its thread effective or thread inheritable sets. The thread can use the capset() system call to manage capabilities: it may drop any capability from any set, but can only add capabilities to its thread effective and inheritable sets that are also in its thread permitted set. It cannot add capabilities to its own thread permitted set at all; the closest thing to an exception is that a thread with the CAP_SETPCAP capability in its thread effective set may add capabilities to its thread inheritable set that are not in its thread permitted set.
Thread effective set
This is the actual set of capabilities that the kernel uses for permission checks.
Thread inheritable set
This is a set that plays a role in bequeathing capabilities to other binaries. It would more properly be called ‘bequeathable’: a capability not in this set cannot be inherited by a different binary through the inheritance process. However, being in this set also does not automatically make a binary inherit the capability. Also note that ‘inheriting’ a capability does not necessarily give the new thread any effective capabilities: ‘inherited’ capabilities only directly influence the new thread permitted set.
File permitted set
This is a set of capabilities that are added to the thread permitted set on binary execution (limited by cap_bset).
File inheritable set
This set plays a role in inheriting capabilities from another binary: the intersection (logical AND) of the thread inheritable and file inheritable sets is added to the thread permitted set after the execve() is successful.
File effective flag
This is actually just a flag: when it is set, the new thread effective set after execve() is set equal to the new thread permitted set; otherwise it is empty.
cap_bset
This is a bounding capability set which can mask out (by ANDing) file permitted capabilities, and some other stuff. I’ll not discuss it further and just assume that it contains everything.

Based on these definitions, the documentation gives a concise algorithm for the transformation that is applied on execve() (old and new refer to the thread capability sets before and after the execve(), respectively; file refers to the binary file being executed):

  • New thread permitted = (old thread inheritable AND file inheritable) OR (file permitted AND cap_bset)
  • New thread effective = new thread permitted, if file effective flag set, 0 otherwise
  • New thread inheritable = old thread inheritable
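
Written out as plain bit operations, this transformation looks roughly as follows (a minimal C sketch; the structure and variable names are invented for illustration and are not kernel or libcap API):

#include <stdint.h>

/* Each capability set modeled as a plain 64-bit mask. */
struct thread_caps {
    uint64_t permitted;
    uint64_t effective;
    uint64_t inheritable;
};

struct file_caps {
    uint64_t permitted;
    uint64_t inheritable;
    int      effective;   /* the file effective flag */
};

/* Sketch of the capability transformation applied on execve(). */
static struct thread_caps transform_on_execve(struct thread_caps old,
                                              struct file_caps file,
                                              uint64_t cap_bset)
{
    struct thread_caps next;
    next.permitted   = (old.inheritable & file.inheritable)
                     | (file.permitted & cap_bset);
    next.effective   = file.effective ? next.permitted : 0;
    next.inheritable = old.inheritable;
    return next;
}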

This simple definition has some surprising (to me) consequences:

  1. The ‘file inheritable set’ is not related to the ‘thread inheritable set’. Having a capability in the file inheritable set of a binary will not put that capability into the resulting process’s thread inheritable set. In other words: a thread that wants to bequeath a capability to a different binary needs to explicitly add the capability to its thread inheritable set through capset() (see the sketch after this list).
  2. Conversely the ‘thread inheritable set’ is not solely responsible for bequeathing a capability to a different binary. The binary also needs to be allowed to receive the capability by setting it in the file inheritable set.
  3. Bequeathing a capability to a different binary by default only gives it the theoretical ability to use the capability. To make the capability effective, the target process must add it to its effective set using capset(), or the file effective flag must be set on the binary.
  4. A nice side effect of the simple copy operation used for the thread inheritable set: A capability can be passed in the thread inheritable set through multiple intermediate fork() and execve() calls to a target process at the end of a very long chain without becoming effective in the middle.
  5. The relevant file capability sets are those of the binary being executed. When trying to give permitted capabilities to an interpreted script, the capabilities must be in the file inheritable set of the interpreter binary. Additionally: If the script can’t/won’t call capset(), the file effective flag must be set on the interpreter binary.
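
To illustrate points 1 and 2: a thread that wants to pass CAP_NET_RAW on to a helper binary could raise the capability in its own thread inheritable set before calling execve(), roughly like this (a sketch using libcap; ./helper is a hypothetical path, CAP_NET_RAW must already be in the thread permitted set, and the helper’s file inheritable set must also contain CAP_NET_RAW for the hand-over to work):

#include <stdio.h>
#include <unistd.h>
#include <sys/capability.h>   /* libcap; link with -lcap */

int main(void)
{
    cap_t caps = cap_get_proc();
    cap_value_t v = CAP_NET_RAW;

    if (caps == NULL) {
        perror("cap_get_proc");
        return 1;
    }
    /* Raise CAP_NET_RAW in our thread inheritable set (it must already
     * be present in the thread permitted set). */
    if (cap_set_flag(caps, CAP_INHERITABLE, 1, &v, CAP_SET) != 0 ||
        cap_set_proc(caps) != 0) {
        perror("raising CAP_NET_RAW in the inheritable set");
        cap_free(caps);
        return 1;
    }
    cap_free(caps);

    /* The helper only gains CAP_NET_RAW in its permitted set if its own
     * file inheritable set contains the capability as well. */
    execl("./helper", "helper", (char *)NULL);
    perror("execl");
    return 1;
}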

Summary

I’ve tried to summarize all the possible paths that a capability can take within a Linux thread using capset() or execve(). (Note: fork() isn’t shown here, since all capability information is simply duplicated when forking.)

[Figure] Linux Capabilities: Possible capability transmission paths

Rescuing Full Archives From A Google Group

A project I’m newly affiliated with has so far used Google Groups for all of its private group communication. Since I’m not a big fan of storing private data in the proprietary data silos of cloud providers, this is a situation I want to rectify. Why use Google Groups when you can set up GNU mailman yourself and not have all data and metadata pass through Google?

There’s one caveat: While Google provides an export of group member lists, there’s no export functionality for the current archive. Which in this case represented 2 years’ worth of fruitful discussion and organizational knowledge. Some tools exist to try and dump all of a group’s archive, but none really agreed with me. So I rolled my own.

I give to you: https://github.com/henryk/gggd

Inside you’ll find a Python script that uses the lynx browser to access the Google Groups API (so it can work with a Google login cookie as an authenticated user). It enumerates all messages in a group’s archive and downloads each one into a separate file as a standard RFC (2)822 message. While programming this I found that some of the messages are returned from the API in a mangled form, so I also wrote a tool (which can be invoked via an option of the downloader) that can partially reverse this mangling.

With the message files from my download tool, formail from the procmail package, and some shell scripting, I was able to generate an mbox file with the entire group’s archive, which could then easily be imported into mailman.

owncloud – Cache Static Assets (CSS/Javascript)

To whom it might be useful,

I recently set up an owncloud instance for private use and found that the load time was abysmal. Showing the default “Files” page takes ~21 seconds and ~140 HTTP requests1, even though my HTTP setup is already quite pimped (with SPDY and all). What is worse: the time does not improve on subsequent visits. No cache-control headers are sent, and all the Javascript and CSS resources are requested again. ETag and If-None-Match are in place, so most of the requests just yield a 304 Not Modified response, but they still block the loading process. Which is even less understandable if you look at the requests: all Javascript and CSS resources use a “?v=md5($owncloud_version)” cache buster, so they would be fully cacheable with no ill effects.

For a standard owncloud installation in /var/www with Apache: Open your /var/www/owncloud/.htaccess in a text editor and append the following lines (Update 2014-10-09 18:35 UTC: Add missing \ before .)

<IfModule mod_headers.c>
<FilesMatch "\.(css|js)$">
Header set Cache-Control "max-age=2592000, public"
</FilesMatch>
</IfModule>

then in a shell make sure that the headers module is enabled in Apache:

sudo a2enmod headers

and restart Apache as prompted by a2enmod.

The next time you load the owncloud web interface your browser will be told to cache the Javascript and CSS resources for 30 days, and the time after that it won’t request them again. The “Files” app load time dropped from 21 to 6 seconds for me – with 16 instead of ~140 requests.  That’s almost reasonable!


  1. In Firefox: Press Ctrl-Shift‑Q to bring up the Network web developer tool to watch the drama unfold in its entirety. 

DJB’s tinydns and DNSSEC

While upgrading my server infrastructure I noticed that I really should be providing IPv6 not only for the services (like this HTTP/HTTPS site) but also for the DNS itself, and also at some point might want to enable DNSSEC for my domain to join in the fight with DANE against the mafia that is the global X.509 certification authority infrastructure.

My DNS servers have been powered by DJB’s most excellent djbdns package1 since I first started hosting these services myself. The software package truly is fire and forget: you set it up once and it will continue working, with no maintenance or pesky software upgrades, year after year. That’s one thing Dan’s software is famous for.

The other thing everyone knows about his software is that if you want to add features, you’ll have to apply third-party patches. A well-known patch set for IPv6 in tinydns is available from my friend Fefe, and is also included in Debian-based distributions in a package called dbndns. Peter Conrad wrote DNSSEC support for tinydns (explicitly basing on Fefe’s IPv6 patches).

When trying to set that up, I quickly became frustrated: Applying several patches from several distinct locations one after the other doesn’t seem like the way software should be distributed in 2014. Also, Peter’s code has a few easily patched problems.

So I’ve set up github.com/henryk/tinydnssec/tree/dnssec-1.05-test27-8ubuntu1-tinydnssec_1.3. Each commit is either the import of a tarball, the application of a patch, or a fix from me. I have signed the tag with my GPG key. You can easily use the GitHub-provided download link dnssec-1.05-test27-8ubuntu1-tinydnssec_1.3.zip.

The steps I took, in order:

  1. Import djbdns-1.05.tar.gz. No signature check was made since no signed version is available, but I checked that I was using the same package as Ubuntu/Debian.
  2. Apply djbdns-1.05-test27.diff.bz2. I checked Fefe’s signature and verified his key’s fingerprint using a separate channel.
  3. Apply 0003-djbdns-misformats-some-long-response-packets-patch-a.diff from the Ubuntu package.
  4. Apply 0004-dnscache.c-allow-a-maximum-of-20-concurrent-outgoing.diff from the Ubuntu package.
  5. Apply djbdns-ipv6-make.patch. No signature check was done, but the patch is trivial.
  6. Import tinydnssec-1.05-1.3.tar.bz2. I checked Peter’s signature and verified his key through the web of trust.
  7. Apply djbdns-1.05-dnssec.patch from the aforementioned package.
  8. Small fixup for conf-cc and conf-ld: Do not use diet for compilation or linking (was introduced with Fefe’s patch).
  9. Small fixup for tinydns-sign.pl: Use Digest::SHA instead of Digest::SHA1.
  10. Small fixup for run-tests.sh: GNU tail does not understand the +n syntax.
  11. Small fixup for run-tests.sh: Need bash, say so (not all /bin/sh are bash).

The resulting source builds fine, and the tests mostly run fine. Tests 1 and 7 each fail in 50% of cases due to the randomized record ordering in the tinydns output which is not accounted for in the test code.

djbdns is in the public domain, while tinydnssec is published under GPL-3, which means that the combined source also falls under GPL-3.


  1. The software package is ‘djbdns’; among the servers in it are ‘tinydns’, which hosts an authoritative UDP DNS server, and ‘axfrdns’, which hosts a TCP DNS server.

Setting Arbitrary Baud Rates On Linux

Historically, baud rates on UNIX (later: POSIX) systems have been manipulated using the tcgetattr()/tcsetattr() functions with a struct termios and a very limited set of possible rates identified by constants such as B0, B50, B75, B110, …, through B9600. These were later extended with select values such as B38400 and B115200. Hardware has since evolved to be able to use almost any value as a baud rate, even much higher ones. The interface, however, has never been properly repaired.

Linux used a technique called “baud rate aliasing” to circumvent that problem in the past: A special mode can be set so that a request for B38400 would not actually set 38.4kBaud but instead a separately defined other baud rate with names like spd_hi (“high”?) for 57.6kBaud, spd_shi (“super high”?) for 230kBaud or spd_warp for 460kBaud. These names may give you an idea how old and limited this interface is.

For this reason there is a new ioctl interface to set an arbitrary baud rate by actually using an integer to store the requested baud rate: TCGETS2/TCSETS2 using struct termios2.
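
Used from C, this interface looks roughly as follows (a minimal sketch, not the example code from the bug report mentioned below; note that <asm/termbits.h> conflicts with glibc’s <termios.h>, so the latter must not be included in the same file, and the exact set of includes may vary between libc versions):

#include <asm/termbits.h>   /* struct termios2, BOTHER, CBAUD */
#include <asm/ioctls.h>     /* TCGETS2, TCSETS2 */
#include <sys/ioctl.h>      /* ioctl() */

/* Set an arbitrary baud rate on an already-open serial port fd.
 * Returns 0 on success, -1 on error (errno is set by ioctl()). */
int set_custom_baudrate(int fd, int baudrate)
{
    struct termios2 tio;

    if (ioctl(fd, TCGETS2, &tio) < 0)
        return -1;

    tio.c_cflag &= ~CBAUD;    /* clear the legacy Bxxx baud rate bits */
    tio.c_cflag |= BOTHER;    /* "other" rate: taken from c_ispeed/c_ospeed */
    tio.c_ispeed = baudrate;
    tio.c_ospeed = baudrate;

    return ioctl(fd, TCSETS2, &tio) < 0 ? -1 : 0;
}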

Both documentation and example code for this method are sparse. A bug report to implement this in libc6 is still open. Thankfully that bug report includes example C code to use the interface directly. The constant that tells the structure that an OTHER baud rate has been set has unwisely been called BOTHER, which, being a proper English word, makes it nearly impossible to find any information about on the internet. So, to be more explicit (and hopefully be found by any future search for this topic): this is an example of how to set a custom baud rate with the BOTHER flag on Linux in Perl.

Transforming the C example into Perl code using the Perl ioctl function should be easy, right? Muahahaha. Every example on how to use Perl ioctl on the Internet (that I’ve reviewed) is wrong and/or broken. Even better: the perl distribution itself is broken in this instance. Quoth /usr/lib/perl/5.18.2/asm-generic/ioctls.ph on Ubuntu 14.04:

eval 'sub TCGETS2 () { &_IOR(ord(\'T\'), 0x2a, 1;}' unless defined(&TCGETS2);

(hint: count the number of opening and closing parentheses.)

Even if that Perl code were syntactically correct, it would be wrong in principle: the third argument to the _IOR macro should be the size of struct termios2. On x86_64 that’s 44 bytes, not 1.

So, I’ve written code with two purposes:

  1. Correctly use Perl’s ioctl to
  2. set a custom serial baud rate under Linux.

The definitions of both TCGETS2 and struct termios2 may be architecture dependent, so there’s a helper program in C to output the parameters for the current architecture.
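
Such a helper can look roughly like this (a sketch, not the exact program from the repository; it prints the ioctl numbers and structure size that the Perl side needs):

#include <stdio.h>
#include <asm/termbits.h>   /* struct termios2 */
#include <asm/ioctls.h>     /* TCGETS2, TCSETS2 */

int main(void)
{
    /* Print the architecture-dependent ioctl numbers and structure size. */
    printf("TCGETS2=0x%lx\n", (unsigned long)TCGETS2);
    printf("TCSETS2=0x%lx\n", (unsigned long)TCSETS2);
    printf("sizeof(struct termios2)=%zu\n", sizeof(struct termios2));
    return 0;
}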

I have released all the code (set baud rate with TCSETS2 BOTHER in C, set baud rate with TCSETS2 BOTHER in Perl, a C helper to output the constants for the current architecture, and a Makefile) into the public domain at github.com/henryk/perl-baudrate/.

On the Difference Between RFID and NFC

What is RFID? What is NFC? What is the difference between RFID and NFC? These questions come up time and again, so let me answer them in some detail.

Both are terms that are almost never used correctly, and both have, in a general sense, something to do with communicating or radioing.

What is RFID?

Let’s start with the older term: RFID is just “radio frequency identification”. It’s not really defined, beyond being a combination of the two attributes, and, if you are so inclined, you could cite the “Identification Friend or Foe” systems invented for military airplanes in the 1930s as one of the earliest RFID systems1.

In modern times, the term RFID is almost always used to imply a system consisting of few relatively complex ‘readers’ and a larger number of relatively, or very, simple ‘transponders’, with some sort of radio signal being used to indicate the identification, or at least presence2, of the latter to the former. Now, that’s still quite abstract, so let’s add further characteristics, at each step going in the direction of the systems that most people actually mean when they say RFID with no further qualification:

  • The transponder could be active (have its own power source) or passive (be energized by the reader using some physical effect); the latter is what’s on most people’s minds in the context of RFID.
  • A passive transponder can be communicated with using radio waves through radar backscatter (ultra-high frequencies, range in the hundreds of meters, very little power available to the transponder) or, more often seen in everyday life, be inductively coupled (low to high frequencies, range less than a couple of meters, possibly high power available).
  • An inductively coupled transponder could operate on a non-standardized low frequency (LF, ~120–140kHz) in a proprietary system, on the standard high frequency (HF, 13.56MHz) in a proprietary system, or, covering most uses of the term RFID, on the 13.56MHz frequency using an ISO-standardized protocol.
  • The 13.56MHz RFID ISO protocols are ISO 15693 (vicinity coupling, defined range less than a meter) and, more often referenced in the context of “RFID”, ISO 14443 (proximity coupling, defined range less than ten centimeters).

Different properties of these general approaches lead to a very domain specific understanding of what “a normal RFID system” is: Warehouse management applications sometimes deal with ISO 15693 and more often with Gen 2 EPC (ISO 18000–6, passive backscatter, UHF: 860–960MHz). Average consumers overwhelmingly find themselves confronted with ISO 14443 systems (electronic passports, credit cards, newer corporate IDs) or proprietary HF systems (many corporate IDs). Finally, most very simple or moderately old applications quite often work with proprietary LF systems.

It’s a shaky definition process, but at least once you have determined that you are talking about ISO 14443 you’re on quite firm ground. However, this only gets you as far as establishing communication with a transponder, possibly gathering a transponder-specific unique identifier, and transmitting bytes back and forth. The actual command set for reading and writing, and potentially other functions such as electronic purse applications, is a completely different horse ride altogether.

What is NFC?

Now, on the subject of NFC, this is even less well defined – or possibly better, depending on how you look at it. It’s a relatively new term, so there’s no firm default interpretation you could use, besides it having something to do with “near-field” and “communication” (e.g. inductive coupling and some sort of information transfer). There are, however, a couple of well defined things that bear the name NFC – none of which are usually exclusively intended by someone using the term:

  • NFCIP‑1, also known as ISO 18092 (dual-published as ECMA-340 [PDF]), which is an air interface for half-duplex communication between two entities using inductive coupling on 13.56MHz; at least one of the entities must be actively powered.
  • The NFC Forum, which is an industry association that publishes a set of standards, among them:
    • NFC Data Exchange Format (NDEF), which is a compact binary data storage and message serialization format
    • NFC Record Type Definition (RTD), which is a specification format for NDEF message formats
    • A couple of RTDs that define both the message format and expected semantics of common use cases such as smart posters, business cards, etc.
    • NFC Tag Type definitions (1 through 4) that define a set of protocols for passive data storage tags and how to access NDEF messages on them

How do RFID and NFC relate?

Now comes the fun part: NFCIP‑1 is, not by accident, compatible with ISO 14443, where appropriate. Full-on NFCIP‑1 devices generally can implement both sides (now called Initiator and Target) of the communication, and so are compatible both with ISO 14443 readers (by emulating a tag) and ISO 14443 tags (by behaving as a reader). As an aside: Most vendors, while they’re on the 13.56MHz frequency anyway, also implement all the usual 13.56MHz RFID protocols in the things they call NFC chipsets, which is not at all helpful when trying to untangle the standards salad. Just because your “NFC phone” can operate with a certain tag does not mean that it’s “doing NFC” in a certain narrowly defined sense.

And even better: the NFC tag types correspond to existing 13.56MHz RFID tag products, but sometimes in a generalized version. For example, tag type 2 is essentially NXP Mifare Ultralight3, but where Ultralight has a fixed 64 bytes of memory, tag type 2 also allows arbitrary sizes bigger than 64 bytes. And indeed, some of the most ubiquitous “NFC tags” that you can buy now are NFC type 2 tags which are not NXP Mifare Ultralight and have ~160 bytes of memory.

In conclusion, by NFC most people mean, depending on context, a tag type or message format from the NFC ecosystem, or the NFC chip in their phones, even when they are using it with any old ISO 14443 tag4, which, closing the loop here, is what most people mean when they are referencing RFID.


  1. I got that example from Dr. Melanie Rieback, who does so in all her talks. 

  2. This is sometimes referred to as ‘1‑bit identification’, and is extremely often seen in the context of electronic article surveillance.

  3. The memory map table in the NFC tag type definition is an almost verbatim copy of that in the Ultralight data sheet, however, you will not find the words “mifare” nor “ultralight” anywhere in the tag type definition document. 

  4. The single most widespread ISO 14443 transponder type is Mifare Classic, which is not an NFC Forum tag type, but, confusingly, works with most NFC implementations in mobile phones as if it was.