We had just deployed to a staging environment and nothing seemed to be working right. There were no problems, though, when running locally or in the integrated dev/QA environments. The logs showed SSL certificate verification issues, yet the SSL certificate was validating fine in browsers. Something else was amiss, but what?
We did find and solve the SSL issue. It was caused by the customer-specific truststore for root certificates. It was missing the certificate for the certificate authority that had signed their SSL certificate, Comodo. That's not the interesting part though.
There were multiple keystores and truststores on the production server, not counting the system or JVM defaults. After spending time looking through configurations to see what was being used, we decided to take a step back and interrogate the system. It could tell us what file was being opened.
This got me thinking about three ways that I had used in the past to interrogate the system: lsof, strace, and dtrace. In fact, I even got to learn about a fourth: SystemTap.
While I didn't get to use all of these to solve our issue (as dtrace and SystemTap aren't available on all systems), they are good tools to have in your toolbox.
Here are three simple ways to use them to find out who's opened a particular file or, as in our case, what file has been opened.
lsof is a Unix command whose name stands for "list open files". It can be used to determine which process currently has a file open. It has a lot of options, but its most basic form is incredibly easy to use: pass it the path of a file and it will show you which process has that file open.
Here's a working example:
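A minimal sketch of such a session follows. It makes assumptions that are not from the original walkthrough: instead of holding the file open from an irb session, it keeps a scratch file open with a background sleep so that everything runs as one script, and the names scratch and holder_pid are made up for illustration. It only calls lsof where the tool is installed.

```shell
# Create a scratch file and keep it open as stdin of a background process.
# (Stand-in for opening the file from an irb session.)
scratch=$(mktemp)
sleep 30 < "$scratch" &
holder_pid=$!

# Ask lsof who has the file open; the sleep process should be listed.
if command -v lsof >/dev/null 2>&1; then
  lsof -- "$scratch" || echo "lsof found no process holding $scratch"
fi

# Close the file by ending the holder, then clean up.
kill "$holder_pid"
wait "$holder_pid" 2>/dev/null || true
rm -f "$scratch"
```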
If you close the file or end the irb session and re-run lsof, you'll no longer see output, as there is no process with the file currently open.
You can find more information via its man page. There are a lot of options, but at its heart the simple rule is this: if a process currently has a file open, lsof will be able to tell you about it.
This is a blessing and a curse as only being able to interrogate the system about currently open file handles can be limiting.
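For the reverse question, the one from our case (which files does this process have open right now?), lsof can be pointed at a process instead of a file. The sketch below wraps that in a shell function. The name files_opened_by is hypothetical; -p is lsof's option for selecting by process ID.

```shell
# Hypothetical helper: list every file the given PID currently has open.
files_opened_by() {
  lsof -p "$1"
}

# Example, listing the files held open by the current shell:
#   files_opened_by "$$"
```

The same limitation applies: a file that was opened and closed before you ran the command will not show up.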
A great list of lsof examples can be found over at http://www.catonmat.net/blog/unix-utilities-lsof/.
strace is a Linux command for tracing system calls and signals. It works by running a specified command and intercepting and recording the system calls the process makes and the signals it receives, reporting each one as it happens.
The basic usage is to put strace in front of the command you want to trace.
Now run strace on a ruby command that reads /etc/passwd.
That prints out, unfiltered, all system calls and signals that occurred when running the ruby command. It's a bit more than what we care for.
strace does provide a way to get a better signal-to-noise ratio. We can specify a qualifying expression using the -e option to indicate we only want to see open system calls.
Your output will now look a lot more reasonable, limited to the open calls.
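As a sketch of that filtered run, the function below wraps strace with an -e expression. Two things here are assumptions rather than the article's own listing: the helper name trace_opens, and the ruby one-liner in the usage comment, which stands in for the command being traced. It also asks for openat alongside open, since many current Linux systems open files through openat; this assumes a strace that knows both call names.

```shell
# Hypothetical helper: run a command under strace, showing only file opens.
# trace=open,openat covers both the classic and the newer open system call.
trace_opens() {
  strace -e trace=open,openat "$@"
}

# Example (assumed stand-in command):
#   trace_opens ruby -e 'File.read("/etc/passwd")'
```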
This is pretty close to what we did when looking to see what truststore file was being opened by our process, with one major caveat. By default strace won't trace child processes.
Let's run the above command again, but cause our ruby process to fork before reading /etc/passwd.
This time you'll notice that we're missing the last open system call, the one on /etc/passwd. We can fix this using the -f option, which tells strace to trace child processes. It keeps following, so this will trace forks of forks of forks too.
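A sketch of the child-following variant, under the same assumptions as any illustration here: trace_opens_with_children is a hypothetical name, and the ruby one-liner in the comment is an assumed stand-in that does its file read inside a forked child.

```shell
# Hypothetical helper: show only file opens, and use -f so that strace
# also follows every child process created by fork.
trace_opens_with_children() {
  strace -f -e trace=open,openat "$@"
}

# Example (assumed stand-in command that reads the file in a forked child):
#   trace_opens_with_children ruby -e 'fork { File.read("/etc/passwd") }; Process.wait'
```

With -f, lines that come from a child process are prefixed with that child's PID, which makes it easier to tell who opened what.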
One last tip for using strace: the -o option, which lets you specify an output file. This can come in handy when the process you start with strace ends up being started by other processes (like init.d daemons) and you don't have access to its STDOUT.
With -o, the system calls are logged to the file you name instead of being printed to the terminal.
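A minimal sketch of logging to a file, written as a function that could be dropped into an init script. The helper name, the log path, and the daemon path in the usage comment are all hypothetical; only the strace options themselves (-f, -o, -e) are real.

```shell
# Hypothetical helper: the first argument is the log file, the rest is the
# command to trace. -o sends strace's report to that file.
trace_opens_to_file() {
  log_file=$1
  shift
  strace -f -o "$log_file" -e trace=open,openat "$@"
}

# Example (hypothetical log path and daemon command):
#   trace_opens_to_file /tmp/mydaemon.strace /usr/local/bin/mydaemon
```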
strace is a great tool, especially on Linux systems, since it seems to be installed on many distributions by default. As far as I know it's not installed by default on OS X or other Unix systems, although there appear to be packages that can be installed.
A good list of strace examples can be found at http://www.commandlinefu.com/commands/using/strace.
dtrace, which stands for "dynamic tracing", is the bee's knees in process and systems debugging. It gives you the ability to trace dynamic languages, compiled executables, libraries, system calls, kernel calls, and hardware calls, and does it dynamically.
There's a lot to dtrace and its power extends far beyond what strace can accomplish. There are a few books on dtrace (one of which I am currently working my way through).
Since the scope of dtrace is so vast I'm going to force myself to keep it super minimal. Oh, and you should know, dtrace is available on OS X 10.5 and up, FreeBSD, Oracle Solaris, and OpenSolaris. There is a port to Linux but I haven't used it. And by default it requires superuser privileges to run.
Here's an example for finding which files are being opened on your system:
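A minimal sketch of such a command, wrapped in a shell function. The function name watch_opens is hypothetical, and the one-liner is an illustration built from standard dtrace pieces (the -n option, the syscall provider's open entry probe, execname, and copyinstr) rather than the article's exact listing. It only watches the open call itself, so other open variants that some systems use are not covered.

```shell
# Hypothetical helper: print the process name and path for every open()
# call on the system. Needs superuser privileges; stop it with Ctrl-C.
watch_opens() {
  dtrace -n 'syscall::open:entry { printf("%s %s", execname, copyinstr(arg0)); }'
}

# Example:
#   watch_opens
```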
This will output all of the files that are being opened on your system by all processes. It's quite a bit of information! If only there were a way to narrow it down.
Well, you're in luck. With dtrace you can use predicates to narrow down the results collected. For example, start an irb session in one terminal, and then open a second terminal.
In that second terminal, let's find out the PID of the irb process.
Now run dtrace with a predicate that matches only that PID.
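A sketch of a PID-restricted trace. The predicate is the part between the slashes: the probe only fires when pid equals the number you pass in. The function name watch_opens_for_pid and the placeholder PID in the comment are hypothetical, and the command is an illustration rather than the article's exact listing.

```shell
# Hypothetical helper: print process name and path for each open() call,
# but only for the process ID given as the first argument.
watch_opens_for_pid() {
  dtrace -n "syscall::open:entry /pid == $1/ { printf(\"%s %s\", execname, copyinstr(arg0)); }"
}

# Example (1234 is a placeholder; use the PID of your irb process):
#   watch_opens_for_pid 1234
```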
Leave dtrace running in your second terminal and navigate back to your first terminal, the one with irb running. In irb, read a file, for example with File.read.
Now go back to your dtrace terminal. You'll see a line of output for the file that irb just opened.
Any time your irb session opens a file, dtrace will tell you about it. This is not limited to irb either. You can trace any open system calls for any process. It's just that in the above example we chose to focus on our irb session.
Let's revisit the problem described at the top of the post for just a second. We could have used dtrace and the above command(s) to help us identify the truststore file that was being used. Unfortunately, the server was Red Hat Enterprise Linux, which doesn't have dtrace. Had it been available, it would have saved us the time spent stopping our daemon, modifying the init script to run with strace, and then restarting the daemon to collect output.
All in all, dtrace is very powerful and hopefully this one example provides a little insight into how nice it can be.
Many useful one-liners for dtrace have been shared by Brendan Gregg over at http://www.brendangregg.com/DTrace/dtrace_oneliners.txt.
There's a possible fourth... SystemTap
I was using RHEL6 and tried to get SystemTap to work. Sadly, it didn't happen. The necessary debuginfo kernel package that SystemTap needs isn't currently in the RHEL6 RPM/Yum repository.
It's not every day that you need to find out what low-level system calls are being made. When you do need to know, tools like lsof, strace, dtrace, and SystemTap can make the difference between pulling your hair out and pinpointing the issue. You may need to find out who's got your file in some of the strangest situations, such as the case of SSL certificates and truststores.
Masthead image courtesy of Paramount Pictures