Mastering the strace Tool: A Practical Guide


Introduction

Linux is a powerful and flexible operating system that developers around the world value for its openness, reliability, and ability to be customized. One of its standout tools is strace, an essential utility for debugging and gaining insights into how programs interact with the system.

In this blog post, we’ll dive deep into strace, uncovering how it works, its role in the Linux ecosystem, and how you can wield it to debug and analyze applications effectively. We will also focus on how to interpret its output, so you can make the most of this tool.

What is the Linux System?

Linux is a Unix-like operating system kernel that forms the backbone of countless distributions, from Ubuntu to Fedora. It manages hardware resources, provides an interface for applications to run, and facilitates multitasking and networking.

Why Developers Love Linux

Transparency: Linux is open-source, allowing anyone to inspect, modify, and learn from its code.
Stability: Renowned for its reliability in both desktop and server environments.
Customization: From the kernel to the desktop environment, everything is customizable.
Community: A robust community and wealth of resources make Linux approachable for beginners and experts alike.

System Calls in Linux: Categories and Main Methods

Before we dive into strace, let’s first understand system calls. These are the bridge between user applications and the kernel. When an application needs to perform a task that requires hardware access or privileged operations, it uses system calls.

In simple terms, a system call is how a program asks the operating system to do something on its behalf. When a program makes a system call, the CPU switches from user mode to the more privileged kernel mode, the kernel performs the requested service (such as reading a file or creating a new process), and control then returns to the program in user mode.

System calls are grouped into five main categories:

Process Control
File Management
Device Management
Information Maintenance
Communication

Let’s explore each category in detail and list the main system calls related to them.

1. Process Control

Process control system calls manage the lifecycle of processes: they create, terminate, and otherwise manipulate processes and threads.

Main System Calls in Process Control:

fork(): Creates a new process by duplicating the calling process. The new process is a child of the calling process.
- Return Value: The process ID of the child is returned to the parent, and 0 is returned to the child process.
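exec(): Replaces the current process’s image with a new program. It is typically called after fork() to run a different program in the child process.
- Variants: execl(), execv(), execlp(), execvp(), etc. These are C library wrappers; the underlying system call is execve(), which is the name that appears in strace output.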
wait(): Makes the parent process wait for the termination of a child process. It returns the exit status of the terminated child.
exit(): Terminates the calling process and returns an exit status to the parent process.
getpid(): Returns the process ID (PID) of the calling process.
getppid(): Returns the parent process ID (PPID) of the calling process.
kill(): Sends a signal to a process, which can terminate it or trigger other actions based on the signal.

Explanation:

Process Creation: The fork() system call is fundamental to process creation. It creates a new child process which is a copy of the parent process, and then you can use exec() to replace the child’s image with a different program.
Process Termination: After the child process completes its task, it exits, and the parent can capture the exit status using wait().
Signal Handling: The kill() system call sends a signal to a process, allowing communication or termination of processes.
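
As a rough illustration of the create/terminate/wait cycle described above, the shell goes through the same steps whenever it runs an external command in the background. This is a minimal sketch assuming a POSIX shell; the exit status 3 is arbitrary, and under strace the fork step typically shows up as clone() on Linux.

```shell
# Run a child in the background: the shell forks, and the child
# executes `sh`, which terminates with exit status 3.
sh -c 'exit 3' &
child_pid=$!                 # PID of the child, as seen by the parent

# wait blocks until the child terminates and reports its exit status.
child_status=0
wait "$child_pid" || child_status=$?

echo "child $child_pid exited with status $child_status"
```

Running the same lines under strace -f should show the process-creation, execve, and wait-family calls in that order.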

2. File Management

File management system calls allow processes to interact with files. These system calls help manage file access, reading, writing, and other file-related operations.

Main System Calls in File Management:

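open(): Opens a file for reading, writing, or both. It returns a file descriptor used for further operations on the file. On current Linux systems the C library usually implements it with the openat() system call, so that is the name you will see in strace output.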
- Flags: O_RDONLY, O_WRONLY, O_RDWR, O_CREAT, O_TRUNC, etc.
read(): Reads data from a file. It takes a file descriptor and stores the data into a buffer.
write(): Writes data to a file. It takes a file descriptor and the data to be written.
close(): Closes an open file descriptor, releasing the resources associated with it.
lseek(): Changes the file offset for the next read/write operation. This is used to navigate through large files.
stat(): Retrieves information about a file, such as its size, permissions, and timestamps.
unlink(): Deletes a file name from the filesystem. The file's data is released once no other links or open file descriptors refer to it.

Explanation:

File Operations: open(), read(), write(), and close() are the core system calls for interacting with files.
File Information: stat() provides metadata about a file, while lseek() allows for precise file navigation.
File Deletion: unlink() removes a file's name from the filesystem; the space is freed once nothing else references the file.
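
The open/write/close/unlink sequence can be sketched with plain shell redirections. This is only an illustration: the descriptor number 4 and the variable names are arbitrary choices, and the exact system calls a given shell issues may differ slightly.

```shell
tmpfile=$(mktemp)                 # create an empty temporary file

exec 4>"$tmpfile"                 # open the file for writing on descriptor 4
printf 'hello\n' >&4              # write six bytes to descriptor 4
exec 4>&-                         # close descriptor 4

file_content=$(cat "$tmpfile")    # cat opens, reads, and closes the file
file_size=$(wc -c < "$tmpfile" | tr -d ' ')   # size in bytes

rm "$tmpfile"                     # remove the file name (an unlink-family call)

echo "content=$file_content size=$file_size"
```

This prints content=hello size=6, and the temporary file no longer exists afterwards.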

3. Device Management

Device management system calls handle interactions with hardware devices, including input/output devices such as disks, terminals, and network interfaces.

Main System Calls in Device Management:

ioctl(): Provides a way for applications to communicate with device drivers, allowing them to control hardware directly. This call can perform various device-specific operations.
read() and write(): These system calls are also used for device I/O, where a device is treated as a file.
mmap(): Maps a file or device into the process's address space, so its contents can be accessed through ordinary memory reads and writes instead of repeated read() and write() calls.

Explanation:

Device Interaction: Devices in Linux are represented as files, and read() and write() are commonly used to interact with them.
Device Control: ioctl() allows for device-specific operations that go beyond basic file operations.
Memory Mapping: mmap() is often used for devices like graphics cards and for memory-mapped I/O operations.
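
To see the "devices are files" idea in practice, the sketch below reads from and writes to two standard character devices using ordinary file tools. It assumes /dev/zero and /dev/null exist and that head supports -c, as on typical Linux systems.

```shell
# Read 16 bytes from the character device /dev/zero, exactly as if it
# were a regular file, and count them.
zero_bytes=$(head -c 16 /dev/zero | wc -c | tr -d ' ')

# Write to /dev/null: the write succeeds and the data is discarded.
printf 'discarded\n' > /dev/null
null_status=$?

echo "read $zero_bytes bytes from /dev/zero; write to /dev/null returned $null_status"
```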

4. Information Maintenance

Information maintenance system calls help manage and retrieve system information, such as process information, system time, and system configuration.

Main System Calls in Information Maintenance:

gettimeofday(): Retrieves the current time of day, including the number of seconds and microseconds since the Unix epoch.
time(): Returns the number of seconds since the Unix epoch.
uname(): Provides system information, such as the kernel version, machine architecture, and operating system.
sysinfo(): Retrieves various system statistics, including memory usage, load average, and uptime.
getpid() and getppid(): These calls return the process ID and parent process ID, respectively.

Explanation:

Time: gettimeofday() and time() are used to fetch the current system time, which is crucial for timestamps and scheduling.
System Info: uname() provides details about the system, while sysinfo() offers performance statistics.
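
Several everyday commands are thin front ends for this kind of information. The sketch below collects a few values; the variable names are illustrative. One caveat when tracing: on modern Linux, time queries are often answered through the vDSO without entering the kernel, so they may not appear in strace output at all.

```shell
epoch_seconds=$(date +%s)    # seconds since the Unix epoch
kernel_name=$(uname -s)      # operating system name, e.g. "Linux"
kernel_release=$(uname -r)   # kernel release string
shell_pid=$$                 # PID of the current shell

echo "time=$epoch_seconds kernel=$kernel_name $kernel_release pid=$shell_pid"
```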

5. Communication

Communication system calls facilitate the exchange of data between processes, either within the same system or across networked systems.

Main System Calls in Communication:

pipe(): Creates a unidirectional data channel (pipe) between two processes. One process writes to the pipe, and the other reads from it.
socket(): Creates an endpoint for network communication. This can be used to create TCP/UDP sockets for inter-process communication over the network.
connect(): Establishes a connection to a remote socket (e.g., a network server).
send() and recv(): Used for sending and receiving data over a socket connection.
shmget(): Creates a shared memory segment, allowing multiple processes to access the same memory space.
msgget(): Creates a message queue, enabling inter-process communication via messages.

Explanation:

Inter-process Communication: pipe(), shmget(), and msgget() provide ways for processes to share data. Pipes offer communication between parent-child processes, while shared memory allows multiple processes to access the same memory space.
Network Communication: socket(), connect(), send(), and recv() are the core calls for network communication, forming the foundation for client-server interactions.
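
The shell's | operator is probably the most familiar use of pipe(). In the minimal sketch below, the shell creates a pipe and connects the writer's standard output to the reader's standard input; the message text is arbitrary.

```shell
# printf writes into the pipe; tr reads from it and upper-cases the text.
piped=$(printf 'hello from the writer\n' | tr '[:lower:]' '[:upper:]')

echo "$piped"
```

This prints HELLO FROM THE WRITER. Tracing the same pipeline with strace -f should show the pipe being created before the two sides start exchanging data.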

What is `strace`?

strace is a diagnostic, debugging, and troubleshooting tool for Linux that traces the system calls made by a process and the signals it receives. It can show this activity live as the program runs or record it to a log for later analysis.

Think of strace as a microscope for your program's execution: it reveals the sequence of system calls and signals, providing insight into what your program is doing behind the scenes.

How `strace` Works

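strace operates by intercepting and recording the system calls made by a process. It relies on the kernel's ptrace facility: the traced process is stopped at the entry and exit of every system call, which lets strace read the call's arguments and return value before the process resumes. Because of these stops, a program runs noticeably slower while it is being traced.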

Installing `strace`

Before using strace, ensure it’s installed on your system:

bash
sudo apt install strace  # For Debian/Ubuntu systems
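
On other distributions the package is generally also called strace. The sketch below lists the usual package-manager commands as comments and then checks whether the binary is on the PATH; have_cmd is a small illustrative helper, not part of strace.

```shell
# Other common package managers (run whichever matches your system):
#   sudo dnf install strace      # Fedora / RHEL
#   sudo pacman -S strace        # Arch Linux
#   sudo apk add strace          # Alpine

# have_cmd succeeds if the named command can be found by the shell.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd strace; then
    strace -V | head -n 1        # print the installed version
else
    echo "strace is not installed"
fi
```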

How to Read `strace` Output

Understanding strace output is critical to using it effectively. The typical format of a system call in the output is as follows:

bash
  open("example.txt", O_RDWR) = 3

Breaking Down the Structure

System Call Name: open — This is the name of the system call being invoked.
Arguments:
- "example.txt": The first argument, typically a string, specifies the file to open.
- O_RDWR: The second argument, often a constant, specifies the mode (read/write in this case).
Return Value:
- = 3: The return value is 3, which is the file descriptor assigned by the kernel for this file. If the call fails, this would instead be -1 with an error code (e.g., ENOENT) displayed.

Understanding this structure allows you to trace the behavior of your application step by step.
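
Because every line follows the same name(arguments) = result shape, it is easy to pick apart with a few lines of shell. The helper below is a hypothetical sketch, and the second sample line is written by hand to show the failure format. It assumes the line has no leading indentation.

```shell
# parse_strace_line is an illustrative helper, not part of strace.
# It splits one trace line into the call name and the result after " = ".
parse_strace_line() {
    line=$1
    name=${line%%"("*}          # text before the first "("
    result=${line##*") = "}     # text after the last ") = "
    printf '%s -> %s\n' "$name" "$result"
}

parse_strace_line 'open("example.txt", O_RDWR) = 3'
parse_strace_line 'open("missing.txt", O_RDONLY) = -1 ENOENT (No such file or directory)'
```

The first call prints open -> 3 and the second prints open -> -1 ENOENT (No such file or directory).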

Example: Tracing `cat /dev/null`

Run cat /dev/null with strace:

bash
  strace cat /dev/null

Sample Output:

bash
    execve("/usr/bin/cat", ["cat", "/dev/null"], 0x7ffd6925a260 /* 64 vars */) = 0
    .......
    openat(AT_FDCWD, "/dev/null", O_RDONLY)   = 3
    newfstatat(3, "", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}, AT_EMPTY_PATH) = 0
    fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
    read(3, "", 131072)                       = 0
    ....
    close(3)                                  = 0
    ....
    exit_group(0)

Detailed Interpretation

bash
    execve("/usr/bin/cat", ["cat", "/dev/null"], 0x7ffd6925a260 /* 64 vars */) = 0

execve: This system call executes the /usr/bin/cat program, passing the arguments cat and /dev/null.
- ["cat", "/dev/null"]: These are the command-line arguments.
- = 0: The execution of execve was successful, returning 0.

bash
    openat(AT_FDCWD, "/dev/null", O_RDONLY) = 3

openat: Opens the file /dev/null with read-only (O_RDONLY) access.
- /dev/null: The special file /dev/null is a device that discards all data written to it and returns EOF when read.
- = 3: The kernel successfully opens /dev/null and assigns file descriptor 3.

bash
    read(3, "", 131072) = 0

read: Reads from file descriptor 3 (/dev/null). Since /dev/null always returns EOF, no data is read.
- = 0: Indicates that nothing was read because /dev/null produces no content.

bash
    close(3) = 0

close: Closes the file descriptor 3 (the open /dev/null).
- = 0: The file descriptor was successfully closed.

bash
    exit_group(0)

exit_group: The process exits with an exit status of 0, indicating that it completed successfully.

Key Flags in `strace`

strace offers various options (flags) that allow you to customize its behavior. Below are some of the most useful flags you can use with strace:

1. `-e` (Expression)

Use -e to specify which system calls to trace (-e openat is shorthand for -e trace=openat). For example, to trace only openat calls:

bash
 strace -e openat ls -l

Output:

bash
    openat(AT_FDCWD, "ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
    openat(AT_FDCWD, "../../libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
    openat(AT_FDCWD, "../../libc.so.6", O_RDONLY|O_CLOEXEC) = 3
    openat(AT_FDCWD, "../../libpcre2-8.so.0", O_RDONLY|O_CLOEXEC) = 3
    openat(AT_FDCWD, "../../filesystems", O_RDONLY|O_CLOEXEC) = 3
    openat(AT_FDCWD, "../../locale-archive", O_RDONLY|O_CLOEXEC) = 3
    openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3

Example:

Trace only openat and read system calls:

bash
    strace -e trace=openat,read ls -l

Trace all system calls except openat:

bash
    strace -e \!openat ls -l

The shell treats ! as a special character (bash uses it for history expansion), so an unquoted ! can make the command fail. Either quote the expression or escape it:

Use quotes ' : strace -e '!openat' ls -l

or escape with \ : strace -e \!openat ls -l

2. `-f` (Follow Forks)

The -f flag tells strace to follow child processes created by the traced process. This is useful when debugging programs that create child processes using fork() or clone().

Example:

bash
    strace -f bash -c "echo hello; ls /tmp"

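echo is a bash builtin, so it runs inside the bash process itself, while ls is an external program that bash normally runs in a child process. With -f, strace traces the system calls of bash and of any children it creates, and marks lines with the PID of the calling process so you can tell them apart.

Another example: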

bash
    strace -f git clone https://github.com/Edmondi-Kacaj/shell-examples.git

Why? git spawns subprocesses for network operations, unpacking objects, and interacting with the filesystem.

3. `-p` (Process ID)

The -p flag allows you to attach strace to an already running process by specifying its process ID (PID).

Example:

bash
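    strace -p 12345

This will attach strace to the process with PID 12345 and start tracing its system calls. You need permission to trace the target, which usually means running as the same user or as root. Press Ctrl+C to detach; the traced process keeps running.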

4. `-s` (String Size)

By default, strace will only print the first 32 bytes of any string argument passed to system calls. You can change this limit with the -s flag to show longer strings.

Example:

Without -s:

bash
    strace -e trace=write ls

Output:

bash
    write(1, "image1.jpg  miscfile  notes.txt "..., 62image1.jpg miscfile  notes.txt  report.pdf  script1.sh  test) = 62

With -s:

bash
    strace -e trace=write -s 256 ls

Output:

bash
    write(1, "image1.jpg  miscfile  notes.txt  report.pdf  script1.sh  test\n", 62image1.jpg  miscfile  notes.txt  report.pdf  script1.sh  test) = 62

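This will trace the write system calls and display up to 256 bytes of the string arguments for each call. In both outputs, the file listing printed by ls itself appears in the middle of the trace line: strace writes to standard error while ls writes to standard output, and both end up on the same terminal.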

5. `-o` (Output File)

The -o flag allows you to redirect the output of strace to a file instead of printing it to the terminal.

Example:

bash
    strace -o output.txt ls -l

This will save the trace output to output.txt.
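
A saved trace is plain text, so standard tools can search it. The sketch below filters a trace file for failed calls; failed_calls is a hypothetical helper, and the sample trace lines are written by hand for illustration rather than captured from a real run.

```shell
# failed_calls prints the lines of a saved trace whose return value is
# -1 followed by an errno name, i.e. the system calls that failed.
failed_calls() {
    grep -E '\) += -1 E[A-Z0-9]+' "$1"
}

# Hand-written sample trace (not real strace output).
trace_file=$(mktemp)
cat > "$trace_file" <<'EOF'
openat(AT_FDCWD, "/dev/null", O_RDONLY) = 3
openat(AT_FDCWD, "missing.txt", O_RDONLY) = -1 ENOENT (No such file or directory)
close(3) = 0
EOF

failed_calls "$trace_file"
rm "$trace_file"
```

Only the ENOENT line is printed. With a real trace, the same helper can be pointed at the file written by -o, for example failed_calls output.txt.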

6. `-y` and `-yy` (File Descriptor Details)

Suppose we want to spy on the ls -l command, tracing only read calls:

bash
   strace -e trace=read ls -l

Result:

bash
   read(3, "...."..., 4096) = 2996

But we don't know which file descriptor 3 refers to.

The -y flag makes strace print the path associated with each file descriptor argument, so you see the actual filename next to the number. The -yy flag goes further and also prints protocol-specific details for socket descriptors, such as the addresses and ports of a connection.

Example:

bash
   strace -y -e trace=read ls -l

Result:

bash
   read(3</etc/locale.alias>, "...... "..., 4096) = 2996

Now we can see that descriptor 3 refers to /etc/locale.alias.
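
The descriptor-to-path mapping is also visible outside strace: on Linux, the kernel exposes it under /proc. The sketch below is Linux-only and assumes /proc is mounted; the descriptor number 4 and the variable names are arbitrary.

```shell
tmpfile=$(mktemp)

exec 4<"$tmpfile"                        # open the file on descriptor 4
fd_target=$(readlink "/proc/$$/fd/4")    # ask the kernel where fd 4 points
exec 4<&-                                # close descriptor 4

rm "$tmpfile"
echo "fd 4 -> $fd_target"
```

The printed path is that of the temporary file, which is the same kind of annotation -y adds next to each descriptor number.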

Example Walkthroughs

Example 1: Monitoring Network Traffic

If you want to trace only network-related system calls (like socket(), connect(), send(), recv()) for a process, you can use the following:

bash
    strace -e trace=network -p 12345

Output example:

bash
    recvmsg(.., {msg_namelen=0}, 0)         = -1 EAGAIN (Resource temporarily unavailable)
    recvmsg(.., {msg_name=.., msg_namelen=.., msg_iov=[{iov_base="...", iov_len=...}], msg_iovlen=.., msg_controllen=.., msg_flags=0}, 0) = 32

This will show network-related activity of the process with PID 12345, such as establishing connections and sending/receiving data.

Example 2: Trying to Open a File Without Permission

This example demonstrates using strace to trace system calls related to file access when attempting to open or create a file without sufficient permissions.

bash
    strace -e trace=openat,write -s 128 /bin/touch /etc/apache2/sites-enabled/000-default.conf

Explanation:

-e trace=openat,write: Filters the trace to show only openat (file access) and write (error messages) system calls.
-s 128: Displays up to 128 characters of string data, ensuring longer strings like file paths are fully visible.
/bin/touch /etc/apache2/sites-enabled/000-default.conf: The program being traced attempts to create or update a file in a protected directory. In this case, the target is the default site configuration file generated by the Apache server.

Output

bash
    openat(AT_FDCWD, "/etc/apache2/sites-enabled/000-default.conf", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 EACCES (Permission denied)
    .....
    write(2, ": Permission denied", 19: Permission denied)     = 19
    ......

Detailed Breakdown:

openat:
- Attempts to open or create the file /etc/apache2/sites-enabled/000-default.conf.
- Fails with -1 EACCES, indicating "Permission denied".
write:
- Outputs the error message : Permission denied to standard error (file descriptor 2).
- Writes 19 bytes successfully, showing how the error is communicated to the user.

Conclusion

In this post, we learned about system calls and their role in enabling communication between user applications and the kernel. We explored the different categories of system calls and how they help manage processes, files, devices, information, and communication.

We also went deep into understanding the strace command, which is an invaluable tool for tracing and debugging system calls made by a program. With strace, we can monitor system calls in real time, filter specific calls, follow child processes, and save trace outputs to files. Mastering strace gives you the ability to diagnose issues, optimize performance, and gain a deeper understanding of how your programs interact with the Linux operating system.
