Debug

Basic checklist #

  • uptime: check load average
  • df -h: check filesystem usage
  • dmesg -H: check potentially-useful kernel messages
  • sudo journalctl -e: check service error messages
  • top, htop, or ps aux: look for irregular processes (dead or zombie processes)
  • free -h: check free memory
  • ip a: check IP addresses

When a system behaves abnormally (for example, it is slow), first identify which resource is the likely cause: CPU, memory, networking, or I/O.
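As a first triage step, the load average reported by uptime can be compared with the number of CPUs. The following is a minimal sketch; the helper names load_ratio and is_overloaded and the threshold of 1.0 per CPU are illustrative assumptions, not fixed rules. On Linux, the load average also counts processes blocked in uninterruptible I/O, so a high ratio can point at disk I/O as well as CPU.

```shell
# load_ratio LOAD NCPU: print the load average divided by the CPU count.
# (hypothetical helper, not a standard tool)
load_ratio() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# is_overloaded LOAD NCPU: succeed when the load per CPU exceeds 1.0.
# The 1.0 threshold is an assumption chosen for illustration.
is_overloaded() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { exit !(load / ncpu > 1.0) }'
}

# Possible usage on Linux (assumes /proc/loadavg and nproc are available):
#   is_overloaded "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" && echo "load exceeds CPU count"
```

If the ratio is high while top shows little CPU usage, iostat and free are reasonable next steps.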

Networking #

  • See flow graph of network I/O: nload
  • See network I/O per host: iftop
  • List remote IPs of open connections: ss
  • A background service that logs the total network I/O over a period:
vnstat
vnstat --top10
vnstat -d

DNS #

$ cat /etc/resolv.conf # are the nameserver IPs correct?
$ nslookup www.google.com # nslookup is from the bind-utils package
$ nslookup linux3.sa.csie.ntu.edu.tw 10.217.44.1 # check whether DNS server 10.217.44.1 is working
$ ping www.google.com # if you don't have `nslookup`, this can also reveal a DNS problem
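The per-server check above can be repeated for every nameserver listed in /etc/resolv.conf. This is a rough sketch: the function names list_nameservers and check_nameservers are made up for illustration, and the check relies on nslookup returning a non-zero exit status on failure, which is worth verifying on your system.

```shell
# list_nameservers [FILE]: print the nameserver IPs from a resolv.conf-style file.
# (hypothetical helper; defaults to /etc/resolv.conf)
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# check_nameservers NAME [FILE]: ask each configured server to resolve NAME.
# Assumes nslookup exits non-zero when the lookup fails.
check_nameservers() {
    name=$1
    for server in $(list_nameservers "${2:-/etc/resolv.conf}"); do
        if nslookup "$name" "$server" >/dev/null 2>&1; then
            echo "$server: ok"
        else
            echo "$server: FAILED"
        fi
    done
}

# Possible usage:
#   check_nameservers www.google.com
```

A server that fails here while the others answer is a good candidate for removal from resolv.conf.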

Unable to start a service #

  • lsof -i TCP:[port]: check whether the service's listening port is already in use by another process
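If lsof is not installed, the same question can be answered from the output of ss. The sketch below assumes the column layout printed by `ss -Hltnp` from iproute2 (state, two queue columns, local address:port, peer address:port, process); filter_port is a hypothetical helper name.

```shell
# filter_port PORT: read `ss -Hltnp` lines on stdin and print the ones
# whose local address (4th column) ends in :PORT.
filter_port() {
    awk -v port="$1" '{
        n = split($4, parts, ":")      # the port is the last colon-separated piece
        if (parts[n] == port) print
    }'
}

# Possible usage (sudo lets ss show processes owned by other users):
#   sudo ss -Hltnp | filter_port 8080
```

Splitting on the last colon keeps the match exact, so port 22 does not also match 2222, and IPv6 addresses such as [::]:22 are handled.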

Irregular connections #

  • When a system has been compromised, the attacker may install malicious programs that open an unusual number of outbound connections, typically to attack other computers.
  • A user might be running P2P applications that are prohibited on TANet.
## Irregular traffic
$ ss -t -a # Any suspicious IP?
$ netstat -tulnp # Any weird process listening?

## Outbound connection slow ...
$ mtr [host]  # Packets dropped at any hop along the route?
$ tcpdump -nn host [xxx] and port [yyy]
$ iftop  # Fancy tools XD
$ nload  # Also fancy tools
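To spot a suspicious peer quickly, the connection list from ss can be reduced to a count per remote IP. This is a sketch under the assumption that `ss -Htn` prints the peer address:port in the fifth column; count_peers is a hypothetical helper name.

```shell
# count_peers: read `ss -Htn` lines on stdin and print "COUNT IP",
# most frequent remote address first.
count_peers() {
    awk '{
        peer = $5
        sub(/:[^:]*$/, "", peer)       # drop the trailing :port
        print peer
    }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Possible usage:
#   ss -Htn | count_peers | head
```

A single remote address with hundreds of connections, or many connections to unfamiliar addresses on the same port, deserves a closer look with tcpdump.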

Disk or I/O #

Sometimes a filesystem becomes full or disk I/O becomes slow.

If you identify the offending oversized file, it is not sufficient to just unlink it; you must also make sure no process still holds it open, because the kernel frees the space only after the last open file descriptor is closed.

$ df -h # what filesystem is full?
$ ncdu  # check directory space usage interactively
$ du -sh * # check directory space usage
$ lsof [somefile]  # check which process has a file open
$ iostat -x 1  # check unusual disk I/O
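A file that was deleted while still open keeps occupying space, and df and du will then disagree. The sketch below lists such files for one process by reading /proc, so it assumes Linux; deleted_open_files is a hypothetical helper name.

```shell
# deleted_open_files PID: list file descriptors of PID that still point
# at unlinked files. Linux only: relies on /proc/PID/fd symlinks, whose
# targets end in " (deleted)" once the file has been removed.
deleted_open_files() {
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd") || continue
        case "$target" in
            *" (deleted)") echo "$fd -> $target" ;;
        esac
    done
}

# Possible usage (inspecting another user's process needs root):
#   deleted_open_files "$(pgrep -o cupsd)"
```

For a system-wide view, `sudo lsof +L1` lists open files whose link count is below one. Restarting or reloading the owning process releases the space.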

Broken services #

Sometimes a service can misbehave…

  • pgrep -al cupsd: is your service alive?

  • systemctl status sshd: is your service alive?

  • sudo strace -p $(pgrep cupsd) -s 1000 -f -o /tmp/trace-output: see what is happening inside the cupsd process

  • vim -c 'set filetype=strace' /tmp/trace-output: view strace’s output

  • sudo journalctl -f -e -u sshd: follow the SSH server's log output in real time

  • sudo tail -f /var/log/sssd/sssd.log: follow SSSD's log output in real time

  • lsof [some file]: Is my file opened by any process?
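An strace log is often long, so a tally of failed system calls can show where to look first. This is a minimal sketch, assuming the usual strace line format in which a failing call ends in ` = -1 ERRNAME (description)`; strace_errors is a hypothetical helper name and uses `grep -o`, which GNU and BSD grep both provide.

```shell
# strace_errors FILE: count failed system calls in an strace log,
# grouped by errno name, most frequent first. Output: "COUNT ERRNAME".
strace_errors() {
    grep -o ' = -1 E[A-Z0-9]*' "$1" |
        awk '{ print $3 }' |           # fields are "=", "-1", "ERRNAME"
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}

# Possible usage with the trace file written above:
#   strace_errors /tmp/trace-output
```

Some errors, such as ENOENT while a program probes optional configuration paths, are routine, so the counts are a starting point rather than a diagnosis.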

Advanced material #

Brendan Gregg’s USE method

Last modified: March 11, 2019