Pdf

PDF #

  • It seems that Evince renders clearer text than Firefox when printing a PDF file.
  • Poppler provides many useful command-line utilities, such as pdfseparate and pdfunite (both used below).

Find a string in lots of PDFs #

Recursively search the current directory for a string (case-insensitive, printing the file name and page number of each match):

pdfgrep -RHni "storage"
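Where pdfgrep is not installed, a rough fallback can be sketched with Poppler's pdftotext and plain grep. The function name pdf_search is hypothetical, and unlike pdfgrep it reports only the matching file names, not page numbers.

```shell
# pdf_search PATTERN [DIR]
# List the PDFs under DIR (default: current directory) whose extracted text
# contains PATTERN, ignoring case. pdf_search is a made-up helper name.
pdf_search() {
    pattern=$1
    dir=${2:-.}
    find "$dir" -type f -name '*.pdf' | while IFS= read -r f; do
        # pdftotext writes the extracted text to stdout when the output is "-"
        if pdftotext "$f" - 2>/dev/null | grep -qi -- "$pattern"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, pdf_search storage ~/papers prints one matching file per line.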

Convert a color PDF into greyscale #

gs \
 -sOutputFile=output-file.pdf \
 -sDEVICE=pdfwrite \
 -sColorConversionStrategy=Gray \
 -dProcessColorModel=/DeviceGray \
 -dCompatibilityLevel=1.4 \
 -dNOPAUSE \
 -dBATCH \
 input-file.pdf
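To convert several files, the same command can be wrapped in a small function. This is a sketch only: the names gray_name and to_gray and the "-gray.pdf" suffix are assumptions, not part of Ghostscript.

```shell
# gray_name FILE.pdf -> FILE-gray.pdf (hypothetical naming scheme)
gray_name() {
    printf '%s\n' "${1%.pdf}-gray.pdf"
}

# to_gray FILE.pdf: write a greyscale copy next to the input,
# using the same gs options as the command above.
to_gray() {
    gs -sOutputFile="$(gray_name "$1")" \
       -sDEVICE=pdfwrite \
       -sColorConversionStrategy=Gray \
       -dProcessColorModel=/DeviceGray \
       -dCompatibilityLevel=1.4 \
       -dNOPAUSE \
       -dBATCH \
       "$1"
}
```

Calling to_gray report.pdf would write report-gray.pdf. When looping over *.pdf, skip files that already end in -gray.pdf so that a second run does not convert them again.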

Replace one page in a PDF file with another #

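# Zero-pad the page number so that i-*.pdf expands in page order
# even when the document has ten or more pages.
pdfseparate input.pdf i-%04d.pdf
cp page.pdf i-0003.pdf # replace page three
pdfunite i-*.pdf output.pdf
rm i-*.pdf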
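The i-*.pdf glob expands in collation order rather than numeric order, which is why zero-padded page numbers matter once a document has ten or more pages. The sketch below shows the effect with empty placeholder files; no real PDFs or Poppler tools are involved, and the p- prefix is just an illustrative name.

```shell
# Demonstrate glob ordering with empty placeholder files.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Unpadded names, as a pattern like i-%d.pdf produces them
for n in 1 2 10; do : > "i-$n.pdf"; done
unpadded=$(echo i-*.pdf)

# Zero-padded names, as a pattern like p-%03d.pdf would produce them
for n in 1 2 10; do : > "$(printf 'p-%03d.pdf' "$n")"; done
padded=$(echo p-*.pdf)

echo "$unpadded"   # i-10.pdf is listed before i-2.pdf
echo "$padded"     # pages appear in numeric order

cd / && rm -rf "$demo"
```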

Double-sided to single-sided #

Scanning a double-sided document takes two passes, one for the upper sides and one for the opposite (down) sides, which produces two PDF files. This script reverses the upper-side PDF and then interleaves the two files into one.

if [ ! -f "$1" ] || [ ! -f "$2" ] || [ -z "$3" ]; then
    echo "Usage: pdf-merge-double-side [down side pdf] [upper side pdf] [output file]";
    exit 1;
fi

if ! touch "$3"; then
    echo "Cannot write to output file: $3";
    exit 1
fi

tmp=$(mktemp -d)
down=$(readlink -f "$1")
up=$(readlink -f "$2")
out=$(readlink -f "$3")
uptmp=____up_tmp
cd "$tmp" || exit 1

# Reverse the upper side PDF first
pdfseparate "$up" "0%d00000000"
pdfunite $(find . -type f | sed 's/[^0-9]*//g' | sort -nr | tr '\n' ' ') \
    "$uptmp" && rm *00000000

# Merge
pdfseparate "$down" "%d00000"
pdfseparate "$uptmp" "%d00001"
rm -f "$uptmp"
pdfunite $(find . -type f | sed 's/[^0-9]*//g' | sort -n | tr '\n' ' ') \
    "$out"

cd / # just leave $tmp
rm -fr "$tmp"
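The merge step relies on the page file names being numeric sort keys: page N of the down-side PDF becomes N00000 and page N of the reversed upper-side PDF becomes N00001, so sort -n interleaves them. The following sketch reproduces only that naming and sorting, without touching any PDF; merge_order is a hypothetical helper.

```shell
# merge_order N: print the file names the merge step would create for two
# N-page PDFs, in the order that `sort -n` hands them to pdfunite.
merge_order() {
    n=$1
    {
        # names generated from the down-side PDF
        i=1; while [ "$i" -le "$n" ]; do echo "${i}00000"; i=$((i + 1)); done
        # names generated from the reversed upper-side PDF
        i=1; while [ "$i" -le "$n" ]; do echo "${i}00001"; i=$((i + 1)); done
    } | sort -n | tr '\n' ' '
}

merge_order 2   # prints: 100000 100001 200000 200001
```

Because the keys are compared numerically, page 10 (1000000) still sorts after page 9 (900001), which a plain lexicographic sort would get wrong.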

Export highlights and comments into text #

Use pdfannots, developed by Andrew Baumann: annotate a PDF paper with highlights and comments while reviewing it, then export them as text to use when writing up the review.
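A minimal sketch of the export step, assuming pdfannots is on the PATH and prints the extracted annotations to standard output when given a PDF file. The wrapper name export_annotations and the "-annotations.md" output name are made up for illustration.

```shell
# export_annotations FILE.pdf
# Save the highlights and comments of FILE.pdf to FILE-annotations.md.
# Assumes `pdfannots FILE.pdf` writes the annotations to stdout.
export_annotations() {
    pdf=$1
    if [ ! -f "$pdf" ]; then
        echo "No such file: $pdf" >&2
        return 1
    fi
    pdfannots "$pdf" > "${pdf%.pdf}-annotations.md"
}
```

For example, export_annotations paper.pdf would leave the notes in paper-annotations.md.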

Last modified: April 21, 2020