PDF #
- It seems that Evince renders clearer text than Firefox when printing a PDF file.
- Many useful utilities are provided by Poppler.
Find a string in lots of PDFs #
Recursively search the PDFs under the current directory for a string, ignoring case and printing the file name and page number of each match:
pdfgrep -RHni "storage"
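Where pdfgrep is not installed, a similar search can be approximated with pdftotext (one of the Poppler utilities) and grep. The following is only a sketch: the function name `pdf_search` is made up for this example, it assumes a grep that supports `--label` (GNU grep does), and it does not handle file names containing newlines. Unlike pdfgrep, the number it prints is the line of the extracted text, not the page.

```shell
# pdf_search PATTERN [DIR]: hypothetical helper, not part of Poppler or pdfgrep.
# Prints case-insensitive matches as "file:line:text".
pdf_search() {
    pattern=$1
    dir=${2:-.}
    find "$dir" -type f -name '*.pdf' | while IFS= read -r f; do
        # "pdftotext FILE -" writes the extracted text to stdout
        pdftotext "$f" - 2>/dev/null |
            grep -Hni --label="$f" -e "$pattern"
    done
}
```

For example, `pdf_search storage` searches the current directory and `pdf_search storage ~/papers` searches another one.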
Convert a color PDF into greyscale #
gs \
-sOutputFile=output-file.pdf \
-sDEVICE=pdfwrite \
-sColorConversionStrategy=Gray \
-dProcessColorModel=/DeviceGray \
-dCompatibilityLevel=1.4 \
-dNOPAUSE \
-dBATCH \
input-file.pdf
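For repeated use, the command above could be wrapped in a small shell function. This is a sketch: the name `pdf_to_gray`, its argument order, and the safety checks are choices made for this example, while the gs options are exactly the ones listed above.

```shell
# pdf_to_gray INPUT OUTPUT: hypothetical wrapper around the gs call above.
pdf_to_gray() {
    if [ "$#" -ne 2 ] || [ ! -f "$1" ]; then
        echo "Usage: pdf_to_gray INPUT.pdf OUTPUT.pdf" >&2
        return 1
    fi
    # Refuse to write the output over the input file
    if [ "$1" = "$2" ]; then
        echo "Input and output must be different files" >&2
        return 1
    fi
    gs \
        -sOutputFile="$2" \
        -sDEVICE=pdfwrite \
        -sColorConversionStrategy=Gray \
        -dProcessColorModel=/DeviceGray \
        -dCompatibilityLevel=1.4 \
        -dNOPAUSE \
        -dBATCH \
        "$1"
}
```

Usage: `pdf_to_gray color.pdf grey.pdf`.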
Replace one page in a PDF file with another #
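# Zero-pad the page numbers so that the glob below keeps the pages in order
# (with plain %d, i-10.pdf would sort before i-2.pdf)
pdfseparate input.pdf i-%03d.pdf
cp page.pdf i-003.pdf # substitute page three
pdfunite i-*.pdf output.pdf
rm i-*.pdf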
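The same steps can be wrapped in a function that works in a temporary directory and assembles the page list by counting upwards instead of relying on a glob, so the order stays correct for documents of any length. This is a sketch under assumptions: `pdf_replace_page` and its argument order are hypothetical, and its variables are plain globals because POSIX sh has no `local`.

```shell
# pdf_replace_page INPUT PAGE REPLACEMENT OUTPUT: hypothetical helper.
pdf_replace_page() {
    if [ "$#" -ne 4 ] || [ ! -f "$1" ] || [ ! -f "$3" ]; then
        echo "Usage: pdf_replace_page INPUT.pdf PAGE REPLACEMENT.pdf OUTPUT.pdf" >&2
        return 1
    fi
    input=$1 page=$2 replacement=$3 output=$4
    tmp=$(mktemp -d) || return 1
    # Split the input into one file per page: p-1.pdf, p-2.pdf, ...
    pdfseparate "$input" "$tmp/p-%d.pdf" || { rm -rf "$tmp"; return 1; }
    if [ ! -f "$tmp/p-$page.pdf" ]; then
        echo "No page $page in $input" >&2
        rm -rf "$tmp"
        return 1
    fi
    cp "$replacement" "$tmp/p-$page.pdf"
    # Rebuild the argument list page by page, in numeric order
    set --
    n=1
    while [ -f "$tmp/p-$n.pdf" ]; do
        set -- "$@" "$tmp/p-$n.pdf"
        n=$((n + 1))
    done
    pdfunite "$@" "$output"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Usage: `pdf_replace_page input.pdf 3 page.pdf output.pdf`. If the replacement file has several pages, all of them end up in place of the chosen page.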
Double-sided to single-sided #
Scanning a double-sided document takes two passes, one for the upper sides and one for the opposite (down) sides, which produces two PDF files. This script merges them into one: it reverses the upper-side PDF, then interleaves its pages with those of the down-side PDF.
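if [ ! -f "$1" ] || [ ! -f "$2" ] || [ -z "$3" ]; then
    echo "Usage: pdf-merge-double-side [down side pdf] [upper side pdf] [output file]"
    exit 1
fi
# Make sure the output file is writable before doing any work
if ! touch "$3"; then
    echo "Cannot write to output file: $3"
    exit 1
fi
tmp=$(mktemp -d)
# Resolve absolute paths (quoted, so names with spaces work),
# because the script changes directory below
down=$(readlink -f "$1")
up=$(readlink -f "$2")
out=$(readlink -f "$3")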
uptmp=____up_tmp
cd "$tmp"
# Reverse the upper side PDF first
pdfseparate "$up" "0%d00000000"
pdfunite $(find . -type f | sed 's/[^0-9]*//g' | sort -nr | tr '\n' ' ') \
"$uptmp" && rm *00000000
# Merge
pdfseparate "$down" "%d00000"
pdfseparate "$uptmp" "%d00001"
rm -f "$uptmp"
pdfunite $(find . -type f | sed 's/[^0-9]*//g' | sort -n | tr '\n' ' ') \
"$out"
cd / # just leave $tmp
rm -fr "$tmp"
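The page order the script produces can be checked without any PDFs. In the sketch below, `interleave_order` is a hypothetical helper that assumes both scans have the same number of pages N; it labels page i of the down-side scan `d<i>` and page i of the upper-side scan `u<i>`.

```shell
# interleave_order N: print the merged page order for two N-page scans.
interleave_order() {
    n=$1
    i=1
    order=
    while [ "$i" -le "$n" ]; do
        # Page i of the down-side scan is followed by page i of the
        # reversed upper-side scan, i.e. original page n - i + 1.
        order="$order d$i u$((n - i + 1))"
        i=$((i + 1))
    done
    echo "${order# }"
}

interleave_order 3   # prints: d1 u3 d2 u2 d3 u1
```

This mirrors the file naming in the script: the `00000` and `00001` suffixes make each down-side page sort immediately before the upper-side page with the same index.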
Export highlights and comments into text #
Use pdfannots, developed by Andrew Baumann: review a PDF paper by adding highlights and comments, then export them as text to write up the review comments.