Splitting Large Audio Books

I’m a big fan of audio books. For the past several years I’ve been listening to them with the setup described in this article (libresonic server, Android client, audio encoded with the Opus codec). It works well for me, but it works best when audio books are split into chapters or into parts no longer than one hour. However, some audio books come as one large file (m4b format, or the proprietary aax format from Audible). To listen to such audio books conveniently I need to split them. Luckily, with the ffmpeg tool and a bit of bash scripting, this is not difficult.

Audio Books Formats

In principle audio books can use any available digital audio format, but the following formats are the most common:

mp3 – good old MPEG layer 3 is still the predominant format for digital audio. An audio book is usually a directory of mp3 files, typically one file per chapter, sometimes split arbitrarily into several pieces of equal duration. Metadata are stored in ID3 tags, and the cover image is either an image file in the directory or embedded in the files’ ID3 tags. As this layout is very informal, its usage differs from user to user and from company to company. ID3 tags in particular are a big mess (they were originally intended for music, so they have to be re-purposed for audio books).

m4b (or m4a) – This is an MPEG-4 container with audio encoded with the AAC codec. This format is used by iTunes (m4b is basically equivalent to m4a; the b is there just to stress that it is an audio book). Often an m4a/m4b is one big file with chapter information in its metadata (chapter name, start and end), and players that support it (like VLC) can show the list of chapters and let you skip directly to a selected chapter. The file also contains metadata tags, and the cover is usually stored as an additional video stream containing just a JPEG image.
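To check whether a file carries chapter metadata, ffprobe can print it. The following is a minimal sketch, assuming ffprobe is installed; `book.m4b` and the helper names `list_chapters` and `chapter_summary` are placeholders of mine, and the sample line is hand-written to illustrate the compact output layout (the same field order the script below relies on), not taken from a real book.

```shell
#!/bin/bash
# Print one chapter per line in ffprobe's compact format (keys omitted).
# Run as: list_chapters book.m4b   (book.m4b is a placeholder file name)
list_chapters() {
    ffprobe -v error -print_format compact=nokey=1 -show_chapters "$1"
}

# Turn one compact line into "start -> end: title".
# Assumed fields: section|id|time_base|start|start_time|end|end_time|title
chapter_summary() {
    local start end title
    IFS='|' read -r _ _ _ _ start _ end title <<< "$1"
    printf '%s -> %s: %s\n' "$start" "$end" "$title"
}

# Hand-written sample line, only for illustration
chapter_summary 'chapter|0|1/1000|0|0.000000|1500000|1500.000000|Chapter 1'
```

If `list_chapters` prints nothing, the file has no chapter metadata and can only be split into fixed-length pieces.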

aax – this is the proprietary format of Audible (an Amazon company and the biggest player in commercial English-language audio books). It’s very similar to m4b: an MPEG-4 container with AAC LC encoded audio. The main difference is DRM protection: the audio stream is encrypted with a 4-byte key specific to the customer who bought the file. This means that in a regular player like VLC you can see the metadata and even start playback, but you will not hear anything (and will see lots of decoding errors in the terminal output). I would say this DRM protection is rather symbolic now, as the decryption key can be recovered relatively easily.
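Since the key is 4 bytes, it is written as 8 hexadecimal characters when passed to ffmpeg. Here is a small sketch of a format check; the function name `is_valid_activation_bytes`, the key value and the file names are made up for illustration, and the ffmpeg line is left as a comment because it needs a real aax file and the matching key.

```shell
#!/bin/bash
# Succeed only for exactly 8 hexadecimal characters (4 bytes).
is_valid_activation_bytes() {
    [[ $1 =~ ^[0-9a-fA-F]{8}$ ]]
}

# Made-up example value; a real key is specific to the account.
key="1a2b3c4d"
if is_valid_activation_bytes "$key"; then
    echo "key format looks fine"
    # With a real file and key, the audio could then be copied without
    # re-encoding (book.aax and book.m4a are placeholder names):
    # ffmpeg -activation_bytes "$key" -i book.aax -vn -c:a copy book.m4a
else
    echo "expected 8 hex characters" >&2
fi
```

This check is stricter than testing the length alone, because it also rejects strings that contain non-hex characters.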

Other formats like Vorbis, Opus, or WMA are also possible for audio books, but they are much rarer.

Why Opus?

I first tried Opus for audio books several years ago. My experiences are summarized in this article and so far they are quite positive. I see more and more support for Opus around, and with the advancement of the AV1 video codec, where Opus is supposed to be its primary audio companion, I believe Opus will become one of the main audio codecs of the future.

Opus provides very good compression for speech while retaining good quality. In my experience I can encode at 32 kbps or 48 kbps and still get very good audio quality and comfortable listening. I’m not a zealous audiophile: I’ve seen a guy claim he cannot listen to an audio book encoded below 192 kbps in MP3, which I consider rather excessive. If you look into the details of Audible audio books, they are encoded with AAC LC at 64 kbps with a 22050 Hz sample rate, which is fairly comparable in audio quality to Opus at 32 kbps with a 24 kHz sample rate.

So the main advantage of Opus for me is the lower bitrate, which is especially welcome when streaming an audio book to a mobile phone over the Internet: it ensures continuous playback even in areas with slow data connections, and of course it can have a notable impact on mobile bills. And as I’m not a media company, it’s enough for me to store audio in a quality suitable for my own listening, so I also save space on my home server.
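To put rough numbers on the savings, here is a back-of-the-envelope sketch. It assumes a constant bitrate and ignores container overhead (real Opus files are VBR, so actual sizes will differ somewhat); the function name `audio_size_mb` and the 10-hour book length are my own illustrative choices.

```shell
#!/bin/bash
# Approximate size in MB (10^6 bytes) of audio at a constant bitrate.
# $1 = bitrate in kbit/s, $2 = duration in whole hours
audio_size_mb() {
    # kbit/s * 1000 / 8 = bytes per second; * 3600 * hours = total bytes
    echo $(( $1 * 1000 / 8 * 3600 * $2 / 1000000 ))
}

echo "10 h at 32 kbps: $(audio_size_mb 32 10) MB"   # 144 MB
echo "10 h at 64 kbps: $(audio_size_mb 64 10) MB"   # 288 MB
```

Halving the bitrate halves both the storage needed and the data transferred while streaming.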

Another advantage is that Opus is open source and royalty-free, so it can easily be used in any project, and we all like open source, right?

The Script

There is now a better, more advanced Python script here.

I’ve created a bash script that splits big audio books into smaller files encoded with Opus audio (the script uses ffmpeg and ffprobe):

#!/bin/bash
# Author : <Ivan Zderadicka> ivan@zderadicka.eu
# License: MIT
VERSION="0.2.3"
BITRATE=48
CUTOFF=12000
SEGMENT_TIME=1800
COMMON_PARAMS="-nostdin -v error"
print_help () {
cat << EOF
Splits large audiobook files into smaller parts which are then encoded with Opus codec.
Split points are either chapters defined in the audiobook or fixed size pieces.
Requires ffmpeg and ffprobe version v >= 2.8.11
Supports input formats m4a, m4b, mp3, aax (mka should also work but not tested)
Usage: split_audiobook.sh [options] <audiobook>...
-h, --help Shows this help
-v, --version Prints version and exits
-r, --replace Replace existing output directory
-q <quality>
--quality <quality> Quality of the output - top (64kbps cutoff 20kHz), high (48kbps, cutoff 12kHz),
normal (32kbps, cutoff 12kHz), low (24kbps, cutoff 8kHz) [default: high]
-l <secs>
--length <secs> Length of piece in seconds (in case chapters are not defined) [default: $SEGMENT_TIME]
--activation_bytes <xxxxxxxx> Activation bytes required for aax format
EOF
}
while (( $# > 0 )); do
case $1 in
-h|--help)
print_help
exit
;;
-v|--version)
echo Version: $VERSION
exit
;;
-r|--replace)
REPLACE_DIR=1
;;
-q|--quality)
case $2 in
high)
BITRATE=48
CUTOFF=12000
;;
top)
BITRATE=64
CUTOFF=20000
;;
low)
BITRATE=24
CUTOFF=8000
;;
normal)
BITRATE=32
CUTOFF=12000
;;
*)
echo Invalid quality param $2 >&2
exit 1
;;
esac
shift
;;
--activation_bytes)
ACTIVATION_BYTES=$2
shift
;;
-l|--length)
SEGMENT_TIME=$2
shift
;;
*)
break
;;
esac
shift
done
OPUS_PARAMS="-acodec libopus -b:a ${BITRATE}k -vbr on -compression_level 10 -application audio -cutoff $CUTOFF"
# mktemp is used because tempfile is Debian-specific and deprecated
temp_file=$(mktemp) || exit 1
trap 'rm -f -- "$temp_file"' EXIT
trap "exit 2" SIGINT
wait_proc() {
while (( $(jobs -pr | wc -l ) >= $(nproc) )); do
sleep 1
done
}
while [[ $# -gt 0 ]]; do
echo Processing file $1
if [[ ! -f "$1" ]]; then
echo "File $1 does not exist" >&2
shift
continue
fi
ext=${1##*.}
if [[ $ext = "aax" && ${#ACTIVATION_BYTES} != 8 ]]; then
echo "Activation bytes (4 bytes = 8 chars in hexa) are needed for aax file" >&2
shift
continue
fi
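# add the activation bytes only once, not again for every input file
if [[ -n "$ACTIVATION_BYTES" && $COMMON_PARAMS != *-activation_bytes* ]]; then
COMMON_PARAMS="-activation_bytes $ACTIVATION_BYTES $COMMON_PARAMS"
fi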
ffprobe -v error -print_format compact=nokey=1 -show_chapters "$1" > $temp_file
dirname=${1%.*}
if [[ -n "$REPLACE_DIR" && -e "$dirname" ]]; then
rm -r "$dirname"
fi
mkdir "$dirname"
if [[ $? != 0 ]]; then
echo "Directory $dirname exists or cannot be created" >&2
shift
continue
fi
num_chapters=$(wc -l < $temp_file)
if [[ $num_chapters -gt 1 ]]; then
count=0
while IFS=\| read -r _ id _ _ start _ end chapter; do
((count++))
echo Processing chapter $count of $num_chapters
# replace every slash in the chapter title so it is safe as a file name
chap=${chapter//\//-}
{
ffmpeg $COMMON_PARAMS -i "$1" -ss "$start" -to "$end" -vn $OPUS_PARAMS\
-metadata title="$chapter"\
-metadata track="$count/$num_chapters"\
"$dirname/$(printf %03d $(($count-1))) - $chap.opus"
if [[ $? -ne 0 ]]; then
echo Error processing chapter $count of $num_chapters >&2
else
echo Finished chapter $count of $num_chapters
fi
} &
wait_proc
done < $temp_file
else
echo "No chapters found"
echo "Splitting file into pieces of $SEGMENT_TIME secs"
# this works fine, however title and track tags cannot be set for each part
# ffmpeg $COMMON_PARAMS -i "$1" -vn $OPUS_PARAMS -f segment -segment_time $SEGMENT_TIME\
# -reset_timestamps 1 "$dirname/%03d.opus"
if [[ $ext = "m4b" ]]; then
ext=m4a
fi
ffmpeg $COMMON_PARAMS -stats -i "$1" -vn -acodec copy -f segment -segment_time $SEGMENT_TIME\
-reset_timestamps 1 "$dirname/%03d.$ext"
count=0
num_files=$(ls -1q "$dirname" | wc -l)
echo Done with file split - number of parts is $num_files
for f in "$dirname/"*; do
((count++))
echo Processing part $count of $num_files
{
ffmpeg $COMMON_PARAMS -i "$f" $OPUS_PARAMS -metadata track=$count/$num_files\
-metadata title="Part $(($count - 1))" "${f%.*}.opus"
if [[ $? = 0 ]]; then
rm "$f"
echo Finished part $count of $num_files
else
echo Error converting file $f >&2
fi
} &
wait_proc
done
fi
# try extract cover art
ffmpeg $COMMON_PARAMS -i "$1" "$dirname/cover.jpg"
shift
done
wait
exit

Usage is pretty straightforward (run with -h to see help). The script can split large m4b/m4a files into smaller files by chapter (if chapters are defined in the metadata) or into files of fixed duration (half an hour by default). The split files are stored in a subdirectory with the same name as the original file, minus its extension. The most time-consuming step is transcoding the audio, so it is done in parallel (the number of processes equals the number of cores). The cover image is also extracted to that directory (if possible). The script also works for mp3 files and for aax files (if you provide the activation bytes).
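The parallel part can be tried in isolation. This is a simplified sketch of the same throttling idea, with `sleep` standing in for a long ffmpeg transcode; the names `MAX_JOBS`, `wait_for_slot` and `run_parts` are mine, and the fallback to 2 jobs when `nproc` is unavailable is an assumption, not something the script does.

```shell
#!/bin/bash
# Maximum number of background jobs; assumed fallback of 2 without nproc.
MAX_JOBS=$(nproc 2>/dev/null || echo 2)

# Block until fewer than MAX_JOBS background jobs are running.
wait_for_slot() {
    while (( $(jobs -pr | wc -l) >= MAX_JOBS )); do
        sleep 1
    done
}

# Run $1 dummy "transcodes" with at most MAX_JOBS at a time.
run_parts() {
    local part
    for (( part = 1; part <= $1; part++ )); do
        {
            sleep 1                 # stand-in for a long ffmpeg run
            echo "part $part done"
        } &
        wait_for_slot
    done
    wait    # collect the jobs that are still running
    echo "all parts done"
}

run_parts 4
```

The parts may finish in a different order than they were started, which is why the output file names in the script carry the part number rather than relying on completion order.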

If you need to convert individual mp3 files without splitting, check this script.

4 thoughts on “Splitting Large Audio Books”

    1. 🙂 This was not about DRM protection – it was mostly about splitting unprotected files. It just happens that ffmpeg supports removal of DRM protection for aax files, so it’s available in script too.
