I wanted to format some text by specifying a width and having a utility fill and wrap the lines to fit.
Example input text:
The output for `ls-files` is incredibly limited, right? That's because the
typical use case is to list files of a certain type.
You can do
ls-files -c # show cached (tracked)
ls-files -d # show unstaged deletions
ls-files -u # show unmerged
There is also an option -t that shows status tag (though the tags are weird:
for instance, the tag for a tracked file is: H).
Fortunately there's a nice tool called fmt that can do pretty much what we want. You can do fmt -w 50 <text.txt and get the following:
The output for `ls-files` is incredibly limited,
right? That's because the typical use case is to
list files of a certain type.
You can do
ls-files -c # show cached (tracked) ls-files -d
# show unstaged deletions ls-files -u # show
unmerged
There is also an option -t that shows status tag
(though the tags are weird: for instance, the tag
for a tracked file is: H).
Well, that's not right. I would like it to leave alone any line that is already shorter than the specified width. In other words, I want:
The output for `ls-files` is incredibly limited,
right? That's because the typical use case is to
list files of a certain type.
You can do
ls-files -c # show cached (tracked)
ls-files -d # show unstaged deletions
ls-files -u # show unmerged
There is also an option -t that shows status tag
(though the tags are weird: for instance, the tag
for a tracked file is: H).
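The length test behind that wish can be sketched in a few lines of bash. This only covers the "is this line short enough to leave alone?" decision, not the paragraph handling, and `classify` is a name made up for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a line would be kept as-is or wrapped.
classify() {
  local width=$1 line=$2
  # ${#line} is the number of characters in the line.
  if (( ${#line} < width )); then
    echo keep
  else
    echo wrap
  fi
}

classify 50 'ls-files -c # show cached (tracked)'
classify 50 'There is also an option -t that shows status tag (though the tags are weird:'
```

The first call should report keep and the second wrap, since only the second line is wider than 50 characters.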
I could do this pretty fast in Ruby, but in this case, I needed it in shell (for bash), and I didn't want to incur the startup time of a Ruby script called from a bash script.
My first instinct was to use arrays, because that feels natural to me as a programmer -- I might split on newlines and do something with the result.
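For illustration, splitting a block of text on newlines into a bash array might look like the sketch below. It assumes a bash without `mapfile` (which only arrived in bash 4) and uses a plain `read` loop instead; `split_lines` and `lines` are names invented for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a string on newlines into the global array
# `lines`, using only features available in bash 3.2.
split_lines() {
  local line
  lines=()
  # IFS= and -r keep leading whitespace and backslashes intact.
  while IFS= read -r line; do
    lines+=("$line")
  done <<< "$1"
}

split_lines $'first line\n\nthird line'
printf '%d lines\n' "${#lines[@]}"
printf '[%s]\n' "${lines[@]}"
```

Empty lines survive as empty array elements, which matters here because blank lines are what separate paragraphs.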
Here's what I came up with. Note that on my target, the Mac, the bash that ships with the system is stuck at version 3.2 (for licensing reasons). This means I couldn't use some of the newer array features from bash 4.
(By the way, I do know the bash convention of using all uppercase for environment variable names, but that's just too ugly. Sorry.)
#!/usr/bin/env bash

width="$1"
shift
text=("$@")
output=()
buffer=()
line=""
last_index=0
# Stored as an array so a command with arguments survives quoting.
format_command=(fmt) # or (par -j)

# Format whatever has accumulated in the buffer, then add a paragraph break.
format () {
  local formatted
  # -gt 0, not -gt 1: a paragraph made of a single long line must be kept too
  if [[ "${#buffer[@]}" -gt 0 ]]; then
    formatted=$(printf '%s' "${buffer[@]}" | "${format_command[@]}" -w "${width}")
    output+=("$formatted")
  fi
  output+=("")
  buffer=()
}

for line in "${text[@]}"; do
  # hit a blank line; format everything accumulated in the buffer
  if [[ -z "${line}" ]]; then
    format
  # hit a short line; add it to the output unformatted
  elif [[ "${#buffer[@]}" -eq 0 && "${#line}" -le "${width}" ]]; then
    output+=("$line")
  # a regular line; add it to the buffer for later formatting.
  else
    buffer+=("$line ")
  fi
done
format

# Drop the trailing empty element added by the final call to format
last_index=$(( ${#output[@]} - 1 ))
unset "output[$last_index]"

# Split elements in original on newline
split_on_newline=()
while IFS= read -r line; do
  split_on_newline+=("$line")
done <<< "$(printf '%s\n' "${output[@]}")"
output=("${split_on_newline[@]}")
So as you can see, there's rather a lot of array wrangling just to get lines in, formatted, and back out again. There's also the syntax burden of bash arrays, which is non-trivial. I kept looking at this, and after a while it just seemed dumb. Why not write a more routine bash command that reads from STDIN and writes to STDOUT?
So I came up with this, which is shorter and seems more natural:
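#!/usr/bin/env bash
width=30
command="fmt"
buffer=""
[[ -n "$1" ]] && width=$1
[[ -n "$2" ]] && command=$2
while IFS= read -r line; do
  # blank line after a paragraph; format what has accumulated
  if [[ -z "$line" && "${#buffer}" -ne 0 ]]; then
    # $command is deliberately unquoted so that "par -j" splits into words
    printf '%s\n' "$buffer" | $command -w "$width"
    printf "\n"
    buffer=""
  # short line outside a paragraph; print it unformatted
  elif [[ "${#buffer}" -eq 0 && "${#line}" -lt "$width" ]]; then
    printf '%s\n' "$line"
  # a regular line; add it to the buffer for later formatting
  else
    buffer+="$line"
    buffer+=" "
  fi
done
# format a final paragraph, if any, so no stray blank line is printed
if [[ -n "$buffer" ]]; then
  printf '%s\n' "$buffer" | $command -w "$width"
fi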
This one takes two parameters, the first being the width, and the second being the command to use (if one doesn't want to use the default, fmt). So if you like, you can use par to justify the text. For example: ./wrap 50 "par -j" <text.txt gets you:
The output for `ls-files` is incredibly limited,
right? That's because the typical use case is to
list files of a certain type.
You can do
ls-files -c # show cached (tracked)
ls-files -d # show unstaged deletions
ls-files -u # show unmerged
There is also an option -t that shows status tag
(though the tags are weird: for instance, the tag
for a tracked file is: H).
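If you'd rather try the idea without saving a separate script, here is a rough sketch of the same filter as a shell function. It is a variation, not the script above: `wrap_paragraphs` is a made-up name, the command string is split into an array with `read -a` rather than left to unquoted word splitting, and it assumes fmt is installed and accepts `-w`.

```bash
#!/usr/bin/env bash
# Hypothetical function form of the stdin-to-stdout filter.
# Usage: wrap_paragraphs [width] [formatter command string] < input
wrap_paragraphs() {
  local width=${1:-30} buffer="" line
  local -a formatter
  # Split the command string into words once, e.g. "par -j" -> (par -j).
  read -r -a formatter <<< "${2:-fmt}"

  while IFS= read -r line || [[ -n "$line" ]]; do
    if [[ -z "$line" && -n "$buffer" ]]; then
      # A blank line ends a paragraph: reflow what was collected.
      printf '%s\n' "$buffer" | "${formatter[@]}" -w "$width"
      printf '\n'
      buffer=""
    elif [[ -z "$buffer" && ${#line} -lt $width ]]; then
      # Short line outside a paragraph: pass it through untouched.
      printf '%s\n' "$line"
    else
      buffer+="$line "
    fi
  done

  # Flush a trailing paragraph, but only if there is one.
  if [[ -n "$buffer" ]]; then
    printf '%s\n' "$buffer" | "${formatter[@]}" -w "$width"
  fi
}

sample=$'A long opening line that is certainly wider than forty characters in total\nand continues here.\n\nls-files -c # show cached\nls-files -d # show deletions\n'
result=$(printf '%s' "$sample" | wrap_paragraphs 40)
printf '%s\n' "$result"
```

The first paragraph should come back reflowed to at most 40 columns, followed by a blank line and the two ls-files lines exactly as they went in. The exact line breaks in the reflowed paragraph depend on which fmt you have.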