Arrays are rarely the natural choice in bash; use files

by jgn on Monday, December 2, 2024 in programming and bash

So I wanted to format some text by specifying a width and having a utility fill and wrap the lines.

Example input text:

The output for `ls-files` is incredibly limited, right? That's because the
typical use case is to list files of a certain type.

You can do

  ls-files -c  # show cached (tracked)
  ls-files -d  # show unstaged deletions
  ls-files -u  # show unmerged

There is also an option -t that shows status tag (though the tags are weird:
for instance, the tag for a tracked file is: H).

Fortunately there's a nice tool called fmt that can do pretty much what we want.

You can do fmt -w 50 <text.txt and get the following:

The output for `ls-files` is incredibly limited,
right? That's because the typical use case is to
list files of a certain type.

You can do

  ls-files -c  # show cached (tracked) ls-files -d
  # show unstaged deletions ls-files -u  # show
  unmerged

There is also an option -t that shows status tag
(though the tags are weird: for instance, the tag
for a tracked file is: H).

Well, that's not right: fmt has reflowed the three indented example lines as if they were a paragraph. I would like it to leave alone any line that has fewer characters than the specified width. In other words, I want:

The output for `ls-files` is incredibly limited,
right? That's because the typical use case is to
list files of a certain type.

You can do

  ls-files -c  # show cached (tracked)
  ls-files -d  # show unstaged deletions
  ls-files -u  # show unmerged

There is also an option -t that shows status tag
(though the tags are weird: for instance, the tag
for a tracked file is: H).

I could do this pretty fast in Ruby, but in this case, I needed it in shell (for bash), and I didn't want to incur the startup time of a Ruby script called from a bash script.

My first instinct was to use arrays, because that feels natural to me as a programmer -- I might split on newlines and do something with the result.

Here's what I came up with. Note that on my target, the Mac, the bash that ships with the system is still 3.2 (for licensing reasons). This means I couldn't use some of the newer array features from bash 4.

(By the way, I do know the bash convention of using all uppercase for environment variable names, but that's just too ugly. Sorry.)

#!/usr/bin/env bash

width="$1"
shift
text=("$@")

output=()
buffer=()
line=""
last_index=0

format_command="fmt"    # or `par -j`

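format () {
  local formatted

  # format the buffer if it holds anything, even a single long line
  if [[ "${#buffer[@]}" -gt 0 ]]; then
    # $format_command is left unquoted so that "par -j" splits into command and flag
    formatted=$(printf '%s' "${buffer[@]}" | $format_command -w "${width}")
    output+=("$formatted")
  fi
  output+=("")
  buffer=()
}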

for line in "${text[@]}"; do
  # hit a blank line; format everything accumulated in the buffer
  if [[ -z "${line}" ]]; then
    format
  # hit a short line; add it to the output unformatted
  elif [[ "${#buffer[@]}" -eq 0 && "${#line}" -le "${width}" ]]; then
    output+=("$line")
  # a regular line; add it to the buffer for later formatting.
  else
    buffer+=("$line ")
  fi
done

format

last_index=$(( ${#output[@]} - 1 ))
unset "output[$last_index]"

# Some elements of output now hold several lines; re-split so each element is one line
split_on_newline=()
while IFS= read -r line; do
  split_on_newline+=("$line")
done <<< "$(printf '%s\n' "${output[@]}")"
output=("${split_on_newline[@]}")
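For context on the bash 3.2 constraint: bash 4 added the `mapfile` builtin (also spelled `readarray`), which reads lines into an array in one step. On 3.2 the usual substitute is a `read` loop like the one that ends the script above. Here is a minimal sketch of that pattern on its own; the helper name `read_lines` and the global array `lines` are made up for illustration and are not part of the original script.

```bash
#!/usr/bin/env bash

# Hypothetical helper: read stdin into the global array `lines`,
# one element per input line. Works on bash 3.2.
read_lines () {
  local line
  lines=()
  # IFS= keeps leading whitespace; -r keeps backslashes literal
  while IFS= read -r line; do
    lines+=("$line")
  done
}

read_lines <<'EOF'
first

  indented
EOF

printf '%d lines\n' "${#lines[@]}"   # prints "3 lines"

# On bash 4 and later, the loop body of read_lines could be replaced with:
#   mapfile -t lines
```

Note that the function has to run in the current shell (here via a heredoc redirect) for `lines` to survive; piping into it would fill the array in a subshell and then throw it away.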

So as you can see, there's rather a lot of array wrangling: getting the lines in from a file, buffering them, and splitting the result back up on newlines. There's also the syntax burden of bash arrays, which is non-trivial. I kept looking at this, and after a while it just seemed dumb. Why not write a more routine bash command that reads from STDIN and writes to STDOUT?

So I came up with this, which is shorter and seems more natural:

#!/usr/bin/env bash

width=30
command="fmt"
buffer=""

[[ -n "$1" ]] && width=$1
[[ -n "$2" ]] && command=$2

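while IFS= read -r line; do
  # blank line after buffered text: format the paragraph and flush it
  if [[ -z "$line" && "${#buffer}" -ne 0 ]]; then
    printf '%s\n' "$buffer" | $command -w "$width"
    printf "\n"
    buffer=""
  # short line with nothing buffered: pass it through untouched
  elif [[ "${#buffer}" -eq 0 && "${#line}" -lt "$width" ]]; then
    printf '%s\n' "$line"
  # anything else: accumulate it for formatting
  else
    buffer+="$line"
    buffer+=" "
  fi
done

# flush any paragraph still buffered at end of input
if [[ -n "$buffer" ]]; then
  printf '%s\n' "$buffer" | $command -w "$width"
fi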

This one takes two optional parameters: the first is the width, and the second is the command to use (if you don't want the default, fmt). So if you like, you can use par to justify the text. For example, ./wrap 50 "par -j" <text.txt gets you:

The output  for `ls-files` is  incredibly limited,
right? That's  because the typical use  case is to
list files of a certain type.

You can do

  ls-files -c  # show cached (tracked)
  ls-files -d  # show unstaged deletions
  ls-files -u  # show unmerged

There is also  an option -t that  shows status tag
(though the tags are  weird: for instance, the tag
for a tracked file is: H).
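
One edge case worth knowing about with a `read` loop like the one in the script: if the input's final line has no trailing newline, `read` fills the variable but returns a non-zero status, so a plain `while read` loop never processes that line. A common idiom is to add a second test on the variable. The sketch below shows the idiom in isolation; `number_lines` is a hypothetical helper for illustration, not part of the script.

```bash
#!/usr/bin/env bash

# Hypothetical helper: number each line on stdin, including a final
# line that lacks a terminating newline.
number_lines () {
  local line n=0
  # `|| [ -n "$line" ]` catches the last line when read hits EOF
  # before seeing a newline
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    printf '%d: %s\n' "$n" "$line"
  done
}

# The input deliberately ends without a newline
printf 'one\ntwo' | number_lines
# prints:
# 1: one
# 2: two
```

The same change to the `while` condition should carry over to the wrapping script if it ever has to handle files that do not end in a newline.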