A pretty decent introduction to Bash for pipelines and interactive use.
Bash is the Bourne Again SHell.
Files are streams of bytes. They can be bi-directional, but are usually opened for either reading or writing.
A stream supports a small set of operations: open, read, write, seek (when the underlying file supports it), and close.
Unix/Linux is all about files. Just about everything is accessible as a file.
By default a program has 3 files open: standard input (file descriptor 0), standard output (file descriptor 1), and standard error (file descriptor 2).
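As a quick illustration (with `some_command` standing in for any program), each of these can be redirected independently:

some_command < input.txt     # fd 0: read standard input from a file
some_command > output.txt    # fd 1: write standard output to a file
some_command 2> errors.txt   # fd 2: write standard error to a file
some_command > all.txt 2>&1  # Send both stdout and stderr to one file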
Ever need a file for a small task, but don't want to go to the effort of coming up with a good name? Temporary files are a lifesaver: they don't stick around, and they're guaranteed to have a unique name.
MY_TEMP_FILE=$(mktemp)
cat > "$MY_TEMP_FILE"
The `$(...)` command substitution runs `mktemp` in a subshell and captures what it prints, which is then assigned to `MY_TEMP_FILE`. You can write to the file by using `cat` and a redirection. Close the stream with CTRL+D.
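If the file should disappear when a script finishes, one common pattern (sketched here) is a `trap` on EXIT:

MY_TEMP_FILE=$(mktemp)
trap 'rm -f "$MY_TEMP_FILE"' EXIT  # Delete the temp file when the script exits
echo "scratch data" > "$MY_TEMP_FILE"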
These are some common commands to figure out where you are and what you want to do.
`cut` can split sections of a line by bytes, characters, or a delimiter. It's most useful with simple CSVs. For example, to pull the first and third columns out of a CSV:
cut -d, -f1,3 random.csv
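It can also slice by character or byte position; for example, on a hypothetical fixed-width file:

# First 10 characters of every line
cut -c1-10 fixed_width.txt
# Bytes 5 through 8 of every line
cut -b5-8 fixed_width.txt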
For more information read the man page.
Awk is an entire language, but most people use it for line-oriented processing; for many purposes it's a fancy version of cut. It matches each line against patterns (including regexes) to decide which actions to run, and provides `BEGIN` and `END` blocks for setup and final output. The default field separator is whitespace.
# Print lines where the http status code is 404 not found
awk '$9 == "404"'
# Count up status codes
awk '{s[$9]+=1}END{for(c in s)print c " " s[c]}'
# Sum the 3rd column in a csv without headers
awk -F, '{s+=$3}END{print s}'
Generally you won't see too much awk used in these pipelines, since a few simple pipeline patterns are faster to write and easier to digest than AWK. Just know it's very powerful, and can do a lot with very few characters.
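For instance, the status code count above can be rebuilt from the simpler tools covered next, assuming single-space-separated log fields:

# Roughly equivalent to the awk status code counter
cut -d' ' -f9 access_log | sort | uniq -c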
`sort` sorts lines in a file. `uniq` removes duplicate lines, but only when they're adjacent, which is why its input is usually sorted first.
# Sort by the natural ordering
sort my_words
# Sort by numbers
sort -n my_item_counts
# Sort in reverse ordering
sort -r my_words
# Remove duplicates
uniq my_duplicated_words
# Find duplicated words and count them up
uniq -c my_document_broken_to_words_per_line
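Together they form one of the most common patterns in shell one-liners: count occurrences, then rank them.

# Word frequency, most common last
sort my_document_broken_to_words_per_line | uniq -c | sort -n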
Cat takes a series of files and prints their contents, one after another.
cat foo
cat foo bar baz
Grep stands for globally search for a regular expression and print matching lines. Commands are generally in the following form:
grep [-P] pattern [files...]
Grep is often used in log processing, like counting how many times a page was viewed or how many POST requests occurred in Apache logs.
grep --count POST logs/access_log logs/access_log.old
logs/access_log:8
logs/access_log.old:19
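A few other grep flags come up constantly; a quick sketch, with hypothetical file names:

# Case-insensitive match
grep -i error app.log
# Recursive search with line numbers
grep -rn 'TODO' src/
# Invert the match: everything except comment lines
grep -v '^#' config.ini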
Find accepts a series of starting points to traverse the directory hierarchy, and supports various filtering operations. Examples:
# Find all directories below the current directory
find . -type d
# Find only files ending in .txt below the current directory
find . -type f -name '*.txt'
# Find all java files in the test directory that maven will consider a test
find */src/test -name '*Test.java'
These are some common operations, but find supports many more advanced ones: filtering by access time, modified time, permissions, and many other attributes.
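A sketch of those filters (GNU find, hypothetical paths):

# Find files modified in the last 24 hours
find . -type f -mtime -1
# Delete logs older than 30 days
find /var/log/myapp -name '*.log' -mtime +30 -delete
# Make every shell script below the current directory executable
find . -name '*.sh' -exec chmod +x {} \;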
Curl is one of the most commonly used programs ever. It makes sending web requests on a ton of protocols super easy. It's great for making API calls or downloading web pages. The result of the request is written to standard out.
Curl supports many flags, but it's a large topic. Generally curl is used like so:
curl $my_url > result.txt
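A couple of common variations, using placeholder URLs:

# Follow redirects and save under the remote file name
curl -L -O https://example.com/file.tar.gz
# POST JSON to an API
curl -X POST -H 'Content-Type: application/json' -d '{"name":"demo"}' https://example.com/api/items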
Many times we need to select a subset of files and perform an operation on them, possibly in parallel. `xargs` reads items from its standard input and turns them into arguments for another command.
find . -name '*.txt' -print0 | xargs -0 zip MyTextFiles.zip
This adds all .txt files in your directory hierarchy to a zip file. (The `-print0`/`-0` pair keeps file names containing spaces intact.)
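As for the "possibly in parallel" part, xargs can fan work out across processes; a sketch:

# Compress log files with up to 4 gzip processes at once
find . -name '*.log' -print0 | xargs -0 -P4 -n1 gzip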
Pipes allow commands to be composed, feeding the output of one operation into the next; you saw one in the xargs example. To link one command's output to another's standard input, separate the commands with a pipe (`|`) character.
# Find the file with the most lines. wc prints a "total" line that sorts last, so take the second-to-last line.
find . -type f | xargs wc -l | sort -n | tail -n 2 | head -n 1
# Sort some input
foo_pipeline | sort
# Dump intermediate results to a file.
foo_pipeline_step1 | tee -a $(mktemp) | foo_pipeline_remaining
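If the intermediate file needs to be inspected afterward, capture its name first:

# Keep a handle on the intermediate results
DEBUG_FILE=$(mktemp)
foo_pipeline_step1 | tee "$DEBUG_FILE" | foo_pipeline_remaining
echo "intermediate results saved to $DEBUG_FILE"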
Many times it's necessary to feed the results of a pipeline into a program that expects a file. Bash can present those results as a file, via process substitution, with the following syntax.
# Use a pipeline to determine what patterns to search for
grep -f <(my_pipeline) file1.txt file2.txt
# Diff the output of 2 programs or pipelines
diff <(prog1) <(prog2)
The above feeds a series of patterns from a given pipeline into grep. A more advanced example would be locating files that lack a frontmatter field but contain an image. If one were working with Jekyll, it might look like this:
grep '!\[' _posts/* | grep -v -f <(grep -l cover: _posts/*)
There are lots of sources of help with bash and for figuring out the flags to a given program; the man pages (`man bash`, `man grep`) are a good first stop.