A pretty decent introduction to Bash for pipelines and interactive use.
Bash is the Bourne Again SHell.
Files are streams of bytes. They can be bi-directional, but are usually opened for either reading or writing.
A stream supports a small set of operations: open, read, write, seek (when the underlying file supports it), and close.
Unix/Linux is all about files. Just about everything is accessible as a file.
By default a program has 3 files open: standard input (file descriptor 0), standard output (file descriptor 1), and standard error (file descriptor 2).
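As a quick illustration (with `some_command` standing in for any program), each of these can be redirected independently:

some_command < input.txt     # fd 0: read standard input from a file
some_command > output.txt    # fd 1: write standard output to a file
some_command 2> errors.txt   # fd 2: write standard error to a file
some_command > all.txt 2>&1  # Send both stdout and stderr to one file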
Ever need a file for a small task, but don't want to go to the effort of coming up with a good name? Temporary files are a lifesaver: they don't stick around, and they're guaranteed to have a unique name.
MY_TEMP_FILE=$(mktemp)
cat > "$MY_TEMP_FILE"
The `$(...)` command substitution runs `mktemp` in a subshell and captures what it prints, which is then assigned to `MY_TEMP_FILE`. You can write to the file by using `cat` and a redirection. Close the stream with CTRL+D.
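If the file should disappear when a script finishes, one common pattern (sketched here) is a `trap` on EXIT:

MY_TEMP_FILE=$(mktemp)
trap 'rm -f "$MY_TEMP_FILE"' EXIT  # Delete the temp file when the script exits
echo "scratch data" > "$MY_TEMP_FILE"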
These are some common commands to figure out where you are and what you want to do.
`cut` can split sections of a line by bytes, characters, or a delimiter. It's most useful with simple CSVs. For example, to pull the first and third columns out of a CSV:
cut -d, -f1,3 random.csv
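It can also slice by character or byte position; for example, on a hypothetical fixed-width file:

# First 10 characters of every line
cut -c1-10 fixed_width.txt
# Bytes 5 through 8 of every line
cut -b5-8 fixed_width.txt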
For more information read the man page.
Awk is an entire language, but most people use it for line-oriented processing; for many purposes it's a fancy version of cut. It matches each line against patterns (including regexes) to decide which actions to run, and provides `BEGIN` and `END` blocks for setup and final output. The default field separator is whitespace.
# Print lines where the http status code is 404 not found
awk '$9 == "404"'
# Count up status codes
awk '{s[$9]+=1}END{for(c in s)print c " " s[c]}'
# Sum the 3rd column in a csv without headers
awk -F, '{s+=$3}END{print s}'
Generally you won't see too much awk used in these pipelines, since a few simple pipeline patterns are faster to write and easier to digest than AWK. Just know it's very powerful, and can do a lot with very few characters.
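For instance, the status code count above can be rebuilt from the simpler tools covered next, assuming single-space-separated log fields:

# Roughly equivalent to the awk status code counter
cut -d' ' -f9 access_log | sort | uniq -c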
`sort` sorts lines in a file. `uniq` removes duplicate lines, but only when they're adjacent, which is why its input is usually sorted first.
# Sort by the natural ordering
sort my_words
# Sort by numbers
sort -n my_item_counts
# Sort in reverse ordering
sort -r my_words
# Remove duplicates
uniq my_duplicated_words
# Find duplicated words and count them up
uniq -c my_document_broken_to_words_per_line
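Together they form one of the most common patterns in shell one-liners: count occurrences, then rank them.

# Word frequency, most common last
sort my_document_broken_to_words_per_line | uniq -c | sort -n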
Cat takes a series of files and prints their contents, one after another.
cat foo
cat foo bar baz
Grep stands for globally search for a regular expression and print matching lines. Commands are generally in the following form:
grep [-P] pattern [files...]
Grep is often used in log processing, like counting how many times a page was viewed or how many POST requests occurred in Apache logs.
grep --count POST logs/access_log logs/access_log.old
logs/access_log:8
logs/access_log.old:19
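A few other grep flags come up constantly; a quick sketch, with hypothetical file names:

# Case-insensitive match
grep -i error app.log
# Recursive search with line numbers
grep -rn 'TODO' src/
# Invert the match: everything except comment lines
grep -v '^#' config.ini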
Find accepts a series of starting points to traverse the directory hierarchy, and supports various filtering operations. Examples:
# Find all directories below the current directory
find . -type d
# Find only files ending in .txt below the current directory
find . -type f -name '*.txt'
# Find all java files in the test directory that maven will consider a test
find */src/test -name '*Test.java'
These are some common operations, but find supports many more advanced ones: filtering by access time, modified time, permissions, and many other attributes.
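A sketch of those filters (GNU find, hypothetical paths):

# Find files modified in the last 24 hours
find . -type f -mtime -1
# Delete logs older than 30 days
find /var/log/myapp -name '*.log' -mtime +30 -delete
# Make every shell script below the current directory executable
find . -name '*.sh' -exec chmod +x {} \;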
Curl is one of the most commonly used programs ever. It makes sending web requests on a ton of protocols super easy. It's great for making API calls or downloading web pages. The result of the request is written to standard out.
Curl supports many flags, but it's a large topic. Generally curl is used like so:
curl $my_url > result.txt
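A couple of common variations, using placeholder URLs:

# Follow redirects and save under the remote file name
curl -L -O https://example.com/file.tar.gz
# POST JSON to an API
curl -X POST -H 'Content-Type: application/json' -d '{"name":"demo"}' https://example.com/api/items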
Many times we need to select a subset of files and perform an operation on them, possibly in parallel. `xargs` reads items from its standard input and turns them into arguments for another command.
find . -name '*.txt' -print0 | xargs -0 zip MyTextFiles.zip
This adds all .txt files in your directory hierarchy to a zip file. (The `-print0`/`-0` pair keeps file names containing spaces intact.)
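As for the "possibly in parallel" part, xargs can fan work out across processes; a sketch:

# Compress log files with up to 4 gzip processes at once
find . -name '*.log' -print0 | xargs -0 -P4 -n1 gzip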
Pipes allow commands to be composed, feeding the output of one operation into the next; you saw one in the xargs example. To link one command's output to another's standard input, separate the commands with a pipe (`|`) character.
# Find the file with the most lines. wc prints a "total" line that sorts last, so take the second-to-last line.
find . -type f | xargs wc -l | sort -n | tail -n 2 | head -n 1
# Sort some input
foo_pipeline | sort
# Dump intermediate results to a file.
foo_pipeline_step1 | tee -a $(mktemp) | foo_pipeline_remaining
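If the intermediate file needs to be inspected afterward, capture its name first:

# Keep a handle on the intermediate results
DEBUG_FILE=$(mktemp)
foo_pipeline_step1 | tee "$DEBUG_FILE" | foo_pipeline_remaining
echo "intermediate results saved to $DEBUG_FILE"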
Many times it's necessary to feed the results of a pipeline into a program that expects a file. Bash can present those results as a file, via process substitution, with the following syntax.
# Use a pipeline to determine what patterns to search for
grep -f <(my_pipeline) file1.txt file2.txt
# Diff the output of 2 programs or pipelines
diff <(prog1) <(prog2)
The above feeds a series of patterns from a given pipeline into grep. A more advanced example would be locating files that lack a frontmatter field but contain an image. If one were working with Jekyll, it might look like this:
grep '!\[' _posts/* | grep -v -f <(grep -l cover: _posts/*)
There are lots of sources of help with bash and for figuring out the flags to a given program; the man pages (`man bash`, `man grep`) are a good first stop.