Introduction to awk
What is awk?
- The name awk is derived from the surnames of its inventors: Aho, Weinberger, and Kernighan.
- From the original awk paper published by Bell Labs:
“Awk is a programming language designed to make many common information retrieval and text manipulation tasks easy to state and to perform.”
- Simply put, awk is a programming language designed to search files for patterns and to perform actions on the records that match.
awk versions
- awk – Original Bell Labs awk (Version 7 UNIX, around 1978); the name now also covers the latest POSIX awk.
- nawk – New awk (released with SVR4 around 1989)
- gawk – GNU implementation of the awk standard.
- mawk – Michael’s awk.
… and the list goes on.
All of these are basically the same, apart from minor differences in the
features they provide. This presentation assumes the widely used POSIX awk
(also called “awk”).
A few basic things about awk
- awk reads from a file or from its standard input, and outputs to its standard output.
- awk recognizes the concepts of “file”, “record” and “field”.
- A file consists of records, which by default are the lines of the file. One line becomes one record.
- awk operates on one record at a time.
- A record consists of fields, which by default are separated by any number of spaces or tabs.
- Field number 1 is accessed with $1, field 2 with $2, and so forth. $0 refers to the whole record.
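The record and field rules above can be tried out with a short sketch. The sample input and the shell variable names (input, swapped, whole) are made up for illustration; the sketch assumes a POSIX shell and a POSIX awk.

```shell
# Hypothetical sample input: two records (lines). The fields of the first
# record are separated by two spaces and a tab to show default splitting.
input=$(printf 'alice  30\tparis\nbob 25 rome')

# Print field 2 followed by field 1 for every record.
swapped=$(printf '%s\n' "$input" | awk '{ print $2, $1 }')
printf '%s\n' "$swapped"

# $0 is the whole record, exactly as it was read.
whole=$(printf '%s\n' "$input" | awk '{ print "record: " $0 }')
printf '%s\n' "$whole"
```

The first command prints “30 alice” and “25 bob”: the run of spaces and the tab each count as a single field separator. The second command prints every line unchanged after the “record: ” prefix.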
Program Structure in Awk
- An awk program is a sequence of statements of the form:
pattern { action }
pattern { action }
…
- The pattern in front of an action acts as a selector that determines whether the action is executed for the current record.
- Patterns can be regular expressions, arithmetic relational expressions, string-valued expressions, and arbitrary boolean combinations of these.
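As a minimal sketch of the pattern { action } form, the commands below use three kinds of pattern on the same input. The sample data and the shell variable names (input, by_regex, by_relation, by_both) are hypothetical; a POSIX shell and a POSIX awk are assumed.

```shell
# Hypothetical sample input: name, age, city.
input=$(printf 'alice 30 paris\nbob 25 rome\ncarol 41 paris')

# Regular-expression pattern: select records that contain "paris".
by_regex=$(printf '%s\n' "$input" | awk '/paris/ { print $1 }')
printf '%s\n' "$by_regex"

# Relational pattern: select records whose second field is below 30.
by_relation=$(printf '%s\n' "$input" | awk '$2 < 30 { print $1 }')
printf '%s\n' "$by_relation"

# Boolean combination: both conditions must hold for the action to run.
by_both=$(printf '%s\n' "$input" | awk '/paris/ && $2 > 40 { print $1 }')
printf '%s\n' "$by_both"
```

The regular expression selects alice and carol, the relational test selects only bob, and the combined pattern selects only carol. In each case the action runs only for the records that the pattern selects.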