Awk is a widely used utility for data extraction, text processing, and producing formatted reports.
It resembles sed but is more powerful: awk is a full programming language with variables, conditionals, and loops, whereas sed is mainly a stream editor.
The name AWK has no meaning of its own; it is formed from the surname initials of its developers, Alfred Aho, Peter J. Weinberger, and Brian Kernighan.
Here at LinuxAPT, as part of our Server Management Services, we regularly help our customers with Linux terminal related queries.
In this context, we shall look into some useful awk commands that every Linux user should know.
Here, we have created a file named people.txt with the following data as an example.
The data set has 4 columns: the first field contains the first name, the second field the last name, the third field the age, and the last one the class:
$ cat people.txt
Mike Hamby 19 10
John Fray 14 6
Ellen Hoy 18 8
Robbie Shinn 15 6
Jake Mickel 18 9
How to Print Specific Fields Using Field Variables
Awk has many built-in variables, each with its own purpose. The field variables $1, $2, and so on hold the individual fields of the current line, so $x refers to the field at position x, and $0 holds the whole line. The following command prints the first and second fields of every line:
$ awk '{print $1, $2}' people.txt
Mike Hamby
John Fray
Ellen Hoy
Robbie Shinn
Jake Mickel
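Field variables can also be reordered and joined with literal text. The following is a minimal sketch, assuming the same people.txt as above (recreated here so the snippet runs on its own); the variable name names is only illustrative:

```shell
# Recreate the article's sample file so the snippet runs on its own.
printf '%s\n' 'Mike Hamby 19 10' 'John Fray 14 6' 'Ellen Hoy 18 8' \
  'Robbie Shinn 15 6' 'Jake Mickel 18 9' > people.txt

# Placing strings next to each other concatenates them.
# $NF is the last field of the line (here, the class).
names=$(awk '{print $2 ", " $1 " (class " $NF ")"}' people.txt)
echo "$names"
```

The first line printed should be "Hamby, Mike (class 10)".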
BEGIN Block
BEGIN is a special pattern rather than a variable: its block runs once, before awk reads any input, which makes it useful for adding a header or title to the resulting data.
It helps label the output when formatting data tables.
In the following example, we print a heading and then all the student names:
$ awk 'BEGIN {print "People : "} {print $1}' people.txt
People :
Mike
John
Ellen
Robbie
Jake
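For a table-like report, the header can be aligned with the rows by using printf in both places. This is a sketch under the assumption that a column width of 8 characters suits the sample data; the widths and the variable name table are our own choices:

```shell
# Recreate the article's sample file so the snippet runs on its own.
printf '%s\n' 'Mike Hamby 19 10' 'John Fray 14 6' 'Ellen Hoy 18 8' \
  'Robbie Shinn 15 6' 'Jake Mickel 18 9' > people.txt

# BEGIN prints the header once; the main block prints one aligned row per line.
# %-8s left-aligns a string in a column 8 characters wide.
table=$(awk 'BEGIN {printf "%-8s %-8s %s\n", "First", "Last", "Age"}
             {printf "%-8s %-8s %d\n", $1, $2, $3}' people.txt)
echo "$table"
```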
END Block
END is the opposite of BEGIN: its block runs once, after all the input has been processed. It can be used for the final reporting of the data set.
In the following example, we print all the student ages and then a closing message:
$ awk '{print $3}
END {
print "These are people's age"
} ' people.txt
Typed at an interactive shell prompt (where > is the shell's continuation prompt), the command and its output look like this:
$ awk '{print $3}
> END {
> print "These are people's age"
> } ' people.txt
19
14
18
15
18
These are people's age
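An END block is also a natural place to report a value collected while reading the data. The sketch below, which assumes the same people.txt, remembers the highest age seen so far and prints it at the end; the variable names max, name, and oldest are illustrative:

```shell
# Recreate the article's sample file so the snippet runs on its own.
printf '%s\n' 'Mike Hamby 19 10' 'John Fray 14 6' 'Ellen Hoy 18 8' \
  'Robbie Shinn 15 6' 'Jake Mickel 18 9' > people.txt

# For every line whose age exceeds the current maximum, remember it.
# The END block runs after the last line and prints the result.
oldest=$(awk '$3 > max {max = $3; name = $1}
              END {print "Oldest:", name, "(" max ")"}' people.txt)
echo "$oldest"
```

With the sample data this should print "Oldest: Mike (19)".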
Field Separator
Spaces and tabs are the default field separators in awk; however, we can split text on other separators such as a comma, colon, or slash.
To achieve this, add the -F flag to the command and provide the separator in single quotation marks. The following example prints the first colon-separated field (the user name) of each line in /etc/passwd:
$ awk -F':' '{print $1}' /etc/passwd
root
daemon
bin
sys
sync
games
man
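The same flag works for comma-separated data, and the built-in OFS variable controls the separator used in the output. The following sketch uses two made-up CSV lines piped in with printf, so it does not depend on any file; the variable name csv is illustrative:

```shell
# -F',' splits the input on commas; OFS sets the separator that
# print inserts between its comma-separated arguments.
csv=$(printf '%s\n' 'Mike,Hamby,19' 'John,Fray,14' |
      awk -F',' 'BEGIN {OFS = " | "} {print $2, $1, $3}')
echo "$csv"
```

This should print "Hamby | Mike | 19" followed by "Fray | John | 14".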
How to Run a Script From a File?
We can also run an awk script from a file, which makes it easier to create and reuse reports. To do so, create the file, write the script in it, and execute it with the awk command. For the demo, create a file named demo_script and paste in the following script:
$ vi demo_script
{
sum+=$3
}
END {
print("Sum of all people's age is", sum)
}
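The file should now contain the following. Note that awk variable names are case-sensitive, so the variable printed in the END block must be spelled sum, exactly as in the first block:
{
sum+=$3
}
END {
print("Sum of all people's age is", sum)
}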
The awk command provides a -f flag for executing the script from the file:
$ awk -f demo_script people.txt
Sum of all people's age is 84
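A script file can also hold comments and several calculations. The sketch below writes a slightly longer script with a here-document instead of an editor; the file name avg_age.awk and the variable name summary are hypothetical and chosen only for this example:

```shell
# Recreate the article's sample file so the snippet runs on its own.
printf '%s\n' 'Mike Hamby 19 10' 'John Fray 14 6' 'Ellen Hoy 18 8' \
  'Robbie Shinn 15 6' 'Jake Mickel 18 9' > people.txt

# Write the awk program to a file (hypothetical name: avg_age.awk).
cat > avg_age.awk <<'EOF'
# Add the age column (field 3) of every line to sum.
{ sum += $3 }
# After the last line, print the total and the average.
# The NR > 0 check avoids dividing by zero on empty input.
END {
    if (NR > 0) printf "Total: %d, average: %.1f\n", sum, sum / NR
}
EOF

summary=$(awk -f avg_age.awk people.txt)
echo "$summary"
```

For the sample data this should print "Total: 84, average: 16.8".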
How to Use Multiple Statements?
We can run multiple statements in one action by separating them with semicolons. In the following example, we pipe some text into awk, which replaces the third field and then prints the modified line:
$ echo "Hello, Dr. John" | awk '{$3="George"; print $0}'
Hello, Dr. George
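One side effect is worth knowing: when a field is assigned, awk rebuilds the whole line and joins the fields with the output separator (a single space by default). A small sketch with deliberately uneven spacing in the input; the variable name rebuilt is illustrative:

```shell
# Three statements separated by semicolons: assign field 3,
# print the field count, then print the rebuilt line.
rebuilt=$(echo "Hello,   Dr.   John" | awk '{$3 = "George"; print NF; print $0}')
echo "$rebuilt"
```

This should print 3 and then "Hello, Dr. George", with the extra spaces from the input collapsed.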
How to Count the Number of Lines?
The built-in NR variable holds the number of the current record (line), so we can use it to number every line of the report:
$ awk '{print NR "\t" $0}' people.txt
1 Mike Hamby 19 10
2 John Fray 14 6
3 Ellen Hoy 18 8
4 Robbie Shinn 15 6
5 Jake Mickel 18 9
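To get only the total, we can read NR in an END block, where it holds the number of the last line read. NR can also serve as a condition to select a range of lines. A minimal sketch, assuming the same people.txt; the variable names total and middle are illustrative:

```shell
# Recreate the article's sample file so the snippet runs on its own.
printf '%s\n' 'Mike Hamby 19 10' 'John Fray 14 6' 'Ellen Hoy 18 8' \
  'Robbie Shinn 15 6' 'Jake Mickel 18 9' > people.txt

# In the END block, NR is the number of lines that were read.
total=$(awk 'END {print NR}' people.txt)
echo "Lines: $total"

# A condition with no action prints the matching lines: here, lines 2 to 3.
middle=$(awk 'NR >= 2 && NR <= 3' people.txt)
echo "$middle"
```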
How to Count the Number of Fields?
Sometimes, while preparing the data, we forget to fill in a column, which leads to irregularities in the report.
The built-in NF variable holds the number of fields in the current line, which makes it easier to review and fix such reports.
$ awk '{print NR".",$0 "\n Count=" NF}' people.txt
1. Mike Hamby 19 10
Count=4
2. John Fray 14 6
Count=4
3. Ellen Hoy 18 8
Count=4
4. Robbie Shinn 15 6
Count=4
5. Jake Mickel 18 9
Count=4
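Instead of printing the count for every line, we can report only the lines that are incomplete. The sketch below uses a made-up file, incomplete.txt, in which the second line lacks its class; the file name and the variable name bad are assumptions for this example:

```shell
# Hypothetical sample file: line 2 is missing the class column.
printf '%s\n' 'Mike Hamby 19 10' 'John Fray 14' 'Ellen Hoy 18 8' > incomplete.txt

# Print a message only for lines that do not have exactly 4 fields.
bad=$(awk 'NF != 4 {print "line " NR ": expected 4 fields, found " NF}' incomplete.txt)
echo "$bad"
```

This should print "line 2: expected 4 fields, found 3".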
If Condition
We can use an if condition to prepare a conditional report. In the following example, we print all the students whose age is below 16:
$ awk '
BEGIN{
print "People whose age are under 16 are:"
}
{
if($3<16){
print $1
}
}' people.txt
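With the sample data, the command above should list John and Robbie under the heading. The same filter can be written more briefly by placing the condition in front of the action, and an else branch can label every student. A sketch assuming the same people.txt; the variable names under16, labels, and group are illustrative:

```shell
# Recreate the article's sample file so the snippet runs on its own.
printf '%s\n' 'Mike Hamby 19 10' 'John Fray 14 6' 'Ellen Hoy 18 8' \
  'Robbie Shinn 15 6' 'Jake Mickel 18 9' > people.txt

# Pattern form: the action runs only for lines where the condition is true.
under16=$(awk '$3 < 16 {print $1}' people.txt)
echo "$under16"

# if/else form: every line is printed with a label.
labels=$(awk '{
    if ($3 < 16) {
        group = "under 16"
    } else {
        group = "16 or over"
    }
    print $1 ": " group
}' people.txt)
echo "$labels"
```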
For Loop
In the following example, we use a for loop to print 5 random numbers in succession. To generate them we use rand(), which is a built-in awk function.
It returns a fractional number from 0 up to (but not including) 1, so we multiply it by 100 and truncate the result with int() to get whole numbers from 0 to 99:
$ awk 'BEGIN {
for (i = 1; i <= 5; i++){
print int(100 * rand())
}
}'
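In most awk implementations rand() produces the same sequence on every run unless it is seeded first. The sketch below calls srand(), which seeds the generator from the current time when given no argument, and adds 1 so the numbers fall in the range 1 to 100; the variable name numbers is illustrative:

```shell
# srand() with no argument seeds rand() from the time of day.
# int(100 * rand()) is 0..99, so adding 1 gives 1..100.
numbers=$(awk 'BEGIN {
    srand()
    for (i = 1; i <= 5; i++) {
        print int(100 * rand()) + 1
    }
}')
echo "$numbers"
```

Because the seed is based on the time in seconds, two runs within the same second may still print the same numbers.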
This article covers a few awk commands and scripts.
Awk is a scripting language used for manipulating data and generating reports.
The awk programming language requires no compiling, and allows the user to use variables, numeric functions, string functions, and logical operators.
AWK Syntax:
$ awk options 'selection_criteria {action}' input-file > output-file
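As a concrete instance of this syntax, the sketch below selects the students in class 8 or higher and writes their names to a file. It assumes the same people.txt as earlier; the output file name upper_classes.txt is hypothetical:

```shell
# Recreate the article's sample file so the snippet runs on its own.
printf '%s\n' 'Mike Hamby 19 10' 'John Fray 14 6' 'Ellen Hoy 18 8' \
  'Robbie Shinn 15 6' 'Jake Mickel 18 9' > people.txt

# selection criteria: class (field 4) is 8 or higher
# action:             print the first and last name
# output-file:        the shell redirects the result into upper_classes.txt
awk '$4 >= 8 {print $1, $2}' people.txt > upper_classes.txt
cat upper_classes.txt
```

The output file should contain Mike Hamby, Ellen Hoy, and Jake Mickel, one per line.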
Functions of AWK:
1. AWK Operations:
(a) Scans a file line by line
(b) Splits each input line into fields
(c) Compares input line/fields to pattern
(d) Performs action(s) on matched lines
2. Useful For:
(a) Transform data files
(b) Produce formatted reports
3. Programming Constructs:
(a) Format output lines
(b) Arithmetic and string operations
(c) Conditionals and loops