Brief Introduction
学习网址:https://www.tecmint.com/use-linux-awk-command-to-filter-text-string-in-files/
The general syntax of awk
is:
|
|
Where 'script'
is a set of commands that are understood by awk
and execute on file, filename.
It works by reading a given line in the file, makes a copy of the line and then executes the script on the line. This is repeated on all the lines in the file.
The 'script'
is in the form '/pattern/ action'
where pattern is a regular expression and the action is what awk will do when it finds the given pattern in a line.
The GNU Awk User’s Guide
学习网址:https://www.gnu.org/software/gawk/manual/gawk.html#Getting-Started
awk
is an interpreted language which means awk utility reads your program and then processes your data according to the instructions in your program. awk
programs are data driven while most other languages are procedural.
When you run awk
, you specify an awk
program that tells awk
what to do. The program consists of a series of rules. Each rule specifies one pattern to search for and one action to perform upon finding the pattern.
An awk
program looks like this:
|
|
1. Run awk
programs
If the program is short, it is easiest to include it in the command that runs awk
:
|
|
otehrwise:
|
|
1.1 Running
Running without input files
|
|
awk
applies the program to the standard input, which usually means whatever you type on the keyboard. This continues until you indicate end-of-file by typing Ctrl-d
. (On non-POSIX operating systems, the end-of-file character may be different.)
Executable awk
program
|
|
1.2 Example programs
This is a typical task that awk
programs do.
|
|
The
awk
utility reads the input files one line at a time. For each line,awk
tries the patterns of each rule. If several patterns match, then several actions execute in the order in which they appear in theawk
program. If no patterns match, then no actions run.This command prints the total number of bytes in all the files in the current directory that were last modified in November (of any year).
When
awk
statements within one rule are short, you might want to put more than one of them on a line. This is accomplished by separating the statements with a semicolon (‘;’). This also applies to the rules themselves. Thus, the program shown at the start of this section could also be written this way:1
$ ls -l | awk '$6 == "Nov"{ sum += $5 }; END {print sum}'
Print every line that has at least one field:
|
|
- default action is
{print $0}
NF
stand for the fields amout in one line
Count line in a file:
|
|
2. Running awk and gawk
3. Regex Expression
4. Reading Input Files
The input is read in units called records, and is processed by the rules of your program one record at a time. By default, each record is one line. Each record is automatically split into chunks called fields. This makes it more convenient for programs to work on the parts of a record.
Record Split
Record splitting with standard awk
.
Records are separated by a character called the record separator. By default, the record separator is the newline character. This is why records are, by default, single lines. To use a different character for the record separator, simply assign that character to the predefined variable RS
. For example:
|
|
Another way to change to record separator is on the command line, using the variable-assignment feature:
|
|