AWK is a simple and elegant pattern-scanning and processing language. I would call it the first and last simple scripting language. AWK is also the most portable scripting language in existence. It is the precursor of, and the main inspiration for, Perl. Although it originated on Unix, it is available and widely used in the Windows environment too.
It was created in the late 1970s, almost simultaneously with the Bourne shell. The name is composed of the initial letters of its three original authors: Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger. The team was more talented than Stephen Bourne and produced a higher-quality product. Unfortunately, it was never well integrated into the shell. It is commonly used as a command-line filter in pipes to reformat the output of other commands.
AWK takes two inputs: a data file and a command file. The command file can be absent, and the necessary commands can be passed as arguments instead. As Ronald P. Loui aptly noted, awk is a very underappreciated language:
Most people are surprised when I tell them what language we use in our undergraduate AI programming class. That's understandable. We use GAWK. GAWK, Gnu's version of Aho, Weinberger, and Kernighan's old pattern scanning language isn't even viewed as a programming language by most people. Like PERL and TCL, most prefer to view it as a "scripting language." It has no objects; it is not functional; it does no built-in logic programming. Their surprise turns to puzzlement when I confide that (a) while the students are allowed to use any language they want; (b) with a single exception, the best work consistently results from those working in GAWK. (footnote: The exception was a PASCAL programmer who is now an NSF graduate fellow getting a Ph.D. in mathematics at Harvard.) Programmers in C, C++, and LISP haven't even been close (we have not seen work in PROLOG or JAVA).
The main advantage of AWK is that, unlike Perl and other "scripting monsters," it is very slim, without the feature creep so characteristic of Perl, and thus it can be used very efficiently with pipes. It also has a rather simple, clean syntax and, like the much heavier TCL, can be used with C for "dual-language" implementations.
Generally Perl might be better for really complex tasks, but this is not always the case. In reality, AWK integrates much better with the Unix shell, and until probably 2004 there was no noticeable difference in speed for simple scripts, because of the additional time needed to load and initialize the huge Perl interpreter. (Perl 5 still grows, but it now looks slim on a typical PC with a dual-core 3GHz CPU and 2GB of RAM, or on a server, which typically has at least a four-core CPU and 6GB or more of RAM.)
Unfortunately, Larry Wall then decided to throw in the kitchen sink, and as a side effect sacrificed simplicity and orthogonality. I would agree that Perl added some nice things, but it probably added too many nice things :-). Perl 4 can probably be used as an AWK++, but it is not as portable or universally supported. As I mentioned above, AWK is the most portable scripting language in existence.
IMHO the original book that describes AWK (Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger, The AWK Programming Language, Addison-Wesley, 1988) can serve as an excellent introduction to scripting. One chapter is available for free: Chapter 11, The awk Programming Language.
AWK has a unique blend of simplicity and power that is especially attractive for novices, who do not have to spend days and weeks learning all those intricacies of Perl before they become productive. In awk you can become productive in several hours. For instance, to print only the second and sixth fields of the date command's output--the month and year--with a space separating them, use:
date | awk '{print $2 " " $6}'
The GNU Project produced the most popular version of awk, gawk, which has precompiled binaries for MS-DOS and Win32. It has some interesting and useful enhancements. Files can be read under the control of the powerful getline function. Unlike other implementations, GNU AWK contains the dgawk debugger, which is purposely modeled after GDB. GNU AWK 4.0 and higher has a "--sandbox" option that disables calls to system() and write access to the file system.
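For instance, here is a minimal sketch of reading a file under explicit program control with getline (the file name settings.conf is just a placeholder):

awk 'BEGIN {
    file = "settings.conf"              # hypothetical file name
    while ((getline line < file) > 0)   # returns 1 while lines remain, 0 at EOF, -1 on error
        print "read:", line
    close(file)
}'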
The question arises: why use AWK if Perl is widely available and includes it as a subset? I would like to reproduce here the answer given in the newsgroup comp.lang.awk.
9. Why would anyone still use awk instead of perl?
...a valid question, since awk is a subset of perl (functionally, not necessarily syntactically); also, the authors of perl have usually known awk (and sed, and C, and a host of other Unix tools) very well, and still decided to move on.
...there are some things that perl has built-in support for that almost no version of awk can do without great difficulty (if at all); if you need to do these things, there may be no choice to make. for instance, no reasonable person would try to write a web server in awk instead of using perl or even C, if the actual socket programming has to be written in awk. keep in mind that gawk 3.1.0's /inet and ftwalk's built-in networking primitives should help this situation.
however, there are some things in awk's favor compared to perl:
- awk is simpler (especially important if deciding which to learn first)
- awk syntax is far more regular (another advantage for the beginner, even without considering syntax-highlighting editors)
- you may already know awk well enough for the task at hand
Here is a nice intro to awk from the gawk manual (Getting Started with awk):
The basic function of awk is to search files for lines (or other units of text) that contain certain patterns. When a line matches one of the patterns, awk performs specified actions on that line. awk keeps processing input lines in this way until it reaches the end of the input files.
Programs in awk are different from programs in most other languages, because awk programs are data-driven; that is, you describe the data you want to work with and then what to do when you find it. Most other languages are procedural; you have to describe, in great detail, every step the program is to take. When working with procedural languages, it is usually much harder to clearly describe the data your program will process. For this reason, awk programs are often refreshingly easy to read and write.
When you run awk, you specify an awk program that tells awk what to do. The program consists of a series of rules. (It may also contain function definitions, an advanced feature that we will ignore for now. See User-defined.) Each rule specifies one pattern to search for and one action to perform upon finding the pattern.
Syntactically, a rule consists of a pattern followed by an action. The action is enclosed in curly braces to separate it from the pattern. Newlines usually separate rules. Therefore, an awk program looks like this:
pattern { action }
pattern { action }
...
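For illustration (my sketch, not from the manual), a two-rule program in this form could look like the following, run against any whitespace-separated input:

/foo/   { print "line contains foo" }
NF == 0 { print "blank line" }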
- Running gawk: How to run gawk programs; includes command-line syntax.
- Sample Data Files: Sample data files for use in the awk programs illustrated in this Web page.
- Very Simple: A very simple example.
- Two Rules: A less simple one-line example using two rules.
- More Complex: A more complex example.
- Statements/Lines: Subdividing or combining statements into lines.
- Other Features: Other Features of awk.
- When: When to use gawk and when to use other things.
1.1 How to Run awk Programs
There are several ways to run an awk program. If the program is short, it is easiest to include it in the command that runs awk, like this:
awk 'program' input-file1 input-file2 ...

When the program is long, it is usually more convenient to put it in a file and run it with a command like this:
awk -f program-file input-file1 input-file2 ...

This section discusses both mechanisms, along with several variations of each.
- One-shot: Running a short throwaway awk program.
- Read Terminal: Using no input files (input from terminal instead).
- Long: Putting permanent awk programs in files.
- Executable Scripts: Making self-contained awk programs.
- Comments: Adding documentation to gawk programs.
- Quoting: More discussion of shell quoting issues.
1.1.1 One-Shot Throwaway awk Programs
Once you are familiar with awk, you will often type in simple programs the moment you want to use them. Then you can write the program as the first argument of the awk command, like this:
awk 'program' input-file1 input-file2 ...

where program consists of a series of patterns and actions, as described earlier.
This command format instructs the shell, or command interpreter, to start awk and use the program to process records in the input file(s). There are single quotes around program so the shell won't interpret any awk characters as special shell characters. The quotes also cause the shell to treat all of program as a single argument for awk, and allow program to be more than one line long.
This format is also useful for running short or medium-sized awk programs from shell scripts, because it avoids the need for a separate file for the awk program. A self-contained shell script is more reliable because there are no other files to misplace.
Very Simple, later in this chapter, presents several short, self-contained programs.
1.1.2 Running awk Without Input Files
You can also run awk without any input files. If you type the following command line:
awk 'program'

awk applies the program to the standard input, which usually means whatever you type on the terminal. This continues until you indicate end-of-file by typing Ctrl-d. (On other operating systems, the end-of-file character may be different. For example, on OS/2 and MS-DOS, it is Ctrl-z.)
As an example, the following program prints a friendly piece of advice (from Douglas Adams's The Hitchhiker's Guide to the Galaxy), to keep you from worrying about the complexities of computer programming (BEGIN is a feature we haven't discussed yet):
awk "BEGIN { print \"Don't Panic!\" }"This program does not read any input. The `\' before each of the inner double quotes is necessary because of the shell's quoting rules—in particular because it mixes both single quotes and double quotes.
This next simple awk program emulates the cat utility; it copies whatever you type on the keyboard to its standard output (why this works is explained shortly).
$ awk '{ print }'
Now is the time for all good men
-| Now is the time for all good men
to come to the aid of their country.
-| to come to the aid of their country.
Four score and seven years ago, ...
-| Four score and seven years ago, ...
What, me worry?
-| What, me worry?
Ctrl-d

1.1.3 Running Long Programs
Sometimes your awk programs can be very long. In this case, it is more convenient to put the program into a separate file. In order to tell awk to use that file for its program, you type:
awk -f source-file input-file1 input-file2 ...

The -f instructs the awk utility to get the awk program from the file source-file. Any file name can be used for source-file. For example, you could put the program:
BEGIN { print "Don't Panic!" }

into the file advice. Then this command:
awk -f advice

does the same thing as this one:
awk "BEGIN { print \"Don't Panic!\" }"This was explained earlier (see Read Terminal). Note that you don't usually need single quotes around the file name that you specify with -f, because most file names don't contain any of the shell's special characters. Notice that in advice, the awk program did not have single quotes around it. The quotes are only needed for programs that are provided on the awk command line.
If you want to identify your awk program files clearly as such, you can add the extension .awk to the file name. This doesn't affect the execution of the awk program but it does make "housekeeping" easier.
1.1.4 Executable awk Programs
Once you have learned awk, you may want to write self-contained awk scripts, using the `#!' script mechanism. You can do this on many Unix systems, as well as on the GNU system. For example, you could update the file advice to look like this:
#! /bin/awk -f
BEGIN { print "Don't Panic!" }

After making this file executable (with the chmod utility), simply type `advice' at the shell and the system arranges to run awk as if you had typed `awk -f advice':
$ chmod +x advice
$ advice
-| Don't Panic!

(We assume you have the current directory in your shell's search path variable (typically $PATH). If not, you may need to type `./advice' at the shell.)
Self-contained awk scripts are useful when you want to write a program that users can invoke without their having to know that the program is written in awk.
Advanced Notes: Portability Issues with `#!'
Some systems limit the length of the interpreter name to 32 characters. Often, this can be dealt with by using a symbolic link.
You should not put more than one argument on the `#!' line after the path to awk. It does not work. The operating system treats the rest of the line as a single argument and passes it to awk. Doing this leads to confusing behavior—most likely a usage diagnostic of some sort from awk.
Finally, the value of ARGV[0] (see Built-in Variables) varies depending upon your operating system. Some systems put `awk' there, some put the full pathname of awk (such as /bin/awk), and some put the name of your script (`advice'). Don't rely on the value of ARGV[0] to provide your script name.
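A quick way to check what your implementation does is a one-liner like this (an illustrative sketch, not from the manual):

awk 'BEGIN { print "ARGV[0] is: " ARGV[0] }'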
1.1.5 Comments in awk Programs
A comment is some text that is included in a program for the sake of human readers; it is not really an executable part of the program. Comments can explain what the program does and how it works. Nearly all programming languages have provisions for comments, as programs are typically hard to understand without them.
In the awk language, a comment starts with the sharp sign character (`#') and continues to the end of the line. The `#' does not have to be the first character on the line. The awk language ignores the rest of a line following a sharp sign. For example, we could have put the following into advice:
# This program prints a nice friendly message.  It helps
# keep novice users from being afraid of the computer.
BEGIN { print "Don't Panic!" }

You can put comment lines into keyboard-composed throwaway awk programs, but this usually isn't very useful; the purpose of a comment is to help you or another person understand the program when reading it at a later time.
Caution: As mentioned in One-shot, you can enclose small to medium programs in single quotes, in order to keep your shell scripts self-contained. When doing so, don't put an apostrophe (i.e., a single quote) into a comment (or anywhere else in your program). The shell interprets the quote as the closing quote for the entire program. As a result, usually the shell prints a message about mismatched quotes, and if awk actually runs, it will probably print strange messages about syntax errors. For example, look at the following:
awk '{ print "hello" } # let's be cute' >The shell sees that the first two quotes match, and that a new quoted object begins at the end of the command line. It therefore prompts with the secondary prompt, waiting for more input. With Unix awk, closing the quoted string produces this result:
awk '{ print "hello" } # let's be cute' > ' error--> awk: can't open file be error--> source line number 1Putting a backslash before the single quote in `let's' wouldn't help, since backslashes are not special inside single quotes. The next subsection describes the shell's quoting rules.
1.1.6 Shell-Quoting Issues
For short to medium length awk programs, it is most convenient to enter the program on the awk command line. This is best done by enclosing the entire program in single quotes. This is true whether you are entering the program interactively at the shell prompt, or writing it as part of a larger shell script:
awk 'program text' input-file1 input-file2 ...

Once you are working with the shell, it is helpful to have a basic knowledge of shell quoting rules. The following rules apply only to POSIX-compliant, Bourne-style shells (such as bash, the GNU Bourne-Again Shell). If you use csh, you're on your own.
- Quoted items can be concatenated with nonquoted items as well as with other quoted items. The shell turns everything into one argument for the command.
- Preceding any single character with a backslash (`\') quotes that character. The shell removes the backslash and passes the quoted character on to the command.
- Single quotes protect everything between the opening and closing quotes. The shell does no interpretation of the quoted text, passing it on verbatim to the command. It is impossible to embed a single quote inside single-quoted text. Refer back to Comments, for an example of what happens if you try.
- Double quotes protect most things between the opening and closing quotes. The shell does at least variable and command substitution on the quoted text. Different shells may do additional kinds of processing on double-quoted text.
Since certain characters within double-quoted text are processed by the shell, they must be escaped within the text. Of note are the characters `$', ``', `\', and `"', all of which must be preceded by a backslash within double-quoted text if they are to be passed on literally to the program. (The leading backslash is stripped first.) Thus, the example seen previously in Read Terminal, is applicable:
awk "BEGIN { print \"Don't Panic!\" }" -| Don't Panic!Note that the single quote is not special within double quotes.
- Null strings are removed when they occur as part of a non-null command-line argument, while explicit non-null objects are kept. For example, to specify that the field separator FS should be set to the null string, use:
awk -F "" 'program' files # correctDon't use this:
awk -F"" 'program' files # wrong!In the second case, awk will attempt to use the text of the program as the value of FS, and the first file name as the text of the program! This results in syntax errors at best, and confusing behavior at worst.
Mixing single and double quotes is difficult. You have to resort to shell quoting tricks, like this:
awk 'BEGIN { print "Here is a single quote <'"'"'>" }' -| Here is a single quote <'>This program consists of three concatenated quoted strings. The first and the third are single-quoted, the second is double-quoted.
This can be "simplified" to:
awk 'BEGIN { print "Here is a single quote <'\''>" }' -| Here is a single quote <'>Judge for yourself which of these two is the more readable.
Another option is to use double quotes, escaping the embedded, awk-level double quotes:
awk "BEGIN { print \"Here is a single quote <'>\" }" -| Here is a single quote <'>This option is also painful, because double quotes, backslashes, and dollar signs are very common in awk programs.
A third option is to use the octal escape sequence equivalents for the single- and double-quote characters, like so:
awk 'BEGIN { print "Here is a single quote <\47>" }' -| Here is a single quote <'> $ awk 'BEGIN { print "Here is a double quote <\42>" }' -| Here is a double quote <">This works nicely, except that you should comment clearly what the escapes mean.
A fourth option is to use command-line variable assignment, like this:
awk -v sq="'" 'BEGIN { print "Here is a single quote <" sq ">" }'
-| Here is a single quote <'>

If you really need both single and double quotes in your awk program, it is probably best to move it into a separate file, where the shell won't be part of the picture, and you can say what you mean.
1.2 Data Files for the Examples
Many of the examples in this Web page take their input from two sample data files. The first, BBS-list, represents a list of computer bulletin board systems together with information about those systems. The second data file, called inventory-shipped, contains information about monthly shipments. In both files, each line is considered to be one record.
In the data file BBS-list, each record contains the name of a computer bulletin board, its phone number, the board's baud rate(s), and a code for the number of hours it is operational. An `A' in the last column means the board operates 24 hours a day. A `B' in the last column means the board only operates on evening and weekend hours. A `C' means the board operates only on weekends:
aardvark 555-5553 1200/300 B
alpo-net 555-3412 2400/1200/300 A
barfly 555-7685 1200/300 A
bites 555-1675 2400/1200/300 A
camelot 555-0542 300 C
core 555-2912 1200/300 C
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sdace 555-3430 2400/1200/300 A
sabafoo 555-2127 1200/300 C

The data file inventory-shipped represents information about shipments during the year. Each record contains the month, the number of green crates shipped, the number of red boxes shipped, the number of orange bags shipped, and the number of blue packages shipped, respectively. There are 16 entries, covering the 12 months of last year and the first four months of the current year.
Jan 13 25 15 115
Feb 15 32 24 226
Mar 15 24 34 228
Apr 31 52 63 420
May 16 34 29 208
Jun 31 42 75 492
Jul 24 34 67 436
Aug 15 34 47 316
Sep 13 55 37 277
Oct 29 54 68 525
Nov 20 87 82 577
Dec 17 35 61 401
Jan 21 36 64 620
Feb 26 58 80 652
Mar 24 75 70 495
Apr 21 70 74 514
1.3 Some Simple Examples
The following command runs a simple awk program that searches the input file BBS-list for the character string `foo' (a grouping of characters is usually called a string; the term string is based on similar usage in English, such as "a string of pearls," or "a string of cars in a train"):
awk '/foo/ { print $0 }' BBS-list

When lines containing `foo' are found, they are printed because `print $0' means print the current line. (Just `print' by itself means the same thing, so we could have written that instead.)
You will notice that slashes (`/') surround the string `foo' in the awk program. The slashes indicate that `foo' is the pattern to search for. This type of pattern is called a regular expression, which is covered in more detail later (see Regexp). The pattern is allowed to match parts of words. There are single quotes around the awk program so that the shell won't interpret any of it as special shell characters.
Here is what this program prints:
awk '/foo/ { print $0 }' BBS-list
-| fooey 555-1234 2400/1200/300 B
-| foot 555-6699 1200/300 B
-| macfoo 555-6480 1200/300 A
-| sabafoo 555-2127 1200/300 C

In an awk rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, then the action is performed for every input line. If the action is omitted, the default action is to print all lines that match the pattern.
Thus, we could leave out the action (the print statement and the curly braces) in the previous example and the result would be the same: all lines matching the pattern `foo' are printed. By comparison, omitting the print statement but retaining the curly braces makes an empty action that does nothing (i.e., no lines are printed).
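For illustration (my sketch, following the manual's rules), the three variants behave as described:

awk '/foo/ { print $0 }' BBS-list   # pattern and action
awk '/foo/' BBS-list                # action omitted: default action prints matching lines
awk '/foo/ { }' BBS-list            # empty action: nothing is printed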
Many practical awk programs are just a line or two. Following is a collection of useful, short programs to get you started. Some of these programs contain constructs that haven't been covered yet. (The description of the program will give you a good idea of what is going on, but please read the rest of the Web page to become an awk expert!) Most of the examples use a data file named data. This is just a placeholder; if you use these programs yourself, substitute your own file names for data. For future reference, note that there is often more than one way to do things in awk. At some point, you may want to look back at these examples and see if you can come up with different ways to do the same things shown here:
- Print the length of the longest input line:
awk '{ if (length($0) > max) max = length($0) }
     END { print max }' data

- Print every line that is longer than 80 characters:

awk 'length($0) > 80' data

The sole rule has a relational expression as its pattern and it has no action—so the default action, printing the record, is used.

- Print the length of the longest line in data:

expand data | awk '{ if (x < length()) x = length() }
                   END { print "maximum line length is " x }'

The input is processed by the expand utility to change tabs into spaces, so the widths compared are actually the right-margin columns.

- Print every line that has at least one field:

awk 'NF > 0' data

This is an easy way to delete blank lines from a file (or rather, to create a new file similar to the old file but from which the blank lines have been removed).

- Print seven random numbers from 0 to 100, inclusive:

awk 'BEGIN { for (i = 1; i <= 7; i++) print int(101 * rand()) }'

- Print the total number of bytes used by files:

ls -l files | awk '{ x += $5 } END { print "total bytes: " x }'

- Print the total number of kilobytes used by files:

ls -l files | awk '{ x += $5 } END { print "total K-bytes: " (x + 1023)/1024 }'

- Print a sorted list of the login names of all users:

awk -F: '{ print $1 }' /etc/passwd | sort

- Count the lines in a file:

awk 'END { print NR }' data

- Print the even-numbered lines in the data file:

awk 'NR % 2 == 0' data

If you use the expression `NR % 2 == 1' instead, the program would print the odd-numbered lines.
More examples from Hartigan-Computer-AWK (Oct 10, 2005)

EXAMPLES
# is the comment character for awk.  'field' means 'column'

# Print first two fields in opposite order:
awk '{ print $2, $1 }' file

# Print lines longer than 72 characters:
awk 'length > 72' file

# Print length of string in 2nd column
awk '{print length($2)}' file

# Add up first column, print sum and average:
{ s += $1 }
END { print "sum is", s, " average is", s/NR }

# Print fields in reverse order:
awk '{ for (i = NF; i > 0; --i) print $i }' file

# Print the last line
{ line = $0 }
END { print line }

# Print the total number of lines that contain the word Pat
/Pat/ { nlines = nlines + 1 }
END { print nlines }

# Print all lines between start/stop pairs:
awk '/start/, /stop/' file

# Print all lines whose first field is different from previous one:
awk '$1 != prev { print; prev = $1 }' file

# Print column 3 if column 1 > column 2:
awk '$1 > $2 { print $3 }' file

# Print line if column 3 > column 2:
awk '$3 > $2' file

# Count number of lines where col 3 > col 1
awk '$3 > $1 { print i + "1"; i++ }' file

# Print sequence number and then column 1 of file:
awk '{ print NR, $1 }' file

# Print every line after erasing the 2nd field
awk '{ $2 = ""; print }' file

# Print hi 28 times
yes | head -28 | awk '{ print "hi" }'

# Print hi.0010 to hi.0099 (NOTE IRAF USERS!)
yes | head -90 | awk '{ printf("hi00%2.0f \n", NR+9) }'

# Replace every field by its absolute value
{ for (i = 1; i <= NF; i = i + 1) if ($i < 0) $i = -$i
  print }

# If you have another character that delimits fields, use the -F option
# For example, to print out the phone number for Jones in the following file,
# 000902|Beavis|Theodore|333-242-2222|149092
# 000901|Jones|Bill|532-382-0342|234023
# ...
# type
awk -F"|" '$2=="Jones" { print $4 }' filename

# Some looping for printouts
BEGIN {
    for (i = 875; i > 833; i--) {
        printf "lprm -Plw %d\n", i
    }
    exit
}

Formatted printouts are of the form:

printf("format\n", value1, value2, ... valueN)

e.g.

printf("howdy %-8s What it is bro. %.2f\n", $1, $2*$3)

%s    = string
%-8s  = 8 character string left justified
%.2f  = number with 2 places after .
%6.2f = field 6 chars with 2 chars after .
\n is newline
\t is a tab

# Print frequency histogram of column of numbers
$2 <= 0.1 { na = na + 1 }
($2 > 0.1) && ($2 <= 0.2) { nb = nb + 1 }
($2 > 0.2) && ($2 <= 0.3) { nc = nc + 1 }
($2 > 0.3) && ($2 <= 0.4) { nd = nd + 1 }
($2 > 0.4) && ($2 <= 0.5) { ne = ne + 1 }
($2 > 0.5) && ($2 <= 0.6) { nf = nf + 1 }
($2 > 0.6) && ($2 <= 0.7) { ng = ng + 1 }
($2 > 0.7) && ($2 <= 0.8) { nh = nh + 1 }
($2 > 0.8) && ($2 <= 0.9) { ni = ni + 1 }
($2 > 0.9) { nj = nj + 1 }
END { print na, nb, nc, nd, ne, nf, ng, nh, ni, nj, NR }

# Find maximum and minimum values present in column 1
NR == 1 { m = $1; p = $1 }
$1 >= m { m = $1 }
$1 <= p { p = $1 }
END { print "Max = " m, " Min = " p }

# Example of defining variables, multiple commands on one line
NR == 1 { prev = $4; preva = $1; prevb = $2; n = 0; sum = 0 }
$4 != prev { print preva, prevb, prev, sum/n; n = 0; sum = 0; prev = $4; preva = $1; prevb = $2 }
$4 == prev { n++; sum = sum + $5/$6 }
END { print preva, prevb, prev, sum/n }

# Example of using substrings
# substr($2,9,7) picks out characters 9 thru 15 of column 2
{ print "imarith", substr($2,1,7) " - " $3, "out."substr($2,5,3) }
{ print "imarith", substr($2,9,7) " - " $3, "out."substr($2,13,3) }
{ print "imarith", substr($2,17,7) " - " $3, "out."substr($2,21,3) }
{ print "imarith", substr($2,25,7) " - " $3, "out."substr($2,29,3) }

[3.0] Awk Examples, Nawk, & Awk Quick Reference
For example, suppose I want to turn a document with single-spacing into a document with double-spacing. I could easily do that with the following Awk program:
awk '{print ; print ""}' infile > outfileNotice how single-quotes (' ') are used to allow using double-quotes (" ") within the Awk expression. This "hides" special characters from the shell you are using. You could also do this as follows:awk "{print ; print \"\"}" infile > outfile-- but the single-quote method is simpler.This program does what it supposed to, but it also doubles every blank line in the input file, which leaves a lot of empty space in the output. That's easy to fix, just tell Awk to print an extra blank line if the current line is not blank
awk '{print ; if (NF != 0) print ""}' infile > outfile

* One of the problems with Awk is that it is ingenious enough to make a user want to tinker with it, and use it for tasks for which it isn't really appropriate. For example, you can use Awk to count the number of lines in a file:

awk 'END {print NR}' infile

-- but this is dumb, because the "wc" (word count) utility gives the same answer with less bother. "Use the right tool for the job."

Awk is the right tool for slightly more complicated tasks. Once I had a file containing an email distribution list. The email addresses of various different groups were placed on consecutive lines in the file, with the different groups separated by blank lines. If I wanted to quickly and reliably determine how many people were on the distribution list, I couldn't use "wc", since it counts blank lines, but Awk handled it easily:
awk 'NF != 0 {++count} END {print count}' list

* Another problem I ran into was determining the average size of a number of files. I was creating a set of bitmaps with a scanner and storing them on a floppy disk. The disk started getting full and I was curious to know just how many more bitmaps I could store on the disk.

I could obtain the file sizes in bytes using "wc -c" or the "list" utility ("ls -l" or "ll"). A few tests showed that "ll" was faster. Since "ll" lists the file size in the fifth field, all I had to do was sum up the fifth field and divide by NR. There was one slight problem, however: the first line of the output of "ll" listed the total number of sectors used, and had to be skipped.
No problem. I simply entered:
ll | awk 'NR!=1 {s+=$5} END {print "Average: " s/(NR-1)}'

This gave me the average as about 40 KB per file.

* Awk is useful for performing simple iterative computations for which a more sophisticated language like C might prove overkill. Consider the Fibonacci sequence:
1 1 2 3 5 8 13 21 34 ...

Each element in the sequence is constructed by adding the two previous elements together, with the first two elements defined as both "1". It's a discrete formula for exponential growth. It is very easy to use Awk to generate this sequence:

awk 'BEGIN {a=1;b=1; while(++x<=10){print a; t=a;a=a+b;b=t}; exit}'

This generates the following output data:

1 2 3 5 8 13 21 34 55 89

UNIX Basics Examples with awk: A short introduction
A long time later, awk came back into my life. A colleague of mine used AWK to extract the first column from a file with the command:
awk '{print $1}' file

Easy, isn't it? This simple task does not need complex programming in C. One line of AWK does it. Once we have learned the lesson on how to extract a column, we can do things such as renaming files (appending .new to every name in "files_list"):

ls files_list | awk '{print "mv "$1" "$1".new"}' | sh

... and more:
- Renaming within the name:
ls -1 *old* | awk '{print "mv "$1" "$1}' | sed s/old/new/2 | sh
(although in some cases it will fail, as in file_old_and_old)

- Remove only files:
ls -l * | grep -v drwx | awk '{print "rm "$9}' | sh
or with awk alone:
ls -l|awk '$1!~/^drwx/{print $9}'|xargs rm
Be careful when trying this out in your home directory. We remove files!

- Remove only directories:
ls -l | grep '^d' | awk '{print "rm -r "$9}' | sh
or
ls -p | grep /$ | awk '{print "rm -r "$1}'
or with awk alone:
ls -l|awk '$1~/^d.*x/{print $9}'|xargs rm -r
Be careful when trying this out in your home directory. We remove things!- Killing processes by name (in this example we kill the process called netscape):
kill `ps auxww | grep netscape | egrep -v grep | awk '{print $2}'`
or with awk alone:
ps auxww | awk '$0~/netscape/&&$0!~/awk/{print $2}' |xargs kill
It has to be adjusted to fit the ps command on whatever Unix system you are on. Basically it is: "If the process is called netscape, and it is not called 'grep netscape' (or awk), then print the pid."
Dr. Nikolai Bezroukov
Feb 06, 2012 | sanctum.geek.nz
For many system administrators, Awk is used only as a way to print specific columns of data from programs that generate columnar output, such as netstat or ps.

For example, to get a list of all the IP addresses and ports with open TCP connections on a machine, one might run the following:

# netstat -ant | awk '{print $5}'

This works pretty well, but among the data you actually wanted it also includes the fifth word of the opening explanatory note, and the heading of the fifth column:

and
Address
0.0.0.0:*
205.188.17.70:443
172.20.0.236:5222
72.14.203.125:5222

There are varying ways to deal with this.

Matching patterns

One common way is to pipe the output further through a call to grep, perhaps to only include results with at least one number:

# netstat -ant | awk '{print $5}' | grep '[0-9]'

In this case, it's instructive to use the awk call a bit more intelligently by setting a regular expression which the applicable line must match in order for that field to be printed, with the standard / characters as delimiters. This eliminates the need for the call to grep:

# netstat -ant | awk '/[0-9]/ {print $5}'

We can further refine this by ensuring that the regular expression should only match data in the fifth column of the output, using the ~ operator:

# netstat -ant | awk '$5 ~ /[0-9]/ {print $5}'

Skipping lines

Another approach you could take to strip the headers out might be to use sed to skip the first two lines of the output:

# netstat -ant | awk '{print $5}' | sed 1,2d

However, this can also be incorporated into the awk call, using the NR variable and making it part of a conditional checking the line number is greater than two:

# netstat -ant | awk 'NR>2 {print $5}'

Combining and excluding patterns

Another common idiom on systems that don't have the special pgrep command is to filter ps output for a string, but exclude the grep process itself from the output with grep -v grep:

# ps -ef | grep apache | grep -v grep | awk '{print $2}'

If you're using Awk to get columnar data from the output, in this case the second column containing the process ID, both calls to grep can instead be incorporated into the awk call:

# ps -ef | awk '/apache/ && !/awk/ {print $2}'

Again, this can be further refined if necessary to ensure you're only matching the expressions against the command name by specifying the field number for each comparison:

# ps -ef | awk '$8 ~ /apache/ && $8 !~ /awk/ {print $2}'

If you're used to using Awk purely as a column filter, the above might help to increase its utility for you and allow you to write shorter and more efficient command lines. The Awk Primer on Wikibooks is a really good reference for using Awk to its fullest for the sorts of tasks for which it's especially well-suited.
May 4, 2015 | O'Reilly Radar
I maintain GNU Awk. As part of making releases, I have to create a patch script to convert the file tree of the previous release into the current one. This means writing rm commands to remove any files that have been removed. This is fairly straightforward using tools like find, sort, and comm.

However, for the 4.1.2 release, I also changed the permissions (mode) on some files. I want to create chmod commands to update these files' permission settings as well. This is a little harder, so I decided to write an awk script that will do this for me.

Let's take a look at some of the sophistication and control you can achieve using awk, such as recursion, the use of arrays of arrays, and extension functions for using operating system facilities.

This script, comptrees.awk, uses the fts() extension function to do the heavy lifting. This function walks file trees, building up a representation of those trees using gawk's arrays of arrays.

The script then uses an awk function to compare the two trees' arrays. We start with a #! header and some descriptive comments:

#! /usr/local/bin/gawk -f

# comptrees.awk --- compare two file trees and print commands to synchronize them
#
# Arnold Robbins
# April, 2015

The next statement loads the filefuncs extension, which includes the fts() function:

@load "filefuncs"

The program is run from a BEGIN rule. The first thing to do is check the number of arguments and print an error message if that count is incorrect:

BEGIN {
    # argument count checking
    if (ARGC != 3) {
        print "usage: comptrees dir1 dir2" > "/dev/stderr"
        exit 1
    }

The next step is to remove the program name from ARGV, leaving just the two file names in the array. This lets us pass ARGV directly to fts().

    # remove program name
    delete ARGV[0]

The fts() function walks the trees. The first argument is an array whose element values are the paths to walk. The second is one or more flag values ORed together; in this case symbolic links are not followed. The final argument holds the results as an array of arrays.

    # walk the trees
    fts(ARGV, FTS_PHYSICAL, results)

The top-level indices in the results array are the final component of the full path. Thus, a simple basename() function strips out the leading path components to get at each subarray. We pass the full names and subarrays into the comparison function, which does the work, and then we're done:

    # compare them
    compare(ARGV[1], results[basename(ARGV[1])],
            ARGV[2], results[basename(ARGV[2])])
}

The basename() function returns the final component of the input pathname, using gawk's gensub() function to do the work:

# basename --- strip out all but the last part of a filename
function basename(path)
{
    return gensub(".*/", "", "g", path)
}

The arrays created by fts() are a little bit complicated. See the filefuncs.3am man page in the gawk distribution and the documentation for the details. Basically, directories are represented by arrays where each file is a subarray. Files are arrays with a few special elements, including one named "stat" which is an array with file information such as owner and permissions. The compare() function has to carefully walk the two arrays representing the trees. The header lists the parameters and the single local variable:

# compare --- compare two trees
function compare(oldname, oldtree, newname, newtree,    i)
{

The function loops over all the elements in oldtree, skipping any of the special ones:

    # loop over all elements in the array
    for (i in oldtree) {
        # skip special elements filled in by fts()
        if (i == "." || i == "stat" || i == "type" ||
            i == "path" || i == "error")
            continue

If an element is itself a directory, compare the directories recursively:

        if ("." in oldtree[i]) {    # directory
            # recurse
            compare(oldname "/" i, oldtree[i],
                    newname "/" i, newtree[i])
        }

Next thing to check. If the element is not in the new tree, it was removed, so print an rm command:

        else if (! (i in newtree)) {
            # removed file
            printf("rm -fr %s/%s\n", oldname, i)
        }

Finally, if an element is a file and the permissions are different between the old and new trees, print a chmod command. The permission value is ANDed with 0777 to get just the file permissions, since the mode value also contains bits indicating the file type:

        else if (oldtree[i]["stat"]["type"] == "file") {
            if (oldtree[i]["stat"]["mode"] != newtree[i]["stat"]["mode"]) {
                # file permissions change
                printf("chmod %o %s/%s\n",
                       and(newtree[i]["stat"]["mode"], 0777),
                       newname, i)
            }
        }
    }
}

That's it! 63 lines of awk that will save me a lot of time as I prepare future gawk releases. I think this script nicely demonstrates the power of the fts() extension function and gawk's arrays of arrays.
Editor's note: If your work involves a significant amount of data extraction, reporting, and data-reformatting jobs, you'll definitely want to check out Arnold Robbins' Effective awk Programming, 4th Edition.
UrFix's Blog
stuck on OS X, so here's a Perl version for the Mac:
curl -u username:password --silent "https://mail.google.com/mail/feed/atom" | tr -d '\n' | awk -F '<entry>' '{for (i=2; i<=NF; i++) {print $i}}' | perl -pe 's/^<title>(.*)<\/title>.*<name>(.*)<\/name>.*$/$2 - $1/'
If you want to see the name of the last person, who added a message to the conversation, change the greediness of the operators like this:
curl -u username:password --silent "https://mail.google.com/mail/feed/atom" | tr -d '\n' | awk -F '<entry>' '{for (i=2; i<=NF; i++) {print $i}}' | perl -pe 's/^<title>(.*)<\/title>.*?<name>(.*?)<\/name>.*$/$2 - $1/'
5) Remove duplicate entries in a file without sorting.
awk '!x[$0]++' <file>Using awk, find duplicates in a file without sorting, which reorders the contents. awk will not reorder them, and still find and remove duplicates which you can then redirect into another file.
6) Find the geographical location of an IP address

lynx -dump http://www.ip-adress.com/ip_tracer/?QRY=$1|grep address|egrep 'city|state|country'|awk '{print $3,$4,$5,$6,$7,$8}'|sed 's\ip address flag \\'|sed 's\My\\'
I save this to bin/iptrace and run "iptrace ipaddress" to get the Country, City and State of an ip address using the http://ipadress.com service.
I add the following to my script to get a tinyurl of the map as well:
URL=`lynx -dump http://www.ip-adress.com/ip_tracer/?QRY=$1|grep details|awk '{print $2}'`
lynx -dump http://tinyurl.com/create.php?url=$URL|grep tinyurl|grep "19. http"|awk '{print $2}'
7) Block known dirty hosts from reaching your machine

wget -qO - http://infiltrated.net/blacklisted|awk '!/#|[a-z]/&&/./{print "iptables -A INPUT -s "$1" -j DROP"}'

Blacklisted is a compiled list of all known dirty hosts (botnets, spammers, bruteforcers, etc.) which is updated on an hourly basis. This command will get the list and create the rules for you; if you want them automatically blocked, append |sh to the end of the command line. It's a more practical solution to block all and allow in specifics; however, there are many who don't or can't do this, which is where this script comes in handy. For those using ipfw, a quick fix would be {print "add deny ip from "$1" to any"}. Posted in the sample output are the top two entries. Be advised the blacklisted file itself filters out RFC1918 addresses (10.x.x.x, 172.16-31.x.x, 192.168.x.x); however, it is advisable you check/parse the list before you implement the rules.
8) Display a list of committers sorted by the frequency of commits

svn log -q|grep "|"|awk "{print \$3}"|sort|uniq -c|sort -nr

Use this command to find out a list of committers sorted by the frequency of commits.
9) List the number and type of active network connections

netstat -ant | awk '{print $NF}' | grep -v '[a-z]' | sort | uniq -c

10) View facebook friend list [hidden or not hidden]

lynx -useragent=Opera -dump 'http://www.facebook.com/ajax/typeahead_friends.php?u=4&__a=1' |gawk -F'\"t\":\"' -v RS='\",' 'RT{print $NF}' |grep -v '\"n\":\"' |cut -d, -f2
There's no need to be logged in to Facebook. I could do more JSON filtering, but you get the idea…
Replace u=4 (Mark Zuckerberg, Facebook creator) with desired uid.
Hidden or not hidden… Scary, isn't it?
11) List recorded form fields of Firefox

cd ~/.mozilla/firefox/ && sqlite3 `cat profiles.ini | grep Path | awk -F= '{print $2}'`/formhistory.sqlite "select * from moz_formhistory" && cd - > /dev/null
When you fill in a form with Firefox, you see things you entered in previous forms with the same field names. This command lists everything Firefox has recorded. Using a "delete from", you can remove annoying Google queries, for example ;-)
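For instance, a hedged sketch of such a cleanup (the field name "q" is commonly used by Google's search box, but verify against your own data first, and back up the file):

sqlite3 formhistory.sqlite "delete from moz_formhistory where fieldname = 'q'"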
12) Brute force discovery

sudo zcat /var/log/auth.log.*.gz | awk '/Failed password/&&!/for invalid user/{a[$9]++}/Failed password for invalid user/{a["*" $11]++}END{for (i in a) printf "%6s\t%s\n", a[i], i|"sort -n"}'
Show the number of failed tries of login per account. If the user does not exist it is marked with *.
13) Show biggest files/directories, biggest first with 'k,m,g' eyecandy

du --max-depth=1 | sort -r -n | awk '{split("k m g",v); s=1; while($1>1024){$1/=1024; s++} print int($1)" "v[s]"\t"$2}'

I use this on Debian testing; it works like the other sorted du variants, but I like small numbers and suffixes :)
14) Analyse an Apache access log for the most common IP addresses

tail -10000 access_log | awk '{print $1}' | sort | uniq -c | sort -n | tail

This uses awk to grab the IP address from each request and then sorts and summarises the top 10.
15) Copy the working directory and compress it on-the-fly while showing progress

tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz

What happens here is we tell tar to create "-c" an archive of all files in the current dir "." (recursively) and output the data to stdout "-f -". Next we specify the size "-s" to pv of all files in the current dir. The "du -sb . | awk '{print $1}'" returns the number of bytes in the current dir, and it gets fed as the "-s" parameter to pv. Next we gzip the whole content and output the result to the out.tgz file. This way "pv" knows how much data is still left to be processed and shows us that it will take yet another 4 mins 49 secs to finish.
Credit: Peteris Krumins http://www.catonmat.net/blog/unix-utilities-pipe-viewer/
16) List the commands you use most often

history | awk '{print $2}' | sort | uniq -c | sort -rn | head

17) Identify long lines in a file

awk 'length>72' file

This command displays a list of lines that are longer than 72 characters. I use this command to identify those lines in my scripts and cut them short the way I like it.
18) Makes you look busy

alias busy='my_file=$(find /usr/include -type f | sort -R | head -n 1); my_len=$(wc -l $my_file | awk "{print $1}"); let "r = $RANDOM % $my_len" 2>/dev/null; vim +$r $my_file'
This makes an alias for a command named 'busy'.
The 'busy' command opens a random file in /usr/include to a random line with vim. Drop this in your .bash_aliases and make sure that file is initialized in your .bashrc.
19) Show me a histogram of the busiest minutes in a log file

cat /var/log/secure.log | awk '{print substr($0,0,12)}' | uniq -c | sort -nr | awk '{printf("\n%s ",$0) ; for (i = 0; i<$1 ; i++) {printf("*")};}'
20) Analyze awk fields

awk '{print NR": "$0; for(i=1;i<=NF;++i)print "\t"i": "$i}'
Breaks down and numbers each line and its fields. This is really useful when you are going to parse something with awk but aren't sure exactly where to start.
21) Browse system RAM in a human readable form

sudo cat /proc/kcore | strings | awk "length > 20" | less

This command lets you see and scroll through all of the strings that are stored in the RAM at any given time. Press the space bar to scroll through to see more pages (or use the arrow keys etc).
Sometimes, if you didn't save a file you were working on, or want to get back something you closed, it can be found floating around in here!
The awk command only shows lines that are longer than 20 characters (to avoid seeing lots of junk that probably isn't "human readable").
If you want to dump the whole thing to a file replace the final '| less' with '> memorydump'. This is great for searching through many times (and with the added bonus that it doesn't overwrite any memory…).
Here's a neat example to show up conversations that were had in pidgin (will probably work after it has been closed)…
sudo cat /proc/kcore | strings | grep '([0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\})'
(depending on sudo settings it might be best to run sudo su first to get to a # prompt)

22) Monitor open connections for httpd including listen, count and sort it per IP

watch "netstat -plan|grep :80|awk {'print \$5'} | cut -d: -f 1 | sort | uniq -c | sort -nk 1"

It's not my code, but I found it useful to know how many open connections per request I have on a machine to debug connections without opening another http connection for it.
You can also decide to sort things out differently than the way it appears here.
23) Purge configuration files of removed packages on debian based systems

sudo aptitude purge `dpkg --get-selections | grep deinstall | awk '{print $1}'`

Purges all configuration files of removed packages.
24) Quick glance at who's been using your system recently

last | grep -v "^$" | awk '{ print $1 }' | sort -nr | uniq -c

This command takes the output of the 'last' command, removes empty lines, gets just the first field ($USERNAME), sorts the $USERNAMEs in reverse order and then gives a summary count of unique matches.
25) Number of open connections per IP

netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n

Here is a command line to run on your server if you think your server is under attack. It prints out a list of open connections to your server and sorts them by amount.
Jul 05, 2011 | freshmeat.net
Changes: New options were added. All long options now have corresponding short options. The "--sandbox" option disables the call of system() and write access to the file system. The POSIX 2008 behavior for "sub" and "gsub" is now the default, bringing a change from the previous behavior. Regular expressions were enhanced. Many further enhancements, as well as bugfixes and code cleanups, were made.
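As a quick illustration of the sandbox (my sketch): the same program is rejected with a fatal error under --sandbox but runs normally without it:

gawk --sandbox 'BEGIN { system("ls") }'   # fatal error: system() is not allowed in sandbox mode
gawk 'BEGIN { system("ls") }'             # runs ls as usual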
runawk is a small wrapper for the AWK interpreter that helps one write standalone AWK scripts. Its main feature is to provide a module/library system for AWK which is somewhat similar to Perl's "use" command. It also allows one to select a preferred AWK interpreter and to set up the environment for AWK scripts. Dozens of ready-to-use [modules].awk are also provided.
The awk implementation of cut uses the getopt library function (see section Processing Command Line Options), and the join library function (see section Merging an Array Into a String).
The program begins with a comment describing the options and a usage function which prints out a usage message and exits. usage is called if invalid arguments are supplied.
# cut.awk --- implement cut in awk
# Arnold Robbins, [email protected], Public Domain
# May 1993

# Options:
#    -f list     Cut fields
#    -d c        Field delimiter character
#    -c list     Cut characters
#
#    -s          Suppress lines without the delimiter character

function usage(    e1, e2)
{
    e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
    e2 = "usage: cut [-c list] [files...]"
    print e1 > "/dev/stderr"
    print e2 > "/dev/stderr"
    exit 1
}

The variables e1 and e2 are used so that the function fits nicely on the screen.
Next comes a BEGIN rule that parses the command line options. It sets FS to a single tab character, since that is cut's default field separator. The output field separator is also set to be the same as the input field separator. Then getopt is used to step through the command line options. One or the other of the variables by_fields or by_chars is set to true, to indicate that processing should be done by fields or by characters respectively. When cutting by characters, the output field separator is set to the null string.
BEGIN    \
{
    FS = "\t"    # default
    OFS = FS
    while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) {
        if (c == "f") {
            by_fields = 1
            fieldlist = Optarg
        } else if (c == "c") {
            by_chars = 1
            fieldlist = Optarg
            OFS = ""
        } else if (c == "d") {
            if (length(Optarg) > 1) {
                printf("Using first character of %s" \
                       " for delimiter\n", Optarg) > "/dev/stderr"
                Optarg = substr(Optarg, 1, 1)
            }
            FS = Optarg
            OFS = FS
            if (FS == " ")    # defeat awk semantics
                FS = "[ ]"
        } else if (c == "s")
            suppress++
        else
            usage()
    }

    for (i = 1; i < Optind; i++)
        ARGV[i] = ""
Oct 30, 2005 | Blog O' Matty
I was working on a shell script last week and wanted to grab just the CPU section from the Solaris prtdiag(1m) output. I was able to perform this operation with awk by checking $1 for one or more "=" characters, and then setting a variable named SECTION to the value contained in the second positional variable. If this variable was equal to the string CPUs, all subsequent lines would be printed, up until the next block of "=" characters was detected. The awk script looked similar to the following:
$ prtdiag -v | awk ' $1 ~ /^\=+$/ {SECTION=$2} { if (SECTION == "CPUs") print }'
==================================== CPUs ====================================
               E$          CPU                  CPU
CPU   Freq     Size        Implementation       Mask   Status    Location
---   -------- ----------  -------------------  -----  ------    --------
0     502 MHz  256KB       SUNW,UltraSPARC-IIe  1.4    on-line   +-board/cpu0

I really dig awk!
03 Jul 2008 | www.ibm.com/developerworks
Conditional statements
Awk also offers very nice C-like if statements. If you'd like, you could rewrite the previous script using an if statement:
{ if ( $5 ~ /root/ ) { print $3 } }

Both scripts function identically. In the first example, the boolean expression is placed outside the block, while in the second example, the block is executed for every input line, and we selectively perform the print command by using an if statement. Both methods are available, and you can choose the one that best meshes with the other parts of your script.

Here's a more complicated example of an awk if statement. As you can see, even with complex, nested conditionals, if statements look identical to their C counterparts:
{ if ( $1 == "foo" ) { if ( $2 == "foo" ) { print "uno" } else { print "one" } } else if ($1 == "bar" ) { print "two" } else { print "three" } }Using if statements, we can also transform this code:! /matchme/ { print $1 $3 $4 }to this:{ if ( $0 !~ /matchme/ ) { print $1 $3 $4 } }Both scripts will output only those lines that don't contain a matchme character sequence. Again, you can choose the method that works best for your code. They both do the same thing.Awk also allows the use of boolean operators "||" (for "logical or") and "&&"(for "logical and") to allow the creation of more complex boolean expressions:
( $1 == "foo" ) && ( $2 == "bar" ) { print }This example will print only those lines where field one equals foo and field two equals bar.
This week, we're going to look at a technique for adding a column of numbers. This particular technique requires that the column line up vertically, as we're going to strip off the leftmost part of each line in the file using the cut command. The sample script examines only those lines that contain a particular tag, which enables us to easily ignore lines not containing the numbers we seek and process only those containing numeric data. Assume in this example that the numbers in question are timing measurements (the "ms:" tag indicates that the numbers are in milliseconds). The script isolates the tag and the columnar position of the data to be averaged at the top of the script, making these values obvious and easy to modify.
#!/bin/sh
#
# Compute the average of a specified column in a file

TAG="ms:"
COL=29

if [ "$1" = "" ]; then
    echo "Usage: $0 <filename>"
    exit
else
    INFILE=$1
fi

grep $TAG $INFILE | cut -c$COL- | \
    awk '{sum += $1; total += 1; printf "avg = %.2f\n", sum/total}' | \
    tail -1

The name of the file containing the data to be averaged needs to be supplied as an argument when the script is run; otherwise, a usage statement is issued, and the script exits. The usage statement includes $0 so that it reflects whatever name the user decides to give the script. I call mine addcol.
boson% ./addcol
Usage: ./addcol <filename>
boson%

A sample data file for this script might look like this:

date: 06/11/01 12:11:00 ms: 117 measurement from boson.particle.net
date: 06/11/01 12:21:00 ms: 132 measurement from boson.particle.net
date: 06/11/01 12:31:00 ms: 109 measurement from boson.particle.net
date: 06/11/01 12:41:00 ms: 121 measurement from boson.particle.net

This data file contains a time measurement taken once every ten minutes and a comment. The grep command reduces this to:

date: 06/11/01 12:11:00 ms: 117
date: 06/11/01 12:21:00 ms: 132
date: 06/11/01 12:31:00 ms: 109
date: 06/11/01 12:41:00 ms: 121

The cut command further reduces it to:

117
132
109
121

By the time data is passed to the awk command, only a list of numbers remains of the original data. The awk command sees each of these numbers, therefore, as $1 and computes a sum and an average (i.e., sum/total) for each data point. This data is then passed to the tail command, so that only the final computation appears on the user's screen.
You would use a different approach for numbers that don't appear in the same columnar position or for those that include a decimal.
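For numbers that are tagged but not column-aligned, a minimal awk-only sketch (not part of the original script; the tag and field handling here are assumptions) can locate the value by its position after the "ms:" tag instead of by character column, and it copes with decimals as well:

#!/bin/sh
# Sketch: average the number that follows each "ms:" tag,
# regardless of where it appears on the line.
awk '{
    for (i = 1; i < NF; i++)
        if ($i == "ms:") { sum += $(i+1); total++ }
}
END { if (total) printf "avg = %.2f\n", sum/total }' "$1"

The next script is a word-substitution filter: it reads a conversion table (old word, new word) from a file named by the shell variable SubstFile, then replaces each matching word of its input: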
nawk '
BEGIN {
    # Read the whole substitution file
    # into the array tab[].
    # Format of the substitution file:
    #   oldword newword
    substfile = "'"$SubstFile"'"
    # Test for > 0: a plain "getline < file" returns -1 on error,
    # which is true and would loop forever on an unreadable file.
    while ( (getline < substfile) > 0 ) {
        tab[$1] = $2    # fill conversion table
    }
    close(substfile)
}
{
    for ( i = 1; i <= NF; i++ ) {
        if ( tab[$i] != "" ) {
            # substitute old word
            $i = tab[$i]
        }
    }
}
' "$@"
Pseudo-files

AWK offers another way to assign values to AWK variables, as in the following example:
awk '{ print "var is", var }' var=TEST file1This statement assigns the value "TEST" to the AWK variable "var", and then reads the files "file1" and "file2". The assignment works, because AWK interprets each file name containing an equal sign ("=") as an assignment.
This example is very portable (even oawk understands this syntax), and easy to use. So why don't we use this syntax exclusively?
This syntax has two drawbacks. The first is that a variable assignment is interpreted by AWK at the moment the file in that argument position would have been read; only then does the assignment take place. Since the BEGIN action is performed before the first file is read, the variable is not yet available in the BEGIN action.
The second problem is that the order of the variable assignments and of the files is important. In the following example

awk '{ print "var is", var }' file1 var=TEST file2

the variable var is not yet defined while file1 is read, only while file2 is read. This may cause bugs that are hard to track down.
SCALAR (Squid Cache Advanced Log Analyzer & Reporter) produces many detailed reports, such as:
Time Based Load Statistic,
Extensions Report,
Content Report,
Object Sizes Report,
Request Methods Report,
Squid & HTTP Result Codes Report,
Cache Hierarchy Reports.
Most of the reports are split into Requests, Traffic, Timeouts, and Denies statistics.
SCALAR is a highly customizable tool/script written in AWK; all settings can be defined in the script header. SCALAR was developed by Yuri N. Fominov.
From LinuxPlanet:
Here is the section of my book that talks about how to get gawk to communicate with a coprocess:
"Coprocess: Two-Way I/O
"A coprocess is a process that runs in parallel with another process. Starting with version 3.1, gawk can invoke a coprocess to exchange information directly with a background process. A coprocess can be useful when you are working in a client/server environment, setting up an SQL front end/back end, or exchanging data with a remote system over a network. The gawk syntax identifies a coprocess by preceding the name of the program that starts the background process with a |& operator.
"The coprocess command must be a filter (i.e., it reads from standard input and writes to standard output) and must flush its output whenever it has a complete line rather than accumulating lines for subsequent output. When a command is invoked as a coprocess, it is connected via a two-way pipe to a gawk program so that you can read from and write to the coprocess.
"When used alone the tr utility does not flush its output after each line. The to_upper shell script is a wrapper for tr that does flush its output; this filter can be run as a coprocess. For each line read, to_upper writes the line, translated to uppercase, to standard output. Remove the # before set -x if you want to_upper to display debugging output.
$ cat to_upper
#!/bin/bash
#set -x
while read arg
do
    echo "$arg" | tr '[a-z]' '[A-Z]'
done

$ echo abcdef | to_upper
ABCDEF

"The g6 program invokes to_upper as a coprocess. This gawk program reads standard input or a file specified on the command line, translates the input to uppercase, and writes the translated data to standard output.
$ cat g6
{
    print $0 |& "to_upper"
    "to_upper" |& getline hold
    print hold
}

$ gawk -f g6 < alpha
AAAAAAAAA
BBBBBBBBB
CCCCCCCCC
DDDDDDDDD

"The g6 program has one compound statement, enclosed within braces, comprising three statements. Because there is no pattern, gawk executes the compound statement once for each line of input.
"In the first statement, print $0 sends the current record to standard output. The |& operator redirects standard output to the program named to_upper, which is running as a coprocess. The quotation marks around the name of the program are required. The second statement redirects standard output from to_upper to a getline statement, which copies its standard input to the variable named hold. The third statement, print hold, sends the contents of the hold variable to standard output."
Sometimes you just want to use awk as a formatter and dump the output straight to the user. The following script takes a list of users as its argument and uses awk to dump information about them out of /etc/passwd.

Note: observe where I unquote the awk expression, so that the shell does the expansion of $1, rather than awk.
#!/bin/sh while [ "$1" != "" ] ; do awk -F: '$1 == "'$1'" { print $1,$3} ' /etc/passwd shift doneSometimes you just want to use awk as a quick way to set a value for a variable. Using the passwd theme, we have a way to grab the shell for a user, and see if it is in the list of official shells.
Again, be aware of how I unquote the awk expression, so that the shell does expansion of $1, rather than awk.
#!/bin/sh user="$1" if [ "$user" ="" ] ; then echo ERROR: need a username ; exit ; fi usershell=`awk -F: '$1 == "'$1'" { print $7} ' /etc/passwd` grep -l $usershell /etc/shells if [ $? -ne 0 ] ; then echo ERROR: shell $usershell for user $user not in /etc/shells fiOther alternatives:
# See "man regex" usershell=`awk -F: '/^'$1':/ { print $7} ' /etc/passwd` #Only modern awks take -v. You may have to use "nawk" or "gawk" usershell=`awk -F: -v user=$1 '$1 == user { print $7} ' /etc/passwd`The explanation of the extra methods above, is left as an exercise to the reader :-)
In a pipe-line
Sometimes you just want to put awk in as a filter for data, either in a larger program or as a quickie one-liner from your shell prompt. Here's a quickie to look at the "Referer" field of web logs and see what sites link to your top page:
#!/bin/sh
grep -h ' /index.html' $* | awk -F\" '{print $4}' | sort -u
awk is a programming language that gets its name from the three people who invented it (Aho, Weinberger, and Kernighan). Because it was developed on a Unix operating system, its name is usually printed in lower case ("awk") instead of capitalized ("Awk"). awk is distributed as free software, meaning that you don't have to pay anything for it and you can get the source code to build awk yourself.
It's not an "I can do anything" programming language like C++ or VisualBasic, although it can do a lot. awk excels at handling text and data files, the kind that are created in Notepad or (for example) HTML files. You wouldn't use awk to modify a Microsoft Word document or an Excel spreadsheet. However, if you take the Word document and Save As "Text Only" or if you take the Excel spreadsheet and Save As tab-delimited (*.txt) or comma-delimited (*.csv) output files, then awk could do a good job at handling them.
I like awk because it's concise. The shortest awk program that does anything useful is just 1 character:
awk 1 yourfile
On a DOS/Windows machine, this converts Unix line endings (LF) to standard DOS line endings (CR,LF). awk programs are often called "scripts" because they don't require an intermediate stage of compiling the program into an executable form like an *.EXE file. In fact, awk programs are almost never compiled into *.EXE files (although I think it's possible to do this). Thus, many people refer to awk as a "scripting language" instead of a "programming language."
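The one-character program works because "1" is a pattern that is always true with no action attached, so awk applies its default action, print, to every line; the Windows port writes its output in text mode, which supplies the CR. The reverse conversion (stripping DOS line endings on a Unix machine) can be done with a modern awk such as gawk; a short sketch, not from the original text:

# Remove a trailing CR from each line, if present
awk '{ sub(/\r$/, ""); print }' yourfile > unixfile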
This doesn't mean that you couldn't run an awk program from an icon on the Windows desktop. It means that instead of creating a shortcut to something like "mywidget.exe", you'd create a shortcut to "awk -f mywidget.awk somefile.txt" when Windows prompts you for the Command Line.
comp.lang.awk FAQ -- an important, albeit dated, document. It also contains lots of additional awk links.
Examples from the O'Reilly book sed & awk (sedawk_2)