About the author:
Javier is working on a Ph.D. in Astronomy at a Spanish university, where he administers a workstation cluster. The daily work in his department is done on Unix machines. After some initial problems and trials, Slackware Linux was chosen. Linux turned out to be much better than some other proprietary Unix systems.
Abstract:
This article gives some insight into the tricks that you can do with AWK. It is not a tutorial, but it provides real-life examples to use.
Originally, the idea to write this text came to me after reading a couple of articles published in LinuxFocus that were written by Guido Socher. One of them, about find and related commands, showed me that I was not the only one who used the command line. Pretty GUIs don't tell you how things are really done (that's the way Windows went years ago). The other article was about regular expressions. Although regular expressions are only touched upon slightly in this article, you need to know them to get the maximum out of awk and other commands like sed and grep.
The key question is whether this awk command is really useful. The answer is definitely yes! It can be useful for a normal user to process text files, re-format them and so on. For a system administrator, AWK is a really important utility.
Just take a look at /var/yp/Makefile or at the initialization scripts: AWK is used everywhere.
My first encounter with AWK is old enough to have been almost forgotten. I had a colleague who needed to work with some really big output files from a small Cray. The manual page for awk on the Cray was small, but he said that AWK looked very much like the thing he needed, although he did not yet understand how to use it.
A long time later, AWK came back into my life. A colleague of mine used AWK to extract the first column from a file with the command:
awk '{print $1}' file
Easy, isn't it? This simple task does not need complex programming in C. One line of AWK does it.
Once we have learned the lesson on how to extract a column, we can do things such as renaming files (append .new to the files listed in "files_list"):
ls files_list | awk '{print "mv "$1" "$1".new"}' | sh
... and more:
ls -1 *old* | awk '{print "mv "$1" "$1}' | sed s/old/new/2 | sh
ls -l * | grep -v drwx | awk '{print "rm "$9}' | sh
ls -l|awk '$1!~/^drwx/{print $9}'|xargs rm
ls -l | grep '^d' | awk '{print "rm -r "$9}' | sh
ls -p | grep /$ | awk '{print "rm -r "$1}'
ls -l|awk '$1~/^d.*x/{print $9}'|xargs rm -r
kill `ps auxww | grep netscape | egrep -v grep | awk '{print $2}'`
As you can see, AWK really helps when the same calculations are repeated over and over ... and apart from that it is much more fun to write an AWK program than doing almost the same thing 20 times manually.
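Since several of the one-liners above build shell commands and pipe them straight to sh, a cautious habit is to run the pipeline once without the final "| sh" and inspect the generated commands first. This is only a minimal sketch: the *.txt pattern and the .bak suffix are invented here, and, as with the examples above, it assumes file names without spaces.
ls *.txt | awk '{ printf "mv %s %s.bak\n", $1, $1 }'
ls *.txt | awk '{ printf "mv %s %s.bak\n", $1, $1 }' | sh    # execute once the output looks right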
awk is a little programming language, with a syntax close to C in many aspects. It is an interpreted language, and the awk interpreter processes the instructions.
About the syntax of the awk command interpreter itself:
# gawk --help
Usage: gawk [POSIX or GNU style options] -f progfile [--] file ...
       gawk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:            GNU long options:
    -f progfile               --file=progfile
    -F fs                     --field-separator=fs
    -v var=val                --assign=var=val
    -m[fr] val
    -W compat                 --compat
    -W copyleft               --copyleft
    -W copyright              --copyright
    -W help                   --help
    -W lint                   --lint
    -W lint-old               --lint-old
    -W posix                  --posix
    -W re-interval            --re-interval
    -W source=program-text    --source=program-text
    -W traditional            --traditional
    -W usage                  --usage
    -W version                --version
Instead of simply quoting (') the program on the command line, we can, as the usage above shows, write the instructions into a file and call it with the option -f. Variables can be defined on the command line with -v var=value.
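A minimal sketch of both options together (the file name columns.awk and the variable name col are invented for illustration):

# columns.awk -- print the field whose number is passed in the variable "col"
{ print $col }

# -f loads the program file, -v sets the variable before any input is read
awk -v col=2 -f columns.awk file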
Awk is, roughly speaking, a language oriented towards managing tables, that is, information that can be grouped into fields and records. The advantage here is that the record definition (and the field definition) is flexible.
Awk is powerful. It is designed to work with one-line records, but that restriction can be relaxed, as the small sketch below shows. After that, we are going to look at some illustrative (and real) examples.
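A small sketch of how the record definition can be changed (notes.txt is a hypothetical file of blank-line-separated paragraphs): setting RS to the empty string makes awk read whole paragraphs as records, and setting FS to a newline makes each line of a paragraph a separate field:

awk 'BEGIN { RS = "" ; FS = "\n" } { print "record " NR " has " NF " lines" }' notes.txt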
The first example re-formats a plain data file, with one record per line and the fields separated by whitespace, into a LaTeX table:

BEGIN {
    printf "LaTeX preamble"
    printf "\\begin{tabular}{|c|c|...|c|}"
}
{
    printf $1" & "
    printf $2" & "
    . . .
    printf $n" \\\\ "
    printf "\\hline"
}
END {
    print "\\end{document}"
}
The second example processes a catalogue file in which a header line starting with "====>" carries an object name and the number of entries that follow; those entries, with their fields separated by '|', are appended to a file named after the object:

( $1 == "====>" ) {
    NomObj = $2
    TotObj = $4
    if ( TotObj > 0 ) {
        FS = "|"
        for ( cont = 0 ; cont < TotObj ; cont++ ) {
            getline
            print $2 $4 $5 $3 >> NomObj
        }
        FS = " "
    }
}
Actually, the object name was not returned, and it was slightly more complicated, but this is supposed to be an illustrative example.
BEGIN {
    BEGIN_MSG  = "From"
    BEGIN_BDY  = "Precedence:"
    MAIN_KEY   = "Subject:"
    VALIDATION = "[MONTH REPORT]"
    HEAD = "NO"; BODY = "NO"; PRINT = "NO"
    OUT_FILE = "Month_Reports"
}
{
    if ( $1 == BEGIN_MSG ) {
        HEAD = "YES"; BODY = "NO"; PRINT = "NO"
    }
    if ( $1 == MAIN_KEY ) {
        if ( $2 == VALIDATION ) {
            PRINT = "YES"
            $1 = ""; $2 = ""
            print "\n\n"$0"\n" > OUT_FILE
        }
    }
    if ( $1 == BEGIN_BDY ) {
        getline
        if ( $0 == "" ) {
            HEAD = "NO"; BODY = "YES"
        } else {
            HEAD = "NO"; BODY = "NO"; PRINT = "NO"
        }
    }
    if ( BODY == "YES" && PRINT == "YES" ) {
        print $0 >> OUT_FILE
    }
}
Maybe we are administering a mailing list and, from time to time, some special messages are submitted to the list (for example, monthly reports) with a specific format (subject like '[MONTH REPORT] month, dept'). At the end of the year we decide to put all these messages together, setting the others aside. This can be done by processing the mail spool with the awk program above. Getting each report written to an individual file would only mean three extra lines of code.
NOTE: This example assumes that the mail spool is structured as I think it is. This program works for my mail.
I've used awk for many other tasks (such as the automatic generation of web pages with information from simple databases), and I know enough about awk programming to be sure that a lot of things can be done. Just let your imagination fly.
Up to now, nearly all the examples process every line of the input file. But, as the manual page also states, it is possible to process only some of the input lines. One just has to precede the group of commands with the condition the line should meet. The matching condition can be very flexible, varying from a simple regular expression to a check on the contents of some field, with the possibility of grouping conditions with the proper logical operators. A few such conditions are sketched below.
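These sketches assume an invented file data.txt and an invented field layout; they are only meant to show the range of possible conditions:

# lines matching a regular expression
awk '/error/ { print }' data.txt
# lines whose third field is greater than 100
awk '$3 > 100 { print $1, $3 }' data.txt
# both conditions combined with a logical operator
awk '/error/ && $3 > 100 { print }' data.txt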
Like any other programming language, awk implements all the necessary flow control structures, as well as a set of operators and predefined functions for dealing with numbers and strings. It is possible, of course, to include user-defined functions with the keyword function. Apart from the common scalar variables, awk is also able to manage variable-sized (associative) arrays.
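A small sketch putting these pieces together (the function name report and the word-counting task are invented): a user-defined function, a for loop, and an associative array that grows as needed. Saved as, say, count.awk, it would be run as awk -f count.awk file:

# count how often each value of the first column appears
function report(arr,   w) {        # the extra parameter w serves as a local variable
    for (w in arr)
        printf "%-15s %d\n", w, arr[w]
}
{ count[$1]++ }
END { report(count) }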
As in any programming language, there are some very common functions, and it becomes uncomfortable to cut and paste pieces of code. That is the reason why libraries exist. With the GNU version of awk it is possible to include them in an awk program. This, however, is only an outlook on what is possible, and it falls outside the scope of this article.
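A hedged sketch of the idea (library.awk, main.awk and data.txt are invented names): the simplest form, which works with any awk, is to pass several program files on the command line; recent versions of gawk also accept an @include directive inside the program text.

gawk -f library.awk -f main.awk data.txt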
AWK is very appropriate for the purposes for which it was built: reading data line by line and acting upon the strings and patterns in the lines.
Files like /etc/passwd turn out to be ideal for reformatting and processing with AWK. AWK is invaluable for such tasks.
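For example (a minimal sketch), listing every login name together with its shell becomes a one-liner once the field separator is set to a colon:

awk -F: '{ printf "%-15s %s\n", $1, $7 }' /etc/passwd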
Of course, AWK is not alone: Perl is a strong competitor, but it is still worthwhile to know some AWK tricks.
Commands as basic as this one are not very well documented, but you can find something when looking around.
man awk
Usually, all books on Unix mention this command, but only some of them treat it in detail. The best we can do is to browse any book we get our hands on. You never know where useful information can be found.