The external cut command displays selected columns or fields from each line of a file. It is the Unix equivalent of the relational algebra projection operation. If the capabilities of cut are not enough (and the restriction of the delimiter to a single character is one very annoying limitation), the alternatives are AWK and Perl re-implementations. Generally it makes sense to use a Perl re-implementation that supports regular expressions for specifying delimiters, unless high speed is absolutely crucial (and even in this case your mileage may vary ;-). Perl used to have a very interesting project called Perl Power Tools -- a reimplementation of most Unix utilities in Perl.
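For instance, awk and Perl split fields on whitespace runs or arbitrary regular expressions, which cut cannot do. A small illustration (the sample text is made up):

```shell
# cut cannot split on "one or more blanks/tabs", but awk's default
# field splitting does exactly that:
printf 'alpha   beta\tgamma\n' | awk '{print $2}'
# prints: beta

# The equivalent Perl one-liner using autosplit (-a):
printf 'alpha   beta\tgamma\n' | perl -lane 'print $F[1]'
# prints: beta
```

With cut, the same input would first need tr -s preprocessing to collapse the repeated blanks.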
|
The cut command is one of the oldest Unix commands. That means that it is more than 40 years old. And it shows. It is important to understand that this is a Unix command and it behaves in the "Unix way". Note that cut does not actually consult the shell variable IFS (Input Field Separators): IFS controls how the shell itself splits words, while cut's default field delimiter is simply the tab character. You can check IFS with set | grep IFS . You can also set it, for example, to:
IFS=$' \t\n'    # bash/ksh93 ANSI-C quoting: space, tab, newline
The most typical usage of the cut command is extracting one or several columns from a file (often a log file) to create a new file. For example:
cut -d ' ' -f 2-7 < messages
retrieves the second through seventh fields, assuming that each field is separated by a single (note: single) blank. Fields are counted starting from one. If you need to split fields separated by multiple blanks or tab characters, you must preprocess the file using tr with the option -s (squeeze).
Often you need to squeeze blanks for cut to work correctly using tr command. See Squeezing blanks with tr before applying cut in field selection mode. To replace every sequence of characters in the <blank> character class with a single : (colon) character before using cut, enter:
tr -s '[:blank:]' ':'
You can also just squeeze blanks without replacing them with some other character:
cat messages | tr -s ' ' ' ' | cut -d ' ' -f 2-7
Option -d specifies a single-character delimiter (in the example above it is a blank) which serves as the field separator. Option -f specifies the range of fields included in the output (here fields two through seven). Option -d presupposes usage of option -f.
Cut can work in two modes:
- character (column) selection mode, specified with option -c
- field selection mode, specified with options -d and -f
Cut is essentially a simple text parsing tool, and unless the task at hand is also simple you will be better off using other, more flexible, text parsing tools instead. On modern computers the difference between an invocation of cut and an invocation of awk is negligible. You can also use Perl in command-line mode for the same task. If option -a (autosplit mode) is specified, then each line in Perl is split into the array @F. So a Perl emulation of cut consists of a simple print statement that outputs the necessary fields. The advantage of using Perl is that columns can be counted from the last (using negative indexes).
Typical pages with Perl command line one-liners contain many interesting examples that can probably be adapted to your particular situation:
Here is an example of how to print the first field and the second field from the end of each line:
perl -lane 'print "$F[0]:$F[-2]"'
Here's a slightly more complex one-liner that prints the fourth word of every line, but skips any line beginning with a # because it's a comment line:
perl -lane 'next if /^#/; print $F[3]'
The most popular modern usage of cut is probably connected with processing http and proxy logs (see Tips).
Note:
Another, less elegant way to specify a blank (or another shell-sensitive character) is to escape it with \ -- the following example prints the second field of every line in the file /etc/passwd:
cut -f2 -d\  /etc/passwd | more
A column is one character position. In this mode cut acts as a generalized substr function for files. Classic Unix cut cannot count characters from the end of the line the way the Perl substr function can (with negative offsets). This type of selection is specified with the -c option. List entries can be open (from the beginning, as in -5, or to the end, as in 6-), or closed (as in 6-9).
cut -c 4,5,20 foo
cuts file foo at columns 4, 5, and 20.
cut -c 1-5 a.dat | more
prints the first 5 characters of every line in the file a.dat
cut -c -5 a.dat | more
same as above, but using an open range
In this mode cut selects not characters but fields, delimited by the specific single-character delimiter specified with option -d. The list of fields is specified with the -f option ( -f list ).
cut -d ":" -f1,7 /etc/passwd
cuts fields 1 and 7 from /etc/passwd
cut -d ":" -f 1,6- /etc/passwd
cuts fields 1 and 6 through the end from /etc/passwd
The default delimiter is TAB. If space is used as a delimiter, be sure to put it in quotes (-d " ").
To deal with multiple delimiters (for example, multiple blanks separating fields), you need either to use Perl or to preprocess the record with tr (the latter has option -s, --squeeze-repeats, which replaces each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character).
Notes:
The Unix tr command has the option -s, which allows replacing a sequence of identical characters with a single character:
-s, --squeeze-repeats
Replace sequences of the same character with one. -s uses SET1 if neither translating nor deleting is specified; otherwise squeezing uses SET2 and occurs after translation or deletion.
For example:
tr -s ' ' ' ' < in_file
tr -s '[:space:]' ':' < text
In case the second set is shorter than the first, the tr command repeats the last specified character enough times to make the second set as long as the first.
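A quick illustration of this padding behavior with GNU tr, using a made-up sample string:

```shell
# SET1 is 'aeiou' (five characters), SET2 is just '#'; tr pads SET2
# by repeating its last (here: only) character, so every vowel maps to '#':
echo 'unix tools' | tr 'aeiou' '#'
# prints: #n#x t##ls
```

(Historic BSD versions of tr truncated SET1 instead of padding SET2; GNU and System V tr pad as described above.)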
Sets are specified as strings of characters. Most represent themselves. Interpreted sequences are:
- \nnn -- character with octal value nnn
- \xnn -- character with hexadecimal value nn
- \\ -- backslash
- \a -- alert
- \b -- backspace
- \f -- form feed
- \r -- carriage return
- \t -- horizontal tab
- \v -- vertical tab
- \E -- escape
- c1-c2 -- all characters from c1 to c2 in ascending order. The character specified by c1 must collate before the character specified by c2.
- [c1-c2] -- same as c1-c2 if both sets use this form
- [c*] -- set2 extended to the length of set1 by repeating the character c. Useful if you need to repeat in set2 a symbol other than its last character; in other words, it fills out set2 with the character specified by c. This construct can be used only at the end of set2. Any characters specified after the * (asterisk) are ignored.
- [c*N] -- N copies of symbol c. N is considered a decimal integer unless the first digit is a 0; then it is considered an octal integer.
- [:alnum:] -- all letters and digits
- [:alpha:] -- all letters
- [:blank:] -- all horizontal whitespace
- [:cntrl:] -- all control characters
- [:digit:] -- all digits
- [:graph:] -- all printable characters, not including space
- [:lower:] -- all lower case letters
- [:print:] -- all printable characters, including space
- [:punct:] -- all punctuation characters
- [:space:] -- all horizontal or vertical whitespace
- [:upper:] -- all upper case letters
- [:xdigit:] -- all hexadecimal digits
- [=c=] -- Specifies all of the characters with the same equivalence class as the character specified by C.
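The [c*] fill notation combines naturally with the character classes above; for example, masking every digit in a line (sample data made up, GNU tr):

```shell
# [x*] extends SET2 with as many 'x' characters as SET1 needs,
# so every digit becomes 'x' while everything else is untouched:
echo 'card 4111-2222' | tr '[:digit:]' '[x*]'
# prints: card xxxx-xxxx
```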
As cut considers each blank a delimiter, multiple blanks need to be squeezed before applying cut. Here, for example, is how to get the list of the largest files in a subtree:
find . -ls | tr -s ' ' ' ' | cut -d ' ' -f 7- | sort -nr | head
In field selection mode, cut can suppress lines that contain no occurrence of the delimiter defined by option -d (the -s option). Unless this option is specified, lines with no delimiters are included in the output untouched.
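The effect of -s is easy to demonstrate on a two-line sample (made up here):

```shell
# The second line contains no ':' delimiter. Without -s it is passed
# through whole; with -s it is suppressed:
printf 'root:x:0\nNO-DELIMITER-HERE\n' | cut -d: -f1
# prints:
#   root
#   NO-DELIMITER-HERE
printf 'root:x:0\nNO-DELIMITER-HERE\n' | cut -d: -f1 -s
# prints only:
#   root
```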
This is a GNU cut option only. Option --complement inverts the set of selected bytes, characters or fields. In this case you specify not the list of fields or character columns to be retained, but those that need to be excluded. In some cases that simplifies writing the selection range. For example, instead of the example listed above:
cut -d ":" -f 1,6- /etc/passwd
This one-liner selects field 1 and fields 6 through the end of the line from /etc/passwd
you can specify:
cut -d ":" -f 2-5 --complement /etc/passwd
This one-liner also selects field 1 and fields 6 through the end of the line from /etc/passwd
By using pipes and output shell redirection operators you can create new files with a subset of columns or fields contained in the first file.
The shell has very primitive string-handling capabilities, which are not well understood by most Unix sysadmins (see String Operations in Shell for details). That leads to the situation where cut is used in shell programming as a poor man's substr function to select certain substrings from a variable.
For example:
echo Argument 1 = [$1]
c=`echo $1 | cut -c6-8`
echo Characters 6 to 8 = [$c]
Output:
Argument 1 = [1234567890]
Characters 6 to 8 = [678]
This is one of many ways to perform such a selection. In all but the simplest cases AWK or Perl are better tools for the job. If you are selecting fields of a shell variable, you should probably use the set command and echo the desired positional parameter into a pipe.
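A sketch of that set-based alternative (the variable name and sample data are made up):

```shell
# Word-split a variable into the positional parameters $1, $2, ...
# and pick the field you want -- no external process needed:
line="alpha beta gamma"
set -- $line    # deliberately unquoted so the shell splits on IFS
echo "$2"
# prints: beta
```

Note that `set -- $line` clobbers the script's positional parameters, so save them first if you still need them.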
For complex cases Perl is definitely a preferable tool. Moreover, several Perl re-implementations of cut exist: see for example Perl cut.
BTW, Perl implementations are more flexible and less capricious than the original C-written Unix cut command.
As I mentioned before, there are two variants of cut: the first is character-column cut and the second is delimiter-based (parsing) cut. In both cases the option can be separated from its value by a space, for example
-d ' '
In other words, POSIX and GNU implementations of cut use "almost" standard lexical parsing of arguments, although most examples in books use the "old style" with arguments "glued" to options. The "glued" style of specifying arguments is generally an anachronism. Still, quoting of the delimiter might not always behave as expected even in modern versions; for example, most implementations of cut will not interpret a quoted '\t' as a tab. You generally need to experiment with your particular implementation.
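In practice a literal tab can be passed to -d in several ways; a small sketch (the $'...' form requires bash/ksh/zsh):

```shell
# Tab is already the default delimiter, so often no -d is needed:
printf 'a\tb\tc\n' | cut -f2                      # prints: b
# Explicit tab, portable POSIX sh (command substitution keeps the tab):
printf 'a\tb\tc\n' | cut -d"$(printf '\t')" -f2   # prints: b
# bash/ksh/zsh ANSI-C quoting:
printf 'a\tb\tc\n' | cut -d$'\t' -f2              # prints: b
```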
1. Character column cut
cut -c list [ file_list ]
Option:
-c list Display (cut) columns, specified in list, from the input data. Columns are counted from one, not from zero, so the first column is column 1. The list can be separated from the option by space(s), but no spaces are allowed within the list. Multiple values must be comma (,) separated. The list defines the exact columns to display. For example, the -c 1,4,7 notation cuts columns 1, 4, and 7 of the input. The -c -10,50- notation would select columns 1 through 10 and 50 through end-of-line (please remember that columns are counted from one).
2. Delimiter-based (parsing) cut
cut -f list [ -d char ] [ -s ] [ file_list ]
Options:
-d char The character char is used as the field delimiter. It is usually quoted but can be escaped. The default delimiter is a tab character. To use a character that has special meaning to the shell, you must quote it so the shell does not interpret it. For example, to use a single space as a delimiter, type -d ' '.
-f list Selects (cuts) fields, specified in list, from the input data. Fields are counted from one, not from zero. No spaces are allowed within the list. Multiple values must be comma (,) separated. The list defines the exact fields to display. The most practically important ranges are "open" ranges, where either the starting field or the last field is not specified explicitly (omitted). For example:
Specification can be complex and include both selected fields and ranges. For example, -f 1,4,7 would select fields 1, 4, and 7.
The -f2,4-6,8 would select field 2, fields 4 through 6 (a range), and field 8.
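These list forms are quick to verify on a made-up record:

```shell
# Fields 2, 4 through 6, and 8 of a colon-separated line:
echo 'f1:f2:f3:f4:f5:f6:f7:f8' | cut -d: -f2,4-6,8
# prints: f2:f4:f5:f6:f8
```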
Please remember that cut is good only for simple cases; in complex cases AWK and Perl actually save you time. Limitations are many. Among them: the delimiter is limited to a single character, there is no regular-expression support, and fields cannot be reordered or counted from the end of the line.
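One of those limitations is worth a demonstration: cut always emits selected fields in their original input order, so actual reordering requires awk (sample data made up):

```shell
# The list -f3,1 does NOT reorder; cut prints the selected fields
# in the order they appear in the input:
echo 'a:b:c' | cut -d: -f3,1
# prints: a:c

# awk can actually swap the fields:
echo 'a:b:c' | awk -F: '{print $3 ":" $1}'
# prints: c:a
```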
Creating an alias which cuts the output of the SGE qhost command to fit the smaller screen of a smartphone (this trick can be used for other commands too):
alias qh='qhost | cut -c 1-20,59-80'
cut -f 1,5 -d : /etc/passwd
This displays the login name and full user name fields of the system password file. These are the first and fifth fields (-f 1,5), separated by colons (-d :).
For example, if the /etc/passwd file looks like this:
su:*:0:0:User with special privileges:/:/usr/bin/sh
daemon:*:1:1::/etc:
bin:*:2:2::/usr/bin:
sys:*:3:3::/usr/src:
adm:*:4:4:System Administrator:/var/adm:/usr/bin/sh
pierre:*:2000:100:Pierre Harper:/home/pierre:/usr/bin/sh
joan:*:2001:100:Joan Brown:/home/joan:/usr/bin/sh
The cut command produces:
su:User with special privileges
daemon:
bin:
sys:
adm:System Administrator
pierre:Pierre Harper
joan:Joan Brown
cut -f "1 2 3" -d : /etc/passwd
The cut command produces:
su:*:0
daemon:*:1
bin:*:2
sys:*:3
adm:*:4
pierre:*:2000
joan:*:2001
free | tr -s ' ' | sed '/^Mem/!d' | cut -d" " -f2-4 >> mem.stats
Dr. Nikolai Bezroukov
Jul 19, 2016 | shapeshed.com
... ... ...

How to cut by complement pattern

To cut by complement, use the --complement option. Note this option is not available on the BSD version of cut. The --complement option selects the inverse of the options passed to cut.

In the following example the -c option is used to select the first character. Because the --complement option is also passed to cut, the second and third characters are cut:

echo 'foo' | cut --complement -c 1
oo

How to modify the output delimiter

To modify the output delimiter, use the --output-delimiter option. Note that this option is not available on the BSD version of cut. In the following example a semi-colon is converted to a space and the first, third and fourth fields are selected:

echo 'how;now;brown;cow' | cut -d ';' -f 1,3,4 --output-delimiter=' '
how brown cow

George Ornbo is a hacker, futurist, blogger and Dad based in Buckinghamshire, England. He is the author of Sams Teach Yourself Node.js in 24 Hours. He can be found in most of the usual places as shapeshed, including Twitter and GitHub.
Content is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Aug 14, 2017 | www.unix.com
06-29-2016, Vikram Jain:
Cut command on RHEL 6.8 compatibility issues
We have a lot of scripts using cut as :
cut -c 0-8
works for cut (GNU coreutils) 5.97, but does not work for cut (GNU coreutils) 8.4. Gives error:
Code:
cut: fields and positions are numbered from 1
Try `cut --help' for more information.
The position needs to start with 1 for the later version of cut and this is causing an issue. Is there a way where I can have multiple cut versions installed and use the older version of cut for the user which runs the script?
or any other work around without having to change the scripts?
Thanks.
Don Cragun (Administrator):
What are you trying to do when you invoke
Code:
cut -c 0-8
with your old version of cut? With that old version of cut, is there any difference in the output produced by the two pipelines:
Code:
echo 0123456789abcdef | cut -c 0-8
and:
Code:
echo 0123456789abcdef | cut -c 1-8
or do they produce the same output?
Vikram Jain:
I am trying to get a value from the 1st line of the file and check if that value is a valid date or not.
------------------------------------------------------------------
Below is the output for the cut command from the new version:
Code:
$ echo 0123456789abcdef | cut -c 0-8
cut: fields and positions are numbered from 1
Try `cut --help' for more information.
$ echo 0123456789abcdef | cut -c 1-8
01234567
-------------------------------------------------------------------
With the old version, both have the same results:
Code:
$ echo 0123456789abcdef | cut -c 0-8
01234567
$ echo 0123456789abcdef | cut -c 1-8
01234567
06-30-2016
Scrutinizer (Moderator):
The use of 0 is not according to specification. Alternatively, you can just omit it, which should work across versions
Code:
$ echo 0123456789abcdef | cut -c -8
01234567
If you cannot adjust the scripts, you could perhaps create a wrapper script for cut, so that the 0 gets stripped..
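A minimal sketch of such a wrapper's rewriting logic, written here as a shell function for illustration (a real wrapper would be installed under the name cut and exec the relocated original binary, e.g. a hypothetical /usr/bin/cut.orig; this sketch simply delegates to whatever cut is in $PATH):

```shell
#!/bin/sh
# Sketch: rewrite the obsolete "0-N" ranges (accepted by old GNU cut,
# rejected by newer coreutils) into the standard "1-N" form, then
# delegate to the real cut.
cut_compat() {
    n=$#; i=0
    while [ "$i" -lt "$n" ]; do
        a=$1; shift
        case "$a" in
            0-*)   a="1-${a#0-}" ;;       # separate list argument: "0-8"
            -c0-*) a="-c1-${a#-c0-}" ;;   # glued form: "-c0-8"
            -f0-*) a="-f1-${a#-f0-}" ;;   # glued form: "-f0-3"
        esac
        set -- "$@" "$a"                  # re-append the (possibly rewritten) arg
        i=$((i+1))
    done
    cut "$@"
}

echo 0123456789abcdef | cut_compat -c 0-8
# prints: 01234567
```

This only covers the simple leading-zero cases; as noted below, fixing the scripts themselves is the more robust approach.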
06-30-2016
Vikram Jain:
Yes, don't want to adjust my scripts.
A wrapper for cut looks like something that would work. Could you please tell me how I would use it; as in, how would I make sure that the wrapper is called and not the cut command which causes the issue?
Don Cragun (Administrator):
The only way to make sure that your wrapper is always called instead of the OS supplied utility is to move the OS supplied utility to a different location and install your wrapper in the location where your OS installed cut originally.
Of course, once you have installed this wrapper, your code might or might not work properly (depending on the quality of your wrapper) and no one else on your system will be able to look at the diagnostics produced by scripts that have bugs in the way they specify field and character ranges so they can identify and fix their code.
My personal opinion is that you should spend time fixing your scripts that call cut -c 0.... , cut -f 0... , and lots of other possible misuses of 0 that are now correctly diagnosed as errors by the new version of cut instead of debugging code to be sure that it changes all of the appropriate 0 characters in its argument list to 1 characters and doesn't change any 0 characters that are correctly specified and do not reference a character 0 or field 0.
06-30-2016
MadeInGermany (Moderator):
An update of "cut" will overwrite your wrapper.
Much better: change your scripts. Run the following fix_cut script on your scripts:
Code:
#!/bin/sh
# fix_cut
PATH=/bin:/usr/bin
PRE="\b(cut\s+(-\S*\s+)*-[cf]\s*0*)0-"
for arg
do
  perl -ne 'exit 1 if m/'"$PRE"'/' "$arg" || {
    perl -i -pe 's/'"$PRE"'/${1}1-/g' "$arg"
  }
done
Example: fix all .sh scripts
Code:
fix_cut *.sh
Power Tools project
=head1 DESCRIPTION

The cut utility selects portions of each line (as specified by list) from each file (or the standard input by default), and writes them to the standard output. The items specified by list can be in terms of column position or in terms of fields delimited by a special character. Column numbering starts from 1.

list is a comma- or whitespace-separated set of increasing numbers and/or number ranges. Number ranges consist of a number, a dash ('-'), and a second number, and select the fields or columns from the first number to the second, inclusive. Numbers or number ranges may be preceded by a dash, which selects all fields or columns from 1 to the first number. Numbers or number ranges may be followed by a dash, which selects all fields or columns from the last number to the end of the line. Numbers and number ranges may be repeated, overlapping, and in any order. It is not an error to select fields or columns not present in the input line.

=head1 OPTIONS

cut accepts the following options:

=over 4

=item -b list

The list specifies byte positions.

=item -c list

The list specifies character positions.

=item -d string

Use the first character of string as the field delimiter character instead of the tab character.

=item -f list

The list specifies fields, delimited in the input by a single tab character. Output fields are separated by a single tab character.

=item -n

Do not split multi-byte characters.

=item -s

Suppresses lines with no field delimiter characters. Unless specified, lines with no delimiters are passed through unmodified.

=back

=head1 BUGS

This cut does not understand multibyte characters; the -c and -b options function identically, and -n does nothing.

=head1 STANDARDS

This cut implementation is compatible with the I<...> implementation.

=head1 AUTHOR

The Perl implementation of cut was written by Rich Lafferty, I<...>.

=head1 COPYRIGHT and LICENSE

This program is free and open software. You may use, copy, modify, distribute and sell this program (and any modified variants) in any way you wish, provided you do not restrict others to do the same.

=cut
Below is a command to find out the number of connections to each port in use, using netstat and cut:
netstat -nap | grep 'tcp\|udp' | awk '{print $4}' | cut -d: -f2 | sort | uniq -c | sort -n
Below is a description of each command: netstat is used to check all incoming and outgoing connections on a Linux server. grep selects the lines matching the pattern you define. AWK is a very important command, generally used for scanning patterns and processing text; it is a powerful tool for shell scripting. sort orders the output, and sort -n sorts it in numeric order. uniq -c collapses duplicate lines and prefixes each with a count.
ffe is a flat file extractor. It can be used for reading different flat file structures and displaying them in different formats. ffe can read fixed length and separated text files and fixed length binary files.
It is a command line tool developed under GNU/Linux. The main areas of use are extracting particular fields or records from a flat file, converting data from one format to an other, e.g. from CSV to fixed length, verifying a flat file structure, as a testing tool for flat file development, and displaying flat file content in human readable form.
Records can now be identified using regular expressions using the new keyword "rid". The -l/--loose option does not cause the program to abort when an invalid block is found from binary input. Instead of aborting, the next valid block is searched from the input stream.
Author:
tjsa
May 14, 1997
Hipparcos and Tycho Data structures and Load routines in C
7. Unix Utilities
The output of the above programs are ideal for use with the standard unix utilities such as egrep, cut, join and nawk. These may also be used to query the data files directly although this is not very efficient. For example, the following stores all HIP identifiers of entries in hip_main with a DSS chart in file hip.DSS:
cut -f2,70 -d"|" hip_main.dat | egrep D | cut -f1 -d" " > hip.DSS
On a Sparc 20 this pipeline took over 5 minutes.
... ... ...
A script to check Oracle values on hundreds of databases: I needed a way to run the same SQL*Plus command on every database, even databases on other servers. I had a manager who wanted to know the default optimizer mode for every database at a shop that had over 150 databases on 30 database servers. The manager allotted me two days for this task, and he was quite surprised when I provided the correct answer in ten minutes. I did it using the following script:
# Loop through each host name . . .
for host in `cat ~oracle/.rhosts|\
cut -d"." -f1|awk '{print $1}'|sort -u`
do
   echo " "
   echo "************************"
   echo "$host"
   echo "************************"
   # loop from database to database
   for db in `rsh $host "cat /etc/oratab|egrep ':N|:Y'|\
grep -v \*|cut -f1 -d':'"`
   do
      home=`rsh $host "cat /etc/oratab|egrep ':N|:Y'|\
grep -v \*|grep ${db}|cut -f2 -d':'"`
      echo "************************"
      echo "database is $db"
      echo "************************"
      rsh $host "
      ORACLE_SID=${db}; export ORACLE_SID;
      ORACLE_HOME=${home}; export ORACLE_HOME;
      ${home}/bin/sqlplus -s /<<!
      set pages 9999;
      set heading off;
      select value from v"\\""$"parameter where name='optimizer_mode';
      exit
!"
   done
done

This script requires the Unix remote shell (rsh) privilege so that it can bounce quickly between servers. You do this by making entries into your .rhosts file. The script will loop through all of the server names defined in the .rhosts file on your system, and will then loop through each database listed in each server's /etc/oratab file.
You can use this script to check any database values or to run any SQL*Plus script. You quickly can get user reports, performance statistics, and a wealth of information on every database in your enterprise. I have also used variations on this script to delete old trace files from the Oracle directories and to check free space in archived redo log filesystems. This script has saved me many hours of repetitive work executing the same command against many databases.
Finally, to show some of the flexibility of find, let's look at one example that is a bit more advanced. Suppose we were looking for all data files in the HP user home directory filesystems (which are named /u and /u2) that are over one million bytes long and were modified in the past 30 days. The command below, where the output of find is piped into a few other Unix commands for postprocessing, results in a mail message being sent to the issuer of the command, containing the desired information in a neat tabular form.
The full command is:
find /u /u2 -type f -size +1000000c -mtime -30 -print | \
    xargs file | grep data$ | cut -d ':' -f 1 | \
    xargs ls -aoq | cut -c 16- | sort | mailx $LOGNAME
You cannot cut columns in emacs. Emacs will only allow you to select rows, not columns. Here are instructions on cutting the columns you really need from your data tables, and putting them into a new file.

EXTRA CHARACTERS OR FIELDS IN THE FILE
If your genotyper files include a few characters before the family name, you can eliminate those columns before processing with gtyper2.pl by using the cut command. For example, if you have a file with three characters preceding the family name, at the user prompt type:
cut -c4-55 genotyper-file-name > cut-file-name
This takes characters 4 through 55 and puts them into the file 'cut-file-name'. (The header of the genotyper file is also cut, but that does not matter because the pattern matched to disregard the header is 'Dye' which is located later in the line.)
To see how many characters are needed in a cut command, copy the length of the line you will need, then type at the prompt: wc -c (and a carriage return). Then in the next line, paste in the copied length and end with another carriage return. On the next (blank) line, press ctrl-d (an escape). The number of characters you pasted in will be returned.
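The same count can be obtained non-interactively; a small sketch (the sample line is made up):

```shell
# wc -c counts bytes including any trailing newline, so feed the line
# with printf '%s' (which adds no newline) to get the exact count:
line='cut -c4-55 genotyper-data'
printf '%s' "$line" | wc -c
```

(echo "$line" | wc -c would report one more character because of the newline echo appends.)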
To cut fields of variable width, use -f and -d to denote the separator (default is tab). For example, to cut fields 1 through 5 that are separated by spaces, from file filename:
cut -d" " -f1,2,3,4,5 filename > filename2
also:
cut -f1,3 file > newfile
for a tab-delimited file, will put fields 1 & 3 into newfile
learnlinux.tsf.org.za
This page disappeared from the WEB. This is the only backup available.
The cut command has the ability to cut out characters or fields. cut uses delimiters.
The cut command uses delimiters to determine where to split fields, so the first thing we need to understand about cut is how it determines its delimiters. By default, cut's delimiter is the tab character. (Note: the shell variable IFS (Input Field Separators) controls how the shell itself splits words; cut does not actually read IFS, which is why the -d switch below is needed.)
Typing:
set | grep IFS
will show you what the shell's separator characters currently are; at present, IFS is either a tab, a newline or a space.
Looking at the output of our free command, we successfully separated every field by a space (remember the tr command!)
Similarly, if our delimiter between fields was a comma, we could set the delimiter within cut to be a comma using the -d switch:
cut -d ","
The cut command lets one cut on the number of characters or on the number of fields. Since we're only interested in fields 2, 3 and 4 of our memory line, we can extract these using:
free | tr -s ' ' | sed '/^Mem/!d' | cut -d" " -f2-4
Why do you need to set -d " " even when IFS already specifies that a space is a field separator? Because cut does not consult IFS; its own default delimiter is a tab.
If this does not work on your system, then you need to set the IFS variable.
Detour:
Setting shell variables is easy. If you use bash or the Bourne shell (sh), then:
IFS=" \t\n"
In the csh it would be (ksh uses the same syntax as sh):
setenv IFS " \t\n"
That ends this short detour.
At this point, it would be nice to save the output to a file. So let's append this to a file called mem.stats:
free | tr -s ' ' | sed '/^Mem/!d' | cut -d" " -f2-4 >> mem.stats
Every time you run this particular command it should append the output to the mem.stats file.
The -f switch allows us to cut based upon fields. If we were wanting to cut based upon characters (e.g. cut character 6-13 and 15, 17) we would use the -c switch.
To affect the above example:
free | tr -s ' ' | sed '/^Mem/!d' | cut -c6-13,15,17 >> mem.stats

First Example in stages:
1. For the next example I'd like you to make sure that you've logged on as a user (potentially root) on one of your virtual terminals.
How do you get to a virtual terminal? Press Ctrl-Alt plus F1, F2, F3 and so on.
It should prompt you for a username and a password. Log in as root, as yourself or as a different user, and once you've logged in, switch back to your X terminal with Alt-F7. If you weren't working in X at the beginning of this session, then Ctrl-Alt-F1 is not necessary: a simple Alt-F2 will open a new terminal, and Alt-F1 returns you to the first one.
2. Run the who command:
who
This will tell us who is logged on to the system. We could also run the w command:
w
This will not only tell us who is logged on to our system, but what they're doing. Let's use the w command, since we want to save information about what users are doing on our system. We may also want to save information about how long they've been idle and what time they logged on.
3. Find out who is logged on to your system. Pipe the output of the w command into cut. This time, however, we're not going to use a delimiter to delimit fields; we're going to cut on characters:
w | cut -c1-8
This tells cut to keep only the first eight characters of each line. You will see that it truncates the time in the header line after its seventh character. In my case the time is now
09:57:24
and it is cut off to
09:57:2
Below the header you're left with USER and the names of all the users currently logged onto your system. And that's cutting exactly 8 characters.
4. To cut characters 4 to 8?
w | cut -c4-8
This will produce slightly bizarre-looking output.
So cut cannot only cut fields, it can cut exact characters and ranges of characters. We can cut any number of characters in a line.
Second Example in stages:
Often cutting characters in a line is less than optimal, since you never know how long your usernames might be. Really long usernames would be truncated, which clearly would not be acceptable. Cutting on characters is rarely a long-term solution. It may work because your name is Sam, but not if your name is Jabberwocky!
1. Let's do a final example using cut. Using our password file:
cat /etc/passwd
I'd like to know all usernames on the system, and what shell each is using.
The password file has 7 fields separated by a ':'. The first field is the login username; the second is the password, shown as an x (because the real password is kept in the shadow password file); the third field is the user id; the fourth is the group id; the fifth field is the comment field; the sixth field is the user's home directory; and the seventh field indicates the shell that the user is using. I'm interested in fields 1 and 7.
2. How would we extract the particular fields? Simple:
cat /etc/passwd | cut -d: -f1,7
If we do this, we should end up with just the usernames and their shells. Isn't that a nifty trick?
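To keep the result reproducible, the same command can be run against a small hand-made passwd fragment; the accounts below are invented.

```shell
# A fabricated three-line /etc/passwd fragment
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
sam:x:1000:1000:Sam Jones:/home/sam:/bin/bash'

# Username (field 1) and shell (field 7), colon-delimited
printf '%s\n' "$sample" | cut -d: -f1,7
# prints:
#   root:/bin/bash
#   daemon:/usr/sbin/nologin
#   sam:/bin/bash
```

Note that cut rejoins the selected fields with the same delimiter, hence the colon in the output.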
3. Let's pipe that output to the sort command, to sort the usernames alphabetically:
cat /etc/passwd | cut -d: -f1,7 | sort

Third example in stages:
So this is a fairly simple way to extract information out of files. The cut command doesn't only work with files, it also works with streams. We could do a long listing, which produces a number of fields. If you recall, we used the tr command earlier to squeeze spaces.
ls -al
If you look at this output, you will see lines of fields. Below is a quick summary of these fields and what they refer to.

field 1 - permissions of the file
field 2 - number of links to the file
field 3 - user id
field 4 - group id
field 5 - size of the file
field 6 - month the file was modified
field 7 - day the file was modified
field 8 - time the file was modified
field 9 - name of the file

I'm particularly interested in the size and the name of each file.
1. Let's try and use our cut command in the same way that we used it for the password file:
ls -al | cut -d' ' -f5,8
The output is not as expected, because cut is looking for single spaces to separate fields, while the listing pads its columns with runs of spaces. This presents us with a bit of a problem.
2. We could try using a \t (tab) for the delimiter instead of a space; however cut only accepts a single character, and typed out, \t is two characters. The way to insert a special character like tab is to type Ctrl-v and then hit the tab key:
^v + <tab>
That inserts a literal tab character, making the delimiter a tab. But we still don't get what we want, so let's instead squeeze multiple spaces into a single space in this particular output. After squeezing, the size is field 5 and the name is field 9. Thus:
ls -la | tr -s ' ' | cut -d' ' -f5,9
3. And hopefully that should now produce the output we're after. If it produces the output we're after on your system, then we're ready for lift-off. If it doesn't, check the command and try again.
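The squeeze step can be seen in isolation on a single fabricated ls -al line (the file name and attributes are invented); without tr the space runs create empty fields, and with it the field numbers line up with the table above.

```shell
# One invented line of `ls -al` output; columns are padded with runs of spaces
line='-rw-r--r--  1 sam  users   4096 Oct 15 09:57 notes.txt'

printf '%s\n' "$line" | cut -d' ' -f5,9               # hits empty fields: not useful
printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f5,9   # prints "4096 notes.txt"
```

Each space in a run is a field boundary to cut, so the unsqueezed line has many empty fields between the visible columns.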
Now what happens if we want to swap the name with the size? I'll leave that as an exercise for you.
Exercises:
- Using the tr and the cut commands, perform the following:
- Obtain the mount point, the percentage in use and the partition of that mount of your disk drive to produce the following:
/dev/hdb2 80% /home
- Replace the spaces in your output above by colons (:)
- Remove the /dev/shm line
- As root, make the following change:
chmod o+r /dev/hda
- Now, obtain the Model and Serial Number of your hard disk, using the command hdparm.
- Obtain the stats (reads and writes etc.) on your drive using the iostat command, keeping the output as a comma separated value format file for later use
Linux and Unix cut command tutorial with examples George Ornbo
The following options are supported:

NAME
    cut - remove sections from each line of files

SYNOPSIS
    cut {-b byte-list, --bytes=byte-list} [-n] [--help] [--version] [file...]
    cut {-c character-list, --characters=character-list} [--help] [--version] [file...]
    cut {-f field-list, --fields=field-list} [-d delim] [-s] [--delimiter=delim] [--only-delimited] [--help] [--version] [file...]

DESCRIPTION
    This manual page documents the GNU version of cut. cut prints sections of each line of each input file, or the standard input if no files are given. A file name of `-' means standard input. Which sections are printed is selected by the options.

OPTIONS
    The byte-list, character-list, and field-list are one or more numbers or ranges (two numbers separated by a dash) separated by commas. The first byte, character, and field are numbered 1. Incomplete ranges may be given: `-m' means `1-m'; `n-' means `n' through end of line or last field.

    -b, --bytes byte-list
        Print only the bytes in positions listed in byte-list. Tabs and backspaces are treated like any other character; they take up 1 byte.

    -c, --characters character-list
        Print only characters in positions listed in character-list. The same as -b for now, but internationalization will change that. Tabs and backspaces are treated like any other character; they take up 1 character.

    -f, --fields field-list
        Print only the fields listed in field-list. Fields are separated by a TAB by default.

    -d, --delimiter delim
        For -f, fields are separated by the first character in delim instead of by TAB.

    -n  Do not split multibyte characters (no-op for now).

    -s, --only-delimited
        For -f, do not print lines that do not contain the field separator character.

    --help
        Print a usage message and exit with a status code indicating success.

    --version
        Print version information on standard output then exit.
Example pipes

line_count=`wc -l $filename | cut -c1-8`

process_id=`ps -ef \
  | grep $process \
  | grep -v grep \
  | cut -f1 -d\ `

upper_case=`echo $lower_case | tr '[a-z]' '[A-Z]'`

In all cases the pipeline has been used to set a variable to the value returned by the last command in the pipe. In the first example, the wc -l command counts the number of lines in the filename contained in the variable $filename. This text string is then piped to the cut command, which snips off the first 8 characters and passes them on to stdout, hence setting the variable line_count.
In the second example, the pipeline has been folded using the backslash and we are searching for the process id or PID of an existing command running somewhere on the system. The ps -ef command lists the whole process table of the machine. Piping this through the grep command filters out everything except lines containing our wanted process string. This will return more than one line, however, as the grep command itself also has the process string on its command line. So by passing the data through a second grep -v grep command, any lines containing the word grep are also filtered out. We now have just the line we need, and the last thing is to get the wanted field from the start of the line by piping through a version of cut using the field option. Note that the field-option delimiter character here is an escaped space: the backslash before the closing backquote protects a literal space. Always test the blank characters that UNIX commands return; they are not always what you would think they are.
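The first example is fragile, because the column layout of wc -l file output differs between implementations. A sketch under an invented file name shows a more portable variant: with input on stdin, wc prints only the count.

```shell
# A throwaway three-line file (the name /tmp/demo.txt is invented)
printf 'first\nsecond\nthird\n' > /tmp/demo.txt

# Fragile: the column positions of `wc -l file` vary by implementation
line_count=$(wc -l /tmp/demo.txt | cut -c1-8 | tr -d ' ')

# More portable: reading from stdin makes wc print only the number
line_count=$(wc -l < /tmp/demo.txt | tr -d ' ')
echo "$line_count"     # prints 3
```

The tr -d ' ' strips the leading padding that some wc implementations emit even in stdin mode.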
Finally, to show some of the flexibility of find, let's look at one example that is a bit more advanced. Suppose we were looking for all data files in the HP user home directory filesystems (which are named /u and /u2) that are over one million bytes long and were modified in the past 30 days. The command below, where the output of find is piped into a few other Unix commands for postprocessing, results in a mail message being sent to the issuer of the command, containing the desired information in a neat tabular form.
The full command is:
find /u /u2 -type f -size +1000000c -mtime -30 -print | \ xargs file | grep data$ | cut -d: -f1 | \ xargs ls -aoq | cut -c16- | sort | mailx $LOGNAME
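The middle of that pipeline, keeping only file names whose file(1) type ends in "data", can be tried on canned output; the file names and types below are invented.

```shell
# Fabricated `file`-style output: "name: description" on each line
printf 'report.bin: data\nREADME: ASCII text\ncore.dump: data\n' \
  | grep 'data$' | cut -d: -f1
# prints:
#   report.bin
#   core.dump
```

grep selects the lines ending in "data", and cut with a colon delimiter throws away everything after the file name.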
Once you have mastered various UNIX programs, the power doesn't stop there. The UNIX shell lets you build sophisticated "pipelines" that send data from one program into another. As an example, let's find out the most common first name of all the users on a UNIX machine. In a single command, you can get a list of all user names from the file /etc/passwd, extract the first names, sort them, count adjacent identical names, sort the resulting numbers, and then find the largest:
Command:
cut -d: -f5 /etc/passwd \
  | cut -d' ' -f1 \
  | sort \
  | uniq -c \
  | sort -nr \
  | head -1

Response:
12 John
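On a real system the response depends on your user base; over a canned three-user sample (the accounts and names are invented) the same pipeline behaves like this:

```shell
# Invented passwd lines; field 5 (the comment field) carries the full name
sample='jd:x:1001:1001:John Doe:/home/jd:/bin/bash
js:x:1002:1002:John Smith:/home/js:/bin/bash
mk:x:1003:1003:Mary King:/home/mk:/bin/bash'

printf '%s\n' "$sample" \
  | cut -d: -f5 | cut -d' ' -f1 | sort | uniq -c | sort -nr | head -1
# reports John with a count of 2 (uniq -c pads the count with leading spaces)
```

The first cut isolates the comment field, the second takes the first word of it, and sort | uniq -c | sort -nr | head -1 counts the names and keeps the most frequent one.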
Sort and uniq
The other two UNIX commands I've found useful when parsing log files are sort and uniq. Say you want to look at all the pages requested from your site, in alphabetical order. The command would look something like this:
cat myapp_log.20031016 | cut -d' ' -f4 | sort
But that gives you all the pages requested. If you're not interested in all requests, but only the unique pages, whether they were requested once or a million times, then you would just filter through the uniq command:
cat myapp_log.20031016 | cut -d' ' -f4 | sort | uniq
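With a fabricated slice of an access log (the addresses, format and paths are invented), the difference between the two pipelines is easy to see:

```shell
# Made-up log lines: field 4 is the requested page
log='10.0.0.1 - - /index.html
10.0.0.2 - - /about.html
10.0.0.1 - - /index.html'

printf '%s\n' "$log" | cut -d' ' -f4 | sort          # every request, sorted
printf '%s\n' "$log" | cut -d' ' -f4 | sort | uniq   # each page only once
```

uniq only collapses adjacent duplicates, which is why the sort must come before it.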
cut -- Look at part of each line.
"cut" lets you select just part of the information from each line of a file. If, for instance, you have a file called "file1" with data in this format:
0001 This is the first line 0002 This is the secondand so on, you can look at just the numbers by typingcut -c1-4 file1The "-c" flag means "columns"; it will display the first four columns of each line of the file. You can also look at everything but the line numbers:cut -c6-100 file1will display the sixth through one hundredth column (if the line is less than a hundred characters -- and most will be -- you'll see up to the end of the line).You can also use cut to look at fields instead of columns: for instance, if a file looks like this:
curran:Stuart Curran
jlynch:Jack Lynch
afilreis:Al Filreis
loh:Lucy Oh
you can use cut to find the full name of each person, even though it's not always in the same place on each line. Type
cut -f2 -d: file1
"-f2" means "the second field"; "-d:" means the delimiter (the character that separates the fields) is a colon. To use a space as a delimiter, put it in quotations:
cut -f2 -d" " file1
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Created: May 16, 1996; Last modified: October 15, 2018