(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix |
Both grep and map operate on lists: both input and output are lists. Perl's grep resembles Unix grep. Both functions are essentially shorthand for a foreach loop: each executes an implicit loop over all elements of the list passed as the second argument. They are valuable because in simple cases they allow you to write more compact, and sometimes faster, code.
Their main value is not new functionality but the ability to make code more compact and transparent. Both functions accept two arguments: the first is an expression (or block) and the second is a list or array.
The main difference between them is that grep can only select certain elements from the array or list, while map can also transform them into a new array or list. That means that grep can be implemented via map, but not vice versa.
The grep function evaluates the BLOCK or EXPR for each element of LIST, locally setting the $_ variable equal to each element. BLOCK is one or more Perl statements delimited by curly brackets. The grep function itself does not change the list supplied as the second argument, although the BLOCK or EXPR can change it by modifying $_ (see the note below).
The syntax is
grep BLOCK LIST
grep EXPR, LIST
...Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value consisting of those elements for which the expression evaluated to true. In scalar context, returns the number of times the expression was true.
@foo = grep(!/^#/, @bar); # weed out comments
or equivalently,
@foo = grep {!/^#/} @bar; # weed out comments
Note:
$_ is an alias to the current list element (parameter passed by reference), so it can be used to modify the elements of the LIST. While this is useful and supported, it can cause bizarre results if the elements of LIST are not variables. In other words grep returns aliases into the original list, much as a for loop's index variable. That is, modifying an element of a list returned by grep (for example, in a foreach , map or another grep) actually modifies the element in the original list. This is usually avoided when writing clear code.
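If $_ is lexical in the scope where the grep appears (declared with the my $_ construct) then, in addition to being locally aliased to the list elements, $_ keeps being lexical inside the block; i.e., it can't be seen from the outside, avoiding any potential side-effects. Note that lexical $_ was an experimental feature and was removed in Perl 5.24, so this paragraph applies only to older versions of Perl.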
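The aliasing described in the note above can be seen in a small experiment. This is only an illustrative sketch; the array contents and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

my @words = qw(apple banana cherry);

# Modifying $_ inside grep's block edits @words itself, because $_ is an
# alias to each element rather than a copy.
my @long = grep { s/a/A/g; length($_) > 5 } @words;
# @words is now (Apple, bAnAnA, cherry); @long holds copies (bAnAnA, cherry)

# The values returned by grep are aliases too: assigning to them in a
# foreach loop reaches back into the original array.
$_ = uc($_) for grep { /^c/ } @words;

print "@words\n";   # Apple bAnAnA CHERRY
print "@long\n";    # bAnAnA cherry
```

Assigning grep's result to @long copies the selected values, which is why the later uppercase conversion does not show up there.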
In a scalar context, grep returns a count of the selected elements.
$num_apple = grep /^apple$/i, @fruits;
In list context grep returns a list of those elements for which the EXPR or BLOCK evaluates to TRUE, much as Unix grep does with the lines of a file.
@bought_fruits=grep /^apple$/i, @fruits;
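The difference between the two contexts is easy to check side by side. The sketch below uses a made-up @fruits list; only the context of the assignment changes between the calls.

```perl
use strict;
use warnings;

my @fruits = qw(apple Apple pineapple pear APPLE);

# Scalar context: grep returns how many elements matched.
my $num_apple = grep { /^apple$/i } @fruits;

# List context: grep returns the matching elements themselves.
my @bought_fruits = grep { /^apple$/i } @fruits;

# Parentheses on the left-hand side also give list context, so this
# picks up the first match rather than the count.
my ($first_match) = grep { /^apple$/i } @fruits;

print "$num_apple\n";       # 3
print "@bought_fruits\n";   # apple Apple APPLE
print "$first_match\n";     # apple
```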
The grep function runs an implicit loop, and on each iteration it returns the current element if the expression specified by the first argument is true and returns nothing otherwise. Thus, grep (like its Unix counterpart) can be used as a filter to extract those elements of a list for which an expression is true. The most common use of grep is with a regular expression as the first argument, but any expression that evaluates to true for the elements you want and false for the elements you don't want can be used.
On each iteration of the implicit loop, $_ is set to the current element of the array or list.
Suppose we have an array of filenames called @files, and we want a second list called @jpgs containing only those filenames which end in the extension ".jpg". Instead of using a loop, we could employ grep as follows:
@jpgs = grep(/\.jpg$/,@files);
As another example, suppose we have a list of files created with Unix ls utility and we want to eliminate all the directories from the list.
@non_a_directory = grep(! -d, @list);
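A self-contained way to try the directory filter is to build a scratch directory first. The sketch below assumes the core File::Temp and File::Spec modules; the names subdir and notes.txt are invented for the test. Note that file tests such as -d resolve relative names against the current directory, so a list produced by ls elsewhere would need full paths.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Build a scratch area holding one subdirectory and one plain file.
my $base   = tempdir(CLEANUP => 1);
my $subdir = File::Spec->catdir($base, 'subdir');
my $file   = File::Spec->catfile($base, 'notes.txt');
mkdir $subdir or die "mkdir failed: $!";
open my $fh, '>', $file or die "open failed: $!";
close $fh or die "close failed: $!";

my @list = ($subdir, $file);

# Keep only the entries that are not directories.
my @not_a_directory = grep { !-d $_ } @list;

print scalar(@not_a_directory), "\n";   # 1
```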
The grep function can replace a loop that selects elements based on some criterion, and it might execute faster. For example, instead of
for ($i=0; $i<@x; $i++) { if ($x[$i]>0) { $y[$j++]=$x[$i] } }
you can simply write (instantly saving four lines of code):
@y=grep($_>0,@x);
If you need the index of the found element, the problem becomes more involved, but you can still do it with map.
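One way to get at the indexes is to iterate over the index range instead of the elements. The following is a minimal sketch with made-up data; both grep and map work here, and map can additionally pair each index with its value.

```perl
use strict;
use warnings;

my @x = (3, -1, 0, 7);

# Loop over the indexes 0 .. $#x and test the element at each index.
my @idx_grep = grep { $x[$_] > 0 } 0 .. $#x;

# The same selection with map: return the index or an empty list.
my @idx_map = map { $x[$_] > 0 ? $_ : () } 0 .. $#x;

# map can also return something built from both index and value.
my @pairs = map { $x[$_] > 0 ? "$_=$x[$_]" : () } 0 .. $#x;

print "@idx_grep\n";   # 0 3
print "@idx_map\n";    # 0 3
print "@pairs\n";      # 0=3 3=7
```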
The first argument of map should be a statement or function call, which is evaluated for each element of the list. The result of each evaluation becomes an element of the new list returned by map. As in grep, the default variable $_ serves as a placeholder for the currently processed element of the list (one per iteration). This gives a very compact notation for printing an array, as it makes it easy to add "\n" to each element. For example
print "Program arguments are:\n", map(" '$_'\n", @ARGV);
Generally map is suitable for simple processing of all elements of an array that creates a new array by a one-to-one mapping from the old one, for example when each element of the new array is the value of a function applied to the corresponding element of the initial array. Suppose we have an array called @files containing filenames, and we wish to create a similar array called @sizes containing the sizes of those files.
@sizes = map(-s,@files);
The same can be achieved by:
foreach (@files) { $sizes[$i++] = -s; }
The expression "-s" uses $_ as its default argument, so it returns the size of each file on each iteration of the implicit loop. Using map shortens the code and avoids an explicit iteration counter in the loop.
Map can often be used to create a hash in which the keys are the elements of the array and each value is the result of evaluating some expression on the key. For example:
map($sizes{$_} = -s, @files);
Here the expression used as the first argument is evaluated once for each element of the array, and as a side effect it fills the hash %sizes; the list returned by map is discarded.
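A variant that avoids the side effect is to let map return key/value pairs and assign the whole list to the hash. The sketch below is one possible form, not the only one; it creates two temporary files with the core File::Temp module so that the sizes are known in advance.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two scratch files with known contents (5 and 11 bytes).
my @files;
for my $text ("hello", "hello world") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "close failed: $!";
    push @files, $name;
}

# Each iteration returns a (name, size) pair; the flattened list of
# pairs initialises the hash.
my %sizes = map { ($_ => -s $_) } @files;

print "$sizes{$files[0]} $sizes{$files[1]}\n";   # 5 11
```

The parentheses around the pair are a small precaution: they make it unambiguous to the parser that the braces are a block and not an anonymous hash.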
You can use a function as the first argument to map, but it should operate on the default variable $_ as the only parameter, since map does not pass any arguments to a function. If this function is also used for other purposes you can simply pass $_ as one of the arguments.
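Map can select elements from an array, just like grep. The following two statements are equivalent (EXPR represents a logical expression).
@selected = grep EXPR, @input;
@selected = map { EXPR ? $_ : () } @input;
The conditional operator returns an empty list for rejected elements. An if statement without an else branch is not a safe substitute here, because when EXPR is false the block may yield the false value of EXPR instead of nothing.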
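The equivalence can be checked with a concrete condition. In this sketch the test "$_ > 0" stands in for EXPR and the input list is invented; the map version returns either the element or an empty list.

```perl
use strict;
use warnings;

my @input = (5, -2, 0, 8, -7);

# Selection with grep.
my @via_grep = grep { $_ > 0 } @input;

# The same selection with map: an empty list drops the element.
my @via_map = map { $_ > 0 ? $_ : () } @input;

# Unlike grep, map can select and transform in a single pass.
my @doubled = map { $_ > 0 ? $_ * 2 : () } @input;

print "@via_grep\n";   # 5 8
print "@via_map\n";    # 5 8
print "@doubled\n";    # 10 16
```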
Generate random password
@a = (0 .. 9, 'a' .. 'z');
$password = join '', map { $a[int rand @a] } 0 .. 7;
print "$password\n";
y2ti3dal
Strip digits from the elements of an array
As with grep, avoid modifying $_ in map's block or using the returned list value as an lvalue, as this will modify the elements of LIST.
# Trashes @array :(
@digitless = map { tr/0-9//d; $_ } @array;
# Preserves @array :)
@digitless = map { ($x = $_) =~ tr/0-9//d; $x; } @array;
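On Perl 5.14 or later there is a shorter way to leave the input untouched: the /r flag makes tr return a modified copy instead of editing its operand. This is offered as an alternative sketch, assuming a new enough Perl; the sample data is made up.

```perl
use 5.014;      # tr///r requires Perl 5.14 or later
use warnings;

my @array = qw(a1b2 c3 42d);

# With /r, tr returns the transliterated copy and leaves $_ alone,
# so no temporary variable is needed.
my @digitless = map { $_ =~ tr/0-9//dr } @array;

print "@digitless\n";   # ab c d
print "@array\n";       # a1b2 c3 42d
```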
Transform filenames to file sizes
@sizes = map { -s $_ } @file_names;
The -s file test operator returns the size of a file and will work on regular files, directories, special files, etc.
Convert an array to a hash
Converting an array to a hash is a fairly common use for map. In this example the values of the hash are irrelevant; we are only checking for the existence of hash keys.
%dictionary = map { $_, 1 } qw(cat dog man woman hat glove);
Capitalize an entire array by applying the uc function to each element:
@caps = map uc, @phrases;
In the next sample, mapping a regular expression to the array returns the first word of every phrase:
@first_word = map { /(\S+)/ } @phrases;
Each element need not necessarily map to a single item. If multiple values are created, map returns them all as a single, flattened list. For example, you could split all words in all phrases into a single list:
@words = map split, @phrases;
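The flattening behaviour is easiest to see by counting the results. A small sketch with invented phrases: two input elements produce five output elements, while the regular expression form yields exactly one capture per phrase.

```perl
use strict;
use warnings;

my @phrases = ("just another", "perl hacker here");

# Each phrase contributes several words; map concatenates them all
# into one flat list.
my @words = map { split ' ', $_ } @phrases;

# A match in list context returns its captures, here one per phrase.
my @first_word = map { /(\S+)/ } @phrases;

print scalar(@words), "\n";   # 5
print "@words\n";             # just another perl hacker here
print "@first_word\n";        # just perl
```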
Still another use for map might be to convert a string to title case. You can do this by splitting a string into individual words, converting each to lowercase and then initial capitalization, and finally joining the words back into a single string:
$title = join ' ', map { ucfirst lc } split / /, $name;
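Here is the title-case idiom as a runnable sketch with sample strings of my own choosing. It also shows a variation: splitting on the string ' ' rather than the pattern / / treats any run of whitespace as one separator and ignores leading whitespace, which is more forgiving with untidy input.

```perl
use strict;
use warnings;

my $name  = "the QUICK brown fOX";
my $title = join ' ', map { ucfirst lc $_ } split / /, $name;

# split ' ' collapses runs of whitespace and skips leading whitespace.
my $messy = "  the   QUICK brown ";
my $clean = join ' ', map { ucfirst lc $_ } split ' ', $messy;

print "$title\n";   # The Quick Brown Fox
print "$clean\n";   # The Quick Brown
```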
Our final example uses map to put the sorted key/value pairs of a hash into a two-column HTML table:
print "<table>\n";
print map {"<tr><td>$_</td><td>$hash{$_}</td></tr>\n"} sort keys %hash;
print "</table>\n";
Jun 03, 2016 | alvinalexander.com
Perl array FAQ: How can I test to see if a Perl array already contains a given value? (Also written as, How do I search an array with the Perl grep function?)
I use the Perl grep function to see if a Perl array contains a given entry. For instance, in this Perl code:
if ( grep { $_ eq $clientAddress} @ip_addresses ) {
  # the array already contains this ip address; skip it this time
  next;
} else {
  # the array does not yet contain this ip address; add it
  push @ip_addresses, $clientAddress;
}
I'm testing to see if the Perl array "@ip_addresses" contains an entry given by the variable "$clientAddress".
Just use this Perl array search technique in an "if" clause, as shown, and then add whatever logic you want within your if and else statements. In this case, if the current IP address is not already in the array, I add it to the array in the "else" clause, but of course your logic will be unique.
An easier "Perl array contains" example
If it's easier to read without a variable in there, here's another example of this "Perl array contains" code:
if ( grep { $_ eq '192.168.1.100'} @ip_addresses )
If you'd like more details, I didn't realize it, but I have another good example out here in my "Perl grep array tutorial." (It's pretty bad when you can't find things on your own website.)
Nov 22, 2017 | alvinalexander.com
Perl grep array FAQ - How to search an array/list of strings
By Alvin Alexander. Last updated: June 3 2016
Perl "grep array" FAQ: Can you demonstrate a Perl grep array example? (Related: Can you demonstrate how to search a Perl array?)
A very cool thing about Perl is that you can search lists (arrays) with the Perl grep function. This makes it very easy to find things in large lists -- without having to write your own Perl for/foreach loops.
A simple Perl grep array example (Perl array search)
Here's a simple Perl array grep example. First I create a small string array (pizza toppings), and then search the Perl array for the string "pepper":
# create a perl list/array of strings
@pizzas = qw(cheese pepperoni veggie sausage spinach garlic);
# use the perl grep function to search the @pizzas list for the string "pepper"
@results = grep /pepper/, @pizzas;
# print the results
print "@results\n";
As you might guess from looking at the code, my @results Perl array prints the following output:
pepperoni
Perl grep array - case-insensitive searching
If you're familiar with Perl regular expressions, you might also guess that it's very easy to make this Perl array search example case-insensitive using the standard i modifier at the end of my search pattern.
Here's what our Perl grep array example looks like with this change:
@results = grep /pepper/i, @pizzas;
Perl grep array and regular expressions (regex)
You can also use more complex Perl regular expressions (regex) in your array search. For instance, if for some reason you wanted to find all strings in your array that contain at least eight consecutive word characters, you could use this search pattern:
@results = grep /\w{8}/, @pizzas;
That example results in the following output:
pepperoni
Perl grep array - Summary
I hope this Perl grep array example (Perl array search example) has been helpful.
Nov 22, 2017 | stackoverflow.com
Geo ,Jun 10, 2010 at 16:39
Let's say I have this list:
my @list = qw(one two three four five);
and I want to grab all the elements containing o. I'd have this:
my @containing_o = grep { /o/ } @list;
But what would I have to do to also receive an index, or to be able to access the index in grep's body?

my @index_containing_o = grep { $list[$_] =~ /o/ } 0..$#list; # ==> (0,1,3)
my %hash_of_containing_o = map { $list[$_]=~/o/?($list[$_]=>$_):() } 0..$#list; # ==> ( 'one' => 0, 'two' => 1, 'four' => 3 )
Nov 22, 2017 | stackoverflow.com
Perl: Searching for item in an Array
Majic Johnson ,Apr 20, 2012 at 4:53
Given an array @A we want to check if the element $B is in it. One way is to say this:
foreach $element (@A){ if($element eq $B){ print "$B is in array A"; } }
However when it gets to Perl, I am thinking always about the most elegant way. And this is what I am thinking: Is there a way to find out if array A contains B if we convert A to a variable string and use
index(@A,$B)=>0
Is that possible?
cHao ,Apr 20, 2012 at 4:55
grep { $_ eq $B } @A ?
daxim ,Apr 20, 2012 at 7:06
Related: stackoverflow.com/questions/7898499/ stackoverflow.com/questions/3086874/
Nikhil Jain ,Apr 20, 2012 at 5:49
There are many ways to find out whether the element is present in the array or not:
- Using foreach
foreach my $element (@a) {
  if($element eq $b) {
    # do something
    last;
  }
}
- Using Grep:
my $found = grep { $_ eq $b } @a;
- Using List::Util module
use List::Util qw(first);
my $found = first { $_ eq $b } @a;
- Using Hash initialised by a Slice
my %check;
@check{@a} = ();
my $found = exists $check{$b};
- Using Hash initialised by map
my %check = map { $_ => 1 } @a;
my $found = $check{$b};
pilcrow ,May 2, 2012 at 19:56
The List::Util::first() example is (potentially) subtly incorrect when searching for false values, since $found will also evaluate false. (die unless $found ... oops!) List::MoreUtils::any does the right thing here.
yazu ,Apr 20, 2012 at 4:56
use 5.10.1;
$B ~~ @A and say '$B in @A';
brian d foy ,Apr 20, 2012 at 13:07
You have to be very careful with this because this distributes the match over the elements. If @A has an array reference element that contains $B, this will still match even though $B isn't a top level element of @A. The smart match is fundamentally broken for this and many other reasons.
obmib ,Apr 20, 2012 at 5:51
use List::AllUtils qw/ any /;
print "\@A contains $B" if any { $B eq $_ } @A;
bvr ,Apr 20, 2012 at 7:43
I would recommend first in this case, as it does not have to traverse whole array. It can stop when item is found.
brian d foy ,Apr 20, 2012 at 13:10
any can stop too because it needs only one element to be true.
pilcrow ,May 3, 2012 at 1:38
Beware that first can also return a false value if it finds, e.g., "0", which would confound the example given in this answer. any has the desired semantics.
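The pitfall with false values raised in the comments above can be reproduced in a few lines. This sketch assumes List::Util 1.33 or later (bundled with Perl 5.20 and newer), which exports any alongside first; the array contents and the name $wanted are made up.

```perl
use strict;
use warnings;
use List::Util qw(first any);

my @a      = (3, 0, 7);
my $wanted = 0;

# grep in scalar context counts matches, so a found 0 still gives 1.
my $count = grep { $_ == $wanted } @a;

# first returns the matching element itself, which here is 0 and
# therefore false even though the search succeeded.
my $via_first = first { $_ == $wanted } @a;

# any returns a plain true/false answer and can stop at the first hit.
my $via_any = any { $_ == $wanted } @a;

print "grep count: $count\n";                                   # 1
print "first is ", ($via_first ? "true" : "false"), "\n";       # false
print "first found something\n" if defined $via_first;
print "any is ", ($via_any ? "true" : "false"), "\n";           # true
```

Testing the result of first with defined works as long as the list cannot itself contain undef.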
One of the advantages of the Perl programming language is its rich lexicon of built-in functions (around 193 by my count). Programming tasks that would take dozens of lines of code in other languages can often be done in a few lines of Perl. However, Perl's many functions won't help you unless you know how and when to use them.
In this tutorial I discuss three functions: grep, map and sort. Grep selects members from a list, map performs transforms on a list and sort sorts a list. Sounds simple? Yes, but you can solve complex problems by using these functions as building blocks. Scroll on ...
- The grep function
- The map function
- The sort function
- The holy grail
- Bibliography
- About the author
- Acknowledgements
Grep
Definition and syntax
Grep vs. loops
Count array elements that match a pattern
Extract unique elements from a list
Extract list elements that occur exactly twice
List text files in the current directory
Select array elements and eliminate duplicates
Select elements from a 2-D array where y > x
Search a simple database for restaurants
Map
Definition and syntax
Map vs. grep vs. foreach
Transform filenames to file sizes
Convert an array to a hash: find the index for an array value
Convert an array to a hash: search for misspelled words
Convert an array to a hash: store selected CGI parameters (map + grep)
Generate a random password
Strip digits from the elements of an array
Print "just another perl hacker"
Transpose a matrix
Find prime numbers: a cautionary tale
... ... ...
The grep function
(If you are new to Perl, skip the next two paragraphs and proceed to the "Select lines from a file" example below. Hang loose, you'll pick it up as you go along.)
grep BLOCK LIST
grep EXPR, LIST
The grep function evaluates the BLOCK or EXPR for each element of LIST, locally setting the $_ variable equal to each element. BLOCK is one or more Perl statements delimited by curly brackets. LIST is an ordered set of values. EXPR is one or more variables, operators, literals, functions, or subroutine calls. Grep returns a list of those elements for which the EXPR or BLOCK evaluates to TRUE. If there are multiple statements in the BLOCK, the last statement determines whether the BLOCK evaluates to TRUE or FALSE. LIST can be a list or an array. In a scalar context, grep returns the number of times the expression was TRUE.
Avoid modifying $_ in grep's BLOCK or EXPR, as this will modify the elements of LIST. Also, avoid using the list returned by grep as an lvalue, as this will modify the elements of LIST. (An lvalue is a variable on the left side of an assignment statement.) Some Perl hackers may try to exploit these features, but I recommend that you avoid this confusing style of programming
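A block with several statements might look like the sketch below, where the data and variable names are illustrative. The block works on a copy of $_ so that the input list is not modified, and only the value of the final statement decides whether an element is kept.

```perl
use strict;
use warnings;

my @lines = ("# comment", "  alpha = 1  ", "", "beta=2", "   ");

my @settings = grep {
    my $copy = $_;                  # copy, so @lines stays untouched
    $copy =~ s/^\s+|\s+$//g;        # trim leading and trailing whitespace
    $copy ne '' && $copy !~ /^#/;   # last statement: keep or reject
} @lines;

print scalar(@settings), "\n";   # 2
```

The selected elements are returned as they were in the original list, so the first one still carries its surrounding spaces; trimming the output would be a job for map.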
Grep vs. loops
This example prints any lines in the file named myfile that contain the (case-insensitive) strings terrorism or nuclear:
open FILE, "<myfile" or die "Can't open myfile: $!";
print grep /terrorism|nuclear/i, <FILE>;
This code consumes a lot of memory for large files because grep evaluates its second argument in a list context, and when the diamond operator (<>) is evaluated in a list context it returns the entire file. A more memory-efficient way to do the same thing is:
while ($line = <FILE>) { if ($line =~ /terrorism|nuclear/i) { print $line } }
This example shows that anything grep can do can also be done by a loop. So why use grep? The glib answer is that grep is more Perlish whereas loops are more C-like. A better answer is that grep makes it obvious that we are selecting elements from a list, and grep is more succinct than a loop. (Software engineers would say that grep has more cohesion than a loop.) Bottom line: if you are not experienced with Perl, go ahead and use loops; as you become familiar with Perl, take advantage of power tools like grep.
Count array elements that match a pattern
In a scalar context, grep returns a count of the selected elements.
$num_apple = grep /^apple$/i, @fruits;
The ^ and $ metacharacters anchor the regular expression to the beginning and end of the string, respectively, so that grep selects apple but not pineapple.
Extract unique elements from a list
@unique = grep { ++$count{$_} < 2 } qw(a b a c d d e f g f h h);
print "@unique\n";
a b c d e f g h
The $count{$_} is a single element of a Perl hash, which is a list of key-value pairs. (The meaning of "hash" in Perl is related to, but not identical to, the meaning of "hash" in computer science.) The hash keys are the elements of grep's input list; the hash values are running counts of how many times an element has passed through grep's BLOCK. The expression is false for all occurrences of an element except the first.
Extract list elements that occur exactly twice
@crops = qw(wheat corn barley rice corn soybean hay alfalfa rice hay beets corn hay);
@dupes = grep { $count{$_} == 2 } grep { ++$count{$_} > 1 } @crops;
print "@dupes\n";
rice
The second argument to grep is "evaluated in a list context" before the first list element is passed to grep's BLOCK or EXPR. This means that the grep on the right completely loads the %count hash before the grep on the left begins evaluating its BLOCK.
List text files in the current directory
@files = grep { -f and -T } glob '* .*'; print "@files\n";
The glob function emulates Unix shell filename expansions; an asterisk means "give me all the files in the current directory except those beginning with a period". The -f and -T file test operators return TRUE for plain and text files respectively. Testing with -f and -T is more efficient than testing with only -T because the -T operator is not evaluated if a file fails the less costly -f test.
Select array elements and eliminate duplicates
@array = qw(To be or not to be that is the question);
print "@array\n";
@found_words = grep { $_ =~ /b|o/i and ++$counts{$_} < 2; } @array;
print "@found_words\n";
To be or not to be that is the question
To be or not to question
The logical expression $_ =~ /b|o/i uses the match operator to select words that contain b or o (case-insensitive). Putting the match operator test before the hash increment test is slightly more efficient than vice-versa (for this example): if the left-hand expression is FALSE, the right-hand expression is not evaluated.
Select elements from a 2-D array where y > x
# An array of references to anonymous arrays
@data_points = ( [ 5, 12 ], [ 20, -3 ], [ 2, 2 ], [ 13, 20 ] );
@y_gt_x = grep { $_->[1] > $_->[0] } @data_points;
foreach $xy (@y_gt_x) { print "$xy->[0], $xy->[1]\n" }
5, 12
13, 20
Search a simple database for restaurants
This example is not a practical way to implement a database, but does illustrate that the only limit to the complexity of grep's block is the amount of virtual memory available to the program.
# @database is array of references to anonymous hashes
@database = (
  { name => "Wild Ginger",
    city => "Seattle",
    cuisine => "Asian Thai Chinese Japanese",
    expense => 4,
    music => "\0",
    meals => "lunch dinner",
    view => "\0",
    smoking => "\0",
    parking => "validated",
    rating => 4,
    payment => "MC VISA AMEX",
  },
  # { ... }, etc.
);
sub findRestaurants {
  my ($database, $query) = @_;
  return grep {
    $query->{city} ? lc($query->{city}) eq lc($_->{city}) : 1
    and $query->{cuisine} ? $_->{cuisine} =~ /$query->{cuisine}/i : 1
    and $query->{min_expense} ? $_->{expense} >= $query->{min_expense} : 1
    and $query->{max_expense} ? $_->{expense} <= $query->{max_expense} : 1
    and $query->{music} ? $_->{music} : 1
    and $query->{music_type} ? $_->{music} =~ /$query->{music_type}/i : 1
    and $query->{meals} ? $_->{meals} =~ /$query->{meals}/i : 1
    and $query->{view} ? $_->{view} : 1
    and $query->{smoking} ? $_->{smoking} : 1
    and $query->{parking} ? $_->{parking} : 1
    and $query->{min_rating} ? $_->{rating} >= $query->{min_rating} : 1
    and $query->{max_rating} ? $_->{rating} <= $query->{max_rating} : 1
    and $query->{payment} ? $_->{payment} =~ /$query->{payment}/i : 1
  } @$database;
}
%query = ( city => 'Seattle', cuisine => 'Asian|Thai' );
@restaurants = findRestaurants(\@database, \%query);
print "$restaurants[0]->{name}\n";
Wild Ginger
The map function
map BLOCK LIST
map EXPR, LIST
The map function evaluates the BLOCK or EXPR for each element of LIST, locally setting the $_ variable equal to each element. It returns a list of the results of each evaluation. Map evaluates BLOCK or EXPR in a list context. Each element of LIST may produce zero, one, or more elements in the output list.
In a scalar context, map returns the number of elements in the results. When the output list (a, b, c, d, ...) is assigned to a hash, it is interpreted as key-value pairs ( a => b, c => d, ... ). If the number of output list elements is not even, the last hash element will have an undefined value.
Avoid modifying $_ in map's BLOCK or EXPR, as this will modify the elements of LIST. Also, avoid using the list returned by map as an lvalue, as this will modify the elements of LIST. (An lvalue is a variable on the left side of an assignment statement.) Some Perl hackers may try to exploit these features, but I recommend that you avoid this confusing style of programming.
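The three behaviours described above (a variable number of results per element, the count in scalar context, and pairing on hash assignment) can be checked with a short sketch. The input list and variable names are illustrative only.

```perl
use strict;
use warnings;

my @input = (1, 2, 3, 4);

# Zero or two results per element: odd numbers are dropped, even
# numbers are followed by their square.
my @out = map { $_ % 2 ? () : ($_, $_ ** 2) } @input;

# Scalar context: the total number of elements generated.
my $n = map { ($_, $_) } @input;

# Assigning the flat list to a hash pairs it up as key => value.
my %square = map { ($_ => $_ ** 2) } @input;

print "@out\n";          # 2 4 4 16
print "$n\n";            # 8
print "$square{3}\n";    # 9
```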
Map vs. grep vs. foreach
Map can select elements from an array, just like grep. The following two statements are equivalent (EXPR represents a logical expression).
@selected = grep EXPR, @input;
@selected = map { EXPR ? $_ : () } @input;
Also, map is just a special case of a foreach statement. The statement:
@transformed = map EXPR, @input;
(where EXPR is some expression containing $_) is equivalent to (if @transformed is undefined or empty):
foreach (@input) { push @transformed, EXPR; }
In general, use grep to select elements from an array and map to transform the elements of an array. Other array processing can be done with one of the loop statements (foreach, for, while, until, do while, do until, redo). Avoid using statements in grep/map blocks that do not affect the grep/map results; moving these "side-effect" statements to a loop makes your code more readable and cohesive.
Transform filenames to file sizes
@sizes = map { -s $_ } @file_names;
The -s file test operator returns the size of a file and will work on regular files, directories, special files, etc.
Convert an array to a hash: find the index for an array value
Instead of searching an array, we can use map to convert the array to a hash and then do a direct lookup by hash key. The code using map is simpler and, if we are doing repeated searches, more efficient.
In this example we use map and a hash to find the array index for a particular value:
@teams = qw(Miami Oregon Florida Tennessee Texas Oklahoma Nebraska LSU Colorado Maryland);
%rank = map { $teams[$_], $_ + 1 } 0 .. $#teams;
print "Colorado: $rank{Colorado}\n";
print "Texas: $rank{Texas} (hook 'em, Horns!)\n";
Colorado: 9
Texas: 5 (hook 'em, Horns!)
The .. is Perl's range operator and $#teams is the maximum index of the @teams array. When the range operator is bracketed by two numbers, it generates a list of integers for the specified range.
When using map to convert an array to a hash, we need to think about how non-unique array elements affect the hash. In the example above, a non-unique team name will make the code print the lowest rank for that team name (the largest number, because later pairs overwrite earlier ones in the hash). Non-unique team names are a data entry error; one way to handle them would be to add a second map to preprocess the array and convert the second and subsequent occurrences of a name to a dummy value (and output an error message).
Convert an array to a hash: search for misspelled words
Converting an array to a hash is a fairly common use for map. In this example the values of the hash are irrelevant; we are only checking for the existence of hash keys.
%dictionary = map { $_, 1 } qw(cat dog man woman hat glove);
@words = qw(dog kat wimen hat man glov);
foreach $word (@words) {
  if (not $dictionary{$word}) {
    print "Possible misspelled word: $word\n";
  }
}
Possible misspelled word: kat
Possible misspelled word: wimen
Possible misspelled word: glov
This is more efficient than using the grep function to search the entire dictionary for each word. In contrast to the previous example, duplicate values in the input list do not affect the results.
Convert an array to a hash: store selected CGI parameters (map + grep)
Map is often the most convenient way to create the parameter hash.
use CGI qw(param);
%params = map { $_, ( param($_) )[0] } grep { lc($_) ne 'submit' } param();
The param() call returns a list of CGI parameter names; the param($_) call returns the CGI parameter value for a name. If the param($_) call returns multiple values for a CGI parameter, the ( param($_) )[0] syntax extracts only the first value so that the hash is still well-defined. Map's block could be modified to issue a warning message for multi-valued parameters.
Generate a random password
In this example the values of the elements in map's input LIST are irrelevant; only the number of elements in LIST affects the result.
@a = (0 .. 9, 'a' .. 'z');
$password = join '', map { $a[int rand @a] } 0 .. 7;
print "$password\n";
y2ti3dal
(You would have to augment this example for production use, as most computer systems require a letter for the first character of a password.)
Strip digits from the elements of an array
As with grep, avoid modifying $_ in map's block or using the returned list value as an lvalue, as this will modify the elements of LIST.
# Trashes @array :(
@digitless = map { tr/0-9//d; $_ } @array;
# Preserves @array :)
@digitless = map { ($x = $_) =~ tr/0-9//d; $x; } @array;
Print "just another perl hacker" using maximal obfuscation
The chr function in the map block below converts a single number into the corresponding ASCII character. The "() =~ /.../g" decomposes the string of digits into a list of strings, each three digits long.
print map( { chr } ('10611711511603209711011111610410111' .
  '4032112101114108032104097099107101114') =~ /.../g ), "\n";
just another perl hacker
Transpose a matrix
This works with square or rectangular matrices.
@matrix = ( [1, 2, 3], [4, 5, 6], [7, 8, 9] );
foreach $xyz (@matrix) { print "$xyz->[0] $xyz->[1] $xyz->[2]\n"; }
@transposed = map { $x = $_; [ map { $matrix[$_][$x] } 0 .. $#matrix ]; } 0 .. $#{$matrix[0]};
print "\n";
foreach $xyz (@transposed) { print "$xyz->[0] $xyz->[1] $xyz->[2]\n"; }
1 2 3
4 5 6
7 8 9

1 4 7
2 5 8
3 6 9
Find prime numbers: a cautionary tale
Lastly, an example of how NOT to use map. Once you become proficient with map, it is tempting to apply it to every problem involving an array or hash. This can lead to unfortunate code like this:
foreach $num (1 .. 1000) {
  @expr = map { '$_ % ' . $_ . ' &&' } 2 .. int sqrt $num;
  if (eval "grep { @expr 1 } $num") { print "$num " }
}
1 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 ...
This works, but the code is such an evil mess that it made my dog cry. This code violates rule 1 from the classic work The Elements of Programming Style: Write clearly - don't be too clever.
Look at how easy it is to understand a straightforward implementation of the same algorithm:
CANDIDATE: foreach $num (1 .. 1000) {
  foreach $factor (2 .. int sqrt $num) {
    unless ($num % $factor) { next CANDIDATE }
  }
  print "$num ";
}
1 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 ...
As a bonus, the simple implementation is two orders of magnitude faster than the self-generating code! In general, the simpler your code looks, the more the compiler can optimize it. Of course, a complex implementation of a fast algorithm can trump a simple implementation of a slow algorithm.
Now that we have gazed upon the Dark Side, let us return to the path of righteous, simple code ...
Bibliography
Grep
Cozens, S., Function of the month - grep
Wall, L., et al, Grep function man page
Map
Cozens, S., Function of the month - map
Hall, J. N., Without map you can only grep your way around
Wall, L., et al, Map function man page
... ... ....
SAGE
Welcome to Effective Perl Programming, the column. In this and coming articles I'm going to discuss ways that you can use Perl more effectively, whether by mastering Perl idioms, using Perl modules, or finding new applications for Perl programs. I begin by covering two very powerful but underused language features: Perl's map operator and its cousin, the grep operator. I'll cover the basics quickly, then show you some useful techniques and a few neat tricks as well.
The Basics of map and grep
The map operator creates a transformed copy of a list by evaluating a specified expression or code block for each element in the list. Each time the expression or block is evaluated, $_ contains the value of the current element. The result from the expression or block is appended to the list returned by map. The syntax for map looks like this:
@result = map expression, list
@result = map { code } list
Rewritten without using map, the effect is like this:
@result = ();
foreach (list) { push @result, expression; }
For example:
@times_ten = map $_ * 10, 1..10;
returns the list (10, 20, 30, 40, 50, 60, 70, 80, 90, 100), and
@uppercased = map { ucfirst $_ } qw(george jane judy elroy);
returns the list ('George', 'Jane', 'Judy', 'Elroy'). The transform expression (or block) is evaluated in a list context. It does not have to return a single element; it can return two or more, or none at all (an empty list). More on that later.
The grep operator resembles the map operator syntactically:
@result = grep expression, list
@result = grep { code } list
However, unlike the map operator, which constructs a transformed copy of a list, the grep operator selects items from a list. The selection expression (or block) is evaluated for each element of its argument, with $_ set to the current element. If the result is true (anything other than the empty string or the string '0'), a copy of the element is appended to the result from grep. For example:
# Returns (2, 4, 6, 8, 10).
@even = grep { not $_ % 2 } 1..10;
# Returns a list of text files in the current directory.
@text_files = grep -T, glob "*";
# Classic grep -- imitating Unix grep. Prints lines containing the
# word 'Joseph'.
print grep /\bJoseph\b/, <>;
The grep operator has been around for a long time, but the map operator is new in Perl 5 (as much as anything that is four years old can be called "new," anyway). The map operator is more versatile and can do anything that grep can:
# Another way to get a list of text files.
@text_files = map { (-T) ? $_ : () } glob "*";
map and grep Idioms
The map operator is obviously useful for simple one-to-one transformations:
# Print out the contents of a hash.
print map "$_: $hash{$_}\n", sort keys %hash;
Be careful, though: this approach creates a lot of temporary structures in memory. For a very large hash it would be more appropriate to use an each loop:
while (($key, $val) = each %hash) { print "$key: $val\n" }
Using map to construct hashes is an important idiom. You can construct existence hashes that are used to test whether a particular value has been seen; in this case, set all the values in the hash to 1 (or some other "true" value). You can also use map to construct hashes where the value is computed from the key. To use map to construct a hash, return two values for each original element: the key and its corresponding value.
# Create keys for all the "words" in $text, so that we can test for
# a word later with if $words_seen{$word}.
%words_seen = map { $_ => 1 } split /\s+/, $text;
# After this, $file_size{$file} gives -s $file -- saves time if
# we need to use it more than once.
%file_size = map { $_ => -s } @files;
The map operator is handy for "nesting" and "slicing" multidimensional data structures. Using an anonymous array (or hash) constructor inside map creates nested structures. For example, you can blend parallel arrays into a single 2-d structure:
# Blend @x, @y, and @z into a single 2-d array @xyz ... $xyz[0][0]
# is $x[0], $xyz[0][1] is $y[0], and so on.
@xyz = map [$x[$_], $y[$_], $z[$_]], 0..$#x;
You can use the same technique to create a hash of arrays:
# Cache the results from stat into a hash of arrays ... then
# $info{'file'}[7] gives the size of 'file', $info{'file'}[5]
# gives the owner's uid, and so on.
%info = map { $_, [ stat $_ ] } @files;
Extracting a slice of a nested structure is just as easy. Just use a subscript inside map:
# This will extract @x from @xyz (undoing what we did above) ...
# $x[0] is $xyz[0][0], $x[1] is $xyz[1][0], and so on.
@x = map $_->[0], @xyz;
The grep operator isn't as versatile as map, but it is usually the most succinct way to select items from a list. Don't forget that it can be used on complex structures:
# Select elements from @xyz whose "coordinates" are all >0.
# @gt_zero is still a 2-d array with the same organization as @xyz.
@gt_zero = grep {$_->[0] > 0 and $_->[1] > 0 and $_->[2] > 0} @xyz;
Cool Tricks with map
You can use map to read several lines of input at a time:
# Read 10 lines from STDIN.
@ten_lines = map scalar(<STDIN>), 1..10;
The "Schwartzian Transform" (named after fellow Perl trainer and author Randal L. Schwartz) is a sort surrounded by maps. It is generally preferred over other techniques when the sorting process requires time-intensive key transformations:
# Sort files in descending order of size.
@files_by_size =
    map  { $_->[0] }              # 3. slice out the original list, now sorted
    sort { $b->[1] <=> $a->[1] }  # 2. sort the list of tuples
    map  { [$_, -s $_] }          # 1. create a list of tuples by nesting
    @files;                       #    the data to be sorted
You can use map for some set operations. Here is an example of using it to find the elements in one hash (%hash1) that are not in another hash (%hash2). Depending on the relative sizes of the hashes involved, this can be more efficient than other methods (like using the delete operator):
# keys %result contains 2 4 6 7 8 9 when this is done.
%hash1 = map { $_, 1 } 1..9; # some sample data
%hash2 = map { $_, 1 } 1, 3, 5; # more sample data
%result = map { $_, $hash1{$_} } grep { not exists $hash2{$_} } keys %hash1;
# Another way to do the same thing, with delete.
%result = %hash1;
delete @result{keys %hash2};
Because map's transform expression is evaluated in a list context, using map in combination with a pattern match that contains some parentheses can produce unusually succinct code:
# Create a hash of user name vs. user id from lines in /etc/passwd.
open PASSWD, "/etc/passwd" or
die "couldn't open password file: $!\n";
%name_to_id = map /(.*?):.*?:(.*?):/, <PASSWD>;
The map operator can even be useful for some string operations:
# Convert a string like 'ABC' into its
# hex equivalent, '\x41\x42\x43'.
$hexed = join '', map { sprintf "\\x%x", ord $_ } split //, $str;
# An alternative using s///, which is slightly slower
# for long strings.
($hexed = $str) =~ s/(.)/sprintf "\\x%x", ord $1/ge;
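The pattern-match idiom for /etc/passwd can be tried without reading the real file. The sketch below uses made-up sample lines (the names and IDs are assumptions for illustration) and shows why a non-matching line does no harm: a failed match in list context returns the empty list.

```perl
use strict;
use warnings;

# Hypothetical sample data in /etc/passwd format (name:password:uid:...).
my @lines = (
    "root:x:0:0:root:/root:/bin/sh\n",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n",
    "# a comment line that does not match\n",
    "alice:x:1000:1000:Alice:/home/alice:/bin/sh\n",
);

# In list context a successful match returns its captures ($1, $2);
# a failed match returns the empty list, so that line adds nothing.
my %name_to_id = map { /^(.*?):.*?:(.*?):/ } @lines;

print "$_ => $name_to_id{$_}\n" for sort keys %name_to_id;
```

If a line matched with only one capture, the key/value pairing would slip out of step, so this style suits input whose format is known in advance.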
That should be enough for now. I hope you've enjoyed this little tour of map and grep. My next column will be something of a change of pace: I will introduce object-oriented programming in Perl.
Use the map function to make your code more concise:
One of Perl's less commonly used functions (at least, by novices) is the map function.
Learning to use this function effectively can make you a more efficient programmer by eliminating the need to code many of the mindless iterative loops that occur in the course of any programming project.
How many times do you have an array of values, and you want to do an operation on each of them -- perhaps to print them all out, each indented and on a separate line? The brute force method that most novice Perl weenies use might look something like this:
print "ARGV[] is:\n";
for ($i=0; $i<=$#ARGV; $i++) {
    print " '$ARGV[$i]'\n";
}
print "--------\n";
This code snippet, saved to file t.1 and run from the command line, results in output like the following:
bash) perl ./t.1 foo bar baz blah
ARGV[] is:
 'foo'
 'bar'
 'baz'
 'blah'
--------
Sure, it works, but that's an awful lot of code to do such a simple operation. Plus, if you're running with use strict; (as you should be!), you've got to declare the variable $i. Let's try the same using the map function. Quoth the camel:
map BLOCK LIST
map EXPR, LIST
This function evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value composed of the results of each such evaluation.
In plainer terms, map takes an array and returns a new array in which each of the elements has been changed to whatever is specified in the BLOCK or EXPR.
Here's that same example, written using map :
print "ARGV[] is:\n", map(" '$_'\n", @ARGV), "--------\n";
Save this to a file t.2, run it, and you'll see the same results as from the previous example. Notice, there's no need to declare a loop variable, set up a loop, etc. With this trivial operation, the difference is minimal, but as you start doing more complicated operations on the array elements, you'll really see a difference in the compactness of the code you need to write. (Some may find this code a little dense, but trust me, the more you work with it, the more you'll see it as a clearer way to do it.)
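To check that the two styles really agree, here is a small sketch that builds the report both ways and compares the strings. The array @args is a stand-in for @ARGV (an assumption made so the code runs without command-line arguments).

```perl
use strict;
use warnings;

# Stand-in for @ARGV so the sketch runs without command-line arguments.
my @args = qw(foo bar baz blah);

# Loop version: build the report one element at a time.
my $with_loop = "ARGV[] is:\n";
foreach my $arg (@args) {
    $with_loop .= " '$arg'\n";
}
$with_loop .= "--------\n";

# map version: the loop collapses into a single expression.
my $with_map = join '', "ARGV[] is:\n", map(" '$_'\n", @args), "--------\n";

print $with_map;
print "identical\n" if $with_loop eq $with_map;
```

Iterating with foreach over the elements, rather than over indexes, also avoids off-by-one mistakes in the loop bound.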
For those of you who want more, notice all of the explicit newlines ("\n") in the above. Let's use the join function to eliminate the need for specifying all of these:
print join "\n",
    "ARGV[] is:",
    map(" '$_'", (@ARGV)),
    "--------",
    ""; # This one causes a trailing newline
And for those of you writing CGI applications, who are familiar with Lincoln Stein's excellent CGI.pm module, consider the following:
use CGI qw/:html/;
print div({ALIGN => 'CENTER'},
        table({BORDER => 1},
            join "\n",
                caption("ARGV[] List:"),
                map( { TR(td($_))} @ARGV)
        ) # End-table
    ), # End-div
    ""; # This one causes a trailing newline
Way cool.
Perl's map function can come in handy when you need to simplify potentially repetitive operations, such as capitalizing strings of text. In this article, we'll offer several examples of how you can put map to work. Then, we'll turn our attention to Perl's parsing capabilities with a look at various ways you can parse your program's command line to extract switches or other information.
The power of map
Perl offers many functions that help to simplify and shorten code. Among the more powerful is map, which takes a list, evaluates a specified block or expression on each element, and then returns a list of all the results. Inside the block, map locally assigns $_ as an alias to the current list item.
One of the simplest uses of map is to capitalize an entire array by applying the uc function to each element:
@caps = map uc, @phrases;
In the next sample, mapping a regular expression to the array returns the first word of every phrase:
@first_word = map { /(\S+)/ } @phrases;
Each element need not necessarily map to a single item. If multiple values are created, map returns them all as a single, flattened list. For example, you could split all words in all phrases into a single list:
@words = map split, @phrases;
Still another use for map might be to convert a string to title case. You can do this by splitting a string into individual words, converting each to lowercase and then initial capitalization, and finally joining the words back into a single string:
$title = join ' ', map { ucfirst lc } split / /, $name;
Our final example uses map to put the sorted key/value pairs of a hash into a two-column HTML table:
print "<table>\n";
print map {"<tr><td>$_</td><td>$hash{$_}</td></tr>\n"} sort keys %hash;
print "</table>\n";
Command-line parsing
When you need to determine the command-line switches passed into a Perl program, you can take various approaches. An easy way of identifying expected Boolean switches is to loop through @ARGV, setting a flag for each option that is encountered:
foreach $arg (@ARGV) {
    $a = 1, next if $arg eq '-a';
    $b = 1, next if $arg eq '-b';
    $c = 1, next if $arg eq '-c';
}
Another simple option is to use Perl's -s switch. In this case, Perl will create variables named the same as each switch and then remove the switches from @ARGV. For example:
perl -s prog.pm -a -b -c
When prog.pm is executed, the variables $a, $b, and $c are all defined and set to 1. Only the switches listed before any nonswitch argument or "--" will be handled. Therefore, the following may not work as desired:
perl -s prog.pm -a -b 13 -c 6/6/2001
Here, $a and $b are set to 1 and @ARGV contains ('13', '-c', '6/6/2001'). To have $b set to 13 and $c set to 6/6/2001, the command line could be entered as:
perl -s prog.pm -a -b=13 -c=6/6/2001
A more robust alternative to using -s is to use either Getopt::Std or Getopt::Long. These modules will parse the command line and set global variables (named the same as the switch but prefixed with opt_) for each option. In the following sample, -a is a Boolean option, -b requires that an integer be specified, -c requires a string, and -d accepts an optional string:
use Getopt::Long;
GetOptions("a!", "b=i", "c=s", "d:s");
The variables set would be $opt_a, $opt_b, $opt_c, and $opt_d, respectively.
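Under use strict, global $opt_ variables are awkward, so a common variation is to have Getopt::Long store values in a hash. The sketch below is one way to do that; it uses GetOptionsFromArray on a hypothetical argument list (the values and the file name input.txt are made up) so that it can run without a real command line.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical command line; a real program would pass \@ARGV instead.
my @args = ('-a', '-b', '13', '-c', '6/6/2001', 'input.txt');

# Collect option values in a hash rather than in global $opt_* variables.
my %opt;
GetOptionsFromArray(\@args, \%opt, 'a!', 'b=i', 'c=s', 'd:s')
    or die "bad command line\n";

print "a = $opt{a}\n";        # Boolean switch was given
print "b = $opt{b}\n";        # integer value
print "c = $opt{c}\n";        # string value
print "left over: @args\n";   # non-option arguments remain in the array
```

Unlike perl -s, this handles "-b 13" with a space between the switch and its value, and options that were not given (here -d) simply have no entry in %opt.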
Dan Richter daniel.richter at wimba.com
Fri Nov 28 13:01:50 EST 2003
LinuxChix Perl Course Part 17: grep and map
1) Introduction
2) grep - filters a list
3) map - transforms the values of a list
4) What "grep" and "map" have in common
5) Exercises
6) Answer to Previous Exercise
7) Past Information
8) Credits
9) Licensing
-----------------------------------
1) Introduction
Before we finish looking at arrays in Perl, I thought we should take a quick look at two handy Perl functions: "grep" and "map". Both functions are technically operators because they allow you to do magical things that a function can't do, but syntactically they look like functions, so we refer to them as functions here.
Let me add that this will be the last e-mail before January. It's that busy-busy-busy time of year and I'm afraid that I have no more time to write about Perl than you have to read about it.
-----------------------------------
2) grep - filters a list
The "grep" function returns only the elements of a list that meet a certain condition:
   @positive_numbers = grep($_ > 0, @numbers);
As you can see, each element is referred to as "$_". This (plus the fact that parentheses are optional) allows you to write commands that look similar to invocations of the Unix "grep" program:
   @non_blank_lines = grep /\S/, @lines;
In addition, you can specify a code block rather than a single condition:
   @non_blank_lines = grep { /\S/ } @lines;   # Equivalent to the above.
Obviously it doesn't matter in this case, but code blocks are helpful when you want a complex filter with multiple lines of code. The result of the code block is the result of the last statement executed:
   # All positive numbers can be used as exponents,
   # but negative exponents must be integers.
   @can_be_used_as_exponent = grep {
      if ( $_ < 0 ) {
         ! /\./;   # No decimal point -> integer.
      } else {
         1;        # Always true.
      }
   } @array;
-----------------------------------
3) map - transforms the values of a list
The "map" function applies a transformation to each element of a list and returns the result, leaving the original list unchanged (unless you mess it up; more on that in a moment).
   @lines_with_newlines = map( $_ . "\n", @lines_without_newlines);
As with "grep", each value in the list is referred to as "$_". "map" can also take a block of code:
   # Replace "x@y.z" with "x at y dot z" to confuse spammers.
   @disguised_addresses = map {
      my $email = $_;
      $email =~ s/\@/ at /;
      $email =~ s/\./ dot /g;
      $email;
   } @email_addresses;
Note that it's important not to change "$_" because that would change the original "@email_addresses" (and you wouldn't get what you wanted in "@disguised_addresses").
"map" need not be a one-to-one mapping. For example, in the following code:
   @words = map m/\b(\w+)\b/g, @lines;   # Spaces are for clarity.
the regular expression splits a string into a list of words. The "map" function returns the result of joining all the small lists. If a line contains no words, the regular expression will return an empty list, and that's okay.
-----------------------------------
4) What "grep" and "map" have in common
"grep" and "map" have a lot in common. They both "magically" take a piece of code (either an expression or a code block) as a parameter. You need to put a comma after an expression but shouldn't put a comma after a code block.
Changing "$_" in "grep" or "map" will change the original list. This isn't generally a good idea because it makes the code hard to read.
Remember that "map" builds a list of results by evaluating an expression, NOT by setting "$_". A side effect of this fact is that you should not use "s///" with "map". The "s///" operator changes "$_" rather than returning a result, so you won't get what you would expect if you use it with "map" (and you CERTAINLY shouldn't use it with "grep").
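The warning about "s///" in section 4 can be seen in a short sketch. The addresses are made-up examples, and the "/r" flag used in the second half is an addition that is not part of the course text; it assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my @addresses = ('a@example.com', 'b@example.org');

# Wrong: s/// returns the number of substitutions, not the new string,
# and it edits the source array through the $_ alias.
my @scratch = @addresses;                     # sacrificial copy
my @counts  = map { s/\@/ at / } @scratch;    # (1, 1); @scratch is changed

# Alternative: the /r flag (Perl 5.14 and later) returns the modified
# copy and leaves $_ alone.
my @disguised = map { s/\@/ at /r } @addresses;

print "counts:    @counts\n";
print "disguised: @disguised\n";
print "original:  @addresses\n";
```

On older Perls, copying "$_" into a "my" variable first, as in the "@disguised_addresses" example, gives the same protection.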
-----------------------------------
5) Exercises
a) Write some Perl code that, given a list of numbers, generates a list of square roots of those numbers. (The square root function in Perl is "sqrt".)
b) Modify the code to filter out any negative numbers. The result should be as though the negative numbers were never in the original list.
c) Write a Perl program that reads two files and outputs only the lines that are common to both of them.
-----------------------------------
6) Answer to Previous Exercise
The following program reads the password file and outputs a list of usernames and UIDs, ordered by username:
   #!/usr/bin/perl -w
   use strict;
   open FILE, '< /etc/passwd' or die "Couldn't open file: $!";
   my @data = sort(<FILE>);
   close FILE;
   my @result;
   foreach (@data) {
      my @fields = split(/:/);   # Equivalent to split(/:/, $_)
      push @result, $fields[0] . ' -> ' . $fields[2];
   }
   print join("\n", @result) . "\n";
The above program is a nice review of Perl functions. But of course, There Is More Than One Way To Do It, and we could replace the bottom half with:
   foreach (@data) {
      s/^(.*?):.*?:(\d*):.*$/$1 -> $2/;
   }
   print @data;   # Each line still ends with its own newline.
Or to make the program really short:
   $_ = join '', @data;
   s/^(.*?):.*?:(\d*):.*$/$1 -> $2/gm;
   print;   # Prints "$_"
-----------------------------------
7) Past Information
Part 16: Array Functions
   http://linuxchix.org/pipermail/courses/2003-November/001359.html
Part 15: More About Lists
   http://linuxchix.org/pipermail/courses/2003-November/001351.html
Part 14: Arrays
   http://linuxchix.org/pipermail/courses/2003-October/001350.html
Part 13: Perl Style
   http://linuxchix.org/pipermail/courses/2003-October/001349.html
Part 12: Side Effects with Perl Variables
   http://linuxchix.org/pipermail/courses/2003-October/001347.html
Part 11: Perl Variables
   http://linuxchix.org/pipermail/courses/2003-October/001345.html
Parts 1-10: see the end of:
   http://linuxchix.org/pipermail/courses/2003-October/001345.html
-----------------------------------
8) Credits
Works cited:
a) man perlfunc
b) Kirrily Robert, Paul Fenwick and Jacinta Richardson's "Intermediate Perl", which you can find (along with their "Introduction to Perl") at: http://www.perltraining.com.au/notes.html
Thanks to Jacinta Richardson for fact checking.
-----------------------------------
9) Licensing
This course (i.e., all parts of it) is copyright 2003 by Alice Wood and Dan Richter, and is released under the same license as Perl itself (Artistic License or GPL, your choice). This is the license of choice to make it easy for other people to integrate your Perl code/documentation into their own projects. It is not generally used in projects unrelated to Perl.
The map() function
The map() function is like a rubber stamp applied to all the elements of a list (see "perldoc -f map" for more information). It consists of two parts: a block or an expression, and a list:
Listing 1. How map() works
# map {BLOCK GOES HERE} LIST;
map {$_++} @p; # increment every element of @p
# map EXPRESSION, LIST;
map -f,@p; # file test every element of @p
The foreach() loop will usually do better than map() in benchmarks, so don't use map() for CPU-intensive calculations unless you test its performance. Do use it when it makes your code more elegant and simple without a loss of efficiency.
There are many neat things that map() can do. First of all, it can modify the array elements as it goes through them by changing the $_ variable. Inside the block or (less commonly) the expression, $_ is the current element of the list. You don't know which element you are looking at -- the whole point is to map one single function to all the elements independently. It is my experience that in about 80% of loops over an array you don't need to know the offset of the current element inside the array. Thus, map() can improve coding efficiency and style by forcing the programmer to think independently of array offsets.
Listing 2. foreach() vs. map()
# "normal" (procedural) way
foreach (sort keys %ENV) {
    print "$_ => $ENV{$_}\n";
}
# FP way
map { print "$_ => $ENV{$_}\n" } sort keys %ENV;
In Listing 2, you can see how the FP way is not fundamentally different, yet manages to convey a flow of functions from right to left. First the list of keys is obtained, then it is sorted, then the print() function is applied to each element of the sorted key list.
Listing 3. Modifying a list on the fly with map()
# these are the users
@users = ('joe', 'ted', 'larry');
# and this is an on-the-fly substitution of user names with hash references
map { $_ = { $_ => length } } @users;
# @users is now ( { 'joe' => 3 },
#                 { 'ted' => 3 },
#                 { 'larry' => 5 } )
Listing 3 demonstrates how the list passed to map() can be completely rewritten. In this case, the array @users contained only user names, but after map() was applied, the array contained hash references with one key-value pair, username => byte length of user name. How about quickly filling in file information?
Listing 4. Modifying a list on the fly with map(), part 2
use File::stat;
use Data::Dumper;
@files = ('/etc/passwd', '/etc/group', '/etc/fstab', '/etc/vfstab');
print Dumper map {
    $sb = stat $_;
    $_ = (defined $sb)
        ? { name => $_, size => $sb->size(), mtime => $sb->mtime() }
        : undef
} @files;
In Listing 4 we create a list of files, and then in one statement create a list of hashes with entries for the name, size, and mtime (modification time) of each file. Furthermore, non-existent files get an "undef" reference instead of a hash reference. Finally, Data::Dumper is used to print out a nice view of the entire product list.
Needless to say, code like Listing 4 should be heavily documented for other people's sake. Don't try to cram it all into one line, either. Elegant code is just an ugly duckling without proper formatting and comments.
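Listings 3 and 4 overwrite the input array by assigning to $_. When the original list should survive, a non-destructive variant is possible; the sketch below is one such variant, with the hash keys name and len chosen here purely for illustration.

```perl
use strict;
use warnings;

my @users = ('joe', 'ted', 'larry');

# Build a new list of hash references instead of assigning to $_.
# The leading + tells Perl the inner braces are an anonymous hash,
# not a nested code block.
my @records = map { +{ name => $_, len => length $_ } } @users;

print "$_->{name}: $_->{len}\n" for @records;
print "original list untouched: @users\n";
```

This keeps map() in its usual role of returning a transformed copy, which is easier to reason about than a map() run only for its side effects.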
For anyone who has used UNIX, the grep() function is simple to learn and use. It acts just like the grep utility -- elements that satisfy a test pass through, while everything else gets dropped.
The syntax of grep() is just like map(). A block or an expression can be passed, and $_ is aliased to the current element under examination. It is not a good idea to modify elements of the list passed to grep(). That's what map() is for. Use grep() to grep, and map() to map. The only exception to this rule is if you must create temporary hash fields or array entries while sorting, but make sure you remove them afterwards.
Listing 5. How grep() works
# grep {BLOCK GOES HERE} LIST;
grep {$_ > 1} @p; # only accept numbers more than 1
grep {$_++} @p; # please don't do this
# grep EXPRESSION, LIST;
grep /hi/,@p; # only accept matching elements
grep !/hi/,@p; # do not accept matching elements
It can be very convenient to use grep() for quick filtering, but remember that a foreach() loop may be faster under some circumstances. When in doubt, benchmark.
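A minimal sketch of such a benchmark, using the core Benchmark module, might look like the following. The list size and the sub names are arbitrary choices for illustration, and the timings it prints depend on the Perl version and the machine, so no particular result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @list = (1 .. 1000);

# Two ways to keep the even numbers.
sub with_foreach {
    my @results;
    foreach (@list) { push @results, $_ unless $_ % 2 }
    return @results;
}
sub with_grep {
    return grep { !($_ % 2) } @list;
}

# A negative count means "run each candidate for at least that many
# CPU seconds"; cmpthese then prints a comparison table.
cmpthese(-1, {
    'foreach loop' => \&with_foreach,
    'grep'         => \&with_grep,
});
```

It is worth confirming first that both candidates return the same result; a faster routine that gives a different answer is not a fair comparison.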
Listing 6. Using grep() to filter out odd numbers
my @list = (1, 2, 3, 'hi');
my @results;
# the procedural approach
foreach (@list) {
    push @results, $_ unless $_%2;
}
# using grep - FP approach
@results = grep { !($_ % 2) } @list;
Here is another example. Say we need to look in a directory and retrieve all the file names from it:
Listing 7. Using grep() to get all the filenames in a directory
opendir(DIR, ".") || die "can't opendir: $!"; # get the directory handle
my @f = grep { /^[^\.]/ && -f } readdir(DIR); # filter only files into @f
Line 1 of Listing 7 just opens the current directory, or exits the program with the appropriate notice. Line 2 invokes the readdir() function, which returns a list of filenames, and runs a grep() that filters out hidden files (filenames must not begin with a "." character) and non-file objects such as directories.
In two lines we do as much work as four or five lines of a foreach() loop might do. Don't forget to comment such tight code; the short comments shown are not sufficient for production code. Sometimes the result of grep() is used directly as a truth test (for instance, to test if Perl interpreters are running [the Proc::ProcessTable module is from CPAN]):
Listing 8. Using grep() to get all the Perl processes running
use Proc::ProcessTable; # get this module from CPAN
use strict;
my $table = new Proc::ProcessTable;
my @procs;
if (@procs = grep { defined $_->cmndline && $_->cmndline =~ /^perl/ } @{$table->table}) {
    print $_->pid, "\n" foreach @procs;
} else {
    print "No Perl interpreters seem to be running.\n";
}
Here, we simultaneously assign the return from grep() to the @procs array, and we test to see if it contained any elements at all. If there were no elements matching the pattern, we print a message to that effect. The same code with a foreach() loop would probably take five or six lines. In case it hasn't been said enough, code like Listing 8 should be commented well enough that someone else could look at it and immediately know the intent and effect of that code. It's no use writing production code if you are the only one who will ever be able to read it.
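When only the number of matches is needed, grep() can be called in scalar context, where it returns a count instead of a list. The sketch below illustrates the difference on a hypothetical list of command lines that stands in for a real process table.

```perl
use strict;
use warnings;

# Hypothetical command lines standing in for a real process table.
my @commands = ('perl script.pl', 'bash', 'perl -e 1', 'sshd');

# Assigning to a scalar puts grep() in scalar context: it returns a count.
my $perl_count = grep { /^perl/ } @commands;

# Assigning to an array puts grep() in list context: it returns the elements.
my @perl_procs = grep { /^perl/ } @commands;

print "$perl_count Perl command(s) found\n";
print "  $_\n" for @perl_procs;
```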
Sorting with map: the Schwartzian and Guttman-Rosler transforms
The sort() function in Perl is "sort of" procedural, in that it takes a code block or a function name and uses it to sort all elements. The comparison function has to be written as if it were only looking at two elements -- it doesn't know which ones specifically out of the whole list. Like map() and grep(), sort() deals with references to the values being compared, so modifying them would modify the values being compared. Don't do this (for more information on the sort() function, see "perldoc -f sort").
Perl's sorting abilities are remarkably simple to use. In its simplest form, a sort can be done like this:
@new_list = sort @old_list; # sort @old_list alphabetically
The default sort uses simple string comparisons on all the scalars in the list. This is fine if the list contains dictionary words to be sorted alphabetically, but not so great if the list contains numbers. Why? Because "010" comes before "1" in a string comparison, for example, even though 10 is greater than 1. Numbers have to be compared by value, not as strings.
Fortunately, this is easy to do and is in fact a common Perl idiom:
Listing 10. The numeric sort()
@old_list = (1, 2, 5, 200, '010');
@new_list = sort { $a <=> $b } @old_list; # sort @old_list numerically
I quoted 010, because in Perl numbers that begin with 0 are interpreted as octal, so 010 octal would have been 8 decimal. Try it without the quotes and see for yourself. This also demonstrates how Perl scalars are automatically converted to numbers when necessary.
If you run the default sort from Listing 9 on the @old_list in Listing 10, you will see that 200 is before 5, for example. That's what we are trying to avoid.
To reverse the sorted list, you can either apply the reverse() function to the list after it's sorted, or you can change the sorting function. For example, the reverse sort of the one in Listing 10 would have the comparison code block be { $b <=> $a }.
See "perldoc perlop" for more information on the <=> and cmp operators, which are essential to all sorting in Perl. The cmp operator is what the default sort in Listing 9 uses behind the scenes.
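To see the three orderings side by side, the following sketch sorts the sample list from Listing 10 with the default comparison, numerically, and numerically in reverse.

```perl
use strict;
use warnings;

my @old_list = (1, 2, 5, 200, '010');

my @by_string  = sort @old_list;                 # default: string comparison
my @by_number  = sort { $a <=> $b } @old_list;   # numeric, ascending
my @descending = sort { $b <=> $a } @old_list;   # numeric, descending

print "string:     @by_string\n";    # 010 1 2 200 5
print "numeric:    @by_number\n";    # 1 2 5 010 200
print "descending: @descending\n";   # 200 010 5 2 1
```

Note that the string '010' is compared as the number 10 but is returned unchanged; sort() reorders the elements without rewriting them.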
Well, fine, so we can sort scalars. That's not enough -- most sorting is done on data structures such as arrays and hashes. Perl supports almost any kind of sorting because of its flexible syntax. For instance, say we need to sort a bunch of hash references, where the 'name' key in the hash is the sorting field. We want a regular alphabetical sorting order, so the cmp operator should be used:
Listing 11. The sort by a hash member
# create a list with two hash references
@old_list = ( { name => "joe" }, { name => "moe" } );
# sort @old_list by the value of the 'name' key
@new_list = sort { $a->{name} cmp $b->{name} } @old_list;
Now we get into the interesting stuff. What if it's expensive to obtain data from the objects being sorted? Say we need to apply the split() function to a string every time we need to obtain the value to sort by. It would be computationally expensive to run a split() every time the comparison value is needed, and your co-workers would laugh at you. You could build a temporary list of the comparison values, but that's not so easy to do and can easily introduce bugs. You are probably better off using the Schwartzian transform, named after Randal Schwartz. The Schwartzian transform looks like this:
Listing 12. The Schwartzian transform
# create a list with some strings
@old_list = ( '5 eagles', '10 hawks', '2 bulls', '8 cows');
# sort @old_list by the first word in each string, numerically
@new_list = map($_->[1],
                sort( { $a->[0] <=> $b->[0] }
                      map( [ (split)[0], $_ ], @old_list)));
Look at it from right to left. First, we apply a map() to the @old_list list to create a new temporary list. Remember that map() can transform a list. In this case, we rewrite @old_list to contain an array consisting of the first value from a split() of the string (this is the comparison value) and the string itself, for each string in @old_list. The result is a new list; no changes are made to @old_list.
Next, we sort by the comparison value (first element of the array elements in @old_list). Note that @old_list is not actually modified in this whole process.
Then, we perform another map() on the sorted list to reduce it back to just strings by keeping only the second element of each array. $_->[1] means "the value stored in the second slot of the array that $_ refers to"; map() collects those values into the final list.
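The same three steps can be written out with named temporary arrays, which may be easier to follow on a first reading. This is only an unrolled sketch of the transform; the intermediate variable names are made up for illustration.

```perl
use strict;
use warnings;

my @old_list = ('5 eagles', '10 hawks', '2 bulls', '8 cows');

# Step 1: pair each string with its sort key, computed once per element.
my @pairs = map { [ (split)[0], $_ ] } @old_list;

# Step 2: sort the pairs by the cached key.
my @sorted_pairs = sort { $a->[0] <=> $b->[0] } @pairs;

# Step 3: throw the keys away and keep the original strings.
my @new_list = map { $_->[1] } @sorted_pairs;

print join(', ', @new_list), "\n";   # 2 bulls, 5 eagles, 8 cows, 10 hawks
```

The chained form in Listing 12 does exactly this without naming the intermediate lists.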
Right about now your eyes are probably glazed from looking at the Schwartzian transform. It really does look frightening at first, but deep down inside it's just a little pussycat. If you are still unclear on it, see the Resources below.
The Guttman-Rosler transform is fascinating in its own right, but discussion of it will only make your eyes glaze further. Look at the Resources for a paper on the GRT, which explains it best. The paper is a very nice introduction to sorting in general. I recommend taking a look at that paper if you do not have at least a little bit of background knowledge on sorting algorithms. The theory behind sorting is extremely useful to a programmer, and understanding big-O notation can be an invaluable tool not just for sorting, but also for profiling, debugging, and writing good code.
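For readers who want a taste anyway, here is a rough sketch of the idea behind the GRT: encode the sort key as a fixed-width string prefix so that Perl's default string sort can be used with no comparison block. It assumes the keys are non-negative integers below 2**32 and that "use locale" is not in effect; it is an illustration of the technique, not the paper's own code.

```perl
use strict;
use warnings;

my @old_list = ('5 eagles', '10 hawks', '2 bulls', '8 cows');

# Step 1: prefix each string with its key packed as a 4-byte big-endian
# integer, so that plain string order equals numeric order.
my @packed = map { pack('N', (split)[0]) . $_ } @old_list;

# Step 2: the default sort needs no comparison block.
my @sorted = sort @packed;

# Step 3: strip the 4-byte key to recover the original strings.
my @new_list = map { substr($_, 4) } @sorted;

print join(', ', @new_list), "\n";   # 2 bulls, 5 eagles, 8 cows, 10 hawks
```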
When to use FP
Once again, I will say this: know your tools. Functional programming is an excellent tool, as we have seen so far. It can simplify some pretty hairy problems and make others a little easier. So when should you use FP?
- First of all, remember that in Perl, FP is only an approach. The actual solution will be procedural, even though it simulates a functional solution. The question is not when FP should be used, but how much it should be used, from "not at all" to "as much as possible."
- Any time you need to do complex sorting, see if the Schwartzian transform or the Guttman-Rosler transform are appropriate. They are drop-in functional replacements for regular sorting.
- If your functions are chained often, consider FP. For example, modification of a list in steps by several functions can probably be accomplished with an FP approach.
- If you have a lot of temporary variables that are thrown away as soon as they are used, consider FP to decrease their number.
- Filtering, sorting, and general transforming functions applied to lists or hashes are candidates for FP.
- If your functions have a lot of side effects, and their parameters are more than a few, FP is probably not going to work too well.
- Recursive algorithms can go either way with regard to FP. They are not clearly better or worse when done with the FP approach.
- Avoid FP if performance is very important. Use the Benchmark module to check your approach -- sometimes FP will speed things up considerably (for example, the Schwartzian transform is significantly faster because of its cache of comparison values), but sometimes it will cause the performance of the code to drop significantly.
- One-liners work well with FP.
- Obfuscated Perl code has always favored grep() and map() as ways to obscure the actions of code. Unless you are writing obfuscated Perl code for a contest, don't use grep() and map() without at least some commenting.
- Learn, practice, and use FP in your daily programming work. You will gain insight into all of your other code, see new ways ahead, and make life easier. Don't use FP just because it's there, but do use it because it works well for your specific problem.
Exercises
1. Write a code snippet that uses map() to transform a list of user names into user IDs.
2. Write a program that uses (1) to look up user IDs. Allow filtering of the user list by partial names.
3. Write code to check if any processes owned by root are running (on a UNIX system).
4. Benchmark the Schwartzian transform versus your own sorting code versus the Guttman-Rosler transform on a small data set. Use the Benchmark module to do this.
5. Do (4) on a large data set. For instance, sort all the file names on your system by size. Look at the File::Find module for ideas.
6. Read about Erlang, Scheme, and Haskell in the comp.lang.functional FAQ. Look at other FP languages, and see if any of them have neat ideas that you can use in Perl.
7. Write a Perl version of the grep program. Did you think of the grep() function right away? You shouldn't use grep() in this case because you may have to process a large file, and there's no sense in keeping the contents of the whole file in memory just to run grep() on them. Think of a better solution.
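The final exercise above hints at reading the input one line at a time instead of loading it all into a list. One possible sketch of that idea follows; it is not the only answer. The `grep_lines` helper and the sample text are hypothetical, and an in-memory filehandle stands in for a real file so that the snippet is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: print the lines of $in that match $pattern to $out,
# reading one line at a time so the whole file is never held in memory.
sub grep_lines {
    my ( $pattern, $in, $out ) = @_;
    my $re    = qr/$pattern/;    # compile the pattern once
    my $count = 0;
    while ( my $line = <$in> ) {
        next unless $line =~ $re;
        print {$out} $line;
        $count++;
    }
    return $count;
}

# Demonstration on an in-memory file; a real tool would open the
# file names given in @ARGV instead.
my $text = "root 1\nalice 42\nroot 7\n";
open my $in, '<', \$text or die "cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "cannot open output: $!";

my $matches = grep_lines( '^root', $in, $out );
close $out;

print $result;
print "$matches matching line(s)\n";
```

Memory use stays proportional to the longest line, not to the file size, which is the point of avoiding grep() on the slurped file.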
Nov 16, 2017 | stackoverflow.com
Grep Two Dimensional Array
Taranasaur
Since this is not a question directly covered here, I thought it best to ask and answer it. I had an issue where I wanted to add a node name to a list only if the same node doesn't already exist. The array was built using:
push (@fin_nodes, [$node, $hindex, $e->{$hip}->{FREQ}]);
So, given an array (@fin_nodes) that looks like:
$VAR1 = [ 'first-node', '4', 3 ];
$VAR2 = [ 'second-node', '1', 3 ];
$VAR3 = [ 'another-node', '1', 5 ];
$VAR4 = [ 'some-node', '0', 5 ];
to do a grep on this, the following works:
my @match = grep { grep { $_ =~ $node } @$_ } @fin_nodes;
So, given a $node "second-node", the above statement will return @match as:
$VAR1 = [ 'second-node', '1', 3 ];
Why not use a hash instead? – Sobrique
When dumping an array, do Data::Dumper::Dumper(\@array), not ...(@array). If passed a list, Dumper dumps each element individually, which is not what you want here. – ysth
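To make the Data::Dumper remark concrete, here is a small sketch comparing the two call styles. The two sample rows are taken from the question; the variable names `$as_list` and `$as_ref` are made up for the illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample data shaped like the question's @fin_nodes.
my @fin_nodes = (
    [ 'first-node',  '4', 3 ],
    [ 'second-node', '1', 3 ],
);

# Passing the array as a list: each element is dumped separately,
# as $VAR1, $VAR2, and so on.
my $as_list = Dumper(@fin_nodes);

# Passing a reference: the whole array is dumped as one structure, $VAR1.
my $as_ref = Dumper( \@fin_nodes );

print "As a list:\n$as_list\nAs a reference:\n$as_ref";
```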
Sobrique
I would say "don't" and instead:
my %fin_nodes;
$fin_nodes{$node} = [$hindex, $e->{$hip}->{FREQ}];
And then you can simply
if ($fin_nodes{$node}) {
Failing that though - you don't need to grep every element, as your node name is always first.
So:
my @matches = grep { $_ -> [0] eq $node } @fin_nodes;
eq is probably a better choice than =~ here, because the latter will substring match. (And worse, it can potentially do some quite unexpected things if you've metacharacters in there, since you're not quoting or escaping them.)
E.g. in your example, if you look for a node called "node" you'll get multiple hits.
Note: if you're only looking for one match, you can do something like:
my ( $first_match ) = grep { $_ -> [0] eq $node } @fin_nodes;
This will just get you the first result, and the rest will be discarded. (Which isn't too efficient, because grep will continue to iterate the whole list.)
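The two points of the answer can be checked with a short sketch: the substring behaviour of =~ versus eq, and a way to stop at the first match. The data is the question's sample array. Using `first` from the core List::Util module is a suggestion added here, not part of the original answer.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Sample data taken from the question.
my @fin_nodes = (
    [ 'first-node',   '4', 3 ],
    [ 'second-node',  '1', 3 ],
    [ 'another-node', '1', 5 ],
    [ 'some-node',    '0', 5 ],
);

my $node = 'node';

# A regex match is a substring match, so "node" hits every row.
my @regex_hits = grep { $_->[0] =~ $node } @fin_nodes;

# String equality only matches a row whose name is exactly "node".
my @exact_hits = grep { $_->[0] eq $node } @fin_nodes;

printf "regex: %d hit(s), eq: %d hit(s)\n",
    scalar @regex_hits, scalar @exact_hits;

# first() returns the first matching element and stops scanning there,
# unlike grep, which always walks the whole list.
my $first_match = first { $_->[0] eq 'second-node' } @fin_nodes;
print "first match: @$first_match\n" if $first_match;
```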
Your last statement was on point, I only needed one match. Then before pushing a node onto fin_nodes this was enough: "if (!$first_match)" – Taranasaur
@Taranasaur: I think you missed the point of Sobrique's answer. A hash is by far the better choice for this, and you can simply write $fin_nodes{$node} //= [ $hindex, $e->{$hip}{FREQ} ] and avoid the need for any explicit test altogether. – Borodin
@Borodin, no I do get Sobrique's point. The fin_nodes array is being used for a simple list function that another method is already using quite happily in my program. I will at some point go back and create a hash as there might be more attributes I'll need to include in that array/hash – Taranasaur
"because the latter will substring match" assuming no regex metacharacters; if there are any, it will be even worse – ysth
Good point @ysth I will add that. – Sobrique
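The hash-based approach suggested in the thread can be sketched as follows. The `@input` rows, including the deliberately duplicated node name, are hypothetical test data; only the `//=` assignment comes from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical input: (node name, index, frequency) triples, with one
# duplicate name to show that the first entry wins.
my @input = (
    [ 'first-node',  '4', 3 ],
    [ 'second-node', '1', 3 ],
    [ 'first-node',  '9', 7 ],    # duplicate name, ignored
);

my %fin_nodes;
for my $row (@input) {
    my ( $node, $hindex, $freq ) = @$row;

    # //= assigns only when the slot is still undefined, so a node
    # that is already present is left untouched (requires Perl 5.10+).
    $fin_nodes{$node} //= [ $hindex, $freq ];
}

for my $node ( sort keys %fin_nodes ) {
    print "$node => @{ $fin_nodes{$node} }\n";
}
```

The lookup is a single hash access per node, so no grep over the existing entries is needed before each insertion.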
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Last modified: August 17, 2020