This is the central page of the Softpanorama website, because I am strongly convinced that the development of scripting languages,
not the replication of the efforts of the BSD group undertaken by Stallman and Torvalds, is the central part of open source. See Scripting languages as VHLL for more details.
Ordinarily technology changes fast. But programming languages are different: programming languages are not just technology, but
what programmers think in.
They're half technology and half religion. And so the median language, meaning whatever language the median programmer
uses, moves as slow as an iceberg.
A fruitful way to think about language development is to consider it to be a special type of theory building. Peter Naur suggested
that programming in general is a theory-building activity in his 1985 paper "Programming as Theory Building". But the idea is
especially applicable to compilers and interpreters. What Peter Naur failed to understand was that the design of programming languages has
religious overtones and sometimes represents an activity which is pretty close to the process of creating a new, obscure cult ;-). Clueless
academics publishing junk papers at obscure conferences are the high priests of the church of programming languages. Some, like Niklaus Wirth
and Edsger W. Dijkstra, (temporarily) reached a status close to that of (false) prophets :-).
On a deep conceptual level, building a new language is a human way of solving complex problems. That means that compiler construction
is probably the most underappreciated paradigm for programming large systems -- much more so than the greatly oversold object-oriented programming,
whose benefits are greatly overstated.
For users, programming languages distinctly have religious aspects, so decisions about which language to use are often far from
rational and are mainly cultural. Indoctrination at the university plays a very important role: recently universities were instrumental
in making Java the new Cobol.
The second important observation about programming languages is that the language per se is just a tiny part of what can be called the language
programming environment. The latter includes libraries, IDEs, books, the level of adoption at universities, popular and important applications
written in the language, the level of support, the key players that back the language on major platforms such as Windows and Linux, and
other similar things.
A mediocre language with a good programming environment can give a run for the money to languages of superior design that are
just naked. This is the story behind the success of Java and PHP. A critical application is also very important, and that is
the story of the success of PHP, which is nothing but a bastardized derivative of Perl (with all the most interesting Perl features surgically
removed ;-) adapted to the creation of dynamic web sites using the so-called LAMP stack.
Progress in programming languages has been very uneven and contains several setbacks. Currently this progress is mainly limited to
the development of so-called scripting languages, while the field of traditional high-level languages has been stagnant
for decades. From 2000 to 2017 we observed the huge success of JavaScript; Python encroached on Perl territory (including
genomics/bioinformatics), and R in turn started squeezing Python in several areas. At the same time Ruby, despite initial success,
remained a niche language. PHP still holds its own in web-site design.
At the same time there are some mysterious, unanswered questions about the factors that help a particular scripting language
increase its user base, or cause it to fail. Among them:
Why do new programming languages repeat old mistakes? Does this happen because the complexity of languages is already
too high, or because language designers are unable to learn from the "old masters" and step on the same rake simply out of ignorance of
the history of language development?
Why, starting from approximately 1990, has progress in language design been almost absent, and why are the most popular languages created
after 1990, such as Java and PHP, at best mediocre and a (huge) step back from the state of the art of language
design?
Why does fashion rule, and why do fashionable (OO-based) languages gain momentum and support despite their (obvious) flaws?
Why "worse is better" approach is so successful, why less powerful and less elegant languages can make it to mainstream
and stay here ? In this about overcomplexity. For example very few people know "full Perl". Most including me know some subset.
How does the complexity of a language inhibit its wide usage? The story of PHP (a simple BASIC-style language inferior to almost
any other scripting language developed after 1990) eliminating Perl as the CGI scripting language is pretty fascinating.
So is the success of Pascal, which is a bastardized version of Algol but contained several innovative ideas in its compiler:
portability and fast one-pass compilation, achieved by adopting
a compiler-friendly language design (the compiler used recursive descent). Pascal was the first language after Basic whose success was
clearly related to the fact that
it was used at universities as the first programming language. The Pascal compiler was also open source. Now the same situation repeats
with Java: because universities teach it, it became the dominant language, despite the fact that its designers never learned important lessons
from the history of C and C++.
Why is success at universities such an extremely important factor in language adoption? The two most recent
cases are Java and Python. Both are far from innovative languages, although Python, being a derivative of Perl, is much better in
this respect than Java. It looks to me that
suitability for entry-level courses is now as important a factor in language design as anything else. In this
sense both the Java and Python stories replicated
the success of Pascal, as both are now often used for first programming courses at universities. Now R is displacing Python
in certain areas, partially because it is used instead of Python in the corresponding courses.
The role of "language hype". Now real qualities of the language are less important that hype around it and
being "fashionable" language creates huge advantages and bring additional resources for development. Java, a pretty mediocre
language in most respects, was hyped to the sky and now only gradually gives way to better languages. It essentially locked the
area of commercial software development despite obvious problems with the language (one step forward, two steps back) and,
especially with its implementation.
The quality of the language design is no longer the decisive factor in the success of a language. The success of PHP,
which we can view as similar to the success of Basic, has shown that "language design does not matter". The quality of PHP's
design is really bad even by amateur standards: its designers repeated all the errors in language design that have been known since the 1960s.
Some lexical-level and syntax-level language features are so error prone that including them in a language creates huge
problems for programmers, no matter what their level of experience with the language. On the lexical level, the requirement
to end each statement with a delimiter (typically a semicolon) is one such feature. Unlabeled closing delimiters
created the unclosed "}" problem in C-style languages (though Python's "solution" of using whitespace for nesting proved to be problematic
too). Mixing up "==" and "=" in comparisons in C-style languages is yet another. On the syntax level, mistyped identifiers are a chronic
problem, and the requirement to use each name at least twice ("declare before use") is a reasonable compromise in view
of the number of errors it prevents. The ability to close all open blocks with one delimiter is also a useful tool for preventing
unmatched-"}" errors, and the requirement that subroutine declarations appear only at nesting level 0 is a
reasonable one. Another idea that prevents a lot of errors is providing a template (signature) for subroutine
arguments. Perl-style "contextual conversion" of variables from one type to another proved to be a bad idea; some more limited forms
are better. And the Unix shell idea (borrowed by Perl) of using a different set of conditional operators for comparing strings
and for comparing numeric values proved to be a horrible, extremely error-prone one (see the Perl sketch after this list).
The quality of a scripting language's IDE proved to be comparable in importance to the quality of the scripting
language itself. Languages that have a high-quality IDE (Python), or that ship with one (like R), have an
edge over languages that do not. One problem with Perl is the lack of a good IDE, although Komodo is not bad.
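Below is a minimal Perl sketch of two of the pitfalls mentioned above: the separate string and numeric comparison operators, and the silent defaulting of uninitialized variables. The values are made up purely for illustration, and the snippet assumes nothing beyond a stock Perl 5 interpreter.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Pitfall 1: separate numeric and string comparison operators.
    # '==' compares numerically, 'eq' compares as strings; picking the
    # wrong one silently "works" because of contextual conversion.
    my $version = "5.010";
    print "numeric match\n" if $version == 5.01;    # true: both sides numify to 5.01
    print "string match\n"  if $version eq "5.01";  # false: "5.010" ne "5.01"

    # Pitfall 2: an uninitialized variable is treated as 0 in numeric
    # context and "" in string context; only 'use warnings' reports it.
    my $total;
    $total += 10;                   # emits a warning, but runs and yields 10
    print "total=$total\n";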
Those are difficult questions to answer without some way of classifying languages into different categories. Several such classifications
exist. First of all, as with natural languages, the number of people who speak a given language is a tremendous force that can overcome
any real or perceived deficiencies of the language; in programming languages, as in natural languages, nothing succeeds like success.
The second interesting metric is the number of applications written in a particular language that became part of Linux or, at least, are
included in the standard RHEL/Fedora/CentOS or Debian/Ubuntu repositories.
The third relevant metric is the number and quality of books available for the particular language.
The history of programming languages raises interesting general questions about the limits of complexity of programming languages. There
is strong historical evidence that a language with a simpler, or even simplistic, core (Basic, Pascal) has better chances of acquiring
a high level of popularity.
The underlying fact here is probably that most programmers are at best mediocre, and such programmers tend, on an intuitive level, to avoid
more complex, richer languages and to prefer, say, Pascal to PL/1 and PHP to Perl. Or at least they avoid them at a particular phase of language
development (C++ is not a simpler language than PL/1, but it was widely adopted because of the progress of hardware, the availability of compilers
and, not least, because it was associated with OO exactly at the time OO became a mainstream fashion).
Complex, non-orthogonal languages can succeed only as a result of a long period of language development from a smaller core (which usually adds complexity
-- just compare Fortran IV with Fortran 90, or PHP 3 with PHP 5). Attempts to ride some fashionable new trend by extending an
existing popular language to the new "paradigm" have also proved relatively successful (OO programming in the case of C++, which is a superset of C).
Historically, few complex languages were successful (PL/1, Ada, Perl, C++), and even when they were successful, their success typically
was temporary rather than permanent (PL/1, Ada, Perl). As Professor Wilkes noted (iee90):
Things move slowly in the computer language field but, over a sufficiently long period of time, it is possible to discern trends.
In the 1970s, there was a vogue among system programmers for BCPL, a typeless language. This has now run its course, and system programmers
appreciate some typing support. At the same time, they like a language with low level features that enable them to do things
their way, rather than the compiler's way, when they want to.
They continue to have a strong preference for a lean language. At present they tend to favor C in its various versions.
For applications in which flexibility is important, Lisp may be said to have gained strength as a popular programming language.
Further progress is necessary in the direction of achieving modularity. No language has so far emerged which exploits objects
in a fully satisfactory manner, although C++ goes a long way. ADA was progressive in this respect, but unfortunately it is
in the process of collapsing under its own great weight.
ADA is an example of what can happen when an official attempt is made to orchestrate technical advances. After the experience
with PL/1 and ALGOL 68, it should have been clear that the future did not lie with massively large languages.
I would direct the reader's attention to Modula-3, a modest attempt to build on the appeal and success of Pascal and Modula-2
[12].
The complexity of the compiler/interpreter also matters, as it affects portability: this is one thing that probably doomed
PL/1 (and later Ada), although these days a new language typically comes with an open source compiler (or,
in the case of scripting languages, an interpreter), so this is less of a problem.
Programming language design seeks power in simplicity and, when successful, begets beauty.
Choosing the trade-offs among contradictory requirements is a difficult task that requires good taste from the language designer
as much as mastery of theoretical principles and of practical implementation matters. Programming language design is software-engineering-complete.
D is a language that attempts to consistently do the right thing within the constraints it chose: system-level access to computing
resources, high performance, and syntactic similarity with C-derived languages. In trying to do the right thing, D sometimes stays
with tradition and does what other languages do, and other times it breaks tradition with a fresh, innovative solution. On occasion
that meant revisiting the very constraints that D ostensibly embraced. For example, large program fragments or indeed entire programs
can be written in a well-defined memory-safe subset of D, which entails giving away a small amount of system-level access for a large
gain in program debuggability.
You may be interested in D if the following values are important to you:
Performance. D is a systems programming language. It has a memory model that, although highly structured, is compatible
with C's and can call into and be called from C functions without any intervening translation.
Expressiveness. D is not a small, minimalistic language, but it does have a high power-to-weight ratio. You can define
eloquent, self-explanatory designs in D that model intricate realities accurately.
"Torque." Any backyard hot-rodder would tell you that power isn't everything; its availability is. Some languages are
most powerful for small programs, whereas other languages justify their syntactic overhead only past a certain size. D helps you
get work done in short scripts and large programs alike, and it isn't unusual for a large program to grow organically from a simple
single-file script.
Concurrency. D's approach to concurrency is a definite departure from the languages it resembles, mirroring the departure
of modern hardware designs from the architectures of yesteryear. D breaks away from the curse of implicit memory sharing (though
it allows statically checked explicit sharing) and fosters mostly independent threads that communicate with one another via
messages.
Generic code. Generic code that manipulates other code has been pioneered by the powerful Lisp macros and continued
by C++ templates, Java generics, and similar features in various other languages. D offers extremely powerful generic and generational
mechanisms.
Eclecticism. D recognizes that different programming paradigms are advantageous for different design challenges and
fosters a highly integrated federation of styles instead of One True Approach.
"These are my principles. If you don't like them, I've got others." D tries to observe solid principles of language
design. At times, these run into considerations of implementation difficulty, usability difficulties, and above all human
nature that doesn't always find blind consistency sensible and intuitive. In such cases, all languages must make judgment
calls that are ultimately subjective and are about balance, flexibility, and good taste more than anything else. In my opinion,
at least, D compares very favorably with other languages that inevitably have had to make similar decisions.
At the initial, most difficult stage of language development, the language should solve an important problem that is inadequately
solved by the currently popular languages. But at the same time the language has few chances to succeed unless it fits perfectly into
the current software fashion. This "fashion factor" is probably as important as several other factors combined, with the notable exception
of the "language sponsor" factor; the latter can make or break a language.
As in women's dress, fashion rules in language design, and with time this trend has become more and more pronounced.
A new language should represent the current fashionable trend. For example, OO programming was the entry ticket into
the world of "big, successful languages" since probably the early 90s (C++, Java, Python). Before that, "structured programming" and
"verification" (Pascal, Modula) played a similar role.
PL/1, Java, C#, Ada and Python are languages that had powerful sponsors. Pascal, Basic, Forth, and partially Perl (O'Reilly was
a sponsor for a short period of time) are examples of languages that had no such sponsor during the initial period of development.
C and C++ are somewhere in between.
But the language itself is not enough. Any language now needs a "programming environment", which consists of a set of libraries,
a debugger and other tools (make tool, lint, pretty-printer, etc.). The set of "standard" libraries and the debugger are probably the two most important
elements. They cost a lot of time (or money) to develop, and here the role of a powerful sponsor is difficult to overestimate.
While this is not a necessary condition for becoming popular, it really helps: other things being equal, the weight of the sponsor of
the language does matter. For example Java, a weak, inconsistent language (C-- with garbage collection and OO), was rammed down the industry's
throat on the strength of marketing and the huge amount of money spent on creating the Java programming environment.
The same was partially true for C# and Python. That's why Python, despite its "non-Unix" origin, is a more viable scripting language
now than, say, Perl (which is better integrated with Unix and has support for pointers and
regular expressions that was quite innovative for scripting languages), or Ruby (which has supported coroutines from day one, not as a "bolted on" feature as in Python).
As in political campaigns, negative advertising also matters. For example, Perl suffered greatly from smears comparing programs
written in it with "white noise", and then from the withdrawal of O'Reilly from the role of sponsor of the language (although it continued to milk
the Perl book publishing franchise ;-).
People proved to be pretty gullible, and in this sense language marketing is not that different from clothing marketing :-)
One very important classification of programming languages is based on the so-called level of the language. Essentially,
once there is at least one successful language on a given level, the success of other languages on the same level becomes more
problematic. The best chances of success belong to languages whose level is at least slightly higher than that of their successful predecessors.
The level of a language can informally be described as the number of statements (or, more precisely, the number of lexical
units (tokens)) needed to write a solution to a particular problem in one language versus another (see the short Perl sketch after this list).
In this way we can distinguish several levels of programming languages:
Lowest level. This level is occupied by assemblers and by languages designed for specific instruction sets, like
PL360.
Low level, with access to low-level architecture features (C, BCPL). These are also called system programming
languages and are, in essence, high-level assemblers. In these languages you need to specify details related to the machine
organization (the computer instruction set); memory is allocated explicitly.
High level, without automatic memory management and garbage collection (Fortran and Algol-style languages
like Modula, Pascal, PL/1, C++, VB). Most languages in this category are compiled.
High level, with automatic memory allocation for variables and garbage collection. Languages of this category
(Java, C#) are typically compiled not to the native instruction set of the computer they run on, but to some abstract
instruction set executed by a virtual machine.
Very high level languages (scripting languages, as well as Icon, SETL, and awk). Most are impossible to compile fully, as dynamic
features prevent generation of native code at compile time. They also typically use a virtual machine and garbage collection.
OS shells. These are also often called "glue" languages, as they provide integration of existing OS utilities. They
currently represent the highest level of languages available. This category is mainly represented by Unix shells such as bash and
ksh93, but Windows PowerShell belongs to the same category. Like scripting languages, they typically use a virtual machine and intermediate
code. They presuppose a specific OS as the programming environment and as such are less portable than the other categories.
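As a rough, informal illustration of "level": the complete word-frequency counter below takes a dozen lines of Perl, while a comparable C program has to manage a hash table and memory by hand and easily runs an order of magnitude longer. This is only a sketch (ASCII words only, input on stdin), not a serious text-processing tool.

    #!/usr/bin/perl
    # Count word frequencies on standard input and print the ten most
    # common words. Hashes, dynamic strings, regular expressions and
    # sorting are built in, which is what keeps the token count low.
    use strict;
    use warnings;

    my %count;
    while (my $line = <STDIN>) {
        $count{lc $1}++ while $line =~ /([A-Za-z']+)/g;
    }
    for my $word ((sort { $count{$b} <=> $count{$a} } keys %count)[0 .. 9]) {
        printf "%6d %s\n", $count{$word}, $word if defined $word;
    }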
Some people distinguish between "nanny languages" and "sharp razor" languages. The latter do not attempt to protect the user from his
errors, while the former usually go too far... The right compromise is extremely difficult to find.
For example, I consider the explicit availability of pointers an important feature of a language that greatly increases its
expressive power and far outweighs the risk of errors in the hands of unskilled practitioners. In other words, attempts to make a language
"safer" often misfire.
Another useful typology is based on the expressive style of the language:
Procedural. This is the programming style you're probably used to: procedural languages execute a sequence of statements that
lead to a result. In essence, a procedural language expresses the procedure to be followed to solve a problem. Procedural languages
typically use many variables and have heavy use of loops and other elements of "state", which distinguishes them from functional
programming languages. Functions in procedural languages may modify variables or have other side effects (e.g., printing out information)
other than the value that the function returns.
Functional. Employing a programming style often contrasted with procedural programming, functional programs typically
make little use of stored state, often eschewing loops in favor of recursive functions. The most popular and most successful functional
notation (most functional languages are failures, despite the interesting features they contain) is probably regular
expression notation. Another very successful non-procedural notation is the Unix pipe. All in all, functional languages
have a lot of problems, and none of them managed to get into the mainstream. All the talk about the superiority of Lisp remained just talk,
as Lisp limits the expressive power of the programmer by overloading the boat on one side.
Object-oriented. This is a popular subclass of procedural languages with better handling of namespaces (hierarchical
structuring of the namespace that resembles the Unix file system) and a couple of other conveniences for defining multiple-entry functions (class
methods in OO-speak). Classes, strictly speaking, are an evolution of the records introduced by Simula. The main difference from Cobol- and
PL/1-style records is that classes have executable components (pointers to functions) and are hierarchically organized, with subclasses
being lower-level sub-records that are still accessible from the namespace of the higher-level class. Purely hierarchically organized
structures were introduced in Cobol. Later PL/1 extended and refined them, introducing namespace copying (the LIKE attribute),
pointer-based records (BASED records), etc. C, being mostly a subset of PL/1, also used some of those refinements, but in a very limited way.
In a way, a PL/1 record is a non-inherited class without any methods. Some languages like Perl 5 take a "nuts and bolts" approach
to the introduction of OO constructs, exposing the kitchen. As such those implementations are highly educational for students, as they
can see how the "object-oriented" kitchen operates. For example, the class of an object in Perl 5 is passed as a hidden first parameter
with each method call, "behind the scenes" (see the first sketch after this list).
Scripting languages are typically procedural but may contain non-procedural elements (regular expressions) as well as
elements of object-oriented languages (Python, Ruby), and some of them support coroutines. They fall into their own category because
they are higher-level languages than compiled languages or languages with an abstract machine and garbage collection (Java). Scripting
languages usually implement automatic garbage collection. Variable types in scripting languages are typically dynamic, declarations
of variables are not strictly needed (but can be used), and there is usually no compile-time checking of the type compatibility
of operands in classic operations. Some, like Perl, try to convert a variable into the type required by a particular operation (for
example, a string into a numeric value if the "+" operation is used). Possible errors are swept under the carpet: uninitialized variables
are typically handled as having the value zero in numeric operations and the null string in string operations, and when an operation can't
be performed it returns zero, nil or some other special value. Some scripting languages have a special value, UNDEF, which
makes it possible to determine whether a particular variable was assigned any value before it is used in an expression (see the second sketch below).
Logic. Logic programming languages allow programmers to make declarative statements (possibly in first-order logic: "grass
implies green", for example). The most successful was probably Prolog. In a way this is another
type of functional language, and Prolog can be viewed as a regular-expression language on steroids. The success of this type of language was, and is,
very limited. Prolog was used in the IBM Tivoli TEC monitoring system and proved to be a failure: it did not match the skills of the sysadmins
who managed TEC, and the rules used were mostly the defaults.
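A minimal sketch of that "exposed kitchen": in Perl 5 a class is just a package, an object is a blessed reference, and the invocant really does arrive as an ordinary first argument. The Counter class below is a made-up example, not part of any library.

    #!/usr/bin/perl
    use strict;
    use warnings;

    package Counter;               # a "class" is just a package (namespace)

    sub new {                      # constructor: bless a hash reference into the package
        my ($class, %args) = @_;
        return bless { count => $args{start} // 0 }, $class;
    }

    sub increment {                # a "method" is a plain sub; the object arrives
        my ($self, $by) = @_;      # as the hidden first argument
        $self->{count} += $by // 1;
        return $self->{count};
    }

    package main;

    my $c = Counter->new(start => 5);    # same as Counter::new('Counter', start => 5)
    print $c->increment(3), "\n";        # same as Counter::increment($c, 3); prints 8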
Those categories are not pure and overlap somewhat. For example, it's possible to program in an object-oriented style in C, or even
in assembler. Some scripting languages like Perl have a built-in regular expression engine that is part of the language, so they have
a functional component despite being procedural. Some relatively low-level (Algol-style) languages implement garbage collection;
a good example is Java. There are also scripting languages that compile to a common language framework designed for high-level
languages; for example, IronPython compiles to .NET.
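A second short Perl sketch illustrates the scripting-language properties listed above: dynamic typing, the regular expression engine built into the language, and the UNDEF value. The log line is a made-up example.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Dynamic typing: the same scalar can hold a string, then a number.
    my $value = "3 apples";
    $value = length $value;                 # now holds the number 8

    # Regular expressions are part of the language itself (the "functional
    # component" of an otherwise procedural scripting language).
    my $log = "2017-06-12 ERROR disk full";
    if ($log =~ /^(\d{4})-(\d{2})-(\d{2})\s+(\w+)/) {
        print "year=$1 level=$4\n";         # prints: year=2017 level=ERROR
    }

    # UNDEF lets the program tell "never assigned" apart from "empty".
    my $maybe;
    print "still undefined\n" unless defined $maybe;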
The popularity of programming languages is not strongly connected to their quality. Some languages that look like a collection of
language designer blunders (PHP, Java) became quite popular. Java became the new Cobol, and PHP dominates the construction of dynamic Web sites.
The dominant technology for such Web sites is often called LAMP, which means Linux - Apache - MySQL - PHP. Being a highly simplified but
badly constructed subset of Perl (a kind of new Basic for dynamic Web sites), PHP provides the most depressing experience.
I was unpleasantly surprised when I learned that the Wikipedia engine was rewritten from Perl to PHP some time ago, but this fact
illustrates the trend well: mediocre programmers outnumber talented programmers by a factor of
100 or more.
So language design quality has little to do with a language's success in the marketplace. Simpler languages have wider appeal,
as the success of PHP (which at the beginning came at the expense of Perl) suggests. In addition, much depends on whether the language has a powerful
sponsor, as was the case with Java (Sun and IBM) and PHP (Facebook). This is partially true for Python (Google) as well, but only after the
designers of the language had spent many years fighting for survival.
Progress in programming languages has been very uneven and contains several setbacks, like Java and PHP (and partially C++). Currently this progress is usually
associated with scripting languages. The history of programming languages raises interesting general questions about the "laws" of programming
language design. First, let's reproduce several notable quotes:
Knuth law of optimization: "Premature optimization is the root of all evil (or at least most of it) in programming."
- Donald Knuth
"Greenspun's Tenth Rule of Programming: any sufficiently complicated C or Fortran program contains an ad hoc informally-specified
bug-ridden slow implementation of half of Common Lisp." - Phil Greenspun
"The key to performance is elegance, not battalions of special cases."- Jon Bentley and Doug McIlroy
"Some may say Ruby is a bad rip-off of Lisp or Smalltalk, and I admit that. But it is nicer to ordinary people." - Matz, LL2
"Most papers in computer science describe how their author learned what someone else already knew." - Peter Landin
"The only way to learn a new programming language is by writing programs in it." - Kernighan and Ritchie
"If I had a nickel for every time I've written "for (i = 0; i < N; i++)" in C, I'd be a millionaire." - Mike Vanier
"Language designers are not intellectuals. They're not as interested in thinking as you might hope. They just want to get a language
done and start using it." - Dave Moon
"Don't worry about what anybody else is going to do. The best way to predict the future is to invent it." - Alan Kay
"Programs must be written for people to read, and only incidentally for machines to execute." - Abelson & Sussman, SICP, preface
to the first edition
Please note that it is one thing to read a language manual and appreciate how good the concepts are, and quite another to bet your project on
a new, unproven language without good debuggers, manuals and, most importantly, libraries. The debugger is very important, but standard
libraries are crucial: they are the factor that makes or breaks a new language.
In this sense languages are much like cars. For many people a car is the thing they use to get to work and to the shopping mall; they
are not very interested in whether the engine is inline or V-type, or whether the transmission uses fuzzy logic. What they care about is safety, reliability,
mileage, insurance and the size of the trunk. In this sense "worse is better" is very true. I already mentioned the importance of the debugger.
The other important criterion is the quality and availability of libraries. Actually, libraries account for about 80% of the usability of a
language; in a sense libraries are more important than the language itself...
The popular belief that scripting is an "unsafe", "second rate" or "prototype-only" solution is completely wrong. If a project has died, then
it does not matter what the implementation language was; for a successful project on a tough schedule, a scripting language (especially
in a dual scripting language + C combination, for example Tcl + C) is an optimal blend for a large class of tasks. Such an approach
helps to separate architectural decisions from implementation details much better than any OO model does.
Moreover, even for tasks that handle a fair amount of computation and data (computationally intensive tasks), languages such as Python
and Perl are often (but not always!) competitive with C++, C# and, especially, Java.
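One way to get that dual "scripting language + C" blend in Perl is the Inline::C module (a CPAN module that has to be installed separately and needs a C compiler available); the sketch below keeps a tight numeric loop in C and everything else in Perl. It only illustrates the approach and is not a benchmark.

    #!/usr/bin/perl
    # Sketch of the "scripting + C" combination: the numeric kernel is
    # written in C via the CPAN module Inline::C, the rest stays in Perl.
    use strict;
    use warnings;
    use Inline C => q{
        long sum_of_squares(long n) {      /* tight loop kept in C for speed */
            long i, total = 0;
            for (i = 1; i <= n; i++) total += i * i;
            return total;
        }
    };

    print sum_of_squares(1000), "\n";      # the C function is called like any Perl sub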
A brief timeline of programming languages
1946
Konrad Zuse, a German engineer working alone while hiding out in the Bavarian Alps, develops Plankalkul. He applies
the language to, among other things, chess.
1949
Short Code , the first computer language actually used on an electronic computing device, appears. It is, however,
a "hand-compiled" language.
Fifties
1951
Grace Hopper , working for Remington
Rand, begins design work on the first widely known compiler, named A-0. When the language is released by Rand in 1957, it is called
MATH-MATIC.
1952
Alick E. Glennie , in his spare time at the University of Manchester, devises a programming system called AUTOCODE,
a rudimentary compiler.
1957
FORTRAN --mathematical FORmula TRANslating system--appears. Heading the team is John Backus, who goes on to contribute
to the development of ALGOL and the well-known syntax-specification system known as BNF.
1958
FORTRAN II appears, able to handle subroutines and links to assembly language.
LISP. John McCarthy at M.I.T. begins work on LISP--LISt Processing.
Algol-58. The original specification for ALGOL appears. The specification does not describe how data will be input
or output; that is left to the individual implementations.
1959
LISP 1.5 appears.
COBOL is created by the Conference on Data Systems and Languages (CODASYL).
Sixties
1960
ALGOL 60 , the specification for Algol-60, the first block-structured language, appears. This is the root of the family
tree that will ultimately produce the likes of Pascal. ALGOL goes on to become the most popular language in Europe in the mid-
to late-1960s. Compilers for the language were quite difficult to write, and that hampered its widespread use. FORTRAN managed to
hold its own in the area of numeric computations, and Cobol in data processing. Only PL/1 (which was released in 1964) managed
to advance the ideas of Algol 60 to a reasonably wide audience.
APL. Sometime in the early 1960s, Kenneth Iverson begins work on the language that will become APL--A Programming Language.
It uses a specialized character set that, for proper use, requires APL-compatible I/O devices.
Discovery of the context-free language formalism. The 1960s also saw the rise of automata theory and the theory of formal
languages. Noam Chomsky introduced the notion of
context-free languages and later became well known for his theory that language is "hard-wired" in human brains, and for
his criticism of American foreign policy.
1962
SNOBOL is designed in 1962 at Bell Labs by R. E. Griswold and I. Polonsky -- the sure-fire winner
of the "clever acronym" award: SNOBOL, StriNg-Oriented symBOlic Language. It will spawn other clever acronyms: FASBOL, a SNOBOL
compiler (in 1971), and SPITBOL--SPeedy ImplemenTation of snoBOL--also in 1971.
APL is documented in Iverson's book, A Programming Language .
FORTRAN IV appears.
1963
ALGOL 60 is revised.
PL/1. Work begins on PL/1.
1964
System/360 is announced in April of 1964.
PL/1 is released with a high-quality compiler (the F-compiler), which beats most compilers of the time in the quality of both its
compile-time and run-time diagnostics. Later two brilliantly written and in some respects unsurpassed
compilers, the debugging and optimizing PL/1 compilers, were added; both represented the state of the art of compiler writing.
Cornell University implemented a teaching subset of PL/1 called PL/C, with a compiler that had probably the most advanced
error detection and correction capabilities of any batch compiler of all time. PL/1 was also adopted as the system implementation
language for Multics.
APL\360 is implemented.
BASIC. At Dartmouth College, professors John G. Kemeny and Thomas E. Kurtz invent BASIC. The first implementation
runs on a timesharing system. The first BASIC program runs at about 4:00 a.m. on May 1, 1964.
1965
SNOBOL3 appears.
1966
FORTRAN 66 appears.
LISP 2 appears.
Work begins on LOGO at Bolt, Beranek & Newman. The team is headed by Wally Feurzeig and includes Seymour Papert. LOGO
is best known for its "turtle graphics."
1967
SNOBOL4 , a much-enhanced SNOBOL, appears.
1968
The first volume of The Art of Computer Programming is published in 1968 and instantly becomes a classic.
Donald Knuth (b. 1938) later published two additional volumes of his world-famous
three-volume treatise.
The structured programming movement starts -- the first religious cult in programming language design. It was created by Edsger
Dijkstra, who published his infamous letter "Go To
Statement Considered Harmful" (CACM 11(3), March 1968, pp. 147-148). While misguided, this cult somewhat contributed to the design
of control structures in programming languages, serving as a kind of stimulus for the creation of a richer set of control structures
in new programming languages (with PL/1 and its derivative C probably being the two popular programming languages which incorporated
these new tendencies). Later it degenerated into a completely fundamentalist and mostly counter-productive verification cult.
ALGOL 68, the successor of ALGOL 60, appears. It was the first extensible language that got some traction, but generally
it was a flop. Some members of the specification committee -- including C.A.R. Hoare and Niklaus Wirth -- protested its approval on
the basis of its overcomplexity. They proved to be partially right: ALGOL 68 compilers proved to be difficult to implement, and
that doomed the language. Dissatisfied with the complexity of Algol 68, Niklaus Wirth begins his work on a simple teaching language
which later becomes Pascal.
ALTRAN , a FORTRAN variant, appears.
COBOL is officially defined by ANSI.
Niklaus Wirth begins work on the design of Pascal (in part as a reaction to the overcomplexity of Algol 68). Like
Basic before it, Pascal was specifically designed for teaching programming at universities, and as such was
designed to allow a one-pass recursive descent compiler. But the language had multiple grave deficiencies. While a talented
language designer, Wirth went overboard in simplifying the language (for example, in the initial version of the language loops
were allowed to have only an increment of one, arrays were only static, etc.). It also was used to promote bizarre ideas of correctness
proofs of programs inspired by the verification movement with its high priest Edsger Dijkstra -- the first (or maybe the second,
after structured programming) mass religious cult in programming language history, which destroyed the careers of several talented
computer scientists who joined it, such as David Gries. Some of the blunders in Pascal's design were later corrected in Modula and
Modula-2.
1969
500 people attend an APL conference at IBM's headquarters in Armonk, New York. The demands for APL's distribution are
so great that the event is later referred to as "The March on Armonk."
Seventies
1970
Forth. Sometime in the early 1970s, Charles Moore writes the first significant programs in his new language,
Forth.
Prolog. Work on Prolog begins about this time. For some time Prolog became fashionable thanks to Japan's Fifth Generation initiative.
Later it returned to relative obscurity, although it did not completely disappear from the language map.
Also sometime in the early 1970s , work on Smalltalk begins at Xerox PARC, led by Alan Kay. Early versions will include
Smalltalk-72, Smalltalk-74, and Smalltalk-76.
An implementation of Pascal appears on a CDC 6000-series computer.
Icon , a descendant of SNOBOL4, appears.
1972
The manuscript for Konrad Zuse's Plankalkul (see 1946) is finally published.
Dennis Ritchie produces C. The definitive reference manual for it will not appear until 1974.
PL/M. In 1972 Gary Kildall implemented a subset of PL/1, called "PL/M" for microprocessors. PL/M was used to write the CP/M
operating system - and much application software running on CP/M and MP/M. Digital Research also sold a PL/I compiler for
the PC written in PL/M. PL/M was used to write much other software at Intel for the 8080, 8085, and Z-80 processors during the
1970s.
The first implementation of Prolog appears, by Alain Colmerauer and Philippe Roussel.
1974
Donald E. Knuth publishes the article that gave a decisive blow to the "structured programming fundamentalists" led by Edsger Dijkstra:
"Structured Programming with go to Statements",
ACM Computing Surveys 6(4),
pp. 261-301 (1974).
Another ANSI specification for COBOL appears.
1975
Paul Abrahams (Courant Institute of Mathematical Sciences) destroys the credibility of the "structured programming" cult in his article
"'Structured programming' considered harmful" (SIGPLAN Notices, April 1975, pp. 13-24).
Tiny BASIC by Bob Albrecht and Dennis Allison (implementation by Dick Whipple and John Arnold) runs on a microcomputer
in 2 KB of RAM. It is usable on a 4 KB machine, which leaves 2 KB available for the program.
Microsoft is formed on April 4, 1975 to develop and sell
BASIC interpreters
for the Altair 8800. Bill Gates and Paul Allen
write a version of BASIC that they sell to MITS (Micro Instrumentation and Telemetry Systems) on a per-copy royalty basis.
MITS is producing the Altair, one of the earliest 8080-based microcomputers to come with an interpreter for a programming
language.
Scheme , a LISP dialect by G.L. Steele and G.J. Sussman, appears.
Pascal User Manual and Report, by Jensen and Wirth, is published. It is still considered by many to be the definitive
reference on Pascal. This was a kind of attempt to replicate the success of Basic, relying on the growing "structured programming" fundamentalist
movement started by Edsger Dijkstra. Pascal acquired a large following in universities as the compiler was made freely available. It
was adequate for teaching, had a fast compiler and was superior to Basic.
B.W. Kernighan describes RATFOR--RATional FORTRAN. It is a preprocessor that allows C-like control structures in FORTRAN.
RATFOR is used in Kernighan and Plauger's "Software Tools," which appears in 1976.
1976
The backlash against Dijkstra's
correctness-proofs pseudo-religious cult starts:
Andrew Tanenbaum (Vrije Universiteit, Amsterdam) publishes the paper "In Defense of Program Testing, or Correctness Proofs Considered
Harmful" (SIGPLAN Notices, May 1976, pp. 64-68). It made a crucial contribution to the "structured programming without GOTO"
debate and was a decisive blow to the structured programming fundamentalists led by
E. Dijkstra.
Maurice Wilkes, the famous computer scientist and the first president of the British Computer Society (1957-1960), attacks the
"verification cult" in his article "Software Engineering and Structured Programming", published in IEEE Transactions on Software
Engineering (SE-2, No. 4, December 1976, pp. 274-276). The paper was also presented as the keynote address at the Second International
Conference on Software Engineering, San Francisco, CA, October 1976.
Design System Language , considered to be a forerunner of PostScript, appears.
1977
AWK was probably the second (after Snobol) string-processing language to make extensive use of regular expressions. The
first version was created at Bell Labs by Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan in 1977. This was also the
first widely used language with built-in garbage collection.
The ANSI standard for MUMPS -- Massachusetts General Hospital Utility Multi-Programming System -- appears. Used originally
to handle medical records, MUMPS recognizes only a string data-type. Later renamed M.
The design competition that will produce Ada begins. Honeywell Bull's team, led by Jean Ichbiah, will win the competition.
Ada never lived up to its promises and became an expensive flop.
Kim Harris and others set up FIG, the FORTH interest group. They develop FIG-FORTH, which they sell for around $20.
UCSD Pascal. In the late 1970s , Kenneth Bowles produces UCSD Pascal, which makes Pascal available on PDP-11
and Z80-based computers.
Niklaus Wirth begins work on Modula, the forerunner of Modula-2 and successor to Pascal. It was the first widely used language
to incorporate the concept of coroutines.
1978
AWK -- a text-processing language named after the designers, Aho, Weinberger, and Kernighan -- appears.
FORTRAN 77: The ANSI standard for FORTRAN 77 appears.
1979
Bourne shell. The Bourne shell
is included in Unix Version 7. It was inferior
to the C shell, developed in parallel, but gained tremendous popularity on the strength of AT&T's ownership of Unix.
C shell. The Second Berkeley Software Distribution (2BSD)
is released in May 1979. It includes updated versions of the 1BSD software as well as two new programs by Bill Joy that persist on
Unix systems to this day: the vi text editor (a visual version of ex) and the
C shell.
REXX is designed and first implemented between 1979
and mid-1982 by Mike Cowlishaw of IBM.
Bjarne Stroustrup develops a set of languages -- collectively referred to as "C With Classes" -- that serve as the
breeding ground for C++.
1981
C-shell was extended into tcsh.
Effort begins on a common dialect of LISP, referred to as Common LISP.
Japan begins the Fifth Generation Computer System project. The primary language is Prolog.
1982
ISO Pascal appears.
In 1982 REXX, one of the first scripting languages, is released by IBM as a product, four years after AWK was released.
PostScript appears. It revolutionized printing on dot matrix and laser printers.
1983
REXX is included in the third release of IBM's VM/CMS, shipped in 1983. Over
the years IBM included REXX in almost all of its operating systems (VM/CMS, VM/GCS, MVS TSO/E, AS/400, VSE/ESA, AIX, CICS/ESA,
PC DOS, and OS/2), and made versions available for Novell NetWare, Windows, Java, and Linux.
Smalltalk-80: The Language and Its Implementation, by Goldberg et al., is published -- an influential early book
that promoted the ideas of OO programming.
Ada appears. Its name comes from Lady Augusta Ada Byron, Countess of Lovelace and daughter of the English poet Byron.
She has been called the first computer programmer because of her work on Charles Babbage's analytical engine. In 1983, the Department
of Defense directs that all new "mission-critical" applications be written in Ada.
In late 1983 and early 1984, Microsoft and Digital Research both release the first C compilers for microcomputers.
In July , the first implementation of C++ appears. The name was coined by Rick Mascitti.
In November , Borland's Turbo Pascal hits the scene like a nuclear blast, thanks to an advertisement in BYTE magazine.
1984
GCC development starts. In 1984 Stallman starts his work on an open source C compiler that became widely
known as GCC. The same year Steven Levy's book "Hackers" is published,
with a chapter devoted to RMS that presented him in an extremely favorable light.
Icon. R. E. Griswold designs the Icon programming language (see
overview). Like Perl, Icon is a high-level programming
language with a large repertoire of features for processing data structures and character strings. Icon is an imperative, procedural
language with a syntax reminiscent of C and Pascal, but with semantics at a much higher level (see Griswold, Ralph E. and Madge
T. Griswold, The Icon Programming Language, Second Edition, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1990, ISBN 0-13-447889-4).
APL2. A reference manual for APL2 appears. APL2 is an extension of APL that permits nested arrays.
1985
REXX. The first PC implementation of REXX was released.
Forth controls the submersible sled that locates the wreck of the Titanic.
Vanilla SNOBOL4 for microcomputers is released.
Methods, a line-oriented Smalltalk for PCs, is introduced.
The first version of GCC that was able to compile itself appeared in late 1985. The same year the GNU Manifesto is published.
1986
Smalltalk/V appears--the first widely available version of Smalltalk for microcomputers.
Apple releases Object Pascal for the Mac.
Borland releases Turbo Prolog.
Charles Duff releases Actor, an object-oriented language for developing Microsoft Windows applications.
Eiffel , another object-oriented language, appears.
C++ appears.
1987
PERL. The first version of Perl, Perl 1.000,
is released by Larry Wall in 1987. See the excellent
PerlTimeline for more information.
Turbo Pascal version 4.0 is released.
1988
The specification for CLOS -- Common LISP Object System -- is published.
Oberon. Niklaus Wirth finishes Oberon, his follow-up to Modula-2. The language was stillborn, but some of its ideas
found their way into Python.
PERL 2 was released.
Tcl is created. The Tcl scripting language grew out of
John Ousterhout's work on design tools for integrated circuits at the University
of California at Berkeley in the early 1980s. In the fall of 1987, while on sabbatical at DEC's Western Research Laboratory,
he decided to build an embeddable command language. He started work on Tcl in early 1988, and began using the first version of
Tcl in a graphical text editor in the spring of 1988. The idea of Tcl is different from, and to a certain extent more interesting than,
the idea of Perl: Tcl was designed as an embeddable macro language for applications. In this sense Tcl is closer to REXX (which
was probably one of the first languages used both as a shell language and as a macro language). Important products
that use Tcl are the Tk toolkit and Expect.
1989
The ANSI C specification is published.
C++ 2.0 arrives in the form of a draft reference manual. The 2.0 version adds features such as multiple inheritance
and pointers to members.
Perl 3.0, released in 1989, was distributed under the GNU General Public License -- one of the first major open source projects
distributed under the GPL, and probably the first outside the FSF.
zsh. Paul Falstad writes zsh, a superset of ksh88 which also has many csh features.
1990
C++ 2.1, detailed in The Annotated C++ Reference Manual by B. Stroustrup et al., is published. This adds templates
and exception-handling features.
FORTRAN 90 includes such new elements as case statements and derived types.
Kenneth Iverson and Roger Hui present J at the APL90 conference.
1991
Visual Basic wins BYTE's Best of Show award at Spring COMDEX.
PERL 4 released. In January 1991 the first edition of Programming Perl, a.k.a. "the Pink Camel", by Larry Wall and Randal
Schwartz, is published by O'Reilly and Associates. It described the new 4.0 version of Perl, which was released simultaneously
(in March of the same year). The final version of Perl 4 was released in 1993. Larry Wall is awarded the Dr. Dobb's Journal Excellence
in Programming Award (March).
Python 0.9.0 is released to alt.sources on February 20, 1991.
1992
Dylan -- named for Dylan Thomas -- an object-oriented language resembling Scheme, is released by Apple.
1993
ksh93 is released by David Korn. This was a reaction to the success of Perl and the last of the line of AT&T-developed shells.
PERL 4.036 is released and proves to be very stable. This last version of Perl 4 was the first widely used version of Perl. The timing was simply perfect: it was already widely available
before the Web explosion in 1994.
1994
Python 1.0.0 is released on January 26, 1994.
comp.lang.python, the primary discussion forum for Python, is formed.
PERL 5. Version 5 was released in October of
1994.
Microsoft incorporates Visual Basic for Applications into Excel, later creating the whole Office environment
(Word, Excel, PowerPoint, Outlook, FrontPage, etc.) with a single scripting language.
1995
JavaScript is released in September 1995, featuring a prototype-based object model. In JavaScript, an
object is an
associative array augmented with a
prototype; each string key provides the name of an object property, and there are two syntactic ways to specify
such a name: dot notation (obj.x = 10) and bracket notation (obj['x'] = 10). A property may be added,
rebound, or deleted at run time. Most properties of an object (and any property that belongs to an object's prototype
inheritance chain) can be enumerated using a for...in loop.
In February , ISO accepts the 1995 revision of the Ada language. Called Ada 95, it includes OOP features and support
for real-time systems.
Ruby. December: first release, 0.95.
1996
JScript: the JavaScript derivative JScript is
released by Microsoft.
First ANSI C++ standard.
Ruby 1.0 released. Did not gain much popularity until later.
1997
Java. In 1997 Java is released. This was basically an attempt to create a "Basic C++", originally intended for
embedded applications. This subset of C++ uses the standard Simula object model, but it was implemented on a virtual machine, and the switch to a VM
was the major and only innovation that Java brought to the world. Sun launches a tremendous and widely successful campaign to replace Cobol with
Java as the standard language for writing commercial applications in the industry.
2009
JavaScript: ECMAScript 5 is finally released in December 2009.
2011
Dennis Ritchie, the creator of C, dies. He was only 70 at the time.
2017
JavaScript: ECMAScript 2017 is released in June 2017.
Special note on Scripting languages
Scripting helps to avoid the OO trap that is pushed by
"a horde of practically illiterate researchers
publishing crap papers in junk conferences."
Despite the fact that scripting languages are a really important computer science phenomenon, they are usually happily ignored in university
curricula. Students are usually indoctrinated (or, in less politically correct terms, "brainwashed") in Java and OO
programming ;-)
This site tries to give scripting languages proper emphasis and promotes them as an alternative to the mainstream
reliance on the "Java as the new Cobol" approach to software development. Please read my introduction to the topic, which was recently converted
into the article: A Slightly Skeptical View on
Scripting Languages.
The tragedy of the scripting language designer is that there is no way to overestimate the level of abuse of any feature of the language.
Half of all programmers are by definition below average, and it is this half that matters most in an enterprise environment. In a way,
the higher the level of the programmer, the less relevant the limitations of the language are for him. That's why statements like "Perl is
badly suited for large project development" are plain silly. With proper discipline it is perfectly suitable, and programmers
can be more productive with Perl than with Java. The real question is "What is the quality and size of the team?"
Scripting is a part of the Unix cultural tradition, and Unix was the initial development platform for most mainstream scripting languages,
with the exception of REXX. But they are portable, and all of them can now be used on Windows and other OSes.
List of Softpanorama pages related to scripting languages
Different scripting languages provide different levels of integration with the base OS API (for example, Unix or Windows). For example,
IronPython compiles to .NET and provides a pretty high level of integration with Windows. The same is true of Perl and
Unix: almost all Unix system calls are available directly from Perl. Moreover, Perl integrates most of the Unix API in a very natural way,
making it a perfect replacement for the shell for coding complex scripts. It also has a very good debugger; the latter is a weak point of shells
like bash and ksh93.
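A small sketch of the "Perl as a better shell" point: directory traversal, stat(2) and file tests are all built into the language, and the same script can be single-stepped with the bundled debugger (perl -d). The /tmp path and the 1 MB threshold are arbitrary choices for the example.

    #!/usr/bin/perl
    # Report files in /tmp larger than 1 MB, using Unix API calls that are
    # built into Perl (opendir/readdir, stat, file test operators).
    use strict;
    use warnings;

    my $dir = '/tmp';
    opendir my $dh, $dir or die "cannot open $dir: $!";
    for my $name (sort readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;                    # plain files only
        my @st = stat $path;                     # same fields as stat(2)
        printf "%10d  %s\n", $st[7], $path if $st[7] > 1_000_000;
    }
    closedir $dh;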
Unix proved that treating everything as a file is a powerful OS paradigm. In a similar way, scripting languages proved that "everything
is a string" is also an extremely powerful programming paradigm.
Along with pages devoted to major scripting languages, this site has many pages devoted to scripting in different applications.
There are more than a dozen "Perl/scripting tools for a particular area" pages. The most developed and up-to-date pages
of this set are probably Shells and Perl. This page's
main purpose is to follow the changes in programming practice that can be called the "rise of scripting", as predicted in
John Ousterhout's famous article
Scripting: Higher Level Programming for the 21st Century,
published in IEEE Computer (1998). In this brilliant paper he wrote:
...Scripting languages such as Perl and Tcl represent a very different style of programming than system programming languages
such as C or Java. Scripting languages are designed for "gluing" applications; they use typeless approaches to achieve a higher level
of programming and more rapid application development than system programming languages. Increases in computer speed and changes
in the application mix are making scripting languages more and more important for applications of the future.
...Scripting languages and system programming languages are complementary, and most major computing platforms since the 1960's
have provided both kinds of languages. The languages are typically used together in component frameworks, where components are created
with system programming languages and glued together with scripting languages. However, several recent trends, such as faster machines,
better scripting languages, the increasing importance of graphical user interfaces and component architectures, and the growth of
the Internet, have greatly increased the applicability of scripting languages. These trends will continue over the next decade,
with more and more new applications written entirely in scripting languages and system programming languages used primarily for creating
components.
My e-book Portraits of Open Source Pioneers contains several chapters on scripting
(most are in early draft stage) that expand on this topic.
The reader must understand that the treatment of scripting languages in the press, and especially the academic press, is far from
fair: entrenched academic interests often promote old or commercially supported paradigms until they retire, so a change of paradigm often
is possible only with the change of generations. And people tend to live longer these days... Please also be aware that even respectable
academic magazines like Communications of the ACM and IEEE Software often promote "cargo cult software engineering" like
the Capability Maturity Model (CMM).
mweerden ,
Assume that I have programs P0 , P1 , ... P(n-1)
for some n > 0 . How can I easily redirect the output of program
Pi to program P(i+1 mod n) for all i ( 0 <= i
< n )?
For example, let's say I have a program square , which repeatedly reads a
number and than prints the square of that number, and a program calc , which
sometimes prints a number after which it expects to be able to read the square of it. How do
I connect these programs such that whenever calc prints a number,
square squares it returns it to calc ?
Edit: I should probably clarify what I mean with "easily". The named pipe/fifo solution is
one that indeed works (and I have used in the past), but it actually requires quite a bit of
work to do properly if you compare it with using a bash pipe. (You need to get a not yet
existing filename, make a pipe with that name, run the "pipe loop", clean up the named pipe.)
Imagine you could no longer write prog1 | prog2 and would always have to use
named pipes to connect programs.
I'm looking for something that is almost as easy as writing a "normal" pipe. For instance
something like { prog1 | prog2 } >&0 would be great. – mweerden, asked Sep 2 '08
After spending quite some time yesterday trying to redirect stdout to
stdin , I ended up with the following method. It isn't really nice, but I think
I prefer it over the named pipe/fifo solution.
read | { P0 | ... | P(n-1); } >/dev/fd/0
The { ... } >/dev/fd/0 is to redirect stdout to stdin for the pipe
sequence as a whole (i.e. it redirects the output of P(n-1) to the input of P0). Using
>&0 or something similar does not work; this is probably because bash
assumes 0 is read-only while it doesn't mind writing to /dev/fd/0
.
The initial read -pipe is necessary because without it both the input and
output file descriptor are the same pts device (at least on my system) and the redirect has
no effect. (The pts device doesn't work as a pipe; writing to it puts things on your screen.)
By making the input of the { ... } a normal pipe, the redirect has the desired
effect.
To illustrate with my calc / square example:
function calc() {
# calculate sum of squares of numbers 0,..,10
sum=0
for ((i=0; i<10; i++)); do
echo $i # "request" the square of i
read ii # read the square of i
echo "got $ii" >&2 # debug message
let sum=$sum+$ii
done
echo "sum $sum" >&2 # output result to stderr
}
function square() {
# square numbers
read j # receive first "request"
while [ "$j" != "" ]; do
let jj=$j*$j
echo "square($j) = $jj" >&2 # debug message
echo $jj # send square
read j # receive next "request"
done
}
read | { calc | square; } >/dev/fd/0
Running the above code produces interleaved "square(...) = ..." and "got ..." debug messages on stderr, ending with "sum 285".
Of course, this method is quite a bit of a hack. Especially the read part has
an undesired side-effect: termination of the "real" pipe loop does not lead to termination of
the whole. I couldn't think of anything better than read as it seems that you
can only determine that the pipe loop has terminated by trying to write something to it.
– mweerden, answered Sep 4 '08
Nice solution. I had to do something similar using netcat inside a loop and worked around the
'read' side effect by 'closing' its input with an 'echo'. In the end it was something like
this : echo | read | { P0 | ... | P(n-1); } >/dev/fd/0 – Thiago de Arruda Nov 30 '11 at
16:29
Douglas Leeder , 2008-09-02 20:57:53
A named pipe might do it:
$ mkfifo outside
$ <outside calc | square >outside &
$ echo "1" >outside ## Trigger the loop to start
This is a very interesting question. I (vaguely) remember an assignment very similar in
college 17 years ago. We had to create an array of pipes, where our code would get
filehandles for the input/output of each pipe. Then the code would fork and close the unused
filehandles.
I'm thinking you could do something similar with named pipes in bash. Use mknod or mkfifo
to create a set of pipes with unique names you can reference then fork your program.
– Mark Witczak, answered Sep 2 '08
My solution uses pipexec (most of the function implementation comes
from your answer):
square.sh
function square() {
# square numbers
read j # receive first "request"
while [ "$j" != "" ]; do
let jj=$j*$j
echo "square($j) = $jj" >&2 # debug message
echo $jj # send square
read j # receive next "request"
done
}
square $@
calc.sh
function calc() {
# calculate sum of squares of numbers 0,..,10
sum=0
for ((i=0; i<10; i++)); do
echo $i # "request" the square of i
read ii # read the square of i
echo "got $ii" >&2 # debug message
let sum=$sum+$ii
done
echo "sum $sum" >&2 # output result to stderr
}
calc $@
Comment: pipexec was designed to start processes and build arbitrary pipes in between.
Because bash functions cannot be handled as processes, there is the need to have the
functions in separate files and use a separate bash. – Andreas Florath, answered Mar 14 '15
I doubt sh/bash can do it. ZSH would be a better bet, with its MULTIOS and coproc
features. – Penz, answered Sep 2 '08
A command stack can be composed as string from an array of arbitrary commands and
evaluated with eval. The following example gives the result 65536.
function square ()
{
read n
echo $((n*n))
} # ---------- end of function square ----------
declare -a commands=( 'echo 4' 'square' 'square' 'square' )
#-------------------------------------------------------------------------------
# build the command stack using pipes
#-------------------------------------------------------------------------------
declare stack=${commands[0]}
for (( COUNTER=1; COUNTER<${#commands[@]}; COUNTER++ )); do
stack="${stack} | ${commands[${COUNTER}]}"
done
#-------------------------------------------------------------------------------
# run the command stack
#-------------------------------------------------------------------------------
eval "$stack"
To get the logic right, just minor changes are required. Use:
while ! df | grep '/toBeMounted'
do
sleep 2
done
echo -e '\a'Hey, I think you wanted to know that /toBeMounted is available finally.
Discussion
The corresponding code in the question was:
while df | grep -v '/toBeMounted'
The exit code of a pipeline is the exit code of the last command in the pipeline.
grep -v '/toBeMounted' will return true (code=0) if at least one line of input
does not match /toBeMounted . Thus, this tests whether there are other things
mounted besides /toBeMounted . This is not at all what you are looking
for.
To use df and grep to test whether /toBeMounted is
mounted, we need
df | grep '/toBeMounted'
This returns true if /toBeMounted is mounted. What you actually need is the
negation of this: you need a condition that is true if /toBeMounted is not
mounted. To do that, we just need to use negation, denoted by ! :
The return status of a pipeline is the exit status of the last command, unless the
pipefail option is enabled. If pipefail is enabled, the pipeline's return status is the
value of the last (rightmost) command to exit with a non-zero status, or zero if all
commands exit successfully. If the reserved word ! precedes a pipeline, the
exit status of that pipeline is the logical negation of the exit status as described above.
The shell waits for all commands in the pipeline to terminate before returning a value.
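A quick way to see the behaviour described in that quote (the commands here are only an illustration):
$ false | true; echo $?      # status of the last command in the pipeline: 0
$ set -o pipefail
$ false | true; echo $?      # with pipefail: 1, taken from the failing command
$ ! false | true; echo $?    # a leading ! negates the pipeline status: 0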
Yeah it looks like my real problem wasn't the pipe, but not clearly thinking about the
-v on a line by line basis. – dlamblin Sep 17 '16 at 6:47
Sergiy Kolodyazhnyy ,
The fact that you're using df with grep tells me that you're
filtering output of df until some device mounts to specific directory, i.e.
whether or not it's on the list.
Instead of filtering the list, focus on the directory that you want. Luckily for us, the
utility mountpoint allows us to do exactly that, and lets us deal with the exit
status of that command. Consider this:
$ mountpoint /mnt/HDD/
/mnt/HDD/ is a mountpoint
$ echo $?
0
$ mountpoint ~
/home/xieerqi is not a mountpoint
$ echo $?
1
Your script thus, can be rewritten as
while ! mountpoint /toBeMounted > /dev/null
do
sleep 3
done
echo "Yup, /toBeMounted got mounted!"
Sample run with my own disk:
$ while ! mountpoint /mnt/HDD > /dev/null
> do
> echo "Waiting"
> sleep 1
> done && echo "/mnt/HDD is mounted"
Waiting
Waiting
Waiting
Waiting
Waiting
/mnt/HDD is mounted
On a side note, you can fairly easily implement your own version of the mountpoint
command, for instance in Python, like I did:
#!/usr/bin/env python3
from os import path
import sys

def main():
    if len(sys.argv) < 2:        # no path given on the command line
        print('Missing a path')
        sys.exit(1)
    full_path = path.realpath(sys.argv[1])
    # /proc/self/mounts lists all mounted filesystems visible to this process
    with open('/proc/self/mounts') as mounts:
        for line in mounts:
            if full_path in line:
                print(full_path, 'is mountpoint')
                sys.exit(0)
    print(full_path, 'is not a mountpoint')
    sys.exit(1)

if __name__ == '__main__':
    main()
Sample run:
$ python3 ./is_mountpoint.py /mnt/HDD
/mnt/HDD is mountpoint
$ python3 ./is_mountpoint.py ~
/home/xieerqi is not a mountpoint
I was generally unclear on using a pipe in a conditional statement. But the specific case of
checking for a mounted device, mountpoint sounds perfect, thanks. Though
conceptually in this case I could have also just done: while [ ! -d /toBeMounted ]; do
sleep 2; done; echo -e \\aDing the directory is available now. – dlamblin Sep 20 '16 at 0:52
gugy , 2018-07-25 09:56:33
I am trying to do the following (using bash): search for files that always have the same
name and extract data from these files. I want to store the extracted data in new arrays. I am
almost there, I think; see the code below.
The files I am searching for all have this format:
#!/bin/bash
echo "the concentration of NDPH is 2 mM, which corresponds to 2 molecules in a box of size 12 nm (12 x 12 x 12 nm^3)" > README_test
#find all the README* files and save the paths into an array called files
files=()
data1=()
data2=()
data3=()
while IFS= read -r -d $'\0'; do
files+=("$REPLY")
#open all the files and extract data from them
while read -r line
do
name="$line"
echo "$name" | tr ' ' '\n'| awk 'f{print;f=0;exit} /of/{f=1}'
echo "$name"
echo "$name" | tr ' ' '\n'| awk 'f{print;f=0;exit} /of/{f=1}'
data1+=( "$echo "$name" | tr ' ' '\n'| awk 'f{print;f=0;exit} /of/{f=1}' )" )
# variables are not preserved...
# data2+= echo "$name" | tr ' ' '\n'| awk 'f{print;f=0;exit} /is/{f=1}'
echo "$name" | tr ' ' '\n'| awk 'f{print;f=0;exit} /size/{f=1}'
# variables are not preserved...
# data3+= echo "$name" | tr ' ' '\n'| awk 'f{print;f=0;exit} /size/{f=1}'
done < "$REPLY"
done < <(find . -name "README*" -print0)
echo ${data1[0]}
The issue is that the pipe giving me the exact output I want from the files is "not
working" (variables are not preserved) in the loops. I have no idea how/if I can use process
substitution to get what I want: an array (data1, data2, data3) filled with the output of the
pipes.
UPDATE: So I was not assigning things to the array correctly (see data1, which is properly
assigning something now). But why are
#!/bin/bash
echo "the concentration of NDPH is 2 mM, which corresponds to 2 molecules in a box of size 12 nm (12 x 12 x 12 nm^3)" > README_test
files=()
data1=()
data2=()
data3=()
get_some_field() {
echo "$1" | tr ' ' '\n'| awk -vkey="$2" 'f{print;f=0;exit} $0 ~ key {f=1}'
}
#find all the README* files and save the paths into an array called files
while IFS= read -r -d $'\0'; do
files+=("$REPLY")
#open all the files and extract data from them
while read -r line
do
name="$line"
echo "$name"
echo "$name" | tr ' ' '\n'| awk 'f{print;f=0;exit} /of/{f=1}'
data1+=( "$(get_some_field "$name" of)" )
data2+=( "$(get_some_field "$name" is)" )
data3+=( "$(get_some_field "$name" size)" )
done < "$REPLY"
done < <(find . -name "README*" -print0)
echo ${data1[0]}
echo ${data2[0]}
echo ${data3[0]}
data1+= echo... doesn't really do anything to the data1 variable.
Do you mean to use data1+=( "$(echo ... | awk)" ) ? – ilkkachu Jul 25 '18 at 10:20
I'm assuming you want the output of the echo ... | awk stored in a variable,
and in particular, appended to one of the arrays.
First, to capture the output of a command, use "$( cmd... )" (command
substitution). As a trivial example, this prints your hostname:
var=$(uname -n)
echo $var
Second, to append to an array, you need to use the array assignment syntax, with
parentheses around the right-hand side. This would append the value of var to
the array:
array+=( $var )
And third, the expansion of $var and the command substitution
$(...) are subject to word splitting, so you want to use double quotes around
them. Again a trivial example, this puts the full output of uname -a as a
single element in the array:
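The example was presumably along these lines, with double quotes around the command substitution:
array+=( "$(uname -a)" )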
(Note that the quotes inside the command substitution are distinct from the quotes
outside it. The quote before $1 doesn't stop the quoting started
outside $(), unlike what the syntax highlighting on SE seems to imply.)
You could make that slightly simpler to read by putting the pipeline in a function:
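A sketch of that refactoring, reusing the get_some_field helper already shown in the updated question above (the function and variable names are simply those from the question):
get_some_field() {
    # print the word that follows the given keyword in the line passed as $1
    echo "$1" | tr ' ' '\n' | awk -v key="$2" 'f{print; f=0; exit} $0 ~ key {f=1}'
}

data1+=( "$(get_some_field "$name" of)" )
data2+=( "$(get_some_field "$name" is)" )
data3+=( "$(get_some_field "$name" size)" )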
"... This is the Unix philosophy: Write programs that do one thing and do it well. Write
programs to work together. Write programs to handle text streams, because that is a universal
interface." ..."
Notable quotes:
"... This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface." ..."
Author's note: Much of the content in this article is excerpted, with some significant
edits to fit the Opensource.com article format, from Chapter 3: Data Streams, of my new book,
The Linux Philosophy
for SysAdmins .
Everything in Linux revolves around streams of data -- particularly text streams. Data
streams are the raw materials upon which the GNU Utilities , the Linux core
utilities, and many other command-line tools perform their work.
As its name implies, a data stream is a stream of data -- especially text data -- being
passed from one file, device, or program to another using STDIO. This chapter introduces the
use of pipes to connect streams of data from one utility program to another using STDIO. You
will learn that the function of these programs is to transform the data in some manner. You
will also learn about the use of redirection to redirect the data to a file.
I use the
term "transform" in conjunction with these programs because the primary task of each is to
transform the incoming data from STDIO in a specific way as intended by the sysadmin and to
send the transformed data to STDOUT for possible use by another transformer program or
redirection to a file.
The standard term, "filters," implies something with which I don't agree. By definition, a
filter is a device or a tool that removes something, such as an air filter removes airborne
contaminants so that the internal combustion engine of your automobile does not grind itself
to death on those particulates. In my high school and college chemistry classes, filter paper
was used to remove particulates from a liquid. The air filter in my home HVAC system removes
particulates that I don't want to breathe.
Although they do sometimes filter out unwanted data from a stream, I much prefer the term
"transformers" because these utilities do so much more. They can add data to a stream, modify
the data in some amazing ways, sort it, rearrange the data in each line, perform operations
based on the contents of the data stream, and so much more. Feel free to use whichever term
you prefer, but I prefer transformers. I expect that I am alone in this.
Data streams can be manipulated by inserting transformers into the stream using pipes.
Each transformer program is used by the sysadmin to perform some operation on the data in the
stream, thus changing its contents in some manner. Redirection can then be used at the end of
the pipeline to direct the data stream to a file. As mentioned, that file could be an actual
data file on the hard drive, or a device file such as a drive partition, a printer, a
terminal, a pseudo-terminal, or any other device connected to a computer.
The ability to manipulate these data streams using these small yet powerful transformer
programs is central to the power of the Linux command-line interface. Many of the core
utilities are transformer programs and use STDIO.
In the Unix and Linux worlds, a stream is a flow of text data that originates at some
source; the stream may flow to one or more programs that transform it in some way, and then
it may be stored in a file or displayed in a terminal session. As a sysadmin, your job is
intimately associated with manipulating the creation and flow of these data streams. In this
post, we will explore data streams -- what they are, how to create them, and a little bit
about how to use them.
Text streams -- a universal interface
The use of Standard Input/Output (STDIO) for program input and output is a key foundation
of the Linux way of doing things. STDIO was first developed for Unix and has found its way
into most other operating systems since then, including DOS, Windows, and Linux.
" This is the Unix philosophy: Write programs that do one thing and do it well.
Write programs to work together. Write programs to handle text streams, because that is a
universal interface."
-- Doug McIlroy, Basics of the Unix Philosophy
STDIO
STDIO was developed by Ken Thompson as a part of the infrastructure required to implement
pipes on early versions of Unix. Programs that implement STDIO use standardized file handles
for input and output rather than files that are stored on a disk or other recording media.
STDIO is best described as a buffered data stream, and its primary function is to stream data
from the output of one program, file, or device to the input of another program, file, or
device.
There are three STDIO data streams, each of which is automatically opened as a file at
the startup of a program -- well, those programs that use STDIO. Each STDIO data stream is
associated with a file handle, which is just a set of metadata that describes the
attributes of the file. File handles 0, 1, and 2 are explicitly defined by convention and
long practice as STDIN, STDOUT, and STDERR, respectively.
STDIN, File handle 0 , is standard input which is usually input from the keyboard.
STDIN can be redirected from any file, including device files, instead of the keyboard. It
is not common to need to redirect STDIN, but it can be done.
STDOUT, File handle 1 , is standard output which sends the data stream to the display
by default. It is common to redirect STDOUT to a file or to pipe it to another program for
further processing.
STDERR, File handle 2 . The data stream for STDERR is also usually sent to the
display.
If STDOUT is redirected to a file, STDERR continues to be displayed on the screen. This
ensures that even when the data stream itself is not displayed on the terminal, STDERR still is,
so the user will see any errors resulting from execution of the program.
STDERR can also be redirected to the same or passed on to the next transformer program in a
pipeline.
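A minimal illustration (the file names are arbitrary): redirect STDOUT to a file and the error message still appears on the terminal; add 2>&1 to send both streams to the file.
[student@studentvm1 ~]$ ls /etc/hosts /nonexistent > out.txt        # STDERR still shows on screen
[student@studentvm1 ~]$ ls /etc/hosts /nonexistent > out.txt 2>&1   # both streams go into out.txt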
STDIO is implemented as a C library, stdio.h , which can be included in the source code of
programs so that it can be compiled into the resulting executable.
Simple streams
You can perform the following experiments safely in the /tmp directory of your Linux host.
As the root user, make /tmp the PWD, create a test directory, and then make the new directory
the PWD.
# cd /tmp ; mkdir test ; cd test
Enter and run the following command line program to create some files with content on the
drive. We use the dmesg command simply to provide data for the files to contain.
The contents don't matter as much as just the fact that each file has some content.
# for I in 0 1 2 3 4 5 6 7 8 9 ; do dmesg > file$I.txt ; done
Verify that there are now at least 10 files in /tmp/ with the names file0.txt through
file9.txt .
# ll
total 1320
-rw-r--r-- 1 root root 131402 Oct 17 15:50 file0.txt
-rw-r--r-- 1 root root 131402 Oct 17 15:50 file1.txt
-rw-r--r-- 1 root root 131402 Oct 17 15:50 file2.txt
-rw-r--r-- 1 root root 131402 Oct 17 15:50 file3.txt
-rw-r--r-- 1 root root 131402 Oct 17 15:50 file4.txt
-rw-r--r-- 1 root root 131402 Oct 17 15:50 file5.txt
-rw-r--r-- 1 root root 131402 Oct 17 15:50 file6.txt
-rw-r--r-- 1 root root 131402 Oct 17 15:50 file7.txt
-rw-r--r-- 1 root root 131402 Oct 17 15:50 file8.txt
-rw-r--r-- 1 root root 131402 Oct 17 15:50 file9.txt
We have generated data streams using the dmesg command, which was redirected
to a series of files. Most of the core utilities use STDIO as their output stream and those
that generate data streams, rather than acting to transform the data stream in some way, can
be used to create the data streams that we will use for our experiments. Data streams can be
as short as one line or even a single character, and as long as needed.
Exploring the
hard drive
It is now time to do a little exploring. In this experiment, we will look at some of the
filesystem structures.
Let's start with something simple. You should be at least somewhat familiar with the
dd command. Officially known as "disk dump," many sysadmins call it "disk
destroyer" for good reason. Many of us have inadvertently destroyed the contents of an entire
hard drive or partition using the dd command. That is why we will hang out in
the /tmp/test directory to perform some of these experiments.
Despite its reputation, dd can be quite useful in exploring various types of
storage media, hard drives, and partitions. We will also use it as a tool to explore other
aspects of Linux.
Log into a terminal session as root if you are not already. We first need to determine the
device special file for your hard drive using the lsblk
command.
We can see from this that there is only one hard drive on this host, that the device
special file associated with it is /dev/sda , and that it has two partitions. The /dev/sda1
partition is the boot partition, and the /dev/sda2 partition contains a volume group on which
the rest of the host's logical volumes have been created.
As root in the terminal session, use the dd command to view the boot record
of the hard drive, assuming it is assigned to the /dev/sda device. The bs=
argument is not what you might think; it simply specifies the block size, and the
count= argument specifies the number of blocks to dump to STDIO. The
if= argument specifies the source of the data stream, in this case, the /dev/sda
device. Notice that we are not looking at the first block of the partition, we are looking at
the very first block of the hard drive.
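The command described above takes roughly this form; a 512-byte block size and a count of 1 cover the classic boot record, but treat the exact values as illustrative:
[root@studentvm1 test]# dd if=/dev/sda bs=512 count=1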
... ... ...
This prints the text of the boot record, which is the first block on the disk -- any disk.
In this case, there is information about the filesystem and, although it is unreadable
because it is stored in binary format, the partition table. If this were a bootable device,
stage 1 of GRUB or some other boot loader would be located in this sector. The last three
lines contain data about the number of records and bytes processed.
Starting with the beginning of /dev/sda1 , let's look at a few blocks of data at a time to
find what we want. The command is similar to the previous one, except that we have specified
a few more blocks of data to view. You may have to specify fewer blocks if your terminal is
not large enough to display all of the data at one time, or you can pipe the data through the
less utility and use that to page through the data -- either way works. Remember, we are
doing all of this as root user because non-root users do not have the required
permissions.
Enter the same command as you did in the previous experiment, but increase the block count
to be displayed to 100, as shown below, in order to show more data.
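That is, something along these lines, optionally piped through less as suggested above:
[root@studentvm1 test]# dd if=/dev/sda1 bs=512 count=100 | less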
.... ... ...
Now try this command. I won't reproduce the entire data stream here because it would take
up huge amounts of space. Use Ctrl-C to break out and stop the stream of data.
[root@studentvm1 test]# dd if=/dev/sda
This command produces a stream of data that is the complete content of the hard drive,
/dev/sda , including the boot record, the partition table, and all of the partitions and
their content. This data could be redirected to a file for use as a complete backup from
which a bare metal recovery can be performed. It could also be sent directly to another hard
drive to clone the first. But do not perform this particular experiment.
You can see that the dd command can be very useful for exploring the
structures of various types of filesystems, locating data on a defective storage device, and
much more. It also produces a stream of data on which we can use the transformer utilities in
order to modify or view.
The real point here is that dd , like so many Linux commands, produces a
stream of data as its output. That data stream can be searched and manipulated in many ways
using other tools. It can even be used for ghost-like backups or disk
duplication.
Randomness
It turns out that randomness is a desirable thing in computers -- who knew? There are a
number of reasons that sysadmins might want to generate a stream of random data. A stream of
random data is sometimes useful to overwrite the contents of a complete partition, such as
/dev/sda1 , or even the entire hard drive, as in /dev/sda .
Perform this experiment as a non-root user. Enter this command to print an unending stream
of random data to STDIO.
[student@studentvm1 ~]$ cat /dev/urandom
Use Ctrl-C to break out and stop the stream of data. You may need to use Ctrl-C multiple
times.
Random data is also used as the input seed to programs that generate random passwords and
random data and numbers for use in scientific and statistical calculations. I will cover
randomness and other interesting data sources in a bit more detail in Chapter 24: Everything
is a file.
Pipe dreams
Pipes are critical to our ability to do the amazing things on the command line, so much so
that I think it is important to recognize that they were invented by Douglas McIlroy during
the early days of Unix (thanks, Doug!). The Princeton University website has a fragment of an
interview with McIlroy in
which he discusses the creation of the pipe and the beginnings of the Unix philosophy.
Notice the use of pipes in the simple command-line program shown next, which lists each
logged-in user a single time, no matter how many logins they have active. Perform this
experiment as the student user. Enter the command shown below:
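One pipeline that produces the described result -- each logged-in user listed exactly once -- is:
[student@studentvm1 ~]$ who | awk '{print $1}' | sort | uniq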
The results from this command produce two lines of data that show that the user's root and
student are both logged in. It does not show how many times each user is logged in. Your
results will almost certainly differ from mine.
Pipes -- represented by the vertical bar ( | ) -- are the syntactical glue, the operator,
that connects these command-line utilities together. Pipes allow the Standard Output from one
command to be "piped," i.e., streamed from Standard Output of one command to the Standard
Input of the next command.
The |& operator can be used to pipe the STDERR along with STDOUT to STDIN of the next
command. This is not always desirable, but it does offer flexibility in the ability to record
the STDERR data stream for the purposes of problem determination.
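For example, |& (shorthand for 2>&1 |) lets grep see the error messages as well as the normal output, so they can be filtered out of the stream:
[student@studentvm1 ~]$ find /etc -name passwd |& grep -v 'Permission denied'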
A string of programs connected with pipes is called a pipeline, and the programs that use
STDIO are referred to officially as filters, but I prefer the term "transformers."
Think about how this program would have to work if we could not pipe the data stream from
one command to the next. The first command would perform its task on the data and then the
output from that command would need to be saved in a file. The next command would have to
read the stream of data from the intermediate file and perform its modification of the data
stream, sending its own output to a new, temporary data file. The third command would have to
take its data from the second temporary data file and perform its own manipulation of the
data stream and then store the resulting data stream in yet another temporary file. At each
step, the data file names would have to be transferred from one command to the next in some
way.
I cannot even stand to think about that because it is so complex. Remember: Simplicity
rocks!
Building pipelines
When I am doing something new, solving a new problem, I usually do not just type in a
complete Bash command pipeline from scratch off the top of my head. I usually start with just
one or two commands in the pipeline and build from there by adding more commands to further
process the data stream. This allows me to view the state of the data stream after each of
the commands in the pipeline and make corrections as they are needed.
It is possible to build up very complex pipelines that can transform the data stream using
many different utilities that work with STDIO.
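For example, the user-listing pipeline shown earlier can be grown one stage at a time, inspecting the data stream after each addition:
[student@studentvm1 ~]$ who
[student@studentvm1 ~]$ who | awk '{print $1}'
[student@studentvm1 ~]$ who | awk '{print $1}' | sort
[student@studentvm1 ~]$ who | awk '{print $1}' | sort | uniq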
Redirection
Redirection is the capability to redirect the STDOUT data stream of a program to a file
instead of to the default target of the display. The "greater than" ( > ) character, aka
"gt", is the syntactical symbol for redirection of STDOUT.
Redirecting the STDOUT of a command can be used to create a file containing the results
from that command.
[student@studentvm1 ~]$ df -h > diskusage.txt
There is no output to the terminal from this command unless there is an error. This is
because the STDOUT data stream is redirected to the file and STDERR is still directed to the
STDOUT device, which is the display. You can view the contents of the file you just created
using this next command:
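For example, with cat (less works just as well):
[student@studentvm1 ~]$ cat diskusage.txt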
When using the > symbol to redirect the data stream, the specified file is created if
it does not already exist. If it does exist, the contents are overwritten by the data stream
from the command. You can use double greater-than symbols, >>, to append the new data
stream to any existing content in the file.
[student@studentvm1 ~]$ df -h >> diskusage.txt
You can use cat and/or less to view the diskusage.txt file in
order to verify that the new data was appended to the end of the file.
The < (less than) symbol redirects data to the STDIN of the program. You might want to
use this method to input data from a file to STDIN of a command that does not take a filename
as an argument but that does use STDIN. Although input sources can be redirected to STDIN,
such as a file that is used as input to grep, it is generally not necessary as grep also
takes a filename as an argument to specify the input source. Most other commands also take a
filename as an argument for their input source.
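As an illustration, both of these commands search the file created above for its header line; the first feeds the file to grep via STDIN redirection, the second passes the filename as an argument:
[student@studentvm1 ~]$ grep Filesystem < diskusage.txt
[student@studentvm1 ~]$ grep Filesystem diskusage.txt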
Just grep'ing around
The grep command is used to select lines that match a specified pattern from
a stream of data. grep is one of the most commonly used transformer utilities
and can be used in some very creative and interesting ways. The grep command is
one of the few that can correctly be called a filter because it does filter out all the lines
of the data stream that you do not want; it leaves only the lines that you do want in the
remaining data stream.
If the PWD is not the /tmp/test directory, make it so. Let's first create a stream of
random data to store in a file. In this case, we want somewhat less random data that would be
limited to printable characters. A good password generator program can do this. The following
program (you may have to install pwgen if it is not already) creates a file that
contains 50,000 passwords that are 80 characters long using every printable character. Try it
without redirecting to the random.txt file first to see what that looks like, and then do it
once redirecting the output data stream to the file.
$ pwgen -sy 80 50000 > random.txt
Considering that there are so many passwords, it is very likely that some character
strings in them are the same. First, cat the random.txt file, then use the
grep command to locate some short, randomly selected strings from the last ten
passwords on the screen. I saw the word "see" in one of those ten passwords, so my command
looked like this: grep see random.txt , and you can try that, but you should
also pick some strings of your own to check. Short strings of two to four characters work
best.
$ grep see random.txt
R=p)'s/~0}wr~2(OqaL.S7DNyxlmO69`"12u]h@rp[D2%3}1b87+>Vk,;4a0hX]d7see;1%9|wMp6Yl.
bSM_mt_hPy|YZ1<TY/Hu5{g#mQ<u_(@8B5Vt?w%i-&C>NU@[;zV2-see)>(BSK~n5mmb9~h)yx{a&$_e
cjR1QWZwEgl48[3i-(^x9D=v)seeYT2R#M:>wDh?Tn$]HZU7}j!7bIiIr^cI.DI)W0D"'[email protected]
z=tXcjVv^G\nW`,y=bED]d|7%s6iYT^a^Bvsee:v\UmWT02|P|nq%A*;+Ng[$S%*s)-ls"dUfo|0P5+n
Summary
It is the use of pipes and redirection that allows many of the amazing and powerful tasks
that can be performed with data streams on the Linux command line. It is pipes that transport
STDIO data streams from one program or file to another. The ability to pipe streams of data
through one or more transformer programs supports powerful and flexible manipulation of data
in those streams.
Each of the programs in the pipelines demonstrated in the experiments is small, and each
does one thing well. They are also transformers; that is, they take Standard Input, process
it in some way, and then send the result to Standard Output. Implementation of these programs
as transformers to send processed data streams from their own Standard Output to the Standard
Input of the other programs is complementary to, and necessary for, the implementation of
pipes as a Linux tool.
STDIO is nothing more than streams of data. This data can be almost anything from the
output of a command to list the files in a directory, or an unending stream of data from a
special device like /dev/urandom , or even a stream that contains all of the raw data from a
hard drive or a partition.
Any device on a Linux computer can be treated like a data stream. You can use ordinary
tools like dd and cat to dump data from a device into a STDIO data
stream that can be processed using other ordinary Linux tools.
David Both is a Linux and Open Source advocate who resides in Raleigh, North
Carolina. He has been in the IT industry for over forty years and taught OS/2 for IBM where
he worked for over 20 years. While at IBM, he wrote the first training course for the
original IBM PC in 1981. He has taught RHCE classes for Red Hat and has worked at MCI
Worldcom, Cisco, and the State of North Carolina. He has been working with Linux and Open
Source Software for almost 20 years. David has written articles for...
The problem is the C-style
delimiters for conditional statements (round brackets) and the overuse of curvy brackets. The
former is present in all C-style languages.
So IMHO omitting brackets in built-in functions was a false start; the problem that should
be addressed is the elimination of brackets in prefix conditionals.
One possible way is to have a pragma "altblockdelim" or something like that, which would
allow one to use, say, ?? and ;; or the classic "begin/end" pair instead of '{' and '}', which are
overused in Perl. That would decrease parenthesis nesting.
After all, we can write && as "and" and some people like it.
It's as if within Perl 5 there exists a language with a more modern syntax that just wants to
emerge.
I'm not sure how a discussion about parenthesis ( ) morphed into too many
brackets ("curvy brackets" - { } ), but I don't see the problem in any case. The use
of brackets for block delimiters is visually quite distinct from any other use I'm familiar
with so I don't see the problem.
There is a usage of ;; that I don't quite grok, but seems to be fairly common so the ;;
option probably wouldn't fly in any case.
Perl's && and and operators have
substantially different precedence. They must not be used interchangeably. Yes, subtle I
know, but very useful.
Optimising for fewest key strokes only makes sense transmitting to
Pluto or beyond
I'm not sure how a discussion about parenthesis ( ) morphed into too many brackets
("curvy brackets" - { }), but I don't see the problem in any case.
In C you can write
if(i<0) i=0
but in Perl you can't, and should write
if( $i<0 ){ $i=0 }
because the only statement allowed after conditionals is a compound statement -- a
block. Which was a pretty elegant idea that eliminates the problem of "dangling else"
https://www.sanfoundry.com/c-question-dangling-else-statements/
But the problem is that at this point round parentheses become a wart. They are not
needed and they detract from readability. So if curvy brackets were not used anywhere else,
you could simplify this to
if $i<0 {$i=0}
But you can't do this in Perl because curvy brackets are used for hashes.
There is a usage of ;; that I don't quite grok, but seems to be fairly common so the
;; option probably wouldn't fly in any case.
In Perl ; is an empty (null) statement. So the current meaning of ;; is "the
end of the previous statement followed by the null statement".
The new meaning would be "the end of the current statement and the end of the block", which is
a pretty elegant idea in its own way. Perl already allows omitting the semicolon before } as a special
case; in the new syntax this is just the general case, and the special case is not needed.
Comment on Re^12:
What esteemed monks think about changes necessary/desirable in Perl 7 outside of OO
staff
For programming languages to evolve and flourish, we all need to accept other people's
viewpoints and continue open-minded, civil and respectful dialogue.
In science, scientists always question everything; why shouldn't we question some features
and point out deficiencies of Perl 5, which after version 5.10 became really stale feature-wise
-- the last important addition was state variables in 5.10. Partially
this happened because most resources were reallocated to Perl 6 (the Perl 6 project was announced in
2000), a robust interpreter for which failed to materialize for far too long -- a situation which
also slowed down Perl 5 interpreter development.
The question arises: should it be possible on PerlMonks to criticize some aspects of Perl 5's
current features, implementation, and use without being denigrated as a
reward?
At least after the split, Perl 5 has a theoretical chance to stand on its own and evolve like
other languages evolved (for example, FORTRAN after 1977 adopted an 11-year cycle for new
versions). Perl 5.10 was released in 2007; it is now 13 years since that date, and Perl 7 is
really overdue. The question is what to include, what to exclude, and what glaring flaws need
to be rectified (typically a new version of a programming language tries to rectify the most
glaring design flaws in the language and introduce changes that could not be implemented while
retaining full backward compatibility).
Brian D Foy's post announcing this new version is really weak. It essentially states "We
decided to rename 5.32 and you all should be happy." It does not contain any new ideas, just
the desire to rebrand Perl 5.32 as a new version of Perl with a few new defaults (which, BTW, will break
compatibility with old scripts, at least those written for 5.8 and earlier, as not all of
them use the strict pragma, and the strict pragma implementation still has its own set of problems).
The question arises: is the game worth the candle? Unless new editions of the O'Reilly
books are the goal. That's why I provided this contribution, suggesting some minor enhancements
which might better justify calling the new version Perl 7. And what did I get in return?
I hoped that this post would be a start of the meaningful discussion. But people like you
turned it into a flame-fest.
It looks like it is impossible to have a rational fact-based discussion on this subject with
zealots like you.
PERL is not dead, only those guardians of PerlMonks dot org, who lie in wait to bounce
upon your most recent posts with spiteful replies loaded with falsehoods and hate and
jealousy.
Good luck trying to impart your acquired PERL knowledge there. They will do their very
best to attempt to discredit you and your ideas.
Alex Jones, works at Own My Own Business (answered January 12, 2020):
My answer refers to Perl 5 rather than Raku (Perl 6).
Perl 5 is a veteran computer language with a track record and pedigree of several decades.
Perl has been around long enough that its strengths and weaknesses are known; it is a stable,
predictable and reliable language that will deliver results with little effort.
In the new decade 2020 and beyond, Perl in my opinion, remains competitive in performance
against any other computer language. Perl remains viable as a language to use in even the most
advanced of information technology projects.
Simple market forces have driven Perl out of the top computer languages of choice for
projects. Because a business finds it hard to find Perl developers, they are forced to use a
computer language where there are more developers such as Python. Because fewer businesses are
using Perl in their projects, the education system selects a language such as Python to train
their students in.
Perl 5 will probably no longer be the universal language of choice for developers and
businesses, but may dominate in a particular niche or market. There is a major campaign
underway by supporters of Perl 5 and Raku to promote and encourage people to learn and use
these languages again.
My startup is involved in AI, and I use Perl 5 for the projects I am developing. There are a
number of strengths in Perl 5 which appeal to me in my projects. Perl 5 has a strong reputation
for the ability to create and execute scripts of only a few lines of code to solve problems. As
Perl 5 is designed to be like a natural spoken language, it becomes the practical choice for
handling text. When handling complex patterns, the regex capabilities of Perl 5 are probably the
best of any computer language. Lastly, Perl 5 was the glue that enabled the systems of the
1990's to work together, and might offer a pragmatic solution to bridging the old with the new
in the modern era.
I would describe Perl as existing in a dormant phase, which is waiting for the right
conditions to emerge where it will regain its place at the leading edge in a niche or market
such as in artificial intelligence.
Joe Pepersack, Just Another Perl Hacker (answered May 31, 2015):
No. It's not dead. But it's not very active, either and it's lost a lot of mindshare to Ruby
and Python. Hopefully the recently-announced December release of Perl 6 (Finally!) will renew
interest in the language.
I found a really useful site the other day: Modulecounts . CPAN is Perl's greatest asset, but
unfortunately it seems to have stagnated compared to Pypi or RubyGems. CPAN is getting 3 new
modules per day whereas RubyGems is getting 53/day. Rubygems overtook CPAN in 2011 and Pypi
overtook it in 2013.
Personally I think Python is Perl on training wheels and represents a step backwards if
you're coming from Perl. Ruby is a great language and is pretty Perl-ish overall. Plus someone
just recently ported Moose to Ruby so that's a huge win.
I would argue Perl is still worth learning for a couple main reasons:
It's ubiquitous. Every Unix-ish system made in the last decade has some version of Perl
on it.
It's still unbeaten for text manipulation and for doing shell-scripty type things that
are too hard to do in bash.
Ad-hoc one-liners. Neither ruby nor python can match perl for hacking together something
on the command line.
There's a lot of Perl code still out there doing important things. It's cheaper to
maintain it than it is to re-write it in another language.
Perl is certainly not dead, but it does face an adoption challenge. For example, fewer
vendors are releasing Perl API's or code samples (but the Perl community often steps in at
least for popular platforms). Finding new developers who know Perl is more difficult, while it
is much less difficult to find developers with Python and Java. The emerging technology areas
such as big data and data science have a strong Python bent, but a lot of their tasks could be
done faster in Perl (from my own experience).
What is great about Perl is that, despite its quirks, it is relatively easy to learn if you
know other programming languages. What I have found amazing is that when developers are "forced
to learn Perl" for a project, they are usually pleasantly surprised at how powerful and unique Perl
is compared to their language of choice.
From a job value perspective, Perl knowledge has some interesting value quirks (just like
the language has some interesting quirks). The market for Perl developers is not as large as
other languages, but companies that need Perl developers have a hard time finding good
candidates. Thus, you might find it easier to get a job with Perl skills even though there are
fewer jobs that require it.
In short, Perl has an amazing ability to convert existing programmers, but fewer programmers
are coming into the workforce with Perl experience.
Avi Mehenwal, ex-Perl programmer ("but still cannot let it go -- I feel the RegEx attachment"; answered April 2, 2016):
Perl has been around since 1987 and became an early darling of web developers. These days,
however, you don't hear much about Perl. Everyone seems to be talking about trendier languages
like PHP, Python and Ruby, with Perl left in the back as a neglected, not-so-hip cousin.
That might lead you to think that Perl is dying, but as it turns out, it's still used by
plenty of websites out there, including some pretty big hitters.
Here are some of the more popular sites that use Perl extensively today:
The language is still thriving. There's a new release every year and each release includes
interesting new features (most recently, subroutine signatures). More modules are uploaded to
CPAN every year. More authors contribute code to CPAN every year.
But I still think that Perl is dying and I would find it hard to recommend that anyone
should choose a career in Perl at this point.
Ask yourself these three questions:
When did you last read a general programming book that included examples in Perl?
When did you last see an API that included usage examples written in Perl?
When did you last hear of a company using Perl that you didn't already know about?
I should be working:
Hey guys: MAYBE WE SHOULD FOCUS ON GETTING GOOGLE UNDER CONTROL FIRST!
mike_1010 (edited):
Source code information is a closely guarded secret for all IT companies. Because if hackers
get access to it, then they can find many ways to compromise its security and to spy on its
users.
So, it makes sense that the Chinese government might want to protect the source code of apps
that are used by many people in China.
I'm sure the US government would say the same thing, if some Chinese company wanted to buy
the source code of Microsoft's Windows 10 operating system or something like that.
From the point of view of cybersecurity, this makes perfect sense.
Every country has legitimate security concerns. And these concerns were heightened, when
Edward Snowden revealed the extent of US government hacking and spying of the rest of the
world, including China.
The Chinese government actually has more evidence and more reasons to be concerned about
possible hacking and spying by the US government than the other way around. The USA has only been
accusing China of doing the same, but has never shown any conclusive evidence to back its
claims, the way Edward Snowden has revealed such evidence about the USA.
The only thing that surprises me in this whole affair is that it took the Chinese government
this long to say the obvious. If the situation was reversed and the issue was about the source
code of some US company software, then US politicians and security experts would've been
yelling about this kind of thing right from the start.
All of the following satisfy your criteria, are valid and normal Perl code, and would get a semicolon incorrectly inserted based
on your criteria:
use softsemicolon;
$x = $a
+ $b;
$x = 1
if $condition;
$x = 1 unless $condition1
&& $condition2;
Yes in cases 1 and 2; it depends on the depth of look-ahead in case 3: yes if it is one symbol, no if it is two (no Perl statement can
start with &&).
As for "valid and normal", your mileage may vary. For people who would want to use this pragma it is definitely not "valid and
normal". Both 1 and 2 look to me like frivolities without any useful meaning or justification. Moreover, case 1 can be rewritten
as:
Both 1 and 2 look to me like frivolities without any useful meaning or justification
You and I have vastly differing perceptions of what constitutes normal perl code. For example there are over 700 examples of the
'postfix if on next line' pattern in the .pm files distributed with the perl core.
There doesn't really seem any point in discussing this further. You have failed to convince me, and I am very unlikely to work
on this myself or accept such a patch into core.
You and I have vastly differing perceptions of what constitutes normal perl code. For example there are over 700 examples of
the 'postfix if on next line' pattern in the .pm files distributed with the perl core.
Probably yes. I am an adherent of "defensive programming" who is against over-complexity as well as arbitrary formatting (a pretty-
printer is preferable to me over manual formatting of code), which in this audience unfortunately means that I am in the minority.
BTW, your idea that this pragma (which should be optional) matters for the Perl standard library has no connection to reality.
A very large proportion of the replies you have received in this thread are from people who put a high value on writing maintainable
code. "maintainable" is short hand for code that is written to be understood and maintained with minimum effort over long periods
of time and by different programmers of mixed ability.
There is a strong correlation with your stance of "defensive programming"
... against over-complexity as well as arbitrary formatting . None of us are arguing with that stance. We are arguing with
the JavaScript semicolon that you would like introduced based on a personal whim in a context of limited understanding of Perl
syntax and idiomatic use.
Personally I use an editor that has an on demand pretty printer which I use frequently. The pretty printer does very little
work because I manually format my code as I go and almost always that is how the pretty printer will format it. I do this precisely
to ensure my code is not overly complex and is maintainable. I do this in all the languages that I use and the hardest languages
to do that in are Python, VBScript and JavaScript because of the way they deal with semi-colons.
Oh, and in case it is of interest, dave_the_m is one of
the current maintainers of Perl. He is in a great position to know how the nuts and bolts of an optional semi-colon change might
be made and has a great understanding of how Perl is commonly used. Both give him something of a position of authority in determining
the utility of such a change.
Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Parser lookaheads are implemented in terms of tokens, not characters. The first token of yada is a triple-dot, not a dot. While
you may think it starts with a dot, that's not how the parser sees it, so the existence of yada is not relevant here.
You also completely ruin maintainability and extensibility. Consider a filter module ...
my $fixed = $bad
    =~ y/\x{00d0}/\x{0110}/r                        # Eth != D-stroke
    =~ y/\x{0189}/\x{0110}/r                        # LETTER AFRICAN D != D-stroke
    =~ s{\bpra[ck]ti[sc]e\b}{practice}gr            # All 4 seen in document AB12.38C
    =~ s{\bX13\.GtrA\.14\b}{X13_GA12}gr             # Product got renamed
    =~ s{\b1234\s*zip\b}{1234ZIP}gir                # Receiver will crash on badly formed ZIP code
    =~ s{\bpays\s*-?\s*bas\b}{The Netherlands}gir   # French forms :(
    =~ ....;
Why are you concentrating on just one proposal? Are all the others equally bad?
As for the soft-semicolon, you completely misunderstood the situation:
First, nobody forces you to use this pragma, and if you do not use it you are not affected. I am now thinking that it should
be enabled only with the -d option.
It does not make sense to conduct something like a "performance review" in a large corporation of my proposals, concentrating
on the "soft-semicolon" idea and ignoring all the others, as if it were the only one worth any discussion. It might be the easiest one to
dismiss, but it is far from being the most important or far-reaching among those proposals.
There is no free lunch, and for some coding styles (including but not limited to the coding styles used in many modules in the Perl
standard library) it is definitely inappropriate. Nobody claims that it is suitable for all users. It is an optional facility for
those who want and need it. In a way, it is a debugging aid that allows one to cut the number of debugging runs. And IMHO there is
a non-trivial subset of Perl users who would be interested in this capability, especially system administrators who systematically
use bash along with Perl. And many of them do not use sophisticated editors; often it is just vi or the Midnight Commander editor.
Detractors can happily stay with the old formatting styles forever. Why is this so difficult to understand before producing
such an example?
Moreover, how can you reconcile the amount of effort (and resulting bugs) spent on the elimination of extra round brackets in Perl
with this proposal? Is this not the same idea -- to lessen the possible number of user errors?
For me it looks like pure hypocrisy: in one case we spend some effort following other scripting languages at some
cost, but the other proposal, similar in its essence, is rejected blindly as just a bad fashion. If this is a fashion, then eliminating
round brackets is also a bad fashion, IMHO.
And why is it that only I see some improvements possible at low cost in the current Perl implementation, and nobody else has proposed anything
similar or better, or attempted to modify/enhance my proposals? After all, Perl 5.10 was a definite step forward for Perl. Perl
7 should be the same.
I think the effort spent here criticizing my proposal would have been adequate to introduce the additional parameter into the index
function (a "to" limit), which is needed; its absence forces the use of substr to limit the search zone in long strings, which
is a sub-optimal solution unless the interpreter has advanced optimization capabilities and can recognize such a use as an attempt
to impose a limit on the search.
Or both this and an option in tr that allows it to stop at the first character not in set1 and return that position. :-)
Constructive discussion does not mean downvoting each and every one of my posts (one has -17 votes now; it looks a little bit like
schoolyard bullying) -- you need to try to find the rational grain in them and, if one exists, try to revise and enhance the proposal.
The stance "I am happy with Perl as is, and to hell with your suggestions" has its value and attraction, but it is unclear
how it will affect the future of the language.
As for the soft semicolon, you completely misunderstood the situation: First, nobody forces you to use this pragma. And if you
do not use it, you are not affected. I am now thinking that it should be enabled only with the option -d.
In the OP you make no mention of a pragma in proposal 1, you just say that it would be "highly desirable" to have soft semicolons.
This implies that you would like it to be the default behaviour in Perl 7, which, judging by the responses, would hack a lot of
people off, me included. If you are proposing that soft semicolons are only enabled via a pragma perhaps you should add a note
to that effect in the OP, being sure to make it clear that it is an update rather than silently changing the text.
And IMHO the subset of Perl users who would be interested in this capability is not empty -- especially system administrators
who systematically use bash along with Perl.
I spent the last 26 years of my career as a systems administrator (I had no ambition to leave technical work and become a manager)
on Unix/Linux systems and started using Perl in that role in 1994 with perl 4.036, quickly moving to 5. The lack of semicolon
statement terminators in the various shell programming languages I had to use was a pain in the arse and moving to Perl was a
huge relief as well as a boost to effectiveness. I would not be the slightest bit interested in soft semicolons and they would,
to my mind, be either a debugging nightmare or would force me into a coding style alien to my usual practice.
to which I say, nonsense! Why add unnecessary round brackets to perfectly valid code? Use round brackets where they are needed
to disambiguate precedence but not where they just add superfluous noise. Nothing to do with fascination, I've never touched Python!
You should be commended on the amount of thought that you have put into your proposals and such efforts should not be discouraged.
It is unfortunate that your first proposal has been the most contentious and the one that most responses have latched onto. Sticking
to one's guns is also a praiseworthy trait but doing so in the face of several powerful and cogent arguments to the contrary from
experienced Perl users is perhaps taking it too far. Making it clear that soft semicolons would not be the default behaviour might
apply some soothing balm to this thread.
It does not make sense to conduct something like a "performance review" in a large corporation for my proposals, concentrating
on the "soft-semicolon" idea and ignoring all the others, as if it were the only one worth any discussion.
Others have already contributed their thoughts on the rest of your proposals, which I generally agree with and (more significantly)
you haven't disputed. IMO, the primary reason all the discussion is focusing on soft semicolons is that it's the only
point you're attempting to defend against our criticisms. There was also a brief subthread about your ideas on substring manipulation,
and a slightly longer one about alternate braces which close multiple levels of blocks, but those only lasted as long as you continued
the debate.
In a way, it is a debugging aid that cuts the number of debugging runs.
Seems like just the opposite to me. It may allow you to get your code to run sooner, but, when it does, any semicolon errors will
still be there and need to be fixed in additional debugging runs. Maybe a marginal decrease in overall debugging time if there's
a line where you never have to fix the semicolon error because that line ends up getting deleted before you finish, but it seems
unlikely to provide any great savings if (as you assert) such errors are likely to be present on a significant proportion of lines.
Also, even if it does cut out some debugging runs, they're runs with a very fast turnaround and little-to-no cognitive effort
involved. According to your "BlueJ" paper, even rank beginners need only 8 seconds to fix a missing semicolon error and initiate
a new compile.
if we assume that somebody uses this formatting to suffix conditionals
I do, pretty much all the time! The ability to span a statement over multiple lines
without jumping through backslash hoops is one of the things that makes Perl so
attractive. I also think it makes code much easier to read rather than having excessively
long lines that involve either horizontal scrolling or line wrapping. As to your
comment regarding
excessively long identifiers, I come from a Fortran IV background where we had a maximum
of 8 characters for identifiers (ICL 1900 Fortran compiler) so I'm all for long,
descriptive and unambiguous identifiers that aid those who come after in understanding my
code.
It might make sense to enable it only with the -d option, as a help for debugging, which
cuts the number of debugging runs for those who do not have an editor with built-in syntax
checking (like the ActiveState Komodo Editor, which really helps in such cases).
That list includes most Linux/Unix system administrators, who use just the command line
and vi or something similar. They also use bash on a daily basis along with Perl, which increases
the probability of making such an error. And this is probably one of the most important
categories of users for the future of Perl: Perl started with this group (Larry himself,
Randal L. Schwartz, Tom Christiansen, etc.) and, after a short affair with Web
programming (Yahoo, etc.) and bioinformatics (BioPerl), retreated back to the status of the
scripting language of choice for elite Unix sysadmins.
That does not exclude other users and applications, but I think the core of Perl users
are now Unix sysadmins, and their interests should be reflected in Perl 7 with some
priority.
BTW, I do not see any benefit from omitted semicolons in the final program (nor, in
certain cases, from omitted round brackets).
We present a rationale for expanding the presence of the Lisp family of programming
languages in bioinformatics and computational biology research. Put simply, Lisp-family
languages enable programmers to more quickly write programs that run faster than in other
languages. Languages such as Common Lisp, Scheme and Clojure facilitate the creation of
powerful and flexible software that is required for complex and rapidly evolving domains like
biology. We will point out several important key features that distinguish languages of the
Lisp family from other programming languages, and we will explain how these features can aid
researchers in becoming more productive and creating better code. We will also show how these
features make these languages ideal tools for artificial intelligence and machine learning
applications. We will specifically stress the advantages of domain-specific languages (DSLs):
languages that are specialized to a particular area, and thus not only facilitate easier
research problem formulation, but also aid in the establishment of standards and best
programming practices as applied to the specific research field at hand. DSLs are particularly
easy to build in Common Lisp, the most comprehensive Lisp dialect, which is commonly referred
to as the 'programmable programming language'. We are convinced that Lisp grants programmers
unprecedented power to build increasingly sophisticated artificial intelligence systems that
may ultimately transform machine learning and artificial intelligence research in
bioinformatics and computational biology.
The programming language Lisp is credited for pioneering fundamental computer science
concepts that have influenced the development of nearly every modern programming language to
date. Concepts such as tree data structures, automatic storage management, dynamic typing,
conditionals, exception handling, higher-order functions, recursion and more have all shaped
the foundations of today's software engineering community. The name Lisp derives from 'List
processor' [ 1 ], as
linked lists are one of Lisp's major data structures, and Lisp source code is composed of
lists. Lists, along with their generalizations such as trees and graphs, are extraordinarily well supported by Lisp.
As such, programs that analyze sequence data (such as genomics), graph knowledge (such as
pathways) and tabular data (such as that handled by R [ 2 ]) can be written easily, and can be made to work
together naturally in Lisp. As a programming language, Lisp supports many different programming
paradigms, each of which can be used exclusively or intermixed with others; this includes
functional and procedural programming, object orientation, meta programming and reflection.
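As a small, self-contained illustration of this mixing of paradigms (our sketch, not taken from the article), the snippet below
defines a CLOS class with a generic function and then uses it from higher-order, functional-style code:
    ;; Object-oriented part: a class and a method on a generic function.
    (defclass sequence-record ()
      ((bases :initarg :bases :reader bases)))

    (defgeneric seq-length (record))
    (defmethod seq-length ((r sequence-record))
      (length (bases r)))

    ;; Functional part: higher-order functions over a list of instances.
    (defparameter *records*
      (list (make-instance 'sequence-record :bases "ATG")
            (make-instance 'sequence-record :bases "ATGCGC")))

    (reduce #'+ (mapcar #'seq-length *records*))   ; => 9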
But more to the point, we have empirical evidence that Lisp is a more productive
general-purpose programming language than the other usual suspects, and that most Lisp programs
run faster than their counterparts in other languages. Gat [ 3 ] compared the run times, development times and
memory usage of 16 programs written by 14 programmers in Lisp, C/C++ and Java.
Development times for the Lisp programs ranged from 2 to 8.5 h, compared with 2 to 25 h for
C/C++ and 4 to 63 h for Java (programmer experience alone does not account for
the differences). The Lisp programs were also significantly shorter than the other
programs.
And although the execution times of the fastest C/C++ programs were faster than the
fastest Lisp programs, on average, the Lisp programs ran significantly faster than the
C/C++ programs and much faster than the Java programs (mean runtimes were 41 s for Lisp
versus 165 s for C/C++).
Lisp applications and dialects
In bioinformatics and computational biology, Lisp has successfully been applied to research
in systems biology [ 4 ,
5 ], high-performance
computing (HPC) [ 6 ],
database curation [ 7 ,
8 ], drug discovery [
9 ], computational
chemistry and nanotechnology [ 10 , 11 ], network and pathway -omics analysis [ 12 , 13 , 14 , 15 , 16 ], single-nucleotide polymorphism analysis [ 17 , 18 , 19 ] and RNA structure prediction [ 20 , 21 , 22 ]. In general, the Lisp family of programming languages, which
includes Common Lisp, Scheme and Clojure, has powered multiple applications across fields as
diverse as [ 23 ]:
animation and graphics, artificial intelligence (AI), bioinformatics, B2B and e-commerce, data
mining, electronic design automation/semiconductor applications, embedded systems, expert
systems, finance, intelligent agents, knowledge management, mechanical computer-aided design
(CAD), modeling and simulation, natural language, optimization, risk analysis, scheduling,
telecommunications and Web authoring.
Programmers often test a language's mettle by how successfully it has fared in commercial
settings, where big money is often on the line. To this end, Lisp has been successfully adopted
by commercial vendors such as the Roomba vacuuming robot [ 24 , 25 ], Viaweb (acquired by Yahoo! Store) [ 26 ], ITA Software (acquired by Google Inc. and
in use at Orbitz, Bing Travel, United Airlines, US Airways, etc.) [ 27 ], Mirai (used to model the Gollum character
for the Lord of the Rings movies) [ 28 ], Boeing [ 29 ], AutoCAD [ 30 ], among others. Lisp has also been the driving force behind open source
applications like Emacs [ 31 ] and Maxima [ 32 ], which both have existed for decades and continue to be used
worldwide.
Among the Lisp-family languages (LFLs), Common Lisp has been described as the most powerful
and accessible modern language for advanced biomedical concept representation and manipulation
[ 33 ]. For concrete
code examples of Common Lisp's dominance over mainstream programming languages like R and
Python, we refer the reader to Sections 4 and 5 of Ross Ihaka's (creator of the R programming
language) seminal paper [ 34 ].
Scheme [ 35 ] is an
elegant and compact version of Common Lisp that supports a minimalistic core language and an
excellent suite of language extension tools. However, Scheme has traditionally mainly been used
in teaching and computer science research and its implementors have thus prioritized small
size, the functional programming paradigm and a certain kind of 'cleanliness' over more
pragmatic features. As such, Scheme is considered far less popular than Common Lisp for
building large-scale applications [ 24 ].
The third most common LFL, Clojure [ 36 , 37 ], is a rising star language in the modern software development
community. Clojure specializes in the parallel processing of big data through the Java Virtual
Machine (JVM), recently making its debut in bioinformatics and computational biology research [
38 , 39 , 40 ]. Most recently, Clojure was used to
parallelize the processing and analysis of SAM/BAM files [ 39 ]. Furthermore, the BioClojure project provides seeds
for the bioinformatics community that can be used as building blocks for writing LFL
applications. As of now, BioClojure consists of parsers for various kinds of file formats
(UniProtXML, Genbank XML, FASTA and FASTQ), as well as wrappers of select data analysis
programs (BLAST, SignalP, TMHMM and InterProScan) [ 39 ].
As a whole, Lisp continues to develop new offshoots. A relatively recent addition to the
family is Julia [ 41
]. Although it is sometimes touted as 'C for scientists' and caters to a different community
because of its syntactical proximity to Python, it is a Lisp at heart and certainly worth
watching.
Rewards and challenges
In general, early adopters of a language framework are better poised to reap the scientific
benefits, as they are the first to set out building the critical libraries, ultimately
attracting and retaining a growing share of the research and developer community. As library
support for bioinformatics tasks in the Lisp family of programming languages (Clojure, Common
Lisp and Scheme) is yet in its early stages and on the rise, and there is (as of yet) no
officially established bioinformatics Lisp community, there is plenty of opportunity for
high-impact work in this direction.
It is well known that the best language to choose is the one most well
suited to the job at hand. Yet, in practice, few programmers may consider a nonmainstream
programming language for a project, unless it offers strong, community-tested benefits over its
popular contenders for the specific task under study. Often times, the choice comes down to
library support: does language X already offer well-written, optimized code to help solve my
research problem, as opposed to language Y (or perhaps language Z)? In general, new language
adoption boils down to a chicken-and-egg problem: without a large user base, it is difficult to
create and maintain large-scale, reproducible tools and libraries. But without these tools and
libraries, there can never be a large user base. Hence, a new language must have a big
advantage over the existing ones and/or a powerful corporate sponsorship behind it to compete [
42 ]. Most often, a
positive feedback loop is generated by repositories of useful libraries attracting users, who,
in turn, add more functional libraries, thereby raising a programming language's popularity,
rather than reflecting its theoretical potential.
With mainstream languages like R [ 2 ] and Python [ 43 ] dominating the bioinformatics and computational biology scene for
years, large-scale software development and community support for other less popular language
frameworks have waned to relative obscurity. Consequently, languages winning over increasingly
growing proportions of a steadily expanding user base have the effect of shaping research
paradigms and influencing modern research trends. For example, R programming generally promotes
research that frequently leads to the deployment of R packages to Bioconductor [ 44 ], which has steadily grown into
the largest bioinformatics package ecosystem in the world, whose package count is considerably
ahead of BioPython [ 45 ], BioClojure [ 38 ], BioPerl [ 46 ], BioJava [ 47 ], BioRuby [ 48 ], BioJulia [ 49 ] or SCABIO [ 50 ]. Given the choice, R programmers interested in deploying large-scale
applications are more likely to branch out to releasing Web applications (e.g. Shiny [
51 ]) than to
graphical user interface (GUI) binary executables, which are generally more popular with
lower-level languages like C/C++ [ 52 ]. As such, language often dictates research direction, output and
funding. Questions like 'who will be able to read my code?', 'is it portable?', 'does it
already have a library for that?' or 'can I hire someone?' are pressing questions, often
inexorably shaping the course and productivity of a project. However, despite its popularity, R
has been severely criticized for its many shortcomings by its own creator, Ross Ihaka, who has
openly proposed to scrap the language altogether and start afresh by using a Lisp-based engine
as the foundation for a statistical computing system [ 34 , 53 ].
As a community repository of bioinformatics packages, BioLisp does not yet exist as such
(although its name currently denotes the native language of BioBike [ 4 , 54 ], a large-scale bioinformatics Lisp application),
which means that there is certainly wide scope and potential for its rise and development in
the bioinformatics community.
Macros and domain-specific languages
Lisp is a so-called homoiconic language, which means that Lisp code is represented as a data
structure of the language itself in such a way that its syntactical structure is preserved. In
more technical terms, while the Lisp compiler has to parse the textual representation of the
program (the 'source code') into a so-called abstract syntax tree (like any other compiler of
any programming language has to), a Lisp program has direct access to (and can modify) this
abstract syntax tree, which is presented to the program in a convenient, structured way.
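As a minimal illustration of what "code as a data structure" means in practice (this sketch is ours, not the authors'), a quoted
Lisp form is an ordinary list that a program can take apart and reassemble with the usual list operations:
    ;; A form is just data: quoting it yields a list we can inspect.
    (defparameter *form* '(+ 20 (* 2 21)))

    (first *form*)             ; => +   (the operator position)
    (second (third *form*))    ; => 2   (walking into the nested sub-form)

    ;; Rebuilding the tree with list operations produces new code,
    ;; which EVAL (or, more typically, a macro expansion) can run.
    (eval (list '* (second *form*) 2))   ; => 40
Macros receive their arguments in exactly this list form, which is why the transformations described below are straightforward
to write.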
This property enables Lisp to have a macro system that remains undisputed in the programming
language world [ 55 ].
Although 'macros' in languages like C have the same name, they are essentially just text
substitutions performed on the source code before it is compiled and they cannot always
reliably preserve the lexical structure of the code. Lisp macros, on the other hand, operate at
the syntactic level. They transform the program structure itself and, as opposed to C macros,
are written in the same language they work on and have the full language available all the
time. Lisp macros are thus not only used for moderately simple 'find and replace' chores but
can apply extensive structural changes to a program. This includes tasks that are impossible in
other languages. Examples would be the introduction of new control structures (while Python
users had to wait for the language designers to introduce the 'with' statement in version 2.5,
Lisp programmers could always add something like that to the language themselves), pattern
matching capabilities (while Lisp does not have pattern matching like ML or Haskell out of the
box, it is easy to add [ 56 ]) or the integration of code with markup languages (if you want you can,
e.g., write code that mimics the structure of an HTML document it is supposed to emit [
57 , 58 ]).
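For instance, here is a sketch of our own (not from the cited references) of how a programmer could add a bare 'while' control
structure, which standard Common Lisp does not provide as such, in a few lines of macro code:
    ;; WHILE expands into the standard DO looping construct, so the new
    ;; control structure behaves as if it had always been in the language.
    (defmacro while (test &body body)
      `(do ()
           ((not ,test))
         ,@body))

    ;; Example: prints 3, 2, 1.
    (let ((n 3))
      (while (> n 0)
        (print n)
        (decf n)))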
In addition to that, Common Lisp even offers access to its 'reader', which means that code
can be manipulated (in Lisp) before it is parsed [ 59 ]. This enables Lisp programs to completely change
their surface syntax if necessary. Examples would be code that adds Perl-like interpolation
capabilities to Lisp strings [ 60 ] or a library [ 61 ] that enables Lisp to read arithmetic in 'infix' notation, i.e. to
understand '20 + 2 * 21' in addition to the usual '(+ 20 (* 2 21))'.
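The following sketch (ours, not one of the cited libraries) shows the flavour of such reader-level customization: two calls are
enough to make square brackets read as list-building syntax before the compiler ever sees the form:
    ;; Make [ start a form that collects elements into a LIST call ...
    (set-macro-character #\[
      (lambda (stream char)
        (declare (ignore char))
        (cons 'list (read-delimited-list #\] stream t))))

    ;; ... and make ] terminate it, the way ) terminates an ordinary form.
    (set-macro-character #\] (get-macro-character #\) nil))

    ;; Now [1 2 (+ 1 2)] reads as (LIST 1 2 (+ 1 2)) and evaluates to (1 2 3).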
These features make Lisp an ideal tool for the creation of domain-specific languages:
languages that are custom-tailored to a specific problem domain but can still have access to
all of Lisp. A striking example is Common Prolog [ 62 ], a professional Prolog system implemented and
embedded in Common Lisp. In bioinformatics, the Biolingua [ 5 ] project (now called BioBike) built a cloud-based
general symbolic biocomputing domain-specific language (DSL) entirely in Common Lisp. The
system, which could be programmed entirely through the browser, was its own complete
biocomputing language, which included a built-in deductive reasoner, called BioDeducta [
54 ]. Biolingua
programs, guided by the reasoner, would invisibly call tools such as BLAST [ 63 ] and Bioconductor [
44 ] on the
server-side, as needed. Symbolic biocomputing has also previously been used to create
user-friendly visual tools for interactive data analysis and exploration [ 64 ].
Other unique strengths
In addition to homoiconicity, Lisp has several other features that set it apart from
mainstream languages:
In Lisp, programmers usually work in a special incremental interactive programming
environment called the read-eval-print loop (REPL) [ 65 , 66 ]. This means that the Lisp system continuously reads
expressions typed by the user, evaluates them and prints the results. The REPL enables a
paradigm that allows the programmer to continually interact with their program as it is
developed. This is similar to the way Smalltalk 'images' evolve [ 59 ] and different from the usual
edit-compile-link-execute cycle of C-like languages. This approach lends itself well to
explorative programming and rapid prototyping. As such, the REPL enables the programmer to
write a function, test it, change it, try a different approach, etc., while never having to
stop for any lengthy compilation cycles [ 24 ].
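A typical interaction of the kind described might look like the transcript below (our example, with an invented gc-content
function; return values are shown as comments):
    ;; Define a small function at the REPL ...
    (defun gc-content (seq)
      "Fraction of G and C characters in the string SEQ."
      (/ (count-if (lambda (c) (member c '(#\G #\C))) seq)
         (length seq)))

    ;; ... try it immediately ...
    (gc-content "ATGCGC")      ; => 2/3

    ;; ... and redefine it on the spot, without restarting anything.
    (defun gc-content (seq)
      (float (/ (count-if (lambda (c) (member c '(#\G #\C))) seq)
                (length seq))))

    (gc-content "ATGCGC")      ; => 0.6666667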
Common Lisp was designed from the ground up to create large, complex and long-running
applications and thus supports software 'hot swapping': the code of a running program can
be changed without the need to interrupt it. This includes features like the ability of the
Common Lisp object system (CLOS) to change the classes of existing objects. Although Erlang
and Smalltalk also support hot swapping, no mainstream compiled language does this to our
knowledge. Hot swapping can be performed in Java to a certain extent, but only with the
help of third-party frameworks, as it is not an intrinsic feature of the language
itself.
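As a small sketch of the CLOS behaviour mentioned above (class and slot names are invented for illustration):
    ;; An instance created from the first definition of the class ...
    (defclass sample ()
      ((id :initarg :id :accessor sample-id)))
    (defparameter *s* (make-instance 'sample :id 42))

    ;; ... survives a later redefinition: existing instances are updated
    ;; in place the next time they are touched, while the program keeps running.
    (defclass sample ()
      ((id       :initarg :id :accessor sample-id)
       (organism :initform "unknown" :accessor sample-organism)))

    (sample-organism *s*)      ; => "unknown" -- no reload, no lost object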
Lisp invented exception handling, and Common Lisp, in particular, has an error-handling
facility (the 'condition system' [ 24 ]) that goes far beyond most other languages: it does not necessarily
unwind the stack if an exception occurs and instead offers so-called restarts to
programmatically continue 'where the error happened'. This system makes it easy to write
robust software, which is an essential ingredient to building industry-strength
fault-tolerant systems capable of handling a variety of conditions, a trait especially
useful for artificial intelligence and machine learning applications. In the Bioconductor
community, error-handling facilities are ubiquitously present in practically all
R/Bioconductor packages via tryCatch(), a base R function whose roots originate directly
from Lisp's condition system.
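The sketch below (condition and function names are ours) shows the shape of this: a low-level parser signals an error and offers
restarts, and a caller several frames up chooses how to continue, without the stack having been unwound in between:
    (define-condition malformed-coordinate (error)
      ((text :initarg :text :reader malformed-text)))

    (defun parse-coordinate (string)
      "Parse STRING as an integer, offering restarts instead of just failing."
      (or (parse-integer string :junk-allowed t)
          (restart-case (error 'malformed-coordinate :text string)
            (use-value (v) v)       ; caller supplies a replacement value
            (skip-entry () nil))))  ; caller says: drop this entry

    (defun parse-coordinates (strings)
      ;; The policy decision ("skip bad entries") is made here, far from
      ;; the point where the error is signalled.
      (handler-bind ((malformed-coordinate
                       (lambda (c)
                         (declare (ignore c))
                         (invoke-restart 'skip-entry))))
        (remove nil (mapcar #'parse-coordinate strings))))

    ;; (parse-coordinates '("12" "oops" "40"))  => (12 40)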
Common Lisp implementations usually come with a sophisticated 'foreign function
interface' (FFI) [ 24 ], which allows direct access from Lisp to code written in C or
C++ and sometimes also to Java code. This enables Lisp programmers to make use of
libraries written in other languages, making those libraries a direct strength of Lisp. For
instance, it is simple to call Bioconductor from Lisp, just as Python and other programming
languages can [ 67
, 68 ]. Likewise,
Clojure runs on the JVM and, thus, has immediate access to all of Java's libraries.
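For a feel of what an FFI binding looks like, here is a sketch using the widely used third-party CFFI library (not part of the
ANSI standard; we also assume a Unix-like system where the C library is already loaded and size_t fits in an unsigned long):
    ;; Load CFFI first, e.g. via (ql:quickload "cffi") if Quicklisp is installed.
    ;; Expose the C library function strlen() to Lisp under the name C-STRLEN.
    (cffi:defcfun ("strlen" c-strlen) :unsigned-long
      (str :string))

    ;; (c-strlen "ACGTACGT")  => 8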
It has been shown that these features, together with other amenities like powerful debugging
tools that Lisp programmers take for granted, offer a significant productivity boost to
programmers [ 3 ]. Lisp
also gives programmers the ability to implement complex data operations and mathematical
constructs in an expressive and natural idiom [ 69 ].
Speed considerations
The interactivity and flexibility of Lisp languages are something that can usually only be
found (if at all) in interpreted languages. This might be the origin of the old myth that Lisp
is interpreted and must thus be slow -- however, this is not true. Compilers for Lisp have
existed since 1959, and all major Common Lisp implementations nowadays can compile directly to
machine code, which is often on par with C code [ 70 , 71 , 72 ] or only slightly slower. Some also offer an interpreter in addition to
the compiler, but examples like Clozure Common Lisp demonstrate that a programmer can have a
compiler-only Common Lisp. For example, CL-PPCRE, a regular expression library written in
Common Lisp, runs faster than Perl's regular expression engine on some benchmarks, even though
Perl's engine is written in highly tuned C [ 24 ].
Although programmers who use interpreted languages like Python or Perl for their convenience
and flexibility will have to resort to writing in C/C++ for time-critical portions of
their code, Lisp programmers can usually have their cake and eat it too. This was perhaps best
shown with direct benchmarking by the creator of the R programming language, Ross Ihaka, who
provided benchmarks demonstrating that Lisp's optional type declaration and machine-code
compiler allow for code that is 380 times faster than R and 150 times faster than Python [
34 ]. And not only
will the code created by Lisp compilers be efficient by default, Common Lisp, in particular,
offers unique features to optimize those parts of the code (usually only a tiny fraction) that
really need to be as fast as possible [ 59 ]. This includes so-called compiler macros, which can transform function
calls into more efficient code at compile time, and a mandatory disassembler, which enables
programmers to fine-tune time-critical functions until the compiled code matches their
expectations. It should also be emphasized that while the C or Java compiler is 'history' once
the compiled program is started, the Lisp compiler is always present and can thus generate new,
fast code while the program is already running. This is rarely used in finished applications
(except for some areas of AI), but it is an important feature during development and helpful
for explorative programming.
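The optional declarations mentioned above look like the sketch below (ours); with a compiler such as SBCL, DISASSEMBLE then shows
machine code using unboxed double-float arithmetic. Exact output and speedup are implementation-dependent:
    (defun dot-product (xs ys)
      (declare (type (simple-array double-float (*)) xs ys)
               (optimize (speed 3) (safety 1)))
      (let ((sum 0d0))
        (declare (type double-float sum))
        (dotimes (i (length xs) sum)
          (incf sum (* (aref xs i) (aref ys i))))))

    ;; (disassemble #'dot-product) prints the compiled machine code,
    ;; which can be inspected and tuned until it meets expectations.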
To further debunk the popular misconception that Lisp languages are slow, Clojure was
recently used to process and analyze SAM/BAM files [ 39 ] with significantly fewer lines of code and almost
identical speed to SAMTools [ 73 ], which is written in the C programming language. In addition, Common
Lisp was recently used to build a high-performance tool for preparing sequence alignment/map
files for variant calling in sequencing pipelines [ 6 ]. This HPC tool was shown to significantly outperform
SAMTools and Picard on a variety of benchmarks [ 6 ].
A case study: Pathway Tools
Pathway Tools [ 74
, 75 ] is an example
of a large bioinformatics software system written in Common Lisp (Allegro Common Lisp from
Franz Inc.). Pathway Tools has among the largest functionality of any bioinformatics software
system, including genome informatics, regulatory network informatics, metabolic pathway
informatics and omics data analysis. For example, the software includes a genome browser that
zooms from the nucleotide level to the chromosome level; it infers metabolic reconstructions
from annotated genomes; it computes organism-specific layouts of metabolic map diagrams; it
computes optimal routes within metabolic networks; and it can execute quantitative metabolic
flux models.
The same Pathway Tools binary executable can execute as both a desktop window application
and as a Web server. In Web server mode, Pathway Tools powers the BioCyc.org Web site, which
contains 7600 organism-specific Pathway/Genome Databases, and services ∼500 000 unique
visitors per year and up to 100 000 page views per day. Pathway Tools uses the 'hot-swapping'
capabilities of Common Lisp to download and install software patches at user sites and within
the running BioCyc Web server. Pathway Tools has been licensed by 7200 groups, and was found to
have the best performance and documentation among multiple genome database warehousing systems
[ 76 ].
Pathway Tools consists of 680 000 lines of Common Lisp code (roughly the equivalent of 1 400
000 lines of C or Java code), organized into 20 subsystems. In addition, 30 000 lines of
JavaScript code are present within the Pathway Tools Web interface. We chose Common Lisp for
development of Pathway Tools because of its excellent properties as a high-level, highly
productive, easy-to-debug programming language; we strongly believe that the choice of Common
Lisp has been a key factor behind our ability to develop and maintain this large and complex
software system.
A case study: BioBike
BioBike provides an example of a large-scale application of the power of homoiconicity. In
personal communication, the inventor of BioBike, Jeff Shrager, explained why Lisp (in this
case, Common Lisp) was chosen as the implementation language, an unusual choice even for the
early 2000s. According to Shrager, Lisp-style DSL creation is uniquely suited to 'living'
domains, such as biology, where new concepts are being introduced on an ongoing basis (as
opposed to, for example, electronics, where the domain is better understood, and so the
conceptual space is more at rest). Shrager pointed out that as Lisp-based DSLs are usually
implemented through macros, this provides the unique capability of creating new language
constructs that are embedded in the home programming language (here, in Lisp). This is a
critical distinction: in most programming languages, DSLs are whole new programming languages
built on top of the base language, whereas in Lisp, DSLs are built directly into the
language.
Lisp-based DSLs commonly show up in two sorts of domain-specific control structures:
WITH- clauses and MAP- clauses. By virtue of Lisp's homoiconicity, such
constructs can take code as arguments, and can thereby create code-local bindings, and do
various specialized manipulation directly on the code itself, in accord with the semantics of
the new construct. In non-homoiconic languages, users must do this either by creating new
classes/objects, or through function calls or via an ugly hack commonly referred to as
'Greenspun's 10th rule' [ 77 ], wherein users must first implement a quasi-LFL on top of the base
language, and then implement the DSL in that quasi-LFL. Both the object-creation and
function-call means of creating new constructs lead to encapsulation problems, often requiring
ugly manipulations such as representing code as strings, passing code-conditionalizing
arguments, and then having to either globalize them, or re-pass them throughout a large part of
the codebase. With the Lisp-style method of embedding DSLs into the base language via macros, one
can simply use, for example, a WITH-GENES or a MAP-GENES macro wrapper, and within these all
one need do is write normal, everyday Lisp code; the wrapper, because it has access to
and can modify the code that gets run, has no such firewalls, enabling a much more powerful
sort of computation. This greatly simplifies the incremental creation and maintenance of the
DSL, and it is for this reason, argues Shrager, that Lisp (and LFLs more generally) is well
suited to biology: because biology is a science that constantly creates new concepts, it is
especially important to be able to flexibly add concepts to the DSL.
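A hypothetical sketch (not BioBike's actual code) of what such a WITH- construct could look like: the macro receives the body as
data and wraps it in bindings, so the caller writes ordinary Lisp inside it. LOOKUP-GENE and GENE-LENGTH are assumed helpers:
    (defmacro with-genes ((&rest gene-names) &body body)
      `(let ,(mapcar (lambda (name)
                       `(,name (lookup-gene ',name)))   ; LOOKUP-GENE assumed
                     gene-names)
         ,@body))

    ;; (with-genes (pho5 gal4)
    ;;   (list (gene-length pho5) (gene-length gal4)))
    ;; expands into a LET that binds PHO5 and GAL4 around the body.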
BioBike was created by a team led by Jeff Shrager and JP Massar, and later Jeff Elhai. Its
core Web listener is almost 15 000 lines of Common Lisp code in 25 modules, and the entire
BioBike system is nearly 400 000 lines of code in about 850 modules, including the Web
listener, many specialized bioinformatics modules, a Scratch-like visual programming language
(built using a specialized LFL that compiles to JavaScript, thanks to Peter Siebel), a
specialized bioinformatics-oriented frame system (thanks to Mike Travers) and many other
smaller modules.
Perspectives and outlook
Historically speaking, Lisp is the second oldest (second only to Fortran) programming
language still in use and has influenced nearly every major programming language to date with
its constructs [ 78 ].
For example, it may be surprising to learn that R is written atop Scheme [ 79 ]. In fact, R borrows directly
from its Lisp roots for creating embedded domain-specific languages within R's core language
set [ 80 ]. For
instance, ggplot2 [ 81
], dplyr [ 82 ] and
plyr [ 83 ] are all
examples of DSLs in R. This highlights the importance and relevance of Lisp as a programmable
programming language, namely the ability to be user-extensible beyond the core language set.
Given the wide spectrum of domains and subdomains in bioinformatics and computational biology
research, it follows that similar applications tailored to genomics, proteomics, metabolomics
or other research fields may also be developed as extensible macros in Common Lisp. By way of
analogy, perhaps a genomics equivalent of ggplot2 or dplyr is in store in the not-so-distant
future. Advice for when such pursuits are useful is readily available [ 84 ]. Perhaps even more
importantly, it is imperative to take into consideration the future of statistical
computing [ 34 ],
which will form the big data backbone of artificial intelligence and machine learning
applications in bioinformatics.
Conclusions
New programming language adoption in a scientific community is both a challenging and
rewarding process. Here, we advocate for and propose a greater inclusion of the LFLs into
large-scale bioinformatics research, outlining the benefits and opportunities of the adoption
process. We provide historical perspective on the influence of language choice on research
trends and community standards, and emphasize Lisp's unparalleled support for homoiconicity,
domain-specific languages, extensible macros and error handling, as well as their significance
to future bioinformatics research. We forecast that the current state of Lisp research in
bioinformatics and computational biology is highly conducive to a timely establishment of
robust community standards and support centered around not only the development of
bioinformatic domain-specific libraries but also the rise of highly customizable and efficient
machine learning and AI applications written in languages like Common Lisp, Clojure and
Scheme.
Key Points
Lisp empowers programmers to write faster programs faster. An empirical study shows that
when programmers tackle the same problems in Lisp, C/C++ and Java, the Lisp
programs are smaller (and therefore easier to maintain), take less time to develop and run
faster.
The Lisp family of programming languages (Common Lisp, Scheme and Clojure) makes it easy
to create extensible macros, which facilitate the creation of modularized extensions to
help bioinformaticians easily create plug-ins for their software. This, in turn, paves the
way for creating enterprise-level, fault-tolerant domain-specific languages in any research
area or specialization.
The current state of Lisp research in bioinformatics and computational biology is at a
point where an official BioLisp community is likely to be established soon, especially
considering the documented shortcomings of mainstream programming languages like R and
Python when compared side by side with identical implementations in Lisp.
Bohdan B. Khomtchouk is an NDSEG Fellow and PhD candidate in the Human Genetics and Genomics
Graduate Program at the University of Miami Miller School of Medicine. His research interests
include bioinformatics and computational biology applications in HPC, integrative multi-omics,
artificial intelligence, machine learning, mathematical genetics, biostatistics, epigenetics,
visualization, search engines and databases.
Edmund Weitz is full professor at the University of Applied Sciences in Hamburg, Germany. He
is a mathematician and his research interests include set theory, logic and combinatorics.
Peter D. Karp is the director of the Bioinformatics Research Group within the Artificial
Intelligence Center at SRI International. Dr Karp has authored >130 publications in
bioinformatics and computer science in areas including metabolic pathway bioinformatics,
computational genomics, scientific visualization and scientific databases.
Claes Wahlestedt is Leonard M. Miller Professor at the University of Miami Miller School of
Medicine and is working on a range of basic science and translational efforts in his roles as
Associate Dean and Center Director for Therapeutic Innovation. The author of some 250
peer-reviewed scientific publications, his ongoing research projects concern bioinformatics,
epigenetics, genomics and drug/biomarker discovery across several therapeutic areas. He has
experience not only from academia but also from leadership positions in the pharmaceutical and
biotechnology industry.
Acknowledgements
B.B.K. dedicates this work to the memory of his uncle, Taras Khomchuk. B.B.K. wishes to
acknowledge the financial support of the United States Department of Defense (DoD) through the
National Defense Science and Engineering Graduate Fellowship (NDSEG) Program: this research was
conducted with Government support under and awarded by DoD, Army Research Office (ARO),
National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a. C.W. thanks
Jeff Shrager for critical review and helpful comments on the manuscript.
[Edited] [Highly desirable] Make a semicolon optional at the end of a line,
if there is a balance of brackets on the line and the statement looks syntactically correct
(an optional pragma "soft semicolon", similar to the solution used in the famous IBM PL/1 debugging
compiler). That can help sysadmins who use bash and Perl in parallel, work from the
command line with vi or similar editors, and do not use editors such as Komodo Edit which
flag syntax errors. It might make sense to enable this pragma only via the -d option of the
interpreter. In that case it will serve as a pure debugging aid, cutting the number of
iterations of editing the source before an actual run. It does not make much sense to leave
statements without semicolons in the final, production version of the program. See, for
example, the discussion on Stack Overflow:
Do you recommend using semicolons after every statement in JavaScript
In the following, the first line has a balance of brackets and looks syntactically
correct. Would you expect the lexer to add a semicolon?
$a = $b + $c
+ $d + $e;
Yes, and the user will get an error. This is similar to the previous example with a trailing
if (1);
on a new line.
The first question is why one would want to format code this way if one suffers
from the "missing semicolon" problem, wants to avoid missing-semicolon errors and, supposedly,
deliberately enabled the pragma "softsemicolons" for that purpose.
This is the case where the user needs to use #\ to inform the scanner of his choice.
But you are right in the sense that it creates a new type of error -- "missing
continuation" -- and that there is no free lunch. This approach requires a specific discipline
in formatting your code.
The reason I gave that code as an example is that it's a perfectly normal way of
spreading complex expressions over multiple lines: e.g. where you need to add several
variables together and the variables have non-trivial (i.e. long) names,
e.g.
    $pressure = $partial_pressure_nitrogen + $partial_pressure_oxygen +
                $partial_pressure_water_vapour + $partial_pressure_argon +
                $partial_pressure_carbon_dioxide;
In this case, the automatic semicolons are unhelpful and will give rise to confusing error
messages. So you've just switched one problem for another, and raised the cognitive load --
people now need to know about your pragma and also know when it's in scope.
Yes, it discourages a certain formatting style. So what? If you can't live without such
formatting (many can), do not use this pragma. BTW, you can always use extra parentheses,
which will be eliminated by the parser, as in the $x = ($a + $b); example below.
* How exactly does the lexer/parser know when it should insert a soft semicolon?
* How exactly does it give a meaningful error message when it inserts one where the user
didn't intend for there to be one?
My problem with your proposal is that it seems to require the parser to apply some
complex heuristics to determine when to insert and when to complain meaningfully. It is not
obvious to me what these heuristics should be. My suspicion is that such an implementation
will just add to perl's already colourful collection of edge cases, and just confuse both
beginner and expert alike.
Bear in mind that I am one of just a handful of people who actively work on perl's lexer
and parser, so I have a good understanding of how it works, and am painfully aware of its
many complexities. (And it's quite likely that I would end up being the one implementing
this.)
The lexical analyser in Perl is quite sophisticated due to the lexical complexity of the
language. So I think it already counts past lexemes and thus can determine the balance of
"()", '[]' and "{}".
So you probably can initially experiment with the following scheme.
If all of the following conditions are true:
* You have reached the EOL;
* Pragma "softsemicolon" is on;
* The bracket balance is zero;
* [Edited] The last processed token is not ',', '.', '=' (or any of its derivatives such as
'+=', '-=', '=='), a comparison operator ('<', '>', '!=', '<=', '>=', 'eq', etc.), ':', '&',
'&&', '!', '||', '+', '-', '*' or a similar token which implies the continuation of the
statement;
* [Edited] The next token (not character but token), seen via the look-ahead buffer, is not
one of the set '{', '}', ';', '.', '!', '+' (but not '++'), '-', '*' and several others
(see above);
then the lexical analyser needs to insert the lexeme "semicolon" into the stream of lexemes
passed to the syntax analyser.
The warning issued should be something like:
"An attempt was made to correct a missing semicolon. If this is incorrect, please use
extra parentheses or disable the pragma "softsemicolon" for this fragment."
From what I read, the Perl syntax analyser relies on the lexical analyser in some
unorthodox way, so it might be possible to use "clues" from the syntax analyser to improve
this scheme. See, for example, the scheme proposed for recursive descent parsers in:
Follow set error recovery
C. Stirling - Software: Practice and Experience, 1985 - Wiley Online Library
"Some accounts of the recovery scheme mention and make use of non-systematic changes to
their recursive descent parsers in order to improve ... In the former he anticipates the possibility of
a missing semicolon whereas in the latter he does not anticipate a missing comma"
So I think it already counts past lexemes and thus can determine the balance of "()",
'[]' and "{}"
It can't currently.
If all the following conditions are true
All of the following satisfy your criteria, are valid and normal perl code,
and would get a semicolon incorrectly inserted based on your criteria:
    use softsemicolon;
    $x = $a
         + $b;
    $x = 1
         if $condition;
    $x = 1 unless $condition1
         && $condition2;
The warning issued should be something like
I didn't ask what the text of the warning should be, I asked how the parser
can determine when the warning should be issued.
the scheme proposed for recursive descent parsers
But perl uses an LR(1) parser, not a recursive descent parser.
All of the following satisfy your criteria, are valid and normal Perl code, and would get
a semicolon incorrectly inserted based on your criteria:
use softsemicolon;
$x = $a
+ $b;
$x = 1
if $condition;
$x = 1 unless $condition1
&& $condition2;
Yes in cases 1 and 2; in case 3 it depends on the depth of the look-ahead: yes if it
is one token, no if it is two (no Perl statement can start with &&).
As for "valid and normal", your mileage may vary. For people who would want to use this
pragma it is definitely not "valid and normal". Both 1 and 2 look to me like frivolities
without any useful meaning or justification. Moreover, case 1 can be rewritten
as:
    $x = ($a + $b);
Case 3 actually happens in Perl most often with a regular if, and here the opening bracket is
obligatory:
    if ( ( $tokenstr=~/a\[s\]/ || $tokenstr =~/h\[s\]/ ) && (
         $tokenstr... ) ){ .... }
Also, the Python-inspired fascination with eliminating all brackets does no good here:
    $a=$b=1;
    $x=1 if $a==1
            && $b=2;
should generally be written as:
    $a=$b=1;
    $x=1 if( $a==1
             && $b=2);
I was surprised that the case without brackets was accepted by the syntax analyser, because
how to interpret $x=1 if $a{$b}; without brackets is unclear to me. It has a
dual meaning: it should be a syntax error in one case,
    $x=1 if $a{ $b };
and a test of an element of the hash %a in the other.
Both 1 and 2 look to me like frivolities without any useful meaning or
justification
You and I have vastly differing perceptions of what constitutes normal perl
code. For example there are over 700 examples of the 'postfix if on next line' pattern in
the .pm files distributed with the perl core.
There doesn't really seem any point in discussing this further. You have failed to
convince me, and I am very unlikely to work on this myself or accept such a patch into
core.
You and I have vastly differing perceptions of what constitutes normal perl code. For
example there are over 700 examples of the 'postfix if on next line' pattern in the .pm
files distributed with the perl core.
Probably yes. I am an adherent of "defensive programming" who is against
over-complexity as well as arbitrary formatting (a pretty-printer is preferable to me over
manual formatting of code), which in this audience unfortunately means that I am in the
minority.
BTW, your idea that this pragma (which would be optional) matters for the Perl standard
library has no connection to reality.
The more examples I see posted by my esteemed co-monks, the less I like the idea, and I
hated it already when I read it in the OP.
As for soft-semicolon you completly misunderstood the situation:
First, nobody force you to use this pragma. And if you do not use it you are not
affected. I am thinking now that it should be enabled only with option -d.
It does not make sense to conduct something like "performance review" in a large
corporation for my proposals concentrating on "soft-semicolon" idea and ignoring all
others. As if it is the only one worth any discussion. It might be the easiest one to piss
off, but it is far from being the most important or far reaching among those proposals.
There is no free lunch, and for some coding styles (including but not limited to coding
styles used in many modules in Perl standard library) it is definitely inappropriate.
Nobody claim that it is suitable for all users. It is an optional facility for those who
want and need it. In a way, it is a debugging aid that allows to cut the number of
debugging runs. And IMHO there is not a zero subset of Perl users who would be interested
in this capability. Especially system administrators who systematically use bash along with
Perl.
Detractors can happily stay with the old formatting styles forever. Why is this so
difficult to understand before producing such an example?
Moreover, how can you reconcile the amount of efforts (and resulting bugs) for the
elimination of extra round brackets in Perl with this proposal? Is not this the same idea
-- to lessen the possible number of user errors?
For me, it looks like a pure hypocrisy - in one case we are spending some efforts
following other scripting languages at some cost; but the other, similar in its essence,
proposal is rejected blindly as just a bad fashion. If this is a fashion, then eliminating
round brackets is also a bad fashion, IMHO.
And why only I see some improvements possible at low cost in the current Perl
implementation and nobody else proposed anything similar or better, or attempted to
modify/enhance my proposals? After all Perl 5.10 was a definite step forward for Perl. Perl
7 should be the same.
I think the effort spend here in criticizing my proposal would be adequate to introduce
the additional parameter into index function ("to" limit). Which is needed and absence of
which dictates using substr to limit the search zone in long strings. Which is sub-optimal
solution unless the interpreter has advanced optimization capabilities and can recognize
such a use as the attempt to impose the limit on the search.
Constructive discussion does not mean pissing off each and every my posts ( one has -17
votes now; looks a little bit like schoolyard bulling ) -- you need to try to find rational
grain in them, and if such exists, try to revise and enhance the proposal.
The stance "I am happy with Perl 'as is' and go to hell with your suggestions" has its
value and attraction, but it is unclear how it will affect the future of the language.
As for soft-semicolon you completly misunderstood the situation: First, nobody force
you to use this pragma. And if you do not use it you are not affected. I am thinking now
that it should be enabled only with option -d.
In the OP you make no mention of a pragma in proposal 1, you just say that it would be
"highly desirable" to have soft semicolons. This implies that you would like it to be the
default behaviour in Perl 7, which, judging by the responses, would hack a lot of people
off, me included. If you are proposing that soft semicolons are only enabled via a pragma
perhaps you should add a note to that effect in the OP, being sure to make it clear that it
is an update rather than silently changing the text.
And IMHO there is not a zero subset of Perl users who would be interested in this
capability. Especially system administrators who systematically use bash along with
Perl.
I spent the last 26 years of my career as a systems administrator (I had no ambition to
leave technical work and become a manager) on Unix/Linux systems and started using Perl in
that role in 1994 with perl 4.036, quickly moving to 5. The lack of semicolon statement
terminators in the various shell programming languages I had to use was a pain in the arse
and moving to Perl was a huge relief as well as a boost to effectiveness. I would not be
the slightest bit interested in soft semicolons and they would, to my mind, be either a
debugging nightmare or would force me into a coding style alien to my usual practice.
to which I say, nonsense! Why add unnecessary round brackets to perfectly valid code?
Use round brackets where they are needed to disambiguate precedence but not where they just
add superfluous noise. Nothing to do with fascination, I've never touched Python!
You should be commended on the amount of thought that you have put into your proposals
and such efforts should not be discouraged. It is unfortunate that your first proposal has
been the most contentious and the one that most responses have latched onto. Sticking to
one's guns is also a praiseworthy trait but doing so in the face of several powerful and
cogent arguments to the contrary from experienced Perl users is perhaps taking it too far.
Making it clear that soft semicolons would not be the default behaviour might apply some
soothing balm to this thread.
It does not make sense to conduct something like a "performance review" in a large
corporation for my proposals, concentrating on the "soft semicolon" idea and ignoring all
the others, as if it were the only one worth any discussion.
Others have already contributed their thoughts on the rest of your proposals,
which I generally agree with and (more significantly) you haven't disputed. IMO, the
primary reason that all the discussion is focusing on soft semicolons is because it's the
only point you're attempting to defend against our criticisms. There was also a brief
subthread about your ideas on substring manipulation, and a slightly longer one about
alternate braces which close multiple levels of blocks, but those only lasted as long as
you continued the debate.
In a way, it is a debugging aid that allows cutting the number of debugging runs.
Seems like just the opposite to me. It may allow you to get your code to run
sooner, but, when it does, any semicolon errors will still be there and need to be fixed in
additional debugging runs. Maybe a marginal decrease in overall debugging time if there's a
line where you never have to fix the semicolon error because that line ends up getting
deleted before you finish, but it seems unlikely to provide any great savings if (as you
assert) such errors are likely to be present on a significant proportion of lines.
Also, even if it does cut out some debugging runs, they're runs with a very fast
turnaround and little-to-no cognitive effort involved. According to your "BlueJ" paper,
even rank beginners need only 8 seconds to fix a missing semicolon error and initiate a new
compile.
That's neither a natural tendency nor an interesting psychological phenomenon. You just
made that up.
Semicolons at the end of a statement are as natural as a full stop "." at the end of a
sentence, regardless of whether the sentence is the last in a paragraph. The verification
process whether a line "looks syntactically correct" takes longer than just hitting the ";"
key, and the chances of a wrong assessment of "correct" may lead to wrong behavior of the
software.
Language-aware editors inform you about a missing semicolon by indenting the following
line as a continuation of the statement in the previous line, so it is hard to miss.
If, on the other hand, you want to omit semicolons, then the discussion should
have informed you that you aren't going to find followers.
Semicolons at the end of a statement are as natural as a full stop "." at the end of a
sentence, regardless of whether the sentence is the last in a paragraph.
I respectfully disagree, but your comment can probably explain the fierce
rejection of this proposal in this forum. IMHO this is a wrong analogy, as the level of
precision required is different. If you analyse books in print you will find paragraphs in
which the full stop is missing at the end. Most people do not experience difficulties learning
to put a full stop at the end of a sentence most of the time. Unfortunately it does not
work this way in programming languages with a semicolon at the end of a statement, because
what is needed is not "most of the time" but "all the time".
My view, supported by some circumstantial evidence and my own practice, is that this is
a persistent error that arises independently of the level of qualification for most or all
people, and that the semicolon at the end of the statement contradicts some psychological
mechanism programmers have.
Because people have a natural tendency to omit them at the end of the line.
Fascinating. I've never heard of, nor observed, such a tendency. Might you
provide references to a few peer-reviewed studies on the topic? I don't necessarily need
URLs or DOIs (although those would be most convenient) - bibliographic citations, or even
just the titles, should be sufficient, since I have access to a good academic publication
search system.
Offhand, the only potentially-related publication I can locate is "The Case of the
Disappearing Semicolon: Expressive-Assertivism and the Embedding Problem" (Philosophia.
Dec2018, Vol. 46 Issue 4), but that's a paper on meta-ethics, not programming.
Literature is available for free only to academic researchers, so some money might be
involved in getting access.
You can start with:
A statistical analysis of syntax errors (ScienceDirect): "For example, approximately one-fourth of all original syntax errors in the Pascal sample were missing semicolons or use of a comma in place of a semicolon..."
Error log analysis in C programming language courses (PDF).
Programming Languages (book), J. J. Horning, 1979: "...over 14% of the faults occurring in TOPPS programs during the second half of the experiment were still semicolon faults (compared to 1% for TOPPSII), and ... missing semicolons were about..."
An assessment of locally least-cost error recovery, S. O. Anderson, R. C. Backhouse, E. H. Bugge, The Computer Journal, 1983: "...in the former, one is anticipating the possibility of a missing semicolon; in contrast, a missing comma is..."
The role of systematic errors in developmental studies of programming language learners, J. Segal, K. Ahmad, M. Rogers, Journal of Educational ..., 1992 (journals.sagepub.com): "...the expectation, gathered from the students, was that they would experience considerable difficulties with using semicolons, and that the specific rule of ALGOL 68 syntax concerning the role of the semicolon..."
Follow set error recovery, C. Stirling, Software: Practice and Experience, 1985: "...in the former he anticipates the possibility of a missing semicolon whereas in the latter he does not anticipate a missing comma..."
A first look at novice compilation behaviour using BlueJ, M. C. Jadud, Computer Science Education, 2005: "...perhaps encouraging them to make fewer 'missing semicolon' errors, or ... perhaps highlight places where semicolons should be when they are missing..."
Making programming more conversational, A. Repenning, 2011 IEEE Symposium on Visual Languages and Human-Centric Computing: "Miss one semicolon in a C program and the program may no longer work at all ... these kinds of visual programming environments prevent syntactic programming mistakes such as missing semicolons or typos."
Literature is available for free only to academic researchers, so some money might be
involved in getting access.
No problem here. Not only do I work at an academic library, I'm primarily
responsible for the proxy we use to provide journal access for off-campus researchers. All
the benefits of being an academic researcher, with none of the grant proposals!
A statistical analysis of syntax errors - ScienceDirect
The first thing to catch my eye was that the abstract states it found that
syntax errors as a whole (not just semicolon errors) "occur relatively infrequently", which
seems to contradict your presentation of semicolon problems as something which constantly
afflicts all programmers.
Going over the content of the paper itself, I couldn't help noticing that a substantial
fraction of the semicolon errors discussed were in contexts idiosyncratic to Pascal which
have no Perl equivalent, such as the use of semicolons to separate groups of formal
parameters (vs. commas within each group); using semicolon after END most of the time, but
a period at the end of the program; or incorrectly using a semicolon before ELSE. Aside
from being idiosyncratic, these situations also have the common feature of being cases
where sometimes a semicolon is correct and sometimes a semicolon is
incorrect, depending on the context of the surrounding code - which is precisely the major
criticism of your "make semicolons sometimes optional, and escaping line breaks
sometimes required, depending on the context of the surrounding code". The primary
issue in these cases is that the rules change based on context, and you've proposed
propagating the larger problem in an attempt to resolve a smaller problem which, it seems,
only you perceive.
I also note that the data used in this research consisted of code errors collected from
two university programming classes, one of which was an introductory course and the other a
relatively advanced one. It is to be expected that semicolon errors (particularly given the
Pascal idiosyncrasies I mentioned above) would be common in code written for the
introductory course. It would be interesting to see how the frequency compared between the
two courses; I expect that it would be much, much lower in the advanced course - and lower
still in code written by practicing professionals in the field, which was omitted entirely
from the study.
Oh, and a number of other comments in this discussion have mentioned using syntax-aware
editors. Did those even exist in 1978, when this paper was published? Sorry, I'm just being
silly with that question - the paper mentions card decks and keypunch errors, and says that
the students were asked to "access [the compiler] using a 'cataloged procedure' of job
control statements". These programs weren't entered using anything like a modern text
editor, much less one with syntax awareness. (I wasn't able to find a clear indication of
whether the CDC 6000 Series, which is the computer these programs were compiled on, would
have used a card reader or a keyboard for them to enter their code, but I did find that CDC
didn't make a full-screen editor available to time-sharing users on the 6000 series until
1982, which is well after the paper's publication date.)
A first look at novice compilation behaviour using BlueJ
Yep, this one indeed found that missing semicolons were the most common type
of compilation error at 18%, with unknown variable name and missing brackets in a dead heat
for second place at 12%. Of course, it also found that the median time to correct and run
another compile was only 8 seconds after getting a missing semicolon error, so hardly a
major problem to resolve.
Also, once again, as even stated in the title of the paper, this was limited to code
written by novice programmers, taking a one-hour-a-week introductory course, so it seems
misguided to make assertions about the semicolon habits of experienced programmers based on
its findings.
Making programming more conversational
The only mentions of semicolons in this document are " Miss one semicolon
in a C program and the program may no longer work at all. " and " Instead of typing
in text-based instructions, many visual programming languages use mechanisms such as drag
and drop to compose programs. Similar to code auto-completion approaches, these kinds of
visual programming environments prevent syntactic programming mistakes such as missing
semicolons or typos. " While these statements confirm that semicolons are important and
that programmers can sometimes get them wrong (neither of which has been in dispute here),
they make no attempt to examine how commonly semicolon-related errors occur. Given that the
purpose of this paper was to introduce a new form of computer-assisted programming rather
than to examine existing coding practices, I doubt that the authors even considered looking
into the frequency of semicolon errors.
I was not able to locate the remaining papers you mentioned by doing title or author
searches using Ebsco's metasearch tools.
Highly desirable Make a semicolon optional at the end of the line
Highly undesirable. If anything is to be made optional for increased readability, it is not
this, but making braces optional for single-statement blocks. But that won't happen
either.
Highly Questionable Introduce pragma that specifies max allowed length of single and
double quoted strings
Probably already possible with a CPAN module, but who would use it? This is more something
for a linter or perltidy.
Highly desirable Compensate for some deficiencies of using curvy brackets as the
block delimiters
Unlikely to happen and very undesirable. The first option is easy: } #
LABEL (why introduce new syntax when comments will suffice). The second is just plain
illogical and uncommon in most other languages. It will confuse the hell out of every
programmer.
Make function slightly more flexible
a) no; b) await the new signatures; c) macros are unlikely to happen -- see the
problems they faced in Raku. Would be fun though.
Long function names
Feel free to introduce a CPAN module that does all you propose. A new function for trimming
has recently been introduced and spun off a lot of debate. I think none of your proposed
changes on this point is likely to gain momentum.
Allow to specify and use "hyperstrings"
I have no idea what is to be gained. Eager to learn though. Can you give better
examples?
Put more attention of managing namespaces
I think a) is part of the proposed OO reworks for perl7 based on Cor, b) is just plain silly, c) could be useful, but
not based on letters but on sigils or punctuation, like in Raku.
Analyze structure of text processing functions in competing scripting
languages
Sounds like a great idea for a CPAN module, so everyone who requires this functionality can
use it.
Improve control statements
Oooooh, enter the snake pit! There be dragons here, lots of nasty dragons. We have had
given/when and several switch implementations and suggestions, and so far there has been no
single solution to this. We all want it, but we all have different expectations for its
feature set and behavior. Wise people are still working on it, so expect *something* at
some time.
Because }:LABEL actually forcefully closes all blocks in between, while the comment just
informs you which opening bracket this closing bracket corresponds to and, as such, can be
placed on the wrong closing bracket, especially if the indentation is wrong too, worsening
an already bad situation.
Your "one brace to close them all" idea is not needed if you have a decent editor - and,
incidentally, would most likely break this feature in many/most/all editors which provide
it.
Highly desirable Make a semicolon optional at the end of the line
Highly undesirable. If anything is to be made optional for increased readability,
it is not this, but making braces optional for single-statement blocks. But that won't
happen either.
Making braces optional for single-statement blocks is a programming language design blunder,
made for example in PHP. It creates the so-called "dangling else" problem.
BTW, if this is "highly undesirable", can you please explain why the Perl designers took
some effort to allow omitting the semicolon before a closing brace?
Which works equally fine when semicolons are added.
Following the complete discussion, I wonder why you persist. To me it is obvious that
Perl is not (or should not be) your language of choice.
If you really think trailing semicolons should be omitted, do find a language that
allows it. You have come up with exactly ZERO arguments that will convince the other users
of perl, or the perl language designers and maintainers.
To me however, all the counter-arguments were very insightful, so thank you for starting
it anyway.
/me wonders how many users would stop using perl completely if your rules would be
implemented (wild guess 90%) and how many new users the language would gain (wild guess
1%)
Making Perl more like modern Python or JS is not an improvement to the language; you need another
word for that, something like "trends" or "fashion". I see this list as a simplification of the
language (and in a bad way), not an improvement. As if some newbie programmer did not want to
improve himself and get himself up to the complexity of the language, but instead blamed the
language complexity and demanded that it come down to his (low) level. "I don't want to count
closing brackets, make something that will close them all", "I don't want to watch for
semicolons, let the interpreter watch for the end of the sentence for me", "This complex
function is hard to understand and remember how to use in the right way, give me a bunch of
simple functions that together do the same as this one function, but are easy to remember".
Making a tool simpler will not make it more powerful or more efficient; instead it could
make it less efficient, because the tool will have to waste some of its power to compensate
for the user's ineptitude. The interpreter would waste CPU and memory to figure out sentence
endings, these "new" closing brackets and extra function calls, and what's the gain here? I see
only one -- that a newbie programmer could write code with less mental effort. So it's not an
improvement of the language to do more with less, but instead a change that will cause the tool
to do the same with more. Is it an improvement? I don't think so.
As if some newbie programmer did not want to improve himself and get himself up to the
complexity of the language, but instead blamed the language complexity and demanded that it
come down to his (low) level.
The programming language should be adapted to the actual practice of programmers, not to some
illusion of that practice under the disguise of "experts do not commit those errors." If the
errors committed by programmers in a particular language are chronic, as is the case for
missing semicolons and missing closing braces, something needs to be done about them, IMHO.
The same is true of the problem of "overexposure" of global variables. Most
programmers at some point suffer from this type of bug. That's why "my" was pushed into
the language. But IMHO it does not go far enough, as it does not distinguish between reading
and modifying a variable. A "sunglasses" approach to the visibility of global variables might
be beneficial.
BTW the problem of missing braces affects all languages which use "{" and "}"
as block delimiters, and the only implementation which solved this complex problem
satisfactorily was closing labels on the closing block delimiter, as in PL/1 ("}" in Perl;
the "begin/end" pair in PL/1). Like the "missing semicolon", this is a problem from which
programmers suffer independently of their level of experience with the language.
So IMHO any measures that compensate for the "dangling '}'" problem and provide better
coordination between opening and closing delimiters in nested blocks would be
beneficial.
Again, the problem of a missing closing brace is a chronic one. As somebody mentioned here,
an editor that has "match brace" can be used to track it down, but that does not solve the
problem itself; rather, it provides a rather inefficient (for a complex script) way to
troubleshoot it. The problem arises especially often if you modify a long script not written
by you (or written by you long ago). I even experienced a case where the { } brace structure
was syntactically correct but semantically wrong, and that was detected only after the program
was moved to production. A closing label on the bracket would have prevented it.
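Perl today has no checked closing label; the closest runnable idiom is a labeled block with a comment on the closing brace, which is exactly the unverified convention discussed earlier. The sketch below uses invented names and only illustrates the contrast:
    use strict;
    use warnings;

    MAIN: {                                   # labeled outer block
        for my $host (@ARGV) {
            for my $attempt (1 .. 3) {
                # A checked "}:MAIN" here would close every open block up to
                # MAIN and let the compiler verify the match. Today the only
                # multi-level exit is "last MAIN", and the closing brace can
                # carry nothing but a comment the parser never checks.
                last MAIN if $host eq 'stop';
            }
        }
    } # end MAIN   <- just a comment; nothing verifies it matches MAIN: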
If you write short subroutines, as you should, you don't suffer from misplaced closing
curly braces. I had problems with them, especially when doing large edits on code not
written by me, but the editor always saved me.
More or less agree WRT mismatched closing curlies. I see it pretty much entirely as an
editor issue.
(I mean isn't that the whole python argument for Semantic-Whitespace-As-Grouping? At
least I recall that ("Your editor will keep it straight") being seriously offered as a
valid dismissal of the criticism against S-W-A-G . . .)
I mean isn't that the whole python argument for Semantic-Whitespace-As-Grouping?
No, the argument is different, but using indentation to determine block nesting
does allow multiple closure of blocks as a side effect. Python invented a strange mixed
solution in which there is an opening bracket (usually ":") but no closing bracket --
instead, the change in indentation is used as a proxy for the closing bracket.
The problem is that it breaks too many other things, so here the question "is it
worth it" would be more appropriate than in the case of soft semicolons.
As somebody mentioned here the editor that has "match brace" can be used to track it
but that does not solve the problem itself, rather it provides a rather inefficient (for
complex script) way to troubleshoot one. Which arise especially often if you modify the
script.
I would submit that, if you have enough levels of nested blocks that "match
brace" becomes a cumbersome and "inefficient" tool, then your problem is that your code is
overly-complex and poorly-structured, not any issue with the language or the editor. Good
code does not have 47-level-deep nested blocks.
Implement head and tail functions as synonyms to substr ($line,0,$len) and
substr($line,-$len)
Nothing's stopping you from doing that right now.
Yes, you can do it, with certain limitations and a loss of flexibility, as a
user function. The key question here is not whether "you can do it", but how convenient
it will be in comparison with the status quo, whether key categories of users benefit
directly from this addition (for Perl, first of all, whether sysadmins will benefit), and
what the cost is -- how much trouble it is to add it to the already huge interpreter, which
inevitably increases the already large number of built-in functions. As well as whether, in
the long run, the new functions can retire some "inferior" functions like chomp and
chop.
NOTE: it would be better to call them ltrim and rtrim.
With chomp, which is by far the more frequently used of the two, replacing it with rtrim is
just a renaming operation; with chop you need some "inline" function capability
(macro substitution). So rtrim($line) should be equivalent to chomp($line), assuming that
"\n" is the default second argument for rtrim.
Also, any user function by definition has more limited flexibility in comparison with a
built-in function and is less efficient unless implemented in C.
Without introducing an additional argument for a user-defined function it is impossible
to determine whether the function ltrim has a target or not (if not, it should modify the
first parameter), so at the user-function level you need two functions: ltrim and
myltrim, as in that case the second argument has a more natural meaning.
At the user-defined-function level you have quite limited capabilities for determining the
lexical type of the second argument (at run time in Perl you can only distinguish between
the numeric type and a string -- not that a regex or a translation table was passed).
Actually, some languages allow specifying different entry points to a function
depending on the number and type of arguments (string, integer, float, pointer, etc.)
passed. In Perl terms this looks something like extended signatures:
sub name {
    entry ($$)   { }
    entry (\$\$) { }
}
A couple of examples:
The call ltrim($line,7) should be interpreted as
$line = substr($line,7);
but the call $header = ltrim($line,'<h1>'); obviously should be interpreted
as
$header = substr($line, index($line,'<h1>'));
Also, if you want to pass a regex or a translation table, you need somehow to distinguish
the type of the last argument passed. So instead of the function call
$body = ltrim($line, /\s*/); you need to use
$body = ltrim($line, '\s*', 'r'); which should be interpreted as stripping the leading
match of the regex. The same problem arises if you want to pass a set of characters to be
eliminated, as in tr/set1//d;
$body = ltrim($line, " \t", 't'); # equivalent to ($body) = split(' ', $line, 1);
One argument in favor of such functions is that in many languages the elimination of
whitespace at the beginning and end of strings is recognized as an important special case
and a built-in function is provided for this purpose. Perl is one of the few in which there
is no such special operation.
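A rough user-level approximation of the dispatch being described, using ref() to detect a precompiled qr// regex and a numeric test for offsets. It returns the trimmed string rather than modifying its argument in place, and it is only an illustration of the intended semantics, not the proposed built-in:
    use strict;
    use warnings;
    use Scalar::Util qw(looks_like_number);

    # ltrim($line, 7)        -> substr($line, 7)
    # ltrim($line, '<h1>')   -> everything from the first occurrence of '<h1>'
    # ltrim($line, qr/\s+/)  -> $line with the leading match of the regex removed
    sub ltrim {
        my ($line, $spec) = @_;
        if (ref $spec eq 'Regexp') {              # precompiled regex passed as qr//
            $line =~ s/\A$spec//;
            return $line;
        }
        if (looks_like_number($spec)) {           # numeric offset
            return substr($line, $spec);
        }
        my $pos = index($line, $spec);            # literal string: cut from its position
        return $pos >= 0 ? substr($line, $pos) : $line;
    }

    print ltrim('   indented text', qr/\s+/), "\n";   # "indented text"
    print ltrim('## header line', 7),         "\n";   # "er line"
    print ltrim('<p><h1>Title', '<h1>'),      "\n";   # "<h1>Title"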
Allow default read access for global variables, but write mode only with own
declaration via special pragma, for example use sunglasses.
You can do this already. But it doesn't make sense to do this instead of creating
accessors.
Allow to specify a set of characters for which a variable acquires the my attribute
automatically, as well as the default minimum length of non-my variables, via a pragma
my
There are a lot of problems with this. But hey, if you want this, there's nothing stopping
you from writing a module that provides this "feature".
I think these data are flawed since, although writing vanilla programs is encouraged
(simple but good code), some of the entries have made an extraordinary effort to make their
program fast. For instance, look at the Haskell entry for k-nucleotide. It rolls its own
hash table. It claims that there is no library for hash tables, but there is one, and using
it would cut the code size to about a quarter.
The same is somewhat true for the C/C++ implementations: using specialized SIMD instructions
to get better performance, or using the GMP library, which reduces all languages to "who can
call a library faster".
If you have measurements, why not improve them with a visualization?
"Improve them" is good by definition - unfortunately a visualization can also be a
meaningless mush of bogus relationships and misunderstanding :-)
To better visualize which languages' implementations took advantage of parallelism
... Some stay in the same spot, probably indicating that the Computer Language Benchmarks
Game doesn't have the best implementations.
That seems to be a plucked-out-of-thin-air guess.
Shouldn't you have figured out what (if anything) it indicates?
For the single-core case, 27% of the languages are on the list above; for quad-core,
48% made the cut.
Does it make any difference to your statistics that the single core data simply includes
more language implementations than the quad core data?
Guillaume Marceau produced informative plots ... showing the tradeoffs between
metrics which the CLBG has now included on their site
No, Guillaume Marceau's plots are not included on the website.
Yes, because Guillaume Marceau showed people were interested in seeing code-used
time-used scatterplots - I designed a new chart.
I guess I'm happy there are quad-core numbers on here, but are they really useful? I
looked through the benchmarks and support for parallelism is spotty, and I'm not really
sure they represent things people even want to do in parallel. Does it make sense to have
parallel benchmarks for everything?
For example, binary-trees benchmark "allocates MANY, MANY binary trees". Yeah, ok,
that's embarrassingly parallel. Yay! What are you going to do with the trees once you have
them allocated, and how much memory bandwidth do you need to do it all on your quad-core
machine?
Also disappointing: n-body is not parallelized despite being one of the more interesting
parallelizable problems in there. And the various ports of it (e.g. to OCaml) look like
they're really just line-for-line ports of the C version. Wouldn't you write this
differently if you were really an OCaml person?
Another complaint: the OpenMP benchmarks in there don't seem to use any inter-thread
communication more complicated than a reduction, and they're GNU OpenMP. Nobody uses GNU
OpenMP because it sucks compared to all the other implementations. Granted, they're
commercial, but the GNU guys only test their stuff for correctness, not performance.
Finally, is parallelism really rightly expressed as a function of the programming
language? You've got a few different parallel programming models represented here: threads,
OpenMP, Erlang's actor model (if you can call that different from threads), and essentially
one of those is used in a few different languages. Why not compare the programming model
instead? You can do threads in pretty much any of these languages, and it'd be interesting
to see how that does against models (like OpenMP and, I assume, Erlang) where the compiler
helps out with the parallelization to some degree.
Also, Ruby moves up and to the right and uses *less* memory with more cores wtf?
Ok, I think I've fallen for this sufficiently now :-).
There are a lot of interesting comments about how flawed these benchmarks are. Let me
just point out that the Benchmarks Game folks are quick to acknowledge that. If you don't
like the benchmarks, then perhaps you at least will find that this visualization can reveal
some new questionable properties of the data!
Elvind: Yes, this visualization uses the fastest implementation of each language;
as you've pointed out, even within a language there can be a significant tradeoff between
the metrics. I wouldn't say that means the data is flawed. It's just showing one point in
the tradeoff space -- an incomplete picture of each language.
Isaac wrote, regarding the fraction of languages that are Pareto-optimal: "Does
it make any difference to your statistics that the single core data simply includes more
language implementations than the quad core data?"
My main point was that a reasonably large fraction of languages are Pareto-optimal.
Beyond that I'm not making any claims, and I wouldn't read too much into the particular
numbers 27% and 48%. (But yes, I suspect that the size of the Pareto-optimal set does in
general depend in nontrivial ways on the number of points in the set. I would guess that
larger sets imply more Pareto-optimal points but a smaller fraction of points being
Pareto-optimal. Maybe theorists have analyzed this... In this particular case, the larger
set has fewer Pareto-optimal points and a smaller fraction of points being
Pareto-optimal.)
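For concreteness, here is a small sketch of how Pareto-optimal entries can be identified over two metrics where lower is better on both; the data points are invented:
    use strict;
    use warnings;

    # Each entry: [ name, time, memory ]; lower is better on both axes.
    my @entries = (
        [ 'lang-a', 1.0, 120 ],
        [ 'lang-b', 2.5,  40 ],
        [ 'lang-c', 3.0, 200 ],    # dominated by lang-a on both metrics
        [ 'lang-d', 0.8, 500 ],
    );

    # An entry is Pareto-optimal if no other entry is at least as good on
    # both metrics and strictly better on at least one.
    sub pareto_optimal {
        my @set = @_;
        my @optimal;
        ENTRY: for my $e (@set) {
            for my $other (@set) {
                next if $other == $e;
                my $no_worse = $other->[1] <= $e->[1] && $other->[2] <= $e->[2];
                my $better   = $other->[1] <  $e->[1] || $other->[2] <  $e->[2];
                next ENTRY if $no_worse && $better;    # $e is dominated
            }
            push @optimal, $e->[0];
        }
        return @optimal;
    }

    print join(', ', pareto_optimal(@entries)), "\n";   # lang-a, lang-b, lang-d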
@tgamblin > Does it make sense to have parallel benchmarks for everything?
Those aren't "parallel benchmarks" - the task set was established and measured on single
core Intel Pentium 4 five years ago.
@tgamblin > n-body is not parallelized despite being one of the more interesting
parallelizable problems in there
Please design a program that uses multi core for that task and then contribute your
program to the benchmarks game.
(So far people have tried to come up with a multi core program that's faster than the
sequential implementations but have not succeeded - so they don't bother contributing
them.)
@tgamblin > Another complaint: the OpenMP benchmarks
What OpenMP benchmarks? Which benchmarks have OpenMP as a requirement?
@Brighten Godfrey > If you press the movie button in the bottom left, it will
transition to results on a quad-core box. Still normalized by the best single-core
score.
I think I got this wrong earlier, so let's be clear -
You took the summary data for x86 single core and x86 quad core ,
and used the best scores from the single core data to normalize both the single core
and quad core data?
So now let's look at tgamblin's question - Also, Ruby moves up and to the right and
uses *less* memory with more cores wtf?
JRuby memory measurements for some tasks are a lot less on quad core than
single core.
@Isaac Gouy: Those aren't "parallel benchmarks" - the task set was established and
measured on single core Intel Pentium 4 five years ago.
If they're not parallel benchmarks, then why test them on four cores? What do you learn?
This is exactly my point. The parallel benchmarks aren't even things people want to do in
parallel.
@Isaac Gouy: Please design a program that uses multi core for that task and then
contribute your program to the benchmarks game.
I might, someday when I have a little free time. But for anyone who tries this before
me: I would start by setting the number of bodies higher than 5. If that constant is a
requirement, then I imagine no one has succeeded in parallelizing the benchmark because it
doesn't actually do much work.
Also, it's not too hard to find implementations that get speedup .
Caveat: you could argue that parallelizing the 5-body problem is interesting, and that
you could perhaps parallelize over time. Maybe. People have looked into it. But that's a
research problem, and I would counter that no language is going to help you do that.
@Isaac Gouy: What OpenMP benchmarks? Which benchmarks have OpenMP as a
requirement?
The ones that use it -- you need to look at the code to see this. I didn't say
any of them required it.
My point is that maybe the benchmarks, instead of comparing languages for
parallelism, should look at the programming model instead (OpenMP, MPI, threads,
Erlang actors, PGAS, etc.). Otherwise you're falsely attributing any speedup you see to the
language, when maybe it only sped up because a particular parallel paradigm was available
there.
Maybe you've answered my question, though. If they're not really intended to be parallel
benchmarks, none of these points matter anyway.
Isaac wrote: "Just exclude language implementations from the single core data that
are not in the quad-core data - and re-run your analysis, and see what happens."
Since the question you asked above doesn't seem central to my main point I'll forgo
doing that calculation now but you can figure it out the number you're interested in by
inspecting the figure above. (Also, when I said "I suspect that the size of the
Pareto-optimal set does in general depend..." I was referring to Pareto-optimal sets in
general (e.g., a random data set), not this particular data set.)
You took the summary data for x86 single core and x86 quad core, and used the best
scores from the single core data to normalize both the single core and quad core
data?
That's right. If they had been normalized by different values, they wouldn't be directly
comparable.
So now let's look at tgamblin's question - "Also, Ruby moves up and to the right and
uses *less* memory with more cores wtf?" JRuby memory measurements for some tasks are a lot
less on quad core than single core. Look at the quad-core measurements. Look at the
one-core measurements.
I'm not sure what the Benchmarks Game has done, but I wonder if there are very different
implementations or algorithms for the two cases. For example, on chameneos-redux with N=6M,
moving from 1 to 4 cores JRuby gets substantially slower (57 sec -> 167 sec) but uses
substantially less memory (161,480 KB -> 49,580 KB).
@Brighten Godfrey > I'm not sure what the Benchmarks Game has done, but I wonder
if there are very different implementations or algorithms for the two cases.
If you don't know that the programs for the two cases are the same then you don't know
that they are directly comparable.
@Brighten Godfrey > I'm not sure what the Benchmarks Game has done, but I wonder
if there are very different implementations or algorithms for the two cases.
Select Python 3 and be amazed when the movie shows running on quad core increases source
code length!
(Incidentally, you may think that gzipped source code size has something to do with
language "expressiveness" but if you're going to claim that's what it was intended to
measure then I think you need to be able to show where the benchmarks game website says
that.)
Hacker circles, huh? Pretty nice term. I firmly believe that since the internet was
invented, the programming language wars have been evolving and have become more and more
serious. Expressing thoughts and arguments is one of the main ingredients of the
programming language wars. Let us just sit back and observe how long these
programming language wars last and where they take us.
OK. You are right. So it will now be interpreted as a syntax error, but was valid previously,
if we assume that somebody uses this formatting for suffix conditionals.
That supports another critique of the same proposal -- it might break old Perl 5 scripts and
should be implemented only as an optional pragma, useful only for programmers who experience
this problem.
Because even the fact that this error is universal and occurs to all programmers is disputed
here.
if we assume that somebody uses this formatting to suffix conditionals
I do, pretty much all the time! The ability to span a statement over multiple lines
without jumping through backslash hoops is one of the things that makes Perl so attractive. I
also think it makes code much easier to read rather than having excessively long lines that
involve either horizontal scrolling or line wrapping. As to your comment regarding excessively long identifiers,
I come from a Fortran IV background where we had a maximum of 8 characters for identifiers
(ICL 1900 Fortran compiler) so I'm all for long, descriptive and unambiguous identifiers that
aid those who come after in understanding my code.
We need not "assume that somebody uses this formatting". I do it frequently, and I have
often seen it in other people's code. It is not a purely-hypothetical case.
It might make sense to enable it only with the -d option as a help for debugging, which
cuts the number of debugging runs for those who do not have an editor with built-in syntax
checking (like the ActiveState Komodo Editor, which really helps in such cases).
That list includes most Linux/Unix system administrators, who use just the command line and
vi or similar. And they also use bash on a daily basis along with Perl, which increases the
probability of making such an error. And this is probably one of the most important
categories of users for the future of Perl: Perl started with this group (Larry himself,
Randal L. Schwartz, Tom Christiansen, etc.) and, after a short affair with Web
programming (Yahoo, etc.) and bioinformatics (BioPerl), retreated back to the status of the
scripting language of choice for elite Unix sysadmins.
That does not exclude other users and applications, but I think the core of Perl users
are now Unix sysadmins, and their interests should be reflected in Perl 7 with some
priority.
BTW, I do not see benefits of omitted semicolons in the final program (as well as, in
certain cases, omitted round brackets).
I do not understand your train of thought. In the first example the end of the
line occurred when all brackets were balanced, so it will be interpreted as
print( "Hello World" ); if( 1 );
So this is a syntactically incorrect example, as it should be. The second example will now
be interpreted as a syntax error as well, although it was valid previously.
In the following, the first line has a balance of brackets and looks syntactically
correct. Would you expect the lexer to add a semicolon?
$a = $b + $c
+ $d + $e;
Yes, and the user will get an error. This is similar to the previous example with
the trailing "if (1);" suffix on a new line. The first question is why he/she wants to format
the code this way if he/she suffers from this problem, wants to avoid the missing-semicolon
error and, supposedly, enabled the pragma "softsemicolons" for that.
This is the case where the user needs to use #\ to inform the scanner about his choice.
But you are right in the sense that it creates a new type of error -- "missing
continuation" -- and that there is no free lunch. This approach requires a specific discipline
in formatting your code.
The reason I gave that code as an example is that it's a perfectly normal way of
spreading complex expressions over multiple lines: e.g. where you need to add several
variables together and the variables have non-trivial (i.e. long) names, for example:
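The code sample itself was lost when the page was captured; a reconstructed illustration of the pattern being described follows, with invented variable names:
    $total_compressed_size = $header_segment_size
                           + $payload_segment_size
                           + $trailing_checksum_size;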
In this case, the automatic semicolons are unhelpful and will give rise to confusing error
messages. So you've just switched one problem for another, and raised the cognitive load -- people
now need to know about your pragma and also know when it's in scope.
Yes, it discourages a certain formatting style. So what? If you can't live without such
formatting (many can), do not use this pragma. BTW, you can always use extra parentheses,
which will be eliminated by the parser.
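For instance, mirroring the rewrite given a little later in this thread for case 1, an opening parenthesis left unbalanced at the end of the first line blocks the insertion:
    $x = ($a
          + $b);    # bracket balance is non-zero after line 1, so no semicolon is inserted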
* How exactly does the lexer/parser know when it should insert a soft semicolon?
* How exactly does it give a meaningful error message when it inserts one where the user
didn't intend for there to be one?
My problem with your proposal is that it seems to require the parser to apply some
complex heuristics to determine when to insert and when to complain meaningfully. It is not
obvious to me what these heuristics should be. My suspicion is that such an implementation
will just add to perl's already colourful collection of edge cases, and just confuse both
beginner and expert alike.
Bear in mind that I am one of just a handful of people who actively work on perl's lexer
and parser, so I have a good understanding of how it works, and am painfully aware of its
many complexities. (And it's quite likely that I would end up being the one implementing
this.)
The lexical analyzer in Perl is quite sophisticated due to the lexical complexity of the
language. So I think it already counts past lexemes and thus can determine the balance of
"()", "[]" and "{}".
So you probably can initially experiment with the following scheme. If all of the following
conditions are true:
you have reached the EOL;
the pragma "softsemicolon" is on;
the bracket balance is zero;
the next symbol in the look-ahead buffer is not one of the set "{", "}", ";" and "." --
no Perl statement can start with a dot. This set can probably be extended with "&&", "||"
and "!"; also a trailing "," on the current line, and some other symbols clearly pointing
toward continuation of the statement on the next line, should block the insertion;
then the lexical analyzer inserts the lexeme "semicolon" into the stream of lexemes passed to
the syntax analyzer. (A rough standalone sketch of this check appears after the proposed
warning text below.)
The warning issued should be something like:
"A missing semicolon was inserted. If this is incorrect, please use extra parentheses or
disable the pragma "softsemicolon" for this fragment."
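A rough standalone sketch of this heuristic, operating on plain source lines rather than inside the real lexer; the bracket counting is deliberately naive (it ignores strings, regexes and heredocs) and the token sets are only approximations of the ones listed above:
    use strict;
    use warnings;

    my $balance = 0;    # running balance of (), [] and {} across lines

    # Decide whether a soft semicolon would be inserted after $line,
    # given the next line for one token of look-ahead.
    sub would_insert_semicolon {
        my ($line, $next_line) = @_;
        $balance += () = $line =~ /[([{]/g;       # naive: strings/regexes not excluded
        $balance -= () = $line =~ /[)\]}]/g;
        return 0 if $balance != 0;                                # brackets still open
        return 0 if $line =~ /[;{}]\s*$/;                         # already terminated
        return 0 if $line =~ /(?:,|&&|\|\||=>|[=.+\-*\/])\s*$/;   # line clearly continues
        return 0 if defined $next_line                            # look-ahead blockers
                 && $next_line =~ /^\s*(?:[.{}]|&&|\|\||!)/;
        return 1;
    }

    # Report where soft semicolons would go in a file given on the command line.
    my @lines = <>;
    for my $i (0 .. $#lines) {
        chomp(my $line = $lines[$i]);
        next if $line =~ /^\s*(?:#|$)/;                           # skip comments, blank lines
        if (would_insert_semicolon($line, $lines[$i + 1])) {
            printf "soft semicolon would be inserted after line %d: %s\n", $i + 1, $line;
        }
    }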
From what I read, the Perl syntax analyzer relies on the lexical analyzer in some
unorthodox way, so it might be possible to use "clues" from the syntax analyzer to improve
this scheme. See, for example, the scheme proposed for recursive descent parsers in:
Follow set error recovery, C. Stirling, Software: Practice and Experience, 1985: "...in the
former he anticipates the possibility of a missing semicolon whereas in the latter he does
not anticipate a missing comma..."
All of the following satisfy your criteria, are valid and normal Perl code, and would get
a semicolon incorrectly inserted based on your criteria:
use softsemicolon;
$x = $a
+ $b;
$x = 1
if $condition;
$x = 1 unless $condition1
&& $condition2;
Yes in cases 1 and 2; it depends on the depth of look-ahead in case 3: yes if it
is one symbol, no if it is two (no Perl statement can start with &&).
As for "valid and normal", your mileage may vary. For people who would want to use this
pragma it is definitely not "valid and normal". Both 1 and 2 look to me like frivolities
without any useful meaning or justification. Moreover, case 1 can be rewritten as:
$x = ($a
      + $b);
Case 3 actually happens in Perl most often with a regular if, where the opening bracket is
obligatory:
if ( ( $tokenstr =~ /a\[s\]/ || $tokenstr =~ /h\[s\]/ )
     && ( $tokenstr... ) ) { .... }
Also, the Python-inspired fascination with eliminating all brackets does not do any good here.
I was surprised that the case without brackets was accepted by the syntax analyzer at all,
because how $x=1 if $a{$b}; should be interpreted without brackets is unclear to me.
It has a dual meaning: it should be a syntax error in one case,
$x=1 if $a{
$b
};
and the test for an element of the hash %a in the other.
Both 1 and 2 look to me like frivolities without any useful meaning or
justification
You and I have vastly differing perceptions of what constitutes normal perl
code. For example there are over 700 examples of the 'postfix if on next line' pattern in
the .pm files distributed with the perl core.
There doesn't really seem any point in discussing this further. You have failed to
convince me, and I am very unlikely to work on this myself or accept such a patch into
core.
You and I have vastly differing perceptions of what constitutes normal perl code. For
example there are over 700 examples of the 'postfix if on next line' pattern in the .pm
files distributed with the perl core.
Probably yes. I am an adherent of "defensive programming" who is against
over-complexity as well as arbitrary formatting (a pretty-printer is preferable, to me, to
manual formatting of code), which in this audience unfortunately means that I am in a
minority.
BTW, your idea that this pragma (which should be optional) matters for the Perl standard
library has no connection to reality.
Modern society is built on software platforms that encompass a great deal of our lives.
While this is well known, software is invented by people and this comes at considerable cost.
Notably, approximately $331.7 billion are paid, in the U.S. alone, in wages every year for this
purpose. Generally, developers in industry use programming languages to create their software,
but there exists significant dispersion in the designs of competing language products. In some
cases, this dispersion leads to trivial design inconsistencies (e.g., the meaning of the symbol
+), while in other cases the approaches are radically different. Studies in the literature show
that some of the broader debates, like the classic ones on static vs. dynamic typing or
competing syntactic designs, provide consistent and replicable results in regard to their human
factors impacts.
For example, programmers can generally write correct programs more quickly using static
typing than dynamic for reasons that are now known. In this talk, we will discuss three facets
of language design dispersion, sometimes colloquially referred to as the "programming language
wars."
First, we will flesh out the broader impacts inventing software has on society, including
its cost to industry, education, and government. Second, recent evidence has shown that even
research scholars are not gathering replicable and reliable data on the problem. Finally, we
will give an overview of the facts now known about competing alternatives (e.g., types, syntax,
compiler error design, lambdas).
"... Most people do not experience difficulties learning to put a full stop at the end of the sentence most of the time. Unfortunately this does work this way in programming languages with semicolon at the end of statement. Because what is needed is not "most of the time" but "all the time" ..."
"... My view supported by some circumstantial evidence and my own practice is the this is a persistent error that arise independently of the level of qualification for most or all people, and semicolon at the end of the statement contradicts some psychological mechanism programmers have. ..."
What do esteemed monks think about changes necessary/desirable in Perl 7 outside of the OO stuff? I compiled some of my
suggestions and will appreciate the feedback:
[Highly desirable] Make a semicolon optional at the end of the line, if there is a balance of brackets on the line and
the statement looks syntactically correct ("soft semicolon", the solution used in the famous IBM PL/1 debugging compiler).
[Highly Questionable] Introduce a pragma that specifies the max allowed length of single- and double-quoted strings (not any other
type of literal). That might simplify catching a missing quote (which is not a big problem with any decent Perl-aware editor anyway).
[Highly desirable] Compensate for some deficiencies of using curvy brackets as the block delimiters:
Treat "}:LABEL" as the bracket closing "LABEL:{" and all intermediate blocks (this idea was also first
implemented in PL/1).
Treat the "}.." symbol as closing all opened brackets up to the subroutine/BEGIN block level, and "}..." as closing up to and
including this level (closing up to nesting level zero). Along with conserving vertical space, this makes the search for a missing
closing bracket more efficient.
Make functions slightly more flexible:
Introduce a pragma that allows defining synonyms for built-in functions, for example ss for substr and ix for index.
Allow default read access to global variables within subroutines, but write mode only with own declaration via
a special pragma, for example use sunglasses;
Introduce inline functions which will be expanded like macros at compile time:
sub subindex inline { $_[0] = substr($_[0], index($_[0], $_[1], $_[2])) }
As extracting a substring is a very frequent operation, the use of such a long name as substr is counterproductive; it also
contradicts the Perl goal of being concise and expressive.
Allow extracting a substring via ':' or '..' notation, like $line[$from:$to] (a label can't be put inside square brackets
in any case).
Explicitly distinguish between translation tables and regular expressions by introducing tt-strings.
Implement tail and head functions as synonyms for substr($line,0,$len) and substr($line,-$len),
with the ability to specify a string, regex or translation table (tr style) instead of a number as the second argument
(a minimal user-level sketch of head/tail appears after this list):
tail($line,'#')
tail($line,/\s+#\w+$/)
tail($line,tt/[a-zA-Z]/)
Implement a function similar to head and tail called, for example, trim:
trim(string, tt/left_character_set/, tt/right_character_set/);
which deletes all characters from the first character set at the left and all characters from the second character set at
the right, while
trim(string, , tt/right_character_set/);
strips trailing characters only.
Allow specifying and using "hyperstrings" -- strings with characters occupying any power of 2 bytes (2, 4, 8, ...). Unicode
is just a special case of a hyperstring.
$hyper_example1 = h4/aaaa/bbbb/cccc/;
$hyper_example2 = h2[aa][bb][cc];
$pos = index($hyper_example1, h4/bbbb/cccc/);
Put more attention of managing namespaces.
Allow default read access to global variables, but write access only with an explicit declaration via a special pragma, for example use sunglasses.
Allow specifying a set of characters for which a variable acquires the my attribute automatically, as well as the default minimum length of non-my variables, via pragma my (for example, variables with names shorter than three characters should always be my).
Allow specifying a set of characters starting with which a variable is considered to be own, for example [A-Z], via pragma own.
Analyze the structure of text processing functions in competing scripting languages and implement several enhancements to existing functions. For example:
Allow a "TO" argument in the index function, specifying the upper limit of the search.
Implement a delete function for strings and arrays, for example adel(@array,$from,$to), as well as asubstr and aindex functions.
Improve control statements:
Eliminate the keyword 'given' and treat for(scalar) as a switch statement. Allow the when operator in all regular loops too:
for ($var) {
    when ('b')  { ...; }   # means if ($var eq 'b') { ...; last }
    when (>'c') { ...; }
} # for
[Questionable] Extend last to accept labels and implement a "post loop switch" (see Donald Knuth, "Structured Programming with go to Statements"), replacing emulations like:
my $rc = 0;
for (...) {
    if    (condition1) { $rc = 1; last; }
    elsif (...)        { $rc = 2; last; }
}
if    ($rc == 0) { ... }
elsif ($rc == 1) { ... }
elsif ($rc == 3) { ... }
Maybe (not that elegant, but more compact than the emulation above):
for ... {
    when (...);
    when (...);
} with switch {
    default: ...
    1: ...
    2: ...
}
(A plain Perl 5 sketch of this post-loop dispatch pattern follows right after this list.)
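For comparison, the kind of post-loop dispatch described in the last item can already be approximated in today's Perl 5 with a labelled bare block and last LABEL. A minimal sketch (all data and names invented for illustration):

#!/usr/bin/perl
use strict;
use warnings;

# Emulating a "post loop switch": `last SEARCH` exits the labelled block,
# and the code after the block dispatches on how the loop terminated.
my @items   = (3, 7, 42, 9);
my $outcome = 'nothing';

SEARCH: {
    for my $item (@items) {
        if ($item == 42)  { $outcome = 'answer';  last SEARCH; }
        if ($item > 1000) { $outcome = 'too_big'; last SEARCH; }
    }
}

if    ($outcome eq 'answer')  { print "found the answer\n"; }
elsif ($outcome eq 'too_big') { print "value out of range\n"; }
else                          { print "nothing interesting\n"; }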
Highly desirable Make a semicolon optional at the end of the line, if there is a balance of brackets on the line and the statement
looks syntactically correct ("soft semicolon", the solution used in famous IBM PL/1 debugging compiler).
Even aside from that, some of us don't like really long lines of code. Having to scroll horizontally in GUI editors sucks, and
I do most of my coding in good-old 80-column terminal windows. So it's not uncommon for me to split up a long statement into multiple
shorter lines, since whitespace has no syntactic significance.
If CRLF becomes a potential statement terminator, then breaking a single statement across multiple lines not only becomes
a minefield of "will this be treated as one or multiple statements?", but the answer to that question may change depending on
where in the statement the line breaks are inserted!
If implemented, this change would make a mockery of any claims that Perl 7 will just be "Perl 5 with different defaults", as
well as any expectations that it could be used to run "clean" (by some definition) Perl 5 code without modification.
Looks like a valid objection, and I agree: with a certain formatting style it is possible. But do you understand that strict as the default will break a lot of old scripts too?
Per your critique, it probably should not be made the default but implemented as a pragma similar to warnings and strict. You could call this pragma "softsemicolon".
What most people here do not understand is that it can be implemented completely at the lexical scanner level, without affecting the syntax analyzer.
If CRLF becomes a potential statement terminator, then breaking a single statement across multiple lines not only becomes a
minefield of "will this be treated as one or multiple statements?", but the answer to that question may change depending on
where in the statement the line breaks are inserted!
No. The classic solution to this problem was invented in FORTRAN in the early 50s -- it is a backslash at the end of the line. Perl can use #\ for this, as it is a pragma handled by the lexical scanner, not an element of the language.
Usually a long line in Perl is the initialization of an array or hash; after being split, the individual lines do not have balanced brackets and, as such, are not affected and do not require #\ at the end.
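For instance (an illustrative snippet, not taken from the original post):

use strict;
use warnings;

my %config = (           # "(" is still open here, so a bracket-counting
    host => 'localhost', # scanner would not add a soft semicolon on
    port => 8080,        # these continuation lines
);                       # balance is restored only here, where the real ";" already is

print "$config{host}:$config{port}\n";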
A question for you: how many times did you correct a missing semicolon in your Perl scripts during the last week? If you do not know, please count this during the next week and tell us.
The classic solution of this problem was invented in FORTRAN in early 50 -- it is a backslash at the end of the line.
Fortran didn't have a release until 1957 so not early 50s. Fortran prior to F90 used a continuation character at the start
(column 6) of the subsequent line not the end of the previous line. The continuation character in Fortran has never been specified
as a backslash. Perhaps you meant some other language?
Yes, the first FORTRAN compiler was delivered in April 1957. I was wrong, sorry about that. Still, the idea of a continuation symbol belongs to FORTRAN, although the solution was different than I mentioned.
how many times you corrected missing semicolon in your Perl scripts the last week
After running the code - never. All the IDEs I use for all the languages I use flag missing semicolons and other similar foibles (like mismatched brackets).
There are nasty languages that I use occasionally, and even some respectable ones, that need to quote new lines to extend a
statement across multiple lines. That is just nasty on so many levels. I very much agree with
dsheroh that long lines are anathema. Code becomes much harder
to read and understand when lines are long and statements are not chunked nicely.
Don't break what's not broken!
Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
I do not understand your train of thought. In the first example the end of the line occurs when all brackets are balanced, so it will be interpreted as print( "Hello World" ); if( 1 ); So this is a syntactically incorrect example, as it should be. The second example will be interpreted as ...
OK. You are right. So it will now be interpreted as a syntax error, but it was valid previously, if we assume that somebody uses this formatting for suffix conditionals.
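The formatting in question is presumably something like the following (a reconstruction for illustration; the original example did not survive):

# Valid Perl 5 today: one statement split over two lines, with the
# condition as a suffix on the second line.
print("Hello World")
    if (1);
# Under the "soft semicolon" rule the first line has balanced brackets,
# so it would be terminated early, leaving a bare "if (1);" behind --
# which is exactly the syntax error discussed above.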
That supports another critique of the same proposal -- it might break old Perl 5 scripts and should be implemented only as an optional pragma, useful only for programmers who experience this problem. Especially since even the fact that this error is universal and occurs to all programmers is disputed here.
Because people have a natural tendency to omit them at the end of the line. That's why. This is an interesting psychological phenomenon that does not depend on your level of mastery of the language and is not limited
to novices.
So instead, beginners would encounter the interesting psychological phenomenon where a physical end of line is sometimes interpreted
by the compiler as an end of statement, and other times not. One set of errors would be replaced by another.
The problem is real and the solution is real. The objections so far have been pretty superficial and stem from an insufficient understanding of how the proposal works at the level of the lexical scanner -- it essentially replaces the end of line with a semicolon if the brackets are balanced, and the syntax analyzer is not affected at all.
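To make the "purely lexical" claim concrete, here is a minimal sketch of such a pass written as a stand-alone filter over the source text (not a real pragma or perl patch; it deliberately ignores strings, comments, heredocs and multi-line operators, which a real implementation would have to handle):

#!/usr/bin/perl
# Naive "soft semicolon" pre-pass: append ";" to any line on which the
# running (), [], {} balance is zero and that does not already end with
# ";", "{", "}" or ",".
use strict;
use warnings;

my $depth = 0;
while ( my $line = <STDIN> ) {
    chomp $line;
    my $opens  = () = $line =~ /[\(\[\{]/g;   # count opening brackets
    my $closes = () = $line =~ /[\)\]\}]/g;   # count closing brackets
    $depth += $opens - $closes;
    $line .= ';' if $depth == 0 && $line =~ /\S/ && $line !~ /[;{},]\s*$/;
    print "$line\n";
}

Run as perl softsemi.pl < in.pl > out.pl, it illustrates the point being made: by the time the syntax analyzer sees the text, the semicolons are already there, so the grammar itself does not change.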
Can you please tell us how many times you corrected the missing semicolon error in your scripts during the last week?
As I said, I don't forget to include semicolons. See for example
this video , it's 7 years old, but my habits haven't
changed much since then.
That's neither a natural tendency nor an interesting psychological phenomenon. You just made that up.
Semicolons at the end of a statement are as natural as a full stop "." at the end of a sentence, regardless of whether the
sentence is the last in a paragraph. The verification process whether a line "looks syntactically correct" takes longer than just
hitting the ";" key, and the chances of a wrong assessment of "correct" may lead to wrong behavior of the software.
Language-aware editors inform you about a missing semicolon by indenting the following line as a continuation of the statement
in the previous line, so it is hard to miss.
If, on the other hand, you want to omit semicolons, then the discussion should have informed you that you aren't going
to find followers.
Semicolons at the end of a statement are as natural as a full stop "." at the end of a sentence, regardless of whether the
sentence is the last in a paragraph.
I respectfully disagree, but your comment can probably explain the fierce rejection of this proposal in this forum. IMHO this is a wrong analogy, as the level of precision required is different. If you analyze books in print you will find paragraphs in which the full stop is missing at the end. Most people do not experience difficulties learning to put a full stop at the end of the sentence most of the time. Unfortunately it does not work this way in programming languages with a semicolon at the end of the statement, because what is needed is not "most of the time" but "all the time".
My view, supported by some circumstantial evidence and my own practice, is that this is a persistent error that arises independently of the level of qualification for most or all people, and that the semicolon at the end of the statement contradicts some psychological mechanism programmers have.
Highly desirable Make a semicolon optional at the end of the line
Highly undesirable. If things are to be made optional for increased readability, it should not be this, but making braces optional for single-statement blocks. But that won't happen either.
Highly Questionable Introduce pragma that specify max allowed length of single and double quoted string
Probably already possible with a CPAN module, but who would use it? This is more something for a linter or perltidy.
Highly desirable Compensate for some deficiencies of using curvy brackets as the block delimiters
Unlikely to happen and very undesirable. The first option is easy: } # LABEL (why introduce new syntax when comments will suffice). The second is just plain illogical and uncommon in most other languages. It will confuse the hell out of every programmer.
Make function slightly more flexible
a) no, b) await the new signatures, c) macros are unlikely to happen. See the problems they faced in Raku. Would be fun though.
Long function names
Feel free to introduce a CPAN module that does all you propose. A new function for trimming has recently been introduced and
spun off a lot of debate. I think none of your proposed changes in this point is likely to gain momentum.
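For what it is worth, the head/tail/trim behavior proposed above can be prototyped in plain Perl 5 today, which would also be the natural starting point for such a CPAN module. A sketch (names and the simplified interface are mine, and the character sets are interpolated naively, so metacharacters would need quoting in a real module):

use strict;
use warnings;

# head($s, $n): first $n characters; tail($s, $n): last $n characters.
sub head { my ($s, $n) = @_; return substr($s, 0, $n); }
sub tail { my ($s, $n) = @_; return substr($s, -$n);   }

# trim($s, $left_set, $right_set): strip runs of characters from either
# end; an undefined set means "leave that side alone".
sub trim {
    my ($s, $left, $right) = @_;
    $s =~ s/^[$left]+//  if defined $left;
    $s =~ s/[$right]+$// if defined $right;
    return $s;
}

print head("abcdef", 2), "\n";                  # ab
print tail("abcdef", 2), "\n";                  # ef
print trim("  x = 42;  ", ' ', ' ;'), "\n";     # x = 42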
Allow to specify and use "hyperstrings"
I have no idea what is to be gained. Eager to learn though. Can you give better examples?
Put more attention of managing namespaces
I think a) is part of the proposed OO reworks for perl7 based on Cor, b) is just plain silly, c) could be useful, but not based on letters but on sigils or punctuation, like in Raku.
Analyze structure of text processing functions in competing scripting languages
Sounds like a great idea for a CPAN module, so all that require this functionality can use it
Improve control statements
Oooooh, enter the snake pit! There be dragons here, lots of nasty dragons. We have had given/when and several switch implementations and suggestions, and so far there has been no single solution to this. We all want it, but we all have different expectations for its feature sets and behavior. Wise people are still working on it, so expect *something* at some time.
by likbez on Sep 10, 2020 at 16:57 UTC (Reputation: 0)
>The first option is easy } # LABEL (why introduce new syntax when comments will suffice).
Because }:LABEL actually forcefully closes all blocks in between, while the comment just informs you which opening bracket this closing bracket corresponds to and, as such, can be placed on the wrong closing bracket, especially if the indentation is wrong too, worsening an already bad situation.
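A small illustration of the difference (the }:OUTER form is the proposed syntax and does not compile today, so it appears only in the comments):

use strict;
use warnings;

OUTER: {
    for my $i (1 .. 3) {
        if ($i == 2) {
            print "inner work\n";
        }   # a stray "} # OUTER" comment here would go unnoticed by perl
    }
}   # this brace really closes the OUTER block; under the proposal it could
    # be written as }:OUTER and the compiler would verify the match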
That I can agree with. The rest of your proposals seem either unnecessary (because the facilities already exist in the language)
or potentially problematic or almost without utility to me. Sorry. That's not to say you shouldn't suggest them all to p5p for
further review of course - it's only the opinion of a humble monk after all.
by likbez on Sep 10, 2020 at 15:16 UTC (Reputation: 2)
> I have good news: it already does
What I mean is a numeric "local" label (in Pascal style; it can be redefined later in other blocks) in the context of the Knuth idea of "continuations" outside the loop.
That's quite some work you've invested here. I've looked at them from two perspectives:
How does it help when I'm writing code?
How does it help when I'm reading code (mostly written by someone else)?
In summary, your suggestions don't perform that great. These are rather nerdy ideas where I don't see which problem they solve. There isn't much to be added to the comments of other monks, so I'll limit my attention to two items:
I challenge the claim that closing more than one block with one brace allows the search for a missing closing bracket to be more efficient. It just hides problems when you have lost control over your block structure. Source code editors easily allow you to jump from an opening to a closing brace, or to highlight matching braces, but they are extremely unlikely to support such constructs.
I challenge the claim that extracting of substring is a very frequent operation . It is not in the Perl repositories
I've cloned. Many of them don't have a single occurrence of substring . Please support that claim with actual data.
by likbez on Sep 10, 2020 at 21:49 UTC (Reputation: -1)
The frequency per line of code is rather low -- slightly above 4% (156 occurrences in 3678 lines). But in my text processing scripts this is the most often used function. In comparison, the function "index" is used only 53 times, or three times less. It also exceeds the use of regular expressions -- 108 in 3678.
Strokes for folks. I use regexen vastly more often than substr and I almost never use index. Maybe you grew up with some other
language (Visual Basic maybe?) and haven't actually learned to use Perl in an idiomatic way? Perl encourages a plethora of paradigms
for solving problems. The flip side is Perl doesn't do much to discourage hauling less appropriate "comfort coding practices"
from other languages. That is no reason to assume that all Perl users abuse Perl in the same way you do, or have as much trouble
typing statement terminators.
Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Making Perl more like modern Python or JS is not an improvement to the language; you need another word for that, something like "trends" or "fashion". I see this list as a simplification of the language (and in a bad way), not an improvement. As if some newbie programmer would not want to improve himself and get himself up to the complexity of the language, but would instead blame language complexity and demand that it come down to his (low) level: "I don't want to count closing brackets, make something that will close them all", "I don't want to watch for semicolons, let the interpreter watch for the end of the sentence for me", "This complex function is hard to understand and remember how to use in the right way, give me a bunch of simple functions that will do the same as this one function but will be easy to remember".
Making a tool simpler will not make it more powerful or more efficient; instead it could make it less efficient, because the tool will have to waste some of its power to compensate for the user's ineptitude. The interpreter would waste CPU and memory to figure out sentence endings, these "new" closing brackets and the extra function calls, and what is the gain here? I see only one -- that a newbie programmer could write code with less mental effort. So it's not an improvement of the language to do more with less, but instead a change that will cause the tool to do the same with more. Is that an improvement? I don't think so.
As if some newbie programmer would not want to improve himself and get himself up to the complexity of the language, but would instead blame language complexity and demand that it come down to his (low) level.
A programming language should be adapted to the actual practice of programmers, not to some illusion of that practice under the guise of "experts do not commit those errors." If the errors committed by programmers in a particular language are chronic, as is the case for missing semicolons and missing closing braces, something needs to be done about them, IMHO.
The same is true for the problem of "overexposure" of global variables. Most programmers at some point suffer from this type of bug. That's why "my" was pushed into the language. But IMHO it does not go far enough, as it does not distinguish between reading and modifying a variable, and a "sunglasses" approach to the visibility of global variables might be beneficial.
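Part of the intended effect of that hypothetical use sunglasses pragma (globals readable everywhere, but not silently writable) can be approximated today with the core constant pragma. A minimal sketch (names invented for illustration):

use strict;
use warnings;

# A read-only "global": visible to every subroutine in the package,
# but any attempt to assign to it is rejected by perl.
use constant CONFIG_PATH => '/etc/myapp.conf';

sub show_config {
    print 'reading config from ', CONFIG_PATH, "\n";   # read access is fine
    # CONFIG_PATH = '/tmp/other.conf';  # perl refuses this: you can't assign to a constant
}

show_config();

This only covers values that never change at all; the "sunglasses" idea goes further, keeping a variable writable in the scope that owns it while read-only everywhere else.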
BTW the problem of a missing closing brace affects all languages which use "{" and "}" as block delimiters, and the only implementation which solved this problem satisfactorily was closing labels on the closing block delimiter, as in PL/1 ("}" in Perl; the "begin/end" pair in PL/1). Like the "missing semicolon", this is a problem from which programmers suffer independently of their level of experience with the language.
So IMHO any measures that compensate for the "dangling '}'" problem and provide better coordination between opening and closing delimiters in nested blocks would be beneficial.
Again, the problem of a missing closing brace is a chronic one. As somebody mentioned here, an editor with "match brace" can be used to track it down, but that does not solve the problem itself; rather, it provides a rather inefficient (for a complex script) way to troubleshoot it, and the problem arises especially often when you modify the script. I even experienced a case where the { } brace structure was syntactically correct but semantically wrong, and that was detected only after the program was moved to production. A closing label on the bracket would have prevented it.
I never had problems with omitting semicolons; maybe it's because of the extensive Pascal training.
If you write short subroutines, as you should, you don't suffer from misplaced closing curly braces. I had problems with them,
especially when doing large edits on code not written by me, but the editor always saved me.
More or less agree WRT mismatched closing curlies. I see it pretty much entirely as an editor issue.
(I mean isn't that the whole python argument for Semantic-Whitespace-As-Grouping? At least I recall that ("Your editor will
keep it straight") being seriously offered as a valid dismissal of the criticism against S-W-A-G . . .)
I mean isn't that the whole python argument for Semantic-Whitespace-As-Grouping?
No, the argument is different, but using indentation to determine block nesting does allow closing multiple blocks at once, as a side effect. Python invented a strange mixed solution: there is an opening bracket (usually ":") but no closing bracket -- instead the indent is used as the closing bracket.
The problem is that it breaks too many other things, so here the question "is it worth it" is more pertinent than in the case of soft semicolons.
[Highly desirable] Make a semicolon optional at the end of the line, if there is a balance of brackets on the line and the
statement looks syntactically correct ("soft semicolon", the solution used in famous IBM PL/1 debugging compiler).
I feel a bit ashamed to admit that I programmed in PL/I for several years. The reason why PL/I was so relaxed w.r.t. syntax is simple: you put your box full of punched cards on the operators' desk and you got the compiler's result the next day. If the job had failed just because of a missing semicolon, you'd lose one full day. Nowadays there is absolutely no need for such stuff.
BTW, the really fatal errors in a PL/I program resulted in a compiler warning of the kind "conversion done by subroutine call". This happened e.g. when assigning a pointer to a character array.
I wouldn't like to see any of the fancy features of PL/I in Perl. Consult your fortune database:
Speaking as someone who has delved into the intricacies of PL/I, I am sure that only Real Men could have written such a
machine-hogging, cycle-grabbing, all-encompassing monster. Allocate an array and free the middle third? Sure! Why not? Multiply
a character string times a bit string and assign the result to a float decimal? Go ahead! Free a controlled variable procedure
parameter and reallocate it before passing it back? Overlay three different types of variable on the same memory location?
Anything you say! Write a recursive macro? Well, no, but Real Men use rescan. How could a language so obviously designed and
written by Real Men not be intended for Real Man use?
PL/1 still exists, although as a niche language practically limited to mainframes. Along with being a base for C, it was also probably the first programming language that introduced exceptions as a mainstream language feature. Also, IMHO it is the origin of the functions substr, index and translate as we know them. PL/1 compilers were real masterpieces of software engineering and in many respects probably remain unsurpassed.
What is common between PL/1 and Perl is the amount of unjustified hate from CS departments and users of other languages toward
them.
What I think is common about both is that, while being very unorthodox, they are expressive and useful. Fun to program with.
As Larry Wall said: "Perl is, in intent, a cleaned up and summarized version of that wonderful semi-natural language known as
'Unix'."
The unorthodox nature of Perl and its solutions, which stem from the Unix shell, is probably what makes people coming from a Python/Java/JavaScript background hate it.
Currently, the big push is to turn on warnings and strict by default; I like the initially slow approach. I don't have a strong
opinion about any of your suggestions (good or bad) because I see none of them as particularly disruptive. Heck, I'd be happy
to have say and state available without turning them on explicitly. Ultimately, I just look forward to moving
towards a more aggressive model of having new features on by default.
Thomas Claburn, writing for The Register: Developers really dislike Perl, and projects
associated with Microsoft, at least among those who volunteer their views through Stack
Overflow. The community coding site offers programmers a way to document their technical
affinities on their developer story profile pages. Included therein is an input box for tech
they'd prefer to avoid. For developers who have chosen to provide testaments of loathing,
Perl tops the list of
disliked programming languages, followed by Delphi and VBA . The yardstick here consists of
the ratio of "likes" and "dislikes" listed in developer story profiles; to merit chart
position, the topic or tag in question had to show up in at least 2,000 stories. Further down
the list of unloved programming languages come PHP, Objective-C, CoffeeScript, and
Ruby. In a blog post seen by The Register ahead of its publication today, Stack Overflow data
scientist David Robinson said usually there's a relationship between how fast a particular tag
is growing and how often it's disliked. "Almost everything disliked by more than 3 per cent of
Stories mentioning it is shrinking in Stack Overflow traffic (except for the quite polarizing
VBA, which is steady or slightly growing)," said Robinson. "And the least-disliked tags -- R,
Rust, TypeScript and Kotlin -- are all among the fast-growing tags (TypeScript and Kotlin
growing so quickly they had to be truncated in the plot)."
And once you want to move beyond some simple automation scripts, you find that Python
doesn't have the performance to handle anything more taxing.
Re:Perl Is Hated Because It's Difficult (Score: 4, Interesting) by Anonymous Coward on Wednesday November 01, 2017 @11:05AM (#55469365)
Perl doesn't encourage or discourage you to write good or bad code. What it does very well
is work with the philosophy of DWIM (Do What I Mean). Importantly, it doesn't throw a giant
pile of (effectively) RFCs with an accompanying Nazi yelling, "YOU VILL VRITE CODE DIS VAY." at
you the way Python does. I've seen great Perl code and poor Perl code. I've seen great Python
code and poor Python code. A shitty developer writes shitty code and doesn't read
documentation. A great developer can take a language like Perl and create a great, readable
code.
Real source (Score: 4, Informative) by Shompol (1690084) on Wednesday November 01, 2017 @10:12AM (#55469013)
The original study is here [stackoverflow.blog]. I found the "polarization of technology" diagram at the bottom even more interesting.
Experience-based opinions (Score: 5, Insightful) by Sarten-X (1102295) on Wednesday November 01, 2017 @10:16AM (#55469047)
Having worked in Perl (and many other languages) for about 15 years now, I'm curious how
many of those polled actually use Perl regularly.
Whenever I have to introduce someone to my Perl scripts, their first reaction is usually the
typical horror, which fades in a few days after they start using it. Yes, there are comments.
Yes, there is decent design. No, the regular expressions are not worse than any other
implementation. No, the "clever" one-liner you copied off of a PerlMonks golf challenge will
not pass review.
Sure, there are a few weird warts on the language ("bless" being the most obvious example),
but it's no worse than any other, and significantly better than some of today's much more
popular languages. Mostly, I find that Perl just has a bad reputation because it allows you to
write ugly code, just like C allows you to corrupt data and Java allows you to consume obscene
amounts of memory. The language choice does not excuse being a bad programmer.
At least Perl stable. (Score: 5, Insightful) by Qbertino (265505) <[email protected]> on Wednesday November 01, 2017 @10:38AM (#55469163)
Perl is a wacky language and only bearable if you can handle old school unix stunts, no doubt. It gave birth to PHP, which speaks volumes. I remember reading an OReilly introduction to Perl and laughing at the wackiness. I've done the same with PHP, but I've always respected
both. Sort of.
Unlike newfangled fads and disasters like Ruby, Perl is a language that remains usable. Books
on Perl from 18 years ago are still valid today, just like with awk, TCL and Emacs Lisp.
Complain all you want about the awkwardness of old-school languages - they still work and
many of them run on just about anything that can be powered by electricity. These days I'm
still a little reluctant to say which side Javascript will come up on now that Node has its very own version hodgepodge gumming up the works.
Two types of languages . . . (Score: 5, Insightful) by walterbyrd (182728) on Wednesday November 01, 2017 @10:42AM (#55469203)
Personally I prefer Perl over similar scripting languages.
I write in KSH, CSH, Python and Perl regularly... Of the three, Perl is my hands down
favorite for a scripting language.
If you are writing applications in Perl though, it sucks. The implementation of objects is
obtuse, it isn't geared for User Interfaces (Perl/TK anyone?) and performance is really
horrid.
But... I cut my programming teeth on C (K&R, not ANSI) so I'm one of those old grey
headed guys who go "tisk tisk" at all those new fangled, it's better because it's new things
you young ones think are great.
Funny, I quite enjoyed writing in Perl 5 and the feeling was empowerment, and the community
was excellent. At the time Python was quite immature. Python has grown but Perl 5 is still
quite useful.
There is also quite a difference between legacy code and code written today using modern
extensions, though it seems people enjoy trashing things, instead of admitting they did not
actually learn it.
Perl is just fine (Score: 2, Insightful) by Anonymous Coward on Wednesday November 01, 2017 @11:43AM (#55469621)
I love perl. What I don't love is the deliberately obfuscated perl written by someone trying
to be clever and/or indispensable by writing code only they can (quickly) understand. A quick
down-and-dirty perl script is one thing, using it in reusable scripts is just immature and
pointless. Especially those who refuse to document their code.
My biggest problem with Perl is that there are SO many ways to express similar operations, conditionals, etc. While this may be nice for single developer projects, it is
utter hell if someone has to read that code. This has happened because of Perl's long life and
its iterations to add more and more contemporary programming concepts. This has made it
possible (and thus it will happen) to make Perl code a spaghetti mess of syntaxes. This makes
perl code difficult to read much less grok.
I'm not saying Perl is the only offender of this. PHP has the same issue with its older
functional programming syntax style and its OOP syntax. But PHP has kept it mainly to two
styles. Perl has way too many styles, so people get lost in syntax and find it hard to follow the code.
Re: It is surprising to me that enough developers have used Perl for it to be the most hated language. I would have guessed JavaScript, or maybe VB (#4 & #2 most hated).
Re: My usual experience with Perl goes like this: We can't process data this year, can you help us? Oh, this is a 20-year-old Perl script. Let the biopsy begin.
Re:Is that surprising? (Score: 5, Informative) by Austerity Empowers (669817) on Wednesday November 01, 2017 @11:05AM (#55469361)
My experience with the Perl hate is it's usually from younger people (by which I mean anyone
under about 40). It violates everything some may have been taught as part of their software
engineering program: it's difficult to read, maintain, and support.
But, it exists for a reason and it's ridiculously good at that purpose. If I want to process
lots of text, I do not use Python, I whip out perl. And usually it's fine, the little bits of
perl here and there that glue the world together aren't usually that egregious to maintain
(particularly in context of the overall mechanism it's being used to glue together,
usually).
If I'm going to write serious code, code that may formulate the basis for my corporations
revenue model or may seriously improve our cost structure, I use a serious language (C/C++,
usually) and spend significant amounts of time architecting it properly. The problem is that
more and more people are using scripting languages for this purpose, and it's becoming socially
acceptable to do so. The slippery slope being loved by children and idiots alike, one might say
"I know Perl, let's use that!" and countless innocents are harmed. Re:Is that
surprising? (
Score: 5 , Informative) by networkBoy ( 774728 ) on Wednesday November 01, 2017
@11:57AM ( #55469737 )
Journal
I *love* perl.
It is C for lazy programmers.
I tend to use it for four distinct problem domains:
* one-offs for data processing (file to file, file to stream, stream to file, stream to
stream). When I'm done I don't need it any more
* glue code for complex build processes (think a preprocessor and puppetmaster for
G/CMAKE)
* cgi scripts on websites. Taint is an amazing tool for dealing with untrusted user input.
The heavy lifting may be done by a back end binary, but the perl script is what lives in the
/cgi-bin dir.
* test applications. I do QA and Perl is a godsend for writing fuzzers and mutators.
Since it's loosely typed and dynamically allocates/frees memory in a quite sane manner it is
able to deal with the wacky data you want fuzzers to be working with.
Re:Is that surprising? (Score: 5, Insightful) by al0ha (1262684) on Wednesday November 01, 2017 @01:28PM (#55470385)
Yep -
Perl is C for lazy programmers - well maybe not lazy, but programmers that don't want to have
to deal with allocating and freeing memory, which is the bane of C and where many of the
security problems arise. The other beautiful thing about Perl is no matter how you write your
code, the interpreter compiles it into the most efficient form, just like C.
I think hate for Perl stems from the scripters who try to show off their Perl skills,
writing the most concise code which is exasperatingly confusing and serves absolutely no
purpose. Whether you write verbose code which takes many lines to do the same thing as concise
and hard to understand code, at run time they perform exactly the same.
Perl coders have only themselves to blame for the hate; thousands of lines of stupid
hard to read code is a nightmare for the person that comes along months or years later and has
to work on your code. Stop it damn it, stop it!!!!!
Re:Is that surprising? (Score: 5, Insightful) by fahrbot-bot (874524) on Wednesday November 01, 2017 @12:28PM (#55469959)
My experience with the Perl hate is it's usually from younger people (by which I mean
anyone under about 40). It violates everything some may have been taught as part of their
software engineering program: it's difficult to read, maintain, and support.
The quality of the program structure and the ability to read, maintain and support it
are due to the programmer, not Perl. People can write programs well/poorly in any language.
Like some others here, I *love* Perl and always endeavor to write clear, well-organized code -
like I do in any other programming language - so others can make sense of it -- you know, in
case I get hit by a bus tomorrow... It's called being professional.
Hate the programmer, not the programming language.
Re:Is that surprising? (Score: 5, Funny) by Anonymous Coward on Wednesday November 01, 2017 @10:16AM (#55469039)
Many of us who know perl (and think you're a hypersensitive snowflake of a developer)
learned C before we learned Perl.
We're immune to coding horrors.
Re:Ruby... (Score: 4, Interesting) by Anonymous Coward on Wednesday November 01, 2017 @11:28AM (#55469503)
The problem is because people use the wrong tools for things. This is not a definitive
list:
Perl is ONLY useful today as a server-sided processing script. If you are using Perl
on your front end, you will get dependency hell as your server updates things arbitrarily. Perl
breaks super-frequently due to the move from manual updates to automatic updates of third party
libraries/ports. Thus if you don't update Perl and everything that uses Perl at the same time,
mass-breakage. Thus "Don't update Perl you moron"
To that end PHP is on the other side of that coin. PHP is only useful for websites and
nothing else. If you run PHP as a backend script it will typically time out, or run out of
memory, because it's literally not designed to live very long. Unfortunately the monkeys that
make Wordpress themes, plugins, and "frameworks" for PHP don't understand this. Symfony is
popular, Symfony also is a huge fucking pain in the ass. Doctrine, gobbles up memory and gets
exponentially slower the longer the process runs.
Thus "Don't update Wordpress" mantra, because good lord there are a lot of shitty
plugins and themes. PHP's only saving grace is that they don't break shit to cause dependency
hell, they just break API's arbitrarily, thus rendering old PHP code broken until you update
it, or abandon it.
Ruby is a poor all-purpose tool. In order to use it with the web, you basically need
to have the equivalent of php-fpm for Ruby running, and if your server is exhausted, just like
php, it just rolls over and dies. Ruby developers are just like Python developers (next) in
that they don't fundamentally understand what they are doing , and leave (crashed) processes
running perpetually. At least PHP gets a force-kill after a while. Ruby Gems create another
dependency hell. In fact good luck getting Ruby on a CentOS installation, it will be obsolete
and broken.
Python, has all the worst of Perl's dependency hell with Ruby's clueless developers.
Python simply doesn't exist on the web, but good lord so many "build tools" love to use it, and
when it gets deprecated, whole projects that aren't even developed in Python stop
working.
Which leads me to NodeJS/NodeWebkit. Hey it's Javascript, everyone loves javascript.
If you're not competent enough to write Javascript, turn in your developers license. Despite
that, just like Perl, Ruby and Python, setting up a build environment is an annoying pain in
the ass. Stick to the web browser and don't bother with it.
So that covers all the interpreted languages that you will typically run into on the
web.
Java is another language that sometimes pops up on servers, but it's more common in
finance and math projects, which are usually secretive. Java, just like everything mentioned,
breaks shit with every update.
C is the only language that hasn't adopted the "break shit with every update" approach, because C cannot be "improved" on any level. Most of what has been added to the C API deals
with threading and string handling. At the very basics, anything written in C can compile on
everything as long as the platform has the same functions built into the runtime. Which isn't
true when cross-compiling between Linux and Windows. Windows doesn't "main()" while Linux has a
bunch of posix functions that don't exist on Windows.
Ultimately the reasons all these languages suck comes right back to dependency hell. A
language that has a complete API, requiring no libraries, simply doesn't exist, and isn't
future proof anyways.
People hate a lot of these languages because they don't adhere to certain programming
habits they have, like object oriented "overide-bullshit", abuse of global variables, or
strongly typed languages. Thus what should work in X language, doesn't work in Y language,
because that language simply does it differently.
Like, weakly typed languages are probably supreme, at the expense of runtime performance, because they result in fewer errors. That said, =, == and === are different. In a
strong type language, you can't fuck that up. In a weak type language, you can make mistakes
like if(moose=squirrel){blowshitup();} and the script will assume you want to make moose the
value of squirrel, AND run blowshitup() regardless of the result. Now if you meant ===, no type
conversion.
Re:Ruby... (Score: 5, Interesting) by Darinbob (1142669) on Wednesday November 01, 2017 @02:20PM (#55470827)
You can write Forth code that is readable. Once you've got the reverse notation
figured out it is very simple to deal with. The real problem with Perl is that the same
variable name can mean many different things depending upon the prefix character and the
context in which it is used. This can lead to a lot of subtle bugs, leads to a steep learning
curve, and even a few months of vacation from the language can result in being unable to read
one's own code.
On the other hand, Perl was never designed to be a typical computer language. I was
berated by Larry Wall over this, he told me "you computer scientists are all alike". His goal
was to get a flexible and powerful scripting language that can be used to get the job done. And
it does just that - people use Perl because it can get stuff done. When it was new on Unix it
was the only thing that could really replace that nasty mix of sh+awk+ed scripting that was
common, instead being able to do all of that in a single script, and that led to its extremely
fast rise in popularity in the early 90s. Yes, it's an ugly syntax but it's strong underneath,
like the Lou Ferrigno of programming languages.
Re:Ruby... (Score: 2) by Shompol (1690084) on Wednesday November 01, 2017 @10:27AM (#55469105)
Ruby is ahead of Perl, in the "medium-disliked"
[stackoverflow.blog] category. I find it amusing that Ruby was conceived as a Python
replacement, yet fell hopelessly behind in the popularity contest.
Perl bashing is popular sport among a particularly vocal crowd.
Perl is extremely flexible. Perl holds up TIMTOWTDI ( There Is More Than One Way To
Do It ) as a virtue. Larry Wall's Twitter handle is @TimToady, for goodness sake!
That flexibility makes it extremely powerful. It also makes it extremely easy to write code
that nobody else can understand. (Hence, Tim Toady
Bicarbonate.)
You can pack a lot of punch in a one-liner in Perl:
That one-liner takes a block of raw data (in $data), expands it to an array of values, and then ...
Steve J, Software Engineer, answered November 4, 2017. Originally Answered: Why is Perl so hated and not commonly used? And why should I learn it?
You should learn things that make your life easier or better. I am not an excellent Perl
user, but it is usually my go-to scripting language for important projects. The syntax is
difficult, and it's very easy to forget how to use it when you take significant time away from
it.
That being said, I love how regular expressions work in Perl. I can use sed like commands
$myvar =~ s/old/new/g for string replacement when processing or filtering strings. It's much
nicer than other languages imo.
I also like Perls foreach loops and its data structures.
I tried writing a program of moderate length in Python ...
It is still used, but its usage is declining. People use Python today in situations when
they would have used Perl ten years ago.
The problem is that Perl is extremely pragmatic. It is designed to be "a language to get
your job done", and it does that well; however, that led to rejection by language formalists.
However, Perl is very well designed, only it is well designed for professionals who grab in the
dark expecting that at this place there should be a button to do the desired functionality, and
indeed, there will be the button. It is much safer to use than, for example, C (the sharp knife ...).
Michael Yousrie, A programmer, A Problem Solver, answered November 4, 2017. Originally Answered: Why is Perl so hated and not commonly used? And why should I learn it?
Allow me to state my opinion though; You can't have people agreeing on everything because
that's just people. You can't expect every single person to agree on a certain thing, it's
impossible. People argue over everything and anything, that's just people.
You will find people out there that are Perl fanatics, people that praise the language! You
will also find people that don't like Perl at all and always give negative feedback about
it.
To be honest, I never gave a damn about people's opinions ...
The truth is, that by any metric, more Perl is being done today than during the dot com
boom. It's just a somewhat smaller piece of a much bigger pie. In fact, I've heard from some
hiring managers that there's actually a shortage of Perl programmers, and not just for
maintaining projects, but for new greenfield deploys.
Richard Conto, Programmer in multiple languages. Debugger in even more. Answered December 18, 2017.
Perl bashing is largely hearsay. People hear something and they say it. It doesn't require
a great deal of thought.
As for Perl not commonly being used - that's BS. It may not be as common as the usual gang
of languages, but there's an enormous amount of work done in Perl.
As for why you should learn Perl, it's for the same reason you would learn any other
language - it helps you solve a particular problem better than another language available. And
yes, that can be a very subjective decision to make.
Because even the best features of Perl easily produce a write-only language. I have written a one-liner XML parser using Perl regexes. The program has worked perfectly for more than 10 years, but I have been afraid of any change or new feature requirement which I can't fulfill without writing a totally new program, because I can't understand my old one.
Reed White, former Engineer at Hewlett-Packard (1978-2000), answered November 7, 2017.
Yes, Perl takes verbal abuse; but in truth, it is an extremely powerful, reliable language.
In my opinion, one of its outstanding characteristics is that you don't need much knowledge
before you can write useful programs. As time goes by, you gradually learn the real power of
the language.
However, because Perl-bashing is popular, you might better put your efforts into learning
Python, which is also quite capable.
What esteemed monks
think about changes necessary/desirable in Perl 7 outside of OO staff. I compiled some my
suggestions and will appreciate the feedback:
[Highly desirable] Make a semicolon optional at the end of the line, if there is a
balance of brackets on the line and the statement looks syntactically correct ("soft
semicolon", the solution used in famous IBM PL/1 debugging compiler).
[Highly Questionable] Introduce pragma that specify max allowed length of single and
double quoted string (not not any other type of literals). That might simplify catching
missing quote (which is not a big problem with any decent Perl aware editor anyway)
[Highly desirable] Compensate for some deficiencies of using curvy brackets as the
block delimiters:
Treat "}:LABEL" as the bracket closing "LABEL:{" and all
intermediate blocks (This idea was also first implemented in PL/1)
Treat " }.. " symbol as closing all opened brackets up to the
subroutine/BEGIN block level and }... including this level (closing up to the
nesting level zero. ). Along with conserving vertical space, this allows search for
missing closing bracket to be more efficient.
Make function slightly more flexible:
Introduce pragma that allows to define synonyms to built-in functions, for example ss
for for substr and ix for index
Allow default read access for global variables with subroutines, but write mode only
with own declaration via special pragma, for example use
sunglasses;
Introduce inline functions which will be expanded like macros at compile time:
sub subindex inline{ $_[0]=substr($_[0],index($_[0],$_[1],$_2])) }[download]
As extracting of substring is a very frequent operation and use of such long name is
counterproductive; it also contradicts the Perl goal of being concise and expressive .
allow to extract substring via : or '..' notations like $line [$from:$to]
(label can't be put inside square brackets in any case)
Explicitly distinguish between translation table and regular expressions by
introducing tt-strings
Implement tail and head functions as synonyms to substr
($line,0,$len) and substr($line,-$len)
With the ability to specify string, regex of translation table(tr style) instead of
number as the third argument tail($line,'#') tail($line,/\s+#\w+$/)
tail($line,tt/a-zA-z]/[download]
Implement similar to head and tail function called, for example, trim: trim(string,tt/leftcharacter_set/, tt/right_character_set/);[download]
which deleted all characters from the first character set at the left and all characters
from the second character set from the right,
trim(string,,/right_character_set)
strips trailing characters only.
Allow to specify and use "hyperstrings" -- strings with characters occupying any power
of 2 bytes (2,4,8, ...). Unicode is just a special case of hyperstring
$hyper_example1= h4/aaaa/bbbb/cccc/;
$hyper_example2= h2[aa][bb][cc];
$pos=index($hyper_example,h4/bbbb/cccc/)
Put more attention of managing namespaces.
Allow default read access for global variables, but write mode only with own
declaration via special pragma, for example use sunglasses.
Allow to specify set of characters, for which variable acquires my attribute
automatically, as well as the default minimum length of non my variables via pragma my
(for example, variables with the length of less then three character should always be
my)
Allow to specify set of character starting from which variable is considered to be
own, for example [A-Z] via pragma own.
Analyze structure of text processing functions in competing scripting languages and
implement several enhancements for existing functions. For example:
Allow "TO" argument in index function, specifying upper range of the search.
Implement delete function for strings and arrays. For example
adel(@array,$from,$to) and asubstr and aindex functions.[download]
Improve control statements
Eliminate keyword 'given' and treat for(scalar) as a switch statement. Allow
when operator in all regular loops too. for($var){<br> when('b'){
...;} # means if ($var eq 'b') { ... ; las + t} when(>'c'){...;} } # for[download]
[Questionable] Extend last to accept labels and implement "post loop switch" (See
Donald Knuth Structured
Programming with go to Statements programming with goto statements) my rc==0;
for(...){ if (condition1) { $rc=1; last;} elsif(...){$rc=2; last} } if ($rc==0){...}
elif($rc==1){...} elif($rc==3){...}[download]
May be (not that elegant, but more compact the emulation above)
for ...{ when
(...); when (...); }with switch{ default: 1: ... 2: ... }[download]
Highly desirable Make a semicolon optional at the end of the line, if there is a balance of
brackets on the line and the statement looks syntactically correct ("soft semicolon", the
solution used in famous IBM PL/1 debugging compiler).
If CRLF becomes a potential statement terminator, then breaking a single
statement across multiple lines not only becomes a minefield of "will this be treated as
one or multiple statements?", but the answer to that question may change depending on where
in the statement the line breaks are inserted!
If implemented, this change would make a mockery of any claims that Perl 7 will just be
"Perl 5 with different defaults", as well as any expectations that it could be used to run
"clean" (by some definition) Perl 5 code without modification.
If implemented, this change would make a mockery of any claims that Perl 7 will just be
"Perl 5 with different defaults", as well as any expectations that it could be used to
run "clean" (by some definition) Perl 5 code without modification.
Looks like a valid objection. I agree. With certain formatting style it is
possible. But do you understand the strict as the default will break a lot of old scripts
too. Per your critique, it probably should not be made as the default and implemented as
pragma similar to warnings and strict. You can call this pragma "softsemicolon"
What most people here do not understand is it can be implemented completely on lexical
scanner level, not affecting syntax analyser.
If CRLF becomes a potential statement terminator, then breaking a single statement across
multiple lines not only becomes a minefield of "will this be treated as one or multiple
statements?", but the answer to that question may change depending on where in the
statement the line breaks are inserted!
No. The classic solution of this problem was invented in FORTRAN in early 50
-- it is a backslash at the end of the line. Perl can use #\ as this is pragma to lexical
scanner, not the element of the language.
Usually long line in Perl is the initialization of array or hash and after the split
they do not have balanced brackets and, as such, are not affected and do not require #\ at
the end.
Question to you: how many times you corrected missing semicolon in your Perl scripts the
last week ? If you do not know, please count this during the next week and tell us.
how many times you corrected missing semicolon in your Perl scripts the last week
After running the code - never. All the IDEs I use for all the languages I use flag
missing semi-colons and other similar foibles (like mis-matched brackets.
There are nasty languages that I use occasionally, and even some respectable ones, that
need to quote new lines to extend a statement across multiple lines. That is just nasty on
so many levels. I very much agree with dsheroh that long lines are anathema. Code
becomes much harder to read and understand when lines are long and statements are not
chunked nicely.
Don't break what's not broken!
Optimising for fewest key strokes only makes sense
transmitting to Pluto or beyond
The classic solution of this problem was invented in FORTRAN in early 50 -- it is a
backslash at the end of the line.
Fortran didn't have a release until 1957 so not early 50s. Fortran prior to F90 used a
continuation character at the start (column 6) of the subsequent line not the end of the
previous line. The continuation character in Fortran has never been specified as a
backslash. Perhaps you meant some other language?
I do not understand your train of thought. In the first example end of the
line occurred when all brackets are balanced, so it will will be interpretered as
print( "Hello World" ); if( 1 );[download]
So this is a syntactically incorrect example, as it should be. The second example will
be interpreted as
That support another critique of the same proposal -- it might break old Perl 5 scripts
and should be implemented only as optional pragma. Usuful only for programmers who
experience this problem.
Because even the fact that this error is universal and occurs to all programmers is
disputed here.
That's neither a natural tendency nor an interesting psychological phenomenon. You just
made that up.
Semicolons at the end of a statement are as natural as a full stop "." at the end of a
sentence, regardless of whether the sentence is the last in a paragraph. The verification
process whether a line "looks syntactically correct" takes longer than just hitting the ";"
key, and the chances of a wrong assessment of "correct" may lead to wrong behavior of the
software.
Language-aware editors inform you about a missing semicolon by indenting the following
line as a continuation of the statement in the previous line, so it is hard to miss.
If, on the other hand, you want to omit semicolons, then the discussion should
have informed you that you aren't going to find followers.
Semicolons at the end of a statement are as natural as a full stop "." at the end of a
sentence, regardless of whether the sentence is the last in a paragraph.
I respectfully disagree, but your comment can probably explain fierce
rejection of this proposal in this forum. IMHO this is a wrong analogy as the level of
precision requred is different. If you analyse books in print you will find paragraphs in
which full stop is missing at the end. Most people do not experience difficulties learning
to put a full stop at the end of the sentence most of the time. Unfortunately this does
work this way in programming languages with semicolon at the end of statement. Because what
is needed is not "most of the time" but "all the time"
My view supported by some circumstantial evidence and my own practice is the this is a
persistent error that arise independently of the level of qualification for most or all
people, and semicolon at the end of the statement contradicts some psychological mechanism
programmers have.
Highly desirable Make a semicolon optional at the end of the line
Highly un desirable. If things to be made optional for increased readability, not
this, but making braces optional for singles statement blocks. But that won't happen
either.
Highly Questionable Introduce pragma that specify max allowed length of single and
double quoted string
Probably already possible with a CPAN module, but who would use it? This is more something
for a linter or perltidy.
Highly desirable Compensate for some deficiencies of using curvy brackets as the
block delimiters
Unlikely to happen and very un undesirable. The first option is easy } #
LABEL (why introduce new syntax when comments will suffice). The second is just plain
illogical and uncommon in most other languages. It will confuse the hell out of every
programmer.
Make function slightly more flexible
a) no b) Await the new signatures c) Macro's are unlikely to happen. See the
problems they faced in Raku. Would be fun though
Long function names
Feel free to introduce a CPAN module that does all you propose. A new function for trimming
has recently been introduced and spun off a lot of debate. I think none of your proposed
changes in this point is likely to gain momentum.
Allow to specify and use "hyperstrings"
I have no idea what is to be gained. Eager to learn though. Can you give better
examples?
Put more attention of managing namespaces
I think a) is part of the proposed OO reworks for perl7 based on Cor , b) is just plain silly, c) could be useful, but
not based on letters but on sigils or interpunction, like in Raku</lI.
Analyze structure of text processing functions in competing scripting
languages
Sounds like a great idea for a CPAN module, so all that require this functionality can use
it
Improve control statements
Oooooh, enter the snake pit! There be dragons here, lots of nasty dragons. We have had
given/when and several switch implementations and suggestions, and so far there has been no
single solution to this. We all want it, but we all have different expectations for its
feature set and behavior. Wise people are still working on it, so expect *something* at
some time.
by you !!! on Sep 10,
2020 at 16:57 UTC Reputation: -2
Because }:LABEL actually forcefully closes all blocks in between, but the comment just
informs you which opening bracket this closing bracket corresponds to and, as such, can be
placed on the wrong closing bracket, especially if the indentation is wrong too. Worsening
an already bad situation.
That I can agree with. The rest of your proposals seem either unnecessary (because the
facilities already exist in the language) or potentially problematic or almost without
utility to me. Sorry. That's not to say you shouldn't suggest them all to p5p for further
review of course - it's only the opinion of a humble monk after all.
by you !!! on Sep 10,
2020 at 15:16 UTC Reputation: 1
What I mean is a numeric "local" (in Pascal style; it can be redefined later in other
blocks) label in the context of Knuth's idea of "continuations" outside the loop.
That's quite some work you've invested here. I've looked at them from two
perspectives:
How does it help when I'm writing code?
How does it help when I'm reading code (mostly written by someone else)?
In summary, your suggestions don't perform that great. These are rather nerdy ideas
where I don't see which problem they solve. There isn't much to be added to the comments of
other monks, so I'll restrict my attention to two items:
I challenge the claim that closing more than one block with one brace allows the search for
a missing closing bracket to be more efficient. It just hides problems when you have lost
control over your block structure. Source code editors easily allow you to jump from an opening to a
closing brace, or to highlight matching braces, but they are extremely unlikely to support
such constructs.
I challenge the claim that extracting a substring is a very frequent operation.
It is not in the Perl repositories I've cloned. Many of them don't have a single occurrence
of substring. Please support that claim with actual data.
Making Perl more like modern Python or JS is not an improvement to the language; you need another
word for that, something like "trends" or "fashion". I see this list
as a simplification of the language (and in a bad way), not an improvement. As if some newbie
programmer would not want to improve himself, to get himself up to match the complexity of the
language, but would blame language complexity and demand that the language complexity go down to his
(low) level. "I don't want to count closing brackets, make something that will close them
all", "I don't want to watch for semicolons, let the interpreter watch for the end of the sentence for
me", "This complex function is hard to understand and remember how to use in the right way,
give me a bunch of simple functions that will do the same as this one function, but will
be easy to remember".
Making a tool simpler will not make it more powerful, or more efficient, but instead
could make it less efficient, because the tool will have to waste some of its power to
compensate for the user's ineptitude. The interpreter would waste CPU and memory to figure out sentence
endings, these "new" closing brackets and extra function calls, and what's the gain here? I see
only one: that a newbie programmer could write code with less mental effort. So it's not an
improvement of the language to do more with less, but instead a change that will cause the tool to do the
same with more. Is that an improvement? I don't think so.
by you !!! on Sep 10,
2020 at 16:52 UTC Reputation: -4
As if some newbie programmer would not want to improve himself, to get himself up to match
the complexity of the language, but would blame language complexity and demand that the language
complexity go down to his (low) level.
The programming language should be adapted to the actual use by programmers, not to some
illusion of actual use under the disguise of "experts do not commit those errors." If the
errors committed by programmers in a particular language are chronic, as is the case for
semicolons and missing closing braces, something needs to be done about them, IMHO.
The same is true for the problem of "overexposure" of global variables. Most
programmers at some point suffer from this type of bug. That's why "my" was pushed into
the language.
BTW the problem of a missing closing brace affects all languages which use "{" and "}"
as block delimiters, and the only implementation which solved this complex problem
satisfactorily was closing labels on the closing block delimiter, as in PL/1 (the delimiter
being "}" in Perl and the "begin ... end" pair in PL/1). Like the "missing semicolon", this is a
problem from which programmers suffer independently of their level of experience with the language.
So IMHO any measures that compensate for the "dangling '}'" problem and provide better
coordination between opening and closing delimiters in nested blocks would be
beneficial.
Again, the problem of a missing closing brace is a chronic one. As somebody mentioned here,
an editor that has "match brace" can be used to track it, but that does not solve the
problem itself; it just provides a rather inefficient (for a complex script) way to troubleshoot
it, which arises especially often if you modify the script. I even experienced a case when
the { } brace structure was syntactically correct but semantically wrong, and that was
detected only after the program was moved to production. A closing label on the bracket would
have prevented it.
But IMHO it does not go far enough, as it does not distinguish between reading and
modifying a variable. And a "sunglasses" approach to the visibility of global variables might be
beneficial.
If you write short subroutines, as you should, you don't suffer from misplaced closing
curly braces. I had problems with them, especially when doing large edits on code not
written by me, but the editor always saved me.
More or less agree WRT mismatched closing curlies. I see it pretty much entirely as an
editor issue.
(I mean isn't that the whole python argument for Semantic-Whitespace-As-Grouping? At
least I recall that ("Your editor will keep it straight") being seriously offered as a
valid dismissal of the criticism against S-W-A-G . . .)
The cake is a lie.
I mean isn't that the whole python argument for Semantic-Whitespace-As-Grouping?
No, the argument is different, but using indentation to determine block nesting
does allow closing multiple blocks at once, as a side effect. Python invented a strange mixed
solution where there is an opening bracket (usually ":") and there is no closing bracket --
instead the indent is used as the closing bracket.
The problem is that it breaks too many other things, so here the question of "whether it
is worth it" is more appropriate than in the case of soft semicolons.
[Highly desirable] Make a semicolon optional at the end of the line, if there is a
balance of brackets on the line and the statement looks syntactically correct ("soft
semicolon", the solution used in famous IBM PL/1 debugging compiler).
I feel a bit ashamed to admit that I programmed in PL/I for several years. The reason
why PL/I was so relaxed w.r.t. syntax is simple: you put your box full of punched cards on
the operators' desk and you get the compiler's result the next day. If the job had failed
just because of a missing semicolon, you'd lose one full day. Nowadays there is absolutely
no need for such stuff.
BTW, the really fatal errors in a PL/I program resulted in a compiler warning of
the kind "conversion done by subroutine call". This happened e.g. when assigning a pointer to
a character array.
I wouldn't like to see any of the fancy features of PL/I in Perl. Consult your
fortune database:
Speaking as someone who has delved into the intricacies of PL/I, I am sure that only
Real Men could have written such a machine-hogging, cycle-grabbing, all-encompassing
monster. Allocate an array and free the middle third? Sure! Why not? Multiply a character
string times a bit string and assign the result to a float decimal? Go ahead! Free a
controlled variable procedure parameter and reallocate it before passing it back? Overlay
three different types of variable on the same memory location? Anything you say! Write a
recursive macro? Well, no, but Real Men use rescan. How could a language so obviously
designed and written by Real Men not be intended for Real Man use?
Currently, the big push is to turn on warnings and strict by default; I like the initially
slow approach. I don't have a strong opinion about any of your suggestions (good or bad)
because I see none of them as particularly disruptive. Heck, I'd be happy to have
say and state available without turning them on explicitly. Ultimately, I
just look forward to moving towards a more aggressive model of having new features on by
default.
Disambiguation: If the actual element is blessed and can('method1'), it is
invoked. Otherwise it is treated as a function call (:: might be used for further
disambiguation).
I.e. similar to Data::Diver, just
more efficient, together with a pragma or other method to control auto-vivification. Yes, I am
aware that I could build something similar as a module, but it would be pure Perl.
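A rough plain-Perl sketch of the dispatch rule described above (my own reading of it, using Scalar::Util; the element and method names are made up and a plain sub method1() is assumed to exist elsewhere):

use strict;
use warnings;
use Scalar::Util qw(blessed);

# If the element is a blessed object that can('method1'), invoke the
# method; otherwise fall back to a plain function call in the current
# package.
sub dispatch_method1 {
    my ($elem) = @_;
    return $elem->method1
        if blessed($elem) && $elem->can('method1');
    return method1($elem);    # plain sub method1() assumed elsewhere
}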
What esteemed monks think about changes necessary/desirable in Perl 7 outside of OO stuff
"... A lot of resources have been pushed into Python and The Cloud in the past decade. It seems to me that this has come at the opportunity cost of traditional Linux/Unix sysadmin skills. ..."
"... And in a lot of cases, there's no documentation, because after all, the guy was just trying to solve a problem, so why document it? It's "just for now," right? If you find yourself in this situation enough times, then it's easy to start resenting the thing that all of these pieces of code have in common: Perl. ..."
"... I'm glad you're finding Perl to be clever and useful. Because it is. And these days, there are lots of cool modules and frameworks that make it easier to write maintainable code. ..."
"... Perl was the tool of choice at the dawn of the web and as a result a lot of low to average skill coders produced a large amount of troublesome code much of which ended up being critical to business operations. ..."
"... As a Bioinformatician I also see a bunch of hostility for Perl, many people claim it is unreadable and inefficient, but as other pointed, it has a lot of flexibility and if the code is bad is because of the programmer, not the language. ..."
"... I don't hate Python but I prefer Perl because I feel more productive on it, so I don't understand people who said that Perl is far worse than Python. Both have their own advantages and disadvantages as any other computer language. ..."
"... When you look at a language like Go, it was designed to make writing good go easy and bad go hard. it's still possible to architect your program in a bad way, but at least your implementation details will generally make sense to the next person who uses it. ..."
"... I also commonly see bad code in Python or Java ..."
"... Much of the hostility comes from new programmers who have not had to work in more than one language, Part of this is the normal Kubler Ross grief cycle whenever programmers take on a legacy code base. Part of it has to do with some poorly written free code that became popular on early Web 1.0 websites from the 90's. Part of this come from the organizations where "scripting" languages are popular and their "Engineering In Place" approach to infrastructure. ..."
"... Perl might be better than Python3 for custom ETL work on large databases. ..."
"... Perl might be better than Python at just about everything. ..."
"... I use f-strings in Python for SQL and... I have no particular feelings about them. They're not the worst thing ever. Delimiters aren't awful. I'm not sure they do much more for me than scalar interpolation in Perl. ..."
"... I think Perl is objectively better, performance and toolset wise. My sense is the overhead of "objectifying" every piece of data is too much of a performance hit for high-volume database processing ..."
"... the Python paradigm of "everything is an object" introduces overhead to a process that doesn't need it and is repeated millions or billions of times, so even small latencies add up quickly. ..."
"... I think what the Perl haters are missing about the language is that Perl is FUN and USEFUL. It's a joy to code in. It accomplishes what I think was Larry Wall's primary goal in creating it: that it is linguistically expressive. There's a certain feeling of freedom when coding in it. I do get though that it's that linguistic expressiveness characteristic that makes people coming from Python/Java/JavaScript background dislike about it. ..."
"... Like you said, the way to appreciate Perl is to be aware that it is part of the Unix package. I think Larry Wall said it best: "Perl is, in intent, a cleaned up and summarized version of that wonderful semi-natural language known as 'Unix'." ..."
"... I don't know why, but people hating on Perl doesn't bother me as much as those who are adverse to Perl but fond of other languages that heavily borrow from Perl -- without acknowledging the aspects of their language that were born, and created first in Perl. Most notably regular expressions; especially Perl compatible expressions. ..."
"... My feelings is that Perl is an easy language to learn, but difficult to master. By master I don't just mean writing concise, reusable code, but I mean readable, clean, well-documented code. ..."
"... Larry Wall designed perl using a radically different approach from the conventional wisdom among the Computer Science intelligentsia, and it turned out to be wildly successful. They find this tremendously insulting: it was an attack on their turf by an outsider (a guy who kept talking about linguistics when all the cool people know that mathematical elegance is the only thing that matters). ..."
"... The CS-gang responded to this situation with what amounts to a sustained smear campaign, attacking perl at every opportunity, and pumping up Python as much as possible. ..."
"... The questionable objections to perl was not that it was useless-- the human genome project, web 1.0, why would anyone need to defend perl? The questionable objections were stuff like "that's an ugly language". ..."
"... I generally agree that weak *nix skills is part of it. People don't appreciate the fact that Perl has very tight integration with unix (fork is a top-level built in keyword) and think something like `let p :Process = new Process('ls', new CommandLineArguments(new Array<string>('-l'))` is clean and elegant. ..."
"... But also there's a lot of dumb prejudice that all techies are guilty of. Think Bing -- it's a decent search engine now ..."
"... On a completely different note, there's a lot of parallels between the fates of programming languages (and, dare I say, ideas in general ) and the gods of Terry Pratchett's Discworld. I mean, how they are born, how they compete for believers, how they dwindle, how they are reborn sometimes. ..."
"... You merely tell your conclusions and give your audience no chance to independently arrive at the same, they just have to believe you. Most of the presented facts are vague allusions, not hard and verifiable. If you cannot present your evidence and train of thought, then hardly anyone takes you serious even if the expressed opinions happen to reflect the truth. ..."
I like Perl, even though I struggle with it sometimes. I've slowly been pecking away at
getting better at it. I'm a "the right tool for the job" kind of person and Perl really is the
lowest common denominator across many OSes and Bash really has its limits. Perl still trips me
up on occasion, but I find it a very clever and efficient language, so I like it.
I don't understand the hostility towards it given present reality. With the help of Perl
Critic and Perl Tidy it's no longer a "write only language." I find it strange that people call
it a "dead language" when it's still widely used in production.
A lot of resources have been pushed into Python and The Cloud in the past decade. It seems
to me that this has come at the opportunity cost of traditional Linux/Unix sysadmin skills. Perl
is part of that package, along with awk, sed, and friends along with a decent understanding of
how the init system actually works, what kernel tunables do, etc.
I could be wrong, not nearly all seeming correlations are causal relationships. Am I alone
in thinking a decent portion of the hostility towards Perl is a warning sign of weak sysadmin
skills a decent chunk of the time?
Perl was the tool of choice at the dawn of the web and as a result a lot of low to average
skill coders produced a large amount of troublesome code much of which ended up being
critical to business operations. This was complicated by the fact that much early web
interaction was dominated by CGI based forms which had many limitations as well as early Perl
CGI modules having many quirks.
The long-term dreaming about the future that started with Perl 6 and matured into
Rakudo also alienated a lot of people who had issues to resolve with the deployed base of mostly Perl
5 code.
Yeah, this is where the hostility comes from. The only reason to be angry at Perl is that
Perl allows you to do almost anything. And this large user base of people who weren't
necessarily efficient programmers -- or even programmers at all -- people like me, that is...
took up Perl on that challenge.
"OK, we'll do it HOWEVER we want."
Perl's flexibility makes it very powerful, and can also make it fairly dangerous. And
whether that code continues to work or not (it generally does), somebody is inevitably going
to have to come along and maintain it, and since anything goes, it can be an amazingly
frustrating experience to try to piece together what the programmer was thinking.
And in a lot of cases, there's no documentation, because after all, the guy was just
trying to solve a problem, so why document it? It's "just for now," right? If you find yourself in this situation enough times, then it's easy to start resenting the
thing that all of these pieces of code have in common: Perl.
I'm glad you're finding Perl to be clever and useful. Because it is. And these days, there
are lots of cool modules and frameworks that make it easier to write maintainable code.
It was also where I began to learn my craft. My coding practices improved as I learned
more, but I appreciate that Matt was there at the time, offering solutions to those who
needed them, when they needed them.
I also appreciate that Matt himself has spoken out about this, saying "The code you find
at Matt's Script Archive is not representative of how even I would code these days."
It's easy to throw stones 25 years later, but I think he did more good than harm. That
might be a minority opinion. In any case, I'm grateful for the start it gave me.
Speaking of his script archive, I believe his early scripts used cgi-lib.pl, which had a
subroutine in it called ReadParse(). That is where my username comes from. It's
a tribute to the subroutine that my career relied on in the early days, before I graduated to
CGI.pm, before I graduated to mod_perl, before I graduated to Dancer and Nginx.
Perl was the tool of choice at the dawn of the web and as a result a lot of low to
average skill coders produced a large amount of troublesome code much of which ended up
being critical to business operations.
So in the context of webdev, it was JavaScript before JavaScript was a thing. No wonder
people still have a bad taste in their mouth lol
As a Bioinformatician I also see a bunch of hostility towards Perl; many people claim it is
unreadable and inefficient, but as others pointed out, it has a lot of flexibility, and if the code
is bad it is because of the programmer, not the language.
I don't hate Python, but I prefer Perl because I feel more productive in it, so I don't
understand people who say that Perl is far worse than Python. Both have their own advantages
and disadvantages, as does any other computer language.
'if the code is bad it is because of the programmer, not the language'
That's not going to make you feel any better about joining a project and having to work on
a lot of badly written code, nor does it help when you need to trace through your
dependencies and find it too is badly written.
In the end it's entirely possible to write good Perl, but you have to go out of your way
to do so. Bad Perl is just as valid and still works, so it gets used more often than good
Perl.
When you look at a language like Go, it was designed to make writing good Go easy and bad
Go hard. It's still possible to architect your program in a bad way, but at least your
implementation details will generally make sense to the next person who uses it.
Personally I still really like Perl for one-off tasks that are primarily string
manipulation. It's really good at that, and maintainability doesn't matter, nor does anyone
else's code. For anything else, there's usually a better tool to reach for.
Agreed, I also commonly see bad code in Python or Java. I am tempted to learn Go too; I was
looking into the syntax, but I don't have any project or requirement that needs it.
Also, Bioinformatics involves a lot of string manipulation (genes and genomes are
represented as long strings), so Perl fits naturally. Hard tasks commonly use specific
programs (C/Java/etc.), so you need to glue them together; for that Bash, Perl or even Python are
perfectly fine.
Much of the hostility comes from new programmers who have not had to work in more than one
language. Part of this is the normal Kübler-Ross grief cycle whenever programmers take on a
legacy code base. Part of it has to do with some poorly written free code that became popular
on early Web 1.0 websites from the 90's. Part of it comes from the organizations where
"scripting" languages are popular and their "Engineering In Place" approach to
infrastructure.
And then there is the self-inflicted major version freeze for 20 years. Any normal project
would have had three or more major version bumps for the amount of change between perl 5.0
and perl 5.30. Instead perl had a schism. Now perl5 teams are struggling to create the
infrastructure needed to release a major version bump. Even seeding the field ahead with
mines just to make the bump from 5 to 7 harder.
What do you think about Javascript's template strings (which can be tagged for custom
behaviors!) and Python's recent f-strings? Well, there's also Ruby's
#{interpolation} which allows arbitrary expressions to be right there, and which
existed for quite a while (maybe even inspiring similar additions elsewhere, directly or
indirectly).
Having to either fall back on sprintf for readability or turn the strings
into a concatenation-fest somewhat tarnishes Perl's reputation as the ideal language for text
processing in this day and age.
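For comparison, a small sketch (mine, with made-up variable names) of the three Perl styles being contrasted here -- plain interpolation, sprintf, and concatenation -- plus the @{[ ... ]} idiom for expressions inside a double-quoted string:

use strict;
use warnings;

my ( $user, $count ) = ( 'alice', 42 );

# Plain interpolation covers scalars and arrays, but not arbitrary expressions:
print "User $user has $count items\n";

# For expressions you fall back on sprintf/printf ...
printf "User %s has %d items (%d after doubling)\n", $user, $count, $count * 2;

# ... or on concatenation (the "concatenation-fest"):
print 'User ' . $user . ' has ' . ( $count * 2 ) . " items after doubling\n";

# The workaround idiom for an expression inside a double-quoted string:
print "User $user has @{[ $count * 2 ]} items after doubling\n";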
My other pet peeve with Perl is how fuzzy its boundaries between bytes and unicode are,
and how you always need to go an extra mile to ensure the string has the exact state you
expect it to be in, at all callsites which care. Basically, string handling in Perl is
something that could be vastly improved for software where bytes/character differences are
important.
What do you think about Javascript's template strings (which can be tagged for custom
behaviors!) and Python's recent f-strings?
I use f-strings in Python for SQL and... I have no particular feelings about them. They're
not the worst thing ever. Delimiters aren't awful. I'm not sure they do much more for me than scalar interpolation in Perl. Maybe because
it's Python I'm always trying to write the most boring code ever because it feels like the
language fights me when I'm not doing it Python's way.
My other pet peeve with Perl is how fuzzy its boundaries between bytes and unicode are,
and how you always need to go an extra mile to ensure the string has the exact state you
expect it to be in, at all callsites which care.
I'd have to know more details about what's bitten you here to have a coherent opinion. I
think people should know the details of the encoding of their strings everywhere it matters,
but I'm not sure that's what you mean.
In Python's f-strings (and JS template strings) you can interpolate arbitrary expressions,
thus no need to pollute the local scope with ad hoc scalars.
Ad hoc scalar pollution hasn't been a problem in code I've worked on, mostly because I try
to write the most boring Python code possible. I've seen Rust, Go, and plpgsql code get really ugly with lots of interpolation in
formatted strings though, so I believe you.
I think Perl is objectively better, performance and toolset wise. My sense is the overhead
of "objectifying" every piece of data is too much of a performance hit for high-volume
database processing. Just one datapoint, but as far as I know, Python doesn't support
prepared statements in Postgres. psycopg2 is serviceable but a far cry from the
nearly-perfect DBI. Sqlalchemy is a wonderful alternative to the also wonderful DBIx::Class,
but performance-wise neither are suitable for ETL.
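For context, a minimal sketch of the DBI prepare/execute pattern being praised here (my own example; the DSN, credentials, table name and the get_next_row() helper are all placeholders):

use strict;
use warnings;
use DBI;

# Prepare the statement once, then execute it for every row -- the
# usual DBI pattern for high-volume inserts.
my $dbh = DBI->connect(
    'dbi:Pg:dbname=example', 'user', 'password',
    { RaiseError => 1, AutoCommit => 1 },
);

my $sth = $dbh->prepare('INSERT INTO events (name, payload) VALUES (?, ?)');

while ( my $row = get_next_row() ) {    # get_next_row() is a hypothetical data source
    $sth->execute( $row->{name}, $row->{payload} );
}

$dbh->disconnect;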
I think Perl is objectively better, performance and toolset wise. My sense is the
overhead of "objectifying" every piece of data is too much of a performance hit for
high-volume database processing.
It's because objects are not first-class in Perl. Python's objects run circles around
plain Perl blessed hashes, and we aren't even talking about Moose at this point.
The point is, for ETL in particular, the Python paradigm of "everything is an object"
introduces overhead to a process that doesn't need it and is repeated millions or billions of
times, so even small latencies add up quickly. And, if you are using psycopg2, the lack of
prepared statements adds yet more latency. This is a very specific use case where Perl is
unequivocally better.
Do you have any measurements to back this overhead assertion, or are you just imagining it
would be slower, because objects? int, str, float "objects" in Python are indeed objects, but
they are also optimized highly enough to be on par with, if not, dare I say, faster than, their Perl
counterparts.
Also, you can run "PREPARE x AS SELECT" in psycopg2. It's not actively trying to
prevent you from doing it. I also bet the author would add this functionality if someone paid
him, but even big corporations tend to be on the "all take and no give" side, which shouldn't
come as news anyway.
Before you say "but it's inconvenient!", I'd really like my exceptions, context managers
and generator functions in Perl, in a readable and caveat-less style, thank you very much,
before we can continue our discussion about readability.
I think what the Perl haters are missing about the language is that Perl is FUN and
USEFUL. It's a joy to code in. It accomplishes what I think was Larry Wall's primary goal in
creating it: that it is linguistically expressive. There's a certain feeling of freedom when
coding in it. I do get though that it's that linguistic expressiveness characteristic that
makes people coming from Python/Java/JavaScript background dislike about it.
Like you said, the way to appreciate Perl is to be aware that it is part of the Unix
package. I think Larry Wall said it best: "Perl is, in intent, a cleaned up and summarized
version of that wonderful semi-natural language known as 'Unix'."
I don't know why, but people hating on Perl doesn't bother me as much as those who are
adverse to Perl but fond of other languages that heavily borrow from Perl -- without
acknowledging the aspects of their language that were born, and created first in Perl. Most
notably regular expressions; especially Perl compatible expressions.
What else, except for regular expressions, wasn't borrowed by Perl itself from awk,
C, or Algol?
Ruby, for example, does state quite plainly that it aims to absorb the good parts of Perl
without its warts, and add good things of its own. So you have, for example, the "unless"
keyword in it, as well as postfix conditionals. Which are exceptionally good for guard
clauses, IMO.
PHP started specifically as "Perl-lite", thus it borrowed a lot from Perl, variables
having the $ sigil in front of them are taken specifically from Perl, nobody is
denying that.
This doesn't mean this cross-pollination should ever stop, or all other languages suddenly
need to start paying tribute for things they might have got inspired by Perl. Making every
little user on the internets acknowledge that this or that appeared in Perl first does
little, alas, to make Perl better and catch up to what the world is doing today.
It's very much like modern Greeks are so enamored with their glorious past, Alexander the
Great, putting a lot of effort into preserving their ancient history, and to remind the world
about how glorious the ancient Greeks were while the barbarians of Europe were all unwashed
and uncivilized, that they forget to build a glorious present and future.
Also an interesting quote from the Man Himself in 1995:
I certainly "borrowed" some OO ideas from Python, but it would be inaccurate to claim
either that Perl borrowed all of Python's OO stuff, or that all of Perl's OO stuff is
borrowed from Python.
Looking at Perl's OO system, I find myself mildly surprised, because it's nothing
like Python's. But here you have, cross-pollination at work.
My feeling is that Perl is an easy language to learn, but difficult to master. By master
I don't just mean writing concise, reusable code, but readable, clean, well-documented
code.
I can count on one hand the Perl developers I've known that really write such clean Perl
code. I feel the freehand style of Perl has been a double-edged sword. With freedom comes, for
many developers, a relaxed sense of responsibility.
I feel that the vast amount of poorly written code that has been used in the wild, which
has earned Perl (as we've all heard at one time or another) the dubious honor of being the "duct
tape and baling wire" language that glues the IT world together, has caused a lot of people
to be biased against the language as a whole.
Larry Wall designed perl using a radically different approach from the conventional wisdom
among the Computer Science intelligentsia, and it turned out to be wildly successful. They
find this tremendously insulting: it was an attack on their turf by an outsider (a guy who
kept talking about linguistics when all the cool people know that mathematical elegance is
the only thing that matters).
You would have to say that Larry Wall has conceded he initially over-did some things, and
the perl5 developers later set about fixing them as well as they could, but perl's detractors
always seem to be unaware of these fixes: they've never heard of "use strict", and they
certainly haven't ever heard of //x extended regex syntax.
The CS-gang responded to this situation with what amounts to a sustained smear campaign,
attacking perl at every opportunity, and pumping up Python as much as possible.
Any attempt at understanding this situation is going to fail if you try to understand it
on anything like rational grounds -- e.g. might there be some virtue in Python's rigid syntax?
Maybe, but it can't possibly have been a big enough advantage to justify re-writing all of
CPAN.
You described it as if there had been a religious war, or a conspiracy, and not simple
honest-to-god pragmatism at work. People have work to do, that's all there is to it.
Because I really don't think it was. I was there, and I've been around for quite some
time, and I've watched many technical fads take off before there was any there there which
then had people scrambling to back-fill the Latest Thing with enough of a foundation to keep
working with it. Because once one gets going, it can't be stopped without an admission we
were wrong again.
The questionable objections to perl were not that it was useless -- the human genome project, web 1.0,
why would anyone need to defend perl? The questionable objections were stuff like "that's an ugly
language".
I generally agree that weak *nix skills is part of it. People don't appreciate the fact
that Perl has very tight integration with unix (fork is a top-level built in keyword) and
think something like `let p :Process = new Process('ls', new CommandLineArguments(new
Array<string>('-l'))` is clean and elegant.
But also there's a lot of dumb prejudice that all techies are guilty of. Think Bing --
it's a decent search engine now, but everyone who has ever opened a terminal window still dismisses it
because it had a shaky first few years. Perl 4, and early Perl 5 which looked like Perl 4, was
basically our "Bing".
On a completely different note, there's a lot of parallels between the fates of
programming languages (and, dare I say, ideas in general ) and the gods of Terry
Pratchett's Discworld. I mean, how they are born, how they compete for believers, how they
dwindle, how they are reborn sometimes.
(Take it with a grain of salt, of course. But I generally like the idea of ideas being a
kind of lifeforms unto themselves, of which we the minds are mere medium.)
I follow the analogy. Ideas are in some sense "alive" (and tend to follow a viral model in
my view) to a great extent. I have not read Discworld, so the rest I do not follow.
Can you spell it out for me, especially any ideas you have about a Perl renaissance? I
have a gut sense Perl is at the very start of one, but feel free to burst my bubble if you
think I'm too far wrong.
In Perl, length(@list) still wasn't doing what any reasonable person would
expect it to do.
CPAN was full of zombie modules. There is a maintainer who is apparently unaware what
"deprecated" means, so you have a lot of useful and widely-used modules DEPRECATED without
any alternative.
So far, there only exists one Perl implementation. Can you compile it down to Javascript
so there is a way to run honest-to-god Perl in a browser? Many languages can do it without
resorting to emscripten or WebAssembly.
I'm not aware of any new Perl 5 books which would promote good programming style, instead
of trying to enamor newbies with the cuteness of "this string of sigils you can't make heads
or tails of prints 'Yet another Perl hacker! How powerful!'". Heck, I'm not aware of any Perl
books published in the last decade at all. So much for teaching good practices to
newbies, eh?
Perl is a packaging nightmare, compared to Python (and Python is a nightmare too in this
regard, but a far far more manageable one), Javascript (somewhat better), or Go. It takes a
lot of mental gymnastics to make Perl CI-friendly and reproducible-process-friendly (not the
same as reproducible-builds-friendly, but still a neat thing to have in this day and
age).
Tell me about new Perl projects that started in 2020, and about teams who can count their
money who would consciously choose Perl for new code.
And the community. There are feuds between that one developer and the world, between that
other developer and the world, and it just so happens that those two wrote things 90% of
functioning CPAN mass depends on one way or the other.
I don't hate Perl (it makes me decent money, why would I?), so much as I pity
it. It's a little engine that could, but not the whole way up.
In Perl, length(@list) still wasn't doing what any reasonable person would expect it to
do.
I'm not aware of any new Perl 5 books which would promote good programming style,
instead of trying to enamore the newbies with cutesy of "this string of sigils you can't
make heads or tails of prints 'Yet another Perl hacker! How powerful!'". Heck, I'm not
aware of any Perl books published in the last decade at all.
Which could use a new edition, granted, as the last one is from 2015
There are a couple of things I'd like to add in a new edition (postfix dereferencing,
signatures), but I might wait to see if there's more clarification around Perl 7 first.
You really need to explain how you arrive at the claim that "Perl is a packaging
nightmare" – I am a packager – and also be less vague about the other things you
mention. It is not possible to tell whether you are calibrated accurately against
reality.
so there is a way to run honest-to-god Perl in a browser?
You merely tell your conclusions and give your audience no chance to independently arrive
at the same, they just have to believe you. Most of the presented facts are vague allusions,
not hard and verifiable. If you cannot present your evidence and train of thought, then
hardly anyone takes you serious even if the expressed opinions happen to reflect the
truth.
The tooling around CPAN, including cpan and cpanm alike, the last
time I looked at them, did a depth-first dependency resolution. So, the first thing is a
module A. It depends on module B, get the latest version of that. It depends on C, get the
latest version of that. Finally, install C, B, and A. Next on the dependency list is module
D, which depends on module E, which wants a particular, not the latest, version of B. But B
is already installed and at the wrong version! So cpan and cpanm
alike will just give up at this point, leaving my $INSTALL_BASE in a broken
state.
Note that we're talking about second- and third-order dependencies here, in the ideal
world, I'd prefer I didn't have to know about them. In a particular codebase I'm working on,
there are 130 first-order dependencies already.
Carton, I see, is trying to sidestep the dependency resolution of cpanm .
Good, but not good enough, once your codebase depends on Image::Magick . Which
the one I care about does. You cannot install it from CPAN straight, not if you want a
non-ancient version.
So I had to write another tool that is able to do breadth-first search when resolving
dependencies, so that I either get an installable set of modules, or an error before it
messes up the $INSTALL_BASE . In the process, I've learned a lot about the
ecosystem which is in the category of "how sausages are made": for example, in late 2017
and early 2018, ExtUtils::Depends, according to MetaCPAN, was provided by
Perl-Gtk . Look it up if you don't believe me, ask the MetaCPAN explorer
this:
/module/_search?q=module.name.lowercase:"extutils::depends" AND maturity:released&sort=version_numified:desc&fields=module.name,distribution,date
The first entry rectifies the problem, but it was released in 2019. In 2018, MetaCPAN
thought that ExtUtils::Depends was best served by Perl-Gtk . Also,
to this day, MetaCPAN thinks that the best distribution to contain
Devel::CheckLib is Sereal-Path .
Oh. And I wanted an ability to apply arbitrary patches to distributions, which fix issues
but the maintainers can't be bothered to apply them, or which remove annoyances. Not
globally, like cpan+distroprefs does, but per-project. (Does Carton even work with
distroprefs or things resembling them?) Yes, I know I can vendor in a dependency, but it's a
third-order dependency, and why should I even bother for a one-liner patch?
Now, the deal is, I needed a tool that installs for me everything the project depends on,
and does it right on the first try, because the primary user of the tool is CI, and there are
few things I hate more than alarms about broken, flaky builds. Neither Carton, nor cpanminus,
nor cpan could deliver on this. Maybe they do today, but I simply don't care anymore, good
for them if they do. I came away with a very strong impression that the available Perl tooling
is still firmly in the land of sysadmin practices from the 1990s, and it's going to take a
long while before workflows that other stacks take for granted today arrive there.
P. S. I don't particularly care how seriously, if at all, I'm taken here. There have been
questions asked, so I'm telling about my experience. Since comments which express a dislike
with particular language warts but don't have a disclaimer "don't get me wrong, I ABSOLUTELY
LOVE Perl" get punished by downvoting here, I feel that the judgment may be mutual.
I am glad that you wrote a post with substance, thank you for taking the time to gather
the information. That makes it possible to comment on it and further the insight and reply
with corrections where appropriate. From my point of view the situation is not as bad as you
made it out to be initially, let me know what you think.
punished by downvoting
I didn't downvote you because I despise downvote abuse; it's a huge problem on Reddit and
this subreddit is no exception. I upvoted the top post to prevent it from becoming
hidden.
dependency resolution
You got into the proverbial weeds by trying to shove the square peg into the round hole,
heaping work-around onto work-around. You should have noticed that when you "had to write
another tool"; that would be the point in time to stop and realise you need to ask experts
for advice. They would have put you on the right track: OS-level packaging. That's what I
use, too, and it works pretty well, especially compared with other programming languages and
their respective library archives. Each dep is created in a clean fakeroot, so it is
impossible to run into "$INSTALL_BASE in a broken state". Image::Magick is not
a problem because it's already packaged, and even in the case of a library in a similarly
broken state it can be packaged straightforwardly because OS-level packaging does not care
about CPAN releases per se. Module E depending on a certain version of B is not a problem
because a repo can contain many versions of B and the OS packager tool will resolve it
appropriately at install time. Per-project patches are not a problem because patches are a
built-in feature in the packaging toolchain and one can set up self-contained repos for
packages made from a patched dist if they should not be used in the general case.
MetaCPAN explorer
I hope you reported those two bugs. Using that API to resolve deps is a large blunder.
cpan uses the index files, cpanm uses cpanmetadb.
sysadmin practices from the 1990s
There's nothing wrong with them, these practices do not lose value just by time passing
by, the practices and corresponding software tools are continually updated over the years,
and the fundamentals are applicable with innovative packaging targets (e.g. CI environments
or open containers).
I simply don't care anymore
I omit the details showing how to do packaging properly.
Re: OS packaging versus per-project silos. The latter approach is winning now. There's an
ongoing pressure that OS-packaged scripting runtimes (like Python, Ruby, and Perl) should
only be used by the OS-provided software which happens to depend on it. I think I've read
some about it even here on this sub.
And I'll tell you why. By depending on OS-provided versions of modules, you basically cast
your fate to Canonical, or Red Hat, or whoever else is maintaining the OS, and they don't
really care that what they thought was a minor upgrade broke your code (say,
Crypt::PW44 changed its interface when moving from 0.13 to 0.14, how could
anyone even suspect?). You are too small a fish for them to care. They go with whatever the
upstream supplies. And you have better things to do than adapt whenever your underlying OS
changes things behind your back. Keeping a balance between having to rebuild what's needed in
response to key system libraries moving can be work enough.
There's also this problem when a package you absolutely need becomes orphaned by the
distribution.
So any sane project with bigger codebases will keep their own dependency tree. Not being
married to a particular OS distribution helps, too.
So, keeping your own dependency tree is sort of an established pattern now. Try to suggest
to someone who maintains Node-based software that they use what, for example, CentOS provides
in RPMs instead of directly using NPM. They will die of laughter, I'm afraid.
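A sketch of what "keeping your own dependency tree" typically looks like on the Perl side: a per-project cpanfile consumed by Carton (the module names and version constraints below are examples only):

# cpanfile -- per-project dependency list; `carton install` resolves it
# into a local/ directory, and `carton exec -- perl app.pl` runs against it.
requires 'Plack', '>= 1.0047';
requires 'DBI',   '>= 1.643';
requires 'Image::Magick';       # the awkward dependency discussed above

on 'test' => sub {
    requires 'Test::More';
};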
Re: sysadmin practices from 1990s. They are alive and well, the problem is that they are
not nearly enough in the world of cloud where you can bring a hundred servers to life
with an API call and tear down other 100 with another API call. "Sysadmin practices from
1990s" assume very dedicated humans who work with physical machines and know names of each of
them, think a university lab. Today, you usually need a superset of these skills and
practices. You need to manage machines with other machines. Perl's tooling could be vastly
improved in this regard.
Re: CPAN, MetaCPAN, cpanmetadb and friends . So I'm getting confused. Which metadata
database is authoritative, the most accurate and reliable for package metadata retrieval,
including historical data? MetaCPAN, despite its pain points, looks the most complete so far.
cpanmetadb doesn't have some of the metacpan's bugs, but I'm wary of it as it looks like it's
a one man show (one brilliant man show, but still) and consists of archiving package
metadata files as they change.
Also, if one Marc Lehmann provides such metadata in
Gtk-Perl, and MetaCPAN then honestly starts thinking that Gtk-Perl provides
ExtUtils::Depends (which is by the same author, so fair enough), I don't think there can be
anything done about it. When I pointed out those things, I was lamenting the state of the
ecosystem as such more than any tool. With metacpan, my biggest peeve is that they use
ElasticSearch as the database, which is wrong on many levels (like 404-ing legit packages
because the search index died, WTF? It also appears anyone on the internets can purge the
cache with a curl command, WTF???)
The MetaCPAN API is not the canonical way to resolve module dependencies, and is not used
by CPAN clients normally, only used by cpanm when a specific version or dev
release of a module is requested. See https://cpanmeta.grinnz.com/packages for a way to
search the 02packages index, which is canonical.
I understand you are beyond this experience, but for anyone who runs into similar problems
and wants guidance for the correct solutions, please ask the experts at #toolchain on
irc.perl.org (politely, we are all volunteers) before digging yourself further holes.
In Perl, length(@list) still wasn't doing what any reasonable person would expect it to
do.
I would be quite against overloading length to have a different effect when
passed an array. But I don't disagree that "array in scalar context" being the standard way
to get its size is unintuitive.
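To spell out the idioms under discussion (a small sketch of mine; @list is just an example):

use strict;
use warnings;

my @list = ( 'a', 'b', 'c' );

# length() is a string function: length(@list) evaluates @list in
# scalar context (3 here) and then returns the length of that number
# as a string, so it is not a reliable element count.
my $size       = scalar @list;    # idiomatic: force scalar context
my $also_size  = 0 + @list;       # same thing via numeric context
my $last_index = $#list;          # highest index, i.e. $size - 1

print "$size elements, last index $last_index\n";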
He claims that functions cannot be redeclared (they can, but you get a warning).
He recommends calling functions using the ampersand ( that's a bad idea ).
Looking at the table of contents, I see he calls hashes "associative array variables" (a
term that Perl stopped using when Perl 5 was released in 1994).
This is not a book that I would recommend to anyone.
Update: And here's Pro Perl Programming by the same author.
In the preview, he uses bareword filehandles instead of the lexical variables that have been
recommended for about fifteen years.
I have run into developers who actively loathe Perl for its "line noise" quality.
Unfortunately, I think they've mostly only ever encountered bad Perl, to which they would
respond, "Is there any other kind of Perl?"
The benchmarks here do not try to be complete, as they show the performance of the
languages in one aspect only, mainly loops, dynamic arrays with numbers, and basic math operations.
This is a redo of the tests done in previous years. You are strongly encouraged to read the
additional information about the tests in the article.
"One of the first programming languages." Wow. That kinda dismisses about 30 years of
programming language history before Perl, and at least a couple of dozen major languages,
including LISP, FORTRAN, Algol, BASIC, PL/1, Pascal, Smalltalk, ML, FORTH, Bourne shell and
AWK, just off the top of my head. Most of what exists in today's common (and even
not-so-common) programming languages was invented before Perl.
That said, I know you're arm-waving the history here, and those details are not really
part of the point of your post. But I do have a few comments on the meat of your post.
Perl is a bit punctuation- and magic-variable-heavy, but is far from unique in being so.
One example I just happened to be looking at today is VTL-2
("A Very Tiny Language") which, admittedly, ran under unusually heavy memory constraints (a
768 byte interpreter able to run not utterly trivial programs in a total of 1 KB of
memory). This uses reading from and assignment to special "magic" variables for various
functions. X=? would read a character or number from the terminal and assign
it to X ; ?=X would print that value. # was the
current execution line; #=300 would goto line 300. Comparisons returned 0 or
1, so #=(X=25)*50 was, "If X is equal to 25, goto line 50."
Nor is Perl at all exotic if you look at its antecedents. Much of its syntax and
semantics are inspired by Bourne shell, AWK and similar languages, and a number of these
ideas were even carried forward into Ruby. Various parts of that style (magic variables,
punctuation prefixes/suffixes determining variable type, automatic variable interpolation
in strings, etc.) have been slowly but steadily going out of style since the 70s, for good
reasons, but those also came into existence for good reasons and were not at all unique to
Perl. Perl may look exotic now, but to someone who had been scripting on Unix in the 80s
and 90s, Perl was very comfortable because it was full of common idioms that they were
already familiar with.
I'm not sure what you mean by Perl "[not supporting] functions with arguments";
functions work the same way that they work in other languages, defined with sub foo {
... } and taking parameters; as with Bourne shell, the parameters need not be
declared in the definition. It's far from the only language where parentheses need not be
used to delimit parameters when calling a function. Further, it's got fairly good
functional and object-oriented programming support.
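To make the point concrete, a sketch of the two spellings in question (mine, with made-up sub names); the signature form requires a reasonably recent perl and is experimental on older releases:

use strict;
use warnings;

# Traditional style: arguments arrive in @_ and are unpacked by hand.
sub greet {
    my ( $name, $greeting ) = @_;
    $greeting //= 'Hello';
    return "$greeting, $name!";
}

# Signature style (use feature 'signatures'; stable on newer perls,
# experimental and warning-prone on older ones).
use feature 'signatures';
no warnings 'experimental::signatures';

sub greet_sig ( $name, $greeting = 'Hello' ) {
    return "$greeting, $name!";
}

print greet('Alice'),           "\n";
print greet_sig( 'Bob', 'Hi' ), "\n";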
I'm not a huge fan of Perl (though I was back in the early '90s), nor do I think its
decline is unwarranted (Ruby is probably a better language to use now if you want to
program in that style), but I don't think you give it a fair shake here.
Nor does its decline, along with COBOL and Delphi, have anything to do with age. Consider
LISP, which is much older, arguably weirder, and yet is seeing if anything a resurgence of
popularity (e.g., Clojure) in the last ten years.
There are many languages indeed. Speaking from a career-oriented, professional perspective
here: it could be quite difficult to make a career today out of those.
About functions. What I mean is that Perl doesn't do functions with arguments like
current languages do. "func myfunction(arg1, arg2, arg3)."
It's correct to say that Perl has full support for routines and parameters, it does and
even in multiple ways, but it's not comparable to what is in mainstream languages
today.
I can understand, that you don't like Perl as a language, but it doesn't mean you
should write misconceptions about it.
Personally I think Perl won't go anywhere. Nobody wants to rewrite existing scripts
that are used by system tools, ie. dpkg utilities in Debian or Linux kernel profiling
stuff. As a real scripting language for basic system tasks is still good enough and
probably you won't find better replacement.
And nobody uses CGI module from Perl in 2019. Really.
I see by "functions with arguments" you mean specifically call-site checking against
a prototype. By that definition you can just as well argue that Python and Ruby "don't
support functions with arguments" because they also don't do do call-site checking
against prototypes in the way that C and Java do, instead letting you pass a string to
a function expecting an integer and waiting until it gets however much further down the
call stack before generating an exception.
"Dynamic" languages all rely to some degree or other on runtime checks; how and what
you check is something you weigh against other tradeoffs in the language design. If you
were saying that you don't like the syntax of sub myfunction { my ($arg1, $arg2,
$arg3) = @_; ...} as compared to def myfunction(arg1, arg2, arg3):
... that would be fair enough, but going so far as to say "Perl doesn't support
functions with arguments" is at best highly misleading and at worst flat-out wrong.
Particularly when Perl does have prototypes with more call site checking
than Python or Ruby do, albeit as part of a language feature for doing
things that neither those nor any other language you mention support.
In fact, many languages even deliberately provide support to remove parameter count
checks and get Perl's @_ semantics. Python programmers regularly use
def f(*args): ... ; C uses the more awkward varargs .
And again I reiterate (perhaps more clearly this time): Perl was in no way
"genuinely unique and exotic" when it was introduced; it brought together and built on
a bunch of standard language features from various languages that anybody programming
on Unix above the level of C in the 80s and 90s was already very familiar with.
Also, I forgot to mention this in my previous comment, but neither Python nor Perl
have ever been required by POSIX (or even mentioned by it, as far as I know), nor did
Python always come pre-installed on Linux distributions. Also, it seems unlikely to be
a "matter of time" until Python gets removed from the default Ubuntu install since
Snappy and other Canonical tools are written in it.
There are plenty of folks making a career out of Clojure, which is one flavour of
LISP, these days. According to your metric, Google Trends, it overtook
OCaml years ago, and seems to be trending roughly even, which is
better than Haskell is doing .
Perl gets picked on for its syntax. It is able to represent very complex programs with
minimalist tokens. A jumble of punctuation can serve to represent an intricate program. This is
trivial terseness in comparison to programming languages like APL (or its later ASCII-suitable
descendants, such as J), where not a single character is wasted.
The Learning Curb
Something can be said for terseness. Rust, having chosen fn to denote
functions, seems to have hit a balance in that regard. There is very little confusion over what
fn means these days, and a simple explanation can immediately alleviate any
confusion. Don't confuse initial confusion with permanent confusion. Once you get over
that initial "curb" of confusion, we don't have to worry any more.
Foreign != Confusing
You'll also find that when encountering a new syntax you will immediately not understand it,
and instead wish for something much simpler. Non-C++ programmers, for example, tend to raise an
eyebrow at their first encounter with C++ lambda syntax.
I remember my first encounter with C++ lambdas, and I absolutely hated the syntax. It
was foreign and unfamiliar, but other than that, my complaints stopped. I could have said "This
is confusing," but after having written C++ lambda expressions for years the syntax has
become second nature and very intuitive. Do not confuse familiarity with
simplicity.
Explicit is Better than Implicit
except when it is needlessly verbose.
Consider the following code:
template <typename T, typename U, int N>
class some_class {};
Pretty straightforward, right?
Now consider this:
class<T, U, int N> some_class {};
Whoa that's not C++!
Sure, but it could be, if someone were convinced enough that it warranted a proposal,
but I doubt it will happen any time soon.
So, you know it isn't valid C++, but do you know what the code means? I'd
wager that the second example is quite clear to almost all readers. It's semantically identical
to the former example, but significantly terser . It's visually distinct from any
existing C++ construct, yet when shown the two "equivalent" code samples side-by-side you can
immediately cross-correlate them to understand what I'm trying to convey.
There's a lot of bemoaning the verbosity of C++ class templates, especially in comparison to
the syntax of generics in other languages. While they don't map identically, a lot of the
template syntax is visual noise that was inserted to be "explicit" about what was
going on, so as not to confuse a reader that didn't understand how template syntax works.
The template syntax, despite being an expert-friendly feature , uses
a beginner-friendly syntax. As someone who writes a lot of C++ templates, I've often
wished for terseness in this regard.
Looking at the API for data::person , we can see that bar() is a
deprecated alias of name() , and frombulate() is deprecated in favor
of get_people() . And using the name foo to refer to a sequence of
data::person seems silly. We have an English plural people . Okay,
let's fix all those things too:
Perfect! We now know exactly what we're doing: sorting a list of people by name.
Crazy idea, though: let's put those autos back in and see what happens:
auto people = get_people();
std::sort(
people.begin(),
people.end(),
[](auto&& lhs, auto&& rhs) {
return lhs.name() < rhs.name();
}
);
Oh no! Our code has suddenly become unreadable again and oh.
Oh wait.
No, it's just fine. We can see that we're sorting a list of people by their name. No
explicit types needed. We can see perfectly well what's going on here. Using foo
and bar while demonstrating why some syntax/semantics are bad is muddying the
water. No one writes foo and bar in real production-ready code. (If
you do, please don't send me any pull requests.)
Even Terser?
std::sort in the above example takes an iterator pair to represent a "range" of
items to iterate over. Iterators are pretty cool, but the common case of "iterate the whole
thing " is common enough to warrant "we want ranges." Dealing with iterables should be
straightforward and simple. With ranges, the iterator pair is extracted implicitly, and we
might write the above code like this:
auto people = get_people();
std::sort(
people,
[](auto&& lhs, auto&& rhs) {
return lhs.name() < rhs.name();
}
);
That's cool! And we could even make it shorter (even fitting the whole sort()
call on a single line) using an expression lambda:
auto people = get_people();
std::sort(people, [][&1.name() < &2.name()]);
What? You haven't seen this syntax before? Don't worry, you're not alone: I made it up. The
&1 means "the first argument", and &2 means "the second
argument."
Note: I'm going to be using range-based algorithms for the remainder of this post, just to
follow the running theme of terseness.
A Modest Proposal: Expression Lambdas
If my attempt has been successful, you did not recoil in horror and disgust at the sight of
my made-up "expression lambda" syntax:
[][&1.name() < &2.name()]
Here's what I hope:
You are over the "learning curb" as you've seen how the syntax corresponds to an earlier
syntax. (The "expression lambda" is roughly equivalent to the lambda in the prior
example).
You have seen how a prior "foreign" example ("terse" templates) can be understandable,
even if not perfect.
You know exactly what it means because the example does not simply use "dummy"
identifiers ( foo , bar , baz , etc.) and actually
acts in a real-world-use-case capacity.
Yes, the lead-in paragraphs were me buttering you up in preparation for me to unveil the
horror and beauty of "expression lambdas."
I am aware of the abbreviated lambdas proposals, and I am aware that they were shot down as (paraphrasing) "not offering sufficient benefit for their added cost and complexity."
Besides that, "expression lambdas" are not abbreviated lambdas. Rather, the original
proposal document cites this style as "hyper-abbreviated" lambdas. The original authors note
that their abbreviated lambda syntax "is about as abbreviated as you can get, without loss of
clarity or functionality." I take that as a challenge.
For one, I'd note that all their examples use simplistic variable names, like a , b , x , y , args , and
several others. The motivation for the abbreviated lambda is to gain the ability to wield
terseness where verbosity is unnecessary. Even in my own example, I named my parameters
lhs and rhs to denote their position in the comparison, yet there is
very little confusion as to what was going on. I could as well have named them a
and b . We understood with the context what they were. The naming of parameters
when we have such useful context clues is unnecessary!
I don't want abbreviated lambdas. I'm leap-frogging it and proposing hyper-abbreviated
lambdas, but I'm going to call them "expression lambdas," because I want to be different (and I
think it's a significantly better name).
Use-case: Calling an overload-set
C++ overload sets live in a weird semantic world of their own. They are not objects, and you
cannot easily create an object from one. For additional context, see Simon Brand's talk on the subject . There
are several proposals floating around to fill this gap, but I contend that "expression lambdas"
can solve the problem quite nicely.
Suppose I have a function that takes a sequence of sequences. I want to iterate over each
sequence and find the maximum-valued element within. I can use std::transform and
std::max_element to do this work:
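The snippet showing that first attempt is not preserved here; a minimal sketch of it, assuming a get_sequences() helper and the post's running convention of range-based algorithm overloads, would be:
// Hedged sketch of the failing attempt; get_sequences() and the range-based
// std::transform overload are assumptions following the post's conventions.
auto sequences = get_sequences();            // a sequence of sequences
std::vector<int> maxima;
std::transform(sequences, std::back_inserter(maxima),
               std::max_element);            // error: std::max_element names an
                                             // overload set (including function
                                             // templates), not a passable object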
Oops! I can't pass std::max_element because it is an overload set, including
function templates. How might an "expression lambda" help us here? Well, take a look:
Cool. We capture like a regular lambda [&] and pass the comparator as an
argument to max_element . What does the equivalent with regular lambdas look
like?
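That snippet is also missing from this copy; the standard workaround is to wrap the overload set in a generic lambda, roughly as follows (a hedged reconstruction, using the same hypothetical range-based transform call and matching the decltype remark below):
// Hedged reconstruction: lifting the overload set into a generic lambda,
// with a decltype(...) trailing return type so the closure SFINAEs properly.
std::transform(sequences, std::back_inserter(maxima),
    [&](auto&&... args)
        -> decltype(std::max_element(std::forward<decltype(args)>(args)...)) {
        return std::max_element(std::forward<decltype(args)>(args)...);
    });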
That's quite a bit more. And yes, that decltype(<expr>) is required for
proper SFINAE when calling the closure object. It may not be used in this exact context, but it
is useful in general.
What about variadics?
Simple:
[][some_function(&...)]
What about perfect forwarding?
Well we're still in the boat of using std::forward<decltype(...)> on that
one. Proposals for a dedicated "forward" operator have been shot down repeatedly. As someone
who does a lot of perfect forwarding, I would love to see a dedicated operator (I'll
throw up the ~> spelling for now).
The story isn't much better for current generic lambdas, though:
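The snippet that followed is missing here; a hedged sketch of what the forwarding boilerplate looks like in a current generic lambda (some_function is the post's placeholder, and the variable name lifted is illustrative) is:
// Hedged sketch of today's perfect-forwarding boilerplate in a generic lambda.
auto lifted = [](auto&&... args)
    -> decltype(some_function(std::forward<decltype(args)>(args)...)) {
    return some_function(std::forward<decltype(args)>(args)...);
};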
I use Python somewhat regularly, and overall I consider it to be a very good language. Nonetheless, no
language is perfect. Here are the drawbacks in order of importance to me personally:
It's slow. I mean really, really slow. A lot of times this doesn't matter, but it definitely means
you'll need another language for those performance-critical bits.
Nested functions kind of suck in that you can't modify variables in the outer scope.
Edit: I still use Python 2 due to library support, and this design flaw irritates the heck out of me, but apparently it's fixed in Python 3 due to the nonlocal statement. Can't wait for the libs I use to be ported so this flaw can be sent to the ash heap of history for good.
It's missing a few features that can be useful for library/generic code and that IMHO are simplicity taken to unhealthy extremes. The most important ones I can think of are user-defined value types (I'm guessing these can be created with metaclass magic, but I've never tried) and ref function parameters.
It's far from the metal. Need to write threading primitives or kernel code or something? Good luck.
While I don't mind the lack of ability to catch semantic errors upfront as a tradeoff for the dynamism that Python offers, I wish there were a way to catch syntactic errors and silly things like mistyping variable names without having to actually run the code.
The documentation isn't as good as languages like PHP and Java that have strong corporate backings.
Mark Canlas ,
@Casey, I have to disagree. The index is horrible - try looking up the with statement, or methods on a list . Anything covered in the tutorial is basically unsearchable. I have much better luck with Microsoft's documentation for C++. – Mark Ransom Oct 29 '10 at 6:14
2 revs , 2011-07-24 13:49:48
I hate that Python can't distinguish between declaration and usage of a variable. You don't need static typing to make that happen. It would just be nice to have a way to say "this is a variable that I deliberately declare, and I intend to introduce a new name, this is not a typo".
Furthermore, I usually use Python variables in a write-once style, that is, I treat variables as being
immutable and don't modify them after their first assignment. Thanks to features such as list
comprehension, this is actually incredibly easy and makes the code flow more easy to follow.
However, I can't document that fact. Nothing in Python prevents me from overwriting or reusing
variables.
In summary, I'd like to have two keywords in the language: var and let . If I write to a variable not declared by either of those, Python should raise an error. Furthermore, let declares variables as read-only, while var variables are "normal".
Consider this example:
x = 42 # Error: Variable `x` undeclared
var x = 1 # OK: Declares `x` and assigns a value.
x = 42 # OK: `x` is declared and mutable.
var x = 2 # Error: Redeclaration of existing variable `x`
let y # Error: Declaration of read-only variable `y` without value
let y = 5 # OK: Declares `y` as read-only and assigns a value.
y = 23 # Error: Variable `y` is read-only
Notice that the types are still implicit (but let variables are for all intents and purposes statically typed since they cannot be rebound to a new value, while var variables may still be dynamically typed).
Finally, all method arguments should automatically be let , i.e. they should be read-only. There's in general no good reason to modify a parameter, except for the following idiom:
def foo(bar = None):
if bar == None: bar = [1, 2, 3]
This could be replaced by a slightly different idiom:
def foo(bar = None):
let mybar = bar or [1, 2, 3]
Evan Plaice ,
I so so wish Python had a "var" statement. Besides the (very good) reason you state, it would also make it a lot easier to read the code because then you can just scan over the page to spot all the variable declarations. – jhocking Jul 11 '11 at 23:19
2 revs, 2 users 67% , 2012-09-08 13:01:49
My main complaint is threading, which is not as performant in many circumstances (compared to Java, C and others) due to the global interpreter lock (see the "Inside the Python GIL" talk (PDF link)).
However, there is a multiprocessing interface that is very easy to use, although it is going to be heavier on memory usage for the same number of processes vs. threads, and difficult if you have a lot of shared data. The benefit, however, is that once you have a program working with multiple processes, it can scale across multiple machines, something a threaded program can't do.
I really disagree on the critique of the documentation, I think it is excellent and better than most
if not all major languages out there.
Also you can catch many of the runtime bugs running pylint .
dbr ,
+1 for pylint. I was unaware of it. Next time I do a project in Python, I'll try it out. Also, multithreading seems to work fine if you use Jython instead of the reference CPython implementation. OTOH Jython is somewhat slower than CPython, so this can partially defeat the purpose. – dsimcha Oct 29 '10 at 0:48
Jacob , 2010-10-28 22:33:08
Arguably , the lack of static typing, which can introduce certain classes of runtime errors, is not worth the added flexibility that duck typing provides.
Jacob ,
This is correct, though there are tools like PyChecker which can check for the errors a compiler in languages like C/Java would catch. – Oliver Weiler Oct 28 '10 at 23:42
2 revs , 2010-10-29 14:14:06
I think the object-oriented parts of Python feel kind of "bolted on". The whole need to explicitly pass "self" to every method is a symptom that its OOP component wasn't expressly planned, you could say; it also shows Python's sometimes warty scoping rules that were criticized in another answer.
Edit:
When I say Python's object-oriented parts feel "bolted on", I mean that at times, the OOP side feels rather inconsistent. Take Ruby, for example: In Ruby, everything is an object, and you call a method using the familiar obj.method syntax (with the exception of overloaded operators, of course); in Python, everything is an object, too, but some methods you call as a function; i.e., you overload __len__ to return a length, but call it using len(obj) instead of the more familiar (and consistent) obj.length common in other languages. I know there are reasons behind this design decision, but I don't like them.
Plus, Python's OOP model lacks any sort of data protection, i.e., there aren't private, protected, and public members; you can mimic them using _ and __ in front of methods, but it's kind of ugly. Similarly, Python doesn't quite get the message-passing aspect of OOP right, either.
ncoghlan ,
The self parameter is merely making explicit what other languages leave implicit. Those languages clearly have a "self" parameter. – Roger Pate Oct 29 '10 at 6:08
MAK , 2010-11-11 13:38:01
Things I don't like about Python:
Threading (I know it's been mentioned already, but worth mentioning in every post).
No support for multi-line anonymous functions ( lambda can contain only one expression).
Lack of a simple but powerful input reading function/class (like cin or scanf in C++ and C, or Scanner in Java).
All strings are not unicode by default (but fixed in Python 3).
def foo(a, L = []):
L.append(a)
print L
>>> foo(1)
[1]
>>> foo(2)
[1, 2]
It's usually the cause of some subtle bugs. I think it would be better if it created a new list object whenever a default argument was required (rather than creating a single object to use for every function call).
Edit: It's not a huge problem, but when something needs to be referred to in the docs, it commonly means it's a problem. This shouldn't be required.
def foo(a, L = None):
if L is None:
L = []
...
Especially when that should have been the default. It's just a strange behavior that doesn't match
what you would expect and isn't useful for a large number of circumstances.
Patrick Collins ,
I see lots of complaints about this, but why do people insist on having an empty list (that the function modifies) as a default argument? Is this really such a big problem? I.e., is this a real problem? – Martin Vilcans Jul 25 '11 at 21:22
3 revs , 2011-07-15 03:15:50
Some of Python's features that make it so flexible as a development language are also seen as major
drawbacks by those used to the "whole program" static analysis conducted by the compilation and
linking process in languages such as C++ and Java.
Implicit declaration of local variables
Local variables are declared using the ordinary assignment statement. This means that variable
bindings in any other scope require explicit annotation to be picked up by the compiler (global and
nonlocal declarations for outer scopes, attribute access notation for instance scopes). This massively
reduces the amount of boilerplate needed when programming, but means that third party static analysis
tools (such as pyflakes) are needed to perform checks that are handled by the compiler in languages
that require explicit variable declarations.
"Monkey patching" is supported
The contents of modules, class objects and even the builtin namespace can be modified at runtime. This
is hugely powerful, allowing many extremely useful techniques. However, this flexibility means that
Python does not offer some features common to statically typed OO languages. Most notably, the "self"
parameter to instance methods is explicit rather than implicit (since "methods" don't have to be
defined inside a class, they can be added later by modifying the class, meaning that it isn't
particularly practical to pass the instance reference implicitly) and attribute access controls can't
readily be enforced based on whether or not code is "inside" or "outside" the class (as that
distinction only exists while the class definition is being executed).
Far from the metal
This is also true of many other high level languages, but Python tends to abstract away most hardware
details. Systems programming languages like C and C++ are still far better suited to handling direct
hardware access (however, Python will quite happily talk to those either via CPython extension modules
or, more portably, via the ctypes library).
Using indentation for code blocks instead of {} / begin-end, whatever.
Every newer modern language has proper lexical scoping, but not Python (see below).
Chaotic docs (compare with Perl5 documentation, which is superb).
Strait-jacket (there's only one way to do it).
Example for broken scoping; transcript from interpreter session:
>>> x=0
>>> def f():
... x+=3
... print x
...
>>> f()
Traceback (most recent call last):
File "", line 1, in ?
File "", line 2, in f
UnboundLocalError: local variable 'x' referenced before assignment
global and nonlocal keywords have been introduced to patch this design stupidity.
This is particularly bad because a large number of the functions (rather than methods) are just dumped into the global namespace : methods relating to lists, strings, numbers, constructors, metaprogramming, all mixed up in one big alphabetically-sorted list.
At the very least, functional languages like F# have all the functions properly namespaced in modules:
List.map(x)
List.reversed(x)
List.any(x)
So they aren't all together. Furthermore, this is a standard followed throughout the library, so at
least it's consistent.
I understand the reasons for doing the function vs method thing, but I still think it's a bad idea to mix them up like this. I would be much happier if the method syntax was followed, at least for the common operations:
Whether the methods are mutating or not, having them as methods on the object has several advantages:
Single place to look up the "common" operations on a data-type: other libraries/etc. may have other
fancy things they can do to the datatypes but the "default" operations are all in the object's
methods.
No need to keep repeating the Module when calling Module.method(x) .
Taking the functional List example above, why do I have to keep saying List over and over? It should know that it's a List and I don't want to call the Navigation.map() function on it! Using the x.map() syntax keeps it DRY and still unambiguous.
And of course it has advantages over the put-everything-in-global-namespace way of doing it. It's not that the current way is incapable of getting things done. It's even pretty terse ( len(lst) ), since nothing is namespaced! I understand the advantages in using functions (default behavior, etc.) over methods, but I still don't like it.
It's just messy. And in big projects, messiness is your worst enemy.
Wayne Werner ,
yeah... I really miss LINQ style (I'm sure LINQ isn't the first to implement it, but I'm most familiar with it) list handling. – CookieOfFortune Sep 8 '12 at 15:38
Python had to wait for 3.x to add a "with" keyword. In any homoiconic language it could have trivially
been added in a library.
Most other issues I've seen in the answers are of one of 3 types:
1) Things that can be fixed with tooling (e.g. pyflakes)
2) Implementation details (GIL, performance)
3) Things that can be fixed with coding standards (i.e. features people wish weren't there)
#2 isn't a problem with the language, IMO #1 and #3 aren't serious problems.
dbr ,
with was available from Python 2.5 with from __future__ import with_statement , but I agree, I've occasionally found it unfortunate that statements like if / for / print /etc are "special" instead of regular functions. – dbr Sep 9 '12 at 22:03
Martin Vilcans , 2011-07-25 22:21:42
Python is my favourite language as it is very expressive, but still keeps you from making too many
mistakes. I still have a few things that annoy me:
No real anonymous functions. Lambda can be used for single-statement functions, and the with statement can be used for many things where you'd use a code block in Ruby. But in some situations it makes things a bit more clumsy than they would have to be. (Far from as clumsy as it would be in Java, but still...)
Some confusion in the relation between modules and files. Running "python foo.py" from the command
line is different from "import foo". Relative imports in Python 2.x can also cause problems. Still,
Python's modules are so much better than the corresponding features of C, C++ and Ruby.
Explicit self .
Even though I understand some of the reasons for it, and even though I use Python daily, I tend to
make the mistake of forgetting it. Another issue with it is that it becomes a bit tedious to make a
class out of a module. Explicit self is related to the limited scoping that others have complained
about. The smallest scope in Python is the function scope. If you keep your functions small, as you
should, that isn't a problem by itself and IMO often gives cleaner code.
Some global functions, such as len , that you'd expect to be a method (which it actually is behind the scenes).
Significant indentation. Not the idea itself, which I think is great, but since this is the single
thing that keeps so many people from trying Python, perhaps Python would be better off with some
(optional) begin/end symbols. Ignoring those people, I could totally live with an enforced size for
the indentation too.
That it is not the built-in language of web browsers, instead of JavaScript.
Of these complaints, it's only the very first one that I care enough about that I think it should be
added to the language. The other ones are rather minor, except for the last one, which would be great
if it happened!
Zoran Pavlovic ,
+1 It makes me wonder whether to write datetime.datetime.now() when one project could write datetime.now , and then, when mixing two projects, one way of writing it rules out the other; surely this wouldn't have happened in Java, which wouldn't name a module the same as a file(?), if you see how the common way seems to have the module confusing us with the file when both uses are practiced. And explicit self I still try to understand, since the calls don't have the same number of arguments as the functions. And you might think that the VM Python has is slow? – Niklas R. Sep 1 '11 at 16:19
5 revs , 2013-05-23 22:03:02
Python is not fully mature: the python 3.2 language at this moment in time has compatibility problems
with most of the packages currently distributed (typically they are compatible with python 2.5). This
is a big drawback which currently requires more development effort (find the package needed; verify
compatibility; weigh choosing a not-as-good package which may be more compatible; take the best
version, update it to 3.2 which could take days; then begin doing something useful).
Likely in mid-2012 this will be less of a drawback.
Note that I guess I got downvoted by a fan-boy. During a developer discussion our high level developer
team reached the same conclusion though.
Maturity in one main sense means a team can use the technology and be very quickly up & running
without hidden risks (including compatibility problems). 3rd party python packages and many apps do
not work under 3.2 for the majority of the packages today. This creates more work of integration,
testing, reimplementing the technology itself instead of solving the problem at hand == less mature
technology.
Update for June 2013: Python 3 still has maturity problems. Every so often a team member will mention
a package needed then say "except it is only for 2.6" (in some of these cases I've implemented a
workaround via localhost socket to use the 2.6-only package with 2.6, and the rest of our tools stay
with 3.2). Not even MoinMoin, the pure-python wiki, is written in Python 3.
Jonathan Cline IEEE ,
I agree with you only if your definition of maturity is not compatible with a version that is incompatible by design . – tshepang Jul 17 '11 at 7:25
Mason Wheeler
, 2010-10-28 22:35:52
Python's scoping is badly broken, which makes object-oriented programming in Python very awkward.
Access modifiers in Python are not enforceable, which makes it difficult to write well structured,
modularized code.
I suppose that's part of @Mason's broken scoping - a big problem in general with this language. For
code that's supposed to be readable, it seems quite difficult to figure what can and should be in
scope and what a value will be at any given point in time - I'm currently thinking of moving on from
the Python language because of these drawbacks.
Just because "we're all consenting adults" doesn't mean that we don't make mistakes and don't work
better within a strong structure, especially when working on complex projects - indentation and
meaningless underscores don't seem to be sufficient.
ncoghlan ,
So lack of access controls is bad... but explicit scoping of variable writes to any non-local namespace is also bad? – ncoghlan Jul 13 '11 at 3:25
dan_waterworth
, 2010-12-26 13:05:49
The performance is not good, but is improving with pypy,
The GIL prevents the use of threading to speed up code, (although this is usually a premature
optimization),
It's only useful for application programming,
But it has some great redeeming features:
It's perfect for RAD,
It's easy to interface with C (and for C to embed a python interpreter),
It's very readable,
It's easy to learn,
It's well documented,
Batteries really are included; its standard library is huge and pypi contains modules for
practically everything,
It has a healthy community.
dan_waterworth ,
What inspired you to mention the advantages? The question asked for the problems. Anyway, what do you mean it's useful only for application programming? What other programming is there? What specifically is it not good for? – tshepang Dec 30 '10 at 13:27
Niklas R.
, 2011-07-23 07:31:38
I do favor Python, and the first disadvantage that comes to my mind is that when commenting out a statement like if myTest(): you must change the indentation of the whole executed block, which you wouldn't have to do with C or Java. In fact, in Python, instead of commenting out an if-clause I've started to comment it out this way: if True: #myTest() , so I won't also have to change the following code block. Since Java and C don't rely on indentation, it makes commenting out statements easier in C and Java.
Christopher Mahan ,
You would seriously edit C or Java code to change the block level of some code without changing its indentation? – Ben Jul 24 '11 at 6:35
Jed
, 2012-09-08 13:49:04
Multiple dispatch does not integrate well with the established single-dispatch type system and is not
very performant.
Dynamic loading is a massive problem on parallel file systems where POSIX-like semantics lead to
catastrophic slow-downs for metadata-intensive operations. I have colleagues that have burned a
quarter million core-hours just getting Python (with numpy, mpi4py, petsc4py, and other extension
modules) loaded on 65k cores. (The simulation delivered significant new science results, so it was worth it, but it is a problem when more than a barrel of oil is burned to load Python once.) Inability to link statically has forced us to go to great contortions to get reasonable load times at scale, including patching libc-rtld to make dlopen perform collective file system access.
Jed ,
Wow, this seems highly technical; do you have any reference material, examples, blog posts or articles on the subject? I wonder if I might be exposed to such cases in the near future. – vincent Sep 8 '12 at 20:00
vincent
, 2012-09-08 16:03:54
quite a bunch of very mainstream 3rd party libraries and software that are widely used are not very pythonic. A few examples: soaplib, openerp, reportlab. Critique is out of scope; they're there, they're widely used, but they make the python culture confusing (it hurts the motto that says "There should be one-- and preferably only one --obvious way to do it"). Known pythonic successes (such as django or trac) seem to be the exception.
the potentially unlimited depth of abstraction of instance, class, metaclass is conceptually
beautiful and unique. But to master it you have to deeply know the interpreter ( in which order
python code is interpreted, etc. ). It's not widely known and used ( or used correctly ), while
similar black magic such as C# generics, that is conceptually more convoluted ( IMHO ) seems more
widely known and used, proportionally.
to get a good grasp of memory and threading model, you have to be quite experienced with python,
because there's no comprehensive spec. You just know what works, maybe because you read the
interpreter's sources or experienced quirks and discovered how to fix them. For instance, there are
only strong or weak references, not the soft and phantom refs of java. Java has a thread for
garbage collection while there is no formal answer about when garbage collection happens in python
; you can just observe that garbage collection doesn't happen if no python code is executed, and
conclude it's probably happening sometimes when trying to allocate memory. Can be tricky when you
don't know why a locked resource wasn't released ( my experience about that was mod_python in
freeswitch ).
Anyhow, python has been my main language for 4 years now. Being fanboys, elitists or monomaniacs is not a
part of the python culture.
Andrew Janke ,
+1. Spec for memory and threading model is right on. But FWIW, the Java garbage collector
being on a thread (and most everything else about the GC) is not an aspect of the Java
language or VM specifications per se, but is a matter of a particular JVM's implementation.
However, the main Sun/Oracle JVM is extensively documented wrt GC behavior and
configurability, to the extent that there are whole books published on JVM tuning. In theory
one could document CPython in the same way, regardless of language spec.
– Andrew Janke Nov 26 '12 at 3:44
deamon
, 2012-09-10 12:59:24
Strange OOP:
len(s) through __len__(self) and other "special methods"
extra special methods which could be derived from other special methods ( __add__ and __iadd__ for + and += )
self as first method parameter
you can forget to call base class constructor
no access modifiers (private, protected ...)
no constant definitions
no immutability for custom types
GIL
poor performance which leads to a mix of Python and C and troubles with builds (looking for C libs, platform dependencies ...)
bad documentation, especially in third party libs
incompatibility between Python 2.x and 3.x
poor code analysis tools (compared to what is offered for statically typed languages such as Java or C#)
Konrad Rudolph ,
Personally I think that the incompatibility between 2.x and 3.x is one of Python's biggest advantages. Sure, it also is a disadvantage. But the audacity of the developers to break backwards compatibility also means that they didn't have to carry cruft around endlessly. More languages need such an overhaul. – Konrad Rudolph Sep 10 '12 at 13:39
Kosta
, 2012-09-08 15:04:10
"Immutability" is not exactly it's strong point. AFAIK numbers, tuples and strings are immutable,
everything else (i.e. objects) is mutable. Compare that to functional languages like Erlang or Haskell
where everything is immutable (by default, at least).
However, Immutability really really shines with concurrency*, which is also not Python's strong point,
so at least it's consequent.
(*= For the nitpickers: I mean concurrency which is at least partially parallel. I guess Python is ok
with "single-threaded" concurrency, in which immutability is not as important. (Yes, FP-lovers, I know
that immutability is great even without concurrency.))
I'd love to have explicitly parallel constructs. More often than not, when I write a list
comprehension like
[ f(x) for x in lots_of_sx ]
I don't care about the order in which the elements will be processed. Sometimes, I don't even care in which
order they are returned.
Even if CPython can't do it well when my f is pure Python, behavior like this could be defined for
other implementations to use.
Zoran Pavlovic ,
//spawn bunch of threads //pass Queue que to all threads que.extend([x for x in lots_of_sx]) que.wait() # Wait for all lots_of_sx to be processed by threads. – Zoran Pavlovic Jan 7 '13 at 7:02
2 revs, 2 users 80% , 2012-10-07 15:16:37
Python has no tail-call optimization, mostly for philosophical reasons . This means that tail-recursing on large structures can cost O(n) memory (because of the unnecessary stack that is kept) and will require you to rewrite the recursion as a loop to get O(1) memory.
I'm writing a script for work, and need to be able to create a hash of arrays that will check
to see if a key exists in the hash (or dictionary), and if it does I will roll up some values
from the new line into the existing hash values. Here is my code in Perl, what would be the
translation in Python?
If you're just checking if the key exists, you can do if "key" in your_dictionary .
Edit:
To handle the unintended second part of your question, about adding the new value to the
array, you can do something like this
# -1 will give you the last item in the list every time
for key, value in nums.iteritems():
    nums[key].append(value[-1]+value[-1])
omri_saadon ,
You can use this as well: rollUpHash.get(key, None)
If the key exists, then the function returns the value of this key; else the function will return whatever you assigned as the default value (second parameter).
A very common idiom is to say "when it rains, it pours."
"Pours" in this context means, "rains very heavily."
What this means, roughly speaking is "when one bad thing happens, you can expect a lot
more bad things." So, for example, when talking to a friend who has just described a litany
of bad luck in his life you'd say, "when it rains, it pours."
Which is usually meant as escaping a bad situation only to find oneself in a worse
situation.
nnnnnn , 2019-12-23 04:41:12
"Out of the frying pan, into the fire" is identified in the question as not applicable.
(Unless the question was edited to add that after you answered, but there's no edit history
shown.) But in any case as you have correctly noted it means to replace a problem with a
worse problem, whereas the question is asking about adding additional problems without
solving the original one. – nnnnnn 3 hours ago
When the first bad situation was recent and related with subsequent deterioration (so the
"new difficulties" are related to the old ones)
This went downhill fast.
is a common way of expressing exasperation, particularly when human factors are involved
in the deterioration (namely people taking things badly). For new difficulties unrelated to
the old ones, I'd choose the previously mentioned "when it rains, it pours".
Mark Foskey ,
I believe there is no idiom that means exactly the same thing. Maybe you could just translate
yours into English? "It's like I had a lightning strike followed by a snake bite." People
won't even know you're using a cliche, or you can choose to say it's an expression from your
native language.
Actually, come to think of it, the word "snakebit" means that someone has had bad luck,
but it seems to be especially often used when someone has had a whole series of misfortunes.
So it might do.
dimachaerus , 2019-12-22 14:30:22
between a rock and a hard place
This phrase became popular during The Great Depression of the 1930s, as many citizens were
hit hard by the stock market crash and were left with no choice as unemployment levels rose
also.
These days I am doing a lot of collaborative writing with a colleague born and raised in Russia, and now working in the US. His English is very good, and yet, as we circulated various texts, I noticed that he tends to drop the definite article, the , more than is acceptable. I attributed that to the influence of his native language.
Because I will continue working with him for some time, I hope to be aware of other such
possible errors influenced by his mother tongue (especially because I'm not a native English
speaker either!). So, what are common errors (or shibboleths) of native Russian speakers when
they write in English?
Andrew Grimm , 2015-03-28 00:38:44
Russian is a very flexible language. Its complexity allows one to put words in a sentence in
just about any order. There are 5 * 4 * 3 * 2 * 1 = 120 valid permutations of "I ate green
apple yesterday", although some sound more weird than others. Run-on sentences are not a big
deal - they are allowed. In fact, one can express a sentence "It is getting dark" with just a
single word, and it is a valid sentence. The words from Latin made their way into many
languages, but sometime their meanings have changed. – Job Mar 20 '11 at 2:18
mplungjan , 2011-03-19 09:31:44
A, an and the are all dropped. Using past tense with did (in my experience almost all non-natives do this until they learn not to). Sometimes using she instead of he. Word order is not as important in Russian as in English. Missing prepositions.
Russians I have met who have large vocabularies tend to stress words with more than two
syllables in an idiosyncratic manner since they likely only ever read the words.
I have the same problem on rare occasions where I know a word, know how to use it but guess
the pronunciation since I got it from literature.
For example beginning learners often omit the auxiliary in questions or negatives: How
you do that? / I no have it. The present simple is commonly used where the progressive form
or perfect is needed: She has a bath now / How long are you in Germany?. In comparison with
Russian the modal verb system in English is very complex. Mistakes such as Must you to work
on Friday? / I will not can come, etc. are common among beginners. The lack of a copula in
Russian leads to errors such as She good teacher.
Aside from the items pointed out above, a well-educated native Russian speaker often writes (and speaks) in incredibly long, almost Hemingway-ish, compound sentences, where you can barely remember what the beginning of the sentence was about. I'm not sure if it's primarily the influence of Russian prose, or something about the language itself which causes the brain to produce the long sentences.
kotekzot , 2012-06-14 09:21:37
+1 It is certainly true for well-educated ones. Our famous writers used to write sentences
half a page long. Even for native speakers it is too much sometimes. – Edwin Ross Mar 19 '11
at 20:01
rem ,
Russian and English languages have somewhat different structure of verb tenses. For native
speakers of Russian it can often be difficult to correctly use perfect tense forms due to the
influence of their mother tongue.
The grammatical concepts behind the correct usage of English perfect tenses can be very
confusing to Russian speakers, so they tend to replace it with Simple Past tense for example
(in case of Present Perfect or Past Perfect), or just fail to use it appropriately.
I am from Russia and I work at an international company so my colleagues and I have to use
English all the time. There are really some common errors. The most difficult for us is to
use articles properly. There is nothing similar to them in our native language. That is why
we often use them where they are not needed and vice versa.
The second difficult part is the use of prepositions. We tend to use those that we would use
in our language if they were translated. For example, instead of at office we tend to
say in office , instead of to London we often say in London . There are
many other examples.
We don't have gerund in our language, so sometimes it is difficult for us to use it
properly.
I cannot agree with mplungjan that word order is not so important. It is important in any language, and in Russian, too, you can change the meaning of a sentence if you change the word order. Not always, though, but in English it does not happen every time either.
There is also a rather big problem with the sequence of tenses. In our language we do not have to follow it. That is why we often misuse perfect tense and even past tense forms.
These are the most often encountered mistakes that I can spot when I talk to or read
something from a native Russian speaker.
mplungjan , 2011-03-19 22:21:46
Almost all my closest colleagues are from the former Soviet Union. The order of the words in a sentence seems, to a non-Russian speaker, to at least have a different importance, since I see this very often. Perhaps the person wanted to make a point in his native tongue, but the end effect was an incorrect sentence. – mplungjan Mar 19 '11 at 17:48
konung , 2011-03-31 14:34:41
One thing that nobody seemed to mention is punctuation. It is of paramount importance in
Russian, because it brings intonation across.
Here is a famous example from an old Soviet cartoon that is based on a tale by Hans
Christian Andersen in which a little princess is asked to sign a decree of execution. Pay
attention to the position of the comma.
Казнить, нельзя помиловать!
This means :
Execute this person! Cannot pardon him!
Казнить нельзя, помиловать!
This means :
Do not execute this person! Pardon him!
I guess you could argue that you can do the same in English like so:
Execute cannot, pardon! vs Execute, cannot pardon!
And this would make sense to an attentive English speaker, but punctuation tends to be not
emphasized as much as spelling; as a result it will most likely be ignored or at the very
least be ambiguous. I was just trying to illustrate the point that punctuation is so
important that they made a cartoon for little children about it :-)
In fact it's so important that in Russia, Russian Language teachers usually give 2 grades
for some written assignments: one for grammar and the other one for punctuation. It wasn't uncommon for me to get 5/3 (or A/C in the American equivalent); I'm not a bad speller, but sometimes I can't get those punctuation signs right even if my life depended on it :-)
To relate to this question though: you will notice that Russian speakers that finished at
least 9 classes or high school in Russia will tend to use a lot more of , ; : " etc to bring
extra nuances across, especially in run-on sentences because it's ingrained in the way
language is taught. I see it with my Dad a lot. I've lived in the US for more than a decade
now myself and I still tend to put commas in front of "that" in the middle of the
sentence.
As previously mentioned, Russian doesn't use articles ( a, the ), so Russian speakers
use them - or don't - by guesswork, and often get them wrong.
What I haven't seen anyone else mention, however, is that the present tense of to be (
I am, thou art†, he is, we are, you are, they are ) is rarely (if ever) used in
Russian. As a result, again, Russian speakers sometimes make surprising mistakes in this
area. (My favorite: " Is there is...? ")
In speech, of course, there are at least three major pitfalls: Russian lacks a "th" sound
- foreign words that are imported into Russian tend to get substituted with "f" or "t". When
speaking English, "th" tends to turn into "s" or "z". If you're feeling especially cruel, ask
your Russian colleague to say "thither". (Of course, a lot of Americans also have trouble
with that one.)
Russian also doesn't have an equivalent to English "h" - the Russian letter х
, pronounced like the "ch" in loch , is not equivalent - so foreign (mostly German)
words imported into Russian usually substitute "g". Russians speaking English will, at first,
turn all of their aitches into gees; later on, some learn to pronounce an English h ,
while others convert h 's into х 's - the source of the infamous "kheavy
Roossian excent".
Finally, several of the "short" English vowel sounds - the a in "at", i in "in", and u in
"up" - don't exist in Russian, while Russian has at least one vowel sound (ы) that
doesn't exist in English. (Hence "excent" instead of "accent".)
†Yes, I know - we don't actually use "thou" anymore. Russians do, however
(ты) and so I mentioned it for completeness.
Here's a conversation I had with my Russian colleague, who speaks English well:
me: Is Jane on board with this plan?
Russian: Jane's not on the board now. Didn't you know that?
me: No, I mean, does Jane agree with us on this?
Russian: What? What are you talking about?
me: "on board" means "is she on the same boat (page, etc) with us?"
To her, the word "the" should carry no significant change in meaning. She didn't 'get it'
on an intuitive level, despite years of successful study of English.
Human languages gather their own logic. Shall we discuss 'verbs of motion' in Russian, for
example? Why, if I am 200 miles outside of Moscow, do I have to specify whether I'm walking
or going by a vehicle when I say, "I'm going to Moscow tomorrow." Isn't it obvious I won't be
walking?
I'm enjoying learning Russian, because I'm uncovering the hidden logic in it. It's a
beautiful language.
Thanks for the very useful examples and explanations!
Actually, I still keep "fighting" with English articles after at least 15 years of good English experience. I tend to drop them in order to avoid using them wrong. I remember very well how my colleagues and my chief cursed my inability to use articles when editing my English texts (looking for and fixing mostly only articles). The idea of articles in English (and in German and French, too) seems very weird to my Russian mind. Why does one need articles at all? There are much more logical words "this", "that", "these" in English, as in Russian (and many other languages). If we need to pinpoint the object (stress which one exactly), then we use these words in Russian: "this car". Otherwise we Russians just do not care to show that some "car" exists only in one piece (it's damn clear already since it's not "cars"), as one should do in English by stressing "a car", or "une voiture" in French.
I wonder what happened in the old times in English (and other Germanic languages) to force people to use articles instead of the logical "this", "that", "these" words?
Surprisingly, it works much better for me with Swedish articles. Maybe because they are not so strict about the articles, maybe because the Swedish article is always connected with a different ending of the word. They say and write not just "a car -- the car" but "en bil (often dropping "en") - den ha:r bilen". This is somehow more complicated, but in some strange way it concentrates me more on the certain object. Here is a link with a professional explanation of the Swedish approach: http://www.thelocal.se/blogs/theswedishteacher/2012/04/11/denna-or-den-har/
♦ , 2013-02-21 09:37:10
Well if you wonder what happened in the old times, look up the etymology of the and
a . The former is the word "that", and the latter comes from the word "one".
I.e. "the apple" is simply "that apple", and "an apple" is "one apple". So English is not
that different from Russian, actually. – RegDwigнt ♦ Feb 21
'13 at 9:37
The words in bold in the quote below are meant to express something that I don't know how to
put in English. The main idea is that someone is spending too much energy in many different
areas thinking that he is going to achieve some considerable progress in all of them while in
fact he is only going to enjoy a small amount of success (if any) in all those areas due to
the enormous scale of each area.
Jack: So what project did you choose for this semester?
Linda: The children illiteracy in in-land towns in Uganda, The correlation between
humans' eating habits and their behavioral patterns, The possibility of practical
application of the Poincaré conjecture solution in the nearest future, The effect of global warming on blue whales' migratory patterns...
Jack: Wow! Isn't it too many? Why not focus on only one project and research it
thoroughly instead? I suggest that you should not shallowly spread yourself on so many
projects.
I can't think of an idiom that exactly expresses your meaning. We can suggest to Linda that
she should not spread herself so thinly , but that suggests the risk of failure,
rather than insufficient progress.
If Linda lives her life this way, she might become a jack of all trades, but master of
none . I.e. she has acquired many "shallow skills" through her diverse experiences, but
no deep ones.
I'm trying to express the idea of someone who consistently underestimates his own
contributions or his ability to impact a situation, despite having high self esteem. This is
due to seeing themselves as currently fitting into a category of people that are not expected
to be impactful in a situation, and thus they don't believe they should be impactful despite
their actually being qualified to have an impact.
In essence, it's like someone saying they are just an intern, so they can't/didn't have a significant impact on a project because everyone knows interns are just there to learn, not to create something; or someone saying they couldn't/didn't help lead the direction of a project because they weren't a manager and only the manager is allowed to do that, etc.
I struggle to best explain this concept, while stressing that the underestimation is not
due to bad self esteem or negativity, simply the fact that he does not believe he
should be impactful and thus underestimates any impact he could have.
In this situation would it be right to say that the individual is denying their agency? Or perhaps does not acknowledge their agency in the situation? I'm not certain whether it is right to say someone can be 'given' agency, or if agency is an intrinsic quality that the person has whether or not he acknowledges its existence.
If the phrase isn't right, is there a better phrase to use?
I want to say something along the lines of "Warrantless Deference": 1) they are deferring to someone else without authorization to do so, and 2) though having the ability, the person assumes someone else will take on responsibility for the tasks.
I would like to expose all subs into my namespace without having to list them one at a time:
@EXPORT = qw( firstsub secondsub third sub etc );
Using fully qualified names would require bunch of change to existing code so I'd rather
not do that.
Is there @EXPORT_ALL?
I think documentation says it's a bad idea, but I'd like to do it anyway, or at least know
how.
To answer Jon's why: right now, for quick refactoring, I want to move a bunch of subs into their own package with the least hassle and code changes to the existing scripts (where those subs are currently used and often repeated).
Also, mostly, I was just curious. (It seemed like Exporter might as well have that as a standard feature, but somewhat surprisingly, based on the answers so far, it doesn't.)
brian d foy , 2009-04-08 23:58:35
Don't do any exporting at all, and don't declare a package name in your library. Just load
the file with require and everything will be in the current package. Easy peasy.
Michael Carman , 2009-04-09 00:15:10
Don't. But if you really want to... write a custom import that walks the symbol
table and export all the named subroutines.
# Export all subs in package. Not for use in production code!
sub import {
    no strict 'refs';
    my $caller = caller;
    while (my ($name, $symbol) = each %{__PACKAGE__ . '::'}) {
        next if $name eq 'BEGIN';     # don't export BEGIN blocks
        next if $name eq 'import';    # don't export this sub
        next unless *{$symbol}{CODE}; # export subs only
        my $imported = $caller . '::' . $name;
        *{ $imported } = \*{ $symbol };
    }
}
Chas. Owens ,
Warning, the code following is as bad an idea as exporting everything:
package Expo;
use base "Exporter";
seek DATA, 0, 0; #move DATA back to package
#read this file looking for sub names
our @EXPORT = map { /^sub\s+([^({\s]+)/ ? $1 : () } <DATA>;
my $sub = sub {}; #make sure anon funcs aren't grabbed
sub foo($) {
    print shift, "\n";
}
sub bar ($) {
    print shift, "\n";
}
sub baz {
    print shift, "\n";
}
sub quux {
    print shift, "\n";
}
1;
__DATA__
Here is some code that uses the module:
#!/usr/bin/perl
use strict;
use warnings;
use Expo;
print map { "[$_]\n" } @Expo::EXPORT;
foo("foo");
bar("bar");
baz("baz");
quux("quux");
And here is its output:
[foo]
[bar]
[baz]
[quux]
foo
bar
baz
quux
Jon Ericson , 2009-04-08 22:33:36
You can always call subroutines in their fully-qualified form:
MyModule::firstsub();
For modules I write internally, I find this convention works fairly well. It's a bit more
typing, but tends to be better documentation.
Take a look at perldoc perlmod for more information about what you are trying
to accomplish.
More generally, you could look at Exporter 's code and see how it uses glob
aliasing. Or you can examine your module's namespace and export each subroutine. (I don't
care to search for how to do that at the moment, but Perl makes this fairly easy.) Or you
could just stick your subroutines in the main package:
package main;
sub firstsub() { ... }
(I don't think that's a good idea, but you know better than I do what you are trying to
accomplish.)
There's nothing wrong with doing this provided you know what you are doing and aren't just
trying to avoid thinking about your interface to the outside world.
ysth , 2009-04-09 01:29:04
Perhaps you would be interested in one of the Export* modules on CPAN that lets you mark subs
as exportable simply by adding an attribute to the sub definition? (Don't remember which one
it was, though.)
Although it is not usually wise to dump all sub s from a module into the caller
namespace, it is sometimes useful (and more DRY!) to automatically generate
@EXPORT_OK and %EXPORT_TAGS variables.
The easiest method is to extend the Exporter. A simple example is something like this:
package Exporter::AutoOkay;
#
# Automatically add all subroutines from caller package into the
# @EXPORT_OK array. In the package use like Exporter, f.ex.:
#
# use parent 'Exporter::AutoOkay';
#
use warnings;
use strict;
no strict 'refs';
require Exporter;
sub import {
    my $package = $_[0].'::';
    # Get the list of exportable items
    my @export_ok = (@{$package.'EXPORT_OK'});
    # Automatically add all subroutines from package into the list
    foreach (keys %{$package}) {
        next unless defined &{$package.$_};
        push @export_ok, $_;
    }
    # Set variable ready for Exporter
    @{$package.'EXPORT_OK'} = @export_ok;
    # Let Exporter do the rest
    goto &Exporter::import;
}
1;
Note the use of goto that removes us from the caller stack.
A more complete example can be found here: http://pastebin.com/Z1QWzcpZ It automatically generates
tag groups from subroutine prefixes.
Sérgio , 2013-11-14 21:38:06
case 1
Library is :
package mycommon;
use strict;
use warnings;
sub onefunctionthatyoumadeonlibary() {
}
1;
you can use it by calling it with the mycommon:: prefix:
#!/usr/bin/perl
use strict;
use warnings;
use mycommon;
mycommon::onefunctionthatyoumadeonlibary();
case 2
Library is, you simply export them:
package mycommon;
use strict;
use warnings;
use base 'Exporter';
our @EXPORT = qw(onefunctionthatyoumadeonlibary);
sub onefunctionthatyoumadeonlibary() {
}
1;
use it in the same "namespace":
#!/usr/bin/perl
use strict;
use warnings;
use mycommon qw(onefunctionthatyoumadeonlibary);
onefunctionthatyoumadeonlibary()
Also, we can mix these two cases: we can export the more common functions to use them without the package name, and keep other functions that we only call with the package name; those don't need to be exported.
You will have to do some typeglob munging. I describe something similar here:
The import routine there should do exactly what you want -- just don't import any symbols
into your own namespace.
Ville M ,
I would like to expose all subs into my namespace without having to list them one at a time:
@EXPORT = qw( firstsub secondsub third sub etc );
Using fully qualified names would require bunch of change to existing code so I'd rather
not do that.
Is there @EXPORT_ALL?
I think documentation says it's a bad idea, but I'd like to do it anyway, or at least know
how.
To answer Jon's why: right now for quick refactoring I want to move of bunch of subs into
their own package with least hassle and code changes to the existing scripts (where those
subs are currenty used and often repeated).
Also, mostly, I was just curious. (since it seemed like that Exporter might as well have
that as standard feature, but somewhat surprisingly based on answers so far it doesn't)
brian d foy , 2009-04-08 23:58:35
Don't do any exporting at all, and don't declare a package name in your library. Just load
the file with require and everything will be in the current package. Easy peasy.
Michael Carman , 2009-04-09 00:15:10
Don't. But if you really want to... write a custom import that walks the symbol
table and export all the named subroutines.
# Export all subs in package. Not for use in production code!
sub import {
no strict 'refs';
my $caller = caller;
while (my ($name, $symbol) = each %{__PACKAGE__ . '::'}) {
next if $name eq 'BEGIN'; # don't export BEGIN blocks
next if $name eq 'import'; # don't export this sub
next unless *{$symbol}{CODE}; # export subs only
my $imported = $caller . '::' . $name;
*{ $imported } = \*{ $symbol };
}
}
Chas. Owens ,
Warning, the code following is as bad an idea as exporting everything:
package Expo;
use base "Exporter";
seek DATA, 0, 0; #move DATA back to package
#read this file looking for sub names
our @EXPORT = map { /^sub\s+([^({\s]+)/ ? $1 : () } <DATA>;
my $sub = sub {}; #make sure anon funcs aren't grabbed
sub foo($) {
print shift, "\n";
}
sub bar ($) {
print shift, "\n";
}
sub baz{
print shift,"\n";
}
sub quux {
print shift,"\n";
}
1;
__DATA__
Here is the some code that uses the module:
#!/usr/bin/perl
use strict;
use warnings;
use Expo;
print map { "[$_]\n" } @Expo::EXPORT;
foo("foo");
bar("bar");
baz("baz");
quux("quux");
And here is its output:
[foo]
[bar]
[baz]
[quux]
foo
bar
baz
quux
Jon Ericson , 2009-04-08 22:33:36
You can always call subroutines in there fully-specified form:
MyModule::firstsub();
For modules I write internally, I find this convention works fairly well. It's a bit more
typing, but tends to be better documentation.
Take a look at perldoc perlmod for more information about what you are trying
to accomplish.
More generally, you could look at Exporter 's code and see how it uses glob
aliasing. Or you can examine your module's namespace and export each subroutine. (I don't
care to search for how to do that at the moment, but Perl makes this fairly easy.) Or you
could just stick your subroutines in the main package:
package main;
sub firstsub() { ... }
(I don't think that's a good idea, but you know better than I do what you are trying to
accomplish.)
There's nothing wrong with doing this provided you know what you are doing and aren't just
trying to avoid thinking about your interface to the outside world.
ysth , 2009-04-09 01:29:04
Perhaps you would be interested in one of the Export* modules on CPAN that lets you mark subs
as exportable simply by adding an attribute to the sub definition? (Don't remember which one
it was, though.)
Although it is not usually wise to dump all subs from a module into the caller's
namespace, it is sometimes useful (and more DRY!) to automatically generate the
@EXPORT_OK and %EXPORT_TAGS variables.
The easiest method is to extend the Exporter. A simple example is something like this:
package Exporter::AutoOkay;
#
# Automatically add all subroutines from caller package into the
# @EXPORT_OK array. In the package use like Exporter, f.ex.:
#
#     use parent 'Exporter::AutoOkay';
#
use warnings;
use strict;
no strict 'refs';
require Exporter;

sub import {
    my $package = $_[0].'::';

    # Get the list of exportable items
    my @export_ok = (@{$package.'EXPORT_OK'});

    # Automatically add all subroutines from package into the list
    foreach (keys %{$package}) {
        next unless defined &{$package.$_};
        push @export_ok, $_;
    }

    # Set variable ready for Exporter
    @{$package.'EXPORT_OK'} = @export_ok;

    # Let Exporter do the rest
    goto &Exporter::import;
}

1;
Note the use of goto that removes us from the caller stack.
A more complete example can be found here: http://pastebin.com/Z1QWzcpZ It automatically generates
tag groups from subroutine prefixes.
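A hedged usage sketch for the module above (the package name My::Util and the trim sub are
invented for illustration): any sub defined in the package becomes requestable by a caller,
because Exporter::AutoOkay fills @EXPORT_OK before handing off to Exporter:
package My::Util;
use parent 'Exporter::AutoOkay';

sub trim { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s }

1;
# elsewhere:
#   use My::Util qw(trim);   # works without My::Util listing trim anywhere
#   print trim('  hi  ');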
Can someone please explain the exact meaning of having leading underscores before an object's
name in Python? Also, explain the difference between a single and a double leading
underscore. Also, does that meaning stay the same whether the object in question is a
variable, a function, a method, etc.?
Andrew Keeton , 2009-08-19 17:15:53
Single Underscore
Names, in a class, with a leading underscore are simply to indicate to other programmers
that the attribute or method is intended to be private. However, nothing special is done with
the name itself.
Double Underscore (Name Mangling)
From the Python documentation:
Any identifier of the form __spam (at least two leading underscores, at
most one trailing underscore) is textually replaced with _classname__spam ,
where classname is the current class name with leading underscore(s) stripped.
This mangling is done without regard to the syntactic position of the identifier, so it can
be used to define class-private instance and class variables, methods, variables stored in
globals, and even variables stored in instances, private to this class on instances of
other classes.
And a warning from the same page:
Name mangling is intended to give classes an easy way to define "private" instance
variables and methods, without having to worry about instance variables defined by derived
classes, or mucking with instance variables by code outside the class. Note that the
mangling rules are designed mostly to avoid accidents; it still is possible for a
determined soul to access or modify a variable that is considered private.
Example
>>> class MyClass():
...     def __init__(self):
...         self.__superprivate = "Hello"
...         self._semiprivate = ", world!"
...
>>> mc = MyClass()
>>> print mc.__superprivate
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: MyClass instance has no attribute '__superprivate'
>>> print mc._semiprivate
, world!
>>> print mc.__dict__
{'_MyClass__superprivate': 'Hello', '_semiprivate': ', world!'}
Alex Martelli , 2009-08-19 17:52:36
Excellent answers so far but some tidbits are missing. A single leading underscore isn't
exactly just a convention: if you use from foobar import * , and module
foobar does not define an __all__ list, the names imported from the
module do not include those with a leading underscore. Let's say it's mostly a
convention, since this case is a pretty obscure corner;-).
The leading-underscore convention is widely used not just for private names, but
also for what C++ would call protected ones -- for example, names of methods that are
fully intended to be overridden by subclasses (even ones that have to be overridden since in
the base class they raise NotImplementedError !-) are often
single-leading-underscore names to indicate to code using instances of that class (or
subclasses) that said methods are not meant to be called directly.
For example, to make a thread-safe queue with a different queueing discipline than FIFO,
one imports Queue, subclasses Queue.Queue, and overrides such methods as _get
and _put ; "client code" never calls those ("hook") methods, but rather the
("organizing") public methods such as put and get (this is known as
the Template
Method design pattern -- see e.g. here
for an interesting presentation based on a video of a talk of mine on the subject, with the
addition of synopses of the transcript).
Ned Batchelder , 2009-08-19 17:21:29
__foo__ : this is just a convention, a way for the Python system to use names
that won't conflict with user names.
_foo : this is just a convention, a way for the programmer to indicate that
the variable is private (whatever that means in Python).
__foo : this has real meaning: the interpreter replaces this name with
_classname__foo as a way to ensure that the name will not overlap with a similar
name in another class.
No other form of underscores has meaning in the Python world.
There's no difference between class, variable, global, etc in these conventions.
2 revs, 2 users 93% , 2016-05-17 10:09:08
._variable is semi-private and meant just as a convention
.__variable is often incorrectly considered super-private, while its actual
meaning is just name mangling, to prevent accidental access [1]
.__variable__ is typically reserved for built-in methods or variables
You can still access .__mangled variables if you desperately want to. The
double underscores just name-mangle (rename) the variable to something like
instance._className__mangled
Example:
class Test(object):
    def __init__(self):
        self.__a = 'a'
        self._b = 'b'
>>> t = Test()
>>> t._b
'b'
t._b is accessible because it is only hidden by convention
>>> t.__a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute '__a'
t.__a isn't found because it no longer exists under that name, due to name mangling
>>> t._Test__a
'a'
By accessing instance._className__variable instead of just the double
underscore name, you can access the hidden value
9 revs, 8 users 82% , 2018-08-21 19:42:09
Single underscore at the beginning:
Python doesn't have real private methods. Instead, one underscore at the start of a method
or attribute name means you shouldn't access this method, because it's not part of the
API.
class BaseForm(StrAndUnicode):
    def _get_errors(self):
        "Returns an ErrorDict for the data provided for the form"
        if self._errors is None:
            self.full_clean()
        return self._errors
    errors = property(_get_errors)
(This code snippet was taken from django source code: django/forms/forms.py). In this
code, errors is a public property, but the method this property calls,
_get_errors, is "private", so you shouldn't access it.
Two underscores at the beginning:
This causes a lot of confusion. It should not be used to create a private method. It
should be used to avoid your method being overridden by a subclass or accessed accidentally.
Let's see an example:
class A(object):
    def __test(self):
        print "I'm a test method in class A"
    def test(self):
        self.__test()

a = A()
a.test()
# a.__test()   # This fails with an AttributeError
a._A__test()   # Works! We can access the mangled name directly!
Output:
$ python test.py
I'm a test method in class A
I'm a test method in class A
Now create a subclass B and customize the __test method:
class B(A):
    def __test(self):
        print "I'm a test method in class B"

b = B()
b.test()
Output will be....
$ python test.py
I'm a test method in class A
As we have seen, A.test() didn't call B.__test(), as we might have expected. But in fact,
this is the correct behavior for __. The two methods called __test() are automatically
renamed (mangled) to _A__test() and _B__test(), so they do not accidentally override each
other. When you create a method starting with __ it means that you don't want anyone to be
able to override it, and you intend to access it only from inside its own class.
Two underscores at the beginning and at the end:
When we see a method like __this__ , don't call it. This is a method which
Python is meant to call, not you. Let's take a look:
>>> name = "test string"
>>> name.__len__()
11
>>> len(name)
11
>>> number = 10
>>> number.__add__(40)
50
>>> number + 50
60
There is always an operator or native function which calls these magic methods. Sometimes
it's just a hook Python calls in specific situations. For example, __init__() is
called when the object is created, after __new__() has been called to build the
instance...
Let's take an example...
class FalseCalculator(object):
    def __init__(self, number):
        self.number = number
    def __add__(self, number):
        return self.number - number
    def __sub__(self, number):
        return self.number + number

number = FalseCalculator(20)
print number + 10    # 10
print number - 20    # 40
For more details, see the PEP-8
guide . For more magic methods, see this PDF .
Tim D , 2012-01-11 16:28:22
Sometimes you have what appears to be a tuple with a leading underscore as in
def foo(bar):
    return _('my_' + bar)
In this case, what's going on is that _() is an alias for a localization function that
operates on text to put it into the proper language, etc. based on the locale. For example,
Sphinx does this, and you'll find among the imports
from sphinx.locale import l_, _
and in sphinx.locale, _() is assigned as an alias of some localization function.
Dev Maha , 2013-04-15 01:58:14
If one really wants to make a variable read-only, IMHO the best way would be to use
property() with only a getter passed to it. With property() we can have complete control over
the data.
I understand that OP asked a little different question but since I found another question
asking for 'how to set private variables' marked duplicate with this one, I thought of adding
this additional info here.
SilentGhost ,
A single leading underscore is a convention: from the interpreter's point of view there is
no difference whether a name starts with a single underscore or not.
Double leading and trailing underscores are used for built-in methods, such as
__init__ , __bool__ , etc.
Double leading underscores without trailing counterparts are a convention too; however,
such names will be mangled by the interpreter when used inside a class. For plain variables
or function names no difference exists.
3 revs , 2018-12-16 11:41:34
Since so many people are referring to Raymond's talk , I'll just make it a
little easier by writing down what he said:
The intention of the double underscores was not about privacy. The intention was to use
it exactly like this
class Circle(object):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        p = self.__perimeter()
        r = p / math.pi / 2.0
        return math.pi * r ** 2.0

    def perimeter(self):
        return 2.0 * math.pi * self.radius

    __perimeter = perimeter    # local reference

class Tire(Circle):
    def perimeter(self):
        return Circle.perimeter(self) * 1.25
It's actually the opposite of privacy, it's all about freedom. It makes your subclasses
free to override any one method without breaking the others .
Say you don't keep a local reference of perimeter in Circle .
Now, a derived class Tire overrides the implementation of perimeter
, without touching area . When you call Tire(5).area() , in theory
it should still be using Circle.perimeter for computation, but in reality it's
using Tire.perimeter , which is not the intended behavior. That's why we need a
local reference in Circle.
But why __perimeter instead of _perimeter ? Because
_perimeter still gives the derived class the chance to override it:
double underscores trigger name mangling, so there is very little chance that the local
reference in the parent class gets overridden in a derived class. Thus it " makes your
subclasses free to override any one method without breaking the others ".
If your class won't be inherited, or method overriding does not break anything, then you
simply don't need __double_leading_underscore .
u0b34a0f6ae , 2009-08-19 17:31:04
Your question is good, it is not only about methods. Functions and objects in modules are
commonly prefixed with one underscore as well, and can be prefixed by two.
But __double_underscore names are not name-mangled in modules, for example. What happens
is that names beginning with one (or more) underscores are not imported if you import all
from a module (from module import *), nor are the names shown in help(module).
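A small sketch of that module-level behaviour (the module and function names are invented for
illustration):
# mymodule.py
def public_helper():
    return "imported by a star import"

def _internal_helper():
    # skipped by 'from mymodule import *' (and, as noted above, hidden from help())
    return "module-internal"

# Defining __all__ would override the underscore rule: a star import then
# brings in exactly the listed names, underscore-prefixed or not.
# __all__ = ['public_helper', '_internal_helper']

# client code:
#   from mymodule import *
#   public_helper()        # works
#   _internal_helper()     # NameError: the star import skipped it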
Marc , 2014-08-22 19:15:48
Here is a simple illustrative example on how double underscore properties can affect an
inherited class. So with the following setup:
class parent(object):
    __default = "parent"

    def __init__(self, name=None):
        self.default = name or self.__default

    @property
    def default(self):
        return self.__default

    @default.setter
    def default(self, value):
        self.__default = value

class child(parent):
    __default = "child"
if you then create a child instance in the Python REPL, you will see the following:
child_a = child()
child_a.default          # 'parent'
child_a._child__default  # 'child'
child_a._parent__default # 'parent'

child_b = child("orphan")
## this will show
child_b.default          # 'orphan'
child_b._child__default  # 'child'
child_b._parent__default # 'orphan'
This may be obvious to some, but it caught me off guard in a much more complex
environment
aptro , 2015-02-07 17:57:10
"Private" instance variables that cannot be accessed except from inside an object don't exist
in Python. However, there is a convention that is followed by most Python code: a name
prefixed with an underscore (e.g. _spam) should be treated as a non-public part of the API
(whether it is a function, a method or a data member). It should be considered an
implementation detail and subject to change without notice.
Great answers, and all are correct. I have provided a simple example along with a simple
definition/meaning.
Meaning:
some_variable --► it's public, anyone can see this.
_some_variable --► it's public, anyone can see this, but it's a convention to indicate
private... warning: no enforcement is done by Python.
__some_variable --► Python replaces the variable name with _classname__some_variable
(AKA name mangling), which reduces/hides its visibility and makes it behave more like a
private variable.
""Private" instance variables that cannot be accessed except from inside an object don't
exist in Python"
The example:
class A():
    here = "abc"
    _here = "_abc"
    __here = "__abc"

aObject = A()
print(aObject.here)
print(aObject._here)
# this will fail if uncommented: outside the class the name is not mangled,
# and the attribute is actually stored as _A__here
#print(aObject.__here)
2 revs , 2017-11-04 17:51:49
Getting the facts of _ and __ is pretty easy; the other answers express them pretty well. The
usage is much harder to determine.
This is how I see it:
_
Should be used to indicate that a function is not for public use, as for example an API.
This and the import restriction make it behave much like internal in C#.
__
Should be used to avoid name collisions in the inheritance hierarchy and to avoid late
binding. Much like private in C#.
==>
If you want to indicate that something is not for public use, but it should act like
protected , use _ . If you want to indicate that something is not for
public use, but it should act like private , use __ .
This is also a quote that I like very much:
The problem is that the author of a class may legitimately think "this attribute/method
name should be private, only accessible from within this class definition" and use the
__private convention. But later on, a user of that class may make a subclass that
legitimately needs access to that name. So either the superclass has to be modified (which
may be difficult or impossible), or the subclass code has to use manually mangled names
(which is ugly and fragile at best).
But the problem with that, in my opinion, is that if there is no IDE that warns you when you
override methods, finding the error might take you a while if you have accidentally
overridden a method from a base class.
I'm looking for advice on Perl best practices. I wrote a script which had a complicated
regular expression:
my $regex = qr/complicated/;
# ...
sub foo {
# ...
if (/$regex/)
# ...
}
where foo is a function which is called often, and $regex is not
used outside that function. What is the best way to handle situations like this? I only want
it to be interpreted once, since it's long and complicated. But it seems a bit questionable
to have it in global scope since it's only used in that sub. Is there a reasonable way to
declare it static?
A similar issue arises with another possibly-unjustified global. It reads in the current
date and time and formats it appropriately. This is also used many times, and again only in
one function. But in this case it's even more important that it not be re-initialized, since
I want all instances of the date-time to be the same from a given invocation of the script,
even if the minutes roll over during execution.
At the moment I have something like
my ($regex, $DT);
sub driver {
$regex = qr/complicated/;
$DT = dateTime();
# ...
}
# ...
driver();
which at least slightly segregates it. But perhaps there are better ways.
Again: I'm looking for the right way to do this, in terms of following best practices and
Perl idioms. Performance is nice but readability and other needs take priority if I can't
have everything.
hobbs ,
If you're using perl 5.10+, use a state variable.
use feature 'state';
# use 5.010; also works
sub womble {
state $foo = something_expensive();
return $foo ** 2;
}
will only call something_expensive once.
If you need to work with older perls, then use a lexical variable in an outer scope with
an extra pair of braces:
{
    my $foo = something_expensive();
    sub womble {
        return $foo ** 2;
    }
}
this keeps $foo from leaking to anyone except for womble .
ikegami , 2012-05-31 21:14:04
Is there any interpolation in the pattern? If not, the pattern will only be compiled once no
matter how many times the qr// is executed.
use feature qw( state );
sub foo {
state $re = qr/.../;
...
/$re/
...
}
Alan Rocker , 2014-07-02 16:25:27
Regexes can be specified with the /o modifier, which says "compile pattern once only"; see
the 3rd edition of the Camel, p. 147.
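A minimal sketch of the /o form (the pattern and names here are invented; note that with /o
the interpolated pattern is compiled only on the first match, so later changes to $pattern
are silently ignored):
my $pattern = 'complicated';

sub matches {
    my ($line) = @_;
    # /o: interpolate $pattern and compile the regex only once
    return $line =~ /$pattern/o;
}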
zoul ,
There's a state
keyword that might be a good fit for this situation:
sub foo {
    state $regex = qr/.../;
    ...
}
TrueY , 2015-01-23 10:14:12
I would like to complete ikegami's great answer with a few more words about defining such
variables in pre-5.10 perl.
Let's look at a simple example:
#!/bin/env perl
use strict;
use warnings;

{ # local
    my $local = "After Crying";
    sub show { print $local,"\n"; }
} # local

sub show2;

show;
show2;
exit;

{ # local
    my $local = "Solaris";
    sub show2 { print $local,"\n"; }
} # local
One would expect that both subs print their local variable, but this is not true!
Output:
After Crying
Use of uninitialized value $local in print at ./x.pl line 20.
The reason is that show2 is parsed, but the initialization of its local
variable is never executed! (Of course, if the exit is removed and a show2 call
is added at the end, Solaris will be printed as the third line.)
This can be fixed easily:
{ # local
    my $local;
    BEGIN { $local = "Solaris"; }
    sub show2 { print $local,"\n"; }
} # local
The canonical way to strip end-of-line (EOL) characters is to use the string rstrip() method
removing any trailing \r or \n. Here are examples for Mac, Windows, and Unix EOL characters.
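The examples themselves are not reproduced here; a minimal sketch of what they look like (the
sample strings are invented) is:
>>> 'Mac EOL\r'.rstrip('\r\n')
'Mac EOL'
>>> 'Windows EOL\r\n'.rstrip('\r\n')
'Windows EOL'
>>> 'Unix EOL\n'.rstrip('\r\n')
'Unix EOL'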
Using '\r\n' as the parameter to rstrip means that it will strip out any trailing
combination of '\r' or '\n'. That's why it works in all three cases above.
This nuance matters in rare cases. For example, I once had to process a text file which
contained an HL7 message. The HL7 standard requires a trailing '\r' as its EOL character. The
Windows machine on which I was using this message had appended its own '\r\n' EOL character.
Therefore, the end of each line looked like '\r\r\n'. Using rstrip('\r\n') would have taken
off the entire '\r\r\n' which is not what I wanted. In that case, I simply sliced off the
last two characters instead.
Note that unlike Perl's chomp function, this will strip all specified
characters at the end of the string, not just one:
>>> "Hello\n\n\n".rstrip("\n")
"Hello"
, 2008-11-28 17:31:34
Note that rstrip doesn't act exactly like Perl's chomp() because it doesn't modify the
string. That is, in Perl:
$x="a\n";
chomp $x
results in $x being "a" .
but in Python:
x="a\n"
x.rstrip()
will mean that the value of x is still "a\n" . Even
x=x.rstrip() doesn't always give the same result, as it strips all whitespace
from the end of the string, not just one newline at most.
Jamie ,
I might use something like this:
import os
s = s.rstrip(os.linesep)
I think the problem with rstrip("\n") is that you'll probably want to make
sure the line separator is portable. (some antiquated systems are rumored to use
"\r\n" ). The other gotcha is that rstrip will strip out repeated
whitespace. Hopefully os.linesep will contain the right characters. the above
works for me.
kiriloff , 2013-05-13 16:41:22
You may use line = line.rstrip('\n') . This will strip all newlines from the end
of the string, not just one.
slec , 2015-03-09 08:02:55
s = s.rstrip()
will remove all newlines at the end of the string s . The assignment is
needed because rstrip returns a new string instead of modifying the original
string.
Alien Life Form ,
This would replicate exactly perl's chomp (minus behavior on arrays) for "\n" line
terminator:
def chomp(x):
    if x.endswith("\r\n"): return x[:-2]
    if x.endswith("\n") or x.endswith("\r"): return x[:-1]
    return x
(Note: it does not modify string 'in place'; it does not strip extra trailing whitespace;
takes \r\n in account)
Careful with "foo".rstrip(os.linesep) : that will only chomp the newline
characters for the platform where your Python is being executed. Imagine you're chomping the
lines of a Windows file under Linux, for instance:
$ python
Python 2.7.1 (r271:86832, Mar 18 2011, 09:09:48)
[GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> sys.platform
'linux2'
>>> "foo\r\n".rstrip(os.linesep)
'foo\r'
>>>
Use "foo".rstrip("\r\n") instead, as Mike says above.
Perl's chomp function removes one linebreak sequence from the end of a string
only if it's actually there.
Here is how I plan to do that in Python, if process is conceptually the
function that I need in order to do something useful to each line from this file:
import os

sep_pos = -len(os.linesep)
with open("file.txt") as f:
    for line in f:
        if line[sep_pos:] == os.linesep:
            line = line[:sep_pos]
        process(line)
I don't program in Python, but I came across an
FAQ at python.org advocating S.rstrip("\r\n") for python 2.2 or later.
, 2014-01-20 19:07:03
import re
r_unwanted = re.compile("[\n\t\r]")
r_unwanted.sub("", your_text)
Leozj ,
If your question is to clean up all the line breaks in a multiple line str object (oldstr),
you can split it into a list according to the delimiter '\n' and then join this list into a
new str(newstr).
newstr = "".join(oldstr.split('\n'))
kuzzooroo ,
I find it convenient to be able to get the chomped lines via an iterator, parallel to
the way you can get the un-chomped lines from a file object. You can do so with the following
code:
with open("file.txt") as infile:
    for line in chomped_lines(infile):
        process(line)
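The chomped_lines helper itself is not shown in this excerpt; a minimal sketch of a generator
that fits the usage above (assuming each line read from the file carries at most one trailing
newline sequence) is:
def chomped_lines(it):
    # Yield each line with its trailing newline sequence removed.
    # rstrip('\r\n') strips any trailing mix of CR/LF characters, which for
    # lines coming from a file amounts to removing the single line ending.
    for line in it:
        yield line.rstrip('\r\n')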
Chij , 2011-11-30 14:04:19
workaround solution for special case:
if the newline character is the last character (as is the case with most file inputs),
then for any element in the collection you can index as follows:
foobar= foobar[:-1]
to slice out your newline character.
user3780389 , 2017-04-26 17:58:16
It looks like there is not a perfect analog for perl's chomp . In particular, rstrip cannot handle
multi-character newline delimiters like \r\n . However, splitlines does
as pointed out here
. Following my
answer on a different question, you can combine join and splitlines to
remove/replace all newlines from a string s :
''.join(s.splitlines())
The following removes exactly one trailing newline (as chomp would, I believe).
Passing True as the keepends argument to splitlines retains the
delimiters. Then, splitlines is called again to remove the delimiters on just the last
"line":
def chomp(s):
    if len(s):
        lines = s.splitlines(True)
        last = lines.pop()
        return ''.join(lines + last.splitlines())
    else:
        return ''
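A quick usage sketch of that chomp (interactive output shown for illustration):
>>> chomp('one line\r\n')
'one line'
>>> chomp('a\nb\n')
'a\nb'
>>> chomp('no newline')
'no newline'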
Taylor Edmiston ,
I'm bubbling up my regular expression based answer from one I posted earlier in the comments
of another answer. I think using re is a clearer more explicit solution to this
problem than str.rstrip .
>>> import re
If you want to remove one or more trailing newline chars:
>>> re.sub(r'[\n\r]+$', '', '\nx\r\n')
'\nx'
If you want to remove newline chars everywhere (not just trailing):
>>> re.sub(r'[\n\r]+', '', '\nx\r\n')
'x'
If you want to remove only 1-2 trailing newline chars (i.e., \r ,
\n , \r\n , \n\r , \r\r ,
\n\n )
I have a feeling what most people really want here, is to remove just one
occurrence of a trailing newline character, either \r\n or \n and
nothing more.
(By the way this is not what '...'.rstrip('\n', '').rstrip('\r', '')
does which may not be clear to others stumbling upon this thread. str.rstrip
strips as many of the trailing characters as possible, so a string like
foo\n\n\n would result in a false positive of foo whereas you may
have wanted to preserve the other newlines after stripping a single trailing one.)
Help me , 2016-05-20 12:29:21
Just use :
line = line.rstrip("\n")
or
line = line.strip("\n")
You don't need any of this complicated stuff
, 2016-11-22 18:30:37
>>> ' spacious '.rstrip()
' spacious'
>>> "AABAA".rstrip("A")
'AAB'
>>> "ABBA".rstrip("AB") # both AB and BA are stripped
''
>>> "ABCABBA".rstrip("AB")
'ABC'
internetional , 2016-11-22 20:17:58
There are three types of line endings that we normally encounter: \n ,
\r and \r\n . A rather simple regular expression in re.sub ,
namely r"\r?\n?$" , is able to catch them all.
(And we gotta catch 'em all , am I right?)
import re
re.sub(r"\r?\n?$", "", the_text, 1)
With the last argument, we limit the number of occurrences replaced to one, mimicking chomp
to some extent. Example:
import re
text_1 = "hellothere\n\n\n"
text_2 = "hellothere\n\n\r"
text_3 = "hellothere\n\n\r\n"
a = re.sub(r"\r?\n?$", "", text_1, 1)
b = re.sub(r"\r?\n?$", "", text_2, 1)
c = re.sub(r"\r?\n?$", "", text_3, 1)
... where a == b == c is True .
Venfah Nazir , 2018-06-15 07:24:21
This will work for both Windows and Linux (a bit expensive with re.sub if you are looking
for a regex-only solution):
import re
if re.search("(\\r|)\\n$", line):
    line = re.sub("(\\r|)\\n$", "", line)
Stephen Miller ,
If you are concerned about speed (say you have a looong list of strings) and you know the
nature of the newline char, string slicing is actually faster than rstrip. A little test to
illustrate this:
import time

loops = 50000000

def method1(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in xrange(loops):
        out_sting = test_string[:-1]
    t1 = time.time()
    print('Method 1: ' + str(t1 - t0))

def method2(loops=loops):
    test_string = 'num\n'
    t0 = time.time()
    for num in xrange(loops):
        out_sting = test_string.rstrip()
    t1 = time.time()
    print('Method 2: ' + str(t1 - t0))

method1()
method2()
Output:
Method 1: 3.92700004578
Method 2: 6.73000001907
sim , 2019-10-22 07:43:27
s = '''Hello World \t\n\r\tHi There'''
# import the module string
import string
# use the translate method to remove all whitespace characters
s.translate({ord(c): None for c in string.whitespace})
>>'HelloWorldHiThere'
With regex
import re
s = ''' Hello World
\t\n\r\tHi '''
print(re.sub(r"\s+", "", s))  # \s matches all whitespace
>HelloWorldHi
Replace \n,\t,\r
s.replace('\n', '').replace('\t','').replace('\r','')
>' Hello World Hi '
With regex
s = '''Hello World \t\n\r\tHi There'''
regex = re.compile(r'[\n\r\t]')
regex.sub("", s)
>'Hello World Hi There'
with Join
s = '''Hello World \t\n\r\tHi There'''
' '.join(s.split())
>'Hello World Hi There'
DeepBlue , 2019-11-06 20:50:30
First split lines then join them by any separator you like.
x = ' '.join(x.splitlines())
should work like a charm.
user4178860 , 2014-10-24 18:34:12
A catch all:
line = line.rstrip('\r|\n')
Flimm , 2016-06-30 16:20:15
rstrip does not take regular expression.
"hi|||\n\n".rstrip("\r|\n") returns "hi" – Flimm Jun 30 '16 at 16:20
How can I create or use a global variable in a function?
If I create a global variable in one function, how can I use that global variable in
another function? Do I need to store the global variable in a local variable of the function
which needs its access?
Paul Stephenson , 2009-01-08 08:39:44
You can use a global variable in other functions by declaring it as global in
each function that assigns to it:
globvar = 0

def set_globvar_to_one():
    global globvar    # Needed to modify global copy of globvar
    globvar = 1

def print_globvar():
    print(globvar)    # No need for global declaration to read value of globvar

set_globvar_to_one()
print_globvar()       # Prints 1
I imagine the reason for it is that, since global variables are so dangerous, Python wants
to make sure that you really know that's what you're playing with by explicitly requiring the
global keyword.
See other answers if you want to share a global variable across modules.
Jeff Shannon , 2009-01-08 09:19:55
If I'm understanding your situation correctly, what you're seeing is the result of how Python
handles local (function) and global (module) namespaces.
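For illustration, a minimal sketch of the kind of code being described (the names follow the explanation below; this is not the answer's original listing):
myGlobal = 5

def func1():
    myGlobal = 42    # assigns to a new local variable; the global is untouched

def func2():
    print(myGlobal)

func1()
func2()    # prints 5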
You might expect this to print 42, but instead it prints 5. As has already been
mentioned, if you add a ' global ' declaration to func1() , then
func2() will print 42:
def func1():
    global myGlobal
    myGlobal = 42
What's going on here is that Python assumes that any name that is assigned to ,
anywhere within a function, is local to that function unless explicitly told otherwise. If it
is only reading from a name, and the name doesn't exist locally, it will try to look
up the name in any containing scopes (e.g. the module's global scope).
When you assign 42 to the name myGlobal , therefore, Python creates a local
variable that shadows the global variable of the same name. That local goes out of scope and
is garbage-collected
when func1() returns; meanwhile, func2() can never see anything
other than the (unmodified) global name. Note that this namespace decision happens at compile
time, not at runtime -- if you were to read the value of myGlobal inside
func1() before you assign to it, you'd get an UnboundLocalError ,
because Python has already decided that it must be a local variable but it has not had any
value associated with it yet. But by using the ' global ' statement, you tell
Python that it should look elsewhere for the name instead of assigning to it locally.
(I believe that this behavior originated largely through an optimization of local
namespaces -- without this behavior, Python's VM would need to perform at least three name
lookups each time a new name is assigned to inside a function (to ensure that the name didn't
already exist at module/builtin level), which would significantly slow down a very common
operation.)
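A short sketch (assumed code, not from the original answer) of that compile-time decision in action:
myGlobal = 5

def func1():
    print(myGlobal)    # raises UnboundLocalError when called, because the
                       # assignment below makes myGlobal local to func1
    myGlobal = 42

func1()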
gimel , 2009-01-08 05:59:04
You may want to explore the notion of namespaces . In Python, the module is the natural place
for global data:
Each module has its own private symbol table, which is used as the global symbol table
by all functions defined in the module. Thus, the author of a module can use global
variables in the module without worrying about accidental clashes with a user's global
variables. On the other hand, if you know what you are doing you can touch a module's
global variables with the same notation used to refer to its functions,
modname.itemname .
The canonical way to share information across modules within a single program is to
create a special configuration module (often called config or cfg ). Just import the
configuration module in all modules of your application; the module then becomes available
as a global name. Because there is only one instance of each module, any changes made to
the module object get reflected everywhere. For example:
File: config.py
x = 0 # Default value of the 'x' configuration setting
File: mod.py
import config
config.x = 1
File: main.py
import config
import mod
print config.x
SingleNegationElimination ,
Python uses a simple heuristic to decide which scope it should load a variable from, between
local and global. If a variable name appears on the left hand side of an assignment, but is
not declared global, it is assumed to be local. If it does not appear on the left hand side
of an assignment, it is assumed to be global.
See how baz , which appears on the left side of an assignment in foo() , is
the only LOAD_FAST variable in the disassembly sketched below.
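The bytecode is easy to reproduce with the dis module; the function body below is an assumed reconstruction (names bar, baz, qux are illustrative), not the author's original:
import dis

def foo():
    global bar
    bar = 1        # declared global, so STORE_GLOBAL
    baz = 2        # assigned without a declaration, so local: STORE_FAST
    print(baz)     # ... and read back with LOAD_FAST
    print(qux)     # only ever read, so LOAD_GLOBAL

dis.dis(foo)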
J S , 2009-01-08 09:03:33
If you want to refer to a global variable in a function, you can use the global keyword to
declare which variables are global. You don't have to use it in all cases (as someone here
incorrectly claims) - if the name referenced in an expression cannot be found in local scope
or scopes in the functions in which this function is defined, it is looked up among global
variables.
However, if you assign to a new variable not declared as global in the function, it is
implicitly declared as local, and it can overshadow any existing global variable with the
same name.
Also, global variables are useful, contrary to some OOP zealots who claim otherwise -
especially for smaller scripts, where OOP is overkill.
Rauni Lillemets ,
In addition to already existing answers and to make this more confusing:
In Python, variables that are only referenced inside a function are implicitly global .
If a variable is assigned a new value anywhere within the function's body, it's assumed to
be a local , and you then need to explicitly declare it as 'global' if you want to rebind
the module-level name.
Though a bit surprising at first, a moment's consideration explains this. On one hand,
requiring global for assigned variables provides a bar against unintended side-effects. On
the other hand, if global was required for all global references, you'd be using global all
the time. You'd have to declare as global every reference to a built-in function or to a
component of an imported module. This clutter would defeat the usefulness of the global
declaration for identifying side-effects.
If I create a global variable in one function, how can I use that variable in another
function?
We can create a global with the following function:
def create_global_variable():
    global global_variable    # must declare it to be a global first
    # modifications are thus reflected on the module's global scope
    global_variable = 'Foo'
Writing a function does not actually run its code. So we call the
create_global_variable function:
>>> create_global_variable()
Using globals without modification
You can just use it, so long as you don't expect to change which object it points to:
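For example, with a small reader function (reconstructed here; only its name and output are implied by the calls further down):
def use_global_variable():
    return global_variable + '!!!'

>>> use_global_variable()
'Foo!!!'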
Modification of the global variable from inside a function
To point the global variable at a different object, you are required to use the global
keyword again:
def change_global_variable():
    global global_variable
    global_variable = 'Bar'
Note that after writing this function, the code actually changing it has still not
run:
>>> use_global_variable()
'Foo!!!'
So after calling the function:
>>> change_global_variable()
we can see that the global variable has been changed. The global_variable
name now points to 'Bar' :
>>> use_global_variable()
'Bar!!!'
Note that "global" in Python is not truly global - it's only global to the module level.
So it is only available to functions written in the modules in which it is global. Functions
remember the module in which they are written, so when they are exported into other modules,
they still look in the module in which they were created to find global
variables.
Local variables with the same name
If you create a local variable with the same name, it will overshadow a global
variable:
def use_local_with_same_name_as_global():
    # bad name for a local variable, though.
    global_variable = 'Baz'
    return global_variable + '!!!'
>>> use_local_with_same_name_as_global()
'Baz!!!'
But using that misnamed local variable does not change the global variable:
>>> use_global_variable()
'Bar!!!'
Note that you should avoid using local variables with the same names as globals unless
you know precisely what you are doing and have a very good reason to do so. I have not yet
encountered such a reason.
Bohdan , 2013-10-03 05:41:16
With parallel execution, global variables can cause unexpected results if you don't
understand what is happening. Here is an example of using a global variable within
multiprocessing. We can clearly see that each process works with its own copy of the
variable:
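A minimal sketch of such a demonstration (assumed code, not the original example):
import multiprocessing

counter = 0

def worker(name):
    global counter
    counter += 1
    print(name, 'sees counter =', counter)    # each process prints 1

if __name__ == '__main__':
    procs = [multiprocessing.Process(target=worker, args=('p%d' % i,)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print('parent still sees counter =', counter)    # prints 0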
You need to reference the global variable in every function you want to use.
As follows:
var = "test"
def printGlobalText():
global var #wWe are telling to explicitly use the global version
var = "global from printGlobalText fun."
print "var from printGlobalText: " + var
def printLocalText():
#We are NOT telling to explicitly use the global version, so we are creating a local variable
var = "local version from printLocalText fun"
print "var from printLocalText: " + var
printGlobalText()
printLocalText()
"""
Output Result:
var from printGlobalText: global from printGlobalText fun.
var from printLocalText: local version from printLocalText
[Finished in 0.1s]
"""
Kylotan , 2009-01-09 11:56:19
You're not actually storing the global in a local variable, just creating a local reference
to the same object that your original global reference refers to. Remember that pretty much
everything in Python is a name referring to an object, and nothing gets copied in usual
operation.
If you didn't have to explicitly specify when an identifier was to refer to a predefined
global, then you'd presumably have to explicitly specify when an identifier is a new local
variable instead (for example, with something like the 'var' command seen in JavaScript).
Since local variables are more common than global variables in any serious and non-trivial
system, Python's system makes more sense in most cases.
You could have a language which attempted to guess, using a global variable if it
existed or creating a local variable if it didn't. However, that would be very error-prone.
For example, importing another module could inadvertently introduce a global variable by that
name, changing the behaviour of your program.
Sagar Mehta ,
Try this:
def x1():
    global x
    x = 6

def x2():
    global x
    x = x + 1
    print x

x = 5
x1()
x2()    # output --> 7
Martin Thoma , 2017-04-07 18:52:13
In case you have a local variable with the same name, you might want to use the globals()
function .
globals()['your_global_var'] = 42
, 2015-10-24 15:46:18
Following on, and as an add-on: use a file to contain all global variables, all declared
locally, and then import it as follows:
File initval.py :
Stocksin = 300
Prices = []
File getstocks.py :
import initval as iv

def getmystocks():
    iv.Stocksin = getstockcount()

def getmycharts():
    for ic in range(iv.Stocksin):
        ...
Mike Lampton , 2016-01-07 20:41:19
Writing to explicit elements of a global array apparently does not need the global
declaration, though writing to it "wholesale" does have that requirement:
import numpy as np

hostValue = 3.14159
hostArray = np.array([2., 3.])
hostMatrix = np.array([[1.0, 0.0], [0.0, 1.0]])

def func1():
    global hostValue    # mandatory, else local.
    hostValue = 2.0

def func2():
    global hostValue    # mandatory, else UnboundLocalError.
    hostValue += 1.0

def func3():
    global hostArray    # mandatory, else local.
    hostArray = np.array([14., 15.])

def func4():    # no need for globals
    hostArray[0] = 123.4

def func5():    # no need for globals
    hostArray[1] += 1.0

def func6():    # no need for globals
    hostMatrix[1][1] = 12.

def func7():    # no need for globals
    hostMatrix[0][0] += 0.33

func1()
print "After func1(), hostValue = ", hostValue
func2()
print "After func2(), hostValue = ", hostValue
func3()
print "After func3(), hostArray = ", hostArray
func4()
print "After func4(), hostArray = ", hostArray
func5()
print "After func5(), hostArray = ", hostArray
func6()
print "After func6(), hostMatrix = \n", hostMatrix
func7()
print "After func7(), hostMatrix = \n", hostMatrix
Rafaël Dera ,
I'm adding this as I haven't seen it in any of the other answers and it might be useful for
someone struggling with something similar. The globals() function
returns a mutable global symbol dictionary where you can "magically" make data available for
the rest of your code. For example:
from pickle import load

def loaditem(name):
    with open(r"C:\pickle\file\location" + "\{}.dat".format(name), "rb") as openfile:
        globals()[name] = load(openfile)
    return True
and
from pickle import dump

def dumpfile(name):
    with open(name + ".dat", "wb") as outfile:
        dump(globals()[name], outfile)
    return True
Will just let you dump/load variables out of and into the global namespace. Super
convenient, no muss, no fuss. Pretty sure it's Python 3 only.
llewellyn falco , 2017-08-19 08:48:27
Reference the class namespace where you want the change to show up.
In this example, runner is using max from the file config. I want my test to change the
value of max when runner is using it.
main/config.py
max = 15000
main/runner.py
from main.config import max    # import the value so the test can patch runner.max

def check_threads():
    return max < thread_count
tests/runner_test.py
from main import runner # <----- 1. add file
from main.runner import check_threads
class RunnerTest(unittest.TestCase):
    def test_threads(self):
        runner.max = 0    # <----- 2. set global
        check_threads()
Use the reindent.py script that you find in the Tools/scripts/ directory of your Python installation:
Change Python (.py) files to use 4-space indents and no hard tab characters. Also trim excess spaces and tabs from ends
of lines, and remove empty lines at the end of files. Also ensure the last line ends with a newline.
Have a look at that script for detailed usage instructions.
I was wondering if there exists a sort of Python beautifier like the gnu-indent command line tool for C code. Of course indentation
is not the point in Python since it is programmer's responsibility but I wish to get my code written in a perfectly homogenous
way, taking care particularly of having always identical blank space between operands or after and before separators and between
blocks.
I am the one who asked the question. In fact, the tool closest to my needs seems to be
PythonTidy (it's a Python program of course: Python is best
served by itself ;) ).
I've been looking at passing arrays, or lists, as Python tends to call them, into a function.
I read something about using *args, such as:
def someFunc(*args):
    for x in args:
        print x
But not sure if this is right/wrong. Nothing seems to work as I want. I'm used to be able
to pass arrays into PHP function with ease and this is confusing me. It also seems I can't do
this:
def someFunc(*args, someString)
As it throws up an error.
I think I've just got myself completely confused and looking for someone to clear it up
for me.
You're telling it that you expect a variable number of arguments. If you want to pass in a
List (Array from other languages) you'd do something like this:
def someFunc(myList=[], *args):
    for x in myList:
        print x
Then you can call it with this:
items = [1,2,3,4,5]
someFunc(items)
You need to define named arguments before variable arguments, and variable arguments
before keyword arguments. You can also have this:
def someFunc(arg1, arg2, arg3, *args, **kwargs):
    for x in args:
        print x
Which requires at least three arguments, and supports variable numbers of other arguments
and keyword arguments.
Python lists (which are not just arrays because their size can be changed on the fly) are
normal Python objects and can be passed in to functions as any variable. The * syntax is used
for unpacking lists, which is probably not something you want to do now.
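For completeness, a quick sketch of what that unpacking does (illustrative names):
def add3(a, b, c):
    return a + b + c

nums = [1, 2, 3]
print(add3(*nums))    # the list is unpacked into three separate arguments -> prints 6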
,
You don't need to use the asterisk to accept a list.
Simply give the argument a name in the definition, and pass in a list like
def takes_list(a_list):
    for item in a_list:
        print item
# Note: this first one does not work in Python 3
print >> sys.stderr, "spam"
sys.stderr.write("spam\n")
os.write(2, b"spam\n")
from __future__ import print_function
print("spam", file=sys.stderr)
That seems to contradict zen of Python #13† , so what's the
difference here and are there any advantages or disadvantages to one way or the other? Which
way should be used?
†There should be one -- and preferably only one -- obvious way to
do it.
Is my choice, just more readable and saying exactly what you intend to do and portable
across versions.
Edit: being 'pythonic' is a third thought to me over readability and performance... with
these two things in mind, with python 80% of your code will be pythonic. list comprehension
being the 'big thing' that isn't used as often (readability).
For Python 2 my choice is: print >> sys.stderr, 'spam' , because you can
simply print lists/dicts etc. without converting them to strings: print >> sys.stderr,
{'spam': 'spam'} instead of sys.stderr.write(str({'spam': 'spam'})) .
Nobody's mentioned logging yet, but logging was created specifically to
communicate error messages. By default it is set up to write to stderr. This script:
# foo.py
import logging
logging.basicConfig(format='%(message)s')
logging.warning('I print to stderr by default')
logging.info('For this you must change the level and add a handler.')
print('hello world')
has the following result when run on the command line:
$ python3 foo.py > bar.txt
I print to stderr by default
(and bar.txt contains the 'hello world')
(Note, logging.warn has been deprecated , use
logging.warning instead)
EDIT In hind-sight, I think the potential confusion with changing sys.stderr and not seeing
the behaviour updated makes this answer not as good as just using a simple function as others
have pointed out.
Using partial only saves you 1 line of code. The potential confusion is not worth saving 1
line of code.
original
To make it even easier, here's a version that uses 'partial', which is a big help in
wrapping functions.
from __future__ import print_function
import sys
from functools import partial
error = partial(print, file=sys.stderr)
# over-ride stderr to prove that this function works.
class NullDevice():
    def write(self, s):
        pass
sys.stderr = NullDevice()
# we must import print error AFTER we've removed the null device because
# it has been assigned and will not be re-evaluated.
# assume error function is in print_error.py
from print_error import error
# no message should be printed
error("You won't see this error!")
The downside to this is partial assigns the value of sys.stderr to the wrapped function at
the time of creation. Which means, if you redirect stderr later it won't affect this
function. If you plan to redirect stderr, then use the **kwargs method mentioned by
aaguirre on this
page.
As stated in the other answers, print offers a pretty interface that is often more
convenient (e.g. for printing debug information), while write is faster and can also
be more convenient when you have to format the output exactly in certain way. I would
consider maintainability as well:
You may later decide to switch between stdout/stderr and a regular file.
print() syntax has changed in Python 3, so if you need to support both versions,
write() might be better.
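A small sketch of that formatting difference, using the Python 3 print function (the values are illustrative):
import sys

# print converts its arguments, inserts separators and appends a newline:
print("error code:", 42, file=sys.stderr)

# write takes exactly one string, so the formatting is done by hand:
sys.stderr.write("error code: %d\n" % 42)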
The answer to the question is: there are different ways to print to stderr in Python, but the
choice depends on 1) which Python version we are using and 2) what exact output we want.
The difference between print and stderr's write function: stderr (standard error)
is a pipe that is built into every UNIX/Linux system; when your program crashes and prints out
debugging information (like a traceback in Python), it goes to the stderr pipe.
print : print is a wrapper that formats the inputs (adding a space between
arguments and a newline at the end) and then calls the write function of a given object.
The given object by default is sys.stdout, but we can pass a file, i.e. we can print the input
to a file as well.
Python2: If we are using python2 then
>>> import sys
>>> print "hi"
hi
>>> print("hi")
hi
>>> print >> sys.stderr.write("hi")
hi
The Python 2 trailing comma has in Python 3 become a parameter, so if we use trailing commas
to avoid the newline after a print, this will in Python 3 look like print('Text to print',
end=' ') , which is a syntax error under Python 2.
Under Python 2.6 there is a __future__ import to make print into a function. So to avoid any
syntax errors and other differences, we should start any file where we use print() with from
__future__ import print_function . The __future__ import only works under Python 2.6 and later,
so for Python 2.5 and earlier you have two options: you can either convert the more complex
print to something simpler, or you can use a separate print function that works under both
Python 2 and Python 3.
>>> from __future__ import print_function
>>>
>>> def printex(*args, **kwargs):
... print(*args, file=sys.stderr, **kwargs)
...
>>> printex("hii")
hii
>>>
Note that sys.stderr.write() or sys.stdout.write() (stdout, standard output, is a pipe
that is built into every UNIX/Linux system) is not a replacement for print, but we can use
it as an alternative in some cases. print is a wrapper which wraps the input with a space
between arguments and a newline at the end and uses the write function to write. This is
the reason sys.stderr.write() is faster.
Note: we can also trace and debug using logging:
#test.py
import logging
logging.info('This is the existing protocol.')
FORMAT = "%(asctime)-15s %(clientip)s %(user)-8s %(message)s"
logging.basicConfig(format=FORMAT)
d = {'clientip': '192.168.0.1', 'user': 'fbloggs'}
logging.warning("Protocol problem: %s", "connection reset", extra=d)
oct EXPR
oct — interprets EXPR as an octal string and returns the corresponding value. (If EXPR
happens to start off with "0x", interprets it as a hex string. If EXPR starts off with
"0b", it is interpreted as a binary string. Leading whitespace is ignored in all three
cases.)
I'm trying to match nested {} brackets with a regular expression in Perl so
that I can extract certain pieces of text from a file. This is what I have currently:
my @matches = $str =~ /\{(?:\{.*\}|[^\{])*\}|\w+/sg;
At certain times this works as expected. For instance, if $str = "abc {{xyz} abc}
{xyz}" I obtain:
abc
{{xyz} abc}
{xyz}
as expected. But for other input strings it does not function as expected. For example, if
$str = "{abc} {{xyz}} abc" , the output is:
{abc} {{xyz}}
abc
which is not what I expected. I would have wanted {abc} and
{{xyz}} to be on separate lines, since each is balanced on its own in terms of
brackets. Is there an issue with my regular expression? If so, how would I go about fixing
it?
The problem of matching balanced and nested delimiters is covered in perlfaq5
and I'll leave it to them to cover all the options including (?PARNO) and Regexp::Common .
But matching balanced items is tricky and prone to error; unless you really want to learn
and maintain advanced regexes, leave it to a module. Fortunately there is Text::Balanced to
handle this and so very much more. It is the Swiss Army Chainsaw of balanced text matching.
use v5.10;
use strict;
use warnings;
use Text::Balanced qw(extract_multiple extract_bracketed);
my @strings = ("abc {{xyz} abc} {xyz}", "{abc} {{xyz}} abc");

for my $string (@strings) {
    say "Extracting from $string";

    # Extract all the fields, rather than one at a time.
    my @fields = extract_multiple(
        $string,
        [
            # Extract {...}
            sub { extract_bracketed($_[0], '{}') },
            # Also extract any other non whitespace
            qr/\S+/
        ],
        # Return all the fields
        undef,
        # Throw out anything which does not match
        1
    );

    say join "\n", @fields;
    print "\n";
}
To match nested brackets with just one pair at each level of nesting,
but any number of levels, e.g. {1{2{3}}} , you could use
/\{[^}]*[^{]*\}|\w+/g
To match when there may be multiple pairs at any level of nesting, e.g.
{1{2}{2}{2}} , you could use
/(?>\{(?:[^{}]*|(?R))*\})|\w+/g
The (?R) is used to match the whole pattern recursively.
To match the text contained within a pair of brackets, the engine must match
(?:[^{}]*|(?R))* ,
i.e. either [^{}]* or (?R) , zero or more times (the trailing * ).
So in e.g. "{abc {def}}" , after the opening "{" is matched, the
[^{}]* will match the "abc " and the (?R) will match
the "{def}" , then the closing "}" will be matched.
The "{def}" is matched because (?R) is simply short for the
whole pattern (?>\{(?:[^{}]*|(?R))*\})|\w+ , which as we have just seen will match a
"{" followed by text matching [^{}]* , followed by "}"
.
Atomic grouping (?> ... ) is used to prevent the regex engine
backtracking into bracketed text once it has been matched. This is important to ensure the
regex will fail fast if it cannot find a match.
Wow. What a bunch of complicated answers to something that simple.
The problem you're having is that you're matching in greedy mode. That is, you are asking
the regex engine to match as much as possible while keeping the expression true.
To avoid a greedy match, just add a '?' after your quantifier. That makes the match as short
as possible.
So, I changed your expression from:
my @matches = $str =~ /\{(?:\{.*\}|[^\{])*\}|\w+/sg;
To:
my @matches = $str =~ /\{(?:\{.*?\}|[^\{])*?\}|\w+/sg;
Demo (This is in PCRE. The
behavior is slightly different from Perl when it comes to recursive regex, but I think it
should produce the same result for this case).
After some struggle (I am not familiar with Perl!), this is the demo on ideone . $& refers to the string matched
by the whole regex.
my $str = "abc {{xyz} abc} {xyz} {abc} {{xyz}} abc";
while ($str =~ /(\{(?:(?1)|[^{}]*+)++\})|[^{}\s]++/g) {
print "$&\n"
}
Note that this solution assumes that the input is valid. It will behave rather randomly on
invalid input. It can be modified slightly to halt when invalid input is encountered. For
that, I need more details on the input format (preferably as a grammar), such as whether
abc{xyz}asd is considered valid input or not.
I second ysth's suggestion to use the Text::Balanced module. A few
lines will get you on your way.
use strict;
use warnings;
use Text::Balanced qw/extract_multiple extract_bracketed/;
my $file;
open my $fileHandle, '<', 'file.txt';
{
    local $/ = undef;    # or use File::Slurp
    $file = <$fileHandle>;
}
close $fileHandle;

my @array = extract_multiple(
    $file,
    [ sub { extract_bracketed($_[0], '{}') }, ],
    undef,
    1
);

print $_, "\n" foreach @array;
I don't think pure regular expressions are what you want to use here (IMHO this might not
even be parsable using regex).
Instead, build a small parser, similar to what's shown here: http://www.perlmonks.org/?node_id=308039 (see
the answer by shotgunefx (Parson) on Nov 18, 2003 at 18:29 UTC)
UPDATE It seems it might be doable with a regex - I saw a reference to matching nested
parentheses in Mastering
Regular Expressions (that's available on Google Books and thus can be googled for if you
don't have the book - see Chapter 5, section "Matching balanced sets of parentheses")
You're much better off using a state machine than a regex for this type of parsing.
> ,
Regular expressions are actually pretty bad for matching braces. Depending how deep you want
to go, you could write a full grammar (which is a lot easier than it sounds!) for Parse::RecDescent . Or, if you
just want to get the blocks, search through for opening '{' marks and closing '}', and just
keep count of how many are open at any given time.
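The counting approach is language-independent; a minimal sketch in Python (illustrative, not from the answer above) of pulling out top-level blocks that way:
def blocks(text):
    # yield top-level {...} blocks by counting how many braces are open
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == '{':
            if depth == 0:
                start = i
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]

print(list(blocks("{abc} {{xyz}} abc")))    # ['{abc}', '{{xyz}}']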
1) Created ~/.perldb , which did not exist previously.
2) Added &parse_options("HistFile=$ENV{HOME}/.perldb.hist"); from mirod's
answer.
3) Added export PERLDB_OPTS=HistFile=$HOME/.perldb.history to ~/.bashrc from
mephinet's answer.
4) Ran source .bashrc
5) Ran perl -d my program.pl , and got this warning/error
perldb: Must not source insecure rcfile /home/ics/.perldb.
You or the superuser must be the owner, and it must not
be writable by anyone but its owner.
6) I protected ~/.perldb with owner rw chmod 700 ~/.perldb , and
the error went away.
I want to convert the above into a string with each of the words in double quotes.
You can use the following regex -
>>> line="a sentence with a few words"
>>> import re
>>> re.sub(r'(\w+)',r'"\1"',line)
'"a" "sentence" "with" "a" "few" "words"'
This would take into consideration punctuations, etc as well (if that is really what you
wanted) -
>>> line="a sentence with a few words. And, lots of punctuations!"
>>> re.sub(r'(\w+)',r'"\1"',line)
'"a" "sentence" "with" "a" "few" "words". "And", "lots" "of" "punctuations"!'
Or you can do something simpler (more implementation work, but easier for beginners): search
for each space in the sentence, slice out whatever is between the spaces, add " before and
after it, and then print it.
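A sketch of that simpler approach:
line = "a sentence with a few words"
quoted = ' '.join('"%s"' % word for word in line.split())
print(quoted)    # "a" "sentence" "with" "a" "few" "words"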
Joshua Day ,
Currently developing reporting and testing tools for linux
Updated Apr 26 · Author has 83 answers and 71k answer views
There are several reasons, and I'll try to name a few.
Perl syntax and semantics closely resembles shell languages that are part of core Unix
systems like sed, awk, and bash. Of these languages at least bash knowledge is required to
administer a Unix system anyway.
Perl was designed to replace or improve the shell languages in Unix/linux by combining
all their best features into a single language whereby an administrator can write a complex
script with a single language instead of 3 languages. It was essentially designed for
Unix/linux system administration.
Perl regular expressions (text manipulation) were modeled off of sed and then drastically
improved upon to the extent that subsequent languages like python have borrowed the syntax
because of just how powerful it is. This is infinitely powerful on a unix system because the
entire OS is controlled using textual data and files. No other language ever devised has
implemented regular expressions as gracefully as perl and that includes the beloved python.
Only in perl is regex integrated with such natural syntax.
Perl typically comes preinstalled on Unix and linux systems and is practically considered
part of the collection of softwares that define such a system.
Thousands of apps written for Unix and linux utilize the unique properties of this
language to accomplish any number of tasks. A Unix/linux sysadmin must be somewhat familiar
with perl to be effective at all. To remove the language would take considerable effort for
most systems to the extent that it's not practical.. Therefore with regard to this
environment Perl will remain for years to come.
Perl's module archive called CPAN already contains a massive quantity of modules geared
directly for unix systems. If you use Perl for your administration tasks you can capitalize
on these modules. These are not newly written and untested modules. These libraries have been
controlling Unix systems for 20 years reliably and the pinnacle of stability in Unix systems
running across the world.
Perl is particularly good at gluing other software together. It can take the output of
one application and manipulate it into a format that is easily consumable by another, mostly
due to its simple text manipulation syntax. This has made Perl the number 1 glue language
in the world. There are millions of softwares around the world that are talking to each other
even though they were not designed to do so. This is in large part because of Perl. This
particular niche will probably decline as standardization of interchange formats and APIs
improves but it will never go away.
I hope this helps you understand why perl is so prominent for Unix administrators. These
features may not seem so obviously valuable on windows systems and the like. However on Unix
systems this language comes alive like no other.
Posted by EditorDavid on Saturday September 08, 2018 @03:34PM from the newer-kid-on-the-block dept.
InfoWorld described the move as a "breakthrough":
As expected, Python has climbed into the Top 3 of the
Tiobe index of language popularity,
achieving that milestone for the first time
ever in the September 2018 edition of the index. With a rating
of 7.653 percent,
Python placed third
behind first-place Java,
which had a rating of 17.436 percent, and second-place C, rated at 15.447. Python displaced C++, which finished
third last month and took fourth place this month, with a rating of 7.394 percent...
Python also has been scoring high in two other language rankings:
- The PyPL Popularity of Programming Language index, where it ranked No. 1 this month, as it has done before, and has had the most growth in the past five years.
- The RedMonk Programming Language Rankings, where Python again placed third.
Tiobe notes that Python's arrival in the top 3 "really took a long time," since it first entered their chart at
the beginning of the 1990s. But today, "It is already the first choice at universities (for all kinds of
subjects for which programming is demanded) and is now also conquering the industrial world." In February Tiobe
also added a new programming language to their index: SQL. (Since "SQL appears to be Turing complete.")
Never mind Python 2 vs 3; one major reason I shy away from Python is the incompatibility in
point releases. I'd see "requires Python 2.6" and see that I have Python 2.7, so it should be
fine, right? Nope, code written for 2.6 won't run under Python 2.7. It needs to be EXACTLY
2.6.
It's at this point that some Python fanboi gets really upset and starts screaming
about how that's no problem; with Python you set up separate virtual environments for each
script, so that each one can have exactly the version of Python it is written for, with
exactly the version of each library. When there is some bug or security issue you then hope
that there is a patch for each, and deal with all that. (As opposed to every other piece of
software in the world, which you simply upgrade to the latest version to get all the latest
fixes). Yes, you CAN deal with that problem, it's possible, in most cases. You shouldn't
have to. Every other language does some simple things to maintain backward compatibility in
point releases (and mostly in major releases too).
Also the fact that most languages use every day and have used for decades use braces for
blocks means my eyes and mind are very much trained for that. Braces aren't necessarily
BETTER than just using indentation, but it's kinda like building a car which uses the pedals
to steer and a hand stick for the gas. It's not necessarily inherently better or worse, but
it would be almost undriveable for an experienced driver with decades of muscle memory in
normal cars. Python's seemingly random choices on these things make it feel like using your
feet to steer a car. There should be compelling reasons before you break long-established
conventions and Python seems to prefer to break conventions just to be different. It seems
the Python team is a bit like Bernstein in that way. It's really annoying.
Yeah, just FYI Python 2.7 is in a way its own thing. Different from the 2.x and
different from the 3.x series. 2.6 is a no holds barred pure 2.x whereas 2.7 is a mixture
of 2.x and 3.x features. So if you want to compare point releases, best to try that with
the 3.x series. Also, if you're using something that requires the 2.x series, you
shouldn't use that unless it is absolutely critical with zero replacements.
You shouldn't have to. Every other language does some simple things to maintain
backward compatibility in point releases (and mostly in major releases too).
Again see argument about 3.x, but yeah not every language does this. Java 8/9
transition breaks things. ASP.Net to ASP.Net core breaks things along the way. I'm
interested in what languages you have in mind, because I know quite a few languages that
do maintain backwards compatibility (ish). For example, C++ pre and post namespace breaks
fstreams in programs, but compilers provide flags to override that, so it depends on what
you mean by breaking. Does it count if the compiler by default breaks, but providing
flags fixes it? Because if your definition means including flags breaks compatibility,
then oooh boy are there a shit ton of broken languages.
Also the fact that most languages use every day and have used for decades use braces
for blocks means my eyes and mind are very much trained for that
Yeah, it's clear that you've never used a positional programming language. I guess
it'll be a sign of my age, but buddy, program COBOL or RPG on punch cards and let me know
about that curly brace issue you're having. Positional and indentation styles have been used
way, way, way longer than curly braces. That's not me knocking on the curly braces, I love my
C/C++ folks out there! But I hate to tell you C and C-style is pretty recent in the
timeline of all things computer.
> C++ pre and post namespace breaks fstreams in programs, but compilers provide flags
to override that, so it depends on what you mean by breaking. Does it count if the
compiler by default breaks, but providing flags fixes it?
If it results in weird
runtime errors, that's definitely a problem.
If the compiler I'm using gives the message "incompatible use of fstream, try
'-fstreamcompat' flag", that's no big deal.
On a similar note, if something is marked deprecated long before it's removed, that
matters. Five years of compiler/interpreter warnings saying "deprecated use of
function in null context on line #47" gives plenty of opportunity to fix it. From
the bit of Python I've worked with, the recommended method on Friday completely
stops working on Monday.
That's plainly not true - Python follows the established
deprecate-first-remove-next cycle. This is readily obvious when you look at the
changelogs. For example, from the 2.6 changelog:
The threading module API is being changed to use properties such as daemon
instead of setDaemon() and isDaemon() methods, and some methods have been
renamed to use underscores instead of camel-case; for example, the
activeCount() method is renamed to active_count(). Both the 2.6 and 3.0
versions of the module support the same properties and renamed methods, but
don't remove the old methods. No date has been set for the deprecation of the
old APIs in Python 3.x; the old APIs won't be removed in any 2.x version.
For another example, the ability to throw strings (rather than
BaseException-derived objects) was deprecated in 2.3 (2003) and finally removed
in 2.6 (2008).
For comparison, in the C++ land, dynamic exception specifications were
deprecated in C++11, and removed in C++17. So the time scale is comparable.
That's great that they deprecate something on some occasions.
MY experience with the Python I run is that one version gives no warning,
going up one point release throws multiple fatal errors.
> This is readily
obvious when you look at the changelogs
Maybe that's the thing - one has to read the changelogs to see what is
deprecated, as opposed to getting a clear deprecation warning from the
interpreter/compiler like you would with C, Perl, and other languages?
It's possible that a Python expert might be able to
MY experience with the Python I run is that one version gives no
warning, going up one point release throws multiple fatal errors.
Can you give an example? I'm just not aware of any, and it makes me
suspect that what you were running into was an issue in a third-party
library (some of which do indeed have a cowboy attitude towards breaking
changes - but that's common across all languages).
Maybe that's the thing - one has read the changelogs to see what is
deprecated, as opposed to getting a clear deprecation warning from the
interpreter/compiler like you would with C, Perl, and other languages?
And I have never, ever seen a deprecation warning in C or C++. You have
to read the change sections for new standards to see what was deprecated
or removed.
> And I have never, ever seen a deprecation warning in C or C++. You
have to read the change sections for new standards to see what was
deprecated or removed.
The default with gcc is to warn about
deprecation.
You can turn the warnings off by setting the CFLAGS environment
variable to include -Wno-deprecated, which you can do in your
.bashrc or wherever. What's most often recommended is -Wall to
show all warnings of all types.
For example, C++ pre and post namespace breaks fstreams in programs, but compilers
provide flags to override that
Dude, that was in 1990, back before there even was a standard C++. And I very much
doubt those flags still exist today.
program COBOL or RPG on punch cards and let me know about that curly brace issue
you're having
You seem to have forgotten how that really worked in your old age though. Punch
cards had columns with specific functions assigned to them, so yes, of course you
would have to skip certain columns on occasion. That was not indentation, though. You
didn't have indentation; moving your holes by one position or one column meant the
machine would interpret your instruction as something else entirely.
Static typing isn't just about clarity to the programmer. In strict typing languages, the
rule is to use the type that matches the range that actually applies. This is to help
testing (something coders should not ignore), automated validation, compilation (a compiler
can choose sensible values, optimise the code, etc etc etc) and maintainers (a clear
variable name won't tell anyone if a variable's range can be extended without impacting the
compiled code).
Besides, I've looked at Python code. I'm not convinc
Type annotations and docstrings help with the whole lack of type declaration thing. Of
course that requires discipline, which is in short supply from my experience. If you can
force your developers to run pylint that will at least complain when they don't have
docstrings.
The list is compiled from a restricted pool and lists popularity.
That may mean a vendor
throwing out ten individually packaged Python scripts counts as ten sources, while one C
program of equal function counts as one. If that's the case, Python would be ten times as
popular in the stats whilst being equally popular in practice.
So if Python needed ten times as many modules to be as versatile, it would seem popular
whilst only being frustrating.
The fact is, we don't know their methodology. We don't know if they're weighting results
in any way to factor in workarounds and hacks due to language deficiency that might show up
as extra use.
We do know they don't factor in defect density, complexity or anything else like that as
they do say that much. So are ten published attempts at a working program worth ten times
one attempt in a language that makes it easy first time? We will never know.
I find java in an uncanny valley. Its still a few times slower than c++ for the sort of
stuff I do but it isn't enough quicker to develop than c++ to be worth that hit. Python is
far slower than java even using numpy but its so easy to develop in that it is worth the
gamble that it will be fast enough. And the rewrite in c++ will go quickly even if it isn't.
The title is because VBA is 11x faster than numpy at small dense matrices and almost as
easy to develop in.
Java is useful because you can throw a team of low-skill developers at it and they won't
mess things up beyond the point of unmaintainability. It will be a pain to maintain, sure,
but the same developers using C would make memory errors that push things beyond
hopeless, and if they were using Python or JavaScript the types would become more and
more jumbled as the size of the program increases, to the point that no one would be able
to understand it and things would start breaking more and more. Java enforces a minimal lev
Tiobe notes that Python's arrival in the top 3 "really took a long time," since it first
entered their chart at the beginning of the 1990s. But today, "It is already the first
choice at universities (for all kinds of subjects for which programming is demanded)
Undergraduate was all C/C++ for me then I ended up at a graduate school where everything was
Java. I disliked it so much that I decided to find an alternative and teach myself. I found
Python and loved it. I still love it. You can't find anything better for both heavy duty
programming and quick and dirty scripting. Its versatility makes it like the Linux of
programming languages.
I found Python and loved it. I still love it. You can't find anything better for both
heavy duty programming...
What? Python is hopelessly inefficient for heavy duty programming, unless you happen to
be doing something that is mainly handled by a Python library, written in C. Python's
interface to C is disgusting, so if you have a lot of small operations handled by a C
library, you will get pathetic performance.
It really isn't. There are some apps that actually need something faster and a lot of
apps that don't. It really doesn't help if a faster executable ends up waiting for I/O.
It really is
[debian.net] and you blathering about what you don't know does not
change that fact. (Python 14 minutes vs C++ 8..24 seconds for N-Body simulation.)
Well, since 99.99999999999% of all software run by literally everybody is an n-body
simulation....
That would be an example of the "some apps" I spoke of. I note
that Intel Fortran was at the top of the list (not surprising). Would ifort be your
first choice if you were writing a text editor or a tar file splitter? How about an
smtp daemon?
Well, since 99.99999999999% of all software run by literally everybody is an
n-body simulation..
Explaining the concept of "compute intensive" to you makes me feel more
stupid. Check out
any
of the compute intensive Python benchmarks.
Consider not waving your ignorance around quite so much.
Having actually built a cluster that was in the top 500 for a while, I am
well acquainted with compute intensive applications. I am also aware that
compute intensive is a subset of "heavy duty" programming which is a subset
of general programming.
Now, pull your head out of your ass and look
around, you might learn something. And while you're at it, consider working
on your social skills.
Either you understand that Python is crap for compute intensive work, or
you are lying about building a cluster. Or you just connected the cables,
more like it, and really don't have a clue about how to use it.
I do understand that python isn't the right choice for compute
intensive work. With the exception that if it is great for doing setup
for something in FORTRAN or C that does the heavy lifting.
I am
certain that YOU don't understand that compute intensive work is a
small fraction of what is done on computers. For example, I/O intensive
work doesn't really care if it is Python or FORTRAN that is waiting for
I/O to complete. There is a reason people don't drive a top fuel
dragster to work.
I am certain that YOU don't understand that compute intensive
work is a small fraction of what is done on computers.
First, you have no idea what I do or do not understand because
you find yourself way too entertained by your own blather, and
second, computers are used more for browsing than any other single
task these days, and wasteful use of the CPU translates into
perceptible lag. Playing media is very CPU intensive. You don't
write those things in Python because Python sucks for efficiency. My
point.
Yes, I had you figured, you're a sysadmin with delusions about
being a dev. Seen way too many of those. They tend to ta
As I said, "Python is hopelessly inefficient for heavy duty
programming". WTF are you blathering on about. Fresh air is good for
you, maybe get out of your basement more.
It really is
[debian.net] and you blathering about what you don't know does
not change that fact. (Python 14 minutes vs C++ 8..24 seconds for N-Body
simulation.)
I've just run it on my machine. C++: 2.3 seconds, Python: 22 seconds. That's for
straightforward mathy Python against C++ code with vector instrinsics. Concerning
C++ code without manual vectorization, it's 4 seconds against 22. Not terribly bad,
I'd say. Not to mention that this isn't the kind of code that would be typical for
a larger application.
The first python program I wrote was a test for a job interview.
It involved downloading meteorologic data from the internet.
Analyzing it, creating a kind of summary and using a graph plotting library
to display a graph (generate a *.png)
It would not have been noticeably
faster if I had written it in C++, because
... you know:
downloading via a network.
Then you looked at it without understanding it. Do you seriously
think you can out-optimize gcc's code generator? Do you even know
how to use LEA for arithmetic?
Is there a programming language out there, that is as fast as C++ or even C, has a proper
strict type system (no duck typing, nothing like Python or JS), fast garbage collection (no
fuckin' auto_pointer worst-of-both-worlds), is elegant and emergent (so very powerful for its
simplicity), and doesn't require an advanced degree in computer sciences to do simple things
(Hello Haskell!).
Of course with key libraries being available for it. (The equivalent of a standard library,
Vulcan, a simple GUI widget toolki
I really like the Qt framework. It's well done, well documented and well supported. Sure
it's C++ so it doesn't meet your need of finding a new language but the API is pretty clean
and simple so that you can avoid the complications and ugliness of C++ in most cases. If you
need to though, it's all right there so you don't give up the additional power if you need
it. The Python version is good too and very similar to the C++ version so it's not hard to
switch between languages as your needs change.
Is there a programming language out there, that is as fast as C++ or even C, has a
proper strict type system (no duck typing, nothing like Python or JS), fast garbage
collection
No.
Neither will there be. There's always a penalty for garbage collection.
Is there a programming language out there, that is as fast as C++ or even C, has a
proper strict type system (no duck typing, nothing like Python or JS), fast garbage
collection
No.
Neither will there be. There's always a penalty for garbage collection.
I think go is the closest to your requested feature list.
I think go is the closest to your requested feature list.
The GP, not mine.
And yuck, no thanks. Go just seems, well, deeply mediocre in many places. It's like
someone updated C, ignoring the last 40 years of language developments.
Sure I can program without generics, I'm at a loss to see why I'd want to though.
Its performance is good enough that I'm able
to drop C++ (I'm a mathematical modeller), it's amazing at multidimensional array
manipulation, and its typing system is really good. It just feels nice to program in. Bonus,
one of the inspirations was Lisp, so it's got good metaprogramming. Also it's free software,
made by people at MIT, so your conscience can remain appeased.
It's still a young language, but libraries are being built for it at an impressive rate,
and i
That advocated a language. Languages shift faster than sand on speed. Universities should teach
logic, reasoning, methodology, good practices and programming technique.
Languages should be
for the purpose of example only. Universities should teach programming, not Java, software
engineering, not Python. Java and Python should be in there, yes, along with Perl, C and Ada.
Syntax is just sugar over the semantics. Learn the semantics well and the syntax is irrelevant.
You want universities to teach kids how to
when Cobol and Fortran were the in thing. Last forever, they thought.
Any evidence most universities believed that? (They are still around and relatively
common, by the way.)
Universities have to pick something to program lesson projects in, and selecting
language(s) common in the current market helps student job prospects. I suggest STEM
students be required to learn at least one compiled/strong-typed language, and one
script/dynamic language.
My university (Manchester University, UK) certainly didn't pick a language.
We studied
many languages, compiler design, formal semantics and a boatload of other computer
science things but at no point did they try to teach me a programming language. In fact
at induction they said explicitly that they expected us all to know how to program before
we arrived.
(note: this is a UK perspective, other places may vary)
Universities have to work
with the students they can get.
I think you and your co-students were lucky to catch the height of the 80s
microcomputer boom, the time when computers booted into BASIC, when using a computer
pretty much meant programming it.
Then the windows PCs with their pre-canned applications and no obvious programming
language swept in. Learning to program now meant not just finding a suitable book, it
often meant buying the programming lang
Didn't you have projects that involved turning in your code to the teacher/graders?
The graders don't want to see every which language. Multi-lingual graders are more
expensive. Most colleges dictate a narrow set of languages for such projects.
Didn't you have projects that involved turning in your code to the
teacher/graders? The graders don't want to see every which language.
Multi-lingual graders are more expensive. Most colleges dictate a narrow set of
languages for such projects.
Yes, but it was hardly narrow. We had homework to hand in using a variety of
languages, depending on the course. Pascal tended to be used for general algorithm
stuff. But Smalltalk, Prolog, ML and other usual suspects were used when they made
sense for the course. You were supposed to leave with a CompSci degree where you
understood the theory of languages more than the details of specific languages.
Usually, for project work, you were free to choose your language and would be
expected to justify the reason
Tablizer ( 95088 ) writes:
That's a pretty big jump. Groovy is a well-thought-out language and nicely facilitates
writing clean, readable, compact code (especially compared to Java). However, it needs a better
framework than Grails (85% really good convention over configuration stuff but 15% convoluted
j2ee era framework stuff).
Can someone explain to me why using a dynamically typed language is a good idea for "big"
applications ?
Python is subject to all sorts of really horrendous bugs that would not happen
in a compiled, type-checked language.
For example if you are accessing an undefined variable in the else branch of an if
statement, you won't know it's undefined unless that branch is taken. which means if it's
something like a rarely occurring error condition it's kind of annoying. yes you can figure it
out by writing enough t
It's really simple: writing an application in Python is 3x quicker than writing it in
C/C++/Java, etc. That means you either get to market 3x faster or only need 1/3 the number
of programmers. Everything else is completely and utterly irrelevant. "You won't know it's
undefined unless that branch is taken" - the code linter built into your Python IDE will tell
you about it.
Hogwash. Even if 3x were true, dev is roughly 20-40% of overall software cost. Unless
you're arguing that every aspect of coding is reliably 3x faster in Python. Given the
value of strong typing when refactoring, I'd wager Python is not even competitive price-wise
past the proof-of-concept/one-off script scenario.
I am arguing that, because it's true. There isn't much benefit to strong typing when
refactoring, but the benefits of duck typing when it comes to unit testing are quite
significant. I've done commercial software development in strongly and weakly typed
languages before. The benefits of things like "strong typing" are generally not that
much if you are on board with the whole agile bandwagon and writing unit tests and
all that. You would be much better off with Python's significantly better unit testing
facilities than with strong typing.
I'm at least in the caravan trailing the agile/unit-test bandwagon, but those are
orthogonal to typing (and being explicit generally). Looking at a method signature
and knowing that it requires a decimal and enumeration of a given type is more than
a run-time check; it provides information about the intent, limitations, and
discoverability options. There are very real trade-offs for the speed and
flexibility of a language like Python, and my view is that it's more jet-fuel than
solar power.
If you are using a good IDE, "provides information about the intent, limitations,
and discoverability options" can all be found out with a couple of keystrokes
(git blame, find all usages, pylint, etc.). So putting that information into
the language explicitly is an obsolete and backwards way of going about things
:p The job of the compiler in Python has just been redistributed
elsewhere. It's different, but there are many ways to solve the same problems.
Six of one, half dozen of the other. The Python program will be smaller for the same
functionality, and it won't have buffer overflows and memory leaks. The C program will run
faster (unless it has to wait on I/O) and will check for variables used before assignment.
Using an undefined variable in Python triggers an exception, and you get a traceback. In a
larger program you will normally have a system for capturing and storing such tracebacks for
analysis, and with the traceback in hand, it's typically a very simple fix.
In C++ you get
an incorrect value created by default-initialisation (or maybe undefined behaviour): the
program hobbles along as best it can, and you may never find the problem. You just see your
program behaving strangely sometimes, and as the program gets larger, those strange
behaviours accumulate.
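A small sketch of the traceback-capture point above, using only the standard library; the
log file name and the failing function are illustrative.
import logging

logging.basicConfig(filename="app-errors.log", level=logging.ERROR)

def risky():
    return undefined_name  # NameError at runtime

try:
    risky()
except Exception:
    # logging.exception records the full traceback, so the stored record
    # points at the exact file and line to fix
    logging.exception("unhandled error in risky()")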
Python is subject to all sorts of really horrendous bugs that would not happen in a
compiled, type-checked language.
Horrendous is not the right word. Bugs that come with tracebacks are simple bugs. Zen#10:
"Errors should never pass silently" is exactly what you want in large-scale programming.
Writing big applications in Java/C++ takes too long. And then management decides to avoid
'custom code' in favor of 'standard' vendor tools where you can drag and drop to build parts
of the 'big' application. This applies to ETL, reporting, and messaging, to name a few. With
Python, the development cycle shortens and you can still stick to writing code instead of
dealing with vendor binaries, lock-ins, licensing, etc. Python with a strong emphasis on unit
tests, coupled with plugging in C/C++ where necessary, covers most of that ground.
I think what has really propelled Python into a higher rank is machine learning, where it is
simply the de facto language of choice by quite a margin.
I have to admit I am impressed with
the progress it has made; of the many recent CS grads I've talked to, it seemed to be the
favorite language...
I have to admit that over the years I've not really enjoyed Python much myself in the
on-and-off-again times I've used it; for me, the spaces as indent levels get a bit too close
to the meaningful whitespace of Fortran... I guess modern programmers do not have this hangup.
:-)
That's a great point, and to be honest that is probably a better language for learning
than Java... it would also explain why newer CS grads all like it more now.
The only
downside is that most jobs are still using Java or something besides Python... but
it probably means we'll see more Python used in businesses, I guess. That usually ends up
following eventually.
I have been sporadically using Python for some years already and never really liked it. Note
that most of my experience is focused on C-based and strongly typed programming languages.
Recently, I have been spending some time on a Python project and have come to appreciate its
(newbie) friendliness.
I still don't quite understand the reason for all the tabs/spaces problems, I consider it too
slow, I don't like the systematic need to rely on external resources, and I will certainly
continue to reach for other languages before it. But I do understand now why newbies, or those
performing relatively small developments, or those wanting to rely on some of the associated
resources (although I don't like being systematically forced to include external dependencies,
I do recognise that Python deals with these aspects quite gracefully and that there are many
interesting libraries) might prefer Python. It is one of the most intuitive programming
languages I have ever used, at least from what seems like a newcomer's perspective (e.g., the
same command performing what are intuitively seen as similar actions).
I don't think this should be about lines of code written. A more interesting approach would be
to also count all dependencies, counting things like libc a gazillion times. Even more
interesting would be to count what's actually executed.
"Or any of the games in my library, which all appear to be C or C++, with a few C#."
From
what I've seen a lot of the core engine stuff is C/C++; but a lot of the UI, AI, and "mod
support" stuff is commonly done in Python and Lua.
Personally, I disagree with semantic whitespace so I don't like Python. (I think it's the
editor's job to handle pretty formatting to reflect the structure, rather than the
programmer's job to define structure with pretty formatting.) But I can see why Python would
be a good learning language.
I sort of see it as the opposite... semantic
whitespace teaches mostly good habits, it's just fucking irritating to maintain, and to
work with snippets and code fragments, etc.
But it's highly readable, and pretty straightforward, and I don't see anything wrong
with it as a beginning/educational language; for teaching flow control, algorithms,
structured/modular programming, and so on.
Back in April 2010, Russ Cox charitably suggested that only fannkuch-redux, fasta,
k-nucleotide, mandelbrot, nbody, reverse-complement and spectral-norm were close to fair
comparisons. As someone who implemented programming languages, his interest was "measuring the
quality of the generated code when both compilers are presented with what amounts to the same
program."
Differences in approach -- to memory management, parallel programming, regex, arbitrary
precision arithmetic, implementation technique -- don't fit in that kind-of-fair comparison,
but we still have to deal with them.
These are only the fastest programs. There may be additional measurements for programs which
seem more like a fair comparison to you. Always look at the source code.
How do I change the value of a variable in the package used by a module so that subroutines
in that module can use it?
Here's my test case:
testmodule.pm:
package testmodule;
use strict;
use warnings;
require Exporter;
our ($VERSION, @ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS);
@ISA = qw(Exporter);
@EXPORT = qw(testsub);
my $greeting = "hello testmodule";
my $var2;
sub testsub {
    printf "__PACKAGE__: %s\n", __PACKAGE__;
    printf "\$main::greeting: %s\n", $main::greeting;
    printf "\$greeting: %s\n", $greeting;
    printf "\$testmodule::greeting: %s\n", $testmodule::greeting;
    printf "\$var2: %s\n", $var2;
} # End testsub
1;
testscript.pl:
#!/usr/bin/perl -w
use strict;
use warnings;
use testmodule;
our $greeting = "hello main";
my $var2 = "my var2 in testscript";
$testmodule::greeting = "hello testmodule from testscript";
$testmodule::var2 = "hello var2 from testscript";
testsub();
output:
Name "testmodule::var2" used only once: possible typo at ./testscript.pl line 11.
__PACKAGE__: testmodule
$main::greeting: hello main
$greeting: hello testmodule
$testmodule::greeting: hello testmodule from testscript
Use of uninitialized value $var2 in printf at testmodule.pm line 20.
$var2:
I expected $greeting and $testmodule::greeting to be the same
since the package of the subroutine is testmodule .
I guess this has something to do with the way use'd modules are eval'd as if in a BEGIN
block, but I'd like to understand it better.
I was hoping to set the value of the variable from the main script and use it in the
module's subroutine without using the fully-qualified name of the variable.
As you found out, when you use my , you are creating a locally scoped
non-package variable. To create a package variable, you use our and not
my :
my $foo = "this is a locally scoped, non-package variable";
our $bar = "This is a package variable that's visible in the entire package";
Even better:
{
    my $foo = "This variable is only available in this block";
    our $bar = "This variable is available in the whole package";
}
print "$foo\n"; #Whoops! Undefined variable
print "$bar\n"; #Bar is still defined even out of the block
When you don't put use strict in your program, all variables defined are
package variables. That's why when you don't put it, it works the way you think it should and
putting it in breaks your program.
However, as you can see in the following example, using our will solve your
dilemma:
File Local/Foo.pm
#!/usr/local/bin/perl
package Local::Foo;
use strict;
use warnings;
use feature qw(say);
use Exporter 'import';
our @EXPORT = qw(testme);
our $bar = "This is the package's bar value!";
sub testme {
    # $foo is a locally scoped, non-package variable. It's undefined and an error
    say qq(The value of \$main::foo is "$main::foo");
    # $bar is defined in package main::, and will print out
    say qq(The value of \$main::bar is "$main::bar");
    # These both refer to $Local::Foo::bar
    say qq(The value of \$Local::Foo::bar is "$Local::Foo::bar");
    say qq(The value of bar is "$bar");
}
1;
File test.pl
#!/usr/local/bin/perl
use strict;
use warnings;
use feature qw(say);
use Local::Foo;
my $foo = "This is foo";
our $bar = "This is bar";
testme;
say "";
$Local::Foo::bar = "This is the NEW value for the package's bar";
testme
And, the output is:
Use of uninitialized value $foo in concatenation (.) or string at Local/Foo.pm line 14.
The value of $main::foo is ""
The value of $main::bar is "This is bar"
The value of $Local::Foo::bar is "This is the package's bar value!"
The value of bar is "This is the package's bar value!"
Use of uninitialized value $foo in concatenation (.) or string at Local/Foo.pm line 14.
The value of $main::foo is ""
The value of $main::bar is "This is bar"
The value of $Local::Foo::bar is "This is the NEW value for the package's bar"
The value of bar is "This is the NEW value for the package's bar"
The error message you're getting is the result of $foo being a local
variable, and thus isn't visible inside the package. Meanwhile, $bar is a
package variable and is visible.
Sometimes, it can be a bit tricky:
if ($bar eq "one") {
    my $foo = 1;
}
else {
    my $foo = 2;
}
print "Foo = $foo\n";
That doesn't work because $foo only has a value inside the if block. You have to declare
$foo before the if and assign to it inside:
my $foo;
if ($bar eq "one") {
    $foo = 1;
}
else {
    $foo = 2;
}
print "Foo = $foo\n";
Yes, it can take a bit to get your head wrapped around initially, but the use of
use strict; and use warnings; is now de rigueur, and for good
reasons. The use of use strict; and use warnings; has probably
eliminated 90% of the mistakes people make in Perl. You can't make the mistake of setting the
value of $foo in one part of the program and attempting to use
$Foo in another. It's one of the things I really miss in Python.
After reading Variable
Scoping in Perl: the basics more carefully, I realized that a variable declared with
my isn't in the current package. For example, in a simple script with no modules,
if I declare my $var = "hello", then $main::var still doesn't have a
value.
The way that this applies in this case is in the module. Since my $greeting
is declared in the file, that hides the package's version of $greeting and
that's the value which the subroutine sees. If I don't declare the variable first, the
subroutine would see the package variable, but it doesn't get that far because I use
strict .
If I don't use strict and don't declare my $greeting , it works
as I would have expected. Another way to get the intended value and not break use
strict is to use our $greeting . The difference being that my declares a variable in the
current scope while our declares a variable in the current
package .
I want to repeatedly search for values in an array that does not change.
So far, I have been doing it this way: I put the values in a hash (so I have an array and a hash with essentially the same
contents) and I search the hash using exists .
I don't like having two different variables (the array and the hash) that both store the same thing; however, the hash is much
faster for searching.
I found out that there is a ~~ (smartmatch) operator in Perl 5.10. How efficient is it when searching for a scalar
in an array?
If you want to search for a single scalar in an array, you can use
List::Util 's first subroutine. It stops as soon
as it knows the answer. I don't expect this to be faster than a hash lookup if you already have the hash , but when you
consider creating the hash and having it in memory, it might be more convenient for you to just search the array you already have.
As for the smarts of the smart-match operator, if you want to see how smart it is, test it. :)
There are at least three cases you want to examine. The worst case is that every element you want to find is at the end. The
best case is that every element you want to find is at the beginning. The likely case is that the elements you want to find average
out to being in the middle.
Now, before I start this benchmark, I expect that if the smart match can short circuit (and
it can; it's documented in perlsyn), the best case times will stay the same despite
the array size, while the other ones get increasingly worse. If it can't short circuit and has
to scan the entire array every time, there should be no difference in the times because every
case involves the same amount of work.
Here's a benchmark:
#!perl
use 5.12.2;
use strict;
use warnings;
use Benchmark qw(cmpthese);
my @hits = qw(A B C);
my @base = qw(one two three four five six) x ( $ARGV[0] || 1 );
my @at_end = ( @base, @hits );
my @at_beginning = ( @hits, @base );
my @in_middle = @base;
splice @in_middle, int( @in_middle / 2 ), 0, @hits;
my @random = @base;
foreach my $item ( @hits ) {
    my $index = int rand @random;
    splice @random, $index, 0, $item;
}
sub count {
    my( $hits, $candidates ) = @_;
    my $count;
    foreach ( @$hits ) { when( $candidates ) { $count++ } }
    $count;
}
cmpthese(-5, {
    hits_beginning => sub { my $count = count( \@hits, \@at_beginning ) },
    hits_end       => sub { my $count = count( \@hits, \@at_end ) },
    hits_middle    => sub { my $count = count( \@hits, \@in_middle ) },
    hits_random    => sub { my $count = count( \@hits, \@random ) },
    control        => sub { my $count = count( [], [] ) },
});
Here's how the various parts did. Note that this is a logarithmic plot on both axes, so the slopes of the plunging lines aren't
as close as they look:
So, it looks like the smart match operator is a bit smart, but that doesn't really help you because you still might have to
scan the entire array. You probably don't know ahead of time where you'll find your elements. I expect a hash will perform the
same as the best case smart match, even if you have to give up some memory for it.
Okay, so the smart match being smart times two is great, but the real question is "Should I use it?". The alternative is a
hash lookup, and it's been bugging me that I haven't considered that case.
As with any benchmark, I start off thinking about what the results might be before I actually test them. I expect that if I
already have the hash, looking up a value is going to be lightning fast. That case isn't a problem. I'm more interested in the
case where I don't have the hash yet. How quickly can I make the hash and lookup a key? I expect that to perform not so well,
but is it still better than the worst case smart match?
Before you see the benchmark, though, remember that there's almost never enough information about which technique you should
use just by looking at the numbers. The context of the problem selects the best technique, not the fastest, contextless micro-benchmark.
Consider a couple of cases that would select different techniques:
You have one array you will search repeatedly
You always get a new array that you only need to search once
You get very large arrays but have limited memory
Now, keeping those in mind, I add to my previous program:
my %old_hash = map {$_,1} @in_middle;
cmpthese(-5, {
    ...,
    new_hash => sub {
        my %h = map {$_,1} @in_middle;
        my $count = 0;
        foreach ( @hits ) { $count++ if exists $h{$_} }
        $count;
    },
    old_hash => sub {
        my $count = 0;
        foreach ( @hits ) { $count++ if exists $old_hash{$_} }
        $count;
    },
    control_hash => sub {
        my $count = 0;
        foreach ( @hits ) { $count++ }
        $count;
    },
});
Here's the plot. The colors are a bit difficult to distinguish. The lowest line there is the case where you have to create
the hash any time you want to search it. That's pretty poor. The highest two (green) lines are the control for the hash (no hash
actually there) and the existing hash lookup. This is a log/log plot; those two cases are faster than even the smart match control
(which just calls a subroutine).
There are a few other things to note. The lines for the "random" case are a bit different. That's understandable because each
benchmark (so, once per array scale run) randomly places the hit elements in the candidate array. Some runs put them a bit earlier
and some a bit later, but since I only make the @random array once per run of the entire program, they move around
a bit. That means that the bumps in the line aren't significant. If I tried all positions and averaged, I expect that "random"
line to be the same as the "middle" line.
Now, looking at these results, I'd say that a smart-match is much faster in its worst case than the hash lookup is in its worst
case. That makes sense. To create a hash, I have to visit every element of the array and also make the hash, which is a lot of
copying. There's no copying with the smart match.
Here's a further case I won't examine though. When does the hash become better than the smart match? That is, when does the
overhead of creating the hash spread out enough over repeated searches that the hash is the better choice?
Fast for small numbers of potential matches, but not faster than the hash. Hashes are really the right tool for testing set membership.
Since hash access is effectively O(1) and smartmatch on an array is still an O(n) linear scan (albeit short-circuiting, unlike grep),
with larger numbers of values in the allowed matches, smartmatch gets relatively worse. Benchmark code (matching against 3 values):
#!perl
use 5.12.0;
use Benchmark qw(cmpthese);
my @hits = qw(one two three);
my @candidates = qw(one two three four five six); # 50% hit rate
my %hash;
@hash{@hits} = ();
sub count_hits_hash {
    my $count = 0;
    for (@_) {
        $count++ if exists $hash{$_};
    }
    $count;
}
sub count_hits_smartmatch {
    my $count = 0;
    for (@_) {
        $count++ when @hits;
    }
    $count;
}
say count_hits_hash(@candidates);
say count_hits_smartmatch(@candidates);
cmpthese(-5, {
    hash => sub { count_hits_hash((@candidates) x 1000) },
    smartmatch => sub { count_hits_smartmatch((@candidates) x 1000) },
});
I'm pretty sure stdout keeps all output; it's a stream object with a buffer. I use a very similar technique to deplete all
remaining output after a Popen has completed, and in my case, using poll() and readline during the execution to capture
output live also. –
Max Ekman Nov 28 '12 at 21:55
I've removed my misleading comment. I can confirm, p.stdout.readline() may return the non-empty
previously-buffered output even if the child process has exited already (p.poll() is not None). – jfs Sep 18 '14 at 3:12
I'm currently studying penetration testing and Python
programming. I just want to know how I would go about executing a Linux command in Python.
The commands I want to execute are:
Better yet, you can use subprocess's call, it is safer, more powerful and likely
faster:
from subprocess import call
call('echo "I like potatos"', shell=True)
Or, without invoking shell:
call(['echo', 'I like potatos'])
If you want to capture the output, one way of doing it is like this:
import subprocess
cmd = ['echo', 'I like potatos']
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
o, e = proc.communicate()
print('Output: ' + o.decode('ascii'))
print('Error: ' + e.decode('ascii'))
print('code: ' + str(proc.returncode))
I highly recommend setting a timeout in communicate, and
also capturing the exceptions you can get when calling it. This is very error-prone code,
so you should expect errors to happen and handle them accordingly; a sketch follows below.
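A hedged sketch of that advice, reusing the echo example from above; the 10-second timeout
is an arbitrary illustration.
import subprocess

cmd = ['echo', 'I like potatos']
try:
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate(timeout=10)
except FileNotFoundError:
    print('command not found:', cmd[0])
except subprocess.TimeoutExpired:
    proc.kill()                    # reap the stuck child...
    out, err = proc.communicate()  # ...and collect whatever it already produced
    print('command timed out')
else:
    print('code:', proc.returncode)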
The first command simply writes to a file. You wouldn't execute that as a shell command
because python can read and write to files without the help of a shell:
with open('/proc/sys/net/ipv4/ip_forward', 'w') as f:
    f.write("1")
The iptables command is something you may want to execute externally. The
best way to do this is to use the subprocess module .
This isn't the most flexible approach; if you need any more control over your process than
"run it once, to completion, and block until it exits", then you should use the
subprocess module instead.
This would make you depend on an external lib, so you have to weigh the benefits. Using
subprocess works, but if you want to use the output, you'll have to parse it yourself and
deal with output changes in future iptables versions.
A python version of your shell. Be careful, I haven't tested it.
from subprocess import run
def bash(command):
    run(command.split())
>>> bash('find / -name null')
/dev/null
/sys/fs/selinux/null
/sys/devices/virtual/mem/null
/sys/class/mem/null
/usr/lib/kbd/consoletrans/null
Is there any static code analysis module in Perl except B::Lint and Perl::Critic? How
effective is Module::Checkstyle?
There is a post on
perlmonks.org asking if PPI can be used for static analysis. PPI is the power behind
Perl::Critic, according to the reviews of this module. (I have not used it yet).
Environment variables are accessed through os.environ
import os
print(os.environ['HOME'])
Or you can see a list of all the environment variables using:
os.environ
As sometimes you might need to see a complete list!
# using get will return `None` if a key is not present rather than raise a `KeyError`
print(os.environ.get('KEY_THAT_MIGHT_EXIST'))
# os.getenv is equivalent, and can also give a default value instead of `None`
print(os.getenv('KEY_THAT_MIGHT_EXIST', default_value))
The default Python installation on Windows is C:\Python. If you want to find out while
running Python, you can do:
import sys
print(sys.prefix)
import sys
print sys.argv[0]
This will print foo.py for python foo.py ,
dir/foo.py for python dir/foo.py , etc. It's the first argument to
python . (Note that after py2exe it would be foo.exe .)
I usually use sys.platform ( docs ) to get the platform.
sys.platform will distinguish between linux, other unixes, and OS X, while
os.name is " posix " for all of them.
For much more detailed information, use the platform module . This has
cross-platform functions that will give you information on the machine architecture, OS and
OS version, version of Python, etc. Also it has os-specific functions to get things like the
particular linux distribution.
This gives you the essential information you will usually need. To distinguish between,
say, different editions of Windows, you will have to use a platform-specific method.
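For instance, a few of the platform module's cross-platform calls (all standard library;
what they print obviously depends on the machine):
import platform

print(platform.system())          # e.g. 'Linux', 'Darwin', 'Windows'
print(platform.release())         # kernel / OS release string
print(platform.machine())         # e.g. 'x86_64'
print(platform.python_version())  # e.g. '3.11.4'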
To complement Greg's post, if you're on a posix system, which includes MacOS, Linux, Unix,
etc. you can use os.uname() to get a better feel for what kind of system it is.
Something along these lines:
import os
if (os.name == "posix"):
    print os.system("uname -a")
    # insert other possible OSes here
    # ...
else:
    print "unknown OS"
I have one problem regarding using Python to process a trace file (it contains billions
of lines of data).
What I want to do is: the program will find one specific line in the file (say line #x),
and then it needs to find another symbol in the file starting from that line. Once it finds
it, it starts from line #x again to search for the next one.
What I do now is the following, but the problem is that it always needs to reopen the file
and read from the beginning to find the matching lines (line # > x, containing the symbol I
want). For one big trace file, it takes too long to process.
1.
i = 0
for line in file.readlines():
    i += 1  # update the line number
    if i > x:
        if line.find(symbol) != -1:   # 'symbol' stands in for whatever string is being searched for
            ...
or:
for i, line in enumerate(open(file)):
    if i > x:
        if ...:
            ...
If the file is otherwise stable, use fileobj.tell() to
remember your position in the file, then next time use fileobj.seek(pos)
to return to that same position in the file.
This only works if you do not use the file object as an iterator (no for line
in fileobject or next(fileobject)), as iteration uses a read-ahead buffer
that will obscure the exact position. A minimal sketch follows below.
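A minimal sketch of that approach, assuming the file does not change between runs; the file
name and the strings being searched for are stand-ins.
# First pass: find the interesting line with readline() and remember the offset.
pos = 0
with open("trace.log") as fh:
    while True:
        line = fh.readline()
        if not line:          # end of file
            break
        if "line-x-marker" in line:
            pos = fh.tell()   # byte offset just after that line
            break

# Later: jump straight back to that offset and keep searching from there.
with open("trace.log") as fh:
    fh.seek(pos)
    while True:
        line = fh.readline()
        if not line:
            break
        if "other-symbol" in line:
            print(line.rstrip())
            break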
I suggest you use random access, and record where your line started. Something like:
index = []
fh = open("gash.txt")
for line in fh:
    if target in line:
        index.append(fh.tell() - len(line))
Then, when you want to recall the contents, use fh.seek(index[n]) .
A couple of "gotchas":
Notice that the index position will not be the same as the line number. If you need the
line number then maybe use a dictionary, with the line number as the key.
On Windows, you will have to adjust the file position by -1. This is because the "\r"
is stripped out and does not appear in the len(line) .
What happens when a Benevolent Dictator For Life moves on from an open source project?
(16 Jul 2018, Jason Baker)
Guido van Rossum ,
creator of the Python
programming language and Benevolent Dictator For Life
(BDFL) of the project, announced his intention to step away.
Below is a portion of his message, although the entire email is
not terribly long and worth taking the time to read if you're interested in the circumstances
leading to van Rossum's departure.
I would like to remove myself entirely from the decision process. I'll still be there for
a while as an ordinary core dev, and I'll still be available to mentor people -- possibly
more available. But I'm basically giving myself a permanent vacation from being BDFL, and you
all will be on your own.
After all that's eventually going to happen regardless -- there's still that bus lurking
around the corner, and I'm not getting younger... (I'll spare you the list of medical
issues.)
I am not going to appoint a successor.
So what are you all going to do? Create a democracy? Anarchy? A dictatorship? A
federation?
It's worth zooming out for a moment to consider the issue at a larger scale. How an open
source project is governed can have very real consequences on the long-term sustainability of
its user and developer communities alike.
BDFLs tend to emerge from passion projects, where a single individual takes on a project
before growing a community around it. Projects emerging from companies or other large
organization often lack this role, as the distribution of authority is more formalized, or at
least more dispersed, from the start. Even then, it's not uncommon to need to figure out how to
transition from one form of project governance to another as the community grows and
expands.
But regardless of how an open source project is structured, ultimately, there needs to be
some mechanism for deciding how to make technical decisions. Someone, or some group, has to
decide which commits to accept, which to reject, and more broadly what direction the project is
going to take from a technical perspective.
Surely the Python project will be okay without van Rossum. The Python Software Foundation has plenty of formalized
structure in place bringing in broad representation from across the community. There's even
been a humorous April Fools Python Enhancement Proposal (PEP) addressing
the BDFL's retirement in the past.
That said, it's interesting that van Rossum did not heed the fifth lesson of Eric S. Raymond
from his essay, The Mail Must Get
Through (part of The
Cathedral & the Bazaar ) , which stipulates: "When you lose interest in a
program, your last duty to it is to hand it off to a competent successor." One could certainly
argue that letting the community pick its own leadership, though, is an equally-valid
choice.
What do you think? Are projects better or worse for being run by a BDFL? What can we expect
when a BDFL moves on? And can someone truly step away from their passion project after decades
of leading it? Will we still turn to them for the hard decisions, or can a community smoothly
transition to new leadership without the pitfalls of forks or lost participants?
Can you truly stop being a BDFL? Or is it a title you'll hold, at least informally, until
your death?
"So what are you all going to do? Create a democracy? Anarchy? A dictatorship? A
federation?"
Power coalesced to one point is always scary when thought about in the context of
succession. A vacuum invites anarchy and I often think about this for when Linus Torvalds
leaves the picture. We really have no concrete answers for what is the best way forward but
my hope is towards a democratic process. But, as current history indicates, a democracy
untended by its citizens invites quite the nightmare and so too does this translate to the
keeping up of a project.
To explain the above, I'm building my bash prompt by executing a function stored in a
string, which was a decision made as the result of
this question. Let's pretend like it works fine, because it does, except when unicode
characters get involved.
I am trying to find the proper way to escape a unicode character, because right now it
messes with the bash line length. An easy way to test if it's broken is to type a long
command, execute it, press CTRL-R and type to find it, and then pressing CTRL-A CTRL-E to
jump to the beginning / end of the line. If the text gets garbled then it's not working.
I have tried several things to properly escape the unicode character in the function
string, but nothing seems to be working.
Which is the main reason I made the prompt a function string. That escape sequence does
NOT mess with the line length, it's just the unicode character.
The \[...\] sequence says to ignore this part of the string completely, which is
useful when your prompt contains a zero-length sequence, such as a control sequence which
changes the text color or the title bar, say. But in this case, you are printing a character,
so the length of it is not zero. Perhaps you could work around this by, say, using a no-op
escape sequence to fool Bash into calculating the correct line length, but it sounds like
that way lies madness.
The correct solution would be for the line length calculations in Bash to correctly grok
UTF-8 (or whichever Unicode encoding it is that you are using). Uhm, have you tried without
the \[...\] sequence?
Edit: The following implements the solution I propose in the comments below. The cursor
position is saved, then two spaces are printed, outside of \[...\] , then the
cursor position is restored, and the Unicode character is printed on top of the two spaces.
This assumes a fixed font width, with double width for the Unicode character.
PS1='\['"`tput sc`"'\] \['"`tput rc`"'༇ \] \$ '
At least in the OSX Terminal, Bash 3.2.17(1)-release, this passes cursory [sic]
testing.
In the interest of transparency and legibility, I have ignored the requirement to have the
prompt's functionality inside a function, and the color coding; this just changes the prompt
to the character, space, dollar prompt, space. Adapt to suit your somewhat more complex
needs.
The trick as pointed out in @tripleee's link is the use of the commands tput
sc and tput rc which save and then restore the cursor position. The code
is effectively saving the cursor position, printing two spaces for width, restoring the
cursor position to before the spaces, then printing the special character so that the width
of the line is from the two spaces, not the character.
(Not the answer to your problem, but some pointers and general experience related to your
issue.)
I see the behaviour you describe about cmd-line editing (Ctrl-R, ... Cntrl-A Ctrl-E ...)
all the time, even without unicode chars.
At one work-site, I spent the time to figure out the difference between the terminal's
interpretation of the TERM setting vs. the TERM definition used by the OS (well, stty I
suppose).
Now, when I have this problem, I escape out of my current attempt to edit the line, bring
the line up again, and then immediately go to 'vi' mode, which opens the vi editor
(press just the 'v' char, right?). All the ease of use of a full-fledged session of vi; why
go with less ;-)?
Looking again at your problem description, when you say
That is just a string definition, right? And I'm assuming you're simplifying the problem
definition by assuming this is the output of your my_function. It seems very
likely that the steps of creating the function definition, calling the function, AND using
the returned values offer a lot of opportunities for shell-quoting to not work the way you
want it to.
If you edit your question to include the my_function definition and its
complete use (reducing your function to just what is causing the problem), it may be easier
for others to help with this too. Finally, do you use set -vx regularly? It can
help show the how/when/what of variable expansions; you may find something there.
Failing all of those, look at O'Reilly's termcap & terminfo.
You may need to look at the man page for your local system's stty and related
cmds, AND you may do well to look for user groups specific to your Linux system (I'm assuming
you use a Linux variant).
I have a simple problem in Python that is very, very strange.
def estExt(matriz, erro):
    # (1) Determine the solution vector X
    print("Matrix after:")
    print(matriz)
    aux = matriz
    x = solucoes(aux)   # if aux is a copy of matriz, why is matriz changed??
    print("Matrix before:")
    print(matriz)
    ...
As you see below, the matrix matriz is changed in spite of the fact that
aux is the one being changed by the function solucoes().
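Most likely nothing strange is happening: aux = matriz does not copy the list, it just binds
a second name to the same object, so whatever solucoes() does through aux is visible through
matriz too. A small, self-contained sketch (not the original program) showing the difference
with copy.deepcopy:
import copy

matriz = [[1, 2], [3, 4]]

aux = matriz                  # same object, two names
aux[0][0] = 99
print(matriz[0][0])           # 99 -- the "original" changed as well

matriz = [[1, 2], [3, 4]]
aux = copy.deepcopy(matriz)   # independent copy, nested lists included
aux[0][0] = 99
print(matriz[0][0])           # 1 -- original untouched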
Hey I've been using Linux for a while and thought it was time to finally dive into shell
scripting.
The problem is I've failed to find any significant advantage of using Bash over something
like Perl or Python. Are there any performance or power differences between the two? I'd
figure Python/Perl would be better suited as far as power and efficiency go.
Simplicity: direct access to all wonderful linux tools wc ,
ls , cat , grep , sed ... etc. Why
constantly use python's subprocess module?
I'm increasingly fond of using gnu parallel , with which you can
execute your bash scripts in parallel. E.g. from the man page, batch create thumbs of all
jpgs in directory in parallel:
ls *.jpg | parallel convert -geometry 120 {} thumb_{}
By the way, I usually have some python calls in my bash scripts (e.g. for plotting). Use
whatever is best for the task!
bash isn't a language so much as a command interpreter that's been hacked to death to allow
for things that make it look like a scripting language. It's great for the simplest 1-5 line
one-off tasks, but things that are dead simple in Perl or Python like array manipulation are
horribly ugly in bash. I also find that bash tends not to pass two critical rules of thumb:
The 6-month rule, which says you should be able to easily discern the purpose and basic
mechanics of a script you wrote but haven't looked at in 6 months.
The 'WTF per minute' rule. Everyone has their limit, and mine is pretty small. Once I
get to 3 WTFs/min, I'm looking elsewhere.
As for 'shelling out' in scripting languages like Perl and Python, I find that I almost
never need to do this, fwiw (disclaimer: I code almost 100% in Python). The Python os and
shutil modules have most of what I need most of the time, and there are built-in modules for
handling tarfiles, gzip files, zip files, etc. There's a glob module, an fnmatch module...
there's a lot of stuff there. If you come across something you need to parallelize, then
indent your code a level, put it in a 'run()' method, put that in a class that extends either
threading.Thread or multiprocessing.Process, instantiate as many of those as you want,
calling 'start()' on each one. Less than 5 minutes to get parallel execution generally.
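A rough sketch of that run()/start() pattern with threading.Thread (the Worker class and the
file names are illustrative, not a prescription):
import threading

class Worker(threading.Thread):
    def __init__(self, path):
        super().__init__()
        self.path = path

    def run(self):
        # whatever used to be the body of your loop goes here
        print("processing", self.path)

workers = [Worker(p) for p in ["a.log", "b.log", "c.log"]]
for w in workers:
    w.start()
for w in workers:
    w.join()   # wait for all of them to finish
Swapping threading.Thread for multiprocessing.Process gives the same shape of code, with real
parallelism for CPU-bound work.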
There are a few things you can only do in bash (for example, alter the calling environment
when a script is sourced rather than run). Also, shell scripting is commonplace. It is
worthwhile to learn the basics and learn your way around the available docs.
Plus there are times when knowing a shell well can save your bacon (on a fork-bombed
system where you can't start any new processes, or if /usr/bin and/or
/usr/local/bin fail to mount).
The advantage is that it's right there. Unless you use Python (or Perl) as your shell,
writing a script to do a simple loop is a bunch of extra work.
For short, simple scripts that call other programs, I'll use Bash. If I want to keep the
output, odds are good that I'll trade up to Python.
For example:
for file in *; do process $file ; done
where process is a program I want to run on each file, or...
while true; do program_with_a_tendency_to_fail ; done
Doing either of those in Python or Perl is overkill.
For actually writing a program that I expect to maintain and use over time, Bash is rarely
the right tool for the job. Particularly since most modern Unices come with both Perl and
Python.
The most important advantage of POSIX shell scripts over Python or Perl scripts is that a
POSIX shell is available on virtually every Unix machine. (There are also a few tasks shell
scripts happen to be slightly more convenient for, but that's not a major issue.) If the
portability is not an issue for you, I don't see much need to learn shell scripting.
If you want to execute programs installed on the machine, nothing beats bash. You can always
make a system call from Perl or Python, but I find it to be a hassle to read return values,
etc.
And since you know it will work pretty much anywhere throughout all of time...
The advantage of shell scripting is that it's globally present on *ix boxes, and has a
relatively stable core set of features you can rely on to run everywhere. With Perl and
Python you have to worry about whether they're available and if so what version, as there
have been significant syntactical incompatibilities throughout their lifespans. (Especially
if you include Python 3 and Perl 6.)
The disadvantage of shell scripting is everything else. Shell scripting languages are
typically lacking in expressiveness, functionality and performance. And hacking command lines
together from strings in a language without strong string processing features and libraries,
to ensure the escaping is correct, invites security problems. Unless there's a compelling
compatibility reason you need to go with shell, I would personally plump for a scripting
language every time.
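To see the escaping point from the Python side (a hedged sketch; the hostile-looking filename
is purely illustrative): passing an argument list keeps the shell out of the picture, so the
string cannot be reinterpreted as extra commands.
import subprocess

user_supplied = "report; rm -rf ~"   # imagine this arrived from outside

# risky: the whole string goes to /bin/sh, which would happily run the second command
# subprocess.run("ls -l " + user_supplied, shell=True)

# safer: no shell involved, the value is just one literal argument to ls
subprocess.run(["ls", "-l", user_supplied])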
I am working on a web-based log management system that will be built on the Grails framework
and I am going to use one of the text processing languages like Python or Perl. I have
created Python and Perl scripts that load log files and parse each line to save them to a
MySQL database (the file contains about 40,000 lines, about 7MB). It took 1 min 2 secs using
Perl and only 17 secs using Python.
I had supposed that Perl would be faster than Python, as
Perl is the original text-processing language (my suspicion also coming from various blogs
about Perl text-processing performance).
And I was not expecting a 45-second gap between Perl and Python. Why is Perl taking more time
than Python to process my log file? Is it because I am using the wrong DB module, or can my
code and regular expression for Perl be improved?
Note: I am a Java and Groovy developer and I have no experience with Perl (I am using
Strawberry Perl v5.16). Also I have made this test with Java (1 min 5 secs) and Groovy (1 min
7 secs) but more than 1 min to process the log file is too much, so both languages are out
and now I want to choose between Perl and Python.
PERL Code
use DBI;
use DBD::mysql;
# make connection to database
$connection = DBI->connect("dbi:mysql:logs:localhost:3306","root","") || die "Cannot connect: $DBI::errstr";
# set the value of your SQL query
$query = "insert into logs (line_number, dated, time_stamp, thread, level, logger, user, message)
values (?, ?, ?, ?, ?, ?, ?, ?) ";
# prepare your statement for connecting to the database
$statement = $connection->prepare($query);
$runningTime = time;
# open text file
open (LOG,'catalina2.txt') || die "Cannot read logfile!\n";
while (<LOG>) {
    my ($date, $time, $thread, $level, $logger, $user, $message) = /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2},\d{3}) (\[.*\]) (.*) (\S*) (\(.*\)) - (.*)$/;
    $statement->execute(1, $date, $time, $thread, $level, $logger, $user, $message);
}
# close the open text file
close(LOG);
# close database connection
$connection->disconnect;
$runningTime = time - $runningTime;
printf("\n\nTotal running time: %02d:%02d:%02d\n\n", int($runningTime / 3600), int(($runningTime % 3600) / 60), int($runningTime % 60));
# exit the script
exit;
PYTHON Code
import re
import mysql.connector
import time
file = open("D:\catalina2.txt","r")
rexp = re.compile('^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2},\d{3}) (\[.*\]) (.*) (\S*) (\(.*\)) - (.*)$')
conn = mysql.connector.connect(user='root',host='localhost',database='logs')
cursor = conn.cursor()
tic = time.clock()
increment = 1
for text in file.readlines():
    match = rexp.match(text)
    increment += 1
    cursor.execute('insert into logs (line_number,dated, time_stamp, thread,level,logger,user,message ) values (%s,%s,%s,%s,%s,%s,%s,%s)', (increment, match.group(1), match.group(2),match.group(3),match.group(4),match.group(5),match.group(6),match.group(7)))
conn.commit()
cursor.close()
conn.close()
toc = time.clock()
print "Total time: %s" % (toc - tic)
You are only calling conn.commit() once in Python, after the loop has inserted every row:
for text in file.readlines():
    match = rexp.match(text)
    increment += 1
    cursor.execute('insert into logs (line_number,dated, time_stamp, thread,level,logger,user,message ) values (%s,%s,%s,%s,%s,%s,%s,%s)', (increment, match.group(1), match.group(2),match.group(3),match.group(4),match.group(5),match.group(6),match.group(7)))
But in Perl, DBI's AutoCommit is on by default, so every $statement->execute is committed as
its own transaction, and that per-row commit overhead is very likely where the extra time goes.
By the way, for the Python version, calling cursor.execute once for every row
will be slow. You can make it faster by using cursor.executemany :
sql = 'insert into logs (line_number,dated, time_stamp, thread,level,logger,user,message ) values (%s,%s,%s,%s,%s,%s,%s,%s)'
args = []
for text in file:
    match = rexp.match(text)
    increment += 1
    args.append([increment] + list(match.groups()))
cursor.executemany(sql, args)
If there are too many lines in the log file, you may need to break this up into
blocks:
args = []
for text in file:
    match = rexp.match(text)
    increment += 1
    args.append([increment] + list(match.groups()))
    if increment % 1000 == 0:
        cursor.executemany(sql, args)
        args = []
if args:
    cursor.executemany(sql, args)
(Also, don't use file.readlines() because this creates a list (which may be
huge). file is an iterator which spits out one line at a time, so for text
in file suffices.)
"This simple glitch in the original script calls into question the conclusions of a
significant number of papers on a wide range of topics in a way that cannot be easily resolved
from published information because the operating system is rarely mentioned," the new paper
reads. "Authors who used these scripts should certainly double-check their results and any
relevant conclusions using the modified scripts in the [supplementary information]."
Yuheng Luo, a graduate student at the University of Hawaii at Manoa, discovered the
glitch this summer when he was verifying the results of research conducted by chemistry
professor Philip Williams on cyanobacteria... Under supervision of University of Hawaii at
Manoa assistant chemistry professor Rui Sun, Luo used a script written in Python that was
published as part of a 2014 paper by Patrick Willoughby, Matthew Jansma, and Thomas Hoye in the
journal Nature Protocols . The code computes chemical shift values for NMR, or nuclear magnetic
resonance spectroscopy, a common technique used by chemists to determine the molecular make-up
of a sample. Luo's results did not match up with the NMR values that Williams' group had
previously calculated, and according to Sun, when his students ran the code on their computers,
they realized that different operating systems were producing different results.
Sun then adjusted the code to fix the glitch, which had to do with how different
operating systems sort files.
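The articles don't reproduce the code, but the class of bug is easy to illustrate: os.listdir()
makes no ordering guarantee, so any computation that depends on the order in which files are
processed can differ between operating systems unless the listing is sorted explicitly. A
hypothetical sketch:
import os

files = os.listdir(".")          # order is arbitrary and platform-dependent
files = sorted(os.listdir("."))  # deterministic on every OS
for name in files:
    print(name)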
The researcher who wrote the flawed script told Motherboard that the new study was "a beautiful
example of science working to advance the work we reported in 2014. They did a tremendous
service to the community in figuring this out."
Sun described the original authors as "very gracious," saying they encouraged the
publication of the findings.
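For readers wondering what such an OS-dependent glitch looks like in practice: functions such as os.listdir() and glob.glob() return file names in arbitrary, platform-dependent order, so any calculation that silently depends on that order can differ between operating systems. A minimal illustration (the directory and pattern below are made up):

import glob

# The order of the returned list is not guaranteed and differs between platforms.
files = glob.glob('outputs/*.out')

# Sorting explicitly makes the processing order deterministic everywhere.
for name in sorted(files):
    print name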
In perl, I could take the To: line of a raw email and find either of the
above addresses with
/\w+@(tickets\.)?company\.com/i
In python, I simply wrote the above regex as '\w+@(tickets\.)?company\.com'
expecting the same result. However, [email protected] isn't found at all and a
findall on the second returns a list containing only 'tickets.' . So clearly the
'(tickets\.)?' is the problem area, but what exactly is the difference in
regular expression rules between Perl and Python that I'm missing?
findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.
If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern
has more than one group.
Empty matches are included in the result.
Since (tickets\.) is a group, findall returns that instead of
the whole match. If you want the whole match, put a group around the whole pattern and/or use
non-grouping matches, i.e.
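For example, a minimal sketch of both fixes; the address is a made-up stand-in for the redacted one in the question:

import re

to_line = 'To: someone@tickets.company.com'   # hypothetical sample input

# Non-capturing group: findall() now returns the whole match, not just 'tickets.'
print re.findall(r'\w+@(?:tickets\.)?company\.com', to_line, re.I)
# ['someone@tickets.company.com']

# Alternatively, wrap the whole pattern in a capturing group.
m = re.search(r'(\w+@(?:tickets\.)?company\.com)', to_line, re.I)
if m:
    print m.group(1)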
There isn't a difference in the regexes, but there is a difference in what you are looking
for. Your regex is capturing only "tickets." , if it exists, in both languages. You
probably want to make the group non-capturing, (?:tickets\.)? , so that findall returns the whole match.
I am an experienced Perl developer with some degree of experience and/or familiarity with
other languages (working experience with C/C++, school experience with Java and Scheme, and
passing familiarity with many others).
I might need to get some web work done in Python (most immediately, related to Google App
Engine). As such, I'd like to ask SO overmind for good references on how to best learn Python
for someone who's coming from Perl background (e.g. the emphasis would be on differences
between the two and how to translate perl idiomatics into Python idiomatics, as opposed to
generic Python references). Something also centered on Web development is even better. I'll
take anything - articles, tutorials, books, sample apps?
I've recently had to make a similar transition for work reasons, and it's been pretty
painful. For better or worse, Python has a very different philosophy and way of working than
Perl, and getting used to that can be frustrating. The things I've found most useful have
been
Spend a few hours going through all the basics. I found the official tutorial quite good, if a little dry.
Take a look at this handy Perl<->Python phrasebook (common
tasks, side by side, in both languages).
A reference for the Python approach to "common tasks". I use the Python Cookbook .
An ipython terminal open at all times
to test syntax, introspect object methods etc.
Get pip and easy-install (to install Python
modules easily).
Learn about unit tests fast. This is because without use strict you will
feel crippled, and you will make many elementary mistakes which will appear as runtime
errors. I recommend nose rather than the unittest framework that comes with the
core install. unittest is very verbose if you're used to Test::More .
Being a hardcore Perl programmer, all I can say is DO NOT BUY O'Reilly's "Learning Python".
It is nowhere NEAR as good as "Learning Perl", and there's no equivalent I know of to Larry
Wall's "Programming Perl", which is simply unbeatable.
I've had the most success taking past Perl programs and translating them into Python,
trying to make use of as many new techniques as possible.
Check out the official tutorial ,
which is actually pretty good. If you are interested in web development you should be ready
at that point to jump right in to the documentation of the web framework you will be working
with; Python has many to choose from, with zope, cherrypy, pylons, and werkzeug all having
good reputations.
I would not try to search for things specifically meant to help you transition from Perl,
which are unlikely to be of as high quality as references that are useful to a broader
audience.
This is the site you should really
go to. There's a section called Getting Started which you should take a look at. There are also
recommendations on books. On top of that, you might also be interested in this piece on "idioms".
I wouldn't try to compare Perl and Python too much in order to learn Python, especially since
you have working knowledge of other languages. If you are unfamiliar with OOP/functional
programming aspects and are just looking to work procedurally like in Perl, start learning the
Python language constructs / syntax and then do a couple of examples. If you are making a switch
to OO or functional style paradigms, I would read up on OO fundamentals first, then start on
Python syntax and examples... so you have a sort of mental blueprint of how things can be
constructed before you start working with the actual materials. This is just my humble
opinion, however.
"... Perl has native regular expression support, ..."
"... Perl has quite a few more operators , including matching ..."
"... In PHP, new is an operator. In Perl, it's the conventional name of an object creation subroutine defined in packages, nothing special as far as the language is concerned. ..."
"... Perl logical operators return their arguments, while they return booleans in PHP. ..."
"... Perl gives access to the symbol table ..."
"... Note that "references" has a different meaning in PHP and Perl. In PHP, references are symbol table aliases. In Perl, references are smart pointers. ..."
"... Perl has different types for integer-indexed collections (arrays) and string indexed collections (hashes). In PHP, they're the same type: an associative array/ordered map ..."
"... Perl arrays aren't sparse ..."
"... Perl supports hash and array slices natively, ..."
Perl and PHP are more different than alike. Let's consider Perl 5, since Perl 6 is still under development. Some differences,
grouped roughly by subject:
Perl has native regular expression support, including regexp literals. PHP uses Perl-compatible
regular expressions (PCRE) through an extension (the preg_* functions).
In PHP, new is an operator. In Perl, it's the conventional
name of an object creation
subroutine defined in packages, nothing special as far as the language is concerned.
Perl logical operators return their arguments, while they
return booleans in PHP. Try:
$foo = '' || 'bar';
in each language. In Perl, you can even do $foo ||= 'default' to set $foo to a value if it's not already set.
The shortest way of doing this in PHP is $foo = isset($foo) ? $foo : 'default'; (Update, in PHP 7.0+ you can do
$foo = $foo ?? 'default' )
Perl variable names indicate built-in
type, of which Perl has three, and the type specifier is part of the name (called a "
sigil "), so $foo is a
different variable than @foo or %foo . (related to the previous point) Perl has separate
symbol table entries
for scalars, arrays, hashes, code, file/directory handles and formats. Each has its own namespace.
Perl gives access to the symbol table
, though manipulating it isn't for the faint of heart. In PHP, symbol table manipulation is limited to creating
references and the extract function.
Note that "references" has a different meaning in PHP and Perl. In PHP,
references are symbol table aliases. In Perl,
references are smart pointers.
Perl has different types for integer-indexed collections (arrays) and string indexed collections (hashes). In PHP,
they're the same type: an associative array/ordered map.
Perl arrays aren't sparse: setting an element with index larger than the current size of the array will set all
intervening elements to undefined (see perldata
). PHP arrays are sparse; setting an element won't set intervening elements.
Perl supports hash and array slices natively,
and slices are assignable, which has all sorts of
uses . In PHP, you use array_slice to extract a slice
and array_splice to assign to a slice.
In addition, Perl has global, lexical (block), and package
scope . PHP has global, function, object,
class and namespace scope .
In Perl, variables are global by default. In PHP, variables in functions are local by default.
Perl supports explicit tail calls via the
goto function.
Perl's prototypes provide more limited type
checking for function arguments than PHP's
type hinting . As a result, prototypes are of more limited utility than type hinting.
In Perl, the last evaluated statement is returned as the value of a subroutine if the statement is an expression (i.e.
it has a value), even if a return statement isn't used. If the last statement isn't an expression (i.e. doesn't have a value),
such as a loop, the return value is unspecified (see perlsub
). In PHP, if there's no explicit return, the
return value is NULL .
Perl flattens lists (see perlsub ); for un-flattened
data structures, use references.
@foo = qw(bar baz);
@qux = ('qux', @foo, 'quux'); # @qux is an array containing 4 strings
@bam = ('bug-AWWK!', \@foo, 'fum'); # @bam contains 3 elements: two strings and an array ref
PHP doesn't flatten arrays.
Perl has special
code blocks ( BEGIN , UNITCHECK , CHECK , INIT and END
) that are executed at specific points during compilation and execution. Unlike PHP's auto_prepend_file and
auto_append_file
, there is no limit to the number of each type of code block. Also, the code blocks are defined within the scripts, whereas
the PHP options are set in the server and per-directory config files.
In Perl, the semicolon separates statements
. In PHP, it terminates
them, excepting that a PHP close tag ("?>") can also terminate a statement.
Negative subscripts in Perl are relative to the end of the array. $bam[-1] is the final element of the array.
Negative subscripts in PHP are subscripts like any other.
In Perl 5, classes are based on packages and look nothing like classes in PHP (or most other languages). Perl 6 classes
are closer to PHP classes, but still quite different. (Perl 6 is
different from Perl 5 in many other ways, but that's
off topic.) Many of the differences between Perl 5 and PHP arise from the fact that most of the OO features are not built-in
to Perl but based on hacks. For example, $obj->method(@args) gets translated to something like (ref $obj)::method($obj,
@args) . Non-exhaustive list:
PHP automatically provides the special variable $this in methods. Perl passes a reference to the object
as the first argument to methods.
Perl requires references to be blessed to
create an object. Any reference can be blessed as an instance of a given class.
In Perl, you can dynamically change inheritance via the packages @ISA variable.
Strictly speaking, Perl doesn't have multiline comments, but the
POD system can be used for the same effect.
In Perl, // is an operator. In PHP, it's the start of a one-line comment.
Until PHP 5.3, PHP had terrible support for anonymous functions (the create_function function) and no support
for closures.
PHP had nothing like Perl's packages until version 5.3, which introduced
namespaces .
Arguably, Perl's built-in support for exceptions looks almost nothing like exceptions in other languages, so much so that
they scarcely seem like exceptions. You evaluate a block and check the value of $@ ( eval instead
of try , die instead of
throw ). The Error module (or the more modern Try::Tiny ) supports exceptions as you find them in other languages
(as well as some other modules listed in Error's See Also
section).
PHP was inspired by Perl the same way Phantom of the Paradise was inspired by Phantom of the Opera , or Strange
Brew was inspired by Hamlet . It's best to put the behavior specifics of PHP out of your mind when learning Perl, else
you'll get tripped up.
I've noticed that most PHP vs. Perl pages seem to be of the
PHP is better than Perl because <insert lame reason here>
ilk, and rarely make reasonable comparisons.
Syntax-wise, you will find PHP is often easier to understand than Perl, particularly when you have little experience. For example,
trimming a string of leading and trailing whitespace in PHP is simply
$string = trim($string);
In Perl it is the somewhat more cryptic
$string =~ s/^\s+//;
$string =~ s/\s+$//;
(I believe this is slightly more efficient than a single line capture and replace, and also a little more understandable.)
However, even though PHP is often more English-like, it sometimes still shows its roots as a wrapper for low level C, for example,
strpbrk and strspn are probably rarely used, because most PHP dabblers write their own equivalent functions
for anything too esoteric, rather than spending time exploring the manual. I also wonder about programmers for whom English is
a second language, as everybody is on equal footing with things such as Perl, having to learn it from scratch.
I have already mentioned the manual. PHP has a fine online manual, and unfortunately it needs it. I still refer to it from
time to time for things that should be simple, such as order of parameters or function naming convention. With Perl, you will
probably find you are referring to the manual a lot as you get started and then one day you will have an a-ha moment and
never need it again. Well, at least not until you're more advanced and realize that not only is there more than one way, there
is probably a better way, somebody else has probably already done it that better way, and perhaps you should just visit CPAN.
Perl does have a lot more options and ways to express things. This is not necessarily a good thing, although it allows code
to be more readable if used wisely and at least one of the ways you are likely to be familiar with. There are certain styles and
idioms that you will find yourself falling into, and I can heartily recommend reading
Perl Best Practices (sooner rather than
later), along with Perl Cookbook, Second Edition
to get up to speed on solving common problems.
I believe the reason Perl is used less often in shared hosting environments is that historically the perceived slowness of
CGI and hosts' unwillingness to install mod_perl due to security
and configuration issues has made PHP a more attractive option. The cycle then continued, more people learned to use PHP because
more hosts offered it, and more hosts offered it because that's what people wanted to use. The speed differences and security
issues are rendered moot by FastCGI these days, and in most
cases PHP is run out of FastCGI as well, rather than leaving it in the core of the web server.
Whether or not this is the case or there are other reasons, PHP became popular and a myriad of applications have been written
in it. For the majority of people who just want an entry-level website with a simple blog or photo gallery, PHP is all they need
so that's what the hosts promote. There should be nothing stopping you from using Perl (or anything else you choose) if you want.
At an enterprise level, I doubt you would find too much PHP in production (and please, no-one point at Facebook as a
counter-example, I said enterprise level).
Perl is used plenty for websites, no less than Python and Ruby for example. That said, PHP is used way more often than any of
those. I think the most important factors in that are PHP's ease of deployment and the ease to start with it.
The differences in syntax are too many to sum up here, but generally it is true that Perl has more ways to express yourself (this
is known as TIMTOWTDI, There Is More Than One Way To Do It).
My favorite thing about Perl is the way it handles arrays/lists. Here's an example of how you would make and use a Perl function
(or "subroutine"), which makes use of this for arguments:
sub multiply
{
    my ($arg1, $arg2) = @_;   # @_ is the array of arguments
    return $arg1 * $arg2;
}
In PHP you could do a similar thing with list() , but it's not quite the same; in Perl, lists and arrays are actually
treated the same (usually).
And another difference that you MUST know about is the numeric/string comparison operators. In Perl, if you use <
, > , == , != , <=> , and so on, Perl converts both operands to numbers. If
you want to compare them as strings instead, you have to use lt , gt , eq , ne
, cmp (the respective equivalents of the operators listed previously). Examples where this will really get you:
if ("a" == "b") { ... } # This is true.
if ("a" == 0) { ... } # This is also true, for the same reason.
I have been using R CMD BATCH my_script.R from a terminal to execute an
R script. I am now at the point where I would like to pass an argument to the
command, but am having some issues getting it working. If I do R CMD BATCH my_script.R
blabla then blabla becomes the output file, rather than being interpreted
as an argument available to the R script being executed.
I have tried Rscript my_script.R blabla which seems to pass on
blabla correctly as an argument, but then I don't get the
my_script.Rout output file that I get with R CMD BATCH (I want the
.Rout file). While I could redirect the output of a call to Rscript
to a file name of my choosing, I would not be getting the R input commands included in the
file in the way R CMD BATCH does in the .Rout file.
So, ideally, I'm after a way to pass arguments to an R script being executed via the
R CMD BATCH method, though would be happy with an approach using
Rscript if there is a way to make it produce a comparable .Rout
file.
My impression is that R CMD BATCH is a bit of a relic. In any case, the more
recent Rscript executable (available on all platforms), together with
commandArgs() makes processing command line arguments pretty easy.
As an example, here is a little script -- call it "myScript.R" :
Not that I'd recommend it, but ... using a combination of source() and
sink() , you could get Rscript to produce an .Rout
file like that produced by R CMD BATCH . One way would be to create a little R
script -- call it RscriptEcho.R -- which you call directly with Rscript. It
might look like this:
## First read in the arguments listed at the command line
args = (commandArgs(TRUE))
## args is now a list of character vectors
## First check to see if arguments are passed.
## Then cycle through each element of the list and evaluate the expressions.
if (length(args) == 0) {
    print("No arguments supplied.")
    ## supply default values
    a = 1
    b = c(1,1,1)
} else {
    for (i in 1:length(args)) {
        eval(parse(text = args[[i]]))
    }
}
print(a*2)
print(b*3)
You need to put arguments before my_script.R and use - on the
arguments, e.g.
R CMD BATCH -blabla my_script.R
commandArgs() will receive -blabla as a character string in this
case. See the help for details:
$ R CMD BATCH --help
Usage: R CMD BATCH [options] infile [outfile]
Run R non-interactively with input from infile and place output (stdout
and stderr) to another file. If not given, the name of the output file
is the one of the input file, with a possible '.R' extension stripped,
and '.Rout' appended.
Options:
-h, --help print short help message and exit
-v, --version print version info and exit
--no-timing do not report the timings
-- end processing of options
Further arguments starting with a '-' are considered as options as long
as '--' was not encountered, and are passed on to the R process, which
by default is started with '--restore --save --no-readline'.
See also help('BATCH') inside R.
Here's another way to process command line args, using R CMD BATCH . My
approach, which builds on an earlier answer here , lets you specify
arguments at the command line and, in your R script, give some or all of them default values.
Here's an R file, which I name test.R :
defaults <- list(a=1, b=c(1,1,1))  ## default values of any arguments we might pass

## parse each command arg, loading it into global environment
for (arg in commandArgs(TRUE))
    eval(parse(text=arg))

## if any variable named in defaults doesn't exist, then create it
## with value from defaults
for (nm in names(defaults))
    assign(nm, mget(nm, ifnotfound=list(defaults[[nm]]))[[1]])

print(a)
print(b)
At the command line, if I type
R CMD BATCH --no-save --no-restore '--args a=2 b=c(2,5,6)' test.R
then within R we'll have a = 2 and b =
c(2,5,6) . But I could, say, omit b , and add in another argument
c :
R CMD BATCH --no-save --no-restore '--args a=2 c="hello"' test.R
Then in R we'll have a = 2 , b =
c(1,1,1) (the default), and c = "hello" .
Finally, for convenience we can wrap the R code in a function, as long as we're careful
about the environment:
## defaults should be either NULL or a named list
parseCommandArgs <- function(defaults=NULL, envir=globalenv()) {
for (arg in commandArgs(TRUE))
eval(parse(text=arg), envir=envir)
for (nm in names(defaults))
assign(nm, mget(nm, ifnotfound=list(defaults[[nm]]), envir=envir)[[1]], pos=envir)
}
## example usage:
parseCommandArgs(list(a=1, b=c(1,1,1)))
I'm trying to set up an easy to use R development environment for multiple users. R is
installed along with a set of other dev tools on an NFS mount.
I want to create a core set of R packages that also live on NFS so n users don't need to
install their own copies of the same packages n times. Then, I was hoping users can install
one off packages to a local R library. Has anyone worked with an R setup like this before?
From the doc, it looks doable by adding both the core package and personal package file paths
to .libPaths() .
You want to use the .Renviron file (see ?Startup ).
There are three places to put the file:
Site wide in R_HOME/etc/Renviron.site
Local in either the current working directory or the home area
In this file you can specify R_LIBS and the R_LIBS_SITE
environment variables.
For your particular problem, you probably want to add the NFS drive location to
R_LIBS_SITE in the R_HOME/etc/Renviron.site file.
I would like to upgrade one R package to the newer version which is already available. I
tried
update.packages(c("R2jags"))
but it does nothing! No output on console, no error, nothing. I used the same syntax as
for install.packages but perhaps I'm doing something wrong. I have been looking
at ?update.packages but I have not been able to figure out how it works, where to
specify the package(s) etc. There is no example. I also tried to update the package using
install.packages to "install" it again but that says "Warning: package
'R2jags' is in use and will not be installed" .
You can't do this I'm afraid, well, not with update.packages() . You need to
call install.packages("R2jags") instead.
You can't install R2jags in the current session because you have already loaded the
current version into the session. If you need to, save any objects you can't easily recreate,
and quit out of R. Then start a new R session, immediately run
install.packages("R2jags") , then once finished, load the package and reload in
any previously saved objects. You could try to unload the package with:
detach(package:R2jags, unload = TRUE)
but it is quite complex to do this cleanly unless the package cleans up after itself.
update.packages() exists to update all outdated packages in a stated library
location. That library location is given by the first argument (if not supplied it works on
all known library locations for the current R session). Hence you were asking it to update
the packages in a library location called R2jags , which is most unlikely to exist on your
R installation.
# The following two commands remove any previously installed H2O packages for R.
if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }
# Next, we download packages that H2O depends on.
pkgs <- c("RCurl","jsonlite")
for (pkg in pkgs) {
if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg) }
}
# Now we download, install and initialize the H2O package for R.
install.packages("h2o", type="source", repos="http://h2o-release.s3.amazonaws.com/h2o/rel-xia/2/R")
# Finally, let's load H2O and start up an H2O cluster
library(h2o)
h2o.init()
Whenever I install a new package with dependencies (e.g. htmltools depends
on lme4), I get errors like:
Error in .requirePackage(package) :
unable to find required package 'lme4'
although lme4 is installed and I have used it before. Other errors/warnings also appear, such as:
Warning in install.packages :
cannot remove prior installation of package 'Rcpp'
or:
Warning in install.packages :
unable to move temporary installation 'c:\...\file17b033a54a21\jsonlite' to 'c:\...\jsonlite'
If I install the packages twice they usually work, but sometimes dependencies to packages
that worked before are lost and I have to reinstall them again. Is there a way to circumvent
this?
Put this in a file named .REnviron in your Documents folder and
restart R:
R_LIBS=c:/R/mylibraries
From then on, you should be able to install packages into that location automatically,
without having to fiddle around with .libPaths .
I have read the R FAQS and other posts but I am a bit confused and would be grateful to know
whether I did everything correctly.
In Windows, in order to modify the default library folder I created a file
Renviron.site and put it inside E:/Programs/R-3.3.0/etc . The file has
only one line saying
R_LIBS=E:/Rlibrary
When I open R and run .libPaths() I see E:/Rlibrary as [1] and
the default R library E:/Programs/R-3.3.0/library as [2].
This should mean that from now on all packages I will install will go in
E:/Rlibrary but at the same time I will be able to load and use both packages in
this folder and those in the default location. Am I correct?
When you load a package via library , it will go through each directory in
.libPaths() in turn to find the required package. If the package hasn't been
found, you will get an error. This means you can have multiple versions of a package (in
different directories), but the package that will be used is determined by the order of
.libPaths() .
Regarding how .libPaths() is constructed, from ?.libPaths :
The library search path is initialized at startup from the environment variable 'R_LIBS'
(which should be a colon-separated list of directories at which R library trees are rooted)
followed by those in environment variable 'R_LIBS_USER'. Only directories which exist at
the time will be included.
Every time R starts, a number of files are read, in a particular order. The contents of
these files determine how R performs for the duration of the session. Note that these files
should only be changed with caution, as they may make your R version behave differently to
other R installations. This could reduce the reproducibility of your code.
Files in three folders are important in this process:
R_HOME , the directory in which R is installed . The etc
sub-directory can contain start-up files read early on in the start-up process. Find out
where your R_HOME is with the R.home() command.
HOME , the user's home directory . Typically this is
/home/username on Unix machines or C:\Users\username on Windows
(since Windows 7). Ask R where your home directory is with path.expand("~") (note
the use of the Unix-like tilde to represent the home directory).
R's current working directory . This is reported by getwd() .
It is important to know the location of the .Rprofile and
.Renviron set-up files that are being used out of these three options. R only uses
one .Rprofile and one .Renviron in any session: if you have a
.Rprofile file in your current project, R will ignore .Rprofile in
R_HOME and HOME . Likewise, .Rprofile in
HOME overrides .Rprofile in R_HOME . The same applies to
.Renviron : you should remember that adding project specific environment variables
with .Renviron will de-activate other .Renviron files.
To create a project-specific start-up script, simply create a .Rprofile file in
the project's root directory and start adding R code, e.g. via
file.edit(".Rprofile") . Remember that this will make .Rprofile in
the home directory be ignored. The following commands will open your .Rprofile
from within an R editor:
file.edit(file.path("~", ".Rprofile")) # edit .Rprofile in HOME
file.edit(".Rprofile") # edit project specific .Rprofile
Note that editing the .Renviron file in the same locations will have the same
effect. The following code will create a user specific .Renviron file (where API
keys and other cross-project environment variables can be stored), without overwriting any
existing file.
user_renviron = path.expand(file.path("~", ".Renviron"))
if(!file.exists(user_renviron)) # check to see if the file already exists
file.create(user_renviron)
file.edit(user_renviron) # open with another text editor if this fails
The location, contents and uses of each are outlined in more detail below.
3.3.1 The .Rprofile file
By default R looks for and runs .Rprofile files in the three locations
described above, in a specific order. .Rprofile files are simply R scripts that
run each time R runs and they can be found within R_HOME , HOME and
the project's home directory, found with getwd() . To check if you have a
site-wide .Rprofile , which will run for all users on start-up, check for the presence of
Rprofile.site in the R_HOME/etc directory. As outlined above, the .Rprofile located in your
home directory is user-specific.
Again, we can test whether this file exists using
file.exists("~/.Rprofile")
We can use R to create and edit .Rprofile (warning: do not overwrite your
previous .Rprofile - we suggest you try project-specific .Rprofile
first):
if(!file.exists("~/.Rprofile")) # only create if not already there
file.create("~/.Rprofile") # (don't overwrite it)
file.edit("~/.Rprofile")
3.3.2 Example .Rprofile settings
An .Rprofile file is just an R script that is run at start-up. The examples at
the bottom of the .Rprofile help file
help("Rprofile")
give clues as to the types of things we could place in our profile.
3.3.2.1 Setting options
The options() function gets and sets a number of global options. See
help("options") or simply type options() to get an idea of what we
can configure. In my .Rprofile file, options() is used to change, for example:
The R prompt, from the boring > to the exciting R> .
The number of digits displayed.
Removing the stars after significant p -values.
Typically we want to avoid adding options to the start-up file that make our code
non-portable. For example, adding
options(stringsAsFactors=FALSE)
to your start-up script has knock-on effects for read.table and related
functions including read.csv , making them convert text strings into characters
rather than into factors as is default. This may be useful for you, but it is dangerous as it
may make your code less portable.
3.3.2.2 Setting the CRAN mirror
To avoid setting the CRAN mirror each time you run install.packages you can
permanently set the mirror in your .Rprofile .
## local creates a new, empty environment
## This avoids polluting the global environment with
## the object r
local({
    r = getOption("repos")
    r["CRAN"] = "https://cran.rstudio.com/"
    options(repos = r)
})
The RStudio mirror is a virtual machine run by Amazon's EC2 service, and it syncs with the
main CRAN mirror in Austria once per day. Since RStudio is using Amazon's CloudFront, the
repository is automatically distributed around the world, so no matter where you are in the
world, the data doesn't need to travel very far, and is therefore fast to download. 3.3.2.3
The fortunes package
This section illustrates what .Rprofile does with reference to a package that
was developed for fun. The code below could easily be altered to automatically connect to a
database, or ensure that the latest packages have been downloaded.
The fortunes package contains a number of memorable quotes that the community has collected
over many years, called R fortunes. Each fortune has a number. To get fortune number 50, for
example, enter
fortunes::fortune(50)
It is easy to make R print out one of these nuggets of truth each time you start a session,
by adding a call to fortunes::fortune() to ~/.Rprofile .
The interactive function tests whether R is being used interactively in a
terminal. The fortune function is called within try . If the fortunes
package is not available, we avoid raising an error and move on. By using :: we
avoid adding the fortunes package to our list of attached packages.
The function .Last , if it exists in the .Rprofile , is always run
at the end of the session. We can use it to install the fortunes package if needed. To load the
package, we use require , since if the package isn't installed, the
require function returns FALSE and raises a warning.
You can also load useful functions in .Rprofile . For example, we could load
the following two functions for examining data frames:
## ht == headtail
ht = function(d, n=6) rbind(head(d, n), tail(d, n))
## Show the first 5 rows & first 5 columns of a data frame
hh = function(d) d[1:5, 1:5]
You could similarly define a function for setting up a nice plotting window.
Note that these functions are for personal use and are unlikely to interfere with code from
other people. For this reason even if you use a certain package every day, we don't recommend
loading it in your .Rprofile . Also beware the dangers of loading many functions
by default: it may make your code less portable. Another downside of putting functions in your
.Rprofile is that it can clutter up your workspace: when you run the
ls() command, your .Rprofile functions will appear. Also if you run
rm(list=ls()) , your functions will be deleted.
One neat trick to overcome this issue is to use hidden objects and environments. When an
object name starts with . , by default it doesn't appear in the output of the
ls() function
.obj = 1
".obj" %in% ls()
## [1] FALSE
This concept also works with environments. In the .Rprofile file we can create
a hidden environment
.env = new.env()
and then add functions to this environment
.env$ht = function(d, n = 6) rbind(head(d, n), tail(d, n))
At the end of the .Rprofile file, we use attach , which makes it
possible to refer to objects in the environment by their names alone.
attach(.env)
3.3.3 The .Renviron file
The .Renviron file is used to store system variables. It follows a similar
start up routine to the .Rprofile file: R first looks for a global
.Renviron file, then for local versions. A typical use of the
.Renviron file is to specify the R_LIBS path
## Linux
R_LIBS=~/R/library
## Windows
R_LIBS=C:/R/library
This variable points to a directory where R packages will be installed. When
install.packages is called, new packages will be stored in R_LIBS
.
Another common use of .Renviron is to store API keys that will be available
from one session to another. The
following line in .Renviron , for example, sets the ZEIT_KEY
environment variable, which is used in the diezeit package:
ZEIT_KEY=PUT_YOUR_KEY_HERE
You will need to sign-in and start a new R session for the environment variable (accessed by
Sys.getenv ) to be visible. To test if the example API key has been successfully
added as an environment variable, run the following:
Sys.getenv("ZEIT_KEY")
Use of the .Renviron file for storing settings such as library paths and API
keys is efficient because it reduces the need to update your settings for every R session.
Furthermore, the same .Renviron file will work across different platforms so keep
it stored safely.
3.3.4 Exercises
What are the three locations where the start-up files are stored? Where are these locations on your
computer?
For each location, does a .Rprofile or .Renviron file
exist?
Create a .Rprofile file in your current working directory that prints the
message Happy efficient R programming each time you start R at this
location.
I am trying to translate a Perl function into a Python function, but I am having trouble
figuring out what some of the Perl-to-Python function equivalents are.
Perl function:
sub reverse_hex {
    my $HEXDATE = shift;
    my @bytearry = ();
    my $byte_cnt = 0;
    my $max_byte_cnt = 8;
    my $byte_offset = 0;
    while ($byte_cnt < $max_byte_cnt) {
        my $tmp_str = substr($HEXDATE, $byte_offset, 2);
        push(@bytearry, $tmp_str);
        $byte_cnt++;
        $byte_offset += 2;
    }
    return join('', reverse(@bytearry));
}
I am not sure what "push", "shift", and "substr" are doing here that would be the same in
Python.
The Perl subroutine seems rather complicated for what it does, viz., taking chunks of two
chars at a time (the first 16 chars) from the string passed in and then reversing their order. Another Perl
option is:
sub reverse_hex {
    return join '', reverse unpack 'A2' x 8, $_[0];
}
First, unpack here takes two characters at a time (eight times) and produces
a list. That list is reverse d and join ed to produce the final
string.
Here's a Python subroutine to accomplish this:
def reverse_hex(HEXDATE):
    hexVals = [HEXDATE[i:i + 2] for i in xrange(0, 16, 2)]
    reversedHexVals = hexVals[::-1]
    return ''.join(reversedHexVals)
The list comprehension produces eight elements of two characters each. [::-1]
reverses the list's elements and the result is join ed and returned.
I realize that you are asking about the perl to python translation, but if you have any
control over the perl, I would like to point out that this function is a lot more complicated
than it needs to be.
The entire thing could be replaced with:
sub reverse_hex
{
    my $hexdate = shift;
    my @bytes = $hexdate =~ /../g;   # break $hexdate into array of character pairs
    return join '', reverse(@bytes);
}
Not only is this shorter, it is much easier to get your head around. Of course, if you
have no control over the perl, you are stuck with what you were dealt.
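For what it's worth, the same regex-based idea carries over to Python; a small sketch (assuming, as in the Perl versions, that only the first 16 characters matter):

import re

def reverse_hex(hexdate):
    # break the first 16 characters into two-character pairs, reverse the pairs, rejoin
    return ''.join(reversed(re.findall(r'..', hexdate[:16])))

print reverse_hex('0123456789abcdef')   # prints: efcdab8967452301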
Many distributions ship a lot of perl modules as packages.
Debian/Ubuntu: apt-cache search 'perl$'
Arch Linux: pacman -Ss '^perl-'
Gentoo: category dev-perl
You should always prefer them as you benefit from automatic (security) updates and
the ease of removal . This can be pretty tricky with the cpan tool itself.
For Gentoo there's a nice tool called g-cpan which builds/installs the
module from CPAN and creates a Gentoo package ( ebuild ) for you.
cpanminus is great for just getting stuff installed. It provides none of the more complex
functionality of CPAN or CPANPLUS, so it's easy to use, provided you know which module you
want to install. If you haven't already got cpanminus, just type:
# cpan App::cpanminus
to install it.
It is also possible to install cpanminus without using cpan at all: the basic bootstrap procedure
is to download the cpanm script directly (for example with curl from http://cpanmin.us) and run it with perl.
I note some folks suggesting one run cpan under sudo. That used to be necessary to install
into the system directory, but modern versions of the CPAN shell allow you to configure it to
use sudo just for installing. This is much safer, since it means that tests don't run as
root.
If you have an old CPAN shell, simply install the new cpan ("install CPAN") and when you
reload the shell, it should prompt you to configure these new directives.
Nowadays, when I'm on a system with an old CPAN, the first thing I do is update the shell
and set it up to do this so I can do most of my cpan work as a normal user.
Also, I'd strongly suggest that Windows users investigate Strawberry Perl . This is a version of Perl that comes
packaged with a pre-configured CPAN shell as well as a compiler. It also includes some
hard-to-compile Perl modules with their external C library dependencies, notably XML::Parser.
This means that you can do the same thing as every other Perl user when it comes to
installing modules, and things tend to "just work" a lot more often.
A couple of people mentioned the cpan utility, but it's more than just starting a shell. Just
give it the modules that you want to install and let it do its work.
$prompt> cpan Foo::Bar
If you don't give it any arguments it starts the CPAN.pm shell. This works on Unix, Mac,
and should be just fine on Windows (especially Strawberry Perl).
There are several other things that you can do with the cpan tool as well. Here's a
summary of the current features (which might be newer than the one that comes with CPAN.pm
and perl):
-a
Creates the CPAN.pm autobundle with CPAN::Shell->autobundle.
-A module [ module ... ]
Shows the primary maintainers for the specified modules
-C module [ module ... ]
Show the Changes files for the specified modules
-D module [ module ... ]
Show the module details. This prints one line for each out-of-date module (meaning,
modules locally installed but have newer versions on CPAN). Each line has three columns:
module name, local version, and CPAN version.
-L author [ author ... ]
List the modules by the specified authors.
-h
Prints a help message.
-O
Show the out-of-date modules.
-r
Recompiles dynamically loaded modules with CPAN::Shell->recompile.
-v
Print the script version and CPAN.pm version.
Otto made a
good suggestion . This works for Debian too, as well as any other Debian derivative. The
missing piece is what to do when apt-cache search doesn't find something.
This will give you a deb package that you can install to get Some::Random::Module. One of
the big benefits here is man pages and sample scripts in addition to the module itself will
be placed in your distro's location of choice. If the distro ever comes out with an official
package for a newer version of Some::Random::Module, it will automatically be installed when
you apt-get upgrade.
Lots of recommendations for CPAN.pm , which is great, but if you're using
Perl 5.10 then you've also got access to CPANPLUS.pm which is like
CPAN.pm but better.
And, of course, it's available on CPAN for people still using older versions
of Perl. Why not try:
On Fedora Linux or Enterprise Linux , yum also tracks
perl library dependencies. So, if the perl module is available, and some rpm package exports
that dependency, it will install the right package for you.
yum install 'perl(Chocolate::Belgian)'
(most likely perl-Chocolate-Belgian package, or even ChocolateFactory package)
Seems like you've already got your answer but I figured I'd chime in. This is what I do in
some scripts on an Ubuntu (or Debian) server.
#!/usr/bin/perl
use warnings;
use strict;
#I've gotten into the habit of setting this on all my scripts, prevents weird path issues if the script is not being run by root
$ENV{'PATH'} = '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin';
#Fill this with the perl modules required for your project
my @perl = qw(LWP::Simple XML::LibXML MIME::Lite DBI DateTime Config::Tiny Proc::ProcessTable);
chomp(my $curl = `which curl`);
if(!$curl){ system('apt-get install curl -y > /dev/null'); }
chomp(my $cpanm = system('/bin/bash', '-c', 'which cpanm &>/dev/null'));
#installs cpanm if missing
if($cpanm){ system('curl -s -L http://cpanmin.us | perl - --sudo App::cpanminus'); }
#loops through required modules and installs them if missing
foreach my $x (@perl){
    eval "use $x";
    if($@){
        system("cpanm $x");
        eval "use $x";
    }
}
This works well for me, maybe there is something here you can use.
I want to use a system command in Python. For example, in Perl we have
system("ls -la"); which runs ls -la . What is the equivalent function in Python? Thanks in
advance.
The direct equivalent is os.system("ls -la") . But if you want to do more advanced things with subprocesses, the subprocess
module provides a higher-level interface with more possibilities and is usually preferable.
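A small sketch of both options ( ls -la is just the example command from the question):

import os
import subprocess

# Closest equivalent of Perl's system(): run the command through the shell.
os.system("ls -la")

# Usually preferable: the subprocess module, without invoking a shell.
subprocess.call(["ls", "-la"])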
I'm using Python 2.6 on Linux. What is the fastest way:
to determine which partition contains a given directory or file?
For example, suppose that /dev/sda2 is mounted on /home ,
and /dev/mapper/foo is mounted on /home/foo . From the string
"/home/foo/bar/baz" I would like to recover the pair
("/dev/mapper/foo", "home/foo") .
and then, to get usage statistics of the given partition? For example, given
/dev/mapper/foo I would like to obtain the size of the partition and the free
space available (either in bytes or approximately in megabytes).
If you just need the free space on a device, see the answer using
os.statvfs() below.
If you also need the device name and mount point associated with the file, you should call
an external program to get this information. df will provide all the information
you need -- when called as df filename it prints a line about the partition that
contains the file.
Note that this is rather brittle, since it depends on the exact format of the
df output, but I'm not aware of a more robust solution. (There are a few
solutions relying on the /proc filesystem below that are even less portable than
this one.)
This doesn't give the name of the partition, but you can get the filesystem statistics
directly using the statvfs Unix system call. To call it from Python, use
os.statvfs('/home/foo/bar/baz')
.
unsigned long f_frsize   Fundamental file system block size.
fsblkcnt_t    f_blocks   Total number of blocks on file system in units of f_frsize.
fsblkcnt_t    f_bfree    Total number of free blocks.
fsblkcnt_t    f_bavail   Number of free blocks available to non-privileged processes.
So to make sense of the values, multiply by f_frsize :
import os
statvfs = os.statvfs('/home/foo/bar/baz')
statvfs.f_frsize * statvfs.f_blocks # Size of filesystem in bytes
statvfs.f_frsize * statvfs.f_bfree # Actual number of free bytes
statvfs.f_frsize * statvfs.f_bavail # Number of free bytes that ordinary users
# are allowed to use (excl. reserved space)
import os

def get_mount_point(pathname):
    "Get the mount point of the filesystem containing pathname"
    pathname = os.path.normcase(os.path.realpath(pathname))
    parent_device = path_device = os.stat(pathname).st_dev
    while parent_device == path_device:
        mount_point = pathname
        pathname = os.path.dirname(pathname)
        if pathname == mount_point: break
        parent_device = os.stat(pathname).st_dev
    return mount_point

def get_mounted_device(pathname):
    "Get the device mounted at pathname"
    # uses "/proc/mounts"
    pathname = os.path.normcase(pathname)  # might be unnecessary here
    try:
        with open("/proc/mounts", "r") as ifp:
            for line in ifp:
                fields = line.rstrip('\n').split()
                # note that the line above assumes that
                # no mount points contain whitespace
                if fields[1] == pathname:
                    return fields[0]
    except EnvironmentError:
        pass
    return None  # explicit

def get_fs_freespace(pathname):
    "Get the free space of the filesystem containing pathname"
    stat = os.statvfs(pathname)
    # use f_bfree for superuser, or f_bavail if filesystem
    # has reserved space for superuser
    return stat.f_bfree * stat.f_bsize
import os
from collections import namedtuple

disk_ntuple = namedtuple('partition', 'device mountpoint fstype')
usage_ntuple = namedtuple('usage', 'total used free percent')

def disk_partitions(all=False):
    """Return all mounted partitions as a namedtuple.
    If all == False return physical partitions only.
    """
    phydevs = []
    f = open("/proc/filesystems", "r")
    for line in f:
        if not line.startswith("nodev"):
            phydevs.append(line.strip())

    retlist = []
    f = open('/etc/mtab', "r")
    for line in f:
        if not all and line.startswith('none'):
            continue
        fields = line.split()
        device = fields[0]
        mountpoint = fields[1]
        fstype = fields[2]
        if not all and fstype not in phydevs:
            continue
        if device == 'none':
            device = ''
        ntuple = disk_ntuple(device, mountpoint, fstype)
        retlist.append(ntuple)
    return retlist

def disk_usage(path):
    """Return disk usage associated with path."""
    st = os.statvfs(path)
    free = (st.f_bavail * st.f_frsize)
    total = (st.f_blocks * st.f_frsize)
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    try:
        percent = (float(used) / total) * 100
    except ZeroDivisionError:
        percent = 0
    # NB: the percentage is about 5% lower than what df shows, due to
    # reserved blocks that we are currently not considering:
    # http://goo.gl/sWGbH
    return usage_ntuple(total, used, free, round(percent, 1))

if __name__ == '__main__':
    for part in disk_partitions():
        print part
        print "    %s\n" % str(disk_usage(part.mountpoint))
import os
from collections import namedtuple

DiskUsage = namedtuple('DiskUsage', 'total used free')

def disk_usage(path):
    """Return disk usage statistics about the given path.

    Will return the namedtuple with attributes: 'total', 'used' and 'free',
    which are the amount of total, used and free space, in bytes.
    """
    st = os.statvfs(path)
    free = st.f_bavail * st.f_frsize
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    return DiskUsage(total, used, free)
For the first point, you can try using os.path.realpath
to get a canonical path, check it against /etc/mtab (I'd actually suggest
calling getmntent , but I can't find a normal way to access it) to find the
longest match. (to be sure, you should probably stat both the file and the
presumed mountpoint to verify that they are in fact on the same device)
For the second point, use os.statvfs to get block
size and usage information.
(Disclaimer: I have tested none of this, most of what I know came from the coreutils
sources)
For the second part of your question, "get usage statistics of the given partition",
psutil makes this easy
with the disk_usage(path) function.
Given a path, disk_usage() returns a named tuple including total, used, and free
space expressed in bytes, plus the percentage usage.
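By way of illustration, here is a minimal sketch using psutil (a third-party package, pip install psutil), with the example path from the question; the longest-matching mount point identifies the partition:

import psutil

path = '/home/foo/bar/baz'   # example path from the question

# Find the partition whose mount point is the longest matching prefix of the path
# (a simple prefix match; corner cases such as '/home/foo' vs '/home/foobar' are ignored here).
parts = [p for p in psutil.disk_partitions(all=True) if path.startswith(p.mountpoint)]
part = max(parts, key=lambda p: len(p.mountpoint))
print part.device, part.mountpoint

# Usage statistics for that partition: total, used and free in bytes, plus percent used.
print psutil.disk_usage(part.mountpoint)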
Usually the /proc directory contains such information in Linux, it is a virtual
filesystem. For example, /proc/mounts gives information about current mounted
disks; and you can parse it directly. Utilities like top , df all
make use of /proc .
I know what my is in Perl. It defines a variable that exists only in the scope
of the block in which it is defined. What does our do? How does our
differ from my ?
Great question: How does our differ from my and what
does our do?
In Summary:
Available since Perl 5, my is a way to declare:
non-package variables, that are
private,
new ,
non-global variables,
separate from any package. So that the variable cannot be accessed in the form
of $package_name::variable .
On the other hand, our variables are:
package variables, and thus automatically
global variables,
definitely not private ,
nor are they necessarily new; and they
can be accessed outside the package (or lexical scope) with the qualified
namespace, as $package_name::variable .
Declaring a variable with our allows you to predeclare variables in
order to use them under use strict without getting typo warnings or
compile-time errors. Since Perl 5.6, it has replaced the obsolete use
vars , which was only file-scoped, and not lexically scoped as is
our .
For example, the formal, qualified name for variable $x inside package
main is $main::x . Declaring our $x allows you to use
the bare $x variable without penalty (i.e., without a resulting error), in the
scope of the declaration, when the script uses use strict or use
strict "vars" . The scope might be one, or two, or more packages, or one small
block.
The PerlMonks and PerlDoc links from cartman and Olafur are a great reference - below is my
crack at a summary:
my variables are lexically scoped within a single block defined by
{} or within the same file if not in {} s. They are not accessible
from packages/subroutines defined outside of the same lexical scope / block.
our variables are scoped within a package/file and accessible from any code
that use or require that package/file - name conflicts are resolved
between packages by prepending the appropriate namespace.
Just to round it out, local variables are "dynamically" scoped, differing
from my variables in that they are also accessible from subroutines called
within the same block.
use strict;

for (1 .. 2){
    # Both variables are lexically scoped to the block.
    our ($o);   # Belongs to 'main' package.
    my ($m);    # Does not belong to a package.

    # The variables differ with respect to newness.
    $o ++;
    $m ++;
    print __PACKAGE__, " >> o=$o m=$m\n";   # $m is always 1.

    # The package has changed, but we still have direct,
    # unqualified access to both variables, because the
    # lexical scope has not changed.
    package Fubb;
    print __PACKAGE__, " >> o=$o m=$m\n";
}
# The our() and my() variables differ with respect to privacy.
# We can still access the variable declared with our(), provided
# that we fully qualify its name, but the variable declared
# with my() is unavailable.
print __PACKAGE__, " >> main::o=$main::o\n"; # 2
print __PACKAGE__, " >> main::m=$main::m\n"; # Undefined.
# Attempts to access the variables directly won't compile.
# print __PACKAGE__, " >> o=$o\n";
# print __PACKAGE__, " >> m=$m\n";
# Variables declared with use vars() are like those declared
# with our(): belong to a package; not private; and not new.
# However, their scoping is package-based rather than lexical.
for (1 .. 9){
    use vars qw($uv);
    $uv ++;
}
# Even though we are outside the lexical scope where the
# use vars() variable was declared, we have direct access
# because the package has not changed.
print __PACKAGE__, " >> uv=$uv\n";
# And we can access it from another package.
package Bubb;
print __PACKAGE__, " >> main::uv=$main::uv\n";
Coping with Scoping
is a good overview of Perl scoping rules. It's old enough that our is not
discussed in the body of the text. It is addressed in the Notes section at the end.
The article talks about package variables and dynamic scope and how that differs from
lexical variables and lexical scope.
It's an old question, but I have run into some pitfalls with lexical declarations in Perl that
tripped me up, which are also related to this question, so I'll just add my summary here:
1. definition or declaration?
local $var = 42;
print "var: $var\n";
The output is var: 42 . However we couldn't tell if local $var =
42; is a definition or declaration. But how about this:
use strict;
use warnings;
local $var = 42;
print "var: $var\n";
The second program will throw an error:
Global symbol "$var" requires explicit package name.
$var is not defined, which means local $var; is just a
declaration! Before using local to declare a variable, make sure that it is
defined as a global variable previously.
But why doesn't this fail?
use strict;
use warnings;
local $a = 42;
print "var: $a\n";
The output is: var: 42 .
That's because $a , as well as $b , is a global variable
pre-defined in Perl. Remember the sort function?
2. lexical or global?
I was a C programmer before I started using Perl, so the concepts of lexical and global
variables seemed straightforward to me: they just correspond to automatic and external variables in C.
But there are subtle differences:
In C, an external variable is a variable defined outside any function block. On the other
hand, an automatic variable is a variable defined inside a function block. Like this:
int global;
int main(void) {
int local;
}
While in Perl, things are subtle:
sub main {
$var = 42;
}
&main;
print "var: $var\n";
The output is var: 42 : $var is a global variable even though it's
defined in a function block! Actually in Perl, any variable is declared as global by
default.
The lesson is to always add use strict; use warnings; at the beginning of a
Perl program, which will force the programmer to declare lexical variables explicitly, so
that we don't get tripped up by mistakes that are easy to take for granted.
Unlike my, which both allocates storage for a variable and associates a simple name with
that storage for use within the current scope, our associates a simple name with a package
variable in the current package, for use within the current scope. In other words, our has
the same scoping rules as my, but does not necessarily create a variable.
This is only somewhat related to the question, but I've just discovered a (to me) obscure bit
of perl syntax that you can use with "our" (package) variables that you can't use with "my"
(local) variables.
print "package is: " . __PACKAGE__ . "\n";
our $test = 1;
print "trying to print global var from main package: $test\n";
package Changed;
{
my $test = 10;
my $test1 = 11;
print "trying to print local vars from a closed block: $test, $test1\n";
}
&Check_global;
sub Check_global {
print "trying to print global var from a function: $test\n";
}
print "package is: " . __PACKAGE__ . "\n";
print "trying to print global var outside the func and from \"Changed\" package: $test\n";
print "trying to print local var outside the block $test1\n";
Will Output this:
package is: main
trying to print global var from main package: 1
trying to print local vars from a closed block: 10, 11
trying to print global var from a function: 1
package is: Changed
trying to print global var outside the func and from "Changed" package: 1
trying to print local var outside the block
If you add "use strict" , attempting to run the script will fail with:
Global symbol "$test1" requires explicit package name at ./check_global.pl line 24.
Execution of ./check_global.pl aborted due to compilation errors.
#!/usr/local/bin/perl
use feature ':5.10';
#use warnings;
package a;
{
my $b = 100;
our $a = 10;
print "$a \n";
print "$b \n";
}
package b;
#my $b = 200;
#our $a = 20 ;
print "in package b value of my b $a::b \n";
print "in package b value of our a $a::a \n";
#!/usr/bin/perl -l
use strict;
# if the line below is commented out, this prints 'lol'; if it is enabled, it prints 'eeeeeeeeeee'
#my $lol = 'eeeeeeeeeee' ;
# no errors or warnings in either case, despite 'strict'
our $lol = eval {$lol} || 'lol' ;
print $lol;
Let us think what an interpreter actually is: it's a piece of code that stores values in
memory and lets the instructions in a program that it interprets access those values by their
names, which are specified inside these instructions. So, the big job of an interpreter is to
shape the rules of how we should use the names in those instructions to access the values
that the interpreter stores.
On encountering "my", the interpreter creates a lexical variable: a named value that the
interpreter can access only while it executes a block, and only from within that syntactic
block. On encountering "our", the interpreter makes a lexical alias of a package variable: it
binds a name, which the interpreter is supposed from then on to process as a lexical
variable's name, until the block is finished, to the value of the package variable with the
same name.
The effect is that you can then pretend that you're using a lexical variable and bypass
the rules of 'use strict' on full qualification of package variables. Since the interpreter
automatically creates package variables when they are first used, the side effect of using
"our" may also be that the interpreter creates a package variable as well. In this case, two
things are created: a package variable, which the interpreter can access from everywhere,
provided it's properly designated as requested by 'use strict' (prepended with the name of
its package and two colons), and its lexical alias.
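A small sketch of that double effect (the Counter package name is just an illustration):
use strict;
use warnings;
package Counter;
our $count = 1;              # creates the package variable $Counter::count plus a lexical alias $count
package main;
print "$count\n";            # prints 1: the lexical alias is still in scope and still points at $Counter::count
print "$Counter::count\n";   # prints 1: the package variable itself, reachable from anywhere when fully qualified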
is it possible to import ( use ) a perl module within a different namespace?
Let's say I have a Module A (XS Module with no methods Exported
@EXPORT is empty) and I have no way of changing the module.
This Module has a Method A::open
currently I can use that Module in my main program (package main) by calling
A::open I would like to have that module inside my package main so
that I can directly call open
I tried to manually push every key of %A:: into %main:: however
that did not work as expected.
The only way that I know to achieve what I want is by using package A; inside
my main program, effectively changing the package of my program from main to
A . Im not satisfied with this. I would really like to keep my program inside
package main.
Is there any way to achieve this and still keep my program in package main?
Offtopic: Yes, I know you usually would not want to import everything into your
namespace, but this module is used by us extensively and we don't want to type A:: (and the
actual module name is much longer, which doesn't make the situation any better) in front of hundreds
or thousands of calls.
This is one of those "impossible" situations, where the clear solution -- to rework that
module -- is off limits.
But, you can alias that package's subs names, from its symbol table, to the same
names in main . Worse than being rude, this comes with a glitch: it catches all
names that that package itself imported in any way. However, since this package is a fixed
quantity it stands to reason that you can establish that list (and even hard-code it). It is
just this one time, right?
In the main script:
use warnings;
use strict;
use feature 'say';
use OffLimits;
GET_SUBS: {
# The list of names to be excluded
my $re_exclude = qr/^(?:BEGIN|import)$/; # ...
my @subs = grep { !/$re_exclude/ } sort keys %OffLimits::;
no strict 'refs';
for my $sub_name (@subs) {
*{ $sub_name } = \&{ 'OffLimits::' . $sub_name };
}
};
my $name = name('name() called from ' . __PACKAGE__);
my $id = id('id() called from ' . __PACKAGE__);
say "name() returned: $name";
say "id() returned: $id";
with OffLimits.pm
package OffLimits;
use warnings;
use strict;
sub name { return "In " . __PACKAGE__ . ": @_" }
sub id { return "In " . __PACKAGE__ . ": @_" }
1;
It prints
name() returned: In OffLimits: name() called from main
id() returned: In OffLimits: id() called from main
You may need that code in a BEGIN block, depending on other details.
Another option is of course to hard-code the subs to be "exported" (in @subs
). Given that the module is in practice immutable this option is reasonable and more
reliable.
This can also be wrapped in a module, so that you have the normal, selective,
importing.
WrapOffLimits.pm
package WrapOffLimits;
use warnings;
use strict;
use OffLimits;
use Exporter qw(import);
our @sub_names;
our @EXPORT_OK = @sub_names;
our %EXPORT_TAGS = (all => \@sub_names);
BEGIN {
# Or supply a hard-coded list of all module's subs in @sub_names
my $re_exclude = qr/^(?:BEGIN|import)$/; # ...
@sub_names = grep { !/$re_exclude/ } sort keys %OffLimits::;
no strict 'refs';
for my $sub_name (@sub_names) {
*{ $sub_name } = \&{ 'OffLimits::' . $sub_name };
}
};
1;
and now in the caller you can import either only some subs
use WrapOffLimits qw(name);
or all
use WrapOffLimits qw(:all);
with otherwise the same main as above for a test.
The module name is hard-coded, which should be OK as this is meant only for that
module.
The following is added mostly for completeness.
One can pass the module name to the wrapper by writing one's own import sub,
which is what gets used then. The import list can be passed as well, at the expense of an
awkward interface of the use statement.
It goes along the lines of
package WrapModule;
use warnings;
use strict;
use OffLimits;
use Exporter qw(); # will need our own import
our ($mod_name, @sub_names);
our @EXPORT_OK = @sub_names;
our %EXPORT_TAGS = (all => \@sub_names);
sub import {
my $mod_name = splice @_, 1, 1; # remove mod name from @_ for goto
my $re_exclude = qr/^(?:BEGIN|import)$/; # etc
no strict 'refs';
@sub_names = grep { !/$re_exclude/ } sort keys %{ $mod_name . '::'};
for my $sub_name (@sub_names) {
*{ $sub_name } = \&{ $mod_name . '::' . $sub_name };
}
push @EXPORT_OK, @sub_names;
goto &Exporter::import;
}
1;
what can be used as
use WrapModule qw(OffLimits name id); # or (OffLimits :all)
or, with the list broken up so as to remind the user of the unusual interface
use WrapModule 'OffLimits', qw(name id);
When used with the main above this prints the same output.
The use statement ends up using the import sub defined in the module, which
exports symbols by writing to the caller's symbol table. (If no import sub is
written then the Exporter 's import method is nicely used, which is
how this is normally done.)
This way we are able to unpack the arguments and have the module name supplied at
use invocation. With the import list supplied as well now we have to
push manually to @EXPORT_OK since this can't be in the
BEGIN phase. In the end the sub is replaced by Exporter::import via
the (good form of) goto , to complete the job.
You can forcibly "import" a function into main using glob assignment to alias the subroutine
(and you want to do it in BEGIN so it happens at compile time, before calls to that
subroutine are parsed later in the file):
use strict;
use warnings;
use Other::Module;
BEGIN { *open = \&Other::Module::open }
However, another problem you might have here is that open is a builtin function, which may
cause some problems. You can add
use subs 'open'; to indicate that you want to override the built-in function in
this case, since you aren't using an actual import function to do so.
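Putting those two pieces together, a minimal sketch might look like this (Other::Module and its open are the stand-ins from the snippet above, and the argument string is made up):
use strict;
use warnings;
use subs 'open';                          # predeclare open so calls resolve to our sub, not the builtin
use Other::Module;

BEGIN { *open = \&Other::Module::open }   # alias the sub into this package at compile time

open('whatever Other::Module::open expects');   # now calls Other::Module::open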
Here is what I now came up with. Yes, this is hacky, and yes, I also feel like I opened Pandora's
box with this. However, at least a small dummy program ran perfectly fine.
I renamed the module in my code again. In my original post I used the example
A::open ; actually, this module does not contain any method/variable reserved by
the Perl core. This is why I blindly import everything here.
BEGIN {
# using the caller to determine the parent. Usually this is main but maybe we want it somewhere else in some cases
my ($parent_package) = caller;
package A;
foreach (keys(%A::)) {
if (defined $$_) {
eval '*'.$parent_package.'::'.$_.' = \$A::'.$_;
}
elsif (%$_) {
eval '*'.$parent_package.'::'.$_.' = \%A::'.$_;
}
elsif (@$_) {
eval '*'.$parent_package.'::'.$_.' = \@A::'.$_;
}
else {
eval '*'.$parent_package.'::'.$_.' = \&A::'.$_;
}
}
}
I have a Perl module (Module.pm) that initializes a number of variables, some of which I'd
like to import ($VAR2, $VAR3) into additional submodules that it might load during execution.
The way I'm currently setting up Module.pm is as follows:
package Module;
use warnings;
use strict;
use vars qw($SUBMODULES $VAR1 $VAR2 $VAR3);
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT = qw($VAR2 $VAR3);
sub new {
my ($package) = @_;
my $self = {};
bless ($self, $package);
return $self;
}
sub SubModules1 {
my $self = shift;
if($SUBMODULES->{'1'}) { return $SUBMODULES->{'1'}; }
# Load & cache submodule
require Module::SubModule1;
$SUBMODULES->{'1'} = Module::SubModule1->new(@_);
return $SUBMODULES->{'1'};
}
sub SubModules2 {
my $self = shift;
if($SUBMODULES->{'2'}) { return $SUBMODULES->{'2'}; }
# Load & cache submodule
require Module::SubModule2;
$SUBMODULES->{'2'} = Module::SubModule2->new(@_);
return $SUBMODULES->{'2'};
}
Each submodule is structured as follows:
package Module::SubModule1;
use warnings;
use strict;
use Carp;
use vars qw();
sub new {
my ($package) = @_;
my $self = {};
bless ($self, $package);
return $self;
}
I want to be able to import the $VAR2 and $VAR3 variables into each of the submodules
without having to reference them as $Module::VAR2 and $Module::VAR3. I noticed that the
calling script is able to access both the variables that I have exported in Module.pm in the
desired fashion but SubModule1.pm and SubModule2.pm still have to reference the variables as
being from Module.pm.
I tried updating each submodule as follows, which unfortunately didn't work as I was
hoping:
package Module::SubModule1;
use warnings;
use strict;
use Carp;
use vars qw($VAR2 $VAR3);
sub new {
my ($package) = @_;
my $self = {};
bless ($self, $package);
$VAR2 = $Module::VAR2;
$VAR3 = $Module::VAR3;
return $self;
}
Please let me know how I can successfully export $VAR2 and $VAR3 from Module.pm into each
Submodule. Thanks in advance for your help!
Calling use Module from another package (say
Module::Submodule9 ) will try to run the Module::import method.
Since you don't have that method, it will call the Exporter::import method, and
that is where the magic that exports Module 's variables into the
Module::Submodule9 namespace will happen.
In your program there is only one Module namespace and only one instance of
the (global) variable $Module::VAR2 . Exporting creates aliases to this variable
in other namespaces, so the same variable can be accessed in different ways. Try this in a
separate script:
package Whatever;
use Module;
use strict;
use vars qw($VAR2);
$Module::VAR2 = 5;
print $Whatever::VAR2; # should be 5.
$VAR2 = 14; # same as $Whatever::VAR2 = 14
print $Module::VAR2; # should be 14
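So, assuming Module.pm stays as shown above, a minimal sketch of a fixed submodule is simply to add use Module; to it (the report sub is only there to demonstrate the point):
package Module::SubModule1;
use warnings;
use strict;
use Carp;
use Module;    # runs Exporter's import, which aliases $VAR2 and $VAR3 into this package

sub new {
    my ($package) = @_;
    my $self = {};
    bless($self, $package);
    return $self;
}

sub report {
    print "VAR2=$VAR2 VAR3=$VAR3\n";    # no Module:: prefix needed any more
}
1;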
In M.pm:
package M;
use strict;
use warnings;
#our is better than "use vars" for creating package variables
#it creates an alias to $M::foo named $foo in the current lexical scope
our $foo = 5;
sub inM { print "$foo\n" }
1;
In M/S.pm
package M;
#creates an alias to $M::foo that will last for the entire scope,
#in this case the entire file
our $foo;
package M::S;
use strict;
use warnings;
sub inMS { print "$foo\n" }
1;
In the script:
#!/usr/bin/perl
use strict;
use warnings;
use M;
use M::S;
M::inM();
M::S::inMS();
But I would advise against this. Global variables are not a good practice, and sharing
global variables between modules is even worse.
Recovery is possible but it depends on what caused the corruption.
If the file is just truncated, getting some partial result out is not too hard; just
run
gunzip < SMS.tar.gz > SMS.tar.partial
which will give some output despite the error at the end.
If the compressed file has large missing blocks, it's basically hopeless after the bad
block.
If the compressed file is systematically corrupted in small ways (e.g. transferring the
binary file in ASCII mode, which smashes carriage returns and newlines throughout the file),
it is possible to recover but requires quite a bit of custom programming, it's really only
worth it if you have absolutely no other recourse (no backups) and the data is worth a lot of
effort. (I have done it successfully.) I mentioned this scenario in a previous
question .
The answers for .zip files differ somewhat, since zip archives have multiple
separately-compressed members, so there's more hope (though most commercial tools are rather
bogus, they eliminate warnings by patching CRCs, not by recovering good data). But your
question was about a .tar.gz file, which is an archive with one big member.
Here is one possible scenario that we encountered. We had a tar.gz file that would not
decompress, trying to unzip gave the error:
All you have to do is git push origin master , where origin is the
default name (alias) of your remote repository and master is the remote branch
you want to push your changes to.
Last I checked, the Error module was deprecated. But here's how you would do it without that module:
eval {
die "Oops!";
1;
} or do {
my $e = $@;
print("Something went wrong: $e\n");
};
Basically, use eval instead of try ,
die instead of
throw , and look for the exception in $@ . The true value at the end
of the eval block is part of an idiom to prevent $@ from unintentionally
changing before it is used again in Perl versions older than 5.14, see P::C::P::ErrorHandling::RequireCheckingReturnValueOfEval
for details. For example, this code suffers from this flaw.
# BAD, DO NOT USE WITH PERLS OLDER THAN 5.14
eval {
die "Oops!";
};
if (my $e = $@) {
print("Something went wrong: $e\n");
}
# BAD, DO NOT USE WITH PERLS OLDER THAN 5.14
But note that many Perl operations do not raise exceptions when they fail; they simply
return an error code. This behavior can be altered via autodie for builtins and standard modules. If
you're using autodie , then the standard way of doing try/catch is this
(straight out of the autodie perldoc):
use feature qw(switch);
eval {
use autodie;
open(my $fh, '<', $some_file);
my @records = <$fh>;
# Do things with @records...
close($fh);
};
given ($@) {
when (undef) { say "No error"; }
when ('open') { say "Error from open"; }
when (':io') { say "Non-open, IO error."; }
when (':all') { say "All other autodie errors." }
default { say "Not an autodie error at all." }
}
The consensus of the Perl community seems to be that Try::Tiny is the preferred way of doing exception
handling. The "lenient policy" you refer to is probably due to a combination of:
Perl not being a fully object-oriented language. (e.g. in contrast to Java where you
can't avoid dealing with exceptions.)
The background of many Perl developers. (Languages like C 1 and shell don't
have exception mechanisms.)
The kind of tasks people tend to use Perl for. (Small scripts for text munging and
report generation where exception handling isn't needed.)
Perl not having a (good) built-in exception mechanism.
Note that the last item means that you'll see a lot of code that rolls its own exception handling with a bare eval block and a direct check of $@ afterwards.
That's exception handling even though it doesn't use try/catch syntax. It's fragile,
though, and will break in a number of subtle edge cases that most people don't think
about.
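A typical instance of that fragile pattern looks roughly like this (risky_operation() stands in for whatever might die):
eval {
    risky_operation();
};
if ($@) {
    # fragile: $@ is a global, so a destructor that runs its own eval during
    # stack unwinding can clobber it, and an exception object that stringifies
    # or boolifies to a false value will slip straight past this test
    warn "caught: $@";
}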
Try::Tiny and the other exception handling modules on CPAN were written to make it easier
to get right.
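For comparison, a minimal Try::Tiny sketch of the same thing:
use Try::Tiny;

try {
    risky_operation();
}
catch {
    warn "caught: $_";    # inside catch the error is in $_ (and in @_), not in $@
}
finally {
    # optional cleanup that runs whether or not an exception was thrown
};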
1. C does have setjmp() and longjmp() , which can be used
for a very crude form of exception handling.
Never test $@ as is, because it is a global variable: code that runs between the eval and the test (a destructor or a signal handler, for example) can change it.
General eval-template:
my $result;
eval {
$result= something();
# ...
1; # ok
} or do {
my $eval_error= $@ || "error";
# ...
die $eval_error;
}; # needs a semicolon
In practice that is the lightest way. It still leaves a little room for funny $@ behaviour,
but nothing that has really concerned me.
"... It baffles me the most because the common objection to Perl is legibility. Even if you assume that the objection is made from ignorance - i.e. not even having looked at some Perl to gauge its legibility - the nonsense you see in a complex bash script is orders of magnitude worse! ..."
"... Maybe it's not reassuring to hear that, but I took an interest in Perl precisely because it's seen as an underdog and "dead" despite having experienced users and a lot of code, kind of like TCL, Prolog, or Ada. ..."
"... There's a long history of bad code written by mediocre developers who became the only one who could maintain the codebase until they no longer worked for the organization. The next poor sap to go in found a mess of a codebase and did their best to not break it further. After a few iterations, the whole thing is ready for /dev/null and Perl gets the blame. ..."
"... All in all, Perl is still my first go-to language, but there are definitely some things I wish it did better. ..."
"... The Perl leadership Osborned itself with Perl6. 20/20 hindsight says the new project should have been given a different name at conception, that way all the "watch this space -- under construction" signage wouldn't have steered people away from perfectly usable Perl5. Again, IMO. ..."
"... I don't observe the premise at all though. Is bash really gaining ground over anything recently? ..."
"... Python again is loved, because "taught by rote" idiots. Now you can give them pretty little packages. And it's no wonder they can do little better than be glorified system admins (which id rather have a real sys admin, since he's likely to understand Perl) ..."
"... Making a new language means lots of new training. Lots of profit in this. Nobody profits from writing new books on old languages. Lots of profit in general from supporting a new language. In the end, owning the language gets you profits. ..."
"... And I still don't get why tab for blocks python is even remotely more readable than Perl. ..."
"... If anything, JavaScript is pretty dang godly at what it does, I understand why that's popular. But I don't get python one bit, except to employ millions of entry level minions who can't think on their own. ..."
"... "Every teacher I know has students using it. We do it because it's an easy language, there's only one way to do it, and with whitespace as syntax it's easy to grade. We don't teach it because it is some powerful or exceptional language. " ..."
Setting aside Perl vs. Python for the moment, how did Perl lose ground to Bash? It used to be that Bash scripts often got replaced
by Perl scripts because Perl was more powerful. Even with very modern versions of Bash, Perl is much more powerful.
The Linux Standards Base (LSB) has helped ensure that certain tools are in predictable locations. Bash has gotten a bit more powerful
since the release of 4.x, sure. Arrays, still handicapped compared to real multi-dimensional arrays, have improved somewhat. There is a native regex engine in Bash
3.x, which I admit is a big deal. There is also support for hash maps.
This is all good stuff for Bash. But, none of this is sufficient to explain why Perl isn't the thing you learn after Bash, or,
after Bash and Python; take your pick. Thoughts?
Because Perl has suffered immensely in the popularity arena and is now viewed as undesirable. It's not that Bash is seen as
an adequate replacement for Perl, that's where Python has landed.
- "thou must use Moose for everything" -> "Perl is too slow" -> rewrite in Python because the architect loves Python -> Python
is even slower -> architect shunned by the team and everything new written in Go, nobody dares to complain about speed now because
the budget people don't trust them -> Perl is slow
- "globals are bad, singletons are good" -> spaghetti -> Perl is unreadable
- "lets use every single item from the gang of four book" -> insanity -> Perl is bad
- "we must be more OOP" -> everything is a faux object with everything else as attributes -> maintenance team quits and they
all take PHP jobs, at least the PHP people know their place in the order of things and do less hype-driven-development -> Perl
is not OOP enough
- "CGI is bad" -> app needs 6.54GB of RAM for one worker -> customer refuses to pay for more RAM, fires the team, picks a PHP
team to do the next version -> PHP team laughs all the way to the bank, chanting "CGI is king"
It baffles me the most because the common objection to Perl is legibility. Even if you assume that the objection is made
from ignorance - i.e. not even having looked at some Perl to gauge its legibility - the nonsense you see in a complex bash script
is orders of magnitude worse!
Not to mention its total lack of common language features like first-class data and... Like, a compiler...
I no longer write bash scripts because it takes about 5 lines to become unmaintainable.
When I discuss projects with peers and mention that I chose to develop in Perl, the responses range from passive bemusement,
to scorn, to ridicule. The assumption is usually that I'm using a dead language that's crippled in functionality and uses syntax
that will surely make everyone's eyes bleed to read. This is the culture everywhere from the casual hackers to the C-suite.
I've proven at work that I can write nontrivial software using Perl. I'm still asked to use Python or Go (edit: or node, ugh)
for any project that'll have contributors from other teams, or to containerize apps using Docker to remove the need for Perl knowledge
for end-users (no CPAN, carton, etc.). But I'll take what I can get, and now the attitude has gone from "get with the times" or
"that's cute", to "ok but I don't expect everyone else to know it".
Perl has got a lot to offer, and I vastly enjoy using it over other languages I work with. I know that all the impassioned
figures in the Perl community love it just the same, but the community's got some major fragmentation going on. I understand that
everyone's got ideas about the future of the language, but is this really the best time to pull the community apart? I feel like
if everyone was able to let go of their ego and put their heads together to bring us to a point of stability, even a place where
we're not laughed at for professing our support for the language, it would be a major step in the right direction. I think we're
heading to the bottom fast, otherwise.
In that spirit of togetherness, I think the language, particularly the community, needs to be made more accessible to newcomers.
Not accessible to one Perl offshoot, but accessible to Perl. It needs to be decided what Perl means in today's day and age. What
can it do? Why would I want to use it over another shiny language? What are the definitive places I can go to learn more? Who
else will be there? How do I contribute and grow as a Perl developer? There need to be people talking about Perl in places that
aren't necessarily hubs for other Perl enthusiasts. It needs to be something business decision-makers can look at and feel confident
in using.
I really hope something changes. I'd be pretty sad if I had to spend the rest of my career writing whatever the trendy
language of the day is. These are just observations from someone that likes writing Perl and has been watching from the sidelines.
Maybe it's not reassuring to hear that, but I took an interest in Perl precisely because it's seen as an underdog and "dead"
despite having experienced users and a lot of code, kind of like TCL, Prolog, or Ada.
Being able to read Modern Perl for
free also helped a lot. I'm still lacking experience in Perl and I've yet to write anything of importance in it because I don't
see an area in which it's clearly better than anything else, either because of the language, a package, or a framework, and I
don't do a lot of text-munging anymore (I'm also a fan of awk so for small tasks it has the priority).
Don't call it Perl. Unfortunately. Also, IME multitasking in Perl5 (or the lack thereof, and/or severe issues with it) has been
a detriment to its standing in a "multithread all the things" world.
So often I see people drag themselves down that "thread my app" path. Eventually realize that they are implementing a whole
multi-processing operating system inside their app rather than taking advantage of the perfectly good one they are running on.
There are several perfectly good ways to do concurrency, multitasking, async IO and so on in perl. Many work well in the single
node case and in the multi-node case. Anyone who tells you that multitasking systems are easy because of some implementation language
choice has not made it through the whole Dunning Kruger cycle yet.
Multithreading is never easy. The processors will always manage to do things in a "wrong" order unless you are very careful
with your gatekeeping. However, other languages/frameworks have paradigms that make it seem easier such that those race conditions
show up much later in your product lifecycle.
There's a long history of bad code written by mediocre developers who became the only one who could maintain the codebase
until they no longer worked for the organization. The next poor sap to go in found a mess of a codebase and did their best to
not break it further. After a few iterations, the whole thing is ready for /dev/null and Perl gets the blame.
Bash has limitations, but that (usually) means fewer ways to mess it up. There's less domain knowledge to learn, (afaik) no
CPAN equivalent, and fewer issues with things like "I need to upgrade this but I can't because this other thing uses this older
version which is incompatible with the newer version so now we have to maintain two versions of the library and/or interpreter."
All in all, Perl is still my first go-to language, but there are definitely some things I wish it did better.
Perl has a largish executable memory-footprint*. If that gets in your way (which can happen in tight spaces such as semi/embedded),
you've got two choices: if it's shellable code, go to bash; otherwise, port to C. Or at least, that's my decision tree, and Perl5
is my go-to language. I use bash only when I must, and I hit the books every time.
The Perl leadership Osborned itself with Perl6. 20/20 hindsight says the new project should have been given a different
name at conception, that way all the "watch this space -- under construction" signage wouldn't have steered people away from perfectly
usable Perl5. Again, IMO.
*[e:] Consider, not just core here, but CPAN pull-in as well. I had one project clobbered on a smaller-memory machine when
I tried to set up a pure-Perl scp transfer -- there wasn't room enough for the full file to transfer if it was larger than about
50k, what with all the CPAN. Shelling to commandline scp worked just fine.
To be fair, wrapping a Perl script around something that's (if I read your comment right) just running SCP is adding a pointless
extra layer of complexity anyway.
It's a matter of using the best tool for each particular job, not just sticking with one. My own ~/bin directory has a big
mix of Perl and pure shell, depending on the complexity of the job to be done.
Agreed; I brought that example up to illustrate the bulk issue. In it, I was feeling my way, not sure how much finagling I
might have to do for the task (backdoor-passing legitimate sparse but possibly quite bulky email from one server to another),
which is why I initially went for the pure-Perl approach, so I'd have the mechanics exposed for any needed hackery. The experience
taught me to get by more on shelling to precompiled tooling where appropriate... and a healthy respect for CPAN pull-in, [e:]
the way that this module depends on that module so it gets pulled in along with its dependencies in turn, and the pileup
grows in memory. There was a time or two here and there where I only needed a teeny bit of what a module does, so I went in and
studied the code, then implemented it internally as a function without the object's generalities and bulk. The caution learned
on ancient x86 boxes now seems appropriate on ARM boards like rPi; what goes around comes around.
wouldn't have steered people away from perfectly usable Perl5
Perl5 development was completely stalled at the time. Perl6 brought not only new blood into its own effort, it reinvigorated
Perl5 in the process.
It's completely backwards to suggest Perl 5 was fine until perl6 came along. It was almost dormant and became a lively language
after Perl 6 was announced.
Perl is better than pretty much everything out there at what it does.
But keep in mind,
They say C sharp is loved by everyone, when in reality it's Microsoft pushing their narrative and the army of "learn by rote"
engineers In developing countries
Python again is loved, because "taught by rote" idiots. Now you can give them pretty little packages. And it's no wonder
they can do little better than be glorified system admins (which id rather have a real sys admin, since he's likely to understand
Perl)
Making a new language means lots of new training. Lots of profit in this. Nobody profits from writing new books on old
languages. Lots of profit in general from supporting a new language. In the end, owning the language gets you profits.
And I still don't get why tab for blocks python is even remotely more readable than Perl.
If anything, JavaScript is pretty dang godly at what it does, I understand why that's popular. But I don't get python one
bit, except to employ millions of entry level minions who can't think on their own.
I know a comp sci professor. I asked why he thought Python was so popular.
"Every teacher I know has students using it. We do it because it's an easy language, there's only one way to do it, and
with whitespace as syntax it's easy to grade. We don't teach it because it is some powerful or exceptional language. "
Then he said if he really needs to get something done, it's Perl or C.
Perl has a steeper and longer learning curve than Python, and there is more than one way to do anything. And there
are quite a few who continue coding in it.
I'm trying to parse a single string and get multiple chunks of data out from the same string
with the same regex conditions. I'm parsing a single HTML doc that is static (For an
undisclosed reason, I can't use an HTML parser to do the job.) I have an expression that
looks like:
$string =~ /\<img\ssrc\="(.*)"/;
and I want to get the value of $1. However, in the one string there are many img tags
like this, so I need something like an array returned (@1?). Is this possible?
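There is no @1, but a match with the /g modifier in list context returns every capture, which is the usual answer here (a sketch only; it is still not a real HTML parser):
my @srcs = $string =~ /<img\s+src="(.*?)"/g;   # one element per img tag; .*? stops at the first closing quote
print "$_\n" for @srcs;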
Vim can indent bash scripts, though it does not reformat them beyond fixing the indentation.
Back up your bash script, open it with vim, type gg=GZZ and the indentation will be
corrected. (Note for the impatient: this overwrites the file, so be sure to do that backup!)
There are some bugs with << here-documents, though: the EOF delimiter is expected to be the
first character on a line.
The granddaddy of HTML tools, with support for modern standards.
There used to be a fork called tidy-html5, which has since become the official version. Here is its
GitHub repository.
Tidy is a console application for Mac OS X, Linux, Windows, UNIX, and more. It corrects and
cleans up HTML and XML documents by fixing markup errors and upgrading legacy code to modern
standards.
For your needs, here is the command line to call Tidy:
Given a filename in the form someletters_12345_moreleters.ext , I want to extract the 5
digits and put them into a variable.
So to emphasize the point, I have a filename with x number of characters, then a five-digit
sequence surrounded by a single underscore on either side, then another set of x number of
characters. I want to take the 5 digit number and put that into a variable.
I am very interested in the number of different ways that this can be accomplished.
If x is constant, the following parameter expansion performs substring extraction:
b=${a:12:5}
where 12 is the offset (zero-based) and 5 is the length.
If the underscores around the digits are the only ones in the input, you can strip off the
prefix and suffix (respectively) in two steps:
tmp=${a#*_} # remove prefix ending in "_"
b=${tmp%_*} # remove suffix starting with "_"
If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone
knows how to perform both expansions in a single expression, I'd like to know too.
Both solutions presented are pure bash, with no process spawning involved, hence very fast.
In case someone wants more rigorous information, you can also search it in man bash like this
$ man bash [press return key]
/substring [press return key]
[press "n" key]
[press "n" key]
[press "n" key]
[press "n" key]
Result:
${parameter:offset}
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of
parameter starting at the character specified by offset. If
length is omitted, expands to the substring of parameter start‐
ing at the character specified by offset. length and offset are
arithmetic expressions (see ARITHMETIC EVALUATION below). If
offset evaluates to a number less than zero, the value is used
as an offset from the end of the value of parameter. Arithmetic
expressions starting with a - must be separated by whitespace
from the preceding : to be distinguished from the Use Default
Values expansion. If length evaluates to a number less than
zero, and parameter is not @ and not an indexed or associative
array, it is interpreted as an offset from the end of the value
of parameter rather than a number of characters, and the expan‐
sion is the characters between the two offsets. If parameter is
@, the result is length positional parameters beginning at off‐
set. If parameter is an indexed array name subscripted by @ or
*, the result is the length members of the array beginning with
${parameter[offset]}. A negative offset is taken relative to
one greater than the maximum index of the specified array. Sub‐
string expansion applied to an associative array produces unde‐
fined results. Note that a negative offset must be separated
from the colon by at least one space to avoid being confused
with the :- expansion. Substring indexing is zero-based unless
the positional parameters are used, in which case the indexing
starts at 1 by default. If offset is 0, and the positional
parameters are used, $0 is prefixed to the list.
Note: the above is a regular expression and is restricted to your specific scenario of five
digits surrounded by underscores. Change the regular expression if you need different matching.
I have a filename with x number of characters then a five digit sequence surrounded by a
single underscore on either side then another set of x number of characters. I want to take
the 5 digit number and put that into a variable.
Here's a prefix-suffix solution (similar to the solutions given by JB and Darron) that matches
the first block of digits and does not depend on the surrounding underscores:
str='someletters_12345_morele34ters.ext'
s1="${str#"${str%%[[:digit:]]*}"}" # strip off non-digit prefix from str
s2="${s1%%[^[:digit:]]*}" # strip off non-digit suffix from s1
echo "$s2" # 12345
A slightly more general option would be not to assume that you have an underscore _
marking the start of your digits sequence, hence for instance stripping off all non-numbers
you get before your sequence:
s/[^0-9]\+\([0-9]\+\).*/\1/p
> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to
refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
More on this, in case you're not too confident with regexps:
s is for _s_ubstitute
[0-9]+ matches 1+ digits
\1 links to the group n.1 of the regex output (group 0 is the whole match, group 1 is the match within parentheses in this case)
p flag is for _p_rinting
All escapes \ are there to make sed 's regexp processing work.
This will be more efficient if you want to extract something that has any chars like
abc or any special characters like _ or - . For example, if your string is
like this and you want everything that is after someletters_ and before
_moreleters.ext :
str="someletters_123-45-24a&13b-1_moreleters.ext"
With my code you can mention what exactly you want. Explanation:
#* It will remove the preceding string including the matching key. Here the key we mentioned is _
% It will remove the following string including the matching key. Here the key we mentioned is '_more*'
Do some experiments yourself and you will find this interesting.
Ok, here goes pure Parameter Substitution with an empty string. The caveat is that I have defined
someletters and moreletters as only characters. If they are alphanumeric, this will not work as it is.
I use this scrub function to clean up output from other functions.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my %h = (
a => 1,
b => 1
);
print scrub($h{c});
sub scrub {
my $a = shift;
return ($a eq '' or $a eq '~' or not defined $a) ? -1 : $a;
}
The problem occurs when I also would like to handle the case, where the key in a hash
doesn't exist, which is shown in the example with scrub($h{c}) .
What change should be make to scrub so it can handle this case?
You're checking whether $a eq '' before checking whether it's defined, hence the
warning "Use of uninitialized value in string eq". Simply change the order of things in the
conditional:
return (!defined($a) or $a eq '' or $a eq '~') ? -1 : $a;
As soon as anything in the chain of 'or's matches, Perl will stop processing the
conditional, thus avoiding the erroneous attempt to compare undef to a string.
In scrub it is too late to check, if the hash has an entry for key
key . scrub() only sees a scalar, which is undef , if
the hash key does not exist. But a hash could have an entry with the value undef
also, like this:
my %h = (
a => 1,
b => 1,
c => undef
);
So I suggest to check for hash entries with the exists function.
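A sketch of that approach; note that it assumes you are willing to change scrub() to take the hash reference and the key instead of the value (that two-argument interface is my addition, not part of the original post):
sub scrub {
    my ($href, $key) = @_;
    return -1 unless exists $href->{$key};    # key is missing entirely
    my $v = $href->{$key};
    return (!defined $v or $v eq '' or $v eq '~') ? -1 : $v;
}

print scrub(\%h, 'c');    # -1, because %h has no key 'c'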
Perl doesn't offer a way to check whether or not a variable has been initialized.
However, scalar variables that haven't been explicitly initialized with some value happen
to have the value of undef by default. You are right about defined
being the right way to check whether or not a variable has a value of undef .
There are several other ways, though. If you want to assign to the variable when it's
undef , which your example code seems to indicate, you could, for example, use
Perl's defined-or operator:
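Something along these lines (the default strings are made up):
$var //= "some default";              # assigns only when $var is undef (Perl 5.10+)
my $x = $var // "fallback value";     # defined-or used inside an expression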
It depends on what you're trying to do. The proper C way to do things is to
initialize variables when they are declared; however, Perl is not C , so one of
the following may be what you want:
1) $var = "foo" unless defined $var; # set default after the fact
2) $var = defined $var? $var : {...}; # ternary operation
3) {...} if !(defined $var); # another way to write 1)
4) $var = $var || "foo"; # set to $var unless it's falsy, in which case set to 'foo'
5) $var ||= "foo"; # retain value of $var unless it's falsy, in which case set to 'foo' (same as previous line)
6) $var = $var // "foo"; # set to $var unless it's undefined, in which case set to 'foo'
7) $var //= "foo"; # 5.10+ ; retain value of $var unless it's undefined, in which case set to 'foo' (same as previous line)
C way of doing things ( not recommended ):
# initialize the variable to a default value during declaration
# then test against that value when you want to see if it's been changed
my $var = "foo";
{...}
if ($var eq "foo"){
... # do something
} else {
... # do something else
}
Another long-winded way of doing this is to create a class and a flag when the variable's
been changed, which is unnecessary.
#!/usr/bin/perl
use warnings;
use Net::Telnet::Cisco;
################################### S
open( OUTPUTS, ">log_Success.txt" );
open( OUTPUTF, ">log_Fail.txt" );
################################### E
open( SWITCHIP, "ip.txt" ) or die "couldn't open ip.txt";
my $count = 0;
while (<SWITCHIP>) {
chomp($_);
my $switch = $_;
my $tl = 0;
my $t = Net::Telnet::Cisco->new(
Host => $switch,
Prompt =>
'/(?m:^(?:[\w.\/]+\:)?[\w.-]+\s?(?:\(config[^\)]*\))?\s?[\$#>]\s?(?:\(enable\))?\s*$)/',
Timeout => 5,
Errmode => 'return'
) or $tl = 1;
my @output = ();
################################### S
if ( $tl != 1 ) {
print "$switch Telnet success\n"; # for printing it in screen
print OUTPUTS "$switch Telnet success\n"; # it will print it in the log_Success.txt
}
else {
my $telnetstat = "Telnet Failed";
print "$switch $telnetstat\n"; # for printing it in screen
print OUTPUTF "$switch $telnetstat\n"; # it will print it in the log_Fail.txt
}
################################### E
$count++;
}
################################### S
close(SWITCHIP);
close(OUTPUTS);
close(OUTPUTF);
################################### E
In the print statement, write the filehandle name right after print ; in your
code that is OUTPUT :
print OUTPUT "$switch Telnet success\n";
and
print OUTPUT "$switch $telnetstat\n";
A side note: always use a lexical filehandle and three arguments with error handling to
open a file. This line open(OUTPUT, ">log.txt"); you can write like this:
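For example, one way to write it (the $log_fh name is arbitrary):
open my $log_fh, '>', 'log.txt' or die "Unable to open log.txt: $!";
print {$log_fh} "$switch $telnetstat\n";    # a lexical handle is just a scalar; pass it in braces or as the first argument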
But since you're opening a log.txt file with the handle OUTPUT ,
just change your two print statements to have OUTPUT as the first
argument and the string as the next (without a comma).
my $telnetstat;
if($tl != 1) {
$telnetstat = "Telnet success";
} else {
$telnetstat = "Telnet Failed";
}
print OUTPUT "$switch $telnetstat\n";
# Or the shorter ternary operator line for all the above:
print OUTPUT $switch . (!$tl ? " Telnet success\n" : " Telnet failed\n");
Can anyone recommend a safe solution to recursively replace spaces with underscores in file
and directory names starting from a given root directory? For example:
$ tree
.
|-- a dir
| `-- file with spaces.txt
`-- b dir
|-- another file with spaces.txt
`-- yet another file with spaces.pdf
Use rename (aka prename ) which is a Perl script
which may be on
your system already. Do it in two steps:
find -name "* *" -type d | rename 's/ /_/g' # do the directories first
find -name "* *" -type f | rename 's/ /_/g'
Based on Jürgen's answer and able to handle multiple layers of files and directories
in a single bound using the "Revision 1.5 1998/12/18 16:16:31 rmb1" version of
/usr/bin/rename (a Perl script):
This one does a little bit more. I use it to rename my downloaded torrents (no special
characters (non-ASCII), spaces, multiple dots, etc.).
#!/usr/bin/perl
use strict;
use warnings;

rena(`find . -type d`);
rena(`find . -type f`);

sub rena {
    # backticks in list context return one line per file name
    my @t = map { split /\n/ } @_;
    for my $e (@t) {
        local $_ = $e;
        # remove the leading ./ that find adds
        s/^\.\///;
        # transliterate non-ASCII and control characters to _
        tr/\200-\377/_/;
        tr/\000-\040/_/;
        # special characters we do not want in paths
        s/[ \-\,\;\?\+\'\"\!\[\]\(\)\@\#]/_/g;
        # replace every dot except the last one (keep the extension)
        while (/\..*\./) {
            s/\./_/;
        }
        # collapse consecutive underscores
        s/_+/_/g;
        next if $_ eq $e or "./$_" eq $e;
        print "$e -> $_\n";
        rename($e, $_) or warn "cannot rename $e: $!";
    }
}
I just make one for my own purpose. You may can use it as reference.
#!/bin/bash
cd /vzwhome/c0cheh1/dev_source/UB_14_8
for file in *
do
echo $file
cd "/vzwhome/c0cheh1/dev_source/UB_14_8/$file/Configuration/$file"
echo "==> `pwd`"
for subfile in *\ *; do [ -d "$subfile" ] && ( mv "$subfile" "$(echo $subfile | sed -e 's/ /_/g')" ); done
ls
cd /vzwhome/c0cheh1/dev_source/UB_14_8
done
for i in `IFS=""; find /files -name "* *"`
do
echo $i
done > /tmp/list
while read line
do
mv "$line" `echo $line | sed 's/ /_/g'`
done < /tmp/list
rm /tmp/list
Programming skills are somewhat similar to the skills of people who play the violin or piano. As
soon as you stop playing, the skills start to evaporate: first slowly, then more quickly. In
two years you will probably lose 80%.
Notable quotes:
"... I happened to look the other day. I wrote 35 programs in January, and 28 or 29 programs in February. These are small programs, but I have a compulsion. I love to write programs and put things into it. ..."
Dijkstra said he was proud to be a programmer. Unfortunately he changed his attitude
completely, and I think he wrote his last computer program in the 1980s. At this conference I
went to in 1967 about simulation language, Chris Strachey was going around asking everybody at
the conference what was the last computer program you wrote. This was 1967. Some of the people
said, "I've never written a computer program." Others would say, "Oh yeah, here's what I did
last week." I asked Edsger this question when I visited him in Texas in the 90s and he said,
"Don, I write programs now with pencil and paper, and I execute them in my head." He finds that
a good enough discipline.
I think he was mistaken on that. He taught me a lot of things, but I really think that if he
had continued... One of Dijkstra's greatest strengths was that he felt a strong sense of
aesthetics, and he didn't want to compromise his notions of beauty. They were so intense that
when he visited me in the 1960s, I had just come to Stanford. I remember the conversation we
had. It was in the first apartment, our little rented house, before we had electricity in the
house.
We were sitting there in the dark, and he was telling me how he had just learned about the
specifications of the IBM System/360, and it made him so ill that his heart was actually
starting to flutter.
He intensely disliked things that he didn't consider clean to work with. So I can see that
he would have distaste for the languages that he had to work with on real computers. My
reaction to that was to design my own language, and then make Pascal so that it would work well
for me in those days. But his response was to do everything only intellectually.
So, programming.
I happened to look the other day. I wrote 35 programs in January, and 28 or 29 programs
in February. These are small programs, but I have a compulsion. I love to write programs and
put things into it. I think of a question that I want to answer, or I have part of my book
where I want to present something. But I can't just present it by reading about it in a book.
As I code it, it all becomes clear in my head. It's just the discipline. The fact that I have
to translate my knowledge of this method into something that the machine is going to understand
just forces me to make that crystal-clear in my head. Then I can explain it to somebody else
infinitely better. The exposition is always better if I've implemented it, even though it's
going to take me more time.
So I had a programming hat when I was outside of Cal Tech, and at Cal Tech I am a
mathematician taking my grad studies. A startup company, called Green Tree Corporation because
green is the color of money, came to me and said, "Don, name your price. Write compilers for us
and we will take care of finding computers for you to debug them on, and assistance for you to
do your work. Name your price." I said, "Oh, okay. $100,000.", assuming that this was In that
era this was not quite at Bill Gate's level today, but it was sort of out there.
The guy didn't blink. He said, "Okay." I didn't really blink either. I said, "Well, I'm not
going to do it. I just thought this was an impossible number."
At that point I made the decision in my life that I wasn't going to optimize my income; I
was really going to do what I thought I could do for well, I don't know. If you ask me what
makes me most happy, number one would be somebody saying "I learned something from you". Number
two would be somebody saying "I used your software". But number infinity would be Well, no.
Number infinity minus one would be "I bought your book". It's not as good as "I read your
book", you know. Then there is "I bought your software"; that was not in my own personal value.
So that decision came up. I kept up with the literature about compilers. The Communications of
the ACM was where the action was. I also worked with people on trying to debug the ALGOL
language, which had problems with it. I published a few papers, like "The Remaining Trouble
Spots in ALGOL 60" was one of the papers that I worked on. I chaired a committee called
"Smallgol" which was to find a subset of ALGOL that would work on small computers. I was active
in programming languages.
Frana: You have made the comment several times that maybe 1 in 50 people have the "computer scientist's mind."
Knuth: Yes.
Frana: I am wondering if a large number of those people are trained professional librarians? [laughter] There is some strangeness there. But can
you pinpoint what it is about the mind of the computer scientist that is....
Knuth: That is different?
Frana: What are the characteristics?
Knuth: Two things: one is the ability to deal with non-uniform structure, where you
have case one, case two, case three, case four. Or that you have a model of something where the
first component is integer, the next component is a Boolean, and the next component is a real
number, or something like that, you know, non-uniform structure. To deal fluently with those
kinds of entities, which is not typical in other branches of mathematics, is critical. And the
other characteristic ability is to shift levels quickly, from looking at something in the large
to looking at something in the small, and many levels in between, jumping from one level of
abstraction to another. You know that, when you are adding one to some number, that you are
actually getting closer to some overarching goal. These skills, being able to deal with
nonuniform objects and to see through things from the top level to the bottom level, these are
very essential to computer programming, it seems to me. But maybe I am fooling myself because I
am too close to it.
Frana: It is the hardest thing to really understand that which you are existing
within.
Knuth: I can be a writer, who tries to organize other people's ideas into some kind of a
more coherent structure so that it is easier to put things together. I can see that I could be
viewed as a scholar that does his best to check out sources of material, so that people get
credit where it is due. And to check facts over, not just to look at the abstract of something,
but to see what the methods were that did it and to fill in holes if necessary. I look at my
role as being able to understand the motivations and terminology of one group of specialists
and boil it down to a certain extent so that people in other parts of the field can use it. I
try to listen to the theoreticians and select what they have done that is important to the
programmer on the street; to remove technical jargon when possible.
But I have never been good at any kind of a role that would be making policy, or advising
people on strategies, or what to do. I have always been best at refining things that are there
and bringing order out of chaos. I sometimes raise new ideas that might stimulate people, but
not really in a way that would be in any way controlling the flow. The only time I have ever
advocated something strongly was with literate programming; but I do this always with the
caveat that it works for me, not knowing if it would work for anybody else.
When I work with a system that I have created myself, I can always change it if I don't like
it. But everybody who works with my system has to work with what I give them. So I am not able
to judge my own stuff impartially. So anyway, I have always felt bad about if anyone says,
'Don, please forecast the future,'...
Python was developed organically in the scientific space as a prototyping language that
easily could be translated into C++ if a prototype worked. This happened long before it was
first used for web development. Ruby, on the other hand, became a major player specifically
because of web development; the Rails framework extended Ruby's popularity with people
developing complex websites.
Which programming language best suits your needs? Here is a quick overview of each language
to help you choose:
Approach: one best way vs. human language
Python
Python takes a direct approach to programming. Its main goal is to make everything obvious
to the programmer. In Python, there is only one "best" way to do something. This philosophy has
led to a language strict in layout.
Python's core philosophy consists of three key hierarchical principles:
Explicit is better than implicit
Simple is better than complex
Complex is better than complicated
This regimented philosophy results in Python being eminently readable and easy to learn,
and it is why Python is great for beginning coders.
Python has a big foothold in introductory programming courses . Its syntax is very simple,
with little to remember. Because its code structure is explicit, the developer can easily tell
where everything comes from, making it relatively easy to debug.
Python's hierarchy of principles is evident in many aspects of the language. Its use of
whitespace to do flow control as a core part of the language syntax differs from most other
languages, including Ruby. The way you indent code determines the meaning of its action. This
use of whitespace is a prime example of Python's "explicit" philosophy: the shape a Python app
takes spells out its logic and how the app will act.
Ruby
In contrast to Python, Ruby focuses on "human-language" programming, and its code reads like
a verbal language rather than a machine-based one, which many programmers, both beginners and
experts, like. Ruby follows the principle of "least astonishment," and
offers myriad ways to do the same thing. These similar methods can have multiple names, which
many developers find confusing and frustrating.
Unlike Python, Ruby makes use of "blocks," a first-class object that is treated as a unit
within a program. In fact, Ruby takes the concept of OOP (Object-Oriented Programming) to its
limit. Everything is an object -- even global variables are actually represented within the
ObjectSpace object. Classes and modules are themselves objects, and functions and operators are
methods of objects. This ability makes Ruby especially powerful, especially when combined with
its other primary strength: functional programming and the use of lambdas.
In addition to blocks and functional programming, Ruby provides programmers with many other
features, including fragmentation, hashable and unhashable types, and mutable strings.
Ruby's fans find its elegance to be one of its top selling points. At the same time, Ruby's
"magical" features and flexibility can make it very hard to track down bugs.
Communities: stability vs. innovation
Although features and coding philosophy are the primary drivers for choosing a given
language, the strength of a developer community also plays an important role. Fortunately, both
Python and Ruby boast strong communities.
Python
Python's community already includes a large Linux and academic community and therefore
offers many academic use cases in both math and science. That support gives the community a
stability and diversity that only grows as Python increasingly is used for web
development.
Ruby
However, Ruby's community has focused primarily on web development from the get-go. It tends
to innovate more quickly than the Python community, but this innovation also causes more things
to break. In addition, while it has gotten more diverse, it has yet to reach the level of
diversity that Python has.
Final thoughts
For web development, Ruby has Rails and Python has Django. Both are powerful frameworks, so
when it comes to web development, you can't go wrong with either language. Your decision will
ultimately come down to your level of experience and your philosophical preferences.
If you plan to focus on building web applications, Ruby is popular and flexible. There is a
very strong community built upon it and they are always on the bleeding edge of
development.
If you are interested in building web applications and would like to learn a language that's
used more generally, try Python. You'll get a diverse community and lots of influence and
support from the various industries in which it is used.
Tom Radcliffe has over 20 years of experience in software development and management in both
academia and industry. He is a professional engineer (PEO and APEGBC) and holds a PhD in
physics from Queen's University at Kingston. Tom brings a passion for quantitative,
data-driven processes to ActiveState.
"... When you're writing a document for a human being to understand, the human being will look at it and nod his head and say, "Yeah, this makes sense." But then there's all kinds of ambiguities and vagueness that you don't realize until you try to put it into a computer. Then all of a sudden, almost every five minutes as you're writing the code, a question comes up that wasn't addressed in the specification. "What if this combination occurs?" ..."
"... When you're faced with implementation, a person who has been delegated this job of working from a design would have to say, "Well hmm, I don't know what the designer meant by this." ..."
...I showed the second version of this design to two of my graduate students, and I said,
"Okay, implement this, please, this summer. That's your summer job." I thought I had specified
a language. I had to go away. I spent several weeks in China during the summer of 1977, and I
had various other obligations. I assumed that when I got back from my summer trips, I would be
able to play around with TeX and refine it a little bit. To my amazement, the students, who
were outstanding students, had not completed [it]. They had a system that was able to do about
three lines of TeX. I thought, "My goodness, what's going on? I thought these were good
students." Well afterwards I changed my attitude to saying, "Boy, they accomplished a
miracle."
Because going from my specification, which I thought was complete, they really had an
impossible task, and they had succeeded wonderfully with it. These students, by the way, [were]
Michael Plass, who has gone on to be the brains behind almost all of Xerox's Docutech software
and all kind of things that are inside of typesetting devices now, and Frank Liang, one of the
key people for Microsoft Word.
He did important mathematical things as well as his hyphenation methods which are quite used
in all languages now. These guys were actually doing great work, but I was amazed that they
couldn't do what I thought was just sort of a routine task. Then I became a programmer in
earnest, where I had to do it. The reason is when you're doing programming, you have to explain
something to a computer, which is dumb.
When you're writing a document for a human being to understand, the human being will
look at it and nod his head and say, "Yeah, this makes sense." But then there's all kinds of
ambiguities and vagueness that you don't realize until you try to put it into a computer. Then
all of a sudden, almost every five minutes as you're writing the code, a question comes up that
wasn't addressed in the specification. "What if this combination occurs?"
It just didn't occur to the person writing the design specification. When you're faced
with implementation, a person who has been delegated this job of working from a design would
have to say, "Well hmm, I don't know what the designer meant by this."
If I hadn't been in China they would've scheduled an appointment with me and stopped their
programming for a day. Then they would come in at the designated hour and we would talk. They
would take 15 minutes to present to me what the problem was, and then I would think about it
for a while, and then I'd say, "Oh yeah, do this. " Then they would go home and they would
write code for another five minutes and they'd have to schedule another appointment.
I'm probably exaggerating, but this is why I think Bob Floyd's Chiron compiler never got
going. Bob worked many years on a beautiful idea for a programming language, where he designed
a language called Chiron, but he never touched the programming himself. I think this was
actually the reason that he had trouble with that project, because it's so hard to do the
design unless you're faced with the low-level aspects of it, explaining it to a machine instead
of to another person.
Forsythe, I think it was, who said, "People have said traditionally that you don't
understand something until you've taught it in a class. The truth is you don't really
understand something until you've taught it to a computer, until you've been able to program
it." At this level, programming was absolutely important
Knuth: No, I stopped going to conferences. It was too discouraging. Computer programming
keeps getting harder because more stuff is discovered. I can cope with learning about one new
technique per day, but I can't take ten in a day all at once. So conferences are depressing; it
means I have so much more work to do. If I hide myself from the truth I am much happier.
"... Also, Addison-Wesley was the people who were asking me to do this book; my favorite textbooks had been published by Addison Wesley. They had done the books that I loved the most as a student. For them to come to me and say, "Would you write a book for us?", and here I am just a secondyear gradate student -- this was a thrill. ..."
"... But in those days, The Art of Computer Programming was very important because I'm thinking of the aesthetical: the whole question of writing programs as something that has artistic aspects in all senses of the word. The one idea is "art" which means artificial, and the other "art" means fine art. All these are long stories, but I've got to cover it fairly quickly. ..."
Knuth: This is, of course, really the story of my life, because I hope to live long enough
to finish it. But I may not, because it's turned out to be such a huge project. I got married
in the summer of 1961, after my first year of graduate school. My wife finished college, and I
could use the money I had made -- the $5000 on the compiler -- to finance a trip to Europe for
our honeymoon.
We had four months of wedded bliss in Southern California, and then a man from
Addison-Wesley came to visit me and said "Don, we would like you to write a book about how to
write compilers."
The more I thought about it, I decided "Oh yes, I've got this book inside of me."
I sketched out that day -- I still have the sheet of tablet paper on which I wrote -- I
sketched out 12 chapters that I thought ought to be in such a book. I told Jill, my wife, "I
think I'm going to write a book."
As I say, we had four months of bliss, because the rest of our marriage has all been devoted
to this book. Well, we still have had happiness. But really, I wake up every morning and I
still haven't finished the book. So I try to -- I have to -- organize the rest of my life
around this, as one main unifying theme. The book was supposed to be about how to write a
compiler. They had heard about me from one of their editorial advisors, that I knew something
about how to do this. The idea appealed to me for two main reasons. One is that I did enjoy
writing. In high school I had been editor of the weekly paper. In college I was editor of the
science magazine, and I worked on the campus paper as copy editor. And, as I told you, I wrote
the manual for that compiler that we wrote. I enjoyed writing, number one.
Also, Addison-Wesley was the people who were asking me to do this book; my favorite
textbooks had been published by Addison Wesley. They had done the books that I loved the most
as a student. For them to come to me and say, "Would you write a book for us?", and here I am
just a second-year graduate student -- this was a thrill.
Another very important reason at the time was that I knew that there was a great need for a
book about compilers, because there were a lot of people who even in 1962 -- this was January
of 1962 -- were starting to rediscover the wheel. The knowledge was out there, but it hadn't
been explained. The people who had discovered it, though, were scattered all over the world and
they didn't know of each other's work either, very much. I had been following it. Everybody I
could think of who could write a book about compilers, as far as I could see, they would only
give a piece of the fabric. They would slant it to their own view of it. There might be four
people who could write about it, but they would write four different books. I could present all
four of their viewpoints in what I would think was a balanced way, without any axe to grind,
without slanting it towards something that I thought would be misleading to the compiler writer
for the future. I considered myself as a journalist, essentially. I could be the expositor, the
tech writer, that could do the job that was needed in order to take the work of these brilliant
people and make it accessible to the world. That was my motivation. Now, I didn't have much
time to spend on it then, I just had this page of paper with 12 chapter headings on it. That's
all I could do while I'm a consultant at Burroughs and doing my graduate work. I signed a
contract, but they said "We know it'll take you a while." I didn't really begin to have much
time to work on it until 1963, my third year of graduate school, as I'm already finishing up on
my thesis. In the summer of '62, I guess I should mention, I wrote another compiler. This was
for Univac; it was a FORTRAN compiler. I spent the summer, I sold my soul to the devil, I guess
you say, for three months in the summer of 1962 to write a FORTRAN compiler. I believe that the
salary for that was $15,000, which was much more than an assistant professor. I think assistant
professors were getting eight or nine thousand in those days.
Feigenbaum: Well, when I started in 1960 at [University of California] Berkeley, I was
getting $7,600 for the nine-month year.
Knuth: Yeah, so you see it. I got $15,000 for a summer job in 1962 writing a
FORTRAN compiler. One day during that summer I was writing the part of the compiler that looks
up identifiers in a hash table. The method that we used is called linear probing. Basically you
take the variable name that you want to look up, you scramble it, like you square it or
something like this, and that gives you a number between one and, well in those days it would
have been between 1 and 1000, and then you look there. If you find it, good; if you don't find
it, go to the next place and keep on going until you either get to an empty place, or you find
the number you're looking for. It's called linear probing. There was a rumor that one of
Professor Feller's students at Princeton had tried to figure out how fast linear probing works
and was unable to succeed. This was a new thing for me. It was a case where I was doing
programming, but I also had a mathematical problem that would go into my other [job]. My winter
job was being a math student, my summer job was writing compilers. There was no mix. These
worlds did not intersect at all in my life at that point. So I spent one day during the summer
while writing the compiler looking at the mathematics of how fast does linear probing work. I
got lucky, and I solved the problem. I figured out some math, and I kept two or three sheets of
paper with me and I typed it up. ["Notes on 'Open' Addressing", 7/22/63] I guess that's on the
internet now, because this became really the genesis of my main research work, which developed
not to be working on compilers, but to be working on what they call analysis of algorithms,
which is, have a computer method and find out how good is it quantitatively. I can say, if I
got so many things to look up in the table, how long is linear probing going to take. It dawned
on me that this was just one of many algorithms that would be important, and each one would
lead to a fascinating mathematical problem. This was easily a good lifetime source of rich
problems to work on. Here I am then, in the middle of 1962, writing this FORTRAN compiler, and
I had one day to do the research and mathematics that changed my life for my future research
trends. But now I've gotten off the topic of what your original question was.
Feigenbaum: We were talking about sort of the.. You talked about the embryo of The Art of
Computing. The compiler book morphed into The Art of Computer Programming, which became a
seven-volume plan.
Knuth: Exactly. Anyway, I'm working on a compiler and I'm thinking about this. But now I'm
starting, after I finish this summer job, then I began to do things that were going to be
relating to the book. One of the things I knew I had to have in the book was an artificial
machine, because I'm writing a compiler book but machines are changing faster than I can write
books. I have to have a machine that I'm totally in control of. I invented this machine called
MIX, which was typical of the computers of 1962.
In 1963 I wrote a simulator for MIX so that I could write sample programs for it, and I
taught a class at Caltech on how to write programs in assembly language for this hypothetical
computer. Then I started writing the parts that dealt with sorting problems and searching
problems, like the linear probing idea. I began to write those parts, which are part of a
compiler, of the book. I had several hundred pages of notes gathering for those chapters for
The Art of Computer Programming. Before I graduated, I've already done quite a bit of writing
on The Art of Computer Programming.
I met George Forsythe about this time. George was the man who inspired both of us [Knuth and
Feigenbaum] to come to Stanford during the '60s. George came down to Southern California for a
talk, and he said, "Come up to Stanford. How about joining our faculty?" I said "Oh no, I can't
do that. I just got married, and I've got to finish this book first." I said, "I think I'll
finish the book next year, and then I can come up [and] start thinking about the rest of my
life, but I want to get my book done before my son is born." Well, John is now 40-some years
old and I'm not done with the book. Part of my lack of expertise is any good estimation
procedure as to how long projects are going to take. I way underestimated how much needed to be
written about in this book. Anyway, I started writing the manuscript, and I went merrily along
writing pages of things that I thought really needed to be said. Of course, it didn't take long
before I had started to discover a few things of my own that weren't in any of the existing
literature. I did have an axe to grind. The message that I was presenting was in fact not going
to be unbiased at all. It was going to be based on my own particular slant on stuff, and that
original reason for why I should write the book became impossible to sustain. But the fact that
I had worked on linear probing and solved the problem gave me a new unifying theme for the
book. I was going to base it around this idea of analyzing algorithms, and have some
quantitative ideas about how good methods were. Not just that they worked, but that they worked
well: this method worked 3 times better than this method, or 3.1 times better than this method.
Also, at this time I was learning mathematical techniques that I had never been taught in
school. I found they were out there, but they just hadn't been emphasized openly, about how to
solve problems of this kind.
So my book would also present a different kind of mathematics than was common in the
curriculum at the time, that was very relevant to analysis of algorithm. I went to the
publishers, I went to Addison Wesley, and said "How about changing the title of the book from
'The Art of Computer Programming' to 'The Analysis of Algorithms'." They said that will never
sell; their focus group couldn't buy that one. I'm glad they stuck to the original title,
although I'm also glad to see that several books have now come out called "The Analysis of
Algorithms", 20 years down the line.
But in those days, The Art of Computer Programming was very important because I'm
thinking of the aesthetical: the whole question of writing programs as something that has
artistic aspects in all senses of the word. The one idea is "art" which means artificial, and
the other "art" means fine art. All these are long stories, but I've got to cover it fairly
quickly.
I've got The Art of Computer Programming started out, and I'm working on my 12 chapters. I
finish a rough draft of all 12 chapters by, I think it was like 1965. I've got 3,000 pages of
notes, including a very good example of what you mentioned about seeing holes in the fabric.
One of the most important chapters in the book is parsing: going from somebody's algebraic
formula and figuring out the structure of the formula. Just the way I had done in seventh grade
finding the structure of English sentences, I had to do this with mathematical sentences.
Chapter ten is all about parsing of context-free language, [which] is what we called it at
the time. I covered what people had published about context-free languages and parsing. I got
to the end of the chapter and I said, well, you can combine these ideas and these ideas, and
all of a sudden you get a unifying thing which goes all the way to the limit. These other ideas
had sort of gone partway there. They would say "Oh, if a grammar satisfies this condition, I
can do it efficiently." "If a grammar satisfies this condition, I can do it efficiently." But
now, all of a sudden, I saw there was a way to say I can find the most general condition that
can be done efficiently without looking ahead to the end of the sentence. That you could make a
decision on the fly, reading from left to right, about the structure of the thing. That was
just a natural outgrowth of seeing the different pieces of the fabric that other people had put
together, and writing it into a chapter for the first time. But I felt that this general
concept, well, I didn't feel that I had surrounded the concept. I knew that I had it, and I
could prove it, and I could check it, but I couldn't really intuit it all in my head. I knew it
was right, but it was too hard for me, really, to explain it well.
So I didn't put it in The Art of Computer Programming. I thought it was beyond the scope of my
book. Textbooks don't have to cover everything when you get to the harder things; then you have
to go to the literature. My idea at that time [is] I'm writing this book and I'm thinking it's
going to be published very soon, so any little things I discover and put in the book I didn't
bother to write a paper and publish in the journal because I figure it'll be in my book pretty
soon anyway. Computer science is changing so fast, my book is bound to be obsolete.
It takes a year for it to go through editing, and people drawing the illustrations, and then
they have to print it and bind it and so on. I have to be a little bit ahead of the
state-of-the-art if my book isn't going to be obsolete when it comes out. So I kept most of the
stuff to myself that I had, these little ideas I had been coming up with. But when I got to
this idea of left-to-right parsing, I said "Well here's something I don't really understand
very well. I'll publish this, let other people figure out what it is, and then they can tell me
what I should have said." I published that paper I believe in 1965, at the end of finishing my
draft of the chapter, which didn't get as far as that story, LR(k). Well now, textbooks of
computer science start with LR(k) and take off from there. But I want to give you an idea
of
In ksh93, however, the argument is taken as a date expression, for which various barely documented formats are supported.
For a Unix epoch time, the syntax in ksh93 is:
printf '%(%F %T)T\n' '#1234567890'
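With that epoch value and assuming a UTC timezone, this prints something like:
2009-02-13 23:31:30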
ksh93 however seems to use its own algorithm for the timezone and can get it
wrong. For instance, in Britain, it was summer time all year in 1970, but:
Vim can indent bash scripts (but not reformat them before indenting). Back up your bash script, open it with vim, and type gg=GZZ; the indentation will be corrected. (Note for the impatient: this overwrites the file, so be sure to make that backup!) There are some bugs with << heredocs, though (vim expects the EOF delimiter to be the first character on a line).
open my $fp, '<', $file or die $!;
while (<$fp>) {
my $line = $_;
if ($line =~ /$regex/) {
# How do I find out which line number this match happened at?
}
}
close $fp;
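To answer the question in the comment above: Perl's built-in $. variable holds the current line number of the most recently read filehandle. A minimal sketch (the file name and pattern below are placeholders):

use strict;
use warnings;

my $file  = 'input.txt';    # placeholder file name
my $regex = qr/needle/;     # placeholder pattern

open my $fp, '<', $file or die $!;
while (my $line = <$fp>) {
    if ($line =~ $regex) {
        print "Match on line $.: $line";    # $. is the current input line number
    }
}
close $fp;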
Re: Neither! -- M. D. Nahas, Slashdot, December 23, 2005 (on "Learning Java or C# as a Next Language?")
The cleanest languages I've used are C, Java, and OCaml. By
"clean", I mean the language has a few concepts that can be completely memorized, which results
in less "gotchas" and manual reading. For these languages, you'll see small manuals (e.g.,
K&R's book for C) which cover the complete language and then lots of pages devoted to the
libraries that come with the language. I'd definitely recommend Java (or C, or OCaml) over C#
for this reason. C# seems to have combined every feature of C++, Java, and VBA into a single
language. It is very complex and has a ton of concepts, for which I could never memorize the
whole language. I have a feeling that most programmers will use the subset of C# that is
closest to the language they understand, whether it is C++, Java or VBA. You might as well
learn Java's style of programming, and then, if needed, switch to C# using its Java-like
features.
Update: It's been more than 5 years since I started this answer. Thank you for LOTS of great edits/comments/suggestions. In order to save maintenance time, I've modified the code block to be 100% copy-paste ready. Please do not post comments like "What if you changed X to Y?". Instead, copy-paste the code block, see the output, make the change, rerun the script, and comment "I changed X to Y and ..." I don't have time to test your ideas and tell you if they work.
Method #1: Using bash without getopt[s]
Two common ways to pass key-value-pair arguments are space-separated (e.g., -e conf) and equals-separated (e.g., -e=conf); both are demonstrated below.
cat >/tmp/demo-space-separated.sh <<'EOF'
#!/bin/bash
POSITIONAL=()
while [[ $# -gt 0 ]]
do
key="$1"
case $key in
-e|--extension)
EXTENSION="$2"
shift # past argument
shift # past value
;;
-s|--searchpath)
SEARCHPATH="$2"
shift # past argument
shift # past value
;;
-l|--lib)
LIBPATH="$2"
shift # past argument
shift # past value
;;
--default)
DEFAULT=YES
shift # past argument
;;
*) # unknown option
POSITIONAL+=("$1") # save it in an array for later
shift # past argument
;;
esac
done
set -- "${POSITIONAL[@]}" # restore positional parameters
echo "FILE EXTENSION = ${EXTENSION}"
echo "SEARCH PATH = ${SEARCHPATH}"
echo "LIBRARY PATH = ${LIBPATH}"
echo "DEFAULT = ${DEFAULT}"
echo "Number files in SEARCH PATH with EXTENSION:" $(ls -1 "${SEARCHPATH}"/*."${EXTENSION}" | wc -l)
if [[ -n $1 ]]; then
echo "Last line of file specified as non-opt/last argument:"
tail -1 "$1"
fi
EOF
chmod +x /tmp/demo-space-separated.sh
/tmp/demo-space-separated.sh -e conf -s /etc -l /usr/lib /etc/hosts
output from copy-pasting the block above:
FILE EXTENSION = conf
SEARCH PATH = /etc
LIBRARY PATH = /usr/lib
DEFAULT =
Number files in SEARCH PATH with EXTENSION: 14
Last line of file specified as non-opt/last argument:
#93.184.216.34 example.com
cat >/tmp/demo-equals-separated.sh <<'EOF'
#!/bin/bash
for i in "$@"
do
case $i in
-e=*|--extension=*)
EXTENSION="${i#*=}"
shift # past argument=value
;;
-s=*|--searchpath=*)
SEARCHPATH="${i#*=}"
shift # past argument=value
;;
-l=*|--lib=*)
LIBPATH="${i#*=}"
shift # past argument=value
;;
--default)
DEFAULT=YES
shift # past argument with no value
;;
*)
# unknown option
;;
esac
done
echo "FILE EXTENSION = ${EXTENSION}"
echo "SEARCH PATH = ${SEARCHPATH}"
echo "LIBRARY PATH = ${LIBPATH}"
echo "DEFAULT = ${DEFAULT}"
echo "Number files in SEARCH PATH with EXTENSION:" $(ls -1 "${SEARCHPATH}"/*."${EXTENSION}" | wc -l)
if [[ -n $1 ]]; then
echo "Last line of file specified as non-opt/last argument:"
tail -1 $1
fi
EOF
chmod +x /tmp/demo-equals-separated.sh
/tmp/demo-equals-separated.sh -e=conf -s=/etc -l=/usr/lib /etc/hosts
output from copy-pasting the block above:
FILE EXTENSION = conf
SEARCH PATH = /etc
LIBRARY PATH = /usr/lib
DEFAULT =
Number files in SEARCH PATH with EXTENSION: 14
Last line of file specified as non-opt/last argument:
#93.184.216.34 example.com
To better understand ${i#*=} search for "Substring Removal" in this guide . It is
functionally equivalent to `sed 's/[^=]*=//' <<< "$i"` which calls a
needless subprocess or `echo "$i" | sed 's/[^=]*=//'` which calls two
needless subprocesses.
More recent getopt versions don't have these limitations (older getopt implementations could not handle empty arguments or arguments containing whitespace). Additionally, the POSIX shell (and others) offer getopts, which doesn't have these limitations either. I've included a simplistic getopts example.
Usage demo-getopts.sh -vf /etc/hosts foo bar
cat >/tmp/demo-getopts.sh <<'EOF'
#!/bin/sh
# A POSIX variable
OPTIND=1 # Reset in case getopts has been used previously in the shell.
# show_help was referenced but not defined in the original answer; minimal stub so -h works:
show_help() { echo "Usage: $0 [-h] [-v] [-f output_file] [leftovers...]"; }
# Initialize our own variables:
output_file=""
verbose=0
while getopts "h?vf:" opt; do
case "$opt" in
h|\?)
show_help
exit 0
;;
v) verbose=1
;;
f) output_file=$OPTARG
;;
esac
done
shift $((OPTIND-1))
[ "${1:-}" = "--" ] && shift
echo "verbose=$verbose, output_file='$output_file', Leftovers: $@"
EOF
chmod +x /tmp/demo-getopts.sh
/tmp/demo-getopts.sh -vf /etc/hosts foo bar
output from copy-pasting the block above:
verbose=1, output_file='/etc/hosts', Leftovers: foo bar
The advantages of getopts are:
It's more portable, and will work in other shells like dash .
It can handle multiple single options like -vf filename in the typical
Unix way, automatically.
The disadvantage of getopts is that it can only handle short options (
-h , not --help ) without additional code.
There is a getopts tutorial which explains
what all of the syntax and variables mean. In bash, there is also help getopts ,
which might be informative.
No answer mentions enhanced getopt. And the top-voted answer is misleading: it either ignores -vfd style short options (requested by the OP) or options after positional arguments (also requested by the OP), and it ignores parsing errors. Instead:
Use enhanced getopt from util-linux or formerly GNU glibc. [1]
It works with getopt_long(), the C function of GNU glibc.
It has all the useful distinguishing features (the others don't have them):
handles spaces, quoting characters and even binary in arguments [2] (non-enhanced getopt can't do this)
it can handle options at the end: script.sh -o outFile file1 file2 -v ( getopts doesn't do this)
allows = -style long options: script.sh --outfile=fileOut --infile fileIn (allowing both is lengthy if self-parsing)
allows combined short options, e.g. -vfd (real work if self-parsing)
allows touching option-arguments, e.g. -oOutfile or -vfdoOutfile
It is so old already [3] that no GNU system is missing it (e.g. any Linux has it).
You can test for its existence with: getopt --test → return value 4.
Other getopt implementations and the shell builtin getopts are of limited use.
A call such as myscript -vfd ./foo/bar/someFile -o /fizz/someOtherFile (or the equivalent long-option form) prints:
verbose: y, force: y, debug: y, in: ./foo/bar/someFile, out: /fizz/someOtherFile
with the following myscript:
#!/bin/bash
# saner programming env: these switches turn some bugs into errors
set -o errexit -o pipefail -o noclobber -o nounset
# -allow a command to fail with !'s side effect on errexit
# -use return value from ${PIPESTATUS[0]}, because ! hosed $?
! getopt --test > /dev/null
if [[ ${PIPESTATUS[0]} -ne 4 ]]; then
echo "I'm sorry, \`getopt --test\` failed in this environment."
exit 1
fi
OPTIONS=dfo:v
LONGOPTS=debug,force,output:,verbose
# -regarding ! and PIPESTATUS see above
# -temporarily store output to be able to check for errors
# -activate quoting/enhanced mode (e.g. by writing out "--options")
# -pass arguments only via -- "$@" to separate them correctly
! PARSED=$(getopt --options=$OPTIONS --longoptions=$LONGOPTS --name "$0" -- "$@")
if [[ ${PIPESTATUS[0]} -ne 0 ]]; then
# e.g. return value is 1
# then getopt has complained about wrong arguments to stdout
exit 2
fi
# read getopt's output this way to handle the quoting right:
eval set -- "$PARSED"
d=n f=n v=n outFile=-
# now enjoy the options in order and nicely split until we see --
while true; do
case "$1" in
-d|--debug)
d=y
shift
;;
-f|--force)
f=y
shift
;;
-v|--verbose)
v=y
shift
;;
-o|--output)
outFile="$2"
shift 2
;;
--)
shift
break
;;
*)
echo "Programming error"
exit 3
;;
esac
done
# handle non-option arguments
if [[ $# -ne 1 ]]; then
echo "$0: A single input file is required."
exit 4
fi
echo "verbose: $v, force: $f, debug: $d, in: $1, out: $outFile"
[1] enhanced getopt is available on most "bash-systems", including Cygwin; on OS X try brew install gnu-getopt or sudo port install getopt
[2] the POSIX exec() conventions have no reliable way to pass binary NULL in command line arguments; those bytes prematurely end the argument
[3] first version released in 1997 or before (I only tracked it back to 1997)
#!/bin/bash
for i in "$@"
do
case $i in
-p=*|--prefix=*)
PREFIX="${i#*=}"
;;
-s=*|--searchpath=*)
SEARCHPATH="${i#*=}"
;;
-l=*|--lib=*)
DIR="${i#*=}"
;;
--default)
DEFAULT=YES
;;
*)
# unknown option
;;
esac
done
echo PREFIX = ${PREFIX}
echo SEARCH PATH = ${SEARCHPATH}
echo DIRS = ${DIR}
echo DEFAULT = ${DEFAULT}
I'm about 4 years late to this question, but want to give back. I used the earlier answers as
a starting point to tidy up my old adhoc param parsing. I then refactored out the following
template code. It handles both long and short params, using = or space separated arguments,
as well as multiple short params grouped together. Finally it re-inserts any non-param
arguments back into the $1,$2.. variables. I hope it's useful.
#!/usr/bin/env bash
# NOTICE: Uncomment if your script depends on bashisms.
#if [ -z "$BASH_VERSION" ]; then bash $0 $@ ; exit $? ; fi
echo "Before"
for i ; do echo - $i ; done
# Code template for parsing command line parameters using only portable shell
# code, while handling both long and short params, handling '-f file' and
# '-f=file' style param data and also capturing non-parameters to be inserted
# back into the shell positional parameters.
while [ -n "$1" ]; do
# Copy so we can modify it (can't modify $1)
OPT="$1"
# Detect argument termination
if [ x"$OPT" = x"--" ]; then
shift
for OPT ; do
REMAINS="$REMAINS \"$OPT\""
done
break
fi
# Parse current opt
while [ x"$OPT" != x"-" ] ; do
case "$OPT" in
# Handle --flag=value opts like this
-c=* | --config=* )
CONFIGFILE="${OPT#*=}"
shift
;;
# and --flag value opts like this
-c* | --config )
CONFIGFILE="$2"
shift
;;
-f* | --force )
FORCE=true
;;
-r* | --retry )
RETRY=true
;;
# Anything unknown is recorded for later
* )
REMAINS="$REMAINS \"$OPT\""
break
;;
esac
# Check for multiple short options
# NOTICE: be sure to update this pattern to match valid options
NEXTOPT="${OPT#-[cfr]}" # try removing single short opt
if [ x"$OPT" != x"$NEXTOPT" ] ; then
OPT="-$NEXTOPT" # multiple short opts, keep going
else
break # long form, exit inner loop
fi
done
# Done with that param. move to next
shift
done
# Set the non-parameters back into the positional parameters ($1 $2 ..)
eval set -- $REMAINS
echo -e "After: \n configfile='$CONFIGFILE' \n force='$FORCE' \n retry='$RETRY' \n remains='$REMAINS'"
for i ; do echo - $i ; done
I have found writing portable argument parsing in scripts so frustrating that I have written Argbash - a FOSS code generator that can generate the argument-parsing code for your script, and it has some nice features.
I have a test script which has a lot of commands and generates lots of output. I use set -x or set -v and set -e, so the script stops when an error occurs. However, it's still rather difficult for me to locate at which line execution stopped in order to find the problem. Is there a method which can output the line number of the script before each line is executed? Or output the line number before the command trace generated by set -x? Any method which can deal with locating the problem line would be a great help. Thanks.
You mention that you're already using -x. The variable PS4 holds the prompt printed before each command line that is echoed when the -x option is set; in bash it defaults to + followed by a space. You can change PS4 to emit LINENO (the line number in the script or shell function currently executing).
For example, if your script reads:
$ cat script
foo=10
echo ${foo}
echo $((2 + 2))
Executing it thus would print line numbers:
$ PS4='Line ${LINENO}: ' bash -x script
Line 1: foo=10
Line 2: echo 10
10
Line 3: echo 4
4
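A minimal sketch of the same idea applied inside the script itself (assuming bash; the script below is hypothetical), so every traced command is prefixed with its source file and line:

#!/bin/bash
# Prefix each traced command with file:line; PS4 is expanded before every trace line.
PS4='+ ${BASH_SOURCE}:${LINENO}: '
set -x

foo=10
echo "${foo}"
echo $((2 + 2))

Running it prints trace lines such as + ./script:6: foo=10, which makes it much easier to see where a set -e abort happened.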
Simple (but powerful) solution: Place echo statements around the code you think causes the problem and move the echo line by line until the message no longer appears on screen, because the script stopped due to an error before reaching it.
Even more powerful solution: Install bashdb, the bash debugger, and debug the script line by line.
In a fairly sophisticated script I wouldn't like to see all line numbers; rather I would
like to be in control of the output.
Define a function
echo_line_no () {
grep -n "$1" $0 | sed "s/echo_line_no//"
# grep the line(s) containing input $1 with line numbers
# replace the function name with nothing
} # echo_line_no
Use it with quotes like
echo_line_no "this is a simple comment with a line number"
Output is
16 "this is a simple comment with a line number"
if the number of this line in the source file is 16.
Sure. Why do you need this? How do you work with this? What can you do with this? Is this
simple approach really sufficient or useful? Why do you want to tinker with this at all?
What is the best (or recommended) approach to defensive programming in Perl? For example, if I have a sub which must be called with a (defined) SCALAR, an ARRAYREF and an optional HASHREF.
Three of the approaches I have seen:
sub test1 {
die if !(@_ == 2 || @_ == 3);
my ($scalar, $arrayref, $hashref) = @_;
die if !defined($scalar) || ref($scalar);
die if ref($arrayref) ne 'ARRAY';
die if defined($hashref) && ref($hashref) ne 'HASH';
#do s.th with scalar, arrayref and hashref
}
sub test2 {
Carp::assert(@_ == 2 || @_ == 3) if DEBUG;
my ($scalar, $arrayref, $hashref) = @_;
if(DEBUG) {
Carp::assert defined($scalar) && !ref($scalar);
Carp::assert ref($arrayref) eq 'ARRAY';
Carp::assert !defined($hashref) || ref($hashref) eq 'HASH';
}
#do s.th with scalar, arrayref and hashref
}
sub test3 {
my ($scalar, $arrayref, $hashref) = @_;
(@_ == 2 || @_ == 3 && defined($scalar) && !ref($scalar) && ref($arrayref) eq 'ARRAY' && (!defined($hashref) || ref($hashref) eq 'HASH'))
or Carp::croak 'usage: test3(SCALAR, ARRAYREF, [HASHREF])';
#do s.th with scalar, arrayref and hashref
}
I wouldn't use any of them. Aside from not accepting many array and hash references, the checks you used are almost always redundant.
>perl -we"use strict; sub { my ($x) = @_; my $y = $x->[0] }->( 'abc' )"
Can't use string ("abc") as an ARRAY ref while "strict refs" in use at -e line 1.
>perl -we"use strict; sub { my ($x) = @_; my $y = $x->[0] }->( {} )"
Not an ARRAY reference at -e line 1.
The only advantage to checking is that you can use croak to show the caller
in the error message.
Proper way to check if you have a reference to an array:
defined($x) && eval { @$x; 1 }
Proper way to check if you have a reference to a hash (by the same idiom):
defined($x) && eval { %$x; 1 }
None of the options you show display any message to give a reason for the failure,
which I think is paramount.
It is also preferable to use croak instead of die from within
library subroutines, so that the error is reported from the point of view of the caller.
I would replace all occurrences of if ! with unless . The former
is a C programmer's habit.
I suggest something like this
sub test1 {
croak "Incorrect number of parameters" unless @_ == 2 or @_ == 3;
my ($scalar, $arrayref, $hashref) = @_;
croak "Invalid first parameter" unless $scalar and not ref $scalar;
croak "Invalid second parameter" unless $arrayref eq 'ARRAY';
croak "Invalid third parameter" if defined $hashref and ref $hashref ne 'HASH';
# do s.th with scalar, arrayref and hashref
}
The current line number and file name are available with the __LINE__ and __FILE__ tokens, as documented in perldoc perldata under "Special Literals":
The special literals __FILE__, __LINE__, and __PACKAGE__ represent the current filename,
line number, and package name at that point in your program. They may be used only as
separate tokens; they will not be interpolated into strings. If there is no current package
(due to an empty package; directive), __PACKAGE__ is the undefined value.
The caller function will do what you are looking for:
sub print_info {
my ($package, $filename, $line) = caller;
...
}
print_info(); # prints info about this line
This will get the information from where the sub is called, which is probably what you are
looking for. The __FILE__ and __LINE__ directives only apply to
where they are written, so you can not encapsulate their effect in a subroutine. (unless you
wanted a sub that only prints info about where it is defined)
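A small illustrative sketch of that difference (the function name is made up): caller reports the call site, while __FILE__ and __LINE__ report the place where they are written.

use strict;
use warnings;

sub where_was_i_called {
    my ($package, $filename, $line) = caller;    # information about the call site
    print "called from $filename line $line\n";
    print "this statement lives at ", __FILE__, " line ", __LINE__, "\n";
}

where_was_i_called();    # reports this line as the call site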
I am using rm within a BASH script to delete many files. Sometimes the files are
not present, so it reports many errors. I do not need this message. I have searched the man
page for a command to make rm quiet, but the only option I found is
-f , which from the description, "ignore nonexistent files, never prompt", seems
to be the right choice, but the name does not seem to fit, so I am concerned it might have
unintended consequences.
Is the -f option the correct way to silence rm ? Why isn't it
called -q ?
The main use of -f is to force the removal of files that would not be removed
using rm by itself (as a special case, it "removes" non-existent files, thus
suppressing the error message).
You can also just redirect the error message using
$ rm file.txt 2> /dev/null
(or your operating system's equivalent). You can check the value of $?
immediately after calling rm to see if a file was actually removed or not.
As far as rm -f doing "anything else", it does force ( -f is
shorthand for --force ) silent removal in situations where rm would
otherwise ask you for confirmation. For example, when trying to remove a file not writable by
you from a directory that is writable by you.
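A brief sketch of that approach (the file name is just an example): silence the message, then check $? to see whether anything was actually removed.

rm file.txt 2> /dev/null
if [ $? -eq 0 ]; then
    echo "file.txt was removed"
else
    echo "file.txt was not removed (missing or not removable)"
fi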
Can anyone let me know the possible return codes for the command rm -rf other than zero, i.e., the possible return codes for failure cases? I want to know a more detailed reason for the command's failure than just that it failed (returned non-zero).
To see the return code, you can use echo $? in bash.
To see the actual meaning, some platforms (like Debian Linux) have the perror
binary available, which can be used as follows:
$ rm -rf something/; perror $?
rm: cannot remove `something/': Permission denied
OS error code 1: Operation not permitted
rm -rf automatically suppresses most errors. The most likely error you will see is 1 (Operation not permitted), which will happen if you don't have permission to remove the file; -f intentionally suppresses most error messages.
If you want to do remote debugging (for CGI, or if you don't want to mix program output with the debugger's command line), use this:
given test:
use v5.14;
say 1;
say 2;
say 3;
Start a listener on whatever host and port on terminal 1 (here localhost:12345):
$ nc -v -l localhost -p 12345
for readline support use rlwrap (you can use it with perl -d too):
$ rlwrap nc -v -l localhost -p 12345
And start the test on another terminal (say terminal 2):
$ PERLDB_OPTS="RemotePort=localhost:12345" perl -d test
Input/Output on terminal 1:
Connection from 127.0.0.1:42994
Loading DB routines from perl5db.pl version 1.49
Editor support available.
Enter h or 'h h' for help, or 'man perldebug' for more help.
main::(test:2): say 1;
DB<1> n
main::(test:3): say 2;
DB<1> select $DB::OUT
DB<2> n
2
main::(test:4): say 3;
DB<2> n
3
Debugged program terminated. Use q to quit or R to restart,
use o inhibit_exit to avoid stopping after program termination,
h q, h R or h o to get additional info.
DB<2>
Output on terminal 2:
1
Note: if you want the program's output on the debug terminal, run
select $DB::OUT
If you are a vim user, install the dbg.vim plugin, which provides basic support for Perl.
This is like "please can you give me an example how to drive a car" .
I have explained the basic commands that you will use most often. Beyond this you must
read the debugger's inline help and reread the perldebug documentation
The debugger starts by displaying the next line to be executed: usually the
first line in your program
Debugger commands are mostly single letters, possibly with parameters. The command will
be actioned as soon as you press Enter
You should concentrate on commands s and n to step through
the program. If the next statement is a subroutine (or method) call then s
will step into the subroutine while n will step over the call.
Otherwise s and n behave identically
Be careful using s when a single line of code contains multiple
subroutine calls. You may not be stepping into the subroutine that you expect
You can't step into a built-in function, or a subroutine not written in
Perl
Once you have executed a statement there is no going back. You must restart the
program to try something different
You can execute a line of Perl code just by typing it in and pressing Enter; the code will be executed in the context of the current statement
You can examine or modify any variable this way
The p command is identical to print . The output from p $var or p @arr will be the same as if you had typed print $var or print @arr
You can use x to dump an expression in list context. The output
consists of numbered lines showing each element of the list
The commands dot . , hyphen - and v are useful
for looking at the source code. . and - will display the current
and previous source line respectively. v will display a window around the
current source line
To rapidly return to a specific line of code you can set a breakpoint and
continue execution until that line using the c command. For example c
13Enter will execute all code until line 13 and then stop
Breakpoints defined using c are temporary , so if you want to
continue to the same line again (in a loop) then you have to enter c 13Enter again
c without any parameters will run the rest of the program until it exits
or until a permanent breakpoint, defined using b , is reached
You can specify breakpoints with more complex conditions using the b
command. They can be deleted only with the corresponding B command, or B
* which will clear all breakpoints
h shows a list of the commands available, and h *command* ,
like h c , will show you detailed help on a single command
Finally, q will end the debug session and terminate the program
The debugger will do a lot more than this, but these are the basic commands that you need
to know. You should experiment with them and look at the contents of the help text to get
more proficient with the Perl debugger
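For instance, a short hypothetical sequence using the commands above (the line number, variable names and condition are made up):

b 42 $count > 10     set a permanent breakpoint at line 42, triggered only when $count > 10
c                    continue until the next breakpoint or program end
p $count             print a scalar, as print would
x \@items            dump an array reference in list context
B *                  delete all breakpoints
q                    quit the debugger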
if [ -z ${var+x} ]; then echo "var is unset"; else echo "var is set to '$var'"; fi
where ${var+x} is a parameter
expansion which evaluates to nothing if var is unset, and substitutes the
string x otherwise.
Quotes Digression
Quotes can be omitted (so we can say ${var+x} instead of
"${var+x}" ) because this syntax & usage guarantees this will only expand to
something that does not require quotes (since it either expands to x (which
contains no word breaks so it needs no quotes), or to nothing (which results in [ -z
] , which conveniently evaluates to the same value (true) that [ -z "" ]
does as well)).
However, while quotes can be safely omitted, and it was not immediately obvious to all (it
wasn't even apparent to the first author of this quotes
explanation who is also a major Bash coder), it would sometimes be better to write the
solution with quotes as [ -z "${var+x}" ] , at the very small possible cost of
an O(1) speed penalty. The first author also added this as a comment next to the code using
this solution giving the URL to this answer, which now also includes the explanation for why
the quotes can be safely omitted.
(Often) The wrong way
if [ -z "$var" ]; then echo "var is blank"; else echo "var is set to '$var'"; fi
This is often wrong because it doesn't distinguish between a variable that is unset and a
variable that is set to the empty string. That is to say, if var='' , then the
above solution will output "var is blank".
The distinction between unset and "set to the empty string" is essential in situations
where the user has to specify an extension, or additional list of properties, and that not
specifying them defaults to a non-empty value, whereas specifying the empty string should
make the script use an empty extension or list of additional properties.
The distinction may not be essential in every scenario though. In those cases [ -z
"$var" ] will be just fine.
In the expansion forms summarized by the legend below, "substitute" means the expression is replaced with the value shown, and "assign" means the parameter is assigned that value, which also replaces the expression; a short sketch of these cases follows the Values list.
While most of the techniques stated here are correct, bash 4.2 and later support an actual test for the presence of a variable, the -v test ( man bash ), rather than testing the value of the variable.
Notably, this approach will not cause an error when used to check for an unset variable in
set -u / set -o nounset mode, unlike many other approaches, such as
using [ -z .
Note that each group (with and without preceding colon) has the same set and
unset cases, so the only thing that differs is how the empty cases are
handled.
With the preceding colon, the empty and unset cases are identical, so I
would use those where possible (i.e. use := , not just = , because
the empty case is inconsistent).
Headings:
set means VARIABLE is non-empty ( VARIABLE="something"
)
empty means VARIABLE is empty/null ( VARIABLE=""
)
unset means VARIABLE does not exist ( unset VARIABLE
)
Values:
$VARIABLE means the result is the original value of the variable.
"default" means the result was the replacement string provided.
"" means the result is null (an empty string).
exit 127 means the script stops executing with exit code 127.
$(VARIABLE="default") means the result is the original value of the
variable and the replacement string provided is assigned to the variable for future
use.
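A short bash sketch of the cases this legend describes (using a throwaway variable name); the colon forms treat "empty" like "unset", the colon-less forms do not:

unset VARIABLE
echo "${VARIABLE-default}"      # unset: substitutes "default"
echo "${VARIABLE:-default}"     # unset: substitutes "default"

VARIABLE=""
echo "${VARIABLE-default}"      # empty: substitutes "" (the variable is set)
echo "${VARIABLE:-default}"     # empty: substitutes "default"

VARIABLE="something"
echo "${VARIABLE:+replacement}" # set and non-empty: substitutes "replacement"
echo "${VARIABLE:=assigned}"    # set and non-empty: substitutes the original value, no assignment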
On a modern version of Bash (4.2 or later I think; I don't know for sure), I would try this:
if [ ! -v SOMEVARIABLE ] #note the lack of a $ sigil
then
echo "Variable is unset"
elif [ -z "$SOMEVARIABLE" ]
then
echo "Variable is set to an empty string"
else
echo "Variable is set to some string"
fi
This worked for me. I wanted my script to exit with an error message if a parameter wasn't
set.
#!/usr/bin/env bash
set -o errexit
# Get the value and empty validation check all in one
VER="${1:?You must pass a version of the format 0.0.0 as the only argument}"
This returns with an error when it's run
peek@peek:~$ ./setver.sh
./setver.sh: line 13: 1: You must pass a version of the format 0.0.0 as the only argument
Check only, no exit - Empty and Unset are INVALID
Try this option if you just want to check if the value set=VALID or
unset/empty=INVALID.
TSET="good val"
TEMPTY=""
unset TUNSET
if [ "${TSET:-}" ]; then echo "VALID"; else echo "INVALID";fi
# VALID
if [ "${TEMPTY:-}" ]; then echo "VALID"; else echo "INVALID";fi
# INVALID
if [ "${TUNSET:-}" ]; then echo "VALID"; else echo "INVALID";fi
# INVALID
To check whether a variable is set with a non-empty value, use [ -n "$x" ] , as
others have already indicated.
Most of the time, it's a good idea to treat a variable that has an empty value in the same
way as a variable that is unset. But you can distinguish the two if you need to: [ -n
"${x+set}" ] ( "${x+set}" expands to set if x
is set and to the empty string if x is unset).
To check whether a parameter has been passed, test $# , which is the number
of parameters passed to the function (or to the script, when not in a function) (see
Paul's answer ).
Read the "Parameter Expansion" section of the bash man page. Parameter expansion
doesn't provide a general test for a variable being set, but there are several things you can
do to a parameter if it isn't set.
For example:
function a {
first_arg=${1-foo}
# rest of the function
}
will set first_arg equal to $1 if it is assigned, otherwise it
uses the value "foo". If a absolutely must take a single parameter, and no good
default exists, you can exit with an error message when no parameter is given:
function a {
: ${1?a must take a single argument}
# rest of the function
}
(Note the use of : as a null command, which just expands the values of its
arguments. We don't want to do anything with $1 in this example, just exit if it
isn't set)
For those that are looking to check for unset or empty when in a script with set
-u :
if [ -z "${var-}" ]; then
echo "Must provide var environment variable. Exiting...."
exit 1
fi
The regular [ -z "$var" ] check will fail with var: unbound variable if set -u is in effect, but [ -z "${var-}" ] expands to the empty string if var is unset, without failing.
I'm giving a heavily Bash-focused answer because of the bash tag.
Short answer
As long as you're only dealing with named variables in Bash, this function should always
tell you if the variable has been set, even if it's an empty array.
is-variable-set() {
declare -p $1 &>/dev/null
}
Why this works
In Bash (at least as far back as 3.0), if var is a declared/set variable,
then declare -p var outputs a declare command that would set
variable var to whatever its current type and value are, and returns status code
0 (success). If var is undeclared, then declare -p var
outputs an error message to stderr and returns status code 1 .
Using &>/dev/null redirects both regular stdout and stderr output to /dev/null, never to be seen, and without changing the status code. Thus the function only returns the status code.
Why other methods (sometimes) fail in Bash
[ -n "$var" ] : This only checks if ${var[0]} is nonempty.
(In Bash, $var is the same as ${var[0]} .)
[ -n "${var+x}" ] : This only checks if ${var[0]} is
set.
[ "${#var[@]}" != 0 ] : This only checks if at least one index of
$var is set.
When this method fails in Bash
This only works for named variables (including $_ ), not certain special
variables ( $! , $@ , $# , $$ ,
$* , $? , $- , $0 , $1 ,
$2 , ..., and any I may have forgotten). Since none of these are arrays, the
POSIX-style [ -n "${var+x}" ] works for all of these special variables. But
beware of wrapping it in a function since many special variables change values/existence when
functions are called.
Shell compatibility note
If your script has arrays and you're trying to make it compatible with as many shells as
possible, then consider using typeset -p instead of declare -p .
I've read that ksh only supports the former, but haven't been able to test this. I do know
that Bash 3.0+ and Zsh 5.5.1 each support both typeset -p and declare
-p , differing only in which one is an alternative for the other. But I haven't tested
differences beyond those two keywords, and I haven't tested other shells.
If you need your script to be POSIX sh compatible, then you can't use arrays. Without arrays, [ -n "${var+x}" ] works.
Comparison code for different methods in Bash
This function unsets variable var , eval s the passed code, runs
tests to determine if var is set by the eval d code, and finally
shows the resulting status codes for the different tests.
I'm skipping test -v var , [ -v var ] , and [[ -v var
]] because they yield identical results to the POSIX standard [ -n "${var+x}"
] , while requiring Bash 4.2+. I'm also skipping typeset -p because it's
the same as declare -p in the shells I've tested (Bash 3.0 thru 5.0, and Zsh
5.5.1).
is-var-set-after() {
# Set var by passed expression.
unset var
eval "$1"
# Run the tests, in increasing order of accuracy.
[ -n "$var" ] # (index 0 of) var is nonempty
nonempty=$?
[ -n "${var+x}" ] # (index 0 of) var is set, maybe empty
plus=$?
[ "${#var[@]}" != 0 ] # var has at least one index set, maybe empty
count=$?
declare -p var &>/dev/null # var has been declared (any type)
declared=$?
# Show test results.
printf '%30s: %2s %2s %2s %2s\n' "$1" $nonempty $plus $count $declared
}
Test case code
Note that test results may be unexpected due to Bash treating non-numeric array indices as
"0" if the variable hasn't been declared as an associative array. Also, associative arrays
are only valid in Bash 4.0+.
# Header.
printf '%30s: %2s %2s %2s %2s\n' "test" '-n' '+x' '#@' '-p'
# First 5 tests: Equivalent to setting 'var=foo' because index 0 of an
# indexed array is also the nonindexed value, and non-numerical
# indices in an array not declared as associative are the same as
# index 0.
is-var-set-after "var=foo" # 0 0 0 0
is-var-set-after "var=(foo)" # 0 0 0 0
is-var-set-after "var=([0]=foo)" # 0 0 0 0
is-var-set-after "var=([x]=foo)" # 0 0 0 0
is-var-set-after "var=([y]=bar [x]=foo)" # 0 0 0 0
# '[ -n "$var" ]' fails when var is empty.
is-var-set-after "var=''" # 1 0 0 0
is-var-set-after "var=([0]='')" # 1 0 0 0
# Indices other than 0 are not detected by '[ -n "$var" ]' or by
# '[ -n "${var+x}" ]'.
is-var-set-after "var=([1]='')" # 1 1 0 0
is-var-set-after "var=([1]=foo)" # 1 1 0 0
is-var-set-after "declare -A var; var=([x]=foo)" # 1 1 0 0
# Empty arrays are only detected by 'declare -p'.
is-var-set-after "var=()" # 1 1 1 0
is-var-set-after "declare -a var" # 1 1 1 0
is-var-set-after "declare -A var" # 1 1 1 0
# If 'var' is unset, then it even fails the 'declare -p var' test.
is-var-set-after "unset var" # 1 1 1 1
Test output
The test mnemonics in the header row correspond to [ -n "$var" ] , [ -n
"${var+x}" ] , [ "${#var[@]}" != 0 ] , and declare -p var ,
respectively.
declare -p var &>/dev/null is (100%?) reliable for testing named
variables in Bash since at least 3.0.
[ -n "${var+x}" ] is reliable in POSIX compliant situations, but cannot
handle arrays.
Other tests exist for checking if a variable is nonempty, and for checking for
declared variables in other shells. But these tests are suited for neither Bash nor POSIX
scripts.
Using [[ -z "$var" ]] is the easiest way to know if a variable was set or not, but that option -z doesn't distinguish between an unset variable and a variable set to an empty string.
The answers above do not work when the Bash option set -u is enabled. Also, they are not dynamic; for example, how do you test whether a variable with the name "dummy" is defined? Try this:
is_var_defined()
{
if [ $# -ne 1 ]
then
echo "Expected exactly one argument: variable name as string, e.g., 'my_var'"
exit 1
fi
# Tricky. Since Bash option 'set -u' may be enabled, we cannot directly test if a variable
# is defined with this construct: [ ! -z "$var" ]. Instead, we must use default value
# substitution with this construct: [ ! -z "${var:-}" ]. Normally, a default value follows the
# operator ':-', but here we leave it blank for empty (null) string. Finally, we need to
# substitute the text from $1 as 'var'. This is not allowed directly in Bash with this
# construct: [ ! -z "${$1:-}" ]. We need to use indirection with eval operator.
# Example: $1="var"
# Expansion for eval operator: "[ ! -z \${$1:-} ]" -> "[ ! -z \${var:-} ]"
# Code execute: [ ! -z ${var:-} ]
eval "[ ! -z \${$1:-} ]"
return $? # Pedantic.
}
$ var=10
$ if ! ${var+false};then echo "is set";else echo "NOT set";fi
is set
$ unset var
$ if ! ${var+false};then echo "is set";else echo "NOT set";fi
NOT set
So basically, if a variable is set, it becomes "a negation of the resulting
false " (what will be true = "is set").
And, if it is unset, it will become "a negation of the resulting true " (as
the empty result evaluates to true ) (so will end as being false =
"NOT set").
if [[ ${1:+isset} ]]
then echo "It was set and not null." >&2
else echo "It was not set or it was null." >&2
fi
if [[ ${1+isset} ]]
then echo "It was set but might be null." >&2
else echo "It was was not set." >&2
fi
I like auxiliary functions to hide the crude details of bash. In this case, doing so adds
even more (hidden) crudeness:
# The first ! negates the result (can't use -n to achieve this)
# the second ! expands the content of varname (can't do ${$varname})
function IsDeclared_Tricky
{
local varname="$1"
! [ -z ${!varname+x} ]
}
Because I first had bugs in this implementation (inspired by the answers of Jens and
Lionel), I came up with a different solution:
# Ask for the properties of the variable - fails if not declared
function IsDeclared()
{
declare -p $1 &>/dev/null
}
I find it to be more straight-forward, more bashy and easier to understand/remember. Test
case shows it is equivalent:
function main()
{
declare -i xyz
local foo
local bar=
local baz=''
IsDeclared_Tricky xyz; echo "IsDeclared_Tricky xyz: $?"
IsDeclared_Tricky foo; echo "IsDeclared_Tricky foo: $?"
IsDeclared_Tricky bar; echo "IsDeclared_Tricky bar: $?"
IsDeclared_Tricky baz; echo "IsDeclared_Tricky baz: $?"
IsDeclared xyz; echo "IsDeclared xyz: $?"
IsDeclared foo; echo "IsDeclared foo: $?"
IsDeclared bar; echo "IsDeclared bar: $?"
IsDeclared baz; echo "IsDeclared baz: $?"
}
main
The test case also shows that local var does NOT declare var (unless followed by '='). For quite some time I thought I declared variables this way, just to discover now that I merely expressed my intention... It's a no-op, I guess.
I mostly use this test to give (and return) parameters to functions in a somewhat
"elegant" and safe way (almost resembling an interface...):
#auxiliary functions
function die()
{
echo "Error: $1"; exit 1
}
function assertVariableDeclared()
{
IsDeclared "$1" || die "variable not declared: $1"
}
function expectVariables()
{
while (( $# > 0 )); do
assertVariableDeclared $1; shift
done
}
# actual example
function exampleFunction()
{
expectVariables inputStr outputStr
outputStr="$inputStr world!"
}
function bonus()
{
local inputStr='Hello'
local outputStr= # remove this to trigger error
exampleFunction
echo $outputStr
}
bonus
If you wish to test that a variable is bound or unbound, this works well, even after you've
turned on the nounset option:
set -o nounset
if printenv variableName >/dev/null; then
    : # variable is bound to a value (note: printenv only sees exported/environment variables)
else
    : # variable is unbound
fi
Functions to check if a variable is declared or unset, including the case of an empty array ( array=() ).
The following functions test whether the given name exists as a variable:
# The first parameter needs to be the name of the variable to be checked.
# (See example below)
var_is_declared() {
{ [[ -n ${!1+anything} ]] || declare -p $1 &>/dev/null;}
}
var_is_unset() {
{ [[ -z ${!1+anything} ]] && ! declare -p $1 &>/dev/null;}
}
By first testing whether the variable is (un)set, the call to declare can be avoided if it is not
necessary.
If, however, $1 contains the name of an empty array ( array=() ), the call to declare makes
sure we get the right result.
There's never much data passed to /dev/null, as declare is only called if the variable is either
unset or an empty array.
These functions would test as shown for the following conditions:
with var_is_declared:
a;       # is not declared
a=;      # is declared
a="foo"; # is declared
a=();    # is declared
a=("");  # is declared
unset a; # is not declared
with var_is_unset:
a;       # is unset
a=;      # is not unset
a="foo"; # is not unset
a=();    # is not unset
a=("");  # is not unset
unset a; # is unset
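A small usage sketch for these functions (my_var is just an example name):
if var_is_declared my_var; then
    echo "my_var is declared (possibly empty)"
fi
if var_is_unset my_var; then
    echo "my_var is unset"
fi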
Remark: The similar usage of declare -p , as it is also shown by Peregring-lk 's
answer , is
truly coincidental. Otherwise I would of course have credited it!
I want to insert some items into mc menu (which is opened by F2) grouped together. Is it
possible to insert some sort of separator before them or put them into some submenu?
Probably, not.
The format of the menu file is very simple. Lines that start with anything but
space or tab are considered entries for the menu (in order to be able to use
it like a hot key, the first character should be a letter). All the lines that
start with a space or a tab are the commands that will be executed when the
entry is selected.
But MC allows you to make multiple menu entries with the same shortcut and title, so you can
make a menu entry that looks like a separator and does nothing, like:
a hello
echo world
- --------
b world
echo hello
- --------
c superuser
ls /
The above would compress the current directory (%d) to a file also in the current directory. If you want to compress the directory
pointed to by the cursor rather than the current directory, use %f instead:
tar -czf %f_$(date '+%%Y%%m%%d').tar.gz %f
mc handles escaping of special characters so there is no need to put %f in quotes.
By the way, midnight commander's special treatment of percent signs occurs not just in the user menu file but also at the command
line. This is an issue when using shell commands with constructs like ${var%.c} . At the command line, the same as
in the user menu file, percent signs can be escaped by doubling them.
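As an illustration of the doubling rule (a hypothetical example typed at mc's command line):
echo ${var%%.c}    # mc passes this to the shell as: echo ${var%.c}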
It is documented in the help, the node is "Edit Menu File" under "Command Menu"; if you
scroll down you should find "Addition Conditions":
If the condition begins with '+' (or '+?') instead of '=' (or '=?') it is an addition
condition. If the condition is true the menu entry will be included in the menu. If the
condition is false the menu entry will not be included in the menu.
This is preceded by "Default conditions" (the = condition), which determine
which entry will be highlighted as the default choice when the menu appears. Anyway, by way
of example:
+ t r & ! t t
t r means if this is a regular file ("t(ype) r"), and ! t t
means if the file has not been tagged in the interface.
As I understand pipes and commands, bash takes each command, spawns a process for each one
and connects stdout of the previous one with the stdin of the next one.
For example, in "ls -lsa | grep feb", bash will create two processes, and connect the
output of "ls -lsa" to the input of "grep feb".
When you execute a background command like "sleep 30 &" in bash, you get the pid of
the background process running your command. Surprisingly for me, when I wrote "ls -lsa |
grep feb &" bash returned only one PID.
How should this be interpreted? Does one process run both "ls -lsa" and "grep feb"? Are several
processes created and I only get the PID of one of them?
When you run a job in the background, bash prints the process ID of its subprocess, the one
that runs the command in that job. If that job happens to create more subprocesses, that's
none of the parent shell's business.
When the background job is a pipeline (i.e. the command is of the form something1 |
something2 & , and not e.g. { something1 | something2; } & ),
there's an optimization which is strongly suggested by POSIX and performed by most shells
including bash: each of the elements of the pipeline is executed directly as a subprocess of
the original shell. What POSIX mandates is that the variable
$! is set to the PID of the last command in the pipeline in this case. In most shells,
that last command is a subprocess of the original process, and so are the other commands in
the pipeline.
When you run ls -lsa | grep feb , there are three processes involved: the one
that runs the left-hand side of the pipe (a subshell that finishes setting up the pipe then
executes ls ), the one that runs the right-hand side of the pipe (a subshell
that finishes setting up the pipe then executes grep ), and the original process
that waits for the pipe to finish.
You can watch what happens by tracing the processes:
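For instance, a minimal session sketch (PIDs are made up; the ps options assume procps ps on Linux):
$ sleep 30 | sleep 40 &
[1] 12345
$ echo $!                         # PID of the last command of the pipeline
12345
$ ps -o pid,ppid,args --ppid $$   # both sleeps appear as children of the interactive shell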
Use grep [n]ame so the grep process itself doesn't match, rather than piping through grep -v name.
Also, using xargs the way it is shown above is wrong: to run a command on whatever is piped in,
you have to use -I (replace mode), otherwise you may have issues with the command.
I start a background process from my shell script, and I would like to kill this process when my script finishes.
How to get the PID of this process from my shell script? As far as I can see variable $! contains the PID of the
current script, not the background process.
You need to save the PID of the background process at the time you start it:
foo &
FOO_PID=$!
# do other stuff
kill $FOO_PID
You cannot use job control, since that is an interactive feature and tied to a controlling terminal. A script will not necessarily
have a terminal attached at all so job control will not necessarily be available.
An even simpler way to kill all child processes of a bash script:
pkill -P $$
The -P flag works the same way with pkill and pgrep - it gets child processes, only
with pkill the child processes get killed and with pgrep child PIDs are printed to stdout.
this is what I have done. Check it out, hope it can help.
#!/bin/bash
#
# So something to show.
echo "UNO" > UNO.txt
echo "DOS" > DOS.txt
#
# Initialize Pid List
dPidLst=""
#
# Generate background processes
tail -f UNO.txt&
dPidLst="$dPidLst $!"
tail -f DOS.txt&
dPidLst="$dPidLst $!"
#
# Report process IDs
echo PID=$$
echo dPidLst=$dPidLst
#
# Show process on current shell
ps -f
#
# Start killing background processes from list
for dPid in $dPidLst
do
    echo "killing $dPid. Process is still there."
    ps | grep $dPid
    kill $dPid
    ps | grep $dPid
    echo "Just ran 'ps' command, $dPid must not show again."
done
Then just run it as: ./bgkill.sh with proper permissions of course
root@umsstd22 [P]:~# ./bgkill.sh
PID=23757
dPidLst= 23758 23759
UNO
DOS
UID PID PPID C STIME TTY TIME CMD
root 3937 3935 0 11:07 pts/5 00:00:00 -bash
root 23757 3937 0 11:55 pts/5 00:00:00 /bin/bash ./bgkill.sh
root 23758 23757 0 11:55 pts/5 00:00:00 tail -f UNO.txt
root 23759 23757 0 11:55 pts/5 00:00:00 tail -f DOS.txt
root 23760 23757 0 11:55 pts/5 00:00:00 ps -f
killing 23758. Process is still there.
23758 pts/5 00:00:00 tail
./bgkill.sh: line 24: 23758 Terminated tail -f UNO.txt
Just ran 'ps' command, 23758 must not show again.
killing 23759. Process is still there.
23759 pts/5 00:00:00 tail
./bgkill.sh: line 24: 23759 Terminated tail -f DOS.txt
Just ran 'ps' command, 23759 must not show again.
root@umsstd22 [P]:~# ps -f
UID PID PPID C STIME TTY TIME CMD
root 3937 3935 0 11:07 pts/5 00:00:00 -bash
root 24200 3937 0 11:56 pts/5 00:00:00 ps -f
This typically gives a text representation of all the processes for the "user" and the -p option gives the process-id. It does
not depend, as far as I understand, on having the processes be owned by the current shell. It also shows forks.
pgrep can get you all of the child PIDs of a parent process. As mentioned earlier, $$ is the current
script's PID. So, if you want a script that cleans up after itself, this should do the trick:
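The snippet itself is not reproduced here; a minimal sketch of the idea, using a trap so the cleanup also runs when the script exits normally (the long_running_command names are placeholders):
#!/bin/bash
trap 'pkill -P $$' EXIT   # on exit, kill every child process of this script
long_running_command_1 &
long_running_command_2 &
wait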
[ -n file.txt ] doesn't check its size , it checks that the string file.txt is
non-zero length, so it will always succeed.
If you want to say " size is non-zero", you need [ -s file.txt ] .
To get a file's size , you can use wc -c to get the size ( file length) in bytes:
file=file.txt
minimumsize=90000
actualsize=$(wc -c <"$file")
if [ $actualsize -ge $minimumsize ]; then
echo size is over $minimumsize bytes
else
echo size is under $minimumsize bytes
fi
In this case, it sounds like that's what you want.
But FYI, if you want to know how much disk space the file is using, you could use du -k to get the
size (disk space used) in kilobytes:
file=file.txt
minimumsize=90
actualsize=$(du -k "$file" | cut -f 1)
if [ $actualsize -ge $minimumsize ]; then
echo size is over $minimumsize kilobytes
else
echo size is under $minimumsize kilobytes
fi
If you need more control over the output format, you can also look at stat . On Linux, you'd start with something
like stat -c '%s' file.txt , and on BSD/Mac OS X, something like stat -f '%z' file.txt .
--Mikel
Why du -b "$file" | cut -f 1 instead of stat -c '%s' "$file" ? Or stat --printf="%s" "$file" ?
– mivk, Dec 14 '13 at 11:00
Only because it's more portable. BSD and Linux stat have different flags.
– Mikel, Dec 16 '13 at 16:40
It surprises me that no one mentioned stat to check file size. Some methods are definitely better: using -s
to find out whether the file is empty or not is easier than anything else if that's all you want. And if you want to
find files of a size, then find is certainly the way to go.
I also like du a lot to get file size in kb, but, for bytes, I'd use stat :
size=$(stat -f%z "$filename") # BSD stat
size=$(stat -c%s "$filename") # GNU stat?
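If a script has to run on both, one possible hedge (just a sketch, not a canonical idiom) is to try GNU stat first and fall back to BSD stat:
size=$(stat -c%s "$filename" 2>/dev/null || stat -f%z "$filename")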
alternative solution with awk and double parenthesis:
FILENAME=file.txt
SIZE=$(du -sb $FILENAME | awk '{ print $1 }')
if ((SIZE<90000)) ; then
echo "less";
else
echo "not less";
fi
+ t r & ! t t
d Diff against file of same name in other directory
if [ "%d" = "%D" ]; then
echo "The two directores must be different"
exit 1
fi
if [ -f %D/%f ]; then # if two of them, then
bcompare %f %D/%f &
else
echo %f: No copy in %D/%f
fi
x Diff file to file
if [ -f %D/%F ]; then # if two of them, then
bcompare %f %D/%F &
else
echo %f: No copy in %D/%F
fi
D Diff current directory against other directory
if [ "%d" = "%D" ]; then
echo "The two directores must be different"
exit 1
fi
bcompare %d %D &
Is it possible to configure the Midnight Commander (Ubuntu 10.10) to show certain file and
directory names differently, e.g. all hidden (starting with a period) using grey color?
See man mc in the Colors section for ways to choose particular colors by
adding entries in your ~/.config/mc/ini file. Unfortunately, there doesn't
appear to be a keyword for hidden files.
Alright, so simple problem here. I'm working on a simple back up code. It works fine except
if the files have spaces in them. This is how I'm finding files and adding them to a tar
archive:
find . -type f | xargs tar -czvf backup.tar.gz
The problem is when the file has a space in the name because tar thinks that it's a
folder. Basically is there a way I can add quotes around the results from find? Or a
different way to fix this?
For anyone that has found this post through numerous googling, I found a way to not only
find specific files given a time range, but also NOT include the relative paths OR
whitespaces that would cause tarring errors. (THANK YOU SO MUCH STEVE.)
find . -name "*.pdf" -type f -mtime 0 -printf "%f\0" | tar -czvf /dir/zip.tar.gz --null -T -
. relative directory
-name "*.pdf" look for pdfs (or any file type)
-type f type to look for is a file
-mtime 0 look for files created in last 24 hours
-printf "%f\0" Regular -print0 OR -printf "%f"
did NOT work for me. From man pages:
This quoting is performed in the same way as for GNU ls. This is not the same quoting
mechanism as the one used for -ls and -fls. If you are able to decide what format to use
for the output of find then it is normally better to use '\0' as a terminator than to use
newline, as file names can contain white space and newline characters.
-czvf create archive, filter the archive through gzip , verbosely list
files processed, archive name
Some versions of tar, for example, the default versions on HP-UX (I tested 11.11 and 11.31),
do not include a command line option to specify a file list, so a decent work-around is to do
this:
On Solaris, you can use the option -I to read the filenames that you would normally state on
the command line from a file. In contrast to the command line, this can create tar archives
with hundreds of thousands of files (just did that).
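For example (assuming Solaris tar, where -I names an include file with one pathname per line; the file names here are made up):
find . -name '*.log' > filelist.txt
tar -cf archive.tar -I filelist.txt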
Is there a simple shell command/script that supports excluding certain files/folders from
being archived?
I have a directory that need to be archived with a sub directory that has a number of very
large files I do not need to backup.
Not quite solutions:
The tar --exclude=PATTERN command matches the given pattern and excludes
those files, but I need specific files & folders to be ignored (full file path),
otherwise valid files might be excluded.
I could also use the find command to create a list of files and exclude the ones I don't
want to archive and pass the list to tar, but that only works with for a small amount of
files. I have tens of thousands.
I'm beginning to think the only solution is to create a file with a list of files/folders
to be excluded, then use rsync with --exclude-from=file to copy all the files to
a tmp directory, and then use tar to archive that directory.
Can anybody think of a better/more efficient solution?
EDIT: Charles Ma 's solution works well. The big gotcha is that the
--exclude='./folder' MUST be at the beginning of the tar command. Full command
(cd first, so backup is relative to that directory):
cd /folder_to_backup
tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .
so, you want to make a tar file that contains everything inside /home/ftp/mysite (to move
the site to a new server), but file3 is just junk, and everything in
folder3 is also not needed, so we will skip those two.
we use the format
tar -czvf <name of tar file> <what to tar> <any excludes>
where c = create, z = zip, and v = verbose (you can see the files as they are entered,
useful to make sure none of the files you exclude are being added), and f = file.
so, my command would look like this
cd /home/ftp/
tar -czvf mysite.tar.gz mysite --exclude='file3' --exclude='folder3'
note the files/folders excluded are relative to the root of your tar (I have tried a full
path here relative to / but I can not make that work).
hope this will help someone (and me next time I google it)
I've experienced that, at least with the Cygwin version of tar I'm using
("CYGWIN_NT-5.1 1.7.17(0.262/5/3) 2012-10-19 14:39 i686 Cygwin" on a Windows XP Home Edition
SP3 machine), the order of options is important.
While this construction worked for me:
tar cfvz target.tgz --exclude='<dir1>' --exclude='<dir2>' target_dir
that one didn't work:
tar cfvz --exclude='<dir1>' --exclude='<dir2>' target.tgz target_dir
This, while tar --help reveals the following:
tar [OPTION...] [FILE]
So, the second command should also work, but apparently it doesn't seem to be the
case...
I found this somewhere else so I won't take credit, but it worked better than any of the
solutions above for my mac specific issues (even though this is closed):
tar zc --exclude __MACOSX --exclude .DS_Store -f <archive> <source(s)>
$ tar --exclude='./folder_or_file' --exclude='file_pattern' --exclude='fileA'
A word of warning for a side effect that I did not find immediately obvious: The exclusion
of 'fileA' in this example will search for 'fileA' RECURSIVELY!
Example: A directory with a single subdirectory containing a file of the same name
(data.txt)
If using --exclude='data.txt' the archive will not contain EITHER data.txt
file. This can cause unexpected results if archiving third party libraries, such as a
node_modules directory.
To avoid this issue make sure to give the entire path, like
--exclude='./dirA/data.txt'
To avoid possible 'xargs: Argument list too long' errors due to the use of
find ... | xargs ... when processing tens of thousands of files, you can pipe
the output of find directly to tar using find ... -print0 |
tar --null ... .
# archive a given directory, but exclude various files & directories
# specified by their full file paths
find "$(pwd -P)" -type d \( -path '/path/to/dir1' -or -path '/path/to/dir2' \) -prune \
-or -not \( -path '/path/to/file1' -or -path '/path/to/file2' \) -print0 |
gnutar --null --no-recursion -czf archive.tar.gz --files-from -
#bsdtar --null -n -czf archive.tar.gz -T -
Use the find command in conjunction with the tar append (-r) option. This way you can add
files to an existing tar in a single step, instead of a two pass solution (create list of
files, create tar).
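A sketch of that single-step append (assumes GNU tar; the pattern and archive name are just examples):
find . -name '*.pdf' -mtime 0 -print0 | tar --null -rf archive.tar --files-from -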
You can use cpio(1) to create tar files. cpio takes the files to archive on stdin, so if
you've already figured out the find command you want to use to select the files the archive,
pipe it into cpio to create the tar file:
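For example, with GNU cpio the -H ustar option writes a POSIX tar-format archive (the names here are just examples):
find . -name '*.log' -print0 | cpio -o -0 -H ustar > archive.tar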
With GNU tar v 1.26 the --exclude needs to come after the archive file and backup directory arguments,
should have no leading or trailing slashes, and prefers no quotes (single or double). So,
relative to the PARENT directory to be backed up, it's:
tar cvfz /path_to/mytar.tgz ./dir_to_backup
--exclude=some_path/to_exclude
After reading all these good answers for different versions and having solved the problem for
myself, I think there are very small details that are very important, and rare in general
GNU/Linux use, that aren't stressed enough and deserve more than comments.
So I'm not going to try to answer the question for every case, but instead try to
register where to look when things don't work.
IT IS VERY IMPORTANT TO NOTICE:
THE ORDER OF THE OPTIONS MATTERS: it is not the same to put the --exclude before or after
the archive file and the directories to back up. This is unexpected, at least to me, because in my
experience, in GNU/Linux commands, the order of the options usually doesn't matter.
Different tar versions expect these options in a different order: for instance,
@Andrew's answer indicates that in GNU tar v 1.26 and 1.28 the excludes come last,
whereas in my case, with GNU tar 1.29, it's the other way.
THE TRAILING SLASHES MATTER: at least in GNU tar 1.29, there shouldn't be any.
In my case, for GNU tar 1.29 on Debian stretch, the command that worked was
tar --exclude="/home/user/.config/chromium" --exclude="/home/user/.cache" -cf file.tar /dir1/ /home/ /dir3/
The quotes didn't matter, it worked with or without them.
tar -cvzf destination_folder source_folder -X /home/folder/excludes.txt
-X indicates a file which contains a list of filenames which must be excluded from the
backup. For Instance, you can specify *~ in this file to not include any filenames ending
with ~ in the backup.
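A minimal sketch of that approach (GNU tar; the exclude patterns are examples):
printf '%s\n' '*~' '*.tmp' > /home/folder/excludes.txt
tar -czvf backup.tar.gz -X /home/folder/excludes.txt source_folder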
Possible redundant answer but since I found it useful, here it is:
While a FreeBSD root (i.e. using csh) I wanted to copy my whole root filesystem to /mnt
but without /usr and (obviously) /mnt. This is what worked (I am at /):
tar --exclude ./usr --exclude ./mnt --create --file - . | (cd /mnt && tar xvd -)
My whole point is that it was necessary (by putting the ./ ) to specify to tar that
the excluded directories were part of the greater directory being copied.
I had no luck getting tar to exclude a 5 Gigabyte subdirectory a few levels deep. In the end,
I just used the unix Zip command. It worked a lot easier for me.
So for this particular example from the original post
(tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz . )
The equivalent would be:
zip -r /backup/filename.zip . -x upload/folder/**\* upload/folder2/**\*
The following bash script should do the trick. It uses the answer given
here by Marcus Sundman.
#!/bin/bash
echo -n "Please enter the name of the tar file you wish to create with out extension "
read nam
echo -n "Please enter the path to the directories to tar "
read pathin
echo tar -czvf $nam.tar.gz
excludes=`find $pathin -iname "*.CC" -exec echo "--exclude \'{}\'" \;|xargs`
echo $pathin
echo tar -czvf $nam.tar.gz $excludes $pathin
This will print out the command you need and you can just copy and paste it back in. There
is probably a more elegant way to provide it directly to the command line.
Just change *.CC for any other common extension, file name or regex you want to exclude
and this should still work.
EDIT
Just to add a little explanation; find generates a list of files matching the chosen pattern
(in this case *.CC). This list is passed via xargs to the echo command. This prints --exclude
'one entry from the list'. The backslashes (\) are escape characters for the ' marks.
I've got a job running on my server at the command line prompt for a two days now:
find data/ -name filepattern-*2009* -exec tar uf 2009.tar {} \;
It is taking forever , and then some. Yes, there are millions of files in the
target directory. (Each file is a measly 8 bytes in a well hashed directory structure.) But
just running...
...takes only two hours or so. At the rate my job is running, it won't be finished for a
couple of weeks .. That seems unreasonable. Is there a more efficient to do this?
Maybe with a more complicated bash script?
A secondary questions is "why is my current approach so slow?"
If you already did the second command that created the file list, just use the
-T option to tell tar to read the files names from that saved file list. Running
1 tar command vs N tar commands will be a lot better.
Here's a find-tar combination that can do what you want without the use of xargs or exec
(which should result in a noticeable speed-up):
tar --version # tar (GNU tar) 1.14
# FreeBSD find (on Mac OS X)
find -x data -name "filepattern-*2009*" -print0 | tar --null --no-recursion -uf 2009.tar --files-from -
# for GNU find use -xdev instead of -x
gfind data -xdev -name "filepattern-*2009*" -print0 | tar --null --no-recursion -uf 2009.tar --files-from -
# added: set permissions via tar
find -x data -name "filepattern-*2009*" -print0 | \
tar --null --no-recursion --owner=... --group=... --mode=... -uf 2009.tar --files-from -
Guessing why it is slow is hard, as there is not much information: what is the structure of
the directory, what filesystem do you use, how was it configured at creation time? Having millions
of files in a single directory is quite a hard situation for most filesystems.
To correctly handle file names with weird (but legal) characters (such as newlines, ...) you
should write your file list to filesOfInterest.txt using find's -print0:
find -x data -name "filepattern-*2009*" -print0 > filesOfInterest.txt
tar --null --no-recursion -uf 2009.tar --files-from filesOfInterest.txt
The way you currently have things, you are invoking the tar command every single time it
finds a file, which is not surprisingly slow. Instead of taking the two hours to print plus
the amount of time it takes to open the tar archive, see if the files are out of date, and
add them to the archive, you are actually multiplying those times together. You might have
better success invoking the tar command once, after you have batched together all the names,
possibly using xargs to achieve the invocation. By the way, I hope you are using
'filepattern-*2009*' and not filepattern-*2009* as the stars will be expanded by the shell
without quotes.
You should check whether most of your time is being spent on CPU or on I/O. Either way, there are
ways to improve it:
A: don't compress
You didn't mention "compression" in your list of requirements, so try dropping the "z" from
your arguments list: tar cf. This might speed things up a bit.
There are other techniques to speed-up the process, like using "-N " to skip files you
already backed up before.
B: backup the whole partition with dd
Alternatively, if you're backing up an entire partition, take a copy of the whole disk
image instead. This would save processing and a lot of disk head seek time.
tar and any other program working at a higher level has the overhead of having to
read and process directory entries and inodes to find where the file content is, and does
more disk head seeks, reading each file from a different place on the disk.
To backup the underlying data much faster, use:
dd bs=16M if=/dev/sda1 of=/another/filesystem
(This assumes you're not using RAID, which may change things a bit)
To repeat what others have said: we need to know more about the files that are being backed
up. I'll go with some assumptions here.
Append to the tar file
If files are only being added to the directories (that is, no file is being deleted), make
sure you are appending to the existing tar file rather than re-creating it every time. You
can do this by specifying the existing archive filename in your tar command
instead of a new one (or deleting the old one).
Write to a different disk
Reading from the same disk you are writing to may be killing performance. Try writing to a
different disk to spread the I/O load. If the archive file needs to be on the same disk as
the original files, move it afterwards.
Don't compress
Just repeating what @Yves said. If your backup files are already compressed, there's not
much need to compress again. You'll just be wasting CPU cycles.
I'm trying to tar a collection of files in a directory called 'my_directory' and
remove the originals by using the command:
tar -cvf files.tar my_directory --remove-files
However it is only removing the individual files inside the directory and not the
directory itself (which is what I specified in the command). What am I missing here?
EDIT:
Yes, I suppose the 'remove-files' option is fairly literal. Although I too found the man
page unclear on that point. (In Linux I tend not to really distinguish between
directories and files that much, and sometimes forget that they are not the same thing). It
looks like the consensus is that it doesn't remove directories.
However, my major prompting point for asking this question stems from tar's handling of
absolute paths. Because you must specify a relative path to a file/s to be compressed, you
therefore must change to the parent directory to tar it properly. As I see it using any kind
of follow-on 'rm' command is potentially dangerous in that situation. Thus I was hoping to
simplify things by making tar itself do the remove.
For example, imagine a backup script where the directory to backup (ie. tar) is included
as a shell variable. If that shell variable value was badly entered, it is possible that the
result could be deleted files from whatever directory you happened to be in last.
logFile={path to a run log that captures status messages}
Then you could execute something along the lines of:
cd ${parent}
tar cvf Tar_File.`date +%Y%m%d_%H%M%S` ${source}
if [ $? != 0 ]
then
    echo "Backup FAILED for ${source} at `date`" >> ${logFile}
else
    echo "Backup SUCCESS for ${source} at `date`" >> ${logFile}
    rm -rf ${source}
fi
Also the word "file" is ambigous in this case. But because this is a command line switch I
would it expect to mean also directories, because in unix/lnux everything is a file, also a
directory. (The other interpretation is of course also valid, but It makes no sense to keep
directories in such a case. I would consider it unexpected and confusing behavior.)
But I have found that on some distributions GNU tar actually removes the
directory tree. Another indication that keeping the tree was a bug, or at least some
workaround until they fixed it.
This is what I tried out on an ubuntu 10.04 console:
This is what it feels like to actually learn from an article instead of simply having it
confirm your existing beliefs.
Here is what it says:
An analysis of Dice job-posting data over the past year shows a startling dip in the number
of companies looking for technology professionals who are skilled in Ruby. In 2018, the
number of Ruby jobs declined 56 percent. That's a huge warning sign that companies are
turning away from Ruby - and if that's the case, the language's user-base could rapidly
erode to almost nothing.
Well, what's your evidence-based rebuttal to that?
If you actually look at the TIOBE rankings, it's #11 (not #12 as claimed in the article),
and back on the upswing. If you look at RedMonk, which they say they looked at but don't
reference with respect to Ruby, it is a respectable #8, being one of the top languages on
GitHub and Stack Overflow.
We are certainly past the glory days of Ruby, when it was the Hot New Thing and everyone
was deploying Rails, but to suggest that it is "probably doomed" seems a somewhat hysterical
prediction.
How do they know how many Ruby jobs there are? Maybe how many Ruby job openings
announced, but not the actual number of jobs. Or maybe they are finding Ruby job-applicants
and openings via other means.
Maybe there is a secret list of Ruby job postings only available to the coolest
programmers? Man! I never get to hang out with the cool kids.
Perhaps devops/web programming is a dying field.
But to be fair, Ruby had its peak about 10 years ago, with Ruby on Rails. However, the
problem is that Rails started to get very dated, and Python and Node.JS have taken its
place.
I don't see Ruby dying anytime soon, but I do get the feeling that Python is the go-to
scripting language for all the things now. I learned Ruby and wish I spent that time learning
Python.
Perl is perl. It will live on, but anybody writing new things with it probably needs to
have a talkin' to.
I learned Ruby and wish I spent that time learning Python.
Ruby and Python are basically the same thing. With a little google, you can literally
start programming in Python today. Search for "print python" and you can easily write a hello
world. search for 'python for loop' and suddenly you can do repetitious tasks. Search for
"define function python" and you can organize your code.
After that do a search for hash tables and lists in Python and you'll be good enough to
pass a coding interview in the language.
I write new code in perl all the time. Cleanly written, well formatted and completely
maintainable. Simply because YOU can't write perl in such a manner, that doesn't mean others
can't.
I happen to read a lot of Perl in my day job, involving reverse-engineering a particular
Linux-based appliance for integration purposes. I seldom come across scripts that are
actually all that bad.
It's important to understand that Perl has a different concept of readability. It's more
like reading a book than reading a program, because there are so many ways to write any given
task. A good Perl programmer will incorporate that flexibility into their style, so intent
can be inferred not just from the commands used, but also how the code is arranged. For
example, a large block describing a complex function would be written verbosely for detailed
clarity.
A trivial statement could be used, if it resolves an edge case.
Conversely, a good Perl reader will be familiar enough with the language to understand the
idioms and shorthand used, so they can understand the story as written without being
distracted by the ugly bits. Once viewed from that perspective, a Perl program can condense
incredible amounts of description into just a few lines, and still be as readily-understood
as any decent novel.
In building several dev teams, I have never tried to hire everyone with any particular
skill. I aim to have at least two people with each skill, but won't put effort to having more
than that at first. After the initial startup of the team, I try to run projects in pairs,
with an expert starting the project, then handing it to a junior (in that particular skill)
for completion. After a few rounds of that, the junior is close enough to an expert, and
somebody else takes the junior role. That way, even with turnover, expertise is shared among
the team, and there's always someone who can be the expert.
Back to the subject at hand, though...
My point is that Perl is a more conversational language that others, and its structure
reflects that. It is unreasonable to simply look at Perl code, see the variety of structures,
and declare it "unreadable" simply because the reader doesn't understand the language.
As an analogy, consider the structural differences between Lord of the Rings and
The Cat in the Hat . A reader who is only used to The Cat in the Hat would find
Lord of the Rings to be ridiculously complex to the point of being unreadable, when
Lord of the Rings is simply making use of structures and capabilities that are not
permitted in the language of young children's books.
This is not to say that other languages are wrong to have a more limited grammar. They are
simply different, and learning to read a more flexible language is a skill to be developed
like any other. Similar effort must be spent to learn other languages with
sufficiently-different structure, like Lisp or Haskell.
Perl: Even if RedMonk has Perl's popularity declining, it's still going to take a long
time for the language to flatten out completely, given the sheer number of legacy websites
that still feature its code. Nonetheless, a lack of active development, and widespread
developer embrace of other languages for things like building websites, means that Perl is
going to just fall into increasing disuse.
First, Perl is used for many, many more things than websites -- and the focus in TFA is
short-sighted. Second, I've written a LOT of Perl in my many years, but wouldn't say my (or
most people's) career is based on it. Yes, I have written applications in Perl, but more
often used it for utility, glue and other things that help get things done, monitor and
(re)process data. Nothing (or very few things) can beat Perl for a quick knock-off script to
do something or another.
Perl's not going anywhere and it will be a useful language to know for quite a while.
Languages like Perl (and Python) are great tools to have in your toolbox, ones that you know
how to wield well when you need them. Knowing when you need them, and not something else, is
important.
Anybody whose career is based on a single programming language is doomed already.
Programmers know how to write code. The language they use is beside the point. A good
programmer can write code in whatever language is asked of them.
The writer of this article should consider diversifying his skillset at some point, as
not all bloggers endure forever and his popularity ranking on Slashdot has recently
tanked.
I'd suggest that this writer quit their day job and take up stand up...
Old languages never really die until the platform dies. Languages may fall out of favor,
but they don't usually die until the platform they are running on disappears and then the
people who used them die. So, FORTRAN, C, C++, and COBOL and more are here to pretty much
stay.
Specifically, PERL isn't going anywhere being fundamentally on Linux, neither is Ruby, the
rest to varying degrees have been out of favor for awhile now, but none of the languages in
the article are dead. They are, however, falling out of favor and because of that, it might
be a good idea to be adding other tools to your programmer's tool box if your livelihood
depends on one of them.
"... R commits a substantial scale crime by being so dependent on memory-resident objects. Python commits major scale crime with its single-threaded primary interpreter loop. ..."
I had this naive idea that Python might substantially displace R until I learned more
about the Python internals, which are pretty nasty. This is the new generation's big
data language? If true, sure sucks to be young again.
Python isn't even really used to do big data. It's mainly used to orchestrate big data
flows on top of other libraries or facilities. It has more or less become the lingua franca
of high-level hand waving. Any real grunt work happens far below.
R commits a substantial scale crime by being so dependent on memory-resident objects.
Python commits major scale crime with its single-threaded primary interpreter loop.
If I move away from R, it will definitely be Julia for any real work (as Julia matures, if
it matures well), and not Python.
With tar.gz, to extract a single file the archiver first creates an intermediary tarball x.tar from
x.tar.gz by uncompressing the whole archive, and only then unpacks the requested files from this
intermediary tarball. If the tar.gz archive is large, unpacking can take several hours or even days.
@ChristopheDeTroyer Tarballs are compressed in such a way that you have to decompress
them in full, then take out the file you want. I think that .zip folders are different, so if
you want to be able to take out individual files fast, try them. – GKFX
Jun 3 '16 at 13:04
cd my_directory/ && tar -zcvf ../my_dir.tgz . && cd -
should do the job in one line. It works well for hidden files as well. "*" doesn't expand
hidden files by path name expansion at least in bash. Below is my experiment:
$ mkdir my_directory
$ touch my_directory/file1
$ touch my_directory/file2
$ touch my_directory/.hiddenfile1
$ touch my_directory/.hiddenfile2
$ cd my_directory/ && tar -zcvf ../my_dir.tgz . && cd ..
./
./file1
./file2
./.hiddenfile1
./.hiddenfile2
$ tar ztf my_dir.tgz
./
./file1
./file2
./.hiddenfile1
./.hiddenfile2
The -C my_directory tells tar to change the current directory to
my_directory , and then . means "add the entire current directory"
(including hidden files and sub-directories).
Make sure you do -C my_directory before you do . or else you'll
get the files in the current directory.
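For reference, a minimal form of the command being described (the archive name is an assumption):
tar -czf my_dir.tgz -C my_directory .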
With some conditions (archive only files, dirs and symlinks):
find /my/dir/ -printf "%P\n" -type f -o -type l -o -type d | tar -czf mydir.tgz --no-recursion -C /my/dir/ -T -
Explanation
The below unfortunately includes a parent directory ./ in the archive:
tar -czf mydir.tgz -C /my/dir .
You can move all the files out of that directory by using the --transform
configuration option, but that doesn't get rid of the . directory itself. It
becomes increasingly difficult to tame the command.
You could use $(find ...) to add a file list to the command (like in
magnus' answer ),
but that potentially causes a "file list too long" error. The best way is to combine it with
tar's -T option, like this:
find /my/dir/ -printf "%P\n" -type f -o -type l -o -type d | tar -czf mydir.tgz --no-recursion -C /my/dir/ -T -
Basically what it does is list all files ( -type f ), links ( -type
l ) and subdirectories ( -type d ) under your directory, make all
filenames relative using -printf "%P\n" , and then pass that to the tar command
(it takes filenames from STDIN using -T - ). The -C option is
needed so tar knows where the files with relative names are located. The
--no-recursion flag is so that tar doesn't recurse into folders it is told to
archive (causing duplicate files).
If you need to do something special with filenames (filtering, following symlinks etc),
the find command is pretty powerful, and you can test it by just removing the
tar part of the above command:
$ find /my/dir/ -printf "%P\n" -type f -o -type l -o -type d
> textfile.txt
> documentation.pdf
> subfolder2
> subfolder
> subfolder/.gitignore
For example if you want to filter PDF files, add ! -name '*.pdf'
$ find /my/dir/ -printf "%P\n" -type f ! -name '*.pdf' -o -type l -o -type d
> textfile.txt
> subfolder2
> subfolder
> subfolder/.gitignore
Non-GNU find
The command uses printf (available in GNU find ) which tells
find to print its results with relative paths. However, if you don't have GNU
find , this works to make the paths relative (removes parents with
sed ):
find /my/dir/ -type f -o -type l -o -type d | sed s,^/my/dir/,, | tar -czf mydir.tgz --no-recursion -C /my/dir/ -T -
This Answer should work in
most situations. Notice however how the filenames are stored in the tar file as, for example,
./file1 rather than just file1 . I found that this caused problems
when using this method to manipulate tarballs used as package files in BuildRoot .
One solution is to use some Bash globs to list all files except for .. like
this:
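A reconstruction of the glob-based command being described (the archive path is an example):
tar -czf /tmp/archive.tar.gz * .[^.]* ..?*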
Now tar will return an error if there are no files matching ..?* or
.[^.]* , but it will still work. If the error is a problem (you are checking for
success in a script), this works:
I would propose the following Bash function (first argument is the path to the dir, second
argument is the basename of resulting archive):
function tar_dir_contents ()
{
local DIRPATH="$1"
local TARARCH="$2.tar.gz"
local ORGIFS="$IFS"
IFS=$'\n'
tar -C "$DIRPATH" -czf "$TARARCH" $( ls -a "$DIRPATH" | grep -v '\(^\.$\)\|\(^\.\.$\)' )
IFS="$ORGIFS"
}
You can run it in this way:
$ tar_dir_contents /path/to/some/dir my_archive
and it will generate the archive my_archive.tar.gz within current directory.
It works with hidden (.*) elements and with elements with spaces in their filename.
# tar all files within and deeper in a given directory
# with no prefixes ( neither <directory>/ nor ./ )
# parameters: <source directory> <target archive file>
function tar_all_in_dir {
{ cd "$1" && find -type f -print0; } \
| cut --zero-terminated --characters=3- \
| tar --create --file="$2" --directory="$1" --null --files-from=-
}
Safely handles filenames with spaces or other unusual characters. You can optionally add a
-name '*.sql' or similar filter to the find command to limit the files
included.
The article mixes apples and oranges and demonstrates complete ignorance of language
classification.
Two of the three top languages are scripting languages. This is a huge victory. But Python has
problems with efficiency (not that they matter everywhere) and is far from being an elegant
language. It entered the mainstream via its adoption at universities as the first programming
language, displacing Java (which I think might be a mistake -- I think teaching should start with
assembler and replicate the history of development: assembler -- compiled languages --
scripting languages).
Perl, which essentially heralded the era of scripting languages, is now losing its audience and
shrinking to its initial purpose -- the tool of Unix system administrators. But I think in such
surveys its use is underreported for obvious reasons -- it is not fashionable. But please note
that Fortran is still widely used.
Go is just a variant of a "better C" -- a statically typed, compiled language. Rust is an attempt
to improve C++. Both belong to the class of compiled languages. So compiled languages still hold
their own and are an important part of the ecosystem. See also How Rust Compares to Other Programming Languages -
The New Stack
The report surveyed about 7,000 developers worldwide, and
revealed Python
is the most studied programming language, the most loved language , and the third top
primary programming language developers are using... The top use cases developers are using
Python for include data analysis, web development, machine learning and writing automation
scripts, according to the JetBrains report
. More developers are also beginning to move over to Python 3, with 9 out of 10 developers
using the current version.
The JetBrains report also found while Go is still a young language, it is the most
promising programming language. "Go started out with a share of 8% in 2017 and now it has
reached 18%. In addition, the biggest number of developers (13%) chose Go as a language they
would like to adopt or migrate to," the report stated...
Seventy-three percent of JavaScript developers use TypeScript, which is up from 17
percent last year. Seventy-one percent of Kotlin developers use Kotlin for work. Java 8 is
still the most popular programming language, but developers are beginning to migrate to Java 10
and 11.
JetBrains (which designed Kotlin in 2011) also said that 60% of their survey's respondents
identified themselves
as professional web back-end developers (while 46% said they did web front-end, and 23%
developed mobile applications). 41% said they hadn't contributed to open source projects "but I
would like to," while 21% said they contributed "several times a year."
"16% of developers don't have any tests in their projects. Among fully-employed senior
developers though, that statistic is just 8%. Like last year, about 30% of developers still
don't have unit tests in their projects." Other interesting statistics: 52% say they code in
their dreams. 57% expect AI to replace developers "partially" in the future. "83% prefer the
Dark theme for their editor or IDE. This represents a growth of 6 percentage points since last
year for each environment. 47% take public transit to work.
And 97% of respondents using Rust "said they have been using Rust for less than a year. With
only 14% using it for work, it's much more popular as a language for personal/side projects."
And more than 90% of the Rust developers who responded worked with codebases with less than 300
files.
"... There is effectively no CPU time spent tarring, so it wouldn't help much. The tar format is just a copy of the input file with header blocks in between files. ..."
"... You can also use the tar flag "--use-compress-program=" to tell tar what compression program to use. ..."
I normally compress using tar zcvf and decompress using tar zxvf
(using gzip due to habit).
I've recently gotten a quad core CPU with hyperthreading, so I have 8 logical cores, and I
notice that many of the cores are unused during compression/decompression.
Is there any way I can utilize the unused cores to make it faster?
The solution proposed by Xiong Chiamiov above works beautifully. I had just backed up my
laptop with .tar.bz2 and it took 132 minutes using only one cpu thread. Then I compiled and
installed tar from source: gnu.org/software/tar I included the options mentioned
in the configure step: ./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip I
ran the backup again and it took only 32 minutes. That's better than 4X improvement! I
watched the system monitor and it kept all 4 cpus (8 threads) flatlined at 100% the whole
time. THAT is the best solution. – Warren Severin
Nov 13 '17 at 4:37
You can use pigz instead of gzip, which
does gzip compression on multiple cores. Instead of using the -z option, you would pipe it
through pigz:
tar cf - paths-to-archive | pigz > archive.tar.gz
By default, pigz uses the number of available cores, or eight if it could not query that.
You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can
request better compression with -9. E.g.
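For example (a sketch; adjust the thread count to your machine):
tar cf - paths-to-archive | pigz -9 -p 32 > archive.tar.gz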
pigz does use multiple cores for decompression, but only with limited improvement over a
single core. The deflate format does not lend itself to parallel decompression.
The
decompression portion must be done serially. The other cores for pigz decompression are used
for reading, writing, and calculating the CRC. When compressing on the other hand, pigz gets
close to a factor of n improvement with n cores.
There is effectively no CPU time spent tarring, so it wouldn't help much. The tar format is just a copy of the input file
with header blocks in between files.
Unfortunately by doing so the concurrent feature of pigz is lost. You can see for yourself by
executing that command and monitoring the load on each of the cores. – Valerio
Schiavoni
Aug 5 '14 at 22:38
I prefer tar cf - dir_to_zip | pv | pigz > tar.file . pv helps me estimate progress; you
can skip it. But still, it's easier to write and remember. – Offenso,
Jan 11 '17 at 17:26
-I, --use-compress-program PROG
filter through PROG (must accept -d)
You can use multithread version of archiver or compressor utility.
Most popular multithread archivers are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:
$ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
$ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive
The archiver must accept -d. If your replacement utility doesn't have this parameter and/or you need
to specify additional parameters, then use pipes (add parameters if necessary):
$ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.bz2
$ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz
Input and output of singlethread and multithread are compatible. You can compress using
multithread version and decompress using singlethread version and vice versa.
p7zip
For p7zip for compression you need a small shell script like the following:
#!/bin/sh
case $1 in
-d) 7za -txz -si -so e;;
*) 7za -txz -si -so a .;;
esac 2>/dev/null
Save it as 7zhelper.sh. Here the example of usage:
$ tar -I 7zhelper.sh -cf OUTPUT_FILE.tar.7z paths_to_archive
$ tar -I 7zhelper.sh -xf OUTPUT_FILE.tar.7z
xz
Regarding multithreaded XZ support. If you are running version 5.2.0 or above of XZ Utils,
you can utilize multiple cores for compression by setting -T or
--threads to an appropriate value via the environmental variable XZ_DEFAULTS
(e.g. XZ_DEFAULTS="-T 0" ).
This is a fragment of man for 5.1.0alpha version:
Multithreaded compression and decompression are not implemented yet, so this option has
no effect for now.
However this will not work for decompression of files that haven't also been compressed
with threading enabled. From man for version 5.2.2:
Threaded decompression hasn't been implemented yet. It will only work on files that
contain multiple blocks with size information in block headers. All files compressed in
multi-threaded mode meet this condition, but files compressed in single-threaded mode don't
even if --block-size=size is used.
Recompiling with replacement
If you build tar from sources, then you can recompile with parameters
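The parameters in question are the ones mentioned earlier in this thread, for example:
./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip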
After recompiling tar with these options you can check the output of tar's help:
$ tar --help | grep "lbzip2\|plzip\|pigz"
-j, --bzip2 filter the archive through lbzip2
--lzip filter the archive through plzip
-z, --gzip, --gunzip, --ungzip filter the archive through pigz
I just found pbzip2 and
mpibzip2 . mpibzip2 looks very
promising for clusters or if you have a laptop and a multicore desktop computer for instance.
– user1985657
Apr 28 '15 at 20:57
find /my/path/ -type f -name "*.sql" -o -name "*.log" -exec
This command will look for the files you want to archive, in this case
/my/path/*.sql and /my/path/*.log . Add as many -o -name
"pattern" as you want.
-exec will execute the next command using the results of find :
tar
Step 2: tar
tar -P --transform='s@/my/path/@@g' -cf - {} +
--transform is a simple string replacement parameter. It will strip the path
of the files from the archive so the tarball's root becomes the current directory when
extracting. Note that you can't use -C option to change directory as you'll lose
benefits of find : all files of the directory would be included.
-P tells tar to use absolute paths, so it doesn't trigger the
warning "Removing leading `/' from member names". Leading '/' with be removed by
--transform anyway.
-cf - tells tar to use the tarball name we'll specify later
{} + uses every file that find found previously
Step 3:
pigz
pigz -9 -p 4
Use as many parameters as you want. In this case -9 is the compression level
and -p 4 is the number of cores dedicated to compression. If you run this on a
heavy loaded webserver, you probably don't want to use all available cores.
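Putting the three steps together, the assembled pipeline looks roughly like this (a sketch: the \( ... \) grouping around the -name tests is added here for correct precedence, and the output file name is an example):
find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) \
    -exec tar -P --transform='s@/my/path/@@g' -cf - {} + | pigz -9 -p 4 > archive.tar.gz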
Is there a way to use one computer to send keystrokes to another by usb ?
What i'm looking to do is to capture the usb signal used by a keyboard (with USBTrace for
example) and use it with PC-1 to send it to PC-2. So that PC-2 recognize it as a regular
keyboard input.
What you essentially need is a USB port on PC-1 that will act as a
USB device for PC-2.
That is not possible for the vast majority of PC systems because USB is an asymmetric bus,
with a host/device (or master/slave, if you wish) architecture. USB controllers (and their
ports) on most PCs can only work in host mode and cannot simulate a device.
That is the reason that you cannot network computers through USB without a special cable
with specialized electronics.
The only exception is if you somehow have a PC that supports the USB On-The-Go standard that allows for a USB
port to act in both host and device mode. USB-OTG devices do exist, but they are usually
embedded devices (smartphones etc). I don't know if there is a way to add a USB-OTG port to a
commodity PC.
EDIT:
If you do not need a keyboard before the OS on PC-2 boots, you might be able to use a pair
of USB Bluetooth dongles - one on each PC. You'd have to use specialised software on PC-1,
but it is definitely possible - I've already seen a
possible implementation on Linux , and I am reasonably certain that there must be one for
Windows. You will also need Bluetooth HID drivers on PC-2, if they are not already
installed.
On a different note, have you considered a purely software/network solution such as
TightVNC ?
This uses a network connection from your computer to the raspi which is connected to a
teensy (usb developer board) to send the key strokes.
This solution is not an out-of-the-box product. The required skill is similar to
programming some other devices like arduino. But it's a complete and working setup.
The cheapest options are commercial microcontrollers (e.g. the Arduino platform, PIC, etc.) or
ready-built USB keyboard controllers (e.g. I-PAC, arcade controllers, etc.).
Connect the two computers, write your own program to send signals to your (USB <-> RS232)
unit, and then you can control the other computer with the help of TWedge.
The above-mentioned https://github.com/Flowm/etherkey is one way. The
keyboard is emulated from an rPi, but the principle can be used from PC to PC (or Mac to
whatever). The core answer to your question is to use an OTG-capable chip, and then you
control this chip via a USB-serial adapter.
The generic answer is: you need an OTG capable, or slave capable device: Arduino, Teensy,
Pi 0 (either from Rapberry or Orange brands, both work; only the ZERO models are OTG
capable), or, an rPi-A with heavy customisation (since it does not include USB hub, it can
theoretically be converted into a slave; never found any public tutorial to do it), or any
smartphone (Samsung, Nokia, HTC, Oukitel ... most smartphones are OTG capable). If you go for
a Pi or a phone, then, you want to dig around USB Gadget. Cheaper solutions (Arduino/Teensy)
need custom firmware.
I have three RHEL 6.6 servers. One has a yum repository that I know works. The other two servers I will refer to as "yum clients."
These two are configured to use the same yum repository (the first server described). When I do yum install httpd
on each of these two yum client servers, I get two different results. One server prepares for the installation as normal and prompts
me with a y/n prompt. The second server says
No package httpd available.
The /etc/yum.conf files on each of the two servers is identical. The /etc/yum.repos.d/ directories have the same .repo files.
Why does one yum client not see the httpd package? I use httpd as an example. One yum client cannot install any package. The other
yum client can install anything. Neither have access to the Internet or different servers the other one does not have access to.
If /etc/yum.conf is identical on all servers, and that package is not listed there in exclude line, check if the repo is enabled
on all the servers.
Do
grep enabled /etc/yum.repos.d/filename.repo
and see if it is set to 0 or 1.
The value of enabled needs to be set to 1 for yum to use that repo.
If the repo is not enabled, you can edit the repo file and change enabled to 1, or try to run yum
with the enablerepo switch to enable it for that operation.
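For example (the repo id and package name are placeholders):
yum --enablerepo=myrepo install httpd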
I would like Perl to write to STDERR only if STDERR is not going to the same place as STDOUT. For example, if both
STDOUT and STDERR are directed to the terminal, then I don't want the STDERR message to be
printed.
Consider the following example (outerr.pl):
#!/usr/bin/perl
use strict;
use warnings;
print STDOUT "Hello standard output!\n";
print STDERR "Hello standard error\n" if ($someMagicalFlag);
exit 0
Now consider this (this is what I would like to achieve):
bash $ outerr.pl
Hello standard output!
However, if I redirect out to a file, I'd like to get:
bash $ outerr.pl > /dev/null
Hello standard error
and similarly the other way round:
bash $ outerr.pl 2> /dev/null
Hello standard output!
If I re-direct both out/err to the same file, then only stdout should be
displayed:
my @stat_err = stat STDERR;
my @stat_out = stat STDOUT;
my $stderr_is_not_stdout = (($stat_err[0] != $stat_out[0]) ||
                            ($stat_err[1] != $stat_out[1]));
But that won't work on Windows, which doesn't have real inode numbers. It gives both false
positives (thinks they're different when they aren't) and false negatives (thinks they're the
same when they aren't).
EDIT: Solutions for the case that both STDERR and STDOUT are regular files:
Tom Christiansen suggested to stat and compare the dev and ino fields. This will work on
UNIX, but, as @cjm pointed out, not on Windows.
If you can guarantee that no other program will write to the file, you could do the
following on both Windows and UNIX:
1. Check the positions the file descriptors for STDOUT and STDERR are at; if they are not
equal, you redirected one of them with >> to a nonempty file.
2. Otherwise, write 42 bytes to file descriptor 2.
3. Seek to the end of file descriptor 1. If its position is 42 more than before, chances are high
that both are redirected to the same file. If it is unchanged, the files are different. If it
changed, but not by 42, someone else is writing there and all bets are off (but then
you're not on Windows, so the stat method will work).
Some additional code was necessary to successfully migrate the SQLite database (it handles
one-line CREATE TABLE statements and foreign keys, and fixes a bug in the original program that
converted empty fields '' to \').
Here's a pretty literal translation with just the minimum of obvious style changes (putting
all code into a function, using string rather than re operations where possible).
import re, fileinput

def main():
    for line in fileinput.input():
        process = False
        for nope in ('BEGIN TRANSACTION', 'COMMIT',
                     'sqlite_sequence', 'CREATE UNIQUE INDEX'):
            if nope in line: break
        else:
            process = True
        if not process: continue
        m = re.search('CREATE TABLE "([a-z_]*)"(.*)', line)
        if m:
            name, sub = m.groups()
            line = '''DROP TABLE IF EXISTS %(name)s;
CREATE TABLE IF NOT EXISTS %(name)s%(sub)s
'''
            line = line % dict(name=name, sub=sub)
        else:
            m = re.search('INSERT INTO "([a-z_]*)"(.*)', line)
            if m:
                line = 'INSERT INTO %s%s\n' % m.groups()
                line = line.replace('"', r'\"')
                line = line.replace('"', "'")
        line = re.sub(r"([^'])'t'(.)", r"\1THIS_IS_TRUE\2", line)
        line = line.replace('THIS_IS_TRUE', '1')
        line = re.sub(r"([^'])'f'(.)", r"\1THIS_IS_FALSE\2", line)
        line = line.replace('THIS_IS_FALSE', '0')
        line = line.replace('AUTOINCREMENT', 'AUTO_INCREMENT')
        print line,

main()
In the lines using regular expression substitution, the backreferences to the matched groups
must be double-escaped, OR the replacement string must be prefixed with r to mark it as a
raw string:
line = re.sub(r"([^'])'t'(.)", "\\1THIS_IS_TRUE\\2", line)
or
line = re.sub(r"([^'])'f'(.)", r"\1THIS_IS_FALSE\2", line)
Also, this line should be added before print:
line = line.replace('AUTOINCREMENT', 'AUTO_INCREMENT')
Last, the column names in CREATE statements should be in backticks for MySQL. Add this right
after name, sub = m.groups():
sub = sub.replace('"','`')
Here's the complete script with modifications:
import re, fileinput

def main():
    for line in fileinput.input():
        process = False
        for nope in ('BEGIN TRANSACTION', 'COMMIT',
                     'sqlite_sequence', 'CREATE UNIQUE INDEX'):
            if nope in line: break
        else:
            process = True
        if not process: continue
        m = re.search('CREATE TABLE "([a-z_]*)"(.*)', line)
        if m:
            name, sub = m.groups()
            sub = sub.replace('"', '`')
            line = '''DROP TABLE IF EXISTS %(name)s;
CREATE TABLE IF NOT EXISTS %(name)s%(sub)s
'''
            line = line % dict(name=name, sub=sub)
        else:
            m = re.search('INSERT INTO "([a-z_]*)"(.*)', line)
            if m:
                line = 'INSERT INTO %s%s\n' % m.groups()
                line = line.replace('"', r'\"')
                line = line.replace('"', "'")
        line = re.sub(r"([^'])'t'(.)", "\\1THIS_IS_TRUE\\2", line)
        line = line.replace('THIS_IS_TRUE', '1')
        line = re.sub(r"([^'])'f'(.)", "\\1THIS_IS_FALSE\\2", line)
        line = line.replace('THIS_IS_FALSE', '0')
        line = line.replace('AUTOINCREMENT', 'AUTO_INCREMENT')
        if re.search('^CREATE INDEX', line):
            line = line.replace('"', '`')
        print line,

main()
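For completeness, a typical way to drive a converter like this is to feed it a SQLite dump and pipe the result into MySQL. This is only a usage sketch; the script name sqlite3-to-mysql.py and the database names are placeholders:
sqlite3 mydb.sqlite3 .dump > dump.sql            # dump the SQLite database as SQL text
python sqlite3-to-mysql.py dump.sql > mysql.sql  # rewrite the dump with the script above (Python 2)
mysql -u someuser -p somedb < mysql.sql          # load the converted dump into MySQL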
None of the scripts on this page can deal with this simple sqlite3 dump:
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE Filename (
FilenameId INTEGER,
Name TEXT DEFAULT '',
PRIMARY KEY(FilenameId)
);
INSERT INTO "Filename" VALUES(1,'');
INSERT INTO "Filename" VALUES(2,'bigfile1');
INSERT INTO "Filename" VALUES(3,'%gconf-tree.xml');
None were able to reformat "table_name" into MySQL's proper `table_name`. Some mangled the
empty string value.
I am not sure what is so hard to understand about this that it requires a snide remark as in
your comment above. Note that <> is called the diamond operator.
s/// is the substitution operator and // is the match operator
m// .
The real issue is: do you actually know how to migrate the database? What is presented is merely a
search-and-replace loop.
Shortest? The tilde signifies a regex in perl. "import re" and go from there. The only key
differences are that you'll be using \1 and \2 instead of $1 and $2 when you assign values,
and you'll be using %s for when you're replacing regexp matches inside strings.
Call it from the command line: perl_regex.pl input.txt
Explanation of the Perl-style regex:
s/            # start search-and-replace regexp
  ^           # start at the beginning of this line
  (           # save the matched characters until ')' in $1
    .*?;      # go forward until finding the first semicolon
    .*?       # go forward until finding... (to be continued below)
  )
  (           # save the matched characters until ')' in $2
    \w        # ... the next alphanumeric character.
  )
/             # continue with the replace part
  $1;$2       # write all characters found above, but insert a ; before $2
/             # finish the search-and-replace regexp.
Could anyone tell me how to get the same result in Python? In particular, I couldn't find an
equivalent for the $1 and $2 variables.
import re
import sys

for line in sys.stdin:                               # explicitly iterate standard input line by line
    # `line` contains the trailing newline!
    line = re.sub(r'^(.*?;.*?)(\w)', r'\1;\2', line)
    # print(line)                                    # would print an extra trailing newline
    sys.stdout.write(line)                           # print the replaced string back
The equivalent of s/pattern/replace/ in Python regexes is the re.sub(pattern,
replace, string) function, or re.compile(pattern).sub(replace, string). In your case, you
would do it like so:
_re_pattern = re.compile(r"^(.*?;.*?)(\w)")
result = _re_pattern.sub(r"\1;\2", line)
Note that $1 becomes \1. As in Perl, you need to iterate over
your lines in whichever way you prefer (open, fileinput, splitlines, ...).
But none of them seems to fit an image that has to be shown at its real size, without
specifying width or height.
So, in that case, what is the equivalent in AMP?
As you say you have multiple images, it's better to use layout="responsive";
with that, you will at least make your images responsive.
Now regarding the width and height: they are a must.
If you read the purpose of AMP, one of its goals is to make pages free of jumping/flickering
content, which happens when no width is specified for images.
By specifying the width, the (mobile) browser can calculate the precise space,
reserve it for that image, and show the content after that. That way there won't be any
flickering of the content as the page and images load.
Regarding the rewriting of your HTML, one tip I can offer is to write a small
utility in PHP, Python or Node.js JavaScript that reads the source images,
calculates their dimensions, and replaces your IMG tags.
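As a sketch of that tip, ImageMagick's identify command can report the dimensions you need; the images/ directory and the .jpg pattern are just examples, and ImageMagick is assumed to be installed:
# print file name, width and height for every JPEG under ./images
for img in images/*.jpg; do
    identify -format '%f %w %h\n' "$img"
done
You would then substitute those numbers into the width and height attributes of your amp-img tags, by hand or with a small script.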
Hope this helps and wish you good luck for your AMP powered site :-)
I use Tilda (drop-down terminal) on Ubuntu as my "command central" - pretty much the way
others might use GNOME Do, Quicksilver or Launchy.
However, I'm struggling with how to completely detach a process (e.g. Firefox) from the
terminal it has been launched from - i.e. prevent that such a (non-)child process
(a) is terminated when closing the originating terminal, or
(b) "pollutes" the originating terminal via STDOUT/STDERR.
For example, in order to start Vim in a "proper" terminal window, I have tried a simple
script like the following:
exec gnome-terminal -e "vim $@" &> /dev/null &
However, that still causes pollution (also, passing a file name doesn't seem to work).
First of all; once you've started a process, you can background it by first stopping it (hit
Ctrl - Z ) and then typing bg to let it resume in the
background. It's now a "job", and its stdout / stderr /
stdin are still connected to your terminal.
You can start a process as backgrounded immediately by appending a "&" to the end of
it:
firefox &
To run it in the background silenced, use this:
firefox </dev/null &>/dev/null &
Some additional info:
nohup is a program you can use to run your application with such that its
stdout/stderr can be sent to a file instead and such that closing the parent script won't
SIGHUP the child. However, you need to have had the foresight to have used it before you
started the application. Because of the way nohup works, you can't just apply
it to a running process .
disown is a bash builtin that removes a shell job from the shell's job list.
What this basically means is that you can't use fg , bg on it
anymore, but more importantly, when you close your shell it won't hang or send a
SIGHUP to that child anymore. Unlike nohup , disown is
used after the process has been launched and backgrounded.
What you can't do, is change the stdout/stderr/stdin of a process after having
launched it. At least not from the shell. If you launch your process and tell it that its
stdout is your terminal (which is what you do by default), then that process is configured to
output to your terminal. Your shell has no business with the processes' FD setup, that's
purely something the process itself manages. The process itself can decide whether to close
its stdout/stderr/stdin or not, but you can't use your shell to force it to do so.
To manage a background process' output, you have plenty of options from scripts, "nohup"
probably being the first to come to mind. But for interactive processes you start but forgot
to silence ( firefox < /dev/null &>/dev/null & ) you can't do
much, really.
I recommend you get GNU screen . With screen you can just close your running
shell when the process' output becomes a bother and open a new one ( ^Ac ).
Oh, and by the way, don't use " $@ " where you're using it.
$@ means, $1 , $2 , $3 ..., which
would turn your command into:
gnome-terminal -e "vim $1" "$2" "$3" ...
That's probably not what you want because -e only takes one argument. Use $1
to show that your script can only handle one argument.
It's really difficult to get multiple arguments working properly in the scenario that you
gave (with the gnome-terminal -e ) because -e takes only one
argument, which is a shell command string. You'd have to encode your arguments into one. The
best and most robust, but rather kludgy, way is like so:
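(The command that originally followed here did not survive; a common way to encode several arguments into the single string that -e expects is bash's printf %q quoting. A sketch, not necessarily the author's exact command:)
exec gnome-terminal -e "vim $(printf '%q ' "$@")" </dev/null &>/dev/null &
printf '%q ' shell-quotes each argument, so the resulting string splits back into the original words when the command string is re-parsed inside the new terminal.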
Reading these answers, I was under the initial impression that issuing nohup
<command> & would be sufficient. Running zsh in gnome-terminal, I found that
nohup <command> & did not prevent my shell from killing child
processes on exit. Although nohup is useful, especially with non-interactive
shells, it only guarantees this behavior if the child process does not reset its handler for
the SIGHUP signal.
In my case, nohup should have prevented hangup signals from reaching the
application, but the child application (VMWare Player in this case) was resetting its
SIGHUP handler. As a result when the terminal emulator exits, it could still
kill your subprocesses. This can only be resolved, to my knowledge, by ensuring that the
process is removed from the shell's jobs table. If nohup is overridden with a
shell builtin, as is sometimes the case, this may be sufficient, however, in the event that
it is not...
disown is a shell builtin in bash , zsh , and
ksh93 ,
<command> &
disown
or
<command> & disown
if you prefer one-liners. This has the generally desirable effect of removing the
subprocess from the jobs table. This allows you to exit the terminal emulator without
accidentally signaling the child process at all. No matter what the SIGHUP
handler looks like, this should not kill your child process.
After the disown, the process is still a child of your terminal emulator (play with
pstree if you want to watch this in action), but after the terminal emulator
exits, you should see it attached to the init process. In other words, everything is as it
should be, and as you presumably want it to be.
What to do if your shell does not support disown ? I'd strongly advocate
switching to one that does, but in the absence of that option, you have a few choices.
1. screen and tmux can solve this problem, but they are much
heavier-weight solutions, and I dislike having to run them for such a simple task. They are
much more suitable for situations in which you want to maintain a tty, typically on a
remote machine.
2. For many users, it may be desirable to see if your shell supports a capability like
zsh's setopt nohup . This can be used to specify that SIGHUP
should not be sent to the jobs in the jobs table when the shell exits. You can either apply
this just before exiting the shell, or add it to shell configuration like
~/.zshrc if you always want it on.
3. Find a way to edit the jobs table. I couldn't find a way to do this in
tcsh or csh , which is somewhat disturbing.
4. Write a small C program to fork off and exec() . This is a very poor
solution, but the source should only consist of a couple dozen lines. You can then pass
commands as commandline arguments to the C program, and thus avoid a process-specific entry
in the jobs table.
I've been using number 2 for a very long time, but number 3 works just as well. Also,
disown has a 'nohup' flag of '-h', can disown all processes with '-a', and can disown all
running processes with '-ar'.
Silencing is accomplished by '$COMMAND &>/dev/null'.
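Putting the pieces from this thread together, a minimal sketch (firefox is just the running example used above):
firefox </dev/null &>/dev/null &   # start in the background with all standard streams silenced
disown -h                          # keep the job listed but mark it so it won't receive SIGHUP
# or: disown                       # drop the job from the table entirely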
in tcsh (and maybe in other shells as well), you can use parentheses to detach the process.
Compare this:
> jobs # shows nothing
> firefox &
> jobs
[1] + Running firefox
To this:
> jobs # shows nothing
> (firefox &)
> jobs # still shows nothing
>
This removes firefox from the jobs listing, but it is still tied to the terminal; if you
logged in to this node via 'ssh', trying to log out will still hang the ssh process.
To dissociate a command from the controlling tty, run it through a sub-shell, e.g.:
(command) &
When you exit, the terminal is closed but the process is still alive.
I run 16.04, and systemd now kills tmux when the user disconnects (see the
summary of
the change).
Is there a way to run tmux or screen (or any similar program)
with systemd 230? I read all the heated discussion about the pros and cons of the
behaviour, but no solution was suggested.
Based on @Rinzwind's answer and inspired by a unit
description the best I could find is to use TaaS (Tmux as a Service) - a generic detached
instance of tmux one reattaches to.
You need to set the Type of the service to forking , as explained
here .
Let's assume the service you want to run in screen is called
minecraft . Then you would open minecraft.service in a text editor
and add or edit the entry Type=forking under the [Service] section.
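A minimal sketch of such a unit file (the paths, user, and session name are assumptions; adjust them to your setup):
[Unit]
Description=Minecraft server in a detached screen session
After=network.target

[Service]
Type=forking
User=minecraft
ExecStart=/usr/bin/screen -dmS minecraft /opt/minecraft/run_server.sh
ExecStop=/usr/bin/screen -S minecraft -X quit

[Install]
WantedBy=multi-user.target
Type=forking matches what screen -dm does: the command forks, detaches, and exits, leaving the detached session running.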
Have a look at reptyr ,
which does exactly that. The github page has all the information.
reptyr - A tool for "re-ptying" programs.
reptyr is a utility for taking an existing running program and attaching it to a new
terminal. Started a long-running process over ssh, but have to leave and don't want to
interrupt it? Just start a screen, use reptyr to grab it, and then kill the ssh session
and head on home.
USAGE
reptyr PID
"reptyr PID" will grab the process with id PID and attach it to your current
terminal.
After attaching, the process will take input from and write output to the new
terminal, including ^C and ^Z. (Unfortunately, if you background it, you will still have
to run "bg" or "fg" in the old terminal. This is likely impossible to fix in a reasonable
way without patching your shell.)
EDIT: As Stephane Gimenez said, it's not that simple. It only allows you to print to a
different terminal.
You can try to write to this process using /proc. It should be located in
/proc/PID/fd/0, so a simple:
echo "hello" > /proc/PID/fd/0
should do it. I have not tried it, but it should work, as long as this process still has a
valid stdin file descriptor. You can check it with ls -l on /proc/PID/fd/.
if it's a link to /dev/null => it's closed
if it's a link to /dev/pts/X or a socket => it's open
See nohup for more
details about how to keep processes running.
Just ending the command line with & will not completely detach the process,
it will just run it in the background. (With zsh you can use &!
to actually detach it; otherwise you have to disown it later).
When a process runs in the background, it won't receive input from its controlling
terminal anymore. But you can send it back into the foreground with fg and then
it will read input again.
Otherwise, it's not possible to externally change its filedescriptors (including stdin) or
to reattach a lost controlling terminal unless you use debugging tools (see Ansgar's answer , or have a
look at the retty command).
Since a few days I'm successfully running the new Minecraft Bedrock Edition dedicated
server on my Ubuntu 18.04 LTS home server. Because it should be available 24/7 and
automatically startup after boot I created a systemd service for a detached tmux session:
Everything works as expected but there's one tiny thing that keeps bugging me:
How can I prevent tmux from terminating its whole session when I press
Ctrl+C? I just want to terminate the Minecraft server process itself instead of
the whole tmux session. When starting the server from the command line in a manually
created tmux session this does work (session stays alive) but not when the session was
brought up by systemd .
When starting the server from the command line in a manually created tmux session this
does work (session stays alive) but not when the session was brought up by systemd
.
The difference between these situations is actually unrelated to systemd. In one case,
you're starting the server from a shell within the tmux session, and when the server
terminates, control returns to the shell. In the other case, you're starting the server
directly within the tmux session, and when it terminates there's no shell to return to, so
the tmux session also dies.
tmux has an option to keep the session alive after the process inside it dies (look for
remain-on-exit in the manpage), but that's probably not what you want: you want
to be able to return to an interactive shell, to restart the server, investigate why it died,
or perform maintenance tasks, for example. So it's probably better to change your command to
this:
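(The unit line that originally appeared here is missing; the idea described below amounts to something like the following sketch, where the server path and session name are assumptions:)
ExecStart=/usr/bin/tmux new-session -d -s minecraft '/opt/minecraft/bedrock_server; exec bash'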
That is, first run the server, and then, after it terminates, replace the process (the
shell which tmux implicitly spawns to run the command, but which will then exit) with
another, interactive shell. (For some other ways to get an interactive shell after the
command exits, see e. g. this question – but note that the
<(echo commands) syntax suggested in the top answer is not available in
systemd unit files.)
That change essentially means that /usr should be on the root partition, not on a separate partition, which with current
hard drive sizes is a reasonable requirement.
Notable quotes:
"... On Linux /bin and /usr/bin are still separate because it is common to have /usr on a separate partition (although this configuration breaks in subtle ways, sometimes). In /bin is all the commands that you will need if you only have / mounted. ..."
What? No, /bin is not a symlink to /usr/bin on any FHS-compliant
system. Note that there are still popular Unixes and Linuxes that ignore this - for example,
/bin and /sbin are symlinked to /usr/bin on Arch Linux
(the reasoning being that you don't need /bin for rescue/single-user-mode, since
you'd just boot a live CD).
Per the FHS, /bin "contains commands that may be used by both the system administrator and by users, but which
are required when no other filesystems are mounted (e.g. in single user mode). It may also
contain commands which are used indirectly by scripts", while /usr/bin
"is the primary directory of executable commands on the system".
essentially, /bin contains executables which are required by the system for
emergency repairs, booting, and single user mode. /usr/bin contains any binaries
that aren't required.
I will note that while they can be on separate disks/partitions, /bin must be on
the same disk as /. /usr/bin can be on another disk - although
note that this configuration has been kind of broken for a while (this is why e.g. systemd
warns about this configuration on boot).
For full correctness, some unices may ignore the FHS, as I believe it is only a Linux
standard; I'm not aware that it has yet been included in SUS, POSIX or any other UNIX
standard, though it should be, IMHO. It is a part of the
LSB standard, though.
/sbin - Binaries needed for booting, low-level system repair, or maintenance
(run level 1 or S)
/bin - Binaries needed for normal/standard system functioning at any run
level.
/usr/bin - Application/distribution binaries meant to be accessed by locally
logged in users
/usr/sbin - Application/distribution binaries that support or configure stuff
in /sbin.
/usr/share/bin - Application/distribution binaries or scripts meant to be
accessed via the web, i.e. Apache web applications
*local* - Binaries not part of a distribution; locally compiled or manually
installed. There's usually never a /local/bin but always a
/usr/local/bin and /usr/local/share/bin .
Recently some Linux distributions are merging /bin into /usr/bin
and relatedly /lib into /usr/lib . Sometimes also
(/usr)/sbin to /usr/bin (Arch Linux). So /usr is
expected to be available at the same time as / .
The distinction between the two hierarchies is taken to be unnecessary complexity now. The
idea was once having only /bin available at boot, but having an initial ramdisk makes this
obsolete.
I know of Fedora Linux (2011) and Arch Linux (2012) going this way and Solaris is doing
this for a long time (> 15 years).
On Linux /bin and /usr/bin are still separate because it is common
to have /usr on a separate partition (although this configuration breaks in
subtle ways, sometimes). In /bin is all the commands that you will need if you
only have / mounted.
On Solaris and Arch Linux (and probably others) /bin is a symlink to
/usr/bin . Arch also has /sbin and /usr/sbin symlinked
to /usr/bin .
Of particular note, the statement that /bin is for "system administrator"
commands and /usr/bin is for user commands is not true (unless you think
that bash and ls are for admins only, in which case you have a lot
to learn). Administrator commands are in /sbin and /usr/sbin .
The rm='rm -i' alias is a horror because after a while using it, you will
expect rm to prompt you by default before removing files. Of course, one day
you'll run it with an account that doesn't have that alias set, and before you understand what's going
on, it will be too late.
... ... ...
If you want safe aliases but don't want to risk getting used to the commands working
differently on your system than on others, you can disable rm like this:
alias rm='echo "rm is disabled, use remove or trash or /bin/rm instead."'
I just want to create an RPM file to distribute my Linux binary "foobar", with only a couple
of dependencies. It has a config file, /etc/foobar.conf and should be installed in
/usr/bin/foobar.
Unfortunately the documentation for RPM is 27 chapters
long and I really don't have a day to sit down and read this, because I am also busy making
.deb and EXE installers for other platforms.
What is the absolute minimum I have to do to create an RPM? Assume the foobar binary and
foobar.conf are in the current working directory.
I often build binary RPMs to package proprietary apps - even monsters like WebSphere - on Linux,
so my experience may be useful to you, although it would be better to build a true (from-source) RPM
if you can. But I digress.
The basic steps for packaging your (binary) program are as follows - assuming
the program is toybinprog version 1.0, with a config file to be installed as
/etc/toybinprog/toybinprog.conf and a binary to be installed as /usr/bin/toybinprog:
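(The steps that originally followed are not included here, so the following is only a rough sketch of the usual procedure; the package name, version and paths are taken from the description above, everything else is an assumption:)
# 1. create the rpmbuild tree and drop the files into SOURCES
mkdir -p ~/rpmbuild/{SPECS,SOURCES,BUILD,RPMS,SRPMS}
cp toybinprog toybinprog.conf ~/rpmbuild/SOURCES/

# 2. write a minimal spec file for a prebuilt binary
cat > ~/rpmbuild/SPECS/toybinprog.spec <<'EOF'
%global debug_package %{nil}
Name:     toybinprog
Version:  1.0
Release:  1%{?dist}
Summary:  Toy binary program packaged as-is
License:  Proprietary
Source0:  toybinprog
Source1:  toybinprog.conf

%description
Prebuilt toybinprog binary with its configuration file.

%install
mkdir -p %{buildroot}/usr/bin %{buildroot}/etc/toybinprog
install -m 0755 %{SOURCE0} %{buildroot}/usr/bin/toybinprog
install -m 0644 %{SOURCE1} %{buildroot}/etc/toybinprog/toybinprog.conf

%files
/usr/bin/toybinprog
%config(noreplace) /etc/toybinprog/toybinprog.conf
EOF

# 3. build the binary RPM (it lands under ~/rpmbuild/RPMS/)
rpmbuild -bb ~/rpmbuild/SPECS/toybinprog.spec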
I'm new to Perl and I'm writing a program where I want to force the user to enter a word. If
the user enters an empty string then the program should exit.
This is what I have so far:
print "Enter a word to look up: ";
chomp ($usrword = <STDIN>);
print "Enter a word to look up: ";
my $userword = <STDIN>; # I moved chomp to a new line to make it more readable
chomp $userword; # Get rid of newline character at the end
exit 0 if ($userword eq ""); # If empty string, exit.
File output is buffered by default. Since the prompt is so short, it is still sitting in
the output buffer. You can disable buffering on STDOUT by adding this line of code before
printing...
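(The code line itself did not survive here; presumably it is Perl's autoflush flag, the standard fix for this:)
$| = 1;   # flush STDOUT after every print, so the prompt appears immediately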
I am trying to back up my file server to a remote file server using rsync. Rsync does not successfully resume when a transfer is
interrupted. I used the --partial option, but rsync doesn't find the file it already started, because it renames it to a temporary
file, and when resumed it creates a new file and starts from the beginning.
When this command is run, a backup file named OldDisk.dmg from my local machine gets created on the remote machine as something
like .OldDisk.dmg.SjDndj23 .
Now when the internet connection gets interrupted and I have to resume the transfer, I have to find where rsync left off by
finding the temp file like .OldDisk.dmg.SjDndj23 and rename it to OldDisk.dmg so that it sees there already exists a file that
it can resume.
How do I fix this so I don't have to manually intervene each time?
TL;DR : Use --timeout=X (X in seconds) to change the default rsync server timeout, not --inplace .
The issue is the rsync server processes (of which there are two, see rsync --server ... in ps output
on the receiver) continue running, to wait for the rsync client to send data.
If the rsync server processes do not receive data for a sufficient time, they will indeed time out, self-terminate and clean up
by moving the temporary file to its "proper" name (e.g., no temporary suffix). You'll then be able to resume.
If you don't want to wait for the long default timeout to cause the rsync server to self-terminate, then when your internet
connection returns, log into the server and clean up the rsync server processes manually. However, you
must politely terminate rsync -- otherwise,
it will not move the partial file into place; but rather, delete it (and thus there is no file to resume). To politely ask rsync
to terminate, do not SIGKILL (e.g., -9 ), but SIGTERM (e.g., pkill -TERM -x rsync
- only an example, you should take care to match only the rsync processes concerned with your client).
Fortunately there is an easier way: use the --timeout=X (X in seconds) option; it is passed to the rsync server
processes as well.
For example, if you specify rsync ... --timeout=15 ... , both the client and server rsync processes will cleanly
exit if they do not send/receive data in 15 seconds. On the server, this means moving the temporary file into position, ready
for resuming.
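For example, a resumable transfer of the disk image mentioned above might look like this (the remote host and path are placeholders):
rsync -a --partial --timeout=15 OldDisk.dmg user@remote:/backups/
With --partial the receiver keeps partially transferred data when the connection drops, and --timeout=15 makes both ends give up (and clean up) after 15 seconds without traffic, so re-running the same command can resume.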
I'm not sure how long the various rsync processes will try to send/receive data before they die by default (it
might vary with operating system). In my testing, the server rsync processes remain running longer than the local client. On a
"dead" network connection, the client terminates with a broken pipe (e.g., no network socket) after about 30 seconds; you could
experiment or review the source code. Meaning, you could try to "ride out" the bad internet connection for 15-20 seconds.
If you do not clean up the server rsync processes (or wait for them to die), but instead immediately launch another rsync client
process, two additional server processes will launch (for the other end of your new client process). Specifically, the new rsync
client will not re-use/reconnect to the existing rsync server processes. Thus, you'll have two temporary files (and four rsync
server processes) -- though, only the newer, second temporary file has new data being written (received from your new rsync client
process).
Interestingly, if you then clean up all rsync server processes (for example, stop your client which will stop the new rsync
servers, then SIGTERM the older rsync servers, it appears to merge (assemble) all the partial files into the new
proper named file. So, imagine a long running partial copy which dies (and you think you've "lost" all the copied data), and a
short running re-launched rsync (oops!).. you can stop the second client, SIGTERM the first servers, it will merge
the data, and you can resume.
Finally, a few short remarks:
Don't use --inplace to workaround this. You will undoubtedly have other problems as a result, man rsync
for the details.
It's trivial, but -t in your rsync options is redundant, it is implied by -a .
An already compressed disk image sent over rsync without compression might result in shorter transfer time (by
avoiding double compression). However, I'm unsure of the compression techniques in both cases. I'd test it.
As far as I understand --checksum / -c , it won't help you in this case. It affects how rsync
decides if it should transfer a file. Though, after a first rsync completes, you could run a second rsync
with -c to insist on checksums, to prevent the strange case that file size and modtime are the same on both sides,
but bad data was written.
I didn't test how the server-side rsync handles SIGINT, so I'm not sure it will keep the partial file - you could check. Note
that this doesn't have much to do with Ctrl-c ; it happens that your terminal sends SIGINT to the foreground
process when you press Ctrl-c , but the server-side rsync has no controlling terminal. You must log in to the server
and use kill . The client-side rsync will not send a message to the server (for example, after the client receives
SIGINT via your terminal Ctrl-c ) - might be interesting though. As for anthropomorphizing, not sure
what's "politer". :-) – Richard Michael
Dec 29 '13 at 22:34
I just tried this timeout argument rsync -av --delete --progress --stats --human-readable --checksum --timeout=60 --partial-dir
/tmp/rsync/ rsync://$remote:/ /src/ but then it timed out during the "receiving file list" phase (which in this case takes
around 30 minutes). Setting the timeout to half an hour kind of defeats the purpose. Any workaround for this? –
d-b
Feb 3 '15 at 8:48
@user23122 --checksum reads all data when preparing the file list, which is great for many small files that change
often, but should be done on-demand for large files. –
Cees Timmerman
Sep 15 '15 at 17:10
There are no guarantees. A Journaling File System is more resilient and is less prone to
corruption, but not immune.
A journal is simply a list of operations which have recently been done to the file system.
The crucial part is that the journal entry is made before the operations take place.
Most operations have multiple steps. Deleting a file, for example might entail deleting the
file's entry in the file system's table of contents and then marking the sectors on the drive
as free. If something happens between the two steps, a journaled file system can tell
immediately and perform the necessary clean up to keep everything consistent. This is not the
case with a non-journaled file system which has to look at the entire contents of the volume
to find errors.
While this journaling is much less prone to corruption than not journaling, corruption can
still occur. For example, if the hard drive is mechanically malfunctioning or if writes to
the journal itself are failing or interrupted.
The basic premise of journaling is that writing a journal entry is much quicker, usually,
than the actual transaction it describes will be. So, the period between the OS ordering a
(journal) write and the hard drive fulfilling it is much shorter than for a normal write: a
narrower window for things to go wrong in, but there's still a window.
Could you please elaborate a little bit on why this is true? Perhaps you could give an
example of how corruption would occur in a certain scenario. – Nathan Osman
May 6 '11 at 2:57
That last bit is incorrect; there is no window for things to go wrong. Since it records what
it is about to do before it starts doing it, the operation can be restarted after the power
failure, no matter at what point it occurs during the operation. It is a matter of ordering,
not timing. – psusi
May 6 '11 at 17:58
@psusi there is still a window for the write to the journal to be interrupted. Journal writes
may appear atomic to the OS but they're still writes to the disk. – Andrew Lambert
May 6 '11 at 21:23
@Amazed they are atomic because they have sequence numbers and/or checksums, so the journal
entry is either written entirely, or not. If it is not written entirely, it is simply ignored
after the system restarts, and no further changes were made to the fs so it remains
consistent. – psusi
May 7 '11 at 1:57
The most common type of journaling, called metadata journaling, only protects the
integrity of the file system, not of data. This includes xfs , and
ext3 / ext4 in the default data=ordered mode.
If a non-journaling file system suffers a crash, it will be checked using
fsck on the next boot. fsck scans every inode on the file system, looking for blocks that
are marked as used but are not reachable (i.e. have no file name), and marks those blocks as
unused. Doing this takes a long time.
With a metadata journaling file system, instead of doing an fsck , it knows
which blocks it was in the middle of changing, so it can mark them as free without searching
the whole partition for them.
There is a less common type of journaling, called data journaling, which is what
ext3 does if you mount it with the data=journal option.
It attempts to protect all your data by writing not just a list of logical operations, but
also the entire contents of each write to the journal. But because it's writing your data
twice, it can be much slower.
As others have pointed out, even this is not a guarantee, because the hard drive might
have told the operating system it had stored the data, when it fact it was still in the hard
drive's cache.
+1 for the distinction between file system corruption and data corruption. That little
distinction is quite the doozy in practice. – SplinterReality
May 6 '11 at 8:03
Again, the OS knows when the drive caches data and forces it to flush it when needed in order
to maintain a coherent fs. Your data file of course, can be lost or corrupted if the
application that was writing it when the power failed was not doing so carefully, and that
applies whether or not you use data=journal. – psusi
May 6 '11 at 18:11
@user3338098, drives that silently corrupt data are horribly broken and should not ever be
used, and are an entirely different conversation than corruption caused by software doing the
wrong thing. – psusi
Aug 21 '16 at 3:22
A filesystem cannot guarantee the consistency of its filesystem if a power failure occurs,
because it does not know what the hardware will do.
If a hard drive buffers data for write but tells the OS that it has written the data and
does not support the appropriate write barriers, then out-of-order writes can occur where an
earlier write has not hit the platter, but a later one has. See
this serverfault answer for more details.
Also, the position of the head on a magnetic HDD is controlled with electro-magnets. If
power fails in the middle of a write, it is possible for some data to continue to be written
while the heads move, corrupting data on blocks that the filesystem never intended to be
written.
@George: It's going to depend on the drive. There's a lot out there and you don't know how
well your (cheap) drive does things. – camh
May 6 '11 at 7:54
The hard drive tells the OS if it uses a write behind cache, and the OS takes measures to
ensure they are flushed in the correct order. Also drives are designed so that when the power
fails, they stop writing. I have seen some cases where the sector being written at the time
of power loss becomes corrupt because it did not finish updating the ecc ( but can be easily
re-written correctly ), but never heard of random sectors being corrupted on power loss.
– psusi
May 6 '11 at 18:05
ZFS, which is close to but not exactly a journaling filesystem, guarantees by design
protection against corruption after a power failure.
It doesn't matter if an ongoing write is interrupted in the middle as in such case, its
checksum will be certainly incorrect so the block will be ignored. As the file system is copy
on write, the previous correct data (or meta-data) is still on disk and will be used
instead.
As mikel
already said, most journaling file systems can only protect file metadata (information
like the name of a file, its size, its permissions, etc.), not file data (the file's
contents). This is because protecting file data would result in a very slow (in
practice useless) file system.
Since the journal is also a special kind of file stored on the hard disk, it can be
damaged after a power failure. Thus if the journal is corrupted, the file system cannot
complete any incomplete transactions that were taking place when the power failure
occurred.
What events could lead to a corrupt journal? The only thing I could think of was bad sectors
- is there anything else? – Nathan Osman
May 6 '11 at 16:35
This is #1 Google hit but there's controversy in the answer because the question unfortunately asks about delimiting on
, (comma-space) and not a single character such as comma. If you're only interested in the latter, answers here
are easier to follow:
stackoverflow.com/questions/918886/
– antak
Jun 18 '18 at 9:22
Note that the characters in $IFS are treated individually as separators so that in this case fields may be separated
by either a comma or a space rather than the sequence of the two characters. Interestingly though, empty fields aren't
created when comma-space appears in the input because the space is treated specially.
To access an individual element:
echo "${array[0]}"
To iterate over the elements:
for element in "${array[@]}"
do
echo "$element"
done
To get both the index and the value:
for index in "${!array[@]}"
do
echo "$index ${array[index]}"
done
The last example is useful because Bash arrays are sparse. In other words, you can delete an element or add an element and
then the indices are not contiguous.
unset "array[1]"
array[42]=Earth
To get the number of elements in an array:
echo "${#array[@]}"
As mentioned above, arrays can be sparse so you shouldn't use the length to get the last element. Here's how you can in Bash
4.2 and later:
echo "${array[-1]}"
in any version of Bash (from somewhere after 2.05b):
echo "${array[@]: -1:1}"
Larger negative offsets select farther from the end of the array. Note the space before the minus sign in the older form. It
is required.
Just use IFS=', ' , then you don't have to remove the spaces separately. Test: IFS=', ' read -a array <<< "Paris,
France, Europe"; echo "${array[@]}" – l0b0
May 14 '12 at 15:24
Warning: the IFS variable means split by one of these characters , so it's not a sequence of chars to split by. IFS=',
' read -a array <<< "a,d r s,w" => ${array[*]} == a d r s w –
caesarsol
Oct 29 '15 at 14:45
string="1:2:3:4:5"
set -f # avoid globbing (expansion of *).
array=(${string//:/ })
for i in "${!array[@]}"
do
echo "$i=>${array[i]}"
done
The idea is using string replacement:
${string//substring/replacement}
to replace all matches of $substring with white space and then using the substituted string to initialize a array:
(element1 element2 ... elementN)
Note: this answer makes use of the split+glob operator
. Thus, to prevent expansion of some characters (such as * ) it is a good idea to pause globbing for this script.
Used this approach... until I came across a long string to split. 100% CPU for more than a minute (then I killed it). It's a pity
because this method allows splitting by a string, not just some character in IFS. –
Werner Lehmann
May 4 '13 at 22:32
WARNING: Just ran into a problem with this approach. If you have an element named * you will get all the elements of your cwd
as well. thus string="1:2:3:4:*" will give some unexpected and possibly dangerous results depending on your implementation. Did
not get the same error with (IFS=', ' read -a array <<< "$string") and this one seems safe to use. –
Dieter Gribnitz
Sep 2 '14 at 15:46
1: This is a misuse of $IFS . The value of the $IFS variable is not taken as a single variable-length
string separator, rather it is taken as a set of single-character string separators, where each field that
read splits off from the input line can be terminated by any character in the set (comma or space, in this example).
Actually, for the real sticklers out there, the full meaning of $IFS is slightly more involved. From the
bash manual
:
The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words using these
characters as field terminators. If IFS is unset, or its value is exactly <space><tab><newline> , the default, then sequences
of <space> , <tab> , and <newline> at the beginning and end of the results of the previous expansions are ignored, and any
sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default,
then sequences of the whitespace characters <space> , <tab> , and <newline> are ignored at the beginning and end of the word,
as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not
IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters
is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.
Basically, for non-default non-null values of $IFS , fields can be separated with either (1) a sequence of one
or more characters that are all from the set of "IFS whitespace characters" (that is, whichever of <space> , <tab> , and <newline>
("newline" meaning line feed (LF) ) are present anywhere in
$IFS ), or (2) any non-"IFS whitespace character" that's present in $IFS along with whatever "IFS whitespace
characters" surround it in the input line.
For the OP, it's possible that the second separation mode I described in the previous paragraph is exactly what he wants for
his input string, but we can be pretty confident that the first separation mode I described is not correct at all. For example,
what if his input string was 'Los Angeles, United States, North America' ?
IFS=', ' read -ra a <<<'Los Angeles, United States, North America'; declare -p a;
## declare -a a=([0]="Los" [1]="Angeles" [2]="United" [3]="States" [4]="North" [5]="America")
2: Even if you were to use this solution with a single-character separator (such as a comma by itself, that is, with no following
space or other baggage), if the value of the $string variable happens to contain any LFs, then read
will stop processing once it encounters the first LF. The read builtin only processes one line per invocation. This
is true even if you are piping or redirecting input only to the read statement, as we are doing in this example
with the here-string
mechanism, and thus unprocessed input is guaranteed to be lost. The code that powers the read builtin has no knowledge
of the data flow within its containing command structure.
You could argue that this is unlikely to cause a problem, but still, it's a subtle hazard that should be avoided if possible.
It is caused by the fact that the read builtin actually does two levels of input splitting: first into lines, then
into fields. Since the OP only wants one level of splitting, this usage of the read builtin is not appropriate, and
we should avoid it.
3: A non-obvious potential issue with this solution is that read always drops the trailing field if it is empty,
although it preserves empty fields otherwise. Here's a demo:
string=', , a, , b, c, , , '; IFS=', ' read -ra a <<<"$string"; declare -p a;
## declare -a a=([0]="" [1]="" [2]="a" [3]="" [4]="b" [5]="c" [6]="" [7]="")
Maybe the OP wouldn't care about this, but it's still a limitation worth knowing about. It reduces the robustness and generality
of the solution.
This problem can be solved by appending a dummy trailing delimiter to the input string just prior to feeding it to read
, as I will demonstrate later.
string="1,2,3,4"
array=(`echo $string | sed 's/,/\n/g'`)
These solutions leverage word splitting in an array assignment to split the string into fields. Funnily enough, just like
read , general word splitting also uses the $IFS special variable, although in this case it is implied
that it is set to its default value of <space><tab><newline> , and therefore any sequence of one or more IFS characters (which
are all whitespace characters now) is considered to be a field delimiter.
This solves the problem of two levels of splitting committed by read , since word splitting by itself constitutes
only one level of splitting. But just as before, the problem here is that the individual fields in the input string can already
contain $IFS characters, and thus they would be improperly split during the word splitting operation. This happens
to not be the case for any of the sample input strings provided by these answerers (how convenient...), but of course that doesn't
change the fact that any code base that used this idiom would then run the risk of blowing up if this assumption were ever violated
at some point down the line. Once again, consider my counterexample of 'Los Angeles, United States, North America'
(or 'Los Angeles:United States:North America' ).
Also, word splitting is normally followed by
filename
expansion ( aka pathname expansion aka globbing), which, if done, would potentially corrupt words containing
the characters * , ? , or [ followed by ] (and, if extglob is
set, parenthesized fragments preceded by ? , * , + , @ , or !
) by matching them against file system objects and expanding the words ("globs") accordingly. The first of these three answerers
has cleverly undercut this problem by running set -f beforehand to disable globbing. Technically this works (although
you should probably add set +f afterward to reenable globbing for subsequent code which may depend on it), but it's
undesirable to have to mess with global shell settings in order to hack a basic string-to-array parsing operation in local code.
Another issue with this answer is that all empty fields will be lost. This may or may not be a problem, depending on the application.
Note: If you're going to use this solution, it's better to use the ${string//:/ } "pattern substitution" form
of
parameter expansion , rather than going to the trouble of invoking a command substitution (which forks the shell), starting
up a pipeline, and running an external executable ( tr or sed ), since parameter expansion is purely
a shell-internal operation. (Also, for the tr and sed solutions, the input variable should be double-quoted
inside the command substitution; otherwise word splitting would take effect in the echo command and potentially mess
with the field values. Also, the $(...) form of command substitution is preferable to the old `...`
form since it simplifies nesting of command substitutions and allows for better syntax highlighting by text editors.)
str="a, b, c, d" # assuming there is a space after ',' as in Q
arr=(${str//,/}) # delete all occurrences of ','
This answer is almost the same as #2 . The difference is that the answerer has made the assumption that the fields are delimited
by two characters, one of which being represented in the default $IFS , and the other not. He has solved this rather
specific case by removing the non-IFS-represented character using a pattern substitution expansion and then using word splitting
to split the fields on the surviving IFS-represented delimiter character.
This is not a very generic solution. Furthermore, it can be argued that the comma is really the "primary" delimiter character
here, and that stripping it and then depending on the space character for field splitting is simply wrong. Once again, consider
my counterexample: 'Los Angeles, United States, North America' .
Also, again, filename expansion could corrupt the expanded words, but this can be prevented by temporarily disabling globbing
for the assignment with set -f and then set +f .
Also, again, all empty fields will be lost, which may or may not be a problem depending on the application.
string='first line
second line
third line'
oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"
This is similar to #2 and #3 in that it uses word splitting to get the job done, only now the code explicitly sets $IFS
to contain only the single-character field delimiter present in the input string. It should be repeated that this cannot work
for multicharacter field delimiters such as the OP's comma-space delimiter. But for a single-character delimiter like the LF used
in this example, it actually comes close to being perfect. The fields cannot be unintentionally split in the middle as we saw
with previous wrong answers, and there is only one level of splitting, as required.
One problem is that filename expansion will corrupt affected words as described earlier, although once again this can be solved
by wrapping the critical statement in set -f and set +f .
Another potential problem is that, since LF qualifies as an "IFS whitespace character" as defined earlier, all empty fields
will be lost, just as in #2 and #3 . This would of course not be a problem if the delimiter happens to be a non-"IFS whitespace
character", and depending on the application it may not matter anyway, but it does vitiate the generality of the solution.
So, to sum up, assuming you have a one-character delimiter, and it is either a non-"IFS whitespace character" or you don't
care about empty fields, and you wrap the critical statement in set -f and set +f , then this solution
works, but otherwise not.
(Also, for information's sake, assigning a LF to a variable in bash can be done more easily with the $'...' syntax,
e.g. IFS=$'\n'; .)
This solution is effectively a cross between #1 (in that it sets $IFS to comma-space) and #2-4 (in that it uses
word splitting to split the string into fields). Because of this, it suffers from most of the problems that afflict all of the
above wrong answers, sort of like the worst of all worlds.
Also, regarding the second variant, it may seem like the eval call is completely unnecessary, since its argument
is a single-quoted string literal, and therefore is statically known. But there's actually a very non-obvious benefit to using
eval in this way. Normally, when you run a simple command which consists of a variable assignment only , meaning
without an actual command word following it, the assignment takes effect in the shell environment:
IFS=', '; ## changes $IFS in the shell environment
This is true even if the simple command involves multiple variable assignments; again, as long as there's no command
word, all variable assignments affect the shell environment:
IFS=', ' array=($countries); ## changes both $IFS and $array in the shell environment
But, if the variable assignment is attached to a command name (I like to call this a "prefix assignment") then it does not
affect the shell environment, and instead only affects the environment of the executed command, regardless whether it is a builtin
or external:
IFS=', ' :; ## : is a builtin command, the $IFS assignment does not outlive it
IFS=', ' env; ## env is an external command, the $IFS assignment does not outlive it
If no command name results, the variable assignments affect the current shell environment. Otherwise, the variables are
added to the environment of the executed command and do not affect the current shell environment.
It is possible to exploit this feature of variable assignment to change $IFS only temporarily, which allows us
to avoid the whole save-and-restore gambit like that which is being done with the $OIFS variable in the first variant.
But the challenge we face here is that the command we need to run is itself a mere variable assignment, and hence it would not
involve a command word to make the $IFS assignment temporary. You might think to yourself, well why not just add
a no-op command word to the statement like the
: builtin to make the $IFS assignment temporary? This does not work because it would then make the
$array assignment temporary as well:
IFS=', ' array=($countries) :; ## fails; new $array value never escapes the : command
So, we're effectively at an impasse, a bit of a catch-22. But, when eval runs its code, it runs it in the shell
environment, as if it was normal, static source code, and therefore we can run the $array assignment inside the
eval argument to have it take effect in the shell environment, while the $IFS prefix assignment that
is prefixed to the eval command will not outlive the eval command. This is exactly the trick that is
being used in the second variant of this solution:
IFS=', ' eval 'array=($string)'; ## $IFS does not outlive the eval command, but $array does
So, as you can see, it's actually quite a clever trick, and accomplishes exactly what is required (at least with respect to
assignment effectation) in a rather non-obvious way. I'm actually not against this trick in general, despite the involvement of
eval ; just be careful to single-quote the argument string to guard against security threats.
But again, because of the "worst of all worlds" agglomeration of problems, this is still a wrong answer to the OP's requirement.
IFS=', '; array=(Paris, France, Europe)
IFS=' ';declare -a array=(Paris France Europe)
Um... what? The OP has a string variable that needs to be parsed into an array. This "answer" starts with the verbatim contents
of the input string pasted into an array literal. I guess that's one way to do it.
It looks like the answerer may have assumed that the $IFS variable affects all bash parsing in all contexts, which
is not true. From the bash manual:
IFS The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the
read builtin command. The default value is <space><tab><newline> .
So the $IFS special variable is actually only used in two contexts: (1) word splitting that is performed after
expansion (meaning not when parsing bash source code) and (2) for splitting input lines into words by the read
builtin.
Let me try to make this clearer. I think it might be good to draw a distinction between parsing and execution
. Bash must first parse the source code, which obviously is a parsing event, and then later it executes the
code, which is when expansion comes into the picture. Expansion is really an execution event. Furthermore, I take issue
with the description of the $IFS variable that I just quoted above; rather than saying that word splitting is performed
after expansion , I would say that word splitting is performed during expansion, or, perhaps even more precisely,
word splitting is part of the expansion process. The phrase "word splitting" refers only to this step of expansion; it
should never be used to refer to the parsing of bash source code, although unfortunately the docs do seem to throw around the
words "split" and "words" a lot. Here's a relevant excerpt from the
linux.die.net version of the bash manual:
Expansion is performed on the command line after it has been split into words. There are seven kinds of expansion performed:
brace expansion , tilde expansion , parameter and variable expansion , command substitution ,
arithmetic expansion , word splitting , and pathname expansion .
The order of expansions is: brace expansion; tilde expansion, parameter and variable expansion, arithmetic expansion, and
command substitution (done in a left-to-right fashion); word splitting; and pathname expansion.
You could argue the
GNU version
of the manual does slightly better, since it opts for the word "tokens" instead of "words" in the first sentence of the Expansion
section:
Expansion is performed on the command line after it has been split into tokens.
The important point is, $IFS does not change the way bash parses source code. Parsing of bash source code is actually
a very complex process that involves recognition of the various elements of shell grammar, such as command sequences, command
lists, pipelines, parameter expansions, arithmetic substitutions, and command substitutions. For the most part, the bash parsing
process cannot be altered by user-level actions like variable assignments (actually, there are some minor exceptions to this rule;
for example, see the various
compatxx
shell settings , which can change certain aspects of parsing behavior on-the-fly). The upstream "words"/"tokens" that result
from this complex parsing process are then expanded according to the general process of "expansion" as broken down in the above
documentation excerpts, where word splitting of the expanded (expanding?) text into downstream words is simply one step of that
process. Word splitting only touches text that has been spit out of a preceding expansion step; it does not affect literal text
that was parsed right off the source bytestream.
string='first line
second line
third line'
while read -r line; do lines+=("$line"); done <<<"$string"
This is one of the best solutions. Notice that we're back to using read . Didn't I say earlier that read
is inappropriate because it performs two levels of splitting, when we only need one? The trick here is that you can call
read in such a way that it effectively only does one level of splitting, specifically by splitting off only one field per
invocation, which necessitates the cost of having to call it repeatedly in a loop. It's a bit of a sleight of hand, but it works.
But there are problems. First: When you provide at least one NAME argument to read , it automatically ignores
leading and trailing whitespace in each field that is split off from the input string. This occurs whether $IFS is
set to its default value or not, as described earlier in this post. Now, the OP may not care about this for his specific use-case,
and in fact, it may be a desirable feature of the parsing behavior. But not everyone who wants to parse a string into fields will
want this. There is a solution, however: A somewhat non-obvious usage of read is to pass zero NAME arguments.
In this case, read will store the entire input line that it gets from the input stream in a variable named
$REPLY , and, as a bonus, it does not strip leading and trailing whitespace from the value. This is a very robust
usage of read which I've exploited frequently in my shell programming career. Here's a demonstration of the difference
in behavior:
string=$' a b \n c d \n e f '; ## input string
a=(); while read -r line; do a+=("$line"); done <<<"$string"; declare -p a;
## declare -a a=([0]="a b" [1]="c d" [2]="e f") ## read trimmed surrounding whitespace
a=(); while read -r; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]=" a b " [1]=" c d " [2]=" e f ") ## no trimming
The second issue with this solution is that it does not actually address the case of a custom field separator, such as the
OP's comma-space. As before, multicharacter separators are not supported, which is an unfortunate limitation of this solution.
We could try to at least split on comma by specifying the separator to the -d option, but look what happens:
string='Paris, France, Europe';
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France")
Predictably, the unaccounted surrounding whitespace got pulled into the field values, and hence this would have to be corrected
subsequently through trimming operations (this could also be done directly in the while-loop). But there's another obvious error:
Europe is missing! What happened to it? The answer is that read returns a failing return code if it hits end-of-file
(in this case we can call it end-of-string) without encountering a final field terminator on the final field. This causes the
while-loop to break prematurely and we lose the final field.
Technically this same error afflicted the previous examples as well; the difference there is that the field separator was taken
to be LF, which is the default when you don't specify the -d option, and the <<< ("here-string") mechanism
automatically appends a LF to the string just before it feeds it as input to the command. Hence, in those cases, we sort of
accidentally solved the problem of a dropped final field by unwittingly appending an additional dummy terminator to the input.
Let's call this solution the "dummy-terminator" solution. We can apply the dummy-terminator solution manually for any custom delimiter
by concatenating it against the input string ourselves when instantiating it in the here-string:
a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,"; declare -p a;
declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")
There, problem solved. Another solution is to only break the while-loop if both (1) read returned failure and
(2) $REPLY is empty, meaning read was not able to read any characters prior to hitting end-of-file.
Demo:
a=(); while read -rd,|| [[ -n "$REPLY" ]]; do a+=("$REPLY"); done <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')
This approach also reveals the secretive LF that automatically gets appended to the here-string by the <<< redirection
operator. It could of course be stripped off separately through an explicit trimming operation as described a moment ago, but
obviously the manual dummy-terminator approach solves it directly, so we could just go with that. The manual dummy-terminator
solution is actually quite convenient in that it solves both of these two problems (the dropped-final-field problem and the appended-LF
problem) in one go.
So, overall, this is quite a powerful solution. It's only remaining weakness is a lack of support for multicharacter delimiters,
which I will address later.
string='first line
second line
third line'
readarray -t lines <<<"$string"
(This is actually from the same post as #7 ; the answerer provided two solutions in the same post.)
The readarray builtin, which is a synonym for mapfile , is ideal. It's a builtin command which parses
a bytestream into an array variable in one shot; no messing with loops, conditionals, substitutions, or anything else. And it
doesn't surreptitiously strip any whitespace from the input string. And (if -O is not given) it conveniently clears
the target array before assigning to it. But it's still not perfect, hence my criticism of it as a "wrong answer".
First, just to get this out of the way, note that, just like the behavior of read when doing field-parsing,
readarray drops the trailing field if it is empty. Again, this is probably not a concern for the OP, but it could
be for some use-cases. I'll come back to this in a moment.
Second, as before, it does not support multicharacter delimiters. I'll give a fix for this in a moment as well.
Third, the solution as written does not parse the OP's input string, and in fact, it cannot be used as-is to parse it. I'll
expand on this momentarily as well.
For the above reasons, I still consider this to be a "wrong answer" to the OP's question. Below I'll give what I consider to
be the right answer.
Right answer
Here's a na�ve attempt to make #8 work by just specifying the -d option:
string='Paris, France, Europe';
readarray -td, a <<<"$string"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=$' Europe\n')
We see the result is identical to the result we got from the double-conditional approach of the looping read solution
discussed in #7 . We can almost solve this with the manual dummy-terminator trick:
readarray -td, a <<<"$string,"; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe" [3]=$'\n')
The problem here is that readarray preserved the trailing field, since the <<< redirection operator
appended the LF to the input string, and therefore the trailing field was not empty (otherwise it would've been dropped).
We can take care of this by explicitly unsetting the final array element after-the-fact:
readarray -td, a <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]=" France" [2]=" Europe")
The only two problems that remain, which are actually related, are (1) the extraneous whitespace that needs to be trimmed,
and (2) the lack of support for multicharacter delimiters.
The whitespace could of course be trimmed afterward (for example, see
How to trim whitespace
from a Bash variable? ). But if we can hack a multicharacter delimiter, then that would solve both problems in one shot.
Unfortunately, there's no direct way to get a multicharacter delimiter to work. The best solution I've thought of is
to preprocess the input string to replace the multicharacter delimiter with a single-character delimiter that will be guaranteed
not to collide with the contents of the input string. The only character that has this guarantee is the
NUL byte . This is because, in bash (though not in
zsh, incidentally), variables cannot contain the NUL byte. This preprocessing step can be done inline in a process substitution.
Here's how to do it using awk :
There, finally! This solution will not erroneously split fields in the middle, will not cut out prematurely, will not drop
empty fields, will not corrupt itself on filename expansions, will not automatically strip leading and trailing whitespace, will
not leave a stowaway LF on the end, does not require loops, and does not settle for a single-character delimiter.
Trimming solution
Lastly, I wanted to demonstrate my own fairly intricate trimming solution using the obscure -C callback option
of readarray . Unfortunately, I've run out of room against Stack Overflow's draconian 30,000 character post limit,
so I won't be able to explain it. I'll leave that as an exercise for the reader.
function mfcb { local val="$4"; "$1"; eval "$2[$3]=\$val;"; };
function val_ltrim { if [[ "$val" =~ ^[[:space:]]+ ]]; then val="${val:${#BASH_REMATCH[0]}}"; fi; };
function val_rtrim { if [[ "$val" =~ [[:space:]]+$ ]]; then val="${val:0:${#val}-${#BASH_REMATCH[0]}}"; fi; };
function val_trim { val_ltrim; val_rtrim; };
readarray -c1 -C 'mfcb val_trim a' -td, <<<"$string,"; unset 'a[-1]'; declare -p a;
## declare -a a=([0]="Paris" [1]="France" [2]="Europe")
It may also be helpful to note (though understandably you had no room to do so) that the -d option to readarray
first appears in Bash 4.4. � fbicknel
Aug 18 '17 at 15:57
Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,"\0"); print }' and eliminate that concatenation
of the final ", " then you don't have to go through the gymnastics on eliminating the final record. So: readarray
-td '' a < <(awk '{ gsub(/,[ ]+/,"\0"); print; }' <<<"$string") on Bash that supports readarray . Note your
method is Bash 4.4+ I think because of the -d in readarray �
dawg
Nov 26 '17 at 22:28
@datUser That's unfortunate. Your version of bash must be too old for readarray . In this case, you can use the second-best
solution built on read . I'm referring to this: a=(); while read -rd,; do a+=("$REPLY"); done <<<"$string,";
(with the awk substitution if you need multicharacter delimiter support). Let me know if you run into any problems;
I'm pretty sure this solution should work on fairly old versions of bash, back to version 2-something, released like two decades
ago. � bgoldst
Feb 23 '18 at 3:37
This does not work as stated. @Jmoney38 or shrimpwagon if you can paste this in a terminal and get the desired output, please
paste the result here. � abalter
Aug 30 '16 at 5:13
Sometimes it happened to me that the method described in the accepted answer didn't work, especially if the separator is a carriage
return.
In those cases I solved in this way:
string='first line
second line
third line'
oldIFS="$IFS"
IFS='
'
IFS=${IFS:0:1} # this is useful to format your code with tabs
lines=( $string )
IFS="$oldIFS"
for line in "${lines[@]}"
do
echo "--> $line"
done
+1 This completely worked for me. I needed to put multiple strings, divided by a newline, into an array, and read -a arr
<<< "$strings" did not work with IFS=$'\n' . �
Stefan van den Akker
Feb 9 '15 at 16:52
While not every solution works for every situation, your mention of readarray... replaced my last two hours with 5 minutes...
you got my vote � Mayhem
Dec 31 '15 at 3:13
The key to splitting your string into an array is the multi character delimiter of ", " . Any solution using
IFS for multi character delimiters is inherently wrong since IFS is a set of those characters, not a string.
If you assign IFS=", " then the string will break on EITHER "," OR " " or any combination
of them which is not an accurate representation of the two character delimiter of ", " .
You can use awk or sed to split the string, with process substitution:
#!/bin/bash
str="Paris, France, Europe"
array=()
while read -r -d $'\0' each; do # use a NUL terminated field separator
array+=("$each")
done < <(printf "%s" "$str" | awk '{ gsub(/,[ ]+|$/,"\0"); print }')
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output
It is more efficient to use a regex you directly in Bash:
#!/bin/bash
str="Paris, France, Europe"
array=()
while [[ $str =~ ([^,]+)(,[ ]+|$) ]]; do
array+=("${BASH_REMATCH[1]}") # capture the field
i=${#BASH_REMATCH} # length of field + delimiter
str=${str:i} # advance the string by that length
done # the loop deletes $str, so make a copy if needed
declare -p array
# declare -a array=([0]="Paris" [1]="France" [2]="Europe") output...
With the second form, there is no sub shell and it will be inherently faster.
Edit by bgoldst: Here are some benchmarks comparing my readarray solution to dawg's regex solution, and I also
included the read solution for the heck of it (note: I slightly modified the regex solution for greater harmony with
my solution) (also see my comments below the post):
## competitors
function c_readarray { readarray -td '' a < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); unset 'a[-1]'; };
function c_read { a=(); local REPLY=''; while read -r -d ''; do a+=("$REPLY"); done < <(awk '{ gsub(/, /,"\0"); print; };' <<<"$1, "); };
function c_regex { a=(); local s="$1, "; while [[ $s =~ ([^,]+),\ ]]; do a+=("${BASH_REMATCH[1]}"); s=${s:${#BASH_REMATCH}}; done; };
## helper functions
function rep {
local -i i=-1;
for ((i = 0; i<$1; ++i)); do
printf %s "$2";
done;
}; ## end rep()
function testAll {
local funcs=();
local args=();
local func='';
local -i rc=-1;
while [[ "$1" != ':' ]]; do
func="$1";
if [[ ! "$func" =~ ^[_a-zA-Z][_a-zA-Z0-9]*$ ]]; then
echo "bad function name: $func" >&2;
return 2;
fi;
funcs+=("$func");
shift;
done;
shift;
args=("$@");
for func in "${funcs[@]}"; do
echo -n "$func ";
{ time $func "${args[@]}" >/dev/null 2>&1; } 2>&1| tr '\n' '/';
rc=${PIPESTATUS[0]}; if [[ $rc -ne 0 ]]; then echo "[$rc]"; else echo; fi;
done| column -ts/;
}; ## end testAll()
function makeStringToSplit {
local -i n=$1; ## number of fields
if [[ $n -lt 0 ]]; then echo "bad field count: $n" >&2; return 2; fi;
if [[ $n -eq 0 ]]; then
echo;
elif [[ $n -eq 1 ]]; then
echo 'first field';
elif [[ "$n" -eq 2 ]]; then
echo 'first field, last field';
else
echo "first field, $(rep $[$1-2] 'mid field, ')last field";
fi;
}; ## end makeStringToSplit()
function testAll_splitIntoArray {
local -i n=$1; ## number of fields in input string
local s='';
echo "===== $n field$(if [[ $n -ne 1 ]]; then echo 's'; fi;) =====";
s="$(makeStringToSplit "$n")";
testAll c_readarray c_read c_regex : "$s";
}; ## end testAll_splitIntoArray()
## results
testAll_splitIntoArray 1;
## ===== 1 field =====
## c_readarray real 0m0.067s user 0m0.000s sys 0m0.000s
## c_read real 0m0.064s user 0m0.000s sys 0m0.000s
## c_regex real 0m0.000s user 0m0.000s sys 0m0.000s
##
testAll_splitIntoArray 10;
## ===== 10 fields =====
## c_readarray real 0m0.067s user 0m0.000s sys 0m0.000s
## c_read real 0m0.064s user 0m0.000s sys 0m0.000s
## c_regex real 0m0.001s user 0m0.000s sys 0m0.000s
##
testAll_splitIntoArray 100;
## ===== 100 fields =====
## c_readarray real 0m0.069s user 0m0.000s sys 0m0.062s
## c_read real 0m0.065s user 0m0.000s sys 0m0.046s
## c_regex real 0m0.005s user 0m0.000s sys 0m0.000s
##
testAll_splitIntoArray 1000;
## ===== 1000 fields =====
## c_readarray real 0m0.084s user 0m0.031s sys 0m0.077s
## c_read real 0m0.092s user 0m0.031s sys 0m0.046s
## c_regex real 0m0.125s user 0m0.125s sys 0m0.000s
##
testAll_splitIntoArray 10000;
## ===== 10000 fields =====
## c_readarray real 0m0.209s user 0m0.093s sys 0m0.108s
## c_read real 0m0.333s user 0m0.234s sys 0m0.109s
## c_regex real 0m9.095s user 0m9.078s sys 0m0.000s
##
testAll_splitIntoArray 100000;
## ===== 100000 fields =====
## c_readarray real 0m1.460s user 0m0.326s sys 0m1.124s
## c_read real 0m2.780s user 0m1.686s sys 0m1.092s
## c_regex real 17m38.208s user 15m16.359s sys 2m19.375s
##
Very cool solution! I never thought of using a loop on a regex match, nifty use of $BASH_REMATCH . It works, and
does indeed avoid spawning subshells. +1 from me. However, by way of criticism, the regex itself is a little non-ideal, in that
it appears you were forced to duplicate part of the delimiter token (specifically the comma) so as to work around the lack of
support for non-greedy multipliers (also lookarounds) in ERE ("extended" regex flavor built into bash). This makes it a little
less generic and robust. � bgoldst
Nov 27 '17 at 4:28
Secondly, I did some benchmarking, and although the performance is better than the other solutions for smallish strings, it worsens
exponentially due to the repeated string-rebuilding, becoming catastrophic for very large strings. See my edit to your answer.
� bgoldst
Nov 27 '17 at 4:28
@bgoldst: What a cool benchmark! In defense of the regex, for 10's or 100's of thousands of fields (what the regex is splitting)
there would probably be some form of record (like \n delimited text lines) comprising those fields so the catastrophic
slow-down would likely not occur. If you have a string with 100,000 fields -- maybe Bash is not ideal ;-) Thanks for the benchmark.
I learned a thing or two. � dawg
Nov 27 '17 at 4:46
As others have pointed out in this thread, the OP's question gave an example of a comma delimited string to be parsed into
an array, but did not indicate if he/she was only interested in comma delimiters, single character delimiters, or multi-character
delimiters.
Since Google tends to rank this answer at or near the top of search results, I wanted to provide readers with a strong answer
to the question of multiple character delimiters, since that is also mentioned in at least one response.
If you're in search of a solution to a multi-character delimiter problem, I suggest reviewing
Mallikarjun M 's post, in particular the response
from gniourf_gniourf who provides this elegant
pure BASH solution using parameter expansion:
#!/bin/bash
str="LearnABCtoABCSplitABCaABCString"
delimiter=ABC
s=$str$delimiter
array=();
while [[ $s ]]; do
array+=( "${s%%"$delimiter"*}" );
s=${s#*"$delimiter"};
done;
declare -p array
countries='Paris, France, Europe'
OIFS="$IFS"
IFS=', ' array=($countries)
IFS="$OIFS"
#${array[1]} == Paris
#${array[2]} == France
#${array[3]} == Europe
Bad: subject to word splitting and pathname expansion. Please don't revive old questions with good answers to give bad answers.
� gniourf_gniourf
Dec 19 '16 at 17:22
@GeorgeSovetov: As I said, it's subject to word splitting and pathname expansion. More generally, splitting a string into an
array as array=( $string ) is a (sadly very common) antipattern: word splitting occurs: string='Prague,
Czech Republic, Europe' ; Pathname expansion occurs: string='foo[abcd],bar[efgh]' will fail if you have a
file named, e.g., food or barf in your directory. The only valid usage of such a construct is when
string is a glob. � gniourf_gniourf
Dec 26 '16 at 18:07
Pfft. No. If you're writing scripts large enough for this to matter, you're doing it wrong. In application code, eval is evil.
In shell scripting, it's common, necessary, and inconsequential. �
user1009908
Oct 30 '15 at 4:05
Splitting strings by strings is a pretty boring thing to do using bash. What happens is that we have limited approaches that
only work in a few cases (split by ";", "/", "." and so on) or we have a variety of side effects in the outputs.
The approach below has required a number of maneuvers, but I believe it will work for most of our needs!
#!/bin/bash
# --------------------------------------
# SPLIT FUNCTION
# ----------------
F_SPLIT_R=()
f_split() {
: 'It does a "split" into a given string and returns an array.
Args:
TARGET_P (str): Target string to "split".
DELIMITER_P (Optional[str]): Delimiter used to "split". If not
informed the split will be done by spaces.
Returns:
F_SPLIT_R (array): Array with the provided string separated by the
informed delimiter.
'
F_SPLIT_R=()
TARGET_P=$1
DELIMITER_P=$2
if [ -z "$DELIMITER_P" ] ; then
DELIMITER_P=" "
fi
REMOVE_N=1
if [ "$DELIMITER_P" == "\n" ] ; then
REMOVE_N=0
fi
# NOTE: This was the only parameter that has been a problem so far!
# By Questor
# [Ref.: https://unix.stackexchange.com/a/390732/61742]
if [ "$DELIMITER_P" == "./" ] ; then
DELIMITER_P="[.]/"
fi
if [ ${REMOVE_N} -eq 1 ] ; then
# NOTE: Due to bash limitations we have some problems getting the
# output of a split by awk inside an array and so we need to use
# "line break" (\n) to succeed. Seen this, we remove the line breaks
# momentarily afterwards we reintegrate them. The problem is that if
# there is a line break in the "string" informed, this line break will
# be lost, that is, it is erroneously removed in the output!
# By Questor
TARGET_P=$(awk 'BEGIN {RS="dn"} {gsub("\n", "3F2C417D448C46918289218B7337FCAF"); printf $0}' <<< "${TARGET_P}")
fi
# NOTE: The replace of "\n" by "3F2C417D448C46918289218B7337FCAF" results
# in more occurrences of "3F2C417D448C46918289218B7337FCAF" than the
# amount of "\n" that there was originally in the string (one more
# occurrence at the end of the string)! We can not explain the reason for
# this side effect. The line below corrects this problem! By Questor
TARGET_P=${TARGET_P%????????????????????????????????}
SPLIT_NOW=$(awk -F"$DELIMITER_P" '{for(i=1; i<=NF; i++){printf "%s\n", $i}}' <<< "${TARGET_P}")
while IFS= read -r LINE_NOW ; do
if [ ${REMOVE_N} -eq 1 ] ; then
# NOTE: We use "'" to prevent blank lines with no other characters
# in the sequence being erroneously removed! We do not know the
# reason for this side effect! By Questor
LN_NOW_WITH_N=$(awk 'BEGIN {RS="dn"} {gsub("3F2C417D448C46918289218B7337FCAF", "\n"); printf $0}' <<< "'${LINE_NOW}'")
# NOTE: We use the commands below to revert the intervention made
# immediately above! By Questor
LN_NOW_WITH_N=${LN_NOW_WITH_N%?}
LN_NOW_WITH_N=${LN_NOW_WITH_N#?}
F_SPLIT_R+=("$LN_NOW_WITH_N")
else
F_SPLIT_R+=("$LINE_NOW")
fi
done <<< "$SPLIT_NOW"
}
# --------------------------------------
# HOW TO USE
# ----------------
STRING_TO_SPLIT="
* How do I list all databases and tables using psql?
\"
sudo -u postgres /usr/pgsql-9.4/bin/psql -c \"\l\"
sudo -u postgres /usr/pgsql-9.4/bin/psql <DB_NAME> -c \"\dt\"
\"
\"
\list or \l: list all databases
\dt: list all tables in the current database
\"
[Ref.: https://dba.stackexchange.com/questions/1285/how-do-i-list-all-databases-and-tables-using-psql]
"
f_split "$STRING_TO_SPLIT" "bin/psql -c"
# --------------------------------------
# OUTPUT AND TEST
# ----------------
ARR_LENGTH=${#F_SPLIT_R[*]}
for (( i=0; i<=$(( $ARR_LENGTH -1 )); i++ )) ; do
echo " > -----------------------------------------"
echo "${F_SPLIT_R[$i]}"
echo " < -----------------------------------------"
done
if [ "$STRING_TO_SPLIT" == "${F_SPLIT_R[0]}bin/psql -c${F_SPLIT_R[1]}" ] ; then
echo " > -----------------------------------------"
echo "The strings are the same!"
echo " < -----------------------------------------"
fi
Rather than changing IFS to match our desired delimiter, we can replace all occurrences of our desired delimiter ", "
with contents of $IFS via "${string//, /$IFS}" .
Maybe this will be slow for very large strings though?
I cover this idea in
my
answer ; see Wrong answer #5 (you might be especially interested in my discussion of the eval trick).
Your solution leaves $IFS set to the comma-space value after-the-fact. �
bgoldst
Aug 13 '17 at 22:38
I'm writing bash script which defines several path constants and will use them for file
and directory manipulation (copying, renaming and deleting). Often it will be necessary to do
something like:
rm -rf "/${PATH1}"
rm -rf "${PATH2}/"*
While developing this script I'd want to protect myself from mistyping names like PATH1
and PATH2 and avoid situations where they are expanded to empty string, thus resulting in
wiping whole disk. I decided to create special wrapper:
rmrf() {
if [[ $1 =~ "regex" ]]; then
echo "Ignoring possibly unsafe path ${1}"
exit 1
fi
shopt -s dotglob
rm -rf -- $1
shopt -u dotglob
}
Which will be called as:
rmrf "/${PATH1}"
rmrf "${PATH2}/"*
Regex (or sed expression) should catch paths like "*", "/*", "/**/", "///*" etc. but allow
paths like "dir", "/dir", "/dir1/dir2/", "/dir1/dir2/*". Also I don't know how to enable
shell globbing in case like "/dir with space/*". Any ideas?
EDIT: this is what I came up with so far:
rmrf() {
local RES
local RMPATH="${1}"
SAFE=$(echo "${RMPATH}" | sed -r 's:^((\.?\*+/+)+.*|(/+\.?\*+)+.*|[\.\*/]+|.*/\.\*+)$::g')
if [ -z "${SAFE}" ]; then
echo "ERROR! Unsafe deletion of ${RMPATH}"
return 1
fi
shopt -s dotglob
if [ '*' == "${RMPATH: -1}" ]; then
echo rm -rf -- "${RMPATH/%\*/}"*
RES=$?
else
echo rm -rf -- "${RMPATH}"
RES=$?
fi
shopt -u dotglob
return $RES
}
Intended use is (note an asterisk inside quotes):
rmrf "${SOMEPATH}"
rmrf "${SOMEPATH}/*"
where $SOMEPATH is not system or /home directory (in my case all such operations are
performed on filesystem mounted under /scratch directory).
CAVEATS:
not tested very well
not intended to use with paths possibly containing '..' or '.'
should not be used with user-supplied paths
rm -rf with asterisk probably can fail if there are too many files or directories
inside $SOMEPATH (because of limited command line length) - this can be fixed with 'for'
loop or 'find' command
I've found a big danger with rm in bash is that bash usually doesn't stop for errors. That
means that:
cd $SOMEPATH
rm -rf *
Is a very dangerous combination if the change directory fails. A safer way would be:
cd $SOMEPATH && rm -rf *
Which will ensure the rf won't run unless you are really in $SOMEPATH. This doesn't
protect you from a bad $SOMEPATH but it can be combined with the advice given by others to
help make your script safer.
EDIT: @placeybordeaux makes a good point that if $SOMEPATH is undefined or empty
cd doesn't treat it as an error and returns 0. In light of that this answer
should be considered unsafe unless $SOMEPATH is validated as existing and non-empty first. I
believe cd with no args should be an illegal command since at best is performs a
no-op and at worse it can lead to unexpected behaviour but it is what it is.
Instead of cd $SOMEPATH , you should write cd "${SOMEPATH?}" . The
${varname?} notation ensures that the expansion fails with a warning-message if
the variable is unset or empty (such that the && ... part is never run);
the double-quotes ensure that special characters in $SOMEPATH , such as
whitespace, don't have undesired effects. – ruakh
Jul 13 '18 at 6:46
There is a set -u bash directive that will cause exit, when uninitialized
variable is used. I read about it here , with
rm -rf as an example. I think that's what you're looking for. And here is
set's manual .
,Jun 14, 2009 at 12:38
I think "rm" command has a parameter to avoid the deleting of "/". Check it out.
Generally, when I'm developing a command with operations such as ' rm -fr ' in
it, I will neutralize the remove during development. One way of doing that is:
RMRF="echo rm -rf"
...
$RMRF "/${PATH1}"
This shows me what should be deleted - but does not delete it. I will do a manual clean up
while things are under development - it is a small price to pay for not running the risk of
screwing up everything.
The notation ' "/${PATH1}" ' is a little unusual; normally, you would ensure
that PATH1 simply contains an absolute pathname.
Using the metacharacter with ' "${PATH2}/"* ' is unwise and unnecessary. The
only difference between using that and using just ' "${PATH2}" ' is that if the
directory specified by PATH2 contains any files or directories with names starting with dot,
then those files or directories will not be removed. Such a design is unlikely and is rather
fragile. It would be much simpler just to pass PATH2 and let the recursive remove do its job.
Adding the trailing slash is not necessarily a bad idea; the system would have to ensure that
$PATH2 contains a directory name, not just a file name, but the extra protection
is rather minimal.
Using globbing with ' rm -fr ' is usually a bad idea. You want to be precise
and restrictive and limiting in what it does - to prevent accidents. Of course, you'd never
run the command (shell script you are developing) as root while it is under development -
that would be suicidal. Or, if root privileges are absolutely necessary, you neutralize the
remove operation until you are confident it is bullet-proof.
To delete subdirectories and files starting with dot I use "shopt -s dotglob". Using rm -rf
"${PATH2}" is not appropriate because in my case PATH2 can be only removed by superuser and
this results in error status for "rm" command (and I verify it to track other errors).
– Max
Jun 14 '09 at 13:09
Then, with due respect, you should use a private sub-directory under $PATH2 that you can
remove. Avoid glob expansion with commands like 'rm -rf' like you would avoid the plague (or
should that be A/H1N1?). – Jonathan Leffler
Jun 14 '09 at 13:37
If it is possible, you should try and put everything into a folder with a hard-coded name
which is unlikely to be found anywhere else on the filesystem, such as '
foofolder '. Then you can write your rmrf() function as:
You don't need to use regular expressions.
Just assign the directories you want to protect to a variable and then iterate over the
variable. eg:
protected_dirs="/ /bin /usr/bin /home $HOME"
for d in $protected_dirs; do
if [ "$1" = "$d" ]; then
rm=0
break;
fi
done
if [ ${rm:-1} -eq 1 ]; then
rm -rf $1
fi
,
Add the following codes to your ~/.bashrc
# safe delete
move_to_trash () { now="$(date +%Y%m%d_%H%M%S)"; mv "$@" ~/.local/share/Trash/files/"$@_$now"; }
alias del='move_to_trash'
# safe rm
alias rmi='rm -i'
Every time you need to rm something, first consider del , you
can change the trash folder. If you do need to rm something, you could go to the
trash folder and use rmi .
One small bug for del is that when del a folder, for example,
my_folder , it should be del my_folder but not del
my_folder/ since in order for possible later restore, I attach the time information in
the end ( "$@_$now" ). For files, it works fine.
I'm surprised it works with cat but not with echo. cat should expect a file name as stdin,
not a char string. psql << EOF sounds logical, but not othewise. Works with cat but not
with echo. Strange behaviour. Any clue about that? – Alex
Mar 23 '15 at 23:31
Answering to myself: cat without parameters executes and replicates to the output whatever
send via input (stdin), hence using its output to fill the file via >. In fact a file name
read as a parameter is not a stdin stream. – Alex
Mar 23 '15 at 23:39
@Alex echo just prints it's command line arguments while cat reads stding(when
piped to it) or reads a file that corresponds to it's command line args – The-null-Pointer-
Jan 1 '18 at 18:03
This type of redirection instructs the shell to read input from the current source until
a line containing only word (with no trailing blanks) is seen.
All of the lines read up to that point are then used as the standard input for a
command.
The format of here-documents is:
<<[-]word
here-document
delimiter
No parameter expansion, command substitution, arithmetic expansion, or pathname
expansion is performed on word . If any characters in word are quoted, the delimiter is the
result of quote removal on word , and the lines in the here-document are not expanded. If
word is unquoted, all lines of the here-document are subjected to parameter expansion,
command substitution, and arithmetic expansion. In the latter case, the character sequence
\<newline> is ignored, and \ must be used to quote the
characters \ , $ , and ` .
If the redirection operator is <<- , then all leading tab characters
are stripped from input lines and the line containing delimiter . This allows
here-documents within shell scripts to be indented in a natural fashion.
I was having the hardest time disabling variable/parameter expansion. All I needed to do was
use "double-quotes" and that fixed it! Thanks for the info! – Xeoncross
May 26 '11 at 22:51
Concerning <<- please note that only leading tab characters are
stripped -- not soft tab characters. This is one of those rare case when you actually need
the tab character. If the rest of your document uses soft tabs, make sure to show invisible
characters and (e.g.) copy and paste a tab character. If you do it right, your syntax
highlighting should correctly catch the ending delimiter. – trkoch
Nov 10 '15 at 17:23
I don't see how this answer is more helpful than the ones below. It merely regurgitates
information that can be found in other places (that have likely already been checked) –
BrDaHa
Jul 13 '17 at 19:01
The cat <<EOF syntax is very useful when working with multi-line text in
Bash, eg. when assigning multi-line string to a shell variable, file or a pipe. Examples
of cat <<EOF syntax usage in Bash:1. Assign multi-line string to a
shell variable
$ sql=$(cat <<EOF
SELECT foo, bar FROM db
WHERE foo='baz'
EOF
)
The $sql variable now holds the new-line characters too. You can verify
with echo -e "$sql" .
In your case, "EOF" is known as a "Here Tag". Basically <<Here tells the
shell that you are going to enter a multiline string until the "tag" Here . You
can name this tag as you want, it's often EOF or STOP .
Some rules about the Here tags:
The tag can be any string, uppercase or lowercase, though most people use uppercase by
convention.
The tag will not be considered as a Here tag if there are other words in that line. In
this case, it will merely be considered part of the string. The tag should be by itself on
a separate line, to be considered a tag.
The tag should have no leading or trailing spaces in that line to be considered a tag.
Otherwise it will be considered as part of the string.
example:
$ cat >> test <<HERE
> Hello world HERE <-- Not by itself on a separate line -> not considered end of string
> This is a test
> HERE <-- Leading space, so not considered end of string
> and a new line
> HERE <-- Now we have the end of the string
this is the best actual answer ... you define both and clearly state the primary purpose of
the use instead of related theory ... which is important but not necessary ... thanks - super
helpful – oemb1905
Feb 22 '17 at 7:17
The redirection operators "<<" and "<<-" both allow redirection of lines
contained in a shell input file, known as a "here-document", to the input of a command.
The here-document shall be treated as a single word that begins after the next and
continues until there is a line containing only the delimiter and a , with no characters in
between. Then the next here-document starts, if there is one. The format is as follows:
[n]<<word
here-document
delimiter
where the optional n represents the file descriptor number. If the number is omitted,
the here-document refers to standard input (file descriptor 0).
If any character in word is quoted, the delimiter shall be formed by performing quote
removal on word, and the here-document lines shall not be expanded. Otherwise, the
delimiter shall be the word itself.
If no characters in word are quoted, all lines of the here-document shall be expanded
for parameter expansion, command substitution, and arithmetic expansion. In this case, the
in the input behaves as the inside double-quotes (see Double-Quotes). However, the
double-quote character ( '"' ) shall not be treated specially within a here-document,
except when the double-quote appears within "$()", "``", or "${}".
If the redirection symbol is "<<-", all leading <tab>
characters shall be stripped from input lines and the line containing the trailing
delimiter. If more than one "<<" or "<<-" operator is specified on a line, the
here-document associated with the first operator shall be supplied first by the application
and shall be read first by the shell.
When a here-document is read from a terminal device and the shell is interactive, it
shall write the contents of the variable PS2, processed as described in Shell Variables, to
standard error before reading each line of input until the delimiter has been
recognized.
Examples
Some examples not yet given.
Quotes prevent parameter expansion
Without quotes:
a=0
cat <<EOF
$a
EOF
Output:
0
With quotes:
a=0
cat <<'EOF'
$a
EOF
or (ugly but valid):
a=0
cat <<E"O"F
$a
EOF
Outputs:
$a
Hyphen removes leading tabs
Without hyphen:
cat <<EOF
<tab>a
EOF
where <tab> is a literal tab, and can be inserted with Ctrl + V
<tab>
Output:
<tab>a
With hyphen:
cat <<-EOF
<tab>a
<tab>EOF
Output:
a
This exists of course so that you can indent your cat like the surrounding
code, which is easier to read and maintain. E.g.:
if true; then
cat <<-EOF
a
EOF
fi
Unfortunately, this does not work for space characters: POSIX favored tab
indentation here. Yikes.
In your last example discussing <<- and <tab>a , it
should be noted that the purpose was to allow normal indentation of code within the script
while allowing heredoc text presented to the receiving process to begin in column 0. It is a
not too commonly seen feature and a bit more context may prevent a good deal of
head-scratching... – David C. Rankin
Aug 12 '15 at 7:10
@JeanmichelCote I don't see a better option :-) With regular strings you can also consider
mixing up quotes like "$a"'$b'"$c" , but there is no analogue here AFAIK.
–
Ciro Santilli 新疆改造中心
六四事件 法轮功
Sep 23 '15 at 20:01
Not exactly as an answer to the original question, but I wanted to share this anyway: I
had the need to create a config file in a directory that required root rights.
I am planning to learn Lua for my desktop scripting needs. I want to know if there is any documentation available and also if
there are all the things needed in the Standard Lib.
Thanks uroc for your quick response. If possible, please let me know of any beginner tutorial or atleast some sample code for
using COM programming via Lua. :) � Animesh
Oct 14 '09 at 12:26
More complex code example for lua working with excel:
require "luacom"
excel = luacom.CreateObject("Excel.Application")
local book = excel.Workbooks:Add()
local sheet = book.Worksheets(1)
excel.Visible = true
for row=1, 30 do
for col=1, 30 do
sheet.Cells(row, col).Value2 = math.floor(math.random() * 100)
end
end
local range = sheet:Range("A1")
for row=1, 30 do
for col=1, 30 do
local v = sheet.Cells(row, col).Value2
if v > 50 then
local cell = range:Offset(row-1, col-1)
cell:Select()
excel.Selection.Interior.Color = 65535
end
end
end
excel.DisplayAlerts = false
excel:Quit()
excel = nil
Another example, could add a graph chart.
require "luacom"
excel = luacom.CreateObject("Excel.Application")
local book = excel.Workbooks:Add()
local sheet = book.Worksheets(1)
excel.Visible = true
for row=1, 30 do
sheet.Cells(row, 1).Value2 = math.floor(math.random() * 100)
end
local chart = excel.Charts:Add()
chart.ChartType = 4 -- xlLine
local range = sheet:Range("A1:A30")
chart:SetSourceData(range)
A quick suggestion: fragments of code will look better if you format them as code (use the little "101 010" button). �
Incredulous Monk
Oct 19 '09 at 4:17
I defined my own listing mode and I'd like to make it permanent so that on the next mc
start my defined listing mode will be set. I found no configuration file for mc.
,
You have probably Auto save setup turned off in
Options->Configuration menu.
You can save the configuration manually by Options->Save setup .
Panels setup is saved to ~/.config/mc/panels.ini .
I have a Ubuntu 12.04 server I bought, if I connect with putty using ssh and a sudoer user
putty gets disconnected by the server after some time if I am idle How do I configure Ubuntu to keep this connection alive indefinitely?
No, it's the time between keepalives. If you set it to 0, no keepalives are sent but you want
putty to send keepalives to keep the connection alive. – das Keks
Feb 19 at 11:46
In addition to the answer from "das Keks" there is at least one other aspect that can affect
this behavior. Bash (usually the default shell on Ubuntu) has a value TMOUT
which governs (decimal value in seconds) after which time an idle shell session will time out
and the user will be logged out, leading to a disconnect in an SSH session.
In addition I would strongly recommend that you do something else entirely. Set up
byobu (or even just tmux alone as it's superior to GNU
screen ) and always log in and attach to a preexisting session (that's GNU
screen and tmux terminology). This way even if you get forcibly
disconnected - let's face it, a power outage or network interruption can always happen - you
can always resume your work where you left. And that works across different machines. So you
can connect to the same session from another machine (e.g. from home). The possibilities are
manifold and it's a true productivity booster. And not to forget, terminal multiplexers
overcome one of the big disadvantages of PuTTY: no tabbed interface. Now you get "tabs" in
the form of windows and panes inside GNU screen and tmux .
apt-get install tmux
apt-get install byobu
Byobu is a nice frontend to both terminal multiplexers, but tmux is so
comfortable that in my opinion it obsoletes byobu to a large extent. So my
recommendation would be tmux .
Also search for "dotfiles", in particular tmux.conf and
.tmux.conf on the web for many good customizations to get you started.
Change the default value for "Seconds between keepalives(0 to turn off)" : from 0 to
600 (10 minutes) --This varies...reduce if 10 minutes doesn't help
Check the "Enable TCP_keepalives (SO_KEEPALIVE option)" check box.
Finally save setting for session
,
I keep my PuTTY sessions alive by monitoring the cron logs
tail -f /var/log/cron
I want the PuTTY session alive because I'm proxying through socks.
I have a script which, when I run it from PuTTY, it scrolls the screen. Now, I want to go
back to see the errors, but when I scroll up, I can see the past commands, but not the output
of the command.
I would recommend using screen if you want to have good control over the
scroll buffer on a remote shell.
You can change the scroll buffer size to suit your needs by setting:
defscrollback 4000
in ~/.screenrc , which will specify the number of lines you want to be
buffered (4000 in this case).
Then you should run your script in a screen session, e.g. by executing screen
./myscript.sh or first executing screen and then
./myscript.sh inside the session.
It's also possible to enable logging of the console output to a file. You can find more
info on the screen's man page
.
,
From your descript, it sounds like the "problem" is that you are using screen, tmux, or
another window manager dependent on them (byobu). Normally you should be able to scroll back
in putty with no issue. Exceptions include if you are in an application like less or nano
that creates it's own "window" on the terminal.
With screen and tmux you can generally scroll back with SHIFT + PGUP (same as
you could from the physical terminal of the remote machine). They also both have a "copy"
mode that frees the cursor from the prompt and lets you use arrow keys to move it around (for
selecting text to copy with just the keyboard). It also lets you scroll up and down with the
PGUP and PGDN keys. Copy mode under byobu using screen or tmux
backends is accessed by pressing F7 (careful, F6 disconnects the
session). To do so directly under screen you press CTRL + a then
ESC or [ . You can use ESC to exit copy mode. Under
tmux you press CTRL + b then [ to enter copy mode and
] to exit.
The simplest solution, of course, is not to use either. I've found both to be quite a bit
more trouble than they are worth. If you would like to use multiple different terminals on a
remote machine simply connect with multiple instances of putty and manage your windows using,
er... Windows. Now forgive me but I must flee before I am burned at the stake for my
heresy.
EDIT: almost forgot, some keys may not be received correctly by the remote terminal if
putty has not been configured correctly. In your putty config check Terminal ->
Keyboard . You probably want the function keys and keypad set to be either
Linux or Xterm R6 . If you are seeing strange characters on the
terminal when attempting the above this is most likely the problem.
I am trying to backup my file server to a
remove file server using rsync. Rsync is not
successfully resuming when a transfer is
interrupted. I used the partial option but
rsync doesn't find the file it already
started because it renames it to a temporary
file and when resumed it creates a new file
and starts from beginning.
When this command is ran, a backup file
named
OldDisk.dmg
from my
local machine get created on the remote
machine as something like
.OldDisk.dmg.SjDndj23
.
Now when the internet connection gets
interrupted and I have to resume the
transfer, I have to find where rsync left
off by finding the temp file like
.OldDisk.dmg.SjDndj23
and rename it
to
OldDisk.dmg
so that it
sees there already exists a file that it can
resume.
How do I fix this so I don't have to
manually intervene each time?
TL;DR
: Use
--timeout=X
(X in seconds) to
change the default rsync server timeout,
not
--inplace
.
The issue
is the rsync server processes (of which
there are two, see
rsync --server
...
in
ps
output on
the receiver) continue running, to wait
for the rsync client to send data.
If the rsync server processes do not
receive data for a sufficient time, they
will indeed timeout, self-terminate and
cleanup by moving the temporary file to
it's "proper" name (e.g., no temporary
suffix). You'll then be able to resume.
If you don't want to wait for the
long default timeout to cause the rsync
server to self-terminate, then when your
internet connection returns, log into
the server and clean up the rsync server
processes manually. However, you
must politely terminate
rsync --
otherwise, it will not move the partial
file into place; but rather, delete it
(and thus there is no file to resume).
To politely ask rsync to terminate, do
not
SIGKILL
(e.g.,
-9
),
but
SIGTERM
(e.g.,
pkill -TERM -x rsync
- only an
example, you should take care to match
only the rsync processes concerned with
your client).
Fortunately there is an easier way:
use the
--timeout=X
(X in
seconds) option; it is passed to the
rsync server processes as well.
For example, if you specify
rsync ... --timeout=15 ...
, both
the client and server rsync processes
will cleanly exit if they do not
send/receive data in 15 seconds. On the
server, this means moving the temporary
file into position, ready for resuming.
I'm not sure of the default timeout
value of the various rsync processes
will try to send/receive data before
they die (it might vary with operating
system). In my testing, the server rsync
processes remain running longer than the
local client. On a "dead" network
connection, the client terminates with a
broken pipe (e.g., no network socket)
after about 30 seconds; you could
experiment or review the source code.
Meaning, you could try to "ride out" the
bad internet connection for 15-20
seconds.
If you do not clean up the server
rsync processes (or wait for them to
die), but instead immediately launch
another rsync client process, two
additional server processes will launch
(for the other end of your new client
process). Specifically, the new rsync
client
will not
re-use/reconnect to the existing rsync
server processes. Thus, you'll have two
temporary files (and four rsync server
processes) -- though, only the newer,
second temporary file has new data being
written (received from your new rsync
client process).
Interestingly, if you then clean up
all rsync server processes (for example,
stop your client which will stop the new
rsync servers, then
SIGTERM
the older rsync servers, it appears to
merge (assemble) all the partial files
into the new proper named file. So,
imagine a long running partial copy
which dies (and you think you've "lost"
all the copied data), and a short
running re-launched rsync (oops!).. you
can stop the second client,
SIGTERM
the first servers, it
will merge the data, and you can resume.
Finally, a few short remarks:
Don't use
--inplace
to workaround this. You will
undoubtedly have other problems as a
result,
man rsync
for
the details.
It's trivial, but
-t
in your rsync options is redundant,
it is implied by
-a
.
An already compressed disk image
sent over rsync
without
compression might result in shorter
transfer time (by avoiding double
compression). However, I'm unsure of
the compression techniques in both
cases. I'd test it.
As far as I understand
--checksum
/
-c
,
it won't help you in this case. It
affects how rsync decides if it
should
transfer a file. Though,
after a first rsync completes, you
could run a
second
rsync
with
-c
to insist on
checksums, to prevent the strange
case that file size and modtime are
the same on both sides, but bad data
was written.
I
didn't test how the
server-side rsync handles
SIGINT, so I'm not sure it
will keep the partial file -
you could check. Note that
this doesn't have much to do
with
Ctrl-c
; it
happens that your terminal
sends
SIGINT
to
the foreground process when
you press
Ctrl-c
,
but the server-side rsync
has no controlling terminal.
You must log in to the
server and use
kill
.
The client-side rsync will
not send a message to the
server (for example, after
the client receives
SIGINT
via your
terminal
Ctrl-c
)
- might be interesting
though. As for
anthropomorphizing, not sure
what's "politer". :-)
�
Richard
Michael
Dec 29 '13 at 22:34
I
just tried this timeout
argument
rsync -av
--delete --progress --stats
--human-readable --checksum
--timeout=60 --partial-dir /tmp/rsync/
rsync://$remote:/ /src/
but then it timed out during
the "receiving file list"
phase (which in this case
takes around 30 minutes).
Setting the timeout to half
an hour so kind of defers
the purpose. Any workaround
for this?
�
d-b
Feb 3 '15 at 8:48
@user23122
--checksum
reads all data when
preparing the file list,
which is great for many
small files that change
often, but should be done
on-demand for large files.
�
Cees
Timmerman
Sep 15 '15 at 17:10
How can I find which process is constantly writing to disk?
I like my workstation to be close to silent and I just build a new system (P8B75-M + Core i5 3450s -- the 's' because it has
a lower max TDP) with quiet fans etc. and installed Debian Wheezy 64-bit on it.
And something is getting on my nerve: I can hear some kind of pattern like if the hard disk was writing or seeking someting
( tick...tick...tick...trrrrrr rinse and repeat every second or so).
In the past I had a similar issue in the past (many, many years ago) and it turned out it was some CUPS log or something and
I simply redirected that one (not important) logging to a (real) RAM disk.
But here I'm not sure.
I tried the following:
ls -lR /var/log > /tmp/a.tmp && sleep 5 && ls -lR /var/log > /tmp/b.tmp && diff /tmp/?.tmp
but nothing is changing there.
Now the strange thing is that I also hear the pattern when the prompt asking me to enter my LVM decryption passphrase is showing.
Could it be something in the kernel/system I just installed or do I have a faulty harddisk?
hdparm -tT /dev/sda report a correct HD speed (130 GB/s non-cached, sata 6GB) and I've already installed and compiled
from big sources (Emacs) without issue so I don't think the system is bad.
Are you sure it's a hard drive making that noise, and not something else? (Check the fans, including PSU fan. Had very strange
clicking noises once when a very thin cable was too close to a fan and would sometimes very slightly touch the blades and bounce
for a few "clicks"...) � Mat
Jul 27 '12 at 6:03
@Mat: I'll take the hard drive outside of the case (the connectors should be long enough) to be sure and I'll report back ;) – Cedric Martin
Jul 27 '12 at 7:02
Make sure your disk filesystems are mounted relatime or noatime. File reads can be causing writes to inodes to record the access time. – camh
Jul 27 '12 at 9:48
Thanks for that tip. I didn't know about iotop. On Debian I did an apt-cache search iotop to find out that I had to apt-get install iotop. Very cool command! – Cedric Martin
Aug 2 '12 at 15:56
I use iotop -o -b -d 10 which every 10 secs prints a list of processes that read/wrote to disk and the amount of IO bandwidth used. – ndemou
Jun 20 '16 at 15:32
You can enable IO debugging via echo 1 > /proc/sys/vm/block_dump and then watch the debugging messages in /var/log/syslog
. This has the advantage of obtaining some type of log file with past activities whereas iotop only shows the current
activity.
It is absolutely crazy to leave syslogging enabled when block_dump is active. Logging causes disk activity, which causes logging, which causes disk activity, etc. Better to stop syslog before enabling this (and use dmesg to read the messages). – dan3
Jul 15 '13 at 8:32
You are absolutely right, although the effect isn't as dramatic as you describe it. If you just want to have a short peek at the disk activity there is no need to stop the syslog daemon. – scai
Jul 16 '13 at 6:32
I tried it about 2 years ago and it brought my machine to a halt. One of these days when I have nothing important running I'll try it again :) – dan3
Jul 16 '13 at 7:22
I tried it, nothing really happened, especially because of file system buffering. A write to syslog doesn't immediately trigger a write to disk. – scai
Jul 16 '13 at 10:50
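A minimal sketch of the block_dump workflow discussed above. The syslog service name is an assumption (rsyslog here); adjust it to whatever logger the system actually runs:
# stop syslog first so that logging itself does not generate disk writes
service rsyslog stop                 # assumption: rsyslog is the active syslog daemon
echo 1 > /proc/sys/vm/block_dump     # enable block I/O debugging
sleep 30                             # let some disk activity accumulate
dmesg | grep -E 'WRITE|dirtied'      # block_dump messages name the process and the block device
echo 0 > /proc/sys/vm/block_dump     # disable debugging again
service rsyslog start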
auditctl -S sync -S fsync -S fdatasync -a exit,always
Watch the logs in /var/log/audit/audit.log . Be careful not to do this if the audit logs themselves are flushed!
Check in /etc/auditd.conf that the flush option is set to none .
If files are being flushed often, a likely culprit is the system logs. For example, if you log failed incoming connection attempts
and someone is probing your machine, that will generate a lot of entries; this can cause a disk to emit machine gun-style noises.
With the basic log daemon sysklogd, check /etc/syslog.conf : if a log file name is not preceded by - , then that log is flushed to disk after each write.
It might be your drives automatically spinning down, lots of consumer-grade drives do that these days. Unfortunately on even a
lightly loaded system, this results in the drives constantly spinning down and then spinning up again, especially if you're running
hddtemp or similar to monitor the drive temperature (most drives stupidly don't let you query the SMART temperature value without
spinning up the drive - cretinous!).
I disable idle-spindown on all my drives with the following bit of shell code. You could put it in an /etc/rc.boot script, or in /etc/rc.local or similar.
for disk in /dev/sd? ; do
/sbin/hdparm -q -S 0 "$disk"
done
That you can't query SMART readings without spinning up the drive leaves me speechless :-/ Now obviously the "spinning down" issue can become quite complicated. Regarding disabling the spinning down: wouldn't that in itself cause the HD to wear out faster? I mean: it's never ever "resting" as long as the system is on then? – Cedric Martin
Aug 2 '12 at 16:03
IIRC you can query some SMART values without causing the drive to spin up, but temperature isn't one of them on any of the drives I've tested (incl. models from WD, Seagate, Samsung, Hitachi). Which is, of course, crazy, because concern over temperature is one of the reasons for idling a drive. Re: wear: AIUI 1. constant velocity is less wearing than changing speed; 2. the drives have to park the heads in a safe area and a drive is only rated to do that so many times (IIRC up to a few hundred thousand - easily exceeded if the drive is idling and spinning up every few seconds). – cas
Aug 2 '12 at 21:42
It's a long debate regarding whether it's better to leave drives running or to spin them down. Personally I believe it's best to leave them running - I turn my computer off at night and when I go out, but other than that I never spin my drives down. Some people prefer to spin them down, say, at night if they're leaving the computer on or if the computer's idle for a long time, and in such cases the advantage of spinning them down for a few hours versus leaving them running is debatable. What's never good, though, is when the hard drive repeatedly spins down and up again in a short period of time. – Micheal Johnson
Mar 12 '16 at 20:48
Note also that spinning the drive down after it's been idle for a few hours is a bit silly, because if it's been idle for a few hours then it's likely to be used again within an hour. In that case, it would seem better to spin the drive down promptly if it's idle (like, within 10 minutes), but it's also possible for the drive to be idle for a few minutes when someone is using the computer and is likely to need the drive again soon. – Micheal Johnson
Mar 12 '16 at 20:51
I just found that S.M.A.R.T. was causing an external USB disk to spin up again and again on my Raspberry Pi. Although SMART is generally a good thing, I decided to disable it again, and since then it seems that unwanted disk activity has stopped.
Using lsof (or some variant of parameters with lsof) I can determine which process is bound to a particular port. This is useful, say, if I'm trying to start something that wants to bind to 8080 and something else is already using that port, but I don't know what.
Is there an easy way to do this without using lsof? I spend time working on many systems and lsof is often not installed.
netstat -lnp will list the pid and process name next to each listening port. This will work under Linux, but not all other systems (like AIX). Add -t if you want TCP only.
# netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:24800 0.0.0.0:* LISTEN 27899/synergys
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN 3361/python
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 2264/mysqld
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 22964/apache2
tcp 0 0 192.168.99.1:53 0.0.0.0:* LISTEN 3389/named
tcp 0 0 192.168.88.1:53 0.0.0.0:* LISTEN 3389/named
etc.
Cool, thanks. Looks like that works under RHEL, but not under Solaris (as you indicated). Anybody know if there's something similar for Solaris? – user5721
Mar 14 '11 at 21:01
Thanks for this! Is there a way, however, to just display which process listens on the socket (instead of using rmsock, which attempts to remove it)? – Olivier Dulac
Sep 18 '13 at 4:05
@vitor-braga: Ah thx! I thought it was trying but just said which process holds it when it couldn't remove it. Apparently it doesn't even try to remove it when a process holds it. That's cool! Thx! – Olivier Dulac
Sep 26 '13 at 16:00
Another tool available on Linux is ss . From the ss man page on Fedora:
NAME
ss - another utility to investigate sockets
SYNOPSIS
ss [options] [ FILTER ]
DESCRIPTION
ss is used to dump socket statistics. It allows showing information
similar to netstat. It can display more TCP and state informations
than other tools.
Example output below - the final column shows the process binding:
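The example output referred to did not survive the copy; an illustrative run (the process names, PIDs, and ports below are reused from the netstat example above, and the exact column layout varies between ss versions) would look roughly like this:
$ ss -ltnp
State   Recv-Q  Send-Q  Local Address:Port   Peer Address:Port
LISTEN  0       128     127.0.0.1:3306       0.0.0.0:*        users:(("mysqld",pid=2264,fd=14))
LISTEN  0       128     0.0.0.0:80           0.0.0.0:*        users:(("apache2",pid=22964,fd=4))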
I was once faced with trying to determine what process was behind a particular port (this time it was 8000). I tried a variety
of lsof and netstat, but then took a chance and tried hitting the port via a browser (i.e.
http://hostname:8000/ ). Lo and behold, a splash screen greeted me, and it
became obvious what the process was (for the record, it was Splunk ).
One more thought: "ps -e -o pid,args" (YMMV) may sometimes show the port number in the arguments list. Grep is your friend!
In the same vein, you could telnet hostname 8000 and see if the server prints a banner. However, that's mostly useful when the server is running on a machine where you don't have shell access, and then finding the process ID isn't relevant. – Gilles
May 8 '11 at 14:45
Although this is very simple to read and write, it is a very slow solution because it forces you to read the same data ($STR) twice ... if you care about your script's performance, the @anubhava solution is much better – FSp
Nov 27 '12 at 10:26
Apart from being an ugly last-resort solution, this has a bug: You should absolutely use
double quotes in echo "$STR" unless you specifically want the shell to expand
any wildcards in the string as a side effect. See also stackoverflow.com/questions/10067266/
– tripleee
Jan 25 '16 at 6:47
You're right about double quotes of course, though I did point out this solution wasn't general. However I think your assessment is a bit unfair - for some people this solution may be more readable (and hence extensible etc.) than some others, and doesn't completely rely on an arcane bash feature that wouldn't translate to other shells. I suspect that's why my solution, though less elegant, continues to get votes periodically... – Rob I
Feb 10 '16 at 13:57
If you know it's going to be just two fields, you can skip the extra subprocesses like this:
var1=${STR%-*}
var2=${STR#*-}
What does this do? ${STR%-*} deletes the shortest substring of
$STR that matches the pattern -* starting from the end of the
string. ${STR#*-} does the same, but with the *- pattern and
starting from the beginning of the string. They each have counterparts %% and
## which find the longest anchored pattern match. If anyone has a
helpful mnemonic to remember which does which, let me know! I always have to try both to
remember.
Dunno about "absence of bashisms" considering that this is already moderately cryptic .... if
your delimiter is a newline instead of a hyphen, then it becomes even more cryptic. On the
other hand, it works with newlines , so there's that. – Steven Lu
May 1 '15 at 20:19
Mnemonic: "#" is to the left of "%" on a standard keyboard, so "#" removes a prefix (on the
left), and "%" removes a suffix (on the right). – DS.
Jan 13 '17 at 19:56
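A small worked example of the prefix/suffix deletion described above (the value of STR is illustrative, not from the original question):
STR="john-doe-example.com"
echo "${STR%-*}"    # john-doe        : % removes the shortest suffix matching '-*'
echo "${STR#*-}"    # doe-example.com : # removes the shortest prefix matching '*-'
echo "${STR%%-*}"   # john            : %% removes the longest suffix matching '-*'
echo "${STR##*-}"   # example.com     : ## removes the longest prefix matching '*-'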
I used triplee's example and it worked exactly as advertised! Just change the last two lines to myvar1=`echo $1` && myvar2=`echo $2` if you need to store them throughout a script with several "thrown" variables. – Sigg3.net
Jun 19 '13 at 8:08
This is a really sweet solution if we need to write something that is not Bash specific. To
handle IFS troubles, one can add OLDIFS=$IFS at the beginning
before overwriting it, and then add IFS=$OLDIFS just after the set
line. – Daniel Andersson
Mar 27 '15 at 6:46
Suppose I have the string 1:2:3:4:5 and I want to get its last field (
5 in this case). How do I do that using Bash? I tried cut , but I
don't know how to specify the last field with -f .
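The accepted answer is not reproduced in this copy; for reference, common ways to grab the last field (a sketch, using the example string from the question):
str="1:2:3:4:5"
echo "${str##*:}"                       # parameter expansion: drop everything up to the last ':'
echo "$str" | awk -F: '{print $NF}'     # awk: print the last field
echo "$str" | rev | cut -d: -f1 | rev   # cut: reverse, take the first field, reverse back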
While this is working for the given problem, the answer of William below ( stackoverflow.com/a/3163857/520162 )
also returns 5 if the string is 1:2:3:4:5: (while using the string
operators yields an empty result). This is especially handy when parsing paths that could
contain (or not) a finishing / character. – eckes
Jan 23 '13 at 15:23
And how does one keep the part before the last separator? Apparently by using
${foo%:*} . # - from beginning; % - from end.
# , % - shortest match; ## , %% - longest
match. – Mihai Danila
Jul 9 '14 at 14:07
This answer is nice because it uses 'cut', which the author is (presumably) already familiar with. Plus, I like this answer because I am using 'cut' and had this exact question, hence finding this thread via search. – Dannid
Jan 14 '13 at 20:50
great advantage of this solution over the accepted answer: it also matches paths that contain
or do not contain a finishing / character: /a/b/c/d and
/a/b/c/d/ yield the same result ( d ) when processing pwd |
awk -F/ '{print $NF}' . The accepted answer results in an empty result in the case of
/a/b/c/d/ – eckes
Jan 23 '13 at 15:20
@eckes In case of the AWK solution, on GNU bash, version 4.3.48(1)-release that's not true, as it matters whether you have a trailing slash or not. Simply put, AWK will use / as the delimiter, and if your path is /my/path/dir/ it will use the value after the last delimiter, which is simply an empty string. So it's best to avoid a trailing slash if you need to do such a thing like I do. – stamster
May 21 at 11:52
This runs into problems if there is whitespace in any of the fields. Also, it does not
directly address the question of retrieving the last field. – chepner
Jun 22 '12 at 12:58
There was a solution involving setting Internal_field_separator (IFS) to
; . I am not sure what happened with that answer, how do you reset
IFS back to default?
RE: IFS solution, I tried this and it works, I keep the old IFS
and then restore it:
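The snippet the comment refers to did not survive the copy; the save-and-restore pattern it describes is presumably along these lines (values illustrative):
IN="one;two;three"
OLDIFS="$IFS"
IFS=';'
read -ra ADDR <<< "$IN"   # split $IN on ';' into the array ADDR
IFS="$OLDIFS"             # restore the original IFS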
With regards to your "Edit2": You can simply "unset IFS" and it will return to the default
state. There's no need to save and restore it explicitly unless you have some reason to
expect that it's already been set to a non-default value. Moreover, if you're doing this
inside a function (and, if you aren't, why not?), you can set IFS as a local variable and it
will return to its previous value once you exit the function. – Brooks Moses
May 1 '12 at 1:26
@BrooksMoses: (a) +1 for using local IFS=... where possible; (b) -1 for
unset IFS , this doesn't exactly reset IFS to its default value, though I
believe an unset IFS behaves the same as the default value of IFS ($' \t\n'), however it
seems bad practice to be assuming blindly that your code will never be invoked with IFS set
to a custom value; (c) another idea is to invoke a subshell: (IFS=$custom; ...)
when the subshell exits IFS will return to whatever it was originally. – dubiousjim
May 31 '12 at 5:21
I just want to have a quick look at the paths to decide where to throw an executable, so I resorted to running ruby -e "puts ENV.fetch('PATH').split(':')" . If you want to stay pure bash it won't help, but using any scripting language that has a built-in split is easier. – nicooga
Mar 7 '16 at 15:32
This is kind of a drive-by comment, but since the OP used email addresses as the example, has
anyone bothered to answer it in a way that is fully RFC 5322 compliant, namely that any
quoted string can appear before the @ which means you're going to need regular expressions or
some other kind of parser instead of naive use of IFS or other simplistic splitter functions.
– Jeff
Apr 22 at 17:51
You can set the internal field separator (IFS)
variable, and then let it parse into an array. When this happens in a command, then the
assignment to IFS only takes place to that single command's environment (to
read ). It then parses the input according to the IFS variable
value into an array, which we can then iterate over.
IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
# process "$i"
done
It will parse one line of items separated by ; , pushing it into an array.
Stuff for processing whole of $IN , each time one line of input separated by
; :
while IFS=';' read -ra ADDR; do
for i in "${ADDR[@]}"; do
# process "$i"
done
done <<< "$IN"
This is probably the best way. How long will IFS persist in it's current value, can it mess
up my code by being set when it shouldn't be, and how can I reset it when I'm done with it?
– Chris
Lutz
May 28 '09 at 2:25
You can read everything at once without using a while loop: read -r -d '' -a addr
<<< "$in" # The -d '' is key here, it tells read not to stop at the first newline
(which is the default -d) but to continue until EOF or a NULL byte (which only occur in
binary data). – lhunath
May 28 '09 at 6:14
@LucaBorrione Setting IFS on the same line as the read with no
semicolon or other separator, as opposed to in a separate command, scopes it to that command
-- so it's always "restored"; you don't need to do anything manually. – Charles Duffy
Jul 6 '13 at 14:39
@imagineerThis There is a bug involving herestrings and local changes to IFS that requires
$IN to be quoted. The bug is fixed in bash 4.3. – chepner
Oct 2 '14 at 3:50
This construction replaces all occurrences of ';' (the initial
// means global replace) in the string IN with ' ' (a
single space), then interprets the space-delimited string as an array (that's what the
surrounding parentheses do).
The syntax used inside of the curly braces to replace each ';' character with
a ' ' character is called Parameter
Expansion .
There are some common gotchas:
If the original string has spaces, you will need to use
IFS :
IFS=':'; arrIN=($IN); unset IFS;
If the original string has spaces and the delimiter is a new line, you can set
IFS with:
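The assignment itself is missing from this copy; judging from the rest of the thread it is presumably the newline form:
IFS=$'\n'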
I just want to add: this is the simplest of all, you can access array elements with
${arrIN[1]} (starting from zeros of course) – Oz123
Mar 21 '11 at 18:50
No, I don't think this works when there are also spaces present... it's converting the ',' to
' ' and then building a space-separated array. – Ethan
Apr 12 '13 at 22:47
This is a bad approach for other reasons: For instance, if your string contains
;*; , then the * will be expanded to a list of filenames in the
current directory. -1 – Charles Duffy
Jul 6 '13 at 14:39
You should have kept the IFS answer. It taught me something I didn't know, and it definitely
made an array, whereas this just makes a cheap substitute. – Chris Lutz
May 28 '09 at 2:42
I see. Yeah, I find that by doing these silly experiments I'm going to learn new things each time I'm trying to answer things. I've edited stuff based on #bash IRC feedback and undeleted :) – Johannes Schaub - litb
May 28 '09 at 2:59
-1, you're obviously not aware of wordsplitting, because it's introducing two bugs in your code. One is when you don't quote $IN and the other is when you pretend a newline is the only delimiter used in wordsplitting. You are iterating over every WORD in IN, not every line, and DEFINITELY not every element delimited by a semicolon, though it may appear to have the side-effect of looking like it works. – lhunath
May 28 '09 at 6:12
You could change it to echo "$IN" | tr ';' '\n' | while read -r ADDY; do # process "$ADDY";
done to make him lucky, i think :) Note that this will fork, and you can't change outer
variables from within the loop (that's why i used the <<< "$IN" syntax) then –
Johannes
Schaub - litb
May 28 '09 at 17:00
To summarize the debate in the comments: Caveats for general use: the shell applies word splitting and expansions to the string, which may be undesired; just try it with IN="[email protected];[email protected];*;broken apart" . In short: this approach will break if your tokens contain embedded spaces and/or chars such as * that happen to make a token match filenames in the current folder. – mklement0
Apr 24 '13 at 14:13
For this SO question, there are already a lot of different ways to do this in bash . But bash has many special features, so-called bashisms , that work well but won't work in any other shell .
In particular, arrays , associative arrays , and pattern substitution are pure bashisms and may not work under other shells .
On my Debian GNU/Linux , there is a standard shell called dash ; I also know many people who like to use ksh .
Finally, in very small situations, there is a special tool called busybox with its own shell interpreter ( ash ).
But if you want to write something usable under many shells, you have to avoid bashisms .
There is a syntax, used in many shells, for splitting a string across first or
last occurrence of a substring:
${var#*SubStr} # will drop begin of string up to first occur of `SubStr`
${var##*SubStr} # will drop begin of string up to last occur of `SubStr`
${var%SubStr*} # will drop part of string from last occur of `SubStr` to the end
${var%%SubStr*} # will drop part of string from first occur of `SubStr` to the end
(The absence of this syntax from the other answers is the main reason for posting my answer ;)
The # , ## , % , and %% substitutions
have what is IMO an easier explanation to remember (for how much they delete): #
and % delete the shortest possible matching string, and ## and
%% delete the longest possible. – Score_Under
Apr 28 '15 at 16:58
The IFS=\; read -a fields <<<"$var" approach fails on newlines and adds a trailing newline. The other solution removes a trailing empty field. – sorontar
Oct 26 '16 at 4:36
Could the last alternative be used with a list of field separators set somewhere else? For
instance, I mean to use this as a shell script, and pass a list of field separators as a
positional parameter. – sancho.s
Oct 4 at 3:42
I've seen a couple of answers referencing the cut command, but they've all been
deleted. It's a little odd that nobody has elaborated on that, because I think it's one of
the more useful commands for doing this type of thing, especially for parsing delimited log
files.
In the case of splitting this specific example into a bash script array, tr
is probably more efficient, but cut can be used, and is more effective if you
want to pull specific fields from the middle.
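The example did not survive the copy; a sketch of the cut-based approach (delimiter and values illustrative):
IN="one;two;three"
echo "$IN" | cut -d';' -f1    # one        (first field)
echo "$IN" | cut -d';' -f2    # two        (second field)
echo "$IN" | cut -d';' -f2-   # two;three  (from the second field to the end)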
This approach will only work if you know the number of elements in advance; you'd need to
program some more logic around it. It also runs an external tool for every element. –
uli42
Sep 14 '17 at 8:30
Exactly what I was looking for, trying to avoid empty strings in a CSV. Now I can point to the exact 'column' value as well. Works with IFS already used in a loop. Better than expected for my situation. – Louis Loudog Trottier
May 10 at 4:20
, May 28, 2009 at 10:31
How about this approach:
IN="[email protected];[email protected]"
set -- "$IN"
IFS=";"; declare -a Array=($*)
echo "${Array[@]}"
echo "${Array[0]}"
echo "${Array[1]}"
+1 Only a side note: shouldn't it be recommendable to keep the old IFS and then restore it?
(as shown by stefanB in his edit3) people landing here (sometimes just copying and pasting a
solution) might not think about this – Luca Borrione
Sep 3 '12 at 9:26
-1: First, @ata is right that most of the commands in this do nothing. Second, it uses
word-splitting to form the array, and doesn't do anything to inhibit glob-expansion when
doing so (so if you have glob characters in any of the array elements, those elements are
replaced with matching filenames). – Charles Duffy
Jul 6 '13 at 14:44
I suggest using $'...' : IN=$'[email protected];[email protected];bet <d@\ns* kl.com>' . Then echo "${Array[2]}" will print a string with a newline. set -- "$IN" is also necessary in this case. Yes, to prevent glob expansion, the solution should include set -f . – John_West
Jan 8 '16 at 12:29
-1 what if the string contains spaces? for example IN="this is first line; this
is second line" arrIN=( $( echo "$IN" | sed -e 's/;/\n/g' ) ) will produce an array of
8 elements in this case (an element for each word space separated), rather than 2 (an element
for each line semi colon separated) – Luca Borrione
Sep 3 '12 at 10:08
@Luca No the sed script creates exactly two lines. What creates the multiple entries for you
is when you put it into a bash array (which splits on white space by default) –
lothar
Sep 3 '12 at 17:33
That's exactly the point: the OP needs to store entries into an array to loop over it, as you can see in his edits. I think your (good) answer failed to mention using arrIN=( $( echo "$IN" | sed -e 's/;/\n/g' ) ) to achieve that, and to advise changing IFS to IFS=$'\n' for those who land here in the future and need to split a string containing spaces (and to restore it back afterwards). :) – Luca Borrione
Sep 4 '12 at 7:09
You can use -s to avoid the mentioned problem: superuser.com/questions/896800/
"-f, --fields=LIST select only these fields; also print any line that contains no delimiter
character, unless the -s option is specified" – fersarr
Mar 3 '16 at 17:17
It worked in this scenario -> "echo "$SPLIT_0" | awk -F' inode=' '{print $1}'"! I had problems when trying to use strings (" inode=") instead of characters (";"). $1, $2, $3, $4 are set as positions in an array! If there is a way of setting an array... better! Thanks! – Eduardo Lucio
Aug 5 '15 at 12:59
@EduardoLucio, what I'm thinking about is maybe you can first replace your delimiter
inode= into ; for example by sed -i 's/inode\=/\;/g'
your_file_to_process , then define -F';' when apply awk ,
hope that can help you. – Tony
Aug 6 '15 at 2:42
This worked REALLY well for me... I used it to iterate over an array of strings which contained comma-separated DB,SERVER,PORT data to use with mysqldump. – Nick
Oct 28 '11 at 14:36
Diagnosis: the IFS=";" assignment exists only in the $(...; echo
$IN) subshell; this is why some readers (including me) initially think it won't work.
I assumed that all of $IN was getting slurped up by ADDR1. But nickjb is correct; it does
work. The reason is that echo $IN command parses its arguments using the current
value of $IFS, but then echoes them to stdout using a space delimiter, regardless of the
setting of $IFS. So the net effect is as though one had called read ADDR1 ADDR2
<<< "[email protected][email protected]" (note the input is space-separated not
;-separated). – dubiousjim
May 31 '12 at 5:28
$ in=$'one;two three;*;there is\na newline\nin this field'
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two three" [2]="*" [3]="there is
a newline
in this field")'
The trick for this to work is to use the -d option of read (delimiter) with an empty delimiter, so that read is forced to read everything it's fed. And we feed read with exactly the content of the variable in , with no trailing newline, thanks to printf . Note that we're also putting the delimiter in printf to ensure that the string passed to read has a trailing delimiter. Without it, read would trim potential trailing empty fields:
$ in='one;two;three;' # there's an empty field
$ IFS=';' read -d '' -ra array < <(printf '%s;\0' "$in")
$ declare -p array
declare -a array='([0]="one" [1]="two" [2]="three" [3]="")'
the trailing empty field is preserved.
Update for Bash≥4.4
Since Bash 4.4, the builtin mapfile (aka readarray ) supports
the -d option to specify a delimiter. Hence another canonical way is:
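The command itself is missing from this copy; with Bash 4.4's delimiter support it presumably looks something like the following (variable and array names as in the surrounding example):
mapfile -d ';' -t array < <(printf '%s;' "$in")
declare -p array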
I found it as the rare solution on that list that works correctly with \n ,
spaces and * simultaneously. Also, no loops; array variable is accessible in the
shell after execution (contrary to the highest upvoted answer). Note, in=$'...'
, it does not work with double quotes. I think, it needs more upvotes. – John_West
Jan 8 '16 at 12:10
Consider using read -r ... to ensure that, for example, the two characters "\t"
in the input end up as the same two characters in your variables (instead of a single tab
char). – dubiousjim
May 31 '12 at 5:36
This is probably due to a bug involving IFS and here strings that was fixed in
bash 4.3. Quoting $IN should fix it. (In theory, $IN
is not subject to word splitting or globbing after it expands, meaning the quotes should be
unnecessary. Even in 4.3, though, there's at least one bug remaining--reported and scheduled
to be fixed--so quoting remains a good idea.) – chepner
Sep 19 '15 at 13:59
The following Bash/zsh function splits its first argument on the delimiter given by the
second argument:
split() {
    local string="$1"
    local delimiter="$2"
    if [ -n "$string" ]; then
        local part
        while read -d "$delimiter" part; do
            echo "$part"
        done <<< "$string"
        echo "$part"
    fi
}
For instance, the command
$ split 'a;b;c' ';'
yields
a
b
c
This output may, for instance, be piped to other commands. Example:
$ split 'a;b;c' ';' | cat -n
1 a
2 b
3 c
Compared to the other solutions given, this one has the following advantages:
IFS is not overridden: Due to dynamic scoping of even local variables, overriding IFS over a loop causes the new value to leak into function calls performed from within the loop.
Arrays are not used: Reading a string into an array using read requires
the flag -a in Bash and -A in zsh.
If desired, the function may be put into a script as follows:
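The script version is not included in this copy; presumably it is just the function above plus a call on the positional parameters, e.g. (file name hypothetical):
#!/usr/bin/env bash
# split.sh -- split the first argument on the delimiter given as the second argument
split() {
    local string="$1"
    local delimiter="$2"
    local part
    if [ -n "$string" ]; then
        while read -d "$delimiter" part; do
            echo "$part"
        done <<< "$string"
        echo "$part"
    fi
}
split "$1" "$2"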
There are some cool answers here (errator esp.), but for something analogous to split in
other languages -- which is what I took the original question to mean -- I settled on this:
Now ${a[0]} , ${a[1]} , etc, are as you would expect. Use
${#a[*]} for number of terms. Or to iterate, of course:
for i in ${a[*]}; do echo $i; done
IMPORTANT NOTE:
This works in cases where there are no spaces to worry about, which solved my problem, but
may not solve yours. Go with the $IFS solution(s) in that case.
Better use ${IN//;/ } (double slash) to make it also work with more than two
values. Beware that any wildcard ( *?[ ) will be expanded. And a trailing empty
field will be discarded. – sorontar
Oct 26 '16 at 5:14
Better use set -- $IN to avoid some issues with "$IN" starting with dash. Still,
the unquoted expansion of $IN will expand wildcards ( *?[ ).
– sorontar
Oct 26 '16 at 5:17
In both cases a sub-list can be composed within the loop and is persistent after the loop has completed. This is useful when manipulating lists in memory instead of storing lists in files. {p.s. keep calm and carry on B-) }
Fails if any part of $PATH contains spaces (or newlines). Also expands wildcards (asterisk *,
question mark ? and braces [ ]). – sorontar
Oct 26 '16 at 5:08
FYI, /etc/os-release and /etc/lsb-release are meant to be sourced, and not parsed. So your method is really wrong. Moreover, you're not quite answering the question about splitting a string on a delimiter. – gniourf_gniourf
Jan 30 '17 at 8:26
-1 this doesn't work here (ubuntu 12.04). it prints only the first echo with all $IN value in
it, while the second is empty. you can see it if you put echo "0: "${ADDRS[0]}\n echo "1:
"${ADDRS[1]} the output is 0: [email protected];[email protected]\n 1: (\n is new line)
– Luca
Borrione
Sep 3 '12 at 10:04
-1, 1. IFS isn't being set in that subshell (it's being passed to the environment of "echo",
which is a builtin, so nothing is happening anyway). 2. $IN is quoted so it
isn't subject to IFS splitting. 3. The process substitution is split by whitespace, but this
may corrupt the original data. – Score_Under
Apr 28 '15 at 17:09
IN='[email protected];[email protected];Charlie Brown <[email protected];!"#$%&/()[]{}*? are no problem;simple is beautiful :-)'
set -f
oldifs="$IFS"
IFS=';'; arrayIN=($IN)
IFS="$oldifs"
for i in "${arrayIN[@]}"; do
echo "$i"
done
set +f
Explanation: Simple assignment using parenthesis () converts semicolon separated list into
an array provided you have correct IFS while doing that. Standard FOR loop handles individual
items in that array as usual. Notice that the list given for IN variable must be "hard"
quoted, that is, with single ticks.
IFS must be saved and restored since Bash does not treat an assignment the same way as a
command. An alternate workaround is to wrap the assignment inside a function and call that
function with a modified IFS. In that case separate saving/restoring of IFS is not needed.
Thanks for "Bize" for pointing that out.
!"#$%&/()[]{}*? are no problem well... not quite: []*? are glob
characters. So what about creating this directory and file: `mkdir '!"#$%&'; touch
'!"#$%&/()[]{} got you hahahaha - are no problem' and running your command? simple may be
beautiful, but when it's broken, it's broken. – gniourf_gniourf
Feb 20 '15 at 16:45
@ajaaskel you didn't fully understand my comment. Go in a scratch directory and issue these
commands: mkdir '!"#$%&'; touch '!"#$%&/()[]{} got you hahahaha - are no
problem' . They will only create a directory and a file, with weird looking names, I
must admit. Then run your commands with the exact IN you gave:
IN='[email protected];[email protected];Charlie Brown <[email protected];!"#$%&/()[]{}*?
are no problem;simple is beautiful :-)' . You'll see that you won't get the output you
expect. Because you're using a method subject to pathname expansions to split your string.
– gniourf_gniourf
Feb 25 '15 at 7:26
This is to demonstrate that the characters * , ? ,
[...] and even, if extglob is set, !(...) ,
@(...) , ?(...) , +(...)are problems with this
method! – gniourf_gniourf
Feb 25 '15 at 7:29
@gniourf_gniourf Thanks for detailed comments on globbing. I adjusted the code to have
globbing off. My point was however just to show that rather simple assignment can do the
splitting job. – ajaaskel
Feb 26 '15 at 15:26
> , Dec 19, 2013 at 21:39
Maybe not the most elegant solution, but works with * and spaces:
IN="bla@so me.com;*;[email protected]"
for i in `delims=${IN//[^;]}; seq 1 $((${#delims} + 1))`
do
echo "> [`echo $IN | cut -d';' -f$i`]"
done
Basically it removes every character other than ; making delims
eg. ;;; . Then it does for loop from 1 to
number-of-delimiters as counted by ${#delims} . The final step is
to safely get the $i th part using cut .
I normally compress using tar zcvf and decompress using tar zxvf
(using gzip due to habit).
I've recently gotten a quad core CPU with hyperthreading, so I have 8 logical cores, and I
notice that many of the cores are unused during compression/decompression.
Is there any way I can utilize the unused cores to make it faster?
The solution proposed by Xiong Chiamiov above works beautifully. I had just backed up my
laptop with .tar.bz2 and it took 132 minutes using only one cpu thread. Then I compiled and
installed tar from source: gnu.org/software/tar I included the options mentioned
in the configure step: ./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip I
ran the backup again and it took only 32 minutes. That's better than 4X improvement! I
watched the system monitor and it kept all 4 cpus (8 threads) flatlined at 100% the whole
time. THAT is the best solution. – Warren Severin
Nov 13 '17 at 4:37
You can use pigz instead of gzip, which
does gzip compression on multiple cores. Instead of using the -z option, you would pipe it
through pigz:
tar cf - paths-to-archive | pigz > archive.tar.gz
By default, pigz uses the number of available cores, or eight if it could not query that.
You can ask for more with -p n, e.g. -p 32. pigz has the same options as gzip, so you can
request better compression with -9. E.g.
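The example that followed "E.g." is missing from this copy; it would presumably be along the lines of:
tar cf - paths-to-archive | pigz -9 -p 32 > archive.tar.gz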
pigz does use multiple cores for decompression, but only with limited improvement over a
single core. The deflate format does not lend itself to parallel decompression. The
decompression portion must be done serially. The other cores for pigz decompression are used
for reading, writing, and calculating the CRC. When compressing on the other hand, pigz gets
close to a factor of n improvement with n cores. – Mark Adler
Feb 20 '13 at 16:18
There is effectively no CPU time spent tarring, so it wouldn't help much. The tar format is
just a copy of the input file with header blocks in between files. – Mark Adler
Apr 23 '15 at 5:23
This is an awesome little nugget of knowledge and deserves more upvotes. I had no idea this
option even existed and I've read the man page a few times over the years. – ranman
Nov 13 '13 at 10:01
Unfortunately by doing so the concurrent feature of pigz is lost. You can see for yourself by
executing that command and monitoring the load on each of the cores. – Valerio
Schiavoni
Aug 5 '14 at 22:38
I prefer tar cf - dir_to_zip | pv | pigz > tar.file ; pv helps me estimate, you can skip it. But still it is easier to write and remember. – Offenso
Jan 11 '17 at 17:26
-I, --use-compress-program PROG
filter through PROG (must accept -d)
You can use multithread version of archiver or compressor utility.
Most popular multithread archivers are pigz (instead of gzip) and pbzip2 (instead of bzip2). For instance:
$ tar -I pbzip2 -cf OUTPUT_FILE.tar.bz2 paths_to_archive
$ tar --use-compress-program=pigz -cf OUTPUT_FILE.tar.gz paths_to_archive
Archiver must accept -d. If your replacement utility doesn't have this parameter and/or you need to specify additional parameters, then use pipes (add parameters if necessary):
$ tar cf - paths_to_archive | pbzip2 > OUTPUT_FILE.tar.bz2
$ tar cf - paths_to_archive | pigz > OUTPUT_FILE.tar.gz
Input and output of singlethread and multithread are compatible. You can compress using
multithread version and decompress using singlethread version and vice versa.
p7zip
For p7zip for compression you need a small shell script like the following:
#!/bin/sh
case $1 in
-d) 7za -txz -si -so e;;
*) 7za -txz -si -so a .;;
esac 2>/dev/null
Save it as 7zhelper.sh. Here the example of usage:
$ tar -I 7zhelper.sh -cf OUTPUT_FILE.tar.7z paths_to_archive
$ tar -I 7zhelper.sh -xf OUTPUT_FILE.tar.7z
xz
Regarding multithreaded XZ support: if you are running version 5.2.0 or above of XZ Utils, you can utilize multiple cores for compression by setting -T or --threads to an appropriate value via the environment variable XZ_DEFAULTS (e.g. XZ_DEFAULTS="-T 0" ).
This is a fragment of man for 5.1.0alpha version:
Multithreaded compression and decompression are not implemented yet, so this option has
no effect for now.
However this will not work for decompression of files that haven't also been compressed
with threading enabled. From man for version 5.2.2:
Threaded decompression hasn't been implemented yet. It will only work on files that
contain multiple blocks with size information in block headers. All files compressed in
multi-threaded mode meet this condition, but files compressed in single-threaded mode don't
even if --block-size=size is used.
Recompiling with replacement
If you build tar from sources, then you can recompile with parameters
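The configure parameters are not shown here; they are presumably the same ones quoted earlier in this thread:
./configure --with-gzip=pigz --with-bzip2=lbzip2 --with-lzip=plzip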
After recompiling tar with these options you can check the output of tar's help:
$ tar --help | grep "lbzip2\|plzip\|pigz"
-j, --bzip2 filter the archive through lbzip2
--lzip filter the archive through plzip
-z, --gzip, --gunzip, --ungzip filter the archive through pigz
> , Apr 28, 2015 at 20:41
This is indeed the best answer. I'll definitely rebuild my tar! – user1985657
Apr 28 '15 at 20:41
I just found pbzip2 and
mpibzip2 . mpibzip2 looks very
promising for clusters or if you have a laptop and a multicore desktop computer for instance.
– user1985657
Apr 28 '15 at 20:57
This is a great and elaborate answer. It may be good to mention that multithreaded
compression (e.g. with pigz ) is only enabled when it reads from the file.
Processing STDIN may in fact be slower. – oᴉɹǝɥɔ
Jun 10 '15 at 17:39
find /my/path/ -type f -name "*.sql" -o -name "*.log" -exec
This command will look for the files you want to archive, in this case
/my/path/*.sql and /my/path/*.log . Add as many -o -name
"pattern" as you want.
-exec will execute the next command using the results of find :
tar
Step 2: tar
tar -P --transform='s@/my/path/@@g' -cf - {} +
--transform is a simple string replacement parameter. It will strip the path
of the files from the archive so the tarball's root becomes the current directory when
extracting. Note that you can't use -C option to change directory as you'll lose
benefits of find : all files of the directory would be included.
-P tells tar to use absolute paths, so it doesn't trigger the warning "Removing leading `/' from member names". The leading '/' will be removed by --transform anyway.
-cf - tells tar to write the archive to standard output; the tarball name is specified later, after the compression stage.
{} + uses every file that find found previously
Step 3:
pigz
pigz -9 -p 4
Use as many parameters as you want. In this case -9 is the compression level
and -p 4 is the number of cores dedicated to compression. If you run this on a
heavy loaded webserver, you probably don't want to use all available cores.
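Putting the three steps together (a sketch; the output file name is arbitrary, and the parentheses around the -name tests are added so that -type f applies to both patterns):
find /my/path/ -type f \( -name "*.sql" -o -name "*.log" \) \
    -exec tar -P --transform='s@/my/path/@@g' -cf - {} + | pigz -9 -p 4 > archive.tar.gz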
I would like to find all the matches of the text I have in one file ('file1.txt')
that are found in another file ('file2.txt') using the grep option -f, that tells to read the
expressions to be found from file.
'file1.txt'
a
a
'file2.txt'
a
When I run the command:
grep -f file1.txt file2.txt -w
I get the output 'a' only once; instead I would like to get it twice, because it occurs twice in my 'file1.txt' file. Is there a way to let grep (or any other unix/linux tool) output a match for each line it reads? Thanks in advance. Arturo
I understand that, but still I would like to find a way to print a match each time a pattern
(even a repeated one) from 'pattern.txt' is found in 'file.txt'. Even a tool or a script
rather then 'grep -f' would suffice. – Arturo
Mar 24 '17 at 9:17
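A minimal sketch of one way to get what the comment asks for: loop over the pattern file so that every (possibly repeated) pattern line produces its own grep output (file names as in the question):
while IFS= read -r pat; do
    grep -w -- "$pat" file2.txt
done < file1.txt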
To change two vertically split windows to a horizontal split: Ctrl-W t then Ctrl-W K
Horizontally to vertically: Ctrl-W t then Ctrl-W H
Explanations:
Ctrl-W t -- makes the first (topleft) window current
Ctrl-W K -- moves the current window to full-width at the very top
Ctrl-W H -- moves the current window to full-height at far left
Note that the t is lowercase, and the K and H are uppercase.
Also, with only two windows, it seems like you can drop the Ctrl-W t part, because if you're already in one of only two windows, what's the point of making it current?
Just toggle your NERDTree panel closed before 'rotating' the splits, then toggle it
back open. :NERDTreeToggle (I have it mapped to a function key for convenience).
The command ^W-o is great! I did not know it. –
Masi Aug 13 '09 at 2:20
The following ex commands will (re-)split
any number of windows:
To split vertically (e.g. make vertical dividers between windows), type :vertical
ball
To split horizontally, type :ball
If there are hidden buffers, issuing these commands will also make the hidden buffers
visible.
This is very ugly, but hey, it seems to do in one step exactly what I asked for (I tried). +1, and accepted. I was looking for a native way to do this quickly but since there does not seem to be one, yours will do just fine. Thanks! – greg0ire
Jan 23 '13 at 15:27
You're right, "very ugly" should have been "very unfamiliar". Your command is very handy, and I think I'm definitely going to carve it into my .vimrc – greg0ire
Jan 23 '13 at 16:21
By "move a piece of text to a new file" I assume you mean cut that piece of text from the current file and create a new file containing
only that text.
Various examples:
:1,1 w new_file to create a new file containing only the text from line number 1
:5,50 w newfile to create a new file containing the text from line 5 to line 50
:'a,'b w newfile to create a new file containing the text from mark a to mark b
set your marks by using ma and mb where ever you like
The above only copies the text and creates a new file containing that text. You will then need to delete afterward.
This can be done using the same range and the d command:
:5,50 d to delete the text from line 5 to line 50
:'a,'b d to delete the text from mark a to mark b
Or by using dd for the single line case.
If you instead select the text using visual mode, and then hit : while the text is selected, you will see the
following on the command line:
:'<,'>
Which indicates the selected text. You can then expand the command to:
:'<,'>w >> old_file
Which will append the text to an existing file. Then delete as above.
One liner:
:2,3 d | new +put! "
The breakdown:
:2,3 d - delete lines 2 through 3
| - technically this redirects the output of the first command to the second command but since the first command
doesn't output anything, we're just chaining the commands together
new - opens a new buffer
+put! " - put the contents of the unnamed register ( " ) into the buffer
The bang ( ! ) is there so that the contents are put before the current line. This causes an empty line at the end of the file. Without it, there is an empty line at the top of the file.
Your assumption is right. This looks good, I'm going to test. Could you explain 2. a bit more? I'm not very familiar with ranges. EDIT: If I try this on the second line, it writes the first line to the other file, not the second line. – greg0ire
Jan 23 '13 at 14:09
Ok, if I understand well, the trick is to use ranges to select and write in the same command. That's very similar to what I did. +1 for the detailed explanation, but I don't think this is more efficient, since the trick with hitting ':' is what I do for the moment. – greg0ire
Jan 23 '13 at 14:41
I have 4 steps for the moment: select, write, select, delete. With your method, I have 6 steps: select, delete, split, paste, write, close. I asked for something more efficient :P – greg0ire
Jan 23 '13 at 13:42
That's better, but 5 still > 4 :P – greg0ire
Jan 23 '13 at 13:46
Based on @embedded.kyle's answer and this Q&A , I ended
up with this one liner to append a selection to a file and delete from current file. After selecting some lines with Shift+V
, hit : and run:
'<,'>w >> test | normal gvd
The first part appends selected lines. The second command enters normal mode and runs gvd to select the last selection
and then deletes.
I want to clean this up but I am worried because of the symlinks, which point to another
drive.
If I say rm -rf /home3 will it delete the other drive?
John Sui
rm -rf /home3 will delete all files and directory within home3 and
home3 itself, which include symlink files, but will not "follow"(de-reference)
those symlink.
Put it in other words: those symlink files will be deleted. The files they "point"/"link" to will not be touched.
$ ls -l
total 899166
drwxr-xr-x 12 me scicomp 324 Jan 24 13:47 data
-rw-r--r-- 1 me scicomp 84188 Jan 24 13:47 lod-thin-1.000000-0.010000-0.030000.rda
drwxr-xr-x 2 me scicomp 808 Jan 24 13:47 log
lrwxrwxrwx 1 me scicomp 17 Jan 25 09:41 msg -> /home/me/msg
And I want to remove it using rm -r .
However I'm scared rm -r will follow the symlink and delete everything in
that directory (which is very bad).
I can't find anything about this in the man pages. What would be the exact behavior of
running rm -rf from a directory above this one?
@frnknstn You are right. I see the same behaviour you mention on my latest Debian system. I
don't remember on which version of Debian I performed the earlier experiments. In my earlier
experiments on an older version of Debian, either a.txt must have survived in the third
example or I must have made an error in my experiment. I have updated the answer with the
current behaviour I observe on Debian 9 and this behaviour is consistent with what you
mention. – Susam
Pal
Sep 11 '17 at 15:20
Your /home/me/msg directory will be safe if you rm -rf the directory from which you ran ls.
Only the symlink itself will be removed, not the directory it points to.
The only thing I would be cautious of, would be if you called something like "rm -rf msg/"
(with the trailing slash.) Do not do that because it will remove the directory that msg
points to, rather than the msg symlink itself.
> ,Jan 25, 2012 at 16:54
"The only thing I would be cautious of, would be if you called something like "rm -rf msg/"
(with the trailing slash.) Do not do that because it will remove the directory that msg
points to, rather than the msg symlink itself." - I don't find this to be true. See the third
example in my response below. – Susam Pal
Jan 25 '12 at 16:54
I get the same result as @Susam ('rm -r symlink/' does not delete the target of symlink),
which I am pleased about as it would be a very easy mistake to make. – Andrew Crabb
Nov 26 '13 at 21:52
rm should remove files and directories. If the file is a symbolic link, the link is removed, not the target. rm will not dereference a symbolic link. For example, consider the behavior when deleting a 'broken link': rm exits with 0, not with a non-zero status to indicate failure.
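A small sketch that demonstrates the behaviour described above (the paths are throwaway examples):
mkdir -p /tmp/rmtest/target /tmp/rmtest/dir
touch /tmp/rmtest/target/keep.txt
ln -s /tmp/rmtest/target /tmp/rmtest/dir/link
rm -rf /tmp/rmtest/dir          # removes dir and the symlink inside it
ls /tmp/rmtest/target           # keep.txt is still there: the target was not followed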
You are talking about text selecting and copying, I think that you should give a look to the
Vim Visual Mode .
In the visual mode, you are able to select text using Vim commands, then you can do
whatever you want with the selection.
Consider the following common scenarios:
You need to select to the next matching parenthesis.
You could do:
v% if the cursor is on the starting/ending parenthesis
vib if the cursor is inside the parenthesis block
You want to select text between quotes:
vi" for double quotes
vi' for single quotes
You want to select a curly brace block (very common on C-style languages):
viB
vi{
You want to select the entire file:
ggVG
Visual
block selection is another really useful feature, it allows you to select a rectangular
area of text, you just have to press Ctrl - V to start it, and then
select the text block you want and perform any type of operation such as yank, delete, paste,
edit, etc. It's great to edit column oriented text.
I have two files, say a.txt and b.txt , in the same session of vim
and I split the screen so I have file a.txt in the upper window and
b.txt in the lower window.
I want to move lines here and there from a.txt to b.txt : I
select a line with Shift + v , then I move to b.txt in the
lower window with Ctrl + w↓ , paste with p
, get back to a.txt with Ctrl + w↑ and I
can repeat the operation when I get to another line I want to move.
My question: is there a quicker way to say vim "send the line I am on (or the test I
selected) to the other window" ?
I presume that you're deleting the line that you've selected in a.txt . If not,
you'd be pasting something else into b.txt . If so, there's no need to select
the line first. – Anthony Geoghegan
Nov 24 '15 at 13:00
This sounds like a good use case for a macro. Macros are commands that can be recorded and
stored in a Vim register. Each register is identified by a letter from a to z.
Recording
To start recording, press q in Normal mode followed by a letter (a to z).
That starts recording keystrokes to the specified register. Vim displays
"recording" in the status line. Type any Normal mode commands, or enter Insert
mode and type text. To stop recording, again press q while in Normal mode.
For this particular macro, I chose the m (for move) register to store it.
I pressed qm to record the following commands:
dd to delete the current line (and save it to the default register)
CtrlWj to move to the window below
p to paste the contents of the default register
and CtrlWk to return to the window above.
When I typed q to finish recording the macro, the contents of the
m register were:
dd^Wjp^Wk
Usage
To move the current line, simply type @m in Normal mode.
To repeat the macro on a different line, @@ can be used to execute the most
recently used macro.
To execute the macro 5 times (i.e., move the current line with the following four lines
below it), use 5@m or 5@@ .
I asked to see if there is a command unknown to me that does the job: it seems there is none.
In absence of such a command, this can be a good solution. – brad
Nov 24 '15 at 14:26
@brad, you can find all the commands available to you in the documentation. If it's not there,
it doesn't exist; no need to ask random strangers. – romainl
Nov 26 '15 at 9:54
@romainl, yes, I know this, but the Vim documentation is really huge and, although it doesn't
scare me, there is always the possibility of missing something. Moreover, it could also be that
you can obtain the effect by combining two commands, in which case it would
hardly be documented – brad
Nov 26 '15 at 10:17
I normally work with more than 5 files at a time. I use buffers to open different files. I
use commands such as :buf file1, :buf file2 etc. Is there a faster way to move to different
files?
Below I describe some excerpts from sections of my .vimrc . It includes mapping
the leader key, setting wilds tab completion, and finally my buffer nav key choices (all
mostly inspired by folks on the interweb, including romainl). Edit: Then I ramble on about my
shortcuts for windows and tabs.
" easier default keys {{{1
let mapleader=','
nnoremap <leader>2 :@"<CR>
The leader key is a prefix key for mostly user-defined key commands (some
plugins also use it). The default is \ , but many people suggest the easier to
reach , .
The second line there is a command to @ execute from the "
clipboard, in case you'd like to quickly try out various key bindings (without relying on
:so % ). (My mnemonic is that Shift - 2 is @
.)
" wilds {{{1
set wildmenu wildmode=list:full
set wildcharm=<C-z>
set wildignore+=*~ wildignorecase
For built-in completion, wildmenu is probably the part that shows up yellow
on your Vim when using tab completion on command-line. wildmode is set to a
comma-separated list, each coming up in turn on each tab completion (that is, my list is
simply one element, list:full ). list shows rows and columns of
candidates. full 's meaning includes maintaining existence of the
wildmenu . wildcharm is the way to include Tab presses
in your macros. The *~ is for my use in :edit and
:find commands.
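The buffer-navigation mapping block itself did not survive into this excerpt; the following is a hypothetical sketch that matches the descriptions below (the exact commands are assumptions, not the original definitions):
" buffer nav keys (hypothetical sketch) {{{1
" ,3 : switch between the two last buffers (# is the alternate buffer)
nnoremap <leader>3 :b#<CR>
" ,bh : list all buffers, including hidden ones (!), then pick one
nnoremap <leader>bh :ls!<CR>:buffer<Space>
" ,bw : wipe out buffers by number or name, e.g. ,bw 1 3 4 8 10
nnoremap <leader>bw :bwipeout<Space>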
The ,3 is for switching between the "two" last buffers (Easier to reach than
built-in Ctrl - 6 ). Mnemonic: Shift - 3 is
# , and # is the register symbol for the last buffer. (See
:marks .)
,bh is to select from hidden buffers ( ! ).
,bw is to bwipeout buffers by number or name. For instance, you
can wipeout several while looking at the list, with ,bw 1 3 4 8 10 <CR> .
Note that wipeout is more destructive than :bdelete . They have their pros and
cons. For instance, :bdelete leaves the buffer in the hidden list, while
:bwipeout removes global marks (see :help marks , and the
description of uppercase marks).
I haven't settled on these keybindings; I would sort of prefer that my ,bb
was simply ,b (but defining ,b while leaving the other ,b-prefixed mappings
defined makes Vim pause to see if you'll type more).
Those shortcuts for :BufExplorer are actually the defaults for that plugin,
but I have it written out so I can change them if I want to start using ,b
without a hang.
You didn't ask for this:
If you still find Vim buffers a little awkward to use, try to combine the functionality
with tabs and windows (until you get more comfortable?).
Notice how nice ,w is for a prefix. Also, I reserve Ctrl key for
resizing, because Alt ( M- ) is hard to realize in all
environments, and I don't have a better way to resize. I'm fine using ,w to
switch windows.
" tabs {{{3
nnoremap <leader>t :tab
nnoremap <M-n> :tabn<cr>
nnoremap <M-p> :tabp<cr>
nnoremap <C-Tab> :tabn<cr>
nnoremap <C-S-Tab> :tabp<cr>
nnoremap tn :tabe<CR>
nnoremap te :tabe<Space><C-z><S-Tab>
nnoremap tf :tabf<Space>
nnoremap tc :tabc<CR>
nnoremap to :tabo<CR>
nnoremap tm :tabm<CR>
nnoremap ts :tabs<CR>
nnoremap th :tabr<CR>
nnoremap tj :tabn<CR>
nnoremap tk :tabp<CR>
nnoremap tl :tabl<CR>
" or, it may make more sense to use
" nnoremap th :tabp<CR>
" nnoremap tj :tabl<CR>
" nnoremap tk :tabr<CR>
" nnoremap tl :tabn<CR>
In summary of my window and tabs keys, I can navigate both of them with Alt ,
which is actually pretty easy to reach. In other words:
" (modifier) key choice explanation {{{3
"
" KEYS CTRL ALT
" hjkl resize windows switch windows
" np switch buffer switch tab
"
" (resize windows is hard to do otherwise, so we use ctrl which works across
" more environments. i can use ',w' for windowcmds o.w.. alt is comfortable
" enough for fast and gui nav in tabs and windows. we use np for navs that
" are more linear, hjkl for navs that are more planar.)
"
This way, if the Alt is working, you can actually hold it down while you find
your "open" buffer pretty quickly, amongst the tabs and windows.
There are many ways to solve this. The best is the one that WORKS for YOU. You have lots of
fuzzy-match plugins that help you navigate. The two things that impress me most are
I can't believe you want to turn recording off! I would show a really annoying popup 'Are you
sure?' if one asks to turn it off (or probably would like to give options like the Windows 10
update gives). – 0xc0de
Aug 12 '16 at 9:04
As seen other places, it's q followed by a register. A really cool (and possibly
non-intuitive) part of this is that these are the same registers used by things like
delete, yank, and put. This means that you can yank text from the editor into a register,
then execute it as a command. – Cascabel
Oct 6 '09 at 20:13
One more thing to note is that you can put a number before the @ to replay the recording that
many times; for example, 100@<letter> will play your actions 100 times. – Tolga E
Aug 17 '13 at 3:07
You could add it afterward, by editing the register with put/yank. But I don't know why you'd
want to turn recording on or off as part of a macro. ('q' doesn't affect anything when typed
in insert mode.) – anisoptera
Dec 4 '14 at 9:43
*q* *recording*
q{0-9a-zA-Z"} Record typed characters into register {0-9a-zA-Z"}
(uppercase to append). The 'q' command is disabled
while executing a register, and it doesn't work inside
a mapping. {Vi: no recording}
q Stops recording. (Implementation note: The 'q' that
stops recording is not stored in the register, unless
it was the result of a mapping) {Vi: no recording}
*@*
@{0-9a-z".=*} Execute the contents of register {0-9a-z".=*} [count]
times. Note that register '%' (name of the current
file) and '#' (name of the alternate file) cannot be
used. For "@=" you are prompted to enter an
expression. The result of the expression is then
executed. See also |@:|. {Vi: only named registers}
This only answers the "how to turn off" part of the question. Well, it makes recording
inaccessible, effectively turning it off; at least no one expects vi to have a separate
thread for this code, I guess, including me. – n611x007
Oct 4 '15 at 7:16
Actually, it's q{0-9a-zA-Z"} - you can record a macro into any register (named by digit,
letter, "). In case you actually want to use it... you execute the contents of a register
with @<register>. See :help q and :help @ if you're
interested in using it. – Cascabel
Oct 6 '09 at 20:08
And you can select the lines in visual mode, then press : to get :'<,'> (equivalent to the :1,3
part in your answer), and add mo N . If you want to move a single line, just :mo N . If you are really
lazy, you can omit the space (e.g. :mo5 ). Use marks with mo '{a-zA-Z} . –
Jáda Ronén
Jan 18 '17 at 21:20
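For instance, the forms mentioned in that comment look like this in practice (the target line 10 and the mark name are arbitrary choices for illustration):
:1,3mo 10     move lines 1-3 to just below line 10
:'<,'>mo 10   the same, for the current visual selection
:mo 'a        move the current line to just below the line holding mark a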
The NERD tree allows you to explore your filesystem and to open files and directories. It presents the filesystem to you in
the form of a tree which you manipulate with the keyboard and/or mouse. It also allows you to perform simple filesystem operations.
The tree can be toggled easily with :NERDTreeToggle which can be mapped to a more suitable key. The keyboard shortcuts in the
NERD tree are also easy and intuitive.
For those of us not wanting to follow every link to find out about each plugin, care to furnish us with a brief synopsis? –
SpoonMeiser Sep 17 '08 at 19:32
Pathogen is the FIRST plugin you have to install on every Vim installation! It resolves the plugin management problems every Vim
developer has. – Patrizio Rullo Sep 26 '11 at 12:11
A more recent alternative to this is Tagbar , which appears
to have some improvements over Taglist. This blog
post offers a comparison between the two plugins. –
mindthief Jun 27 '12 at 20:53
A very nice grep replacement for GVim is Ack . A search plugin written
in Perl that beats Vim's internal grep implementation and externally invoked greps, too. It also by default skips any version-control directories
in the project directory, e.g. '.svn'.
This blog shows a way to integrate Ack with vim.
A.vim is a great little plugin. It allows you
to quickly switch between header and source files with a single command. The default is :A , but I remapped it to
F2 to reduce keystrokes.
I really like the SuperTab plugin, it allows
you to use the tab key to do all your insert completions.
community wiki Greg Hewgill, Aug 25, 2008
at 19:23
I have recently started using a plugin that highlights differences in your buffer from a previous version in your RCS system (Subversion,
git, whatever). You just need to press a key to toggle the diff display on/off. You can find it here:
http://github.com/ghewgill/vim-scmdiff . Patches welcome!
It doesn't explicitly support bitkeeper at the moment, but as long as bitkeeper has a "diff" command that outputs a normal patch
file, it should be easy enough to add. – Greg Hewgill Sep 16 '08 at 9:26
@Yogesh: No, it doesn't support ClearCase at this time. However, if you can add ClearCase support, a patch would certainly be
accepted. – Greg Hewgill Mar 10 '10 at 1:39
Elegant (mini) buffer explorer - This
is the multiple file/buffer manager I use. Takes very little screen space. It looks just like most IDEs where you have a top
tab-bar with the files you've opened. I've tested some other similar plugins before, and this is my pick.
TagList - Small file explorer, without
the "extra" stuff the other file explorers have. Just lets you browse directories and open files with the "enter" key. Note
that this has already been noted by previous commenters to your question.
SuperTab - Already noted by
WMR in this
post, looks very promising. It's an auto-completion replacement key for Ctrl-P.
Moria color scheme - Another good, dark
one. Note that it's gVim only.
Enhanced Python syntax - If you're using
Python, this is an enhanced syntax version. Works better than the original. I'm not sure, but this might be already included
in the newest version. Nonetheless, it's worth adding to your syntax folder if you need it.
Not a plugin, but I advise any Mac user to switch to the MacVim
distribution which is vastly superior to the official port.
As for plugins, I used VIM-LaTeX for my thesis and was very
satisfied with the usability boost. I also like the Taglist
plugin which makes use of the ctags library.
clang complete - the best c++ code completion
I have seen so far. By using an actual compiler (that would be clang) the plugin is able to complete complex expressions including
STL and smart pointers.
With version 7.3, undo branches were added to vim. A very powerful feature, but hard to use, until
Steve Losh made
Gundo, which makes this feature usable with an ASCII
representation of the tree and a diff of the change. A must for using undo branches.
My latest favourite is Command-T . Granted, to install it
you need to have Ruby support and you'll need to compile a C extension for Vim. But oy-yoy-yoy does this plugin make a difference
in opening files in Vim!
Definitely! Let not the ruby + c compiling stop you; you will be amazed at how well this plugin enhances your toolset. I had
been ignoring this plugin for too long, installed it today and already find myself using NERDTree less and less. –
Victor Farazdagi Apr 19 '11 at 19:16
Just my 2 cents: being a naive user of both plugins, with just the first few characters of a file name I saw much better results with
the Command-T plugin and a lot of false positives with CtrlP. –
FUD Dec 26 '12 at 4:48
Conque Shell : Run interactive commands inside a Vim buffer
Conque is a Vim plugin which allows you to run interactive programs, such as bash on linux or powershell.exe on Windows, inside
a Vim buffer. In other words it is a terminal emulator which uses a Vim buffer to display the program output.
The vcscommand plugin provides global ex commands
for manipulating version-controlled source files and it supports CVS, SVN and some other repositories.
You can do almost all repository-related tasks from within vim:
* Taking the diff of current buffer with repository copy
* Adding new files
* Reverting the current buffer to the repository copy by nullifying the local changes....
Just gonna name a few I didn't see here, but which I still find extremely helpful:
Gist plugin - GitHub Gists (kind of
GitHub's answer to Pastebin, integrated with Git for awesomeness!)
Mustang color scheme (Can't link directly due to low reputation, Google it!) - Dark, and beautiful color scheme. Looks
really good in the terminal, and even better in gVim! (Due to 256 color support)
One Plugin that is missing in the answers is NERDCommenter
, which lets you do almost anything with comments. For example {add, toggle, remove} comments. And more. See
this blog entry for some examples.
This script is based on the Eclipse Task List. It will search the file for FIXME, TODO, and XXX (or a custom list) and put
them in a handy list for you to browse, which at the same time will update the location in the document so you can see exactly
where the tag is located. Something like an interactive 'cw'.
I really love the snippetsEmu Plugin. It emulates
some of the behaviour of Snippets from the OS X editor TextMate, in particular the variable bouncing and replacement behaviour.
For vim I like a little help with completions.
Vim has tons of completion modes, but really, I just want vim to complete anything it can, whenever it can.
I hate typing ending quotes, but fortunately
this plugin obviates the need for such misery.
Those two are my heavy hitters.
This one may step up to roam my code like
an unquiet shade, but I've yet to try it.
The Txtfmt plugin gives you a sort of "rich text" highlighting capability, similar to what is provided by RTF editors and word
processors. You can use it to add colors (foreground and background) and formatting attributes (all combinations of bold, underline,
italic, etc...) to your plain text documents in Vim.
The advantage of this plugin over something like Latex is that with Txtfmt, your highlighting changes are visible "in real
time", and as with a word processor, the highlighting is WYSIWYG. Txtfmt embeds special tokens directly in the file to accomplish
the highlighting, so the highlighting is unaffected when you move the file around, even from one computer to another. The special
tokens are hidden by the syntax; each appears as a single space. For those who have applied Vince Negri's conceal/ownsyntax patch,
the tokens can even be made "zero-width".
I've heard a lot about Vim, both pros and
cons. It really seems you should be (as a developer) faster with Vim than with any other
editor. I'm using Vim to do some basic stuff and I'm at best 10 times less
productive with Vim.
The only two things you should care about when you talk about speed (you may not care
enough about them, but you should) are:
Using alternatively left and right hands is the fastest way to use the keyboard.
Never touching the mouse is the second way to be as fast as possible. It takes ages for
you to move your hand, grab the mouse, move it, and bring it back to the keyboard (and you
often have to look at the keyboard to be sure you returned your hand properly to the right
place)
Here are two examples demonstrating why I'm far less productive with Vim.
Copy/Cut & paste. I do it all the time. With all the contemporary editors you press
Shift with the left hand, and you move the cursor with your right hand to select
text. Then Ctrl + C copies, you move the cursor and Ctrl +
V pastes.
With Vim it's horrible:
yy to copy one line (you almost never want the whole line!)
[number xx]yy to copy xx lines into the buffer. But you never
know exactly if you've selected what you wanted. I often have to do [number
xx]dd then u to undo!
Another example? Search & replace.
In PSPad :
Ctrl + f then type what you want you search for, then press
Enter .
In Vim: /, then type what you want to search for, then if there are some
special characters put \ before each special character, then press
Enter .
And everything with Vim is like that: it seems I don't know how to handle it the right
way.
You mention cutting with yy and complain that you almost never want to cut
whole lines. In fact programmers, editing source code, very often want to work on whole
lines, ranges of lines and blocks of code. However, yy is only one of many way
to yank text into the anonymous copy buffer (or "register" as it's called in vi ).
The "Zen" of vi is that you're speaking a language. The initial y is a verb.
The statement yy is a synonym for y_ . The y is
doubled up to make it easier to type, since it is such a common operation.
This can also be expressed as ddP (delete the current line and
paste a copy back into place; leaving a copy in the anonymous register as a side effect). The
y and d "verbs" take any movement as their "subject." Thus
yW is "yank from here (the cursor) to the end of the current/next (big) word"
and y'a is "yank from here to the line containing the mark named ' a
'."
If you only understand basic up, down, left, and right cursor movements then vi will be no
more productive than a copy of "notepad" for you. (Okay, you'll still have syntax
highlighting and the ability to handle files larger than a piddling ~45KB or so; but work
with me here).
vi has 26 "marks" and 26 "registers." A mark is set to any cursor location using the
m command. Each mark is designated by a single lower case letter. Thus
ma sets the ' a ' mark to the current location, and mz
sets the ' z ' mark. You can move to the line containing a mark using the
' (single quote) command. Thus 'a moves to the beginning of the
line containing the ' a ' mark. You can move to the precise location of any mark
using the ` (backquote) command. Thus `z will move directly to the
exact location of the ' z ' mark.
Because these are "movements" they can also be used as subjects for other
"statements."
So, one way to cut an arbitrary selection of text would be to drop a mark. (I usually use '
a ' as my "first" mark, ' z ' as my next mark, ' b ' as another,
and ' e ' as yet another; I don't recall ever having interactively used more than
four marks in 15 years of using vi ; one creates one's own conventions regarding how marks
and registers are used by macros that don't disturb one's interactive context.) Then we go to
the other end of our desired text; we can start at either end, it doesn't matter. Then we can
simply use d`a to cut or y`a to copy. Thus the whole process has a
five-keystroke overhead (six if we started in "insert" mode and needed to Esc out to
command mode). Once we've cut or copied, pasting in a copy is a single keystroke:
p .
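As a concrete walk-through of that sequence (the search target /foo is an arbitrary example):
ma            drop mark 'a' at one end of the text
/foo<Enter>   move to the other end however you like (search, }, G, ...)
d`a           cut everything between the cursor and mark 'a' (y`a would copy instead)
p             paste it wherever you want it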
I say that this is one way to cut or copy text. However, it is only one of many.
Frequently we can more succinctly describe the range of text without moving our cursor around
and dropping a mark. For example if I'm in a paragraph of text I can use { and
} movements to the beginning or end of the paragraph respectively. So, to move a
paragraph of text I cut it using {d} (3 keystrokes). (If I happen
to already be on the first or last line of the paragraph I can then simply use
d} or d{ respectively.)
The notion of "paragraph" defaults to something which is usually intuitively reasonable.
Thus it often works for code as well as prose.
Frequently we know some pattern (regular expression) that marks one end or the other of
the text in which we're interested. Searching forwards or backwards are movements in vi .
Thus they can also be used as "subjects" in our "statements." So I can use d/foo
to cut from the current line to the next line containing the string "foo" and
y?bar to copy from the current line to the most recent (previous) line
containing "bar." If I don't want whole lines I can still use the search movements (as
statements of their own), drop my mark(s) and use the `x commands as described
previously.
In addition to "verbs" and "subjects" vi also has "objects" (in the grammatical sense of
the term). So far I've only described the use of the anonymous register. However, I can use
any of the 26 "named" registers by prefixing the "object" reference with
" (the double quote modifier). Thus if I use "add I'm cutting the
current line into the ' a ' register and if I use "by/foo then I'm
yanking a copy of the text from here to the next line containing "foo" into the ' b
' register. To paste from a register I simply prefix the paste with the same modifier
sequence: "ap pastes a copy of the ' a ' register's contents into the
text after the cursor and "bP pastes a copy from ' b ' to before the
current line.
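One handy pattern built on this register grammar, shown here as a sketch (it also leans on the :g command described further below, and the TODO pattern is just an example):
qaq            clear register 'a' by recording an empty macro into it
:g/TODO/y A    append every line matching TODO to register 'a' (uppercase name = append)
"ap            paste the collected lines after the cursor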
This notion of "prefixes" also adds the analogs of grammatical "adjectives" and "adverbs'
to our text manipulation "language." Most commands (verbs) and movement (verbs or objects,
depending on context) can also take numeric prefixes. Thus 3J means "join the
next three lines" and d5} means "delete from the current line through the end of
the fifth paragraph down from here."
This is all intermediate level vi . None of it is Vim specific and there are far more
advanced tricks in vi if you're ready to learn them. If you were to master just these
intermediate concepts then you'd probably find that you rarely need to write any macros
because the text manipulation language is sufficiently concise and expressive to do most
things easily enough using the editor's "native" language.
A sampling of more advanced tricks:
There are a number of : commands, most notably the :%
s/foo/bar/g global substitution technique. (That's not advanced but other
: commands can be). The whole : set of commands was historically
inherited from vi 's previous incarnations, the ed (line editor) and later the ex (extended
line editor) utilities. In fact vi is so named because it's the visual interface to ex .
: commands normally operate over lines of text. ed and ex were written in an
era when terminal screens were uncommon and many terminals were "teletype" (TTY) devices. So
it was common to work from printed copies of the text, using commands through an extremely
terse interface (common connection speeds were 110 baud, or, roughly, 11 characters per
second -- which is slower than a fast typist; lags were common on multi-user interactive
sessions; additionally there was often some motivation to conserve paper).
So the syntax of most : commands includes an address or range of addresses
(line number) followed by a command. Naturally one could use literal line numbers:
:127,215 s/foo/bar to change the first occurrence of "foo" into "bar" on each
line between 127 and 215. One could also use some abbreviations such as . or
$ for current and last lines respectively. One could also use relative prefixes
+ and - to refer to offsets after or before the current line,
respectively. Thus: :.,$j meaning "from the current line to the last line, join
them all into one line". :% is synonymous with :1,$ (all the
lines).
The :... g and :... v commands bear some explanation as they are
incredibly powerful. :... g is a prefix for "globally" applying a subsequent
command to all lines which match a pattern (regular expression) while :... v
applies such a command to all lines which do NOT match the given pattern ("v" from
"conVerse"). As with other ex commands these can be prefixed by addressing/range references.
Thus :.,+21g/foo/d means "delete any lines containing the string "foo" from the
current one through the next 21 lines" while :.,$v/bar/d means "from here to the
end of the file, delete any lines which DON'T contain the string "bar."
It's interesting that the common Unix command grep was actually inspired by this ex
command (and is named after the way in which it was documented). The ex command
:g/re/p (grep) was the way they documented how to "globally" "print" lines
containing a "regular expression" (re). When ed and ex were used, the :p command
was one of the first that anyone learned and often the first one used when editing any file.
It was how you printed the current contents (usually just one page full at a time using
:.,+25p or some such).
Note that :% g/.../d (or its reVerse/conVerse counterpart, :%
v/.../d ) are the most common usage patterns. However, there are a couple of other
ex commands which are worth remembering:
We can use m to move lines around, and j to join lines. For
example if you have a list and you want to separate all the stuff matching (or conversely NOT
matching some pattern) without deleting them, then you can use something like: :%
g/foo/m$ ... and all the "foo" lines will have been moved to the end of the file.
(Note the other tip about using the end of your file as a scratch space). This will have
preserved the relative order of all the "foo" lines while having extracted them from the rest
of the list. (This would be equivalent to doing something like: 1G!GGmap!Ggrep
foo<ENTER>1G:1,'a g/foo'/d : copy the file to its own tail, filter the tail
through grep, and delete all the stuff from the head.)
To join lines usually I can find a pattern for all the lines which need to be joined to
their predecessor (all the lines which start with "^ " rather than "^ * " in some bullet
list, for example). For that case I'd use: :% g/^ /-1j (for every matching line,
go up one line and join them). (BTW: for bullet lists trying to search for the bullet lines
and join to the next doesn't work for a couple reasons ... it can join one bullet line to
another, and it won't join any bullet line to all of its continuations; it'll only
work pairwise on the matches).
Almost needless to mention you can use our old friend s (substitute) with the
g and v (global/converse-global) commands. Usually you don't need
to do so. However, consider some case where you want to perform a substitution only on lines
matching some other pattern. Often you can use a complicated pattern with captures and use
back references to preserve the portions of the lines that you DON'T want to change. However,
it will often be easier to separate the match from the substitution: :%
g/foo/s/bar/zzz/g -- for every line containing "foo" substitute all "bar" with "zzz."
(Something like :% s/\(.*foo.*\)bar\(.*\)/\1zzz\2/g would only work for those
instances of "bar" which were PRECEDED by "foo" on the same line; it's ungainly
enough already, and would have to be mangled further to catch all the cases where "bar"
preceded "foo".)
The point is that there are more than just p, s, and
d lines in the ex command set.
The : addresses can also refer to marks. Thus you can use:
:'a,'bg/foo/j to join any line containing the string foo to its subsequent line,
if it lies between the lines between the ' a ' and ' b ' marks. (Yes, all
of the preceding ex command examples can be limited to subsets of the file's
lines by prefixing with these sorts of addressing expressions).
That's pretty obscure (I've only used something like that a few times in the last 15
years). However, I'll freely admit that I've often done things iteratively and interactively
that could probably have been done more efficiently if I'd taken the time to think out the
correct incantation.
Another very useful vi or ex command is :r to read in the contents of another
file. Thus: :r foo inserts the contents of the file named "foo" at the current
line.
More powerful is the :r! command. This reads the results of a command. It's
the same as suspending the vi session, running a command, redirecting its output to a
temporary file, resuming your vi session, and reading in the contents from the temp.
file.
Even more powerful are the ! (bang) and :... ! ( ex bang)
commands. These also execute external commands and read the results into the current text.
However, they also filter selections of our text through the command! Thus we can sort all
the lines in our file using 1G!Gsort ( G is the vi "goto" command;
it defaults to going to the last line of the file, but can be prefixed by a line number, such
as 1, the first line). This is equivalent to the ex variant :1,$!sort . Writers
often use ! with the Unix fmt or fold utilities for reformatting or "word
wrapping" selections of text. A very common macro is {!}fmt (reformat the
current paragraph). Programmers sometimes use it to run their code, or just portions of it,
through indent or other code reformatting tools.
Using the :r! and ! commands means that any external utility or
filter can be treated as an extension of our editor. I have occasionally used these with
scripts that pulled data from a database, or with wget or lynx commands that pulled data off
a website, or ssh commands that pulled data from remote systems.
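A few small, self-contained examples of that idea (the particular commands here are arbitrary choices, not taken from the answer):
:r !date          read the output of an external command into the buffer below the current line
:.,+10!sort -u    filter the current line and the next ten through sort -u, replacing them
1G!Gfmt           reformat the whole file through fmt (the normal-mode form of :1,$!fmt)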
Another useful ex command is :so (short for :source ). This
reads the contents of a file as a series of commands. When you start vi it normally,
implicitly, performs a :source on ~/.exinitrc file (and Vim usually
does this on ~/.vimrc, naturally enough). The use of this is that you can
change your editor profile on the fly by simply sourcing in a new set of macros,
abbreviations, and editor settings. If you're sneaky you can even use this as a trick for
storing sequences of ex editing commands to apply to files on demand.
For example I have a seven line file (36 characters) which runs a file through wc, and
inserts a C-style comment at the top of the file containing that word count data. I can apply
that "macro" to a file by using a command like: vim +'so mymacro.ex'
./mytarget
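That seven-line file isn't reproduced here, but a hedged sketch of such an ex script might look like the following (an illustration only, not the original mymacro.ex):
" mymacro.ex - put the wc output for the current file in a C-style comment on line 1
0r !wc %
s/^/\/* /
s/$/ *\//
wq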
(The + command line option to vi and Vim is normally used to start the
editing session at a given line number. However it's a little known fact that one can follow
the + by any valid ex command/expression, such as a "source" command as I've
done here; for a simple example I have scripts which invoke: vi +'/foo/d|wq!'
~/.ssh/known_hosts to remove an entry from my SSH known hosts file non-interactively
while I'm re-imaging a set of servers).
Usually it's far easier to write such "macros" using Perl, AWK, sed (which is, in fact,
like grep a utility inspired by the ed command).
The @ command is probably the most obscure vi command. In occasionally
teaching advanced systems administration courses for close to a decade I've met very few
people who've ever used it. @ executes the contents of a register as if it were
a vi or ex command.
Example: I often use: :r!locate ... to find some file on my system and read its
name into my document. From there I delete any extraneous hits, leaving only the full path to
the file I'm interested in. Rather than laboriously Tab -ing through each
component of the path (or worse, if I happen to be stuck on a machine without Tab completion
support in its copy of vi ) I just use:
0i:r (to turn the current line into a valid :r command),
"cdd (to delete the line into the "c" register) and
@c execute that command.
That's only 10 keystrokes (and the expression "cdd@c is
effectively a finger macro for me, so I can type it almost as quickly as any common six
letter word).
A sobering thought
I've only scratched the surface of vi 's power and none of what I've described here is even
part of the "improvements" for which vim is named! All of what I've described here should
work on any old copy of vi from 20 or 30 years ago.
There are people who have used considerably more of vi 's power than I ever will.
@Wahnfieden -- grok is exactly what I meant: en.wikipedia.org/wiki/Grok (It's apparently even in
the OED --- the closest we anglophones have to a canonical lexicon). To "grok" an editor is
to find yourself using its commands fluently ... as if they were your natural language.
– Jim
Dennis
Feb 12 '10 at 4:08
wow, a very well written answer! i couldn't agree more, although i use the @
command a lot (in combination with q : record macro) – knittl
Feb 27 '10 at 13:15
Superb answer that utterly redeems a really horrible question. I am going to upvote this
question, that normally I would downvote, just so that this answer becomes easier to find.
(And I'm an Emacs guy! But this way I'll have somewhere to point new folks who want a good
explanation of what vi power users find fun about vi. Then I'll tell them about Emacs and
they can decide.) – Brandon Rhodes
Mar 29 '10 at 15:26
Can you make a website and put this tutorial there, so it doesn't get buried here on
stackoverflow? I have yet to read a better introduction to vi than this. – Marko
Apr 1 '10 at 14:47
You are talking about text selecting and copying, I think that you should give a look to the
Vim Visual Mode .
In the visual mode, you are able to select text using Vim commands, then you can do
whatever you want with the selection.
Consider the following common scenarios:
You need to select to the next matching parenthesis.
You could do:
v% if the cursor is on the starting/ending parenthesis
vib if the cursor is inside the parenthesis block
You want to select text between quotes:
vi" for double quotes
vi' for single quotes
You want to select a curly brace block (very common on C-style languages):
viB
vi{
You want to select the entire file:
ggVG
Visual block selection is another really useful feature; it allows you to select a rectangular
area of text. You just have to press Ctrl - V to start it, and then
select the text block you want and perform any type of operation such as yank, delete, paste,
edit, etc. It's great for editing column-oriented text.
Yes, but it was a specific complaint of the poster. Visual mode is Vim's best method of
direct text-selection and manipulation. And since vim's buffer traversal methods are superb,
I find text selection in vim fairly pleasurable. – guns
Aug 2 '09 at 9:54
I think it is also worth mentioning Ctrl-V to select a block - ie an arbitrary rectangle of
text. When you need it it's a lifesaver. – Hamish Downer
Mar 16 '10 at 13:34
Also, if you've got a visual selection and want to adjust it, o will hop to the
other end. So you can move both the beginning and the end of the selection as much as you
like. – Nathan Long
Mar 1 '11 at 19:05
* and # search for the word under the cursor
forward/backward.
w to the next word
W to the next space-separated word
b / e to the begin/end of the current word. ( B
/ E for space separated only)
gg / G jump to the begin/end of the file.
% jump to the matching { .. } or ( .. ), etc..
{ / } jump to next paragraph.
'. jump back to last edited line.
g; jump back to last edited position.
Quick editing commands
I insert at the begin.
A append to end.
o / O open a new line after/before the current.
v / V / Ctrl+V visual mode (to select
text!)
Shift+R replace text
C change remaining part of line.
Combining commands
Most commands accept an amount and direction, for example:
cW = change till end of word
3cW = change 3 words
BcW = to begin of full word, change full word
ciW = change inner word.
ci" = change inner between ".."
ci( = change text between ( .. )
ci< = change text between < .. > (needs set
matchpairs+=<:> in vimrc)
4dd = delete 4 lines
3x = delete 3 characters.
3s = substitute 3 characters.
Useful programmer commands
r replace one character (e.g. rd replaces the current char
with d ).
~ changes case.
J joins two lines
Ctrl+A / Ctrl+X increments/decrements a number.
. repeat last command (a simple macro)
== fix line indent
> indent block (in visual mode)
< unindent block (in visual mode)
Macro recording
Press q[ key ] to start recording.
Then hit q to stop recording.
The macro can be played with @[ key ] .
By using very specific commands and movements, VIM can replay those exact actions for the
next lines. (e.g. A for append-to-end, b / e to move the cursor to
the begin or end of a word respectively)
Example of well built settings
# reset to vim-defaults
if &compatible # only if not set before:
set nocompatible # use vim-defaults instead of vi-defaults (easier, more user friendly)
endif
# display settings
set background=dark # enable for dark terminals
set nowrap # dont wrap lines
set scrolloff=2 # 2 lines above/below cursor when scrolling
set number # show line numbers
set showmatch # show matching bracket (briefly jump)
set showmode # show mode in status bar (insert/replace/...)
set showcmd # show typed command in status bar
set ruler # show cursor position in status bar
set title # show file in titlebar
set wildmenu # completion with menu
set wildignore=*.o,*.obj,*.bak,*.exe,*.py[co],*.swp,*~,*.pyc,.svn
set laststatus=2 # use 2 lines for the status bar
set matchtime=2 # show matching bracket for 0.2 seconds
set matchpairs+=<:> # specially for html
# editor settings
set esckeys # map missed escape sequences (enables keypad keys)
set ignorecase # case insensitive searching
set smartcase # but become case sensitive if you type uppercase characters
set smartindent # smart auto indenting
set smarttab # smart tab handling for indenting
set magic # change the way backslashes are used in search patterns
set bs=indent,eol,start # Allow backspacing over everything in insert mode
set tabstop=4 # number of spaces a tab counts for
set shiftwidth=4 # spaces for autoindents
#set expandtab # turn a tabs into spaces
set fileformat=unix # file mode is unix
#set fileformats=unix,dos # only detect unix file format, displays that ^M with dos files
# system settings
set lazyredraw # no redraws in macros
set confirm # get a dialog when :q, :w, or :wq fails
set nobackup # no backup~ files.
set viminfo='20,\"500 # remember copy registers after quitting in the .viminfo file -- 20 jump links, regs up to 500 lines'
set hidden # remember undo after quitting
set history=50 # keep 50 lines of command history
set mouse=v # use mouse in visual mode (not normal,insert,command,help mode)
# color settings (if terminal/gui supports it)
if &t_Co > 2 || has("gui_running")
syntax on # enable colors
set hlsearch # highlight search (very useful!)
set incsearch # search incremently (search while typing)
endif
# paste mode toggle (needed when using autoindent/smartindent)
map <F10> :set paste<CR>
map <F11> :set nopaste<CR>
imap <F10> <C-O>:set paste<CR>
imap <F11> <nop>
set pastetoggle=<F11>
# Use of the filetype plugins, auto completion and indentation support
filetype plugin indent on
# file type specific settings
if has("autocmd")
# For debugging
#set verbose=9
# if bash is sh.
let bash_is_sh=1
# change to directory of current file automatically
autocmd BufEnter * lcd %:p:h
# Put these in an autocmd group, so that we can delete them easily.
augroup mysettings
au FileType xslt,xml,css,html,xhtml,javascript,sh,config,c,cpp,docbook set smartindent shiftwidth=2 softtabstop=2 expandtab
au FileType tex set wrap shiftwidth=2 softtabstop=2 expandtab
# Confirm to PEP8
au FileType python set tabstop=4 softtabstop=4 expandtab shiftwidth=4 cinwords=if,elif,else,for,while,try,except,finally,def,class
augroup END
augroup perl
# reset (disable previous 'augroup perl' settings)
au!
au BufReadPre,BufNewFile
\ *.pl,*.pm
\ set formatoptions=croq smartindent shiftwidth=2 softtabstop=2 cindent cinkeys='0{,0},!^F,o,O,e' " tags=./tags,tags,~/devel/tags,~/devel/C
# formatoption:
# t - wrap text using textwidth
# c - wrap comments using textwidth (and auto insert comment leader)
# r - auto insert comment leader when pressing <return> in insert mode
# o - auto insert comment leader when pressing 'o' or 'O'.
# q - allow formatting of comments with "gq"
# a - auto formatting for paragraphs
# n - auto wrap numbered lists
#
augroup END
# Always jump to the last known cursor position.
# Don't do it when the position is invalid or when inside
# an event handler (happens when dropping a file on gvim).
autocmd BufReadPost *
\ if line("'\"") > 0 && line("'\"") <= line("$") |
\ exe "normal g`\"" |
\ endif
endif # has("autocmd")
The settings can be stored in ~/.vimrc, or system-wide in
/etc/vimrc.local and then read from the /etc/vimrc file
using:
source /etc/vimrc.local
(you'll have to replace the # comment character with " to make
it work in VIM, I wanted to give proper syntax highlighting here).
The commands I've listed here are pretty basic, and the main ones I use so far. They
already make me quite more productive, without having to know all the fancy stuff.
Better than '. is g;, which jumps back through the
changelist . Goes to the last edited position, instead of last edited line
– naught101
Apr 28 '12 at 2:09
The Control + R mechanism is very useful :-) In either insert mode or
command mode (i.e. on the : line when typing commands), continue with a numbered
or named register:
a - z the named registers
" the unnamed register, containing the text of the last delete or
yank
% the current file name
# the alternate file name
* the clipboard contents (X11: primary selection)
+ the clipboard contents
/ the last search pattern
: the last command-line
. the last inserted text
- the last small (less than a line) delete
=5*5 insert 25 into text (mini-calculator)
See :help i_CTRL-R and :help c_CTRL-R for more details, and
snoop around nearby for more CTRL-R goodness.
+1 for current/alternate file name. Control-A also works in insert mode for last
inserted text, and Control-@ to both insert last inserted text and immediately
switch to normal mode. – Aryeh Leib Taurog
Feb 26 '12 at 19:06
There are a lot of good answers here, and one amazing one about the zen of vi. One thing I
don't see mentioned is that vim is extremely extensible via plugins. There are scripts and
plugins to make it do all kinds of crazy things the original author never considered. Here
are a few examples of incredibly handy vim plugins:
Rails.vim is a plugin written by tpope. It's an incredible tool for people doing rails
development. It does magical context-sensitive things that allow you to easily jump from a
method in a controller to the associated view, over to a model, and down to unit tests for
that model. It has saved me dozens if not hundreds of hours as a rails
developer.
This plugin allows you to select a region of text in visual mode and type a quick command
to post it to gist.github.com . This
allows for easy pastebin access, which is incredibly handy if you're collaborating with
someone over IRC or IM.
This plugin provides special functionality to the spacebar. It turns the spacebar into
something analogous to the period, but instead of repeating actions it repeats motions. This
can be very handy for moving quickly through a file in a way you define on the
fly.
This plugin gives you the ability to work with text that is delimited in some fashion. It
gives you objects which denote things inside of parens, things inside of quotes, etc. It can
come in handy for manipulating delimited text.
This script brings fancy tab completion functionality to vim. The autocomplete stuff is
already there in the core of vim, but this brings it to a quick tab rather than multiple
different multikey shortcuts. Very handy, and incredibly fun to use. While it's not VS's
intellisense, it's a great step and brings a great deal of the functionality you'd like to
expect from a tab completion tool.
This tool brings external syntax checking commands into vim. I haven't used it personally,
but I've heard great things about it and the concept is hard to beat. Checking syntax without
having to do it manually is a great time saver and can help you catch syntactic bugs as you
introduce them rather than when you finally stop to test.
Direct access to git from inside of vim. Again, I haven't used this plugin, but I can see
the utility. Unfortunately I'm in a culture where svn is considered "new", so I won't likely
see git at work for quite some time.
A tree browser for vim. I started using this recently, and it's really handy. It lets you
put a treeview in a vertical split and open files easily. This is great for a project with a
lot of source files you frequently jump between.
This is an unmaintained plugin, but still incredibly useful. It provides the ability to
open files using a "fuzzy" descriptive syntax. It means that in a sparse tree of files you
need only type enough characters to disambiguate the files you're interested in from the rest
of the cruft.
Conclusion
There are a lot of incredible tools available for vim. I'm sure I've only scratched the
surface here, and it's well worth searching for tools applicable to your domain. The
combination of traditional vi's powerful toolset, vim's improvements on it, and plugins which
extend vim even further makes it one of the most powerful ways to edit text ever conceived. Vim
is easily as powerful as emacs, eclipse, visual studio, and textmate.
Thanks
Thanks to duwanis for his
vim configs from which I
have learned much and borrowed most of the plugins listed here.
The magical tests-to-class navigation in rails.vim is one of the more general things I wish
Vim had that TextMate absolutely nails across all languages: if I am working on Person.scala
and I do Cmd+T, usually the first thing in the list is PersonTest.scala. – Tom Morris
Apr 1 '10 at 8:50
@Benson Great list! I'd toss in snipMate as well. Very helpful
automation of common coding stuff. if<tab> instant if block, etc. – AlG
Sep 13 '11 at 17:37
Visual mode was mentioned previously, but block visual mode has saved me a lot of time
when editing fixed size columns in text file. (accessed with Ctrl-V).
Additionally, if you use a concise command (e.g. A for append-at-end) to edit the text, vim
can repeat that exact same action for the next line when you press the . key.
– vdboor
Apr 1 '10 at 8:34
Go to last edited location (very useful if you performed some searching and then want to go
back to edit)
^P and ^N
Complete previous (^P) or next (^N) text.
^O and ^I
Go to previous ( ^O - "O" for old) location or to the next (
^I - "I" just near to "O" ). When you perform
searches, edit files etc., you can navigate through these "jumps" forward and back.
@Kungi: `. will take you to the last edit; `` will take you back to the position you were in
before the last 'jump', which /might/ also be the position of the last edit. –
Grant
McLean
Aug 23 '11 at 8:21
It's pretty new and really really good. The guy who is running the site switched from
textmate to vim and hosts very good and concise casts on specific vim topics. Check it
out!
@SolutionYogi: Consider that you want to add line numbers to the beginning of each line.
Solution: ggI1<space><esc>0qqyawjP0<c-a>0q9999@q – hcs42
Feb 27 '10 at 19:05
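A breakdown of that one-liner, annotated here for readability (the annotation is mine, not part of the original comment):
gg              go to the first line
I1<space><esc>  insert "1 " at the start of it
0               back to the first column
qq              start recording into register q
yaw             yank the number and the following space as a word
j               down to the next line
P               paste it at the start of that line
0               back to the first column
<c-a>           increment the pasted number
0               back to the first column again
q               stop recording
9999@q          replay for up to 9999 more lines (the macro stops when j fails on the last line)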
Extremely useful with Vimperator, where it increments (or decrements, Ctrl-X) the last number
in the URL. Useful for quickly surfing through image galleries etc. – blueyed
Apr 1 '10 at 14:47
Whoa, I didn't know about the * and # (search forward/back for word under cursor) binding.
That's kinda cool. The f/F and t/T and ; commands are quick jumps to characters on the
current line. f/F put the cursor on the indicated character while t/T puts it just up "to"
the character (the character just before or after it according to the direction chosen. ;
simply repeats the most recent f/F/t/T jump (in the same direction). – Jim Dennis
Mar 14 '10 at 6:38
:) The tagline at the top of the tips page at vim.org: "Can you imagine how many keystrokes
could have been saved, if I only had known the "*" command in time?" - Juergen Salk,
1/19/2001" – Steve K
Apr 3 '10 at 23:50
As Jim mentioned, the "t/T" combo is often just as good, if not better, for example,
ct( will erase the word and put you in insert mode, but keep the parentheses!
– puk
Feb 24 '12 at 6:45
CTRL-A ;Add [count] to the number or alphabetic character at or after the cursor. {not
in Vi}
CTRL-X ;Subtract [count] from the number or alphabetic character at or after the cursor.
{not in Vi}
b. Windows key unmapping
On Windows, Ctrl-A is already mapped to select the whole file, so you need to unmap it in your rc file:
either comment out the CTRL-A mapping part of mswin.vim, or add an unmap for it in your rc file.
c. With Macro
The CTRL-A command is very useful in a macro. Example: Use the following steps to make a
numbered list.
Create the first list entry, make sure it starts with a number.
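The remaining steps did not make it into this excerpt; a typical continuation (my sketch, closely following the classic example in Vim's documentation) would be:
qa       start recording into register a
Y        yank the numbered line
p        put a copy below it
<C-A>    increment the number on the new copy
q        stop recording
10@a     repeat to create ten more numbered entries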
Last week at work our project inherited a lot of Python code from another project.
Unfortunately the code did not fit into our existing architecture - it was all done with
global variables and functions, which would not work in a multi-threaded environment.
We had ~80 files that needed to be reworked to be object oriented - all the functions
moved into classes, parameters changed, import statements added, etc. We had a list of about
20 types of fix that needed to be done to each file. I would estimate that doing it by hand
one person could do maybe 2-4 per day.
So I did the first one by hand and then wrote a vim script to automate the changes. Most
of it was a list of vim commands e.g.
" delete an un-needed function "
g/someFunction(/ d
" add wibble parameter to function foo "
%s/foo(/foo( wibble,/
" convert all function calls bar(thing) into method calls thing.bar() "
g/bar(/ normal nmaf(ldi(`aPa.
The last one deserves a bit of explanation:
g/bar(/ executes the following command on every line that contains "bar("
normal execute the following text as if it was typed in in normal mode
n goes to the next match of "bar(" (since the :g command leaves the cursor position at the start of the line)
ma saves the cursor position in mark a
f( moves forward to the next opening bracket
l moves right one character, so the cursor is now inside the brackets
di( delete all the text inside the brackets
`a go back to the position saved as mark a (i.e. the first character of "bar")
P paste the deleted text before the current cursor position
a. go into insert mode and add a "."
For a couple of more complex transformations such as generating all the import statements
I embedded some python into the vim script.
After a few hours of working on it I had a script that will do at least 95% of the
conversion. I just open a file in vim then run :source fixit.vim and the file is
transformed in a blink of the eye.
We still have the work of changing the remaining 5% that was not worth automating and of
testing the results, but by spending a day writing this script I estimate we have saved weeks
of work.
Of course it would have been possible to automate this with a scripting language like
Python or Ruby, but it would have taken far longer to write and would be less flexible - the
last example would have been difficult since regex alone would not be able to handle nested
brackets, e.g. to convert bar(foo(xxx)) to foo(xxx).bar() . Vim was
perfect for the task.
@lpsquiggle: your suggestion would not handle complex expressions with more than one set of
brackets. e.g. if bar(foo(xxx)) or wibble(xxx): becomes if foo(xxx)) or
wibble(xxx.bar(): which is completely wrong. – Dave Kirby
Mar 23 '10 at 17:16
Use the builtin file explorer! The command is :Explore and it allows you to
navigate through your source code very, very fast. I have these mappings in my
.vimrc :
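(The actual mappings were lost from this excerpt; a hypothetical example of what such mappings might look like:)
" hypothetical mappings for the built-in netrw explorer
nnoremap <silent> <F8> :Explore<CR>
nnoremap <silent> <S-F8> :Sexplore<CR>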
I always thought the default methods for browsing kinda sucked for most stuff. It's just slow
to browse, if you know where you wanna go. LustyExplorer from vim.org's script section is a
much needed improvement. – Svend
Aug 2 '09 at 8:48
I recommend NERDtree instead of the built-in explorer. It has changed the way I used vim for
projects and made me much more productive. Just google for it. – kprobst
Apr 1 '10 at 3:53
I never feel the need to explore the source tree, I just use :find,
:tag and the various related keystrokes to jump around. (Maybe this is because
the source trees I work on are big and organized differently than I would have done? :) )
– dash-tom-bang
Aug 24 '11 at 0:35
I am a member of the American Cryptogram Association. The bimonthly magazine includes over
100 cryptograms of various sorts. Roughly 15 of these are "cryptarithms" - various types of
arithmetic problems with letters substituted for the digits. Two or three of these are
sudokus, except with letters instead of numbers. When the grid is completed, the nine
distinct letters will spell out a word or words, on some line, diagonal, spiral, etc.,
somewhere in the grid.
Rather than working with pencil, or typing the problems in by hand, I download the
problems from the members area of their website.
When working with these sudokus, I use vi, simply because I'm using facilities that vi has
that few other editors have. Mostly in converting the lettered grid into a numbered grid,
because I find it easier to solve, and then the completed numbered grid back into the
lettered grid to find the solution word or words.
The problem is formatted as nine groups of nine letters, with - s
representing the blanks, written in two lines. The first step is to format these into nine
lines of nine characters each. There's nothing special about this, just inserting eight
linebreaks in the appropriate places.
So, first step in converting this into numbers is to make a list of the distinct letters.
First, I make a copy of the block. I position the cursor at the top of the block, then type
:y}}p . : puts me in command mode, y yanks the next
movement command. Since } is a move to the end of the next paragraph,
y} yanks the paragraph. } then moves the cursor to the end of the
paragraph, and p pastes what we had yanked just after the cursor. So
y}}p creates a copy of the next paragraph, and ends up with the cursor between
the two copies.
Next, I turn one of those copies into a list of distinct letters. That command is a bit
more complex:
: again puts me in command mode. ! indicates that the content of
the next yank should be piped through a command line. } yanks the next
paragraph, and the command line then uses the tr command to strip out everything
except for upper-case letters, the sed command to print each letter on a single
line, and the sort command to sort those lines, removing duplicates, and then
tr strips out the newlines, leaving the nine distinct letters in a single line,
replacing the nine lines that had made up the paragraph originally. In this case, the letters
are: ACELNOPST .
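The command itself is missing from this excerpt; a hedged reconstruction matching that description, written in the same :!} notation the answer uses below (GNU sed's handling of \n in the replacement is assumed), would be:
:!}tr -cd 'A-Z' | sed 's/./&\n/g' | sort -u | tr -d '\n'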
Next step is to make another copy of the grid. And then to use the letters I've just
identified to replace each of those letters with a digit from 1 to 9. That's simple:
:!}tr ACELNOPST 1-9 . The result is:
This can then be solved in the usual way, or entered into any sudoku solver you might
prefer. The completed solution can then be converted back into letters with :!}tr 1-9
ACELNOPST .
There is power in vi that is matched by very few others. The biggest problem is that only
a very few of the vi tutorial books, websites, help-files, etc., do more than barely touch
the surface of what is possible.
and an irritation is that some distros such as ubuntu have aliased the word "vi" to "vim",
so people won't really see vi. Excellent example, have to try... +1 – hhh
Jan 14 '11 at 17:12
I'm baffled by this repeated error: you say you need : to go into command mode,
but then invariably you specify normal mode commands (like y}}p ) which
cannot possibly work from the command mode?! – sehe
Mar 4 '12 at 20:47
My take on the unique chars challenge: :se tw=1 fo= (preparation)
VG:s/./& /g (insert spaces), gvgq (split onto separate lines),
V{:sort u (sort and remove duplicates) – sehe
Mar 4 '12 at 20:56
I find the following trick increasingly useful ... for cases where you want to join lines
that match (or that do NOT match) some pattern to the previous line: :%
g/foo/-1j or :'a,'z v/bar/-1j for example (where the former is "all lines
and matching the pattern" while the latter is "lines between mark a and mark z which fail to
match the pattern"). The part after the patter in a g or v ex
command can be any other ex commmands, -1j is just a relative line movement and join command.
– Jim
Dennis
Feb 12 '10 at 4:15
of course, if you name your macro '2', then when it comes time to use it, you don't even have
to move your finger from the '@' key to the 'q' key. Probably saves 50 to 100 milliseconds
every time right there. =P – JustJeff
Feb 27 '10 at 12:54
I recently discovered q: . It opens the "command window" and shows your most
recent ex-mode (command-mode) commands. You can move as usual within the window, and pressing
<CR> executes the command. You can edit, etc. too. Priceless when you're
messing around with some complex command or regex and you don't want to retype the whole
thing, or if the complex thing you want to do was 3 commands back. It's almost like bash's
set -o vi, but for vim itself (heh!).
See :help q: for more interesting bits for going back and forth.
I just discovered Vim's omnicompletion the other day, and while I'll admit I'm a bit hazy on
what does which, I've had surprisingly good results just mashing either Ctrl+x Ctrl+u
or Ctrl+n / Ctrl+p in insert mode. It's not quite IntelliSense, but I'm still learning it.
<Ctrl> + W and j/k will let you navigate absolutely (j down, k up, as with normal vim).
This is great when you have 3+ splits. – Andrew Scagnelli
Apr 1 '10 at 2:58
after bashing my keyboard I have deduced that <C-w>n or
<C-w>s is new horizontal window, <C-w>b is bottom right
window, <C-w>c or <C-w>q is close window,
<C-w>x is increase and then decrease window width (??),
<C-w>p is last window, <C-w>backspace is move left(ish)
window – puk
Feb 24 '12 at 7:00
As several other people have said, visual mode is the answer to your copy/cut & paste
problem. Vim gives you 'v', 'V', and C-v. Lower case 'v' in vim is essentially the same as
the shift key in notepad. The nice thing is that you don't have to hold it down. You can use
any movement technique to navigate efficiently to the starting (or ending) point of your
selection. Then hit 'v', and use efficient movement techniques again to navigate to the other
end of your selection. Then 'd' or 'y' allows you to cut or copy that selection.
The advantage vim's visual mode has over Jim Dennis's description of cut/copy/paste in vi
is that you don't have to get the location exactly right. Sometimes it's more efficient to
use a quick movement to get to the general vicinity of where you want to go and then refine
that with other movements than to think up a more complex single movement command that gets
you exactly where you want to go.
The downside to using visual mode extensively in this manner is that it can become a
crutch that you use all the time which prevents you from learning new vi(m) commands that
might allow you to do things more efficiently. However, if you are very proactive about
learning new aspects of vi(m), then this probably won't affect you much.
I'll also re-emphasize that the visual line and visual block modes give you variations on
this same theme that can be very powerful...especially the visual block mode.
On Efficient Use of the Keyboard
I also disagree with your assertion that alternating hands is the fastest way to use the
keyboard. It has an element of truth in it. Speaking very generally, repeated use of the same
thing is slow. The most significant example of this principle is that consecutive keystrokes
typed with the same finger are very slow. Your assertion probably stems from the natural
tendency to use the s/finger/hand/ transformation on this pattern. To some extent it's
correct, but at the extremely high end of the efficiency spectrum it's incorrect.
Just ask any pianist. Ask them whether it's faster to play a succession of a few notes
alternating hands or using consecutive fingers of a single hand in sequence. The fastest way
to type 4 keystrokes is not to alternate hands, but to type them with 4 fingers of the same
hand in either ascending or descending order (call this a "run"). This should be self-evident
once you've considered this possibility.
The more difficult problem is optimizing for this. It's pretty easy to optimize for
absolute distance on the keyboard. Vim does that. It's much harder to optimize at the "run"
level, but vi(m) with its modal editing gives you a better chance at being able to do it
than any non-modal approach (ahem, emacs) ever could.
On Emacs
Lest the emacs zealots completely disregard my whole post on account of that last
parenthetical comment, I feel I must describe the root of the difference between the emacs
and vim religions. I've never spoken up in the editor wars and I probably won't do it again,
but I've never heard anyone describe the differences this way, so here it goes. The
difference is the following tradeoff:
Vim gives you unmatched raw text editing efficiency.
Emacs gives you unmatched ability to customize and program the editor.
The blind vim zealots will claim that vim has a scripting language. But it's an obscure,
ad-hoc language that was designed to serve the editor. Emacs has Lisp! Enough said. If you
don't appreciate the significance of those last two sentences or have a desire to learn
enough about functional programming and Lisp to develop that appreciation, then you should
use vim.
The emacs zealots will claim that emacs has viper mode, and so it is a superset of vim.
But viper mode isn't standard. My understanding is that viper mode is not used by the
majority of emacs users. Since it's not the default, most emacs users probably don't develop
a true appreciation for the benefits of the modal paradigm.
In my opinion these differences are orthogonal. I believe the benefits of vim and emacs as
I have stated them are both valid. This means that the ultimate editor doesn't exist yet.
It's probably true that emacs would be the easiest platform on which to base the ultimate
editor. But modal editing is not entrenched in the emacs mindset. The emacs community could
move that way in the future, but that doesn't seem very likely.
So if you want raw editing efficiency, use vim. If you want the ultimate environment for
scripting and programming your editor use emacs. If you want some of both with an emphasis on
programmability, use emacs with viper mode (or program your own mode). If you want the best
of both worlds, you're out of luck for now.
Spend 30 mins doing the vim tutorial (run vimtutor instead of vim in terminal). You will
learn the basic movements, and some keystrokes, this will make you at least as productive
with vim as with the text editor you used before. After that, well, read Jim Dennis' answer
again :)
This is the first thing I thought of when reading the OP. It's obvious that the poster has
never run this; I ran through it when first learning vim two years ago and it cemented in my
mind the superiority of Vim to any of the other editors I've used (including, for me, Emacs
since the key combos are annoying to use on a Mac). – dash-tom-bang
Aug 24 '11 at 0:47
Use \c anywhere in a search to ignore case (overriding your ignorecase or
smartcase settings). E.g. /\cfoo or /foo\c will match
foo, Foo, fOO, FOO, etc.
Use \C anywhere in a search to force case matching. E.g. /\Cfoo
or /foo\C will only match foo.
Odd nobody's mentioned ctags. Download "exuberant ctags" and put it ahead of the crappy
preinstalled version you already have in your search path. Cd to the root of whatever you're
working on; for example the Android kernel distribution. Type "ctags -R ." to build an index
of source files anywhere beneath that dir in a file named "tags". This contains all tags,
no matter the language or where in the dir, in one file, so cross-language work is easy.
Then open vim in that folder and read :help ctags for some commands. A few I use
often:
Put cursor on a method call and type CTRL-] to go to the method definition.
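A few companion tag commands (standard Vim commands, not from the original post) round out that workflow:
CTRL-] ............ jump to the definition of the identifier under the cursor
CTRL-T ............ jump back to where you were before the tag jump
:tag /regex ....... jump to a tag matching a regular expression
:tselect name ..... list all matching tags and pick one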
You asked about productive shortcuts, but I think your real question is: Is vim worth it? The
answer to this stackoverflow question is -> "Yes"
You must have noticed two things. Vim is powerful, and vim is hard to learn. Much of its
power lies in its expandability and endless combinations of commands. Don't feel overwhelmed.
Go slow. One command, one plugin at a time. Don't overdo it.
All that investment you put into vim will pay back a thousand fold. You're going to be
inside a text editor for many, many hours before you die. Vim will be your companion.
Multiple buffers, and in particular fast jumping between them to compare two files with
:bp and :bn (properly remapped to a single Shift +
p or Shift + n )
vimdiff mode (splits in two vertical buffers, with colors to show the
differences)
Area-copy with Ctrl + v
And finally, tab completion of identifiers (search for "mosh_tab_or_complete"). That's a
life changer.
Probably better to set the clipboard option to unnamed ( set
clipboard=unnamed in your .vimrc) to use the system clipboard by default. Or if you
still want the system clipboard separate from the unnamed register, use the appropriately
named clipboard register: "*p . – R. Martinho Fernandes
Apr 1 '10 at 3:17
Love it! I had been getting exasperated by pasting code examples from the web just as I was
starting to feel proficient in vim. That was the command I dreamed up on the spot. This was
when vim totally hooked me. – kevpie
Oct 12 '10 at 22:38
There are a plethora of questions where people talk about common tricks, notably " Vim+ctags
tips and tricks ".
However, I don't refer to commonly used shortcuts that someone new to Vim would find cool.
I am talking about a seasoned Unix user (be they a developer, administrator, both, etc.), who
thinks they know something 99% of us never heard or dreamed about. Something that not only
makes their work easier, but also is COOL and hackish .
After all, Vim resides in
the most dark-corner-rich OS in the world, thus it should have intricacies that only a few
privileged know about and want to share with us.
Might not be one that 99% of Vim users don't know about, but it's something I use daily and
that any Linux+Vim poweruser must know.
Basic command, yet extremely useful.
:w !sudo tee %
I often forget to sudo before editing a file I don't have write permissions on. When I
come to save that file and get a permission error, I just issue that vim command in order to
save the file without the need to save it to a temp file and then copy it back again.
You obviously have to be on a system with sudo installed and have sudo rights.
Something I just discovered recently that I thought was very cool:
:earlier 15m
Reverts the document back to how it was 15 minutes ago. Can take various arguments for the
amount of time you want to roll back, and is dependent on undolevels. Can be reversed with
the opposite command :later
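For reference, the time argument accepts several units, and a count of file writes also works (standard :earlier/:later syntax, not shown in the original post):
:earlier 2h ....... go back two hours
:later 30s ........ go forward thirty seconds
:earlier 3f ....... go back three file writes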
@skinp: If you undo and then make further changes from the undone state, you lose that redo
history. This lets you go back to a state which is no longer in the undo stack. –
ephemient
Apr 8 '09 at 16:15
Also very useful are g+ and g- to go backward and forward in time. This is so much more
powerful than an undo/redo stack since you don't lose the history when you do something
after an undo. – Etienne PIERRE
Jul 21 '09 at 13:53
You don't lose the redo history if you make a change after an undo. It's just not easily
accessed. There are plugins to help you visualize this, like Gundo.vim – Ehtesh Choudhury
Nov 29 '11 at 12:09
This is quite similar to :r! The only difference as far as I can tell is that :r! opens a new
line, :.! overwrites the current line. – saffsd
May 6 '09 at 14:41
An alternative to :.!date is to write "date" on a line and then run
!$sh (alternatively having the command followed by a blank line and run
!jsh ). This will pipe the line to the "sh" shell and substitute with the output
from the command. – hlovdal
Jan 25 '10 at 21:11
:.! is actually a special case of :{range}!, which filters a range
of lines (the current line when the range is . ) through a command and replaces
those lines with the output. I find :%! useful for filtering whole buffers.
– Nefrubyr
Mar 25 '10 at 16:24
And also note that '!' is like 'y', 'd', 'c' etc. i.e. you can do: !!, number!!, !motion
(e.g. !Gshell_command<cr> replace from current line to end of file ('G') with output of
shell_command). – aqn
Apr 26 '13 at 20:52
dab "delete arounb brackets", daB for around curly brackets, t for xml type tags,
combinations with normal commands are as expected cib/yaB/dit/vat etc – sjh
Apr 8 '09 at 15:33
This is possibly the biggest reason for me staying with Vim. That and its equivalent "change"
commands: ciw, ci(, ci", as well as dt<space> and ct<space> – thomasrutter
Apr 26 '09 at 11:11
de -- delete everything till the end of the word; repeat by pressing . at your heart's desire.
ci(xyz[Esc] -- This is a weird one. Here, the 'i' does not mean insert mode. Instead it
means inside the parentheses. So this sequence cuts the text inside the parentheses you're
standing in and replaces it with "xyz". It also works inside square and curly brackets --
just do ci[ or ci{ correspondingly. Naturally, you can do di( if you just want to delete the
text without typing anything. You can also do a instead of i if you
want to delete the parentheses as well and not just the text inside them.
ci" - cuts the text in current quotes
ciw - cuts the current word. This works just like the previous one except that
( is replaced with w .
C - cut the rest of the line and switch to insert mode.
ZZ -- save and close current file (WAY faster than Ctrl-F4 to close the current tab!)
ddp - move current line one row down
xp -- move current character one position to the right
U - uppercase, so viwU uppercases the word
~ - switches case, so viw~ will reverse casing of entire word
Ctrl+u / Ctrl+d scroll the page half-a-screen up or down. This seems to be more useful
than the usual full-screen paging as it makes it easier to see how the two screens relate.
For those who still want to scroll entire screen at a time there's Ctrl+f for Forward and
Ctrl+b for Backward. Ctrl+Y and Ctrl+E scroll down or up one line at a time.
Crazy but very useful command is zz -- it scrolls the screen to make this line appear in
the middle. This is excellent for putting the piece of code you're working on in the center
of your attention. Sibling commands -- zt and zb -- make this line the top or the bottom one
on the screen, which is not quite as useful.
% finds and jumps to the matching parenthesis.
de -- delete from cursor to the end of the word (you can also do dE to delete
until the next space)
bde -- delete the current word, from left to right delimiter
df[space] -- delete up until and including the next space
dt. -- delete until next dot
dd -- delete this entire line
ye (or yE) -- yanks text from here to the end of the word
ce - cuts through the end of the word
bye -- copies current word (makes me wonder what "hi" does!)
yy -- copies the current line
cc -- cuts the current line, you can also do S instead. There's also lower
cap s which cuts current character and switches to insert mode.
viwy or viwc . Yank or change current word. Hit w multiple times to keep
selecting each subsequent word, use b to move backwards
vi{ - select all text in figure brackets. va{ - select all text including {}s
vi(p - highlight everything inside the ()s and replace with the pasted text
b and e move the cursor word-by-word, similarly to how Ctrl+Arrows normally do . The
definition of a word is a little different though, as several consecutive delimiters are treated
as one word. If you start at the middle of a word, pressing b will always get you to the
beginning of the current word, and each consecutive b will jump to the beginning of the next
word. Similarly, and easy to remember, e gets the cursor to the end of the
current, and each subsequent, word.
similar to b / e, capital B and E
move the cursor word-by-word using only whitespaces as delimiters.
capital D (take a deep breath) Deletes the rest of the line to the right of the cursor,
same as Shift+End/Del in normal editors (notice 2 keypresses -- Shift+D -- instead of 3)
All the things you're calling "cut" is "change". eg: C is change until the end of the line.
Vim's equivalent of "cut" is "delete", done with d/D. The main difference between change and
delete is that delete leaves you in normal mode but change puts you into a sort of insert
mode (though you're still in the change command which is handy as the whole change can be
repeated with . ). – Laurence Gonsalves
Feb 19 '11 at 23:49
One that I rarely find in most Vim tutorials, but it's INCREDIBLY useful (at least to me), is
the
g; and g,
to move (forward, backward) through the changelist.
Let me show how I use it. Sometimes I need to copy and paste a piece of code or string,
say a hex color code in a CSS file, so I search, jump (not caring where the match is), copy
it and then jump back (g;) to where I was editing the code to finally paste it. No need to
create marks. Simpler.
Ctrl-O and Ctrl-I (tab) will work similarly, but not the same. They move backward and forward
in the "jump list", which you can view by doing :jumps or :ju For more information do a :help
jumplist – Kimball Robinson
Apr 16 '10 at 0:29
@JoshLee: If one is careful not to traverse newlines, is it safe to not use the -b option? I
ask because sometimes I want to make a hex change, but I don't want to close and
reopen the file to do so. – dotancohen
Jun 7 '13 at 5:50
Sometimes a setting in your .vimrc will get overridden by a plugin or autocommand. To debug
this a useful trick is to use the :verbose command in conjunction with :set. For example, to
figure out where cindent got set/unset:
:verbose set cindent?
This will output something like:
cindent
Last set from /usr/share/vim/vim71/indent/c.vim
This also works with maps and highlights. (Thanks joeytwiddle for pointing this out.) For
example:
:verbose nmap U
n U <C-R>
Last set from ~/.vimrc
:verbose highlight Normal
Normal xxx guifg=#dddddd guibg=#111111 font=Inconsolata Medium 14
Last set from ~/src/vim-holodark/colors/holodark.vim
:verbose can also be used before nmap l or highlight
Normal to find out where the l keymap or the Normal
highlight were last defined. Very useful for debugging! – joeytwiddle
Jul 5 '14 at 22:08
When you get into creating custom mappings, this will save your ass so many times, probably
one of the most useful ones here (IMO)! – SidOfc
Sep 24 '17 at 11:26
Not sure if this counts as dark-corner-ish at all, but I've only just learnt it...
:g/match/y A
will yank (copy) all lines containing "match" into the "a / @a
register. (The capitalization as A makes vim append yankings instead of
replacing the previous register contents.) I used it a lot recently when making Internet
Explorer stylesheets.
Sometimes it's better to do what tsukimi said and just filter out lines that don't match your
pattern. An abbreviated version of that command though: :v/PATTERN/d
Explanation: :v is an abbreviation for :g!, and the
:g command applies any ex command to lines. :y[ank] works and so
does :normal, but here the most natural thing to do is just
:d[elete] . – pandubear
Oct 12 '13 at 8:39
You can also do :g/match/normal "Ayy -- the normal keyword lets you
tell it to run normal-mode commands (which you are probably more familiar with). –
Kimball
Robinson
Feb 5 '16 at 17:58
Hitting <C-f> after : or / (or any time you're in command mode) will bring up the same
history menu. So you can remap q: if you hit it accidentally a lot and still access this
awesome mode. – idbrii
Feb 23 '11 at 19:07
For me it didn't open the source; instead it apparently used elinks to dump rendered page
into a buffer, and then opened that. – Ivan Vučica
Sep 21 '10 at 8:07
@Vdt: It'd be useful if you posted your error. If it's this one: " error (netrw)
neither the wget nor the fetch command is available" you obviously need to make one of those
tools available from your PATH environment variable. – Isaac Remuant
Jun 3 '13 at 15:23
I find this one particularly useful when people send links to a paste service and forgot to
select a syntax highlighting, I generally just have to open the link in vim after appending
"&raw". – Dettorer
Oct 29 '14 at 13:47
I didn't know macros could repeat themselves. Cool. Note: qx starts recording into register x
(he uses qq for register q). 0 moves to the start of the line. dw deletes a word. j moves down
a line. @q will run the macro again (defining a loop). But you forgot to end the recording
with a final "q", then actually run the macro by typing @q. – Kimball Robinson
Apr 16 '10 at 0:39
Another way of accomplishing this is to record a macro in register a that does some
transformation to a single line, then linewise highlight a bunch of lines with V and type
:normal! @a to apply your macro to every line in your selection. –
Nathan Long
Aug 29 '11 at 15:33
I found this post googling recursive VIM macros. I could find no way to stop the macro other
than killing the VIM process. – dotancohen
May 14 '13 at 6:00
Assuming you have Perl and/or Ruby support compiled in, :rubydo and
:perldo will run a Ruby or Perl one-liner on every line in a range (defaults to
entire buffer), with $_ bound to the text of the current line (minus the
newline). Manipulating $_ will change the text of that line.
You can use this to do certain things that are easy to do in a scripting language but not
so obvious using Vim builtins. For example to reverse the order of the words in a line:
:perldo $_ = join ' ', reverse split
To insert a random string of 8 characters (A-Z) at the end of every line:
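The one-liner itself didn't survive the copy; a sketch that would do it (plain Perl, assuming Perl support is compiled in) is:
:perldo $_ .= join '', map { chr(65 + int(rand(26))) } 1..8
Each line gets eight random upper-case letters appended.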
Sadly not, it just adds a funky control character to the end of the line. You could then use
a Vim search/replace to change all those control characters to real newlines though. –
Brian Carper
Jul 2 '09 at 17:26
Go to older/newer position. When you are moving through the file (by searching, moving
commands etc.) vim remembers these "jumps", so you can repeat them backward (^O - O for
old) and forward (^I - just next to I on keyboard). I find it very useful when writing code
and performing a lot of searches.
gi
Go to position where Insert mode was stopped last. I find myself often editing and then
searching for something. To return to the place where you were last editing, press gi.
gf
put cursor on file name (e.g. include header file), press gf and the file is opened
gF
similar to gf but recognizes format "[file name]:[line number]". Pressing gF will open
[file name] and set cursor to [line number].
^P and ^N
Auto complete text while editing (^P - previous match and ^N next match)
^X^L
While editing completes to the same line (useful for programming). You write code and then
you recall that you have the same code somewhere in the file. Just press ^X^L and the full line
is completed
^X^F
Complete file names. You write "/etc/pass" Hmm. You forgot the file name. Just press ^X^F
and the filename is completed
^Z or :sh
Move temporary to the shell. If you need a quick bashing:
press ^Z (to put vi in the background) to return to the original shell, then type fg to return to vim
type :sh to go to a sub shell, then press ^D (or type exit) to return to vi
With ^X^F my pet peeve is that filenames include = signs, making it
do rotten things in many occasions (ini files, makefiles etc). I use se
isfname-== to end that nuisance – sehe
Mar 4 '12 at 21:50
This is a nice trick to reopen the current file with a different encoding:
:e ++enc=cp1250 %:p
Useful when you have to work with legacy encodings. The supported encodings are listed in
a table under encoding-values (see :help encoding-values). A similar thing also works for ++ff, so that you
can reopen a file with Windows/Unix line ends if you got it wrong the first time (see
:help ff).
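For example (standard ++ff syntax), to re-read the current file treating it as having Unix or DOS line endings:
:e ++ff=unix
:e ++ff=dos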
Never had to use this sort of a thing, but we'll certainly add to my arsenal of tricks...
– Sasha
Apr 7 '09 at 18:43
I have used this today, but I think I didn't need to specify "%:p"; just opening the file and
:e ++enc=cp1250 was enough. – Ivan Vučica
Jul 8 '09 at 19:29
This is a terrific answer. Not the bit about creating the IP addresses, but the bit that
implies that VIM can use for loops in commands . – dotancohen
Nov 30 '14 at 14:56
No need, usually, to be exactly on the braces. Though frequently I'd just =} or
vaBaB= because it is less dependent. Also, v}}:!astyle -bj matches
my code style better, but I can get it back into your style with a simple %!astyle
-aj – sehe
Mar 4 '12 at 22:03
I remapped capslock to esc instead, as it's an otherwise useless key. My mapping was OS wide
though, so it has the added benefit of never having to worry about accidentally hitting it.
The only drawback IS ITS HARDER TO YELL AT PEOPLE. :) – Alex
Oct 5 '09 at 5:32
@ojblass: Not sure how many people ever write Matlab code in Vim, but ii and
jj are commonly used for counter variables, because i and
j are reserved for complex numbers. – brianmearns
Oct 3 '12 at 12:45
@rlbond - It comes down to how good is the regex engine in the IDE. Vim's regexes are pretty
powerful; others.. not so much sometimes. – romandas
Jun 19 '09 at 16:58
The * will be greedy, so this regex assumes you have just two columns. If you want it to be
nongreedy use {-} instead of * (see :help non-greedy for more information on the {}
multiplier) – Kimball Robinson
Apr 16 '10 at 0:32
Not exactly a dark secret, but I like to put the following mapping into my .vimrc file, so I
can hit "-" (minus) anytime to open the file explorer to show files adjacent to the one I
just edited. In the file explorer, I can hit another "-" to move up one directory,
providing seamless browsing of complex directory structures (like the ones used by the MVC
frameworks nowadays):
map - :Explore<cr>
These may be also useful for somebody. I like to scroll the screen and advance the cursor
at the same time:
map <c-j> j<c-e>
map <c-k> k<c-y>
Tab navigation - I love tabs and I need to move easily between them:
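The mappings themselves were lost in the copy; a hypothetical reconstruction (key choices are mine, not the original author's) could look like:
map <C-t> :tabnew<CR>
map <C-Left> :tabprevious<CR>
map <C-Right> :tabnext<CR>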
I suppose it would override autochdir temporarily (until you switched buffers again).
Basically, it changes directory to the root directory of the current file. It gives me a bit
more manual control than autochdir does. – rampion
May 8 '09 at 2:55
:set autochdir //this also serves the same functionality and it changes the current directory
to that of file in buffer – Naga Kiran
Jul 8 '09 at 13:44
I like to use 'sudo bash', and my sysadmin hates this. He locked down 'sudo' so it could only
be used with a handful of commands (ls, chmod, chown, vi, etc), but I was able to use vim to
get a root shell anyway:
bash$ sudo vi +'silent !bash' +q
Password: ******
root#
yeah... I'd hate you too ;) you should only need a root shell VERY RARELY, unless you're
already in the habit of running too many commands as root which means your permissions are
all screwed up. – jnylen
Feb 22 '11 at 15:58
Don't forget you can prepend numbers to perform an action multiple times in Vim. So to expand
the current window height by 8 lines: 8<C-W>+ – joeytwiddle
Jan 29 '12 at 18:12
well, if you haven't done anything else to the file, you can simply type u for undo.
Otherwise, I haven't figured that out yet. – Grant Limberg
Jun 17 '09 at 19:29
Commented out code is probably one of the worst types of comment you could possibly put in
your code. There are better uses for the awesome block insert. – Braden Best
Feb 4 '16 at 16:23
I use vim for just about any text editing I do, so I often times use copy and paste. The
problem is that vim by default will often times distort imported text via paste. The way to
stop this is to use
:set paste
before pasting in your data. This will keep it from messing up.
Note that you will have to issue :set nopaste to recover auto-indentation.
Alternative ways of pasting pre-formatted text are the clipboard registers ( *
and + ), and :r!cat (you will have to end the pasted fragment with
^D).
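A related convenience (a common idiom, not given in the post itself) is to bind a key that toggles paste mode for you:
set pastetoggle=<F2>
With that in your .vimrc, hitting F2 flips between :set paste and :set nopaste.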
It is also sometimes helpful to turn on a high contrast color scheme. This can be done
with
:color blue
I've noticed that it does not work on all the versions of vim I use but it does on
most.
The "distortion" is happening because you have some form of automatic indentation enabled.
Using set paste or specifying a key for the pastetoggle option is a
common way to work around this, but the same effect can be achieved with set
mouse=a as then Vim knows that the flood of text it sees is a paste triggered by the
mouse. – jamessan
Dec 28 '09 at 8:27
If you have gvim installed you can often (though it depends on what your options your distro
compiles vim with) use the X clipboard directly from vim through the * register. For example
"*p to paste from the X xlipboard. (It works from terminal vim, too, it's just
that you might need the gvim package if they're separate) – kyrias
Oct 19 '13 at 12:15
Here's something not obvious. If you have a lot of custom plugins / extensions in your $HOME
and you need to work from su / sudo / ... sometimes, then this might be useful.
In your ~/.bashrc:
export VIMINIT=":so $HOME/.vimrc"
In your ~/.vimrc:
if $HOME=='/root'
    if $USER=='root'
        if isdirectory('/home/your_typical_username')
            let rtuser = 'your_typical_username'
        elseif isdirectory('/home/your_other_username')
            let rtuser = 'your_other_username'
        endif
    else
        let rtuser = $USER
    endif
    let &runtimepath = substitute(&runtimepath, $HOME, '/home/'.rtuser, 'g')
endif
It will allow your local plugins to load - whatever way you use to change the user.
You might also like to take the *.swp files out of your current path and into ~/vimtmp
(this goes into .vimrc):
if ! isdirectory(expand('~/vimtmp'))
    call mkdir(expand('~/vimtmp'))
endif
if isdirectory(expand('~/vimtmp'))
    set directory=~/vimtmp
else
    set directory=.,/var/tmp,/tmp
endif
Also, some mappings I use to make editing easier - makes ctrl+s work like escape and
ctrl+h/l switch the tabs:
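Those mappings were dropped from the text above; a hypothetical reconstruction matching the description (ctrl+s acting like escape, ctrl+h/l switching tabs) might be:
map <C-s> <Esc>
imap <C-s> <Esc>
map <C-h> :tabprevious<CR>
map <C-l> :tabnext<CR>
Note that in a terminal Ctrl+S is usually intercepted by flow control, so this works best in gVim.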
I prefer never to run vim as root/under sudo - and would just run the command from vim e.g.
:!sudo tee %, :!sudo mv % /etc or even launch a login shell
:!sudo -i – shalomb
Aug 24 '15 at 8:02
Ctrl-n while in insert mode will auto complete whatever word you're typing based on all the
words that are in open buffers. If there is more than one match it will give you a list of
possible words that you can cycle through using ctrl-n and ctrl-p.
Ability to run Vim on a client/server based modes.
For example, suppose you're working on a project with a lot of buffers, tabs and other
info saved on a session file called session.vim.
You can open your session and create a server by issuing the following command:
vim --servername SAMPLESERVER -S session.vim
Note that you can open regular text files if you want to create a server and it doesn't
have to be necessarily a session.
Now, suppose you're in another terminal and need to open another file. If you open it
regularly by issuing:
vim new_file.txt
Your file would be opened in a separate Vim instance, which makes it hard to interact with
the files in your session. In order to open new_file.txt in a new tab on your server, use this
command:
vim --servername SAMPLESERVER --remote-tab-silent new_file.txt
If there's no server running, this file will be opened just like a regular file.
Since providing those flags every time you want to run them is very tedious, you can
create a separate alias for creating client and server.
I placed the followings on my bashrc file:
alias vims='vim --servername SAMPLESERVER'
alias vimc='vim --servername SAMPLESERVER --remote-tab-silent'
HOWTO: Auto-complete Ctags when using Vim in Bash. For anyone else who uses Vim and Ctags,
I've written a small auto-completer function for Bash. Add the following into your
~/.bash_completion file (create it if it does not exist):
Thanks go to stylishpants for his many fixes and improvements.
_vim_ctags() {
local cur prev
COMPREPLY=()
cur="${COMP_WORDS[COMP_CWORD]}"
prev="${COMP_WORDS[COMP_CWORD-1]}"
case "${prev}" in
-t)
# Avoid the complaint message when no tags file exists
if [ ! -r ./tags ]
then
return
fi
# Escape slashes to avoid confusing awk
cur=${cur////\\/}
COMPREPLY=( $(compgen -W "`awk -vORS=" " "/^${cur}/ { print \\$1 }" tags`" ) )
;;
*)
_filedir_xspec
;;
esac
}
# Files matching this pattern are excluded
excludelist='*.@(o|O|so|SO|so.!(conf)|SO.!(CONF)|a|A|rpm|RPM|deb|DEB|gif|GIF|jp?(e)g|JP?(E)G|mp3|MP3|mp?(e)g|MP?(E)G|avi|AVI|asf|ASF|ogg|OGG|class|CLASS)'
complete -F _vim_ctags -f -X "${excludelist}" vi vim gvim rvim view rview rgvim rgview gview
Once you restart your Bash session (or create a new one) you can type:
Code:
~$ vim -t MyC<tab key>
and it will auto-complete the tag the same way it does for files and directories:
Code:
MyClass MyClassFactory
~$ vim -t MyC
I find it really useful when I'm jumping into a quick bug fix.
Auto-reloading the current buffer is especially useful while viewing log files; it almost
serves the functionality of the Unix "tail" program from within vim.
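The option being referred to is presumably 'autoread'; a minimal setup would be:
:set autoread
Vim will then re-read the file when it notices it has changed on disk and you haven't modified the buffer.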
Checking for compile errors from within vim: set the makeprg variable depending on the
language. For Perl, say:
:setlocal makeprg=perl\ -c\ %
For PHP
set makeprg=php\ -l\ %
set errorformat=%m\ in\ %f\ on\ line\ %l
Issuing ":make" runs the associated makeprg and displays the compilation errors/warnings
in quickfix window and can easily navigate to the corresponding line numbers.
:make will run the makefile in the current directory, parse the compiler
output, you can then use :cn and :cp to step through the compiler
errors opening each file and seeking to the line number in question.
I was sure someone would have posted this already, but here goes.
Take any build system you please; make, mvn, ant, whatever. In the root of the project
directory, create a file of the commands you use all the time, like this:
mvn install
mvn clean install
... and so forth
To do a build, put the cursor on the line and type !!sh. I.e. filter that line; write it
to a shell and replace with the results.
The build log replaces the line, ready to scroll, search, whatever.
When you're done viewing the log, type u to undo and you're back to your file of
commands.
Why wouldn't you just set makeprg to the proper tool you use for your build (if
it isn't set already) and then use :make ? :copen will show you the
output of the build as well as allowing you to jump to any warnings/errors. –
jamessan
Dec 28 '09 at 8:29
==========================================================
In normal mode
==========================================================
gf ................ open file under cursor in same window --> see :h path
Ctrl-w f .......... open file under cursor in new window
Ctrl-w q .......... close current window
Ctrl-w 6 .......... open alternate file --> see :h #
gi ................ init insert mode in last insertion position
'0 ................ place the cursor where it was when the file was last edited
Due to the latency and lack of colors (I love color schemes :) I don't like programming on
remote machines in PuTTY .
So I developed this trick to work around this problem. I use it on Windows.
You will need
1x gVim
1x rsync on remote and local machines
1x SSH private key auth to the remote machine so you don't need to type the
password
Configure rsync to make your working directory accessible. I use an SSH tunnel and only
allow connections from the tunnel:
address = 127.0.0.1
hosts allow = 127.0.0.1
port = 40000
use chroot = false
[bledge_ce]
path = /home/xplasil/divine/bledge_ce
read only = false
Then start rsyncd: rsync --daemon --config=rsyncd.conf
Setting up local machine
Install rsync from Cygwin. Start Pageant and load your private key for the remote machine.
If you're using SSH tunnelling, start PuTTY to create the tunnel. Create a batch file push.bat
in your working directory which will upload changed files to the remote machine using
rsync:
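The contents of push.bat were not preserved; a minimal sketch (module name and port taken from the rsyncd.conf above, the file list is a placeholder) would be something like:
rsync -avz SConstruct src include rsync://localhost:40000/bledge_ce/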
SConstruct is a build file for scons. Modify the list of files to suit your needs. Replace
localhost with the name of the remote machine if you don't use SSH tunnelling.
Configuring Vim
That is now easy. We will use the quickfix feature (:make and error list),
but the compilation will run on the remote machine. So we need to set makeprg:
This will first start the push.bat task to upload the files and then execute the commands
on remote machine using SSH ( Plink from the PuTTY
suite). The command first changes directory to the working dir and then starts build (I use
scons).
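The actual makeprg line is missing above; one way it could look (untested sketch; user, host and the exact quoting are my assumptions, and spaces and double quotes must be backslash-escaped inside :set) is:
set makeprg=push.bat\ &&\ plink\ user@remotehost\ \"cd\ /home/xplasil/divine/bledge_ce\ &&\ scons\"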
The results of the build will show up conveniently in your local gVim error list.
I use Vim for everything. When I'm editing an e-mail message, I use:
gqap (or gwap )
extensively to easily and correctly reformat on a paragraph-by-paragraph basis, even with
quote leadin characters. In order to achieve this functionality, I also add:
-c 'set fo=tcrq' -c 'set tw=76'
to the command to invoke the editor externally. One noteworthy addition would be to add '
a ' to the fo (formatoptions) parameter. This will automatically reformat the paragraph as
you type and navigate the content, but may interfere or cause problems with errant or odd
formatting contained in the message.
autocmd FileType mail set tw=76 fo=tcrq in your ~/.vimrc will also
work, if you can't edit the external editor command. – Andrew Ferrier
Jul 14 '14 at 22:22
":e ." does the same thing for your current working directory which will be the same as your
current file's directory if you set autochdir – bpw1621
Feb 19 '11 at 15:13
retab 1. This sets the tab size to one. But it also goes through the code and adds extra
tabs and spaces so that the formatting does not move any of the actual text (i.e. the text
looks the same after retab).
% s/^I/ /g: Note the ^I is the result of hitting tab. This searches for all tabs and
replaces them with a single space. Since we just did a retab this should not cause the
formatting to change but since putting tabs into a website is hit and miss it is good to
remove them.
% s/^/ /: Replace the beginning of the line with four spaces. Since you can't actually
replace the beginning of the line with anything, it inserts four spaces at the beginning of the
line (this is needed by SO formatting to make the code stand out).
Note that you can achieve the same thing with cat <file> | awk '{print " "
$line}' . So try :w ! awk '{print " " $line}' | xclip -i . That's
supposed to be four spaces between the "" – Braden Best
Feb 4 '16 at 16:40
When working on a project where the build process is slow I always build in the background
and pipe the output to a file called errors.err (something like make debug 2>&1
| tee errors.err ). This makes it possible for me to continue editing or reviewing the
source code during the build process. When it is ready (using pynotify on GTK to inform me
that it is complete) I can look at the result in vim using quickfix . Start by
issuing :cf[ile] which reads the error file and jumps to the first error. I personally like
to use cwindow to get the build result in a separate window.
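Concretely (standard quickfix commands), once the build has finished:
:cfile errors.err ....... load the saved build output into the quickfix list
:cwindow ................ open the quickfix window if there are errors
:cn / :cp ............... jump to the next / previous error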
A short explanation would be appreciated... I tried it and it could be very useful! You can
even do something like set colorcolumn=+1,+10,+20 :-) – Luc M
Oct 31 '12 at 15:12
colorcolumn allows you to specify columns that are highlighted (it's ideal for
making sure your lines aren't too long). In the original answer, set cc=+1
highlights the column after textwidth . See the documentation for
more information. – mjturner
Aug 19 '15 at 11:16
Yes, but that's like saying yank/paste functions make an editor "a little" more like an IDE.
Those are editor functions. Pretty much everything that goes with the editor that concerns
editing text and that particular area is an editor function. IDE functions would be, for
example, project/files management, connectivity with compiler&linker, error reporting,
building automation tools, debugger ... i.e. the stuff that doesn't actually have anything to do with
editing text. Vim has some functions & plugins so it can gravitate a little more towards
being an IDE, but these are not the ones in question. – Rook
May 12 '09 at 21:25
Also, just FYI, vim has an option to set invnumber. That way you don't have to "set nu" and
"set nonu", i.e. remember two functions - you can just toggle. – Rook
May 12 '09 at 21:31
:ls lists all the currently opened buffers. :be opens a file in a
new buffer, :bn goes to the next buffer, :bp to the previous,
:b filename opens buffer filename (it auto-completes too). Buffers are distinct
from tabs, which I'm told are more analogous to views. – Nona Urbiz
Dec 20 '10 at 8:25
In insert mode, ctrl + x, ctrl + p will complete
(with menu of possible completions if that's how you like it) the current long identifier
that you are typing.
if (SomeCall(LONG_ID_ <-- type c-x c-p here
[LONG_ID_I_CANT_POSSIBLY_REMEMBER]
LONG_ID_BUT_I_NEW_IT_WASNT_THIS_ONE
LONG_ID_GOSH_FORGOT_THIS
LONG_ID_ETC
∶
Neither of the following is really diehard, but I find them extremely useful.
Trivial bindings, but I just can't live without. It enables hjkl-style movement in insert
mode (using the ctrl key). In normal mode: ctrl-k/j scrolls half a screen up/down and
ctrl-l/h goes to the next/previous buffer. The µ and ù mappings are especially
for an AZERTY-keyboard and go to the next/previous make error.
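The bindings themselves didn't make it into the text; a hypothetical reconstruction matching the description would be:
" insert-mode hjkl movement on the ctrl key
inoremap <C-h> <Left>
inoremap <C-j> <Down>
inoremap <C-k> <Up>
inoremap <C-l> <Right>
" normal mode: half-screen scrolling and buffer switching
nnoremap <C-j> <C-d>
nnoremap <C-k> <C-u>
nnoremap <C-l> :bnext<CR>
nnoremap <C-h> :bprevious<CR>
" AZERTY-friendly quickfix navigation
nnoremap µ :cnext<CR>
nnoremap ù :cprevious<CR>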
A small function I wrote to highlight functions, globals, macros, structs and typedefs.
(Might be slow on very large files). Each type gets different highlighting (see ":help
group-name" to get an idea of your current colortheme's settings) Usage: save the file with
ww (default "\ww"). You need ctags for this.
nmap <Leader>ww :call SaveCtagsHighlight()<CR>
"Based on: http://stackoverflow.com/questions/736701/class-function-names-highlighting-in-vim
function SaveCtagsHighlight()
write
let extension = expand("%:e")
if extension!="c" && extension!="cpp" && extension!="h" && extension!="hpp"
return
endif
silent !ctags --fields=+KS *
redraw!
let list = taglist('.*')
for item in list
let kind = item.kind
if kind == 'member'
let kw = 'Identifier'
elseif kind == 'function'
let kw = 'Function'
elseif kind == 'macro'
let kw = 'Macro'
elseif kind == 'struct'
let kw = 'Structure'
elseif kind == 'typedef'
let kw = 'Typedef'
else
continue
endif
let name = item.name
if name != 'operator=' && name != 'operator ='
exec 'syntax keyword '.kw.' '.name
endif
endfor
echo expand("%")." written, tags updated"
endfunction
I have the habit of writing lots of code and functions and I don't like to write
prototypes for them. So I made some function to generate a list of prototypes within a
C-style sourcefile. It comes in two flavors: one that removes the formal parameter's name and
one that preserves it. I just refresh the entire list every time I need to update the
prototypes. It avoids having out of sync prototypes and function definitions. Also needs
ctags.
"Usage: in normal mode, where you want the prototypes to be pasted:
":call GenerateProptotypes()
function GeneratePrototypes()
execute "silent !ctags --fields=+KS ".expand("%")
redraw!
let list = taglist('.*')
let line = line(".")
for item in list
if item.kind == "function" && item.name != "main"
let name = item.name
let retType = item.cmd
let retType = substitute( retType, '^/\^\s*','','' )
let retType = substitute( retType, '\s*'.name.'.*', '', '' )
if has_key( item, 'signature' )
let sig = item.signature
let sig = substitute( sig, '\s*\w\+\s*,', ',', 'g')
let sig = substitute( sig, '\s*\w\+\(\s)\)', '\1', '' )
else
let sig = '()'
endif
let proto = retType . "\t" . name . sig . ';'
call append( line, proto )
let line = line + 1
endif
endfor
endfunction
function GeneratePrototypesFullSignature()
"execute "silent !ctags --fields=+KS ".expand("%")
let dir = expand("%:p:h")
execute "silent !ctags --fields=+KSi --extra=+q ".dir."/*"
redraw!
let list = taglist('.*')
let line = line(".")
for item in list
if item.kind == "function" && item.name != "main"
let name = item.name
let retType = item.cmd
let retType = substitute( retType, '^/\^\s*','','' )
let retType = substitute( retType, '\s*'.name.'.*', '', '' )
if has_key( item, 'signature' )
let sig = item.signature
else
let sig = '(void)'
endif
let proto = retType . "\t" . name . sig . ';'
call append( line, proto )
let line = line + 1
endif
endfor
endfunction
" Pasting in normal mode should append to the right of cursor
nmap <C-V> a<C-V><ESC>
" Saving
imap <C-S> <C-o>:up<CR>
nmap <C-S> :up<CR>
" Insert mode control delete
imap <C-Backspace> <C-W>
imap <C-Delete> <C-O>dw
nmap <Leader>o o<ESC>k
nmap <Leader>O O<ESC>j
" tired of my typo
nmap :W :w
I rather often find it useful to define some key mapping on the fly, just like one would
define a macro. The twist here is that the mapping is recursive and is executed
until it fails.
I am completely aware of all the downsides - it just so happens that I found it rather
useful in some occasions. Also it can be interesting to watch it at work ;).
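As an illustration only (my example, not the author's): a throwaway recursive mapping that joins lines until the join fails at the end of the buffer could be defined as
:map Q JQ
Pressing Q then keeps executing J; when J can no longer join (last line), the error aborts the rest of the mapping and the "loop" stops.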
Macros are also allowed to be recursive and work in pretty much the same fashion when they
are, so it's not particularly necessary to use a mapping for this. – 00dani
Aug 2 '13 at 11:25
"... The .vimrc settings should be heavily commented ..."
"... Look also at perl-support.vim (a Perl IDE for Vim/gVim). Comes with suggestions for customizing Vim (.vimrc), gVim (.gvimrc), ctags, perltidy, and Devel:SmallProf beside many other things. ..."
"... Perl Best Practices has an appendix on Editor Configurations . vim is the first editor listed. ..."
"... Andy Lester and others maintain the official Perl, Perl 6 and Pod support files for Vim on Github: https://github.com/vim-perl/vim-perl ..."
There are a lot of threads pertaining to how to configure Vim/GVim for Perl
development on PerlMonks.org .
My purpose in posting this question is to try to create, as much as possible, an ideal configuration for Perl development using
Vim/GVim. Please post your suggestions for .vimrc settings as well as useful plugins.
I will try to merge the recommendations into a set of .vimrc settings and to a list of recommended plugins, ftplugins
and syntax files.
.vimrc settings
"Create a command :Tidy to invoke perltidy"
"By default it operates on the whole file, but you can give it a"
"range or visual range as well if you know what you're doing."
command -range=% -nargs=* Tidy <line1>,<line2>!
\perltidy -your -preferred -default -options <args>
vmap <tab> >gv "make tab in v mode indent code"
vmap <s-tab> <gv
nmap <tab> I<tab><esc> "make tab in normal mode indent code"
nmap <s-tab> ^i<bs><esc>
let perl_include_pod = 1 "include pod.vim syntax file with perl.vim"
let perl_extended_vars = 1 "highlight complex expressions such as @{[$x, $y]}"
let perl_sync_dist = 250 "use more context for highlighting"
set nocompatible "Use Vim defaults"
set backspace=2 "Allow backspacing over everything in insert mode"
set autoindent "Always set auto-indenting on"
set expandtab "Insert spaces instead of tabs in insert mode. Use spaces for indents"
set tabstop=4 "Number of spaces that a <Tab> in the file counts for"
set shiftwidth=4 "Number of spaces to use for each step of (auto)indent"
set showmatch "When a bracket is inserted, briefly jump to the matching one"
@Manni: You are welcome. I have been using the same .vimrc for many years and a recent bunch of vim related questions
got me curious. I was too lazy to wade through everything that was posted on PerlMonks (and see what was current etc.), so I figured
we could put together something here. – Sinan Ünür
Oct 15 '09 at 20:02
Rather than closepairs, I would recommend delimitMate or one of the various autoclose plugins. (There are about three named autoclose,
I think.) The closepairs plugin can't handle a single apostrophe inside a string (i.e. print "This isn't so hard, is it?"
), but delimitMate and others can. github.com/Raimondi/delimitMate
– Telemachus
Jul 8 '10 at 0:40
Three hours later: turns out that the 'p' in that mapping is a really bad idea. It will bite you when vim's got something to paste.
– innaM
Oct 21 '09 at 13:22
@Manni: I just gave it a try: if you type ,pt, vim waits for you to type something else (e.g. <cr>) as a signal that
the command is ended. Hitting ,ptv will immediately format the region. So I would expect that vim recognizes that
there is overlap between the mappings, and waits for disambiguation before proceeding. –
Ether
Oct 21 '09 at 19:44
" Create a command :Tidy to invoke perltidy.
" By default it operates on the whole file, but you can give it a
" range or visual range as well if you know what you're doing.
command -range=% -nargs=* Tidy <line1>,<line2>!
\perltidy -your -preferred -default -options <args>
Look also at perl-support.vim (a Perl
IDE for Vim/gVim). Comes with suggestions for customizing Vim (.vimrc), gVim (.gvimrc), ctags, perltidy, and Devel:SmallProf beside
many other things.
I hate the fact that \$ is changed automatically to a "my $" declaration (same with \@ and \%). Does the author never use references
or what?! – sundar
Mar 11 '10 at 20:54
" Allow :make to run 'perl -c' on the current buffer, jumping to
" errors as appropriate
" My copy of vimparse: http://irc.peeron.com/~zigdon/misc/vimparse.pl
set makeprg=$HOME/bin/vimparse.pl\ -c\ %\ $*
" point at wherever you keep the output of pltags.pl, allowing use of ^-]
" to jump to function definitions.
set tags+=/path/to/tags
@sinan it enables quickfix - all it does is reformat the output of perl -c so that vim parses it as compiler errors. Then the usual
quickfix commands work. – zigdon
Oct 16 '09 at 18:51
Here's an interesting module I found on the weekend:
App::EditorTools::Vim
. Its most interesting feature seems to be its ability to rename lexical variables. Unfortunately, my tests revealed that it doesn't
seem to be ready yet for any production use, but it sure seems worth to keep an eye on.
Here are a couple of my .vimrc settings. They may not be Perl specific, but I couldn't work without them:
set nocompatible " Use Vim defaults (much better!) "
set bs=2 " Allow backspacing over everything in insert mode "
set ai " Always set auto-indenting on "
set showmatch " show matching brackets "
" for quick scripts, just open a new buffer and type '_perls' "
iab _perls #!/usr/bin/perl<CR><BS><CR>use strict;<CR>use warnings;<CR>
The first one I know I picked up part of it from someone else, but I can't remember who. Sorry unknown person. Here's how I
made "C^N" auto complete work with Perl. Here's my .vimrc commands.
" to use CTRL+N with modules for autocomplete "
set iskeyword+=:
set complete+=k~/.vim_extras/installed_modules.dat
Then I set up a cron to create the installed_modules.dat file. Mine is for my mandriva system. Adjust accordingly.
locate *.pm | grep "perl5" | sed -e "s/\/usr\/lib\/perl5\///" | sed -e "s/5.8.8\///" | sed -e "s/5.8.7\///" | sed -e "s/vendor_perl\///" | sed -e "s/site_perl\///" | sed -e "s/x86_64-linux\///" | sed -e "s/\//::/g" | sed -e "s/\.pm//" >/home/jeremy/.vim_extras/installed_modules.dat
The second one allows me to use gf in Perl. gf is a shortcut to other files: just place your cursor over the file name and type
gf and it will open that file.
" To use gf with perl "
set path+=$PWD/**,
set path+=/usr/lib/perl5/*,
set path+=/CompanyCode/*, " directory containing work code "
autocmd BufRead *.p? set include=^use
autocmd BufRead *.pl set includeexpr=substitute(v:fname,'\\(.*\\)','\\1.pm','i')
To copy two lines, it's even faster just to go yj or yk,
especially since you don't double up on one character. Plus, yk is a backwards
version that 2yy can't do, and you can put the number of lines to reach
backwards in y9j or y2k, etc.. Only difference is that your count
has to be n-1 for a total of n lines, but your head can learn that
anyway. – zelk
Mar 9 '14 at 13:29
If you would like to duplicate a line and paste it right away below the current line, just
like in Sublime Ctrl + Shift + D, then you can add this to
your .vimrc file.
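The snippet itself was lost above; a hypothetical equivalent (most terminals can't tell Ctrl+Shift+D from Ctrl+D, so a leader key is used here instead) is:
nnoremap <Leader>d :t.<CR>
:t. is the Ex "copy" command, which duplicates the current line right below itself and leaves the cursor on the copy.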
y7yp (or 7yyp) is rarely useful; the cursor remains on the first line copied so that p pastes
the copied lines between the first and second line of the source. To duplicate a block of
lines use 7yyP – Nefrubyr
Jul 29 '14 at 14:09
For someone who doesn't know vi, some answers from above might mislead them with phrases like
"paste ... after/before current line ".
It's actually "paste ... after/before cursor ".
yy or Y to copy the line
or dd to delete the line
then
p to paste the copied or deleted text after the cursor
or P to paste the copied or deleted text before the cursor
For those starting to learn vi, here is a good introduction to vi by listing side by side vi
commands to typical Windows GUI Editor cursor movement and shortcut keys. It lists all the
basic commands including yy (copy line) and p (paste after) or
P (paste before).
When you press : in visual mode, it is transformed to '<,'>
so it pre-selects the line range the visual selection spanned over. So, in visual mode,
:t0 will copy the lines at the beginning. – Benoit
Jun 30 '12 at 14:17
For the record: when you type a colon (:) you go into command line mode where you can enter
Ex commands. vimdoc.sourceforge.net/htmldoc/cmdline.html
Ex commands can be really powerful and terse. The yyp solutions are "Normal mode" commands.
If you want to copy/move/delete a far-away line or range of lines an Ex command can be a lot
faster. – Niels Bom
Jul 31 '12 at 8:21
Y is usually remapped to y$ (yank (copy) until end of line (from
current cursor position, not beginning of line)) though. With this line in
.vimrc : :nnoremap Y y$ – Aaron Thoma
Aug 22 '13 at 23:31
gives you the advantage of preserving the cursor position.
You can also try <C-x><C-l> which will repeat the last line from insert mode and
bring up a completion window with all of the lines. It works almost like <C-p>
This is very useful, but to avoid having to press many keys I have mapped it to just CTRL-L,
this is my map: inoremap ^L ^X^L – Jorge Gajon
May 11 '09 at 6:38
1 gotcha: when you use "p" to put the line, it puts it after the line your cursor is
on, so if you want to add the line after the line you're yanking, don't move the cursor down
a line before putting the new line.
Use the > command. To indent 5 lines, 5>> . To mark a block of lines and indent it, Vjj>
to indent 3 lines (vim only). To indent a curly-braces block, put your cursor on one of the curly braces and use >%
.
If you're copying blocks of text around and need to align the indent of a block in its new location, use ]p
instead of just p . This aligns the pasted block with the surrounding text.
Also, the shiftwidth
setting allows you to control how many spaces to indent.
My problem (in gVim) is that the command > indents much more than 2 blanks (I want just two blanks but > indents something like
5 blanks) – Kamran Bigdely
Feb 28 '11 at 23:25
The problem with . in this situation is that you have to move your fingers. With @mike's solution (same one i use) you've already
got your fingers on the indent key and can just keep whacking it to keep indenting rather than switching and doing something else.
Using period takes longer because you have to move your hands and it requires more thought because it's a second, different, operation.
– masukomi
Dec 6 '13 at 21:24
I've an XML file and turned on syntax highlighting. Typing gg=G just puts every line starting from position 1. All
the white spaces have been removed. Is there anything else specific to XML? –
asgs
Jan 28 '14 at 21:57
This is cumbersome, but is the way to go if you do formatting outside of core VIM (for instance, using vim-prettier
instead of the default indenting engine). Using > will otherwise royally screw up the formatting done by Prettier.
– oligofren
Mar 27 at 15:23
I find it better than the accepted answer, as I can see what is happening, the lines I'm selecting and the action I'm doing, and
not just type some sort of vim incantation. – user4052054
Aug 17 at 17:50
Suppose | represents the position of the cursor in Vim. If the text to be indented is enclosed in a code block like:
int main() {
line1
line2|
line3
}
you can do >i{ which means " indent ( > ) inside ( i ) block ( { )
" and get:
int main() {
    line1
    line2|
    line3
}
Now suppose the lines are contiguous but outside a block, like:
do
line2|
line3
line4
done
To indent lines 2 thru 4 you can visually select the lines and type > . Or even faster you can do >2j
to get:
do
    line2|
    line3
    line4
done
Note that >Nj means indent from current line to N lines below. If the number of lines to be indented
is large, it could take some seconds for the user to count the proper value of N . To save valuable seconds you can
activate the option of relative number with set relativenumber (available since Vim version 7.3).
Not on my Solaris or AIX boxes it doesn't. The equals key has always been one of my standard ad hoc macro assignments. Are you
sure you're not looking at a vim that's been linked to as vi? –
rojomoke
Jul 31 '14 at 10:09
In ex mode you can use :left or :le to align lines a specified amount. Specifically,
:left will Left align lines in the [range]. It sets the indent in the lines to [indent] (default 0).
:%le3 or :%le 3 or :%left3 or :%left 3 will align the entire file by padding
with three spaces.
:5,7 le 3 will align lines 5 through 7 by padding them with 3 spaces.
:le without any value or :le 0 will left align with a padding of 0.
Awesome, just what I was looking for (a way to insert a specific number of spaces -- 4 spaces for markdown code -- to override
my normal indent). In my case I wanted to indent a specific number of lines in visual mode, so shift-v to highlight the lines,
then :'<,'>le4 to insert the spaces. Thanks! –
Subfuzion
Aug 11 '17 at 22:02
There is one more way that hasn't been mentioned yet - you can use norm i command to insert given text at the beginning
of the line. To insert 10 spaces before lines 2-10:
:2,10norm 10i
Remember that there has to be a space character at the end of the command - this will be the character we want to have inserted.
We can also indent line with any other text, for example to indent every line in file with 5 underscore characters:
:%norm 5i_
Or something even more fancy:
:%norm 2i[ ]
More practical example is commenting Bash/Python/etc code with # character:
:1,20norm i#
To remove indentation, use x instead of i. For example, to remove the first 5 characters from every line:
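Following the same pattern as above, the elided example would presumably be:
:%norm 5x
which deletes the first 5 characters of every line.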
...what? 'indent by 4 spaces'? No, this jumps to line 4 and then indents everything from there to the end of the file, using the
currently selected indent mode (if any). – underscore_d
Oct 17 '15 at 19:35
There are clearly a lot of ways to solve this, but this is the easiest to implement, as line numbers show by default in vim and
it doesn't require math. – HoldOffHunger
Dec 5 '17 at 15:50
How to indent highlighted code in vi immediately by a # of spaces:
Option 1: Indent a block of code in vi to three spaces with Visual Block mode:
Select the block of code you want to indent. Do this using Ctrl+V in normal mode and arrowing down to select
text. While it is selected, enter : to give a command to the block of selected text.
The following will appear in the command line: :'<,'>
To set indent to 3 spaces, type le 3 and press enter. This is what appears: :'<,'>le 3
The selected text is immediately indented to 3 spaces.
Option 2: Indent a block of code in vi to three spaces with Visual Line mode:
Open your file in VI.
Put your cursor over some code
While in normal mode, press the following keys:
Vjjjj:le 3
Interpretation of what you did:
V means start selecting text.
jjjj arrows down 4 lines, highlighting 4 lines.
: tells vi you will enter an instruction for the highlighted text.
le 3 means set the indent of the highlighted text to 3 spaces.
The selected code is immediately increased or decreased to three spaces indentation.
Option 3: use Visual Block mode and special insert mode to increase indent:
Open your file in VI.
Put your cursor over some code
While in normal mode, press the following keys:
Ctrl+V
jjjj
Shift+I
(press spacebar 5 times)
Esc
All the highlighted text is indented an additional 5 spaces.
This answer summarises the other answers and comments of this question, and adds extra information based on the
Vim documentation and the
Vim wiki . For conciseness, this answer doesn't distinguish between Vi and
Vim-specific commands.
In the commands below, "re-indent" means "indent lines according to your
indentation settings ."
shiftwidth is the
primary variable that controls indentation.
General Commands
>> Indent line by shiftwidth spaces
<< De-indent line by shiftwidth spaces
5>> Indent 5 lines
5== Re-indent 5 lines
>% Increase indent of a braced or bracketed block (place cursor on brace first)
=% Reindent a braced or bracketed block (cursor on brace)
<% Decrease indent of a braced or bracketed block (cursor on brace)
]p Paste text, aligning indentation with surroundings
=i{ Re-indent the 'inner block', i.e. the contents of the block
=a{ Re-indent 'a block', i.e. block and containing braces
=2a{ Re-indent '2 blocks', i.e. this block and containing block
>i{ Increase inner block indent
<i{ Decrease inner block indent
You can replace { with } or B, e.g. =iB is a valid block indent command.
Take a look at "Indent a Code Block" for a nice example
to try these commands out on.
Also, remember that . (the dot command) repeats the last command, so indentation commands can be easily and conveniently repeated.
Re-indenting complete files
Another common situation is requiring indentation to be fixed throughout a source file:
gg=G Re-indent entire buffer
You can extend this idea to multiple files:
" Re-indent all your c source code:
:args *.c
:argdo normal gg=G
:wall
Or multiple buffers:
" Re-indent all open buffers:
:bufdo normal gg=G
:wall
In Visual Mode
Vjj> Visually mark and then indent 3 lines
In insert mode
These commands apply to the current line:
CTRL-t insert indent at start of line
CTRL-d remove indent at start of line
0 CTRL-d remove all indentation from line
Ex commands
These are useful when you want to indent a specific range of lines, without moving your cursor.
:< and :> Given a range, apply indentation e.g.
:4,8> indent lines 4 to 8, inclusive
set expandtab "Use softtabstop spaces instead of tab characters for indentation
set shiftwidth=4 "Indent by 4 spaces when using >>, <<, == etc.
set softtabstop=4 "Indent by 4 spaces when pressing <TAB>
set autoindent "Keep indentation from previous line
set smartindent "Automatically inserts indentation in some cases
set cindent "Like smartindent, but stricter and more customisable
Vim has intelligent indentation based on filetype. Try adding this to your .vimrc:
if has("autocmd")
" File type detection. Indent based on filetype. Recommended.
filetype plugin indent on
endif
Both this answer and the one above it were great. But I +1'd this because it reminded me of the 'dot' operator, which repeats
the last command. This is extremely useful when needing to indent an entire block several shiftwidths (or indentations)
without needing to keep pressing >} . Thanks a lot! –
Amit
Aug 10 '11 at 13:26
5>> Indent 5 lines : This command indents the fifth line, not 5 lines. Could this be due to my VIM settings, or is your
wording incorrect? – Wipqozn
Aug 24 '11 at 16:00
Great summary! Also note that the "indent inside block" and "indent all block" (<i{ >a{ etc.) also work with parentheses and
brackets: >a( <i] etc. (And while I'm at it, in addition to <>'s, they also work with d, c, y etc.) –
aqn
Mar 6 '13 at 4:42
Using Python a lot, I frequently find myself needing to shift blocks by more than one indent. You can do this by using
any of the block selection methods, and then just enter the number of indents you wish to jump right before the >
E.g. V5j3> will indent 5 lines 3 times - which is 12 spaces if you use 4 spaces for indents
The beauty of vim's UI is that it's consistent. Editing commands are made up of the command and a cursor move. The cursor moves
are always the same:
H to top of screen, L to bottom, M to middle
n G to go to line n, G alone to bottom of file, gg to top
n to move to next search match, N to previous
} to end of paragraph
% to next matching bracket, either of the parentheses or the tag kind
enter to the next line
'x to the mark x, where x is a letter or another '
many more, including w and W for word, $ or 0 to the ends of the
line, etc., that don't apply here because they are not line movements.
So, in order to use vim you have to learn to move the cursor and remember a repertoire of commands like, for example, >
to indent (and < to "outdent").
Thus, for indenting the lines from the cursor position to the top of the screen you do >H, >G to indent
to the bottom of the file.
If, instead of typing >H, you type dH then you are deleting the same block of lines, cH
for replacing it, etc.
Some cursor movements fit better with specific commands. In particular, the % command is handy to indent a whole
HTML or XML block.
If the file has syntax highlighting ( :syn on ), then placing the cursor in the text of a tag (say, on the "i" of
<div> ) and entering >% will indent up to the closing </div> tag.
This is how vim works: one has to remember only the cursor movements and the commands, and how to mix them.
So my answer to this question would be "go to one end of the block of lines you want to indent, and then type the >
command and a movement to the other end of the block" if indent is interpreted as shifting the lines, or =
if indent is interpreted as pretty-printing.
When the 'expandtab' option is off (this is the default) Vim uses <Tab>s as much as possible to make the indent. ( :help :> )
– Kent Fredric
Mar 16 '11 at 8:36
The only tab/space related vim setting I've changed is :set tabstop=3. It's actually inserting this every time I use >>: "<tab><space><space>".
Same with indenting a block. Any ideas? – Shane Reustle
Dec 2 '12 at 3:17
The three settings you want to look at for "spaces vs tabs" are 1. tabstop 2. shiftwidth 3. expandtab. You probably have "shiftwidth=5
noexpandtab", so a "tab" is 3 spaces, and an indentation level is "5" spaces, so it makes up the 5 with 1 tab, and 2 spaces. �
Kent Fredric
Dec 2 '12 at 17:08
For me, the MacVim (Visual) solution was, select with mouse and press ">", but after putting the following lines in "~/.vimrc"
since I like spaces instead of tabs:
set expandtab
set tabstop=2
set shiftwidth=2
Also it's useful to be able to call MacVim from the command-line (Terminal.app), so I have the following helper directory
"~/bin", where I place a script called "macvim":
#!/usr/bin/env bash
/usr/bin/open -a /Applications/MacPorts/MacVim.app "$@"
And of course in "~/.bashrc":
export PATH=$PATH:$HOME/bin
Macports messes with "~/.profile" a lot, so the PATH environment variable can get quite long.
A quick way to do this using VISUAL MODE uses the same process as commenting a block of code.
This is useful if you would prefer not to change your shiftwidth or use any set directives and is
flexible enough to work with TABS or SPACES or any other character.
Position cursor at the beginning of the block
v to switch to -- VISUAL MODE --
Select the text to be indented
Type : to switch to the prompt
Replacing with 3 leading spaces:
:'<,'>s/^/   /g
Or replacing with leading tabs:
:'<,'>s/^/\t/g
Brief Explanation:
'<,'> - Within the Visually Selected Range
s/^/   /g - Insert 3 spaces at the beginning of every line within the whole range
(or)
s/^/\t/g - Insert Tab at the beginning of every line within the whole range
Yup, and this is why one of my big peeves is white space on an otherwise empty line: it messes up vim's notion of a "paragraph".
– aqn
Mar 6 '13 at 4:47
In addition to the answer already given and accepted, it is also possible to place a marker and then indent everything from the
current cursor to the marker. Thus, enter ma where you want the top of your indented block, cursor down as far as
you need and then type >'a (note that " a " can be replaced with any valid mark name). This is sometimes
easier than 5>> or vjjj> .
I have been using vim for quite some time and am aware that selecting blocks of text in
visual mode is as simple as SHIFT + V and moving the arrow key up or
down line-by-line until I reach the end of the block of text that I want selected.
My question is - is there a faster way in visual mode to select a block of text for
example by SHIFT + V followed by specifying the line number in which I
want the selection to stop? (via :35 for example, where 35 is the line number I
want to select up to - this obviously does not work, so my question is to find out whether
something similar to this can be done...)
+1 Good question as I have found myself doing something like this often. I am wondering if
perhaps this isn't the place to start using v% or v/pattern or
something else? – user786653
Sep 13 '11 at 19:08
V35G will visually select from current line to line 35, also V10j
or V10k will visually select the next or previous 10 lines – Stephan
Sep 29 '14 at 22:49
for line selecting I use the shortcut nnoremap <Space> V . When in visual
line mode, just right-click with the mouse to define the selection (at least on Linux it is so).
Anyway, more effective than with keyboard only. – Mikhail V
Mar 27 '15 at 16:52
In addition to what others have said, you can also expand your selection using pattern
searches.
For example, v/foo will select from your current position to the next instance
of "foo." If you actually wanted to expand to the next instance of "foo," on line
35, for example, just press n to expand selection to the next instance, and so
on.
update
I don't often do it, but I know that some people use marks extensively to make visual
selections. For example, if I'm on line 5 and I want to select to line 35, I might press
ma to place mark a on line 5, then :35 to move to line 35.
Shift + v to enter linewise visual mode, and finally `a to
select back to mark a .
@DanielPark To select the current word, use viw . If
you want to select the current contiguous non-whitespace, use viW (that is, vi followed by Shift+W ). The difference would be when the caret is here:
MyCla|ss.Method , the first combo would select MyClass and the second
would select the whole thing. – Jay
Oct 31 '13 at 0:18
G Goto line [count], default last line, on the first
non-blank character linewise. If 'startofline' not
set, keep the same column.
G is one of the jump-motions.
Vim is a language. To really understand Vim, you have to know the language. Many commands are
verbs, and vim also has objects and prepositions.
V100G
V100gg
This means "select the current line up to and including line 100."
Text objects are where a lot of the power is at. They introduce more objects with
prepositions.
Vap
This means "select around the current paragraph", that is select the current paragraph and
the blank line following it.
V2ap
This means "select around the current paragraph and the next paragraph."
}V-2ap
This means "go to the end of the current paragraph and then visually select it and the
preceding paragraph."
Understanding Vim as a language will help you to get the best mileage out of it.
After you have selecting down, then you can combine with other commands:
Vapd
With the above command, you can select around a paragraph and delete it. Change the
d to a y to copy or to a c to change or to a
p to paste over.
Once you get the hang of how all these commands work together, then you will eventually
not need to visually select anything. Instead of visually selecting and then deleting a
paragraph, you can just delete the paragraph with the dap command.
The book "Unix in a Nutshell" discusses about accessing multiple files on pages 572-573.
There seem to be very useful commands such as ":e", ":e #", ":e new_file", ":n files",
":args", ":prev" and ":n!". The commands confuse me:
":n Edit next file in the list of files."
":args Display list of files to be edited."
":prev Edit previous file in the list of files."
I cannot see any real list when I do ":args"; there is only a small text in the corner. I
would like to see all files that I accessed with ":e", i.e. a list of files in the buffer list.
Where can I see the list when I do the command ":n files"? What are the commands ":prev"
and ":n" supposed to do? I got the error message:
Regarding the last part: If you have only one buffer open, then you cannot toggle through
them ('cause there is only one open). – Rook
Apr 19 '09 at 3:25
Is the notion of a buffer the same as in Emacs? Interestingly, the book defines buffer only
for Emacs :( It states "When you open a file in Emacs, the file is put into a Buffer. -- The
view of the buffer contents that you have at any point in time is called a window." Are
buffers and windows different from their counterparts in Vim? – Léo
Léopold Hertz 준영
Apr 19 '09 at 3:57
Yes, you could say that. There are some differences in the types of available buffers, but in
principle, that's it. I'm not sure about Emacs; it has windows/frames, while vim has
windows/tabs. Regarding vim: a window is only a method of showing what vim has in a buffer. A
tab is a method of showing several windows on screen (tabs in vim have only recently been
introduced). – Rook
Apr 19 '09 at 11:08
In addition to what Jonathan Leffler said, if you don't invoke Vim with multiple files from
the commandline, you can set Vim's argument list after Vim is open via:
:args *.c
Note that the argument list is different from the list of open buffers you get from
:ls . Even if you close all open buffers in Vim, the argument list stays the
same. :n and :prev may open a brand new buffer in Vim (if a buffer
for that file isn't already open), or may take you to an existing buffer.
Similarly you can open multiple buffers in Vim without affecting the argument list (or
even if the arg list is empty). :e opens a new buffer but doesn't necessarily
affect the argument list. The list of open buffers and the argument list are independent. If
you want to iterate through the list of open buffers rather than iterate through the argument
list, use :bn and :bp and friends.
The commands :n , :p , :ar , :rew and :last operate on the command-line argument list.
E.g.
> touch aaa.txt bbb.txt ccc.txt
> gvim *.txt
vim opens in aaa.txt
:ar gives a status line
[aaa.txt] bbb.txt ccc.txt
:n moves to bbb.txt
:ar gives the status line
aaa.txt [bbb.txt] ccc.txt
:rew rewinds us back to the start of the command line arg list to aaa.txt
:last sends us to ccc.txt
:e ddd.txt edits a new file ddd.txt
:ar gives the status line
aaa.txt bbb.txt [ccc.txt]
So the command set only operates on the initial command line argument list.
To clarify, Vim has the argument list, the buffer list, windows, and tab pages. The argument
list is the list of files you invoked vim with (e.g. vim file1 file2); the :n and :p commands
work with this. The buffer list is the list of in-memory copies of the files you are editing,
just like emacs. Note that all the files loaded at start (in the argument list) are also in
the buffer list. Try :help buffer-list for more information on both.
Windows are viewports for buffers. Think of windows as "desks" on which you can put
buffers to work on them. Windows can be empty or be displaying buffers that can also be
displayed in other windows, which you can use for example to look at two different areas of
the same buffer at the same time. Try :help windows for more info.
Tabs are collections of windows. For example, you can have one tab with one window, and
another tab with two windows vertically split. Try :help tabpage for more info
To prevent less from clearing the screen upon exit, use -X .
From the manpage:
-X or --no-init
Disables sending the termcap initialization and deinitialization strings to the
terminal. This is sometimes desirable if the deinitialization string does something
unnecessary, like clearing the screen.
As to less exiting if the content fits on one screen, that's option -F :
-F or --quit-if-one-screen
Causes less to automatically exit if the entire file can be displayed on the first
screen.
-F is not the default though, so it's likely preset somewhere for you. Check
the env var LESS .
This is especially annoying if you know about -F but not -X , as
then moving to a system that resets the screen on init will make short files simply not
appear, for no apparent reason. This bit me with ack when I tried to take my
ACK_PAGER='less -RF' setting to the Mac. Thanks a bunch! – markpasc
Oct 11 '10 at 3:44
@markpasc: Thanks for pointing that out. I would not have realized that this combination
would cause this effect, but now it's obvious. – sleske
Oct 11 '10 at 8:45
This is especially useful for the man pager, so that man pages do not disappear as soon as
you quit less with the 'q' key. That is, you scroll to the position in a man page that you
are interested in only for it to disappear when you quit the less pager in order to use the
info. So, I added: export MANPAGER='less -s -X -F' to my .bashrc to keep man
page info up on the screen when I quit less, so that I can actually use it instead of having
to memorize it. – Michael Goldshteyn
May 30 '13 at 19:28
If you want any of the command-line options to always be default, you can add to your
.profile or .bashrc the LESS environment variable. For example:
export LESS="-XF"
will always apply -X -F whenever less is run from that login session.
Sometimes commands are aliased (even by default in certain distributions). To check for
this, type
alias
without arguments to see if it got aliased with options that you don't want. To run the
actual command in your $PATH instead of an alias, just preface it with a backslash:
\less
To see if a LESS environment variable is set in your environment and affecting
behavior:
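The command itself is missing from this copy of the answer; a minimal check (my sketch, assuming a POSIX-like shell) would be:
# print the current value of LESS, if any
echo "$LESS"
# or list any pager-related variables set in the environment
env | grep -E '^(LESS|LESSOPEN|PAGER)='
If this prints something like -XF, that preset is where the behavior comes from.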
Thanks for that! -XF on its own was breaking the output of git diff
, and -XFR gets the best of both worlds -- no screen-clearing, but coloured
git diff output. – Giles Thomas
Jun 10 '15 at 12:23
less is a lot more than more , for instance you have a lot more
functionality:
g: go top of the file
G: go bottom of the file
/: search forward
?: search backward
N: show line number
: goto line
F: similar to tail -f, stop with ctrl+c
S: split lines
There are a couple of things that I do all the time in less that don't work
in more (at least in the versions on the systems I use). One is using G
to go to the end of the file, and g to go to the beginning. This is useful for log
files, when you are looking for recent entries at the end of the file. The other is search,
where less highlights the match, while more just brings you to the
section of the file where the match occurs, but doesn't indicate where it is.
You can use v to jump into the current $EDITOR. You can convert to tail -f
mode with F , as well as use all the other tips others offered.
Ubuntu still has distinct less/more bins. At least mine does, or the more
command is sending different arguments to less.
In any case, to see the difference, find a file that has more rows than you can see at one
time in your terminal. Type cat , then the file name. It will just dump the
whole file. Type more , then the file name. If on ubuntu, or at least my version
(9.10), you'll see the first screen, then --More--(27%) , which means there's
more to the file, and you've seen 27% so far. Press space to see the next page.
less allows moving line by line, back and forth, plus searching and a whole
bunch of other stuff.
Basically, use less . You'll probably never need more for
anything. I've used less on huge files and it seems OK. I don't think it does
crazy things like load the whole thing into memory ( cough Notepad). Showing line
numbers could take a while, though, with huge files.
more is an old utility. When the text passed to it is too large to fit on one
screen, it pages it. You can scroll down but not up.
Some systems hardlink more to less , providing users with a strange
hybrid of the two programs that looks like more and quits at the end of the file
like more but has some less features such as backwards scrolling. This is a
result of less 's more compatibility mode. You can enable this
compatibility mode temporarily with LESS_IS_MORE=1 less ... .
more passes raw escape sequences by default. Escape sequences tell your terminal
which colors to display.
less
less was written by a man who was fed up with more 's inability to
scroll backwards through a file. He turned less into an open source project and over
time, various individuals added new features to it. less is massive now. That's why
some small embedded systems have more but not less . For comparison,
less 's source is over 27000 lines long. more implementations are generally
only a little over 2000 lines long.
In order to get less to pass raw escape sequences, you have to pass it the
-r flag. You can also tell it to only pass ANSI escape characters by passing it the
-R flag.
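As a quick illustration (my own sketch, not part of the original answer; it assumes a GNU userland where ls supports --color), -R lets colored output survive the pager:
# force color even though output goes through a pipe, then keep the ANSI colors in less
ls --color=always /etc | less -R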
most
most is supposed to be more than less . It can display multiple files at
a time. By default, it truncates long lines instead of wrapping them and provides a
left/right scrolling mechanism. most's
website has no information about most 's features. Its manpage indicates that it
is missing at least a few less features such as log-file writing (you can use
tee for this though) and external command running.
By default, most uses strange non-vi-like keybindings. man most | grep
'\<vi.?\>' doesn't return anything so it may be impossible to put most
into a vi-like mode.
most has the ability to decompress gunzip-compressed files before reading. Its
status bar has more information than less 's.
more is an old utility. You can't browse stepwise with more; you can use space to browse page-wise, or enter to go line
by line, and that is about it. less is more plus additional features: you can browse page-wise and line-wise both up and down, and search.
There is one single application for which I prefer more to less :
To check my LATEST modified log files (in /var/log/ ), I use ls -AltF |
more .
While less clears the screen after exiting with q ,
more leaves those files and directories listed by ls on the screen,
sparing me from memorizing their names for examination.
(Should anybody know a parameter or configuration enabling less to keep its
text after exiting, that would render this post obsolete.)
The parameter you want is -X (long form: --no-init ). From
less ' manpage:
Disables sending the termcap initialization and
deinitialization strings to the terminal. This is sometimes desirable if the deinitialization
string does something unnecessary, like clearing the screen.
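A minimal sketch of that use case (mine, not the poster's): pipe the listing through less with -X so it stays on the terminal after quitting:
ls -AltF /var/log | less -X
(The LESS environment variable trick shown earlier in this thread makes -X the default for every invocation.)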
It's not about style. Technically, you can use a <p> to create a heading just by
increasing the font size. But search engines won't understand it like that. –
jalgames
Apr 23 '14 at 12:53
You are not allowed to use the time element like that. Since
dd-mm-yyyy isn't one of the recognised formats, you have to supply a
machine-readable version (in one of the recognised formats) in a datetime
attribute of the time element. See w3.org/TR/2014/REC-html5-20141028/
– Andreas Rejbrand
Nov 17 '14 at 18:58
The weakness here is that it needs to be on some sort of link, but if you have that
there's a long discussion of
alternatives here . If you don't have a link, then just use a class attribute, that's
what it's for:
@Quang: the rel attribute is there to describe what the link's destination is.
If the link has no destination, rel is meaningless. – Paul D. Waite
Sep 5 '11 at 7:19
Both
rel="author" and <address>
are designed for this exact purpose. Both are supported in HTML5. The spec tells us that
rel="author" can be used on <link> , <a> , and <area> elements. Google also recommends its
usage . Combining use of
<address> and rel="author" seems optimal. HTML5 best affords
wrapping <article>
headline and byline info in a <header> like so:
Since the pubdate attribute is gone from both the WHATWG and W3C specs, as Bruce
Lawson writes here , I suggest you to remove it from
your answer. – Paul Kozlovitch
Apr 16 '15 at 11:36
Thanks Jason, do you know what "q.v." means? Under >4.4.4 >Author information
associated with an article element (q.v. the address element) does not apply to nested
article elements. – Quang Van
Feb 29 '12 at 9:24
The dl element represents an association list consisting of zero or more name-value
groups (a description list). A name-value group consists of one or more names (dt elements)
followed by one or more values (dd elements), ignoring any nodes other than dt and dd
elements. Within a single dl element, there should not be more than one dt element for each
name.
Name-value groups may be terms and definitions, metadata topics and values, questions
and answers, or any other groups of name-value data.
Authorship and other article meta information fits perfectly into this key:value pair
structure:
who is the author
date the article published
site structure under which the article is organized (category/tag: string/arrays)
As you can see when using the <dl> element for article meta
information, we are free to wrap <address> , <a> and
even <img> tags in <dt> and/or <dd>
tags according to the nature of the content and its intended function .
The <dl> , <dt> and <dd> tags are
free to do their job -- semantically -- conveying information about the parent
<article> ; <a> , <img> and
<address> are similarly free to do their job -- again, semantically --
conveying information regarding where to find related content, non-verbal visual
presentation, and contact details for authoritative parties, respectively.
Please note that the only unfortunate difference is that you need Administrator rights to
create symbolic links, i.e. you need an elevated prompt. (A workaround is that the
SeCreateSymbolicLinkPrivilege can be granted to normal Users via
secpol.msc .)
Note in terminology: Windows shortcuts are not called "symlinks"; they are shell
links , as they are simply files that the Windows Explorer shell treats specially.
Symlinks: How do I create them on NTFS file system?
Windows Vista and later versions support Unix-style symlinks on NTFS filesystems.
Remember that they also follow the same path resolution – relative links are created
relative to the link's location, not to the current directory. People often forget that. They
can also be implemented using an absolute path; e.g. c:\windows\system32 instead of \system32
(which goes to a system32 directory connected to the link's location).
Symlinks are implemented using reparse points and
generally have the same behavior as Unix symlinks.
For files you can execute:
mklink linkname targetpath
For directories you can execute:
mklink /d linkname targetpath
Hardlinks: How do I create them on NTFS file systems?
All versions of Windows NT support Unix-style hard links on NTFS filesystems.
Using mklink on Vista and up:
mklink /h linkname targetpath
For Windows 2000 and XP, use fsutil .
fsutil hardlink create linkname targetpath
These also work the same way as Unix hard links – multiple file table entries point
to the same inode .
Directory Junctions: How do I create them on NTFS file systems?
Windows 2000 and later support directory junctions on NTFS filesystems. They are
different from symlinks in that they are always absolute and only point to
directories, never to files.
mklink /j linkname targetpath
On versions which do not have mklink , download junction from
Sysinternals:
How can I mount a volume using a reparse point in Windows?
For completeness, on Windows 2000 and later , reparse points can also point to
volumes , resulting in persistent Unix-style disk mounts :
mountvol mountpoint \\?\Volume{volumeguid}
Volume GUIDs are listed by mountvol ; they are static but only within the
same machine.
Is there a way to do this in Windows Explorer?
Yes, you can use the shell extension Link Shell Extension
which makes it very easy to make the links that have been described above. You can find the
downloads at the bottom of the
page .
The NTFS file system implemented in NT4, Windows 2000, Windows XP, Windows XP64, and
Windows 7 supports a facility known as hard links (referred to herein as
Hardlinks ). Hardlinks provide the ability to keep a single copy of a file yet have
it appear in multiple folders (directories). They can be created with the POSIX command
ln included in the Windows Resource Kit, the fsutil command
utility included in Windows XP or my command line ln.exe utility.
The extension allows the user to select one or many files or folders, then using the
mouse, complete the creation of the required Links - Hardlinks, Junctions or Symbolic Links
or in the case of folders to create Clones consisting of Hard or Symbolic Links. LSE is
supported on all Windows versions that support NTFS version 5.0 or later, including Windows
XP64 and Windows 7. Hardlinks, Junctions and Symbolic Links are NOT supported on FAT file
systems, nor is the Cloning and Smart Copy process supported on FAT file systems.
The source can simply be picked using a right-click menu.
And depending on what you picked, you right-click on a destination folder and
get a menu with options.
This makes it very easy to create links. For an extensive guide, read the
LSE documentation
.
In this answer I will attempt to outline what the different types of links in
directory management are, why they are useful, and when
they could be used. When trying to achieve a certain organization on your file volumes,
knowing the various different types as well as how to create them is valuable knowledge.
For information on how a certain link can be made, refer to grawity 's answer .
What is a link?
A link is a relationship between two entities; in the context of directory management, a
link can be seen as a relationship between the following two entities:
1. The directory table
This table keeps track of the files and folders that reside in a specific folder.
A directory table is a special type of file that represents a directory (also known
as a folder). Each file or directory stored within it is represented by a 32-byte entry
in the table. Each entry records the name, extension, attributes (archive, directory,
hidden, read-only, system and volume), the date and time of last modification, the
address of the first cluster of the file/directory's data and finally the size of the
file/directory.
2. The data cluster
More specifically, the first cluster of the file or directory.
A cluster is the smallest logical amount of disk space that can be allocated to hold
a file.
The special thing about this relationship is that it allows one to have only one data
cluster but many links to that data cluster, this allows us to show data as being
present in multiple locations. However, there are multiple ways to do this and each method of
doing so has its own effects.
To see where this roots from, let's go back to the past...
What is a shell link and why is it not always sufficient?
Although it might not sound familiar, we all know this one! File shortcuts are undoubtedly the most
frequently used way of linking files. These were found in some of the early versions of
Windows 9x and have been there for a long while.
These allow you to quickly create a shortcut to any file or folder; more
specifically, they are made to store extra information along with just the link, like for example the
working directory the file is executed in, the arguments to provide to
the program, as well as options like whether to maximize the program.
The downside to this approach of linking is exactly that: the extra information requires
this type of link to have a data cluster of its own to contain it. The problem then is
not necessarily that it takes disk space, but rather that the link is indirectly
accessed, as the data cluster first has to be requested before we get to the actual link.
If the path referred to in the actual link is gone, the shell link will still exist.
If you were to operate on the file being referred to, you would actually first have to
figure out in which directory the file is. You can't simply open the link in an editor as you
would then be editing the .lnk file rather than the file being linked to. This
locks out a lot of possible use cases for shell links.
How does a junction
point link try to solve these problems?
A NTFS
junction point allows one to create a symbolic link to a directory on the
local drives , in such a way that it behaves just like a normal directory. So, you have
one directory of files stored on your disk but can access it from multiple locations.
When removing the junction point, the original directory remains. When removing the
original directory, the junction point remains. It is very costly to enumerate the disk to
check for junction points that have to be deleted. This is a downside as a result of its
implementation.
The NTFS junction point is implemented using NTFS reparse points , which are NTFS
file system objects introduced with Windows 2000.
An NTFS reparse point is a type of NTFS file system object. Reparse points provide a way
to extend the NTFS filesystem by adding extra information to the directory entry, so a file
system filter can interpret how the operating system will treat the data. This allows the
creation of junction points, NTFS symbolic links and volume mount points, and is a key
feature to Windows 2000's Hierarchical Storage System.
That's right, the invention of the reparse point allows us to do more sophisticated ways
of linking.
The NTFS junction point is a soft link , which means that it just links to the
name of the file. This means that whenever the link is deleted the original data stays
intact ; but whenever the original data is deleted, the data will be
gone and the junction point will be left dangling.
Can I also soft link files? Are there symbolic links?
Yes, when Windows Vista came around they decided to extend the functionality of the NTFS
file system object(s) by providing the NTFS symbolic link , which is a soft
link that acts in the same way as the NTFS junction point. But can be applied to file and
directories.
They again share the same deletion behavior; in some use cases this can be a pain for
files, as you don't want to have a useless copy of a file hanging around. This is why
the notion of hard links has also been implemented.
What is a hard link and how
does it behave as opposed to soft links?
Hard links are not NTFS file system objects; they are instead a link to a file (in
detail, they refer to the MFT entry, as that stores extra information about the actual file).
The MFT entry has a field that counts the number of times the file is hard linked to.
The data will still be accessible as long as at least one link that points to it still
exists.
So, the data no longer depends on a single MFT entry to exist . As long as there is a
hard link around, the data will survive. This prevents accidental deletion in cases where
one does not want to remember where the original file was.
You could for example make a folder with "movies I still have to watch" as well as a
folder "movies that I take on vacation" as well as a folder "favorite movies". Movies that
are none of these will be properly deleted while movies that are any of these will continue
to exist even when you have watched a movie.
What is a volume mount point link
for?
Some IT or business people might dislike having to remember or type the different drive
letters their system has. What does M: really mean anyway? Was it Music? Movies?
Models? Maps?
Microsoft has made efforts over the years to try to migrate users away from working
in drive C: toward working in their user folder . I could undoubtedly say that
the users with UAC and permission problems are those that do not follow these guidelines, but
doesn't that make them wonder:
Why should you even be viewing anything but your personal files on a daily
basis?
Volume mount points are the professional IT way of not being limited by drive letters as
well as having a directory structure that makes sense for them, but...
My files are in
different places, can I use links to get them together?
In Windows 7, Libraries were
introduced exactly for this purpose. Done with music files that are located in this folder,
and that folder and that folder . From a lower level of view, a library can be
viewed as multiple links. They are again implemented as a file system object that can contain
multiple references. It is in essence a one-to-many relationship ...
My
brain explodes... Can you summarize when to use them?
Shortcut links: Use them when you need quick access to an executable or website, a
file that you launch very often or when you need to specify parameters to an
application and a batch file is overkill. Don't use it when you intend to manipulate
the file through its shortcut.
Junction points: Use them when you want a directory to be elsewhere, this allows you
to move directories to faster or slower drives without losing the ability to access the
original path. Another use is when you want access to a directory through another path.
These can't be used to link to a share.
Soft links: Use them where a shortcut link does not suffice, it is often used when you
do intend to manipulate the file through its shortcut. Or when you want the file to be on
a faster or slower drive without losing the ability to access the original path.
Hard links: Use them when you only want a file to be gone when all hard links to it
are removed. This can't be used for folders.
Volume mount points: Use them when you run out of drive letters, or when you find it
more feasible to access a volume through a path rather than through a drive letter.
Libraries: Use them when you have the same type of file at many different locations
and you need them to be together; this supports removable drives, so it is handy to have
the folders on your removable drives show up alongside those on your computer when you
insert it. You can click on the individual folders from the folder tree under the library
in the tree view, which facilitates moving files between both.
But do they do so at filesystem level in such way that, say, cmd.exe and dir can
list the aggregated content (in which case, where in the file system are they, I can't find
it), or do they only aggregate at shell level, where only Windows Explorer and file dialogs
can show them? I was under the impression it was the latter, but your "No" challenges this
unless I wrote my question wrong (I meant to say "Libraries are shell-level like shortcut
links are , right?" ). – Medinoc
Sep 30 '14 at 20:52
@Pacerier: Windows uses the old location system, where you can for example move a music
folder around from its properties. Libraries are a new addition, which the OS itself barely
uses as a result. Therefore I doubt if anything would break; as they are intended solely for
display purposes, ... – Tom Wijsman
Apr 25 '15 at 10:23
If you're on Windows Vista or later, and have admin rights, you might check out the mklink
command (it's a command line tool). I'm not sure how symlink-y it actually is since windows
gives it the little arrow icon it puts on shortcuts, but a quick notepad++ test on a text
file suggests it might work for what you're looking for.
You can run mklink with no arguments for a quick usage guide.
mklink uses NTFS junction points (I believe that's what they're called) to more or less
perfectly duplicate Unix-style linking. Windows can tell that it's a junction, though, so
it'll give it the traditional arrow icon. iirc you can remove this with some registry
fiddling, but I don't remember where. – jcrawfordor
Oct 18 '11 at 17:58
@jcrawfordor: The disk structures are "reparse
points" . Junctions and symlinks are two different types of reparse points; volume
mountpoints are third. – grawity
Oct 18 '11 at 18:01
Thanks grawity, for the confirmation. I've never played around with them much, so I just
wanted to include disclaim.h ;) – GeminiDomino
Oct 18 '11 at 18:13
A Junction Point should never be removed in Win2k, Win2003 and WinXP with Explorer, the
del or del /s commands, or with any utility that recursively walks directories since these
will delete the target directory and all its subdirectories. Instead, use the rmdir
command, the linkd utility, or fsutil (if using WinXP or above) or a third party tool to
remove the junction point without affecting the target. In Vista/Win7, it's safe to delete
Junction Points with Explorer or with the rmdir and del commands.
My /etc/group has grown by adding new users as well as installing programs that
have added their own user and/or group. The same is true for /etc/passwd .
Editing has now become a little cumbersome due to the lack of structure.
May I sort these files (e.g. by numerical id or alphabetical by name) without negative
effect on the system and/or package managers?
I would guess that it does not matter but just to be sure I would like to get a 2nd
opinion. Maybe root needs to be the 1st line or within the first 1k lines or
something?
How does sorting the file help with editing? Is it because you want to group related accounts
together, and then do similar changes in a range of rows? But will related account be
adjacent if you sort by uid or name? – Barmar
Feb 21 at 20:51
@Barmar It has helped mainly because user accounts are grouped by ranges and separate from
system accounts (when sorting by UID). Therefore it is easier e.g. to spot the correct line
to examine or change when editing with vi . – Ned64
Mar 13 at 23:15
You should be OK doing
this : in fact, according to the article and reading the documentation, you can sort
/etc/passwd and /etc/group by UID/GID with pwck -s and
grpck -s , respectively.
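A cautious session based on that advice might look like this (a sketch, not from the linked article; run as root and keep backups, since pwck and grpck rewrite the files in place):
# back up the account databases first
cp -a /etc/passwd /etc/passwd.bak
cp -a /etc/group /etc/group.bak
# sort /etc/passwd (and /etc/shadow) by UID, and /etc/group (and /etc/gshadow) by GID
pwck -s
grpck -s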
@Menasheh This site's colours don't make them stand out as much as on other sites, but "OK
doing this" in this answer is a hyperlink. – hvd
Feb 18 at 22:59
OK, fine, but... In general, are there valid reasons to manually edit /etc/passwd and similar
files? Isn't it considered better to access these via the tools that are designed to create
and modify them? – mickeyf
Feb 19 at 14:05
@mickeyf I've seen people manually edit /etc/passwd when they're making batch
changes, like changing the GECOS field for all users due to moving/restructuring (global room
or phone number changes, etc.) It's not common anymore, but there are specific reasons that
crop up from time to time. – ErikF
Feb 20 at 21:21
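For illustration, such a batch GECOS change can also be done without opening /etc/passwd in an editor at all; this is a hypothetical sketch (the mapping file gecos-updates.txt and its tab-separated format are invented here), using the standard usermod tool:
# each line of gecos-updates.txt: username<TAB>new GECOS string
while IFS=$'\t' read -r user gecos; do
    usermod -c "$gecos" "$user"
done < gecos-updates.txt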
Although ErikF is correct that this should generally be okay, I do want to point out one
potential issue:
You're
allowed to map different usernames to the same UID. If you make use of this, tools that
map a UID back to a username will generally pick the first username they find for that UID in
/etc/passwd . Sorting may cause a different username to appear first. For
display purposes (e.g. ls -l output), either username should work, but it's
possible that you've configured some program to accept requests from username A, where it
will deny those requests if it sees them coming from username B, even if A and B are the same
user.
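Before sorting, it may be worth checking whether any UID is actually shared by several usernames; a small sketch using awk (my addition, not hvd's):
# print every UID that occurs more than once in /etc/passwd, with the names that share it
awk -F: '{ names[$3] = names[$3] " " $1; count[$3]++ }
         END { for (uid in count) if (count[uid] > 1) print uid ":" names[uid] }' /etc/passwd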
Having root on the first line has long been a de facto "standard" and is very convenient if
you ever have to fix its shell or delete the password, when dealing with problems or
recovering systems.
Likewise I prefer to have daemons/utils users in the middle and standard users at the end
of both passwd and shadow .
hvd's answer is also very good regarding disturbing the users' order, especially in
systems with many users maintained by hand.
If you manage to sort the files only partially, for instance only the standard users, that would
be more sensible than changing the order of all users, IMO.
If you sort numerically by UID, you should get your preferred order. Root is always
0 , and daemons conventionally have UIDs under 100. – Barmar
Feb 21 at 20:13
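If you do decide to sort, something along these lines keeps root (UID 0) first and system accounts ahead of regular users; this is only a sketch, and writing to a copy for review is safer than editing the live file:
# numeric sort on the third colon-separated field (the UID)
sort -t: -k3,3n /etc/passwd > /tmp/passwd.sorted
diff /etc/passwd /tmp/passwd.sorted    # inspect before installing the sorted copy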
If sudo vi /etc/hosts is successful, it means that the system administrator has
allowed the user to run vi /etc/hosts as root. That's the whole point of sudo:
it lets the system administrator authorize certain users to run certain commands with extra
privileges.
Giving a user the permission to run vi gives them the permission to run any
vi command, including :sh to run a shell and :w to overwrite any
file on the system. A rule allowing only to run vi /etc/hosts does not make any
sense since it allows the user to run arbitrary commands.
There is no "hacking" involved. The breach of security comes from a misconfiguration, not
from a hole in the security model. Sudo does not particularly try to prevent against
misconfiguration. Its documentation is well-known to be difficult to understand; if in doubt,
ask around and don't try to do things that are too complicated.
It is in general a hard problem to give a user a specific privilege without giving them
more than intended. A bulldozer approach like giving them the right to run an interactive
program such as vi is bound to fail. A general piece of advice is to give the minimum
privileges necessary to accomplish the task. If you want to allow a user to modify one file,
don't give them the permission to run an editor. Instead, either:
Give them the permission to write to the file. This is the simplest method with the
least risk of doing something you didn't intend.
setfacl -m u:bob:rw /etc/hosts
Give them permission to edit the file via sudo. To do that, don't give them the
permission to run an editor. As explained in the sudo documentation, give them the
permission to run sudoedit , which invokes an editor as the original
user and then uses the extra privileges only to modify the file.
bob ALL = sudoedit /etc/hosts
The sudo method is more complicated to set up, and is less transparent for the user
because they have to invoke sudoedit instead of just opening the file in
their editor, but has the advantage that all accesses are logged.
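From the user's point of view, the workflow under that rule would look roughly like this (a sketch, assuming the sudoers line above is in place):
# opens a temporary copy of /etc/hosts in the user's own editor ($SUDO_EDITOR, $VISUAL or $EDITOR),
# then writes it back to /etc/hosts with elevated privileges on save
sudoedit /etc/hosts
# equivalent spelling of the same command
sudo -e /etc/hosts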
Note that allowing a user to edit /etc/hosts may have an impact on your
security infrastructure: if there's any place where you rely on a host name corresponding to
a specific machine, then that user will be able to point it to a different machine. Consider
that
it is probably unnecessary anyway .
I used rsync to copy a large number of files, but my OS (Ubuntu) restarted
unexpectedly.
After reboot, I ran rsync again, but from the output on the terminal, I found
that rsync still copied those already copied before. But I heard that
rsync is able to find differences between source and destination, and therefore
to just copy the differences. So I wonder in my case if rsync can resume what
was left last time?
Yes, rsync won't copy again files that it's already copied. There are a few edge cases where
its detection can fail. Did it copy all the already-copied files? What options did you use?
What were the source and target filesystems? If you run rsync again after it's copied
everything, does it copy again? – Gilles
Sep 16 '12 at 1:56
@Gilles: Thanks! (1) I think I saw rsync copied the same files again from its output on the
terminal. (2) Options are the same as in my other post, i.e. sudo rsync -azvv
/home/path/folder1/ /home/path/folder2 . (3) Source and target are both NTFS, but
source is an external HDD, and target is an internal HDD. (4) It is now running and hasn't
finished yet. – Tim
Sep 16 '12 at 2:30
@Tim Off the top of my head, there's at least clock skew, and differences in time resolution
(a common issue with FAT filesystems which store times in 2-second increments, the
--modify-window option helps with that). – Gilles
Sep 19 '12 at 9:25
First of all, regarding the "resume" part of your question, --partial just tells
the receiving end to keep partially transferred files if the sending end disappears as though
they were completely transferred.
While transferring files, they are temporarily saved as hidden files in their target
folders (e.g. .TheFileYouAreSending.lRWzDC ), or a specifically chosen folder if
you set the --partial-dir switch. When a transfer fails and
--partial is not set, this hidden file will remain in the target folder under
this cryptic name, but if --partial is set, the file will be renamed to the
actual target file name (in this case, TheFileYouAreSending ), even though the
file isn't complete. The point is that you can later complete the transfer by running rsync
again with either --append or --append-verify .
So, --partial doesn't itself resume a failed or cancelled transfer.
To resume it, you'll have to use one of the aforementioned flags on the next run. So, if you
need to make sure that the target won't ever contain files that appear to be fine but are
actually incomplete, you shouldn't use --partial . Conversely, if you want to
make sure you never leave behind stray failed files that are hidden in the target directory,
and you know you'll be able to complete the transfer later, --partial is there
to help you.
With regards to the --append switch mentioned above, this is the actual
"resume" switch, and you can use it whether or not you're also using --partial .
Actually, when you're using --append , no temporary files are ever created.
Files are written directly to their targets. In this respect, --append gives the
same result as --partial on a failed transfer, but without creating those hidden
temporary files.
So, to sum up, if you're moving large files and you want the option to resume a cancelled
or failed rsync operation from the exact point that rsync stopped, you need to
use the --append or --append-verify switch on the next attempt.
As @Alex points out below, since version 3.0.0 rsync now has a new option,
--append-verify , which behaves like --append did before that
switch existed. You probably always want the behaviour of --append-verify , so
check your version with rsync --version . If you're on a Mac and not using
rsync from homebrew , you'll (at least up to and including El
Capitan) have an older version and need to use --append rather than
--append-verify . Why they didn't keep the behaviour on --append
and instead named the newcomer --append-no-verify is a bit puzzling. Either way,
--append on rsync before version 3 is the same as
--append-verify on the newer versions.
--append-verify isn't dangerous: It will always read and compare the data on
both ends and not just assume they're equal. It does this using checksums, so it's easy on
the network, but it does require reading the shared amount of data on both ends of the wire
before it can actually resume the transfer by appending to the target.
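Putting that together, a resumable copy of large files might look like this (a sketch with invented paths, not the poster's exact command):
# first attempt; --partial keeps any half-transferred file under its real name
rsync -av --partial /data/bigfiles/ user@backuphost:/backups/bigfiles/
# after an interruption, resume by appending to the partial files and
# checksum-verifying the part that is already there (rsync >= 3.0.0)
rsync -av --partial --append-verify /data/bigfiles/ user@backuphost:/backups/bigfiles/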
Second of all, you said that you "heard that rsync is able to find differences between
source and destination, and therefore to just copy the differences."
That's correct, and it's called delta transfer, but it's a different thing. To enable
this, you add the -c , or --checksum switch. Once this switch is
used, rsync will examine files that exist on both ends of the wire. It does this in chunks,
compares the checksums on both ends, and if they differ, it transfers just the differing
parts of the file. But, as @Jonathan points out below, the comparison is only done when files
are of the same size on both ends -- different sizes will cause rsync to upload the entire
file, overwriting the target with the same name.
This requires a bit of computation on both ends initially, but can be extremely efficient
at reducing network load if, for example, you're frequently backing up very large,
fixed-size files that often contain minor changes. Examples that come to mind are virtual
hard drive image files used in virtual machines or iSCSI targets.
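As an illustration of that use case (a sketch with invented paths, combining it with the --inplace switch discussed a little further down):
# --checksum: compare file contents in chunks rather than relying only on size and mtime
# --inplace:  write changed blocks directly into the existing target file instead of recreating it
rsync -av --checksum --inplace /var/lib/vm/vm.img user@backuphost:/backups/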
It is notable that if you use --checksum to transfer a batch of files that
are completely new to the target system, rsync will still calculate their checksums on the
source system before transferring them. Why I do not know :)
So, in short:
If you're often using rsync to just "move stuff from A to B" and want the option to cancel
that operation and later resume it, don't use --checksum , but do use
--append-verify .
If you're using rsync to back up stuff often, using --append-verify probably
won't do much for you, unless you're in the habit of sending large files that continuously
grow in size but are rarely modified once written. As a bonus tip, if you're backing up to
storage that supports snapshotting such as btrfs or zfs , adding
the --inplace switch will help you reduce snapshot sizes since changed files
aren't recreated but rather the changed blocks are written directly over the old ones. This
switch is also useful if you want to avoid rsync creating copies of files on the target when
only minor changes have occurred.
When using --append-verify , rsync will behave just like it always does on
all files that are the same size. If they differ in modification or other timestamps, it will
overwrite the target with the source without scrutinizing those files further.
--checksum will compare the contents (checksums) of every file pair of identical
name and size.
UPDATED 2015-09-01 Changed to reflect points made by @Alex (thanks!)
UPDATED 2017-07-14 Changed to reflect points made by @Jonathan (thanks!)
According to the documentation, --append does not check the data, but --append-verify does.
Also, as @gaoithe points out in a comment below, the documentation claims
--partial does resume from previous files. – Alex
Aug 28 '15 at 3:49
Thank you @Alex for the updates. Indeed, since 3.0.0, --append no longer
compares the source to the target file before appending. Quite important, really!
--partial does not itself resume a failed file transfer, but rather leaves it
there for a subsequent --append(-verify) to append to it. My answer was clearly
misrepresenting this fact; I'll update it to include these points! Thanks a lot :) –
DanielSmedegaardBuus
Sep 1 '15 at 13:29
@CMCDragonkai Actually, check out Alexander's answer below about --partial-dir
-- looks like it's the perfect bullet for this. I may have missed something entirely ;)
– DanielSmedegaardBuus
May 10 '16 at 19:31
What's your level of confidence in the described behavior of --checksum ?
According to the man it has more to do with deciding
which files to flag for transfer than with delta-transfer (which, presumably, is
rsync 's default behavior). – Jonathan Y.
Jun 14 '17 at 5:48
Just specify a partial directory as the rsync man page recommends:
--partial-dir=.rsync-partial
Longer explanation:
There is actually a built-in feature for doing this using the --partial-dir
option, which has several advantages over the --partial and
--append-verify / --append alternative.
Excerpt from the
rsync man pages:
--partial-dir=DIR
A better way to keep partial files than the --partial option is
to specify a DIR that will be used to hold the partial data
(instead of writing it out to the destination file). On the
next transfer, rsync will use a file found in this dir as data
to speed up the resumption of the transfer and then delete it
after it has served its purpose.
Note that if --whole-file is specified (or implied), any partial-dir
file that is found for a file that is being updated
will simply be removed (since rsync is sending files without
using rsync's delta-transfer algorithm).
Rsync will create the DIR if it is missing (just the last dir --
not the whole path). This makes it easy to use a relative path
(such as "--partial-dir=.rsync-partial") to have rsync create
the partial-directory in the destination file's directory when
needed, and then remove it again when the partial file is
deleted.
If the partial-dir value is not an absolute path, rsync will add
an exclude rule at the end of all your existing excludes. This
will prevent the sending of any partial-dir files that may exist
on the sending side, and will also prevent the untimely deletion
of partial-dir items on the receiving side. An example: the
above --partial-dir option would add the equivalent of "-f '-p
.rsync-partial/'" at the end of any other filter rules.
By default, rsync uses a random temporary file name which gets deleted when a transfer
fails. As mentioned, using --partial you can make rsync keep the incomplete file
as if it were successfully transferred , so that it is possible to later append to
it using the --append-verify / --append options. However there are
several reasons this is sub-optimal.
Your backup files may not be complete, and without checking the remote file which must
still be unaltered, there's no way to know.
If you are attempting to use --backup and --backup-dir ,
you've just added a new version of this file, one that never even existed before, to your version
history.
However if we use --partial-dir , rsync will preserve the temporary partial
file, and resume downloading using that partial file next time you run it, and we do not
suffer from the above issues.
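As a concrete illustration, a resumable transfer using this option might look like the following (the host and paths are made-up examples, not taken from the question):
rsync -av --partial-dir=.rsync-partial user@remotehost:/srv/backups/ /local/backups/
# if the transfer is interrupted, simply re-run the same command; rsync picks up
# the partial data from .rsync-partial in the destination directory and resumes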
I agree this is a much more concise answer to the question. The TL;DR: is perfect, and
those that need more can read the longer bit. Strong work. – JKOlaf
Jun 28 '17 at 0:11
You may want to add the -P option to your command.
From the man page:
--partial By default, rsync will delete any partially transferred file if the transfer
is interrupted. In some circumstances it is more desirable to keep partially
transferred files. Using the --partial option tells rsync to keep the partial
file which should make a subsequent transfer of the rest of the file much faster.
-P The -P option is equivalent to --partial --progress. Its pur-
pose is to make it much easier to specify these two options for
a long transfer that may be interrupted.
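For instance, a typical invocation (the paths here are hypothetical) would be:
rsync -aP user@remotehost:/srv/big-image.iso /local/downloads/
# -P expands to --partial --progress, so an interrupted copy keeps the partial file
# and shows per-file progress while it runs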
@Flimm not quite correct. If there is an interruption (network or receiving side) then when
using --partial the partial file is kept AND it is used when rsync is resumed. From the
manpage: "Using the --partial option tells rsync to keep the partial file which should
<b>make a subsequent transfer of the rest of the file much faster</b>." –
gaoithe
Aug 19 '15 at 11:29
@Flimm and @gaoithe, my answer wasn't quite accurate, and definitely not up-to-date. I've
updated it to reflect version 3 + of rsync . It's important to stress, though,
that --partial does not itself resume a failed transfer. See my answer
for details :) – DanielSmedegaardBuus
Sep 1 '15 at 14:11
@DanielSmedegaardBuus I tried it and the -P is enough in my case. Versions:
client has 3.1.0 and server has 3.1.1. I interrupted the transfer of a single large file with
ctrl-c. I guess I am missing something. – guettli
Nov 18 '15 at 12:28
I think you are forcibly calling rsync and hence all data is getting
downloaded when you call it again. Use the --progress option to copy only those
files which are not yet copied, and the --delete option to delete any files that were already
copied but no longer exist in the source folder...
@Fabien He tells rsync to set two ssh options (rsync uses ssh to connect). The second one
tells ssh to not prompt for confirmation if the host he's connecting to isn't already known
(by existing in the "known hosts" file). The first one tells ssh to not use the default known
hosts file (which would be ~/.ssh/known_hosts). He uses /dev/null instead, which is of course
always empty, and as ssh would then not find the host in there, it would normally prompt for
confirmation, hence option two. Upon connecting, ssh writes the now known host to /dev/null,
effectively forgetting it instantly :) – DanielSmedegaardBuus
Dec 7 '14 at 0:12
...but you were probably wondering what effect, if any, it has on the rsync operation itself.
The answer is none. It only serves to not have the host you're connecting to added to your
SSH known hosts file. Perhaps he's a sysadmin often connecting to a great number of new
servers, temporary systems or whatnot. I don't know :) – DanielSmedegaardBuus
Dec 7 '14 at 0:23
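To make that concrete, the kind of command being described would look roughly like this (the remote host and paths are assumptions; only the two ssh options come from the comments above):
rsync -av -e "ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no" \
    user@newhost:/var/log/ /tmp/newhost-logs/
# ssh neither reads nor permanently records the host key, so no confirmation prompt appears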
There are a couple errors here; one is very serious: --delete will delete files
in the destination that don't exist in the source. The less serious one is that
--progress doesn't modify how things are copied; it just gives you a progress
report on each file as it copies. (I fixed the serious error; replaced it with
--remove-source-files .) – Paul d'Aoust
Nov 17 '16 at 22:39
What's the accepted way of parsing this such that in each case (or some combination of the
two) $v , $f , and $d will all be set to
true and $outFile will be equal to /fizz/someOtherFile
?
For zsh users there's a great builtin called zparseopts which can do: zparseopts -D -E
-M -- d=debug -debug=d and have both -d and --debug in the
$debug array. echo $+debug[1] will return 0 or 1 if one of those is
used. Ref: zsh.org/mla/users/2011/msg00350.html
– dezza
Aug 2 '16 at 2:13
Preferred Method: Using straight bash without getopt[s]
I originally answered the question as the OP asked. This Q/A is getting a lot of
attention, so I should also offer the non-magic way to do this. I'm going to expand upon
guneysus's answer
to fix the nasty sed and include
Tobias Kienzler's suggestion .
Two of the most common ways to pass key-value pair arguments are space-separated ( --option argument ) and equals-separated ( --option=argument ); the two scripts below handle each style in turn:
#!/bin/bash
POSITIONAL=()
while [[ $# -gt 0 ]]
do
key="$1"
case $key in
-e|--extension)
EXTENSION="$2"
shift # past argument
shift # past value
;;
-s|--searchpath)
SEARCHPATH="$2"
shift # past argument
shift # past value
;;
-l|--lib)
LIBPATH="$2"
shift # past argument
shift # past value
;;
--default)
DEFAULT=YES
shift # past argument
;;
*) # unknown option
POSITIONAL+=("$1") # save it in an array for later
shift # past argument
;;
esac
done
set -- "${POSITIONAL[@]}" # restore positional parameters
echo FILE EXTENSION = "${EXTENSION}"
echo SEARCH PATH = "${SEARCHPATH}"
echo LIBRARY PATH = "${LIBPATH}"
echo DEFAULT = "${DEFAULT}"
echo "Number files in SEARCH PATH with EXTENSION:" $(ls -1 "${SEARCHPATH}"/*."${EXTENSION}" | wc -l)
if [[ -n $1 ]]; then
echo "Last line of file specified as non-opt/last argument:"
tail -1 "$1"
fi
#!/bin/bash
for i in "$@"
do
case $i in
-e=*|--extension=*)
EXTENSION="${i#*=}"
shift # past argument=value
;;
-s=*|--searchpath=*)
SEARCHPATH="${i#*=}"
shift # past argument=value
;;
-l=*|--lib=*)
LIBPATH="${i#*=}"
shift # past argument=value
;;
--default)
DEFAULT=YES
shift # past argument with no value
;;
*)
# unknown option
;;
esac
done
echo "FILE EXTENSION = ${EXTENSION}"
echo "SEARCH PATH = ${SEARCHPATH}"
echo "LIBRARY PATH = ${LIBPATH}"
echo "Number files in SEARCH PATH with EXTENSION:" $(ls -1 "${SEARCHPATH}"/*."${EXTENSION}" | wc -l)
if [[ -n $1 ]]; then
echo "Last line of file specified as non-opt/last argument:"
tail -1 $1
fi
To better understand ${i#*=} search for "Substring Removal" in this guide . It is
functionally equivalent to `sed 's/[^=]*=//' <<< "$i"` which calls a
needless subprocess or `echo "$i" | sed 's/[^=]*=//'` which calls two
needless subprocesses.
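A quick illustration of that expansion, using a hypothetical value:
i="--extension=txt"
echo "${i#*=}"     # prints: txt          (strip everything up to and including the first '=')
echo "${i%%=*}"    # prints: --extension  (the complementary removal of the '=value' suffix)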
Never use getopt(1). getopt cannot handle empty argument strings, or
arguments with embedded whitespace. Please forget that it ever existed.
The POSIX shell (and others) offer getopts which is safe to use instead. Here
is a simplistic getopts example:
#!/bin/sh
# A POSIX variable
OPTIND=1 # Reset in case getopts has been used previously in the shell.
# Minimal usage message so that -h works when the example is run as-is:
show_help() {
    echo "Usage: ${0##*/} [-h] [-v] [-f OUTFILE] [args...]"
}
# Initialize our own variables:
output_file=""
verbose=0
while getopts "h?vf:" opt; do
case "$opt" in
h|\?)
show_help
exit 0
;;
v) verbose=1
;;
f) output_file=$OPTARG
;;
esac
done
shift $((OPTIND-1))
[ "${1:-}" = "--" ] && shift
echo "verbose=$verbose, output_file='$output_file', Leftovers: $@"
# End of file
The advantages of getopts are:
It's portable, and will work in e.g. dash.
It can handle things like -vf filename in the expected Unix way,
automatically.
The disadvantage of getopts is that it can only handle short options (
-h , not --help ) without trickery.
There is a getopts tutorial which explains
what all of the syntax and variables mean. In bash, there is also help getopts ,
which might be informative.
Is this really true? According to Wikipedia there's a newer GNU enhanced version of
getopt which includes all the functionality of getopts and then
some. man getopt on Ubuntu 13.04 outputs getopt - parse command options
(enhanced) as the name, so I presume this enhanced version is standard now. –
Livven
Jun 6 '13 at 21:19
You do not echo --default . In the first example, I notice that if
--default is the last argument, it is not processed (considered as
non-opt), unless while [[ $# -gt 1 ]] is set as while [[ $# -gt 0
]] – kolydart
Jul 10 '17 at 8:11
No answer mentions enhanced getopt . And the top-voted answer is misleading: It ignores
-vfd style short options (requested by the OP), options after positional
arguments (also requested by the OP) and it ignores parsing-errors. Instead:
Use enhanced getopt from util-linux or formerly GNU glibc .
1
It works with getopt_long() the C function of GNU glibc.
Has all useful distinguishing features (the others don't have them):
handles spaces, quoting characters and even binary in arguments
2
it can handle options at the end: script.sh -o outFile file1 file2
-v
allows = -style long options: script.sh --outfile=fileOut
--infile fileIn
Is so old already 3 that no GNU system is missing this (e.g. any
Linux has it).
You can test for its existence with: getopt --test → return value
4.
Other getopt or shell-builtin getopts are of limited
use.
verbose: y, force: y, debug: y, in: ./foo/bar/someFile, out: /fizz/someOtherFile
with the following myscript
#!/bin/bash
getopt --test > /dev/null
if [[ $? -ne 4 ]]; then
echo "I'm sorry, `getopt --test` failed in this environment."
exit 1
fi
OPTIONS=dfo:v
LONGOPTIONS=debug,force,output:,verbose
# -temporarily store output to be able to check for errors
# -e.g. use "--options" parameter by name to activate quoting/enhanced mode
# -pass arguments only via -- "$@" to separate them correctly
PARSED=$(getopt --options=$OPTIONS --longoptions=$LONGOPTIONS --name "$0" -- "$@")
if [[ $? -ne 0 ]]; then
# e.g. $? == 1
# then getopt has complained about wrong arguments to stdout
exit 2
fi
# read getopt's output this way to handle the quoting right:
eval set -- "$PARSED"
# now enjoy the options in order and nicely split until we see --
while true; do
case "$1" in
-d|--debug)
d=y
shift
;;
-f|--force)
f=y
shift
;;
-v|--verbose)
v=y
shift
;;
-o|--output)
outFile="$2"
shift 2
;;
--)
shift
break
;;
*)
echo "Programming error"
exit 3
;;
esac
done
# handle non-option arguments
if [[ $# -ne 1 ]]; then
echo "$0: A single input file is required."
exit 4
fi
echo "verbose: $v, force: $f, debug: $d, in: $1, out: $outFile"
1 enhanced getopt is available on most "bash-systems", including Cygwin; on OS X try brew install gnu-getopt
2 the POSIX exec() conventions have no reliable way to pass binary NULL in command line arguments; those bytes prematurely end the argument
3 first version released in 1997 or before (I only tracked it back to 1997)
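For example, saving the script above as myscript and invoking it with the question's hypothetical file names should reproduce the output quoted before the listing:
./myscript -vfd ./foo/bar/someFile -o /fizz/someOtherFile
# verbose: y, force: y, debug: y, in: ./foo/bar/someFile, out: /fizz/someOtherFile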
I believe that the only caveat with getopt is that it cannot be used
conveniently in wrapper scripts where one might have few options specific to the
wrapper script, and then pass the non-wrapper-script options to the wrapped executable,
intact. Let's say I have a grep wrapper called mygrep and I have an
option --foo specific to mygrep , then I cannot do mygrep
--foo -A 2 , and have the -A 2 passed automatically to grep
; I need to do mygrep --foo -- -A 2 . Here is my implementation on top of
your solution. – Kaushal Modi
Apr 27 '17 at 14:02
Alex, I agree and there's really no way around that since we need to know the actual return
value of getopt --test . I'm a big fan of "Unofficial Bash Strict mode", (which
includes set -e ), and I just put the check for getopt ABOVE set -euo
pipefail and IFS=$'\n\t' in my script. – bobpaul
Mar 20 at 16:45
@bobpaul Oh, there is a way around that. And I'll edit my answer soon to reflect my
collections regarding this issue ( set -e )... – Robert Siemer
Mar 21 at 9:10
@bobpaul Your statement about util-linux is wrong and misleading as well: the package is
marked "essential" on Ubuntu/Debian. As such, it is always installed. – Which distros
are you talking about (where you say it needs to be installed on purpose)? – Robert Siemer
Mar 21 at 9:16
#!/bin/bash
for i in "$@"
do
case $i in
-p=*|--prefix=*)
PREFIX="${i#*=}"
;;
-s=*|--searchpath=*)
SEARCHPATH="${i#*=}"
;;
-l=*|--lib=*)
DIR="${i#*=}"
;;
--default)
DEFAULT=YES
;;
*)
# unknown option
;;
esac
done
echo PREFIX = ${PREFIX}
echo SEARCH PATH = ${SEARCHPATH}
echo DIRS = ${DIR}
echo DEFAULT = ${DEFAULT}
To better understand ${i#*=} search for "Substring Removal" in this guide . It is
functionally equivalent to `sed 's/[^=]*=//' <<< "$i"` which calls a
needless subprocess or `echo "$i" | sed 's/[^=]*=//'` which calls two
needless subprocesses.
Neat! Though this won't work for space-separated arguments à la mount -t tempfs
... . One can probably fix this via something like while [ $# -ge 1 ]; do
param=$1; shift; case $param in; -p) prefix=$1; shift;; etc – Tobias Kienzler
Nov 12 '13 at 12:48
@Matt J, the first part of the script (for i) would be able to handle arguments with spaces
in them if you use "$i" instead of $i. The getopts does not seem to be able to handle
arguments with spaces. What would be the advantage of using getopt over the for i loop?
– thebunnyrules
Jun 1 at 1:57
Sorry for the delay. In my script, the handle_argument function receives all the non-option
arguments. You can replace that line with whatever you'd like, maybe *) die
"unrecognized argument: $1" or collect the args into a variable *) args+="$1";
shift 1;; . – bronson
Oct 8 '15 at 20:41
Amazing! I've tested a couple of answers, but this is the only one that worked for all cases,
including many positional parameters (both before and after flags) – Guilherme Garnier
Apr 13 at 16:10
I'm about 4 years late to this question, but want to give back. I used the earlier answers as
a starting point to tidy up my old adhoc param parsing. I then refactored out the following
template code. It handles both long and short params, using = or space separated arguments,
as well as multiple short params grouped together. Finally it re-inserts any non-param
arguments back into the $1,$2.. variables. I hope it's useful.
#!/usr/bin/env bash
# NOTICE: Uncomment if your script depends on bashisms.
#if [ -z "$BASH_VERSION" ]; then bash $0 $@ ; exit $? ; fi
echo "Before"
for i ; do echo - $i ; done
# Code template for parsing command line parameters using only portable shell
# code, while handling both long and short params, handling '-f file' and
# '-f=file' style param data and also capturing non-parameters to be inserted
# back into the shell positional parameters.
while [ -n "$1" ]; do
# Copy so we can modify it (can't modify $1)
OPT="$1"
# Detect argument termination
if [ x"$OPT" = x"--" ]; then
shift
for OPT ; do
REMAINS="$REMAINS \"$OPT\""
done
break
fi
# Parse current opt
while [ x"$OPT" != x"-" ] ; do
case "$OPT" in
# Handle --flag=value opts like this
-c=* | --config=* )
CONFIGFILE="${OPT#*=}"
shift
;;
# and --flag value opts like this
-c* | --config )
CONFIGFILE="$2"
shift
;;
-f* | --force )
FORCE=true
;;
-r* | --retry )
RETRY=true
;;
# Anything unknown is recorded for later
* )
REMAINS="$REMAINS \"$OPT\""
break
;;
esac
# Check for multiple short options
# NOTICE: be sure to update this pattern to match valid options
NEXTOPT="${OPT#-[cfr]}" # try removing single short opt
if [ x"$OPT" != x"$NEXTOPT" ] ; then
OPT="-$NEXTOPT" # multiple short opts, keep going
else
break # long form, exit inner loop
fi
done
# Done with that param. move to next
shift
done
# Set the non-parameters back into the positional parameters ($1 $2 ..)
eval set -- $REMAINS
echo -e "After: \n configfile='$CONFIGFILE' \n force='$FORCE' \n retry='$RETRY' \n remains='$REMAINS'"
for i ; do echo - $i ; done
This code can't handle options with arguments like this: -c1 . And the use of
= to separate short options from their arguments is unusual... –
Robert
Siemer
Dec 6 '15 at 13:47
I ran into two problems with this useful chunk of code: 1) the "shift" in the case of
"-c=foo" ends up eating the next parameter; and 2) 'c' should not be included in the "[cfr]"
pattern for combinable short options. – sfnd
Jun 6 '16 at 19:28
My answer is largely based on the answer by Bruno Bronosky , but I sort
of mashed his two pure bash implementations into one that I use pretty frequently.
# As long as there is at least one more argument, keep looping
while [[ $# -gt 0 ]]; do
key="$1"
case "$key" in
# This is a flag type option. Will catch either -f or --foo
-f|--foo)
FOO=1
;;
# Also a flag type option. Will catch either -b or --bar
-b|--bar)
BAR=1
;;
# This is an arg value type option. Will catch -o value or --output-file value
-o|--output-file)
shift # past the key and to the value
OUTPUTFILE="$1"
;;
# This is an arg=value type option. Will catch -o=value or --output-file=value
-o=*|--output-file=*)
# No need to shift here since the value is part of the same string
OUTPUTFILE="${key#*=}"
;;
*)
# Do whatever you want with extra options
echo "Unknown option '$key'"
;;
esac
# Shift after checking all the cases to get the next option
shift
done
This allows you to have both space-separated options/values, as well as equals-separated
values.
So you could run your script using:
./myscript --foo -b -o /fizz/file.txt
as well as:
./myscript -f --bar -o=/fizz/file.txt
and both should have the same end result.
PROS:
Allows for both -arg=value and -arg value
Works with any arg name that you can use in bash
Meaning -a or -arg or --arg or -a-r-g or whatever
Pure bash. No need to learn/use getopt or getopts
CONS:
Can't combine args
Meaning no -abc. You must do -a -b -c
These are the only pros/cons I can think of off the top of my head
I have found the matter of writing portable argument parsing in scripts so frustrating that I have
written Argbash - a FOSS
code generator that can generate the argument-parsing code for your script, plus it has some
nice features:
Thanks for writing argbash, I just used it and found it works well. I mostly went for argbash
because it's a code generator supporting the older bash 3.x found on OS X 10.11 El Capitan.
The only downside is that the code-generator approach means quite a lot of code in your main
script, compared to calling a module. – RichVel
Aug 18 '16 at 5:34
You can actually use Argbash in a way that it produces tailor-made parsing library just for
you that you can have included in your script or you can have it in a separate file and just
source it. I have added an example to
demonstrate that and I have made it more explicit in the documentation, too. –
bubla
Aug 23 '16 at 20:40
Good to know. That example is interesting but still not really clear - maybe you can change
name of the generated script to 'parse_lib.sh' or similar and show where the main script
calls it (like in the wrapping script section which is more complex use case). –
RichVel
Aug 24 '16 at 5:47
The issues were addressed in recent version of argbash: Documentation has been improved, a
quickstart argbash-init script has been introduced and you can even use argbash online at
argbash.io/generate –
bubla
Dec 2 '16 at 20:12
I read all of them and this one is my preferred one. I don't like to use -a=1 as argc
style. I prefer to put the main options first as -options and the special ones later with single
spacing -o option . I'm looking for the simplest-vs-better way to read argvs.
– erm3nda
May 20 '15 at 22:50
It's working really well but if you pass an argument to a non a: option all the following
options would be taken as arguments. You can check this line ./myscript -v -d fail -o
/fizz/someOtherFile -f ./foo/bar/someFile with your own script. -d option is not set
as d: – erm3nda
May 20 '15 at 23:25
Expanding on the excellent answer by @guneysus, here is a tweak that lets the user use whichever
syntax they prefer, e.g.
command -x=myfilename.ext --another_switch
vs
command -x myfilename.ext --another_switch
That is to say the equals can be replaced with whitespace.
This "fuzzy interpretation" might not be to your liking, but if you are making scripts
that are interchangeable with other utilities (as is the case with mine, which must work with
ffmpeg), the flexibility is useful.
STD_IN=0
prefix=""
key=""
value=""
for keyValue in "$@"
do
case "${prefix}${keyValue}" in
-i=*|--input_filename=*) key="-i"; value="${keyValue#*=}";;
-ss=*|--seek_from=*) key="-ss"; value="${keyValue#*=}";;
-t=*|--play_seconds=*) key="-t"; value="${keyValue#*=}";;
-|--stdin) key="-"; value=1;;
*) value=$keyValue;;
esac
case $key in
-i) MOVIE=$(resolveMovie "${value}"); prefix=""; key="";;
-ss) SEEK_FROM="${value}"; prefix=""; key="";;
-t) PLAY_SECONDS="${value}"; prefix=""; key="";;
-) STD_IN=${value}; prefix=""; key="";;
*) prefix="${keyValue}=";;
esac
done
getopts works great if #1 you have it installed and #2 you intend to run it on the same
platform. OSX and Linux (for example) behave differently in this respect.
Here is a (non getopts) solution that supports equals, non-equals, and boolean flags. For
example you could run your script in this way:
./script --arg1=value1 --arg2 value2 --shouldClean
# parse the arguments.
COUNTER=0
ARGS=("$@")
while [ $COUNTER -lt $# ]
do
arg=${ARGS[$COUNTER]}
let COUNTER=COUNTER+1
nextArg=${ARGS[$COUNTER]}
if [[ $skipNext -eq 1 ]]; then
echo "Skipping"
skipNext=0
continue
fi
argKey=""
argVal=""
if [[ "$arg" =~ ^\- ]]; then
# if the format is: -key=value
if [[ "$arg" =~ \= ]]; then
argVal=$(echo "$arg" | cut -d'=' -f2)
argKey=$(echo "$arg" | cut -d'=' -f1)
skipNext=0
# if the format is: -key value
elif [[ ! "$nextArg" =~ ^\- ]]; then
argKey="$arg"
argVal="$nextArg"
skipNext=1
# if the format is: -key (a boolean flag)
elif [[ "$nextArg" =~ ^\- ]] || [[ -z "$nextArg" ]]; then
argKey="$arg"
argVal=""
skipNext=0
fi
# if the format has no flag, just a value.
else
argKey=""
argVal="$arg"
skipNext=0
fi
case "$argKey" in
--source-scmurl)
SOURCE_URL="$argVal"
;;
--dest-scmurl)
DEST_URL="$argVal"
;;
--version-num)
VERSION_NUM="$argVal"
;;
-c|--clean)
CLEAN_BEFORE_START="1"
;;
-h|--help|-help|--h)
showUsage
exit
;;
esac
done
This is how I do in a function to avoid breaking getopts run at the same time somewhere
higher in stack:
function waitForWeb () {
local OPTIND=1 OPTARG OPTION
local host=localhost port=8080 proto=http
while getopts "h:p:r:" OPTION; do
case "$OPTION" in
h)
host="$OPTARG"
;;
p)
port="$OPTARG"
;;
r)
proto="$OPTARG"
;;
esac
done
...
}
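A hypothetical call, just to show the calling convention (the host, port and protocol values are invented):
waitForWeb -h db.example.com -p 5432 -r tcp
# because OPTIND is declared local and reset to 1, this inner getopts loop
# does not disturb any getopts parsing going on in the calling script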
I give you The Function parse_params that will parse params:
Without polluting global scope.
Effortlessly returns to you ready to use variables so that you could build further
logic on them
Amount of dashes before params does not matter ( --all equals
-all equals all=all )
The script below is a copy-paste working demonstration. See show_use function
to understand how to use parse_params .
Limitations:
Does not support space delimited params ( -d 1 )
Param names will lose dashes so --any-param and -anyparam are
equivalent
eval $(parse_params "$@") must be used inside bash function (it will not
work in the global scope)
#!/bin/bash
# Universal Bash parameter parsing
# Parse equal sign separated params into named local variables
# Standalone named parameter value will equal its param name (--force creates variable $force=="force")
# Parses multi-valued named params into an array (--path=path1 --path=path2 creates ${path[*]} array)
# Parses un-named params into ${ARGV[*]} array
# Additionally puts all named params into ${ARGN[*]} array
# Additionally puts all standalone "option" params into ${ARGO[*]} array
# @author Oleksii Chekulaiev
# @version v1.3 (May-14-2018)
parse_params ()
{
local existing_named
local ARGV=() # un-named params
local ARGN=() # named params
local ARGO=() # options (--params)
echo "local ARGV=(); local ARGN=(); local ARGO=();"
while [[ "$1" != "" ]]; do
# Escape asterisk to prevent bash asterisk expansion
_escaped=${1/\*/\'\"*\"\'}
# If equals delimited named parameter
if [[ "$1" =~ ^..*=..* ]]; then
# Add to named parameters array
echo "ARGN+=('$_escaped');"
# key is part before first =
local _key=$(echo "$1" | cut -d = -f 1)
# val is everything after key and = (protect from param==value error)
local _val="${1/$_key=}"
# remove dashes from key name
_key=${_key//\-}
# search for existing parameter name
if (echo "$existing_named" | grep "\b$_key\b" >/dev/null); then
# if name already exists then it's a multi-value named parameter
# re-declare it as an array if needed
if ! (declare -p _key 2> /dev/null | grep -q 'declare \-a'); then
echo "$_key=(\"\$$_key\");"
fi
# append new value
echo "$_key+=('$_val');"
else
# single-value named parameter
echo "local $_key=\"$_val\";"
existing_named=" $_key"
fi
# If standalone named parameter
elif [[ "$1" =~ ^\-. ]]; then
# Add to options array
echo "ARGO+=('$_escaped');"
# remove dashes
local _key=${1//\-}
echo "local $_key=\"$_key\";"
# non-named parameter
else
# Escape asterisk to prevent bash asterisk expansion
_escaped=${1/\*/\'\"*\"\'}
echo "ARGV+=('$_escaped');"
fi
shift
done
}
#--------------------------- DEMO OF THE USAGE -------------------------------
show_use ()
{
eval $(parse_params "$@")
# --
echo "${ARGV[0]}" # print first unnamed param
echo "${ARGV[1]}" # print second unnamed param
echo "${ARGN[0]}" # print first named param
echo "${ARG0[0]}" # print first option param (--force)
echo "$anyparam" # print --anyparam value
echo "$k" # print k=5 value
echo "${multivalue[0]}" # print first value of multi-value
echo "${multivalue[1]}" # print second value of multi-value
[[ "$force" == "force" ]] && echo "\$force is set so let the force be with you"
}
show_use "param 1" --anyparam="my value" param2 k=5 --force --multi-value=test1 --multi-value=test2
You have to decide before use if = is to be used on an option or not. This is to keep the
code clean(ish).
while [[ $# > 0 ]]
do
key="$1"
while [[ ${key+x} ]]
do
case $key in
-s*|--stage)
STAGE="$2"
shift # option has parameter
;;
-w*|--workfolder)
workfolder="$2"
shift # option has parameter
;;
-e=*)
EXAMPLE="${key#*=}"
break # option has been fully handled
;;
*)
# unknown option
echo Unknown option: $key #1>&2
exit 10 # either this: my preferred way to handle unknown options
break # or this: do this to signal the option has been handled (if exit isn't used)
;;
esac
# prepare for next option in this key, if any
[[ "$key" = -? || "$key" == --* ]] && unset key || key="${key/#-?/-}"
done
shift # option(s) fully processed, proceed to next input argument
done
Parsing a mix of option flags and positional arguments can be accomplished with a fairly concise approach:
# process flags
pointer=1
while [[ $pointer -le $# ]]; do
param=${!pointer}
if [[ $param != "-"* ]]; then ((pointer++)) # not a parameter flag so advance pointer
else
case $param in
# parameter-flags with arguments
-e=*|--environment=*) environment="${param#*=}";;
--another=*) another="${param#*=}";;
# binary flags
-q|--quiet) quiet=true;;
-d) debug=true;;
esac
# splice out pointer frame from positional list
[[ $pointer -gt 1 ]] \
&& set -- ${@:1:((pointer - 1))} ${@:((pointer + 1)):$#} \
|| set -- ${@:((pointer + 1)):$#};
fi
done
# positional remain
node_name=$1
ip_address=$2
--param arg (space delimited)
It's usually clearer to not mix --flag=value and --flag value
styles.
./script.sh dumbo 127.0.0.1 --environment production -q -d
This is a little dicey to read, but is still valid
./script.sh dumbo --environment production 127.0.0.1 --quiet -d
Source
# process flags
pointer=1
while [[ $pointer -le $# ]]; do
if [[ ${!pointer} != "-"* ]]; then ((pointer++)) # not a parameter flag so advance pointer
else
param=${!pointer}
((pointer_plus = pointer + 1))
slice_len=1
case $param in
# parameter-flags with arguments
-e|--environment) environment=${!pointer_plus}; ((slice_len++));;
--another) another=${!pointer_plus}; ((slice_len++));;
# binary flags
-q|--quiet) quiet=true;;
-d) debug=true;;
esac
# splice out pointer frame from positional list
[[ $pointer -gt 1 ]] \
&& set -- ${@:1:((pointer - 1))} ${@:((pointer + $slice_len)):$#} \
|| set -- ${@:((pointer + $slice_len)):$#};
fi
done
# positional remain
node_name=$1
ip_address=$2
Note that getopt(1) was a short living mistake from AT&T.
getopt was created in 1984 but already buried in 1986 because it was not really
usable.
A proof for the fact that getopt is very outdated is that the
getopt(1) man page still mentions "$*" instead of "$@"
, that was added to the Bourne Shell in 1986 together with the getopts(1) shell
builtin in order to deal with arguments with spaces inside.
BTW: if you are interested in parsing long options in shell scripts, it may be of interest
to know that the getopt(3) implementation from libc (Solaris) and
ksh93 both added a uniform long option implementation that supports long options
as aliases for short options. This causes ksh93 and the Bourne
Shell to implement a uniform interface for long options via getopts .
An example for long options taken from the Bourne Shell man page:
This also might be useful to know, you can set a value and if someone provides input,
override the default with that value..
myscript.sh -f ./serverlist.txt or just ./myscript.sh (and it takes defaults)
#!/bin/bash
# --- set the value, if there is inputs, override the defaults.
HOME_FOLDER="${HOME}/owned_id_checker"
SERVER_FILE_LIST="${HOME_FOLDER}/server_list.txt"
while [[ $# > 1 ]]
do
key="$1"
shift
case $key in
-i|--inputlist)
SERVER_FILE_LIST="$1"
shift
;;
esac
done
echo "SERVER LIST = ${SERVER_FILE_LIST}"
The main differentiating feature of my solution is that it allows options to be concatenated
together, just like tar -xzf foo.tar.gz is equal to tar -x -z -f
foo.tar.gz . And just like in tar , ps etc. the leading
hyphen is optional for a block of short options (but this can be changed easily). Long
options are supported as well (but when a block starts with one then two leading hyphens are
required).
Code with example options
#!/bin/sh
echo
echo "POSIX-compliant getopt(s)-free old-style-supporting option parser from phk@[se.unix]"
echo
print_usage() {
echo "Usage:
$0 {a|b|c} [ARG...]
Options:
--aaa-0-args
-a
Option without arguments.
--bbb-1-args ARG
-b ARG
Option with one argument.
--ccc-2-args ARG1 ARG2
-c ARG1 ARG2
Option with two arguments.
" >&2
}
if [ $# -le 0 ]; then
print_usage
exit 1
fi
opt=
while :; do
if [ $# -le 0 ]; then
# no parameters remaining -> end option parsing
break
elif [ ! "$opt" ]; then
# we are at the beginning of a fresh block
# remove optional leading hyphen and strip trailing whitespaces
opt=$(echo "$1" | sed 's/^-\?\([a-zA-Z0-9\?-]*\)/\1/')
fi
# get the first character -> check whether long option
first_chr=$(echo "$opt" | awk '{print substr($1, 1, 1)}')
[ "$first_chr" = - ] && long_option=T || long_option=F
# note to write the options here with a leading hyphen less
# also do not forget to end short options with a star
case $opt in
-)
# end of options
shift
break
;;
a*|-aaa-0-args)
echo "Option AAA activated!"
;;
b*|-bbb-1-args)
if [ "$2" ]; then
echo "Option BBB with argument '$2' activated!"
shift
else
echo "BBB parameters incomplete!" >&2
print_usage
exit 1
fi
;;
c*|-ccc-2-args)
if [ "$2" ] && [ "$3" ]; then
echo "Option CCC with arguments '$2' and '$3' activated!"
shift 2
else
echo "CCC parameters incomplete!" >&2
print_usage
exit 1
fi
;;
h*|\?*|-help)
print_usage
exit 0
;;
*)
if [ "$long_option" = T ]; then
opt=$(echo "$opt" | awk '{print substr($1, 2)}')
else
opt=$first_chr
fi
printf 'Error: Unknown option: "%s"\n' "$opt" >&2
print_usage
exit 1
;;
esac
if [ "$long_option" = T ]; then
# if we had a long option then we are going to get a new block next
shift
opt=
else
# if we had a short option then just move to the next character
opt=$(echo "$opt" | awk '{print substr($1, 2)}')
# if block is now empty then shift to the next one
[ "$opt" ] || shift
fi
done
echo "Doing something..."
exit 0
For the example usage please see the examples further below.
Position of options
with arguments
For what it's worth, here the options with arguments don't have to be the last ones (only long options
need to be). So while e.g. in tar (at least in some implementations) the
f option needs to be last because the file name follows ( tar xzf
bar.tar.gz works but tar xfz bar.tar.gz does not), this is not the case
here (see the later examples).
Multiple options with arguments
As another bonus the option parameters are consumed in the order of the options by the
parameters with required options. Just look at the output of my script here with the command
line abc X Y Z (or -abc X Y Z ):
Option AAA activated!
Option BBB with argument 'X' activated!
Option CCC with arguments 'Y' and 'Z' activated!
Long options concatenated as well
You can also have long options in an option block, given that they occur last in the
block. So the following command lines are all equivalent (including the order in which the
options and their arguments are being processed):
-cba Z Y X
cba Z Y X
-cb-aaa-0-args Z Y X
-c-bbb-1-args Z Y X -a
--ccc-2-args Z Y -ba X
c Z Y b X a
-c Z Y -b X -a
--ccc-2-args Z Y --bbb-1-args X --aaa-0-args
All of these lead to:
Option CCC with arguments 'Z' and 'Y' activated!
Option BBB with argument 'X' activated!
Option AAA activated!
Doing something...
Not in this solution
Optional arguments
Options with optional arguments should be possible with a bit of work, e.g. by looking
forward whether there is a block without a hyphen; the user would then need to put a hyphen
in front of every block following a block with a parameter having an optional parameter.
Maybe this is too complicated to communicate to the user so better just require a leading
hyphen altogether in this case.
Things get even more complicated with multiple possible parameters. I would advise against
making the options try to be smart by determining whether an argument might be meant for them
or not (e.g. an option that just takes a number as an optional argument), because this might
break in the future.
I personally favor additional options instead of optional arguments.
Option
arguments introduced with an equal sign
Just like with optional arguments, I am not a fan of this (BTW, is there a thread for
discussing the pros/cons of different parameter styles?), but if you want this you could
probably implement it yourself, just like it is done at http://mywiki.wooledge.org/BashFAQ/035#Manual_loop
with a --long-with-arg=?* case statement and then stripping the equal sign (this
is, BTW, the site that says that parameter concatenation is possible with some effort
but "left [it] as an exercise for the reader", which made me take them at their word, but I
started from scratch).
Other notes
POSIX-compliant, works even on ancient Busybox setups I had to deal with (with e.g.
cut , head and getopts missing).
Solution that preserves unhandled arguments. Demos Included.
Here is my solution. It is VERY flexible and unlike others, shouldn't require external
packages and handles leftover arguments cleanly.
Usage is: ./myscript -flag flagvariable -otherflag flagvar2
All you have to do is edit the validflags line. It prepends a hyphen to each flag name and searches all
arguments for it. It then assigns the following argument as that flag's value (e.g. -rate 30 sets $rate to 30).
The main code (short version, verbose with examples further down, also a version with
erroring out):
#!/usr/bin/env bash
#shebang.io
validflags="rate time number"
count=1
for arg in $@
do
match=0
argval=$1
for flag in $validflags
do
sflag="-"$flag
if [ "$argval" == "$sflag" ]
then
declare $flag=$2
match=1
fi
done
if [ "$match" == "1" ]
then
shift 2
else
leftovers=$(echo $leftovers $argval)
shift
fi
count=$(($count+1))
done
#Cleanup then restore the leftovers
shift $#
set -- $leftovers
The verbose version with built in echo demos:
#!/usr/bin/env bash
#shebang.io
rate=30
time=30
number=30
echo "all args
$@"
validflags="rate time number"
count=1
for arg in $@
do
match=0
argval=$1
# argval=$(echo $@ | cut -d ' ' -f$count)
for flag in $validflags
do
sflag="-"$flag
if [ "$argval" == "$sflag" ]
then
declare $flag=$2
match=1
fi
done
if [ "$match" == "1" ]
then
shift 2
else
leftovers=$(echo $leftovers $argval)
shift
fi
count=$(($count+1))
done
#Cleanup then restore the leftovers
echo "pre final clear args:
$@"
shift $#
echo "post final clear args:
$@"
set -- $leftovers
echo "all post set args:
$@"
echo arg1: $1 arg2: $2
echo leftovers: $leftovers
echo rate $rate time $time number $number
Final one, this one errors out if an invalid -argument is passed through.
#!/usr/bin/env bash
#shebang.io
rate=30
time=30
number=30
validflags="rate time number"
count=1
for arg in $@
do
argval=$1
match=0
if [ "${argval:0:1}" == "-" ]
then
for flag in $validflags
do
sflag="-"$flag
if [ "$argval" == "$sflag" ]
then
declare $flag=$2
match=1
fi
done
if [ "$match" == "0" ]
then
echo "Bad argument: $argval"
exit 1
fi
shift 2
else
leftovers=$(echo $leftovers $argval)
shift
fi
count=$(($count+1))
done
#Cleanup then restore the leftovers
shift $#
set -- $leftovers
echo rate $rate time $time number $number
echo leftovers: $leftovers
Pros: What it does, it handles very well. It preserves unused arguments which a lot of the
other solutions here don't. It also allows for variables to be called without being defined
by hand in the script. It also allows prepopulation of variables if no corresponding argument
is given. (See verbose example).
Cons: Can't parse a single complex arg string e.g. -xcvf would process as a single
argument. You could somewhat easily write additional code into mine that adds this
functionality though.
The top answer to this question seemed a bit buggy when I tried it -- here's my solution
which I've found to be more robust:
boolean_arg=""
arg_with_value=""
while [[ $# -gt 0 ]]
do
key="$1"
case $key in
-b|--boolean-arg)
boolean_arg=true
shift
;;
-a|--arg-with-value)
arg_with_value="$2"
shift
shift
;;
-*)
echo "Unknown option: $1"
exit 1
;;
*)
arg_num=$(( $arg_num + 1 ))
case $arg_num in
1)
first_normal_arg="$1"
shift
;;
2)
second_normal_arg="$1"
shift
;;
*)
bad_args=TRUE
esac
;;
esac
done
# Handy to have this here when adding arguments to
# see if they're working. Just edit the '0' to be '1'.
if [[ 0 == 1 ]]; then
echo "first_normal_arg: $first_normal_arg"
echo "second_normal_arg: $second_normal_arg"
echo "boolean_arg: $boolean_arg"
echo "arg_with_value: $arg_with_value"
exit 0
fi
if [[ $bad_args == TRUE || $arg_num < 2 ]]; then
echo "Usage: $(basename "$0") <first-normal-arg> <second-normal-arg> [--boolean-arg] [--arg-with-value VALUE]"
exit 1
fi
This example shows how to use getopt and eval and
HEREDOC and shift to handle short and long parameters with and
without a required value that follows. Also the switch/case statement is concise and easy to
follow.
#!/usr/bin/env bash
# usage function
function usage()
{
cat << HEREDOC
Usage: $progname [--num NUM] [--time TIME_STR] [--verbose] [--dry-run]
optional arguments:
-h, --help show this help message and exit
-n, --num NUM pass in a number
-t, --time TIME_STR pass in a time string
-v, --verbose increase the verbosity of the bash script
--dry-run do a dry run, don't change any files
HEREDOC
}
# initialize variables
progname=$(basename $0)
verbose=0
dryrun=0
num_str=
time_str=
# use getopt and store the output into $OPTS
# note the use of -o for the short options, --long for the long name options
# and a : for any option that takes a parameter
OPTS=$(getopt -o "hn:t:v" --long "help,num:,time:,verbose,dry-run" -n "$progname" -- "$@")
if [ $? != 0 ] ; then echo "Error in command line arguments." >&2 ; usage; exit 1 ; fi
eval set -- "$OPTS"
while true; do
# uncomment the next line to see how shift is working
# echo "\$1:\"$1\" \$2:\"$2\""
case "$1" in
-h | --help ) usage; exit; ;;
-n | --num ) num_str="$2"; shift 2 ;;
-t | --time ) time_str="$2"; shift 2 ;;
--dry-run ) dryrun=1; shift ;;
-v | --verbose ) verbose=$((verbose + 1)); shift ;;
-- ) shift; break ;;
* ) break ;;
esac
done
if (( $verbose > 0 )); then
# print out all the parameters we read in
cat <<-EOM
num=$num_str
time=$time_str
verbose=$verbose
dryrun=$dryrun
EOM
fi
# The rest of your script below
The most significant lines of the script above are these:
OPTS=$(getopt -o "hn:t:v" --long "help,num:,time:,verbose,dry-run" -n "$progname" -- "$@")
if [ $? != 0 ] ; then echo "Error in command line arguments." >&2 ; exit 1 ; fi
eval set -- "$OPTS"
while true; do
case "$1" in
-h | --help ) usage; exit; ;;
-n | --num ) num_str="$2"; shift 2 ;;
-t | --time ) time_str="$2"; shift 2 ;;
--dry-run ) dryrun=1; shift ;;
-v | --verbose ) verbose=$((verbose + 1)); shift ;;
-- ) shift; break ;;
* ) break ;;
esac
done
Short, to the point, readable, and handles just about everything (IMHO).
I get this on Mac OS X: ``` lib/bashopts.sh: line 138: declare: -A: invalid option declare:
usage: declare [-afFirtx] [-p] [name[=value] ...] Error in lib/bashopts.sh:138. 'declare -x
-A bashopts_optprop_name' exited with status 2 Call tree: 1: lib/controller.sh:4 source(...)
Exiting with status 1 ``` – Josh Wulf
Jun 24 '17 at 18:07
you can pass an attribute to a short or long option (if you are using a block of short
options, the attribute is attached to the last option)
you can use spaces or = to provide attributes, but an attribute matches until
encountering the hyphen+space "delimiter", so in --q=qwe ty , qwe ty is
one attribute
it handles a mix of all of the above, so -o a -op attr ibute --option=att ribu te --op-tion
attribute --option att-ribute is valid
script:
#!/usr/bin/env sh
help_menu() {
echo "Usage:
${0##*/} [-h][-l FILENAME][-d]
Options:
-h, --help
display this help and exit
-l, --logfile=FILENAME
filename
-d, --debug
enable debug
"
}
parse_options() {
case $opt in
h|help)
help_menu
exit
;;
l|logfile)
logfile=${attr}
;;
d|debug)
debug=true
;;
*)
echo "Unknown option: ${opt}\nRun ${0##*/} -h for help.">&2
exit 1
esac
}
options=$@
until [ "$options" = "" ]; do
if [[ $options =~ (^ *(--([a-zA-Z0-9-]+)|-([a-zA-Z0-9-]+))(( |=)(([\_\.\?\/\\a-zA-Z0-9]?[ -]?[\_\.\?a-zA-Z0-9]+)+))?(.*)|(.+)) ]]; then
if [[ ${BASH_REMATCH[3]} ]]; then # for --option[=][attribute] or --option[=][attribute]
opt=${BASH_REMATCH[3]}
attr=${BASH_REMATCH[7]}
options=${BASH_REMATCH[9]}
elif [[ ${BASH_REMATCH[4]} ]]; then # for block options -qwert[=][attribute] or single short option -a[=][attribute]
pile=${BASH_REMATCH[4]}
while (( ${#pile} > 1 )); do
opt=${pile:0:1}
attr=""
pile=${pile/${pile:0:1}/}
parse_options
done
opt=$pile
attr=${BASH_REMATCH[7]}
options=${BASH_REMATCH[9]}
else # leftovers that don't match
opt=${BASH_REMATCH[10]}
options=""
fi
parse_options
fi
done
This takes the same approach as Noah's answer , but has fewer safety checks
/ safeguards. This allows us to write arbitrary arguments into the script's environment, and
I'm pretty sure your use of eval here may allow command injection. – Will Barnwell
Oct 10 '17 at 23:57
You use shift on the known arguments and not on the unknown ones so your remaining
$@ will be all but the first two arguments (in the order they are passed in),
which could lead to some mistakes if you try to use $@ later. You don't need the
shift for the = parameters, since you're not handling spaces and you're getting the value
with the substring removal #*= – Jason S
Dec 3 '17 at 1:01
You can use sudo to execute the test in your script. For instance:
sudo -u mysql -H sh -c "if [ -w $directory ] ; then echo 'Eureka' ; fi"
To do this, the user executing the script will need sudo privileges of
course.
If you explicitly need the uid instead of the username, you can also use:
sudo -u \#42 -H sh -c "if [ -w $directory ] ; then echo 'Eureka' ; fi"
In this case, 42 is the uid of the mysql user. Substitute your
own value if needed.
UPDATE (to support non-sudo-privileged users)
To get a bash script to change users without sudo would require the
ability to suid ("set user id"). This, as pointed out by
this answer , is a security restriction that requires a hack to work around. Check
this blog for an
example of "how to" work around it (I haven't tested/tried it, so I can't confirm its
success).
My recommendation, if possible, would be to write a small program in C that is given the setuid
permission bit (try chmod 4755 file-name ). Then, you can call setuid()
from the C program to set the current user id and either continue code execution from the C
application, or have it execute a separate bash script that runs whatever commands you
need/want. This is also a pretty hacky method, but as far as non-sudo alternatives go it's
probably one of the easiest (in my opinion).
I've written a function can_user_write_to_file which will return 1
if the user passed to it either is the owner of the file/directory, or is a member of a group
which has write access to that file/directory. If not, the method returns 0 .
## Method which returns 1 if the user can write to the file or
## directory.
##
## $1 :: user name
## $2 :: file
function can_user_write_to_file() {
if [[ $# -lt 2 || ! -r $2 ]]; then
echo 0
return
fi
local user_id=$(id -u ${1} 2>/dev/null)
local file_owner_id=$(stat -c "%u" $2)
if [[ ${user_id} == ${file_owner_id} ]]; then
echo 1
return
fi
local file_access=$(stat -c "%a" $2)
local file_group_access=${file_access:1:1}
local file_group_name=$(stat -c "%G" $2)
local user_group_list=$(groups $1 2>/dev/null)
if [ ${file_group_access} -ge 6 ]; then
for el in ${user_group_list-nop}; do
if [[ "${el}" == ${file_group_name} ]]; then
echo 1
return
fi
done
fi
echo 0
}
At least from these tests, the method works as intended considering file ownership and
group write access :-)
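Since the function echoes 1 or 0 instead of using its exit status, a caller tests it via command substitution; a made-up example:
if [[ $(can_user_write_to_file mysql /var/lib/mysql) -eq 1 ]]; then
    echo "user mysql may write to /var/lib/mysql"
fi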
Because I had to make some changes to @chepner's answer in order to get it to work, I'm
posting my ad-hoc script here for easy copy & paste. It's a minor refactoring only, and I
have upvoted chepner's answer. I'll delete mine if the accepted answer is updated with these
fixes. I have already left comments on that answer pointing out the things I had trouble
with.
I wanted to do away with the Bashisms so that's why I'm not using arrays at all. The
(( arithmetic evaluation )) is still a Bash-only feature, so I'm
stuck on Bash after all.
for f; do
set -- $(stat -Lc "0%a %G %U" "$f")
(("$1" & 0002)) && continue
if (("$1" & 0020)); then
case " "$(groups "$USER")" " in *" "$2" "*) continue ;; esac
elif (("$1" & 0200)); then
[ "$3" = "$USER" ] && continue
fi
echo "$0: Wrong permissions" "$@" "$f" >&2
done
Without the comments, this is even fairly compact.
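Because the loop is written as for f; do , it iterates over the script's positional arguments, so one way to drive it (assuming the snippet above is saved as an executable bash script called check_writable.sh, a name invented here) is:
find /srv/www -type f -exec ./check_writable.sh {} +
# every regular file under /srv/www is passed as an argument and checked in batches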
Note that the code will also be executed if the file does not exist at all. It is fine with
find but in other scenarios (such as globs) should be combined with -h to handle
this case, for instance [ -h "$F" -a ! -e "$F" ] . – Calimo
Apr 18 '17 at 19:50
this seems pretty nice as this only returns true if the file is actually a symlink. But even
with adding -q, readlink outputs the name of the link on Linux. If this is the case in
general maybe the answer should be updated with 'readlink -q $F > /dev/null'. Or am I
missing something? – zoltanctoth
Nov 8 '11 at 10:55
I'd strongly suggest not to use find -L for the task (see below for
explanation). Here are some other ways to do this:
If you want to use a "pure find " method, it should rather look like this:
find . -xtype l
( xtype is a test performed on a dereferenced link) This may not be
available in all versions of find , though. But there are other options as
well:
You can also exec test -e from within the find command:
find . -type l ! -exec test -e {} \; -print
Even some grep trick could be better (i.e., safer ) than
find -L , but not exactly such as presented in the question (which greps in
entire output lines, including filenames):
find . -type l -exec sh -c 'file -b "$1" | grep -q ^broken' sh {} \; -print
The find -L trick quoted by solo from commandlinefu
looks nice and hacky, but it has one very dangerous pitfall : All the symlinks are followed.
Consider directory with the contents presented below:
$ ls -l
total 0
lrwxrwxrwx 1 michal users 6 May 15 08:12 link_1 -> nonexistent1
lrwxrwxrwx 1 michal users 6 May 15 08:13 link_2 -> nonexistent2
lrwxrwxrwx 1 michal users 6 May 15 08:13 link_3 -> nonexistent3
lrwxrwxrwx 1 michal users 6 May 15 08:13 link_4 -> nonexistent4
lrwxrwxrwx 1 michal users 11 May 15 08:20 link_out -> /usr/share/
If you run find -L . -type l in that directory, all /usr/share/
would be searched as well (and that can take really long) 1 . For a
find command that is "immune to outgoing links", don't use -L .
1 This may look like a minor inconvenience (the command will "just" take long
to traverse all /usr/share ) – but can have more severe consequences. For
instance, consider chroot environments: They can exist in some subdirectory of the main
filesystem and contain symlinks to absolute locations. Those links could seem to be broken
for the "outside" system, because they only point to proper places once you've entered the
chroot. I also recall that some bootloader used symlinks under /boot that only
made sense in an initial boot phase, when the boot partition was mounted as /
.
So if you use a find -L command to find and then delete broken symlinks from
some harmless-looking directory, you might even break your system...
I think -type l is redundant since -xtype l will operate as
-type l on non-links. So find -xtype l is probably all you need.
Thanks for this approach. – quornian
Nov 17 '12 at 21:56
Be aware that those solutions don't work for all filesystem types. For example it won't work
for checking if /proc/XXX/exe link is broken. For this, use test -e
"$(readlink /proc/XXX/exe)" . – qwertzguy
Jan 8 '15 at 21:37
@Flimm find . -xtype l means "find all symlinks whose (ultimate) target files
are symlinks". But the ultimate target of a symlink cannot be a symlink, otherwise we can
still follow the link and it is not the ultimate target. Since there is no such symlinks, we
can define them as something else, i.e. broken symlinks. – weakish
Apr 8 '16 at 4:57
@JoóÁdám "which can only be a symbolic link in case it is broken". Give
"broken symbolic link" or "non exist file" an individual type, instead of overloading
l , is less confusing to me. – weakish
Apr 22 '16 at 12:19
The warning at the end is useful, but note that this does not apply to the -L
hack but rather to (blindly) removing broken symlinks in general. – Alois Mahdal
Jul 15 '16 at 0:22
As rozcietrzewiacz has already commented, find -L can have the unexpected
consequence of expanding the search into symlinked directories, so it isn't the optimal
approach. What no one has mentioned yet is that
find /path/to/search -xtype l
is the more concise, and logically identical command to
find /path/to/search -type l -xtype l
None of the solutions presented so far will detect cyclic symlinks, which is another type
of breakage.
this question addresses portability. To summarize, the portable way to find broken
symbolic links, including cyclic links, is:
find /path/to/search -type l -exec test ! -e {} \; -print
-L Cause the file information and file type (see stat(2)) returned
for each symbolic link to be those of the file referenced by the
link, not the link itself. If the referenced file does not exist,
the file information and type will be for the link itself.
If you need different behavior depending on whether the link is broken or cyclic, you can also use %Y
with find:
$ touch a
$ ln -s a b # link to existing target
$ ln -s c d # link to non-existing target
$ ln -s e e # link to itself
$ find . -type l -exec test ! -e {} \; -printf '%Y %p\n' \
| while read type link; do
case "$type" in
N) echo "do something with broken link $link" ;;
L) echo "do something with cyclic link $link" ;;
esac
done
do something with broken link ./d
do something with cyclic link ./e
Yet another shorthand for those whose find command does not support
xtype can be derived from this: find . -type l -printf "%Y %p\n" | grep -w
'^N' . As andy beat me to it with the same (basic) idea in his script, I was reluctant
to write it as separate answer. :) – syntaxerror
Jun 25 '15 at 0:28
I use this for my case and it works quite well, as I know the directory to look for broken
symlinks:
find -L $path -maxdepth 1 -type l
and my folder does include a link to /usr/share but it doesn't traverse it.
Cross-device links and those that are valid for chroots, etc. are still a pitfall but for my
use case it's sufficient.
Simple no-brainer answer, which is a variation on OP's version. Sometimes, you just want
something easy to type or remember:
@don_crissti I'll never understand why people prefer random web documentation to the
documentation installed on their systems (which has the added benefit of actually being
relevant to their system). – Kusalananda
Nov 17 '17 at 9:53
@Kusalananda - Well, I can think of one scenario in which people would include a link to a
web page instead of a quote from the documentation installed on their system: they're not on
a linux machine at the time of writing the post... However, the link should point (imo) to
the official docs (hence my comment above, which, for some unknown reason, was deleted by the
mods...). That aside, I fully agree with you: the OP should consult the manual page installed
on their system. – don_crissti
Nov 17 '17 at 12:52
My manual pages tend to be from FreeBSD though. Unless I happen to have a Linux VM within
reach. And I have the impression that most questions are GNU/Linux based. – Hennes
Feb 16 at 16:16
But processing each line until the command is finished and only then moving to the next one is very
time consuming; I want to process, for instance, 20 lines at once, and when they're finished,
another 20 lines should be processed.
I thought of wget LINK1 >/dev/null 2>&1 & to send the command
to the background and carry on, but there are 4000 lines here this means I will have
performance issues, not to mention being limited in how many processes I should start at the
same time so this is not a good idea.
One solution that I'm thinking of right now is checking whether one of the commands is
still running or not, for instance after 20 lines I can add this loop:
Of course in this case I will need to append & to the end of the line! But I'm feeling
this is not the right way to do it.
So how do I actually group every 20 lines together and wait for them to finish before going
on to the next 20 lines? This script is dynamically generated, so I can do whatever math I want
on it while it's being generated, but it DOES NOT have to use wget; that was just an example, so
any wget-specific solution is not going to do me any good.
wait is the right answer here, but your while [ $(ps would be much
better written while pkill -0 $KEYWORD – using proctools that is, for legitimate reasons to
check if a process with a specific name is still running. – kojiro
Oct 23 '13 at 13:46
I think this question should be re-opened. The "possible duplicate" QA is all about running a
finite number of programs in parallel. Like 2-3 commands. This question, however, is
focused on running commands in e.g. a loop. (see "but there are 4000 lines"). –
VasyaNovikov
Jan 11 at 19:01
@VasyaNovikov Have you read all the answers to both this question and the
duplicate? Every single answer to this question here, can also be found in the answers to the
duplicate question. That is precisely the definition of a duplicate question. It makes
absolutely no difference whether or not you are running the commands in a loop. –
robinCTS
Jan 11 at 23:08
@robinCTS there are intersections, but questions themselves are different. Also, 6 of the
most popular answers on the linked QA deal with 2 processes only. – VasyaNovikov
Jan 12 at 4:09
I recommend reopening this question because its answer is clearer, cleaner, better, and much
more highly upvoted than the answer at the linked question, though it is three years more
recent. – Dan Nissenbaum
Apr 20 at 15:35
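A minimal sketch of the kind of command list this answer refers to (the processN names are placeholders for real commands, e.g. the wget calls from the question):
process1 &
process2 &
process3 &
process4 &
wait        # block here until all four background jobs have exited
process5 &
process6 &
process7 &
process8 &
wait        # and so on, one batch at a time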
For the above example, 4 processes process1 .. process4 would be
started in the background, and the shell would wait until those are completed before starting
the next set ..
Wait until the child process specified by each process ID pid or job specification
jobspec exits and return the exit status of the last command waited for. If a job spec is
given, all processes in the job are waited for. If no arguments are given, all currently
active child processes are waited for, and the return status is zero. If neither jobspec
nor pid specifies an active child process of the shell, the return status is 127.
So basically i=0; waitevery=4; for link in "${links[@]}"; do wget "$link" & ((
i++%waitevery==0 )) && wait; done >/dev/null 2>&1 – kojiro
Oct 23 '13 at 13:48
Unless you're sure that each process will finish at the exact same time, this is a bad idea.
You need to start up new jobs to keep the current total jobs at a certain cap .... parallel is the answer.
– rsaw
Jul 18 '14 at 17:26
I've tried this but it seems that variable assignments done in one block are not available in
the next block. Is this because they are separate processes? Is there a way to communicate
the variables back to the main process? – Bobby
Apr 27 '17 at 7:55
This is better than using wait , since it takes care of starting new jobs as old
ones complete, instead of waiting for an entire batch to finish before starting the next.
– chepner
Oct 23 '13 at 14:35
For example, if you have the list of links in a file, you can do cat list_of_links.txt
| parallel -j 4 wget {} which will keep four wget s running at a time.
– Mr.
Llama
Aug 13 '15 at 19:30
I am using xargs to call a python script to process about 30 million small
files. I hope to use xargs to parallelize the process. The command I am using
is:
Basically, Convert.py will read in a small json file (4kb), do some
processing and write to another 4kb file. I am running on a server with 40 CPU cores. And no
other CPU-intense process is running on this server.
By monitoring htop (btw, is there any other good way to monitor the CPU performance?), I
find that -P 40 is not as fast as expected. Sometimes all cores will freeze and
decrease almost to zero for 3-4 seconds, then will recover to 60-70%. Then I try to decrease
the number of parallel processes to -P 20-30 , but it's still not very fast. The
ideal behavior should be linear speed-up. Any suggestions for the parallel usage of xargs
?
You are most likely hit by I/O: The system cannot read the files fast enough. Try starting
more than 40: This way it will be fine if some of the processes have to wait for I/O. –
Ole Tange
Apr 19 '15 at 8:45
I second @OleTange. That is the expected behavior if you run as many processes as you have
cores and your tasks are IO bound. First the cores will wait on IO for their task (sleep),
then they will process, and then repeat. If you add more processes, then the additional
processes that currently aren't running on a physical core will have kicked off parallel IO
operations, which will, when finished, eliminate or at least reduce the sleep periods on your
cores. – PSkocik
Apr 19 '15 at 11:41
1- Do you have hyperthreading enabled? 2- in what you have up there, log.txt is actually
overwritten with each call to convert.py ... not sure if this is the intended behavior or
not. – Bichoy
Apr 20 '15 at 3:32
I'd be willing to bet that your problem is python . You didn't say what kind of processing is
being done on each file, but assuming you are just doing in-memory processing of the data,
the running time will be dominated by starting up 30 million python virtual machines
(interpreters).
If you can restructure your python program to take a list of files, instead of just one,
you will get a huge improvement in performance. You can then still use xargs to further
improve performance. For example, 40 processes, each processing 1000 files:
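A sketch of that idea, assuming Convert.py were changed to accept many file names as arguments (the find pattern is illustrative):
find . -name '*.json' -print0 | xargs -0 -n 1000 -P 40 python Convert.py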
This isn't to say that python is a bad/slow language; it's just not optimized for startup
time. You'll see this with any virtual machine-based or interpreted language. Java, for
example, would be even worse. If your program was written in C, there would still be a cost
of starting a separate operating system process to handle each file, but it would be much
less.
From there you can fiddle with -P to see if you can squeeze out a bit more
speed, perhaps by increasing the number of processes to take advantage of idle processors
while data is being read/written.
What is the constraint on each job? If it's I/O you can probably get away with
multiple jobs per CPU core up until you hit the limit of I/O, but if it's CPU intensive, it's
going to be worse than pointless running more jobs concurrently than you have CPU cores.
My understanding of these things is that GNU Parallel would give you better control over
the queue of jobs etc.
As others said, check whether you're I/O-bound. Also, the xargs man page suggests using
-n with -P ; you also don't mention the number of
Convert.py processes you see running in parallel.
As a suggestion, if you're I/O-bound, you might try using an SSD block device, or try
doing the processing in a tmpfs (of course, in this case you should check for enough memory,
avoiding swap due to tmpfs pressure (I think), and the overhead of copying the data to it in
the first place).
I want the ability to schedule commands to be run in a FIFO queue. I DON'T want them to be
run at a specified time in the future as would be the case with the "at" command. I want them
to start running now, but not simultaneously. The next scheduled command in the queue should
be run only after the first command finishes executing. Alternatively, it would be nice if I
could specify a maximum number of commands from the queue that could be run simultaneously;
for example if the maximum number of simultaneous commands is 2, then only at most 2 commands
scheduled in the queue would be taken from the queue in a FIFO manner to be executed, the
next command in the remaining queue being started only when one of the currently 2 running
commands finishes.
I've heard task-spooler could do something like this, but this package doesn't appear to
be well supported/tested and is not in the Ubuntu standard repositories (Ubuntu being what
I'm using). If that's the best alternative then let me know and I'll use task-spooler,
otherwise, I'm interested to find out what's the best, easiest, most tested, bug-free,
canonical way to do such a thing with bash.
UPDATE:
Simple solutions like ; or && from bash do not work. I need to schedule these
commands from an external program, when an event occurs. I just don't want to have hundreds
of instances of my command running simultaneously, hence the need for a queue. There's an
external program that will trigger events where I can run my own commands. I want to handle
ALL triggered events, I don't want to miss any event, but I also don't want my system to
crash, so that's why I want a queue to handle my commands triggered from the external
program.
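The example command the answer below walks through appears to have been lost from this excerpt; judging from the explanation, it was along the lines of:
ls ; touch test ; ls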
That will list the directory. Only after ls has run will it run touch
test , which will create a file named test. And only after that has finished will it run
the next command. (In this case another ls which will show the old contents and
the newly created file).
Similar commands are || and && .
; will always run the next command.
&& will only run the next command if the first returned success.
Example: rm -rf *.mp3 && echo "Success! All MP3s deleted!"
|| will only run the next command if the first command returned a failure
(non-zero) return value. Example: rm -rf *.mp3 || echo "Error! Some files could not be
deleted! Check permissions!"
If you want to run a command in the background, append an ampersand ( &
).
Example: make bzimage & mp3blaster sound.mp3 & make mytestsoftware ; ls ; firefox ; make clean
This will run two commands in the background (in this case a kernel build which will take some
time and a program to play some music). In the foreground it runs another compile job
and, once that is finished, ls, firefox and a make clean (all sequentially).
For more details, see man bash
[Edit after comment]
in pseudo code, something like this?
Program run_queue:
While(true)
{
Wait_for_a_signal();
While( queue not empty )
{
run next command from the queue.
remove this command from the queue.
// If commands were added to the queue during execution then
// the queue is not empty, keep processing them all.
}
// Queue is now empty, returning to wait_for_a_signal
}
//
// Wait forever on commands and add them to a queue
// Signal run_queue when something gets added.
//
program add_to_queue()
{
While(true)
{
Wait_for_event();
Append command to queue
signal run_queue
}
}
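A minimal bash sketch of that pseudo code, using a named pipe as the queue (the FIFO path is an arbitrary choice, and commands are expected one per line):
queue=/tmp/cmdqueue
mkfifo "$queue"

# consumer ("run_queue"): keep the FIFO open read-write so the loop does not
# see EOF whenever a producer disconnects, then run commands strictly one at
# a time, in arrival order
exec 3<>"$queue"
while IFS= read -r cmd <&3; do
    bash -c "$cmd"
done

# producer ("add_to_queue"): any process can append a command, e.g.
#   echo "wget http://example.com/file" > /tmp/cmdqueue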
The easiest way would be to simply run the commands sequentially:
cmd1; cmd2; cmd3; cmdN
If you want the next command to run only if the previous command exited
successfully, use && :
cmd1 && cmd2 && cmd3 && cmdN
That is the only bash native way I know of doing what you want. If you need job control
(setting a number of parallel jobs etc), you could try installing a queue manager such as
TORQUE but that
seems like overkill if all you want to do is launch jobs sequentially.
You are looking for at 's twin brother: batch . It uses the same
daemon but instead of scheduling a specific time, the jobs are queued and will be run
whenever the system load average is low.
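Usage is just a matter of piping the command to batch (the script path is illustrative):
echo "/path/to/long_running_job.sh" | batch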
Apart from dedicated queuing systems (like the Sun Grid Engine ) which you can also
use locally on one machine and which offer dozens of possibilities, you can use something
like
command1 && command2 && command3
which is the other extreme -- a very simple approach. The latter neither provides
multiple simultaneous processes nor gradual filling of the "queue".
A solution to this question will solve the other question as well, you might want to delete
the other question in this situation. – Pavel Šimerda
Jan 2 '15 at 8:29
To invoke a login shell using sudo just use -i . When command is
not specified you'll get a login shell prompt, otherwise you'll get the output of your
command.
Example (login shell):
sudo -i
Example (with a specified user):
sudo -i -u user
Example (with a command):
sudo -i -u user whoami
Example (print user's $HOME ):
sudo -i -u user echo \$HOME
Note: The backslash character ensures that the dollar sign reaches the target user's shell
and is not interpreted in the calling user's shell.
I have just checked the last example with strace which tells you exactly what's
happening. The output below shows that the shell is being called with --login
and with the specified command, just as in your explicit call to bash, but in addition
sudo can do its own work like setting the $HOME .
I noticed that you are using -S and I don't think it is generally a good
technique. If you want to run commands as a different user without performing authentication
from the keyboard, you might want to use SSH instead. It works for localhost as
well as for other hosts and provides public key authentication that works without any
interactive input.
ssh user@localhost echo \$HOME
Note: You don't need any special options with SSH as the SSH server always creates a login
shell to be accessed by the SSH client.
sudo -i -u user echo \$HOME doesn't work for me. Output: $HOME .
strace gives the same output as yours. What's the issue? – John_West
Nov 23 '15 at 11:12
You're giving Bash too much credit. All "login shell" means to Bash is what files are sourced
at startup and shutdown. The $HOME variable doesn't figure into it.
In fact, Bash doesn't do anything to set $HOME at all. $HOME is
set by whatever invokes the shell (login, ssh, etc.), and the shell inherits it. Whatever
started your shell as admin set $HOME and then exec-ed bash ,
sudo by design doesn't alter the environment unless asked or configured to do
so, so bash as otheruser inherited it from your shell.
If you want sudo to handle more of the environment in the way you're
expecting, look at the -i switch for sudo. Try:
That sudo syntax threw an error on my machine. ( su uses the
-c option, but I don't think sudo does.) I had better luck with:
HomeDir=$( sudo -u "$1" -H -s echo "\$HOME" ) – palswim
Oct 13 '16 at 20:21
"... (which means "substitute user" or "switch user") ..."
"... (hmm... what's the mnemonic? Super-User-DO?) ..."
"... The official meaning of "su" is "substitute user" ..."
"... Interestingly, Ubuntu's manpage does not mention "substitute" at all. The manpage at gnu.org ( gnu.org/software/coreutils/manual/html_node/su-invocation.html ) does indeed say "su: Run a command with substitute user and group ID". ..."
"... sudo -s runs a [specified] shell with root privileges. sudo -i also acquires the root user's environment. ..."
"... To see the difference between su and sudo -s , do cd ~ and then pwd after each of them. In the first case, you'll be in root's home directory, because you're root. In the second case, you'll be in your own home directory, because you're yourself with root privileges. There's more discussion of this exact question here . ..."
"... I noticed sudo -s doesnt seem to process /etc/profile ..."
The main difference between these commands is in the way they restrict access to their
functions.
su(which means "substitute user" or "switch user") - does exactly
that, it starts another shell instance with privileges of the target user. To ensure you have
the rights to do that, it asks you for the password of the target user . So, to become root,
you need to know root password. If there are several users on your machine who need to run
commands as root, they all need to know root password - note that it'll be the same password.
If you need to revoke admin permissions from one of the users, you need to change root
password and tell it only to those people who need to keep access - messy.
sudo(hmm... what's the mnemonic? Super-User-DO?) is completely
different. It uses a config file (/etc/sudoers) which lists which users have rights to
specific actions (run commands as root, etc.) When invoked, it asks for the password of the
user who started it - to ensure the person at the terminal is really the same "joe" who's
listed in /etc/sudoers . To revoke admin privileges from a person, you just need
to edit the config file (or remove the user from a group which is listed in that config).
This results in much cleaner management of privileges.
As a result of this, in many Debian-based systems root user has no password
set - i.e. it's not possible to login as root directly.
Also, /etc/sudoers allows you to specify some additional options - e.g. user X is
only able to run program Y, etc.
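For instance, a sudoers entry of roughly this shape (user name and command are illustrative) limits one user to a single command run as root:
joe    ALL=(root) /usr/bin/systemctl restart apache2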
The often-used sudo su combination works as follows: first sudo
asks you for your password, and, if you're allowed to do so, invokes the next
command ( su ) as a super-user. Because su is invoked by
root , it does not ask for any further password - the net effect is that you enter your own password instead of root's.
So,
sudo su allows you to open a shell as another user (including root), if you're
allowed super-user access by the /etc/sudoers file.
I've never seen su as "switch user", but always as superuser; the default
behavior without another's user name (though it makes sense). From wikipedia : "The su command, also referred to as super user[1] as early as 1974,
has also been called "substitute user", "spoof user" or "set user" because it allows changing the account associated with the
current terminal (window)."
@dr jimbob: you're right, but I find that "switch user" kind of describes better what
it does - though historically it stands for "super user". I'm also delighted to find that the
wikipedia article is very similar to my answer - I never saw the article before :)
sudo lets you run commands in your own user account with root privileges.
su lets you switch user so that you're actually logged in as root.
sudo -s runs a [specified] shell with root privileges. sudo -i also acquires
the root user's environment.
To see the difference between su and sudo -s , do cd
~ and then pwd after each of them. In the first case, you'll be in root's
home directory, because you're root. In the second case, you'll be in your own home
directory, because you're yourself with root privileges. There's more discussion of this exact question here .
"you're yourself with root privileges" is not what's actually happening :) Actually, it's not
possible to be "yourself with root privileges" - either you're root or you're yourself. Try
typing whoami in both cases. The fact that cd ~ results are different is
a result of sudo -s not setting $HOME environment variable. – Sergey
Oct 22 '11 at 7:28
@Sergey, whoami says 'root' because you are running the 'whoami' cmd as though you
sudoed it, so temporarily (for the duration of that command) you appear to be the root user,
but you might still not have full root access according to the sudoers file. –
Octopus
Feb 6 '15 at 22:15
@Octopus: what I was trying to say is that in Unix, a process can only have one UID, and that
UID determines the permissions of the process. You can't be "yourself with root privileges",
a program either runs with your UID or with root's UID (0). – Sergey
Feb 6 '15 at 22:24
Regarding "you might still not have full root access according to the sudoers file": the
sudoers file controls who can run which command as another user, but that happens before the command is executed.
However, once you were allowed to start a process as, say, root -- the running process has root's UID and has a full access to the system, there's
no way for sudo to restrict that.
Again, you're always either yourself or root, there's no
"half-n-half". So, if sudoers file allows you to run shell as root -- permissions
in that shell would be indistinguishable from a "normal" root shell. – Sergey
Feb 6 '15 at 22:32
This answer is a dupe of my answer on a
dupe of this question , put here on the canonical answer so that people can find it!
The major difference between sudo -i and sudo -s is:
sudo -i gives you the root environment, i.e. your ~/.bashrc
is ignored.
sudo -s gives you the user's environment, so your ~/.bashrc
is respected.
Here is an example, you can see that I have an application lsl in my
~/.bin/ directory which is accessible via sudo -s but not
accessible with sudo -i . Note also that the Bash prompt changes as well with
sudo -i but not with sudo -s :
dotancohen@melancholy:~$ ls .bin
lsl
dotancohen@melancholy:~$ which lsl
/home/dotancohen/.bin/lsl
dotancohen@melancholy:~$ sudo -i
root@melancholy:~# which lsl
root@melancholy:~# exit
logout
dotancohen@melancholy:~$ sudo -s
Sourced .bashrc
dotancohen@melancholy:~$ which lsl
/home/dotancohen/.bin/lsl
dotancohen@melancholy:~$ exit
exit
Though sudo -s is convenient for giving you the environment that you are
familiar with, I recommend the use of sudo -i for two reasons:
The visual reminder that you are in a 'root' session.
The root environment is far less likely to be poisoned with malware, such as a rogue
line in .bashrc .
sudo asks for your own password (and also checks if you're allowed to run
commands as root, which is configured through /etc/sudoers -- by default all
user accounts that belong to the "admin" group are allowed to use sudo).
sudo -s launches a shell as root, but doesn't change your working directory.
sudo -i simulates a login into the root account: your working directory will be
/root , and root's .profile etc. will be sourced as if on
login.
to make the answer more complete: sudo -s is almost equal to su
($HOME is different) and sudo -i is equal to su - –
In Ubuntu or a related system, I don't find much use for su in the traditional,
super-user sense. sudo handles that case much better. However, su
is great for becoming another user in one-off situations where configuring sudoers would be
silly.
For example, if I'm repairing my system from a live CD/USB, I'll often mount my hard drive
and other necessary stuff and chroot into the system. In such a case, my first
command is generally:
su - myuser # Note the '-'. It means to act as if that user had just logged in.
That way, I'm operating not as root, but as my normal user, and I then use
sudo as appropriate.
This is nothing to do with another physical user. Both ID's are mine. I know the password as
I created the account. I just don't want to have to type the password every time. –
zio
Feb 17 '13 at 13:24
It needs a bit of configuration though, but once done you would only do this:
sudo -u user2 -s
And you would be logged in as user2 without entering a password.
Configuration
To configure sudo, you must edit its configuration file via: visudo . Note:
this command will open the configuration using the vi text editor. If you are
uncomfortable with that, you need to set another editor (using export
EDITOR=<command> ) before executing the following line. Another command line
editor sometimes regarded as easier is nano , so you would do export
EDITOR=/usr/bin/nano . You usually need super user privilege for visudo
:
sudo visudo
This file is structured in different sections: the aliases, then the defaults, and finally, at
the end, the rules. This is where you need to add the new line. So you navigate to
the end of the file and add this:
user1 ALL=(user2) NOPASSWD: /bin/bash
You can also replace /bin/bash with ALL and then you could launch
any command as user2 without a password: sudo -u user2 <command>
.
Update
I have just seen your comment regarding Skype. You could consider adding Skype directly to
the sudo's configuration file. I assume you have Skype installed in your
Applications folder:
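A rule for that case might look roughly like this (the Skype path is an assumption about a standard macOS install):
user1 ALL=(user2) NOPASSWD: /Applications/Skype.app/Contents/MacOS/Skype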
One thing to note from a security-perspective is that specifying a specific command implies
that it should be a read-only command for user1; Otherwise, they can overwrite the command
with something else and run that as user2. And if you don't care about that, then you might
as well specify that user1 can run any command as user2 and therefore have a simpler
sudo config. – Stan Kurdziel
Oct 26 '15 at 16:56
@StanKurdziel good point! Although it is something to be aware of, it's really seldom to have
system executables writable by users unless you're root but in this case you don't need sudo
;-) But you're right to add this comment because it's so seldom that I've probably overlooked
it more than one time. – Huygens
Oct 26 '15 at 19:24
To get it nearer to the behaviour su - user2 instead of su user2 ,
the commands should probably all involve sudo -u user2 -i , in order to simulate
an initial login as user2 – Gert van den Berg
Aug 10 '16 at 14:24
You'd still have the issues where Skype gets confused since two instances are running on
one user account and files read/written by that program might conflict. It also might work
well enough for your needs and you'd not need an iPod touch to run your second Skype
instance.
This is a good secure solution for the general case of password-free login to any account on
any host, but I'd say it's probably overkill when both accounts are on the same host and
belong to the same user. – calum_b
Feb 18 '13 at 9:54
@scottishwildcat It's far more secure than the alternative of scripting the password and
feeding it in clear text or using a variable and storing the password in the keychain and
using a tool like expect to script the interaction. I just use sudo su -
blah and type my password. I think the other answer covers sudo well enough to keep
this as a comment. – bmike ♦
Feb 18 '13 at 14:02
We appear to be in total agreement - thanks for the addition - feel free to edit it into the
answer if you can improve on it. – bmike ♦
Feb 18 '13 at 18:46
The accepted solution ( sudo -u user2 <...> ) does have the advantage that
it can't be used remotely, which might help for security - there is no private key for user1
that can be stolen. – Gert van den Berg
Aug 10 '16 at 14:20
If you want to use sudo su - user without a password, you should (if you have
the privileges) do the following in your sudoers file:
<yourusername> ALL = NOPASSWD: /bin/su - <otheruser>
where:
<yourusername> is your username (e.g. saumun89)
<otheruser> is the user you want to change to
Then put into the script:
sudo /bin/su - <otheruser>
Doing just this won't get subsequent commands run as <otheruser> ;
it will spawn a new shell. If you want to run another command from within the script as this
other user, you should use something like:
sudo -u <otheruser> <command>
And in sudoers file:
<yourusername> ALL = (<otheruser>) NOPASSWD: <command>
Obviously, a more generic line like:
<yourusername> ALL = (ALL) NOPASSWD: ALL
Will get things done, but would grant the permission to do anything as anyone.
when the sudo su - user command gets executed, it asks for a password. I want a solution in
which the script automatically reads the password from somewhere. I don't have permission to do what
you told me earlier. – sam
Feb 9 '11 at 11:43
echo "your_password" | sudo -S [rest of your parameters for sudo]
(Of course without [ and ])
Please note that you should protect your script from read access from unauthorized users.
If you want to read password from separate file, you can use
sudo -S [rest of your parameters for sudo] < /etc/sudo_password_file
(Or whatever is the name of password file, containing password and single line break.)
From sudo man page:
-S The -S (stdin) option causes sudo to read the password from
the standard input instead of the terminal device. The
password must be followed by a newline character.
The easiest way is to make it so that user doesn't have to type a password at all.
You can do that by running visudo , then changing the line that looks
like:
someuser ALL=(ALL) ALL
to
someuser ALL=(ALL) NOPASSWD: ALL
However if it's just for one script, it would be more secure to restrict passwordless
access to only that script, and remove the (ALL) , so they can only run it as
root, not any user , e.g.
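For example (user name and script path are placeholders):
someuser ALL= NOPASSWD: /usr/local/bin/myscript.sh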
I do not have permission to edit the sudoers file. Is there any other way to have it read the password
from somewhere, so that this can be automated? – sam
Feb 9 '11 at 11:34
you are out of luck ... you could do this with, let's say, expect , but that would
leave the password for your user hardcoded somewhere, where people could see it (even if
you set up permissions the right way, it could still be read by root). – Torian
Feb 9 '11 at 11:40
when the sudo su - user command gets executed, it asks for a password. I want a solution in
which the script automatically reads the password from somewhere. I don't have permission to edit the
sudoers file. I have permission to store the password in a file; the script should read the
password from that file. – sam
The sudoers policy plugin determines a user's sudo privileges.
For the targetpw:
sudo will prompt for the password of the user specified by the -u option (defaults to
root) instead of the password of the invoking user when running a command or editing a
file.
sudo(8) allows you to execute commands as someone else.
So, basically it says that any user can run any command on any host as any user and yes,
the user just has to authenticate, but with the password of the other user, in order to run
anything.
The first ALL is the users allowed
The second one is the hosts
The third one is the user as you are running the command
The last one is the commands allowed
is "ALL ALL=!SUDOSUDO" as the last line is like when having DROP iptables POLICY and still
using a -j DROP rule as last rule in ex.: INPUT chain? :D or does it has real effects?
– gasko peter
Dec 6 '12 at 14:30
"... IF you're using putty in either Xorg or Windows (i.e terminal within a gui) , it's possible to use the "conventional" right-click copy/paste behavior while in mc. Hold the shift key while you mark/copy. ..."
"... Putty has ability to copy-paste. In mcedit, hold Shift and select by mouse ..."
open the other file in the editor, and navigate to the target location
press Shift+F5 to open Insert file dialog
press Enter to paste from the default file location (which is same as the
one in Save block dialog)
NOTE: There are other environment related methods, that could be more conventional
nowadays, but the above one does not depend on any desktop environment related clipboard,
(terminal emulator features, putty, Xorg, etc.). This is a pure mcedit feature which works
everywhere.
If you get unwanted indents in what was pasted then while editing file in Midnight Commander
press F9 to show top menu and in Options/Generals menu uncheck Return does
autoindent option. Yes, I was happy when I found it too :) – Piotr Dobrogost
Mar 30 '17 at 17:32
IF you're using putty in either Xorg or Windows (i.e terminal within a gui) , it's possible
to use the "conventional" right-click copy/paste behavior while in mc. Hold the shift key
while you mark/copy.
LOL - did you actually read the other answers? And your answer is incomplete, you should
include what to do with the mouse in order to "select by mouse".
According to help in MC:
Ctrl + Insert copies to the mcedit.clip, and Shift +
Insert pastes from mcedit.clip.
It doesn't work for me, for some reason, but pressing F9 brings up a menu;
Edit > Copy to clipfile - worked fine.
You can use IDA Pro by
Hex-Rays . You will usually not get
good C++ out of a binary unless you compiled in debugging information. Prepare to spend a lot
of manual labor reversing the code.
If you didn't strip the binaries there is some hope as IDA Pro can produce C-alike code
for you to work with. Usually it is very rough though, at least when I used it a couple of
years ago.
To clarify, IDA will only give the disassembly. There's an add-on to it called Hex-Rays that
will decompile the rest of the way into C/C++ source, to the extent that's possible. –
davenpcj
May 5 '12 at
A lot of information is discarded in the compiling process. Even if a decompiler could produce the
logically equivalent code with classes and everything (it probably can't), the self-documenting
part is gone in optimized release code. No variable names, no routine names, no class names -
just addresses.
Yes, but none of them will manage to produce code readable enough to be worth the effort. You
will spend more time trying to read the decompiled source with assembler blocks inside, than
rewriting your old app from scratch.
"... MC_HOME variable can be set to alternative path prior to starting mc. Man pages are not something you can find the answer right away =) ..."
"... A small drawback of this solution: if you set MC_HOME to a directory different from your usual HOME, mc will ignore the content of your usual ~/.bashrc so, for example, your custom aliases defined in that file won't work anymore. Workaround: add a symlink to your ~/.bashrc into the new MC_HOME directory ..."
That turned out to be simpler than one might think. The MC_HOME variable can be set to an alternative
path prior to starting mc. Man pages are not something where you can find the answer right away =)
This is useful when you have to share the same user name on a remote server (access can be distinguished
by rsa keys) and want to use your favorite mc configuration without overwriting it. Concurrent sessions
do not interfere with each other.
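For example, each session can be started with its own configuration directory (the path is illustrative):
MC_HOME=$HOME/.mc-remote mc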
A small drawback of this solution: if you set MC_HOME to a directory different from your
usual HOME, mc will ignore the content of your usual ~/.bashrc so, for example, your custom
aliases defined in that file won't work anymore. Workaround: add a symlink to your ~/.bashrc
into the new MC_HOME directory – Cri
Sep 5 '16 at 10:26
If you mean, you want to be able to run two instances of mc as the same user at the same
time with different config directories, as far as I can tell you can't. The path is
hardcoded.
However, if you mean, you want to be able to switch which config directory is being used,
here's an idea (tested, works). You probably want to do it without mc running:
Create a directory $HOME/mc_conf , with a subdirectory, one
.
Move the contents of $HOME/.config/mc into the
$HOME/mc_conf/one subdirectory
Duplicate the one directory as $HOME/mc_conf/two .
Create a script, $HOME/bin/switch_mc :
#!/bin/bash
configBase=$HOME/mc_conf
linkPath=$HOME/.config/mc
if [ -z "$1" ] || [ ! -e "$configBase/$1" ]; then
echo "Valid subdirectory name required."
exit 1
fi
killall mc
rm $linkPath
ln -sv $configBase/$1 $linkPath
Run this, switch_mc one . rm will bark about no such file,
that doesn't matter.
Hopefully it's clear what's happening there -- this sets the config directory path up as a
symlink. Whatever configuration changes you now make and save will be in the
one directory. You can then exit and switch_mc two , reverting to
the old config, then start mc again, make changes and save them, etc.
You could get away with removing the killall mc and playing around; the
configuration stuff is in the ini file, which is read at start-up (so you can't
switch on the fly this way). It's then not touched until exit unless you "Save setup", but at
exit it may be overwritten, so the danger here is that you erase something you did earlier or
outside of the running instance.
That works indeed; your idea is pretty clear, thank you for your time. However, my idea was to
be able to run differently configured mc's under the same account without them interfering with each
other. I should have specified that in my question. The path to the config dir is in fact hardcoded,
but it is hardcoded RELATIVE to the user's home dir, that is, the value of $HOME; thus changing it
before mc starts DOES change the config dir location - I've checked that. The drawback is that $HOME
stays changed as long as mc runs, which could be resolved if mc had a kind of startup
hook in which the original HOME could be restored – Tagwint
Dec 18 '14 at 16:52
I have been using a rsync script to synchronize data at one host with the data
at another host. The data has numerous small-sized files that contribute to almost 1.2TB.
In order to sync those files, I have been using rsync command as follows:
As a test, I picked up two of those projects (8.5GB of data) and I executed the command
above. Being a sequential process, it took 14 minutes 58 seconds to complete. So, for 1.2TB
of data it would take several hours.
If I could run multiple rsync processes in parallel (using
& , xargs or parallel ), it would save my
time.
I tried the below command with parallel (after cd ing to the source
directory) and it took 12 minutes 37 seconds to execute:
If possible, we would want to use 50% of total bandwidth. But, parallelising multiple
rsync s is our first priority. – Mandar Shinde
Mar 13 '15 at 7:32
In fact, I do not know about above parameters. For the time being, we can neglect the
optimization part. Multiple rsync s in parallel is the primary focus now.
– Mandar Shinde
Mar 13 '15 at 7:47
Here, the --relative option ( link
) ensures that the directory structure for the affected files, at the source and destination,
remains the same (inside the /data/ directory), so the command must be run in the
source folder (in this example, /data/projects ).
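The commands themselves are not preserved in this excerpt; reconstructed from the comments below, the approach is roughly: do a dry run to build a file list ( result.log ), then feed each name to a parallel rsync with --relative (host, options and job count are illustrative):
cd /data/projects
rsync -av --dry-run . remote:/data/projects/ > /tmp/result.log
cat /tmp/result.log | parallel -j 5 rsync -av --relative {} remote:/data/projects/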
That would do an rsync per file. It would probably be more efficient to split up the whole
file list using split and feed those filenames to parallel. Then use rsync's
--files-from to get the filenames out of each file and sync them. rm backups.*
split -l 3000 backup.list backups. ls backups.* | parallel --line-buffer --verbose -j 5 rsync
--progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/ –
Sandip Bhattacharya
Nov 17 '16 at 21:22
How does the second rsync command handle the lines in result.log that are not files? i.e.
receiving file list ... done , created directory /data/ . –
Mike D
Sep 19 '17 at 16:42
On newer versions of rsync (3.1.0+), you can use --info=name in place of
-v , and you'll get just the names of the files and directories. You may want to
use --protect-args to the 'inner' transferring rsync too if any files might have spaces or
shell metacharacters in them. – Cheetah
Oct 12 '17 at 5:31
I would strongly discourage anybody from using the accepted answer; a better solution is to
crawl the top level directory and launch a proportional number of rsync operations.
I have a large zfs volume and my source was a cifs mount. Both are linked with 10G,
and in some benchmarks can saturate the link. Performance was evaluated using zpool
iostat 1 .
The source drive was mounted like:
mount -t cifs -o username=,password= //static_ip/70tb /mnt/Datahoarder_Mount/ -o vers=3.0
In synthetic benchmarks (crystal disk), sequential write performance approaches
900 MB/s, which means the link is saturated. 130MB/s is not very good, and it is the difference
between waiting a weekend and waiting two weeks.
So, I built the file list and tried to run the sync again (I have a 64 core machine):
In conclusion, as @Sandip Bhattacharya brought up, write a small script to get the
directories and parallel that. Alternatively, pass a file list to rsync. But don't create new
instances for each file.
ls -1 | parallel rsync -a {} /destination/directory/
This is only useful when you have more than a few non-near-empty directories; otherwise
you'll end up with almost every rsync terminating and the last one doing all
the work alone.
rsync is a great tool, but sometimes it will not fill up the available bandwidth. This
is often a problem when copying several big files over high speed connections.
The following will start one rsync per big file in src-dir to dest-dir on the server
fooserver:
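The command itself is not preserved here; a sketch in the spirit of the GNU parallel manual's rsync example (size threshold, paths and host name are illustrative):
cd src-dir
find . -type f -size +100000 | \
  parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; \
    rsync -Havessh {} fooserver:/dest-dir/{//}/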
If I use --dry-run option in rsync , I would have a list of files
that would be transferred. Can I provide that file list to parallel in order to
parallelise the process? – Mandar Shinde
Apr 10 '15 at 3:47
Let's start with what they have in common: All three formats store
sequence data, and
sequence metadata.
Furthermore, all three formats are text-based.
However, beyond that all three formats are different and serve different purposes.
Let's start with the simplest format:
FASTA
FASTA stores a variable number of sequence records, and for each record it stores the
sequence itself, and a sequence ID. Each record starts with a header line whose first
character is > , followed by the sequence ID. The next lines of a record
contain the actual sequence.
The Wikipedia
article gives several examples for peptide sequences, but since FASTQ and SAM are used
exclusively (?) for nucleotide sequences, here's a nucleotide example:
In the context of nucleotide sequences, FASTA is mostly used to store reference data; that
is, data extracted from a curated database; the above is adapted from GtRNAdb (a database of tRNA sequences).
FASTQ
FASTQ was conceived to solve a specific problem of FASTA files: when sequencing, the
confidence in a given base call (that is, the identity of a
nucleotide) varies. This is expressed in the Phred quality score . FASTA had no
standardised way of encoding this. By contrast, a FASTQ record contains a sequence of quality
scores for each nucleotide.
A FASTQ record has the following format:
A line starting with @ , containing the sequence ID.
One or more lines that contain the sequence.
A new line starting with the character + , and being either empty or
repeating the sequence ID.
One or more lines that contain the quality scores.
Here's an example of a FASTQ file with two records:
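The example itself is not reproduced in this excerpt; an illustrative file with two made-up records (IDs, sequences and quality strings are invented, with each quality string the same length as its sequence) looks like this:
@read_1
GATTTGGGGTTCAAAGCAGT
+
!''*((((***+))%%%++)
@read_2
ACGTACGTACGTACGTACGT
+read_2
IIIIIIIIIIIIIIIIIIII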
FASTQ files are mostly used to store short-read data from high-throughput sequencing
experiments. As a consequence, the sequence and quality scores are usually put into a single
line each, and indeed many tools assume that each record in a FASTQ file is exactly four
lines long, even though this isn't guaranteed.
SAM files are so complex that a complete description[PDF]
takes 15 pages. So here's the short version.
The original purpose of SAM files is to store mapping information for sequences from
high-throughput sequencing. As a consequence, a SAM record needs to store more than just the
sequence and its quality, it also needs to store information about where and how a sequence
maps into the reference.
Unlike the previous formats, SAM is tab-based, and each record, consisting of either 11 or
12 fields, fills exactly one line. Here's an example (tabs replaced by fixed-width
spacing):
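The original example is not reproduced in this excerpt; an illustrative single record with the 11 mandatory fields (names, coordinates and scores are invented; tabs shown as spacing) might look like:
read_1  0  chr1  100  60  20M  *  0  0  GATTTGGGGTTCAAAGCAGT  IIIIIIIIIIIIIIIIIIII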
For a description of the individual fields, refer to the documentation. The relevant bit
is this: SAM can express exactly the same information as FASTQ, plus, as mentioned, the
mapping information. However, SAM is also used to store read data without mapping
information.
In addition to sequence records, SAM files can also contain a header , which
stores information about the reference that the sequences were mapped to, and the tool used
to create the SAM file. Header information precedes the sequence records, and consists of lines
starting with @ .
SAM itself is almost never used as a storage format; instead, files are stored in BAM
format, which is a compact binary representation of SAM. It stores the same information, just
more efficiently, and in conjunction with a search index , allows fast retrieval of
individual records from the middle of the file (= fast random access ). BAM files are also much
more compact than compressed FASTQ or FASTA files.
The above implies a hierarchy in what the formats can store: FASTA ⊂ FASTQ
⊂ SAM.
In a typical high-throughput analysis workflow, you will encounter all three file
types:
FASTA to store the reference genome/transcriptome that the sequence fragments will be
mapped to.
FASTQ to store the sequence fragments before mapping.
SAM/BAM to store the sequence fragments after mapping.
FASTQ is used for long-read sequencing as well, which could have a single record being
thousands of 80-character lines long. Sometimes these are split by line breaks, sometimes
not. – Scott Gigante
Aug 17 '17 at 6:01
Sorry, should have clarified: I was just referring to the line "FASTQ files are (almost?)
exclusively used to store short-read data from high-throughput sequencing experiments."
Definitely not exclusively. – Scott Gigante
Aug 17 '17 at 13:22
@charlesdarwin I have no idea. The line with the plus sign is completely redundant. The
original developers of the FASTQ format probably intended it as a redundancy to simplify
error checking (= to see if the record was complete) but it fails at that. In hindsight it
shouldn't have been included. Unfortunately we're stuck with it for now. – Konrad
Rudolph
Feb 21 at 17:06
FASTA file format is a DNA sequence format for specifying or representing DNA
sequences and was first described by Pearson (Pearson,W.R. and Lipman,D.J. (1988)
Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85,
2444–2448)
FASTQ is another DNA sequence file format that extends the FASTA format with
the ability to store the sequence quality. The quality scores are often represented in ASCII
characters which correspond to a phred score.
Both FASTA and FASTQ are common sequence representation formats and have emerged as key
data interchange formats for molecular biology and bioinformatics.
SAM is format for representing sequence alignment information from a read
aligner. It represents sequence information in respect to a given reference sequence. The
information is stored in a series of tab delimited ascii columns. The full SAM format
specification is available at http://samtools.sourceforge.net/SAM1.pdf
SAM can also (and is increasingly used for it, see PacBio) store unaligned sequence
information, and in this regard equivalent to FASTQ. – Konrad Rudolph
Jun 2 '17 at 10:43
Incidentally, the first part of your question is something you could have looked up yourself
as the first hits on Google of "NAME format" point you to primers on Wikipedia, no less. In
future, please do that before asking a question.
FASTA (officially) just stores the name of a sequence and the sequence; unofficially
people also add comment fields after the name of the sequence. FASTQ was invented to store
both sequence and associated quality values (e.g. from sequencing instruments). SAM was
invented to store alignments of (small) sequences (e.g. generated from sequencing) with
associated quality values and some further data onto a larger sequences, called reference
sequences, the latter being anything from a tiny virus sequence to ultra-large plant
sequences.
FASTA and FASTQ formats are both file formats that contain sequencing reads, while SAM files
are these reads aligned to a reference sequence. In other words, FASTA and FASTQ are the "raw
data" of sequencing while SAM is the product of aligning the sequencing reads to a refseq.
A FASTA file contains a read name followed by the sequence. An example of one of these
reads for RNASeq might be:
>Flow cell number: lane number: chip coordinates etc.
ATTGGCTAATTGGCTAATTGGCTAATTGGCTAATTGGCTAATTGGCTAATTGGCTAATTGGCTA
The FASTQ version of this read will have two more lines, one + as a placeholder and then
a line of quality scores for the base calls. The qualities are given as characters with '!'
being the lowest and '~' being the highest, in increasing ASCII value. It would look
something like this
@Flow cell number: lane number: chip coordinates etc.
ATTGGCTAATTGGCTAATTGGCTAATTGGCTAATTGGCTAATTGGCTAATTGGCTAATTGGCTA
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
A SAM file has many fields for each alignment, the header begins with the @ character. The
alignment contains 11 mandatory fields and various optional ones. You can find the spec file
here: https://samtools.github.io/hts-specs/SAMv1.pdf
.
Often you'll see BAM files which are just compressed binary versions of SAM files. You can
view these alignment files using various tools, such as SAMtools, IGV or the UCSC Genome
browser.
As to the benefits, FASTA/FASTQ vs. SAM/BAM is comparing apples and oranges. I do a lot of
RNASeq work, so generally we take the FASTQ files and align them to a refseq using an aligner
such as STAR which outputs SAM/BAM files. There's a lot you can do with just these alignment
files, looking at expression, but usually I'll use a tool such as RSEM to "count" the reads
from various genes to create an expression matrix, samples as columns and genes as rows.
Whether you get FASTQ or FASTA files just depends on your sequencing platform. I've never
heard of anybody really using the quality scores.
Careful, the FASTQ format description is wrong: a FASTQ record can span more than four lines;
also, + isn't a placeholder, it's a separator between the sequence and the
quality score, with an optional repetition of the record ID following it. Finally, the
quality score string has to be the same length as the sequence. – Konrad Rudolph
Jun 2 '17 at 10:47
I'm having an issue with writing a Perl script to read a binary file.
My code is as follows, whereby the $file are files in binary format. I tried to search through the
web and apply what I found in my code, and tried to print it out, but it doesn't seem to work well.
Currently it only prints the '&&&&&&&&&&&' and 'ppppppppppp' markers, but what I really want is for it
to print out each $line , so that I can do some other post processing later. Also, I'm not quite sure
what $data is, as I see it is part of the code from a sample in an article, which states it is
supposed to be a scalar. I need somebody who can pinpoint where the error is in my code. Below is
what I did.
my $tmp = "$basedir/$key";
opendir (TEMP1, "$tmp");
my @dirs = readdir(TEMP1);
closedir(TEMP1);
foreach my $dirs (@dirs) {
next if ($dirs eq "." || $dirs eq "..");
print "---->$dirs\n";
my $d = "$basedir/$key/$dirs";
if (-d "$d") {
opendir (TEMP2, $d) || die $!;
my @files = readdir (TEMP2); # This should read binary files
closedir (TEMP2);
#my $buffer = "";
#opendir (FILE, $d) || die $!;
#binmode (FILE);
#my @files = readdir (FILE, $buffer, 169108570);
#closedir (FILE);
foreach my $file (@files) {
next if ($file eq "." || $file eq "..");
my $f = "$d/$file";
print "==>$file\n";
open FILE, $file || die $!;
binmode FILE;
foreach ($line = read (FILE, $data, 169108570)) {
print "&&&&&&&&&&&$line\n";
print "ppppppppppp$data\n";
}
close FILE;
}
}
}
I have altered my code so that it goes as below. Now I can read $data . Thanks J-16 SDiZ for
pointing that out. I'm trying to push the info I got from the binary file into an array called
"@array", intending to grep the array for any string that matches "p04", but it fails. Can someone
point out where the error is?
my $tmp = "$basedir/$key";
opendir (TEMP1, "$tmp");
my @dirs = readdir (TEMP1);
closedir (TEMP1);
foreach my $dirs (@dirs) {
next if ($dirs eq "." || $dirs eq "..");
print "---->$dirs\n";
my $d = "$basedir/$key/$dirs";
if (-d "$d") {
opendir (TEMP2, $d) || die $!;
my @files = readdir (TEMP2); #This should read binary files
closedir (TEMP2);
foreach my $file (@files) {
next if ($file eq "." || $file eq "..");
my $f = "$d/$file";
print "==>$file\n";
open FILE, $file || die $!;
binmode FILE;
foreach ($line = read (FILE, $data, 169108570)) {
print "&&&&&&&&&&&$line\n";
print "ppppppppppp$data\n";
push @array, $data;
}
close FILE;
}
}
}
foreach $item (@array) {
#print "==>$item<==\n"; # It prints out content of binary file without the ==> and <== if I uncomment this.. weird!
if ($item =~ /p04(.*)/) {
print "=>$item<===============\n"; # It prints "=><===============" according to the number of binary file I have. This is wrong that I aspect it to print the content of each binary file instead :(
next if ($item !~ /^w+/);
open (LOG, ">log") or die $!;
#print LOG $item;
close LOG;
}
}
Again, I changed my code as follows, but it still doesn't work, as it is not able to grep the "p04"
correctly, judging by the "log" file. It grepped the whole file, including binary, like this:
"@^@^@^@^G^D^@^@^@^^@p04bbhi06^@^^@^@^@^@^@^@^@^@hh^R^@^@^@^^@^@^@p04lohhj09^@^@^@^^@@" . What I'm
expecting is for it to grep anything with p04 only, such as p04bbhi06 and p04lohhj09. Here is
how my code goes:
foreach my $file (@files) {
next if ($file eq "." || $file eq "..");
my $f = "$d/$file";
print "==>$file\n";
open FILE, $f || die $!;
binmode FILE;
my @lines = <FILE>;
close FILE;
foreach $cell (@lines) {
if ($cell =~ /b12/) {
push @array, $cell;
}
}
}
#my @matches = grep /p04/, @lines;
#foreach $item (@matches) {
foreach $item (@array) {
#print "-->$item<--";
open (LOG, ">log") or die $!;
print LOG $item;
close LOG;
}
There is no such thing as 'binary format'. Please be more precise.
What format are the files in? What characteristics do they have that cause you to call them 'in
binary format'?
–
reinierpost
Jan 30 '12 at 13:00
It is in .gds format. This file can be read in Unix with the strings
command. It was readable in my Perl script, but I am not able to grep the data I wanted (p04* here
in my code).
–
Grace
Jan 31 '12 at 6:56
As already suggested, use File::Find or something to get your list of
files. For the rest, what do you really want? Output the whole file content if you found a match?
Or just the parts that match? And what do you want to match?
p04(.*)
matches
anything from "p04" up to the next newline. You then have that "anything" in
$1
.
Leave out all the clumsy directory stuff and concentrate first on what you want out of a single
file. How big are the files? You are only reading the first 170MB. And you keep overwriting the
"log" file, so it only contains the last item from the last file.
–
mivk
Nov 19 '13 at 13:16
@reinierpost the OP, by "binary file", probably means the opposite of a text file - e.g. the same
thing as in the perldoc's -X documentation; see the -B explanation. (cite: -B File is a "binary"
file (opposite of -T).)
–
jm666
May 12 '15 at 6:44
The data is in $data ; and $line is the number of bytes read.
my $f = "$d/$file" ;
print "==>$file\n" ;
open FILE, $file || die $! ;
I guess the full path is in $f , but you are opening $file . (In my
testing -- even $f is not the full path, but I guess you may have some other glue code...)
If you just want to walk all the files in a directory, try File::DirWalk or File::Find .
Hi J-16 SDiZ, thanks for the reply. Each of the $file is in binary
format, and what I want to do is to read each of the files to grep some information in readable
format and dump it into another file (which I consider here as post processing). I want to perform
something like "strings <filename> | grep <text syntax>" as in Unix, whereby the <filename> is
the $file here in my code. My problem here is that I cannot read the binary file so that I can proceed
with the other stuff. Thanks.
–
Grace
Jan 19 '12 at 2:34
Hi Dinanoid, thanks for your answer. I tried it, but it didn't work well
for me. I tried to edit my code as above (my own code, and it didn't work). I also tried the
code below as you suggested; it didn't work for me either. Can you point out where I went
wrong? Thanks.
–
Grace
Jan 30 '12 at 4:30
I'm not sure I'll be able to answer the OP question exactly, but here are some notes that may be
related. (edit: this is the same approach as answer by @Dimanoid, but with more detail)
Say you have a file which is a mix of ASCII data and binary. Here is an example in a
bash terminal:
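The terminal session is not preserved in this excerpt; a session consistent with the description below would be (file name as in the text, outputs omitted):
$ echo -e 'aa aa\x00\nbb bb' > tester.txt
$ du -b tester.txt        # reports 13 bytes
$ hexdump -C tester.txt   # shows the embedded 00 byte and the trailing 0a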
Note that byte 00 (specified as \x00 ) is a non-printable character (and in C , it also
means "end of a string") - thereby, its presence makes tester.txt a binary file. The file has a
size of 13 bytes as seen by du , because of the trailing \n added by the echo (as can be
seen from hexdump ).
Now, let's see what happens when we try to read it with
perl
's
<>
diamond operator (see also
What's the use of <>
in perl?
):
$ perl -e '
open IN, "<./tester.txt";
binmode(IN);
$data = <IN>; # does this slurp entire file in one go?
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'
length is: 7
data is: --aa aa
--
Clearly, the entire file didn't get slurped - it broke at the line end \n (and not at the
binary \x00 ). That is because the diamond filehandle <FH> operator is actually a shortcut
for readline (see Perl Cookbook: Chapter 8, File Contents )
The same link tells us that one should undef the input record separator, $/ (which by default
is set to \n ), in order to slurp the entire file. You may want to have this change be only
local, which is why the braces and local are used instead of undef (see Perl Idioms
Explained - my $string = do { local $/; }; ); so we have:
$ perl -e '
open IN, "<./tester.txt";
print "_$/_\n"; # check if $/ is \n
binmode(IN);
{
local $/; # undef $/; is global
$data = <IN>; # this should slurp one go now
};
print "_$/_\n"; # check again if $/ is \n
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'
_
_
_
_
length is: 13
data is: --aa aa
bb bb
--
... and now we can see the file is slurped in its entirety.
Since binary data implies unprintable characters, you may want to inspect the actual contents of
$data by printing via sprintf or pack / unpack instead.
More and more tar archives use the
xz format based on
LZMA2 for compression instead of the traditional bzip2(bz2) compression. In fact
kernel.org made a late " Good-bye bzip2 " announcement, 27th Dec.
2013 , indicating kernel sources would from this point on be released in both tar.gz and
tar.xz format - and on the main page of the website what's directly offered is in tar.xz .
Are there any specific reasons explaining why this is happening and what is the relevance
of gzip in this
context?
For distributing archives over the Internet, the following things are generally a priority:
Compression ratio (i.e., how small the compressor makes the data);
Decompression time (CPU requirements);
Decompression memory requirements; and
Compatibility (how wide-spread the decompression program is)
Compression memory & CPU requirements aren't very important, because you can use a
large fast machine for that, and you only have to do it once.
Compared to bzip2, xz has a better compression ratio and lower (better) decompression
time. It, however -- at the compression settings typically used -- requires more memory to
decompress [1] and is somewhat less widespread. Gzip uses less memory than
either.
So, both gzip and xz format archives are posted, allowing you to pick:
Need to decompress on a machine with very limited memory (<32 MB): gzip.
Given, not very likely when talking about kernel sources.
Need to decompress with only minimal tools available: gzip
Want to save download time and/or bandwidth: xz
There isn't really a realistic combination of factors that'd get you to pick bzip2. So it's
being phased out.
I looked at compression comparisons in
a blog post . I didn't attempt to replicate the results, and I suspect some of it has
changed (mostly, I expect xz has improved, as it's the newest.)
(There are some specific scenarios where a good bzip2 implementation may be preferable to
xz: bzip2 can compress a file with lots of zeros and genome DNA sequences better than xz.
Newer versions of xz now have an (optional) block mode which allows data recovery after the
point of corruption and parallel compression and [in theory] decompression. Previously, only
bzip2 offered these. [2] However none of these are relevant for kernel
distribution)
1: In archive size, xz -3 is around bzip -9 . Then xz uses less
memory to decompress. But xz -9 (as, e.g., used for Linux kernel tarballs) uses
much more than bzip -9 . (And even xz -0 needs more than gzip
-9 ).
First of all, this question is not directly related to tar . Tar just creates an
uncompressed archive, the compression is then applied later on.
Gzip is known to be relatively fast when compared to LZMA2 and bzip2. If speed matters,
gzip (especially the multithreaded implementation pigz ) is often a good compromise between
compression speed and compression ratio. Although there are alternatives if speed is an issue
(e.g. LZ4).
However, if a high compression ratio is desired LZMA2 beats bzip2 in almost
every aspect. The compression speed is often slower, but it decompresses much faster and
provides a much better compression ratio at the cost of higher memory usage.
There is not much reason to use bzip2 any more, except of backwards
compatibility. Furthermore, LZMA2 was designed with multithreading in mind and many
implementations by default make use of multicore CPUs (unfortunately xz on Linux
does not do this, yet). This makes sense since the clock speeds won't increase any more but
the number of cores will.
There are multithreaded bzip2 implementations (e.g. pbzip ), but they are often not installed by
default. Also note that multithreaded bzip2 only really pays off while
compressing, whereas decompression uses a single thread if the file was compressed
using a single-threaded bzip2 , in contrast to LZMA2. Parallel
bzip2 variants can only leverage multicore CPUs if the file was compressed using
a parallel bzip2 version, which is often not the case.
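A sketch of the multithreaded variants mentioned above (directory and file names are illustrative; the -T option requires xz 5.2 or newer):
# parallel compression of a directory tree
tar -cf - mydir | pigz -9 > mydir.tar.gz      # parallel gzip
tar -cf - mydir | pbzip2 -c > mydir.tar.bz2   # parallel bzip2 (pays off for compression only)
tar -cf - mydir | xz -T0 > mydir.tar.xz       # parallel xz; -T0 uses all available cores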
Short answer: xz is more efficient in terms of compression ratio, so it saves disk space and
optimizes the transfer through the network.
You can look at this
Quick Benchmark to see the difference in practical tests.
LZMA2 is a block compression system whereas gzip is not. This means that LZMA2 lends itself
to multi-threading. Also, if corruption occurs in an archive, you can generally recover data
from subsequent blocks with LZMA2 but you cannot do this with gzip. In practice, you lose the
entire archive with gzip subsequent to the corrupted block. With an LZMA2 archive, you only
lose the file(s) affected by the corrupted block(s). This can be important in larger archives
with multiple files.
I have a 32-bit attribute in which each bit is responsible for a specific piece of functionality. The Perl
script I'm writing should turn on the 4th bit while preserving the previous settings of the other bits.
I use this in my program:
sub BitOperationOnAttr
{
    my $a = "";
    MyGetFunc( $a );      # read the current attribute value into $a
    $a |= 0x00000008;     # set bit 3 (the "4th bit"), leaving all other bits untouched
    MySetFunc( $a );      # write the updated value back
}
** MyGetFunc/MySetFunc are my own functions that read/write the value.
Questions:
Is the usage of $a |= 0x00000008; right?
How do I extract a hex value with a regular expression from a string I have? For example:
Your questions are not related; they should be posted separately. That makes it easier for
other people with similar questions to find them. – Michael CarmanJan
12 '11 at 16:13
I upvoted, but there is something very important missing: vec operates on a
string! If we use a number; say: $val=5; printf("b%08b",$val); (this gives
b00000101 ) -- then one can see that the "check bit" syntax, say:
for($ix=7;$ix>=0;$ix--) { print vec($val, $ix, 1); }; print "\n"; will not
work (it gives 00110101 , which is not the same number). The correct approach is to
convert the number to an ASCII char, i.e. print vec(sprintf("%c", $val), $ix, 1); .
– sdaauJul
15 '14 at 5:01
"... Purpose: I'd like to compress partition images, so filling unused space with zeros is highly recommended. ..."
"... Such an utility is zerofree . ..."
"... Be careful - I lost ext4 filesystem using zerofree on Astralinux (Debian based) ..."
"... If the "disk" your filesystem is on is thin provisioned (e.g. a modern SSD supporting TRIM, a VM file whose format supports sparseness etc.) and your kernel says the block device understands it, you can use e2fsck -E discard src_fs to discard unused space (requires e2fsprogs 1.42.2 or higher). ..."
"... If you have e2fsprogs 1.42.9, then you can use e2image to create the partition image without the free space in the first place, so you can skip the zeroing step. ..."
Two different kinds of answer are possible. What are you trying to achieve? Either 1)
security, by forbidding someone from reading those data, or 2) optimizing compression of
the whole partition or [SSD performance](https://en.wikipedia.org/wiki/Trim_(computing))?
– Totor
Jan 5 '14 at 2:57
Zerofree finds the unallocated, non-zeroed blocks in an ext2 or ext3 file-system and
fills them with zeroes. This is useful if the device on which this file-system resides is a
disk image. In this case, depending on the type of disk image, a secondary utility may be
able to reduce the size of the disk image after zerofree has been run. Zerofree requires
the file-system to be unmounted or mounted read-only.
The usual way to achieve the same result (zeroing the unused blocks) is to run dd to
create a file full of zeroes that takes up the entire free space on the drive, and then
delete this file (a sketch of this approach follows the list below). This has many disadvantages, which zerofree alleviates:
it is slow
it makes the disk image (temporarily) grow to its maximal extent
it (temporarily) uses all free space on the disk, so other concurrent write actions
may fail.
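For reference, the dd-based approach described above amounts to something like this (the mount point is a placeholder):
# fill the free space with zeros, then delete the filler file
dd if=/dev/zero of=/mountpoint/zero.fill bs=1M   # stops with "No space left on device"
sync
rm /mountpoint/zero.fill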
Zerofree has been written to be run from GNU/Linux systems installed as guest OSes
inside a virtual machine. If this is not your case, you almost certainly don't need this
package.
UPDATE #1
The description of the .deb package contains the following paragraph now which would imply
this will work fine with ext4 too.
Description: zero free blocks from ext2, ext3 and ext4 file-systems Zerofree finds the
unallocated blocks with non-zero value content in an ext2, ext3 or ext4 file-system and
fills them with zeroes...
@GrzegorzWierzowiecki: yes, that is the page, but for Debian and friends it is already in the
repos. I used it on an ext4 partition on a virtual disk to successfully shrink the disk file
image, and had no problem. – enzotib
Jul 29 '12 at 14:12
zerofree page talks about a patch that
lets you do "filesystem is mounted with the zerofree option" so that it always zeros out
deleted files continuously. does this require recompiling the kernel then? is there an easier
way to accomplish the same thing? – endolith
Oct 14 '16 at 16:33
Summary of the methods (as mentioned in this question and elsewhere) to clear unused space on
ext2/ext3/ext4:
Zeroing unused space
File system is not mounted
If the "disk" your filesystem is on is thin provisioned (e.g. a modern SSD supporting
TRIM, a VM file whose format supports sparseness etc.) and your kernel says the block
device understands it, you can use e2fsck -E discard src_fs to discard unused
space (requires e2fsprogs 1.42.2 or higher).
Using zerofree to
explicitly write zeros over unused blocks.
Using e2image -rap src_fs dest_fs to only copy blocks in use (new
filesystem should be on an otherwise zero'd "disk", requires e2fsprogs 1.42.9 or
higher).
File system is mounted
If the "disk" your filesystem is on is thin provisioned (e.g. a modern SSD supporting
TRIM, a VM file whose format supports sparseness etc.), your kernel says the block device
understands it and finally the ext filesystem driver supports it you can use fstrim
/mnt/fs/ to ask the filesystem to discard unused space.
Using cat /dev/zero > /mnt/fs/zeros; sync; rm /mnt/fs/zeros (
sfill from secure-delete uses this technique). This method is inefficient, not
recommended by Ted Ts'o (author of ext4), may not zero certain things and can slow down
future fscks.
Having the filesystem unmounted will give better results than having it mounted.
Discarding tends to be the fastest method when a lot of previously used space needs to be
zeroed but using zerofree after the discard process can sometimes zero a little bit extra
(depending on how discard is implemented on the "disk").
Making the image file smaller
Image is in a dedicated VM format
You will need to use an appropriate disk image tool (such as qemu-img convert
src_image dst_image ) to enable the zeroed space to be reclaimed and to allow the file
representing the image to become smaller.
Image is a raw file
One of the following techniques can be used to make the file sparse (so runs of zero stop
taking up space):
cp --sparse=always src_image dst_image .
fallocate -d src_image (requires util-linux v2.25 or higher).
These days it might be easier to use a tool like virt-sparsify to do these steps and more in
one go.
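Putting the pieces together, a possible sequence for a raw disk image looks roughly like this (device and file names are illustrative, the image is assumed to contain a single ext4 partition, and the filesystem must stay unmounted while zerofree runs):
losetup -Pf --show disk.img      # attach the image; prints the loop device, e.g. /dev/loop0
zerofree /dev/loop0p1            # zero the unused ext2/3/4 blocks
losetup -d /dev/loop0            # detach the image
fallocate -d disk.img            # punch holes where the zeros are (util-linux >= 2.25)

# for a VM-specific format such as qcow2, convert instead so the zeroed space is dropped
qemu-img convert -O qcow2 old.qcow2 new.qcow2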
sfill from secure-delete can
do this and several other related jobs.
e.g.
sfill -l -l -z /mnt/X
UPDATE #1
There is a source tree that appears to be used by the ArchLinux project on github that
contains the source for sfill which is a tool included in the package
Secure-Delete.
that URL is obsolete. no idea where its home page is now (or even if it still has one), but
it's packaged for debian and ubuntu. probably other distros too. if you need source code,
that can be found in the debian archives if you can't find it anywhere else. –
cas
Jul 29 '12 at 12:04
If you have e2fsprogs 1.42.9, then you can use e2image to create the partition
image without the free space in the first place, so you can skip the zeroing step.
I'm using tar to make daily backups of a server and want to avoid backup of
/proc and /sys system directories, but without excluding any directories named "proc" or
"sys" somewhere else in the file tree.
For example, having the following directory tree ("bla" being normal
files):
Not sure I understand your question. There is a --exclude option, but I don't
know how to match it for single, absolute file names (not any file by that name) - see my
examples above. – Udo G
May 9 '12 at 7:21
True, but the important detail about this is that the excluded file name must match exactly
the notation reported by the tar listing. For my example that would be
./sys - as I just found out now. – Udo G
May 9 '12 at 7:34
This did the trick for me, thank you! I wanted to exclude a specific directory, not all
directories/subdirectories matching the pattern. bsdtar does not have "--anchored" option
though, and with bsdtar we can use full paths to exclude specific folders. – Savvas Radevic
May 9 '13 at 10:44
ah found it! in bsdtar the anchor is "^": bsdtar cvjf test.tar.bz2 --exclude myfile.avi
--exclude "^myexcludedfolder" * – Savvas Radevic
May 9 '13 at 10:58
"... Trailing slashes at the end of excluded folders will cause tar to not exclude those folders at all ..."
"... I had to remove the single quotation marks in order to exclude sucessfully the directories ..."
"... Exclude files using tags by placing a tag file in any directory that should be skipped ..."
"... Nice and clear thank you. For me the issue was that other answers include absolute or relative paths. But all you have to do is add the name of the folder you want to exclude. ..."
"... Adding a wildcard after the excluded directory will exclude the files but preserve the directories: ..."
"... You can use cpio(1) to create tar files. cpio takes the files to archive on stdin, so if you've already figured out the find command you want to use to select the files the archive, pipe it into cpio to create the tar file: ..."
Is there a simple shell command/script that supports excluding certain files/folders from
being archived?
I have a directory that need to be archived with a sub directory that has a number of very
large files I do not need to backup.
Not quite solutions:
The tar --exclude=PATTERN command matches the given pattern and excludes
those files, but I need specific files & folders to be ignored (full file path),
otherwise valid files might be excluded.
I could also use the find command to create a list of files, exclude the ones I don't
want to archive, and pass the list to tar, but that only works for a small number of
files. I have tens of thousands.
I'm beginning to think the only solution is to create a file with a list of files/folders
to be excluded, then use rsync with --exclude-from=file to copy all the files to
a tmp directory, and then use tar to archive that directory.
Can anybody think of a better/more efficient solution?
EDIT: cma 's solution works well. The big gotcha is that the
--exclude='./folder' MUST be at the beginning of the tar command. Full command
(cd first, so backup is relative to that directory):
cd /folder_to_backup
tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz .
I had to remove the single quotation marks in order to successfully exclude the directories. (
tar -zcvf gatling-charts-highcharts-1.4.6.tar.gz /opt/gatling-charts-highcharts-1.4.6
--exclude=results --exclude=target ) – Brice
Jun 24 '14 at 16:06
I came up with the following command: tar -zcv --exclude='file1' --exclude='pattern*'
--exclude='file2' -f /backup/filename.tgz . note that the -f flag needs
to precede the tar file see:
A "/" on the end of the exclude directory will cause it to fail. I guess tar thinks an ending
/ is part of the directory name to exclude. BAD: --exclude=mydir/ GOOD: --exclude=mydir
– flickerfly
Aug 21 '15 at 16:22
> Make sure to put --exclude before the source and destination items. OR use an absolute
path for the exclude: tar -cvpzf backups/target.tar.gz --exclude='/home/username/backups'
/home/username – NightKnight on
Cloudinsidr.com
Nov 24 '16 at 9:55
This answer definitely helped me! The gotcha for me was that my command looked something like
tar -czvf mysite.tar.gz mysite --exclude='./mysite/file3'
--exclude='./mysite/folder3' , and this didn't exclude anything. – Anish Ramaswamy
May 16 '15 at 0:11
Nice and clear thank you. For me the issue was that other answers include absolute or
relative paths. But all you have to do is add the name of the folder you want to exclude.
– Hubert
Feb 22 '17 at 7:38
Just want to add to the above that it is important that the directory to be excluded should
NOT contain a final slash. So, --exclude='/path/to/exclude/dir' is CORRECT
, --exclude='/path/to/exclude/dir/' is WRONG . – GeertVc
Dec 31 '13 at 13:35
I believe this requires that the Bash shell option globstar be
enabled. Check with shopt -s globstar . I think it is off by default on most
Unix-based OSes. From the Bash manual: " globstar: If set, the pattern **
used in a filename expansion context will match all files and zero or more directories and
subdirectories. If the pattern is followed by a '/', only directories and subdirectories
match. " – not2qubit
Apr 4 at 3:24
Use the find command in conjunction with the tar append (-r) option. This way you can add
files to an existing tar in a single step, instead of a two pass solution (create list of
files, create tar).
To avoid possible 'xargs: Argument list too long' errors due to the use of
find ... | xargs ... when processing tens of thousands of files, you can pipe
the output of find directly to tar using find ... -print0 |
tar --null ... .
# archive a given directory, but exclude various files & directories
# specified by their full file paths
find "$(pwd -P)" -type d \( -path '/path/to/dir1' -or -path '/path/to/dir2' \) -prune \
-or -not \( -path '/path/to/file1' -or -path '/path/to/file2' \) -print0 |
gnutar --null --no-recursion -czf archive.tar.gz --files-from -
#bsdtar --null -n -czf archive.tar.gz -T -
$ tar --exclude='./folder_or_file' --exclude='file_pattern' --exclude='fileA'
A word of warning for a side effect that I did not find immediately obvious: The exclusion
of 'fileA' in this example will search for 'fileA' RECURSIVELY!
Example: a directory with a single subdirectory containing a file of the same name
(data.txt).
If using --exclude='data.txt' the archive will not contain EITHER data.txt
file. This can cause unexpected results if archiving third party libraries, such as a
node_modules directory.
To avoid this issue make sure to give the entire path, like
--exclude='./dirA/data.txt'
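A small reproduction of this gotcha (names are illustrative; run it from inside the directory being archived):
mkdir -p dirA
echo top    > data.txt
echo nested > dirA/data.txt

tar -czf bad.tar.gz  --exclude='data.txt' .        # drops BOTH copies of data.txt
tar -czf good.tar.gz --exclude='./dirA/data.txt' . # drops only the nested copy
tar -tzf good.tar.gz | grep data.txt               # ./data.txt is still in the archive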
You can use cpio(1) to create tar files. cpio takes the files to archive on stdin, so if
you've already figured out the find command you want to use to select the files to archive,
pipe it into cpio to create the tar file:
That can cause tar to be invoked multiple times - and will also pack files repeatedly.
Correct is: find / -print0 | tar -T- --null --no-recursion -cjf tarfile.tar.bz2
– jørgensen
Mar 4 '12 at 15:23
I read somewhere that when using xargs , one should use the tar r
option instead of c , because when find actually finds loads of
results, xargs will split those results (based on the local command-line argument limit)
into chunks and invoke tar on each part. This will result in an archive containing only the last
chunk returned by xargs and not all results found by the find
command. – Stphane
Dec 19 '15 at 11:10
In GNU tar v1.26 the --exclude needs to come after the archive file and backup directory arguments,
should have no leading or trailing slashes, and prefers no quotes (single or double). So,
relative to the PARENT directory to be backed up, it's:
tar cvfz /path_to/mytar.tgz ./dir_to_backup
--exclude=some_path/to_exclude
tar -cvzf destination_folder source_folder -X /home/folder/excludes.txt
-X indicates a file which contains a list of filenames which must be excluded from the
backup. For instance, you can specify *~ in this file to exclude any filenames ending
with ~ from the backup.
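Here is a sketch of what such an exclude file might look like and how it is used (names are illustrative):
cat > excludes.txt <<'EOF'
*~
./cache
./logs
EOF
tar -cvzf backup.tar.gz -X excludes.txt .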
Possible redundant answer but since I found it useful, here it is:
As FreeBSD root (i.e. using csh) I wanted to copy my whole root filesystem to /mnt
but without /usr and (obviously) /mnt. This is what worked (I am at /):
tar --exclude ./usr --exclude ./mnt --create --file - . | (cd /mnt && tar xvf -)
My whole point is that it was necessary (by putting the ./ ) to specify to tar
that the excluded directories were part of the greater directory being copied.
I had no luck getting tar to exclude a 5 Gigabyte subdirectory a few levels deep. In the end,
I just used the unix Zip command. It worked a lot easier for me.
So for this particular example from the original post
(tar --exclude='./folder' --exclude='./upload/folder2' -zcvf /backup/filename.tgz . )
The equivalent would be:
zip -r /backup/filename.zip . -x upload/folder/**\* upload/folder2/**\*
The following bash script should do the trick. It uses the answer given
here by Marcus Sundman.
#!/bin/bash
echo -n "Please enter the name of the tar file you wish to create with out extension "
read nam
echo -n "Please enter the path to the directories to tar "
read pathin
echo tar -czvf $nam.tar.gz
excludes=`find $pathin -iname "*.CC" -exec echo "--exclude \'{}\'" \;|xargs`
echo $pathin
echo tar -czvf $nam.tar.gz $excludes $pathin
This will print out the command you need and you can just copy and paste it back in. There
is probably a more elegant way to provide it directly to the command line.
Just change *.CC for any other common extension, file name or regex you want to exclude
and this should still work.
EDIT
Just to add a little explanation: find generates a list of files matching the chosen regex
(in this case *.CC). This list is passed via xargs to the echo command. This prints --exclude
'one entry from the list'. The backslashes (\) are escape characters for the ' marks.
Requiring interactive input is a poor design choice for most shell scripts. Make it read
command-line parameters instead and you get the benefit of the shell's tab completion,
history completion, history editing, etc. – tripleee
Sep 14 '17 at 4:27
Additionally, your script does not work for paths which contain whitespace or shell
metacharacters. You should basically always put variables in double quotes unless you
specifically require the shell to perform whitespace tokenization and wildcard expansion. For
details, please see stackoverflow.com/questions/10067266/
– tripleee
Sep 14 '17 at 4:38
For those who have issues with it, some versions of tar would only work properly without the
'./' in the exclude value.
tar --version
tar (GNU tar) 1.27.1
Command syntax that work:
tar -czvf ../allfiles-butsome.tar.gz * --exclude=acme/foo
These will not work:
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude=./acme/foo
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude='./acme/foo'
$ tar --exclude=./acme/foo -czvf ../allfiles-butsome.tar.gz *
$ tar --exclude='./acme/foo' -czvf ../allfiles-butsome.tar.gz *
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude=/full/path/acme/foo
$ tar -czvf ../allfiles-butsome.tar.gz * --exclude='/full/path/acme/foo'
$ tar --exclude=/full/path/acme/foo -czvf ../allfiles-butsome.tar.gz *
$ tar --exclude='/full/path/acme/foo' -czvf ../allfiles-butsome.tar.gz *
gzrecover does not come installed on Mac OS. However, Liudvikas Bukys's method worked fine.
I had tcpdump piped into gzip, killed it with Control-C, and got an unexpected EOF when trying to
decompress the resulting file. – George
Mar 1 '14 at 18:57
Recovery is possible but it depends on what caused the corruption.
If the file is just truncated, getting some partial result out is not too hard; just
run
gunzip < SMS.tar.gz > SMS.tar.partial
which will give some output despite the error at the end.
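Having salvaged the truncated stream, a follow-up step (a sketch) is to pull out whatever complete members survive; tar will report an unexpected EOF at the truncation point but keeps what it managed to read:
tar -tvf SMS.tar.partial   # list the members that survived
tar -xvf SMS.tar.partial   # extract them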
If the compressed file has large missing blocks, it's basically hopeless after the bad
block.
If the compressed file is systematically corrupted in small ways (e.g. transferring the
binary file in ASCII mode, which smashes carriage returns and newlines throughout the file),
it is possible to recover, but it requires quite a bit of custom programming; it's really only
worth it if you have absolutely no other recourse (no backups) and the data is worth a lot of
effort. (I have done it successfully.) I mentioned this scenario in a previous
question .
The answers for .zip files differ somewhat, since zip archives have multiple
separately-compressed members, so there's more hope (though most commercial tools are rather
bogus, they eliminate warnings by patching CRCs, not by recovering good data). But your
question was about a .tar.gz file, which is an archive with one big member.
Andrei • 8 years ago
I'm a programmer with 13+ years experience in web and desktop applications, and I use Python
on occasions (not for the language itself, but for some applications written in Python).
I don't recall seeing a programming language more hectic than Python. It looks like an
April Fools' joke that caught on with the public, like a hack that people just won't let
go of.
I spent a few days reading some books about it and I got pretty good with it, but in
many parts I consider it a hack. You want some examples, I assume:
- lists: add an item: append/insert method; get+remove an item: pop method (strange name
matching)
- lists: add an item: append/insert method; remove an item: del STATEMENT (that's right,
not a method, but a statement)
- lists: add an item: append/insert method; get the size: len FUNCTION (that's right, not a
method, but a standalone function)
- iterate over items: for dicts use the iteritems method, for lists use the enumerate
function
- and many others.
I like programming languages which are either brief, or well designed (so that you can
"guess" how API calls will look like before you read about them, by extrapolating your
previous experience in that language). Python is none of that. Moreover, IMHO its
widespread use in Linux's desktop apps is even hurting Linux's image as a good desktop
OS.
Now the good part: I like Python's whitespace-matters approach. Again, it has some
design problems, but it's nice.
Python3kFTW • 8 years ago
Honestly, anyone who accuses Python of lacking documentation or readability is entirely off
their tree. Python has:
- Docstrings (access them via help(object) or help() and type the name in)
-
docs.python.org , which uses the excellent Sphinx documentation tool
- Most modules come with a readme (although this is common to *just about every* language,
people often overlook it)
- Crunchy (
http://code.google.com/p/cr... which makes online tutorials 'come alive', so to speak,
by embedding an interpreter in the browser
Python even has its own version of CPAN, the Python Package Index (PyPI,
http://pypi.python.org/pypi) with an absolutely huge listing of packages.
And comparing Django and PHP is ridiculous as Django is a framework, PHP is not. This
probably explains "anonymous"'s comment.
The one point I have to agree on is that Python is *S.L.O.W*, although you can
increase its speed quite easily (Psyco, Cython, C extensions, Stackless Python (which is
an incredible implementation), etc.) and bring it up to scratch.
On whitespace: Yes, Python does use a HUGE amount of whitespace, but as "Just a
Developer" pointed out, it's not all necessary and does drastically increase the
readability of your code. Someone tell me this is not cryptic:
for(@x){s/(http:.*)/urlencode($1)/eg} and this is the problem you will always face with
people who don't know the language they are talking about (and I am no exception, I don't
know ANY Perl).
Guest Python3kFTW
• 6 years
ago
Docs.python.org is about the least helpful documentation I've ever read. Want
a simple example of something? Go elsewhere. If Sphinx is the tool responsible for the
awful result (the index doesn't scroll with the page), then Sphinx is _not_ an excellent
documentation tool. Where I do agree, though, is that Perl is truly, direly, appallingly
awful. Oh, and I do (or did) know Perl.
doc
• 8 years ago
i like perl's implicit arguments. they reflect what i actually do with my code... like, you
know, read from the list of arguments to the program. or, say, do something with a list
variable. try doing this in python and it will look like some reverse polish meltdown.
for(@x){s/(http:.*)/urlencode($1)/eg}
Kilroy • 8 years ago
Yes, Python is a very powerful language BUT it is also very slow!
And I am not comparing it with C or other compiled language or statically typed language.
Python is also slow compared to other dynamically typed interpretive languages.
It is okay for PHP and ASP to be slow because PHP and ASP do not claim to be powerful or
advanced languages and they are easy to learn. But Python claims to be an advanced
language. Advanced and yet so slow????
Doug • 9 years ago
I can write perl in my sleep. I'm just learning python. Somehow I found this. Python whitespace
is mostly a non issue. Not better. Not worse. Different. For myself, I like to put temporary
debug print statements flush left...guess I'll have to change that habit with Python. Ok,
that does suck. But in case you haven't figured it out by now, ALL PROGRAMMING LANGUAGES
SUCK(tm). Perl, Python. They suck equally, in different ways. Now get back to work.
Joe Krahn • 10 years ago
There is no explanation here as to why Python sucks. The main problem with Python is that
changes in whitespace can change the function. Having a language that enforces indentation is
OK, but Python depends on precise alignment across many lines of code. Also, an indentation
always counts as one indentation level, no matter what the size, but a dedent can be any
number of dedents, depending on the previous lines of code. It works OK for small bits of
code, but larger programs get really ugly. A folding or Python-aware text editor helps, but
it is still a language weakness. It makes about as much sense as Fortran77 indentation rules.
Perl has its own problems. Object-oriented features are a hack, and even well-formatted
code can look rather cryptic. The documentation is much better just because it has been
around much longer than Python. It is also much faster, probably for the same reason.
I would rather use something less cryptic than Perl, but without the fragile and
annoying indentation rules of Python. Ruby looks promising, but I have not used it much
yet.
Pvdmeer • 7 years ago
Python isn't fast. But I think it's fast enough. I've been a long-time user of Perl. Perl is
sloppy by default. Python isn't, and still it's possible to do a quick five-line experiment
without the pain. I think Python is a slow version of Perl (including the awesome amounts of
modules), but way more tidy and suited for developing somewhat bigger apps. For developing
off-line scientific tools, it's the weapon of choice (because of the speed of the numpy
vector operations). And if you hate a language for not being able to make a binary,
then you, sir, have never heard of package managers. Also, I made apps with Java, and the
development process was excruciatingly slow, same for C++ with its gazillion features which
all bear caveats. Python is a productive language. And btw, the Python documentation is
actually pretty good and widely available. Maybe in 2003 it wasn't, I don't know.
I use Python somewhat regularly, and overall I consider it to be a very good language.
Nonetheless, no language is perfect. Here are the drawbacks in order of importance to me
personally:
It's slow. I mean really, really slow. A lot of times this doesn't matter, but it
definitely means you'll need another language for those performance-critical bits.
Nested functions kind of suck in that you can't modify variables in the outer scope.
Edit: I still use Python 2 due to library support, and this design flaw irritates
the heck out of me, but apparently it's fixed in Python 3 due to the nonlocal
statement. Can't wait for the libs I use to be ported so this flaw can be sent to the ash
heap of history for good.
It's missing a few features that can be useful to library/generic code and IMHO are
simplicity taken to unhealthy extremes. The most important ones I can think of are
user-defined value types (I'm guessing these can be created with metaclass magic, but I've
never tried), and ref function parameter.
It's far from the metal. Need to write threading primitives or kernel code or
something? Good luck.
While I don't mind the lack of ability to catch semantic errors upfront as a
tradeoff for the dynamism that Python offers, I wish there were a way to catch syntactic
errors and silly things like mistyping variable names without having to actually run the
code.
The documentation isn't as good as languages like PHP and Java that have strong
corporate backings.
And of course also why it sucks a lot less than any other language. But it's not perfect. My
personal problems with python:
It's dict and not Dict, it's list and not List and you cannot subclass them without
overriding every method.......
it's copy.copy and not just copy. Why in god's name is that an import?
clean up the stdlib
there is BaseHTTPServer and not py.http.baseserver or something like that. Why is that
darn stdlib flat?
there are soo many bad libraries in the stdlib and so many good ones not...
no goddamn styleguide for the stdlib. it's UnitTest.assertEqual and not
UnitTest.assert_equal like PEP 8 proposes
By now you cannot reassign a variable from an outer scope. (there is a PEP!)
clean up the stdlib
assignments are not expressions. gaaaaa. I want to do (foo |= []).append(42) and
not foo |= []; foo.append(42) etc.
No regexp literal and match objects are not instances of a Regexp class. Move the sre
module into the core, add a @/foo/ literal and create a Regexp class instead of
something like _sre.SRE_Pattern which you cannot import to make isinstance tests
missing blocks. darn. i want blocks
unify unicode and string. quick! (waiting for Python 3000)
clean up the stdlib
Why does it still suck less? Good question. Probably because the metaprogramming capabilities
are great, the libraries are awesome, indentation-based syntax is hip, first-class functions,
it's quite fast, there are many bindings (PyGTK FTW!) and the community is nice and friendly. And there is
WSGI!
But I'd rather have a tool that combined the above with 'git blame', allowing me to browse
the source of a file as it changes in time... – Egon Willighagen
Apr 6 '10 at 15:50
I was also looking for the history of files that were previously renamed and found this
thread first. The solution is to use "git log --follow <filename>" as Phil pointed out
here . – Florian Gutmann
Apr 26 '11 at 9:05
The author was looking for a command line tool. While gitk comes with GIT, it's neither a
command line app nor a particularly good GUI. – mikemaccana
Jul 18 '11 at 15:17
This is great. gitk does not behave well when specifying paths that do not exist anymore. I
used git log -p -- path . – Paulo Casaretto
Feb 27 '13 at 18:05
Plus gitk looks like it was built by the boogie monster. This is a great answer and is best
tailored to the original question. – ghayes
Jul 21 '13 at 19:28
This will show the entire history of the file (including history beyond renames and with
diffs for each change).
In other words, if the file named bar was once named foo , then
git log -p bar (without the --follow option) will only show the
file's history up to the point where it was renamed -- it won't show the file's history when
it was known as foo . Using git log --follow -p bar will show the
file's entire history, including any changes to the file when it was known as
foo . The -p option ensures that diffs are included for each
change.
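A small illustrative sequence showing the difference (the file names are made up):
# given a tracked file named foo
git mv foo bar
git commit -m "rename foo to bar"

git log -p bar             # history stops at the rename commit
git log --follow -p bar    # full history, including commits made while the file was still "foo"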
I agree this is the REAL answer. (1.) --follow ensures that you see file renames
(2.) -p ensures that you see how the file gets changed (3.) it is command line
only. – Trevor Boyd Smith
Sep 11 '12 at 18:54
@Benjohn The -- option tells Git that it has reached the end of the options and
that anything that follows -- should be treated as an argument. For git
log this only makes any difference if you have a path name that begins with a
dash . Say you wanted to know the history of a file that has the unfortunate name
"--follow": git log --follow -p -- --follow – Dan Moulding
May 28 '15 at 16:10
@Benjohn: Normally, the -- is useful because it can also guard against any
revision names that match the filename you've entered, which can actually be
scary. For example: If you had both a branch and a file named foo , git
log -p foo would show the git log history up to foo , not the history for
the file foo . But @DanMoulding is right that since the
--follow command only takes a single filename as its argument, this is less
necessary since it can't be a revision . I just learned that. Maybe you were
right to leave it out of your answer then; I'm not sure. – NHDaly
May 30 '15 at 6:03
Excellent text-based tool, great answer. I freaked out when I saw the dependencies for gitk
installing on my headless server. Would upvote again A+++ – Tom McKenzie
Oct 24 '12 at 5:28
You can also see when a specific line of code inside a file was changed with git
blame filename . This will print out a short commit id, the author, timestamp, and
complete line of code for every line in the file. This is very useful after you've found a
bug and you want to know when it was introduced (or whose fault it was).
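For example (the file name and line range are illustrative):
git blame filename           # annotate every line with commit, author and date
git blame -L 10,20 filename  # restrict the annotation to lines 10-20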
If you use SourceTree to visualize your repository (it's free and quite good) you can
right click a file and select Log Selected
The display (below) is much friendlier than gitk and most of the other options listed.
Unfortunately (at this time) there is no easy way to launch this view from the command line
-- SourceTree's CLI currently just opens repos.
but unless i'm mistaken (please let me know!), one can only compare two versions at a time in
the gui? Are there any clients which have an elegant interface for diffing several different
versions at once? Possibly with a zoom-out view like in Sublime Text? That would be really
useful I think. – Sam Lewallen
Jun 30 '15 at 6:16
@SamLewallen If I understand correctly you want to compare three different commits? This
sounds similar to a three-way merge (mine, yours, base) -- usually this strategy is used for
resolving merge conflicts not necessarily comparing three arbitrary commits. There are many
tools that support three way merges
stackoverflow.com/questions/10998728/ but the trick is feeding these tools the specific
revisions gitready.com/intermediate/2009/02/27/
– Mark
Fox
Jun 30 '15 at 18:47
You save my life. You can use gitk to find the SHA1 hash, and then
open SourceTree to enter Log Selected.. based on the found
SHA1 . – AechoLiu
Jan 25 at 6:58
I think this is a great answer. Maybe you aren't getting upvoted as much because you suggest
other ways (IMHO better) to see the changes, i.e. via gitk and tig, in addition to git. –
PopcornKing
Feb 25 '13 at 17:11
Just to add to the answer: locate the path (in git terms, the part of it that still exists in the
repository). Then use the command stated above, "git log --follow --all -p
<folder_path/file_path>". It may be the case that the file/folder has been
removed over the history, so locate the longest path that still exists and try to fetch
its history. It works! – parasrish
Aug 16 '16 at 10:04
This has the benefit of both displaying the results in the command line (like git
log -p ) while also letting you step through each commit using the arrow keys (like
gitk ).
With the excellent Git
Extensions , you go to a point in the history where the file still existed (if it has
been deleted; otherwise just go to HEAD), switch to the File tree tab,
right-click on the file and choose File history .
By default, it follows the file through the renames, and the Blame tab allows
to see the name at a given revision.
It has some minor gotchas, like showing fatal: Not a valid object name in the
View tab when clicking on the deletion revision, but I can live with that.
:-)
If you're using the git GUI (on Windows) under the Repository menu you can use "Visualize
master's History". Highlight a commit in the top pane and a file in the lower right and
you'll see the diff for that commit in the lower left.
Well, OP didn't specify command line, and moving from SourceSafe (which is a GUI) it seemed
relevant to point out that you could do pretty much the same thing that you can do in VSS in
the Git GUI on Windows. – cori
Jul 22 '13 at 15:34
If you want to include local (unstaged) changes, I often run git diff
origin/master to show the complete differences between your local branch and the
master branch (which can be updated from remote via git fetch ) –
ghayes
Jul 21 '13 at 19:47
My mistake, it only happens in Eclipse, but in TortoiseGit you can see all revisions of a
file if unchecking "show all project" + checking "all branches" (in case the file was
committed on another branch, before it was merged to main branch). I'll update your answer.
– Noam
Manos
Dec 1 '15 at 12:06
If you are using eclipse with the git plugin, it has an excellent comparison view with
history. Right click the file and select "compare with"=> "history"
I have installed R before on a machine running RedHat EL6.5, but I recently had a problem
installing new packages (i.e. install.packages()). Since I couldn't find a solution to this,
I tried reinstalling R using:
sudo yum remove R
and
sudo yum install R
But now I get:
....
---> Package R-core-devel.x86_64 0:3.1.0-5.el6 will be installed
--> Processing Dependency: blas-devel >= 3.0 for package: R-core-devel-3.1.0-5.el6.x86_64
--> Processing Dependency: libicu-devel for package: R-core-devel-3.1.0-5.el6.x86_64
--> Processing Dependency: lapack-devel for package: R-core-devel-3.1.0-5.el6.x86_64
---> Package xz-devel.x86_64 0:4.999.9-0.3.beta.20091007git.el6 will be installed
--> Finished Dependency Resolution
Error: Package: R-core-devel-3.1.0-5.el6.x86_64 (epel)
Requires: blas-devel >= 3.0
Error: Package: R-core-devel-3.1.0-5.el6.x86_64 (epel)
Requires: lapack-devel
Error: Package: R-core-devel-3.1.0-5.el6.x86_64 (epel)
Requires: libicu-devel
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
I already checked, and blas-devel is installed, but the newest version is 0.2.8. Checked
using:
A cursory search of blas-devel in google shows that the latest version is at
least version 3.2. You probably used to have an older version of R installed, and the newer
version depends on a version of BLAS not available in RedHat? – Scott Ritchie
Jul 12 '14 at 0:31
sudo yum install lapack-devel does not work. Returns: No package
lapack-devel available. Scott - you are right that blas-devel is not available in yum.
What is the best way to fix this? – Jon
Jul 14 '14 at 4:08
I had the same issue. Not sure why these packages are missing from RHEL's repos, but they are
in CentOS 6.5, so the following solution works if you want to keep things within the package
paradigm:
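A sketch of that approach (run as root; the baseurl below is an assumption -- substitute a CentOS 6.5 mirror you trust -- and the temporary repo is removed again before installing R itself from the RHEL/EPEL repos):
# add a temporary CentOS repo just for the missing -devel packages
cat > /etc/yum.repos.d/centos-tmp.repo <<'EOF'
[centos-tmp]
name=CentOS 6.5 (temporary, for missing R build dependencies)
baseurl=http://vault.centos.org/6.5/os/x86_64/
gpgcheck=0
enabled=1
EOF
yum install blas-devel lapack-devel libicu-devel

# remove the temporary repo again, then install R from the usual repos
rm /etc/yum.repos.d/centos-tmp.repo
yum clean all
yum install R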
When installing texinfo-tex-5.1-4.el7.x86_64, it complains about requiring tex(epsd.tex),
but I've no idea which package supplies that. This is on RHEL7, obviously (and using CentOS7
packages). – DavidJ
Mar 23 '15 at 19:50
It was yum complaining. Adding the analogous CentOS repo to /etc/yum.repos.d temporarily,
then installing just the missing dependencies, then removing it and installing R fixed the
issue. It is apparently an issue/bug with the RHEL package dependencies. I had to be careful
to ensure that all other packages came from the RHEL repos, not CentOS; hence it is not a good idea
to install R itself while the CentOS repo is active. – DavidJ
Mar 25 '15 at 14:18
Glad you figured it out. When I stumbled on this last year I was also surprised that the
Centos repos seemed more complete than RHEL. – Owen
Mar 26 '15 at 4:49
RStudio Server v0.99 requires RedHat or CentOS version 5.4 (or higher) as well as an installation of R. You can install R for RedHat
and CentOS using the instructions on CRAN: https://cran.rstudio.com/bin/linux/redhat/README
.
RedHat/CentOS 6 and 7
To download and install RStudio Server open a terminal window and execute the commands corresponding to the 32 or 64-bit version
as appropriate.
As someone who's switched from Ruby to Python (because the latter is far easier to teach,
IMO) and who has also put significant time into learning R just to use ggplot2, I was really
surprised at the lack of relevant Google results for "switching from python to r" – or
similarly phrased queries. In fact, that particular query will bring up more results for R
to Python , e.g. "
Python Displacing R as The Programming Language For Data ". The use of R is so ubiquitous
in academia (and in the wild, ggplot2 tends to wow nearly on the same level as D3) that I had
just assumed there were a fair number of Python/Ruby developers who have tried jumping into R.
But there aren't many: minimaxir's guides are about the only
comprehensive how-to-do-R-as-written-by-an-outsider guides I've seen on the web.
By far, the most common shift seems to be that of Raschka's – going from R to
Python:
Well, I guess it's no big secret that I was an R person once. I even wrote a book about it.
So, how can I summarize my feelings about R? I am not exactly sure where this quote comes
from – I picked it up from someone somewhere some time ago – but it is great for
explaining the difference between R and Python: "R is a programming language developed by
statisticians for statisticians; Python was developed by a computer scientist, and it can be
used by programmers to apply statistical techniques." Part of the message is that both R and
Python are similarly capable for "data science" tasks, however, the Python syntax simply
feels more natural to me – it's a personal taste.
That said, one of the things I've appreciated about R is how it "just works": I usually
install R through Homebrew, but installing RStudio via point-and-click is also straightforward. I can
see why that's a huge appeal for both beginners and people who want to do computation but not
necessarily become developers. Hell, I've been struggling for what feels like months to do
just the
most rudimentary GIS work in Python 3 . But in just a couple of weeks of learning R –
and leveraging however it manages to package GDAL and all its other geospatial dependencies
with rgdal – I've been able to
create some
decent geospatial visualizations (and queries) :
Also, I used to hate how <- was used for assignment. Now, that's one of the
things I miss most about using R. I've grown up with single-equals-sign assignment in every
other language I've learned, but after having to teach some programming I've seen that the difference
between == and = is a common and often hugely stumping error for beginners.
Not only that, they have trouble remembering how assignment even works, even for basic variable
assignment. I've come to realize that I've programmed so long that I immediately recognize the
pattern, but that can't possibly be the case for novices, who, if they've taken general math
classes, have never seen the equals sign used that way. The <- operator makes a lot
more sense, though I would never have thought that if I hadn't read Hadley Wickham's style guide .
Speaking of Wickham's style guide, one thing I wish I had done at the very early stages of
learning R is to have read Wickham's Advanced
R book – which is free online (and contains the style guide). Not only is it just a
great read for any programmer, like everything Wickham writes, it is not at all an "advanced"
book if you are coming from another language. It goes over the fundamentals of how the language
is designed. For example, one major pain point for me was not realizing that R does not have
scalars – things that appear to be scalars happen to be vectors of length one. This is
something Wickham's book mentions in its Data structures chapter .
Another vital and easy-to-read chapter: Wickham's explanation of R's non-standard evaluation has
totally illuminated to me why a programmer of Wickham's caliber enjoys building in R, but why I
would find it infuriating to teach R versus Python to beginners.
The Python community is so active on Reddit that it has its own learners subreddit –
r/learnpython – with 54,300 subscribers .
From anecdotal observations, I don't think Python shows much sign of diminishing popularity
on Hacker News, either. Not just because Python-language specific posts keep making the front
page, but because of the general increased interest in artificial intelligence, coinciding with
Google's recent release of TensorFlow
, which they've even quickly ported to Python 3.x .
"... Python is relatively constraining - in the sense that it does not give the same amount of freedom as PERL in implementing something (note I said 'same amount' - there is still some freedom to do things). But I also see that as Python's strength - by clamping down the parts of the language that could lead to chaos we can live without, I think it makes for a neat and tidy language. ..."
"... Perl 5, Python and Ruby are nearly the same, because all have copied code from each other and describe the same programming level. Perl 5 is the most backward compatible language of all. ..."
"... Python is notably slower than Perl when using regex or data manipulation. ..."
Perl is better.
Perl has almost no constraints. Its philosophy is that there is more than one way to do
it (TIMTOWTDI, pronounced Tim Toady). Python artificially restricts what you can do as a programmer. Its philosophy is that
there should be one way to do it. If you don't agree with Guido's way of doing it, you're
sh*t out of luck.
Basically, Python is Perl with training wheels. Training wheels are a great thing for a
beginner, but eventually you should outgrow them. Yes, riding without training wheels is less
safe. You can wreck and make a bloody mess of yourself. But you can also do things that you
can't do if you have training wheels. You can go faster and do interesting and useful tricks
that aren't possible otherwise. Perl gives you great power, but with great power comes great
responsibility.
A big thing that Pythonistas tout as their superiority is that Python forces you to write
clean code. That's true, it does... at the point of a gun, sometimes at the detriment of
simplicity or brevity. Perl merely gives you the tools to write clean code (perltidy,
perlcritic, use strict, /x option for commenting regexes) and gently encourages you to use
them.
Perl gives you more than enough rope to hang yourself (and not just rope, Perl gives you
bungee cords, wire, chain, string, and just about any other thing you can possibly hang
yourself with). This can be a problem. Python was a reaction to this, and their idea of
"solving" the problem was to only give you one piece of rope and make it so short you can't
possibly hurt yourself with it. If you want to tie a bungee cord around your waist and jump
off a bridge, Python says "no way, bungee cords aren't allowed". Perl says "Here you go, hope
you know what you are doing... and by the way here are some things that you can optionally
use if you want to be safer"
One liners. Perl has a whole set of shortcuts for making it easy to write ad-hoc
scripts on the command line
Speed. For most tasks, Perl is significantly faster than Python
Regular expressions are a first-class datatype rather than an add in. This means you
can manipulate them programatically like any other first-class object.
Power. You can do things in Perl that are either much harder, or prohibited, in Python.
For instance the <> operator... this lets you trivially deal with the complexities of
opening files from the command line and/or accepting streams from pipes or redirection. You
have to write several lines of boilerplate Python code to duplicate the behavior of Perl's
while (<>) ... construct (or even more trivially the -n switch, which
automatically wraps your code with this construct).
No significant whitespace. If your formatting gets mangled (by, say, posting it to a
web forum or sending it in an email that munges whitespace), the meaning of your code
doesn't change, and you can trivially re-format your code with Perltidy according to whatever coding style you
define. You can format your code as to what is most clear in context, rather than having to
conform to an arbitrary set of restrictions.
Postfix notation. This can be ugly and is easily misused, but used with care it makes
your code easier to read, especially for things like die if $condition or
die unless $condition assertions.
Sigils. It's a love it or hate it thing, but sigils unambiguously distinguish variables
from commands, make interpolation effortless, and make it easy to tell at a glance what
kind of variable it is without having to resort to some ugly hack like Hungarian
notation.
Inline::C (and all of the
other Inline::* modules). Yes, you can write Python extensions in C, but Inline::C makes it
effortless.
Pod is vastly more powerful than Docstrings, especially when you throw in the power of
something like Pod::Weaver to
write/manipulate your documentation programatically.
Advantages of Python
JVM interoperability. For me this is huge. It's the only thing that Python does better
than Perl. Being able to write code that runs in the JVM and to work with Java objects/APIs
without having to write Java code is a huge win, and is pretty much the only reason I ever
write anything in Python.
Learning curve. Python is easier to learn, no denying it. That's why I'm teaching it to
my 12 year old son as his first programming language
User community. Python is more popular and has a larger active user community. Appeal
to popularity is a fallacy, but you can't just dismiss mindshare and 3rd party support
either.
Though I may get flamed for it, I will put it even more bluntly than others have: Python
is better than Perl . Python's syntax is cleaner, its object-oriented type system is more
modern and consistent, and its libraries are more consistent. ( EDIT: As Christian Walde points out
in the comments, my criticism of Perl OOP is out-of-date with respect to the current de facto
standard of Moo/se. I do believe that Perl's utility is still encumbered by historical
baggage in this area and others.)
I have used both languages extensively for both professional work and personal projects
(Perl mainly in 1999-2007, Python mainly since), in domains ranging from number crunching
(both PDL and NumPy are excellent) to web-based programming (mainly with
Embperl and Flask ) to good ol' munging text files and database
CRUD
Both Python and Perl have large user communities including many programmers who are far
more skilled and experienced than I could ever hope to be. One of the best things about
Python is that the community generally espouses this
aspect of "The Zen of Python"
Python's philosophy rejects the Perl " there is more than one
way to do it " approach to language design in favor of "there should be one -- and
preferably only one -- obvious way to do it".
... while this principle might seem stultifying or constraining at first, in practice it
means that most good Python programmers think about the principle of least
surprise and make it easier for others to read and interface with their code.
In part as a consequence of this discipline, and definitely because of Python's strong typing , and
arguably because of its "cleaner" syntax, Python code is considerably easier to read
than Perl code. One of the main events that motivated me to switch was the experience of
writing code to automate lab equipment in grad school. I realized I couldn't read Perl code
I'd written several months prior, despite the fact that I consider myself a careful and
consistent programmer when it comes to coding style ( Dunning–Kruger effect ,
perhaps? :-P).
* For example, the one-liner perl -i.bak -pe 's/foo/bar/g' *.txt will go through a
bunch of text files and replace foo with bar everywhere while
making backup files with the .bak extension.
Neither language is objectively better. If you get enthused about a language - and by all
means, get enthused about a language! - you're going to find aspects of it that just strike
you as beautiful. Other people who don't share your point of view may find that same aspect
pointless. Opinions will vary widely. But the sentence in your question that attracted my
attention, and which forms the basis of my answer, is "But, most of my teammates use Perl."
If you have enough charisma to sway your team to Python, great. But in your situation,
Perl is probably better . Why? Because you're part of a team, and a team is more
effective when it can communicate easily. If your team has a significant code base written in
Perl, you need to be fluent in it. And you need to contribute to the code base in that same
language. Life will be easier for you and your team.
Now, it may be that you've got some natural lines of demarcation in your areas of interest
where it makes sense to write code for one problem domain in Perl and for another in Python -
I've seen this kind of thing before, and as long as the whole team is on board, it works
nicely. But if your teammates universally prefer Perl, then that is what you should focus
on.
It's not about language features, it's about team cohesion.
I used Perl since about 1998 and almost no Python until 2013. I have now (almost) completely
switched over to Python (written 10s of thousands of lines already for my project) and now
only use Perl one-liners on the Linux command line when I want to use regex and format the
print results when grepping through multiple files in my project. Perl's one-liners are
great.
This surprisingly easy transition to fully adopting Python and pretty much dropping Perl simply
would not have been possible without the Python modules collections and re
(the regex package). Python's numpy, matplotlib, and scipy help seal the deal.
The collections package makes complex variables even easier to create than in Perl.
It was created in 2004, I think. The regex package, re , works great, but I wish it
were built into the Python language like it is in Perl, because regex usage is smooth in
Perl and clunky(er) in Python.
OOP is super easy and not as verbose as in C++, so I find it faster to write, if needed in
Python.
I've drunk the Kool-Aid and there is no going back. Python is it. (for now)
Python is easier to learn, Perl takes a while to get used to and is not intuitive. Perl
is great, if you already know it.
I use Perl to write quick scripts using regular expressions, to perform text/data
manipulations. If there is anything Perl can do well, it is string manipulations.
I use Python for writing reusable code. I personally think OO in Perl is a little odd,
and find Python to be more consistent. For example, I find it easier to create modules in
Python.
Perl scripts are often messy (it takes me a while to understand my own scripts),
whereas, Python is very clean.
I can't think of anything that you can do with Perl that you can't with Python. If I were
you, I'd stick with Python.
It's kind of funny; back in the early 1970's, C won out over Pascal despite being much more
cryptic. They actually have the same level of control over the machine, Pascal is just more
verbose about it, and so it's quicker to write little things in C. Today, with Python and
Perl, the situation is reversed; Python, the far less cryptic of the two, has won out over
Perl. It shows how values change over time, I suppose.
One of the values of Python is the readability of the code. It's certainly a better
language for receiving someone else's work and being able to comprehend it. I haven't had the
problem where I can't read my own Perl scripts years later, but that's a matter of fairly
strict discipline on my part. I've certainly received some puzzling and unreadable code from
other Perl developers. I rather hate PowerShell, in part because the messy way it looks
on-screen and its clumsy parameter passing reminds me of Perl.
For collaboration, the whole team would do better on Python than on Perl because of the
inherent code readability. Python is an extreme pain in the neck to write without a
syntax-aware editor to help with all the whitespace, and that could create a barrier for some
of your co-workers. Also Python isn't as good for a really quick-and-dirty script, because
nothing does quick (or dirty) better than Perl. There are a number of things you can do in
Perl more quickly and I've done some things with Perl I probably wouldn't be interested in
even trying in Python. But if I'm doing a scripting task today, I'll consider Python and even
Scala in scripting mode before I'll consider Perl.
I'll take a guess that your co-workers are on average older than you, and probably started
Perl before Python came along. There's a lot of value in both. Don't hate Perl because of
your unfamiliarity with it, and if it's a better fit for the task, maybe you will switch to
Perl. It's great to have the choice to use something else, but on the other hand, you may
pick up an awful lot of maintenance burden if you work principally in a language nobody else
in your company uses.
I have used Perl and Python both. Perl from 2004-2007, and Python, early 2009 onward. Both
are great, malleable languages to work with. I would refrain from making any comments
on the OOP model of Perl since my understanding is most likely out-of-date now.
Library-wise, Perl and Python both have a fantastic number of user-contributed libraries.
In the initial days of Python, Perl definitely had an edge in this regard -- you could find
almost anything in its library repository, CPAN -- but I am not sure whether this edge exists
anymore; I think not.
Python is relatively constraining -- in the sense that it does not give the same amount of
freedom as Perl in implementing something (note I said "same amount" -- there is still some
freedom to do things). But I also see that as Python's strength: by clamping down on the parts
of the language that could lead to chaos we can live without, I think it makes for a neat and
tidy language.
I loved Perl to death in the years I used it. I was an ardent proselytizer. My biggest
revelation/disappointment around the language came when I was asked to revisit a huge chunk
of production code I had written a mere six months earlier. I had a hard time understanding
various parts of the code -- most of it my own code! I realized that with the freedom that Perl
offers, you and your team would probably work better (i.e. write code that's maintainable) if
you also had some coding discipline to go with it. So although Perl provides you a lot of
freedom, it is difficult to make productive use of it unless you bring in your own discipline
(why not Python then?) or you are so good/mature that any of the many ways to do something in
the language is instantly comprehensible to you (i.e. a steeper learning curve if you are not
looking forward to brain-teasers during code maintenance).
The above is not a one-sided argument; it so happened that some years later I was in a
similar situation again, only this time the language was Python. This time around I was
easily able to understand the codebase. The consistency of doing things in Python helps.
Emerson said consistency is the hobgoblin of little minds, but maybe, faced with the
daunting task of understanding huge legacy codebases, we are relatively little minds.
Or maybe it's just me (actually not, speaking from experience :) ).
All said and done, I am still a bit miffed at the space/tab duality in Python :)
You're asking the wrong question, because the answer to your question is: it depends. There is
no best language. It depends on what you're going to use it for. About 25 years ago, some
bureaucrat decided that every system written by the DoD was to be written in Ada. The
purpose was that we'd only have to train software developers in one language. The DoD could
painlessly transfer developers from project to project. The only problem was that Ada wasn't
the best language for all situations. So, what problem do you wish to solve by programming?
That will help to determine what language is the best.
Perl 5, Python and Ruby are nearly the same, because all have copied from each other and
sit at the same level of abstraction. Perl 5 is the most backward-compatible language of all.
Python is much better, if only because it's much more readable. Python programs are therefore
easily modifiable, and thus more maintainable, extensible, etc.
Perl is a powerful language, and a Perl script can be fun to write. But it is a pain to read,
even by the one who wrote it, after just a couple of months, or even weeks. Perl code usually
looks like (and in many cases just is) a hack to get something done quickly and "easily".
What is still confusing and annoying in Python are the many different frameworks employed
to build and install modules; the fact that much code and many modules are still in beta stage,
unavailable on certain platforms, or come with difficult or almost impossible to satisfy
binary (and also non-binary, i.e. due to different/incompatible versions) dependencies. Of
course, this is related to the fact that Python is still a relatively young language.
Python 3 vs. 2 incompatibilities, though justifiable and logical, also don't help in this
regard.
Perl is the unix of programming languages. Those who don't know it are condemned to
re-implement it (and here I'm looking at you, Guido), badly. I've read a lot of Python books,
and many of the things they crow about as being awesome in Python have existed in Perl since
the beginning. And most of those Perl stole from sed or awk or lisp or shell. Sometimes I
think Pythonistas are simply kids who don't know any other languages, so of course they think
Python is awesome.
As someone who worked professionally with Perl and moved to Python only two years ago, the
answer sounds simple.
Work with whatever feels more comfortable; however, if your teammates use Perl it
would be better to learn Perl in order to share code and refrain from creating code that
cannot be reused.
In terms of comparison:
1. Python is object-oriented by design; Perl can be object-oriented, but it is not a must (see the short sketch after this list).
2. Python has a very good standard library; Perl has CPAN with almost everything.
3. Perl is everywhere, and in most cases you won't need a newer perl for most CPAN modules; Python is a bit more problematic in this regard.
4. Python is more readable after a month than Perl.
There are other reasons, but those are the first that come to mind.
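A small illustration of point 1 (my sketch, not the answerer's; the Host class and the example.com default are invented): in Python even literal values are objects with methods, and a user-defined class needs no extra ceremony.

    # Illustration only: built-in values are objects, classes are lightweight.
    print("perl vs python".title())      # str method called on a literal
    print((255).bit_length())            # int method called on a literal

    class Host:
        """Trivial class; the example.com default below is made up."""
        def __init__(self, name):
            self.name = name

        def fqdn(self, domain="example.com"):
            return "{}.{}".format(self.name, domain)

    print(Host("web01").fqdn())          # prints web01.example.com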
Your question sounds a bit like "Which is better, a boat or a car?". You simply do not do the same things with them. Or more precisely, some things are easier
to do with Perl and others are easier with Python. Neither of them is robust (compare with Eiffel or Oberon, and if you have never heard of these it
is because robustness is not so important for you). So learn both, and choose for yourself. And also pick a nice one from http://en.wikipedia.org/wiki/Tim...
(why not Julia?) -- a language that none of your friends knows about, so you can stick out your tongue
at them.
Python is notably slower than Perl when using regexes or doing data manipulation. I think if
you're worried about the appeal of Perl, you could use PyCharm with Perl too. Furthermore, I believe the primary reason why someone would use
an interpreted language on the job is to manipulate or analyze data. Therefore, I would use
Perl in order to get the job done quickly and worry about the appearance another time.
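The relative-speed claim is the poster's; if you want to check the Python side of it yourself, a rough timeit sketch like the one below works (illustration only, with made-up log text; a fair comparison would need an equivalent Perl script using the Benchmark module):

    # Times only the Python side of the regex claim (illustration).
    import re
    import timeit

    line = "2017-08-01 12:00:00 host42 sshd[1234]: Accepted password for root\n"
    text = line * 200                     # synthetic log data
    pattern = re.compile(r"sshd\[(\d+)\]: Accepted (\w+) for (\w+)")

    elapsed = timeit.timeit(lambda: pattern.findall(text), number=2000)
    print("2000 findall passes over %d chars: %.3f s" % (len(text), elapsed))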
I had to learn Perl in an old position at work, where system build scripts were written in
Perl and we are talking system builds that would take 6 to 8 hours to run, and had to run in
both Unix and Windows environments.
That said, I have continued to use Perl for a number of reasons:
I love the ability to automate other systems with Perl, making data entry, problem
definition, etc. quick and easy.
The ability to deal with hashes, while at first one of the more difficult concepts to
grasp, has been a blessing.
Working with JSON and data structures is similarly a blessing, and a curse.
Play with BIGINT a little: you can deal with integers that are hundreds of digits
long. Not that I have found where this helps with "work", but it does make some of the
programming challenges easier.
Yes, I now need to use Python at work, but I have actually found it more difficult to
learn than Perl, especially for pulling in a web page and extracting data out of the
page (see the sketch below).
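For what it is worth, the "pull in a web page and extract data" task the poster mentions can be done with nothing but the Python standard library; the sketch below is my illustration (the URL is just an example), not the poster's code:

    # Minimal standard-library sketch: fetch a page and collect its links.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def extract_links(url):
        with urlopen(url) as response:    # network access assumed
            html = response.read().decode("utf-8", "replace")
        collector = LinkCollector()
        collector.feed(html)
        return collector.links

    if __name__ == "__main__":
        for link in extract_links("https://www.perl.org/")[:10]:  # example URL
            print(link)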
For a beginner, I suggest Python. Perl 6 has been released, and Perl 5 will be replaced by Perl 6 or Ruby. Python is easier for a
beginner and more useful for your future no matter whether you want to be a scientist or a
programmer. Perl is also a powerful language, but it is only popular in the U.S., Japan, etc.
Anyway, select the one you like, and learn the other as a second language.
Perl is an older language and isn't really in vogue. Python has a wider array of
applications and, as a "trendy" language, greater value on the job market.
First, I'll say it's almost impossible to say if any one language is "better" than
another, especially without knowing what you're using them for, but...
Short answer: it's Python. To be honest, I don't know what you mean by "robust"; they are both established languages,
been around a long time, and both are, to all intents and purposes, bug free, especially if
you're just automating a few tasks. I've not used Perl in quite some time, except when I delve into a very old project I
wrote. Python has terrible syntax, and Perl has worse, in my opinion. Just to make my answer a bit more incendiary: Perl and Python both suck, Python just sucks
significantly less.
"... Secondly I am personally not overly concerned with what the popular language of the day is. As I commented ages ago at RE (tilly) 1: Java vs. Perl from the CB , the dream that is sold to PHBs of programmers as interchangable monkeys doesn't appeal to me, and is a proven recipe for IT disasters. See Choose the most powerful language for further discussion, and a link to an excellent article by Paul Graham. As long as I have freedom to be productive, I will make the best choice for me. Often that is Perl. ..."
"... Indeed languages like Python and Ruby borrow some of Perl's good ideas, and make them conveniently available to people who want some of the power of Perl, but who didn't happen to click with Perl. I think this is a good thing in the end. Trying different approaches allows people to figure out what they like and why they like it, leading to better languages later. ..."
"... Well, to start off, I think you're equating "popularity" with "success". It's true that Perl awareness is not as widespread as the other languages you mention. (By the way, I notice that two of them, C# and Java, are products of corporations who would like to run away with the market regardless of the technical merit of their solution). But when I first read the title of your note, I thought a lot of other things. To me, "decline" means disuse or death, neither of which I think apply to Perl today. ..."
I love Perl and have been using it since 1996 at my work for administrative tasks as well as
web based products. I use Perl on Unix and Windows based machines for numerous tasks.
Before I get in depth about the decline let me give a little background of myself. I got my
first computer in 1981, a TRS-80 Model III 4 MHz Z80 processor, 16K RAM, no HD, no FD, just a
cassette tape sequential read/write for storage and retrieval. The TRS-80 line allowed for
assembler or BASIC programs to be run on it. I programmed in both BASIC and assembler, but mostly
BASIC since I had limited memory and using the tape became very annoying. Let's time warp
forward to 1987, when Perl was first released.
The introduction of Perl was not household knowledge; the use of computers in the home was
still considerably low. Those that did have computers most likely did very specific tasks, such
as bringing work home from the office. So it is fairly safe to say that Perl was not targeted at
inexperienced computer users, but more at system administrators -- and boy, did system
administrators love it. Now let's time warp ahead to 1994.
1994 marked what I consider the start of the rush to the WWW (not the Internet), and it was
the birth year of Perl 5 and DBI. The WWW brought us the ability to easily link to any
other site/document/page via hypertext markup language or, as we like to say, HTML. This "new"
idea created a stir in the non-tech world for the first time. The WWW, as HTML progressed,
started to make using and knowing about computers a little less geeky. Most servers were UNIX
based, and as the needs for dynamic content or form handling grew, what language was there to
assist? Perl. So Perl became, in a way, the default web language for people that hadn't been
entrenched in programming another CGI-capable language and just wanted to process a form,
create a flat-file data store, etc. -- that is, non-techies.
Perl served us all well, but on the horizon were the competitors. The web had proven itself
not to be a flash in the pan, but a tool through which commerce and social interaction could
take new form and give people that had never considered using a computer before a reason to
purchase one. So the big software and hardware giants looked for ways to gain control over the
web and the Internet in general. There were even smaller players that were open source and
freeware, just like Perl.
So by 2000 there were several mature choices for Internet development, and Perl was adrift
in the sea of choice. I also see 2000 as the year the tide went out for Internet development in
general. The "rush" had subsided; companies and investors started to really analyze what they
had done and what the new media really offered them. Along with analyzing come consultants.
Consultants have an interest in researching or developing the best product possible for a
company. The company is interested in being able to maintain what they bought once they
terminate the contract with the consultant. This brings us to the rub on Perl. How can
a consultant convince a company that his application language of choice is free and isn't
backed by a company? ActiveState, I believe, backs Perl to some extent, but one company generally
isn't enough to put a CTO at ease.
So the decline of Perl use can be summarized with these facts:
Perl is thought of as a UNIX administrative tool.
There are not enough professional organizations or guilds to bolster confidence with
corporations that investing in a Perl solution is a good long-term plan.
Perl doesn't have large-scale advertising and full-time advocates that keep Perl in major
computing publications and remind companies that when they choose, they should choose Perl.
There is no official certification. I have seen Larry's comments on this and I agree with
him, but lack of certification hurts in the corporate world.
Lack of college or university Perl classes, or, maybe better stated, lack of Perl promotion
by colleges.
I suppose all of this really only matters to people that don't make their living extending Perl
or using it for system admin work that isn't approved by a board or committee. People that make
final products based on Perl for the Internet and as standalone applications are affected by
the myths and facts of Perl.
Last year a possible opportunity I had to produce a complete package for a large
telecommunications firm failed in part due to lack of confidence in Perl as the language of
choice, despite the fact that two districts had been successfully using the prototype and had
increased efficiency.
Another factor is overseas development services. My most recent employer had a
subsidiary in India with 30 developers. Training for Perl was unheard of. There were signs
literally everywhere for C++, C# and Java, but no mention of Perl. It seems Perl is used for
down-and-dirty utilities, not full-scale applications.
Maybe Perl isn't "supposed" to be for large-scale applications, but I think it can be and I
think it's more then mature enough and supported to provide a corporation with a robust and
wise long-term solution.
I am very interested in your opinions about why you feel Perl is or isn't gaining
ground.
First of all I don't know whether Perl is declining. Certainly I know that some of the
Perl 6 effort has done exactly what it was intended to do, and attracted effort and interest
in Perl. I know that at my job we have replaced the vast majority of our work with Perl, and
the directions we are considering away from Perl are not exactly popularly publicized
ones.
Secondly, I am personally not overly concerned with what the popular language of the day
is. As I commented ages ago at RE (tilly) 1: Java vs. Perl from the CB, the
dream that is sold to PHBs of programmers as interchangeable monkeys doesn't appeal to me, and
is a proven recipe for IT disasters. See Choose the most powerful language for further
discussion, and a link to an excellent article by Paul Graham. As long as I have freedom to
be productive, I will make the best choice for me. Often that is Perl.
Third I don't see it as a huge disaster if Perl at some point falls by the wayside. Perl
is not magically great to push just because it is Perl. Perl is good because it does things
very well. But other languages can adopt some of Perl's good ideas and do what Perl already
does. Indeed languages like Python and Ruby borrow some of Perl's good ideas, and make them
conveniently available to people who want some of the power of Perl, but who didn't happen to
click with Perl. I think this is a good thing in the end. Trying different approaches allows
people to figure out what they like and why they like it, leading to better languages
later.
Perhaps I am being narrow minded in focusing so much on what makes for good personal
productivity, but I don't think so. Lacking excellent marketing, Perl can't win in the hype
game. It has to win by actually being better for solving problems. Sure, you don't see Perl
advertised everywhere. But smart management understands that something is up when a small
team of Perl programmers in 3 months manages to match what a fair sized team of Java
programmers had done in 2 years. And when the Perl programmers come back again a year later
and in a similar time frame do what the Java programmers had planned to do over the next 5
years...
Well, to start off, I think you're equating "popularity" with "success". It's true that
Perl awareness is not as widespread as the other languages you mention. (By the way, I notice
that two of them, C# and Java, are products of corporations who would like to run away with
the market regardless of the technical merit of their solution). But when I first read the
title of your note, I thought a lot of other things. To me, "decline" means disuse or death,
neither of which I think apply to Perl today.
The fact that there isn't a company with a lot of money standing behind Perl is probably
the cause of the phenomenon you observe. The same applies to other software that is produced
and maintained by enthusiasts rather than companies. Linux is a very good example. Until
recently, Linux was perceived as just a hobbyist's toy. Now there are small glimmers of its
acceptance in the corporate world, mainly from IBM stepping up and putting together a
marketing package that non-technical people can understand. (I'm thinking of recent TV ad
campaigns.) But does that make Linux "better"? Now there's a way to start an argument.
I agree that except in certain cases (look at The Top Perl Shops for some
examples), most companies don't "get" Perl. Last year at my current client, I suggested that
a new application be prototyped using a Perl backend. My suggestion was met with something
between ridicule and disbelief ("We don't want a bunch of scripts, we want a
real program." That one stung.) To give them credit -- and this lends credence to one
of your points -- they had very few people that could program Perl. And none of them were
very good at it.
So has this lack of knowledge among some people made Perl any worse, or any less useful?
No, definitely not. I think the ongoing work with Perl 6 is some of the most interesting and
exciting stuff around. I think the language continues to be used in fascinating leading-edge
application areas, such as bioinformatics. The state of Perl definitely doesn't fit my
definition of "decline".
Nonetheless, I think your point of "corporate acceptance" is well-taken. It's not that the
language is declining, it's that it's not making inroads in the average boardroom. How do you
get past that barrier? For my part, I think the p5ee project is a step in the right direction. We need to
simplify Perl training, which is one of the goals of the standardization, and provide
something for a corporate executive to hold on to -- which is a topic of discussion in the
mailing list right now.
And the nice part is that the standardized framework doesn't stop all
the wonderful and creative Cool uses for Perl that we've
become accustomed to. If the lack of corporate acceptance is of concern to you, then join the
group. "Don't curse the darkness, light a candle" is an old Chinese proverb.
Re: Re: The Decline of Perl - My Opinion
There was a post on the mod_perl list later the same day that I wrote this that shows a
steady rise in the use of mod_perl based servers. See the graph.
Maybe a better title should be, Top Hindrances in Selling Perl Solutions
I agree with your timeline and general ideas about what brought Perl into focus. I also
agree with Perl being adrift in a sea of choices and with the rush subsiding in 2000. I
wholeheartedly disagree with your arguments about why Perl will decline. Let's not look at your
summary but at the pillars of your summary.
1. Perl is thought of as a UNIX administrative tool.
You really don't have much support for this summary. You state that system
administrators love it but so do a whole lot of developers!
2. There are not enough professional organizations or guilds to bolster confidence with
corporations that investing in a Perl solution is a good long-term plan.
Well, you're at one of the most professional guilds right now. I don't see a Cold Fusion
monks site, do you? What do you want? Certification? I think more of my BS and BA than I do of any
certification. As for "good long-term plans" -- very few businesses see past the quarter. While
I think this is generally bad, I think it's going to work wonders for open software.
Where can you trim the budget to ensure profitability? Cut those huge $$$$ software licenses
down to nothing.
3. Perl doesn't have large scale advertising and full time advocates that keep Perl in
major computing publications and remind companies that when they chose, chose Perl.
Hmmm ... not sure I want to base my IT selection on what the mags have to say -- I've seen
a whole lot of shelf-ware that was bought due to what some wags say in the latest issue of
some Ziff Davis trash.
4. There is no official certification. I have seen Larry's comments on this and I agree
with him, but lack of certification hurts in the corporate world.
There are only two certifications that count: one is years of experience and the other is a
sheepskin. Anything else is pure garbage. As long as you have the fundamentals of algorithm
design down, then who cares what the cert is.
5. Lack of college or university Perl classes, or, maybe better stated, lack of Perl
promotion by colleges.
I wouldn't expect anyone right out of college to be productive in any language. I would
expect them to know what makes a good algorithm -- and that, my friend, is language agnostic. Be
it VB, C, C++, or Perl -- you have to know big-O.
It sounds like you're a little worried about corporations' perception of our language
of choice. I wouldn't be. Given the current perception of corporate management (a la Enron), I
think the people who make the (ehh) long-range plans may be around a lot less than us tech
weenies. Bring it in back doors if you want. Rename it if you have to -- a SOAP-enabled
back-end XML processor may be more appealing than an Apache/mod_perl engine (that's where
the BA comes in).
It also sounds like you're worried about overseas chop shops. Ed Yourdon rang that bell
about ten years ago with "Decline and Fall of the American Programmer." I must say I
lost a lot of respect for Ed on that one. Farming out development to India has proven to be
more of a lead egg than the golden hen -- time zone headaches and culture clash have proved
very hard to overcome.
Perl is moving on... it seemed static because everyone was catching up to it. That being
said, some day other OSS languages may overtake it, but Python and Ruby are still in Perl's
rear-view mirror.
Just like loved ones, we tend to ignore those who are around us every day. For this
Valentine's Day, do us all a favor and buy some chocolates and a few flowers for our
hard-working and beloved partner -- we love you, Perl.
A few thoughts and data points: Perl may have gained ground initially in system administration, and since the Web came
along, Perl is now thought of more as the language of CGIs.
My previous employer also had a subsidiary in India, and my distributed project included a
team there, working on a large (80K-line) web application in Perl. Folks coming out of
university in India are more likely to have been trained in C++ or Java, but Perl isn't
unknown.
On Certification: Be very careful with this. HR departments might like to see
certifications on resumes, but in the development teams I've worked in, a Very Big Flag is
raised by someone claiming to have a certification. The implication, fair or not, is "this
person is a Bozo."
Here's the thing: to some extent, you're right, but at the same time, you're still way off
base. Now let's say, just for a second, that the only thing Perl can / should be used for is
system administration and web programming (please don't flame me, guys. I know that is by no
means the extent of Perl's capabilities, but I don't have time to write a post covering all
the bases right now.) Even assuming that's true, you're still wrong.
Considering the web side of it, yes, Perl is being used in far fewer places now. There are
two major reasons for this: one is the abundance of other languages (PHP and ::shudder:: ASP,
for example). Another is the fact that a lot of sites backed by Perl programming crashed and
burned when the dot-coms did. You know what? I don't see this as a big deal. The techies who
wrote those sites are still around, likely still using Perl, and hopefully not hurting its
reputation by using it to support companies that will never, ever, make any money or do
anything useful...ever. (Not to say that all dot-coms were this way, but c'mon, there were
quite a few useless companies out there.) These sites were thrown together (in many cases) to
make a quick buck. Granted, Perl can be great for quick-and-dirty code, but do we really want
it making up the majority of the volume of the code out there?
System administration: I still think Perl is one of the finest languages ever created for
system administration, especially cross-platform system admin, for those of us who don't want
to learn the ins and outs of shell scripting 100x over. I really don't think there'll be much
argument there, so I'll move on.
The community: when was the last time Perl had organizations as devoted as the Perl
Mongers and Yet Another Society? Do you see a website as popular and helpful as Perlmonks for
PHP? Before last year, I don't remember ever having programmers whose sole job is to improve
and evangelize Perl. Do you? I can't remember ever before having an argument on the Linux
kernel mailing list on whether Linux should switch to a quasi-Perl method of patch
management. (This past week, for those who haven't been reading the kernel mailing list or
Slashdot.)
To be honest, I have no idea what you're talking about. As far as I'm concerned, Perl is
as strong as it's ever been. If not, then it's purely a matter of evangelization. I know I've
impressed more than one boss by getting what would've been a several-day-long job done in an
hour with Perl. Have you?
I'm not sure how you back up the statement that Perl is declining, but
anyway.
There's a huge difference between writing sysadmin tools and writing business
oriented applications. (Unless your business is to provide sysadmin tools. ;-)
In my shop, the sysadmins are free to use almost any tools, languages, etc. to
get their work done. OTOH, when it comes to business supporting applications, with end user
interface, the situation is very different. This is when our middle level management starts
to worry.
My personal experience is that Perl does have a very short and low learning curve when it
comes to writing different 'tools' to be used by IT folks.
The learning curve may quickly become long and steep when you want to create a business-oriented
application, very often combining techniques like CGI and DBI (or SybPerl), using non-standard
libraries written as OO or not, plus adding your own modules to cover your
shop-specific topics -- especially if you want to create some shared business objects to be
reused by other Perl applications. Add to this that the CGI application must create
JavaScript to help the user navigate the application, sometimes even JS that
eval()'s more JS, and it becomes tricky. (Did someone mention 'security' as
well?)
Furthermore, there is (at least in Central Europe) a huge lack of good training. There are
commercial training courses, but those that I've found around where I live are all beginner
courses covering the first chapters in the Llama. Which is good, but not enough.
Because after the introduction course, my colleagues ask me how to proceed. When I tell them
to go on-line, read books and otherwise participate, they are unhappy. Yes, many of them
still haven't really learnt how to take advantage of the 'Net. And yes again, not enough
people (fewer and fewer?) are willing to RTFM. They all want quick solutions. And no,
they don't want to spend their evenings reading the Cookbook or the Camel. (Some odd
colleagues, I admit. ;-)
You can repeatedly get quick solutions using Perl, but that requires effort to
learn. And this learning stage (where I'm still at) is not quick if you need to do
something big.
Too many people (around where I make my living) want quick solutions to everything with no
or little investment.
(Do I sound old? Yeah, I guess I am. That's how you become after 20+ years in this business.
;-)
Conclusion: In my shop, the middle management are worried what will happen if I
leave.
Questions like: " Who will take over all this Perl stuff? " and " How can we get a
new colleague up to speed on this Perl thingy within a week or two? " are commonplace
here. Which in a way creates a resistance.
I'd say: Momentum! When there is a hord of Perl programmers available the middle
management will sleep well during the nights. (At least concerning "The Perl Problem".
;-)
I'm working very hard to create the momentum that is necessary in our shop and I hope you
do the same . ( Biker points a finger at the reader.)
I agree with most of this, but a major problem that I have noticed is that with the recent
'dot.gone burn', the (job) market is literally flooded with experienced Perl programmers, but
there are no positions for them. Mostly this is due to Perl being free. To learn C++, C#,
Java you've got to spend a lot of money. Courses do not come cheap, the software doesn't come
cheap and certification (where it is offered) isn't cheap.
However, anybody with just programming knowledge in anything from BASIC upwards and the
willingness to learn can get hold of software that is free to use, free to download, and free to
'learn by example' (the quickest and easiest way to learn, IMHO) -- so you can probably have a
Perl programmer that can make a basic script in next to no time. He runs his own website off it
and other scripts he wrote, learns, looks at other scripts, and before you know it, he's writing
a complete Intranet-based Management Information System
in it... If that person had to get C++ and learn by reading books and going to courses (with the
associated costs -- there isn't that much code available to read and learn from), it would have
taken so much longer, and if you are on a budget it isn't an option.
Compare Linux+MySQL+Apache+Perl to (shudder) Windows 2000 Advanced Server+MS SQL
Server+IIS+ASP. Which costs a lot more in set-up costs, staff costs and maintenance, let
alone security holes? But which do big corporations go for (even though every techie knows
which is the best one to go for)? Why? Because 'oh, things are free for a reason - if we've
got to pay lots of money for it, it has got to be damn good - just look at all those ASP
programmers asking 60,000 UKP upwards, it must be good if they are charging that much'.
All in all, if Perl 6+ had a license fee and a 'certification' was made
available AND Perl programmers put up their 'minimum wage', Perl would take off again big
time. Of course, it's all IMHO, but you did ask for my opinion :-)
If you really think that by charging money for Perl and closing it up you would make it more
popular and profitable, then I challenge you to go out and try to do it.
If you read the licensing terms, you can take Perl, take advantage of the artistic
license, rename it slightly, and make your own version, which can be proprietary if you
want. (See oraperl and sybperl for examples where this was done with Perl 4.)
My prediction based on both theory and observation of past examples (particularly
examples of what people in the Lisp world do wrong time after time again) is that you will
put in a lot of energy, lose money, and never achieve popularity. For some of the theory,
the usually referred to starting place is The Cathedral and the Bazaar
.
Of course, if you want to charge money for something and can get away with it, go ahead.
No less than Larry Wall has said, "It's almost like we're doing Windows users a
favor by charging them money for something they could get for free, because they get
confused otherwise." But I think that as time goes by it is becoming more mainstream to
accept that it is possible for software to be both free and good at the same time.
I know, but that's how heads of departments and corporate management think: these are the
people that believe the FUD that Microsoft puts out ('XP is more secure - we best get it
then', never mind that Linux is more secure and the Windows 2000 machines are also secure).
Sometimes it's just the brand name which also helps - I know of a certain sausage
manufacturer who makes sausages for two major supermarkets. People say 'Oh, that's from
Supermarket X' upon tasting, although it is just the same sausage.
All in all, it comes down to packaging. 'Tart' something up with packaging, brand names
and high prices and, despite the rival product being better in every respect, the 'well
packaged' product will win.
Mark Lutz's
Learning Python is a favorite of many. It is a good book for novice programmers. The new fifth edition is updated to both Python 2.7 and 3.3.
Instead of a book, I would advise you to start learning Python from
CodesDope, which is a wonderful site for starting to learn
Python from the absolute beginning. Its content explains everything step by step, with
typography that makes learning fun and much easier. It also
provides a number of practice questions for each topic, so that you can strengthen your grasp
of a topic by solving its questions just after reading it, without having to go around
searching for practice questions. Moreover, it has a discussion forum which is really
very responsive in solving all your doubts instantly.
There are many good websites for learning the basics, but for going a bit deeper, I'd
suggest
MIT OCW 6.00SC. This is how I learned Python back in 2012 and what ultimately led me to MIT
and to major in CS. 6.00 teaches Python syntax but also teaches some basic computer science
concepts. There are lectures from John Guttag, which are generally well done and easy to
follow. It also provides access to some of the assignments from that semester, which I found
extremely useful in actually learning Python.
After completing that, you'd probably have a better idea of what direction you wanted to go.
Some examples could be completing further OCW courses or completing projects in Python.
Choose Python. It has a bigger and more diverse community of developers. Ruby has a great
community as well, but much of the community is far too focused on Rails. The fact that a lot
of people use Rails and Ruby interchangeably is pretty telling.
As far as the differences between their syntaxes go: they both have their high and low points.
Python has some very nice functional features like list comprehensions and generators (see the
short sketch below). Ruby gives you more freedom in how you write your code: parentheses are
optional and blocks aren't whitespace delimited like they are in Python. From a syntactical
point of view, they're each in a league of their own.
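To illustrate the list comprehension and generator point (my sketch, not the answerer's code):

    # List comprehension: build a list in one expression.
    squares = [n * n for n in range(10) if n % 2 == 0]
    print(squares)                              # [0, 4, 16, 36, 64]

    # Generator: values are produced lazily, one per iteration.
    def countdown(start):
        while start > 0:
            yield start
            start -= 1

    print(list(countdown(5)))                   # [5, 4, 3, 2, 1]
    print(sum(x * x for x in countdown(1000)))  # generator expression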
My reason for choosing Python (as my preferred language) though, had everything to do with
the actual language itself. It actually has a real module system. It's clean and very
easy to read. It has an enormous standard library and an even larger community contributed
one.
1. Python is widely used across several domains. It's installed on almost all Linux distros --
perfect for system administration. It's the de facto scripting language in the animation/visual
effects industry. The following industry-standard programs use Python as their scripting
language:
Autodesk Maya, Softimage, Toxik, Houdini, 3DS Max, Modo, MotionBuilder, Nuke, Blender, Cinema 4D,
RealFlow and a lot more...
If you love visual-effects-heavy movies and video games, chances are Python helped in making
them. Python is used extensively by Industrial Light & Magic, Sony Imageworks, Weta Digital, Luma
Pictures, Valve etc.
2. Researchers use it heavily, resulting in free, high-quality libraries: NumPy, SciPy,
Matplotlib etc.
3. Python is backed by a giant like Google.
4. Both Python and Ruby are slow. But Ruby is slower!
And oh, the One Laptop per Child project uses Python a lot.
Ruby is more or less a DSL. It is used widely for web development -- so much so that the name
Ruby is interchangeable with Rails.
Dave Aronson, T. Rex at Codosaurus, LLC (1990-present), answered Feb 27:
This was apparently asked in 2010, but I've just been A2A'ed in 2017, so maybe an updated
perspective will help. Unfortunately I haven't done much Python since about 2008 (I did a
little bit in 2014), but I have done a ton of Ruby in the meantime.
From what I hear, the main changes in Python are that on the downside it is no longer the
main darling of Google, but on the upside, more and more and better and faster libraries have
come out for assorted types of scientific computation, Django has continued to climb in
popularity, and there are apparently some new and much easier techniques for integrating C
libraries into Python. Ruby has gotten much faster, and more popular in settings other than
Rails, and meanwhile Rails itself has gotten faster and more powerful. So it's still very much
not a no-brainer.
There are some good libraries for both for games, and other uses of GUI interfaces (from
ordinary dialog boxes and such, to custom graphics with motion). If you mean a highly visual
video game, needing fine resolution and a high frame rate, forget 'em both, that's usually done
in C++.
For websites, it depends what kind. If you want to just show some info, and have
ability to change what's there pretty easily, from what I've heard Django is excellent at
making that kind of system, i.e., a CMS (Content Management System). If you want it to do some
storage, processing, and retrieval of user-supplied data, then Ruby is a better choice, via
frameworks such as Rails (excellent for making something quickly even if the result
isn't lightning-fast, though it can be with additional work) or Sinatra (better for fast stuff
but it does less for you). If you don't want to do either then you don't need either
language, just raw HTML, CSS (including maybe some libraries like Bootstrap or Foundation), and
maybe a bit of JavaScript (including maybe some libraries like JQuery) -- and you'll have to
learn all that anyway!
"Apps" is a bit broad of a term. These days it usually means mobile apps. I haven't
heard of ways to do them in Python, but for Ruby there is RubyMotion, which used to be iOS-only
but now supports Android too. Still, though, you'd be better off with Swift (or Objective-C)
for iOS and Java for Android. On the other claw, if you just mean any kind of
application, either one will do very nicely for a wide range.
Vikash Vikram, Software Architect, worked at SAP, studied at Indian Institute of Technology, Delhi, lived in New Delhi:
It depends on your use. If you are into number crunching or system-related scripts, go to Python by all means.
If you are interested in speed, go somewhere else, as both languages are slow; when you compare
their speed to each other, they are more or less the same (you are not going to get any
order-of-magnitude difference in their performance). If you are into web application development,
or want to create a backend for your mobile apps, or are writing scripts, only then do you have to dig deeper. The
major difference is in the mindset of the community. Python is conservative and Ruby is
adventurous. Rails is the shining example of that. Rails guys are just crazy about trying new
ideas and getting them integrated. They have been among the first to have REST, CoffeeScript,
Sass and many other things by default. With Rails 5, they will have ES6 integrated as
well. So if you want to bet on a framework that lets you try all the new stuff, then Rails, and
hence Ruby, is the way. Personally I like Ruby because of Rails. I like Rails because I like the
philosophy of its creators. The syntax of Ruby is icing on the cake. I am not fond of Python
syntax, and I am a web developer, so the versatility of Python libraries does not cut it
for me. In the end, you won't be wrong with either of these, but you will have to find out for yourself
which one you will love to code in (if that is important for you).
Ruby and Python. Two languages. Two communities. Both have a similar target: to make software
development better. Better than Java, better than PHP and better for everyone. But where is
the difference? And what language is "better"? For the last question I can say: none is
better. Both camps are awesome and do tons of great stuff. But for the first question the
answer is longer. And I hope to provide that in this little article.
Is the difference in the toolset around the language? No, I don't think so. Both have good
package managers, tons of libraries for all kind of stuff and a few decent web frameworks.
Both promote test driven development. On the language side one is whitespace sensitive, the
other isn't. Is that so important? Maybe a little, but I think there is something else that
is way more important. The culture.
It all started with a stupid Python troll at the SIGINT conference who wanted to troll our
cologne.rb booth. To be prepared for the next troll attack I started to investigate Python.
For that I talked with a lot of Python guys and wrote a few little things in Python to get a
feel for the language and the ecosystem. Luckily, at FrOSCon our Ruby booth was right next
to the pycologne folks and we talked a lot about the differences. During that time I got the
feeling that I knew what is different in the culture of both parties. Last month I had the
opportunity to test my theory in real life. The cologne.rb and the Django Cologne folks did a
joint meetup. And I took the opportunity to test my theory. And it got confirmed by lots of
the Python people.
Okay, now what is the difference in the culture? It is pretty easy. Python folks are
really conservative and afraid of change; Ruby folks love the new shiny stuff even if it
breaks older things. It's that simple. But it has huge consequences. One you can see, for
example, in the adoption of Ruby 1.9 vs Python 3. Both new versions made tons of breaking
changes. A lot of code needed changes to run on the new platform. In the Ruby world the
transition went pretty quickly. In the Python world it is very troublesome. Some Python people
even say that Python 3 is broken and all energy should be focused on the 2.x branch of the
language. The Ruby community saw the opportunities. The Python community only saw the update
problems. Yes, there have been update problems in the Ruby world, but we found an easy way to
fix this: isitruby19.com, a simple
platform that showed whether a gem was ready for 1.9. And if it wasn't and the gem was
important, it got fixed with pull requests or something similar. And the problems went away
fast.
Both models of thinking have pros and cons. The Python world is more stable: you can
update your Django installation without much trouble. But that also means new technology is
only added very slowly. The Ruby world loves changes -- so much that most of the "new stuff" in
the Python world was tested in the Ruby world first. We love changes so much that the Rails
core is built around that idea. You can easily change nearly everything and extend
everything. Most of the new stuff the Rails Core Team is testing right now for version 4 is
available as a plugin for Rails 3. This is pretty interesting if you love new things, love
change, and love playing around with stuff. If you don't, and hate the idea of breaking
changes, you are maybe better suited to the Python way. But don't be afraid of breaking
changes. They are all pretty well documented in the release guides. It's not voodoo.
I for myself love the Ruby mindset. Something like Rails or asset pipelines or all the
other things would not be possible if we were stuck with "no, don't change that, it works
pretty well that way". Someone has to be the leader. Someone has to play around with new
ideas. Yes, some ideas won't fly; some are removed pretty quickly. But at least we tried
them. Yes, I know that some people prefer the conservative way. If you consider yourself to
be like that, you should at least try Python. I stay with Ruby.
GitHut lists its ranking according to the following characteristics: active repositories,
the number of pushes and the pushes per repository, as well as the new forks per repository,
the open issues per repository and the new watchers per repository.
GitHut's top 20 ranking currently looks like this:
The list was moved to GitHub by Victor Felder for collaborative updating and maintenance. It
grew to become one of the most popular
repositories on Github , with over 80,000 stars, over 4000 commits, over 800 contributors,
and over 20,000 forks.
The repo is now administered by the Free Ebook Foundation , a not-for-profit organization
devoted to promoting the creation, distribution, archiving and sustainability of free ebooks.
Donations to the
Free Ebook Foundation are tax-deductible in the US.
The author pays outsize attention to superficial things like popularity with particular
groups of users. For sysadmins this matters less than the level of integration with the underlying
OS and the quality of the debugger.
The real story is that Python has a less steep initial learning curve, and that helped to entrench
it in universities. Students then brought it to large companies like Red Hat. The rest is history. Google's support also was a positive factor.
Python also basked in OO hype. So it is now a more widespread language, much like Microsoft
Basic once was. That does not automatically make it a better language in the sysadmin domain.
The phase " Perl's quirky stylistic conventions, such as using $ in front to declare
variables, are in contrast for the other declarative symbol $ for practical programmers today�the
money that goes into the continued development and feature set of Perl's frenemies such as Python
and Ruby." smells with "syntax junkie" mentality. What wrong with dereferencing using $ symbol?
yes it creates problem if you are using simultaneously other languages like C or Python, but for
experienced programmer this is a minor thing. Yes Perl has some questionable syntax choices so so
are any other language in existence. While painful, it is the semantic and "programming
environment" that mater most.
My impression is that Perl returned to its roots -- migrated back to being an
excellent sysadmin tool -- as there is strong synergy between Perl and Unix shells. The fact
that Perl 5 is reasonably stable is a huge plus in this area.
Notable quotes:
"... By the late 2000s Python was not only the dominant alternative to Perl for many text parsing tasks typically associated with Perl (i.e. regular expressions in the field of bioinformatics ) but it was also the most proclaimed popular language , talked about with elegance and eloquence among my circle of campus friends, who liked being part of an up-and-coming movement. ..."
"... Others point out that Perl is left out of the languages to learn first �in an era where Python and Java had grown enormously, and a new entrant from the mid-2000s, Ruby, continues to gain ground by attracting new users in the web application arena (via Rails ), followed by the Django framework in Python (PHP has remained stable as the simplest option as well). ..."
"... In bioinformatics, where Perl's position as the most popular scripting language powered many 1990s breakthroughs like genetic sequencing, Perl has been supplanted by Python and the statistical language R (a variant of S-plus and descendent of S , also developed in the 1980s). ..."
"... By 2013, Python was the language of choice in academia, where I was to return for a year, and whatever it lacked in OOP classes, it made up for in college classes. Python was like Google, who helped spread Python and employed van Rossum for many years. Meanwhile, its adversary Yahoo (largely developed in Perl ) did well, but comparatively fell further behind in defining the future of programming. Python was the favorite and the incumbent; roles had been reversed. ..."
"... from my experience? Perl's eventual problem is that if the Perl community cannot attract beginner users like Python successfully has ..."
"... The fact that you have to import a library, or put up with some extra syntax, is significantly easier than the transactional cost of learning a new language and switching to it. ..."
"... MIT Python replaced Scheme as the first language of instruction for all incoming freshman, in the mid-2000s ..."
I first heard of Perl when I was in middle school in the early
2000s. It was one of the world's most versatile programming languages, dubbed the
Swiss army knife of the Internet.
But compared to its rival Python, Perl has faded from popularity. What happened to the web's most
promising language? Perl's low entry barrier compared to compiled, lower level language alternatives
(namely, C) meant that Perl attracted users without a formal CS background (read: script kiddies
and beginners who wrote poor code). It also boasted a small group of power users ("hardcore hackers")
who could quickly and flexibly write powerful, dense programs that fueled Perl's popularity to a
new generation of programmers.
A central repository (the Comprehensive Perl Archive Network, or
CPAN ) meant that for every person who wrote code,
many more in the Perl community (the
Programming Republic of Perl
) could employ it. This, along with the witty evangelism by eclectic
creator Larry Wall , whose interest in
language ensured that Perl led in text parsing, was a formula for success during a time in which
lots of text information was spreading over the Internet.
As the 21st century approached, many pearls of wisdom were wrought to move and analyze information
on the web. Perl did have a learning curve -- often meaning that it was the third or fourth language
learned by adopters -- but it sat at the top of the stack.
"In the race to the millennium, it looks like C++ will win, Java will place, and Perl will show,"
Wall said in the third State of Perl address in 1999. "Some of you no doubt will wish we could erase
those top two lines, but I don't think you should be unduly concerned. Note that both C++ and Java
are systems programming languages. They're the two sports cars out in front of the race. Meanwhile,
Perl is the fastest SUV, coming up in front of all the other SUVs. It's the best in its class. Of
course, we all know Perl is in a class of its own."
Then Python came along. Compared to Perl's straight-jacketed scripting, Python was a lopsided
affair. It even took after its namesake, Monty Python's Flying Circus. Fittingly, most of Wall's
early references to Python were lighthearted jokes at its expense. Well, the millennium passed, computers
survived Y2K , and
my teenage years came and went. I studied math, science, and humanities but kept myself an arm's
distance away from typing computer code. My knowledge of Perl remained like the start of a new text
file: cursory , followed by a lot of blank space to fill up.
In college, CS friends at Princeton raved about Python as their favorite language (in spite of
popular professor
Brian
Kernighan on campus, who helped popularize C). I thought Python was new, but I later learned
it was around when I grew up as well,
just not visible on the
charts.
By the late 2000s Python was not only the dominant alternative to Perl for many text parsing tasks
typically associated with Perl (i.e.
regular expressions
in the field of
bioinformatics
) but it was also the
most proclaimed popular language
, talked about with elegance and eloquence among my circle of
campus friends, who liked being part of an up-and-coming movement.
Despite Python and Perl's
well documented rivalry
and design decision differences -- which persist to this day -- they occupy a similar niche in the
programming ecosystem. Both are frequently referred to as "scripting languages," even though later
versions are retro-fitted with object oriented programming (OOP) capabilities.
Stylistically, Perl and Python have different philosophies. Perl's best known mottos is "
There's
More Than One Way to Do It ". Python is designed to have one obvious way to do it. Python's construction
gave an advantage to beginners: A syntax with more rules and stylistic conventions (for example,
requiring whitespace indentations for functions) ensured newcomers would see a more consistent set
of programming practices; code that accomplished the same task would look more or less the same.
Perl's construction favors experienced programmers: a more compact, less verbose language with built-in
shortcuts which made programming for the expert a breeze.
During the dotcom era and the tech recovery of the mid to late 2000s, high-profile websites and
companies such as
Dropbox
(Python) and Amazon and
Craigslist
(Perl), in addition to some of the world's largest news organizations (BBC, Perl), used
the languages to accomplish tasks integral to the functioning of doing business on the Internet.
But over the course of the last
15 years, not only has the way companies do business changed and grown, but so have the tools they
use -- and they have grown unequally, to the detriment of Perl. (A growing trend that was
identified in the last comparison of the languages,
" A Perl
Hacker in the Land of Python ," as well as from the Python side
a Pythonista's evangelism aggregator
, also done in the year 2000.)
Today, Perl's growth has stagnated. At the Orlando Perl Workshop in 2013, one of the talks was
titled "
Perl is not Dead, It is a Dead End
," and claimed that Perl now existed on an island. Once Perl
programmers checked out, they always left for good, never to return.
Others
point out that Perl is
left out of the languages to learn first
-- in an era where Python and Java had grown enormously,
and a new entrant from the mid-2000s, Ruby, continues to gain ground by attracting new users in the
web application arena (via Rails ), followed
by the Django framework in Python (PHP
has remained stable as the simplest option as well).
In bioinformatics, where Perl's position as the most popular scripting language powered many 1990s
breakthroughs like genetic sequencing, Perl has been supplanted by Python and the statistical language
R (a variant of S-plus and descendent of
S , also
developed in the 1980s).
In scientific computing, my present field, Python, not Perl, is the open source overlord, even
expanding at Matlab's expense (also a
child of the 1980s
, and similarly retrofitted with
OOP abilities
). And upstart PHP grew in size
to the point where it is now arguably the most common language for web development (although its
position is dynamic, as Ruby
and Python have quelled PHP's dominance and are now entrenched as legitimate alternatives.)
While Perl is not in danger of disappearing altogether, it
is in danger of
losing cultural relevance , an ironic fate given Wall's love of language. How has Perl become
the underdog, and can this trend be reversed? (And, perhaps more importantly, will
Perl 6 be released!?)
Why Python, and not Perl?
Perhaps an illustrative example of what happened to Perl is my own experience with the language.
In college, I still stuck to the contained environments of Matlab and Mathematica, but my programming
perspective changed dramatically in 2012. I realized that lacking knowledge of structured computer code outside the "walled garden" of a desktop application prevented me from fully simulating hypotheses about the natural world, let alone analyzing data sets using the web, which was also becoming an increasingly valuable skill set, both intellectually and financially.
One year after college, I resolved to learn a "real" programming language in a serious manner: an all-in immersion taking me over the hump of knowledge so that, even if I took a break, I would still retain enough to pick up where I left off. An older alum from my college who shared similar interests -- and an experienced programmer since the late 1990s -- convinced me of his favorite language to sift and sort through text in just a few lines of code, and "get things done": Perl. Python, he dismissed, was "what academics used to think." I was about to be acquainted formally.
Before making a definitive decision on which language to learn, I took stock of online resources, lurked on PerlMonks, and acquired several used O'Reilly books, the Camel Book and the Llama Book, in addition to other beginner books. Yet once again, Python reared its head, and even Perl forums and sites dedicated to the language were lamenting the digital siege their language was succumbing to. What happened to Perl? I wondered. Ultimately undeterred, I found enough to get started (quality over quantity, I figured!), and began studying the syntax and working through examples.
But it was not to be. In trying to overcome the engineered flexibility of Perl's syntax choices,
I hit a wall. I had adopted Perl for text analysis, but upon accepting an engineering graduate program
offer, switched to Python to prepare.
By this point, CPAN's enormous advantage had been whittled away by ad hoc, hodgepodge efforts from uncoordinated but overwhelming groups of Pythonistas that now assemble in Meetups, at startups, and on college and corporate campuses to evangelize the Zen of Python. This has created a lot of issues with importing (pointed out by Wall) and with package download synchronization to get scientific computing libraries (as I found), but has also resulted in distributions of Python such as Anaconda that incorporate the most important libraries besides the standard library to ease the time tariff on imports.
As if to capitalize on the zeitgeist, technical book publisher O'Reilly ran this ad, inflaming Perl devotees.
By 2013, Python was the language of choice in academia, where I was to return for a year, and whatever it lacked in OOP classes, it made up for in college classes. Python was also backed by Google, which helped spread the language and employed van Rossum for many years. Meanwhile, its adversary Yahoo (largely developed in Perl) did well, but comparatively fell further behind in defining the future of programming. Python was the favorite and the incumbent; the roles had been reversed.
So after six months of Perl-making effort, this straw of reality broke the Perl camel's back and caused a coup that overthrew the programming Republic which had established itself on my laptop. I sheepishly abandoned the llama. Several weeks later, the tantalizing promise of a new MIT edX course teaching general CS principles in Python, in addition to numerous n00b examples, made Perl's syntax all too easy to forget instead of regret.
Measurements of the popularity of programming languages, in addition to friends and fellow programming enthusiasts I have met in the development community in the past year and a half, have confirmed this trend, along with the rise of Ruby in the mid-2000s, which has also eaten away at Perl's ubiquity in stitching together programs written in different languages.
Historically, many arguments could explain away any one of these studies: perhaps Perl programmers do not cheerlead their language as much, since they are too busy productively programming; job listings or search engine hits could mean that a programming language has many errors and issues with it, or simply that there is a large temporary gap between supply and demand.
The concomitant picture, and one that many in the Perl community now acknowledge, is that Perl is now essentially a second-tier language, one that has its place but will not be among the first several languages known outside the computer science domain, such as Java, C, or now Python.
I believe Perl has a future, but it could be one for a limited audience. Present-day Perl is more suitable to users who have worked with the language from its early days and are already dressed to impress. Perl's quirky stylistic conventions, such as using $ in front of variables, now contrast with the other $ that matters to practical programmers today -- the money that goes into the continued development and feature set of Perl's frenemies such as Python and Ruby -- and with the high activation cost of learning Perl instead of implementing a Python solution. Ironically, much in the same way that Perl jested at other languages, Perl now finds itself at the receiving end.
What's wrong with Perl, from my experience? Perl's eventual problem is that if the Perl community cannot attract beginner users the way Python successfully has, it runs the risk of becoming like Children of Men: dwindling away to a standstill, vast repositories of hieroglyphic code looming in sections of the Internet and in data center partitions like the halls of the Mines of Moria. (Awe-inspiring and historical? Yes. Lively? No.)
Perl 6 has been in ongoing development since 2000. Yet after 14 years it is still not officially done, making it the equivalent of Chinese Democracy for Guns N' Roses. In Larry Wall's words: "We're not trying to make Perl a better language than C++, or Python, or Java, or JavaScript. We're trying to make Perl a better language than Perl. That's all." Perl may be on the same self-inflicted path to perfection as Axl Rose, underestimating not others but itself. "All" might still be too much.
Absent a game-changing Perl release (which could still be "too little, too late"), people who learn to program in Python have no need to switch if Python can fulfill their needs, even if it is widely regarded as second or third best in some areas. The fact that you have to import a library, or put up with some extra syntax, is significantly easier to accept than the transactional cost of learning a new language and switching to it. So over time, Python's audience stays young through the gateway strategy that van Rossum himself pioneered, Computer Programming for Everybody. (This effort has been a complete success. For example, in the mid-2000s MIT replaced Scheme with Python as the first language of instruction for all incoming freshmen.)
Python continues to gain footholds one by one in areas of interest, such as visualization (where Python still lags behind the graphics of Matlab, Mathematica, or the more recent d3.js), website creation (the Django framework is now a mainstream choice), scientific computing (including NumPy/SciPy), parallel programming (mpi4py with CUDA), machine learning, and natural language processing (scikit-learn and NLTK) -- and the list continues.
While none of these efforts are centrally coordinated by van Rossum himself, a continually expanding user base, and getting to CS students before other languages (even Java or C) do, increase the odds that collaborations within disciplines will emerge to build Python libraries for themselves, in the same open source spirit that made Perl a success in the 1990s.
As for me? I'm open to returning to Perl if it can offer me a significantly different experience
from Python (but "being frustrating" doesn't count!). Perhaps Perl 6 will be that release. However,
in the interim, I have heeded the advice of many others with a similar dilemma on the web. I'll just
wait and C .
Dan Lenski ("I do all my best thinking in Python, and I've written a lot of it"), updated Jun 1, 2015:
Though I may get flamed for it, I will put it even more bluntly than others have: Python is better than Perl. Python's syntax is cleaner, its object-oriented type system is more modern and consistent, and its libraries are more consistent. (EDIT: As Christian Walde points out in the comments, my criticism of Perl OOP is out-of-date with respect to the current de facto standard of Moo/se. I do believe that Perl's utility is still encumbered by historical baggage in this area and others.)
I have used both languages extensively for both professional work and personal projects (Perl mainly in 1999-2007, Python mainly since), in domains ranging from number crunching (both PDL and NumPy are excellent) to web-based programming (mainly with Embperl and Flask) to good ol' munging of text files and database CRUD.
Both Python and Perl have large user communities including many programmers who are far more skilled and experienced than I could ever hope to be. One of the best things about Python is that the community generally espouses this aspect of "The Zen of Python":
Python's philosophy rejects the Perl "there is more than one way to do it" approach to language design in favor of "there should be one-- and preferably only one --obvious way to do it".
... while this principle might seem stultifying or constraining at first, in practice it means
that most good Python programmers think about the principle of least surprise
and make it easier for others to read and interface with their code.
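(For reference, those aphorisms come from PEP 20, and any Python installation can print the full text of "The Zen of Python" directly from the interpreter:

# Prints the full text of "The Zen of Python" (PEP 20) to stdout.
import this

This is simply an easter egg in the standard library, but it is a convenient way to keep the whole list at hand.)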
In part as a consequence of this discipline, and definitely because of Python's strong typing , and
arguably because of its "cleaner" syntax, Python code is considerably easier to read
than Perl code. One of the main events that motivated me to switch was the experience of
writing code to automate lab equipment in grad school. I realized I couldn't read Perl code I'd
written several months prior, despite the fact that I consider myself a careful and consistent
programmer when it comes to coding style ( Dunning–Kruger effect , perhaps?
:-P).
* For example, the one-liner perl -i.bak -pe 's/foo/bar/g' *.txt will go through a bunch of text files and replace foo with bar everywhere while making backup files with the .bak extension.
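For comparison, here is a rough Python sketch of that one-liner; it uses only the standard library, and the *.txt glob is just an illustrative assumption:

# Rough Python counterpart of the Perl one-liner above: edit every *.txt file
# in place, keep a .bak backup, and replace foo with bar on each line.
import fileinput
import glob
import sys

for line in fileinput.input(glob.glob("*.txt"), inplace=True, backup=".bak"):
    # With inplace=True, anything written to stdout goes back into the file.
    sys.stdout.write(line.replace("foo", "bar"))

It works, but it is clearly several lines and two imports away from Perl's single shell command.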
When to use perl:
whenever you have to write a quick and dirty script that uses regular expressions or that executes system commands. Perl was designed for automating operations on files and file systems; in fact, in perl you don't need to import anything to use regular expressions or to call system commands, while in python you need to import a library for these operations (a short Python counterpart is sketched right after this list).
perl programmers usually adopt the Unix philosophy of "make one tool that does only one specific task". This is the reason why most perl scripts don't even include functions, not to speak of objects: each perl script is written to execute a single task, and then the scripts are piped together through the Unix piping system. However, consider that this approach can lead to a lot of confusion, and is generally difficult to maintain. Moreover, you can do "one tool, one task" in python, too; the only difference is that you will have to import sys in every script.
whenever you need to use a library available only for perl. Have a look at the modules in
CPAN . I think that BioPerl is still
slightly more complete than BioPython, and that there are more bioinformatics-oriented
modules for perl than for python (unfortunately). The Ensembl APIs are also available only
for perl.
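A minimal Python sketch of the counterpart mentioned in the first item above: regular expressions and system commands work fine in Python, they simply have to be imported first (the sample text and the uptime command are just illustrative assumptions):

# Regular expressions and system commands in Python: both live in standard
# library modules that must be imported, unlike Perl's built-in support.
import re
import subprocess

line = "error: disk almost full"              # hypothetical input line
if re.search(r"error", line):
    print(re.sub(r"error", "WARNING", line))  # WARNING: disk almost full

# Run a system command and capture its output (assumes a POSIX 'uptime' is available).
result = subprocess.run(["uptime"], capture_output=True, text=True)
print(result.stdout.strip())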
When to use python:
whenever you need to use the script more than once.
whenever there is even a remote possibility that one of your colleagues or someone else will ever use your script.
whenever you need to use functions or objects. As I was saying before, the majority of perl scripts don't even include functions, because they are more difficult to write than in python; for example, in perl you need to learn how to pass a reference to a variable, and this leads to scripts that are more difficult to understand (in perl).
if this is the first programming language you are learning, I suggest you start with Python. Python is much cleaner than Perl, and is designed to respect some good practices that any programmer should know. Have a look at the Zen of Python for some ideas.
Edit after 5 years:
Python now has support for tabular data structures (data frames), which are very important for data analysts. Use Python or R whenever you have to work with tables (see the short sketch after this list).
Python now has excellent support for machine learning algorithms and data analysis in general. Use Python or R whenever you need to do one of these things.
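A minimal sketch of the data-frame point above, assuming the pandas library (the usual data-frame package in Python) and a made-up table of expression values:

# A tiny pandas data frame: group a hypothetical table by gene and take the
# per-group mean, the kind of tabular work that used to be R's home turf.
import pandas as pd

df = pd.DataFrame({
    "gene":   ["brca1", "brca1", "tp53", "tp53"],
    "sample": ["s1", "s2", "s1", "s2"],
    "expr":   [2.1, 2.4, 0.9, 1.1],
})
print(df.groupby("gene")["expr"].mean())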
I think very few people are completely agnostic about programming languages, especially when it comes to languages with very similar strengths and weaknesses, like perl/python/ruby; therefore there is no general reason for using one language vs. the other.
It is more common to find someone equally proficient in C and perl, than say equally
proficient in perl and python. My guess would be that complementary languages require
complementary skill sets that occupy different parts of the brain, whereas similar concepts
will clash more easily.
You are totally correct with your initial assumption. This question is similar to choosing between Spanish and English: which language to choose? Well, if you go to Spain,...
All (programming) languages are equal, in the sense that you can solve the same class of problems with them. Once you know one language, you can easily learn all imperative languages.
Use the language that you already master or that suits your style. Both (Perl & Python) are
interpreted languages and have their merits. Both have extensive Bio-libraries, and both have
large archives of contributed packages.
An important criterion to decide is the availability of rich, stable, and well maintained
libraries. Choose the language that provides the library you need. For example, if you want to
program web-services (using SOAP not web-sites), you better use Java or maybe C#.
Conclusion: it does no harm to learn new languages. And no flame wars please.
written 5.8 years ago by Michael Dondrup
What is the Josephus problem? To quote from Concepts, Techniques, and Models of Computer
Programming (a daunting title if ever there was one):
Flavius Josephus was a Roman historian of Jewish origin. During the Jewish-Roman wars of the first century AD, he was in a cave with fellow soldiers, 40 men in all, surrounded by enemy Roman troops. They decided to commit suicide by standing in a ring and counting off each third man. Each man so designated was to commit suicide...Josephus, not wanting to die, managed to place himself in the position of the last survivor.
In the general version of the problem, there are n soldiers numbered from 1 to
n and each k -th soldier will be eliminated. The count starts from the first
soldier. What is the number of the last survivor?
I decided to model this situation using objects in three different scripting
languages, Perl, Ruby, and Python. The solution in each of the languages is similar. A Person
class is defined, which knows whether it is alive or dead, who the next person in the circle
is, and what position number it is in. There are methods to pass along a kill signal, and to
create a chain of people. Either of these could have been implemented using iteration, but I
wanted to give recursion a whirl, since it's tougher on the languages. Here are my results.
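The author's own three solutions are not reproduced here, but the sketch below is my own illustrative Python version of the approach described above (a Person class, a linked circle, and a recursively passed kill signal); the counting convention assumed is that the person who restarts the count is counted as 1:

# Josephus problem, modelled with a Person class as described above.
class Person:
    """One soldier in the circle: knows his number, whether he is alive,
    and who the next person in the circle is."""
    def __init__(self, number):
        self.number = number
        self.alive = True
        self.next = None                     # linked up when the circle is built

    def pass_kill_signal(self, count):
        """Recursively pass the count around the circle, skipping the dead.
        The person on whom the count reaches 1 dies; his neighbour is
        returned so the next round of counting can start there."""
        if not self.alive:
            return self.next.pass_kill_signal(count)
        if count == 1:
            self.alive = False
            return self.next
        return self.next.pass_kill_signal(count - 1)


def last_survivor(n, k):
    """Build a circle of n people and eliminate every k-th until one remains."""
    people = [Person(i) for i in range(1, n + 1)]
    for a, b in zip(people, people[1:] + people[:1]):
        a.next = b                           # close the circle
    current = people[0]
    for _ in range(n - 1):                   # n - 1 eliminations in total
        current = current.pass_kill_signal(k)
    while not current.alive:                 # step forward to the lone survivor
        current = current.next
    return current.number


print(last_survivor(40, 3))                  # 28 under this counting convention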
Crowdsourcing, Open Data and Precarious Labour
Crowdsourcing and microtransactions are two halves of the same coin: they both mark new stages in the continuing devaluation of labour.
by Allana Mayer, February 24th, 2016
The cultural heritage industries (libraries, archives, museums,
and galleries, often collectively called GLAMs) like to consider themselves the tech industry's
little siblings. We're working to develop things like Linked Open Data, a decentralized network
of collaboratively-improved descriptive metadata; we're building our own open-source tech to
make our catalogues and collections more useful; we're pushing scholarly publishing out from
behind paywalls and into open-access platforms; we're driving innovations in accessible tech.
We're only different in a few ways. One, we're a distinctly
feminized set of professions , which comes with a large set of internally- and
externally-imposed assumptions. Two, we rely very heavily on volunteer labour, and not just in
the
internship-and-exposure vein : often retirees and non-primary wage-earners are the people
we "couldn't do without." Three, the underlying narrative of a "helping" profession !
essentially a social service ! can push us to ignore the first two distinctions, while driving
ourselves to perform more and expect less.
I suppose the major way we're different is that tech doesn't acknowledge us, treat us with
respect, build things for us, or partner with us, unless they need a philanthropic opportunity.
Although, when some ingenue autodidact bootstraps himself up to a billion-dollar IPO, there's a
good chance he's been educating himself using our free resources. Regardless, I imagine a few
of the issues true in GLAMs are also true in tech culture, especially in regards to labour and
how it's compensated.
Here's an example. One of the latest trends is crowdsourcing: admitting we don't have all
the answers, and letting users suggest some metadata for our records. (Not to be confused with
crowdfunding.) The biggest example of this is Flickr Commons: the Library of Congress partnered
with Yahoo! to publish thousands of images that had somehow ended up in the LOC's collection
without identifying information. Flickr users were invited to tag pictures with their own
keywords or suggest descriptions using comments.
Many orphaned works (content whose copyright status is unclear) found their way conclusively
out into the public domain (or back into copyright) this way. Other popular crowdsourcing
models include gamification ,
transcription of handwritten documents (which can't be done with Optical Character
Recognition), or proofreading OCR output on digitized texts. The most-discussed side benefits
of such projects include the PR campaign that raises general awareness about the organization,
and a "lifting of the curtain" on our descriptive mechanisms.
The problem with crowdsourcing is that it's been conclusively proven not to function in the way we imagine it does: a handful of users end up contributing massive amounts of labour, while the majority of those signed up might do a few tasks and then disappear. Seven
users in the "Transcribe Bentham" project contributed to 70% of
the manuscripts completed; 10 "power-taggers" did the lion's share of the Flickr
Commons' image-identification work. The function of the distributed digital model of
volunteerism is that those users won't be compensated, even though many came to regard their
accomplishments as full-time jobs .
It's not what you're thinking: many of these contributors already had full-time jobs ,
likely ones that allowed them time to mess around on the Internet during working hours. Many
were subject-matter experts, such as the vintage-machinery hobbyist who
created entire datasets of machine-specific terminology in the form of image tags. (By the way,
we have a cute name for this: "folksonomy," a user-built taxonomy. Nothing like reducing unpaid
labour to a deeply colonial ascription of communalism.) In this way, we don't have precisely
the free-labour-for-exposure/project-experience
problem the tech industry has ; it's not our internships that are the problem. We've moved
past that, treating even our volunteer labour as a series of microtransactions. Nobody's
getting even the dubious benefit of job-shadowing, first-hand looks at business practices, or
networking. We've completely obfuscated our own means of production. People who submit metadata
or transcriptions don't even have a means of seeing how the institution reviews and ingests
their work, and often, to see how their work ultimately benefits the public.
All this really says to me is: we could've hired subject experts to consult, and given them
a living wage to do so, instead of building platforms to dehumanize labour. It also means
our systems rely on privilege , and will undoubtedly contain and promote content with a
privileged bias, as Wikipedia does. (And hey, even Wikipedia contributions can sometimes result
in paid Wikipedian-in-Residence jobs.)
If libraries continue on with their veneer of passive and objective authorities that offer
free access to all knowledge, this underlying bias will continue to propagate subconsciously.
As in
Mechanical Turk , being "slightly more
diverse than we used to be" doesn't get us any points, nor does it assure anyone that our
labour isn't coming from countries with long-exploited workers.
I also want to draw parallels between the free labour of crowdsourcing and the free labour
offered in civic hackathons or open-data contests. Specifically, I'd argue that open-data
projects are less ( but
still definitely ) abusive to their volunteers, because at least those volunteers have a
portfolio object or other deliverable to show for their work. They often work in groups and get
to network, whereas heritage crowdsourcers work in isolation.
There's also the potential for converting open-data projects to something monetizable: for
example, a Toronto-specific bike-route app can easily be reconfigured for other cities and
sold; while the Toronto version stays free under the terms of the civic initiative, freemium
options can be added. The volunteers who supply thousands of transcriptions or tags can't
usually download their own datasets and convert them into something portfolio-worthy, let alone
sellable. Those data are useless without their digital objects, and those digital objects still
belong to the museum or library.
Crowdsourcing and microtransactions are two halves of the same coin: they both mark new
stages in the continuing devaluation of labour, and they both enable misuse and abuse of people
who increasingly find themselves with few alternatives. If we're not offering these people
jobs, reference letters, training, performance reviews, a "foot in the door" (cronyist as that
is), or even acknowledgement by name, what impetus do they have to contribute? As with
Wikipedia, I think the intrinsic motivation for many people to supply us with free labour is
one of two things: either they love being right, or they've been convinced by the feel-good
rhetoric that they're adding to the net good of the world. Of course, trained librarians,
archivists, and museum workers have fallen sway to the
conflation of labour and identity , too, but we expect to be paid for it.
As in tech, stereotypes and PR obfuscate labour in cultural heritage. For tech, an
entrepreneurial spirit and a tendency to buck traditional thinking; for GLAMs, a passion for
public service and opening up access to treasures ancient and modern. Of course, tech
celebrates the autodidactic dropout; in GLAMs, you need a masters. Period. Maybe two. And
entry-level jobs in GLAMs require one or more years of experience, across the board.
When library and archives students go into massive student debt, they're rarely apprised of
the
constant shortfall of funding for government-agency positions, nor do they get told how
much work is done by volunteers (and, consequently, how much of the job is monitoring and
babysitting said volunteers). And they're not trained with enough technological competency to
sysadmin anything , let alone build a platform that pulls crowdsourced data into an
authoritative record. The costs of commissioning these platforms aren't yet being made public,
but I bet paying subject experts for their hourly labour would be cheaper.
Solutions
I've tried my hand at many of the crowdsourcing and gamifying interfaces I'm here to
critique. I've never been caught up in the "passion" ascribed to those super-volunteers who
deliver huge amounts of work. But I can tally up other ways I contribute to this problem: I
volunteer for scholarly tasks such as peer-reviewing, committee work, and travelling on my own
dime to present. I did an unpaid internship without receiving class credit. I've put my
research behind a paywall. I'm complicit in the established practices of the industry, which
sits uneasily between academic and social work: neither of those spheres have ever been
profit-generators, and have always used their codified altruism as ways to finagle more labour
for less money.
It's easy to suggest that we outlaw crowdsourced volunteer work, and outlaw
microtransactions on Fiverr and MTurk, just as the easy answer would be to outlaw Uber and Lyft
for divorcing administration from labour standards. Ideally, we'd make it illegal for
technology to wade between workers and fair compensation.
But that's not going to happen, so we need alternatives. Just as unpaid internships are
being eliminated ad-hoc through corporate pledges, rather than being prohibited
region-by-region, we need pledges from cultural-heritage institutions that they will pay for
labour where possible, and offer concrete incentives to volunteer or intern otherwise. Budgets
may be shrinking, but that's no reason not to compensate people at least through resume and
portfolio entries. The best template we've got so far is the Society of
American Archivists' volunteer best practices , which includes "adequate training and
supervision" provisions, which I interpret to mean outlawing microtransactions entirely. The
Citizen Science
Alliance , similarly, insists on "concrete outcomes" for its crowdsourcing projects, to "
never
waste the time of volunteers ." It's vague, but it's something.
We can boycott and publicly shame those organizations that promote these projects as fun
ways to volunteer, and lobby them to instead seek out subject experts for more significant
collaboration. We've seen a few
efforts to shame job-posters for unicorn requirements and pathetic salaries, but they've
flagged without productive alternatives to blind rage.
There are plenty more band-aid solutions. Groups like Shatter The Ceiling offer cash to women of colour who take unpaid internships. GLAM-specific internship awards are relatively common, but could: be bigger, focus on diverse applicants who need extra support, and have
eligibility requirements that don't exclude people who most need them (such as part-time
students, who are often working full-time to put themselves through school). Better yet, we can
build a tech platform that enables paid work, or at least meaningful volunteer projects. We
need nationalized or non-profit recruiting systems (a digital "volunteer bureau") that matches
subject experts with the institutions that need their help. One that doesn't take a cut
from every transaction, or reinforce power imbalances, the way Uber does. GLAMs might even find
ways to combine projects, so that one person's work can benefit multiple institutions.
GLAMs could use plenty of other help, too: feedback from UX designers on our catalogue
interfaces, helpful
tools , customization of our vendor platforms, even turning libraries into Tor relays or exits .
The open-source community seems to be looking for ways to contribute meaningful volunteer
labour to grateful non-profits; this would be a good start.
What's most important is that cultural heritage preserves the ostensible benefits of
crowdsourcing – opening our collections and processes up for scrutiny, and admitting the limits of our knowledge –
without the exploitative labour practices. Just like in tech, a few more glimpses behind the
curtain wouldn't go astray. But it would require deeper cultural shifts, not least in the
self-perceptions of GLAM workers: away from overprotective stewards of information, constantly
threatened by dwindling budgets and unfamiliar technologies, and towards facilitators,
participants in the communities whose histories we hold.
Use a var with more text only if it exists. See "Parameter Expansion" in the bash manpage. They refer to this as "Use Alternate Value", but here we are including the var itself in the alternate value.
If (and only if) the variable is not set, prompt users and give them a default option already filled in. The read command reads input and puts it into a variable. With -i you set an initial value; in this case I used a known environment variable.
debsecan --format detail
2015-10-22 18:46:41
List known debian vulnerabilities on your system -- many of which may not yet be patched.
You can search for CVEs at https://security-tracker.debian.org/tracker/ or use --report to
get full links. This can be added to cron, but unless you're going to do manual patches, you'd
just be torturing yourself.
tr '\0' ' ' </proc/21679/cmdline ; echo
2015-09-25 22:08:31
Show the command line for a PID, converting nulls to spaces and a newline
Rename file to same name plus datestamp of last modification.
Note that the -i will not help in a script. Proper error checking is required.
echo "I am $BASH_SUBSHELL levels nested";
2014-06-20 20:33:43
diff -qr /dirA /dirB
2014-04-01 21:42:19
shows which files differ in two directories
find ./ -type l -ls
2014-03-21 17:13:39
Show all symlinks
ls | xargs WHATEVER_COMMAND
xargs will automatically determine how many args are too many and only pass a reasonable number of them at a time. In the example, 500,002 file names were split across 26 invocations of the command "echo".
killall conky
kill a process (e.g. conky) by its name; useful when debugging conky :)
nmap -sn 192.168.1.0/24
2014-01-28 23:32:18
Ping all hosts on 192.168.1.0/24
find . -name "pattern" -type f -printf "%s\n" | awk '{total += $1} END {print total}'
2014-01-16 01:16:18
Find files and calculate size of result in shell. Using find's internal stat to get the file
size is about 50 times faster than using -exec stat.
Move all epub keyword containing files to Epub folder
CMD=chrome ; ps h -o pmem -C $CMD | awk '{sum+=$1} END {print sum}'
Show total cumulative memory usage of a process that spawns multiple instances of itself
mussh -h 192.168.100.{1..50} -m -t 10 -c uptime
2013-11-27 18:01:12
This will run them at the same time, with a ten-second timeout for each host. Also, mussh will prepend the IP address to each line of output so you know which host responded with which time.
The use of the sequence expression {1..50} is not specific to mussh. The `seq ...` works,
but is less efficient.
du -Sh | sort -h | tail
2013-11-27 17:50:11
Which files/dirs waste my disk space
I added -S to du so that you don't include /foo/bar/baz.iso in /foo, and changed sort's -n to -h so that it can properly sort the human-readable sizes.
Avoiding a for loop brought the time down to less than 3 seconds on my old machine. And just to be clear, 33554432 = 8192 * 4096.
sudo lsof -iTCP:25 -sTCP:LISTEN
2013-11-12 17:32:34
Check if TCP port 25 is open
for i in {1..4096}; do base64 /dev/urandom | head -c 8192 > dummy$i.rnd ; done
2013-11-12 00:36:10
Create a bunch of dummy text files
Using the 'time' command, running this with 'tr' took 28 seconds (and change) each time but
using base64 only took 8 seconds (and change). If the file doesn't have to be viewable,
pulling straight from urandom with head only took 6 seconds (and change)
In computer science, abstraction is the process by which data and programs are defined with a representation similar to their meaning (semantics), while hiding away the implementation details. Abstraction tries to reduce and factor out details so that the programmer can focus on a few concepts at a time. A system can have several abstraction layers whereby different meanings and amounts of detail are exposed to the programmer. For example, low-level abstraction layers expose details of the hardware where the program is run, while high-level layers deal with the business logic of the program.
That might be a bit too wordy for some people, and not at all clear. Here's
my analogy of abstraction.
Abstraction is like a car
A car has a few features that make it unique.
A steering wheel
Accelerator
Brake
Clutch
Transmission (Automatic or Manual)
If someone can drive a Manual transmission car, they can drive any Manual transmission car. Automatic drivers, sadly, cannot drive a Manual transmission car without "relearning" the car. That is an aside; we'll assume that all cars are Manual transmission cars, as is the case in Ireland for most cars.
Since I can drive my car, which is a Mitsubishi Pajero, that means that I can drive your car, whether it is a Honda Civic, Toyota Yaris, or Volkswagen Passat.
All I need to know, in order to drive a car, any car, is how to use the brakes, accelerator, steering wheel, clutch and transmission. Since I already know this in my car, I can abstract away your car and its controls.
I do not need to know the inner workings of your car in order to drive it, just the controls. I don't need to know how exactly the brakes work in your car, only that they work. I don't need to know that your car has a turbo charger, only that when I push the accelerator, the car moves. I also don't need to know the exact revs at which I should gear up or gear down (although that would be better on the engine!).
Virtually all controls are the same. Standardization means that the clutch, brake and accelerator are all in the same place, regardless of the car. This means that I do not need to relearn how a car works. To me, a car is just a car, and is interchangeable with any other car.
Abstraction means not caring
As a programmer, or someone using a third-party API (for example), abstraction means not caring how the inner workings of some function work -- the linked list data structure, the variable names inside the function, the sorting algorithm used, and so on -- just that I have a standard (preferably unchanging) interface to do whatever I need to do.
Abstraction can be thought of as a black box: for input, you get output. That shouldn't be the whole story, but it often is. We need abstraction so that, as programmers, we can concentrate on other aspects of the program -- this is the cornerstone of large-scale, multi-developer software projects.
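Here is a small Python sketch of the car analogy above; the class names come from the cars mentioned, and the two-method "controls" interface is my own illustrative simplification:

# The Car interface is the set of controls; the concrete classes hide the
# implementation details, so drive() works with any car without caring which.
from abc import ABC, abstractmethod

class Car(ABC):
    @abstractmethod
    def accelerate(self): ...
    @abstractmethod
    def brake(self): ...

class Pajero(Car):
    def accelerate(self):
        return "turbo spools up, car moves"   # the driver never needs to know this
    def brake(self):
        return "car stops"

class Civic(Car):
    def accelerate(self):
        return "engine revs, car moves"
    def brake(self):
        return "car stops"

def drive(car: Car):
    # Only the interface matters: any Car can be driven the same way.
    car.accelerate()
    car.brake()

drive(Pajero())
drive(Civic())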
Posted by samzenpus
from the king-of-the-hill dept.
Nerval's Lobster writes: Developers assume that Swift, Apple's newish programming language for iOS and Mac OS X apps, will become extremely popular over the next few years. According to new data from RedMonk, a tech-industry analyst firm, Swift could reach that apex of popularity sooner rather than later. While the usual stalwarts -- including JavaScript, Java, PHP, Python, C#, C++, and Ruby -- top RedMonk's list of the most-used languages, Swift has, well, swiftly ascended 46 spots in the six months since the firm's last update, from 68th to 22nd. RedMonk pulls data from GitHub and Stack Overflow to create its rankings, due to those sites' respective sizes and the public nature of their data. While its top-ranked languages don't trade positions much between reports, there's a fair amount of churn at the lower end of the rankings. Among those "smaller" languages, R has enjoyed stable popularity over the past six months, Rust and Julia continue to climb, and Go has exploded upwards -- although CoffeeScript, often cited as a language to watch, has seen its support crumble a bit.
Dutch Gun (899105) on Wednesday February 04, 2015 @09:45PM (#48985989)
Re:not really the whole story (Score:5, Insightful)
More critically, the question I always ask about this is: "Used for what?"
Without that context, why does popularity even matter? For example, I'm a game developer, so my programming life revolves around
C++, at least for game-side or engine-level code - period. Nothing else is even on the radar when you're talking about highly-optimized,
AAA games. For scripting, Lua is a popular contender. For internal tools, C# seems to be quite popular. I've also seen Python
used for tool extensions, or for smaller tools in their own right. Javascript is generally only used for web-based games, or by
the web development teams for peripheral stuff.
I'll bet everyone in their own particular industry has their own languages which are dominant. For instance, if you're working
on the Linux kernel, you're obviously working in C. It doesn't matter what the hell everyone else does. If you're working in scientific
computing, are you really looking seriously at Swift? Of course not. Fortran, F#, or C++ are probably more appropriate, or perhaps
others I'm not aware of. A new lightweight iOS app? Swift it is!
Languages are not all equal. The popularity of Javascript is not the measure of merit of that particular language. It's a measure
of how popular web-based development is (mostly). C/C++ is largely a measure of how many native, high-performance-required applications
there are (games, OS development, large native applications). Etc, etc.
Raw popularity numbers probably only have one practical use, and that's finding a programming job without concern for the particular
industry. Or I suppose if you're so emotionally invested in a particular language, it's nice to know where it stands among them
all.
unrtst (777550) on Wednesday February 04, 2015 @10:34PM (#48986283)
... And not sure public github or stack overflow are really as representative as they want to believe
This story about python surpassing java as top learning language:
http://developers.slashdot.org... [slashdot.org]
Or this about 5 languages you'll need to learn for the next year and on:
http://news.dice.com/2014/07/2... [dice.com]
... those are all from the past year on slashdot, and there's loads more.
Next "top languages" post I see, I hope it just combines all the other existing stats to provide a weightable index (allow you
to tweak what's most important). Maybe BH can address that :-)
gavron (1300111) on Wednesday February 04, 2015 @08:21PM (#48985495)
68th to 22nd and there are many to go (Score:5, Insightful)
All new languages start out at the bottom, as Swift did. In time, the ones that don't get used fall down.
Swift has gotten up to 22nd, but the rest of the climb past the stragglers won't ever happen.
However, to be "the most popular language" is clearly no contest worth winning. Paris Hilton and Kim Kardashian are most popular
compared to Steven Hawking and Isaac Asimov.
Being popular doesn't mean better, useful, or even of any value whatsoever. It just means someone has a better marketing-of-crap
department.
There's a time to have popularity contests. It's called high school.
E
coop247 (974899) on Wednesday February 04, 2015 @08:54PM (#48985695)
Being popular doesn't mean better, useful, or even of any value whatsoever
PHP runs facebook, yahoo, wordpress, and wikipedia. Javascript runs everything on the internet. Yup, no value there.
UnknownSoldier (67820) on Wednesday February 04, 2015 @08:39PM (#48985617)
Popularity != Quality (Score:5, Insightful)
McDonalds may serve billions, but no one is trying to pass it off as gourmet food.
Kind of like PHP and Javascript. The most fucked up languages are the most popular ... Go figure.
* http://dorey.github.io/JavaScr... [github.io]
Xest (935314) on Thursday February 05, 2015 @04:47AM (#48987365)
Re:Popularity != Quality (Score:2)
I think this is the SO distortion effect.
Effectively the more warts a language has and/or the more poorly documented it is, the more questions that are bound to be asked
about it, hence the more apparent popularity if you use SO as a metric.
So if companies like Microsoft and Oracle produce masses of great documentation for their respective technologies and provide
entire sites of resources for them (such as www.asp.net or the MSDN developer forums) then they'll inherently see reduced "popularity"
on SO.
Similarly, some languages have a higher bar to entry than others: PHP and Javascript are both repeatedly sold as languages that beginners can start with, so it should be unsurprising that more questions are asked about them than are asked by people who have moved up the chain to enterprise languages like C#, Java, and C++.
But I shouldn't complain too much, SO popularity whilst still blatantly flawed is still a far better metric than TIOBE whose
methodology is just outright broken (they explain their methodology on their site, and even without high school statistics knowledge
it shouldn't take more than 5 seconds to spot gaping holes in their methodology).
I'm still amazed no one's done an actual useful study on popularity and simply scraped data from job sites each month. It'd
be nice to know what companies are actually asking for, and what they're paying. That is after all the only thing anyone really
wants to know when they talk about popularity - how likely is it to get me a job, and how well is it likely to pay? Popularity
doesn't matter beyond that as you just choose the best tool for the job regardless of how popular it is.
Shados (741919) on Wednesday February 04, 2015 @11:08PM (#48986457)
Re:Just learn C and Scala (Score:2)
It's not about the language, it's about the ecosystem. i.e.: .NET may be somewhere in between Java and Scala, and the basics of the framework are the same, but if you do high-end stuff, JVM languages and CLR languages are totally different. Different
in how you debug it in production, different in what the standards are, different in what patterns people expect you to use when
they build a library, different gotchas. And while you can pick up the basics in an afternoon, it can take years to really push
it.
Doesn't fucking matter if you're doing yet-another-e-commerce-site (and if you are, why?). Really fucking big deal if you do something
that's never been done before with a ridiculous amount of users.
"I tell them you learn to write the same way you learn to play golf," he once said. "You do it, and keep doing it until you get
it right. A lot of people think something mystical happens to you, that maybe the muse kisses you on the ear. But writing isn't divinely
inspired - it's hard work."
Last summer Lukas Ocilka mentioned the completion of the basic conversion of YaST from YCP to Ruby. At the time it was said the
change was needed to encourage contributions from a wider set of developers, and Ruby is said to be simpler and more flexible. Well,
today Jos Poortvliet posted an interview with two YaST developers explaining the move in more detail.
In a discussion with Josef Reidinger and David Majda, Poortvliet discovered the reason for the move was that all the original YCP developers had moved on to other things and everyone else felt YCP slowed them down: "It didn't support many useful concepts like OOP or exception handling, code written in it was hard to test, there were some annoying features (like a tendency to be 'robust', which really means hiding errors)."
Ruby was chosen because it is a well-known language in the openSUSE camp and was already being used on other SUSE projects (such as WebYaST). "The internal knowledge and standardization was the decisive factor." The translation went smoothly, according to the developers, because they "automated the whole process and did testing builds months in advance. We even did our custom builds of openSUSE 13.1 Milestones 2 and 3 with pre-release versions of YaST in Ruby."
For now, performance of the Ruby code is comparable to the YCP version, because the developers were concentrating on getting it working well during these first few phases, and users will notice very little, if any, visual change to the YaST interface. No more major changes are planned for this development cycle, but the new YaST will be used in 13.1, due out November 19.
I make more than $40k as a software developer, but it wasn't too long ago that I was making right around that amount.
I have an AAS (not a fancy degree, if you didn't already know), my GPA was 2.8, and I assure you that neither of those things
has EVER come up in a job interview. I'm also old enough that my transcripts are gone. (Schools only keep them for about 10 years.
After that, nobody's looking anyway.)
The factors that kept me from making more are:
Timing. The dot-com "crash" of 2000 happened during my last full semester of college. I didn't land a job in the
industry until 5 years later.
Lack of experience. Since the dot-bomb dropped during my college days, nobody wanted interns either. No experience
= no job.
Lack of money. I grew up in a just-above-the-poverty-line household. I had to scrape by to even get a community
college education, and that didn't get me a job once there were so many out-of-work developers on the job market after the
crash.
Location. The midwest is a "small market" even in the larger cities. You don't pay as much for housing, but you
also don't make as much.
So when I did finally land a programming job, it was as a code monkey in a PHP sweatshop. The headhunter wanted a decent payout,
so I started at $40k. No raises. Got laid off after a year and a half due to it being a sweatshop and I had outstayed my welcome.
(Basically, I wanted more money and they didn't want to give me any more money.)
Next job was a startup. Still $40k. Over 2.5 years, I got a couple of small raises. I topped out at $45k-ish before I got laid
off during the early days of the recession.
Next job was through a headhunter again. I asked for $50k, but the employer could only go $40k. After 3 years and a few raises,
I'm finally at $50k.
I could probably go to the larger employers in this city and make $70k, but that's really the limit in this area. Nobody in
this line of work makes more than about $80k here.
aralin
Not accurate, smaller companies pay more
This survey must be talking only about companies above a certain size. Our Silicon Valley startup has about 50 employees and the average engineering salaries are north of $150,000. Large companies like Google actually don't have to pay that much, because the hours are more reasonable. I know there are other companies in the area, too, that pay more than Google.
Re:Not accurate, smaller companies pay more (Score:4, Interesting)
by MisterSquid (231834) writes: on Thursday October 18, @11:16AM (#41693121)
Our Silicon Valley startup has about 50 employees and the average engineering salaries are north of $150,000.
I suppose there are some start-ups that do pay developers the value of the labor, but my own experience is a bit different in
that it was more stereotypical of Silicon-Valley startup compensation packages. That is, my salary was shamefully low (I was new
to the profession), just about unlivable for the Bay Area, and was offset with a very accelerated stock options plan.
I don't know Python but I can comment on Perl. I have written many elegant scripts for complex problems and I still love it. I often come across comments about how a programmer went back to his program six months later and had difficulty understanding it. For my part, I haven't had this problem, primarily because I consistently use a single syntax: if perl provides more than one way to do things, I choose and use only one. Secondly, I do agree that objects in perl are clunky and make for difficult writing/reading. I have never used them. This makes it difficult for me to write perl scripts for large projects. Perhaps this is where Python succeeds.
shane o mac
I was forced to learn Python in order to write scripts within Blender (Open Source 3D Modeler).
White Hat:
1. dir( object )
This is nice as it shows functions and constants.
Black Hat:
1. Indentation to denote code blocks. (!caca?)
PERL was more of an experiment instead of necessity. Much of what I know about regular expressions probably came from reading
about PERL. I never even wrote much code in PERL. You see CPAN (Repository for modules) alone makes up for all the drawbacks I
can't think of at the moment.
White Hat:
FreeBSD used to use PERL extensively for installation routines (4.7; I keep a copy of it in my music case although I don't know why, as I feel it's a good luck charm of sorts). Then I read that in 5.0 they started removing it in favor of shell script (Bash). Why?
Black Hat:
I'm drawing a blank here.
With freedom there are costs; you are allowed to do as you please. Place variables as you must and toss code amok.
You can discipline yourself to write great code in any language. VB-Script I write has the appearance of a standard practices
C application.
I think it's a waste of time to sit and debate over which language is suited for which project. Pick one and master it.
So you can't read PERL? Break the modules into several files. It's more about information management than artisan ability. Divide and conquer.
Peter
I disagree with mastering only one language. Often there will be trade-offs in a program that match very nicely with a particular language.
For example, you could code everything in C++/C/assembler, but that only makes sense when you really need speed or memory compactness. After all, I find it difficult to write basic file processing applications in C in under 10 minutes.
Perl examples use a lot of default variables and various ways to approach problems, but this is really a nightmare when you have to maintain someone else's code, especially if you don't know Perl. I think it's hard to understand (without a background in Perl).
Juls
I'm a perl guy not a py programmer so I won't detract from python [except for the braces, Guido should at least let the language
compile with them].
Note: Perl is like English -- it's a reflective language. So I can make nouns into adjectives and use the power of reflection. For example, 'The book on my living room table' vs. [Spanish] 'The book on the table of my living room'.
And this makes sense, because Larry Wall was a linguist and was very influenced by the fact that reflective languages can say more with less, because much is implied based on usage. These languages can also say the same thing many different ways. [Perl makes me pull my hair out. | Perl makes me pull out my hair.] And being that we have chromosomes that wire us for human language, these difficulties are soon mastered by even children. But we don't have the same affinity for programming languages (well, most of us), so yes, Perl can be a struggle in the beginning. But once you achieve a strong familiarity and stop trying to turn Perl into C or Python and allow Perl just to be Perl, you really, really start to enjoy it for those reasons you didn't like it before.
The biggest failure of Perl has been its users enjoying the higher end values of the language and failing to publish and document
simple examples to help non-monks get there. You shouldn't have to be a monk to seek wisdom at the monastic gates.
Example: Perl classes. Obtuse and hard to understand, you say? It doesn't have to be that way. I think that most programmers will understand and be able to write their own after just looking at this simple example. Keep in mind, we just use 'package' instead of 'class'. Bless tells the interpreter your intentions and is used explicitly because you can bless all kinds of things into a class (package).
my $calc = new Calc; # or Calc->new;
print $calc->Add(1);
print $calc->Add(9);
print $calc->Pie(67);

package Calc;

sub new
{
    my $class = shift;                     # first argument is the class name, 'Calc'
    my $self =
    {
        _undue      => undef,
        _currentVal => 0,
        _pie        => 3.14
    };
    bless $self, $class;                   # now $self belongs to the class 'Calc'
    return $self;
}

sub Add
{
    my ($self, $val) = @_;
    $self->{_undue} = $self->{_currentVal};              # save off the last value
    $self->{_currentVal} = $self->{_currentVal} + $val;  # add the scalars
    return $self->{_currentVal};                         # return the new value
}

sub Pie
{
    my ($self, $val) = @_;
    $self->{_undue} = $self->{_currentVal};              # save off the last value
    $self->{_currentVal} = $self->{_pie} * $val;         # multiply by pi
    return $self->{_currentVal};                         # return the new value
}

1;
Esther Schindler writes "Plenty of people want to get involved in open
source, but don't know where to start. In this article, Andy Lester lists
several ways to help out even if you lack confidence in your technical chops. Here are a couple of his suggestions: 'Maintenance
of code and the systems surrounding the code often are neglected in the rush to create new features and to fix bugs. Look to these
areas as an easy way to get your foot into a project. Most projects have a publicly visible trouble ticket system, linked from the
front page of the project's website and included in the documentation. It's the primary conduit of communication between the users
and the developers. Keeping it current is a great way to help the project. You may need to get special permissions in the ticketing
system, which most project leaders will be glad to give you when you say you want to help clean up the tickets.'" What's your
favorite low-profile way to contribute?
Perl is a far better applications type language than JAVA/C/C#. Each has their niche. Threads were always an issue in Perl,
and like OO, if you don't need it or know it don't use it.
My issue with Perl is when people get overly obfuscated with their code because they think that fewer characters and a few pointers make the code faster.
Unless you do some real smart OO-esque building, all you are doing is making it harder to figure out what you were thinking about. And please, perl programmers, don't buy into "self-documenting code". I am an old mainframer, and self-documenting code meant that as you wrote you added comments to the core parts of the code... I can call my subroutine "apple" to describe it, but is it really an apple? Or is it a tomato or a pomegranate? If written properly, Perl is very efficient code, and like all the other languages, if written incorrectly it's HORRIBLE. I have been writing perl since almost before 3.0 ;-)
That's my 3 cents. Have a HAPPY and a MERRY!
Nikolai Bezroukov
@steve Thanks for a valuable comment about the threat of overcomplexity junkies in Perl. That's a very important threat that can undermine the language's future.
@Gabor: A well-known fact is that PHP, which is a horrible language both in general design and in the implementation of most features you mentioned, is very successful and is widely used for large Web applications with a database backend (Mediawiki is one example). Also, if we think about all the dull, stupid and unreliable Java coding of large business applications that we see on the marketplace, the question arises whether we want this type of success ;-)
@Douglas: Mastering Perl requires a slightly higher level of qualification from developers than "Basic-style" development in PHP or commercial Java development (where Java typically plays the role of Cobol), which is mainstream these days. Also, many important factors are outside the technical domain: the ecosystem for Java is tremendous and is supported by players with deep pockets. The same is true for Python. Still, Perl has unique advantages, is universally deployed on Unix, and as such is and always will be attractive for thinking developers.
I think that for many large business applications which in those days often means Web application with database backend one
can use virtual appliance model and use OS facilities for multitasking. Nothing wrong with this approach on modern hardware. Here
Perl provides important advantages due to good integration with Unix.
Also structuring of a large application into modules using pipes and sockets as communication mechanism often provides very
good maintainability. Pointers are also very helpful and unique for Perl. Typically scripting languages do not provide pointers.
Perl does and as such gives the developer unique power and flexibility (with additional risks as an inevitable side effect).
Another important advantage of Perl is that it is a higher-level language than Python (to say nothing about Java) and stimulates the use of prototyping, which is tremendously important for large projects, as the initial specification is usually incomplete and incorrect. Also, despite the proliferation of overcomplexity junkies in the Perl community, some aspects of Perl prevent an excessive number of layers/classes, a common trap that undermines large projects in Java. Look at IBM's fiasco with Lotus Notes 8.5.
I think that Perl is great in a way it integrates with Unix and promote thinking of complex applications as virtual appliances.
BTW this approach also permits usage of a second language for those parts of the system for which Perl does not present clear
advantages.
Also Perl provide an important bridge to system administrators who often know the language and can use subset of it productively.
That makes it preferable for large systems which depend on customization such as monitoring systems.
The absence of a bytecode compiler hurts the development of commercial applications in Perl in more ways than one, but that's just a question
of money. I wonder why ActiveState missed this opportunity to increase its revenue stream. I also agree that the quality of many
CPAN modules could be improved, but abuse of CPAN along with fixation on OO is a typical trait of overcomplexity junkies, so this
has some positive aspect too :-).
I don't think that OO is a problem for Perl, if you use it where it belongs: in GUI interfaces. In many cases OO is used where
hierarchical namespaces are sufficient. Perl provides a clean implementation of the concept of namespaces. The problem is that
many people are trained in the Java/C++ style of OO, and as we know, to a hammer everything looks like a nail. ;-)
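As a minimal sketch of that point (mine, not the commenter's), Perl packages give you hierarchical namespaces that organize code with no objects at all; every identifier below is invented for illustration:

    use strict;
    use warnings;

    # Plain hierarchical namespaces: no bless, no objects, just packages.
    package Monitor::Disk;
    sub usage { return "disk usage report" }

    package Monitor::Net;
    sub usage { return "network usage report" }

    package main;
    print Monitor::Disk::usage(), "\n";    # fully qualified calls, no OO involved
    print Monitor::Net::usage(),  "\n";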
Allan Bowhill:
I think the original question Gabor posed implies there is a problem 'selling' Perl to companies for large projects. Maybe
it's a question of narrowing its role.
It seems to me that if you want an angle to sell Perl on, it would make sense to cast it (in a marketing sense) into
a narrower role that doesn't pretend to be everything to everyone. Because, despite what some hard-core Perl programmers
might say, the language is somewhat dated. It hasn't really changed all that much since the 1990s.
Perl isn't particularly specialized so it has been used historically for almost every kind of application imaginable. Since
it was (for a long time in the dot-com era) a mainstay of IT development (remember the 'duct tape' of the internet?) it gained
high status among people who were developing new systems in short time-frames. This may in fact be one of the problems in selling
it to people nowadays.
The FreeBSD OS even included Perl as part of their main (full) distribution for some time and if I remember correctly, Perl
scripts were included to manage the ports/packaging system for all the 3rd party software. It was taken out of the OS shortly
after the bust and committee reorganization at FreeBSD, where it was moved into third-party software. The package-management scripts
were re-written in C. Other package management utilities were effectively displaced by a Ruby package.
A lot of technologies have come along since the 90s which are more appealing platforms than Perl for web development, which
is mainly what it's about now.
If you are going to build modern web sites these days, you'll more than likely use some framework that utilizes object-oriented
languages. I suppose the Moose augmentation of Perl would have some appeal with that, but CPAN modules and addons like Moose are
not REALLY the Perl language itself. So if we are talking about selling the Perl language alone to potential adopters, you have
to be honest in discussing the merits of the language itself without all the extras.
Along those lines I could see Perl having special appeal being cast in a narrower role, as a kind of advanced systems batching
language - more capable and effective than say, NT scripting/batch files or UNIX shell scripts, but less suitable than object-oriented
languages, which pretty much own the market for web and console utilities development now.
But there is a substantial role for high-level batching languages, particularly in systems that build data for consumption
by other systems. These are traditionally implemented in the highest-level batching language possible. Such systems build things
like help files, structured (non-relational) databases (often used on high-volume commercial web services), and software. Not
to mention automating many systems administration tasks.
There are not many features or advantages of Perl that are unique to it in the realm of scripting languages, as there
were in the 90s. The simplicity of built-in Perl data structures and its regular expression capabilities are reflected almost identically
in Ruby, and are at least accessible in other strongly-typed languages like Java and C#.
The fact that Perl is easy to learn, stays consistent with the idea that "everything is a string", and does not force you
to formalize things into an object-oriented model are a few of its selling points. If it is cast as an advanced batching language,
there are almost no other languages that could compete with it in that role.
@Pascal: bytecode is nasty for the poor Sysadmin/Devop who has to run your code. She/he can never fix it when bugs arise. There
is no advantage to bytecode over interpreted.
Which in fact leads me to a good point.
All the 'selling points' of Java have failed to be of any real substance.
Cross-platform? Vendor applications are rarely supported on more than one platform, and rarely will work on any other platform.
Bytecode - it hasn't proved to provide any performance advantage, but merely made people's lives more difficult.
Object Oriented - it was new and cool, but even Java fails to be a 'pure' OO language.
In truth, Java is popular because it is popular.
Lots of people don't like Perl because it's not popular any more. Similar to how lots of people hate Macs but have no logical
reason for doing so.
Douglas is almost certainly right, that Python is rapidly becoming the new fad language.
I'm not sure how Perl OO is a 'hack'. When you bless a reference into an object it becomes an object... I can see that some
people are confused by Perl's honesty about what an object is. Other languages attempt to hide away how they have implemented objects
in their compiler - who cares? Ultimately the objects are all converted into machine code and executed.
In general Perl objects are more object oriented than Java objects. They are certainly more polymorphic.
Perl objects can fully hide their internals if that's something you want to do. It's not even hard, and you don't need to use
Moose. But does it afford any real benefit? Not really.
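For readers who have never seen it, here is a minimal sketch of what the commenter is describing (my code, not his): bless turns an ordinary reference into an object, and closing over a lexical variable is one plain-Perl way to hide internals without Moose. The class name and methods are invented:

    use strict;
    use warnings;

    package Counter;

    # Constructor: bless a code reference (a closure), so the count lives in a
    # lexical variable that nothing outside the object can reach directly.
    sub new {
        my ($class) = @_;
        my $count = 0;                       # hidden state, not stored in a visible hash
        my $self  = sub { return ++$count };
        return bless $self, $class;          # blessing the reference makes it an object
    }

    sub increment { my ($self) = @_; return $self->() }

    package main;
    my $c = Counter->new;
    $c->increment;
    print "count is now ", $c->increment, "\n";    # prints: count is now 2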
At the end of the day, if you want good software you need to hire good programmers; it has nothing to do with the language.
Even though some languages try to force the code to be neat (Python) and try to force certain behaviours (Java?), you can write
complete garbage in any of them, then curse that language for allowing the author to do so.
A syntactic argument is pointless, as is one oriented around OO. The benefits Perl brings to a business are...
- a massive centralised website of libraries (CPAN)
- MVC frameworks
- DBI
- POE
- other frameworks etc.
- automated code review (perlcritic)
- automated code formatting and tidying (perltidy)
- document as you code (POD)
- natural test-driven development (Test::More etc.; see the sketch right after this list)
- platform independence
- Perl environments on more platforms than Java
- Perl comes out of the box on every Unix
- an excellent canon of printed literature, from beginner to expert
- a common language between Sysadmin/DevOps and traditional developer roles (with source code always available to *fix* the problem
quickly, rather than having to set up an ant environment and roll a new WAR file)
- rolled-up Perl applications (PAR files)
- Perl can use more than 3.6 GB of RAM (try that in Java)
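As promised above, here is a minimal sketch (mine, not Dean's) of the test-driven item from the list: a tiny helper exercised by Test::More, which ships with every modern Perl. The function is invented for illustration; in a real module its POD documentation would sit right next to the code:

    use strict;
    use warnings;
    use Test::More tests => 2;

    # Illustrative helper; in a real module an =head1/=cut POD block would document it here.
    sub trim {
        my ($s) = @_;
        $s =~ s/^\s+|\s+$//g;
        return $s;
    }

    is( trim('  hello  '), 'hello', 'surrounding spaces removed' );
    is( trim('clean'),     'clean', 'already clean string unchanged' );

Run it with prove and you get the familiar ok/not ok TAP output.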
Brian Martin
Well said Dean.
Personally, I don't really care if a system is written in Perl or Python or some other high level language, I don't get religious
about which high level language is used.
There are many [very] high level languages; any one of them is vastly more productive, and consequently less buggy, than
developing in a low-level language like C or Java. Believe me, I have written more vanilla C code in my career than Perl
or Python, by a factor of thousands, yet I still prefer Python or Perl as quite simply a more succinct expression of the
intended algorithm.
If anyone wants to argue the meaning of "high level", well basically APL wins ok. In APL, to invert a matrix is a single operator.
If you've never had to implement a matrix inversion from scratch, then you've never done serious programming. Meanwhile, Python
or Perl are pretty convenient.
What I mean by a "[very] high level language" is basically how many pages of code does it take to play a decent game
of draughts (chequers), or chess ?
In APL you can write a reasonable draughts player in about 2 pages.
In K&R C (not C++) you can write a reasonable Chess player in about 10-20 pages.
I recently read a frightening 2008 post by David Pogue about the breakdown of homemade DVDs. This inspired me to back up my old
DVDs of my dog to my computer (now that hard drives are so much bigger than they used to be), which led me to install HandBrake.
The Handbrake web site includes this gem:
"The Law of Software Development and Envelopment at MIT:
Every program in development at MIT expands until it can read mail."
I thought of that when I heard that Facebook is launching a (beyond) email service.
(The side benefit of this project is that now I get to watch videos of my dog sleeping whenever I want to.)
Nemo
Pogue should have mentioned whether he was talking about DVD-R or DVD-RW.
The rewritable variants are vastly more perishable than the write-once variants.
That law is originally due to Jamie Zawinski (rather famous software developer known for his work on Netscape Navigator and
contributions to Mozilla and XEmacs). In its original form:
Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which
can. - Jamie Zawinski
Ted K
James, you remind me of myself when I'm drinking and thinking of past things. I mean, I'm not criticizing you, please don't
take it as criticism. But watching videos of the dog that passed away... Don't torture yourself, man. Wait some time and when you
and the family are ready, get a new dog. It should probably be a different breed unless you're super-sold on that breed. Let the kids
choose one from a set of breeds you like.
We got Pogue's book on the iPod because Apple's manual is so crappy. He is "da man".
You know if Apple gave a damn about customers they would include a charger cord with that damned thing to hook into the wall
instead of making you shop for the charger cord separately, but Mr. "I'm an assh*le" Steve Jobs couldn't bother to show he's customer
service inclined. He's slowly but surely going the way of Bill Gates.
Ted K
Mr. Kwak,
There is a super good story they're running on PBS NewsHour today with the former "NOW" host David Brancaccio on a show called
"Fixing the Future". James you need to try to download that or catch the show. That looks really good and shows some promise for
the future. Catch this link people. Nov 18 (Thursday) http://www.pbs.org/now/fixing-the-future/index.html
David Petraitis
@Ted K
It looks like programs need to expand until they can delete uploaded Youtube videos which are seriously off topic.
As for James' original point, most applications today are Internet-aware and use the Internet in their base functionality
(which is what was meant by the original email capability). The next level is for them to be mobile and location aware,
and it is already happening.
Bruce E. Woych
Facebook launches beyond e-mail...
I was browsing through some tech books at B&N's and came across some works assessing the questionable fields of cyber space
tech wars and the current trends in development. The book has no axe to grind and was written well before the facebook attempt
to dominate the media with multimedia dependency. Here's what was so interesting in the text that applies:
The two greatest "vulnerabilities of the future" involve what is categorized as consolidation and convergence.
Consolidation is the process we all look forward to in facilitating easy operation under one command network. The
problem here is obviously that it also places the entire menu on one table to be infiltrated or manipulated.
Convergence, on the other hand, is just exactly what Facebook is "selling" to dominate the market
and bring everything possible into one network user-friendly conveyance (a conduit, so to speak). The more the merrier! And of
course this practically makes old-fashioned e-mailing obsolete! Too much time, less direct interface... etc.; and I think you get
the picture.
Now it first occurred to me that this is much like micro and macro economics... but then I realized that it is precisely (in the
field) like too big to fail!
So are we on another monopoly trail down the primrose path of self destructive dependencies?
Isn't this just another brand media Octopus looking to knock out variations and dominate our choices with their market offerings?
And is this going to set us up for I.T. crisis of authorization for the systemic network and future of "ownership" wars in essential
services?
3-D
Facebook is recreating AOL. A gigantic walled garden that becomes "the internet" for most of the people with computers.
Look how AOL ended up.
And Handbrake is a great little program. I've been using it to rip my DVD collection to the 2TB of network storage I now have
on my home network. A very convenient way to watch movies.
Anonymous
Shrub already handed them out to all his war + torture buddies, as well as Greenspan - and Daddy Shrub gave one to the teabaggers'
favorite faux-economist (Hayek) and to Darth Cheney, so I'd say the reputation of the medal is pretty much already in the sewer.
As I write this column, I'm in the middle of two summer projects; with luck, they'll both be finished by the time you read it.
One involves a forensic analysis of over 100,000 lines of old C and assembly code from about 1990, and I have to work on Windows
XP. The other is a hack to translate code written in weird language L1 into weird language L2 with a program written in scripting
language L3, where none of the L's even existed in 1990; this one uses Linux. Thus it's perhaps a bit surprising that I find myself
relying on much the same toolset for these very different tasks.
... ... ...
There has surely been much progress in tools over the 25 years that IEEE Software has been around, and I wouldn't want to go back
in time. But the tools I use today are mostly the same old ones - grep, diff, sort, awk, and friends.
This might well mean that I'm a dinosaur stuck in the past. On the other hand, when it comes to doing simple things quickly, I
can often have the job done while experts are still waiting for their IDE to start up. Sometimes the old ways are best, and
they're certainly worth knowing well.
The Lua programming language is a small scripting language specifically designed to be embedded in other programs.
Lua's C API allows exceptionally clean and simple code both to call Lua from C, and to call C from Lua.
This allows developers who want a convenient runtime scripting language to easily implement the basic API elements needed by the
scripting language, then use Lua code from their applications.
This article introduces the Lua language as a possible tool for simplifying common development tasks, and discusses some of the
reasons to embed a scripting language in the first place.
chthonicdaemon is pretty naive and does not understand
that the combination of a scripting language with a compiled language like C (or a semi-compiled language like Java) is a more productive environment
than almost any other known... You need a common runtime, as in Windows, to make it a smooth approach (IronPython). Scripting helps
to avoid the OO trap that is pushed by "a horde of practically illiterate researchers publishing crap papers in junk conferences."
"I have been using Linux as my primary environment for more than ten years. In this time, I have absorbed all the lore surrounding
the Unix Way - small programs doing one thing well, communicating via text and all that. I have found the command line a productive
environment for doing many of the things I often do, and I find myself writing lots of small scripts that do one thing, then piping
them together to do other things. While I was spending the time learning grep, sed, awk, python and many other more esoteric languages,
the world moved on to application-based programming, where the paradigm seems to be to add features to one program written in one
language. I have traditionally associated this with Windows or MacOS, but it is happening with Linux as well. Environments have little
or no support for multi-language projects - you choose a language, open a project and get it done. Recent trends in more targeted
build environments like cmake or ant are understandably focusing on automatic dependency generation and cross-platform support, unfortunately
making it more difficult to grow a custom build process for a multi-language project organically. All this is a bit painful for me,
as I know how much is gained by using a targeted language for a particular problem. Now the question: Should I suck it up and learn
to do all my programming in C++/Java/(insert other well-supported, popular language here) and unlearn ten years of philosophy, or
is there hope for the multi-language development process?"
by setagllib (753300) on Saturday February 28, @12:29AM (#27020683)
Eclipse is a fantastic platform for multi-language development, especially if your primary languages are C, C++, Python, Ruby,
etc.
All you need to do is create a C++ Makefile Project, then use the makefile to wrap your build system (e.g. ant, scons, actual
makefile, whatever). You can build any number of binaries and launch them (or scripts) from the powerful launch profile system.
Basically, Eclipse projects have "facets" - they can cram in features from multiple language development kits and mostly remain
compatible. You still sometimes have to do the glue work yourself, but in general C/C++/Python are very easy to mesh. It is therefore
easy to have a project with C libraries being loaded by Python, and so on.
It is a very high level development language, and it does have a vast common library, able to "talk" tens of protocols; you can
call directly any module compiled into a dynamic library with the ctypes module.
Also, if your application or parts of it run in the Java VM, no problem: Python is there in the form of "Jython", enabling you
to use this dynamic, multi-paradigm and interactive language directly from inside the JVM, with all its standard library, plus
full access to any Java classes you have in there. Oh... you use .NET? Ditto - there is IronPython!
Ah, you need to exchange data between parts of your app in the JVM and native code in .cpp? Use libboost or ctypes to interface Python with the
.cpp, and some XML-RPC to talk with a module in the JVM (oh, it would take you 10, perhaps 12 lines of code to write two methods
in Python which use the standard library to talk back and forth between both running environments).
Plus, connectivity with the automation interface of hundreds of other pieces of software - including OpenOffice, GIMP, Inkscape, all
of KDE software through DCOP (KDE 3) and DBUS (KDE 4), and easy communication with any Windows software which has a COM interface
- and it even works under GNU/other Unixes: just run your Windows app and win32 Python under the Wine emulator (the apps "think"
they are on Windows, but sockets and network ports are on localhost across Windows and native apps).
I would read it as sarcasm. Try reading this manifesto
[pbm.com] and updating Fortran to C to account for 20 years of shift in the industry. Anyone not using C is just eating Quiche.
Although his joke went over your head, it is worth pointing out that OO is not a paradigm. I know Wikipedia thinks
that it is, and so does a horde of practically illiterate researchers publishing crap papers in junk conferences.
But that doesn't make it true.
Object Orientation is just a method of organization for procedural languages. Although it helps code maintenance and does a
better job of unit management than modules alone, it doesn't change the underlying computational paradigm.
I say procedural languages because class-based programming in functional languages is actually a different type of beast although
it gets called OO to appeal to people from an imperative background.
Your "Unix Way" is a wheel that's being reinvented as SOA, etc.
Here's the thing: It is possible for one language to be good enough for nearly everything, especially if you pick one with
good support for internal DSLs (I like Ruby). Also, while message-passing is a good idea, it's usually slow, and you probably
don't want to be designing your own text-based format every time.
Now, you're still going to have DSLs and whole other languages forced on you, occasionally. For example, JavaScript is still
the best language for AJAX clients, simply because no one has written a better language that compiles to JavaScript. (That's relative,
of course -- if you like Java, then Google Web Toolkit will be perfect.) In fact, with web development, you'll want JavaScript,
HTML, CSS, and probably another language (Ruby, Perl, Python, PHP, etc), and SQL, all in the same application.
But, each additional language is that much more for a newcomer to learn, and it's that much more glue. If you communicate with
text, how much time are you spending writing text parsers instead of your app?
Of course, ideally, you provide a language-agnostic API, because you may need this application to interact with others. You
might even find yourself writing multiple applications...
But the other big win of a huge application is the UI. The Unix commandline has made mashups of many small programs as
easy as a pipe character. There's really no equivalent for the GUI -- users will relate better to one big monolith, even
if it's just a frontend for a bunch of small tools.
So, I would split application by the UI concept, and share the small, common utilities via shared libraries. That's not far
off from the Unix Way, either -- it's not hard to write a small commandline app with a shared library, if you find you need it.
It can be annoyingly difficult to go the other way -- for example, Git bindings aren't as easy as they should be.
The Unix Way is a perfectly valid method to develop administrative, text-based tasks targeting a single, well-known platform,
but does not scale well toward the development of other types of applications.
First, compared to modern languages like Python or Java, shell scripting sucks. The syntax is awkward and it
can only manipulate bits of text. The world has moved on from text. Today, I want to be able to process complex structures, which
in many cases cannot be converted to a simple text format.
Second, modern languages have huge libraries, so usually there is no need to use anything but those libraries. Furthermore,
using those libraries reduces compatibility issues. When I develop for the Java 6 platform, I know the code is going to work
on every single platform with support for Java 6: Windows, Linux, Solaris, you name it. With the Unix Way, you have to make sure
that every single function of every single tool you use is going to behave in the same way on every single platform. This is of
course a huge pain in the ass.
But there is no need to fret. You mention Python: from my point of view, it is one of the platforms that can be used exclusively,
so your experience is perfectly valid. Regexps are pretty much the same everywhere.
But I do not really understand your problem. If you're developing applications, you should know all that already. If your software
development experience is limited to administering systems, shell scripting is always going to work for you. I guess that in this
last case, you may want to try to pick a single platform (say, Python) for all your dev needs and see how it goes.
Very questionable logic. The cost of programming, and especially the cost of maintenance, of an application depends on the level of
the language. It is less with Ruby/Python than with Java.
Lately I seem to find lots of articles everywhere about the imminent demise of Java and its replacement with the scripting language
of the day, or sometimes with other compiled languages.
No, that is not gonna happen. Java is gonna die eventually of old age many,
many years from now. I will share the reasoning behind my statement. Let's first look at some metrics.
Language popularity status as of May 2008
For this I am gonna use the TIOBE index (tiobe.com) and the nice graphs at
langpop.com. I know lots of people don't like them because their statistics
are based on search engine results, but I think they are a reasonably fair indicator of popularity.
What I find significant here is the huge share the "C like syntax" languages have.
C (15.292) + C++ (10.484) + Java (20.176) + C# (3.963) = 49.915%
This means 4 languages get half of all the attention on the web.
If we add PHP (10.637) here (it uses a somewhat similar syntax) we get 60.552%.
As a result we can extract:
Reason number 1: Syntax is very important because it builds on previous knowledge. Also similar syntax means
similar concepts. Programmers have to make less effort to learn the new syntax, can reuse the old concepts and thus they can concentrate
on understanding the new concepts.
This is less than the attention Visual Basic gets (10.782%), and leads us to...
Reason number 2: Too much noise is distracting. Programmers are busy and learning 10 languages to the level
where they can evaluate them and make an educated decision is too much effort. The fact that most of these languages have a different
syntax and introduce different (sometimes radically different) concepts doesn't help either.
Looking at the trend for the last 7 years we can see a pretty flat evolution in popularity for most of the languages. There
are a few exceptions, like the decline of Perl, but nothing really earth-shattering. There are seasonal variations, but in
the long term nothing seems to change.
This shows that while various languages catch the mind of the programmer for a short time, they are put back on the shelf pretty
fast. This might be caused by the lack of opportunity to use them in real life projects. Most of the programmers in the world work
on ongoing projects.
Reason number 3: Lack of pressure on the programmers to switch. The market is pretty stable, the existing languages
work pretty well and the management doesn't push programmers to learn new languages.
Number of new projects started
Looking at another site that does language popularity analysis, langpop.com,
we see a slightly different view but the end result is almost the same from the point of view of challenger languages.
What I found interesting here was the analysis regarding new projects started in various languages. The sources for information
are Freshmeat.net and Google
Code. The results show a clear preference for C/C++/Java with Python getting some attention.
Reason number 4: Challenger languages don't seem to gain enough momentum to create an avalanche of new projects
started with them. This again can be due to the fact that they are spread thin when they are evaluated. There are too many of them.
Other interesting charts at langpop.com are those about
books on programming languages at amazon.com and about language
discussion statistics. Book writers write about subjects that have a chance to sell. On the other hand, a lot of discussion about
all these new languages takes place online. One thing I noticed in these discussions is the attitude the supporters of certain languages
have. There is a lot of elitism and concentration on what is wrong with Java instead of pointing out what their language of choice
brings that is useful, and on creating good tutorials for people wanting to attempt a switch.
Reason number 5: Challenger language communities don't do a good job at attracting programmers from established languages.
Telling somebody why she is wrong will most likely create a counter-reaction, not interest.
Let's look now at what is happening on the job market. I used the tools offered by
indeed.com and I compared a bunch of languages to produce this graph:
Reason number 6: There is no great incentive to switch to one of the challenger languages since gaining this skill
is not likely to translate into income in the near future.
Well, I looked at all these statistics and I extracted some reasons, but what are the qualities a language needs and what are
the external conditions that will make a programming language popular?
How and when does a language become popular
A new language has to gain the support of a big number of programmers using, at the moment, a different language.
To do this it has to leverage things those programmers already know. (C++ built on C, Java built on C++, C# built on C++, Java
and Delphi)
A new language stands a chance when there are some pressing problems with the existing languages. For example Java managed
to cover two problems plaguing the C/C++ world: complexity (C++) and memory management (C/C++). These two were real problems
because projects were plagued with bugs created by complexity and with memory leaks.
A changing market can also help a lot. Java managed to ride the Internet growth. It lost the browser battle - applets
are not widely used - but the switch to the server market was a huge success.
Based on history we can see how all successful languages had very powerful sponsors. C/C++/Java/C# are all creations
of big companies like AT&T, Sun, Microsoft. All these new languages are born in universities and research institutes or
are coming from very specific niche domains.
A popular language needs to be generic and applicable in most of the domains if not all.
Popular languages usually succeed fast. They have to avoid getting "old". When programmers see a language around
for many years without a growing market share they start to feel just okay not learning it.
Reason number 7: The new languages don't introduce an earth-shattering improvement in the life of most of the
programmers and projects.
Reason number 8: There is no killer application on the horizon. This means new languages compete in old markets
with established players.
Reason number 9: None of these new languages has a powerful sponsor with the will and the money to push them on the
market. A powerful sponsor translates into investment in the libraries - see Java. All these new languages are born in universities
and research institutes or are coming from very specific niche domains.
Reason number 10: Most of these languages lingered around too much without stepping decisively into the big arena.
For the curious, here is a list of talked-about languages with their birth dates: Ruby (mid 1990s), Python (1991), Lisp (1958), Scheme (1970s), Lua (1993), Smalltalk (1969-1980), Haskell (1990), Erlang (1987),
Caml (1985), OCaml (1996), Groovy (2003), Scala (2003)
Compare this with older successful languages: C (1972), C++ (1983), Java (1995), C# (2001), BASIC (1964), Pascal (1970), FORTRAN (1957), Ada (1983), COBOL (1959)
It is pretty obvious most of these "new" languages missed the train to success.
Why many of the new languages will never be popular
I already mentioned syntax a few times.
Some languages made strange mistakes. For example, Python is a great language, but the idea of using indentation as block demarcation
really is a cannonball chained to its feet. While most of the Pythonistas defend this idea with a lot of energy,
the truth is that this feature makes it a really dangerous tool in big, worldwide distributed projects - and most importantly, enterprise
projects are big and distributed. For a better analysis from somebody with real experience read this:
Python indentation considered boneheaded [while psychologically this criticism sounds true, from a technical standpoint it is a weak criticism,
as automatic indentation can be implemented via special comments and a preprocessor -- NNB]
Some languages have concepts that are very difficult to "get". For example, most of the supporters of functional languages
are proud of how concise statements are in their language. This is not really useful for somebody used to thinking procedurally or
in an object-oriented way. If the only gain from bending and twisting your mind is typing a few fewer lines, then any experienced programmer
will tell you that this is not the main activity. Writing the first version is just a small part of the life cycle of a project.
Typing the code is even smaller compared with the design time. From the second version the game changes dramatically. Maintainability
is way more important. Also very important is being able to add features and to refactor the code. Readability is paramount from
version two, for both development and support teams.
The nature of some of these languages makes it difficult to build really good tools to support them. One very useful
feature is the automatic refactoring provided by advanced tools like Eclipse.
Reason number 11: "Features" that both look dangerous and are dangerous for big projects. Since there are not a lot of
big projects written in any of these languages, it is hard to make an unbiased evaluation. But bias is in the end a real obstacle
to their adoption.
Reason number 12: Unnatural concepts (for the majority of programmers) raise the entry barrier. Functional languages make
you write code like mathematical equations. But how many people actually love math so much as to write everything in it? Object-oriented
languages provide a great advantage: they let programmers think about the domain they want to model, not about the language
or the machine.
Reason number 13: Lack of advanced tools for development and refactoring cripples the programmer and the development
teams when faced with large amounts of code.
What would a real Java challenger look like
The successful challenger should try to build on existing knowledge and infrastructure. Scala is actually getting on
this road. Running on the JVM and being able to reuse Java libraries is a huge advantage. Using a similar syntax to Java is also
a great decision that might push Scala into mainstream.
The challenger needs a killer application. Erlang might have the killer application with distributed computing features.
But distributed computing is not that mainstream yet, even on the server market.
Python actually is sponsored by Google
It has to be born inside a powerful company or to be adopted by a big sponsor.
All the challenger languages are lacking a sponsor
at this point. Sun seems to be interested in some of the scripting languages. But I am not sure if their attention is gonna help
any of these languages or is gonna distract and kill them. And Sun already has Java so we can suspect they are trying to actually
promote Java through these languages - see
the scripting engine
Pick me, pick mee, pick meeee!!!
Looking at all those (smart) languages and all the heated discussions that surround them makes me think about the donkey from
Shrek yelling "Pick me! Pick mee!! Pick meeee!!!". In the end only one can be the real winner even if in a limited part
of the market.
For scripting, Python has potential, huge potential. But it has to do something about the indentation fetish
to be able to penetrate the big-project market. Without that, the web looks PHPish.
Ruby is elegant but alien. I saw its syntax described as "the bastard son of Perl" (just
google it).
Its new popularity is based not on the language itself but on a framework (Rails) that can be reproduced in other languages even
if with less elegance. Struts 2 attempts just that.
Scripting languages (Groovy, Rhino, ...) on top of Java and the JVM are interesting, but they will never be prima donnas.
They cannot compete with Java because they are slower. They can be useful when
scripting a Java application
is a desirable feature (VBA is an excellent tool for Microsoft products and other Windows products and it pushed Visual Basic
up the scale).
Scala has a lot of good cards: building on the JVM, familiar syntax, a huge inherited library, and it can be as fast as
Java on the JVM... But where is the sponsor and where is the killer application in a shifting market?
The danger for Java doesn't come from outside. None of these new (actually most of them are pretty old) languages
have the potential to displace Java. The danger for Java comes from inside and it is caused by too many "features" making
their way into the language and transforming it from a language that wanted to keep only the essential features of C++ into
a trash box for features and concepts from all languages.
In the end I want to make it clear that I am not advocating against any of those languages. There is
TAO in all of them. I actually find them interesting, cool and useful as an exercise for my brain, when I have time.
I recommend that every programmer look around from time to time and try to understand what is going on in the language market.
This article is part of a series of opinions and rants.
What new elements does Perl 5.10.0 bring to the language? In what way is it preparing for Perl 6?
Perl 5.10.0 involves backporting some ideas from Perl 6, like switch statements and named pattern matches.
One of the most popular things is the use of "say" instead of "print".
This is an explicit programming design in Perl - easy things should be easy and hard things should be possible. It's
optimised for the common case. Similar things should look similar but different things should also look different, and how
you trade those things off is an interesting design principle.
Huffman Coding is one of those principles that makes similar things look different.
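As an aside to this interview, here is a minimal sketch (not Larry Wall's code) of the 5.10 features mentioned above: say, the backported switch statement (given/when), and named pattern matches. It assumes a perl of 5.10 or later (given/when was marked experimental in later releases), and all the data in it is invented:

    use strict;
    use warnings;
    use feature qw(say switch);
    # given/when triggers smartmatch warnings on perl 5.18+, so silence them there:
    no if $] >= 5.018, warnings => 'experimental::smartmatch';

    # Named pattern match: refer to capture groups by name instead of $1, $2, ...
    my $date = '2007-12-18';
    if ( $date =~ /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/ ) {
        say "year $+{year}, month $+{month}, day $+{day}";
    }

    # The switch statement backported from the Perl 6 design.
    my $lang = 'perl';
    given ($lang) {
        when ('perl')   { say 'easy things easy, hard things possible' }
        when ('python') { say 'indentation defines the blocks' }
        default         { say "no opinion on $lang" }
    }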
In your opinion, what lasting legacy has Perl brought to computer development?
An increased awareness of the interplay between technology and culture. Ruby has borrowed a few ideas from Perl and so has
PHP. I don't think PHP understands the use of sigils, but all languages borrow from other languages, otherwise they risk being
single-purpose languages. Competition is good.
It's interesting to see PHP follow along with the same mistakes Perl made over time and recover from them. But
Perl 6 also borrows back from other languages too, like Ruby. My ego may be big, but it's not that big.
Where do you envisage Perl's future lying?
My vision of Perl's future is that I hope I don't recognize it in 20 years.
Where do you see computer programming languages heading in the future, particularly in the next 5 to 20 years?
Don't design everything you will need in the next 100 years, but design the ability to create things we will need in 20 or
100 years. The heart of the Perl 6 effort is the extensibility we have built into the parser, introducing language changes as
non-destructively as possible.
> Given the horrible mess that is Perl (and, BTW, I derive 90% of my income from programming in Perl),
.
Did the thought that the 'horrible mess' you produce with $language 'for an income' could be YOUR horrible mess already cross your mind?
The language itself doesn't write any code.
> You just said something against his beloved
> Perl and compounded your heinous crime by
> saying something nice about Python...in his
> narrow view you are the antithesis of all that is
> right in the world. He will respond with his many
> years of Perl == good and everything else == bad
> but just let it go...
.
That's a pretty pointless insult. Languages don't write code. People do. A statement like 'I think that code written in Perl looks
very ugly because of the large amount of non-alphanumeric characters' would make sense. Trying to elevate entirely subjective,
aesthetic preferences into 'general principles' doesn't. 'a mess' is something inherently chaotic, hence, this is not a sensible
description for a regularly structured program of any kind. It is obviously possible to write (or not write) regularly structured
programs in any language providing the necessary abstractions for that. This set includes Perl.
.
I had the displeasure of having to deal with messes created by people both in Perl and Python (and a couple of other languages) in
the past. You've probably heard the saying that "real programmers can write FORTRAN in any language" already.
It is even true that the most horrible code mess I have seen so far had been written in Perl. But this just means that a
fairly chaotic person happened to use this particular programming language.
C/C++ are the languages you'd want to go for. They can do *everything*, have great support, are fast etc.
Let's be honest here. C and C++ are very fast indeed if you use them well (very little can touch them; most other languages are
actually implemented in terms of them) but they're also very easy to use really badly. They're genuine professional power
tools: they'll do what you ask them to really quickly, even if that is just to spin on the spot chopping people's legs off. Care
required!
If you use a higher-level language (I prefer Tcl, but you might prefer Python, Perl, Ruby, Lua, Rexx, awk, bash, etc. - the list
is huge) then you probably won't go as fast. But unless you're very good at C/C++ you'll go acceptably fast at a much earlier calendar
date. It's just easier for most people to be productive in higher-level languages. Well, unless you're doing something where you
have to be incredibly close to the metal like a device driver, but even then it's best to keep the amount of low-level code small
and to try to get to use high-level things as soon as you can.
One technique that is used quite a bit, especially by really experienced developers, is to split the program up into components
that are then glued together. You can then write the components in a low-level language if necessary, but use the far superior
gluing capabilities of a high-level language effectively. I know many people are very productive doing this.
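To make that concrete, here is a minimal sketch (mine, not the commenter's) of the glue approach in a scripting language: a compiled tool does the heavy lifting while a short Perl script drives it through a pipe and post-processes the output. The external command and the input file name are stand-ins for whatever low-level component you would actually build:

    use strict;
    use warnings;

    # 'sort -n' stands in for any fast, compiled component; the pipe is the glue.
    open my $pipe, '-|', 'sort', '-n', 'timings.txt'    # hypothetical input file
        or die "cannot run sort: $!";

    my ( $count, $sum ) = ( 0, 0 );
    while ( my $line = <$pipe> ) {
        chomp $line;
        next unless $line =~ /^(-?\d+(?:\.\d+)?)$/;      # keep only numeric lines
        $count++;
        $sum += $1;
    }
    close $pipe or warn "sort exited with status $?";

    printf "%d values, mean %.2f\n", $count, $count ? $sum / $count : 0;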
Andrew Binstock and Donald Knuth converse on the success of open source, the problem with multicore architecture, the disappointing
lack of interest in literate programming, the menace of reusable code, and that urban legend about winning a programming contest
with a single compilation.
Andrew Binstock: You are one of the fathers of the open-source revolution, even if you aren't widely heralded as such. You
previously have stated that you released TeX as open source
because of the problem of proprietary implementations at the time, and to invite corrections to the code-both of which are key drivers
for open-source projects today. Have you been surprised by the success of open source since that time?
Donald Knuth:
The success of open source code is perhaps the only thing in the computer field that hasn't surprised me during the
past several decades. But it still hasn't reached its full potential; I believe that open-source programs will begin to
be completely dominant as the economy moves more and more from products towards services, and as more and more volunteers arise
to improve the code.
For example, open-source code can produce thousands of binaries, tuned perfectly to the configurations of individual
users, whereas commercial software usually will exist in only a few versions. A generic binary executable file must include
things like inefficient "sync" instructions that are totally inappropriate for many installations; such wastage goes away when
the source code is highly configurable. This should be a huge win for open source.
Yet I think that a few programs, such as Adobe Photoshop, will always be superior to competitors like the GIMP - for some reason,
I really don't know why! I'm quite willing to pay good money for really good software, if I believe that it
has been produced by the best programmers.
Remember, though, that my opinion on economic questions is highly suspect, since I'm just an educator and scientist. I understand
almost nothing about the marketplace.
Andrew: A story states that you once entered a programming contest at Stanford (I believe) and you submitted the winning entry,
which worked correctly after a single compilation. Is this story true? In that vein, today's developers frequently build
programs writing small code increments followed by immediate compilation and the creation and running of unit tests. What are your
thoughts on this approach to software development?
Donald:
The story you heard is typical of legends that are based on only a small kernel of truth. Here's what actually happened:
John McCarthy decided in 1971
to have a Memorial Day Programming Race. All of the contestants except me worked at his AI Lab up in the hills above Stanford,
using the WAITS time-sharing system; I was down on the main campus, where the only computer available to me was a mainframe for
which I had to punch cards and submit them for processing in batch mode. I used Wirth's
ALGOL W system (the predecessor of Pascal). My program
didn't work the first time, but fortunately I could use Ed Satterthwaite's excellent offline debugging system for ALGOL W,
so I needed only two runs. Meanwhile, the folks using WAITS couldn't get enough machine cycles because their machine was so overloaded.
(I think that the second-place finisher, using that "modern" approach, came in about an hour after I had submitted the winning
entry with old-fangled methods.) It wasn't a fair contest.
As to your real question, the idea of immediate compilation and "unit tests" appeals to me only rarely, when I'm feeling my
way in a totally unknown environment and need feedback about what works and what doesn't. Otherwise, lots of time is wasted
on activities that I simply never need to perform or even think about. Nothing needs to be "mocked up."
Andrew: One of the emerging problems for developers, especially client-side developers, is changing their thinking to write
programs in terms of threads. This concern, driven by the advent of inexpensive multicore PCs, surely will require that many algorithms
be recast for multithreading, or at least to be thread-safe. So far, much of the work you've published for Volume 4 of The Art of Computer Programming (TAOCP)
doesn't seem to touch on this dimension. Do you expect to enter into problems of concurrency and parallel programming in upcoming
work, especially since it would seem to be a natural fit with the combinatorial topics you're currently working on?
Donald:
The field of combinatorial algorithms is so vast that I'll be lucky to pack its sequential aspects into three or four
physical volumes, and I don't think the sequential methods are ever going to be unimportant. Conversely, the half-life of parallel
techniques is very short, because hardware changes rapidly and each new machine needs a somewhat different approach. So I decided
long ago to stick to what I know best. Other people understand parallel machines much better than I do; programmers should listen
to them, not me, for guidance on how to deal with simultaneity.
Andrew: Vendors of multicore processors have expressed frustration at the difficulty of moving developers to this model. As
a former professor, what thoughts do you have on this transition and how to make it happen? Is it a question of proper tools, such
as better native support for concurrency in languages, or of execution frameworks? Or are there other solutions?
Donald:
I don't want to duck your question entirely. I might as well flame a bit about my personal unhappiness with the current
trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and
that they're trying to pass the blame for the future demise of Moore's Law to the software writers by giving us machines
that work faster only on a few key benchmarks! I won't be surprised at all if the whole multithreading idea turns out
to be a flop, worse than the "Itanium" approach that was supposed to be so terrific - until it turned out that the wished-for compilers were basically impossible to write.
Let me put it this way: During the past 50 years, I've written well over a thousand programs, many of which have substantial
size. I can't think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading.
Surely, for example, multiple processors are no help to TeX.[1]
How many programmers do you know who are enthusiastic about these promised machines of the future? I hear almost nothing
but grief from software people, although the hardware folks in our department assure me that I'm wrong.
I know that important applications for parallelism exist-rendering graphics, breaking codes, scanning images, simulating physical
and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need
to be changed substantially every few years.
Even if I knew enough about such methods to write about them in TAOCP, my time would be largely wasted, because soon
there would be little reason for anybody to read those parts. (Similarly, when I prepare the third edition of
Volume 3 I plan to rip out much of the
material about how to sort on magnetic tapes. That stuff was once one of the hottest topics in the whole software field, but now
it largely wastes paper when the book is printed.)
The machine I use today has dual processors. I get to use them both only when I'm running two independent jobs at the same
time; that's nice, but it happens only a few minutes every week. If I had four processors, or eight, or more, I still wouldn't
be any better off, considering the kind of work I do-even though I'm using my computer almost every day during most of the day.
So why should I be so happy about the future that hardware vendors promise? They think a magic bullet will come along to
make multicores speed up my kind of work; I think it's a pipe dream. (No-that's the wrong metaphor! "Pipelines" actually
work for me, but threads don't. Maybe the word I want is "bubble.")
From the opposite point of view, I do grant that web browsing probably will get better with multicores. I've been talking about
my technical work, however, not recreation. I also admit that I haven't got many bright ideas about what I wish hardware designers
would provide instead of multicores, now that they've begun to hit a wall with respect to sequential computation. (But my
MMIX design contains several ideas that would
substantially improve the current performance of the kinds of programs that concern me most-at the cost of incompatibility with
legacy x86 programs.)
Andrew: One of the few projects of yours that hasn't been embraced by a widespread community is literate programming. What are your thoughts about why literate
programming didn't catch on? And is there anything you'd have done differently in retrospect regarding literate programming?
Donald:
Literate programming is a very personal thing. I think it's terrific, but that might well be because I'm a very strange person.
It has tens of thousands of fans, but not millions.
In my experience, software created with literate programming has turned out to be significantly better than software
developed in more traditional ways. Yet ordinary software is usually okay-I'd give it a grade of C (or maybe C++), but
not F; hence, the traditional methods stay with us. Since they're understood by a vast community of programmers, most people have
no big incentive to change, just as I'm not motivated to learn Esperanto even though it might be preferable to English and German
and French and Russian (if everybody switched).
Jon Bentley probably hit the nail on the head when he
once was asked why literate programming hasn't taken the whole world by storm. He observed that a small percentage of the
world's population is good at programming, and a small percentage is good at writing; apparently I am asking everybody to be in
both subsets.
Yet to me, literate programming is certainly the most important thing that came out of the TeX project. Not only has it enabled
me to write and maintain programs faster and more reliably than ever before, and been one of my greatest sources of joy since
the 1980s-it has actually been indispensable at times. Some of my major programs, such as the MMIX meta-simulator, could
not have been written with any other methodology that I've ever heard of. The complexity was simply too daunting for my limited
brain to handle; without literate programming, the whole enterprise would have flopped miserably.
If people do discover nice ways to use the newfangled multithreaded machines, I would expect the discovery to come from people
who routinely use literate programming. Literate programming is what you need to rise above the ordinary level of achievement.
But I don't believe in forcing ideas on anybody. If literate programming isn't your style, please forget it and do what
you like. If nobody likes it but me, let it die.
On a positive note, I've been pleased to discover that the conventions of CWEB are already standard equipment within preinstalled
software such as Makefiles, when I get off-the-shelf Linux these days.
Andrew: In Fascicle 1 of Volume 1,
you reintroduced the MMIX computer, which is the 64-bit upgrade to the venerable MIX machine comp-sci students have come to know
over many years. You previously described MMIX in great detail in MMIXware. I've read portions of both books,
but can't tell whether the Fascicle updates or changes anything that appeared in MMIXware, or whether it's a pure synopsis. Could
you clarify?
Donald:
Volume 1 Fascicle 1 is a programmer's introduction, which includes instructive exercises and such things. The
MMIXware book is a detailed reference manual, somewhat terse and dry, plus a bunch of literate programs that describe prototype
software for people to build upon. Both books define the same computer (once the errata to MMIXware are incorporated from my website).
For most readers of TAOCP, the first fascicle contains everything about MMIX that they'll ever need or want to know.
I should point out, however, that MMIX isn't a single machine; it's an architecture with almost unlimited varieties of implementations,
depending on different choices of functional units, different pipeline configurations, different approaches to multiple-instruction-issue,
different ways to do branch prediction, different cache sizes, different strategies for cache replacement, different bus speeds,
etc. Some instructions and/or registers can be emulated with software on "cheaper" versions of the hardware. And so on. It's a
test bed, all simulatable with my meta-simulator, even though advanced versions would be impossible to build effectively until
another five years go by (and then we could ask for even further advances just by advancing the meta-simulator specs another notch).
Suppose you want to know if five separate multiplier units and/or three-way instruction issuing would speed up a given MMIX
program. Or maybe the instruction and/or data cache could be made larger or smaller or more associative. Just fire up the meta-simulator
and see what happens.
Andrew: As I suspect you don't use unit testing with MMIXAL, could you step me through how you go about making sure that your
code works correctly under a wide variety of conditions and inputs? If you have a specific work routine around verification, could
you describe it?
Donald:
Most examples of machine language code in TAOCP appear in Volumes 1-3; by the time we get to Volume 4, such low-level
detail is largely unnecessary and we can work safely at a higher level of abstraction. Thus, I've needed to write only a dozen
or so MMIX programs while preparing the opening parts of Volume 4, and they're all pretty much toy programs-nothing substantial.
For little things like that, I just use informal verification methods, based on the theory that I've written up for the book,
together with the MMIXAL assembler and MMIX simulator that are readily available on the Net (and described in full detail in the
MMIXware book).
That simulator includes debugging features like the ones I found so useful in Ed Satterthwaite's system for ALGOL W, mentioned
earlier. I always feel quite confident after checking a program with those tools.
Andrew: Despite its formulation many years ago, TeX is still thriving, primarily as the foundation for LaTeX. While TeX has been effectively frozen at your request,
are there features that you would want to change or add to it, if you had the time and bandwidth? If so, what are the major items
you add/change?
Donald:
I believe changes to TeX would cause much more harm than good. Other people who want other features are creating their own
systems, and I've always encouraged further development-except that nobody should give their program the same name as mine. I
want to take permanent responsibility for TeX and Metafont,
and for all the nitty-gritty things that affect existing documents that rely on my work, such as the precise dimensions of characters
in the Computer Modern fonts.
Andrew: One of the little-discussed aspects of software development is how to do design work on software in a completely new
domain. You were faced with this issue when you undertook TeX: No prior art was available to you as source code, and it was a domain
in which you weren't an expert. How did you approach the design, and how long did it take before you were comfortable entering into
the coding portion?
Donald:
That's another good question! I've discussed the answer in great detail in Chapter 10 of my book
Literate Programming, together with Chapters 1 and 2 of my book
Digital Typography. I think that anybody who is really interested in this topic will enjoy reading those chapters. (See also
Digital Typography Chapters 24 and 25 for the complete first and second drafts of my initial design of TeX in 1977.)
Andrew: The books on TeX and the program itself show a clear concern for limiting memory usage-an important problem for systems
of that era. Today, the concern for memory usage in programs has more to do with cache sizes. As someone who has designed a processor
in software, the issues of cache-aware and cache-oblivious
algorithms surely must have crossed your radar screen. Is the role of processor caches on algorithm design something that
you expect to cover, even if indirectly, in your upcoming work?
Donald:
I mentioned earlier that MMIX provides a test bed for many varieties of cache. And it's a software-implemented machine, so
we can perform experiments that will be repeatable even a hundred years from now. Certainly the next editions of Volumes 1-3 will
discuss the behavior of various basic algorithms with respect to different cache parameters.
In Volume 4 so far, I count about a dozen references to cache memory and cache-friendly approaches (not to mention a "memo
cache," which is a different but related idea in software).
Andrew: What set of tools do you use today for writing TAOCP? Do you use TeX? LaTeX? CWEB? Word processor? And what
do you use for the coding?
Donald:
My general working style is to write everything first with pencil and paper, sitting beside a big wastebasket.
Then I use Emacs to enter the text into my machine, using the conventions of TeX. I use tex, dvips, and gv to see the results,
which appear on my screen almost instantaneously these days. I check my math with Mathematica.
I program every algorithm that's discussed (so that I can thoroughly understand it) using CWEB, which works splendidly with
the GDB debugger. I make the illustrations with MetaPost (or,
in rare cases, on a Mac with Adobe Photoshop or Illustrator). I have some homemade tools, like my own spell-checker for TeX and
CWEB within Emacs. I designed my own bitmap font for use with Emacs, because I hate the way the ASCII apostrophe and the left
open quote have morphed into independent symbols that no longer match each other visually. I have special Emacs modes to help
me classify all the tens of thousands of papers and notes in my files, and special Emacs keyboard shortcuts that make bookwriting
a little bit like playing an organ. I prefer rxvt to xterm for
terminal input. Since last December, I've been using a file backup system called
backupfs, which meets my need beautifully to archive the
daily state of every file.
According to the current directories on my machine, I've written 68 different CWEB programs so far this year. There were about
100 in 2007, 90 in 2006, 100 in 2005, 90 in 2004, etc. Furthermore, CWEB has an extremely convenient "change file" mechanism,
with which I can rapidly create multiple versions and variations on a theme; so far in 2008 I've made 73 variations on those 68
themes. (Some of the variations are quite short, only a few bytes; others are 5KB or more. Some of the CWEB programs are quite
substantial, like the 55-page BDD package that I completed in January.) Thus, you can see how important literate programming is
in my life.
I currently use Ubuntu Linux, on a standalone laptop-it has no Internet
connection. I occasionally carry flash memory drives between this machine and the Macs that I use for network surfing and graphics;
but I trust my family jewels only to Linux. Incidentally, with Linux I much prefer the keyboard focus that I can get with classic
FVWM to the GNOME and KDE environments that other people seem
to like better. To each his own.
Andrew: You state in the preface of Fascicle 0 of Volume 4 of TAOCP
that Volume 4 surely will comprise three volumes and possibly more. It's clear from the text that you're really enjoying writing
on this topic. Given that, what is your confidence in the note posted on the TAOCP website that Volume 5 will see light
of day by 2015?
Donald:
If you check the Wayback Machine for previous incarnations of that web page, you will see that the number 2015 has not been
constant.
You're certainly correct that I'm having a ball writing up this material, because I keep running into fascinating facts that
simply can't be left out-even though more than half of my notes don't make the final cut.
Precise time estimates are impossible, because I can't tell until getting deep into each section how much of the stuff in my
files is going to be really fundamental and how much of it is going to be irrelevant to my book or too advanced. A lot of
the recent literature is academic one-upmanship of limited interest to me; authors these days often introduce arcane methods
that outperform the simpler techniques only when the problem size exceeds the number of protons in the universe. Such algorithms
could never be important in a real computer application. I read hundreds of such papers to see if they might contain nuggets for
programmers, but most of them wind up getting short shrift.
From a scheduling standpoint, all I know at present is that I must someday digest a huge amount of material that I've been
collecting and filing for 45 years. I gain important time by working in batch mode: I don't read a paper in depth until I can
deal with dozens of others on the same topic during the same week. When I finally am ready to read what has been collected about
a topic, I might find out that I can zoom ahead because most of it is eminently forgettable for my purposes. On the other hand,
I might discover that it's fundamental and deserves weeks of study; then I'd have to edit my website and push that number 2015
closer to infinity.
Andrew: In late 2006, you were diagnosed with prostate cancer. How is your health today?
Donald:
Naturally, the cancer will be a serious concern. I have superb doctors. At the moment I feel as healthy as ever, modulo being
70 years old. Words flow freely as I write TAOCP and as I write the literate programs that precede drafts of TAOCP.
I wake up in the morning with ideas that please me, and some of those ideas actually please me also later in the day when I've
entered them into my computer.
On the other hand, I willingly put myself in God's hands with respect to how much more I'll be able to do before cancer or
heart disease or senility or whatever strikes. If I should unexpectedly die tomorrow, I'll have no reason to complain, because
my life has been incredibly blessed. Conversely, as long as I'm able to write about computer science, I intend to do my best to
organize and expound upon the tens of thousands of technical papers that I've collected and made notes on since 1962.
Andrew: On your website, you mention that the Peoples
Archive recently made a series of videos in which you reflect on your past life. In segment 93, "Advice to Young People,"
you advise that people shouldn't do something simply because it's trendy. As we know all too well, software development is as subject
to fads as any other discipline. Can you give some examples that are currently in vogue, which developers shouldn't adopt simply
because they're currently popular or because that's the way they're currently done? Would you care to identify important examples
of this outside of software development?
Donald:
Hmm. That question is almost contradictory, because I'm basically advising young people to listen to themselves rather than
to others, and I'm one of the others. Almost every biography of every person whom you would like to emulate will say that he or
she did many things against the "conventional wisdom" of the day.
Still, I hate to duck your questions even though I also hate to offend other people's sensibilities-given that software methodology
has always been akin to religion. With the caveat that there's no reason anybody should care about the opinions of a computer
scientist/mathematician like me regarding software development, let me just say that almost everything I've ever heard associated
with the term "extreme programming" sounds
like exactly the wrong way to go...with one exception. The exception is the idea of working in teams and reading each
other's code. That idea is crucial, and it might even mask out all the terrible aspects of extreme programming that alarm me.
I also must confess to a strong bias against the fashion for reusable code. To me, "re-editable code" is much,
much better than an untouchable black box or toolkit. I could go on and on about this. If you're totally convinced that reusable
code is wonderful, I probably won't be able to sway you anyway, but you'll never convince me that reusable code isn't mostly a
menace.
Here's a question that you may well have meant to ask: Why is the new book called Volume 4 Fascicle 0, instead of Volume 4
Fascicle 1? The answer is that computer programmers will understand that I wasn't ready to begin writing Volume 4 of TAOCP
at its true beginning point, because we know that the initialization of a program can't be written until the program itself takes
shape. So I started in 2005 with Volume 4 Fascicle 2, after which came Fascicles 3 and 4. (Think of Star Wars, which
began with Episode 4.)
I think the article misstates the position of Perl (according to the
TIOBE index it is No. 6, just above C#,
Python and Ruby). The author definitely does not understand the value and staying power of C. Also, there is an inherent problem with
their methodology (as with any web-presence-based metric). This is visible in the position of C#, which now definitely looks stronger than
Python and Perl (and maybe even PHP), as well as in the positions of bash, awk and PowerShell. That means all other statements should be
taken with a grain of salt...
April 23, 2008 | DDJ
From what Paul Jansen has seen, everyone has a favorite programming language.
DDJ: Paul, can you tell us about the TIOBE Programming Community Index?
PJ: The TIOBE index tries to measure the popularity of programming languages by monitoring their web presence. The most
popular search engines Google, Yahoo!, Microsoft, and YouTube are used to calculate these figures. YouTube has been added recently
as an experiment (and only counts for 4 percent of the total). Since the TIOBE index has been published now for more than 6 years,
it gives an interesting picture about trends in the area of programming languages. I started the index because I was curious to know
whether my programming skills were still up to date and to know for which programming languages our company should create development
tools. It is amazing to see that programming languages are something very personal. Every day we receive
e-mails from people who are unhappy with the position of "their" specific language
in the index. I am also a bit overwhelmed by the vast and constant traffic this index generates.
DDJ: Which language has moved to the top of the heap, so to speak, in terms of popularity, and why do you think this is
the case?
PJ: If we take a look at the top 10 programming languages, not much has
happened the last five years. Only Python entered the top 10, replacing COBOL. This comes as a surprise because the IT world is moving
so fast that in most areas, the market is usually completely changed in five years time. Python managed to reach the
top 10 because it is the truly object-oriented successor of Perl. Other winners of the last couple of years are Visual Basic, Ruby,
JavaScript, C#, and D (a successor of C++). I expect in five years time there will be two main languages: Java and C#, closely
followed by good-old Visual Basic. There is no new paradigm foreseen.
DDJ: Which languages seem to be losing ground?
PJ: C and C++ are definitely losing ground. There is a simple explanation for this. Languages without automated
garbage collection are getting out of fashion. The chance of running into all kinds of memory problems is gradually outweighing
the performance penalty you have to pay for garbage collection. Another language that has had its day is Perl. It was once the standard
language for every system administrator and build manager, but now everyone has been waiting on a new major release for more than
seven years. That is considered far too long.
DDJ: On the flip side, what other languages seem to be moving into the limelight?
PJ: It is interesting to observe that dynamically typed object-oriented (scripting) languages are evolving the most. Hardly has a new language arrived on the scene before it is replaced by yet another emerging one. I think this
has to do with the increase in web programming. The web programming area demands a language that is easy to learn, powerful, and
secure. New languages pop up every day, trying to be leaner and meaner than their predecessors. A couple of years ago, Ruby was rediscovered
(thanks to Rails). Recently, Lua was the hype, but now other scripting languages such as ActionScript, Groovy, and Factor are about
to claim a top 20 position. There is quite some talk on the Internet about the NBL (next big language). But although those web-programming
languages generate a lot of attention, there is never a real breakthrough.
DDJ: What are the benefits of introducing coding standards into an organization? And how does an organization choose a
standard that is a "right fit" for its development goals?
PJ: Coding standards help to improve the general quality of software.
A good coding standard focuses on best programming practices (avoiding known language pitfalls), not only on style and naming conventions.
Every language has its constructions that are perfectly legitimate according to its language definition but will lead to reliability,
security, or maintainability problems. Coding standards help engineers to stick to a subset of a programming language to make sure
that these problems do not occur. The advantage of introducing coding standards as a means to improve quality is that-once it is
in place-it does not change too often. This is in contrast with dynamic testing. Every change in your program calls for a change
in your dynamic tests. In short, dynamic tests are far more labor intensive than coding standards. On the other hand, coding standards
can only take care of nonfunctional defects. Bugs concerning incorrectly implemented requirements remain undetected. The best way
to start with coding standards is to download a code checker and tweak it to your needs. It is our experience that if you do not
check the rules of your coding standard automatically, the coding standard will soon end as a dusty document on some shelf.
CoScripter is a system for recording, automating, and sharing processes performed in a web browser such as printing photos online,
requesting a vacation hold for postal mail, or checking flight arrival times. Instructions for processes are recorded and stored
in easy-to-read text here on the CoScripter web site, so anyone can make use of them. If you are having trouble with a web-based
process, check to see if someone has written a CoScript for it!
About: Tiny Eclipse is a distribution of Eclipse for development with dynamic languages for the Web, such as JSP, PHP, Ruby, TCL,
and Web Services. It features a small download size, the ability to choose the features you want to install, and GUI installers for
Win32 and Linux GTK x86.
"Simply put, developers are saying that Java slows them down"
Dec 28, 2007 | infoworld.com
Simply put, developers are saying that Java slows them down. "There were big promises that Java would solve incompatibility problems
[across platforms]. But now there are different versions and different downloads, creating complications," says Peter Thoneny, CEO
of Twiki.net, which produces a certified version of the open source Twiki wiki-platform software. "It has not gotten easier.
It's more complicated," concurs Ofer Ronen, CEO of Sendori, which routes domain traffic to online advertisers and ad networks.
Sendori has moved to Ruby on Rails. Ronen says Ruby offers pre-built structures - say, a shopping cart for an e-commerce
site - that you'd have to code from the ground up using Java.
Another area of weakness is the development of mobile applications. Java's UI capabilities and its memory
footprint simply don't measure up, says Samir Shah, CEO of software testing provider Zephyr. No wonder the mobile edition of
Java has all but disappeared, and no wonder Google is creating its own version (Android).
These weaknesses are having a real effect. Late last month, Info-Tech Research Group said its survey
of 1,850 businesses found .Net the choice over
Java among businesses of all sizes and industries, thanks to its promotion via Visual Studio and SharePoint. "Microsoft
is driving uptake of the .Net platform at the expense of Java," says George Goodall, a senior research analyst at Info-Tech.
One bit of good news: developers and analysts agree that Java is alive and well for internally developed
enterprise apps. "On the back end, there is still a substantial amount of infrastructure available that makes Java a very strong
contender," says Zephyr's Shah.
The Bottom Line: Now that Java is no longer the unchallenged champ for Internet-delivered
apps, it makes sense for companies to find programmers who are skilled in the new languages. If you're a Java developer, now's the
time to invest in new skills.
I think that the author of this comment is deeply mistaken: the length of the code has a tremendous influence on the cost of maintenance
and the number of errors, and here Java sucks.
Linux Today
It's true. Putting together an Enterprise-scale Java application takes a considerable amount of planning, design, and co-ordination.
Scripted languages like Python are easier - just hack something out and you've a working webapp by the end of the day.
But then you get called in at midnight, because a lot of the extra front-end work in Java has to do with the fact that the compiler
is doing major datatype validation. You're a lot less likely to have something blow up after it went into production, since a whole
raft of potential screw-ups get caught at build time.
Scripting systems like Python, Perl, PHP, etc. not only have late binding, but frequently have late compiling as well, so
until the offending code is invoked, it's merely a coiled-up snake.
In fact, after many years and many languages, I'm just about convinced that the amount of time and effort for producing a debugged
major app in just about any high-level language is about the same.
Myself, I prefer an environment that keeps me from having to wear a pager. For those who need less sleep and more Instant Gratification,
they're welcome to embrace the other end of the spectrum.
It's pretty strange for a system admin and CGI
programmer to prefer Python to Perl... It goes without saying that in any case such evaluations should be taken with a grain of salt.
What makes this comparison interesting is that the author claims to have substantial programming experience in
Perl-4,
Tcl and
Python
Some languages I've used and how I've felt about them. This may help you figure out where I'm coming from. I'm only listing the
highlights here, and not including the exotica (Trac, SAM76, Setl, Rec, Convert, J...) and languages I've only toyed with or programmed
in my head (Algol 68, BCPL, APL, S-Algol, Pop-2 / Pop-11, Refal, Prolog...).
Pascal.
My first language. Very nearly turned me off programming before I got started. I hated it. Still do.
Sail.
The first language I loved, Sail was Algol 60 with zillions of extensions, from dynamically allocated strings to Leap, a weird
production system / logic programming / database language that I never understood (and now can barely recall), and access to every
Twenex JSYS and TOPS 10 UUO! Pretty much limited to PDP-10s; supposedly reincarnated as MAINSAIL, but I never saw that.
Teco.
The first language I actually became fluent in; also the first language I ever got paid to program in.
Snobol4.
The first language I actually wrote halfway-decent sizable code in, developed a personal subroutine library for, wrote multi-platform
code in, and used on an IBM mainframe (Spitbol -- but I did all my development under Twenex with Sitbol, thank goodness). I loved
Snobol: I used to dream in it.
PL/I.
The first language I ever thought was great at first and then grew to loathe. Subset G only, but that was enough.
Forth.
The first language I ever ran on my own computer; also the first language I ever wrote useful assembler in -- serial I/O routines
for the Z/80 (my first assembly language was a handful of toy programs in IBM 360 assembly language, using the aptly-named SPASM
assembler), and the first language I thought was really wonderful but had a really difficult time writing useful programs
in. Also the first language whose implementation I actually understood, the first language that really taught me about hardware,
and the first language implementation I installed myself. Oh and the first language I taught.
C.
What can I say here? It's unpleasant, but it works, it's everywhere, and you have to use it.
Lisp.
The first language I thought was truly brilliant and still think so to this day. I programmed in Maclisp and Muddle (MDL)
at first, on the PDP-10; Franz and then Common Lisp later (but not much).
Scheme.
How could you improve on Lisp? Scheme is how.
Perl.
The first language I wrote at least hundreds of useful programs in (Perl 4 (and earlier) only). Probably the second
language I thought was great and grew to loathe (for many of the same reasons I grew to loathe PL/I, interestingly enough -- but
it took longer).
Lazy Functional Languages.
How could you improve on Scheme? Lazy functional languages is how, but can you actually do anything with them (except
compile lazy functional languages, of course)?
Tcl.
My previous standard, daily language. It's got a lot of problems, and it's amazing that Tcl programs ever get around to terminating,
but they do, and astonishingly quickly (given the execution model...). I've developed a large library of Tcl procs that allow
me to whip up substantial programs really quickly, the mark of a decent language. And it's willing to dress up as Lisp to fulfill
my kinky desires.
Python
My current standard, daily language. Faster than Tcl, about as fast as Perl and with nearly as large a standard library, but
with a reasonable syntax and real data structures. It's by no means perfect -- still kind of slow, not enough of an expression
language to suit me, dynamically typed, no macro system -- but I'm really glad I found it.
By itself, Vim is one of the best editors for shell scripting. With a little tweaking, however, you can turn Vim into a full-fledged
IDE for writing scripts. You could do it yourself, or you can just install Fritz Mehner's
Bash Support plugin.
To install Bash Support, download the zip archive,
copy it to your ~/.vim directory, and unzip the archive. You'll also want to edit your ~/.vimrc to include a few personal details;
open the file and add these three lines:
let g:BASH_AuthorName = 'Your Name'
let g:BASH_Email = '[email protected]'
let g:BASH_Company = 'Company Name'
These variables will be used to fill in some headers for your projects, as we'll see below.
The Bash Support plugin works in the Vim GUI (gVim) and text mode Vim. It's a little easier to use in the GUI, and Bash Support
doesn't implement most of its menu functions in Vim's text mode, so you might want to stick with gVim when scripting.
When Bash Support is installed, gVim will include a new menu, appropriately titled Bash. This puts all of the Bash Support functions
right at your fingertips (or mouse button, if you prefer). Let's walk through some of the features, and see how Bash Support can
make Bash scripting a breeze.
Header and comments
If you believe in using extensive comments in your scripts, and I hope you do, you'll really enjoy using Bash Support. Bash Support
provides a number of functions that make it easy to add comments to your bash scripts and programs automatically or with just a mouse
click or a few keystrokes.
When you start a non-trivial script that will be used and maintained by others, it's a good idea to include a header with basic
information -- the name of the script, usage, description, notes, author information, copyright, and any other info that might be
useful to the next person who has to maintain the script. Bash Support makes it a breeze to provide this information. Go to Bash
-> Comments -> File Header, and gVim will insert a header like this in your script:
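(The original article showed the generated header as a screenshot. The block below is an approximate reconstruction of what a default Bash Support installation produces; the exact field names and layout may differ between plugin versions, and the file name and date are just placeholders.)

#!/bin/bash
#===============================================================================
#
#          FILE:  myscript.sh
#
#         USAGE:  ./myscript.sh
#
#   DESCRIPTION:
#
#       OPTIONS:  ---
#  REQUIREMENTS:  ---
#          BUGS:  ---
#         NOTES:  ---
#        AUTHOR:  Your Name
#       COMPANY:  Company Name
#       VERSION:  1.0
#       CREATED:  MM/DD/YYYY
#      REVISION:  ---
#===============================================================================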
You'll need to fill in some of the information, but Bash Support grabs the author, company name, and email address from your ~/.vimrc,
and fills in the file name and created date automatically. To make life even easier, if you start Vim or gVim with a new file that
ends with an .sh extension, it will insert the header automatically.
As you're writing your script, you might want to add comment blocks for your functions as well. To do this, go to Bash -> Comment
-> Function Description to insert a block of text like this:
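(This was also shown as a screenshot in the original article; the comment block below is only an approximation of the default template, so treat the field names as illustrative.)

#===  FUNCTION  ================================================================
#          NAME:
#   DESCRIPTION:
#    PARAMETERS:
#       RETURNS:
#===============================================================================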
Just fill in the relevant information and carry on coding.
The Comment menu allows you to insert other types of comments, insert the current date and time, and turn selected code into a
comment, and vice versa.
Statements and snippets
Let's say you want to add an if-else statement to your script. You could type out the statement, or you could just use Bash Support's
handy selection of pre-made statements. Go to Bash -> Statements and you'll see a long list of pre-made statements that you can just
plug in and fill in the blanks. For instance, if you want to add a while statement, you can go to Bash -> Statements -> while, and
you'll get the following:
while _; do
done
The cursor will be positioned where the underscore (_) is above. All you need to do is add the test statement and the
actual code you want to run in the while statement. Sure, it'd be nice if Bash Support could do all that too, but there's only so
far an IDE can help you.
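For example, a completed skeleton might end up looking like this; the loop body and input file are my own illustration, not something Bash Support generates:

while read -r line; do
    printf 'processing: %s\n' "$line"    # replace with the real work for each line
done < /etc/passwd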
However, you can help yourself. When you do a lot of bash scripting, you might have functions or code snippets that you reuse
in new scripts. Bash Support allows you to add your snippets and functions by highlighting the code you want to save, then going
to Bash -> Statements -> write code snippet. When you want to grab a piece of prewritten code, go to Bash -> Statements -> read code
snippet. Bash Support ships with a few included code fragments.
Another way to add snippets to the statement collection is to just place a text file with the snippet under the ~/.vim/bash-support/codesnippets
directory.
Running and debugging scripts
Once you have a script ready to go, it's testing and debugging time. You could exit Vim, make the script executable, run it
and see if it has any bugs, and then go back to Vim to edit it, but that's tedious. Bash Support lets you stay in Vim while doing
your testing.
When you're ready to make the script executable, just choose Bash -> Run -> make script executable. To save and run the script,
press Ctrl-F9, or go to Bash -> Run -> save + run script.
Bash Support also lets you call the bash debugger (bashdb) directly from within Vim. On Ubuntu, it's not installed by default,
but that's easily remedied with apt-get install bashdb. Once it's installed, you can debug the script you're working
on with F9 or Bash -> Run -> start debugger.
If you want a "hard copy" -- a PostScript printout -- of your script, you can generate one by going to Bash -> Run -> hardcopy
to FILENAME.ps. This is where Bash Support comes in handy for any type of file, not just bash scripts. You can use this function
within any file to generate a PostScript printout.
Bash Support has several other functions to help run and test scripts from within Vim. One useful feature is syntax checking,
which you can access with Alt-F9. If you have no syntax errors, you'll get a quick OK. If there are problems, you'll
see a small window at the bottom of the Vim screen with a list of syntax errors. From that window you can highlight the error and
press Enter, and you'll be taken to the line with the error.
Put away the reference book...
Don't you hate it when you need to include a regular expression or a test in a script, but can't quite remember the syntax? That's
no problem when you're using Bash Support, because you have Regex and Tests menus with all you'll need. For example, if you need
to verify that a file exists and is owned by the correct user ID (UID), go to Bash -> Tests -> file exists and is owned by the effective
UID. Bash Support will insert the appropriate test ([ -O _]) with your cursor in the spot where you have to
fill in the file name.
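Filled in, the generated test can be used like this; only the [ -O ... ] test comes from the plugin, while the variable name and message are my own:

config_file="$HOME/.vimrc"               # hypothetical file to check
if [ -O "$config_file" ]; then
    echo "$config_file exists and is owned by the effective UID"
fi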
To build regular expressions quickly, go to the Bash menu, select Regex, then pick the appropriate expression from the list. It's
fairly useful when you can't remember exactly how to express "zero or one" or other regular expressions.
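As a quick reminder of what "zero or one" looks like in practice, the ? quantifier matches zero or one occurrence of the preceding item (this snippet is my own illustration, not plugin output):

word="color"
if [[ $word =~ ^colou?r$ ]]; then        # 'u?' matches zero or one 'u'
    echo "$word matches the pattern for both spellings of colour"
fi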
Bash Support also includes menus for environment variables, bash builtins, shell options, and a lot more.
Hotkey support
Vim users can access many of Bash Support's features using hotkeys. While not as simple as clicking the menu, the hotkeys do follow
a logical scheme that makes them easy to remember. For example, all of the comment functions are accessed with \c, so
if you want to insert a file header, you use \ch; if you want a date inserted, type \cd; and for a line
end comment, use \cl.
Statements can be accessed with \a. Use \ac for a case statement, \aie for an "if then
else" statement, \af for a "for in..." statement, and so on. Note that the online docs are incorrect here, and indicate
that statements begin with \s, but Bash Support ships with a PDF reference card (under .vim/bash-support/doc/bash-hot-keys.pdf)
that gets it right.
Run commands are accessed with \r. For example, to save the file and run a script, use \rr; to make
a script executable, use \re; and to start the debugger, type \rd. I won't try to detail all of the
shortcuts, but you can pull
up a reference using :help bashsupport-usage-vim when in Vim, or use the PDF. The full Bash Support reference is available
within Vim by running :help bashsupport, or you can read it
online.
Of course, we've covered only a small part of Bash Support's functionality. The next time you need to whip up a shell script,
try it using Vim with Bash Support. This plugin makes scripting in bash a lot easier.
Once in a while you may come across what would seem to be a 'killer' piece of software, or maybe a cool new programming language
- something that would appear to give you some advantage.
That MAY be the case, but many times it isn't really so - think twice before you leap!
Consider these points:
You will have to learn this new thingamabob - that takes time.
Often, new thingamabobs excel in one area and stink in others - problem is that it can take time to figure this
out.
Listen to the king: "Wise men say, only fools rush in."
Do you notice a pattern here?
Yes, it's all about time. All this junk (software, programming languages, markup languages, etc.) has one purpose in the
end: to save you time.
Keep that in mind when you approach things - ask yourself: will this actually save me time?
OO is definitely overkill for a lot of web projects. It seems to me that so many people use OO frameworks like Ruby and Zope because
"it's enterprise level". But using an 'enterprise' framework for small to medium sized web applications just adds so much overhead
and frustration at having to learn the framework that it just doesn't seem worth it to me.
Having said all this I must point out that I'm distrustful of large corporations and hate their dehumanising hierarchical
structure. Therefore I am naturally drawn towards open source and away from the whole OO/enterprise/hierarchy paradigm. Maybe
people want to push open source to the enterprise level in the hope that they will adopt the technology and therefore they will have
more job security. Get over it - go and learn Java and .NET if you want job security and preserve open source software as an oasis
of freedom away from the corporate world. Just my 2c
===
OOP has its place, but the diversity of frameworks is just as challenging to figure out as a new class you didn't write,
if not more. None of them work the same or keep a standard convention between them that makes learning them easier. Frameworks
are great, but sometimes I think maybe they don't all have to be OO. I keep a small personal library of functions I've (and others
have) written procedurally and include them just like I would a class.
Beyond the overhead issues is complexity. OOP has you chasing declarations over many files to figure out what's happening.
If you're trying to learn how that unique class you need works, it can be time consuming to read through it and see how the class
is structured. By the time you're done, you may as well have written the class yourself; at least by then you'd have a solid understanding.
Encapsulation and polymorphism have their advantages, but the cost is complexity which can equal time. And for smaller projects that
will likely never expand, that time and energy can be a waste.
Not trying to bash OOP, just to defend procedural style. They each have their place.
===
Sorry, but I don't like your text, because you mix up Ruby and Ruby on Rails a lot. Ruby is in my opinion easier to use than PHP,
because PHP has no design principle besides "make it work, somehow easy to use". Ruby has some really cool stuff I miss quite often
when I have to program in PHP again (blocks, for example), and it has a clearer and more logical syntax.
Ruby on Rails is of course not that easy to use, at least when speaking about small-scale projects. This is because it does a lot
more than PHP does. Of course, there are other good reasons to prefer PHP over Rails (like better support by providers, more
modules, more documentation), but in my opinion most projects done in PHP with at least the complexity of a blog could profit from being
programmed in Rails, from a purely technical point of view. At least I won't program in PHP again unless a customer asks me.
===
I have a reasonable level of experience with PHP and Python but unfortunately haven't touched Ruby yet. They both seem to be a
good choice for low complexity projects. I can even say that I like Python a lot. But I would never consider it again for projects
where design is an issue. They also say it is for (rapid) prototyping. My experience is that as long as you can't afford a proper
IDE Python is maybe the best place to go to. But a properly "equipped" environment can formidably boost your productivity with a
statically typed language like Java. In that case Python's advantage shrinks to the benefits of quick tests accessible through its
command line.
Another problem of Python is that it wants to be everything: simple and complete, flexible and structured, high-level while allowing
for low-level programming. The result is a series of obscure features
Having said all that I must give Python all the credits of a good language. It's just not perfect. Maybe it's Ruby
My apologies for not sticking too closely to the subject of the article.
===
The one thing I hate is OOP geeks trying to prove that they can write code that does nothing useful and nobody understands.
"You don't have to use OOP in ruby! You can do it PHP way! So you better do your homework before making such statements!"
Then why use ruby in the first place?
"What is really OVERKILL to me, is to know the hundrets of functions, PHP provides out of the box, and available in ANY scope!
So I have to be extra careful whether I can use some name. And the more functions - the bigger the MESS."
On the other hand, in Ruby you use only the functions available for the particular object you use.
I would rather say: "some text".length than strlen("some text"); which is much more meaningful! The Ruby language itself is much more
descriptive. I remember, from my old PHP days, always having to look up php.net for the appropriate function, but now I can
just guess!"
Yeah, you must have a weak memory and can't remember whether strlen() is for strings or for numbers...
Doesn't Ruby have the same number of functions, just stored in objects?
Look, if you can't remember strlen then invent your own classes - you can make a whole useless OOP framework for PHP in a day...
Rails certainly looks beautiful. It is fully object oriented, with built in O/R mapping, powerful AJAX support, an elegant syntax,
a proper implementation of the Model-View-Controller design pattern, and even a Ruby to Javascript converter which lets you write
client side web code in Ruby.
However, I don't think it's the end of the line for C# and Java by a long shot. Even if it does draw a lot of fire, there is a
heck of a lot of code knocking around in these languages, and there likely still will be for a very long time to come. Even throwaway
code and hacked together interim solutions have a habit of living a lot longer than anyone ever expects. Look at how much code is
still out there in Fortran, COBOL and Lisp, for instance.
Like most scripting languages such as Perl, Python, PHP and so on, Ruby is still a dynamically typed language. For this reason
it will be slower than statically typed languages such as C#, C++ and Java. So it won't be used so much in places where you need
lots of raw power. However, most web applications don't need such raw power in the business layer. The main bottleneck in web development
is database access and network latency in communicating with the browser, so using C# rather than Rails would have only a very minor
impact on performance. But some of them do, and in such cases the solutions often have different parts of the application written
in different languages and even running on different servers. One of the solutions that we have developed, for instance, has a web
front end in PHP running on a Linux box, with a back end application server running a combination of Python and C++ on a Windows
server.
Rails certainly knocks the spots off PHP though...
...I have a hunch that the main branches of the evolutionary tree pass through the languages that have the smallest, cleanest
cores. The more of a language you can write in itself, the better.
...Languages evolve slowly because they're not really technologies. Languages are notation. A program is a formal
description of the problem you want a computer to solve for you. So the rate of evolution in programming languages is more like the
rate of evolution in mathematical notation than, say, transportation or communications. Mathematical notation does evolve, but not
with the giant leaps you see in technology.
...I learned to program when computer power was scarce. I can remember taking all the spaces out of my Basic programs so they
would fit into the memory of a 4K TRS-80. The thought of all this stupendously inefficient software burning up cycles doing the same
thing over and over seems kind of gross to me. But I think my intuitions here are wrong. I'm like someone who grew up poor, and can't
bear to spend money even for something important, like going to the doctor.
Some kinds of waste really are disgusting. SUVs, for example, would arguably be gross even if they ran on a fuel which would never
run out and generated no pollution. SUVs are gross because they're the solution to a gross problem. (How to make minivans look more
masculine.) But not all waste is bad. Now that we have the infrastructure to support it, counting the minutes of your long-distance
calls starts to seem niggling. If you have the resources, it's more elegant to think of all phone calls as one kind of thing, no
matter where the other person is.
There's good waste, and bad waste. I'm interested in good waste-- the kind where, by spending more, we can get simpler designs.
How will we take advantage of the opportunities to waste cycles that we'll get from new, faster hardware?
The desire for speed is so deeply engrained in us, with our puny computers, that it will take a conscious effort to overcome it.
In language design, we should be consciously seeking out situations where we can trade efficiency for even the smallest increase
in convenience.
Most data structures exist because of speed. For example, many languages today have both strings and lists. Semantically,
strings are more or less a subset of lists in which the elements are characters. So why do you need a separate data type? You don't,
really. Strings only exist for efficiency. But it's lame to clutter up the semantics of the language with hacks to make programs
run faster. Having strings in a language seems to be a case of premature optimization.
... Inefficient software isn't gross. What's gross is a language that makes programmers do needless work. Wasting programmer time
is the true inefficiency, not wasting machine time. This will become ever more clear as computers get faster
... Somehow the idea of reusability got attached to object-oriented programming in the 1980s, and no amount of evidence to the
contrary seems to be able to shake it free. But although some object-oriented software is reusable, what makes it reusable is its
bottom-upness, not its object-orientedness. Consider libraries: they're reusable because they're language, whether they're written
in an object-oriented style or not.
I don't predict the demise of object-oriented programming, by the way. Though I don't think it has much to offer good programmers,
except in certain specialized domains, it is irresistible to large organizations. Object-oriented programming offers a sustainable
way to write spaghetti code. It lets you accrete programs as a series of patches. Large organizations always tend to develop
software this way, and I expect this to be as true in a hundred years as it is today.
... As this gap widens, profilers will become increasingly important. Little attention is paid to profiling now.
Many people still seem to believe that the way to get fast applications is to write compilers that generate fast code. As the gap
between acceptable and maximal performance widens, it will become increasingly clear that the way to get fast applications is to
have a good guide from one to the other.
...One of the most exciting trends in the last ten years has been the rise of open-source languages like Perl, Python, and Ruby.
Language design is being taken over by hackers. The results so far are messy, but encouraging. There are some stunningly novel
ideas in Perl, for example. Many are stunningly bad, but that's always true of ambitious efforts. At its current rate of
mutation, God knows what Perl might evolve into in a hundred years.
... One helpful trick here is to use the length of the program
as an approximation for how much work it is to write. Not the length in characters, of course, but the length in distinct syntactic
elements-- basically, the size of the parse tree. It may not be quite true that the shortest program is the least work to write,
but it's close enough that you're better off aiming for the solid target of brevity than the fuzzy, nearby one of least work.
Then the algorithm for language design becomes: look at a program and ask, is there any way to write this that's shorter?
Ralph Griswold, the creator of the Snobol and
Icon programming languages, died in October 2006
of cancer. Until recently computer science was a discipline where the founders were still around. That's changing. Griswold was an important
pioneer of programming language design; Snobol's string manipulation facilities were different from, and somewhat faster than, regular expressions.
Ralph Griswold died two weeks ago. He created several programming languages, most notably Snobol (in the 60s) and Icon (in the
70s) - both outstandingly innovative, integral, and efficacious in their areas. Despite the abundance of scripting and other languages
today, Snobol and Icon are still unsurpassed in many respects, both in elegance of design and in practicality.
Ralph E. Griswold died in Tucson on October 4, 2006, of complications from pancreatic cancer. He was Regents Professor Emeritus in
the Department of Computer Science at the University of Arizona.
Griswold was born in Modesto, California, in 1934. He was an award winner in the 1952 Westinghouse National Science Talent Search
and went on to attend Stanford University, culminating in a PhD in Electrical Engineering in 1962.
Griswold joined the staff of Bell Telephone Laboratories in Holmdel, New Jersey, and rose to become head of Programming Research
and Development. In 1971, he came to the University of Arizona to found the Department of Computer Science, and he served as department
head through 1981. His insistence on high standards brought the department recognition and respect. In recognition of his work the
university granted him the title of Regents Professor in 1990.
While at Bell Labs, Griswold led the design and implementation of the groundbreaking SNOBOL4 programming language with its emphasis
on string manipulation and high-level data structures. At Arizona, he developed the Icon programming language, a high-level language
whose influence can be seen in Python and other recent languages.
Griswold authored numerous books and articles about computer science. After retiring in 1997, his interests turned to weaving.
While researching mathematical aspects of weaving design he collected and digitized a large library of weaving documents and maintained
a public website. He published technical monographs and weaving designs that inspired the work of others, and he remained active
until his final week.
----- Gregg Townsend, Staff Scientist, The University of Arizona
Actually Spolsky does not understand the role of scripting languages. But he is right on target with his critique of OO. Object-oriented
programming is no silver bullet.
(InfoWorld) Joel Spolsky is one of our
most celebrated pundits on the practice of software development, and he's full of terrific insight. In a recent blog post, he decries
the fallacy of "Lego programming" -- the all-too-common
assumption that sophisticated new tools will make writing applications as easy as snapping together children's toys. It simply isn't
so, he says -- despite the fact that people have been claiming it for decades -- because the most important work in software development
happens before a single line of code is written.
By way of support, Spolsky reminds us of a quote from the most celebrated pundit of an earlier generation of developers. In his
1987 essay "No Silver Bullet,"
Frederick P. Brooks wrote, "The essence of a software entity is a construct of interlocking concepts ... I believe the hard
part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing
it and testing the fidelity of the representation ... If this is true, building software will always be hard. There is inherently
no silver bullet."
As Spolsky points out, in the 20 years since Brooks wrote "No Silver Bullet," countless products have reached the market heralded
as the silver bullet for effortless software development. Similarly, in the 30 years since Brooks published "
The Mythical Man-Month" -- in which, among other
things, he debunks the fallacy that if one programmer can do a job in ten months, ten programmers can do the same job in one month
-- product managers have continued to buy into various methodologies and tricks that claim to make running software projects as easy
as stacking Lego bricks.
Don't you believe it. If, as Brooks wrote, the hard part of software development is the initial design, then no amount of radical
workflows or agile development methods will get a struggling project out the door, any more than the latest GUI rapid-development
toolkit will.
And neither will open source. Too often, commercial software companies decide to turn over their orphaned software to "the
community" -- if such a thing exists
-- in the naive belief that open source will be a miracle cure to get a flagging project back on track. This is just another fallacy,
as history demonstrates.
In 1998, Netscape released the source code to its Mozilla browser to the public to much fanfare, but only lukewarm response from
developers. As it turned out, the Mozilla source was much too complex and of too poor quality for developers outside Netscape to
understand it. As Jamie Zawinski recounts, the resulting decision
to rewrite the browser's rendering engine from scratch delayed the project by anywhere from six to ten months.
This is a classic example of the fallacy of the mythical man-month. The problem with the Mozilla code was poor design, not
lack of an able workforce. Throwing more bodies at the project didn't necessarily help; it may have even hindered it. And
while implementing a community development process may have allowed Netscape to sidestep its own internal management problems, it
was certainly no silver bullet for success.
The key to developing good software the first time around is doing the hard work at the beginning: good design, and rigorous testing
of that design. Fail that, and you've got no choice but to take the hard road. As Brooks observed all those years ago, successful
software will never be easy. No amount of open source process will change that, and to think otherwise is just more Lego-programming
nonsense.
On the heels of last weekend's Ruby Conference in Denver (for a report, see
Jack Woehr's blog), Sun Microsystems made a Ruby-related announcement of its own. Led by Charles Nutter and Thomas Enebo, the
chief maintainers of JRuby, a 100% pure Java implementation of the Ruby
language, Sun has released JRuby 0.9.1. Among the features of this release are:
Overall performance is 50-60% faster than JRuby 0.9.0
New interpreter design
Refactoring of Method dispatch, code evaluation, and block dispatch code
Parser performance enhancement
Rewriting of Enumerable and StringScanner in Java
New syntax for including Java classes into Ruby
In related news, Ola Bini has been inducted into JRuby as a core developer during this development cycle.
Details are available at Thomas Enebo's blog and
Ola Bini's blog.
I was just looking at our BookScan data
mart to update a reporter on Java vs.
C# adoption. (The answer to his query: in the last twelve weeks, Java book sales are off 4% vs. the same period last year, while
C# book sales are up 16%.) While I was looking at the data, though, I noticed something perhaps more newsworthy: in the same period,
Ruby book sales surpassed Python book sales for the first time. Python is up 20% vs. the same period last year, but
Ruby is up 1552%! (Perl is down 3%.) Perl is still the most commonly used of the three languages, at least according to book
sales, but Python and now Ruby are narrowing the gap.
RoR, AJAX, SOA -- these are hype. The reality is that Java, JSP, PHP, Python, Perl, and Tcl/Tk are what people use these days.
And if it's a web app, they use PHP or JSP.
RoR is too time-consuming. It takes too long to learn and provides too few results in that timeframe compared to a PHP developer.
AJAX is also time-consuming and assumes too much about the stability of the browser client. And it puts way too much power into
ugly Javascript. It's good for only a smidgen of things in only a smidgen of environments.
Java and JSP are around only because of seniority. They were hyped and backed by big companies as the only other option to Microsoft's
stuff. Now we're stuck with them. In reality, JSP and Java programmers spend too much time in meetings going over OOP
minutiae. PHP programmers may use a little OOP, but they focus mostly on just getting it done.
Python seems to have taken hold on Linux only as far as a rich client environment. It's not used enough for web apps.
Re: The departure
of the hyper-enthusiasts Posted: Dec 18, 2005 5:54 PM
The issue at hand is comparable to the "to use or not to use EJB". I, too, had a bad time trying to use EJBs, so maybe you can
demonstrate some sympathy for a now-Ruby user who can't seem to use any other language.
I claim that Ruby is simple enough for me to concentrate on the problem and not on the language (primary tool). Maybe for you
Python is cleaner, but to me Python is harder than Ruby when I try to read the code. Ruby has a nice convention of "CamelCase", "methods_names",
"AClass.new", etc, that make the code easier to read than a similar Python code, because in Python you don't have a good convention
for that. Also, when I require 'afile.rb' in Ruby, it's much easier to read than the "import this.that.Something" in Python. Thus,
despite the forced indentation, I prefer the way that the Ruby code looks and feels in comparison to the Python code.
On the supported libraries, Python has a very good selection, indeed. I would say that the Python libraries might be very good
in comparison to the Ruby libraries. On the other hand, Ruby has very unique libraries which feel good to use. So, even if Python
has more libraries, Ruby should have some quality libraries that compensate a lot for the difference. Considering that one should
be well served by either Ruby or Python in terms of libraries, Python's advantage over Ruby diminishes quite a bit.
Finally, when you are up to the task of creating something new, like a library or program, you may be much better served by
using Ruby if you get to the point of fluid Ruby programming. But, if all you want is to create some web app, maybe Rails already
fulfills your requirements.
Even if you consider Python better, for example, because Google uses it and you want to work for Google or something, that
won't make us Ruby users give up on improving the language and the available tools. I simply love Ruby and I will keep using
it for the foreseeable future -- in the future, if I can make a major contribution to the Ruby community, I will.
The author is really incoherent in this rant (also, he cannot be right by definition, as he loves Emacs ;-). The key problem with
the arguments presented is that he mixes apples and oranges (greatness of a programmer as an artist versus greatness of a programmer as
an innovator). The strong point of this rant is the idea that easy extensibility is a huge advantage and that openness of code does not matter
much per se. Another good observation (made by many other authors) is that "Any sufficiently flexible and programmable environment
- say Emacs, or Ruby on Rails, or Firefox, or even my game Wyvern - begins to take on characteristics of ... operating system as it
grows."
Any sufficiently flexible and programmable environment - say Emacs, or Ruby on Rails, or Firefox, or even my game Wyvern -
begins to take on characteristics of both language and operating system as it grows. So I'm lumping together a big class
of programs that have similar characteristics. I guess you could call them frameworks, or extensible systems.
... ... ...
Not that we'd really know, because how often do we go look at the source code for the frameworks we use? How much
time have you spent examining the source code of your favorite programming language's compiler, interpreter or VM? And by the
time such systems reach sufficient size and usefulness, how much of that code was actually penned by the original author?
Sure, we might go look at framework code sometimes. But it just looks like, well, code. There's usually nothing particularly
famous-looking or even glamorous about it. Go look at the source code for Emacs or Rails or Python or Firefox, and it's just
a big ball of code. In fact, often as not it's a big hairy ball, and the original author is focused on refactoring or even
rewriting big sections of it.
I was in Barnes today, doing my usual weekend stroll through the tech section. Helps me keep up on the latest trends.
And wouldn't you know it, I skipped a few weeks there, and suddenly Ruby and Rails have almost as many books out as Python. I counted
eleven Ruby/RoR titles tonight, and thirteen for Python (including one Zope book). And Ruby had a big display section at the
end of one of the shelves.
Not all the publishers were O'Reilly and Pragmatic Press. I'm pretty sure there were two or three others there, so it's not just
a plot by Tim O'Reilly to sell more books. Well, actually that's exactly what it is, but it's based on actual market research that
led him to the conclusion that Rails and Ruby are both gathering steam like nobody's business.
... ... ...
I do a lot more programming in Python than in Ruby -- Jython in my game server, and Python at work, since that's what everyone
there uses for scripting. I have maybe 3x more experience with Python than with Ruby (and 10x more experience with Perl). But Perl
and Python both have more unnecessary conceptual overhead, so I find I have to consult the docs more often with both of them. And
when all's said and done, Ruby code generally winds up being the most direct and succinct, whether it's mine or someone else's.
I have a lot of trouble writing about Ruby, because I find there's nothing to say. It's why I almost never post to the O'Reilly
Ruby blog. Ruby seems so self-explanatory to me. It makes it almost boring; you try to focus on Ruby and you wind up talking about
some problem domain instead of the language. I think that's the goal of all programming languages, but so far Ruby's one of the few
to succeed at it so well.
... ... ...
I think next year Ruby's going to be muscling in on Perl in terms of mindshare, or shelf-share, at B&N.
While interpreted programming languages such as Perl, Python, PHP, and Ruby are increasingly favored for Web applications -- and
have long been preferred for automating system administration tasks -- compiled programming languages such as C and C++ are still
necessary. The performance of compiled programming languages remains unmatched (exceeded only by the performance of hand-tuned assembly),
and certain software -- including operating systems and device drivers -- can only be implemented efficiently using compiled code.
Indeed, whenever software and hardware need to mesh seamlessly, programmers instinctively reach for a C compiler: C is primitive
enough to get "close to the bare metal" -- that is, to capture the idiosyncrasies of a piece of hardware -- yet expressive enough
to offer some high-level programming constructs, such as structures, loops, named variables, and scope.
However, scripting languages
have distinct advantages, too. For example, after a language's interpreter is successfully ported to a platform, the vast majority
of scripts written in that language run on the new platform unchanged -- free of dependencies such as system-specific function libraries.
(Think of the many DLL files of the Microsoft® Windows® operating system or the many libcs of UNIX® and Linux®.) Additionally, scripting
languages typically offer higher-level programming constructs and convenience operations, which programmers claim boost productivity
and agility. Moreover, programmers working in an interpreted language can work faster, because the compilation and link steps are
unnecessary. The "code, build, link, run" cycle of C and its ilk is reduced to a hastened "script, run."
Like every scripting language, Lua has its own peculiarities:
Lua types. In Lua, values have types, but variables are dynamically typed. The nil, boolean, number, and
string types work as you might expect.
Nil is the type of the special value nil and is used to represent the absence of a value.
Boolean is the type of the constants true and false. (In conditions, nil and false both count as false,
and any other value counts as true.)
All numbers in Lua are doubles (but you can easily build code to realize other numeric types).
A string is an immutable array of characters. (Hence, to append to a string, you must make a copy of it.)
The table, function, and thread types are all references. Each can be assigned to a variable, passed as an argument,
or returned from a function. For instance, here's an example of storing a function:
-- example of an anonymous function
-- returned as a value
-- see http://www.tecgraf.puc-rio.br/~lhf/ftp/doc/hopl.pdf
function add(x)
  return function (y) return (x + y) end
end
f = add(2)
print(type(f), f(10))   --> function    12
Lua threads. A thread is a co-routine created by calling the built-in function coroutine.create(f),
where f is a Lua function. Threads do not start when created; instead, a co-routine is started after the fact, using
coroutine.resume(t), where t is a thread. Every co-routine must occasionally yield the processor to other
co-routines using coroutine.yield().
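To make the resume/yield handshake concrete, here is a minimal sketch (the co-routine below is illustrative, not from the article); each resume returns true plus whatever the co-routine yields or returns:
-- illustrative co-routine: the first resume passes 3 as the argument n
co = coroutine.create(function (n)
  for i = 1, n do
    coroutine.yield(i)            -- hand control back to the caller
  end
  return "done"
end)
print(coroutine.resume(co, 3))    --> true    1
print(coroutine.resume(co))       --> true    2
print(coroutine.resume(co))       --> true    3
print(coroutine.resume(co))       --> true    done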
Assignment statements. Lua allows multiple assignments, and expressions are evaluated first and are then assigned.
For example, the statements:
i = 3
a = {1, 3, 5, 7, 9}
i, a[i], a[i+1], b = i+1, a[i+1], a[i]
print (i, a[3], a[4], b, I)
produce 4 7 5 nil nil. If the list of variables is larger than the list of values, excess variables are assigned
nil; hence, b is nil. If there are more values than variables, extra values are simply discarded. In
Lua, variable names are case-sensitive, explaining why I is nil.
Chunks. A chunk is any sequence of Lua statements. A chunk can be stored in a file or in a string in a Lua program.
Each chunk is executed as the body of an anonymous function. Therefore, a chunk can define local variables and return values.
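For example, a chunk compiled from a string behaves exactly like the body of an anonymous function; the sketch below uses loadstring, which is the Lua 5.1 name (later versions spell it load):
-- compile a chunk from a string: the local stays private to the chunk,
-- while the return value comes back to the caller
chunk = loadstring("local x = 21 return x * 2")
print(chunk())                    --> 42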
More cool stuff. Lua has a mark-and-sweep garbage collector. As of Lua 5.1, the garbage collector works incrementally.
Lua has full lexical closures (like Scheme and unlike Python). And Lua has reliable tail call semantics (again, like Scheme and
unlike Python).
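A small sketch of both features (illustrative code, not from the article): counter relies on a full closure over n, and loop relies on proper tail calls, so calling it would run forever without ever growing the stack.
-- closure: each call to counter() captures its own private n
function counter()
  local n = 0
  return function () n = n + 1; return n end
end
c = counter()
print(c())                        --> 1
print(c())                        --> 2

-- proper tail call: the return call reuses the current stack frame
function loop(i)
  return loop(i + 1)
end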
Find more examples of Lua code in Programming in Lua and in the Lua-users wiki (for links, see the
Resources section below).
As in all engineering pursuits, choosing between a compiled language and an interpreted language means measuring the pros and
cons of each in context, weighing the trade-offs, and accepting compromises.
Several weeks ago there was a notable bit of controversy
over some comments made by James Gosling, father of the Java programming language. He has since
addressed the flame war that erupted,
but the whole ordeal got me thinking seriously about PHP and its scalability and performance abilities compared to Java. I knew that
several hugely popular Web 2.0 applications were written in scripting languages like PHP, so I contacted Owen Byrne - Senior Software
Engineer at digg.com to learn how he addressed any problems they encountered during
their meteoric growth. This article addresses the all-too-common false assumptions about the cost of scalability and performance in
PHP applications.
At the time Gosling's comments were made, I was working on tuning and optimizing the source code and server configuration for
the launch of Jobby, a Web 2.0 resume tracking application written using the
WASP PHP framework. I really hadn't done any substantial research on how
to best optimize PHP applications at the time. My background is heavy in the architecture and development of highly scalable applications
in Java, but I realized there were enough substantial differences between Java and PHP to cause me concern. In my experience, it
was certainly faster to develop web applications in languages like PHP; but I was curious as to how much of that time savings might
be lost to performance tuning and scaling costs. What I found was both encouraging and surprising.
What are Performance and Scalability?
Before I go on, I want to make sure the ideas of performance and scalability are understood. Performance is measured by the output
behavior of the application. In other words, performance is whether or not the app is fast. A well-performing web application is
expected to render a page in around 1 second or less (depending on the complexity of the page, of course). Scalability is the ability
of the application to maintain good performance under heavy load with the addition of resources. For example, as the popularity of
a web application grows, it can be called scalable if you can maintain good performance metrics by simply making small hardware additions.
With that in mind, I wondered how PHP would perform under heavy load, and whether it would scale well compared with Java.
Hardware Cost
My first concern was raw horsepower. Executing scripting language code is more hardware intensive because the code isn't compiled.
The hardware we had available for the launch of Jobby was a single hosted Linux server with a 2GHz processor and 1GB of RAM. On this
single modest server I was going to have to run both Apache 2 and MySQL. Previous applications I had worked on in Java had been deployed
on 10-20 application servers with at least 2 dedicated, massively parallel, ultra expensive database servers. Of course, these applications
handled traffic in the millions of hits per month.
To get a better idea of what was in store for a heavily loaded PHP application, I set up an interview with Owen Byrne, cofounder
and Senior Software Engineer at digg.com. From talking with Owen I learned digg.com
gets on the order of 200 million page views per month, and they're able to handle it with only 3 web servers and 8 small database
servers (I'll discuss the reason for so many database servers in the next section). Even better news was that they were able
to handle their first year's worth of growth on a single hosted server like the one I was using. My hardware worries were relieved.
The hardware requirements to run high-traffic PHP applications didn't seem to be more costly than for Java.
Database Cost
Next I was worried about database costs. The enterprise Java applications I had worked on were powered by expensive database software
like Oracle, Informix, and DB2. I had decided early on to use MySQL for my database, which is of course free. I wondered whether
the simplicity of MySQL would be a liability when it came to trying to squeeze the last bit of performance out of the database. MySQL
has had a reputation for being slow in the past, but most of that seems to have come from sub-optimal configuration and the overuse
of MyISAM tables. Owen confirmed that the use of InnoDB for tables for read/write data makes a massive performance difference.
There are some scalability issues with MySQL, one being the need for large numbers of slave databases. However, these issues are
decidedly not PHP related, and are being addressed in future versions of MySQL. It could be argued that even with the large number
of slave databases that are needed, the hardware required to support them is less expensive than the 8+ CPU boxes that typically
power large Oracle or DB2 databases. The database requirements to run massive PHP applications still weren't more costly than for
Java.
PHP Coding Cost
Lastly, and most importantly, I was worried about scalability and performance costs directly attributed to the PHP language itself.
During my conversation with Owen I asked him if there were any performance or scalability problems he encountered that were related
to having chosen to write the application in PHP. A bit to my surprise, he responded by saying, "none of the scaling challenges we
faced had anything to do with PHP," and that "the biggest issues faced were database related." He even added, "in fact, we found
that the lightweight nature of PHP allowed us to easily move processing tasks from the database to PHP in order to deal with that
problem." Owen mentioned they use the APC PHP accelerator platform
as well as MCache to lighten their database load. Still, I was
skeptical. I had written Jobby entirely in PHP 5 using a framework which uses a highly object oriented MVC architecture to provide
application development scalability. How would this hold up to large amounts of traffic?
My worries were largely related to the PHP engine having to effectively parse and interpret every included class on each page
load. I discovered this was just my misunderstanding of the best way to configure a PHP server. After doing some research, I found
that by using a combination of Apache 2's worker threads, FastCGI, and a PHP accelerator, this was no longer a problem. Any class
or script loading overhead was only encountered on the first page load. Subsequent page loads showed performance comparable to
a typical Java application. Making these configuration changes was trivial and generated massive performance gains. With regard
to scalability and performance, PHP itself, even PHP 5 with heavy OO, was not more costly than Java.
Conclusion
Jobby was launched successfully on its single modest server and, thanks to links from
Ajaxian and
TechCrunch, went on to happily survive hundreds
of thousands of hits in a single week. Assuming I applied all of my new found PHP tuning knowledge correctly, the application should
be able to handle much more load on its current hardware.
Digg is in the process of preparing to scale to 10 times current load. I asked Owen Byrne if that meant an increase in headcount
and he said that wasn't necessary. The only real change they identified was a switch to a different database platform. There doesn't
seem to be any additional manpower cost to PHP scalability either.
It turns out that it really is fast and cheap to develop applications in PHP. Scaling and performance challenges are
almost always related to the data layer, and are common across all language platforms. Even as a self-proclaimed PHP evangelist,
I was very startled to find out that all of the theories I was subscribing to were true. There is simply no truth to the idea that
Java is better than scripting languages at writing scalable web applications. I won't go as far as to say that PHP is better than
Java, because it is never that simple. However it just isn't true to say that PHP doesn't scale, and with the rise of Web 2.0, sites
like Digg, Flickr, and even
Jobby are proving that large scale applications can be rapidly built and maintained
on-the-cheap, by one or two developers.
Run existing AutoIt v2 scripts and enhance them with
new capabilities.
Convert any script into an EXE file that
can be run on computers that don't have AutoHotkey installed.
Getting started might be easier than you think. Check out the
quick-start tutorial.
More About Hotkeys
AutoHotkey unleashes the full potential of your keyboard, joystick, and mouse. For example, in addition to the typical Control,
Alt, and Shift modifiers, you can use the Windows key and the Capslock key as modifiers. In fact, you can make any key or mouse button
act as a modifier. For these and other capabilities, see
Advanced Hotkeys.
Other Features
Change the volume, mute, and other settings
of any soundcard.
There's no reason that web developers should have all the fun. Web 2.0 APIs enable fascinating collaborations between developers
and an extended community of developer-users. Extension and configuration APIs added to traditional applications can generate the
same benefits.
Of course, extensibility isn't a particularly new idea. Many applications have a plugin framework (think Photoshop) or an extension
language (think Emacs). What if you could provide a seamlessly integrated, fully dynamic language with a conventional syntax
while increasing your application's size by less than 200K on an x86? You can do it with Lua!
Virtually anyone with any kind of programming experience should find Lua's syntax concise and easy to read. Two dashes introduce
comments. An end statement delimits control structures (if, for, while). All
variables are global unless explicitly declared local. Lua's fundamental data types include numbers
(typically represented as double-precision floating-point values), strings, and Booleans. Lua has true and false
as keywords; any expression that does not evaluate to nil or false is true. Note that 0 and arithmetic expressions
that evaluate to 0 do not evaluate to nil. Thus Lua considers them as true when you use them as part of
a conditional statement.
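A two-line sketch of that rule (illustrative):
if 0 then print("0 counts as true in Lua") end      --> 0 counts as true in Lua
if not nil then print("nil counts as false") end    --> nil counts as false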
Finally, Lua supports userdata as one of its fundamental data types. By definition, a userdata
value can hold an ANSI C pointer and thus is useful for passing data references back and forth across the C-Lua boundary.
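You normally meet userdata only through C libraries; for instance, the file handles exposed by Lua's standard io library are userdata values wrapping C-side state:
-- file handles are userdata created by the C side of the io library
print(type(io.stdout))            --> userdata
print(type("some text"))          --> string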
Despite the small size of the Lua interpreter, the language itself is quite rich. Lua uses subtle but powerful forms of syntactic
sugar to allow the language to be used in a natural way in a variety of problem domains, without adding complexity (or size) to the
underlying virtual machine. The carefully chosen sugar results in very clean-looking Lua programs that effectively convey the nature
of the problem being solved.
The only built-in data structure in Lua is the table. Perl programmers will recognize this as a hash; Python programmers will
no doubt see a dictionary. Here are some examples of table usage in Lua:
a = {} -- Initializes an empty table
a[1] = "Fred" -- Assigns "Fred" to the entry indexed by the number 1
a["1"] = 7 -- Assigns the number 7 to the entry indexed by the string "1"
Any Lua data type can serve as a table index, making tables a very powerful construct in and of themselves. Lua extends the capabilities
of the table by providing different syntactic styles for referencing table data. The standard table constructor looks like this:
t = { "Name"="Keith", "Address"="Ballston Lake, New York"}
A table constructor written like
t2 = { "First", "Second", "Third" }
is the equivalent of
t3 = { [1] = "First", [2] = "Second", [3] = "Third" }
This last form essentially initializes a table that for all practical purposes behaves as an array. Arrays created in this way
have as their first index the integer 1 rather than 0, as is the case in other languages.
The following two forms of accessing the table are equivalent when the table keys are strings:
t3["Name"] = "Keith"
t3.Name = "Keith"
Tables behave like a standard struct or record when accessed in this fashion.
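Because the record style and the array style can coexist in a single table, a short sketch of the two "views" may help (the person table here is just an illustration): ipairs walks only the array part, while pairs visits every key.
person = { name = "Keith", "first", "second" }    -- record part plus array part
for i, v in ipairs(person) do print(i, v) end     --> 1 first, then 2 second
for k, v in pairs(person) do print(k, v) end      -- visits 1, 2, and "name" (order not guaranteed)
print(person.name, person[1])                     --> Keith   first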
Simkin started life in 1995. At that time Lateral Arts' Simon Whiteside
was involved in the development of "Animals of Farthing Wood", an adventure game being produced by the BBC. Simon was asked to produce
the game code. "When I started the project it became clear that while the game designers had clear objectives for what they were
trying to achieve, the details of much of the game were not defined," says Simon. "Faced with the prospect of rewriting sections of
the game, which was written in C running on Windows 3.0, as the design progressed (something I realized was going to be time consuming), I looked
for some alternative solutions." Simon's initial solution was to allow the game to be manipulated using configuration files, but
as time progressed the need for an expression evaluator was identified, and later loops were added to give greater control and
flexibility, and so the scripting language emerged.
From the Farthing Wood project Simon took this technology with him to a project for
Sibelius, the best-selling music notation application. The developers of Sibelius
wanted to add a macro language to provide Sibelius with a macro capability similar to the facilities available in a word processor.
Simon created this feature using Simkin to provide the Sibelius plug-in.
When Simon left Sibelius in 1997 he decided to make Simkin available as a product and, after productizing it, spent about six months
working on licensing it. In that period he sold a couple of licenses but eventually realized that his core interest was
in bespoke application development. Rather than let the product die, Simon made the decision to release it as an open source project.
So in 1999 it was released through SourceForge. "Simkin certainly gained interest as an open source product," says Simon. "I received
a lot of feedback and several bug fixes, so I was happy that open source was the right way to go with Simkin."
Since Simon open sourced Simkin he has developed Java and XML versions, as well as a pilot J2ME version.
The Symbian version started with an inquiry from Hewlett-Packard in early 2002. Hewlett-Packard Research Laboratories, Europe
were running the Bristol Wearable Computing Project in partnership
with Bristol University. The project involves looking at various applications for wearable computing devices, ranging from games
to guides. One application provides a guide to the works in the city art gallery, fed with information from wireless access
points that had been set up around Bristol. As part of the project HP wanted to build an interactive game to run on the HP iPAQ.
To provide the games with a simple mechanism to customize it, they approached Simon to port Simkin to the iPAQ and so provide the
ability to use XML schemas to describe elements of the game.
"Once we had done that HP wanted to extend the project to use phones," says Simon. "They had identified Symbian OS phones as the
emerging technology in this arena and they asked me to do a port." Through contacts Simon approached Symbian who provided comprehensive
support in porting Simkin. However HP did not proceed with the use of Symbian phones in the wearables project, although Simon notes
that there has been a fair amount of interest from Symbian developers since the port was made available through SourceForge.
In porting to Symbian Simon wanted to retain source code compatibility with the other versions of Simkin. "Maintaining compatibility
created two main challenges due to the fact that Symbian C++ does not include the ability to process C++ exceptions and you cannot
use the Symbian leave process in a C++ constructor," says Simon. "I managed to overcome most of these problems by using C++ macros,
part of which I had started for the HP port, as Windows CE also lacked support for exceptions. For the most part this approach worked,
but there were still some places that needed particular code for Symbian."
Simkin is not a language that can be used to develop
applications from scratch. As Simon describes it: "Simkin is a language that can be used to configure application behavior; I call
it an embeddable scripting language. So it bolts onto an application to allow a script to make the final decisions about the application's
behavior, or allows users to control aspects of what the application does, but the real functionality is still in the host application."
Simon believes Simkin is well suited to games, where performance is an issue, as the intrinsic game functions can be developed in
C or C++ but then controlled by the lightweight Simkin. "Using a conventional scripting language would simply not be possible for
that type of application," says Simon.
It's kind of funny to see a negative opinion about Ruby, this "Perl with exceptions, co-routines and OO done right", from an OO
evangelist who managed to ride the Java wave, chanting the standard mantras "Java is a great OO language" and "OO programming in Java is holy and
the best thing on Earth since sliced bread" to the fullest extent possible. All those cries were enthusiastically performed despite
the fact that Bruce understands C++ well enough to have felt all the deficiencies of Java from day one.
What the author misses here is that the length of the program (an indirect expression of the level of the language) is an extremely
important measure of the success of the language design. And here Ruby beats Python and trashes Java.
BTW, by this measure Java, with all its OO holiness, is a miserable failure in comparison with Ruby or Python, as most Java programs
are even more verbose than the equivalent programs in C++.
Actually, if you read Thinking in Java attentively you realize that
Bruce Eckel is more of a language-feature-collector type of person (a type very good for sitting on language standardization committees)
than a software/language architect type of person.
Also, he by and large depends on the correct choice of the "next hype wave" for the success of his consulting business and might be slightly
resentful that Ruby recently got a lot of positive press when he bet on Python. But the problem for consultants like Bruce might be
not Python vs. Ruby, but that there might be no "next wave" at all.
December 18, 2005
Ruby is to Perl what C++ was to C. Ruby improves and simplifies the Perl language (the name "Ruby" is even a tribute to Perl),
and adds workable OO features (If you've ever tried to use classes or references in Perl, you know what I'm talking about. I have
no idea whether Perl 6 will rectify this or not, but I stopped paying attention long ago). But it also seems to carry forward some
of the Perl warts. For anyone used to, and tired of, Perl, this certainly seems like a huge improvement, but I'm tired of all impositions
by a language upon my thinking process, and so arbitrary naming conventions, reversed syntax and begin-end statements all seem like
impediments to me.
But it's hard to argue that Ruby hasn't moved things forward, just like C++ and Java have. It has clearly shaken the Python community
up a little; for a long time they were on the high ground of pure, clean language design. ... " Ruby, for example has coroutines,
as I learned from Tate's book. The expression of coroutines in Ruby (at least, according to Tate's example) is awkward, but they
are there, and I suspect that this may be why coroutines -- albeit in a much more elegant form -- are appearing in Python 2.5.
Python's coroutines also allow straightforward continuations, and so we may see continuation servers implemented using Python 2.5.
... ... ...
... the resulting code has 20 times the visual bulk of a simpler approach. One of the basic tenets of the Python language
has been that code should be simple and clear to express and to read, and Ruby has followed this idea, although not as far
as Python has because of the inherited Perlisms. But for someone who has invested Herculean effort to use EJBs just to baby-sit a
database, Rails must seem like the essence of simplicity. The understandable reaction for such a person is that everything they did
in Java was a waste of time, and that Ruby is the one true path.
... ... ...
So -- sorry, Jim (Fulton, not Kirk) -- I'm going to find something drop-dead simple to solve my drop-dead simple problems. Probably
PHP5, which actually includes most of Java and C++ syntax, amazingly enough, and I wonder if that isn't what made IBM adopt it.
... ... ...
However, I can't see Ruby, or anything other than C#, impacting the direction of the Java language, because of the way things
have always happened in the Java world. And I think the direction that C# 3.0 is taking may be too forward-thinking for Java to catch up
to.
But here's something interesting. I was on the C++ standards committee from the initial meeting and for about 8 years. When Java
burst on the scene with its onslaught of Sun marketing, a number of people on the standards committee told me they were moving over
to Java, and stopped coming to meetings. And although some users of Python like Martin Fowler (who, it could be argued, was
actually a Smalltalk programmer looking for a substitute, because the Smalltalk relationship never really worked out in the real
world) have moved to Ruby, I have not heard of any of the rather significant core of Python language and library developers saying
"hey, this Ruby thing really solves a lot of problems we've been having in Python, I'm going over there." Instead, they write PEPs
(Python Enhancement Proposals) and morph the language to incorporate the good features.
Dick Ford Re: The departure
of the hyper-enthusiasts
Remember, both Tate and Eckel make a living writing and talking about programming technologies. So both
have to be looking down the road to when Java goes into COBOL-like legacy status. It takes a lot of investment in time to
learn a programming language and its libraries well enough to write and lecture about them. So if Ruby "is the one"
to make it into the enterprise eventually and Python never makes the leap, then that's a huge amount of re-tooling that Eckel
has to do. It looks like he's trying to protect his Python investment.
Lars Stitz Re: The departure
of the hyper-enthusiasts
The "hyper-enthusiasts", as they are euphemistically called by the article, are no more (and no less)
than a bunch of consultants who want to advertise their expertise in order to gain more consulting gigs. They are not
that outspoken because they know more or are brighter than other developers, but because their blogs and websites like Artima
or TheServerSide grant them the benefit of free publicity.
Now, as mainstream development has moved
to Javaland for good, these consultants are not needed anymore. Everybody and their dog can write a good Java application that
uses decent frameworks and OR mappers and thus performs well in most tasks. So, more enthusiastic propaganda for Java does
not pay the bills for these folks anymore -- they have to discover a new niche market where their services are still needed. In
this case, what could be better than a language that only a few people know yet? Instantly, the consultant's services appear
valuable again!
My advice: Don't follow the hype unless you have good reasons to do so. "A cobbler should stick to his last," as we say
in Germany. Sure, PHP, Perl, Ruby, and C# all have their place in software development. But they are not going to replace Java -- for
now. For that, their benefits over Java are still too small.
Very glad that you touched on Tate's book. How about "I've never written a single EJB in my life"
from the author of "Bitter EJB"?..
I am sensing quite a bit of commercial pressure from Bruce and his camp. They are
simply not making enough margin teaching Java anymore. To make that margin, you have to work hard, maybe not as
hard as B. Eckel but still really hard: play with the intricacies of the language, dig ever deeper and deeper, invest the time
to write a book... But that's hard to do; kayaking is way more interesting.
So, since it seems like Ruby has potential, why not throw a book or two at it and run a few $1000-a-day courses... If you read "Beyond
Java", this is exactly what Jason Hunter says in his interview.
I think the mercantilism of the "hyper-enthusiasts" is yet to be analyzed.
That said, I am not buying another Tate book ever again, no matter how "pragmatic" it is.
I think you are mainly right about the reasons people are moving to Ruby, and about its influence on the Python and
Java languages.
But by saying that "Java-on-rails might actually tempt me into creating a web app using Java again" and by comparing development
in Ruby to that in EJB 1/2 (or even EJB 3), you are missing the fact that part of the server-side Java community has already
moved to lightweight approaches such as the Spring Framework.
After a year and a half of experience working as a
Spring web developer I must admit that server-side Java development can be much simpler with lightweight approaches than
it was in the EJB 1/2 times.
> Does anyone have an Open Source Ruby application they can
> point me to besides "ROR" (preferably a desktop app)? I
> wouldn't mind analysing some of its code, if one exists,
> so I can get a better sense of why it's going to be a great
> desktop application programming language for me to use.
How about a Ruby/Cocoa application? I'm speaking of the graphical
TestRunner for Ruby's Test::Unit I wrote: http://rubyforge.org/projects/crtestrunner/
It provides an interface similar to JUnit's for running Ruby unit tests.
Also check out Rake and RubyGems. Actually any of the top projects on RubyForge are worth digging into.
DougHolton Re: The departure
of the hyper-enthusiasts
I highly recommend skimming through these short resources to get a much more in-depth feel for what Ruby
is like, if you are already familiar with Java or Python like me. I did recently and I am finally "getting" Ruby much better
(the Perl-isms turned me off from seriously looking at it earlier, just like it took me a year to get over Python's indenting):
Things I like:
-blocks
-you can be more expressive in ruby and essentially twist it into different domain-specific languages, see:
http://blog.ianbicking.org/ruby-python-power.html
-I like how standalone functions essentially become protected extensions of the object class (like C# 3.0 extension methods): http://www.rubyist.net/~slagell/ruby/accesscontrol.html
-using "end" instead of curly braces (easier for beginners and more readable)
Things I don't like and never will:
-awkward syntax for some things like symbols and properties
-awful perlisms like $_,$$,$0,$1,?,<<,=begin
-80's style meaningless and over-abbreviated keywords and method names like "def", "to_s", "puts", etc.
-:: (double colon) vs. . (period).
Ruby is definitely better than python, but still not perfect, and still an order of magnitude slower than statically typed
languages.
> 'For example, the beautiful little feature where you can
> ask an array for its size as well as for its length
> (beautiful because it doesn't terrorize you into having to
> remember the exact precise syntax; it approximates it,
> which is the way most humans actually work),'
>
> you've intrigued me, which means I might be one of those
> programmers who lacks the imagination to see the
> difference between an arrays size and its length. :D What
> exactly is the difference?
There is no difference. The terms are synonymous. Ruby, being a common-sense oriented language,
allows for synonymous terms without throwing a fit. It accommodates the way humans tend to think.
Java is the exact opposite. It is very stern, very non-commonsense oriented. It will throw a fit if you send the message
'length()' to an ArrayList. Although in the commonsense world, we all know what the meaning of the question: "what is your
length?" should be for an ArrayList. Still, Java bureaucratically insists that our question is dead wrong, and that we should
be asking it for its 'size()'. Java is absolutely non lenient.
Now, if you ask me, such boneheaded bureaucratic mindset is very dumb, very stupid. This is why anyone who develops in such
bureaucratic languages feels their debilitating effects. And that's why switching to Ruby feels like a full-blown liberation!
James Watson Re: The departure
of the hyper-enthusiasts
> Java is the exact opposite. It is very stern, very
> non-commonsense oriented. It will throw a fit if you send
> the message 'length()' to an ArrayList. Although in the
> commonsense world, we all know what the meaning of the
> question: "what is your length?" should be for an
> ArrayList. Still, Java bureaucratically insists that our
> question is dead wrong, and that we should be asking it
> for its 'size()'. Java is absolutely non lenient.
And then what? You give up? You go and cry? The world explodes? I don't
get it. What's the big problem? You try to compile, the compiler says, "sorry, I don't get your meaning," and you correct the
mistake. Is that really a soul-crushing experience? And that's in the stone age, when we didn't have IDEs for Java. Now you
type '.', a list comes up, and you select the appropriate method. Not that difficult.
How many times has your project enabled you to create reusable components?
Without blackboxes, you
will always be starting over and creating as many parts from scratch as needed.
Take Rails, for example. It's a framework built from components. One person was responsible for creating the main components,
like HTTP Interface, ORM, general framework, etc. One person only! And the components were so good that people were able to
use them with extreme ease (now known as "hype").
How many Java projects could have enjoyed a way to create good components, instead of poor frameworks and libraries that
barely work together? I would say most Java projects could enjoy a componentized approach because they generally involve lots
of very skilled people and lots of resources. :-)
What's a component compared to a library or a module? A component is a code that has a published interface and works like
a blackbox -- you don't need to know how it works, only that it works. Even a single object can be a component, as
Anders Hejlsberg (C#, Delphi) put it:
"Anders Hejlsberg: The great thing about the word component is that you can just sling it about, and it sounds great, but
we all think about it differently. In the simplest form, by component I just mean a class plus stuff. A component is a self-contained
unit of software that isn't just code and data. It is a class that exposes itself through properties, methods, and events.
It is a class that has additional attributes associated with it, in the form of metadata or naming patterns or whatever. The
attributes provide dynamic additional information about how the component slots into a particular hosting environment, how
it persists itself-all these additional things you want to say with metadata. The metadata enables IDEs to intelligently reason
about what a component does and show you its documentation. A component wraps all of that up." http://www.artima.com/intv/simplexity3.html
So, to me, components are truly the fine-grained units of code reuse. With Ruby, I not only can create my own components
in a succinct way, but also can use its Domain Specific Language capabilities to create easy interfaces to use and exercise
the components. All this happens in Rails. All this happens in my own libraries. And all this happens in the libraries of people
who use Ruby. We are not starting our projects from scratch and hoping for the best. We are enjoying some powerful programmability!
Alex Bunardzic Re: The departure
of the hyper-enthusiasts
> The length/size inconsistency has nothing to do with Java
> and everything to do with poor API design decisions made
> in 1995, probably by some very inexperienced programmer
> who had no idea that Java would become so successful.
This is akin to saying that the Inquisition had nothing to do with
the fanaticism of the Catholic church, and everything to do with poor decisions some clergy made at that time. In reality,
however, the Inquisition was inspired by the broader climate of the Catholic church fanaticism.
In the same way, poor API design that Java is infested with was/is directly inspired by the bureaucratic nature of the language
itself.
Because it offers more proof that dynamically typed, loosely coupled
languages can be more productive in creating robust solutions than statically typed, stricter languages with deeply nested class
hierarchies. Java and C# essentially lead us down the same path for tackling problems. One may be a better version of the
other (I like C# more) but the methodology is very similar. In fact the release of C# only validated the Java-style methodology
by emulating it (albeit offering a more productive way to follow it).
Enter Python or Ruby, both different from the Java/C# style. Both are producing 'enlightening' experiences in an ever-growing
list of seasoned, fairly well-known static-style developers (Bruce Eckel, Bruce Tate, Martin Fowler...). As the knowledge spreads,
it pokes holes in the strong Java/C# meme in people's minds. Then people start to explore and experiment, and discover the Python
(or Ruby) productivity gain. Some may prefer one, some the other. Ruby, in the end, validates the fact that Java/C# style methods
may not be the best for everything, something the Python advocates have been saying for quite some time.
CS-TR-02-9. Authors: James Noble, Robert Biddle, Elvis Software Design Research Group. Source: GZipped PostScript (1700 KB); Adobe PDF (1798 KB)
These notes have the status of letters written to ourselves: we wrote them down because, without doing so, we found ourselves making
up new arguments over and over again. When reading what we had written, we were always too satisfied. For one thing, we felt they
suffered from a marked silence as to what postmodernism actually is. Yet, we will not try to define postmodernism, first because a
complete description of postmodernism in general would be too large for the paper, but secondly (and more importantly) because an
understanding of postmodern programming is precisely what we are working towards. Very few programmers tend to see their (sometimes
rather general) difficulties as the core of the subject and as a result there is a widely held consensus as to what programming is
really about. If these notes prove to be a source of recognition or to give you the appreciation that we have simply written down
what you already know about the programmer's trade, some of our goals will have been reached.
Everyone's buzzing about Bruce Eckel's
"anti-hype"
article. I hope the irony isn't lost on him.
... ... ...
First, inferior languages and technologies are just as likely to win. Maybe even more likely, since it takes less time to get
them right. Java beat Smalltalk; C++ beat Objective-C; Perl beat Python; VHS beat Beta; the list goes on. Technologies, especially
programming languages, do not win on merit. They win on marketing. Begging for fair, unbiased debate is going to get your language
left in the dust.
You can market a language by pumping money into a hype machine, the way Sun and IBM did with Java, or Borland did back with Turbo
Pascal. It's pretty effective, but prohibitively expensive for most. More commonly, languages are marketed by a small group of influential
writers, and the word-of-mouth hyping extends hierarchically down into the workplace, where a bunch of downtrodden programmers wishing
they were having more fun stage a coup and start using a new "forbidden" language on the job. Before long, hiring managers start
looking for this new language on resumes, which drives book sales, and the reactor suddenly goes supercritical.
Perl's a good example: how did it beat Python? They were around at more or less the same time. Perl might predate Python by a
few years, but not enough for it to matter much. Perl captured roughly ten times as many users as Python, and has kept that
lead for a decade. How? Perl's success is the result of Larry Wall's brilliant marketing, combined with the backing of a strong publisher
in O'Reilly.
"Programming Perl" was a landmark language book: it was chatty, it made you feel welcome, it was funny, and you felt as if Perl
had been around forever when you read it; you were just looking at the latest incarnation. Double marketing points there: Perl was
hyped as a trustworthy, mature brand name (like Barnes and Noble showing up overnight and claiming they'd been around since 1897
or whatever), combined with that feeling of being new and special. Larry continued his campaigning for years. Perl's ugly deficiencies
and confusing complexities were marketed as charming quirks. Perl surrounded you with slogans, jargon, hip stories, big personalities,
and most of all, fun. Perl was marketed as fun.
What about Python? Is Python hip, funny, and fun? Not really. The community is serious, earnest, mature, and professional, but
they're about as fun as a bunch of tax collectors.
... ... ...
Pedantry: it's just how things work in the Python world. The status quo is always correct by definition. If you don't like something,
you are incorrect. If you want to suggest a change, put in a PEP, Python's equivalent of Java's equally glacial JSR process.
The Python FAQ goes to great lengths to rationalize a bunch of broken language features. They're obviously broken if they're frequently
asked questions, but rather than 'fessing up and saying "we're planning on fixing this", they rationalize that the rest of the world
just isn't thinking about the problem correctly. Every once in a while some broken feature is actually fixed (e.g. lexical scoping),
and they say they changed it because people were "confused". Note that Python is never to blame.
In contrast, Matz is possibly Ruby's harshest critic; his presentation
"How Ruby Sucks" exposes so many problems with his language
that it made my blood run a bit cold. But let's face it: all languages have problems. I much prefer the Ruby crowd's honesty to Python's
blaming,
hedging and
overt rationalization.
As for features, Perl had a very different philosophy from Python: Larry would add in just about any feature anyone asked for.
Over time, the Perl language has evolved from a mere kitchen sink into a vast landfill of flotsam and jetsam from other languages.
But they never told anyone: "Sorry, you can't do that in Perl." That would have been bad for marketing.
Today, sure, Perl's ugly; it's got generations of cruft, and they've admitted defeat by turning their focus to Perl 6, a complete
rewrite. If Perl had started off with a foundation as clean as Ruby's, it wouldn't have had to mutate so horribly to accommodate
all its marketing promises, and it'd still be a strong contender today. But now it's finally running out of steam. Larry's magical
marketing vapor is wearing off, and people are realizing that Perl's useless toys (references, contexts, typeglobs, ties, etc.) were
only fun back when Perl was the fastest way to get things done. In retrospect, the fun part was getting the job done and showing
your friends your cool software; only half of Perl's wacky features were helping with that.
So now we have a void. Perl's running out of steam for having too many features; Java's running out of steam for being too
bureaucratic. Both are widely beginning to be perceived as offering too much resistance to getting cool software built. This
void will be filled by... you guessed it: marketing. Pretty soon everyone (including hiring managers) will see which way the wind
is blowing, and one of Malcolm Gladwell's tipping points will happen.
We're in the middle of this tipping-point situation right now. In fact it may have already tipped, with Ruby headed to become
the winner, a programming-language force as prominent on resumes and bookshelves as Java is today. This was the entire point of Bruce
Tate's book. You can choose to quibble over the details, as Eckel has done, or you can go figure out which language you think is
going to be the winner, and get behind marketing it, rather than complaining that other language enthusiasts aren't being fair.
Could Python be the next mega-language? Maybe. It's a pretty good language (not that this really matters much). To succeed, they'd have to get their act together today. Not in a year, or a few months, but today -- and they'd have to realize
they're behind already. Ruby's a fine language, sure, but now it has a killer app. Rails has been a huge driving and rallying force
behind Ruby adoption. The battleground is the web framework space, and Python's screwing it up badly. There are at least five major
Python frameworks that claim to be competing with Rails: Pylons, Django, TurboGears, Zope, and Subway. That's at least three
(maybe four) too many. From a marketing perspective, it doesn't actually matter which one is the best, as long as the Python community
gets behind one of them and starts hyping it exclusively. If they don't, each one will get 20% of the developers, and none
will be able to keep pace with the innovation in Rails.
The current battle may be over web frameworks, but the war is broader than that. Python will have to get serious about marketing,
which means finding some influential writers to crank out some hype books in a hurry. Needless to say, they also have to abandon
their anti-hype position, or it's a lost cause. Sorry, Bruce. Academic discussions won't get you a million new users. You
need
faith-based arguments. People have to watch you having fun, and envy you.
My guess is that the Python and Java loyalists will once again miss the forest for the trees. They'll debate my points one by
one, and declare victory when they've proven beyond a doubt that I'm mistaken: that marketing doesn't really matter. Or they'll say
"gosh, it's not really a war; there's room for all of us", and they'll continue to wonder why the bookshelves at Barnes are filling
up with Ruby books.
I won't be paying much attention though, 'cuz Ruby is soooo cool. Did I mention that "quit" exits the shell in Ruby? It does,
and so does Ctrl-D. Ruby's da bomb. And Rails? Seriously, you don't know what you're missing. It's awesome. Ruby's dad could totally
beat up Python's dad. Check out Why's Poignant Guide if you don't believe
me. Ruby's WAY fun -- it's like the only language I want to use these days. It's so easy to learn, too. Not that I'm hyping it or
anything. You just can't fake being cool.
This article's main purpose is to review the changes in programming practices known collectively
as the "rise of scripting," as predicted in 1998 IEEE COMPUTER by Ousterhout. This attempts to be both brief and definitive, drawing
on many of the essays that have appeared in online forums. The main new idea is that programming language theory needs to move beyond
semantics and take language pragmatics more seriously.
... ... ...
Part of the problem is that scripting has risen in the shadow of object-oriented programming and highly publicized corporate battles
between Sun, Netscape, and Microsoft with their competing software practices. Scripting has been appearing language by language,
and now includes object-oriented scripting languages. Another part of the problem is that scripting is only now mature enough to stand
up against its legitimate detractors. Today, there are answers to many of the persistent questions about scripting:
Is there a scripting language appropriate for the teaching of CS1 (the first programming course for majors in the undergraduate
computing curriculum)?
Is there a scripting language for enterprise or real-time applications?
Is there a way for scripting practices to scale to larger software engineering projects?
The classic language from the creator of SNOBOL. Icon introduced many interesting constructs (generators); moreover, Icon's constructs
were done right, unlike similar attempts in Perl and Python.
A remark about the danger of mixing arbitrary languages in large projects (in this case Java and Perl). It is true that Perl and
Java are an odd couple for a large project (everybody probably would have been better off using
Jython for this particular project ;-)
[Mixing languages] is like a bacterial infestation. I worked on a large Perl-based ecommerce project and a large Java-based ecommerce project. In
the end, to ensure quality code we had to make 100% sure use strict was used, and we had to forbid many things Perl programmers pride themselves
on in order to get 8 developers to stop duplicating work, stop stepping on each other's code, and make our code malleable to changes in
specs.
In the Java project it was sooo much easier. Sure, it took a little longer to start up, creating the Beans, the database layer, etc.,
but once we were going everyone used the code we created, and adding features and dealing with changing specs were SOO much easier.
Now comes the point of the title: we were on a tight deadline, so the bosses got a team from another part of the company to
write a PDF generator. That piece came in Perl. Now, the piece was written by good, skilled programmers, but dealing with different
error log locations, creating processes for the Perl interpreter to live in, etc. was a nightmare. If we had paid the $$ for a 3rd-party
Java PDF writer or developed our own, we could have saved a good 2-3 man-months off of the code. I learned pretty quickly, as the only
'Perl' guy on the Java side of the project: you should NEVER, EVER mix languages in a project.
Scripting languages are fine for small one- to two-page CGI programs, but unless you can crack a whip and get the programmers to fall
in line, you'd better let the language and environment do that.
BTW, J2EE is frustrating to script programmers because it was DESIGNED to be. But if you were ever in charge of divvying out
tasks in a large project you'll realize how J2EE was designed for you.
The Last but not Least. Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand. ~ Archibald Putt, Ph.D