Attempts to create a new JCL for Unix are OK. But pretensions that they change the current situation for the better are
open to review. Is the king naked?
"More generally, it's impressive how many people can look at the landscape of dysfunctional technology and failed promises that
surrounds us today and still insist that the future won't be like that.
Most of us have learned already that upgrades on average have fewer benefits and more bugs than the programs they replace,
and that products labeled "new and improved" may be new but they're rarely improved; it's starting to sink in that most new technologies
are simply more complicated and less satisfactory ways of doing things that older technologies did at least as well at a lower
cost.
Try suggesting this as a general principle, though, and I promise you that plenty of people will twist themselves mentally
into pretzel shapes trying to avoid the implication that progress has passed its pull date."
Recently several JCL-style batch systems with the ability to manage multiple nodes from a central point (the headnode)
were introduced in the form of Unix configuration management systems (Puppet, Chef, Ansible, etc.). We should remember that
the Unix Bourne shell was a tremendous step forward in comparison with IBM JCL, and wiped the floor with it. So why do we have
a reincarnation of a technology that was buried more than 40 years ago with the emergence of Unix shells? (IBM itself switched
to REXX as the shell language in VM/CMS and later made it available on MVS.)
Moreover, with those systems we are adding another JCL-style domain specific language (DSL) to an already excessively complex
environment. Is the game worth the candle? Why not try to extend bash, or create a specialized subset of Python? Or simply try
to extend Python to the new domain via some framework, much like was done when extending it for website creation.
For example, Ansible clearly plays this "new JCL" game with its YAML-based language for playbooks, replicating on a new level
the JCL-style solution invented by IBM in the early 60s, and inferior in comparison even with the dinosaur Unix shell language.
The landscape we have in Linux now is one of a tremendous level of overcomplexity, in which a normal sysadmin is unable to learn
the whole system and is unable to fully master even one of the two major scripting languages for Unix (Perl or Python). Is there
a better way to implement a Unix configuration management DSL (domain specific language) than to replicate IBM JCL on a new level?
Does it provide the claimed benefits in comparison with a sysadmin's use of a collection of simple tools?
We will try to address those questions in this article.
First of all, let's discuss the landscape into which those new systems are introduced. As noted above, JCL was invented by IBM
in the early 60s, so Ansible and friends are a little bit late to the party.
That landscape is one of tremendous overcomplexity, in which a normal sysadmin cannot fully master even one of the major
scripting languages (Perl or Python). They are still able to learn bash and AWK, although bash has accumulated a lot of complexity
in its modern versions too. We know that the complexity junkies at Red Hat (especially since RHEL 7, which introduced systemd)
and SUSE dominate. In a way they can probably be viewed as a suicide cult of overcomplexity masquerading as Linux distribution
vendors ;-).
So let's talk a little bit about this unending drive toward higher and higher levels of overcomplexity. Currently any Linux sysadmin needs
to know intimately approximately a hundred out of around 250 key utilities, with some of them, such as grep, find, yum, rpm,
rsync, curl and wget, being quite complex software systems in themselves (in the case of curl and wget,
the most interesting stuff starts if you are behind a proxy). And they are becoming more complex with time. For example,
19 years ago curl was a few hundred lines of code. Today it's around 150K lines of C (daniel.haxx.se).
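As a minimal illustration of the proxy point (the proxy host, port and domain below are placeholders, not values from this article): curl and wget honor the standard proxy environment variables, and exporting them is usually the first step on a box behind a corporate proxy.

# Placeholder proxy host/port -- adjust to your site
export http_proxy="http://proxy.example.com:8080"
export https_proxy="http://proxy.example.com:8080"
export no_proxy="localhost,127.0.0.1,.example.com"

# Quick connectivity checks through the proxy
curl -I https://www.kernel.org
wget -q --spider https://www.kernel.org && echo "proxy works for wget"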
Add to this that any sysadmin needs to know at least two scripting languages (bash and Perl, or bash and Python) and you get a "mental stack overflow"
for most normal human beings. But in reality, if your organization runs a website, you need to know the LAMP stack and some
JavaScript too, if not R or some other more specialized language used in your organization.
As a result, knowledge becomes sketchy and fuzzy. Let's ask ourselves: does any sysadmin really know even simpler utilities, with just
a dozen options, like rm? For example, what does the option -I mean? Probably not. And that, as I can attest, includes Red
Hat staff. Otherwise they would never add the alias rm='rm -i' to the .bashrc of root. The rm='rm -i' alias is an
invitation to yet another horror story, because after you get used to it, you will automatically expect rm to prompt
you by default before removing files. Of course, one day you'll run it on an account that doesn't have that alias set, and before you understand
what's going on, it is too late.
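A small illustration (a sketch, not a recommendation from any vendor): before relying on the interactive prompt, check what rm actually resolves to on the account you are logged into, and bypass the alias explicitly when you mean the real command.

type rm                          # on a stock RHEL root account prints: rm is aliased to `rm -i'
\rm -f /tmp/scratchfile          # a leading backslash skips alias expansion
command rm -f /tmp/scratchfile   # "command" does the same thing
unalias rm                       # or drop the alias for the current session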
Even such a classic Unix utility as diff is, in recent versions, more complex and has more capabilities than most people realize
(it can now compare directories). To a lesser extent the same is true even for ls in its current implementation :-) How many
sysadmins know the difference between -a and -A in the ls utility, or whether the alias
alias ll='ls -hAlF --group-directories-first'
would work on RHEL 5 (it does work in RHEL 6 & 7)? Add to this the constant troubles with colors, when users who use a light background
in their terminals are completely out of luck with the standard /etc/DIR_COLORS, and even ls does not look like a simple
utility.
And the utilities are only the tip of the iceberg. Sysadmins also need to know the location and structure of at least a couple
of dozen important configuration files, including but not limited to /etc/hosts, passwd, group, shadow,
profile (and the /etc/profile.d/ directory), resolv.conf, ntp.conf, fstab,
exports, sshd_config, yum.conf (and the related /etc/yum.repos.d/ directory), sysctl.conf,
sysconfig/network and several other files in the /etc/sysconfig directory, the /etc/xinetd.d/ directory, the /etc/init.d/
directory (or, worse, the systemd craziness in RHEL 7), and so on and so forth.
You might be surprised to see the result of the command
find /etc -name "*.conf" | wc -l
More than 60 files are typically listed.
Next comes the knowledge of the bash shell with its complex set of built-ins, which is a must for any system administrator. Next
in importance is the knowledge of Perl as a major scripting tool, available on all major platforms and far superior to bash for complex
scripts. Next come Apache, PHP and MySQL (the so-called LAMP stack), which is widely used in many organizations and which sysadmins need
to support and be able to troubleshoot typical problems with. This is the "bread and butter" of hosting companies, but in any organization
you can find applications that depend on the LAMP stack, such as MediaWiki. Speaking of databases, you need at least to know how to install
Oracle.
Add to this a dozen common daemons, such as atd, cron, init, iptables, nfs, nis,
sshd, vsftpd, rsyslogd (or another variant of syslog), xinetd, postfix (or old
Sendmail), bind, sysstat, with their own configuration files and quirks. SELinux is another huge subsystem.
Next come X11 and related daemons such as VNC and XRDP (and X11 is so complex that you can start understanding this system only
after you have programmed a couple of applications for X11). Then there is LVM (with its own set of utilities), a really complex
subsystem, knowledge of which is crucial when you have damaged disks or an unbootable system (with root on LVM as an additional torture
for the sins that you committed ;-). When an important production server with the root partition on LVM fails to boot late at night
after patching, you might feel like you are falling into an elevator shaft (premium Red Hat support might help in such cases, as they use
people on different continents to provide continuous coverage, but they usually do not want to dig deep into the problem, trying instead to point
you to some article in their database, often an irrelevant one).
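For illustration, here is a minimal sequence for getting at a root-on-LVM filesystem from a rescue shell (a sketch only; the volume group and logical volume names are placeholders):

pvscan                              # locate physical volumes
vgscan                              # locate volume groups
vgchange -ay                        # activate all volume groups found
lvs                                 # list logical volumes
mount /dev/mapper/vg00-root /mnt    # vg00/root is a placeholder name
# bind-mount /dev, /proc and /sys under /mnt before chrooting
# if you need to rebuild the initramfs or reinstall the bootloader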
We also need to know a set of backup utilities/archivers such as tar, gzip, zip/unzip, cpio,
etc., and a set of command-line utilities sysadmins usually use, such as anaconda, expect, screen, lftp, dos2unix,
mutt, scp, ssh, etc.
Python is another important scripting language, used in major Red Hat applications such as yum and anaconda and
increasingly used for writing Linux system applications, displacing Perl; but unless you are a really gifted programmer, three languages
(bash, Perl, and Python) is one too many. There is simply no space in the brain for a third scripting language, unless you limit yourself
to a basic subset. So, in essence, it is either Perl or Python, but not both.
In short, Linux is already way too much for a human brain... And using a particular utility on a daily basis does not imply deep knowledge of it
(as was demonstrated in the ls example above). But if frequent use does not automatically mean deep knowledge, then rare use definitely
means the lack of deep knowledge, and possibly the loss of the existing level of knowledge with time :-(. In the end, for rarely used utilities
people often use a small subset of the available functionality, deteriorating to "level zero" with time, even if at some point they
knew more. And that is the only way to survive and preserve sanity in such an environment.
And we have not even started talking about all those exciting games connected with compiling applications from source code
using the GNU compiler stack (gcc, make, configure), or the Intel compiler stack, which is growing in popularity,
especially for computational applications, where it has already become the de-facto standard. Add to this multithreading, hardware
issues, and the knowledge of remote control units such as Dell DRAC and HP iLO (specialized computers that are powered separately and control
such functionality as remote boot, including boot from an ISO or flash card mounted remotely, provide remote console
functionality, and check the health of the hardware), and we get not just a single mental "stack overflow", but a
double mental "stack overflow". And this situation needs to be dealt with in an environment where the demand for your services
is unpredictable, urgent, and above all, relentless. So while the idea of a specialized system that shields the sysadmin from
a part of this complexity is a sound one, the way this idea is implemented is open to review.
Also, when a server is down, the sysadmin comes under considerable pressure and needs to go down to a lower level of abstraction to solve
the problem, or wait until vendor technical support solves it for him. You can't go down to a lower level of abstraction if you never worked
at it on a regular basis. The level of this pressure varies from one organization to another, and from one week to
another, but often (especially in cases of downtime) it reaches a substantial level, when it starts to affect your judgment,
as in making decisions under stress. And that increases the chances of committing some spectacular blunder; see
Sysadmin Horror Stories. In this sense systems like Ansible can become more of a liability
than a help, as a badly thought out solution will be dutifully replicated on a dozen or more boxes. A classic example of stupid use
of Ansible is sending RHEL6 authentication-related files (passwd, shadow, group, etc.) to RHEL7 servers. This is a very
simple and reliable way to hose those boxes. SSH will not work after that exercise, and you need to work with DRAC/iLO/whatever to
get to them.
It is quite clear that Linux is now a definite example of a system that is far beyond human capacity to understand, and
has been for some time. Although this analogy is definitely somewhat stretched, the behaviour of Linux distributors reminds me of the drive
of financial institutions toward higher and higher levels of leverage in the quest for higher profits, which culminated in 2008.
At some point the population just couldn't take on any more debt and the system crashed. We already see a somewhat similar effect with
Microsoft (which is the real king of complexity) and PCs, when some people voluntarily downgrade the functionality of their desktops by
switching from Microsoft Windows PCs to simpler (and better watched by the NSA ;-) Chromebooks.
This toxic mix of Linux (and Unix in general) overcomplexity and the proliferation of different versions of Unix/Linux within the same
datacenter (often with almost half a dozen flavors in use, such as RHEL/CentOS/Oracle Linux, SUSE, Solaris, HP-UX and AIX)
creates a need for systems that help to manage Linux/Unix and protect your sanity from the behaviour of Linux vendors, who are now replaying
the Unix wars on a new, but no less nasty, level than was the case in the old
Unix wars. In the case of Red Hat, the Linux version of the Unix
wars reminds me of some kind of civil war, as the differences between RHEL6 and RHEL7 are so substantial that they can be called alternatives,
rather than one being the successor of the other ;-). It looks like this so-called Red Hat civil war is fought within the Red
Hat camp, between server-oriented "traditionalists" and a radical sect of fanatical adherents of the Linux desktop (the Linux Taliban ;-),
in which the latter is winning. With the introduction of systemd
the Red Hat distribution became something like the
Mad Hatter in
Alice in Wonderland (slightly rephrasing: "Linux is a place like no
place on Earth. A land full of wonder, mystery, and danger! Some say to survive it you need to be as mad as a hatter. Which luckily
I am."):
Mercury was used in the manufacturing of felt
hats during the 19th century, causing a high rate of mercury poisoning in those working in the hat industry.
Mercury poisoning causes neurological damage, including slurred speech, memory loss, and tremors, which led to the phrase "mad
as a hatter".
...In the chapter "A Mad Tea Party", the Hatter asks a much-noted riddle: "why is a
raven like a writing desk?" When Alice gives up trying to figure
out why, the Hatter admits "I haven't the slightest idea!".
With the default RHEL 7 settings,
systemd tends to talk to itself, polluting the syslog with spam (you can cut this useless chatter with the command
systemd-analyze set-log-level notice):
Mar 5 03:30:01 srv255 systemd: Starting user-0.slice.
Mar 5 03:30:01 srv255 systemd: Started Session 21356 of user root.
Mar 5 03:30:01 srv255 systemd: Starting Session 21356 of user root.
Mar 5 03:30:01 srv255 systemd: Removed slice user-0.slice.
Mar 5 03:30:01 srv255 systemd: Stopping user-0.slice.
Mar 5 03:40:02 srv255 systemd: Created slice user-0.slice.
Mar 5 03:40:02 srv255 systemd: Starting user-0.slice.
Mar 5 03:40:02 srv255 systemd: Started Session 21357 of user root.
Mar 5 03:40:02 srv255 systemd: Starting Session 21357 of user root.
Mar 5 03:40:02 srv255 systemd: Removed slice user-0.slice.
Mar 5 03:40:02 srv255 systemd: Stopping user-0.slice.
Mar 5 03:50:01 srv255 systemd: Created slice user-0.slice.
Mar 5 03:50:01 srv255 systemd: Starting user-0.slice.
Mar 5 03:50:01 srv255 systemd: Started Session 21358 of user root.
Mar 5 03:50:01 srv255 systemd: Starting Session 21358 of user root.
Mar 5 03:50:01 srv255 systemd: Removed slice user-0.slice.
Mar 5 03:50:01 srv255 systemd: Stopping user-0.slice.
Mar 5 04:00:01 srv255 systemd: Created slice user-0.slice.
Mar 5 04:00:01 srv255 systemd: Starting user-0.slice.
Mar 5 04:00:01 srv255 systemd: Started Session 21359 of user root.
Mar 5 04:00:01 srv255 systemd: Starting Session 21359 of user root.
Mar 5 04:00:01 srv255 systemd: Removed slice user-0.slice.
Mar 5 04:00:01 srv255 systemd: Stopping user-0.slice.
... ... ...
I strongly encourage you to read the
systemd-devel mailing
list archive to see the issues you can possibly face. Here is one example:
Hello list, sometimes I have problems rebooting some machine. I think in those cases shutting down some services fails and
the machine stays somewhere between life and death.
Unfortunately my ssh window closes first and no reconnect is possible, it only tells "Connection refused".
If this happens, then I have to do a call to someone who works in the datacenter and resets my machine by hand. I would like
to keep sshd alive as long as possible to reconnect and fix this by hand. How can I achieve this?
System is Ubuntu 16.04 with systemd 229-4ubuntu16.
I googled some similar questions and tried, but without success. What could I do? Thanks,
Hajo
Those are the issues that Unix configuration management systems supposedly should help to solve. But can they? Can they provide
real help, or is "the king naked" and they are only able to do easy, trivial tasks (which are also important) that do not matter much
and can be performed equally well with other tools? That is the question.
What Unix configuration management systems such as Ansible provide is a reincarnation of JCL with some bells and whistles. In other
words, another, more specialized shell, in addition to the regular shell.
So this is a new language that sysadmins need to learn. That's why at least a half-dozen books exist for each of the top five Unix configuration
management systems (in the case of Puppet, we can talk about a couple of dozen low-quality books). But the claim
that they will make the sysadmin's life easier and configuration-related tasks a breeze to perform is slightly exaggerated :-).
Adding complexity is a double-edged sword. They provide the user with a higher level of abstraction for tasks which involve a series
of consecutive, interconnected steps (where failure of one step affects all or some subsequent steps). This is the same functional area which
was addressed by IBM JCL.
What that means in reality is that when such systems work, everything is fine, but when they don't, you are really screwed, because
switching to a higher level of abstraction automatically means that you know less about the underlying layers. In other words, you become
more like a regular Windows system administrator, who knows how to use the Control Panel extremely well but very little about what
is inside Windows and its registry. Also, none of the popular Unix configuration systems tries to adhere to the Unix paradigm of building
a system with maximum utilization of existing tools. They prefer reinventing the bicycle in the best Microsoft style: creating another complex,
monolithic Swiss army knife with multiple bells and whistles. You can resist that, because you control what is executed in this new
JCL, but the trend is undeniable.
There is a growing realization in the Linux sysadmin community that more system software is not always better, and that adding yet another
complex software system that supposedly helps the sysadmin on top of multiple (already underutilized) existing systems might produce quite
the opposite result. No Unix system administrator can hope to learn in his/her lifetime more than a small part of the functionality of the set of complex
tools that he/she uses. There are just too many of them, and that list already includes Unix configuration systems, NAGIOS (or an
equivalent) and several other systems, depending on the company in which you are working.
Generally, there are three approaches to emulating JCL in Unix:
1. Stay within the Unix paradigm and try to combine simple tools with shell or Perl as the glue (there is also space for innovation
here). Perl was designed as a programming language for automating system administration tasks. Scripting your tasks in Perl
or bash, using ssh, PDSH, rdist, rsync, tar, etc. as components, is a pretty powerful approach with a zero learning curve (see the
tarball approach to config management below). In
this sense tar, RPMs, parallel execution tools and rsync, possibly combined with such tools as Midnight Commander and
Expect (or a substitute), can probably provide 70-80% of the necessary "API" for your
scripts without extra hassle. A version control system can gradually be added to provide a central repository of changes on the seed server.
I do not recommend these systems for the deployment of each server unless you are also a good programmer.
2. Implement your own proprietary language, which is supposed to be "JCL, the next generation", and claim that this is a radically new, better
approach, inherently connected with the DevOps hoopla. This is the path Ansible,
Puppet, Chef, etc. have taken. If you observe some precautions, learning one such JCL (but only one) might not be a
bad idea. The rule is to never fight fashion, especially when it is merged with an influential techno-cult which has managed to brainwash the top
IT brass, since it allows them to put up a smoke screen over further outsourcing ;-). It might be better to declare, at least formally,
that "you are in" and then use just the minimal functionality. The minimal use of Ansible is to emulate with it the functionality that
pdsh (or another parallel execution tool) provides. For example:
ansible atlanta -a "/sbin/reboot"
Open resistance to the whims of the top brass in enterprise IT usually leads to complications during the annual performance review
;-). Also, as somebody said, there are no atheists in the trenches, so joining a techno-cult might just increase your chances of
survival... The best path here is to get a system that can generate code in Perl or shell, so that it can be inspected and, if
necessary, manually adapted before being applied to the members of a group.
It also helps if the system is rather small and written in a language that you know well, or at least want to learn, which limits
the implementation languages to three (bash, Perl and Python), unless you are a Ruby enthusiast. Such a system might also provide
some inventory management and have a sophisticated integrated database that simplifies the creation of various reports and can integrate
some hardware inventory tasks. Often, if your monitoring needs are modest and you understand that nothing can be done in a large organization
in less than an hour, it can also double as a monitoring system. That's especially true for medium-size datacenters. Paradoxically,
in most cases probes that run once an hour are as good as or better than probes that run every minute ;-).
The main drawback of using this new JCL is that, first of all, it is another language you need to master, which implicitly promotes
the superficial "click-click-done" mentality of Windows administrators, and it adds the burden of writing your own scripts in the "yet another DSL"
that it uses. Also, if the situation is not within the narrow parameters that those systems can handle, the sysadmin is completely lost, as
the lower levels are now hidden from him.
3. Try to extend one of the major scripting languages already used by sysadmins (for example bash or Python) to the new domain, or
implement a subset of such a language as a new DSL for configuration management. This is the approach used in several existing Unix CMS
such as Chef and cdist.
Any job in JCL-style languages consists of steps: invocations of separately compiled programs or scripts. Each step returns a
return code (RC). If the RC is outside the acceptable range (for example, non-zero) for a particular step, either the job fails (the following steps are not executed) or
"recovery step(s)" are executed, after which the job continues from the next step.
In bash, typical JCL control flow can be emulated in the following way:
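(The sketch below is my own illustration rather than code taken from any of the systems discussed; the step commands at the bottom are placeholders.)

#!/bin/bash
# Minimal JCL-style job runner: run steps sequentially, check the RC of each,
# print an [OK]/[FAILED] protocol line, and abort unless a recovery command
# is defined for the failed step and succeeds.

run_step () {
    local name="$1" cmd="$2" recovery="$3"
    echo "==> $name"
    $cmd
    local rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "[OK]      $name"
        return 0
    fi
    echo "[FAILED]  $name (rc=$rc)"
    if [ -n "$recovery" ]; then
        echo "==> recovery for: $name"
        $recovery && return 0
    fi
    echo "Job aborted at step: $name"
    exit "$rc"
}

# Example job (placeholder steps)
run_step "install ntp"      "yum -y install ntp"
run_step "deploy ntp.conf"  "cp /srv/configs/ntp.conf /etc/ntp.conf"
run_step "restart ntpd"     "service ntpd restart" "service ntpd start"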
Of course, you can add bells and whistles to this scheme, but the basic logic is as simple as described. The existence
of multiple servers on which this program executes means that there should be some "summary" and the ability to view each protocol
separately, but this task is performed well by any parallel execution program or HPC scheduler, for example by SGE or
Slurm (both are open source). Actually, Unix configuration management
systems that have agents on the managed nodes look to me very similar to HPC schedulers and borrow key concepts from them.
Today names such as Puppet (released in 2005 and written in Ruby) are closely associated
with DevOps, as if they were something radically new, which is not true. They are all descendants of IBM JCL, and the first such system
was cfengine (written in 1993; it never got much traction). While the idea of adding a
JCL to Unix is not bad per se, these systems are pushed upon us by all this DevOps
hoopla, which has really found traction at the higher levels of IT management as a smoke screen for further outsourcing.
They want to provide you with the ability to bind a service to a specific network interface, or to configure different database
servers for your application in different environments, or to do some other complex stuff. Fine. But in the process they make simple
tasks complex. In other words, they are just redefining the existing API in a new way. Because of that, few, if any, of the popular configuration
management systems are successful in lessening the load on the sysadmin and providing a positive return on the investment of time and effort needed to
deploy them. They might have other benefits, but lessening the sysadmin workload is not one of them. In other words, for many
popular configuration management systems the return on the investment of time and effort is either negative or
close to zero.
They are essentially trying to reinvent the wheel, repackaging existing functionality in a new way. There are powerful Unix tools
that in combination can provide at least 80% of the necessary functionality without the necessity of deploying and learning yet another
complex software system (see the introduction to etch):
In either cfengine or puppet you have a maze of classes, controls, modules, resources, etc.
Where you store your configuration within your cfengine or puppet tree has no obvious correlation to where it ends up on your clients.
You can and will spend hours, quite possibly days, studying manuals and searching the web just to get
the simplest initial setup.
... cfengine doesn't actually support doing much that is useful. So you end up using
it as a framework for a bunch of little shell scripts you hack together. Puppet is somewhat better, but still lacking.
I would say more: outside of the idea of re-implementing JCL in Unix, they lack any significant ideas that could lessen the admin burden.
Those systems supposedly ensure that the complexity of changes to Linux/Unix is hidden in pre-written "JCL batch jobs" (partially
created by others, so there is some level of synergy and community in the usage of such a system) and handled in a more systematic
way, closer to the software development paradigm (or fashion ;-). While theoretically that helps to ensure that a system
is configured in a correct and reliable manner, the road to hell is always paved with good intentions.
Also, please note that an idiot with a tool that handles changes in a systematic manner on multiple servers remains an idiot. Only
a more dangerous idiot, because with it he/she can inflict more damage. Much more damage.
There are three major components of any Unix configuration management system:
Configuration description language. This is also called a DSL (domain specific language), and all current attempts
are just a reinvention of JCL 2.0 for a new environment. They differ in functionality and in bells and whistles. Ansible playbooks
are probably the simplest and most robust approach here. Currently each DSL is unique to its system: there is no attempt
at standardization here. The jury is still out on what the proper configuration language for this domain is and to what extent it should
be declarative. Much of the usability of a Unix configuration management system depends on the quality of the DSL, or the lack
thereof. DSL complexity by and large determines the steepness of the learning curve for the particular system.
Repository subsystem. This is the place where you store the scripts, files and RPMs to distribute. RPM repositories are an
important part of such repository subsystems, and RPMs allow you to perform tasks within the RPM package somewhat similarly to JCL.
A collection of full images of the servers is also a kind of repository in disguise.
Distribution subsystem. Currently scp/ssh is the most popular protocol for the secure retrieval and transmission of a set
of files to a group of servers (see the sketch after this list). NFS or another distributed filesystem (GPFS,
Lustre, etc.) can be used instead for a set of servers in the same
datacenter (for example in HPC clusters, where all servers have access to a common distributed filesystem),
or along with scp. Some systems have agents which communicate with the "mother ship" using SSL. But most Unix administrators are not
even slightly interested in using some half-baked new protocol (with possible security holes) for communication between the master
server and the clients, if a server-client configuration management system is used.
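As a trivial illustration of the last component: pushing one file to a group of servers needs nothing more than ssh/scp and a node list. The sketch below is an illustration only; the inventory file and paths are placeholders.

#!/bin/bash
# Push a config file to every node listed in a plain-text inventory file,
# keeping a dated copy of the previous version on each node.
NODES=/etc/myconf/nodes.list        # one hostname per line (placeholder path)
SRC=/srv/configs/ntp.conf           # file to distribute (placeholder path)
DST=/etc/ntp.conf

while read -r host; do
    [ -z "$host" ] && continue
    ssh "$host" "cp -p $DST $DST.bak.$(date +%Y%m%d)" 2>/dev/null
    scp -p "$SRC" "$host:$DST" && echo "[OK] $host" || echo "[FAILED] $host"
done < "$NODES"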
"When people are free to do as they please, they usually imitate each other."
Eric Hoffer
Creating a new language specific to a particular set of problems is how humans typically approach solving new problems. In
this sense the DSLs (domain specific languages) that many Unix configuration management systems introduce can be viewed as derivatives of
JCL (which was created in 1962 ;-), and they are a more natural way to approach the problem of executing jobs consisting of several serial steps
than the Unix shell. But the devil is in the details, and the road to hell is paved with good intentions.
Creating the right DSL (domain specific language) which can serve as a JCL 2.0 for Unix is not an easy task. Language design is
an area that requires a unique, pretty rare talent, plus a lot of luck (as in being in the right place at the right time). Most current
DSLs are too verbose, and this is a mortal sin, as sysadmin time is a limited and valuable resource. The idea is to provide some additional
functionality that is absent or more difficult to achieve with standard, classic Unix tools, and to hide the differences between various Unix/Linux
flavors. But the question is: is this true, and at what price is this functionality provided? Can simpler wrappers written in shell
or Perl replace these complex systems in those cases when they are needed?
The key idea behind most Unix configuration system DSLs is to factor out common tasks inherent in jobs consisting of several serial
steps, such as checking the return code of each step, and providing a legible protocol of the execution of each step plus a summary like:
[OK] Step1 -- creation of account
[Failure] Step 2 -- checking if passwordless login for accounts work
Traditionally those tasks were accomplished using bash (or Perl), classic Unix tools and a dozen or so more Linux-specific
utilities: yum, useradd, groupadd, service, chkconfig, etc. So integrating them into some ad hoc DSL is "one step
forward, two steps back": you do add functionality, but you also add a lot of complexity and a new ad hoc language, which sysadmins
need to learn and use. The new JCL scripts need to be maintained like any others.
Puppet started letting you write your specifications in Ruby after version 2.6, but it was too little, too late: the Puppet DSL was already
entrenched. Also, it is not clear whether writing JCL tasks in Ruby is the correct way to approach the problems which sysadmins face in executing
jobs consisting of several serial tasks on multiple servers. It essentially returns us to square one: implementing JCL using traditional
Unix tools (but with some new primitives provided by the system). Chef recipes are written in a subset of Ruby called Pure
Ruby, and that was the case from the very beginning. That makes Chef a better, more sophisticated system than Puppet.
Ansible JCL scripts are called playbooks, and it uses a rather simple language based on
YAML, a superset of JSON. So it is a somewhat different
approach than we find in Chef. Chef and Ansible can thus be viewed as two competing approaches to the problem of creating a JCL 2.0
for Unix.
The "Catch 22" with custom DSLs is that if you create a new DSL, you gradually feel the needs to re-implement a large part
of functionality of shell in it. Look at Ansible description below. They added conditionals. That OK. Then they
added loops. Which is not OK, but bearable extension. But what is next ? Subroutines with parameters, then something else (access to
ENV variables, etc). But that defeats the whole purpose of the exercise. In this sense Chief approach makes more sense.
Another side of this Catch 22 situation is that it is clear that this language will be used only episodically. So few sysadmin
would ever be able to master it on the level they know shell. And with episodic use much of the knowledge disappear from one use to
another. So no matter what complex construct the designer implement they will be very rarely actually used, and the tool
will be used in the most primitive fashion possible.
In this sense a better idea is to designate one Linux flavor (for example Red Hat) as "classic" and translate the actions of the utilities
in this flavor to all other flavors via wrappers, or via a real code translator implemented in, say, Perl or Python. There are two
possible approaches to this idea of "standard Linux flavor + translator from it to other Linux flavors":
The simplest approach is to provide "wrappers" for each tool that differs between the two systems. For example, for
SUSE you can write a wrapper called yum which translates yum parameters into zypper parameters. So when
you call yum -y install, the translated string (the command that will be executed) will be "zypper -n in". This
wrapper, which can be written in bash or a scripting language of your choice, can be statically deployed on all SUSE boxes
(a minimal sketch of such a wrapper is shown after the two options below). As a side note, so far only cdist has adopted POSIX shell as its DSL,
using Python only on the master to generate the action scripts.
The second, more complex approach is to write a translator that really translates, say, a shell program written for Red Hat into an
equivalent shell program written for the target system, say SUSE. This is a much more powerful approach, but it is more difficult
to implement.
You can also combine the two, or use some intermediate approach in which only certain primitives are translated.
In both cases you allow the system administrator to use the language that he/she already knows and to avoid the learning curve of
mastering yet another language.
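Here is a minimal sketch of the wrapper approach mentioned above (my own illustration; it translates only a handful of yum subcommands and would need to be extended for real use):

#!/bin/bash
# /usr/local/bin/yum on a SUSE box: translate common "yum -y <subcommand>" calls
# into the equivalent non-interactive zypper commands.
args=()
for a in "$@"; do
    [ "$a" = "-y" ] && continue          # zypper's equivalent is --non-interactive
    args+=("$a")
done
sub="${args[0]}"
case "$sub" in
    install)        exec zypper --non-interactive install "${args[@]:1}" ;;
    remove|erase)   exec zypper --non-interactive remove  "${args[@]:1}" ;;
    update|upgrade) exec zypper --non-interactive update  "${args[@]:1}" ;;
    *) echo "yum wrapper: subcommand '$sub' is not translated" >&2; exit 1 ;;
esac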
Another problem is that the existing DSLs are way too verbose.
In this sense, even comparing them with bash using the most primitive but still somewhat useful measure of the level of abstraction
(the LOC, lines of code, metric) clearly demonstrates this deficiency. So far they are unable to "operate at a higher level".
In other words, they re-implement a "same level" language tuned to the specific task of hiding Unix flavor differences, at the
expense of introducing new syntax and new primitives. Both the lexical and syntax analyzers deployed are extremely primitive and
make such DSLs a kind of "bastard language".
That means that for scripts doing the same task on the same flavor of Linux, the program written in the DSL is neither shorter nor
clearer than the equivalent program written in bash. And that is, in my opinion, the capital sentence for such languages.
And being verbose has other side effects (aka externalities ;-). First of all, the number of lines of code correlates with the number
of mistakes and with difficulties in debugging. If bash comes close to or beats a particular Unix configuration management system (for a single
flavor of Unix/Linux) on this metric, be wary. Be very wary.
A side note on the programming language design process
Even if we look at the modern scripting languages that achieved huge success, the main impression is that none of their creators
had an outstanding level of talent. They repeated mistakes already known in algorithmic language design from the days of PL/1, if not earlier,
at times stepping on the same rakes as the designers of Korn shell, AWK, PL/1, C and C++. I would say more: the creators of PHP were brain-dead
in this particular area, diligently repeating most of the common mistakes in language design. The Perl designer (Larry Wall) had
some interesting insights, but he too could have done much better with namespaces, "weak semicolons", the limit on literal length, etc.
Why Perl got the state variable (a sort of replica of PL/1 static variables for a scripting language that uses dynamic
allocation of memory) only recently is unexplainable.
The problem was probably that the project suffered from limited resources for most of its life (with the exception of a short period
when O'Reilly milked the Perl books franchise ;-), but Python, which fared better in this respect and enjoys the support of Google, is also
an ugly language.
Let's return to the key problem here, the problem of learning "yet another language". This problem is by and large independent
of the quality of the language. An important observation is that if you do not use a particular language very often (and that's true
for all Unix configuration management systems), you will never master it to a sufficient extent. This is a strong argument against
creating a new DSL for Unix configuration management systems, because this language by definition can't be the "primary" language for
any sysadmin.
Also, writing a 30-line file to deploy NFS or NTP (possibly in an incorrect way ;-) is not a very exciting prospect for any
sysadmin, simply because there are many other things hanging over his shoulder, and the tasks that a Unix configuration management system
solves, while definitely important, are far from being the most frequent or most time-consuming tasks to perform, unless you need to manage an inordinately
large number of boxes.
Most of those tasks can be performed pretty well using bash scripts and classic Unix tools after writing a limited
set of "wrapper scripts". I would like to stress again that using bash and your main Linux/Unix flavor, the one installed on most of your boxes,
as the "primary dialect" (the English of Unix) and translating all others into it looks like a better approach. In the simplest form that can
be done using wrappers.
And last but not least: DSLs often try to provide functionality for tasks which are related to, but distinct from, Unix configuration
management. For example, controlling daemons (which for some software products have a tendency to die) is a useful task, but advertising
this functionality in books devoted to Unix configuration management demonstrates a lack of good ideas, as this task generally belongs
to the monitoring system domain.
Let's look at a program written in the Rex DSL for deploying an NTP server on multiple nodes:
# Rexfile
use Rex -feature => ['1.3'];

user "root";
private_key "/root/.ssh/id_rsa";
public_key "/root/.ssh/id_rsa.pub";

group all_servers => "srv[001..150]";

task "setup_ntp", group => "all_servers", sub {
    pkg "ntpd",
        ensure => "present";

    file "/etc/ntp.conf",
        source    => "files/etc/ntp.conf",
        on_change => sub {
            service ntpd => "restart";
        };

    service "ntpd",
        ensure => "started";
};
As we can see, once the set of Linux flavors is supported, the deployment is achieved by a uniform script. But is "ensure => 'present'"
better than a wrapper that lists the tools of each flavor one by one and chooses "the right one"? Is this really better than using
C3 Tools with such a wrapper? Or, if we limit ourselves to RHEL/CentOS/Oracle Linux,
than a simple script:
cexec yum -y install ntp
cpush /myconfig/TT/ntp/ntp.conf /etc/ntp.conf
cexec 'chkconfig --list ntpd | grep -v "3:on"'
cexec service ntpd start
cexec 'service ntpd status | grep -v "is running"'
timestamp_on_master=`date "+%D %H%M"`
cexec "[[ \`date '+%D %H%M'\` = '$timestamp_on_master' ]] || echo time is not correct"
So far I think there has been very limited progress in creating an expressive DSL for Unix configuration management, and the major
part of the effort has been devoted just to hiding Linux flavor differences, which, as I already mentioned, can be done either via wrappers
or via some form of translator.
Similarly, any software version control system like git or subversion can be adapted to keep system configuration files on multiple servers
in sync with a repository automatically (in this case different flavors of Linux can be accommodated by using different branches),
but does that mean that this is the best way to synchronize configuration files on multiple servers? Definitely not. So something
new and different is not always better. It can be worse. As
John Michael Greer noted on
a different subject:
"More generally, it's impressive how many people can look at the landscape of dysfunctional technology and failed promises that
surrounds us today and still insist that the future won't be like that.
Most of us have learned already that upgrades on average have fewer benefits and more bugs than the programs they replace,
and that products labeled "new and improved" may be new but they're rarely improved; it's starting to sink in that most new technologies
are simply more complicated and less satisfactory ways of doing things that older technologies did at least as well at a lower cost.
Try suggesting this as a general principle, though, and I promise you that plenty of people will twist themselves mentally into
pretzel shapes trying to avoid the implication that progress has passed its pull date."
One of the better examples of the current breed of Unix configuration systems is probably Ansible. It has a dozen or so books
already published about it. Ansible is an agentless Unix configuration management system developed in 2012 by Michael DeHaan,
a former Red Hat associate. For RHEL and RHEL-based systems (CentOS, Scientific Linux, Unbreakable Linux), versions 6 and 7 have Ansible
2.0+ available from the EPEL repository. In its simplest form it can be used as just yet another parallel script execution
tool that works via ssh. At a more complex level it can be scripted to perform various tasks.
But the Ansible idea of deployment scripts (which are called
playbooks) is far from being impressive. Here
is one example:
- hosts: webservers
  user: root
  vars:
    apache_version: 2.6
    motd_warning: 'WARNING: Use by ACME Employees ONLY'
    testserver: yes
  tasks:
    - name: setup a MOTD
      copy:
        dest: /etc/motd
        content: "{{ motd_warning }}"
First of all, there is the question of whether adopting a primitive syntax format is a way to achieve simplicity. It does help to prevent
silly mistakes like a "missing semicolon", typical for some other DSLs. But at the same time it looks like this is just the adoption of a
primitive syntax to express the same set of "wrong ideas" that are present in Puppet. This small DSL "hello world" type example also
looks too verbose for the very simple task it performs, because for all practical purposes it is essentially equivalent to a single command.
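The article does not spell that command out, but presumably something like the following ad hoc invocation is meant (a sketch; the exact quoting of the content parameter may need adjustment):

ansible webservers -u root -m copy -a "dest=/etc/motd content='WARNING: Use by ACME Employees ONLY'"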
And here is another example written in the Ansible DSL, which distributes an Apache config file and restarts the daemon:
- hosts: webservers
  vars:
    http_port: 80
    max_clients: 200
  remote_user: root
  tasks:
    - name: ensure apache is at the latest version
      yum:
        name: httpd
        state: latest
    - name: write the apache config file
      template:
        src: /srv/httpd.j2
        dest: /etc/httpd.conf
      notify:
        - restart apache
    - name: ensure apache is running
      service:
        name: httpd
        state: started
  handlers:
    - name: restart apache
      service:
        name: httpd
        state: restarted
This example will not work if a particular server uses apache2 as the daemon name instead of httpd, or a different
name or location for the /etc/httpd.conf configuration file. Also, if the number of servers is large, there are always some additional
exceptions that prevent an upgrade to the latest version. So using "state: latest" is somewhat suicidal, as there
are at least two other components that need to work with the particular version of httpd: PHP (or Java) and MySQL (or another database).
And what if on one box the Apache server is still 1.3 or some other 1.x version? In that case this is a SNAFU.
In other words, you need to have some set of new, constructive ideas that allow abstracting those activities at a higher level than the
present one. Currently those ideas are missing, and without them a new DSL that merely hides the differences between existing flavors of Unix/Linux
does not look like a very bright idea.
OK. Let's assume that somehow we have eliminated some of the problems of performing the task manually, and the most typical errors connected
with such operations (of which the major one is possible inconsistency between configurations). And let's assume that after we run our scripts
everything is OK. Which is a rare event, but let this assumption stand.
But what will happen when we need to run the same operation in a year or so? Can we rely on the existing script? The answer, in
general, is no.
That means that we got another, no less important and time-consuming problem instead. When you need to reuse a set of scripts that
you wrote (or borrowed) a year or two ago, you face the problem of maintenance, typical for software developers. Options and locations
of executables might change, new subsystems such as systemd get introduced, one package can be replaced with another (Sendmail with Postfix
in the past; syslogd with rsyslogd; rsh no longer installed, etc.). And the script needs to incorporate those changes too. Some
of those changes you may know about; others may come as a surprise, when the script stops working or works in a way different from the intended one.
This is what the intelligence community calls blowback.
Also, writing, debugging and testing a DSL script can take as much time as implementing the same thing in the old-style fashion, via a simple
bash script and wrappers. So only if you use the script on a huge number of different servers (let's say above one hundred) might you
get some economy of scale here, because the differences between Linux flavors are hidden by the configuration management system,
and that makes your task easier. But in exchange you get the problem of outliers: a few boxes that are "more different than usual"
from the others in the same group, and those differences are not accounted for in the scripts. For them you need to reverse the changes and redo
them manually, which takes additional time. This does not happen often with servers, but it is a pretty common situation if you
manage a large number of Unix workstations, for example on a university campus.
For small groups of more or less uniform servers, simpler approaches might well be more flexible, more reliable and easier to debug.
For example, if you have to manage, say, 24 servers in one location (say, all RHEL 6.x), 32 in another (say, mostly CentOS 6.x with a couple
of 7.x) and 16 in a third (all the latest version of Debian with systemd), you might not recoup the investment of writing, debugging
and then maintaining those "recipes" in some DSL in comparison with custom or borrowed scripts using simpler tools. Of course,
much depends on the quality of the particular Unix configuration management system and the set of ideas it is based on. Systems without
any innovative ideas, based simply on the idea "let's create another way of expressing the same operations, throw it at the wall and see
what sticks" (Puppet, cfengine), are usually the worst.
The problem with a set of DSL scripts and with your own bash scripts is basically the same: as the environment changes, the set of scripts
that you wrote today might become inapplicable tomorrow and require maintenance. The major Linux distributors are not
known for sticking to the same set of configuration files, or for using the same set of daemons "almost forever", as was the case in Solaris.
As I mentioned before, Red Hat, for example, made quite a bit of the previous work done for RHEL 6.x obsolete with the introduction of RHEL 7,
with its quirks and systemd. To what extent a given Unix
configuration management system can hide these differences is an interesting test of its functionality.
I would like to stress it again: bash, despite its warts and historical baggage, is a pretty well debugged implementation of
the Unix shell, and has an optional debugger, high-quality books and style guidelines. It is the language that is known by all Unix administrators,
a kind of "lingua franca" of Unix. All this is either completely
absent or in a very rudimentary stage for the DSLs in Unix configuration management systems. The lack of intelligent debugging facilities is
especially biting, and the ability to perform a "dry run", while definitely useful, is nowhere close to what is needed. That's
probably why many users drop Puppet after trying it for a while (along with the problem of multiple bugs). Pigs just don't fly,
and if with enough thrust they attempt to fly, it is dangerous to stand where they are going to land ;-).
So when your manager gives you another 32 servers to manage due to a redistribution of workload (read: the other sysadmin left and they
do not want to hire a replacement) and tells you that the switch to DevOps should make everything very easy, the last thing
that will excite you is that the previous sysadmin created a set of Puppet scripts linked to his own set of Python workflow scripts. What
if you do not know and do not like Python and Ruby? Maintenance of somebody else's software is more complex than writing
software by several orders of magnitude.
Idiosyncrasies can matter too: if you need to understand the scripts and recipes of a guy who was an OO maniac and couldn't write even
a simple, straightforward script without using classes and inheritance (and such guys for some reason are often attracted to Puppet), you
are really screwed ;-). And believe me, such perverts are pretty common in Python-land.
Another problem with writing and debugging Unix configuration management scripts is that some activities, such as, for example, checking
the attributes of files for compliance with some set of rules, are the domain of different systems: monitoring systems and hardening
scripts (see, for example, the good old Tiger). The same is true for checking whether all the daemons necessary for the particular server
are running. As such, they might be better performed with more specialized tools, although one advantage of tools like Puppet that I do see is that they can double as
a Unix monitoring system, in some cases saving you from the necessity of deploying and learning yet another complex software package (although
a simple Unix monitoring system is definitely preferable to Puppet).
As for the typical examples published in books devoted to Puppet and similar systems, I can tell you one thing: they are extremely
naive about the maintenance problems that you might be facing, even discounting the fact that introductory books
can't provide really complex examples. Let's ask ourselves a simple question: how many times a year do you deploy daemons
like NFS and NTP, the most typical examples discussed in such books? From one time to the next something might change, and you need
to modify your scripts accordingly; you can't just run them blindly. Don't you do it via kickstart during the initial installation and
then simply adapt or copy an existing config from a similar server? You know the answers.
Still, the fact that the examples you can find in the two dozen or more Puppet books are so simplistic and detached from reality should
serve as a warning sign that the king is possibly naked. If the examples are not worth the paper on which they are printed,
to say nothing about the price of the book, that suggests that the system described possibly addresses the wrong problem, or addresses the
right problem in the wrong way.
For example, the NTP deployment scripts published in the many Puppet and other books that I have read typically miss the most important,
vital test: the correspondence of the time after deployment to atomic clock time (which should be done by comparing the
time reported by the local NTP daemon with the time on a server that we know is configured correctly and on which NTP is
working properly). There are way too many things that can go wrong with NTP to check them one by one. You need an integral
check, and if it fails, manual troubleshooting should be done.
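A minimal version of such an integral check is trivial to script (a sketch only; REF_HOST below is a placeholder for a host known to keep correct time):

#!/bin/bash
# Compare the local clock with a reference host that is known to be in sync.
REF_HOST="ntp-ref.example.com"     # placeholder
MAX_DRIFT=2                        # maximum acceptable difference, in seconds

local_epoch=$(date +%s)
ref_epoch=$(ssh "$REF_HOST" date +%s) || { echo "cannot reach $REF_HOST"; exit 2; }

drift=$(( local_epoch - ref_epoch ))
[ "$drift" -lt 0 ] && drift=$(( -drift ))

if [ "$drift" -gt "$MAX_DRIFT" ]; then
    echo "[FAILED] clock differs from $REF_HOST by ${drift}s -- troubleshoot NTP manually"
    exit 1
fi
echo "[OK] clock within ${MAX_DRIFT}s of $REF_HOST"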
Similarly, for NFS they often miss the most important stuff, such as firewall rules, subnet restrictions, optimization of the mount
parameters, and which version of the protocol the particular server should be using (v4 has huge problems in case of frequent disconnects).
This failure of Puppet books to provide information useful in a "real datacenter" in an engaging (or, at least, not extremely boring) way
suggests one thing: the king can be naked. My impression is that the designers of Puppet were trying to create a Unix configuration
management system but ended up with a monitoring system with some useful Unix configuration functionality. Somewhat similar to
HP OpenView, but more modern, better designed, more programmable and cheaper (if you
buy professional support).
In other words, the quality of the books suggests that Puppet and similar members of the present generation of these systems accomplish
things that are worth almost nothing in the daily sysadmin workload and do not represent a huge advantage over a set of custom scripts
written in Perl, bash, or whatever other scripting language you know best. Often they intrude into the monitoring area (and those tasks can be
accomplished with other systems such as Nagios) and skip tasks that have real value.
Unless the same system is used for both configuration management and monitoring, I am somewhat skeptical about the value of agents
in Unix configuration management systems. In my opinion, the capabilities of ssh are in most cases adequate for the tasks
that need to be performed, especially if you have a fast network and you do not want to substitute something more powerful and programmable,
but less specialized, for your current monitoring system.
In a way, any configuration management system which provides its own (often complex) agent working over SSL and does not provide
adequate monitoring capabilities is trying to reinvent the bicycle, and the quality of such agents is usually suspect. Moreover,
they can introduce additional security vulnerabilities that are difficult to understand and slow to fix. As such, they do represent
a security risk.
Essentially what they are doing is rewriting parts of the ssh daemon again and again, often with less qualification and with
additional bugs, creating security problems or even backdoors that are difficult to understand and that are usually detected only way
too late. Actually this was the problem with OpenSSH itself for some time: in the past it was the most common way to break into ISPs.
Some Unix configuration management systems are specifically designed as agentless and are simpler than the alternatives. Among them is Ansible.
This is a Linux configuration management system from Red Hat, so it is actively maintained. It is targeted at a large number
of servers, and can handle hundreds, so it is more of a datacenter application than a sysadmin application. It is also pretty complex.
The latest version is 2.10 (October 2020). It is written in Python and requires Python 2.6+ on all managed nodes. Nodes are managed
by a controlling machine over SSH. In the simplest form /etc/hosts can serve as the inventory. You can also create multiple
custom inventories, for example (taken from
Ansible Playbook Essentials):
#customhosts
#inventory configs for my cluster
[db]
192.168.61.11 ansible_ssh_user=vagrant
[www]
www-01.example.com ansible_ssh_user=ubuntu
www-02 ansible_ssh_user=ubuntu
[lb]
lb0.example.com
To orchestrate nodes, Ansible deploys modules to the nodes over SSH. The modules are stored temporarily on the nodes and communicate with
the controlling machine through a JSON protocol over standard output. When Ansible is not managing nodes, it does not consume resources,
because no daemons or programs are executing for Ansible in the background.
It can work both as a parallel execution tool and as a parallel scp command (in Ansible terminology these are called "ad
hoc" operations):
ansible atlanta -a "/sbin/reboot"
ansible atlanta -m copy -a "src=/etc/hosts dest=/tmp/hosts"
ansible webservers -m file -a "dest=/srv/foo/b.txt mode=600 owner=joeuser group=joeuser"
Similarly, in playbooks you can use group names, for example:
- hosts: all
- hosts: www
Here all is a built-in alias that targets all hosts in the inventory. Ansible supports regular expressions and set operations
on groups. You can also create "groups of groups" (which Unix BTW does not allow ;-) using the :children suffix.
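A minimal inventory sketch of the :children mechanism (the group and host names are illustrative):
[atlanta]
host1.example.com
host2.example.com
[raleigh]
host3.example.com
[southeast:children]
atlanta
raleigh
A playbook can then target - hosts: southeast and reach all hosts of both child groups.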
Certain settings in Ansible are adjustable via a configuration file. The stock configuration should be sufficient for most users, but there
may be reasons you would want to change it. The configuration file is searched for in the following
order:
* ANSIBLE_CONFIG (an environment variable)
* ansible.cfg (in the current directory)
* .ansible.cfg (in the home directory)
* /etc/ansible/ansible.cfg
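A minimal ansible.cfg sketch (the values are illustrative, not recommendations):
[defaults]
inventory   = ./customhosts
remote_user = root
forks       = 20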
In a playbook, it's possible to define variables directly inline like so:
- hosts: webservers
  vars:
    http_port: 80
Ansible allows you to reference variables in your playbooks using the Jinja2 templating system. The most basic form of variable
substitution looks like:
My amp goes to {{ max_amp_value }}
Content can be read off the filesystem as follows:
---
- hosts: all
  vars:
    contents: "{{ lookup('file', '/etc/foo.txt') }}"
  tasks:
    - debug: msg="the value of foo.txt is {{ contents }}"
Information discovered from systems is called "facts" in Ansible. An example is the IP address of the remote host, or which
flavor of Linux is running on it. To see what information is available, try the following:
ansible hostname -m setup
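The output can be narrowed down with the setup module's filter parameter; the glob below is just an example:
ansible hostname -m setup -a "filter=ansible_distribution*"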
As you would expect, the DSL trends toward a full-blown algorithmic language and now has conditionals:
tasks:
  - name: "shut down CentOS 6 and Debian 7 systems"
    command: /sbin/shutdown -t now
    when: (ansible_distribution == "CentOS" and ansible_distribution_major_version == "6") or
          (ansible_distribution == "Debian" and ansible_distribution_major_version == "7")
Loops are also available, including nested loops, loops over integer sequences, files, hashes, fileglobs, parallel sets of data, etc.
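A minimal sketch of a loop in a task (the module and user names are illustrative):
- name: create several accounts
  user:
    name: "{{ item }}"
    state: present
  loop:
    - alice
    - bob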
Directory Layout
The top level of the directory would contain files and directories like so:
production                # inventory file for production servers
staging                   # inventory file for staging environment

group_vars/
   group1                 # here we assign variables to particular groups
   group2                 # ""
host_vars/
   hostname1              # if systems need specific variables, put them here
   hostname2              # ""

library/                  # if any custom modules, put them here (optional)
filter_plugins/           # if any custom filter plugins, put them here (optional)

site.yml                  # master playbook
webservers.yml            # playbook for webserver tier
dbservers.yml             # playbook for dbserver tier

roles/
    common/               # this hierarchy represents a "role"
        tasks/            #
            main.yml      # <-- tasks file can include smaller files if warranted
        handlers/         #
            main.yml      # <-- handlers file
        templates/        # <-- files for use with the template resource
            ntp.conf.j2   # <------- templates end in .j2
        files/            #
            bar.txt       # <-- files for use with the copy resource
            foo.sh        # <-- script files for use with the script resource
        vars/             #
            main.yml      # <-- variables associated with this role
        defaults/         #
            main.yml      # <-- default lower priority variables for this role
        meta/             #
            main.yml      # <-- role dependencies
        library/          # roles can also include custom modules
        lookup_plugins/   # or other types of plugins, like lookup in this case

    webtier/              # same kind of structure as "common" was above, done for the webtier role
    monitoring/           # ""
    fooapp/               # ""
Rex is one of very few Unix configuration management systems which requires only Perl 5
and ssh (both on the master and on the nodes), and Perl is present by default on all commercial Unixes and Linuxes. For some reason Perl is
now out of fashion and was by and large displaced by Python for writing Linux system utilities. Rex has no published books
as of March 2016, but there is a draft of one book on the web.
It uses a regular custom DSL -- nothing special in comparison with Puppet or Chef. But using Perl both as the implementation
language and as the language in which tasks are written is a better choice of language: Perl is installed on all Unixes by default,
so such systems have an edge. See the overview by Andy Beverley,
An introduction to
Rex - FLOSS UK DevOps York 2015. It was actively maintained until at least late 2016 (on 2016-07-16 (R)?ex 1.4.1 was released).
There is also a draft of the book Rex Book (work in progress).
Like Ansible, it can work both as a parallel execution tool and as a parallel scp command, so you can start using it without
writing any DSL at all.
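For example, Rex can be driven from the command line in ad hoc fashion along these lines (the host names are illustrative; check rex --help for the exact options):
rex -H "web01 web02" -u root -e 'say run "uptime"'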
cdist is an agentless system which is much less known than either
Ansible or Rex. The authors claim to adhere to the
KISS principle, which is positive, but such declarations
generally are not worth much.
It is licensed under GPL and was initially released in 2010 at ETH Zurich,
so it originated in the university environment, which has its own specifics. And it shows. It was initially written and still is
maintained by Nico Schottelius and Steven Armstrong. It requires only ssh and a POSIX shell on the target host.
On the master host it requires Python 3.2 or later. cdist is being used at a couple of organizations in Switzerland, such as
ETH Zurich
(the Swiss Federal Institute of Technology in Zurich, from which Albert Einstein graduated) and the OMA Browser project, as
well as in the USA, Germany and France. Unlike most Unix configuration systems, cdist is not distributed as a package (like .deb or .rpm),
but installed via git.
Documentation is very scarce. It is almost impossible to understand how the system operates
and why its particular structure was adopted. But there is a
cdist group on LinkedIn. The major
part of the discussion about cdist happens on the mailing list and on the IRC channel #cstar on the Freenode network.
The last version is from 2015, but the latest commit in
github is from Aug 19, 2016. It was mentioned
on Hacker News, on
Reddit and on
Twitter. Ubuntu has man pages for it available in Web format. It has
some following, see Migrating away from Puppet to cdist (Python3)
on Hacker News.
cdist consists of two main components:
The core, which runs on the master host. The core of cdist is implemented in Python 3.2 and provides all the executables
needed to configure target hosts. The core operates in a push model: it connects from the source host to the target hosts and executes scripts
on them. SSH is used for communication and file transfer.
The configuration scripts, called types, which are executed on target hosts via SSH. The "types" are written
in Bourne shell, which is not the best flavor of shell
available (ksh93 is much better; bash is better as well and should be used, as the main domain of cdist is Linux flavors, not anything
else). To allow parallel configuration of hosts, the core supports a parallel mode in which it creates a child process for every
target host. This model allows cdist to scale horizontally, and with the computing resources available on a typical server it can
reach a pretty high number of instances.
cdist operates in a push-based approach, in which the server pushes configurations to the clients. It is a one-way system -- the clients
do not poll for updates. All commands are run from a single master host. The entry point for any configuration is the shell script
conf/manifest/init, which is called the initial manifest in cdist terms. It runs in several "stages", with only the final
one being the execution of scripts on the target. That allows generation of code during one of the previous steps.
cdist does contain three ideas that attracted my
attention to it:
The usage of a regular POSIX shell as the DSL. This is an idea I also subscribe to.
The idea of "code generators": shell scripts that are not executed
directly on the target hosts, but instead generate shell code, which is later executed on the target hosts (nodes). These
days code generation is not a widely used technique, and among the few applications that still use it we can mention only XSLT,
which is typically used to transform XML to HTML. But it could be used for more generic "template driven code generation". See the
book Program Generators with XML
and Java for more information.
A creative use of the Unix hierarchical directory structure for encoding information about "objects"
in this configuration management system.
Usage of shell as the DSL means that after you install cdist, you do not need to learn an ugly new
DSL and curse the designers for incompetence
and bugs. But cdist does not use the "classic Linux" primitives directly. Like all other Unix configuration
management systems, it uses a set of new, custom primitives called types, and that's problematic. For example, here is a description of the
type package, which, as you can guess, allows you to install packages on the target systems:
This cdist type allows you to install or uninstall packages on the target. It dispatches the actual work to the package-system-dependent
types.
REQUIRED PARAMETERS: None
OPTIONAL PARAMETERS:
name (The name of the package to install. Default is to use the object_id as the package name.)
version: The version of the package to install. Default is to install the version chosen by the local package manager.
type: The package type to use. Default is determined based on the $os explorer variable. e.g. package_apt
for Debian package_emerge for Gentoo
state: Either "present" or "absent", defaults to "present"
EXAMPLES
# Install the package vim on the target
__package vim --state present
# Same but install specific version
__package vim --state present --version 7.3.50
# Force use of a specific package type
__package vim --state present --type __package_apt
In my very limited understanding of the system, a type is a complex object, consisting of a set of executables (let's
say object methods ;-) and files (let's say object variables). The whole of cdist looks like a large API for writing shell
scripts, designed to simplify writing complex configuration management scripts. A type is structured as a subtree in the Unix file system,
consisting of a set of files and directories. The subtree has the same name as the type and is provided via the $__object
variable in scripts. The tree includes:
./parameter -- a directory that contains such files as required (for required parameters, one line per parameter),
optional (for optional parameters), boolean, and the subdirectory default with such files as loglevel
and optional_multiple
manifest -- the main script, which is executed each time the type is called from the scripts. In the process of execution
it uses several other components, which include:
singleton -- a file which ensures that this type can be called only once.
explorer -- a script for obtaining some properties. Explorers are scripts that are executed on the target for
every created object. For example, an explorer can check the md5sum of a file on the client, like the example below (a shortened
version derived from the type __file):
if [ -f "$__object/parameter/destination" ]; then
destination="$(cat "$__object/parameter/destination")"
else
destination="/$__object_id"
fi
if [ -e "$destination" ]; then
md5sum < "$destination"
fi
gencode -- a code generator, which generates shell code that later should be executed on the target. It is not
very clear to me at which stage (cdist runs in five or so stages) and from where it is called.
Types are stored in the directory $CDIST_ROOT/cdist/conf/type/. Each type name is prefixed with two underscores (like
in __file) to prevent collisions with other executables in $PATH, because in scripts the names of those components
are used with qualification by the directory. So the names should not conflict with system executables.
Here is an example that might help to understand how those directories and files are created. It contains the partial definition of
the type __nginx_vhost:
TARGET=$CDIST_ROOT/cdist/conf/type/__nginx_vhost
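A hedged sketch of how such a definition might continue, based on the directory layout described above (the parameter names and the nginx specifics are illustrative, not taken from the cdist distribution):
# create the skeleton of the type
mkdir -p "$TARGET/parameter" "$TARGET/explorer"
# declare parameters, one name per line
echo servername > "$TARGET/parameter/required"
echo port       > "$TARGET/parameter/optional"
# the manifest is just a shell script; it can call other types
cat > "$TARGET/manifest" <<'EOF'
__package nginx --state present
EOF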
As the manifest of a type is a shell script, you can call other "types" from it, creating a kind of "poor man's" inheritance
in shell. For example, the type __package abstracts away the type of OS for which the package manager is executed
in the following way (this is a bad example, which simultaneously shows a weakness of cdist -- the absence of a meaningful
abstraction of the OS version -- but never mind):
os="$(cat "$__global/explorer/os")" # get the OS for the target
case "$os" in
archlinux) type="pacman" ;;
debian|ubuntu) type="apt" ;;
gentoo) type="emerge" ;;
*)
echo "Don't know how to manage packages on: $os" >&2
exit 1
;;
esac
__package_$type "$@" # execute the script appropriate for the OS on the target
Code generation is another interesting feature of cdist. Instead of writing a script for all cases imaginable, it allows
generating code for a specific node, which takes into account the version of Linux it is running and other relevant parameters.
Such code is an order of magnitude easier to understand than generic scripts.
The generated scripts can be executed either on the master or on the target nodes and use "context files" generated by other steps of the cdist
run (the results of execution of "explorer" scripts). In the generated scripts, you have access to the following cdist variables:
__object -- the path to the object directory (essentially the type directory path for this instance).
__object_id -- the id under which the type was called (for example, vim in __package vim).
They can only read information from this tree, not write to it, as there is no backup copy of these files and they can't be restored
after the script execution.
if [ -f "$__object/parameter/name" ]; then
name="$(cat "$__object/parameter/name")"
else
name="$__object_id"
fi
Red Hat introduced RPM in 1995. While they never marketed it as a configuration management system, in reality it is close to this
class of systems, especially after the introduction of YUM. It was based on the Solaris packaging system and, like the latter, it
operates with the notion of packages (cpio archives with additional pre- and post-processing scripts added). It is the most widespread
type of Linux package repository (Debian APT is a distant second), and
as such its architecture and solutions are interesting for anybody who is interested in Linux configuration management systems.
So learning it has benefits beyond purely configuration management tasks.
The rpm system operates on packages, not individual files. Important capabilities include:
Querying and verifying packages
Installing, upgrading, and removing packages
Performing miscellaneous functions (it keeps the history of all installed packages and can roll back some simple changes).
There are two command-line tools which can provide information about installed and available packages: rpm and yum. GUI
tools are also available. Yum is the more sophisticated of the two and provides capabilities of automatic updates and package management,
including dependency management. It works with repositories, which are collections of packages typically accessible over
HTTP (http://), FTP (ftp://), or the filesystem (file:///). It is written in Python and is
a derivative of Yellowdog Updater -- an updater
for the now defunct Yellow Dog Linux distribution for the Apple Macintosh, which was adapted to Red Hat by folks at the Duke University Department
of Physics.
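A few common queries, for illustration (the package names are arbitrary):
rpm -qa | sort                # list all installed packages
rpm -qi bash                  # information about an installed package
rpm -qf /etc/ntp.conf         # which package owns a given file
rpm -V ntp                    # verify installed files against the rpm database
yum list available 'ntp*'     # what the configured repositories offer
yum deplist ntp               # dependency information for a package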
YUM has the ability to install groups of packages, which, if you maintain a private repository,
you can also create yourself. This is really useful, because many tasks require a collection of different software that may at first glance
not look related at all. There are two types of packages in a group: mandatory and optional. Yum installs only those packages
that are marked as mandatory. This is normally fine because it usually installs all of the key packages, but if you find it didn't install
what you're looking for, you can still install any missing packages individually. To find out what groups are available (and also which
ones you have already installed), you use the following:
yum grouplist
One of the groups that sysadmins tend to use a lot is Virtualization. This group contains all the packages you need such as the Xen
kernel, support libraries, and administration tools.
To get information about the group including the list of packages use
yum groupinfo Virtualization
To install a group, you use the groupinstall command:
yum groupinstall Virtualization
If the group you want to install has a space in the name, enclose it in quotes:
yum groupinstall "Yum Utilities"
As with installing packages, Yum will present you with a list of packages that it needs to download and install in order to fulfill
your request.
A classic example of using this capability is installing X11, if you missed it during the initial install.
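On RHEL/CentOS the relevant group has traditionally been named "X Window System", so the command would be something like:
yum groupinstall "X Window System"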
I read
that book a long time
ago. What I remember (perhaps incorrectly) is that there are simple, compound and complex failures. One error causes a simple
failure, two a compound and three a complex. Complex failures are usually catastrophic. The errors were 1) failure to learn 2)
failure to anticipate 3) failure to adapt. Perhaps a bit overly structural, but it did stick in my mind for years.
"It aren't what you don't know that gets you into trouble. It's what you know for sure that just aren't
so."
Mark Twain
The now popular "software development" analogy is interesting from a purely intellectual standpoint and appealing to those
who write their own scripts on a regular basis.
But it falls short if we analyze the realities of sysadmin work, with the major problems of
accommodating various flavors of Linux/Unix and the unanticipated effects of even trivial changes (which can show themselves only
after the fact and can be discovered days or weeks after the change was made). The rollback of botched changes is often
complex or even impossible on such a complex system as Linux -- on a server that runs applications, often only a total reinstall of the OS from a previous
backup returns it to the previous
state. Like with a river, you often can't enter the same Linux system twice :-)
In other words, the Unix/Linux datacenter is fundamentally a chaotic system with a high degree of complexity and indeterminacy and
periodic crisis situations (after some of which heads can roll). In this sense, Unix system administration is a different activity
from
typical software development. It has more in common with the development of embedded software (consider the effects of bugs in production releases).
What is common is just the high or very high influence
of fashion.
Also, the software maintenance of complex software systems such as an OS or a compiler (the author was involved in the latter) is
far from paradise. The code loses architectural clarity with time, with new features, contributions from new people, bug
fixes, and workarounds. The result quickly becomes unmanageable: difficult to modify without unexpected side effects, hard to reason
about, and increasingly failure-prone. So it is unclear why this is an ideal to which we should strive.
Also, the way Unix sysadmins think about changes to the system is different from how software developers think about development
or modification of software. I have done both, and I can tell you that, while I am a former programmer, I usually think about system
administration tasks more in terms of a surgical operation on the "OS image" -- converting the current image of the OS into the
desirable one. A kind of surgical operation, sometimes "under anesthesia" -- with users disconnected, applications shut down, and
the system
booted to a special runlevel.
And like any surgical operation it involves substantial risks, and should adhere to the principle "First,
do no harm". Consequences of "interventions" are often different from what you expect, sometimes very painfully so (that's why
there are a lot of unpatched systems in major datacenters; sysadmins just do not want to take the risk of screwing the complex system
up). In software development tasks I usually think in simpler terms of adding new features/functionality or fixing
bugs. I never think about software maintenance as a surgical operation.
In this sense all this DevOps hoopla is missing the target and
as such is just another variant of the Agile marketing scam (see
DevOps Is a Poorly Executed Scam), liberating
organizational fools from their money:
I've got to hand it to the Agile development guys -- they were really good at liberating money out of organizations that all
had trouble with something inherently difficult. The geniuses who developed Scrum and Extreme Programming executed masterfully;
selling books and training; and they made some serious bank doing it. If you hang around Silicon Valley long enough, you know to
applaud the hustle. It's the classic Rainmaker scam. You pay a man to make it rain on your crops, and when it rains, he
takes the credit. If it doesn't rain, he comes up with an excuse that involves you paying more money.
While the surgical operation on the "OS image" analogy is not perfect, to me it makes sense and allows me to organize my activities in a more
predictable, controllable and safe manner. The OS image can really exist as a set of directories or an archive containing all OS files
in the case of virtual servers. The "target system state" may already exist on one of the servers (a test or quality server). Most tasks
can be viewed as variations of the generic task of "elimination of the differences" between the "ideal state" and the "current
state" -- much like in sculpture, where creating a statue is just taking a piece of marble and removing the extra. Differences
between the current system state and the desired state imply that there is some "delta" -- a set of files and RPMs that needs to
be applied to the non-conformant system to transform it into the desired state.
And this delta can be visualized as a tree of files that need to be changed and a set of packages that need to be installed/updated.
Such a tree is typically compressed into a tarball, distributed and then "executed" (applied in a very controlled manner) on all target
systems. So creating such a delta is more an iterative process of comparing two systems, removing "extra" files and packages
that differ, and adding/updating packages, than writing a program. Most sysadmin activities are closer in spirit to some complex
task of synchronization (a superset of what rsync is capable of doing) than to writing a set of boring, trivial, or, in the case of Puppet,
"intelligence insulting" scripts that push files and packages to given servers (although in some cases such an approach can also be useful)
and which essentially hide what one wants to achieve.
That might mean that systems that utilize images of servers in a special filesystem, full or partial, and implement
instruments for manipulating them are a better way to go than the traditional "push the files" approach. After all, 100 full
images of Linux system directories, say 6GB each, is only 600GB, or less than a terabyte, and now fits on a USB stick. 300 such images (which
is a pretty large datacenter with more than 300 servers, as one image can correspond to multiple servers) fit on a 2TB USB drive. And you
can still put such a drive in your pocket. In this sense you can call this approach a pocket Unix configuration management system
;-)
Most current Unix configuration management systems are still far from mature. The main push for their deployment comes from
the DevOps hoopla. Most of them suffer from a verbose, non-standardized "configuration definition language" (DSL), which adds to the overcomplexity.
Many suffer from abuse of XML and a practice borrowed from the Agile folk -- inventing new terminology for the sake of new terminology
and making simple things complex.
Selling the king new clothes is an old, well known, but still efficient and profitable business. With the level of knowledge of technology
of typical corporate IT brass I would say that this is an easy task -- low hanging fruit of a sort, as IBM realized long ago, selling
junk systems for enormous amounts of money just on the power of their brand name. So we need to be skeptical and not take
the claims of the designers at face value. Here is a typical example of a small, trivial program in the DSL (domain specific language) used
in Puppet (Puppet Show:
Automating UNIX Administration).
Essentially, the example below is equivalent to the "hello world" program used to introduce new programming languages. The purpose here
is to create the file /tmp/testfile on a node (puppet client) if it doesn't exist:
class test_class {
    file { "/tmp/testfile":
        ensure => present,
        mode   => 644,
        owner  => root,
        group  => root
    }
}

node puppetclient {
    include test_class
}
As everybody understands, copying one file to multiple servers with a given set of attributes can be accomplished with a single
rdist
command, or a similar command in any of the popular parallel execution tools such as pdsh (see the sketch below). So this is a pretty verbose alternative, and it creates some concerns
about the validity of this approach. Why is this type of DSL optimal? Why is it so verbose? This example also demonstrates
both the strong point and the weaknesses of the typical approach to creating such systems -- concentration on the creation of a custom DSL.
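For comparison, here is roughly how the same thing could be done with pdcp/pdsh (the host list file is hypothetical):
pdcp -w ^/etc/hostlists/webservers /tmp/testfile /tmp/testfile
pdsh -w ^/etc/hostlists/webservers "chmod 644 /tmp/testfile && chown root:root /tmp/testfile"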
The strong point is that this framework is extendable to tasks of arbitrary complexity, which now all can be performed
(hopefully) in a uniform way. As with any specialized language, you get the benefit of expressing the problem in more domain-specific
terms. But whether this is an advancement in comparison with shell remains to be seen.
The weak point is that you get into the tar pit of software maintenance. The language does not have a good debugger,
it introduces an additional, higher level of abstraction to your tasks, and in addition to solving your system administration
problems it burdens you with software maintenance tasks (and this is software, written in a specialized DSL). And if maintaining, say,
bash, Perl or Python scripts, while also painful, increases your knowledge of a particular scripting language and as such polishes
valuable, marketable skills, here the value of the newly acquired skills and of the time spent on debugging and maintenance is
more questionable. Would not providing some sort of API (which partially already exists in the form of Unix utilities, which
any sysadmin should know well) and using plain vanilla Perl be a better deal? At least Perl has a very good debugger.
Two more points:
Dreams that somebody else will maintain your scripts, or that you can borrow scripts from a repository and use them without
modification in your environment, are just dreams.
These systems evolve, and that introduces a problem similar to changing versions of a Linux distribution, when the new version
requires significant effort to adapt to.
And if somebody suggests that this is a new, more advanced way to perform Unix system administration, I have some reservations. If you
are not involved full time, you will probably forget a large part of what needs to be done from one encounter to another (and if you are,
you will become disconnected from the real challenges of system administration). So in the end you will use such a system in the most basic
way, utilizing probably a tiny part of its capabilities.
I have the impression that developers are simply barking up the wrong tree by creating this level of overcomplexity, or, in some
cases, may even be artificially creating a franchise that they can milk -- not unlike the "Pet Rock" project. All of the leading
systems of this class are huge monolithic systems and as such have bad integration with classic Unix utilities and other components
of the datacenter such as monitoring systems, helpdesk, etc. In this sense they do not look superior to the popular "tarball +
parallel command execution tools" method, because they suffer from a lack of constructive
ideas about how to maintain complex Unix configurations on multiple servers that have different versions of Unix. There is no "OS version
abstraction layer", unless we consider the system itself to be such a layer. In most cases the differences in file locations and content
need to be explicitly programmed into recipes.
And creating a new DSL is not an answer, unless it can be more concise, more expressive and more easily debugged than the alternatives.
The key problem with the existing systems is the lack of new constructive ideas, which demonstrates itself in extremely boring
books. In a way, most popular systems can be viewed as an "extend and pretend" variant of the set of ideas that were introduced almost
30 years ago in the rdist utility (which was included in BSD 4.3, released in 1986). In
other words, they can be viewed as a slick repackaging of basic ideas that are 30 years old (actually, reading a brilliant article about
rdist by Benedikt Stockebrand -- Introduction
to Rdist -- is probably the best introduction to this set of ideas).
Adding a more "modern" DSL (instead of the shell style used in rdist) and providing several bells and whistles changes very
little. But, as Agile has shown, "rainmaker" style marketing can be a success: just attention is profitable if you can keep
it. As the quote above suggests, in this case the income can come from books, training and conferences. In this sense even open source
systems are not a panacea. They also can be a variation on the same theme as Agile.
NOTE: The utility rdist is a classic Unix utility to maintain identical
copies of files over multiple hosts. It probably provided the first DSL for configuration management. Here is a large quote from the
manpage that gives you some impression of the power of the utility (the example below really belongs to the Unix as it existed around
1992 -- 25 years ago -- and as such is a historical artifact ;-):
It preserves the owner, group, mode, and mtime of files if possible and can update
programs that are executing. It can use SSH as the transport protocol and in
this sense can be viewed as a more flexible and powerful form of scp. The rdist utility reads commands from the so-called distfile
to direct the updating of files and/or directories. If distfile is '-', the standard input is used. If no -f option
is present, the program looks first for 'distfile', then 'Distfile' to use as the input. If no names are specified on the command
line, rdist will update all of the files and directories listed in distfile.
The -c option forces rdist to interpret the remaining arguments as a small distfile. The equivalent
distfile is as follows.
( name ... ) -> [login@]host
To use a transport program other than rsh(1c) use the -P option. Whatever transport program is used, must be compatible
with the above specified syntax for rsh(1c). If the transport program is not, it should be wrapped in a shell
script which does understand this command line syntax and which then executes the real transport program.
Here's an example which uses SSH as the transport:
rdist -P /usr/bin/ssh -f myDistfile
... ... ...
The distfile contains a sequence of entries that specify the files to be copied, the destination hosts, and what operations
to perform to do the updating. Each entry has one of the following formats:
<variable name> '=' <name list>
[ label: ] <source list> '->' <destination list> <command list>
[ label: ] <source list> '::' <time_stamp file> <command list>
The first format is used for defining variables. The second format is used for distributing files to other hosts. The third format
is used for making lists of files that have been changed since some given date. The source list specifies a list of files
and/or directories on the local host which are to be used as the master copy for distribution. The destination list is the
list of hosts to which these files are to be copied. Each file in the source list is added to a list of changes if the file is out
of date on the host which is being updated (second format) or the file is newer than the time stamp file (third format).
... ... ...
These simple lists can be modified by using one level of set addition, subtraction, or intersection like this:
list '-' list
or
list '+' list
or
list '&' list
The shell meta-characters '[', ']', '{', '}', '*', and '?' are recognized and expanded (on the local host only) in the same way
as csh(1). They can be escaped with a backslash. The '~' character is also expanded in the same way as csh but is expanded separately
on the local and destination hosts
As you can see from the example above, rdist covered almost all the ground covered in a more verbose way by modern Unix configuration
management systems. The DSLs used in them are nothing new and might be a "one step forward, two steps back" kind of thing. They are far from
being expressive (some are annoyingly verbose), and in many cases writing a special script in some new and obscure
DSL is not a better/faster solution than using bash or Perl with command line tools and scripts. And it provides you with less
control over the steps. In rdist DSL the "hello world" example written in Puppet DSL above would look something like:
HOSTS = (puppetclient)
F=/tmp/testfile
${F} -> ${HOSTS}
special chmod 644 ${F};
special chown root:root ${F}
notify root@master
Note: actually, special chmod 644; and special chown root:root are not necessary if the file already
has those attributes.
And in both cases the proliferation of such scripts creates the problem of software maintenance, which is an additional
task to be performed by an already stressed and overloaded system administrator. And this problem raises its ugly head with each release
of RHEL or SUSE: after such a release there should be a period of adaptation of all scripts to the new version. Nasty errors
can be introduced by outdated or buggy scripts tuned to previous versions of the OS, but executed on multiple servers in a group that includes
newer versions of the same OS. Ask yourself how many of your own daemon control/verification scripts survived the transition
from RHEL 6.8 to, say, RHEL 7.2 without major changes.
Nevertheless, if you manage multiple flavors of Linux (or, worse, multiple flavors of both Linux and Unix), the need to automate some
tasks does exist. The question is only: what is the best way? Nearly every system administrator tasked with operating a large (as
in several dozens) number of servers eventually finds or writes a set of scripts for executing the most common tasks.
The most brave try to write their own custom mini Unix configuration system (although they probably do not call it that), increasing
the level of automation of their work, but at the price of reinventing the bicycle.
So the first important observation about desirable properties of Unix configuration systems is that they should not force the sysadmin's
hand, but allow integration of his own scripts at least at the level the Midnight Commander allows (in the user menu). Most sysadmins at a senior
level are quite smart people and can automate many of the tasks they face themselves, at least using bash (and bash's potential here is
definitely underestimated; it is difficult to beat bash in the LOC metric for accomplishing a given task, even from Perl or Python).
So what system administrators really need is more like a custom IDE that helps to write such scripts and provides some minimal API
that lessens the tendency of reinventing the bicycle again (logging, execution on multiple servers, and mechanisms
of recovery from changes gone wrong are the things that need to be provided). On the most primitive level that can be just a library
of functions in bash or a set of modules in Perl. But, in any case, the last thing a sysadmin wants is to learn and then debug scripts in
yet another badly constructed and badly implemented DSL, the path most Unix configuration system designers are hell-bent on pursuing.
If I do not know the scripting language in which a particular configuration management system is written, I would choose bash over a new
DSL anytime.
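As a minimal sketch of such a bash "library of functions" (the function names, log file location and host list file are all illustrative):
#!/bin/bash
# tiny library: timestamped logging plus sequential execution over ssh
LOGFILE=${LOGFILE:-/var/log/myconf.log}

log() {
    # write a timestamped message to the screen and to the log file
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" | tee -a "$LOGFILE"
}

run_on() {
    # run_on <hostlist-file> <command...>: execute a command on every host in the list
    local hostfile=$1; shift
    while read -r host; do
        [ -z "$host" ] && continue
        log "=== $host: $*"
        ssh -o BatchMode=yes "$host" "$@" 2>&1 | tee -a "$LOGFILE"
    done < "$hostfile"
}

# example: run_on /etc/hostlists/webservers "uptime"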
There are multiple tools that help to solve this task, and they usually fall outside the capabilities of Unix configuration management
systems. Some "baseliners" can double as such inventory management tools. Comma-delimited files can be exported to Excel or another spreadsheet,
which provides a perfect viewer for this info, far superior to anything that can be achieved via a Web interface.
The realities of system administration are quite different from software development: there are quite a lot of changes during the lifecycle
of a server that require modification of scripts. And this is quite a different subject area, with a different price for mistakes
(remember how NASA lost a probe due to some tiny error), and due to that a stricter discipline of applying changes to a large number of servers.
The key difference is that each change should be ("uniformly") applied to a large number of "slightly different" servers, each
of which deviates from the "ideal" configuration in its own (possibly dynamic; see the problem of many cooks in the same kitchen) way. While
writing software for several different OSes is similar, here we have more variety and complexity; writing software for 10 different
OSes is a rare activity. So this "hell made of many small differences" is only superficially similar to the issue of portability in software
(although it does have similarities with the year 2000 problem). That makes, for example, the distribution of the ntp.conf
file to multiple (let's say 50) servers a non-trivial problem, just because you can't be sure that you know all the factors that are
important. As Mark Twain quipped: "It ain't what you don't know that gets you into trouble. It's what you know for sure that just
ain't so."
For example, I once tried to deploy a modified version of user dot files (we changed to the environment modules package at the time) to servers that
(as I discovered later) had users' home directories mounted "on demand" using NIS. The previous sysadmin had left three months before, and this
"nuance" was never documented.
Also, even for common files it can well be that some versions of the OS on those servers use a different name, or location, or a different
format for the file you distribute, and you are not aware of it until it is too late. So even for such a trivial operation as, say,
the distribution of the /etc/DIR_COLORS file, you can run into the problem of incompatibility between different versions of Linux: the
version that you created and tested for RHEL 6 will not work with RHEL 5. Who could guess? So distribution of files to multiple
servers is not so much a question of the mechanics of distribution. It is mainly a question of knowing which servers form a uniform
group and which are "outliers".
That's where key problems arise, even if the servers you manage are all RHEL/CentOS/Oracle Linux and just have different versions
varying from 5.11 to 7.2 (an almost ideal, dream situation for any sysadmin). That's why sysadmins usually think about such tasks
in terms of groups of servers with a particular version of the OS, to which they need to distribute either an individual file, a tarball,
or a set of RPMs that implements (and then can reverse, if needed) the set of changes. And the main concern is about what happens if the
change goes wrong on some servers of the group -- whether some important existing idiosyncrasies of configuration were overlooked. And
what will happen if the server is rebooted after this "trivial" change ;-). Due to Murphy's law, the servers on which the change
goes wrong can be hundreds or thousands of miles away from the sysadmin's office.
Again, even such a trivial operation as a reboot of a server that is working OK after some trivial changes are made represents some
risk, and causes some fear that the server "will not come up", as any seasoned sysadmin can attest. In case things go wrong,
he should be able to restore the previous state of the system quickly and hopefully correctly (with servers, as with a river, you can't
enter the same river twice ;-), hopefully without discovering other things that went wrong in the process. A typical example is that the
remote control unit such as DRAC or ILO, on which he relies, has also crashed and he can't log into it (which, at one time,
was a pretty common problem for HP servers due to a screw-up in the firmware of those
units; Dell DRAC was also affected at one time, and those naive
folks who believed that they should be able to connect to the server via DRAC without checking were burned. Some badly...)
This emphasis on the high cost of errors and on the necessity of being able to roll a change back is somewhat different from what we have in software
systems, where rollback is usually trivial. Not so with a Linux OS ;-). Like with a river, you can't step into the same Linux twice
:-). So even if you rolled the change back, chances are that you have a slightly different OS than before, unless you installed it from backup.
That's why making a full backup before an important change is a sine qua
non (you should not forget that the author has a PhD ;-) for any seasoned Unix sysadmin.
And only then is he interested in such niceties as history of changes, branches, and other goodies associated with the advanced version
control systems programmers typically use. His main concern is the validity of the backups of his systems and the complexities of rolling
back after the failure of some complex deployment, such as RPMs that went wrong and "hosed" the system (in certain cases making the
system unbootable). As for version control, a local backup of files (with timestamps) in the same directory in many cases serves
as well as a more sophisticated version control system.
This tremendous complexity of the environment to which often trivial changes are applied distinguishes sysadmins from programmers, who
usually need to worry only about backups of their own programs and the data on which they operate, and take the functioning of the system
for granted. Only those programmers who deal with the maintenance of legacy systems can appreciate the pains of regular sysadmin work.
As a superficial analogy, we can say that the "year 2000 saga" is replayed in the sysadmin context each day. To remind you, all the year 2000 fuss
was about a really trivial change in old software, often without active maintainers and written in obscure languages. Even the blunders
sysadmins make are different from the ones programmers usually make. See Sysadmin Horror Stories.
And while systems like Puppet can be useful in certain circumstances, in reality they play a really small role in the complex
set of tasks that arise in managing a large number of "somewhat different" servers, especially if you maintain three or more different
flavors of Unix/Linux, each of which exists in at least two different versions. And for sysadmins, even such a "Spartan" version control system
as creating a backup in place before each change of a configuration file works surprisingly well in most cases. Files are typically
really small, and diffing the current version with previous generations is not that difficult.
There are also dependencies between various daemons that are not that simple and are more of an "indirect influence" type. For
example, many sophisticated daemons such as SGE depend on NTP working properly. The same is true about rsync. Changing your network
parameters, such as the IP address, while being on an ssh connection to the same box, has a nuance about which you can easily forget: your connection
to the box can be cut after the change. So if something goes wrong, that's it. This is a very painful situation if the server does not
have a remote control unit such as DRAC or ILO.
There are also more mundane things in sysadmin work that lie outside "proper" configuration management per se
and belong to the "periphery", but are still extremely important. One of such tasks is managing the "manifest" of
each server, which includes all the relevant information about it. Such information includes, but is not limited to, the current system
administrators, the technical parameters of the hardware, the location in the server room, network parameters, etc.
This can be done, and often is done, using an Excel spreadsheet (see, for example,
Server Inventory
03-02-07 - California) or using a set of HTML files, or some more complex scheme that includes a database, such
as MySQL, and a viewer. But it needs to be done.
The simple quiz below illustrates the set of problems. Assume that the vice president of IT requested information that should be delivered
to him in an hour or so (assume that you have access to your desktop and all your servers at this time) and that you manage around
30 servers:
How many servers do you currently manage?
For each of those systems you need to provide the following:
Hostname, IP, OS version,
Hardware model and the serial number,
Type of CPU and number of cores,
The amount of installed RAM, type of chips and number of slots used on the motherboard,
Number and type of local disks and the total amount of local disk space,
Type of remote control unit (DRAC, ILO, etc.), version, account name and password for remote access (if used).
Without a ready-made spreadsheet or something like it, accomplishing this task in time is very difficult or even impossible, as you
have just two minutes to collect information about each server. There is a trend to use applications such as
openDCIM for this purpose, but IMHO the capabilities of a spreadsheet (with proper macro programming)
are more or less adequate. Such information is also essential in tasks such as moving a datacenter, which become more and more common.
Many of those parameters influence configuration decisions as well, so they should be a part of the Unix configuration management system,
either in a passive way (created by somebody else and just used), or in an active way, collected and stored by some kind of tool (a minimal collector script is sketched below).
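As an illustration, a hedged sketch of such a collector, run on each server (via ssh or pdsh) and producing one CSV line per host; the exact fields and commands would need to be adapted to the hardware and OS at hand:
#!/bin/bash
# collect a one-line CSV "manifest" for this server; dmidecode needs root
host=$(hostname -s)
ip=$(hostname -I 2>/dev/null | awk '{print $1}')
os=$(cat /etc/redhat-release 2>/dev/null || head -1 /etc/issue)
model=$(dmidecode -s system-product-name 2>/dev/null)
serial=$(dmidecode -s system-serial-number 2>/dev/null)
cpu=$(awk -F: '/model name/ {gsub(/^ +/,"",$2); print $2; exit}' /proc/cpuinfo)
cores=$(grep -c ^processor /proc/cpuinfo)
ram=$(awk '/MemTotal/ {printf "%.0fGB", $2/1024/1024}' /proc/meminfo)
disks=$(lsblk -dn -o NAME,SIZE 2>/dev/null | tr '\n' ' ')
echo "\"$host\",\"$ip\",\"$os\",\"$model\",\"$serial\",\"$cpu\",$cores,$ram,\"$disks\""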
If you do not use a particular system on a daily basis, you forget a large part of the functionality that you once knew, and essentially
degrade to some very basic stuff. In the worst case you feel like a novice at the skating rink again: you can do nothing
useful and fall each time you try something. Moreover, if you overlook an important nuance and the system is powerful, you can destroy the
existing configuration of the server in no time.
This is the sad reality of Unix sysadmin life, and I have observed this effect on myself multiple times. For example, at some point I realized
that I had forgotten not most, but quite a lot of the functionality of find (despite teaching Unix in the past and writing
my own tutorial on find usage).
In another case, when I returned to using Bright Cluster Manager (a powerful but proprietary configuration manager designed mainly
for HPC clusters or grids, but which can also be used for managing regular servers) after a long period when everything was working normally,
the first thing I did was destroy the existing installation of the OS on the node that had a problem, because I forgot under what circumstances
this sucker reimages the server on boot. Bright Cluster Manager generally reimages the node only if the node is booted from the network,
and does not do it if the node is booted from the hard drive; but the key nuance that I missed was that it does not install the boot record
on the server by default, and in this case it falls back to network boot even if you boot the server from the hard drive, with predictable results.
And, as you already guessed, there was no boot record on this node. This was just a computational cluster node without any useful
information, so nothing was lost, but it was still a pretty painful experience, because I had planned to use Bright Cluster Manager
for production servers too, and just imagined what would have happened if this were a production server. So the idea was instantly abandoned...
If you use something only sporadically, you can never become an expert in this particular system, and will probably eventually downgrade
yourself to a very small subset of functionality that you understand well. In this sense, rich functionality along with high complexity
are shortcomings for "occasionally used" systems, and Unix configuration management systems belong to this category. For
proprietary Unix configuration management systems, or open source systems bought with professional support, a telling sign is when
you ask a support person to perform some frequently used operation and he starts by reading a man page, and then tells you that
he needs to run an experiment on his test system.
The problem is that a lot of a regular sysadmin's time is consumed by activities that are different from configuration changes
and the maintenance of Unix servers. And if everything works fine, you usually do not pay much attention to preserving the skills
you acquired the last time you used a particular configuration management system. That requires test systems and "fire drills".
Of course, you have some notes, but when "push comes to shove" they most often prove to be incomplete, and some essential information
is missing. Actually, the quality of your personal notes is a very important factor, no less important than the quality
of the configuration management system. Imagine the situation when a blade in a 16-blade chassis malfunctions after, say, two years of normal
work, but you have no idea what the ILO or DRAC password for the enclosure is, you only manage to retrieve the IP address of
the enclosure with some effort, and this particular server room is, say, 500 miles from your working place with nobody to help.
Concentrating on just the task of configuration management and creating and deploying a complex system to automate this particular
area by assigning a special person to it, as most large organizations can afford, is also not a good approach. Experience shows
that such a person soon becomes detached from the realities of sysadmin life and, more often than not, starts engaging in "art
for art's sake" types of activities.
You need to try to log the time you spend on various activities during your typical working week to see what drains your time most. Typically
you have at least two or three "useless" or "semi-useless" meetings (say, one hour each), which possibly require some preparation and ruin
half of the day. Then there might be some new unexpected problem, either with the equipment or with software. You might need to
order some hardware, which is typically a process with a lot of red tape attached to it. Dealing with users might be another
time drag in certain weeks, and this is typically complicated by the fact that the ticketing system might be really horrible and more of a
nuisance than a help. Those areas are not the only ones where your time goes down the drain. And it is a given that for any
complex system that you have not used for at least a couple of months, some part of the knowledge has already evaporated. It might evaporate
less if you diligently keep your own journal (as you must), but you still can't prevent this completely.
In other words, the life of a sysadmin is pretty chaotic, and it is difficult to concentrate on learning, or re-learning, yet another
complex system. If so, then productive usage of such a system is not an easy task, unless you enjoy playing with it in your spare time.
And, believe me, you will discover that it is not the maintenance of the configuration of your systems that takes most of your time.
So there is a clear limit on the complexity that one can stomach, and for Unix configuration systems my hypothesis (please call it the "Softpanorama
hypothesis" to promote this site ;-) is that this level is very low, much lower than the level of complexity that exists in Puppet and
friends. This is a phenomenon somewhat similar to what I previously observed in the computer security area (Softpanorama
Laws of Computer Security):
There are also some inherent limitations on the level of security achievable in any given organization. The author formulated
three laws of Computer Security:
In the long run, the level of security of any large enterprise Unix environment cannot be significantly different
from the average level of qualification of the system administrators responsible for this environment...
If a large discrepancy between the level of qualification of system administrators and the level of Computer Security
of the system or network exists, the main trend is toward restoring equilibrium at some, not so distant, point...
In a large corporate environment, incompetent people implementing security solutions are a bigger problem than most OS
security weaknesses, because users tend to react to actions that decrease the user-friendliness of the system with counteractions
that tend to restore it, simultaneously weakening the security level, often to a lower level than existed before. Real
computer security skills presuppose not only the knowledge of what should be done, but the knowledge of where to stop in order not
to cause excessive backlash. The latter skills presuppose understanding of the architecture of the environment and are completely
lacking in wanna-be security specialists. If incompetents happen to be in charge of security, one should expect that they
will implement the most destructive corporate IT security measures dictated by the current fashion, driven by excessive zeal
and the desire to survive -- measures that backfire and, due to user counteractions, create security holes bigger than those they are trying
to patch.
So the tools should be simple, preferably very simple, with a very low learning curve, at the expense of functionality. That excludes
Puppet and similar "all singing, all dancing" software packages from consideration, unless you also use them as a monitoring system.
In case they are used only for configuration management, their complexity is just way too much for a system administrator to handle.
Preferably, at level zero, such a tool should behave exactly like pdsh.
Only a few of the Unix configuration management systems that I have encountered can do that. Rex
is the only one that I know of.
Many sysadmins' approach to solving Linux configuration problems is an iterative guessing game, where you search Google, then
try one thing, then another. This happens mainly due to the overcomplexity of the environment, when you really do not understand fully
the system you are working with, and have no chance ever to advance to that level. And solving problems when you do not fully understand
the environment is like searching for a black cat in a dark room.
This is especially true for patching Red Hat (and derivative) servers, which creates a set of complex problems, unique to the particular package
management system, that cause a lot of headaches. On RHEL 6.x, if you, for example, install Mellanox Infiniband drivers, regular
RHEL patching does not work unless you exclude quite a few packages. Installing R from the EPEL repository also interferes with
patching of RHEL (library conflicts), but removing EPEL from /etc/yum.repos.d allows patching to proceed OK. With
CentOS the problem is the set of valid repositories. Once I managed to patch a server from CentOS 6.3 to CentOS 6.7 only after replacing
the content of /etc/yum.repos.d from a CentOS 6.7 installation (before that, most repositories listed returned code 404 -- not
found, probably because version 6.3 had already been removed from those repos). This was a remote server, and using a DVD for patching was not
easy as somebody would need to burn it, and I forgot that I could use a USB stick instead.
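For reference, exclusions of the kind mentioned above are usually done along these lines (the package globs and the repository name are illustrative):
# one-off exclusion on the command line
yum update --exclude='kmod-mlnx*' --exclude='kernel*' --disablerepo=epel
# or a permanent exclusion in /etc/yum.conf:
# exclude=kmod-mlnx* kernel*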
No configuration management system can solve this type of problem. Sometimes with RHEL one of several "very similar" systems
can be patched, but on another yum complains. Using your own private repositories helps, but not always. The fact that a typical
RHEL installation consists of around 1600 packages excludes any possibility of learning them all. Most system administrators (including
myself) now do not understand even the role of the daemons that are active at runlevels 3 and 5 of RHEL 6. In other words, we need
to deal with a closed system.
Also, the amount of information that you need to remember is such that some of it fades away, despite being essential. Sometimes I
look at my old scripts and realize, for example, that in the past I used to know find much better than I know it today.
But the key problem is the fact that the system configuration tasks are rarely central to sysadmin life that this severely limits
the level of complexity of the system you can "afford". A lot of time of sysadmin is consumed by mundane problems including dealing
with (often clueless) users and (sometimes equally clueless) managers :-). Among such drains of time we can mention:
Dealing with clueless users. Each datacenter has its own small set of "village idiots", who can't remember their password
(and if SecurID is used, they manage to screw it up too), systematically destroy files in their home directories, etc. Often such
people have only a very superficial understanding of the software they are using, blame system administrators for all problems, and try
to offload as much of their tasks on them as possible.
Dealing with "pack rats", who want to duplicate all the software in the world in their home directories as well as keep
their useless data for the last 10 years or more and systematically go over allowed quota (or squeeze other users if not quota mechanism
is installed), and the press sysadmin to increase it.
"Waiving dead chicken" activities. There are also set of activates in enterprise datacenter which usually can be classified
under the banner of "waiving dead chicken" (the term which came from networking). Preparing useless presentations, weekly or
biweekly "activities reports", participation in useless meetings, and creating Potemkin villages for the visit of some high
level honcho are probably the most typical.
Applying security patches for the unending stream of Linux security vulnerabilities (which actually does very
little to improve datacenter security, as this is mainly an architectural problem).
The last problem -- the unending stream of security patches -- probably deserves a closer look. Many security
problems covered by the stream of patches emanating from Red Hat and Suse are either impossible to exploit remotely, or not applicable
to the particular environment of the datacenter where the servers are installed. Also, the existence of the NSA and CIA guarantees that a sufficient
set of vulnerabilities is always present to simplify their tasks ;-)
So all those efforts belong to the category of "waving the dead chicken". Avoiding blatant architectural errors and configuration
blunders might be a more modest and more realistic goal, but it is never articulated as such. Instead we are fed an unending stream
of "Corporate speak" (aka
corporate bullshit) about the importance of security,
with one Potemkin village built after another. I think Hillary could now have a very successful corporate lecture tour on this particular
topic.
The task of applying this stupid stream of security patches from Red Hat or Suse is often raised to the level of a life-or-death problem
by the security department, which, in order to justify its existence, insists that they all are applied in a timely fashion, even
if those patches mean absolutely nothing at the overall (often dismal) level of security of the particular organization or
datacenter. For example, if all servers access the Internet via a proxy, and in addition site and host based firewalls are used and, hopefully,
properly configured, why should we bother with vulnerabilities that target closed ports? Not only are those patches often related
to services already blocked by the firewall, they often require very special conditions to exploit (for example, an account on the server).
And believe me, as a former IT security specialist, really good remote exploits are sold for money to three letter agencies and
"rich" hacker groups long before (often years before) they are patched by vendors ;-)
Soon you start to hate the security people involved. And often not without reason, as they are often dumped from other IT departments
because they are useless, or gravitated to security themselves as an opportunity to repair their injured ego ;-). Sometimes I
saw a really amazing level of security paranoia in organizations, artificially maintained by the security department in order to preserve
and maintain their value (often fictional; as I mentioned before, security in reality is a problem that exists and should be solved
at the level of datacenter architecture, and the last department that should be deciding architectural issues is the security department).
For example, I saw organizations which deploy their internal DNS root (so you can't resolve any external IP without going via the proxy)
and simultaneously, once a month or so, send their sysadmins a list of security patches that need to be applied ASAP, a list created
by scanning servers with some third-rate vulnerability detection system that produces a lot of false positives. But the latter does
not bother anybody. Instead, efforts are concentrated on reporting and maintaining spreadsheets about the percentage of fixes accomplished.
Fortunately, there are some tricks that you can deploy against those security junkies, but that is quite another topic.
See Softpanorama Bulletin, Vol 23, No. 10 (October
2011): An observation about corporate security departments
Of course, we know about the opposite cases as well, when extremely sensitive systems were configured and administered as if they
were home systems. See, for example,
Understanding
Hillary Clinton email scandal. Which is not surprising and is just the opposite side of the same utter incompetence coin: extremes meet.
"The animals were happy as they had never conceived it possible to be. Every mouthful of food was an acute positive
pleasure, now that it was truly their own food, produced by themselves and for themselves, not doled out to them by a grudging
master."
- George Orwell, Animal Farm, Ch. 3
"I will work harder!"
- George Orwell, Animal Farm, Ch. 3
"All that year the animals worked like slaves. But they were happy in their work; they grudged no effort or
sacrifice, well aware that everything they did was for the benefit of themselves and those of their kind who would come after
them, and not for a pack of idle, thieving human beings."
- George Orwell, Animal Farm, Ch. 6
The work of Unix system administrators was always hard. More often than not, it requires long hours. Like in the popular song "The
cowboy's work is never done", the work of a Unix system administrator is never done. That reminds me of the tale of
Sisyphus:
In Greek mythology Sisyphus was the king of Ephyra (now known as Corinth). He was punished for his self-aggrandizing craftiness
and deceitfulness by being forced to roll an immense boulder up a hill, only to watch it roll back down, repeating this action
for eternity.
And that situation does not change with the invention of Unix configuration management systems. You just get more systems to manage,
or, in terms of the Sisyphus tale, a larger boulder. But we are digressing.
If we have a complex mix of different Linux flavors in the datacenter plus several classic versions of Unix (for example Solaris and
HP-UX), we have a new set of problems due to the "compartmentalization" of system administration, as different people are typically
responsible for each such Unix.
Of course, in most organizations the IT brass practices the game of selecting a "preferred" vendor, which is a favorite but very inefficient,
weak form of "unification" effort in the enterprise datacenter (today SLES, tomorrow RHEL, Oracle Linux the day after tomorrow). Customers
often want the particular flavor on which their application runs best. Also, this "preferred vendor" is often changed when one honcho at
the top is replaced with another. Acquisitions also throw a monkey wrench into those efforts.
What we are talking about here is a replay of the Unix Wars (Solaris vs. AIX vs. HP-UX vs. BSD) in the Linux space (Suse vs. RHEL vs. Debian/Ubuntu),
as well as the differences between versions, such as RHEL 5, 6 and 7 (if you look closely they are pretty substantial,
especially between versions 6 and 7). As Mark Twain noted, "History doesn't repeat itself, but it does rhyme."
The complexity of modern Linux is not only multiplied by the existence of multiple flavors of Linux, it is raised to a new level by the
multiplicity of scripting languages (Perl vs. PHP vs. Python vs. Ruby), multiple web servers, databases and other applications
that you need to at least partially understand to manage your systems.
For example, managing both Suse and RHEL servers is almost twice as complex as managing a uniform RHEL or SLES server park.
Also, there are often remnants of Solaris, HP-UX and AIX servers in enterprise datacenters (and they will continue to exist,
as they often fill a specialized niche, providing, in the case of Solaris for example, a higher level of security). That means that for the set
of operations implemented, there should be some form of "version awareness". For example, if we install packages on three flavors of Linux
(Red Hat, Suse and Debian) then we need to differentiate between the tools available, and our "install" operation might be based on translating
yum parameters (if Red Hat is the dominant flavor of Linux in your organization) or zypper parameters (if SLES dominates) to the other package
managers. Or you can create an install script that contains something like this (which is the way Unix configuration management
system designers typically approach the problem):
function install
{
    # Select the package manager invocation based on the detected OS flavor
    if [[ $OS == 'Red Hat' ]] ; then
        package_manager='yum -y install'
    elif [[ $OS == 'Suse' ]] ; then
        package_manager='zypper -n in'
    elif [[ $OS == 'Debian' ]] ; then
        package_manager='apt-get -y install'
    fi
    $package_manager "$@"    # pass the requested package names to the selected package manager
}
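The $OS variable above has to be set somewhere. Here is a minimal detection sketch, assuming the distributions ship /etc/os-release (older RHEL/CentOS 5 and 6 boxes need /etc/redhat-release instead); the function name is just an illustration:

# Hypothetical helper that sets the $OS label used by install() above
function detect_os
{
    if [ -f /etc/os-release ] ; then
        . /etc/os-release                     # defines ID: rhel, centos, sles, debian, ubuntu, ...
        case "$ID" in
            rhel|centos)      OS='Red Hat' ;;
            sles|opensuse*)   OS='Suse'    ;;
            debian|ubuntu)    OS='Debian'  ;;
        esac
    elif [ -f /etc/redhat-release ] ; then    # RHEL/CentOS 5 and 6 lack /etc/os-release
        OS='Red Hat'
    fi
}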
But the devil here is in the details. If you start inventing your own parameters, that adds to the complexity. If all parameters are borrowed,
say from yum, this is a better deal, but the task of matching them to zypper and apt-get is slightly more complex. Still, I would like to
stress that selecting the dominant Linux distribution and translating its utility parameters to the other distributions is a better
path than inventing your own "yet another" set of such parameters.
Most of the major options can be translated, and if some option can't, it should be treated in some special way. But typically
system administrators know and use only a very small subset of, say, yum capabilities, and outside of yum -y install or yum -y
update know little, so the loss here might be much smaller than one would assume. Of course, the optimal subset of parameters can be created
only on a case-by-case basis.
In this case you are adding another layer of complexity, and if the installation encounters library conflicts you need to go down one
level and troubleshoot the problem directly, in terms of the package manager used, with all the gory details that such troubleshooting involves.
A Unix configuration management system is useful only if the deployment goes smoothly.
As each of the major hardware vendors has its own approach to remote controls (Dell DRAC, HP ILO), such an operation as a complete shutdown
of the datacenter when some work on electrical equipment or air-conditioning is performed requires some thought. Most such units
are accessible via ssh and have a set of commands available. Also, each hardware vendor has its own set of utilities for hardware updates.
In the case of DRAC or ILO, for example, you apply firmware updates directly from Linux. But again the devil here is in the details, and each update
mechanism has its strong and weak spots, especially in the area of remote control of the process.
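For illustration, a hedged sketch of such a scripted shutdown via the BMCs; the host naming convention, the credentials file and the use of ipmitool over lanplus are assumptions (DRAC and ILO also accept vendor-specific commands, such as racadm, over ssh):

# Minimal sketch: request a graceful OS shutdown on a list of servers via their BMCs (DRAC/ILO)
# Assumes BMC hostnames follow the <server>-bmc convention and the password is kept in ~/.ipmipass
for server in web01 web02 db01 ; do
    if ipmitool -I lanplus -H "${server}-bmc" -U admin -f ~/.ipmipass chassis power soft ; then
        echo "$server: soft power-off requested"
    else
        echo "$server: BMC did not respond" >&2
    fi
done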
Often just the number of servers that one needs to manage presents a big enough challenge to make your life miserable. If you want to
do a good job you are simply swamped.
Users often think that they can offload both package installation and some application problems onto system administrators.
That is especially true for complex open source packages. Even such standard packages as PHP, Apache and MySQL (the LAMP stack) are
complex to install and maintain, and there are interdependencies between the three. For example, such packages as Fantastico, Softaculous and
Quick Install simplify installation of the LAMP stack, CMS (Joomla, Drupal), blogs (WordPress), and over 70 other open-source applications.
One human life is not enough to learn them all in fine detail. So sysadmins need to cut corners and rely on prepackaged solutions.
Few people now compile complex packages themselves like in the "good old days." Even RPM-based installations have a million nuances that
one needs to remember. Add to this the Python and R installations that each datacenter now needs to have, and you get multiple complex problems with versions
and updates. For a human it is simply too much: it is impossible to remember the nuances of each installation and software package.
The level of non-uniformity of the datacenter is probably the most important factor, and one that corporate IT brass does not want to address.
It by and large determines which Unix configuration management system to use, because the tool needs to support all the flavors
of Unix you have. Adding Windows installations to that is probably not wise: tools that try to support Unix and Windows simultaneously usually
support neither well, and should probably be rejected, because striving for that is just greed (larger market share) and often an architectural
error (unless Cygwin is used on the Windows side).
More specific problems that Unix configuration management systems are trying to solve include:
Ability to hide most of the OS differences related to configuration and patching of the servers (now, with the dominance
of Linux, this is less important, although Solaris and HP-UX still remain parts of enterprise datacenters) using a domain-specific
language. That was actually the initial idea behind cfengine. Also, if you use just RHEL and derivatives you can use
kickstart for deployment and yum for package management, but if you have both SLES and RHEL, your situation is more
difficult.
Reporting the changes you made to the server yourself, and the related problem of being informed about (sometimes wrong or
redundant) changes done by other sysadmin(s). "Change we can believe in" made by somebody else, which produced "interesting"
side effects, is sometimes pretty difficult to detect :-).
Ability to put specific configuration files under revision control and to ease the burden of having to remember to commit
changes to multiple boxes (using distribution to a specific group of servers instead). There are attempts to use git
for this purpose, but git is badly suited to Unix configuration specifics, and unless you use git heavily for software
development this is a bad idea. It is barking up the wrong tree. Such packages as
etckeeper can be viewed as a failure. Of course, you can always write
your own set of scripts to make git work better, using it just as a storage of configuration information, but this is another
story.
Consistency checks between servers belonging to one group, and comparing the current configuration with the configuration of another
server or the configuration of the same server N days ago. Existing configuration management systems are bad at this. Baseliners are
a specialized class of programs designed with this particular goal, as you can diff two baselines (which are typically text files), but there
are even better, more specialized systems for this purpose. Also, existing utilities like diff and mc have unique
capabilities here: few people know that GNU diff can take two directories directly as parameters without any "input
substitution magic". Try something like diff /etc /Rescue/Baseline/Etc_old (see the sketch right after this list).
Automation of similar changes (often distribution of patches or changes of configuration) to multiple servers (for a particular
server group within which this group of changes supposedly does not break anything ;-) and maintaining consistency of a set of manually
modified configuration files across all servers (/etc/resolv.conf, ntp.conf, /etc/postfix/main.cf,
/etc/profile, /etc/bashrc and user dot files are good examples here). This is actually not that difficult
to implement using such tools as rpm, pssh, PDSH, or C3 Tools, but
it is somewhat better to have integrated functionality which creates an integrated log of such operations and puts them in the
general context of the "lifecycle" and "workflows" for the particular set of servers.
Automation of collecting configuration information or hardware information from multiple servers, both for resource management
and for bare metal recovery. In Puppet such information is called facts, and there is a special utility, facter, to collect
them. If you use daily backups for your systems, you also have a collection of configuration files for the system
as a part of the backup. The problem here is that in the enterprise datacenter backup is bureaucratized and fossilized. Baselines
of the system and private tarballs are a simpler method of having a collection of basic configuration information over time (usually one
year is enough). Baselines are organized for ease of comparing two systems, or two states of the same system, using regular diff.
Similarly, tarballs of the /etc directory can be compared with the current state using tar itself. That makes creating a backup
tarball of /etc on the first root login each day (from the root profile script) of paramount importance. Many SNAFUs can be avoided
if you have a tarball of the /etc directory made at the beginning of a particular day.
Control of some daemons that tend to self-destruct: verification that they are running, and restart of daemons and applications
in case they died (a task typically performed by monitoring systems, which are suitable for it). Paradoxically, some monitoring
system agents (for example, HP Open View) are so notoriously
unreliable that you need an additional layer of software to ensure that they are running properly (the HP
Open View agent consists of half a dozen daemons that tend to die and sometimes need troubleshooting to recover; here a Unix configuration
management system can be of great help). In some large enterprises the giants of thought from the monitoring group (which in feudalized enterprise
IT is, of course, a separate group with its own manager and its own interests, distinct from the interests of the enterprise
as a whole) automatically create tickets for sysadmins for each dead daemon (probably because re-launching daemons would distract
them from watching porn on the job; this is probably the closest approximation of
Sisyphus labor in modern IT :-)
Semi-automatic verification (and reporting of violations) of important OS settings (the set of tasks which in the old days was
usually incorporated in what were called "hardening" scripts). Unix configuration management systems already provide pretty developed
infrastructure that can simplify (and also complicate) a set of "system sanity" checks, such as checks of file and directory
permissions, presence of various banners, absence of typical errors (blunders) in configuration files that open the server wide,
etc. In the past there was a class of software systems designed to verify certain settings and enforce some
parameters. They were known as "hardening scripts". Such early systems as
COPS by
Dan Farmer and
Titan by Brad Powell were probably the most well known. Later Solaris
JASS and Linux
Bastille (badly written, but hugely promoted) became
somewhat popular. Around 2010 they eventually disappeared or, more correctly, went into a semi-forgotten stage, but the idea is still
valid and now can be executed on a new level: the level of scriptable Unix configuration management systems. Actually,
the task of re-implementing the functionality of a typical set of hardening scripts, such as Titan, is a very good test for a particular
Unix configuration management system. It gives you a much better assessment of the strong and weak points of such a system than the creation
of some stupid or not so stupid "evaluation matrix" -- a sport that became alarmingly popular in the enterprise IT environment, as
such a matrix can hide the responsibility for a blunder.
Documentation of the life cycle of the server, events that happened, and operations performed, and presentation of this information
in a blog or wiki format. The lack of documentation and the limitations of human memory when dealing with the typical flow
of tickets in a corporate datacenter are such that some aid here is not only desirable, it is extremely, utterly necessary. It is
a survival tool. And the simple paper log that in the past was "good enough", while still useful, is not adequate at the current level
of complexity. You need a Web site format like a blog or wiki to help deal with this level of complexity. Unfortunately, corporate
ticket systems (help desk systems) are so bureaucratized and mismanaged that they are more an obstacle than a tool for documenting
changes in the systems you manage. Here different systems, less controllable by the corporate bureaucracy, might help.
For example, PuppetDB stores and aggregates data about changes to nodes. All dashboards provide a web interface to review
the data from PuppetDB, and there are tools that utilize the same DB as a data source.
Often a new problem in the Unix system administration domain is nothing but a well forgotten past problem. So maintaining records
of your activities in a searchable format (not necessarily a database; HTML and plain files are as good, or even better) is of paramount
importance. MediaWiki is often used for this purpose too, as learning it has value beyond this particular domain. While it is
complex software and uses a wiki markup which I hate, it does provide several useful tools such as discussions, versioning
and other wiki services, and is pretty well debugged (this is the engine used by Wikipedia). The ability to document your day-by-day activities,
and especially blunders, or as they are now called, SNAFUs, is now an important part of life in system administration, because
you will lose most of this knowledge in two or three months, and if you face the same problem again you will most likely try to reinvent
the bicycle ;-). Also, people tend to repeat blunders (and different administrators are susceptible to different blunders; our
shortcomings are an extension of our strong traits) unless they periodically browse their logs. Weaker folk often
try to sweep their mistakes under the carpet, which usually complicates the situation. See
Sysadmin Horror Stories for some telling examples.
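As promised in the consistency-check item above, here is a minimal sketch of comparing the live /etc against baselines using only standard tools; the baseline directory and tarball paths are assumptions:

# Directory-to-directory comparison of the live /etc with a dated baseline copy
diff -r /etc /var/baselines/etc.20170101 | less

# Comparison of the live /etc with a tarball made earlier; GNU tar reports changed files itself
# (the archive stores paths without the leading slash, hence the cd /)
( cd / && tar --diff -zf /root/backup/etc/etc170101 )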
This (notably incomplete) list shows pretty clearly that such systems overlap with several existing systems, first of all with
monitoring systems and RPM-based systems of distribution of patches such as YUM, especially YUM's ability to use private repositories.
The RPM format includes the capability of running pre and post scripts. As for the overlapping functionality with monitoring systems, as I mentioned
before, Puppet can definitely compete with Open View. Actually, only when I started to view Puppet as a monitoring system competing
with Open View did its design decisions start to make some sense to me, and stop looking like a horrible overkill, because
agents definitely have a value in monitoring systems. There is actually a book about the use of Puppet purely for monitoring:
Puppet Reporting and Monitoring by
Michael Duffy (Packt Publishing,
June 24, 2014)...
Another subset of functionality definitely belongs to version control systems such as Subversion and git. Actually, a central
git repository can be used as a source of distribution of changes, which allows a very well controlled mode of distribution of configuration
files, with the possibility to reverse actions, simultaneous documentation of each change, a built-in diff mechanism, etc.
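A minimal sketch of this approach; the repository location, file names and the restart command are assumptions, and in real life the pull step would be wrapped into pdsh or a similar tool:

# On the central admin server: keep the master copies of selected config files under git
mkdir -p /srv/config-repo && cd /srv/config-repo
git init
cp /etc/ntp.conf /etc/resolv.conf .
git add ntp.conf resolv.conf
git commit -m "baseline ntp.conf and resolv.conf for the web group"

# On a target node (or in a loop via pdsh): fetch the repo and put the files in place
git clone ssh://admin@central/srv/config-repo /root/config-repo   # later runs: git -C /root/config-repo pull
cp /root/config-repo/resolv.conf /etc/resolv.conf
cp /root/config-repo/ntp.conf /etc/ntp.conf && service ntpd restart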
Yet another subset of similar functionality is implemented in so-called bug trackers, as most changes include not only a description
of the problem but also a set of files and other documentation that needs to be stored.
Trac integrates with git and Subversion and provides a minimal but
adequate wiki for documentation. See Comparison
of issue-tracking systems - Wikipedia
For any integrated system there are always some overlaps with existing systems. That's the nature of the game. The problem
is the quality of implementation of the particular overlapping function in comparison with the "dedicated" implementation of the same.
We all know about tools that can perform many functions but can't perform any of them well. Moreover, there are some niche products
that essentially undermine the whole concept of a "Swiss knife for Unix system configuration management". For example, "environment
modules" represent a specialized configuration management system for a very narrow domain -- user .bash_profile and
.bashrc scripts. This package defies the concept of a Swiss army knife for Unix configuration management. The same
is actually true about another unique tool for the Unix system administrator -- Midnight Commander.
But here I am not an impartial observer...
Also, it is clear that there has been no clear breakthrough in this type of system yet. There is only some incremental and rather slow
progress, and the rising complexity of this category of tools, as if complexity solves problems rather than proliferating them.
No exciting or revolutionary ideas were introduced by this type of software. All of them belong to the category "same old, same
old".
For example, books about Puppet (more than a dozen exist) are so boring that reading them is a real pain. And they typically advertise
boring, semi-useless examples detached from real sysadmin needs, like deploying something like the NTP daemon as the ultimate achievement.
Even for this simple task, the functionality they provide is not very convincing. For example, few such examples, even those
published in books, include the most vital check after the installation -- whether the time displayed by the NTP daemon
after the installation is correct (and this is the major real problem with NTP installation in any large organization, as complexities
such as proxies and firewalls make everything pretty convoluted). In other words, what they are doing is not very useful and is a minor
enhancement of the capabilities of the existing RPM package, which with minor modifications would provide the same or better functionality
with less effort. Moreover, unlike learning Puppet (unless you are a Ruby enthusiast), modifying an RPM package instantly teaches
you really valuable skills, which can be applied in such areas as troubleshooting library conflicts in complex software installations.
All those considerations again beg the question "Is the king naked?"
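For the record, such a check does not require much. A minimal sketch, assuming classic ntpd, ntpq in the PATH, and an arbitrary 100 ms tolerance:

# Verify that ntpd has actually selected a peer and that the offset is sane
# ntpq -pn prints the peer table; the line starting with '*' is the selected sync peer
offset=$(ntpq -pn 2>/dev/null | awk '/^\*/ {print $9}')       # offset column, in milliseconds
if [ -z "$offset" ] ; then
    echo "ntpd has not selected a sync peer yet -- check servers, proxy and firewall path" >&2
elif awk -v o="$offset" 'BEGIN { exit ((o < 0 ? -o : o) < 100) ? 0 : 1 }' ; then
    echo "NTP looks sane, offset ${offset} ms"
else
    echo "NTP offset ${offset} ms is suspiciously large" >&2
fi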
The fact that some of those configuration management systems are used by several large and influential organizations proves nothing:
large and influential organizations are notable for using software junk, because due to the huge resources available, including manpower, they
can make it work and, at least, appear useful. Many such systems are examples of the "let's do something" approach to the creation of a Unix
configuration management system, and lack any constructive ideas and approaches to the problem (see
GNU cfengine as a classic example).
The second problem is that their "server configuration description languages" are still at the stage of infancy. Some are not
really usable, and most are far from comfortable. The typical first reaction of a normal Unix sysadmin at seeing such a description is
"why the hell do I need all this additional complexity?". That makes simple tools like baseliners and some adaptation of software version
management for configuration files more attractive, as they provide, say, 80% of the necessary functionality with 20% of the trouble.
A related problem is that they try to solve tasks that are solvable by other means no less well, and avoid tasks for which a configuration
management system is of primary importance -- such as automating patching of a group of servers and creating a visual map of a complex Unix
server configuration, which allows you to understand it better and make fewer mistakes when modifying it.
The control of changes--including the recording thereof--that are made to the hardware, software, firmware, and documentation
throughout the system lifecycle.
The control and adaptation of the evolution of complex systems. It is the discipline of keeping evolving software products
under control, and thus contributes to satisfying quality and delay constraints.
Software configuration management
(or SCM) can be divided into two areas. The first (and older) area of SCM concerns the storage of the entities produced during
the software development project, sometimes referred to as
component repository management. The
second area concerns the activities performed for the production and/or change of these entities; the term
engineering support is often used to refer to this
second area.
After establishing a configuration, such as that
of a telecommunications or computer system, it consists of evaluating
and approving changes to the
configuration and to the interrelationships among system components.
"People forget how fast you did a job
but they remember how well you did it"
― Howard Newton
Festina
lente is a Latin saying that can be translated into English as "hurry slowly", sometimes also rendered as "more
haste, less speed" or "haste makes waste". If tasks are overly rushed, mistakes are made and good long-term results are not achieved.
It was adopted as a motto by the emperors Augustus and Titus, as well as the Medicis. The Roman historian Suetonius, in De vita
Caesarum, tells that Augustus deplored rashness in a military commander:
He thought nothing less becoming in a well-trained leader than haste and rashness, and, accordingly, favorite sayings of his were:
"Hasten slowly"; "Better a safe commander than a bold"; and "That which has been done well has been done quickly enough."
About a decade ago I spoke with one old Unix administrator and asked him why he changes root passwords on the servers assigned
to him by logging in to each server and doing it manually, while he has perfect tools to do it all at once in less than a minute and uses them
for other tasks on a regular basis (this particular organization had the full Tivoli suite of applications installed and running). The
organization had a 90-day root password expiration policy, so this was not that frequent, but still quite a bit of work. He answered that this
was essentially his only chance to look at the servers more closely, on an individual basis, during the quarter. By logging in
he checks some other areas along with changing the root password, spending some time on each server. So this occasion serves as his
once-a-quarter "server round trip".
I noticed a somewhat similar effect myself in different circumstances. For example, when you make a non-trivial configuration change
manually on, let's say, the first ten servers, creating or modifying an existing script (which you then attempt to run on the remaining servers
via some parallel execution tool), you learn a lot in the process and usually tweak your script quite a bit toward the end, as you gain
experience and learn about some nuances that you were initially not aware of. But if you run your, say, pre-existing script from
the very beginning, you learn nothing, and can possibly miss some important things, screwing at least some of the servers up to the extent
that the change on them needs to be reverted. And "needs to be reverted" is often a non-trivial undertaking, probably more complex than
making the change.
So the speed of making configuration changes, about which most Unix configuration management systems boast, does not have an absolutely
positive value. In some circumstances it can be a huge disadvantage, as recovering from the results of running a botched script on,
say, 16 servers is often more time consuming than making the change manually.
Slow speed also facilitates learning and allows you to produce better notes. Both those aspects are extremely important.
So the danger of Unix configuration management systems is that they implicitly encourage you to run a script on more servers than you
should, and before it reaches the state of maturity necessary for your environment. That's why running changes manually on, say, the first
dozen of pre-selected and not that important boxes is a preferable tactic. No matter how well you test a script in your test environment, production
boxes can produce some unpleasant surprises. You had better be prepared for them.
The same is true about the ability to change the configuration on, say, 100 servers simultaneously. Often this is simply not needed.
It is better to split those 100 into smaller groups and run each group in a more controllable fashion. Only with virtual machines
can you allow yourself to behave more "recklessly", as replacing a botched image with an old one is a trivial procedure. Of course,
much depends here on what applications your virtual machines are actually running. If they are critical for the business, you had better be
more careful.
One typical task that any Unix configuration management system should do well is distributing a change in a single configuration
file to multiple servers (a server group). The task looks simple, but in a typical datacenter it actually is not, mainly due
to the multiple flavors of Unix/Linux involved.
There is a lot of hidden knowledge required to implement even simple changes, and this knowledge often exists outside of any automated
system. Some of it is even difficult to formalize. The main complicating factors here are the number of affected servers and the "remoteness"
of some servers. The latter means that if they crash after the change, there is no simple way to get into the server room where they
are located, and often there is no personnel on duty to perform anything more complex than putting a DVD into the slot (or there is no
personnel at all). If such a server does not have something like DRAC or ILO, then in case you lose the network connection the only way to see
what is happening is via pictures or a video feed from a smartphone (using Skype or similar), or by pointing somebody's laptop camera at the
monitor. Here is a list of some complexities that may arise and precautions that might need to be taken, especially for remote
servers where there is no personnel on duty at the time of the change due to time zone differences or other reasons (a minimal "careful push" sketch follows this list):
You need to preserve the previous version of the file on each affected server. Otherwise you can't roll the change back.
You need to create a "manifest" -- list of files the you distribute to each server to simplify roll back.
If the change is "critical" you need a set of partial or full backup of each server to be performed first.
/etc backup. As a minimum precaution you should always back up the whole /etc directory and some
parts of the /root and /var directories before making any changes, as a part of due diligence. Assume that in the worst case the server can
become unbootable and plan accordingly ;-)
Baseline. If you use a baseliner, then a baseline should be taken before and after the change. For example, Red Hat
provides the free SOS package (sosreport), which is of low quality but still can be used. SLES provides the somewhat better supportconfig
script.
Filesystem snapshot. If the operating system permits it, a filesystem snapshot mechanism should be used to prevent unintended
consequences, especially if the change is complex (patching multiple subsystems qualifies as a complex change; the fact that the patches
were tested by the vendor does not mean that they will work in your environment with all the applications you use).
Verification of the list of affected servers that constitute the group. The problem here is that your list of affected
servers (the server group) can be wrong. This is a typical problem when attempting to propagate a file to a very large number of
servers. There are always some outliers.
There can be "manually patched" servers that already contain configuration files with timestamp newer the "cutoff" for
the change. In other words file that were edited manually and now can't be "unified" without understanding what was added
and why. Such files can be detected because diff file for them will different from a "typical". That means that you need
to save deltas and compare them.
You need a method to verify that the old version of the files exists on all affected servers and that you are not overwriting files
on the wrong servers. In the most benign version this looks like sending an updated bash profile to an HP-UX server that
does not contain bash, or contains a too old version of bash. In a more menacing version this looks like overwriting files on a
RHEL 5.x server with configuration files belonging to RHEL 6.x.
You might also need to verify additional prerequisites for the update, especially if you update the kernel. Red Hat
sometimes plays bad jokes with kernel updates.
Changes of daemon config files often require a restart of the daemon or other post-installation actions. If you update
the configuration file of a daemon that is running, you often need to restart the service for the change to take effect.
Verification of the change. You might need some method to verify that the change actually worked on all servers in the group and,
what is even more important, produced the necessary change in behaviour. Even if you view the servers as identical, there might
be some hidden differences that will change the behaviour of the update. For example, one of several servers might have its registration
expired and no longer be able to access the needed repositories. Actually, testing the validity of the registration is a must if you install some package
on multiple servers.
On some servers additional changes might need to be made in order for this change to work properly.
Creating documentation for the patch. A group of changes is typically viewed in system administration as a patch. It is
desirable to generate some documentation about the change made "just now", so that you don't forget about it and other system administrators
are aware of it too.
If things went wrong. In case of a SNAFU, you need some mechanism to uninstall the change (which in this case means
restoration of the old file and possibly a restart of the daemon or similar actions) if you find that you made a mistake or the change
does not work as expected (a situation which is typically detected when it is too late).
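Here is the promised minimal "careful push" sketch for a single file, using pdsh/pdcp; the group range, the file name and the daemon are assumptions, and it implements only a few of the precautions from the list above:

GROUP="web[01-16]"          # pdsh/pdcp host range for the server group (an assumption)
FILE=/etc/ntp.conf
STAMP=$(date +%Y%m%d)

# 1. Preserve the previous version of the file on every affected server
pdsh -w "$GROUP" "cp -p $FILE $FILE.$STAMP.bak"

# 2. Push the new version prepared and tested locally
pdcp -w "$GROUP" ./ntp.conf.new $FILE

# 3. Restart the daemon and spot-check that the change took effect
pdsh -w "$GROUP" "service ntpd restart && ntpq -pn | head -3"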
There are several not so obvious problems that arise in an environment where multiple system administrators try to manage multiple
intersecting groups of servers. Among them:
The problem of too many cooks in the same kitchen
The problem of unannounced or forgotten changes, missing files, "history gaps", and the importance of your own knowledgebase
Inability to find the necessary information that you know exists somewhere
Avoiding SNAFUs due to typos, making a change on the wrong server, and other trivial blunders
There are many others, but that is enough to show that any Unix configuration management system has severe limitations in what it is able to
accomplish. The "human factor" remains a very significant, if not decisive, factor in this business, and all this "software development" talk
is, in a way, just an attempt to sweep those problems under the carpet.
No matter what Unix configuration management system you use and what the major flavor of Linux in your datacenter is, you face a set of additional
complex problems when several sysadmins administer multiple servers. Usually they have unequal qualifications, some of them behave
badly under stress, and because of this some unique problems arise. The system administration area is far from being a paradise, and there are
several complex problems that go above and beyond the distribution of changes and patches. One of them is the problem of multiple
cooks in the same kitchen and informing members of the team about each other's actions. There is a fair amount of backstabbing
as well, especially if there are one or two narcissistic jerks in the team who consider themselves superstars and everybody else "trash".
"Cascade of interventions" that can happen with multiple administrators especially if they work different shifts can happen when
something going wrong, often making the situation worse. When one administrator make some disastrous change and then denies that he
made it is easy to get too emotional. But it is better to get technical ;-). For that you need the tool that record changes and allow
you to recreate history and reverse changes without too much drama.
Even if you are the sole system administrator for a particular group of servers, it makes sense to keep track of all the small changes
you make to the configuration of each of them and to understand three things:
What you did,
Why you did it,
How to do (or not to do ;-) it again
Without supporting tools these three simple items are an impossible task, as there are way too many changes for the human mind to remember.
Also, with the complexity of modern Unixes, answering the second question often represents a formidable challenge, especially a month
or two after the change was done. The key problem is that you forget significant, often critical details way too soon, typically
within a couple of months or even sooner after the change is made. And what is important is that people tend to forget the most crucial,
complex details, recovering which later requires substantial work and Google searching, with "reinventing the bicycle" taking
hours, days or even a week. Keeping a personal log (for example, in the form of a private Web site on a tablet or netbook)
can help if done religiously, but the complexity here is such that it is not enough.
Also, the flow of problems is relentless, and often you need to deal with more than one problem in a single day. Juggling several
problems is a formidable challenge, and switching from one problem to another during the day is productive only if the problems are relatively
minor. For a "real" problem you need 100% concentration, and here other problems are your enemies.
But constant distractions are the reality that Unix system administrators face. Add to this long hours and you are really ready for
any tool that can help you. But often such a tool is a false promise.
The worst problem that you face is the limitation of human memory: there are just way too many things that a Linux/Unix
sysadmin needs to remember. Even the number of utilities in Linux is such that without personal notes and manpages you are often lost,
and forgetting some important nuance might lead you to make some disastrous move that you somehow managed to avoid the previous time.
Stress also creates additional problems: a stressed sysadmin usually commits more errors.
Also, some form of protection from plain vanilla stupidity is welcome. I have been in this situation several times.
And believe me, such a thing as rebooting the wrong server is just a child's game in comparison with other blunders that you can step into
under stress, in a hurry, or because of being too tired (by the way, I do recommend renaming the reboot command on production servers to something
like reboot_usdell68 as part of post-installation tuning, where usdell68 is the name of the particular server, or replacing
it with a script that asks a simple question: is this the right server to reboot?). How about wiping the /etc directory on a
critical corporate server in the middle of the day, just because you have an etc directory in your home directory and accidentally put a
slash in front of the etc in the rm command? That's why backing up the /etc directory should be done on the first login to the
server (from your .bash_profile) during the day :-). It is really simple to implement, and on "level 0" can be as simple
as adding to your .bash_profile script the following:
# Back up /etc once per day on the first root login (runs in the background)
if [ ! -f ~/backup/etc/etc`date +%y%m%d` ] ; then
    mkdir -p ~/backup/etc                                   # make sure the target directory exists
    tar czf ~/backup/etc/etc`date +%y%m%d` /etc 2>/dev/null &
fi
Such an operation is almost instant on modern servers.
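For what it is worth, the confirmation wrapper for reboot mentioned above could be as primitive as the following sketch; the wrapper location and the renamed binary name are assumptions taken from the example above:

#!/bin/bash
# /usr/local/sbin/reboot -- confirmation wrapper; the real binary was renamed to reboot_usdell68
echo -n "You are on $(hostname). Is this really the server you want to reboot? (yes/no): "
read answer
if [ "$answer" = "yes" ] ; then
    exec /sbin/reboot_usdell68 "$@"
else
    echo "Reboot cancelled."
fi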
But additional protection from "stupid" operations on system directories should go far beyond that. Many Linux distribution
now offer primitive but important defense against wiping critical directories such as /etc with the rm command. But
we need more sophisticated mechanism in this area that really help sysadmins to avoid unpleasant SNAFUs. Something like "safety
net" can for example be implemented using AppArmor. Unfortunately
this very interesting idea was killed due to RHEL dominance.
It is very difficult to restore the chain of events and actions using the tiny pieces of information that you can extract from /root/.bash_history,
logs (if your organization keeps them that long), and files in your home directory (which should include tarballs of the /etc directory for
the last year, or at least six months). It instantly becomes clear that important things were never documented, as they were
not considered important in the heat of the moment. And later another fire prevented documenting everything.
Here is where using version control for system files can really help. But having version control records is also not a panacea, because
it is not enough to have records; you should also understand the logic behind the changes made. The latter is not a given. That's why
using HTML and a Web site format and an SSD disk with your logs is better than a paper log. Searching an SSD disk is reasonably fast and can
be done using standard Unix tools such as grep, and if you document your changes even in a very simple format, such as one directory
per change (see Perl Wiki as a System Administrator Tool), in many cases
you might uncover additional useful information that you previously recorded, about the existence of which you had already forgotten.
Another set of problems arises when another sysadmin leaves the company and his servers are transferred to you. No matter how hard you
try to obtain the necessary knowledge before he leaves the company, and no matter how cooperative he is, huge gaps will be discovered
in your knowledge later. And documenting those problems and the solutions found, one by one, essentially creates your own knowledge
database that helps to maintain those servers with less frustration.
Of course, we have just scratched the surface of this important topic, which deserves a separate page -- see
Perl Wiki as a System Administrator Tool. In a way, nothing demonstrates the limited capacity
of human brains better than modern Linux systems ;-). The complexity is just overwhelming and far beyond any human abilities.
And vendors are trying to fatten/secure their bottom line by continuing to increase complexity with each OS release, each of them imitating
Microsoft's path to glory.
In any case, the guiding principle is that you will forget important things and need to put considerable effort into preserving the
"trail of evidence" of your own activities (if not the activities of your colleagues). That's why even such a thing as keeping a log file of
your daily activities via screen logging, Teraterm logging or some other activity logging tool is a step in the right direction.
You need to be your own NSA :-)
Creating deltas of /etc, the root crontab, and other critical files (including
/root/.bash_history) on a regular basis is also worthwhile. They should be stored on a remote server, or
at least a USB drive, so that they remain available if the server root filesystem goes south. Reading .bash_history
in the morning is a good practice that helps to avoid blunders and "revive" your previous actions. And it is vital if there are
several sysadmins for the same server. Comparing the previous day's version of /etc with the current one and sending yourself the difference can be put
in a cron script.
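A minimal sketch of such a cron job, assuming dated copies of /etc are kept under /root/backup and that a local mail transport is configured:

#!/bin/bash
# Daily cron job: keep a dated copy of /etc and mail yesterday-to-today differences to root
TODAY=/root/backup/etc.$(date +%Y%m%d)
YESTERDAY=/root/backup/etc.$(date -d yesterday +%Y%m%d)

[ -d "$TODAY" ] || cp -a /etc "$TODAY"
if [ -d "$YESTERDAY" ] ; then
    changes=$(diff -r "$YESTERDAY" "$TODAY")
    [ -n "$changes" ] && echo "$changes" | mail -s "/etc changes on $(hostname)" root
fi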
In any case, you need to take steps to prevent typical SNAFUs caused by misunderstanding some aspect of the OS or utilities, or by plain
vanilla human error. When a serious disaster strikes a particular server, you can get to your files instantly, not after an hour of talking
about retrieving backup tapes. Also, in the most typical "serious" problems the problem itself is trivial, but the latest or
all backups are unreadable due to some unfortunate confluence of factors, which elevates the problem to the level of a major SNAFU. For
example, HP Data Protector can abruptly stop backing up files, and if this situation is not noticed, you are up for major problems if
something bad happens to the disks and a filesystem is lost (for example, the RAID controller died, or the server room was flooded). In this case
your own private backups are all that is left.
If the situation is similar to something you experienced before (and many such cases are), browsing the history and your personal log
might help to revive the essential facts and ideas about what you did, why you did it, and how you recovered from the problem
the last time (or, if you made some blunder, how not to repeat it again ;-).
Those "memory crutches" are far from being perfect, but they better then nothing and as with the current level of overcomplexity
of Linux they are a must. A typical Linux configuration management system does not address this important area at all. They concentrated
on "operations" part, which represents only a tiny subset, a tip of the iceberg of problems you face. "Knowledge database" part is probably
more important.
Putting an undue amount of effort only into the "change implementation/change control" part is just barking up the wrong tree.
"Knowledge gaps" and lost parts of your own experience, misplaces or lost files, scripts and notes, are probably the most important
problem that you face even if you are the only administrator, who administer a set of Linux/Unix servers. That's why organizing them
as a web site is so important and you should not spare efforts on creating this "private knowledgebase".
The situation of many cooks in the same kitchen just adds additional stress and complexity and requires additional effort to avoid
misunderstandings, but does not present anything new in this respect.
Only your own knowledgebase can help you promptly remember the details of how you resolved a previous problem with your servers when
it reoccurs (possibly in a new context, but still when your previous experience of solving it is vital). Even remembering critical
switches and options of Unix commands and utilities (which are way too numerous and duplicate each other) is simpler with your own pages,
which can be populated each time you frantically search Google and man pages for some forgotten switch, example or combination of switches.
And they can help you understand how the system evolved over the years. Without this knowledge, dealing with complex problems can be more
difficult, and if you take a wrong direction you can easily make the situation worse (especially under pressure).
So creating and keeping your own knowledge base is probably the major part of the art of modern Unix configuration management
and Unix sysadmin skills in general.
Configuration management tools are supposed to help answer the problem of "too many cooks in one kitchen" in some way, by standardizing
common procedures and writing scripts for them either in a standard scripting language such as bash, Perl, Python or Ruby, or in a special
"domain specific language" (DSL). This approach is more helpful if the number of administrators for the servers
is more than one and the number of servers is more than a hundred. A medium size datacenter usually has around 100-300 "real" servers.
Large datacenters are a special case anyway, and they have the resources to tackle those problems.
Tracking changes in a server's configuration files is critical to understanding problems and often substantially helps to find the root
cause and repair the server or the OS, including security problems. Making mistakes is easy. It is troubleshooting them that is
hard.
When you manage a couple of dozen systems you can no longer view each system as an individual box, and you risk catastrophic errors like making
changes on the wrong box or on not enough boxes. You need to log the changes "per group", not "in general", as different groups of servers
present different sets of problems. That does not exclude having a "master" journal, but the only way to get it right is to use entries
from the "group" journals.
An even nastier situation arises when you make changes on the right box but using a wrong set of assumptions about it, because between changes
you forgot some important facts pertaining to the box or a group of boxes.
You can use for this purpose a separate small tablet (a 7" Samsung tablet with a bluetooth keyboard works OK) or a netbook (Dell 10"
netbooks work perfectly well), so that it remains portable like a paper "lab journal." Reading your journal entries pertaining to a particular
group of systems before making any important changes can usually save you from a lot of trouble. Just the act of printing and reading
them (if you commute by train) is often worth more than the best configuration management system. Typically a
bug tracking system can also be used as a personal journal and provides a lot of useful functionality,
but I have found that such a simple tool as an HTML editor (for example Frontpage), with
each group represented as one Web site, is good enough too. A Perl Wiki or blog engine are also viable options.
As a Unix configuration management server multiplies each of your mistakes by the number of servers in the group, testing your changes becomes
an acute problem. Unless the change is absolutely trivial, you should never attempt to run a change without testing it on at least one
server of each flavor of Linux you are dealing with. That takes time. Often the negative effects of a complex change are not apparent immediately.
Sometimes a trivial change in reality is not so trivial: changing the hostname of a server is a classic example of a minefield attached
to a very simple change. Unless you have a very homogeneous environment, as in HPC clusters, not everything is rosy in this scenario,
and the possibility of multiplying your blunders by the number of servers in the group is very real.
The key to avoiding SNAFUs when making changes to multiple servers is very strict adherence to a standard software development process:
use an IDE with an editor that has syntax coloring, and put each change through the standard sequence of steps, including
"documented" (often omitted due to lack of time or for convenience ;-), tested, and only then applied. In other words, in a complex environment
there are no simple changes. All changes are complex and require a full software development cycle to be successful.
It is especially important to adhere to this simple rule for remote systems, visiting which involves driving over 100 miles or, worse,
an airline trip. Using "corporate bullshit" as a dialect of the English
language, we can state:
Unmanaged configuration changes impact an organization's ability to prevent outages, understand the impact of planned changes,
and especially in today's regulatory environment, adhere to corporate and government policies. Knowing who changed what and when
is vital to complying with today's security requirements.
Tom Perrine of the San Diego Supercomputer Center recently offered this guidance to an Internet newsgroup aimed at university security
administrators. It offers sage advice for anyone managing heterogeneous UNIX systems. I actually do not share his excitement over cfengine
-- IMHO a badly architected agent-based system. Also, in a way, cfengine is a misguided attempt to reinvent TCL by a person who has no
real talent for language design. As happens in such cases, such attempts lead to predictably bad results.
Let me take a small step back and philosophize from a wider perspective. The local Cray folks have a saying: "Wanna-bees worry
about GigaFLOPS, and nanoseconds; real computer companies worry about *cooling*..."
I think that the real "higher ground" is security will be won (if it ever is) in two strongly-related areas: software quality
(process) and (automated) configuration management.
Let's face it, the quality of most commercial software is pretty pitiful at worst, and sub-standard at best. As an industry, we
have pretty much ignored 40 years of software process research and lessons learned. The first paper on what we now call "buffer
overflows" was published in 1965. This paper and those related to it was influential in the design of Multics, portions of
the original UNIX system-call interface, and security kernels. They called this problem "insufficient argument validation" in those
papers), and it also influenced language design and the move towards higher-level languages.
We have ignored all the "formal methods", strong specification, structured design and adequate testing strategies. We have forgotten
(or never learned) all the lessons of Mythical Man-Month, Peopleware, The Psychology of Computer Programming, Software Tools, and
many other books, methodologies and studies. As in the security arena, we have most of the technology and lessons figured out, we
just don't apply them :-(
Configuration management is related (a part of any proper development process), but we often fail to use it in non-software-development
areas, even if we do use it for software. There is no reason for a person to *ever* ask "What version of *anything*, is this?"
and not get a good answer. There is *no* reason for computers to have "version drift" where patches or software are inconsistent.
Again, we have the technology, whether it is cfengine, SMS, or vendor-supplied or home-grown scripts, it is just not being applied.
So why are these basic technologies not being applied? The answer is short-term thinking, similar to the thinking that drives
the quarterly earnings obsession of most US companies.
Let's face it, it initially takes longer to establish a proper software development (or any other) process. You have a steeper,
longer initial spending/development curve and pay more of the costs "up front", but get dramatically lower costs in the maintenance
and update phase. (You also have fewer bugs to fix, pushing the support costs even lower, but I digress.)
... ... ...
So I guess I believe that "wanna-bees" worry about exploits and patches; real security people are more concerned with
process and management..."
For more of my heretical views, see "Security as Infrastructure: Are you shooting rabbits, or building fences", a USENIX LISA
Invited Talk.
Yesterday,
All those backups seemed a waste of pay.
Now my database has gone away.
Oh I believe in yesterday.
unknown source (well, originally Paul McCartney :-)
The classic "missing backup problem" looks trivial, but it is not. The essence is that you made some complex change, realize that
it is not desirable (or worse botched the server) and now want to restore the system from backup. And at this point you discover that
backup does not exist, or exist but is corrupted, or is not full, etc.
The key to this problem is the ability to reverse the course of your actions. The implementation of a change on multiple servers should
always start with verification of the backup, or with making a full backup yourself. In simple cases, where only the /etc directory
is involved, you can program such a backup yourself as a step in your configuration management script.
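For example, a minimal sketch of such a pre-change backup step (the group file, ssh access as root, and the backup location are assumptions of this sketch):

    #!/bin/bash
    # Archive /etc on every server of the group before pushing any change.
    # Assumes passwordless ssh as root and a group file with one hostname per line.
    GROUP_FILE=${1:-/root/groups/webservers}
    STAMP=$(date +%Y%m%d-%H%M%S)
    while read -r host; do
        ssh -n "$host" "tar czf /root/etc-backup-$STAMP.tgz /etc" \
            || echo "WARNING: backup failed on $host" >&2
    done < "$GROUP_FILE"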
At the same time, the inattention to this problem that is typical for books on existing configuration management systems suggests that this is
still a very immature field, ripe for major restructuring and consolidation. "Do no harm" is a principle fully applicable
to any attempt to push changes to multiple servers, and the lack of attention to this problem is worse than a crime, it is a blunder.
On modern servers, only in very rare cases does such a backup take more than a few hours, so there is no excuse not to perform this
step. Both Relax-and-Recover and rsnapshot
allow using USB drives for this purpose. The largest USB drive is now 8TB (with larger 10TB drives in the pipeline), so
it is adequate for most local backup needs. For backing up the OS, USB sticks are enough, as they now scale to 128GB.
The key problem with existing configuration management systems is that it is pretty difficult to distill the key ideas they are based
on and determine their worth without actually using them for a prolonged period of time. Books are mostly descriptive and tell you how
the system can do this and that, not why this particular method was chosen. Articles that compare configuration management systems are
mostly superficial (see, for example,
Comparison of open-source
configuration management software). In no way do they answer the key question: why should I use a particular configuration management
system, and does it provide real benefits in comparison with a collection of simpler tools?
When you need to choose the right system for deployment, one which corresponds to the needs of your organization, your knowledge
of a particular scripting language is one of the primary factors. For example, if you know Perl well, you had better limit yourself to Perl-based
Unix configuration management systems, unless you want to learn Ruby. If you only know shell well, you should think about
learning one additional scripting language ASAP, but meanwhile you can choose a system
that is "shell friendly" and generates target scripts in shell.
If your fleet of servers is more or less uniform and consists of different versions of the same flavor of Linux (say, RHEL), you can
choose a very simple tool. If, in addition to Linux, you need to support Solaris, HP-UX and AIX, the tool should be more complex,
as the differences from Linux are considerable (especially for AIX and HP-UX).
There is also the problem of the tool fitting the size of the datacenter. The problems that exist in giant datacenters like Facebook or
Yahoo are quite different from the problems in a regular enterprise datacenter, or in a research center like a university lab. Yahoo
and Facebook can allow themselves to hire developers to help maintain and deploy Unix configuration tools, so they have local
experts. This is typically out of the question for enterprises. Enterprise IT outside financial institutions is usually understaffed
and overworked. The same is true for most universities, although by definition such places are more friendly to developers of open source
software. But often both simply cannot afford to implement an additional complex software system due to the lack of manpower,
even if at the end of the day it might resolve some existing problems.
In any case, without a trial period lasting at least a couple of months (60 days), it is impossible to choose the right tool. And even
with a trial period mistakes can be made if you evaluated only a single tool. Such an evaluation should include at least three different
tools belonging to different weight categories, with at least one of them agentless. That gives you some perspective.
If you have freedom of choice and really need one (two big ifs), you should always pick the Unix configuration system written in
the scripting language you know best, be it Perl, Python, or Ruby. If the system uses a "plain vanilla" scripting language, it is a better
system for Linux/Unix administrators. How many DSLs can a regular human learn? This "yet another DSL" problem actually kills interest
in such systems, unless they are pushed by higher management (with enough thrust pigs can fly; it is just unsafe to stand where they
are going to land). The Catch-22 here is the following: learning a complex system like Puppet is close to a full-time job. But if
this is your full-time job, you are by definition not a Unix administrator anymore, and as such are useless, because you can write only
simple things, not the really challenging deployment scenarios where such a system can provide real value: you are no longer involved
with day-to-day administration tasks and do not understand the interplay of complex nuances involved, which can only be learned by doing
day-to-day administration. Large rich companies such as Facebook and Google can actually bury real IT talent in such "monkey
see -- monkey do" jobs and achieve some level of success (at the expense of the people involved), but for other companies this is neither
possible, nor desirable, as top IT talent is a scarce commodity.
A sysadmin can also benefit from using the whole "undiluted" scripting language and just calling the API of the system.
This way it helps him stay current in his favorite scripting language (and that is another reason the system should be chosen from this angle).
Even if such systems use a DSL for some things, it should either be maximally close to the underlying scripting language (in which the system
is written), or they should use YAML. In this sense Chef, whose recipes are plain Ruby, is somewhat preferable to Puppet (it also has
better written books, such as
Learning Chef: A Guide to Configuration Management and Automation). Still, both are pretty complex, agent-based systems.
And that has a significant downside.
The second thing is simplicity. Linux is already overwhelmingly complex even without a configuration management system :-). So any
system that at least declares minimalism as a design goal is preferable to the alternatives. Simplicity also implies a shorter learning
curve.
Simplicity also depends on whether you know the scripting language in which the system is written. You always learn
quicker and will be more productive with a system written in a scripting language you can program in yourself. Not only will most of the conventions
used be natural for you, the learning curve will also be less steep. For this reason I think picking Puppet only because
it is probably the most popular Unix configuration management system is not a very wise move if you do not know (or at least do not want
to learn in depth) Ruby.
Excessive verbosity and attempts to be more Catholic than the Pope are clear warning signs that this is the wrong system to deploy.
You can spend a lot of time learning this crap with little or no tangible results. As somebody mentioned, after an initial period of excitement
such systems tend to become a nuisance rather than a help. And believe me, this happens more often than people admit. So try to choose the
system wisely, because from now on you are essentially forced to use it. And it would be sad if most of the tasks it performs could be accomplished
better by other means. Not having a Unix configuration management system is a much better deal than having the wrong one.
Unix configuration management is far from being a new topic. On a basic level you just need to understand who made changes
to a particular system and when, and to be able to compare two server configurations belonging to two moments of the server's life -- for example, the current one and
the one from 60 days ago. The simplest tools for this are so-called baseliners.
Baselines can be taken daily and stored offsite to prevent games with their modification "after the fact". Analyzing /root/.bash_history
can help, but usually it is not enough. It is just a useful starting point (which amplifies the importance of using timestamps in bash
history -- a must for "enlightened" sysadmins).
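A minimal sketch of both ideas -- timestamped bash history and a daily "poor man's baseline" that can be diffed later (file locations and the choice of what to capture are assumptions):

    # /etc/profile.d/histtime.sh: timestamp every bash history entry
    export HISTTIMEFORMAT='%F %T '

    # daily-baseline.sh, run from cron: capture the package list and /etc
    # checksums so that two moments of the server's life can be compared.
    BASEDIR=/var/baselines/$(hostname -s)/$(date +%F)
    mkdir -p "$BASEDIR"
    rpm -qa | sort > "$BASEDIR/packages.txt"
    find /etc -type f -exec md5sum {} + | sort -k2 > "$BASEDIR/etc.md5"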
After you understand what needs to be changed, think about making the change as a software development process. You need to prepare
the file, document your change, test it, and then distribute it to the set of nodes using some kind of tool. If your testing was deficient
and you got into a SNAFU, you need a way to reverse the change.
Tasks that go above this basic functionality include more sophisticated methods of synchronizing configuration files and patches
applied to similar systems. There are several already widely used "tried and true" methods besides what a typical Unix configuration
management system offers. All such "alternative tools" are available as RPMs for all major Linux distributions, or can be easily installed
on all your systems. Extensive literature, including books, exists about their capabilities and use.
Among them:
RPMs with their built-in checks. You can think about such changes as very simple patches, for which the patch tarball just
consists of a few configuration files and no executables. Patch management is one area any senior sysadmin knows very
well, as this is one of the most troublesome issues with distributions like Red Hat, with their non-stop flow of security patches.
A patch is essentially a specially packaged update which is installed by a specialized program such as rpm or yum. As is the case
with configuration management, you are responsible for testing the patch before applying it to all production systems. With some patches
containing only scripts and configuration files, it is difficult to say where patch management ends and configuration management starts.
You can package any change as an RPM and apply it via the package manager.
Makefiles can also be viewed as a configuration management tool. In this case you put your changes in place by running
a makefile which takes care of all the necessary additional checks and verifies the "preconditions".
Grid schedulers and their envelope files written in shell. Grid schedulers such as
SGE, Torque/PBS, etc. use special "submit scripts" to submit jobs. You can think
of them as a system of pre- and post-checks similar to what RPM utilizes. They also use the concept of a queue, which specifies
the server group on which a particular class of jobs (your change, in this case) should be executed, and can scale to a very
large number of servers (10K or more). Most grid schedulers also provide logging, moving the output files to the master
server, and a lot of other functionality useful for properly implementing changes. All you need is to submit your job to
all servers in the queue using a simple shell loop. The jobs will be scheduled and executed asynchronously, making it possible
to implement changes on hundreds or thousands of servers in a uniform manner. Logs also provide some rudimentary documentation of
the change.
Environment modules. This is a specialized method of maintaining
your users' .bashrc and .bash_profile files, making them less complex and more modular.
Those methods can be combined: for example, grid schedulers can be used to deploy RPMs or makefiles. The tools are well known, well debugged,
and involve a zero learning curve for most senior system administrators.
The last thing most sysadmins need is to master yet another complex software system; we have more than enough of them already. Attempts
to design yet another DSL, without any attempt to standardize them or take the learning curve into account, should probably be considered
a special case of software graphomania.
Graphomania ... refers to an obsessive impulse to write....
Outside the psychiatric definitions of graphomania and related conditions, the word is used more broadly to label the urge
and need to write excessively, whether professional or not...
Milan Kundera ironically explains the proliferation of non-professional writing as follows:
"Graphomania inevitably takes on epidemic proportions when a society develops to the point of creating three basic conditions:
An elevated level of general well-being, which allows people to devote themselves to useless activities;
A high degree of social atomization and, as a consequence, a general isolation of individuals;
The absence of dramatic social changes in the nation's internal life. (From this point of view, it seems to me symptomatic
that in France, where practically nothing happens, the percentage of writers is twenty-one times higher than in Israel)."
- Milan Kundera, The Book of Laughter and Forgetting, 1978
Unless they are pushed from above, such systems should generally be rejected. Only systems that use a standard scripting
language as the DSL should be considered. But the problem is that they redefine the functionality of most Unix utilities into their own
API that you need to learn. So even the use of a scripting language is not a panacea: the learning curve remains steep.
That's probably why the majority of "Puppet-related" books are so utterly useless (as in "do not contain information for solving
your current problems") and extremely boring to read. And again, as far as I know, few sysadmins are Ruby enthusiasts. Most
probably know some Perl or Python, but while Ruby is a Perl derivative, there is a big distance from programming in Perl to programming
in Ruby.
Standard Linux/Unix distributions contain enough powerful tools to significantly simplify accomplishing 80% of the tasks
that Unix configuration systems perform -- often with less trouble and zero learning curve. At the core of any Unix configuration
management system there are two very simple concepts which were already present in rdist, created more than 30 years ago:
Parallel execution of tasks on multiple servers. This is where Unix configuration management started (and where many systems actually
ended, being unable to produce any worthwhile idea beyond that).
The idea of groups of servers (in a more sophisticated form you can view a group as a kind of queue, as defined by SGE
and similar cluster schedulers) to which you apply changes. Each group is assigned a name. You should be able to perform all classic
set operations on those groups to form new groups and subgroups, operating on groups as variables. For example, if you need to,
you should be able to select all servers with RHEL 6.8 that have Apache installed, form a new group, and apply a particular change
by just selecting the appropriate group.
I would like to stress that the most typical way Linux/Unix administrators perform configuration management tasks connected with
distribution of a set of files to multiple servers is to create a tarball that contains the changed files and then use so-called "parallel
execution tools" to back up the files to be changed, apply the tarball to the target group of servers, and then verify the results. The
set of parallel distribution tools typically used includes, but is not limited to,
PDSH, C3 Tools, rsync, and
rdist. NFS or any other shared filesystem, such as GPFS, can also be used for this purpose and is typically used in HPC cluster environments.
In Germany, Eastern Europe and the xUSSR area, file managers such as Midnight Commander (which allows you to compare two directories and
works well with RPMs) are often the sysadmin tool of choice for creating tarballs with changes.
Moreover, I use it as a frontend to my own scripts and integrate them into the Midnight Commander user menu, making the selection of files
involved in a particular operation simpler and more reliable. This visualization of what you are changing or putting into
your tarball (you can have the content of the tarball visible in the second panel of Midnight Commander while adding the files)
is very important for creating the "right" tarball with all the necessary files to be changed.
Visual feedback increases "situational awareness" and as such cuts down on mistakes, especially disastrous ones. So Midnight Commander
can serve as an important part of the sysadmin's arsenal of configuration management tools -- a "universal frontend" that
passes the list of selected files to your custom scripts. It also has a primitive ability to work
with remote filesystems via ssh (providing you with a virtual filesystem).
After the tarball is created, tools such as the cexec/cpush utilities from C3 Tools, or rsync,
are used to distribute it to the particular group. If you do this from a script, the group to
which the change applies can be supplied via an environment variable.
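For example, a sketch of such a script using the C3 tools (the exact C3 syntax can differ between versions; the group name comes from an environment variable, and the tarball path is an assumption):

    #!/bin/bash
    # Push a tarball of changed files to a C3 group and unpack it there.
    GROUP=${GROUP:?set GROUP to the C3 group name}
    TARBALL=${1:?usage: push_change.sh change.tgz}
    REMOTE=/root/$(basename "$TARBALL")

    cpush "$GROUP:" "$TARBALL" "$REMOTE"                       # distribute
    cexec "$GROUP:" "tar xzf $REMOTE -C / && echo APPLIED"     # apply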
One nice thing to do is to put into each changed file a unique "signature" (version of the file and date of change); by grepping for it
you can determine that the right file is deployed without diffing it against the "etalon" (reference copy). Of course, this is possible only for files
that allow comments, but the version can also be encoded in the time fields, as the number of seconds in the file's creation date.
Naturally, checking that all the right servers really received the proper version of the changed files represents the most
important half of the task of deploying any change, no matter how trivial, to multiple servers.
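A sketch of such a check with PDSH (the signature convention, the group file, and the target file are assumptions):

    # Each pushed file carries a comment line such as:
    #     # CFG-VERSION: resolv-1.4 2016-08-01
    # One grep over the group then confirms that every server really received
    # the right version, without diffing against the etalon copy.
    pdsh -w ^/root/groups/webservers \
        "grep -q 'CFG-VERSION: resolv-1.4' /etc/resolv.conf && echo OK || echo MISSING"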
Rsync and custom RPMs represent the level above that and can work with "seed servers" and custom repositories. The latter can
contain a set of RPMs for common operations; to distribute a new change you just update the RPM that implemented this change
in the past. This approach is indispensable if the task at hand is more complex than a tarball can handle and needs some pre-
and post-conditions and/or per-server checks of applicability. Of course, pre and post scripts can be integrated with a tarball
as well, so RPMs are not the only game in town if you need this functionality. The advantage of RPMs is that their deployment using
yum provides you with history and other goodies, which in the case of tarballs are missing, so you need to create everything from scratch,
reinventing the bicycle. For a large number of servers this is an important advantage.
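A sketch of the custom-repository variant (repository location, RPM name, and the repo file on the clients are assumptions):

    # On the seed server: drop the rebuilt RPM that carries the change into
    # the local repository and refresh its metadata.
    cp site-resolv-1.4-1.noarch.rpm /srv/repo/custom/
    createrepo --update /srv/repo/custom

    # On the clients (the repo is already configured in /etc/yum.repos.d/custom.repo):
    pdsh -w ^/root/groups/all "yum -q clean metadata && yum -y update site-resolv"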
Some sysadmins also use a version control system like git, with various levels of success (pulling updated files from a central repository
is not a bad way of implementing some changes; it provides an instant backup of previous versions and change control). But the capabilities
required for rolling back a change in system administration are still different from those that git provides. A rollback usually involves much
more than restoring the previous content of the changed files.
Still, git (used in moderation) can help to maintain the log of changes for critical configuration files and allows rolling individual
files back several generations, if necessary. Git is also easy to deploy, is not that difficult to learn at least on a
basic level, and is a very useful tool for your "seed" server. You just need a set of scripts that put all changed files into
git or another version control system automatically at the end of each day, because with multiple sysadmins you can't rely on everybody
who touches a configuration file using the standard commit operation. In a way, such scripts can be considered a step in the direction
of a "full" Unix configuration management system.
Unix sysadmins are seldom excited about makefiles and other tools that software developers use daily and take for granted.
The last thing any Unix sysadmin wants is to write a script to distribute a single file to multiple servers, or, God forbid, to list
explicitly the attributes of each file, as many examples in half-baked books on this topic recommend (Puppet books are especially
bad in this respect, reflecting the weakness of the system). Only to the extent that a Unix administrator is often also a programmer
(most senior sysadmins know, in addition to shell, at least one scripting language at a professional level; often this is Perl or Python)
might he see the analogies more clearly and enjoy this way of applying changes more. But he can also clearly see the huge
differences and the shortcomings of viewing Unix configuration of multiple servers as a software development task. It is simply not one.
Spending an hour on relearning, testing and deploying a script written a year or two ago for distributing updates to /etc/hosts
files is not time well spent. Here the problems are well known, and such a change can be implemented in 10 minutes using cpush or a similar
utility with the same reliability and even version control.
It goes without saying that Unix has always had tools to simplify the performance of those tasks. For example,
rdist -- a program to maintain identical copies of files over multiple
hosts (it preserves the owner, group, mode, and mtime of files if possible and can update programs
that are executing) -- is almost as old as Unix. Later, ssh became the de facto standard protocol for distributing files to multiple servers
which do not share a common filesystem (such as NFS or GPFS) with the "seed" server -- for example, because they are on a different continent.
The tarball method of distributing multiple configuration files to multiple servers involves several steps (a shell sketch of the backup and rollback steps follows the list):
Implement the changes manually on one of the servers and verify that they work.
Create a tarball of the changes (possibly using Midnight Commander) and a "manifest" file with the list of files (just the list
of files with absolute paths).
Create a backup tarball of the affected files on all servers of the selected group using the manifest file. Verify that you are replacing the same
set of files on all servers (some servers might have a manually edited file among the group you intend to replace). This can be
done in a loop, comparing the backup tarballs from the group, one by one, with the "pristine" set of files to be changed.
Use one of the methods of distributing changes to the target group of servers:
Distribute the tarball using an ssh-based distribution utility such as cpush from C3 Tools, or put it on a common distributed filesystem
such as an NFS share.
If you use the tarball method, untar the tarball on all servers using a parallel execution tool such as cexec from
C3 Tools or
PDSH. Those tools support the concept of a group of servers.
Verify the result using custom scripts or comparison with a known working instance. This last part is usually
the most complex and challenging, as servers that appear identical and belong to the same group might have idiosyncrasies
you forgot about. And even one incorrectly deployed instance defeats the idea.
Restore the servers for which the update failed to their initial state, using the backup tarball of original files created earlier.
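A rough sketch of the backup and rollback steps of this procedure (group file, manifest format, and paths are assumptions; distribution itself was shown in the earlier cpush example):

    #!/bin/bash
    # Per-server backup of the files listed in the manifest, taken before the
    # change tarball is unpacked; the same backup is used for rollback.
    GROUP_FILE=/root/groups/webservers
    MANIFEST=manifest.txt               # one absolute path per line
    STAMP=$(date +%Y%m%d-%H%M%S)

    while read -r host; do
        # tar reads the list of files to archive from stdin (-T -)
        ssh "$host" "tar czf /root/pre-change-$STAMP.tgz -T -" < "$MANIFEST"
    done < "$GROUP_FILE"

    # ... create, distribute and unpack the change tarball, then verify ...

    # roll back a single misbehaving server:
    #   ssh badhost "tar xzf /root/pre-change-$STAMP.tgz -C /"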
Another major problem is how to abstract the differences between various flavors of Unix. So far I have not seen any bright ideas in
this area. All efforts are primitive and ad hoc. But that is the domain where Unix configuration systems should put the most
effort, as such differences are a major pain in the daily work of a sysadmin who manages multiple Linux/Unix flavors. Currently I do not see
anything that exceeds the usefulness of a "poor man's configuration system" that uses a seed filesystem (which can be shared) with a tree
structure consisting of:
A set of directories, each of which represents one flavor of Linux and contains the common configuration files for it. For example,
the "seed filesystem" might have the following set of directories:
In this example, the files are "flattened" by replacing "/" in the path with "^", so that they can all reside in a single directory
(which simplifies editing and processing them with scripts). The files can be symlinked from a lower-level directory:
/Seedfs/US/NJcenter/etc^hosts is the hosts file common to a particular datacenter. The levels of the hierarchy are
optional and can be adjusted to your particular situation.
As most configuration files come from the /etc directory, you can omit the etc^ prefix, so any file with zero "^" symbols is
assumed to be from /etc (a tiny shell sketch of this convention follows the list).
A set of directories that contain the sets of packages that need to be additionally deployed or removed for a particular group of
servers. Groups can form a hierarchical structure, with lower nesting levels (closer to the root) containing the packages common
to all higher-level (less general) groups; packages can possibly be symlinked from lower-level directories to a common repository.
(Optional) A set of "compiled" partial images for all groups of servers -- one image per server group -- i.e., a set
of system directories with the files to be distributed, from which a tarball can be created.
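A tiny sketch of the flattening convention mentioned above (the etc^ shortcut is handled as described; the seed layout itself is an assumption):

    # Flatten an absolute path for storage in the seed directory and
    # reconstruct it on the way back.
    flatten()   { echo "${1#/}" | tr '/' '^'; }        # /etc/sysconfig/network -> etc^sysconfig^network
    unflatten() {
        case "$1" in
            *^*) echo "/$(echo "$1" | tr '^' '/')" ;;  # etc^hosts -> /etc/hosts
            *)   echo "/etc/$1" ;;                     # no '^' at all: the file comes from /etc
        esac
    }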
Please note that in the case of the configuration files tree you can not only symlink files from more general levels of the hierarchy;
you can also programmatically generate them with scripts before distribution. The key idea here is that you "compile" the image
of the server, using methods developed for code generation in compilers. This compilation can involve creating a set of yum commands
to deploy or remove RPMs. Then, after testing, you synchronize this compiled image with the set of real servers in the particular
group using a uniform "image synchronization script". Please note that if your image is full, it can be patched "in place" using the
chroot command.
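One possible sketch of this "compilation" and synchronization step (the directory layout, group names, and rsync options are assumptions; real file trees, not flattened names, are assumed inside each level):

    #!/bin/bash
    # "Compile" a partial image for one group by overlaying the levels of the
    # seed hierarchy (most general level first, so more specific levels win),
    # then synchronize the result with the members of the group.
    SEEDFS=/Seedfs
    GROUP=US/NJcenter
    IMAGE=/images/NJcenter

    mkdir -p "$IMAGE"
    for level in "$SEEDFS" "$SEEDFS/US" "$SEEDFS/$GROUP"; do
        [ -d "$level/files" ] && rsync -a "$level/files/" "$IMAGE/"
    done

    # push the compiled image to every server of the group, keeping a backup
    # copy of every file that gets replaced
    while read -r host; do
        rsync -a --backup --suffix=.pre-change "$IMAGE/" "root@$host:/"
    done < /root/groups/njcenter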
The level of detail can vary. In its ultimate form, the complete image is stored in each branch of the seed directory. In this case the
problem of maintaining multiple servers is reduced to the problem of maintaining multiple images in the same filesystem, which facilitates
sharing of files and other tricks that simplify administration of multiple servers. This approach is, for
example, used by Bright Cluster Manager, which provides
the ability to reimage a server on reboot from the image assigned to it (one image can be used by a group of servers). This
idea of reimaging servers or workstations from a central image or database was also at the core of the
LCFG design (which in its key ideas is quite similar to
Kickstart, which, in turn, was influenced by
Solaris Jumpstart).
Kickstart implements this idea differently, allowing you to recreate the image of the standard DVD using a so-called kickstart
file. In this case you reinstall the server from a kickstart file and then apply a set of additional changes using post scripts
to achieve the required configuration. Outside of computational nodes on clusters and other simple server configurations, this approach
does not work well, as each server eventually becomes too idiosyncratic to be described by your post scripts, unless you generate them
automatically (which is also possible).
A variation of the same method avoids creation of the tarball by putting all changes into a version control system such as Subversion
or git and extracting those files on the target servers. While this is a fancier way to accomplish the same thing, it does not produce
critical advantages; it is usually enough to implement version control on the seed server. The main advantage is that
it provides you with far better documentation of all changes on all servers of the group, as each change is documented in the version control
system.
The alternative way, more suitable for complex changes -- when you need a script to check whether the change is applicable (a test
of the timestamp is not enough) and may need to execute some "post-change" scripts -- is to create your own RPM (or, better,
modify an existing one, which is easy with Midnight Commander), distribute it to all servers (or put it in your private repository),
and use yum to install this RPM. In this case documentation exists within the yum logs and the RPM database.
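A skeleton of such a custom RPM, generated and built from a short shell script (all names here are hypothetical; the %pre and %post sections carry the applicability check and the post-change check mentioned above):

    #!/bin/bash
    # Build a tiny RPM that carries one changed config file plus pre/post checks.
    TOP=$HOME/rpmbuild
    mkdir -p "$TOP"/{SPECS,SOURCES,BUILD,RPMS,SRPMS}
    cp resolv.conf "$TOP/SOURCES/"

    cat > "$TOP/SPECS/site-resolv.spec" <<'EOF'
    Name:      site-resolv
    Version:   1.4
    Release:   1
    Summary:   Standard resolver configuration for the NJ datacenter
    License:   Internal
    BuildArch: noarch
    Source0:   resolv.conf

    %description
    Distributes the standard /etc/resolv.conf; %pre checks applicability.

    %pre
    # applicability check: refuse to install on hosts running a local caching DNS
    if systemctl is-active -q dnsmasq 2>/dev/null; then
        echo "dnsmasq is active on this host, refusing to replace resolv.conf" >&2
        exit 1
    fi

    %install
    mkdir -p %{buildroot}/etc
    install -m 644 %{SOURCE0} %{buildroot}/etc/resolv.conf

    %files
    %config /etc/resolv.conf

    %post
    # post-change check: quick sanity test that name resolution still works
    getent hosts localhost >/dev/null || echo "WARNING: resolver check failed" >&2
    EOF

    rpmbuild --define "_topdir $TOP" -bb "$TOP/SPECS/site-resolv.spec"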
The third way (suitable only if you already have SGE or a similar grid scheduler deployed) is to use the grid scheduler and write your
own "envelope" script (called a submission script) that provides pre- and post-checks. Then the job can be submitted to all nodes of the
group via the scheduler. This method works well for clusters. It can be combined with the two previous approaches. Essentially, in this
case SGE is just a higher-level, more sophisticated version of a parallel execution tool that has some additional capabilities (for
example, it can wait until the server CPU is not loaded) and is scalable to thousands of hosts.
So you can view it as the "next generation" of such tools as cexec or PDSH. It also brings the concept of the group of
servers to a new, more sophisticated level. Of course, this approach is more suitable for clusters, where a grid scheduler is deployed by
default and does not need to be specifically installed. In this case you also do not suffer from a learning curve, as this
is a production tool without which the cluster is not operational.
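A sketch of such an envelope submit script, plus the loop that submits it to every node of the group (queue name, file locations, and the CFG-VERSION convention are assumptions; Torque/PBS syntax differs slightly, and the scheduler must be configured to run such administrative jobs with sufficient privileges):

    #!/bin/bash
    #$ -N etc-hosts-update            # SGE directives: job name,
    #$ -q admin.q                     # administrative queue,
    #$ -j y -o /var/log/sge-changes   # merged log collected on the master.
    # Envelope around the actual change: pre-check, change, post-check.

    grep -q 'CFG-VERSION: hosts-2.1' /etc/hosts && exit 0    # already applied
    cp -a /etc/hosts /etc/hosts.pre-change                   # local backup
    cat /nfs/seed/US/NJcenter/etc^hosts > /etc/hosts         # the change itself
    getent hosts headnode >/dev/null \
        || { cp -a /etc/hosts.pre-change /etc/hosts; exit 1; }   # post-check with rollback

    # Submission loop, run on the head node: one copy of the envelope per host.
    #   for host in $(cat /root/groups/cluster-nodes); do
    #       qsub -l hostname="$host" /root/jobs/etc-hosts-update.sh
    #   done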
In many cases the existing generation of Unix configuration management systems can't compete in efficiency and simplicity with those "primitive"
approaches, and they add very little or nothing to the capabilities of such "poor man's" Unix configuration systems.
Only when the number of servers (and, especially, virtual instances) you manage exceeds any human capacity to understand them (which
is probably true for any number above 100, and can be less if the servers are non-uniform) do you need a stricter, more regulated
approach than tarball-based change distribution, and can switch to some (preferably agentless) Unix configuration management system.
But, again, the DSL should be a regular scripting language. Accept no substitutes. And preferably a scripting language that you know
well (this last requirement is really important for lasting success, unless you want to learn a new scripting language). The ability to generate
"execution scripts" in bash or Perl is also important -- you can look at those scripts and see what is happening, which is more
tricky if everything is done in an interpretive fashion, and that negatively affects diagnostics.
Tricky deployments usually make it necessary to use scripts that take care of all special cases, to test them (possibly involving
QA) and then to deploy them on the groups of servers affected by the particular change. Custom RPMs typically work
very well for such cases and have the advantage that all the necessary infrastructure is already in place (rpm, yum, etc.) and is well
debugged. All you need is to create and populate a custom repository, which can also be used for deployment of "non-standard" packages
whose RPMs can't be found in the standard repositories to which your systems are connected.
Another important case, which warrants extreme caution and additional effort, is remote datacenters. Here the number of servers
does not matter, as even an error on one is very costly (it may involve a trip to some God-forsaken location), and such servers tend to be non-uniform.
In this case, if you seriously screw something up because you forgot about important differences when you made a particular change,
you might need either to drive more than a hundred miles, or to fly, to fix the mess. But, funnily enough, remote datacenters
are far less suitable for the typical activities that Puppet tries to automate, unless you view it as a monitoring system, in which role it
is definitely an OK (but not very exciting) solution.
I would like to stress that in many aspects Puppet is competitive with OpenView; even Puppet agents in this case make sense and have
the right to exist.
But again, often modification of existing RPMs is simpler than playing with the level of complexity Puppet designers force on you,
unless you are a Ruby enthusiast and would like to learn it better. The same is true of all other Unix configuration management
systems that Puppet competes with. They are all created by overcomplexity junkies, who in reality do not care about the realities
of Unix system administration work and the tradeoffs involved. Maybe initially they cared, but later development deviated into "art for
art's sake" functionality and the "Microsoft mentality" prevailed. This inability to keep the system simple and transparent,
even at the cost of not implementing some peripheral functionality, is very upsetting.
The fact is that large sections of the officer corps ... had no desire to fight for the republic, which they
despised... The constant tension between political and military leaders is exacerbated by wartime conditions.
You should not fall under the spell of the magic words "configuration management system". In your particular circumstances, and especially
for smaller organizations and datacenters, it might well be useless or even harmful. Other things being equal, a simpler system,
or even a set of existing tools, will suit your needs better than a complex one. The return on investment for additional complexity is negative
in most cases. Social factors also often play a more important role. Mismanagement in large organizations sometimes takes on a truly epic
scale, and here no Unix configuration system can change the situation for the better. It will remain a horror show, and you just need
to suffer or quit. And with the current trend toward outsourcing and virtualization of everything that management can lay its eyes on, such
situations happen pretty often. Like Minsky moments in economics, they look almost inevitable.
In a way, Unix configuration management systems are part of the trend toward lower qualification in IT. Younger system administrators
cannot have the experience of the old-timers who watched the emergence of all those technologies with their own eyes, so they have
less of the "in-depth" knowledge that the old-timers acquired along the way. But of course there are old-timers and old-timers. A lot of
old-timers are just accidental people who moved into the field during the dot-com boom years (say, 1990-1998). Many of them are barely competent
in what they are doing even now.
At the same time, I would not get too excited about the new generation of IT workers (mostly part-time and lower-paid) getting much from
such systems as Puppet, Chef, or other systems at the upper end of the complexity scale for Unix configuration systems. But I am not
worried about possible blunders that can be committed with them either -- blunders greatly affecting network or server reliability.
Maybe something will happen on the margins. The damage from a SNAFU when you deploy something on hundreds of servers (or virtual instances)
and it negatively affects the functionality of the servers can be significant. At the same time, due to the commoditization of the technology,
IT support at the level of the firm now matters less, and that includes Unix administrators. Complex issues are delegated to vendor
support (which is also quickly deteriorating) or to professional consultants. Enterprise software is also more or less standardized.
That diminishes the chances of such blunders.
Where huge blunders are now made is at the senior level, where people have become generally detached from technology (and sometimes from reality).
Also, too many technically illiterate bean counters have been promoted to senior IT positions, and they often rely on fashion (as well
as vendor hype and/or bribing) in adopting new technologies for the firm. But at the end of the day their blunders are also not crucial.
They might produce modest cost overruns. Nothing to be excited about. Something that costs $100K can be acquired for a
million and cost another couple of million in maintenance fees and internal costs before being abandoned. Or some new and
exciting Potemkin village can be built for a couple of millions. That's about it. Remember that in a large company IT is generally around
1% of the total cost of operations.
At the same time, at the level of the individual system administrator the situation is less rosy. Since, with the proliferation of virtual instances,
Unix sysadmins have become more overloaded, a wise choice of Unix configuration system can pay off greatly at the individual level. Excessive complexity
can break the camel's back. That means that if you have the freedom to choose one for your individual needs, you need to choose a less
complex system with the flattest learning curve, one which allows initial usage as a simple parallel execution tool. Only when you
feel ready to delve into more complex stuff, and see that such activity can pay dividends, should you start converting your pre-existing scripts
into the Unix configuration management framework. And again, no framework can replace your own brain. Despite all the hoopla, the current generation
of Unix configuration management systems are limited tools that do not take into account many things in the complex environment
of a real datacenter, and they cannot solve most of the problems that you face.
Another consideration is that no Unix configuration management system exists in a vacuum. They are immersed in an already existing set
of tools and should be able, at least at a superficial level, to interact with monitoring and version control systems and, possibly, with
helpdesk systems and "knowledge" wikis. Accumulation of knowledge is now a crucial activity, as you forget most things pretty
quickly. If a Unix configuration system can help with that, it is a much more valuable tool than otherwise. Conversion of
logs into wiki or blog format should be considered.
Some Unix configuration systems replicate the basic functionality of Unix monitoring systems and, because they are more programmable,
can replace them if your requirements are not too strict. Usually they are better written and better architected than "pure" monitoring
systems. That also justifies some level of additional complexity (for example, the existence of agents and the need to configure, maintain
and secure them on all servers) and puts Puppet and Chef back into play. In other words, it is important to understand to what
extent they can replace some already deployed systems, or at least complement them. At the same time, for some tasks Unix
configuration management systems are inferior to existing tools. Environment modules
are one example. They are used for maintaining dot files and creating the proper environment for complex applications (there is also a
more modern implementation written in Lua).
When an automated tool complicates tasks that are relatively easy (forcing you to write long descriptions of what you intend
to do) and makes it more difficult to perform the tasks which are really complex, one has to wonder why you need such a tool at all.
In this case you need the courage to say "the king is naked" and choose another path.
Due to "overcomplexity factor" in modern Unix configuration management systems "poor man Unix configuration system" build from
simple and well known utilities as scp, rsync and NFS filesystem can be productively used for automation
of Unix configuration management instead of a more complex system. And can accomplish complex tasks if used with a set of "wrapper"
script written in bash or Perl that unify different falvours of Linux/Unix. Moreover a typically configuration management
systems does not provide functionality of baseliners (which are typically used by Linux vendors
for troubleshooting complex problems) and backup tools, especially bare metal backup tools such as
Relax and Recover (and believe me a recent tarball is a perfect store of all configuration
information about particular server :-). The ability to restore the system after failed deployment of packages or patches is must for
any system that deals with complex production environment. Changes sometimes tend to destabilize working normally systems and understanding
why this happened can take weeks or even months of your time and help from the vendor, and in some cases, hardware manufacturer (if
you are especially unlucky). Even in a very simple and uniform environment of Web hosting providers, people periodically blow
their systems out of the water due to rushed changes, causing pain for thousands of users and inflicting financial losses for their
organizations (some users quit after such as an incident). There is a huge advantage in sticking to just simple tools (KISS principle),
which does not exclude some clever ways to combine them in order to enhance their usefulness. The main advantage of simple tools is
that they do not stand between you and the task in a way complex systems do, when you need to learn how to troubleshoot them in addition
to how performing tasks using them. After all Unix philosophy of software development is based on the idea of the reuse of existing
tools. That's what, for example, Unix pipes and Unix shells are about: they allow you combine simple tools to perform very complex tasks.
That means that you can concentrate on the task in hand instead of learning intricacies of some complex and potentially not very helpful
tool, which reinvents the bicycle, contains its own set of bugs, gotchas, security vulnerabilities and require time to learn to use
properly. In Unix configuration management there is always, as people say "
more than one way to skin a cat" ;-)
While tar is rarely viewed as a system configuration tool, in reality it is a very useful one, and learning it in depth pays off.
Using tar you can, for example, skip changing files with a more recent timestamp than the timestamp of the change (possible "outliers"
whose existence you might not have known about or forgot), so this functionality does not need to be explicitly programmed. Creating a tarball
with changes, distributing it to servers and then untarring it can be further simplified by using Midnight Commander. The same is true
for custom RPMs, which you definitely need to learn how to build, as those skills are crucial for troubleshooting complex software deployment
problems. We already discussed those things, and this is just a summary of the previous discussion.
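For example, GNU tar can refuse to overwrite target files that are newer than the copies inside the change tarball, which catches such "outliers" automatically (a sketch; check that your tar version supports the option):

    # Unpack the change tarball, but keep any file on the target that is newer
    # than its copy in the archive; tar reports every skipped file, giving you
    # a ready-made list of servers and files that need a manual look.
    tar --keep-newer-files -xzf change.tgz -C /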
Remember that if you encounter a real SNAFU after some configuration change on multiple servers (especially one connected with deploying
software using RPMs or other packages), often only a backup can save your skin. Returning to the initial state by uninstalling
RPMs often fails. Actually, re-reading Sysadmin Horror Stories before making a complex change
on multiple and, especially, remote servers is a good method of raising situational awareness -- and it might be more effective than
using a complex configuration management system to deploy the change ;-).
As a final note, please understand that Unix configuration management systems are useless for fighting the incompetence of the IT management
of large corporations, which can be simply staggering and can be compared with the description of military bureaucracy in
The Good Soldier Švejk. For sure, it produces the
same mixed feelings:
"All along the line,' said the volunteer, pulling the blanket over him, 'everything in the army stinks of rottenness. Up till
now the wide-eyed masses haven't woken up to it. With goggling eyes they let themselves be made into mincemeat and then when they're
struck by a bullet they just whisper, "Mummy!"
Heroes don't exist, only cattle for the slaughter and the butchers in the general staffs. But in the end everybody will mutiny
and there will be a fine shambles. Long live the army! Goodnight!"
In Ansible architecture, you have a controller node and managed nodes. Ansible is installed
on only the controller node. It's an agentless tool and doesn't need to be installed on the
managed nodes. Controller and managed nodes are connected using the SSH protocol. All tasks are
written into a "playbook" using the YAML language. Each playbook can contain multiple
plays, which contain tasks , and tasks contain modules . Modules are
reusable standalone scripts that manage some aspect of a system's behavior. Ansible modules are
also known as task plugins or library plugins.
Playbooks for complex tasks can become lengthy and therefore difficult to read and
understand. The solution to this problem is Ansible roles. Using roles, you can break
long playbooks into multiple files, making each playbook simple to read and understand. Roles
are a collection of templates, files, variables, modules, and tasks. The primary purpose behind
roles is to reuse Ansible code. DevOps engineers and sysadmins should always try to reuse their
code. An Ansible role can contain multiple playbooks. It can easily reuse code written by
anyone if the role is suitable for a given case. For example, you could write a playbook for
Apache hosting and then reuse this code by changing the content of index.html to
alter options for some other application or service.
The following is an overview of the Ansible role structure; it consists of a number of standard subdirectories.
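For reference, a sketch of how the skeleton is usually created (the role name is hypothetical; the listed directories reflect the standard ansible-galaxy skeleton):

    # Generate an empty role skeleton; ansible-galaxy creates the standard
    # sub-directories (defaults, files, handlers, meta, tasks, templates,
    # tests, vars), most of them seeded with an empty main.yml.
    ansible-galaxy init webserver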
Initially, all files are created empty by the ansible-galaxy command. So,
depending on the task, you can use these directories. For example, the vars
directory stores variables. In the tasks directory, you have main.yml,
which is the main playbook. The templates directory is for storing Jinja
templates. The handlers directory is for storing handlers.
Advantages of Ansible roles:
Allow for content reusability
Make large projects manageable
Ansible roles are structured directories containing sub-directories.
But did you know that Red Hat Enterprise Linux also provides some Ansible System Roles to manage operating system
tasks?
System roles
The rhel-system-roles package is available in the Extras (EPEL) channel. The
rhel-system-roles package is used to configure RHEL hosts. There are seven default
rhel-system-roles available:
rhel-system-roles.kdump - This role configures the kdump crash recovery service. Kdump is
a feature of the Linux kernel and is useful when analyzing the cause of a kernel crash.
rhel-system-roles.network - This role is dedicated to network interfaces. This helps to
configure network interfaces in Linux systems.
rhel-system-roles.selinux - This role manages SELinux. This helps to configure the
SELinux mode, files, port-context, etc.
rhel-system-roles.timesync - This role is used to configure NTP or PTP on your Linux
system.
rhel-system-roles.postfix - This role is dedicated to managing the Postfix mail transfer
agent.
rhel-system-roles.firewall - As the name suggests, this role is all about managing the
host system's firewall configuration.
rhel-system-roles.tuned - Tuned is a system tuning service in Linux to monitor connected
devices. So this role is to configure the tuned service for system performance.
The rhel-system-roles package is derived from the open source linux-system-roles project, which is
available on Ansible Galaxy. The rhel-system-roles package is supported by Red Hat, so you
can think of rhel-system-roles as the downstream of linux-system-roles. To
install rhel-system-roles on your machine, use:
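A sketch of the installation on RHEL 7 (on newer releases dnf replaces yum; the roles path shown is the usual default):

    yum install -y rhel-system-roles
    # the roles are installed under the default Ansible roles path, typically:
    ls /usr/share/ansible/roles/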
This is the default path, so whenever you use playbooks to reference these roles, you don't
need to explicitly include the absolute path. You can also refer to the documentation for using
Ansible roles. The path for the documentation is
/usr/share/doc/rhel-system-roles
The documentation directory for each role has detailed information about that role -- for
example, a README.md file with a description and usage examples. The documentation is
self-explanatory.
The following is an example of a role.
Example
If you want to change the SELinux mode of the localhost machine or any other host, use the
system roles -- for this task, rhel-system-roles.selinux.
The playbook looks like this:
---
- name: a playbook for SELinux mode
  hosts: localhost
  roles:
    - rhel-system-roles.selinux
  vars:
    selinux_state: disabled
After running the playbook, you can verify whether the SELinux mode changed or not.
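For example (the playbook filename is an assumption; note that a switch to "disabled" fully takes effect only after a reboot):

    ansible-playbook selinux-mode.yml      # apply the play shown above to localhost
    sestatus | head -3                     # check the current and configured SELinux mode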
We've gone over several things you can do with Ansible on your system, but we haven't yet
discussed how to provision a system. Here's an example of provisioning a virtual machine (VM)
with the OpenStack cloud solution.
- name: create a VM in openstack
  os_server:
    name: cloudera-namenode
    state: present
    cloud: openstack
    region_name: andromeda
    image: 923569a-c777-4g52-t3y9-cxvhl86zx345
    flavor_ram: 20146
    flavor: big
    auto_ip: yes
    volumes: cloudera-namenode
All OpenStack modules start with os, which makes it easier to find them. The
above configuration uses the os_server module, which lets you add or remove an instance. It
includes the name of the VM, its state, its cloud options, and how it authenticates to the API.
More information about cloud.yml
is available in the OpenStack docs, but if you don't want to use cloud.yml, you can use a
dictionary that lists your credentials using the auth option. If you want to
delete the VM, just change state: to absent.
Say you have a list of servers you shut down because you couldn't figure out how to get the
applications working, and you want to start them again. You can use
os_server_action to restart them (or rebuild them if you want to start from
scratch).
Here is an example that starts the server and tells the module the name of the
instance:
Most OpenStack modules use similar options. Therefore, to rebuild the server, we can use the
same options but change the action to rebuild and add the
image we want it to use:
For this laptop experiment, I decided to use Debian 32-bit as my starting point, as it
seemed to work best on my older hardware. The bootstrap YAML script is intended to take a
bare-minimal OS install and bring it up to some standard. It relies on a non-root account to be
available over SSH and little else. Since a minimal OS install usually contains very little
that is useful to Ansible, I use the following to hit one host and prompt me to log in with
privilege escalation:
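Something along these lines (the inventory and playbook names are assumptions; the flags prompt for the SSH and privilege-escalation passwords and limit the run to the new host):

    ansible-playbook bootstrap.yml -i inventory --limit newlaptop \
        --user myuser --ask-pass --ask-become-pass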
The script makes use of Ansible's raw module to set some base
requirements. It ensures Python is available, upgrades the OS, sets up an Ansible control
account, transfers SSH keys, and configures sudo privilege escalation. When bootstrap
completes, everything should be in place to have this node fully participate in my larger
Ansible inventory. I've found that bootstrapping bare-minimum OS installs is nuanced (if there
is interest, I'll write another article on this topic).
The account YAML setup script is used to set up (or reset) user accounts for each family
member. This keeps user IDs (UIDs) and group IDs (GIDs) consistent across the small number of
machines we have, and it can be used to fix locked accounts when needed. Yes, I know I could
have set up Network Information Service or LDAP authentication, but the number of accounts I
have is very small, and I prefer to keep these systems very simple. Here is an excerpt I found
especially useful for this:
---
- name: Set user accounts
  hosts: all
  gather_facts: false
  become: yes
  vars_prompt:
    - name: passwd
      prompt: "Enter the desired ansible password:"
      private: yes
  tasks:
    - name: Add child 1 account
      user:
        state: present
        name: child1
        password: "{{ passwd | password_hash('sha512') }}"
        comment: Child One
        uid: 888
        group: users
        shell: /bin/bash
        generate_ssh_key: yes
        ssh_key_bits: 2048
        update_password: always
        create_home: yes
The vars_prompt section prompts me for a password, which is put through a Jinja2 transformation
to produce the desired password hash. This means I don't need to hardcode passwords into the
YAML file and can run it to change passwords as needed.
The software installation YAML file is still evolving. It includes a base set of utilities
for the sysadmin and then the stuff my users need. This mostly consists of ensuring that the
same graphical user interface (GUI) and all the same programs, games, and media files
are installed on each machine. Here is a small excerpt of the software for my young
children:
- name: Install kids software
  apt:
    name: "{{ packages }}"
    state: present
  vars:
    packages:
      - lxde
      - childsplay
      - tuxpaint
      - tuxtype
      - pysycache
      - pysiogame
      - lmemory
      - bouncy
I created these three Ansible scripts using a virtual machine. When they were perfect, I
tested them on the D620. Then converting the Mini 9 was a snap; I simply loaded the same
minimal Debian install then ran the bootstrap, accounts, and software configurations. Both
systems then functioned identically.
For a while, both sisters enjoyed their respective computers, comparing usage and exploring
software features.
The moment of truth
A few weeks later came the inevitable. My older daughter finally came to the conclusion that
her pink Dell Mini 9 was underpowered. Her sister's D620 had superior power and screen real
estate. YouTube was the new rage, and the Mini 9 could not keep up. As you can guess, the poor
Mini 9 fell into disuse; she wanted a new machine, and sharing her younger sister's would not
do.
I had another D620 in my pile. I replaced the BIOS battery, gave it a new SSD, and upgraded
the RAM. Another perfect example of breathing new life into old hardware.
I pulled my Ansible scripts from source control, and everything I needed was right there:
bootstrap, account setup, and software. By this time, I had forgotten a lot of the specific
software installation information. But details like account UIDs and all the packages to
install were all clearly documented and ready for use. While I surely could have figured it out
by looking at my other machines, there was no need to spend the time! Ansible had it all
clearly laid out in YAML.
Not only was the YAML documentation valuable, but Ansible's automation made short work of
the new install. The minimal Debian OS install from USB stick took about 15 minutes. The
subsequent shape up of the system using Ansible for end-user deployment only took another nine
minutes. End-user acceptance testing was successful, and a new era of computing calmness was
brought to my family (other parents will understand!).
Conclusion
Taking the time to learn and practice Ansible with this exercise showed me the true value of
its automation and documentation abilities. Spending a few hours figuring out the specifics for
the first example saves time whenever I need to provision or fix a machine. The YAML is clear,
easy to read, and -- thanks to Ansible's idempotency -- easy to test and refine over time. When
I have new ideas or my children have new requests, using Ansible to control a local virtual
machine for testing is a valuable time-saving tool.
Doing sysadmin tasks in your free time can be fun. Spending the time to automate and
document your work pays rewards in the future; instead of needing to investigate and relearn a
bunch of things you've already solved, Ansible keeps your work documented and ready to apply so
you can move onto other, newer fun things!
Ansible works by connecting to nodes and sending small programs called modules to be
executed remotely. This makes it a push architecture, where configuration is pushed from
Ansible to servers without agents, as opposed to the pull model, common in agent-based
configuration management systems, where configuration is pulled.
These modules are mapped to resources and their respective states, which are
represented in YAML files. They enable you to manage virtually everything that has an
API, CLI, or configuration file you can interact with, including network devices like load
balancers, switches, firewalls, container orchestrators, containers themselves, and even
virtual machine instances in a hypervisor or in a public (e.g., AWS, GCE, Azure) and/or private
(e.g., OpenStack, CloudStack) cloud, as well as storage and security appliances and system
configuration.
With Ansible's batteries-included model, hundreds of modules are included and any task in a
playbook has a module behind it.
The contract for building modules is simple: JSON in the stdout . The
configurations declared in YAML files are delivered over the network via SSH/WinRM -- or any
other connection plugin -- as small scripts to be executed in the target server(s). Modules can
be written in any language capable of returning JSON, although most Ansible modules (except for
Windows PowerShell) are written in Python using the Ansible API (this eases the development of
new modules).
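To make the contract concrete: the only hard requirement is that a module prints a JSON document on stdout when it finishes. A minimal successful response might look something like this (the exact fields beyond changed and msg vary from module to module, so treat this as an illustrative sketch):
{
    "changed": true,
    "msg": "Resource abc created"
}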
Modules are one way of expanding Ansible capabilities. Other alternatives, like dynamic
inventories and plugins, can also increase Ansible's power. It's important to know about them
so you know when to use one instead of the other.
Plugins are divided into several categories with distinct goals, like Action, Cache,
Callback, Connection, Filters, Lookup, and Vars. The most popular plugins are:
Connection plugins: These implement a way to communicate with servers in your inventory
(e.g., SSH, WinRM, Telnet); in other words, how automation code is transported over the
network to be executed.
Filters plugins: These allow you to manipulate data inside your playbook. This is a
Jinja2 feature that is harnessed by Ansible to solve infrastructure-as-code problems.
Lookup plugins: These fetch data from an external source (e.g., env, file, Hiera,
database, HashiCorp Vault).
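As a small illustration of how lookups and filters surface inside a playbook, the task below reads an environment variable on the control node with the built-in env lookup and passes it through the standard Jinja2 upper filter (the task itself is just a demonstration sketch):
- name: Show the control node's HOME directory, upper-cased
  debug:
    msg: "{{ lookup('env', 'HOME') | upper }}"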
Although many modules are delivered with Ansible, there is a chance that your problem is not
yet covered or it's something too specific -- for example, a solution that might make sense
only in your organization. Fortunately, the official docs provide excellent guidelines on
developing
modules .
IMPORTANT: Before you start working on something new, always check for open pull requests,
ask developers at #ansible-devel (IRC/Freenode), or search the development list and/or existing
working groups to see if a
module exists or is in development.
Signs that you need a new module instead of using an existing one include:
Conventional configuration management methods (e.g., templates, file, get_url,
lineinfile) do not solve your problem properly.
You have to use a complex combination of commands, shells, filters, text processing with
magic regexes, and API calls using curl to achieve your goals.
Your playbooks are complex, imperative, non-idempotent, and even non-deterministic.
In the ideal scenario, the tool or service already has an API or CLI for management, and it
returns some sort of structured data (JSON, XML, YAML).
Identifying good and bad playbooks
"Make love, but don't make a shell script in YAML."
So, what makes a bad playbook?
- name: Read a remote resource
  command: "curl -v http://xpto/resource/abc"
  register: resource
  changed_when: False

- name: Create a resource in case it does not exist
  command: "curl -X POST http://xpto/resource/abc -d '{ config:{ client: xyz, url: http://beta, pattern: *.* } }'"
  when: "resource.stdout | 404"

# Leave it here in case I need to remove it hehehe
#- name: Remove resource
#  command: "curl -X DELETE http://xpto/resource/abc"
#  when: resource.stdout == 1
Aside from being very fragile -- what if the resource state includes a 404 somewhere? -- and
demanding extra code to be idempotent, this playbook can't update the resource when its state
changes.
Playbooks written this way disrespect many infrastructure-as-code principles. They're not
readable by human beings, are hard to reuse and parameterize, and don't follow the declarative
model encouraged by most configuration management tools. They also fail to be idempotent and to
converge to the declared state.
Bad playbooks can jeopardize your automation adoption. Instead of harnessing configuration
management tools to increase your speed, they have the same problems as an imperative
automation approach based on scripts and command execution. This creates a scenario where
you're using Ansible just as a means to deliver your old scripts, copying what you already have
into YAML files.
Here's how to rewrite this example to follow infrastructure-as-code principles.
- name: XPTO
  xpto:
    name: abc
    state: present
    config:
      client: xyz
      url: http://beta
      pattern: "*.*"
The benefits of this approach, based on custom modules, include:
It's declarative -- resources are properly represented in YAML.
It's idempotent.
It converges the current state to the declared state.
It's readable by human beings.
It's easily parameterized or reused.
Implementing a custom module
Let's use WildFly , an open source
Java application server, as an example to introduce a custom module for our not-so-good
playbook:
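The not-so-good playbook in question drives WildFly through its command-line interface from a command task, roughly along these lines (a sketch only; the installation path and CLI arguments are illustrative, not taken from a real setup):
- name: Create the DemoDS datasource via jboss-cli
  command: "/opt/wildfly/bin/jboss-cli.sh --connect --command='data-source add --name=DemoDS --driver-name=h2 --jndi-name=java:jboss/datasources/DemoDS --connection-url=jdbc:h2:mem:demo'"
A few reasons why that approach falls short: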
JBoss-CLI returns plaintext in a JSON-like syntax; therefore, this approach is very
fragile, since we need some kind of parser for this notation, and even a seemingly simple parser
can become too complex once it has to handle many exceptions.
JBoss-CLI is just an interface to send requests to the management API (port 9990).
Sending an HTTP request is more efficient than opening a new JBoss-CLI session,
connecting, and sending a command.
It does not converge to the desired state; it only creates the resource when it doesn't
exist.
A custom module for this would look like:
- name: Configure datasource
  jboss_resource:
    name: "/subsystem=datasources/data-source=DemoDS"
    state: present
    attributes:
      driver-name: h2
      connection-url: "jdbc:h2:mem:demo;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE"
      jndi-name: "java:jboss/datasources/DemoDS"
      user-name: sa
      password: sa
      min-pool-size: 20
      max-pool-size: 40
This playbook is declarative, idempotent, more readable, and converges to the desired state
regardless of the current state.
Why learn to build custom modules?
Good reasons to learn how to build custom modules include:
Improving existing modules
You have bad playbooks and want to improve them, or
You don't, but want to avoid having bad playbooks.
Knowing how to build a module considerably improves your ability to debug problems in
playbooks, thereby increasing your productivity.
" abstractions save us time working, but they don't save us time learning." -- Joel Spolsky,
The Law of
Leaky Abstractions
Custom Ansible modules 101
JSON (JavaScript Object Notation) in stdout: that's the contract!
They can be written in any language, but...
Python is usually the best option (or the second best).
Most modules delivered with Ansible (lib/ansible/modules) are written in Python and
should support compatible Python versions.
The Ansible way
First step:
git clone https://github.com/ansible/ansible.git
Navigate to lib/ansible/modules/ and read the code of the existing modules.
Your tools are: Git, Python, virtualenv, pdb (the Python debugger).
For comprehensive instructions, consult the
official docs .
An alternative: drop it in the library directory
library/              # if any custom modules, put them here (optional)
module_utils/         # if any custom module_utils to support modules, put them here (optional)
filter_plugins/       # if any custom filter plugins, put them here (optional)
site.yml              # master playbook
webservers.yml        # playbook for webserver tier
dbservers.yml         # playbook for dbserver tier
roles/
    common/           # this hierarchy represents a "role"
        library/          # roles can also include custom modules
        module_utils/     # roles can also include custom module_utils
        lookup_plugins/   # or other types of plugins, like lookup in this case
It's easier to get started this way.
It doesn't require anything besides Ansible and your favorite IDE/text editor.
This is your best option if the module is something that will only be used internally.
TIP: You can use this directory layout to overwrite existing modules if, for example, you
need to patch a module.
First steps
You could do it on your own -- including using another language -- or you could use the
AnsibleModule class, as it makes it easier to emit JSON on stdout (exit_json(),
fail_json()) in the way Ansible expects (msg, meta, has_changed, result), and it is also
easier to process the input (params[]) and log the execution (log(), debug()).
module = AnsibleModule(argument_spec=arguments, supports_check_mode=True)
try:
    if module.check_mode:
        # Do not change anything; only verify the current state and report it
        module.exit_json(changed=has_changed, meta=result, msg='Did something or not...')

    if module.params['state'] == 'present':
        # Verify the presence of the resource
        # Is the desired state (module.params['param_name']) equal to the current state?
        module.exit_json(changed=has_changed, meta=result)

    if module.params['state'] == 'absent':
        # Remove the resource in case it exists
        module.exit_json(changed=has_changed, meta=result)
NOTES: check_mode ("dry run") allows a playbook to verify whether changes would be required
without actually performing them. Also, the module_utils directory can be used for code shared
among different modules.
The Ansible codebase is heavily tested, and every commit triggers a build in its continuous
integration (CI) server, Shippable , which includes
linting, unit tests, and integration tests.
For integration tests, it uses containers and Ansible itself to perform the setup and verify
phase. Here is a test case (written in Ansible) for our custom module's sample code:
- name: Configure datasource
  jboss_resource:
    name: "/subsystem=datasources/data-source=DemoDS"
    state: present
    attributes:
      connection-url: "jdbc:h2:mem:demo;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE"
      ...
  register: result

- name: assert output message that datasource was created
  assert:
    that:
      - "result.changed == true"
      - "'Added /subsystem=datasources/data-source=DemoDS' in result.msg"
An alternative: bundling a module with your role
How to spin up your infrastructure: e.g., Vagrant, Docker, OpenStack, EC2
How to verify your infrastructure tests: Testinfra and Goss
But your tests would have to be written using pytest with Testinfra or Goss, instead of
plain Ansible. If you'd like to learn more about testing Ansible roles, see my article about
using Molecule.
Secure shell (SSH) is at the heart of Ansible, at least for almost everything besides
Windows. Key (no pun intended) to using SSH efficiently with Ansible is keys! Slight aside -- there are a lot of
very cool things you can do for security with SSH keys. It's worth perusing the authorized_keys
section of the sshd manual page.
Managing SSH keys can become laborious if you're getting into the realms of granular user
access, and although we could do it with either of my next two favourites, I prefer to use the
authorized_key module because it enables easy management through variables.
Besides the obvious function of placing a file somewhere, the file module also sets
ownership and permissions. I'd say that's a lot of bang for your buck with one module.
I'd proffer a substantial portion of security relates to setting permissions too, so the file
module plays nicely with authorized_keys .
There are so many ways to manipulate the contents of files, and I see lots of folk use
lineinfile . I've
used it myself for small tasks. However, the template module is so much clearer because you
maintain the entire file for context. My preference is to write Ansible content in such a way
that anyone can understand it easily -- which to me means not making it hard to
understand what is happening. Use of template means being able to see the entire file you're
putting into place, complete with the variables you are using to change pieces.
Many modules in the current distribution leverage Ansible as an orchestrator. They talk to
another service, rather than doing something specific like putting a file into place. Usually,
that talking is over HTTP too. In the days before many of these modules existed, you
could program an API directly using the uri module. It's a powerful access tool,
enabling you to do a lot. I wouldn't be without it in my fictitious Ansible shed.
The joker card in our pack. The Swiss Army Knife. If you're absolutely stuck for how to
control something else, use shell . Some will argue we're now talking about making Ansible a
Bash script -- but, I would say it's still better because with the use of the name parameter in
your plays and roles, you document every step. To me, that's as big a bonus as anything. Back
in the days when I was still consulting, I once helped a database administrator (DBA) migrate
to Ansible. The DBA wasn't one for change and pushed back at changing working methods. So, to
ease into the Ansible way, we called some existing DB management scripts from Ansible using the
shell module, with an informative name statement to accompany each task.
You can achieve a lot with these five modules. Yes, modules designed to do a specific task
will make your life even easier. But with a smidgen of engineering simplicity, you can achieve
a lot with very little. Ansible developer Brian Coca is a master at it, and his tips and tricks talk is always
worth a watch.
10 Ansible modules for Linux system automation
These handy modules save time and hassle by automating many of your daily tasks, and they're easy to implement with a few commands.
26 Oct 2020, Ricardo Gerardi (Red Hat)
Ansible is a complete
automation solution for your IT environment. You can use Ansible to automate Linux and Windows
server configuration, orchestrate service provisioning, deploy cloud environments, and even
configure your network devices.
Ansible modules
abstract actions on your system so you don't need to worry about implementation details. You
simply describe the desired state, and Ansible ensures the target system matches it.
This module availability is one of Ansible's main benefits, and it is often referred to as
Ansible having "batteries included." Indeed, you can find modules for a great number of tasks,
and while this is great, I frequently hear from beginners that they don't know where to
start.
Although your choice of modules will depend exclusively on your requirements and what you're
trying to automate with Ansible, here are the top ten modules you need to get started with
Ansible for Linux system automation.
1. copy
The
copy module allows you to copy a file from the Ansible control node to the target hosts. In
addition to copying the file, it allows you to set ownership, permissions, and SELinux labels
to the destination file. Here's an example of using the copy module to copy a "message of the
day" configuration file to the target hosts:
- name: Ensure MOTD file is in place
  copy:
    src: files/motd
    dest: /etc/motd
    owner: root
    group: root
    mode: 0644
For less complex content, you can copy the content directly to the destination file without
having a local file, like this:
- name: Ensure MOTD file is in place
  copy:
    content: "Welcome to this system."
    dest: /etc/motd
    owner: root
    group: root
    mode: 0644
This module works
idempotently , which means it will only copy the file if the same file is not already in
place with the same content and permissions.
The copy module is a great option to copy a small number of files with static content. If
you need to copy a large number of files, take a look at the
synchronize module. To copy files with dynamic content, take a look at the
template module next.
2. template
The
template module works similarly to the copy module, but it processes content
dynamically using the Jinja2 templating
language before copying it to the target hosts.
For example, define a "message of the day" template that displays the target system name,
like this:
$ vi templates/motd.j2
Welcome to {{ inventory_hostname }}.
Then, instantiate this template using the template module, like this:
- name: Ensure MOTD file is in place
  template:
    src: templates/motd.j2
    dest: /etc/motd
    owner: root
    group: root
    mode: 0644
Before copying the file, Ansible processes the template and interpolates the variable,
replacing it with the target host system name. For example, if the target system name is
rh8-vm03 , the result file is:
Welcome to rh8-vm03.
While the copy module can also interpolate variables when using the
content parameter, the template module allows additional flexibility
by creating template files, which enable you to define more complex content, including
for loops, if conditions, and more. For a complete reference, check
Jinja2
documentation .
This module is also idempotent, and it will not copy the file if the content on the target
system already matches the template's content.
3. user
The user
module allows you to create and manage Linux users in your target system. This module has many
different parameters, but in its most basic form, you can use it to create a new user.
For example, to create the user ricardo with UID 2001, part of the groups
users and wheel , and password mypassword , apply the
user module with these parameters:
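A task matching that description could look like the following sketch. The user module expects a hashed password, so the plain string from the text is passed through the password_hash filter here; in real playbooks you would normally store the secret in a vault:
- name: Ensure user ricardo exists
  user:
    name: ricardo
    uid: 2001
    groups: users,wheel
    password: "{{ 'mypassword' | password_hash('sha512') }}"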
Notice that this module tries to be idempotent, but it cannot guarantee that for all its
options. For instance, if you execute the previous module example again, it will reset the
password to the defined value, changing the user in the system for every execution. To make
this example idempotent, use the parameter update_password: on_create , ensuring
Ansible only sets the password when creating the user and not on subsequent runs.
You can also use this module to delete a user by setting the parameter state:
absent .
The user module has many options for you to manage multiple user aspects. Make
sure you take a look at the module documentation for more information.
4. package
The package
module allows you to install, update, or remove software packages from your target system using
the operating system standard package manager.
For example, to install the Apache web server on a Red Hat Linux machine, apply the module
like this:
- name: Ensure Apache package is installed
  package:
    name: httpd
    state: present
This module is distribution agnostic, and it works by using the underlying package
manager, such as yum/dnf for Red Hat-based distributions and apt for
Debian. Because of that, it only does basic tasks like install and remove packages. If you need
more control over the package manager options, use the specific module for the target
distribution.
Also, keep in mind that, even though the module itself works on different distributions, the
package name for each can be different. For instance, in Red Hat-based distributions, the Apache
web server package name is httpd, while in Debian, it is apache2.
Ensure your playbooks deal with that.
This module is idempotent, and it will not act if the current system state matches the
desired state.
5. service
Use the service
module to manage the target system services using the required init system; for example,
systemd
.
In its most basic form, all you have to do is provide the service name and the desired
state. For instance, to start the sshd service, use the module like this:
- name: Ensure SSHD is started
  service:
    name: sshd
    state: started
You can also ensure the service starts automatically when the target system boots up by
providing the parameter enabled: yes .
As with the package module, the service module is flexible and
works across different distributions. If you need fine-tuning over the specific target init
system, use the corresponding module; for example, the module systemd .
Similar to the other modules you've seen so far, the service module is also
idempotent.
6. firewalld
Use the firewalld
module to control the system firewall with the firewalld daemon on systems that
support it, such as Red Hat-based distributions.
For example, to open the HTTP service on port 80, use it like this:
- name: Ensure port 80 (http) is open
  firewalld:
    service: http
    state: enabled
    permanent: yes
    immediate: yes
You can also specify custom ports instead of service names with the port
parameter. In this case, make sure to specify the protocol as well. For example, to open TCP
port 3000, use this:
- name: Ensure port 3000/TCP is open
  firewalld:
    port: 3000/tcp
    state: enabled
    permanent: yes
    immediate: yes
You can also use this module to control other firewalld aspects like zones or
complex rules. Make sure to check the module's documentation for a comprehensive list of
options.
7. file
The file
module allows you to control the state of files and directories -- setting permissions,
ownership, and SELinux labels.
For instance, use the file module to create a directory /app owned
by the user ricardo , with read, write, and execute permissions for the owner and
the group users :
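A task implementing that could look like this (mode 0770 gives the owner and the group read, write, and execute permissions, matching the description above):
- name: Ensure the /app directory exists with the right ownership
  file:
    path: /app
    state: directory
    owner: ricardo
    group: users
    mode: 0770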
You can also use this module to set file properties on directories recursively by using the
parameter recurse: yes or delete files and directories with the parameter
state: absent .
This module works with idempotency for most of its parameters, but some of them may make it
change the target path every time. Check the documentation for more details.
8.
lineinfile
The lineinfile
module allows you to manage single lines on existing files. It's useful to update targeted
configuration on existing files without changing the rest of the file or copying the entire
configuration file.
For example, add a new entry to your hosts file like this:
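For instance, a task along these lines appends a host entry if it is not already present (the IP address and hostname are just placeholders):
- name: Ensure host entry for the application server is present
  lineinfile:
    path: /etc/hosts
    line: "192.168.10.25 appserver01"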
You can also use this module to change an existing line by applying the parameter
regexp to look for an existing line to replace. For example, update the
sshd_config file to prevent root login by modifying the line PermitRootLogin
yes to PermitRootLogin no :
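A task doing that could be written like this (the regexp matches the line whether it is currently set to yes or no, which also keeps the task idempotent):
- name: Ensure root login over SSH is disabled
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: "^PermitRootLogin"
    line: "PermitRootLogin no"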
Note: Use the service module to restart the SSHD service to enable this change.
This module is also idempotent, but, in case of line modification, ensure the regular
expression matches both the original and updated states to avoid unnecessary changes.
9.
unarchive
Use the unarchive
module to extract the contents of archive files such as tar or zip
files. By default, it copies the archive file from the control node to the target machine
before extracting it. Change this behavior by providing the parameter remote_src:
yes .
For example, extract the contents of a .tar.gz file that has already been
downloaded to the target host with this syntax:
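For example (the archive path and destination are placeholders; remote_src: yes tells Ansible the file is already on the target host):
- name: Extract the application archive already present on the host
  unarchive:
    src: /tmp/app.tar.gz
    dest: /opt/app
    remote_src: yes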
Some archive technologies require additional packages to be available on the target system;
for example, the package unzip to extract .zip files.
Depending on the archive format used, this module may or may not work idempotently. To
prevent unnecessary changes, you can use the parameter creates to specify a file
or directory that this module would create when extracting the archive contents. If this file
or directory already exists, the module does not extract the contents again.
10.
command
The command
module is a flexible one that allows you to execute arbitrary commands on the target system.
Using this module, you can do almost anything on the target system as long as there's a command
for it.
Even though the command module is flexible and powerful, it should be used with
caution. Avoid using the command module to execute a task if there's another appropriate module
available for that. For example, you could create users by using the
command module to execute the useradd command, but you should
use the user module instead, as it abstracts many details away from you, taking
care of corner cases and ensuring the configuration only changes when necessary.
For cases where no modules are available, or to run custom scripts or programs, the
command module is still a great resource. For instance, use this module to run a
script that is already present in the target machine:
- name: Run the app installer
  command: "/app/install.sh"
By default, this module is not idempotent, as Ansible executes the command every single
time. To make the command module idempotent, you can use when
conditions to only execute the command if the appropriate condition exists, or the
creates argument, similarly to the unarchive module example.
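As a quick sketch of the creates approach, the installer task from above could be guarded like this (assuming the hypothetical installer drops a marker file such as /app/installed when it finishes):
- name: Run the app installer only once
  command: "/app/install.sh"
  args:
    creates: /app/installed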
What's
next?
Using these modules, you can configure entire Linux systems by copying, templating, or
modifying configuration files, creating users, installing packages, starting system services,
updating the firewall, and more.
If you are new to Ansible, make sure you check the documentation on how to create
playbooks to combine these modules to automate your system. Some of these tasks require
running with elevated privileges to work. For more details, check the privilege escalation
documentation.
As of Ansible 2.10, modules are organized in collections. Most of the modules in this list
are part of the ansible.builtin collection and are available by default with
Ansible, but some of them are part of other collections. For a list of collections, check the
Ansible documentation.
What you need to know about Ansible modules
Learn how and when to develop custom modules for Ansible.
Ansible has no notion of state. Since it doesn't keep track of dependencies, the tool
simply executes a sequential series of tasks, stopping when it finishes, fails, or encounters an
error. For some, this simplistic mode of automation is desirable; however, many prefer
their automation tool to maintain an extensive catalog for ordering (à la Puppet),
allowing them to reach a defined state regardless of any variance in environmental
conditions.
YAML Ain't Markup Language (YAML), and as configuration formats go, it's easy on the eyes.
It has an intuitive visual structure, and its logic is pretty simple: indented bullet points
inherit properties of parent bullet points.
It's easy (and misleading) to think of YAML as just a list of related values, no more
complex than a shopping list. There is a heading and some items beneath it. The items below the
heading relate directly to it, right? Well, you can test this theory by writing a little bit of
valid YAML.
Open a text editor and enter this text, retaining the dashes at the top of the file and the
leading spaces for the last two items:
---
Store: Bakery
    Sourdough loaf
    Bagels
Save the file as example.yaml (or similar).
If you don't already have yamllint installed, install it:
$ sudo dnf install -y yamllint
A linter is an application that verifies the syntax of a file. The
yamllint command is a great way to ensure your YAML is valid before you hand it
over to whatever application you're writing YAML for (Ansible, for instance).
Use yamllint to validate your YAML file:
$ yamllint --strict example.yaml || echo "Fail"
$
But when converted to JSON with a simple converter script , the data structure of
this simple YAML becomes clearer:
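The converted output comes out as a single key holding one long string value, roughly like this:
{
    "Store": "Bakery Sourdough loaf Bagels"
}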
Parsed without the visual context of line breaks and indentation, the actual scope of your
data looks a lot different. The data is mostly flat, almost devoid of hierarchy. There's no
indication that the sourdough loaf and bagels are children of the name of the store.
Sequence: values listed in a specific order. A sequence starts with a dash and a space (
- ). You can think of a sequence as a Python list or an array in Bash or
Perl.
Mapping: key and value pairs. Each key must be unique, and the order doesn't matter.
Think of a Python dictionary or a variable assignment in a Bash script.
There's a third type called scalar , which is arbitrary data (encoded in
Unicode) such as strings, integers, dates, and so on. In practice, these are the words and
numbers you type when building mapping and sequence blocks, so you won't think about these any
more than you ponder the words of your native tongue.
When constructing YAML, it might help to think of YAML as either a sequence of sequences or
a map of maps, but not both.
YAML mapping blocks
When you start a YAML file with a mapping statement, YAML expects a series of mappings. A
mapping block in YAML doesn't close until it's resolved, and a new mapping block is explicitly
created. A new block can only be created either by increasing the indentation level (in
which case, the new block exists inside the previous block) or by resolving the previous
mapping and starting an adjacent mapping block.
The reason the original YAML example in this article fails to produce data with a hierarchy
is that it's actually only one data block: the key Store has a single value of
Bakery Sourdough loaf Bagels . YAML ignores the whitespace because no new mapping
block has been started.
Is it possible to fix the example YAML by prepending each sequence item with a dash and
space?
---
Store: Bakery
    - Sourdough loaf
    - Bagels
Again, this is valid YAML, but it's still pretty flat:
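Converted the same way, the result is presumably still one flat string, with the dashes simply absorbed into the value, roughly:
{
    "Store": "Bakery - Sourdough loaf - Bagels"
}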
The problem is that this YAML file opens a mapping block and never closes it. To close the
Store block and open a new one, you must start a new mapping. The value of
the mapping can be a sequence, but you need a key first.
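One way to restructure it looks like this (the Bakery items come from the earlier example; the Cheesemonger entries are only illustrative):
---
Store:
    Bakery:
        - Sourdough loaf
        - Bagels
    Cheesemonger:
        - Blue cheese
        - Feta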
As you can see, this YAML directive contains one mapping ( Store ) to two child
values ( Bakery and Cheesemonger ), each of which is mapped to a
child sequence.
YAML sequence blocks
The same principles hold true should you start a YAML directive as a sequence. For instance,
this YAML directive is valid:
---
- Flour
- Water
- Salt
Each item is distinct when viewed as JSON:
["Flour", "Water", "Salt"]
But this YAML file is not valid because it attempts to start a mapping block at an
adjacent level to a sequence block :
---
- Flour
- Water
- Salt
Sugar: caster
It can be repaired by moving the mapping block into the sequence:
---
- Flour
- Water
- Salt
- Sugar: caster
You can, as always, embed a sequence into your mapping item:
---
- Flour
- Water
- Salt
- Sugar:
- caster
- granulated
- icing
Viewed through the lens of explicit JSON scoping, that YAML snippet reads like this:
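In JSON terms, that snippet is a four-element list whose last element is a mapping holding its own list:
[
    "Flour",
    "Water",
    "Salt",
    {
        "Sugar": [
            "caster",
            "granulated",
            "icing"
        ]
    }
]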
If you want to comfortably write YAML, it's vital to be aware of its data structure. As you
can tell, there's not much you have to remember. You know about mapping and sequence blocks, so
you know everything you need to work with. All that's left is to remember how they do and
do not interact with one another. Happy coding!
This article describes the
different parts of an Ansible playbook starting with a very broad overview of what Ansible is and how
you can use it. Ansible is a way to use easy-to-read YAML syntax to write playbooks that can automate
tasks for you. These playbooks can range from very simple to very complex and one playbook can even be
embedded in another.
Now that you have that base knowledge, let's look at a basic playbook that will install the httpd package.
I have an inventory file with two hosts specified, and I placed them in the web group:
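A minimal inventory with those two hosts in the web group looks something like this (the hostnames match the ones you'll see in the playbook output below):
[web]
ansibleclient.usersys.redhat.com
ansibleclient2.usersys.redhat.com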
Let's look at the actual
playbook to see what it contains:
[root@ansible test]# cat httpd.yml
---
- name: this playbook will install httpd
hosts: web
tasks:
- name: this is the task to install httpd
yum:
name: httpd
state: latest
Breaking this down, you see that the first line in the playbook is ---. This lets you know that it is the beginning of the playbook. Next, I gave a name for the play. This is just a simple playbook with only one play, but a more complex playbook can contain multiple plays. Next, I specify the hosts that I want to target. In this case, I am selecting the web group, but I could have specified either ansibleclient.usersys.redhat.com or ansibleclient2.usersys.redhat.com instead if I didn't want to target both systems. The next line tells Ansible that you're going to get into the tasks that do the actual work. In this case, my playbook has only one task, but you can have multiple tasks if you want. Here I specify that I'm going to install the httpd package. The next line says that I'm going to use the yum module. I then tell it the name of the package, httpd, and that I want the latest version to be installed.
When I run the httpd.yml playbook twice, I get this on the terminal:
[root@ansible test]# ansible-playbook httpd.yml
PLAY [this playbook will install httpd] ************************************************************************************************************
TASK [Gathering Facts] *****************************************************************************************************************************
ok: [ansibleclient.usersys.redhat.com]
ok: [ansibleclient2.usersys.redhat.com]
TASK [this is the task to install httpd] ***********************************************************************************************************
changed: [ansibleclient2.usersys.redhat.com]
changed: [ansibleclient.usersys.redhat.com]
PLAY RECAP *****************************************************************************************************************************************
ansibleclient.usersys.redhat.com : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
ansibleclient2.usersys.redhat.com : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
[root@ansible test]# ansible-playbook httpd.yml
PLAY [this playbook will install httpd] ************************************************************************************************************
TASK [Gathering Facts] *****************************************************************************************************************************
ok: [ansibleclient.usersys.redhat.com]
ok: [ansibleclient2.usersys.redhat.com]
TASK [this is the task to install httpd] ***********************************************************************************************************
ok: [ansibleclient.usersys.redhat.com]
ok: [ansibleclient2.usersys.redhat.com]
PLAY RECAP *****************************************************************************************************************************************
ansibleclient.usersys.redhat.com : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
ansibleclient2.usersys.redhat.com : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
[root@ansible test]#
Note that in both cases, I received an ok=2, but in the second run of the playbook, nothing was changed. The latest version of httpd was already installed at that point.
To get information about the various modules you can use in a playbook, you can use the ansible-doc command. For example:
[root@ansible test]# ansible-doc yum
> YUM (/usr/lib/python3.6/site-packages/ansible/modules/packaging/os/yum.py)
Installs, upgrade, downgrades, removes, and lists packages and groups with the `yum' package manager. This module only works on Python 2. If you require Python 3 support, see the [dnf] module.
* This module is maintained by The Ansible Core Team
* note: This module has a corresponding action plugin.
< output truncated >
It's nice to have a playbook that installs httpd, but to make it more flexible, you can use variables instead of hardcoding the package as httpd. To do that, you could use a playbook like this one:
[root@ansible test]# cat httpd.yml
---
- name: this playbook will install {{ myrpm }}
hosts: web
vars:
myrpm: httpd
tasks:
- name: this is the task to install {{ myrpm }}
yum:
name: "{{ myrpm }}"
state: latest
Here you can see that I've added a section called "vars" and declared a variable myrpm with the value of httpd. I can then use that myrpm variable in the playbook and adjust it to whatever I want to install. Also, because I've specified the RPM to install by using a variable, I can override what I have written in the playbook by specifying the variable on the command line by using -e:
[root@ansible test]# cat httpd.yml
---
- name: this playbook will install {{ myrpm }}
hosts: web
vars:
myrpm: httpd
tasks:
- name: this is the task to install {{ myrpm }}
yum:
name: "{{ myrpm }}"
state: latest
[root@ansible test]# ansible-playbook httpd.yml -e "myrpm=at"
PLAY [this playbook will install at] ***************************************************************************************************************
TASK [Gathering Facts] *****************************************************************************************************************************
ok: [ansibleclient.usersys.redhat.com]
ok: [ansibleclient2.usersys.redhat.com]
TASK [this is the task to install at] **************************************************************************************************************
changed: [ansibleclient2.usersys.redhat.com]
changed: [ansibleclient.usersys.redhat.com]
PLAY RECAP *****************************************************************************************************************************************
ansibleclient.usersys.redhat.com : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
ansibleclient2.usersys.redhat.com : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
[root@ansible test]#
Another way to make the tasks more dynamic is to use loops. In this snippet, you can see that I have declared rpms as a list containing mailx and postfix. To use them, I use loop in my task:
vars:
rpms:
- mailx
- postfix
tasks:
- name: this will install the rpms
yum:
name: "{{ item }}"
state: installed
loop: "{{ rpms }}"
You might have noticed that when these plays run, facts about the hosts are gathered. These facts can be used as variables when you run the play. For example, you could have a motd.yml file that sets content like:
"This is the system {{ ansible_facts['fqdn'] }}.
This is a {{ ansible_facts['distribution'] }} version {{ ansible_facts['distribution_version'] }} system."
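A minimal sketch of such a motd.yml, reusing the copy module and the web group from earlier, could look like this:
---
- name: Set a MOTD based on gathered facts
  hosts: web
  tasks:
    - name: Ensure /etc/motd describes this system
      copy:
        content: "This is the system {{ ansible_facts['fqdn'] }}. This is a {{ ansible_facts['distribution'] }} version {{ ansible_facts['distribution_version'] }} system.\n"
        dest: /etc/motd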
For any system where you run that playbook, the correct fully-qualified domain name (FQDN), operating system distribution, and distribution version would get set, even without you manually defining those variables.
I currently work as a
Solutions Architect at Red Hat. I have been here for going on 14 years, moving around a bit over
the years, working in front line support and consulting before my current role. In my free time,
I enjoy spending time with my family, exercising, and woodworking.
So I started writing simple code in a file that could be interpreted by perl to make the
changes for me with one command per line:
uc mail_owner # "uc" is the command for
"uncomment" uc hostname cv hostname {{fqdn}} # "cv" is the command for "change value", {{fqdn}
+ } is replaced with appropriate value ...[download]
You get the idea. I started writing some code to interpret my config file modification
commands and then realized someone had to have tackled this problem before. I did a search on
MetaCPAN but came up empty. Is anyone familiar with this problem space who can point me in the
right direction?
by likbez on Oct 05, 2020 at 03:16 UTC
There are also some newer editors that use Lua as the scripting language, but none with
Perl as a scripting language. See
https://www.slant.co/topics/7340/~open-source-programmable-text-editors
Here, for example, is a fragment from an old collection of hardening scripts called Titan,
written for Solaris by Brad M. Powell. The example below uses vi, which is the simplest but
probably not the optimal choice unless your primary editor is Vim.
FixHostsEquiv() {
  if [ -f /etc/hosts.equiv -a -s /etc/hosts.equiv ]; then
    t_echo 2 " /etc/hosts.equiv exists and is not empty. Saving a copy..."
    /bin/cp /etc/hosts.equiv /etc/hosts.equiv.ORIG
    if grep -s "^+$" /etc/hosts.equiv
    then
      ed - /etc/hosts.equiv <<- !
g/^+$/d
w
q
!
    fi
  else
    t_echo 2 " No /etc/hosts.equiv - PASSES CHECK"
    exit 1
  fi
}
For VIM/Emacs users the main benefit here is that you will know your editor better,
instead of inventing/learning "yet another tool." That is actually also an argument against
Ansible and friends: unless you operate a cluster or some other sizable set of servers, why try
to kill a bird with a cannon? A positive return on investment probably starts only if you manage
more than 8 or even 16 boxes.
Perl also can be used. But I would recommend slurping the file into an array and operating
on lines as in an editor; a regex over the whole text is more difficult to write correctly
than a regex for a single line, although experts have no difficulty using just those. But we
seldom acquire skills we can do without :-)
On the other hand, that gives you a chance to learn the splice function ;-)
If the files are basically identical and need only slight customization, you can use the
patch utility with pdsh, but you need to learn the ropes. Like Perl, the patch utility was also
written by Larry Wall and is a very flexible tool for such tasks. You first need to collect the
files from your servers into some central directory with pdsh/pdcp (which I think is a standard
RPM on RHEL and other Linuxes) or another tool, then create diffs against one server to which
you have already applied the change (diff is your command language at this point), verify on
another server that this diff produces the right results, apply it, and then distribute the
resulting files back to each server, again using pdsh/pdcp. If you have a common NFS/GPFS/Lustre
filesystem for all servers, this is even simpler, as you can store both the tree and the diffs
on the common filesystem.
The same central repository of config files can be used with vi and other approaches,
creating a "poor man's Ansible" for you.
Get the details on what's inside your computer from the command line.
16 Sep 2019, Howard Fosdick
The easiest way to do that is with one of the standard Linux GUI programs:
i-nex collects
hardware information and displays it in a manner similar to the popular CPU-Z under Windows.
HardInfo
displays hardware specifics and even includes a set of eight popular benchmark programs you
can run to gauge your system's performance.
KInfoCenter and
Lshw also
display hardware details and are available in many software repositories.
Alternatively, you could open up the box and read the labels on the disks, memory, and other
devices. Or you could enter the boot-time panels -- the so-called UEFI or BIOS panels. Just hit
the proper program
function key during the boot process to access them. These two methods give you hardware
details but omit software information.
Or, you could issue a Linux line command. Wait a minute, that sounds difficult. Why would you
do this?
Sometimes it's easy to find a specific bit of information through a well-targeted line
command. Perhaps you don't have a GUI program available or don't want to install one.
Probably the main reason to use line commands is for writing scripts. Whether you employ the
Linux shell or another programming language, scripting typically requires coding line
commands.
Many line commands for detecting hardware must be issued under root authority. So either
switch to the root user ID, or issue the command under your regular user ID preceded by sudo
:
sudo <the_line_command>
and respond to the prompt for the root password.
This article introduces many of the most useful line commands for system discovery. The
quick reference chart at the end summarizes them.
Hardware overview
There are several line commands that will give you a comprehensive overview of your
computer's hardware.
The inxi command lists details about your system, CPU, graphics, audio, networking, drives,
partitions, sensors, and more. Forum participants often ask for its output when they're trying
to help others solve problems. It's a standard diagnostic for problem-solving:
inxi -Fxz
The -F flag means you'll get full output, x adds details, and z masks out personally
identifying information like MAC and IP addresses.
The hwinfo and lshw commands display much of the same information in different formats:
hwinfo --short
or
lshw -short
The long forms of these two commands spew out exhaustive -- but hard to read -- output:
hwinfo
or
lshw
CPU details
You can learn everything about your CPU through line commands. View CPU details by issuing
either the lscpu command or its close relative lshw :
lscpu
or
lshw -C cpu
In both cases, the last few lines of output list all the CPU's capabilities. Here you can
find out whether your processor supports specific features.
With all these commands, you can reduce verbiage and narrow any answer down to a single
detail by parsing the command output with the grep command. For example, to view only the CPU
make and model:
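One way to do that is to pipe the lshw output through grep (the exact field name can differ between tools and hardware, so treat this as one workable example rather than the only option):
lshw -C cpu | grep -i product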
The -i flag on the grep command simply ensures your search ignores whether the output it
searches is upper or lower case.
Memory
Linux line commands enable you to gather all possible details about your computer's memory.
You can even determine whether you can add extra memory to the computer without opening up the
box.
To list each memory stick and its capacity, issue the dmidecode command:
dmidecode -t memory | grep -i size
For more specifics on system memory, including type, size, speed, and voltage of each RAM
stick, try:
lshw -short -C memory
One thing you'll surely want to know is the maximum memory you can install on your
computer:
dmidecode -t memory | grep -i max
Now find out whether there are any open slots to insert additional memory sticks. You can do
this without opening your computer by issuing this command:
lshw -short -C memory | grep -i empty
A null response means all the memory slots are already in use.
Determining how much video memory you have requires a pair of commands. First, list all
devices with the lspci command and limit the output displayed to the video device you're
interested in:
lspci | grep -i vga
The output line that identifies the video controller will typically look something like
this:
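The exact wording depends on your hardware, but a typical line looks something like the following (the Intel adapter here is just an example; note the device number at the start of the line):
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)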
Now reissue the lspci command, referencing the video device number as the selected
device:
lspci -v -s 00:02.0
The output line identified as prefetchable is the amount of video RAM on your
system:
...
Memory at f0100000 ( 32 -bit, non-prefetchable ) [ size =512K ]
I / O ports at 1230 [ size = 8 ]
Memory at e0000000 ( 32 -bit, prefetchable ) [ size =256M ]
Memory at f0000000 ( 32 -bit, non-prefetchable ) [ size =1M ]
...
Finally, to show current memory use in megabytes, issue:
free -m
This tells how much memory is free, how much is in use, the size of the swap area, and
whether it's being used. For example, the output might look like this:
              total        used        free      shared  buff/cache   available
Mem:          11891        1326        8877         212        1687       10077
Swap:          1999           0        1999
The top command gives you more detail on memory use. It shows current overall memory and CPU
use and also breaks it down by process ID, user ID, and the commands being run. It displays
full-screen text output:
top
Disks, filesystems, and devices
You can easily determine whatever you wish to know about disks, partitions, filesystems, and
other devices.
To display a single line describing each disk device:
lshw -short -C disk
Get details on any specific SATA disk, such as its model and serial numbers, supported
modes, sector count, and more with:
hdparm -i /dev/sda
Of course, you should replace sda with sdb or another device mnemonic if necessary.
To list all disks with all their defined partitions, along with the size of each, issue:
lsblk
For more detail, including the number of sectors, size, filesystem ID and type, and
partition starting and ending sectors:
fdisk -l
To start up Linux, you need to identify mountable partitions to the GRUB bootloader. You can find this
information with the blkid command. It lists each partition's unique identifier (UUID) and its
filesystem type (e.g., ext3 or ext4):
blkid
To list the mounted filesystems, their mount points, and the space used and available for
each (in megabytes):
df -m
Finally, you can list details for all USB and PCI buses and devices with these commands:
lsusb
or
lspci
Network
Linux offers tons of networking line commands. Here are just a few.
To see hardware details about your network card, issue:
lshw -C network
Traditionally, the command to show network interfaces was ifconfig :
ifconfig -a
But many people now use:
ip link show
or
netstat -i
In reading the output, it helps to know common network abbreviations:
Abbreviation         Meaning
lo                   Loopback interface
eth0 or enp*         Ethernet interface
wlan0                Wireless interface
ppp0                 Point-to-Point Protocol interface (used by a dial-up modem, PPTP VPN connection, or USB modem)
vboxnet0 or vmnet*   Virtual machine interface
The asterisks in this table are wildcard characters, serving as a placeholder for whatever
series of characters appear from system to system.
To show your default gateway and routing tables, issue either of these commands:
ip route | column -t
or
netstat -r
Software
Let's conclude with two commands that display low-level software details. For example, what
if you want to know whether you have the latest firmware installed? This command shows the UEFI
or BIOS date and version:
dmidecode -t bios
What is the kernel version, and is it 64-bit? And what is the network hostname? To find out,
issue:
uname -a
Quick reference chart
This chart summarizes all the commands covered in this article:
Building an agentless inventory system for Linux servers from scratch is a very
time-consuming task. To get precise information about your servers' inventory, Ansible comes in
very handy, especially if you are restricted from installing an agent on the servers. However,
there are some pieces of information that Ansible's default fact gathering cannot retrieve.
In that case, a playbook needs to be created to retrieve those pieces of information. Examples
are the VMware Tools version and other application versions which you might want to include in
your inventory system. Since Ansible makes it easy to create JSON files, its output can easily
be turned into other interesting artifacts, say a static HTML page. For that conversion I would
recommend Ansible-CMDB, which is very handy: it allows you to create a pure HTML file based on
the JSON files that were generated by Ansible. Ansible-CMDB is another amazing tool, created by
Ferry Boender.
Let's have a look at how an agentless server inventory with Ansible and Ansible-CMDB works.
It's important to understand the prerequisites needed before installing Ansible. There are
other articles which I published on Ansible:
1. In this article, you will get an overview of what Ansible inventory
is capable of. Start by gathering the information that you will need for your inventory system.
The goal is to make a plan first.
2. As explained in the article Getting started with Ansible
deployment , you have to define a group and record the name of your servers(which can be
resolved through the host file or DNS server) or IP's. Let's assume that the name of the group
is " test ".
3. Launch the following command to see JSON output describing the inventory of the machine.
As you may notice, Ansible fetches all the data:
ansible -m setup test
4. You can also write the output to a specific directory for future use with Ansible-CMDB.
I would advise creating a specific directory (I created /home/Ansible-Workdesk) to avoid
confusion about where the files end up:
ansible -m setup --tree out/ test
5. At this point, you will have several files created in a tree format, i.e., one file per
server, named after the server and containing JSON information about that server's inventory.
Getting Hands-on with Ansible-cmdb
6. Now, you will have to install Ansible-cmdb which is pretty fast and easy. Do make sure
that you follow all the requirements before installation:
git clone https://github.com/fboender/ansible-cmdb
cd ansible-cmdb && make install
7. To convert the JSON files into HTML, use the following command:
ansible-cmdb -t html_fancy_split out/
8. You should notice a directory called "cmdb" which contains some HTML files. Open
index.html to view your server inventory system.
Tweaking the default template
9. As mentioned previously, there is some information that is not available by default in
the index.html template. You can tweak the
/usr/local/lib/ansible-cmdb/ansiblecmdb/data/tpl/html_fancy_defs.html page and add more
content, for example, the 'uptime' of the servers. To make the "Uptime" column visible, add the
corresponding line in the "Column definitions" section of that template.
It's easier than you think to get started automating your tasks with Ansible. This gentle introduction
gives you the basics you need to begin streamlining your administrative life.
In the end of 2015 and the beginning of 2016, we decided to use
Red Hat Enterprise Linux (RHEL)
as our third operating system, next to Solaris and Microsoft Windows. I was part of the team that
tested RHEL, among other distributions, and would engage in the upcoming operation of the new OS.
Thinking about a fast-growing number of Red Hat Enterprise Linux systems, it came to my mind that I
needed a tool to automate things because without automation the number of hosts I can manage is
limited.
I had experience with Puppet back in the day but did not like that tool because of its complexity.
We had more modules and classes than hosts to manage back then. So, I took a look at
Ansible
version 2.1.1.0 in July 2016.
What I liked about Ansible and still do is that it is push-based. On a target node, only Python and
SSH access are needed to control the node and push configuration settings to it. No agent needs to be
removed if you decide that Ansible isn't the right tool for you. The
YAML
syntax is easy to read and
write, and the option to use playbooks as well as ad hoc commands makes Ansible a flexible solution
that helps save time in our day-to-day business. So, it was at the end of 2016 when we decided to
evaluate Ansible in our environment.
First steps
As a rule of thumb, you should begin automating things that you have to do on a daily or at least a
regular basis. That way, automation saves time for more interesting or more important things. I
followed this rule by using Ansible for the following tasks:
Set a baseline configuration for newly provisioned hosts (set DNS, time, network, sshd, etc.)
Test how useful the ad hoc commands are, and where we could benefit from them.
Baseline Ansible configuration
For us,
baseline configuration
is the configuration every newly provisioned host gets.
This practice makes sure the host fits into our environment and is able to communicate on the network.
Because the same configuration steps have to be made for each new host, this is an awesome step to get
started with automation.
For us, the baseline includes tasks like these:
Make sure a certain set of packages is installed
Configure Postfix to be able to send mail in our environment
Configure firewalld
Configure SELinux
(Some of these steps are already published here on Enable Sysadmin, as you can see, and others
might follow soon.)
All of these tasks have in common that they are small and easy to start with, letting you gather
experience with using different kinds of Ansible modules, roles, variables, and so on. You can run
each of these roles and tasks standalone, or tie them all together in one playbook that sets the
baseline for your newly provisioned system.
Red Hat Enterprise Linux Server
patch management with Ansible
As I explained on my GitHub page for
ansible-role-rhel-patchmanagement
, in our environment, we deploy Red Hat Enterprise Linux Servers
for our operating departments to run their applications.
This role was written to provide a mechanism to install Red Hat Security Advisories on target nodes
once a month. In our special use case, only RHSAs are installed to ensure a minimum security limit.
The installation is enforced once a month. The advisories are summarized in "Patch-Sets." This way, it
is ensured that the same advisories are used for all stages during a patch cycle.
The Ansible Inventory nodes are summarized in one of the following groups, each of which defines
when a node is scheduled for patch installation:
[rhel-patch-phase1] - On the second Tuesday of a month.
[rhel-patch-phase2] - On the third Tuesday of a month.
[rhel-patch-phase3] - On the fourth Tuesday of a month.
[rhel-patch-phase4] - On the fourth Wednesday of a month.
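A matching inventory might look like this sketch (hostnames are placeholders):
[rhel-patch-phase1]
test01.example.com
test02.example.com

[rhel-patch-phase2]
qa01.example.com

[rhel-patch-phase3]
prod01.example.com

[rhel-patch-phase4]
prod02.example.com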
If packages were updated on a target node, the host reboots afterward.
Because the production systems are most important, they are divided into two separate groups
(phase3 and phase4) to decrease the risk of failure and service downtime due to advisory installation.
Updating and patch management are tasks every sysadmin has to deal with. With these roles, Ansible
helped me get this task done every month, and I don't have to care about it anymore. Only when a
system is not reachable, or yum has a problem, do I get an email report telling me to take a look.
But I have been lucky and have not received a single mail report in the last couple of months. (And yes,
the system is able to send mail.)
Ad hoc commands
The possibility to run ad hoc commands for quick (and dirty) tasks was one of the reasons I chose
Ansible. You can use these commands to gather information when you need them or to get things done
without the need to write a playbook first.
I used ad hoc commands in cron jobs until I found the time to write playbooks for them. But, with
time comes practice, and today I try to use playbooks and roles for every task that has to run more
than once.
Here are small examples of ad hoc commands that provide quick information about your nodes.
Query the OS release in use by nodes
ansible all -m command -a '/usr/bin/cat /etc/os-release'
Query running kernel version
ansible all -m command -a '/usr/bin/uname -r'
Query DNS servers in use by nodes
ansible all -m command -a '/usr/bin/cat /etc/resolv.conf' | grep 'SUCCESS\|nameserver'
Hopefully, these samples give you an idea of what ad hoc commands can be used for.
Summary
It's not hard to start with automation. Just look for small and easy tasks you do every single day,
or even more than once a day, and let Ansible do these tasks for you.
Eventually, you will be able to solve more complex tasks as your automation skills grow. But keep
things as simple as possible. You gain nothing when you have to troubleshoot a playbook for three days
when it solves a task you could have done in an hour.
10 Ansible modules you need to know
See examples and learn the most important modules for automating everyday tasks with Ansible. 11 Sep 2019, DirectedSoul (Red Hat)
Ansible is an open source IT
configuration management and automation platform. It uses human-readable YAML templates so
users can program repetitive tasks to happen automatically without having to learn an advanced
programming language.
Ansible is agentless, which means the nodes it manages do not require any software to be
installed on them. This eliminates potential security vulnerabilities and makes overall
management smoother.
Ansible modules are standalone
scripts that can be used inside an Ansible playbook. A playbook consists of a play, and a play
consists of tasks. These concepts may seem confusing if you're new to Ansible, but as you begin
writing and working more with playbooks, they will become familiar.
There are some modules that are frequently used in automating everyday tasks; those are
the ones that we will cover in this article.
Ansible has three main files that you need to consider:
Host/inventory file: Contains the entry of the nodes that need to be managed
ansible.cfg file: Located by default at /etc/ansible/ansible.cfg, it has the necessary
privilege escalation options and the location of the inventory file
Main file: A playbook that has modules that perform various tasks on a host listed in an
inventory or host file
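To make these three pieces concrete, a minimal setup might look like the sketch below (hostnames and the playbook name are placeholders):
# /etc/ansible/hosts -- the inventory: nodes to be managed
[webservers]
web01.example.com
web02.example.com

# /etc/ansible/ansible.cfg -- points at the inventory and sets privilege escalation
[defaults]
inventory = /etc/ansible/hosts

[privilege_escalation]
become = True

# site.yml -- the "main file": a playbook that runs tasks against inventory hosts
---
- name: Check connectivity to all webservers
  hosts: webservers
  tasks:
    - name: Ping the managed nodes
      ping: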
Module 1: Package management
There is a module for most popular package managers, such as DNF and APT, to enable you to
install any package on a system. Functionality depends entirely on the package manager, but
usually these modules can install, upgrade, downgrade, remove, and list packages. The names of
relevant modules are easy to guess. For example, the DNF module is dnf, the old YUM
module (required for Python 2 compatibility) is yum, the APT module is apt, the Slackpkg
module is slackpkg, and so on.
Example 1:
- name: Install the latest version of Apache and MariaDB
  dnf:
    name:
      - httpd
      - mariadb-server
    state: latest
This installs the Apache web server and the MariaDB SQL database.
Example 2:
- name: Install a list of packages
  yum:
    name:
      - nginx
      - postgresql
      - postgresql-server
    state: present
This installs the listed packages and shows how a single task can install multiple packages.
Module 2:
Service
After installing a package, you need a module to start it. The service module
enables you to start, stop, and reload installed services; this comes in pretty handy.
Example 1:
- name: Start service foo, based on running process /usr/bin/foo
  service:
    name: foo
    pattern: /usr/bin/foo
    state: started
This starts the service foo .
Example 2:
- name: Restart network service for interface eth0
  service:
    name: network
    state: restarted
    args: eth0
This restarts the network service of the interface eth0 .
Module 3: Copy
The copy module copies a
file from the local or remote machine to a location on the remote machine.
Example 1:
- name: Copy a new "ntp.conf" file into place, backing up the original if it differs from the copied version
  copy:
    src: /mine/ntp.conf
    dest: /etc/ntp.conf
    owner: root
    group: root
    mode: '0644'
    backup: yes
Example 2:
- name: Copy file with owner and permissions, using symbolic representation
  copy:
    src: /srv/myfiles/foo.conf
    dest: /etc/foo.conf
    owner: foo
    group: foo
    mode: u=rw,g=r,o=r
Module 4: Debug
The debug module prints
statements during execution and can be useful for debugging variables or expressions without
having to halt the playbook.
Example 1:
- name: Display all variables/facts known for a host
  debug:
    var: hostvars[inventory_hostname]
    verbosity: 4
This displays all the variable information for a host that is defined in the inventory
file.
Example 2:
- name: Write some content in a file /tmp/foo.txt
  copy:
    dest: /tmp/foo.txt
    content: |
      Good Morning!
      Awesome sunshine today.
  register: display_file_content
- name: Debug display_file_content
  debug:
    var: display_file_content
    verbosity: 2
This registers the content of the copy module output and displays it only when you specify
verbosity as 2. For example:
ansible-playbook demo.yaml -vv
Module 5: File
The file module manages the
file and its properties.
It sets attributes of files, symlinks, or directories.
It also removes files, symlinks, or directories.
Example 1:
- name: Change file ownership, group, and permissions
  file:
    path: /etc/foo.conf
    owner: foo
    group: foo
    mode: '0644'
This sets the owner, group, and permissions of the file foo.conf to 0644.
Example 2:
- name: Create a directory if it does not exist
  file:
    path: /etc/some_directory
    state: directory
    mode: '0755'
This creates a directory named some_directory and sets the permission to 0755 .
Module 6: Lineinfile
The lineinfile module ensures a particular line is in a file, or replaces an existing line using a back-referenced regular expression.
It's primarily useful when you want to change just a single line in a file.
Example 1:
- name: Ensure SELinux is set to enforcing mode
  lineinfile:
    path: /etc/selinux/config
    regexp: '^SELINUX='
    line: SELINUX=enforcing
This sets the value of SELINUX=enforcing .
Example 2:
- name: Add a line to a file if the file does not exist, without passing regexp
  lineinfile:
    path: /etc/resolv.conf
    line: 192.168.1.99 foo.lab.net foo
    create: yes
This adds an entry for the IP and hostname in the resolv.conf file.
Module 7: Git
The git module
manages git checkouts of repositories to deploy files or software.
Example 1:
# Create a git archive from a repo
- git:
    repo: https://github.com/ansible/ansible-examples.git
    dest: /src/ansible-examples
    archive: /tmp/ansible-examples.zip
Example 2:
- git:
    repo: https://github.com/ansible/ansible-examples.git
    dest: /src/ansible-examples
    separate_git_dir: /src/ansible-examples.git
This clones a repo with a separate Git directory.
Module 8: Cli_config
The cli_config module, first available in Ansible 2.7, provides a platform-agnostic way of pushing
text-based configurations to network devices over the network_cli connection plugin.
Example 1:
- name: Commit with comment
  cli_config:
    config: set system host-name foo
    commit_comment: this is a test
This sets the hostname for a switch and exits with a commit message.
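A minimal sketch of a second example, using the module's backup option (the filename and directory are placeholders):
Example 2:
- name: Backup the device configuration to a different destination file
  cli_config:
    backup: yes
    backup_options:
      filename: backup.cfg
      dir_path: /home/user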
This backs up a config to a different destination file.
Module 9: Archive
The archive module
creates a compressed archive of one or more files. By default, it assumes the compression
source exists on the target.
Example 1:
- name: Compress directory /path/to/foo/ into /path/to/foo.tgz
  archive:
    path: /path/to/foo
    dest: /path/to/foo.tgz
Example 2:
- name: Create a bz2 archive of multiple files, rooted at /path
  archive:
    path:
      - /path/to/foo
      - /path/wong/foo
    dest: /path/file.tar.bz2
    format: bz2
Module 10: Command
One of the most basic but useful modules, the command module takes
the command name followed by a list of space-delimited arguments.
Example 1:
- name: Return motd to registered var
  command: cat /etc/motd
  register: mymotd
Example 2:
- name: Change the working directory to somedir/ and run the command as db_owner if /path/to/database does not exist
  command: /usr/bin/make_database.sh db_user db_name
  become: yes
  become_user: db_owner
  args:
    chdir: somedir/
    creates: /path/to/database
Conclusion
There are tons of modules available in Ansible, but these ten are the most basic and
powerful ones you can use for an automation job. As your requirements change, you can learn
about other useful modules by entering ansible-doc <module-name> on the command line or by referring to the official documentation.
Ansible
is a multiplier, a tool that automates
and scales infrastructure of every size. It is considered to be a configuration management,
orchestration, and deployment tool. It is easy to get up and running with Ansible. Even a new sysadmin
could start automating with Ansible in a matter of a few hours.
Ansible automates using the SSH
protocol. The control machine uses an SSH connection to communicate with its target hosts, which are
typically Linux hosts. If you're a Windows sysadmin, you can still use Ansible to automate your
Windows environments using
WinRM
as opposed
to SSH. Presently, though, the control machine still needs to run Linux.
As a new sysadmin, you might start with just a few playbooks. But as your automation skills
continue to grow, and you become more familiar with Ansible, you will learn best practices and further
realize that as your playbooks increase, using
Ansible Galaxy
becomes invaluable.
In this article, you will learn a bit about Ansible Galaxy, its structure, and how and when you can
put it to use.
What Ansible does
Common sysadmin tasks that can be performed with Ansible include patching, updating systems, user
and group management, and provisioning. Ansible presently has a huge footprint in IT automation -- if not
the largest -- and is considered to be the most popular and widely used configuration
management, orchestration, and deployment tool available today.
One of the main reasons for its popularity is its simplicity. It's simple, powerful, and agentless,
which means a new or entry-level sysadmin can hit the ground automating in a matter of hours. Ansible
allows you to scale quickly, efficiently, and cross-functionally.
Create roles with Ansible Galaxy
Ansible Galaxy is essentially a large public repository of Ansible roles. Roles ship with READMEs
detailing the role's use and available variables. Galaxy contains a large number of roles that are
constantly evolving and increasing.
Galaxy can use git to add other role sources, such as GitHub. You can initialize a new galaxy role
using
ansible-galaxy init
, or you can install a role directly from the Ansible Galaxy
role store by executing the command
ansible-galaxy install <name of role>
.
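For instance, roles from Galaxy and from a Git source can be listed in a requirements.yml file and installed in one step with ansible-galaxy install -r requirements.yml (the role names and repository URL below are placeholders):
# requirements.yml -- a sketch; role names and the repository URL are placeholders
# A role from the Ansible Galaxy role store
- src: geerlingguy.ntp

# A role pulled directly from a Git repository
- src: https://github.com/example/ansible-role-example.git
  scm: git
  version: master
  name: example-role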
Here are some helpful
ansible-galaxy
commands you might use from time to time:
ansible-galaxy list
displays a list of installed roles, with version numbers.
ansible-galaxy remove <role>
removes an installed role.
ansible-galaxy info
provides a variety of information about Ansible Galaxy.
ansible-galaxy init
can be used to create a role template suitable for submission
to Ansible Galaxy.
To create an Ansible role using Ansible Galaxy, we need to use the
ansible-galaxy
command and its templates. Roles must be downloaded before they can be used in playbooks, and they are
placed into the default directory
/etc/ansible/roles
. You can find role examples at
https://galaxy.ansible.com/geerlingguy.
Create collections
While Ansible Galaxy has been the go-to tool for constructing and managing roles, with new
iterations of Ansible you are bound to see changes or additions. On Ansible version 2.8 you get the
new feature of
collections
.
What are collections and why are they worth mentioning? As the Ansible documentation states:
Collections are a distribution format for Ansible content. They can be used to package and
distribute playbooks, roles, modules, and plugins.
The ansible-galaxy collection command implements the following subcommands. Notably, a few of them
are the same as those used with ansible-galaxy:
init
creates a basic collection skeleton based on the default template included
with Ansible, or your own template.
build
creates a collection artifact that can be uploaded to Galaxy, or your own
repository.
publish
publishes a built collection artifact to Galaxy.
install
installs one or more collections.
To determine what can go into a collection, the Ansible documentation is a great resource.
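As a rough illustration, running ansible-galaxy collection init my_namespace.my_collection (the namespace and collection name are placeholders) produces a skeleton similar to:
my_namespace/
└── my_collection/
    ├── README.md
    ├── galaxy.yml        # collection metadata: namespace, name, version, dependencies
    ├── docs/
    ├── plugins/
    │   └── README.md
    └── roles/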
Conclusion
Establish yourself as a stellar sysadmin with an automation solution that is simple, powerful,
agentless, and scales your infrastructure quickly and efficiently. Using Ansible Galaxy to create
roles is superb thinking, and an ideal way to be organized and thoughtful in managing your
ever-growing playbooks.
The only way to improve your automation skills is to work with a dedicated tool and prove the value
and positive impact of automation on your infrastructure.
Dell EMC OpenManage Ansible Modules provide customers the ability to automate the out-of-band configuration management, deployment,
and updates of Dell EMC PowerEdge Servers using Ansible, by leveraging the management automation built into the iDRAC with Lifecycle
Controller. iDRAC provides both REST APIs based on the DMTF Redfish industry standard and WS-Management (WS-MAN) for management automation
of PowerEdge Servers.
With OpenManage Ansible modules, you can do:
Server administration
Configure iDRAC's settings such as:
iDRAC Network Settings
SNMP and SNMP Alert Settings
Timezone and NTP Settings
System settings such as server topology
LC attributes such as CSIOR etc.
Perform User administration
BIOS and Boot Order configuration
RAID Configuration
OS Deployment
Firmware Updates
1.1 How do OpenManage Ansible Modules work?
OpenManage Ansible modules extensively use the Server Configuration Profile (SCP) for most of the configuration management, deployment,
and updating of PowerEdge Servers. Lifecycle Controller 2 version 1.4 and later adds support for SCP. An SCP contains all BIOS, iDRAC,
Lifecycle Controller, Network, and Storage settings of a PowerEdge server and can be applied to multiple servers, enabling rapid,
reliable, and reproducible configuration.
A SCP operation can be performed using any of the following methods:
Export/Import to/from a remote network share via CIFS, NFS
Export/Import to/from a remote network share via HTTP, HTTPS (iDRAC firmware 3.00.00.00 and above)
Export/Import to/from via local file streaming (iDRAC firmware 3.00.00.00 and above)
NOTE: This BETA release of the OpenManage Ansible Modules supports only the first option listed above for SCP operations, i.e., export/import
to/from a remote network share via CIFS or NFS. Future releases will support all of the options for SCP operations.
Setting up a local mount point for a remote network share
Since the OpenManage Ansible modules rely extensively on SCP to automate and orchestrate configuration, deployment, and updates on PowerEdge
servers, you must locally mount the remote network share (CIFS or NFS) on the Ansible server where you will be executing the playbooks
or modules. The local mount point must also have read-write privileges so the OpenManage Ansible modules can write an SCP file
to the remote network share, which is then imported by the iDRAC.
You can use either of the following ways to set up a local mount point:
Use the mount command to mount a remote network share
# Mount a remote CIFS network share on the local ansible machine.
# In the below command, 192.168.10.10 is the IP address of the CIFS file
# server (you can provide a hostname as well), Share is the directory that
# is being shared, and /mnt/CIFS is the location to mount the file system
# on the local ansible machine
sudo mount -t cifs \\\\192.168.10.10\\Share -o username=user1,password=password,dir_mode=0777,file_mode=0666 /mnt/CIFS
# Mount a remote NFS network share on the local ansible machine.
# In the below command, 192.168.10.10 is the IP address of the NFS file
# server (you can provide a hostname as well), Share is the directory that
# is being exported, and /mnt/NFS is the location to mount the file system
# on the local ansible machine. Please note that NFS checks access
# permissions against user IDs (UIDs). To grant read-write
# privileges on the local mount point, the UID and GID of the user on your
# local ansible machine need to match the UID and GID of the owner of the
# folder you are trying to access on the server. Another option for granting
# the rw privileges would be to use the all_squash export option.
sudo mount -t nfs 192.168.10.10:/Share /mnt/NFS -o rw,user,auto
An alternate and preferred way is to use /etc/fstab for mounting the remote network share. That way, you
won't have to re-mount the network share after a reboot or remember all the options. The general syntax for mounting the network share
in /etc/fstab is as follows:
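Typical entries (server addresses, shares, credentials, and mount points are placeholders) would look something like:
# CIFS share
//192.168.10.10/Share   /mnt/CIFS   cifs   credentials=/root/.cifscreds,dir_mode=0777,file_mode=0666   0 0
# NFS share
192.168.10.10:/Share    /mnt/NFS    nfs    rw,user,auto   0 0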
We take our first glimpse at the Ansible
documentation on the official website. While Ansible can be overwhelming with so many
immediate options, let's break down what is presented to us here. Putting our attention on the
page's main pane, we are given five offerings from Ansible. This pane is a central location, or
one-stop-shop, to maneuver through the documentation for products like Ansible Tower, Ansible
Galaxy, and Ansible Lint.
We can even dive into Ansible Network for specific module documentation that extends the
power and ease of Ansible automation to network administrators. The focal point of the rest of
this article will be around Ansible Project, to give us a great starting point into our
automation journey.
Once we click the Ansible Documentation tile under the Ansible Project section, the first
action we should take is to ensure we are viewing the documentation's correct version. We can
get our current version of Ansible from our control node's command line by running
ansible --version . Armed with the version information provided by the output, we
can select the matching version in the site's upper-left-hand corner using the drop-down menu,
which by default says latest.
10 YAML tips for people who hate YAML
Do you hate YAML? These tips might ease your pain.
Posted June 10, 2019 by Seth Kenlon (Red Hat)
There are lots of formats for configuration files: a list of values, key and value pairs, INI files,
YAML, JSON, XML, and many more. Of these, YAML sometimes gets cited as a particularly difficult one to
handle for a few different reasons. While its ability to reflect hierarchical values is significant
and its minimalism can be refreshing to some, its Python-like reliance upon syntactic whitespace can
be frustrating.
However, the open source world is diverse and flexible enough that no one has to
suffer through abrasive technology, so if you hate YAML, here are 10 things you can (and should!) do
to make it tolerable. Starting with zero, as any sensible index should.
0. Make your editor do the work
Whatever text editor you use probably has plugins to make dealing with syntax easier. If you're not
using a YAML plugin for your editor, find one and install it. The effort you spend on finding a plugin
and configuring it as needed will pay off tenfold the very next time you edit YAML.
For example, the
Atom
editor comes with a YAML mode
by default, and while GNU Emacs ships with minimal support, you can add additional packages like
yaml-mode
to help.
Emacs
in YAML and whitespace mode.
If your favorite text editor lacks a YAML mode, you can address some of your grievances with small
configuration changes. For instance, the default text editor for the GNOME desktop, Gedit, doesn't
have a YAML mode available, but it does provide YAML syntax highlighting by default and features
configurable tab width:
Configuring
tab width and type in Gedit.
With the
drawspaces
Gedit plugin package, you can make white space visible in the form of leading
dots, removing any question about levels of indentation.
Take some time to research your favorite text editor. Find out what the editor, or its community,
does to make YAML easier, and leverage those features in your work. You won't be sorry.
1. Use a linter
Ideally, programming languages and markup languages use predictable syntax. Computers tend to do
well with predictability, so the concept of a
linter
was invented in 1978. If you're not using a linter for YAML, then it's time to adopt this 40-year-old
tradition and use
yamllint
.
Invoking
yamllint
is as simple as telling it to check a file. Here's an example of
yamllint
's response to a YAML file containing an error:
$ yamllint errorprone.yaml
errorprone.yaml
23:10 error syntax error: mapping values are not allowed here
23:11 error trailing spaces (trailing-spaces)
That's not a time stamp on the left. It's the error's line and column number. You may or may not
understand what error it's talking about, but now you know the error's location. Taking a second look
at the location often makes the error's nature obvious. Success is eerily silent, so if you want
feedback based on the lint's success, you can add a conditional second command with a double-ampersand
(&&). In a POSIX shell, the command after && runs only if the preceding command exits with 0, so upon
success your echo command makes that clear. This tactic is somewhat superficial, but some users prefer
the assurance that the command did run correctly, rather than failing silently. Here's an example:
$ yamllint perfect.yaml && echo "OK"
OK
The reason
yamllint
is so silent when it succeeds is that it returns 0 errors when
there are no errors.
2. Write in Python, not YAML
If you really hate YAML, stop writing in YAML, at least in the literal sense. You might be stuck
with YAML because that's the only format an application accepts, but if the only requirement is to end
up in YAML, then work in something else and then convert. Python, along with the excellent
pyyaml
library, makes this easy, and you have two
methods to choose from: self-conversion or scripted.
Self-conversion
In the self-conversion method, your data files are also Python scripts that produce YAML. This
works best for small data sets. Just write your JSON data into a Python variable, prepend an
import
statement, and end the file with a simple three-line output statement.
#!/usr/bin/python3
import yaml
d={
"glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Standard Generalized Markup Language",
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-markup language, used to create markup languages such as DocBook.",
"GlossSeeAlso": ["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
}
f=open('output.yaml','w')
f.write(yaml.dump(d))
f.close()
Run the file with Python to produce a file called output.yaml:
$ python3 ./example.json
$ cat output.yaml
glossary:
GlossDiv:
GlossList:
GlossEntry:
Abbrev: ISO 8879:1986
Acronym: SGML
GlossDef:
GlossSeeAlso: [GML, XML]
para: A meta-markup language, used to create markup languages such as DocBook.
GlossSee: markup
GlossTerm: Standard Generalized Markup Language
ID: SGML
SortAs: SGML
title: S
title: example glossary
This output is perfectly valid YAML, although
yamllint
does issue a warning that the
file is not prefaced with
---
, which is something you can adjust either in the Python
script or manually.
Scripted conversion
In this method, you write in JSON and then run a Python conversion script to produce YAML. This
scales better than self-conversion, because it keeps the converter separate from the data.
Create a JSON file and save it as
example.json
. Here is an example from
json.org
:
{
"glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Standard Generalized Markup Language",
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-markup language, used to create markup languages such as DocBook.",
"GlossSeeAlso": ["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
}
Create a simple converter and save it as
json2yaml.py
. This script imports both the
YAML and JSON Python modules, loads a JSON file defined by the user, performs the conversion, and then
writes the data to
output.yaml
.
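A minimal sketch matching that description (the exact code is an assumption; only the output filename follows the surrounding text) could be:
#!/usr/bin/python3
# json2yaml.py -- sketch: read a JSON file named on the command line and
# write the equivalent YAML to output.yaml
import sys
import json
import yaml

# The input file is the first command-line argument, e.g. example.json
with open(sys.argv[1]) as infile:
    data = json.load(infile)

# Convert the parsed data structure to YAML and write it out
with open('output.yaml', 'w') as outfile:
    yaml.dump(data, outfile, default_flow_style=False)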
Save this script in your system path, and execute as needed:
$ ~/bin/json2yaml.py example.json
3. Parse early, parse often
Sometimes it helps to look at a problem from a different angle. If your problem is YAML, and you're
having a difficult time visualizing the data's relationships, you might find it useful to restructure
that data, temporarily, into something you're more familiar with.
If you're more comfortable with dictionary-style lists or JSON, for instance, you can convert YAML
to JSON in two commands using an interactive Python shell. Assume your YAML file is called
mydata.yaml
.
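In an interactive Python shell (assuming the pyyaml library is installed), something like the following two commands does the job:
>>> import yaml, json
>>> print(json.dumps(yaml.safe_load(open('mydata.yaml')), indent=2))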
There are many other examples, and there are plenty of online converters and local parsers, so
don't hesitate to reformat data when it starts to look more like a laundry list than markup.
4. Read the spec
After I've been away from YAML for a while and find myself using it again, I go straight back to
yaml.org
to re-read the spec. If
you've never read the specification for YAML and you find YAML confusing, a glance at the spec may
provide the clarification you never knew you needed. The specification is surprisingly easy to read,
with the requirements for valid YAML spelled out with lots of examples in
chapter 6
.
5. Pseudo-config
Before I started writing my book, Developing Games on the Raspberry Pi (Apress, 2019), the publisher
asked me for an outline. You'd think an outline would be
easy. By definition, it's just the titles of chapters and sections, with no real content. And yet, out
of the 300 pages published, the hardest part to write was that initial outline.
YAML can be the same way. You may have a notion of the data you need to record, but that doesn't
mean you fully understand how it's all related. So before you sit down to write YAML, try doing a
pseudo-config instead.
A pseudo-config is like pseudo-code. You don't have to worry about structure or indentation,
parent-child relationships, inheritance, or nesting. You just create iterations of data in the way you
currently understand it inside your head.
A
pseudo-config.
Once you've got your pseudo-config down on paper, study it, and transform your results into valid
YAML.
6. Resolve the spaces vs. tabs debate
OK, maybe you won't
definitively
resolve the
spaces-vs-tabs debate
,
but you should at least resolve the debate within your project or organization. Whether you resolve
this question with a post-process
sed
script, text editor configuration, or a blood-oath
to respect your linter's results, anyone in your team who touches a YAML project must agree to use
spaces (in accordance with the YAML spec).
Any good text editor allows you to define a number of spaces instead of a tab character, so the
choice shouldn't negatively affect fans of the
Tab
key.
Tabs and spaces are, as you probably know all too well, essentially invisible. And when something
is out of sight, it rarely comes to mind until the bitter end, when you've tested and eliminated all
of the "obvious" problems. An hour wasted to an errant tab or group of spaces is your signal to create
a policy to use one or the other, and then to develop a fail-safe check for compliance (such as a Git
hook to enforce linting).
7. Less is more (or more is less)
Some people like to write YAML to emphasize its structure. They indent vigorously to help
themselves visualize chunks of data. It's a sort of cheat to mimic markup languages that have explicit
delimiters.
For some users, this approach is a helpful way to lay out a YAML document, while other users miss
the structure for the void of seemingly gratuitous white space.
If you own and maintain a YAML document, then
you
get to define what "indentation" means.
If blocks of horizontal white space distract you, then use the minimal amount of white space required
by the YAML spec. For example, the same YAML from the Ansible documentation can be represented with
fewer indents without losing any of its validity or meaning:
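As an illustration of the idea (this small playbook is a placeholder, not the snippet from the Ansible documentation), both of the following are valid and equivalent YAML:
# Generously indented
- hosts: webservers
  tasks:
      - name: Ensure httpd is installed
        yum:
            name: httpd
            state: present

# Minimal indentation -- same structure, same meaning
- hosts: webservers
  tasks:
  - name: Ensure httpd is installed
    yum:
      name: httpd
      state: present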
I'm a big fan of repetition breeding familiarity, but sometimes repetition just breeds repeated
stupid mistakes. Luckily, a clever peasant woman experienced this very phenomenon back in 396 AD
(don't fact-check me), and invented the concept of the
recipe
.
If you find yourself making YAML document mistakes over and over, you can embed a recipe or
template in the YAML file as a commented section. When you're adding a section, copy the commented
recipe and overwrite the dummy data with your new real data. For example:
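As a sketch of the idea, with placeholder keys and values:
# Recipe: copy the commented block below, uncomment it, and replace the dummy values
# - name: dummy
#   ip: 0.0.0.0
#   role: dummy

- name: web01
  ip: 192.168.1.10
  role: webserver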
I'm a fan of YAML, generally, but sometimes YAML isn't the answer. If you're not locked into YAML
by the application you're using, then you might be better served by some other configuration format.
Sometimes config files outgrow themselves and are better refactored into simple Lua or Python scripts.
YAML is a great tool and is popular among users for its minimalism and simplicity, but it's not the
only tool in your kit. Sometimes it's best to part ways. One of the benefits of YAML is that parsing
libraries are common, so as long as you provide migration options, your users should be able to adapt
painlessly.
If YAML is a requirement, though, keep these tips in mind and conquer your YAML hatred once and for
all!
Ansible
is an open source tool for software provisioning, application deployment, orchestration,
configuration, and administration. Its purpose is to help you automate your configuration processes
and simplify the administration of multiple systems. Thus, Ansible essentially pursues the same goals
as Puppet, Chef, or Saltstack.
What I like about Ansible is that it's flexible, lean, and easy to start with. In most use cases,
it keeps the job simple.
I chose to use Ansible back in 2016 because no agent has to be installed on the managed nodes -- a
node is what Ansible calls a managed remote system. All you need to start managing a remote system
with Ansible is SSH access to the system, and Python installed on it. Python is preinstalled on most
Linux systems, and I was already used to managing my hosts via SSH, so I was ready to start right
away. And if the day comes where I decide not to use Ansible anymore, I just have to delete my Ansible
controller machine (control node) and I'm good to go. There are no agents left on the managed nodes
that have to be removed.
Ansible offers two ways to control your nodes. The first one uses
playbooks
.
These are simple ASCII files written in
Yet Another Markup Language (YAML)
, which is easy to read and write. And second, there are the
ad-hoc
commands
, which allow you to run a command or
module
without having to create a playbook first.
You organize the hosts you would like to manage and control in an
inventory
file, which offers flexible format options. For example, this could be an INI-like file
that looks like:
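A minimal INI-like inventory (group names and hostnames are placeholders) could be:
[mailservers]
mail.example.com

[webservers]
web01.example.com
web02.example.com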
I would like to give you two small examples of how to use Ansible. I started with these really
simple tasks before I used Ansible to take control of more complex tasks in my infrastructure.
Ad-hoc: Check if Ansible can remote manage
a system
As you might recall from the beginning of this article, all you need to manage a remote host is SSH
access to it, and a working Python interpreter on it. To check if these requirements are fulfilled,
run the following ad-hoc command against a host from your inventory:
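For example, using the ping module:
[jkastning@ansible]$ ansible mail.example.com -m ping
A "pong" reply means both SSH access and the Python interpreter on the node are working.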
Playbook: Keep installed packages up to date
This example shows how to use a playbook to keep installed packages up to date. The playbook is an
ASCII text file which looks like this:
---
# Make sure all packages are up to date
- name: Update your system
  hosts: mail.example.com
  tasks:
    - name: Make sure all packages are up to date
      yum:
        name: "*"
        state: latest
Now, we are ready to run the playbook:
[jkastning@ansible]$ ansible-playbook yum_update.yml
PLAY [Update your system] **************************************************************************
TASK [Gathering Facts] *****************************************************************************
ok: [mail.example.com]
TASK [Make sure all packages are up to date] *******************************************************
ok: [mail.example.com]
PLAY RECAP *****************************************************************************************
mail.example.com : ok=2 changed=0 unreachable=0 failed=0
Here, everything is OK and there is nothing else to do; all installed packages are already at the
latest version.
Today, Ansible saves me a lot of time and supports my day-to-day work tasks quite well. So what are
you waiting for? Try it, use it, and feel a bit more comfortable at work.
Author: Jörg Kastning. Jörg has been a sysadmin for over ten years. He is a member of the Red Hat
Accelerators and runs his own blog at https://www.my-it-brain.de.
Objective
Our goal is to build rpm packages with custom content, unifying scripts across any number of systems, including versioning, deployment and undeployment.
Operating System and Software Versions
Operating system: Red Hat Enterprise Linux 7.5
Software: rpm-build 4.11.3+
Requirements
Privileged access to the system for install, normal access for build.
Difficulty
MEDIUM
Conventions
# - requires given linux commands to be executed with root privileges either directly as a root user or by use of sudo command
$ - given linux commands to be executed as a regular non-privileged user
Introduction
One of the core features of any Linux system is that it is built for automation. If a task may need
to be executed more than once - even with some part of it changing on the next run - a sysadmin is
provided with countless tools to automate it, from simple shell scripts run by hand on demand (thus
eliminating typos, or saving some keystrokes) to complex scripted systems where tasks run from cron
at a specified time, interacting with each other, working with the results of other scripts, perhaps
controlled by a central management system, and so on.
While this freedom and rich toolset indeed add to productivity, there is a catch: as a
sysadmin, you write a useful script on a system, which proves to be useful on another, so you
copy the script over. On a third system the script is useful too, but with a minor modification -
maybe a new feature useful only on that system, reachable with a new parameter. With generalization in
mind, you extend the script to provide the new feature and still complete the task it was originally
written for. Now you have two versions of the script: the first is on the first two systems, the
second is on the third system.
You have 1024 computers running in the datacenter, and 256 of them will need some of the
functionality provided by that script. In time you will have 64 versions of the script all
over, every version doing its job. On the next system deployment you need a feature you recall
coding in some version, but which one? And on which systems is it?
On RPM-based systems, such as the Red Hat flavors, a sysadmin can take advantage of the package
manager to create order in the custom content, including simple shell scripts that provide nothing
but the tools the admin wrote for convenience.
In this tutorial we will build a custom rpm for Red Hat Enterprise Linux 7.5 containing two
bash scripts, parselogs.sh and pullnews.sh to provide a
way that all systems have the latest version of these scripts in the
/usr/local/sbin directory, and thus on the path of any user who logs in to the
system.
Distributions, major and minor versions
In general, the minor and major version of the build machine should be the same as those of the
systems the package is to be deployed on, as should the distribution, to ensure compatibility. If
there are various versions of a given distribution, or even different distributions with many
versions in your environment (oh, joy!), you should set up build machines for each. To cut the work
short, you can just set up a build environment for each distribution and each major version, and have
them on the lowest minor version existing in your environment for the given major version. Of course
they don't need to be physical machines, and they only need to be running at build time, so you can
use virtual machines or containers.
In this tutorial our work is much easier: we only deploy two scripts that have no
dependencies at all (except bash), so we will build noarch packages, which stands for
"not architecture dependent"; we also won't specify the distribution the package is built for.
This way we can install and upgrade them on any distribution that uses rpm, and on any version -
we only need to ensure that the build machine's rpm-build package is at the oldest version in the
environment.
Setting up the build environment
To build custom rpm packages, we need to install the rpm-build package:
# yum install rpm-build
From now on, we do not use the root user, and for a good reason. Building
packages does not require root privileges, and you don't want to break your build machine.
Building the first version of the package
Let's create the directory structure needed for building:
$ mkdir -p rpmbuild/SPECS
Our package is called admin-scripts, version 1.0. We create a specfile that
specifies the metadata, contents, and tasks performed by the package. This is a simple text file we
can create with our favorite text editor, such as vi. The previously installed
rpm-build package will fill your empty specfile with template data if you use
vi to create an empty one, but for this tutorial consider the specification below,
called admin-scripts-1.0.spec:
Name: admin-scripts
Version: 1
Release: 0
Summary: FooBar Inc. IT dept. admin scripts
Packager: John Doe
Group: Application/Other
License: GPL
URL: www.foobar.com/admin-scripts
Source0: %{name}-%{version}.tar.gz
BuildArch: noarch
%description
Package installing latest version the admin scripts used by the IT dept.
%prep
%setup -q
%build
%install
rm -rf $RPM_BUILD_ROOT
mkdir -p $RPM_BUILD_ROOT/usr/local/sbin
cp scripts/* $RPM_BUILD_ROOT/usr/local/sbin/
%clean
rm -rf $RPM_BUILD_ROOT
%files
%defattr(-,root,root,-)
%dir /usr/local/sbin
/usr/local/sbin/parselogs.sh
/usr/local/sbin/pullnews.sh
%doc
%changelog
* Wed Aug 1 2018 John Doe
- release 1.0 - initial release
Place the specfile in the rpmbuild/SPECS directory we created earlier.
We need the sources referenced in the specfile - in this case, the two shell
scripts. Let's create the directory for the sources (named as the package name with the main
version appended):
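Based on the specfile above, which copies the scripts from a scripts/ subdirectory, the layout can be created like this (the exact path is an assumption):
$ mkdir -p rpmbuild/SOURCES/admin-scripts-1/scripts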
As this tutorial is not about shell scripting, the contents of these scripts are irrelevant. As
we will create a new version of the package, and pullnews.sh is the script we
will demonstrate with, its source in the first version is as below:
#!/bin/bash
echo "news pulled"
exit 0
Do not forget to add the appropriate rights to the files in the source - in our case,
execution rights:
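For example (the -bb "build binary" option and the exact paths are assumptions based on the rest of the tutorial):
$ chmod +x rpmbuild/SOURCES/admin-scripts-1/scripts/*.sh
$ cd rpmbuild/SOURCES/ && tar -czf admin-scripts-1.tar.gz admin-scripts-1 && cd ~
$ rpmbuild -bb rpmbuild/SPECS/admin-scripts-1.0.spec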
We'll get some output about the build, and if anything goes wrong, errors will be shown (for
example, missing file or path). If all goes well, our new package will appear in the RPMS directory
generated by default under the rpmbuild directory (sorted into subdirectories by
architecture):
$ ls rpmbuild/RPMS/noarch/
admin-scripts-1-0.noarch.rpm
We have created a simple yet fully functional rpm package. We can query it for all the
metadata we supplied earlier:
$ rpm -qpi rpmbuild/RPMS/noarch/admin-scripts-1-0.noarch.rpm
Name : admin-scripts
Version : 1
Release : 0
Architecture: noarch
Install Date: (not installed)
Group : Application/Other
Size : 78
License : GPL
Signature : (none)
Source RPM : admin-scripts-1-0.src.rpm
Build Date : 2018. aug. 1., Wed, 13.27.34 CEST
Build Host : build01.foobar.com
Relocations : (not relocatable)
Packager : John Doe
URL : www.foobar.com/admin-scripts
Summary : FooBar Inc. IT dept. admin scripts
Description :
Package installing latest version the admin scripts used by the IT dept.
And of course we can install it (with root privileges):
Installing custom scripts with rpm
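For example, as root, something like:
# yum install rpmbuild/RPMS/noarch/admin-scripts-1-0.noarch.rpm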
As we installed the scripts into a directory that is on every user's $PATH , you
can run them as any user in the system, from any directory:
$ pullnews.sh
news pulled
The package can be distributed as it is, and can be pushed into repositories available to any
number of systems. To do so is out of the scope of this tutorial - however, building another
version of the package is certainly not.
Building another version of the package
Our package and the extremely useful scripts in it become popular in no time, considering they are
reachable anywhere with a simple yum install admin-scripts within the environment. Soon there will
be many requests for improvements - in this example, many happy users vote that pullnews.sh should
print another line on execution; this feature would save the whole company. We need to build another
version of the package: we don't want to install another script, but a new version of it with the
same name and path, as the sysadmins in our organization already rely on it heavily.
First we change the source of the pullnews.sh in the SOURCES to something even
more complex:
#!/bin/bash
echo "news pulled"
echo "another line printed"
exit 0
We need to recreate the tar.gz with the new source content - we can use the same filename as
the first time, as we don't change the version, only the release (so the Source0 reference
will still be valid). Note that we delete the previous archive first:
cd rpmbuild/SOURCES/ && rm -f admin-scripts-1.tar.gz && tar -czf admin-scripts-1.tar.gz admin-scripts-1
Now we create another specfile with a higher release number. We don't change much on the package
itself, so we simply record the new release as shown below:
Name: admin-scripts
Version: 1
Release: 1
Summary: FooBar Inc. IT dept. admin scripts
Packager: John Doe
Group: Application/Other
License: GPL
URL: www.foobar.com/admin-scripts
Source0: %{name}-%{version}.tar.gz
BuildArch: noarch
%description
Package installing latest version the admin scripts used by the IT dept.
%prep
%setup -q
%build
%install
rm -rf $RPM_BUILD_ROOT
mkdir -p $RPM_BUILD_ROOT/usr/local/sbin
cp scripts/* $RPM_BUILD_ROOT/usr/local/sbin/
%clean
rm -rf $RPM_BUILD_ROOT
%files
%defattr(-,root,root,-)
%dir /usr/local/sbin
/usr/local/sbin/parselogs.sh
/usr/local/sbin/pullnews.sh
%doc
%changelog
* Wed Aug 22 2018 John Doe
- release 1.1 - pullnews.sh v1.1 prints another line
* Wed Aug 1 2018 John Doe
- release 1.0 - initial release
All done, we can build another version of our package containing the updated script. Note that
we reference the specfile with the higher version as the source of the build:
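Assuming the new specfile is called admin-scripts-1.1.spec (the filename is an assumption), the build would be:
$ rpmbuild -bb rpmbuild/SPECS/admin-scripts-1.1.spec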
If the build is successful, we now have two versions of the package under our RPMS directory:
ls rpmbuild/RPMS/noarch/
admin-scripts-1-0.noarch.rpm admin-scripts-1-1.noarch.rpm
And now we can install the "advanced" script, or upgrade it if it is already installed.
Upgrading custom scripts with rpm
And our sysadmins can see that the feature request has landed in this version:
rpm -q --changelog admin-scripts
* sze aug 22 2018 John Doe
- release 1.1 - pullnews.sh v1.1 prints another line
* sze aug 01 2018 John Doe
- release 1.0 - initial release
Conclusion
We wrapped our custom content into versioned rpm packages. This means no
older versions are left scattered across systems; everything is in its place, at the version we
installed or upgraded to. RPM gives us the ability to replace old stuff needed only in previous
versions, and can add custom dependencies or provide some tools or services our other packages rely
on. With effort, we can pack nearly any of our custom content into rpm packages and distribute it
across our environment, not only with ease, but with consistency.
When performing the change process, metadata is used for analytical purposes. This may be in the
form of reports or a direct search in the database or the databases where metadata is maintained.
Trace information is often used-for instance, to determine in which configuration item changes are
required due to an event. Also information about variants or branches belonging to a configuration
item is used to determine if a change has effects in several places.
Finally metadata may be used to determine if a configuration item has other outstanding event
registrations, such as whether other changes are in the process of being implemented or are awaiting
a decision about implementation.
Consequence Analysis
When analyzing an event, you must consider the cost of implementing changes. This is not always
a simple matter. The following checklists, adapted from a list by Karl Wiegers, may help in analyzing
the effects of a proposed change. The lists are not exhaustive and are meant only as inspiration.
Identify
All requirements affected by or in conflict with the proposed change
The consequences of not introducing the proposed change
Possible adverse effects and other risks connected with implementation
How much of what has already been invested in the product will be lost if the proposed change
is implemented-or if it is not
Check if the proposed change
Has an effect on nonfunctional requirements, such as performance requirements (ISO 9126, a
standard for quality characteristics, defines six characteristics: functionality, reliability, usability,
efficiency, maintainability, and portability. The latter five are typically referred to as nonfunctional.)
May be introduced with known technology and available resources
Will cause unacceptable resource requirements in development or test
Will entail a higher unit price
Will affect marketing, production, services, or support
Follow-on effects may be additions, changes, or removals in
User interfaces or reports, internal or external interfaces, or data storage
Designed objects, source code, build scripts, include files
Test plans and test specifications
Help texts, user manuals, training material, or other user documentation
Project plan, quality plan, configuration management plan, and other plans
Other systems, applications, libraries, or hardware components
Roles
The configuration (or change) control board (CCB) is responsible for change control. A configuration
control board may consist of a single person, such as the author or developer when a document or
a piece of code is first written, or an agile team working in close contact with users and sponsors,
if work can be performed in an informal way without bureaucracy and heaps of paper. It may also-and
will typically, for most important configuration items-consist of a number of people, such as the
project manager, a customer representative, and the person responsible for quality assurance.
Process Descriptions
The methods, conventions, and procedures necessary for carrying out the activities in change control
may be
Description of the change control process structure
Procedures in the life cycles of events and changes
Convention(s) for forming different types of configuration control boards
Definition of responsibility for each type of configuration control board
Template(s) for event registration
Template(s) for change request
Connection with Other Activities
Change control is clearly delimited from other activities in configuration management, though
all activities may be implemented in the same tool in an automated system. Whether change control
is considered a configuration management activity may differ from company to company. Certainly it
is tightly coupled with project management, product management, and quality assurance, and in some
cases is considered part of quality assurance or test activities. Still, when defining and distributing
responsibilities, it's important to keep the boundaries clear, so change control is part of configuration
management and nothing else.
Example
Figure 1-10 shows an example of a process diagram for change control. A number of processes are depicted
in the diagram as boxes with input and output sections (e.g., "Evaluation of event registration").
All these processes must be defined and, preferably, described.
1.5 Status Reporting
Status reporting makes available, in a useful and readable way, the information necessary to effectively
manage a product's development and maintenance. Other activity areas in configuration management
deliver the data foundation for status reporting, in the form of metadata and change control data.
Status reporting entails extraction, arrangement, and formation of these data according to demand.
Figure 1-11 shows how status reporting is influenced by its surroundings.
The result of status reporting is the generation of status report(s). Each company must define
the reports it should be possible to produce. This may be a release note, an item list (by status,
history, or composition), or a trace matrix. It should also be possible to extract ad hoc information
on the basis of a search in the available data.
Process Descriptions
The methods, conventions, and procedures necessary for the activities in status reporting may be
Procedure(s) for the production of available status reports
Procedure(s) for ad hoc extraction of information
Templates for status reports that the configuration management system should be able to produce
Roles
The librarian is responsible for ensuring that data for and information in status reports are
correct, even when reporting is fully automated. Users themselves should be able to extract as many
status reports as possible. Still, it may be necessary to involve a librarian, especially if metadata
and change data are spread over different media.
Connection with Other Activities
Status reporting depends on correct and sufficient data from other activity areas in configuration
management. It's important to understand what information should be available in status reports,
so it can be specified early on. It may be too late to get information in a status report if the
information was requested late in the project and wasn't collected. Status reports from the configuration
management system can be used within almost all process areas in a company. They may be an excellent
source of metrics for other process areas, such as helping to identify which items have had the most
changes made to them, so these items can be the target of further testing or redesign.
1.6 False Friends: Version Control and Baselines
The expression "false friends" is used in the world of languages. When learning a new language,
you may falsely think you know the meaning of a specific word, because you know the meaning of a
similar word in your own or a third language. For example, the expression faire exprès in French
means "to do something on purpose," and not, as you might expect, "to do something fast." There are
numerous examples of "false friends"-some may cause embarrassment, but most "just" cause confusion.
This section discusses the concepts of "version control" and "baseline." These terms are frequently
used when talking about configuration management, but there is no common and universal agreement
on their meaning. They may, therefore, easily become "false friends" if people in a company use them
with different meanings. The danger is even greater between a company and a subcontractor or customer,
where the possibility of cultural differences is greater than within a single company. It is hoped
that this section will help reduce misunderstandings.
Version Control
"Version control" can have any of the following meanings:
Configuration management as such
Configuration management of individual items, as opposed to configuration management of deliveries
Control of versions of an item (identification and storage of items) without the associated
change control (which is a part of configuration management)
Storage of intermediate results (backup of work carried out over a period of time for the
sole benefit of the producer)
It's common but inadvisable to use the terms "configuration management" and "version control"
indiscriminately. A company must make up its mind as to which meaning it will attach to "version
control" and define the term relative to the meaning of configuration management. The term "version
control" is not used in this book unless its meaning is clear from the context. Nor does the concept
exist in IEEE standards referred to in this book, which use "version" in the sense of "edition."
Baseline
"Baseline" can have any of the following meanings:
An item approved and placed in storage in a controlled library
A delivery (a collection of items released for usage)
A configuration item, usually a delivery, connected to a specific milestone in a project
"Configuration item" as used in this book is similar to the first meaning of "baseline" in the
previous list. "Delivery" is used in this book in the sense of a collection of configuration items
(in itself a configuration item), whether or not such a delivery is associated with a milestone or
some other specific event-similar to either the second or third meaning in the list, depending on
circumstances.
The term "baseline" is not used in this book at all, since misconceptions could result from the
many senses in which it's used. Of course, nothing prevents a company from using the term "baseline,"
as long as the sense is clear to everyone involved.
There are a number of reasons why automated configuration management tools play a vital role in managing complex enterprise infrastructures.
Here are four of the most popular reasons:
Consistency. If your infrastructure is being configured manually, how do you know your servers are being set
up in a consistent manner? Further, how do you know these changes are being performed in a way that meets your compliance and
security requirements? (For instance, are administrators logging changes in the appropriate systems?)
Make life easier for your system administrators by automating repeated tasks with a configuration management tool. When repeated
tasks are tedious, humans are alarmingly bad at performing them consistently. Automate tedious administration tasks with a configuration
management tool so your staff can focus on other important things that humans do best.
Efficient change management. Whenever infrastructure is built manually without the aid of a configuration
management tool, people tend to fear change. Over time, servers that are maintained by hand tend to become fragile environments
that are hard to understand and modify.
In these situations, organizations tend to develop a lot of processes for managing changes, usually with the sole intent of
minimizing change or even delaying it as long as possible. This tends to delay introducing the new features your customers need.
When servers can be reproduced easily in a repeatable fashion, fewer processes are needed to manage change. Small change batches
can be performed on a regular basis, such as daily, or even several times a day.
Simplicity in rebuild. When servers are built manually, it's typically not easy to rebuild them from scratch.
What would happen if you suddenly lost your servers in a catastrophic event? How quickly could you restore service if disaster
struck?
Automated deployments using a configuration management tool help quickly restore service. Rather than bothering to upgrade
or patch applications, which can be inherently fragile operations, system administrators can build a new, upgraded system in an
automated fashion and throw the old one away, returning it to the server pool. When rebuilds are easy, system administrators gain
confidence to make changes to infrastructure more rapidly.
Visibility. Configuration management tools include auditing and reporting capabilities. Monitoring
the work performed by one system administrator doesn't require a sophisticated tool. But trying to understand what is going on
with a team of, say, 10 system administrators and 10 software developers deploying software changes many times per day? You need
a configuration tool.
When infrastructure changes are handled by automated systems, changes can be automatically logged in all relevant tracking
systems to raise visibility on the meaningful work your teams are doing.
Automating Internal Infrastructure Orchestration with Chef
BioTeam maintains its internal company IT infrastructure across a distributed mix of servers hosted both "in the cloud" and
within our own offices and colocation cages. We've long been using Opscode Chef
to "orchestrate" our cloud systems and recently have found it invaluable for automatic configuration management of our own local
servers and VMs.
This blog post is just a quick one-off article to highlight how well Chef plays with non-cloud systems including local virtual
machines that BioTeam is running via Citrix XenServer. It was so easy to spin up a new VM ("staff.bioteam.net") and then use a single
Chef one-liner command to bootstrap the server to configure user accounts, install new software (denyhosts) and adjust the configuration
of the /etc/sudoers file that I wanted to screencast and share the process.
First things first ...
Thanks to Steve Danna for publishing a CentOS-6 bootstrapping template script. In the screencast below where you see me typing
the "knife bootstrap �" command I'm directly invoking the bootstrapping script for CentOS 6 systems
that Steve put on github.
Screencast Ahead
In the video recorded below we start with a CentOS 6.1 Linux system. The VM was created from a pre-existing barebones XenServer
template and really just contains a minimal operating system and network stack with almost no installed software.
Normally in "Xen" land, I'd fire up the new VM from a template and then do manual sysadmin "stuff" to the server to make it do
what it needed to do.
For this particular server ("staff.bioteam.net") we really just needed a few things to start with:
Create BioTeam staff user accounts
Upload and install individual BioTeam staff SSH keys so they can login securely
Add the appropriate BioTeam user accounts to the /etc/sudoers file so they can elevate access when needed
Install, configure and start the 'denyhosts' service to block SSH password guessing attacks
And wouldn't you know ... BioTeam ALREADY has Chef recipes to do all those things because we need them on just
about every cloud server we create.
The screencast below simply shows how I can do all the tasks listed above via my personal Mac OS X laptop with a single call to
the Opscode Chef CLI tool named 'knife'. The exact command used was:
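(The exact command did not survive in this copy of the post. Purely as an illustration -- the SSH user, bootstrap template name and run-list below are placeholders, not the author's actual values -- a knife bootstrap call of this kind typically looks something like:)
knife bootstrap staff.bioteam.net -x someadmin --sudo -d centos6 -r 'recipe[users],recipe[sudo],recipe[denyhosts]'
Here -d selects a bootstrap template such as the CentOS 6 one mentioned above, and -r sets the initial run-list of recipes to apply.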
The video below is not edited for time in any way. It really does take less than 4 minutes to take a 'barebones' CentOS system,
install all the software dependencies, build and configure chef, download the cookbooks and runlist and then "process them". The
end result is 100% automated provisioning of a new server while I check Facebook in another browser window.
And for people new to Opscode Chef this is a great example of how powerful and flexible these "infrastructure orchestration" systems
have become. The Chef client running on the new server is doing far more than just simple installs of software from remote repositories.
Of course it's doing that but it's also installing personal individual SSH keys, editing the contents of the /etc/sudoers file and
installing, configuring and starting a new network security service (denyhosts). Try doing that amount of "custom" server config
work using a "golden image" or Kickstart type method!
Note: The text-heavy screencast may best be viewed directly on youtube.com, particularly in the "big" 720p HD
version.
About the Author
Chris is an infrastructure geek specializing in the applied use of IT to enable and enhance scientific research in life science informatics
environments.
synctool is a cluster administration tool that keeps configuration files synchronized across all nodes in a cluster. Nodes may
be part of a logical group or class, in which case they need a particular subset of configuration files. synctool can restart daemons
when needed, if their relevant configuration files have been changed. synctool can also be used to do patch management or other system
administrative tasks.
SPAM is a tool that assists in the management of system configuration and compliance. SPAM tracks, reports on, and compares system
configurations across AIX systems.
Puppet Puppet is a declarative language for expressing
system configuration, a client and server for distributing it, and a library for realizing the configuration. It's written in Ruby.
ISConf ISconf is a framework for recording and playing back all sysadmin
work done to a network of Unix machines. It uses Makefiles to execute work lists. It steps thru each state in a machine's configuration
history in order to get to the current/desired state.
ISConf 4 is decentralized, and looks like it's making good progress.
psgconf psgconf is modular, data store independent and extensible
in Perl.
LCFGng X resource style parameters stored in source files on a central repository
(in RCS), then compiled into individual profiles (which are specific to a machine). When a profile changes, the client is notified
via UDP and then retrieves the profile via HTTP. Component scripts on the client then act on the configuration parameters.
Template Tree 2 The teTre2 program is basically a preprocessor.
It converts the configuration information contained in the tetre configuration files and in the feature tree into a number of target
formats. At the moment there are two target formats available. First there is cfengine which generates a cfengine configuration file.
You can then run this file with cfengine to update the local machine. Second there is a pod mode which produces comprehensive documentation
of all features available in the local feature repository.
Ark / Arusha A framework for collaborative system administration of
multi-platform Unix sites with many dozens of machines. Uses XML to define packages, hosts, users etc.
Quattor A toolkit providing a powerful, portable and modular
toolsuite for the automated installation, configuration and management of clusters and farms running UNIX derivatives like Linux and
Solaris. It uses the system package manager (RPM or PKG) and supports storing and managing packages centrally. A node configuration manager
configures the local system using a plug-in component framework. An Automated Installation Infrastructure subsystem generates install-time
information such as installer control files (Kickstart or Jumpstart) and DHCP tables.
Gromit Rules file and cron job, designed to be
very automated. It is written in Python, doesn't use any Python in the config, and has a backend for Red Hat, with backends for Debian and
Solaris in alpha status.
SimonStores
system configuration in Oracle. Uses PL/SQL to generate config files based on the contents of the database.
More Simon papers
CFM Configuration file manager, like RPM for
configuration files.
MATtool UNIX configuration and monitoring tool
with Tcl/Tk frontend.
Genuadmin Plain text database
files operated on by a Perl core program that uses shell scripts to perform actions. Uses rsh and nfs
Etch is a tool for system configuration management. It manages the configuration files of the operating system and core
applications. It is easy for a professional system administrator to start using, yet is scalable to large and complex environments.
Pacha is for people who do not want to learn Ruby to get something like Ruby's Chef working,
who absolutely HATE a DSL (explicitly talking about Ruby's Puppet) that looks like Ruby but is not,
and who are disgusted by the idea of XML-based configuration management (ugghh, bcfg2).
Basically, any running program that uses a configuration file can use Pacha to safeguard the changes made. Easily
revert from mistakes in configuration (since it is already versioned via Mercurial) and keep track of what changed at what time.
As long as you have Python, Mercurial and SSH installed, you are good to go!
I spent a while going over recipes, and comparing them to Puppet. For example, here's some code to manage sudo for Chef. The Chef
code was written by Chef's authors; the Puppet code was written by myself. The Chef code is spread across 3 files.
# recipes/default.rb:
package "sudo" do
  action :upgrade
end

template "/etc/sudoers" do
  source "sudoers.erb"
  mode 0440
  owner "root"
  group "root"
  variables(
    :sudoers_groups => node[:authorization][:sudo][:groups],
    :sudoers_users => node[:authorization][:sudo][:users]
  )
end

# attributes.rb:
authorization Mash.new unless attribute?("authorization")
authorization[:sudo] = Mash.new unless authorization.has_key?(:sudo)
unless authorization[:sudo].has_key?(:groups)
  authorization[:sudo][:groups] = Array.new
end
unless authorization[:sudo].has_key?(:users)
  authorization[:sudo][:users] = Array.new
end

# metadata.rb:
maintainer       "Opscode, Inc."
maintainer_email "[email protected]"
license          "Apache 2.0"
description      "Installs and configures sudo"
version          "0.7"

attribute "authorization",
  :display_name => "Authorization",
  :description  => "Hash of Authorization attributes",
  :type         => "hash"

attribute "authorization/sudoers",
  :display_name => "Authorization Sudoers",
  :description  => "Hash of Authorization/Sudoers attributes",
  :type         => "hash"

attribute "authorization/sudoers/users",
  :display_name => "Sudo Users",
  :description  => "Users who are allowed sudo ALL",
  :type         => "array",
  :default      => ""

attribute "authorization/sudoers/groups",
  :display_name => "Sudo Groups",
  :description  => "Groups who are allowed sudo ALL",
  :type         => "array",
  :default      => ""
Both Chef and Puppet then take this information and output it through an ERB template, which is an exercise for the reader, since
it's basically the same for both.
There are a few things worth noting here. First of all, Puppet has zero metadata available. If you want to set sudo-able groups,
you need to know those variable names ahead of time and set them to what you want. Both your template and whatever code sets your
sudo-able groups must magically 'just know' this information. Since the Puppet DSL is not even Ruby, you have *zero* ability to perform
any kind of metadata analysis on these attributes in order to make code more generic.
Chef gives you complete metadata about the variables it's using. This is powerful and indeed critical in my imagined use domains
for Chef (keep reading). That metadata comes at a cost of a lot of boilerplate code, though. Chef comes with some rake tasks to generate
some scaffolding. I'm always uncomfortable with scaffolding like this; I think this kind of code generation is a bad way to do metaprogramming.
Chef spreads this information across 3 files, named a particular way. Puppet has a similar scheme of magically named files, but
it's basically just a folder structure, a file called init.pp, and templates/source files. For a fairly simple task, Chef requires
you to know a folder structure and 3 file names, and which data goes in which files. This is congruent with the Ruby world's (perhaps
specifically the rails/merb world's?) general practice of 'convention not configuration'. This is in addition to all of the 'you
just have to know' parts of the Chef system which are taken from Merb, such as where models and controllers live, though you would
not need to edit those save for pretty advanced cases.
Lastly, Chef provides you with an actual data structure that is fed to the sudoers template. Puppet simply uses available dynamically-scoped
variables in its template files. This is *awful*, and a big loss for puppet. I administrate Zimbra servers, for example, which require
extra content in sudoers. I cannot add this to the zimbra module unless the zimbra module were to be the one including the sudo module.
There are solutions to this, of course, but this is a really, really simple use case and we're already
shaving yaks. Chef's method
is undeniably superior.
All 3 of these are part of the same core difference between the two: Puppet is an application, and Chef is a part of one.
Chef is a library to be used in a combined system of resource management in which the application itself is aware of the hardware
it's using. This allows certain kinds of applications to exist on certain kinds of platforms (particularly EC2) that simply couldn't
before--an application using this system can declare a database just as well as it can declare an integer. That's fundamentally powerful,
awesome, amazing.
Puppet is an application which has an enormous built-in library of control methods for systems. The puppet package manager, for
example, supports multiple kinds of *nix, Solaris, HP-UX, and so forth. Chef cookbooks can certainly be written to do this, but I
imagine that by the time you had supported everything Puppet does, Chef would no longer get a smiley-face sticker for being tiny and pure
with extra Ruby sauce. Puppet's not a fundamental change, it's just a really nice workhorse.
I picked puppet for the project I'm working on now. It made sense for a lot of reasons. Probably first and foremost, there are
3 other sysadmins working with me, some split between this project and others. None of us are ruby programmers. We don't write rake
tasks like we configure Apache, and we don't want to explain to new hires the difference between a symbol and a variable, or where the
default Merb configuration files live, or 100 other Ruby-isms. Meanwhile, most Puppet config, silly folder structure aside, is not any
harder to configure than something like Nagios. I think it would be a mistake for an IT shop with a lot of existing systems running
various old-fashioned stateful applications like databases or LDAP to suddenly declare that sysadmins need to be Merb programmers.
Puppet's much deeper out-of-the-box support for a lot of systems provides the kind of right-now real improvements that a lot of
IT shops and random contractors desperately need. System administration is depressingly rarely about being elegant or 'the best'
and much more frequently about being repeatable and reliable. It's just the nature of the business--if the systems ran themselves,
there would be no administrators. Having a bunch of non-programmers become not just programmers but programmers specializing in a
tiny subset of the ruby world is a lot of yaks to shave for an organization. This is not some abstract jab at my colleagues: I am
most certainly not a Merb programmer, and even if I were, I have too many database copies to make, SQL queries to run, mysterious
performance problems to diagnose and deployments to make to give this kind of development the attention it requires. How many system
administrators do you know that use the kind of TDD that Merb can provide for their bash scripts? What would make one think that's
going to happen with Chef?
The other big reason I picked Puppet is that it's got a sizable mailing list, a friendly and frequently used google group for
help, and remains in active development after a couple of years. I don't think Reductive Labs is going away, and if it did, there
have been a lot of contributors to the code base over those 2 years.
It's worth noting, though, that the Chef guys come with an impressive set of resumes. It seems to be somehow tied in with Engine
Yard (several presentations about Chef include Ezra Zygmuntowicz as a speaker). I worry, though, that they are working the typical
valley business model, namely to explode about a year after launch. Chef was released about 8 months before I write this. The organization
I am installing Puppet for does not have the Ruby talent base required to ensure that they can fix bugs as required in the long term
if Opscode goes away, or if they get hired on to Engine Yard and they make Chef into the kind of competitive differentiation secret
it could be.
Chef currently manages the EC2 version of Engine Yard, and that's just the kind of thing I cannot imagine using puppet for: interact
with a giant ruby application to manage itself. If you have a lot of systems joining and leaving the resource pool as required, Chef's
ability to add nodes dynamically is going to save you. The ability to define resources programmatically is very powerful--one could
easily imagine reducing the number of web server threads if a system's CPU use goes over a certain threshold, for example. I would
not try that in puppet! But note that this is an application built from scratch to expect such a command and control system to exist.
If you're just managing a bunch of LAMP stacks and samba servers, this is more power than you need. One of the Opscode founders has
some slides
that talk about this kind of model.
And Chef is powerful for that model, sure, but is that even the model you want for your applications? Applications should not
have to worry about the hardware they use. Making an application's own hardware use visible to itself encourages programmers to spend
time thinking about issues they should be trying their hardest to ignore. A better model is App Engine's, where the system just scales
forever without developer intervention. Even Azure's
service configuration schema model is better, in which different application roles (web, proxy, etc) are described as resources
and given a dynamic instance count, and transparently scalable data stores are available. The number of 'nodes' in the system is
never an issue for either model.
Chef is what you'd use to build that auto-scaling backend. Engine Yard uses it for, well, Engine Yard--scalable rails hosting,
transparently sold as a service to folks who can then just blissfully program in rails and never think about Chef. Very few organizations
are making that infrastructure, and most of them that are, are shaving really big yaks and need to stop and use one of the available
clouds.
Meanwhile, a very many organizations are running 6 kinds of *nix to maintain tens of older applications built on the POSIX or
LAMP paradigms, or hosting virtual machines running applications made who knows when. For these organizations, Puppet is probably
the easiest thing that could work, and thus probably the best option.
I'm sure there are sysadmins out there who think I'm completely wrong, and that you just can't beat the elegance Chef provides.
There are a lot of people better than me out there, and I'm sure they have a point. But in my experience, bad system administration
happens when sysadmins try and do everything for themselves. For a given situation in system administration, it's highly unlikely
a sysadmin can do a better job than an available tool. Puppet's sizable default library is what most organizations need, not the
ability to write their own.
And all of the above aside, one thing is clear: there is little excuse for an organization with 3 or more *nix servers not to
be using Puppet, Chef, cfengine, or *something*. I would argue that about 80% of the virtualization push is dodging some of the core
questions of system administration, making systems movable to new resources indefinitely rather than making their configuration repeatable,
but that's a topic for another post. Especially since nobody got this far on this one anyway.
Adam Jacob
Hi John! Thanks for being passionate about my favorite space - configuration management. You do great work, and I know your
intent wasn't necessarily to sow discord - but I wanted to take a moment to comment on a few of your points that I think are either
wrong or missing some important context.
1) Large installed base
Chef has somewhere in the neighborhood of ~1500 working installations. It's true that our early adopters are primarily large
web players like Wikia, Fotopedia, and 37signals. We also have a growing number of people integrating Chef directly into their
service offering - it's not just Engine Yard, it's RightScale and others.
2) Large developer base
According to Ohloh, 39 developers have contributed to Puppet in the last 12 months, and 71 over the project's entire history.
Chef has been open source for a year. We just had our 100th CLA (contributor license agreement, meaning they can contribute
code). Over the course of the year, 52 different people have contributed to Chef, including significant functionality (for the
record, 5 of them work for Opscode.) We're incredibly proud of the community of developers who have joined the project in the
last year, and the huge amount of quality code they produce.
3) Dedicated Configuration Language
To each their own, man. :) My preference for writing configuration management in a 3GL was born out of frustration with doing
the higher order systems integration tasks. By definition, internal DSLs aren't meant to do that - when they start being broadly
applicable, they lose the benefits they gained from domain specificity. For me, the benefit of being able to leverage the full
power of a 3GL dramatically outweighs the learning curve, and I think a side-by-side comparison of the two languages shows just
how close you can get to never having to leave the comfort of your DSL most of the time.
4) Robust Architecture
Chef is built to scale horizontally like a web application. It's a service oriented architecture, built around REST and HTTP.
Like cfengine, it pushes work to the edges, rather than centralizing it. There are large (multi-thousand node) Chef deployments,
and larger ones coming. Chef scales just fine.
5) Documentation
It's true, we've been focused pretty intently on refining Chef in tandem with our earlier adopters, and that focus has had
an impact on the clarity of our documentation. Rest assured, we're working on it.
6) Language/Framework Neutral
I'm not sure where this comes from, other than we've had great adoption in the Ruby community. People deploy and manage every
imaginable software stack with Chef - Java, Perl, Ruby, PHP - it's all being managed with Chef.
7) Multi-Platform
It's true that, at release a year ago, Chef didn't support many platforms. Since then, we've been growing that support steadily
- all the platforms you list run Chef just fine, with the exception of AIX. We have native packages for Red Hat (community maintained
by the always awesome Matthew Kent!) and Ubuntu that ship regularly at every release. As for the Chef Server only running on Ubuntu
- that's just not true.
8) Doesn't re-invent the wheel
Again, to each their own. I think Chef's deterministic ordering, ease of integration, wider range of actions, directly re-usable
cookbooks, and lots of other things make it quite innovative. I'm pleased to explain it to you over beer, on my dime. :)
9) Dependency Management
While I understand how you can think this would be true, it isn't. Chef does have dependency management, and a more robust
notification system than Puppet. Each resource is declarative and idempotent. Within a recipe, resources are executed in the order
they are written - meaning the way you write it is the way it runs. This is frequently the way puppet manifests are written as
well. The difference being, there is no need to declare resource-level dependency relationships.
With Chef, you focus on recipe-level dependencies. "Apache should be working before I install Tomcat". You can ensure that
another recipe has been applied at any point, giving you great flexibility, along with a high degree of encapsulation.
One added benefit of the way Chef works is that the system behaves the exact same way, every time, given the same set of inputs.
This greatly eases debugging of ordering issues, and results in a system that is, in my opinion, significantly easier to reason
about at scale (thousands of resources under management).
10) Big Mindshare
There is a bit of survivor bias happening here. I meet people every day who are starting with, or switching to, Chef. You don't,
because, well - you don't use Chef.
Conclusion
Thanks for taking the time to write about Puppet and Chef - I know your heart is in the right place. Next time, come talk to
us - we're pretty accessible guys, and I would be happy to provide feedback and education about how Chef works. I won't even try
and convince you to switch. :)
Puppet, Chef, cfengine, and Bcfg2 are all players in the configuration management space. If you're looking for Linux automation solutions,
or server configuration management tools, the two technologies you're most likely to come across are Puppet and Opscode Chef. They
are broadly similar in architecture and solve the same kinds of problems. Puppet, from Reductive Labs, has been around longer, and
has a large user base. Chef, from Opscode, has learned some of the lessons from Puppet's development, and has a high-profile client:
EngineYard.
You have an important choice to make: which system should you invest in? When you build an automated infrastructure,
you will likely be working with it for some years. Once your infrastructure is already built, it's expensive to change technologies:
Puppet and Chef deployments are often large-scale, sometimes covering thousands of servers.
Chef vs. Puppet is an ongoing debate, but here are 10 advantages I believe Puppet has over Chef today.
1. Larger installed base
Put simply, almost everyone is using Puppet rather than Chef. While Chef's web site lists
only a handful of companies using it, Puppet's
has over 80 organisations including Google,
Red Hat, Siemens, lots of big businesses worldwide, and several major universities including Stanford and Harvard Law School.
This means Puppet is here to stay, and makes Puppet an easier sell. When people hear it's the same technology Google use, they
figure it works. Chef deployments don't have that advantage (yet). Devops and sysadmins often look to their colleagues and counterparts
in other companies for social proof.
2. Larger developer base
Puppet is so widely used that lots of people develop for it. Puppet has many contributors to its core source code, but it has
also spawned a variety of support systems and third-party add-ons specifically for Puppet, including
Foreman. Popular tools create their own ecosystems.
Chef's developer base is growing fast, but has some way to go to catch up to Puppet - and its developers are necessarily less
experienced at working on it, as it is a much younger project.
3. Choice of configuration languages
The language which Puppet uses to configure
servers is designed specifically for the task: it is a domain language optimised for the task of describing and linking resources
such as users and files.
Chef uses an extension of the Ruby language.
Ruby is a good general-purpose programming language, but it is not designed for configuration management - and learning Ruby is a
lot harder than learning Puppet's language.
Some people think that Chef's lack of a special-purpose language is an advantage. "You get the power of Ruby for free," they argue.
Unfortunately, there are many things about Ruby which aren't so intuitive, especially for beginners, and there is a large and complex
syntax that has to be mastered.
There is experimental support in Puppet for writing your manifests in a domain language embedded in Ruby just like Chef's. So
perhaps it would be better to say that Puppet gives you the choice of using either its special-purpose language, or the general-purpose
power of Ruby. I tend to agree with Chris Siebenmann
that the problem with using general-purpose languages for configuration is that
they sacrifice clarity for
power, and it's not a good trade.
4. Longer commercial track record
Puppet has been in commercial use for many years, and has been continually refined and improved. It has been deployed into very
large infrastructures (5,000+ machines) and the performance and scalability lessons learned from these projects have fed back into
Puppet's development.
Chef is still at an early stage of development. It's not mature enough for enterprise deployment, in my view. It does not yet
support as many operating systems as Puppet, so it may not even be an option in your environment. Chef deployments do exist on multiple
platforms, though, so check availability for your OS.
5. Better documentation
Puppet has a large user-maintained wiki with hundreds of pages of
documentation and comprehensive references
for both the language and its
resource types. In addition, it's actively
discussed on several mailing lists and has a very popular
IRC channel, so whatever your Puppet problem,
it's easy to find the answer. (If you're getting started with Puppet, you might like to check out my
Puppet tutorial here.)
Chef's developers have understandably concentrated on getting it working, rather than writing extensive documentation. While there
are Chef tutorials, they're a little sketchy.
There are bits and pieces scattered around, but it's hard to find the piece of information you need.
6. Wider range of use cases
You can use both Chef and Puppet as a deployment tool. The Chef documentation seems largely aimed at users deploying
Ruby on Rails applications,
particularly in cloud environments - EngineYard is its main user and that's what they do, and most of the tutorials have a similar
focus. Chef's not limited to Rails, but it's fair to say it's a major use case.
In contrast, Puppet is not associated with any particular language or web framework. Its users manage Rails apps, but also PHP
applications, Python and Django, Mac desktops, or AIX mainframes running Oracle.
To make it clear, this is not a technical advantage of Puppet, but rather that its community, documentation and usage have a broader
base. Whatever you're trying to manage with Puppet, you're likely to find that someone else has done the same and can help you.
7. More platform support
Puppet supports multiple platforms. Whether it's running on OS X or on Solaris, Puppet knows the right package manager to use
and the right commands to create resources. The Puppet server can run on any platform which supports Ruby, and it can run on relatively
old and out-of-date OS and Ruby versions (an important consideration in many enterprise environments, which tend to be conservative
about upgrading software).
Chef supports fewer platforms than Puppet, largely because it depends on recent versions of both Ruby and CouchDB. As with Puppet,
though, the list of supported platforms is growing all the time. Puppet and Chef can both deploy all domains of your infrastructure,
provided it's on the supported list.
8. Doesn't reinvent the wheel
Chef was strongly inspired by Puppet. It largely duplicates functionality which already existed in Puppet - but it doesn't yet
have all the capabilities of Puppet. If you're already using Puppet, Chef doesn't really offer anything new which would make it worth
switching.
Of course, Puppet itself reinvented a lot of functionality which was present in earlier generations of config management software,
such as cfengine. What goes around comes around.
9. Explicit dependency management
Some resources depend on other resources - things need to be done in a certain order for them to work. Chef is like a shell script:
things are done in the order they're written, and that's all. But since there's no way to explicitly say that one resource depends
on another, the ordering of your resources in the code may be critical or it may not - there's no way for a reader to tell by looking
at the recipe. Consequently, refactoring and moving code around can be dangerous - just changing the order of resources in a text
file may stop things from working.
In Puppet, dependencies are always explicit, and you can reorder your resources freely in the code without affecting the order
of application. A resource in Puppet can 'listen' for changes to things it depends on: if the Apache config changes, that can automatically
trigger an Apache restart. Conversely, resources can 'notify' other resources that may be interested in them. (Chef can do this too,
but you're not required to make these relationships explicit - and in my mind that's a bad thing, though some people disagree. Andrew
Clay Shafer has written thoughtfully on this distinction:
Puppet, Chef,
Dependencies and Worldviews).
10. Big mindshare
Though not a technical consideration, this is probably the most important. When you say 'configuration management' to most people
(at least people who know what you're talking about), the usual answer is 'Puppet'. Puppet owns this space. I know there is a large
and helpful community I can call on for help, and even
books published on Puppet. Puppet is so widely adopted that virtually every problem you could encounter has already been found
and solved by someone.
Conclusion
Currently 'Chef vs. Puppet' is a rather unfair comparison. Many of the perceived disadvantages of Chef that I've mentioned above
are largely due to the fact that Chef is very new. Technically, Puppet and Chef have similar capabilities, but Puppet has first mover
advantage and has colonised most corners of the configuration management world. One day Chef may catch up, but my recommendation
today is to go with Puppet.
Selected Comments
Julian Simpson:
Culture is an important reason as to why people gravitate to one tool or another. Chef will draw in Ruby developers because
it's not declarative, and because it's easy.
My experience is that most developers don't do declarative systems. Everyday languages are imperative, and when you're a developer
looking to get something deployed quickly, you're most likely to pick the tool that suits your world view.
Systems Administrators tend to use more declarative tools (make, etc.)
Developers and Systems Administrators also have a divergent set of incentives. Developers are generally rewarded for delivering
systems quickly, and SA's are rewarded for stability. IMHO, Chef is a tool to roll out something quickly, and Puppet is the one
to manage the full lifecycle. That's why I think Chef makes a good fit for cloud deployment, because VM instances have a short
lifespan.
I think it's still anybody's game. The opportunity for Chef is that the developer community could build out an ecosystem very
quickly.
vvuksan:
It seems to me that both systems have quite a bit of support out there, and it really comes down to what you as the sysadmin/developer
prefer.
I would also agree with ripienaar's tweet about disagreeing with point 6. Configuration management
systems are not really intended for deploying software but for making sure that systems conform to a certain policy, i.e. a webserver
policy, etc.
Nick Anderson:
I'm an SA and have worked closely with developers for years. It never ceased to amaze me how differently we think. It does boil
down to priorities, culture, and incentives, as Julian mentioned. I have not used Chef, but I saw quite the stir the last time I
mentioned Puppet ("Puppet Works Hard To Make Sure Nodes Are In Compliance").
I have used puppet both as a deployment tool and a configuration management tool. It really can do both just fine as a deployment
is essentially a configuration change. But I have found it easier to use a tool like
fabric when I need to perform "actions" on a group of machines, especially when
those actions are many and very possibly one time. I have found it a bit daunting if you put too much into your configuration
management tool, as over time it becomes a lot to sift through, and when it's time to remove a configuration you have to leave that
part of the configuration there (the part that removes whatever it was).
Maybe I haven't looked around enough, but I really want to see a Puppet reporting tool. I know bcfg2 has a decent one. I want
to be able to know the current state of my nodes: who is in compliance, who isn't, when I last spoke with which node, and the last time
node X changed and what changed.
John Arundel:
It is hard to be objective - probably impossible. I'm sure I haven't been.
My background is that I've used Puppet for commercial sysadmin work for several years (basically since it came out), and it
currently manages many infrastructures for many of my clients (I'm a freelancer). The biggest deployment I've worked on is probably
25-30 servers, and a comparable number of desktops. Maybe 6,000 lines of manifest code (not counting templates).
When Chef was first announced, I set aside time to build a Chef server and try it out, with a view to adopting it if it was
superior to Puppet. I found it quite hard going (admittedly that was early days for Chef), and I didn't find sufficient advantages
for Chef to migrate any of my clients to it. If a client asked for Chef specifically, I'd be quite happy to use it, but so far
no-one has.
So based on what I know, I use Puppet and that's what I recommend to others. I'm very interested in hearing from anyone who
knows different.
Anonymous
Readers, do your homework too and stop reading articles with the title 'versus', the hallmark of propaganda. If you must read
on, some specific points, with disclosure that I'm a Chef early adopter with previous Puppet exposure.
#1, #2, #5, #7, #10: puppet is more mature than Chef
All software starts with a small install base, fewer adherents, etc. That doesn't make it more suitable for your specific environment
or taste in software development (configuration management is development too). The answer here is to try both systems yourself
and compare them - something the author of this article seems to not have done yet. It's not just about the code, it's about the
software used to deploy it, the way it authenticates, etc. These things should also influence your decision.
#9: Dependency management
"Chef has no support for specifying dependencies (ordering resources). Chef is like a shell script: things are done in the
order they're written, and that's all."
Chef's default behavior is to process resources in the order you write them. It has other dependency features just like Puppet
does - see below.
"A resource in Puppet can 'listen' for changes to things it depends on: if the Apache config changes, that can automatically
trigger an Apache restart. Conversely, resources can 'notify' other resources that may be interested in them."
"Ruby is a good general-purpose programming language, but it is not designed for configuration management - and learning Ruby
is a lot harder than learning Puppet's language."
Sysadmins who can code can learn Ruby quickly, and there are plenty of resources on how to write Ruby. While most of the time
you can stick to the Chef style of Ruby, you have access to the power of a mature programming language for free. If you think
this language is easier, show why that would be the case for someone who already knows at least one programming language.
I see nothing inherent in Puppet's language that makes it better suited to configuration management. If you think there is,
show some examples.
#6: Language/framework neutral
Straight up bullshit here. There is nothing in Chef specific to Ruby on Rails. All chef deployments I know of (including our
own) are used for deploying entire stacks of software totally unrelated to Ruby or Rails, just like Puppet.
Conclusion: In the next installment, show more code examples and tell us why Chef didn't work for you where Puppet did. Try
both software packages the day before you write the article, not 6 months before. Assume your readers write code and already know
that adopting less mature software is more risky.
R.I.Pienaar:
I'd agree with almost everything above; this strikes me as mostly self-promoting b/s written with the express intent of driving
traffic to a blog. Especially given the spammy nature of its promotion.
As an aside, and I wouldn't want to distract from the fantasy here with actual facts, but Puppet is getting a native Ruby-based
DSL some time soon, and so will please both sides of that particular fence.
Configuration files contain complex information associated with a system's host environment, including settings for network, storage
and other run-time resources. Application, OS and middleware configuration files typically need to be heavily modified to "contextualize"
a system for its local host environment.
Today, rPath supports open source
configuration tools such as Puppet, Cfengine and Opscode's
Chef in two ways:
Side-by-side. Used side-by-side, rPath manages operating system, middleware and application software while
a third-party configuration tool manages configuration files. No changes or integrations are required in either system.
Deploy and manage. rPath can deploy and manage configuration tool scripts, managing the regularly-changing
scripts under version control and alongside software system manifests. Scripts are easily deployed and reproduced, and changes
to the scripts can be easily rolled back. Unique configuration scripts can be managed together with specific system manifests
to ensure they're coordinated and synchronized as they move together through the release lifecycle.
According to Sorofman: "rPath offers the most advanced capabilities available for provisioning and maintaining software systems
across physical, virtual or cloud environments. Increasingly, advanced IT shops, including several rPath customers, are using tools
like Puppet, Opscode's Chef and
Cfengine to manage configuration settings. But they recognize that these tools are poorly
suited to managing software systems, which is rPath's strength. It's a logical combination."
This is an interesting idea but not a real solution as /etc/ is a dynamic directory into which files are often installed as new
packages are added. This is especially typical for Linux.
Tracking changes in a server configuration can be critical to understand problems, identify security breaches and repair a server.
When several people are in charge of administering one or several servers, sharing the configuration changes is helpful to inform
each other about these modifications. The article describes a simple organization that uses subversion and daily mail
notifications in case of change.
The overall idea is to put the server configuration files stored in /etc directory
under a version control system:
subversion. The VCS is configured to send an email to the system administrators.
The email contains the differences with a previous version. A cron script is executed every day to automatically commit the changes,
thus triggering the email.
The best practice is, of course, that each system administrator commits their changes after they have validated the new running configuration.
If they do so, they are able to specify a comment, which is helpful for understanding what was done.
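For example (the file name and commit message here are purely illustrative):
cd /etc && sudo svn commit -m "hosts.allow: permit connections from the new backup host" hosts.allow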
First, you should install subversion with its tools.
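On a Debian or Ubuntu system that is typically something like (package names assumed):
sudo apt-get install -y subversion subversion-tools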
For the mail notification, you may use postfix, exim or sendmail. But to avoid setting up
a complete mail system, you may just use a simple mail client. For this, you can use the combination of esmtp and
procmail.
sudo apt-get install -y procmail esmtp
Create the subversion repository
The subversion repository will contain all the versions and history of your /etc. It must be protected carefully because it contains
sensitive information.
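The creation step itself is not shown here; assuming the repository lives at /home/svn/repos, as in the checkout commands below, it is roughly:
sudo mkdir -p /home/svn
sudo svnadmin create /home/svn/repos
# the history will contain sensitive files such as shadow, hence the restrictive permissions
sudo chmod -R go-rwx /home/svn/repos
sudo svn import -q -m "initial import of /etc" /etc file:///home/svn/repos/etc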
Now, setup the subversion repository to send an email for each commit. For this, copy or rename the post-commit.tmpl file and
edit it to specify to whom you want the email to be sent:
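Something along these lines, for instance (the commit-email.pl path is the usual Debian location from subversion-tools and may differ on your system):
sudo cp /home/svn/repos/hooks/post-commit.tmpl /home/svn/repos/hooks/post-commit
sudo chmod +x /home/svn/repos/hooks/post-commit
# inside post-commit, keep (or add) a line like the following, with your own address:
#   /usr/share/subversion/hook-scripts/commit-email.pl "$REPOS" "$REV" sysadmins@example.com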
Now the hard part is to turn /etc into a subversion working copy without breaking the server. For this, we check out the
/etc module somewhere else and copy only the subversion administrative files (the .svn directories) into /etc.
sudo mkdir /home/svn/last
sudo sh -c "cd /home/svn/last && svn co file:///home/svn/repos/etc"
sudo sh -c "cd /home/svn/last/etc && tar cf - `find . -name .svn` | (cd /etc && tar xvf -)"
At this step, everything is ready. You can go in /etc directory and use all the subversion commands. Example:
sudo svn log /etc/hosts
to see the changes in the hosts file.
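Other day-to-day subversion commands work the same way, for instance:
sudo svn diff /etc/fstab
to see what changed since the last commit, or
sudo svn status /etc
to list modified files and files that are not yet under version control.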
Auto-commit and detection of changes
The goal now is to detect every day the changes that were made and send a mail with the changes to the supervisor. For this, you
create a cron script that you put in /etc/cron.daily. The script will be executed every day at 6:25am. It will commit
the changes that were made and send an email for the new files.
#!/bin/sh
SVN_ETC=/etc
HOST=`hostname`
# Commit those changes
cd $SVN_ETC && svn commit -m "Saving changes in /etc on $HOST"
# Email address to which changes are sent
EMAIL_TO="TO_EMAIL"
STATUS=`cd $SVN_ETC && svn status`
if test "T$STATUS" != "T"; then
(echo "Subject: New files in /etc on $HOST";
echo "To: $EMAIL_TO";
echo "The following files are new and should be checked in:";
echo "$STATUS") | sendmail -f'FROM_EMAIL' $EMAIL_TO
fi
In this script you will replace TO_EMAIL and FROM_EMAIL by real email addresses.
Complete setup script
To help setup and configure all this easily, I'm now using a script that configures everything. You can download it:
mk-etc-repository. The usage of the script is really simple:
you just need to specify the email address for the notification.
The one-button install concept should extend to other aspects of your systems, for much the same reasons. Puppet enables you to
manage your systems centrally - you change files or settings in the repository on the central Puppet server, and they're rolled out
automatically to all your Puppet clients. You will still have to change things twice (once on a test machine to make sure the change
does what you intend, then once in the central Puppet repository), but it'll save a lot of time and reduce mistakes. (Remember that it really is
important to test - Puppet also makes it really fast to propagate an error across all your systems.)
... ... ...
Send commands to several PCs
Not everything that you want to do on all machines will work well with Puppet -- you might, for example, want to temporarily mount
a particular disk on all machines. ClusterSSH is great for this - it enables you to log onto a number of machines at once, and issue
the same command on all of them simultaneously. Usefully, you can also click on a particular machine's screen and issue a command
just on that machine, in case one machine is misbehaving.
You can set up groups of machines, as well, so that you can log in immediately to all your servers, or all your desktops. Combine
this with a root ssh key and ssh-agent, and save yourself both typing and time.
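A minimal sketch of what that looks like in practice (host and group names here are made up):
# load your key into ssh-agent first so you are not prompted for a passphrase on every host
eval `ssh-agent` && ssh-add
# one xterm per host, plus a small console window that types into all of them at once
cssh web1 web2 web3
# or define a named group in ~/.csshrc and log in to the whole group:
#   clusters = webfarm
#   webfarm = web1 web2 web3
cssh webfarm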
About: pssh provides parallel versions of the OpenSSH tools that are useful for controlling large numbers of machines simultaneously.
It includes parallel versions of ssh, scp, and rsync, as well as a parallel kill command.
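A quick sketch of the sort of thing this enables (the host file and commands are illustrative):
# run the same command on every host listed in hosts.txt, 20 connections at a time, showing output inline
pssh -h hosts.txt -l root -p 20 -i uptime
# push one file to every host in parallel
pscp -h hosts.txt -l root /etc/resolv.conf /etc/resolv.conf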
Changes: A 64-bit bug was fixed: select now uses None when there is no timeout rather than sys.maxint. EINTR is caught
on select, read, and write calls. Longopts were fixed for pnuke, prsync, pscp, pslurp, and pssh. Missing environment variables options
support was added.
Silk Tree propagates /etc/passwd and /etc/group files from a master to a list of hosts via SSH. Neither the sending nor the receiving end connects to the other as
root. Instead there is a read-only sudo sub-component on the receiver's side that makes the final modifications in /etc. Many checks
are made to ensure reliable authorization updates. ACLs are used to enforce a simple security policy. Differences between old and
new versions are shown. Two small scripts are included for exporting LDAP users and groups.
About: The "Schily" Tool Box is a set of tools written or managed by J�rg Schilling. It includes programs like: cdrecord,
cdda2wav, readcd, mkisofs, smake, bsh, btcflash, calc, calltree, change, compare, count, devdump, hdump, isodebug, isodump, isoinfo,
isovfy, label, mt, p, sccs, scgcheck, scpio, sdd, sfind, sformat, smake, sh, star, star_sym, suntar, gnutar, tartest, termcap, and
ved.
Changes: The source for "copy" (an accurate sparse file enabled copy program) has beeen added. The source for the "mountcd"
program from SchilliX has been added. The source for "udiff", a diff program with human readable output has been added. Star has
been bumped to 1.5-final. bsh and sh now skip BASH time stamps from the .history file. smake adds MAKE_SHELL_FLAG/MAKE_SHELL_IFLAG
macros.
MrTools is a suite of tools for managing large, distributed environments. It can be used to execute
scripts on multiple remote hosts without prior installation, copy a file or directory to multiple hosts as efficiently
as possible in a relatively secure way, and collect a copy of a file or directory from multiple hosts.
Release focus:
Initial freshmeat announcement
Changes:
Hash tree cleanup in the thread-tracking code was improved in all tools in the suite. MrTools has now adopted version 3 of the
GPL. A shell quoting issue in mrexec.pl was fixed. This fixed several known limitations,
including the ability to use mrexec.pl with Perl scripts and awk if statements. This fix alone has redefined mrexec.pl's capabilities,
making an already powerful tool even more powerful.
Scmbug integrates software configuration management (SCM) with bug-tracking. It aims to solve the integration problem once and
for all. It will glue any source code version control system (such as CVS/CVSNT, Subversion,
and Git) with any bug tracking system (such as Bugzilla, Mantis, Request Tracker, Test
Director).
About: System Configuration Collector (SCC) is yet another configuration collector. It consists of a client and a server
part. The client collects configuration data in a structured snapshot, compares the new snapshot with
the previous one, and adds differences to a logbook.
Then the snapshot and the logbook are converted to HTML for local inspection. Optionally, the data can be sent to a system running
the server software. On the server, summaries of the data are generated, and search/compare operations on the snapshots and logbooks
are available via a Web interface.
Changes: Some changes to support ServerOrientedLinux have been implemented. The determination of an active name has been
corrected. This release avoids messages when the LVM directory is absent on a cluster node. Config files in /etc/rc.d have been added.
The package also contains a Solaris binary of a chpasswd clone, which is extremely
useful for mass password changes in corporate environments that include Solaris and other Unixes that do not have the chpasswd utility
(HP-UX is another example in this category). Version 1.3.2 now includes a Solaris binary of
chpasswd which works on Solaris 9 and 10.
passwd.cgi, which allows users to update their password,
viewmailcfg.cgi, which allows users to view their current mail configuration,
mailcfg.cgi, which updates the mail configuration.
All programs use PAM for user authentication. It is possible to run a script to update SAMBA passwords or NIS configuration when
a password is changed. mailcfg.cgi creates a .procmailrc in the user's home directory. A user with too many invalid logins can be
locked. The minimum and maximum UID can be set in the configuration file, so you can specify a range of UIDs that are allowed to
use cgipaf.
Written in shell. It looks very similar to Titan: a simple configuration management tool with a security/hardening bent.
ProShield is a system administration program for Ubuntu/Debian Linux. It helps ensure your system is secure and up-to-date by
checking many different aspects of your system. Regular use is recommended.
Whether you are a Linux novice or a system administrator
with a dozen servers, ProShield is designed to be useable by all. ProShield's main goal is to help secure a newly installed box (computer),
as well as maintain the security of an existing box on a maintenance basis. It's part security, part security administration.
The main features of ProShield are:
Helps you backup your system weekly.
Checks for new software releases, in order to see if installed software is reasonably up to date. Smart-suggestion to upgrade
if an important package is released.
Disk-space check to find any partitions that are 70% full or more.
Checks for extra root accounts.
Checks account & password files for correct access control permissions.
Makes sure a few security-hazardous packages are not installed.
Checks to make sure a packet sniffer is not running.
Removes unneeded packages from the local package archive.
Checks to see if 'apt' is fetching unnecessary information when checking for software updates.
Makes sure system time is accurate.
Checks to make sure the user isn't logged into the system (GUI) as root.
Checks the configuration of the ssh server ([sshd] if installed) for insecure settings.
At runtime, ProShield will also check to see if there has been a new version released, and can download and install it at
the user's preference.
When the program is done analyzing your system, it displays an "advisory report", and then (if necessary), guides you through a series
of interactive questions to help you solve any problems it found.
ns4 is a configuration management tool which allows the automated backup of node configurations.
Commands are defined within a configuration file, and when they are executed, the output is sent to a series of FTP servers for
archiving. As well as archiving configurations, it allows scripts to be run on nodes; this allows configurations to be applied en
masse and allows conditional logic so different bits of scripts are run on different nodes.
The idea of storing files without full path is questionable: "In my configuration scheme, each configuration file is in a single
directory or in one of its subdirectories. The configuration files are named uniquely, and the directories denote
machines or platforms rather
than location."
The average developer spends more time navigating, learning, and debugging configuration files than you'd expect.
But you can save that time -- and loads of energy and frustration -- with one of the tools you probably use every day: your CVS
tree. Take these tips on backing up, distributing, and making portable your peskiest Linux� (and UNIX�) config files.
Working with configuration files can be a bewildering part of using Linux and computers in general. No standards exist, though
several have been proposed. For example, Samba and rsync use INI-style configurations; passwd is in a decades-old colon-separated
format that doesn't allow colons in any field; sudo comes with a visudo program to keep people from entering wrong information in
the sudoers file; Emacs uses Lisp for configuration files. And the list goes on...
Now, I'm not complaining about the variety of configuration files. I understand the historical and practical reasons for this
Configuration Tower of Babel. Changing the Samba configuration format, for instance, would annoy thousands upon thousands of administrators.
In another example, Emacs' internal language is Lisp, a powerful high-level language, so using anything else for Emacs configuration
files would be ridiculous.
No, my point is the effect all this variety has on the Linux user: a large portion of a Linux user's computer time is spent learning,
writing, and debugging configuration files. Thus, it is useful to have a system in which these configuration files (1) are backed
up automatically, (2) are distributed automatically, and (3) work on multiple flavors of UNIX and distributions of Linux. This article
explains how to achieve the first two goals, and gets you started on the road to achieving the third one.
We'll use CVS to hold the configuration files. Feel free to use any other versioning system. Subversion is gaining popularity
quickly. The FSF has GNU tla (GNU arch), another nice versioning system. The essential features you need are provided by all those
and many others, including the non-free ones like Rational® ClearCase®.
In my configuration scheme, each configuration file is in a single directory or in one of its subdirectories.
The configuration files are named uniquely, and the directories denote machines or platforms rather than location.
Thus, the file name maps uniquely to a location in the filesystem. For example, passwd will always be used for /etc/passwd,
while cshrc will be used for /home/tzz/.cshrc for user
tzz.
For a few programs I use daily, I'll show how I handle multiple platforms with the help of my configuration system and changing
the configuration files themselves.
All the examples I show use the C shell to set environment variables. Modifying them to use GNU bash or something else should
not be terribly difficult.
You probably already have CVS installed on your machine. If not, get it (see the
Resources section) and install it. If you are using another versioning system, try to set up something similar to what I show
below.
First of all, you need to create a CVS repository. I'll assume you have access to a machine that can be used as a CVS server through
OpenSSH or Pserver CVS access (Pserver is the communication protocol for CVS; see
Resources for more information). Then, you need to create a module called config, which I will use to hold the sample
configuration files. Finally, you need to arrange a way to use your CVS repository remotely non-interactively, through OpenSSH, Pserver,
or whatever is appropriate. This last point is highly dependent on your particular system administration skills, level of paranoia,
and environment, so I can only point you to some information in the
Resources. I will assume you have configured non-interactive (ssh-agent) logins through OpenSSH for the rest of this article.
# assume that /cvsroot is your repository's home
> setenv CVSROOT /cvsroot
# this will use $CVSROOT if no -d option is specified
> cvs init
# check that it worked
> ls /cvsroot
# you should see one directory called CVSROOT
CVSROOT
Now that the repository is set up, you can continue using it remotely (you can do the steps below on the CVS server, too -- just
leave CVSROOT as in Listing 1).
# user tzz, machine home.com, directory /cvsroot is the CVSROOT
> setenv CVSROOT tzz@home.com:/cvsroot
# use SSH as the transport
> setenv CVS_RSH ssh
# use a temporary directory for the module creation
> cd /tmp
> mkdir config
> cd config
# tzz is the "vendor name" and initial is the "release tag", they can
# be anything; the -m flag tells CVS not to ask us for a message
# if this fails due to SSH problems, see the Resources
> cvs import -m '' config tzz initial
No conflicts created by this import
# now let's do a test checkout
> cd ~
> rm -rf /tmp/config
> cvs co config
cvs checkout: Updating config
# check everything is correct
> ls config
CVS
Now you have a copy of the config CVS module checked out in your home directory; we'll use that as our starting point.
I'll use my user name tzz
and home directory /home/tzz in this article, but, of course, you should use your own user name and directory as appropriate.
Let's create a single file. The CVS options file, cvsrc, seems appropriate since we'll be using CVS a lot more.
> cd ~/config
> echo "cvs -z3" > cvsrc
> echo "update -P -d" >> cvsrc
> cvs add cvsrc
# you really don't need log messages here
> cvs commit -m ''
> ln -s ~/config/cvsrc ~/.cvsrc
From this point on, all your CVS options will live in ~/config/cvsrc, and you will update that file instead of ~/.cvsrc. The specific
options you added tell CVS to retrieve directories when they don't exist, and to prune empty directories. This is usually what users
want. For the remaining machines you want to set up this way, you need to check out the config module again and make
the link again.
> cd ~
# set the following two for remote access
> setenv CVSROOT ...
> setenv CVS_RSH ...
# now check out "config" -- this will get all the files
> cvs checkout config
> cd ~/config
> ln -s ~/config/cvsrc ~/.cvsrc
You may also know that Linux allows for hard links in addition to the symbolic ones you just created. Because of the limitations
of hard links, they are not suitable to this scheme. For instance, say you create a hard link, ~/.cvsrc, to ~/config/cvsrc and later
you remove ~/config/cvsrc (there are many ways this could happen). The ~/.cvsrc file would still hold the old contents of what used
to be ~/config/cvsrc. Now, you check out ~/config/cvsrc again. The ~/.cvsrc file, however, will not be updated. That's why symbolic
links are better in this situation.
Let's say you change cvsrc to add one more option:
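The listing that originally followed does not appear in this copy; a minimal sketch of the kind of change meant (the extra option is only an illustration):
# edit cvsrc on any machine and commit the change
> cd ~/config
> echo "checkout -P" >> cvsrc
> cvs commit -m ''
# on any other machine, pick up the change
> cd ~/config
> cvs update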
This is nice and easy. What's even nicer is that the CVS update shown above will update
every file in ~/config,
so all the files you keep
under this CVS scheme will be up-to-date at once with one command. This is the essence of the configuration scheme shown here; the
rest is just window dressing.
Note that once you've checked out a module, there's a directory in it called "CVS." The CVS directory has enough information about
the CVS module that you can do update, commit, and other CVS operations without specifying the CVSROOT variable.
For automatic updates and commits, I have written a very simple Perl program, maintain.pl. The longest part of the program is
the help text, so you can imagine it's not full of complex code. I will go through it regardless, but keep in mind that a shell script
could do the same job if needed.
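The program's source is not reproduced in this excerpt; as the author notes, a shell script could do the same job, so here is a minimal sketch of that idea (the ~/config path comes from the earlier listings, and running it from cron is just one option):
#!/bin/sh
# rough, hypothetical stand-in for maintain.pl: push local changes, then pull everyone else's
cd "$HOME/config" || exit 1
# commit anything changed locally (empty log message, as in the listings above)
cvs commit -m ''
# bring in whatever was committed from other machines (-P -d also come from cvsrc)
cvs update -P -d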
The only thing maintain.pl does not do is make the symbolic links. Since that has to be done just once, and on some systems you
do not want the links
wholesale, the complexity of the task compared to the simplicity of doing it manually was simply too much. I know because I wrote
the symbolic link code and got rid of it later.
I had to write and maintain yet another configuration file that mapped out many filenames. There were many exceptions; for example,
two Linux and Solaris systems I use have radically different setups. There were just too many things to worry about, and I found
that manually installing the links was much easier. Of course, your experience may vary -- I encourage you to try to find the most
appropriate approach for your own environment.
I hope you found this article interesting and useful. Take what you can from it -- I've spent years perfecting my setup, and it should
serve you in good stead.
Convert to this scheme a little at a time; don't get overwhelmed. You can easily spend days rewriting your configurations -- so
do it gradually and you'll enjoy the process.
The greatest benefit you'll see is the automatic update function. On any of your machines, you can commit a file and it will show
up everywhere else the next time maintain.pl is run! Even if you disagree with the directory structure, think about the power of
the automatic updates and how they can be useful to you.
The second benefit you get is configuration archiving. Every version of your configurations will be in the revision control system!
If you make a mistake, you can go back to an earlier version. If you lose a whole machine to, say, disk failure -- you can recover
all the time-consuming configuration files you wrote for it in minutes.
Don't be tempted to convert everything to this scheme. Convert just the things you want to keep or reuse. Binary files don't work
well with CVS -- at the very least, you won't have the diff capability that CVS provides for text files. Also, CVS has
trouble with renaming directories, although it's certainly possible if you also rename the directory in the repository.
Finally, keep good backups of your CVSROOT repository, wherever it is. I hope you never need them.
Essential CVS (O'Reilly & Associates, 2003) by Jennifer Vesperman is a good CVS overview, and
CVS Pocket Reference, 2nd edition (O'Reilly & Associates, 2003) by Gregor Purdy is an excellent quick reference to
CVS -- I highly recommend it.
dotfiles.com is an excellent resource for learning about configuring the C shell, bash, Emacs, and many, many other Linux
and UNIX programs. It's highly recommended; just don't blame us when you spend your whole weekend browsing the site.
OpenSSH is a standard, free, and very good implementation of the SSH protocol.
CVS Pserver is good for allowing anonymous CVS access, but it is insecure.
OpenSSH non-interactive logins with the help of an ssh-agent are explained in
OpenSSH key management (developerWorks, July 2001), a three-part series by Daniel Robbins.
AppConfig is a CPAN module for parsing command-line options and configuration files. In
Cultured Perl: Application configuration with Perl (developerWorks, October 2000), Ted demonstrates how the AppConfig
module can handle local configuration storage for Perl programs, and how such configurations can be stored in a database that
can then be accessed from any machine on the network.
You may also want to read
Understanding Linux configuration files (developerWorks, December 2001), which explains those configuration files on a Linux
system that control user permissions, system applications, daemons, services, and other administrative tasks.
Meanwhile,
Debugging configure (developerWorks, December 2003) discusses what to do when good config files go bad, and an automatic configuration
script doesn't work. Tips for users as well as for developers help you to keep failures to a minimum.
About the author
Teodor Zlatanov graduated with an M.S. in computer engineering from Boston University in 1999. He has worked as a programmer since
1992, using Perl, Java, C, and C++. His interests are in open source work, Perl, text parsing, three-tier client-server database
architectures, and UNIX system administration. Suggestions and corrections are welcome; contact Ted at [email protected]
nothing prevents you from just installing cvs and importing/checking out your config directories. i think it's really not that
much work to justify a distro on its own.
Gentoo offers several choices for managing the configuration files in /etc; one of these options is the dispatch-conf script, which keeps all changes in RCS. This is mostly
for updating... so it's not everything, but it's definitely a strong start, and you could likely use the same system to keep track
of your own modifications.
Just go into your /etc/, do a 'mkdir RCS', and then start checking your config files in and out of RCS to edit them. There's no code
anywhere in linux that says 'if there's a directory I don't recognize, then crash spectacularly', so just adding the RCS directory
itself isn't going to adversely affect anything.
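For readers who have not used RCS directly, a short hedged sketch of the workflow the poster describes (passwd is just a sample file; ci will prompt for a description the first time):
cd /etc
mkdir RCS
ci -u passwd     # check the file in and keep a read-only working copy
co -l passwd     # lock and check it out for editing
vi passwd        # make your change
ci -u passwd     # check the new revision back in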
That's actually a really good idea, too, I'm not sure why I never thought of it myself...
Re:Nothing is stopping you from doing this. (Score:5, Informative)
by Atzanteol (99067) on Monday July 19, @09:14PM (#9743767)
There's no code anywhere in linux that says 'if there's a directory I don't recognize, then crash spectacularly'
I beg to differ... I had an issue just last week where I tried checking /etc into a CVS repository. It turns out that /etc/devfs.d/ doesn't like *anything* in it that doesn't
belong (like a CVS directory). This caused /dev to be very slim upon a reboot, and things like 'hda' et al were
missing.
Now, I'm not sure if this is purely a Gentoo issue or not (I'm not terribly familiar with devfs), but it's something to
remember. Back up /etc/ before doing ANYTHING! lesson learned... :-)
I keep my entire home directory in a Subversion repository. Works great for linux and my windows boxes. Firefox and thunderbird
user directories are compatible across platforms.
I just add 'svn up' to my login script and 'svn ci --message "%HOST%@%TIME%%DATE%"' to my logout script.
No reason it shouldn't work for a whole system with an initial 'svn up' somewhere in rc.local and periodic updates in a cron job.
Just do a commit whenever you change things on your template system and 5 minutes later it'll be on all your boxen.
There was a slashdot article about putting a home directory under version control a few months ago from which I got the idea,
too lazy to find the link at the moment though.
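The crontab entry the previous comment has in mind is not spelled out; a hedged sketch (the five-minute interval and the -q flag are arbitrary choices):
# pull committed configuration changes every five minutes
*/5 * * * * cd /etc && svn up -q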
BitKeeper (Score:2)
by twoflower (24166) on Monday July 19, @08:36PM (#9743457)
Larry McVoy designed BitKeeper with the specific aim of doing this. I believe they also offer special single-user free licenses for
this; you may want to check the BitKeeper documentation to see if there are any Linux distributions who actually took him up on this.
Yes, Gentoo can do this. Just emerge rcs, make an /etc/config-archive dir, set up /etc/dispatch-conf.conf, and just do dispatch-conf in place of etc-update.
I think it was OpenVMS (fuzzy memories of a freshman computer class) that had version control built into the filesystem.
I'm amazed that this hasn't been introduced into the more popular filesystem(s) yet. I've wished for it on many occasions.
Or am I just being impatient? Will Reiser4 provide this capability?
You should really check out a utility for FreeBSD called mergemaster. You run it after rebuilding/upgrading
your system and it compares the latest "vanilla" system configuration files to what you've got.
You can choose to overwrite your file, keep your file or merge the two together. I
like to think of it as the ultimate choice in system housekeeping.
As many people have pointed out, having versioning on the config of a system is hardly a new idea. If you try to make this idea simple
and easy to use, it might end up being something like System Restore for Windows, which stores versions before updates (and, if you're
smart, you make a checkpoint before installing any questionable software or drivers) and then allows you to roll back if something
goes wrong and the uninstall doesn't fix it.
For non-Debian users, download changetrack [sourceforge.net] from
SourceForge.
changetrack uses RCS as its backend, not CVS (support for CVS is on the Todo list), but the end result is the same.
It is specifically intended for tracking system files like those in /etc.
dispatch-conf (Score:1)
by trickycamel (696375) on Tuesday July 20, @09:21AM (#9747791)
Gentoo does this for your files in /etc. Use dispatch-conf and forget about etc-update. You can set it to use RCS, so no more overwrites of your configs.
At work, we have a simple wrapper for vim that does all of the RCS stuff for us, like checking in and checking out files. We use
it on all of our production servers, as it gives us nice revision control over our files.
You'll spend years fooling around with RCS and CVS for configuration versioning before realizing that what you really need is
cfengine. CVS or svn for source code, cfengine for configuration. Cut to the chase:
Cfruby allows managed system administration using Ruby, by David Powers and
Pjotr Prins. It is both a library of Ruby functions
for system administration and a Cfengine-like clone. Cfruby is currently deployed on servers, clusters and workstations. See below
for an introduction on both.
Cfruby can be downloaded from http://rubyforge.org/projects/cfruby/
as a gem. You can also access the SVN repository through the Rubyforge web interface.
It is important to understand that Cfruby is really two in one:
Cfrubylib is a pure Ruby library with classes and methods for system administration. This includes file copying, finding,
checksumming, package management, user management etc. etc.
Cfenjin is a simple scripting language for system administration - allowing for scripting of configuration tasks (without
knowledge of Ruby). Naturally Cfenjin uses Cfrubylib itself.
So, if you are looking for a Ruby API check out Cfrubylib. But if you are looking for a scripting language check out Cfenjin.
To confuse matters more: you can use Ruby mixed with Cfenjin style scripting - but that is for those who have a weird streak -
also known as geekishness.
Cfrubylib
Cfrubylib is a Ruby library for system administration. It can do most of the common tasks like file tidying, editing etc. etc.
Best to study the API and code in:
Why reinvent the wheel? And you'll find it gives a lot more power than most configuration tools. Cfrubylib includes cfyaml - a
YAML configurator. And support for FreeBSD Portage, Linux Debian, Linux Gentoo and OS-X Fink packages. Adding support for your favourite
package manager should be straightforward.
Cfenjin
Cfenjin is a GNU Cfengine clone written in Ruby. It does not offer a full replacement for Cfengine (for one we don't have a client/server
protocol, though cfrubylib has some support for that itself) - but it is Ruby and consists of few lines of code using Cfrubylib.
Documentation has been written, bits and pieces, but for now it is probably the best idea to study the examples in:
I'm assuming that you have Subversion installed; in other words, you should have the svn and svnadmin
commands and they should work properly. I'm also assuming that you'll be performing the following tasks as root.
The ideal situation to begin applying this tutorial is right after your server has been freshly installed. However, for practical
purposes, any server that's configured and running will do.
Okay. That's enough of the lists and introductions. Time for some action.
Creating the Subversion repository
If you're familiar with UNIX, you'll know /var is the customary directory for files that pertain to the whole system
and change over time. So, following tradition, we'll create a /var/preserve/config repository. Type the following command
at your console:
(note the backslashes are being used to add whitespace)
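The command itself seems to have dropped out of this copy; it was presumably an svnadmin invocation along these lines (split with a backslash, per the note above):
[rudd-o@amauta2 ~]# mkdir -p /var/preserve
[rudd-o@amauta2 ~]# svnadmin create \
    /var/preserve/config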
That should create a /var/preserve/config directory, with a couple of files in it. Those files are not meant to be
edited, and they'll be opaque to us for the rest of the tutorial. As usual, I'd advise you to secure that directory so only
root can read and write files to it.
Now, you'll create two directories directly into the repository. You'll use these directories to travel back and forth between
known configuration states.
trunk/ will contain the current configuration
tags/ will contain snapshots (we'll learn more about them later)
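The command is not shown in this copy; a plausible sketch, using svn mkdir directly against the repository URL (the log message is just an example):
[rudd-o@amauta2 ~]# svn mkdir \
    -m 'Created trunk and tags directories' \
    file:///var/preserve/config/trunk \
    file:///var/preserve/config/tags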
The -m argument specifies a message to attach to the operation.
You can consult these messages afterwards through the svn log command.
Preparing the configuration directory
In true UNIX tradition, /etc is the place to go for system-wide configuration. For the rest of the tutorial, I'll
assume those are the files you want to keep in check.
To track files in /etc, you need to both:
place its contents into the Subversion repository
make it into a working copy
That's easily accomplished via the following command:
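The exact command is missing from this copy; a sketch of what presumably does the job is a checkout of the (still empty) trunk directly into /etc, which turns /etc into a working copy while leaving the existing files untouched as unversioned entries:
[rudd-o@amauta2 ~]# svn checkout \
    file:///var/preserve/config/trunk /etc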
Once you've done that,
/etc will be a working copy. Time to add existing files into Subversion.
Checking existing configuration files into the repository
[rudd-o@amauta2 ~]# cd /etc
[rudd-o@amauta2 /etc]# svn status
You should see a long listing of files, like this:
?      4Suite
?      acpi
?      adjtime
The question marks at the beginning of each line mean that Subversion has no idea what those
files are doing there. So, you'll add them to the repository:
[rudd-o@amauta2 /etc]# svn add *
You'll see svn working intensely to add those
files. Note that the files are not being added to the repository yet - they're only being queued for addition. To commit these files
into the repository:
[rudd-o@amauta2 /etc]# svn commit \
    -m 'Initial addition of files'
And svn
should start doing its magic. Once it's done, it'll tell you the revision number.
Followup maintenance
Okay, let's review a few things you need to keep in mind from now on.
When configuration files are added to /etc
Check for added files with svn status /etc. You should see them listed with a question mark.
You should use svn add to add them to the working copy, and then svn commit
the added files into the repository. Many people make the mistake of configuring freshly installed files. Do not do that. Instead,
commit new files first, then edit. That way, you'll have a way to track modifications right
back to the pristine configuration files.
When configuration files are deleted from /etc
Check for deleted files with svn status /etc. You should see them listed with an exclamation sign.
After doing the check, svn delete them. Don't forget to commit at the end.
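A short worked example of both procedures (foo.conf and bar.conf are hypothetical file names):
# a newly installed package dropped foo.conf into /etc
[rudd-o@amauta2 /etc]# svn status /etc      # shows:  ?  /etc/foo.conf
[rudd-o@amauta2 /etc]# svn add /etc/foo.conf
[rudd-o@amauta2 /etc]# svn commit -m 'Pristine foo.conf, as installed' /etc/foo.conf
# a removed package deleted bar.conf
[rudd-o@amauta2 /etc]# svn status /etc      # shows:  !  /etc/bar.conf
[rudd-o@amauta2 /etc]# svn delete /etc/bar.conf
[rudd-o@amauta2 /etc]# svn commit -m 'bar.conf removed along with its package' /etc/bar.conf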
mValent Integrity tracks changes to deployed servers and monitors configuration drift alerting IT teams to potentially critical
problems. By comparing application environments in mValent Integrity for differences in granular configuration items, IT teams rapidly
isolate root causes of production incidents. These teams can then model fixes to problems to validate their impact and automatically
deploy them.
Rich Compare Capabilities - mValent Integrity's Compare function aids troubleshooting by quickly pinpointing differences between
multiple server instances or across infrastructure stacks representing different application environments.
Versioning and Rollback - Running 'snapshots' of application infrastructure environments, plus the ability to recover quickly
from unwanted changes.
Tracking and Alerts - Knowing when a change has been made - no matter what changed - and accurately reporting on the specific
properties before and after the change, gives IT teams early warning on potential problems.
Point-in-Time Views - By keeping a running record of changes to a granular level, mValent Integrity reports on all changes
that occurred between two points in time, or shows that no unapproved changes took place.
Audit Reports - Show the changes made to an individual server or a whole production environment by time period or by user.
System Configuration Collector (SCC) is yet another configuration collector. It consists of a client and a server part. The client collects
configuration data in a structured snapshot, compares the new snapshot with the previous one, and adds differences to a logbook.
Then the snapshot and the logbook are converted to HTML for local inspection. Optionally, the data can be sent to a system running
the server software. On the server, summaries of the data are generated, and search/compare operations on the snapshots and logbooks
are available via a Web interface.
Changes: This release will not update the keep file when running in interactive mode. It ignores differences in the main
log file when moving data to "split" hosts. Split conditions have been extended with a simple process check. A correction for Debian
for large lines with many fields. Include files have been added for logrotate.conf. Includes for Apache have been corrected. Netscape
Fasttrack server has been added.
Remote Server Management Tool is an Eclipse plug-in that provides an integrated graphical user interface (GUI) environment and enables
testers to manage multiple remote servers simultaneously.
Remote Server Management Tool is an Eclipse plug-in that provides an integrated graphical user interface (GUI) environment
and enables testers to manage multiple remote servers simultaneously. The tool is designed as a management tool for those who
would otherwise telnet to more than one server to manage the servers and who must look at different docs and man pages to find commands
for different platforms in order to create or manage users and groups and to initiate and monitor processes. This tool handles these
operations on remote servers by using a user-friendly GUI; in addition, it displays configuration of the test server (number of processors,
RAM, etc.). The activities that can be managed by this tool on the remote and local server are divided as follows:
Process Management: This utility lists the processes running on UNIX and Windows servers. One can start and stop processes.
Along with process listing, the utility also provides details of the resources used by the process.
User Management: This utility facilitates creation of users and groups on UNIX servers; it also provides options for
listing, creating, deleting, and modifying the attributes of users and groups.
File Management: This utility acts as a windows explorer for any selected server, irrespective of its operating system.
One can create, edit, delete, and copy files and directories on local or remote servers. Testers can tail the remote files.
How does it work?
This Eclipse plug-in was written with the Standard Widget Toolkit (SWT). The tool has a perspective named Remote System Management;
the perspective consists of test servers and a console view. The remote test servers are mounted in the Test Servers view for management
of their resources (process, file system, and users or groups).
At the back end, this Eclipse plug-in uses the Software Test Automation Framework (STAF). STAF is an open-source
framework that masks the operating system-specific details and provides common services and APIs in order to manage system resources.
The APIs are provided for a majority of the languages. Along with the built-in services, STAF also supports external services. The
Remote Server Management Tool comes with two STAF external services: one for user management and another for providing system details.
With the growing interest in adopting best practices across IT departments, particularly according to standards such
as the Information Technology Infrastructure Library (ITIL), many organizations are deciding to implement a configuration management
database (CMDB). A CMDB should help them discover and manage the elements in their IT infrastructure so they can better understand
the relationships among components and facilitate changes effectively. This is important because there is a significant business
value in having a single "source of record" that provides a logical model of the IT infrastructure to identify, manage and verify
all configuration items in the environment.
Having reliable data requires more than a database. It requires a well-conceived configuration management strategy; without knowing
what's in your environment, you can't hope to control it, maintain it or improve it.
Since configuration items are at the heart of the CMDB, it's important to understand what they encompass. A configuration item
is an instance of a physical, logical or conceptual entity that is part of your environment and has configurable attributes specific
to that instance. Examples of configuration items would be a computer system (attributes could include a serial number or IP address)
or even an employee (with configurable attributes such as hours worked and department number).
Getting Started: Developing the Right Strategy
Once you have determined that you may need a CMDB, how do you select the approach that's best for you? Everything begins with
ITIL, the industry framework for IT service management. To get started on developing a configuration management strategy, set your
objectives according to ITIL goals, which state that configuration management accounts for all the IT assets and configurations within
the organization and its services. According to ITIL, the ideal CMDB should also provide accurate information on configurations and
their documentation to support all the other service management processes. In addition, it must provide a sound basis for incident
management, problem management, change management and release management. It must be able to verify the configuration records against
the infrastructure and correct any exceptions. If you think that creating a CMDB is a major undertaking, you're right. But it can
be done effectively if you follow the right approach for your organization.
Lessons Learned: The Evolution of the CMDB
The concept of a CMDB has evolved over the years from a collection of isolated data stores to integrated data stores to a single,
central database. Each time, it gets closer to being the source of record for configuration data without taking a toll on the infrastructure.
However, those who have tried these approaches find that they have serious drawbacks that make them difficult or impossible to scale.
A better alternative is the federated data model. This approach features a centralized database linked to other data stores with
a common data model that carries information from one point to another, without the need to rewrite code. I will describe this model
in more detail after providing an overview of how it evolved.
The predecessors to CMDBs, popular in the 1990s, consisted of several applications that stored their own data, including configuration
data. This approach could meet ITIL's goal of accounting for IT assets and services, but because the data wasn't integrated, the
approach fell short of other objectives, such as understanding dependencies and relationships among configuration items. With isolated
data stores, your asset management application may not see data from a discovery application, and your service-impact management
application may not be able to modify service-level agreements.
IT organizations also tried to create CMDBs by directly integrating their various data sources and applications, connecting each
data consumer to each provider from which it needed data. This approach allowed different configuration management processes to share
data, greatly improving the CMDB's usefulness as a means to integrate applications and IT processes they support. But it required
a lot of resources to create and maintain what tend to be brittle, hard-coded connections between systems.
Recently, vendors have been offering a single, all-encompassing CMDB to hold configuration data that's accessible by all applications
that need the data. But an all-encompassing database isn't feasible in a large, distributed organization. It creates an access
bottleneck because all requests for and updates to data pass through the same path. It also requires a massive migration
to get all of your data into the single database, creating a complicated data model that must change if any application integrated
with the CMDB changes.
Putting It All Together With a Federated Data Model
The most effective approach is the federated data model. It's the best way to share configuration data without the high setup
and maintenance costs associated with the pure centralized approach. It puts primary and widely shared configuration-item data in
a common data store and federates other noncritical attribute data from other application databases. According to a recent Gartner
Inc. study ("Defining a Configuration Management Database," by P. Adams and R. Colville, November 2004), "A practical approach for
a successful implementation of a configuration management database will require a federated data model with a consistent view that
receives at least some data from element-specific tools (for example, desktop configuration management, server configuration management,
network management and storage management)."
This federated approach to a CMDB offers a single, common set of information on each configuration item and its relationships
with other configuration items in a manner that can be leveraged by all relevant IT processes -- creating cost-saving synergy among
different service management functions. A federated data model enables you to fully integrate critical service and infrastructure
management applications and break down the traditional functional silos that often exist within an IT organization, all of which
streamlines delivery of IT services.
Important Benefits of a Federated Approach
The CMDB can focus its functionality on configuration items and their relationships. This functionality includes partitions
for multiple "snapshot" versions, reconciliation of data from multiple sources and federated data. The overhead required to provide
this functionality isn't wasted on data that doesn't need it.
You don't have to migrate related data or modify the CMDB to hold this information. With the boundary
drawn at configuration items and their relationships, the question of whether to store some new type of data in the CMDB is already
answered. You store it instead as part of the CMDB Extended Data and save the trouble of changing the data model in the CMDB to
accommodate the new type of data. You also avoid pitfalls inherent in trimming the data model if you later decide to move data
out of the CMDB. In addition, you don't need to undertake several data migrations and application integrations to move your change
requests, help desk tickets and other configuration item-related data into the CMDB. Applications that use this data can continue
to access it where you currently store it.
Transactional data can be stored in databases that are better able to handle a high volume of requests, instead of in the
CMDB. Data is provided more efficiently. Instead of getting all their data from the CMDB, data consumers can
get it from individual data stores that are optimized to provide the specific type of data being requested.
The CMDB doesn't become a bottleneck. With requests for related data on its own being handled by other
databases, the CMDB doesn't have to accommodate all such traffic in addition to configuration item-related requests. You can spread
the load across multiple systems.
What should this federated model look like?
This model refines ITIL's idea of a CMDB by breaking up the CMDB and its infrastructure into three layers. These are the CMDB
itself; related data linked to or from the CMDB, called the CMDB Extended Data; and applications that interact with these two layers,
called the CMDB Environment.
The CMDB and CMDB Extended Data layers together contain the information ITIL suggests be stored in a CMDB. Separating this information
into two layers is what distinguishes the federated CMDB approach from other, less-successful CMDB approaches. The CMDB holds only
configuration items and their relationships. However, not all available configuration-item attributes must be stored in the CMDB.
In fact, to keep the CMDB scalable and manageable, you should store only the key attributes here and link to the less-important ones
in the CMDB Extended Data.
The CMDB Extended Data layer holds related data, such as help desk tickets, change events, contracts, service-level agreements,
a definitive software library and much more. Although these things aren't configuration items, they contain information about your
configuration items and form an important part of your IT infrastructure. In addition, the CMDB Extended Data layer holds any configuration-item
attributes judged as unnecessary to be stored in the CMDB.
The data in the CMDB Extended Data layer is linked to the configuration item data in the CMDB. By definition, federated configuration-item
attributes are linked from their instances in the CMDB, allowing requests to the CMDB to reach these attributes. But for other types
of extended data, the link can be in either or both directions. For example, a change-request record could have a link through which
you can access the instances of the configuration items it will change, and each configuration-item instance could have a link through
which you can access the change requests that affect it.
To pursue ITIL's goals for configuration management, you should consider the advantages of a federated data model and what it
can do for you.
Doug Mueller is the chief technology officer at the Service Management business unit of BMC Software Inc. and a co-founder
of Remedy Corp., now a part of BMC.
The software is designed to help businesses unify service- and infrastructure-management tools to promote database management
consistency and simplified integration among processes.
BMC Software on Monday will announce the availability of its Atrium Configuration Management Database (CMDB),
intended to help customers unify their service and infrastructure management.
Based on industry-standard IT Infrastructure Library requirements for enterprisewide database management with consistency and
simplified integration among different management processes, the CMDB is also the first offering by BMC to be branded under the Atrium
name, says Andrej Vlahcevic, senior product marketing manager for change and configuration management at BMC.
Over the course of the year, BMC plans to introduce other management products under the Atrium brand. "A lot of people see a CMDB
as a common set of information that captures data on the configuration and relationship of items in your IT environment," Vlahcevic
says. "We believe it has to be more." The Atrium database was designed to integrate both service and infrastructure-management applications,
he says, as well as complement the company's existing line of discovery tools.
The Atrium CMDB includes a reconciliation engine that lets users combine input from multiple data sources and identify and reconcile
any differences to establish a configuration profile. "If you don't have strong reconciliation, the CMDB will end up with repetitive
data that ultimately will create confusion," Vlahcevic says.
The Atrium CMDB was designed with industry standards in mind, he says, including those endorsed by the Distributed Management
Task Force and the Common Information Model. The platform supports all primary IT Infrastructure Library configuration item classes
and more than 80 potential relationship types that can be leveraged to characterize an IT environment.
The Atrium CMDB is integrated with eight existing BMC applications, including the IT Discovery Suite, Service Impact Manager version
5.0, and Remedy IT Service Management Suite version 6.0. It's available now and can be purchased as part of any BMC Remedy IT Service
Management version 6.0 products and the Service Impact Manager version 5.0.
PIKT® is a registered trademark of the University of Chicago. Copyright © 1998-2005 Robert Osterlund. All rights reserved.
PIKT is cross-categorical, multi-purpose software to monitor and configure computer systems, report and fix problems,
manage system security, arrange job scheduling, format documents, install files, assist command-line work, and perform
many other common systems administration tasks. PIKT is used primarily for system monitoring,
and secondarily for configuration management, but its flexibility and extendibility evoke many other uses limited only by
your imagination. One reviewer said of PIKT, "this is by far one of the most interesting/powerful
tools I have seen for Linux administration." Another wrote that PIKT "excels at handling a diverse
collection of machines, saves time and eliminates repetition, and gives you a global view of your site."
PIKT has been compared favorably to commercial software costing hundreds of thousands of dollars. Yet
PIKT costs you nothing! Who uses PIKT? The
answer might surprise you. To learn more,
read the Introduction pages. For example uses and configurations,
visit the Samples pages.
What is PIKT
An acronym: Problem Informant/Killer Tool.
An innovative new paradigm for administering heterogeneous networked workstations.
System monitoring software that both reports and, wherever possible, fixes problems.
A cross-categorical, multi-purpose toolkit with uses limited only by your imagination.
A content management system for websites, an aid in configuration management, and a basis for building system security.
An embedded scripting language and accompanying script interpreter.
A sophisticated script and system configuration file preprocessor for use with the Pikt scripting language or any other scripting
language of your choice.
A cross-platform, centrally run job scheduler (like cron), customizing file installer (like rdist), command shell enhancement,
and comprehensive script and configuration management software.
Available for Linux and many other flavors of Unix, including AIX, Digital UNIX, FreeBSD, HP-UX, IRIX, OpenBSD, Solaris, and
others.
PIKT is Open Source software distributed under the GNU GPL.
What PIKT is not
A GUI-enhanced system. This is in the works, but as an option only.
An all-purpose programming language. PIKT is tailored to systems administration, designed to
work hand-in-hand with other languages, not to replace them.
Available for Mac OS X and other Unix variants. (Not yet anyway.) Nor is it available for Windows. (Perhaps some day.)
The last word in system monitoring software. (But maybe it's farther back in the dictionary.)
Why the name "PIKT"?
PIKT is like a military picket, "a group of soldiers or a single soldier stationed, usually at an outpost,
to guard a body of troops from surprise attack" (Webster's New World College Dictionary).
A picket's primary mission is to warn of the enemy's advance, but to fight if necessary. Similarly, PIKT's
primary task is to warn of problems, but to fix those problems when needed.
This document is a basic introduction to a few useful tools for a sysadmin who wants to install an OS, perform simultaneous
operations on multiple machines via ssh, and upgrade an already installed machine using an automatic (or manual) procedure. For
more detailed information please refer to the bibliography added in the following paragraphs.
IMPORTANT: this document is based on our experience with a farm running Scientific Linux CERN 3.0.4 and should not be considered
a general guide. It is already described in another document
how it is possible to set up a kickstart installation server. Here we will add only a few notes about the customization of the kickstart
file, providing an example:
that must be changed according to a specific site configuration. This example was written with the idea of installing a Scientific
Linux CERN OS, from which we removed a few packages (or turned off a few services) not strictly needed for machines not located at CERN.
To find all the possible options for a kickstart file please refer to:
In our kickstart file example it is shown how it is possible to add (or remove) different groups of packages, for example:
@ Text-based Internet
adds the packages mutt, fetchmail and elink.
It is possible to use the graphical tool redhat-config-packages to show the full list of packages in a group like Text-based
Internet.
To add/remove a single rpm it is possible to use a single line like:
-phone
to exclude the installation of the phone rpm. Vice versa, to add an rpm it is possible to use:
+<package name>
for example, if you want to install wget it is sufficient to add:
+wget
In the kickstart file it is possible to include operations to be performed after the OS installation, at the first
reboot. In our kickstart file a few examples are present as a reference, in the %post section. We will comment on them in the APT
section.
As an example if you want to configure the INFN AFS cell add the following lines in the post-install section of the kickstart file:
In this distribution by default APT comes with CERN configuration to use the CERN RPM repository. Details of this configuration,
with some explanation about apt commands, are available here:
In our kickstart file example we included a post-install section to re-configure APT in order to use a local RPM repository (see
also http://grid-it.cnaf.infn.it/fileadmin/sysadm/akserver/akserver.html).
You can change the APT sources.list.d configuration via post-install:
mv /etc/apt/sources.list.d/dag.list /etc/apt/sources.list.d/dag.list.orig
cat >> /etc/apt/sources.list.d/local.list <<EOF
# Your local repository
rpm http://<YOUR_KICKSTART_SERVER> rep/slc304-i386 os updates extras localrpms
EOF
where <YOUR_KICKSTART_SERVER> is your RPM server configured for APT usage. Our re-configuration will add a local repository
(localrpms) that could be used to customize your OS including for example ``private'' RPMs (example: ssh configuration, tools, ....).
In our kickstart example we included the APT preferences modification to give higher priority to all the RPMs in the
localrpms section of the repository.
mv /etc/apt/preferences /etc/apt/preferences.orig
cat >> /etc/apt/preferences <<EOF
# Maximum priority to local rpms
Package: *
Pin: release c=localrpms
Pin-Priority: 1001
EOF
For example, in CERN-SL pine is installed via a CERN-customized rpm. If you put a ``plain'' pine rpm in the
localrpms repository, it will replace the previous one after apt-autoupdate runs for the first time.
Also, even if a higher version of the ``CERN'' pine becomes available in CERN-SL, apt-autoupdate will preserve the ``localrpms'' one.
It is also possible to use the pin mechanism for a single rpm instead of a whole directory, for example for the sylpheed package,
by including in the APT preferences:
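The stanza itself is missing from this copy; presumably it mirrors the localrpms stanza above, restricted to one package:
Package: sylpheed
Pin: release c=localrpms
Pin-Priority: 1001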
Nearly every system administrator tasked with operating a cluster of Unix machines will eventually find or write a tool which
will execute the same command on all of the nodes.
At Fermilab a tool called "rgang" has been created, written by Marc Mengel, Kurt Ruthmansdorfer, Jon Bakken (who added "copy mode")
and Ron Rechenmacher (who included the parallel mode and "tree structure").
The tool was repackaged as an rpm and is available here:
It relies on files in /etc/rgang.d/farmlets/ which define sets of nodes in the cluster.
For example, "all" (/etc/rgang.d/farmlets/all) lists all farm nodes, "t2_wn" lists all your t2_wn nodes, and so forth.
The administrator issues a command to a group of nodes using this syntax:
rgang farmlet_name command arg1 arg2 ... argn
On each node in the file farmlet_name, rgang executes the given command via ssh, displaying the result delimited by a node-specific
header.
"rgang" is implemented in Python and works forking separate ssh children which execute in parallel. After successfully waiting on
returns from each child or after timing out it displays the output as the OR of all exit status values of the commands executed on
each node.
To allow scaling to kiloclusters it can utilize a tree-structure, via an "nway" switch. When so invoked, rgang uses ssh to spawn
copies of itself on multiple nodes. These copies in turn spawn additional copies.
Users will need to have Python (tested on Python 1.5.2 and 2.3.4) installed too. A "frozen" version of rgang that does not need
any additional package is also supplied; it can be found in /usr/lib/rgang/bin/.
A "pre-script" (/usr/bin/rgang) has been created that sets the appropriate environment variables and then execs the Python
script or the "frozen" version. You have to change the name of the executable depending on which one you are planning to use. In the Python
case:
#!/bin/sh
pathToRgang=/usr/lib/rgang/bin
rgOpts="--rsh=ssh --rcp=scp"
# this has to be uncommented if you have a Python version over 2.3
#pyOpts="-W ignore::FutureWarning"
exec python $pyOpts $pathToRgang/rgang.py $rgOpts "$@"
if you need to use the frozen version modify the pre-script
as follows:
#!/bin/sh
pathToRgang=/usr/lib/rgang/bin
rgOpts="--rsh=ssh --rcp=scp"
# this has to be uncommented if you have a Python version over 2.3
#pyOpts="-W ignore::FutureWarning"
exec $pathToRgang/rgang $rgOpts "$@"
The following lines show by example the typical usage of 'rgang'; refer to the documentation or to the usage/help from 'rgang -h'
for the full set of options.
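Since the examples themselves did not survive in this copy, here is a hedged sketch that uses only the syntax documented above (the farmlet names come from the earlier /etc/rgang.d/farmlets/ examples):
# run a command on every node of the cluster
rgang all uptime
# query only the worker nodes defined in the "t2_wn" farmlet
rgang t2_wn "rpm -q openssh-server"
# see the full list of options
rgang -h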
ATA disks
Serial ATA hard drives are identified by the BIOS (with few exceptions) as /dev/sda, /dev/sdb, ... instead of /dev/hda, /dev/hdb, ...;
be careful in your kickstart file.
It could be useful to distribute the public SSH key from your mother-node to your target-nodes so that you can use ssh-agent for authentication.
To create a key on your mother-node:
ssh-keygen -t dsa
then to copy the public key to the target-nodes in interactive mode (``-pty''):
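The exact command is not present in this copy; a heavily hedged sketch that uses only the documented 'rgang farmlet command' syntax plus the -pty option mentioned in the text (check 'rgang -h' for the actual copy options on your installation):
# append the mother-node's public key to authorized_keys on every node of the "all" farmlet;
# -pty gives an interactive terminal so you can answer the password prompts
PUBKEY=$(cat ~/.ssh/id_dsa.pub)
rgang -pty all "mkdir -p ~/.ssh && echo '$PUBKEY' >> ~/.ssh/authorized_keys"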
and type the pass-phrase you chose when you created the key; then use 'rgang' as usual (without the interactive option).
Freshmeat admin script selection:
FAI (fully automatic installation) is a non-interactive system
to install a Debian GNU/Linux operating system on a group of PCs or a Linux cluster. After installation, the systems are fully configured
and ready to run. It is a scalable method for performing unattended installation and updating. Changes to the configuration files
of the operating system are made by cfengine, shell, and Perl scripts.
survey is a nearly complete list of your system's
configuration files. It also lists installed packages, hardware info, dmesg output, etc. The resulting printout is 25-50 pages in
size, fully documenting your system. Survey is invaluable after a blown install or upgrade when it is too late to get this information.
Large organizations could use survey to document every Linux system.
fs-check checks filesystem sizes to see if
they are getting too full. It uses a configuration file that specifies the filesystems to check, email contacts, trigger thresholds
(percentage or amount used/unused), and a report program to run. It includes fs-report, which shows things like the largest files,
the newest files, and core files. It can be run from cron or as a daemon.
super-session is a substitute for your
skel files. The skel files define all the environment when you login, including shell prompt, environment variables etc. This package
introduces modular and portable configurations. The same configuration can be used in any UNIX system, including Linux, AIX, Solaris,
Cygwin (UNIX tools for Windows), with no changes. It will detect your OS automagically and set the best TERM, prompt etc. In addition,
it will start an SSH agent, and ask for password, if an SSH private identification is found.
ConfigEdit is a powerful, text-based
configuration management system which can be used to manage configuration files of all machines on a network. All configurations
are stored within a repository and whenever a file is updated, it controls version information and can trigger user-defined commands
(e.g., scp). Rolling back to an older version is very easy. Triggering commands (like "scp file account@myserver1") on each new version
can be used to automatically distribute the new version to all machines that use that file.
iBackup simplifies the task of backing
up the system configuration files (those under /etc) for Solaris, *BSD, and Linux systems. You can run it from any directory
and it will, by default, save the (maybe compressed) tarball to /root. It is possible to encrypt the tarball, to upload the tarball
to another host, and to run the backup automated in a cron job. You can also create a nice HTML summary of a system using the included
sysconf.
ConfigNatural is a Perl module that can
read generic configuration files with a simple but very flexible syntax. It has very few constraints on the data, and has support
for multi-line values and nested lists. Its object oriented API is compatible with modules like CGI or HTML::Template, and it can
be safely associated with the latter.
siga, System Information GAthering, collects various
information on your SuSE Linux System, and outputs it in HTML or ASCII format. You can edit your configuration files locally or remotely.
It works with and without X, on different browsers and editors.
FileReader is a Perl module which allows
you to easily read, write, and rewrite your configuration files. This module can modify Perl source code for supporting a configuration
file, thanks to the generateConfFile() function. It also has many more features.
vestaweb is a CVSWeb-inspired interface to
the Vesta Configuration Management System. It allows browsing of the full repository, diffing any version of packages, and can optionally
allow checking packages in and out.
Vesta Configuration Management System For substantial software systems
(say, 500k source lines or larger), effective software configuration management (SCM) is a serious problem. It is not generally known
how to make a configuration management system that is both easy to use and general enough to handle multi-million line software projects.
As a result, the world is full of easy-to-use, small-scale configuration management tools and large-scale, hard-to-use ones. Vesta
is a portable SCM system targeted at supporting development of software systems of almost any size, from fairly small (under 10,000
source lines) to very large (10,000,000 source lines). Vesta is a mature system. It is the result of over 10 years of research and
development at the Compaq/Digital Systems Research Center, and it was in production use by Compaq's Alpha microprocessor group for
over two and a half years. The Alpha group had over 150 active developers at two sites thousands of miles apart, on the east and
west coasts of the United States. The group used Vesta to manage builds with as much as 130 MB of source data, each producing 1.5
GB of derived data. The builds done at the eastern site in an average day produced about 10-15 GB of derived data, all managed by
Vesta. Although Vesta was designed with software development in mind, the Alpha group demonstrated the system's flexibility by using
it for hardware development, checking their hardware description language files into Vesta's source code control facility and building
simulators and other derived objects with Vesta's builder. The members of the former Alpha group, now a part of Intel, are continuing
to use Vesta today in a new microprocessor project. In the latter half of 2001, Vesta was released by Compaq under
the GNU LGPL.
System Audit Script collects the metadata
that is not normally included in a typical UNIX filesystem backup (e.g., disk partitioning info, logical volume info, network configuration,
etc.) This information can then be used to perform a bare metal recovery of a system.
MID The Machine Inventory Database is a Perl-based
CGI interface to manage the machines on and off your network, both from the IP assignment perspective and the asset-tracking perspective.
On top of acting as a frontend to a handful of MySQL tables, it handles IP assignment and acts as a frontend to the configuration
files for BIND, YP, and DHCPD to reduce the chance for typos in the configuration files which tend to bring down service.
System Configuration Collector (SCC) is yet another
configuration collector. It consists of a client and a server part. The client collects configuration data in a structured snapshot,
compares the new snapshot with the previous one, and adds differences to a logbook. Then the snapshot and the logbook are converted
to HTML for local inspection. Optionally, the data can be sent to a system running the server software. On the server, summaries
of the data are generated, and search/compare operations on the snapshots and logbooks are available via a Web interface.
confstore is a configuration backup utility.
It scans a system for all recognised configuration files and then stores them in a simple archive. It knows what to scan for by reading
a definitions file. Confstore can also restore configuration from backup archives it has previously created.
Scmbug is a system that integrates software
configuration management with bug-tracking. It aims to be a universal tool that can glue any source code version control system (such
as CVS, Subversion, and arch) with any bug-tracking system (such as Bugzilla and GNATS)
3/25/2004 In this interview we learn how the System Configuration Collector (SCC) project began, how the software works, why Siem
chose to make it open source, and information on future developments.
Introduction:
Have you ever noticed changes on your departmental server, but couldn't quite pinpoint what exactly happened? How many times have
staff forgotten to make an entry in the log-book, or the entries made were not detailed enough? Administrators are faced with these
problems on a day-by-day basis. The System Configuration Collector (SCC) project attempts to automate this process. Rather than depending
on staff to keep accurate records, SCC enables a system to record all changes taking place. Additionally, the software has the functionality
to send all configuration data to a central server so that it can be analyzed when needed.
LinuxSecurity.com: Please tell us about the SCC project and how it began. When did it start, and who are some of
the key contributors?
Siem Korteweg: In 2001 a younger colleague asked whether it was possible to automatically
track the changes that were made to the configuration of a system. I told him that was impossible due to the variable nature of the
output of the commands we have to use to show the configuration of a system. Being a much younger colleague, he accepted this answer.
But I did not like having said it was "impossible", and it kept nagging me.
I thought that when I could split the variable and fixed parts of the output of system commands, I would be able to track changes.
I started a small, hobby project by collecting configuration data and preceding each line with "fix:" or "var:". After some time
I was able to detect some changes made to configuration. But when a kernel parameter was changed, all I saw was a change from
128 to 256. I had to search in the snapshot to find out what part of the configuration had changed. Therefore I extended the fix-var
classification with a hierarchy of keywords indicating the nature of the data.
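As a rough illustration of that fix/var idea (an added sketch, not SCC's actual code or snapshot format), tagging command output in Perl might look like this:
#!/usr/bin/perl -w
use strict;
# Tag command output so that later comparisons can skip volatile data:
# the kernel release is not expected to change ("fix"), while the load
# averages change on every run ("var").
for my $probe ( [ 'fix', 'kernel:release', 'uname -r' ],
                [ 'var', 'system:load',    'uptime'   ] ) {
    my ($class, $keywords, $command) = @$probe;
    open my $out, '-|', $command or die "cannot run $command: $!";
    while (my $line = <$out>) {
        chomp $line;
        print "$class:$keywords::$line\n";
    }
    close $out;
}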
The development continued, and the customer where I was developing the software was wondering how to maintain this software
without hiring me indefinitely. By that time I realized that this software also could/should be used by others. I talked to the
manager of the customer and to the manager of the company I work for and suggested making SCC a GPL project. They both
agreed and from then on, SCC was an Open Source project. To extend the collection of configuration data I looked at the code of
cfg2html and check.sh (HP specific) and the FAQs of several newsgroups. At the customer site where I started developing SCC,
we deployed the software on some 300 systems. This gave us a great opportunity to tune the "fixed" and "variable" parts of the
configuration to avoid unnecessary changes.
The first versions of the software collected configuration data and converted the data and logbook to HTML on a per system
basis. At the customer site, Bram Lous started to collect all snapshots and logbooks on a server and built the first version of
the CGI-interface. Later on, Paul te Vaanholt contributed much of the work on the HP OpenView modules. His main contribution is the analysis
and conversion to SCC-format of the Operations Center database. A colleague, Oscar Meijer, wrote the Windows version of the SCC-client,
based on WMI and WSH. The configuration of the data we are collecting on Windows systems still needs to be tuned. The software
itself is stable, but it detects too many changes. The whole process of tuning what data is "fixed" and what data is "variable"
takes quite some time.
LinuxSecurity.com: What is the most important benefit an administrator can get out of SCC? How can this improve
the overall security of a network or host?
Siem Korteweg: Each administrator should document his/her systems. We all
know that, but we all lack the time to do this properly. SCC automates the documentation process. For HP-UX systems, more than
95% of the configuration of the system is covered by SCC. For other systems the percentage is somewhat lower at the moment.
The logbooks and snapshots can assist administrators in finding the cause of an incident. Configuration changes can have
unwanted side-effects (on other systems). By examining the logbooks for the changes during the last days/weeks, an administrator
might find the cause of an incident more easily and quickly. Another way of using the SCC-data to find the cause of an incident is to compare
(parts of) the configuration of a system with a comparable system that does function correctly.
Comparing the configuration of systems can also be used to assure that the systems in a cluster are consistent and identical.
Do they run the same (versions of) software? Do they have the same kernel-configuration? It is also possible to check your security
policies. Just check the snapshots on the server for the aspects of the policies. By default the server checks and signals accounts
without a password.
Another use of the SCC-data on the server is to quickly identify systems. After an advisory from Sun, I was able to identify
within one minute the 100 systems that needed to be addressed out of a total of 600 systems. Because the selection was
automated and because the collection of SCC-data was accurate and up to date, I did not miss a system. This obviously contributes
to the safety of the network.
LinuxSecurity.com: How difficult is it to get started? How long would it take for an administrator to get the system
fully setup? Can you describe at a high level the steps necessary to setup SCC?
Siem Korteweg: The easiest way to start and get the feeling of the software is to
install only the client part and keep the data and logbook on the client. Just create a simple cron-job after the installation
of the client and you are finished. This way you are able to pilot the software before you deploy it more widely.
The setup of the server takes some more steps. First you have to decide how to transport the SCC-data from the clients to the
server. Supported mechanisms are email (optionally encrypted, using OpenSSL), scp, rcp and cp. Then set up the webserver
to display the data. To achieve this, you have to indicate the path under the document-root and point it at the CGI-script of SCC.
Then schedule a cron-job to transfer the SCC-data that is sent by the clients from the transfer-area to the website. Finally, all
cron-jobs of the clients have to be extended with the proper options to transfer the SCC-data to the scc-server.
LinuxSecurity.com: What improvement would you like to make in the future? What direction is this project heading?
Siem Korteweg: When running SCC on a system that uses clustering software, like
MC ServiceGuard from HP, switching a "package" from one system to another results in changes to the SCC-data for both systems
involved in the switch. We want to make the software cluster-aware by extracting the configuration data for each package and sending
it separately to the scc-server.
Another future extension is the collection of the configuration of network devices like routers and switches.
LinuxSecurity.com: What advantage does SCC have over using a typical pen & paper log book for recording system changes?
Siem Korteweg: It is automated, so it does not "forget" to record a change (supposing
the changed attribute is part of the SCC-snapshot), and it is not lazy (once you run it through cron).
The pen & paper logbook is a physical item that can only be in one place. Each admin of a group of systems can be at a different
place, without access to the paper logbook. Consider 24x7 systems, where the admins "follow the sun".
By consolidating all snapshots on a system with scc-srv, you obtain a lot of data that can be searched automatically. This enables
you to quickly identify the systems that need an update, or to compare two systems when one of them does not function correctly.
This is impossible with pen & paper.
LinuxSecurity.com: What operating systems does SCC run on? What type of license is it under?
Siem Korteweg: HP-UX, Solaris, AIX, Linux (RedHat, Suse, Gentoo). As the code of
SCC only uses "standard" Unix tools, I think it runs on almost all Unix/Linux systems. The coverage of the configuration data
depends on the OS. For example the coverage of HP-UX configuration is more than 90%. For other systems this will be less. The
license is GPL.
LinuxSecurity.com: If an administrator needs assistance setting up or configuring SCC is support available? If so,
how can support be obtained?
Siem Korteweg: Besides the documentation on our website, SCC comes with documentation
and manual pages. We offer an implementation service, where a consultant visits a customer and installs the server and at most
5 clients and introduces the software to the admins of the customer. This is only feasible in the Netherlands. Otherwise, support
via email is possible. When the requested support is more than a few simple questions, we have to agree upon payment.
LinuxSecurity.com: How does SCC differ from other similar configuration collectors? What are some of the strengths
and weaknesses of SCC?
Siem Korteweg: SCC collects configuration data without formatting it immediately
to HTML. Instead it prefixes each line of configuration data with fix/var and a hierarchical classification. This makes it easy
to process the snapshots. The processing consists of comparing consecutive snapshots to generate the logbook, formatting the snapshot
to HTML and comparing the snapshots of two systems to determine the differences.
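To make the comparison step concrete (again an added sketch rather than SCC's code; snapshot.old and snapshot.new are hypothetical file names), a consecutive-snapshot diff over the "fix:" lines could be as simple as:
#!/usr/bin/perl -w
use strict;
# compare two snapshots; only "fix:" lines take part in the comparison,
# because "var:" lines are expected to differ between runs
sub read_fixed {
    my ($file) = @_;
    open my $fh, '<', $file or die "cannot open $file: $!";
    my %lines = map { chomp; ($_ => 1) } grep { /^fix:/ } <$fh>;
    close $fh;
    return \%lines;
}
my ($old, $new) = (read_fixed('snapshot.old'), read_fixed('snapshot.new'));
print "removed: $_\n" for grep { !exists $new->{$_} } sort keys %$old;
print "added:   $_\n" for grep { !exists $old->{$_} } sort keys %$new;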
The philosophy of SCC is to collect data, not to judge its value or correctness. Stupid configuration errors in Apache/Samba
are not detected by scc; that should be done at the server where all snapshots are collected. Some might question the value of
all the data in the snapshots. It is true that a considerable part of the snapshots will never change during the lifetime of a
system. Nevertheless this data is collected, just in case someone needs it someday.
One commercial configuration collector works by allowing remote root-access to all clients from their server. This is not very
security minded. I had security in mind when coding scc and scc-srv.
A weakness of SCC is that I coded the classifications of all collected configuration data. This classification has to be used
when an admin wants to view specific information. I decided to store cron configuration data under classification "software:cron:"
and swap info under classification "system:swap:". Each user of SCC has to follow my intuition.
Another weak point is that the clients are autonomous. The scc-srv can be DoSed by mailing many snapshots from seemingly different
systems. Therefore, I suggest installing scc-srv only in a "trusted" network. Finally, scc has to do "reverse engineering" to
collect for example the Apache configuration. Apache can be installed and configured in dozens of different locations. We have
to determine the correct paths and files from the running processes.
LinuxSecurity.com: How can the project benefit from the open source community?
Siem Korteweg: The project can benefit from the open source community when admins
use it and contribute their extensions. These extensions can cover the specific applications/hardware/OS they use, or new features. At
the moment some people already contribute knowledge of specific software. Feedback concerning the strong and weak aspects admins
experience while they are using SCC is also valuable.
Areas for future extensions are SAN/NAS and network devices. I am looking for people and organisations that are willing to
contribute in any way in these areas.
About: Alist is a program that collects hardware and software information about systems
and stores it in a database for users to browse and search via a Web interface. The program consists of three parts: a client
portion that collects the information, a daemon that receives data sent from clients, and a CGI that displays and lets you search
for information. Clients for Solaris, Linux, FreeBSD, OpenBSD, and Mac OS X are currently available.
Changes: There is a new Windows module (MSWIN32.pm), a new Irix module (irix.pm), bugfixes
for the Linux module on Debian, and bugfixes for client/alist and hpux.pm.
Alist is written entirely in Perl 5. The server portion has been tested on Linux, Solaris, and Mac
OS X, and should run without any problems on any modern Unix OS, but may not work on non-Unixlike operating systems, due to calls
to fork(). The server needs to have a web server, Perl 5, and the Perl CGI.pm module.
The client portion requires Perl 5, but no modules outside the core distribution. There
are currently clients for Solaris, Linux, OS X, FreeBSD, OpenBSD, Windows, HP-UX, and Irix. Clients explicitly tested can be found
here.
BitMover builds and markets enterprise level development tools for software and web developers. Our flagship product is
BitKeeper, a powerful replicated and distributed configuration
management system. BitKeeper is supported on most
platforms, such as Microsoft Windows as
well as the various commercial and free Unix platforms. See the
products section for more information about BitKeeper and our other products.
Never used BitKeeper? Take the test drive and see how easy it
is to get started!
Please enjoy our web site and let us know if there is anything we can do for you.
About: ITracker is a Java J2EE issue/bug tracking system designed to support multiple projects with independent
user bases. It supports features such as multiple versions and project components, detailed histories, issue searching, file attachments,
dynamic reports with charts, and multiple email notifications.
This article, the third one in a series on team development in IBM WebSphere Studio Application Developer, focuses on installing
and configuring CVS on RedHat Linux 7 as an SCM Repository. WebSphere Studio Application Developer (hereafter called Application
Developer) works seamlessly with CVS, the dominant open-source, network-transparent version control system. CVS runs on most platforms,
including Windows, Linux, AIX, and UNIX. Installing it with Application Developer on RedHat Linux has several advantages:
Linux is now the dominant open-source operating system.
RedHat is one of the major distributors of Linux.
CVS is included in the RedHat Linux 7 distribution.
CVS for Linux is stable, reliable, and scalable, and is useful for individual developers and small teams as well as large,
distributed teams.
Application Developer runs on RedHat Linux 7.
When using Application Developer, you can use CVS as a local repository or as a shared repository for the entire team.
However, installing and configuring CVS for Linux is not trivial and there is little good documentation available. The step-by-step
instructions below should help system administrators configure CVS for Linux for developers using Application Developer.
If you are serious about automating system administration, cfengine is a tool you should know. Ignoring cfengine
is a viable option only if you like to spend your days in the vi editor.
cfengine is a system configuration engine. It takes configuration scripts as input, and then takes actions based
on these scripts. It is currently at version 1.6.3 (a very stable release), and version 2.0 is on the horizon. For more information
on cfengine development, visit the cfengine Web site (see Resources later in this
article).
You don't have to use everything cfengine offers, and you will probably not need the whole thing all at once. Your
cfengine configuration files should start out simple, and grow as you discover more things that you want automated.
From the cfengine command reference, here are its most notable features:
File permissions and ACLs can be monitored and fixed. For example, /etc/shadow can be kept with 0400/root/sys permissions,
and if those permissions change, you can either warn the system administrator or fix them immediately.
NFS filesystems can be automatically mounted or unmounted, with the corresponding fstab changes.
Netmasks, DNS configuration, default routes, and primary network interfaces can be administered through a single file;
Files and directories can be recursively copied to another location, either locally or from a remote server.
Files can be edited (this is a very powerful feature, offering regular expressions and global search/replace), rotated
(log files, for instance), or deleted.
Files (singly and/or everything in a directory or matching a regex) and whole directories can be linked.
Processes can be started, killed, restarted, or sent arbitrary signals based on regular expression matches in the process
table.
Arbitrary commands can be run.
All of the above can be conditional upon the operating system type and revision, time of day, arbitrary user-defined classes,
presence or absence of files, directories, or data in files, and so on.
Even though you can do with Perl all the things that cfengine does, why would you want to reinvent the wheel? Editing
files, for instance, can be a simple one-liner if you want to replace one word with another. When you start allowing for system subtypes,
logical system divisions, and all the other miscellaneous factors, your one-liner could end up being 300 lines. Why not do it in
cfengine, and produce 100 lines of readable configuration code?
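For the really simple case mentioned here, replacing one word with another in a single file, the Perl one-liner might be no more than this (an added illustration; the file name and the two words are placeholders):
perl -pi.bak -e 's/\bwonka\b/willy/g' /etc/hosts
The -p flag loops over and reprints each line, and -i.bak edits the file in place while keeping a backup copy.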
From my own experience, introducing cfengine to a site is quite easy, because you can start out with a minimal configuration
file and gradually move things into cfengine over time. No one likes sudden change, least of all system administrators
(because they will get blamed if anything goes wrong, of course).
Configuration file management
Managing configuration files is tough. You can start by considering whether cfengine is adequate for the task. Unfortunately,
cfengine's editing is line oriented, so complex configuration files will probably not be a good match for it. But simple
files such as the TCP wrappers configuration file /etc/hosts.allow are best done through cfengine.
Usually, you will want to keep more than one version of configuration files. For instance, you may need two sets of DNS configurations
in /etc/resolv.conf, one for external, and another for internal machines. The external DNS resolv.conf file could, naturally, go
into a directory called "external", while the internal resolv.conf could go into the corresponding "internal" directory. Let's assume
both directories are under a global "spec" directory, which is a sort of root for configuration files.
The following code will traverse the spec directory, searching for a filename suitable for a given machine. It will start at /usr/local/spec
and go down, looking for files that match the one requested. Furthermore, it will check whether or not each directory's name is the
same as the class belonging to some machine. Thus, if we request locate_global('resolv.conf', 'wonka'), the function
will look under /usr/local/spec for files named resolv.conf that are in either the root directory, or in children of the root directory
whose names match the classes that the "wonka" machine belongs to. So, if "wonka" belongs to the "chocolate" class, and if there
is a /usr/local/spec/chocolate/resolv.conf file, then locate_global() will return "/usr/local/spec/chocolate/resolv.conf".
If locate_global() finds multiple matching versions of a file (for instance, /usr/local/spec/chocolate/resolv.conf
and /usr/local/spec/resolv.conf), it will give up. The assumption is that we are better off with no configuration than with one of
the two wrong ones. Also, note that machines can belong to more than one class.
You can build on this structure. For instance,
/usr/local/spec/external/chocolate/resolv.conf
/usr/local/spec/internal/chocolate/resolv.conf
/usr/local/spec/external/sugar/resolv.conf
/usr/local/spec/internal/sugar/resolv.conf
will contain files for external and internal "chocolate" and "sugar" machines. You just have to set up your machine_belongs_to_class()
function correctly.
Once locate_global() returns a file name, it's pretty simple to copy it to the remote system with scp or rsync. Remember,
always preserve the permissions and attributes of the file. Scp needs the "-p" flag, and rsync needs the "-a" flag. Consult the documentation
for the file copy command you want to use. And there you have a unified configuration file tree.
Listing 1: Spec directory traversal
# {{{ locate_global: use spec directory to find a file matching the current class
sub locate_global($$)
{
    # this code uses File::Find
    my $spec_dir = '/usr/local/spec';
    my $file    = shift || return undef;    # file name sought
    my $machine = shift || return undef;    # machine name
    my @matches;
    my $find_sub =
      sub
      {
          print "found file $_\n";
          push @matches, $File::Find::name if ($_ eq $file);
          # the machine_belongs_to_class sub returns true if a machine
          # belongs to a class; we stop traversing down otherwise
          $File::Find::prune = 1 unless
            machine_belongs_to_class($machine, $_) || $_ eq '.';
      };
    find($find_sub, $spec_dir);
    if (scalar @matches > 1)
    {
        print "More than one match for file $file,",
              "machine $machine found: @matches\n";
        return undef;
    }
    elsif (scalar @matches == 1)
    {
        return $matches[0];    # this is the right match
    }
    else
    {
        return undef;          # no files found
    }
}
# }}}
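Building on Listing 1, the copy step mentioned just before it might look like the following sketch (the rsync invocation and the destination host are assumptions, not part of the original listing):
use File::Find;    # locate_global() relies on find()
# find the right resolv.conf for machine "wonka" and push it out,
# preserving permissions and attributes with rsync -a
my $src = locate_global('resolv.conf', 'wonka');
if (defined $src) {
    system('rsync', '-a', $src, 'wonka:/etc/resolv.conf') == 0
        or warn "copy of $src to wonka failed\n";
}
else {
    warn "no unambiguous resolv.conf found for wonka\n";
}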
One challenge once you set up this sort of /usr/local/spec structure is: how do we know that resolv.conf should go into /etc?
You either have to do without the nice hierarchical structure shown here, adapt it (replace "/" with "+", for instance -- a risky
and somewhat ugly approach), or maintain a separate mapping between symbolic names and real names. For instance, "root-profile" can
be the symbolic name for "~root/.profile". The last approach is the one I prefer, because it flattens out filenames and eliminates
the problem of having hidden filenames. Everything is visible and tidy, under one directory structure. Of course, it's a little more
work every time you add a file to the list. The program has to know that "resolv.conf" should be copied to "/etc/resolv.conf" on
the remote system, and "dfstab" should go to "/etc/dfs/dfstab" (the Solaris file for sharing NFS filesystems).
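A sketch of that last approach (an added illustration; only the resolv.conf, dfstab, and root-profile names come from the text):
# map the flat symbolic names used in the spec tree to the real
# destination paths on the managed machine
my %install_path = (
    'resolv.conf'  => '/etc/resolv.conf',
    'dfstab'       => '/etc/dfs/dfstab',                 # Solaris NFS share table
    'root-profile' => (glob('~root'))[0] . '/.profile',  # expands ~root at run time
);
my $destination = $install_path{'resolv.conf'};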
Now let's talk about what you can do once you have this spec directory hierarchy set up. You could, if you wanted to, look for
all the users named Joe:
Listing 2: Find all password files and grep them for Joe
grep Joe `find /usr/local/spec -name passwd`
Or you can use a tool such as rep.pl, written by David Pitts, to replace every word with another:
Listing 3: Find all hosts files and change "wonka" to "willy"
Now, you can write both Listing 2 and 3 in Perl, if you want; the find2perl utility was written just for that. It's
much simpler, however, to just use find from the start. It really is a wonderful utility that every system administrator
should use. More importantly, it took me 5 minutes to write the two listings. How long would it take you to figure out how to use
find2perl, store the code it produces in a file, then run that file? Try it and see for yourself!
Task automation
Task automation is an extremely broad topic. I will limit this section to only simple automation of non-interactive UNIX commands.
For automation of interactive commands, Expect is the best tool currently available. You should either learn its syntax, or use the
Perl Expect.pm module. You can get Expect.pm from CPAN; see Resources for more
details.
With cfengine, you can automate almost any task based on arbitrary criteria. Its functionality, however, is a lot
like the Makefile functionality in that complex operations on variables are hard to do. When you find that you need to run commands
with parameters obtained from a hash, or through a separate function, it's usually best to switch to a shell script or to Perl. Perl
is probably the better choice because of its functionality. You shouldn't discard shell scripts as an alternative, though. Sometimes
Perl is overkill and you just need to run a simple series of commands.
Automating user addition is a common problem. You can write your own adduser.pl script,
or you can use the adduser program provided with most modern UNIX systems.
Make sure the syntax is consistent between all the UNIX systems you will use, but don't try to write a universal adduser program
interface. It's too hard, and sooner or later someone will ask for a Win32 or MacOS version when you thought you had all the UNIX
variants covered. This is one of the many problems that you just shouldn't solve entirely in Perl, unless you are very ambitious.
Just have your script ask for user name, password, home directory, etc. and invoke adduser with a system() call.
Listing 4: Invoking adduser with a simple script
#!/usr/bin/perl -w
use strict;
my %values;    # will hold the values to fill in
# these are the known adduser switches
my %switches = ( home_dir => '-d', comment => '-c', group => '-G',
                 password => '-p', shell => '-s', uid => '-u');
# this location may vary on your system
my $command = '/usr/sbin/adduser ';
# for every switch, ask the user for a value
foreach my $setting (sort keys %switches, 'username')
{
    print "Enter the $setting or press Enter to skip: ";
    $values{$setting} = <STDIN>;
    chomp $values{$setting};
    # if the user did not enter data, kill this setting
    delete $values{$setting} unless length $values{$setting};
}
die "Username must be provided" unless exists $values{username};
# for every filled-in value, add it with the right switch to the command
foreach my $setting (sort keys %switches)
{
    next unless exists $values{$setting};
    $command .= "$switches{$setting} $values{$setting} ";
}
# append the username itself
$command .= $values{username};
# important - let the user know what's going to happen
print "About to execute [$command]\n";
# return the exit status of the command
exit system($command);
Another task commonly done with Perl is monitoring and restarting processes. Usually, this is done with the Proc::ProcessTable
CPAN module, which can go through the entire process table, and give the user a list of processes with many important attributes.
Here, however, I must recommend cfengine. It offers much better process monitoring and restarting options than a quick
Perl tool does, and if you get serious about writing such a tool, you are just reinventing the wheel (and cfengine is
stealing your hubcaps). If you do not want to use cfengine for your own reasons, consider the pgrep and pkill utilities
that come with most modern UNIX systems. pkill -HUP inetd will do in one concise command as much as a Perl script four
or more lines long. This said, you should definitely use Perl if the process monitoring you are doing is very complex or time sensitive.
For the sake of completeness, here is a Proc::ProcessTable example that shows how to use the kill()
Perl function. The "9" as a parameter is the strongest kill() argument, meaning roughly "kill process with extreme prejudice,
then feed it to the piranhas." Do not run this as root, unless you really want to kill your inetd processes.
Listing 5: Running through the processes, and killing all inetds
use Proc::ProcessTable;
my $t = Proc::ProcessTable->new;
foreach my $p (@{$t->table})
{
    # note that we will also kill "xinetd" and all processes
    # whose command line contains "inetd"
    kill 9, $p->pid if $p->cmndline =~ 'inetd';
}
A typical Unix system contains 20,000 files. A typical large site contains 100 or more hosts. Keeping each of the resultant 2 million files
correct and consistent is a difficult version control problem. Often the problem is not solved, and each host becomes a unique collection
of files from differing operating system versions. Reliability plummets as versions of programs interact that vendors never tested
for interoperability, and the cost of maintenance soars as the same problem is solved differently for each host. What is needed is
a place to store operating system distributions under version control, a place to generate configuration files that differ between
hosts, and a method to install these files onto running systems with minimum interruption and maximum automation. The Host Factory
software from Working Version fulfills all of these needs. Components of Host Factory include the Pgfs version control filesystem,
a Host Profile developed for your site, and the Pdist filesystem replicator.
netSwitch 0.1.3 -- A boot-time network configuration
tool for Linux laptops.
Helix Setup Tools 0.2.0 -- A simplified interface for
Unix workstation configuration.
Information Resource Manager - IRM is a Web-based asset
and problem tracking system built for IT departments and helpdesks. It keeps detailed information, both hardware and software, about
each computer, as well as a complete history of all work requests ever placed.
SFI Director - The SFI Director is a tool for
managing distributed, heterogeneous UNIX systems.
Its functionality includes System Configuration, Application Distribution, NIS & NIS+ Management, User Creation and Dynamic System
Documentation.
LANdb is a network administration CGI package written in Perl. It uses an RDBMS (i.e., MySQL or
Oracle) to store information on all network hardware, connections, and connection statuses.
Perl-cfd is a superior implementation
of the cfengine 1.x server daemon. It has been tested with cfengine v1.4.17 and v1.5.3 clients. It should work with older v1.4.x and
other v1.5.x clients.
SysWatch
SysWatch is a Perl CGI to display current information about your UNIX system. It can display drive partitions
and drive usage, as well as resource hogs and what current users are doing.
[This article is essentially a compacted-for-LINK.bnl version of one of the topics covered by MIX (Monthly Information eXchange)
Meeting Notes - 09/24/97, written by Susan Sevian. The speaker for this topic was Jim Flanagan of CCD's Advanced Technology and Planning
Section.
Notes from any of our MIXes -- generally more detailed than what we provide in LINK.bnl --
are available on the web. Please see the reference to MIXed Notes at the bottom of our
MIX page.]
Tools for large scale system administration are being developed in conjunction with the RCF (RHIC Computing Facility)
/ CCD effort to set up and manage computing systems for RHIC. With a large number (hundreds) of RHIC computers, such system administration
tools are needed in order to avoid tedious and error-prone manual efforts to synchronize operating system and node configuration
changes.
Under the strategy adopted by RHIC/CCD, configuration information is kept in a hierarchical, class-based central
repository, with the configuration of each node viewed as a specialization of more abstract configuration classes. The tool being
developed for manipulating this repository is SyRCS, a wrapper around the Revision Control System (RCS), written in Perl.
SyRCS provides simple, familiar commands (emulating such UNIX and RCS commands as ls, ci, co), which are used to maintain
and inspect the repository and to check node configurations against the repository for "undisciplined" or unauthorized changes.
Master System is a public-domain Unix systems configuration
tool written in Perl. The system is architecture and operating system independent, but it can handle architecture and operating system
dependent configuration. It is designed to control the configuration of large groups of systems that are configured in the same style,
but are not necessarily identical. From a group at Rutgers University.
Webmin is a free web-based admin interface for Unix systems. Via a web browser,
you can configure DNS, Apache, Samba, filesystems, startup scripts, inetd, crontabs and more. Written in Perl5 and easily extendable.
Supports several Linux versions and Solaris.
There's no need to spend days documenting your servers. I've written a program that can help. unixdoc collects all the configuration
files and other information about your computers into an HTML file and sends it to a display server where it can be viewed with a
browser. It works on Solaris 2.6/7/8 and on HP-UX 10.20. On the display server, you can see an overview page with all your systems
as shown in Figure 1. By selecting a computer, the
unixdoc HTML page of this computer will be displayed as shown in
Figure 2.
The unixdoc HTML file of a Solaris computer consists of the following 18 sections:
Hardware
Eeprom
Kernel
Networking
Software
Nameservices
Bootup
Disk
Disk Hardware
Users
dmesg
Printers
Cron
Rhosts
Quota
Syslog
Xntpd
Sendmail
The information in these sections consists of either config files or the output of a command. With unixdoc, it is
easy to compare the configuration of two servers. You just have to open the two unixdoc HTML pages of the servers and compare the
content, section after section. You don't have to do a login on the two servers, or to remember all those commands to display the
configuration. I find subsection 4.1.1 ifinfo helpful, because it provides a good overview of all the network interfaces
(speed, mode, etc.). (Subsection 4.1.1 is shown in Figure
3.) The information in this subsection is very useful when verifying the speed/mode settings between your switches and servers.
An example of the entire unixdoc HTML page can be found at: http://www.net.li/article
The software can be found at: http://www.net.li/article
[Mar 19, 2001] In Daniel Robbins' newest tutorial,
learn to use CVS to check out the latest software sources, or begin using CVS as a full-fledged developer. (Linux)
Document Management Systems
[Apr 04, 2001] Ecora -- very nice package that includes Solaris documenter
with HTML output
Whether you are an IT manager, systems integrator, consultant, or reseller, the demands on the IT environments you support are considerable
and complex. Preparing for an IT audit, for example, is a time-consuming and tedious process. Our Documentor and IT Auditor products
automatically create a comprehensive, natural-language report of your IT infrastructure. This can be used to create an audit trail
to meet HIPAA requirements, prepare for a security audit or provide thorough documentation for a system audit. We invite you to experience
for yourself the benefits of documentation. Click
here to download an .exe file to document a
server for free.
Benefits to system documentation:
Create baseline system & security documentation for IT audits
Preserve your IT knowledge base
Quick disaster recovery
Train new staff efficiently
Simplify server consolidation & network mergers
Baseline & document platform migrations at each milestone
Company B received a contract to develop a new piece of hardware. As part of this contract, they were to supply their documents
online.
First, company B looked into a Commercial, Off-The-Shelf (COTS) document management system. It seemed to meet all of their needs,
until they found out that the cost was over $600,000. The price was way too high; in fact, it was higher than the original budget
for the whole contract!
Next, they decided to go with a proprietary document management system (DMS) that the company had an enterprise license for. This
DMS was supposed to be the "do-all, end-all" DMS that would solve all of their problems. And since it was a commercial product and
they had an enterprise license for it, the managers of the project assumed that there must be plenty of support available for it.
Company B spent over 6 months installing, configuring, and tweaking this DMS system on the new hardware that they had to buy in
order to run it. When they ran into trouble, they called the people within company B who were supposed to be experts on the system
for help. These experts didn't know the system any better than the group working on the project and support from the software company
was either too pricey, or not much help. So much for the availability of support for this COTS product!
After 6 months of frustration, they gave up on the company standard DMS and implemented a "solution" using File Manager. This
solution provided none of the features of a DMS, was cumbersome, and made documents hard to find.
Perl to the Rescue
At this point I came along - and I was completely confident that I could solve their dilemma using a web-based solution with Perl.
What other language would I use?
I talked with the program managers and we discussed what the needs of the DMS were. Next, I gathered user input, which, in my
opinion, is the most important factor. When developing a system that is going to impact the way your users work on a system,
it is important to understand their needs. After considering the needs of users and management, I proposed a Web-based DMS which
management quickly approved. Now all I had to figure out was: how am I going to pull this off?
I started to develop the new system and the pieces seemed to fall into place. Eight weeks later, when we rolled out the new Perl
DMS system, I completely shut off the existing File Manager access so users had no choice but to use the new system. It was a rather
brutal way to force them onto the new system, but one that I felt was necessary.
The New System
The new Perl DMS system has the following features (and more):
Completely web-based.
Single logon, using Windows NT rights.
Database back-end:
controls document status, stores users, etc.
Document check in/out based upon user rights.
Page views based upon user rights:
Users only see documents they have rights to.
Full-text search and keyword search capabilities.
Users determine a file's rights when they add a new document.
The Open Watcom project requires an industrial strength source control system, that's why we selected Perforce for the job.
ALAMEDA, Calif., Sept. 29 /PRNewswire/ -- Perforce Software, Inc. today announced that SciTech Software has selected the Perforce
source code control system to manage the Open Watcom source code base. The Perforce software will enable the large team of developers
participating in the Open Watcom worldwide to have up-to-the-minute access to the latest Open Watcom source code via the Internet.
"Perforce itself has benefited tremendously from Open Source software, and we feel it is only fitting that we return the favor.
We're especially happy to be supporting the Watcom C++ compiler, which powers a number of our platforms," said Christopher Siewald,
president and chief technology officer of Perforce Software.
Perforce Software makes its Fast Software Configuration Management System available at no charge to bona fide organizations developing
freely available software, such as OpenWatcom.org. The Open Watcom code base consists of nearly three million lines of code.
"The Open Watcom project requires an industrial strength source control system, that's why we selected Perforce for the job,"
said Kendall Bennett, Director of Engineering at SciTech Software, Inc. "SciTech uses Perforce for internal projects, so we know
that it can handle the massive demands that the Open Watcom project is going to place on a distributed source control system."
Developers wishing to access the Open Watcom Perforce system can register at Open Watcom's web site (
http://www.openwatcom.org ) to be automatically notified when it comes online.
About Open Watcom
Open Watcom is the result of the Open Source release of the Sybase Watcom C/C++ and Fortran compilers. The Open Watcom products
are the first mass market, proprietary compilers to be open sourced and, weighing in at nearly three million lines of source code,
represent one of the largest pools of commercial source code of any type ever released under an Open Source license. Sybase,
Inc. developed the original Watcom code and SciTech Software, Inc. is the official maintainer of the project. The project has already
stirred tremendous interest among thousands of developers worldwide, who will use and contribute to its further development. Open
Watcom supports software development in Windows, DOS, OS/2, Netware, QNX, and other operating systems. A Linux version of Open Watcom
is planned. The Open Watcom web address is http://www.openwatcom.org.
A scalable configuration management system, supporting globally distributed development, disconnected operation, compressed repositories,
change sets, and named lines of development (branches).
Distributed means that every developer gets their own personal repository and the tool handles moving changes between repositories.
SSH, RSH, and/or SMTP can all be used as communication transports between repositories; or, if both are local, the system just uses
the file system. For example, this resyncs from a local file system to a remote system using ssh:
bk resync /home/lm/bk bitmover.com:/home/bk
Other features: file names are revisioned and propagated just like contents; graphical interfaces are provided for merging, browsing,
and creating changes; changes are logged to a private or public change server for centralized tracking of work; bug tracking is in
the works and will be integrated.
Wilma is a suite of CGI scripts that allows you to easily manage a list of items (broken into discrete categories) on the Web.
With Wilma, you can make lists of bookmarks, resources, reviews, classified ads, 'what's new' lists, bulletin boards and much more.
Anything that needs to be indexed and easily maintained is a good candidate for Wilma.
Version 1.xMN of Wilma is independent of the original distribution
by E-doc. It is free for non-commercial use (i.e., as long as you don't make money
off it -- see the license), and requires Perl 5 on a Unix machine.
Using Wilma
Wilma is extremely flexible. You can have a public submission facility, to allow anyone to add resources, or you can password
protect it (with .htaccess)
to restrict access to selected people; in this way, you can manage lists of meeting minutes, job offerings or items for sale. You
can even use Wilma (or several Wilmas) to manage an entire site's index. By keeping control over the organization of a site with
Wilma while allowing people to add and update pages at will, you can take the headache out of Intranet management.
Downloading Wilma
The most current version of Wilma is 1.36MN, which includes
bugfixes and several new features. It's probably a good
idea to read some documentation first. Wilma is available in a tarred,
gzipped archive. To unpack it, move it to the desired directory and type
$ gzip -d wilma1.36.tar.gz
$ tar -xvf wilma1.36.tar
I'd love to hear what you think of my version of Wilma; drop me a line!
About this Version
This version of Wilma is by Mark Nottingham, and is unsupported by
E-doc. While there have been many enhancements, none of it would be possible without their generous contribution of the original
software to the 'net. Thanks, Andrew and Daniel! Support queries and bug reports should go to
Mark Nottingham. Please check the
FAQ before mailing. If you're upgrading from a previous
version, you'll find that changing to this version only requires entering your values into the new wilma.conf file, as well as copying
your data directory over. Please pay attention to the
license information found in the docs/ directory,
as use of this software implies responsibilities to the current author, as well as the original authors. Enjoy!
Although, or perhaps because, I quit my first real job (at a quickly defunct startup company called Enfoprise, building "business
workstations") on the first day because they had changed my job assignment from UNIX driver writing to "Systems Integration", I have
had a longstanding love/hate relationship with configuration management tools like SCCS and RCS.
Boxes
My first published paper was "Boxes, Links, and Parallel Trees: Elements of a Configuration Management System" in the first USENIX
Workshop on Software Management. In this I described a centralized RCS database, with multiple "views" and hardlink cloning to save
space and time, as used by Gould Computer Systems Division's UNIX team.
Dissed by CVS
Brian Berliner (who preceded me at Gould, before he left for Prisma) deprecates my approach in one of the CVS papers, mainly because
he advocates an optimistic concurrency control approach, whereas he thought that I advocated locking. Actually, I advocate optimistic
concurrency control, but I also advocate locking in case the optimistic version gets into livelock; and, I usually insist that there
be a single, identified, serial schedule of source code checkins so that testing can proceed in a linear manner. I require programmers
to test that their new code works in a system with all previous fixes applied. (Although I recognize that even this requirement can
be relaxed.) I am amused that locking has slowly been creeping back into CVS.
How often does this happen to you? You add a new Web server to the network, inserting its IP address in /etc/hosts
with plenty of time to spare before the Demo For Big People. At T-minus one hour to demo, your browser can't resolve the hostname.
Neither can anyone else's.
Frantic, you check everything before finally coming back to /etc/hosts. Your change is
gone, probably because someone else edited the file around the same time and overwrote or removed your edits. You either need some
strong configuration control, or a truly loud warning bell that signals anyone's attempt to modify a critical file. Text editors
aren't databases -- they don't impose transactional consistency or concurrency control for multiple updates. This doesn't affect
you one bit if you're the sole system manager at your site, but as soon as two or more people are chartered to maintain the environment,
you need some sort of control system to serialize and document configuration changes. Without one, you'll spend a non-trivial
amount of time deciphering changes made by your peers or un-doing valid work that conflicts with items on your own task list.
In this feature we look at the source code control system, or SCCS, bundled into nearly every Unix operating system and a staple
of simple configuration control.
After explaining the basics of SCCS file administration, we'll look at the more difficult issues of merging changes and dealing
with files owned by root. Our goal is to reduce the mystery and annoyance factor of SCCS, and make it a viable tool for producing
an electronic version of your "site book" documenting the who, what, and why of system-configuration changes.
Rewriting history
SCCS is really a collection of tools that control updates to ASCII files. You can use SCCS with binary data, which will be converted
into ASCII form using uuencode, but we'll limit this discussion to ASCII data since that's
the source for most configuration files. SCCS lets you put files under configuration control, check out read-only copies, acquire
write locks for updates, check in and document changes, print histories, and identify and combine specific updates. Any text file
can be put under SCCS's control, making it useful for managing plain text documentation and meeting notes.
Before going into the functional details, here's a bit of terminology:
History files contain the source for the file under control, as well as a log of all changes made to the
file, information about revision numbers, and access controls. History files are prefixed with an "s.", and generally live in
a subdirectory called SCCS.
Deltas are specific changes made to a file. Changing a few characters, adding a line, or removing a line
constitute deltas to a file. Deltas are numbered as minor release numbers from the main or major release. A particular version
of a file, reflecting the cumulative effect of many deltas, is referred to as an SCCS delta ID, or SID. Most SCCS commands take
an SID as an argument when a specific version of the file history is needed.
Branches are subdivisions of deltas. While deltas are used to track the main changes to a file, branches
let you create special-purpose minor variations in a file. Branches may or may not be merged back together at some point; a typical
branch file might be created when you update your network configuration to host some demo or loaner machines, and plan to remove
those edits in a few days or weeks. Branches are a convenient form of short-term memory.
When you place a file under SCCS control, SCCS creates the history file. To change the file, you check it out for editing, and
then each subsequent change to the file is annotated in the history file when you check the modified version back in. SCCS locks
the history file while one user is editing it to prevent concurrent updates.
Bones of contention
Let's walk through some basic SCCS operations to see how the components fit together, and then get into the grittier problems that
make SCCS more of a benefit than an added burden. First, you'll need to have /usr/ccs/bin
in your path, since that's where the SCCS commands live (in SunOS, they're part of /usr/bin).
You can call the individual SCCS commands, or use the sccs front-end tool to simplify life.
We'll use the front-end for illustrative purposes, but you can also call the SCCS subcommands directly. Make sure you have an obvious
place to store history files, such as a subdirectory called SCCS. SCCS commands look for this subdirectory if you don't give
an explicit history file location.
Take a vanilla ASCII file and put it under SCCS control, using the admin command:
huey% sccs admin -ihosts hosts
This creates an SCCS history file for hosts (stored as s.hosts in the SCCS subdirectory), initialized with the content of the file
named hosts. You want the history file and the actual file to be namesakes unless you're particularly
good at associating strange path names with your /etc files. You can choose any file you want
for the initialization; if you've just sorted your hosts file into /tmp/hosts.sorted, the
above command line might be:
huey% sccs admin -i/tmp/hosts.sorted hosts
If all goes well, sccs admin returns quietly to the shell prompt. The most common complaint
is that the initial file doesn't contain any ID keywords, which are magic strings filled in by SCCS with the file name, delta numbers,
and date and time stamps. We'll talk about the keywords and how to maximize your enjoyment of them shortly. Successful submission
of a file to SCCS creates a new s-file in the SCCS directory. The file is primarily ASCII text, with SCCS records marked with an
ASCII SOH (start of header) character, showing up as control-A in most editors. All revisions,
delta histories, and access control information go into the s-file.
When you're ready to use the file, check out a read-only copy:
huey% sccs get hosts
1.2
10 lines
SCCS tells us the current SID of the file and its size. The get operation produces a read-only
file in the current directory, and it will complain if there's a writeable version of the file already present. After you initialize
a history file, be sure to rename or remove the initial file to prevent problems on your first check-out operation.
Edit the file by checking out a writeable version, using sccs get -e or the shorthand
sccs edit:
huey% sccs edit hosts
1.2
new delta 1.3
10 lines
This time, we're told the new delta number to be created by our editing session. If someone else is editing the file at the time,
SCCS produces an error.
Our first contention point is removed: any request to edit a file that is already being consumed by another system administrator
is met with a cryptic yet gentle slap on the keyboard. If you want to find out who is currently editing SCCS-controlled files, use
the info subcommand:
huey% sccs info
hosts: being edited: 1.2 1.3 stern 95/06/16 17:41:22
aliases: being edited: 1.45 1.46 wendyt 95/06/17 14:50:33
Make your changes a part of the file's permanent record using sccs delta:
huey% sccs delta hosts
comments? added two new host entries
1.3
2 inserted
0 deleted
10 unchanged
Your writeable source file is removed when you file the deltas, so you have to do another sccs get
to fetch the latest, read-only copy, or merge the delta and get
operations together with sccs delget hosts.
At this point, you can feed the read-only file into whatever system management step comes next: running an NIS
make, executing newaliases, or restarting a daemon with its new
configuration file.
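A small sketch of that hand-off (an added example, not from the article; /var/yp is the usual NIS build directory but an assumption here):
#!/usr/bin/perl -w
use strict;
# check in the pending hosts delta, fetch the read-only copy,
# and rebuild the NIS maps from it
chdir '/etc' or die "cannot chdir to /etc: $!";
system('sccs', 'delget', 'hosts') == 0 or die "sccs delget failed\n";
system('sh', '-c', 'cd /var/yp && make') == 0 or die "NIS make failed\n";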
Letters of intent
How can you determine the version number of a file, or if it's even SCCS controlled? When you check a file out, the
get subcommand fills in SCCS keywords with values such as the SID, pathname of the history file, date,
and time. The SCCS magic cookie indicating a keyword is a single, capital letter between percent signs, such as
%Z%. Put the SCCS keywords in a comment header in your file, and you have a built-in identification scheme.
Here's a sample header for a configuration file that uses the pound sign (#) as a comment character:
# %M% %I% %H% %T%
This set of keywords gives you the filename (M), the file revision or SID (I), the current date (H), and the time of checkout
(T). You may also choose to insert the pathname to the s-file (P). (Here is a partial list of
SCCS magic cookies.) The %W% keyword generates the filename and SID prefixed with the
string @(#), which is assumed to be unique to the SCCS system. The what
utility searches for the SCCS prefix and prints any information after it, allowing you to quickly identify any number of files.
To include other information to be picked up by what, use the %Z%
keyword to insert an SCCS cookie and then build your own identification string. A more verbose version of the example above is easily
found by what:
# %Z% common hosts file revision %I% of %H% at %T%
what is smart enough to look in the string tables of executables and libraries, so it will identify
the SCCS versions of each object component. Bundle an SCCS string into a C program with a global definition like this:
char *sccs_id = "%Z% %I% %H% %T%";
While peeking at the SID and file origins is useful for quick sanity checks, reviewing the delta history of a file is more likely
to tell you who changed something and why. When you create the delta, SCCS asks for a comment which is then recorded with your login
in the history file. Dump the delta history using sccs prs:
huey% sccs prs hosts
SCCS/s.hosts:
D 1.2 95/06/16 16:49:32 stern 2 1 00002/00002/00008
COMMENTS:
added alias for wind, new host shower
D 1.1 95/06/16 16:43:30 stern 1 0 00010/00000/00000
COMMENTS:
date and time created 95/06/16 16:43:30 by stern
The line introducing each delta shows you the SID, date and time of change, and the login of the person making the change. The slash-separated
numbers are the line counts of new, deleted and unchanged lines. The manual pages for the prs
subcommand also list all of the possible SCCS keywords and their expanded values.
Merge ahead
We still haven't tackled two of the hardest problems in change management: how do you get multiple users to access SCCS files, particularly
when the files are owned by root, and how do you merge changes together? The first problem doesn't have an easy solution. You can
keep all of your SCCS history files in /etc/SCCS, and insist that system administrators include
their user names when making changes as root. Since this is fairly unlikely, the next step is to make the SCCS history files group-writeable
by members of your system management group (creating a new user group if you need to). Create private SCCS work areas for each system
manager using symbolic links to the actual history file location; for example, ~stern/SCCS/s.hosts can be a symbolic link to /etc/SCCS/s.hosts.
Within ~stern/SCCS, an sccs edit hosts picks up
the s-file /etc/SCCS/s.hosts, giving me a private copy of the hosts file to work on.
When I check it back in, the single host-specific copy is returned where other managers (and the system) can find it, but it has
my user name attached to changes instead of root. To publicize the changes, I need to su to root, cd into /etc,
and then do an sccs get hosts to fetch my latest changes and install the file. Note that the
symbolic link points to a machine-specific location, which means I have to be logged on to the machine on which I want to make the
edits before doing the checkout. I can always move SCCS files around, as long as files get installed on the appropriate machines.
If you're worried about giving up some measure of security regarding permissions on /etc/hosts,
remember that only root can install the file in /etc and rebuild NIS maps or restart daemons.
For an added layer of safety, use the SCCS access control feature to explicitly name allowed users with sccs admin -a.
But the opening question still lingers: how do I find out what happened to my hosts file at 3:30 on Friday afternoon June 16,
and who did it? The easiest way is to look at the delta history since that time:
huey% sccs prs -l -c95-06-16-15-30 hosts
The -l flag says I'm interested in things that occurred after the time specified with the
-c flag. The time and date are given in YYMMDDHHMM format, with any non-white space character
separating the items. This example shows me the revision history comments and the user names responsible for making changes.
If I want to see the actual line by line edits, it's sccs diffs to the rescue:
huey% sccs diffs -c95-06-16-15-30 hosts
Like the diff command, this compares the current working copy of a file to any older delta,
identified by SID or by a timestamp. In this example, I'll see the list of changes between the current hosts file and the one that
existed at 3:30 PM on June 16. Want to regenerate the hosts file, minus a few changes? get
lets you include or exclude any SID, providing a simple mechanism to drop changes from the current copy of a file:
huey% sccs get -x1.6,1.7 hosts
The current hosts file is retrieved without the changes applied in SIDs 1.6 and 1.7. If you want to extract the changes made in
those deltas, generate the differences with context in a form that can be later fed to sed,
just like the output of the standard Unix diff command.
If you plan on applying the patches at a later time, when the hosts file may have undergone some additional minor edits, you'll
need to generate context differences that can be fed through
patch:
huey% sccs diffs -C -r1.5 hosts > hosts.sed.6
diff takes the -c flag for generating context differences, but sccs diffs
takes -C to avoid conflict with the timestamp flag.
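When the time comes, the saved context differences can be applied with the standard patch utility; assuming the file names used above and the current directory, that might look like:
huey% patch hosts hosts.sed.6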
Control freaks
Like all powerful system administration tools, SCCS has a number of poorly documented but interesting features and subtle caveats:
If you damage the list of users allowed to make deltas badly enough, it may be worth hand-patching the history file:
Edit the list of users that is underneath the ^Au line. Regenerate the SCCS file checksum
using admin -z, or you'll get notice of a corrupted history file on your next attempt to edit or check out a copy
of the file. If you see corrupted file warnings at other times, you can fix the checksum, but be very certain that the file wasn't
actually damaged by someone editing in the wrong place. Once the checksum is fixed, the SCCS history file is assumed to be valid,
and any errors introduced will be propagated into future deltas of the file.
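The checksum repair itself is a one-liner (file name illustrative):
huey% sccs admin -z hosts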
To find out when a particular line showed up in a file, use sccs get -m to preface
each line with its SID number.
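For example, combining -m with -p (which writes to standard output instead of creating a working file); the file and search pattern are illustrative:
huey% sccs get -m -p hosts | grep loghost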
You may run into contention when multiple system administrators attempt to create history files at the same time, if the SCCS
directory is NFS-mounted. SCCS uses an exclusive file-create operation when opening the s-file for the first time, but exclusive
creates aren't obeyed by the NFS Version 2 protocol (NFS Version 3 supports exclusive creates, see
"NFS Version 3 Design and Implementation,"
by Brian Pawlowski, et al.).
If you find that you can't access a history file you think you just created, make sure you're the owner of the file and that
someone else didn't beat you to the sccs admin punch.
SCCS doesn't preserve the modification timestamps on files. When you check a file out using sccs get, the modification
time is set to the current time. Furthermore, if you're accessing the SCCS directory and work area via NFS, the modification time
is set to the time on the NFS server, which might have drifted a few minutes ahead or behind the time on other machines.
Changing the modification time has less than pleasant impacts on make. If you decide to get a fresh copy of all
of your NIS source files, figuring that the NIS Makefile will only rebuild those that you've changed, you'll be in for a surprise,
because SCCS will change the timestamps on all of them, so make will assume they're all new.
Hooks for integrating a trouble-ticket or request system into SCCS are minimal but present. SCCS calls these modification
records, or MRs for short. By default, MRs are not collected, but you can force sccs delta to prompt for
an MR by enabling them:
huey% sccs admin -fv hosts
This command turns on the validation flag in the s-file, which is used to signal that MRs should be accepted for each delta.
A script or executable specified after the v flag will be invoked on each delta, and given the name of the file and
the modification record string entered during the delta check-in. Here's how to update your SCCS history file for hosts so that
it calls /usr/local/bin/host-update after each check-in:
huey% sccs admin -fv/usr/local/bin/host-update hosts
One application of MRs is tying the configuration file edit cycle to trouble-ticket management, so that the validation executable
removes the trouble tickets or requests from your work flow system as soon as the change is made.
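Following the description above, and assuming the hook really is handed the file name and the MR string as its arguments, a validation program might be no more than a short shell script; the path, log file, and ticket command below are all illustrative:
#!/bin/sh
# Hypothetical MR hook: $1 is the file being checked in, $2 is the
# MR (trouble-ticket) string typed at delta time.
FILE=$1
MRS=$2
echo "`date`: $FILE: $MRS" >> /var/adm/host-update.log
# A real hook could close the ticket here, e.g. ticket-close "$MRS"
exit 0   # a non-zero exit would reject the MR and abort the delta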
There's certainly much more that can be done with SCCS. In the last issue of Advanced Systems, Chuck Musciano suggested
using a Web browser front end for checking files in and out, and viewing the history. A bit of creative perl or
awk programming lets you generate HTML out of the sccs prt output. Send us your marriage proposals for
HTML and SCCS, and we'll attach the interesting submissions to this page.
The hidden agenda of using SCCS is accountability. You want to know who inflicted a change, and why, and under whose authority.
A rigorous policy for attributing changes and accepting responsibility for their implementation and effects is fundamental to any
robust, mission-critical environment.
Dan Geer, noted security expert and frequent speaker, tells the story of an investment bank executive who demanded a systems change
to circumvent normal reporting and control code. The hole was later exploited to execute trades that violated various internal and
external regulations. Who was responsible?
The developer who changed the source code?
The executive requesting the change?
The system administrator who allowed this code to be fielded?
Tracing the changes from idea to deployment gives you the first measure of accountability. It's a good thing to have when you
hear those warning bells.
Remote Server Management Tool is an Eclipse plug-in that provides an integrated graphical user interface (GUI) environment and
enables testers to manage multiple remote servers simultaneously. It is designed for testers who would otherwise telnet to several
servers and consult different docs and man pages for platform-specific commands to create and manage users and groups and to start
and monitor processes. The tool handles these operations on remote servers through a user-friendly GUI; in addition, it displays
the configuration of the test server (number of processors, RAM, etc.). The activities that it can manage on remote and local servers
fall into the following categories:
Process Management: This utility lists the processes running on UNIX and Windows® servers. One can start and stop processes.
Along with process listing, the utility also provides details of the resources used by each process.
User Management: This utility facilitates creation of users and groups on UNIX servers; it also provides options for
listing, creating, deleting, and modifying the attributes of users and groups.
File Management: This utility acts as a Windows Explorer-style file browser for any selected server, irrespective of its operating system.
One can create, edit, delete, and copy files and directories on local or remote servers. Testers can also tail remote files.
How does it work?
This Eclipse plug-in was written with the Standard Widget Toolkit (SWT). The tool has a perspective named Remote System Management;
the perspective consists of a Test Servers view and a console view. The remote test servers are mounted in the Test Servers view for management
of their resources (process, file system, and users or groups).
At the back end, this Eclipse plug-in uses the Software Testing Automation Framework (STAF). STAF is an open-source
framework that masks the operating system-specific details and provides common services and APIs in order to manage system resources.
The APIs are provided for a majority of the languages. Along with the built-in services, STAF also supports external services. The
Remote Server Management Tool comes with two STAF external services: one for user management and another for providing system details.
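As a rough illustration of the kind of request that goes through STAF (the host name is illustrative, and the exact service syntax should be checked against the STAF documentation), the same services can also be driven from the STAF command line:
STAF server1.example.com PROCESS START SHELL COMMAND "ps -ef" WAIT RETURNSTDOUT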
About the technology author(s):
Geetha Adinarayan is an advisory software specialist from IBM Software Labs, Bangalore, India. She has five years of experience in
IBM messaging middleware products. Ms. Adinarayan holds a degree in information systems from BITS, Pilani, India; she is also a Certified
Software Test Engineer and IBM Certified System Administrator for WebSphere Business Integration Message Broker 5. Currently, Ms.
Adinarayan works with the High Performance On Demand Solutions (HiPODs) team in India. Her interests are in performance analysis
of complex customer solutions and in autonomic computing.
Shashi K. Dalmia is a staff software engineer from IBM Software Labs, Bangalore, India. He has been with IBM for five years and
in the IT field for a total of ten years. He has experience in application development, systems software, and messaging middleware.
Mr. Dalmia holds a master's degree in software systems from BITS, Pilani, India, and he is an IBM Certified Systems Administrator
for Websphere Business Integrator 2.1. Currently, he works on Websphere Business Integrator, Message Broker 6.0, with the Systems
Test team in India. His interests include learning new technologies and creating tools to help ease the work of testers and developers.
Rahul Gupta is a computer science engineer from the National Institute of Engineering, Mysore. He is skilled in the Software Testing
Automation Framework (STAF) and Eclipse plug-in development.
Sreenandan Iyengar is a computer science engineer from the National Institute of Engineering, Mysore. He is skilled in the Software
Testing Automation Framework (STAF) and Eclipse plug-in development.
The System Configuration Repository (SCR) captures and stores information about your system's configuration on request or
at scheduled times. The Desktop Management Interface (DMI) operates between your management software and your system's components.
The DMI standard gives technical support personnel, IT managers, and individual users a common path to access information about all
aspects of a computer system.
Versions B.11.11.32, B.11.00.32, and B.10.20.32 of SCR+DMI for HP-UX are available free for download from HP's Web site; the product
can also be ordered on CD.
The System Configuration Repository (SCR) is an application that tracks changes in a system's configuration over time. SCR
can take snapshots of system configuration information periodically or manually before and after major configuration changes. SCR
provides tools to filter and compare snapshots from different times or from different machines.
The information that is stored in snapshots comes from DMI, and is stored in a database. Currently, the configuration information
available through DMI includes system information such as devices, volume groups, file systems, kernel parameters, etc., and information
about software products, including information such as bundles and filesets. (Developers can write their own DMI instrumentation
in order to expand the information stored in SCR.)
SCR is highly configurable and can be used in many ways. For example, SCR can be used to maintain consistency on a system or across
systems, to recover a machine's configuration information in case of disaster, or to maintain consistency between test systems
and production systems.
Included in this presentation is an overview of SCR, future directions, and example scripts for how to use SCR most efficiently.
In addition, we will be soliciting input on additional APIs and additional data coverage.
Creating multiple, identical copies of a system can be hard work;
it becomes even harder if patches and diffs need to be maintained. Multiply this by hundreds of computers ... and Unix sysadmins
go crazy.
The Working Version company has created a system version control and distribution
mechanism to manage entire installed system versions.
The Last but not Least
Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand. ~ Archibald Putt, Ph.D.
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org