|
Home | Switchboard | Unix Administration | Red Hat | TCP/IP Networks | Neoliberalism | Toxic Managers |
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix |
News | Syntax | Recommended Links | Sorting algorithms | Recommended Papers | Rcut | Reference | Pipes |
Perl re-implemenations | uniq | sort | tr | AWK | Tips | Humor | Etc |
|
The external join command performs the relational database operation "join" of two files. Files are joined on a common key field (column) that should exist in both files. Also both files must be sorted on the key field in the same order. join examines one line at a time from each file. If a certain segment of the lines match, they are combined into one line. Only one instance of the same segment is printed. The files are assumed to be sorted in the same order.
|
During join operation if the contents in the key field of both files match, the two lines are merged. The merged line is written to the standard output.
Related command are awk, cut, grep, paste, sort, and uniq.
The join is a classic relational database operation. The join command reads the first line of the first file, then reads lines from the second file until the contents in the key field of the second file match or exceed the contents in the key field of the first file. If a match occurs, the line from the first file and the line from the second file are combined and displayed on the output. If no match occurs, join begins the process again except reading is done from the first file. This process continues until one or both files have been read to the end and no more lines in the other file can match the last line of the file.
An example of a join command might help clarify how the join function works. Let's assume we have the following files.
file1: file2: 01123 HP 9000/825 01123 Hewlett Packard 02213 Sun 3/110 02213 Sun Microsystems 03321 Sun 4/110 03321 Sun Microsystems 08412 HP 9000/835 08412 Hewlett Packard
The following command should produce the combined output as shown.
$ join -t"\t" file1 file2
01123 HP 9000/825 Hewlett Packard 02213 Sun3/110 Sun Microsystems 03321 Sun4/110 Sun Microsystems 08412 HP 9000/835 Hewlett Packard
Following is the general format of the join command.
join [ -an ] [ -e str ] [ -jn m ] [ -o list ] [ -tc ] file1 file2
Options
The following list describes the options and their arguments that may be used to control how join functions.
-an | If a field in file n does not have a match in the other file, the line is written to file n. For example, if you use | |
join -a1 file1 file2 | ||
and file1 has a line in it with a key field that is not matched in file2, then that line is written to the file named 1. | ||
-e str | Replaces empty output fields with the str string. The -o option causes empty fields in certain conditions. | |
-jn m | The join is performed using the mth field of file n. If you do not specify n, the mth field is used from both files. The fields are numbered beginning with 1. | |
-o list | Specifies the fields to be displayed on output after the join has been performed. The list is comprised of n.m descriptions that define which fields from each file are to be displayed. The n refers to the file name and the m refers to the field number. Multiple n.m descriptions may be used in the list. They must be space separated. An example is, | |
join -j1 3 -j2 4 -o 1.1 1.3 2.1 1.5 file1 file2 | ||
which uses field 3 of file1 and field 4 of file2 to join the files. The output is field 1 of file1, field 3 of file1, field 1 of file2, and field 5 of file1. | ||
-tc | Character c is used as the input and output field separator. For example, | |
join -t: file1 file2 | ||
informs join to use a : (colon) field separator in place of a tab. | ||
The default input field separators are blanks (spaces, tabs, and new-lines). If the default separators are being used, multiple occurrences count as only one field separator. Leading separators are ignored. The default output separator is a space. |
The following list describes the arguments that may be passed to the join command.
- | A hyphen may be used in place of file1 to cause join to read the standard input as file1. |
file1 | The first file to be joined to the second file. The file must be sorted in ASCII collating sequence on the fields that are used to join the file to file2. |
file2 | The second file to be joined to the first file. The file must be sorted in ASCII collating sequence on the fields that are used to join the file to file1. |
If fields are separated by spaces and tabs you need to use the sort -b command for sorting the file. If you use the -t option you need to use the sort command without the -b option.
Unfortunately, the comm, join, sort, and uniq commands have different format conventions for referencing data.
Numeric filenames may conflict with the -o option and the field descriptions that relate to it. One solution is to mv your files to temporary names beginning with a nonnumeric character. Then after the join, move the files back to their original names.
You use the join command to join related files together based on a key field. The output is a subset of the fields and lines from both files. It is useful in combining two related tables (files) together to create one larger table.
The line segment (or field) is chosen using three switches. The -1 switch selects the field number from the first file. The -2 switch selects the field number from the second. The -t switch specifies the character that separates one field from another. If these switches aren't used, join separates fields by spaces and examines the first field on each line.
Suppose the data in the robots.txt file was separated into two files, one with the pricing information (robots1.txt) and one with the quantity and account information (robots2.txt).
$ cat robots1.txt Birchwood China Hutch,475.99 Bookcase Oak Veneer,205.99 Small Bookcase Oak Veneer,205.99 Reclining Chair,1599.99 Bunk Bed,705.99 $ cat robots2.txt Birchwood China Hutch,1,756 Bookcase Oak Veneer,1,756 Small Bookcase Oak Veneer,1,756 Reclining Chair,1,757 Bunk Bed,1,757
To join these two files together, use a comma as a field separator and compare field 1 of the first file with field 1 of the second.
$ join -1 1 -2 1 -t, robots1.txt robots2.txt Birchwood China Hutch,475.99,1,756 Bookcase Oak Veneer,205.99,1,756 Small Bookcase Oak Veneer,205.99,1,756 Reclining Chair,1599.99,1,757 Bunk Bed,705.99,1,757
If either file contains a line with a unique field, the field is discarded. Lines are joined only if matching fields are found in both files. To print unpaired lines, use -a 1 to print the unique lines in the first file or -a 2 to print the unique lines in the second file. The lines are printed as they appear in the files.
The sense of matching can be reversed with the -v switch. -v 1 prints the unique lines in the first file and -v 2 prints the unique lines in the second file.
The tests are case-insensitive when the --ignore-case (or -i) switch is used.
The fields can be rearranged using the -o (output) switch. Use a comma-separated field list to order the fields. A field is specified using the file number (1 or 2), a period and the field number from that file. A zero is a short form of the join field.
$ join -1 1 -2 1 -t, -o "1.2,2.3,2.2,0" robots1.txt robots2.txt 475.99,756,1,Birchwood China Hutch 205.99,756,1,Bookcase Oak Veneer 205.99,756,1,Small Bookcase Oak Veneer 1599.99,757,1,Reclining Chair 705.99,757,1,Bunk Bed
To join the /etc/passwd and /etc/group file together to show the user name, user ID, group name, group ID, group name, and home directory (both files should be sorted in ASCII collating sequence).
1. First you need to sort the passwd file on the group ID and user ID fields:sort -t':' +3.0n -4.0n +2.0n -3.0n /etc/passwd > mypasswd 2. Sort the group file on the group ID field. sort -t: +3.0n -4.0n /etc/group > mygroup 3. Now you can join two files:. > join -t: -j1 4 -j2 3 -o 1.1 1.3 2.1 2.3 1.6 mypasswd mygroup root:0:0:root . . mylogin:60:40:tech . .
|
Switchboard | ||||
Latest | |||||
Past week | |||||
Past month |
The join command can be used to merge two files (one can be standard input) to create a third file (can be standard output). Each line in the file is merged on the basis of a field that has the same value in both input files to create one line in the output file. The fields in each file are separated by either a space or tab character.
Following is a list of flags that can be used with the join command:
- -1 field or -j1 field to specify that the join should be made on the basis of the field in the first file.
- -2 field or -j2 field to specify that the join should be made on the basis of the field in the second file.
- -e string to specify that blank fields in the output file be replaced by the specified string.
- -o fileid.fieldnumber to specify that the output should consist of the specified fields. You can specify multiple fields separating them by commas.
- -t character to modify the field separator character from the default value of space.
- -a fileid to generate output line for each line in the file specified by fileid parameter for lines that cannot be matched to the lines in the other file using join field. The output lines are produced in addition to the default output.
- -v fileid to generate output line for each line in the file specified by fileid parameter for lines that cannot be matched to the lines in the other file using join field. The default output is not produced.
Examples Let us assume we have two files, file1 and file2, whose contents are shown as follows:
more file1 computer1 16MB 1.2GB 17inch CDROM computer2 8MB 840MB 14inch computer3 12MB 1.6GB 17inch computer4 4MB 270MB 14inch more file2 computer1 1stfloor office5 computer3 2ndfloor office9A computer4 1stfloor office2 computer5 3rdfloor office1If you want to join the two files and display only the matching lines, execute the following command:
join file1 file2 computer1 16MB 1.2GB 17inch CDROM 1stfloor office5 computer3 12MB 1.6GB 17inch 2ndfloor office9A computer4 4MB 270MB 14inch CDROM 1stfloor office2If you want to join the two files and display the matching lines as well as the nonmatching lines from the specified file, use the -a flag in the following command:
join -a1 file1 file2 computer1 16MB 1.2GB 17inch CDROM 1stfloor office5 computer2 8MB 840MB 14inch computer3 12MB 1.6GB 17inch 2ndfloor office9A computer4 4MB 270MB 14inch CDROM 1stfloor office2The above example displays the line with computer2 from file1 because it does not have a matching line in file2. If you want to display only the lines that do not match lines from the specified file, use the -v flag in the following command:
join -v2 file1 file2 computer5 3rdfloor office1The above example displays the line with computer5 from file2 because it does not have a matching line in file1.
If you want to display only certain fields from the input files to the output file, use the -o flag as in the following command:
join -o 1.1 2.2 2.3 1.5 file1 file2 computer1 1stfloor office5 CDROM computer3 2ndfloor office9A computer4 1stfloor office2 CDROMIn the above example, the line with computer3 is displayed with one field short because that field is not present in the input file. You can insert a fixed legend in the empty field in the output by using the -e flag in the following command:
join -o 1.1 2.2 2.3 1.5 -e"NO CDROM" file1 file2 computer1 1stfloor office5 CDROM computer3 2ndfloor office9A NO CDROM computer4 1stfloor office2 NO CDROM
Society
Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers : Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism : The Iron Law of Oligarchy : Libertarian Philosophy
Quotes
War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda : SE quotes : Language Design and Programming Quotes : Random IT-related quotes : Somerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose Bierce : Bernard Shaw : Mark Twain Quotes
Bulletin:
Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 : Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law
History:
Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds : Larry Wall : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOS : Programming Languages History : PL/1 : Simula 67 : C : History of GCC development : Scripting Languages : Perl history : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history
Classic books:
The Peter Principle : Parkinson Law : 1984 : The Mythical Man-Month : How to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite
Most popular humor pages:
Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor
The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...
|
You can use PayPal to to buy a cup of coffee for authors of this site |
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.
Last modified: March 12, 2019