|
|
file1 and file2 must be sorted in increasing collating sequence (defined by the current locale; see locale(M)) on the fields on which they are to be joined, normally the first in each line.
There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally consists of the common field, followed by the rest of the line from file1, then the rest of the line from file2.
Fields are normally separated by blank, tab or newline. In this case, multiple separators count as one, and leading separators are discarded.
join recognizes these options:
PTMP=/tmp/passwd{$$} GTMP=/tmp/group{$$} sort -t: +3 -4 /etc/passwd > $PTMP sort -t: +2 -3 /etc/group > $GTMP join -t: -j1 4 -j2 3 -o 1.1 2.1 $PTMP $GTMP | sort -t: +0 -1 rm $PTMP $GTMPIn the next example an employer has two files of data about his employees. The first file, E_pers, contains the name, address, and payroll number of each employee. The second file, E_dept, contains the payroll number, position, salary, and department of each employee. The fields in each employee's record are separated by colons ``:'', and each record occupies one line of a file.
The format of a record in the file E_pers would be:
Name:Address:Payroll#Here are some example records that might be in the file E_pers:
Selkirk C:33 Wingfield Court, Anytown, Pa 65000:133 Friday M:17 Awberry Court, Ubique, Ca 18480:91 Fletcher J:14 Tulip Gardens, Appledorn, NY 10010:231Each record in the file E_dept would have the format:
Payroll#:Position:Salary:DepartmentThe following are some example records that might appear in E_dept:
91:Manager:85000:Marketing 133:Clerk:22000:Accounts 231:Clerk:19500:AccountsThe employer wants to know which employees earn more than $20,000.
Firstly, the fields in the file E_pers are put in the order:
Payroll#:Name:AddressThis ensures that the field that the files are to be joined on (payroll number) is the first field in both files. The reordering is performed using awk(C), and its output is piped through sort(C). The sample records from E_pers given above would become:
133:Selkirk C:33 Wingfield Court, Anytown, Pa 65000 231:Fletcher J:14 Tulip Gardens, Appledorn, NY 10010 91:Friday M:17 Awberry Court, Ubique, Ca 18480Note that the records are sorted into order of payroll number according to the collating sequence (not increasing numeric order). The reordered and sorted file is named P_tmp:
Next, the records in the file E_dept are sorted on payroll
number and the output is placed in D_tmp:
sort -t: +0 -1 E_dept > D_tmp
The sorted files are now joined on their first fields (payroll
number), and the output is filtered using awk. A final
pass through sort puts the output in alphabetical order:
join -t: -o 1.2 2.3 P_tmp D_tmp | awk -F: '$2>20000 {print $1" $"$2}' | sort
This command prints the employee's name (the second field of P_tmp) and their salary (the third field of D_tmp) if it is more than $20000. For the sample records given above, the output would be:
Friday M $85000 Selkirk C $22000The output from join can be directed to a file (E_joined):
Omitting the -o option causes all fields to be written to
the output file, with the fields from file1 being followed
by those from file2. However, the field on which the files
were joined is only included once. Thus the join command
above could be entered as:
join -t: P_tmp D_tmp > E_joined
The file E_joined may be processed using awk to
extract any other information required. Here a list of all employees
sorted by name within each department is output:
cat E_joined | awk -F: '{print $6" "$2}' | sort
For the sample records used in the example above, the output would be:
Accounts Fletcher J Accounts Selkirk C Marketing Friday M
ISO/IEC DIS 99452:1992, Information technology Portable Operating System Interface (POSIX) Part 2: Shell and Utilities (IEEE Std 1003.21992);
AT&T SVID Issue 2;
X/Open CAE Specification, Commands and Utilities, Issue 4, 1992.