CSC 140 Introduction to IT Advanced File Processing







- Slides: 22

CSC 140: Introduction to IT: Advanced File Processing

CIT 140: Introduction to IT

Topics

1. Compressing files
   1. compress
   2. gzip
   3. bzip2
2. Archiving files: tar
3. Sorting files: sort

Data Compression

Problem: How can we store X bytes using only Y < X bytes?

Solution: Find redundancies in the data.

1. Run-length encoding: Encode repetitions as the repeated value and a count. Ex: thethethe -> the 3
2. Dictionary encoding: Build a dictionary of words and encode each word as a number. Common words: the, an, is, this
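A minimal Python sketch of run-length encoding, assuming the classic character-level form: each run of identical characters becomes a (character, count) pair. The slide's thethethe example repeats a three-letter unit instead; this sketch only detects runs of single characters. The names rle_encode and rle_decode are hypothetical.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand each (character, count) pair back into a run of characters."""
    return "".join(char * count for char, count in pairs)


encoded = rle_encode("aaaabbbcca")
print(encoded)  # [('a', 4), ('b', 3), ('c', 2), ('a', 1)]
assert rle_decode(encoded) == "aaaabbbcca"
```

RLE only saves space when the input actually contains long runs; for ordinary prose the list of pairs is longer than the original text.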

Data Compression

Original: "Ask not what your country can do for you -- ask what you can do for your country."

Dictionary: 1 ask, 2 not, 3 what, 4 your, 5 country, 6 can, 7 do, 8 for, 9 you

Encoded version: "1 2 3 4 5 6 7 8 9 -- 1 3 9 6 7 8 4 5."
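The dictionary example can be reproduced with a short Python sketch. It assumes words are compared case-insensitively (so Ask and ask share code 1), numbers words in order of first appearance, and discards punctuation rather than copying it through as the slide does. dictionary_encode is a hypothetical name.

```python
import re


def dictionary_encode(text: str) -> tuple[dict[str, int], list[int]]:
    """Assign each distinct word a number in order of first appearance,
    then replace every word with its number (case-insensitive)."""
    dictionary: dict[str, int] = {}
    codes: list[int] = []
    for word in re.findall(r"[a-z]+", text.lower()):
        if word not in dictionary:
            dictionary[word] = len(dictionary) + 1  # codes start at 1, as on the slide
        codes.append(dictionary[word])
    return dictionary, codes


quote = ("Ask not what your country can do for you -- "
         "ask what you can do for your country.")
dictionary, codes = dictionary_encode(quote)
print(codes)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 3, 9, 6, 7, 8, 4, 5]

# Decoding just inverts the dictionary.
words = {number: word for word, number in dictionary.items()}
print(" ".join(words[c] for c in codes))
```

Seventeen words shrink to seventeen small numbers plus a nine-entry dictionary; the saving grows as the text gets longer and words repeat more often.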
Compressing Files: compress [-c] [-d] [-v] file1 [file2, …]

- -c Send output to stdout.
- -d Decompress instead of compressing.
- -v Provide verbose output.
Compressing Files Old School

The compress command: compress [options] [file-list]

Uncompressing Files Old School

The uncompress command
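compress and uncompress are built on the Lempel-Ziv-Welch (LZW) dictionary method, which learns its dictionary from the data as it goes instead of storing one up front. The Python sketch below shows only the core idea: it emits plain integer codes, with no variable-width bit packing, table-size limit, or .Z header, so its output is not compatible with real .Z files. The function names are hypothetical.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Classic LZW: emit the code of the longest string already in the table,
    then add that string plus the next byte as a new table entry."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    current = b""
    codes: list[int] = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            codes.append(table[current])   # emit longest known prefix
            table[candidate] = len(table)  # learn the new string
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same table on the fly while decoding (the uncompress side)."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:  # code not in the table yet: it must be previous + its first byte
            entry = previous + previous[:1]
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
assert lzw_decompress(codes) == sample
print(len(sample), "bytes ->", len(codes), "codes")
```

The decoder never receives the table; it rebuilds exactly the entries the encoder learned, one step behind, which is why only the codes need to be stored.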
Compressing Files: gzip [-#] [-c] [-d] [-l] [-v] file1 [file2, …]

- -# Specify compression level, from 1 (fastest) to 9 (best). Default=6.
- -c Send output to stdout.
- -d Decompress instead of compressing.
- -l List compression stats.
- -v Provide verbose output.
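Python's standard gzip module writes the same gzip file format and exposes the level as compresslevel. One difference to watch: gzip.compress defaults to level 9, not the command's default of 6, so this sketch passes the level explicitly. The sample data is made up, so the printed sizes are only illustrative.

```python
import gzip

# Highly repetitive sample text, standing in for a man page.
data = b"bash - GNU Bourne-Again Shell\n" * 2000

for level in (1, 6, 9):
    compressed = gzip.compress(data, compresslevel=level)
    # A round trip must give back the original bytes.
    assert gzip.decompress(compressed) == data
    saved = 1 - len(compressed) / len(data)
    print(f"level {level}: {len(compressed)} bytes, {saved:.1%} saved")
```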

Compressing Files: gzip

    > man bash > bash.man
    > man tcsh > tcsh.man
    > ls -l *man
    -rw-r--r-- 1 waldenj 267350 Oct 4 19:48 bash.man
    -rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
    > gzip *.man
    > ls -l *gz
    -rw-r--r-- 1 waldenj  71333 Oct 4 19:45 bash.man.gz
    -rw-r--r-- 1 waldenj  69759 Oct 4 19:45 tcsh.man.gz
    > gzip -l *gz
    compressed  uncompressed  ratio  uncompressed_name
         71333        267350  73.3%  bash.man
         69759        239534  70.8%  tcsh.man
        141092        506884  72.1%  (totals)
    >

Uncompressing Files: gunzip

    > gunzip bash.man.gz
    > ls -l *man *gz
    -rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
    -rw-r--r-- 1 waldenj  69759 Oct 4 19:45 tcsh.man.gz
    > gzip -v bash.man
    bash.man: 73.3% -- replaced with bash.man.gz
    > gzip -dc bash.man.gz | less
    User Commands                    BASH(1)
    NAME
           bash - GNU Bourne-Again Shell
    …
    > ls -l *man *gz
    -rw-r--r-- 1 waldenj  71333 Oct 4 19:45 bash.man.gz
    -rw-r--r-- 1 waldenj  69759 Oct 4 19:45 tcsh.man.gz

Note that gzip -dc leaves bash.man.gz in place, as the final listing shows.
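gzip -dc decompresses to standard output and leaves the .gz file on disk. A rough Python equivalent, assuming a throwaway file created in a temporary directory (the file name is hypothetical), reads the compressed file line by line with gzip.open in text mode:

```python
import gzip
import os
import tempfile

# Create a small .gz file to read back (stands in for bash.man.gz).
path = os.path.join(tempfile.mkdtemp(), "example.man.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write("NAME\n    bash - GNU Bourne-Again Shell\n")

# Like `gzip -dc example.man.gz`: decompress while reading,
# without writing an uncompressed copy or deleting the .gz file.
with gzip.open(path, "rt", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines[0])              # NAME
print(os.path.exists(path))  # True: the compressed file is still there
```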
Modern Compression: bzip2 [-#] [-c] [-d] [-v] file1 [file2, …]

- -# Specify compression level, from 1 to 9 (sets the block size). Default=9.
- -c Send output to stdout.
- -d Decompress instead of compressing.
- -v Provide verbose output.
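Python's bz2 module offers the same format in one-shot and streaming forms, and bz2.compress already defaults to compresslevel=9, matching bzip2. The streaming half feeds the input in 4096-byte chunks, an arbitrary size chosen for illustration, roughly the way data would arrive through a pipe to bzip2 -c. The sample data is made up.

```python
import bz2

data = b"tcsh - C shell with file name completion\n" * 5000

# One-shot: bz2.compress defaults to compresslevel=9, like bzip2 itself.
one_shot = bz2.compress(data)

# Streaming: feed the input in chunks, then flush the final block.
compressor = bz2.BZ2Compressor(9)
chunks = [compressor.compress(data[i:i + 4096]) for i in range(0, len(data), 4096)]
chunks.append(compressor.flush())
streamed = b"".join(chunks)

assert bz2.decompress(one_shot) == data
assert bz2.decompress(streamed) == data
print(len(data), "->", len(one_shot), "bytes")
```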

Modern Compression: bzip2

    > bzip2 -v bash.man tcsh.man
      bash.man: 4.821:1, 1.659 bits/byte, 79.26% saved, 267350 in, 55456 out.
      tcsh.man: 4.259:1, 1.878 bits/byte, 76.52% saved, 239534 in, 56236 out.
    > ls -l *bz2
    -rw-r--r-- 1 waldenj 55456 Oct 4 19:45 bash.man.bz2
    -rw-r--r-- 1 waldenj 56236 Oct 4 19:48 tcsh.man.bz2
    > bzip2 -d bash.man.bz2
    > bunzip2 tcsh.man.bz2
    > ls -l *.man
    -rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
    -rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
    > bzip2 -dc bash.man.bz2 | less
    User Commands                    BASH(1)
    NAME
           bash - GNU Bourne-Again Shell
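The three figures that bzip2 -v prints follow directly from the input and output sizes. This sketch recomputes them; bzip2_stats is a hypothetical helper, and its formatting only imitates the slide's output rather than bzip2's own source. Fed the sizes from the session above, it reproduces the slide's values for both files.

```python
def bzip2_stats(size_in: int, size_out: int) -> str:
    """Format the figures bzip2 -v reports for one file."""
    ratio = size_in / size_out              # e.g. 4.821:1
    bits_per_byte = 8 * size_out / size_in  # output bits per input byte
    saved = 100 * (1 - size_out / size_in)  # percentage of space saved
    return (f"{ratio:.3f}:1, {bits_per_byte:.3f} bits/byte, "
            f"{saved:.2f}% saved, {size_in} in, {size_out} out.")


print("bash.man:", bzip2_stats(267350, 55456))
print("tcsh.man:", bzip2_stats(239534, 56236))
```

All three numbers describe the same fact: bits/byte is 8 divided by the ratio, and the percentage saved is 100 times (1 minus 1/ratio).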

Displaying Compressed Files

- zcat: Identical to compress -dc
- gzcat: Identical to gzip -dc
- bzcat: Identical to bzip2 -dc

On GNU/Linux systems, zcat is usually installed with gzip and behaves like gzip -dc, which can also read .Z files.
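zcat, gzcat and bzcat pick a decompressor and write the result to standard output. A hypothetical Python helper can do the same by file suffix; it assumes UTF-8 text, and it does not handle .Z files because the standard library has no LZW decoder.

```python
import bz2
import gzip
import os
import tempfile


def open_compressed(path: str):
    """Open a file for text reading, choosing a decompressor by suffix
    (hypothetical helper; .Z files are not supported)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def cat_compressed(path: str) -> str:
    """Return the whole decompressed text, as zcat/gzcat/bzcat would print it."""
    with open_compressed(path) as f:
        return f.read()


directory = tempfile.mkdtemp()
for opener, suffix in ((gzip.open, ".gz"), (bz2.open, ".bz2")):
    path = os.path.join(directory, "days.txt" + suffix)
    with opener(path, "wt", encoding="utf-8") as f:
        f.write("Sunday\nMonday\n")
    print(suffix, cat_compressed(path).split())
```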

Compression Benchmarks

    > ls -l patch*
    -rw-r--r-- 1 waldenj 28944395 Oct 4 19:37 patch-2.6.13
    10238237 Oct 4 19:37 patch-2.6.13.Z
    5009926 Oct 4 19:37 patch-2.6.13.bz2
    6220228 Oct 4 19:37 patch-2.6.13.gz

| Compression Tool | Compression Ratio |
| --- | --- |
| compress | 64.6% |
| gzip | 78.5% |
| bzip2 | 82.7% |

The compression ratio is the percentage of space saved relative to the uncompressed patch.
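The Compression Ratio column can be recomputed as 100 * (1 - compressed / original) from the sizes in the ls -l listing; doing so confirms each row of the table. compression_ratio is a hypothetical name.

```python
# Sizes in bytes, copied from the `ls -l patch*` listing.
original = 28944395
compressed_sizes = {
    "compress": 10238237,  # patch-2.6.13.Z
    "gzip": 6220228,       # patch-2.6.13.gz
    "bzip2": 5009926,      # patch-2.6.13.bz2
}


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Percentage of space saved: 100 * (1 - compressed / original)."""
    return 100 * (1 - compressed_size / original_size)


for tool, size in compressed_sizes.items():
    print(f"{tool:9} {compression_ratio(original, size):.1f}%")
```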
Archiving Files: tar [-c] [-t] [-x] [-v] [-f file.tar] file1 [file2, …]

- -c Create a new tape archive.
- -f Use the specified archive file instead of a tape (write to it with -c, read from it with -t or -x).
- -t List the archive's table of contents.
- -v Provide verbose output.
- -x eXtract archive contents.

Archiving Files: tar

    > tar -cvf manpages.tar *.man
    bash.man
    tcsh.man
    > ls -l manpages.tar
    -rw-r--r-- 1 waldenj 512000 Oct 4 21:01 manpages.tar
    > tar -tf manpages.tar
    bash.man
    tcsh.man
    > tar -tvf manpages.tar
    -rw-r--r-- waldenj/students 267350 2005-10-04 19:45 bash.man
    -rw-r--r-- waldenj/students 239534 2005-10-04 19:48 tcsh.man
    > mkdir tmp
    > cd tmp
    > tar -xvf ../manpages.tar
    bash.man
    tcsh.man
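Python's tarfile module can replay this session. The sketch assumes two small stand-in files in a temporary directory rather than the real man pages, and uses arcname so the archive stores bare file names, as in the listing. The "data" extraction filter (added in Python 3.12 and backported to some earlier patch releases) is applied only when the running Python provides it.

```python
import os
import tarfile
import tempfile

work = tempfile.mkdtemp()
for name in ("bash.man", "tcsh.man"):  # hypothetical stand-in files
    with open(os.path.join(work, name), "w") as f:
        f.write(f"contents of {name}\n")

archive = os.path.join(work, "manpages.tar")

# tar -cvf manpages.tar bash.man tcsh.man
with tarfile.open(archive, "w") as tar:
    for name in ("bash.man", "tcsh.man"):
        tar.add(os.path.join(work, name), arcname=name)
        print(name)  # the -v listing

# tar -tf manpages.tar
with tarfile.open(archive, "r") as tar:
    names = tar.getnames()
print(names)

# mkdir tmp; cd tmp; tar -xvf ../manpages.tar
target = os.path.join(work, "tmp")
os.mkdir(target)
with tarfile.open(archive, "r") as tar:
    if hasattr(tarfile, "data_filter"):
        tar.extractall(path=target, filter="data")
    else:
        tar.extractall(path=target)
print(sorted(os.listdir(target)))
```

Like tar itself, tarfile only bundles files; compression is a separate step unless a compressed mode is requested.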

Other File Compression Tools

- PKzip/WinZip: zip, unzip
- ARJ: arj, unarj
- RAR: rar, unrar

Sorting

Sorting orders a set of items by some criterion. Examples of sorted collections include:

- Words in a dictionary.
- Names of people in a telephone directory.
- Numbers.
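Sorting: sort [-d] [-f] [-i] [-n] [-r] [-u] file1 [file2, …]

- -d Sort in dictionary order: consider only blanks and alphanumeric characters.
- -f Ignore case of letters.
- -i Ignore non-printable characters.
- -n Sort in numerical order.
- -r Reverse the order of the sort.
- -u Do not list duplicate lines in output.

With no options, sort compares whole lines character by character, using the collating order of the current locale.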
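Several of these options map onto the key and reverse arguments of Python's sorted(). The sketch is an approximation under stated assumptions: it compares by Unicode code point, as sort does with LC_ALL=C, and its -d filter treats any Unicode letter or digit as alphanumeric. sort_lines and its keyword names are hypothetical.

```python
def sort_lines(lines, fold_case=False, dictionary_order=False,
               reverse=False, unique=False):
    """Approximate sort -f, -d, -r and -u with Python's sorted().
    Compares by code point, like running sort with LC_ALL=C."""
    def key(line: str) -> str:
        if dictionary_order:  # -d: keep only blanks and alphanumerics
            line = "".join(c for c in line if c.isalnum() or c in " \t")
        if fold_case:         # -f: fold lowercase to uppercase
            line = line.upper()
        return line

    result = sorted(lines, key=key, reverse=reverse)  # -r
    if unique:  # -u: keep only the first line of each run of equal keys
        deduped = []
        for line in result:
            if not deduped or key(deduped[-1]) != key(line):
                deduped.append(line)
        result = deduped
    return result


words = ["banana", "Apple", "cherry", "apple", "banana"]
print(sort_lines(words))                               # ['Apple', 'apple', 'banana', 'banana', 'cherry']
print(sort_lines(words, fold_case=True, unique=True))  # ['Apple', 'banana', 'cherry']
```

Without -f, the uppercase "Apple" sorts first because uppercase letters precede lowercase ones in code-point order; in other locales sort may order them differently.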

sort Example

    > cat days.txt
    Sunday
    Monday
    Tuesday
    Wednesday
    Thursday
    Friday
    Saturday
    > sort days.txt
    Friday
    Monday
    Saturday
    Sunday
    Thursday
    Tuesday
    Wednesday

sort Example

    > cat days.txt
    Sunday
    Monday
    Tuesday
    Wednesday
    Thursday
    Friday
    Saturday
    > sort -r days.txt
    Wednesday
    Tuesday
    Thursday
    Sunday
    Saturday
    Monday
    Friday

sort Example

    > cat numbers.txt
    101
    5571
    58
    2001
    9
    > sort numbers.txt
    101
    2001
    5571
    58
    9
    > sort -n numbers.txt
    9
    58
    101
    2001
    5571
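The difference between the last two commands is string comparison versus numeric comparison. In Python, sorted() on strings behaves like plain sort in the C locale, and key=int approximates sort -n for input containing only integers (unlike sort -n, int raises an error on a non-numeric line). The sketch replays all three sort examples.

```python
numbers = ["101", "5571", "58", "2001", "9"]  # contents of numbers.txt

# Plain `sort`: compares strings character by character, so "58" comes
# after "2001" because '5' > '2' at the first position.
print(sorted(numbers))           # ['101', '2001', '5571', '58', '9']

# `sort -n`: compares the numeric values instead.
print(sorted(numbers, key=int))  # ['9', '58', '101', '2001', '5571']

days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
print(sorted(days))                # like `sort days.txt`
print(sorted(days, reverse=True))  # like `sort -r days.txt`
```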