CSC 140 Introduction to IT Advanced File Processing

CSC 140: Introduction to IT Advanced File Processing CIT 140: Introduction to IT 1

Topics
1. Compressing files:
   1. compress
   2. gzip
   3. bzip2
2. Archiving files: tar
3. Sorting files: sort

Data Compression
Problem: How can we store X bytes using only Y < X bytes?
Solution: Find redundancies in the data.
1. Run-length encoding
   Encode repetitions as the repeated value and a count.
   Ex: thethethe -> the 3
2. Dictionary encoding
   Build a dictionary of words. Encode each word with a number.
   Common words: the, an, is, this
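
Run-length encoding can be sketched as a small bash function. This is a minimal illustration that works on runs of single characters: it encodes "aaaabbbcca", not the word-level "thethethe" example above. It assumes single-byte (ASCII) input that contains no digits, because digits in the input would be indistinguishable from the counts. The name `rle_encode` is hypothetical and is not a standard command.

```shell
#!/usr/bin/env bash
# rle_encode: hypothetical helper (not a standard command). Prints each run of
# identical characters as the character followed by the run length.
# Assumes single-byte (ASCII) input without digits.
rle_encode() {
  local s=$1 out="" prev="" c i count=0
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    if [[ $c == "$prev" ]]; then
      count=$(( count + 1 ))                          # same character: extend the run
    else
      if [[ -n $prev ]]; then out+="$prev$count"; fi  # close the previous run
      prev=$c
      count=1                                         # start a new run
    fi
  done
  if [[ -n $prev ]]; then out+="$prev$count"; fi      # close the final run
  printf '%s\n' "$out"
}

rle_encode "aaaabbbcca"   # prints a4b3c2a1
```

Ten input characters become eight output characters here. Input with few repeated neighbours ("xyz" becomes "x1y1z1") grows instead, which is why real compressors combine several techniques.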

Data Compression
"Ask not what your country can do for you -- ask what you can do for your country."
Dictionary: 1 ask, 2 not, 3 what, 4 your, 5 country, 6 can, 7 do, 8 for, 9 you
Encoded version: "1 2 3 4 5 6 7 8 9 -- 1 3 9 6 7 8 4 5."
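
The dictionary encoding above can be reproduced with a bash associative array, which requires bash 4 or later. This is a minimal sketch under a few assumptions. Words are passed as separate arguments with punctuation already removed. Case is ignored, so "Ask" and "ask" share code 1, as on the slide. Each new word gets the next number in order of first appearance. The helper name `dict_encode` is hypothetical.

```shell
#!/usr/bin/env bash
# dict_encode: hypothetical helper; needs bash 4+ (associative arrays, ${w,,}).
# Numbers each distinct word in order of first appearance and prints the
# number for every word of the input.
dict_encode() {
  local -A dict=()
  local next=1 w
  local -a out=()
  for w in "$@"; do
    w=${w,,}                          # lower-case so "Ask" and "ask" match
    if [[ -z ${dict[$w]-} ]]; then
      dict[$w]=$next                  # first occurrence: add to the dictionary
      next=$(( next + 1 ))
    fi
    out+=("${dict[$w]}")
  done
  printf '%s\n' "${out[*]}"
}

dict_encode Ask not what your country can do for you \
            ask what you can do for your country
# prints 1 2 3 4 5 6 7 8 9 1 3 9 6 7 8 4 5
```

A real compressor must also store the dictionary itself. The savings therefore come from words that repeat, such as the second "ask what you can do for your country".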

Compressing Files: compress [-c] [-d] [-v] file1 [file2, …]
-c Send output to stdout.
-d Decompress instead of compressing.
-v Provide verbose output.

Compressing Files Old School
The compress command: compress [options] [file-list]

Uncompressing Files Old School
The uncompress command (equivalent to compress -d)

Compressing Files: gzip [-#] [-c] [-d] [-l] [-v] file1 [file2, …]
-# Specify compression level, from -1 (fastest) to -9 (best compression). Default=6.
-c Send output to stdout.
-d Decompress instead of compressing.
-l List compression stats.
-v Provide verbose output.

Compressing Files: gzip
> man bash > bash.man
> man tcsh > tcsh.man
> ls -l *man
-rw-r--r-- 1 waldenj 267350 Oct 4 19:48 bash.man
-rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
> gzip *.man
> ls -l *gz
-rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
> gzip -l *gz
compressed  uncompressed  ratio  uncompressed_name
     71333        267350  73.3%  bash.man
     69759        239534  70.8%  tcsh.man
    141092        506884  72.1%  (totals)
>

Uncompressing Files: gunzip
> gunzip bash.man.gz
> ls -l *man *gz
-rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
> gzip -v bash.man
bash.man: 73.3% -- replaced with bash.man.gz
> gzip -dc bash.man.gz | less
User Commands BASH(1)
NAME
bash - GNU Bourne-Again Shell
…
> ls -l *man *gz
-rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
-rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
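
The sessions above never use the -# option or keep the original file. The script below exercises -#, -c and -dc on a throwaway file. Because -c writes to stdout, the original file stays in place, so both copies exist afterwards. It is a sketch that assumes gzip and cmp are on the PATH. The function `gzip_roundtrip` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# gzip_roundtrip FILE LEVEL: hypothetical helper. Compresses FILE at LEVEL (1-9)
# with -c so the original is kept, then checks that gzip -dc restores the
# exact same bytes (cmp -s exits 0 only if the two streams are identical).
gzip_roundtrip() {
  local file=$1 level=$2
  gzip "-$level" -c "$file" > "$file.$level.gz" || return 1
  gzip -dc "$file.$level.gz" | cmp -s - "$file"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
# Highly repetitive sample text, so it compresses well.
yes "ask not what your country can do for you" | head -n 5000 > "$dir/sample.txt"

for level in 1 6 9; do
  gzip_roundtrip "$dir/sample.txt" "$level" && echo "level $level: round trip OK"
done
ls -l "$dir"   # sample.txt is still there, next to one .gz per level
```

Higher levels trade CPU time for (usually) smaller output. The default of 6 is a middle ground.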

Modern Compression: bzip2 [-#] [-c] [-d] [-v] file1 [file2, …]
-# Specify compression level. Default=9.
-c Send output to stdout.
-d Decompress instead of compressing.
-v Provide verbose output.

Modern Compression: bzip2
> bzip2 -v bash.man tcsh.man
  bash.man: 4.821:1, 1.659 bits/byte, 79.26% saved, 267350 in, 55456 out.
  tcsh.man: 4.259:1, 1.878 bits/byte, 76.52% saved, 239534 in, 56236 out.
> ls -l *bz2
-rw-r--r-- 1 waldenj 55456 Oct 4 19:45 bash.man.bz2
-rw-r--r-- 1 waldenj 56236 Oct 4 19:48 tcsh.man.bz2
> bzip2 -d bash.man.bz2
> bunzip2 tcsh.man.bz2
> ls -l *.man
-rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
-rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
> bzip2 -dc bash.man.bz2 | less
User Commands BASH(1)
NAME
bash - GNU Bourne-Again Shell

Displaying Compressed Files
zcat – Identical to compress -dc (on GNU/Linux systems, zcat is installed by gzip and is identical to gzip -dc)
gzcat – Identical to gzip -dc
bzcat – Identical to bzip2 -dc
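
The script below shows that these commands only display the decompressed text and leave the compressed file untouched. It is a sketch that assumes a GNU/Linux system, where zcat comes with gzip and behaves like gzip -dc. gzcat may not exist there, so the script calls gzip -dc directly. bzcat works the same way on .bz2 files. The file name is hypothetical.

```shell
#!/usr/bin/env bash
# Assumes GNU/Linux, where zcat is installed by gzip and behaves like gzip -dc.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'Sunday\nMonday\nTuesday\n' > "$dir/days.txt"
gzip "$dir/days.txt"                  # replaces days.txt with days.txt.gz

# Both commands print the decompressed text; days.txt.gz is not modified.
zcat "$dir/days.txt.gz" | head -n 2
gzip -dc "$dir/days.txt.gz" | head -n 2
ls "$dir"                             # still only days.txt.gz
```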

Compression Benchmarks
> ls -l patch*
-rw-r--r-- 1 waldenj 28944395 Oct 4 19:37 patch-2.6.13
10238237 Oct 4 19:37 patch-2.6.13.Z
5009926 Oct 4 19:37 patch-2.6.13.bz2
6220228 Oct 4 19:37 patch-2.6.13.gz

Compression Tool   Compression Ratio (space saved)
compress           64.6%
gzip               78.5%
bzip2              82.7%
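
The percentages in the table are the fraction of space saved, 100 × (1 − compressed size / original size). The awk sketch below recomputes them from the byte counts in the ls -l listing. The helper name `saved_pct` is hypothetical.

```shell
#!/usr/bin/env bash
# saved_pct ORIGINAL COMPRESSED: hypothetical helper that prints the percentage
# of space saved, 100 * (1 - compressed / original), to one decimal place.
saved_pct() {
  awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f%%\n", 100 * (1 - c / o) }'
}

orig=28944395                         # size of patch-2.6.13 in bytes
echo "compress: $(saved_pct "$orig" 10238237)"   # 64.6%
echo "gzip:     $(saved_pct "$orig" 6220228)"    # 78.5%
echo "bzip2:    $(saved_pct "$orig" 5009926)"    # 82.7%
```

The same formula gives the 73.3% that gzip -l reported for bash.man (267350 bytes in, 71333 bytes out).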

Archiving Files: tar [-c] [-t] [-x] [-v] [-f file.tar] file1 [file2, …]
-c Create a new tape archive.
-f Use the specified archive file instead of a tape device.
-t List (table of contents) the archive contents.
-v Provide verbose output.
-x Extract archive contents.

Archiving Files: tar
> tar -cvf manpages.tar *.man
bash.man
tcsh.man
> ls -l manpages.tar
-rw-r--r-- 1 waldenj 512000 Oct 4 21:01 manpages.tar
> tar -tf manpages.tar
bash.man
tcsh.man
> tar -tvf manpages.tar
-rw-r--r-- waldenj/students 267350 2005-10-04 19:45 bash.man
-rw-r--r-- waldenj/students 239534 2005-10-04 19:48 tcsh.man
> mkdir tmp
> cd tmp
> tar -xvf ../manpages.tar
bash.man
tcsh.man
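
tar only bundles files and does not compress them. The 512000-byte manpages.tar above is slightly larger than the 267350 + 239534 = 506884 bytes of its two members. Archiving and compression can be chained. With "-f -", tar writes the archive to stdout (or reads it from stdin), a standard tar convention, and gzip handles the compression. The script below is a sketch that assumes tar and gzip are on the PATH. All directory and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory; all file names here are hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

mkdir src out
printf 'Sunday\nMonday\n' > src/days.txt
printf '101\n58\n9\n' > src/numbers.txt

# Archive: "-f -" makes tar write the archive to stdout; gzip compresses it.
tar -cf - src | gzip -9 > src.tar.gz

# Restore: gzip -dc writes the .tar to stdout; "tar -xvf -" reads it from stdin.
( cd out && gzip -dc ../src.tar.gz | tar -xvf - )

diff -r src out/src && echo "archive round trip OK"
```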

Other File Compression Tools
PKZIP/WinZip: zip, unzip
ARJ: arj, unarj
RAR: rar, unrar

Sorting
Ordering a set of items by some criterion. Examples of sorted collections include:
– Words in a dictionary.
– Names of people in a telephone directory.
– Numbers.

Sorting: sort [-d] [-f] [-i] [-n] [-r] [-u] file1 [file2, …]
By default, sort compares whole lines as text, in the collating order of the current locale.
-d Sort in dictionary order (consider only blanks and alphanumeric characters).
-f Ignore case of letters.
-i Ignore non-printable characters.
-n Sort in numerical order.
-r Reverse the order of the sort.
-u Do not list duplicate lines in output.
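
The -n, -r, -u and -f options can be compared on a small sample. This is a sketch with hypothetical input data. LC_ALL=C is set so that text ordering is byte-by-byte and does not depend on the system's locale. Each command prints one item per line; the comments list the lines in order.

```shell
#!/usr/bin/env bash
# Sample input (hypothetical), one number per line, with 9 appearing twice.
data=$'58\n9\n101\n9\n2001'

printf '%s\n' "$data" | LC_ALL=C sort       # text:     101 2001 58 9 9
printf '%s\n' "$data" | LC_ALL=C sort -n    # numeric:  9 9 58 101 2001
printf '%s\n' "$data" | LC_ALL=C sort -nr   # reversed: 2001 101 58 9 9
printf '%s\n' "$data" | LC_ALL=C sort -nu   # unique:   9 58 101 2001

# In the C locale, uppercase letters sort before lowercase; -f folds case.
printf 'apple\nBanana\ncherry\n' | LC_ALL=C sort      # Banana apple cherry
printf 'apple\nBanana\ncherry\n' | LC_ALL=C sort -f   # apple Banana cherry
```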

sort Example
> cat days.txt
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
> sort days.txt
Friday
Monday
Saturday
Sunday
Thursday
Tuesday
Wednesday

sort Example
> cat days.txt
Sunday
Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
> sort -r days.txt
Wednesday
Tuesday
Thursday
Sunday
Saturday
Monday
Friday

sort Example
> cat numbers.txt
101
5571
58
2001
9
> sort numbers.txt
101
2001
5571
58
9
> sort -n numbers.txt
9
58
101
2001
5571