CSC 140 Introduction to IT Advanced File Processing
- Slides: 22
CSC 140: Introduction to IT Advanced File Processing CIT 140: Introduction to IT 1
Topics
1. Compressing files
   1. compress
   2. gzip
   3. bzip2
2. Archiving files: tar
3. Sorting files: sort
Data Compression
Problem: How can we store X bytes using only Y < X bytes?
Solution: Find redundancies in the data.
1. Run-length encoding
   Encode repetitions as the repeated value and a count.
   Ex: thethethe -> the 3
2. Dictionary encoding
   Build a dictionary of words. Encode each word with a number.
   Common words: the, an, is, this
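Run-length encoding can be sketched in a few lines of shell. This is only an illustration, assuming a POSIX shell with awk: the function name rle is made up, and it counts runs of a single repeated character rather than repeated groups such as "the" in the slide's example.

```shell
# Run-length encode a string one character at a time:
# each run of a repeated character becomes "<char><count>".
# (Hypothetical helper; input containing digits would be ambiguous to decode.)
rle() {
  printf '%s\n' "$1" | awk '{
    out = ""
    n = length($0)
    i = 1
    while (i <= n) {
      ch = substr($0, i, 1)
      run = 1
      # Extend the run while the next character matches.
      while (i + run <= n && substr($0, i + run, 1) == ch) run++
      out = out ch run
      i += run
    }
    print out
  }'
}

rle "aaaabbbcc"   # prints a4b3c2
```

Text with no repeated characters gets longer under this scheme ("abc" becomes "a1b1c1"), which is why compression only pays off when the data really is redundant.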
Data Compression
"Ask not what your country can do for you -- ask what you can do for your country."
Dictionary:
  1 ask   2 not   3 what   4 your   5 country   6 can   7 do   8 for   9 you
Encoded version: "1 2 3 4 5 6 7 8 9 -- 1 3 9 6 7 8 4 5."
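The dictionary encoding of the quotation can be reproduced with a short shell sketch. It assumes a POSIX shell with tr and awk; dict_encode is a hypothetical name. Unlike the slide, it lowercases the text and drops punctuation (including the "--"), and it numbers each word in order of first appearance.

```shell
# Replace every word with the number assigned at its first appearance.
# (Hypothetical helper: lowercases the input and discards punctuation.)
dict_encode() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |
    tr -cs 'a-z\n' ' ' |
    awk '{
      out = ""
      for (i = 1; i <= NF; i++) {
        # First time we see a word, give it the next free number.
        if (!($i in code)) code[$i] = ++next_id
        out = out (i > 1 ? " " : "") code[$i]
      }
      print out
    }'
}

dict_encode "Ask not what your country can do for you -- ask what you can do for your country."
# prints 1 2 3 4 5 6 7 8 9 1 3 9 6 7 8 4 5
```

A real compressor would also have to store the dictionary alongside the encoded numbers so the text can be decoded again.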
Compressing Files: compress
compress [-c] [-d] [-v] file1 [file2, …]
  -c  Send output to stdout.
  -d  Decompress instead of compressing.
  -v  Provide verbose output.
Compressing Files Old School
The compress command
compress [options] [file-list]
Uncompressing Files Old School
The uncompress command
Compressing Files: gzip
gzip [-#] [-c] [-d] [-l] [-v] file1 [file2, …]
  -#  Specify compression level. Default=6.
  -c  Send output to stdout.
  -d  Decompress instead of compressing.
  -l  List compression stats.
  -v  Provide verbose output.
Compressing Files: gzip
    > man bash > bash.man
    > man tcsh > tcsh.man
    > ls -l *man
    -rw-r--r-- 1 waldenj 267350 Oct 4 19:48 bash.man
    -rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
    > gzip *.man
    > ls -l *gz
    -rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
    -rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
    > gzip -l *gz
    compressed uncompressed ratio uncompressed_name
         71333       267350 73.3% bash.man
         69759       239534 70.8% tcsh.man
        141092       506884 72.1% (totals)
    >
Uncompressing Files: gunzip
    > gunzip bash.man.gz
    > ls -l *man *gz
    -rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
    -rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
    > gzip -v bash.man
    bash.man: 73.3% -- replaced with bash.man.gz
    > gzip -dc bash.man.gz | less
    User Commands BASH(1)
    NAME
    bash - GNU Bourne-Again Shell
    …
    > ls -l *man *gz
    -rw-r--r-- 1 waldenj 71333 Oct 4 19:45 bash.man.gz
    -rw-r--r-- 1 waldenj 69759 Oct 4 19:45 tcsh.man.gz
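The -# and -c options listed for gzip can be tried on a throwaway file. The following is a minimal sketch, assuming a POSIX shell with gzip, cmp, wc, and mktemp available; the sample file and the workdir variable are made up for illustration.

```shell
# Hypothetical sample file: 2000 identical lines, so it compresses well.
workdir=$(mktemp -d)
i=0
while [ "$i" -lt 2000 ]; do
  echo "the quick brown fox jumps over the lazy dog"
  i=$((i + 1))
done > "$workdir/sample.txt"

# -c writes to stdout, so the original file is left in place.
gzip -1 -c "$workdir/sample.txt" > "$workdir/fast.gz"   # level 1: fastest
gzip -9 -c "$workdir/sample.txt" > "$workdir/best.gz"   # level 9: best compression

# Compare the sizes in bytes.
wc -c "$workdir/sample.txt" "$workdir/fast.gz" "$workdir/best.gz"

# Decompress to stdout and confirm the result matches the original.
gzip -dc "$workdir/best.gz" | cmp - "$workdir/sample.txt" && echo "round trip OK"
```

Because -c leaves sample.txt untouched, both compressed copies can be made from the same input; without -c, gzip would replace the file with sample.txt.gz as in the session above. Remove the scratch directory with rm -r "$workdir" when finished.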
Modern Compression: bzip2
bzip2 [-#] [-c] [-d] [-v] file1 [file2, …]
  -#  Specify compression level. Default=9.
  -c  Send output to stdout.
  -d  Decompress instead of compressing.
  -v  Provide verbose output.
Modern Compression: bzip2
    > bzip2 -v bash.man tcsh.man
    bash.man: 4.821:1, 1.659 bits/byte, 79.26% saved, 267350 in, 55456 out.
    tcsh.man: 4.259:1, 1.878 bits/byte, 76.52% saved, 239534 in, 56236 out.
    > ls -l *bz2
    -rw-r--r-- 1 waldenj 55456 Oct 4 19:45 bash.man.bz2
    -rw-r--r-- 1 waldenj 56236 Oct 4 19:48 tcsh.man.bz2
    > bzip2 -d bash.man.bz2
    > bunzip2 tcsh.man.bz2
    > ls -l *.man
    -rw-r--r-- 1 waldenj 267350 Oct 4 19:45 bash.man
    -rw-r--r-- 1 waldenj 239534 Oct 4 19:48 tcsh.man
    > bzip2 -dc bash.man.bz2 | less
    User Commands BASH(1)
    NAME
    bash - GNU Bourne-Again Shell
Displaying Compressed Files
zcat  - Identical to compress -dc
gzcat - Identical to gzip -dc
bzcat - Identical to bzip2 -dc
Compression Benchmarks
    > ls -l patch*
    -rw-r--r-- 1 waldenj 28944395 Oct 4 19:37 patch-2.6.13
    10238237 Oct 4 19:37 patch-2.6.13.Z
    5009926 Oct 4 19:37 patch-2.6.13.bz2
    6220228 Oct 4 19:37 patch-2.6.13.gz

Compression Tool   Compression Ratio
compress           64.6%
gzip               78.5%
bzip2              82.7%
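The ratios in the benchmark table are the fraction of space saved, 1 - compressed/original, using the byte counts from the ls listing. A small sketch of that arithmetic, assuming a POSIX shell with awk; the function name ratio is made up:

```shell
# ratio ORIGINAL COMPRESSED
# Prints the percentage of space saved, to one decimal place.
ratio() {
  awk -v orig="$1" -v comp="$2" \
    'BEGIN { printf "%.1f%%\n", (1 - comp / orig) * 100 }'
}

ratio 28944395 10238237   # compress: prints 64.6%
ratio 28944395 6220228    # gzip:     prints 78.5%
ratio 28944395 5009926    # bzip2:    prints 82.7%
```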
Archiving Files: tar
tar [-c] [-t] [-x] [-v] [-f file.tar] file1 [file2, …]
  -c  Create a new tape archive.
  -f  Use the specified archive file instead of the tape device.
  -t  List the archive's table of contents.
  -v  Provide verbose output.
  -x  eXtract archive contents.
Archiving Files: tar
    > tar -cvf manpages.tar *.man
    bash.man
    tcsh.man
    > ls -l manpages.tar
    -rw-r--r-- 1 waldenj 512000 Oct 4 21:01 manpages.tar
    > tar -tf manpages.tar
    bash.man
    tcsh.man
    > tar -tvf manpages.tar
    -rw-r--r-- waldenj/students 267350 2005-10-04 19:45 bash.man
    -rw-r--r-- waldenj/students 239534 2005-10-04 19:48 tcsh.man
    > mkdir tmp
    > cd tmp
    > tar -xvf ../manpages.tar
    bash.man
    tcsh.man
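tar only bundles files; it does not compress them (the 512000-byte archive above is larger than its two members combined). One common way to get a compressed archive is to pipe tar through gzip. The sketch below is not from the slides: it assumes a POSIX shell with tar, gzip, and mktemp, uses "-f -" to make tar read or write the archive on stdin/stdout, and all file and directory names are made up.

```shell
# Scratch area with two small hypothetical files.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/out"
echo "first file"  > "$demo/src/a.txt"
echo "second file" > "$demo/src/b.txt"

# Create: tar writes the archive to stdout (-f -), gzip compresses it.
(cd "$demo/src" && tar -cf - a.txt b.txt) | gzip > "$demo/files.tar.gz"

# List the contents without extracting anything.
gzip -dc "$demo/files.tar.gz" | tar -tf -

# Extract into a separate directory.
gzip -dc "$demo/files.tar.gz" | (cd "$demo/out" && tar -xf -)
```

Extracting into a fresh directory, as the slide does with tmp, avoids overwriting the original files.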
Other File Compression Tools
PKZIP/WinZip   zip, unzip
ARJ            arj, unarj
RAR            rar, unrar
Sorting
Ordering a set of items by some criterion.
Systems in which sorting is used include:
- Words in a dictionary.
- Names of people in a telephone directory.
- Numbers.
Sorting: sort
sort [-d] [-f] [-i] [-n] [-r] [-u] file1 [file2, …]
  -d  Sort in dictionary order (compare only blanks and alphanumeric characters).
  -f  Ignore case of letters.
  -i  Ignore non-printable characters.
  -n  Sort in numerical order.
  -r  Reverse order of sort.
  -u  Do not list duplicate lines in output.
sort Example
    > cat days.txt
    Sunday
    Monday
    Tuesday
    Wednesday
    Thursday
    Friday
    Saturday
    > sort days.txt
    Friday
    Monday
    Saturday
    Sunday
    Thursday
    Tuesday
    Wednesday
sort Example
    > cat days.txt
    Sunday
    Monday
    Tuesday
    Wednesday
    Thursday
    Friday
    Saturday
    > sort -r days.txt
    Wednesday
    Tuesday
    Thursday
    Sunday
    Saturday
    Monday
    Friday
sort Example
    > cat numbers.txt
    101
    5571
    58
    2001
    9
    > sort numbers.txt
    101
    2001
    5571
    58
    9
    > sort -n numbers.txt
    9
    58
    101
    2001
    5571
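The -u and -f options, and combining -n with -r, are not shown in the examples above. A short sketch, assuming a POSIX shell; the fruit names and letters are made-up input, and the numbers are the ones from numbers.txt fed in with printf instead of a file.

```shell
# -u: each distinct line is listed once.
printf '%s\n' banana apple cherry apple | sort -u
# prints apple, banana, cherry (one per line)

# -f: upper and lower case compare as equal.
printf '%s\n' b A c | sort -f
# prints A, b, c (one per line)

# -n and -r together: numeric order, largest first.
printf '%s\n' 101 5571 58 2001 9 | sort -n -r
# prints 5571, 2001, 101, 58, 9 (one per line)
```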