Python Text
Copyright © Software Carpentry 2010. This work is licensed under the Creative Commons Attribution License. See http://software-carpentry.org/license.html for more information.
How to represent characters?
American English in the 1960s:
26 characters × {upper, lower}
+ 10 digits
+ punctuation
+ special characters for controlling teletypes (new line, carriage return, form feed, bell, …)
= 7 bits per character (ASCII standard)
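
The 7-bit claim can be checked with a short sketch, assuming Python 3 and its built-in ord and chr functions. The variable names are illustrative only.

```python
import string

# Printable ASCII: letters in both cases, digits, and punctuation.
printable = string.ascii_letters + string.digits + string.punctuation

# Teletype control characters named on the slide:
# new line, carriage return, form feed, bell.
controls = "\n\r\f\a"

codes = [ord(ch) for ch in printable + controls]

# Every code fits in 7 bits, i.e. lies in the range 0..127.
largest = max(codes)
fits_in_7_bits = all(code < 2 ** 7 for code in codes)

print(largest, fits_in_7_bits)
print(ord("A"), ord("a"), chr(65))
```
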
How to represent text?
1. Fixed-width records

A crash reduces
your expensive computer
to a simple stone.

A crash reduces········
your expensive computer
to a simple stone.·····

Easy to get to line N
But may waste space
What if lines are longer than the record length?
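
A minimal sketch of fixed-width records, assuming Python 3. The record width of 23, the padding character, and the function names to_records and get_line are hypothetical choices made for illustration. It shows why reaching line N is easy (one multiplication gives the offset) and where the wasted space comes from.

```python
RECORD_WIDTH = 23  # hypothetical record length, chosen to fit the longest line
PAD = "·"          # hypothetical padding character

lines = ["A crash reduces", "your expensive computer", "to a simple stone."]


def to_records(lines, width=RECORD_WIDTH, pad=PAD):
    """Pad every line to the same width and join them into one string."""
    for line in lines:
        if len(line) > width:
            # The open question from the slide: this sketch simply refuses.
            raise ValueError("line longer than record length: %r" % line)
    return "".join(line.ljust(width, pad) for line in lines)


def get_line(storage, n, width=RECORD_WIDTH, pad=PAD):
    """Fetch line n (counting from 0) by computing its offset directly."""
    start = n * width
    # Stripping the padding assumes no real line ends with the pad character.
    return storage[start:start + width].rstrip(pad)


storage = to_records(lines)
wasted = storage.count(PAD)  # padding characters that carry no information

print(get_line(storage, 2))
print(wasted, "of", len(storage), "characters are padding")
```
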
How to represent text?
1. Fixed-width records
2. Stream with embedded end-of-line markers

A crash reduces
your expensive computer
to a simple stone.

A crash reduces<EOL>your expensive computer<EOL>to a simple stone.<EOL>
(<EOL> stands for one end-of-line marker)

More flexible
Wastes less space
Skipping ahead is harder
What to use for end of line?
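
A sketch of the stream layout, assuming Python 3 and a single newline character as the marker. The function name get_line is hypothetical. No padding is stored, but reaching line N means scanning past N markers one by one, which is why skipping ahead is harder.

```python
EOL = "\n"  # assumed end-of-line marker for this sketch

stream = "A crash reduces\nyour expensive computer\nto a simple stone.\n"


def get_line(stream, n, eol=EOL):
    """Return line n (counting from 0) by scanning past n markers."""
    start = 0
    for _ in range(n):
        marker = stream.find(eol, start)
        if marker == -1:
            raise IndexError("stream has fewer than %d lines" % (n + 1))
        start = marker + len(eol)
    if start == len(stream):
        # Nothing follows the last marker, so there is no line n.
        raise IndexError("stream has fewer than %d lines" % (n + 1))
    end = stream.find(eol, start)
    if end == -1:
        end = len(stream)  # last line without a trailing marker
    return stream[start:end]


print(get_line(stream, 2))
print(len(stream), "characters stored, one marker per line")
```
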
Unix: newline ('\n')
Windows: carriage return + newline ('\r\n')
Oh dear…
Python converts '\r\n' to '\n' and back on Windows
To prevent this (e.g., when reading image files) open the file in binary mode
reader = open('mydata.dat', 'rb')
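
The difference between the two modes can be seen with a small sketch, assuming Python 3 and a temporary file created just for the demonstration. In Python 3 the default text mode translates '\r\n' to '\n' when reading on every platform, and converts '\n' back to '\r\n' when writing only on Windows.

```python
import os
import tempfile

# Write two lines with Windows-style endings, byte for byte.
raw = b"first line\r\nsecond line\r\n"
fd, path = tempfile.mkstemp(suffix=".dat")
with os.fdopen(fd, "wb") as writer:
    writer.write(raw)

# Text mode: '\r\n' is translated to '\n' while reading.
with open(path, "r", encoding="ascii") as reader:
    as_text = reader.read()

# Binary mode: the bytes come back exactly as stored.
with open(path, "rb") as reader:
    as_bytes = reader.read()

os.remove(path)

print(repr(as_text))
print(repr(as_bytes))
```

For image files and other non-text data, any such translation would corrupt the contents, which is the reason for 'rb'.
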
Back to characters…
How to represent ĕ, β, Я, …?
7 bits = 0…127
8 bits (a byte) = 0…255
Different companies/countries defined different meanings for 128…255
Did not play nicely together
And East Asian "characters" won't fit in 8 bits
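
One way to see the clash, assuming Python 3 and two of its standard codecs (latin-1 for Western European text, cp1251 for Cyrillic), picked here only as examples of competing 8-bit schemes: the same byte decodes to a different character under each.

```python
byte = bytes([0xE9])  # one byte with value 233, above the ASCII range

western = byte.decode("latin-1")   # ISO 8859-1 (Western European)
cyrillic = byte.decode("cp1251")   # Windows-1251 (Cyrillic)

print(western, cyrillic)  # two different letters from the same byte

# The letter Я (U+042F) has a slot in the Cyrillic code page
# but none in Latin-1.
try:
    "\u042f".encode("latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False

print(representable)
```
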
1990s: Unicode standard
Defines mapping from characters to integers
Does not specify how to store those integers
32 bits per character will do it...
...but wastes a lot of space in common cases
Use in memory (for speed)
Use something else on disk and over the wire
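
A short sketch of both points, assuming Python 3: ord exposes the integer Unicode assigns to a character, and the utf-32-le codec (used here as an example of a fixed 32-bits-per-character storage scheme) shows the space cost for plain English text.

```python
# The mapping from characters to integers (code points).
code_points = {
    "A": ord("A"),
    "\u0115": ord("\u0115"),  # ĕ
    "\u03b2": ord("\u03b2"),  # β
    "\u042f": ord("\u042f"),  # Я
}
for ch, number in code_points.items():
    print(ch, number, hex(number))

# One way to store those integers: 4 bytes for every character.
text = "stone"
fixed = text.encode("utf-32-le")  # little-endian, no byte-order mark
compact = text.encode("utf-8")

print(len(fixed), "bytes at 32 bits per character")
print(len(compact), "bytes in UTF-8")
```
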
(Almost) everyone uses a variable-length encoding called UTF-8 instead
First 128 characters (old ASCII) stored in 1 byte each
Next 1920 stored in 2 bytes, etc.

0xxxxxxx                               7 bits
110yyyyy 10xxxxxx                     11 bits
1110zzzz 10yyyyyy 10xxxxxx            16 bits
11110www 10zzzzzz 10yyyyyy 10xxxxxx   21 bits

The good news is, you don't need to know
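
For the curious, the bit patterns can be inspected directly. This is a sketch assuming Python 3; the helper name utf8_bits and the four sample characters are illustrative choices, one for each encoded length.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of one character as 8-digit binary strings."""
    return [format(byte, "08b") for byte in ch.encode("utf-8")]


samples = [
    "A",           # U+0041, fits in 7 bits
    "\u03b2",      # β, U+03B2, needs 11 bits
    "\u20ac",      # €, U+20AC, needs 16 bits
    "\U0001F600",  # U+1F600, needs 21 bits
]

patterns = {ch: utf8_bits(ch) for ch in samples}
for ch in samples:
    print(hex(ord(ch)), " ".join(patterns[ch]))
```

The leading bits of each byte (0, 110, 1110, 11110, and 10 for continuation bytes) match the rows of the layout above.
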
Python 2.* provides two kinds of string
Classic: one byte per character
Unicode: "big enough" per character
Write u'the string' for Unicode
Must specify encoding when converting from Unicode to bytes
Use UTF-8
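
A sketch of the conversion in both directions. It is written to run under Python 3, which is an assumption beyond the slides: there the same split appears as str (Unicode) and bytes, and the u prefix is accepted but optional.

```python
text = u"\u042f"  # the single character Я as a Unicode string

# Unicode to bytes: the encoding must be named explicitly.
encoded = text.encode("utf-8")

# Bytes back to Unicode: the same encoding must be used.
decoded = encoded.decode("utf-8")

print(len(text), "character,", len(encoded), "bytes")
print(repr(encoded))
print(decoded == text)
```
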
Created by Greg Wilson, October 2010