Programming for Engineers in Python
Lecture 3: Data Analysis
Autumn 2011-12
Lecture 2: Highlights
• Simulation: power lines and rare diseases
• Plan before coding
• Using modules
  • Import
  • Constants
Tuples
• Fixed size
• Immutable (like strings)
• What are they good for (compared to lists)?
  • Simpler ("lightweight")
  • Stuff multiple things into a single container
  • Immutable (e.g., records in a database)
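A short sketch of the tuple properties listed above, using a hypothetical database record: packing several values into one tuple, unpacking them, and the `TypeError` Python raises when code tries to modify a tuple in place.

```python
# A hypothetical database record: (id, name, year) packed into one tuple
record = (17, "Alice", 1865)

# Unpacking assigns each field to its own name
record_id, name, year = record

def try_modify(t):
    """Try to replace the first item; tuples are immutable, so this fails for them."""
    try:
        t[0] = 18
    except TypeError:
        return "immutable"
    return "modified"

print(name, year)          # Alice 1865
print(len(record))         # 3 (fixed size)
print(try_modify(record))  # immutable
```

The same function succeeds on a list such as `[1, 2]`, which is the practical difference between the two containers.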
Dictionaries (Hash Tables)
• Key-value mapping
• Fast lookup by key (constant time on average)
• Usage:
  • Database
  • Dictionary
  • Phone book
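A minimal phone-book sketch of the key-value mapping; the names and numbers are made up for illustration.

```python
# Names (keys) mapped to phone numbers (values); all entries are hypothetical
phone_book = {"Alice": "03-1234567", "Bob": "03-7654321"}

# Assigning to a key inserts a new entry or updates an existing one
phone_book["Carol"] = "03-5555555"

# Indexing raises KeyError for a missing key; get() returns a default instead
print(phone_book["Alice"])                # 03-1234567
print(phone_book.get("Dave", "unknown"))  # unknown

# Membership tests look at keys, not values
print("Bob" in phone_book)                # True
```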
Dictionaries (Cont.)
Dict – Initialize a Dictionary from a List
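The slide's own example did not survive extraction. Three common ways to build a dictionary from list data, offered as a sketch rather than the lecture's code: `dict()` on a list of (key, value) pairs, `dict(zip(...))` on two parallel lists, and `dict.fromkeys()` on a list of keys. The sample data is hypothetical.

```python
# 1. A list of (key, value) pairs
pairs = [("apple", 3), ("banana", 5), ("cherry", 7)]
fruit_count = dict(pairs)

# 2. Two parallel lists combined with zip
names = ["apple", "banana", "cherry"]
prices = [1.5, 0.5, 4.0]
price_of = dict(zip(names, prices))

# 3. A list of keys, each given the same starting value
words = ["to", "be", "or", "not"]
zero_counts = dict.fromkeys(words, 0)  # every key starts at 0

print(fruit_count["banana"])  # 5
print(price_of["cherry"])     # 4.0
```

Avoid `dict.fromkeys` with a mutable default such as `[]`: every key would share the same list object.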
Sorting Lists
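A sketch of the two built-in ways to sort a list (the slide's own examples are not preserved): `sorted()` returns a new list, while `list.sort()` sorts in place and returns `None`; both accept `key=` and `reverse=`.

```python
numbers = [5, 2, 9, 1]

# sorted() builds a new list and leaves the original unchanged
ascending = sorted(numbers)

# list.sort() reorders the list itself
numbers.sort(reverse=True)

# key= sorts by a computed value, here the length of each word
words = ["banana", "fig", "apple"]
by_length = sorted(words, key=len)

print(ascending)  # [1, 2, 5, 9]
print(numbers)    # [9, 5, 2, 1]
print(by_length)  # ['fig', 'apple', 'banana']
```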
Types and Casting
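A sketch of checking types with `type()` and casting with `int()`, `float()`, and `str()`, assuming Python 3 (the slide's own examples are not preserved).

```python
# type() reports the type of a value
print(type(3), type(3.0), type("3"))  # <class 'int'> <class 'float'> <class 'str'>

# Casting converts between types
n = int("42")     # str -> int
x = float("2.5")  # str -> float
s = str(7)        # int -> str
t = int(3.9)      # float -> int truncates toward zero, giving 3

# Python does not convert implicitly: str + int raises TypeError
try:
    label = "Lecture " + 3
except TypeError:
    label = "Lecture " + str(3)

print(n + 1, x * 2, s + "!", t, label)  # 43 5.0 7! 3 Lecture 3
```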
Today: Data Analysis
• Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.
• Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
• Descriptive
• Predictive
Data Analysis Examples
• Stock market trends
• Genome-disease association
• Face recognition
• Production yield
• Business intelligence
• Speech recognition
• Text categorization
Text Categorization / Document Classification
How is it Done?
• Manually
• Automatically
  • Gather document statistics
  • Measure how similar the document is to the documents in each category
• Today we will collect word statistics from several well-known books
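The slides do not say which similarity measure is used. One common choice, assumed here, is cosine similarity between word-count vectors. The helper names and the example sentences are hypothetical.

```python
import math
from collections import Counter

def word_counts(text):
    """Count lower-cased, whitespace-separated words."""
    return Counter(text.lower().split())

def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two word-count vectors.

    0.0 means no shared words; 1.0 means identical word proportions.
    """
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)

doc = word_counts("the cat sat on the mat")
animals = word_counts("the cat and the dog")
finance = word_counts("stock prices rose sharply")

print(round(cosine_similarity(doc, animals), 3))  # 0.668
print(cosine_similarity(doc, finance))            # 0.0
```

A new document would be assigned to the category whose documents it is most similar to.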
Plan
• Find data
• Collect word statistics
• Observe results
Find Data
• This might be the hardest task for many applications!
• Project Gutenberg (http://www.gutenberg.org/)
  • Alice's Adventures in Wonderland (http://www.gutenberg.org/cache/epub/11/pg11.txt)
  • The Bible, King James version, Book 1: Genesis (http://www.gutenberg.org/cache/epub/8001/pg8001.txt)
Reading a Book
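The slide's code did not survive extraction. A minimal sketch, assuming the book has been saved locally as a UTF-8 plain-text file; the function name `read_words` and the filename `pg11.txt` are assumptions.

```python
def read_words(path):
    """Return the list of whitespace-separated words in a text file."""
    # 'with' closes the file automatically, even if an error occurs
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # split() with no arguments splits on any run of spaces, tabs, or newlines
    return text.split()

# Usage, assuming Alice's Adventures in Wonderland was saved as pg11.txt:
# words = read_words("pg11.txt")
# print(len(words))
```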
Flying
Print Most Popular Words (High Level)
Modular Programming
• Top-down approach: first write what you plan to do, then implement the details
• Clear for readers
• Easier to debug
• Easier to update
PrintMostPopular: Build Word-Occurrences Dictionary
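A sketch of this step, assuming the book has already been split into a list of words; the function name `count_words` is hypothetical. The `dict.get(key, default)` idiom avoids a separate check for words seen for the first time.

```python
from collections import Counter

def count_words(words):
    """Build a word -> number-of-occurrences dictionary."""
    counts = {}
    for word in words:
        # get() returns 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts

words = "the cat and the hat and the bat".split()
counts = count_words(words)
print(counts["the"], counts["and"], counts["cat"])  # 3 2 1

# collections.Counter builds the same mapping in a single call
print(counts == Counter(words))  # True
```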
PrintMostPopular: Sort Words by Occurrences
• How? See http://docs.python.org/library/operator.html
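A sketch of sorting a word-occurrences dictionary by count with `operator.itemgetter` from the standard `operator` module; the sample counts are hypothetical.

```python
import operator

counts = {"the": 3, "and": 2, "cat": 1, "hat": 1}

# items() yields (word, count) pairs; itemgetter(1) picks the count as the sort key
by_count = sorted(counts.items(), key=operator.itemgetter(1), reverse=True)

# Print the two most frequent words
for word, count in by_count[:2]:
    print(word, count)
# the 3
# and 2
```

`key=lambda pair: pair[1]` is equivalent. To break ties alphabetically, sort with `key=lambda pair: (-pair[1], pair[0])` and drop `reverse=True`.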
The Code
Results
And Now for Several Books
Results
[Figure: occurrences of the word “to” as an example, Bible vs. L. Carroll]
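Raw counts are not directly comparable between books of different lengths. One option, assumed here because the slide's figure did not survive, is to divide by the total number of words in each book. The short texts below are hypothetical stand-ins for the downloaded books, and `/` is Python 3 true division.

```python
def relative_frequency(text, word):
    """Fraction of the words in text that equal word (case-insensitive)."""
    words = text.lower().split()
    if not words:
        return 0.0  # avoid dividing by zero for an empty text
    return words.count(word.lower()) / len(words)

# Hypothetical stand-ins for the real books
books = {
    "Genesis": "In the beginning God created the heaven and the earth",
    "Alice": "Alice was beginning to get very tired of sitting by her sister",
}

for title, text in books.items():
    print(title, round(relative_frequency(text, "to"), 3))
# Genesis 0.0
# Alice 0.083
```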
How is it Really Done?
• Preprocessing (e.g., convert words to lower case, remove punctuation marks)
• Word count
• Enhance statistics
  • Discard stop words (e.g., and, of, a)
  • Stemming (e.g., go & went)
  • Synonyms
  • Bigrams, trigrams
• Similarity measures to existing documents / categories
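A sketch of the preprocessing and enhancement steps listed above (lower-casing, punctuation removal, stop-word filtering, bigrams), using only the standard library. The stop-word set is a tiny hypothetical stand-in for a full list, and stemming and synonyms are left out.

```python
import string

# Hypothetical stop-word set; real lists are much longer
STOP_WORDS = {"and", "of", "a", "the", "to"}

def preprocess(text):
    """Lower-case the text, strip punctuation, and drop stop words."""
    # maketrans with a third argument maps each punctuation character to None
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()
    return [w for w in words if w not in STOP_WORDS]

def bigrams(words):
    """Pairs of adjacent words."""
    return list(zip(words, words[1:]))

tokens = preprocess("The Queen of Hearts, she made some tarts!")
print(tokens)               # ['queen', 'hearts', 'she', 'made', 'some', 'tarts']
print(bigrams(tokens)[:2])  # [('queen', 'hearts'), ('hearts', 'she')]
```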
How is it Really Done? Categories
• Topics: http://www.cs.tau.ac.il/courses/pyProg/1112a/lectures/3/topics.rbb
• Categories hierarchy: http://www.cs.tau.ac.il/courses/pyProg/1112a/lectures/3/rcv1.topics.hier.orig
How is it Really Done? Enhance Statistics
• Stop words: http://www.cs.tau.ac.il/courses/pyProg/1112a/lectures/3/english.stop
• After processing: http://www.cs.tau.ac.il/courses/pyProg/1112a/lectures/3/lyrl2004-non-v2_tokens_test_pt0.dat
Off Topic: Find the Error
And Now?
INDENTATION IS REALLY, REALLY IMPORTANT!
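The slide's buggy code did not survive extraction. As a hypothetical illustration of why indentation matters, the two functions below differ only in the indentation of the `return` line, yet compute different results.

```python
def total_wrong(numbers):
    total = 0
    for n in numbers:
        total += n
        return total  # inside the loop: returns after the first item

def total_right(numbers):
    total = 0
    for n in numbers:
        total += n
    return total      # at function level: runs after the loop finishes

print(total_wrong([1, 2, 3]))  # 1
print(total_right([1, 2, 3]))  # 6
```

Neither version is a syntax error, so Python cannot flag the bug; only the indentation tells the interpreter which block the `return` belongs to.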