
Pandas

DataFrame
A 2-D tabular structure that is mutable, has rows and columns, and you can do fun things with it!

Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species  Treatment
5.1           3.5          1.4           0.2          setosa   1
4.9           3            1.4           0.2          setosa   1
4.7           3.2          1.3           0.2          setosa   0
4.6           3.1          1.5           0.2          setosa   0
5             3.6          1.4           0.2          setosa   1
5.4           3.9          1.7           0.4          setosa   1
4.6           3.4          1.4           0.3          setosa   0
5             3.4          1.5           0.2          setosa   1
4.4           2.9          1.4           0.2          setosa   0
4.9           3.1          1.5           0.1          setosa   0
5.4           3.7          1.5           0.2          setosa   1
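To try things out without a file, the rows from the table above can be typed straight into a DataFrame. This is a minimal sketch; the extra column name "Sepal.Ratio" is a hypothetical example, not something from the slide.

```python
import pandas as pd

# The rows from the table above, typed in by hand
iris = pd.DataFrame(
    {
        "Sepal.Length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4],
        "Sepal.Width": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7],
        "Petal.Length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5],
        "Petal.Width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2],
        "Species": ["setosa"] * 11,
        "Treatment": [1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1],
    }
)

print(iris.shape)  # (11, 6): 11 rows, 6 columns

# Mutable: add a new column computed from two existing ones (hypothetical name)
iris["Sepal.Ratio"] = iris["Sepal.Length"] / iris["Sepal.Width"]

# Keep only the rows where a condition is true
treated = iris[iris["Treatment"] == 1]
print(treated)
```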

Pandas
A module that lets you easily manipulate DataFrames for analysis. Lots of commands!!
##########
# import modules
import pandas as pd
##########

Install and Import
To install for the first time: conda install pandas OR pip install pandas
When you import in a shell or run a script, type:
import pandas as pd
This tells Python to use the "pandas" module and nickname it "pd".
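A quick way to check that the install worked, as a minimal sketch: import the module under its nickname and print its version string (the exact version will differ from machine to machine).

```python
import pandas as pd

# If this line runs without an ImportError, pandas is installed
print(pd.__version__)
```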

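Different ways to read in a file
# comma-separated file whose first line holds the column names
df = pd.read_csv("data.csv")
######
# tab-separated file with no header row (columns are numbered 0, 1, 2, ...)
df = pd.read_csv("data.csv", header=None, sep="\t")
######
# supply your own column names (add header=0 if the file already has a header row to replace)
fields = ["Column 1", "Column 2", "Column 3"]
df = pd.read_csv("data.csv", names=fields, sep=",")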
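A minimal sketch of what each style of read_csv call does, using small made-up text in place of a real data.csv (io.StringIO lets read_csv treat a string as if it were a file). The column names name, count, and value are hypothetical.

```python
import io
import pandas as pd

# Hypothetical file contents standing in for "data.csv"
csv_text = "Column 1,Column 2,Column 3\na,1,2.5\nb,3,4.0\n"
tsv_text = "a\t1\t2.5\nb\t3\t4.0\n"  # tab-separated, no header row

# 1. Default: comma separator, first line is the header
df1 = pd.read_csv(io.StringIO(csv_text))

# 2. No header row and a tab separator: columns are numbered 0, 1, 2
df2 = pd.read_csv(io.StringIO(tsv_text), header=None, sep="\t")

# 3. Your own names; header=0 means "the file's first line is a header, replace it"
fields = ["name", "count", "value"]
df3 = pd.read_csv(io.StringIO(csv_text), names=fields, header=0)

print(df1.columns.tolist())
print(df2.columns.tolist())
print(df3.columns.tolist())
```

Without header=0 in the third call, the file's own header line would be read in as an ordinary data row.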
Handy Commands
Watch out for when you need to use (), [], periods, and quotes… it's tricky!
df.head() shows the first 5 rows (df.head(10) shows the first 10)
df.tail() shows the last 5 rows
df["column 1"].unique() lists only the unique values in a column (df.column1.unique() also works, but only when the column name has no spaces or periods)
df.drop(["column 1"], axis=1) returns a copy of df without column 1 (use df = df.drop(...) to keep the change)
df["column 2"].min() minimum value of column 2
df["column 2"].mean() mean value of column 2
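The handy commands on a small made-up DataFrame (the values are hypothetical, chosen only to show what each command returns):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Species": ["setosa", "setosa", "versicolor", "virginica", "versicolor", "setosa"],
        "Petal.Length": [1.4, 1.3, 4.7, 6.0, 4.5, 1.5],
        "Treatment": [1, 0, 1, 0, 0, 1],
    }
)

print(df.head())                  # first 5 rows (the default)
print(df.tail(2))                 # last 2 rows
print(df["Species"].unique())     # each distinct value once, in order of appearance
print(df.Species.unique())        # attribute access works: "Species" is a valid Python name
# df.Petal.Length would NOT work: the period makes Python look for an attribute "Length"

# drop() returns a NEW DataFrame; df itself still has the column
df_small = df.drop(["Treatment"], axis=1)

print(df["Petal.Length"].min())   # 1.3
print(df["Petal.Length"].mean())
```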

Practice
Iris dataset: iris-data.csv is on the website if you want to follow along. I will post the finished script on GitHub.

Bedfiles
Columns: Scaffold, Start, Stop, Element, Score, Strand, Family, Sub-Family, Divergence
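A minimal sketch for naming these columns as you read the file, assuming the bed file is tab-separated, has no header row, and lists its columns in the order above. The helper name read_rm_bed and the two example lines are hypothetical.

```python
import io
import pandas as pd

# Column names from the slide, assumed to match the file's column order
BED_COLUMNS = ["Scaffold", "Start", "Stop", "Element", "Score",
               "Strand", "Family", "Sub-Family", "Divergence"]

def read_rm_bed(path_or_buffer):
    """Read a headerless, tab-separated bed file and give its columns names."""
    return pd.read_csv(path_or_buffer, sep="\t", header=None, names=BED_COLUMNS)

# Two made-up lines standing in for the real file
example = (
    "scaffold_1\t100\t350\telem_a\t230\t+\tSINE\tmetulj\t12.5\n"
    "scaffold_1\t900\t1200\telem_b\t310\t-\tLINE\tsubfam_x\t20.1\n"
)
bed = read_rm_bed(io.StringIO(example))
print(bed)
```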

sines.py
1. Use the avan_rm.bed file in "/lustre/work/jenjense/python/Pandas".
2. Make sure it has proper column names.
3. Determine which families are in there (SINE, etc.).
4. Create a new DataFrame from that file using only elements in family "SINE".
5. Drop the columns "Strand" and "Score".
6. Create a new column "Length".
7. Determine the min, max, and mean length for all SINEs.
8. Determine the min, max, and mean length for each sub-family of SINE (metulj and Zeno.SINE).
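One possible way to sketch steps 3 to 8, assuming the DataFrame already has the column names from the Bedfiles slide and uses standard BED coordinates (so Length = Stop - Start). The function name summarize_sines and the four example rows are hypothetical; this is not the posted solution script.

```python
import pandas as pd

# Column names from the Bedfiles slide (assumed to match the file's column order)
BED_COLUMNS = ["Scaffold", "Start", "Stop", "Element", "Score",
               "Strand", "Family", "Sub-Family", "Divergence"]

def summarize_sines(bed):
    """Steps 3-8 of the exercise for a DataFrame that already has BED_COLUMNS."""
    # 3. Which families are in the file?
    families = bed["Family"].unique()

    # 4. Keep only SINE elements; .copy() makes this an independent DataFrame
    sines = bed[bed["Family"] == "SINE"].copy()

    # 5. Drop the columns we don't need (drop returns a new DataFrame)
    sines = sines.drop(["Strand", "Score"], axis=1)

    # 6. Length of each element, assuming standard BED coordinates (end is exclusive)
    sines["Length"] = sines["Stop"] - sines["Start"]

    # 7. Min, max, and mean length over all SINEs
    overall = sines["Length"].agg(["min", "max", "mean"])

    # 8. The same summary for each SINE sub-family
    by_subfamily = sines.groupby("Sub-Family")["Length"].agg(["min", "max", "mean"])

    return families, sines, overall, by_subfamily

# For the real file (steps 1-2), something like:
# bed = pd.read_csv("/lustre/work/jenjense/python/Pandas/avan_rm.bed",
#                   sep="\t", header=None, names=BED_COLUMNS)
# Here, four made-up rows stand in for it:
rows = [
    ("scaffold_1", 100, 350, "e1", 230, "+", "SINE", "metulj", 12.5),
    ("scaffold_1", 500, 600, "e2", 180, "-", "SINE", "metulj", 8.0),
    ("scaffold_2", 40, 340, "e3", 300, "+", "SINE", "Zeno.SINE", 15.2),
    ("scaffold_2", 900, 1200, "e4", 310, "-", "LINE", "subfam_x", 20.1),
]
bed = pd.DataFrame(rows, columns=BED_COLUMNS)

families, sines, overall, by_subfamily = summarize_sines(bed)
print(families)
print(overall)
print(by_subfamily)
```

Using groupby for step 8 gives one summary row per sub-family, so the same code works no matter how many SINE sub-families the real file contains.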

More tutorials for help
https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/