Virtual University of Pakistan Data Warehousing Lab Lect2
Virtual University of Pakistan Data Warehousing Lab Lect-2 Lab Data Set Ahsan Abdullah Assoc. Prof. & Head Center for Agro-Informatics Research www. nu. edu. pk/cairindex. asp FAST National University of Computers & Emerging Sciences, Islamabad DWH-Ahsan Abdullah 1
Multi-Campus University 2 DWH-Ahsan Abdullah
Degree Programs 3 DWH-Ahsan Abdullah
Disciplines for BS 4 DWH-Ahsan Abdullah
Disciplines for MS 5 DWH-Ahsan Abdullah
The need § Head Office wants a central data repository for decision support i. e. a DWH 6 DWH-Ahsan Abdullah
Students Record Keeping & Mgmt. 7 DWH-Ahsan Abdullah
Data from Lahore Campus 8 DWH-Ahsan Abdullah
Data from Lahore Campus: Sample 9 DWH-Ahsan Abdullah
Lahore: Header of Student Table § SID § St_Name § Father_Name 10 DWH-Ahsan Abdullah
Lahore: Header of Student Table § Gender § Address § [Date of Birth] § [Reg Date] 11 DWH-Ahsan Abdullah
Lahore: Header of Student Table § [Reg Status] § [Degree Status] § [Last Degree] 12 DWH-Ahsan Abdullah
Lahore: Header of Course Reg. Table § SID § Degree § Semester § Course § Marks § Discipline 13 DWH-Ahsan Abdullah
Lahore: Facts About Data 14 DWH-Ahsan Abdullah
Data from Karachi Campus 15 DWH-Ahsan Abdullah
Data from Karachi Campus: Sample 16 DWH-Ahsan Abdullah
Karachi: Header of Student Table § St_ID § Name § Father § Do. B § M/F § Do. Reg § RStatus § DStatus § Address § Qualification 17 DWH-Ahsan Abdullah
Karachi: Header of Course Reg. Table § SID: § Courses § Score § Sem § Disp Degree (BS/MS) is missing because separate books are maintained, but the issue is critical while loading data 18 DWH-Ahsan Abdullah
Karachi: Facts About Data 19 DWH-Ahsan Abdullah
Data from Islamabad Campus 20 DWH-Ahsan Abdullah
Data from Islamabad Campus: Sample 21 DWH-Ahsan Abdullah
Islamabad: Header of Student Table § Roll Num § Name § Father § Reg Date § Reg Status § Degree Status § Date of Birth § Education § Gender § Address 22 DWH-Ahsan Abdullah
Islamabad: Header of Course Reg. Table § Roll Num: § Course § Marks § Discipline § Session Degree (BS/MS) is missing, whereas same table contains records for both. Only way to differentiate is through discipline attribute. 23 DWH-Ahsan Abdullah
Islamabad: Facts About Data 24 DWH-Ahsan Abdullah
Exercise 25 DWH-Ahsan Abdullah
Problems with Adhoc Approach 26 DWH-Ahsan Abdullah
Problem-1: Non-Standard data sources LAHORE KARACHI ISLAMABAD PESHAWAR Uses Text Files Uses Excel Book Uses MS-ACCESS Uses Text Files 27 DWH-Ahsan Abdullah
Problem-2: Non-standard attributes 28 DWH-Ahsan Abdullah
Problem-3: Non Normalized database 29 DWH-Ahsan Abdullah
Notepad: Issues 30 DWH-Ahsan Abdullah
MS-Excel: Issues 31 DWH-Ahsan Abdullah
MS-Access: Issues 32 DWH-Ahsan Abdullah
Problem Statement 33 DWH-Ahsan Abdullah
Data from Peshawar Campus § Data at Peshawar campus is stored in Text files § To store data regarding one complete batch 2 text files are used § Lhr_Student_batch (Student record) § Lhr_Detail_batch (Course Reg. record) § 22 text files for 11 BS batches § 8 text files for 4 MS batches DWH-Ahsan Abdullah 34
Data from Peshawar Campus: Sample 35 DWH-Ahsan Abdullah
Peshawar: Header of Student Table § Reg#: Student identity § Name: Student name § Father: Father name § Address: Permanent address § Date of Birth: Date of Birth § last. Deg: Last degree achieved § Reg Date: Date of Enrollment § Reg Status: Status of Enrollment (A/T) § Degree Status: Status of Degree (C/I) DWH-Ahsan Abdullah 36
Peshawar: Header of Course Reg. Table § § § Reg#: Courses: Course code Score: Out of 100 Program: CS/TC/SE/CE Sem: Fall/Spring Year: YYYY e. g. 1999 We need to identify semester session (fall 04) through combination of Sem and Year 37 DWH-Ahsan Abdullah
- Slides: 37