
GIGAWORD COUNTER
Chris Saxton, Maria Sinn, Matt Wronski, Arifur Sumon Rahman

Overview
  • Different Methods Used
  • Results
  • Amdahl’s Law

Alpha Design
  • Used manager-worker paradigm
  • Each worker received one file name
  • Counted words and articles in a file
  • Returned counts to manager after each file
Problems
  • Must have at least as many files as processes
  • Files must be of similar size
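The per-file work of an Alpha worker can be sketched roughly as below. This is an illustration, not the authors' code: it assumes an article opens with a line beginning "<DOC", that any line beginning with '<' is markup rather than text, that words are whitespace-separated, and that lines fit in the buffer. The names file_counts and count_file are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Tallies a worker would return to the manager after each file. */
typedef struct {
    long words;
    long articles;
} file_counts;

/* Count words and articles in one open file.
 * Assumptions (not from the slides): an article starts on a line
 * beginning with "<DOC"; lines beginning with '<' are markup and
 * contribute no words; every line fits in the 4096-byte buffer. */
file_counts count_file(FILE *fp)
{
    file_counts c = {0, 0};
    char line[4096];

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (line[0] == '<') {
            if (strncmp(line, "<DOC", 4) == 0)
                c.articles++;
            continue;                      /* skip markup lines */
        }
        int in_word = 0;
        for (const char *p = line; *p != '\0'; p++) {
            if (isspace((unsigned char)*p)) {
                in_word = 0;
            } else if (!in_word) {
                in_word = 1;               /* first character of a new word */
                c.words++;
            }
        }
    }
    return c;
}
```

Because each worker handles a whole file before reporting back, one very large file keeps a single worker busy while the others sit idle, which is why the design needs files of similar size.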

Beta Decomposition
  • Wanted to send each worker one article at a time
  • Used fseek() and ftell()
  • Sent the file name and the location of the line in the file to each worker
Problems
  • Lots of communication
  • Each file was read a total of three times
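One way the fseek()/ftell() scheme might look is sketched below: the manager scans a file once, recording the byte offset of each article's first line, and a worker later seeks to an offset it was sent. The helper names index_articles and read_line_at are hypothetical, and the "<DOC" article marker is an assumption rather than something the slides state.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Manager side: record the byte offset of every line that opens an
 * article. Assumption: articles open with a line beginning "<DOC".
 * Returns the number of offsets stored (at most max). */
size_t index_articles(FILE *fp, long *offsets, size_t max)
{
    char line[4096];
    size_t n = 0;

    assert(fp != NULL && offsets != NULL);
    rewind(fp);
    for (;;) {
        long pos = ftell(fp);              /* offset before reading the line */
        if (fgets(line, sizeof line, fp) == NULL)
            break;
        if (strncmp(line, "<DOC", 4) == 0 && n < max)
            offsets[n++] = pos;
    }
    return n;
}

/* Worker side: jump to a received offset and read the line found there.
 * Returns 0 on success, -1 on failure. */
int read_line_at(FILE *fp, long offset, char *buf, int size)
{
    assert(fp != NULL && buf != NULL && size > 0);
    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    return fgets(buf, size, fp) != NULL ? 0 : -1;
}
```

The indexing pass is already one full read of the file before any worker opens it, which fits the repeated-reads problem listed above.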

Beta Design, 2nd Attempt
  • Used getline()
  • Two methods
      • Each worker received an article
      • Each worker read every n lines from the file
Problems
  • High communication costs
  • May require many file reads
  • Huge buffers for sends

Data Storage (struct)


Brute force approach
  • Chosen because of the complexity of the struct
  • Decided to store every term of length m – n in an array
  • Checked each term individually to see if it was distinct
Problems
  • No shared memory
  • Uses a lot of memory
  • A lot of time spent searching the array for already existing terms
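The cost of the brute-force check is easy to see in a small sketch. The version below is illustrative only: term_array, add_if_distinct, and free_terms are hypothetical names, and the growth policy is an assumption. Every insertion compares the new term against all terms stored so far.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A growable array of distinct terms (hypothetical layout). */
typedef struct {
    char **terms;
    size_t count;
    size_t capacity;
} term_array;

/* Add term if it is not already stored.
 * Returns 1 if added, 0 if already present, -1 on allocation failure. */
int add_if_distinct(term_array *a, const char *term)
{
    assert(a != NULL && term != NULL);

    /* Linear scan: this is the search the slide identifies as slow. */
    for (size_t i = 0; i < a->count; i++)
        if (strcmp(a->terms[i], term) == 0)
            return 0;

    if (a->count == a->capacity) {
        size_t cap = a->capacity ? a->capacity * 2 : 16;
        char **grown = realloc(a->terms, cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        a->terms = grown;
        a->capacity = cap;
    }

    size_t len = strlen(term) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, term, len);
    a->terms[a->count++] = copy;
    return 1;
}

/* Release every stored term and the array itself. */
void free_terms(term_array *a)
{
    assert(a != NULL);
    for (size_t i = 0; i < a->count; i++)
        free(a->terms[i]);
    free(a->terms);
    a->terms = NULL;
    a->count = a->capacity = 0;
}
```

With k distinct terms already stored, each new term costs up to k string comparisons, so the total work grows roughly quadratically with the number of distinct terms. Without shared memory, each process also keeps its own array, so two processes can each count the same term as new.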

Decomposition by Term
  • Manager sends a term of length n to each worker
  • Manager responsible for total word and article count
  • Workers assigned part of the alphabet
  • Terms sent to appropriate worker based on first two letters
  • Preserves distinctness of terms

Results of Alpha

Processors        32        64        128       256
Execution Time    45.58 s   30.36 s   27.30 s   23.93 s

  • Execution time increases significantly when the input consists of a smaller number of large files.
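Only timings from 32 processors upward are reported, so speedup can be computed relative to the 32-processor run rather than to a serial run. The sketch below shows that arithmetic; the function names are hypothetical.

```c
#include <assert.h>

/* Speedup of a run relative to a baseline run (baseline time / new time). */
double relative_speedup(double t_base, double t_new)
{
    assert(t_base > 0.0 && t_new > 0.0);
    return t_base / t_new;
}

/* Fraction of the ideal speedup achieved when scaling from p_base to
 * p_new processors (1.0 would be perfect scaling). */
double relative_efficiency(int p_base, double t_base, int p_new, double t_new)
{
    assert(p_base > 0 && p_new > 0);
    return relative_speedup(t_base, t_new) * p_base / p_new;
}
```

Applied to the table, relative_speedup(45.58, t) gives roughly 1.50 at 64 processors, 1.67 at 128, and 1.90 at 256. Eight times the processors therefore yields less than a twofold reduction in time, a relative efficiency of about 0.24.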

Beta Results
  • With the GIGAWORD corpus as input, the beta program would not finish or produce the expected result.
  • It would run on a smaller portion of the data, although the count would include some (but not all) duplicates of distinct terms.


Amdahl’s Law
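The slide text carries only the title, so for reference the standard statement of the law is given here. The symbols are introduced for illustration: f is the fraction of the work that can be parallelized and p is the number of processors.

```latex
S(p) = \frac{1}{(1 - f) + \dfrac{f}{p}},
\qquad
\lim_{p \to \infty} S(p) = \frac{1}{1 - f}
```

The flattening Alpha timings (45.58 s at 32 processors, still 23.93 s at 256) are consistent with a bound of this kind, though the slides do not report a measured serial fraction.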