For the birds: Selecting and sorting data
CS 112 Scientific Computation
Department of Computer Science, Wellesley College
A bird database

Columns: Name, Family, Habitat, Size, Wingspan, BackColor, UnderColor, HeadColor, Spotted, Comment

Name               Habitat          Size  Wingspan  Colors (as listed)        Spotted  Comment
GreatEgret         marshes          39    51        white                     no       longLegs
GreatBlueHeron     marshes          48    72        blue/gray black/white     no       longLegs
SnowyEgret         marshes/ponds    24    38        white                     no       longLegs
GreenHeron         marshes/ponds    19    25        green brown black         no       darkBill
AmericanBittern    marshes          28    45        brown                     no       greenLegs
MuteSwan           lagoons/ponds    60    96        white                     no       orangeBill
CanadaGoose        wetlands/fields  40    72        brown black/white         no       fliesInV
SnowGoose          marshes/fields   28    58        white                     no       orangeBill
AmericanBlackDuck  wetlands         23    36        brown                     no       yellowBill
NorthernPintail    marshes/ponds    26    35        gray white brown          no       longTail
Mallard            wetlands         24    36        brown gray green          no       purpleChest
BlueWingedTeal     marshes/ponds    15    24        brown blue/gray           yes      blueShoulders
Osprey             wetlands         23    72        brown white brown/white   yes      whiteLegs
BaldEagle          wetlands         32    80        brown white               no       yellowBeak
NorthernHarrier    marshes/fields   22    54        gray brown                no       longTail
CoopersHawk        woods            17    28        gray white gray           no       orangeLegs
RedTailedHawk      woods/fields     22    58        brown white brown         yes      orangeTail
PeregrineFalcon    marshes/cities   18    44        gray black/white          no       sideburn
...

Family (as listed): heron, heron, waterfowl, waterfowl, hawk/eagle, hawk/eagle

Suppose you’d like to select all the herons, select the birds with giant wingspans, sort by size or wingspan, sort alphabetically by name…

Databases
Let’s load it in and see what we have

    % loadBirds.m
    % ten format specifiers, one per column of the file
    birds = cell(1, 10);
    [birds{1} birds{2} birds{3} birds{4} birds{5} birds{6} birds{7} ...
     birds{8} birds{9} birds{10}] = ...
        textread('birds.txt', '%s %s %s %u %u %s %s %s %s %s', 'headerlines', 1);

    function printBirdInfo (birds)
    % prints out all the information stored in the input cell array
    % one format specifier per column; the repeated widths (%3u, %12s) are assumed
    for i = 1:length(birds{1})
        disp(sprintf('%22s %10s %17s %3u %3u %12s %12s %12s %4s %17s', ...
            birds{1}{i}, birds{2}{i}, birds{3}{i}, birds{4}(i), birds{5}(i), ...
            birds{6}{i}, birds{7}{i}, birds{8}{i}, birds{9}{i}, birds{10}{i}))
    end
Selecting all the herons

    function herons = getHerons (birds)
    % find the indices of all birds from the heron family
    indices = find(strcmp(birds{2}, 'heron'));
    % create an empty cell array and fill it with all the
    % information from the heron family
    herons = cell(1, 10);
    for i = 1:10
        herons{i} = birds{i}(indices);
    end

Exercise: select birds with large wingspans (> 48")
MATLAB sort function

    >> nums = [7 2 9 7 8 3 6 1 3 4];
    >> sortNums = sort(nums)
    sortNums =
         1  2  3  3  4  6  7  7  8  9
    >> sortNums = sort(nums, 'descend')
    sortNums =
         9  8  7  7  6  4  3  3  2  1
    >> [sortNums sortIndices] = sort(nums, 'ascend')
    sortNums =
         1  2  3  3  4  6  7  7  8  9
    sortIndices =
         8  2  6  9  10  7  1  4  5  3

Exercise: What does the expression nums(sortIndices) return?
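One way to explore the exercise is to evaluate the expression and compare it with the sorted vector. The sketch below reuses nums from the session above; the labels vector is a made-up illustration and does not come from the slides.

```matlab
nums = [7 2 9 7 8 3 6 1 3 4];
[sortNums, sortIndices] = sort(nums, 'ascend');

% indexing the original vector with the sort indices reorders it
reordered = nums(sortIndices);
sameAsSorted = isequal(reordered, sortNums);

% the same indices reorder any parallel vector (labels is hypothetical)
labels = 'abcdefghij';
reorderedLabels = labels(sortIndices);
```

The second half is the idea that matters for a database: one set of indices, computed from a single column, can put every other column in the same order.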
Now let’s sort a cell array of strings

    >> words = {'early' 'cloud' 'heights' 'a' 'black' 'great' 'from' 'descended'};
    >> sortWords = sort(words)
    sortWords =
        'a'  'black'  'cloud'  'descended'  'early'  'from'  'great'  'heights'
    >> words = {'early' 'Cloud' 'heights' 'A' 'black' 'Great' 'from' 'Descended'};
    >> [sortWords sortIndices] = sort(words)
    sortWords =
        'A'  'Cloud'  'Descended'  'Great'  'black'  'early'  'from'  'heights'
    sortIndices =
         4  2  8  6  5  1  7  3

Hmmmm… What’s going on here???
Remember the ASCII code?

When comparing the order of two strings, MATLAB uses the order of characters in the ASCII code, in which all capital letters appear before all lowercase letters.

Exercise: Write a function that sorts a cell array of words alphabetically, independent of capitalization
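A minimal sketch of one possible approach to this exercise: sort a lowercase copy of the words, then use the resulting indices to reorder the originals so their capitalization is preserved. The function name sortIgnoreCase is hypothetical and not from the slides.

```matlab
function sortedWords = sortIgnoreCase (words)
% sorts a cell array of strings alphabetically, ignoring capitalization
% (sortIgnoreCase is a hypothetical name chosen for this sketch)

% lower accepts a cell array of strings and lowercases each element
[temp, indices] = sort(lower(words));

% reorder the original words so their capitalization is kept
sortedWords = words(indices);
end
```

Called on {'early' 'Cloud' 'heights' 'A' 'black' 'Great' 'from' 'Descended'}, this returns 'A' 'black' 'Cloud' 'Descended' 'early' 'from' 'Great' 'heights'.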
Sorting the bird data

    function sortedData = sortByWingspan (birds)
    % sort the bird information by wingspan
    [temp indices] = sort(birds{5});
    % create an empty cell array and fill it with all the bird
    % information in sorted order
    sortedData = cell(1, 10);
    for i = 1:10
        sortedData{i} = birds{i}(indices);
    end

Exercise: sort the birds alphabetically by name
On the flip side…

Suppose we have some nice, orderly information that we want to scramble

    >> order = randperm(10)
    order =
         6  2  5  1  4  8  10  3  7  9
    >> conditions = [2.0 -2.0 2.5 -2.5 3.0 -3.0 3.5 -3.5 4.0 -4.0];
    >> newConditions = conditions(order)
    newConditions =
        -3.0  -2.0  3.0  2.0  -2.5  -3.5  -4.0  2.5  3.5  4.0

Exercise: write a function makeAnagram that has an input string and returns an anagram of the string
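The same randperm indexing works on strings, since a string is a vector of characters. What follows is one possible sketch of makeAnagram under that assumption, not necessarily the intended solution.

```matlab
function anagram = makeAnagram (str)
% returns the characters of str in a random order

% random permutation of the positions 1..length(str)
order = randperm(length(str));

% index the string with the permutation to scramble it
anagram = str(order);
end
```

Because the permutation is random, the result may occasionally equal the input, especially for short strings.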
Revisiting selection

    function selection = getHerons (birds)
    % select birds from the heron family
    indices = find(strcmp(birds{2}, 'heron'));
    selection = cell(1, 10);
    for i = 1:10
        selection{i} = birds{i}(indices);
    end

    function selection = getSmalls (birds)
    % select small birds
    indices = find(birds{4} < 10);
    selection = cell(1, 10);
    for i = 1:10
        selection{i} = birds{i}(indices);
    end

    function selection = getBigWings (birds)
    % select birds with large wingspans
    indices = find(birds{5} > 48);
    selection = cell(1, 10);
    for i = 1:10
        selection{i} = birds{i}(indices);
    end

    function selection = getWetlands (birds)
    % select birds who inhabit wetlands
    indices = [];
    for i = 1:length(birds{3})
        if findstr(birds{3}{i}, 'wetlands')
            indices(end+1) = i;
        end
    end
    selection = cell(1, 10);
    for i = 1:10
        selection{i} = birds{i}(indices);
    end
Tailoring the selection criterion

    function indices = getHerons (birds)
    % select birds from the heron family
    indices = find(strcmp(birds{2}, 'heron'));

    function indices = getBigWings (birds)
    % select birds with large wingspans
    indices = find(birds{5} > 48);

    function indices = getWetlands (birds)
    % select birds who inhabit wetlands
    indices = [];
    for i = 1:length(birds{3})
        if findstr(birds{3}{i}, 'wetlands')
            indices(end+1) = i;
        end
    end

    function indices = getSmalls (birds)
    % select small birds
    indices = find(birds{4} < 10);
Functions as inputs

    function newVal = calc (val)
    newVal = 2*val+3;

    function results = applyFunc1 (func, vect)
    % func is the name of a function, given as a string
    for i = 1:length(vect)
        results(i) = feval(func, vect(i));
    end

    >> results = applyFunc1('calc', nums)

    function results = applyFunc2 (func, vect)
    % func is a function handle
    for i = 1:length(vect)
        results(i) = func(vect(i));
    end

    >> results = applyFunc2(@calc, nums)
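For comparison, MATLAB's built-in arrayfun follows the same pattern as applyFunc2: it takes a function handle and applies it to each element of a vector. In the sketch below, calc is written as an anonymous function so the example stands alone, and the sample values in nums are chosen only for illustration.

```matlab
% anonymous-function version of calc, so no separate file is needed
calc = @(val) 2*val + 3;

% sample input (illustrative values, not from the slides)
nums = [1 2 3];

% apply calc to each element of nums, like applyFunc2(@calc, nums)
results = arrayfun(calc, nums);
```

Here results holds 5 7 9, the same values the loop in applyFunc2 would compute for this input.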
Revisiting selection

    function selection = getBirds1 (birds, criterion)
    % select birds using input criterion function (given by name)
    indices = feval(criterion, birds);
    selection = cell(1, 10);
    for i = 1:10
        selection{i} = birds{i}(indices);
    end

    >> herons = getBirds1(birds, 'getHerons');

    function selection = getBirds2 (birds, criterion)
    % select birds using input criterion function (given as a handle)
    indices = criterion(birds);
    selection = cell(1, 10);
    for i = 1:10
        selection{i} = birds{i}(indices);
    end

    >> herons = getBirds2(birds, @getHerons)
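Since the criterion is just a function handle, it can also be an anonymous function written on the spot, with no separate function file. The sketch below assumes a tiny stand-in for the bird data: four birds whose names and wingspans come from the table, with every other column filled by placeholder zeros. The selection loop is the same one used in getBirds2.

```matlab
% tiny stand-in for the bird data (only columns 1 and 5 are real;
% the remaining columns are placeholder zeros for this sketch)
birds = cell(1, 10);
for k = 1:10
    birds{k} = zeros(4, 1);
end
birds{1} = {'GreatEgret'; 'MuteSwan'; 'Osprey'; 'CoopersHawk'};
birds{5} = [51; 96; 72; 28];

% criterion written inline as an anonymous function handle
criterion = @(b) find(b{5} > 48);

% same selection loop as getBirds2
indices = criterion(birds);
selection = cell(1, 10);
for i = 1:10
    selection{i} = birds{i}(indices);
end
```

With the real data loaded, the equivalent call would be getBirds2(birds, @(b) find(b{5} > 48)).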
The search is on...

Suppose we have our data in sorted order. How can we search for a particular value in an efficient way?

Let’s play a game... I’m thinking of a number between 1 and 100. What is it?

You’re probably using the binary search strategy
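The guessing game can be sketched in code: look at the middle of the remaining range, then discard the half that cannot contain the target. The function below is a minimal illustration that assumes a vector sorted in ascending order. The name binarySearch and the convention of returning 0 when the value is absent are choices made for this sketch, not taken from the slides.

```matlab
function index = binarySearch (sortedVals, target)
% returns an index of target in the ascending vector sortedVals,
% or 0 if target is not present (hypothetical helper for illustration)
low = 1;
high = length(sortedVals);
index = 0;
while low <= high
    mid = floor((low + high) / 2);     % middle of the remaining range
    if sortedVals(mid) == target
        index = mid;                   % found it
        return
    elseif sortedVals(mid) < target
        low = mid + 1;                 % target can only be in the upper half
    else
        high = mid - 1;                % target can only be in the lower half
    end
end
end
```

Each pass halves the range, so a number between 1 and 100 is found in at most 7 guesses.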