ECE 6580 Lecture 9: MyFirAsm.asm

MyFirAsm.asm

    .global _MyFirAsm;
    _MyFirAsm:
        entry;
        dm(b2Save) = b2;
        dm(i2Save) = i2;
        l4 = reads(1);      // save length of filter in l4
        l2 = l4;            // save length in l2
        b4 = r8;            // pointer to states goes in b4 and i4
        b2 = r12;           // pointer to coefs goes in b2 and i2

Un-Rolled Loop

    f8 = f4;            // x needs to be in f8

    f1 = dm(i4, m5);    // fetch states[0], but do not inc i4
    dm(i4, m6) = f8;    // states[0] = x
    f2 = dm(i2, m6);    // fetch coefs[0]
    f4 = f1*f2;         // coefs[0]*states[0]
    f0 = f0 + f4;       // acc = acc + coefs[0]*states[0]

    f8 = dm(i4, m5);    // fetch states[1], but do not inc i4
    dm(i4, m6) = f1;    // states[1] = states[0]
    f2 = dm(i2, m6);    // fetch coefs[1]
    f4 = f8*f2;         // coefs[1]*states[1]
    f0 = f0 + f4;       // acc = acc + coefs[1]*states[1]

    f1 = dm(i4, m5);    // fetch states[2], but do not inc i4
    dm(i4, m6) = f8;    // states[2] = states[1]
    f2 = dm(i2, m6);    // fetch coefs[2]
    f4 = f1*f2;         // coefs[2]*states[2]
    f0 = f0 + f4;       // acc = acc + coefs[2]*states[2]

    f8 = dm(i4, m5);    // fetch states[3], but do not inc i4
    dm(i4, m6) = f1;    // states[3] = states[2]
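The per-tap pattern in the listing above (fetch a state, overwrite it with the value being shifted in, multiply by the matching coefficient, accumulate) can be sketched in C. This is only an illustrative model, not the course's code: the function name unrolled_model and its parameters are hypothetical, and the accumulator is assumed to start at zero because the slide does not show how f0 is initialized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of the per-tap pattern in the unrolled listing.
 * Assumption: the accumulator starts at 0.0f (the slide does not show
 * the initial value of f0). */
float unrolled_model(float x, float *states, const float *coefs, size_t ntaps)
{
    assert(states != NULL && coefs != NULL);

    float acc  = 0.0f;  /* plays the role of f0 */
    float prev = x;     /* plays the role of f8 on entry */

    for (size_t k = 0; k < ntaps; ++k) {
        float cur = states[k];  /* fetch states[k] before it is overwritten */
        states[k] = prev;       /* store the value being shifted in */
        acc += coefs[k] * cur;  /* multiply by coefs[k] and accumulate */
        prev = cur;             /* extra copy; the listing avoids it by
                                   alternating between f1 and f8 */
    }
    return acc;
}
```

The copy at the end of the loop body is the cost of using a single "current" variable. The assembly has no such copy because consecutive taps alternate between f1 and f8, which suggests that a rolled-up loop body needs to cover two taps.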

How Can We Roll It Up?

    f8 = f4;            // x needs to be in f8

    f1 = dm(i4, m5);    // fetch states[0], but do not inc i4
    dm(i4, m6) = f8;    // states[0] = x
    f2 = dm(i2, m6);    // fetch coefs[0]
    f4 = f1*f2;         // coefs[0]*states[0]
    f0 = f0 + f4;       // acc = acc + coefs[0]*states[0]

    f8 = dm(i4, m5);    // fetch states[1], but do not inc i4
    dm(i4, m6) = f1;    // states[1] = states[0]
    f2 = dm(i2, m6);    // fetch coefs[1]
    f4 = f8*f2;         // coefs[1]*states[1]
    f0 = f0 + f4;       // acc = acc + coefs[1]*states[1]

    f1 = dm(i4, m5);    // fetch states[2], but do not inc i4
    dm(i4, m6) = f8;    // states[2] = states[1]
    f2 = dm(i2, m6);    // fetch coefs[2]
    f4 = f1*f2;         // coefs[2]*states[2]
    f0 = f0 + f4;       // acc = acc + coefs[2]*states[2]

    f8 = dm(i4, m5);    // fetch states[3], but do not inc i4
    dm(i4, m6) = f1;    // states[3] = states[2]

Rolled and Ready to Go

    f2 = dm(i2, m6);    // fetch coefs[0]
    f0 = f2*f4;         // acc = coefs[0]*x
    f8 = f4;

    lcntr = r2, do MyFirAsmEnd until lce;
        f1 = dm(i4, m5);    // fetch states[i], but do not inc i4
        dm(i4, m6) = f8;    // states[i] = states[i-1]
        f2 = dm(i2, m6);    // fetch coefs[i]
        f4 = f1*f2;         // coefs[i]*states[i]
        f0 = f0 + f4;       // acc = acc + coefs[i]*states[i]

        f8 = dm(i4, m5);    // fetch states[i+1], but do not inc i4
        dm(i4, m6) = f1;    // states[i+1] = states[i]
        f2 = dm(i2, m6);    // fetch coefs[i+1]
        f4 = f8*f2;         // coefs[i+1]*states[i+1]
    MyFirAsmEnd:
        f0 = f0 + f4;       // acc = acc + coefs[i+1]*states[i+1]
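As a rough cross-check of the rolled structure, the sketch below models it in C: the first product coefs[0]*x is peeled off before the loop, and each iteration then handles two taps while two variables swap roles. The name rolled_model and the parameter pairs are hypothetical. The model makes three assumptions: the loop count held in r2 is the number of two-tap iterations (the slide does not show how r2 is set), coefs holds 2*pairs + 1 values while states holds 2*pairs values, and the first product in the loop body uses the freshly fetched state, following the register pattern of the unrolled listing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of the rolled loop.
 * Assumptions: coefs has 2*pairs + 1 entries, states has 2*pairs entries,
 * and `pairs` corresponds to the loop count held in r2. */
float rolled_model(float x, float *states, const float *coefs, size_t pairs)
{
    assert(states != NULL && coefs != NULL);

    float acc = coefs[0] * x;   /* peeled first tap: acc = coefs[0]*x */
    float f8  = x;              /* value to shift into the first state */
    float *s = states;
    const float *c = coefs + 1; /* coefs[0] has already been consumed */

    for (size_t n = 0; n < pairs; ++n) {
        float f1 = s[0];        /* fetch the old state */
        s[0] = f8;              /* shift the previous value in */
        acc += f1 * c[0];       /* first tap of the pair */

        f8 = s[1];              /* fetch the next old state */
        s[1] = f1;              /* shift the value fetched above */
        acc += f8 * c[1];       /* second tap of the pair */

        s += 2;
        c += 2;
    }
    return acc;
}
```

With these assumptions, the states array holds the previous 2*pairs inputs, so one call computes an ordinary direct-form FIR output with 2*pairs + 1 taps and leaves the delay line shifted by one sample, ready for the next call.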

Benchmark Numbers

  • MyFirAsm 8535 cycles 1076 cycles (optimized) 1304 cycles