Error Detecting Codes for Serial Links: an alternative to error correction
Sergio Cavaliere
Department of Physics, University of Napoli “Federico II”, Italy, and INFN Sezione di Napoli, Italy
e-mail: sergio.cavaliere@na.infn.it
XVII SuperB Workshop, La Biodola, May 2011

Abstract
• In this talk we discuss algorithms and structures for error detection as a possible alternative to full error correcting codes.
• This solution is suited to the actual case, where the expected error rate is very low, and shows good results at a much lower hardware complexity and timing latency.
Cavaliere - SuperB Workshop - May 2011

Serial link failures and errors
Two main problems regarding errors due to the rad-hard environment:
• Loss of Lock (LoL), due to failures on fixed bits in the SERDES. Conclusion: a direct, fast link between transmitter and receiver is needed in order to signal the occurrence of LoL promptly.
• Bit errors due to the rad-hard environment, which affect data integrity and data quality.
Solutions:
• Error Correcting Code (ECC)
  - computationally intensive
  - suitable for high noise levels
  - may preclude future technological link upgrades
• Error Detecting Code (EDC)
  - less computationally intensive
  - requires retransmission of data, which needs a feedback loop
  - alternatively, may allow discarding data offline
  - suitable for a low BER (Bit Error Rate)

Error Correction vs Error Detection
• When retransmission is feasible, error correction may be obtained simply by means of error detection followed by ARQ (Automatic Repeat reQuest). Because of the low error rate in our case, neither data rate nor latency is affected.
• When the loss of short data frames does not compromise the overall information, corrupted data may be discarded later in the communication stream, even at an offline stage. A specified level of data quality must be guaranteed; this is attained because of the low error rate.
An important parameter for the choice is the noise level:
o High noise levels require real-time error correction, in order to prevent a drop in data rate caused by frequent retransmission.
o Error correction does not require a feedback loop.
o At low noise levels a complex correction mechanism would be of little use: error detection may suffice. A repeat mechanism may be adopted, with a consequent doubling of the transmission time of the affected packet.
o ARQ requires a feedback loop to signal errors and request retransmission.
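
The claim that a low error rate leaves data rate and latency essentially untouched can be illustrated with a small calculation. This is only a sketch under stated assumptions: bit errors are taken as independent, every corrupted frame is assumed to be detected and repeated until it arrives intact, and the BER values and the 72-bit frame length are hypothetical figures chosen for illustration.

```python
def frame_error_probability(ber: float, frame_bits: int) -> float:
    """Probability that at least one bit of a frame is corrupted
    (assumes independent bit errors)."""
    return 1.0 - (1.0 - ber) ** frame_bits


def expected_transmissions(ber: float, frame_bits: int) -> float:
    """Mean number of sends per frame when every bad frame is repeated
    until it is received correctly (geometric distribution)."""
    return 1.0 / (1.0 - frame_error_probability(ber, frame_bits))


# Hypothetical BER values; 72 bits = 4 blocks of 18 bits.
for ber in (1e-12, 1e-9, 1e-6):
    p = frame_error_probability(ber, 72)
    mean = expected_transmissions(ber, 72)
    print(f"BER={ber:.0e}  P(frame error)={p:.3e}  mean sends={mean:.9f}")
```

Under these assumptions the mean number of transmissions stays within a small fraction of a percent of 1, so the occasional doubled packet time hardly affects the average throughput.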

Error Detecting Codes: a review
A large number of error detection techniques and codes have been introduced since the 1960s:
• CRC (Cyclic Redundancy Check)
• Fletcher checksum
• Internet checksum
• XTP CXOR
• WSC (Weighted Sum Codes)
• ...
Parameters for the choice are:
• overhead
• probability of undetected error
• computational complexity

Error Detecting codes: CRC code
CRC coding is based on a polynomial representation of a binary message, for example [0 0 1 0 1 0 1 0 0 0 1 1 1 0].
In this representation polynomials are defined over the Galois field GF(2), with the usual × and + operations:
• + is the bitwise XOR
• × is the bitwise AND
Example:
Msg polynomial = x^7 + x^5 + x^4 + x^3 + 1
Msg = [1 0 1 1 1 0 0 1]
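
The correspondence between bit vectors and polynomials over GF(2) can be sketched in a few lines. The helper names `bits_to_poly` and `gf2_add` are illustrative and not part of the talk; bit lists are written with the highest power first, as on the slide.

```python
def bits_to_poly(bits):
    """Render a bit list (highest power first) as a polynomial string."""
    top = len(bits) - 1
    terms = []
    for i, b in enumerate(bits):
        if b:
            power = top - i
            if power == 0:
                terms.append("1")
            elif power == 1:
                terms.append("x")
            else:
                terms.append(f"x^{power}")
    return " + ".join(terms) if terms else "0"


def gf2_add(a, b):
    """Add two equal-length polynomials over GF(2): a bitwise XOR."""
    return [x ^ y for x, y in zip(a, b)]


msg = [1, 0, 1, 1, 1, 0, 0, 1]
print(bits_to_poly(msg))      # x^7 + x^5 + x^4 + x^3 + 1
print(gf2_add(msg, msg))      # adding a polynomial to itself gives zero
```

The second line of output shows why addition and subtraction coincide in GF(2), a property the CRC construction relies on.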

Error Detecting codes: CRC code
Given a generator (CRC) polynomial g of degree g and a message m:
g = x^3 + x + 1, g = [1 0 1 1]
m = x^7 + x^5 + x^4 + x^3, m = [1 0 1 1 1 0 0 0]
we may multiply the message by x^g (here x^3), which appends g zeros:
m*x^g = [1 0 1 1 1 0 0 0 0 0 0]
If we divide by the polynomial g we obtain a quotient q and a remainder r:
m*x^g = q*g + r
[1 0 1 1 1 0 0 0 0 0 0] = [1 0 0 0 1 0 1 1]*[1 0 1 1] + [1 0 1]
Adding r to both sides (over GF(2), r + r = 0):
m*x^g + r = q*g + r + r = q*g
[1 0 1 1 1 0 0 0 0 0 0] + [1 0 1] = [1 0 0 0 1 0 1 1]*[1 0 1 1] = [1 0 1 1 1 0 0 0 1 0 1] = [msg remainder]
This polynomial is therefore an exact multiple of the CRC polynomial g. If we transmit the polynomial m*x^g + r = [msg remainder], we may verify on arrival whether it is still an exact multiple of the CRC polynomial g.
• If it is, we may infer that probably no error was added by noise.
• If it is not, we may infer that one or more errors were added by noise.

Error Detecting codes: CRC
What we do is append to the message, before transmitting the whole, the remainder of a division by the generator polynomial. If g is the degree of the generator polynomial, we have to add just g check bits.
At the receiver the division is performed again:
code/g = [1 0 1 1 1 0 0 0 1 0 1]/[1 0 1 1] gives r = [0 0 0]: no error
noisy code = [1 0 1 0 1 0 0 0 1 0 1] (1 error)
noisy code/g = [1 0 1 0 1 0 0 0 1 0 1]/[1 0 1 1] gives r = [0 0 1]: ERROR
The quotient, here [1 0 0 1 1 1 0 0], is discarded.
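
The encode and check steps can be reproduced with a short long-division routine over GF(2). This is a minimal sketch, not the implementation used for the talk; the function names are illustrative, and it takes the generator [1 0 1 1] and the message [1 0 1 1 1 0 0 0] from the example, then flips one bit to imitate noise.

```python
def gf2_divmod(dividend, gen):
    """Long division over GF(2). Bit lists, highest power first.
    Returns (quotient, remainder) with len(remainder) == len(gen) - 1."""
    work = list(dividend)
    r = len(gen) - 1
    quotient = []
    for i in range(len(work) - r):
        quotient.append(work[i])
        if work[i]:                      # leading term present: subtract gen
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return quotient, work[-r:]


def crc_encode(msg, gen):
    """Append to msg the remainder of msg * x^r divided by gen."""
    r = len(gen) - 1
    _, rem = gf2_divmod(msg + [0] * r, gen)
    return msg + rem


def crc_check(word, gen):
    """True when the received word is an exact multiple of gen."""
    _, rem = gf2_divmod(word, gen)
    return not any(rem)


g = [1, 0, 1, 1]
msg = [1, 0, 1, 1, 1, 0, 0, 0]
code = crc_encode(msg, g)
print(code, crc_check(code, g))          # codeword, True

noisy = list(code)
noisy[3] ^= 1                            # one bit flipped by "noise"
print(gf2_divmod(noisy, g))              # quotient (discarded) and remainder
```

A nonzero remainder on the second division is what raises the error flag; the quotient plays no role in the decision.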

CRC realization
[Figure: CRC realization. Labels: message polynomial, generator polynomial, polynomial division, quotient, remainder, code, feedback shift register.]
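
A feedback shift register realization can be mimicked in software one bit at a time. The sketch below is one common register arrangement, chosen here as an assumption since the slide's figure is not available; the name `lfsr_crc` and its parameters are illustrative.

```python
def lfsr_crc(bits, poly_low, r):
    """Bit-serial CRC register.
    poly_low: generator without its leading term (x^3 + x + 1 -> 0b011).
    r: register length, equal to the degree of the generator."""
    mask = (1 << r) - 1
    reg = 0
    for b in bits:
        feedback = ((reg >> (r - 1)) & 1) ^ b   # bit leaving the register XOR input bit
        reg = (reg << 1) & mask                 # shift by one position
        if feedback:
            reg ^= poly_low                     # apply the feedback taps
    return reg


rem = lfsr_crc([1, 0, 1, 1, 1, 0, 0, 0], 0b011, 3)
print(format(rem, "03b"))                       # 101
```

After the last message bit has been shifted in, the register holds the check bits to append, so no explicit division or appended zeros are needed.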

Features of CRC coding
A large variety of polynomials may be used: the longer the polynomial, the larger the overhead and the better the detecting ability. The simplest polynomial, x+1, delivers a 1-bit remainder and reduces to the usual parity bit.
Main features of CRC coding. A proper CRC is able to detect:
• all single-bit errors;
• any odd number of errors, assuming x+1 is a factor of g(x);
• burst errors of length not exceeding g, where g is the number of check bits (order of the CRC polynomial);
• double errors, if g(x) contains at least three 1s.
The burst feature is invaluable, since we expect that SEU events may affect more than a single bit at a time.
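
The single-error and burst-error properties can be checked exhaustively for a small case. The sketch below assumes g(x) = x^3 + x + 1 and 11-bit codewords, and uses the fact that a CRC is linear, so an error pattern goes unnoticed only when the pattern itself is a multiple of g(x). Names are illustrative.

```python
def remainder(word, gen, r):
    """Remainder of word modulo gen; integers are used as bit vectors."""
    for shift in range(word.bit_length() - 1 - r, -1, -1):
        if (word >> (shift + r)) & 1:
            word ^= gen << shift
    return word


GEN, R, N = 0b1011, 3, 11      # g(x) = x^3 + x + 1, 11-bit codewords


def detected(error):
    # Linear code: an error is missed only if it is a multiple of g(x).
    return remainder(error, GEN, R) != 0


singles = [1 << i for i in range(N)]
bursts = [b << s
          for length in range(1, R + 1)
          for b in range(1 << (length - 1), 1 << length)
          if b & 1                               # a burst starts and ends with a 1
          for s in range(N - length + 1)]

print(all(detected(e) for e in singles))         # True
print(all(detected(e) for e in bursts))          # True
print(detected(GEN))                             # False: a length-4 burst equal to g(x)
```

The last line shows the limit of the guarantee: once the burst is longer than the number of check bits, some patterns coincide with multiples of the generator.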

CRC coding in existing standards
Name | r | Generator polynomial | Hex | Factor x+1 | Standard
CRC-12 | 12 | x^12+x^11+x^3+x^2+x+1 | 80F | y | transmission of 6-bit character streams
CRC-16 | 16 | x^16+x^15+x^2+1 | 8005 | y | IBM's BISYNCH, disk storage
CRC-CCITT | 16 | x^16+x^12+x^5+1 | 1021 | y | XMODEM, X.25, IBM's SDLC, ISO's HDLC
CRC-32 | 32 | x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1 | 04C11DB7 | n | PKZip, Ethernet, AAL5 (ATM Adaptation Layer 5), FDDI (Fiber Distributed Data Interface), IEEE-802 LAN/MAN standard

CRC coding in the standard Ethernet protocol
The frame check sequence (FCS) field follows the data block in the data frame of the protocol.
g(X) = X^32 + X^26 + X^23 + X^22 + X^16 + X^12 + X^11 + X^10 + X^8 + X^7 + X^5 + X^4 + X^2 + X + 1
The 32 redundancy bits are added regardless of the message length, which ranges from 512 to 12,144 bits.
Code length n (bits) | Minimum Hamming distance dmin | No. of detected errors
3007 to 12,144 | 4 | 3
301 to 3006 | 5 | 4
204 to 300 | 6 | 5
124 to 203 | 7 | 6
90 to 123 | 8 | 7
• Many longer error patterns are detected.
• Many burst error patterns are detected.
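
For a quick experiment with this generator, Python's standard `zlib.crc32` can be used. It computes CRC-32 with the polynomial above, combined with the usual bit-ordering, initial-value and final-inversion conventions, so its result is not the plain polynomial remainder. The sketch does not model Ethernet framing or FCS byte order; the test string is arbitrary.

```python
import zlib

frame = b"123456789"
fcs = zlib.crc32(frame)
print(hex(fcs))                        # 0xcbf43926, the usual CRC-32 check value

# Flip one bit of the first byte and recompute.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
print(zlib.crc32(corrupted) != fcs)    # True: a single-bit error is always detected
```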

CRC coding on an 18-bit block
• CRC-3: x^3+x+1, overhead 20%
• CRC-4: x^4+x^3+x^2+x+1, overhead 26.7%
• CRC-5: x^5+x^3+x+1 and x^5+1, overhead 33.3%
Efficiency of detection = no. of detected errors / total no. of errors
[Figure: efficiency of detection vs. number of errors in a word]

Why some errors remain undetected?
The limitations of CRC codes depend on some erratic features. A CRC detects all single errors and all burst errors up to a certain burst length. The code also has some ability to detect larger numbers of errors in the frame: depending on the message length and the number of errors, it detects them with high probability, but detection is not guaranteed.
This happens because the code is a multiple of the generator g. If the noise pattern is also an integer multiple of g, the received word divided by the CRC polynomial g gives no remainder and therefore signals absence of noise.
This happens with a low but nonzero probability, which also depends on the length of the transmitted word. This may be analyzed further.

Undetected error probability for CRC
We may analyze all possible error patterns with a fixed number of errors and find out which actually fail. We may then plot the number of undetectable error patterns as a function of the length of the message.
msg_len = 8; undetected: [0 4 26 44 50 58 46 19 6 2 0]
msg_len = 9; undetected: [0 5 34 66 88 114 108 61 24 9 2 0]
The trend shows a fast increase in this number.
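
Counting undetectable patterns amounts to counting nonzero codewords by weight, since an error pattern escapes detection exactly when it is itself a codeword. The sketch below makes two assumptions that the slide does not state: the generator is g(x) = x^3 + x + 1, and each list holds one entry per number of errors, from 1 up to the codeword length. With these assumptions the first and last entries and the totals agree with the lists above.

```python
def remainder(word, gen, r):
    """Remainder of word modulo gen; integers are used as bit vectors."""
    for shift in range(word.bit_length() - 1 - r, -1, -1):
        if (word >> (shift + r)) & 1:
            word ^= gen << shift
    return word


def undetected_counts(msg_len, gen=0b1011, r=3):
    """counts[w - 1] = number of error patterns with w errors that are
    multiples of the generator, for codewords of msg_len + r bits."""
    n = msg_len + r
    counts = [0] * n
    for m in range(1, 1 << msg_len):
        code = (m << r) | remainder(m << r, gen, r)   # nonzero codeword
        counts[bin(code).count("1") - 1] += 1
    return counts


for msg_len in (8, 9):
    print(msg_len, undetected_counts(msg_len))
```

The total per message length is 2^msg_len - 1, which doubles with every added message bit and accounts for the fast increase noted above.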

Error detection codes for SuperB
The starting point for the serial transmission and the parallel-to-serial conversion is the basic block length of 18 bits. Information on error control (generalized parity bits) may be:
• appended to each 18-bit block, or
• appended to a number N (5 to 10) of 18-bit blocks, since a block of 5 to 10 18-bit SERDES streams is foreseen as a unit transmission block, to be treated as a whole and, in case of error, discarded entirely.

Error detection codes for SuperB
Considering blocks of N 18-bit SERDES streams.
[Figure: block diagram of the link for n = 4. Labels: data to transmit (65 bits), buffer and CRC generator, 4*18 = 72 bits, scrambler, SERDES (18 bits), serial link, SERDES (18 bits), buffer and descrambler, CRC check, ERROR flag / ARQ request, data to distribute. Ecc = 12 %, Overhead = 44 %.]

Detection efficiency for CRC
Two main parameters are:
overhead = crc_bits / message_bits
efficiency = no. of detected errors / total no. of errors
Efficiency is almost constant as the overhead varies and relatively high, though below 100%.
CRC-7: 7 parity bits; polynomial x^7+x^3+1; block length in the range 4*18 bits to 10*18 bits; overhead 4 to 11 %.
[Figure: efficiency of detection vs. overhead]
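
The overhead figures can be reproduced with one line of arithmetic. The sketch assumes that the CRC bits are carried inside the N*18-bit block, so that message_bits = N*18 - crc_bits; this is an assumption about the layout, chosen because it matches the quoted range of roughly 4 to 11 % for CRC-7.

```python
def overhead_percent(crc_bits, n_blocks, block_bits=18):
    """Overhead = crc_bits / message_bits, in percent.
    Assumes the CRC bits are stored inside the N * block_bits block."""
    message_bits = n_blocks * block_bits - crc_bits
    return 100.0 * crc_bits / message_bits


for n in range(4, 11):
    print(f"N={n:2d}  block={n * 18:3d} bits  "
          f"overhead={overhead_percent(7, n):.1f} %")
```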

Undetected error probability for CRC
This high efficiency depends mainly, of course, on the CRC length, but also on the choice of polynomial. The literature shows that even some of the polynomials chosen for standards are far from optimal.

Simulating CRC check
Polynomials and multiplicity N chosen to obtain an overhead in the range 5% to 12%. Entries are the overhead in %; "-" marks a value missing in the source.
CRC len | polynomial | N=5 | N=6 | N=7 | N=8 | N=9
CRC-5 | x^5+x^3+1 | 5.9 | - | 4.1 | 3.6 | 3.2
CRC-6 | x^6+x+1 | 7.1 | 5.9 | 5 | 4.3 | 3.8
CRC-7 | x^7+x^3+1 | 8.4 | 6.9 | - | 5.1 | 4.5
CRC-8 | x^8+x^2+x+1 | 9.8 | 8 | 6.8 | 5.9 | 5.2
CRC-9 | x^9+x^7+x^6+x^3+x^2+x+1 | 11 | 9.1 | 7.7 | 6.7 | 5.9
[Figure: efficiency of the detection vs. overhead]
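
A detection-efficiency simulation of this kind can be sketched as a small Monte Carlo run. This is not the simulation used for the talk: the trial count, the seed, the uniform choice of error positions and the 72-bit block are all assumptions made for illustration, with CRC-7 (x^7+x^3+1) as the generator.

```python
import random


def remainder(word, gen, r):
    """Remainder of word modulo gen; integers are used as bit vectors."""
    for shift in range(word.bit_length() - 1 - r, -1, -1):
        if (word >> (shift + r)) & 1:
            word ^= gen << shift
    return word


def detection_efficiency(n_bits, n_errors, gen, r, trials=2000, seed=1):
    """Fraction of random patterns with n_errors flipped bits that the
    CRC flags. The code is linear, so only the error pattern matters."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        error = 0
        for pos in rng.sample(range(n_bits), n_errors):
            error |= 1 << pos
        hits += remainder(error, gen, r) != 0
    return hits / trials


GEN7 = 0b10001001                     # x^7 + x^3 + 1
for k in (1, 2, 3, 4):
    print(k, detection_efficiency(72, k, GEN7, 7))
```

With a fixed seed the run is repeatable; increasing `trials` narrows the statistical uncertainty on the efficiencies for three or more errors.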

Error Detecting codes: CHECKSUMs
Checksums were introduced in order to allow very simple hardware and software implementations. CRCs are easily implemented by means of serial processing, via a shift register with a number of feedback paths. When implemented in software, as for example in the Internet case, this serial arrangement is slower than a parallel implementation, which in turn is relatively intensive. Also, in our case the serial bit stream is embedded in the SERDES chip, which exposes only the parallel path to the outside.
Checksums use much simpler algorithms, at the cost of lower performance.

Error Detecting codes: checksum
A number of different solutions have been devised. The message is divided into words, which are used to obtain one or more extra words to be transmitted to allow a check on arrival.
• Parity byte or parity word
• Modular sum
• Position-dependent checksums
Examples:
• Fletcher checksum (used in ISO)
• weighted sum code (WSC)
• one's-complement checksum (used in Internet)
• circular-shift exclusive-OR checksum (CXOR)
• block-parity code
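
A few of these checksums are short enough to sketch directly. The functions below are illustrative textbook forms, not the variants evaluated in the talk; the 18-bit word width and the sample words are assumptions chosen to match the SERDES block length.

```python
def parity_word(words):
    """XOR of all words (block parity)."""
    acc = 0
    for w in words:
        acc ^= w
    return acc


def modular_sum(words, bits):
    """Sum of the words modulo 2**bits."""
    return sum(words) % (1 << bits)


def ones_complement_sum(words, bits):
    """Sum with end-around carry, then complemented (Internet style)."""
    mask = (1 << bits) - 1
    acc = 0
    for w in words:
        acc += w
        acc = (acc & mask) + (acc >> bits)   # fold the carry back in
    return ~acc & mask


def fletcher(words, bits):
    """Fletcher checksum: two running sums modulo 2**bits - 1."""
    mod = (1 << bits) - 1
    a = b = 0
    for w in words:
        a = (a + w) % mod
        b = (b + a) % mod                    # second sum depends on word position
    return a, b


words = [0x2A5F1, 0x00003, 0x3FFFF, 0x12345]        # hypothetical 18-bit words
swapped = [words[1], words[0]] + words[2:]          # two words exchanged
print(parity_word(words) == parity_word(swapped))   # True: the swap goes unnoticed
print(fletcher(words, 18) != fletcher(swapped, 18)) # True: position-dependent sum
```

The comparison at the end shows why position-dependent checksums were devised: plain parity and modular sums are blind to reordered words.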

CHECKSUMs: comparison
Parameters for a comparison are:
• d: minimum distance between codewords
• b: burst error detecting capacity
• h: number of check bits
• Lmax: maximum code length allowed

CHECKSUMs: simulations
Short words, e.g. protection of a single 18-bit stream, deliver both:
• high overhead
• low efficiency
Protecting multiple 18-bit streams (blocks of N*18 bits) gives better results as regards:
• overhead
• efficiency

CHECKSUMs: multiple 18-bit streams
The numbers in brackets [N S n] are:
• N: number of 18-bit blocks protected at the same time
• S: number of words making up the protected block
• n: length of the single word and also of the «parity» word

CHECKSUMs: multiple 18-bit streams, range of interest
The numbers in brackets [N S n] are:
• N: number of 18-bit blocks protected at the same time
• S: number of words making up the protected block
• n: length of the single word and also of the «parity» word

CRC vs CHECKSUMs
CRC shows a large performance advantage over checksums, as shown in the figures for 2 errors and 4 errors.

To be done next
• Obtain precise figures on the bit error rate in our rad-hard environment.
• Complete the analysis and simulation of the large set of possible algorithms for obtaining checksums.
• Take into consideration the specific statistics of the data and commands to be transmitted, in order to optimize some parameters.
• Evaluate different hardware implementations, in order to present practical alternatives for a final choice.
• Define, to that purpose, which choices may be allowed by a comprehensive implementation (programmable hardware).
• Analyze thoroughly the impact of error rates on the performance of the overall apparatus: trigger rate, latency and data quality.

Conclusions
• We surveyed current techniques for the detection of errors in the SuperB DAQ, with the aim of minimizing the required computational power, hardware and latency while preserving robustness, in comparison with full error correcting coding.
• We have developed some statistical analysis to obtain figures useful from our specific viewpoint, mainly the required overhead and the undetected error probability.
• We have developed simulations in order to assess practical figures.
• We have set up software useful for further analysis and for evaluating different alternatives and algorithms for a final choice.