High performance readout chain for the DSSC 1 MPixel detector, designed for high throughput during pulsed operation mode
Topical Workshop on Electronics for Particle Physics (TWEPP), Aix-en-Provence, 24.09.2014
Manfred Kirchgessner on behalf of the DSSC Collaboration
DSSC = DEPFET Sensor with Signal Compression
M. Kirchgessner, TWEPP, 24.09.2014

Outline
§ The European XFEL
§ DSSC system overview
§ Readout chain implementation
§ Implementation details
§ Summary

The European XFEL

EuXFEL construction site at Hamburg
Three 2D detector developments at the European XFEL (coordinator: M. Kuster):
§ Adaptive Gain Integrating Pixel Detector Consortium (AGIPD) (Project Leader: H. Graafsma)
§ Large Pixel Detector Consortium (LPD) (Project Leader: M. French)
§ DEPFET Sensor with Signal Compression Consortium (DSSC) (Project Leader: M. Porro)

EuXFEL – bunch structure
The EuXFEL runs in pulsed operation mode:
§ Macro bunch repetition rate of 10 Hz
§ Sequences of ~2700 pulses per macro bunch
§ Minimum pulse spacing of 220 ns (maximum frame rate 4.5 MHz)
§ ~100 fs wide X-ray pulses (exposure time)
§ 99.4 ms pause between macro bunches (used for data readout)
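The timing figures on this slide follow from each other. The sketch below recomputes the burst length, the pause, and the peak frame rate from the 10 Hz repetition rate, ~2700 pulses, and 220 ns spacing; the function and variable names are illustrative and not taken from the talk.

```python
# Timing figures quoted on the slide.
REPETITION_RATE_HZ = 10    # macro bunches per second
PULSES_PER_TRAIN = 2700    # approximate number of pulses per macro bunch
PULSE_SPACING_S = 220e-9   # minimum spacing between two pulses


def train_timing(rate_hz=REPETITION_RATE_HZ,
                 pulses=PULSES_PER_TRAIN,
                 spacing_s=PULSE_SPACING_S):
    """Return period, burst length, pause (all in ms) and peak frame rate (MHz)."""
    period_s = 1.0 / rate_hz
    burst_s = pulses * spacing_s
    return {
        "period_ms": period_s * 1e3,
        "burst_ms": burst_s * 1e3,
        "pause_ms": (period_s - burst_s) * 1e3,
        "frame_rate_mhz": 1e-6 / spacing_s,
    }


timing = train_timing()
print(f"burst {timing['burst_ms']:.3f} ms, pause {timing['pause_ms']:.1f} ms, "
      f"peak frame rate {timing['frame_rate_mhz']:.2f} MHz")
```

With these inputs the burst lasts about 0.6 ms, which leaves the 99.4 ms pause quoted above.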

The DSSC Detector

Sensor and focal plane architecture
§ DEPFET with non-linear characteristic
• Silicon detector with internal gate
• Intrinsic low noise due to small internal gate capacitance
• Intrinsic signal compression
§ Focal plane composition
• 1024 x 1024 pixels
• 32 monolithic sensors
• Each sensor bump bonded to 8 readout ASICs
• Dead area: ~15%
§ Power cycling
• 10.7 kW peak power
• 240 W average power
§ Readout concept
• Fully parallel readout
• Analogue shaping using trapezoidal filter
• In-pixel 9 bit ADC
• In-pixel SRAM memory (800 frames)
Figure: focal plane, 248 x 240 mm (© image by Karsten Hansen)

DSSC – Design Parameters
General parameters:
Energy range: optimized for 0.5 … 6 keV
Number of pixels: 1024 x 1024
Sensor pixel shape: hexagonal
Sensor pixel pitch: ~204 x 236 µm²
Dynamic range / pixel / pulse: ~5000 ph @ 0.5 keV; > 10000 ph @ E ≥ 1 keV
Resolution: single photon detection, also @ 0.25 keV
Frame rate: 0.9 - 4.5 MHz
Stored frames per macro bunch: 800
Operating temperature: -20 °C optimum, RT possible

The DSSC high throughput readout chain

DSSC Ladder components
Figure labels: Module-Interconnection Board (MIB), Readout ASIC, Patch-Panel Flex Cable, I/O Board (1st FPGA stage), Power Regulator Board, Mainboard, monolithic DEPFET sensor

The DSSC system overview
Second FPGA stage
1 MPixel x 800 images x 2 bytes per pixel = 1600 MByte per 0.1 seconds
Total data production rate of the detector: 128 GBit/s, or 32 GBit/s per PPT
(© image by Karsten Hansen)
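The total data rate can be reproduced from the numbers on the slide. The sketch below is only a worked calculation; the count of four PPTs is an assumption inferred from the ratio 128 / 32, and the function name is illustrative.

```python
def detector_rate_gbit_s(pixels=1_000_000, frames=800, bytes_per_pixel=2,
                         trains_per_s=10, n_ppt=4):
    """Return (total, per-PPT) data rate in GBit/s.

    n_ppt=4 is an assumption derived from 128 GBit/s / 32 GBit/s.
    """
    bits_per_second = pixels * frames * bytes_per_pixel * 8 * trains_per_s
    total = bits_per_second / 1e9
    return total, total / n_ppt


total, per_ppt = detector_rate_gbit_s()
print(f"total {total:.0f} GBit/s, per PPT {per_ppt:.0f} GBit/s")

# With the exact pixel count of 1024 x 1024 the total is slightly higher.
exact_total, _ = detector_rate_gbit_s(pixels=1024 * 1024)
print(f"exact pixel count: {exact_total:.1f} GBit/s")
```

The slide rounds 1 MPixel to 10^6 pixels; using 1024 x 1024 gives roughly 134 GBit/s instead of 128 GBit/s.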

DSSC DAQ Architecture

DSSC DAQ Architecture – ASIC
Readout ASIC:
• IBM 130 nm technology
• 4096 pixels per ASIC
• In-pixel SRAM cells for 800 9 bit words
• One 10 bit serializer running at 350 MHz (400 MHz also successfully tested): 9 bit data + 1 bit parity
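A minimal sketch of the 10 bit word format and the resulting readout time per ASIC. The slide only states "9 bit data + 1 bit parity"; even parity and the position of the parity bit in the least significant bit are assumptions made here for illustration, as are all function names.

```python
def pack_word(sample, even_parity=True):
    """Pack a 9 bit ADC sample into a 10 bit word with one parity bit.

    Parity convention and bit position are assumptions, not from the talk.
    """
    if not 0 <= sample < 512:
        raise ValueError("sample must fit in 9 bits")
    ones = bin(sample).count("1")
    parity = (ones & 1) if even_parity else (ones & 1) ^ 1
    return (sample << 1) | parity


def check_word(word, even_parity=True):
    """Return True if the 10 bit word has the expected parity."""
    ones = bin(word & 0x3FF).count("1")
    return (ones % 2 == 0) if even_parity else (ones % 2 == 1)


def asic_readout_ms(pixels=4096, frames=800, bits_per_word=10, clock_hz=350e6):
    """Time to shift out one full ASIC memory through the serializer, in ms."""
    return pixels * frames * bits_per_word / clock_hz * 1e3


word = pack_word(0b101100111)
print(check_word(word), check_word(word ^ 0b100))  # intact word, single bit flipped
print(f"{asic_readout_ms():.1f} ms at 350 MHz, "
      f"{asic_readout_ms(clock_hz=400e6):.1f} ms at 400 MHz")
```

At 350 MHz the full memory of one ASIC takes about 93.6 ms to read out, which fits into the 99.4 ms pause between macro bunches.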

DSSC DAQ Architecture – ASIC
16 ASICs are connected to the first FPGA readout board (I/O Board):
• Differential 350 MHz LVDS signals
• Connection via wire bonds and traces on PCB

DSSC DAQ Architecture – I/O Board
The I/O Board implements the first FPGA stage:
• FPGA: Spartan 6 LX45 (xc6slx45t-3-csg324)
• Combines the data from 16 ASICs into one data stream
• Implements 3 high speed serial Xilinx Aurora links
• Additional capacitors for pulsed sensor supply
• Temperature sensor

DSSC DAQ Architecture – ASIC
Xilinx Aurora protocol:
• 3 lanes @ 3.125 GBit/s form one channel
• 8b10b encoding & 32 bit cyclic redundancy check (CRC)
• Error detection: all single-bit and most multi-bit errors
• Effective usable data rate per lane is 2.5 GBit/s
• Parallel user interface in FPGA is 96 bit @ 78.125 MHz = 7.5 GBit/s per channel
Flexible cable connection to the Patch-Panel-Transceiver (PPT) outside of the vacuum vessel:
• Rigid-flex circuit board
• 320 mm length
Figure: Aurora eye diagram
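The Aurora figures can be cross-checked with a short calculation: 8b10b encoding carries 8 payload bits in every 10 line bits, so the 2.5 GBit/s figure corresponds to one 3.125 GBit/s lane, and three lanes give the 7.5 GBit/s of the 96 bit user interface. The 16 x 350 MBit/s input load is taken from the ASIC slides; function names are illustrative.

```python
def aurora_payload_gbit_s(line_rate_gbit_s=3.125, lanes=3):
    """Payload bandwidth after 8b10b encoding (8 payload bits per 10 line bits)."""
    return line_rate_gbit_s * 8 / 10 * lanes


def user_interface_gbit_s(width_bits=96, clock_mhz=78.125):
    """Bandwidth of the parallel user interface inside the FPGA."""
    return width_bits * clock_mhz / 1e3


per_lane = aurora_payload_gbit_s(lanes=1)
per_channel = aurora_payload_gbit_s()
iob_input = 16 * 0.350  # GBit/s delivered by 16 ASIC serializers at 350 MBit/s

print(f"per lane {per_lane} GBit/s, per channel {per_channel} GBit/s")
print(f"user interface {user_interface_gbit_s()} GBit/s")
print(f"channel utilisation {iob_input / per_channel:.0%}")
```

Under these assumptions one three-lane channel is loaded to roughly three quarters of its payload capacity by a single I/O Board.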

DSSC DAQ Architecture – PPT
The Patch-Panel-Transceiver (PPT) implements the main FPGA stage:
• FPGA: Kintex 7 325T (xc7k325t-ffg900-2)
• Receives data from 4 I/O Boards over 4 x 3 lanes = 12 Aurora lanes
• 1 GByte high speed DDR3-1600 data buffer
• 4 x 10 GBit/s Ethernet links that connect to a 40 GBit/s QSFP
• MicroBlaze µC with an embedded Linux for slow control via 1 GBit/s Ethernet

PPT Firmware Details

PPT FPGA Firmware – Datarates
Data input: 22.4 Gbit/s
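The 22.4 Gbit/s input figure matches 4 I/O Boards x 16 ASICs x 350 MBit/s. The sketch below also estimates the output rate if each 10 bit serializer word is stored as 2 bytes per pixel, as stated on the system overview slide; treating that expansion as the source of the 35.84 GBit/s value in the data rate chart is an inference, not something the slides state explicitly. Function names are illustrative.

```python
def ppt_input_gbit_s(io_boards=4, asics_per_board=16, asic_mbit_s=350):
    """Aggregate serializer bandwidth arriving at one PPT."""
    return io_boards * asics_per_board * asic_mbit_s / 1e3


def ppt_output_gbit_s(input_gbit_s, bits_in=10, bits_out=16):
    """Output rate if every 10 bit input word becomes a 16 bit (2 byte) pixel value."""
    return input_gbit_s * bits_out / bits_in


rate_in = ppt_input_gbit_s()
rate_out = ppt_output_gbit_s(rate_in)
print(f"input {rate_in:.1f} Gbit/s, output {rate_out:.2f} Gbit/s "
      f"({rate_out / 40:.0%} of a 40 Gbit/s QSFP)")
```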

Implementation Details

PPT – FPGA Connections
Figure labels: Kintex 7 325T, QSFP, Detector, Slow-Control, Rx-Aurora, 1 GB DDR3-1600, DDR3-800, µC

Aurora - implementation details
Aurora IP core details:
§ Simplex core implemented (no back channel)
§ Streaming interface for easy data transmission
§ Timer used for initialization sequence
§ Only one differential wire pair per lane between FPGAs required
§ License comes with ISE
MGT usage:
§ One input clock can be connected to 3 GTX Quads.
§ Each GTX transceiver can be driven by its own PLL (CPLL) or by the Quad PLL (QPLL)
• CPLL (included in each GTX channel) for line rates 1.6 – 3.3 Gbit/s (used for Aurora)
• QPLL (one per Quad) for line rates 5.93 – 12.5 Gbit/s (required for 10 GigE)
§ Each Aurora channel is distributed to 3 Quads – 1 lane per Quad
§ Signal quality can be improved by optimizing swing and pre-emphasis settings
Figure labels: GTP Quad; Chan 1 to Chan 4 (repeated for each Quad)

DDR3-1600 - implementation details
IP core version:
§ Xilinx DRAM controller MIG 7 Series v1.9
§ License comes with ISE
Interface:
§ 4 DDR3 modules with 16 bit width = 64 bit data bus @ 800 MHz
§ On the firmware (user) side: 512 bit data bus @ 200 MHz single data rate
§ Running in burst mode of up to 256 words x 512 bits
§ Alternating read and write bursts to minimize latency
• In alternating read/write mode the max bandwidth achieved is 88 GBit/s
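The memory bandwidth figures can be checked as follows. A 64 bit bus clocked at 800 MHz with double data rate transfers the same number of bits per second as the 512 bit user interface at 200 MHz single data rate; comparing the measured 88 GBit/s with that peak gives the bus efficiency. Function names are illustrative.

```python
def ddr3_peak_gbit_s(bus_bits=64, clock_mhz=800, transfers_per_clock=2):
    """Peak bandwidth on the memory side (double data rate: 2 transfers per clock)."""
    return bus_bits * clock_mhz * transfers_per_clock / 1e3


def user_peak_gbit_s(bus_bits=512, clock_mhz=200):
    """Peak bandwidth on the firmware (user) side, single data rate."""
    return bus_bits * clock_mhz / 1e3


def max_burst_kib(words=256, word_bits=512):
    """Size of the largest burst in KiB."""
    return words * word_bits / 8 / 1024


peak = ddr3_peak_gbit_s()
print(f"memory side {peak} GBit/s, user side {user_peak_gbit_s()} GBit/s")
print(f"largest burst {max_burst_kib():.0f} KiB")
print(f"efficiency at 88 GBit/s: {88 / peak:.0%}")
```

The measured 88 GBit/s in alternating read/write mode corresponds to about 86% of the theoretical peak of 102.4 GBit/s.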

QSFP - implementation details
IP core:
§ Adaptations required to support four 10 GigE channels; single links can be generated directly
§ License required
System tests:
§ System was tested using a standard desktop PC
§ 1 x 10 GigE PCI-Express SFP receiver card (single link tested via breakout cable)
§ It is possible to receive 8 kB UDP packets at ~10 GBit/s without loss after some optimizations:
• Linux driver adaptations of buffer sizes
• Data receiving moved into a separate CPU thread
• No data stored, just copied from buffer and checked
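The receiver-side optimizations listed above can be sketched in a few lines: a larger socket receive buffer, a dedicated receive thread, and a check of each packet without storing it. This is not the test software used by the authors. The packet layout (a 4 byte big-endian sequence counter at the start of each datagram), the buffer size, and the class name are hypothetical, and a Python loop would not sustain 10 GBit/s; the sketch only illustrates the structure.

```python
import socket
import struct
import threading


class UdpReceiver:
    """Receive datagrams in a separate thread and check a sequence counter."""

    def __init__(self, port=0, rcvbuf_bytes=8 * 1024 * 1024, max_packet=9000):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Request a large kernel receive buffer (the OS may cap the value).
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
        self.sock.bind(("127.0.0.1", port))
        self.sock.settimeout(0.1)
        self.port = self.sock.getsockname()[1]
        self.max_packet = max_packet
        self.received = 0
        self.lost = 0
        self._expected = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        self.sock.close()

    def _run(self):
        buf = bytearray(self.max_packet)  # reused buffer: data is never stored
        while not self._stop.is_set():
            try:
                size = self.sock.recv_into(buf)
            except socket.timeout:
                continue
            if size < 4:
                continue  # too short to carry the hypothetical counter
            (seq,) = struct.unpack_from("!I", buf)
            if seq > self._expected:
                self.lost += seq - self._expected  # gap in the counter
            self._expected = seq + 1
            self.received += 1
```

A sender would prepend `struct.pack("!I", n)` to each 8 kB payload; after `stop()`, the `received` and `lost` counters show whether any datagram was dropped.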

PPT – board details
Board details:
§ ~5000 € per board
§ 14 layers
§ Size: 80 x 160 mm
§ Supplied by 12 V / 17 W
§ 9 different supply levels: 12 V + 2 x 1.0 V, 1.2 V, 1.5 V, 1.8 V, 2.0 V, 2.5 V, 3.3 V
Booting and update:
§ Boot chain for successive power-up of all required voltages
§ FPGA & Linux boot automatically from SPI-connected flash memory
• Firmware & Linux flash reprogrammable from MicroBlaze
• Re-boot process can be triggered remotely
• I/O Board FPGAs programmed by PPT
§ After ~5 min the system is ready
§ Full system update possible remotely
Figure: PPT top view

PPT – board details
Debugging:
§ Xilinx JTAG programmer cable for early debugging
§ Xilinx virtual cable implemented:
• Xilinx ChipScope access to all IOB FPGAs and the PPT FPGA remotely via Ethernet
§ USB FTDI interface (Linux boot output)
§ Debug access available even when installed in vacuum
Figure: PPT top view

Summary
§ Differential LVDS links @ 350 MBit/s
§ Aurora protocol @ 3.125 GBit/s over 30 cm LVDS on flex cable runs reliably
§ Aurora lanes can be distributed to different GTX Quads
§ 10 GBit/s link works well at >90% speed
§ DDR3-1600 @ 800 MHz works out of the box, if hardware timings are known
Outlook:
§ First X-ray beam: 2015
§ First DSSC ladder camera (65 k pixel): 2015
§ Full DSSC 1 M pixel camera: 2017

The DSSC Consortium
M. Porro(1), L. Andricek(2), S. Aschauer(3), M. Bayer(4), A. Castoldi(4,5), D. Comotti(6), M. Donato(7), F. Erdinger(8), C. Fiorini(4,5), P. Fischer(8), H. Graafsma(9), C. Guazzoni(4,5), K. Hansen(9), P. Kalavakuru(9), H. Klaer(9), M. Kirchgessner(8), A. Kugel(8), M. Kuster(7), P. Lechner(3), G. Lutz(3), P. Majewski(3), M. Manghisoni(6), D. Moch(1), B. Nasri(4), S. Nidhi(7), V. Re(6), C. Reckleben(9), R. Richter(2), S. Schlee(7), J. Soldat(8), L. Strueder(8), J. Szymanski(9), M. Turcato(7), G. Weidenspointner(7), C. Wunderer(9)
1) Max Planck Institut fuer Extraterrestrische Physik, Garching, Germany
2) MPG Halbleiterlabor, Muenchen, Germany
3) PNSensor GmbH, Muenchen, Germany
4) Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy
5) Sezione di Milano, Italian National Institute of Nuclear Physics (INFN), Milano, Italy
6) Dipartimento di ingegneria industriale, Università di Bergamo, Italy
7) European XFEL GmbH, Hamburg, Germany
8) Zentrales Institut für Technische Informatik, Universitaet Heidelberg, Germany
9) Deutsches Elektronen-Synchrotron DESY, Hamburg, Germany

THANK YOU for your attention

PPT - FPGA Utilization
FPGA: Kintex 7 325T (xc7k325t-ffg900-2)
Logic              Used      Available   Ratio
Slice Registers    77,228    407,600     18%
Slice LUTs         81,387    203,800     39%
Occupied slices    32,635    50,950      64%
RAMB36/FIFO36      115       445         25%
RAMB18/FIFO18      91        890         10%
GTXE2_CHANNELS     16        16          100%

PPT – FPGA Logic distribution
Logic: Other 10%, MicroBlaze 23%, Rest Datapath 17%, DDR3 15%, Aurora x4 7%, 10 GbE x4 QSFP 28%

Used IP Cores
● Used Xilinx IP cores + self-written wrapper code (Verilog)
§ Aurora 8B10B v8.3 Rx & Tx
§ FIFO Generator v9.3 (no wrapper required)
§ 1 GB DDR3-1600 memory controller MIG 7 Series 1.9
§ Ethernet MAC + PHY (v11.6 + PCS/PMA 2.6)
● Xilinx EDK was used to implement the MicroBlaze
§ Running @ 100 MHz
§ Running a BusyBox Linux
§ Discrete 256 MB DDR3-800 DRAM controller
§ Gigabit Ethernet controller
§ Only for slow control

Kintex 7 GTX Quad - QPLL
A specific reference clock frequency is required for the application

DSSC – implemented datarates [GBit/s]
Stage                  Min required   Implemented   Maximum
ASIC                   0.33           0.35          0.4
IOB                    5.27           5.6           7.5
QSFP                   33.76          35.84         40
DDR3 Buffer (in+out)   67.51          71.68         102.4