i3 on Internet in a Box
Zhangxi Tan, Wei Xu, David Patterson
UC Berkeley
Outline
• Project Overview
• System Architecture
• Sample Software and Demo
• Future Work
Project Overview
• Testing and debugging large-scale distributed systems is difficult
• Problems with existing approaches
  – Scalability: O(100) nodes
  – Reproducibility: PlanetLab
  – Observability: we don't know what's going on inside
  – Cost, space and power
• IIAB: building a distributed-system testbed with over 1,000 nodes on a modular, multi-FPGA system
  – Version 0: basic hardware building block, operating system and TCP/IP network support
Methodology
Target platform: Research Accelerator for MultiProcessing (RAMP)
• 5 Xilinx Virtex-II Pro FPGAs (VP70)
• DDR2 memory per FPGA
• 10 Gbps Ethernet
Version 0 platform: Xilinx XUP boards
• Price: $299
• Xilinx Virtex-II Pro FPGA (VP30)
• 256 MB DDR memory
• 10/100 Mbps Ethernet
Version 0 Status
• Four 32-bit RISC processors (MicroBlaze) on one chip
  – Running at 100 MHz with L1 caches (16 KB instruction, 16 KB data)
  – Hardware divider/multiplier, barrel shifter, etc.
  – 64 MB DDR memory (100 MHz) per processor (separate address spaces)
  – 50 MIPS (measured from the Linux kernel)
  – Running the uClinux 2.4.32 kernel
• Inter-processor connection
  – Point-to-point 32-bit high-speed FIFO link
    • 3.2 Gbps throughput, 1-cycle access latency
  – Ethernet device emulated by a Linux kernel driver
    • Supports the TCP/IP protocol stack
    • Standard UNIX socket programming interface
    • Software implementation (polling or interrupt)
    • Software routing through the Linux kernel
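The slide does not say how Ethernet frames travel over the 32-bit FIFO. The sketch below assumes a hypothetical framing (a length word followed by the frame zero-padded to whole words); `frame_to_words`, `words_to_frame` and `transfer_time_s` are illustrative names, not the authors' driver. It also shows where the 3.2 Gbps figure comes from: one 32-bit word per cycle at 100 MHz.

```python
import struct
from typing import List

WORD_BITS = 32            # FIFO link width (from the slide)
CLOCK_HZ = 100_000_000    # 100 MHz clock (from the slide)
# One word per cycle gives the quoted raw rate: 32 bit x 100 MHz = 3.2 Gbit/s.
LINK_BITS_PER_SEC = WORD_BITS * CLOCK_HZ
WORD_BYTES = WORD_BITS // 8


def frame_to_words(frame: bytes) -> List[int]:
    """Hypothetical framing: a length word, then the frame zero-padded to whole 32-bit words."""
    padded = frame + b"\x00" * (-len(frame) % WORD_BYTES)
    return [len(frame)] + [w for (w,) in struct.iter_unpack(">I", padded)]


def words_to_frame(words: List[int]) -> bytes:
    """Inverse of frame_to_words: rebuild the payload and drop the padding."""
    length = words[0]
    payload = b"".join(struct.pack(">I", w) for w in words[1:])
    return payload[:length]


def transfer_time_s(words: List[int]) -> float:
    """Raw link time, assuming one word per cycle and no software overhead."""
    return len(words) / CLOCK_HZ
```

A leading length word lets the receiver strip the padding without scanning the payload; a 60-byte frame becomes 16 words, or 160 ns of raw link time. Everything the kernel driver and TCP/IP stack add on top of that is software overhead.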
System Architecture
Chip Layout
Chip utilization
• 97% BRAM utilization
• 61% LUT utilization
• Over 9 million equivalent gates
No floorplanning!
• Xilinx tools are difficult to use!
• The interconnect creates hot spots
• Excessive BRAM usage affects the layout
An XUP Cluster
• 3 XUP boards with 12 nodes
• Connected by a 100 Mbps Ethernet switch
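The slides give 4 nodes per board and a software router in the Linux kernel, but no addressing plan. The sketch below is a minimal one under loudly hypothetical assumptions: node n on board b is `10.0.b.n`, every on-chip peer is reachable over its own FIFO interface `fifo<n>`, and node 1 owns the board's `eth0` and forwards off-board traffic. Route selection uses longest-prefix match, the rule a kernel routing table applies.

```python
import ipaddress
from typing import List, Optional, Tuple

NODES_PER_BOARD = 4  # 12 nodes on 3 boards (from the slide)

Route = Tuple[ipaddress.IPv4Network, str]


def build_table(board: int, node: int) -> List[Route]:
    """Routing table for one node under the hypothetical 10.0.<board>.<node> plan."""
    table: List[Route] = []
    for peer in range(1, NODES_PER_BOARD + 1):
        if peer != node:
            # Host route to each on-chip peer over an assumed point-to-point FIFO interface.
            table.append((ipaddress.ip_network(f"10.0.{board}.{peer}/32"), f"fifo{peer}"))
    # Everything else in the cluster leaves via eth0 (node 1) or is handed to node 1.
    cluster = ipaddress.ip_network("10.0.0.0/16")
    table.append((cluster, "eth0" if node == 1 else "fifo1"))
    return table


def lookup(table: List[Route], dst: str) -> Optional[str]:
    """Longest-prefix match: the most specific matching route wins."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, ifc) for net, ifc in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]
```

Under this plan, traffic between two boards crosses the FIFO link to node 1, the 100 Mbps switch, and the FIFO link on the far board. The interface names and the choice of node 1 as the router are illustrative only.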
Network Performance
• Measured with the TTCP program (polling mode)
  – Software networking throughput is far below the 3.2 Gbps physical bound
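TTCP-style measurement writes a fixed number of buffers of fixed length, times how long the receiver takes to drain them, and reports bits per second. Below is a minimal sketch of that method, assuming a local stream socket pair stands in for the network link. The function name `ttcp_like` is hypothetical, and its two parameters only play the role of TTCP's buffer count and buffer length. The number it prints says nothing about IIAB hardware, and the sketch does not model polling versus interrupt mode.

```python
import socket
import threading
import time
from typing import Tuple


def ttcp_like(nbufs: int = 2048, buflen: int = 8192) -> Tuple[int, float]:
    """Send nbufs buffers of buflen bytes; return (bytes received, Mbit/s)."""
    tx, rx = socket.socketpair()  # local stream sockets stand in for the link
    total = nbufs * buflen
    received = 0

    def sink() -> None:
        # Receiver side: drain until every transmitted byte has arrived.
        nonlocal received
        while received < total:
            chunk = rx.recv(65536)
            if not chunk:
                break
            received += len(chunk)

    reader = threading.Thread(target=sink)
    payload = b"\xa5" * buflen
    start = time.perf_counter()
    reader.start()
    for _ in range(nbufs):
        tx.sendall(payload)  # transmitter side: fixed-size writes, like TTCP
    reader.join()
    elapsed = time.perf_counter() - start
    tx.close()
    rx.close()
    return received, received * 8 / elapsed / 1e6


if __name__ == "__main__":
    nbytes, mbps = ttcp_like()
    print(f"{nbytes} bytes, {mbps:.1f} Mbit/s")
```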
Software on Version 0
• System tools: agetty, basename, crond, crontab, date, dmesg, echo, env, expand, flatfsd, free, hostname, init, insmod, killall, login, passwd, ps, uname, version, whoami
• Shell programming: egrep, false, find, grep, msh, null, sed, sh, true, xargs
• Networking: arp, dhclient, dhcpd, dhcrelay, ftpd, ftpget, ftpput, ifconfig, ifdown, ifup, inetd, iptables, nslookup, ping, portmap, rdate, telnetd, ttcp, tftp, traceroute, wget
• File system: cat, chmod, cmp, cp, dd, df, du, gunzip, gzip, hd, head, ln, lsmod, mkdir, modprobe, more, mount, mv, pwd, rmdir, rmmod, tail, touch, umount, which, zcat
• Monitoring and debugging: netstat, rsyslogd, tcpdump, time, top, uptime, vmstat
• Interpreters: Python 2.0 (no math libraries)
• Package management: dpkg, dpkg-deb
• Web servers: Boa, thttpd (with CGI support)
• Editor: vi
• Research applications: i3 serverd and a number of example applications
i3 on IIAB
• Internet Indirection Infrastructure (i3)
  – A new Internet architecture from Berkeley
    • Multicast, unicast, anycast, etc.
  – C implementation based on the Chord DHT
[Figure: i3 rendezvous. The Receiver (R) inserts a trigger (id, R); the Sender sends (id, data); i3 forwards (R, data) to the Receiver.]
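The rendezvous in the figure can be sketched as a toy model. Receivers store triggers (id, R) on the server that a Chord-style ring makes responsible for id, and a packet (id, data) is forwarded to every receiver whose trigger matches. Several triggers on one id give multicast. This sketch assumes a 32-bit identifier space, exact identifier matching and a static ring; the full i3 design uses wider identifiers and more flexible matching. `I3Overlay` and `make_id` are hypothetical names, not the API of the Berkeley C implementation.

```python
import hashlib
from bisect import bisect_left
from typing import Dict, Iterable, List, Set, Tuple

ID_BITS = 32  # toy identifier width (assumption)


def make_id(name: str) -> int:
    """Hash a name into the identifier space (first ID_BITS bits of SHA-1)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[: ID_BITS // 8], "big")


class I3Overlay:
    """Toy i3 rendezvous layer over a static Chord-style ring of server ids."""

    def __init__(self, server_ids: Iterable[int]) -> None:
        self.ring: List[int] = sorted(server_ids)
        # Each server stores the triggers whose id it is responsible for.
        self.triggers: Dict[int, Dict[int, Set[str]]] = {s: {} for s in self.ring}

    def responsible(self, ident: int) -> int:
        """Chord successor: first server id >= ident, wrapping past the top of the ring."""
        i = bisect_left(self.ring, ident)
        return self.ring[i % len(self.ring)]

    def insert_trigger(self, ident: int, receiver: str) -> None:
        """Receiver R asks for packets addressed to ident: trigger (id, R)."""
        self.triggers[self.responsible(ident)].setdefault(ident, set()).add(receiver)

    def send(self, ident: int, data: bytes) -> List[Tuple[str, bytes]]:
        """Sender emits (id, data); each matching trigger yields a forwarded (R, data)."""
        stored = self.triggers[self.responsible(ident)]
        return [(r, data) for r in sorted(stored.get(ident, set()))]
```

The sender never learns R: it addresses only the identifier, which is what lets i3 offer unicast, multicast and anycast through one primitive.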
Future Work (1/2)
• LEON3 (SPARC V8) as the next supported processor
  – MMU support
  – Double-precision floating point
  – Reconfigurable parameters (cache, MMU, etc.)
  – Cache coherence (snooping)
  – 0.85 MIPS/MHz (5,000 LUTs, 90 MHz on Virtex-II)
• Huge benefit for software support
  – Full Linux support (Linux 2.6 kernel)
  – Java support
• Putting multiple LEON3 cores on one chip will be more challenging
  – Floorplanning and physical synthesis (shorter place-and-route time, better quality of results)
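A rough per-clock comparison using only the figures on these slides follows. It mixes a measured number (50 MIPS under Linux on MicroBlaze) with a quoted rating for LEON3, so the ratio is an estimate, not a benchmark result.

```latex
\begin{aligned}
\text{MicroBlaze (Version 0):}\quad & \frac{50\ \text{MIPS}}{100\ \text{MHz}} = 0.5\ \text{MIPS/MHz} \\
\text{LEON3 (quoted):}\quad & 0.85\ \tfrac{\text{MIPS}}{\text{MHz}} \times 90\ \text{MHz} \approx 76.5\ \text{MIPS} \\
\text{Ratio:}\quad & \frac{76.5\ \text{MIPS}}{50\ \text{MIPS}} \approx 1.53
\end{aligned}
```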
Future Work (2/2)
• Time dilation
  – Make a 50 MIPS processor look like a 1000 MIPS processor to software
  – Network/link emulation: delay, bandwidth, jitter, etc.
• Disk emulation
  – An abstraction layer (HW/SW approach) presented to software
• Better internal architecture
  – Processor/memory subsystem
  – High-performance internal network (1 Gigabit Ethernet)
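Making a 50 MIPS processor look like a 1000 MIPS one implies a time dilation factor (TDF) of 1000 / 50 = 20. The minimal sketch below assumes the usual approach, in which the clock reported to software advances TDF times slower than real time. Work done per real second then appears TDF times larger per perceived second, and an emulated link looks TDF times faster with TDF times shorter delay. `DilatedView` and `tdf_for` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


def tdf_for(target_mips: float, real_mips: float) -> float:
    """Dilation factor needed for real_mips to appear as target_mips."""
    return target_mips / real_mips


@dataclass
class DilatedView:
    """What software observes when its clock runs tdf times slower than real time."""

    tdf: float

    def perceived_seconds(self, real_seconds: float) -> float:
        return real_seconds / self.tdf

    def perceived_mips(self, real_mips: float) -> float:
        return real_mips * self.tdf

    def perceived_link(self, bandwidth_bps: float, delay_s: float) -> Tuple[float, float]:
        """Same bits in fewer perceived seconds: bandwidth scales up, delay scales down."""
        return bandwidth_bps * self.tdf, delay_s / self.tdf


if __name__ == "__main__":
    view = DilatedView(tdf_for(1000, 50))  # TDF = 20
    print(view.perceived_mips(50))          # 50 MIPS appears as 1000 MIPS
    print(view.perceived_link(100e6, 0.002))
```

Under the same factor, the XUP boards' 100 Mbps Ethernet would appear as 2 Gbps, and a 2 ms real delay as 0.1 ms. To present a target link to software, the emulator would configure the real link with bandwidth divided by TDF and delay multiplied by TDF.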