MySQL in eBay’s Personalization Platform
Presented by MySQL & O’Reilly Media, Inc.
Chris Kasten, eBay Kernel Framework Group
April 16, 2008
Outline
- Background
- General Vision
- General Requirements
- Why MySQL Memory Engine?
- System Overview
- Results
Fun Facts About eBay
- 110 million items for sale on the site
- $59 billion in gross merchandise value (GMV) per year
- Approximately $2,039 worth of goods traded on the site every second
- 276 million registered users
- 2 billion URL requests per day
- 6,000 application servers with 12,000 Java processes
- 40 billion database requests per day
- 300 different databases (over 700 instances)
- 9 PB of data storage
- 13 million lines of source code (expected to surpass the 16 million lines of the Windows NT 4.0 operating system in 2008)
Background
- Further distinguish the eBay shopping experience
  - Provide a more relevant and even better user experience
  - Provide users with a richer experience with greater continuity
  - Provide users with the best selection tailored to their interests and profile
  - Provide a better user experience through a real-time personalization data feedback loop that is immediately available
  - Provide users with tailored alternatives
- Further distinguish the eBay business value proposition
  - Advertising shown to more relevant buyers
  - More effective merchandising and marketing of items
  - Increase conversion rates through a better buyer experience and greater relevancy of the items presented to the buyer
Background
- eBay needed to expand its real-time personalization capabilities
- eBay needed to be able to associate more data with sessions
- Both personalization and session data were constrained by technology
  - Cookie limitations
    - Client-side cookie limit of 4 KB of data
    - Long-term scalability issue of sending all cookie data on every request, whether needed or not
  - High cost of traditional server-side solutions using an OLTP database
    - eBay’s very large scale quickly multiplies costs into a very large number
    - OLTP database throughput decreases with a high write ratio of approximately 50%
    - The large number of licenses/servers needed for the required throughput was cost prohibitive
  - High cost of other commercial alternatives at eBay’s very large scale
- These constraints were limiting business decisions and had to be solved
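A minimal Python sketch of the alternative to large cookies, assuming a hypothetical in-memory dict (`server_side_store`) in place of the real server-side tier: only a short random session ID travels in the cookie, while the payload (roughly 40 KB per user/session, per the Results slide) stays on the server. The function names and the 32-character ID length are illustrative assumptions, not details from the deck.

```python
import secrets

COOKIE_LIMIT_BYTES = 4 * 1024  # client-side cookie limit cited on the slide

# Hypothetical server-side store mapping session ID -> payload bytes.
# In the deck's design this role is played by the MySQL Memory Engine tier.
server_side_store = {}


def fits_in_cookie(payload: bytes) -> bool:
    """Return True if the payload could be carried in a single 4 KB cookie."""
    return len(payload) <= COOKIE_LIMIT_BYTES


def store_session(payload: bytes) -> str:
    """Keep the payload on the server and return a short ID for the cookie."""
    session_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    server_side_store[session_id] = payload
    return session_id


payload = b"x" * (40 * 1024)  # roughly 40 KB of session/personalization data
print(fits_in_cookie(payload))              # False
session_id = store_session(payload)
print(len(session_id))                      # 32
print(fits_in_cookie(session_id.encode()))  # True
```

Besides fitting under the limit, a 32-byte ID also avoids resending the full payload with every request.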
General Vision
Every application server can access data for every URL request (all 2 billion of them per day!)
- Session data
- Personalization data
General Requirements
- Handle 4 billion reads/writes per day
- Support connections and requests from 12,000 Java processes
- High throughput on low-cost hardware
- Scale both horizontally and vertically for 10x future growth
- Scale without operational interruption
- High availability and robustness to operational failures
- Low-latency response times
- Low licensing, support, and total cost of ownership
- Enterprise-class support agreement
- Enterprise-class management and monitoring tools
- Driver for Java
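For a sense of scale, the daily volume in the first requirement can be converted into an average per-second rate, both today and at the 10x growth target. This Python sketch assumes traffic is spread evenly over the day; real peak rates would be higher, and the deck gives no peak factor.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(requests_per_day: float) -> float:
    """Convert a daily request volume into an average per-second rate."""
    return requests_per_day / SECONDS_PER_DAY


current = average_rate_per_second(4e9)  # 4 billion reads/writes per day
future = current * 10                   # 10x future growth target
print(round(current))  # 46296
print(round(future))   # 462963
```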
Why MySQL Memory Engine?
- MySQL Memory Engine had the best performance
- Very impressive proof-of-concept (POC) results for MySQL Memory Engine
  - Approximately 2x the throughput of the nearest competitor (measured with the Java driver)
  - An eBay test case of 50/50 reads/writes showed approximately 13,000 TPS at 50% CPU for a network client against a Sun 4100 running Solaris 10 x86 (2 CPUs, dual-core Opteron, 16 GB RAM)
  - Handled 20,000 concurrent connections with less than 1% throughput degradation relative to the baseline case (using an eBay-developed patch)
- Production performance has been consistent with the POC results
Why MySQL Memory Engine?
- MySQL Enterprise had a very attractive cost structure
- MySQL’s ability to offer enterprise-class support
- MySQL’s combined throughput and cost structure provided a low-cost system at eBay’s scale
- Power and flexibility of using SQL for different needs
- A company with a significant track record
Why MySQL Memory Engine?
- The power of open source
  - eBay has developed and contributed two enhancements to MySQL
    - Support for an event-port-based threading and connection handling model for scalable connection handling
    - Support for true variable-size columns in MySQL Memory Engine
  - Ability to apply our own talent to create the enhancements we need quickly
  - Benefit from the innovations of others via open source
Why MySQL Memory Engine?
- The power of an open source company behind the product
  - Ability to collaborate with MySQL on enhancements to the product
  - Option to request enhancements from the company behind the product
  - Out-of-the-box monitoring and administration tools
  - Avoid tying up high-end eBay talent in owning the product ourselves
  - An enterprise-class open source product
  - Enterprise-class support offerings for use in critical systems
eBay Personalization System Overview
[Diagram: Browser → Application Servers → MySQL Memory Engine Cache Tier → Persistent Database]
eBay Personalization System Overview
[Diagram: Application Servers perform reads/writes against the replicated MySQL Memory Engine Cache Tier; cache-miss reads go to the Persistent Database, and changes are written back to it in 5-minute batches]
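The flow in the diagram can be sketched as a read-through, write-back cache. In this minimal Python sketch, plain dicts stand in for both the MySQL Memory Engine tier and the persistent database, and the class name `WriteBackCache` is hypothetical; the 5-minute schedule is left to whatever batch process calls `write_back()`.

```python
class WriteBackCache:
    """Minimal sketch of a read-through, write-back cache tier.

    Both tiers are plain dicts here; in the deck's system the cache is the
    MySQL Memory Engine tier and the backing store is the persistent database.
    """

    def __init__(self, backing_store: dict):
        self.backing_store = backing_store
        self.cache = {}
        self.dirty = set()  # keys written since the last write-back

    def read(self, key):
        # Serve from the cache; on a miss, read from the persistent store
        # and populate the cache. Unknown keys return None.
        if key not in self.cache and key in self.backing_store:
            self.cache[key] = self.backing_store[key]
        return self.cache.get(key)

    def write(self, key, value):
        # Writes go only to the cache and are marked dirty for the next batch.
        self.cache[key] = value
        self.dirty.add(key)

    def write_back(self) -> int:
        """Flush all dirty entries in one batch and return the batch size.

        A scheduler would call this periodically (every 5 minutes in the deck).
        """
        batch = {key: self.cache[key] for key in self.dirty}
        self.backing_store.update(batch)
        self.dirty.clear()
        return len(batch)


db = {"user:1": "prefs-v1"}
tier = WriteBackCache(db)
print(tier.read("user:1"))  # prefs-v1 (cache miss, read from db)
tier.write("user:1", "prefs-v2")
print(db["user:1"])         # prefs-v1 (not yet written back)
print(tier.write_back())    # 1
print(db["user:1"])         # prefs-v2
```

Deferring persistence this way is why replication matters in this design: anything written since the last batch exists only in the cache tier.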
eBay Personalization System Overview
- Replication is optional, depending on how critical it is to avoid losing the last 5 minutes of data
  - Trade-off between data criticality and doubling the memory cost
  - Some personalization data may not be critical enough to justify the additional hardware cost
- Single-threaded MySQL replication is generally problematic
  - Once replication falls behind, it stays behind under continued traffic
  - Replication can instead be achieved via dual writes from the application server, performed transparently by the framework
  - The second write, to the replica, can be asynchronous
- Data is automatically redistributed when a node fails or is drained
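The dual-write replication described above can be sketched as follows, assuming hypothetical dict-backed nodes and a `DualWriter` wrapper that is not part of any eBay or MySQL API. The primary write completes before `write()` returns, while the replica write runs on a background thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class DualWriter:
    """Sketch of framework-level replication by dual writes.

    The primary write is synchronous; the replica write is submitted to a
    background thread so the caller does not wait for it.
    """

    def __init__(self, primary: dict, replica: dict):
        self.primary = primary
        self.replica = replica
        # One worker keeps this writer's replica writes in submission order.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def write(self, key, value) -> Future:
        self.primary[key] = value  # synchronous write to the primary node
        # Asynchronous second write; the returned Future lets a caller
        # wait for it or inspect a failure if it needs to.
        return self._executor.submit(self.replica.__setitem__, key, value)

    def close(self):
        self._executor.shutdown(wait=True)


primary, replica = {}, {}
writer = DualWriter(primary, replica)
pending = writer.write("session:42", "cart=3")
print(primary["session:42"])  # cart=3
pending.result()              # wait for the replica write (demo only)
print(replica["session:42"])  # cart=3
writer.close()
```

The single worker thread is an assumption of this sketch; the deck does not describe how the framework orders, batches, or retries replica writes.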
eBay Personalization System Overview
- Write-back to the persistent database is performed by a batch process
- Evictions are performed by a batch process based on a target amount of free memory
- Buffer space is set aside in case the persistent database is unavailable
- Special techniques are used to minimize table lock duration during write-back and eviction operations
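One way to read the eviction rule is sketched below in Python, under assumptions the deck does not state: entries are evicted least-recently-accessed first, and entries not yet written back (`dirty`) are never evicted. `Entry` and `evict_to_target` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    size_bytes: int
    last_access: float   # e.g. a timestamp; smaller means older
    dirty: bool = False  # True if not yet written back to the persistent DB


def evict_to_target(entries: dict, capacity_bytes: int,
                    target_free_bytes: int) -> list:
    """Evict least recently accessed clean entries until free memory reaches
    the target. Returns the evicted keys in eviction order."""
    used = sum(e.size_bytes for e in entries.values())
    evicted = []
    # sorted() builds a separate list, so deleting from the dict is safe.
    for key in sorted(entries, key=lambda k: entries[k].last_access):
        if capacity_bytes - used >= target_free_bytes:
            break
        if entries[key].dirty:
            continue  # skip unsaved data rather than lose it
        used -= entries[key].size_bytes
        del entries[key]
        evicted.append(key)
    return evicted


table = {
    "a": Entry(400, last_access=1.0),
    "b": Entry(300, last_access=2.0, dirty=True),
    "c": Entry(200, last_access=3.0),
}
print(evict_to_target(table, capacity_bytes=1000, target_free_bytes=600))
# ['a', 'c']
```

If too much of the table is dirty, for example while the persistent database is unavailable, the target may not be reached; that is the situation the reserved buffer space is meant to absorb.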
Results
- A business-critical system running on MySQL Enterprise for one of the largest-scale websites in the world
- A highly scalable, low-cost system that handles all of eBay’s personalization and session data needs
- Handles 4 billion requests per day of 50/50 read/write operations, for approximately 40 KB of data per user/session
- Approximately 25 Sun 4100s running 100% of eBay’s personalization and session data service (2 CPUs, dual-core Opteron, 16 GB RAM, Solaris 10 x86)
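As a rough consistency check between the POC and production figures, this Python sketch divides the daily request volume evenly across the approximately 25 servers and compares the result with the roughly 13,000 TPS measured at 50% CPU in the POC. It assumes perfectly even load and ignores peak hours and any replica servers, none of which the deck quantifies, so the ratio is indicative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_tps_per_server(requests_per_day: float, servers: int) -> float:
    """Average transactions per second per server, assuming an even spread."""
    return requests_per_day / SECONDS_PER_DAY / servers


per_server = average_tps_per_server(4e9, 25)
poc_tps_at_half_cpu = 13_000  # POC result on a single Sun 4100
print(round(per_server))                          # 1852
print(f"{poc_tps_at_half_cpu / per_server:.2f}")  # 7.02
```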
Results
- A highly manageable system for the entire operational life cycle
- Leveraging MySQL Dashboard as a critical tool for insight into system performance and trends, and for identifying issues
- Adding new applications to the ebay.com domain that previously would have been in a different domain because of cookie constraints
- Creating several new business opportunities that would not have been possible without this new low-cost personalization platform
- Leveraging MySQL Memory Engine for other types of caching tiers that are enabling new business opportunities
Q&A
- Thank you for coming!
- Questions?