Solr Power FTW Alex Pinkin @apinkin #solrnosql
What Will I Cover? a. Who I am b. What Bazaarvoice does c. SOLR and NoSQL d. Can SOLR handle 20K queries per second? e. Lessons learned: large-scale multi-data-center deployment f. Conclusion
Alex Pinkin a. Software Engineering Lead, Data Infrastructure team, Bazaarvoice b. Loves to play with SQL and NoSQL @apinkin
Bazaarvoice a. Bazaarvoice is a software as a service company powering user generated content such as ratings and reviews on thousands of web sites b. 5 billion page views per month c. 230 billion impressions d. 75 million UGC
NoSQL?
SQL vs NoSQL
NoSQL is Not Only SQL a. Departs from relational model b. No fixed schema c. No joins d. Eventual consistency is OK e. Scale horizontally
Types of NoSQL a. Key-value (Redis, Riak, Voldemort) b. Document (MongoDB, CouchDB) c. Graph (Neo4j, FlockDB) d. Column family (Cassandra, HBase)
SOLR as NoSQL a. Non-relational model - Check b. No fixed schema - Check (dynamic fields) c. No joins - Check (denormalization) d. Horizontal scaling - Check (with work)
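To illustrate the "dynamic fields" and "denormalization" checks above, here is a minimal Python sketch that flattens a review and its product into one document. The field names are hypothetical, and the `_s` / `_i` suffixes assume that dynamic field patterns such as `*_s` and `*_i` are declared in the schema; none of this is taken from Bazaarvoice's actual schema.

```python
def denormalize(review, product):
    """Flatten a review plus its parent product into one flat document.

    Field names are hypothetical. The suffixes (_s string, _i integer)
    assume matching dynamic field patterns exist in the schema.
    """
    doc = {
        "id": f"review-{review['id']}",
        "rating_i": review["rating"],
        "text_s": review["text"],
        # Product attributes are copied onto the review, so no join is
        # needed at query time.
        "product_id_s": product["id"],
        "product_name_s": product["name"],
    }
    # Arbitrary extra attributes become dynamic fields without a schema change.
    for key, value in review.get("extra", {}).items():
        if isinstance(value, int):
            doc[f"{key}_i"] = value
        else:
            doc[f"{key}_s"] = str(value)
    return doc
```

The trade-off is the usual one for denormalized stores: a product rename means re-indexing every review that carries the copied name.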
SOLR stats - Bazaarvoice
SOLR Case Study
Life Before SOLR a. Indexes for sorting and filtering b. Aggregate tables for stats c. Nightly jobs d. Bugs...
Enter SOLR a. Index content and product catalog b. De-normalization c. Filtering and sorting d. Index every 15 minutes (20 seconds NRT)
SOLR - Statistics a. COUNT, SUM, AVG, MIN, MAX (StatsComponent) b. Stored fields c. Whenever content changes, re-calc stats for all affected subjects
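A StatsComponent request for one subject can be sketched as below. The `stats=true` and `stats.field` parameters are the standard StatsComponent request parameters; the base URL, the field names, and the helper name `stats_query` are assumptions made for illustration.

```python
from urllib.parse import urlencode


def stats_query(base_url, subject_field, subject_id, stat_field):
    """Build a select URL that asks only for statistics on one field.

    base_url, subject_field and stat_field are hypothetical values
    supplied by the caller.
    """
    params = [
        ("q", f"{subject_field}:{subject_id}"),  # restrict to one subject
        ("rows", "0"),                           # statistics only, no documents
        ("stats", "true"),                       # enable the StatsComponent
        ("stats.field", stat_field),             # field to aggregate
        ("wt", "json"),
    ]
    return f"{base_url}/select?{urlencode(params)}"
```

For example, `stats_query("http://localhost:8983/solr", "product_id_s", "p1", "rating_i")` yields a URL whose response carries count, sum, mean, min and max for `rating_i` across the matching reviews.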
Scaling reads - Replication
Replication - Multiple Data Centers
Replication - Multiple Data Centers Chatty if using multiple cores. Relay: a. Core auto-warming disabled b. Connection wait and read timeouts increased c. Replication poll interval increased (15 min) d. Compression enabled
...
<str name="httpConnTimeout">20000</str>
<str name="httpReadTimeout">65000</str>
<str name="pollInterval">00:15:00</str>
<str name="compression">internal</str>
...
SOLR Cloud - Bazaarvoice version a. Multiple cores (100+ per server) b. Re-balance indexes across cores and servers (automatic or manual) c. Deployment map stored in MySQL (Host - Core - Partition; statistics) d. Partition lifecycle
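The slides do not say how automatic re-balancing picks a target server. As one illustrative possibility only, the sketch below uses a greedy rule: place the largest partition first, always onto the currently least-loaded host. The function name, the size metric, and the host names are all hypothetical.

```python
import heapq


def rebalance(partition_sizes, hosts):
    """Greedy placement: largest partition first, onto the least-loaded host.

    partition_sizes maps partition name -> size (any consistent unit).
    Returns a dict mapping partition name -> host. This is an assumed
    strategy, not the one described in the talk.
    """
    heap = [(0, host) for host in hosts]  # (current load, host)
    heapq.heapify(heap)
    placement = {}
    ordered = sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for partition, size in ordered:
        load, host = heapq.heappop(heap)       # least-loaded host so far
        placement[partition] = host
        heapq.heappush(heap, (load + size, host))
    return placement
```

The resulting mapping is the kind of Host - Partition record a deployment map could store; a real system would also have to weigh the cost of moving an index against the benefit of a more even load.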
Schema Changes Re-indexing is time-consuming for large indexes. Process: 1. Full re-index offline prior to the release 2. Incremental indexing after the release. Bottleneck: reading from MySQL. Goal: transparent re-indexing
Performance Tuning a. Heap size b. Cache sizing c. Auto-warming d. Stored fields e. Merge factor f. Commit frequency g. Optimize frequency. Process: simulate and measure a. Replay logs b. Analyze metrics c. Monitor GC
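The "replay logs, analyze metrics" loop can be sketched in two small steps: pull the request parameters and `QTime` out of request log lines, then summarize latencies with percentiles. The log line layout assumed by the regular expression (`path=... params={...} ... QTime=N`) and the helper names are assumptions; actually re-sending the extracted queries to a test instance is left out.

```python
import math
import re

# Assumed log layout: "... path=/select params={q=...} hits=N status=0 QTime=N"
LINE_RE = re.compile(
    r"path=(?P<path>\S+) params=\{(?P<params>.*?)\}.*?QTime=(?P<qtime>\d+)"
)


def parse_requests(lines):
    """Return (path, params, qtime_ms) for every line that matches."""
    requests = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            requests.append(
                (match["path"], match["params"], int(match["qtime"]))
            )
    return requests


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

Comparing, say, the 50th and 95th percentile of `QTime` before and after a cache or heap change gives a more honest picture than an average, which hides GC pauses.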
Performance Tuning - GC
# Java memory usage settings
# Force the NewSize to be larger than the JVM typically allocates.
# In practice, the JVM has been allocating an extremely small Young generation, which causes objects to be prematurely promoted to the Tenured generation
JAVA_MEM_OPTS="-Xms27g -Xmx27g -XX:NewRatio=8"
# -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps --> Turn on GC logging
# -XX:+UseConcMarkSweepGC --> Use the concurrent collector
# -XX:+CMSIncrementalMode --> Incremental mode for the concurrent collector
# -XX:+CMSIncrementalPacing --> Let the JVM adjust the amount of incremental collection
JAVA_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=55 -XX:ParallelGCThreads=8 -XX:SurvivorRatio=4"
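As a worked check of what the memory flags on this slide imply, the sketch below computes generation sizes for a 27 GB heap with NewRatio=8 and SurvivorRatio=4. It assumes the usual HotSpot definitions: NewRatio is the tenured-to-young ratio, and SurvivorRatio is the ratio of eden to one survivor space (the young generation holds eden plus two survivor spaces). The function name is hypothetical.

```python
def generation_sizes(heap_gb, new_ratio, survivor_ratio):
    """Approximate generation sizes in GB from the heap size and ratios.

    Assumes NewRatio = tenured / young and
    SurvivorRatio = eden / one survivor space.
    """
    young = heap_gb / (new_ratio + 1)
    survivor = young / (survivor_ratio + 2)  # eden + 2 survivor spaces
    eden = young - 2 * survivor
    tenured = heap_gb - young
    return {
        "young": young,
        "eden": eden,
        "survivor": survivor,
        "tenured": tenured,
    }
```

Under those assumptions, `generation_sizes(27, 8, 4)` gives a 3 GB young generation (2 GB eden, two 0.5 GB survivor spaces) and a 24 GB tenured generation.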
SOLR Performance - Summary a. SOLR loves RAM! b. Log replay SOLR c. Same config, same hardware d. Get the most out of one instance
Conclusion - SOLR Strengths a. Lightning fast given enough RAM b. Good scale out support including multi-data center c. Great community
Conclusion - SOLR's Gaps a. Not fully elastic b. Real time takes work c. Secondary data store = sync overhead d. Schema changes
Questions @apinkin