MSG: An Overview of a Messaging System for the Grid
Daniel Rodrigues
CERN IT Department, CH-1211 Genève 23, Switzerland
www.cern.ch/it

Presentation Summary
• Current Issues
• Messaging System
• Testing
• Test Summary
  – Throughput
  – Message Lag
  – Flow Control
• Next Steps

Current Issues
• The current paradigm in the Grid is based on “distributed central services”.
• Single points of failure exist within Grid monitoring systems
  – e.g. Service Availability Monitoring (SAM).
• Reliable delivery of information is often not guaranteed…
• … and neither is scalability.
• Will a messaging system improve both reliability and scalability?

Messaging System
• Features of a messaging system:
  – Flexible architecture:
    • Delivers messages either point to point (queues)…
    • … or in multicast mode (topics).
    • Supports synchronous or asynchronous communication.
  – Reliable delivery of messages:
    • Provides reliability to senders if required.
    • Configurable persistence.
  – Highly scalable:
    • Reduces single points of failure.
• ActiveMQ is an open source message broker providing these and many other features.
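The difference between the two delivery modes can be illustrated with a toy in-memory broker. This is only a sketch: `ToyBroker` and its methods are hypothetical names invented for illustration, not ActiveMQ's API, and round-robin dispatch among queue consumers is an assumption made to keep the example small.

```python
from collections import defaultdict
from itertools import cycle


class ToyBroker:
    """In-memory stand-in for a broker (hypothetical, not ActiveMQ's API)."""

    def __init__(self):
        self.queue_consumers = defaultdict(list)
        self.topic_subscribers = defaultdict(list)
        self._round_robin = {}

    def subscribe_queue(self, name, callback):
        self.queue_consumers[name].append(callback)
        # Rebuild the iterator so the new consumer takes part in dispatch.
        self._round_robin[name] = cycle(list(self.queue_consumers[name]))

    def subscribe_topic(self, name, callback):
        self.topic_subscribers[name].append(callback)

    def send_to_queue(self, name, message):
        # Point to point: exactly one consumer receives each message.
        next(self._round_robin[name])(message)

    def publish_to_topic(self, name, message):
        # Multicast: every subscriber receives its own copy.
        for callback in self.topic_subscribers[name]:
            callback(message)


broker = ToyBroker()
queue_a, queue_b, topic_a, topic_b = [], [], [], []
broker.subscribe_queue("tests", queue_a.append)
broker.subscribe_queue("tests", queue_b.append)
broker.subscribe_topic("alerts", topic_a.append)
broker.subscribe_topic("alerts", topic_b.append)

for i in range(4):
    broker.send_to_queue("tests", i)
    broker.publish_to_topic("alerts", i)
```

With a queue the four messages are shared between the two consumers; with a topic each subscriber sees all four.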

Testing
• From the ActiveMQ documentation: “Performance differs greatly depending on many different factors:
  – the network topology
  – transport protocols used
  – quality of service
  – hardware, network, JVM and operating system
  – number of producers, number of consumers
  – distribution of messages across destinations along with message size”
• We believe it does change, but how?
• Evaluated parameters:
  1) Number of producers
  2) Number of consumers
  3) Message size
  4) Number of messages
• Measurement of timestamps:
  1) Message sent
  2) Message on broker
  3) Message received
• Results analysis:
  1) Logs containing all information for each message
  2) From the logs, extract messages/second…
  3) … and message lag
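The results analysis can be sketched as follows, assuming each log entry carries the three timestamps listed above and that producer, broker and consumer clocks are synchronised. The record layout, field names and sample values are hypothetical; the actual log format is not shown in the slides.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MessageRecord:
    msg_id: str
    sent: float       # timestamp taken by the producer, in seconds
    on_broker: float  # timestamp taken by the broker, in seconds
    received: float   # timestamp taken by the consumer, in seconds


def message_lag(record):
    """End-to-end lag: time between sending and receiving one message."""
    return record.received - record.sent


def messages_per_second(records):
    """Count received messages in each whole-second bucket."""
    return Counter(int(r.received) for r in records)


# Made-up sample values, for illustration only.
records = [
    MessageRecord("m1", 0.10, 0.12, 0.30),
    MessageRecord("m2", 0.20, 0.25, 0.90),
    MessageRecord("m3", 0.50, 0.55, 1.50),
]
lags = [message_lag(r) for r in records]
max_lag = max(lags)
rate = messages_per_second(records)
```

Splitting the lag at the broker timestamp (`on_broker - sent` versus `received - on_broker`) would show whether time is lost on the producer side or the consumer side.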

Test Summary
• Broker statistics:
  – Running for 6 weeks with no crashes
  – 50 million messages of various sizes (0 to 10 kB) forwarded to consumers
  – 12 million incoming messages from producers
  – Up to 40 producers and 80 consumers connected at the same time
  – Stable under a highly irregular test pattern:
    • Number of clients changes
    • Frequent client process kills
    • Daily number of tests varies

Results: Throughput
[Figure: throughput plot; not recoverable from the extracted text.]
• More consumers, more throughput?
• Consumer bottleneck! With a larger number of producers, even more messages per second arrive, saturating the consumer.

Results: Flow Control
• Effective flow control
• Flow control balances load on producers and consumers.

Results: Message Lag
[Figure: message lag plot for 3 consumers, with annotations: memory overflow (slow consumers), test run batch change, maximum lag 180 s.]

Next Steps
• Scalability in a distributed environment
  – Network of brokers
  – Testing optimized wire protocols (OpenWire)
• Evaluation under real-world use cases
  – SAM
    • 1 consumer, ~300 producers per VO
    • 15 (~2 k) messages/second
    • Prototype already in place for OSG
  – ATLAS
    • 10 producers, ~100 consumers
    • Streaming of messages with 200 B each
    • Persistence required

Thank you for your attention.

Support Slides

[Figure: messages/s per consumer versus number of consumers (1, 2, 3, 5, 10); one panel per message size (290 B, 390 B, 1290 B, 10290 B); one series per number of producers (1, 2, 3, 5, 10).]

[Figure: total messages/s versus number of consumers (1, 2, 3, 5, 10); one panel per message size (0 B, 100 B, 1 kB, 10 kB); one series per number of producers (1, 2, 3, 5, 10).]

[Figure: kB/s per consumer versus number of consumers (1, 2, 3, 5, 10); one panel per number of producers (1, 2, 3, 5, 10); one series per message size (290 B, 390 B, 1290 B, 10290 B).]

SAM-MSG integration overview (Piotr Nyczyk)
• Test results are published both from the framework and directly from test jobs executing at grid sites.
• The MSG consumer uses “transport views” in the Oracle DB (see later).

SAM – MSG: Publishing side
• Firewall and network issues for test jobs running on Worker Nodes
  – solution: use the HTTP protocol (REST), with http_proxy if available
  – robust publisher: given a list of broker URLs (STOMP/REST), the first one that responds is used to publish
  – requirement: message servlet installed on the broker machine
• Tested with a typical SAM load for 1 VO
  – message rate: 1 to 10 messages/second
  – published from many short-lived producers
    • ~300 machines (producers) publishing at the same time
    • ~15 messages per producer
  – prototype setup with 1 broker (gridmsg001)
• Currently used for OSG monitoring integration with SAM
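The “first broker that responds” behaviour of the robust publisher might look roughly like the sketch below. It is not the SAM publisher's actual code: `publish_with_failover`, the `send` callable, the example URLs and the choice of `OSError` as the failure signal are all assumptions made for illustration. The transport is injected so that the failover logic can be shown without a real broker.

```python
def publish_with_failover(broker_urls, message, send):
    """Try each broker URL in order; return the URL that accepted the message.

    `send(url, message)` is a hypothetical transport function (for example an
    HTTP POST or a STOMP SEND) that raises OSError when the broker is
    unreachable.
    """
    errors = []
    for url in broker_urls:
        try:
            send(url, message)
            return url
        except OSError as exc:
            # Remember the failure and fall through to the next broker.
            errors.append((url, str(exc)))
    raise ConnectionError(f"no broker accepted the message: {errors}")


# Fake transport standing in for the network, for demonstration only.
delivered = []


def fake_send(url, message):
    if "down" in url:
        raise OSError("connection refused")
    delivered.append((url, message))


used = publish_with_failover(
    ["http://broker-down.example/message", "http://broker-up.example/message"],
    "status=ok",
    fake_send,
)
```

Because producers are short-lived test jobs, trying the list in order on every publish keeps the client stateless; a long-running producer would probably remember the last working broker instead.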

SAM – MSG: Consumer side
• Generic consumer written in Python:
  – durable subscription (no data loss in case of consumer downtime)
  – message classes based on the WLCG MW Probe Format: key-value pairs
  – trivial transformation to SQL inserts:
    • message class → table (name mapping)
    • attribute (key) → column
• On the Oracle DB side:
  – a view for each message class, with columns exactly matching the attributes
  – PL/SQL code in an “INSTEAD OF INSERT” trigger does the ID look-ups and the actual insert(s) into the underlying tables
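The mapping from a key-value message to an SQL insert can be sketched as below. This is an illustration under assumptions, not the consumer's real code: the function name, the message class `metric_output` and its attributes are hypothetical, and the identifier check is an added safeguard rather than something the slides describe. Values are passed as named bind parameters (`:name` style) instead of being pasted into the statement.

```python
import re

# Accept only plain identifiers, since names are interpolated into the SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def message_to_insert(message_class, attributes):
    """Build a parameterised INSERT for the view named after the message class.

    message class -> view/table name, attribute key -> column name.
    Returns the SQL text and the dictionary of bind values.
    """
    for name in [message_class, *attributes]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    columns = ", ".join(attributes)
    placeholders = ", ".join(f":{key}" for key in attributes)
    sql = f"INSERT INTO {message_class} ({columns}) VALUES ({placeholders})"
    return sql, dict(attributes)


# Hypothetical message class and attributes, for illustration only.
sql, binds = message_to_insert(
    "metric_output",
    {"host_name": "ce01.example.org", "metric_status": "OK"},
)
```

Since the insert targets a view, the “INSTEAD OF INSERT” trigger on the database side would be the place where the ID look-ups happen; the consumer itself stays generic.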