Facebook – A strong and Large distributed database
What is Facebook?
According to Facebook.com: “Facebook is an online directory that connects people through social networks.” Technically, Facebook is an application made up of many smaller applications.
Facebook: a brief and tumultuous history
• Founded in February 2004
• Founders: Mark Zuckerberg, Eduardo Saverin, Dustin Moskovitz, Chris Hughes
• All were students at Harvard College at the time
• Lawsuit from the website ConnectU, which alleged that Zuckerberg had stolen the idea while employed by their company
Platform
The Facebook Platform engineering team has released and maintains open source SDKs for Android, C#, iPhone, JavaScript, PHP, and Python.
Developer tools
• codemod assists with large-scale codebase refactors that can be partially automated but still require human oversight and occasional intervention.
• Facebook Animation is a JavaScript library for creating customizable animations using DOM and CSS manipulation.
• flvtool++ is a tool for hinting and manipulating the metadata of FLV files. It was originally created for Facebook Video.
• Online Schema Change for MySQL lets you alter large database tables without taking your cluster offline.
• Phabricator is a collection of web applications which make it easier to write, review, and share source code. It is currently available as an early release and is used by hundreds of Facebook engineers every day.
• PHPEmbed makes embedding PHP simple. It is a more accessible and simplified API built on top of the PHP SAPI.
• phpsh provides an interactive shell for PHP that features readline history, tab completion, and quick access to documentation. It is ironically written mostly in Python.
• Three20 is an Objective-C library for iPhone developers which provides many of the UI elements and data helpers behind our iPhone application.
• XHP is a PHP extension which augments the syntax of the language such that XML document fragments become valid expressions.
• XHProf is a function-level hierarchical profiler for PHP with a simple HTML-based navigational interface.
Infrastructure
• Apache Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure.
• Apache Hive is data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying and analysis of large datasets.
• FlashCache is a general purpose writeback block cache for Linux. It was developed as a loadable Linux kernel module using the Device Mapper, and it sits below the filesystem.
• HipHop for PHP transforms PHP source code into highly optimized C++. HipHop offers large performance gains and was developed over the past two years.
• Open Compute Project is an open hardware project that aims to accelerate data center and server innovation while increasing computing efficiency through collaboration on relevant best practices and technical specifications.
• Scribe is a scalable service for aggregating log data streamed in real time from a large number of servers.
• Thrift provides a framework for scalable cross-language services development in C++, Java, Python, PHP, and Ruby.
• Tornado is a relatively simple, non-blocking web server framework written in Python. It is designed to handle thousands of simultaneous connections, making it ideal for real-time Web services.
Facebook engineers contribute to
• Apache Hadoop provides reliable, scalable, distributed computing infrastructure which we use for data analysis.
• Apache HBase is a distributed, versioned, column-oriented data store built on top of the Hadoop Distributed Filesystem.
• Cfengine is a rule-based configuration system that is used to automate the configuration and maintenance of servers. Facebook uses Cfengine to maintain host configs and to automate many janitorial operations on our production tiers.
• jemalloc is a memory allocator which is fast, consistent, and supports heap profiling. Facebook engineers added heap profiling and made many optimizations.
• memcached is a distributed memory object caching system. Memcached was not originally developed at Facebook, but we have become the largest user of the technology.
• MySQL is the backbone of our database infrastructure. You can find our patches on Launchpad and learn more about how we use it on the MySQL@Facebook page.
• PHP is an incredibly popular scripting language which makes up the majority of our code-base. Its simple syntax lets us move fast and iterate on products.
• Varnish serves billions of requests every day to Facebook users around the world. Whenever you load photos and profile pictures of your friends, there's a very good chance that Varnish is involved.
Facebook Technology
• 30,000 servers around the globe
• 25 terabytes of log data daily
• Uses localized uninterruptible power supplies, each serving six racks of servers.
• Aims to cut the energy lost between the grid and the motherboard. A 480 volt electrical system runs through the data center, and 277 volts go directly to each server.
• Facebook’s data center has no air conditioning system. It is cooled with outside air. Walls remove water particles so that only cool air gets through. There is no ductwork in the data center.
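The 480 volt and 277 volt figures above are related. Assuming the 480 volt figure is the line-to-line voltage of a three-phase supply and each server is fed from one phase to neutral (an assumption; the slide does not say so), the server voltage is 480 divided by the square root of 3. A small arithmetic check:

```python
import math

line_to_line_volts = 480.0   # figure quoted on the slide

# Assumption: three-phase distribution, servers connected phase-to-neutral.
line_to_neutral_volts = line_to_line_volts / math.sqrt(3)

print(round(line_to_neutral_volts, 1))   # 277.1
```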
How Does Facebook Work? The Front End
Facebook uses a variety of services, tools, and programming languages to make up its core infrastructure. At the front end, their servers run a LAMP (Linux, Apache, MySQL, and PHP) stack with Memcache.
Linux & Apache
This part is pretty self-explanatory. Linux is a Unix-like computer operating system kernel. It’s open source, very customizable, and good for security. Facebook runs the Apache HTTP Server on the Linux operating system. Apache is also free and is the most popular open source web server in use.
MySQL
For the database, Facebook utilizes MySQL because of its speed and reliability. MySQL is used primarily as a key-value store, with data distributed randomly amongst a large set of logical instances. These logical instances are spread out across physical nodes, and load balancing is done at the physical node level. As far as customizations are concerned, Facebook has developed a custom partitioning scheme in which a global ID is assigned to all data. They also have a custom archiving scheme that is based on how frequently and how recently data is used on a per-user basis.
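The split between logical instances and physical nodes described above can be sketched in a few lines. This is a minimal illustration, not Facebook's actual scheme: the shard count, the host names, the modulo placement rule, and the function names are all assumptions made for the example.

```python
# Hypothetical sketch: many logical database instances spread over a few
# physical nodes. Every name and number here is an illustrative assumption.
from typing import Tuple

NUM_LOGICAL_SHARDS = 16                                                  # assumed
PHYSICAL_NODES = ["db-host-a", "db-host-b", "db-host-c", "db-host-d"]    # assumed

# Assign logical shards to physical nodes round-robin. Balancing load then
# means moving a logical shard in this table, not re-partitioning the data.
shard_to_node = {
    shard: PHYSICAL_NODES[shard % len(PHYSICAL_NODES)]
    for shard in range(NUM_LOGICAL_SHARDS)
}


def logical_shard(global_id: int) -> int:
    """Map a global ID to a logical shard (simple modulo, an assumption)."""
    return global_id % NUM_LOGICAL_SHARDS


def locate(global_id: int) -> Tuple[int, str]:
    """Return (logical shard, physical node) for the row with this global ID."""
    shard = logical_shard(global_id)
    return shard, shard_to_node[shard]


for gid in (7, 23, 1000003):
    print(gid, locate(gid))
```

Because the application only ever computes the logical shard, a logical instance can be moved to a different physical node by changing one entry in `shard_to_node`.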
PHP
Facebook uses PHP because it is a good web programming language with extensive support and an active developer community, and because it is good for rapid iteration. PHP is a dynamically typed, interpreted scripting language.
Memcache
Memcache is a memory caching system that is used to speed up dynamic database-driven websites (like Facebook) by caching data and objects in RAM to reduce read time. Memcache is Facebook’s primary form of caching and helps alleviate the database load. Having a caching system allows Facebook to be as fast as it is at recalling your data. If the data is already cached, Facebook does not have to go to the database; it just fetches your data from the cache based on your user ID.
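The read path described above is commonly called the cache-aside pattern. The sketch below shows the idea only: plain Python dictionaries stand in for Memcache and MySQL, and the key format, function names, and invalidate-on-write policy are assumptions rather than a description of Facebook's code.

```python
# Hypothetical cache-aside sketch. Dicts stand in for Memcache and MySQL.

database = {42: {"name": "Alice"}, 43: {"name": "Bob"}}   # stand-in for MySQL
cache = {}                                                # stand-in for Memcache
stats = {"db_reads": 0}                                   # counts trips to the database


def get_user(user_id):
    """Return the user's record, consulting the cache before the database."""
    key = "user:%d" % user_id        # cache key derived from the user ID
    if key in cache:                 # cache hit: no database work at all
        return cache[key]
    stats["db_reads"] += 1           # cache miss: fall through to the database
    record = database.get(user_id)
    if record is not None:
        cache[key] = record          # populate the cache for the next reader
    return record


def update_user(user_id, record):
    """Write to the database, then drop the now-stale cache entry."""
    database[user_id] = record
    cache.pop("user:%d" % user_id, None)
```

Calling `get_user(42)` twice touches the database only once; after `update_user(42, ...)` the next read misses the cache and reloads the fresh record.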
How Does Facebook Work? The Back End
Facebook’s backend services are written in a variety of programming languages, including C++, Java, Python, and Erlang. Their philosophy for the creation of services is as follows:
1. Create a service if needed
2. Create a framework/toolset for easier creation of services
3. Use the right programming language for the task
I will discuss a few of the essential tools that Facebook has developed.
Thrift (protocol)
Thrift is a lightweight remote procedure call framework for scalable cross-language services development. Thrift supports C++, PHP, Python, Perl, Java, Ruby, Erlang, and others. It’s quick, saves development time, and provides a division of labor between work on high-performance servers and work on applications.
HipHop for PHP (source code transformer)
HipHop for PHP is a source code transformer for PHP script code and was created to save server resources. HipHop transforms PHP source code into optimized C++, then uses g++ to compile it to machine code.
Scribe (log server)
Scribe is a server for aggregating log data streamed in real time from many other servers. It is a scalable framework useful for logging a wide array of data. It is built on top of Thrift.
Cassandra (Database)
Cassandra is a distributed storage system for managing structured data that is designed to scale to a very large size across many commodity servers, with no single point of failure. Reliability at massive scale is a very big challenge, and outages in the service can have a significant negative impact. Hence Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously; the way Cassandra manages persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. Cassandra has achieved several goals: scalability, high performance, high availability and applicability. In many ways Cassandra resembles a database and shares many design and implementation strategies with databases. Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. The rest of the material covers the data model and the distributed properties provided by the system.
Data Model
Every row is identified by a unique key. The key is a string and there is no limit on its size. An instance of Cassandra has one table, which is made up of one or more column families as defined by the user. The number of column families and the name of each must be fixed at the time the cluster is started. There is no limit on the number of column families, but it is expected that there would be only a few. Each column family can contain one of two structures: supercolumns or columns. Both of these are dynamically created and there is no limit on the number of these that can be stored in a column family. Columns are constructs that have a name, a value and a user-defined timestamp associated with them. The number of columns that can be contained in a column family is very large, and the number of columns can vary per key. For instance, key K1 could have 1024 columns/supercolumns while key K2 could have 64 columns/supercolumns. Supercolumns are constructs that have a name and an unbounded number of columns associated with them.
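The nesting described above (row key, then column family, then named columns carrying a value and a timestamp) can be modeled in memory as follows. This is only a sketch for intuition: the class and method names are hypothetical, it is not Cassandra's API, and resolving conflicting writes by newest timestamp is an assumption of the example. Supercolumns are not modeled; they would add one more level of nesting.

```python
# Hypothetical in-memory model of the row / column family / column layout.
from collections import defaultdict, namedtuple

# A column has a name, a value and a user-defined timestamp.
Column = namedtuple("Column", ["name", "value", "timestamp"])


class Table:
    """One table whose set of column families is fixed at creation time."""

    def __init__(self, column_families):
        self.column_families = frozenset(column_families)
        # row key -> column family -> column name -> Column
        self.rows = defaultdict(lambda: defaultdict(dict))

    def insert(self, key, family, name, value, timestamp):
        if family not in self.column_families:
            raise KeyError("unknown column family: %s" % family)
        current = self.rows[key][family].get(name)
        # Assumption: the write with the newest timestamp wins.
        if current is None or timestamp >= current.timestamp:
            self.rows[key][family][name] = Column(name, value, timestamp)

    def get(self, key, family):
        """Return the columns stored under one row key and column family."""
        if key not in self.rows:
            return {}
        return dict(self.rows[key][family])


table = Table(["profile", "wall"])
table.insert("K1", "profile", "name", "Alice", timestamp=1)
table.insert("K1", "profile", "email", "alice@example.com", timestamp=1)
table.insert("K2", "profile", "name", "Bob", timestamp=1)   # K2 has fewer columns
```

Columns are created on first write, so different row keys can hold different numbers of columns within the same column family.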
Distribution, Replication and Fault Tolerance
Data is distributed across the nodes in the cluster using consistent hashing based on an order-preserving hash function. Cluster membership is maintained via a gossip-style membership algorithm. Failures of nodes within the cluster are monitored using an accrual-style failure detector. High availability is achieved using replication, and data is actively replicated across data centers. Since eventual consistency is the mantra of the system, reads execute on the closest replica and data is repaired in the background, which increases read throughput. The system scales incrementally: new nodes can simply be dropped into the cluster and are automatically bootstrapped with data.
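A minimal sketch of consistent hashing with replication may help make the placement rule concrete. Note the substitution: the text describes an order-preserving hash function, whereas this example uses MD5 for simplicity, so it illustrates ring placement only. The class name, node names, and the rule of placing replicas on the next nodes clockwise are assumptions of the example, not Cassandra's implementation.

```python
# Hypothetical consistent-hashing ring with N-way replication.
import bisect
import hashlib


def ring_position(value):
    """Hash a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Sorted (position, node) pairs; each node owns the arc of the ring
        # that ends at its own position.
        self.tokens = sorted((ring_position(n), n) for n in nodes)

    def add_node(self, node):
        """Drop a new node into the ring; only one arc is split."""
        bisect.insort(self.tokens, (ring_position(node), node))

    def nodes_for(self, key):
        """Return the nodes holding `key`, walking clockwise from its position."""
        if not self.tokens:
            return []
        start = bisect.bisect_left(self.tokens, (ring_position(key), ""))
        count = min(self.replicas, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replicas=3)
print(ring.nodes_for("user:42"))   # three distinct nodes, primary first
```

When a node is added, only the keys on the arc it takes over change their primary owner, which is what allows the cluster to grow incrementally.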
Database Schema used by Facebook
Components of Facebook
Personal Profile:
• Personal Information
• Photo
• Groups
• Class Schedule
• Wall
Components of Facebook
“Friends”
• Who can be friends?
  • From YOUR school, from OTHER schools
  • Current students, alumni/ae, faculty, staff, etc.
  • Anyone who has an abcde@diversity.edu e-mail address can register for Facebook
• How do you become friends?
  • Request an acknowledgment of “friend” status
Components of Facebook
Details and Social Timeline
• Friend Details
  • How you “know” this person
  • Lived together, worked together, organizations/teams, took a course together, summer/study abroad program, went to school together (preschool, elementary school, middle school, high school, college, grad school), family, through a friend, through Facebook, met randomly, “hooked up”, dated
• Social Timeline
  • Uses friend details to construct a timeline for the user
  • Displays groups, friends and more that the user was connected to through Facebook, by year
Components of Facebook
Groups and Groupies
• Thousands of groups can be joined by members of Facebook:
  • “I love Harry Potter”
  • “Procrastinators Unite…Tomorrow!!”
  • “Student Government Association @ Diversity College”
• You can also become a “groupie” of a group if you know a certain number of people with membership in that group
• The groupie feature can be turned off in the privacy settings
Components of Facebook
Events
• Groups and individual Facebook users can create, post and invite others to events:
  • “John’s 20th Birthday Bash”
  • “Sorority Recruitment Informational Meeting”
  • “Fusion Hall Council Meeting”
• Personal invitations can be sent, or the event can be listed as open to anyone
• There is also an RSVP feature for Facebook events
Components of Facebook
Messages
• Internal e-mail-type component of Facebook
• Messages can be sent from any Facebook user to another, regardless of school or friend status
“Poke” Feature
• This feature sends a message via Facebook to another user stating that he or she has been “poked” by that person; the option to “poke” back is then provided
• There is no specific purpose to the “poke”
• Considered flirting by some, or simply a joke between friends
Components of Facebook
The Facebook “Wall”
• Each individual and group profile can have a wall
• Essentially a message board where other users can post public messages on a user’s profile
• Can be edited by the person on whose profile the message is posted
• The message writer’s Facebook picture appears next to their message
Components of Facebook
Photo Features:
• Profile Photo
  • Appears on the user’s profile page, attached to messages and other things the user does on Facebook
• My Photo Page
  • Allows the user to post “albums” of pictures
  • The user can label the people in the pictures and provide descriptions of what is occurring in the picture
  • The user can also “tag” the people in the picture, which ties the image to that user’s profile in an additional photo section
Components of Facebook
Other components:
• Advertisement
  • Can be purchased by students or corporations
  • Generates revenue for Facebook
• Pulse Page
  • Has Top Ten lists generated from the Facebook community and other trend-driven features
Facebook: The Good
• It’s FREE!
• Thousands of groups
  • Can help you find others who share your interests, hobbies, major, etc.
• Academics
  • Find students enrolled in your classes to form study groups
• Locate friends
  • From home, high school, and other places, with whom you have lost touch
Facebook: The Bad
• Procrastination Tool
  • Most students who use Facebook state that it can serve as a distraction from school work and other responsibilities
• Feeling of a “safe” and “private” playground for students
  • In fact, many people other than students can access Facebook profiles
Facebook: The Ugly
• Internet Stalking
  • Personal information such as address, phone number and class schedule can provide many tools to individuals interested in keeping tabs on someone
• Incriminating and questionable photos tagged to your profile by you or others
  • Schools and police may use them as evidence
  • Can be used by employers who are interested in background information
Thank You!!!!