SPAMMERS ON TWITTER 2
1) TWITTER GAMES: HOW SUCCESSFUL SPAMMERS PICK TARGETS, BY VASUMATHI SRIDHARAN, VAIBHAV SHANKAR, MINAXI GUPTA (INDIANA UNIVERSITY)
2) DETECTING SOCIAL SPAM CAMPAIGNS ON TWITTER, BY ZI CHU, INDRA WIDJAJA, HAINING WANG
Avichai Cohen and Kira Belkin

Introduction 3
  • E-mail spam is now successfully managed
  • Online social networks (OSNs) are becoming extremely popular
      • Twitter: over 200 million users
  • A new type of spamming: all inside the network
      • 140-character limit
      • Twitter's popularity depends on quality

About Twitter 4

Tweet Types 5
  • Regular tweet: received by the sender's followers

Tweet Types 6
  • Reply tweet: received by those following both the sender and the recipient

Tweet Types 7
  • Mention tweet: received by those following the sender

Tweet Types 8
  • Retweet: received by the sender's followers

Privacy Issues 9
  • All tweet types go to the user's home timeline, which is private
  • Tweets sent by a user appear in their profile timeline and are public

Social Trust 10
  • Being in the same network creates social trust:
      • Reliance on content
      • Users are more willing to click on links / read content
      • Creates trust between unknown users
  • A rapidly growing number of users

Social Trust 11
  • Being in the same network creates social trust:
      • Reliance on content
      • Users are more willing to click on links / read content
      • Creates trust between unknown users
  • A rapidly growing number of users
  • An ideal platform for spammers. For example, 0.13% of spam tweets are clicked, as opposed to 0.003% to 0.006% for spam e-mail.

Spam Definition in this paper 12
  • Spam:
      • Spreading malicious content
      • Phishing
      • Spreading scams: pornography, gambling, fake pharmaceuticals

Spamming Techniques 13

URL shortening 14
  • The 140-character limit requires URL shortening, a popular spammer tool
  • Leads to: http://www.haaretz.co.il/news/science/1.1904319

Hashtag 15
  • Hashtag: groups tweets by topic.

Hashtag – Trending Topics 16
  • Hashtag: groups tweets by topic.
  • Used for trending topics like #Japan_Tsunami and #Egyptian_Revolution in March 2011.
  • Today: #Mention25CutePeopleOnTwitter

Mention 17
  • Enables addressing a tweet directly to a user with the @ character.
  • Allows a spammer to send spam directly to a target

Protection From Spammers 18

Spam Policy on Twitter - Content 19
  • Content: it is forbidden to
      • post content / URL spam
      • use a large number of unrelated @replies, mentions and #hashtags
      • post duplicate content (from single or multiple accounts)

Spam Policy on Twitter – Social Relationship 20
  • Social relationship: it is forbidden to
      • follow a large number of users in a short amount of time
      • have a small number of followers compared to the number of accounts the user is following
      • create / purchase accounts in order to gain followers

Tweet-Level Detection 21
  • Checks for spam text content and URLs
  • Today: over 8.3 million tweets per hour
  • Near-real-time delivery required

Tweet-Level Detection 22
  • Checks for spam text content and URLs
  • Today: over 8.3 million tweets per hour
  • Near-real-time delivery required
  • INEFFICIENT

Account-Level Detection 23
  • Checks for evidence of posting spam and aggressive automation
  • If found, the account is suspended
  • Spammers can easily create new accounts

Account-Level Detection 24
  • Checks for evidence of posting spam and aggressive automation
  • If found, the account is suspended
  • Spammers can easily create new accounts
  • INEFFICIENT

Collective Detection – Spam Campaigns 25
  • Spam campaigns coordinate multiple accounts to achieve a specific purpose.
  • Used to avoid being detected.
  • Distributed workload: individual accounts fly under the radar.
  • Wider audience.
  • Detecting spam campaigns complements conventional spam detection.

Spam account types 26
  • Spam campaigns either create or compromise a large number of Twitter accounts.
  • When detected:
      • Sybil accounts (created by spammers) will be permanently suspended
      • Owners of compromised accounts will be notified

Who Are The Spammers 27

Data collection 28
1) Twitter's streaming API was used to collect tweets starting on 1/11/2011
2) The data set contained 19,991,050 tweets and 7,078,643 Twitter account profiles
3) Each account was checked: 82,274 suspended profiles were found
4) Only English-tweeting profiles were examined: 53,083
5) Data (tweets) was collected for 5 days

Data collection 29
  • Set a minimum of 10 tweets within the 5 days of data collection: 14,230 profiles
  • These are "successful spam profiles"; the others are "unsuccessful spam profiles"
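The successful/unsuccessful split above can be sketched in a few lines of Python. This is only an illustration of the 10-tweet threshold stated on the slide; the function name, the input shapes and the profile ids are assumptions, not part of the original study's tooling.

```python
from collections import Counter

MIN_TWEETS = 10  # threshold stated on the slide


def split_profiles(tweet_authors, suspended_profiles, min_tweets=MIN_TWEETS):
    """Split suspended profiles into (successful, unsuccessful) spam profiles.

    tweet_authors: iterable of profile ids, one entry per collected tweet
                   (hypothetical input shape).
    suspended_profiles: iterable of profile ids that were found suspended.
    """
    counts = Counter(tweet_authors)  # tweets seen per profile in the window
    suspended = set(suspended_profiles)
    successful = {p for p in suspended if counts[p] >= min_tweets}
    return successful, suspended - successful
```

A profile that never appears in the collected tweets counts as zero tweets and therefore lands in the unsuccessful group.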

Total Picture 30
7,078,643 (total) → 82,274 (suspended) → 53,083 (English) → 14,230 (successful spammers)

Unsuccessful vs. Successful Spam Profiles 31
                                  Unsuccessful   Successful
1) Suspended on the first day         70%           15%
2) Have over 100 followers            1/6           1/3
3) Have zero followers                40%            5%

Spammers Behavior 32
  • Notice:
      • Regular tweets
      • Retweets: naturally low
      • Spammers use fewer types of tweets compared to other users

Who are the targets? 33

Who are the targets? 34
1) Spammer's own followers! (In sharp contrast to an earlier study)

Spamming Own Followers 35
  • Over 2/3 of successful spammers use exclusively regular tweets.
  • 1/3 of them have over 100 followers!

Spamming Own Followers – Spam Campaigns 36
  • Out of the 14,230 successful spam accounts, we only consider those with at least 10 regular tweets containing links: 7,704
  • To study the destinations of spam campaigns, we require that 80% of the final destination links for each profile lead to the same domain name: 6,630
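The two criteria above (at least 10 links, 80% of them ending at one domain) can be sketched as follows. This assumes the links have already been resolved to their final destinations, and it uses the URL host as a stand-in for the domain name, which is a simplification; the function name and parameters are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def campaign_domain(final_urls, min_links=10, min_share=0.8):
    """Return the dominant host if a profile looks like a single-domain
    campaign under the slide's thresholds, otherwise None.

    final_urls: list of already-resolved destination URLs for one profile
                (redirect resolution is assumed to have happened elsewhere).
    """
    if len(final_urls) < min_links:
        return None
    # Simplification: the host (netloc) approximates the domain name.
    hosts = Counter(urlparse(u).netloc.lower() for u in final_urls)
    host, hits = hosts.most_common(1)[0]
    return host if hits / len(final_urls) >= min_share else None
```

A profile that passes this check is what the later slides call "binned".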

Spamming Own Followers – Spam Campaigns 37
  • In total, 6,630 spammers led their followers to 559 different domains.
  • One interesting domain:
    1) t.co: Twitter's warning page (warns of malware). 1,822 profiles led there.

Who are the targets? 38
1) Spammer's own followers! (In sharp contrast to an earlier study)
2) Followers of other popular accounts

Spamming Followers of Popular Profiles 39
  • Used when a spammer wants to target many people with a certain interest (for example, music fans who follow a musician)
  • Done using reply / mention tweets
  • 4,086 of the spam profiles used reply / mention tweets. But did they target followers of other profiles?

Spamming Followers of Popular Profiles 40
1) If at least 4 Twitter users received spam from a specific profile, go to step 2
2) If at least 50% of those users were following the same profile, the spammer targeted the followers of that profile
[Figure: a spammer, targets T1 to T4, and a popular profile]
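The two-step test above can be sketched as a small function. The input shapes (a set of recipients and a mapping from each user to the profiles they follow) and all names are assumptions made for illustration; only the thresholds of 4 recipients and 50% come from the slide.

```python
from collections import Counter


def targeted_profile(recipients, following, min_recipients=4, min_share=0.5):
    """Return the profile whose followers a spammer appears to target, or None.

    recipients: ids of users who received reply/mention spam from one spammer.
    following: dict mapping a user id to the set of profile ids that user
               follows (hypothetical input shape).
    """
    recipients = set(recipients)
    if len(recipients) < min_recipients:  # step 1: enough distinct recipients
        return None
    # Count, for every followed profile, how many recipients follow it.
    counts = Counter(p for r in recipients for p in following.get(r, ()))
    if not counts:
        return None
    profile, hits = counts.most_common(1)[0]
    # Step 2: at least half of the recipients share this profile.
    return profile if hits / len(recipients) >= min_share else None
```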

Who are the targets? 41
1) Spammer's own followers! (In sharp contrast to an earlier study)
2) Followers of other popular accounts
3) Targets whose tweets contain keywords relevant to their spam campaign

Spamming Based on Keywords in Tweets 42
  • Picking targets based on the content of their tweets
  • Twitter lets users search tweets by keyword
  • Can be used with reply / mention tweets
  • How can we determine whether it was used as a strategy?

Spamming Based on Keywords in Tweets 43
  • Impossible to determine for mention tweets, possible for reply tweets
  • 2,969 of the successful spammers used reply tweets
  • To judge whether a spam tweet contained a keyword, we first identify possible keywords

Spamming Based on Keywords in Tweets 44
  • TF-IDF: a numerical statistic that reflects how important a word is to a document in a collection or corpus
  • The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others
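As a concrete illustration of the statistic described above, here is a minimal sketch using one common variant, tf(w, d) · log(N / df(w)), where tf is the word's relative frequency in the document, N the number of documents and df the number of documents containing the word. The slides do not say which variant the study used, so this choice is an assumption.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute tf-idf scores with the tf * log(N / df) variant.

    documents: list of token lists (tokenisation is assumed done elsewhere).
    Returns one {word: score} dict per document.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores
```

With this variant a word that occurs in every document scores zero, which is the "offset by corpus frequency" effect the slide describes.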

Spamming Based on Keywords in Tweets 45
  • The tf-idf of about 7 million words present in tweets from successful spammers was computed; the top 50K were picked as possible keywords
  • For each profile that had at least 3 reply tweets:
    1) Extract the source tweet
    2) Look for common keywords in the source tweet
    3) If a word appeared in the spammer's tweets and was one of the 50K, the spammer targeted authors of tweets with a specific keyword
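The three steps above can be sketched roughly as follows. The slide does not say how many matches are needed or how words were tokenised, so this sketch assumes whitespace tokenisation and treats a single shared keyword as sufficient; the function names and input shapes are hypothetical.

```python
def shared_keywords(reply_text, source_text, keywords):
    """Words that occur in both the spam reply and the source tweet and are
    also in the candidate keyword set (assumed lowercase)."""
    reply_words = set(reply_text.lower().split())
    source_words = set(source_text.lower().split())
    return reply_words & source_words & keywords


def uses_keyword_targeting(reply_pairs, keywords, min_replies=3):
    """reply_pairs: list of (spam_reply_text, source_tweet_text) for one profile.

    Assumption: one reply sharing a candidate keyword with its source tweet
    is enough to say the profile fits the strategy.
    """
    if len(reply_pairs) < min_replies:
        return False
    return any(shared_keywords(r, s, keywords) for r, s in reply_pairs)
```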

Spamming Based on Keywords in Tweets 46
2,969 (used reply tweets) → 2,419 (at least 3) → 1,004 (fit strategy) → 710 (80% of links go to a single domain)

Who are the targets? 47
1) Spammer's own followers! (In sharp contrast to an earlier study)
2) Followers of other popular accounts
3) Targets whose tweets contain keywords relevant to their spam campaign
4) Hijacking of trending topics: increases the chance of being found

Trending Topics Hijacking 48
  • Popular hashtags become trending topics
  • May be used alongside other spamming strategies
  • May be used with any tweet type
  • How do we identify the strategy?

Trending Topics Hijacking 49
1) Pre-calculate the set of the 200 most popular hashtags (out of 466,597 in the data set)
2) Require that each spam profile has at least 3 tweets with hashtags
3) If popular hashtags are present, the profile fits the strategy
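A rough sketch of the three steps above is given below. The hashtag regular expression, the case-insensitive matching and the rule that one popular hashtag is enough are assumptions made for the example; the slide only fixes the numbers 200 and 3.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")  # simplified hashtag pattern (assumption)


def popular_hashtags(all_tweets, top_n=200):
    """Step 1: the top_n most frequent hashtags in the whole data set."""
    counts = Counter(
        tag.lower() for text in all_tweets for tag in HASHTAG.findall(text)
    )
    return {tag for tag, _ in counts.most_common(top_n)}


def hijacks_trends(profile_tweets, popular, min_tagged=3):
    """Steps 2 and 3: the profile needs at least min_tagged tweets with
    hashtags, and at least one of them must use a popular hashtag."""
    tagged = [{t.lower() for t in HASHTAG.findall(text)} for text in profile_tweets]
    tagged = [tags for tags in tagged if tags]  # keep tweets that have hashtags
    if len(tagged) < min_tagged:
        return False
    return any(tags & popular for tags in tagged)
```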

Trending Topics Hijacking 50
4,327 (at least one #) → 3,503 (at least 3 #) → 1,043 (fit strategy) → 174 (at least 10 # with links, 80% to the same domain)

Targeting Own Followers by Retweets 51
  • More restricted than regular tweets
  • Have a higher click rate than regular tweets
  • Least popular among all user types
  • 1,230 successful spammers used them; only 28 were running a campaign using retweets

Posting Methodology 52
  • Web interfaces and various third-party clients:
      • applications for smartphones
      • RSS to tweet (blogs)
  • Twitter shows how a tweet was posted:

Posting Methodology 53
  • The method of posting can indicate whether a tweet is spam or good.

Total Picture 54
7,078,643 (total) → 82,274 (suspended) → 53,083 (English) → 14,230 (successful spammers) = 8,805 (binned) + 5,425 (unbinned)

Unbinned Spam Profiles 55
  • Reminder: a profile is binned if it appears to be running a campaign, defined as having at least 10 tweets with links, 80% of which lead to the same domain.
  • Out of our 14,230 profiles, 5,425 are unbinned.
  • 64.7% of them exclusively made regular tweets.
  • The remaining 1,910:
      • 89.7% sent at least one mention tweet
      • 328 sent exclusively mention tweets

How to get followers? Why bother? 56

Garnering Followers 57
1) Become a part of peer-driven communities that encourage following back

Garnering Followers 58
2) Buying followers:

Garnering Relevant Followers - Problems 59
  • These methods are unlikely to work for spam profiles that require relevant followers.
  • Mainly, spammers that use exclusively regular tweets (many successful spammers)
  • How can it be done?

Garnering Relevant Followers - Solutions 60
  • Possible strategies to locate targets and friend them:
      • Follow popular profiles
      • Spamming based on keywords
      • Publicly searchable profile information to target users based on location, interests, etc.

Garnering Relevant Followers - Solutions 61
  • Possible strategies to locate targets and friend them:
      • Follow popular profiles
      • Spamming based on keywords
      • Publicly searchable profile information to target users based on location, interests, etc.
  • These targets may be interested in the spammer's content, so they are more likely to follow back. And if so, is it still spam?

Related Work 62
  • Spam on OSNs has been analyzed before:
      • Online video spam on YouTube
      • Detecting and characterizing spam campaigns on Facebook
  • Earlier work on spam on Twitter includes:
      • Distinguishing between spam and non-spam profiles
      • Distinguishing between spam and non-spam tweets
      • Classifying profiles that send spam using machine learning
      • Detection and characterization of suspicious URLs
      • How spammers get embedded deeper in social networks

Conclusions 63
  • Spammers have evolved since a similar study performed a year earlier (suspension, methods)
  • The complexity of their strategies is likely to increase (tools simulating human behavior are being developed, spammers get more experienced, etc.)
  • Constant research is required to keep track

Conclusions 64

Questions? 65
"Judge a man by his questions rather than by his answers." - Voltaire

Spammers Behavior - 2 66
  • Notice:
      • ¾ of successful spammers use exclusively one tweet type
      • 2/3 of unsuccessful spammers do so
      • Only 14% of regular users do so

Spammers vs. Good users Behavior 67
                                        Spammers      Good users
Use only a certain type of tweet        Almost ¾      Only 13%
Target only their followers             2/3           Only 10%