Proxy Servers


What Is a Proxy Server?
• Intermediary server between clients and the actual server
• Proxy processes the request
• Proxy processes the response
• An intranet proxy may restrict all outbound/inbound requests to the intranet server

What Does a Proxy Server Do?
• Sits between client and server
• Receives the client request
• Decides if the request will go on to the server
• May have a cache and may respond from the cache
• Acts as the client with respect to the server
• Uses one of its own IP addresses to get the page from the server
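
The decision flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a real proxy: the function name `handle_request` and the parameters `cache`, `is_allowed`, and `fetch_from_server` are hypothetical stand-ins for the proxy's cache store, its access policy, and its outbound HTTP client.

```python
from typing import Callable, Dict, Optional


def handle_request(url: str,
                   cache: Dict[str, str],
                   is_allowed: Callable[[str], bool],
                   fetch_from_server: Callable[[str], str]) -> Optional[str]:
    """Return the page body for url, or None if the proxy refuses the request."""
    # Decide whether the request may go on to the server at all.
    if not is_allowed(url):
        return None
    # Respond from the cache when a copy is available.
    if url in cache:
        return cache[url]
    # Otherwise act as the client toward the server and keep a copy.
    body = fetch_from_server(url)
    cache[url] = body
    return body
```

A second request for the same URL is answered from `cache` without calling `fetch_from_server` again. Freshness checks are deliberately left out of this sketch.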

Usual Uses for Proxies
• Firewalls
• Employee web use control (email etc.)
• Web content filtering (kids)
  – Black lists (sites not allowed)
  – White lists (sites allowed)
  – Keyword filtering of page content

User Perspective
• Proxy is invisible to the client
• The proxy's IP address is the one used, or the browser is configured to go through the proxy
• Speeds up retrieval if caching is used
• Can implement profiles or personalization

Main Proxy Functions
• Caching
• Firewall
• Filtering
• Logging

Web Cache Proxy
• Our concern is not with the browser cache!
• Store frequently used pages at the proxy rather than asking the server to find or create them again
• Why?
  – Reduce latency: faster to get the page from the proxy, which makes the server seem more responsive
  – Reduce traffic: fewer requests reach the actual server

Proxy Caches
• A proxy cache serves hundreds or thousands of users
• Corporations and intranets often use them
• Responses to the most popular requests are generated only once
• Good news: proxy cache hit rates often reach 50%
• Bad news: stale content (stock quotes)

How Does a Web Cache Work?
• A set of rules set in either or both of:
  – The proxy admin configuration
  – The HTTP headers

Don't Cache Rules
• HTTP header
  – Cache-Control: max-age=xxx, must-revalidate
  – Expires: date…
  – Last-Modified: date…
  – Pragma: no-cache (doesn't always work!)
• Object is authenticated or secure
• Fails proxy filter rules
  – URL
  – Meta data
  – MIME type
  – Contents
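
A minimal sketch of how a proxy might apply these rules to one response. The function name `may_cache` and its two boolean parameters are hypothetical; `no-store` is a standard Cache-Control directive that is not on the slide and is included here as an assumption about what a real proxy would also honor.

```python
from typing import Mapping


def may_cache(headers: Mapping[str, str],
              is_authenticated_or_secure: bool = False,
              passes_filter: bool = True) -> bool:
    """Return False when any of the don't-cache rules applies."""
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value.lower() for name, value in headers.items()}
    directives = {d.strip() for d in h.get("cache-control", "").split(",")}
    if "no-cache" in directives or "no-store" in directives:
        return False
    if "no-cache" in h.get("pragma", ""):
        return False
    # Authenticated or secure objects, and objects that fail the proxy's
    # filter rules, are never stored.
    if is_authenticated_or_secure or not passes_filter:
        return False
    return True
```

For example, `may_cache({"Pragma": "no-cache"})` returns `False`, while a response carrying only `Cache-Control: max-age=3600, must-revalidate` may be stored and must then be checked for freshness before reuse.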

Getting From Cache
• Use the cached copy if it is fresh
  – Within the date constraint
  – Used recently and the modified date is not recent
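
The freshness test can be sketched as follows. The function name `is_fresh` and its parameters are hypothetical. The fallback branch, which treats a copy as fresh for a fraction of the time since it was last modified, is one common heuristic reading of "modified date is not recent"; the 10% default is an assumption, not something the slide states.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_fresh(now: datetime,
             stored_at: datetime,
             max_age: Optional[timedelta] = None,
             expires: Optional[datetime] = None,
             last_modified: Optional[datetime] = None,
             heuristic_fraction: float = 0.1) -> bool:
    """Decide whether a cached copy may be served without asking the server."""
    age = now - stored_at
    # Explicit date constraints take priority.
    if max_age is not None:
        return age <= max_age
    if expires is not None:
        return now <= expires
    # Heuristic: a page that had not changed for a long time when it was
    # stored is assumed to stay unchanged for a while longer.
    if last_modified is not None:
        return age <= (stored_at - last_modified) * heuristic_fraction
    return False
```

With the default fraction, a page last modified ten days before it was stored is treated as fresh for one day.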

2. Firewalls
• Proxies for security protection
• More on this later

3. Filtering at the Proxy
1. URL lists (black and white lists)
2. Meta data
3. Content filters

Filtering
[Diagram: Web doc, label base, ratings, keywords, URL lists, URLs]

The Problem: the Web
• 1 billion documents (April 2000)
• Average query is 2 words (e.g., Sara name)
• Continual growth
• Balance global indexing and access against unintentional access to inappropriate material

Filtering Application Types
• Proxies
  – Black lists
  – White lists
  – Keyword profiles
  – Labels

Black and White Lists
• Black list: URLs the proxy will not access
• White list: URLs the proxy will allow access to

How Is Filtering/Selection Done?
• Build a profile of preferences
• Match input against the profile using rules

Black and White Lists
• Black list of URLs
  – No access allowed
• White list of URLs
  – Access permitted
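
A minimal sketch of list-based filtering. The function name `allowed_by_lists` is hypothetical, and matching on the host name only is an assumption made to keep the example short; real lists may match full URLs or patterns.

```python
from urllib.parse import urlsplit


def allowed_by_lists(url, black_list=frozenset(), white_list=None):
    """Apply a white list if one is given, otherwise apply the black list."""
    host = (urlsplit(url).hostname or "").lower()
    if white_list is not None:
        # White list: only listed hosts are permitted.
        return host in white_list
    # Black list: everything is permitted except listed hosts.
    return host not in black_list
```

With `black_list={"www.xyz.com"}`, a request for `http://www.xyz.com/new.html` is refused and any other host is allowed. With a white list, the default flips: anything not listed is refused.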

Lists in Action
• 1 billion documents!
• Who builds the lists?
• Who updates them?
• Frequency of updates

Labels
• Metadata tags
• Rule driven: PICS rules, for example
• Labels are part of the document or separate
• Separate = label bureau

Labels
• Metadata (goes with page)
• Label bureau (stored separately from page)

Meta Data as Part of HTML Doc
  <HTML>
  <HEAD>
  <META HTTP-EQUIV="keywords" CONTENT="federal">
  <META HTTP-EQUIV="keywords" CONTENT="tax">
  </HEAD>
  …
  </HTML>
Browser and/or proxy interpret the metadata
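
One way a proxy could read such keywords is with Python's standard `html.parser` module. This is a sketch: the class name `MetaKeywordParser` and the helper `extract_keywords` are hypothetical, and accepting `NAME="keywords"` in addition to the `HTTP-EQUIV` form shown on the slide is an assumption.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collect the CONTENT values of META keyword tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        a = dict(attrs)
        label = (a.get("http-equiv") or a.get("name") or "").lower()
        if label == "keywords" and a.get("content"):
            parts = a["content"].split(",")
            self.keywords.extend(p.strip().lower() for p in parts if p.strip())


def extract_keywords(html):
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords
```

Fed the two META tags from the slide, `extract_keywords` returns the keywords `federal` and `tax`, which can then be matched against a filtering profile.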

Metadata Apart From Doc
• Label bureaus
• A request for a doc is also a request for labels from one or more label bureaus
• Who makes the labels?
  – Text analysis
  – Community of users
  – Creator of the document

Labels: Collaborative Filtering
[Diagram: Label Bureau A, Label Bureau B, Labels, Rating Service, Search Engine, Author Labels, Web Site]

PICS and PICS Rules
• Tools for communities to use profiles and control/direct access
• Structure designed by the W3 Consortium (W3C)
• Content designed by communities of users

PICS Rating Data
  (PICS-1.1 "http://www.abc.org/r1.5"
    by "John Doe"
    labels on "1998.11.05" until "2000.11.01"
    for "http://www.xyz.com/new.html"
    ratings (violence 2 blood 1 language 4))

Using a URL List for Filtering
  (PicsRule-1.1
    (
      Policy (RejectByURL ("http://www.xyz.com:*/*"))
      Policy (AcceptIf "otherwise")
    )
  )

Using the PICS Data
  (PicsRule-1.1
    (
      serviceinfo (
        "http://www.lablist.org/ratings/v1.html"
        shortname "PTA"
        bureauURL "http://www.lablist.org/ratings"
        UseEmbedded "N"
      )
      Policy (RejectIf "((PTA.violence > 3) or (PTA.language > 2))")
      Policy (AcceptIf "otherwise")
    )
  )
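
The effect of the two policies can be illustrated with a small Python function. This is not a PICSRules parser: the function name `decide` is hypothetical, the thresholds are passed in as a plain dictionary, and treating an unrated category as 0 is an assumption.

```python
def decide(ratings, reject_above):
    """Return "reject" if any rated category exceeds its limit, else "accept"."""
    for category, limit in reject_above.items():
        # Assumption: a category with no rating counts as 0.
        if ratings.get(category, 0) > limit:
            return "reject"
    # Corresponds to the final AcceptIf "otherwise" policy.
    return "accept"
```

If a page carried the ratings from the earlier label example (violence 2, blood 1, language 4) under the PTA service, `decide(ratings, {"violence": 3, "language": 2})` would reject it, because language 4 exceeds the limit of 2.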

Example: Medical PICS Labels
• Su (UMLS vocab word): 0-9999999
• Aud (audience): 1 = patient, 3 = para, 5 = GP, etc.
• Ty (information type): 5 = scientist, 3 = patient, 4 = prod
• C (country): 1 = Can, 4 = Afghan, etc.
• Etc.
• ratings (su 0019186 aud 3:5 Ty 3 C 1)

User Profiles for Labels
• Rules for interpreting ratings
• Based on
  – User preferences
  – User access privileges
• Who keeps these?
• Who updates these?
• How fine is the granularity?

Labels and Digital Signatures
• Labels can also be used to carry digital signature and authority information

Example
  ("byKey" (("N" "aba21241241=") ("E" "abcdefghijklmnop=")))
  ("on" "1996.12.02T22:20-0000")
  ("SigCrypto" "aba1241241=="))
  ("Signature" "http://www.w3.org/TR/1998/REC-DSig-label/DSS-1_0"
    ("ByName" "plipp@iaik.tugraz.ac.at")
    ("on" "1996.12.02T22:20-0000")
    ("SigCrypto" (("R" "aba124124156")

Proxy level (hidden)

Text Analysis of Page Content
• Proxy examines the text of the page before showing it
• Generally keyword based
• Profile of 'black' and/or 'white' keywords

Profiles for Text Analysis
• Keywords (sometimes with weights)
• 'Reflect' the interest of a user or user group
• May be used to eliminate pages
  – 'All but'
• May be used to select pages
  – 'Only those'

Keyword Matching Algorithms
1. Extract keywords
2. Eliminate 'noisy' words with a stop list (1/3)
3. Stem (computer, compute, computation)
4. Match to profile
5. Evaluate the 'value' of the match
6. Check against a threshold for a match
7. Show or throw!

Stop List (35%)
the, of, and, to, in, a, be, will, for, on, is, with, by, as, this, are, from, or, been, was, have, it, that, at, an, were, has (27 words)
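
The keyword matching steps and this stop list can be sketched together in Python. Everything named here is hypothetical: `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer, `show_page` sums the profile weights of matched stems as its 'value', and showing the page when the value reaches the threshold corresponds to an 'only those' selection profile.

```python
import re

STOP_WORDS = {
    "the", "of", "and", "to", "in", "a", "be", "will", "for", "on", "is",
    "with", "by", "as", "this", "are", "from", "or", "been", "was", "have",
    "it", "that", "at", "an", "were", "has",
}


def crude_stem(word):
    """Strip one common suffix, keeping at least four letters (assumption)."""
    for suffix in ("ation", "ing", "er", "es", "ed", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def show_page(text, profile, threshold):
    """Return True to show the page, False to throw it away."""
    words = re.findall(r"[a-z]+", text.lower())            # 1. extract keywords
    words = [w for w in words if w not in STOP_WORDS]      # 2. apply stop list
    stems = [crude_stem(w) for w in words]                 # 3. stem
    weights = {crude_stem(k): w for k, w in profile.items()}
    matched = [s for s in stems if s in weights]           # 4. match to profile
    value = sum(weights[s] for s in matched)               # 5. evaluate match
    return value >= threshold                              # 6-7. threshold, show or throw
```

Here `computer`, `compute`, and `computation` all reduce to the same stem, so a profile entry for `computer` matches a page that only mentions `computation`.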

Matching Profile to Page
• Similarity?
• How many profile terms occur in the doc?
• How often?
• How many docs does the term occur in?
• How important is the term to the profile?

Cosine Similarity Measurement
• Profile terms weighted PW in (0, 1): importance
• Document terms weighted TW in (0, 1)
  – frequency in doc
  – frequency in whole set
• Overall closeness of doc to profile, with t ranging over all profile terms:
  $$\mathrm{closeness} = \frac{\sum_{t} TW_t \, PW_t}{\sqrt{\sum_{t} TW_t^{2} \cdot \sum_{t} PW_t^{2}}}$$
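
A minimal Python sketch of the measure, assuming the standard cosine form with sums taken over the profile terms. The function name `cosine_similarity` and the dictionary representation of PW and TW are hypothetical, and a document term missing from the dictionary is treated as weight 0.

```python
import math


def cosine_similarity(profile_weights, term_weights):
    """Closeness of a document (TW) to a profile (PW) over the profile terms."""
    terms = list(profile_weights)
    dot = sum(profile_weights[t] * term_weights.get(t, 0.0) for t in terms)
    norm = math.sqrt(
        sum(term_weights.get(t, 0.0) ** 2 for t in terms)
        * sum(profile_weights[t] ** 2 for t in terms)
    )
    # No overlap at all (or an empty profile) gives a closeness of 0.
    return dot / norm if norm else 0.0
```

A document whose weights are proportional to the profile's scores 1, and a document sharing no profile terms scores 0, so the result can be compared directly against a threshold.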

What works well?

What's the problem?
• Site labels: Who does them? Are they authentic? Has the source changed? A billion docs?
• Black and white lists: Ditto
• Text analysis of page contents: Poor results