Building & Managing Web Sites Web Statistics 9/8/2021 © 1999, 2000 Valtara Digital Design/Blitzkrieg Software 1
Why Care about Web Statistics
• Identify problems
  – Scan logs for HTTP error codes.
• Identify successes and weaknesses
  – Scan logs for heavily/lightly used pages.
• Improve service
  – Adjust site based on study of log files.
• Accountability
  – Provides metrics to quantify results to management and justify web expenditures.
Terminology
Log file
• Created by a service, typically in ASCII format, with a separate line for each resource/file request. Many different formats.
Hit or Request
• An individual resource/file request made to the webserver. Since web pages are typically made up of multiple files (graphics, etc.), not a good measure of usage.
Page View
• A transfer of a specific HTML page to a client. A single page view may involve multiple hits for graphics, sound files, etc.
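The difference between hits and page views can be made concrete with a short sketch. It assumes the requests have already been reduced to a list of URL paths and that a path counts as a page when it ends in .htm or .html. The names `PAGE_EXTENSIONS` and `count_hits_and_page_views` are illustrative only, not part of any log analysis product.

```python
# Hypothetical helper: separates raw hits from page views.
# Assumption: a "page" is any requested path ending in .htm or .html.
PAGE_EXTENSIONS = (".htm", ".html")


def count_hits_and_page_views(paths):
    """Return (hits, page_views) for a list of requested URL paths."""
    hits = len(paths)  # every requested file is one hit
    page_views = sum(1 for p in paths if p.lower().endswith(PAGE_EXTENSIONS))
    return hits, page_views


# One page with two graphics, plus a second page: four hits, two page views.
requests = ["/index.htm", "/images/logo.gif", "/buttons/form.gif", "/products.htm"]
hits, page_views = count_hits_and_page_views(requests)
print(hits, page_views)
```

A real site would need a longer extension list (or a rule based on content type), which is one reason page view counts differ between tools.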
Terminology
Proxy Server
• A computer service run in your LAN or by an ISP.
• Logs HTTP requests, sends requests under one or more consolidated IP addresses, receives results and re-routes them to the original sender’s IP address.
Visitor
• A visitor is typically defined as a specific IP address. May also be identified by cookies.
• Visitors are underreported due to proxy servers and overreported due to ISPs assigning multiple IP addresses to clients.
Terminology
Cookies / Tokens
• Cookies save a small amount of ID data specific to the site issuing the cookie. Allows the server to identify users between requests.
Visit or Session
• The activities of a user during a single visit.
• Tracked by IP address and/or cookies. IP-based tracking is problematic because different visitors may use the same IP address.
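A minimal sketch of IP-based visit counting follows. The slides do not say when one visit ends and the next begins, so the 30-minute inactivity cutoff (`SESSION_TIMEOUT`) is an assumption, and `count_visits` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Assumption: a gap of more than 30 minutes from the same IP starts a new visit.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_visits(requests):
    """Count visits from (ip, timestamp) pairs, using the inactivity cutoff."""
    last_seen = {}  # ip -> timestamp of that IP's most recent request
    visits = 0
    for ip, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(ip)
        if previous is None or when - previous > SESSION_TIMEOUT:
            visits += 1  # first request from this IP, or a long gap
        last_seen[ip] = when
    return visits


sample = [
    ("206.58.83.186", datetime(1996, 10, 25, 11, 26, 35)),
    ("206.58.83.186", datetime(1996, 10, 25, 11, 27, 0)),
    ("206.58.83.186", datetime(1996, 10, 25, 14, 0, 0)),
    ("10.0.0.7", datetime(1996, 10, 25, 11, 30, 0)),
]
print(count_visits(sample))
```

Because the key is the IP address alone, two people behind the same proxy are merged into one visit, which is exactly the weakness noted above. Keying on a cookie value instead would reduce that error.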
Types of Logs
Access Log
• Records client accesses of your web resources. Typical data includes client IP address, URL accessed, date/time, HTTP status code, number of bytes sent.
Error Log
• Logs webserver errors. Sometimes merged into the access log.
Referrer Log
• Lists the site URL that a user came from before they accessed a page on your webserver.
Agent Log
• Records the browser or client software used to access a web resource.
Log Formats
NCSA Common Log File Format
• The National Center for Supercomputing Applications (NCSA) common format; a fixed ASCII format.
W3C Extended Log File Format
• A customizable ASCII format, selected by default. Can capture the most information.
ODBC Logging (Only Available with IIS)
• A fixed format logged to a database (generally SQL Server).
Common (NCSA) Log Format
tarpon.gulf.net - - [12/Jan/1996:20:37:55 +0000] "GET index.htm HTTP/1.0" 200 215
tarpon.gulf.net - - [12/Jan/1996:20:38:56 +0000] "GET products.htm HTTP/1.0" 200 215
tarpon.gulf.net - - [12/Jan/1996:20:39:57 +0000] "GET sales.htm HTTP/1.0" 200 215
tarpon.gulf.net - - [12/Jan/1996:20:39:58 +0000] "GET /images/xxx.gif HTTP/1.0" 404 215
tarpon.gulf.net - - [12/Jan/1996:20:39:59 +0000] "GET /buttons/form.gif HTTP/1.0" 200 215
Source: WebTrends Web Site
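Lines in this format can be split into fields with a regular expression. The sketch below is one possible pattern, not an official grammar. It uses two of the sample lines with the stray spaces from the slide text removed, then tallies HTTP status codes. The names `CLF_PATTERN` and `parse_clf_line` are illustrative.

```python
import re
from collections import Counter

# One possible pattern for the Common Log Format:
# host ident user [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    return match.groupdict() if match else None


log_lines = [
    'tarpon.gulf.net - - [12/Jan/1996:20:37:55 +0000] "GET index.htm HTTP/1.0" 200 215',
    'tarpon.gulf.net - - [12/Jan/1996:20:39:58 +0000] "GET /images/xxx.gif HTTP/1.0" 404 215',
]
entries = [e for e in map(parse_clf_line, log_lines) if e is not None]

# Tally status codes to spot errors such as 404.
status_counts = Counter(e["status"] for e in entries)
print(status_counts)
```

Returning None for lines that do not match lets a caller skip damaged entries instead of stopping on the first bad line.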
Extended Log File Format
206.58.83.186, -, 10/25/96, 11:26:35, W3SVC, MICRON, 206.58.83.186, 4750, 253, 111, 404, 2, GET, /simple.htm, Mozilla/3.0 Gold (WinNT; I), http://www.excite.com/servers, -,
206.58.83.186, -, 10/25/96, 11:26:35, W3SVC, MICRON, 206.58.83.186, 0, 271, 111, 404, 2, GET, /simple.htm, Mozilla/3.0 Gold (WinNT; I), http://www.excite.com/servers, -,
206.58.83.186, -, 10/25/96, 11:27:00, W3SVC, MICRON, 206.58.83.186, 0, 271, 111, 404, 2, GET, /simple.htm, Mozilla/3.0 Gold (WinNT; I), http://www.excite.com/servers, -,
Source: WebTrends Web Site
Extended Log File Format
1: Client’s IP address
2: Client’s Username
3: Date (mm/dd/yy)
4: Time
5: Service
6: Computer name
7: Server IP address
8: Processing time
9: Bytes received
10: Bytes sent
11: Status Code
12: WinNT status Code
13: Operation
14: Target file
15: Browser/Platform
16: Referring URL
17: script/dll variables
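The 17 positions above can be attached to a sample record with a few lines of code. This is a sketch that assumes the comma-separated layout of the WebTrends sample and that no field contains a comma; a browser string with an embedded comma would break the simple split. The key names in `FIELD_NAMES` are invented labels for the numbered fields.

```python
# Invented labels for the 17 numbered fields, in order.
FIELD_NAMES = [
    "client_ip", "username", "date", "time", "service", "computer_name",
    "server_ip", "processing_time", "bytes_received", "bytes_sent",
    "status_code", "winnt_status_code", "operation", "target_file",
    "browser_platform", "referring_url", "script_variables",
]


def parse_extended_line(line):
    """Split one comma-separated record into a dict; None if the count is off."""
    # Drop the trailing comma that ends each sample record, then split.
    values = [v.strip() for v in line.strip().rstrip(",").split(",")]
    if len(values) != len(FIELD_NAMES):
        return None  # field count mismatch, e.g. a comma inside a field
    return dict(zip(FIELD_NAMES, values))


sample = (
    "206.58.83.186, -, 10/25/96, 11:26:35, W3SVC, MICRON, "
    "206.58.83.186, 4750, 253, 111, 404, 2, GET, /simple.htm, "
    "Mozilla/3.0 Gold (WinNT; I), http://www.excite.com/servers, -,"
)
record = parse_extended_line(sample)
print(record["status_code"], record["target_file"])
```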
Configuring IIS Logging
To set up logging on IIS, right-click on the webserver and select “Properties.”
Best to select the W3C Extended format to capture the most information.
Extended Logging Properties
• Date -- Date on which the activity occurred.
• Time -- Time the activity occurred.
• Client IP Address -- IP address of the client that accessed your server.
• User Name -- Name of the user who accessed your server.
• Service Name -- Internet service that was running on the client computer.
• Server Name -- Name of the server on which the log entry was generated.
• Server IP -- IP address of the server on which the log entry was generated.
• Server Port -- Port number the client is connected to.
• Method -- Action the client was trying to perform (e.g., HTTP GET command).
• URI Stem -- Resource accessed (e.g., HTML page, CGI program, script).
• URI Query -- The query, if any, the client was trying to perform; that is, one or more search strings for which the client was seeking a match.
Extended Logging Properties
• HTTP Status -- Status of the action, in HTTP terms.
• Win32 Status -- Status of the action, in terms used by Windows NT.
• Bytes Sent -- Number of bytes sent by the server.
• Bytes Received -- Number of bytes received by the server.
• Time Taken -- Length of time the action took.
• Protocol Version -- Protocol (HTTP, FTP) version used by the client. For HTTP this will be either HTTP 1.0 or HTTP 1.1.
• User Agent -- Browser used on the client.
• Cookie -- Content of the cookie sent or received, if any.
• Referrer -- Site on which the user clicked on a link that brought the user to this site.
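Because the set of logged properties is configurable, a W3C Extended log file names its columns in a `#Fields:` directive, and a reader can use that line instead of hard-coding positions. The sketch below assumes space-separated values with no embedded spaces. The three sample lines are made up for illustration and do not come from a real server; `parse_w3c_extended` is a hypothetical helper.

```python
def parse_w3c_extended(lines):
    """Parse W3C Extended log lines into dicts keyed by the #Fields names."""
    fields = []
    records = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive lists the column identifiers for following lines.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split()
            if len(values) == len(fields):  # skip malformed lines
                records.append(dict(zip(fields, values)))
    return records


# Illustrative input only (invented for this sketch).
sample_log = [
    "#Version: 1.0",
    "#Fields: date time c-ip cs-method cs-uri-stem sc-status",
    "1996-10-25 11:26:35 206.58.83.186 GET /simple.htm 404",
]
records = parse_w3c_extended(sample_log)
print(records[0]["sc-status"])
```

Reading the directive means the same code keeps working when properties are added or removed in the logging configuration.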
Typical Uses of Web Statistics
• Measure banner ad hits
• Review error hits
• Match 404 error hits to referrers
• Identify/study most popular pages
• Identify/study least popular pages
• Identify common search requests
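Matching 404 errors to referrers amounts to counting (referrer, target) pairs among the failed requests. The sketch below assumes each log record has already been parsed into a dict with `status`, `target` and `referrer` keys; those key names and the `broken_link_sources` helper are hypothetical.

```python
from collections import Counter


def broken_link_sources(records):
    """Count (referrer, target) pairs for requests that returned 404."""
    pairs = Counter()
    for rec in records:
        # "-" marks a request with no referrer recorded.
        if rec["status"] == "404" and rec["referrer"] != "-":
            pairs[(rec["referrer"], rec["target"])] += 1
    return pairs


records = [
    {"status": "404", "target": "/simple.htm", "referrer": "http://www.excite.com/servers"},
    {"status": "404", "target": "/simple.htm", "referrer": "http://www.excite.com/servers"},
    {"status": "200", "target": "/index.htm", "referrer": "-"},
]
# List the worst offenders first.
for (referrer, target), count in broken_link_sources(records).most_common():
    print(count, referrer, "->", target)
```

The same counting approach, applied to the target alone and without the status filter, gives the most and least popular pages.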
Best Practices
• If hosting multiple virtual webservers on IIS, replace default log file locations with custom locations.
• Use extended log format and save as much data as possible. Use a log file converter program to dumb down the log format if necessary.
• Save your log files. Zip them up.
• If possible, don’t report statistics to management for periods shorter than a month.
• Round statistics to signal that they are approximations (e.g., 102,435 to 102,000 or even 100,000).
• Use a commercial product like WebTrends.
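The rounding advice can be sketched as a small helper that rounds a count to the nearest multiple of a chosen step. The name `round_for_report` and the default step of 1,000 are assumptions made for this example.

```python
def round_for_report(count, nearest=1000):
    """Round a non-negative count to the nearest multiple of `nearest`."""
    # Integer arithmetic: add half a step, then truncate to a whole step.
    return (count + nearest // 2) // nearest * nearest


raw_count = 102435
print(f"{round_for_report(raw_count):,}")           # nearest thousand
print(f"{round_for_report(raw_count, 100000):,}")   # nearest hundred thousand
```

Choosing the step is a judgment call: the coarser the step, the more clearly the figure reads as an estimate.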
Web Statistics Problems
• An inexact science…
• Can’t identify specific users.
• People want more than you can give.