Dynamic content WUCM 1 1
Basic architecture of the web
Dynamic web pages
• Stages:
  – Collect data from the user in the browser
    • Usually using an HTML form
  – Send the data in an HTTP request to the server
  – Server processes the request (dynamically)
    • Different models of how to do this
  – Server sends an HTTP response to the browser
  – Browser displays the response
HTML forms
    <FORM METHOD=GET ACTION="/cgi-bin/mycgi.bat">
    <p> Which lecture are you missing? (Give the lecture number not title)
    <INPUT NAME="WUCMI_unit" TYPE=text>
    <p> When will you come to collect them?
    <ol>
    <li>Next lecture <INPUT NAME="collect_type" TYPE=checkbox VALUE="NextLecture">
    <li>Next tutorial <INPUT NAME="collect_type" TYPE=checkbox VALUE="NextTutorial">
    <li>At 5:00 pm on Friday this week <INPUT NAME="collect_type" TYPE=checkbox VALUE="Friday">
    </ol>
    <p> Your CAM number? <INPUT NAME="cam_num" id="cam_num" SIZE=20>
    <hr>
    <p> <INPUT TYPE=submit> <INPUT TYPE=reset>
    </FORM>
Passing data to the server
• HTTP methods
  – GET for data retrieval
  – POST for data update
  – (and others we can ignore)
Passing data to the server
• Using GET
    GET /cgi-bin/mycgi.bat?WUCMI_unit=78&collect_type=NextTutorial&cam_num=67 HTTP/1.0
    <headers...>
• Using POST
    POST /cgi-bin/mycgi.bat HTTP/1.0
    <headers...>

    WUCMI_unit=78&collect_type=NextTutorial&cam_num=67
HTTP encoding
• Queries
  – ? separates path from query
• Parameters
  – & separates name/value pairs
  – = separates name and value
• Encoding
  – + for space
  – %xx for special characters (e.g. %7E for ~)
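The encoding rules can be tried out with Python's standard urllib.parse module. This is a minimal sketch: the field names follow the example form, while the values are made up to show how a space, an ampersand and an equals sign are treated.

```python
from urllib.parse import quote, urlencode

# Hypothetical values; only the field names come from the example form.
fields = {"WUCMI_unit": "78", "collect_type": "Next Tutorial", "cam_num": "A&B=1"}

# urlencode joins name=value pairs with & and encodes a space as +.
query = urlencode(fields)
print(query)

# quote() uses %xx escapes throughout; safe="" escapes every reserved character.
escaped = quote("a b&c=d?", safe="")
print(escaped)
```

The query string would be appended to the script path after a ? to form the request URL.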
Dynamic web pages
• Four models:
  – Server-side includes (SSI)
  – CGI
  – Server modules
  – Auxiliary servers
CGI architecture
CGI characteristics
• Web server creates a new process for each request that maps onto a program
• Data is passed to the program as laid down by the CGI specification
• Server reads the program's output from the program's standard output
• CGI spec: http://hoohoo.ncsa.uiuc.edu/cgi/
• Can use pretty much any programming language
  – best known: Perl, Python, C/C++
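The process-per-request behaviour can be imitated in a few lines of Python. The sketch below is not how any particular web server is implemented; `handle_request` and `CGI_PROGRAM` are hypothetical names, and the "server" is just a function that starts a fresh interpreter, passes the request details through the environment and collects whatever the program prints.

```python
import os
import subprocess
import sys

# A stand-in CGI program: reads one environment variable, writes to stdout.
CGI_PROGRAM = (
    "import os\n"
    "print('Content-Type: text/plain')\n"
    "print()\n"
    "print('query was ' + os.environ.get('QUERY_STRING', ''))\n"
)

def handle_request(query_string):
    """Run the program in a new process, as a CGI-style server would."""
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", CGI_PROGRAM],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout  # a real server would relay this to the browser

output = handle_request("WUCMI_unit=78")
print(output)
```

Each call to `handle_request` pays the cost of starting a new process, which is the main drawback listed for CGI.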
Pros and cons of CGI
• Pros:
  – Independent of server - if program crashes it cannot affect the server
  – The web server takes up less memory if it does not load any server modules
  – Any memory (or other resources) used by the CGI program is released when the CGI program terminates
• Cons:
  – The time to create a new process to handle the CGI request is relatively long
  – For programs that access databases, each new process must establish a new database connection
Server module
Server module characteristics
• Web server invokes interpreter via API for each request that maps onto a program
• Data passed via API
• Server gets output via API
• Popular for:
  – PHP
  – ASP.NET
  – Perl (as an alternative to CGI)
Pros and cons of server modules
• Pros:
  – No need to create a separate process, therefore faster
  – For programs that access databases, the server can maintain a persistent connection to a database, saving reconnection time
• Cons:
  – Server and program inextricably linked - a crash within the server module may crash the server
  – The web server will occupy more memory because of the size of the server module(s) it loads
  – If any server module needs a lot of memory, that memory will not be released (at least not until the server dies)
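The persistent-connection advantage can be sketched as follows. This is an illustration only: an in-memory SQLite database stands in for a real database server, and `handle_request` is a hypothetical handler that the server would call in-process for each request.

```python
import sqlite3

# Opened once when the "module" is loaded, then shared by every request.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE visits (path TEXT)")

def handle_request(path):
    """Serve one request using the long-lived connection."""
    connection.execute("INSERT INTO visits (path) VALUES (?)", (path,))
    (count,) = connection.execute("SELECT COUNT(*) FROM visits").fetchone()
    return f"{path} is visit number {count}"

first = handle_request("/a")
second = handle_request("/b")
print(first)
print(second)
```

A CGI program would instead have to open its connection afresh inside every new process.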
Auxiliary server
Auxiliary server characteristics
• Auxiliary server runs on a different TCP/IP port (and potentially on a different machine)
• Relevant requests forwarded by web server to auxiliary server
• Server passes response back
• Common for:
  – Java
  – PL/SQL (Oracle)
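A toy version of this arrangement, using only Python's standard library, might look like the sketch below. All names (`AuxiliaryHandler`, `start_auxiliary_server`, `forward`) are hypothetical; the auxiliary server listens on a port chosen by the operating system, and `forward` plays the part of the front-end web server passing a request on and returning the reply.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class AuxiliaryHandler(BaseHTTPRequestHandler):
    hits = 0  # state kept in the long-running process across requests

    def do_GET(self):
        AuxiliaryHandler.hits += 1
        body = f"request {AuxiliaryHandler.hits} for {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def start_auxiliary_server():
    """Start the auxiliary server on its own port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), AuxiliaryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def forward(server, path):
    """What the front-end server would do: resend the request, relay the reply."""
    port = server.server_address[1]
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path)
    reply = conn.getresponse().read().decode("ascii")
    conn.close()
    return reply

aux = start_auxiliary_server()
replies = [forward(aux, "/first"), forward(aux, "/second")]
aux.shutdown()
aux.server_close()
print(replies)
```

The counter survives between requests because the auxiliary process is not restarted, at the price of one extra HTTP exchange per request.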
Pros and cons of auxiliary servers
• Pros:
  – No need to create a new process for each request
  – Can maintain state (if desired) including database connections
  – Separate from the main web server
• Cons:
  – Overhead of resending HTTP requests and responses
Big benefits of auxiliary servers
• Enterprise scalability
  – add new web servers
  – add new auxiliary servers
  – cross-connect between them
  – fits in with database scalability
• Resilience and reliability
Web programming languages
• Programmatic
  – Output HTML in print statements
  – Use normal programming language constructs
  – Examples: Perl, Java (servlets), C/C++
  – Better when the complexity is in the data capture and processing
• Output-based
  – HTML page with programming statements embedded
  – Can require contrived programming language constructs
  – Examples: PHP, ASP, Java (Java Server Pages)
  – Better when the complexity is in the output format
Examples (both Java)
• Servlet:
    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html>");
        out.println("<head>");
        out.println("<title>Hello world servlet</title>");
        out.println("</head>");
        out.println("<body>");
        String name = request.getParameter("name");
        out.println("<P>Hello " + name);
        out.println("</body>");
        out.println("</html>");
        out.close();
    }
• Java Server Page:
    <%@page contentType="text/html"%>
    <html>
    <head><title>Hello JSP Page</title></head>
    <body>
    <P>Hello <%= request.getParameter("name") %>
    </body>
    </html>
CGI
Script interaction – CGI
• A CGI script is a program run on the server:
  – How can it get its input?
  – What should it do with its output?
• CGI scripts are loaded and executed at the request of the web server
• Server passes details about the request through environment variables
CGI input
• Environment variables pass details on:
  – The URL the script was called from
  – Request parameters passed from web browser
  – The HTTP method used
  – General information about the request
• In addition, input can come via the standard input (STDIN) if the method was POST
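As a rough sketch of how a script might pick up these details, the function below reads a few of the standard CGI variables from a mapping. `describe_request` is a hypothetical helper; under a real server the mapping would be `os.environ`, whereas here a hand-made dictionary is used so the example runs on its own.

```python
import os

def describe_request(environ=None):
    """Collect the request details a CGI script finds in its environment."""
    if environ is None:
        environ = os.environ  # what a script run by the server would see
    method = environ.get("REQUEST_METHOD", "GET")
    return {
        "method": method,
        "script": environ.get("SCRIPT_NAME", ""),
        "query": environ.get("QUERY_STRING", ""),
        "has_body": method == "POST",  # POST data arrives on STDIN
    }

# Hand-made environment standing in for the one the web server would set up.
sample = {"REQUEST_METHOD": "POST", "SCRIPT_NAME": "/cgi-bin/mycgi.bat"}
summary = describe_request(sample)
print(summary)
```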
CGI Output
• The script passes information back via standard output (STDOUT)
• Server will ensure that data presented to STDOUT is routed to the browser
• Server will pass the relevant data from the web browser to the script so that it can be read from STDIN
• But HOW? - via forms, e.g.
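A minimal sketch of the output side is given below. It relies on one detail from the CGI specification that the slide does not spell out: the script's output starts with header lines (at least a Content-Type), followed by a blank line, and only then the page itself. `write_response` is a hypothetical helper name.

```python
import io
import sys

def write_response(body_html, out=sys.stdout):
    """Write a minimal CGI response: header line, blank line, then the body."""
    out.write("Content-Type: text/html\n")
    out.write("\n")  # the blank line marks the end of the headers
    out.write(body_html)

# Capture the output in memory here; a real script would write to sys.stdout.
buffer = io.StringIO()
write_response("<html><body><p>Hello</p></body></html>", out=buffer)
response_text = buffer.getvalue()
print(response_text)
```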
Forms for dynamic programming
• The significant part of the HTML is:
    <FORM METHOD=GET ACTION="/cgi-bin/mycgi.bat">
• Notice that:
  – the FORM method is GET
  – the FORM action is to run the mycgi.bat program
GET as the form method
• Any data is passed as a “query string”
• ‘?’ separates the path from the query string
• Parameters are concatenated with ‘&’ to form the “query string”
    GET /cgi-bin/mycgi.bat?WUCMI_unit=78&collect_type=NextTutorial&cam_num=67 HTTP/1.0
• Complications:
  – The user may enter characters not permitted in URLs, like spaces or &, =, ?
  – Such characters are percent-encoded as % followed by the character's two-digit hexadecimal code, e.g. %20 for a space (its ASCII code); form data usually sends a space as + instead
• Decoding these is best left to library code
  – CGI.pm in the case of Perl
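In Python the corresponding library code is `urllib.parse.parse_qs`. The sketch below decodes two made-up query strings that use the field names of the example form; the values are invented to show an encoded space, an encoded ampersand and a repeated name.

```python
from urllib.parse import parse_qs

# Hypothetical query string with an encoded space (+) and ampersand (%26).
query_string = "WUCMI_unit=78&collect_type=Next+Tutorial&cam_num=A%26B"
params = parse_qs(query_string)
print(params)

# Two ticked checkboxes sharing one name arrive as a repeated parameter.
repeated = parse_qs("collect_type=NextLecture&collect_type=Friday")
print(repeated)
```

Every value is returned as a list because a name may occur more than once, as it does when several checkboxes with the same name are ticked.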
POST as the form method
• The data is sent in the request body
• The web server will present this data to the CGI script as input on STDIN
• Browser indicates how much data is being transferred, so that the CGI script knows how much to expect on STDIN
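Reading POST data might be sketched as follows. The slide does not name the mechanism, so note the assumption: under the CGI specification the length reaches the script in the CONTENT_LENGTH environment variable. `read_post_parameters` is a hypothetical helper, and the request is simulated with an in-memory byte stream.

```python
import io
from urllib.parse import parse_qs

def read_post_parameters(environ, stdin):
    """Read exactly CONTENT_LENGTH bytes of form data from stdin and decode it."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length).decode("ascii")
    return parse_qs(body)

# Simulated request; a real script would pass os.environ and sys.stdin.buffer.
raw = b"WUCMI_unit=78&collect_type=NextTutorial"
fake_environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(raw))}
post_params = read_post_parameters(fake_environ, io.BytesIO(raw + b"IGNORED"))
print(post_params)
```

Reading only the announced number of bytes matters because the script cannot rely on seeing end-of-file on STDIN.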
When to use which method?
• Use GET (data retrieval) for actions
  – that are safe
  – that are idempotent
  – where the total length of the URI is less than 256 chars
  – where it's OK for the parameters to be visible to the user
  – where it's OK for the parameters to be preserved as part of a bookmark
• Use POST (data update) for
  – everything else
Common CGI environment variables
• REQUEST_METHOD – how the script was called, i.e. GET or POST
• PATH_INFO – any extra path information that follows the script name in the URL
• PATH_TRANSLATED – PATH_INFO translated to a path in the server's file system
• QUERY_STRING – additional supplied parameters, if any
• SCRIPT_NAME – the URL path that identifies the script
Simple Apache CGI configuration
• CGI scripts are handled by mod_cgi
• Apache needs to be told:
  – Which directory contains scripts
  – How to recognise them as executable programs rather than files to be delivered
• The choice is basically:
  – Use ScriptAlias in the config file, setting up a safe directory outside the document tree
  – Use AddHandler or SetHandler to set a handler type of cgi-script for script files in a directory in the document tree
Example - ScriptAlias
• Assume:
  – A cgi-bin directory parallel to the htdocs directory in your web server's space
  – e.g. C:\Apache\Roger\cgi-bin
• Then the httpd.conf file could be as follows
Example httpd.conf file
    # permit access to cgi-bin directory
    # default deny from Prac 04
    <Directory "C:/Apache/Roger/cgi-bin">
        Options -Indexes +ExecCGI
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
    TransferLog "logs/access.log"
    ErrorLog "logs/error.log"
    LogLevel warn
    # tell Apache where cgi scripts are
    ScriptAlias /cgi-bin/ "C:/Apache/Roger/cgi-bin/"
    # tell Apache where to put the script errors
    ScriptLog "logs/script.log"
Marking scripts as executable
• This is OS specific
• Under Windows:
  – Usually determined by the extension: .cmd, .bat, .pl or .exe
• Under Unix:
  – chmod +x filename
  – #!/bin/perl as the first line
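For the Unix case, the effect of chmod +x can be reproduced from Python, which may be handy in a deployment script. This is a sketch only: `make_executable` is a hypothetical helper, the Perl one-liner is a placeholder script, and the permission check is meaningful on Unix-like systems rather than Windows.

```python
import os
import stat
import tempfile

def make_executable(path):
    """Add execute permission for user, group and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Create a throwaway script whose first line names its interpreter.
with tempfile.NamedTemporaryFile("w", suffix=".pl", delete=False) as script:
    script.write('#!/bin/perl\nprint "hello\\n";\n')
    script_path = script.name

make_executable(script_path)
is_executable = bool(os.stat(script_path).st_mode & stat.S_IXUSR)
os.remove(script_path)
print(is_executable)
```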
Debugging scripts
• Since CGI scripts run through Apache they are more difficult to debug
• For a simple test, e.g. filling in the form and then clicking “submit”, where do the error and debug messages go?
• Usual answer is the log files, depending on what has gone wrong:
  – ErrorLog (if there is an Apache config problem)
  – and ScriptLog
Debugging scripts
• Some CGI libraries, e.g. Perl CGI::Carp, direct error messages to the browser for debugging purposes
  – not useful for production systems
• For debugging messages, send output to STDERR
  – they will be added to ScriptLog
• Problems with testing CGI scripts from the command line
  – no web server to generate input
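One way round the command-line problem is to write the script so that its environment and streams are passed in, then supply them by hand in a test. The sketch below assumes that structure; `cgi_main` and the fake environment are hypothetical, and debug messages go to the stream that stands in for STDERR.

```python
import io
from urllib.parse import parse_qs

def cgi_main(environ, stdout, stderr):
    """A tiny CGI-style handler with its inputs and outputs passed in explicitly."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # Debug output goes to stderr, which the web server would write to a log.
    print(f"debug: params={params}", file=stderr)
    unit = params.get("WUCMI_unit", ["unknown"])[0]
    stdout.write("Content-Type: text/plain\n\n")
    stdout.write(f"Lecture requested: {unit}\n")

# Stand in for the web server: build the environment by hand and capture output.
fake_environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": "WUCMI_unit=78"}
captured_out, captured_err = io.StringIO(), io.StringIO()
cgi_main(fake_environ, captured_out, captured_err)
print(captured_out.getvalue())
```

When run by the server, the same function would be called with `os.environ`, `sys.stdout` and `sys.stderr`.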
A few security points
• Apache's privileges (and hence those of any CGI scripts it runs)
  – Apache must start as root to bind to port 80
• In a well configured server, once this initial binding is done, Apache will drop back to a very low privilege user (nobody, webuser etc)
• If, due to a poor configuration file, it does not, then any later executed CGI script will be running as root
  – potentially very dangerous
A few security points
• Editors used to edit CGI scripts
  – Often produce backup files with standard extensions, e.g. PFE uses .$$$
  – To protect against this, deny access to files with suspect extensions:
    <Files *$$$>
        Order allow,deny
        Deny from all
    </Files>
suEXEC and CGIWrap
• To increase the security when running CGI scripts, they are often “wrapped”
• The CGI wrappers can:
  – control the ownership of CGI scripts
  – subject the script to stringent security tests
• The two main alternatives are:
  – suEXEC – bundled with Apache
  – CGIWrap – produced by an independent group