PHP Internals
by Martin Kruliš (v 1.1), 29. 2. 2016

Revision of PHP Fundamentals
• Dynamic Nature of PHP
  ◦ Values
    - Exist in a managed memory space
    - Are created as literals, as results of expressions, or by internal constructs and functions
    - Have an explicit data type: boolean, integer, float, string, array, object, resource, or NULL
  ◦ Memory Management
    - Uses copy-on-write and reference counting
    - Values are not always copied on assignment
    - Once a value has zero references, it is released (reference cycles are handled by a separate cycle collector)
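
A minimal sketch of the value semantics described above, assuming PHP 7 or later. The copy-on-write separation itself is invisible to user code; the example only shows that modifying $b leaves $a intact and that unset() drops the variable.

```php
<?php
// Assigning an array does not copy it immediately; both variables
// share one value until one of them is modified (copy-on-write).
$a = [1, 2, 3];
$b = $a;        // shared, no copy yet
$b[] = 4;       // the write separates $b from $a

echo count($a), ' ', count($b), "\n"; // prints "3 4"

// Dropping the last reference to the separated copy releases it.
unset($b);
var_dump(isset($b)); // bool(false)
```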

Revision of PHP Fundamentals
• Dynamic Nature of PHP
  ◦ Variables
    - Mnemonic references to values
    - No declarations; a variable is created on its first assignment
    - Live in the global scope or in a local scope
    - Globals can be mapped into a local context (global keyword)
    - No explicit type (the type is determined by the current value)
    - The type can be changed by a new assignment
    - Existence can be tested (isset) and terminated (unset)
  ◦ Arrays
    - An array key behaves in many ways like a variable
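
A short illustration of variable creation, type change, isset()/unset() and the global keyword, assuming PHP 7 or later. The function name bump() and all variable names are hypothetical.

```php
<?php
$x = 42;                 // created by the first assignment, type int
$x = 'forty-two';        // the type changes with a new assignment
echo gettype($x), "\n";  // string

var_dump(isset($x));     // bool(true)
unset($x);
var_dump(isset($x));     // bool(false)

// Array keys behave similarly: they appear on assignment and can be unset.
$arr = [];
$arr['key'] = 1;
var_dump(isset($arr['key'])); // bool(true)
unset($arr['key']);
var_dump(isset($arr['key'])); // bool(false)

// The global keyword maps a global variable into the function's scope.
$counter = 0;
function bump() {        // hypothetical helper
    global $counter;
    $counter++;
}
bump();
echo $counter, "\n";     // 1
```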

Revision of PHP Fundamentals
• Implications
  ◦ Large data handling
    - Shared reading works fine thanks to CoW
    - An explicit unset() may release data that are no longer needed
  ◦ There are no pointers…
    - Some data structures depend on pointers/references
    - Instead of pointers:
      - The arrays are flexible enough
      - Objects are passed by handle, so they behave like references (as in C#/Java)
      - Variable variables
      - Explicit references
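
A sketch of how object handles stand in for pointers, assuming PHP 7 or later. The class Node and the functions setValue() and replace() are hypothetical names chosen for the example.

```php
<?php
// Hypothetical linked-list node built from object handles.
class Node {
    public $value;
    public $next = null;
    public function __construct($value) { $this->value = $value; }
}

// The function receives a copy of the handle, so it can modify
// the object that the caller sees...
function setValue(Node $n, $v) { $n->value = $v; }

// ...but re-assigning the parameter only changes the local copy of the handle.
function replace(Node $n) { $n = new Node('other'); }

$head = new Node(1);
$head->next = new Node(2);

setValue($head, 10);
replace($head);
echo $head->value, ' ', $head->next->value, "\n"; // prints "10 2"
```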

Variables
• Indirect Access to Values
  ◦ The name of one variable is stored in another variable
      $a = 'b';
      $$a = 42;         // the same as $b = 42;
      $a = 'b';
      $b = 'c';
      $c = 'd';
      $$$$a = 'hello';  // the same as $d = 'hello';
  ◦ Braces { } can be used to avoid ambiguous situations
  ◦ Can be used with members, functions, classes, …
      $obj = new $className();
      $obj->$varName = 42;
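
Braces matter because of parsing order: in PHP 7 and later, $$names[0] is read as ${$names}[0], whereas PHP 5 read it as ${$names[0]}. The sketch below spells both forms out explicitly; all variable names are hypothetical.

```php
<?php
$names = ['first', 'second'];
$first = 'A';

// ${$names[0]}: take $names[0] ('first') as a variable name, i.e. $first
echo ${$names[0]}, "\n"; // A

// ${$var}[1]: the variable named by $var, then its element at index 1
$var = 'names';
echo ${$var}[1], "\n";   // second

// Variable member names work the same way.
$obj = new stdClass();
$prop = 'answer';
$obj->$prop = 42;
echo $obj->answer, "\n"; // 42
```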

References
• References
  ◦ Similar to Unix hard links in a file system
  ◦ Multiple variables attached to the same data
    - A reference is taken by the & operator
  ◦ Independent of object references
    - A reference to an object can be created
      $a = 1;
      $b = &$a;
      $b++;
      echo $a; // prints 2
  [Figure: $a and $b both refer to the same int value, which changes from 1 to 2]

References in Functions
• Arguments as References
  ◦ Similar usage to the var keyword in Pascal
      function inc(&$x) { $x++; }
• Returning References
      function &findIt($what) {
          global $myArray;
          return $myArray[$what]; // no & in the return statement
      }
  ◦ The caller must bind the result with =& to keep the reference
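
A runnable version of both functions, assuming PHP 7 or later. The contents of $myArray are made up for the example. The & appears on the function declaration and at the call site, not in the return statement.

```php
<?php
function inc(&$x) { $x++; }

// Hypothetical data for the example.
$myArray = ['apples' => 1, 'pears' => 5];

// Returns a reference to an element of the global array.
function &findIt($what) {
    global $myArray;
    return $myArray[$what];
}

$n = 1;
inc($n);
echo $n, "\n";                 // 2

$ref = &findIt('pears');       // bind the returned reference
$ref = 50;                     // writes through to the array element
echo $myArray['pears'], "\n";  // 50
```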

References
• References vs. Pointers
      function foo(&$var) {
          $var = &$GLOBALS['bar'];
      }
      $x = 42;
      foo($x); // How is $x affected?
• The unset() Function
  ◦ Does not remove data, only the variable
  ◦ Data are removed when they are no longer referenced
• The global Declaration
      global $a;
      // is equivalent to
      $a = &$GLOBALS['a'];
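
One way to answer the question on the slide, assuming PHP 7 or later: $x is not affected. Inside foo(), the reference assignment rebinds the local name $var to the global $bar, which breaks its link to $x. The values and the extra write inside foo() are hypothetical additions for the demonstration.

```php
<?php
$bar = 'global bar';

function foo(&$var) {
    // Rebinds the local name $var to the global $bar; the caller's $x is left alone.
    $var = &$GLOBALS['bar'];
    $var = 'changed';   // now writes to the global $bar
}

$x = 42;
foo($x);
echo $x, ' / ', $bar, "\n"; // prints "42 / changed"

// unset() removes the variable, not the shared data.
$a = [1, 2, 3];
$b = &$a;
unset($a);
echo count($b), "\n";       // 3, the data survive through $b
```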

Revising PHP Functions
• Declaration
  ◦ The keyword function followed by an identifier
      function foo([args, …]) { … body … }
  ◦ Function body
    - Pretty much anything (even a function/class declaration)
    - Nested functions/classes are declared once the enclosing function is called for the first time
    - Functions are second-class citizens and the identifier space is flat
  ◦ Results
    - Optional argument of the return construct
    - Only one value, but it can be an array or an object
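
A sketch of a nested declaration, assuming PHP 7 or later; outer() and inner() are hypothetical names. Because the identifier space is flat, inner() becomes a global function once outer() has run.

```php
<?php
function outer() {
    // Declared in the global function namespace when outer() runs.
    function inner() {
        return 'inner called';
    }
    return [1, 2];  // a single return value, but it may be an array
}

var_dump(function_exists('inner')); // bool(false), not declared yet
$result = outer();
var_dump(function_exists('inner')); // bool(true)
echo inner(), "\n";                 // callable from anywhere now
// Calling outer() a second time would fail: inner() cannot be redeclared.
```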

Function Arguments
• Argument Declarations
  ◦ Default (implicit) values may be provided
      function foo($x, $y = 1, $z = 2) { … }
    - Arguments with default values are aligned to the right
    - Note that PHP functions do not support overloading
• Variable Number of Arguments
  ◦ Any function can be called with more arguments than formally declared
  ◦ The functions func_num_args(), func_get_arg(), and func_get_args() provide access to such arguments
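
A sketch of default arguments and of a function with a variable number of arguments built on func_num_args()/func_get_arg(), assuming PHP 7 or later; sum() is a hypothetical name.

```php
<?php
function foo($x, $y = 1, $z = 2) {
    return $x + $y + $z;
}

// No formal parameters; all arguments are read at run time.
function sum() {
    $total = 0;
    for ($i = 0; $i < func_num_args(); $i++) {
        $total += func_get_arg($i);
    }
    return $total;
}

echo foo(10), "\n";         // 13 ($y = 1, $z = 2)
echo foo(10, 5), "\n";      // 17
echo sum(1, 2, 3, 4), "\n"; // 10
```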

Variable Functions
• Indirect Calling
  ◦ Calling a function by its name stored in a variable
      function foo($x, $y) { … }
      $funcName = 'foo';
      $funcName(42, 54); // the same as foo(42, 54);
• Similar Constructs
  ◦ Using specialized invocation functions
    - call_user_func('foo', 42, 54);
    - call_user_func_array('foo', array(42, 54));

Function Specialities
• Testing Function Existence
  ◦ function_exists() – tests whether a given function exists
  ◦ get_defined_functions() – lists all function names
• Cleanup Functions
  ◦ register_shutdown_function() – registers a function, which is executed when the script finishes
• Special Case of a Left-side Function
  ◦ list() (a language construct) is used on the left side of assignments
    - Reverse logic – it fills its arguments
      list($a, $b, $c) = array(1, 2, 3);
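
A small demonstration of the functions listed above, assuming PHP 7 or later. The function name cleanup() is hypothetical, and 'no_such_func' is deliberately undefined.

```php
<?php
// list() on the left side distributes array elements into variables.
list($a, $b, $c) = array(1, 2, 3);
echo "$a $b $c\n"; // 1 2 3

var_dump(function_exists('strlen'));       // bool(true)
var_dump(function_exists('no_such_func')); // bool(false)

// 'internal' lists built-in functions, 'user' lists user-defined ones.
$fns = get_defined_functions();
var_dump(in_array('strlen', $fns['internal'])); // bool(true)

function cleanup() {           // hypothetical shutdown handler
    echo "script finished\n";  // runs after the rest of the script
}
register_shutdown_function('cleanup');
echo "main part done\n";
```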

Creating Functions
• Lambda (Nameless) Functions
  ◦ A unique name is generated automatically
  ◦ Function create_function()
    - Gets the arguments and the body as strings
    - Returns an identifier of the newly created function
    - The identifier is a string, which cannot collide with regular identifiers
  ◦ Useful in many situations
    - One-call functions
    - Call-back functions
      $mul = create_function('$x, $y', 'return $x * $y;');
      echo $mul(3, 4); // prints out '12'
  ◦ Deprecated since PHP 7.2, removed in PHP 8.0

Anonymous Functions
• Anonymous Functions
  ◦ A better way to implement nameless functions
      $fnc = function($args) { …body… };
  ◦ An anonymous function is an instance of Closure
    - It can be passed around like an object
  ◦ Visible variables must be explicitly listed
      $fnc = function(…) use ($var, &$refvar) { … };
    - These variables are captured in the closure
    - Variables captured by reference can be modified
• Example 1
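
A sketch of capturing by value versus by reference, assuming PHP 7 or later. The variable names are hypothetical; array_map() is used only as a convenient caller of the closure.

```php
<?php
$factor = 3;
$calls = 0;

// $factor is captured by value when the closure is created,
// $calls is captured by reference, so the closure can update it.
$scale = function ($x) use ($factor, &$calls) {
    $calls++;
    return $x * $factor;
};

$factor = 100;                       // no effect on the captured copy
$result = array_map($scale, [1, 2, 3]);

echo implode(', ', $result), "\n";   // 3, 6, 9
echo $calls, "\n";                   // 3
var_dump($scale instanceof Closure); // bool(true)
```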

Discussion

PHP Strings
Charsets, text processing, and regular expressions

Select Your Charset
• One Charset to Rule Them All
  ◦ HTML, PHP, database (connection), text files, …
  ◦ Determined by the language(s) used
    - Unicode covers almost every language
  ◦ Convert incoming data early and outgoing data late
• Charset in Meta-data
  ◦ Must be in the HTTP headers
      header('Content-Type: text/html; charset=utf-8');
  ◦ Do not use the HTML meta element with http-equiv
    - Except in special cases (like saving an HTML file locally)

Multi-byte Strings
• Multibyte Character Encoding
  ◦ Some charsets (e.g., UTF-8, UTF-16, …)
  ◦ Standard string functions are byte-oriented (ANSI based)
    - They treat each byte as a character
• Multibyte String Functions Library
  ◦ A standard extension, often present in PHP installations
  ◦ Duplicates most of the standard string functions, but with the prefix mb_ (mb_strlen, mb_strpos, …)
  ◦ Encoding conversions: mb_convert_encoding()
  ◦ mb_internal_encoding() – specifies the internal encoding used in PHP
• Example 2
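
A sketch contrasting byte-based and multibyte functions, assuming the mbstring extension is installed and the source file is saved as UTF-8. ISO-8859-2 is chosen only because it covers Czech letters.

```php
<?php
mb_internal_encoding('UTF-8');

$name = 'Kruliš';                  // 'š' takes two bytes in UTF-8

echo strlen($name), "\n";          // 7, counts bytes
echo mb_strlen($name), "\n";       // 6, counts characters
echo mb_strtoupper($name), "\n";   // KRULIŠ

// Converting to a single-byte charset.
$latin2 = mb_convert_encoding($name, 'ISO-8859-2', 'UTF-8');
echo strlen($latin2), "\n";        // 6, one byte per character
```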

Data Encoding
• Encoding Input Data from HTTP
  ◦ Usually done transparently
    - Check the "mbstring" section of php.ini
  ◦ Can be done manually: mb_parse_str()
• Databases
  ◦ The database or the database connection usually needs to be configured
  ◦ An example for the MySQL database
    - mysqli_set_charset()

Comparisons and Conversions
• Lexicographical Comparison of Strings
  ◦ Best done elsewhere (in a DBMS, for instance)
  ◦ The strcmp() function is binary safe (it compares bytes, not letters)
  ◦ Locale-aware comparison (strcoll()) requires the locale to be set correctly (setlocale())
• Iconv Library
  ◦ An alternative to the Multibyte String Functions
  ◦ Fewer functions
  ◦ Easier encoding conversions
    - Can deal with missing mappings and replacements
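
A sketch of binary comparison and of an iconv() round trip, assuming the iconv extension is available and the source file is UTF-8. ISO-8859-2 is again an arbitrary choice of target charset.

```php
<?php
// strcmp() compares raw bytes, so 'Z' (0x5A) sorts before 'a' (0x61).
var_dump(strcmp('Z', 'a') < 0);       // bool(true)
// It is binary safe: bytes after an embedded NUL are compared too.
var_dump(strcmp("a\0b", "a\0c") < 0); // bool(true)

// iconv() converts between encodings.
$utf8 = 'Kruliš';
$latin2 = iconv('UTF-8', 'ISO-8859-2', $utf8);
echo strlen($latin2), "\n";                                // 6
var_dump(iconv('ISO-8859-2', 'UTF-8', $latin2) === $utf8); // bool(true)
```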

Input Verification/Sanitization
• What to Verify or Sanitize
  ◦ Everything that possibly comes from users: $_GET, $_POST, $_COOKIE, …
  ◦ Data that come from external sources (database, text files, …)
• When to Verify or Sanitize
  ◦ On input – verify correctness
    - Before you start using data in $_GET, $_POST, …
  ◦ On output – sanitize to prevent injections
    - When data are inserted into HTML, SQL queries, …

Input Verification/Sanitization
• How to Verify
  ◦ Regular expressions
  ◦ Filter functions
    - filter_input(), filter_var(), …
    - Useful for special validations (e-mail, URL, IP, …)
• How to Sanitize
  ◦ String and filter functions, regular expressions
  ◦ htmlspecialchars() – encoding for HTML
  ◦ urlencode() – encoding for URLs
  ◦ DBMS-specific functions (mysqli_escape_string())
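
A sketch of verifying on input and encoding on output, assuming PHP 7 or later. The input strings and the search.php URL are hypothetical. filter_input() is left out because it reads the real request, which is empty when the script runs from the command line.

```php
<?php
// Verification on input: filter_var() returns the (converted) value or false.
var_dump(filter_var('42', FILTER_VALIDATE_INT));                 // int(42)
var_dump(filter_var('4x2', FILTER_VALIDATE_INT));                // bool(false)
var_dump(filter_var('user@example.com', FILTER_VALIDATE_EMAIL)); // string(16) "user@example.com"
var_dump(filter_var('not an e-mail', FILTER_VALIDATE_EMAIL));    // bool(false)

// Sanitization on output: encode data for the target context.
$comment = '<b>"bold"</b> & more';
echo htmlspecialchars($comment), "\n"; // &lt;b&gt;&quot;bold&quot;&lt;/b&gt; &amp; more
echo 'search.php?q=' . urlencode('a b&c'), "\n"; // search.php?q=a+b%26c
```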

Regular Expressions
• String Search Patterns
  ◦ Special syntax that encodes a program (language) for a finite automaton
  ◦ Simple to use
    - The encoding is (mostly) human readable
  ◦ POSIX and Perl (PCRE) standards
• Usage
  ◦ Searching strings, listing matches
  ◦ Find and replace
  ◦ Splitting a string into an array of strings

Regular Expression Syntax
• Expression
  ◦ <separator>expr<separator>modifiers
  ◦ The separator is a single character (usually /, #, %, …)
  ◦ Pattern modifiers are flags that affect the evaluation
• Base Syntax
  ◦ A sequence of atoms
  ◦ An atom can be
    - A simple (non-meta) character (letter, number, …)
    - A dot (.), which represents any character (except a newline, by default)
    - A list of characters in [] ([abc], [0-9a-z_], …)

Regular Expression Syntax
• Important Meta-characters
  ◦ \ – an escaping character for other meta-characters
  ◦ Anchors ^, $ mark the start/end of a string/line
    - ^ at the beginning of a character class definition inverts the set
  ◦ [, ] – character class definition
  ◦ {, } – min/max quantifier: atom{n}, atom{min,max}
    - [0-9]{8} (8-digit number), .{1,9} (1-9 characters)
  ◦ (, ) – subpattern (treated like an atom)
  ◦ *, +, ? – repetitions, shorthand notations of {0,}, {1,}, and {0,1} respectively
  ◦ | – branches (ptrn1|ptrn2)
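
A few patterns exercising delimiters, anchors, classes, quantifiers and branches, assuming PHP 7 or later with PCRE. PCRE expects quantifiers written without spaces, e.g. {1,9}. The subject strings are hypothetical.

```php
<?php
// Delimiters: '/' in most patterns, '#' in the third; 'i' is a modifier.
var_dump(preg_match('/^[0-9]{8}$/', '12345678'));   // int(1), exactly 8 digits
var_dump(preg_match('/^[0-9]{8}$/', '1234567'));    // int(0)
var_dump(preg_match('#^https?://#i', 'HTTP://x'));  // int(1), '?' makes 's' optional

// ^ inside a class inverts it; \ escapes a meta-character.
var_dump(preg_match('/^[^0-9]+$/', 'abc'));         // int(1), no digits
var_dump(preg_match('/^[0-9]+\.[0-9]+$/', '3.14')); // int(1), literal dot
var_dump(preg_match('/^(cat|dog)s?$/', 'dogs'));    // int(1), branch + optional s
```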

Regular Expression Syntax
• Character Classes
  ◦ Pre-defined classes identified by names [:name:]
    - For example, [ab[:digit:]] matches a, b, and 0-9
  ◦ alpha – letters
  ◦ digit – decimal digits
  ◦ alnum – letters and digits
  ◦ blank – horizontal whitespace (space and tab)
  ◦ space – any whitespace (including line breaks)
  ◦ lower, upper – lowercase/uppercase letters
  ◦ cntrl – control characters
  ◦ xdigit – hexadecimal digits
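
A sketch using named classes, assuming PHP with PCRE, where the [:name:] form is valid only inside a bracket expression. The subject strings are hypothetical.

```php
<?php
var_dump(preg_match('/^[[:xdigit:]]+$/', 'DeadBeef')); // int(1)
var_dump(preg_match('/^[ab[:digit:]]+$/', 'ab2016'));  // int(1)
var_dump(preg_match('/^[ab[:digit:]]+$/', 'abc'));     // int(0), 'c' is not in the set

// Collapse any run of whitespace (tabs, newlines, spaces) into one space.
echo preg_replace('/[[:space:]]+/', ' ', "one\t two\nthree"), "\n"; // one two three
```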

Regular Expression Syntax
• Modifiers
  ◦ i – case insensitive
  ◦ m – multiline mode (^, $ match the start/end of a line)
  ◦ s – '.' also matches a newline character
  ◦ x – ignore whitespace in the regex (except in character class constructs)
  ◦ S – more extensive performance optimizations (has no effect since PHP 7.3)
  ◦ U – switch to non-greedy (ungreedy) evaluation
    - Greedy evaluation means that patterns with *, +, or ? try to match as many characters as possible
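
A sketch of greedy versus non-greedy matching and of the i and m modifiers, assuming PHP 7 or later; the HTML snippet is hypothetical.

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// Greedy: .* runs to the last possible '</b>'.
preg_match('/<b>(.*)<\/b>/', $html, $greedy);
echo $greedy[1], "\n";   // one</b> and <b>two

// U modifier: quantifiers become non-greedy and stop at the first '</b>'.
preg_match('/<b>(.*)<\/b>/U', $html, $lazy);
echo $lazy[1], "\n";     // one

// i and m: case-insensitive match of a whole line inside a multi-line string.
var_dump(preg_match('/^two$/im', "one\nTWO\nthree")); // int(1)
```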

Regular Expression Syntax
• Subpatterns
  ◦ To ensure correct operation precedence
      (one|two|three){1,3}
  ◦ To add modifiers to only a part of the expression
      (?modifiers:ptrn)
  ◦ To mark important parts of the expression
    - Used to retrieve parts of a string after matching
    - Named subpatterns: (?<name>ptrn) or (?'name'ptrn)
    - Non-capturing subpatterns (not retrieved after matching): (?:ptrn)
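
A sketch combining named, non-capturing and inline-modifier subpatterns, assuming PHP 7 or later; the date format and the group names year and month are hypothetical.

```php
<?php
// Named groups, an optional non-capturing day part, and (?i:...) for "utc".
$ptrn = '/(?<year>[0-9]{4})-(?<month>[0-9]{2})(?:-[0-9]{2})? (?i:utc)/';
preg_match($ptrn, '2016-02-29 UTC', $m);

echo $m['year'], ' ', $m['month'], "\n"; // 2016 02
echo $m[1], ' ', $m[2], "\n";            // named groups are also numbered: 2016 02
var_dump(isset($m[3]));                  // bool(false), (?:...) is not captured
```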

Regular Expression Example
• E-mail Verification (RFC 2822)
      (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Regular Expression Functions
• preg_match($ptrn, $subj [, &$matches])
  ◦ Searches a given string using a regex
  ◦ Returns 1 if the pattern matches the subject, 0 if it does not (false on error)
  ◦ The $matches array gathers the matched substrings of the subject with respect to the expression and its subpatterns
    - Subpatterns are indexed from 1
    - Index 0 holds the match of the entire expression
    - Named subpatterns are indexed associatively by their names
      "6 eggs, 3 spoons of oil, 250 g of flour" ~ /[[:digit:]]+/
      array(1) { [0] => string("6") }
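
A sketch based on the recipe string from the slide, assuming PHP 7 or later. preg_match() stops at the first match; preg_match_all() is the companion function for listing all matches.

```php
<?php
$recipe = '6 eggs, 3 spoons of oil, 250 g of flour';

$found = preg_match('/[[:digit:]]+/', $recipe, $matches);
var_dump($found);           // int(1)
echo $matches[0], "\n";     // 6, only the first match

// With subpatterns, indexes 1 and 2 hold the captured parts.
preg_match('/([[:digit:]]+) g of ([a-z]+)/', $recipe, $m);
echo $m[1], ' ', $m[2], "\n"; // 250 flour

// preg_match_all() lists every match.
preg_match_all('/[[:digit:]]+/', $recipe, $all);
echo implode(', ', $all[0]), "\n"; // 6, 3, 250
```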

Regular Expression Functions
• preg_replace($ptrn, $repl, $str)
  ◦ Searches and replaces substrings in a string
    - Each match of the pattern is replaced
    - The replacement may contain references to subpatterns
• preg_split($ptrn, $str [, $limit])
  ◦ Similar to the explode() function
  ◦ Splits a string into an array of strings
  ◦ The pattern is used to match delimiters
    - Delimiters are not part of the result
• Example 3
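
A sketch of preg_replace() with subpattern references and of preg_split(), assuming PHP 7 or later; the subject strings are hypothetical.

```php
<?php
// $1 and $2 in the replacement refer to subpatterns: "29.2." becomes "2/29".
$date = preg_replace('/([0-9]+)\.([0-9]+)\./', '$2/$1', 'Date: 29.2.');
echo $date, "\n"; // Date: 2/29

// Split on commas with optional surrounding whitespace.
$parts = preg_split('/[[:space:]]*,[[:space:]]*/', 'eggs , oil,flour');
echo implode('|', $parts), "\n"; // eggs|oil|flour

// The optional limit caps the number of pieces.
$two = preg_split('/,/', 'a,b,c', 2);
echo implode('|', $two), "\n";   // a|b,c
```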

POSIX Regular Expressions
• Differences
  ◦ The expression is not enclosed by separators
    - No modifiers can be added
  ◦ Only simple subpatterns
  ◦ Only a few escape sequences
• Functions
  ◦ ereg(), ereg_replace(), split()
  ◦ Each function has an -i version (case insensitive)
    - eregi() – case-insensitive version of ereg()
  ◦ Deprecated since PHP 5.3, removed in PHP 7.0

Discussion