Following is a merge of two letters I sent to firstname.lastname@example.org,
describing the changes in API between PHP 3.0 and PHP 4.0 (Zend).
This file is by no means thorough documentation of the PHP API,
and is intended for developers who are familiar with the PHP 3.0 API,
and want to port their code to PHP 4.0, or take advantage of its new
features. For highlights about the PHP 3.0 API, consult apidoc.txt.
I'm going to try to list the important changes in API and programming
techniques that are involved in developing modules for PHP4/Zend, as
opposed to PHP3. Listing the whole PHP4 API is way beyond my scope here,
it's mostly a 'diff' from the apidoc.txt, which you're all pretty familiar
An important note that I neglected to mention yesterday - the php4 tree is
based on the php 3.0.5 tree, plus all 3.0.6 patches hand-patched into it.
Notably, it does NOT include any 3.0.7 patches. All of those have to be
reapplied, with extreme care - modules should be safe to patch (mostly),
but anything that touches the core or main.c will almost definitely require
changes in order to work properly.
 Symbol Tables
One of the major changes in Zend involves changing the way symbols tables
work. Zend enforces reference counting on all values and resources. This
required changes in the semantics of the hash tables that implement symbol
tables. Instead of storing pval in the hashes, we now store pval *. All
of the API functions in Zend were changed in a way that this change is
completely transparent. However, if you've used 'low level' hash functions
to access or update elements in symbol tables, your code will require
changes. Following are two simple examples, one demonstrates the
difference between PHP3 and Zend when reading a symbol's value, and the
other demonstrates the difference when writing a value.
_php3_hash_find(ht, "foo", sizeof("foo"), &foo);
/* foo->type is the type and foo->value is the value */
_php3_hash_find(ht, "foo", sizeof("foo"), &foo);
/* (*foo)->type is the type and (*foo)->value is the value */
newval.type = ...;
newval.value = ...;
_php3_hash_update(ht, "bar", sizeof("bar"), &newval, sizeof(pval), NULL);
pval *newval = ALLOC_ZVAL();
newval->type = ...;
newval->value = ...;
_php3_hash_update(ht, "bar", sizeof("bar"), &newval, sizeof(pval *), NULL);
One of the 'cute' things about the reference counting support is that it
completely eliminates the problem of resource leaking. A simple loop that
included '$result = mysql_query(...)' in PHP leaked unless the user
remembered to run mysql_free($result) at the end of the loop body, and
nobody really did. In order to take advantage of the automatic resource
deallocation upon destruction, there's virtually one small change you need
to conduct. Change the result type of a resource that you want to destroy
itself as soon as its no longer referenced (just about any resource I can
think of) as IS_RESOURCE, instead of as IS_LONG. The rest is magic.
A special treatment is required for SQL modules that follow MySQL's
approach for having the link handle as an optional argument. Modules that
follow the MySQL module model, store the last opened link in a global
variable, that they use in case the user neglects to explicitly specify a
link handle. Due to the way referenec counting works, this global
reference is just like any other reference, and must increase that SQL link
resource's reference count (otherwise, it will be closed prematurely).
Simply, when you set the default link to a certain link, increase that
link's reference count by calling zend_list_addref().
As always, the MySQL module is the one used to demonstrate 'new
technology'. You can look around it and look for IS_RESOURCE, as well as
zend_list_addref(), to see a clear example of how the new API should be used.
 Thread safety issues
I'm not going to say that Zend was designed with thread safety in mind, but
from some point, we've decided upon several guidelines that would make the
move to thread safety much, much easier. Generally, we've followed the PHP
3.1 approach of moving global variables to a structure, and encapsulating
all global variable references within macros. There are three main
1. We grouped related globals in a single structure, instead of grouping
all globals in one structure.
2. We've used much, much shorter macro names to increase the readability
of the source code.
3. Regardless of whether we're compiling in thread safe mode or not, all
global variables are *always* stored in a structure. For example, you
would never have a global variable 'foo', instead, it'll be a property of a
global structure, for example, compiler_globals.foo. That makes
development much, much easier, since your code will simply not compile
unless you remember to put the necessary macro around foo.
To write code that'll be thread safe in the future (when we release our
thread safe memory manager and work on integrating it), you can take a look
at zend_globals.h. Essentially, two sets of macros are defined, one for
thread safe mode, and one for thread unsafe mode. All global references
are encapsulated within ???G(varname), where ??? is the appropriate prefix
for your structure (for example, so far we have CG(), EG() and AG(), which
stand for the compiler, executor and memory allocator, respectively).
When compiling with thread safety enabled, each function that makes use of
a ???G() macro, must obtain the pointer to its copy of the structure. It
can do so in one of two forms:
1. It can receive it as an argument.
2. It can fetch it.
Obviously, the first method is preferable since it's much quicker.
However, it's not always possible to send the structure all the way to a
particular function, or it may simply bloat the code too much in some
cases. Functions that receive the globals as an argument, should look like
rettype functioname(???LS_D) <-- a function with no arguments
rettype functioname(type arg1, ..., type argn ???LS_DC) <-- a funciton with
Calls to such functions should look like this:
functionname(???LS_C) <-- a function with no arguments
functionname(arg1, ..., argn ???LS_CC) <-- a function with arguments
LS stands for 'Local Storage', _C stands for Call and _CC stands for Call
Comma, _D stands for Declaration and _DC stands for Declaration Comma.
Note that there's NO comma between the last argument and ???LS_DC or ???LS_CC.
In general, every module that makes use of globals should use this approach
if it plans to be thread safe.
 Generalized INI support
The code comes to solve several issues:
a. The ugly long block of code in main.c that reads values from the
cfg_hash into php3_ini.
b. Get rid of php3_ini. The performance penalty of copying it around all
the time in the Apache module probably wasn't too high, but
psychologically, it annoyed me :)
c. Get rid of the ugly code in mod_php4.c, that also reads values from
Apache directives and puts them into the php3_ini structure.
d. Generalize all the code so that you only have to add an entry in one
single place and get it automatically supported in php3.ini, Apache, Win32
registry, runtime function ini_get() and ini_alter() and any future method
we might have.
e. Allow users to easily override *ANY* php3.ini value, except for ones
they're not supposed to, of course.
I'm happy to say that I think I pretty much reached all goals. php_ini.c
implements a mechanism that lets you add your INI entry in a single place,
with a default value in case there's no php3.ini value. What you get by
using this mechanism:
1. Automatic initialization from php3.ini if available, or from the
default value if not.
2. Automatic support in ini_alter(). That means a user can change the
value for this INI entry at runtime, without you having to add in a single
line of code, and definitely no additional function (for example, in PHP3,
we had to add in special dedicated functions, like
set_magic_quotes_runtime() or the likes - no need for that anymore).
3. Automatic support in Apache .conf files.
4. No need for a global php3_ini-like variable that'll store all that
info. You can directly access each INI entry by name, in runtime. 'Sure,
that's not revolutionary, it's just slow' is probably what some of you
think - which is true, but, you can also register a callback function that
is called each time your INI entry is changed, if you wish to store it in a
cached location for intensive use.
5. Ability to access the current active value for a given INI entry, and
the 'master' value.
Of course, (2) and (3) are only applicable in some cases. Some entries
shouldn't be overriden by users in runtime or through Apache .conf files -
you can, of course, mark them as such.
So, enough hype, how does it work.
static PHP_INI_MH(OnChangeBar); /* declare a message handler for a change
in "bar" */
PHP_INI_ENTRY("foo", "1", PHP_INI_ALL, NULL, NULL)
PHP_INI_ENTRY("bar", "bah", PHP_INI_SYSTEM, OnChangeBar, NULL)
a_global_var_for_bar = new_value;
and that's it. Here's what it does. As you can probably guess, this code
registers two INI entries - "foo" and "bar". They're given defaults "1"
and "bah" respectively - note that all defaults are always given as
strings. That doesn't reduce your ability to use integer values, simply
specify them as strings. "foo" is marked so that it can be changed by
anyone at any time (PHP_INI_ALL), whereas "bar" is marked so it can be
changed only at startup in the php3.ini only, presumably, by the system
When "foo" changes, no function is called. Access to it is done using the
macros INI_INT("foo"), INI_FLT("foo") or INI_STR("foo"), which return a
long, double or char * respectively (strings that are returned aren't
duplicated - if they're manipulated, you must duplicate them first). You
can also access the original value (the 'master' value, in case one of them
was overriden by a user) using another pair of macros:
INI_ORIG_INT("foo"), INI_ORIG_FLT("foo") and INI_ORIG_STR("foo").
When "bar" changes, a special message handler is called, OnBarChange().
Always declare those message handlers using PHP_INI_MH(), as they might
change in the future. Message handlers are called as soon as an ini entry
initializes or changes, and allow you to cache a certain INI value in a
quick C structure. In this example, whenever "bar" changes, the new value
is stored in a_global_var_for_bar, which is a global char * pointer,
quickly accessible from other functions. Things get a bit more complicated
when you want to implement a thread-safe module, but it's doable as well.
Message handlers may return SUCCESS to acknowledge the new value, or
FAILURE to reject it. That enables you to reject invalid values for some
INI entries if you want. Finally, you can have a pointer passed to your
message handler - that's the fifth argument to PHP_INI_ENTRY(). It is
passed as mh_arg to the message handler.
Remember that for certain values, there's really no reason to mess with a
callback function. A perfect example for this are the syntax highlight
colors, which no longer have a dedicated global C slot that stores them,
but instead, are fetched from the php_ini hash on demand.
"As always", for a perfect working example of this mechanism, consult
functions/mysql.c. This module uses the new INI entry mechanism, and was
also converted to be thread safe in general, and in its php_ini support in
particular. Converting your modules to look like this for thread safety
isn't a bad idea (not necessarily now, but in the long run).