SOQ (SOcket Queue): an easy-to-use, reliable thin layer for PHP sockets


Why?

Because plain socket programming in PHP is easy to get wrong, and naive code
breaks at the edge cases. The problem arises from one particular design choice
of stream (e.g. TCP) sockets: they provide no facility to detect message
boundaries. While this does not seem like much of a problem at first, every
real-world application developer eventually has to face this limitation.
If you are interested in the motivation behind this project, skip to the
section titled Problems with sockets.

Do I need to use SOQ?

SOQ is like a FIFO (First-In-First-Out) queue implemented on top of PHP sockets.
If you are trying to do IPC (interprocess communication) with PHP sockets and
don't want to use something heavy like ZeroMQ, then SOQ is for you. It provides
just enough abstraction to keep the application lightweight and self-contained,
yet lets you take full advantage of sockets.

Features

  • Ability to send/receive arbitrary serializable PHP objects.
  • Reliable in the face of network congestion.
  • Support for both blocking and non-blocking mode.
  • Exception based handling of irregular situations, such as
    connection termination.

Requirements

  • PHP 5.3.0 or greater.
  • The PHP sockets extension enabled.

Example usage

Here is an echo server implemented with SOQ; it capitalizes each message before
sending it back. The source for server.php is -

<?php
require_once(__DIR__.'/soq/soq.class.php');

$HOST = "localhost";
$PORT = 9876;

# Typical socket boilerplate.
#-----------------------------------------------------------------------\
# Create a server socket.
$ssock = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
if ($ssock === false) { exit("Failed to create socket."); }

# Bind the socket to a port.
$retcode = socket_bind($ssock, $HOST, $PORT);
if ($retcode === false) { exit("Failed to bind to port."); }

# Mark the socket as listening (backlog of 1 pending connection).
$retcode = socket_listen($ssock, 1);
if ($retcode === false) { exit("Failed to listen on socket."); }

# Block until a client connects; $csock is used to talk to that client.
$csock = socket_accept($ssock);
if ($csock === false) { exit("Error while accepting connection."); }
#-----------------------------------------------------------------------/


# Create a so-cue object.
$soq = new vimmaniac\soq\Soq($csock);

$msg = "";
while (strtolower($msg) !== "quit"){
    # Receive a message (in blocking mode).
    $msg = $soq->recv(True);

    # After capitalizing, echo the message (in blocking mode).
    $soq->send(strtoupper($msg), True);
}


# Close the sockets.
socket_close($csock);
socket_close($ssock);

The source for client.php is -

<?php
require_once(__DIR__.'/soq/soq.class.php');

$HOST = "localhost";
$PORT = 9876;

# Typical socket boilerplate code.
#------------------------------------------------------------------\
# Create a socket.
$csock = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
if ($csock === false) { exit("Failed to create socket."); }

# Connect to server.
$retcode = socket_connect($csock, $HOST, $PORT);
if ($retcode === false) { exit("Failed to connect to server."); }
#------------------------------------------------------------------/


# Create a so-cue object.
$soq = new vimmaniac\soq\Soq($csock);

echo "Welcome to capitalizing terminal. Type 'quit' to exit ...\n";
while (True){
    try {
        # Take msg from user.
        $msg = trim(fgets(STDIN));

        # Forward the message to the server, in blocking mode.
        $soq->send($msg, True);

        # Receive the transformed message.
        echo $soq->recv(True)."\n";

        # Leave once the user has asked to quit.
        if (strtolower($msg) === "quit") { break; }
    }

    # We quit in case our server closes connection.
    catch (vimmaniac\soq\SocketConnectionClosed $e){ break; }
}


# Close the socket.
socket_close($csock);

The server can be started with php -f server.php and the client with
php -f client.php. Once the client is running, the user is presented with a
terminal that capitalizes every line they type in. More examples can be found
in the examples directory of the source.

Problems with sockets

Lack of message boundaries is the Achilles' heel of sockets. To demonstrate
what can go wrong, let's look at a simple client and server program that
transfers two short messages. For brevity, I am going to ignore error handling
completely. The source for server.php is -

<?php
// filename: server.php
// A simple TCP server. For brevity we are going to ignore
// error handling completely.

$MSG1 = "hello";
$MSG2 = "world";
$HOST = "localhost";
$PORT = 5000;

// Create a server socket and listen for connections.
$ssock = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_bind($ssock, $HOST, $PORT);
socket_listen($ssock, 1);

// Accept a connection and create the client socket
// to talk to the remote peer.
$csock = socket_accept($ssock);

// Send the two messages, one right after the other.
socket_write($csock, $MSG1);
socket_write($csock, $MSG2);

// Close the sockets.
socket_close($csock);
socket_close($ssock);

/* End of File */

And source for corresponding client.php -

<?php
// filename: client.php
// A simple TCP client. Error handling is omitted to keep
// it simple.

$HOST = "localhost";
$PORT = 5000;

$MAX_READ_SIZE = 2048;

// Create a client socket and connect with the server.
$csock = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_connect($csock, $HOST, $PORT);

// Read the messages sent by the server.
$msg1 = socket_read($csock, $MAX_READ_SIZE);
$msg2 = socket_read($csock, $MAX_READ_SIZE);

// Print the messages in a line.
echo $msg1."+".$msg2."\n";

// Close the socket.
socket_close($csock);

/* End of File */

If you start the server with php -f server.php in a terminal, and then start
the client in another terminal with php -f client.php, the client should
print the text hello+world and exit, right? Looks simple enough!

But if you actually run the programs, you will most likely see helloworld+
instead of hello+world! (The second read returns an empty string, because the
server has already closed the connection.) Not only that, depending on the
performance and load on your networking stack, you may also see splits such as
h+elloworld, he+lloworld, hel+loworld, and so on.

This seemingly odd behavior exists because TCP is a byte-stream protocol: the
stack does not care when or how a message was sent, and it puts no internal
boundary between messages. All it guarantees is that bytes arrive in the order
they were sent, so we can never get something totally inconsistent like
world+hello. On the other hand, this freedom lets the stack batch, split, and
resend data as it sees fit, which helps performance on a fast network and
robustness on a slow one. In our case, both messages sent by the server came
back in a single read, which is actually better for throughput (think about
it)! The downside is that it becomes very hard for an application programmer
to tell two messages apart.
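To see the effect without two terminals, the same two writes can be replayed inside a single process. This is a minimal sketch, assuming a Unix-like system and the sockets extension: it uses an AF_UNIX pair from socket_create_pair() instead of a real TCP connection. Since both writes are buffered before the first read, it should show the helloworld+ case.

```php
<?php
// Sketch: reproduce the missing message boundary inside one process.
// Assumes a Unix-like OS (AF_UNIX socket pair) and the sockets extension.
$pair = array();
if (!socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair)) {
    exit("socket_create_pair failed.\n");
}
list($writer, $reader) = $pair;

// Two separate writes ...
socket_write($writer, "hello");
socket_write($writer, "world");

// ... come back as one chunk: both are already buffered, and a stream
// socket keeps no boundary between them.
$msg1 = socket_read($reader, 2048);

// Once the peer has closed, socket_read() returns an empty string.
socket_close($writer);
$msg2 = socket_read($reader, 2048);

echo $msg1 . "+" . $msg2 . "\n";   // typically prints helloworld+
socket_close($reader);
```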

Working around lack of message boundary

The most common approach to work around this problem is to declare a particular
character, or sequence of characters, as the message boundary. As an example,
we could declare the percentage sign (%) to be the message boundary and append
a % to the end of every message while sending. On the receiving side, we
append all incoming data to a buffer; whenever the buffer contains a percentage
sign, we split it at the first occurrence and take everything before it as one
message. This approach is efficient, except that the data itself can no longer
contain that character. For example, if we send the message 100% is just 1, our
implementation will treat it as two separate messages - 100 and is just 1.

socket_read() in PHP_NORMAL_READ mode works on a similar idea: reading stops at a newline (\n) or carriage return (\r), which act as the message boundary.
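The % delimiter scheme can be sketched over an in-memory buffer, which keeps the example free of sockets. The helper names frame_delim() and split_delim() are made up for illustration and are not part of SOQ. The sketch simulates one read delivering hello%wor and the next delivering ld%, then shows the 100% problem.

```php
<?php
// Sketch: '%'-delimited framing over an in-memory buffer.
// frame_delim() and split_delim() are hypothetical helpers, not SOQ's API.

const DELIM = '%';

// Sender side: append the boundary character to the message.
function frame_delim($msg)
{
    return $msg . DELIM;
}

// Receiver side: remove and return every complete message in $buffer.
// Bytes after the last '%' belong to an unfinished message and stay put.
function split_delim(&$buffer)
{
    $messages = array();
    while (($pos = strpos($buffer, DELIM)) !== false) {
        $messages[] = (string) substr($buffer, 0, $pos);
        $buffer = (string) substr($buffer, $pos + 1);
    }
    return $messages;
}

$wire = frame_delim("hello") . frame_delim("world");   // "hello%world%"

$buffer = substr($wire, 0, 9);     // first read delivers "hello%wor"
$first = split_delim($buffer);     // array("hello"); "wor" stays buffered
$buffer .= substr($wire, 9);       // second read delivers "ld%"
$second = split_delim($buffer);    // array("world")
echo implode("+", array_merge($first, $second)), "\n";   // hello+world

// The weakness: a '%' inside the data is taken for a boundary.
$buffer = frame_delim("100% is just 1");
$broken = split_delim($buffer);    // array("100", " is just 1")
```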

Workaround to include the message boundary in the data

A way to work around the problem of having the message boundary itself in the
data is to replace that particular character with something else. For example,
if our data is Hype: 100%, it is transformed to Hype: 100__PERCENT__ before
sending, and the inverse transformation is applied on the receiving side to
recover the original data. But if the sequence __PERCENT__ itself appears in
the data, the receiver turns it into a % that was never sent, silently
corrupting the message! This approach is really dangerous if you intend to
transfer arbitrary data through your system.
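A sketch of the substitution idea, using the __PERCENT__ placeholder from the text; escape_pct() and unescape_pct() are hypothetical names chosen for this example. The second round trip shows how a literal __PERCENT__ in the data comes back altered.

```php
<?php
// Sketch: escape the boundary character instead of forbidding it.
// escape_pct() / unescape_pct() are hypothetical helper names.

// Sender side: hide every '%' behind a placeholder.
function escape_pct($data)
{
    return str_replace('%', '__PERCENT__', $data);
}

// Receiver side: turn every placeholder back into '%'.
function unescape_pct($data)
{
    return str_replace('__PERCENT__', '%', $data);
}

// Round trip works for ordinary data.
$ok = unescape_pct(escape_pct("Hype: 100%"));          // "Hype: 100%"

// But data that already contains the placeholder is silently changed.
$bad = unescape_pct(escape_pct("literal __PERCENT__ token"));
// $bad is now "literal % token"
```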

A more robust workaround is to choose an encoding for the data whose output can
never contain the message boundary character. For example, if we choose base64
as the data encoding and % as the message boundary, everything will work as
intended, since % is not part of the base64 alphabet. The only downside is that
such encoding schemes are generally space inefficient: base64 turns every
3 bytes into 4 characters, expanding the data to roughly 133% of its size.
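Combining base64 with the % boundary could look like the following sketch; frame_b64() and split_b64() are hypothetical helpers. Both messages that broke the plain delimiter scheme now survive intact, and the last line shows the size cost on a 300-byte input.

```php
<?php
// Sketch: base64-encode each message, then use '%' as the boundary.
// base64's alphabet (A-Z, a-z, 0-9, '+', '/', '=') never contains '%',
// so any payload, including binary data or a literal '%', is safe.
// frame_b64() / split_b64() are hypothetical helper names.

function frame_b64($msg)
{
    return base64_encode($msg) . '%';
}

// Decode and return every complete message; keep partial data buffered.
function split_b64(&$buffer)
{
    $messages = array();
    while (($pos = strpos($buffer, '%')) !== false) {
        $messages[] = base64_decode(substr($buffer, 0, $pos));
        $buffer = (string) substr($buffer, $pos + 1);
    }
    return $messages;
}

$buffer = frame_b64("100% is just 1") . frame_b64("Hype: 100%");
$messages = split_b64($buffer);   // both messages come back intact

// Size cost: every 3 input bytes become 4 output characters.
$raw = str_repeat("x", 300);
echo strlen($raw), " -> ", strlen(base64_encode($raw)), "\n";   // 300 -> 400
```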

Another approach: send the message length along the message

Another way to approach the problem is to prepend the length of each message to
the message itself. As the receiver knows the exact length of the data before
it is even received, it can cut the buffer without any ambiguity about message
boundaries. In such a scheme, let's reserve the first three bytes/characters
for the message length, written as zero-padded decimal digits (which caps a
message at 999 bytes). Then our example messages will be transmitted as -
005hello and 005world.
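The three-character length prefix can be sketched the same way, again over an in-memory buffer, with hypothetical helpers frame_len3() and split_len3(). The simulated first read stops in the middle of the second header, to show that incomplete data simply waits in the buffer until the rest arrives.

```php
<?php
// Sketch of the three-character decimal length prefix.
// frame_len3() / split_len3() are hypothetical helpers, not SOQ's API.

function frame_len3($msg)
{
    if (strlen($msg) > 999) {
        throw new InvalidArgumentException("Message longer than 999 bytes.");
    }
    return sprintf("%03d", strlen($msg)) . $msg;
}

// Return every complete message in $buffer; partial ones stay buffered.
function split_len3(&$buffer)
{
    $messages = array();
    while (strlen($buffer) >= 3) {
        $len = (int) substr($buffer, 0, 3);
        if (strlen($buffer) < 3 + $len) {
            break;   // payload not fully received yet
        }
        $messages[] = (string) substr($buffer, 3, $len);
        $buffer = (string) substr($buffer, 3 + $len);
    }
    return $messages;
}

$wire = frame_len3("hello") . frame_len3("world");   // "005hello005world"

$buffer = substr($wire, 0, 10);   // first read stops at "005hello00"
$first = split_len3($buffer);     // array("hello"); "00" stays buffered
$buffer .= substr($wire, 10);     // second read delivers "5world"
$second = split_len3($buffer);    // array("world")
```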

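This method is reliable and space efficient (apart from a fixed overhead per
message), at the cost of being a little more difficult to implement: a single
read may return only part of a length header or payload, so the receiver has
to keep buffering until each piece is complete. SOQ is an implementation of
this idea.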
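To give a feel for that extra work, here is a sketch of length-prefixed messaging over an actual socket. It is not SOQ's implementation: all helper names are hypothetical, it assumes a 4-byte big-endian binary prefix (pack('N')) instead of three decimal digits, and it uses serialize() for the payload so that any serializable PHP value can be sent. The demo uses an AF_UNIX socket pair, so it needs a Unix-like system and the sockets extension.

```php
<?php
// Sketch: length-prefixed messages over a real socket.
// NOT SOQ's actual code; write_all(), read_exactly(), send_msg() and
// recv_msg() are hypothetical names. A 4-byte big-endian prefix
// (pack('N')) replaces the 3-digit one, lifting the 999-byte limit.

// socket_write() may write fewer bytes than asked, so loop until done.
function write_all($sock, $data)
{
    while (strlen($data) > 0) {
        $n = socket_write($sock, $data);
        if ($n === false) {
            throw new RuntimeException(socket_strerror(socket_last_error($sock)));
        }
        $data = (string) substr($data, $n);
    }
}

// socket_read() may return fewer bytes than asked, so loop until $len.
function read_exactly($sock, $len)
{
    $buf = "";
    while (strlen($buf) < $len) {
        $chunk = socket_read($sock, $len - strlen($buf));
        if ($chunk === false) {
            throw new RuntimeException(socket_strerror(socket_last_error($sock)));
        }
        if ($chunk === "") {
            throw new RuntimeException("Connection closed mid-message.");
        }
        $buf .= $chunk;
    }
    return $buf;
}

// Frame = 4-byte payload length + serialized value.
function send_msg($sock, $value)
{
    $payload = serialize($value);
    write_all($sock, pack("N", strlen($payload)) . $payload);
}

// Read the header first, then exactly that many payload bytes.
function recv_msg($sock)
{
    $header = unpack("N", read_exactly($sock, 4));
    return unserialize(read_exactly($sock, $header[1]));
}

$pair = array();
if (!socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair)) {
    exit("socket_create_pair failed.\n");
}
send_msg($pair[0], "hello");
send_msg($pair[0], array("n" => 100, "s" => "100% is just 1"));
$a = recv_msg($pair[1]);   // "hello"
$b = recv_msg($pair[1]);   // the array, with its '%' intact
socket_close($pair[0]);
socket_close($pair[1]);
```

The two loops are where the difficulty lies: both socket_write() and socket_read() may transfer fewer bytes than requested, so neither call can be trusted to move a whole frame at once. Note also that unserialize() should not be fed data from an untrusted peer.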