bitblaze committed 0677aa0

Finish report and 438 final project submission.


Files changed (1)

doc/438-report/final-report.tex

 
 We decided to implement the server in the Go programming language. Although the language is relatively young, it provides features that are ideal for writing event-driven distributed applications, most notably the \emph{channel} abstraction, which provides synchronous message passing, and concurrent programming features like \emph{goroutines}, an abstraction of lightweight threads. Furthermore, Go is statically and explicitly typed but has many easy-to-use modern features inspired by dynamic languages, such as maps and closures.
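+
+As a brief, self-contained illustration of these features (this snippet is not taken from our codebase), a goroutine can pass messages to another goroutine over a channel with no explicit locking:
+\begin{verbatim}
+package main
+
+import "fmt"
+
+func main() {
+    // A channel provides synchronous message passing between goroutines.
+    msgs := make(chan string)
+
+    // A goroutine is a lightweight thread started with the go keyword.
+    go func() {
+        msgs <- "prepare"
+        msgs <- "accept"
+        close(msgs)
+    }()
+
+    // Receiving blocks until a message is sent, so no locks are needed.
+    for m := range msgs {
+        fmt.Println("received:", m)
+    }
+}
+\end{verbatim}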
 
-The Ringmaster client library was written in C and was loosely based on the architecture of the ZoKeeper C client library. ZooKeeper in fact provides two libraries, a multithreaded version and a single-threaded version where the event loop is exposed. We decided to only implement a multithreaded library for Ringmaster because that will be the most common use case such as for our test harness; furthermore, Calvin utilizes the multithreaded version of the ZooKeeper client library. Finally, while ZooKeeper's client library provides both blocking and nonblocking functions, we only provide the asynchronous functions. While a simpler test client may have been more suited for the short time frame between the start of the project and the writing of the report, developing this more sophisticated and higher performance client library will provide much greater utility especially for the next steps of the project.
+The Ringmaster client library was written in C and was loosely based on the architecture of the ZooKeeper C client library. ZooKeeper in fact provides two libraries: a multithreaded version and a single-threaded version in which the event loop is exposed. We decided to implement only a multithreaded library for Ringmaster because that will be the most common use case, such as for our test harness; furthermore, Calvin uses the multithreaded version of the ZooKeeper client library. Finally, while ZooKeeper's client library provides both blocking and nonblocking functions, we provide only the asynchronous functions. While a simpler test client may have been better suited to the short time frame between the start of the project and the writing of this report, developing this more sophisticated, higher-performance client library will provide much greater utility, especially for the next steps of the project.
 
 Early in the development of our project, we looked into different options for message serialization and deserialization. We found that ZooKeeper uses a data serialization library called Jute, which turned out to be the same as Hadoop Records. Due to the low adoption of Jute and because Calvin had been using Google Protocol Buffers, we initially decided to use Protocol Buffers as well. However, after exploring Protobuf-C and GoProtobufs, the C and Go Protocol Buffers bindings respectively, we discovered that due to the immaturity of these two packages and the lack of facilities such as unmarshalling from a stream (which is provided by the C++ and Java packages), using Protocol Buffers caused more difficulties than it solved. As a result, we defined our own message formats and wrote our own serialization/deserialization packages.
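+
+To give a flavor of the approach (the header layout shown here is illustrative and not our actual wire format, which uses a larger fixed header), each message is framed with a small fixed-size header carrying its type and length:
+\begin{verbatim}
+package wire
+
+import (
+    "encoding/binary"
+    "io"
+)
+
+// Encode frames a payload with a fixed header containing the message
+// type and the payload length, then writes the result to w.
+func Encode(w io.Writer, msgType uint32, payload []byte) error {
+    header := make([]byte, 8)
+    binary.BigEndian.PutUint32(header[0:4], msgType)
+    binary.BigEndian.PutUint32(header[4:8], uint32(len(payload)))
+    if _, err := w.Write(header); err != nil {
+        return err
+    }
+    _, err := w.Write(payload)
+    return err
+}
+
+// Decode reads one framed message from r and returns its type and payload.
+func Decode(r io.Reader) (uint32, []byte, error) {
+    header := make([]byte, 8)
+    if _, err := io.ReadFull(r, header); err != nil {
+        return 0, nil, err
+    }
+    msgType := binary.BigEndian.Uint32(header[0:4])
+    length := binary.BigEndian.Uint32(header[4:8])
+    payload := make([]byte, length)
+    if _, err := io.ReadFull(r, payload); err != nil {
+        return 0, nil, err
+    }
+    return msgType, payload, nil
+}
+\end{verbatim}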
 
 
 \section{Server Framework}
 
-Sherwin to describe the awesomeness of the server framework.
+The purpose of each Ringmaster server is to run a Paxos consensus protocol in conjunction with the other servers, which we call peers. To accomplish this, each server needs to perform the following roles: 1) establish a network connection with all other servers and with the client; 2) receive requests from the client; 3) forward client requests to the Paxos protocol and participate in consensus with the other servers; and 4) respond to the client with the results of the proposal.
+
+The Ringmaster server consists of two core components: the network layer and the Paxos layer. The network layer encapsulates all server-to-server communication, exposing a simple interface for message delivery and receipt. Each Paxos implementation represents a version of the Paxos consensus protocol, defining its own behavior and message formats. The network layer handles all communication on behalf of Paxos.
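+
+A minimal sketch of how the two components fit together is shown below; the type, field, and method names are illustrative rather than our exact API:
+\begin{verbatim}
+package server
+
+// Message is the network-layer envelope carrying an opaque payload
+// between two peers.
+type Message struct {
+    From    int
+    To      int
+    Payload []byte
+}
+
+// Network hides all server-to-server communication behind two channels.
+type Network struct {
+    Incoming chan Message // messages received from peers
+    Outgoing chan Message // messages queued for delivery to peers
+}
+
+// Paxos is the interface that each consensus implementation satisfies;
+// ClassicPaxos is one such implementation.
+type Paxos interface {
+    // HandleMessage is invoked with the payload of every
+    // server-to-server message delivered by the network layer.
+    HandleMessage(from int, payload []byte)
+    // Propose starts a round of consensus for a client request.
+    Propose(value []byte)
+}
+\end{verbatim}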
+
+On top of the network and Paxos components, the server maintains communication with the client and handles server-to-server traffic, each in its own goroutine. The server maintains a TCP connection with the client and processes the client's incoming messages accordingly. It also waits on \texttt{network.Incoming}, a Go channel of messages, which the server processes in order of arrival. Processing these network-layer messages usually entails unwrapping the payload and passing it to the Paxos layer.
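+
+In simplified form, and reusing the types sketched above, this dispatch loop looks roughly as follows (error handling and shutdown are omitted):
+\begin{verbatim}
+package server
+
+// run multiplexes client requests and peer messages. clientReqs is fed
+// by the goroutine that reads the client's TCP connection; the network
+// layer feeds netw.Incoming.
+func run(netw *Network, px Paxos, clientReqs chan []byte) {
+    for {
+        select {
+        case req := <-clientReqs:
+            // Forward the client request to the Paxos layer.
+            px.Propose(req)
+        case msg := <-netw.Incoming:
+            // Unwrap the network envelope and hand the payload
+            // to the Paxos layer.
+            px.HandleMessage(msg.From, msg.Payload)
+        }
+    }
+}
+\end{verbatim}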
+
+\subsection{Network}
+
+We implemented a separate network layer to handle server-to-server communication partly for the ease of imposing simulated conditions later on, since the network conditions we can actually observe are limited by our testing environment setup. By separating the communication layer from the rest of the server, we are able to simulate different network conditions (additional delays, loss of messages) that might be reflective of real-world situations (e.g., servers deployed on different continents; variable server-to-server latency). With our design, we can add arbitrary delays and loss rates to particular edges to simulate these conditions.
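+
+For instance, a per-edge delay and drop rate could be applied just before a message is handed to the real send path, roughly as sketched below (illustrative names, reusing the \texttt{Message} type from the earlier sketch):
+\begin{verbatim}
+package server
+
+import (
+    "math/rand"
+    "time"
+)
+
+// LinkConditions describes the simulated properties of one directed
+// edge between a pair of peers.
+type LinkConditions struct {
+    Delay    time.Duration // added latency per message
+    LossRate float64       // probability in [0, 1] of dropping a message
+}
+
+// deliver applies the simulated conditions, then hands the message to
+// the real send function.
+func deliver(m Message, cond LinkConditions, send func(Message)) {
+    if rand.Float64() < cond.LossRate {
+        return // simulate a lost message
+    }
+    time.Sleep(cond.Delay) // simulate added latency
+    send(m)
+}
+\end{verbatim}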
+
+The current implementation opens two TCP connections between each pair of peers, one for each direction (all peers run on different hosts). Each server first starts a listening socket and then begins to dial each of its peers. Setup is not complete until all TCP connections are established. To ensure messages are not interleaved, the network uses two Go channels, one for incoming and one for outgoing network messages. For each accepted connection, a goroutine is started to read network messages from the connection. That goroutine pushes each message it reads onto the incoming channel, which can then be read by the server.
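+
+The per-connection reader can be sketched as follows (illustrative names, reusing the \texttt{Network} and \texttt{Message} types from above; the real implementation also handles peer identification and connection setup):
+\begin{verbatim}
+package server
+
+import (
+    "encoding/binary"
+    "io"
+    "net"
+)
+
+// readLoop runs in its own goroutine for each accepted connection. It
+// reads length-prefixed messages off the TCP stream and pushes them
+// onto the shared incoming channel in order of arrival.
+func (n *Network) readLoop(peer int, conn net.Conn) {
+    for {
+        var length uint32
+        if err := binary.Read(conn, binary.BigEndian, &length); err != nil {
+            return // connection closed or broken
+        }
+        payload := make([]byte, length)
+        if _, err := io.ReadFull(conn, payload); err != nil {
+            return
+        }
+        n.Incoming <- Message{From: peer, Payload: payload}
+    }
+}
+\end{verbatim}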
+
+The network also supports send, which sends an arbitrary payload (wrapped in a network message) to a single peer, and broadcast, which sends a message to all peers. For send, a message with the destination peer and desired payload is created and pushed onto the outgoing channel. For broadcast, the same is done for every peer destination. Finally, a dedicated goroutine writes messages from the outgoing channel to the appropriate outgoing TCP connection.
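+
+A sketch of these operations, again with illustrative names, is given below; the single writer goroutine is what keeps outgoing messages from interleaving on a connection:
+\begin{verbatim}
+package server
+
+import (
+    "encoding/binary"
+    "net"
+)
+
+// Send queues a payload for delivery to a single peer.
+func (n *Network) Send(to int, payload []byte) {
+    n.Outgoing <- Message{To: to, Payload: payload}
+}
+
+// Broadcast queues a payload for delivery to every peer.
+func (n *Network) Broadcast(peers []int, payload []byte) {
+    for _, p := range peers {
+        n.Send(p, payload)
+    }
+}
+
+// writeLoop runs in one dedicated goroutine. It pops messages off the
+// outgoing channel and writes each to the TCP connection of its
+// destination peer, framed with a length prefix as in readLoop.
+func (n *Network) writeLoop(conns map[int]net.Conn) {
+    for m := range n.Outgoing {
+        conn, ok := conns[m.To]
+        if !ok {
+            continue // unknown destination; drop the message
+        }
+        if err := binary.Write(conn, binary.BigEndian, uint32(len(m.Payload))); err != nil {
+            continue // write errors elided in this sketch
+        }
+        conn.Write(m.Payload)
+    }
+}
+\end{verbatim}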
 
 \section{Client Library}
 
 
 A typical run is described as follows. First, a call to \texttt{ringmaster\_init()} allocates a new \texttt{RgHandle} object. The host and port passed to the function are resolved and the resulting address is stored in the handle. Then, the multithreaded I/O initializer function, \texttt{mtio\_init()}, is called. Here, the four queues are initialized, a self-pipe is created (which will be discussed later), and the event completion and I/O threads are forked.
 
-When the I/O thread is started, it first registers the receiving end of the self-pipe with \texttt{epoll\_ctl()}, then attempts to open a socket connection to the Ringmaster server. Once the socket is established, the socket is set to be non-blocking and then also registered with \texttt{epoll}. Then, the I/O thread enters its main loop and waits for I/O events with \texttt{epoll\_wait()}. At the same time, the event completion thread enters its main loop where it locks the inbound completion queue, checks the queue for any available callbacks to process, then sleeps using \texttt{pthread\_cond\_wait()}. 
+When the I/O thread is started, it first registers the receiving end of the self-pipe with \texttt{epoll\_ctl()}, then attempts to open a TCP socket connection to the Ringmaster server. Once the socket is established, the socket is set to be non-blocking and then also registered with \texttt{epoll}. Then, the I/O thread enters its main loop and waits for I/O events with \texttt{epoll\_wait()}. At the same time, the event completion thread enters its main loop where it locks the inbound completion queue, checks the queue for any available callbacks to process, then sleeps using \texttt{pthread\_cond\_wait()}. 
 
 Each message is represented with a struct that is densely packed with the \texttt{\_\_attribute\_\_((packed))} GCC attribute. This way, the binary representation of the struct is exactly what we expect, since the width of each field is specified exactly and the compiler does not perform its usual padding to align fields to four-byte boundaries. All messages have fixed 12-byte headers that contain the length of the entire message. Fields that are variable length are declared, unusually, as one-element \texttt{char} arrays. This is inspired by a trick used by PostgreSQL when allocating shared memory and is particularly useful when receiving messages, where the actual length of the message is not known until the length is extracted from the header. Once the length is known, for an $n$-byte message body, $n-1$ bytes are then allocated past the end of the array so that the entire message may be contained, once again by abuse of C's weak typing.
 
 
 \section{Results and Discussion}
 
-The first consensus protocol we implemented was Classic Paxos \cite{lamport:simplepaxos}. The next steps will involve implement the two other notable members of the Paxos family, Fast Paxos \cite{lamport:fastpaxos} and Multi-Paxos \cite{chandra:multipaxos}.
+The first consensus protocol we implemented was Classic Paxos \cite{lamport:simplepaxos}. As with all consensus protocols that are to be implemented with our framework, the Classic Paxos protocol is implemented in each server as a handler that is invoked when the server receives a server-to-server message. The message formats for the Classic Paxos algorithm are all defined in the \texttt{ClassicPaxos} type. When a \texttt{ClassicPaxos} object receives a message, it deserializes the message, determines which kind of message it is, and passes it to the correct handler method. Each handler method handles a different step in the voting process, following Lamport's description of the Classic Paxos algorithm.
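+
+The dispatch can be sketched as follows; the message-kind constants, decoder, and handler names here are illustrative rather than our exact definitions:
+\begin{verbatim}
+package server
+
+// Message kinds used by the Classic Paxos implementation.
+const (
+    msgPrepare  = iota // phase 1a: proposer requests promises
+    msgPromise         // phase 1b: acceptor promises a ballot
+    msgAccept          // phase 2a: proposer asks acceptors to accept
+    msgAccepted        // phase 2b: acceptor reports its acceptance
+)
+
+// HandleMessage deserializes an incoming payload and dispatches it to
+// the handler for the corresponding step of the voting process.
+// decodePaxosMessage and the handle* methods are placeholders for the
+// actual deserialization and per-step handlers.
+func (p *ClassicPaxos) HandleMessage(from int, payload []byte) {
+    kind, body := decodePaxosMessage(payload)
+    switch kind {
+    case msgPrepare:
+        p.handlePrepare(from, body)
+    case msgPromise:
+        p.handlePromise(from, body)
+    case msgAccept:
+        p.handleAccept(from, body)
+    case msgAccepted:
+        p.handleAccepted(from, body)
+    }
+}
+\end{verbatim}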
+
+Due to the current state of the server framework, client library, and test harness, we have not yet tested the implementation. However, while reasoning through the design and implementation, we encountered an issue that will need to be resolved before our Classic Paxos implementation is fully correct. According to Lamport, a proposal is considered ``committed'' once a quorum of acceptors have accepted it during the second round of voting. Consider, however, the case where a minority of acceptors accept and commit a proposal $n$, while the majority have already promised a higher-numbered proposal $n+1$ and thus do not accept $n$. Clearly, $n$ should not be committed because a quorum has not accepted it. Yet, due to network latency, the minority nodes may believe $n$ is committed, and if a client connected to one of the minority nodes performs a \texttt{retr} during that time, it will read an inconsistent value. Lamport states that the learning step should be used to remedy this issue: a quorum of acceptors that have accepted the proposal notify the rest of the nodes so that the proposal is agreed upon by all of them. However, we have yet to devise a practical method of implementing this step, and it remains part of the next phase of our project.
+
 
 \section{Conclusions}
 
-While our project is overall in its early stages, we are optimistic about the direction of our project. The immediate next steps will be to complete the implementations of the client library, the server, and the correct implementation of Classic Paxos. Then, the test harness will be implemented so that preliminary test benchmarks on Classic Paxos may be implemented. Then, we will implement Fast Paxos and Multi-Paxos under the same framework and use our test harness to benchmark those protocols. In the process, we will seek to refine our framework. We then plan to study, implement, and test other consensus protocols such as Generalized Paxos \cite{lamport:generalizedpaxos}, which is used to maintain consistency between replicated state machines, and an improved Fast Paxos \cite{charron:improvingfast}, which seeks to minimize message round trips and rounds of voting while assuming fewer crash failures. With the completion of this project, we hope to submit our work for publication. Ultimately, we hope to obtain and present a greater and more comprehensive understanding of the strengths and weaknesses of the the many different consensus protocols and Paxos variants so that it may be better and more clearly understood which kinds of application different consensus protocols are best suited for. It will then be possible to use this survey to more optimally achieve consensus for distributed systems such as Calvin.
+While our project is overall in its early stages, we are optimistic about its direction. The immediate next steps will be to complete the implementations of the client library, the server, and a correct implementation of Classic Paxos. Then, the test harness will be implemented so that preliminary benchmarks of Classic Paxos may be run. Next, we will implement Fast Paxos \cite{lamport:fastpaxos} and Multi-Paxos \cite{chandra:multipaxos} under the same framework and use our test harness to benchmark those protocols. In the process, we will seek to refine our framework. We then plan to study, implement, and test other consensus protocols such as Generalized Paxos \cite{lamport:generalizedpaxos}, which is used to maintain consistency between replicated state machines, and an improved Fast Paxos \cite{charron:improvingfast}, which seeks to minimize message round trips and rounds of voting while assuming fewer crash failures. With the completion of this project, we hope to submit our work for publication. Ultimately, we hope to obtain and present a more comprehensive understanding of the strengths and weaknesses of the many different consensus protocols and Paxos variants, so that it may be more clearly understood which kinds of applications each protocol is best suited for. It will then be possible to use this survey to more optimally achieve consensus for distributed systems such as Calvin.
 
 \bibliographystyle{abbrv}
 \bibliography{final-report-sources}