# Commits

committed 7b18aff

Updated analysis

• Parent commits e28d7c5
• Branches master

# File analysis.txt

` # 2. Simulations are doomed to succeed`
` # -----------------------------------------------------------------------------`
` `
`-For our purposes, this is a useful (if not "good") simulation as it ensures that the bottleneck is locking (ie, not CPU usage, data IO, etc), therefore producing cleaner results. One big problem is transactions in real systems do not take a constant amount of time (If I were to guess, I'd suspect a Poisson distribution would be more likely). Furthermore....`
`+For our purposes, this is a useful (if not "good") simulation, as it ensures `
`+that the bottleneck is locking (i.e., not CPU usage, data IO, etc.), therefore `
`+producing cleaner results. One big problem is that transactions in real `
`+systems do not take a constant amount of time (if I were to guess, I'd suspect `
`+a Poisson distribution would be more likely). Furthermore, given that joins `
`+and aggregations are often the most costly operations in a real system, simply `
`+having the transaction sleep may not simulate transactional costs as `
`+accurately as issuing a pattern of reads and writes similar to those of such `
`+operations.`
` `
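The constant-duration concern above can be made concrete: instead of sleeping for a fixed interval, the simulator could sample each transaction's duration from a skewed distribution. A minimal Python sketch follows; the 10 ms mean and the function name are made up for illustration, and an exponential distribution is used as the continuous counterpart of the Poisson-style guess above, not as the assignment's actual code:

```python
import random

def transaction_duration_ms(mean_ms=10.0):
    """Sample a transaction duration instead of using a constant sleep.

    An exponential distribution is a simple stand-in for skewed,
    non-constant transaction costs; the mean here is illustrative.
    """
    return random.expovariate(1.0 / mean_ms)

random.seed(42)
samples = [transaction_duration_ms() for _ in range(100_000)]
mean = sum(samples) / len(samples)
# The sample mean should land close to the configured 10 ms mean.
```

Swapping this in for a constant sleep would let long-tail transactions hold locks disproportionately long, which is closer to real contention behavior.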
` # -----------------------------------------------------------------------------`
`-# 3. Adventures in Contentino`
`+# 3. Adventures in Contention`
` # -----------------------------------------------------------------------------`
` `
`-duration*Throughput = Sum from 1 to N of (P_c -1)^(i-1)`
`-Throughput = [Sum from 1 to N of (P_c -1)^(i-1)]/duration`
`+  duration*Throughput = Sum from i=1 to N of (1 - P_c)^(i-1)`
`+  Throughput = [Sum from i=1 to N of (1 - P_c)^(i-1)]/duration`
`+`
` Where P_c is the probability of contention and i is the increment variable`
`-Explanation: Each "subsequent" transaction has (P_c - 1) probability of not contending with each of the previous transactions. The throughput*duration product at a given point is therefore the summation over the probability that the nth transaction is running. Throughput is just that value divided by duration.`
`+`
`+Explanation: Each "subsequent" transaction has probability (1 - P_c) of not `
`+contending with each of the previous transactions. The throughput*duration `
`+product at a given point is therefore the sum over the probability that the `
`+nth transaction is running. Throughput is just that value divided by `
`+duration.`
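Plugging numbers into the formula (written here with the not-contending probability as (1 - P_c), which is what the derivation intends) gives a quick sanity check. This is an illustrative sketch, not the simulator's code:

```python
def expected_throughput(p_c, n, duration):
    """Sum, over transactions 1..n, of the probability that the i-th runs.

    Each subsequent transaction must avoid contention with each of the
    i-1 previous ones independently, hence the (1 - p_c)**(i - 1) term.
    """
    total = sum((1.0 - p_c) ** (i - 1) for i in range(1, n + 1))
    return total / duration

# For 10% contention and N = 10 transactions over a unit duration,
# the geometric sum gives (1 - 0.9**10) / 0.1, roughly 6.51.
tp = expected_throughput(0.10, 10, 1.0)
```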
` `
` For 10% Contention:`
` 10`
` # -----------------------------------------------------------------------------`
` # 4. The glass is half empty`
` # -----------------------------------------------------------------------------`
` `
`-Locking B, performed better, as it allowed transactions which overlapping requests only in their readset to overlap. This is most apparent during low contention situations with long transaction duration.`
`+Locking B performed better, as it allowed transactions whose requests overlap `
`+only in their readsets to run concurrently. This is most apparent in low `
`+contention situations with long transaction durations.`
` `
`-Purely in terms of lock management, Locking B will always perform better (There are no situations in which a transaction would gain ownership of a lock under A, but not under B). However, under a heavily CPU bottlenecked system, it's possible (though I suspect unlikely) that the simpler locking would be beneficial`
`+Purely in terms of lock management, Locking B will always perform better `
`+(there are no situations in which a transaction would gain ownership of a lock `
`+under A but not under B). However, under a heavily CPU-bottlenecked system, `
`+it's possible (though I suspect unlikely) that the simpler locking would be `
`+beneficial.`
` `
`-B has the advantage of the concept of shared locks over standard two phase locking, however since it waits for all locks to be received before starting, and releases all locks at once regardless of the nature of the transaction, it loses out on the ability to start a transaction before acquiring the locks needed for a later stage of it, and blocks other transactions from using locks that the transaction already finished with.`
`+B has the advantage of shared locks over standard two-phase locking. However, `
`+since it waits for all locks to be acquired before starting, and releases all `
`+locks at once regardless of the nature of the transaction, it loses the `
`+ability to start a transaction before acquiring the locks needed for a later `
`+stage of it, and it blocks other transactions from using locks that the `
`+transaction has already finished with.`
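The shared/exclusive distinction behind Locking B boils down to a compatibility check: any number of readers may share a lock, but a writer excludes everyone. A toy lock table (illustrative names, not the assignment's txn_processor API) shows why B never grants fewer locks than exclusive-only A:

```python
from collections import defaultdict

class LockTable:
    """Toy shared/exclusive lock table (Locking B style).

    Under exclusive-only Locking A, the second reader below would be
    denied; shared locks only ever admit more transactions, never fewer.
    """
    def __init__(self):
        self.shared = defaultdict(set)  # key -> txn ids holding shared locks
        self.exclusive = {}             # key -> txn id holding the exclusive lock

    def acquire_shared(self, txn, key):
        # Readers are compatible with other readers, not with a writer.
        if key in self.exclusive and self.exclusive[key] != txn:
            return False
        self.shared[key].add(txn)
        return True

    def acquire_exclusive(self, txn, key):
        # A writer is incompatible with any other holder of the lock.
        others_shared = self.shared[key] - {txn}
        if others_shared or (key in self.exclusive and self.exclusive[key] != txn):
            return False
        self.exclusive[key] = txn
        return True

lt = LockTable()
ok1 = lt.acquire_shared("T1", "A")     # first reader granted
ok2 = lt.acquire_shared("T2", "A")     # second reader also granted
ok3 = lt.acquire_exclusive("T3", "A")  # writer denied while readers hold A
```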
` `
` # -----------------------------------------------------------------------------`
` # 5. The glass is half full`
` # -----------------------------------------------------------------------------`
`-Better for low contention and long transaction length. The former lines up with expectations, the latter doesn't (to the degree that I suspect it indicates an error on our part somewhere), as OCC performs well in situations where contention, and therefore the number of transactions that need to be restarted is low, and where the cost of restarting, should it be necessary, is also low.`
` `
`-I would expect in a real system, where due to the pieces mentioned, the cost of each transaction, and therefore that of a restart, is higher, that the low contention advantage would remain, and the effect of long/resource intensive transactions would be even greater.`
`+OCC performed better for low contention and long transaction lengths. The `
`+former lines up with expectations; the latter doesn't (to the degree that I `
`+suspect it indicates an error on our part somewhere), as OCC performs well in `
`+situations where contention, and therefore the number of transactions that `
`+need to be restarted, is low, and where the cost of restarting, should it be `
`+necessary, is also low.`
`+`
`+In a real system, where for the reasons mentioned above the cost of each `
`+transaction, and therefore of a restart, is higher, I would expect the `
`+low-contention advantage to remain and the effect of long/resource-intensive `
`+transactions to be even greater.`
` `
` Changes needed:`
`-a) have multiple txn_processors running simultaneously `
`-b) keep track of running transactions`
`+  a) Have multiple txn_processors running simultaneously `
`+  b) Keep track of running transactions`
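The two changes above can be sketched together: several processor threads run at once while a mutex-protected set records which transactions are in flight. All names here are hypothetical, standing in for whatever the real txn_processor would do:

```python
import threading

# (a) multiple processors running simultaneously,
# (b) a shared, lock-protected registry of running transactions.
running = set()
running_mu = threading.Lock()
results = []

def txn_processor(txn_id):
    with running_mu:
        running.add(txn_id)        # (b) register this transaction as running
    results.append(txn_id * 2)     # stand-in for actually executing the txn
    with running_mu:
        running.discard(txn_id)    # deregister on completion

threads = [threading.Thread(target=txn_processor, args=(i,)) for i in range(4)]
for t in threads:
    t.start()                      # (a) all processors run at once
for t in threads:
    t.join()
# All transactions finished, so the running set ends up empty.
```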
` `
`-I suspect it would perform worse in simulation than our current system, as more transactions would have to be restarted and the cost of validation in this simulation is minimal.`
`+I suspect it would perform worse in simulation than our current system, as `
`+more transactions would have to be restarted and the cost of validation in `
`+this simulation is minimal.`
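The validation cost being discussed amounts to little more than set-intersection tests. A toy backward-validation sketch (the function and set names are illustrative, not the assignment's API): a transaction restarts if anything it touched was written by a transaction that committed during its run.

```python
def occ_validate(txn_readset, txn_writeset, committed_writesets):
    """Backward OCC validation sketch.

    The transaction fails validation (and must restart) if any
    writeset committed during its execution intersects what it
    read or wrote.
    """
    touched = txn_readset | txn_writeset
    for ws in committed_writesets:
        if touched & ws:
            return False  # conflict detected: restart the transaction
    return True

# T read {A, B} and wrote {B}; a concurrent commit wrote {A} -> restart.
ok1 = occ_validate({"A", "B"}, {"B"}, [{"A"}])
# No overlap with the concurrently committed writeset -> commit.
ok2 = occ_validate({"A"}, {"A"}, [{"C"}])
```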
` `
` # -----------------------------------------------------------------------------`
` # 6. Extra version olive oil`
` # -----------------------------------------------------------------------------`
` `
`-########`
`-Diff between this and Postgres HERE`
`-######`
`-`
`-MVCC great advantage is that it effectively negates the effects of contention on read (no transaction ever must wait for another to finish using a resourcebefore it can read it) and therefore shines in situations with a mixture of reads and writes with high contention (MVCC is the run-away winner for the mixed test)`
`-`
`-You could very well combine MVCC with OCC (and then you wouldn't need to check your readset on validation), however, MVCC+LockingA or LockingB would basically be MVCC.`
`+One major difference between our system and Postgres is that in our `
`+implementation of MVCC, a transaction will have acquired exclusive locks on `
`+all tuples in its writeset before it can proceed. After acquiring the locks, `
`+application of the writes proceeds without waiting. In Postgres, a `
`+transaction may have to wait for an in-progress write to the same tuple. `
`+`
`+Thus, in our system, a deleted pg_log entry for a particular XID means that `
`+the transaction associated with that XID has committed and is not in progress. `
`+This simplifies the visibility checks, since we only need to check the `
`+transaction's own snapshot of pg_log taken at the start of the transaction. `
`+The check for Xmin would only consist of checking that Xmin < XID and that `
`+pg_log does not contain an entry for Xmin. This both simplifies visibility `
`+checking and lets us check just Xmin, Xmax, and our snapshot of pg_log.`
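The Xmin half of that check is small enough to write out directly. A sketch under the stated invariant (a deleted pg_log entry means "committed"); the function name and snapshot representation are hypothetical:

```python
def xmin_visible(xmin, xid, pg_log_snapshot):
    """Visibility check for a tuple's creating transaction (Xmin).

    Under the invariant that a deleted pg_log entry means the
    transaction committed, Xmin's write is visible iff it belongs to
    an older transaction (Xmin < XID) and our start-of-transaction
    snapshot of pg_log has no entry for it (i.e., it had already
    committed when we started).
    """
    return xmin < xid and xmin not in pg_log_snapshot

# Txn 7's snapshot saw txn 5 still in progress (entry present) and
# txn 3 already pruned (entry deleted, hence committed).
snapshot = {5: "in_progress"}
vis3 = xmin_visible(3, 7, snapshot)  # committed before we started
vis5 = xmin_visible(5, 7, snapshot)  # still in progress -> not visible
```

A symmetric check on Xmax (the deleting transaction) would complete the visibility rule, mirroring the Xmin logic.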
`+`
`+Pruning pg_log entries this way may also work in Postgres: enforce the `
`+invariant that a deleted pg_log entry means the Xaction has committed. The `
`+snapshot received by each Xaction must then also describe whether Xmin's `
`+Xaction has committed (i.e., pg_log's entry for Xmin does not exist), in `
`+addition to whether it is in progress. However, given that Postgres's pg_log `
`+must be written to disk, such frequent deletes may incur greater overhead `
`+compared to the completely in-memory pg_log in our system.`
`+`
`+MVCC's great advantage is that it effectively negates the effect of contention `
`+on reads (no transaction ever has to wait for another to finish using a `
`+resource before it can read it), and it therefore shines in situations with a `
`+mixture of reads and writes under high contention (MVCC is the runaway winner `
`+for the mixed test).`
`+`
`+You could very well combine MVCC with OCC (and then you wouldn't need to check `
`+your readset on validation); however, MVCC+LockingA or LockingB would basically `
`+be MVCC.`
` `
` Unserializable example:`
` `
` Start:`
`-A = 1`
`-B = 1`
`+  A = 1`
`+  B = 1`
` `
` Transaction 1:`
`-if A + B == 2; Write A = 0;`
`+  if A + B == 2; Write A = 0;`
` `
` Transaction 2:`
`-if A + B == 2; Write B = 0;`
`+  if A + B == 2; Write B = 0;`
` `
` Both transactions would proceed under MVCC, but they could then not be serialized.`
` `
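The example can be run mechanically: each transaction evaluates its predicate against the starting snapshot, so both writes go through, and the combined result matches neither serial order. A Python sketch of this write-skew anomaly:

```python
# Snapshot-isolation write skew: both transactions read the same
# snapshot, both pass their check, and the combined result matches
# no serial execution order.
snapshot = {"A": 1, "B": 1}
db = dict(snapshot)

# Transaction 1: if A + B == 2, write A = 0 (reads its own snapshot).
if snapshot["A"] + snapshot["B"] == 2:
    db["A"] = 0

# Transaction 2: if A + B == 2, write B = 0 (same snapshot, disjoint write).
if snapshot["A"] + snapshot["B"] == 2:
    db["B"] = 0

# Serial T1;T2 would leave {A: 0, B: 1}; serial T2;T1 would leave
# {A: 1, B: 0}. MVCC lets both commit, yielding {A: 0, B: 0} --
# a result no serial schedule can produce.
```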