Decoder introspection

Issue #40 resolved
Johan Rosenkilde created an issue

Related to #36.

Decoders should be able to give information about themselves, making certain automation and "learning by discovery" possible:

Their input space

A decoder might take e.g. received word plus erasure information, or received word plus soft reliability information.

Output "space"

The decoder might output a single codeword, a list of codewords, a list of (codeword, reliability) pairs, etc.

Decoder "type"

I imagine a list of predefined strings to use here, e.g. "Hard-decision", "List-hard-decision", "Error-erasure", "Maximum-Likelihood", "Soft-decision", etc. This list will grow as we (and others!) implement more strange stuff.

Specialised methods

For certain types, it should be convention that certain methods exist:

  • decoding_radius for hard-decision and others
  • for error-erasure, some manner of describing what can be decoded (often 2t + e < d, where t is the number of errors and e the number of erasures)

Other stuff?

Perhaps this discussion should turn into something on the wiki. For now, at least add decoding_radius to our hard-decision decoders.
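
As a rough Python sketch of what such introspection conventions could look like (all of the names below, e.g. decoder_type, input_space, output_space, decoding_radius, are placeholders for discussion, not a proposed final API):

    # Sketch only: all names are placeholders for the conventions discussed above.
    class Decoder(object):
        def decoder_type(self):
            """A short categorisation, e.g. 'Hard-decision' or 'Error-erasure'."""
            raise NotImplementedError

        def input_space(self):
            """What the decoder accepts, e.g. a received word, or a word plus erasure positions."""
            raise NotImplementedError

        def output_space(self):
            """What the decoder returns, e.g. a single codeword or a list of codewords."""
            raise NotImplementedError

    class HammingDecoder(Decoder):
        """Hypothetical example: a decoder for a binary Hamming code."""
        def decoder_type(self):
            return "Hard-decision"

        def decoding_radius(self):
            # Hamming codes have minimum distance 3, so one error is always correctable.
            return 1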

Comments (14)

  1. Daniel Augot

    No special comments, but as I think the idea is to keep things manageable and sustainable, here is my devilish remark: can you easily integrate an interleaved decoder which receives a bunch of codewords with some promises on the errors?

  2. Johan Rosenkilde reporter

    I'm not sure I know exactly what you mean. I have written something on combining Channels in "Metrics and Channels". That is about the issue of specifying how errors are added to the codeword. If I understand your question correctly, it is about how to characterise the decoder or something, similarly to the "type" I mention above?

    First of all, the "type" system I suggest here is never going to be exhaustive and formal in its characterisation, but could just approximate with a very short description which a human can reasonably understand, and a computer might use for rough categorisation. There are too many variations if one considers everything. It is possible that this implies that the idea of adding a "type" doesn't scale and is doomed to fail. We should think about this.

    My goal would be to use the type to, for instance, do a quick filter through available decoders for the ones I'm interested in. But perhaps that's artificial. A human could manually go through each, reading their documentation -- probably there's not going to be too many.

    It just disturbs me to have objects whose behaviour one cannot discover programmatically, thereby preventing automation.

    For your concrete case of an interleaved decoder, I would probably not encode in the type how the decoder uses the interleaving. If the errors are hard-decision, and one can give something akin to a decoding radius (e.g. interleaved RS codes), I would say "hard-decision". Otherwise perhaps "hard-decision-probabilistic". If the errors are soft, I would say "soft-decision".

    A related but different example is product codes with an iterative decoder. I guess one could say "hard-decision-iterative", which is also what an LDPC decoder would have.

  3. Johan Rosenkilde reporter

    A different possibility is, instead of one "type", to have a description as a set of properties, like "hard-decision", "probabilistic", "iterative", "ML", "list", "soft-in", "soft-out", etc. This would make it easier to match on. E.g. if one would like to compare the actual performance of all hard-decision decoders on a code, one could write something like

    test_report = {}
    for Dname in MyCode.decoders_available():
        D = MyCode.decoder(Dname)
        if "hard-decision" in D.description():
            test_report[D] = perform_test(MyCode, D)
    plot_test_report(test_report)
    
  4. Daniel Augot

    One important thing. Think of Berlekamp-Welch, which is an "if and only if" decoder. This means

    1. when it returns a codeword, the codeword is correct (belonging to the code, at distance less than t)
    2. when it does not return a codeword, there is no codeword at distance less than t
    3. when there is a codeword at distance less than t, it is returned
    4. when there is no codeword at distance less than t, it does not return a codeword

    Now, we have decoders which can fail, especially on points 2 and 3 above. This issue becomes more complex with probabilistic decoders (Las Vegas, Monte Carlo). Can you describe this with your system of properties?

    I guess so, so I would vote for properties.

  5. Johan Rosenkilde reporter

    decoder_type:

    • Gao: "hard-decision", "unique", "always-succeed"
    • Berlekamp-Welch: "hard-decision", "unique", "always-succeed"
    • Binary Hamming Codes: "hard-decision", "complete", "unique", "always-succeed"
    • Error-Erasure GRS: "error-erasure", "unique", "always-succeed". decoding_radius takes a number of erasures and returns the number of errors correctable for that number of erasures.
    • Power decoding GRS: "hard-decision", "unique", "might-fail"
    • Guruswami-Sudan: "hard-decision", "list", "always-succeed"
    • LDPC Iterative: "hard-decision", "unique", "might-fail", doesn't have a decoding_radius.
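
    As a sketch of how such property sets could be exposed and matched on (all names below are placeholders), decoder_type could simply return a set of strings:

    # Sketch only: each decoder exposes its properties as a set of strings.
    DECODER_PROPERTIES = {
        "Gao":               {"hard-decision", "unique", "always-succeed"},
        "Berlekamp-Welch":   {"hard-decision", "unique", "always-succeed"},
        "Error-Erasure GRS": {"error-erasure", "unique", "always-succeed"},
        "Power decoding":    {"hard-decision", "unique", "might-fail"},
        "Guruswami-Sudan":   {"hard-decision", "list", "always-succeed"},
        "LDPC Iterative":    {"hard-decision", "unique", "might-fail"},
    }

    # Matching is then a simple subset test, e.g. all hard-decision decoders that never fail:
    reliable = [name for name, props in DECODER_PROPERTIES.items()
                if {"hard-decision", "always-succeed"} <= props]
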
  6. Johan Rosenkilde reporter

    Ah, you came across a problem in the decoding_radius for error-erasure decoders that we hadn't thought about. Returning 0 when the number of erasures exceeds the number of pure erasures that can be fixed is not the most helpful, however: the user wouldn't know whether it simply meant that these erasures were correctable but no further errors.

    I suggest throwing an exception. Though Daniel probably disagrees ;-)

  7. David Lucas repo owner

    I recall something I've read in Roth's Introduction to Coding Theory (see pp. 16-17). It's a theorem on the number of errors and erasures which states that if the number of errors and erasures satisfies a certain inequality, there exists a decoder with the following behaviour:

    • If the number of errors is less than or equal to a certain value, all errors and erasures will be recovered correctly
    • Otherwise, if the number of errors is less than or equal to another certain value, the decoder returns "e"

    Though it is not the same issue, maybe we can build something rather similar?

  8. Daniel Augot

    If I understand correctly, the problem is when there are too many erasures? Well, isn't it a case of "always-succeed", since the decoder should properly and rightly not decode?

  9. Johan Rosenkilde reporter

    I think we're perhaps talking about three different things? At least, I don't follow at all what you guys are writing :-)

    What I commented on is the following: for the error-erasure decoder, we agreed on a method decoding_radius which takes a number of erasures as argument and returns how many errors are correctable given that number of erasures. David implemented this function. What we didn't discuss is what that function should return if you gave it a number of erasures which exceeds the erasure-decoding capability of the decoder. David's implementation currently returns 0 (as in, 0 errors are decodable at this many erasures). I suggest throwing an exception instead, to clearly indicate that not only are 0 errors decodable, but the erasures themselves are not decodable either.
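
    A sketch of the suggested behaviour, for a GRS-style error-erasure decoder with minimum distance d = n - k + 1 (the method body and exception are only illustrative, and it is assumed here that the decoder can reach its code via self.code()):

    def decoding_radius(self, number_erasures):
        # Sketch only: with e erasures, such a decoder corrects floor((d - 1 - e) / 2)
        # errors, which is meaningful only as long as e <= d - 1.
        d = self.code().minimum_distance()
        if number_erasures > d - 1:
            raise ValueError("the number of erasures exceeds the decoding capability")
        return (d - 1 - number_erasures) // 2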
