multi-thread/process signal decoding

Issue #4 open
Dennis Delin created an issue

Hi,

First off, sorry for spamming you ^^

This might not really be an issue or an enhancement request, but now that I got the BLF and DBC libs to work, I'm trying to parse all the CanMessages using DBC files. I noticed that the signal decoding is quite slow.

When I just run through my 88 MB BLF file with 4 channels, it takes about 12.82 s. If I then add DBC message lookup without signal decoding, I get about 19.58 s. When I enable signal decoding, I end up with a 5 m 56.50 s runtime.

I don't do any console printouts in this. The code is more or less a copy of the DBC decode example, but with a data copy from BLF to a vector, as the decode method expects one.

if (ohb->objectType == Vector::BLF::ObjectType::CAN_MESSAGE) {
    Vector::BLF::CanMessage * canMessage = reinterpret_cast<Vector::BLF::CanMessage *>(ohb);

    //cout << canMessage->channel << " " << canMessage->id << " " << canMessage->dlc;
    Vector::DBC::Message & message = channelNetworkMap[canMessage->channel].messages[canMessage->id];
    //cout << " Message " << message.name << endl;

    // copy data into a vector for the DBC lib to use
    vector<uint8_t> canData(&canMessage->data[0], &canMessage->data[canMessage->dlc]);

    // loop over the signals of this message to find the multiplexor switch value
    unsigned int multiplexerSwitchValue = 0;
    for (const auto & signal : message.signals) {  // reference avoids copying each name/signal pair
        if (signal.second.multiplexorSwitch) {
            multiplexerSwitchValue = signal.second.decode(canData);
            //std::cout << "  this is a multiplexed message with switch value = "
            //          << std::dec << multiplexerSwitchValue << std::endl;
        }
    }

    // loop over the signals of this message
    for (const auto & signal : message.signals) {
        if (signal.second.multiplexorSwitch) {
            // if it's the multiplexor switch, only show the raw value
            //std::cout << "  Signal (MultiplexorSwitch) " << signal.second.name << std::endl;
            unsigned int rawValue = signal.second.decode(canData);
            //std::cout << "    Raw Value: 0x" << std::hex << rawValue << std::endl;
        } else if (signal.second.multiplexedSignal && (signal.second.multiplexerSwitchValue == multiplexerSwitchValue)) {
            // if it's a multiplexed signal, check that the switch value matches
            //std::cout << "  Signal (MultiplexedSignal) " << signal.second.name << std::endl;
            unsigned int rawValue = signal.second.decode(canData);
            //std::cout << "    Raw Value: 0x" << std::hex << rawValue << std::endl;
            double physicalValue = signal.second.rawToPhysicalValue(rawValue);
            //std::cout << "    Physical Value: " << physicalValue << std::endl;
        } else if (!signal.second.multiplexedSignal) {
            // if it's not a multiplexed signal, just process it
            //std::cout << "  Signal " << signal.second.name << std::endl;
            unsigned int rawValue = signal.second.decode(canData);
            //std::cout << "    Raw Value: 0x" << std::hex << rawValue << std::endl;
            double physicalValue = signal.second.rawToPhysicalValue(rawValue);
            //std::cout << "    Physical Value: " << physicalValue << std::endl;
        }
    }
}
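
One detail worth checking in the snippet above: `std::map::operator[]` default-constructs and inserts an entry for every channel or id that is not present, so decoding a log that contains ids missing from the DBC silently grows the maps on every frame. A sketch of a `find()`-based lookup that avoids this; the types here are minimal stand-ins for illustration, not the real Vector::DBC classes:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Minimal stand-ins for the DBC types used in the snippet above
// (hypothetical simplification; the real classes carry far more state).
struct Message { std::string name; };
struct Network { std::map<uint32_t, Message> messages; };

// operator[] on a std::map default-constructs and inserts a value for every
// id that is not present; find() avoids both the insertion and the copy.
const Message * lookupMessage(std::map<uint16_t, Network> & channelNetworkMap,
                              uint16_t channel, uint32_t id) {
    auto net = channelNetworkMap.find(channel);
    if (net == channelNetworkMap.end())
        return nullptr;
    auto msg = net->second.messages.find(id);
    if (msg == net->second.messages.end())
        return nullptr;
    return &msg->second;  // pointer into the map, no copy made
}
```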

Any idea how to speed this up? Or do you think I should try to multithread this decode per CanMessage in my application?
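
Since the DBC structures are only read during decode, the decode could in principle be parallelized over slices of the frame list at the application level. A minimal sketch of that idea, assuming the frames have already been collected and using a placeholder sum in place of the real signal decode (`decodeFrame`, `decodeAll` are illustrative names, not library API):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <future>
#include <numeric>
#include <vector>

using Frame = std::vector<uint8_t>;

// Placeholder for the real per-message signal decode work.
static uint64_t decodeFrame(const Frame & f) {
    return std::accumulate(f.begin(), f.end(), uint64_t{0});
}

// Split the frame list into numThreads slices (numThreads >= 1) and decode
// each slice in its own task; the shared DBC data is read-only, so no locking
// is needed during decode.
uint64_t decodeAll(const std::vector<Frame> & frames, unsigned numThreads) {
    std::vector<std::future<uint64_t>> parts;
    size_t chunk = (frames.size() + numThreads - 1) / numThreads;
    for (size_t begin = 0; begin < frames.size(); begin += chunk) {
        size_t end = std::min(begin + chunk, frames.size());
        parts.push_back(std::async(std::launch::async, [&frames, begin, end] {
            uint64_t sum = 0;
            for (size_t i = begin; i < end; ++i)
                sum += decodeFrame(frames[i]);
            return sum;
        }));
    }
    uint64_t total = 0;
    for (auto & p : parts)
        total += p.get();
    return total;
}
```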

Thanks, Dennis

Comments (5)

  1. Tobias Lorenz repo owner

    No problem, I like to stay in contact with users and customers as this is always valuable feedback ;-)

    I have several ideas how the performance of the two libraries can be improved. The difficulty is identifying where the performance bottlenecks actually are, and whether a potential solution will in fact improve the situation or make it worse. So it's good that you have already provided some performance figures.

    In general I have thought about introducing threading in all my libraries. However, this only makes sense if more than one thread is beneficial; otherwise the single thread can just as well be implemented at the application level using the library. Especially for the log file libraries (Vector::ASC, Vector::BLF, ASAM::MDF) it definitely makes sense to have multiple threads in the library itself, as read-ahead with decompression, and late-write with compression, can only be implemented with multiple threads.

    About the Vector::DBC library: the load performance can be improved using lexical analyzers (flex) or probably even grammar parsers (bison), but startup performance is usually not a problem. Once the data is loaded, the C++ data structures are already access-optimized using std::map wherever possible. The next level of improvement, though a really big step, would be to implement some kind of in-memory byte code compiler and byte code runtime engine. This is also what the Linux kernel does in its BPF part to speed up firewall rule evaluation. But I don't have much experience with just-in-time compilers. I'm open to more ideas. ;-)
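
    The "byte code" idea above can be illustrated in miniature: precompute a signal's byte offset, shift, and mask once when the DBC is loaded, so each per-frame decode is just a few operations instead of re-interpreting the signal description. A hedged sketch for little-endian signals with bitSize < 64; the names are illustrative, not the Vector::DBC API:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Precomputed decode plan, built once per signal at DBC load time.
struct DecodePlan {
    size_t byteOffset;
    unsigned shift;
    uint64_t mask;
};

// "Compile" a signal description (startBit, bitSize < 64) into a plan.
DecodePlan compileSignal(unsigned startBit, unsigned bitSize) {
    return DecodePlan{startBit / 8, startBit % 8,
                      (uint64_t{1} << bitSize) - 1};
}

// Per-frame decode: gather up to 8 bytes little-endian, then shift and mask.
uint64_t runPlan(const DecodePlan & p, const std::vector<uint8_t> & data) {
    uint64_t raw = 0;
    for (size_t i = 0; i < 8 && p.byteOffset + i < data.size(); ++i)
        raw |= uint64_t{data[p.byteOffset + i]} << (8 * i);
    return (raw >> p.shift) & p.mask;
}
```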

    About the Vector::BLF library, here I see much more potential:

    1. I experienced short hangs whenever a LogContainer gets inflated/decompressed, which happens every 100-200 objects. So doing read-ahead with decompression, and write-behind with compression, in a separate thread definitely makes sense.
    2. Currently every member variable in the classes gets read/written one by one. The better approach would be to read the whole block at once and then access the variables using getters/setters.
    3. Currently, when more data needs to be decompressed from a LogContainer, a std::vector<char> is resized to the intended uncompressed size, the data from the LogContainer is inflated into that std::vector, and then copied into the Uncompressed::m_data std::vector. There is likely a way to decompress the data directly into the target vector and skip one copy.
    4. Having a separation between CompressedFile and UncompressedFile at all might not make sense. We certainly need a buffer for the decompressed data, but maybe we can decompress in place: whenever a read stumbles upon a LogContainer, it could transparently decompress the data in place.
    5. I'm not sure about the performance of a std::vector<char> in comparison to a new char[]. This is especially true for how the data is managed in the UncompressedFile::m_data vector, which is a kind of stream buffer and partial view / file window into the uncompressed file content.
    6. Talking about UncompressedFile::m_data, it might not be the best idea to use a std::vector as a file buffer: I need to resize it every time I append something, and to keep memory consumption low I need to truncate from the start. I don't know how many move operations that requires behind the scenes, within the std::vector implementation. I'm waiting for a standard container for such use cases in the C++ library, e.g. a std::buffer. Maybe the existing std::streambuf could also be used for this, but when I tried, I wasn't able to figure out how to do it right.
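
    Point 6 can be sketched with a stand-in: erasing consumed bytes from the front of a std::vector moves every remaining byte down, so a file window that appends at the back and truncates at the front pays O(n) per truncate, while a std::deque (or a ring buffer) handles both ends cheaply. Illustrative only, not the actual UncompressedFile implementation:

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <vector>

// A stream-buffer-like file window: append decompressed data at the back,
// consume (and truncate) from the front. std::deque keeps front erasure
// cheap, unlike std::vector which shifts all remaining bytes.
struct DequeWindow {
    std::deque<uint8_t> buf;

    void append(const std::vector<uint8_t> & data) {
        buf.insert(buf.end(), data.begin(), data.end());
    }

    // Read up to n bytes and truncate them from the front of the window.
    std::vector<uint8_t> read(size_t n) {
        n = std::min(n, buf.size());
        std::vector<uint8_t> out(buf.begin(), buf.begin() + n);
        buf.erase(buf.begin(), buf.begin() + n);  // no full-buffer shift
        return out;
    }
};
```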

    So, a lot of ideas. Let's discuss what makes the most sense. ;-) Having an interface with getters/setters is definitely a good start: it keeps the interface stable for future changes, even if significant code improvements happen behind the scenes.

  2. Tobias Lorenz repo owner

    Hi Dennis,

    I made some changes over the last weeks to have the Vector::BLF library running multi-threaded. Any read/write operation works on a queue now. From the read/write queue, one thread transfers the data to the virtual uncompressed file; another thread takes the data and transfers it into the actual compressed file. I don't see much difference in performance using my small test files (up to 4 MB), which tells me there is at least no significant overhead due to multi-threading. Please give it a try and see how it improves performance with your large files. The changes are located in a feature branch here: https://bitbucket.org/tobylorenz/vector_blf/branch/feature/multi_threading

    Maybe we can continue this discussion in the Vector::BLF issue tracker; it took me some time to find this one here ;-)

    Bye, Tobias
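
    The queue handover described above can be sketched as a minimal blocking channel guarded by a mutex and condition variable: one thread pushes blocks, the other pops and blocks until data (or shutdown) arrives. A hypothetical sketch, not the actual Vector::BLF queue class:

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Minimal producer/consumer channel (C++17). pop() blocks until an item is
// available or the channel is closed; an empty optional means closed+drained.
template <typename T>
class Channel {
public:
    void push(T value) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(value)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty() || closed_; });
        if (q_.empty())
            return std::nullopt;  // closed and fully drained
        T v = std::move(q_.front());
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};
```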

  3. Dennis Delin reporter

    Hi Tobias,

    Thank you, I will test it out next week; I have been away on a business trip since New Year. Will give you more feedback then.

    Dennis

  4. Tobias Lorenz repo owner

    Hi Dennis,

    I'm not sure if this issue is solved for you already.

    I think the Vector::BLF parsing now runs much faster due to multi-threading. Also, the access to the Vector::DBC structures cannot be improved much. Only the DBC load into memory could be improved, but that's not the issue here, right?

    I'm pretty sure that what actually consumes the time is the signal decode function, because it extracts the raw value bit-wise from the CAN data. This is far from optimal, so I'll leave this ticket open to improve it.
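
    The bit-wise extraction can be compared against a word-wise variant: load the 8 CAN data bytes into a uint64_t once, then extract the signal with a single shift and mask. A hedged sketch for little-endian (Intel byte order) signals on a little-endian host; Motorola byte order would need a byte swap first, and the function names are illustrative, not the Vector::DBC API:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Reference implementation: assemble the raw value one bit at a time,
// roughly what a per-bit decode loop does.
uint64_t decodeBitwise(const std::vector<uint8_t> & d,
                       unsigned start, unsigned size) {
    uint64_t raw = 0;
    for (unsigned i = 0; i < size; ++i) {
        unsigned bit = start + i;
        raw |= uint64_t{(d[bit / 8] >> (bit % 8)) & 1u} << i;
    }
    return raw;
}

// Optimized variant: one load, one shift, one mask (size <= 64).
uint64_t decodeWordwise(const std::vector<uint8_t> & d,
                        unsigned start, unsigned size) {
    uint64_t word = 0;
    std::memcpy(&word, d.data(), std::min<size_t>(d.size(), 8));  // LE host assumed
    uint64_t mask = (size == 64) ? ~uint64_t{0} : ((uint64_t{1} << size) - 1);
    return (word >> start) & mask;
}
```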

    Bye Tobias

  5. Tobias Lorenz repo owner
    • changed status to open

    Improve the signal decode function, which currently extracts the raw value bit-wise from the CAN message.
