Network Health Status

Issue #22 new
ferment created an issue

It would be nice to have an infrastructure health/availability page.

Comments (17)

  1. marcus03

    I remembered that CfB and BCNext were monitoring the NXT network when we were under DDOS, so I asked CfB how they did it. Here's his answer:

    "We had nodes in different datacenters. All these nodes were connected to Nxt network (to different peers!) and had direct connections to each other. Once a node received a block or an unconfirmed transaction it sent this to other nodes (not Nxt peers). The others measured time between the signal and time when they received the same block/transaction from Nxt peers. The delay is the main indicator that shows time of network convergence. During DDoS attacks average convergence time becomes much higher. Forks r also easily detected.

    We also measure load on other peers by sending transactions to one of them from one node and receiving on another one. U can easily extended this approach and add extra functionality.

    We used tool written by BCNext and I can't share it without his explicit permission, which is hard to get coz I'm not in touch with him anymore."

  2. marcus03

    The basic idea I get out of this is that we could have a special version of NRS on some nodes managed by known operators. These nodes would simply send timestamp/blockID values and timestamp/TransactionID values to a central data collector whenever they receive a block or an unconfirmed transaction through the network.

    Network health could then be calculated by looking at the timestamp differences for the same transactions/blocks across all monitored nodes.

    This is of course a centralized approach, but I guess any network monitoring is centralized by definition. We would need to make sure however that the data collector is protected from DDOS itself.
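
    The timestamp comparison described in this comment could look roughly like the sketch below. The record format `(node_name, item_id, unix_timestamp)` and the function names are assumptions made for illustration; nothing here comes from NRS or from BCNext's tool.

```python
from collections import defaultdict
from statistics import mean


def convergence_delays(reports):
    """Return {item_id: seconds between first and last monitored node seeing it}.

    `reports` is an iterable of (node_name, item_id, unix_timestamp) tuples,
    a hypothetical format for what the monitored nodes send to the collector.
    """
    seen = defaultdict(dict)
    for node, item_id, ts in reports:
        # Keep only the earliest time each node reported the item.
        prev = seen[item_id].get(node)
        if prev is None or ts < prev:
            seen[item_id][node] = ts

    delays = {}
    for item_id, by_node in seen.items():
        if len(by_node) >= 2:  # a spread needs at least two observers
            delays[item_id] = max(by_node.values()) - min(by_node.values())
    return delays


def average_convergence(reports):
    """Average spread across all items, or None if nothing is comparable."""
    delays = convergence_delays(reports)
    return mean(delays.values()) if delays else None


# Made-up sample data: three nodes see a block, two see a transaction.
reports = [
    ("node-a", "block-1", 100.0),
    ("node-b", "block-1", 100.5),
    ("node-c", "block-1", 102.0),
    ("node-a", "tx-9", 200.0),
    ("node-b", "tx-9", 200.25),
]
print(convergence_delays(reports))
print(average_convergence(reports))
```

    A rising average would be the signal to watch, in line with the observation quoted above that convergence time goes up during an attack. The sketch assumes the node clocks are synchronized closely enough for the differences to mean anything.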

  3. ferment reporter

    A decentralized approach would be to add an API to nodes that allows them to return internal statistics as well as some kind of event stream with timestamps. The data could then be propagated using the existing p2p communications, or we could run it on a separate port.

    Decentralized data collectors can aggregate and analyze the data in the same way that multiple peer/network explorers currently do.

    We can make installing the monitoring plugin a requirement for receiving bounties for running a node.

  4. marcus03

    I think monitoring should be on top of NXT, not within the NXT core. We still want to be able to monitor when NXT is under DDOS. I also don't want to influence the network itself with monitoring. And finally, getting something like this into the core would take months and I want it faster. :-)

    An additional, optional piece of software (e.g. a shell script) which uses ngrep (on Linux) or something similar to extract transaction/block data from the network stream, and the NRS HTTP API to get additional data from the core, would work more independently of the status of the NXT core.

  5. ferment reporter

    I get what you're saying and agree with the concerns. However, I think it would be possible to shoehorn something into the JVM to gather data in an unobtrusive way.

    A collectd plugin would be interesting for the external (on top) approach: http://collectd.org.

  6. Ian Ravenscroft

    Monitoring is not part of NXT, it's a separate app. There are lots of funky tools to monitor networks, like Icinga or Nagios.

    NXT needs to enable it. What I mean is we need to have hooks (an API) which the monitoring tool can use to gather information. We have some status API; we should review it to see if we have any more.

  7. marcus03

    I've set up pretty complex Icinga systems and it is not the right tool. It's good if you have a rather constant infrastructure, but it will not do a good job of gathering info from a constantly changing number of network nodes. It's also limited with regard to the logic that would be needed to make a statement about network health as opposed to node health.

    I've looked into ngrep a bit, but I think it is too low level.

    Looking at the core source code, it seems to me (I'm not really into Java) that there is a Java API in the core that could be used to run our own stuff whenever a transaction comes in (transaction listener). For gathering info about network convergence as done by CfB, this would be what we need.

    Can any of you say whether it's feasible to use the Java API to extract this information without touching the NRS core source code?

  8. ferment reporter

    @chanc3r To get data on when a node receives a block or transaction, we need to get that from inside the node when it happens as I don't believe any existing public API will return it nor does NRS currently save the receiving timestamp.

    @marcus03 From what I can tell, no. Looking at BlockchainProcessorImpl and TransactionProcessorImpl, there's no way to know when a node receives a block/transaction for the first time to generate the signal. Also, the timestamp of when a block or transaction is received is not stored.

    We'd need to add some kind of eventing/callback mechanism to NRS to capture this stuff externally.

    Another approach would be to build our own lightweight monitoring nodes (pull only) that do what BCNext's tool did and recreate their environment.

    Looking at the code, I'm not even sure how we'd get TPS other than interpolating from block timestamp and number of transactions per block. I assume 1000 TPS would be a block per minute with 60,000 transactions. Although I could be missing something obvious...
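
    The interpolation mentioned here can be worked through as a small sketch. The input format (a list of `(block_timestamp_seconds, transaction_count)` pairs, oldest first) and the function name are assumptions for illustration, not anything NRS exposes.

```python
def interpolated_tps(blocks):
    """Estimate transactions per second from block timestamps and sizes.

    `blocks` is a list of (block_timestamp_seconds, transaction_count)
    pairs, oldest first (a hypothetical format for this sketch).
    """
    if len(blocks) < 2:
        return None  # one block gives no time window to divide by
    elapsed = blocks[-1][0] - blocks[0][0]
    if elapsed <= 0:
        return None
    # Transactions in the first block were collected before the window
    # started, so only the later blocks are counted.
    tx_total = sum(count for _, count in blocks[1:])
    return tx_total / elapsed


# One block per minute with 60,000 transactions each, as assumed above.
blocks = [(0, 60_000), (60, 60_000), (120, 60_000)]
print(interpolated_tps(blocks))
```

    With one-minute blocks of 60,000 transactions this gives 120,000 transactions over 120 seconds, i.e. 1000 TPS, which matches the estimate above. It only measures confirmed throughput; unconfirmed transactions never show up in it.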

  9. ferment reporter

    I wonder how useful it would be to get API and peer metrics by just monitoring Jetty HTTP calls?

    Actually, now that I'm thinking about it, we could shoehorn some async monitoring through a Jetty filter! I'll investigate.

    UPDATE: This could be a solution. http://stackoverflow.com/questions/14390577/how-to-add-servlet-filter-with-embedded-jetty.

    As long as the filter runs async, it shouldn't be intrusive. We could add a stat/callback filter to APIServlet and PeerServlet.

    UPDATE 2: Doh, 0.8.x removed the config files that would let us add a Filter through configuration. Maybe we need to brainstorm with Jean-Luc Picard.

  10. marcus03

    @ferment: Looking at the last 4 lines of TransactionProcessorImpl.java (https://bitbucket.org/JeanLucPicard/nxt/src/1ff7dc8f579fed8804d6bb3fdda6a822fcd52162/src/java/nxt/TransactionProcessorImpl.java?at=master) it seems there is something like a transactionListeners class. My hope was that this could be used from a separate Java application that registers with the NXT core to get the notifications about new transactions.

    The missing timestamp wouldn't be a problem here, since the registered application would get instant notification when the core processes the transaction.

    EDIT: I've messaged JL for feedback again.

  11. ferment reporter

    @marcus03 Good find. Listening for ADDED_CONFIRMED_TRANSACTIONS and BLOCK_PUSHED events might give us what we want.

    Seems like we need to be able to register listeners via nxt.properties? @JeanLucPicard?
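
    The listener idea from the last two comments, where the receipt time is simply taken at the moment of notification, can be modelled roughly as below. This is a language-neutral sketch of the pattern only: `EventBus` and `ReceiptRecorder` are hypothetical names, and this is not the NRS Java API. Only the event names come from the discussion above.

```python
import time


class EventBus:
    """Hypothetical stand-in for the core's listener registry (not the NRS API)."""

    def __init__(self):
        self._listeners = {}

    def add_listener(self, event, callback):
        self._listeners.setdefault(event, []).append(callback)

    def notify(self, event, item_id):
        for callback in self._listeners.get(event, []):
            callback(item_id)


class ReceiptRecorder:
    """Records the local time at which each item was first reported."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so the recorder can be tested
        self.first_seen = {}

    def on_event(self, item_id):
        # setdefault keeps the earliest timestamp if the item is reported again.
        self.first_seen.setdefault(item_id, self._clock())


bus = EventBus()
recorder = ReceiptRecorder()
bus.add_listener("BLOCK_PUSHED", recorder.on_event)
bus.add_listener("ADDED_CONFIRMED_TRANSACTIONS", recorder.on_event)
bus.notify("BLOCK_PUSHED", "block-1")
print(recorder.first_seen)
```

    The recorded values would be what a monitoring node reports onward. The timestamp reflects when the core fired the event, which may be slightly later than when the data arrived over the wire.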

  12. marcus03

    My PM to JL: "...could you comment on the question if it's possible to plug-into NRS from a separate JAVA application using the JAVA API to get notifications about new transactions being processed? ... I haven't seen code using the JAVA API, so I am not even sure what's it for. Are you aware of any code using it?"

    His answer: "The Offspring client relies on those listeners a lot. I am not sure what you mean by a separate java application. It needs to run in the same JVM, and call Nxt internally as a library, the way Offspring is doing it."

    Offspring is still closed-source and I am not aware of any other code that shows how you could use the Java API for this.

  13. Ian Ravenscroft

    We need to be very careful about what Java instrumentation we add and where. We must make sure that any instrumentation cannot introduce vulnerabilities. I would favour an application-specific class / API call which returns the necessary data, as I fear that generic libraries could be or contain exploits.

  14. ferment reporter

    Sounds like Offspring is a Java app that uses NXT internally. Not a huge deal, but not optimal. In a similar manner we could do a bounty for a node-only app that uses the NXT library and also does our instrumentation and whatever else.

    As an alternative, if JLP could add support for registering listeners via nxt.properties then we wouldn't need to create a separate app. We'd just do a bounty for a listener.

    The other approach is to shim an HTTP proxy in front of NRS and log information asynchronously that way. I'm not sure how much streaming is done, however, which would be intrusive to inspect.
