NXT network monitoring funding request (Marcus03)

Issue #45 new

EvilDave created an issue 2014-03-31

Reposted from nxtforums.org:

From Marcus03: I am asking for 150.000 NXT to develop, set up and run NXT monitoring software and the necessary infrastructure. The amount covers a 6-month timespan. If the community thinks it's useful, the following months should be cheaper, since there should be much less development work. The amount of 150.000 NXT includes the hosting of 10 VPS nodes for 6 months. The rest is compensation for the work I want to put into it.

I've developed a proof-of-concept version of NXT network monitoring software and infrastructure, based on two Python scripts:

Monitoring agents: Run on NRS nodes, analyze the peer network traffic, grab CPU and memory load and send the data to a centralized server. The agents work from outside NRS, meaning that they look at the network traffic on the OS level.

Monitoring server: Centralized data collector which also spits out metrics.
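To make the agent idea concrete, here is a minimal sketch of what the CPU and memory part of such an agent could look like. It is not Marcus03's actual script (that code is not shown in this thread). It assumes a Linux host with `/proc/meminfo`, leaves out the peer traffic analysis entirely, and the report fields and the one-JSON-line-over-TCP format are hypothetical choices made only for illustration.

```python
import json
import os
import socket
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def build_report(node_id, load_1min, meminfo):
    """Assemble one sample for the central server (field names are made up)."""
    total = meminfo.get("MemTotal", 0)
    available = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    used_pct = round(100.0 * (total - available) / total, 1) if total else None
    return {
        "node": node_id,
        "timestamp": int(time.time()),
        "load_1min": load_1min,
        "mem_used_pct": used_pct,
    }


def collect(node_id=None):
    """Take one sample on a Linux host; runs outside NRS, at the OS level."""
    with open("/proc/meminfo") as fh:
        meminfo = parse_meminfo(fh.read())
    load_1min = os.getloadavg()[0]  # 1-minute load average (Unix only)
    return build_report(node_id or socket.gethostname(), load_1min, meminfo)


def send(report, host, port):
    """Ship a report as one JSON line over TCP (hypothetical wire format)."""
    payload = (json.dumps(report) + "\n").encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
```

An agent built this way would call `send(collect(), host, port)` on a timer, with the server address kept private as the proposal suggests.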

I currently have two agents running on two NRS VPS nodes. The plan is to have 10 VPS in datacenters around the world with the NRS and the agent running on them and the 10 servers not being peers of each other. IPs of the agent servers and the central server are not generally known to prevent DOS attacks. My solution is centralized and I think the monitoring should actually be outside the NRS core and centralized.

ED: Cut a lot, see for the rest: https://nxtforum.org/general/%27nxt-network-monitoring%27-funding-request/new/#new

Comments (8)

EvilDave reporter
- edited description
- 2014-03-31T22:05:21+00:00
EvilDave reporter
Personally, I like the sound of this. Centralisation is maybe an issue, but network monitoring is always going to be centralised at some point.

I understand that Marcus has put out a request to the entire community for funding, not solely to InfCom, but we should discuss this now, both to be ready to support M if needed and to help any potential NXT donors with the decision to fund it or not.
- 2014-03-31T22:09:47+00:00
EvilDave reporter
- edited description
- 2014-03-31T22:10:42+00:00
Ian Ravenscroft
So when I run a VPS I use the provider's SW to monitor CPU, Disk, Network and NRS availability and alert me if there is a problem... I don't understand why I would install someone else's SW to do this? I am not clear on the outcome of having this data centrally. I need the info / alarm as the node owner in order to fix it if it fails.

I run major centralised infrastructure so I understand the benefit of this where the ownership / management of the infrastructure is centralised - in this case it's not, so I am finding it hard to see the benefit to justify this level of funding.

I think a HOW-TO for people to set up a fully monitored and alarmed VPS and a set of automated Linux install scripts would probably do more for the network health, so people would feel less threatened by the job of setting up and running a node.

I think James' stuff, which is more focussed on the behaviour of NRS itself, is adding something that is not already available - it would be good to see Marcus and James join forces on this one.
- 2014-04-03T15:55:58+00:00
EvilDave reporter
Welcome back, chanc3r.

I think we need to take a small step backwards here and take some time to define precisely what we want to monitor. One of the major issues/targets for NXT has always been the magical 1000TPS, and while that may not be immediately achievable it would be very useful to know what the current TPS rate is.

Monitoring TPS will imply that we have to gather information about the entire NXTwork and its performance, which will then allow us to configure / encourage the NXTwork to achieve a higher TPS, once we know which variables most affect TPS.

Apart from TPS, there isn't much more that we really need to know about the NXTwork's day-to-day operation, IMHO.

But we do need to guard against abnormal circumstances: attacks such as Harmony666's recent attempt to confuse BTER and his successful knocking out of forging nodes, or the DDOS all NXT assets suffered at the end of 2013 / beginning of 2014. Forking of the blockchain also needs to be an alertable situation.
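As a rough illustration of what "alertable" could mean for forks: if each monitored node reports the block it sees at a given height, a central collector can flag any height where the nodes disagree. The sketch below assumes reports arrive as `(agent, height, block_id)` tuples. That shape, and the function name, are invented for this example and are not taken from any of the tools mentioned in this thread.

```python
from collections import defaultdict


def find_forks(reports):
    """Return {height: set of block ids} for heights where agents disagree.

    `reports` is an iterable of (agent, height, block_id) tuples
    (an assumed shape, used only for this example).
    """
    seen = defaultdict(set)
    for _agent, height, block_id in reports:
        seen[height].add(block_id)
    # More than one distinct block id at the same height suggests a fork.
    return {h: ids for h, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    sample = [
        ("agent-a", 100, "X"),
        ("agent-b", 100, "X"),
        ("agent-c", 100, "Y"),  # disagrees with the others at height 100
        ("agent-a", 101, "Z"),
    ]
    for height, ids in find_forks(sample).items():
        print("ALERT: %d competing blocks at height %d" % (len(ids), height))
```

This only compares blocks at the same height, so a node that is merely lagging behind is not flagged. A real monitor would also need to tolerate the short-lived disagreements that occur while a new block propagates.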

I do like the rough outline of Marcus' concept, and feel that he will deliver, but I would also like to see a little bit more co-operation / cross-fertilisation of ideas between developers. l8orre has also claimed to have network monitoring capabilities in his FreeRider client, also in Python, and we are all familiar (more or less) with James' work on Nodecoin.

Put them all in a small room together, and don't let them out until we have a network monitor.
- 2014-04-03T23:28:55+00:00
Ian Ravenscroft
Current max tps is easy: it's 3. 250 tx per block, 1 block per minute (i.e. 60s), so you submit your TX and wait max 60s to be included, as long as the network gets your TX to a forger and there aren't 250 TX in front of you...
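For reference, the arithmetic can be written out. Taking the figures in this comment at face value (250 tx per block, one block every 60 s), the ceiling is 250 / 60, which is roughly 4.2 tx/s, a little above the 3 quoted. The second function is a sketch of how an observed rate could be estimated from recent blocks. The `(timestamp, tx_count)` input format is an assumption made for this example, not a real NRS API.

```python
def max_tps(tx_per_block, block_interval_s):
    """Theoretical ceiling: transactions per block over seconds per block."""
    return tx_per_block / block_interval_s


def observed_tps(blocks):
    """Estimate tps from a list of (timestamp_s, tx_count), oldest first."""
    if len(blocks) < 2:
        return 0.0
    elapsed = blocks[-1][0] - blocks[0][0]
    if elapsed <= 0:
        return 0.0
    # Transactions in the first block predate the window, so skip them.
    tx = sum(count for _ts, count in blocks[1:])
    return tx / elapsed


if __name__ == "__main__":
    print(round(max_tps(250, 60), 2))
    print(observed_tps([(0, 5), (60, 10), (120, 20)]))
```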

Long way from 1000 - already had this debate in the thread with CIYAM:

Current limit: 100 tps
With TF: 100 tps achievable
With TF and Parallel Chains: 1000 tps

I think Wesley's client already provides an approx tps value if you select the blocks tab.
- 2014-04-05T16:14:48+00:00
ferment
A really simple way to do this is to write some plugins for Scoutapp. https://scoutapp.com

Centralized but way cheaper than from scratch.
- 2014-05-05T16:10:40+00:00
ferment
I'd like to see something like this broken out into deliverables that are reusable so that the community gets benefit after the scope of the funding.

I'm ok with centralization of data collection, but the agents and collectors should be open source so that multiples can exist if needed.
- 2014-05-05T16:20:14+00:00

Assignee: –

Type: proposal

Priority: major

Status: new

Component: Infrastructure

Votes: 0

Watchers: 4