NMS-9475: Document the logic behind the response time value reported by the SnmpMonitor


NMS-9475: Document the logic behind the response time value reported by the SnmpMonitor

Seibold, Michael-2
Hi Alejandro,

can you confirm that this response time behaviour is the same for all polling services, or do they use different models for the timeouts?

The current behaviour might actually be quite useful in some situations:

I have often wondered how I could count the retries that occur while polling services. Retries are obviously helpful for avoiding too many false alarms. We used to have some scripts that applied statistical functions, such as summing up all retries for a given service / location / name_it, summing up response times for all locations, calculating standard deviations, etc., to spot problems in provider networks even when there were no outage alarms. But this required additional polling of those services from outside the monitoring system.

With the current behaviour and a little bit of thinking, it should be possible to derive the retry count from the response times, something like sum(retries = integer(response time / timeout)) for all nodes in remote locations.
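
A minimal sketch of that calculation in Python, for illustration only. It assumes that every failed attempt consumes the full timeout and that the successful attempt completes in less than one timeout; the function names and sample values are hypothetical and not part of OpenNMS.

```python
from typing import Iterable


def estimate_retries(response_time_ms: float, timeout_ms: float) -> int:
    """Estimate the retries hidden inside one cumulative response time.

    Assumption: each failed attempt used up the whole timeout, so the
    integer part of response_time / timeout is the number of retries.
    """
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    return int(response_time_ms // timeout_ms)


def total_retries(samples_ms: Iterable[float], timeout_ms: float) -> int:
    """Sum the estimated retries over a set of response time samples."""
    return sum(estimate_retries(rt, timeout_ms) for rt in samples_ms)


# Hypothetical samples collected with a 3000 ms timeout.
samples_ms = [120.0, 3150.0, 6400.0, 80.0]
print(total_retries(samples_ms, timeout_ms=3000.0))  # prints 3
```

The estimate only holds for monitors that report the cumulative time across attempts; a monitor that reports only the last attempt would always yield zero.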

-Michael



NMS-9475 Description:

During a support session, I discovered that the response time graphs associated with the SnmpMonitor show values greater than the configured timeout.

Digging into the code, I found the reason: the TimeTracker responsible for returning the response time is created outside the retry loop and is not re-initialized on each attempt (or retry). So, if the monitor has to retry and gets the response within one of the retry attempts, the reported response time is the total time spent across all attempts (which can be greater than the timeout).
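
The effect can be illustrated with a small simulation. This is a sketch in Python, not the actual SnmpMonitor code (which is Java); the function name, the `reset_per_attempt` flag, and the durations are all hypothetical, and a timed-out attempt is assumed to consume exactly the timeout.

```python
from typing import List, Optional


def poll_with_retries(
    attempt_durations_ms: List[float],
    timeout_ms: float,
    reset_per_attempt: bool,
) -> Optional[float]:
    """Simulate one poll and return the reported response time in ms.

    attempt_durations_ms lists how long the agent would take to answer on
    each attempt. An attempt lasting timeout_ms or longer is a timeout.
    Returns None if every attempt times out.
    """
    # Plays the role of a tracker created once, outside the retry loop.
    elapsed_total = 0.0
    for duration in attempt_durations_ms:
        if duration >= timeout_ms:
            elapsed_total += timeout_ms  # the whole timeout was spent waiting
            continue
        elapsed_total += duration
        # Per-attempt timing reports only the successful attempt;
        # cumulative timing reports everything since the first request.
        return duration if reset_per_attempt else elapsed_total
    return None


# Two timeouts followed by a 200 ms answer, with a 3000 ms timeout.
attempts = [5000.0, 5000.0, 200.0]
print(poll_with_retries(attempts, 3000.0, reset_per_attempt=False))  # 6200.0
print(poll_with_retries(attempts, 3000.0, reset_per_attempt=True))   # 200.0
```

With the cumulative tracker the reported 6200 ms exceeds the 3000 ms timeout, which matches what the graphs show.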

A future enhancement could be to add an optional parameter that lets the user choose the behavior: either the total transaction time or the time spent on the last attempt.

For now, updating the documentation to reflect the current behavior is enough.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Please read the OpenNMS Mailing List FAQ:
http://www.opennms.org/index.php/Mailing_List_FAQ

opennms-discuss mailing list

To *unsubscribe* or change your subscription options, see the bottom of this page:
https://lists.sourceforge.net/lists/listinfo/opennms-discuss


Re: NMS-9475: Document the logic behind the response time value reported by the SnmpMonitor

Alejandro Galue-3
Hi Michael,

On Jul 10, 2017, at 11:45 AM, Seibold, Michael <[hidden email]> wrote:

can you confirm that this response time behaviour is the same for all polling services or do they use different models for the timeouts?

Short answer: I don’t know.

There are about 80 different monitor implementations, so it could take a while as I have to study the source code of each of them in order to tell you if the behavior is the same or not.

Alejandro Galue
[hidden email]
PGP Key Fingerprint: 5293 6234 1E75 DF30 7821  1823 87AF 972E DAF8 BE2C



Re: NMS-9475: Document the logic behind the response time value reported by the SnmpMonitor

Seibold, Michael-2

Hi Alejandro,

 

don't bother with all of them; that is way too much work. But maybe you can check ICMP, because SNMP and ICMP together already provide enough measurement data to get good statistical results, and those two are measured on most of the monitored equipment.

 

Thanks a lot

Michael

 



Re: NMS-9475: Document the logic behind the response time value reported by the SnmpMonitor

Alejandro Galue-3
Michael,

On Jul 10, 2017, at 12:11 PM, Seibold, Michael <[hidden email]> wrote:

don't bother with all of them; that is way too much work. But maybe you can check ICMP, because SNMP and ICMP together already provide enough measurement data to get good statistical results, and those two are measured on most of the monitored equipment.

ICMP is tricky because of how it is implemented (and because there are several implementations).

It seems that the response time is the time spent receiving a successful response, measured from the moment that request was sent.

So, regardless of how many retries you have, you'll get the time for the last, successful attempt (which is not the case for the SnmpMonitor).

Alejandro Galue
[hidden email]
PGP Key Fingerprint: 5293 6234 1E75 DF30 7821  1823 87AF 972E DAF8 BE2C



Re: NMS-9475: Document the logic behind the response time value reported by the SnmpMonitor

Seibold, Michael-2

Hi Alejandro,

 

> So, regardless of how many retries you have, you'll get the time for the last, successful attempt (which is not the case for the SnmpMonitor).

 

thanks a lot!

 

-Michael

