Performance issues

Jimisola Laursen
Administrator
Hi!

We are experiencing major performance issues with the OpenNMS web GUI after activating Syslogd. The event counter was around 100,000 a week ago and is now ~330,000 (i.e. more than 200,000 events in one week!).

The setup has 180 Nodes and 201 Interfaces.

The performance issue is apparent throughout the application, with "List all events" being the worst.

What are others' experiences with OpenNMS's performance?

We will apparently have to limit the incoming syslog messages, but there will still be a substantial amount entering OpenNMS.

Is there something like Apache's

    Order Deny,Allow
    Deny from all
    Allow from <ueiList><ueiMatch>...

in OpenNMS Syslogd?

I noticed the <hideMessage><hideMatch> in syslogd-configuration.xml, but it states nothing about the order (ueiMatch, hideMatch or hideMatch, ueiMatch), and I'm literally unable to do anything in OpenNMS at the moment due to the excess of events.

I need to delete all syslogd events directly in the database in order to go on from here.
Had a look at the database schema (http://www.opennms.org/index.php/OpenNMS_database_schema).
Will a "delete from events where eventuei LIKE 'uei.opennms.org/syslogd%'" work, or are there other relations/foreign keys that need consideration?

Regards,
Jimisola

Re: Performance issues

Jimisola Laursen
Administrator
A quick look in the database shows that 229082 of 265028 events are due to syslogd.

Re: Performance issues

Jeff Gehlbach
In reply to this post by Jimisola Laursen
On Jul 18, 2008, at 5:56 AM, Jimisola Laursen wrote:
> We are experiencing some major performance issues with OpenNMS web GUI
> after activating Syslogd. The event counter was around 100000 a week ago
> and is now ~330000 (i.e. the number of events in one week has been >200 000!).

If your system is dragging its butt with only 330K events, then you  
need to do some tuning.  The first step I would recommend is putting  
the DB tables on their own spindle or RAID volume set.  Or you can  
move PostgreSQL to a separate system.  Either one will make a huge  
difference.

> The setup has 180 Nodes and 201 Interfaces.

That's not very many compared to some installations I've worked on.  
What kind of hardware are you on?

> What are others experience with OpenNMS's performance?

It just gets happier if you give it (in rough order) more RAM, more  
and faster disks, less RAID5 (i.e. avoid it), and more CPU cores.

> We will apparently have to limit the incoming syslog messages, but there
> will still be a substantial amount entering OpenNMS.

Yes.  You can usually pre-filter a lot of stuff that you know you  
don't care about at the syslog-ng layer.

> Is there something like Apache's
>
>    Order Deny,Allow
>    Deny from all
>    Allow from <ueiList><ueiMatch>...
>
> in OpenNMS Syslogd?

No, but there is in syslog-ng.  I also mentioned in my reply to you on  
another thread that there ought to be a way to discard certain syslog  
messages before they get turned into events.  This is possible today  
with SNMP traps.

> I noticed the <hideMessage><hideMatch> in syslogd-configuration.xml, but it
> states nothing about the order (ueiMatch, hideMatch or hideMatch, ueiMatch)
> and I'm literally unable to do anything in OpenNMS at the moment due to the
> excess of events.

hideMatch will do nothing to curb the number of events.  Its purpose  
is to allow syslog-derived events to be created, but to have the  
message body excised for security reasons if it matches a certain  
pattern, like "[Ll]ogin failed for user (.*?) with password (.*)"
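
To make that concrete, here is a minimal Python sketch of the behaviour described above. It is not the Syslogd code, only an illustration: the function name hide_message and the placeholder text are made up, and the only thing taken from this thread is the pattern itself. The point is that a message is never dropped; only its body changes.

```python
import re

# Pattern from the post above; Python's re module accepts the same syntax here.
HIDE_PATTERN = re.compile(r"[Ll]ogin failed for user (.*?) with password (.*)")

# Hypothetical placeholder text, not what OpenNMS actually writes.
HIDDEN_TEXT = "(message hidden)"


def hide_message(message: str) -> str:
    """Return the message unchanged unless it matches the hide pattern."""
    if HIDE_PATTERN.search(message):
        return HIDDEN_TEXT
    return message


if __name__ == "__main__":
    for line in (
        "Login failed for user bob with password hunter2",
        "link up on eth0",
    ):
        # One output line per input line: nothing is discarded, only masked.
        print(hide_message(line))
```

Since every input still produces an output, this kind of matching cannot reduce the event count.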

> I need to delete all syslogd events directly in the database in order to go
> on from here.
> Had a look at the database schema
> (http://www.opennms.org/index.php/OpenNMS_database_schema).
> Will a "delete from events where eventuei LIKE 'uei.opennms.org/syslogd%'"
> work or are there other relations/foreign keys that need consideration?

This is a fine query.  A better constraint would be WHERE eventsource  
= 'syslogd'.
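
A rough sketch of the count, delete, count-again routine, written against Python's built-in sqlite3 so it runs anywhere. Everything here is a stand-in: the table has only the columns named in this thread, the sample rows are made up, and the helper names are hypothetical. The real events table has more columns and may be referenced by other tables, so try it on a copy first.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the events table (three columns only)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events ("
        "eventid INTEGER PRIMARY KEY, eventuei TEXT, eventsource TEXT)"
    )
    db.executemany(
        "INSERT INTO events (eventuei, eventsource) VALUES (?, ?)",
        [
            # Made-up sample rows for illustration.
            ("uei.opennms.org/syslogd/example", "syslogd"),
            ("uei.opennms.org/syslogd/other", "syslogd"),
            ("uei.opennms.org/nodes/nodeDown", "some-other-source"),
        ],
    )
    db.commit()
    return db


def count_events(db: sqlite3.Connection) -> int:
    """Same count query as suggested elsewhere in this thread."""
    return db.execute("SELECT COUNT(eventid) FROM events").fetchone()[0]


def delete_by_source(db: sqlite3.Connection, source: str) -> int:
    """Delete all events from one source in a single transaction."""
    with db:  # commits on success, rolls back if the DELETE raises
        cur = db.execute("DELETE FROM events WHERE eventsource = ?", (source,))
        return cur.rowcount


if __name__ == "__main__":
    demo = make_demo_db()
    print("before:", count_events(demo))
    print("deleted:", delete_by_source(demo, "syslogd"))
    print("after:", count_events(demo))
```

Matching on the eventsource column is an exact comparison, whereas the LIKE form has to pattern-match every eventuei string.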

If you do this, keep in mind that PostgreSQL's query planner will  
still be all out of whack.  To reset it, you can do a VACUUM FULL  
ANALYZE (with OpenNMS stopped), but that can take a really long time.  
It's usually faster and at least as effective to stop OpenNMS, use  
pg_dumpall to dump your DB, drop the "opennms" DB, and then load the  
dump you made.  Then you can start OpenNMS again.

-jeff

_______________________________________________
Please read the OpenNMS Mailing List FAQ:
http://www.opennms.org/index.php/Mailing_List_FAQ

opennms-discuss mailing list

To *unsubscribe* or change your subscription options, see the bottom of this page:
https://lists.sourceforge.net/lists/listinfo/opennms-discuss

Re: Performance issues

Jimisola Laursen
Administrator
> If your system is dragging its butt with only 330K events, then you  
> need to do some tuning.  The first step I would recommend is putting  
> the DB tables on their own spindle or RAID volume set.  Or you can  
> move PostgreSQL to a separate system.  Either one will make a huge  
> difference.

>> The setup has 180 Nodes and 201 Interfaces.

> That's not very many compared to some installations I've worked on.  
> What kind of hardware are you on?

I'm not sure, but I'll ask and get back to you on that. The machine has a CPU usage of <20% and there was plenty of memory left. The guy that installed OpenNMS is back next week, so I'll ask him to have a look at it as well.

> Yes.  You can usually pre-filter a lot of stuff that you know you  
> don't care about at the syslog-ng layer.

Yes, we might have to do that if the performance issues persist.
However, I had some problems with syslog-ng when I used a source (such as internal() or udp()) in more than one place. That is, I created a source s_all which contained internal() and different versions of udp(X port(514)), with X being e.g. "0.0.0.0" or "". Is this something you are familiar with?

>> Is there something like Apache's
>>
>>    Order Deny,Allow
>>    Deny from all
>>    Allow from <ueiList><ueiMatch>...
>>
>> in OpenNMS Syslogd?

> No, but there is in syslog-ng.  I also mentioned in my reply to you on  
> another thread that there ought to be a way to discard certain syslog  
> messages before they get turned into events.  This is possible today  
> with SNMP traps.

OK. I had a detailed look at the syslogd source code and have some concerns, but I'll raise those in another thread.

> hideMatch will do nothing to curb the number of events.  Its purpose  
> is to allow syslog-derived events to be created, but to have the  
> message body excised for security reasons if it matches a certain  
> pattern, like "[Ll]ogin failed for user (.*?) with password (.*)"

Yes, I confirmed that before I got your reply.

>> I need to delete all syslogd events directly in the database in order to go
>> on from here.
>> Had a look at the database schema
>> (http://www.opennms.org/index.php/OpenNMS_database_schema).
>> Will a "delete from events where eventuei LIKE 'uei.opennms.org/syslogd%'"
>> work or are there other relations/foreign keys that need consideration?

> This is a fine query.  A better constraint would be WHERE eventsource  
> = 'syslogd'.

Thanks. I found the eventsource myself and it was very convenient :)

> If you do this, keep in mind that PostgreSQL's query planner will  
> still be all out of whack.  To reset it, you can do a VACUUM FULL  
> ANALYZE (with OpenNMS stopped), but that can take a really long time.  
> It's usually faster and at least as effective to stop OpenNMS, use  
> pg_dumpall to dump your DB, drop the "opennms" DB, and then load the  
> dump you made.  Then you can start OpenNMS again.

I didn't know about this. I'll do that next week.

To sum it up, I'll be at work Thursday next week and will have another look at the performance issue with the system administrator, then get back here. Using syslog-ng to filter out messages early is the solution for now, but this means double work, since I will explicitly have to let through the syslog messages that I am pinpointing in OpenNMS for event creation.

Regards,
Jimisola

Re: Performance issues

Jimisola Laursen
Administrator
The one-time-only usage of sources in syslog-ng.conf is described here:

http://www.balabit.com/dl/guides/syslog-ng-v2.0-guide-admin-en.pdf

see last paragraph of "3.3. Sources and source drivers".

It can, of course, be worked around with some extra work.

Jimisola

Re: Performance issues

Jeff Gehlbach
In reply to this post by Jimisola Laursen
On Jul 18, 2008, at 1:33 PM, Jimisola Laursen wrote:
> I'm not sure, but I'll ask and get back to you on that. The machine has a
> CPU usage of <20% and there was plenty of memory left. The guy that
> installed OpenNMS is back next week, so I'll ask him to have a look at it
> as well.

What's the load average running?  iostat is your friend for checking  
on I/O wait.


>> Yes.  You can usually pre-filter a lot of stuff that you know you
>> don't care about at the syslog-ng layer.
>
> Yes, we might have to do that if the performance issues persist.

You should do that regardless -- if one of your systems or devices is  
prone to spewing nonsense syslog messages, and you know that they're  
nonsense, why wouldn't you filter them out?

> However, I had some problems with syslog-ng when I used a source (such as
> internal(), udp()) in more than one place. That is, I created source s_all
> which contained internal() and different versions of udp(X port(514))
> (X e.g. "0.0.0.0", ""). Is this something you are familiar with?

As I said before, I don't mangle syslog-ng very often.  I would have  
to go read the online docs and figure it out for myself and then pass  
that along to you, and I just don't have time to do that right now :)

> Ok. I had a detailed look at the syslogd source code and have some concerns,
> but I'll take that in another thread.

Yes, it's a terrible mess and needs to be refactored.  It does mostly  
work in its present state, though.

> Using syslog-ng to filter out early is the solution for now, but this
> means double work since I will explicitly have to let through the syslog
> messages that I am pinpointing in OpenNMS for event creation.

I would suggest that you take the approach of using syslog-ng as a  
blacklisting tool, not a whitelisting one.
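
The difference between the two approaches, as a small Python sketch. This is not syslog-ng configuration syntax, and the patterns and function names are hypothetical; it only shows what happens to a message that neither list knows about.

```python
import re

# Hypothetical patterns, purely for illustration.
BLACKLIST = [re.compile(r"last message repeated \d+ times")]
WHITELIST = [re.compile(r"[Ll]ogin failed"), re.compile(r"link (up|down)")]


def passes_blacklist(message: str) -> bool:
    """Forward everything except messages known to be noise."""
    return not any(p.search(message) for p in BLACKLIST)


def passes_whitelist(message: str) -> bool:
    """Forward only messages known to be interesting."""
    return any(p.search(message) for p in WHITELIST)


if __name__ == "__main__":
    samples = [
        "last message repeated 12 times",  # known noise
        "Login failed for user bob",       # known interesting
        "fan 3 speed below threshold",     # unknown to both lists
    ]
    for msg in samples:
        print(f"{msg!r}: blacklist={passes_blacklist(msg)} "
              f"whitelist={passes_whitelist(msg)}")
```

With a blacklist, a message nobody anticipated still gets through, at the price of more volume. With a whitelist, it is dropped silently until someone adds a pattern for it.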

-jeff

Re: Performance issues

Jimisola Laursen
Administrator
> What's the load average running?  iostat is your friend for checking
> on I/O wait.

Thanks for the tip. I'll check.

> You should do that regardless -- if one of your systems or devices is  
> prone to spewing nonsense syslog messages, and you know that they're  
> nonsense, why wouldn't you filter them out?

I feel the same way, but I just started digging in the system. Excess information hides important information. However, I'd prefer to fix the syslog messages at source if possible.

> As I said before, I don't mangle syslog-ng very often.  I would have  
> to go read the online docs and figure it out for myself and then pass  
> that along to you, and I just don't have time to do that right now :)

Of course not. I'll have another look and report back if I find something useful.

> Yes, it's a terrible mess and needs to be refactored.  It does mostly  
> work in its present state, though.

Mostly was the word :)

> I would suggest that you take the approach of using syslog-ng as a  
> blacklisting tool, not a whitelisting one.

I'm afraid that the variety of syslog messages is far too great to blacklist.
Most likely, it will be easier to whitelist the ones that we are interested in.
We can always use php-syslog-ng to find the ones that are of interest.

Jimisola

Re: Performance issues

Jimisola Laursen
Administrator
Update on the performance issues...

A "Home / Events / All Events" request has been running for more than 5 minutes now without the result coming up in the GUI (using 1.5.91).

Ran top and iostat during the request:

1) top

top - 11:47:26 up 199 days,  1:33,  4 users,  load average: 7.59, 5.21, 3.18
Tasks: 345 total,   1 running, 343 sleeping,   1 stopped,   0 zombie
Cpu(s): 19.9%us,  4.0%sy,  0.0%ni, 31.2%id, 43.0%wa,  1.5%hi,  0.5%si,  0.0%st
Mem:   1035296k total,   754756k used,   280540k free,    48748k buffers
Swap:  1638392k total,   165684k used,  1472708k free,   381872k cached

2) iostat

Linux 2.6.16.21-0.8-smp (mon02)     07/24/08

avg-cpu:  %user   %nice %system %iowait   %idle
           3.51    0.02    1.30    7.28   87.88

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
hda              26.69        69.72        31.13 1199053914  535334160
hdb              26.42       182.11        31.13 3131845954  535333268
md0              85.19         2.07        13.28   35624472  228384192
dm-0             29.12       118.90       200.26 2044900224 3444023032
dm-1              0.69         2.22         3.26   38254000   56075072
dm-2             14.69        60.26        98.28 1036307456 1690175184
dm-3             40.69        70.42       210.96 1211128696 3628045496

3) free

             total       used       free     shared    buffers     cached
Mem:       1035296    1021648      13648          0      98496     575376
-/+ buffers/cache:     347776     687520
Swap:      1638392     165684    1472708

4) cat /proc/cpuinfo

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 3.06GHz
stepping        : 9
cpu MHz         : 3066.473
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips        : 6142.17

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 3.06GHz
stepping        : 9
cpu MHz         : 3066.473
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips        : 6132.08


Looking at the output I find it difficult to understand why we are experiencing performance issues with OpenNMS. I'm not familiar with the output from iostat. Are there any hints there?

Regards,
Jimisola

Re: Performance issues

Jeff Gehlbach
On Jul 24, 2008, at 6:08 AM, Jimisola Laursen wrote:

> A "Home / Events / All Events" request has been running for more than 5
> minutes now without the result coming up in the GUI (using 1.5.91).

I think you simply have too damn many events in your DB:

psql -U opennms -h localhost -c 'SELECT COUNT(eventid) FROM events;'


> Ran top and iostat during the request:
>
> 1) top
>
> top - 11:47:26 up 199 days,  1:33,  4 users,  load average: 7.59, 5.21, 3.18
> Tasks: 345 total,   1 running, 343 sleeping,   1 stopped,   0 zombie
> Cpu(s): 19.9%us,  4.0%sy,  0.0%ni, 31.2%id, 43.0%wa,  1.5%hi,  0.5%si,  0.0%st
> Mem:   1035296k total,   754756k used,   280540k free,    48748k buffers
> Swap:  1638392k total,   165684k used,  1472708k free,   381872k cached

Your system is spending LOTS of time waiting (43.0% according to  
top).  You're hitting swap quite hard.  Adding more RAM is probably a  
very good idea.
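
If you want to pull the wait figure out of top output in a script, something like the following Python sketch would do. It assumes the exact "Cpu(s):" line format quoted above, which varies between top versions, and the function name is made up. The "wa" field is the share of time the CPUs sat idle waiting on I/O.

```python
import re

# The Cpu(s) line from the top output quoted above.
TOP_CPU_LINE = (
    "Cpu(s): 19.9%us,  4.0%sy,  0.0%ni, 31.2%id, 43.0%wa,"
    "  1.5%hi,  0.5%si,  0.0%st"
)


def parse_cpu_line(line: str) -> dict:
    """Map each two-letter field of a top 'Cpu(s):' line to its percentage."""
    pairs = re.findall(r"([\d.]+)%(\w\w)", line)
    return {name: float(value) for value, name in pairs}


if __name__ == "__main__":
    fields = parse_cpu_line(TOP_CPU_LINE)
    print("I/O wait:", fields["wa"])
    print("idle:", fields["id"])
```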

> 2) iostat
>
> Linux 2.6.16.21-0.8-smp (mon02)     07/24/08
>
> avg-cpu:  %user   %nice %system %iowait   %idle
>            3.51    0.02    1.30    7.28   87.88

I would expect the %iowait here to be closer to the wait% reported by  
top.  That may be just a timing issue.

> Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
> hda              26.69        69.72        31.13 1199053914  535334160
> hdb              26.42       182.11        31.13 3131845954  535333268

Tell me about hda and hdb -- they're presumably IDE / PATA disks?  
What's the rotation speed and cache size?

> dm-0             29.12       118.90       200.26 2044900224 3444023032

Lots of writes to whichever filesystem is on dm-0.

> dm-1              0.69         2.22         3.26   38254000   56075072
> dm-2             14.69        60.26        98.28 1036307456 1690175184
> dm-3             40.69        70.42       210.96 1211128696 3628045496

Ditto the filesystem on the dm-3 device.

>              total       used       free     shared    buffers     cached
> Mem:       1035296    1021648      13648          0      98496     575376
> -/+ buffers/cache:     347776     687520
> Swap:      1638392     165684    1472708

See comments above regarding memory.
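
For reading the free output quoted above: the "-/+ buffers/cache" line is plain arithmetic on the "Mem:" line. A short Python sketch with the numbers from this thread (the function names are made up for the example):

```python
# Values in kB, copied from the "Mem:" line of the free output above.
USED, FREE = 1021648, 13648
BUFFERS, CACHED = 98496, 575376


def used_by_applications(used_kb: int, buffers_kb: int, cached_kb: int) -> int:
    """Memory in use once buffers and page cache are subtracted."""
    return used_kb - buffers_kb - cached_kb


def free_including_cache(free_kb: int, buffers_kb: int, cached_kb: int) -> int:
    """Memory that is free or only holding buffers and cache."""
    return free_kb + buffers_kb + cached_kb


if __name__ == "__main__":
    # These reproduce the two numbers on the "-/+ buffers/cache" line.
    print(used_by_applications(USED, BUFFERS, CACHED))
    print(free_including_cache(FREE, BUFFERS, CACHED))
```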

> 4) cat /proc/cpuinfo

You should be fine on CPU.  Memory is much more important.

Add as much RAM to this system as it will hold and I think you will  
see a dramatic performance improvement.  If you can get 10KRPM or  
15KRPM SCSI or SAS disks, that will help too, but only after you get  
enough RAM to keep the system from regularly hitting swap while still  
having a good amount of filesystem cache and I/O buffers.

-jeff

Re: Performance issues

Jimisola Laursen
Administrator

Jeff Gehlbach wrote:
> On Jul 24, 2008, at 6:08 AM, Jimisola Laursen wrote:
>
>> A "Home / Events / All Events" request has been running for more than 5
>> minutes now without the result coming up in the GUI (using 1.5.91).
>
> I think you simply have too damn many events in your DB:
>
> psql -U opennms -h localhost -c 'SELECT COUNT(eventid) FROM events;'

I started a VACUUM before I left today, at which point I had ~340,000 events.
After deleting all syslog events and vacuuming, the performance is OK again.

The initial OpenNMS installation is actually not my responsibility, but now I have enough information to hand over the issue to another guy :)

Adding memory and changing the disk setup just because of OpenNMS is not likely. I believe we will have to filter syslog messages going into OpenNMS using syslog-ng.

Thanks for the quick response.

Regards,
Jimisola

Re: Performance issues

Jeff Gehlbach
On Jul 24, 2008, at 12:42 PM, Jimisola Laursen wrote:
> Adding memory and changing the disk setup just because of OpenNMS is not
> likely.

If you continue adding nodes, your performance is going to be bad if  
your system is hitting swap.  RAM is virtually free today -- throw  
your poor server a bone :)

> I believe we will have to filter syslog messages going into OpenNMS using
> syslog-ng.


I've been telling you that for weeks ;)

-jeff
