We're moving to Zabbix 2.2 from HP OpenView now. Could you share your experience with the following problems?

1. In Zabbix, you can't manually reset triggers to an OK state; a trigger only resets to OK on the receipt of data indicating that the state is OK, so it will either reset to OK prematurely or will never reset to OK. In OpenView, if an event occurs and you've taken care of the problem, you simply acknowledge the event and it falls off the screen.

We use Zabbix pretty heavily (slightly less than 1,000 hosts in prod, somewhere around 550 new values/sec). To address the few points made by keep667: yes, the logfile monitoring isn't that great. We actually use logstash (for log aggregation and querying), which has a few hooks for integration with Zabbix. They play very nicely together and get around a lot of the native implementation's issues. Concerning your original questions, initial setup is quite easy (grab a few packages and then compile from source). As far as the GUI goes, there are alternatives, but the real advantage for us is that once a trigger trips into an alarm, we have a number of purpose-built screens automagically linked to the alert notification, so we can view more than just the particular item that failed (CPU is high => why is CPU high => OMG, the load balancer shows we went from 5k connections to 250k connections => it's a traffic issue, and we're able to get someone looking at it in less than 30s). If you're in a high-traffic environment (around a thousand hosts, or above 400 or so new values per second), you'll probably want to ditch the purging scheme they use and write your own - table partitioning and automatic drops of the partitions are a good idea; see the sketch below. The support when you don't pay is still pretty decent. When you do pay, it's quite good and frankly quite cheap.
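To illustrate the partitioning idea - this is a minimal sketch only, assuming MySQL and the stock Zabbix schema (history tables keyed on an integer 'clock' column); the partition names and dates are made up:

    -- Range-partition the history table by day on its integer 'clock'
    -- column, so old data can be removed with a cheap partition drop
    -- instead of the housekeeper's row-by-row DELETEs.
    ALTER TABLE history PARTITION BY RANGE (clock) (
        PARTITION p2013_12_01 VALUES LESS THAN (UNIX_TIMESTAMP('2013-12-02 00:00:00')),
        PARTITION p2013_12_02 VALUES LESS THAN (UNIX_TIMESTAMP('2013-12-03 00:00:00'))
    );

    -- A nightly cron job or stored procedure then adds tomorrow's
    -- partition and drops the oldest one:
    ALTER TABLE history ADD PARTITION (
        PARTITION p2013_12_03 VALUES LESS THAN (UNIX_TIMESTAMP('2013-12-04 00:00:00'))
    );
    ALTER TABLE history DROP PARTITION p2013_12_01;

If you go this route, you'd also want to disable Zabbix's built-in housekeeper for the partitioned tables so the two purging schemes don't overlap.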
We're moving to Zabbix from HP OpenView (our ancient version was last updated in 2006), and currently have ~80 of ~300 hosts moved over. At the moment, we have Zabbix configured to send SNMP traps to OpenView so that our NOC can continue to use the same interface for the time being. It's open source, so you can try to fix shortcomings, and there's nice built-in graphing/trending functionality. The web interface is a little lacking if you actually have someone looking at it 24/7. One issue is that in the 'triggers' monitoring interface, you often can't quickly tell exactly what datapoint caused the trigger to trip. For example, if you have a trigger set to trip if it sees the word 'ERROR' in a logfile, the trigger monitoring interface will show a PROBLEM state for that trigger, but you have to navigate to a separate history page for the item in order to see the content of the error message. Log file monitoring is problematic in version 1.8 (and may be better in 2.0). There are two issues we've encountered:

1.) Zabbix triggers only reset to an OK state on the receipt of data indicating that the state is OK. Say you monitor a logfile, looking only for lines with the word HTTP, and you have a trigger for the word HTTP-ERROR. The line "HTTP-ERROR foo" occurs, and the trigger goes to a PROBLEM state. Now there are two possibilities: (1) A line "HTTP-WARN foo" occurs. Since this doesn't match 'HTTP-ERROR', the trigger resets to OK - probably not what you intended. (2) Only lines matching 'HTTP-ERROR' occur, or no other lines appear, and the trigger never resets to OK. We initially worked around this by using a 'nodata' trigger to reset the trigger to OK after, for example, 30 minutes, but found that this would cause multiple notifications to be sent for the same problem. Our solution now is a hack on the web interface that allows logfile triggers to be manually reset to OK.
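To make the reset behavior concrete, here is a rough sketch of the trigger expressions involved, assuming the pre-3.0 expression syntax; the host name 'appserver' and the logfile path are hypothetical. The naive trigger goes to PROBLEM while the last matched line contains HTTP-ERROR, and only returns to OK when some other line arrives:

    {appserver:log[/var/log/app.log,HTTP].str(HTTP-ERROR)}=1

The nodata() workaround ANDs in a second condition, so the trigger falls back to OK once the item has been silent for 30 minutes (1800 seconds), even if the last line was an error:

    {appserver:log[/var/log/app.log,HTTP].str(HTTP-ERROR)}=1 & {appserver:log[/var/log/app.log,HTTP].nodata(1800)}=0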
2.) We have a 'logrt' item on a logfile in a directory with 700 other logfiles. What logrt does is read lines from the most recent logfile matching a specified filename pattern (e.g., the most recent logfile named mylogfile*.log).
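For reference, the item key looks something like this (the path and filename regexp here are illustrative, not our real ones):

    logrt["/var/log/myapp/mylogfile.*\.log","HTTP-ERROR"]

The filename part of the first parameter is a regular expression matched against the files in that directory, which is why the agent has to list the directory at all - and that listing is where we hit trouble.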
We discovered that the agent was re-parsing the entire directory every time it read a segment of the file, which resulted in 100% CPU utilization by Zabbix. We had to hack the C code for the Zabbix agent to re-parse the directory less frequently.
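The fix amounted to caching the result of the directory scan. This is a standalone sketch of the idea, not the actual Zabbix agent patch; the function latest_logfile, the prefix matching, and RESCAN_INTERVAL are all invented for illustration.

    #include <dirent.h>
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <time.h>

    #define RESCAN_INTERVAL 60  /* seconds between full directory scans */

    static time_t last_scan = 0;
    static char   newest[PATH_MAX];  /* cached path of the newest matching file */

    /* Return the most recently modified file in 'dir' whose name starts
     * with 'prefix', rescanning the directory at most once per
     * RESCAN_INTERVAL instead of on every read of the logfile. */
    const char *latest_logfile(const char *dir, const char *prefix)
    {
        time_t now = time(NULL);

        /* Serve the cached answer if the directory was scanned recently. */
        if (last_scan != 0 && now - last_scan < RESCAN_INTERVAL && newest[0] != '\0')
            return newest;

        DIR *d = opendir(dir);
        if (d == NULL)
            return NULL;

        struct dirent *e;
        time_t newest_mtime = 0;

        while ((e = readdir(d)) != NULL) {
            if (strncmp(e->d_name, prefix, strlen(prefix)) != 0)
                continue;

            char path[PATH_MAX];
            struct stat st;

            snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
            if (stat(path, &st) == 0 && st.st_mtime >= newest_mtime) {
                newest_mtime = st.st_mtime;
                snprintf(newest, sizeof(newest), "%s", path);
            }
        }
        closedir(d);

        last_scan = now;
        return newest[0] != '\0' ? newest : NULL;
    }

With 700 files in the directory, trading up to a minute of staleness for one scan per interval instead of one per read is what brought the CPU usage back down in our case.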