Just a comment on a possible roadmap for the future.
As I work my way through deploying Prelude for the first time, there are a number of things in the documented architecture where I'd like to at least raise suggestions. These would address some issues I anticipate down the road and are perhaps not immediately imperative.
The use of a relational database.
Although proven for reliability and performance, I wonder whether it is really the best choice for an application like Prelude, which requires aggregating enormous amounts of data. In fact, the more data that can be accumulated, from more sources and over a longer period of time, the better the analysis should typically become.
A relational database has two fundamental limitations that are hard obstructions: the difficulty of modifying the original schema, and physical data storage limits. Clustering can incrementally increase storage, but only with great effort.
NoSQL databases, particularly Hadoop-style storage, have no such limitations. Static schemas are replaced by the ability to add new data types on demand, and relational aspects are abstracted into a metadata layer that can be reconfigured easily. Also, Hadoop-style storage and NoSQL databases like Cassandra can expand storage simply by bringing up a new node as a member of the cluster, and today the various administrative tasks, such as joining, node communication, and data mapping, are handled automatically.
I can see that Prelude is in the nascent stages of implementing an agent-based distributed architecture, with its many advantages (decentralized computing, load distribution, local administration and configuration) compared to centralization (better centralized control).
To address the above issues and objectives:
You may or may not know about the Elasticsearch project (http://elastic.co), which I've also been using. Elasticsearch is a competitor to, and a solid alternative to, the pure Hadoop/Solr/Pig/Hive big-data analytical solutions typically used for the biggest web search engines, for IBM Watson (the Jeopardy! contestant that played against humans in 2011), and much more. As a re-imagination of the traditional Hadoop stack, a number of features were implemented in Elasticsearch:
- All analytics, data structures, and data movement are based on JSON.
- As described above, effectively limitless storage by adding inexpensive nodes to the cluster.
- Logstash as the main data aggregator and conversion agent, which uses standard grok patterns to create filters that parse data. Links to existing plugins, filters, and more are here,
I would think that you only need to create IDMEF and IODEF filters (actually definitions) to immediately import or export data between Prelude and everything else Logstash can already translate. And if you want to inject something into the data, like metadata tags, Logstash can do that for you, too.
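To illustrate the shape of such a pipeline, here is a minimal Logstash configuration sketch. This is not an actual IDMEF filter, just an assumed example using standard Logstash plugins (syslog input, grok filter, elasticsearch output); the pattern and field names are illustrative only.

```
# Hypothetical Logstash pipeline sketch (not a real IDMEF definition).
input {
  # Listen for syslog-style messages on an arbitrary port.
  syslog { port => 5514 }
}
filter {
  grok {
    # Capture the whole message; a real IDMEF filter would use a
    # structured pattern instead of GREEDYDATA.
    match => { "message" => "%{GREEDYDATA:alert_text}" }
  }
  mutate {
    # Example of injecting a metadata tag, as mentioned above.
    add_tag => [ "prelude-candidate" ]
  }
}
output {
  # Ship the resulting JSON event to Elasticsearch.
  elasticsearch { hosts => ["localhost:9200"] }
}
```

A real IDMEF/IODEF definition would replace the grok pattern with one mapping each IDMEF field, but the overall structure would stay the same.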
In fact, should you wish to take a closer look at the Elasticsearch stack to see what you might like to assimilate, you'll notice that its three major components (Kibana, the web query interface; Elasticsearch itself, which is generally the storage; and Logstash, the data aggregator and converter) are independent components that can be deployed, or replaced, completely independently... You just need to know how to "talk JSON."
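Since every component in the stack exchanges plain JSON, interoperating is largely a serialization exercise. A minimal Python sketch, using a hypothetical alert record (the field names are illustrative, not the actual IDMEF schema):

```python
import json

# Hypothetical alert record; field names are illustrative only,
# not the real IDMEF schema.
alert = {
    "classification": "port-scan",
    "source_ip": "192.0.2.10",
    "severity": "medium",
}

# Serialize for transport to any JSON-speaking component
# (e.g. an Elasticsearch index endpoint).
payload = json.dumps(alert)

# Any consumer can decode it back without a fixed schema.
decoded = json.loads(payload)
print(decoded["classification"])  # prints: port-scan
```

Because no component demands a fixed schema up front, a new field can be added to the record at any time without coordinating a migration across the cluster.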
In any case, I am very interested in getting Prelude, as it now exists, off the ground...
Thanks for explaining your point around NoSQL and Elasticsearch.
But this is Prelude OSS, not Prelude SIEM. Prelude OSS only does the alert part (the real-time part).
With Prelude SIEM (commercial), we have included the raw data (syslog and others) through Elasticsearch for the last two years. We also include many other things: behavioral analytics, dashboards, reporting, incident handling, administration, authentication, etc.
The roadmap of Prelude OSS does not include the integration of the raw data part, sorry. But if you want, you can contribute to the project to add that support.
Every year we do an audit of the performance side with database experts (relational and NoSQL), and for now our needs confirm that a relational database is the best choice for the IDMEF database.
If you have more than 10,000 alerts per day, then you are using Prelude in the wrong way; it is not a log management system.
Normally, you have to check every alert: why it was raised, and so on.
Thx for the clarification!