Ibis Data Lake
Ibis Data Lake is a software solution for implementing a Data Lake environment according to the reference architecture and best practices developed by the Ibis Instruments development center, which has worked with Big Data technologies since 2014. It incorporates industry best practice as well as the experience gained from many production data lake implementations performed by Ibis Instruments.
What is a Data Lake?
A Data Lake is a repository for collecting and storing structured and unstructured data. Data can be stored in its original form, without being transformed in any way. Once stored, the data can be queried, searched and processed with analytics tools, real-time processing and machine learning algorithms. This way, companies can get better-quality information from data they already have but have not been able to use in its original form.
Data Lake or Data Warehouse
A Data Warehouse is a database optimised for analysing relational data derived from transaction processing systems and business applications. The data structure is predefined and optimised for SQL queries.
A Data Lake extends the Data Warehouse concept: in addition to structured data, it also stores unstructured data derived from mobile applications, IoT sensors or social networks. This data is searched differently, using machine learning, text search algorithms and Big Data analytics.
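The difference between the two approaches can be illustrated with a minimal Python sketch. It is an illustration only, not part of Ibis Data Lake: the in-memory SQLite table stands in for a warehouse with a predefined structure, the JSON strings stand in for raw records kept in their original form, and all table names, field names and sample values are hypothetical.

```python
import json
import sqlite3

# Warehouse side: the table structure is fixed before any data is loaded,
# and the data is analysed with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 20.0), (2, "bob", 35.5), (3, "alice", 4.5)],
)
warehouse_totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)

# Lake side: records are kept exactly as they arrived (hypothetical samples).
raw_records = [
    '{"source": "iot", "device": "sensor-7", "temperature": 21.4}',
    '{"source": "mobile", "user": "alice", "event": "login"}',
    '{"source": "social", "user": "bob", "text": "great service, fast delivery"}',
]


def search_text(records, keyword):
    """Parse each raw record at read time and keep those whose free text mentions keyword."""
    hits = []
    for line in records:
        record = json.loads(line)
        if keyword in record.get("text", ""):
            hits.append(record)
    return hits


lake_hits = search_text(raw_records, "delivery")
```

In the warehouse case the structure is decided before loading; in the lake case each record is interpreted only when it is read, so records of different shapes can sit side by side.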
Why is a Data Lake necessary?
Organizations that obtain good-quality information and conclusions from their data have a significant advantage over their competitors. They can increase revenue by analysing data from logins, website clicks, social networks and similar sources. This way, they can observe trends and react to changes faster.
Ibis Data Lake
Ibis Data Lake brings:
- Data Lake implementation according to the reference architecture and best practices
- System configuration and tuning according to best practices
- Training for system use
It provides a cost-efficient way to store large volumes of data typically kept in a data warehouse (DWH), including structured, semi-structured and unstructured data, to process them with map-reduce algorithms and to access them through SQL-like interfaces. With layers for data integration, data storage on Hadoop (HDFS, the HBase big data database and Kudu), data processing and data access, a Data Lake enables organizations to create a landing zone for raw data and complete the first phase of Data Lake formation: centralization of data storage.
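As a rough illustration of the map-reduce processing mentioned above, the following Python sketch counts words in a few raw text lines. It is a single-process stand-in, assuming nothing about how Ibis Data Lake or Hadoop actually run such jobs: on a cluster the map, shuffle and reduce phases would be distributed across nodes, and the function names and sample lines here are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit one (word, 1) pair for every word in a raw text line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical raw lines, standing in for files in the landing zone.
raw_lines = ["error disk full", "login ok", "error network down"]
counts = word_count(raw_lines)
```

Because each map call looks at one line only and each reduce call at one key only, the same logic can be spread over many machines, which is what makes the approach suitable for large volumes of data.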
In addition to the standard functionality of Ibis Data Lake, there is a pre-integrated data analytics layer that includes tools for machine learning, data mining and statistical processing of data stored on the big-data layer of the solution.
Ibis Data Lake brings the following data science components:
- KNIME (open source)
Besides the additional data science components, there is also an advanced data integration module based on NiFi, which provides a powerful yet simple interface for data integration (ETL).
The Data Science add-on meets the technical preconditions for forming a capable data science team within an organization, one that can efficiently analyse and experiment with data in order to find hidden value in it.
In addition to the data science module, there is also a component for analytical processing of data streams in real time. The components that make this possible are Apache Flink and Kafka. They enable organisations to build a complete data-processing environment aligned with the lambda architecture, a common standard for such environments.
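The idea behind the lambda architecture can be sketched in a few lines of Python: a batch layer periodically recomputes a view from all stored events, a speed layer keeps a running view of events that arrived since the last batch run, and queries combine the two. This is a toy, in-memory illustration under stated assumptions; it does not use Apache Flink or Kafka, and the class name, event format and page names are hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Toy lambda architecture: a batch view and a real-time view merged at query time."""

    def __init__(self):
        self.master_dataset = []        # append-only store of raw events
        self.batch_view = Counter()     # recomputed periodically from all events
        self.realtime_view = Counter()  # updated per event since the last batch run

    def ingest(self, event):
        """Send every event to both the master dataset and the speed layer."""
        self.master_dataset.append(event)
        self.realtime_view[event["page"]] += 1

    def run_batch(self):
        """Recompute the batch view from scratch, then reset the speed layer."""
        self.batch_view = Counter(e["page"] for e in self.master_dataset)
        self.realtime_view.clear()

    def query(self, page):
        """Serving layer: combine batch and real-time results."""
        return self.batch_view[page] + self.realtime_view[page]


lam = LambdaCounter()
for page in ["home", "pricing", "home"]:
    lam.ingest({"page": page})
lam.run_batch()               # batch view now covers the first three events
lam.ingest({"page": "home"})  # arrives after the batch run, held by the speed layer
views = lam.query("home")
```

In a production environment the speed layer would typically be a stream processor reading from a message queue, while the batch layer would run over the data stored in the lake.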
Ibis Data Lake Architecture
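[Architecture diagram not reproduced. Legible labels: "storing (Data lake)", "Data access and", and a note that some components are not included in the Ibis license and may be purchased separately.]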
Ibis Data Lake is agnostic with respect to the Hadoop distribution, so it can be implemented on any of the available distributions.
If a commercial Hadoop distribution is chosen, additional licences must be obtained for it. Ibis can offer them as an option, since it partners with leading vendors in this field, such as Cloudera.
Data Lake can be implemented on-premises or as a cloud/hybrid option. It can also be implemented as a cloud-native solution through the service of a cloud provider, such as Microsoft Azure, IBM SoftLayer, Oracle Cloud, … In this case, the cloud services are licensed separately, and they can also be offered as an option by Ibis Instruments.