What is Big Data?
Everything we do in the modern world creates some form of data. With each purchase or visit to a website, our phones and other electronic devices leave behind traces, i.e. data. It is nothing new that the core strength of some modern businesses lies in their ability to process huge amounts of data and to make predictions, reach conclusions and propose actions based on it. The expression Big Data is primarily used to emphasize the diversity in the structure of the processed data, and the classification of data into categories (in the table below) is the key to understanding how great a challenge it is to make all this data available in a standardized manner so that it delivers results. Storing all these data types so that they can be used optimally at the moment they are needed is the first step in extracting business value from the Big Data concept.
What is a Data Lake?
A Data Lake is a virtual place for collecting and keeping structured and unstructured data. Data can be stored in its original form without the need to transform it in any way. Once the data is stored, it can be queried, searched and processed using tools for analytics, real-time processing and machine learning algorithms. In this manner, companies can obtain higher-quality information from data they already have but cannot use in its original form. All of the above makes a Data Lake a natural environment for Big Data and the basis of any bank's initiatives in the field of artificial intelligence or machine learning.
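The idea of storing data in its initial form and applying structure only when it is read can be illustrated with a minimal sketch. It assumes a local folder stands in for the lake and that events arrive as JSON lines; the function names (ingest_raw, read_events, total_by_customer) and the sample records are hypothetical and chosen only for illustration, not taken from any particular Data Lake product.

```python
import json
import tempfile
from pathlib import Path


def ingest_raw(lake_dir: Path, name: str, payload: str) -> Path:
    """Store a payload exactly as received, with no transformation."""
    path = lake_dir / name
    path.write_text(payload, encoding="utf-8")
    return path


def read_events(lake_dir: Path) -> list[dict]:
    """Apply structure only at read time: parse every JSON-lines file."""
    events = []
    for path in sorted(lake_dir.glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                events.append(json.loads(line))
    return events


def total_by_customer(events: list[dict]) -> dict[str, float]:
    """Sum purchase amounts per customer, ignoring other event types."""
    totals: dict[str, float] = {}
    for event in events:
        if event.get("type") == "purchase":
            customer = event["customer"]
            totals[customer] = totals.get(customer, 0.0) + event["amount"]
    return totals


def demo() -> dict[str, float]:
    """Ingest differently shaped records as-is, then query them later."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        ingest_raw(
            lake,
            "web.jsonl",
            '{"type": "purchase", "customer": "c1", "amount": 20.0}\n'
            '{"type": "visit", "customer": "c2"}\n',
        )
        ingest_raw(
            lake,
            "mobile.jsonl",
            '{"type": "purchase", "customer": "c1", "amount": 5.5, "device": "phone"}\n',
        )
        # Free text is kept too, even though this particular query ignores it.
        ingest_raw(lake, "note.txt", "free-form call-centre note")
        return total_by_customer(read_events(lake))
```

Note that the records do not share a fixed set of fields, and the text note is stored alongside them untouched; the query decides at read time which data it can use.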
Data Lake or Data Warehouse
A Data Warehouse is a database optimized for the analysis of relational data coming from transaction systems and a range of business applications. The data structure is defined in advance and optimized for searching with SQL queries.
A Data Lake is an expansion of the Data Warehouse concept because, in addition to structured data, it also stores unstructured data coming from mobile applications, IoT sensors or social media. This data is searched in a different way, using machine learning, text search algorithms and Big Data analytics.
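The contrast between the two approaches can be sketched in a few lines. The example below is only an illustration under stated assumptions: an in-memory SQLite database stands in for a Data Warehouse with a structure defined in advance, and a plain keyword match stands in for the text search algorithms applied to unstructured data. The table name, column names and function names are hypothetical.

```python
import sqlite3


def warehouse_total(rows: list[tuple[str, float]]) -> dict[str, float]:
    """Warehouse style: the table structure is declared before loading."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE transactions ("
            "customer TEXT NOT NULL, amount REAL NOT NULL)"
        )
        conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
        result = conn.execute(
            "SELECT customer, SUM(amount) FROM transactions "
            "GROUP BY customer ORDER BY customer"
        ).fetchall()
    finally:
        conn.close()
    return dict(result)


def search_documents(documents: dict[str, str], term: str) -> list[str]:
    """Lake style: case-insensitive keyword search over raw text."""
    needle = term.lower()
    return sorted(
        name for name, text in documents.items() if needle in text.lower()
    )
```

A real text search would normally use an index rather than scanning every document, but the division of labour is the same: SQL for data with a known structure, search or machine learning for everything else.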
A Data Lake captures and stores data in a single repository, which makes it cost-effective.
Because structured and semi-structured data is stored and managed in a single repository, data processing is optimized, and workloads such as data transformation and integration run faster.
It makes Extract, Transform, Load (ETL) faster and more efficient.
It allows analytics tools to work across data that may not have been associated before, generating new insights for the business.
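As a small illustration of working across data that was not associated before, the sketch below combines card transactions with customer chat messages by customer identifier. The field names, the keyword "complaint" and the function combine_by_customer are assumptions made for the example, not part of any specific tool.

```python
from collections import defaultdict


def combine_by_customer(
    transactions: list[dict], chat_messages: list[dict]
) -> dict[str, dict]:
    """Build one view per customer from two previously separate sources."""
    profile: dict[str, dict] = defaultdict(
        lambda: {"spend": 0.0, "complaints": 0}
    )
    # Structured source: amounts from a transaction system.
    for tx in transactions:
        profile[tx["customer"]]["spend"] += tx["amount"]
    # Unstructured source: free-text messages, matched by a simple keyword.
    for msg in chat_messages:
        if "complaint" in msg["text"].lower():
            profile[msg["customer"]]["complaints"] += 1
    return dict(profile)
```

With both sources in one place, a question such as "which high-spending customers have complained recently" becomes a single pass over the combined view instead of a separate integration project.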
Who benefits from a Data Lake?
Users, business analysts and data scientists can easily find the information they need without extensive IT involvement
Data strategists and data stewards can make information available to users in an organized and well-governed manner
IT security and governance teams can be assured that information is governed according to well-defined organizational and regulatory policies