- 1 Data Ingestion Infrastructure
- 2 Storage Infrastructure
- 3 Data Quality
- 4 Operations on Data
- 5 Efficiency of Data Operations
- 6 Scalability of Data
- 7 Keeping Data Secure
- 8 Conclusion
The big data management process describes how big data is managed across a variety of sectors.
In the modern world, huge volumes of unstructured data are generated every day, and processing and managing this kind of data is important.
Identifying the primary issues in the data is therefore a crucial part of big data management.
Data Ingestion Infrastructure
The ingestion infrastructure for big data depends on the type of data you are dealing with: retail, e-commerce, finance, healthcare, and so on.
To make this concrete, we take two specific domains, healthcare data (hospital data) and social media data, and explore the ingestion infrastructure for each.
A few important questions characterize the data ingestion infrastructure in a big data management system.
How many data sources are available?
Hospital data – The number of data sources varies with hospital size and location; it might be approximately 30.
Social media data – The number of data sources across social media platforms can be approximately 2 million.
How large are data items?
Healthcare or hospital data is relatively modest in volume: the average record size is 5 KB, the average image size is 2 GB, and the number of records is 50 million.
Social media data grows every single day: the average record size is 3 KB, the average image size is 2 MB, and the number of records is 200 billion.
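The record figures above translate into raw storage with simple arithmetic. The following is a minimal sketch, assuming decimal units (1 KB = 1,000 bytes) and counting records only, since no image counts are given; the function name `total_record_bytes` is purely illustrative.

```python
# Back-of-the-envelope sizing from the record figures quoted above.
# Assumptions: decimal units (1 KB = 1,000 bytes); images are not counted
# because the text gives an average image size but no number of images.

KB = 1_000
TB = 1_000 ** 4


def total_record_bytes(num_records: int, avg_record_kb: int) -> int:
    """Raw size of all records in bytes, before replication or compression."""
    return num_records * avg_record_kb * KB


hospital_bytes = total_record_bytes(50_000_000, 5)       # 50 million records, 5 KB each
social_bytes = total_record_bytes(200_000_000_000, 3)    # 200 billion records, 3 KB each

print(f"hospital records: {hospital_bytes / TB:.2f} TB")  # 0.25 TB
print(f"social records:   {social_bytes / TB:.2f} TB")    # 600.00 TB
```

Under these assumptions the two domains differ by more than three orders of magnitude, which is why they call for different ingestion and storage designs.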
Will the number of data sources grow?
The growth of health data is seasonal; most of the time it is stable, with very little growth.
Social media data grows massively: it currently stands at 25 million and is growing at 15% per year.
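To see what 15% yearly growth means over time, the figure can be compounded. This is a rough sketch that assumes the growth rate stays constant and compounds once per year; `project` and `doubling_time_years` are hypothetical helper names.

```python
import math


def project(current: float, annual_growth: float, years: int) -> float:
    """Compound a starting value at a fixed yearly growth rate."""
    return current * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed to double at a fixed yearly growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


start = 25_000_000   # figure quoted in the text
growth = 0.15        # 15% per year, assumed constant

for years in (1, 3, 5):
    print(f"after {years} year(s): {project(start, growth, years):,.0f}")

print(f"doubling time: {doubling_time_years(growth):.1f} years")
```

At this rate the figure roughly doubles in about five years, so capacity planned for today's load would need revisiting well within a typical hardware lifetime.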
Rate of data ingestion?
The ingestion rate for health and hospital data is low, around 3k to 5k per day.
For social media data, the ingestion rate is much higher than for other data, with a peak of approximately 200k per hour.
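The two rates are quoted over different time windows, so it helps to normalize them to a per-second rate before comparing. A small sketch, assuming the hospital load is spread evenly over the day (real traffic is likely burstier):

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def per_second(count: int, window_seconds: int) -> float:
    """Average arrival rate per second over the given window."""
    return count / window_seconds


social_peak = per_second(200_000, SECONDS_PER_HOUR)   # peak of 200k per hour
hospital_high = per_second(5_000, SECONDS_PER_DAY)    # upper end of 3k to 5k per day

print(f"social media peak: {social_peak:.1f} per second")
print(f"hospital (high):   {hospital_high:.3f} per second")
print(f"ratio:             {social_peak / hospital_high:.0f}x")
```

The social media peak works out to several hundred times the hospital rate, which is the difference between a nightly batch job and a continuously running pipeline.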
What to do with bad data?
Healthcare data is very sensitive, and preparing and maintaining every single record is essential, so in most cases the right policy is to warn about the bad data, flag it, and still ingest it.
In social media data, most of the data is unstructured and untidy, which makes it hard to prepare and maintain in most situations.
Here the best option is to retry retrieving the bad data once for ingestion and, if that is not possible, to discard it.
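The two bad-data policies described above can be sketched as follows. This is only an illustration: the validity rule (required fields `id` and `payload`), the in-memory `store` list, and the `refetch` callback are all assumptions made for the example, not part of any particular ingestion framework.

```python
from typing import Callable, Optional

Record = dict

# Hypothetical schema: a record is "bad" if any of these fields is missing or empty.
REQUIRED_FIELDS = ("id", "payload")


def is_valid(record: Record) -> bool:
    return all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def ingest_flagging(record: Record, store: list) -> None:
    """Hospital-style policy: never drop a record; warn, flag it, and ingest it."""
    if not is_valid(record):
        print(f"warning: bad record {record.get('id')!r} ingested with flag")
        record = {**record, "flagged": True}
    store.append(record)


def ingest_retry_once(
    record: Record,
    store: list,
    refetch: Callable[[Record], Optional[Record]],
) -> bool:
    """Social-media-style policy: retry retrieval once, otherwise discard.

    Returns True if a record was ingested and False if it was discarded.
    """
    if not is_valid(record):
        retried = refetch(record)  # single retry
        if retried is None or not is_valid(retried):
            return False           # discard
        record = retried
    store.append(record)
    return True


hospital_store: list = []
ingest_flagging({"id": 1, "payload": ""}, hospital_store)

social_store: list = []
ingest_retry_once({"id": 2}, social_store, refetch=lambda r: None)
ingest_retry_once({"id": 3}, social_store, refetch=lambda r: {"id": 3, "payload": "ok"})
```

The design difference is that the first policy trades data cleanliness for completeness, while the second trades completeness for throughput.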
Storage Infrastructure
Where and how do we store big data?
Finding and selecting suitable storage is a major part of big data management.
Choosing an easy and flexible way to store huge volumes of data is equally significant.
How much data to store?
Estimating the amount of data that needs to be stored is part of planning the storage infrastructure.
The amount differs from case to case: social media, stock market, and e-commerce data are mostly generated in huge quantities, which calls for a strong and secure big data system.
How fast do reads and writes happen?
Big data systems often use Non-Volatile Memory Express (NVMe), an interface that provides fast data transfer between memory and SSDs, to reduce execution time.
An SSD (solid-state drive) serves as fast storage for data processing; it offers low latency but comes at a high cost.
Data Quality
Data quality is a crucial factor: better data quality means better analytics and decision-making for the business.
Data quality assurance is also needed for regulatory compliance, and it helps produce more precise results. For example, in healthcare, clinical trials of drugs need quality data.
Data quality also leads to better engagement and interaction with external entities, which in turn yields more precise and accurate results.
Operations on Data
You can perform various types of operations on data to produce the expected results and support the right decisions.
Operations on a single data item can produce a sub-item that is easier to understand and process.
Similarly, you can perform operations on collections of data items, operations that combine two collections, operations that compute a function on a collection, and so on.
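The four kinds of operations named above can each be shown in a few lines. The sample records, field names, and helper functions below are invented for illustration only.

```python
from statistics import mean

# Two small, made-up collections of data items.
ward_a = [
    {"id": 1, "name": "Ada Park", "age": 36},
    {"id": 2, "name": "Ben Ruiz", "age": 52},
]
ward_b = [
    {"id": 2, "name": "Ben Ruiz", "age": 52},
    {"id": 3, "name": "Cy Lin", "age": 41},
]


def family_name(item: dict) -> str:
    """Operation on a single item: extract a sub-item."""
    return item["name"].split()[-1]


def select(collection: list, predicate) -> list:
    """Operation on a collection: keep the items that satisfy a condition."""
    return [item for item in collection if predicate(item)]


def union_by_id(left: list, right: list) -> list:
    """Operation that combines two collections, dropping duplicate ids."""
    merged = {item["id"]: item for item in left}
    for item in right:
        merged.setdefault(item["id"], item)
    return list(merged.values())


def average_age(collection: list) -> float:
    """Operation that computes a function on a collection."""
    return mean(item["age"] for item in collection)


everyone = union_by_id(ward_a, ward_b)
print(family_name(ward_a[0]))                              # Park
print([p["id"] for p in select(everyone, lambda p: p["age"] > 40)])  # [2, 3]
print(average_age(everyone))                               # 43
```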
Efficiency of Data Operations
The operations we perform need to be evaluated for how efficient they are.
To measure the efficiency of an operation, you can analyze its complexity in terms of time and space.
Efficiency can also be checked and improved by selecting the right subsets of data.
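A quick way to see why subset selection matters is to count the work an operation performs with and without it. The sketch below assumes a naive all-pairs operation (for example, a duplicate check), which needs n(n-1)/2 comparisons; the records and the `region` field are made up for the example.

```python
from itertools import combinations


def count_pairwise_comparisons(items: list) -> int:
    """Number of comparisons a naive all-pairs operation performs: n*(n-1)/2."""
    return sum(1 for _ in combinations(items, 2))


# 1,000 made-up records; every tenth one belongs to the "north" region.
records = [
    {"id": i, "region": "north" if i % 10 == 0 else "south"}
    for i in range(1_000)
]

# Select the relevant subset first, then run the expensive operation on it.
subset = [r for r in records if r["region"] == "north"]

full_cost = count_pairwise_comparisons(records)
subset_cost = count_pairwise_comparisons(subset)

print(f"all records: {len(records)} items, {full_cost} comparisons")
print(f"subset:      {len(subset)} items, {subset_cost} comparisons")
```

Because the cost grows quadratically, shrinking the input by a factor of 10 cuts the comparisons by roughly a factor of 100. The filter itself is a single linear pass, so it pays for itself quickly.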
Scalability of Data
Big data systems handle huge volumes of data by scaling up or scaling out.
Vertical scaling, also called scale-up, means adding more processors and RAM or buying a more expensive and robust server.
Horizontal scaling, also called scale-out, means adding more, possibly less powerful, machines that are interconnected over a network.
Maintaining such systems can be difficult and expensive, but it helps increase scalability.
The server industry offers many solutions for scale-up and scale-out decisions in big data management.
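Scale-out only helps if the data is spread across the machines. One common way to do that is hash partitioning, sketched below. This is a simplified illustration, not how any specific system does it: the key format, the node count, and the `node_for` helper are assumptions.

```python
import hashlib
from collections import Counter


def node_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash.

    hashlib is used instead of the built-in hash(), whose output for strings
    changes between Python processes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


keys = [f"user-{i}" for i in range(10_000)]   # made-up record keys

load_4 = Counter(node_for(k, 4) for k in keys)
load_8 = Counter(node_for(k, 8) for k in keys)

print("4 nodes:", sorted(load_4.items()))
print("8 nodes:", sorted(load_8.items()))
```

With a good hash, each node receives a roughly equal share, so doubling the number of machines roughly halves the load per machine. A known drawback of plain modulo partitioning is that changing the node count reassigns many keys; consistent hashing is a common alternative when nodes are added or removed often.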
Keeping Data Secure
Data security is an essential factor in big data management, and it is even more crucial for sensitive data.
Processing and storing big data can be hard, and data can get lost during processing; to avoid such losses, big data systems keep a large number of replicas.
On the other hand, increasing the number of machines can be riskier from a security standpoint than running fewer machines.
Most businesses generate huge amounts of data, transactional data is particularly important, and data in transit must be secured.
Data security relies heavily on encryption and decryption techniques, which increase security but sometimes make big data operations expensive.
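The trade-off between replicas and machine count can be made concrete with a simple probability model. This is a rough sketch under strong assumptions: failures and compromises are treated as independent events, and the probabilities used are invented for illustration.

```python
def loss_probability(p_fail: float, replicas: int) -> float:
    """Chance that every replica of an item is lost, assuming independent failures."""
    return p_fail ** replicas


def exposure_probability(p_compromise: float, machines: int) -> float:
    """Chance that at least one machine is compromised, assuming independence."""
    return 1 - (1 - p_compromise) ** machines


# Hypothetical values: 1% chance a given copy is lost,
# 0.1% chance a given machine is compromised.
for replicas in (1, 2, 3):
    print(f"{replicas} replica(s): loss probability {loss_probability(0.01, replicas):.0e}")

for machines in (10, 100, 1_000):
    print(f"{machines} machines: exposure {exposure_probability(0.001, machines):.1%}")
```

In this model each extra replica cuts the chance of losing an item by a constant factor, while each extra machine raises the chance that some machine is breached, which is the tension the two points above describe.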
Conclusion
Big data management is harder than traditional data management and requires high expertise and strong domain understanding.
Securing data against attacks is a critical, complex, and primary task in big data management, and it can be costly in some cases.