By Pritha Banerjee, L&T MHPS
‘Data Science’! Isn’t that the term the entire world has been talking about recently? What is this buzz all about? What is data? How is it related to science? What does Data Science mean altogether?
– Is it a new field or old wine packaged in a new bottle?
Let us dive deep into the world of Data Science.
- What is the source of this data?
Data is everywhere, and it is generated from many different sources. Any question-answer session can create data. Any experiment run with sensors and instruments can create data. Our social media profiles create a bulk of unstructured data in the form of images and sounds. Every e-commerce site creates data. Organizations hold data in the form of financial logs, text files and multimedia, to name a few.
Recently, the amount of unstructured data has grown immensely compared to structured data. So, we need a science to study it and a technology to handle it and make it usable. This is where Data Science comes into the picture.
- Data Science:
Data Science can be simply described as a field that combines mathematics, statistics, and programming. More formally, it can be defined as a multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data in various forms.
- Is Data Science something new?
No, the term ‘data science’ has been used in different contexts over the past thirty years, but it has become an established term only in recent times.
- Purpose of Data Science
Data science serves many purposes today. A few are mentioned below:
- Today every business has become data-driven. The right methodology, technology and resources can provide a better understanding of the business. Generating and acquiring data well can support a better business model, leading to outcomes in line with the organization’s vision. So, with consumable data that is easily available, a lot of tools can be explored and applied to build a sophisticated data analytics solution that provides better insights.
- In this competitive market, every organization is eager to bring in more business. Based on the past browsing history of your customers and their demographic details, you can get a much clearer idea of their requirements. With the vast quantity and variety of data available, we can train models more effectively for customer recommendations.
- We can train decision-making models on data from sensors, cameras, radars and lasers to follow a particular map at a controlled speed, and thus create a self-driving car using advanced machine learning algorithms.
- Using predictive analysis models, we can generate weather forecasts or predict the occurrence of natural calamities.
- The list goes on!
- Techniques of Data Science:
A Data Scientist first performs exploratory analysis of the available data to discover insights from it, and then uses various advanced machine learning algorithms to make predictions or support decisions. The basic techniques are as follows:
Descriptive Analysis: A model that summarizes the data in a meaningful way.
Predictive Analysis: A model that predicts the likelihood of a particular event in the future. For example, whether a borrower will default on his/her payment or not.
Prescriptive Analysis: An intelligent model that is capable of taking its own decisions and modifying them as dynamic parameters change. For example, a self-driving car.
Machine Learning: A supervised learning algorithm trains its model on the labelled data available and predicts future occurrences. For example, a fraud detection model can be trained using the past transactional data of a finance company that has records of fraudulent purchases.
Similarly, machine learning can also be used for pattern recognition. Such a model finds hidden patterns in the dataset without being given labels, and these patterns can then support meaningful predictions. Clustering is one such family of algorithms.
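To make two of these techniques concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not a production approach: the purchase amounts are made-up numbers, the function names `describe`, `assign` and `kmeans_1d` are hypothetical, and the clustering is a bare-bones one-dimensional k-means chosen only because it is short.

```python
import statistics


def describe(values):
    """Descriptive analysis: summarize a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def assign(values, centers):
    """Put each value into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, k=2, max_iterations=20):
    """Clustering: group 1-D values around k centers (plain k-means)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iterations):
        clusters = assign(values, centers)
        # An empty cluster keeps its previous center.
        updated = [statistics.mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(values, centers)


# Made-up purchase amounts, for illustration only.
purchase_amounts = [12, 15, 14, 13, 95, 102, 98, 11]

summary = describe(purchase_amounts)
centers, clusters = kmeans_1d(purchase_amounts, k=2)

print("Summary:", summary)
print("Cluster centers:", centers)
print("Clusters:", clusters)
```

The summary is the descriptive step, while the clustering step separates small and large purchases without being told which is which.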
- Various Tools Used during the entire life cycle of Data Science
Step 1: Discovery or data understanding
Step 2: Data preparation – explore, pre-process and condition data prior to modelling.
Step 3: Model Planning – apply Exploratory Data Analysis (EDA) using various statistical methods and visualization tools to determine the relationships between variables. Tools such as R, SQL and SAS/ACCESS can be used for this purpose, and Tableau and Power BI can help with data visualization.
Step 4: Model Building – develop training and test datasets and apply techniques like regression, classification or clustering to build the models.
Step 5: Operationalize – run the trained model on the required data and deliver the results as required.
Step 6: Communicate results – identify all the key findings and communicate them to the stakeholders to determine whether the model is successful or not.
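The model building and operationalizing steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is made up and noise-free (it follows y = 3x + 2 exactly), the helper names `train_test_split`, `fit_line` and `mean_absolute_error` are hypothetical and written from scratch here, and a simple least-squares line stands in for whatever regression technique a real project would choose.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold back a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(points, slope, intercept):
    """Average absolute gap between actual and predicted y values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


# Made-up, noise-free data for illustration: y = 3x + 2.
data = [(x, 3 * x + 2) for x in range(1, 13)]

# Step 4: build the model on the training set only.
train, test = train_test_split(data)
slope, intercept = fit_line(train)

# Step 5: apply the trained model to data it has not seen.
error = mean_absolute_error(test, slope, intercept)

print("Training rows:", len(train), "Test rows:", len(test))
print("Fitted slope and intercept:", slope, intercept)
print("Mean absolute error on test data:", error)
```

Keeping the test rows out of the fitting step is what makes the final error a fair check of how the model behaves on new data.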
To conclude, we can safely say that the future belongs to Data Science. More and more data will provide opportunities to drive key business decisions and will soon change the way we look at the data-overloaded world around us.
Hope you enjoyed reading this article! Please share your valuable feedback.
Note: The technical details and pictures used here were found through Google searches and come from various blogs.
One source that deserves mention is: www.edureka.co/blog