
I have data, how do I start analyzing it?


One day, a company hands you a data set and tells you:

"We want you to use advanced analytics to extract patterns, turn our data into information and the information into intelligence, and generate value for our business, using big data tools and your best data scientists."

There is no denying it: data analysis is in vogue, and it comes with a good set of neologisms that are used loosely. In particular, "advanced analytics" can mean different things depending on who uses the term. In this article we give an overview of the different ways of performing data analysis (advanced or not) from three perspectives: the objective of the analysis, the nature of the data, and the techniques used in the analysis.

Objective of the analysis

Depending on the objective of the analysis, three different types of analysis can be defined:

  • Descriptive analysis allows a first inspection of the data by calculating statistics on central tendency and variability and by visualizing the information through graphs such as histograms, line charts, heat maps, etc. Take the case of a medical analysis system that aims to reduce the probability of a certain disease appearing in a group of patients. With the data of this group, we could observe the average age of certain subgroups, the distribution of patients by place of residence, etc.
  • Predictive analysis predicts previously unseen attributes by building models with statistics or machine learning techniques. Continuing with the previous example, we could predict which patients are at higher risk of suffering a certain disease according to attributes such as age, height, weight, sedentary lifestyle, hours of sleep, blood test results, clinical history, etc.
  • Prescriptive analysis recommends an action to take, based on the results of the predictive analysis, in order to optimize a criterion. In the example above, the system would be given a set of possible actions, such as modifying the diet, developing an exercise plan or prescribing certain medications, and it would decide which actions to take to minimize the probability of suffering from the disease.
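The three objectives can be sketched end to end on the patient example. This is only an illustration: the records, the scoring rule in `predict_risk` and the `ACTIONS` table are all invented here and stand in for real data, a fitted model and clinically valid interventions.

```python
from statistics import mean

# Hypothetical toy records; every value is invented for illustration.
patients = [
    {"age": 34, "sedentary": False, "sleep_hours": 8},
    {"age": 58, "sedentary": True, "sleep_hours": 5},
    {"age": 71, "sedentary": True, "sleep_hours": 6},
]


def describe(records):
    """Descriptive: summarize what the data looks like."""
    return {
        "mean_age": mean(r["age"] for r in records),
        "share_sedentary": mean(1 if r["sedentary"] else 0 for r in records),
    }


def predict_risk(patient):
    """Predictive: a made-up scoring rule standing in for a fitted model."""
    score = 0.005 * patient["age"]
    score += 0.2 if patient["sedentary"] else 0.0
    score += 0.1 if patient["sleep_hours"] < 7 else 0.0
    return min(score, 1.0)


# Hypothetical actions: each returns a modified copy of the patient.
ACTIONS = {
    "no_change": lambda p: dict(p),
    "exercise_plan": lambda p: {**p, "sedentary": False},
    "sleep_coaching": lambda p: {**p, "sleep_hours": max(p["sleep_hours"], 7)},
}


def prescribe(patient):
    """Prescriptive: pick the action with the lowest predicted risk."""
    return min(ACTIONS, key=lambda name: predict_risk(ACTIONS[name](patient)))
```

Calling `prescribe(patients[1])` compares the predicted risk under each action and returns the name of the best one. The prescriptive step is only as good as the predictive model it queries.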

These three types of analysis are usually associated with obtaining data, information and intelligence within the intelligence analysis cycle.

Nature of data

The formats and types of data we can analyze vary widely, and the information is not always structured as it is in databases or in Excel, JSON or CSV files. Text, images, audio and video are examples of unstructured data.

Text is the classic example of unstructured information. For this type of data, different algorithms can be applied to identify the language, translate the text into other languages, extract entities such as the names of people, places and organizations, extract the relationships between those entities, analyze the overall sentiment or opinion expressed in a document or about a particular entity, classify the text into a predefined set of categories, and discover the topics, unknown a priori, of a set of documents.

Image analysis, also known as computer vision, aims to automate the tasks that the human visual system can perform, such as face recognition, place recognition and general object recognition. Some analyses use specialized algorithms to estimate the size of a crowd from an image, detect defects in manufacturing processes or detect breast cancer from a mammogram.

Audio analysis is a subset of signal analysis that deals with signals in the audible spectrum. Some of the most common analyses are speech recognition, where a conversation is transcribed from voice to text; speaker recognition, which aims to detect how many people are speaking and who they are; song recognition, as in the popular Shazam service; and, more generally, sound recognition, where the system classifies a sound as an alarm, a car engine, etc.

Video analysis is used to analyze movement, which makes it possible to trace paths or identify events. Video surveillance is one of its most common applications.

Beyond the format of the data, some structured data have a particular structure that, by its nature, lends itself to a specialized analysis. This is the case of graph analysis, which allows us to detect fraud, analyze social networks or build knowledge models.
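As a small illustration of graph analysis, the sketch below groups nodes into connected components with a breadth-first search. It assumes an undirected graph given as a list of edges. Reading a component as, say, a cluster of accounts that share devices in a fraud investigation is an invented scenario, not something taken from a real system.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Group the nodes of an undirected graph into connected components."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components
```

For example, the edges `[("a", "b"), ("b", "c"), ("d", "e")]` yield two components, one with three nodes and one with two.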

With data that carry a latitude and a longitude, we can perform geospatial analysis to detect trajectories, predict the transport needs of a city or analyze the spread of a certain disease.
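A basic building block of trajectory analysis is the distance between two coordinates. The sketch below uses the haversine formula, which treats the Earth as a sphere. The radius constant and the function names are choices made for this example, and the result is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trajectory_length_km(points):
    """Sum of segment lengths along an ordered list of (lat, lon) points."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))
```

Given an ordered list of GPS fixes, `trajectory_length_km` returns the approximate distance travelled.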

For data tied to particular moments in time, time series analysis lets us characterize a series through its trend and seasonality components and predict future values using different statistical and algorithmic tools.
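The idea of splitting a series into trend and seasonality can be sketched with a deliberately simple additive model: a least-squares straight line for the trend, plus the average detrended value at each position of the cycle for the seasonality. Both the linear trend and the fixed, known `period` are assumptions of this example, and real series usually need more careful methods.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, y0), (1, y1), ...; returns (intercept, slope)."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def seasonal_offsets(values, period, intercept, slope):
    """Average detrended value at each position within the seasonal cycle."""
    buckets = [[] for _ in range(period)]
    for t, y in enumerate(values):
        buckets[t % period].append(y - (intercept + slope * t))
    return [sum(b) / len(b) for b in buckets]


def forecast(values, period, steps):
    """Extrapolate the trend and add the matching seasonal offset."""
    intercept, slope = fit_linear_trend(values)
    season = seasonal_offsets(values, period, intercept, slope)
    n = len(values)
    return [intercept + slope * t + season[t % period]
            for t in range(n, n + steps)]
```

For quarterly data one would call `forecast(values, period=4, steps=4)` to project the next year. The series needs at least one full cycle so that every seasonal position has an observation.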

Techniques for analysis

Traditionally, data analysis has been based on classical statistics. For some years now, machine learning algorithms have complemented classical statistical analysis. It is difficult to determine where statistics ends and machine learning begins, or whether one discipline is a subset of the other.

From a practical point of view all techniques will help us to understand our data regardless of whether they are considered statistics or machine learning.

The simplest techniques for describing our data are the calculation of central tendency through metrics such as the mean, mode and median, the measurement of dispersion through the variance and standard deviation, and the characterization of the shape of the distribution through skewness and kurtosis.
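These metrics can be computed with Python's standard `statistics` module plus a few lines for the shape measures. The sketch below is one possible set of conventions: the variance and standard deviation are the sample versions (dividing by n - 1), while skewness and kurtosis use population moments, and kurtosis is reported as excess kurtosis (0 for a normal distribution). The function name `summarize` is made up for this example.

```python
import statistics


def summarize(values):
    """Central tendency, dispersion and shape of a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments (divide by n), used for the shape measures.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }
```

A positive skewness indicates a longer right tail, and a negative excess kurtosis indicates lighter tails than a normal distribution.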

Through more advanced techniques, we can create models to characterize our data and predict future values. Some of these techniques are based on regression, such as linear or logistic regression, while others are based on machine learning algorithms such as neural networks, decision trees or support vector machines.
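To make one of these techniques concrete, here is a minimal sketch of logistic regression with a single feature, fitted by plain batch gradient descent. The data (weekly exercise hours against whether a disease appeared) are invented, and the learning rate and number of epochs are arbitrary choices that happen to suit this toy set. In practice one would normally rely on an established library rather than hand-written code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for each example under the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


def predict_proba(x, w, b):
    """Predicted probability of the positive class for feature value x."""
    return sigmoid(w * x + b)


# Invented toy data: weekly exercise hours and whether the disease appeared (1) or not (0).
exercise_hours = [0, 1, 2, 3, 4, 5, 6, 7]
disease = [1, 1, 1, 0, 1, 0, 0, 0]

w, b = fit_logistic(exercise_hours, disease)
```

With these data the fitted weight `w` comes out negative, so the predicted probability falls as exercise hours increase.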

The choice of which techniques to use for our analysis will depend on the two perspectives above: the objective of our analysis and the nature of our data.

To conclude

In this article we have given an overview of where we can start analyzing our data. From now on, if a company asks us to analyze its data, we should specify which aspects we want to analyze by defining the objectives of the analysis, observing the nature of the data and choosing a set of techniques that will help us carry it out.

By limiting the scope of the analysis we can better manage our clients' expectations and offer them a service adapted to their needs.

