
Analysing documents without reading them: 10 use cases

There are many scenarios in which we need to know what a document says, but cannot or do not want to read it. Possibly the most obvious one is getting out of writing a commentary on the book we were assigned at school. But I am sure that, without much difficulty, more useful cases from our professional lives will come to mind. With these lines, therefore, we begin a series of articles on how to extract value for our businesses using the solutions that technology offers in the field of document analysis.

Computers are currently overwhelmed by the volume of information we produce: at the moment, they cannot process all the available data and extract the value we would like from it. However, we humans are even more overloaded with information than our computers.

We will not delve into figures that become obsolete even before they are written down. But to give you an idea, it is estimated that by 2020 at least 80% of the data produced will be unstructured, that is, data not organised according to a predefined model, such as photographs, video or text. And while it is true that text accounts for far less data than other formats, in our professional lives most of the relevant information is in written form (emails, reports, annual reports, offers, invoices...).

Therefore, saving the time and human effort that reading takes is key to getting the most out of our work. So let's look at some examples where it would be useful to have a computer spend that time and effort for us, helping us learn about documents without having to read them.

1. Prioritise readings

Suppose you have all the time you need to read the documents you have pending. Wouldn't you at least want to know where to start?

Prioritising our reading allows us to read in order of interest, urgency, importance, relevance or simply appetite. In this way, we will deal first with the issues that most require our attention. By doing so, if there is a reason to stop reading, we will have reduced the potential harm of leaving an important document behind.
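As a rough illustration of this idea, here is a minimal sketch that orders documents by a keyword-based score. The terms, weights and sample texts are hypothetical, chosen only for the example; a real system would define "interest" or "urgency" in whatever way suits the reader.

```python
import re
from collections import Counter

# Hypothetical priority terms and weights (assumptions for this example).
PRIORITY_TERMS = {"urgent": 3.0, "deadline": 2.0, "invoice": 1.0}

def priority_score(text, weights=PRIORITY_TERMS):
    """Sum the weights of the priority terms, counting every occurrence."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return sum(weight * counts[term] for term, weight in weights.items())

def reading_order(documents):
    """Return document names sorted from highest to lowest score."""
    return sorted(documents,
                  key=lambda name: priority_score(documents[name]),
                  reverse=True)

# Hypothetical pending documents.
inbox = {
    "newsletter": "Our monthly newsletter with product news.",
    "supplier": "Invoice attached. Payment deadline is Friday.",
    "client": "Urgent: the deadline for the offer has changed.",
}

print(reading_order(inbox))  # ['client', 'supplier', 'newsletter']
```

If we have to stop reading after the first document, at least it will have been the one with the highest score.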

2. Classify comments

I remember a time in my childhood when the suggestion boxes in fast food chains were quite a sight. Now social media makes these boxes virtually unnecessary (although they are still there), but what are people saying about me, my brand or my products in social media comments?

Text classifiers allow us to group similar comments together, so that we can detect which is a complaint, a compliment, an irrelevant comment or an attempt to discredit us. To do this, each text is analysed to extract descriptive features of both the author and the text itself. In this way we can determine sentiment, degree of subjectivity, topics covered, keywords, etc. With all this, it is possible to group the comments that are most similar to each other, so that we can classify them before we have read them.
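To make the idea more concrete, the following sketch extracts a few very simple features from each comment and uses them to assign a label. The word lists, labels and sample comments are assumptions made up for this example; real classifiers usually learn such features from labelled data instead of relying on hand-written lexicons.

```python
import re

# Tiny hypothetical lexicons (assumptions for this example).
POSITIVE = {"great", "love", "excellent", "friendly", "fast"}
NEGATIVE = {"cold", "slow", "rude", "broken", "refund", "late"}

def features(comment):
    """Extract a few descriptive features from a comment."""
    words = re.findall(r"[a-z']+", comment.lower())
    return {
        "positive": sum(w in POSITIVE for w in words),
        "negative": sum(w in NEGATIVE for w in words),
        "length": len(words),
    }

def label(comment):
    """Assign a coarse label by comparing positive and negative word counts."""
    f = features(comment)
    if f["negative"] > f["positive"]:
        return "complaint"
    if f["positive"] > f["negative"]:
        return "compliment"
    return "other"

def group_by_label(comments):
    """Group comments under the label assigned to each one."""
    groups = {}
    for comment in comments:
        groups.setdefault(label(comment), []).append(comment)
    return groups

comments = [
    "The fries were cold and the service was slow.",
    "Great burgers and friendly staff, I love this place!",
    "What time do you open on Sundays?",
]

for name, members in group_by_label(comments).items():
    print(name, members)
```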

3. Get the news type

Now suppose you only have some time to read the documents you have pending: wouldn't you at least want to know what selection of documents you would have to read in order to get an idea of what they are all about?

This is the case with daily news. The same story is commonly covered in several texts, possibly from different sources. It would be useful to have a system capable of picking the most representative document among all the texts covering the same story. In this way, it would only be necessary to read one item per story to get an idea of all the topics covered that day.
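One simple way to sketch "the most representative document" is to pick the text that is, on the whole, most similar to all the others covering the same story. The example below does this with word counts and cosine similarity; this is only one possible technique, and the headlines are invented for the illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Count how many times each word appears in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_representative(texts):
    """Index of the text with the highest total similarity to the rest."""
    vectors = [bag_of_words(t) for t in texts]

    def total_similarity(i):
        return sum(cosine(vectors[i], vectors[j])
                   for j in range(len(vectors)) if j != i)

    return max(range(len(texts)), key=total_similarity)

# Hypothetical headlines covering the same story.
headlines = [
    "Central bank raises interest rates",
    "Central bank raises interest rates to curb inflation",
    "Interest rates raised to curb inflation",
]

print(headlines[most_representative(headlines)])
```

Here the second headline is selected, because it shares the most vocabulary with the other two.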

4. Summarise books

What if the reason that leads us to need to invest a lot of reading time is not the amount of text to read, but its length? Then, having a system that summarises the content of a document would be very useful.

Automatic book summaries give us indicative data about the text, such as its length, keywords or narrative style. Above all, they offer us a shorter alternative text that can be read in place of the original book. In this way, we obtain as much information as possible about the content of the book without having to read it in its entirety.
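As a hedged illustration, the sketch below builds an extractive summary: it keeps the sentences whose words are most frequent in the whole text. This is one of the simplest summarisation techniques and is not necessarily what a production system would use; the stopword list and the sample text are assumptions for the example.

```python
import re
from collections import Counter

# Minimal hypothetical stopword list (an assumption for this example).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "was", "on", "for", "that", "with"}

def content_words(sentence):
    """Lowercase words of a sentence, excluding stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]

def summarise(text, max_sentences=1):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average word frequency, so long sentences are not favoured.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)

text = ("The whale hunts the ship. The captain hunts the whale. "
        "Rain fell on the harbour.")

print(summarise(text, max_sentences=2))
```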

5. Analyse CVs

This type of technology has many applications in the world of human resources. Among the most typical are automatic CV analysers, which give the recruiter a preliminary analysis of the candidates applying for a position.

These systems support the same applications described in the previous cases: we can prioritise CVs, group similar candidates, obtain a typical example of each group or summarise professional profiles. We can also go beyond pure text analysis and, for example, bring in OSINT (open source intelligence), which makes it possible to cross-reference and enrich the information in the CV with the digital footprint the candidate has left on social networks. We can even have an automatic CV pre-evaluator that directly selects the candidates who really fit the profile we are looking for.
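A very rough sketch of such a pre-evaluator could score each CV by the share of required skills it mentions and shortlist the candidates above a threshold. The skills, the threshold and the sample CVs are all hypothetical; a real evaluator would need far richer signals than keyword matching.

```python
import re

def skill_coverage(cv_text, required_skills):
    """Fraction of the required skills mentioned in the CV (case-insensitive)."""
    if not required_skills:
        return 0.0
    text = cv_text.lower()
    found = 0
    for skill in required_skills:
        # Match the skill as a whole term, not as part of a longer word.
        pattern = r"(?<!\w)" + re.escape(skill.lower()) + r"(?!\w)"
        if re.search(pattern, text):
            found += 1
    return found / len(required_skills)

def shortlist(cvs, required_skills, min_coverage=0.5):
    """Names of candidates at or above the threshold, best coverage first."""
    scored = {name: skill_coverage(text, required_skills)
              for name, text in cvs.items()}
    selected = [name for name, score in scored.items() if score >= min_coverage]
    return sorted(selected, key=lambda name: scored[name], reverse=True)

# Hypothetical CVs and job requirements.
cvs = {
    "ana": "Data scientist with Python, SQL and machine learning experience.",
    "luis": "Graphic designer skilled in illustration and branding.",
    "marta": "Backend developer: Python and SQL.",
}
required = ["Python", "SQL", "machine learning"]

print(shortlist(cvs, required))  # ['ana', 'marta']
```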

6. Detect copies

Once again, the academic case is the first that comes to mind. From time to time, news resurfaces of a computer system that helps teachers detect plagiarism in school work or doctoral theses.

But establishing the degree of similarity between two documents has many other applications, and can help us save a lot of reading time. For example, my electronic library contains several hundred files, a good part of which are information brochures sent to me by different companies through their newsletters. The problem is that files with different names may contain the same (or very similar) content. So it is very useful to have a tool to determine whether a new brochure is already part of my collection.

Didn't the report "SmartGraph, your tool for fraud detection" reach me a few months ago?
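A minimal sketch of such a tool, assuming the text of each file has already been extracted, could compare documents by the word sequences they share. The example below uses three-word "shingles" and Jaccard similarity; the file names, texts and threshold are hypothetical and would need tuning on real brochures.

```python
import re

def shingles(text, size=3):
    """Set of overlapping word sequences ('shingles') found in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Share of shingles that two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def already_in_library(new_text, library, threshold=0.6):
    """Names of stored documents that look like near-copies of the new one."""
    return [name for name, text in library.items()
            if jaccard(new_text, text) >= threshold]

# Hypothetical library: file name -> extracted text.
library = {
    "smartgraph_v1.pdf": "SmartGraph is your tool for fraud detection "
                         "in financial transactions",
    "newsletter_may.pdf": "Our May newsletter covers upcoming events "
                          "and product launches",
}
new_brochure = ("SmartGraph is your tool for fraud detection "
                "in financial transactions today")

print(already_in_library(new_brochure, library))  # ['smartgraph_v1.pdf']
```

Comparing word sequences instead of file names is what lets us spot the same brochure arriving under a different name.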

+4 document analysis use cases

There are many other situations where automatic document analysis is of interest. Four of them deserve articles of their own, so we will simply name them here:

  • State of the art classifier
  • Automatic invoice evaluator
  • Observatory of technological publications
  • Proposal innovation evaluator

However, the most interesting case of automatic document analysis will probably be the one you have to solve yourself. Would you like to tell us about it?
