Predict customers leaving in the telecommunications industry (Part 1)

6 min read · Oct 10, 2020

Being able to predict whether customers will continue using a service is very important for any organization. If the organization can anticipate early that a customer is likely to stop using the service, it can come up with options for customer retention.

But let’s not think about those effects for now. Let’s analyze the data of a telecom company and see what story it tells.

Let’s first see what kind of problem we’re working on. We need to predict a label, so this is supervised learning; the target is a categorical variable, so it is a classification problem.

In Part 1 we will go through the entire analysis project from start to finish, before attempting more complex methods in later parts.

Read Data

We have about 900 thousand records, with 152 “independent” variables and 1 churn variable to be predicted. The main variables are continuous and represent the amount and number of times services were used over the latest 6 months. The data only includes customers aged 30–40.

EDA

There are no missing values, but there are quite a lot of outliers. The data is also not balanced between the two classes, which is quite easy to understand. In the EDA step I usually do only the 3 checks above (missing values, outliers, class balance), without other techniques such as distribution plots, because those can lead me to make wrong assumptions about the data.
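The three checks can be sketched in a few lines of pandas. This is only an illustration under my own assumptions: the column names and the toy data are made up, and the 1.5 × IQR rule is one common way to flag outliers, not necessarily the rule used for the figures in this post.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame, target: str) -> dict:
    """Summarise missing values, IQR outliers, and class balance."""
    features = df.drop(columns=[target]).select_dtypes(include="number")
    # Share of missing cells per column.
    missing = df.isna().mean()
    # Flag values outside the 1.5 * IQR whiskers (assumed rule of thumb).
    q1 = features.quantile(0.25)
    q3 = features.quantile(0.75)
    iqr = q3 - q1
    outlier_mask = (features < q1 - 1.5 * iqr) | (features > q3 + 1.5 * iqr)
    outlier_share = outlier_mask.mean()
    # Relative frequency of each class of the target.
    class_share = df[target].value_counts(normalize=True)
    return {"missing": missing, "outliers": outlier_share, "classes": class_share}


# Hypothetical toy data: one usage column and a churn label.
demo = pd.DataFrame({
    "calls_m1": [10, 12, 11, 9, 13, 10, 12, 500],
    "churn": [0, 0, 0, 0, 0, 0, 1, 1],
})
report = quick_eda(demo, "churn")
print(report["outliers"])
print(report["classes"])
```

On the real table, the same function would be called with the churn column as `target`.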

We will focus on important features, not all variables.
We need to see how the dependent variable changes as an independent variable changes, in relation to the other variables.

The two above are only possible when we have a fairly accurate model. So let’s build a good enough model first.

Feature correlation

I use a dendrogram to see how the variables are correlated. Note that services C and D are highly correlated with each other; these are probably two add-on services. Similarly, we also see high correlation between services B and F, A and G, A and E, and so on.
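A dendrogram like this can be built from the correlation matrix with SciPy. The sketch below assumes Spearman correlation, the distance 1 - |correlation|, and average linkage; those choices, the cut height of 0.3, and the toy columns are mine for illustration and may differ from what produced the figure.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import squareform


def correlation_linkage(features: pd.DataFrame) -> np.ndarray:
    """Cluster columns hierarchically using 1 - |Spearman correlation| as distance."""
    corr = features.corr(method="spearman").to_numpy()
    distance = np.clip(1.0 - np.abs(corr), 0.0, None)
    # Remove floating-point noise so the matrix is exactly symmetric with a zero diagonal.
    distance = (distance + distance.T) / 2.0
    np.fill_diagonal(distance, 0.0)
    return linkage(squareform(distance), method="average")


# Hypothetical toy data: service_d is almost a copy of service_c.
rng = np.random.default_rng(0)
service_c = rng.normal(size=500)
demo = pd.DataFrame({
    "service_a": rng.normal(size=500),
    "service_c": service_c,
    "service_d": service_c + rng.normal(scale=0.05, size=500),
})

tree = correlation_linkage(demo)
# no_plot=True returns the leaf order without drawing; drop it to draw with matplotlib.
leaf_order = dendrogram(tree, labels=list(demo.columns), no_plot=True)["ivl"]
# Cut the tree at distance 0.3 to get groups of highly correlated features.
cluster_ids = fcluster(tree, t=0.3, criterion="distance")
groups = dict(zip(demo.columns, cluster_ids))
print(leaf_order, groups)
```

Cutting the tree with `fcluster` gives the groups of correlated features directly, which is convenient when whole groups need to be treated as one unit later.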

Feature importance

I will use permutation importance instead of the default importance function provided by LightGBM. In this step, highly correlated features are put into one group, and the importance is calculated for the group as a whole.
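To make the idea concrete, here is a minimal NumPy sketch of grouped permutation importance. It is not the exact code behind the chart: the function name, the toy `toy_predict` stand-in for the trained model, the accuracy metric, and the group names are all assumptions for illustration. The columns of a group are shuffled together with the same row permutation, so the correlation inside the group is preserved while the link to the label is broken.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Share of correct predictions."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def grouped_permutation_importance(predict, X, y, groups, metric, n_repeats=5, seed=0):
    """Mean drop in the metric when all columns of a group are shuffled jointly."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importance = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            perm = rng.permutation(len(X))
            # Same row permutation for every column in the group.
            X_perm[:, cols] = X[perm][:, cols]
            drops.append(baseline - metric(y, predict(X_perm)))
        importance[name] = float(np.mean(drops))
    return importance


# Hypothetical stand-in for a fitted model: only columns 0 and 1 matter.
def toy_predict(data):
    return (data[:, 0] + data[:, 1] > 0).astype(int)


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = toy_predict(X)
groups = {"service_a_group": [0, 1], "noise": [2]}
importance = grouped_permutation_importance(toy_predict, X, y, groups, accuracy)
print(importance)
```

With a real model, `predict` would be the fitted model's prediction method, and the groups would come from the correlation clusters.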

Notice that service A greatly affects whether the customer leaves or not. Services E, M, and H do not have much effect. A user’s age does not affect whether he or she is likely to leave.

You may find it a bit strange that the age variable is split into many small variables, each with a value of 0 or 1 indicating whether the customer is that age or not. This is done only so the tree model can make decisions faster (because it only has to branch once) and does not mean anything else.
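For readers who have not seen this encoding before, splitting age into 0/1 columns looks roughly like this in pandas. The column names and ages are made up for illustration.

```python
import pandas as pd

# Hypothetical customers with ages inside the 30-40 range of the data.
customers = pd.DataFrame({"age": [30, 35, 40, 35]})

# One 0/1 column per distinct age; each row has exactly one flag set.
age_flags = pd.get_dummies(customers["age"], prefix="age", dtype=int)
print(age_flags)
```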

Model

I use lightgbm with the hyper-parameter

The model identifies customers who are still using the service well (quite obvious, given the class imbalance), but it is not very good at identifying customers who leave. As can be seen from the graph, with a threshold of 0.5 there is still about a 50% chance of mistaking a leaving customer for one who stays.
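The "chance of mistaking a leaving customer for one who stays" is the miss rate (false negative rate) on the churn class at a given threshold. A small sketch of how it can be computed follows; the labels and probabilities are invented toy numbers chosen for illustration, not results from the model in this post.

```python
import numpy as np


def churn_miss_rate(y_true, churn_proba, threshold=0.5):
    """Share of true churners (label 1) whose predicted probability is below the threshold."""
    y_true = np.asarray(y_true)
    predicted_churn = np.asarray(churn_proba) >= threshold
    churners = y_true == 1
    missed = churners & ~predicted_churn
    return float(missed.sum() / churners.sum())


# Invented toy labels (1 = leaves) and predicted churn probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
churn_proba = [0.9, 0.6, 0.4, 0.2, 0.1, 0.3, 0.2, 0.05, 0.45, 0.15]

miss_at_050 = churn_miss_rate(y_true, churn_proba, threshold=0.5)
miss_at_030 = churn_miss_rate(y_true, churn_proba, threshold=0.3)
print(miss_at_050, miss_at_030)
```

Lowering the threshold catches more leaving customers, at the cost of flagging more customers who would have stayed.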

Model interpretation

Let’s see an example of a customer who continues to use the service.

And a customer who leaves.

Recall that Service A has a huge impact on whether or not customers leave, much like we saw in defining feature importance. Let’s look at it in a little more detail.

In general, patterns are only really obvious in months near month t. This is quite understandable.

Total number of service uses

The pattern is only reasonably clear for the last 2 months.

However, the trends in these 2 months are opposite: in month n-1, an increase in the total number of service uses increases the likelihood of the customer leaving, whereas in month n it is the other way around.

But the certainty for the feature at month n is much higher than at month n-1. At month n, customers are at risk of leaving if the total number of service uses is less than about 200. With more than 200 uses, customers tend to continue using the service.

Total number of uses of service A (both free and paid)

The impact of service A usage on the result is not uniform. From 0 to about 50 uses, customers tend to continue using the service. From about 50 to about 200 uses, customers tend to leave, while above 200 uses the trend reverses again.

Total duration of use of service A

In month n-5 we see an interesting pattern:

The concentration of values at 30,000 is, in my opinion, not accidental but a result of product policy. Maybe this is a product tied to minutes of service. Groups with a total service duration of less than 500 minutes tended to continue using the service, while groups with more than 500 minutes tended to leave. However, it must be noted that the impact of this variable is not large. This pattern appears only in month n-5, not in the following months; personally, I find this a bit confusing.