Quant analytic: Best method to transform a continuous variable to categorical variable in order to build a logistic regression model?


Which logistic regression model do you intend to use? If it is a binary logistic model, just decide on a cut-point that separates the two categories. In SPSS, you can use the Recode procedure on the Transform menu, or do it via syntax. If you create more than two categories, you might look at ordinal regression, which is an extension of the binary logistic model but can be quite challenging to interpret. I assume your categories would be ordered, so a multinomial logistic model would discard part of the information contained in the data.
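A minimal sketch of the recode step in Python, assuming the cut-points have already been chosen. It does roughly what an SPSS recode into a new variable does, but the `recode` function, the `income` values and the cut-points are hypothetical. By assumption, a value that falls exactly on a cut-point goes to the lower category.

```python
from bisect import bisect_left


def recode(values, cutpoints):
    """Recode a continuous variable into ordered categories 0..k.

    With sorted cut-points c1 < ... < ck, a value x gets category
    0 if x <= c1, i if c_i < x <= c_(i+1), and k if x > ck.
    """
    cuts = sorted(cutpoints)
    # bisect_left returns how many cut-points lie strictly below x.
    return [bisect_left(cuts, x) for x in values]


# Hypothetical data.
income = [18500, 39999, 40000, 40001, 72000]

# One cut-point: a 0/1 outcome for a binary logistic model.
print(recode(income, [40000]))         # [0, 0, 0, 1, 1]

# Two cut-points: three ordered codes for an ordinal model.
print(recode(income, [30000, 60000]))  # [0, 1, 1, 1, 2]
```

Keeping the codes as ordered integers preserves the ordering that an ordinal model can use. Note that 40,000 and 40,001 end up in different categories even though they barely differ.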


I would urge caution and recommend you reconsider whether you really want to “bin” your continuous outcome variable.

Logistic regression is best applied when the two outcomes reflect distinct states (for example, has diabetes vs. does not have diabetes). If you took a continuous variable, like income, and binned it into “over $40k” and “$40k or less”, you really wouldn’t have distinct states: the difference between $39,999 and $40,001 is trivial.

If you are struggling with a skewed outcome variable, I recommend you consider these two alternatives before resorting to binning it: (1) Use a generalized linear model and select an appropriate distribution (Poisson and Gamma are quite popular); or (2) Try transforming your outcome variable (such as a log transformation) to see if that makes it “more normal”.
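To check whether a log transformation makes a skewed outcome “more normal”, one simple diagnostic is to compare the sample skewness before and after transforming. This sketch uses the adjusted Fisher-Pearson skewness coefficient. The function names and the data are hypothetical, and the log transform requires strictly positive values.

```python
import math


def sample_skewness(xs):
    """Adjusted Fisher-Pearson sample skewness (G1); 0 for a symmetric sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment (needs n >= 3).
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def skew_before_after_log(ys):
    """Return (skewness of ys, skewness of log(ys))."""
    if any(y <= 0 for y in ys):
        raise ValueError("log transform requires strictly positive values")
    return sample_skewness(ys), sample_skewness([math.log(y) for y in ys])


# Hypothetical right-skewed outcome (e.g. incomes in $1,000s).
outcome = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
before, after = skew_before_after_log(outcome)
print(f"skewness before: {before:.2f}, after log: {after:.2f}")
```

A skewness close to zero after the transform suggests the log scale is the more symmetric one to model on. Skewness is only one aspect of normality, so a histogram or Q-Q plot is still worth a look.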



You can generate a sequence of cut-off points and use each one in turn to dichotomize the continuous variable. Fit a logistic regression to each resulting binary outcome and calculate its AUC. Find the highest AUC and the corresponding cut-off; I think that cut-off may be the optimal one for classifying your data into two groups.
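The cut-off search described above can be sketched in pure Python. Assumptions: a single predictor `x`; a logistic regression fitted by plain gradient ascent on a standardized `x` (a library fitter would normally be used instead); AUC computed as the probability that a random positive case outscores a random negative one; and a hypothetical `min_class` guard that skips cut-offs leaving fewer than two cases in either class. All function names are illustrative.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(x, y, lr=1.0, iterations=200):
    """Fit P(y=1) = sigmoid(b0 + b1*z), z = standardized x, by gradient
    ascent on the mean log-likelihood; return the fitted probabilities."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n) or 1.0
    z = [(v - mean) / sd for v in x]
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * zi) for zi in z]
        b0 += lr * sum(yi - pi for yi, pi in zip(y, p)) / n
        b1 += lr * sum((yi - pi) * zi for yi, pi, zi in zip(y, p, z)) / n
    return [sigmoid(b0 + b1 * zi) for zi in z]


def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(x, y_cont, cutoffs, min_class=2):
    """Return (cut-off, AUC) with the highest AUC, or None if none qualify."""
    best = None
    for c in cutoffs:
        y_bin = [1 if v > c else 0 for v in y_cont]  # dichotomize at c
        n_pos = sum(y_bin)
        if n_pos < min_class or len(y_bin) - n_pos < min_class:
            continue  # too few cases in one class for a meaningful AUC
        a = auc(fit_logistic(x, y_bin), y_bin)
        if best is None or a > best[1]:
            best = (c, a)
    return best


# Hypothetical data: predictor x and a noisy continuous outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 1.5, 3.8, 3.0, 5.9, 4.2, 7.5, 6.1, 9.4, 8.0]
print(best_cutoff(x, y, cutoffs=[3.0, 4.0, 5.0, 6.0, 7.0]))
```

With only one predictor the fitted probabilities are a monotone function of `x`, so the AUC at each cut-off equals the AUC of `x` itself (or one minus it when the fitted slope is negative). The model fit starts to matter once you add more predictors.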

I don’t understand how to use a generalized linear model such as Poisson or Gamma to bin a continuous variable. Could you give a simple example, if you don’t mind?


I was suggesting that you consider using a generalized linear model instead of binning, not as a method to create bins. Sorry for the misunderstanding.
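For the simple example that was asked for, here is a minimal sketch of a generalized linear model with a Gamma distribution and a log link, fitted to a continuous positive outcome without any binning. In practice you would use a library routine (for example statsmodels’ `GLM` with a Gamma family). The hand-written iteratively reweighted least squares (IRLS) below only shows what such a fit does. The function name, the single predictor and the data are hypothetical.

```python
import math


def fit_gamma_log_glm(x, y, max_iter=50, tol=1e-10):
    """Fit E[y] = exp(b0 + b1 * x) for a positive outcome y, assuming a
    Gamma distribution with a log link, by IRLS.

    For this family/link pair the IRLS weights are constant: dmu/deta = mu
    and Var(y) is proportional to mu**2, so (dmu/deta)**2 / Var(y) does not
    depend on mu. Each iteration is therefore a plain least-squares fit of
    the working response z = eta + (y - mu) / mu on x.
    """
    if any(v <= 0 for v in y):
        raise ValueError("a Gamma GLM needs a strictly positive outcome")
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    eta = [math.log(v) for v in y]  # start from mu = y
    b0 = b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        mean_z = sum(z) / n
        new_b1 = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z)) / sxx
        new_b0 = mean_z - new_b1 * mean_x
        done = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        eta = [b0 + b1 * xi for xi in x]
        if done:
            break
    return b0, b1


# Hypothetical data: a skewed positive outcome that grows with x.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
noise = [1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.15, 0.85, 1.05]
y = [math.exp(0.5 + 0.3 * xi) * f for xi, f in zip(x, noise)]
b0, b1 = fit_gamma_log_glm(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, effect per unit x = x{math.exp(b1):.3f}")
```

Under the log link, `exp(b1)` is the multiplicative change in the expected outcome for a one-unit increase in `x`. The outcome stays continuous throughout, so nothing is lost to binning.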


I wouldn’t recommend doing that. Why would you want to lose the richness of data collected on a ratio scale by downgrading it to a categorical scale? It might be better to run a linear or non-linear regression and so retain the robustness of the data you’ve collected. As a first step, I’d recommend looking at bivariate scatter plots (two-dimensional plots of the dependent variable against each independent variable, one at a time) to understand the nature of each relationship, and then choosing an appropriate regression model accordingly.

But if for some reason you must convert the dependent variable to a categorical scale, you could follow a few simple steps. Classify the dependent variable’s data into intervals and examine the frequency distribution for distinct concentrations or groupings. Choose cut-off points as appropriate and combine the intervals into groups accordingly. Assign a value to each group, and what you now have is a categorical scale. Hope this helps.
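The interval-and-frequency procedure described in this answer can be sketched as follows. The answer suggests inspecting the frequency distribution by eye. The `gap_cutpoints` rule here (cut in the middle of each run of empty intervals) is a hypothetical way to automate that step, and the number of intervals, the function names and the data are all assumptions.

```python
from bisect import bisect_right


def frequency_table(values, n_bins):
    """Split [min, max] into n_bins equal-width intervals and count values in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        i = int((v - lo) / width) if width > 0 else 0
        counts[min(i, n_bins - 1)] += 1  # the maximum belongs to the last interval
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


def gap_cutpoints(edges, counts):
    """Hypothetical rule: cut at the middle of every run of empty intervals
    lying between two occupied ones, i.e. between two concentrations."""
    cuts, k, n = [], 0, len(counts)
    while k < n:
        if counts[k]:
            k += 1
            continue
        start = k
        while k < n and counts[k] == 0:
            k += 1
        if start > 0 and k < n:  # only interior gaps separate two groups
            cuts.append((edges[start] + edges[k]) / 2)
    return cuts


def assign_groups(values, cuts):
    """Number the groups 1, 2, ... from the lowest interval upward."""
    cuts = sorted(cuts)
    return [bisect_right(cuts, v) + 1 for v in values]


# Hypothetical outcome values with three visible clusters.
values = [1.0, 1.2, 1.5, 2.0, 8.0, 8.3, 8.9, 9.5, 20.0, 21.0]
edges, counts = frequency_table(values, n_bins=10)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} - {right:5.1f}  {'*' * c}")
cuts = gap_cutpoints(edges, counts)
print("cut-offs:", cuts)
print("groups:  ", assign_groups(values, cuts))
```

With real data the gaps are rarely completely empty. In practice you would read the printed frequency table, pick the cut-offs yourself, and then pass them to `assign_groups`.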
