It is well known that, in classification problems, the predictive capacity of any decision-making model decreases rapidly as the asymmetry of the target variable increases (Sonquist et al., 1973; Fielding, 1977). In particular, in segmentation analysis with a categorical target variable, purity improves very little when the least represented modality accounts for less than 1/4 of the cases of the most represented modality. The same problem arises with other (theoretically more exhaustive) techniques, such as Artificial Neural Networks. Indeed, the optimal situation for classification analyses is maximum uncertainty, that is, equidistribution of the target variable. Some classification techniques are more robust because they use, for example, the less sensitive logit transformation of the target variable (Fabbris & Martini, 2002); however, the logit transformation is also strongly affected by the distributive asymmetry of the target variable. In this paper, starting from the results of a direct survey in which the binary target variable was extremely asymmetrical (10% vs. 90%, or greater asymmetry), we found that even the logit model with the most significant parameters had very poor fit measures and almost zero predictive power. To solve this predictive issue, we tested post-stratification techniques, artificially symmetrizing a training sample. In this way, a substantial increase in fit and predictive capacity was achieved, both in the symmetrized sample and, above all, in the original sample. The paper concludes with an application of the same technique to a dataset of a very different nature and size, showing that the method is stable even when the analysis is carried out on all the data of a population.
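As a rough illustration of what symmetrizing a training sample can look like, the sketch below balances a binary target split 10% vs. 90%, the ratio quoted in the abstract. It shows two common options: per-class post-stratification weights that give each class the same total weight, and random undersampling to the size of the rarest class. The abstract does not say which procedure the authors used, so both functions, their names (`symmetrizing_weights`, `symmetrized_sample`), and the toy data are assumptions made only for illustration.

```python
import random
from collections import Counter


def symmetrizing_weights(labels):
    """Per-class weights that give every class the same total weight.

    Hypothetical helper: weight = n / (k * n_class), with k classes.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


def symmetrized_sample(rows, labels, seed=0):
    """Randomly undersample every class to the size of the rarest one.

    Hypothetical helper; returns a shuffled list of (row, label) pairs.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    m = min(len(members) for members in by_class.values())
    sample = []
    for y, members in by_class.items():
        sample.extend((row, y) for row in rng.sample(members, m))
    rng.shuffle(sample)
    return sample


# Toy data (assumed): 100 cases in the rare class, 900 in the common one.
labels = [1] * 100 + [0] * 900
rows = list(range(len(labels)))

weights = symmetrizing_weights(labels)
balanced = symmetrized_sample(rows, labels)
```

With these toy data the rare class gets a weight of 5 and the common class a weight of about 0.56, so each class carries a total weight of 500. A model would then be fitted either on `balanced` or on the full data using `weights`, and evaluated on the original, asymmetrical sample.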
University of Bari Aldo Moro, Italy - ORCID: 0000-0003-1641-039X
University of Bari Aldo Moro, Italy - ORCID: 0000-0001-9768-651X
ARTI, Agency for Technology and Innovation of Apulia, Italy - ORCID: 0000-0001-8179-4970
University of Bari Aldo Moro, Italy - ORCID: 0000-0002-4817-7169
Chapter title
Post-stratification as a tool for enhancing the predictive power of classification methods
Authors
Francesco D. d'Ovidio, Angela Maria D'Uggento, Rossana Mancarella, Ernesto Toma
Language
English
DOI
10.36253/978-88-5518-461-8.24
Peer-reviewed work
Publication year
2021
Copyright
© 2021 Author(s)
Book title
ASA 2021 Statistics and Information Systems for Policy Evaluation
Book subtitle
BOOK OF SHORT PAPERS of the on-site conference
Editors
Bruno Bertaccini, Luigi Fabbris, Alessandra Petrucci
Peer-reviewed work
Publication year
2021
Copyright
© 2021 Author(s)
Publisher
Firenze University Press
DOI
10.36253/978-88-5518-461-8
eISBN (pdf)
978-88-5518-461-8
eISBN (xml)
978-88-5518-462-5
Series
Proceedings e report
Series ISSN
2704-601X
Series e-ISSN
2704-5846