Article

I’m doing a research project evaluating support for the Communist Party in the context of Socialism with Chinese Characteristics, relating widespread support for policies to the relevant socialist theory. While doing research I stumbled across this use of K-means clustering to analyze survey data. With this tool applied, support for the party, while still high, differs greatly from what the surveys initially suggest.

Looking at it, I find some of the justifications they use for their typologies a little fishy. The first question asks whether you trust the CPC on a four-point scale, with 1 being not at all and 4 being a high amount of trust; the second asks about support for the one-party system on the same scale. They use K-means clustering to break respondents into the four possible typologies, then merge the two middle groups on the grounds that people can be “ambivalent”. This feels like an unnecessary simplification of the clusters in order to present the “ambivalence” as being more varied than it is. Just because people might have incoherent views on the issue doesn’t mean they do, and presenting it that way feels like it could be “gerrymandering” the data. I’m completely open to my speculations and reservations being off base, since this is far removed from my major, but I thought I would ask here for some help in understanding it.

You guys are pretty smart sometimes.

The part I’m discussing occurs on page 56 where they begin to explain their statistics and methods.

  • Maoo [none/use name]@hexbear.net
    1 year ago

    It sounds like a very silly study even from the initial design. A 4-point scale is embarrassing. You need an odd number of points in order to have both a neutral option and symmetry. The 5-point Likert scale is basically the standard if, like the authors, you don’t want to think about it too hard: 1=lowest rating, 5=highest, 3=neutral, and 2 and 4 are intermediates. If they really did just want to know pro vs con vs neutral, they should’ve chosen a 3-point scale so that participants had to actually make that choice: 1=low rating, 2=neutral, 3=high rating. A 4-point scale is silly, and you can see that they grouped the 2 and 3 answers together for exactly this reason.

    In terms of analysis, yeah, that’s absurd. Even using K-means should raise little alarm bells, given that their scale and dimensionality are so small that the data is basically discrete. K-means clustering is a method for continuous data. There are analogous clustering methods for discrete cases, where you cut cubes or sets of blocks.
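
To make the "basically discrete" point concrete, here is a minimal sketch of plain K-means (Lloyd's algorithm) run on a 4x4 grid of answers, with each cell weighted by how many respondents picked it. The counts, the starting centroids, and the function name `weighted_kmeans` are all made up for illustration; none of this is the study's data or the authors' code.

```python
from itertools import product

# Hypothetical respondent counts per (trust, support) cell; NOT the study's data.
counts = {cell: 1 for cell in product(range(1, 5), repeat=2)}
counts[(4, 4)] = 40
counts[(3, 3)] = 25
counts[(1, 1)] = 10


def squared_distance(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def weighted_kmeans(counts, centroids, iterations=50):
    """Lloyd's algorithm where each grid cell is weighted by its count."""
    centroids = list(centroids)
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each cell goes to its nearest centroid.
        assignment = {
            cell: min(range(len(centroids)),
                      key=lambda i: squared_distance(cell, centroids[i]))
            for cell in counts
        }
        # Update step: move each centroid to the weighted mean of its cells.
        updated = []
        for i, old in enumerate(centroids):
            members = [c for c, k in assignment.items() if k == i]
            total = sum(counts[c] for c in members)
            if total == 0:
                updated.append(old)  # keep an empty cluster where it was
                continue
            updated.append((
                sum(c[0] * counts[c] for c in members) / total,
                sum(c[1] * counts[c] for c in members) / total,
            ))
        if updated == centroids:
            break  # converged: centroids stopped moving
        centroids = updated
    return centroids, assignment


# Illustrative starting centroids, also an assumption.
centroids, assignment = weighted_kmeans(
    counts, [(1.0, 1.0), (2.5, 2.5), (4.0, 4.0)]
)
print(centroids)
```

The centroids are weighted averages, so nothing forces them onto an answer anyone could actually give, and the whole result is just a partition of 16 cells. That is the sense in which a continuous method is being applied to a dataset that only has 16 possible points.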

    But even that is silly. It sounds like they could have just made a 2D plot and shown it, like a density plot. The frequency of every answer combination can be shown on a 4x4 plot, right? You could give every dot a different opacity or size. That makes it easy to see all of the results, and any subsequent analysis would be visually explained. At the same time, the clustering should be visualized the same way, and it would probably show how silly their choices were. It would be obvious that this is a discrete, low-dimensional dataset, for example.
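
As a rough sketch of the 4x4 idea, the full set of answers can be collapsed into a frequency grid with nothing but the standard library. The `responses` list and the `frequency_grid` helper below are hypothetical, not taken from the study.

```python
from collections import Counter

# Hypothetical (trust, support) answers on the 1-4 scales; NOT the study's data.
responses = [(4, 4), (4, 4), (4, 3), (3, 3), (3, 3), (2, 3), (1, 1), (4, 4)]


def frequency_grid(responses, levels=4):
    """Return a levels x levels table: rows are trust, columns are support."""
    counts = Counter(responses)
    return [
        [counts[(trust, support)] for support in range(1, levels + 1)]
        for trust in range(1, levels + 1)
    ]


grid = frequency_grid(responses)
for trust, row in enumerate(grid, start=1):
    # One text row per trust level, one column per support level.
    print(f"trust={trust}: " + " ".join(f"{n:3d}" for n in row))
```

A grid like this could then be handed to a plotting library (for example as a heatmap, or as dots scaled by count) to get the density-style picture described above.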

    Also, if they’re just gonna group middle clusters anyways, they should just admit they want to ask a question heuristically and create the clusters manually.
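
For comparison, a manual grouping can be written down in a few lines, which at least makes the cut-offs explicit. The labels and thresholds in this sketch are purely illustrative assumptions, not the typologies or boundaries used in the paper.

```python
def typology(trust, support):
    """Hypothetical hand-written rule; labels and cut-offs are illustrative only."""
    high_trust = trust >= 3      # assumed cut-off on the 1-4 trust scale
    high_support = support >= 3  # assumed cut-off on the 1-4 support scale
    if high_trust and high_support:
        return "supportive"
    if not high_trust and not high_support:
        return "opposed"
    return "mixed"  # trust and support point in different directions


# Example: label every possible answer pair on the 4x4 grid.
labels = {
    (trust, support): typology(trust, support)
    for trust in range(1, 5)
    for support in range(1, 5)
}
print(labels[(4, 4)], labels[(1, 2)], labels[(4, 1)])
```

With a rule like this, anyone reading the paper can see exactly where the boundaries sit and argue about them directly, rather than having them emerge from a clustering step and then get merged afterwards.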