
The only 7 types of people in ASR: Clustering Analysis

Joined
May 3, 2023
Messages
8
Likes
89
Location
Germany

Identifying Groups of Users Based on Their Interests Using Clustering Techniques

Objective

The main objective of this analysis is to identify the preferences of ASR users. For each user, the relative frequencies of their messages in the different subforums are used as input features for the K-Means clustering algorithm.
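A minimal sketch of how such features could be built, assuming the raw data is a list of `(user, subforum)` pairs with one entry per message. The function name `relative_frequencies`, the subforum names and the toy data are hypothetical; the author's actual scraping and preprocessing code is not shown in the post.

```python
from collections import Counter, defaultdict

# Sketch under assumptions: not the author's code; input format is hypothetical.
def relative_frequencies(posts, subforums):
    """Map each user to the share of their messages in every subforum.

    posts:     iterable of (user, subforum) pairs, one per message
    subforums: fixed column order for the feature vector
    A vector sums to 1 when `subforums` lists every subforum the user posted in.
    """
    counts = defaultdict(Counter)
    for user, subforum in posts:
        counts[user][subforum] += 1
    features = {}
    for user, per_subforum in counts.items():
        total = sum(per_subforum.values())
        features[user] = [per_subforum[s] / total for s in subforums]
    return features

# Hypothetical toy data, not taken from the forum.
SUBFORUMS = ["Headphones", "Speakers", "DAC"]
posts = [("alice", "Headphones"), ("alice", "Headphones"),
         ("alice", "DAC"), ("bob", "Speakers")]
features = relative_frequencies(posts, SUBFORUMS)
print(features["alice"])
```

Because each user's counts are divided by that user's total, a very active poster and an occasional one end up on the same scale.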

Data retrieved: 15 July 2023

Clusters Evaluation

Using the K-Means method, we first obtained 9 clusters, which are quite unbalanced in size. Nevertheless, we tried to interpret them:

Cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Samples | 494 | 265 | 89 | 331 | 556 | 138 | 253 | 1195 | 1352
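For readers unfamiliar with K-Means, here is a minimal plain-Python sketch of Lloyd's algorithm: alternate between assigning each user vector to its nearest centroid and moving each centroid to the mean of its members. The post does not say which implementation or initialisation was used; the random initialisation, the fixed seed and the toy points are assumptions. In practice a library such as scikit-learn's `KMeans` (here with `n_clusters=9`) is a common choice.

```python
import random

# Sketch under assumptions: not the author's implementation.
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points]

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids).

    Assumption: centroids start at k randomly chosen data points.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assign(points, centroids), centroids

# Hypothetical toy data: two obvious groups of users.
points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

K-Means only finds a local optimum, so results can depend on the starting centroids; libraries usually run several initialisations and keep the best.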

After obtaining the desired number of clusters, we used the silhouette method to assess how well the clustering performed:

[Figure: silhouette plot of the 9 clusters]


Each value on the plot represents how well an observation fits its assigned cluster. Negative values suggest that an observation is more similar to another cluster than to its own. As we can see, clusters 7 and 8 are not well-defined, so they were excluded from further investigation. Quite a lot of people belong to the excluded clusters, but their behaviour could not be clearly classified.
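The silhouette value of a point is s(i) = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its lowest mean distance to the members of any other cluster. A plain-Python sketch follows; Euclidean distance is an assumption (the post does not name the metric), as is the convention s(i) = 0 for a point alone in its cluster, and the toy points are hypothetical. scikit-learn exposes the same per-point quantity as `sklearn.metrics.silhouette_samples`. Averaging s(i) per cluster gives a quick numeric check for clusters that look poorly defined in the plot.

```python
import math
from collections import defaultdict

# Sketch under assumptions: Euclidean metric, singleton clusters scored 0.
def silhouette_values(points, labels):
    """Silhouette s(i) = (b - a) / max(a, b) for every point (needs >= 2 clusters)."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    values = []
    for i, p in enumerate(points):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:  # singleton cluster: s(i) is 0 by convention
            values.append(0.0)
            continue
        a = sum(math.dist(p, points[j]) for j in own) / len(own)
        b = min(sum(math.dist(p, points[j]) for j in members) / len(members)
                for lab, members in clusters.items() if lab != labels[i])
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

def mean_per_cluster(values, labels):
    """Average silhouette per cluster; low or negative means poorly defined."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, lab in zip(values, labels):
        sums[lab] += v
        counts[lab] += 1
    return {lab: sums[lab] / counts[lab] for lab in sums}

# Hypothetical toy data: the last point sits next to cluster 0 but is labelled 1.
points = [[0, 0], [0, 1], [10, 10], [10, 11], [0, 0.5]]
labels = [0, 0, 1, 1, 1]
values = silhouette_values(points, labels)
print([round(v, 2) for v in values])
print(mean_per_cluster(values, labels))
```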

Clusters Investigation

We used box plots to visualize how the input features vary within each cluster:

[7 figures: box plots of the input features for the analysed clusters]
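A box plot summarises the spread of one feature through its quartiles. The numbers behind such plots can be computed per cluster with the standard library's `statistics.quantiles`; the data layout (one relative-frequency vector per user plus a parallel list of cluster labels), the function name and the toy values are assumptions. The plots themselves could then be drawn with, for example, matplotlib's `boxplot`.

```python
import statistics
from collections import defaultdict

# Sketch under assumptions: hypothetical data layout, not the author's code.
def box_stats(features, labels, subforums):
    """(Q1, median, Q3) of every input feature within every cluster.

    features: one relative-frequency vector per user
    labels:   cluster label of each user, in the same order
    Each cluster needs at least two users for statistics.quantiles.
    """
    by_cluster = defaultdict(list)
    for vector, lab in zip(features, labels):
        by_cluster[lab].append(vector)
    result = {}
    for lab, vectors in by_cluster.items():
        result[lab] = {}
        for k, name in enumerate(subforums):
            column = [v[k] for v in vectors]
            # n=4 returns the three cut points: Q1, median, Q3.
            result[lab][name] = tuple(statistics.quantiles(column, n=4))
    return result

# Hypothetical toy data: five users in one cluster, two subforums.
features = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6], [0.5, 0.5]]
labels = [0, 0, 0, 0, 0]
stats = box_stats(features, labels, ["A", "B"])
print(stats[0]["A"])
```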


The following charts show how many comments the users of each cluster wrote in every topic:

[8 figures: bar charts of the number of comments per topic for each cluster]
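Unlike the clustering features, these bar charts use absolute message counts per cluster and topic. A sketch of that aggregation, assuming one `(user, subforum)` pair per message and a dictionary of cluster labels; the function name and toy data are hypothetical.

```python
from collections import Counter

# Sketch under assumptions: hypothetical input format, not the author's code.
def comments_per_cluster(posts, user_cluster):
    """Number of messages per (cluster, subforum) pair, i.e. the bar heights.

    posts:        iterable of (user, subforum) pairs, one per message
    user_cluster: dict user -> cluster label; users missing from it
                  (for example those in excluded clusters) are skipped
    """
    totals = Counter()
    for user, subforum in posts:
        cluster = user_cluster.get(user)
        if cluster is not None:
            totals[(cluster, subforum)] += 1
    return totals

# Hypothetical toy data.
posts = [("a", "Headphones"), ("a", "Headphones"), ("b", "Headphones"), ("c", "DAC")]
user_cluster = {"a": 2, "b": 2, "c": 5}
totals = comments_per_cluster(posts, user_cluster)
print(totals[(2, "Headphones")], totals[(5, "DAC")])
```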


Clusters Description

Based on the gathered data, we can describe these clusters as follows:
  • Cluster 1: The most popular topics within this group are amplifiers and turntables. These users wrote nearly as many messages to “Amplifiers, Phono preamp, and Analog Audio Review” as cluster 9, although cluster 1 has significantly fewer people. Users from all other clusters posted far fewer messages in this subforum. In the subforums “Stereo and Multichannel Amplifier Reviews” and “Turntables, Phono Amplifiers, Cartridges Review”, this cluster wrote the most messages of all clusters.
  • Cluster 2: People in this cluster seem to focus mainly on headphones, across all three subforums specialised in this topic (“Headphones and Headphone Amplifier Reviews”, “Headphone Amplifier Reviews & Discussion”, and “Headphone & IEM Reviews & Discussions”). As expected, a large share of the messages in these three subforums comes from this cluster.
  • Cluster 3: This group is strongly focused on the home theater topic (more than 50% of their messages on average). Other topics most of them write to occasionally concern amplifiers and speakers. They also wrote many more messages to the “Home Theater AVR and Processor Review” subforum than any other cluster, even though this cluster has only 89 users.
  • Cluster 4: This is another cluster whose users wrote more messages in their area of focus than the largest cluster we analysed (cluster 9). Some users in this cluster are most active in the speaker reviews topic. Compared with the other clusters, its activity is highest, by a large margin, in “Speaker Reviews, Measurements and Discussions”.
  • Cluster 5: These users show interest in the DAC, DAP, and “Home Music Servers, Computers and Streamers” subforums. According to the box plot, most users in this cluster behave similarly. However, the bar charts show that they do not write many comments, even though the cluster is relatively large.
  • Cluster 6: People in this cluster seem to write most frequently in the “General Audio Discussions” topic (more than half of their comments). They wrote many comments in this subforum (more than 200 per user on average). However, the box plot shows great variability in this value, so a handful of people may write a lot there while the others write much less.
  • Cluster 9: This is the largest cluster produced by K-Means. As the bar chart shows, its interests are similar to those of cluster 5: users mostly write to the various subforums focused on DACs and DAPs, or to the “Home Music Servers, Computers and Streamers” subforum. As expected, this group wrote the most comments of all clusters in subforums on these topics, although this is probably unavoidable given the size of the cluster.
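Several of these descriptions compare raw totals between clusters of very different sizes (cluster 3 has only 89 users, while cluster 9 is the largest). Dividing each total by the cluster size, and expressing it as a share of the subforum's messages, makes such comparisons explicit. A sketch with hypothetical function names and toy numbers, not results from the forum data:

```python
from collections import Counter

# Sketch under assumptions: toy numbers, hypothetical helper names.
def per_user_average(totals, cluster_sizes):
    """Messages per user for every (cluster, subforum) total."""
    return {(c, s): n / cluster_sizes[c] for (c, s), n in totals.items()}

def subforum_share(totals):
    """Fraction of each subforum's messages written by each cluster."""
    per_subforum = Counter()
    for (_, s), n in totals.items():
        per_subforum[s] += n
    return {(c, s): n / per_subforum[s] for (c, s), n in totals.items()}

# Toy numbers: a small, very active cluster "A" and a large cluster "B".
totals = {("A", "Home Theater"): 200, ("B", "Home Theater"): 300}
sizes = {"A": 10, "B": 100}
print(per_user_average(totals, sizes))
print(subforum_share(totals))
```

Here cluster "B" writes more messages in total, but cluster "A" is far more active per user.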

Amir belongs to cluster 8, so he does not belong to any of the groups described above.

The top-10 users for each cluster:

[7 figures: top-10 users of each analysed cluster]
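The post does not say how the top-10 users were ranked. Assuming the ranking is by total message count, the lists could be produced as sketched below; the function name, the `(user, subforum)` input format and the toy data are hypothetical.

```python
import heapq
from collections import Counter, defaultdict

# Sketch under assumptions: ranking by total message count is a guess.
def top_users(posts, user_cluster, n=10):
    """The n users with the most messages in each cluster."""
    counts = Counter(user for user, _ in posts if user in user_cluster)
    by_cluster = defaultdict(list)
    for user, count in counts.items():
        by_cluster[user_cluster[user]].append((count, user))
    # nlargest orders by count; ties are broken by username (descending).
    return {c: [user for _, user in heapq.nlargest(n, members)]
            for c, members in by_cluster.items()}

# Hypothetical toy data: a has 3 messages, b has 1, c has 2, d has 5.
posts = ([("a", "Speakers")] * 3 + [("b", "DAC")] + [("c", "DAC")] * 2
         + [("d", "Headphones")] * 5)
result = top_users(posts, {"a": 1, "b": 1, "c": 1, "d": 2}, n=2)
print(result)
```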
 

mhardy6647

Grand Contributor
Joined
Dec 12, 2019
Messages
11,405
Likes
24,758
You missed the biggest group. Those who are just bored and reading for entertainment only
And/or those who click on "New Messages" and read the ones that catch their eye. :)
It is oh-so-easy to mis-meta-analyze data. ;)

Point of contention: The top users are seemingly so prolific that each cluster may well be defined by them and not the long tail of the indifferent. Please correct me if you've done anything to mitigate their weighty influence!
[emphasis added]

Thanks to this post, I have a new phrase to use*:
The long tail of the indifferent.
Brilliant! :)

_____________
* Or a title -- if I ever write a book or put out an album. :cool:
 

TankTop

Senior Member
Joined
Jul 10, 2019
Messages
380
Likes
376
New messages, oh that’s definitely me
 

kemmler3D

Major Contributor
Forum Donor
Joined
Aug 25, 2022
Messages
3,352
Likes
6,866
Location
San Francisco
This is really cool!

I think the comment about excluding prolific posters is a good one. People do have their preferences and areas of expertise, and will tend to focus heavily on commenting in those areas. Not a big surprise. But I guess a follow-up question is whether there are generalists at all? What does this analysis look like for users in the (say) 85th percentile or below?
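One way to test this is to re-run the clustering on users at or below a chosen percentile of message count (the 85th here, as suggested), so that the most prolific posters are left out entirely. A minimal sketch of the filtering step; the nearest-rank percentile definition, the function name and the toy data are assumptions.

```python
import math
from collections import Counter

# Sketch under assumptions: nearest-rank percentile, hypothetical input format.
def users_at_or_below_percentile(posts, pct=85):
    """Users whose message count does not exceed the pct-th percentile.

    Nearest-rank percentile: the count at ordinal rank ceil(pct/100 * N)
    among the N users' counts sorted in ascending order.
    """
    counts = Counter(user for user, _ in posts)
    ranked = sorted(counts.values())
    rank = max(1, math.ceil(pct / 100 * len(ranked)))
    threshold = ranked[rank - 1]
    return {user for user, c in counts.items() if c <= threshold}

# Hypothetical toy data: user u1 wrote 1 message, u2 wrote 2, ..., u10 wrote 10.
posts = [(f"u{i}", "General Audio Discussions") for i in range(1, 11) for _ in range(i)]
kept = users_at_or_below_percentile(posts, pct=85)
print(sorted(kept, key=lambda u: int(u[1:])))
```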
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,405
Likes
18,366
Location
Netherlands
I think it may be also weighted by the amount of questions that you ask.:(
Is that a negative weight then? I don’t think I made many topics in my cluster…
 

sarumbear

Master Contributor
Forum Donor
Joined
Aug 15, 2020
Messages
7,604
Likes
7,324
Location
UK

And in English, what's the takeaway?
 

GXAlan

Major Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
3,923
Likes
6,058
This is great. Is there a way to see how I score within each cluster?

Within these clusters, are there differences in speaker preferences (multichannel versus stereo, headphones vs speakers, dispersion…)?

It may be possible to answer this in a poll if we can map the 7 clusters onto 7 dimensions and then have a vector representing where each of us sits in 7-dimensional space.
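A sketch of the suggested "vector of where we are": compute the distance from a user's relative-frequency vector to each cluster centroid, giving one coordinate per cluster. K-Means itself places a user in the cluster with the nearest centroid. The centroid values, the two features and the function name are hypothetical; with the real data each vector would have one entry per subforum.

```python
import math

# Sketch under assumptions: hypothetical centroids and helper name.
def cluster_profile(user_vector, centroids):
    """Distance from one user's feature vector to every cluster centroid.

    Returns (nearest cluster, {cluster: distance}); the dict shows how close
    the user is to every cluster, not only the one they were assigned to.
    """
    distances = {c: math.dist(user_vector, centre) for c, centre in centroids.items()}
    nearest = min(distances, key=distances.get)
    return nearest, distances

# Hypothetical centroids over two features (share of headphone vs speaker posts).
centroids = {2: [0.9, 0.1], 4: [0.1, 0.9]}
nearest, distances = cluster_profile([0.7, 0.3], centroids)
print(nearest, distances)
```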
 