
Saturday, January 23, 2016

KBO clustering

I refer to this MPML message and to the associated conversation, in particular to the interesting comments made by Aldo Vitagliano, author of the Solex orbit simulator, about how difficult it is to see a cluster of KBOs.

I do not know the answer, but this prompted me to look at the distribution of the KBOs' orbital parameters. What follows is an exercise ... so I do not claim that it is correct!

I used the Web service made available by MPC to look for KBOs characterized by:
 250 AU < a < 1000 AU

I found this list:

Designation   a (AU)     e           i (deg)     w (deg)       node (deg)
2006 UL321    260.7642   0.9099041   37.36673    -6.20276      -17.0798
2012 VP113    268.2509   0.7005635   24.0183     -66.968       90.88303
1996 PW       271.4848   0.9905821   29.72461    -178.15845    144.61041
2011 OR17     287.1875   0.9892086   110.33772   14.03442      -88.40106
336756        319.0439   0.9704821   140.81617   132.96423     136.19752
2013 RF98     325.1039   0.8883746   29.57957    -43.45499     67.57105
2004 VN112    339.0837   0.8604115   25.51725    -32.76479     66.06033
2010 GB174    364.8513   0.8670264   21.53648    -12.61199     130.60394
2015 DB216    418.7213   0.9800038   40.48768    -119.1578     3.27749
2010 BK118    484.3816   0.987395    143.90195   179.11098     175.97502
2007 DA61     518.3954   0.9948766   76.71542    -10.32894     145.98047
90377         539.668    0.8588315   11.92852    -48.91693     144.50447
2007 TG422    546.4473   0.934885    18.57957    -74.16184     112.98074
87269         586.5961   0.9645518   20.07087    -147.4484     142.37805
2002 RN109    746.6725   0.9963802   57.99165    -147.53176    170.49258
308933        780.1764   0.969011    19.46659    122.49628     -162.61603
2013 AZ60     990.1281   0.992013    16.52296    157.9403      -10.77058

Then I used R to build a hierarchical clustering as follows:
1) I scaled the above table so that every column has mean 0 and variance 1.
2) I calculated the distance between any two rows (Manhattan distance).
3) I submitted the resulting distance matrix to the function hclust, choosing the complete linkage method.
4) I used a further R function (see rect.hclust) to display colored rectangles at different heights: the purpose is to help visualize the various clusters at different levels.
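The four steps above can be sketched in R roughly as follows. This is only a minimal sketch: the data frame name kbo, the column names (a, e, i, w, node) and the two values of k passed to rect.hclust are my assumptions, and the elements are retyped from the list above, so the result may not match the original dendrogram exactly.

```r
# Orbital elements retyped from the list above (angles in degrees).
# The column names are assumptions made for this sketch.
kbo <- data.frame(
  name = c("2006 UL321", "2012 VP113", "1996 PW", "2011 OR17", "336756",
           "2013 RF98", "2004 VN112", "2010 GB174", "2015 DB216", "2010 BK118",
           "2007 DA61", "90377", "2007 TG422", "87269", "2002 RN109",
           "308933", "2013 AZ60"),
  a = c(260.7642, 268.2509, 271.4848, 287.1875, 319.0439, 325.1039, 339.0837,
        364.8513, 418.7213, 484.3816, 518.3954, 539.668, 546.4473, 586.5961,
        746.6725, 780.1764, 990.1281),
  e = c(0.9099041, 0.7005635, 0.9905821, 0.9892086, 0.9704821, 0.8883746,
        0.8604115, 0.8670264, 0.9800038, 0.987395, 0.9948766, 0.8588315,
        0.934885, 0.9645518, 0.9963802, 0.969011, 0.992013),
  i = c(37.36673, 24.0183, 29.72461, 110.33772, 140.81617, 29.57957, 25.51725,
        21.53648, 40.48768, 143.90195, 76.71542, 11.92852, 18.57957, 20.07087,
        57.99165, 19.46659, 16.52296),
  w = c(-6.20276, -66.968, -178.15845, 14.03442, 132.96423, -43.45499,
        -32.76479, -12.61199, -119.1578, 179.11098, -10.32894, -48.91693,
        -74.16184, -147.4484, -147.53176, 122.49628, 157.9403),
  node = c(-17.0798, 90.88303, 144.61041, -88.40106, 136.19752, 67.57105,
           66.06033, 130.60394, 3.27749, 175.97502, 145.98047, 144.50447,
           112.98074, 142.37805, 170.49258, -162.61603, -10.77058)
)

# 1) Scale every numeric column to mean 0 and variance 1.
x <- scale(kbo[, c("a", "e", "i", "w", "node")])
rownames(x) <- kbo$name

# 2) Manhattan distance between every pair of rows.
d <- dist(x, method = "manhattan")

# 3) Hierarchical clustering with complete linkage.
hc <- hclust(d, method = "complete")

# 4) Draw the dendrogram and outline the clusters at two cut levels.
#    k = 2 and k = 6 are assumed values chosen for illustration.
plot(hc, main = "KBO clustering", xlab = "", sub = "")
rect.hclust(hc, k = 2, border = "red")
rect.hclust(hc, k = 6, border = "blue")
```

Setting the row names before calling dist is what makes the designations appear as leaf labels in the dendrogram.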


I do not know whether these clusters are statistically significant.

The left cluster may be interesting: in fact, it consistently maintains its shape even when you cut the dendrogram at a level where the second big cluster gets split into 5 subgroups.

The left cluster contains asteroid 2012 VP113 plus four other asteroids that are even more similar to one another.
From now on I will refer to this cluster as "cluster 2", as opposed to "cluster 1", which is made up of all the other KBOs without further distinction.

A nice R function is cutree: you can tag the original table with an extra column giving the cluster each row is assigned to. By doing this, you can use, for example, the function ggpairs of the GGally library to make a set of plots like these:
  • on the diagonal: you can see the density distribution, each cluster being given a different color. Cluster 2 is coloured in blue.
  • above the diagonal: you can see the correlation between pairs of parameters, both per cluster and overall.
  • below the diagonal: you can see a scatter plot of each pair of parameters; again, the colour represents a cluster.
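A possible way to tag the table with cutree and draw the pair plot is sketched below. Again this is only a sketch under assumptions: the names kbo and cluster are mine, k = 2 is my reading of the two clusters discussed above, and the colours ggpairs picks by default may differ from the original figure.

```r
# Orbital elements retyped from the list above (angles in degrees).
# The column names are assumptions made for this sketch.
kbo <- data.frame(
  name = c("2006 UL321", "2012 VP113", "1996 PW", "2011 OR17", "336756",
           "2013 RF98", "2004 VN112", "2010 GB174", "2015 DB216", "2010 BK118",
           "2007 DA61", "90377", "2007 TG422", "87269", "2002 RN109",
           "308933", "2013 AZ60"),
  a = c(260.7642, 268.2509, 271.4848, 287.1875, 319.0439, 325.1039, 339.0837,
        364.8513, 418.7213, 484.3816, 518.3954, 539.668, 546.4473, 586.5961,
        746.6725, 780.1764, 990.1281),
  e = c(0.9099041, 0.7005635, 0.9905821, 0.9892086, 0.9704821, 0.8883746,
        0.8604115, 0.8670264, 0.9800038, 0.987395, 0.9948766, 0.8588315,
        0.934885, 0.9645518, 0.9963802, 0.969011, 0.992013),
  i = c(37.36673, 24.0183, 29.72461, 110.33772, 140.81617, 29.57957, 25.51725,
        21.53648, 40.48768, 143.90195, 76.71542, 11.92852, 18.57957, 20.07087,
        57.99165, 19.46659, 16.52296),
  w = c(-6.20276, -66.968, -178.15845, 14.03442, 132.96423, -43.45499,
        -32.76479, -12.61199, -119.1578, 179.11098, -10.32894, -48.91693,
        -74.16184, -147.4484, -147.53176, 122.49628, 157.9403),
  node = c(-17.0798, 90.88303, 144.61041, -88.40106, 136.19752, 67.57105,
           66.06033, 130.60394, 3.27749, 175.97502, 145.98047, 144.50447,
           112.98074, 142.37805, 170.49258, -162.61603, -10.77058)
)

elements <- c("a", "e", "i", "w", "node")

# Same clustering as before: scale, Manhattan distance, complete linkage.
hc <- hclust(dist(scale(kbo[, elements]), method = "manhattan"),
             method = "complete")

# Cut the tree into 2 clusters and store the label as a factor column.
kbo$cluster <- factor(cutree(hc, k = 2))

# Pair plot coloured by cluster; skipped if GGally is not installed.
if (requireNamespace("GGally", quietly = TRUE)) {
  p <- GGally::ggpairs(kbo, columns = elements,
                       mapping = ggplot2::aes(colour = cluster))
  print(p)
}
```

cutree returns one integer label per row, in the row order of the table, so it can be attached directly as a new column.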

... and, if you are interested in a specific plot, you can make it on its own with the ggplot function.

For example:
  • let's see a scatter plot of orbital parameters a and w, with the asteroid names added as labels
  • finally, let's see the density distribution of orbital parameter w
Figure: a versus w
Figure: w density distribution
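The two single plots could be produced along these lines. This is a sketch, not the exact code behind the figures: the names kbo, cluster, p_scatter and p_density are mine, and the label offset and transparency values are arbitrary choices.

```r
# Orbital elements retyped from the list above (angles in degrees).
# The column names are assumptions made for this sketch.
kbo <- data.frame(
  name = c("2006 UL321", "2012 VP113", "1996 PW", "2011 OR17", "336756",
           "2013 RF98", "2004 VN112", "2010 GB174", "2015 DB216", "2010 BK118",
           "2007 DA61", "90377", "2007 TG422", "87269", "2002 RN109",
           "308933", "2013 AZ60"),
  a = c(260.7642, 268.2509, 271.4848, 287.1875, 319.0439, 325.1039, 339.0837,
        364.8513, 418.7213, 484.3816, 518.3954, 539.668, 546.4473, 586.5961,
        746.6725, 780.1764, 990.1281),
  e = c(0.9099041, 0.7005635, 0.9905821, 0.9892086, 0.9704821, 0.8883746,
        0.8604115, 0.8670264, 0.9800038, 0.987395, 0.9948766, 0.8588315,
        0.934885, 0.9645518, 0.9963802, 0.969011, 0.992013),
  i = c(37.36673, 24.0183, 29.72461, 110.33772, 140.81617, 29.57957, 25.51725,
        21.53648, 40.48768, 143.90195, 76.71542, 11.92852, 18.57957, 20.07087,
        57.99165, 19.46659, 16.52296),
  w = c(-6.20276, -66.968, -178.15845, 14.03442, 132.96423, -43.45499,
        -32.76479, -12.61199, -119.1578, 179.11098, -10.32894, -48.91693,
        -74.16184, -147.4484, -147.53176, 122.49628, 157.9403),
  node = c(-17.0798, 90.88303, 144.61041, -88.40106, 136.19752, 67.57105,
           66.06033, 130.60394, 3.27749, 175.97502, 145.98047, 144.50447,
           112.98074, 142.37805, 170.49258, -162.61603, -10.77058)
)

# Cluster labels obtained as described earlier (k = 2 is an assumption).
hc <- hclust(dist(scale(kbo[, c("a", "e", "i", "w", "node")]),
                  method = "manhattan"), method = "complete")
kbo$cluster <- factor(cutree(hc, k = 2))

# Plots are skipped if ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Scatter plot of a versus w, labelled with the asteroid names.
  p_scatter <- ggplot(kbo, aes(x = a, y = w, colour = cluster, label = name)) +
    geom_point() +
    geom_text(vjust = -0.8, size = 3) +
    labs(x = "a (AU)", y = "w (deg)")
  print(p_scatter)

  # Density distribution of w, one curve per cluster.
  p_density <- ggplot(kbo, aes(x = w, fill = cluster)) +
    geom_density(alpha = 0.4) +
    labs(x = "w (deg)")
  print(p_density)
}
```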

Kind Regards,
Alessandro Odasso