Quote:
Originally posted by 贏馬人 on 26-2-2010 04:12
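Not many people manage to become real, consistent winners. Generally you have to work out how each trainer places and prepares his horses, compiling about a hundred items of data for every trainer and taking every little move into account. For 24 trainers that comes to roughly two thousand-odd items of data. You also need five years or more of records before there is any accuracy. It is only credible once the base is large. I spent ...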
I assume 數據 ("data") means factors, not data points. (E.g. "days between last gallop and next race" is a factor; a particular horse having "6 days" is a data point for that factor.)
5 years of data = ~3500 races, or 40,000 horse instances, or fewer than 2000 per trainer on average. So your data-point-to-factor ratio is less than 20:1 on average. With a ratio this low, your system is almost guaranteed to overfit. It may therefore look very good on paper on past races, but cannot really predict future races.
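To make the arithmetic explicit, here is a rough sketch in Python. The 24 trainers and roughly 100 factors per trainer are taken from the quoted post, the 40,000 horse instances is my estimate above, and the variable names are made up for illustration.

```python
# Rough figures from this thread; treat them as estimates, not exact counts.
horse_runs = 40_000          # about 5 years of horse instances
trainers = 24                # from the quoted post
factors_per_trainer = 100    # from the quoted post

# Average number of data points available for each trainer.
runs_per_trainer = horse_runs / trainers

# Data points available per factor for a single trainer.
ratio = runs_per_trainer / factors_per_trainer

print(f"runs per trainer: {runs_per_trainer:.0f}")    # prints 1667
print(f"data points per factor: {ratio:.1f}")         # prints 16.7
```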
To build a good system, you need
as few factors as possible, but
as many data points as possible. Does it make sense to you?
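
If it helps, here is a toy simulation of the idea, not anyone's actual system. Everything in it is an assumption for illustration: race results are coin flips, every "factor" is pure noise, and `best_factor_accuracy` is a made-up helper. It picks whichever factor best matches the past results and then checks how that factor does on new races.

```python
import random

def best_factor_accuracy(n_points, n_factors, rng, n_holdout=1000):
    """Pick the noise factor that best 'predicts' past results, then test on new races."""
    # Past outcomes are coin flips, so no factor has real predictive power.
    past = [rng.randint(0, 1) for _ in range(n_points)]
    best_in_sample = 0.0
    for _ in range(n_factors):
        # Each factor is just a random yes/no signal for every past race.
        signal = [rng.randint(0, 1) for _ in range(n_points)]
        hits = sum(s == y for s, y in zip(signal, past))
        best_in_sample = max(best_in_sample, hits / n_points)
    # On future races the chosen factor is still noise against coin-flip results.
    future_hits = sum(rng.randint(0, 1) == rng.randint(0, 1) for _ in range(n_holdout))
    return best_in_sample, future_hits / n_holdout

rng = random.Random(2010)  # fixed seed so the run is repeatable
low_ratio = best_factor_accuracy(n_points=20, n_factors=100, rng=rng)
high_ratio = best_factor_accuracy(n_points=5000, n_factors=100, rng=rng)
print("20 points, 100 factors:   past %.2f, future %.2f" % low_ratio)
print("5000 points, 100 factors: past %.2f, future %.2f" % high_ratio)
```

With few data points, the best of 100 noise factors tends to look impressive on past races while staying near 50% on future ones. With many data points, even the best noise factor stays close to 50% on the past races too, so it cannot fool you.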