Statistic | # | Name | Type 1 | Type 2 | HP | Atk | Def | Sp.Atk | Sp.Def | Speed | Total
---|---|---|---|---|---|---|---|---|---|---|---
Range | 1-721 | - | Bug, Dark, Dragon, Electric, Fairy, Fire, Fighting, Flying, Grass, Ghost, Ground, Ice, Normal, Water, Poison, Psychic, Rock, Steel | Same as Type 1, plus None | 1-255 | 5-190 | 5-230 | 10-194 | 20-230 | 5-180 | 180-780
Mean | - | - | - | - | 69.26 | 79.00 | 73.84 | 72.82 | 71.90 | 68.28 | 435.10
Median | - | - | - | - | 65 | 75 | 70 | 65 | 70 | 65 | 450
Preliminary analysis of the Pokemon Data Set revealed that Pokemon with only one type had an empty (null) Type 2 field. To prevent errors caused by null values, these fields were set to the string 'None'. Next, Pandas DataFrame methods were used to get a rough idea of how the stats are distributed. Unsurprisingly, 'Total' is much larger than the other stats: since it is the sum of the six individual stats, its mean (435.10) is roughly six times that of a typical stat. For that reason, it is not graphed with the other stats, to prevent 'squeezing'. Finally, a boxplot was generated for a more visual representation of the stats. Note that Type 1 and Type 2 are henceforth referred to as Primary Type and Secondary Type.
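A minimal pandas sketch of these preprocessing steps, assuming the data sits in a DataFrame whose columns follow the table above (the real CSV may use different names, e.g. 'Attack' instead of 'Atk'). The four rows are a hypothetical stand-in for the full data set. The boxplot itself is omitted to avoid a plotting dependency; `describe()` reports the quartiles a boxplot is built from.

```python
import numpy as np
import pandas as pd

STATS = ["HP", "Atk", "Def", "Sp.Atk", "Sp.Def", "Speed"]

# Hypothetical four-row stand-in for the real data set.
df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Squirtle", "Pidgey"],
    "Type 1": ["Grass", "Fire", "Water", "Normal"],
    "Type 2": ["Poison", np.nan, np.nan, "Flying"],  # single-type Pokemon have a null Type 2
    "HP": [45, 39, 44, 40],
    "Atk": [49, 52, 48, 45],
    "Def": [49, 43, 65, 40],
    "Sp.Atk": [65, 60, 50, 35],
    "Sp.Def": [65, 50, 64, 35],
    "Speed": [45, 65, 43, 56],
})
df["Total"] = df[STATS].sum(axis=1)  # Total is the sum of the six stats

# Replace null Type 2 values with the string 'None' so later grouping never sees NaN.
df["Type 2"] = df["Type 2"].fillna("None")

# Adopt the Primary/Secondary terminology used in the rest of the analysis.
df = df.rename(columns={"Type 1": "Primary Type", "Type 2": "Secondary Type"})

# Summarise the six stats separately from Total so Total does not squeeze their scale.
stat_summary = df[STATS].describe()
total_summary = df["Total"].describe()
print(stat_summary.loc[["min", "mean", "50%", "max"]])
print(total_summary[["min", "mean", "50%", "max"]])
```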
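By using a bar chart and the Pandas DataFrame.count() method, the type frequency distribution can be observed. Since types are categorical and have no natural order, a Pareto chart is used: it sorts the types by descending frequency and overlays the cumulative percentage, so each type is ranked numerically by how often it occurs.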
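The numbers behind a Pareto chart can be sketched as a small table. This version uses `Series.value_counts()` (swapped in for the `DataFrame.count()` call mentioned above, because it already returns counts sorted in descending order) and adds the cumulative percentage that a Pareto chart draws as a line. The helper name `pareto_table` and the sample data are hypothetical.

```python
import pandas as pd

def pareto_table(types: pd.Series) -> pd.DataFrame:
    """Frequency of each type, sorted descending, with the cumulative percentage."""
    counts = types.value_counts()  # descending by frequency
    cumulative_pct = counts.cumsum() / counts.sum() * 100
    return pd.DataFrame({"count": counts, "cumulative %": cumulative_pct})

# Hypothetical sample of Primary Types.
primary = pd.Series(["Water"] * 3 + ["Grass"] * 2 + ["Fire"], name="Primary Type")
table = pareto_table(primary)
print(table)
```

Plotting `count` as bars and `cumulative %` on a secondary axis gives the usual Pareto chart.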
The following trends can be observed from the charts:
1. Most types with high frequency in the Primary Type chart have low frequency in the Secondary Type chart,
2. Nearly half (48.25%) of Pokemon have a single type,
3. Secondary Type distribution has much less variance than Primary Type distribution (if Secondary Type = None is disregarded).
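Trends 2 and 3 can be checked numerically. The sketch below is a minimal illustration rather than the original analysis code: it assumes the types are plain lists of strings with single-type Pokemon already marked 'None', and it interprets "variance" as the population variance of the per-type counts. The function names and the six-element sample are hypothetical.

```python
from collections import Counter
from statistics import pvariance

def single_type_share(secondary_types):
    """Percentage of Pokemon whose Secondary Type is 'None' (i.e. single-type)."""
    singles = sum(1 for t in secondary_types if t == "None")
    return 100 * singles / len(secondary_types)

def count_variance(types, exclude=()):
    """Population variance of per-type frequencies, ignoring excluded labels.

    Only types that actually occur are counted; absent types are not treated as zeros.
    """
    counts = Counter(t for t in types if t not in exclude)
    return pvariance(list(counts.values()))

# Hypothetical sample, not the real data set.
primary = ["Water", "Water", "Water", "Water", "Grass", "Fire"]
secondary = ["None", "None", "Flying", "Poison", "None", "Flying"]

print(single_type_share(secondary))  # 50.0
print(count_variance(primary))
print(count_variance(secondary, exclude={"None"}))  # disregard Secondary Type = None
```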
A heatmap of the Primary Type/Secondary Type combinations shows which combinations are most prevalent and which do not exist. These combinations are order specific; therefore Bug/Poison and Poison/Bug are considered different. This specificity raises the question: is there a difference between Primary Type and Secondary Type?
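The counts behind such a heatmap can be built with `pandas.crosstab`, which keeps Primary Type on the rows and Secondary Type on the columns, so order-specific combinations stay distinct. A minimal sketch on a hypothetical sample: zero cells are the combinations that do not exist, and the resulting table could be passed to a plotting function such as seaborn's `heatmap`.

```python
import pandas as pd

# Hypothetical sample; the real analysis uses the full data set.
df = pd.DataFrame({
    "Primary Type":   ["Bug", "Bug", "Poison", "Water", "Water"],
    "Secondary Type": ["Poison", "Poison", "Bug", "None", "Flying"],
})

# Rows = Primary Type, columns = Secondary Type, so Bug/Poison and Poison/Bug are separate cells.
combos = pd.crosstab(df["Primary Type"], df["Secondary Type"])
print(combos)

# Combinations that never occur (zero cells), as (primary, secondary) pairs.
missing = [(p, s) for p in combos.index for s in combos.columns if combos.loc[p, s] == 0]
print(missing)
```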
Discussions on forums revealed that most people thought the Primary Type mattered more than the Secondary Type only in terms of a Pokemon's appearance. However, further analysis is needed.
By superimposing the probability density functions of each stat for Pokemon that have the selected type as their Primary Type and as their Secondary Type, we can find out whether Primary Type is more important than Secondary Type. For most types and most stats, both density functions have similar shapes. While there are clear exceptions (e.g. Psychic Sp. Atk), density functions of the same type resemble each other much more than density functions of different types. Therefore I hypothesize that Primary Type does not have a stronger influence on stats than Secondary Type.
The bar chart above shows that, for most types, many more Pokemon have that type as their Primary Type than as their Secondary Type. The Primary Type density function is therefore estimated from a larger sample and is much less sensitive to outliers. Moreover, since the Primary Type density function includes Pokemon with a single type, it is more 'pure'; that is to say, it is less influenced by other types.
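The comparison can be sketched with a hand-rolled Gaussian kernel density estimate. This is a minimal illustration assuming rows are dictionaries keyed by the column names of the table above plus the Primary/Secondary Type labels; the bandwidth uses the normal reference rule of thumb (1.06·σ·n^(-1/5)), which may differ from the default of whatever plotting library produced the original graphs. The function names and the five sample rows are illustrative only.

```python
import math
from statistics import stdev

def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the probability density of `samples`."""
    n = len(samples)
    if bandwidth is None:
        if n < 2:
            raise ValueError("need at least two samples to pick a bandwidth")
        # Normal reference rule of thumb (assumed; libraries may choose differently).
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each sample, scaled to integrate to 1.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def split_by_role(rows, type_name, stat):
    """Values of `stat` for Pokemon having `type_name` as Primary vs Secondary Type."""
    primary = [r[stat] for r in rows if r["Primary Type"] == type_name]
    secondary = [r[stat] for r in rows if r["Secondary Type"] == type_name]
    return primary, secondary

# Illustrative sample rows (Sp.Atk only).
rows = [
    {"Name": "Abra", "Primary Type": "Psychic", "Secondary Type": "None", "Sp.Atk": 105},
    {"Name": "Natu", "Primary Type": "Psychic", "Secondary Type": "Flying", "Sp.Atk": 70},
    {"Name": "Drowzee", "Primary Type": "Psychic", "Secondary Type": "None", "Sp.Atk": 43},
    {"Name": "Slowpoke", "Primary Type": "Water", "Secondary Type": "Psychic", "Sp.Atk": 40},
    {"Name": "Exeggcute", "Primary Type": "Grass", "Secondary Type": "Psychic", "Sp.Atk": 60},
]
primary, secondary = split_by_role(rows, "Psychic", "Sp.Atk")
f_primary, f_secondary = gaussian_kde(primary), gaussian_kde(secondary)

# Evaluate both curves on a shared grid; superimposing them is then just plotting.
for x in range(0, 201, 20):
    print(x, round(f_primary(x), 5), round(f_secondary(x), 5))
```

With the full data set, the same two calls would be repeated for every type and stat; the smaller Secondary Type group gives its estimate fewer points to rest on.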
Note: the axis scales change from graph to graph depending on the type, so pay attention to them when comparing.