Arriving at the final visualization of this data took me through a design process. Looking at the Students Performance dataset I found on Kaggle, I first had to decide which parts of the dataset I wanted to represent. There were many avenues I could have taken: comparing students by their math, reading, and writing test scores, by their race/ethnicity, by their parental level of education, and so on. Ultimately, I decided to compare students’ reading and writing test scores and group them by gender, because this comparison was the most straightforward for me to understand, and I felt it would lead to the best representation of the dataset. After deciding on the data, I had to decide what kind of chart to use. Since there are 1000 entries in the Students Performance dataset, I opted for a scatterplot. At first, I considered making three separate scatterplots for math, reading, and writing test scores, but after testing this strategy in Excel and Tableau, I realized that all of the test scores had similar slopes, so it would be difficult to distinguish with the naked eye how the scores differ across subjects. It would also have required a lot of extra code to link the charts to one another. To solve this problem, I took on the challenge of plotting two similar subjects (reading and writing) on the same scatterplot. This way, it is easier for the viewer to see whether there is a noticeable difference between the subjects as well as between the genders’ performance.
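The data selection described above can be sketched in plain JavaScript. This is only a minimal sketch, not the project's actual code: the function name `toPoints` is hypothetical, and the column names `gender`, `reading score`, and `writing score` are assumed to match the headers of the Kaggle CSV once it has been parsed into row objects (for example by `d3.csv`).

```javascript
// Hypothetical helper: keep only the fields the scatterplot needs.
// Assumes each row is an object keyed by the CSV header names
// "gender", "reading score", and "writing score" (an assumption about the file).
function toPoints(rows) {
  return rows
    .map((row) => ({
      gender: row["gender"],
      reading: parseFloat(row["reading score"]),
      writing: parseFloat(row["writing score"]),
    }))
    // Drop rows whose scores cannot be parsed as numbers.
    .filter((p) => Number.isFinite(p.reading) && Number.isFinite(p.writing));
}
```

Reducing each row to three fields up front keeps the plotting code independent of the other columns (math score, race/ethnicity, parental level of education) that were considered but not used.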
To improve the overall quality, look, and feel of the scatterplot, I added a few features that make it easier for the viewer to study. First, I read up on how to add a grid to my graph using D3.js. Because I had so many points to plot, a grid makes it much easier for the viewer to read values off the graph rather than guessing where the points lie along the axes. Second, the points on the scatterplot are colored blue for male students and orange for female students. The colors help the viewer see where differences in test scores may lie and compare how males perform on these tests against females. Third, to add a little interaction to my chart, I included a feature that highlights a datapoint in red when the mouse hovers over it and displays a tooltip along the bottom of the chart showing the student’s gender and their exact reading and writing test scores. This adds precision and removes the guesswork of estimating the true value of a point on the graph. Finally, I scaled the graph so that both axes run from zero to one hundred, because I believe this scale provides the best zoomed-in view of the datapoints without hiding or misrepresenting any of the information shown.
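The features above (grid, gender colors, hover highlight, tooltip, and 0 to 100 axes) could be wired together roughly as follows. This is a sketch under several assumptions, not the original implementation: it assumes D3 version 6 or later (where event handlers receive `(event, datum)`), points shaped like `{ gender, reading, writing }`, reading on the x-axis and writing on the y-axis, and specific color names and sizes chosen only for illustration. The names `colorFor`, `tooltipText`, and `drawScatter` are hypothetical, and `d3` is passed in as a parameter so the two small helpers can be used on their own.

```javascript
// Assumed colors: the write-up only says blue (male), orange (female), red (hover).
const COLORS = { male: "steelblue", female: "orange" };
const HIGHLIGHT = "red";

// Fill color for a point, based on its gender field.
function colorFor(point) {
  return COLORS[point.gender] || "gray";
}

// Text shown in the tooltip for the hovered point.
function tooltipText(point) {
  return `${point.gender}: reading ${point.reading}, writing ${point.writing}`;
}

// Hypothetical drawing routine. `svg` and `tooltip` are D3 selections
// created by the caller (e.g. d3.select("svg") and d3.select("#tooltip")).
function drawScatter(d3, svg, tooltip, points, size = 500, margin = 40) {
  const inner = size - 2 * margin;
  // Both axes run from 0 to 100, as described in the write-up.
  const x = d3.scaleLinear().domain([0, 100]).range([0, inner]);
  const y = d3.scaleLinear().domain([0, 100]).range([inner, 0]);

  const g = svg.append("g").attr("transform", `translate(${margin},${margin})`);

  // Negative tick sizes stretch the tick marks across the plot, forming a grid.
  g.append("g")
    .attr("transform", `translate(0,${inner})`)
    .call(d3.axisBottom(x).tickSize(-inner));
  g.append("g").call(d3.axisLeft(y).tickSize(-inner));

  g.selectAll("circle")
    .data(points)
    .join("circle")
    .attr("cx", (d) => x(d.reading)) // assumption: reading on the x-axis
    .attr("cy", (d) => y(d.writing)) // assumption: writing on the y-axis
    .attr("r", 3)
    .attr("fill", colorFor)
    .on("mouseover", (event, d) => {
      d3.select(event.currentTarget).attr("fill", HIGHLIGHT);
      tooltip.text(tooltipText(d));
    })
    .on("mouseout", (event, d) => {
      // Restore the gender color and clear the tooltip.
      d3.select(event.currentTarget).attr("fill", colorFor(d));
      tooltip.text("");
    });
}
```

Keeping the color and tooltip logic in small pure functions means the hover handlers only need to swap a fill and set some text, and the original color is easy to restore when the mouse leaves the point.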
The main question I had about the dataset before creating my visualization was: “How do students’ reading and writing test scores compare across genders?” I was interested to see whether test scores differed across genders, and the visualization let me find out. If you take a look at the final visualization above, you can see that there is a slight difference between the two colors, blue and orange. The blue points seem to be scattered slightly higher than the orange points, which suggests that male test-takers have slightly higher test scores on average than female test-takers. It was also interesting to see that the female test-takers include both an outlier with the lowest overall reading and writing test scores and a student with a perfect score on both the reading and writing tests. You can view this information by hovering over the lowest orange datapoint (towards the bottom left corner of the graph) and the highest orange datapoint (towards the very top right of the graph). Though no conclusions should be drawn about the overall performance of either gender from this graph alone, it is interesting to see that the distribution of points does differ across genders.
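A visual impression like this can be checked against the numbers by averaging the scores for each gender. The following is a minimal sketch of that calculation, not something from the original project: the function name `meanScoresByGender` is hypothetical, and it assumes each point is an object with `gender`, `reading`, and `writing` fields.

```javascript
// Hypothetical helper: average reading and writing scores per gender.
// Expects points shaped like { gender, reading, writing }.
function meanScoresByGender(points) {
  const totals = {};
  for (const p of points) {
    if (!totals[p.gender]) {
      totals[p.gender] = { count: 0, reading: 0, writing: 0 };
    }
    const t = totals[p.gender];
    t.count += 1;
    t.reading += p.reading;
    t.writing += p.writing;
  }

  // Convert the running sums into means.
  const means = {};
  for (const [gender, t] of Object.entries(totals)) {
    means[gender] = {
      count: t.count,
      reading: t.reading / t.count,
      writing: t.writing / t.count,
    };
  }
  return means;
}
```

Comparing the returned means for each gender would show whether the difference seen in the scatterplot holds on average, which is hard to judge by eye when many points overlap.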
Link to YouTube video: https://youtu.be/53uEirISEdo