Sachin scores a century, India loses the match.

A statistical view of Indian batting

Motivation

India loses the match when Sachin scores a Century.

- Sachin haters

I am one of those people whose blood boils when I hear the above sentence. I always wanted to prove those people wrong, but how many mouths can you seal? I have argued with many of them and was overjoyed to prove them wrong. One of the anecdotes I cherish the most is written up in my Quora answer https://www.quora.com/What-is-your-favorite-memory-related-to-Sachin-Tendulkar/answer/Sameer-Darekar. This is my small effort towards the cause.

Findings

Data was gathered from the Cricinfo Statsguru website. I used Python with BeautifulSoup and Requests for web scraping, Seaborn for visualization and Pandas for the data-intensive operations. I also used the Scikit-Learn library for machine learning, which I use at the end for linear regression. The data covers the period from 1st January 1980 to 31st December 2015. The whole code is available in the IPython Notebook.
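
To give a feel for the scraping step, here is a minimal sketch of turning an HTML results table into a list of records. It is not the notebook's code: it uses the standard library's html.parser instead of BeautifulSoup and Requests so that it runs without any installs, and the markup, column names and rows in SAMPLE_HTML are made up for illustration. The real Statsguru pages will be laid out differently.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up markup standing in for a downloaded page (assumption, not real data).
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Runs</th><th>Result</th></tr>
  <tr><td>Player A</td><td>102</td><td>won</td></tr>
  <tr><td>Player B</td><td>7</td><td>lost</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)

# First row is the header; the rest become one dict per innings.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

Every value comes out as a string, so the runs column still has to be converted to integers before any arithmetic.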

First, let's look at the top run scorers for India in the ODI format. As expected, the Master is ahead of the other players by a wide margin. Virat Kohli, the current batting sensation, also features in this list. Considering how little cricket he has played compared to the others, that augurs well for the current generation of Indian cricket.

Now let's go one step further and look at the contribution of batsmen in wins and losses. There is some shuffling, but the top contributor remains the same, and by a distinct margin, which shows how he was a class apart from the others. We also see consistent contributors in Ganguly and Dravid, whose contribution cannot be ignored. From the graphs it is also clear that the contribution of batsmen like Yuvraj Singh, Virender Sehwag and Virat Kohli matters in wins, maybe because of their destructive batting styles. Considering total runs alone would be unfair to the players who didn't play much, so I set a threshold of at least 1500 runs in wins and losses separately and plotted the top few percentage contributions in wins and losses.
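
The thresholding step can be sketched roughly as below. This is an illustration, not the notebook's code. I am assuming here that "percentage contribution" means a player's share of all the runs in the data for that result, and the function name, row layout and numbers are made up.

```python
from collections import defaultdict

MIN_RUNS = 1500  # threshold from the post, applied to wins and losses separately


def contribution_by_result(rows, min_runs=MIN_RUNS):
    """Return {result: {player: percent of runs}} for players with >= min_runs.

    Assumption: the percentage is the player's share of all runs recorded
    for that result ("won" or "lost") in the given rows.
    """
    runs = defaultdict(lambda: defaultdict(int))  # result -> player -> runs
    for player, scored, result in rows:
        runs[result][player] += scored

    shares = {}
    for result, per_player in runs.items():
        total = sum(per_player.values())
        shares[result] = {
            player: 100.0 * player_runs / total
            for player, player_runs in per_player.items()
            if player_runs >= min_runs  # drop players below the threshold
        }
    return shares


# Made-up aggregated rows: (player, runs, match result). Not real data.
rows = [
    ("Player A", 900, "won"), ("Player A", 700, "won"), ("Player A", 1600, "lost"),
    ("Player B", 400, "won"), ("Player B", 400, "lost"),
]
shares = contribution_by_result(rows)
print(shares)
```

Player B falls below 1500 runs in both categories and is left out, but his runs still count towards the totals that the percentages are taken from.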

Here too we see that Sachin's contribution in wins and losses is significant. No matter the conditions, this guy has scored runs. In wins, Shikhar Dhawan has the highest percentage of runs. Considering the meagre number of matches he has played, he has performed well in wins, but in losses he is nowhere. Then comes the current batting gem and apple of everyone's eye, Virat, and then the master of them all. Now people might argue that Sachin has the highest percentage of runs in defeats. Oh yes, he does, because if you go back around 10 years he was the lone backbone of the Indian batting, and the team used to suffer when he failed. I also remember a few matches where Sachin was the lone scorer. Sometimes he won the match single-handedly, sometimes he couldn't. If you are a fan, this will remind you of the epic match where he earned the moniker "The Desert Storm". Let's see the scorecard for that match.

India needed 284 runs to win and 254 to qualify for the final. Due to the storm the match was reduced by 4 overs, which revised the two targets to 276 and 237 respectively. Sachin scored a stupendous 143, smashing the bowlers all around the park (remember Shane Warne and his nightmares? They were due to this series), and was out in the 42nd over with the score on 242 (he was not given out, but he walked anyway). The others managed only 8 runs off the remaining balls, resulting in a loss for India. Sachin had sailed the Indian boat to the final, thereby completing his task, and still people criticize this great man. How much can one person do! There are many such incidents. Another epic match is the Chennai Test vs Pakistan, where India needed 16 runs with 3 wickets remaining after Sachin was out, and they lost by 12 runs. If the haters still don't agree, I can't help.

Coming to the final part: when Sachin scores a century, India loses the match. Really? Let's see.

Here I have plotted the centuries in wins and losses. Sachin has 33 centuries in wins out of 49 centuries (note: I have not plotted centuries in tied and no-result matches; Sachin has 1 in each category). So how can one say India loses when he scores a century?
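
The arithmetic behind that claim is small enough to check by hand. The four input numbers are the ones quoted above; the number of centuries in losses is derived from them.

```python
# All four inputs are the figures quoted in the post.
centuries_total = 49
centuries_in_wins = 33
centuries_tied = 1
centuries_no_result = 1

# Whatever is left after wins, ties and no-results must be losses.
centuries_in_losses = (
    centuries_total - centuries_in_wins - centuries_tied - centuries_no_result
)

win_share = 100 * centuries_in_wins / centuries_total
loss_share = 100 * centuries_in_losses / centuries_total

print(f"Centuries in losses: {centuries_in_losses}")
print(f"Share in wins: {win_share:.1f}%, share in losses: {loss_share:.1f}%")
```

So roughly two out of every three of his ODI centuries came in wins.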

Statistical comparison of batsmen

Let's compare the high-profile Indian batsmen statistically by plotting the Kernel Density Estimate (KDE) of their scores, with the score on the x-axis and the density on the y-axis.
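
For readers who have not met a KDE before, here is a bare-bones sketch of what it computes: one Gaussian bump is placed on every score and the bumps are averaged. The plots in this post were made with Seaborn, so this plain-Python version is only an illustration. The bandwidth rule (1.06 x standard deviation x n^(-1/5), a common rule of thumb) is my assumption and may differ from Seaborn's default, and the scores are made up.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density, using Gaussian kernels."""
    n = len(samples)
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumption, not Seaborn's exact default).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of one Gaussian bump centred on each observed score.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up innings scores, right skewed like a typical batting record.
scores = [0, 4, 12, 18, 25, 33, 47, 52, 71, 103]
density = gaussian_kde(scores)

for x in (0, 50, 100):
    print(x, round(density(x), 5))
```

One side effect worth knowing: the Gaussian bumps spill below zero, so the curve shows some density for negative scores even though no batsman can score them.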

As expected, the distribution is right-skewed, since every batsman has more scores below 50 than above. There are a few outliers for Sachin and Sehwag above 200 due to their high scores. Kohli's curve has a lower density than the others for scores below 50 but a higher density for scores above approximately 80, which indicates his good conversion rate to centuries. Dhoni has a higher density of scores around 50, reflecting the cameo roles he plays down the order. Sachin has the second-highest conversion rate here after Kohli, which is understandable considering the astronomical number of ODIs he played.

While doing this analysis I got a few hints of Virat Kohli's above-par performance in some aspects, so I compared Sachin's statistics with Kohli's by plotting their individual KDEs overlaid on their respective histograms.

The histogram shows a higher density for Kohli at scores around the golden duck. Comparing the statistics, he has a total of 10 golden ducks (up to 31st Dec 2015) and Sachin has 20 in his whole ODI career. Since Kohli has played far fewer ODIs than Tendulkar, his density there is higher. Otherwise the densities are more or less equivalent, and we don't see much variation in their score distributions. We can conclude that the two score distributions are currently similar, with minor variations. As time progresses we might see these two curves diverge, depending on Kohli's future performances.

Let's go ahead and check the career progression of both, plotting the cumulative runs they scored against the number of matches played. Here we see that Kohli is scoring at a higher rate than Sachin, but that is understandable considering the effect of T20 cricket on ODIs in recent years. Sachin faced a few bad patches, which show up clearly as hiccups in the red worm, but Kohli is yet to face one, which indicates his sublime form so far. I hope he continues the same way for the bright future of Indian cricket.
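
The "worm" is just a running total of scores in match order. A minimal sketch, with a hypothetical helper name and made-up scores:

```python
from itertools import accumulate


def career_progression(scores):
    """Return [(match_number, cumulative_runs), ...] for scores in match order."""
    return list(enumerate(accumulate(scores), start=1))


# Made-up scores; the three low scores in the middle mimic a bad patch.
scores = [45, 80, 3, 0, 5, 112]
for match_number, total in career_progression(scores):
    print(match_number, total)
```

A bad patch shows up as a stretch where the running total barely moves, which is the flat "hiccup" in the curve.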

Seeing this, I was curious about the number of matches Virat Kohli will take to break Sachin's record, so I created a linear regression model with his past matches as the training data. Before I disclose the number, note that it depends completely on how he performs in the future; he may take fewer matches or more. Linear regression assumes that the same scoring rate continues in future, and the estimate it gave me was 433.02. So if Virat is fortunate enough to play 433 matches, and he keeps batting the way he has in the past, he will break Sachin Tendulkar's record for the most runs in ODIs.
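
The extrapolation works roughly like this: fit a straight line to cumulative runs against matches played, then solve for the match number at which the line reaches the record. The sketch below uses a hand-written least-squares fit rather than the Scikit-Learn model from the notebook, and the training data is a made-up straight line, so it will not reproduce 433.02. The record of 18,426 ODI runs is not stated in this post; I have added it here as the target.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def matches_to_reach(target_runs, matches, cumulative_runs):
    """Match number at which the fitted line reaches target_runs."""
    slope, intercept = fit_line(matches, cumulative_runs)
    return (target_runs - intercept) / slope


SACHIN_ODI_RUNS = 18426  # Sachin's ODI career total (not stated in the post)

# Made-up training data: exactly 40 runs per match plus a small offset.
matches = list(range(1, 11))
cumulative_runs = [40 * m + 10 for m in matches]

needed = matches_to_reach(SACHIN_ODI_RUNS, matches, cumulative_runs)
print(f"Estimated matches needed: {needed:.2f}")
```

The answer is only as good as the assumption that the scoring rate never changes, which is why a single dip or surge in form can move the estimate by dozens of matches.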

My Opinion

Frankly, I feel statistics can't be used to predict the future in sports, as there are many dynamics affecting the game, but they can be used to analyze past data with much more confidence. People may argue about Sachin's greatness. He may not be great according to some people, but for me he is the greatest; my generation has grown up watching him play. He is an epitome of success, hard work, simplicity and willpower. He is gracious in defeat and humble in victory. He has taught people to dream and to chase those dreams.

Don't stop chasing your dreams,

because dreams do come true.

- Sachin Ramesh Tendulkar