The Comparative Power of Type/Token and Hapax legomena/Type Ratios: A Corpus-based Study of Authorial Differentiation

Sundus Muhsin Ali, Khalid Shakir Hussein


This paper presents an attempt to verify the comparative power of two statistical features: the Type/Token and Hapax legomena/Token ratios (henceforth TTR and HTR). A corpus of ten novels is compiled, and sixteen samples, each 5,000 tokens long, are drawn at random from these novels as representative blocks. The researchers examine how TTR and HTR behave in discriminating between four novelists: Joyce, Woolf, Faulkner and Hemingway. Compared with traditional statistical features (e.g. average word length and average sentence length), TTR and HTR prove far more effective at capturing the distinctive quantitative behavior of each novelist. TTR and HTR each contribute, to varying degrees, to a statistical identity for each writer, which can be used to compare and discriminate clearly among the four novelists examined in this paper. Nevertheless, HTR appears more effective than TTR at the discriminating task.
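
The two ratios are simple to compute once a sample has been tokenized. The sketch below is an illustration, not the authors' procedure: the tokenizer (lowercased alphabetic words, with internal straight apostrophes kept), the choice of contiguous blocks drawn at a random offset, and the helper names `tokenize`, `ttr`, `htr` and `random_block` are all assumptions. HTR follows the abstract's definition (hapax legomena divided by tokens). The title's "Hapax legomena/Type" variant would divide by the number of types instead.

```python
import random
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic word tokens, keeping internal
    # straight apostrophes ("don't" stays one token); punctuation is dropped.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def ttr(tokens):
    # Type/Token ratio: number of distinct word forms / total tokens.
    if not tokens:
        raise ValueError("empty sample")
    return len(set(tokens)) / len(tokens)


def htr(tokens):
    # Hapax legomena/Token ratio: words occurring exactly once / total tokens.
    if not tokens:
        raise ValueError("empty sample")
    counts = Counter(tokens)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(tokens)


def random_block(tokens, size=5000, rng=None):
    # Assumption: a "representative block" is a contiguous run of `size`
    # tokens starting at a randomly chosen offset in the novel.
    rng = rng if rng is not None else random.Random()
    if len(tokens) < size:
        raise ValueError("text is shorter than the block size")
    start = rng.randrange(len(tokens) - size + 1)
    return tokens[start:start + size]


if __name__ == "__main__":
    sample = tokenize("The cat sat on the mat.")
    print(ttr(sample))  # 5 types / 6 tokens
    print(htr(sample))  # 4 hapaxes (cat, sat, on, mat) / 6 tokens
```

Keeping every sample at the same length (here 5,000 tokens) matters because both ratios tend to fall as a text grows longer, so only equal-sized blocks yield comparable values.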

This work is licensed under a Creative Commons Attribution 4.0 International License.

2010-2025 (CC-BY) Australian International Academic Centre PTY.LTD.

Advances in Language and Literary Studies