Text Variants and First Person Domain in Author Identification: Hermeneutic versus Computerized Methods

Omar A. S. Al-Shabab, Farida H. Baka


Since a language variety contains shared variants, and since a complete correlation between author and linguistic features is rarely achieved, it is suggested that linguistic features falling outside the correlational agreement in a variety belong to the author's First Person Domain (FPD). Advances in computerized vocabulary profiling and readability measurement provide a useful characterization of the features found in Academic English (AE), but they cannot capture the full range of linguistic features in a text. A corpus of about 38 extracts and texts (111,000 words) by local and international authors is analyzed to determine interpersonal and intrapersonal variation. The results show that language variation determines the features of the FPD, which are crucial for author identification, and that computational methods are not sensitive enough to ensure 100 percent accurate author identification. Therefore, an epistemological author identity profile (AIP) is proposed to plot disputed texts against the socio-physical and epistemological parameters of their alleged authors.
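The kind of computerized measures the abstract refers to can be illustrated with a minimal sketch. It assumes the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), a heuristic vowel-group syllable counter, and a plain type-token ratio as a stand-in for vocabulary profiling. The function `residue_types` is a hypothetical, deliberately crude proxy for "features outside the shared variety" (word types absent from a reference sample); it is not the authors' FPD procedure, and it only shows why surface counts alone miss much of what a hermeneutic reading captures.

```python
import re

VOWELS = "aeiouy"


def words(text):
    """Alphabetic word tokens, keeping simple contractions such as "author's"."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)


def sentences(text):
    """Naive sentence split on terminal punctuation (an assumption, not a parser)."""
    return [p for p in re.split(r"[.!?]+", text) if p.strip()]


def count_syllables(word):
    """Heuristic syllable count: vowel groups, minus a silent final "e".

    Real readability tools use dictionaries or richer rules; this is an approximation.
    """
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    # A final "e" is treated as silent ("make", "profile") unless it closes
    # a consonant + "le" ending ("table", "simple").
    if w.endswith("e") and count > 1:
        if not (w.endswith("le") and len(w) > 2 and w[-3] not in VOWELS):
            count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    ws = words(text)
    if not ws:
        raise ValueError("text contains no words")
    n_sentences = max(len(sentences(text)), 1)
    n_syllables = sum(count_syllables(w) for w in ws)
    return 206.835 - 1.015 * (len(ws) / n_sentences) - 84.6 * (n_syllables / len(ws))


def type_token_ratio(text):
    """Distinct word types divided by word tokens (case-insensitive)."""
    ws = [w.lower() for w in words(text)]
    return len(set(ws)) / len(ws) if ws else 0.0


def residue_types(author_text, variety_reference):
    """Hypothetical proxy: word types in author_text never seen in the reference sample."""
    reference = {w.lower() for w in words(variety_reference)}
    return sorted({w.lower() for w in words(author_text)} - reference)


if __name__ == "__main__":
    sample = "The cat sat on the mat."
    print(round(flesch_reading_ease(sample), 3))
    print(round(type_token_ratio(sample), 3))
    print(residue_types("The cat sat", "the dog sat"))
```

A usage note: type-token ratio is known to fall as text length grows, so comparisons between extracts of very different sizes (as in a mixed corpus of extracts and whole texts) would need length-controlled samples before any such figure could bear on authorship.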



Keywords: Vocabulary profile, Readability, Syntactic depth, Language variety, Author identification


DOI: http://dx.doi.org/10.7575/aiac.ijalel.v.4n.5p.170



This work is licensed under a Creative Commons Attribution 4.0 International License.

2012-2019 (CC-BY) Australian International Academic Centre PTY.LTD

International Journal of Applied Linguistics and English Literature
