ASAWEC: towards a corpus of Arab scholars’ academic written English
Linguistic corpora have been widely used in recent years. Various types of linguistic analysis of both spoken and written discourse are now conducted using the corpus linguistics approach. Among these, academic writing has received considerable attention. Corpus linguistics has provided insights into the academic writing of both native and non-native English language learners and writers in general. Nevertheless, relatively few studies have investigated this topic in the Arab EFL setting. Consequently, there is a relative paucity of corpora of academic written English by Arab writers. To address this gap, we compiled the Arab Scholars’ Academic Written English Corpus (ASAWEC), a specialized corpus of Arab scholars’ academic written English. We collected the corpus texts according to specific criteria and then normalized and cleaned the data. The texts were then tokenized and tagged, and the corpus underwent initial tests, which yielded insightful findings on Arab scholars’ academic written English, such as low lexical diversity and the use of various discourse techniques. The present paper introduces the corpus, details its compilation, presents initial results and statistics, and discusses potential limitations and future perspectives for updating the corpus. It is envisaged that this project will encourage the use of the ASAWEC and help launch similar initiatives to advance research in Arab corpus linguistics.
Sanosi, A. B., Mohammed, A. S. E. (2024). ASAWEC: towards a corpus of Arab scholars’ academic written English, Research Result. Theoretical and Applied Linguistics, 10 (3), 116–134.