Chapter 9 - Statistical Modelling of Syntactic Complexity of English Academic Texts Using Ensemble Machine Learning

Syntactic Predictors of Rhetorical Sections

Published online by Cambridge University Press:  03 December 2025

Mikko Laitinen, University of Eastern Finland
Paula Rautionaho, University of Eastern Finland

Summary

This computational modelling study investigates whether the rhetorical sections of postgraduate English academic texts, treated as subgenres, can be characterised by distinct types and amounts of syntactic structures. A corpus of dissertations written by students from different English language backgrounds and academic contexts was processed with a range of Natural Language Processing (NLP) methods. Applying an analytical method that is novel for linguistic data, the study identifies strong syntactic predictors of genre through the robust statistical modelling of ensemble learning. In this method, four machine learning predictive classifiers (Random Forest, K-Nearest Neighbors, a deep learning artificial neural network, and Gradient Boosting) form the stacked layer, and the Naive Bayes method serves as the meta-learner. The discussion of findings examines how far the rhetorical sections of MA dissertations vary in the type and distribution of coordination, subordination, and phrasal complexity, as well as in the length of syntactic structures.
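The stacking arrangement described in the summary can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes scikit-learn, uses `MLPClassifier` as a small stand-in for the deep learning network, and replaces the dissertation corpus with synthetic data in which ten numeric features play the role of syntactic complexity measures and four classes play the role of rhetorical sections. All feature counts, hyperparameters, and variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption): 400 "texts", 10 numeric features,
# 4 classes standing in for rhetorical sections.
X, y = make_classification(
    n_samples=400,
    n_features=10,
    n_informative=6,
    n_redundant=2,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stacked layer: the four base classifiers named in the summary.
# KNN and the neural network are scale-sensitive, so they get a scaler.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    (
        "mlp",
        make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
        ),
    ),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# Meta-learner: Naive Bayes, trained on the cross-validated class
# probabilities produced by the base classifiers.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=GaussianNB(),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The `cv=5` setting means the meta-learner only sees base-classifier predictions made on folds those classifiers were not fitted on, which limits leakage from the stacked layer into the final Naive Bayes model.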

Information

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2025
