Applying rhetorical analysis to processing technical documents

Rhetorical discourse analysis (RDA) emphasizes the communicative purpose and organizational structure of texts. This paper proposes a rhetorical organization model for business-based technical documents. Using RDA and genre analysis, we define such a model from functional and structural features, in terms of the macro-moves, moves, and functions that make up a type of document called a standard operating procedure (SOP). An SOP, commonly called a procedure manual, is a technical document that describes procedures for segments of business processes or for policies to be implemented inside organizations. We propose a functional-structural framework for analyzing SOP discourse, in which these features influence and restrict content according to the style of the 'manual' genre. We identify the SOP as a document with procedural information that serves as a relevant source for extracting domain knowledge and business information. We apply this model as an analytical-conceptual procedure for mapping business documents and generating a controlled language output.


Introduction
Discourse analysis is concerned with the relationship between the forms and functions of discourse; it is a vast field encompassing various interpretive and explanatory sub-disciplines. In discourse analysis, the creation of a specialized text is a distinct process: the text is generated from a specialized organizational discourse and produced by specialists who have mastered the cognitive and conceptual organization of the subject matter (Biber, 2006; Nickerson, 1999).
We approach the analysis of such specialized discourses from the genre perspective (Nickerson, 1999). Our approach to discourse analysis is based on the rhetorical analysis techniques proposed by Van Nus (1999) and Swales (1990; 2004).
Rhetorical analysis is concerned with discourse construction and emphasizes the communicative purpose of texts. We applied rhetorical discourse analysis (RDA) and genre analysis to define a preliminary approach to developing a rhetorical organization model with the functional and structural features that make up a business-based technical document called a 'standard operating procedure' (SOP).
Based on the structural move analysis and steps developed by Swales (1990), we define macro-moves and moves as the structural aspects of SOPs; they describe features of the text that concern its global organization and its sentence-level features, respectively.
The functional features of an SOP encompass the rhetorical purposes that express the communicative intentions of the document's authors. The structure and functions, therefore, influence and restrict the content and style of a text and enable the identification of its linguistic features (Askehave & Swales, 2001). The proposed model is currently being used as the core of a mapping framework for business-based technical documents that can generate a controlled language output. The remaining sections of this paper are organized as follows. Section 2 describes the theoretical framework and related work in the field of discourse and rhetorical analysis, and Section 3 presents our proposed rhetorical organization model. Finally, Section 4 presents our conclusions and outlines prospects for future work.

Theoretical framework and related work
This study focuses on the following perspectives of discourse analysis, which concern the use of language for constructing, interpreting, and exploiting technical documents.

Discourse analysis
Discourse analysis is one of several fields devoted to the study of the social uses of language, particularly the linguistic relationship between forms and functions (Gee, 2000). Based on a systematic methodology, discourse analysis is both interpretive and explanatory. The study of discourse is vast and encompasses several overlapping sub-disciplines, such as sociolinguistics and semiotics, genre studies, and specialized discourse analysis.
Specialized text analysis is a special field of discourse analysis (Nickerson, 1999; Biber, 2006). One approach to discourse analysis in specialized texts is genre analysis. According to Yates (1998) and Swales (1990; 2004), genres are defined as variations of a language that operate through the linguistic features present in a text. Genres are linguistically linked to specialized communicative purposes, participants, production and usage contexts, and modes of discourse organization, among other elements (Parodi, Ibañez, & Venegas, 2010).
Genre theory (Nickerson, 1999; Van Nus, 1999) focuses on the written practices of members of particular communities and/or the design of information and business records. Specialized texts are generated from a specialized organizational discourse and produced by subject-area experts who have mastered the conceptual and cognitive organization of a topic. According to Cabré (1999), specialized discourse derives from variables related to the subject and perspective of a topic and to the producer's intent and level of expertise. Following genre theory, we can analyze specialized discourses because they are written by members of specialized communities. A genre is described by its text structure and its context (Van Dijk, 2008). Genre analysis is an effort to relate the text structure to the macro-social context. Genres are often characterized by identifiable purposes and schematic structures, which can be as numerous as the social practices people are involved in. Genre studies have developed from Bakhtin's (1986) seminal work, the socio-rhetorical theoretical framework of Swales (1981; 1990), and the systemic functional linguistics approaches proposed by Christie (1999) and Eggins (1994). Building on these foundations and on the approach developed by Meurer (2002), we applied genre analysis based on the characterization of genres as reasonably stable types of text (formal or informal) that can be recognized by their rhetorical structure and function.

Rhetorical analysis
Rhetorical analysis is concerned with the construction of discourse, giving priority to the communicative purpose of each genre (Azaustre & Casas, 1997). Rhetoric approaches discourse from its intentional (purpose-driven) and instrumental (means of fulfilling the purpose) perspectives. Thus, rhetorical discourse organization is an approach in which textual structures are employed to achieve a desired effect (Connor, 1996). These structures provide a framework for articulating diverse discourses in a particular manner and for constituting their relations textually.
Within the framework of rhetorical analysis, genre analysis involves analyzing and describing a text in terms of rhetorical moves or rhetorical structures, which denote the functional parts or sections of a genre. The particular configuration of the text surface is defined by levels of text organization, known as rhetorical discourse organization. In this way, the structural units identified by genre analysis can be characterized as moves subdivided into steps: passages of text that are larger than the largest grammatical units (e.g., clauses and sentences) and that possess some unity grounded in a common function and meaning.

Corpus linguistics
Corpus linguistics analyzes the linguistic properties of an extended passage, text, or corpus of texts, and incorporates genre analysis and semiotic perspectives into the computational analysis of text corpora.
According to Parodi (2008), corpus linguistics encompasses a set of methodological principles for studying any language domain. Sets of linguistic features that characterize genres can be identified from a representative corpus.
Corpus-based approaches have been widely applied in fields such as discourse analysis, language teaching, and stylistic analysis (Kennedy, 2000). Biber, Conrad, and Reppen (1998) summarized the features of corpus-based studies as empirical approaches involving a sample of natural texts, computer-assisted analysis, and a combination of quantitative and qualitative analysis.
Working with a specialized language corpus involves a strict selection of texts for identifying common patterns (Sinclair, 1991). Certain words and phrases may be rare within a general sample of texts yet appear very frequently in a strictly selected set of specific texts. Thus, a corpus should be representative of one or more aspects of a language.
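The observation that some words are rare in a general sample yet frequent in a strictly selected corpus is what keyword (keyness) lists quantify. The sketch below is a minimal illustration, assuming the log-likelihood (G2) statistic that is widely used for keyness; the paper does not say which keyness measure its tools applied, and the function name and the counts in the usage lines are hypothetical.

```python
import math


def log_likelihood_keyness(freq_study: int, size_study: int,
                           freq_ref: int, size_ref: int) -> float:
    """Log-likelihood (G2) keyness of one word, comparing a specialized
    study corpus against a general reference corpus (sizes in tokens)."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected counts if the word had the same relative frequency in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical counts: a word seen 50 times in 1,000 SOP tokens
# but only 5 times in 1,000 general-English tokens.
print(round(log_likelihood_keyness(50, 1000, 5, 1000), 2))  # 42.74
```

A value of 0 means the word has the same relative frequency in both corpora; larger values mean a stronger difference. G2 alone does not give the direction, so the relative frequencies should be compared to tell over-use from under-use.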

Proposal of rhetorical organization model
Based on the rhetorical analysis approaches described above, we use structural and functional features to propose a preliminary rhetorical organization model (ROM). The ROM was defined through a methodological procedure divided into the following phases:

Corpus definition
Defining the corpus began with collecting candidate technical documents of the genre that circulate online and are available in full text. We broadly explored four types of technical documents, namely job description documents, functions manuals, corporate policy documents, and SOPs. After analyzing the collected documents, we selected the 'SOP' as the referent for this study. The population selection criteria for the corpus were that the documents are SOPs or SOP manuals; are written in English; are published online with open access on the Internet; have an author affiliated with a company and/or organization; and are text-based, with a low percentage of images.
We analyzed the documents following a corpus linguistics methodology (Simpson & Swales, 2001; Tognini-Bonelli, 2001) according to the approach developed by Parodi (2005). We performed a descriptive linguistic analysis of procedure-centered business-based technical documents and carried out the following processes: (i) searching for, reviewing, and analyzing documents available on the Internet; (ii) defining a hierarchy of the macro-genre based on genre analysis theory; and (iii) developing a qualitative, genre-based characterization of the 'manual' macro-genre and the 'procedures manual' genre (Swales, 2004).
The term 'manual' can be applied to academic manuals, instructional and teaching textbooks, and technical procedure manuals (Parodi, 2008). In this study, we use 'procedures manual' to refer to the latter category, as a basis for identifying several genre sub-types, as presented in Figure 1 and Figure 2.
Specifically, we define the SOP as a linguistic genre: an SOP is a constitutive document of a quality system that describes a set of recurring operations. An SOP describes procedures for a segment of business processes and for the effective implementation of a set of policies.
A manual is a set of written instructions describing how procedures are defined, developed, and managed by an organization's members. SOPs belong to the 'manual' genre, along with more typical manuals such as procedure, quality, and user manuals. We used the SOP as the main document type for our research because organizations use such technical documents to specify their technical, administrative, and operational activities. We identified the SOP as a type of procedural document that presents the most important or fundamental procedural information (Karreman & Steehouder, 2003) and serves as a source for extracting domain knowledge and business information.

Figure 1. Hierarchy of the 'manual' macro-genre.
Figure 2. Classification of the 'procedure manual' genre.

After establishing the criteria for the Internet search and corpus formation, we implemented a sampling procedure. The documents collected for the corpus constitute 100% of the population, from which we estimated a statistically significant sample size for proportions, with a 95% confidence level and an approximate value of 5% for the measured parameter. Assuming that the population was evenly distributed, we selected a sample of 32 documents, corresponding to 64% of the total population; this was the minimum percentage for a random sample, as calculated with the Z-test of proportions.
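The selection criteria above can be expressed as a single filter over candidate documents. A minimal sketch, assuming hypothetical metadata fields (the paper defines no schema) and an assumed 20% cut-off for "a low percentage of images", which the paper does not quantify; the class, function, and sample titles are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed threshold for "a low percentage of images"; the paper gives no value.
MAX_IMAGE_RATIO = 0.2


@dataclass
class CandidateDocument:
    """Hypothetical metadata recorded for each document found online."""
    title: str
    doc_type: str                      # e.g. "SOP" or "SOP manual"
    language: str                      # ISO 639-1 code, e.g. "en"
    open_access: bool                  # full text freely available online
    author_affiliation: Optional[str]  # company and/or organization, if any
    image_ratio: float                 # share of the document taken by images, 0.0-1.0


def meets_selection_criteria(doc: CandidateDocument) -> bool:
    """Apply the population selection criteria described for the SOP corpus."""
    return (
        doc.doc_type in {"SOP", "SOP manual"}
        and doc.language == "en"
        and doc.open_access
        and bool(doc.author_affiliation)
        and doc.image_ratio <= MAX_IMAGE_RATIO
    )


candidates = [
    CandidateDocument("Lab safety SOP", "SOP", "en", True, "Example Corp", 0.05),
    CandidateDocument("Onboarding slides", "SOP", "en", True, "Example Corp", 0.70),
]
print([d.title for d in candidates if meets_selection_criteria(d)])  # ['Lab safety SOP']
```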

Digital corpus analysis
The corpus analysis involved tokenization, keyword and stop-word identification, characterization, and the creation of word frequency lists, among other tasks. In addition, we conducted a concordance analysis by searching for the identified lexical forms and retrieving every occurrence of a given word together with its context. We thus generated a list of occurrences of the search terms in the corpus, each within the context in which it occurs. The corpus analysis considered 9,252 word types and 167,905 tokens.
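The corpus-analysis tasks above (tokenization, stop-word filtering, frequency lists, type and token counts, and concordance) can be sketched in plain Python. This is a minimal illustration, not the AntConc-based workflow used in the study; the regular-expression tokenizer, the stop-word subset, the function names, and the sample sentence are assumptions. Tools differ in how they count types (case folding, hyphenation), which affects totals such as the type count reported above.

```python
import re
from collections import Counter

# Illustrative stop-word subset; a real study would use a fuller list.
STOP_WORDS = {"a", "an", "and", "be", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Lowercased alphabetic tokens; hyphenated words are kept whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def type_token_counts(tokens: list[str]) -> tuple[int, int]:
    """Number of distinct word types and total tokens."""
    return len(set(tokens)), len(tokens)


def frequency_list(tokens: list[str]) -> list[tuple[str, int]]:
    """Word frequency list without stop words, most frequent first."""
    return Counter(t for t in tokens if t not in STOP_WORDS).most_common()


def concordance(tokens: list[str], keyword: str, width: int = 4) -> list[str]:
    """Key-word-in-context (KWIC) lines: `width` tokens either side of each hit."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}")
    return lines


text = "The operator shall verify the label. The supervisor shall sign the form."
tokens = tokenize(text)
print(type_token_counts(tokens))  # (8, 12)
print(frequency_list(tokens)[0])  # ('shall', 2)
for line in concordance(tokens, "shall", width=2):
    print(line)
```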

ROM design
The first step in discourse mapping was to search the relevant literature for theoretical models to be used as a reference for the rhetorical analysis of the SOP genre.
Some antecedents in the administrative field include previous attempts to provide guidelines on how to create well-written SOPs, including how to organize or format such documents (Stup, 2001;Grusenmeyer, 2003; North Carolina State University [NCSU], 2014).
These approaches focus on producing understandable instructions that enable SOP writers to write clearly and simply. Additionally, authors such as Wieringa, Moore, and Barnes (1998) and Price (2001) proposed including a discussion of procedure, grammar, and writing along with principles and practices.
We identified approaches concerned with government and business documents (Trosborg, 2000; Renkema, 2003; McCarthy & Handford, 2004; Warren, 2004), as well as commercial documents (Freedman & Medway, 1994; Yates, 1998; Jameson, 2008); however, we found no references to genres related to SOPs. For this reason, we used an inductive method to define the preliminary model, following Burdiles (2016). The method comprises the following steps: (i) random selection of four sample documents from the corpus; (ii) incremental construction of a preliminary model through manual review of the structure and superstructure, to identify the organization units common to the sample (moves); (iii) definition of rhetorical moves as the functional sections of the genre (Swales, 1990; 2004), for which we adopted the macro-move concept, which entails a higher abstraction of rhetorical purpose (Parodi, 2008), so that each macro-move serves a communicative purpose and all macro-moves together shape the overall organization of the text; and (iv) identification of a higher-level hierarchy of macro-purposes, each comprising a set of more specific moves and detailed steps.
We defined our preliminary model as a set of functional and structural features resulting from the identification of highly recurrent moves (the mandatory level). The mandatory moves were selected using a set of defined categories for evaluating every move in each document of the sub-corpus. These categories were based on the percentage of move occurrence: 0% (does not appear in the document), 1-30% (low chance), 31-70% (average chance), and 71-100% (mandatory). In the reference model, we considered as moves those with high occurrence percentages. As shown in Figure 3, the resulting model comprises three macro-moves containing 19 moves, which represent more specific functional units and are described as follows:

Figure 3. Preliminary ROM.
Macro-move I: Presenting the SOP. This macro-move entails the presentation of a preliminary statement that introduces the document and defines its purpose, conventions, revision schedule, approval authority, and organization, among other elements.
• Move 1: Identifying the SOP. This move identifies the organization authoring the SOP, including the author(s), company, location, affiliation, name, and verbal/nonverbal identification.
• Move 2: Organizing the SOP. This move includes elements of the document's body related to content organization, such as lists of tables and figures. It allows the reader to locate the document content and presents the entire hierarchical organization of the document.
• Move 3: Introduction. Justifies and presents the document. This move provides a general description of the document's context and establishes its purpose.
• Move 4: Presenting foreword. Presents a general overview of the document and describes what is included in each procedure. Additionally, it can describe who participated in writing the SOP, how it was organized, how to read it, the review process that was undertaken, and warnings about its use/distribution.
• Move 5: Documenting conventions. Establishes the document's context, namely the date of its approval, version number, author, and revision number.
• Move 6: Appointing regulations or regulatory requirements. Reviews the standards, contractual requirements, and policies or regulations associated with the procedures in the SOP.
• Move 7: Giving acknowledgments. Presents the compendium of writers and others involved in authoring the SOP and acknowledges their contributions.
• Move 8: Defining intended audience and reading suggestions. Defines the primary audience for the SOP, which can include management teams, operational teams, and organization staff.
• Move 9: Establishing purpose. Describes the general goal of the procedures within the organizational framework. This goal is oriented toward contextualization and purpose description.
Macro-move II: Developing procedures. This macro-move provides a detailed description of the procedures associated with each organizational process, thus defining a series of specific purposes, responsibilities, functions, procedural descriptions, and rules for implementation.
• Move 10: Defining procedure purpose. Defines the purpose of each procedure.
• Move 11: Defining roles and responsibilities. Defines the roles and responsibilities involved in each procedure.
• Move 12: Identifying prerequisites. Identifies the prerequisite steps for executing the procedure, and may include rules, cautions, warnings, and recommendations for completing them.
• Move 13: Listing definitions. Includes a list of definitions, concepts, and terms or acronyms used in the context of the SOP.
• Move 14: Listing resources. Lists the equipment, resources, and materials required for executing the procedure.
• Move 15: Establishing methods. Establishes the methods used to characterize or guide the procedure.
• Move 16: Specifying procedure. Provides step-by-step instructions that elucidate the details of the procedure.
• Move 17: Including references. Lists bibliographical references that support the procedures.
Macro-move III: Ending the SOP. This is an optional macro-move related to macro-moves I and II.
• Move 18: Adding supplementary information. This includes attachments supporting the development of the macro-moves.
• Move 19: Including references. Lists a set of bibliographical references.
Macro-move II, 'Developing procedures', forms the backbone of the SOP genre because it serves as a unit that can be repeated as many times as needed to cover all of the necessary procedures. In contrast, macro-move I occurs only once in the text. Macro-move III is optional, but often complements macro-move II, depending on the information needed to support the procedures.
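The distribution of macro-moves described above (macro-move I once, macro-move II repeated once per procedure, macro-move III optional) can be expressed as a simple structural check. A minimal sketch, assuming that the macro-moves appear in the order I, II, III and that a document has already been segmented into labeled macro-move blocks; `ROM`, `macro_move_of`, and `conforms_to_rom` are hypothetical names, and the move numbering follows the list above.

```python
# Move numbers per macro-move in the preliminary ROM.
ROM = {
    "I": range(1, 10),     # Presenting the SOP: moves 1-9
    "II": range(10, 18),   # Developing procedures: moves 10-17
    "III": range(18, 20),  # Ending the SOP: moves 18-19
}


def macro_move_of(move: int) -> str:
    """Return the macro-move ("I", "II" or "III") that contains a move number."""
    for macro, moves in ROM.items():
        if move in moves:
            return macro
    raise ValueError(f"move {move} is not part of the ROM")


def conforms_to_rom(segments: list[str]) -> bool:
    """Check a document's sequence of macro-move segments: one opening I,
    one or more II blocks (one per procedure), then an optional closing III."""
    if not segments or segments[0] != "I":
        return False
    body = segments[1:]
    if body and body[-1] == "III":
        body = body[:-1]  # the closing macro-move III is optional
    return len(body) >= 1 and all(segment == "II" for segment in body)


print(macro_move_of(16))                          # II
print(conforms_to_rom(["I", "II", "II", "III"]))  # True
print(conforms_to_rom(["I", "III"]))              # False: no procedure block
```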
Following the construction of the preliminary model, we solicited detailed and systematic peer review analyses and generated a reference model based on the feedback received from the experts.
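The occurrence bands used to select the mandatory moves of the preliminary model (0%, 1-30%, 31-70%, 71-100%) can be applied mechanically once each sub-corpus document has been annotated with the moves it contains. A minimal sketch, assuming that a move's percentage is the share of sub-corpus documents in which it appears (the paper does not spell out the denominator) and that non-integer percentages falling between two bands go to the higher band; the move IDs and annotations are hypothetical, not the study's data.

```python
def occurrence_category(percentage: float) -> str:
    """Map a move's occurrence percentage to the four categories."""
    if percentage == 0:
        return "does not appear"
    if percentage <= 30:    # 1-30%
        return "low chance"
    if percentage <= 70:    # 31-70%
        return "average chance"
    return "mandatory"      # 71-100%


def classify_moves(annotations: dict[str, set[str]],
                   move_ids: list[str]) -> dict[str, str]:
    """annotations maps each document ID to the set of move IDs found in it;
    a move's percentage is the share of documents in which it occurs."""
    n_docs = len(annotations)
    categories = {}
    for move in move_ids:
        hits = sum(move in found for found in annotations.values())
        categories[move] = occurrence_category(100 * hits / n_docs)
    return categories


# Toy annotations for a four-document sub-corpus (not the study's data).
annotations = {
    "doc1": {"M1", "M2"},
    "doc2": {"M1"},
    "doc3": {"M1", "M2"},
    "doc4": {"M1", "M3"},
}
categories = classify_moves(annotations, ["M1", "M2", "M3", "M4"])
print(categories)
print([m for m, c in categories.items() if c == "mandatory"])  # ['M1']
```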

Rhetorical analysis
The rhetorical analysis was based on the reference model and comprised a set of manual activities as well as activities supported by computational tools such as AntConc 3.3.5w® [1], TermoStatWeb [2], ElGrial [3], and NLTK-Demo [4]. The activities proceeded as follows:
i. Analysis and identification of rhetorical units. Rhetorical units are the organization units, namely macro-moves, moves, and steps. This activity focused on identifying and registering the identifier, summary, SOP category, macro-move identifier and name, move identifier and name, identified linguistic features, and an example of each unit. Additionally, we developed a process for identifying the lexical and grammatical traces of each rhetorical unit.
ii. Definition of mandatory rhetorical units. Each rhetorical unit from the SOP was considered optional or mandatory based on its relationship with the purpose of the macro-structure and the author's communicative intention; a recurrent presence indicated that the move was mandatory.
iii. Corpus preprocessing. The collected texts were converted into .txt format through a series of transformation steps, and information not relevant for text processing, such as HTML tags, webpage names, and ads, was manually removed using tools such as HTML Text and Multireplacer3.
iv. Identification of linguistic features by rhetorical unit. We identified several variables, including word association based on mutual information, parameters of probable co-occurrence among words, recurrent lexical items in given stretches of text, keywords, and word frequency, among others. We further used computational tools to identify prototypical lexical-grammatical features in the corpus sample (Table 1). According to Biber, Conrad, and Cortes (2004) and Venegas (2010), some of these features are essential to current writing in specialized organizational fields.
v. Morpho-syntactic tagging. This analysis aimed to identify the morpho-syntactic categories and features surrounding each rhetorical unit. We described the features of these categories by feature object, i.e., noun, adjective, syntactic clause, verb, subordination, coordination, verbal mood, verbal periphrasis, and person. The feature description is presented together with the corresponding move in Table 1, and the functional features in Table 2.

Table 1. Prototypical lexical-grammatical features.
Notes: In English, the verb forms concerned are mostly expressed in the present or the past tense; the subjunctive mood has different usages and frequencies. *See the complete proposal in Manrique et al. (2013).
vi. Identification of occurrence frequencies for relevant features.We used computational tools related to corpus linguistics to analyze relevant features and identify their frequency of occurrence.
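Steps iii and iv above can be approximated with Python's standard library in place of the tools used in the study (HTML Text, Multireplacer, AntConc). A minimal sketch, under these assumptions: `html_to_text` removes only markup, whereas webpage names and ads were removed manually in the study; `bigram_pmi` uses the common simplified pointwise mutual information formula, which may differ from the exact measure computed by the study's tools; all names, the sample page, and the `min_count` threshold are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip markup and collapse whitespace (step iii, tags only)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def bigram_pmi(tokens: list[str], min_count: int = 2) -> dict[tuple[str, str], float]:
    """Pointwise mutual information, log2(f(x,y) * N / (f(x) * f(y))),
    for adjacent word pairs seen at least `min_count` times (step iv)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    return {
        pair: math.log2(count * n / (unigrams[pair[0]] * unigrams[pair[1]]))
        for pair, count in bigrams.items()
        if count >= min_count
    }


page = "<html><style>p {color: red}</style><p>Wear safety gloves and wear safety glasses.</p></html>"
text = html_to_text(page)
tokens = re.findall(r"[a-z]+", text.lower())
print(text)  # Wear safety gloves and wear safety glasses.
print(bigram_pmi(tokens))
```

High PMI flags word pairs that co-occur more often than their individual frequencies predict; because PMI overrates rare pairs, a minimum count such as `min_count` is usually applied.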

Evaluation of structural and functional features
We evaluated the structural and functional features to determine the rhetorical moves that are fundamental for SOP writing. The evaluation comprised the following activities: a. Selecting experts to evaluate the reference model (comprising the 19 moves presented in Figure 3); b. Designing an evaluation template based on the reference model, covering the rhetorical unit of reference, an example extracted from an SOP, the evaluation, a specification of the move's optionality, and a section for comments; c. Designing an instruction guide for filling in the evaluation template; d. Sending the evaluation request to the experts; e. Analyzing and filtering the completed forms. We received the completed evaluation templates from the experts and performed an inter-rater reliability analysis based on their comments, the evaluated parameters, and the valuation of each parameter. For the subsequent analysis, we generated a new version of the model that comprised only the moves the experts considered mandatory. The inclusion criterion retained only the moves that received three or four positive responses, and moves were adjusted and/or changed where the responses disagreed or were unsatisfactory. Table 3 shows the evaluation results of our ROM proposal. The final ROM proposal includes only 15 of the 19 moves in the reference model.
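The inclusion rule and the agreement analysis described above can be sketched as follows. The paper does not name its inter-rater reliability statistic; this sketch assumes Fleiss' kappa, a common choice when more than two raters assign categorical ratings, and assumes four experts, which the "three or four positive responses" criterion suggests. The function names and the votes are hypothetical, not the results in Table 3.

```python
def retained_moves(votes: dict[str, list[bool]], min_positive: int = 3) -> list[str]:
    """Moves kept when at least `min_positive` experts rated them positively."""
    return [move for move, ratings in votes.items() if sum(ratings) >= min_positive]


def fleiss_kappa(table: list[list[int]]) -> float:
    """Fleiss' kappa. table[i][j] is the number of raters who put item i in
    category j; every item must be rated by the same number of raters."""
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("need at least 2 raters and the same number per item")
    n_categories = len(table[0])
    # Overall share of ratings falling in each category.
    p_j = [sum(row[j] for row in table) / (n_items * n_raters)
           for j in range(n_categories)]
    # Observed agreement among rater pairs for each item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in table]
    p_bar = sum(p_i) / n_items
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    if p_e == 1:
        return float("nan")  # every rating in one category: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


# Toy positive/negative votes from four experts (not the study's results).
votes = {
    "M1": [True, True, True, True],
    "M2": [True, True, True, False],
    "M3": [True, False, False, True],
    "M4": [False, False, False, False],
}
print(retained_moves(votes))  # ['M1', 'M2']
table = [[sum(v), len(v) - sum(v)] for v in votes.values()]  # [positive, negative]
print(round(fleiss_kappa(table), 3))  # 0.407
```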

ROM approach.
Conclusions and future work is study proposed a first approach to a ROM in terms of functional and structural features comprised in SOP.In defining such a model, we used the methods of RDA and genre analysis.
We identified the discourse analysis approach that is equally efficient as other techniques applied for the analysis of documents, such as data mining, knowledge engineering methods, and formal methods.Discourse analysis is being used as the core of a mapping framework of business-based technical documents, which can generate a controlled language output based on the proposed model.
Our proposal has a distinct method of preliminary document analysis that considers each kind of businessbased technical document as a genre.Based on this consideration, we aim to identify the functional, structural, and subsequently linguistic patterns, which are morphological, lexical, syntactic, and semantic, that is closely aligned with the writer's communicative purpose.Our approach toward mapping the information and knowledge generated from the document analysis state that the original function of the text will reflect the communicative intention of the author.
We paid particular attention to distinctions among discourse analysis methods and natural language processing techniques (NLP) for business-based technical documents.Although scientific communities, which have approached the processing of such documents, generally work in isolation, NLP applications combine techniques from various approaches.Our experience in this field enables us to believe it is possible to use one mixed approach that incorporates discourse analysis methods and theories with disciplines such as requirements elicitation, knowledge engineering, and soware engineering.
We are currently in the process of developing proposals of heuristic rules for transforming the specific features of moves into a controlled language.In addition, a number of other problems must be addressed to further the development of a ROM.Such problems suggest a variety of research directions as follows: • Discourse processing in the organizational adaptation of information systems that specifically emphasizes the role of professional discursive practices in shaping the process of organizational adaptation of information systems.
• Rhetorical characterization of management discourses.Using our proposal of rhetorical analysis for business-based technical documents, we consider a challenging application for processing the management of written discourses.