Semantic core identification as a method to overcome textoidness
We believe that any neural machine translation output intended to function as a text always has enough potential for a semantic core (i.e., a communicative center with text-forming properties) to be identified and verbalized.
The relevance of this article rests on two factors. On the one hand, machine translation software is widespread, easily available, and actively used. On the other hand, machine translation output has to be post-edited to reach the quality of a communicative text, because its intra-textual connections are systematically disrupted. In its raw, unedited form, the output is a set of separate sentences, in other words a ‘textoid’ that an editor must fix before it can function as a coherent text. Frequent cross-checking between the original text and its translation helps eliminate occasional semantic errors and inaccuracies, but the AI output as a whole still reads like a poor-quality text with a ‘machine DNA.’ This brings us to the core problem: at present, there is no reliable method for assessing and achieving global semantic coherence in AI-generated translations.
Our study therefore aims to lay the foundations of a linguistic method for overcoming the textoid quality of machine translation output by means of semantic core identification. The study used a comprehensive approach that combines abstraction, analysis, classification, synthesis, modeling, and measurement, and achieved the following results:

(a) an original tool for semantic core identification was proposed, based on the well-known linguistic concepts of subject, predicate, and object and on a basic subject-logical typology of semantic relations;

(b) the initial wording (formula) of the core was shown to require adjustment in 46% of cases;

(c) the median core volume in medical news textoids was found to be 31%;

(d) basic principles of linguistic annotation (how to label specific linguistic, structural, or semantic features) were proposed, together with a system of notations;

(e) a principle for representing the semantic core with graphic formulae was proposed for illustrative purposes;

(f) directions for further research were outlined.
Conclusion: 52 textoids were analyzed to demonstrate the applicability of our method. The method is intended to serve as a reliable linguistic tool for identifying a semantic core. The core, in turn, can function as (1) a text-forming essence used to convert a textoid into a text; (2) a subject-logical benchmark for controlling and verifying the translation, both in specific segments of the machine translation output and in the text as a whole; and (3) a tool for interpreting unclear or contradictory passages within the textoid without having to consult the source text.