Sunday, May 17, 2020
Definition of Disambiguation in Language Studies
In linguistics, disambiguation is the process of determining which sense of a word is being used in a particular context. It is also known as lexical disambiguation. In computational linguistics, this discriminative process is called word-sense disambiguation (WSD).

Examples and Observations

It so happens that our communication, in different languages alike, allows the same word form to be used to mean different things in individual communicative transactions. The consequence is that one has to figure out, in a particular transaction, the intended meaning of a given word among its potentially associated senses. While the ambiguities arising from such multiple form-meaning associations are at the lexical level, they often have to be resolved by means of a larger context from the discourse embedding the word. Hence the different senses of the word service could only be told apart if one could look beyond the word itself, as in contrasting the player's service at Wimbledon with the waiter's service in Sheraton. This process of identifying word meanings in a discourse is generally known as word sense disambiguation (WSD).
(Oi Yee Kwong, New Perspectives on Computational and Cognitive Strategies for Word Sense Disambiguation. Springer, 2013)

Lexical Disambiguation and Word-Sense Disambiguation (WSD)

Lexical disambiguation in its broadest definition is nothing less than determining the meaning of every word in context, which appears to be a largely unconscious process in people. As a computational problem, it is often described as AI-complete, that is, a problem whose solution presupposes a solution to complete natural-language understanding or common-sense reasoning (Ide and Véronis 1998). In the field of computational linguistics, the problem is generally called word sense disambiguation (WSD) and is defined as the problem of computationally determining which sense of a word is activated by the use of the word in a particular context.
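To make the definition concrete, here is a minimal sketch that picks a sense of service by counting how many context words overlap with a small set of cue words for each sense. It is a simplified overlap heuristic in the spirit of the Lesk algorithm, not a method taken from the sources quoted here. The sense labels, the cue words, and the function names are all assumptions invented for illustration.

```python
import re

# Hypothetical two-sense inventory for "service". The cue words are
# illustrative assumptions, not entries from any real dictionary.
SENSES = {
    "service/tennis": {"player", "serve", "ball", "court", "match", "wimbledon"},
    "service/hospitality": {"waiter", "hotel", "restaurant", "table", "guest", "sheraton"},
}

def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def disambiguate(context, senses=SENSES):
    """Return (best_sense, scores); best_sense is None when nothing overlaps."""
    tokens = tokenize(context)
    # Score each sense by the number of cue words found in the context.
    scores = {sense: len(tokens & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores

if __name__ == "__main__":
    for phrase in ("the player's service at Wimbledon",
                   "the waiter's service in Sheraton",
                   "service"):
        print(phrase, "->", disambiguate(phrase)[0])
```

With the word service alone, no cue word matches and the sketch returns no sense at all, which mirrors the point that the senses can only be told apart by looking beyond the word itself.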
WSD is essentially a task of classification: word senses are the classes, the context provides the evidence, and each occurrence of a word is assigned to one or more of its possible classes based on the evidence. This is the traditional and common characterization of WSD that sees it as an explicit process of disambiguation with respect to a fixed inventory of word senses. Words are assumed to have a finite and discrete set of senses from a dictionary, a lexical knowledge base, or an ontology (in the latter, senses correspond to concepts that a word lexicalizes). Application-specific inventories can also be used. For instance, in a machine translation (MT) setting, one can treat word translations as word senses, an approach that is becoming increasingly feasible because of the availability of large multi-lingual parallel corpora that can serve as training data. The fixed inventory of traditional WSD reduces the complexity of the problem, but alternative fields exist . . . .
(Eneko Agirre and Philip Edmonds, "Introduction." Word Sense Disambiguation: Algorithms and Applications. Springer, 2007)

Homonymy and Disambiguation

Lexical disambiguation is well suited particularly for cases of homonymy, for instance, an occurrence of bass must be mapped onto either of the lexical items bass₁ or bass₂, depending on the intended meaning. Lexical disambiguation implies a cognitive choice and is a task that inhibits comprehension processes. It should be distinguished from processes that lead to a differentiation of word senses. The former task is accomplished fairly reliably also without much contextual information while the latter is not (cf. Veronis 1998, 2001). It has also been shown that homonymous words, which require disambiguation, slow down lexical access, while polysemous words, which activate a multiplicity of word senses, speed up lexical access (Rodd e.a. 2002).
However, both the productive modification of semantic values and the straightforward choice between lexically different items have in common that they require additional non-lexical information.
(Peter Bosch, "Productivity, Polysemy, and Predicate Indexicality." Logic, Language, and Computation: 6th International Tbilisi Symposium on Logic, Language, and Computation, ed. by Balder D. ten Cate and Henk W. Zeevat. Springer, 2007)

Lexical Category Disambiguation and the Principle of Likelihood

Corley and Crocker (2000) present a broad-coverage model of lexical category disambiguation based on the Principle of Likelihood. Specifically, they suggest that for a sentence consisting of words w₀ . . . wₙ, the sentence processor adopts the most likely part-of-speech sequence t₀ . . . tₙ. More specifically, their model exploits two simple probabilities: (i) the conditional probability of word wᵢ given a particular part of speech tᵢ, and (ii) the probability of tᵢ given the previous part of speech tᵢ₋₁. As each word of the sentence is encountered, the system assigns it that part-of-speech tᵢ, which maximizes the product of these two probabilities. This model capitalizes on the insight that many syntactic ambiguities have a lexical basis (MacDonald et al., 1994), as in (3):

(3) The warehouse prices/makes are cheaper than the rest.

These sentences are temporarily ambiguous between a reading in which prices or makes is the main verb or part of a compound noun. After being trained on a large corpus, the model predicts the most likely part of speech for prices, correctly accounting for the fact that people understand prices as a noun but makes as a verb (see Crocker & Corley, 2002, and references cited therein). Not only does the model account for a range of disambiguation preferences rooted in lexical category ambiguity, it also explains why, in general, people are highly accurate in resolving such ambiguities.
(Matthew W. Crocker, "Rational Models of Comprehension: Addressing the Performance Paradox." Twenty-First Century Psycholinguistics: Four Cornerstones, ed. by Anne Cutler. Lawrence Erlbaum, 2005)
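The word-by-word procedure described in this passage can be sketched in a few lines: for each incoming word, choose the tag that maximizes P(word | tag) × P(tag | previous tag), then move on. This is only a rough illustration under stated assumptions, not Corley and Crocker's implementation. The tag set, the probability values, and the function name are invented for the example; a real model would estimate the probabilities from a large tagged corpus.

```python
# Toy probabilities, invented for illustration only.
EMISSION = {  # P(word | tag)
    "the": {"DET": 0.5},
    "warehouse": {"NOUN": 0.001},
    "prices": {"NOUN": 0.002, "VERB": 0.0005},
    "makes": {"NOUN": 0.0001, "VERB": 0.003},
}
TRANSITION = {  # P(tag | previous tag)
    "START": {"DET": 0.3, "NOUN": 0.2, "VERB": 0.05},
    "DET": {"DET": 0.01, "NOUN": 0.7, "VERB": 0.01},
    "NOUN": {"DET": 0.05, "NOUN": 0.15, "VERB": 0.3},
    "VERB": {"DET": 0.4, "NOUN": 0.2, "VERB": 0.05},
}

def tag_incrementally(words):
    """Assign each word the tag maximizing P(word | tag) * P(tag | previous tag).

    The choice is made greedily as each word arrives and is never revised.
    Words missing from the toy EMISSION table raise KeyError.
    """
    tags = []
    prev = "START"
    for word in words:
        candidates = EMISSION[word.lower()]
        best = max(
            candidates,
            key=lambda tag: candidates[tag] * TRANSITION[prev].get(tag, 0.0),
        )
        tags.append(best)
        prev = best
    return tags

if __name__ == "__main__":
    print(tag_incrementally("The warehouse prices".split()))
    print(tag_incrementally("The warehouse makes".split()))
```

With these toy numbers, prices after a noun comes out as a noun (0.002 × 0.15 beats 0.0005 × 0.3) and makes comes out as a verb (0.003 × 0.3 beats 0.0001 × 0.15), matching the preferences reported for example (3).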