Sign Languages (SLs) have developed naturally within Deaf communities. Having no written form, they are oral languages in the sense of being used face to face, relying on the gestural channel for expression and the visual channel for reception. These under-resourced languages are not the subject of a broad consensus at the linguistic level. They make use of lexical signs, i.e. conventionalized units of language whose form is supposed to be arbitrary, but also, unlike vocal languages (setting aside co-verbal gestures), of iconic structures that use space to organize discourse. Iconicity, defined as the existence of a similarity between the form of a sign and the meaning it carries, is indeed used at several levels of SL discourse.
Most research in automatic Sign Language Recognition (SLR) has in fact focused on recognizing lexical signs, first in isolation and then within continuous SL. The video corpora associated with this research are often relatively artificial, consisting of repetitions of elicited utterances presented in written form. Other corpora consist of interpreted SL, which may also differ significantly from natural SL, as it is strongly influenced by the surrounding vocal language.
In this thesis, we aim to show the limits of this approach by broadening the perspective to the recognition of elements used to construct discourse or within illustrative structures.
To do so, we first show the value and the limitations of the corpora developed by linguists. In these corpora, the language is natural and the annotations are sometimes detailed, but they are not always usable as input data for machine learning systems, since they are not necessarily complete or coherent.
We then propose a redesign of a French Sign Language (LSF) dialogue corpus, Dicta-Sign-LSF-v2, with rich and consistent annotations that follow an annotation scheme shared by many linguists.
Next, we propose a redefinition of the problem of automatic SLR as the recognition of various linguistic descriptors, rather than of lexical signs only. Alongside this, we discuss metrics suited to a relevant performance assessment.
To carry out a first experiment on the recognition of linguistic descriptors that are not only lexical, we then develop a compact and generalizable representation of signers in videos. It is obtained by processing the hands, face and upper body in parallel, using existing tools and models that we have set up. We further preprocess these parallel representations to obtain a relevant feature vector. We then present a modular architecture adapted to the automatic learning of linguistic descriptors, consisting of a recurrent and convolutional neural network.
Finally, we show through quantitative and qualitative analysis the effectiveness of the proposed model, tested on Dicta-Sign-LSF-v2. We first carry out an in-depth analysis of the parameterization, evaluating both the learning model and the signer representation. The study of the model predictions then demonstrates the merits of the proposed approach, with promising performance for the continuous recognition of four linguistic descriptors, especially in view of the uncertainty attached to the annotations themselves. Their segmentation is indeed subjective, and the relevance of the categories used is not firmly established. Indirectly, the proposed model could therefore provide a means of measuring the validity of these categories.
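To make the reformulated task concrete, here is a minimal sketch of one simple way to score the continuous recognition of a single linguistic descriptor: annotated segments are converted into a per-frame binary sequence, and predictions are compared frame by frame with precision, recall and F1. This is an illustrative assumption, not necessarily the adapted metrics discussed in the thesis. The function names `segments_to_frames` and `frame_f1` are hypothetical, and segments are assumed to be half-open `(start, end)` frame intervals.

```python
from typing import List, Tuple

# Hypothetical representation: (start_frame, end_frame), end exclusive.
Segment = Tuple[int, int]


def segments_to_frames(segments: List[Segment], n_frames: int) -> List[int]:
    """Turn annotated segments of one descriptor into a per-frame 0/1 sequence."""
    labels = [0] * n_frames
    for start, end in segments:
        # Clip each segment to the video length before marking its frames.
        for t in range(max(0, start), min(end, n_frames)):
            labels[t] = 1
    return labels


def frame_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Frame-wise precision, recall and F1 for one binary descriptor."""
    if len(y_true) != len(y_pred):
        raise ValueError("sequences must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Return 0.0 instead of dividing by zero when a quantity is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = segments_to_frames([(2, 6)], 10)  # annotated frames 2..5
    pred = segments_to_frames([(3, 8)], 10)   # predicted frames 3..7
    print(frame_f1(truth, pred))              # precision 0.6, recall 0.75, F1 about 0.667
```

Because the annotated segmentation is itself subjective, a strictly frame-wise score like this one penalizes small boundary shifts; this is one reason why metrics adapted to continuous recognition may be preferable.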
With several areas for improvement under consideration, particularly regarding signer representation and the use of larger corpora, the results are very encouraging and pave the way for a broader understanding of continuous Sign Language Recognition.