Different self-supervised learning frameworks perform differently between R-50 [20] (x-axis) and ViT-B [15] (y-axis). The numbers are ImageNet linear probing accuracy from Table 4.
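Linear probing, the metric reported on both axes, freezes the pre-trained backbone and trains only a linear classifier on top of the frozen features; validation accuracy of that classifier is the number plotted. A minimal PyTorch sketch follows; the `backbone`, `feat_dim`, and optimizer settings are illustrative assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn as nn

def linear_probe(backbone: nn.Module, feat_dim: int, num_classes: int = 1000):
    """Freeze a pre-trained backbone and train only a linear classifier.

    `backbone` is assumed to map images to feature vectors of size
    `feat_dim`; hyperparameters here are illustrative placeholders.
    """
    for p in backbone.parameters():   # freeze all backbone weights
        p.requires_grad = False
    backbone.eval()                   # keep normalization statistics fixed

    classifier = nn.Linear(feat_dim, num_classes)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
        with torch.no_grad():         # features come from the frozen encoder
            feats = backbone(images)
        loss = criterion(classifier(feats), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    return classifier, train_step
```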

Source publication
Preprint
This paper does not describe a novel method. Instead, it studies a straightforward, incremental, yet must-know baseline given the recent progress in computer vision: self-supervised learning for Vision Transformers (ViT). While the training recipes for standard convolutional networks have been highly mature and robust, the recipes for ViT are yet t...

Context in source publication

Context 1
... fair comparisons, all are pre-trained with two 224×224 crops for each image (multi-crop training [7] could improve results, which is beyond the focus of this work). The comparison in Table 4 is different between ViT-B and R50: see Fig. 7. MoCo v3 and SimCLR are more favorable for ViT-B than R50 (above the diagonal line). ...
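The "two 224×224 crops" refer to the standard two-view setup: each image is augmented twice, and the two views form the positive pair during contrastive pre-training. A sketch using torchvision is below; the specific augmentation list is an assumption modeled on common SimCLR/MoCo-style recipes, and the exact parameters differ per framework:

```python
import torchvision.transforms as T

class TwoCropsTransform:
    """Apply the same augmentation pipeline twice, yielding two views
    of one image -- the positive pair used in contrastive pre-training."""

    def __init__(self, base_transform):
        self.base_transform = base_transform

    def __call__(self, x):
        return [self.base_transform(x), self.base_transform(x)]

# Illustrative SimCLR/MoCo-style recipe, not any paper's exact settings.
augmentation = T.Compose([
    T.RandomResizedCrop(224, scale=(0.2, 1.0)),
    T.RandomApply([T.ColorJitter(0.4, 0.4, 0.4, 0.1)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.RandomApply([T.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0))], p=0.5),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

two_crop = TwoCropsTransform(augmentation)  # pass as `transform=` to an image dataset
```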

Citations

Article
Self-supervised learning (SSL) has emerged as a promising alternative to purely supervised learning, since it can learn from labeled and unlabeled data using a pre-train-then-fine-tune strategy, achieving state-of-the-art performance across many research areas. The field of accelerometer-based human activity recognition (HAR) can benefit from SSL since unlabeled data can be collected cost-efficiently, thanks to the ubiquitous sensors embedded in smart devices, in contrast to labeled data, which require a costly annotation process. Motivated by the success of SSL and the lack of surveys on SSL for HAR, this survey comprehensively examines 52 SSL methods applied to HAR and categorizes them into four SSL paradigms based on their pre-training objectives. We discuss SSL strategies, evaluation protocols, and utilized datasets. We highlight limitations in current methodologies, including limited large-scale pre-training, the absence of foundation models, and the scarcity of systematic domain-shift experiments and domain-knowledge utilization. Notably, the diversity in evaluation protocols across papers poses a considerable challenge when comparing methods. Future directions outlined in this survey include the development of an SSL framework for HAR to enable standardized benchmarking and large-scale pre-training, along with integrating domain knowledge to enhance model performance.