TOWARDS AUTOMATIC SKELETON RECOGNITION OF
HUMANOID 3D CHARACTER
Open Computing Ltd
ABSTRACT
Rigging and skinning are standard techniques for character animation today. The overall rig structure of different
characters resembles a simplified human skeleton, but details such as the number of bones and their orientation may
differ considerably. In order to animate characters with different skeleton structures, the structure has to be
inspected and the bones significant for animation identified first. This paper explains the heuristics used to inspect
a skeletal structure and recognize the desired bones, in order to perform inverse kinematics with off-the-shelf VR
hardware.
KEYWORDS
character rigging, procedural animation, inverse kinematics (IK), virtual reality (VR)
1. INTRODUCTION
Rigging and skinning are standard techniques for character animation today. A hierarchy of bones is attached
to the mesh of a character, and spatial transformations applied to these bones determine the deformations of
the character mesh during animation (Magnenat-Thalmann, N. et al, 1988).
Humanoid characters seem to share much the same structure: they all have bodies, arms, legs and heads. Artists
often use the Biped or CAT rigging systems (Arshad M.P. et al, 2019). Rigs can be generated automatically (Baran I.
and Popovic J. 2007), or built manually from scratch with custom skeletons and geometries.
However, there is no standard humanoid character structure. Different authoring tools may provide their own
templates, and different authors follow their own practices.
Furthermore, bones that do not correspond to the actual human skeleton are often added for the purpose of
animation, e.g. hair, weapons, breasts.
In order to apply correct animation to an arbitrary character, target bones (desired body parts) need to be
identified first.
Only once a character's skeleton structure is known can the character be used as a VR avatar. The position and
orientation of the VR controllers and headset are used to control movement of the avatar’s arms, head and body
(Spanlang B. et al, 2013) (Parger M. et al, 2018).
This paper details the rules found and the algorithm developed to inspect skeletons and identify bones important
for VR avatar animation. The approach was successfully tested on 95 arbitrary open source 3D characters.
2. THE PROBLEM
The research was performed for the development of an open source multiuser web VR engine, i.e. a VR
server. The basic function of such software is tracking VR controllers and headsets, and applying appropriate
animations to users' avatars. An open source VR server cannot rely on proprietary content, so we decided to
use already published open source 3D characters.
All characters were automatically converted to glTF format for practical reasons.
The approach explained here was built on examination of 95 humanoid characters; eventually, 50 characters
were chosen for the project.
2.1 Character selection
Character selection criteria were size and pose. Figure 1 displays a rejected and an accepted character.
Ideally, a character should be smaller than 10MB. Characters larger than 20MB were accepted only because of
their outstanding quality, determined subjectively, e.g. number of animations or good looks. As file size
significantly degrades the online user experience, 13 characters were determined to be too big for the project.
The initial pose of the character should be neutral: standing straight up, looking forward. Arms should be at the
sides, ideally horizontal, at a right angle to the body (T pose). Hands are tracked by VR controllers, so the initial
arm position is optional. A total of 26 characters were rejected due to their initial pose. These do have an
appropriate structure, but to be used as avatars, they would have to be brought into a neutral pose first. Another
6 characters were rejected as manual intervention was required to fix the mesh, animation or level of detail.
Figure 1: Rejected character (left), joints shown as small spheres, IK targets as big spheres. Note a missing foot.
Accepted character (right) as VR avatar, arms following movement of controllers.
2.2 Observed character structure
The first thing to note is that all examined characters have their left and right sides switched, i.e. mirrored.
This may be a consequence of the modeling process: designers create a character facing them, and then name
bones and joints using themselves as reference. This is the exact opposite of the expected biped character
skeleton (Arshad M.P. et al, 2019), illustrated in Figure 2.
The second most important difference is that characters come in all sizes and have several additional
transformations applied. The root node of a character is not the pelvis, but a point at height zero, a node used to
move the entire character in space. Between that point and the pelvis, a number of transformation nodes may be
found, typically between two and six. The main purpose of these nodes is transformation to the appropriate
coordinate system. Specifically, the BabylonJS glTF loader creates two: RootNode (gltf orientation matrix) and
RootNode (model correction matrix). However, an arbitrary number of additional transformations may be present,
depending on the authoring tool chain used. One distinct intermediate node is the character animation root, set
during the authoring process, which may or may not correspond to the pelvis. For the purposes of IK in VR, this
node holds no significance.
Figure 2: Hierarchical Bone Structure for Biped Character (Arshad M.P. et al, 2019)
The number of bones may differ considerably, especially for characters supporting facial animation. While facial
animation is out of scope, these bones still affect recognition of the head and neck. Some characters feature
animated hair, tails, or weapons, all of which influence recognition of arms and legs.
Joints are not part of the glTF skeleton structure.
The character structure observed in existing characters is displayed in Figure 3.
Figure 3: Simplified diagram of observed character structure. Optional nodes and connections are dashed. Additional
nodes may be attached to pelvis, hips, chest or head.
As for the orientation of bones, there seem to be no rules. IK animation of a bone must be converted to the
coordinate system of its parent bone.
Facial animation is certainly interesting, but other than the eyes, there seem to be no rules for the structure of
head bones. Only one character is rigged to mimic speech.
However, all 95 characters have bone names based on English words for the corresponding bones. An additional 3
characters that were inspected used Portuguese and Spanish bone names.
3. THE SOLUTION
Based on observation of the available characters, heuristics based on bone names seemed promising. Bone
names alone are not enough for proper discrimination, so additional constraints were introduced. Eventually, the
implementation of the heuristic algorithm allowed IK for all 95 characters.
3.1 The algorithm
The skeleton is processed from its root node, traversing the bone hierarchy. At each node, a specific set of
rules is applied in order to identify bones of significance for animation.
Left and right bones are processed the same way, and are distinguished by their names, e.g. containing “left” or
“right”. The side may instead be determined from relative bone position, but that requires calculation of all
transformation matrices first. String comparison is case insensitive.
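The name-based side check can be sketched in Python as follows. This is a minimal illustration, not the engine's implementation: the function name `side_of` and the single-letter pattern (for names such as "L_Thigh") are assumptions, while the case-insensitive comparison follows the text.

```python
import re

# Assumed pattern for single-letter side markers: an "l" or "r" that is not
# preceded by another letter and is followed by "_", whitespace or end of name.
_LETTER_SIDE = re.compile(r"(?:^|[^a-z])([lr])(?:_|\s|$)")


def side_of(bone_name):
    """Return "left", "right" or None, judging by the bone name only."""
    name = bone_name.lower()  # string comparison is case insensitive
    if "left" in name:
        return "left"
    if "right" in name:
        return "right"
    match = _LETTER_SIDE.search(name)
    if match:
        return "left" if match.group(1) == "l" else "right"
    return None
```

Since the examined characters were found to be mirrored, a caller may still need to swap the two sides after this check.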
3.1.1 Pelvis
The first important bone to recognize is the pelvis. Its main discriminating characteristic is that it has at least
three child bones. Additionally, its name may include “pelvis”, “hip”, “spine” or “root”. It is recognized only once,
i.e. once found, recognition is not attempted again. This ensures that other bones, e.g. shoulders and arms, are
not recognized as hips and legs.
However, if any of the intermediate child bones of this pelvis candidate is itself a valid candidate (satisfies the
name and number of children criteria), it is accepted as the pelvis instead. This accounts for bones that do not
represent legs, but other attached animated objects, e.g. a skirt.
Once the pelvis is found, its child bones are processed to identify legs and spine. Child bones that represent
neither legs nor spine are ignored, as they are used to animate unidentified custom objects.
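The pelvis rules above can be sketched as a depth-first search. This is a hedged illustration only: the `Bone` class and function names are hypothetical, the name rule is treated as required together with the child count, and "intermediate children" is read as direct children, both of which are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Bone:
    """Hypothetical minimal bone node: a name and a list of child bones."""
    name: str
    children: list = field(default_factory=list)


PELVIS_WORDS = ("pelvis", "hip", "spine", "root")


def is_pelvis_candidate(bone):
    name = bone.name.lower()
    return len(bone.children) >= 3 and any(w in name for w in PELVIS_WORDS)


def find_pelvis(bone):
    """Return the first pelvis found from this node down, or None."""
    if is_pelvis_candidate(bone):
        # A qualifying direct child takes precedence, e.g. when the outer
        # candidate also carries skirt bones.
        for child in bone.children:
            if is_pelvis_candidate(child):
                return child
        return bone
    for child in bone.children:
        found = find_pelvis(child)
        if found is not None:
            return found  # the pelvis is recognized only once
    return None
```

Returning as soon as one pelvis is found mirrors the rule that recognition is not attempted again, so shoulders further up the hierarchy cannot be mistaken for hips.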
3.1.2 Hips and legs
Attached to the pelvis, there may or may not be hips, i.e. a thigh can be attached directly to the pelvis, or it
can be attached to a hip that is attached to the pelvis. This decision is up to the author of the character, and
relates to the animation, as it influences how the character mesh is deformed.
A leg candidate contains any of the following in the bone name: “right”, “left”, “[lr]leg”, “[lr]_”, “ [lr] ”,
“[lr]thigh”, “[lr]hip”. If the name contains “leg” or “thigh”, it is a leg; otherwise, it may be a hip, or a footless leg.
A candidate bone has to have at least one child bone, and that child also has to have at least one child,
corresponding to thigh and calf.
A leg may not have bone(s) corresponding to the foot. That is another decision of the artist.
A leg candidate is ignored if it has no children, or if it has exactly one child that has no children (e.g. a gun
holster attached to a hip or thigh).
A footless leg is considered if the bone has exactly one child, and that child has exactly one child, which has no
children. That structure corresponds to hip-thigh-calf. Note that this does not necessarily mean that foot
geometry does not exist, only that it cannot be animated. Also note that this explicitly excludes a leg that is not
a child of a hip bone but a child of the pelvis bone. Based on the available models, such a leg cannot be properly
distinguished from other bones used to animate attached objects.
In case none of the above criteria are satisfied, the bone is assumed to be a hip, and its first child is assumed
to be a leg.
This allows proper identification of the leg bones most important for animation, the upper and lower parts of a
leg, corresponding to thigh and calf.
The foot itself usually consists of one or two bones, but only the first one is used for animation, so no additional
recognition is performed there.
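The leg rules of this section can be collected into one classification function. The sketch below is an illustration under assumptions: the `Bone` class, the function name, the returned labels and the order in which the rules are checked are not taken from the original implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Bone:
    """Hypothetical minimal bone node: a name and a list of child bones."""
    name: str
    children: list = field(default_factory=list)


# Substring rules for leg candidates, as listed in the text.
LEG_NAME = re.compile(r"right|left|[lr]leg|[lr]_| [lr] |[lr]thigh|[lr]hip")


def classify_leg(bone):
    """Classify a child of the pelvis; return (kind, thigh_bone_or_None)."""
    name = bone.name.lower()
    if not LEG_NAME.search(name):
        return "none", None                # not a leg candidate at all
    kids = bone.children
    if not kids or (len(kids) == 1 and not kids[0].children):
        return "ignored", None             # e.g. a holster on the hip
    if "leg" in name or "thigh" in name:
        return "leg", bone                 # thigh attached directly to pelvis
    if (len(kids) == 1 and len(kids[0].children) == 1
            and not kids[0].children[0].children):
        return "footless", kids[0]         # hip-thigh-calf, no foot bone
    return "hip", kids[0]                  # default: hip, first child is leg
```

In every accepted case the calf is then the first child of the returned thigh bone, and the foot, if present, is the first child of the calf.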
3.1.3 Spine and chest
A spine candidate contains “spine” or “body” in the bone name, or is any bone that is not a leg and has at least
one child bone. Spine bones are processed recursively until a bone with at least three children is found, and then
arm and neck processing takes place.
The chest is, for the purpose of this explanation, the bone that has the arms and neck attached to it, as shown in
Figure 2. However, it does not correspond to bone names found in the characters examined.
A chest candidate has at least three child bones, and the children’s names contain “neck” or “head”, or a child has
children, and their names contain “shoulder”, “clavicle”, “collar” or “arm”. A valid chest candidate has at
least three child nodes satisfying these criteria.
This allows the chest to be distinguished from other spine bones that may have arbitrary bones attached.
The chest bone’s children are then examined for neck and arms.
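A minimal sketch of the chest test follows. It assumes the reading given in Table 1, i.e. that a child named like a shoulder counts only if it has children of its own; the `Bone` class and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Bone:
    """Hypothetical minimal bone node: a name and a list of child bones."""
    name: str
    children: list = field(default_factory=list)


ARM_WORDS = ("shoulder", "clavicle", "collar", "arm")


def qualifies(child):
    """True if a child looks like a neck/head, or like an arm with children."""
    name = child.name.lower()
    if "neck" in name or "head" in name:
        return True
    return bool(child.children) and any(w in name for w in ARM_WORDS)


def is_chest(bone):
    # Three qualifying children imply at least three children overall.
    return sum(1 for child in bone.children if qualifies(child)) >= 3


def find_chest(spine):
    """Walk down the spine until a chest is found; return it or None."""
    if is_chest(spine):
        return spine
    for child in spine.children:
        found = find_chest(child)
        if found is not None:
            return found
    return None
```

A spine bone carrying, say, a neck and two cape bones fails the test, because only one of its three children qualifies.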
3.1.4 Neck and head
A neck candidate bone name contains “neck” or “head”, or contains “collar” but does not contain “bone” (e.g.
is not “collarbone”) and contains neither “lcollar” nor “rcollar”.
If the candidate's name does not contain “head”, and it has more than two children, this may be the special case
when 'arms grow out of the neck', i.e. arm bones are children of a bone named “neck”, or it may be a case of
multiple bones added for facial animation. Either way, in this case the neck is processed, but spine processing
continues. Spine processing may override the neck identified here, as it may trigger neck identification later.
The neck's first child bone is assumed to be the head, and this concludes identification of the head for the
purpose of animation.
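The neck and head rules can be sketched as below. The `Bone` class, the function names and the shape of the return value are assumptions made for illustration; the name tests and the "more than two children" special case follow the text.

```python
from dataclasses import dataclass, field


@dataclass
class Bone:
    """Hypothetical minimal bone node: a name and a list of child bones."""
    name: str
    children: list = field(default_factory=list)


def is_neck_name(bone_name):
    name = bone_name.lower()
    if "neck" in name or "head" in name:
        return True
    # "collar" counts unless it is a collarbone or a left/right collar.
    return ("collar" in name and "bone" not in name
            and "lcollar" not in name and "rcollar" not in name)


def identify_neck(bone):
    """Return (neck, head, continue_spine), or None if not a neck candidate."""
    if not is_neck_name(bone.name):
        return None
    head = bone.children[0] if bone.children else None
    # Arms growing out of the neck, or extra facial bones: keep the neck,
    # but let spine processing continue and possibly override it later.
    continue_spine = "head" not in bone.name.lower() and len(bone.children) > 2
    return bone, head, continue_spine
```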
3.1.5 Arms
An arm candidate bone name contains “[lr]shoulder”, “[lr]clavicle”, “[lr]collar”, “[lr]arm”, “ [lr] ”, or “[lr]_”.
All observed characters follow the same pattern afterwards, a shoulder-upper arm-forearm-hand structure, so
identification is straightforward: the shoulder's first child is the upper arm, its child is the forearm, and its
child is the hand. In case the hand has five child bones, fingers may be identified by names containing “index” or
“point”, “middle”, “pink” or “little”, “ring”, and “thumb”.
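Walking the arm chain and naming the fingers can be sketched as follows. The `Bone` class, the function names and the dictionary keys are hypothetical; only the first-child chain and the finger substrings come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Bone:
    """Hypothetical minimal bone node: a name and a list of child bones."""
    name: str
    children: list = field(default_factory=list)


# Canonical finger key -> substrings that identify it in a bone name.
FINGERS = {
    "index": ("index", "point"),
    "middle": ("middle",),
    "ring": ("ring",),
    "little": ("pink", "little"),
    "thumb": ("thumb",),
}


def first_child(bone):
    return bone.children[0] if bone is not None and bone.children else None


def identify_arm(shoulder):
    """Follow first children: shoulder, upper arm, forearm, hand."""
    upper = first_child(shoulder)
    fore = first_child(upper)
    hand = first_child(fore)
    fingers = {}
    if hand is not None and len(hand.children) == 5:
        for bone in hand.children:
            name = bone.name.lower()
            for finger, words in FINGERS.items():
                if any(word in name for word in words):
                    fingers[finger] = bone
                    break
    return {"shoulder": shoulder, "upper_arm": upper, "forearm": fore,
            "hand": hand, "fingers": fingers}
```

Missing links in the chain simply yield None, so a caller can decide whether an incomplete arm is still usable for IK.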
The algorithm explained above may not be easy to follow, so Table 1 gives all identification rules at a glance.
Bone name rules are written in regular expression notation, but denote substrings rather than exact regular
expressions, i.e. a matching regular expression should begin and end with .*.
Any matching substring triggers the rule.
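The substring semantics described above can be illustrated with Python's `re` module: searching for a pattern anywhere in the name is equivalent to wrapping it in `.*` on both sides and requiring a full match. The dictionary `NAME_RULES` and the function `name_matches` are hypothetical names, and only a few of the rules are transcribed here.

```python
import re

# A few bone name rules, joined into alternations (assumed encoding).
NAME_RULES = {
    "pelvis": r"pelvis|hip|spine|root",
    "hip": r"right|left|[lr]leg|[lr]_|[lr]thigh|[lr]hip| [lr] ",
    "spine": r"spine|body",
    "shoulder": r"[lr]shoulder|[lr]clavicle|[lr]collar|[lr]arm| [lr] |[lr]_",
}


def name_matches(bone_of_interest, bone_name):
    """True if any substring of the bone name triggers the rule."""
    pattern = NAME_RULES[bone_of_interest]
    # re.search scans the whole string; IGNORECASE gives the
    # case-insensitive comparison used throughout.
    return re.search(pattern, bone_name, re.IGNORECASE) is not None
```

A name rule alone only selects candidates; the discrimination criteria and additional constraints still have to hold before a bone is accepted.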
Table 1: Bone recognition rules
| Bone of interest | Discrimination criteria | Bone name rules | Additional constraints |
|---|---|---|---|
| Pelvis | Children >= 3 | pelvis, hip, spine, root | Intermediate child satisfying criteria takes precedence |
| Hip (optional) | Child of pelvis | right, left, [lr]leg, [lr]_, [lr]thigh, [lr]hip, ' [lr] ' | Children >= 1, child.children >= 1, child.child.name contains [lr]_ |
| Thigh | Child of hip (optional), or | thigh, leg (optional, without hip) | |
| Calf | Child of thigh | | |
| Foot (optional) | Child of calf | | |
| Spine | Not a leg, children > 0 | spine, body | |
| Chest | Children >= 3 | | 3+ children named shoulder, clavicle, collar, arm (having children), head, neck |
| Neck | Child of chest | neck, head, [^lr]collar[^bone] | Not head, children > 2 → continue spine processing, may override previously found neck |
| Head | Child of neck | | |
| Shoulder | Child of chest | [lr]shoulder, [lr]clavicle, [lr]collar, [lr]arm, ' [lr] ', [lr]_ | |
| Upper arm | Child of shoulder | | |
| Forearm | Child of upper arm | | |
| Hand | Child of forearm | | |
| Fingers | Children of hand | index or point, middle, pink or little, ring, thumb | |
4. CONCLUSION
I cannot afford to spend more than a few days writing a paper, so this paper explains the heuristics and data
structures as they are. Anyone is welcome to use them as they see fit, build upon them, and collaborate.
REFERENCES
Magnenat-Thalmann, N. et al, 1988. Joint-dependent local deformations for hand animation and object grasping.
Graphics Interface ‘88, Canada. DOI: 10.20380/GI1988.04.
Arshad M.P. et al, 2019. Physical Rigging Procedures Based on Character Type and Design in 3D Animation. In
International Journal of Recent Technology and Engineering, Vol. 8, No. 3. DOI: 10.35940/ijrte.C5484.098319.
Baran I. and Popovic J., 2007. Automatic Rigging and Animation of 3D Characters. In ACM Transactions on Graphics,
Vol. 26, No. 3. DOI: 10.1145/1275808.1276467.
Spanlang B. et al, 2013. Real time whole body motion mapping for avatars and robots. VRST '13: Proceedings of the
19th ACM Symposium on Virtual Reality Software and Technology, Singapore. pp. 175-178.
Parger M. et al, 2018. Human upper-body inverse kinematics for increased embodiment in consumer-grade virtual reality.
VRST '18: Proceedings of the 24th ACM Symposium on Virtual Reality Software and Technology, Tokyo, Japan. DOI: