SMIM23


SMIM23, or small integral membrane protein 23, is a protein that in humans is encoded by the SMIM23 gene (also known as C5orf50). The longer mRNA isoform is 519 nucleotides long and translates into a protein of 172 amino acids. Recent research has identified this gene, along with a few others, as potentially playing a role in how human facial morphology arises. Although this research is still relatively new, it offers a starting point for further study of the gene.
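The two figures are consistent with the genetic code if, as an assumption, the 519 nucleotides are the coding sequence including its stop codon, which is read but not translated into an amino acid:

```latex
\frac{519\ \text{nt}}{3\ \text{nt/codon}} = 173\ \text{codons},
\qquad 173 - 1\ (\text{stop codon}) = 172\ \text{amino acids}
```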

Gene

SMIM23 is a protein-coding gene. Basic information about its aliases and chromosomal location is given in the table. The schematic of the chromosome helps to visualize the location of the gene.

mRNA

The gene has two splice isoforms. Its mRNA has three exon/exon boundaries, indicating four exons.

Protein

Physical features

SMIM23 notably has a transmembrane domain.
The predicted isoelectric point of the unmodified, unprocessed protein in mice is 5.779, while the transmembrane region alone in humans has an isoelectric point of 5.928.
The protein appears to be rich in leucine and glutamic acid, though not unusually so. It is comparatively low in all other amino acids except alanine, serine, and glutamine.
The region underlined in the conceptual translation was predicted to be an involucrin repeat.
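The isoelectric point (pI) is the pH at which a protein's net charge is zero. The sketch below shows one common way to estimate it from sequence alone: sum Henderson-Hasselbalch charges over the ionizable groups and bisect on pH. It assumes one textbook set of pKa values; the tools behind the 5.779 and 5.928 figures are not named here and use their own tables, so this will not reproduce those numbers exactly.

```python
# Sketch: estimate the isoelectric point (pI) of an unmodified peptide.
# ASSUMPTION: the pKa values below are one textbook set; published tools
# (e.g. ExPASy ProtParam, EMBOSS iep) use different tables and give different pIs.
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from Henderson-Hasselbalch fractions."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    # free N-terminal amine (positive) and C-terminal carboxyl (negative)
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    for aa in "KRH":  # basic side chains: fraction protonated
        charge += counts[aa] / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
    for aa in "DECY":  # acidic side chains: fraction deprotonated
        charge -= counts[aa] / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH; net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

Bisection is used because net charge decreases monotonically with pH, so a single zero crossing is guaranteed between pH 0 and 14.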

Post-translational modifications

The transmembrane region has a predicted mass of 1,674.2 daltons (Da), while the whole protein is 20,008.51 Da. This is very close to the UniProt prediction of 20.025 kDa. Commercially available antibodies were examined to see whether their banding patterns indicate weight changes after translation. The C5orf50 polyclonal antibody from Thermo Fisher Scientific gives a Western blot band at 40 kDa, roughly twice the predicted mass. This suggests substantial post-translational modification through the addition of large components.
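Predicted masses like these are typically obtained by summing average residue masses and adding one water for the free termini. The sketch below assumes the commonly tabulated average residue masses (tables differ slightly in the last decimals) and, as a worked comparison, uses the UniProt estimate of 20.025 kDa against the 40 kDa band. It is arithmetic only and does not identify what accounts for the difference.

```python
# Average residue masses (Da): free amino acid mass minus one water.
# ASSUMPTION: values as commonly tabulated (e.g. by ExPASy); other tables
# differ slightly in the last decimals.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one water restored for the free N- and C-termini


def average_mass(seq: str) -> float:
    """Average mass (Da) of an unmodified polypeptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def unexplained_mass_kda(predicted_da: float, observed_kda: float) -> float:
    """Difference (kDa) between an observed band and the predicted mass."""
    return observed_kda - predicted_da / 1000


# 40 kDa band vs. the UniProt estimate of 20.025 kDa (20,025 Da)
gap = unexplained_mass_kda(20025, 40)
```

Here `gap` comes to about 19.98 kDa, i.e. roughly one predicted protein's worth of extra apparent mass.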
Its sequence contains many predicted phosphorylation sites, including two protein kinase C phosphorylation sites, a cAMP- and cGMP-dependent protein kinase phosphorylation site, and a tyrosine kinase phosphorylation site. There is also a high-confidence predicted C-terminal GPI-modification site.
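Site predictions of this kind are often made by matching short consensus patterns against the sequence. The text does not say which tool was used; the sketch below assumes the PROSITE patterns for protein kinase C sites (PS00005, [ST]-x-[RK]) and cAMP/cGMP-dependent protein kinase sites (PS00004, [RK](2)-x-[ST]), translated into regular expressions. Patterns this short match frequently by chance, so hits are candidates rather than confirmed sites.

```python
import re

# ASSUMPTION: PROSITE consensus patterns rewritten as regular expressions.
#   PS00005 (protein kinase C):                 [ST]-x-[RK]
#   PS00004 (cAMP/cGMP-dependent protein kinase): [RK](2)-x-[ST]
PATTERNS = {
    "PKC": r"[ST].[RK]",
    "cAMP/cGMP-dependent kinase": r"[RK]{2}.[ST]",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return 1-based start positions for each pattern, including overlaps."""
    seq = seq.upper()
    hits = {}
    for name, pattern in PATTERNS.items():
        # wrapping the pattern in a zero-width lookahead reports overlapping matches
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return hits


# Hypothetical toy peptide, not the SMIM23 sequence
print(find_sites("MSAKRRLSG"))
```

The lookahead matters because adjacent sites can share residues; a plain `re.finditer` would skip overlapping matches.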

Secondary structure

Several secondary structure prediction programs indicate two alpha-helical stretches, spanning amino acids 33 to 49 and 89 to 136. Of the programs investigated, the most informative was PELE on Biology Workbench.
A predicted 3D structure likewise consists of a series of helices, consistent with the other programs.
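Counting both ends of each range, the two predicted helices together span 65 residues, or roughly 38% of the 172-residue sequence (assuming the positions refer to the longer isoform):

```latex
(49 - 33 + 1) + (136 - 89 + 1) = 17 + 48 = 65\ \text{residues},
\qquad \frac{65}{172} \approx 0.378
```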

Subcellular localization

This human integral membrane protein is predicted to localize to the endoplasmic reticulum. The same kind of localization analysis for the protein in other species gave conflicting results, with many programs predicting a cytosolic location. This raises the possibility that the name is misleading, i.e. that the protein may not be an integral membrane protein after all. Confirming this would require further evidence.
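Whether a sequence contains a membrane-spanning segment is often checked with a hydropathy scan. The text does not state which method its predictions used; the sketch below assumes the Kyte-Doolittle hydropathy scale with the commonly cited 19-residue window and a mean-score threshold of 1.6, and only flags candidate windows.

```python
# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KD[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean exceeds the threshold.

    ASSUMPTION: window 19 and threshold 1.6 are the commonly used
    Kyte-Doolittle settings for spotting membrane-spanning helices.
    """
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]


# Hypothetical toy sequence: a hydrophobic stretch flanked by lysines
toy = "K" * 10 + "L" * 25 + "K" * 10
print(candidate_tm_windows(toy))
```

An empty result for a protein named as an integral membrane protein would be one signal in favor of the doubt raised above, though a single scale and threshold are a coarse test.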

Expression

There is little consensus on where in the body SMIM23 is expressed. Databases indicate expression mainly in the testes, but this may reflect a lack of data.

Regulation of expression

The promoter region of SMIM23 is approximately 1192 nucleotides long and contains binding sites for various predicted transcription factors.
At the level of mRNA secondary structure, a stem-loop is predicted in the 5' UTR, with a few regions conserved across species.
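A stem-loop forms where a stretch of RNA base-pairs with a reverse-complementary stretch a short distance downstream. The toy scan below illustrates that idea only: it assumes perfect Watson-Crick stems of fixed length with no G-U wobble pairs. Folding programs instead search for the minimum free energy structure, so this is not how the predicted 5' UTR stem-loop was found.

```python
# Toy scan for perfect inverted repeats (candidate stem-loops) in RNA.
# ASSUMPTION: Watson-Crick pairs only, perfect stems of fixed length;
# real folding tools minimize free energy instead.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_hairpins(rna: str, stem: int = 6, min_loop: int = 3,
                  max_loop: int = 8) -> list[tuple[int, int]]:
    """Return (1-based start of the 5' stem, loop length) for each hit."""
    rna = rna.upper().replace("T", "U")  # accept DNA input as well
    hits = []
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        partner = reverse_complement(rna[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the 3' stem
            if j + stem > len(rna):
                break
            if rna[j:j + stem] == partner:
                hits.append((i + 1, loop))
    return hits


# Hypothetical toy sequence: stem GACUGC, loop UUUU, stem GCAGUC
print(find_hairpins("AAGACUGCUUUUGCAGUCAA"))
```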

Function and clinical significance

Recent research suggests that face shape in individuals may be influenced by a set of genes that includes SMIM23. Although the paper refers to the gene by an alias, the authors identify five genes that likely influence facial shape, specifically in people of European descent. These findings are supported by replication of the phenotype associated with each gene and by statistical analysis. Consistent with other sources, the article describes SMIM23 as likely encoding a transmembrane protein of unknown function. Other studies suggest that a set of genes including SMIM23 may influence human height. More broadly, much research is being done on chromosome 5 to understand the roles of its genes, including SMIM23, which may eventually clarify this gene's specific functions.

Interacting proteins

The following proteins are predicted to interact with SMIM23.
Cilia and flagella associated protein 43 (CFAP43, also known as WDR96) is the most confident of the predicted functional partners and contains tryptophan-aspartic acid (WD) repeat domains.
SFR1 (SWI5-dependent recombination repair 1) is a component of the SWI5-SFR1 complex, which is required for double-strand break repair via homologous recombination.
COL17A1 encodes the alpha 1 chain of type XVII collagen and may play a structural role.
PRDM16 binds to DNA and acts as a transcriptional regulator. It functions in the differentiation of brown versus white adipose tissue and can also act as a repressor of transforming growth factor-beta signaling.

Homology and evolution

There are no known paralogs.
There are more than 100 known orthologs, ranging from primates to small ground-dwelling animals. From these orthologs and their sequence similarity, an ortholog space can be described. The closest relatives of humans with the SMIM23 gene are primates, so two monkeys were chosen; they diverged from the human lineage around 29.4 million years ago and have sequence similarities of about 78%. More distant relatives with the gene include a wide variety of animals, such as horses, marine mammals, and bats, with similarities of roughly 62-69%. Finally, some distantly related orthologs were included, such as the Tasmanian devil and various scavenger animals, with similarities of 40-61%.
Some portions of the protein remain highly conserved. The most notable motif is tryptophan 124, leucine 125, and aspartic acid 126. In addition, a BLAST search returned a protein family of unknown function: two small conserved sequences are part of the DUF4635 family. Although these are not completely conserved in the alignments with SMIM23, they are labeled in the conceptual translation.
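Conservation of residues such as W124-L125-D126 is usually judged column by column in a multiple sequence alignment. The sketch below assumes a simple score, the fraction of sequences sharing the most common residue at each column, with gaps counted as mismatches; the alignment shown is a hypothetical toy, and alignment column numbers generally differ from residue numbers in the human protein.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Fraction of sequences sharing the most common residue in each column.

    Gaps ('-') never count toward the majority residue.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for col in range(length):
        residues = [s[col] for s in alignment if s[col] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        _, top = Counter(residues).most_common(1)[0]
        scores.append(top / len(alignment))
    return scores


def conserved_columns(alignment: list[str], threshold: float = 1.0) -> list[int]:
    """1-based alignment columns scoring at or above the threshold."""
    return [i + 1 for i, score in enumerate(column_conservation(alignment))
            if score >= threshold]


# Hypothetical toy alignment, not real SMIM23 orthologs
toy_alignment = ["MWLDA", "MWLDS", "AWLDS"]
print(conserved_columns(toy_alignment))
```

Lowering `threshold` (for example to 0.8) relaxes the definition to "mostly conserved", which suits alignments that include distant orthologs.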

Orthologs

SMIM23 was not found in bacteria, archaea, protists, plants, fungi, invertebrates, reptiles, or birds; all orthologs found were in mammals. An unrooted phylogenetic tree of SMIM23 was built from a selection of close, moderately related, and distant orthologs. In the tree, a greater distance corresponds to a longer time since the last common ancestor. Sequence identity refers to the percentage of exactly matching amino acids, while sequence similarity also counts chemically similar amino acids.
Genus and species       Common name                Divergence (MYA)  Identity (%)  Similarity (%)
Cercocebus atys         Sooty mangabey             29.44             73.8          77.8
Macaca mulatta          Rhesus monkey              29.44             73.3          78.3
Galeopterus variegatus  Sunda flying lemur         76                56.5          67
Tupaia chinensis        Chinese tree shrew         82                54.7          66
Castor canadensis       American beaver            90                54.1          65
Microtus ochrogaster    Prairie vole               90                54.7          64.2
Mustela putorius furo   Ferret                     96                59.9          62
Equus caballus          Horse                      96                57            68.2
Odobenus rosmarus       Walrus                     96                59.3          66.4
Acinonyx jubatus        Cheetah                    96                58.7          63
Ursus maritimus         Polar bear                 96                58.1          69.3
Camelus ferus           Wild Bactrian camel        96                55.2          62.2
Dasypus novemcinctus    Nine-banded armadillo      105               31.2          40.2
Echinops telfairi       Lesser hedgehog tenrec     105               50            61
Sarcophilus harrisii    Tasmanian devil            159               34.7          47.7
Monodelphis domestica   Gray short-tailed opossum  159               28.5          44.6
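In the usual convention, identity counts positions where two aligned sequences have the same amino acid, while similarity also counts conservative substitutions, so similarity is always at least as high as identity, as in every row of the table. The sketch below assumes a simplified, hypothetical grouping of similar residues; BLAST-style tools instead count positions with a positive substitution-matrix score, and programs differ in whether gapped positions count toward the denominator (here they do).

```python
# ASSUMPTION: simplified, hypothetical groups of "similar" residues;
# BLAST-style tools use positive substitution-matrix scores instead.
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]
GROUP_OF = {aa: group for group in SIMILAR_GROUPS for aa in group}


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and percent similarity of two aligned sequences.

    The denominator is the full alignment length, including gapped columns.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # a gap matches nothing
        if x == y:
            identical += 1
            similar += 1  # identical residues are also similar
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1  # conservative substitution within one group
    n = len(a)
    return 100 * identical / n, 100 * similar / n


# Hypothetical toy pair: two identities, two conservative substitutions
print(identity_and_similarity("MKLV", "MRIV"))
```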

Suggested Reading