The Joy of S/MARs
This is possibly due to my reductionist, computer scientist brain rebelling against yet another level of complexity - the geography of the cell - when surely there's enough work to be done with the -omics we have. But anyway...
Up until the 70s, nuclear architecture was a mystery. Light and early electron microscopy couldn't pick out any structures, so everybody calmly pretended they had better things to do, like making flies grow two heads. Then new cell preparation and fluorescence microscopy techniques appeared, and some light was shed on the nuclei of eukaryotic cells.
A complex, flexible network of protein and RNA fibrils called the nuclear matrix was discovered. Amongst other things, this network serves as a scaffold to which chromosomes are attached (the relevant parts were imaginatively titled "the chromosome scaffold").

The stretches of chromosomal DNA that attach to this scaffold are the Scaffold/Matrix Attachment Regions, or S/MARs for short. You can work out where they are in the lab - slowly - but only a limited number of S/MAR sequences for different organisms are available on the internet, from the SMARt DB.
There's no real consensus sequence. S/MARs do tend to share some general features, though: they are between 300bp and several kb in length, tend to be AT-rich, and are enriched for features like topoisomerase II binding and cleavage sites and curved or kinked DNA. Recently researchers have expanded on the latter feature - it turns out that S/MARs also have a high potential for stress-induced duplex destabilization (SIDD). Craig Benham at UC Davis has web-based software to calculate SIDD for short sequences; incidentally, he has also produced work suggesting that SIDD-prone sites might be linked to regulatory potential, at least in E. coli.
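Of the features above, AT richness is the easiest to compute yourself. The sketch below is only that one feature, not an S/MAR predictor: it ignores topoisomerase II sites, curvature and SIDD entirely. It slides a window along a sequence and flags AT-rich stretches. The function names are hypothetical, the 300bp window borrows the lower bound on S/MAR length, and the step size and 0.7 threshold are arbitrary assumptions chosen for illustration.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A/T among the unambiguous A/C/G/T bases in seq (N etc. ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("A") + seq.count("T")) / acgt


def at_rich_windows(seq: str, window: int = 300, step: int = 50, threshold: float = 0.7):
    """Yield (start, end, at_fraction) for windows at or above the threshold.

    window=300 mirrors the minimum S/MAR length; step and threshold are
    illustrative assumptions, not values taken from any published tool.
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        frac = at_fraction(seq[start:start + window])
        if frac >= threshold:
            yield start, start + window, frac


def merge_windows(hits):
    """Merge overlapping or touching (start, end, _) hits into candidate regions."""
    regions = []
    for start, end, _ in hits:
        if regions and start <= regions[-1][1]:
            # Overlaps the previous region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]
```

On a toy sequence with a 600bp AT block at positions 1000 to 1600 flanked by GC-only DNA, `merge_windows(at_rich_windows("GC" * 500 + "AT" * 300 + "GC" * 500))` returns `[(950, 1650)]`. The region spills slightly past the block because a window only needs 70% AT to qualify, which is one reason a real finder would need more features than this.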
Why is anybody interested in where S/MARs are anyway? Well, there's their relationship to regulatory regions; faint evidence that they have something to do with where translocation breakpoints and gross deletions occur on chromosomes; and the relationship between structural domains (the "loops of DNA" in-between attachment regions in the diagram above) and functional domains. It's also been mooted that gene expression is related to the contents of each structural domain, so that, for example, important, highly expressed genes may be the only gene in their structural domain, while other, larger domains contain groups of less important genes.
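Testing the domain-contents idea starts with a simple bookkeeping step: partition a chromosome into loops between attachment regions and see which genes fall into each loop. The sketch below is a hypothetical illustration of that step only. It assumes S/MAR and gene coordinates are half-open intervals on one chromosome, that S/MARs don't overlap, that the chromosome ends act as loop boundaries, and that a gene straddling an attachment region belongs to no loop. The function names, coordinates and gene names are all made up.

```python
from collections import defaultdict


def loop_domains(smars, chrom_length):
    """Return the (start, end) gaps between attachment regions.

    smars: (start, end) half-open S/MAR intervals, in any order.
    The chromosome ends are treated as domain boundaries (an assumption).
    """
    domains = []
    prev_end = 0
    for start, end in sorted(smars):
        if start > prev_end:
            domains.append((prev_end, start))
        prev_end = max(prev_end, end)
    if prev_end < chrom_length:
        domains.append((prev_end, chrom_length))
    return domains


def genes_per_domain(domains, genes):
    """Map each domain to the names of genes lying entirely inside it.

    genes: (name, start, end) tuples. Genes that cross an attachment
    region are left unassigned.
    """
    assigned = defaultdict(list)
    for name, g_start, g_end in genes:
        for d_start, d_end in domains:
            if d_start <= g_start and g_end <= d_end:
                assigned[(d_start, d_end)].append(name)
                break
    return dict(assigned)


smars = [(1000, 1500), (4000, 4600), (9000, 9400)]
genes = [("geneA", 2000, 3500), ("geneB", 5000, 6000),
         ("geneC", 7000, 8500), ("geneD", 4200, 5000)]
domains = loop_domains(smars, chrom_length=12000)
print(genes_per_domain(domains, genes))
```

Here `geneA` sits alone in its loop, `geneB` and `geneC` share a larger one, and `geneD` is dropped because it overlaps an attachment region. Given real S/MAR calls, the per-loop gene lists could then be set against expression data.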
There's no shortage of interesting future experiments, but what is lacking is the data. At the moment, the three ways to identify putative S/MARs in silico are MAR-Wiz, Smartest and Web SIDD, all of which are web-based scripts with limits on how much sequence they can handle at once. Those limits make genome-wide studies difficult (unless you're the people who wrote the software in the first place). Their workings aren't very transparent either; I'm not sure MAR-Wiz has even been peer reviewed.
Which brings me to the point of this post... if anybody out there is looking for a neat coding project, a standalone S/MAR finder that incorporates SIDD as a feature would be great (an open-source one that we could all tinker with would be even better). My own attempts to build such a thing have exploded in a puff of Greek letters, misunderstood equations and lack of time. If you doubt how useful S/MAR-finding software might be, check out the number of papers that use MAR-Wiz (aka MAR-Finder).
p.s. I know about the EMBOSS one, but it's really behind the times.
