Flags and Lollipops

Friday, April 18, 2008

Nice work Pedro!

Noticed while leafing through today's Nature that Pedro has a paper out (Isalan et al., Evolvability and hierarchy in rewired bacterial gene networks).

There's more on this over at Public Rambling.


Thursday, April 17, 2008

Ian owes me a pint

(update: Gavin Bell at Nature gave up one of his app spots so that I could put this live, which I did, only to discover that Google App Engine is even more unforgiving of timeouts than Facebook. Currently trying to work out how to fix the bookmarking process; for now it doesn't work very well. Also the search is broken, though that's Google's fault and not mine.)

I bet Ian earlier that I could rewrite Connotea on App Engine in six hours. I can't remember why. Probably ego (mine, I mean). He didn't actually bet me a pint but he should have done...



... because the original estimate was a tad optimistic (ahem). After twelve hours I've produced pycite, though, which is pretty good going I think. I'll admit it: Python is actually very cool.

pycite is three hundred lines of logic and a set of HTML templates that implement a (very simple) social bookmarking service. Sadly I don't actually have an App Engine account so it's not live on the web anywhere (I'll buy whoever does have an account and puts it up first a pint - let's spread the love); you'll have to download it and run it locally to see it in action.

What you can do with it:

  • run it without owning a server of your own

  • log in with your Google account

  • add new bookmarks (the citation will be collected automagically)

  • view everybody's bookmarks

  • filter bookmarks by user:
    http://path.to.pycite/users/bob.smith

  • and by tag:
    http://path.to.pycite/tags/diabetes

  • and by user and tag:
    http://path.to.pycite/users/bob.smith/tags/diabetes

  • and by keyword (the full text of each bookmarked page is searchable):
    http://path.to.pycite/users/bob.smith?q=t2d

  • get atom feeds for all of the above
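
For the curious, here's a rough sketch of how URLs like the ones above could be turned into filters. This is not pycite's actual routing code (grab that from Google Code); parse_filters and the dictionary keys are names I've made up for illustration, and it only assumes the /users/<name>/tags/<tag>?q=<keyword> layout shown in the list.

```python
from urllib.parse import urlparse, parse_qs


def parse_filters(url):
    """Split a pycite-style URL into user / tag / keyword filters.

    Hypothetical helper: the URL layout comes from the examples above,
    everything else is an assumption made for this sketch.
    """
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    filters = {"user": None, "tag": None, "q": None}

    # Walk the path in (name, value) pairs: /users/<user>/tags/<tag>
    for name, value in zip(parts[0::2], parts[1::2]):
        if name == "users":
            filters["user"] = value
        elif name == "tags":
            filters["tag"] = value

    # The keyword search rides along in the query string as ?q=...
    query = parse_qs(parsed.query)
    if "q" in query:
        filters["q"] = query["q"][0]
    return filters


print(parse_filters("http://path.to.pycite/users/bob.smith/tags/diabetes"))
print(parse_filters("http://path.to.pycite/users/bob.smith?q=t2d"))
```

Anything left as None just means "don't filter on this", so the same function covers the everybody's-bookmarks view too.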


What you can't do with it (yet):

  • edit or delete bookmarks

  • anything else



I've put it all up on Google Code. It's fairly straightforward stuff so if you've got any brilliant social bookmarking ideas then go for it. Send me an email and I'll give you write access to the subversion repository.


Monday, April 07, 2008

Gaggle

I hadn't heard of Gaggle before but both Deepak and Sutee Dee (who needs a homepage.. ;)) from the ISB mentioned it last week so I figured it was worth a look. It's a system built by Paul Shannon at the ISB in Seattle to share data between different bioinformatics applications on the fly. It has been around for a while, I think - there was a BMC Bioinformatics paper describing the system in March 2006.


A small server program (the 'Gaggle Boss') provides communication among analysis and display programs (the 'geese') which are modest and minimal adaptations of existing (or novel) bioinformatics and computational biology programs, and web resources. The Boss and the geese all run as separate programs on the user's desktop computer, communicating with each other, at the user's behest, by passing simple messages.

(from the ISB's 'about Gaggle' page)

I ran through a tutorial showing data sharing between (modified versions of) Cytoscape (also developed by ISB), R and a data matrix viewer with no problems. Quite cool.

You can't share data from an arbitrary application (I don't think?): apps need to be modified to send messages to the Boss goose. Having said that there's a Firefox extension called Firegoose which lets you pass messages to and from web apps, Entrez etc. I couldn't get it working properly but suspect that's something to do with my install rather than the extension itself.
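
To make the Boss / geese idea concrete, here's a toy sketch of the pattern in Python. It is emphatically not how Gaggle is implemented (I haven't looked at its internals); the Boss and Goose classes and their methods are invented for illustration, and everything runs in one process rather than as separate programs.

```python
class Boss:
    """Toy relay: keeps track of registered geese and passes messages on."""

    def __init__(self):
        self.geese = {}

    def register(self, goose):
        self.geese[goose.name] = goose

    def send(self, sender, target, data):
        # The Boss never interprets the data, it just hands it over.
        self.geese[target].receive(sender, data)


class Goose:
    """Toy stand-in for an application adapted to talk to the Boss."""

    def __init__(self, name, boss):
        self.name = name
        self.boss = boss
        self.inbox = []
        boss.register(self)

    def send(self, target, data):
        self.boss.send(self.name, target, data)

    def receive(self, sender, data):
        self.inbox.append((sender, data))


boss = Boss()
network_viewer = Goose("network viewer", boss)
stats_package = Goose("stats package", boss)
network_viewer.send("stats package", ["geneA", "geneB"])
print(stats_package.inbox)
```

The point of the pattern is that each app only needs to know how to talk to the Boss, not to every other app.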

Anyway, it's good to see stuff like this. Truth be told it's not the slickest thing ever, but it's still pretty cool - and it works. I wonder if you could turn it into a simple lab notebook - could you write a brief description of what you're going to try and do for the Boss app every time you send data to another app or something?


Friday, April 04, 2008

Why you should try online dating

(you can jump to the short answer here, if you're feeling impatient)

On to the psychology of social media. Kristin Stecher of the University of Washington and Dave Evans of Psychster LLC both gave interesting talks about profile pages.

Psychster is a consulting company dedicated to "the social science of social networking". Recently they've been looking at interpersonal perception (how does person A perceive person B? How close is that to B's self perception?). Most research into this uses 'fake' people - i.e. A is given a detailed written description of B and works off of that, rather than meeting anybody face to face.

To try and get a large 'real people' dataset Psychster created a Facebook application (and later a website) where users could fill out a questionnaire that rated their personality on a variant of the big five personality inventory (the big five being openness, conscientiousness, extraversion, agreeableness, and neuroticism). They then had the option of rating the personalities of other people (not just their friends), the idea being to collect how users saw themselves, how others saw them and the correlation between the two.

On the standalone website users created profiles to reflect their personalities. Profiles could contain any number of elements (name, location, gender, favourite movie, most embarrassing moment...) chosen from a large list.

The results in general:

  • people do 'get' each other (where to 'get' a person means to guess a personality close to their actual, self-rated personality).
  • people on Facebook get each other better (this kind of figures - you'd want to go rate your real life friends).
  • women are better guessers than men - but only when guessing random strangers.
  • women are easier to get.


Psychster looked at different profile elements on the standalone website to see if the presence of any in particular was correlated with higher rates of accuracy.

Profile elements that make somebody easier to get:

  • A link to a funny video (the number one predictor of personality)
  • What makes me glad to be alive?
  • Most embarrassing thing I ever did:
  • Proudest thing I ever did:
  • My spirituality:
  • A great person:
  • I believe this:


Profile elements that make you harder to get:

  • Profile picture (but only if it is of a non-person)
  • An awful website:
  • An awful person:
  • A great book:


That last one (naming a great book making it harder to guess your personality) is pretty interesting. Dave did say that he hadn't yet done any proper analysis of why it might be. I wonder if there's any research into how much (or little) reading habits have to do with your personality? Here's a tangent (why do some people get interested in science fiction?) if you're interested. Here's another (people who read lots of fiction aren't socially awkward, in fact the tendency to get absorbed in a story correlates with empathy scores).

OK, anyway...

Why were women easier to read? Because they tended to fill out the profile elements that were good predictors ("my most embarrassing moment").

At this point you might be wondering (well, I wondered) who cares how well an online profile reflects your true personality. One answer is the online dating industry who have a vested interest in not setting you up with anybody plainly unsuitable. If profiles were set up the right way then maybe you could tell in advance if the guy or girl messaging you is worth seeing in the real world.

Sticking with the online dating theme, it turns out that the levels of agreement (between actual and guessed personalities) you get by looking at Facebook profiles approach those you see in long term acquaintances. They're certainly better than what you get after a short face to face meeting (like a date). In fact, short f2f meetings are particularly bad at helping you gauge levels of agreeableness and neuroticism - not good. I think this means that stalking potential partners online actually makes good, practical sense and should be encouraged.

In case you needed any reassurance.


Thursday, April 03, 2008

Do you use language differently when you're depressed?

Can you tell if somebody is clinically depressed by analyzing their use of language? I'm not a psychologist, so take the background info below with a pinch of salt but the topic came up at ICWSM (more on how later) and I thought it was fascinating.

In 2001 Stirman et al compared the collected works of nine poets who eventually committed suicide and nine poets who didn't (as a control set). Their theory was that the depressed (and eventually suicidal) poets would use more first person singular (I, me, my) and words related to hopelessness and desperation (hate, worthless, death, grave) and that was supported by the data.

Rude et al later found something similar when they compared essays (on a common topic - "coming to college") written by college students. Depressed students used "I" and negative words significantly more often than controls.

Interestingly, Oxman et al have found that spoken language patterns can be a good discriminator for classifying patients as depressed or not, so it's not just written language use that may be different.

Anyway, at ICWSM Nairán Ramírez-Esparza from the University of Texas presented a language analysis of some depression discussion boards on About.com. She ran a two part study: the first to confirm Stirman and Rude's findings and the second making use of the fact that the About.com boards are bilingual (there's a Spanish section too) to see how different cultures talk about depression.

Her approach was pretty simple - she collected ~ 400 posts from the depression forum and 400 posts from a breast cancer forum as a control, broke each post down into single words and then used off-the-shelf software to classify them (as verb, adjective, pronoun, positive emotion, negative emotion, etc.). She did this for both English and Spanish sections of the site.

Her results seemed to confirm the earlier studies: first person pronouns were found three times more frequently in the depression forum posts than in the controls and words relating to negative emotions occurred four times as frequently. This was true for both English and Spanish datasets.
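
A minimal sketch of that kind of word counting, in case it helps. The tiny word lists here are placeholders I've made up (seeded with the example words mentioned earlier); they are not the dictionaries from whatever software she used, and category_rates is a hypothetical name.

```python
import re

# Placeholder word lists, for illustration only.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
NEGATIVE = {"hate", "worthless", "death", "grave", "hopeless", "sad"}


def tokenize(post):
    """Break a post down into lowercase single words."""
    return re.findall(r"[a-z']+", post.lower())


def category_rates(posts):
    """Return the fraction of all words that fall into each category."""
    total = 0
    counts = {"first_person": 0, "negative": 0}
    for post in posts:
        for word in tokenize(post):
            total += 1
            if word in FIRST_PERSON:
                counts["first_person"] += 1
            if word in NEGATIVE:
                counts["negative"] += 1
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: count / total for name, count in counts.items()}


# Made-up example posts, not real forum data.
forum_posts = ["I hate my life", "the doctor was kind"]
print(category_rates(forum_posts))
```

Run it over the depression posts and the control posts separately, then compare the two sets of rates to get the "three times more frequently" style of figure.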

The second part of her study was to see if English and Spanish speakers approach depression differently; what do they talk about? She studied this by using normalized word frequency counts then grouping different words into themes.

The top five themes discussed in the English dataset:


  • Treatment (medicine, doctor, therapist...)
  • Disclosure (tell, discuss, talk...)
  • Family (mom, dad, brother, sister...)
  • Symptoms ...
  • School


And the top five themes from the Spanish dataset:


  • Family
  • Relationship history
  • Hopelessness
  • School
  • Treatment
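
Here's a rough sketch of the count-then-group step, using only the example words listed for three of the English themes. How she actually normalized the counts and assigned words to themes wasn't spelled out, so dividing by the total word count and the hand-written THEMES table are both my assumptions, and rank_themes is a made-up name.

```python
import re
from collections import Counter

# Only the example words given in the talk; the real theme
# vocabularies were presumably much larger.
THEMES = {
    "treatment": {"medicine", "doctor", "therapist"},
    "disclosure": {"tell", "discuss", "talk"},
    "family": {"mom", "dad", "brother", "sister"},
}


def rank_themes(posts):
    """Rank themes by their share of all words across the posts."""
    words = Counter()
    for post in posts:
        words.update(re.findall(r"[a-z']+", post.lower()))
    total = sum(words.values())
    if total == 0:
        return []
    scores = {
        theme: sum(words[w] for w in vocab) / total
        for theme, vocab in THEMES.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up example posts, not real forum data.
posts = ["my mom and dad talk to my doctor", "mom said tell the therapist"]
for theme, share in rank_themes(posts):
    print(theme, round(share, 3))
```

Normalizing by the total word count matters because the English and Spanish datasets won't contain the same number of words.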


I'm a bit suspicious of results that are so intuitively appealing (family and romance are more important to Spanish people?). One thing that I did wonder was how much the results are skewed by different community expectations: if you visit a discussion forum where people are sharing stories about their depression and everybody else mentions their family maybe you feel compelled to mention your family too. Maybe the English language forums are dominated by a younger age group and so older visitors shy away, or vice versa.

Anyway, it was interesting stuff. Somebody in the audience wondered aloud if this means that you could build a system to identify people at risk of depression (or perhaps more to the point suicide) by analyzing their language online. Maybe this could be built into the next version of the anti-plagiarism software used in high schools and colleges (I'm not advocating that, just saying)...


Wednesday, April 02, 2008

Analyzing MySpace profiles

This morning James Caverlee presented his study of almost two million (well, two sets of ~ one million - one set of profiles picked at random and one gathered by traversing the social graph) MySpace profiles. It was interesting stuff. Some bits and pieces below.

MySpace users live up to gender stereotypes, rather disappointingly:



Words most frequently appearing in MySpace profiles:

  • Women: love, people, dancing, life, shopping, can, girl, family, hearts, being, have, notebook, are, dance, favourite, things
  • Men: dating, sport, networking, metal, serious, football, relationship, sh*t, single, wars, straight, band, video, f*ck, guitar, gay



And geographic ones (didn't manage to write all of these down in time):

  • Users in Oregon: camping, hiking, pixies, snowboarding, wine, vegans
  • Users in Alabama: football, jesus, gospel, nascar



Demographics-wise, ~50% of the profiles that they picked at random had one or no friends (i.e. weren't active). Age-wise, the peak is at 24, with smaller peaks at 69 and 100. The 69 peak is a secret MySpace code, apparently - it means that you're interested in, uh, one-handed typing (this wasn't made clear, but I'm guessing). By having a common age - 69 - you can use MySpace's advanced search to find others looking for the same thing. 69 year olds on MySpace are most similar (in their use of language) to people in their mid thirties.

Younger users are overwhelmingly female. There is a 2:1 ratio of girls to boys at age 14. This difference decreases as age increases. The flip over point is at 20 - after that you start seeing more men than women.

About 20% of the profiles in the connected dataset were marked as 'private'. Over time this percentage is rising. Having privacy preferences set is negatively correlated with age.

He had a fantastic slide showing top terms wrt age... will post it and a link to the slideshow when it's online.


Tuesday, April 01, 2008

Tossed Salad and Scrambled Eggs

I'm in Seattle for the ICWSM. The first day just finished and I'm going to blog about the more interesting talks tomorrow when I'm more awake. In the meantime:

  • Crowdvine for conferences is actually pretty useful
  • Lions for Lambs is terrible
  • St Trinians is actually quite good
  • The Century of the Self is brilliant
  • Seattle looks really nice from the air
  • Note to self: US Milky Way bars = UK Mars bars, tricksy bastards
  • Steak + beer + bay views = awesome (thanks Deepak!)
  • More Starbuckses than normal
  • Everybody is disconcertingly friendly. People keep offering to take me skiing. And to see waterfalls. People here big on waterfalls
  • MS Live Labs are hiring
  • Brad Fitzpatrick is a great speaker but I found his talk disappointing - too much hand waving about OpenID / OAuth / XMPP / XRDS. Dude, it's a room full of social network developers, you're preaching to the converted
  • Sadly Bernardo Huberman has cancelled. Marc "most unGoogleable name ever" Smith is talking instead. Marc is either the founder of Poetry Slam (cool), a Happy Hardcore DJ (not cool) or a senior research sociologist at Microsoft Research (as yet undecided)


