Khaos

Defining Collocates

Now that Marty has decided to write a simple Perl script to pull collocates out of data for me, I need to give him a more precise specification of a collocate. Carmen Dayrell wrote a paper, “A quantitative approach to compare collocation patterns in translated and non-translated texts”, which contains a detailed section on how to decide what a collocate is.

The first step is to work out which words should be taken as nodes, but as I am interested in specific nodes, like the word “Perl”, I will not be doing this. Then we need to decide how to define a collocate. Dayrell suggests that a word should co-occur with the node at least 4 times to count as a significant collocate, within a span of up to 4 words on either side of the node, and that structural boundaries in the text should be ignored.
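To make that definition concrete, here is a rough Python sketch of the counting it describes. Marty's script is in Perl and may well work differently; the function name `collocates`, the tokenisation rule, and the choice not to count other occurrences of the node as collocates are my own assumptions, not anything from Dayrell's paper. Treating punctuation as invisible, so a window runs straight across sentences, is my reading of "ignore structural boundaries".

```python
import re
from collections import Counter


def collocates(text, node, span=4, min_freq=4):
    """Return (word, count) pairs for words found within `span` tokens on
    either side of `node` at least `min_freq` times, most frequent first.

    Hypothetical sketch: tokens are lowercased runs of letters, digits and
    apostrophes; everything else (including sentence punctuation) is dropped.
    """
    tokens = re.findall(r"[\w']+", text.lower())
    node = node.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Up to `span` tokens to the left and to the right of this occurrence.
        left = tokens[max(0, i - span):i]
        right = tokens[i + 1:i + 1 + span]
        # Other occurrences of the node itself are not counted as collocates.
        counts.update(w for w in left + right if w != node)
    # Keep only words that reach the frequency threshold, most frequent first.
    return [(w, n) for w, n in counts.most_common() if n >= min_freq]
```

Each window is counted separately, so a word that sits between two nearby occurrences of “Perl” is counted once for each of them.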

While Marty does this, I am going to read the work that Church and Hanks did on word association norms and mutual information to see if any of that will help me get better results.
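The measure behind Church and Hanks's association ratio is (pointwise) mutual information, which compares how often two words actually occur together with how often they would co-occur by chance: I(x, y) = log2(P(x, y) / (P(x) P(y))). Estimating each probability as a count divided by the corpus size N turns this into log2(f(x, y) · N / (f(x) · f(y))). A minimal Python sketch follows; the function name `association_ratio` is hypothetical, and unlike Church and Hanks, who count x *followed by* y within a window, it ignores word order and assumes the co-occurrence count has already been gathered.

```python
import math


def association_ratio(pair_count, node_count, collocate_count, total_tokens):
    """Pointwise mutual information in bits, estimated from raw counts.

    log2(P(x, y) / (P(x) * P(y))) with P(x, y) = f(x, y) / N,
    P(x) = f(x) / N and P(y) = f(y) / N simplifies to
    log2(f(x, y) * N / (f(x) * f(y))).
    """
    if min(pair_count, node_count, collocate_count, total_tokens) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(pair_count * total_tokens / (node_count * collocate_count))
```

For example, `association_ratio(4, 8, 16, 1024)` gives `5.0`: the pair turns up 32 times more often than chance would predict. A score near zero suggests the words are independent, and a negative score means they avoid each other. Because the estimate swings wildly when counts are tiny, a minimum frequency such as Dayrell's threshold of 4 is still worth applying first.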

One Response to “Defining Collocates”

  1. Marty was here! - Working with collocates Says:

    […] that Karen is expecting me to write more Perl scripts to analyse collocates I think it’s time to install […]

Perl Collocates

My linguistics course contains lots of really interesting material but unfortunately has really boring assignments. The last assignment was so awful that I considered giving up the course as I didn’t want to spend my spare time on something I wasn’t enjoying. To help with the tedium I decided to find something to do with the new knowledge that actually interests me.

I have been reading about collocates: words that are typically grouped together, such as “law and order” and “fish and chips”. What interests me is the introduction of new collocates. I read a study by Fairclough, who had analysed 53 speeches given by Tony Blair. The word “new” occurred 609 times and the most frequent collocations were “new labour” and “new deal”.

I am also interested in the Perl community: how it is perceived and how it perceives itself. If I analyse the blogs of various members of the community, what are the collocates of “Perl” going to be? Some are going to be obvious, such as “Perl community” and “Perl 6”, but what unexpected ones will I find? And what has changed in the last few years? What did we talk about in the past that is no longer important to us, and what is the latest thing to be linked with Perl?

8 Responses to “Perl Collocates”

  1. Stray Taoist Says:

    How do you define the Perl *community*? Those with the loudest voices? The self-aggrandising, self-publicising, self-appointed spokespeople?

    Is it a self-perpetuating group? Are those who just see it as a day job not allowed in? (How does the *community* see the *wider* community?)

    I would be interested in your collocation research, and indeed, it might be fun to apply such linguistic techniques to *any* group, because groups use language in a very specific way: to include, exclude and obfuscate. (As the Blair terms are meaningless when broken down. I digress.)

    There are quite a few decent books on such linguistic (with no meaning) constructs. Although I guess not all collocates are meaningless. Just the ones from lying scumbag politicians. I digress. Again. I should be making the dinner.

    I expect a follow up post with your results. 🙂

  2. Marty was here! - Perl Collocates Says:

    […] and I were talking about linguistics and textual analysis, and how she wanted to analyse the writings of the Perl community. So, to make a start we decided to write a short Perl script to extract word level n-grams from […]

  3. Stray Taoist Says:

    That’s me told then.

  4. Stray Taoist Says:

    /me decides to read his feeds in a different order in future.

  5. karen Says:

    I did write my post before Marty – so you read them in the correct order. Marty was showing how he was planning on pulling these out of data and now we need to find a suitable corpus of data. And I still need to blog some more about the collocation papers I have been reading.

    I am surprised that you think that the collocates that politicians use are meaningless. They give us insight into the way in which they are trying to make the electorate think. Mills, in The Art of Persuasion, claims that the word “new” is one of the 16 words (out of half a million or so words) that really grab the attention of the listener. He also says that the top word in this list is either “new” or “free”. Attaching the word “new” to the name of your party is a very clever move.

  6. Stray Taoist Says:

    Of *course* words from politicians are meaningless. They deliberately devoid them of all sense. (Let us not start on the obvious, whereby tacking ‘social’ on the front of *anything* makes it meaningless.)

    There is a difference between rebranding (shoot me now) a party by reappropriating language. Perhaps that is better than meaningless. Perhaps I really mean they misappropriate words, thereby making them meaningless. Gordon Brown (if you follow UK politics) is a master of making Stalinesque (in the pattern of speech, as well as the intent) speeches, which in the end say nothing, mean nothing but at the same time remain utterly dangerous to our way of life.

    Oh, wait, I should read properly before I start off. *the collocates that politicians use…* Right. Hmmm. Mayhaps I shouldn’t let my anger at the tricks pulled by politicians get in the way of my (usual, natch) clear thinking, eh? 😀

    The Art of Persuasion, eh? Haven’t read it. Added to the list.

    Is Marty planning on releasing some of this code? Maybe I should ask him that, eh?

  7. karen Says:

    Tony did send me a link to an interview with Gordon Brown regarding ID Cards and I was horrified that I couldn’t understand anything he was saying. Lots of words but no sense at all.

    I asked Marty about the code (because you know he rarely gets round to reading emails from people) and he did say that he was planning on releasing all his code under GPL 3. He giggled after this and I’m really not sure why.

  8. Khaos » Blog Archive » Perl Collocates: Finding Data Says:

    […] haven’t forgotten my earlier post where I stated that I wanted to find out what collocates of “Perl” were being used by […]

TLUG: Demonstration of e-paper Readers

At today’s TLUG meeting Jim Maricondo gave a talk about writing applications for the iRex iLiad. He brought two with him, as well as two prototypes of Fujitsu’s FLIPea, the first colour e-paper reader.

I was surprised by how good the iLiad looks and, when looking at one page of text, did think that I could read books using one of these. It weighs a lot less than the book I am currently reading, and since it doesn’t have a back-light it seemed that it wouldn’t be tiring on my eyes. But then I changed the page. The refresh involves the screen going completely black, then white, and then the text appears. It looks like it flickers, and given that I read quickly it would be refreshing maybe two or three times a minute, which would be really annoying.

It has given me hope that in the future there will be a way for me to replace paper books. In Japan my books are getting damp and I don’t have the room I used to have to store them. I also find it difficult to carry them when I am travelling, and it would be great to have a light-weight way to carry the text of multiple books.

The FLIPea was certainly interesting to look at but is in no way useful for reading something like a book, as the refresh takes 15 seconds when you change page. It’s also surprisingly hard to read, as the contrast isn’t particularly good indoors. But it is supposed to look really good in natural sunlight, and I imagine that it will be used to display adverts and pictures to begin with rather than pages of text.

Bad Kitchen Day

Sometimes women complain about having a “bad hair day” but today it was the kitchen that drove me mad. I had forgotten what a mess it was as the living room was tidy and it lulled me into a false sense of security about the rest of the apartment. But when I walked into the kitchen I realised that we had used every plate, cup, pot, and spoon that we own and that if I didn’t do something soon the fungus monster would start to grow.

I put on my cleaning music and decided to just get on with it. But then a Prince song came on to sabotage my plans. I have a play-list of music that I only put on if I want to dance or clean. “When Doves Cry” doesn’t fit with that list as it makes me want to stop and listen (and it really doesn’t take much to make me want to stop cleaning). Thankfully the next song was something terrible by Kylie so I could continue with my cleaning. Now all I need to do is find music that makes me want to cook the dinner…

2 Responses to “Bad Kitchen Day”

  1. chrissy Says:

    he,he,he

  2. karen Says:

    No wonder you’re laughing. But just wait till you get here in the summer and then you can help sort it out for me 🙂

I'm No Gentleman

I am starting to get overly sensitive about the use of gender specific terms in technical blogs. I really don’t mean for this to happen but tonight, when reading Schwern’s use.perl journal, I did wonder why he had to use the phrase “Gentlemen, start your RSS readers” as the word “gentlemen” makes me feel excluded. I assume that this is based on a quote “Gentlemen, start your engines” and I know that Schwern is not in any way saying that women shouldn’t subscribe to his RSS feed but I did notice it – and I’m not convinced I would have a year ago. So something has changed.

It could simply be that studying language has made me more aware of the words that people use or that in 2007 I read a lot of posts about gender and sexism. It could also be because Schwern’s post is about a new blog that discusses geek communication which made me look more critically at how he was communicating the news.

I don’t find the phrase offensive, but it did make me stop and read the line again and wonder if there was a better way to have said it.

One Response to “I’m No Gentleman”

  1. Tony Says:

    Well, with more females in auto-racing these days, the quote is generally “Ladies and gentlemen, start your engines” now anyway.

Lost Time

I haven’t got anything done this weekend but I do know where my time went. Marty and I watched Season 3 of Lost. And even when we finished it last night instead of going to sleep I spent hours reading various Lost web-sites to see what the rest of the world thought about the series.

3 Responses to “Lost Time”

  1. SWM Says:

    And what did you think about it..

    I thought the first half was terrible to be honest. The whole introduction of the two characters that were killed off very quickly was good though.. And I am glad that a certain hobbit met his end! 🙂

    I think the second half was good.. But they are gonna have to do something special in season 4, as I hated the filler/red herring eppys.

    However, with the commies striking in Hollywood I doubt we will get to see the end of season 4 until this time next year! 😉 I think they have shot a full 8 eppys, but no more. 🙁

  2. karen Says:

    I thought the first 5 or 6 episodes were fairly annoying as I really didn’t like all the stuff with Sawyer and Kate in the cages.

    Marty really liked the Nicki and Paulo story but I wasn’t sure what I made of it. But I did like the introduction of some of the new characters from the other side of the island and I really want to know what they are going to do with Locke, Jacob, Ben and Desmond.

    I also was glad that Charlie died as his character was really annoying and always whingeing – and the way he died was good.

  3. SWM Says:

    Ahh the whole thing about Nicki and Paulo was great. They were introduced really badly, and the Abrahams dude decided to kill them off really quickly as peeps on the interweb were yapping about them. They were a sort of scapegoat of the first half of the 3rd season!

    Locke, Jacob and Ben are very interesting. I think it would be awesome if Locke ended up leading The Others. Don’t know where they are going with Desmond.

    Yes the hobbit is dead, let us partiiieee!

Japanese Hotel Room Descriptions

I come across strange English in Japan all the time, but usually it’s just bad spelling (like the restaurant last night serving “plane croissants”). I was looking on-line at hotels at Universal Studios Japan and saw the following description of a room:

We present you with the time for nesting peacefully your wing of dream fluttered at the Park.

Beautifully spelt words that don’t make much sense when put together like that.

2 Responses to “Japanese Hotel Room Descriptions”

  1. Chastity Says:

    Hilarious! I laughed for about ten minutes, and I keep coming back to read it again.

  2. Jessica Marie Says:

    I still have a few little-girl Japanese journals from the dollar store that are filled with phrases like “Let’s do the tennis together!” “Refinedley made for the success of your outstanding cause” and “My heart is flam able when I see your beautiful eyes.”

Perl Buzzing

I was slightly baffled by the level of disgust Perl Buzz reported over the title of the Linux Journal article regarding the release of Perl 5.10. To me the headline was simply an editor using innuendo and word play to attract attention to a rather dull press release. I have been trying to work out if there is some cultural difference I am missing as, although the phrase “put out” is pejorative, I certainly don’t think it’s offensive enough to warrant that reaction. Maybe to an American like Andy (the author of the Perl Buzz post) it has a stronger meaning than it does to me coming from Northern Ireland.

I am also confused by the things that Perl Buzz are calling on the editors of the Linux Journal to do, especially to “explain to us what will be changing at Linux Journal so we think of LJ as worthy of our time, trust and readership.” Who is the “us” that Andy is referring to? Is it the editors at Perl Buzz, and if so, why on earth would the Linux Journal ever feel the need to explain anything to them? Perl Buzz can’t possibly assume that it speaks on behalf of the entire Perl or Open Source community, and I find it detracts from anything sensible they are saying when they write as if they do.

Is the article merely an attempt by Perl Buzz to create more of a “buzz” about their blog? Because if their article is really about how wrong they think it is for the Linux Journal to print such headlines and ads, why on earth are they reprinting them on the front page of their blog, so that someone like me, who never reads the Linux Journal, gets to see the material?

3 Responses to “Perl Buzzing”

  1. Tony Says:

    I always like it when an “anti-sexism” rant contains lines like “material offensive to women and the rest of the computing community.”

    But I’m also confused as to why it’s the Open Source Community™ that needs an apology. I could understand (though disagree with) thinking that the Perl Community deserves one for not treating the big new release with the respect it deserves. Or even suggesting that any use of innuendo necessitates an apology to All Women Everywhere. But where does the Open Source Community fit in? Why is it they who should be apologised to?

  2. Andy Lester Says:

    The apology isn’t so much about the sexist commentary as it is pissing on the work that we’ve been doing. Looking back I should have made the distinction clearer.

  3. Endrew Says:

    Is he American? Wasn’t it Americans complaining about “perl is my bitch” t-shirts? Aw bless, maybe they need a humour transplant.

    I hadn’t seen the headline until I read about it here. I thought it was very funny. Anyone offended by that seriously needs to get over themselves.

富士山 (Mount Fuji)

I’ve been living in Tokyo now for just over a year and had never seen Fuji from the city. But this morning, on the train from Shibuya to Jiyugaoka, the sky was clear, the train was empty and there it was. It’s beautiful.

Quiet New Year's Day

I have been trying to remember what we did with Marty’s family over the past week, but the various outings are all starting to get mixed up in my mind. Today Tokyo was quiet and empty. It almost felt eerie walking around Ginza with the shops shut and nobody waiting at the crossings. New Year feels very different here. Last night there were no car horns beeping, no fireworks, and no drunk people out yelling in the streets. The stillness today reminds me more of Christmas Day than New Year’s Day.