Saturday, March 3, 2007

Concordancers and concordances

By Karen Stanley

A teacher asked what concordancers were and how to use them.

1) The concordancers I am talking about are software programs. You can install a concordancer on your own computer, or use one of the websites that offer concordancing online.

2) Concordancing software can be used to search a text for a particular item, much like the "Find" function in other software you may use. However, concordancing packages let you carry out a much broader and more complex range of searches.

Among other things, a concordancer collects and lists every example it finds (you can set a maximum number of items). Mine lets me click on a listed occurrence to pull up a broader context for that item. You can also have it sort the found items alphabetically by the first word following, the second word following, the word before, or the word two before.
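For readers curious about what the collecting, limiting, and sorting above involve, here is a minimal keyword-in-context (KWIC) sketch in Python. It is not how any particular concordancer works; the tokenizer, the function names (`concordance`, `sort_key`, `show`), and the "R1"/"L2" labels for sort positions are assumptions made for illustration.

```python
import re

def tokenize(text):
    # Simple tokenizer: keeps runs of letters, digits, and apostrophes; drops punctuation.
    return re.findall(r"[A-Za-z0-9']+", text)

def concordance(text, target, width=4, max_hits=None):
    """Collect (left context, keyword, right context) for each occurrence of target.

    width is the number of words kept on each side; max_hits caps the list.
    """
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
            if max_hits is not None and len(hits) >= max_hits:
                break
    return hits

def sort_key(position):
    """Build a sort key: 'R1' = first word after, 'R2' = second word after,
    'L1' = word before, 'L2' = two words before."""
    side, n = position[0], int(position[1:])

    def key(hit):
        left, _, right = hit
        # Read the left context backwards so L1 is the word nearest the keyword.
        words = right if side == "R" else left[::-1]
        # Missing words (e.g. keyword at the very end of the text) sort first.
        return words[n - 1].lower() if len(words) >= n else ""

    return key

def show(hits, width=30):
    # Print one hit per line with the keyword aligned in the middle.
    for left, kw, right in hits:
        print(f"{' '.join(left):>{width}}  [{kw}]  {' '.join(right)}")

if __name__ == "__main__":
    text = "She has made a decision. They made an offer. We made progress and made a plan."
    hits = concordance(text, "made")
    hits.sort(key=sort_key("R1"))  # alphabetical by the first word following
    show(hits)
```

Sorting with `sort_key("L2")` instead groups the same hits by the word two before the keyword. Because Python's sort is stable, hits that share a sort word stay in text order.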

You can allow for "wild card" endings or stems, or allow up to a certain number of words to come between two other words. For example, searching for 'have * been *ing' allows an adverb such as 'already' to come between 'have' and 'been', and any verb to occur as the present participle. And there is more.
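A wildcard search like 'have * been *ing' can be sketched as a token-by-token pattern match. In this hypothetical convention, a standalone `*` stands for a gap of zero up to `max_gap` words, and a `*` inside a word (as in `*ing`) matches any characters, using Python's standard `fnmatch` module. Real concordancers each define their own wildcard syntax, so treat the `search` and `match_at` helpers below only as an illustration.

```python
import fnmatch
import re

def tokenize(text):
    # Keep runs of letters, digits, and apostrophes; drop punctuation.
    return re.findall(r"[A-Za-z0-9']+", text)

def match_at(tokens, i, pattern, max_gap):
    """Try to match the pattern words starting at tokens[i].

    Returns the index just past the match, or None if there is no match.
    """
    if not pattern:
        return i
    head, rest = pattern[0], pattern[1:]
    if head == "*":
        # Standalone *: skip 0..max_gap arbitrary words, shortest gap first.
        for gap in range(max_gap + 1):
            if i + gap > len(tokens):
                break
            end = match_at(tokens, i + gap, rest, max_gap)
            if end is not None:
                return end
        return None
    # Ordinary word, possibly with a wildcard stem or ending such as *ing or decid*.
    if i < len(tokens) and fnmatch.fnmatchcase(tokens[i].lower(), head.lower()):
        return match_at(tokens, i + 1, rest, max_gap)
    return None

def search(text, pattern, max_gap=1):
    """Return every stretch of text matching the space-separated pattern."""
    tokens = tokenize(text)
    words = pattern.split()
    results = []
    for i in range(len(tokens)):
        end = match_at(tokens, i, words, max_gap)
        if end is not None:
            results.append(" ".join(tokens[i:end]))
    return results

if __name__ == "__main__":
    text = ("I have already been waiting. They have been running. "
            "We have never really been sleeping. She has been reading.")
    print(search(text, "have * been *ing"))             # gap of at most one word
    print(search(text, "have * been *ing", max_gap=2))  # also allows two words
```

Note that `*ing` also matches words like 'thing' or 'being': wildcard searches tend to over-match, so the results still need a human eye.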

3) Searchable texts are referred to as corpora (singular, corpus). I use fairly primitive corpora: downloaded newspaper articles, stored in separate files by register (Ann Landers in one, the NY Times science section in another). I also have some corpora made from tapescripts of various types of oral production.
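With corpora stored one register per file, a simple comparison across registers is to count a word in each file and normalize by file size. The sketch below assumes a folder of plain-text `.txt` files whose names serve as register labels (for example, hypothetical files `ann_landers.txt` and `nyt_science.txt`); `load_corpora` and `frequency_by_register` are illustrative names. The rate per 1,000 words is a common way to compare corpora of different sizes.

```python
import re
from pathlib import Path

def tokenize(text):
    # Lower-case word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())

def load_corpora(folder):
    """Read every .txt file in folder; the file name without .txt is the register label."""
    return {p.stem: p.read_text(encoding="utf-8")
            for p in sorted(Path(folder).glob("*.txt"))}

def frequency_by_register(corpora, word):
    """Return {register: (raw count, rate per 1,000 words)} for a single word."""
    word = word.lower()
    result = {}
    for register, text in corpora.items():
        tokens = tokenize(text)
        count = tokens.count(word)
        rate = 1000 * count / len(tokens) if tokens else 0.0
        result[register] = (count, rate)
    return result

if __name__ == "__main__":
    # Hypothetical folder layout: corpora/ann_landers.txt, corpora/nyt_science.txt, ...
    corpora = load_corpora("corpora")
    for register, (count, rate) in frequency_by_register(corpora, "think").items():
        print(f"{register:15} {count:6d} {rate:8.2f} per 1,000 words")
```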

The British National Corpus is huge and covers an enormous range and volume of English text, both oral and written; I haven't used it myself. I know that CHILDES (the Child Language Data Exchange System, which I also haven't used) involves child language production.

I believe the University of Michigan is making available a corpus of English produced by ESL learners (with an eye in particular to research on second language acquisition). Many corpora are 'tagged', which means someone has gone through and labeled (tagged) certain parts of the text for function, so you can use the tags in a search to narrow or widen its range. I have never worked with a tagged corpus, but I imagine that at least some parts of speech are used as tags, and surely other things I haven't thought of or don't know about.
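To make the idea of tagging concrete, here is a sketch that searches a part-of-speech-tagged text. It assumes a simple `word/TAG` format with Penn Treebank-style tags (JJ = adjective, NN = singular noun, NNS = plural noun); real tagged corpora use various file formats and tagsets, so the format and the `parse_tagged` and `find_tagged` helpers are illustrative assumptions.

```python
import fnmatch

def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for item in text.split():
        # rpartition splits on the last slash, so the tag is whatever follows it.
        word, sep, tag = item.rpartition("/")
        if not sep:  # untagged token
            word, tag = item, ""
        pairs.append((word, tag))
    return pairs

def find_tagged(pairs, query):
    """Find runs of consecutive tokens matching query, a list of
    (word pattern, tag pattern) pairs; '*' is a wildcard in either."""
    hits = []
    n = len(query)
    for i in range(len(pairs) - n + 1):
        window = pairs[i:i + n]
        if all(fnmatch.fnmatchcase(w.lower(), wp.lower()) and fnmatch.fnmatchcase(t, tp)
               for (w, t), (wp, tp) in zip(window, query)):
            hits.append(" ".join(w for w, _ in window))
    return hits

if __name__ == "__main__":
    text = ("She/PRP made/VBD a/DT quick/JJ decision/NN ./. "
            "The/DT final/JJ decisions/NNS were/VBD hard/JJ ./.")
    pairs = parse_tagged(text)
    # Any adjective (JJ*) followed by a form of 'decision' tagged as a noun (NN*).
    print(find_tagged(pairs, [("*", "JJ*"), ("decision*", "NN*")]))
```

Searching on tags lets one query cover a whole class of words (every adjective before 'decision'), which a plain word search cannot do.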

The Cobuild dictionary was produced using corpora, and their website offers a "Corpus Concordance Sampler" and a "Collocation Sampler." They also (if you scroll down past the place to enter your search) have some instructions on how best to go about these kinds of searches.
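A collocation sampler, roughly speaking, reports which words tend to occur near a given word. The sketch below only counts the words within a fixed span on either side of a node word; it is not the Cobuild tool's method, and the `collocates` function and its `span` and `top` parameters are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Lower-case word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())

def collocates(text, node, span=2, top=10):
    """Count words occurring within `span` words to the left or right of `node`.

    Returns the `top` most frequent (word, count) pairs.
    """
    tokens = tokenize(text)
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(w for w in window if w != node)
    return counts.most_common(top)

if __name__ == "__main__":
    text = "Make a decision. A difficult decision is a decision you make slowly."
    for word, count in collocates(text, "decision"):
        print(word, count)
```

In this example the article 'a' tops the list. Function words dominate raw counts, which is why collocation tools commonly filter them out with a stoplist or rank candidates with association measures such as mutual information or t-score.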

Corpus linguistics has become very big in the last few years; you see more and more research and conference presentations that involve it.
