Relevant Term Selection Example

This is an example of relevant term selection. Part of the program I work on (the Generator module) has to select description and question terms to build cases for a case-based information retrieval system. With CBR Express and the Generator option (I prefer to call it Zippy), you can automatically index and retrieve text and doc files. Think of it as a librarian for the text on your disk.

For this example, I grabbed some articles (251, to be exact) from talk.bizarre, a popular netnews group. I put each article into a separate text file and stripped every header except the Subject: line. I then indexed the files with Generator. Below are some of the terms it selected as relevant for articles by people who posted more than 5 articles to the group in this sample. Where it selected the same term for more than one article by the same person, I've tried to remove the duplicates. One way to think of this example is that these are the terms that best distinguish these authors within this group.
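To make the idea of "terms that best distinguish these authors" concrete, here is a minimal Python sketch. It is not Generator's algorithm, which this post does not describe. It assumes a simple TF-IDF-style score computed per author, and the function name, tokenizer, and sample data are all made up for illustration.

```python
import math
from collections import Counter


def distinctive_terms(docs_by_author, top_n=5):
    """Rank each author's terms by a TF-IDF-style score (illustrative only).

    docs_by_author maps an author name to a list of article texts.
    """
    # Term counts per author, using a naive lowercase whitespace tokenizer.
    counts = {
        author: Counter(word for text in texts for word in text.lower().split())
        for author, texts in docs_by_author.items()
    }
    n_authors = len(counts)
    # Number of authors who use each term at least once.
    author_freq = Counter(term for c in counts.values() for term in c)

    result = {}
    for author, c in counts.items():
        total = sum(c.values())
        # Frequent for this author, rare across authors => high score.
        scores = {
            term: (n / total) * math.log(n_authors / author_freq[term])
            for term, n in c.items()
        }
        # Sort by descending score, then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[author] = ranked[:top_n]
    return result


# Hypothetical sample data, not taken from the talk.bizarre articles.
sample = {
    "alice": ["the ferret ate the ferret food", "the ferret slept"],
    "bob": ["the jet engine roared", "the jet landed"],
}
print(distinctive_terms(sample, top_n=2))
```

In this toy sample, "the" scores zero because both authors use it, so it never outranks a term that only one author uses. A real indexer would also need stemming, phrase detection (note the two-word terms in the lists below), and a stop list.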

Jenine Abarbanel
caller, trail, jet, partner, bike, fantasize, coke, crowd, mountain, love, redd, rooster, deer, research partner, orchard, volleyball, Hulpit, fort, sky, come, times, valley, research, shaking, blouse, jet engine, brain lock, page, saw, diet, behind, socks, morning, stop, stain, festival, breakfast, sun, help, coming, hour, surprise, water, dry, lock, special, bucket, pill

AjD, coke machine, coke, love, Jul, machine, temporary story, Williams, Gooley, Stefan Lorant, Cutler Andrews, Twist-off, RICHH, summary, photograph, stick, sat, diner, given, find, happen, guy, story, peace, pronounce, central, HWRNMNBSOL, site, hope, dick, astronaut, habitat, firm, deep, choose, spend, content, begin, kill, public, morning, curve, apartment, join, help, voice, war

Ranjit Bhatnagar
trespasser, miscellany, Denver, gradient, lunch, rest, curve, hitch, ship, car, Jupiter, move, times, trip, begin, driving, science, bitter, hope, engine, dream, magazine, dark, travel, loss, approach, Cataldan, bitter loss, predation, learned predation, seatbelt, telescope, CATALDAN, outskirts, Shoemaker-Levy, stuffing puff, stuffing, souvenir, parted company, cargo van, persist, awaken, anecdote, traded anecdote, windshield, cough, company

Bill Bill
gabble, ship, bubble, bead, see, caffeine, beads, bead device, smile, car, feel, creak, steering, hold, bill bill, speed, trail, splash, behind, water, whistling, love, device, float, Montana, buying, Huh, turtle, times, meet, hands, falls, silence, river, lane, dead, saw, sat, given, chest, squeak, tree, spin, ship float, life bubble, cry, board, slide

Caitlin Burke
refrigerator, bramble, Caitlin, goat, tape, cream, Gisela, Dissidenten, Washington, volt, Gooley, Threshhold, Ogolelo Mesto, Pankow, poison ivy, World, Worlock, food, dad, Larry, wire, paddock, electrify, electric tape, dead, sat, rest, beauty, player, felt, poison, delicacy, assembly, light work, fiddle, side, stick, dream, sleep, mark, using, walking, car, female, hold, lots, complete

Roger Carasso
CARASSO, DAVID, ROGER, comrade, founder, revolutionary, page, national, INTERNET, reunification, home page, liberation, leader, represent, go, moderator, web, party, CARASSIAN, found, why, independence, respected, entire set, culture, entire, official, socialism, democratic, country, WORLD, victory, joy, WORKERS', brilliant, outstanding, page giving, ACTIVITIES, central, Hey, grand, giving, army, cause, Clinton, GIF, jail

Kevin McAuley (aka Chevyn, beelzebub)
tampon, bleed, BINKY, MEE, Fallout, Soren Ragsdale, men, Jul, car, salesman, wish, KWESHUN, PLAIY, sere, Dave Rhoades, number, real, path, caller, video, write, advice, saw, words, using, walk, clue, ring, assignment, rental, ball, money, show, entire, name, falls, student, miss, drive, attention, novel, female, saying, mouth, morning, town, street, dance

Larry Doering
tattoo, LJD, CFB, Yup, Hey, Larry, Joel, Stacey, Introl, Norskog, dinosaur, mike, feel, California, Gordon, Heh, kelp, dead, house, John, write, name, draw, guy, Jew, Yeah, making, blow, Atlanta, Aaron, Winkoff, Whoooo, Cal, Mindy, Pogo, Grego, Puerto Rico, Sheeooo, Artisputtingtogether, Starcap'n, Washington, Del Delatorre, Mullineaux, T-rex, Gaye Bikers, Ginevan, Korea, Yoyodyne Satcom, Wanna

Ron Echeverri (aka RONE, 5150)
Csp, Dorothy Carasso, Andrew, hope, dead, Carasso, wood, show, heart, falls, student, kid, Toto, whites, batting, optimist, HAHAHHAHAH, centerline, pearly, pearly whites, jimmy, Lloyd, Hoover, house falls, batting cage, battleship, jimmy carte, woodsman, wizard, carte, Philip Heggie, sing, fine, write, monkey, standing, snarl, flying, sense, scarecrow, fine place, edge, lion, breakdown, happy thought, feeling, state, make sense

Dave Filippi
Dave Filippi, behind, times, scroll, rest, Taiwan, Europe, girl, meet, gate, autumn, leave, dream, infinity, thinking, Internet, metaphor, terror, announcement, explain, hold, knowing, set, guy, envelope, feel, edge, conversation, food, include, window, college, dad, ways, insert, book, top, begin, boring, prey, remind, desire, tree, friend, come, spring, happen

Zvi Gilbert
Zvi, love, shave, desecration, money, water, guess, waves, bill, dream, pain, ken, human, wings, hair, playing, men, starfish, beating, quantum speck, teakettle air, teakettle, trickle, speck, Burma, TMBG, throb, shard, mother waves, cocaine, toughen, wind, Pennies, air teakettle, werewolf, fluid beating, human juice, brain fluid, strychnine, mom run, fluid, marvellous thing, purr, trace cocaine, moving, waver, mean, father

Mark Gooley
RASTUS, CHLOE, partner, receptive partner, circumference, dat, plastic, penetrating partner, Bugbee, alteration, Cranio-vaginal, remote control, mark, find, manual, method, produce, anus, danger, removal, penetration, bones, socket, end, steel, off stage, Rastus, Lawdy, entertainment, rigid support, denture, snore, orifice, Peggy, anal sphincter, Emacs, pelvic bones, Amerinds, Jane, drawback, sexual act, orphan, singing, sphincter, physical therapy, PTFE

Dan Johnson (aka Crisper Than Thou, The Elder Dan)
Dan, piss, stop whistling, crisp, whistling, Johnathan Vail, boom tomorrow, Annie, assistant, Jesus, assistant defense, resemble, brash assistant, defense attorney, attorney, Johnathan, Version, lovely, find, times, evidence, tomorrow, Caitlin, young, stop, dead, hug, heart, love, Yeah, feel, media, Ooooh, jolly, Wagner, Laura, Starcap'n, antifreeze, Santa Cruz, Kaite, rancher, Gimme, coolant, Kiva, Tuesday, Vail Subsitute

Nikolai Kingsley
rewind, beep, ring, Angela, Theander, translator, interview, guy, desk, phone, Jeanette, wastrel, phone call, publisher, department, magazine, behind, house, pinhead mask, name, machine, mike, German, wonder, grey, thank, fee, hair, mask, saw, sat, elevator, given, mountain, offer, rest, problem, fiddle, pinhead, window, follow, woman, Midwinter, swig, Penelope Keith, Anacronism, pagan

Chris Lugo? Cathy? The Strawberry Blonde?
booty, Cathy, butt, baby, blonde, song, really, back, sunshine, course, confession, strawberry, butt song, band, shake, love, Lord, Christ, Lord, Jesus, bars, treasure, grand, gift, hearts, Jerusalem, emancipation, reconciliation, death, sin, smile, angel, share, angel inside, times, saw, silence, tears, Spirit, Ukn, heart, inside, peace, valley, mountain, Jesus Christ, Paraclete, CDT, prison bars, snow, joy, feel, ground, dead, soul, fear, earth, prison, son, given, find, hope, praise, words

Miles O'Neal
hacker, net, page, ferret, cop, dead, Togglebolts, sexton, Richard, Colonies, Armageddon, stick, AOL, Taxes, Barbie, forehead, prey, monitor, email, cabal, payment, low, saw, web, leap, door, behind, guy, final, fear, darkness, lurker, wire, real, house, face, web page, lunch, hover, frustration, site, hope, committee, study, flame, rate, become, card

Bryan O'Sullivan
Wuh, bos, Bryan, herd, herd cat, Glasgow World, compute, silk, math, Elford, university, wife, wonder, email, sat, department, food, science, bolt, silkworm, diner, taxi, Jesus, inside, words, cell, begin, train, Thomas, leave, hands, felt, tears, spoke, sleep, whisper, Yeah, door, lay, team, saw, sleeping, morning, chest, voice, times, smile, face

John Perry
Oregon, John, pub, Mustaphas, wolf, page, mph, Cathy, facial, Chris, vote, video, web, chocolate, poetry, coat, Irving, handbook, lyrics, facial hair, Los Angeles, pinata, hair, angel, Larry, info, net, city, home page, highway, speed, diary, care, answer, gold, band, complete, trick, playing

Brian Rapp
DAUNTINE, Les Bizarrables, Brian Rapp, VALSEAN, Renee, hacker, ferret, lovely hacker, Valsean, hack, Blairius, times, Sean Valsean, thread, lovely, Don't, tam, Stat, tar, net, Lord, words, Valsean Forgiven, GORMAN, Wizardier, Schemed, Gorman, end, swear, scheme, clue, dead, Clifton, fish, metaphor, fight, WOMEN, dearie, CRONE, TANKGIRL, cod, STARCAPRA, lurker, name, sell, flame

Paul, RICHH, driving, Vegas, love, Jupiter, west, Yeah, Gonna, Wayne, Ethelbert Nevin, Memphis, Terri, Kansas, Slurpees, Disneyland, Tucumcari, Pennsylvania, Ferrara, Gotta, Purdue, Don't, thinking, given, car, offer, end, street, follow, Pittsburgh, real, Andrew, ball, tomorrow, listen, Sho, net, firm, sight, judge, country, dark, taking, breakfast, location, mason, hairpin, brew, gaia

Eric Scheirer
Abmaj, media, Ebmaj, Eric Scheirer, rhyme, Grice, Beckwith, Simpson, guy, hair, sleeping, Gotta, spy, net, looks, cause, noose, sleeping pill, suicidal jazz, sofa, Marge, Marge, sofa cause, comma, commons, kook, Coltrane, international spy, Boston, terrorist, blade, razor, birdwatch, razor blade, highjack, bassoonist, flatware, duck, pill, philosopher, contemplate, play, brush, plane, jazz, cat hair, juice

Greg Seidman (aka Anthropohedron)
Anthropohedron, Houston, kilo, Xterminal, Hey, orbit, earth, Netherlands, Gosh, Tupperware, building, Expired, up, desk, site, beep, party, radar, astronaut, copyright, inside, Internet, indigo, come, flicker, plastic, hate, conversation, evening, expand, outside, balloon, fax, leave, remains, web, pressure, hold, public, hassle, watching, catch site, molecule, mere, sealed containing, Charles Boni, Albert

Jeff Vogel
Jeff Vogel Rutgers, math, E-mail, student, tiger, error, blow, teacher, algebra, BURMA, blackboard, burning, Wasn't, channel surfing, World, Aaaarrrggghhhh, surfing, Simpson, times, young, class, saw, shave, death, desk, hide, set, scream, blood, problem, giving, bent, flesh, woman, GIF, understand, professor, ball, face, cane, deliver, hope, tomorrow, continue, ritual, channel, desecration

James Woodyatt (aka strychnine)
fantasize, masturbation, problem, find, dead, Yeah, men, Nope, semen, orgasm, It'd, touch, share, food, times, dick, Kennedy, clean, difficulty, scenario, raw, words, mechanics, death, stop, experience, task, acid, peace, follow, void, feel, driving, danger, mean, hate, real, central, lunch, example, guess, capacity, continue, committee, climax, answer, jerk, technology

R.P. White
Kibo, Jehovah, Elohim, JEHOVAH, ELOHIM, labor, KIBO, earth, expert, Brethren, beautify, Taylor, dry land, Internet, answer, CARASSO, loser, dry, form, appear, response, joy, email, fowl, animal life, manner, report, kind, animal, mountain, name, water, machine, land, insect, beast, stare, bring

This post is copyright 1994 John Perry. Any rebroadcast or republication is prohibited without my express written consent. Write to me with your comments or usage requests.
