Poynter Online

E-Media Tidbits
A group weblog by the sharpest minds in online media/journalism/publishing


Friday, August 4, 2006


Posted by Amy Gahran 6:39:16 PM
Topic Mining: Digging Value from News Archives

Sorting unstructured information by topic is harder than you might think. People do it well, but so far machines have had trouble with this task.

Photo: Dr. Padhraic Smyth, one of UC-Irvine's topic-mining researchers. (Univ. CA-Irvine)
According to the popular tech blog Ars Technica, researchers at the Univ. of Calif., Irvine recently used software to sort 330,000 archived New York Times articles from 2000 to 2002.

This software was designed "to find patterns of words which occurred together. ...Once these word patterns were indexed, the software then turned them into topics and was able to construct a map of such topics over time. The team's example is a set of words that tended to appear in the same article: rider, bike, race, and Lance Armstrong. The topic for this story would obviously be the Tour de France, and the software could use its word patterns to chart how often the bike race was discussed in the newspaper."
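To make the idea concrete, here's a toy sketch in Python of the two steps described in that quote: counting which words turn up together in the same article, then charting how often a cluster of words shows up per year. To be clear, this is not the UCI team's software or their statistical method. It is plain co-occurrence counting, and the sample articles, the stopword list, the function names and the "at least three matching words" threshold are all made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy archive: (year, text) pairs invented for this example.
ARTICLES = [
    (2000, "rider wins bike race as armstrong leads the tour"),
    (2001, "armstrong rider bike race tour stage"),
    (2001, "senate vote on budget bill"),
    (2002, "bike race rider armstrong wins again"),
    (2002, "tour stage race rider bike"),
    (2002, "budget bill passes senate vote"),
]

# Assumed stopword list; a real one would be much longer.
STOPWORDS = {"the", "as", "on", "a", "of", "again"}


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords; return distinct words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def cooccurrence_counts(articles):
    """Count how many articles each unordered word pair appears in together."""
    pairs = Counter()
    for _, text in articles:
        # Sorting makes each pair's key order predictable, e.g. ("bike", "race").
        pairs.update(combinations(sorted(tokenize(text)), 2))
    return pairs


def topic_trend(articles, topic_words, min_hits=3):
    """Per year, count articles containing at least min_hits of the topic words."""
    trend = defaultdict(int)
    for year, text in articles:
        if len(tokenize(text) & topic_words) >= min_hits:
            trend[year] += 1
    return dict(trend)


pairs = cooccurrence_counts(ARTICLES)
cycling = {"rider", "bike", "race", "armstrong"}
trend = topic_trend(ARTICLES, cycling)

print(pairs[("bike", "race")])  # articles mentioning both words
print(trend)                    # cycling-topic articles per year
```

Notice that a person still had to hand-pick the "cycling" word set here. Real topic-modeling software infers those word groupings from the collection itself, and that is the hard part.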

According to a Univ. of Calif. press release, "UCI researchers didn't invent topic modeling, but they developed a technique that allows the technology to be used on huge document collections. They also are among the first to demonstrate its ease and effectiveness by applying it to a newspaper archive."

What's not clear is how accurate this automated sorting was compared to human-compiled topic maps of the same content. Still, even if this technique is less accurate than what librarians can do, it could be a time-saving starting point for indexing large document collections, much like Podzinger and Podscope (mentioned earlier).

Now I wonder if we could tweak this software to comb through the Federal Register, Congressional Record, or the Thomas legislative database to quickly locate all the buried riders and clauses on particular issues, regardless of how cryptically they're phrased... Well, I can dream...

(UPDATE AUG. 7: It turns out that another team of researchers is trying to mine the Congressional Record.)





Tidbits editor:
Amy Gahran (USA)

Tidbits contributors:

Alan Abbey (Israel)
Paul Bradshaw (UK)
Matthew Buckland (S. Africa)
Juan C. Camus (Chile)
Thomas Crampton (Hong Kong)
Michelle Ferrier (USA)
A. Adam Glenn (USA)
Rich Gordon (USA)
Tish Grier (USA)
Barb Iverson (USA)
Steve Klein (USA)
Vincent Maher (S. Africa)
Maryn McKenna (USA)
Joe Michaud (USA)
Bill Mitchell (USA)
Steve Outing (USA)
Kim Pearson (USA)
Ernst Poulsen (Denmark)
Katja Riefler (Germany)
Laura Ruel (USA)
Ken Sands (USA)
Ezra Shapiro (USA)
Maurreen Skowran (USA)
Mac Slocum (USA)
Fons Tuinstra (China)
Monique van Dusseldorp (Netherlands)
Peter M. Zollman (USA)
  Copyright © 1995-2008 The Poynter Institute
  801 Third Street South | St. Petersburg, FL 33701 | Phone (888) 769-6837
  Site developed & hosted by DataGlyphics, Inc.


