StatSheet generates game stories that are both surprising & predictable
As one of the smallest colleges in NCAA Division I basketball, the Houston Baptist University Huskies typically don't receive a lot of media attention. So school athletic officials were happy to learn of a new website called HBUreview.com that promises to devote substantial coverage to each of the team's games.
On the other hand, their excitement was tempered somewhat by the fact that the website is authored entirely by robots.
HBUreview.com is one of more than 300 websites launched by a small North Carolina company called StatSheet. Using proprietary software, the company's computers transform basketball statistics and box scores into machine-written text.
Each of the company's sites is devoted to a particular school, from powerhouses like Kansas and Duke to little-known colleges such as Centenary College of Louisiana and HBU.
After every Huskies game, StatSheet's computers push out brief and coherent recaps. The stories and game notes are laden with statistics-centered text that's generally informative ("Andrew Gonzalez was the leading scorer for the Huskies with 26 points in 35 minutes") and sometimes insightful ("The bench recorded more points than the starters"), but that occasionally betrays its inanimate creator. ("Houston Baptist has lost 100% of the time this season after recording 5 or less steals," the computer dutifully reported after the Huskies started the season 0-4.)
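StatSheet's software is proprietary, so its internals are not public. As a rough illustration only, sentences like the two quoted above could be produced by filling templates from a box score. In the sketch below, the `PlayerLine` record, the function names, and every box-score figure other than Gonzalez's 26 points in 35 minutes are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlayerLine:
    """One row of a box score (hypothetical structure)."""
    name: str
    points: int
    minutes: int
    starter: bool


def leading_scorer_sentence(team: str, lines: List[PlayerLine]) -> str:
    # Pick the player with the most points and fill a fixed template.
    top = max(lines, key=lambda p: p.points)
    return (f"{top.name} was the leading scorer for the {team} "
            f"with {top.points} points in {top.minutes} minutes.")


def bench_sentence(lines: List[PlayerLine]) -> Optional[str]:
    # Only emit the sentence when the condition actually holds.
    bench = sum(p.points for p in lines if not p.starter)
    starters = sum(p.points for p in lines if p.starter)
    if bench > starters:
        return "The bench recorded more points than the starters."
    return None


def recap(team: str, lines: List[PlayerLine]) -> str:
    # Join whichever sentences were triggered into one short recap.
    sentences = [leading_scorer_sentence(team, lines), bench_sentence(lines)]
    return " ".join(s for s in sentences if s)


if __name__ == "__main__":
    # Made-up box score; only the Gonzalez line comes from the article.
    box = [
        PlayerLine("Andrew Gonzalez", 26, 35, True),
        PlayerLine("Reserve A", 18, 22, False),
        PlayerLine("Reserve B", 12, 15, False),
    ]
    print(recap("Huskies", box))
```

Because each sentence is tied to a condition on the numbers, the same phrasing will reappear whenever the same condition is met.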
"With smaller schools, it's a plus to have as much out there as possible," said Houston Baptist Media Relations Director Russ Reneau, who's been following HBUreview.com since it went online last month. "But they still have some things to work out."
Lots of numbers, lots of clichés
StatSheet says its automated websites are still in the beta-testing phase, but company founder Robbie Allen is confident they'll fill a niche in the crowded sports media marketplace. Since 2007, StatSheet has provided numbers-only statistical analysis of college and pro sports events. But Allen says the robo-writers add a new dimension to the coverage.
"There's a large segment of fans out there that's probably not going to read through a box score, but they would read through somebody describing what happened in last night's game," said Allen, an engineer who formerly worked at Cisco Systems.
Allen said StatSheet -- which has raised more than a million dollars in venture capital -- hopes to make money from selling online ads, earning commissions from merchandise and ticket sales, and eventually syndicating its content to other websites.
While information-starved fans of small schools might be especially attracted to the machine-written stories, Allen suggests that even supporters of professional or major college teams might prefer StatSheet's stories to those written by real people.
"Our algorithm probably has more knowledge of what happened at the game than anybody who watched it in person," Allen said in a phone interview. "It's going to have research and analysis that is much broader and in depth than what one person can possibly process on his own."
Indeed, for fans who are "stat-heads" -- the disciples of sports statistics gurus such as Bill James or Dean Oliver -- StatSheet's stories unearth some interesting factoids. ("This season Montana State has won 71% of the time when Bobby Howard plays 30 minutes or more," it recently reported.)
But predictably, the robotic writing is less effective when it tries to convey the emotional side of sports or put individual statistics into broader context. The algorithm repeatedly spits out the same clichés, which find their way into stories about dozens of different contests. When Duke beat Radford last month, the Blue Devils "played the second-half on cruise control." But so did Ohio State, West Virginia, Iowa State, and several other schools.
Meanwhile, the StatSheet computers opined that "fan sentiment is shot" at Eastern Michigan, Niagara, Houston Baptist, and more than 20 other schools whose teams are struggling. ("We started off with a pretty tough schedule," countered Reneau, the Houston Baptist spokesman.)
"We do a lot of different computations that will result in a specific type of sentence," Allen said, explaining that StatSheet's algorithm takes into account a team's record, the strength of its opponents, and its momentum heading into each game. "We're trying to make the subjective objective."
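Allen does not spell out those computations. Purely as a sketch of the general idea, the three inputs he names could be folded into a single score that selects a canned sentence. The weights, thresholds, function name, and every phrase other than "fan sentiment is shot" are invented for illustration and are not StatSheet's.

```python
def fan_sentiment_sentence(wins: int, losses: int,
                           opponent_strength: float, streak: int) -> str:
    """Choose a stock sentence from record, schedule strength, and momentum.

    opponent_strength: assumed 0.0-1.0 rating of opponents faced so far.
    streak: current run of results; positive for wins, negative for losses.
    All weights and cutoffs below are hypothetical.
    """
    games = wins + losses
    if games == 0:
        return "The season has yet to begin."

    win_pct = wins / games
    # A tougher-than-average schedule nudges the score up; a losing
    # streak drags it down.
    score = win_pct + 0.2 * (opponent_strength - 0.5) + 0.05 * streak

    if score < 0.25:
        return "Fan sentiment is shot."
    if score > 0.75:
        return "Fans are riding high."
    return "Fans are cautiously optimistic."


if __name__ == "__main__":
    # An 0-4 start against an average schedule, on a four-game skid.
    print(fan_sentiment_sentence(0, 4, 0.5, -4))
```

A rule of this kind fires identically for every team whose numbers land in the same band, which would explain why one phrase can turn up in stories about more than 20 schools.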
Journalistic tool or Mad Libs?
Allen concedes he's still in "the infant stages" of developing and perfecting the technology. But some sports journalists say they're less than impressed with what they've seen from StatSheet so far.
"It's a slightly more advanced version of Mad Libs," said Penn State Sports Journalism Professor Malcolm Moran, who spent three decades writing for USA Today, The New York Times, and other publications. "I don't think that makes for anything close to good journalism."
Moran added that even the most cogent machine-written story about a game is likely to omit the kinds of behind-the-scenes details that many sports fans hunger for -- insider details about which players are hurt, who's in the coach's doghouse, who's on the trading block, and the like.
"The story of the day may have much more to do with off-the-field considerations than whether somebody went 0-for-5," Moran said.
Other observers, though, say the technology may have a place in modern sports coverage. Former Wall Street Journal columnist and editor Jason Fry notes that game recaps have become less important to readers, who now can watch highlights and recaps of many games on television or online. So if computers can churn out those lightly read stories, journalists can be freed up for more in-depth writing.
"Services like this are interesting if you're looking at finite, shrinking, disappearing resources in newsrooms," said Fry, who now blogs about sports, technology, and other subjects. "If it's going to satisfy part of an audience and let you use precious resources for other things you really need people for, that's worth having a conversation about."
While Fry says the automated content has a "robotic twang" that would never be confused with the work of a professional sportswriter, it may prove to be good enough to satisfy people looking for a quick game summary. And proponents of the technology say machine-written stories eventually may find a variety of other applications.
Northwestern University Journalism Professor Rich Gordon, who's worked with students to develop automated content software, envisions it being used to produce stories about sports events that now receive no coverage, such as Little League baseball games or low-profile high school and college matches. He also sees a possible application for business stories about corporate earnings, economic data, and other subjects that largely involve numbers and statistics.
"Think about any story that's done over and over again following a template," Gordon said. "If technology can replace what a human does, it will."
Northwestern has developed its own software called Stats Monkey, which -- similar to StatSheet’s technology -- turns sports box scores into text. A commercial version of the product is being used by the Big Ten Network and a handful of other clients.
But Gordon, a former reporter at the Richmond Times-Dispatch and Palm Beach Post, doesn't view robot-generated text as a threat to human journalists -- at least not those who actually humanize their writing.
"If your job rests on writing stories that a machine could write a reasonable facsimile of, you should be worried," Gordon said. "It reinforces the need for us as journalists to be really good at the things humans can do and robots can't."