RootsTech 2011: Towards a New Genealogical Data Model

On Saturday at the RootsTech conference in Salt Lake City, there was an open discussion session on genealogical data standards. A heated discussion has been going on for years about a new data model that could replace GEDCOM. A new standard would address GEDCOM's gaps (for example, the inability to store evidentiary analysis within the data model) and be a living, dynamic standard, unlike GEDCOM, which has been static since 1996.

In the first hour, the discussion identified several issues with the data model:

  • Data in Proprietary Formats: Because of gaps in GEDCOM, and the lack of a standards body to address this issue, most software vendors developed their own proprietary extensions, which limited the ability to share data.
  • Lack of Persistent URLs (PURLs)
  • Unstructured Text
  • Tag & Link Issues
  • Inconsistent Search Experience
  • Data Versioning (Diff/Merge)
  • Inability to Transfer Rich Data (rich media)
  • Inability to do Cross-Repository Search
  • Documentation (in other words, capturing the source of a genealogical statement, the ability to provide
  • Key as seen (Representation): In other words, how do we normalize data while preserving the original "as-keyed" version?
  • Static data interchange
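
The "key as seen" item is easier to picture with a small example. What follows is only a minimal Python sketch of one way to keep the as-keyed text next to a normalized form. The names `KeyedValue` and `normalize_surname`, and the normalization rule itself, are my own inventions for illustration. They do not come from GEDCOM or from any proposed standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeyedValue:
    """A transcribed value that keeps the original text beside a normalized form."""
    as_keyed: str                      # exactly what the transcriber typed
    normalized: Optional[str] = None   # cleaned-up form used for search and matching


def normalize_surname(raw: str) -> KeyedValue:
    """Hypothetical rule: collapse whitespace and apply title case.

    The raw input is stored untouched, so the normalization can be
    revisited later without losing what was originally keyed.
    """
    cleaned = " ".join(raw.split()).title()
    return KeyedValue(as_keyed=raw, normalized=cleaned)


record = normalize_surname("  SMITH ")
print(repr(record.as_keyed), "->", repr(record.normalized))
```

The point of the design is that normalization never overwrites the original: a search tool could match on `normalized`, while a researcher could still see `as_keyed` exactly as it was entered.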

After the first hour, devoted to creating this list, we were to vote on buckets of technological or feature issues to come up with one or two we could discuss. For me, the biggest issue was not any of these technical issues; it was the lack of a governance model. Since no one was signed up to maintain GEDCOM, it did not change with the times, and died as a standard. In other words, people saw gaps and addressed them in proprietary ways, since there was no way to get issues addressed within the standard.

I got up and suggested we talk about how we build a working governance model instead of the issues that the governance model would help us solve. For more than a decade, people have been lamenting the lack of a standards body to adjudicate issues, develop a common standard, and submit it for public review. At the same time, people have pointed out the feature gaps, and proposed ways to address them. For the feature gap discussion to have an effect, however, we need to have a place to have these discussions that is actually designed to maintain a working standard. Lack of governance, not lack of technology, is the issue. We voted, and changed the direction of the meeting to discuss governance.

It was at about this time that Tom Creighton, the CTO of FamilySearch, got up and announced that FamilySearch is nearly ready to announce a new proposed data model. This changed the meeting immediately. Instead of an open discussion, it became more like a press conference, with Tom fielding questions about what they have done, when the work will be shared, and so on. There was not a lot that he was able to divulge at this point.

Key portions of the new proposed standard are based on the GenTech genealogical data model owned by the National Genealogical Society (full disclosure, I am on the Board of the NGS). The decision to make the new proposed data model public and free has not yet been made by the management at FamilySearch, but is being discussed. This means that there cannot be a date set for the launch of the new standard, as it could remain the intellectual property of FamilySearch, and unavailable outside of FamilySearch. (Mr. Creighton said that they had discussed the fact that they were developing a new standard with several software vendors, but had not provided any of them any more detail than that they were working on something.)

This is an exciting development in the intersection of genealogy and technology. If FamilySearch decides to share their work, and if a governance body can be identified or set up, and finally if that governance body has the trust of the genealogical community, including:

  • the major desktop and mobile application developers
  • the major web databases
  • the NGS
  • NEHGS (New England Historic Genealogical Society)
  • FGS (the Federation of Genealogical Societies)
  • BCG (the Board for Certification of Genealogists)
  • APG (the Association of Professional Genealogists)

we could be near the start of a much richer technology environment. A new data model, addressing issues with GEDCOM and upgraded and changed through a community governance model, could lead to an integrated set of independently developed software tools that would allow people to represent their research better than they can with GEDCOM, and to better share their data or move it from one vended product to another.

It sounds a little like Shangri-la as I write it here, but we are talking about the incredible potential that would be unleashed if most software vendors did not have to fix independently (or ignore) issues with the current data model, and could instead focus on the next new way to access and work with genealogical data.

Update, 17 February 2011: A summary of the meeting discussed here has been posted on the FamilySearch wiki: https://wiki.familysearch.org/en/Genealogical_Data_Standards_(RootsTech_Session)