RootsTech 2011: Towards a New Genealogical Data Model

On Saturday at the RootsTech conference in Salt Lake City, there was an open discussion session on genealogical data standards. A heated discussion has been going on for years about a new data model that could replace GEDCOM. A new standard would address GEDCOM’s gaps (for example, by making it possible to store evidentiary analysis within the data model) and would be a living, dynamic standard, unlike GEDCOM, which has been static since 1996.

In the first hour, the discussion identified several issues with the current data model:

  • Data in Proprietary Formats – Because of gaps in GEDCOM, and the lack of a standards body to address this issue, most software vendors developed their own proprietary extensions, which limited the ability to share data.
  • Lack of Persistent URLs (PURLs)
  • Unstructured Text
  • Tag & Link Issues
  • Inconsistent Search Experience
  • Data Versioning (Diff/Merge)
  • Inability to Transfer Rich Data (rich media)
  • Inability to do Cross-Repository Search
  • Documentation (in other words, capturing the source of a genealogical statement, the ability to provide
  • Key as seen (Representation) – In other words, how do we normalize data while preserving the original “as-keyed” version?
  • Static data interchange
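
To make the “key as seen” item concrete, here is a minimal sketch in Python of one way a record could carry both versions of a value. Nothing like this was presented in the session: the RecordedValue class, its field names, and the normalize_surname helper are all hypothetical, and the normalization rule (trim whitespace, standardize capitalization) is deliberately simplistic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordedValue:
    """Hypothetical container: one field value in both of its forms."""
    as_keyed: str                     # exactly what the transcriber typed
    normalized: Optional[str] = None  # cleaned-up form for search and matching
    source: Optional[str] = None      # where the statement came from


def normalize_surname(as_keyed: str, source: Optional[str] = None) -> RecordedValue:
    """Hypothetical normalizer: collapse whitespace and standardize case.

    The original string is stored untouched, so the normalization can be
    redone later with a better rule without losing the evidence.
    """
    cleaned = " ".join(as_keyed.split()).title()
    return RecordedValue(as_keyed=as_keyed, normalized=cleaned, source=source)


# The source label below is made up for illustration.
value = normalize_surname("  SMYTHE ", source="census transcription (hypothetical)")
print(repr(value.as_keyed))  # the as-keyed string, spaces and all
print(value.normalized)      # the normalized form
```

The point of the sketch is only that normalization adds a second value rather than overwriting the first; a real standard would need far richer rules for names, dates, and places.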

After the first hour, devoted to creating this list, we were to vote on buckets of technological or feature issues to come up with one or two we could discuss. For me, the biggest issue was not any of these technical issues; it was the lack of a governance model. Since no one was signed up to maintain GEDCOM, it did not change with the times and died as a standard. In other words, people saw gaps and addressed them in a proprietary way, since there was no way to get issues addressed within the standard.

I got up and suggested we talk about how we build a working governance model instead of the issues that the governance model would help us solve. For more than a decade, people have been lamenting the lack of a standards body to adjudicate issues, develop a common standard, and submit it for public review. At the same time, people have pointed out the feature gaps, and proposed ways to address them. For the feature gap discussion to have an effect, however, we need to have a place to have these discussions that is actually designed to maintain a working standard. Lack of governance, not lack of technology, is the issue. We voted, and changed the direction of the meeting to discuss governance.

It was at about this time that Tom Creighton, the CTO of FamilySearch, got up and said that FamilySearch is nearly ready to announce a new proposed data model. This changed the meeting immediately. Instead of an open discussion, it became more like a press conference, with Tom fielding questions about what they have done, when the work will be shared, and so on. There was not a lot that he was able to divulge at this point.

Key portions of the new proposed standard are based on the GenTech genealogical data model owned by the National Genealogical Society (full disclosure, I am on the Board of the NGS). The decision to make the new proposed data model public and free has not yet been made by the management at FamilySearch, but is being discussed. This means that there cannot be a date set for the launch of the new standard, as it could remain the intellectual property of FamilySearch, and unavailable outside of FamilySearch. (Mr. Creighton said that they had discussed the fact that they were developing a new standard with several software vendors, but had not provided any of them any more detail than that they were working on something.)

This is an exciting development in the intersection of genealogy and technology. If FamilySearch decides to share their work, and if a governance body can be identified or set up, and finally if that governance body has the trust of the genealogical community, including:

  • the major desktop and mobile application developers
  • the major web databases
  • the NGS
  • NEHGS (New England Historic Genealogical Society)
  • FGS (the Federation of Genealogical Societies)
  • BCG (the Board for Certification of Genealogists)
  • APG (the Association of Professional Genealogists)

we could be near the start of a much richer technology environment. A new data model, addressing issues with GEDCOM and upgraded and changed through a community governance model, could lead to an integrated set of independently developed software tools that would allow people to represent their research better than they can with GEDCOM, and to better share their data or move it from one vendor's product to another.

It sounds a little like Shangri-la as I write it here, but we are talking about the incredible potential that would be unleashed if most software vendors did not have to fix independently (or ignore) issues with the current data model, and could instead focus on the next new way to access and work with genealogical data.

Update, 17 February 2011: A summary of the meeting discussed here has been posted on the FamilySearch wiki: