Thursday, 15 May 2014

Multiple fact citations

In what is sure to be yet another post that most will find uninteresting, but which helps clarify my approach as I work through the documentation process for the Genealogy Project, today's discussion covers how I document multiple citations of the same or similar information within a single source document, as frequently occurs in family histories. I have made two particular choices here.

1.  When the source contains similar, non-conflicting information in two separate places, I do not cite both pages in the citation. For example, if the family history first mentions the person on page 27 as a child, and then gives the details of the family on page 58, the citation in the Genealogy Project reflects only the first mention, in this case page 27. While this is not entirely correct, I do it primarily because (1) if you have the source, you should be able to find the first reference and then work out where the others might be, and (2) for database purposes, adding page numbers every time the same fact is cited makes the source citations unnecessarily large.

However, if the source gives a name as, say, Catherine Evelyn Smith on page 27, and then states the name as Kathleen Evelyn Smith on page 58, this results in two separate citations, reflecting the disparate pieces of information.

2.  The corollary to this rule is that there is also only one citation for similar, non-contradictory information within a single source. For example, if in one place the source states that an individual was born in 1834, and in a second place states that the person was born on May 3, 1834, there is a single citation for May 3, 1834. Where there is differing information within the same source, both pages are cited separately. I do this for much the same reasons as above. Additionally, adding multiple citations for variations on the same set of facts from a single source does little to establish the veracity of those facts when evaluating everything known about the individual as a whole. I therefore prefer to treat the recorded fact as the best representation the source provides, and to cite it once.

I'd be curious to hear how others deal with the issue.

Wednesday, 7 May 2014

Source ratings

This might turn out to be a road that wasn't worth turning onto, but within the Genealogy Project, each fact source citation is rated. For simplicity, I use the four-star rating system built into FTM 2014. This rating system is clearly not perfect, but as a quick tool for evaluating the reliability of particular facts, it serves its purpose. What follows is a brief discussion of the different components, and how they are generally evaluated within the Genealogy Project.

1.  Original vs. Derivative:  Throughout the Genealogy Project, I opt for the simplest interpretation of this evaluation, i.e., the source either is the original record or it is not. That is by far the easiest way to evaluate it. The problem is that a given record could be considered either original or derivative depending on which event you are referring to. For example, a birth record could be an original source for the birth of the person it records, but a derivative source if it is cited for the mother's birth date. Similarly, a census record might be an original source for some information (location at a specific time), but a derivative source for almost everything else. The main problem with this sort of hair-splitting is that it can lead to under- or over-reliance on one record to the detriment of others, and the rationale for choosing one rating over the other would need to be documented in almost every instance.

2.  Clear vs. Marginal:  This is a bit more challenging, especially given that there is a separate category for "Direct/Indirect". Within the Genealogy Project, it is primarily used to record whether the source document is legible (not smudged, handwriting is clear, etc.). Marginal indicates that there was some difficulty in determining exactly what the date, name, or place was supposed to be, because the source documentation was faint or difficult to read. I've read notes elsewhere saying that this should be applied to the "original source" documentation, but that is extremely difficult in practice, as a large number of sources are derivative (birth record indexes, marriage indexes, etc.).

3.  Primary vs. Secondary:  The question here is whether the person providing the information had direct knowledge of the fact. This becomes much more challenging: who provided the information on census records? The person compiling a family history probably had direct knowledge of some facts, but not all. Some broad guidelines have been adopted in the Genealogy Project to assist in the evaluation: census records are considered primary, regardless of the fact, and family histories are considered secondary. Indexes are considered secondary; original birth, marriage, and death records are considered primary.

4.  Direct vs. Indirect:  This is much easier to interpret. Does the source state the fact directly, or is it implied? One of the challenges comes back to gender: if the source says "She died in 1823", I have treated that as an indirect citation for the person's gender. That is better evidence than the typical assumption based on name, but nothing in the overall rating differentiates between the two kinds of citation in that regard. Still, within the Genealogy Project, an indirect reference to a person as male or female has been rated as indirect.

While this provides a quick evaluation of any given source, there are many other factors that could be considered as well, such as proximity to a given event or the level of consistency within a particular source. Nonetheless, this is what I'm using for the time being.

Thursday, 1 May 2014

Sourcing challenges: gender

Probably the most frequent challenge I've come across so far in entering data has been sourcing gender. When relying on family histories or other secondary sources (more on this in later posts), the history often consists of listings of generations and the people within them, with names, dates of birth, death, and marriage, and occasionally additional notes on achievements or other things known about the individuals. Rarely, if ever, is gender explicitly identified in such sources.

That means assumptions are likely being made about the gender of the individuals, usually based on a few factors: name (and typical naming conventions), name of spouse (if known), and surnames of children (if given). However, there is a small complication: FTM 2014 automatically creates a gender fact, and populates it either based on the gender of the stated spouse or, failing that, with "unknown".

Since the goal of the Genealogy Project is to have all facts sourced, this creates a bit of a conundrum. The approach I've taken to date is simply to source the initial gender fact with the first source used for the individual. From a rating perspective, the gender fact will typically be rated with "zero" stars, which essentially means that the fact is implied or assumed. I have not documented the rationale behind every single gender assumption; in most cases it is simply based on the person's name and typical naming conventions, and quite frankly, this is probably one of the less significant assumptions made.

So, most individuals within the file are identified as male or female. Occasionally a person will be "unknown", but usually very little else is known about that individual (for example, nothing is known about the spouse or children) and the individual has an ambiguous name ("Willie", to use an example that actually comes from the file). It is worth noting that names can sometimes be misleading: there is a female "Frank" in the Genealogy Project.

Even when more is known about an individual, gender is rarely stated explicitly except in birth records or census records. Histories usually use language that refers to "he" or "she", which obviously assists in the determination, but this still ends up getting a "zero" from a source evaluation standpoint.