Before you can understand this section, you should read and understand the section Too Many Unclear Terms.
My question is very simple: Is the staging area or staging index part of a repository or not?
Very early on when I was learning Git, I wanted to make personal notes with diagrams because I believe one of the easiest ways to understand Git is to see diagrams, but I very quickly ran into a problem. I couldn't answer the questions "What is a staging area?" and "What is a staging index?" Today, I have a much better idea of what these are, but I still can't draw a diagram because I still don't know if they are part of a repository or not.
My original idea was to draw a main diagram that showed all major components of Git, not just a local repository communicating with a remote repository, but also where the user files originated (i.e., the working tree).
Note: In Git, there are local repositories, external repositories, and remote repositories. For the purposes of this section, you can assume external repositories and remote repositories are the same, although I draw a distinction between the two in my glossary.
I wanted to use this diagram as a fixed reference to which all other diagrams would refer. I thought it would be pretty straightforward.
Secondary diagrams would be things like showing a file moving from one place to the next as git commit and git push were used. The secondary diagrams would neatly fold into the main diagram. It seemed easy enough at first glance, but when I wanted to talk about the working tree, I realized this probably wasn't included in any repository. (Probably.)
For a long while, I couldn't find any clarification on whether the working tree was part of a Git repository or not. (Spoiler: It's not.)
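One way to see the distinction is to ask Git where each piece lives. The following is a minimal sketch, assuming a POSIX shell, a reasonably recent Git (the --absolute-git-dir option of git rev-parse needs Git 2.13 or later), and a throwaway directory created with mktemp. The variable names are mine and purely illustrative.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory; not part of my original notes.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Root folder of the working tree (where the user files live).
top=$(git rev-parse --show-toplevel)

# Folder that holds the repository data itself.
git_dir=$(git rev-parse --absolute-git-dir)

echo "working tree root: $top"
echo "git directory:     $git_dir"
```

The git directory physically sits inside the working tree's top-level folder, but Git reports the two as separate locations, which is at least consistent with treating the working tree as something outside the repository.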
Then came the real challenge -- I also needed a way to diagram the staging process in a way that fit into my main diagram. Even today, this still has me stumped.
Very generally speaking, if you want to draw a repository "in its entirety" with any associated peripheries, you'd need to plan for the following:
Note: These items are the heart and soul of Git, but some items are optional. For instance, tags do not have to be used in a repository. Additionally, there is an option for advanced users when creating a repository (--bare) where working trees and the staging index / staging area are not used at all. (Using the bare option requires an understanding of how to set up a Git server. Setting up an actual Git server is well outside the scope of everything I'm writing on this website.)
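To make the --bare note concrete, here is a small sketch that creates one bare and one ordinary repository and asks Git how it sees each of them. It assumes a POSIX shell and a scratch directory from mktemp. The directory names bare.git and normal are made up for this example.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory for the experiment.
work=$(mktemp -d)
cd "$work"

git init --quiet --bare bare.git   # repository without a working tree
git init --quiet normal            # ordinary repository with a working tree

# Ask Git whether each repository is bare.
bare_flag=$(git -C bare.git rev-parse --is-bare-repository)
normal_flag=$(git -C normal rev-parse --is-bare-repository)

# Ask Git whether we are standing inside a working tree.
bare_in_tree=$(git -C bare.git rev-parse --is-inside-work-tree)
normal_in_tree=$(git -C normal rev-parse --is-inside-work-tree)

echo "bare.git: bare=$bare_flag inside-work-tree=$bare_in_tree"
echo "normal:   bare=$normal_flag inside-work-tree=$normal_in_tree"
```

This only shows that a bare repository has no working tree attached. It says nothing about how to run a Git server, which I'm still leaving out of scope.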
My glossary is a statement as to how poor the terminology in Git is. However, Git terminology is not the only thing that suffers.
In my search to find out if the staging area was included in the repository or not, I ran across Pro Git, Chapter 2.2. This chapter should have made it clear what was included in a repository and what wasn't. After all, the chapter title is "Recording Changes to the Repository". Unfortunately, it didn't.
One major reason for this confusion was Figure 8 at the top of the chapter. It misguided me more than it helped:
I talked about the term "The Three Trees" in Too Many Unclear Terms and I have a detailed description in my glossary of what the term is, why it exists, and some problems associated with it.
Defined very late in Chapter 7.7 but introduced in Pro Git, Chapter 1.3 as "The Three States", "The Three Trees" is discussed in Pro Git using several different terms. The idea of the three trees is surprisingly popular on the internet and can be found in other unofficial documentation such as Atlassian's.
Sadly, a "tree" is never defined until Chapter 10... and it turns out that the definition of "tree" in Chapter 10 has nothing to do with the three trees. In Chapter 10, Pro Git talks about tree objects, which are something completely different. This is very confusing to a new or casual user! Terms should be clearly defined before they are used, especially since these ideas are so central to Git.
Unfortunately, Pro Git still left me with my main question: Are the staging area and staging index part of a repository?
The main diagram I wanted to draw, and to which all other diagrams would refer, had to show where the staging area and staging index went. My main diagram would contain a local repository, a remote repository, and a working tree, but where do the staging area and staging index belong? How do they fit into the whole picture?
Version 1 of the Pro Git book shows a picture of the repository being defined as the git directory while the "staging area" and "working directory" are clearly outside of the repository / git directory. This is incorrect because staging information is clearly contained within the .git directory.
Note: When I started writing this website, version 1 was still hosted on the git-scm.com website and showed up in web searches. Before I finished writing this website, version 1 seems to have been removed. I think removing version 1 is an improvement. Despite my numerous problems with version 2, it's actually much better than version 1.
At the time I wrote this, Pro Git was on version 2 of the book, and I can't find any equivalent of this in version 2. I'm guessing the authors realized the diagram was inaccurate and pulled it, but my question was left unanswered.
Thinking about this logically, I know:
.git directory, so this is probably not part of the repository
.git directory (details can be found in my glossary)
I didn't know where staged files were stored, so I ran a quick test: I staged a file and discovered that it is stored in the .git directory. So... does that mean staged files are stored as part of the Git repository? What about the staging index?
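A test along those lines can be sketched as follows. This is not a record of the exact commands I ran at the time; it is a minimal reconstruction that assumes a POSIX shell with awk available, a scratch directory from mktemp, and a made-up file name (note.txt).

```shell
#!/bin/sh
set -eu

# Hypothetical scratch repository for the experiment.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Create a file in the working tree and stage it (no commit is made).
echo "hello" > note.txt
git add note.txt

# The staging index is a file inside the .git directory.
[ -e .git/index ] && index_after=yes || index_after=no

# The index entry records the ID of the staged content (second column).
blob=$(git ls-files --stage note.txt | awk '{print $2}')

# That ID resolves to an object already stored in the repository's object database.
blob_type=$(git cat-file -t "$blob")

echo "index file present after git add: $index_after"
echo "staged object type: $blob_type"
git ls-files --stage
```

So both the index file and the staged content end up under .git, even though nothing has been committed yet.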
If we define whether an item is part of a repository by whether its information is stored in the .git directory, we can conclude the following:
This bothers me. I never really considered staging information as part of the repository.
Although I've never seen any definitive information in the Git man pages nor in Pro Git that clearly stated whether the staging area is part of the repository, Pro Git, Chapter 2.1 talks about "cloning a repository". Although I don't trust version 2 of Pro Git to give me a reliable definition for anything, the cloning process does not include staging information.
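The claim that cloning leaves staging information behind can be checked with a small experiment. The sketch below assumes a POSIX shell and a scratch directory from mktemp. The repository names (source, copy), file names, and the throwaway commit identity are invented for the example.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory for the experiment.
work=$(mktemp -d)
cd "$work"

# Build a source repository with one committed file.
git init --quiet source
cd source
echo "committed" > a.txt
git add a.txt
git -c user.name="Example" -c user.email="example@example.com" \
    commit --quiet -m "first commit"

# Stage a second file but do NOT commit it.
echo "staged only" > b.txt
git add b.txt

# Clone the source repository.
cd "$work"
git clone --quiet source copy

# Compare what each staging index knows about.
echo "source index:"
git -C source ls-files
echo "clone index:"
git -C copy ls-files
```

The staged-but-uncommitted file is listed in the source index only. The clone builds its own index from the commit it checks out, so the source's staging state does not travel with the clone.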
However, this idea about cloning is interesting. Based on the idea that cloning excludes staged files, it feels like the staging index and staging area don't belong in the repository, even though they are part of the .git directory.
At this point, I'm going to guess that the staging area and working tree exist outside of the local repository. I'm hoping Murphy's law doesn't kick in. I'd hate to put a lot of time into drawing my diagrams only to redraw most of them when the next version of the Git documentation comes out.
So, what do I think is included in a repository?
Thus, when I draw my main diagram, I should draw an additional working tree, staging area, and staging index for each repository.
Hmmm... should there be a name that encompasses a working tree, a staging area, a staging index, and a repository? I won't use one in my notes, but it makes me wonder.
Personal Note: I hope it's clear I'm not trying to be pedantic. I sifted through a lot of websites with incorrect information in an attempt to figure out the truth. I have put in a lot of time not only to avoid stepping into the same traps that others have, but also to prevent you (the reader) from doing so. Frankly, I just want my diagrams to correctly reflect what Git is and to match the official Git documentation as closely as possible, but the lack of clearly defined terminology in Git makes me feel like I have to write something potentially incorrect, and that goes against one of my goals for this website.