Last year, my students in the seminar studied graph theory. In the second semester we did a research project on total difference labelings of graphs. They had some really good ideas, and a preprint is now available here.

Recall that by a graph, we mean a collection of points, called vertices or nodes, where pairs of nodes can be connected by edges. This is useful for representing data. For example, one can imagine four people: Alice, Bob, Carol, and Diego. Maybe Alice is friends with Bob and Carol, Bob is friends with Diego, and no one else is friends with anyone else. Then we can represent this data with a graph of four nodes, labeled Alice, Bob, Carol, and Diego, with edges between the pairs (Alice, Bob), (Alice, Carol), and (Bob, Diego). One can imagine other contexts where this would be important. For example, one very relevant context right now is graphs which model possibilities for disease transmission.

One classic problem in graph theory is graph coloring. We want to assign each node in a graph a color so that no two adjacent nodes (that is, nodes connected by an edge) have the same color, and we are interested in how few colors suffice. For example, you can check that the graph of Alice, Bob, Carol, and Diego above can be colored with just two colors. A practical variant of this problem shows up in scheduling. Suppose, for example, that you have a whole bunch of classes with students in them, and you want to schedule the classes so that no student has to be in two classes at the same time. Then you can make a graph where each node is a class, and two classes share an edge if at least one student is in both. The minimum number of colors needed to color this graph is then the minimum number of distinct time slots you need for all the classes.

A total labeling of a graph is a little different. In a total labeling, we color each node, but we also color each edge. We insist that no two nodes which share an edge have the same color, that no two edges which share a node have the same color, and finally that no edge has the same color as either of its two nodes. (Exercise: What is the minimum number of colors needed to put a total labeling on the graph of Alice, Bob, Carol, and Diego above?)

So far we've done this with colors, but there's no need to use actual colors. We can label our nodes and edges with numbers instead, where each number represents a specific color: say 1 for blue, 2 for red, and so on. One thing this lets us do is restrict our total labelings in terms of properties the numbers must have. We define a total difference labeling as a total labeling in which the number on any edge is the absolute value of the difference of the numbers on its two nodes. For example, if two nodes labeled 3 and 5 are connected by an edge, that edge must be labeled 2. We can then ask the same sort of question as before: given a graph G, what is the minimum k such that we can put a total difference labeling on G using just the numbers 1, 2, 3, …, k? We'll call this the total difference labeling number of the graph. (Exercise: What is this number for our Alice, Bob, Carol, Diego graph?)
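To make the definition concrete, here is a small brute-force sketch in Python (the node numbering and function names are my own). Since the edge labels are completely forced by the node labels, we only need to enumerate node labelings and check the constraints; running it answers the exercise for the Alice, Bob, Carol, Diego graph.

```python
from itertools import product

# Hypothetical friendship graph from above: 0=Alice, 1=Bob, 2=Carol, 3=Diego.
edges = [(0, 1), (0, 2), (1, 3)]

def is_total_difference_labeling(labels, edges):
    """Check whether a tuple of node labels gives a total difference labeling.
    Edge labels are forced to be the absolute difference of their endpoints."""
    edge_labels = {}
    for u, v in edges:
        d = abs(labels[u] - labels[v])
        # An edge label must be nonzero (adjacent nodes differ) and must
        # differ from the labels of both of its endpoints.
        if d == 0 or d == labels[u] or d == labels[v]:
            return False
        edge_labels[(u, v)] = d
    # No two edges sharing a node may get the same label.
    for (u1, v1), d1 in edge_labels.items():
        for (u2, v2), d2 in edge_labels.items():
            if (u1, v1) < (u2, v2) and d1 == d2 and {u1, v1} & {u2, v2}:
                return False
    return True

def tdl_number(n_nodes, edges):
    """Smallest k such that node labels drawn from 1..k work."""
    k = 1
    while True:
        for labels in product(range(1, k + 1), repeat=n_nodes):
            if is_total_difference_labeling(labels, edges):
                return k
        k += 1

print(tdl_number(4, edges))
```

Exhaustive search like this is exponential in the number of nodes, so it is only practical for very small graphs, but it is a handy sanity check on hand computations.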

The idea of a total difference labeling was introduced in a paper by Ranjan Rohatgi and Yufei Zhang, where they calculated it for a whole host of graph families. The work with my students extends their work in two major ways. First, Rohatgi and Zhang gave an upper bound on the total difference labeling number of the complete graph on n nodes. (A complete graph is a graph where every node is connected to every other node.) Their bound grows exponentially in n, and we were able to reduce this to only polynomial growth. Second, we extended their results to some nice infinite graphs, such as calculating the total difference labeling number of the infinite square lattice. The paper does not assume much background, and should probably be somewhat readable to non-experts.
A preprint proving the Erdős–Faber–Lovász conjecture (EFL) for sufficiently large n just came out, and initial reactions are positive. The authors (Dong Yeap Kang, Tom Kelly, Daniela Kühn, Abhishek Methuku, and Deryk Osthus) are all serious graph theorists, so simply from the author list this looks like a real attempt. What is EFL, and why do we care?

Recall, by a graph, mathematicians and computer scientists mean a collection of points (called vertices or nodes) some of which are connected by edges. For example, one might have a collection of people, some of whom are friends with each other, and we make a graph where we draw an edge between two nodes if the corresponding people are friends with each other.

In a graph, a clique is a collection of nodes in which each node is connected to every other node in the collection. For example, if we had the friend graph of Alice, Bob, Carol, and Diane, with Alice friends with Bob and Carol, and Bob friends with Carol and Diane, then Alice, Bob, and Carol form a clique. The reason we use the word "clique" should hopefully be obvious. (A minor note: we are using "connected" here in a slightly non-standard fashion. In graph theory we normally speak of two nodes which have an edge between them as being adjacent, and use "connected" to mean that we can follow a path of adjacent nodes between them. But I'm trying to keep the phrasing here as close to regular speech as possible.)

One thing we are interested in with graphs is graph coloring. This is where we try to give each node a color such that no two nodes which have an edge between them have the same color. One of the original contexts where people cared about this was the Four Color Conjecture, which stated that if one had a map of countries, one never needed more than four colors to color the map with no two bordering countries the same color. (We are assuming each country is a contiguous region, so nothing like Kaliningrad or Alaska is allowed.) This is equivalent to saying that if I have a graph which is planar, in the sense that I can draw it with no edges overlapping, then I can color it with at most four colors. Here, each node corresponds to a country, and each edge represents that two countries share a border.

Coloring problems show up in more practical contexts. For example, suppose one has a school with many students, and some of those students are taking the same classes. We cannot have a student in two classes at the same time, so we wish to schedule the classes so that no two conflicting classes are in the same time slot. We can make a graph where each class is a node, and two classes have an edge if they share at least one student. Then the minimum number of colors needed to color the graph is the minimum number of time slots needed to fit in all the classes.
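As a small sketch of how this plays out in practice, here is a Python example (the class names and rosters are invented for illustration) that builds the conflict graph and greedily assigns each class the smallest free time slot. Greedy coloring gives an upper bound on the number of slots needed, not always the true minimum.

```python
# Made-up rosters: classes are nodes, shared students create edges.
rosters = {
    "Algebra":   {"Ann", "Ben"},
    "Biology":   {"Ben", "Cal"},
    "Chemistry": {"Ann", "Dee"},
    "Drama":     {"Cal", "Dee"},
}

# Build the conflict graph: an edge whenever two classes share a student.
classes = list(rosters)
conflicts = {c: set() for c in classes}
for i, a in enumerate(classes):
    for b in classes[i + 1:]:
        if rosters[a] & rosters[b]:
            conflicts[a].add(b)
            conflicts[b].add(a)

# Greedy coloring: each class takes the smallest slot no neighbor has taken.
slot = {}
for c in classes:
    taken = {slot[n] for n in conflicts[c] if n in slot}
    slot[c] = next(s for s in range(len(classes)) if s not in taken)

print(slot)
```

For this particular conflict graph (a 4-cycle), two time slots suffice, and the greedy pass happens to find them.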

Now, any graph can be written as a collection of (possibly overlapping) cliques. So let us imagine a graph G which can be written as a collection of n cliques, none of which has more than n nodes. Assume further that no two of those n cliques share more than one vertex. (An example of this with n = 4 is in this picture.) Then the conjecture is that for any such graph, we do not need more than n colors to color its nodes. This is a famous and notoriously difficult problem; Erdős considered it one of his favorite graph theory problems.

What this paper claims to show is not the whole of EFL, but that EFL is true for all sufficiently large n. If this is accurate, it is a major breakthrough that will likely have implications for other graph theory problems. I have not had time to read the paper carefully, and it is pretty technical, so I don't know where the sufficiently-large-n assumption comes in, although I think it may come from their modified use of the Rödl nibble method.
There is an idea in mathematics of a "graph," not in the sense of the graph of a function y = f(x) you may have heard of, but rather in the sense of a network: a collection of special points, called vertices or nodes, which are connected if they share some relation. For example, on a social network like Facebook, one might imagine that each person is a node and two nodes are connected if the people are friends. We say a graph is connected if one can travel along edges from any given node to any other node. For example, consider the friend graph of Alice, Bob, Carol, Diane, and Eric: it might be that Alice is friends with Bob, who is friends with Carol, and Diane is friends with Eric. This graph is not connected, since there's no way to get from Eric to Alice. But if Bob became friends with Diane, then the graph would be connected.
 
Here is a question I don't know the answer to. Suppose that we make a graph using the following process. We start with n points and no edges. For our first step, we pick a random node (which for convenience we'll call 1). We then pick a second random node, which we'll call 2, and add an edge between those two nodes. We then move to our new node 2, which we just connected. We now pick a random node which does not have an edge to 2, connect it, call it 3, and then move to 3, continuing this process. We stop when the entire graph is connected. On average, for a given n, about how long do we expect it to take before the graph is connected? This isn't immediately obvious, since by bad luck our process can go back to a previous node which is already connected. For example, from node 3, we might end up going back to node 1, which is essentially a wasted move.
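Lacking an exact answer, one can at least estimate the expected number of steps by simulation. Here is a sketch in Python (function names are mine) that follows the process as described, counting edges added until every node has been reached by the walk.

```python
import random

def steps_until_connected(n, rng):
    """One run of the process: from the current node, add an edge to a
    uniformly random node not yet adjacent to it, move there, and repeat
    until every node has been reached (so the graph is connected)."""
    adjacent = {v: set() for v in range(n)}
    current = rng.randrange(n)
    visited = {current}
    steps = 0
    while len(visited) < n:
        # Candidates: any node other than the current one that isn't
        # already a neighbor (revisiting an old node is a "wasted move").
        choices = [v for v in range(n)
                   if v != current and v not in adjacent[current]]
        nxt = rng.choice(choices)
        adjacent[current].add(nxt)
        adjacent[nxt].add(current)
        visited.add(nxt)
        current = nxt
        steps += 1
    return steps

rng = random.Random(0)
for n in (10, 20, 40):
    trials = [steps_until_connected(n, rng) for _ in range(200)]
    print(n, sum(trials) / len(trials))
```

Each step adds exactly one edge and reaches at most one new node, so every run takes at least n − 1 steps; the averages printed above give a rough empirical sense of how much overhead the wasted moves add.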

Here's a related question. We say a subset of nodes of a graph is a clique if every node in that subset has an edge connecting it to every other node in the subset. In terms of friends, this means that every person in the group is friends with every other person in the group. For example, consider the friend graph of Alice, Bob, Carol, Diane, Eric, and Fred. Suppose that Alice, Bob, and Carol are each friends with the other two, Carol is also friends with Diane, who is friends with Eric, who is friends with Fred, and there are no other friend relations. Then we have a clique of size 3 from Alice, Bob, and Carol, and no larger clique exists in the graph.

In our above process of making a graph by randomly adding edges, for a given n, what do we expect on average the largest clique to be in the graph we have formed?
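This too can be probed empirically. The sketch below (again, names mine) rebuilds the random-walk graph from the process above and then finds the largest clique by brute force, which is fine at this scale since these graphs are sparse and their cliques are small.

```python
import random
from itertools import combinations

def random_walk_graph(n, rng):
    """Build a graph by the process above: from the current node, add an
    edge to a random non-neighbor and move there, until all nodes appear."""
    adjacent = {v: set() for v in range(n)}
    current = rng.randrange(n)
    visited = {current}
    while len(visited) < n:
        nxt = rng.choice([v for v in range(n)
                          if v != current and v not in adjacent[current]])
        adjacent[current].add(nxt)
        adjacent[nxt].add(current)
        visited.add(nxt)
        current = nxt
    return adjacent

def largest_clique_size(adjacent):
    """Brute force: grow k until no k-subset of nodes is fully connected.
    (If there is no clique of size k, there is none of size k + 1.)"""
    nodes = list(adjacent)
    best = 1
    k = 2
    while any(all(v in adjacent[u] for u, v in combinations(s, 2))
              for s in combinations(nodes, k)):
        best = k
        k += 1
    return best

rng = random.Random(0)
sizes = [largest_clique_size(random_walk_graph(12, rng)) for _ in range(50)]
print(sum(sizes) / len(sizes))
```

For serious experiments one would want a proper clique algorithm (e.g. Bron–Kerbosch) rather than subset enumeration, but this suffices to get a feel for small n.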


Note that I also posed these questions on Facebook, where some interesting partial results appeared in the comments. A related question, namely the same question for directed graphs, was also raised; empirically the answers seem to be approximately the same. One major observation in that thread was that the answer to the first question appears to grow like n log n, and the rate for the directed-graph version seemed close to that as well.

 
There's been a fair bit of discussion about people running summer camps this summer. Right now, I've seen both Christians and Jews arguing that their summer camps for kids should run, and that with appropriate safeguards there won't be any serious risk. There's a lot wrong with this argument, but one aspect which is possibly not getting enough attention is an assumption people are making without even realizing it: that as one increases the scale, the number of probable infections rises linearly, and the chance of an infection event rises roughly linearly. Neither of these is the case. My suspicion is that this sort of assumption is occurring not just in the camp discussions but in all the discussions about opening up. So let's discuss these ideas.
 
First, let's imagine a game. You roll a regular six-sided die; if it turns up 1, you lose the game, and otherwise you win. The chance of losing is 1/6. Now play the same game, but roll 10 dice. The chance that at least one 1 shows up is much, much higher. If a 1 represents an infected person, then as one increases the number of people, the chance that at least one infection is present goes up drastically. This is, by the way, part of the point of isolating in small groups rather than isolating with a larger number of people, all of whom seem safe. (And yes, the chance that any given person is infected with COVID is less than 1/6, but the same basic pattern holds.)
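The arithmetic here is the standard "at least one" complement trick: the chance that k independent dice all avoid a 1 is (5/6)^k, so the chance of at least one 1 is 1 − (5/6)^k. A couple of lines of Python make the growth explicit:

```python
# Probability of at least one 1 among k fair six-sided dice: 1 - (5/6)**k.
for k in (1, 5, 10, 20):
    print(k, round(1 - (5 / 6) ** k, 3))
```

Already at 10 dice the chance of at least one 1 is over 80%, and at 20 dice it is nearly certain, even though each individual die is still only 1/6 likely to show a 1.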
 
Now we're going to imagine a different game. In this game, there are a whole bunch of people. When we start, one person has a red slip of paper, and everyone else has a blue slip of paper. We play the game in stages. At each stage, every person shakes hands with every other person, but if one of the people they shake hands with has a red slip of paper, then they roll a fair 10-sided die, and on a roll of 1, they receive a red slip of paper from that person. Having a red slip here represents being infected. For simplicity, we'll imagine that only two rounds happen: first everyone shakes hands with everyone else, paper is given as necessary, and then a second round of handshakes occurs.
 
Now we can ask: how likely is anyone to have a red slip of paper at the end? Let's look at the simplest case, where we have two people playing, one with a blue slip and one with a red slip. The chance that the person with the blue slip ends up with a red slip is 19%. To see this, note that there's a 10% chance in either round that they'll get a red slip, but that double counts the possibility that they get a red slip in both rounds, which has a 1% chance of happening (1/10 times 1/10). Now, let's imagine that we play this game with 3 people; a little figuring will convince you that the chance that either of the two people with blue slips ends up with a red slip has gone up. Why? Because in the second round there are more pathways for infection, if one of the two with blue slips was infected in the first round. By the time one gets to 20 people playing this game, most people with blue slips will end up with a red slip by the end. If one plays with 30 people, almost every blue-slipped person will end up with a red slip.
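A quick Monte Carlo sketch can check the small cases (this is my own rendering of the game: per-handshake infection chance p = 1/10, two rounds, and infections within a round based on who was red at the start of that round, matching the reasoning above):

```python
import random

def red_fraction(n_people, rounds=2, p=0.1, trials=20000, seed=0):
    """Simulate the slip game: one person starts red; each round, every
    blue person meets every red person (red status frozen at the start
    of the round) and turns red with probability p per meeting.
    Returns the fraction of initially-blue players who end up red."""
    rng = random.Random(seed)
    newly_red = 0
    for _ in range(trials):
        red = [True] + [False] * (n_people - 1)
        for _ in range(rounds):
            n_red = sum(red)  # reds at the start of this round
            red = [r or any(rng.random() < p for _ in range(n_red))
                   for r in red]
        newly_red += sum(red) - 1  # exclude the original red slip
    return newly_red / (trials * (n_people - 1))

print(round(red_fraction(2), 3))  # close to the 19% worked out above
print(round(red_fraction(3), 3))  # higher, via the extra second-round pathway
```

The two-person estimate lands near 0.19, matching the hand calculation, and adding a third player nudges the per-person chance upward, illustrating the extra-pathways effect.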
 
Shaking hands here is of course a proxy not just for shaking hands but for many other forms of contact, including coughing and touching. In the case of teenagers at camps, there is probably more experimental behavior than some of the parents are comfortable thinking about, even if they remember their own time at such camps. But the essential upshot is the same: if one has more pathways, the chance of infection goes up drastically. This really is not just about summer camps; similar remarks apply to any gathering of people or extended way for people to interact.

joshuazelinsky