The dark side of Graph Neural Networks (appliedexploration.com)
125 points by szemy2 on June 29, 2022 | 37 comments


I am really interested in GNN in the context of compilers.

* Predicting the color of a node in a graph could be used, for example, for speculative devirtualization (see the sketch after this list).

* Predicting edge weights could give us a better static estimate of hot basic blocks.

* Running performance experiments is as easy as running the benchmark and feeding some performance metric back to the GNN to learn from.
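
(As a concrete sketch of the node-prediction idea: the snippet below is a toy, hypothetical example, assuming PyTorch and PyTorch Geometric are installed, with a made-up 4-node "program graph", random features, and dummy labels. It only shows what two-layer GCN node classification looks like, not any actual compiler pipeline.)

    import torch
    import torch.nn.functional as F
    from torch_geometric.data import Data
    from torch_geometric.nn import GCNConv

    # Toy program graph: 4 nodes (e.g. call sites) with 8-dim feature vectors,
    # edges as a 2 x num_edges index tensor, and a dummy binary label per node.
    x = torch.randn(4, 8)
    edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]], dtype=torch.long)
    y = torch.tensor([0, 1, 0, 1])
    data = Data(x=x, edge_index=edge_index, y=y)

    class NodeClassifier(torch.nn.Module):
        def __init__(self, in_dim, hidden, num_classes):
            super().__init__()
            self.conv1 = GCNConv(in_dim, hidden)
            self.conv2 = GCNConv(hidden, num_classes)

        def forward(self, data):
            h = F.relu(self.conv1(data.x, data.edge_index))
            return self.conv2(h, data.edge_index)  # per-node class logits

    model = NodeClassifier(8, 16, 2)
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    for _ in range(100):
        opt.zero_grad()
        loss = F.cross_entropy(model(data), data.y)
        loss.backward()
        opt.step()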

Imagine this also for debugging and IDEs. I haven't played with copilot, but I imagine that something like guessing the graph based on node labels and on some edges is feasible using a GNN? This means that the IDE could try to match the names of your variables, the control flow, and the name of the function to other known functions, and could potentially point out differences. Potentially giving us a way to fix logic errors, or better algos. E.g., "Mmm... it looks like you are implementing your own bubble sort from scratch. Would you like to click here and use <insert better sort>."

I am not an expert on GNNs, but if anyone has resources for learning more about the state of the art of GNNs (a link to a literature review or something similar), do let me know.


> I haven't played with copilot, but I imagine that something like guessing the graph based on node labels and on some edges is feasible using a GNN?

Copilot is based on OpenAI Codex, which is based on GPT-3, which is a transformer model.

Although, technically, transformers are essentially GNNs over a "fully connected" graph (in the graph theory sense), I don't think that supports your speculation here about how GNNs could be used for code analysis, since the "tokens" that GPT-3 is trained on are not programming-language syntactic constructs but sub-word units obtained from text (byte-pair encoding, similar in spirit to WordPiece).

I will say though, I am equally excited by the exact prospect you raised of using something like GNNs for code analysis.

My hunch is that if somebody can figure out a way to make training hierarchical/graph-based neural networks very fast, we'll observe the same gains that we did with transformers. But hierarchical/graph-based models don't lend themselves to efficient computation.


I'm also quite excited about that - there's existing research, quite a few papers using graph-based models for MLOnCode:

https://proceedings.neurips.cc/paper/2021/file/c2937f3a1b3a1...
https://arxiv.org/abs/2203.05181
https://arxiv.org/abs/2005.02161
https://arxiv.org/abs/2012.07023
https://arxiv.org/abs/2005.10636v2
https://arxiv.org/abs/2106.10918

Definitely check them out! There are also tools that were made available by some of the authors: https://github.com/google-research/python-graphs


Are these papers somehow "curated" or "recommended"?!

Unfortunately, GNNs are lagging LLMs in the code domain. Maybe because

a. LLMs and transformers rulezz OR b. there is far more source code than there are compiled code graphs


I wouldn't rule out the fact that transformers are very amenable to parallel computation as the reason.


Thank you!! I've been looking to get my feet wet with SoTA research for MLONCode, so this is very helpful!


Are graph neural networks designed to solve graph problems like this, or are they a "graph problem"? Or both? :p


Mmm... maybe I'm mistaken. Thanks for the opportunity to reflect a bit more about this. I was remembering this video [0] which talks about Graph Embeddings. In the video, the speaker talks specifically about node classification. Assuming the classes are target functions, this could potentially be used for speculative devirtualization.

Definitely not an expert, just excited to have more tools to work with graph data!

[0] Graph Embeddings - Neo4J. Talk by Alicia Frame. https://www.youtube.com/watch?v=oQPCxwmBiWo


I learned them as a generalisation of CNNs where the data is stored in any graph, CNNs being a special case where the graph has a grid structure. Convolutions spread information from nodes to their neighbours. It's just implemented differently because the data isn't a neat 4D array anymore.

In that way, GNNs solve graph problems, the same way CNNs solve image processing problems. Training the GNN is more of an optimisation problem.
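
A minimal sketch of that intuition in plain PyTorch (a toy graph and made-up sizes, nothing framework-specific): one graph-convolution layer averages each node's neighbourhood features and applies a shared weight matrix, which is exactly what an image convolution does on the pixel grid graph.

    import torch

    # Toy graph: 5 nodes with 3-dim features and an adjacency matrix A.
    X = torch.randn(5, 3)
    A = torch.tensor([[0., 1, 0, 0, 1],
                      [1, 0, 1, 0, 0],
                      [0, 1, 0, 1, 0],
                      [0, 0, 1, 0, 1],
                      [1, 0, 0, 1, 0]])

    # One GCN-style layer: add self-loops, normalise by degree, then apply a
    # shared linear map -- information spreads one hop per layer.
    A_hat = A + torch.eye(5)
    deg = A_hat.sum(dim=1, keepdim=True)
    W = torch.nn.Linear(3, 8, bias=False)
    H = torch.relu(W((A_hat / deg) @ X))   # new node features, shape (5, 8)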

Edit: maybe graph theory can help with training on very large graphs but I don't know enough about that.


Yeps, that's a GCN, and probably the most popular GNN is indeed an RGCN :)

The transformer comment is interesting. They're very close, and in general, the tricks people use elsewhere are getting translated to GNNs: convolution, attention, ... . But scaling is still happening: the last couple of years have gotten folks to the 1M-1B level, but not yet LLM scale. Critically, the scaling work is relatively recent -- good GPU impls, good samplers, etc. -- and on a good trajectory.

We tracked GNNs for years but stayed away until heterogeneity + scaling started to get realistic for commercial workloads, and that's finally happening. Major credit to folks like DeepMind, Michael Bronstein, the early GraphSAGE / Jure crowd, various individual researchers, and now AWS+NVIDIA engineers for the practical engineering evolutions here.


Coming from a position of knowing nothing about graph neural networks, this is the best description I've seen so far in this thread.


Somewhere in my HN history is this same question and I can't say I've got a conclusive answer. My partially confident takeaway is that GNNs describe the architecture of the neural network itself, much in the same way that convolutional or recurrent are terms used to describe other network architectures.

There are two confusing parts for me:

1 - The words network and graph are nearly synonymous in this context, and IIRC most neural network architectures are actually graphs that fit some specific pattern. I don't know what makes a 'graph neural network' special (my guess is it has to do with how the layers relate, but I don't know).

2 - I almost always see a mention of graph-related use cases in the context of GNNs. I don't know if there is a fundamental reason for that, or if it just so happens that people who have huge graphs worth applying ML to also happen to have really good intuition about how graphs can be leveraged and go that route.


I have no idea to be honest, but I think it relates to the idea of representing the computation in a reasonably comprehensible way.

Either that, or networks that can solve specific problems on graph structures by more or less general, and hence reusable, methods. Which could go some way towards the former point, I guess, but that's also dealt with elsewhere. Just give Google Scholar a search and see.


In my mind, GNNs are designed to solve graph problems, usually with message passing, which (I'd emphasise the aggregation step) enables you to "do ML on graphs".


> Potentially giving us a way to fix logic errors, or better algos. E.g., "Mmm... it looks like you are implementing your own bubble sort from scratch. Would you like to click here and use <insert better sort>."

I think this already exists in software engineering research, but iirc they were comparing against code snippets gathered from other open source projects, or used language models instead of GNNs.


> "Mmm... it looks like you are implementing your own bubble sort from scratch. Would you like to click here and use <insert better sort>."

Theoretically, it should be possible. There might not exist demos of these sorts of networks yet.


There was some interesting work done on using ML for autovectorisation. PLDI keynote from a year or two ago, iirc.


Facebook published a paper on using ML to generate PGO data statically.


Hypothesizing that graphs could lead to AGI is tantamount to equating part of the neocortex to the whole body. A model does not make reality, especially when the two are designed to work in a permanent feedback tandem.

Since writing A Thousand Brains, Jeff Hawkins has revealed fascinating structures within the brain, a finite set of structure 'types' so to speak (families of similarly architectured brain parts).

Graphs are definitely part of the biological design, but in taking inspiration from nature to build our own beings, we should take notice that the real thing is vastly more complex, and investigate more exhaustively the ins and outs of real brain structures.


Your point about equating the neocortex to the whole body led me to write this out:

I don't think I have any basis for this besides a gut feeling and some daydreams, but I think each of the major methodologies, like reinforcement learning, transformers, graph NNs, etc., needs to be combined into some larger type of ensemble and worked together into a cohesive system with feedback loops for online and offline learning, for a shot at AGI.

I've been doing ML projects for like 6 years, mostly in NLP, but I have dipped into reinforcement learning because it interests me. My gut feeling has been that there are a lot of complementary learning systems that can handle different problems really well and cover for limitations in others, and I'd like to see what happens if we smash them together towards the goal of generally beating baselines for as many benchmarks as possible.


This is exactly my personal intuition as well, almost to a T. Here's to the satisfying consilience of independent thinking reaching the same hypotheses.


I was reading through George Lakoff and Mark Johnson's Philosophy in the Flesh last night and had a very similar thought. Their model of embodied cognition is necessarily decentralized in a really interesting way.


Agree from my gut: a high-level stitching NN layer on top of existing techniques is what will get us the next jump.


Maybe we have a second brain. Or maybe some part of the brain is “irrational” and cannot be modelled. Or, strange to say, even as we find more and more patterns in physical things and can link them to what we perceive and how we act, there might be something else. And that something else might be more important than the rest.

Or I just hope there is something there. Unique. Something that cannot be copied. It might not last, or be reborn, or have incarnations… in fact it might be better that way.

Ok, continue the rationalist approach; great finds, no doubt. And in fact an NN may not even be fully rational, but a data-driven/probabilistic model. I just hope it is incomplete.


Isn't the "neurons are graphs" idea kind of already represented, since graphs are already tensors?


Theoretically, maybe, but I believe the devil lies in the details: implementation is where "biological hardware", if you will, is more akin to a thousand specific sets of TPUs, whereas we're trying to brute-force / shoehorn the whole processing into a suboptimal, one-size-fits-all giant RNN. The inadequacies and inefficiencies of such a shoehorning of the model's execution might (I argue do) absolutely self-defeat the end goal of a coherent adaptive machine. I'm tempted to humorously say #NotAllParts (need the same underlying hardware optimizations) ;-)

Shower thought that just came up: observe that while we're endlessly chasing a bigger-than-reality String Theory, actually working physics implemented in real-world machines follows the specialized approach of one partial but perfect theory for each category of problems. We build hybrids because, as far as we can tell, reality is variations, and the biological probably most of all. So in trying to build a being…


That is true, but unfortunately, "out of the box" they're not well suited to just be "fed into" an NN. Even if you think of the adjacency matrix as very similar to how the weights are laid out in a feed-forward neural network, you can't ignore that:

- in real life, graphs are not fixed

- you need to deal with the many different potential representations of the graph (permutation invariance)

- the nodes usually contain more features than a single scalar value

but this is definitely not the best explanation; I think this guy does a much better job: https://youtu.be/JtDgmmQ60x8
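
To add a tiny, hand-rolled illustration of the permutation-invariance point above (a toy example, not from the video): relabelling the nodes of one and the same graph produces a different adjacency matrix, which is exactly what a naive "feed the matrix into an NN" approach would trip over.

    import numpy as np

    # Same 4-node graph (a triangle 0-1-2 plus a tail 2-3) under two orderings.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]])

    perm = [3, 0, 2, 1]                 # relabel the nodes
    P = np.eye(4, dtype=int)[perm]      # permutation matrix
    A_perm = P @ A @ P.T                # adjacency of the identical graph

    print(np.array_equal(A, A_perm))    # False: different matrix, same graph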


Sure, but GNNs modelling neurons is nonsensical, since the graph is the analyte of the NN; you are not a priori doing anything with the graph itself. So in a sense my point is: to use NNs to model neurons, a GNN doesn't buy you anything, because the G in GNN isn't what is being subjected to dynamic activation.


There is plenty in this article that is just wrong.

1. GNNs are no more "sequential" than CNNs and are therefore just as parallelizable in this respect (caveat below). A single GNN layer simply aggregates the features of the connected neighbors, just as a CNN aggregates the values of nearby pixels; this can be parallelized across the nodes/pixels (a small sketch of this aggregation follows the list). The next layer depends on the output of the previous layer and is sequential in that sense, but that's true of all forms of neural networks. If other architectures have "won the hardware lottery" relative to GNNs, it's because GNNs depend heavily on sparse*dense multiplication. The real thing that makes them hard to parallelize is that you have to partition the graph intelligently when splitting across machines, because there's a computational dependence between nodes and you don't want connected nodes to be on different devices. In the metaphor with CNNs, that would be like needing to split a single image across multiple machines and still carry out the convolution operation.

2. It's not true that pre-training doesn't work. It's very common to use unsupervised/self-supervised pre-training to, e.g., get node embeddings, which are then fine-tuned on a downstream task.

3. It's true that the naive application of deep GNN architectures leads to problems like over-smoothing and the information bottleneck, but there are known solutions to each, and it's just rarely the case that you reasonably want/need information from far away in the graph except in special applications. In those cases, you likely want a different graph representation of the data rather than the perhaps obvious one.

4. It's true that GNNs improperly applied to a problem (the wrong graph representation, a pathological architecture, or simply a problem with no dependence between the data points) will perform poorly. But I don't think that's surprising, and I'm sure that there are many problems where simply throwing a CNN at the data doesn't help either. Obviously, the modeling approach needs to fit the inductive priors of the problem.
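
(The sketch promised in point 1, a toy example in plain PyTorch: one aggregation step written as a sparse adjacency matrix times a dense feature matrix. The graph and feature sizes are made up.)

    import torch

    # Toy graph with 4 nodes and 4 directed edges, stored sparsely (COO).
    edge_index = torch.tensor([[0, 1, 2, 3],    # edge sources
                               [1, 2, 3, 0]])   # edge targets
    A = torch.sparse_coo_tensor(edge_index, torch.ones(4), (4, 4))

    X = torch.randn(4, 16)                      # dense node features

    # One aggregation step: row i of A @ X sums the features of the nodes
    # that node i has an edge to. This sparse x dense product is the kernel
    # GNNs lean on, and what makes hardware efficiency tricky.
    H = torch.sparse.mm(A, X)                   # shape (4, 16)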


Wow this is surprisingly wrong.

ConvNets _are_ message-passing networks. It is easy to see that bitmaps can be seen as graphs, with pixels as nodes and connections to their 8 neighbors (and themselves). Treat every neighbor as a connection of a different type and you can build a ConvNet out of heterogeneous graph convolutions.

A 2D convolution operator is just an efficient implementation that takes this structure as a given and doesn't require the graph structure as another input.
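
To make that explicit, here is a small hand-written sketch (a toy helper, not from any framework) of the graph a 3x3 convolution implicitly operates on: every pixel becomes a node connected to its 8 neighbours and to itself.

    import numpy as np

    def grid_edges(h, w):
        """Edge list of an h x w pixel grid: each pixel is connected to its
        8 neighbours and to itself, i.e. the receptive field of a 3x3 kernel."""
        edges = []
        for r in range(h):
            for c in range(w):
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < h and 0 <= nc < w:
                            edges.append((r * w + c, nr * w + nc))
        return np.array(edges).T     # shape (2, num_edges), like an edge_index

    print(grid_edges(3, 3).shape)    # (2, 49) for a tiny 3x3 "image"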

This means that the basic arguments of the article no longer hold. Yes, in some cases GNNs might be slower or harder to train, but it is not a general rule.


There are many more open questions that we have not found the answer to -- the two blog posts [1, 2] on our experience creating a GNN-based project are meant to spark a discussion and to clarify our own thinking on the topic.

We are here to continue the discussion on HN! E.g.: has anyone encountered pre-training for Graph Neural Networks?

[1] https://www.appliedexploration.com/p/graph-neural-networks-f... [2] https://www.appliedexploration.com/p/dark-side-of-graph-neur...


There are four questions at the end of the post.

For sure, the second one is answered - it is possible to parallelize GNNs to the billion scale while still using message passing. It requires rethinking how message passing is implemented, modifying objective functions so that they work in parallel, and changing ML infrastructure. You're not going to get to large graphs with generic distributed TensorFlow.
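
One common ingredient, shown here only as a hedged sketch (PyTorch Geometric's NeighborLoader on a random toy graph; not necessarily what the large-scale systems above do internally), is neighbourhood sampling, so that each minibatch only materialises a small sampled subgraph instead of the whole graph:

    import torch
    from torch_geometric.data import Data
    from torch_geometric.loader import NeighborLoader

    # Random toy graph standing in for a large one.
    num_nodes = 10_000
    data = Data(x=torch.randn(num_nodes, 32),
                edge_index=torch.randint(0, num_nodes, (2, 50_000)))

    # Sample up to 15 neighbours at hop 1 and 10 at hop 2 per seed node, so a
    # forward/backward pass only ever touches a small subgraph.
    loader = NeighborLoader(data, num_neighbors=[15, 10], batch_size=512)

    for batch in loader:
        print(batch.num_nodes, batch.edge_index.size(1))  # sampled subgraph
        break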

I don't know if the third question is fully answered, but there are many approaches to preserving locality, either by changing architectures or changing objective functions.

Also, errata: PinSage was developed for Pinterest, not Etsy (hence, not EtsySage).


I’m a researcher working in the fraud detection domain.

Do you have some pointers on scaling GNNs to such large problems?


Thank you for pointing that out, we've corrected that in the article!


I've found success using GNNs for point cloud classification, by creating edges between points using a k-nearest-neighbor scheme.
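
For anyone curious, a minimal sketch of that construction (using PyTorch Geometric's knn_graph, which depends on the torch-cluster package; the random points below stand in for a real point cloud):

    import torch
    from torch_geometric.nn import knn_graph

    points = torch.randn(1024, 3)           # toy point cloud: 1024 xyz points
    edge_index = knn_graph(points, k=16)    # connect each point to its 16 nearest neighbours
    print(edge_index.shape)                 # (2, 1024 * 16) directed edges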


That was a good read. Some slightly different perspectives:

It feels like a wild west in GNNs right now, with a big paper or tool every few weeks. Interestingly, many of the issues discussed in the article are already components in modern OSS frameworks:

- heterogeneity: RGCNs split weight matrices by relation type, with diff versions in most frameworks now (a minimal example after this list). Issues like smoothing are interesting too, as discussed.

- scaling: sampling and various memory techniques are making it possible to handle massive graphs on single nodes; metapaths & other structures enable farther communication, etc. Especially impressive is the OSS scaling work by the cugraph+dgl teams.
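
(The minimal example mentioned above, a toy snippet using PyTorch Geometric's RGCNConv: one weight matrix per relation type, selected per edge via edge_type. The tiny graph is a placeholder.)

    import torch
    from torch_geometric.nn import RGCNConv

    x = torch.randn(5, 16)                                    # 5 nodes, 16 features
    edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])   # 4 edges
    edge_type = torch.tensor([0, 1, 0, 1])                    # 2 relation types

    conv = RGCNConv(in_channels=16, out_channels=32, num_relations=2)
    out = conv(x, edge_index, edge_type)                      # shape (5, 32)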

At the same time, even with that, we're finding it's still too hard for non-academic teams to use this stuff in production/operational scenarios: model zoo (esp. some particularly important modeling areas not discussed, like time), GPUs, clusters, data pipeline integrations, etc. Imagine having 100K cyber alerts aggregated every day, or 50M users/transactions, and wanting to score them... It's nowhere near as easy (yet) as something like adding a layer to BERT, the way NLP people can. If that's more your speed:

- We're working on OSS automl for graph ai, trying to get typical internal team pipelines to go from events data to decisions & UIs in a few lines. First out was for UMAP and we've been pushing on GNNs more recently, http://github.com/graphistry/pygraphistry . Elsewhere, we're also working on the MLOps side and some key modeling scenarios.

- ... both graphistry + almost all our customers & partners are hiring here! If you're into data (analytics/mlops/dataeng/ds), or general js/python fullstack dev, a lot happening here for missions like supply chain, cyber, fraud, & misinfo. Would love to chat!

EDIT: Another two things I've found interesting:

-- GNNs are probably most researched by folks in the material sciences (chem, physics, ...), where they're really changing things like protein folding, and maybe next biggest on social networks. We see equal practicality for other key problems (cyber, fraud, supply chain, ...), but we're seeing much less academic work in these, and I think that's b/c academics are at a significant data disadvantage vs most industry teams. So even though there's big interest outside the academic areas, it's quite early days.

-- Industrially, we're seeing GNNs promoted primarily by graph databases, but almost all graph databases are CPU-based rather than GPU-based, and in our polls of commercial GNN usage, it's typically not a graph DB pipeline but something like regular data lakes (parquet, ...) feeding into regular scalable GPU compute tiers.


Really nice; I hadn't heard of pygraphistry, it looks like something that would have made our lives a lot easier!



