Moonshine
- 3 Devlogs
- 16 Total hours
Finds new materials,
I finished the Gamma Processing Layer. This is the part of the file that takes in the formulas we created and screened in the Beta Processing Layer, creates their structures, and runs a few initial validity tests on them.
.
First we start off by importing all the infinity stones of materials science. Then we get into our functions.
.
First comes the Initial Super Gamma function. Ooooo fancyyy. No, just kidding, I named everything so that I could tell which step of the process it belongs to just by reading the name. Initial tells me it's the first, and the keyword Super tells me it's one of the first functions to be run, if not the first. Since keeping track of the order in which every function runs is super important for building this well, this helps a lot. In the Initial Super Gamma function we have an LLM whose job is to look at the formula we created in the Beta Processing Layer and figure out the correct inputs to put into PyXtal for that formula. I will get into PyXtal in a bit.
.
Next comes the SubGammaFunction. This function is nothing more than an overglorified string parser. It takes the output from the LLM, finds the key information we want (Dim, Group, Species, NumIons), and extracts it into variables we can pass to PyXtal. Now all that seems straightforward, right? So you can imagine my surprise when I was getting error after error parsing the LLM response. I was so confused. For some reason it could not extract the species from the response???? Yeah, no, it turned out I spelled species wrong in the function call. :)
I know I make complaining jokes, but it's bugs, errors, and moments like that that really make coding such a laugh. Even those moments I really do enjoy.
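A minimal sketch of what a parser like this could look like. The reply format (one "Field: value" line per field) and the name `parse_pyxtal_inputs` are assumptions for illustration, not the actual SubGammaFunction. It raises a clear error naming the missing field, which is the kind of message that makes a misspelled key easy to spot.

```python
import re

FIELDS = ("Dim", "Group", "Species", "NumIons")

def parse_pyxtal_inputs(response: str) -> dict:
    """Pull Dim, Group, Species and NumIons out of a chatty LLM reply.

    Assumes each field appears on its own line as "Field: value".
    """
    found = {}
    for field in FIELDS:
        # Use the last matching line so earlier reasoning text is ignored.
        pattern = rf"^\s*{field}\s*[:=]\s*(.+?)\s*$"
        matches = re.findall(pattern, response, flags=re.MULTILINE | re.IGNORECASE)
        if not matches:
            raise ValueError(f"could not find {field!r} in the LLM response")
        found[field] = matches[-1]

    # Element symbols are one capital letter plus an optional lowercase letter.
    species = re.findall(r"[A-Z][a-z]?", found["Species"])
    num_ions = [int(n) for n in re.findall(r"\d+", found["NumIons"])]
    if len(species) != len(num_ions):
        raise ValueError("Species and NumIons have different lengths")

    return {
        "dim": int(found["Dim"]),
        "group": int(found["Group"]),
        "species": species,
        "num_ions": num_ions,
    }
```

Calling it on a reply that ends with the four labeled lines returns a dictionary whose values can be handed to the structure generator directly.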
.
Now back to the project! After parsing out all the information we can finally get to the interesting part 😈. CREATING AWESOME CRYSTAL STRUCTURES!!!!! PyXtal is a Python tool that lets us randomly generate structures that don't disobey fundamental rules of crystal symmetry, such as valid Wyckoff positions. The main challenge I had to overcome here was maintaining the symmetry group I had assigned to each crystal. PyXtal would actively try to create whatever structure it thought was best, but I wanted to preserve certain geometries to test, since that is the most important thing in crystal creation.
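A rough sketch of generating a structure while holding on to the requested group, assuming 3D crystals only. `from_random(dim, group, species, numIons)` is PyXtal's random generator; the helper names `check_inputs` and `generate_structure`, the retry count, and the final group check are assumptions for illustration, not the project's actual code. PyXtal must be installed for `generate_structure` to run.

```python
def check_inputs(group, species, num_ions):
    """Reject obviously bad inputs before spending time on generation."""
    if not 1 <= group <= 230:
        raise ValueError("3D space group numbers run from 1 to 230")
    if not species or len(species) != len(num_ions):
        raise ValueError("species and num_ions must be non-empty and the same length")
    if any(n <= 0 for n in num_ions):
        raise ValueError("every ion count must be positive")

def generate_structure(group, species, num_ions, attempts=10):
    """Try a few random structures and keep the first valid one in `group`."""
    # Imported here so check_inputs can be used without PyXtal installed.
    from pyxtal import pyxtal

    check_inputs(group, species, num_ions)
    for _ in range(attempts):
        crystal = pyxtal()
        try:
            crystal.from_random(3, group, species, num_ions)
        except Exception:
            # For example, a composition that does not fit this group's Wyckoff sites.
            continue
        if crystal.valid and crystal.group.number == group:
            return crystal
    return None  # the caller decides what to do when nothing valid was produced
```

Returning `None` instead of silently falling back to a different group keeps the assigned geometry as the thing being tested.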
.
ANOTHER HUGE PROBLEM: I KEPT RUNNING OUT OF FREE CREDITS TO USE THE HUGGING FACE AI MODELS. Partway through I tried switching to the locally hosted Ollama models I had used before, but then I remembered why I left them: THEY TAKE FOREVER.
.
After we create the PyXtal structures, we run them through CHGNet to relax them. PyXtal SUCKS at creating low-energy, high-stability structures; it's good at creating structures that don't break certain rules, such as Wyckoff positioning. After CHGNet relaxes a structure, it gives us the stress values on the structure and the forces acting on the atoms in the crystal's lattice. We run 2 small loops to check each value and make sure it's under 0.1, the cutoff at or above which a structure is considered unstable for both metrics. If a structure passes the tests, its structure file is stored.
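The two checking loops described above could look roughly like this. It is a sketch under assumptions: the inputs are plain nested lists standing in for the force and stress arrays that come out of the relaxation, magnitudes are compared (so a large negative value also fails), and the function names are made up for illustration.

```python
LIMIT = 0.1  # cutoff from the devlog; values at or above it count as unstable

def all_below(values, limit=LIMIT):
    """Return True only if every component's magnitude is under the limit."""
    for row in values:
        for component in row:
            if abs(component) >= limit:
                return False
    return True

def passes_stability_check(forces, stress, limit=LIMIT):
    """Run the same check over the forces and then over the stress values."""
    return all_below(forces, limit) and all_below(stress, limit)
```

Only structures for which `passes_stability_check` returns `True` would have their structure file stored.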
First, I wrapped up the first step of this pipeline. I renamed Array Process to Alpha Process and, inside it, created functions for each step of that process, making everything clean, professional, and modular.
.
After that I created a file called BetaProcess and restarted the Chemistry Checks there, since there turned out to be MANY more steps to that process than I had initially thought. Now, first, the arrays we created in the Alpha Process are sent to DeepSeek to be put back together into molecular formulas. This requires a new, overly long prompt that details all the steps the AI must take and the chemistry rules and exceptions it must consider. I took some time perfecting this prompt. After we get the reconstructed molecule, we have to parse it out. Right now the LLM gives the molecule along with a whole response. I originally tried using regular expressions (re) to parse out the molecular formula. That didn't work, so I had to format the LLM response to keep the answer at the end, so I could just run normal string parsing to get the molecule. Then it's shot off to the same LLM and prompt as in the last devlog to find out how plausible the molecule is.
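A small sketch of the "answer at the end" approach, assuming the prompt asks the model to finish with the formula on its own last line. The optional "Final Answer:" label and the function name `extract_final_formula` are hypothetical.

```python
def extract_final_formula(response: str, label: str = "Final Answer:") -> str:
    """Return the formula from the last non-empty line of an LLM reply."""
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty LLM response")
    last = lines[-1]
    # Drop the label if the model included it.
    if last.lower().startswith(label.lower()):
        last = last[len(label):]
    # Trim spaces, a trailing period, and markdown emphasis around the formula.
    return last.strip(" .*`")
```

Because only the last line is read, the model can explain itself for as long as it likes above it without breaking the parser.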
.
The main challenge in this phase was making sure the DeepSeek model could put together even the hardest types of molecules, so a big part of this was continuous testing and modifying of the prompt.
.
LLMs are strange in how they always feel the need to give a 2-page explanation for every response. Because of that, a big challenge was figuring out how to parse out the information we want from each LLM response.
.
I was getting a headache looking at the outputs of this phase. Just words and numbers mushing together everywhere. So I spent a lot of time formatting the output of this phase to make the information readable, understandable, and intuitive to follow.
I started building version 2 of Project Moonshine, an algorithm that's meant to discover new crystals. This new engine takes a reference material such as NH4H2PO4 as input and breaks it into individual elements. It then systematically replaces every element with every element from the periodic table to generate THOUSANDS of potential new crystals. I got this idea from how real materials science discovery takes place. Scientists mix and match elements in a simulation, using their extensive knowledge of chemistry to create new materials they think could work. Then atoms and elements are changed further to see what else can be created.
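The substitution idea can be sketched like this. Several things here are assumptions for illustration: the short `ELEMENTS` list stands in for the full periodic table, one element is swapped at a time (all of its occurrences together), and the simple parser does not handle parentheses in formulas.

```python
import re

# Stand-in subset; the real engine would use the whole periodic table.
ELEMENTS = ["H", "Li", "Na", "K", "N", "P", "As", "O", "S"]

def to_atom_array(formula):
    """Expand a flat formula like 'NH4H2PO4' into one array entry per atom."""
    atoms = []
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms.extend([symbol] * int(count or 1))
    return atoms

def substitute(atoms, elements=ELEMENTS):
    """Swap each distinct element in the reference for every other element."""
    candidates = []
    for target in dict.fromkeys(atoms):  # distinct elements, first-seen order
        for replacement in elements:
            if replacement != target:
                candidates.append(
                    [replacement if atom == target else atom for atom in atoms]
                )
    return candidates

reference = to_atom_array("NH4H2PO4")
candidates = substitute(reference)
```

With 4 distinct elements in the reference and 8 possible replacements for each, this small example already produces 32 candidate arrays, which shows how quickly the count grows with the full table.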
.
Once we have our candidates, I decided to first run initial screenings to remove the ones that obviously won't form stable crystals. I was initially going to test everything so we don't lose any new crystals, but from version 1 I learned the enormous amount of time and compute that running high-accuracy tests on these candidates takes (about 5 hours per test per molecule). Considering that I have over 10,000 CANDIDATES, and that I want to finish this within the decade, I decided to run initial screenings.
.
I use the Qwen 2.5 model for plausibility analysis and DeepSeek to reconstruct the crystals from the array form we created them in. When creating the candidates we break everything up into arrays. I thought putting them back together would be as simple as counting the number of times each element appears in an array and then creating the empirical formula using those counts as subscripts. BUT NO. CHEMISTRY HAS 1000 RULES AND 1000 MORE EXCEPTIONS FOR FORMING MOLECULES. Instead of coding in every single one of these rules and exceptions, I decided to just use an AI model with strong reasoning. I am currently testing whether this system works.
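For comparison, the simple counting approach described above can be sketched in a few lines. The function name `naive_formula` is made up for illustration. It shows why counting alone is not enough: the grouping of atoms into ions is lost.

```python
from collections import Counter

def naive_formula(atoms):
    """Build a formula by counting each element, in first-seen order."""
    counts = Counter(atoms)
    return "".join(
        symbol + (str(n) if n > 1 else "") for symbol, n in counts.items()
    )

atoms = ["N"] + ["H"] * 6 + ["P"] + ["O"] * 4
print(naive_formula(atoms))  # prints NH6PO4
```

The reference material is written NH4H2PO4, but counting merges the two hydrogen groups into H6, so the result no longer says which atoms belong together.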
.
Next I plan on improving the chemistry rule set that these AIs use so that I can make more accurate judgments about molecule formation and stability.