Moonshine - Stardance

6h 17m 11s logged

IMPORTANT NOTE: MOONSHINE TAKES A FEW HOURS TO RUN FOR THE BETA AND GAMMA LAYERS, SO JUST CHECK THE TERMINAL THAT OPENS UP WITH THE UI TO MAKE SURE EVERYTHING IS WORKING, THEN WAIT. YOU CAN ALSO SEE IT WORKING THROUGH THAT TERMINAL OUTPUT. JUST MAKE SURE TO DOWNLOAD OLLAMA AND THE TWO AI MODELS THAT GO WITH IT, GEMMA 3 AND DEEPSEEK R1.
.
.
.
.
In this last devlog I am showing how I turned my Python project into a file that anyone can download and use on their Mac without fiddling with Python files and GitHub repos. And boy, was that an adventure. I know quite a bit about building Python projects and everything to do with building projects inside VS Code. I know ABSOLUTELY NOTHING about turning that into something others can easily use. The battles I fought learning about file paths, directories, temp folders, and the whole shebang were enough to make Roman warriors run. But nonetheless, with Claude as my sensei and enough caffeine to send a toddler to the moon, I was able to figure it out, and I'm going to outline some of it now.

First, this is a multi-script project where you run each stage (Alpha through Epsilon), check its output, and then move on to the next stage, just because that's how I liked it best. To communicate information between the different scripts I used JSON files. But when packaging a Python project, you can't just write with open (, …) and a bare filename; the filename has to be a whole path. Learning that fact and how to implement it was a struggle.
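The whole-path idea can be sketched roughly like this. It is a minimal sketch, assuming a PyInstaller-style bundle (those set `sys.frozen` on the running app); the helper names `app_dir`, `save_stage_output`, and `load_stage_output` and the file name in the usage note are made up for illustration and are not the project's actual code.

```python
import json
import sys
from pathlib import Path


def app_dir() -> Path:
    """Folder that holds the app's data files.

    Assumption: a PyInstaller-style tool froze the app, which sets
    sys.frozen; in that case use the folder containing the executable.
    Otherwise fall back to the folder containing this script.
    """
    if getattr(sys, "frozen", False):
        return Path(sys.executable).resolve().parent
    return Path(__file__).resolve().parent


def save_stage_output(name, data, base=None):
    """Write one stage's results as JSON using a full path, not a bare name."""
    folder = Path(base) if base is not None else app_dir()
    path = folder / name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
    return path


def load_stage_output(name, base=None):
    """Read a previous stage's JSON back in from the same folder."""
    folder = Path(base) if base is not None else app_dir()
    with open(folder / name, encoding="utf-8") as f:
        return json.load(f)
```

With helpers like these, one stage would call something like `save_stage_output("alpha_output.json", results)` and the next stage would call `load_stage_output("alpha_output.json")`, so both resolve the same folder whether the code runs from source or from the packaged app.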

Then came the struggle of making a master script. At the beginning of my master script I was importing each Python file for each stage. That, mixed with not having a run function in each stage and using if __name__ == "__main__" to run all the main code for each stage, resulted in a ridiculous number of errors and problems such as scripts not running, multiple windows being created, and more.
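One common way out of that mess is to give every stage a run function and let the master script call them in order. The sketch below squeezes everything into one file so it runs on its own; in a real multi-file layout each `run_*` function would live in its own stage module. All names here are hypothetical placeholders, not the project's real functions.

```python
def run_alpha(state):
    """Stage 1 placeholder: would build the candidate arrays."""
    state["log"].append("alpha")
    return state


def run_beta(state):
    """Stage 2 placeholder: would screen the candidates."""
    state["log"].append("beta")
    return state


def run_gamma(state):
    """Stage 3 placeholder: would generate and relax structures."""
    state["log"].append("gamma")
    return state


def run_delta(state):
    """Stage 4 placeholder: would run the validity tests."""
    state["log"].append("delta")
    return state


def run_epsilon(state):
    """Stage 5 placeholder: would analyze the final structures."""
    state["log"].append("epsilon")
    return state


STAGES = [run_alpha, run_beta, run_gamma, run_delta, run_epsilon]


def run_pipeline(stages=STAGES):
    """Call each stage's run function once, in order, passing state along."""
    state = {"log": []}
    for stage in stages:
        state = stage(state)
    return state


# Only the master script should execute work at start-up. Importing a
# stage module then just defines its functions and runs nothing.
if __name__ == "__main__":
    print(run_pipeline()["log"])
```

The point of the design is that importing a file never does any work by itself: work only happens when the master script explicitly calls a run function, so each stage runs exactly once.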

The last big problem was how to package everything together. I tried command after command with nothing working, only to then learn that I had to move all my files into the dist folder created when packaging in order to make sure everything works properly. Now I just have to put this in a GitHub release and everyone can use the executable file I have made.


@kritik on Moonshine · 22 days ago

4h 55m 5s logged

Realizing the importance of a UI for this, I had Base44 whip up a UI that you can connect to the Python backend once you download it, so you can see the stats of the algorithm as it runs.

Ship #1 Pending review

@kritik on Moonshine · 26 days ago

I made a materials science engine meant to come up with candidates for novel materials that have a high likelihood of stability and validity, ready to hand off to more advanced algorithms. The challenging part was integrating all the different tools being used, as well as tuning every tool to create accurate structures in as little time as possible. I am proud of the beautiful structures created by this algorithm. What to know when testing this project: it's very unlikely a new material will be found on the first few tries. This has to be run for a while, so start at the Beta processing layer and run what is in the Alpha Arrays file. There are also lots of inaccuracies in this pipeline due to using off-the-shelf, locally hosted LLMs, so you might have to run each layer a few times, and not all candidates will be made.

4 devlogs
47h



@kritik on Moonshine · 26 days ago

30h 17m 51s logged

MOONSHINE PROJECT FULLY FINISHED. In this devlog I am going to share how I was finally able to complete the Moonshine algorithm into a system that can autonomously generate novel crystal structures.

First, the Beta and Gamma processing layers were changed to use locally hosted Ollama models for the AI reasoning. A few other changes were made in the Gamma layer as well. After constant experimenting with different AI models and architectures for the pipeline, this was the setup I landed on. In the Gamma layer, Ollama uses DeepSeek to generate parameters for PyXtal crystal structure generation. I learned through constant experimentation that the randomness of PyXtal's crystal structure generation is much worse than I initially thought. I thought that with 10-20 tries the right structure would be generated. But no. Now the pipeline takes the parameters generated by DeepSeek and creates 500 PyXtal crystal structures. This adds about 3 minutes to the compute time for each candidate coming in from the Beta Processing Layer, but increases total accuracy by about 40%. The reason this helps is that PyXtal doesn't get the structure right the first time, so by creating 500 and then picking the lowest-energy one we get the best structure we can from PyXtal.

The next change was to the CHGNet relaxation phase inside the Gamma Processing Layer. Initially I was using a random combination of convergence values and structure optimizers. Now there are two passes in the relaxation phase. The first pass is a coarse relaxation that creates a rough initial structure; it keeps the convergence value at 0.01 and uses the basic UnitCellFilter. In the stricter final pass the convergence value is tightened to 0.001 and the more advanced FrechetCellFilter is used. This has two benefits. First, after the first pass the structure can be checked for validity and discarded if it is invalid before moving on to the more expensive relaxation. Second, the structure created by PyXtal is relaxed more gently: it first gets an initial form and is then tightened down to its final form, instead of the initial broken PyXtal structure being forced immediately into a tight final structure. It's kind of like working with clay when making pots.

Next comes the new Delta Processing Layer. It tests the structures made by the Gamma Processing Layer further by subjecting them to a suite of tests: atomic overlap, composition, coordination number plausibility, oxidation number plausibility, and structure matching against the Materials Project. This is done five times per candidate from Beta, since Gamma produces five structures per candidate. This effectively tests validity and novelty for each structure.

Finally comes the new Epsilon Processing Layer. This layer analyzes the structures to give a better understanding of what we created, and also creates slab structures from the unit cell structure. Slab structures are just the unit cell repeated in 3D space and sliced along a plane to expose a surface. Unit cell analysis returns density, unit cell volume, lattice parameters, final space group symmetry, local atomic geometry, bond lengths, bond angles, and prototype classification. Then comes the slab analysis. First the slabs are created by pymatgen, the main library used in this entire project, and then they are analyzed for the following: surface symmetry, polarity, surface normal vectors, scale factors, center of mass alignment, termination layer composition, candidate adsorption sites, and total surface area. Slabs are created along three Miller indices, (1,0,0), (1,1,0), and (1,1,1); these are just numbers that tell how the crystal is sliced in 3D space in reference to the three axes.

And that's the algorithm I spent over a month on. Now I just have to leave it running for a while and see if it finds any new crystal materials.
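The "make 500 and keep the lowest-energy one" step from the Gamma layer can be sketched in a few lines. This is only an illustration of the selection logic: `generate` and `energy_of` are stand-ins passed in by the caller, and the toy versions below use plain numbers instead of real PyXtal structures and a real energy model.

```python
import random


def pick_lowest_energy(generate, energy_of, n_trials=500, seed=0):
    """Generate n_trials candidates and return (best, best_energy).

    generate(rng) returns a candidate, or None when generation fails.
    energy_of(candidate) returns a float, lower is better.
    Both are stand-ins supplied by the caller.
    """
    rng = random.Random(seed)
    best, best_energy = None, float("inf")
    for _ in range(n_trials):
        candidate = generate(rng)
        if candidate is None:  # failed generation attempt, skip it
            continue
        energy = energy_of(candidate)
        if energy < best_energy:
            best, best_energy = candidate, energy
    return best, best_energy


# Toy stand-ins: a "structure" is a single number and its "energy" is the
# squared distance from 0.3. Real code would build a random structure and
# score it with an energy model instead.
def toy_generate(rng):
    return rng.uniform(0.0, 1.0)


def toy_energy(x):
    return (x - 0.3) ** 2


best, energy = pick_lowest_energy(toy_generate, toy_energy, n_trials=500)
few, few_energy = pick_lowest_energy(toy_generate, toy_energy, n_trials=10)
```

Because both calls use the same seed, the 500-trial run sees the same first 10 candidates as the 10-trial run plus 490 more, so its best energy can only be equal or lower. That is the whole argument for spending the extra compute per candidate.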


@kritik on Moonshine · about 2 months ago

8h 23m 35s logged

I finished the Gamma Processing Layer. This is the part of the file that takes in the formulas we created and screened in the Beta Processing Layer, creates their structures, and runs a few initial validity tests on them.

.

First we start off by importing all the infinity stones of materials science. Then we get into our functions.

.

First comes the Initial Super Gamma function. Ooooo fancyyy. No, just kidding, I named everything so that I could tell which step of the process it belongs to just by reading the name. Initial tells me it's the first, and the keyword Super tells me it's one of the first functions, if not the first, to be run. Since keeping track of the order in which every function runs is super important for building this well, this helps a lot. In the Initial Super Gamma function we have an LLM whose job is to take a look at the formula we created in the Beta Processing Layer and figure out the correct inputs to put into PyXtal for that formula. I will get into PyXtal in a bit.

.

Next comes the SubGammaFunction. This function is nothing more than an overglorified string parser. It takes the output from the LLM, finds the key information we want (Dim, Group, Species, NumIons), and extracts it into variables we can input into PyXtal. Now all that seems straightforward, right? So you can imagine my surprise when I was getting error after error parsing the LLM response. I was so confused: for some reason it could not extract the species from the response???? Yeah no, it turned out I spelled species wrong in the function call. :)
I know I make complaint jokes, but it's bugs, errors, and moments like that that really make coding such a laugh. Even those moments I really do enjoy.
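A parser of that kind might look something like the sketch below. The response format is an assumption: it supposes the LLM was told to put each value on its own `Key: value` line, which may not match the real prompt. The function name `parse_pyxtal_inputs` is hypothetical too.

```python
def parse_pyxtal_inputs(response: str) -> dict:
    """Pull Dim, Group, Species and NumIons out of an LLM response.

    Assumes one "Key: value" line per field; any other chatter is ignored.
    """
    wanted = {"dim", "group", "species", "numions"}
    found = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in wanted:
            found[key] = value.strip()

    # Fail loudly and name the missing field, so a misspelled key shows
    # up immediately instead of as a confusing error further down.
    missing = wanted - found.keys()
    if missing:
        raise ValueError(f"missing fields in LLM response: {sorted(missing)}")

    return {
        "dim": int(found["dim"]),
        "group": int(found["group"]),
        "species": [s.strip() for s in found["species"].split(",")],
        "num_ions": [int(n) for n in found["numions"].split(",")],
    }


sample = """Sure, here is my reasoning about the structure.
Dim: 3
Group: 225
Species: Na, Cl
NumIons: 4, 4
"""
params = parse_pyxtal_inputs(sample)
```

Raising an error that lists exactly which field was not found is the part that would have caught the misspelled species bug in seconds.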

.

Now back to the project! After parsing out all the information we can finally get to the interesting part 😈. CREATING AWESOME CRYSTAL STRUCTURES!!!!! PyXtal is a Python tool that lets us randomly generate structures that don't disobey fundamental rules of crystallography, such as Wyckoff positioning. The main challenge I had to overcome here was maintaining the structural group I had assigned to each crystal. PyXtal would actively try to create whatever structure it thought was best, but I wanted to preserve certain geometries to test, since that is the most important thing in crystal creation.

.

ANOTHER HUGE PROBLEM: I KEPT RUNNING OUT OF FREE CREDITS TO USE THE HUGGINGFACE AI MODELS. In the middle I tried to switch to the locally hosted Ollama models I had used before, but then remembered why I left them: THEY TAKE FOREVER.

.

After we create the PyXtal structures we run them through CHGNet to relax them. PyXtal SUCKS at creating low-energy, high-stability structures; it's good at creating structures that don't break certain rules, such as Wyckoff positioning. After CHGNet relaxes the structure, it gives us the stress values on the structure and the forces acting on the lattice of the crystal. We run 2 small loops to check each value and make sure it's under 0.1, the threshold at which a structure is considered unstable for both metrics. If it passes the tests, its structure file is stored.
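The two small loops can be sketched as below. The data shapes are assumptions: forces as one `[fx, fy, fz]` list per atom and stress as a 3x3 nested list, both in whatever units the relaxation step reports. Comparing absolute values is also an assumption, made here because force and stress components can be negative.

```python
def is_stable(forces, stress, limit=0.1):
    """Return True when every force and stress component is under the limit."""
    # Loop 1: one [fx, fy, fz] entry per atom.
    for atom_force in forces:
        for component in atom_force:
            if abs(component) >= limit:
                return False
    # Loop 2: the rows of the 3x3 stress tensor.
    for row in stress:
        for component in row:
            if abs(component) >= limit:
                return False
    return True


relaxed_forces = [[0.01, -0.02, 0.00], [0.03, 0.00, -0.01]]
relaxed_stress = [[0.05, 0.0, 0.0], [0.0, 0.04, 0.0], [0.0, 0.0, 0.06]]
strained_stress = [[0.05, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.06]]

keep = is_stable(relaxed_forces, relaxed_stress)
discard = not is_stable(relaxed_forces, strained_stress)
```

Returning as soon as one value fails keeps the check cheap, and only structures where `is_stable` is True would have their structure file written out.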


@kritik on Moonshine · about 2 months ago

4h 31m 9s logged

First I wrapped up the first step of this pipeline. I renamed Array Process to Alpha Process, and inside it created functions for each step of that process, making everything clean, professional, and modular.

.

After that I created a file called BetaProcess and restarted the Chemistry Checks there, since there turned out to be MANY more steps to that process than I had initially thought. Now the arrays we created in the Alpha Process are first sent to DeepSeek to be put back together into molecular formulas. This requires a new, overly long prompt that details all the steps the AI must take and the chemistry rules and exceptions it must consider. I took some time perfecting this prompt. After we get the reconstructed molecule we have to parse it out. Right now the LLM gives the molecule along with a whole response. I originally tried using re to parse out the molecular formula. That didn't work, so I had to format the LLM response to keep the answer at the end so I could just run normal string parsing to get the molecule. Then it's shot off to the same LLM and prompt as in the last devlog to find out how plausible the molecule is.
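Keeping the answer at the end makes the parsing almost trivial. Here is a minimal sketch, assuming the prompt tells the model to finish with a line that starts with a fixed marker; the marker text `FINAL ANSWER:` and the function name are made up for this example.

```python
def extract_final_formula(response: str, marker: str = "FINAL ANSWER:") -> str:
    """Return the text that follows the LAST occurrence of the marker."""
    # rpartition searches from the right, so a marker the model happens
    # to mention earlier in its explanation is ignored.
    _, sep, tail = response.rpartition(marker)
    if not sep:
        raise ValueError("marker not found in LLM response")
    tail = tail.strip()
    if not tail:
        raise ValueError("nothing after the marker in LLM response")
    # Keep only the first line after the marker, in case the model
    # keeps talking afterwards.
    return tail.splitlines()[0].strip()


response = (
    "The array contains one N, six H, one P and four O.\n"
    "Grouping them as an ammonium cation and a dihydrogen phosphate anion...\n"
    "FINAL ANSWER: NH4H2PO4\n"
)
formula = extract_final_formula(response)
```

No regular expression is needed: plain string methods are enough once the answer is guaranteed to sit at the end of the response.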

.

The challenge in this phase was making sure the DeepSeek model could put together even the hardest types of molecules, so a big part of this was continuous testing and modifying of the prompt.

.

LLMs are strange in how they always feel the need to give a 2-page explanation for every response. Because of that, a big challenge was figuring out how to parse out the information we want from each LLM response.

.

I was getting a headache looking at the outputs of this phase. Just words and numbers mushing together everywhere. So I spent a lot of time formatting the output of this phase to make the information readable, understandable, and intuitive to follow.


@kritik on Moonshine · about 2 months ago

3h 18m 25s logged

I started building version 2 of Project Moonshine, an algorithm that's meant to discover new crystals. This new engine takes a reference material such as NH4H2PO4 as input and breaks it into individual elements. It then systematically replaces every element with every element from the periodic table to generate THOUSANDS of potential new crystals. I got this idea from how real materials science discovery takes place. Scientists mix and mash elements in a simulation, using their extensive knowledge of chemistry to create new materials they think could work. Then atoms and elements are changed some more to see what else can be created.
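The substitution idea can be sketched as follows. It is a simplified illustration, not the project's code: it assumes a flat formula with no parentheses, swaps one site at a time, and uses a tiny element list in place of the full periodic table. The function names are hypothetical.

```python
import re


def split_formula(formula):
    """Split a flat formula into [symbol, count] pairs, in written order."""
    return [
        [symbol, int(count) if count else 1]
        for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    ]


def substitute_one_site(formula, elements):
    """Swap each site in the formula for each replacement element, one at a time."""
    parts = split_formula(formula)
    candidates = []
    for i, (original, _) in enumerate(parts):
        for element in elements:
            if element == original:  # swapping an element for itself changes nothing
                continue
            swapped = [list(part) for part in parts]
            swapped[i][0] = element
            candidates.append(
                "".join(sym + (str(n) if n > 1 else "") for sym, n in swapped)
            )
    return candidates


# Three elements stand in for the whole periodic table.
pool = ["Na", "K", "H"]
candidates = substitute_one_site("NH4H2PO4", pool)
```

Even with three replacement elements the five sites of NH4H2PO4 give 13 raw candidates, which shows how quickly the count climbs into the thousands with the full table. Chemically silly results such as HH4H2PO4 are expected at this point, since screening happens in a later step.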

.

Once we have our candidates, I decided to first run initial screenings on them to remove the ones that obviously won't form stable crystals. I was initially going to test everything so we don't lose any new crystals, but from version 1 I learned the enormous amount of time and compute it takes to run high-accuracy tests on these candidates (about 5 hours per test per molecule). Considering the fact that I have over 10,000 CANDIDATES! and that I want to finish this within the decade, I decided to run initial screenings.

.

I use the Qwen 2.5 model for plausibility analysis and DeepSeek to reconstruct the crystals from the array form that we created them in. When creating the candidates we break everything up into arrays. I thought the process of putting them back together would be as simple as counting the number of times each element appears in an array and then creating the empirical formula from those counts as subscripts. BUT NO. CHEMISTRY HAS 1000 RULES AND 1000 MORE EXCEPTIONS IN MOLECULE FORMING. Instead of coding in every single one of these rules and exceptions, I decided to just use an AI model with strong reasoning. I am currently testing whether this system works.
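To see why plain counting falls short, here is the naive version. The array layout, one entry per atom, is an assumption about how the candidates are stored, and `naive_formula` is a hypothetical name.

```python
from collections import Counter


def naive_formula(atoms):
    """Count each element and write the counts as subscripts, alphabetically."""
    counts = Counter(atoms)
    return "".join(
        symbol + (str(n) if n > 1 else "")
        for symbol, n in sorted(counts.items())
    )


# NH4H2PO4 broken into one array entry per atom.
atoms = ["N"] + ["H"] * 4 + ["H"] * 2 + ["P"] + ["O"] * 4
flat = naive_formula(atoms)  # "H6NO4P"
```

The counts are correct, but H6NO4P says nothing about the NH4 and H2PO4 groups that make the compound what it is. Recovering that grouping is where the chemistry rules and exceptions come in.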

.

Next I plan on improving the chemistry rule set that these AIs use so that they can make more accurate judgments about molecule formation and stability.