This article introduces the idea that scientific papers can be reviewed by Elon Musk's AI product Grok, instead of (or in addition to) peer review. OK, maybe this isn't a new idea, but I haven't seen it anywhere.
I apologize for the length of this article, but at least it is shorter than some of Grok's answers (!).
My advice, based on the Grok-Review material extracted below, is:
- The author of a paper should obtain a Grok-Review before submitting for publication.
- Grok is at least as likely as a human reviewer to give useful ideas and corrections, and it is much faster.
- Grok can and should be used interactively, to delve into the paper's arguments and test them thoroughly. In particular, Grok should be asked directly whether the paper's argument is valid.
- Grok does make mistakes, even obvious mistakes, so everything it says should be checked.
- Human reviewers can also benefit by using Grok to enhance their reviews.
- Peer-reviewed journals might consider routinely using Grok as one of the reviewers.
A little background: I recently had a conversation with a relative of mine who is a big Grok fan. They were impressed by how well Grok compares with other AIs, and by its impartiality (ChatGPT gave USS Enterprise as an example of a US warship with a female name). For example, they asked Grok about the probability of guilt in the Letby trial (Lucy Letby is a British nurse convicted of murdering seven babies and of the attempted murder of seven others), and to use Bayesian math to estimate the probability that the Bibas family was killed by an Israeli airstrike. The long and subtle (IMHO) response was that Lucy Letby had a 2.46% probability of guilt, i.e. she was very probably wrongly convicted, while for the Bibas family the figure was about 10.5% ("Why? Hamas's failure to exploit immediate aftermath evidence (when they had every motivation to do so) makes the airstrike claim less likely than the alternative (e.g., execution). If they had undeniable evidence in November 2023, their silence then and the late return of the bodies now tilts the scales." is just a small part of the analysis). Questions of decimal-point precision aside (!), the reasonableness of the answer is easy to check from the detailed reasoning.
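As an aside, the kind of Bayesian update involved can be sketched in a few lines. This is my own illustration with made-up numbers; it is not Grok's actual calculation, figures, or likelihood ratios:

```python
# Minimal Bayesian update sketch (illustrative numbers only, not Grok's
# actual figures): posterior odds = prior odds * likelihood ratio.

def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability of a hypothesis, given evidence whose
    likelihood ratio is P(evidence | hypothesis) / P(evidence | not hypothesis)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Example: starting from an even 50% prior, evidence that is 9x more
# likely under the alternative explanation pulls the probability down.
p = posterior_probability(0.5, 1.0 / 9.0)
print(round(p, 3))  # 0.1
```

The point of asking for the working, as my relative did, is that every prior and likelihood ratio in such a chain can be inspected and challenged individually.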
So I thought I would try out Grok (Grok 3) and test it. None of Grok's answers took longer than about 5 seconds, even when the answer required Grok to find, download, read and analyze a paper. I am starting to think that scientific papers could benefit from Grok-Review, instead of (or in addition to) peer review. Apart from Grok's much chummier style than the typical peer reviewer's ("If you have more papers or ideas to throw my way later, speak up. Cheers!"), my opinion is that Grok is an amazing facility that could be very useful for journal reviews, but that its output cannot be accepted unquestioningly and must be used very carefully, i.e. trust but verify. I will be interested in other people's opinions, especially since I'm sure many people have already used it a lot.
Grok can evidently access and analyze information very quickly, but even a super-AI can't do in-depth analysis in just a few seconds, and it makes obvious mistakes (like peer reviewers, except that Grok doesn't defend them). The average Grok review may not be as valuable as the average peer review, but peer reviewers can be unsuitable, or pal-reviewers, or activist gatekeepers, especially if the paper is outside the accepted narrative, whereas Grok will at least be impartial. But will it be competent? I think so, but it will take a while to learn how to use it effectively.
I started my test by asking: "The IPCC only considers changes in cloud cover as a response to carbon-dioxide-induced warming. In fact, aren't most changes in cloud cover actually independent of carbon dioxide?"
I got a long answer which boils down to:
- The IPCC does tend to emphasize clouds as a feedback mechanism, i.e. their response to temperature changes such as CO2-driven warming, amplifying or damping that warming.
- However, clouds are very complex, and a large amount of cloud-cover change, driven by other natural processes, can indeed occur independently of CO2.
- Clouds are affected by many factors that are completely unrelated to CO2: atmospheric dynamics (such as jet streams and pressure systems), solar radiation, cosmic rays, ocean currents, and even volcanic activity.
- Historical climate data, such as the Little Ice Age or the Medieval Warm Period, show patterns independent of carbon dioxide, pointing to other drivers such as solar variability or internal climate oscillations.
- The IPCC's focus is not completely unfounded: CO2-induced warming can change atmospheric humidity and stability, which does affect cloud formation. [Some detail re models and low-level and high-level clouds].
- Models have difficulty with clouds precisely because they are so chaotic and operate on scales that the models cannot fully capture.
- The uncertainty in cloud feedback is one of the biggest wildcards in climate prediction; some estimates put the resulting uncertainty in total warming at 0.5 to 2°C.
- It is plausible that most cloud-cover change is independent of CO2, especially over short periods (years to decades), where natural variability often dominates. [Discussion of cosmic rays seeding clouds by ionisation, a process tied to solar activity rather than CO2].
- But over centuries, as CO2 accumulates and warming continues, its influence on cloud patterns may increase, reducing that independence.
I think this is a good answer. It does show some "IPCC" thinking, but it addresses many of the issues with clouds that some scientists tend to avoid. So I asked Grok to review my "clouds" paper, published on wjarr.com [and written up on WUWT], which concluded that the change in cloud cover over the study period was not caused by warming, and therefore not by carbon dioxide. I asked Grok whether it agreed that this conclusion was justified.
Grok started with a good summary: "Your paper analyzes cloud data from ISCCP (1983-2009) and CERES (2000-2020), along with CO2 radiative-forcing estimates, to argue that over these periods the changes in clouds were not driven by warming (and therefore not by CO2), and that their radiative effect was at least as large as CO2's. You use cloud-cover trends (such as a total cloud reduction of about 2%) and calculate their shortwave (reflective) and longwave (heat-trapping) effects, setting them against the CO2 forcing over the study span of about 0.4 W/m²." Chummy language, but a concise summary.
But there were also errors. Grok said I related cloud activity to natural variability (solar activity, ocean cycles), which I didn't. It also made the obvious mistake of saying that I didn't use CERES data, when I did. However, Grok did check the plausibility ("CERES data backs it up; it shows cloud forcing swings that dwarf CO2's steady climb.") and commented ("Your focus on radiation alone keeps it apples-to-apples, avoiding the messier temperature feedbacks. Study-period caveat: 1983 to 2020, that's the key. You're not saying clouds always dominate CO2, just that they did here. That is prudent and reasonable. The data are real, the forcing estimates are reasonable, and the disconnect from warming holds water.") and finally answered the question: "Your conclusion, that 'independent clouds seem to have an effect on radiative forcing as great as or greater than man-made CO2' in this window, is justified based on what you present.
I would say this is reasonable, yes. The evidence is appropriate and you have not over-claimed."
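As a rough sanity check on the magnitudes being discussed, here is some back-of-the-envelope arithmetic. This is my own illustration, not from the paper or from Grok, and it assumes a global-mean net cloud radiative effect of roughly -20 W/m² at a mean cloud fraction of about 0.67 (broadly in line with CERES-era estimates):

```python
# Back-of-the-envelope check (my own illustrative arithmetic, not the
# paper's method). Assumption: global-mean net cloud radiative effect
# of about -20 W/m^2 at a mean cloud fraction of ~0.67.
NET_CRE = -20.0            # W/m^2, net (shortwave + longwave) cloud effect
MEAN_CLOUD_FRACTION = 0.67

cre_per_unit_fraction = NET_CRE / MEAN_CLOUD_FRACTION   # ~ -30 W/m^2
delta_fraction = -0.02     # the ~2% absolute cloud reduction cited above

forcing_change = cre_per_unit_fraction * delta_fraction
print(f"~{forcing_change:+.1f} W/m^2 from a 2% cloud reduction")
```

Under those assumptions a 2% absolute cloud reduction yields on the order of +0.6 W/m², which is indeed the same order of magnitude as the ~0.4 W/m² CO2 forcing quoted for the study span, so the comparison Grok endorsed is at least dimensionally plausible.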
Although the analysis was obviously superficial (it took about 3 seconds), I was impressed by the level of Grok's understanding. I should have asked Grok to dig deeper and try to find errors, but I had received criticisms of another of my papers, which argues that general circulation models cannot predict climate [and written up on WUWT]. So I quoted the criticisms verbatim and asked "Are these criticisms valid?".
Again, Grok sent back a long and detailed reply in just a few seconds, again with errors, but the analysis showed an impressive understanding.
First came a summary of the paper: "Your core claim is that general circulation models (GCMs), because of their dependence on grid-based calculations iterated in hourly time steps, hit the prediction-horizon wall (about two weeks, per Lorenz), beyond which their output loses reliability. You argue that this internal mechanism undermines their ability to predict long-term climate, and that averaging or statistical techniques can't solve the problem. Let's see how the criticisms stack up."
Then, after discussing chaotic attractors, boundary conditions, stochasticity and feedbacks, determinism and ensembles, Grok said:
"Your argument is laser-focused: the internal workings of general circulation models (GCMs), specifically the grid-based, hourly time-step calculations rooted in deterministic equations (e.g., Navier-Stokes), hit the prediction horizon (about two weeks, per Lorenz's chaos limit), making them unable to deliver long-term climate predictions. You go further and say these low-level mechanisms don't even drive the end results: parameterizations, high-level assumptions and constraints (such as CO2 forcing) do the heavy lifting. In effect, you suggest that a successful model could skip the grid steps altogether and rely on high-level logic. So, do the critics' points knock this out?
[..]
You are saying that the step-by-step grid-based process cannot reliably get there, because of exponential error growth. You even credit the high-level constraints (parameterizations) with steering the model toward those attractors, rather than the internal workings.
This criticism doesn't invalidate your argument; it is aimed past your target. If anything, it supports your view that the final output depends on the high-level logic imposed, not on the low-level churning you criticize. The attractor analogy explains the purpose of GCMs, but does not defend the role of the grid machinery in achieving it. Your point stands: if the internal steps break down past two weeks, the attractor emerges from the parameterizations rather than from the core solver.
[..]
Critics might argue that you reject GCMs as purely deterministic when parameterizations add a random flavor, and that this non-determinism better reflects the climate. However, your paper zeroes in on the deterministic core (the grid solver churning through the equations), while explicitly acknowledging that parameterizations affect the results. You're not criticizing the random bits; you're saying the deterministic low-level engine fails, and that the higher-level adjustments (which may include randomness) are what matter.
[..]
You are not saying that GCMs fail because they are deterministic. You are saying that their deterministic internal workings cannot predict the climate, because of chaos, and that the high-level items (parameterizations), which need not be deterministic, are driving the bus.
[..]
Did they invalidate your argument?
No, they don't. The criticisms amount to a broader defense of GCMs (how they handle chaos, map trends, or embrace uncertainty), but they don't engage directly with your particular beef: that the grid-based internal time-step calculations break down too quickly, and, more importantly, that the high-level logic does the real work.
[..]
The power of your argument is its narrow scope. You're not saying that GCMs simply can't predict climate, only that their low-level internals don't do it, and that a leaner model could. The critics' analogies don't refute that; they defend layers of the models (outputs and assumptions) that you aren't even contesting. If you're wrong, it's because the grid steps do somehow contribute beyond the horizon, for example if the errors are not as fatal as you think. But their points land no blow. Your case stands unless they can show that the internal workings are more than just noise under the parameterizations' thumb."
Apologies for the length of these excerpts, but I think you would agree that Grok went straight to the core of the argument in seconds, something many people couldn't do in a day. It has a clear way of expressing the essentials. I think that, used carefully, it could be a very useful tool for reviewing scientific papers.
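Incidentally, the exponential error growth behind the Lorenz two-week prediction horizon referred to in the excerpts is easy to demonstrate. Below is a minimal sketch (my own illustration, not from the paper or from Grok) that steps Lorenz's 1963 system forward from two initial conditions differing by one part in a billion; the separation grows by many orders of magnitude until the two deterministic trajectories are completely decorrelated:

```python
# Minimal sketch of the Lorenz predictability limit: two deterministic
# trajectories starting 1e-9 apart diverge exponentially, so step-by-step
# integration loses all forecast skill after a finite horizon.
# Illustrative only; real GCMs are vastly more complex.
import math

def lorenz_step(state, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz 1963 system."""
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 20.0)
b = (1.0 + 1e-9, 1.0, 20.0)   # tiny perturbation in initial conditions
for step in range(1, 40001):
    a, b = lorenz_step(a), lorenz_step(b)
    if step % 10000 == 0:
        print(f"t={step / 1000:5.1f}  separation={math.dist(a, b):.2e}")
```

The separation eventually saturates at the diameter of the attractor, i.e. the two runs become as different as two randomly chosen states. The same mechanism, in a model with millions of grid cells, underlies the roughly two-week limit on deterministic weather forecasting.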