<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://chunzhuo.github.io/zh/feed.xml" rel="self" type="application/atom+xml"/><link href="https://chunzhuo.github.io/zh/" rel="alternate" type="text/html" hreflang="en"/><updated>2026-05-12T15:35:37+00:00</updated><id>https://chunzhuo.github.io/feed.xml</id><title type="html">Chunzhuo Zhang</title><subtitle>Researcher working at the intersection of AI4Bio, bioinformatics, and machine learning. </subtitle><entry xml:lang="en"><title type="html">Single-cell Perturb-seq CRISPRi</title><link href="https://chunzhuo.github.io/zh/blog/2026/perturb-seq-crispri/" rel="alternate" type="text/html" title="Single-cell Perturb-seq CRISPRi"/><published>2026-05-11T00:00:00+00:00</published><updated>2026-05-11T00:00:00+00:00</updated><id>https://chunzhuo.github.io/blog/2026/perturb-seq-crispri</id><content type="html" xml:base="https://chunzhuo.github.io/blog/2026/perturb-seq-crispri/"><![CDATA[<p>CRISPRi is a useful perturbation because it behaves like a dimmer switch: the guide RNA brings a catalytically inactive Cas9 repressor to a regulatory region, and transcription drops without making a DNA double-strand break. Perturb-seq adds a pooled single-cell readout, so each cell carries both a perturbation identity and a transcriptome.</p> <p>The interactive view below is one continuous 3D cell, not a sequence of separate plots. 
Drag the cell to rotate it, scroll to zoom, or use the focus buttons to move from the whole cell into the chromatin zone, the open sgRNA target sequence, and the transcript readout.</p> <div class="perturbseq-crispri-viewer" data-zoom="whole"> <div class="perturbseq-toolbar" aria-label="Perturb-seq CRISPRi controls"> <div class="perturbseq-focus-buttons"> <button type="button" data-focus-target="cell">Whole cell</button> <button type="button" data-focus-target="nucleus">Chromatin</button> <button type="button" data-focus-target="binding">Open target</button> <button type="button" data-focus-target="readout">Readout</button> </div> <div class="perturbseq-zoom-controls"> <button type="button" data-zoom-action="out">-</button> <button type="button" data-zoom-action="in">+</button> <button type="button" data-zoom-action="reset">Reset view</button> <span class="perturbseq-zoom-status">100%</span> </div> </div> <div class="perturbseq-canvas-wrap"> <div class="perturbseq-three-stage" role="img" aria-label="Rotatable 3D cell view of Perturb-seq CRISPRi with organelles and sgRNA target binding"> <canvas class="perturbseq-three-canvas"></canvas> <div class="perturbseq-three-hint">Drag to rotate · Scroll to zoom</div> </div> <svg class="perturbseq-svg" viewBox="0 0 1000 700" role="img" aria-label="Zoomable whole-cell view of Perturb-seq CRISPRi with organelles and sgRNA target binding"> <defs> <marker id="perturbseq-arrow" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse"> <path d="M 0 0 L 10 5 L 0 10 z" fill="currentColor"></path> </marker> </defs> <g class="perturbseq-viewport"> <text class="perturbseq-title" x="34" y="44">Single-cell Perturb-seq CRISPRi</text> <text class="perturbseq-small" x="34" y="66">A single perturbed cell with zoomable organelles, CRISPRi target binding, transcripts, and guide identity.</text> <path class="perturbseq-cell-body" d="M135 348 C132 212 232 120 378 96 C560 65 775 126 857 266 C935 399 867 
556 707 618 C554 678 329 636 219 531 C162 476 136 420 135 348 Z"></path> <path class="perturbseq-cytoplasm-texture" d="M222 236 C312 198 380 194 468 218 M678 182 C756 226 796 290 796 360 M232 520 C325 564 464 580 592 558 M716 508 C774 472 814 418 816 348"></path> <g aria-label="nucleus"> <ellipse class="perturbseq-nucleus perturbseq-pulse" cx="455" cy="315" rx="205" ry="150"></ellipse> <ellipse class="perturbseq-nucleolus" cx="374" cy="360" rx="48" ry="35"></ellipse> <path class="perturbseq-dna" d="M306 292 C356 238 430 334 482 283 S596 256 625 323"></path> <path class="perturbseq-dna" d="M302 335 C366 386 421 283 488 346 S590 394 636 338" opacity="0.72"></path> <rect class="perturbseq-target-window perturbseq-pulse" x="430" y="244" width="142" height="72" rx="10"></rect> <text class="perturbseq-label" x="356" y="187">nucleus</text> <text class="perturbseq-small perturbseq-detail-medium" x="315" y="405">nucleolus</text> </g> <g aria-label="CRISPRi binding site"> <path class="perturbseq-dna perturbseq-detail-high" d="M442 278 C468 259 493 302 520 279 S552 268 567 287"></path> <circle class="perturbseq-cas9 perturbseq-pulse" cx="492" cy="282" r="18"></circle> <rect class="perturbseq-krab perturbseq-pulse" x="504" y="252" width="42" height="24" rx="8"></rect> <path class="perturbseq-sgrna perturbseq-pulse" d="M461 294 C474 319 506 320 519 297"></path> <path class="perturbseq-sgrna perturbseq-detail-high" d="M467 309 q8 12 16 0 q8 -12 16 0 q8 12 16 0"></path> <rect class="perturbseq-rnap" x="558" y="305" width="62" height="30" rx="15"></rect> <path class="perturbseq-transcript perturbseq-target-transcript" d="M620 321 C648 322 674 332 698 352"></path> <path class="perturbseq-transcript perturbseq-detail-high" d="M621 321 q10 -10 20 0 q10 10 20 0 q10 -10 20 0"></path> <text class="perturbseq-label perturbseq-detail-medium" x="518" y="238">dCas9-KRAB</text> <text class="perturbseq-small perturbseq-detail-high" x="425" y="333">sgRNA pairs with target sequence</text> <text 
class="perturbseq-small perturbseq-detail-high" x="586" y="354">reduced nascent transcript</text> </g> <g aria-label="endoplasmic reticulum"> <path class="perturbseq-er" d="M614 262 C705 246 766 278 775 346 C784 415 724 450 646 430"></path> <path class="perturbseq-er" d="M622 298 C695 291 734 316 736 358 C738 400 699 413 648 398" opacity="0.7"></path> <text class="perturbseq-small perturbseq-detail-medium" x="720" y="248">ER</text> </g> <g aria-label="mitochondria"> <ellipse class="perturbseq-mito" cx="255" cy="270" rx="62" ry="28" transform="rotate(-22 255 270)"></ellipse> <path class="perturbseq-detail-medium" d="M214 278 C238 248 258 294 296 260" fill="none" stroke="var(--ps-red)" stroke-linecap="round" stroke-width="2"></path> <ellipse class="perturbseq-mito" cx="730" cy="486" rx="68" ry="30" transform="rotate(18 730 486)"></ellipse> <path class="perturbseq-detail-medium" d="M684 478 C713 513 737 456 779 496" fill="none" stroke="var(--ps-red)" stroke-linecap="round" stroke-width="2"></path> <text class="perturbseq-small perturbseq-detail-medium" x="198" y="224">mitochondrion</text> </g> <g aria-label="golgi and vesicles"> <path class="perturbseq-organelle" d="M286 472 C340 438 392 448 431 488 C380 480 336 488 286 472 Z"></path> <path class="perturbseq-organelle" d="M296 502 C344 482 389 489 422 518 C372 516 332 519 296 502 Z" opacity="0.75"></path> <circle class="perturbseq-organelle" cx="452" cy="517" r="13"></circle> <circle class="perturbseq-organelle" cx="478" cy="496" r="9"></circle> <text class="perturbseq-small perturbseq-detail-medium" x="320" y="548">Golgi / vesicles</text> </g> <g aria-label="transcripts and guide molecules"> <path class="perturbseq-transcript" d="M610 464 C655 445 680 471 715 455"></path> <path class="perturbseq-transcript" d="M535 518 C578 500 622 535 660 509"></path> <path class="perturbseq-transcript perturbseq-target-transcript" d="M608 390 C636 405 660 396 682 418"></path> <circle class="perturbseq-guide-dot" cx="592" cy="494" 
r="9"></circle> <circle class="perturbseq-guide-dot" cx="629" cy="536" r="7" style="animation-delay: -1.2s;"></circle> <text class="perturbseq-small perturbseq-detail-medium" x="612" y="576">guide identity + transcriptome stay linked to this cell</text> </g> <g aria-label="single-cell capture barcode"> <rect class="perturbseq-callout" x="715" y="84" width="220" height="88" rx="8"></rect> <text class="perturbseq-label" x="734" y="114">Perturb-seq readout</text> <text class="perturbseq-small" x="734" y="138">cell barcode + UMI</text> <text class="perturbseq-small" x="734" y="156">mRNA reads + guide tag</text> <path d="M720 174 C690 235 684 326 680 416" fill="none" stroke="var(--ps-muted)" stroke-dasharray="7 7" stroke-linecap="round" stroke-width="2" marker-end="url(#perturbseq-arrow)"></path> </g> <g class="perturbseq-detail-high" aria-label="zoom labels"> <rect class="perturbseq-callout" x="334" y="116" width="230" height="56" rx="8"></rect> <text class="perturbseq-small" x="350" y="140">Zoom depth reveals the molecular site:</text> <text class="perturbseq-small" x="350" y="158">sgRNA-dCas9-KRAB at target DNA/TSS</text> </g> </g> </svg> </div> <div class="perturbseq-info"> <div class="perturbseq-focus-label">Whole cell</div> <p class="perturbseq-focus-text">One perturbed cell remains in view: membrane, nucleus, organelles, sgRNA cargo, mRNA molecules, and capture barcode are all part of the same scene.</p> </div> </div> <script src="/assets/js/perturbseq-crispri.js?v=ea3af894f2a1b75ef3dad6214393cd07"></script> <h2 id="what-the-experiment-measures">What the experiment measures</h2> <p>The key output is not only whether a target gene went down. The useful object is a table where every row is a single cell, every cell has a guide assignment, and every column is a measured gene. 
That lets us ask whether perturbing one regulator shifts cells toward another state, suppresses a pathway, changes response to stimulation, or creates a subtle expression program that would be invisible in a bulk assay.</p> <h2 id="why-crispri-fits-this-readout">Why CRISPRi fits this readout</h2> <p>CRISPRi is especially useful when complete knockout is too harsh or when multiple perturbations would create too many DNA breaks. Because it represses transcription through dCas9-KRAB rather than cutting DNA, it can be paired with pooled single-cell screens where the phenotype is a transcriptome, not just growth.</p> <h2 id="minimal-protocol-logic">Minimal protocol logic</h2> <ol> <li>Build or obtain a cell line expressing CRISPRi machinery.</li> <li>Introduce a pooled sgRNA library at controlled multiplicity.</li> <li>Select and culture cells long enough for repression.</li> <li>Capture single cells and prepare transcriptome plus guide libraries.</li> <li>Sequence, assign guides to cells, and quantify expression.</li> <li>Compare each perturbation against controls and visualize response programs.</li> </ol>]]></content><author><name></name></author><category term="research-notes"/><category term="biology"/><category term="single-cell"/><category term="CRISPRi"/><category term="Perturb-seq"/><category term="functional-genomics"/><summary type="html"><![CDATA[An interactive visual explanation of how CRISPRi perturbations are linked to single-cell transcriptomes.]]></summary></entry><entry xml:lang="en"><title type="html">AI Daily Sprouts | 2026-05-10</title><link href="https://chunzhuo.github.io/zh/blog/2026/ai-daily-sprouts-2026-05-10/" rel="alternate" type="text/html" title="AI Daily Sprouts | 2026-05-10"/><published>2026-05-10T00:00:00+00:00</published><updated>2026-05-10T00:00:00+00:00</updated><id>https://chunzhuo.github.io/blog/2026/ai-daily-sprouts</id><content type="html" 
xml:base="https://chunzhuo.github.io/blog/2026/ai-daily-sprouts-2026-05-10/"><![CDATA[<p>Search date: 2026-05-10. Window used: roughly the last 7 days.</p> <h2 id="top-items">Top items</h2> <h3 id="google-released-gemma-4-multi-token-prediction-drafters">Google released Gemma 4 multi-token prediction drafters</h3> <ul> <li>Date: 2026-05-05</li> <li>Source: <a href="https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/">Google</a></li> <li>Type: open-model inference release</li> </ul> <p>Google released Multi-Token Prediction drafters for Gemma 4. The drafters use speculative decoding: a smaller draft component predicts several future tokens, then the main model verifies them in parallel. Google reports up to a 3x speedup without degrading output quality or reasoning logic.</p> <p>Why it matters: this targets a practical bottleneck for local, edge, and workstation LLM deployment. The bottleneck is often token-by-token latency rather than only raw model capability.</p> <p>Caveat: the speed and quality claims are vendor-reported and hardware-dependent; independent deployment measurements will matter.</p> <h3 id="microsoft-framed-frontier-firms-around-human-agent-operating-models">Microsoft framed “Frontier Firms” around human-agent operating models</h3> <ul> <li>Date: 2026-05-05</li> <li>Source: <a href="https://blogs.microsoft.com/blog/2026/05/05/how-frontier-firms-are-rebuilding-the-operating-model-for-the-age-of-ai/">Microsoft</a></li> <li>Type: enterprise AI / agent workflow update</li> </ul> <p>Microsoft described a progression from authoring with AI to editing, directing, and orchestrating AI agents, and tied that model to expanded Copilot Cowork capabilities. 
The operating-model framing is useful because it moves the conversation from “does an assistant help?” to “how do governed agents run work across systems?”</p> <p>Caveat: this is an official product and strategy narrative, not an independent productivity study.</p> <h2 id="recent-papers">Recent papers</h2> <h3 id="llms-improving-llms-agentic-discovery-for-test-time-scaling">LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling</h3> <ul> <li>Date: 2026-05-08</li> <li>Source: <a href="https://arxiv.org/abs/2605.08083">arXiv:2605.08083</a></li> <li>Type: preprint</li> </ul> <p>AutoTTS reframes test-time scaling as a controller-synthesis problem. Instead of manually choosing when a model should branch, continue, probe, prune, or stop, the method searches over inference policies using pre-collected reasoning trajectories and probe signals.</p> <p>The authors report better accuracy-cost tradeoffs on math reasoning benchmarks, generalization to held-out benchmarks and model scales, and a discovery cost of about $39.90 and 160 minutes. The practical caveat is that this is still a new preprint; the promised code release should be checked before treating it as deployable infrastructure.</p> <h3 id="fast-byte-latent-transformer">Fast Byte Latent Transformer</h3> <ul> <li>Date: 2026-05-08</li> <li>Source: <a href="https://arxiv.org/abs/2605.08044">arXiv:2605.08044</a></li> <li>Type: preprint</li> </ul> <p>Byte-level language models avoid fixed subword vocabularies, but byte-by-byte decoding is slow. This paper introduces BLT Diffusion, BLT Self-speculation, and BLT Diffusion+Verification so byte-level models can generate multiple bytes per step or verify drafted bytes efficiently.</p> <p>The authors report that the approaches can reduce estimated memory-bandwidth cost by more than 50% on generation tasks. 
The next test is whether these methods hold up in real serving stacks and downstream applications.</p> <h3 id="veccisc-improving-confidence-informed-self-consistency-with-reasoning-trace-clustering-and-candidate-answer-selection">VecCISC: Improving Confidence-Informed Self-Consistency with Reasoning Trace Clustering and Candidate Answer Selection</h3> <ul> <li>Date: 2026-05-08</li> <li>Source: <a href="https://arxiv.org/abs/2605.08070">arXiv:2605.08070</a></li> <li>Type: ACL 2026 Findings paper</li> </ul> <p>Confidence-weighted self-consistency can improve reasoning, but it is expensive when a critic model must score every sampled reasoning trace. VecCISC reduces that cost by clustering and filtering traces that are semantically equivalent, degenerate, or hallucinated before calling the critic.</p> <p>The paper reports a 47% token reduction while maintaining or exceeding CISC accuracy across math, chemistry, biology, commonsense, and humanities datasets. The main caveat is domain transfer: trace similarity and critic behavior can vary sharply by task.</p> <h3 id="scope-structured-decomposition-and-conditional-skill-orchestration-for-complex-image-generation">SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation</h3> <ul> <li>Date: 2026-05-08</li> <li>Source: <a href="https://arxiv.org/abs/2605.08043">arXiv:2605.08043</a></li> <li>Type: preprint</li> </ul> <p>SCOPE attacks a familiar image-generation failure mode: complex prompts contain many visual commitments, and systems can lose track of them across grounding, generation, and verification. The method keeps those commitments in an evolving structured specification, then conditionally invokes retrieval, reasoning, and repair skills.</p> <p>The paper introduces Gen-Arena and reports stronger commitment-level intent realization than evaluated baselines, including 0.60 EGIP on Gen-Arena. 
The broader significance depends on whether the benchmark and metric gain independent use.</p> <h3 id="beyond-pairs-your-language-model-is-secretly-optimizing-a-preference-graph">Beyond Pairs: Your Language Model is Secretly Optimizing a Preference Graph</h3> <ul> <li>Date: 2026-05-08</li> <li>Source: <a href="https://arxiv.org/abs/2605.08037">arXiv:2605.08037</a></li> <li>Type: preprint</li> </ul> <p>GraphDPO argues that pairwise DPO throws away useful structure when each prompt has multiple ranked rollouts. The method represents ranked responses as a directed acyclic preference graph and optimizes a graph-structured objective while keeping linear per-prompt complexity.</p> <p>The authors report stronger results on reasoning and program-synthesis tasks than pairwise or listwise alternatives. As with most preference-optimization work, robustness will depend heavily on preference-data quality and replication across model families.</p> <h2 id="watch-list">Watch list</h2> <ul> <li>Inference efficiency is the dominant theme today: Gemma 4 drafters, Fast BLT, AutoTTS, and VecCISC all reduce latency, token cost, or search cost rather than only increasing model size.</li> <li>Agent workflows are converging on orchestration: enterprise products and research systems both emphasize delegated subtasks, verification, and repair loops.</li> <li>New evaluation surfaces such as Gen-Arena are worth watching if they become common baselines rather than one-off paper artifacts.</li> </ul>]]></content><author><name></name></author><category term="daily-sprouts"/><category term="AI"/><category term="papers"/><category term="AI-news"/><summary type="html"><![CDATA[Daily AI research and news digest covering inference efficiency, agent workflows, byte-level LMs, and preference optimization.]]></summary></entry><entry xml:lang="en"><title type="html">AI Daily Sprouts | 2026-05-09</title><link href="https://chunzhuo.github.io/zh/blog/2026/ai-daily-sprouts/"
rel="alternate" type="text/html" title="AI Daily Sprouts | 2026-05-09"/><published>2026-05-09T00:00:00+00:00</published><updated>2026-05-09T00:00:00+00:00</updated><id>https://chunzhuo.github.io/blog/2026/ai-daily-sprouts</id><content type="html" xml:base="https://chunzhuo.github.io/blog/2026/ai-daily-sprouts/"><![CDATA[<p>Search date: 2026-05-09. Window used: roughly the last 7-14 days, with one slightly older paper included because it directly relates to agent skill learning.</p> <h2 id="top-items">Top items</h2> <h3 id="openai-released-new-realtime-voice-models-for-the-api">OpenAI released new realtime voice models for the API</h3> <ul> <li>Date: 2026-05-07</li> <li>Source: <a href="https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/">OpenAI</a></li> <li>Type: product release</li> </ul> <p>OpenAI introduced GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper for live voice reasoning, translation, and streaming transcription. Voice agents are moving from turn-taking demos toward tool-using, multilingual, realtime workflows. The 128K context window for GPT-Realtime-2 also makes longer voice sessions more practical.</p> <p>Caveat: the performance claims are vendor-reported; production behavior still depends heavily on latency, tool design, and domain-specific evaluation.</p> <h3 id="openai-made-gpt-55-instant-the-default-chatgpt-model">OpenAI made GPT-5.5 Instant the default ChatGPT model</h3> <ul> <li>Date: 2026-05-05</li> <li>Source: <a href="https://openai.com/index/gpt-5-5-instant/">OpenAI</a></li> <li>Supporting source: <a href="https://openai.com/index/gpt-5-5-instant-system-card/">OpenAI system card</a></li> <li>Type: model release and safety publication</li> </ul> <p>GPT-5.5 Instant became ChatGPT’s default model, with OpenAI reporting fewer hallucinated claims than GPT-5.3 Instant, especially on high-stakes prompts. 
The main direction is reliability rather than only raw capability: lower hallucination rates, better image/STEM handling, improved search decisions, and more transparent personalization controls.</p> <p>Caveat: the hallucination reductions are from OpenAI’s internal evaluations; independent replication would be useful.</p> <h3 id="google-deepmind-highlighted-alphaevolves-broader-impact">Google DeepMind highlighted AlphaEvolve’s broader impact</h3> <ul> <li>Date: 2026-05-07</li> <li>Source: <a href="https://deepmind.google/blog/alphaevolve-impact/">Google DeepMind</a></li> <li>Type: research and deployment update</li> </ul> <p>DeepMind reported AlphaEvolve applications across genomics, grid optimization, quantum circuits, mathematics, TPU design, storage systems, logistics, ads, and materials/life-science modeling. This is a strong signal that LLM-powered algorithm discovery is becoming operational infrastructure, not just a research demo.</p> <p>Caveat: many claims are application-specific and come from Google or partner deployments; the generality of the approach depends on whether problems have reliable automated evaluators.</p> <h3 id="us-caisi-expanded-frontier-ai-model-testing-agreements">U.S. CAISI expanded frontier AI model testing agreements</h3> <ul> <li>Date: 2026-05-05</li> <li>Source: <a href="https://www.nist.gov/news-events/news/2026/05/caisi-signs-agreements-regarding-frontier-ai-national-security-testing">NIST / CAISI</a></li> <li>Supporting source: <a href="https://blogs.microsoft.com/on-the-issues/2026/05/05/advancing-ai-evaluation-with-the-center-for-ai-standards-us-and-innovation-and-the-ai-security-institute-uk/">Microsoft</a></li> <li>Type: policy / safety governance</li> </ul> <p>CAISI announced agreements with Google DeepMind, Microsoft, and xAI for pre-deployment evaluations and targeted research on frontier AI capabilities and security risks. 
Frontier model assessment is becoming more formalized, especially for cybersecurity, biosecurity, chemical-risk, and national-security concerns.</p> <p>Caveat: these are collaborative testing agreements, not a full public regulatory regime; details of model access, evaluation criteria, and enforcement remain limited.</p> <h3 id="anthropic-expanded-compute-capacity-and-claude-usage-limits">Anthropic expanded compute capacity and Claude usage limits</h3> <ul> <li>Date: 2026-05-06</li> <li>Source: <a href="https://www.anthropic.com/news/higher-limits-spacex">Anthropic</a></li> <li>Type: infrastructure / product capacity</li> </ul> <p>Anthropic announced a SpaceX compute partnership and higher Claude Code/API usage limits, including doubled five-hour Claude Code limits for several paid plans. Capacity is still a strategic bottleneck for frontier AI products. More compute directly affects developer workflows, API availability, and model deployment scale.</p> <h3 id="anthropic-announced-an-enterprise-ai-services-company">Anthropic announced an enterprise AI services company</h3> <ul> <li>Date: 2026-05-04</li> <li>Source: <a href="https://www.anthropic.com/news/enterprise-ai-services-company">Anthropic</a></li> <li>Type: enterprise AI deployment</li> </ul> <p>Anthropic, Blackstone, Hellman &amp; Friedman, and Goldman Sachs announced a new AI services company focused on helping mid-sized companies deploy Claude in core operations. 
Frontier labs are moving deeper into implementation services, not only model/API distribution.</p> <h2 id="recent-papers-and-benchmarks">Recent papers and benchmarks</h2> <h3 id="claw-eval-live-a-live-agent-benchmark-for-evolving-real-world-workflows">Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows</h3> <ul> <li>Date: 2026-05-01</li> <li>Source: <a href="https://chatpaper.com/paper/274070">ChatPaper summary</a></li> <li>Type: agent benchmark paper</li> </ul> <p>Static agent benchmarks age quickly and often grade final answers without verifying whether the agent actually executed a workflow. Claw-Eval-Live separates a refreshable signal layer from reproducible, timestamped release snapshots so agent tasks can evolve with real workflow demand.</p> <p>Caveat: I found a secondary paper page during this quick run; for a deeper digest, verify against the arXiv page or project repository.</p> <h3 id="skilllearnbench-benchmarking-continual-learning-methods-for-agent-skill-generation-on-real-world-tasks">SkillLearnBench: Benchmarking Continual Learning Methods for Agent Skill Generation on Real-World Tasks</h3> <ul> <li>Date: 2026-04-22</li> <li>Source: <a href="https://www.emergentmind.com/papers/2604.20087">Emergent Mind paper page</a></li> <li>Type: agent learning benchmark paper</li> </ul> <p>Skills are increasingly used to make agents reliable on complex tasks, but automatically generating and improving those skills is still uneven. 
This benchmark evaluates continual skill learning across 20 verified tasks and measures skill quality, execution trajectory, and task outcome.</p> <h2 id="watch-list">Watch list</h2> <ul> <li>Voice agents are becoming more tool-oriented and production-shaped.</li> <li>Frontier-model evaluation is shifting toward government-lab collaboration before deployment.</li> <li>Agent benchmarks are increasingly emphasizing live workflows, verification, and changing environments.</li> <li>Algorithm-discovery agents such as AlphaEvolve are moving from research examples into infrastructure and commercial optimization.</li> </ul>]]></content><author><name></name></author><category term="daily-sprouts"/><category term="AI"/><category term="papers"/><category term="AI-news"/><summary type="html"><![CDATA[Daily AI research and news digest covering model releases, AI agents, frontier-model evaluation, and AI infrastructure.]]></summary></entry><entry xml:lang="en"><title type="html">bioAI Daily Sprouts | 2026-05-09</title><link href="https://chunzhuo.github.io/zh/blog/2026/bioai-daily-sprouts/" rel="alternate" type="text/html" title="bioAI Daily Sprouts | 2026-05-09"/><published>2026-05-09T00:00:00+00:00</published><updated>2026-05-09T00:00:00+00:00</updated><id>https://chunzhuo.github.io/blog/2026/bioai-daily-sprouts</id><content type="html" xml:base="https://chunzhuo.github.io/blog/2026/bioai-daily-sprouts/"><![CDATA[<p>Search date: 2026-05-09. Window: 2026-04-09 to 2026-05-09. Sources prioritized: Nature Biotechnology and Nature Methods publisher pages, with peer-reviewed articles and major reviews favored over news items.</p> <h2 id="papers">Papers</h2> <ol> <li> <p><strong>Digital twins of ex vivo human lungs enable accurate and personalized evaluation of therapeutic efficacy</strong> Nature Biotechnology, 2026-05-04. 
<a href="https://doi.org/10.1038/s41587-026-03121-4">DOI/link</a> Summary: Builds data-rich human lung digital twins from ex vivo lung perfusion, integrating physiology, imaging, transcriptomics, metabolomics and proteomics to forecast organ behavior and therapeutic response. Why it matters: It shows how organ-scale digital twins can be anchored in prospective human-organ measurements rather than purely retrospective clinical modeling. Tags: digital twins; translational biology; precision medicine; computational biology</p> </li> <li> <p><strong>TxPert: using multiple knowledge graphs for prediction of transcriptomic perturbation effects</strong> Nature Biotechnology, 2026-05-01. <a href="https://doi.org/10.1038/s41587-026-03113-4">DOI/link</a> Summary: Introduces a deep learning framework that combines basal transcriptomic state encoding with multiple biological knowledge graphs to predict out-of-distribution genetic perturbation responses. Why it matters: Perturbation prediction is central to model-guided experiments and drug discovery, and this paper explicitly benchmarks against strong nonlearned baselines and experimental reproducibility. Tags: AI4Bio; perturbation prediction; transcriptomics; knowledge graphs; machine learning</p> </li> <li> <p><strong>DNA-guided CRISPR-Cas12a effectors for programmable RNA recognition and cleavage</strong> Nature Biotechnology, 2026-05-01. <a href="https://doi.org/10.1038/s41587-026-03120-5">DOI/link</a> Summary: Reprograms Cas12a into a DNA-guided, RNA-targeting effector and demonstrates direct RNA detection plus intracellular RNA knockdown. Why it matters: The work expands programmable nucleic-acid engineering beyond canonical RNA-guided CRISPR architectures and creates new design space for RNA diagnostics and manipulation. 
Tags: CRISPR; RNA; synthetic biology; diagnostics; biotechnology</p> </li> <li> <p><strong>Single-molecule localization and diffusivity microscopy reveals dynamic biomolecular organization in living cells</strong> Nature Methods, 2026-04-28. <a href="https://doi.org/10.1038/s41592-026-03078-x">DOI/link</a> Summary: Presents SMLDM, a deep learning-enabled microscopy method that estimates molecule movement and diffusion from single-frame snapshots without trajectory linking. Why it matters: It sharply increases mapping density for live-cell molecular dynamics, helping connect spatial organization with mobility in chromatin, receptors, adhesions and condensates. Tags: bioimage informatics; deep learning; microscopy; single-molecule biophysics</p> </li> <li> <p><strong>Systematically decoding pathological morphologies and molecular profiles with unified multimodal embedding</strong> Nature Methods, 2026-04-24. <a href="https://doi.org/10.1038/s41592-026-03070-5">DOI/link</a> Summary: Introduces Multi-Embed, an interpretable multimodal framework for linking pathology morphology with multilayer molecular profiles. Why it matters: Computational pathology is moving from image-only predictors toward morphology-to-molecular reasoning that can support mechanistic disease interpretation. Tags: computational pathology; multimodal learning; molecular profiling; machine learning</p> </li> <li> <p><strong>Direct RNA sequencing and signal alignment reveal RNA structure ensembles in a eukaryotic cell</strong> Nature Methods, 2026-04-24. <a href="https://doi.org/10.1038/s41592-026-03069-y">DOI/link</a> Summary: Combines chemical probing, direct RNA sequencing and signal alignment to map RNA structural ensembles at single-molecule resolution in eukaryotic cells. Why it matters: It turns raw direct-sequencing signal into a richer readout of RNA structural heterogeneity, connecting transcript sequence, isoforms and regulatory structure. 
Tags: RNA structure; direct RNA sequencing; transcriptomics; computational biology</p> </li> <li> <p><strong>High-fidelity intravital imaging of biological dynamics with latent-space-enhanced digital adaptive optics</strong> Nature Biotechnology, 2026-04-23. <a href="https://doi.org/10.1038/s41587-026-03107-2">DOI/link</a> Summary: Develops latent-space-enhanced digital adaptive optics for intravital fluorescence microscopy, using wave-optics priors in spatial-angular data to improve aberration estimation. Why it matters: Better computational correction can make in vivo immune, neural and injury imaging more quantitative without relying only on expensive custom hardware. Tags: bioimage informatics; microscopy; latent representations; computational imaging</p> </li> <li> <p><strong>Orthrus: toward evolutionary and functional RNA foundation models</strong> Nature Methods, 2026-04-17. <a href="https://doi.org/10.1038/s41592-026-03064-3">DOI/link</a> Summary: Builds an RNA foundation-model direction aimed at learning evolutionary and functional representations across RNA sequences. Why it matters: RNA language models are becoming a parallel track to protein language models, with potential utility in RNA biology, functional prediction and therapeutic design. Tags: AI4Bio; RNA; foundation models; sequence modeling; transcriptomics</p> </li> <li> <p><strong>Artificial allosteric protein switches with machine-learning-designed receptors</strong> Nature Biotechnology, 2026-04-15. <a href="https://doi.org/10.1038/s41587-026-03081-9">DOI/link</a> Summary: Shows that machine-learning-designed ligand-binding domains can act as receptors in artificial allosteric protein switches and biosensors. Why it matters: It links generative protein design to working synthetic-biology devices, including logic gates, engineered cells and bioelectronic hormone sensing. 
Tags: protein design; synthetic biology; biosensors; AI4Bio</p> </li> <li> <p><strong>Inducible, split base editors for in vivo cancer functional genomics</strong> Nature Biotechnology, 2026-04-15. <a href="https://doi.org/10.1038/s41587-026-03077-5">DOI/link</a> Summary: Designs split, inducible base editors for controlled in vivo cancer functional genomics, reducing constraints from constitutively active deaminase systems. Why it matters: More controllable base-editing screens can improve mutation-level functional genomics in animal models and better separate target effects from editor toxicity. Tags: genome editing; base editors; cancer genomics; functional genomics</p> </li> <li> <p><strong>Adaptive optical correction for in vivo two-photon fluorescence microscopy with neural fields</strong> Nature Methods, 2026-04-13. <a href="https://doi.org/10.1038/s41592-026-03053-6">DOI/link</a> Summary: Uses neural fields to perform adaptive optical correction for in vivo two-photon microscopy under motion and sample-induced aberration. Why it matters: Neural representations are becoming useful infrastructure for biological imaging, especially when hardware-only correction is difficult or fragile. 
Tags: bioimage informatics; neural fields; microscopy; neuroscience; software</p> </li> </ol> <h2 id="watch-list">Watch list</h2> <ul> <li>Perturbation modeling is maturing: papers now spend more space on realistic out-of-distribution tasks, baselines and reproducibility ceilings.</li> <li>RNA-focused foundation models and direct RNA signal analysis are both advancing, suggesting stronger computational tools for RNA function and RNA therapeutics.</li> <li>Bioimage informatics is shifting toward latent representations, neural fields and deep-learning-assisted physical correction rather than segmentation alone.</li> <li>Experimentally grounded AI4Bio remains the strongest signal: the most useful papers combine model advances with organ systems, live-cell imaging, CRISPR tools or protein engineering validation.</li> </ul>]]></content><author><name></name></author><category term="daily-sprouts"/><category term="bioAI"/><category term="AI4Bio"/><category term="bioinformatics"/><category term="papers"/><summary type="html"><![CDATA[Daily AI4Bio, bioinformatics, and computational biology paper digest.]]></summary></entry><entry xml:lang="en"><title type="html">Multimodal Fusion in Biology</title><link href="https://chunzhuo.github.io/zh/blog/2026/multimodality-for-biology/" rel="alternate" type="text/html" title="Multimodal Fusion in Biology"/><published>2026-05-07T00:00:00+00:00</published><updated>2026-05-07T00:00:00+00:00</updated><id>https://chunzhuo.github.io/blog/2026/multimodality-for-biology</id><content type="html" xml:base="https://chunzhuo.github.io/blog/2026/multimodality-for-biology/"><![CDATA[<p>In single-cell and, more broadly, computational biology, "multimodal" can refer to many kinds of data: DNA sequence, RNA expression, chromatin accessibility, protein levels, perturbation responses, knowledge graphs, text, and more. The genuinely hard part is usually not listing the modalities but deciding how to fuse them.</p> <p>This post is organized from notes for a recent talk. I try to group the relevant methods into three classes: <strong>bottom-up</strong>, <strong>parallel</strong>, and <strong>unified</strong>. The three classes give different answers to where biological structure is expressed and where in the model the modalities should meet.</p> <h2 id="多模态任务">Multimodal tasks</h2> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/multimodality-for-biology/image1-480.webp 480w,/assets/img/posts/multimodality-for-biology/image1-800.webp 800w,/assets/img/posts/multimodality-for-biology/image1-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/multimodality-for-biology/image1.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <p>Before settling on an architecture, we should first be clear about what we want a multimodal biological model to do: cross-modal prediction, perturbation-response prediction, cell-state inference, sequence-to-function prediction, and so on. Different tasks push the architecture in different directions, and the discussion in the rest of this post only makes sense against a concrete prediction target.</p> <h2 id="自底向上方法">Bottom-up methods</h2> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/multimodality-for-biology/image2-480.webp 480w,/assets/img/posts/multimodality-for-biology/image2-800.webp 800w,/assets/img/posts/multimodality-for-biology/image2-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/multimodality-for-biology/image2.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/multimodality-for-biology/image3-480.webp 480w,/assets/img/posts/multimodality-for-biology/image3-800.webp 800w,/assets/img/posts/multimodality-for-biology/image3-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/multimodality-for-biology/image3.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <p>Bottom-up methods build representations along biology's natural hierarchy: <strong>molecules -&gt; cells -&gt; multicellular systems</strong>. Models like UCE learn cell embeddings from gene-level tokens; models such as PULSAR push further toward tissue- and multicellular-scale structure. Each level is trained on the data that is richest at that scale, and the level above inherits the representational foundation built below it.</p> <p>The advantage of this approach is that each level has a relatively clear biological interpretation and can be pretrained independently. The cost is that errors and biases can also accumulate level by level as the hierarchy is climbed.</p> <h3 id="从序列到扰动">From sequence to perturbation</h3> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/multimodality-for-biology/image4-480.webp 480w,/assets/img/posts/multimodality-for-biology/image4-800.webp 800w,/assets/img/posts/multimodality-for-biology/image4-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/multimodality-for-biology/image4.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <p>A concrete example of the bottom-up route: start from genomic sequence and train representations that transfer to perturbation-prediction tasks. The chain can be summarized as <em>sequence -&gt; expression -&gt; response</em>. The key architectural question is at which level the multimodal signal should enter the model.</p> <h2 id="并行方法">Parallel methods</h2> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/multimodality-for-biology/image5-480.webp 480w,/assets/img/posts/multimodality-for-biology/image5-800.webp 800w,/assets/img/posts/multimodality-for-biology/image5-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/multimodality-for-biology/image5.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <p>Parallel methods treat the modalities as roughly equal inputs and combine the per-modality embeddings at the input stage. A typical example: given a DNA sequence and seven epigenetic tracks, embed each modality separately and then <strong>simply add the eight embeddings together</strong>. The downstream model sees a vector that has already been fused.</p> <p>This approach is cheap, scales modality by modality, and makes it easy to add new tracks. The problem is that direct summation assumes every modality lives in the same metric space, which often does not hold biologically.</p> <h3 id="每个模态使用独立编码器">A separate encoder per modality</h3> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/multimodality-for-biology/image6-480.webp 480w,/assets/img/posts/multimodality-for-biology/image6-800.webp 800w,/assets/img/posts/multimodality-for-biology/image6-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/multimodality-for-biology/image6.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <p>A more cautious variant keeps an independent encoder for each modality and fuses at a later stage. Each encoder can use the tokenization and inductive biases suited to its own data type, and fusion happens through concatenation, cross-attention, or gating rather than direct summation at the input.</p> <h3 id="不同知识来源">Different knowledge sources</h3> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/multimodality-for-biology/image7-480.webp 480w,/assets/img/posts/multimodality-for-biology/image7-800.webp 800w,/assets/img/posts/multimodality-for-biology/image7-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/multimodality-for-biology/image7.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <p>Beyond raw signals, multimodality can also mean fusing different <em>types</em> of knowledge: an LLM supplies textual context, a knowledge graph supplies curated relations, and tabular features supply engineered priors. Two pooling strategies recur:</p> <ul> <li><strong>Global pooling</strong>: take a weighted average of the embeddings from the different sources.</li> <li><strong>Attention-based pooling</strong>: let the query decide which sources matter most.</li> </ul> <p>The latter usually works better when the importance of each source varies from sample to sample.</p> <h2 id="统一方法">Unified methods</h2> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/multimodality-for-biology/image8-480.webp 480w,/assets/img/posts/multimodality-for-biology/image8-800.webp 800w,/assets/img/posts/multimodality-for-biology/image8-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/multimodality-for-biology/image8.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <p>Unified methods go in the opposite direction from one-encoder-per-modality: concatenate multiple sequences into a single token stream and hand everything to one model. Sequence-centric tasks, e.g. DNA, RNA, and protein, are a natural fit because they already share the shape of a token sequence.</p> <p>The simplicity is attractive: one model, one loss, no extra fusion module. The difficulty is that a single model must make sense of very different statistics at once, e.g. codon-usage bias versus regulatory motifs.</p> <h2 id="面向生物学的关系-transformer">Relational Transformers for biology</h2> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <figure> <picture> <source class="responsive-img-srcset" srcset="/assets/img/posts/multimodality-for-biology/image9-480.webp 480w,/assets/img/posts/multimodality-for-biology/image9-800.webp 800w,/assets/img/posts/multimodality-for-biology/image9-1400.webp 1400w," type="image/webp" sizes="95vw"/> <img src="/assets/img/posts/multimodality-for-biology/image9.png" class="img-fluid rounded z-depth-1" width="100%" height="auto" data-zoomable="" loading="eager" onerror="this.onerror=null; $('.responsive-img-srcset').remove();"/> </picture> </figure> </div> </div> <p>The architecture I am currently most interested in is the <strong>relational Transformer</strong>: rather than forcing every modality through a single fusion bottleneck, it represents biological entities (genes, cells, regions, and so on) as nodes and lets attention operate over typed relations.</p> <h3 id="细节">Details</h3> <p>Two attention patterns matter most:</p> <ul> <li><strong>Relational attention</strong>: for <em>complementary</em> modalities, where each modality carries information the others lack. The model selects information across modalities at every layer.</li> <li><strong>Hierarchical attention</strong>: for <em>hierarchical</em> modalities, where the structure itself is nested, e.g. region -&gt; gene -&gt; cell -&gt; tissue. Attention is constrained by that hierarchy.</li> </ul> <p>Two open problems I keep running into:</p> <ul> <li><strong>Memory constraints.</strong> Cross-modal attention grows quadratically with the number of tokens, and biological inputs are usually long.</li> <li><strong>Paired-data constraints.</strong> Training relational attention requires samples in which several modalities are observed together, and truly large-scale paired multimodal data is still scarce.</li> </ul> <p>These bottlenecks are what I think the next phase of work, both mine and the field's, needs to tackle.</p> <h2 id="参考文献">References</h2> <ol> <li><span id="liang2024foundations">Liang, P. P., Zadeh, A., &amp; Morency, L.-P. (2024). Foundations &amp; Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. <i>ACM Computing Surveys</i>, <i>56</i>(10). https://doi.org/10.1145/3656580</span></li> <li><span id="rosen2023uce">Rosen, Y., Roohani, Y., Agrawal, A., Samotorcan, L., Quake, S. R., &amp; Leskovec, J. (2023). 
Universal Cell Embeddings: A Foundation Model for Cell Biology. <i>BioRxiv</i>. https://doi.org/10.1101/2023.11.28.568918</span></li> <li><span id="pang2025pulsar">Pang, K., Rosen, Y., Kedzierska, K., He, Z., Rajagopal, A., Gustafson, C. E., Huynh, G., &amp; Leskovec, J. (2025). PULSAR: a Foundation Model for Multi-scale and Multicellular Biology. <i>BioRxiv</i>. https://doi.org/10.1101/2025.11.24.685470</span></li> <li><span id="fu2026strand">Fu, B., Dasoulas, G., Gabbita, S., Lin, X., Gao, S., Su, X., Ghosh, S., &amp; Zitnik, M. (2026). STRAND: Sequence-Conditioned Transport for Single-Cell Perturbations. <i>ArXiv Preprint ArXiv:2602.10156</i>. https://arxiv.org/abs/2602.10156</span></li> <li><span id="yang2024multimodal">Yang, Z., Fan, X., Lan, M., Tang, X., Zheng, Z., Liu, B., You, Y., Tian, L., Church, G., Liu, X., &amp; Gu, F. (2024). Multimodal foundation model predicts zero-shot functional perturbations and cell fate dynamics. <i>BioRxiv</i>. https://doi.org/10.1101/2024.12.19.629561</span></li> <li><span id="yang2023genecompass">Yang, X., Liu, G., Feng, G., Bu, D., Wang, P., &amp; others. (2023). GeneCompass: Deciphering Universal Gene Regulatory Mechanisms with Knowledge-Informed Cross-Species Foundation Model. <i>BioRxiv</i>. https://doi.org/10.1101/2023.09.26.559542</span></li> <li><span id="littman2025presage">Littman, R., Levine, J., Maleki, S., Lee, Y., Ermakov, V., Qiu, L., Wu, A., Huang, K., Lopez, R., Scalia, G., Biancalani, T., Richmond, D., Regev, A., &amp; Hütter, J.-C. (2025). Gene-embedding-based prediction and functional evaluation of perturbation expression responses with PRESAGE. <i>BioRxiv</i>. https://doi.org/10.1101/2025.06.03.657653</span></li> <li><span id="golkar2026mimic">Golkar, S., Kovalic, J., Espejo Morales, I., Sledzieski, S., Cho, K., Cranmer, M., Ho, S., &amp; others. (2026). MIMIC: A Generative Multimodal Foundation Model for Biomolecules. <i>ArXiv Preprint ArXiv:2604.24506</i>. 
https://arxiv.org/abs/2604.24506</span></li> <li><span id="ranjan2025relational">Ranjan, R., Hudovernik, V., Znidar, M., Kanatsoulis, C., Upendra, R., Mohammadi, M., Meyer, J., Palczewski, T., Guestrin, C., &amp; Leskovec, J. (2025). Relational Transformer: Toward Zero-Shot Foundation Models for Relational Data. <i>ArXiv Preprint ArXiv:2510.06377</i>. https://arxiv.org/abs/2510.06377</span></li> </ol>]]></content><author><name></name></author><category term="research-notes"/><category term="machine learning"/><category term="biology"/><category term="multimodality"/><category term="single-cell"/><category term="foundation-models"/><summary type="html"><![CDATA[Three approaches to fusing biological modalities: bottom-up, parallel, and unified, plus my thoughts on where the field is heading.]]></summary></entry><entry xml:lang="en"><title type="html">a post with plotly.js</title><link href="https://chunzhuo.github.io/zh/blog/2025/plotly/" rel="alternate" type="text/html" title="a post with plotly.js"/><published>2025-03-26T14:24:00+00:00</published><updated>2025-03-26T14:24:00+00:00</updated><id>https://chunzhuo.github.io/blog/2025/plotly</id><content type="html" xml:base="https://chunzhuo.github.io/blog/2025/plotly/"><![CDATA[<p>This is an example post with some <a href="https://plotly.com/javascript/">plotly</a> code.</p> <div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">```</span><span class="nl">plotly
</span><span class="sb">{
  "data": [
    {
      "x": [1, 2, 3, 4],
      "y": [10, 15, 13, 17],
      "type": "scatter"
    },
    {
      "x": [1, 2, 3, 4],
      "y": [16, 5, 11, 9],
      "type": "scatter"
    }
  ]
}</span>
<span class="p">```</span>
</code></pre></div></div> <p>Which generates:</p> <pre><code class="language-plotly">{
  "data": [
    {
      "x": [1, 2, 3, 4],
      "y": [10, 15, 13, 17],
      "type": "scatter"
    },
    {
      "x": [1, 2, 3, 4],
      "y": [16, 5, 11, 9],
      "type": "scatter"
    }
  ]
}
</code></pre> <p>Also another example chart.</p> <div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">```</span><span class="nl">plotly
</span><span class="sb">{
  "data": [
    {
      "x": [1, 2, 3, 4],
      "y": [10, 15, 13, 17],
      "mode": "markers"
    },
    {
      "x": [2, 3, 4, 5],
      "y": [16, 5, 11, 9],
      "mode": "lines"
    },
    {
      "x": [1, 2, 3, 4],
      "y": [12, 9, 15, 12],
      "mode": "lines+markers"
    }
  ],
  "layout": {
    "title": {
      "text": "Line and Scatter Plot"
    }
  }
}</span>
<span class="p">```</span>
</code></pre></div></div> <p>This is what it looks like:</p> <pre><code class="language-plotly">{
  "data": [
    {
      "x": [1, 2, 3, 4],
      "y": [10, 15, 13, 17],
      "mode": "markers"
    },
    {
      "x": [2, 3, 4, 5],
      "y": [16, 5, 11, 9],
      "mode": "lines"
    },
    {
      "x": [1, 2, 3, 4],
      "y": [12, 9, 15, 12],
      "mode": "lines+markers"
    }
  ],
  "layout": {
    "title": {
      "text": "Line and Scatter Plot"
    }
  }
}
</code></pre>]]></content><author><name></name></author><category term="sample-posts"/><category term="formatting"/><category term="charts"/><summary type="html"><![CDATA[this is what included plotly.js code could look like]]></summary></entry><entry xml:lang="en"><title type="html">a post with image galleries</title><link href="https://chunzhuo.github.io/zh/blog/2024/photo-gallery/" rel="alternate" type="text/html" title="a post with image galleries"/><published>2024-12-04T01:59:00+00:00</published><updated>2024-12-04T01:59:00+00:00</updated><id>https://chunzhuo.github.io/blog/2024/photo-gallery</id><content type="html" xml:base="https://chunzhuo.github.io/blog/2024/photo-gallery/"><![CDATA[<p>The images in this post are all zoomable, arranged into different mini-galleries using different libraries.</p> <h2 id="lightbox2"><a href="https://lokeshdhakar.com/projects/lightbox2/">Lightbox2</a></h2> <p><a href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/1/img-2500.jpg" data-lightbox="roadtrip"><img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/1/img-200.jpg"/></a> <a href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/2/img-2500.jpg" data-lightbox="roadtrip"><img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/2/img-200.jpg"/></a> <a href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/3/img-2500.jpg" data-lightbox="roadtrip"><img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/3/img-200.jpg"/></a></p> <hr/> <h2 id="photoswipe"><a href="https://photoswipe.com/">PhotoSwipe</a></h2> <div class="pswp-gallery pswp-gallery--single-column" id="gallery--getting-started"> <a href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/2/img-2500.jpg" data-pswp-width="1669" data-pswp-height="2500" target="_blank"> <img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/2/img-200.jpg" alt=""/> </a> <a href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/7/img-2500.jpg" 
data-pswp-width="1875" data-pswp-height="2500" data-cropped="true" target="_blank"> <img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/7/img-200.jpg" alt=""/> </a> <a href="https://unsplash.com" data-pswp-src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/3/img-2500.jpg" data-pswp-width="2500" data-pswp-height="1666" target="_blank"> <img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/3/img-200.jpg" alt=""/> </a> <div> <a href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/6/img-2500.jpg" data-pswp-width="2500" data-pswp-height="1667" target="_blank"> <img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/6/img-200.jpg" alt=""/> </a> </div> </div> <hr/> <h2 id="spotlight-js"><a href="https://nextapps-de.github.io/spotlight/">Spotlight JS</a></h2> <div class="spotlight-group"> <a class="spotlight" href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/1/img-2500.jpg"> <img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/1/img-200.jpg"/> </a> <a class="spotlight" href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/2/img-2500.jpg"> <img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/2/img-200.jpg"/> </a> <a class="spotlight" href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/3/img-2500.jpg"> <img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/3/img-200.jpg"/> </a> </div> <div class="spotlight-group"> <a class="spotlight" href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/4/img-2500.jpg"> <img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/4/img-200.jpg"/> </a> <a class="spotlight" href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/5/img-2500.jpg"> <img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/5/img-200.jpg"/> </a> <a class="spotlight" href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/6/img-2500.jpg"> <img 
src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/6/img-200.jpg"/> </a> </div> <hr/> <h2 id="venobox"><a href="https://veno.es/venobox/">Venobox</a></h2> <p><a class="venobox" data-gall="myGallery" href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/1/img-2500.jpg"><img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/1/img-200.jpg"/></a> <a class="venobox" data-gall="myGallery" href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/2/img-2500.jpg"><img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/2/img-200.jpg"/></a> <a class="venobox" data-gall="myGallery" href="https://cdn.photoswipe.com/photoswipe-demo-images/photos/3/img-2500.jpg"><img src="https://cdn.photoswipe.com/photoswipe-demo-images/photos/3/img-200.jpg"/></a></p>]]></content><author><name></name></author><category term="sample-posts"/><category term="formatting"/><category term="images"/><summary type="html"><![CDATA[this is what included image galleries could look like]]></summary></entry><entry><title type="html">Google Gemini updates: Flash 1.5, Gemma 2 and Project Astra</title><link href="https://chunzhuo.github.io/zh/blog/2024/google-gemini-updates-flash-15-gemma-2-and-project-astra/" rel="alternate" type="text/html" title="Google Gemini updates: Flash 1.5, Gemma 2 and Project Astra"/><published>2024-05-14T00:00:00+00:00</published><updated>2024-05-14T00:00:00+00:00</updated><id>https://chunzhuo.github.io/blog/2024/google-gemini-updates-flash-15-gemma-2-and-project-astra</id><content type="html" xml:base="https://chunzhuo.github.io/blog/2024/google-gemini-updates-flash-15-gemma-2-and-project-astra/"><![CDATA[<p>May 14, 2024. We’re introducing a series of updates across the Gemini family of models, including the new 1.5 Flash, our lightweight model for speed and efficiency, and Project Astra, our vision for the future of AI assistants. 
In December, we launched our first natively multimodal model Gemini 1.0 in three sizes: Ultra, Pro and Nano. Just a few months later we released 1.5 Pro, with enhanced performance and a breakthrough long context window of 1 million tokens.</p> <p>Developers and enterprise customers have been putting 1.5 Pro to use in incredible ways and finding its long context window, multimodal reasoning capabilities and impressive overall performance incredibly useful. We know from user feedback that some applications need lower latency and a lower cost to serve. This inspired us to keep innovating, so today, we’re introducing Gemini 1.5 Flash: a model that’s lighter-weight than 1.5 Pro, and designed to be fast and efficient to serve at scale.</p> <p>Both 1.5 Pro and 1.5 Flash are available in public preview with a 1 million token context window in Google AI Studio and Vertex AI. And now, 1.5 Pro is also available with a 2 million token context window via waitlist to developers using the API and to Google Cloud customers. We’re also introducing updates across the Gemini family of models, announcing our next generation of open models, Gemma 2, and sharing progress on the future of AI assistants, with Project Astra.</p> <p>(Figure: context lengths of leading foundation models compared with Gemini 1.5’s 2 million token capability.)</p> <p>1.5 Flash is the newest addition to the Gemini model family and the fastest Gemini model served in the API. It’s optimized for high-volume, high-frequency tasks at scale, is more cost-efficient to serve and features our breakthrough long context window. While it’s a lighter weight model than 1.5 Pro, it’s highly capable of multimodal reasoning across vast amounts of information and delivers impressive quality for its size.</p> <p>(Caption: the new Gemini 1.5 Flash model is optimized for speed and efficiency, is highly capable of multimodal reasoning and features our breakthrough long context window.)</p> <p>1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more. This is because it’s been trained by 1.5 Pro through a process called “distillation,” where the most essential knowledge and skills from a larger model are transferred to a smaller, more efficient model. Read more about 1.5 Flash in our updated Gemini 1.5 technical report, on the Gemini technology page, and learn about 1.5 Flash’s availability and pricing.</p> <p>Over the last few months, we’ve significantly improved 1.5 Pro, our best model for general performance across a wide range of tasks. Beyond extending its context window to 2 million tokens, we’ve enhanced its code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding through data and algorithmic advances. We see strong improvements on public and internal benchmarks for each of these tasks.</p> <p>1.5 Pro can now follow increasingly complex and nuanced instructions, including ones that specify product-level behavior involving role, format and style. We’ve improved control over the model’s responses for specific use cases, like crafting the persona and response style of a chat agent or automating workflows through multiple function calls. And we’ve enabled users to steer model behavior by setting system instructions. We added audio understanding in the Gemini API and Google AI Studio, so 1.5 Pro can now reason across image and audio for videos uploaded in Google AI Studio. And we’re now integrating 1.5 Pro into Google products, including Gemini Advanced and in Workspace apps. Read more about 1.5 Pro in our updated Gemini 1.5 technical report and on the Gemini technology page.</p> <p>Gemini Nano is expanding beyond text-only inputs to include images as well. Starting with Pixel, applications using Gemini Nano with Multimodality will be able to understand the world the way people do — not just through text, but also through sight, sound and spoken language. Read more about Gemini 1.0 Nano on Android.</p> <p>Today, we’re also sharing a series of updates to Gemma, our family of open models built from the same research and technology used to create the Gemini models. We’re announcing Gemma 2, our next generation of open models for responsible AI innovation. Gemma 2 has a new architecture designed for breakthrough performance and efficiency, and will be available in new sizes. The Gemma family is also expanding with PaliGemma, our first vision-language model inspired by PaLI-3. And we’ve upgraded our Responsible Generative AI Toolkit with LLM Comparator for evaluating the quality of model responses. Read more on the Developer blog.</p> <p>As part of Google DeepMind’s mission to build AI responsibly to benefit humanity, we’ve always wanted to develop universal AI agents that can be helpful in everyday life. That’s why today, we’re sharing our progress in building the future of AI assistants with Project Astra (advanced seeing and talking responsive agent). To be truly useful, an agent needs to understand and respond to the complex and dynamic world just like people do — and take in and remember what it sees and hears to understand context and take action. 
It also needs to be proactive, teachable and personal, so users can talk to it naturally and without lag or delay.</p> <p>While we’ve made incredible progress developing AI systems that can understand multimodal information, getting response time down to something conversational is a difficult engineering challenge. Over the past few years, we’ve been working to improve how our models perceive, reason and converse to make the pace and quality of interaction feel more natural. Building on Gemini, we’ve developed prototype agents that can process information faster by continuously encoding video frames, combining the video and speech input into a timeline of events, and caching this information for efficient recall. By leveraging our leading speech models, we also enhanced how they sound, giving the agents a wider range of intonations. These agents can better understand the context they’re being used in, and respond quickly, in conversation.</p> <p>With technology like this, it’s easy to envision a future where people could have an expert AI assistant by their side, through a phone or glasses. And some of these capabilities are coming to Google products, like the Gemini app and web experience, later this year. We’ve made incredible progress so far with our family of Gemini models, and we’re always striving to advance the state-of-the-art even further. By investing in a relentless production line of innovation, we’re able to explore new ideas at the frontier, while also unlocking the possibility of new and exciting Gemini use cases. Learn more about Gemini and its capabilities.</p>]]></content><author><name></name></author><category term="external-posts"/><category term="google"/><summary type="html"><![CDATA[We’re sharing updates across our Gemini family of models and a glimpse of Project Astra, our vision for the future of AI assistants.]]></summary></entry><entry xml:lang="en"><title type="html">a post with tabs</title><link href="https://chunzhuo.github.io/zh/blog/2024/tabs/" rel="alternate" type="text/html" title="a post with tabs"/><published>2024-05-01T00:32:13+00:00</published><updated>2024-05-01T00:32:13+00:00</updated><id>https://chunzhuo.github.io/blog/2024/tabs</id><content type="html" xml:base="https://chunzhuo.github.io/blog/2024/tabs/"><![CDATA[<p>This is what a post with <a href="https://github.com/Ovski4/jekyll-tabs">tabs</a> looks like. Note that the tabs could be used for different purposes, not only for code.</p> <h2 id="first-tabs">First tabs</h2> <p>To add tabs, use the following syntax:</p> <div class="language-liquid highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">{%</span><span class="w"> </span><span class="nt">tabs</span><span class="w"> </span><span class="nv">group-name</span><span class="w"> </span><span class="cp">%}</span>

<span class="cp">{%</span><span class="w"> </span><span class="nt">tab</span><span class="w"> </span><span class="nv">group-name</span><span class="w"> </span><span class="nv">tab-name-1</span><span class="w"> </span><span class="cp">%}</span>

Content 1

<span class="cp">{%</span><span class="w"> </span><span class="nt">endtab</span><span class="w"> </span><span class="cp">%}</span>

<span class="cp">{%</span><span class="w"> </span><span class="nt">tab</span><span class="w"> </span><span class="nv">group-name</span><span class="w"> </span><span class="nv">tab-name-2</span><span class="w"> </span><span class="cp">%}</span>

Content 2

<span class="cp">{%</span><span class="w"> </span><span class="nt">endtab</span><span class="w"> </span><span class="cp">%}</span>

<span class="cp">{%</span><span class="w"> </span><span class="nt">endtabs</span><span class="w"> </span><span class="cp">%}</span>
</code></pre></div></div> <p>With this you can generate visualizations like:</p> <ul id="log" class="tab" data-tab="c3de71fb-01e2-4c88-8c7a-c7e8b12e7ea6" data-name="log"> <li class="active" id="log-php"> <a href="#">php </a> </li> <li id="log-js"> <a href="#">js </a> </li> <li id="log-ruby"> <a href="#">ruby </a> </li> </ul> <ul class="tab-content" id="c3de71fb-01e2-4c88-8c7a-c7e8b12e7ea6" data-name="log"> <li class="active"> <div class="language-php highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">var_dump</span><span class="p">(</span><span class="s1">'hello'</span><span class="p">);</span>
</code></pre></div></div> </li> <li> <div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">console</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="dl">"</span><span class="s2">hello</span><span class="dl">"</span><span class="p">);</span>
</code></pre></div></div> </li> <li> <div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">puts</span> <span class="dl">'</span><span class="s1">hello</span><span class="dl">'</span>
</code></pre></div></div> </li> </ul> <h2 id="another-example">Another example</h2> <ul id="data-struct" class="tab" data-tab="59baf78d-2b02-4add-aef3-d45118743be8" data-name="data-struct"> <li class="active" id="data-struct-yaml"> <a href="#">yaml </a> </li> <li id="data-struct-json"> <a href="#">json </a> </li> </ul> <ul class="tab-content" id="59baf78d-2b02-4add-aef3-d45118743be8" data-name="data-struct"> <li class="active"> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">hello</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">whatsup"</span>
  <span class="pi">-</span> <span class="s2">"</span><span class="s">hi"</span>
</code></pre></div></div> </li> <li> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"hello"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"whatsup"</span><span class="p">,</span><span class="w"> </span><span class="s2">"hi"</span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div> </li> </ul> <h2 id="tabs-for-something-else">Tabs for something else</h2> <ul id="something-else" class="tab" data-tab="dd45011f-e5a9-466d-a53c-881cbd683eb6" data-name="something-else"> <li class="active" id="something-else-text"> <a href="#">text </a> </li> <li id="something-else-quote"> <a href="#">quote </a> </li> <li id="something-else-list"> <a href="#">list </a> </li> </ul> <ul class="tab-content" id="dd45011f-e5a9-466d-a53c-881cbd683eb6" data-name="something-else"> <li class="active"> <p>Regular text</p> </li> <li> <blockquote> <p>A quote</p> </blockquote> </li> <li> <p>Hipster list</p> <ul> <li>brunch</li> <li>fixie</li> <li>raybans</li> <li>messenger bag</li> </ul> </li> </ul>]]></content><author><name></name></author><category term="sample-posts"/><category term="formatting"/><category term="code"/><summary type="html"><![CDATA[this is what included tabs in a post could look like]]></summary></entry><entry xml:lang="en"><title type="html">a post with typograms</title><link href="https://chunzhuo.github.io/zh/blog/2024/typograms/" rel="alternate" type="text/html" title="a post with typograms"/><published>2024-04-29T23:36:10+00:00</published><updated>2024-04-29T23:36:10+00:00</updated><id>https://chunzhuo.github.io/blog/2024/typograms</id><content type="html" xml:base="https://chunzhuo.github.io/blog/2024/typograms/"><![CDATA[<p>This is an example post with some <a href="https://github.com/google/typograms/">typograms</a> code.</p> <div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">```</span><span class="nl">typograms
</span><span class="sb">+----+
|    |---&gt; My first diagram!
+----+</span>
<span class="p">```</span>
</code></pre></div></div> <p>Which generates:</p> <pre><code class="language-typograms">+----+
|    |---&gt; My first diagram!
+----+
</code></pre> <p>Another example:</p> <div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">```</span><span class="nl">typograms
</span><span class="sb">.------------------------.
|.----------------------.|
||"https://example.com" ||
|'----------------------'|
| ______________________ |
||                      ||
||   Welcome!           ||
||                      ||
||                      ||
||  .----------------.  ||
||  | username       |  ||
||  '----------------'  ||
||  .----------------.  ||
||  |"*******"       |  ||
||  '----------------'  ||
||                      ||
||  .----------------.  ||
||  |   "Sign-up"    |  ||
||  '----------------'  ||
||                      ||
|+----------------------+|
.------------------------.</span>
<span class="p">```</span>
</code></pre></div></div> <p>which generates:</p> <pre><code class="language-typograms">.------------------------.
|.----------------------.|
||"https://example.com" ||
|'----------------------'|
| ______________________ |
||                      ||
||   Welcome!           ||
||                      ||
||                      ||
||  .----------------.  ||
||  | username       |  ||
||  '----------------'  ||
||  .----------------.  ||
||  |"*******"       |  ||
||  '----------------'  ||
||                      ||
||  .----------------.  ||
||  |   "Sign-up"    |  ||
||  '----------------'  ||
||                      ||
|+----------------------+|
.------------------------.
</code></pre> <p>For more examples, check out the <a href="https://google.github.io/typograms/#examples">typograms documentation</a>.</p>]]></content><author><name></name></author><category term="sample-posts"/><category term="formatting"/><category term="diagrams"/><summary type="html"><![CDATA[this is what included typograms code could look like]]></summary></entry></feed>