<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Lukas' Notes]]></title><description><![CDATA[Lukas' Notes]]></description><link>https://lukasnotes.dk</link><generator>RSS for Node</generator><lastBuildDate>Sat, 18 Apr 2026 23:24:32 GMT</lastBuildDate><atom:link href="https://lukasnotes.dk/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Building AI Automations with n8n]]></title><description><![CDATA[Preface
Funny enough, I feel like I have to say that I hate writing, and I am so happy that I am finally sitting down and typing out these first words.
Back in 2025 I had a big burnout and sever]]></description><link>https://lukasnotes.dk/building-ai-automations-with-n8n</link><guid isPermaLink="true">https://lukasnotes.dk/building-ai-automations-with-n8n</guid><category><![CDATA[n8n]]></category><category><![CDATA[n8n workflows]]></category><category><![CDATA[AI-automation]]></category><category><![CDATA[AI]]></category><category><![CDATA[Enterprise AI]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Thu, 26 Mar 2026 10:12:24 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/651807c8c94eec3a732926c8/51cb672a-a73f-44c4-9e9d-0d528271591e.jpg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Preface</h2>
<p>Funny enough, I feel like I have to say that I hate writing, and I am so happy that I am finally sitting down and typing out these first words.</p>
<p>Back in 2025 I had a big burnout, and several good things came out of it. One of them is that in the late autumn I transitioned from being a full-time developer on a legacy application to pioneering AI integrations and automations at the juice company.</p>
<p>Many laugh, thinking I am joking when I say that, because <em>"why would a food company need AI?"</em> That is quite funny, because it might be exactly the type of company that can benefit the most.</p>
<p>I have been building various workflows with n8n since October, and it has taught me a lot. I had done several things with AI before, so it was not a completely new field, but there were many new learnings and I am excited to share them with you in my diary.</p>
<h2>Steep learning curve with n8n</h2>
<blockquote>
<p>n8n is a more technical automation/integration platform, similar to <a href="http://make.com">make.com</a> or Zapier, but it allows you to dive deeper with code blocks and data mutation outside the predefined generic nodes.</p>
</blockquote>
<p>I am as much of a developer as one can be. I seriously got into it at 16, when I was determined to use my summer vacation to build a web app - and I did. I love writing code because it is a first language to me and feels so natural. That is not how I felt with n8n.</p>
<p>It required my brain to sweat quite a bit. There are many subtle pitfalls that only reveal themselves once you start building real workflows. There are probably countless videos on YouTube and articles out there teaching about them, and I may have happened to skim past those parts. Maybe not. I will share some of those learnings here on my blog.</p>
<p>The biggest change is that in code - let's say JavaScript, since everyone is familiar with it - it is super easy to control concurrency. One either awaits a single function or promise, or an entire group of functions/promises. One explicitly defines where the data flows and what is returned. One can optionally enrich the data or the flow via one function or another without cracking one's head. This is not so easy in n8n. Here are a few examples.</p>
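<p>For contrast, here is a minimal TypeScript sketch of that explicit control; the helper functions are hypothetical stand-ins, not a real API:</p>

```typescript
// Minimal sketch of explicit concurrency control in plain TypeScript.
// The fetch* helpers are hypothetical stand-ins for real data sources.
type Ticket = { id: number; subject: string };

const fetchTicket = async (id: number): Promise<Ticket> => ({ id, subject: "Order missing" });
const fetchImages = async (t: Ticket): Promise<string[]> => [`img-${t.id}.png`];
const fetchComments = async (t: Ticket): Promise<string[]> => ["first comment"];

async function processTicket(id: number) {
  // Sequential: each step explicitly awaits the previous one
  const ticket = await fetchTicket(id);

  // Parallel: await an entire group of promises at once
  const [images, comments] = await Promise.all([
    fetchImages(ticket),
    fetchComments(ticket),
  ]);

  // The return value - the "final node" - is explicit, not positional
  return { ticket, images, comments };
}
```

<p>In n8n, each of these decisions is spread across node wiring and node settings instead of being visible in one place.</p>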
<h2>A few pain points with n8n workflows</h2>
<h3>Return data</h3>
<p>Vertical order in n8n does matter: if there are two final nodes in a sub-workflow and the one you don't care about sits below the one you do care about, the sub-workflow will output the wrong data. For an enterprise approach, I would love to be able to explicitly select which node is the actual return node instead of depending on the order hierarchy.</p>
<p>The entry point is easy, and the return is <em>easy</em> if there is only one end node. However, a good practice is to use the <code>Execution Data</code> node, which lets you annotate your execution with various parameters - amazing for debugging. Now you have two end nodes. The data returned from the sub-workflow execution will come from whichever one is bottom-most, and there is no explicit way to override that in n8n. This makes me question whether n8n is truly enterprise-ready in its current form. Big gotcha.</p>
<p>Nodes in n8n execute from left to right and from top to bottom. There also isn't any real concurrency if the parent execution wants to keep track of the results. Meaning it's always top-left to bottom-right.</p>
<p>This bit me quite a lot. I feel itchy, like after feeding an army of mosquitos in nature. I love converting bigger chunks of a workflow into a sub-workflow as soon as it makes sense to reuse them, or as soon as they can logically be encapsulated.</p>
<h3>Be generous with logging</h3>
<p>Debugging can be very cumbersome. I wish I could pipe the errors to Sentry via some native connector, but unfortunately that doesn't exist out of the box.</p>
<img src="https://cdn.hashnode.com/uploads/covers/651807c8c94eec3a732926c8/d28c81e9-b624-44e9-b13a-1089e4930377.png" alt="" style="display:block;margin:0 auto" />

<p>Using the aforementioned <code>Execution Data</code> node is a must. In this node I always log the primary ID of the item I am processing, such as the Zendesk ticket ID for tickets there, or the <code>internetMessageId</code> for emails in Outlook.</p>
<p>Besides that, I log any extra information that I believe makes sense - for example, if your workflow has several branches (think processing extra images if they exist), whether this run took that branch. You can have as many of these nodes along the workflow as you wish to enrich it.</p>
<p>That part is up to you. Either way, being able to search for executions and filter by the ID is a game changer.</p>
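<p>To make this concrete, here is a sketch of the kind of annotation payload I attach to an execution; the field names are illustrative assumptions, not an n8n contract:</p>

```typescript
// Sketch of an execution annotation payload, as one might assemble it before
// feeding it to the Execution Data node. Field names are illustrative
// assumptions, not an n8n API.
type ExecutionAnnotation = Record<string, string>;

function buildExecutionAnnotation(input: {
  zendeskTicketId?: number;
  internetMessageId?: string;
  ranImageBranch?: boolean;
}): ExecutionAnnotation {
  const annotation: ExecutionAnnotation = {};
  // Always log the primary ID of the item being processed
  if (input.zendeskTicketId !== undefined) annotation.ticketId = String(input.zendeskTicketId);
  if (input.internetMessageId) annotation.internetMessageId = input.internetMessageId;
  // Record which optional branches actually ran - gold when filtering executions
  annotation.ranImageBranch = String(input.ranImageBranch ?? false);
  return annotation;
}
```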
<h3>Do not use Cmd + Z (or Ctrl + Z) // Undo more than once</h3>
<p>This is why I love computers. It is such a treat that one can try, go back, and redo. People in sales, in medicine, in construction can't quite do that. <em>Fuck around and find out</em> is how I got where I am in my career.</p>
<p>However, NEVER press Cmd + Z too many times after dragging things around. My entire workflow got broken several times this way. While iterating during development, I ended up with vague execution errors saying a node doesn't exist or the data is wrong. Downloading and re-importing the workflow, or copying the nodes to another one, didn't help.</p>
<p>I had to go almost node by node in a new workflow and rebuild it from scratch. Now I fear Undo like touching a hot pot.</p>
<h3>No first-party nodes for tracking AI usage, tokens, or raw responses</h3>
<p>This is huge. Without AI, n8n would be a small kid playing outside with barely anyone knowing its name. AI enabled people to build mini automations augmented with LLMs that bring enough value for the platform to blow up.</p>
<p>When crafting workflows in the enterprise - ideating, iterating and prototyping - it is a no-brainer for someone with an ML background to log the AI's input and output, which can later be used for evals, for fine-tuning, and for error or cost insights.</p>
<p>To this day, using the latest version at the time of writing, there is no node or clean way to store AI exchanges.</p>
<p>My first solution was to rename the LLM node, such as Claude, to <code>ClaudeAiClient</code> and have a sub-workflow execute after the parent finishes (without awaiting completion). The sub-workflow would then retrieve the execution, find the <code>*AiClient</code> node, extract the messages and store them in the database. This way I had the exact tokens, the raw messages and everything one would need.</p>
<p>Eventually our workflows evolved a lot, and due to other things I won't discuss here, I gave up on that detailed logging and now just store the clean output in the main workflow.</p>
<p>I dream that n8n would add a final node that retrieves all AI exchanges from the sub-nodes and lets us handle the data however we wish.</p>
<p>Enterprise loves to answer questions like: how much does classifying the tickets cost? Okay, does it bring enough value to pay for it? How about writing the responses? How much does that cost? Oh, that is 10M tokens a day... You get the gist.</p>
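<p>The arithmetic itself is trivial once the tokens are logged; the prices below are placeholder assumptions, so check your provider's current price sheet before trusting any number:</p>

```typescript
// Back-of-the-envelope daily cost estimate. The per-token prices are
// placeholder assumptions, not real price-sheet numbers.
const USD_PER_1M_INPUT = 3.0;
const USD_PER_1M_OUTPUT = 15.0;

function dailyCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * USD_PER_1M_INPUT +
    (outputTokens / 1_000_000) * USD_PER_1M_OUTPUT
  );
}

// e.g. 10M input tokens and 1M output tokens a day
console.log(dailyCostUsd(10_000_000, 1_000_000)); // prints 45
```

<p>Without the raw usage stored somewhere, this is the question you cannot answer.</p>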
<h2>The pleasure points with n8n</h2>
<p>I had to get my complaining out of the way so I can focus on the pleasure. n8n is an amazing platform for prototyping once the steep learning curve is ironed out.</p>
<h3>Quick iteration</h3>
<p>A lot of these workflows are novel. AI is still novel. Less than 3 years ago we heard about GPT-4 coming out, which was slow and stupid. Now I trust ChatGPT to do my SKAT bookkeeping because it's that good at deep research and comprehension.</p>
<p>The same applies in the corporate world. We know that we want to help department X by doing Y. But the stakeholders don't necessarily know everything it takes - and most importantly, much of what it takes <strong>they do implicitly</strong>. It is such an important thing that applies to us all.</p>
<p>If I say I am going to eat an apple, I will implicitly wash the apple and wash my hands. I will also take a knife and cut the apple into slices because I wear braces. Think about this: there are many more steps and much more conditional processing. If one has braces, the apple needs to be cut. If the email has many big attachments, they need to be pre-processed before the AI can read the email and make a decision.</p>
<p>The hard thing here is that often some of these steps are only discovered once the thing is running. Walking up with the laptop to review the workflow with a non-technical stakeholder, looking at the outputs at various stages, is like fine-tuning a radio. It becomes visual, intuitive, and collaborative. That's a big plus.</p>
<h3>Various Trigger nodes, aka integrations with platforms</h3>
<p>Even though I still want to write code for some of these, having the built-in triggers, such as Zendesk for new incoming tickets, is a bliss. I don't need to deploy code to the cloud, I don't need to go into Settings and register a webhook, and I don't need to configure credentials (besides adding them once in n8n for the node). It just works, most of the time.</p>
<p>The gotcha here is that one has to be careful with the events. Too many executions and one can blow through the monthly allowance. So be careful. Otherwise, it is a bliss to iterate with.</p>
<p>I have done some fun stuff with my own Strava, an app for tracking runs and bike rides.</p>
<h2>The greatest benefit of building these automations</h2>
<p>Documentation.</p>
<p>Despite the common question - are we now trying to replace people with AI? - in my case and journey: absolutely not.</p>
<p>However, what I really love about building these workflows is how much implicit documentation comes out of them. I am now working with three prototypes for the support center and for handling a shared mailbox. Since the first run of the automation, I've sparred a lot with my colleagues to improve and explain to the AI (aka: write the system prompts) what, why and how XYZ should be done. Ever since, the AI has caught up by catching human error - cases where the people in question don't follow their own rules or the things they explained. That means two things: they didn't tell everything at first, because for us humans things become a reflex muscle we perform subconsciously; and sometimes we simply forget and make a mistake based on the moon phase.</p>
<p>Thanks to these workflows, I have better documentation for certain touch points than any other department, because if it weren't the <em>"best"</em>, it wouldn't perform its duty and deliver the value.</p>
<h2>AI cannot be trusted</h2>
<p>This is the final point that I feel is very important to note down. The enormous amount of hype being pumped on a regular basis by the major LLM developers (OpenAI, Anthropic, Google), as well as the entire AI agency community and SaaS builders, is insane. My former manager and friend caught me stuck in that bubble back in 2023. Now, around three years later, AI is a helpful tool, but it fails in ridiculous ways.</p>
<p>Once every few hundred executions, given a task where it must output structured data with two fields - a string <code>reasoning</code> field and an integer <code>correct_output</code> field that is a union of either <code>1</code> or <code>2</code> - it will output <code>0</code> or <code>3</code>. This is easy to catch, but the funny/sad part is when the reasoning is correct and the model still produces the wrong choice, despite its own very reasoning.</p>
<p>The model in question has mostly been <code>Claude Sonnet 4.5</code>. The temperature is <code>0</code> by default, aiming for the most consistent, predictable outputs. The tricky part is that rerunning the same input will not reproduce the error, rendering any debugging basically useless. Did it fail with illegal data? Simply rerun.</p>
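<p>Since rerunning usually succeeds, my practical guardrail is to validate the structured output and retry on illegal values. A sketch, where <code>callModel</code> is a hypothetical stand-in for the actual LLM call:</p>

```typescript
// Validate-and-retry guardrail for structured LLM output. `callModel` is a
// hypothetical stand-in for the real model invocation.
type Classification = { reasoning: string; correct_output: number };

// The only legal values are 1 and 2
const isValid = (c: Classification) => c.correct_output === 1 || c.correct_output === 2;

async function classifyWithRetry(
  callModel: () => Promise<Classification>,
  maxAttempts = 3,
): Promise<Classification> {
  let last: Classification | undefined;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await callModel();
    if (isValid(last)) return last;
  }
  throw new Error(`Illegal output after ${maxAttempts} attempts: ${last?.correct_output}`);
}
```

<p>It does not explain the failure, but it keeps an illegal <code>0</code> or <code>3</code> from flowing into the rest of the workflow.</p>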
<p>I have a few more examples that I will share in future articles, but this one I find to be such a small and trivial task that, in my perception, it should never fail.</p>
<h2>Final remarks</h2>
<p>I envision writing at least two more posts, depending on how in-depth I am allowed to dive into the things I am engineering at the sandwich and juice company. One of them is about handling a shared Outlook inbox that receives as many as 5,000+ emails a week, of which the majority need to be accurately classified and stored, and a minority require a human in the loop. The other is about classification and assistive reply support for Customer Care on the Zendesk system. Both are very interesting cases with some exciting learnings. Both have been running for several months, and both departments are now accustomed to having them around.</p>
<p>If you're a human and read this far, I'd love to hear from you in the comments - are you exploring n8n? Are you evaluating the entire automation industry? Are you just learning? Or are you a seasoned <em>hacker/tinkerer</em> like me? :)</p>
]]></content:encoded></item><item><title><![CDATA[Migrating v4 to v5 in Vercel's AI SDK]]></title><description><![CDATA[This is about updating the backend while not breaking the frontend, over which I cannot update or release in a timely manner (it is a Chrome Extension - users running multiple versions, slow CWS review queue).
Background
Basically one of my side proj...]]></description><link>https://lukasnotes.dk/migrating-v4-to-v5-in-vercels-ai-sdk</link><guid isPermaLink="true">https://lukasnotes.dk/migrating-v4-to-v5-in-vercels-ai-sdk</guid><category><![CDATA[Vercel]]></category><category><![CDATA[ai sdk 4.0]]></category><category><![CDATA[Gen AI SDK]]></category><category><![CDATA[vercel ai sdk]]></category><category><![CDATA[AI]]></category><category><![CDATA[#ai-tools]]></category><category><![CDATA[cloudflare]]></category><category><![CDATA[cloudflare-worker]]></category><category><![CDATA[chrome extension]]></category><category><![CDATA[extension]]></category><category><![CDATA[TypeScript]]></category><category><![CDATA[migration]]></category><category><![CDATA[SOLID principles]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Thu, 23 Oct 2025 10:25:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1761215097080/59dfd290-31d0-47cf-88ab-f6ee32d80f63.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is about updating the backend while not breaking the frontend, over which I cannot update or release in a timely manner (it is a Chrome Extension - users running multiple versions, slow CWS review queue).</p>
<h1 id="heading-background">Background</h1>
<p>Basically, one of my side projects is this extension for the flame-looking swiping app - hard to guess, I know! The API for it is hosted on Cloudflare Workers. An amazing service for the value it brings at just a few dollars a month.</p>
<p>Since the early days, once I discovered Vercel’s AI SDK, I started using it, as it seemed like an absolute no-brainer for offering beautiful streaming experiences while keeping the majority of the flexibility in choosing models and providers, and of course designing the UI.</p>
<p>When version 5 came out and I looked into it for the first time at the end of August, it felt completely overwhelming. I was recovering from burnout at work and my brain wasn’t fully functional. I tried running the <code>codemod</code>; it failed, and so I left it.</p>
<p>Fast forward to the beginning of October: I am doing much better, I looked into it again, and it actually seemed quite simple this time.</p>
<h1 id="heading-my-biggest-breaking-changes">My biggest breaking changes</h1>
<p>To preface, I created a new <code>git worktree</code> for both the backend and the frontend, where I started to integrate things step by step, focusing on as few changes as possible.</p>
<h2 id="heading-the-v5-response-uses-server-sent-events-sse-amp-pipev5streamtov4response">The v5 response uses Server Sent Events (SSE) &amp; <code>pipeV5StreamToV4Response</code></h2>
<blockquote>
<p>If you are new to SSE, check out MDN docs at <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events">https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events</a></p>
</blockquote>
<p>This is by far the biggest breaking change, and everyone is affected. All versions up to <code>v5</code> used Vercel’s proprietary invention, which was either a basic text-only stream or a more complex data stream supporting tool calling and more. It’s actually a pretty interesting and cool thing. If you’ve never written one yourself, feel free to check out their docs. The image below is from their <code>v4</code> docs.</p>
<p>v4: <a target="_blank" href="https://v4.ai-sdk.dev/docs/ai-sdk-ui/stream-protocol#stream-protocols">https://v4.ai-sdk.dev/docs/ai-sdk-ui/stream-protocol#stream-protocols</a></p>
<p>v5: <a target="_blank" href="https://ai-sdk.dev/docs/ai-sdk-ui/stream-protocol#stream-protocols">https://ai-sdk.dev/docs/ai-sdk-ui/stream-protocol#stream-protocols</a></p>
<p><img src="https://v4.ai-sdk.dev/_next/image?url=%2Fimages%2Fdata-stream-protocol.png&amp;w=3840&amp;q=75&amp;dpl=dpl_AxoVfVkRYvD85TVKk4ZGiUW7hWiu" alt /></p>
<p>Frankly, as cool as it is, I have no idea why they chose it in the first place. SSE was already available at the ChatGPT release in 2022, and I have even written such a stream in a PHP backend. Although this is exactly what I’m guilty of too: reinventing the wheel. It seems cool and exciting at first, but then it brings a maintenance burden and limitations. For what it’s worth - I really do appreciate the SSE protocol.</p>
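<p>To illustrate the wire difference, here is a rough sketch based on the prefixes and chunk shapes used further down in this post - not the complete spec of either protocol:</p>

```typescript
// Rough illustration of the two wire formats - a sketch, not the full spec.

// v4 data-stream protocol: one prefixed line per part; text parts use "0:"
const v4TextLine = `0:${JSON.stringify("Hello")}\n`; // 0:"Hello"

// v5: standard Server-Sent Events carrying JSON chunks
const v5SseEvent = `data: ${JSON.stringify({ type: "text-delta", delta: "Hello" })}\n\n`;
```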
<p>I was looking into how I could rewrite the responses. The <code>text</code> protocol was out of the window since I already use tool calling. Fortunately, no images via messages at this point.</p>
<p>Lucky me, and I really mean it: someone shared a snippet of code that remaps the <code>v5</code> message parts to <code>v4</code> in the project’s <a target="_blank" href="https://github.com/vercel/ai/issues/7993">GitHub issue</a>. The function is called <code>pipeV5StreamToV4Response</code>. Once I tested that it actually works, I only needed to detect the client version, so I went ahead and updated the client headers to indicate whether it’s on the new version or the old one.</p>
<p>This is how I plugged it into my backend API:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// in /src/functions/ai/useAI.ts</span>

...
<span class="hljs-comment">// clientVersionValue comes from headers, provided from arguments </span>
<span class="hljs-keyword">const</span> convertToV4 = clientVersionValue !== <span class="hljs-string">'v5'</span>;

<span class="hljs-keyword">const</span> result = streamText({
    temperature: <span class="hljs-number">0.6</span>,
    maxOutputTokens: <span class="hljs-number">1</span>_000,
    messages: modelMessages, <span class="hljs-comment">// provided from arguments</span>
});


<span class="hljs-keyword">if</span> (convertToV4) {
    <span class="hljs-keyword">const</span> stream = result.toUIMessageStream({ onError });
    <span class="hljs-keyword">return</span> pipeV5StreamToV4Response(stream, {
        headers,
    });
}

<span class="hljs-keyword">return</span> result.toUIMessageStreamResponse({
    onError,
    headers,
});
</code></pre>
<pre><code class="lang-typescript"><span class="hljs-comment">// in /src/functions/helpers/pipeV5StreamToV4Response.ts</span>
<span class="hljs-keyword">import</span> { generateId, <span class="hljs-keyword">type</span> UIMessageChunk } <span class="hljs-keyword">from</span> <span class="hljs-string">'ai'</span>;

<span class="hljs-comment">// adapted from</span>
<span class="hljs-comment">// https://github.com/vercel/ai/issues/7993#issuecomment-3180974654</span>
<span class="hljs-comment">// credit to Mastra.AI for writing this wonderful piece of code</span>
<span class="hljs-comment">// also related:</span>
<span class="hljs-comment">// https://github.com/davidondrej/AI-CRM/blob/e7e3848a6ec99f4eee2c29d9208bd4bf11b0f6c9/crm-app/docs/v5-vercel-sdk.md</span>

<span class="hljs-comment">/** Pipes an AI SDK v5 UIMessage stream to a response in a v4 compatible format. */</span>
<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">pipeV5StreamToV4Response</span>(<span class="hljs-params">
    stream: ReadableStream&lt;UIMessageChunk&gt;,
    responseInit?: ResponseInit,
</span>): <span class="hljs-title">Response</span> </span>{
    <span class="hljs-keyword">const</span> v4Response = createV4Response(stream, responseInit);
    <span class="hljs-keyword">return</span> v4Response;
}

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">createV4Response</span>(<span class="hljs-params">
    v5Stream: ReadableStream&lt;UIMessageChunk&gt;,
    responseInit?: ResponseInit,
</span>) </span>{
    <span class="hljs-keyword">const</span> v4Stream = v5Stream
        .pipeThrough(createV5ToV4Transformer())
        .pipeThrough(<span class="hljs-keyword">new</span> TextEncoderStream());
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> Response(v4Stream, {
        status: <span class="hljs-number">200</span>,
        headers: {
            <span class="hljs-string">'Cache-Control'</span>: <span class="hljs-string">'no-cache'</span>,
        },
        ...responseInit,
    });
}

<span class="hljs-keyword">type</span> StreamState = {
    messageCounter: <span class="hljs-built_in">number</span>;
    stepCounter: <span class="hljs-built_in">number</span>;
};

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">createV5ToV4Transformer</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-keyword">const</span> state: StreamState = {
        messageCounter: <span class="hljs-number">0</span>,
        stepCounter: <span class="hljs-number">0</span>,
    };
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> TransformStream&lt;UIMessageChunk, <span class="hljs-built_in">string</span>&gt;({
        transform(chunk, controller) {
            <span class="hljs-keyword">try</span> {
                <span class="hljs-keyword">const</span> v4Chunk = transformV5ChunkToV4(chunk, state);
                <span class="hljs-keyword">if</span> (v4Chunk) {
                    controller.enqueue(v4Chunk + <span class="hljs-string">'\n'</span>);
                }
            } <span class="hljs-keyword">catch</span> (transformError) {
                <span class="hljs-comment">// noop</span>
            }
        },
    });
}

<span class="hljs-comment">/** Map of v5 stream prefixes to v4 stream prefixes. */</span>
<span class="hljs-keyword">const</span> DataStreamStringPrefixes = {
    text: <span class="hljs-string">'0'</span>,
    data: <span class="hljs-string">'2'</span>,
    error: <span class="hljs-string">'3'</span>,
    message_annotations: <span class="hljs-string">'8'</span>,
    tool_call: <span class="hljs-string">'9'</span>,
    tool_result: <span class="hljs-string">'a'</span>,
    tool_call_streaming_start: <span class="hljs-string">'b'</span>,
    tool_call_delta: <span class="hljs-string">'c'</span>,
    finish_message: <span class="hljs-string">'d'</span>,
    finish_step: <span class="hljs-string">'e'</span>,
    start_step: <span class="hljs-string">'f'</span>,
    reasoning: <span class="hljs-string">'g'</span>,
    source: <span class="hljs-string">'h'</span>,
    redacted_reasoning: <span class="hljs-string">'i'</span>,
    reasoning_signature: <span class="hljs-string">'j'</span>,
    file: <span class="hljs-string">'k'</span>,
};

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">transformV5ChunkToV4</span>(<span class="hljs-params">chunk: UIMessageChunk, state: StreamState</span>) </span>{
    <span class="hljs-keyword">switch</span> (chunk.type) {
        <span class="hljs-keyword">case</span> <span class="hljs-string">'text-start'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'text-delta'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.text}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify(
                chunk.delta,
            )}</span>`</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'text-end'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'error'</span>:
            <span class="hljs-keyword">try</span> {
                <span class="hljs-keyword">const</span> errorData =
                    <span class="hljs-keyword">typeof</span> chunk.errorText === <span class="hljs-string">'string'</span>
                        ? {
                              message: chunk.errorText,
                              code: <span class="hljs-string">'STREAM_ERROR'</span>,
                              timestamp: <span class="hljs-built_in">Date</span>.now(),
                          }
                        : chunk.errorText;
                <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.error}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify(
                    errorData,
                )}</span>`</span>;
            } <span class="hljs-keyword">catch</span> {
                <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.error}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify({
                    message: <span class="hljs-string">'Stream error'</span>,
                    code: <span class="hljs-string">'TRANSFORM_ERROR'</span>,
                }</span>)}`</span>;
            }
        <span class="hljs-keyword">case</span> <span class="hljs-string">'tool-input-start'</span>:
        <span class="hljs-keyword">case</span> <span class="hljs-string">'tool-input-delta'</span>:
            <span class="hljs-comment">// Don't stream partial tool calls for v4</span>
            <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'tool-input-available'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.tool_call}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify({
                toolCallId: chunk.toolCallId,
                toolName: chunk.toolName,
                args: chunk.input,
            }</span>)}`</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'tool-output-available'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.tool_result}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify({
                toolCallId: chunk.toolCallId,
                result: chunk.output,
            }</span>)}`</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'reasoning-start'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'reasoning-delta'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.reasoning}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify(
                chunk.delta,
            )}</span>`</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'reasoning-end'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'start'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'finish'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.finish_message}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify(
                {
                    finishReason: <span class="hljs-string">'stop'</span>,
                    metadata: chunk.messageMetadata,
                }</span>,
            )}`</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'start-step'</span>:
            state.stepCounter++;
            <span class="hljs-keyword">const</span> stepId = generateId(<span class="hljs-string">`step-<span class="hljs-subst">${state.stepCounter}</span>`</span>);
            <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.start_step}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify({
                messageId: stepId,
            }</span>)}`</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'finish-step'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.finish_step}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify({
                finishReason: <span class="hljs-string">'stop'</span>,
                usage: {
                    promptTokens: <span class="hljs-number">0</span>,
                    completionTokens: <span class="hljs-number">0</span>,
                }</span>,
                isContinued: false,
            })}`</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'message-metadata'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${
                DataStreamStringPrefixes.message_annotations
            }</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify([
                {
                    messageId: generateId(<span class="hljs-string">'msg'</span>),
                    metadata: chunk.messageMetadata,
                }</span>,
            ])}`</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'file'</span>:
            <span class="hljs-keyword">if</span> (<span class="hljs-string">'url'</span> <span class="hljs-keyword">in</span> chunk &amp;&amp; <span class="hljs-string">'mediaType'</span> <span class="hljs-keyword">in</span> chunk) {
                <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.file}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify({
                    url: chunk.url,
                    mediaType: chunk.mediaType,
                }</span>)}`</span>;
            }
            <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'source-url'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.source}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify({
                id: chunk.sourceId,
                title: chunk.title,
                url: chunk.url,
            }</span>)}`</span>;
        <span class="hljs-keyword">case</span> <span class="hljs-string">'source-document'</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${DataStreamStringPrefixes.source}</span>:<span class="hljs-subst">${<span class="hljs-built_in">JSON</span>.stringify({
                id: chunk.sourceId,
                title: chunk.title,
                mediaType: chunk.mediaType,
                filename: chunk.filename,
            }</span>)}`</span>;
        <span class="hljs-keyword">default</span>:
            <span class="hljs-comment">// Unknown chunk type, ignore</span>
            <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>;
    }
}

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">generateId</span>(<span class="hljs-params">prefix: <span class="hljs-built_in">string</span></span>) </span>{
    <span class="hljs-keyword">return</span> <span class="hljs-string">`<span class="hljs-subst">${prefix}</span>-<span class="hljs-subst">${<span class="hljs-built_in">Date</span>.now()}</span>-<span class="hljs-subst">${<span class="hljs-built_in">Math</span>.random().toString(<span class="hljs-number">36</span>).substr(<span class="hljs-number">2</span>, <span class="hljs-number">9</span>)}</span>`</span>;
}
</code></pre>
<p>The biggest piece is solved: streaming now works with both the new and the old clients.</p>
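<p>To show how the pieces fit together, here is a minimal sketch of the wiring I mean. Hedged assumptions: <code>convert</code> stands in for the chunk-to-string function above, and the chunk/stream shapes are simplified for illustration, not the SDK’s real types.</p>

```typescript
// Sketch: pipe v5-style UI chunks through a TransformStream that emits
// v4 data-stream lines. `convert` is the chunk-to-string function above;
// a single `state` object is shared so the step counter survives chunks.
type UIChunk = { type: string; [key: string]: unknown };

function toV4DataStream(
  chunks: ReadableStream<UIChunk>,
  convert: (chunk: UIChunk, state: { stepCounter: number }) => string | null,
): ReadableStream<string> {
  const state = { stepCounter: 0 };
  return chunks.pipeThrough(
    new TransformStream<UIChunk, string>({
      transform(chunk, controller) {
        const line = convert(chunk, state);
        // Unknown chunk types map to null and are simply dropped.
        if (line !== null) controller.enqueue(line + "\n");
      },
    }),
  );
}
```

The resulting stream can then be handed to a `Response` body for old clients while new clients keep the native v5 stream.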
<h2 id="heading-mixed-up-types-amp-remapping-messages-with-attachments">Mixed up types &amp; Remapping messages with attachments</h2>
<p>The next part of the migration was remapping attachments. I use attachments to send the match pictures along to the model provider, so the LLM can see the profile as well.</p>
<p>It took me a good hour of deep focus to carefully read the documentation for both versions. Vercel changed things up, and the mixing between <code>mediaType</code>, <code>mimeType</code> and plain <code>type</code> for providing the MIME type, as well as for the content, got me really confused. I don’t quite know if I missed something or the team got tangled up in their own types.</p>
<p>For example, in <code>v4</code> they had <code>experimental_attachments</code> with <code>FileList</code>. Since I provide URLs and don’t host those images myself, I need either an encoded <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/URI/Reference/Schemes/data">data-uri</a> or a publicly accessible URL. So I used <code>Attachment</code> in <code>v4</code> and <code>FileUIPart</code> in <code>v5</code>.</p>
<p>v4: <a target="_blank" href="https://v4.ai-sdk.dev/docs/ai-sdk-ui/chatbot#attachments-experimental">https://v4.ai-sdk.dev/docs/ai-sdk-ui/chatbot#attachments-experimental</a></p>
<p>v5: <a target="_blank" href="https://ai-sdk.dev/docs/ai-sdk-ui/chatbot#attachments">https://ai-sdk.dev/docs/ai-sdk-ui/chatbot#attachments</a></p>
<p>Example objects across the two versions:</p>
<pre><code class="lang-json">// version 4
{
  name: 'earth.png',
  contentType: 'image/png',
  url: 'https://example.com/earth.png',
}

// version 5
{
  type: 'file',
  filename: 'earth.png',
  mediaType: 'image/png',
  url: 'https://example.com/earth.png',
}
</code></pre>
<p><s>One confusing aspect was the </s> <code>FileUIPart</code> <s> type, which appears in multiple packages with different signatures. Vercel - but why???</s></p>
<p><s>During branch merging between versions I came to find out that actually one can import </s> <code>FileUIPart</code> <s> part from different packages. Take a look, the identical type however has a completely different signature.</s></p>
<p><code>import type { FileUIPart } from '@ai-sdk/ui-utils';</code></p>
<p><s>and</s></p>
<p><code>import type { FileUIPart } from 'ai';</code></p>
<p><s>Perhaps it wasn’t a big deal doing so in </s> <code>v4</code> <s> as the backend somehow worked (I didn’t dive deep) and was prep-ing for the </s> <code>v5</code><s>. However upon updating the client, I had to do quite a bit of digging to find out that I was fooling myself with a wrong type all along.</s></p>
<p>Upon sharing this on Twitter, <a target="_blank" href="https://x.com/lgrammel/status/1983183817461641382">Lars Grammel explained</a> that <code>@ai-sdk/ui-utils</code> should not be used with v5 (it remains active for v4). Chances are my <code>node_modules</code> folder wasn’t clean and hence I pulled it in accidentally.</p>
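<p>One way to guard against the wrong import sneaking back in is ESLint’s <code>no-restricted-imports</code> rule. A config sketch (assuming ESLint flat config; adapt to your own setup):</p>

```javascript
// eslint.config.js (sketch): fail the lint whenever the v4-era
// @ai-sdk/ui-utils package is imported, pointing people to 'ai' instead.
export default [
  {
    rules: {
      "no-restricted-imports": [
        "error",
        {
          paths: [
            {
              name: "@ai-sdk/ui-utils",
              message: "v4-only package; import types from 'ai' in v5.",
            },
          ],
        },
      ],
    },
  },
];
```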
<blockquote>
<p>The image below shows me preparing my client, still running <code>v4</code>.</p>
</blockquote>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761210660691/2e3277e1-d897-4772-a7ea-ed1e871ee7a1.png" alt="I was preparing client in v4 to upgrade to v5 as easy as possible. But I used the wrong FileUIPart...." class="image--center mx-auto" /></p>
<p>and updating the options argument for the <code>submit</code> - <code>sendMessage</code> method:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761212041021/f14627e1-ef95-42ea-ac2b-a124882247c8.png" alt class="image--center mx-auto" /></p>
<p>As for the backend, in <code>v5</code> I had to write a tiny remapping to inject the attachments as message parts:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Start of v4 remapping -----------------------------------------------------</span>
<span class="hljs-comment">// This is only received from v4, delete once all clients are on v5</span>
<span class="hljs-keyword">const</span> remappedMessages = isSdkV4
    ? messages.map(<span class="hljs-function">(<span class="hljs-params">message</span>) =&gt;</span> {
            <span class="hljs-keyword">const</span> parts = message.parts || [];

            <span class="hljs-keyword">if</span> (
                <span class="hljs-string">'experimental_attachments'</span> <span class="hljs-keyword">in</span> message &amp;&amp;
                <span class="hljs-built_in">Array</span>.isArray(message.experimental_attachments)
            ) {
                message.experimental_attachments.forEach(<span class="hljs-function">(<span class="hljs-params">attachment</span>) =&gt;</span> {
                    parts.push({
                        <span class="hljs-keyword">type</span>: <span class="hljs-string">'file'</span>,
                        mediaType: attachment.contentType,
                        url: attachment.url,
                    });
                });
                <span class="hljs-keyword">delete</span> message.experimental_attachments;
            }

            <span class="hljs-keyword">return</span> {
                ...message,
                parts,
            };
        })
    : messages;

<span class="hljs-keyword">const</span> modelMessages = convertToModelMessages(remappedMessages);
<span class="hljs-comment">// console.log({ modelMessages, messages });</span>
<span class="hljs-comment">// End of v4 remapping</span>

<span class="hljs-keyword">const</span> res = streamText({
    ...
    messages: modelMessages,
    ...
});
</code></pre>
<p>For clarity: in <code>v4</code> the <code>experimental_attachments</code> are handled automagically inside the <code>streamText</code> function - it just works! In <code>v5</code> I have to remap them into parts using the snippet above, then apply <code>convertToModelMessages</code> to the messages and pass the result to <code>streamText</code>.</p>
<p>Perhaps, after reading this section, you can see how many types there are. If I had to work on this migration longer, I would create a mind map of all the types and how they are used, to have clarity at a glance. If I got a bonus for every <code>key</code> used to send the <code>mimeType</code> of an attachment/image, I’d be on vacation in Hawaii haha.</p>
<p>Worth a mention: in <code>v4</code> the attachments had to be rendered by me in the UI and were not tied to any message by the <code>useChat</code> logic - it was up to me, the developer, to decide where and how to render them. In <code>v5</code> they became part of the message and are rendered as a <code>file</code> part:</p>
<pre><code class="lang-typescript">{
  messages.map(<span class="hljs-function"><span class="hljs-params">message</span> =&gt;</span> (
    &lt;div key={message.id}&gt;
      {message.parts.map(<span class="hljs-function">(<span class="hljs-params">part, index</span>) =&gt;</span> {
        <span class="hljs-keyword">if</span> (part.type === <span class="hljs-string">'text'</span>) {
          <span class="hljs-keyword">return</span> &lt;div key={index}&gt;{part.text}&lt;/div&gt;;
        } <span class="hljs-keyword">else</span> <span class="hljs-keyword">if</span> (
          part.type === <span class="hljs-string">'file'</span> &amp;&amp;
          part.mediaType.startsWith(<span class="hljs-string">'image/'</span>)
        ) {
          <span class="hljs-keyword">return</span> &lt;img key={index} src={part.url} /&gt;;
        }
      })}
    &lt;/div&gt;
  ));
}
</code></pre>
<p>This is a perfect transition into the next and last major section.</p>
<h2 id="heading-update-frontend-client-to-use-messageparts-instead-of-messagecontent">Update frontend client to use <code>message.parts</code> instead of <code>message.content</code></h2>
<p>This is technically not a breaking change, because <code>message.parts</code> was already available in <code>v4</code> and I was partially using it in the frontend - but not everywhere. So I had to go over each place and ensure I no longer relied on the deprecated <code>message.content</code> property for rendering.</p>
<p>However, there was a change in the way the parts are shaped and rendered.</p>
<p>It became more of a free-form array with support for various part types, and I strongly agree with that kind of design. It’s very <em>SOLID</em>.</p>
<p>docs: <a target="_blank" href="https://ai-sdk.dev/docs/migration-guides/migration-guide-5-0#content--parts-array">https://ai-sdk.dev/docs/migration-guides/migration-guide-5-0#content--parts-array</a></p>
<p>For example, tool parts now include the tool name in their type, and reasoning is now a part shaped just like a standard text reply. This makes designing reusable components so much more intuitive and simple.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// version 4</span>
{
  message.parts.map(<span class="hljs-function"><span class="hljs-params">part</span> =&gt;</span> {
    <span class="hljs-keyword">if</span> (part.type === <span class="hljs-string">'tool-invocation'</span>) {
      <span class="hljs-keyword">return</span> &lt;div&gt;{part.toolInvocation.toolName}&lt;/div&gt;;
    }
  });
}
{
  message.parts.map(<span class="hljs-function">(<span class="hljs-params">part, index</span>) =&gt;</span> {
    <span class="hljs-keyword">if</span> (part.type === <span class="hljs-string">'reasoning'</span>) {
      <span class="hljs-keyword">return</span> (
        &lt;div key={index} className=<span class="hljs-string">"reasoning-display"</span>&gt;
          {part.reasoning}
        &lt;/div&gt;
      );
    }
  });
}

<span class="hljs-comment">// version 5</span>
<span class="hljs-comment">// Type-safe tool parts with specific names</span>
{
  message.parts.map(<span class="hljs-function"><span class="hljs-params">part</span> =&gt;</span> {
    <span class="hljs-keyword">switch</span> (part.type) {
      <span class="hljs-keyword">case</span> <span class="hljs-string">'tool-getWeatherInformation'</span>:
        <span class="hljs-keyword">return</span> &lt;div&gt;Getting weather...&lt;/div&gt;;
      <span class="hljs-keyword">case</span> <span class="hljs-string">'tool-askForConfirmation'</span>:
        <span class="hljs-keyword">return</span> &lt;div&gt;Asking <span class="hljs-keyword">for</span> confirmation...&lt;/div&gt;;
    }
  });
}
{
  message.parts.map(<span class="hljs-function">(<span class="hljs-params">part, index</span>) =&gt;</span> {
    <span class="hljs-keyword">if</span> (part.type === <span class="hljs-string">'reasoning'</span>) {
      <span class="hljs-keyword">return</span> (
        &lt;div key={index} className=<span class="hljs-string">"reasoning-display"</span>&gt;
          {part.text}
        &lt;/div&gt;
      );
    }
  });
}
</code></pre>
<h2 id="heading-the-easy-stuff">The easy stuff</h2>
<h3 id="heading-using-codemod-would-import-old-zod-version">Using <code>codemod</code> would import old Zod version</h3>
<p>I have no idea why, but when I tried <code>npx @ai-sdk/codemod v5</code> to update the backend, it would rewrite all Zod imports to version 3 - <code>import * as z from "zod/v3";</code>. It happened both before and after my own migration, even though I had already upgraded to Zod version 4 in <code>package.json</code>. Might be a bug or some stale cache issue. Who knows. I just had to revert those lines.</p>
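<p>A quick check I could have run after the codemod to spot those lines (illustrative; the demo below uses a scratch directory, while in a real repo I’d point the grep at <code>src/</code>):</p>

```shell
# Demo in a scratch directory: create one file with the bad import and one
# with the good import, then list the files the codemod pinned to Zod v3.
dir=$(mktemp -d)
printf 'import * as z from "zod/v3";\n' > "$dir/schema.ts"
printf 'import * as z from "zod";\n' > "$dir/ok.ts"
grep -rl 'from "zod/v3"' "$dir"
```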
<p>Overall, the codemod was only helpful for confirming that I hadn’t missed anything myself, but by no means was it a replacement for the actual work.</p>
<p>Keep a sharp eye on that.</p>
<h3 id="heading-updating-small-renames">Updating small renames</h3>
<p>The rest of the changes were mostly easy, such as changing <code>maxTokens</code> to <code>maxOutputTokens</code>, renaming <code>coreMessages</code> to <code>modelMessages</code>, and so on.</p>
<p>Fortunately there weren’t many variables that needed to be updated so I am really grateful for that.</p>
<p>Those are well documented in the official migration guide: <a target="_blank" href="https://ai-sdk.dev/docs/migration-guides/migration-guide-5-0">https://ai-sdk.dev/docs/migration-guides/migration-guide-5-0</a>.</p>
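<p>To illustrate the kind of renames involved, here is a toy helper (my own sketch, not part of the SDK or the codemod) that maps the old option names to the new ones:</p>

```typescript
// Illustrative only: the option renames I hit, expressed as a lookup map.
const v4ToV5OptionRenames: Record<string, string> = {
  maxTokens: "maxOutputTokens",
  // (coreMessages -> modelMessages was a variable rename, not an option,
  // but the same "grep and rename" approach applies.)
};

// Return a copy of the options object with old keys renamed,
// leaving keys that did not change untouched.
function renameOptions(opts: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(opts).map(([key, value]) => [v4ToV5OptionRenames[key] ?? key, value]),
  );
}
```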
<p>I think my biggest score is that I never stored these messages in the database. The chats are sort of ephemeral: they’re cached on the client and, to an extent, in transit, but that’s it. I really feel like I scored a ton by not needing to migrate database schemas or the like.</p>
<h1 id="heading-summary">Summary</h1>
<p>It was somewhat overwhelming at first but after a weekend and 10-20 hours of deep focus, I was done and very happy.</p>
<p>If I were to do it again, I don’t think I’d change anything major. My biggest lesson was to keep code on the previous version as up to date as possible (i.e. adopt message parts instead of <code>.content</code> or <code>.reasoning</code>).</p>
<p>My biggest payoff across my project repositories was keeping complexity low and having some abstraction between someone else’s code (the AI SDK, in this case) and my own logic. That is, while my extension supports multiple platforms such as WhatsApp, Tinder and Bumble, I have a shared set of methods that accepts a standardized structure, so I only needed to adapt that one set of methods to the AI SDK’s new structure. It’s like a mini proxy.</p>
<p>Typically I feel too lazy to write those on small projects, but as soon as there are multiple sources, it is such a no-brainer investment. If one day I want to ditch Vercel for any other proxy, it wouldn’t be too hard (hopefully that day never comes).</p>
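<p>To make the mini-proxy idea concrete, here is a minimal sketch (all names are hypothetical illustrations, not my actual code):</p>

```typescript
// Each platform adapter normalizes its raw payload into one standard shape,
// and only one function knows how the AI SDK wants messages shaped.
type StandardMessage = {
  role: "user" | "assistant";
  text: string;
  imageUrls: string[];
};

interface PlatformAdapter {
  // e.g. WhatsApp, Tinder, Bumble each implement this once
  toStandard(raw: unknown): StandardMessage[];
}

// The single place that changes when the SDK's message shape changes:
function toSdkParts(msg: StandardMessage) {
  return {
    role: msg.role,
    parts: [
      { type: "text" as const, text: msg.text },
      ...msg.imageUrls.map((url) => ({
        type: "file" as const,
        mediaType: "image/jpeg", // assumption for the sketch
        url,
      })),
    ],
  };
}
```

When the v4-to-v5 parts change landed, only <code>toSdkParts</code> needed touching, not every platform adapter.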
<p>Hope that helps!</p>
]]></content:encoded></item><item><title><![CDATA[Unread GitHub notification is killing me]]></title><description><![CDATA[For a couple of weeks, if not months at this point, I’ve been feeling terrorized as I am someone who only subscribes to notifications that are of an interest to me. That is so, when I find myself on the GitHub website, I would intentionally click on ...]]></description><link>https://lukasnotes.dk/unread-github-notification-is-killing-me</link><guid isPermaLink="true">https://lukasnotes.dk/unread-github-notification-is-killing-me</guid><category><![CDATA[GitHub]]></category><category><![CDATA[notifications]]></category><category><![CDATA[curl]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Thu, 09 Oct 2025 09:39:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1760005729420/5ea395ba-dabb-4423-b2bb-6d08945b83c7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For a couple of weeks, if not months at this point, I’ve been feeling terrorized. I am someone who only subscribes to notifications that are of interest to me, so when I find myself on the GitHub website, I intentionally click on them to stay informed about the issues that matter to me.</p>
<p>However, since late summer I’ve been seeing a badge for a new notification with no way to clear it. I was hoping it would go away by itself one day. I even tried the GitHub app on my phone. Nothing.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760002680433/b89ba91d-47ff-4759-b95d-33fd32d77445.png" alt class="image--center mx-auto" /></p>
<p>My OCD is unfortunately too strong, and it got to the point where I was ready to open a ticket and complain about it. But I must say they did a good job suggesting issues that might be related.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760001654007/29a7da38-705a-4e7a-9c53-2e22ae495c4a.png" alt class="image--center mx-auto" /></p>
<p>I click on the first one and quickly discover a pleasant idea - to use the API to mark all notifications as read.</p>
<p>Ref: <a target="_blank" href="https://github.com/orgs/community/discussions/6874#discussioncomment-2859125">https://github.com/orgs/community/discussions/6874#discussioncomment-2859125</a></p>
<p>Let’s try. I open my terminal in Warp, set up a GitHub token using 1Password (I already use that), and make the <code>cURL</code> call.</p>
<pre><code class="lang-bash">TOKEN=$(op <span class="hljs-built_in">read</span> op://Personal/GithubAccountItem/token)
curl -X PUT \
    -H <span class="hljs-string">"Accept: application/vnd.github.v3+json"</span> \
    -H <span class="hljs-string">"Authorization: token <span class="hljs-variable">$TOKEN</span>"</span> \
    https://api.github.com/notifications \
    -d <span class="hljs-string">'{"last_read_at":"2025-10-10T00:00:00Z"}'</span>
</code></pre>
<p>Yayy, I feel relieved! I am no longer distracted and that matters a ton to me.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1760002041670/4c8bfb0d-e335-4c76-a800-5d803c7dcfca.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[Working with legacy hurts...]]></title><description><![CDATA[Yes, I switched companies for the sole purpose of being afraid of staying in the comfort zone and a decade later realizing that the world is not what my reality had been shaping it. Now I don't think there was anything wrong with that per se, but I a...]]></description><link>https://lukasnotes.dk/working-with-legacy-hurts</link><guid isPermaLink="true">https://lukasnotes.dk/working-with-legacy-hurts</guid><category><![CDATA[legacy-systems]]></category><category><![CDATA[vibe coding]]></category><category><![CDATA[lessons learned]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Sun, 23 Mar 2025 07:22:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Oal07Ai4oTk/upload/4f6b832541d7ce19b73eab76afa6ca91.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Yes, I switched companies out of fear of staying in the comfort zone and, a decade later, realizing that the world is not what my reality had shaped it to be. Now I don't think there was anything wrong with that per se, but I am definitely learning and definitely out of the comfort zone.</p>
<p>I work with a project that cannot afford any downtime. It powers a business spanning time zones around the clock. That means no cutting corners to migrate legacy, no cheat codes that I pretty regularly execute on my side projects.</p>
<p>I started writing this article out of frustration ignited while working on a particular task. The objective was to expose, in the new system, data shown in the legacy system, which has its own unique database design that we are forced to query from the new system running Doctrine.</p>
<h2 id="heading-confusing-pivot-naming-to-foreign-tablescolumns">Confusing Pivot Naming to Foreign Tables/Columns</h2>
<p>For example, one pain point is obvious looking at this table architecture:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> products <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">id</span> = <span class="hljs-number">151</span>; <span class="hljs-comment">-- main table</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> products_info <span class="hljs-keyword">WHERE</span> product = <span class="hljs-number">151</span> <span class="hljs-keyword">AND</span> deleted <span class="hljs-keyword">IS</span> <span class="hljs-literal">NULL</span>; <span class="hljs-comment">-- versioning table</span>
</code></pre>
<p>I really hate that the Doctrine project I'm working on uses this column naming convention, where relationship columns don't include the <code>_id</code> suffix, even though <code>id</code> is the column name on the foreign table that the pivot references.</p>
<p>I grew up working with Laravel and then spread into other projects. I don't need to use raw SQL often, since ORMs such as Laravel's Eloquent or Drizzle from the TypeScript world are simply amazing.</p>
<p>I've seen a lot of open source projects (obviously not as aged) and never came across this convention. I might be completely wrong, but it is so hard to get used to. Even though I've been working with this database for months, I still end up wasting hours, needing another day, another <em>look</em>, to actually SEE "oh wait, this is actually the foreign column for <em>this</em> table and <em>this</em> column."</p>
<p>I could imagine that this is very much MY problem and the way I was raised. I imagine if I went to that one country where people bow to say NO and turn their head to say YES, I'd want to hang myself alive :-). It is definitely difficult to unlearn a convention. I don't want to either. In my opinion, a column containing data should be self-explanatory and a column containing a reference should also be self-explanatory as to which column it links to.</p>
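<p>For contrast, the convention my muscle memory keeps expecting would look like this (hypothetical renamed columns, not the real schema):</p>

```sql
-- Reference columns carry the _id suffix; status columns say what they hold
SELECT * FROM products WHERE id = 151;
SELECT * FROM products_info WHERE product_id = 151 AND deleted_at IS NULL;
```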
<h2 id="heading-no-unique-indices-besides-the-primary-key-index">No (Unique) Indices Besides the Primary Key Index</h2>
<p>That brings me to the next part. I have absolutely no idea why, back in the early 2000s/2010s, there was no convention of adding unique indices/constraints.</p>
<p>There are so many tables whose only index is the <em>Primary Key</em> constraint. Just as there are many tables where duplicate data gets inserted, sometimes as many as a dozen times, due to the lack of unique constraints (and no code-side checks).</p>
<p>It makes me think that a career on UpWork to optimize one's database queries by merely adding indices could be a great one. Of course, that implicitly comes with a great clean-up task before constraints can be added.</p>
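<p>As a sketch of what that clean-up plus constraint could look like on a hypothetical pivot table (MySQL syntax, names invented for illustration):</p>

```sql
-- 1) Remove duplicate rows first, keeping the lowest id per logical key
DELETE t1 FROM product_tags t1
JOIN product_tags t2
  ON t1.product_id = t2.product_id
 AND t1.tag_id = t2.tag_id
 AND t1.id > t2.id;

-- 2) Only then can the unique constraint be added
ALTER TABLE product_tags
  ADD CONSTRAINT uq_product_tag UNIQUE (product_id, tag_id);
```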
<h2 id="heading-not-storing-file-upload-status-on-the-table">Not Storing File Upload Status on the Table</h2>
<p>This is a very recent task that I find rather comical. Some data comes with media attached to it; a user can upload, delete, and view it. I am really glad the system leverages S3 storage, since it's one of my favorite technologies in web dev. However, instead of storing the bucket path in the database, or at least a boolean for whether the upload happened (it's optional and rare), the code lists all files in the bucket (without iterating, so it was capped at the first 1000 results), stores them in an in-memory array, and then checks by a certain key whether the file exists - which decides whether it's included as a pre-signed URL in the API response.</p>
<p>I bet that's what AI would vibe code, or what me as a 16-year-old learning to code would write. But not what a professional developer in a multinational company should ever put into production.</p>
<p>There are a couple of problems:</p>
<ul>
<li><p>At the very least, it should iterate and ensure all results are returned.</p>
</li>
<li><p>It should implement a very primitive cache in Redis with tags to store that in-memory array. Tags are to be purged upon new upload. Or simply update the list in Redis upon upload.</p>
</li>
<li><p>But that is redundant. There is no need to query S3 server-side other than uploading or deleting the file.</p>
</li>
<li><p>Ideally, the bucket path should be stored in a nullable column on the table.</p>
</li>
<li><p>If the column has a value, presign the URL and return it.</p>
</li>
<li><p>Very fast, clean, and scalable.</p>
</li>
</ul>
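<p>The bullet list above can be sketched like this (TypeScript, all names hypothetical; <code>presign</code> would wrap the S3 SDK’s pre-signing call):</p>

```typescript
// Sketch: the nullable column is the source of truth, and the presigner is
// injected so no S3 listing is ever needed on the read path.
type MediaRecord = { id: number; bucketPath: string | null };

async function withMediaUrl(
  record: MediaRecord,
  presign: (path: string) => Promise<string>, // e.g. a getSignedUrl wrapper
): Promise<MediaRecord & { mediaUrl: string | null }> {
  // No bucket path stored means no upload happened: skip S3 entirely.
  const mediaUrl = record.bucketPath ? await presign(record.bucketPath) : null;
  return { ...record, mediaUrl };
}
```

Fast, clean, and the only S3 calls left server-side are the upload and delete themselves.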
<p>And there is a further list of countless examples that really frustrate me. Sometimes it makes me question and doubt what the meaning of working on things like this is. That is hard, because I won't know until I'm gray and old whether I was right or wrong to think the way I do. But right now, I strive to remind myself that the character traits and awareness I can grow working in contexts like this will be a great reward down the path.</p>
<p>As The Primeagen says, if you choose the easy path now, you pay the price later. And vice versa. Or as my grandma used to say, what you learn, you don't carry on your back. It feels like I'm paying the price for someone else now but we shall find out haha! I also write this to double down on my positive mindset and show myself what I am learning.</p>
<h2 id="heading-something-im-learning">Something I'm Learning…</h2>
<ul>
<li><p>If you migrate something in your folder structure, do it all or not at all, so you don't end up with folders called <code>shared</code>, <code>dump</code>, <code>smart</code>, <code>new</code> and <code>components</code> for storing a frontend's components.</p>
</li>
<li><p>Prefer to stick with defaults for as long as possible, because one day, when you want to migrate, it will be FAR easier to understand how things work now and how to migrate them - looking at you, legacy frontend project running WebPack with a hundred and one customizations.</p>
</li>
<li><p>Isolate to one change per PR as much as possible. I hate this in practice because it feels so <em>easy and tempting</em> to refactor everything at once (might be my ADHD/OCD to blame), but it is an Achilles' heel when working with a sensitive system and a legacy code base.</p>
</li>
<li><p>Patience, really, just be slow and do things slow. Perhaps the biggest thing that sometimes drives me absolutely nuts. I will sit in front of a computer yet my heart will race up like I'm running up a hill. It's funny I react that way. Makes me even laugh afterwards realizing the irony. But it does make me wonder how I can build this mental muscle to be indifferent. Because with the rise of "vibe coding," people with the ability to work on legacy will be in as high demand as developers were a decade ago.</p>
</li>
<li><p>Of course, plenty of examples of what NOT to do in architecting the system for scale and many years to maintain in production.</p>
</li>
</ul>
<h2 id="heading-takeaways">Takeaways</h2>
<ul>
<li><p>If you're not a very calm person by nature (say, you have ADHD), meditate each morning. As little as 8 minutes can do wonders; 15 minutes is wonderful. It helps a lot to remain unaffected when you literally need to dive into challenging code. I now do this daily, and without it I notice how I'm all over the place. Just to double down: this, along with running outside, crosses out more than half of the pills &amp; tips in the <em>/r/Nootropics</em> subreddit.</p>
</li>
<li><p>Consistently remind yourself that frustration comes as a <em>by-product</em> of learning something new. It is a chemical signal released in the brain that enables plasticity, establishing new <strong>synapses</strong> (the links between neurons, our knowledge storage). Plus, building this versatility with legacy is going to be a superior shield against AI taking your job sooner. I don't know this 100% for sure, but I'd rather believe it than waste energy doubting.</p>
</li>
<li><p>I'm here to learn. I hope that my mind will absorb these mistakes of others and I can learn from them. So that the code I write will look better for future developers. :)</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[How I Got a Free NemKonto for one man's company in Denmark]]></title><description><![CDATA[As I strive to improve Danish, I wrote this article first in Danish and then AI-translated to English. Read Danish version here https://lukasnotes.dk/hvordan-jeg-fik-gratis-nemkonto-til-min-enkeltmandsvirksomhed.

My Story
When ChatGPT came out, I wa...]]></description><link>https://lukasnotes.dk/how-i-got-a-free-nemkonto-for-one-mans-company-in-denmark</link><guid isPermaLink="true">https://lukasnotes.dk/how-i-got-a-free-nemkonto-for-one-mans-company-in-denmark</guid><category><![CDATA[NemKonto]]></category><category><![CDATA[enkeltmandsvirksomhed]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Sat, 22 Mar 2025 13:05:06 GMT</pubDate><content:encoded><![CDATA[<p>As I strive to improve Danish, I wrote this article first in Danish and then AI-translated to English. Read Danish version here <a target="_blank" href="https://lukasnotes.dk/hvordan-jeg-fik-gratis-nemkonto-til-min-enkeltmandsvirksomhed">https://lukasnotes.dk/hvordan-jeg-fik-gratis-nemkonto-til-min-enkeltmandsvirksomhed</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742648677647/fb0e1d41-d160-4733-89ec-efc1bd2f06d5.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-my-story">My Story</h2>
<p>When ChatGPT came out, I was inspired to try building something in my spare time. In 2023, I opened a tiny sole proprietorship. I also opted to get a VAT number, as I could use it when ordering cloud services without paying an extra 25% in VAT. It's called EU reverse charge. And of course, this comes as a huge advantage for someone who only uses their own salary money and runs a non-commercial business.</p>
<p>Long story short, in 2024 I discovered that I had around 1,500 kroner stuck in the Danish Agency for Digital Government. Even though a sole proprietorship is almost the same as if I had done things without registering it, it doesn't use the same NemKonto. This is what I was told by a SKAT employee during a call regarding VAT.</p>
<p>I just want to say that SKAT's helpline is always so good and kind at explaining things and being patient with me as a foreigner who knows so little. I really appreciate that about Denmark.</p>
<p>Nevertheless, I did a little research and found that the cheapest account I could get still costs four thousand kroner per year. It shouldn't be necessary to pay the Danish banks that much, and when you're not making a profit, it doesn't make sense to pay at all. I also had a good conversation with my friend, who has been self-employed for many years now, and he said that unfortunately there's no free solution in the country.</p>
<p>But when I was receiving SU (the Danish student grant), I once tried changing my NemKonto to the German N26 bank, and it worked perfectly. So I thought, why couldn't I do the same in this situation? I have a private Revolut account, and Revolut also offers a freelancer add-on for private users called Revolut Pro. It's not the same as Revolut Business or Lunar Business, which have both charged fees since last year. The best thing about Revolut is that you can hold many different currencies at the same time, so it's perfect for me.</p>
<h2 id="heading-the-how-to">The How-To</h2>
<p>A quick Google search, and I looked at the NemKonto website: <a target="_blank" href="https://www.nemkonto.dk/en/company/nemkonto-for-companies/nemkonto-in-a-foreign-financial-institution/"><strong>https://www.nemkonto.dk/en/company/nemkonto-for-companies/nemkonto-in-a-foreign-financial-institution/</strong></a></p>
<p>UPDATE, January 2026: They have finally added the ability to set up the account using MitID - took long enough, dear Denmark. Look for the tab called <strong>“Application with MitID Erhverv”</strong>.</p>
<p><s>Under "Application without MitID Business" I found a PDF form: </s> <a target="_blank" href="https://www.nemkonto.dk/media/0pmjunqx/dk-blanket-virksomheder-webtilgaengelig.pdf"><strong><s>https://www.nemkonto.dk/media/0pmjunqx/dk-blanket-virksomheder-webtilgaengelig.pdf</s></strong></a></p>
<p><s>My first thought was that I would need to pay for legal services to get approved. But after reading it again, I saw that it's enough for two Danish citizens with CPR numbers to sign it. That's exactly what I did. I printed and filled out the PDF with my IBAN from my Revolut Pro account with DKK currency and, of course, my CVR number. I asked two of my colleagues at work to sign it, scanned it, and sent it in through the NemKonto contact page here: </s> <a target="_blank" href="https://kontakt.nemkonto.dk/"><strong><s>https://kontakt.nemkonto.dk/</s></strong></a></p>
<p><s>The same day, I got a reply by email that it would be forwarded to the Digital Government Agency. And after 3-4 weeks, I got a response in my e-Boks saying that I should wait for a One-Time Code in my mailbox. It arrived a couple of days later (thanks PostNord), I followed the instructions on the paper, got activated, and within two days, the money was in my account.</s></p>
<p>Super easy, completely free. I would say that the Revolut app is even better than DanskeBank's, but of course, it depends on one's taste.</p>
]]></content:encoded></item><item><title><![CDATA[Hvordan jeg fik gratis NemKonto til min enkeltmandsvirksomhed]]></title><description><![CDATA[If you don’t read Danish, I have translated this article into English over at https://lukasnotes.dk/how-i-got-a-free-nemkonto-for-one-mans-company-in-denmark.

Mine historie
Da ChatGPT kom ud, blev jeg inspireret til at prøve at opbygge noget i min e...]]></description><link>https://lukasnotes.dk/hvordan-jeg-fik-gratis-nemkonto-til-min-enkeltmandsvirksomhed</link><guid isPermaLink="true">https://lukasnotes.dk/hvordan-jeg-fik-gratis-nemkonto-til-min-enkeltmandsvirksomhed</guid><category><![CDATA[NemKonto]]></category><category><![CDATA[enkeltmandsvirksomhed]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Sat, 22 Mar 2025 13:00:26 GMT</pubDate><content:encoded><![CDATA[<p>If you don’t read Danish, I have translated this article into English over at <a target="_blank" href="https://lukasnotes.dk/how-i-got-a-free-nemkonto-for-one-mans-company-in-denmark">https://lukasnotes.dk/how-i-got-a-free-nemkonto-for-one-mans-company-in-denmark</a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742648092116/22630b8f-dc13-422f-bb97-05f1fc40abba.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-mine-historie">Mine historie</h2>
<p>Da ChatGPT kom ud, blev jeg inspireret til at prøve at opbygge noget i min egen fritid. I 2023 åbnede jeg en lillebitte enkeltmandsvirksomhed. Jeg sagde også ja tak til at få momsnummer, da jeg kan bruge det, når jeg bestiller cloud-tjenester uden at betale ekstra 25% i moms. Det hedder EU reverse charge. Og det kommer selvfølgelig som en stor fordel til én, der kun bruger egne lønpenge og kører <a target="_blank" href="https://skat.dk/erhverv/egen-virksomhed/afklar-virksomhedens-skatteforhold/ikke-erhvervsmaessig-virksomhed-eksempler">en ikke-erhvervsmæssig virksomhed</a>.</p>
<p>Lang historie kort: i 2024 fandt jeg ud af, at jeg havde omkring 1.500 kroner, der sad fast i Digitaliseringsstyrelsen. Selvom enkeltmandsvirksomheden næsten er det samme, som hvis jeg havde gjort det uden at registrere den, bruger den ikke den samme NemKonto. Det blev jeg fortalt af en medarbejder hos SKAT under et opkald vedrørende moms.</p>
<p>Jeg skal bare sige, at SKATs hjælpelinje altid er så god og venlig til at forklare og være tålmodig med mig som en mand fra udlandet, der ved så lidt. Det sætter jeg stor pris på i Danmark.</p>
<p>Alligevel gjorde jeg en lille undersøgelse og fandt ud af, at den billigste konto, jeg kunne få, stadig koster fire tusind kroner om året. Det er meningen, at man ikke skal betale til de danske banker, men når man ikke laver overskud, giver det ikke mening at betale. Jeg havde også en god samtale med min ven, som har været selvstændig i mange år nu, og han sagde, at desværre er der ikke en gratis løsning i landet.</p>
<p>Men da jeg fik SU, prøvede jeg engang at skifte min NemKonto til den tyske N26 bank, og det virkede perfekt. Så jeg tænkte, hvorfor kunne jeg ikke gøre det samme i denne situation? Jeg har en Revolut privat konto. Men Revolut har også et freelancer-addon som en konto til private brugere. Det hedder Revolut Pro. Og det er ikke det samme som Revolut Business eller Lunar Business, som begge har gebyr siden sidste år. Det bedste ved Revolut er, at man kan have mange forskellige valutaer på samme tid. Så det passer perfekt til mig.</p>
<h2 id="heading-the-how-to">The How To</h2>
<p>En hurtig Google-søgning, og jeg kiggede på NemKontos hjemmeside: <a target="_blank" href="https://www.nemkonto.dk/virksomhed/nemkonto-for-virksomheder/nemkonto-i-et-udenlandsk-pengeinstitut/"><strong>https://www.nemkonto.dk/virksomhed/nemkonto-for-virksomheder/nemkonto-i-et-udenlandsk-pengeinstitut/</strong></a></p>
<p>OPDATERING i januar 2026: De har endelig tilføjet muligheden for at oprette en konto ved hjælp af MitID - det har taget lang tid nok, kære Danmark. Se efter fanen "<strong>Ansøgning med MitID Erhverv</strong>".</p>
<p><s>Under "Ansøgning uden MitID Erhverv" fandt jeg en PDF-blanket: </s> <a target="_blank" href="https://www.nemkonto.dk/media/0pmjunqx/dk-blanket-virksomheder-webtilgaengelig.pdf"><s>dk-blanket-virksomheder-webtilgaengelig.pdf</s></a></p>
<p><s>Min første tanke var, at jeg skulle betale for juristtjenester for at blive godkendt. Men efter jeg læste det igen, så jeg, at det er nok, at to danskere med CPR underskriver. Det er præcis, hvad jeg gjorde. Jeg udskrev og udfyldte PDF'en med min IBAN fra Revolut Pro konto med DKK valuta og selvfølgelig mit CVR. Jeg bad to af mine kolleger på arbejdspladsen om at underskrive, scannede det og sendte det ind gennem NemKontos kontaktside her: </s> <a target="_blank" href="https://kontakt.nemkonto.dk/"><strong><s>https://kontakt.nemkonto.dk/</s></strong></a></p>
<p><s>Samme dag fik jeg svar på email om, at det ville blive sendt videre til Digital Styrelsen. Og efter 3-4 uger fik jeg svar i e-Boks om, at jeg skulle vente på en One Time Code i min postkasse. Den ankom et par dage efter (tak PostNord), jeg fulgte instruktionerne på papiret, blev aktiveret, og efter to dage var pengene i min konto.</s></p>
<p>Super nemt, helt gratis. Jeg vil sige, at Revolut-appen er endnu bedre end DanskeBank, men selvfølgelig kommer det an på ens smag.</p>
]]></content:encoded></item><item><title><![CDATA[Migrating from Turso (libsql) to self hosted PostgreSQL 16 & connecting from Cloudflare Worker (Nov 2024)]]></title><description><![CDATA[Steps I took

Rewrite drizzle schema from sqlite to PostgreSQL and generate migration (*.sql) files.

Setup connection using Neon WebSocket proxy and test migration and code.

Attempt various ways to copy and settle on COPY via CSV type using stdout ...]]></description><link>https://lukasnotes.dk/migrating-from-turso-libsql-to-self-hosted-postgresql-16-connecting-from-cloudflare-worker-nov-2024</link><guid isPermaLink="true">https://lukasnotes.dk/migrating-from-turso-libsql-to-self-hosted-postgresql-16-connecting-from-cloudflare-worker-nov-2024</guid><category><![CDATA[turso]]></category><category><![CDATA[libsql]]></category><category><![CDATA[PostgreSQL]]></category><category><![CDATA[cloudflare-worker]]></category><category><![CDATA[neondatabase]]></category><category><![CDATA[Databases]]></category><category><![CDATA[honojs]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Wed, 29 Jan 2025 18:27:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Y7d265_7i08/upload/29e90ad4b7ff30395a1cef181d494fce.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-steps-i-took">Steps I took</h2>
<ol>
<li><p>Rewrite drizzle schema from sqlite to PostgreSQL and generate migration (*.sql) files.</p>
</li>
<li><p>Setup connection using Neon WebSocket proxy and test migration and code.</p>
</li>
<li><p>Attempt various ways to copy and settle on <code>COPY</code> via CSV type using <code>stdout</code> and <code>stdin</code>.</p>
</li>
<li><p>Perform complete migration during not busy moment.</p>
</li>
</ol>
<h2 id="heading-why-did-i-do-this">Why did I do this?</h2>
<p>In the early days of my little project for citizens of the world, I made a mistake by optimizing prematurely. I had initially chosen Cloudflare’s D1 database but I quickly outgrew the limits and was forced to migrate to Turso.</p>
<p>I wrote about that journey in this <a target="_blank" href="https://lukasnotes.dk/migrating-large-d1-to-turso">https://lukasnotes.dk/migrating-large-d1-to-turso</a> post.</p>
<p>After using the cloud option on the paid plan for a few months, I was still hitting the limits and seeing my bill grow even though there wasn’t that much usage. Without going into details: by its nature, my app produces a lot of writes, which cause a lot of changes in the WAL file - and the WAL is how <code>libsql</code> tracks changes and syncs them to the local embedded replica. I keep that replica to get zero-latency reads from Laravel.</p>
<p>I had built a simple sync script, should you find that relevant. <a target="_blank" href="https://github.com/flexchar/turso-sync">https://github.com/flexchar/turso-sync</a></p>
<p>The main Laravel app leverages PostgreSQL - the god-level database, as some call it in the rise of AI. Splitting data across two databases, Postgres and Sqlite, doubled my work in certain cases and limited the type of queries I could write.</p>
<p>After some procrastination, last November I got myself together and began tackling the challenge.</p>
<h2 id="heading-is-turso-bad-is-it-not-good-enough">Is Turso bad? Is it not good enough?</h2>
<p>No, it is not bad at all. It’s all about picking the right tool for the job.</p>
<p>It is amazing what the guys behind Turso are doing with their Sqlite fork, libsql. I had a chance to have a call with the very man behind it, Glauber Costa, as well as attend a few weekly meetings on their Discord. They’re going strong and I believe they will go far.</p>
<p>Let alone their recent announcement for a complete Sqlite rewrite in Rust - the project called Limbo (<a target="_blank" href="https://github.com/tursodatabase/limbo">https://github.com/tursodatabase/limbo</a>).</p>
<p>I think for standard blogs or simple <em>CRUD</em> apps, it’s more than enough. Don’t hesitate too much - “not how but start now”.</p>
<p>For me, I had cloned my app from another project where it was all about latency, and I kept driving down that road without realizing it was the wrong path for me.</p>
<h2 id="heading-making-a-plan">Making a plan</h2>
<p>I had quite a challenge actually doing the “not how but start now” as I wanted the <em>perfect</em> option. It’s such a weakness of mine. The first problem was to pick between:</p>
<p>A: do I create entire new database within the same <code>pgsql</code> container, or<br />B: do I run migrations in the same database that Laravel uses and is running its own migrations.</p>
<p>Neither sounded too nice. Route A would prevent me from making joins across tables in both sources. Route B could potentially cause issues with migrations and backups, and the two could collide with each other.</p>
<p>Then came the thought of Route C. Let’s use the same database but different schemas. It’s like a namespace to group tables. It’s something unique to Postgres as neither Mysql nor Sqlite has it.</p>
<p>There’s a nice explanation on Reddit: <a target="_blank" href="https://www.reddit.com/r/PostgreSQL/comments/qlqmyh/can_someone_help_me_understand_schemas_and_how/">https://www.reddit.com/r/PostgreSQL/comments/qlqmyh/can_someone_help_me_understand_schemas_and_how/</a>. I had never used them myself but Laravel by default uses <code>public</code> schema. I decided to create <code>drizzle</code> schema and have a logical separation for the migrations and data being written from the worker.</p>
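What Route C looks like in practice can be sketched roughly like this, assuming drizzle-orm's `pg-core` API; the table and column names here are made up purely for illustration, not taken from my app:

```typescript
// Hypothetical sketch: declare worker tables under a dedicated "drizzle"
// schema so they live in the same database as Laravel's "public" tables
// without colliding. Table and column names are illustrative only.
import { pgSchema, serial, text, timestamp } from 'drizzle-orm/pg-core';

export const workerSchema = pgSchema('drizzle');

export const events = workerSchema.table('events', {
    id: serial('id').primaryKey(),
    payload: text('payload'),
    createdAt: timestamp('created_at', { withTimezone: true }).defaultNow(),
});
```

The generated migrations should then create the schema and qualified tables (`drizzle.events`), while Laravel keeps writing to `public` untouched.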
<p>It felt fine to me, and in retrospect I can still say it was a good decision.</p>
<h2 id="heading-establishing-a-connection-to-self-hosted-postgres-from-cloudflare-worker">Establishing a connection to self-hosted Postgres from Cloudflare Worker</h2>
<p>Before copying the data, I updated my worker’s Drizzle config to connect to Postgres. It didn’t come without challenges.</p>
<p>I care very much about security and privacy, and I would really like to live my life without ever leaking anyone’s data. I also self-host the entire stack, as it’s absurdly cheap on Hetzner. Even if one were to SSH into the server, the database connection is not exposed: there are no open ports or TCP listeners. I use Docker and leverage UNIX sockets for ultimate performance. I absolutely love that!</p>
<p>Anyway, my ideal solution was to use Cloudflare Tunnel to tap into the container and expose the database on an internal domain (protected by Cloudflare Access) to my worker, also running on CF.</p>
<p>The issue is that the Workers runtime and the postgres-js package did not support connecting directly. I could get some things to work locally but then not in the cloud/production, or the other way around.</p>
<p>Fortunately, Neon Tech (the serverless &amp; bottomless-storage Postgres startup) has <a target="_blank" href="https://github.com/neondatabase/wsproxy">a wonderful container</a> to proxy requests from serverless environments to any Postgres instance. However, as I mentioned before, I prefer to use UNIX sockets. I got lucky mixing bits of Go knowledge with Claude Sonnet and ended up with updated code that connects via a socket.</p>
<p>So now I had a Docker Compose stack with <code>pgsql</code>, <code>neon-ws-proxy</code> and <code>cloudflare-tunnel</code>, exposed under a private domain. After tinkering a bit with the Drizzle Neon client, I was able to connect. One catch: I have to close connections myself, as otherwise it would sometimes run into issues.</p>
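For reference, the Worker-side wiring looked roughly like this. This is a sketch only: the `wsProxy` path and the TLS/pipeline flags depend entirely on how the proxy and tunnel are deployed, so treat every value below as a placeholder rather than a recipe:

```typescript
// Rough sketch of connecting Drizzle through a self-hosted Neon wsproxy.
// All neonConfig values are placeholders - they depend on the proxy and
// tunnel setup, not on anything universal.
import { Pool, neonConfig } from '@neondatabase/serverless';
import { drizzle } from 'drizzle-orm/neon-serverless';

// Route the driver's WebSocket to the self-hosted proxy behind the tunnel.
neonConfig.wsProxy = (host) => `${host}/v1`; // placeholder path
neonConfig.useSecureWebSocket = true;        // the tunnel terminates TLS
neonConfig.pipelineTLS = false;
neonConfig.pipelineConnect = false;

export function makeDb(connectionString: string) {
    const pool = new Pool({ connectionString });
    // The catch: close the pool explicitly when the request is done.
    return { db: drizzle(pool), close: () => pool.end() };
}
```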
<p>Boom, it felt very nice. I was able to run migrations and had a list of tables in <code>drizzle</code> schema, right in my Laravel’s existing database.</p>
<h2 id="heading-copying-the-data-slow-start-fast-finish">Copying the data. Slow start, fast finish.</h2>
<p>I have users active almost all day long across the globe. Even though I can afford a bit of data loss in this table due to its nature, I really want that to be as close to zero rows as possible. That means: migrate from production to my local PG, check that everything runs smoothly, and if everything seems &amp; feels good, deploy my worker and start migrating ASAP - while new requests are written to the new database and some temporarily fail due to the not-yet-copied data.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">A larger organization with zero affordance for data loss would typically create a fallback option, where either they attempt to read from the new DB and fallback to the old DB; or keep the existing code and start backfilling data into new DB until it’s completely in sync, then flipping the switch.</div>
</div>
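The first fallback option in that callout can be sketched in a few lines; the names and types here are illustrative, not code from my app:

```typescript
// Sketch of "read from the new DB, fall back to the old one" during a
// backfill. Names and types are illustrative, not from the real app.
type Row = { id: number; value: string };
type Reader = (id: number) => Promise<Row | undefined>;

async function readWithFallback(
    id: number,
    readNew: Reader,
    readOld: Reader,
): Promise<Row | undefined> {
    const fresh = await readNew(id);
    if (fresh !== undefined) return fresh;
    return readOld(id); // row not backfilled yet - serve it from the old DB
}
```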

<h3 id="heading-via-script-to-read-and-insert-failed-route">Via script to read and insert (failed route)</h3>
<p>Initially I thought I would copy the data by writing a simple script in TS, using Drizzle to read from the Turso connection and insert into the PostgreSQL connection. In short, I wasted a good day and it was a pure failure. It was too slow.</p>
<p>It seems that, for whatever reason, Turso reads are slow, and when reading a table with bigger rows I’d quickly exceed the maximum 10MB response size. A table of several million rows, but I could only query a few hundred at a time. That went slowly. To put it into perspective, I had around 30GB of data and the migration could have taken an entire week. I still don’t know why it was that slow, but this is also one of the reasons I wanted to migrate: sometimes abnormal things would occur with the LibSQL instance and the Cloudflare sitting in front of it.</p>
<p>Since I was self-hosting <code>libsql</code> for around 4 months before the migration, I established a local connection forwarding the port from the server to my laptop, then used the Turso CLI <code>turso db shell http://localhost:8080?token={authTokenHere}</code> to access the shell.</p>
<h3 id="heading-via-dump-and-copy-success">Via <code>.dump</code> and COPY (success)</h3>
<p>I then searched for tools to automagically import <code>sql</code> from Sqlite into Postgres. While there are several options, none gave me the magic, and so I came across the <code>COPY</code> command and the <code>csv</code> route.</p>
<p>I dumped the entire database to a <code>datalake.db</code> file using the Turso CLI.</p>
<p>After a bit of trial and error, I found that by dumping a single table to a CSV file, copying it to the <code>pgsql</code> container and then loading it via PG’s <code>COPY</code> statement, I could import millions of rows in a matter of 1-3 minutes.</p>
<p>I had a few gotchas.</p>
<ol>
<li><p>I had to make sure that the table was completely empty before running the copy, or I would encounter unique-constraint violations or other errors.</p>
</li>
<li><p>The next gotcha was converting timestamps from Sqlite’s millisecond epoch to a proper UTC string that PG could understand.</p>
</li>
<li><p>Finally, my source schema was built over time with a lot of incremental migrations adding new columns; when I rewrote it for PG, I had only one migration with a slightly different column order. I had to make sure the column order matched exactly between source and target.</p>
</li>
</ol>
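The second gotcha is plain epoch arithmetic. In TypeScript, the same conversion that `DATETIME(created_at/1000, 'unixepoch')` performs inside sqlite3 looks like this (a standalone sketch, not code from the migration itself):

```typescript
// Convert Sqlite's millisecond Unix epoch into a UTC timestamp string that
// Postgres parses by default ("YYYY-MM-DD HH:MM:SS").
function msEpochToPgTimestamp(ms: number): string {
    return new Date(ms).toISOString().slice(0, 19).replace('T', ' ');
}

console.log(msEpochToPgTimestamp(0));             // "1970-01-01 00:00:00"
console.log(msEpochToPgTimestamp(1700000000000)); // "2023-11-14 22:13:20"
```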
<p>Eventually I nailed the process down to a single one-liner.</p>
<pre><code class="lang-plaintext">time sqlite3 /turso/datalake.db -header -csv "SELECT col_1,col_2,another_name,DATETIME(created_at/1000, 'unixepoch') as created_at FROM turso_table;" | PGPASSWORD=$DB_PASSWORD psql -U $DB_USERNAME -d $DB_DATABASE -c "\COPY drizzle.turso_table (my_col_1,my_col_2,my_another_col,created_at) FROM STDIN WITH (FORMAT csv, HEADER true)"
</code></pre>
<p>If you’re going to do the same, please read this carefully and play around to understand how it works. It gives far more value than a blind copy-paste.</p>
<h2 id="heading-execution">Execution</h2>
<p>That brings us to the final step. I had successfully <code>COPY</code>-ied data to PostgreSQL. I had established a connection from Drizzle in the Worker, and I was able to read the data with Laravel Eloquent models by updating the <code>$table</code> property to include the schema prefix.</p>
<p>I’m very happy!</p>
<p>I tweak my notes file with copy-paste commands ready for each database, and I ensure I can access my production databases and containers.</p>
<p>I look at the usage logs, pick the least busy hour (which happens to be around 7 AM CET), and execute.</p>
<p>It goes like:</p>
<ol>
<li><p>Dump the fresh production database completely into an <code>sqlite</code> file.</p>
</li>
<li><p>Create a fresh migration on production and deploy the worker.</p>
</li>
<li><p>Dump each table as CSV from the <code>sqlite</code> file.</p>
</li>
<li><p><code>COPY</code> the CSV into the Postgres table, starting with the busiest tables and ending with the least busy.</p>
</li>
<li><p>Rinse &amp; repeat until I’m done.</p>
</li>
<li><p>Deploy changes on Laravel app.</p>
</li>
<li><p>Enjoy!</p>
</li>
</ol>
<h2 id="heading-runtime-adjustments">Runtime adjustments.</h2>
<p>When using the Turso driver with Drizzle, I would often make a lot of calls inside <code>executionContext.waitUntil()</code>, which essentially awaits promises after the worker has sent the HTTP response.</p>
<p>Using the PostgreSQL Neon proxy driver, I quickly began facing an ocean of <code>Too Many Connections</code> errors. I had to rewrite my endpoints to <code>await</code> before responding and then explicitly close the connection from the Neon driver.</p>
<p>Since lots of these endpoints are write-only (think analytics-type APIs), it was a rather low-impact change for the end user. It did require a bit of a rewrite to get better access to the DB client. I’m using HonoJS, and its <code>c.set('db', databaseClient)</code> is a wonderful way to pass it around.</p>
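The shape of that rewrite, reduced to a runnable stub: the `DbClient` below is a fake standing in for the real driver, and the names are illustrative, but the pattern - await every write, then close the connection explicitly - is the same:

```typescript
// Stub illustrating the rewrite: await every write before responding, then
// close the connection explicitly instead of deferring via waitUntil().
type DbClient = {
    query(sql: string): Promise<void>;
    end(): Promise<void>;
};

function makeFakeClient(log: string[]): DbClient {
    return {
        async query(sql) { log.push(`query:${sql}`); },
        async end() { log.push('closed'); },
    };
}

async function handleRequest(log: string[]): Promise<string> {
    const db = makeFakeClient(log); // in Hono: c.set('db', db) in middleware
    try {
        await db.query('INSERT ...'); // awaited, not fire-and-forget
        return 'ok';
    } finally {
        await db.end(); // always release the connection before returning
    }
}
```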
<h2 id="heading-final-thoughts-was-it-worth">Final thoughts. Was it worth?</h2>
<p>Absolutely!</p>
<p>I wish I had done this from the get-go. Although back when I first started, some things weren’t available so I probably would have struggled more.</p>
<p>My greatest wins are:</p>
<ol>
<li><p>No more timeouts or errors when <code>SELECT</code>'ing too much data. Sqlite cannot leverage multiple indices for faster queries, whereas Postgres handles them much faster.</p>
<ol>
<li><em>Everyone says Sqlite is the fastest at read speeds, so I wouldn’t rule out a skill issue here.</em></li>
</ol>
</li>
<li><p>Much less code to maintain!! A very important win. One less database to back up (and restore, should that ever be needed).</p>
</li>
<li><p>I can write more complex and interesting queries. This is especially valuable for analytics. It has been an enormous pleasure.</p>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Letting Bun & TypeScript do my VAT numbers (Moms til SKAT Erhverv)]]></title><description><![CDATA[I am trying to build something on the side and so I've gotten my company number together with VAT registration. That means I can make purchases with VAT reverse charge. In Denmark, VAT is 25%. Excluding that means 1/4th cut in price which for a self-...]]></description><link>https://lukasnotes.dk/letting-bun-typescript-do-my-vat-numbers-moms-til-skat-erhverv</link><guid isPermaLink="true">https://lukasnotes.dk/letting-bun-typescript-do-my-vat-numbers-moms-til-skat-erhverv</guid><category><![CDATA[skat]]></category><category><![CDATA[Denmark]]></category><category><![CDATA[Bun]]></category><category><![CDATA[TypeScript]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Wed, 29 May 2024 16:12:48 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716999128071/8f1727d0-b407-4606-a1be-6f0c2cf70dc0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I am trying to build something on the side and so I've gotten my company number together with VAT registration. That means I can make purchases with VAT reverse charge. In Denmark, VAT is 25%. Excluding that means 1/4th cut in price which for a self-funded hustle is very appreciated.</p>
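The arithmetic behind that cut, as a toy calculation (nothing SKAT-specific, just the percentages):

```typescript
// Danish VAT is 25% of the net price. With the reverse charge you pay net
// only, skipping 1/4 of net - which equals 1/5 of the VAT-inclusive gross.
function grossWithDanishVat(net: number): number {
    return net * 1.25;
}

const net = 100;
const gross = grossWithDanishVat(net);
console.log(gross);                 // 125
console.log((gross - net) / net);   // 0.25 -> a quarter of the net price
console.log((gross - net) / gross); // 0.2  -> a fifth of the gross price
```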
<h2 id="heading-pre-history">Pre history</h2>
<p>When I first registered, I was using a bank that supported holding balances in DKK, EUR and USD. That made it a pain to do proper bookkeeping because of the mixed currencies. I tried Dinero.dk, which was nice, but it freaked out when I tried setting currencies other than Danish krone. Their support referred me to VISMA, the more serious platform. However, I quickly found it prohibitively complicated and demanding for my very simple case.</p>
<p>As time proved, I settled for an Excel sheet with basic columns such as total paid, VAT amount, currency and so on. Then I did my first submission of VAT (called Moms in Danish). I made some formulas and got what I wanted, but it was more hassle than I would have preferred. I did it once more, and with the turn of 2024 I knew I would automate this, as every real developer would.</p>
<h2 id="heading-code">Code</h2>
<p>Something that changed is that this year, 2024, I decided to pay in Danish kroner no matter the destination currency (to leverage the perks of credit cards). This removed the previous challenge and made things even simpler. I then wrote the following code to fetch the CSV of my records from Google Sheets and calculate the numbers.</p>
<h3 id="heading-more-fluff-of-mine">More fluff of mine</h3>
<p>Ironically enough, even though I am better versed in English than in Danish for the time being, I find certain explanations easier to understand in Danish. Speaking of which, I have really come to appreciate how great Denmark is at making taxes easy to understand.</p>
<blockquote>
<p>Both the documentation on the website, provided in multiple languages, and the helpful phone line are what make me pay taxes with a smile. Seriously good job by SKAT - that's one of the places where Denmark really stands out.</p>
</blockquote>
<h3 id="heading-finally-the-code">Finally the code</h3>
<p>So all in all, this is the script I came up with.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { parse } <span class="hljs-keyword">from</span> <span class="hljs-string">'csv-parse/sync'</span>;

<span class="hljs-keyword">const</span> { SHEETS_CSV_URL, FROM_DATE, TO_DATE } = Bun.env;

<span class="hljs-keyword">if</span> (!SHEETS_CSV_URL) {
    <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'SHEETS_CSV_URL is required'</span>);
    process.exit(<span class="hljs-number">1</span>);
}
<span class="hljs-keyword">if</span> (!FROM_DATE) {
    <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'FROM_DATE is required'</span>);
    process.exit(<span class="hljs-number">1</span>);
}
<span class="hljs-keyword">if</span> (!TO_DATE) {
    <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'TO_DATE is required'</span>);
    process.exit(<span class="hljs-number">1</span>);
}

<span class="hljs-keyword">const</span> csvString = <span class="hljs-keyword">await</span> (<span class="hljs-keyword">await</span> fetch(SHEETS_CSV_URL)).text();

<span class="hljs-keyword">const</span> records = parse(csvString, {
    cast: <span class="hljs-literal">true</span>,
    cast_date: <span class="hljs-literal">true</span>,
    columns: <span class="hljs-literal">true</span>,
    skip_empty_lines: <span class="hljs-literal">true</span>,
});

<span class="hljs-keyword">type</span> Record = {
    invoiceId: <span class="hljs-built_in">string</span>;
    date: <span class="hljs-built_in">Date</span>;
    name: <span class="hljs-built_in">string</span>;
    <span class="hljs-keyword">type</span>: <span class="hljs-string">'A - Services'</span> | <span class="hljs-string">'A - Goods'</span> | <span class="hljs-string">'B - Services'</span>;
    inEu: <span class="hljs-built_in">boolean</span>;
    inDk: <span class="hljs-built_in">boolean</span>;
    grandTotal: <span class="hljs-built_in">number</span>;
    vatRate: <span class="hljs-built_in">number</span>;
    baseValue: <span class="hljs-built_in">number</span>;
    vatValue: <span class="hljs-built_in">number</span>;
};

<span class="hljs-comment">// filter records from January 1st, 2024 to March 31st, 2024</span>
<span class="hljs-keyword">const</span> filteredRecords = records.filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> date = r.date;
    <span class="hljs-keyword">return</span> date &gt;= <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(FROM_DATE) &amp;&amp; date &lt;= <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(TO_DATE);
});

<span class="hljs-comment">// console.log(filteredRecords);</span>
<span class="hljs-comment">// process.exit(0);</span>

<span class="hljs-comment">// Caclulate</span>
<span class="hljs-comment">// https://skat.dk/en-us/businesses/vat/vat-on-international-trade/reporting-your-international-trade</span>
<span class="hljs-comment">// https://skat.dk/erhverv/moms/moms-ved-handel-med-udlandet/indberet-din-handel-med-udlandet</span>
<span class="hljs-keyword">const</span> tax = {
    <span class="hljs-string">'vat-in-dk'</span>: <span class="hljs-number">0</span>, <span class="hljs-comment">// Moms i DK til Købsmoms (Input VAT) (VAT deductible)</span>

    <span class="hljs-comment">// VAT on goods purchased outside Denmark (both the EU and third countries).</span>
    <span class="hljs-string">'vat-on-goods-purchased-outside-denmark'</span>: <span class="hljs-number">0</span>,
    <span class="hljs-comment">// VAT on services purchased outside Denmark subject to a reverse charge</span>
    <span class="hljs-string">'vat-on-services-purchased-outside-denmark-subject-to-a-reverse-charge'</span>: <span class="hljs-number">0</span>,
    <span class="hljs-comment">// to calculate Købsmoms (Input VAT) (VAT deductible)</span>
    <span class="hljs-string">'vat-on-services-purchased-outside-denmark-outside-eu'</span>: <span class="hljs-number">0</span>,

    <span class="hljs-string">'eu-sales-with-vat'</span>: <span class="hljs-number">0</span>,
    <span class="hljs-string">'eu-sales-without-vat'</span>: <span class="hljs-number">0</span>,

    <span class="hljs-string">'vat-paid'</span>: <span class="hljs-number">0</span>, <span class="hljs-comment">// Købsmoms</span>
    <span class="hljs-string">'vat-collected'</span>: <span class="hljs-number">0</span>, <span class="hljs-comment">// Salgsmoms</span>

    <span class="hljs-comment">// Boxes (base value here excl. VAT)</span>
    <span class="hljs-string">'box-a-services'</span>: <span class="hljs-number">0</span>,
    <span class="hljs-string">'box-a-goods'</span>: <span class="hljs-number">0</span>,
    <span class="hljs-comment">// Not relevant for my business</span>
    <span class="hljs-string">'box-b-services'</span>: <span class="hljs-number">0</span>,
    <span class="hljs-string">'box-b-goods'</span>: <span class="hljs-number">0</span>,
    <span class="hljs-string">'box-c-services'</span>: <span class="hljs-number">0</span>,
};

tax[<span class="hljs-string">'vat-in-dk'</span>] = filteredRecords
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.inDk)
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.type === <span class="hljs-string">'A - Services'</span> || r.type === <span class="hljs-string">'A - Goods'</span>)
    .reduce(<span class="hljs-function">(<span class="hljs-params">acc: <span class="hljs-built_in">number</span>, r: Record</span>) =&gt;</span> acc + r.vatValue, <span class="hljs-number">0</span>);

<span class="hljs-keyword">let</span> vatOf25estimatedForReverseCharge = <span class="hljs-number">0</span>;

<span class="hljs-comment">// Moms af varekøb i udlandet (både EU og lande uden for EU)</span>
<span class="hljs-comment">// VAT on goods purchased outside Denmark (both the EU and third countries).</span>
<span class="hljs-comment">// Enter the VAT payable on services purchased outside Denmark during the VAT period (both EU countries and third countries).</span>
<span class="hljs-comment">// You calculate the VAT as 25% of the invoice value of the services purchased during the period.</span>
tax[<span class="hljs-string">'vat-on-goods-purchased-outside-denmark'</span>] = filteredRecords
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> !r.inDk)
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.type === <span class="hljs-string">'A - Goods'</span>)
    .reduce(<span class="hljs-function">(<span class="hljs-params">acc: <span class="hljs-built_in">number</span>, r: Record</span>) =&gt;</span> {
        <span class="hljs-keyword">let</span> value = r.vatValue;

        <span class="hljs-keyword">if</span> (r.vatRate === <span class="hljs-number">0</span>) {
            value = r.grandTotal * <span class="hljs-number">0.25</span>;
            vatOf25estimatedForReverseCharge += value;
        }

        <span class="hljs-keyword">return</span> acc + value;
    }, <span class="hljs-number">0</span>);

<span class="hljs-comment">// Moms af ydelseskøb i udlandet med omvendt betalingspligt</span>
<span class="hljs-comment">// VAT on services purchased outside Denmark subject to a reverse charge</span>
tax[<span class="hljs-string">'vat-on-services-purchased-outside-denmark-subject-to-a-reverse-charge'</span>] =
    filteredRecords
        .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> !r.inDk &amp;&amp; r.inEu)
        <span class="hljs-comment">// Only report the value of the EU services you purchased in box A - services.</span>
        .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.type === <span class="hljs-string">'A - Services'</span>)
        .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.vatRate === <span class="hljs-number">0</span>)
        .reduce(<span class="hljs-function">(<span class="hljs-params">acc: <span class="hljs-built_in">number</span>, r: Record</span>) =&gt;</span> {
            <span class="hljs-keyword">let</span> value = r.vatValue;

            <span class="hljs-keyword">if</span> (r.vatRate === <span class="hljs-number">0</span>) {
                value = r.grandTotal * <span class="hljs-number">0.25</span>;
                vatOf25estimatedForReverseCharge += value;
            }

            <span class="hljs-keyword">return</span> acc + value;
        }, <span class="hljs-number">0</span>);

tax[<span class="hljs-string">'vat-on-services-purchased-outside-denmark-outside-eu'</span>] = filteredRecords
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> !r.inDk &amp;&amp; !r.inEu)
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.type === <span class="hljs-string">'A - Services'</span>)
    .reduce(<span class="hljs-function">(<span class="hljs-params">acc: <span class="hljs-built_in">number</span>, r: Record</span>) =&gt;</span> {
        <span class="hljs-keyword">let</span> value = r.vatValue;

        <span class="hljs-keyword">if</span> (r.vatRate === <span class="hljs-number">0</span>) {
            value = r.grandTotal * <span class="hljs-number">0.25</span>;
            vatOf25estimatedForReverseCharge += value;
        }

        <span class="hljs-keyword">return</span> acc + value;
    }, <span class="hljs-number">0</span>);

<span class="hljs-comment">// Print the manually added 25% on reverse charge</span>
<span class="hljs-built_in">console</span>.info({ vatOf25estimatedForReverseCharge });

<span class="hljs-comment">// Købsmoms // Input VAT (VAT deductible)</span>
<span class="hljs-comment">// You may include amounts from the following fields:</span>
<span class="hljs-comment">// - VAT on goods purchased outside Denmark.</span>
<span class="hljs-comment">// - VAT on services purchased outside Denmark subject to a reverse charge.</span>
tax[<span class="hljs-string">'vat-paid'</span>] =
    tax[<span class="hljs-string">'vat-in-dk'</span>] +
    tax[<span class="hljs-string">'vat-on-goods-purchased-outside-denmark'</span>] +
    tax[
        <span class="hljs-string">'vat-on-services-purchased-outside-denmark-subject-to-a-reverse-charge'</span>
    ] +
    tax[<span class="hljs-string">'vat-on-services-purchased-outside-denmark-outside-eu'</span>];

<span class="hljs-comment">// Box A - goods</span>
<span class="hljs-comment">// VAT on goods purchased outside Denmark (both the EU and third countries).</span>
<span class="hljs-comment">// You should report the value of your purchase of goods from other EU countries in box A - ‘goods’  on your VAT return.</span>
<span class="hljs-comment">// KISS: Hardware that I bought from the EU but not in Denmark</span>
<span class="hljs-comment">// Rubrik A - varer. Værdien uden moms af varekøb i andre EU-lande - EU-erhvervelser.</span>
tax[<span class="hljs-string">'box-a-goods'</span>] = filteredRecords
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.inEu &amp;&amp; !r.inDk)
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.type === <span class="hljs-string">'A - Goods'</span>)
    .reduce(<span class="hljs-function">(<span class="hljs-params">acc: <span class="hljs-built_in">number</span>, r: Record</span>) =&gt;</span> acc + r.grandTotal, <span class="hljs-number">0</span>);

<span class="hljs-comment">// Rubrik A - ydelser. Værdien uden moms af ydelseskøb i andre EU-lande.</span>
<span class="hljs-comment">// Box A - services</span>
<span class="hljs-comment">// You should report the value of your purchase of services from other EU countries in box A - ‘services’  on your VAT return.</span>
<span class="hljs-comment">// KISS: Services that I bought from the EU but not in Denmark, the base value w/o VAT</span>
tax[<span class="hljs-string">'box-a-services'</span>] = filteredRecords
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.inEu &amp;&amp; !r.inDk)
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.type === <span class="hljs-string">'A - Services'</span>)
    .reduce(<span class="hljs-function">(<span class="hljs-params">acc: <span class="hljs-built_in">number</span>, r: Record</span>) =&gt;</span> acc + r.grandTotal, <span class="hljs-number">0</span>);

<span class="hljs-comment">// Box B - services</span>
<span class="hljs-comment">// The value of certain sales of services exclusive of VAT to other EU countries. To be reported under ‘EU-salg uden moms’ (EU sales exclusive of VAT)</span>
<span class="hljs-comment">// Rubrik B-ydelser. Værdien af visse ydelsessalg uden moms til andre EU-lande. Skal også indberettes til systemet "EU-salg uden moms".</span>
<span class="hljs-comment">// KISS: Services that I sold to the EU without charging VAT (reverse charge) (typically B2B)</span>
tax[<span class="hljs-string">'box-b-services'</span>] = filteredRecords
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.inEu &amp;&amp; !r.inDk)
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.type === <span class="hljs-string">'B - Services'</span>)
    .reduce(<span class="hljs-function">(<span class="hljs-params">acc: <span class="hljs-built_in">number</span>, r: Record</span>) =&gt;</span> acc + r.grandTotal, <span class="hljs-number">0</span>);

<span class="hljs-comment">// EU sales with VAT</span>
<span class="hljs-comment">// The value of certain sales of goods and services exclusive of VAT to other EU countries. To be reported under ‘EU-salg med moms’ (EU sales with VAT)</span>
<span class="hljs-comment">// KISS: Sales of goods and services to the EU where I charged VAT (typically B2C)</span>
tax[<span class="hljs-string">'eu-sales-with-vat'</span>] = filteredRecords
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.inEu &amp;&amp; !r.inDk)
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.type === <span class="hljs-string">'B - Services'</span>)
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.vatRate &gt; <span class="hljs-number">0</span>)
    .reduce(<span class="hljs-function">(<span class="hljs-params">acc: <span class="hljs-built_in">number</span>, r: Record</span>) =&gt;</span> acc + r.grandTotal, <span class="hljs-number">0</span>);

<span class="hljs-comment">// EU sales without VAT</span>
<span class="hljs-comment">// The value of certain sales of goods and services exclusive of VAT to other EU countries. To be reported under ‘EU-salg uden moms’ (EU sales exclusive of VAT)</span>
tax[<span class="hljs-string">'eu-sales-without-vat'</span>] = filteredRecords
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.inEu &amp;&amp; !r.inDk)
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.type === <span class="hljs-string">'B - Services'</span>)
    .filter(<span class="hljs-function">(<span class="hljs-params">r: Record</span>) =&gt;</span> r.vatRate === <span class="hljs-number">0</span>)
    .reduce(<span class="hljs-function">(<span class="hljs-params">acc: <span class="hljs-built_in">number</span>, r: Record</span>) =&gt;</span> acc + r.baseValue, <span class="hljs-number">0</span>);

<span class="hljs-comment">// No sales for me... yet :)</span>
<span class="hljs-comment">// console.log(</span>
<span class="hljs-comment">//     `</span>
<span class="hljs-comment">// Report the value of your sale in two different places in E-tax for businesses:</span>

<span class="hljs-comment">// In your VAT return within the normal deadlines that apply to your business.</span>
<span class="hljs-comment">// Under ’EU-salg uden moms’ (EU sales exclusive of VAT) by the 25th day of each month.</span>

<span class="hljs-comment">//     `.trim(),</span>
<span class="hljs-comment">// );</span>

<span class="hljs-comment">// Round to no decimal places</span>
<span class="hljs-built_in">Object</span>.keys(tax).forEach(<span class="hljs-function">(<span class="hljs-params">key</span>) =&gt;</span> {
    tax[key] = <span class="hljs-built_in">Math</span>.round(tax[key]);
});

<span class="hljs-comment">// Add Danish labels, it is easier to know where to put the numbers in the tax form</span>
<span class="hljs-keyword">const</span> dansk = {
    <span class="hljs-string">'Moms af varekøb i udlandet (både EU og lande uden for EU)'</span>:
        tax[<span class="hljs-string">'vat-on-goods-purchased-outside-denmark'</span>],
    <span class="hljs-string">'Moms af ydelseskøb i udlandet med omvendt betalingspligt'</span>:
        tax[
            <span class="hljs-string">'vat-on-services-purchased-outside-denmark-subject-to-a-reverse-charge'</span>
        ],
    Købsmoms: tax[<span class="hljs-string">'vat-paid'</span>],
    <span class="hljs-string">'Rubrik A - varer'</span>: tax[<span class="hljs-string">'box-a-goods'</span>],
    <span class="hljs-string">'Rubrik A - ydelser'</span>: tax[<span class="hljs-string">'box-a-services'</span>],
    <span class="hljs-string">'Rubrik B - ydelser'</span>: tax[<span class="hljs-string">'box-b-services'</span>],
    <span class="hljs-string">'EU-salg med moms'</span>: tax[<span class="hljs-string">'eu-sales-with-vat'</span>],
    <span class="hljs-string">'EU-salg uden moms'</span>: tax[<span class="hljs-string">'eu-sales-without-vat'</span>],
};

<span class="hljs-built_in">console</span>.dir(dansk);
</code></pre>
<p>I like TypeScript for how easy it is to throw something together and just run the code. I like Bun because it lets me execute TypeScript directly, with no need to transpile first.</p>
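<p>Running it is a single command with no build step. A hypothetical invocation, assuming the script file is named <code>skat.ts</code>:</p>

```shell
# Bun executes TypeScript directly; no tsc/transpile step needed.
bun run skat.ts
```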
<p>This script will print all the numbers that I need to fill out in the SKAT form, and that is so, so nice. If you're doing the same, you're welcome to use it too; just make sure to match the columns.</p>
<h2 id="heading-on-github">On GitHub</h2>
<p>I also put it up in the repo here: <a target="_blank" href="https://github.com/flexchar/bun-scripts/tree/main/skat">https://github.com/flexchar/bun-scripts/tree/main/skat</a>. I haven't figured out the ultimate structure for these seldom-used scripts yet.</p>
<p>PS. If you notice that something is not quite right, please do let me know. At the end of the day, it is a bit of a challenge to interpret the taxation rules. :)</p>
]]></content:encoded></item><item><title><![CDATA[Fine tune Llama 70B using Unsloth, LoRA & Modal as easy as OpenAI ChatGPT]]></title><description><![CDATA[Intro
I came across Modal last summer when I was on a self-inspired mission to run the BLOOM 176B model as an open-source competitor to ChatGPT.
The guys at modal were amazing and after several days of tinkering after my work, I got it up using 6 GPUs s...]]></description><link>https://lukasnotes.dk/fine-tune-llama-70b-using-unsloth-lora-modal-as-easy-as-openai-chatgpt</link><guid isPermaLink="true">https://lukasnotes.dk/fine-tune-llama-70b-using-unsloth-lora-modal-as-easy-as-openai-chatgpt</guid><category><![CDATA[unsloth]]></category><category><![CDATA[modal]]></category><category><![CDATA[finetuning]]></category><category><![CDATA[LLaMa]]></category><category><![CDATA[Llama3]]></category><category><![CDATA[LoRA]]></category><category><![CDATA[openai]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Sun, 26 May 2024 08:15:26 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1716717068224/d3441534-ef70-44f5-a647-87ecf5a4e510.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-intro">Intro</h1>
<p>I came across Modal last summer when I was on a self-inspired mission to run the BLOOM 176B model as an open-source competitor to ChatGPT.</p>
<p>The guys at Modal were amazing, and after several days of tinkering after work, I got it running on 6 GPUs simultaneously. I am not a big fan of Python, but it is essentially the only option for ML computing, and the way Modal lets you run code in the cloud is just lovely.</p>
<blockquote>
<p>Fun fact: one of the founders is from Stockholm. There's a very interesting podcast where he shares his journey and how the internet was back then in Sweden. I don't have the link right now, but it's worth a search.</p>
</blockquote>
<p>Recently I wanted to fine-tune GPT-3.5, but my inputs got flagged. I have a big dataset, so instead of running it through their <em>free</em> moderation API, I felt it was the perfect push to try Unsloth. The guys are absolute geniuses who wrote their own kernels.</p>
<p>By the way, Modal gives everyone $30/month worth of free credits, so you get plenty of room for trial &amp; error.</p>
<h1 id="heading-fine-tunning-llama-3-70b">Fine-tuning Llama 3 70B</h1>
<p>The primary audience for Unsloth is people running their trials on Google Colab. That is something I never got too comfortable with; I like the code on my own laptop in my own <code>git</code> repo.</p>
<p>It was a bit of a challenge to figure out the right libraries and the base Docker image to set up to run the library. Fortunately, it didn't take too long diving into their Discord to come across the right one, and the rest of the code was born.</p>
<p>It took me 10 minutes to fine-tune 4-bit quantized Llama 3 70B using Modal's H100. I wonder how long it would take had I access to Unsloth's premium code supporting multiple GPUs. Anyway, it's a super nice piece of code. As a developer, I love to have as little code as possible. You're welcome to use it.</p>
<h1 id="heading-code">Code</h1>
<p>Remember to read the part below too.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> datetime <span class="hljs-keyword">import</span> datetime
<span class="hljs-keyword">import</span> secrets
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> modal <span class="hljs-keyword">import</span> App, Secret, Image, Volume, gpu
<span class="hljs-keyword">from</span> pathlib <span class="hljs-keyword">import</span> PurePosixPath


HUGGING_FACE_USERNAME = <span class="hljs-string">"name"</span>

app = App(<span class="hljs-string">"unsloth-experiment"</span>)

<span class="hljs-comment"># Volumes for pre-trained models and training runs.</span>
pretrained_volume = Volume.from_name(<span class="hljs-string">"unsloth-pretrained-vol"</span>, create_if_missing=<span class="hljs-literal">True</span>)
runs_volume = Volume.from_name(<span class="hljs-string">"unsloth-runs-vol"</span>, create_if_missing=<span class="hljs-literal">True</span>)
VOLUME_CONFIG: dict[str | PurePosixPath, Volume] = {
    <span class="hljs-string">"/pretrained"</span>: pretrained_volume,
    <span class="hljs-string">"/runs"</span>: runs_volume,
}


image = (
    Image.from_registry(<span class="hljs-string">"pytorch/pytorch:2.2.0-cuda12.1-cudnn8-devel"</span>)
    .apt_install(<span class="hljs-string">"git"</span>)
    .pip_install(
        <span class="hljs-string">"torch==2.2.1"</span>,
    )
    .run_commands(
        <span class="hljs-string">'pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"'</span>,
    )
    .run_commands(
        <span class="hljs-string">"pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes"</span>,
    )
    .pip_install(<span class="hljs-string">"huggingface_hub"</span>, <span class="hljs-string">"hf-transfer"</span>, <span class="hljs-string">"wandb"</span>)
    .env(
        dict(
            HUGGINGFACE_HUB_CACHE=<span class="hljs-string">"/pretrained"</span>,
            HF_HUB_ENABLE_HF_TRANSFER=<span class="hljs-string">"1"</span>,
        )
    )
)


<span class="hljs-meta">@app.function(</span>
    image=image,
    gpu=gpu.H100(),
    volumes=VOLUME_CONFIG,
    timeout=<span class="hljs-number">3600</span> * <span class="hljs-number">1</span>,
    _allow_background_volume_commits=<span class="hljs-literal">True</span>,
    secrets=[Secret.from_name(<span class="hljs-string">"wandb-secret"</span>)],
)
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">train</span>(<span class="hljs-params">data_raw: str, model_name: str, run_name: str, run_folder: str</span>):</span>
    <span class="hljs-keyword">from</span> unsloth <span class="hljs-keyword">import</span> FastLanguageModel, chat_templates
    <span class="hljs-keyword">import</span> torch
    <span class="hljs-keyword">from</span> trl <span class="hljs-keyword">import</span> SFTTrainer
    <span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> TrainingArguments, EarlyStoppingCallback
    <span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset
    <span class="hljs-keyword">import</span> wandb

    <span class="hljs-comment"># Track in WandB</span>
    wandb.init(project=app.name, name=run_name, tags=[<span class="hljs-string">"unsloth"</span>, <span class="hljs-string">"H100"</span>, model_name])

    <span class="hljs-comment"># Write the dataset to a file</span>
    file_path = <span class="hljs-string">"dataset.jsonl"</span>
    <span class="hljs-keyword">with</span> open(file_path, <span class="hljs-string">"w"</span>) <span class="hljs-keyword">as</span> data_file:
        data_file.write(data_raw)

    print(<span class="hljs-string">f"Loading dataset from <span class="hljs-subst">{file_path}</span>..."</span>)
    dataset = load_dataset(<span class="hljs-string">"json"</span>, data_files={<span class="hljs-string">"train"</span>: file_path}, split=<span class="hljs-string">"train"</span>)

    <span class="hljs-comment"># Print number of examples</span>
    print(<span class="hljs-string">f"Number of examples: <span class="hljs-subst">{len(dataset)}</span>"</span>)

    <span class="hljs-comment"># Load Llama model</span>
    max_seq_length = <span class="hljs-number">4096</span>
    print(<span class="hljs-string">f"Initializing model with max_seq_length=<span class="hljs-subst">{max_seq_length}</span>..."</span>)
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,  <span class="hljs-comment"># Supports RoPE Scaling internally, so choose any!</span>
        dtype=<span class="hljs-literal">None</span>,  <span class="hljs-comment"># None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+</span>
        load_in_4bit=<span class="hljs-literal">True</span>,  <span class="hljs-comment"># Use 4bit quantization to reduce memory usage. Can be False.</span>
    )

    <span class="hljs-comment"># Do model patching and add fast LoRA weights</span>
    model = FastLanguageModel.get_peft_model(
        model,
        r=<span class="hljs-number">16</span>,  <span class="hljs-comment"># Choose any number &gt; 0 ! Suggested 8, 16, 32, 64, 128</span>
        target_modules=[
            <span class="hljs-string">"q_proj"</span>,
            <span class="hljs-string">"k_proj"</span>,
            <span class="hljs-string">"v_proj"</span>,
            <span class="hljs-string">"o_proj"</span>,
            <span class="hljs-string">"gate_proj"</span>,
            <span class="hljs-string">"up_proj"</span>,
            <span class="hljs-string">"down_proj"</span>,
        ],
        lora_alpha=<span class="hljs-number">16</span>,
        lora_dropout=<span class="hljs-number">0</span>,  <span class="hljs-comment"># Supports any, but = 0 is optimized</span>
        bias=<span class="hljs-string">"none"</span>,  <span class="hljs-comment"># Supports any, but = "none" is optimized</span>
        <span class="hljs-comment"># [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!</span>
        use_gradient_checkpointing=<span class="hljs-string">"unsloth"</span>,  <span class="hljs-comment"># True or "unsloth" for very long context</span>
        random_state=<span class="hljs-number">3407</span>,
        max_seq_length=max_seq_length,
        use_rslora=<span class="hljs-literal">False</span>,  <span class="hljs-comment"># We support rank stabilized LoRA</span>
        loftq_config=<span class="hljs-literal">None</span>,  <span class="hljs-comment"># And LoftQ</span>
    )

    <span class="hljs-comment"># https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2#scrollTo=LjY75GoYUCB8&amp;line=1&amp;uniqifier=1</span>
    <span class="hljs-comment"># ChatML is the default chat template</span>
    tokenizer = chat_templates.get_chat_template(tokenizer)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">formatting_prompts_func</span>(<span class="hljs-params">entry</span>):</span>
        <span class="hljs-comment"># Example of the payload</span>
        <span class="hljs-comment"># print(entry["messages"])</span>
        convos = entry[<span class="hljs-string">"messages"</span>]
        texts = []
        <span class="hljs-keyword">for</span> convo <span class="hljs-keyword">in</span> convos:
            text = tokenizer.apply_chat_template(
                convo, tokenize=<span class="hljs-literal">False</span>, add_generation_prompt=<span class="hljs-literal">False</span>
            )
            texts.append(text)
        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">"text"</span>: texts,
        }

    print(<span class="hljs-string">f"Mapping prompts..."</span>)
    dataset = dataset.map(formatting_prompts_func, batched=<span class="hljs-literal">True</span>)

    <span class="hljs-comment"># Print number of examples</span>
    print(<span class="hljs-string">f"Number of examples: <span class="hljs-subst">{len(dataset)}</span>"</span>)

    trainer = SFTTrainer(
        model=model,
        train_dataset=dataset,
        dataset_text_field=<span class="hljs-string">"text"</span>,
        max_seq_length=max_seq_length,
        tokenizer=tokenizer,
        args=TrainingArguments(
            per_device_train_batch_size=<span class="hljs-number">2</span>,
            gradient_accumulation_steps=<span class="hljs-number">8</span>,
            warmup_steps=<span class="hljs-number">10</span>,
            max_steps=<span class="hljs-number">30</span>,
            fp16=<span class="hljs-keyword">not</span> torch.cuda.is_bf16_supported(),
            bf16=torch.cuda.is_bf16_supported(),
            logging_steps=<span class="hljs-number">1</span>,
            output_dir=run_folder,
            optim=<span class="hljs-string">"adamw_8bit"</span>,
            seed=<span class="hljs-number">3407</span>,
            run_name=run_name,
            report_to=<span class="hljs-string">"wandb"</span>,
            <span class="hljs-comment"># load_best_model_at_end=True,  # Load the best model in terms of metric at the end of training</span>
            <span class="hljs-comment"># metric_for_best_model="train_loss",  # Specify the metric to use for early stopping</span>
            <span class="hljs-comment"># greater_is_better=False,  # Specify if higher values are better for the specified metric</span>
        ),
        <span class="hljs-comment"># callbacks=[</span>
        <span class="hljs-comment">#     # Stop training if eval loss does not improve for 3 checkpoints</span>
        <span class="hljs-comment">#     EarlyStoppingCallback(3),</span>
        <span class="hljs-comment"># ],</span>
    )

    print(
        <span class="hljs-string">f"Starting training run with <span class="hljs-subst">{torch.cuda.device_count()}</span> <span class="hljs-subst">{torch.cuda.get_device_name()}</span>"</span>
    )
    trainer.train()
    model.save_pretrained(<span class="hljs-string">f"<span class="hljs-subst">{run_folder}</span>/lora_model"</span>)  <span class="hljs-comment"># Local saving</span>

    VOLUME_CONFIG[<span class="hljs-string">"/runs"</span>].commit()

    <span class="hljs-keyword">return</span> run_name


<span class="hljs-meta">@app.function(</span>
    image=image,
    volumes=VOLUME_CONFIG,
    _allow_background_volume_commits=<span class="hljs-literal">True</span>,
    secrets=[Secret.from_name(<span class="hljs-string">"my-huggingface-secret"</span>)],
)
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">online</span>(<span class="hljs-params">data_raw: str, model_name: str, train_suffic: str</span>):</span>
    <span class="hljs-keyword">from</span> huggingface_hub <span class="hljs-keyword">import</span> snapshot_download

    <span class="hljs-comment"># Ensure the base model is downloaded</span>
    <span class="hljs-keyword">try</span>:
        snapshot_download(model_name, local_files_only=<span class="hljs-literal">True</span>)
        print(<span class="hljs-string">f"Volume contains <span class="hljs-subst">{model_name}</span>."</span>)
    <span class="hljs-keyword">except</span> FileNotFoundError:
        print(<span class="hljs-string">f"Downloading <span class="hljs-subst">{model_name}</span> ..."</span>)
        snapshot_download(
            model_name,
            token=os.environ[<span class="hljs-string">"HF_TOKEN"</span>],
        )

        print(<span class="hljs-string">"Committing /pretrained directory (no progress bar) ..."</span>)
        VOLUME_CONFIG[<span class="hljs-string">"/pretrained"</span>].commit()

    <span class="hljs-comment"># Write config and data into a training subfolder.</span>
    time_string = datetime.now().strftime(<span class="hljs-string">"%Y-%m-%d-%H-%M-%S"</span>)
    run_name = <span class="hljs-string">f"unsloth-<span class="hljs-subst">{time_string}</span>"</span>
    <span class="hljs-comment"># if train_suffic is not empty, append it to the run_name</span>
    <span class="hljs-keyword">if</span> train_suffic := train_suffic.strip():
        run_name += <span class="hljs-string">f"-<span class="hljs-subst">{train_suffic}</span>"</span>
    run_folder = <span class="hljs-string">f"/runs/<span class="hljs-subst">{run_name}</span>"</span>
    os.makedirs(run_folder)
    print(<span class="hljs-string">f"Preparing training run in <span class="hljs-subst">{run_folder}</span>."</span>)

    <span class="hljs-comment"># Start training run.</span>
    print(<span class="hljs-string">"Spawning container for training."</span>)
    train_handle = train.spawn(data_raw, model_name, run_name, run_folder)
    <span class="hljs-keyword">with</span> open(<span class="hljs-string">f"<span class="hljs-subst">{run_folder}</span>/logs.txt"</span>, <span class="hljs-string">"w"</span>) <span class="hljs-keyword">as</span> f:
        msg = <span class="hljs-string">f"train: https://modal.com/logs/call/<span class="hljs-subst">{train_handle.object_id}</span>"</span>
        f.write(msg)
        print(msg)
    VOLUME_CONFIG[<span class="hljs-string">"/runs"</span>].commit()

    <span class="hljs-keyword">return</span> run_name, train_handle


<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run_cmd</span>(<span class="hljs-params">cmd: str, run_folder: str</span>):</span>
    <span class="hljs-keyword">import</span> subprocess

    <span class="hljs-comment"># Ensure volumes contain latest files.</span>
    VOLUME_CONFIG[<span class="hljs-string">"/pretrained"</span>].reload()
    VOLUME_CONFIG[<span class="hljs-string">"/runs"</span>].reload()

    <span class="hljs-comment"># Propagate errors from subprocess.</span>
    <span class="hljs-keyword">if</span> exit_code := subprocess.call(cmd.split(), cwd=run_folder):
        exit(exit_code)

    <span class="hljs-comment"># Commit writes to volume.</span>
    VOLUME_CONFIG[<span class="hljs-string">"/runs"</span>].commit()


<span class="hljs-meta">@app.function(</span>
    image=image,
    volumes=VOLUME_CONFIG,
    secrets=[Secret.from_name(<span class="hljs-string">"huggingface-secret-write"</span>)],
)
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">upload_to_hf</span>(<span class="hljs-params">run_folder: str, name: str</span>):</span>

    hf_token = os.environ[<span class="hljs-string">"HF_TOKEN"</span>]

    CMD = <span class="hljs-string">f"huggingface-cli upload --token <span class="hljs-subst">{hf_token}</span> --private <span class="hljs-subst">{HUGGING_FACE_USERNAME}</span>/<span class="hljs-subst">{name}</span> lora_model"</span>
    <span class="hljs-comment"># To Hugging Face Hub</span>
    run_cmd(CMD, run_folder)

    <span class="hljs-comment"># ToDo: upload to S3 for Predibase</span>


<span class="hljs-comment"># ChatML dataset</span>
<span class="hljs-meta">@app.local_entrypoint()</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>(<span class="hljs-params">
    data: str,
    train_suffic: str,
    model_name: str = <span class="hljs-string">"unsloth/llama-3-70b-bnb-4bit"</span>,
</span>):</span>
    <span class="hljs-comment"># Read data file and pass contents to the remote function.</span>
    <span class="hljs-keyword">with</span> open(data, <span class="hljs-string">"r"</span>) <span class="hljs-keyword">as</span> dat:
        run_name, train_handle = online.remote(dat.read(), model_name, train_suffic)

    <span class="hljs-comment"># Write a local reference to the run's location on the remote volume.</span>
    <span class="hljs-keyword">with</span> open(<span class="hljs-string">".last_run_name"</span>, <span class="hljs-string">"w"</span>) <span class="hljs-keyword">as</span> f:
        f.write(run_name)

    <span class="hljs-comment"># Wait for the training run to finish.</span>
    train_handle.get()
    print(<span class="hljs-string">f"Training complete. Run tag: <span class="hljs-subst">{run_name}</span>"</span>)
    print(
        <span class="hljs-string">f"To inspect weights, run `modal volume ls unsloth-runs-vol <span class="hljs-subst">{run_name}</span>/lora_model`"</span>
    )

    print(<span class="hljs-string">"Uploading to Hugging Face Hub..."</span>)
    run_folder = <span class="hljs-string">f"/runs/<span class="hljs-subst">{run_name}</span>"</span>
    upload_to_hf.remote(run_folder, run_name)
</code></pre>
<p>To use, you will need to set:</p>
<ul>
<li><p><code>huggingface-secret-write</code> to upload the adapter to Hugging Face, or delete that part of the code.</p>
</li>
<li><p><code>wandb-secret</code> to track the fine-tuning progress in Weights &amp; Biases, or delete that code, which I would highly advise against because you really do want to know how the training is doing.</p>
</li>
</ul>
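<p>If I remember right, these secrets can be created either in the Modal dashboard or via the CLI, roughly like this (the token values below are placeholders, not real credentials):</p>

```shell
# Hypothetical values; substitute your own tokens.
modal secret create huggingface-secret-write HF_TOKEN=hf_xxx
modal secret create wandb-secret WANDB_API_KEY=xxx
```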
<p>Finally, prepare a dataset in the OpenAI chat format and launch training using the following command: <code>modal run --detach learn.py --data dataset.jsonl --train-suffic cph</code></p>
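<p>For reference, a line in an OpenAI-style chat dataset is one JSON object per line in the <code>.jsonl</code> file, roughly like this (the exact fields depend on your formatting function):</p>

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of Denmark?"}, {"role": "assistant", "content": "Copenhagen."}]}
```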
<p>Keep an eye on the training and eval loss: overfitting and overtraining yield no value, and I haven't had a chance to implement <code>EarlyStopping</code> yet.</p>
<h1 id="heading-bonus-tips">Bonus Tips</h1>
<p>For a good fine-tuning:</p>
<ul>
<li><p>isolate a single problem to solve; no "do it all" type of stuff;</p>
</li>
<li><p>start with the smallest model and then go up;</p>
</li>
<li><p>Mistral 7B and Llama 8B are some of the state-of-the-art choices available for free;</p>
</li>
<li><p>if you need chat, stick to the ChatML template - it will save you a lot of trouble down the road;</p>
</li>
<li><p>prefer a base model (not instruct) to have as fresh a starting point as possible; or, if picking a community-tuned model such as Eric Hartford's, make sure you don't mix chat templates (given you train a chat model);</p>
</li>
<li><p>the quality of the dataset is everything: a few shitty examples would ruin a model as powerful as GPT-4, while only high-quality examples can make small models outperform GPT-4 (search for the Predibase adapters case study on this);</p>
</li>
</ul>
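<p>For reference, the ChatML template mentioned above wraps every turn in <code>&lt;|im_start|&gt;</code> and <code>&lt;|im_end|&gt;</code> markers, so a formatted training example looks like this:</p>

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is the capital of Denmark?<|im_end|>
<|im_start|>assistant
Copenhagen.<|im_end|>
```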
<h1 id="heading-enjoy">Enjoy!</h1>
<p>If you end up training something fun, let me know!</p>
]]></content:encoded></item><item><title><![CDATA[MySQL 8.4.0 version kicked me below the waist line]]></title><description><![CDATA[TL DR
If you're coming from google because you saw unknown variable 'default-auth=mysql_native_password'. - all you need to do is to change --default-authentication-plugin=mysql_native_password to --mysql-native-password=ON.
Story
I recently did a de...]]></description><link>https://lukasnotes.dk/mysql-840-version-kicked-me-below-the-waist-line</link><guid isPermaLink="true">https://lukasnotes.dk/mysql-840-version-kicked-me-below-the-waist-line</guid><category><![CDATA[MySQL]]></category><category><![CDATA[Docker]]></category><category><![CDATA[versioning]]></category><category><![CDATA[debugging]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Thu, 02 May 2024 13:58:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/Y9kOsyoWyaU/upload/669d827a6eac119c58213326ca0a365c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-tl-dr">TL DR</h2>
<p>If you're coming from Google because you saw <code>unknown variable 'default-auth=mysql_native_password'</code>, all you need to do is change <code>--default-authentication-plugin=mysql_native_password</code> to <code>--mysql-native-password=ON</code>.</p>
<h2 id="heading-story">Story</h2>
<p>I recently did a deployment where a Docker Compose project had its own MySQL, pinned only to the major version <code>mysql:8</code>. The config snippet is below.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">mysql:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">mysql:8</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>
    <span class="hljs-attr">command:</span> <span class="hljs-string">--default-authentication-plugin=mysql_native_password</span>
    <span class="hljs-attr">volumes:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">db_data:/var/lib/mysql</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">db_socket:/var/run/mysqld</span>
    <span class="hljs-attr">environment:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">MYSQL_DATABASE=xxxxx</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">MYSQL_ROOT_PASSWORD=xxxxx</span>
    <span class="hljs-attr">healthcheck:</span>
        <span class="hljs-attr">test:</span> [<span class="hljs-string">'CMD'</span>, <span class="hljs-string">'mysqladmin'</span>, <span class="hljs-string">'status'</span>, <span class="hljs-string">'-hlocalhost'</span>, <span class="hljs-string">'-uroot'</span>, <span class="hljs-string">'-pxxxxxx'</span>]
        <span class="hljs-attr">interval:</span> <span class="hljs-string">1m</span>
        <span class="hljs-attr">timeout:</span> <span class="hljs-string">5s</span>
        <span class="hljs-attr">retries:</span> <span class="hljs-number">3</span>
        <span class="hljs-attr">start_period:</span> <span class="hljs-string">1m</span>
</code></pre>
<p>One could immediately point out that it's a huge mistake to pin only by a major version. For a truly production project, I would almost never host my own database - aka, I would always use a managed one.</p>
<p>Anyhow, this one slipped through and caused completely unnecessary downtime.</p>
<h2 id="heading-debugging">Debugging</h2>
<p>After looking through the logs I could see the <code>mysql 8.4.0 Plugin 'mysql_native_password' is not loaded</code> and <code>unknown variable 'default-auth=mysql_native_password'</code> lines, and that the server was shutting down.</p>
<p>Almighty Google!</p>
<p>That means I started <em>surfing</em>. Since the version had been released just two days earlier (on April 30th), there weren't any immediately relevant results, as deemed by my brain.</p>
<p>Based on that, I searched for the changelog/release notes. I landed on <a target="_blank" href="https://dev.mysql.com/doc/relnotes/mysql/8.4/en/news-8-4-0.html">https://dev.mysql.com/doc/relnotes/mysql/8.4/en/news-8-4-0.html</a> and did Cmd + F (search in page) to see if there was anything regarding the plugin.</p>
<p>Boom, a lucky hit under the "Deprecation and Removal Notes" section:</p>
<blockquote>
<p><strong><em>Important Change:</em></strong> The deprecated <code>mysql_native_password</code> authentication plugin is now disabled by default. It can be enabled by starting MySQL with the new <a target="_blank" href="https://dev.mysql.com/doc/refman/8.4/en/server-options.html#option_mysqld_mysql-native-password"><code>--mysql-native-password=ON</code></a> server option, or by adding <code>mysql_native_password=ON</code> to the <code>[mysqld]</code> section of your MySQL configuration file.</p>
</blockquote>
<p>That led me to update the compose config and fix the downtime.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">mysql:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">mysql:8.4</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">always</span>
    <span class="hljs-attr">command:</span> <span class="hljs-string">--mysql-native-password=ON</span>
    <span class="hljs-attr">volumes:</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">db_data:/var/lib/mysql</span>
        <span class="hljs-bullet">-</span> <span class="hljs-string">db_socket:/var/run/mysqld</span>
    <span class="hljs-attr">environment:</span>
    <span class="hljs-comment">#....</span>
</code></pre>
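<p>For completeness, per the release notes quoted above, the same can be done via the MySQL configuration file instead of a command-line flag:</p>

```ini
# my.cnf - re-enable the deprecated mysql_native_password plugin
[mysqld]
mysql_native_password=ON
```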
<p>That's it. I hope that by writing this I won't land on this mistake again. Restoring from backup is such a boring thing to do.</p>
<p>Or maybe it's a sign to stop procrastinating the migration to <code>pg</code>... it would be lovely to leverage Cloudflare's Hyperdrive...</p>
]]></content:encoded></item><item><title><![CDATA[How to get JSON from video? Aka, transcribe any video file on your computer]]></title><description><![CDATA[History
Two weeks ago I found myself sharing my side project where I transcribe a whole dozen of videos into a list of JSON files containing word-level transcripts. Everyone who has been into the AI news since end of 2022 will probably know about Whi...]]></description><link>https://lukasnotes.dk/how-to-get-json-from-video-aka-transcribe-any-video-file-on-your-computer</link><guid isPermaLink="true">https://lukasnotes.dk/how-to-get-json-from-video-aka-transcribe-any-video-file-on-your-computer</guid><category><![CDATA[replicate]]></category><category><![CDATA[whisper]]></category><category><![CDATA[cli]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Fri, 26 Apr 2024 14:14:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1714140719020/f19ee754-841e-4176-a4c7-5bc07e2ab763.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1714138882928/5b099277-0fdf-46bd-a05c-9b873f4f3cf4.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-history">History</h2>
<p>Two weeks ago I found myself sharing my side project where I transcribe a whole dozen of videos into a list of JSON files containing word-level transcripts. Everyone who has been into the AI news since end of 2022 will probably know about Whisper. It's one of the wildest value drops for everyone.</p>
<h4 id="heading-solving-gpu-for-whisper-runtime">Solving GPU (for Whisper) runtime</h4>
<p>However, running the Whisper model locally can get quite demanding - slow or outright impossible depending on the hardware. I am a big fan of high accuracy, high value and high speed. That means I want the large model without the low speed.</p>
<p>Initially I used Modal.com for running my Python code. It worked well, but I didn't want to maintain Python; I wanted to keep things as simple as possible. In winter I came across Replicate.com (there are at least fifty hosted providers like that; I'm trying to keep my bookmarks updated). It offers <a target="_blank" href="https://replicate.com/turian/insanely-fast-whisper-with-video">turian/insanely-fast-whisper-with-video</a>, which in short is a much faster Whisper runtime - OpenAI tends to provide working but not optimised code.</p>
<h4 id="heading-solving-file-hosting">Solving file hosting</h4>
<p>It didn't take long to try it using <code>cURL</code> and <code>jq</code> to get the transcript. But there was one problem: many of the files are on my laptop, such as a meeting I'd just had at Flügger, and they need to be available on some URL. There are data-protection concerns, and again, I wanted to keep it as simple as possible. One thing that we all love and use is <code>S3</code> storage. Lucky me, I noticed that Cloudflare's R2 (their S3 offering) had finally added the support I needed.</p>
<p>So I created a very convenient bucket called <code>tmp</code> and a few rules to delete files after one day if the path prefix matches <code>s3://tmp/1d/</code>. Boom. Now I can use the CLI to <em>copy-paste</em> a file into a private bucket that cleans itself up, and get a signed URL that is publicly reachable yet effectively private.</p>
<h4 id="heading-reusable-script">Reusable script</h4>
<p>My terminal, Warp.dev, popped up with a new notebook announcement after updating itself. I must have gotten lucky again. Instead of writing a shell script, I could now create a Python-like notebook where I can add comments and remind my future self what the heck I'm doing here. Sometimes it's scary how quickly I can forget certain commands and how they work if I haven't thought about them for a few weeks.</p>
<h2 id="heading-the-code">The code</h2>
<p>You may download this code as a Warp notebook from here:<br /><a target="_blank" href="https://gist.github.com/flexchar/0a9c6ecf0c1b9c2592e45af78c30cb23">https://gist.github.com/flexchar/0a9c6ecf0c1b9c2592e45af78c30cb23</a></p>
<p>or right in your holy shell</p>
<pre><code class="lang-bash">wget https://gist.githubusercontent.com/flexchar/0a9c6ecf0c1b9c2592e45af78c30cb23/raw -O video-to-json.md
</code></pre>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ul>
<li><p>This script uses <code>s5cmd</code> as an s3 client which can be installed using <code>brew install s5cmd</code>, or equivalent on other systems.</p>
</li>
<li><p>This script assumes you have an <code>r2</code> alias set as follows:</p>
</li>
</ul>
<pre><code class="lang-bash">r2=<span class="hljs-string">'s5cmd --credentials-file ~/.config/s3/r2-config.cfg --endpoint-url https://{your bucket id}.r2.cloudflarestorage.com'</span>
</code></pre>
<ul>
<li>Replicate token set as an environment variable</li>
</ul>
<pre><code class="lang-bash"><span class="hljs-built_in">export</span> REPLICATE_API_TOKEN=
</code></pre>
<h3 id="heading-process">Process</h3>
<ol>
<li>Upload the video to the S3-compatible Cloudflare R2 bucket. This gives us a web-accessible link so we don't have to kill ourselves uploading the file directly. The bucket's files are not publicly browsable, so they're still private.</li>
</ol>
<pre><code class="lang-bash">S3_PATH=<span class="hljs-string">"s3://tmp/7d/{{destination_name}}"</span>
r2 cp --sp {{source_path}} <span class="hljs-variable">$S3_PATH</span>
</code></pre>
<p>A couple of notes: our <code>tmp</code> bucket is configured with two rules that delete files under the <code>1d</code> and <code>7d</code> prefixes after one and seven days respectively - the rule matches the prefix's duration. As the bucket name implies, it shouldn't be used for anything that should be persistent: each file is wiped once its prefix's lifetime passes. Please see the Object lifecycle rules section in your bucket settings.</p>
<ol start="2">
<li>After uploading, get a publicly accessible link - that is, presign the URL.</li>
</ol>
<pre><code class="lang-bash">SIGNED_URL=$(r2 presign --expire 1h <span class="hljs-variable">$S3_PATH</span>)
<span class="hljs-built_in">echo</span> <span class="hljs-variable">$SIGNED_URL</span>
</code></pre>
<ol start="3">
<li>Submit to Replicate</li>
</ol>
<pre><code class="lang-bash">REPLICATE_RES=$(curl --location <span class="hljs-string">'https://api.replicate.com/v1/predictions'</span> \
--header <span class="hljs-string">'Content-Type: application/json'</span> \
--header <span class="hljs-string">"Authorization: Token <span class="hljs-variable">$REPLICATE_API_TOKEN</span>"</span> \
--data @- &lt;&lt;EOF
{
    <span class="hljs-string">"version"</span>: <span class="hljs-string">"4f41e90243af171da918f04da3e526b2c247065583ea9b757f2071f573965408"</span>,
    <span class="hljs-string">"input"</span>: {
        <span class="hljs-string">"url"</span>: <span class="hljs-string">"<span class="hljs-variable">$SIGNED_URL</span>"</span>,
        <span class="hljs-string">"task"</span>: <span class="hljs-string">"transcribe"</span>,
        <span class="hljs-string">"timestamp"</span>: <span class="hljs-string">"chunk"</span>,
        <span class="hljs-string">"batch_size"</span>: 64,
        <span class="hljs-string">"language"</span>: <span class="hljs-string">"en"</span>
    }
}
EOF
)
<span class="hljs-built_in">echo</span> <span class="hljs-variable">$REPLICATE_RES</span> | jq
WHISPER_ID=$(<span class="hljs-built_in">echo</span> <span class="hljs-variable">$REPLICATE_RES</span> | jq -r <span class="hljs-string">'.id'</span>)
<span class="hljs-built_in">echo</span> <span class="hljs-variable">$WHISPER_ID</span>
</code></pre>
<p>This is a cold-started ML endpoint. It may take up to 3 minutes until your transcript is ready.</p>
<blockquote>
<p>You could of course add a loop that calls the endpoint and checks the status every 5 seconds - just make sure not to DDoS them.</p>
</blockquote>
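<p>Such a polling loop could look roughly like this - a sketch that assumes the same <code>$WHISPER_ID</code> and <code>$REPLICATE_API_TOKEN</code> variables as above:</p>

```shell
# Hypothetical helper: poll a Replicate prediction until it reaches a
# terminal state (succeeded/failed/canceled), then print the final status.
poll_prediction() {
    local id="$1" status=""
    while true; do
        status=$(curl -s "https://api.replicate.com/v1/predictions/$id" \
            --header "Authorization: Token $REPLICATE_API_TOKEN" | jq -r '.status')
        echo "status: $status" >&2
        case "$status" in
            succeeded|failed|canceled) break ;;
        esac
        sleep 5
    done
    echo "$status"
}

# poll_prediction "$WHISPER_ID"
```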
<ol start="4">
<li>Get the results using the id in the response</li>
</ol>
<pre><code class="lang-bash">OUTPUT_FILE=<span class="hljs-string">"{{output_filename}}.json"</span>
<span class="hljs-built_in">echo</span> <span class="hljs-variable">$OUTPUT_FILE</span>
curl --location <span class="hljs-string">"https://api.replicate.com/v1/predictions/<span class="hljs-variable">$WHISPER_ID</span>"</span> \
--header <span class="hljs-string">"Authorization: Token <span class="hljs-variable">$REPLICATE_API_TOKEN</span>"</span> | jq &gt; <span class="hljs-variable">$OUTPUT_FILE</span>
</code></pre>
<ol start="5">
<li>Get the complete transcript using <code>jq</code></li>
</ol>
<pre><code class="lang-bash">jq <span class="hljs-string">'.output.text'</span> <span class="hljs-variable">$OUTPUT_FILE</span>
</code></pre>
<h3 id="heading-bonus-use-ai-to-summarize">Bonus: Use AI to summarize</h3>
<p>This is an example of how we could use AI to extract key points. I am currently using the <a target="_blank" href="https://github.com/danielmiessler/fabric">Fabric by Daniel Miessler</a> repository. In short, it provides a set of convenience prompts to submit together with the input text, which can be piped in without leaving the terminal.</p>
<p>One of my favorites is <code>extract_wisdom</code>, which was designed for YouTube videos.</p>
<pre><code class="lang-bash">jq <span class="hljs-string">'.output.text'</span> <span class="hljs-variable">$OUTPUT_FILE</span> | fabric -p extract_wisdom -s
</code></pre>
]]></content:encoded></item><item><title><![CDATA[Gemini 1.5 Pro can see, hear & read yet it still sucks at OCR]]></title><description><![CDATA[Preface
As Google finally published API access Gemini 1.5 Pro on Vertex AI, I was very eager to get back to the task I've been struggling* since 2021 Christmas. Where I set out to extract messages from un-ideal screenshots of iMessage, Tinder, Instag...]]></description><link>https://lukasnotes.dk/gemini-15-pro-can-see-hear-read-yet-it-still-sucks-at-ocr</link><guid isPermaLink="true">https://lukasnotes.dk/gemini-15-pro-can-see-hear-read-yet-it-still-sucks-at-ocr</guid><category><![CDATA[AI]]></category><category><![CDATA[challenge]]></category><category><![CDATA[gemini]]></category><category><![CDATA[OCR ]]></category><category><![CDATA[Vision]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Mon, 15 Apr 2024 10:11:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/U2eUlPEKIgU/upload/542abdff1db8058247161adb7d6dff2a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-preface">Preface</h2>
<p>As Google finally published API access to Gemini 1.5 Pro on Vertex AI, I was very eager to get back to the task I've been struggling with* since Christmas 2021, where I set out to extract messages from less-than-ideal screenshots of iMessage, Tinder, Instagram, etc.</p>
<p><em>*Because it's my side hobby project, I haven't spent actual years working on it. But to be fair, I did put in a lot of time since then testing various ideas.</em></p>
<p>While I mainly focused on Gemini, I also ran parallel prompts to the GPT-4 Turbo and Claude 3 Opus models to evaluate. The observations apply to them all.</p>
<h3 id="heading-the-problem">The problem</h3>
<p>...is that the screenshots are very random. They can contain many unwanted elements (the name of a mobile provider, UI elements), may not be perfectly cropped, and the design may have changed over time, including or excluding extra details.</p>
<blockquote>
<p>This is exactly where powerful LLMs such as Gemini 1.5 Pro bring a big promise and thus big expectations on processing general unstructured information.</p>
</blockquote>
<h3 id="heading-goal">Goal</h3>
<p>To successfully complete this job, the output, at a bare minimum, needs to contain:</p>
<ul>
<li><p>the message body including emojis,</p>
</li>
<li><p>the correct sender (either "us" or the opponent),</p>
</li>
<li><p>ignore any other text from UI elements - only from the actual messages of the conversation.</p>
</li>
</ul>
<h2 id="heading-my-observation-notes">My observation notes</h2>
<p>Saturday, April 13. I spent the last four hours integrating and testing Gemini 1.5 Pro to run OCR on screenshots. The approach was to have a detailed prompt on how to extract messages and run one request per image. So no matter how many screenshots (that is, how many tokens) a conversation may contain, it should work one by one.</p>
<p>I spent a lot of time testing, attempting various angles, and feeling really frustrated because it was not successful, and as Titanic once did, the time is gone.</p>
<h3 id="heading-alignment">Alignment</h3>
<p>My first approach was to tell it whether the message is aligned to the left or to the right (padded on the right or on the left, respectively). This is how we intuitively tell who sent it. Even if I use black &amp; white filter on my phone, which I love.</p>
<p>One thing that was not immediately obvious, but which I found out through debugging, is that I had to explicitly tell it <em>to judge from the viewer's perspective</em>. Otherwise it would flip/mirror the sides too often, even with the temperature set to 0.</p>
<p>This proved to work rather great-ish. 7.5 out of 10 times.</p>
<p>The model is consistently inconsistent: it fails to achieve any stable results. This is especially common when messages vary in length but are still short enough to fit on a single line in the UI. Mid-flow, it tends to shift whether it thinks a message appears on the right or the left, resulting in poor parsing quality.</p>
<p>OpenAI has mentioned that their vision model is struggling with spatial reasoning. It seems to me that no matter the provider - they all are struggling as a team.</p>
<h3 id="heading-color">Color</h3>
<p>In the context of an iMessage screenshot, I thought of asking about the color of the message bubble, since the opponent will always appear in gray. However, when it fails to detect the alignment, it will also hallucinate the color - replying that it is gray even though the message appears on the right in a blue bubble.</p>
<p>I also noticed that in some requests, it will output colors that don't exist or are turned into black and white. For example, with Google Gemini, sometimes it will say the message appears in a white bubble on a green app background, which never happened in any image. Super strange.</p>
<p>For the remaining 4/10 times, it would think everything is gray. So either the model is hallucinating, or Google is performing some kind of unclear pre-processing.</p>
<p>Furthermore, once in a while I observed the model spitting out complete bullshit, mentioning colors that don't appear at all, or making it seem as if the image had been further preprocessed and turned monochrome. If any of the providers are doing this, failing to communicate it is a very unprofessional move.</p>
<p>The problem with background color is that emojis sent in certain cases are shown as tiny images with no background color at all. A hard rule like that would not work.</p>
<h3 id="heading-emojis">Emojis</h3>
<p>Let's talk about 'em, the emojis. The model is very good at parsing text but only 50-50 at extracting and parsing emojis. The ones it outputs are often not the ones that actually appear, and they often get missed out entirely too.</p>
<p>To my surprise - and perhaps to yours too - no company has ever worked on extracting emojis except for The Hive AI. They have the best models I've seen hands-on. However, they struggle with "I"s: these get missed a lot or become a vertical bar symbol instead. Crazy.</p>
<h3 id="heading-mirroring-issue">Mirroring issue</h3>
<p>Another one worth describing is the mirroring issue: the model tends to flip sides in the middle of processing an image.</p>
<p>If a short message from the opponent is followed by a <em>long</em> message from "us", it will assign "our" message to the opponent and will keep putting the wrong sender for the rest of the parse.</p>
<p>Were it to do that consistently, all I'd need to do is flip it back. But no, it happens in the middle, haha.</p>
<h3 id="heading-google-ai-studio-playground">Google AI Studio Playground</h3>
<p>Furthermore, using Google AI Studio, I discovered that any subsequent requests in the chat that provided an image in the first request are not actually processing the image again.</p>
<p>I found this while attempting to debug and ask the model why it thinks a message is not blue and not aligned to the right when it is. It would spit out an answer based on the image description and assumptions about how messages would typically appear. It's quite obvious that all it gets is a description of the once-processed image.</p>
<p>This is rather frustrating, and I would blame Google for this. I've never seen this communicated, and if it is, it's definitely not clearly communicated.</p>
<p>I do read through documentation, and I'd like to think I only miss out on a minority of details. In other words, I tend to catch the majority of details.</p>
<p>On the other hand, I was cheating too - using a VPN to access the otherwise geo-blocked playground (hence me waiting for Vertex AI). I guess that means we're even: 1 - 1.</p>
<h3 id="heading-not-ready-for-now">Not ready for now</h3>
<p>For now I conclude that although it is promising and exciting to parse a lot of information from screenshots that are otherwise not text-searchable, it is not implementable without a human in the loop and an interface for review. At that point, it would be only marginally faster than a human alone.</p>
<p>To close with the opening,</p>
<blockquote>
<p>powerful LLMs bring a big promise and set high expectations on processing general unstructured information but they fail to execute it consistently.</p>
</blockquote>
<p>On the bright side, this is how I build my knowledge. Much more fun than reading books.</p>
<p>That being said, Elon Musk introduced Grok 1.5 with vision capabilities. I would bet this model should be good if he wants to use it for self-driving Teslas.</p>
<h2 id="heading-tips-for-working-with-llms">Tips for working with LLMs</h2>
<p>A few things that I found very useful and you should know too.</p>
<ul>
<li><p>Debug prompts by asking the model to explain what it sees, be it text or an image. It sees things differently than we do, so make no assumptions.</p>
</li>
<li><p>Run parallel queries to all providers and make sure the hyperparameters are the same. This way you can compare outputs and get ideas.</p>
</li>
<li><p>Ask the model to state the variables you want it to apply the logic to. For example, if the rule is "the message is sent by us if it's in a blue bubble", ask what color the bubble is.</p>
</li>
<li><p>Save prompt versions and take breaks. It's easy to get lost in a rabbit hole and deviate from the main approach. Taking a break can give you fresh eyes and fresh ideas. A prescribed 15-minute walk can do wonders.</p>
</li>
</ul>
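<p>The parallel-query tip can be sketched with a plain thread pool - here the provider callables are hypothetical placeholders for whatever SDK calls you actually use, with identical hyperparameters passed to each:</p>

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(providers: dict, prompt: str, **params) -> dict:
    """Send the same prompt with identical hyperparameters to every provider.

    `providers` maps a name to a callable taking (prompt, **params);
    each callable stands in for a real SDK call.
    """
    with ThreadPoolExecutor(max_workers=max(len(providers), 1)) as pool:
        futures = {
            name: pool.submit(fn, prompt, **params)
            for name, fn in providers.items()
        }
        # Collect answers side by side for easy comparison.
        return {name: future.result() for name, future in futures.items()}
```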
<p>That is it, my dear reader.<br />If you have any ideas, I would love to hear them.</p>
]]></content:encoded></item><item><title><![CDATA[Accidentally deleting Laravel database when running tests? Here is what to do instead]]></title><description><![CDATA[Laravel 11
Since Laravel 11 removed `tests/CreatesApplication.php`  file I had to dig up a new solution. Sadly, there isn't the perfect pretty one. I had to overwrite the refreshApplication method which comes from the BaseTestCase class.
protected fu...]]></description><link>https://lukasnotes.dk/accidentally-deleting-laravel-database-when-running-tests-here-is-what-to-do-instead</link><guid isPermaLink="true">https://lukasnotes.dk/accidentally-deleting-laravel-database-when-running-tests-here-is-what-to-do-instead</guid><category><![CDATA[Laravel]]></category><category><![CDATA[PHPUnit]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Sun, 03 Mar 2024 06:50:06 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-laravel-11">Laravel 11</h2>
<p>Since Laravel 11 removed the <code>tests/CreatesApplication.php</code> file, I had to dig up a new solution. Sadly, there isn't a perfectly pretty one. I had to override the <code>refreshApplication</code> method, which comes from the <code>BaseTestCase</code> class.</p>
<pre><code class="lang-php"><span class="hljs-keyword">protected</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">refreshApplication</span>(<span class="hljs-params"></span>): <span class="hljs-title">void</span>
</span>{
    <span class="hljs-built_in">parent</span>::refreshApplication();

    <span class="hljs-comment">// Check if the application is running unit tests</span>
    <span class="hljs-keyword">if</span> (<span class="hljs-keyword">$this</span>-&gt;app-&gt;runningUnitTests() === <span class="hljs-literal">false</span>) {
        trigger_error(<span class="hljs-string">'Wrong testing environment, phpunit.xml ignored'</span>);
    }
}
</code></pre>
<h2 id="heading-before-laravel-11-original-write-up">Before Laravel 11 (original write up)</h2>
<p>For a long time in the first years of my dev journey, I would accidentally wipe out the local database when running tests. This would happen because I had cached the config for unspecified reasons.</p>
<p>After falling into the same pothole enough times, I figured out a way out.</p>
<p>So far every Laravel project comes with a <a target="_blank" href="https://github.com/laravel/laravel/blob/10.x/phpunit.xml"><code>phpunit.xml</code></a> file. This file contains environment variables that are loaded when tests run. If they are not loaded, that's how we can accidentally delete the database - and it's also how we can save ourselves.</p>
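<p>For reference, the relevant part of <code>phpunit.xml</code> is the <code>&lt;php&gt;</code> section with its <code>&lt;env&gt;</code> entries - the exact values below are illustrative:</p>

```xml
<php>
    <env name="APP_ENV" value="testing"/>
    <env name="DB_CONNECTION" value="sqlite"/>
    <env name="DB_DATABASE" value=":memory:"/>
</php>
```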
<p>In the <code>tests/CreatesApplication.php</code> file we need to check that we're in a testing environment, if not, we trigger a fatal error:</p>
<pre><code class="lang-php"><span class="hljs-meta">&lt;?php</span>

<span class="hljs-keyword">namespace</span> <span class="hljs-title">Tests</span>;

<span class="hljs-keyword">use</span> <span class="hljs-title">Illuminate</span>\<span class="hljs-title">Contracts</span>\<span class="hljs-title">Console</span>\<span class="hljs-title">Kernel</span>;

<span class="hljs-keyword">trait</span> CreatesApplication
{
    <span class="hljs-comment">/**
     * Creates the application.
     *
     * <span class="hljs-doctag">@return</span> \Illuminate\Foundation\Application
     */</span>
    <span class="hljs-keyword">public</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">createApplication</span>(<span class="hljs-params"></span>)
    </span>{
        $app = <span class="hljs-keyword">require</span> <span class="hljs-keyword">__DIR__</span> . <span class="hljs-string">'/../bootstrap/app.php'</span>;

        $app-&gt;make(Kernel::class)-&gt;bootstrap();

        <span class="hljs-keyword">if</span> ($app-&gt;runningUnitTests() === <span class="hljs-literal">false</span>) {
            trigger_error(<span class="hljs-string">'Wrong testing environment, phpunit.xml ignored'</span>);
        }

        <span class="hljs-keyword">return</span> $app;
    }
}
</code></pre>
]]></content:encoded></item><item><title><![CDATA[Migrating large D1 to Turso]]></title><description><![CDATA[The update on May 5, 2024.
Hey readers, you deserve a couple facts before you dive in.

Cloudflare has finally introduced sqlite backup. Issue thread: https://github.com/cloudflare/workers-sdk/issues/3891#issuecomment-1902082359.

Turso is really gre...]]></description><link>https://lukasnotes.dk/migrating-large-d1-to-turso</link><guid isPermaLink="true">https://lukasnotes.dk/migrating-large-d1-to-turso</guid><category><![CDATA[cloudflare]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Sat, 17 Feb 2024 08:04:47 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-the-update-on-may-5-2024">The update on May 5, 2024.</h2>
<p>Hey readers, you deserve a couple of facts before you dive in.</p>
<ul>
<li><p>Cloudflare has finally introduced sqlite backup. Issue thread: <a target="_blank" href="https://github.com/cloudflare/workers-sdk/issues/3891#issuecomment-1902082359">https://github.com/cloudflare/workers-sdk/issues/3891#issuecomment-1902082359</a>.</p>
</li>
<li><p>Turso is a really great piece of software. I had a chat with Gaube and he's a really solid person. I am still using it. I did miss the fact that storage is the sum of all replicas, so I outgrew the free plan quite quickly. But that's completely okay.</p>
</li>
<li><p>I built a container that regularly syncs Turso into an embedded file that can be consumed by local clients. You're welcome to use it as-is: <a target="_blank" href="https://github.com/flexchar/turso-sync">https://github.com/flexchar/turso-sync</a>.</p>
</li>
</ul>
<h2 id="heading-context">Context</h2>
<p>I have a main project in Laravel that has been running strong for several years now, and last spring, with the rise of LLMs, I got into building an extension that integrates into users' browsers. For the sake of simplicity, let's assume it's a context-aware chatbot with users all around the world.</p>
<p>As someone based in the EU, I chose to host my Laravel app in the EU; however, that added nearly a whole second (1s) of latency that is completely unnecessary for the end user's actions. Switching to Cloudflare Workers was heaven (I have yet to write about that in another article). In late autumn I began using D1 from Cloudflare, and I think it's absolutely awesome. <em>I might also happen to be a big fanboy of Cloudflare's services, to address the ambiguity.</em> However, as requests grew and I began integrating some of the data stored on D1 back into Laravel, I faced huge query times.</p>
<p>I leveraged the <a target="_blank" href="https://github.com/renoki-co/l1?tab=readme-ov-file#d1-with-laravel">L1</a> package built by @<a target="_blank" href="https://github.com/rennokki">rennokki</a>. It uses the HTTP API for D1 under the hood. Since in some cases I needed to make several queries, I would sometimes end up waiting as much as 10 seconds. This was a deal breaker.</p>
<p>Then I began getting a few new active users and saw rapid growth in the database size (both in file size and row count). At the time of writing (February 2024), there's a hard limit of 2GB per database, and I was approaching 1GB fast.</p>
<p>I began researching alternatives. I wanted something simple and close to users (at the edge) for performance, but I considered putting everything back on one of the main servers too. To cut to the chase: Turso had been on my mind for several months (ironically, thanks to their <a target="_blank" href="https://blog.turso.tech/incident-2023-12-04-data-leak-and-loss-in-some-free-tier-databases-7cba5bc7">leaked private DBs story</a>), and I finally decided to move there.</p>
<h2 id="heading-planning-stage">Planning stage</h2>
<p>Upon starting with D1 I also discovered Drizzle, and it didn't take long to fall in love with it. I truly wish Laravel had migrations like Drizzle: just edit the source file and run <code>db:generate</code> to get updated migration files. That's lovely because it's always easy to inspect the table structure from the definition file.</p>
<p>Anyway, thanks to Drizzle and DRY-enough code, there was only one method I had to swap to load the Drizzle client from Turso instead of D1, so that was easy. Now I just had to download the DB, swap the D1 migrations (the <code>npx wrangler d1 migrations apply db</code> thing) for Drizzle's migrations table, and push everything onto Turso.</p>
<p>My first plan was to use the <a target="_blank" href="https://github.com/nora-soderlund/cloudflare-d1-backups">cloudflare-d1-backups</a> package built by Nora, which runs inside a Worker and dumps the whole DB into a <code>backup.sql</code> file on R2 (Cloudflare's S3-equivalent bucket). I had tested it a few weeks earlier and it worked.</p>
<p>I also set up a <code>migrate.ts</code> file to migrate using Drizzle upon new start:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { migrate } <span class="hljs-keyword">from</span> <span class="hljs-string">'drizzle-orm/libsql/migrator'</span>;
<span class="hljs-keyword">import</span> { createClient } <span class="hljs-keyword">from</span> <span class="hljs-string">'@libsql/client'</span>;
<span class="hljs-keyword">import</span> { drizzle } <span class="hljs-keyword">from</span> <span class="hljs-string">'drizzle-orm/libsql'</span>;

<span class="hljs-keyword">const</span> client = createClient({
    url: <span class="hljs-string">'http://127.0.0.1:8080'</span>,
});

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> db = drizzle(client);

migrate(db, { migrationsFolder: <span class="hljs-string">'migrations'</span> })
    .then(<span class="hljs-function">() =&gt;</span> {
        <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Migrations completed!'</span>);
        process.exit(<span class="hljs-number">0</span>);
    })
    .catch(<span class="hljs-function">(<span class="hljs-params">err</span>) =&gt;</span> {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'Migrations failed!'</span>, err);
        process.exit(<span class="hljs-number">1</span>);
    });
</code></pre>
<blockquote>
<p>PS. I really love Bun because of its native TypeScript support plus several APIs that make it a breeze to write scripts in TS compared to the classic Node.js way.</p>
</blockquote>
<h2 id="heading-the-reality-amp-the-struggle">The reality &amp; the struggle</h2>
<p>It was probably a Thursday evening a week ago that I coded up the download route and quickly figured out it wasn't going to be painless. I hit all the <a target="_blank" href="https://developers.cloudflare.com/workers/platform/limits/#account-plan-limits">limits</a> of Cloudflare Workers: the memory limit (a hard 128MB), the execution limit (a somewhat mysterious ~30s), too many sub-requests (each call to D1 counts as one), and many unknown errors that are extremely hard to debug since no info is returned. I really hope CF will improve on this. To be fair, D1 is still in <code>beta</code>.</p>
<p>I opened a <a target="_blank" href="https://github.com/nora-soderlund/cloudflare-d1-backups/issues/9">github issue</a> on Nora's tool hoping for help and kept experimenting after work over the following days. I may not have been at my sharpest and missed a way out, but eventually, after also reaching out on their Discord, I was told about another tool that uses Cloudflare's HTTP API instead of a Worker to do the same thing. It's actually a fork of Nora's backup tool: <a target="_blank" href="https://github.com/Cretezy/cloudflare-d1-backup?tab=readme-ov-file">cloudflare-d1-backup by Cretezy</a>. I tried it before hitting the bed and it worked. It was such a relief. Well, not for long...</p>
<p>I wrote up the plan and called it a day, excitedly going off to rest before a seemingly powerful yet easy migration the next morning.</p>
<p>The plan was this:</p>
<ol>
<li><p>Download D1 using another tool:</p>
<ol>
<li><code>CLOUDFLARE_D1_ACCOUNT_ID=f4960f17-dfc6-4d0c-a4c0-58786243b612 CLOUDFLARE_D1_DATABASE_ID=f4960f17-dfc6-4d0c-a4c0-58786243b612 CLOUDFLARE_D1_API_KEY=f4960f17-dfc6-4d0c-a4c0-58786243b612 npx @cretezy/cloudflare-d1-backup backup.sql</code></li>
</ol>
</li>
<li><p>Remove sequence lines:</p>
<ol>
<li><p>because none of my tables use an auto-incrementing primary key, I would get errors, since the <code>sqlite_sequence</code> table will not exist in Turso. <a target="_blank" href="https://renenyffenegger.ch/notes/development/databases/SQLite/internals/schema-objects/sqlite_sequence">Ref</a>. It exists in D1 because CF's migrations table is there.</p>
</li>
<li><p><code>sed '/DELETE FROM sqlite_sequence;/d; /INSERT INTO "sqlite_sequence"/d' your_backup.sql &gt; cleaned_backup.sql</code></p>
</li>
</ol>
</li>
<li><p>Start fresh sqlite and migrate using drizzle:</p>
<ol>
<li><p>note that I migrate first so that Drizzle's migrations tracking is set up, which would otherwise not be possible after loading the backup.</p>
</li>
<li><p><code>touch data.db &amp;&amp; turso dev --db-file data.db &amp;&amp; bun migrate.ts</code></p>
</li>
</ol>
</li>
<li><p>Load backup:</p>
<ol>
<li><code>sqlite3 data.db &lt; cleaned_backup.sql</code></li>
</ol>
</li>
<li><p>Deploy to Turso:</p>
<ol>
<li><code>turso db create my-db --location wwa --from-file data.db</code></li>
</ol>
</li>
<li><p>Update method to use Turso in my Cloudflare Worker:</p>
</li>
</ol>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getDb</span>(<span class="hljs-params">c: Context</span>): <span class="hljs-title">LibSQLDatabase</span>&lt;<span class="hljs-title">typeof</span> <span class="hljs-title">schema</span>&gt; </span>{
    <span class="hljs-keyword">if</span> (c.get(<span class="hljs-string">'db'</span>)) {
        <span class="hljs-keyword">return</span> c.get(<span class="hljs-string">'db'</span>);
    }

    <span class="hljs-keyword">const</span> url = c.env.LIBSQL_DB_URL?.trim();
    <span class="hljs-keyword">if</span> (!url) {
        <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'LIBSQL_DB_URL env var is not defined'</span>);
    }

    <span class="hljs-keyword">const</span> authToken = c.env.LIBSQL_DB_AUTH_TOKEN?.trim();
    <span class="hljs-keyword">if</span> (!authToken) {
        <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'LIBSQL_DB_AUTH_TOKEN env var is not defined'</span>);
    }

    <span class="hljs-keyword">const</span> client = createClient({ url, authToken });

    <span class="hljs-keyword">const</span> db = drizzle(client, {
        <span class="hljs-comment">// logger: true,</span>
        schema,
    });

    c.set(<span class="hljs-string">'db'</span>, db);

    <span class="hljs-keyword">return</span> db;
}
</code></pre>
<h2 id="heading-struggle-number-2">Struggle number 2</h2>
<p>The day and the time come. I run the command, and around 3.5 minutes in I get <code>RangeError: Invalid string length</code>. I feel frustrated, to be honest. The database had grown to around 120k rows since the night before, and apparently that was too big. A quick search reveals that it is <a target="_blank" href="https://github.com/nodejs/node/issues/35973#issuecomment-722253319">a limitation of the V8 engine</a> executing JavaScript.</p>
<p>Well, I foolishly retried using <code>bun run --bun</code> hoping it would change anything, and upon seeing the same error I <code>git clone</code> the repo and open the source code. It soon becomes clear that all the SQL lines are pushed into one big array (<code>lines.push(sql)</code>) and then joined with <code>lines.join('\n')</code>, which is exactly where the error happens: at the last step of the script.</p>
<p>I then rewrite the pair of <code>cli.ts</code> and <code>index.ts</code> to use <code>appendFile</code> from <code>node:fs/promises</code> and start writing to the output file as soon as more than 1000 lines have accumulated. It goes well and finally I have my data. The changes are in <a target="_blank" href="https://github.com/flexchar/cloudflare-d1-backup">the fork</a>.</p>
<p>I run the rest of the plan and it goes well. I must admit I feel really happy about this, as it removes a lot of stress from my mind: the unknown of what would happen once my DB reached 2GB and I couldn't get out.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><strong>Max Rozen</strong> from Cloudflare has <a target="_blank" href="https://github.com/cloudflare/workers-sdk/issues/3891#issuecomment-1902082359">mentioned</a> on GitHub and also <a target="_blank" href="https://www.answeroverflow.com/m/1207086527579029604">on Discord</a> that an official export is being worked on and may come this spring. So hopefully, if someone comes across this, they will have a much more pleasant experience.</div>
</div>

<h2 id="heading-lessons">Lessons</h2>
<p>The biggest one is that I chose the wrong tool for the job. I like to reflect and I'd like to come up with a smart lesson, but I don't really know. I am building a side hustle and had no estimate of how much data to expect. I chose <code>sqlite</code> because it's so portable and because half of the stored data is archival, kept for analytics. Low latency is still nice, since the data carries a tiny bit of business logic, but nothing complex. It's also unstructured, unlike the main database in Laravel, which is why I chose to keep the two separate.</p>
<p>D1 and DO (Durable Objects) are cool for those who need data inside Cloudflare Workers but don't store THAT much. I would definitely use them again if I had to. Although Turso's <a target="_blank" href="https://docs.turso.tech/features/embedded-replicas"><strong>Embedded Replicas</strong></a> were the killer feature that made me certain about the decision.</p>
<h2 id="heading-bonus-using-embedded-replicas-in-laravel">Bonus: using <strong>Embedded Replicas in Laravel</strong></h2>
<p>High latency from Laravel is, of course, the other side of the coin of why I migrated. So now was the time to get a piece of the freshly baked cake.</p>
<p>I <code>composer remove renoki-co/l1</code> and replace the <code>d1</code> connection with <code>sqlite</code> in my <code>database.php</code>:</p>
<pre><code class="lang-php"><span class="hljs-string">'data'</span> =&gt; [
    <span class="hljs-string">'driver'</span> =&gt; <span class="hljs-string">'sqlite'</span>,
    <span class="hljs-string">'database'</span> =&gt; <span class="hljs-string">'/turso/data.db'</span>,
    <span class="hljs-string">'prefix'</span> =&gt; <span class="hljs-string">''</span>,
    <span class="hljs-string">'foreign_key_constraints'</span> =&gt; env(<span class="hljs-string">'DB_FOREIGN_KEYS'</span>, <span class="hljs-literal">true</span>),
],
</code></pre>
<p>I update my <code>.env</code> files and put in a <code>read-only</code> token from Turso:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Consumed by Turso sync script (Readonly token)</span>
LIBSQL_DB_URL=libsql://data-user.turso.io
LIBSQL_DB_AUTH_TOKEN=
</code></pre>
<p>Then I edit the Dockerfile to include <code>nodejs</code> in the base image; previously I would ditch it as soon as the frontend assets were built, to keep the image as skinny as possible.</p>
<p>I wish <code>bun</code> were available on Alpine, but I am forced to make it work with <code>node</code>. I use a global npm package so that I don't mess up my <code>package.json</code>, which I otherwise run from outside the container because of the performance issues I'd have running <code>npm run dev</code> inside Docker on a Mac M2.</p>
<p>To do that, I add the following line. You may have to adapt it to the structure of your own Dockerfile.</p>
<pre><code class="lang-dockerfile"><span class="hljs-comment"># Install global NPM package for syncing Turso DB</span>
su -l $ROOTLESS -c <span class="hljs-string">"npm config set prefix '~/.local/' &amp;&amp; npm i -g @libsql/client"</span> &amp;&amp; \
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">If you copy-paste, note that you must be using at least <code>node 20</code> and <code>alpine 3.19</code>.</div>
</div>

<p>I write a <code>turso.mjs</code> script at the root of my project:</p>
<pre><code class="lang-typescript"><span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Hello to Turso Embedded Replicas.\n⏰ <span class="hljs-subst">${<span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>().toISOString()}</span>`</span>);
<span class="hljs-comment">// https://docs.turso.tech/features/embedded-replicas</span>

<span class="hljs-comment">// Node cannot resolve global packages so we need to use the full path to the package</span>
<span class="hljs-comment">// Expected to be installed in the base image</span>
<span class="hljs-keyword">import</span> { createClient } <span class="hljs-keyword">from</span> <span class="hljs-string">'/home/laravel/.local/lib/node_modules/@libsql/client/lib-esm/node.js'</span>;

<span class="hljs-keyword">const</span> {
    LIBSQL_DB_URL,
    LIBSQL_DB_AUTH_TOKEN,
    TURSO_DB_PATH = <span class="hljs-string">'/turso/data.db'</span>,
} = process.env;

<span class="hljs-keyword">if</span> (!LIBSQL_DB_URL) {
    <span class="hljs-built_in">console</span>.error(<span class="hljs-string">'Missing LIBSQL_DB_URL'</span>);
    process.exit(<span class="hljs-number">1</span>);
}

<span class="hljs-keyword">const</span> client = createClient({
    url: <span class="hljs-string">`file:<span class="hljs-subst">${TURSO_DB_PATH}</span>`</span>,
    syncUrl: LIBSQL_DB_URL,
    authToken: LIBSQL_DB_AUTH_TOKEN,
});

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">sync</span>(<span class="hljs-params"></span>) </span>{
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Syncing <span class="hljs-subst">${LIBSQL_DB_URL}</span> -&gt; <span class="hljs-subst">${TURSO_DB_PATH}</span>`</span>);
    <span class="hljs-built_in">console</span>.time(<span class="hljs-string">`sync()`</span>);
    <span class="hljs-keyword">await</span> client.sync();
    <span class="hljs-built_in">console</span>.timeEnd(<span class="hljs-string">'sync()'</span>);
}

<span class="hljs-comment">// if you wanted to run this as a dedicated container script,</span>
<span class="hljs-comment">// instead of using Laravel's scheduler, then use interval:</span>
<span class="hljs-comment">// setInterval(sync, 1000 * 60 * 5);</span>
sync();
</code></pre>
<p>Finally, I extend my <code>app/Console/Kernel.php</code> to sync every minute.</p>
<pre><code class="lang-php"><span class="hljs-comment">// Sync Turso DB (expected to only run on production server, not on local dev server)</span>
$schedule
    -&gt;exec(<span class="hljs-string">'node --env-file .env.production turso.mjs'</span>)
    -&gt;everyMinute()
    -&gt;onFailureWithOutput(<span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">$output</span>) </span>{
        \Log::error(<span class="hljs-string">'Turso sync failed: '</span> . $output);
    });
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">As my database was 1GB at this point, one catch was to run the first sync without the scheduler, as it'd take about 4 minutes. The scheduler would otherwise start four sync processes during that window, which would end up crashing each other and failing. The subsequent syncs take only <code>200-400ms</code>; isn't that lovely?</div>
</div>
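<p>One way to guard against those overlapping runs from inside the script itself is a simple lock file (a hypothetical sketch; <code>LOCK_PATH</code> is made up, adjust for your setup). Laravel's scheduler also offers <code>-&gt;withoutOverlapping()</code> for the same purpose:</p>

```typescript
// Hypothetical lock-file guard so only one sync process runs at a time.
import { closeSync, existsSync, openSync, unlinkSync } from 'node:fs';

const LOCK_PATH = '/tmp/turso-sync.lock'; // assumed path, not from my real setup

export function acquireLock(path: string = LOCK_PATH): boolean {
    try {
        // The 'wx' flag fails if the file already exists,
        // so only the first process gets the lock.
        closeSync(openSync(path, 'wx'));
        return true;
    } catch {
        return false;
    }
}

export function releaseLock(path: string = LOCK_PATH): void {
    if (existsSync(path)) unlinkSync(path);
}
```

<p>In the sync script you would then call <code>client.sync()</code> only when <code>acquireLock()</code> returns true, and <code>releaseLock()</code> in a <code>finally</code> block.</p>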

<p>That is it. Now my Laravel queries take <code>1ms</code> for user-facing parts and I still get the serverless benefits. I am excited to see how I will feel about this in 6 months, or a year.</p>
]]></content:encoded></item><item><title><![CDATA[When tests randomly fail in Laravel...]]></title><description><![CDATA[One thing I learned today, or actually was reminded of, that when tests begin to fail randomly on reruns, it is often that the factory of the model used may have been configured to use ->randomElement([...]) and therefore on certain combination that ...]]></description><link>https://lukasnotes.dk/when-tests-randomly-fail-in-laravel</link><guid isPermaLink="true">https://lukasnotes.dk/when-tests-randomly-fail-in-laravel</guid><category><![CDATA[Laravel]]></category><category><![CDATA[PHP]]></category><category><![CDATA[Testing]]></category><dc:creator><![CDATA[Lukas]]></dc:creator><pubDate>Sun, 04 Feb 2024 18:48:09 GMT</pubDate><content:encoded><![CDATA[<p>One thing I learned today, or actually was reminded of: when tests begin to fail randomly on reruns, it is often because the model's factory has been configured to use <code>-&gt;randomElement([...])</code>, and some combination that would never occur in the production flow is causing the tests to fail.</p>
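<p>To illustrate with a hypothetical factory (not from a real project; the field names are made up), the random pick itself is fine, it's the combinations that bite:</p>

```php
// Hypothetical factory definition() illustrating the trap:
public function definition(): array
{
    return [
        'status' => $this->faker->randomElement(['draft', 'published', 'archived']),
        'published_at' => $this->faker->optional()->dateTime(),
        // Most reruns pass, but e.g. 'published' combined with a null
        // 'published_at' may break an invariant a test relies on,
        // even though production code would never create that state.
    ];
}
```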
<p>I first encountered this a couple of years ago and was pulling my hair out trying to understand why this seemingly abnormal behaviour was happening. I tried everything from wiping the database to setting up a dedicated test instance, until a <code>git bisect</code> showed it started when I updated my <code>factory</code> class.</p>
<p>Hopefully writing this down will help me remember it sooner the next time, which I'm afraid is bound to happen.</p>
]]></content:encoded></item></channel></rss>