
Migrating large D1 to Turso

I began with Cloudflare's D1, but high latency and the database size limit pushed me to Turso. The migration wasn't without lessons. Here they are.


I discovered coding back in 2013, and three years later I spent a whole summer building my first Laravel app, which is still in production at the non-profit I built it for.

Now I'm struggling to find the balance between enjoying the power of "I can build this myself" and not choking myself to death trying to build everything myself.

Since developers are not known for being articulate, I decided to start writing about my endeavours to keep myself sharp.

Update, May 5, 2024

Hey readers, you deserve a couple of facts before you dive in.

Context

I have a main project that has been running strong on Laravel for several years now. Last spring, with the rise of LLMs, I got into building an extension that integrates into users' browsers. For the sake of simplicity, let's assume it's a context-aware chatbot with users all around the world.

As someone based in the EU, I chose to host my Laravel app in the EU, but that added nearly a whole second (1s) of latency that is completely unnecessary for the end user's actions. Switching to Cloudflare Workers was heavenly (I have yet to write about that in another article). In late autumn I began using Cloudflare's D1 and I think it's absolutely awesome. To address any ambiguity: I also happen to be a big fanboy of Cloudflare's services. However, as requests grew and as I began integrating some of the data stored in D1 back into Laravel, I faced huge query times.

I leveraged the L1 package built by @rennokki, which uses D1's HTTP API under the hood. Since some cases required a couple of queries, I would sometimes end up with as much as 10 seconds per request. This was a deal breaker.

Then I began getting a few new active users and saw rapid growth in database size (both file size and row count). At the time of writing (February 2024), there's a hard limit of 2 GB per database, and I was approaching 1 GB fast.

I began researching alternatives. I wanted something simple and closer to users (the edge) for performance, though I also considered putting everything back on one of my main servers. To cut to the chase: Turso had been on my mind for several months (ironically, thanks to their leaked private DBs story), and I finally decided to move there.

Planning stage

When starting with D1 I also discovered Drizzle, and it didn't take long to fall in love with it. I truly wish Laravel had migrations like Drizzle's: just edit the schema definition file and run db:generate to get updated migration files. That's lovely because it's always easy to inspect a table's structure from the definition file.

Anyway, thanks to Drizzle and DRY-enough code, there was only one method I had to swap to load the Drizzle client from Turso instead of D1, so that part was easy. Now I just had to download the DB, swap D1's migrations (npx wrangler d1 migrations apply db thing) for Drizzle's migrations table, and push everything to Turso.

My first plan was to use the cloudflare-d1-backups package built by Nora, which runs inside a Worker and dumps the entire DB into a backup.sql stored on R2 (Cloudflare's S3-equivalent bucket). I had tested it a few weeks earlier and it worked.

I also set up a migrate.ts file to run Drizzle migrations on a fresh start:

import { migrate } from 'drizzle-orm/libsql/migrator';
import { createClient } from '@libsql/client';
import { drizzle } from 'drizzle-orm/libsql';

const client = createClient({
    url: 'http://127.0.0.1:8080',
});

export const db = drizzle(client);

migrate(db, { migrationsFolder: 'migrations' })
    .then(() => {
        console.log('Migrations completed!');
        process.exit(0);
    })
    .catch((err) => {
        console.error('Migrations failed!', err);
        process.exit(1);
    });

PS. I really love Bun for its native TypeScript support plus several APIs that make it a breeze to write scripts in TS compared to the classic Node.js way.

The reality & the struggle

It was probably a Thursday evening a week ago that I coded up the download route and quickly figured out this wasn't going to be painless. I hit all the limits of Cloudflare Workers: the memory limit (a hard 128 MB), the execution limit (a somewhat mysterious ~30s), too many subrequests (every call to D1 counts as one), and many unknown errors that are extremely hard to debug since no info is returned. I really hope CF will improve on this. To be fair, D1 is still in beta.

I opened a GitHub issue on Nora's tool hoping for help and kept experimenting in the evenings over the following days. I may not have been at my sharpest and missed a way out, but eventually, after also reaching out on their Discord, I was told about another tool that uses the HTTP API instead of a Worker to do the same thing. It's actually a fork of Nora's backup: cloudflare-d1-backup by Cretezy. I tried it before hitting the bed and it worked. Such a relief. Well, not for long...

I wrote up the plan and called it a day, excited to rest before a seemingly powerful yet easy migration tomorrow.

The plan was this:

  1. Download D1 using the other tool:

    CLOUDFLARE_D1_ACCOUNT_ID=f4960f17-dfc6-4d0c-a4c0-58786243b612 CLOUDFLARE_D1_DATABASE_ID=f4960f17-dfc6-4d0c-a4c0-58786243b612 CLOUDFLARE_D1_API_KEY=f4960f17-dfc6-4d0c-a4c0-58786243b612 npx @cretezy/cloudflare-d1-backup backup.sql

  2. Remove the sqlite_sequence lines. None of my tables uses an auto-incrementing primary key, so I would get errors because that table will not exist in Turso. Ref. (They exist in D1 because CF's migrations table is there.)

    sed '/DELETE FROM sqlite_sequence;/d; /INSERT INTO "sqlite_sequence"/d' your_backup.sql > cleaned_backup.sql

  3. Start a fresh SQLite database and migrate it using Drizzle. Note that I migrate first so that Drizzle's migrations tracking gets enabled, which would not be possible after loading the backup.

    touch data.db && turso dev --db-file data.db && bun migrate.ts

  4. Load the backup:

    sqlite3 data.db < cleaned_backup.sql

  5. Deploy to Turso:

    turso db create my-db --location wwa --from-file data.db

  6. Update the method that returns the DB client in my Cloudflare Worker to use Turso:

// Imports assumed from the surrounding project (the schema path and the
// use of Hono's Context are my guesses at the original setup):
import { createClient } from '@libsql/client';
import { drizzle, type LibSQLDatabase } from 'drizzle-orm/libsql';
import type { Context } from 'hono';
import * as schema from './schema';

export function getDb(c: Context): LibSQLDatabase<typeof schema> {
    // Reuse the client if this request already created one
    if (c.get('db')) {
        return c.get('db');
    }

    const url = c.env.LIBSQL_DB_URL?.trim();
    if (!url) {
        throw new Error('LIBSQL_DB_URL env var is not defined');
    }

    const authToken = c.env.LIBSQL_DB_AUTH_TOKEN?.trim();
    if (!authToken) {
        throw new Error('LIBSQL_DB_AUTH_TOKEN env var is not defined');
    }

    const client = createClient({ url, authToken });

    const db = drizzle(client, {
        // logger: true,
        schema,
    });

    // Cache on the request context so subsequent calls reuse it
    c.set('db', db);

    return db;
}

Struggle number 2

The day and the time comes. I run the command, and around 3.5 minutes in I get RangeError: Invalid string length. I feel frustrated, to be honest. The database had grown to around 120k rows since the previous night, and apparently that was too big. A quick search reveals this is a limitation of the V8 engine executing the JavaScript: a single string has a hard length cap (roughly 2^29 characters on 64-bit builds).

Well, I foolishly retried using bun run --bun, hoping it would change anything, and upon seeing the same error I git cloned and opened the source code. It soon became clear that every SQL line gets lines.push(sql)-ed into one big array, which is then lines.join('\n')-ed, and that join is exactly where the error happens: the last step of the script.

I then rewrote the pair of cli.ts and index.ts to use appendFile from node:fs/promises and to start writing to the output file as soon as over 1,000 lines have accumulated. It goes well and I finally have my data. The changes are in the fork.
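The change can be sketched like this (a minimal reconstruction, not the fork's exact code; `dumpToFile` and the threshold constant are illustrative names):

```typescript
// Instead of pushing every SQL line into one giant array and join()-ing
// at the end (which hits V8's string length limit on big dumps), flush
// accumulated lines to disk whenever the buffer exceeds a threshold.
import { appendFile, writeFile } from 'node:fs/promises';

const FLUSH_THRESHOLD = 1000;

export async function dumpToFile(outPath: string, lineSource: Iterable<string>) {
    await writeFile(outPath, ''); // start with an empty file
    let buffer: string[] = [];
    for (const line of lineSource) {
        buffer.push(line);
        if (buffer.length >= FLUSH_THRESHOLD) {
            await appendFile(outPath, buffer.join('\n') + '\n');
            buffer = [];
        }
    }
    if (buffer.length > 0) {
        await appendFile(outPath, buffer.join('\n') + '\n');
    }
}
```

This way memory usage stays bounded by the flush threshold rather than growing with the size of the database.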

I run the rest of the plan and it goes well. I must admit I feel really happy about this, as it removes a lot of stress that had been sitting in the back of my mind: the unknown of what would happen once my DB hit 2 GB and I couldn't get out.

💡
Max Rozen from Cloudflare has mentioned on GitHub and also on Discord that an official export is being worked on and may come this spring. So hopefully, if someone comes across this later, they will have a much more pleasant experience.

Lessons

The biggest one is that I chose the wrong tool for the job. I like to reflect, and I'd like to come up with a smart lesson, but I don't really know. I am building a side hustle and had no way to estimate the data volume. I chose SQLite because it's so portable and because half the purpose of the stored data is archival, for analytics. Low latency is nice since the data contains a tiny bit of business logic, but nothing complex. It's also unstructured, unlike the main database in Laravel, which is why I chose to keep the two separate.

D1 and also DO (Durable Objects) are cool for those who need data inside Cloudflare Workers but don't store THAT much. I would definitely use them again if I had to. Still, Turso's Embedded Replicas was the killer feature that made me certain about the decision.

Bonus: using Embedded Replicas in Laravel

High latency from Laravel was, of course, the other side of the coin of why I migrated. So now it was time to get a piece of the freshly baked cake.

I composer remove renoki-co/l1 and replace the d1 connection with sqlite in my database.php:

'data' => [
    'driver' => 'sqlite',
    'database' => '/turso/data.db',
    'prefix' => '',
    'foreign_key_constraints' => env('DB_FOREIGN_KEYS', true),
],

Then I update my .env files with a read-only token from Turso:

# Consumed by Turso sync script (Readonly token)
LIBSQL_DB_URL=libsql://data-user.turso.io
LIBSQL_DB_AUTH_TOKEN=

Then I edit the Dockerfile to include Node.js in the base image. Previously I would ditch it as soon as the frontend assets were built, to keep the image as skinny as possible.

I wish Bun were available on Alpine, but I am forced to make it work with Node. I install a global npm package so that I don't touch my package.json, which I use from outside the container because of the performance issues of running npm run dev inside Docker on a Mac M2.

To do that I add the following line; you may have to adapt it to the structure of your Dockerfile.

# Install global NPM package for syncing Turso DB
su -l $ROOTLESS -c "npm config set prefix '~/.local/' && npm i -g @libsql/client" && \
💡
If you copy-paste this, note that you must be using at least Node 20 and Alpine 3.19.

I then write a turso.mjs script at the root of my project:

console.log(`Hello to Turso Embedded Replicas.\n⏰ ${new Date().toISOString()}`);
// https://docs.turso.tech/features/embedded-replicas

// Node cannot resolve global packages so we need to use the full path to the package
// Expected to be installed in the base image
import { createClient } from '/home/laravel/.local/lib/node_modules/@libsql/client/lib-esm/node.js';

const {
    LIBSQL_DB_URL,
    LIBSQL_DB_AUTH_TOKEN,
    TURSO_DB_PATH = '/turso/data.db',
} = process.env;

if (!LIBSQL_DB_URL) {
    console.error('Missing LIBSQL_DB_URL');
    process.exit(1);
}

const client = createClient({
    url: `file:${TURSO_DB_PATH}`,
    syncUrl: LIBSQL_DB_URL,
    authToken: LIBSQL_DB_AUTH_TOKEN,
});

async function sync() {
    console.log(`Syncing ${LIBSQL_DB_URL} -> ${TURSO_DB_PATH}`);
    console.time(`sync()`);
    await client.sync();
    console.timeEnd('sync()');
}

// if you wanted to run this as a dedicated container script,
// instead of using Laravel's scheduler, then use interval:
// setInterval(sync, 1000 * 60 * 5);
sync();

Finally, I add an entry to my app/Console/Kernel.php to sync every minute:

// Sync Turso DB (expected to only run on production server, not on local dev server)
$schedule
    ->exec('node --env-file .env.production turso.mjs')
    ->everyMinute()
    ->onFailureWithOutput(function ($output) {
        \Log::error('Turso sync failed: ' . $output);
    });
💡
As my database was 1 GB at this point, one catch was to run the first sync manually, without the scheduler running, as it took about 4 minutes. The scheduler would otherwise start four overlapping sync processes during that window, which would end up crashing each other and failing. Subsequent syncs take only 200-400 ms. Isn't that lovely?
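One way to guard against that overlap (my own suggestion, not part of the original setup; the lock path and function names are hypothetical) is a simple lock file that turns the script into a no-op while another sync is still running:

```typescript
// Sketch: skip the sync if a previous run is still in progress.
// open() with the 'wx' flag fails when the file already exists,
// which doubles as a cheap cross-process lock.
import { open, unlink } from 'node:fs/promises';

const LOCK_PATH = '/tmp/turso-sync.lock'; // hypothetical location

export async function syncOnce(doSync: () => Promise<void>): Promise<boolean> {
    let lock;
    try {
        lock = await open(LOCK_PATH, 'wx');
    } catch {
        console.log('Another sync is still running, skipping this tick.');
        return false;
    }
    try {
        await doSync();
        return true;
    } finally {
        await lock.close();
        await unlink(LOCK_PATH); // release the lock for the next scheduled run
    }
}
```

In turso.mjs this would wrap the existing sync() call, e.g. syncOnce(sync). A stale lock file left behind by a crashed process would block future runs, so a real implementation might also check the file's age before giving up.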

That is it. Now my Laravel queries take 1 ms for the user-facing parts and I still get the serverless benefits. I am excited to see how I will feel about this in 6 months, or a year.

Moisés

Hi Lukas, I may have gotten a little lost because I am not fluent in English.

But I wanted to ask you a question. The big breakthrough you have achieved with Turso is having local copies on the clients, and in that sense I understand the difference.

But knowing that D1 is no longer in beta, and that you mention a 2 GB maximum while I think it is now 10 GB: if you could not have those local copies, would you consider D1? I am developing a SaaS and I am worried about what you said about latency and how long D1 takes, when it is supposed to be designed to scale and at first glance seems similar to Turso.

I'm going to use Cloudflare as serverless, which is why my initial idea was to stay in the ecosystem, but I was very concerned reading you.

Best regards

Lukas · 1y ago

Hi Moisés, sorry it took so long to respond!

It is really specific to your project. The latency, if you use Cloudflare Workers to access the database, is very low. My problem arose because I was using the SQL API from a separate application. That is why Turso's local copies became so valuable.

Furthermore, I didn't know D1 was out of beta. I am happy with Turso because there is no size limit, whereas D1 still has a 10 GB limit. My current database is already bigger than 10 GB.

Another point for me is that I don't want to be vendor locked in. The perception (which may just be a trivial illusion) that I can download the SQLite file from Turso makes me believe I could move off Cloudflare in an hour: package my Worker into a Bun.serve() Docker container and deploy it on my own server. I may never need to do that, but who knows.

I am a big fan of Cloudflare and I don't see them being bad in any way as long as it fits what you're doing. :)

Hope that helps!

Md Rakib Hossain

How is your experience with Turso so far?

I also switched from D1 to Turso recently, and would like to know if you encountered any issues or limitations in the last 2 months?

Lukas · 1y ago

Hey Rakib, actually none.

I am really loving the embedded sync function. There is no native PHP driver for libsql (Turso), so the fact that I can get the SQLite file and use Laravel's native methods made it a 100x deal.

I did, however, hit a couple of "issues" when I was optimizing indices, where I caused a lot of GBs to be synced (around 50 GB over the billing period). I had a call with Gauba, founder of Turso, and he was very kind to waive it and hear more about how I am using it. I wouldn't call it a limitation.

Frankly, if it fits your use case, it's quite cheap. Although I have seen that Cloudflare finally introduced SQLite download support.

What are you building, if I may ask?

Md Rakib Hossain

Thanks Lukas, for sharing your experience. I really loved reading your blog.

I'm building a SaaS app using Turso, Drizzle, Zod, and Next.js. I'm hosting the app with Cloudflare Pages.

Switching from Turso to D1 is quite easy thanks to Drizzle, and I might switch in the future, but D1 still needs a lot of improvements. What do you recommend, should I stick with Turso?

Lukas · 1y ago

Md Rakib Hossain

Sounds like a very light project; you will do great no matter where, but...

I would totally stick with Turso. The latencies are marginal and you are getting a lot more for your buck. A company doing one thing will always outdo one doing many things.

Though the key reason for me is the fact that I am not vendor locked in. If you ever want to host on Vercel, Netlify, or just spin up your own Docker container with Bun as a web server and drop it on a ridiculously cheap Hetzner VPS, you just do that. Whereas with D1 you would be tied down and face a work-demanding migration.