SQLite’s Use of Tcl (2017)

(tcl.tk)

136 points | by fanf2 4 days ago

45 comments

  • weinzierl 4 days ago

    This is from the SQLite creator D. Richard Hipp who is always worth reading, but, I'd like to recommend reading what TCL's creator John Ousterhout has to say.

    His article on threads from 1995 was highly influential on me, and I remember it to this day. More recently (2018, revised and expanded in 2021), he published a book on software engineering practices called A Philosophy of Software Design which is, in my opinion, the best in its category.

    • dilawar 4 days ago

      +100 for his book. TIL he created TCL also. The book is excellent and one of the very resources that talks about architecture independent of language or tech-stack used.

      • notRobot 4 days ago

        He also gave a talk that greatly influenced how I look at relationships: https://gist.github.com/gtallen1187/27a585fcf36d6e657db2

      • mananaysiempre 4 days ago

        Like Lamport, who is more widely known for (what was originally) his side project LaTeX than for his seminal distributed systems research including Paxos, Ousterhout tends to be more widely known by name for (what was originally) his side project Tcl than for his seminal distributed systems research including Raft.

        • jimkoen 4 days ago

          What an amusing coincidence, I didn't know Lamport wrote LaTeX, rather I knew of him only in connection with Lamport clocks.

          • abhgh 4 days ago

            "(La)Tex" is believed to expand to "(Lamport)TeX" although I haven't seen an official source for that trivia :-)

        • p_l 2 days ago

          Ousterhout also I think had outsized presence in early VLSI tooling, which is why TCL happened and why is it still common in tooling (pun intended)

    • pjot 4 days ago

      This book lives on my desk. And I’ve proudly filled its pages with annotations and sticky notes. It’s really well done.

    • leoh 4 days ago

      >His article on threads from 1995 was highly influential on me, and I remember it to this day. More recently (2018, revised and expanded in 2021), he published a book on software engineering practices called A Philosophy of Software Design which is, in my opinion, the best in its category.

      Would you have a link?

  • graemep 4 days ago

    > Early versions of SQLite (prior to 2004) operated on the classic TCL principal that "everything is a string". Beginning with SQLite3 (2004-06-18), SQLite also supports binary data.

    TCL can handle binary data. It is just not a separate type: https://wiki.tcl-lang.org/page/Working+with+binary+data

    SQLite also always had a null type, surely?

    > However, types are still very flexible in SQLite, just as they are in TCL. SQLite treats the datatypes on column names in a CREATE TABLE statement as suggestions rather than hard requirements

    This is something I do not much like. Its not compulsory (you can create "strict" tables). It works well in TCL which is an entire language designed around the idea. Less so in SQL.

    One of the advantages of RDBMSes is that not accepting obviously wrong data makes life easier for developers. You can debug an issue that happens on inserting the data, not when you find the wrong type or other bad much later on.

    • tyingq 4 days ago

      There was a time when TCL could not handle binary data at all. The support came in TCL 8.0:

      "Binary data is now supported in Tcl."

      https://www.tcl.tk/software/tcltk/relnotes/tcl8.0.txt

      Well before 2004, but worth mentioning because you'll find a fair amount of old posts complaining about it.

      • gcr 4 days ago

        What do you mean, that TCL strings weren’t 8-bit-clean?

        • pavlov 4 days ago

          Maybe they were zero-terminated C strings, which of course can’t represent arbitrary binary data.

          • tyingq 3 days ago

            Yes, prior to 8.0, tcl strings mapped to null-terminated C strings.

            "Binary I/O. The new object system in Tcl 8.0 supports binary strings (internally, strings are counted in addition to being null terminated). There is a new "binary" command for inserting and extracting data to/from binary strings. Commands such as "puts", "gets", and "read" commands now operate correctly on binary data. There is a new variable tcl_platform(byteOrder) to identify the native byte order for the current host."

    • rscho 4 days ago

      This typing behaviour, now in combination with strict tables, is a boon for biostats. When you get shitty data to be cleaned, you've got 3 main choices: 1.use slow and untyped scripting languages, 2.use a strictly typed database, meaning you'll have to clean your data in advance, or 3.load it all as strings into SQLlite, then clean the data until it fits into a strict table with check constraints. IMO, it's pretty clear 3 is best by far!

      • coliveira 4 days ago

        I agree that 3 is great, but you can also do that in any database, just create your input tables as string only and then perform the necessary operations to move them into typed tables.

        • rscho 4 days ago

          Yes, but with sqlite there is much less ceremony (no server, etc.) and most importantly can be used without talking to my institution's sysadmin, which is what I'm looking for when manipulating one-off datasets.

          • mb7733 4 days ago

            But that advantage has nothing to do with accepting data of the wrong type into a column (by default).

      • kstrauser 4 days ago

        Is anyone using untyped languages much today, other than shell scripts?

        • colejohnson66 4 days ago

          CMake is entirely stringly-typed as well. Like many shells, arrays/lists are just space-separated strings.

          • mdaniel 4 days ago

            Pedantically that's not true, they're ';' delimited https://cmake.org/cmake/help/v3.31/command/list.html#:~:text...

            The confusion comes from the fact that set() automatically coerces space-delimited items into a ;-delimited list

              set(ONE alpha;beta)
              set(TWO alpha beta)
              list(LENGTH ONE one_len)
              list(LENGTH TWO two_len)
              message(FATAL_ERROR "one <<${ONE}>> length ${one_len}\ntwo <<${TWO}>> length ${two_len}")
            
            emits

              one <<alpha;beta>> length 2
            
              two <<alpha;beta>> length 2
        • nilamo 4 days ago

          Most people via JavaScript...

    • samatman 4 days ago

      > SQLite also always had a null type, surely?

      In 'stringly typed' languages, the natural value of NULL is the empty string.

      SQLite currently distinguishes between "" and NULL, and I don't know if this was always the case. But it's quite possible that SQLite 1 and 2 conflated the empty string and NULL.

    • PittleyDunkin 4 days ago

      > SQLite also always had a null type, surely?

      NULL is surely a value, not a type, no? You can restrict use of NULL with the column type, but it doesn't make any sense to have a NULL type rather than, say, a nullable-string type. What would be the point?

      • IshKebab 4 days ago

        Null/none is a value and a type. E.g. in Python there's None which has the type NoneType. Sometimes they are named the same.

        This is an example of a singleton type - a type that can only hold one value. It's not the only example though, it's fairly common to support string or integer singleton types, e.g. in Typescript you can have the type "foo" which only has a single possible value, "foo". Or some languages support a type int(5) for which the only possible value is 5.

        You might think that's useless, but it's very useful when combining types, e.g. via unions. Or sometimes when writing generic code.

        SQL has an extremely primitive type system so there's no syntax for type unions etc. But you can imagine if it was written by someone who had experience of modern typing you would say `my_nullable_column: string | null`.

        • PittleyDunkin 4 days ago

          Ok, but you could also just say "nullable string". Giving null a type doesn't seem to yield a benefit outside of python. Especially when, again, it doesn't appear to have any utility in SQL outside of applying to types.

  • zvr 4 days ago

    Back in the day, the Tcl conferences (and the EuroTcl in Europe) were great sources of information. Only about half of the presentations were for Tcl internals and extensions. The rest were about other projects that were using Tcl in some way or other, and it was fascinating to learn about completely different areas of software.

    • wduquette 4 days ago

      They were. I attended for many years, including the conference where Richard gave this talk.

      • cmacleod4 3 days ago

        The US Tcl conference has not happened for a few years now, but EuroTcl continues, I gave a talk there myself this year - https://openacs.org/conf2024/info/ .

        • wduquette 2 days ago

          Hi, Colin! I'm sorry to hear that the US conference is dead, but glad to hear that EuroTcl carries on.

  • omoikane 4 days ago

    There is also a video recording of this talk:

    https://youtu.be/kHmwv3I1Kxk?si=CrZW_VVnA_UqcXCU

    I was looking for this because Figure 1 (Breakdown Of SQLite Source Code By Language) is missing, and archive.org doesn't seem to have it. That chart can be found near 11:56 of the video above.

  • 1vuio0pswjnm7 4 days ago

    "The sqlite3.c and sqlite3.h source files are build products, and the source tree used to build those files is over 50% TCL code."

    Always feels silly to have to install Tcl to compile sqlite3,

    I use the --disable-tcl configuration option before compiling. After "make" I have a one-liner to statically-link the sqlite3 binary. It takes a relatively long time to link; sqlite3.c is a whopping 8.7M. I end up with a 1.7M sqlite3 binary.

    I never do "make install". I do not want to install all the Tcl stuff.

  • smitty1e 4 days ago

    > For example, the byte-code engine used to evaluate SQL statements inside of SQLite is implemented as a large "switch" statement inside a "for" loop, with a separate "case" for each opcode, all in the "vdbe.c" source file.

    Duff's Device[1] for the win!

    [1] https://en.m.wikipedia.org/wiki/Duff's_device

    • wizzwizz4 4 days ago

      Duff's Device is a for loop inside a switch statement. Totally different.

      • 1oooqooq 4 days ago

        And here I was thinking I found a fancy term for my "lazy parser"... Well "Lazy Parser" it will remain named.

      • smitty1e 4 days ago

        I stand clue-batted, thank you.

  • novoreorx 2 days ago

    I wonder is it possible to use Tcl as a testing tool for modern languages such as Python and JavaScript, the way SQLite use Tcl to do testing looks really promising

  • rwalle 4 days ago

    I wonder -- if they were to restart from scratch today, would they do the same thing? If not, which stack would they choose?

    • pstuart 4 days ago

      Rust would probably be the language of choice -- it appears to be the default for software in that domain.

  • postepowanieadm 4 days ago

    Anyone has used the mysterious "e" editor?

  • 4 days ago
    [deleted]