Where are the knowledge boundaries?
One thing I have thought about a lot is Personal Knowledge Management. Software that helps people organize, understand, and easily recall everything they care about, and access it fluidly, is certainly a worthwhile goal. But I note an interesting tension that can be expressed in multiple ways.
One expression is a privacy vs. breadth of knowledge question. When I first began my PKM tool, Nous+, it was well before current LLMs came on the scene. At the time it seemed sufficient for the user to say "remember this", handing over a URI of some kind, and then be prompted for tags and relationships and perhaps a more fine-grained class or category, so that the thing could be remembered and made part of a de facto knowledge graph.
There are a couple of things to note. Only what the user chooses to remember is in the graph. Only what the user decides are tags and relationships is remembered. This is very private and delimited by how the user thinks about things. It is thus much more likely that they can easily bring things back out of their store, because the tags and relations match the way they think about information. All of that is good from one cluster of perspectives.
The original Nous+ provided a very sophisticated query mechanism to find things by type, by having or not having a set of tags, by being or not being related to a set of objects, and by various attribute value queries, including ranges and time ranges for when an item was first entered or last modified. Also fine.
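To make that kind of query concrete, here is a minimal sketch over an in-memory list. It is not the actual Nous+ implementation; the `Item` fields and the `query` parameters are names I am assuming purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Item:
    # Hypothetical record shape, not the real Nous+ schema.
    uri: str
    kind: str
    tags: set = field(default_factory=set)
    related: set = field(default_factory=set)   # URIs of related items
    attrs: dict = field(default_factory=dict)
    created: datetime = field(default_factory=datetime.now)


def query(items, kind=None, has_tags=(), lacks_tags=(),
          related_to=(), not_related_to=(),
          attr_ranges=None, created_between=None):
    """Return the items that satisfy every supplied filter."""
    results = []
    for it in items:
        if kind is not None and it.kind != kind:
            continue
        if not set(has_tags) <= it.tags:          # must carry all of these
            continue
        if set(lacks_tags) & it.tags:             # must carry none of these
            continue
        if not set(related_to) <= it.related:
            continue
        if set(not_related_to) & it.related:
            continue
        if attr_ranges:
            # Each entry is name -> (low, high), inclusive on both ends.
            in_range = all(
                name in it.attrs and lo <= it.attrs[name] <= hi
                for name, (lo, hi) in attr_ranges.items()
            )
            if not in_range:
                continue
        if created_between is not None:
            start, end = created_between
            if not (start <= it.created <= end):
                continue
        results.append(it)
    return results
```

A call such as `query(items, kind="article", has_tags={"pkm"}, lacks_tags={"draft"})` would return only articles tagged "pkm" that are not drafts. A last-modified range would work the same way as `created_between`; it is left out to keep the sketch short.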
But this was before much of modern AI. Now, at the very least, I would expect AI to figure out reasonable tag structures and to note relationships to other things. It goes deeper though. The AI can look deeply into whatever the URI points to, if that is accessible to it. It can find other information online that relates to what is referenced, and much deeper detail about it.
Does all that go into the user's personal knowledge, or is it some cloud around it that may or may not be dynamically explored at the user's discretion? If it is only dynamically assembled by hooking in an AI with access to the user's Nous+, then the token and inference costs are paid again and again, even on behalf of the same user - which seems inefficient.
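One way to avoid paying that cost repeatedly is to store the AI's output the first time and reuse it while the underlying content is unchanged. The following is only a sketch under that assumption: `EnrichmentCache` is a hypothetical name, and `fake_enrich` is a stand-in for whatever expensive model call would really be made.

```python
import hashlib


class EnrichmentCache:
    """Store enrichment results keyed by URI plus a hash of the content."""

    def __init__(self, enrich_fn):
        self._enrich_fn = enrich_fn   # the expensive call (assumed)
        self._store = {}
        self.misses = 0               # how many times we actually paid

    def get(self, uri, content):
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        key = (uri, digest)
        if key not in self._store:
            # Only pay the inference cost for unseen or changed content.
            self.misses += 1
            self._store[key] = self._enrich_fn(uri, content)
        return self._store[key]


def fake_enrich(uri, content):
    # Stand-in for an LLM call: "suggests" long words as tags.
    words = {w.strip(".,").lower() for w in content.split()}
    return {"suggested_tags": sorted(w for w in words if len(w) > 6)}


cache = EnrichmentCache(fake_enrich)
first = cache.get("https://example.com/pkm", "Personal knowledge management")
again = cache.get("https://example.com/pkm", "Personal knowledge management")
# The second call is served from the store, so cache.misses stays at 1.
```

Keying on a content hash means a changed page is enriched again, while an unchanged one is never re-processed. Whether the stored result then counts as part of the user's personal knowledge is exactly the open question.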
If multiple users all reference the same event, person, etc., do they all have their own limited personal perspective on it, perhaps highly redundantly? Does the overall system combine them and augment each with the multiple perspectives? Does it optionally do that? Should the underlying engine build augmented information about, say, public entities or concepts and make it available to all users via interaction with the AI?
Where exactly are the boundaries between private knowledge and information and the broader world of knowledge and information about the things referenced? Is there some dial for the user to choose?
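To make the idea of a dial concrete, here is one possible shape for it, sketched with a private store per user sitting on top of a shared layer of facts about public entities. Every name here (`Sharing`, `SharedLayer`, `PersonalStore`, the `proprietary` flag) is my own assumption for illustration, and the sketch deliberately leaves the hard part, deciding what counts as proprietary, to the caller.

```python
from enum import Enum


class Sharing(Enum):
    PRIVATE = 0       # never read from or write to the shared layer
    READ_SHARED = 1   # read shared facts, contribute nothing
    CONTRIBUTE = 2    # read shared facts and contribute non-proprietary ones


class SharedLayer:
    """Facts about public entities, visible to users who opt in."""

    def __init__(self):
        self._facts = {}

    def add(self, entity, fact):
        self._facts.setdefault(entity, set()).add(fact)

    def get(self, entity):
        return set(self._facts.get(entity, ()))


class PersonalStore:
    def __init__(self, shared, dial=Sharing.PRIVATE):
        self.shared = shared
        self.dial = dial
        self._notes = {}

    def remember(self, entity, note, proprietary=True):
        # Everything the user remembers stays in their own store.
        self._notes.setdefault(entity, set()).add(note)
        # Only non-proprietary notes leave it, and only if the dial allows.
        if not proprietary and self.dial is Sharing.CONTRIBUTE:
            self.shared.add(entity, note)

    def lookup(self, entity):
        view = {"private": set(self._notes.get(entity, ())), "shared": set()}
        if self.dial is not Sharing.PRIVATE:
            view["shared"] = self.shared.get(entity)
        return view
```

With this shape, a user on `PRIVATE` gets exactly the old behavior: only what they chose to remember, in their own terms. Turning the dial up trades some of that delimitation for knowledge that was extracted once and is reused by everyone.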
Modern LLMs are trained on a very large corpus that frankly ignores a lot of boundaries and notions, reasonable or unreasonable, of some kind of knowledge ownership. A large part of their ability comes from this large corpus. However, it is obvious that running fresh inference on every prompt is very costly. It is logical that RAG (retrieval-augmented generation) and KAG (knowledge-augmented generation, usually backed by some knowledge graph) are more efficient and more accurate.
But if you are building a PKM for many, many users, or even just for yourself, why wouldn't you do vectorization and knowledge extraction once, as you go, and keep the results so that AI-augmented use of your knowledge store gets far cheaper and better over time? Why wouldn't you combine the non-proprietary elements (it is a good trick to know where that boundary is) and the knowledge about them to make the cost lower and the value much higher for every user going forward?
The tension here is between personal minds and group minds.