1. The Discovery Problem
Building a personal knowledge graph is easy. Making it useful is hard. The value of a knowledge graph isn't in the nodes — it's in the edges, the connections that reveal how ideas relate.
Manually creating every edge is tedious and limited by what you already know. The interesting connections are often ones you haven't thought of yet. MEMEX solves this by automatically discovering semantic relationships using structured data from Wikidata and link analysis from Wikipedia.
2. Data Sources
2.1 Wikidata
Wikidata is a free, collaboratively edited, structured knowledge base hosted by the Wikimedia Foundation. Every entity (person, concept, work) has a unique identifier (Q-number) and a set of statements, each pairing a property (P-number) with a value that is often another entity.
For intellectual history, the most valuable properties are:
| Property | Name | Example |
|---|---|---|
| P737 | influenced by | McLuhan → Innis |
| P738 | influenced | Innis → McLuhan |
| P184 | doctoral advisor | Stein → Husserl |
| P185 | doctoral student | Husserl → Stein |
| P1066 | student of | Aristotle → Plato (informal) |
| P135 | movement | Derrida → Deconstruction |
| P101 | field of work | Kittler → Media theory |
| P108 | employer | McLuhan → U of Toronto |
| P800 | notable work | McLuhan → Understanding Media |
2.2 Wikipedia Link Graph
Wikipedia articles link to other articles, and these links encode semantic relationships. When an editor writes that "McLuhan was influenced by Harold Innis," they typically link Innis's name to his article. Millions of such editorial decisions form a distributed ontology.
The MediaWiki Action API (https://en.wikipedia.org/w/api.php) provides two relevant query modules:
- action=query&prop=links: outgoing links from an article
- action=query&list=backlinks: incoming links to an article
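As a minimal sketch of calling these two modules, the helpers below build request URLs for English Wikipedia, restrict results to the article namespace (0), and follow the API's continuation mechanism, because a single response returns only one batch of links. The name fetchWikipediaLinks matches the helper called in section 4.1, but this body is an illustrative assumption rather than MEMEX's implementation, and the fetchImpl parameter is a hypothetical hook for substituting a test double.

```javascript
// Sketch: request URLs for the two link modules, plus a helper that collects
// every outgoing link of one article by following API continuation.
const WIKI_API = 'https://en.wikipedia.org/w/api.php';

// Outgoing links (prop=links), limited to the article namespace (0)
function buildLinksUrl(title, cont = {}) {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'links',
    titles: title,
    plnamespace: '0',
    pllimit: 'max',
    format: 'json',
    formatversion: '2', // pages come back as an array
    origin: '*',
    ...cont // continuation values from the previous response, if any
  });
  return `${WIKI_API}?${params}`;
}

// Incoming links (list=backlinks), limited to the article namespace (0)
function buildBacklinksUrl(title, cont = {}) {
  const params = new URLSearchParams({
    action: 'query',
    list: 'backlinks',
    bltitle: title,
    blnamespace: '0',
    bllimit: 'max',
    format: 'json',
    formatversion: '2',
    origin: '*',
    ...cont
  });
  return `${WIKI_API}?${params}`;
}

// Hypothetical helper: all outgoing link titles for one article
async function fetchWikipediaLinks(title, fetchImpl = globalThis.fetch) {
  const titles = [];
  let cont = {};
  while (cont) {
    const response = await fetchImpl(buildLinksUrl(title, cont));
    const data = await response.json();
    for (const page of data.query?.pages ?? []) {
      for (const link of page.links ?? []) titles.push(link.title);
    }
    // A "continue" object is present only while more results remain
    cont = data.continue ?? null;
  }
  return titles;
}
```

Passing the whole continue object back as query parameters is the continuation pattern the API documents; origin=* is what anonymous cross-origin requests from a browser require.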
3. Connection Types
MEMEX works with four types of connections: three discovered automatically and one created by the user.
3.1 Explicit Wikidata Relationships
Direct property links between entities. These are the highest-confidence connections: someone explicitly stated the relationship in structured form.
```sparql
# Query: Who influenced entity Q5878? (P737 = "influenced by")
SELECT ?influencer ?influencerLabel WHERE {
  wd:Q5878 wdt:P737 ?influencer.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
  }
}
```
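The query above can be sent to the Wikidata Query Service as an HTTP GET request. A minimal sketch follows, assuming the public endpoint https://query.wikidata.org/sparql and the standard SPARQL JSON results format; runSparql, parseBindings, and qidFromUri are hypothetical helper names.

```javascript
// Sketch: run a SPARQL query on the Wikidata Query Service and reduce the
// standard SPARQL JSON results to plain { id, label } records.
const WDQS_ENDPOINT = 'https://query.wikidata.org/sparql';

function buildSparqlUrl(query) {
  return `${WDQS_ENDPOINT}?${new URLSearchParams({ query, format: 'json' })}`;
}

// Entity URIs look like http://www.wikidata.org/entity/Q42; keep the Q-ID
function qidFromUri(uri) {
  return uri.slice(uri.lastIndexOf('/') + 1);
}

// One record per result row, using the variable and its "Label" companion
function parseBindings(json, variable) {
  return json.results.bindings
    .filter(row => row[variable])
    .map(row => ({
      id: qidFromUri(row[variable].value),
      label: row[`${variable}Label`]?.value ?? null
    }));
}

async function runSparql(query, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(buildSparqlUrl(query), {
    headers: { Accept: 'application/sparql-results+json' }
  });
  if (!response.ok) throw new Error(`WDQS returned HTTP ${response.status}`);
  return response.json();
}

// Usage (performs a network request):
// runSparql(queryText).then(json => parseBindings(json, 'influencer'));
```

Wikimedia asks automated clients to identify themselves with a descriptive User-Agent header, which server-side code can add to the headers object.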
3.2 Shared Property Membership
Two entities share a categorical property value. If both McLuhan and Kittler have P101 = Media theory, that's a connection: they work in the same field.
```javascript
// Find property values two entities have in common (e.g. movement, field of work).
// getEntityProperties is assumed to resolve to an object mapping
// property IDs to arrays of value IDs.
async function findSharedProperties(entityA, entityB) {
  const [propsA, propsB] = await Promise.all([
    getEntityProperties(entityA),
    getEntityProperties(entityB)
  ]);
  const shared = [];
  for (const [prop, valuesA] of Object.entries(propsA)) {
    const valuesB = new Set(propsB[prop] || []);
    const overlap = valuesA.filter(v => valuesB.has(v));
    if (overlap.length > 0) {
      shared.push({ property: prop, values: overlap });
    }
  }
  return shared;
}
```
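findSharedProperties leaves getEntityProperties undefined. One way to implement it, sketched here as an assumption, is to download the entity's JSON from Special:EntityData and keep only item-valued statements for the properties of interest, skipping statements ranked deprecated. The default property list and the function names are illustrative.

```javascript
// Sketch: reduce a Wikidata entity's claims to { propertyId: [itemId, ...] },
// keeping only statements whose value is another Wikidata item.
function extractItemClaims(entity, propertyIds) {
  const result = {};
  for (const pid of propertyIds) {
    const statements = entity.claims?.[pid] ?? [];
    const values = statements
      .filter(s => s.rank !== 'deprecated') // skip statements editors deprecated
      .map(s => s.mainsnak)
      .filter(snak => snak?.snaktype === 'value' &&
                      snak.datavalue?.type === 'wikibase-entityid')
      .map(snak => snak.datavalue.value.id);
    if (values.length > 0) result[pid] = values;
  }
  return result;
}

// Hypothetical wrapper around the Special:EntityData JSON export
async function getEntityProperties(
  qid,
  propertyIds = ['P737', 'P184', 'P1066', 'P135', 'P101', 'P108'],
  fetchImpl = globalThis.fetch
) {
  const response = await fetchImpl(
    `https://www.wikidata.org/wiki/Special:EntityData/${qid}.json`
  );
  const data = await response.json();
  // The export is keyed by entity ID; take the single entity it contains
  const [entity] = Object.values(data.entities);
  return extractItemClaims(entity, propertyIds);
}
```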
3.3 Wikipedia Cross-References
If Wikipedia article A links to Wikipedia article B, there's likely a semantic relationship. This catches connections that haven't been formalized in Wikidata.
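```javascript
// Check whether Wikipedia article A links to article B
async function checkWikipediaLink(titleA, titleB) {
  // URLSearchParams percent-encodes titles containing spaces, '&', or non-ASCII
  const params = new URLSearchParams({
    action: 'query',
    titles: titleA,
    prop: 'links',
    pltitles: titleB, // only report links that point to titleB
    redirects: '1', // follow titleA if it is a redirect
    format: 'json',
    origin: '*' // allows anonymous cross-origin requests from a browser
  });
  const response = await fetch(`https://en.wikipedia.org/w/api.php?${params}`);
  const data = await response.json();
  // With the default JSON format, pages is an object keyed by page ID
  const pages = Object.values(data.query?.pages ?? {});
  return (pages[0]?.links?.length ?? 0) > 0;
}
```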
3.4 Manual Connections
User-created links with custom relationship types: influences, extends, contradicts, cites, applies-to, etc. These capture insights that aren't encoded in any external database.
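A manual edge can reuse the same record shape as discovered edges. The sketch below, with hypothetical names throughout, normalizes free-text labels so that variants such as "Applies To" and "applies-to" collapse into one type, and flags types outside the suggested vocabulary rather than rejecting them, since users may define their own.

```javascript
// Suggested vocabulary, taken from the examples in the text; not exhaustive
const SUGGESTED_RELATION_TYPES = ['influences', 'extends', 'contradicts', 'cites', 'applies-to'];

// Normalize free-text labels: trim, lowercase, spaces/underscores to hyphens
function normalizeRelationType(label) {
  return label.trim().toLowerCase().replace(/[\s_]+/g, '-');
}

// Hypothetical manual edge record; "provenance" marks it as user-created
function createManualEdge(sourceId, targetId, label, note = '') {
  const type = normalizeRelationType(label);
  if (type === '') throw new Error('Relationship type must not be empty');
  if (sourceId === targetId) throw new Error('A node cannot be linked to itself');
  return {
    source: sourceId,
    target: targetId,
    type,
    provenance: 'manual',
    custom: !SUGGESTED_RELATION_TYPES.includes(type), // user-defined type?
    note
  };
}
```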
4. Discovery Algorithm
4.1 On Node Addition
When a user adds a new entity to their graph:
- Fetch Wikidata entity by label search
- Retrieve all relevant properties (P737, P135, P101, etc.)
- For each property value, check if it matches an existing node
- Create edges for matches
- Optionally fetch Wikipedia links and check against existing nodes
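```javascript
// Minimal set intersection; a missing array is treated as empty
function intersect(a = [], b = []) {
  const setB = new Set(b);
  return a.filter(v => setB.has(v));
}

// fetchWikidataProperties and fetchWikipediaLinks are assumed helpers:
// the first resolves to { influencedBy: [QIDs], movements: [QIDs] },
// the second to an array of article titles linked from the given article.
// Existing nodes are assumed to cache the same fields under node.properties.
async function discoverConnections(newNode, existingNodes) {
  const connections = [];

  // 1. Direct Wikidata relationships
  const wdProps = await fetchWikidataProperties(newNode.wikidataId);
  for (const node of existingNodes) {
    // New node was influenced by an existing node (P737 on the new node)
    if (wdProps.influencedBy?.includes(node.wikidataId)) {
      connections.push({
        source: newNode.id,
        target: node.id,
        type: 'influenced_by',
        provenance: 'wikidata' // kept apart from `source`, which holds the node ID
      });
    }
    // Existing node was influenced by the new node (P737 on the existing node)
    if (node.properties?.influencedBy?.includes(newNode.wikidataId)) {
      connections.push({
        source: node.id,
        target: newNode.id,
        type: 'influenced_by',
        provenance: 'wikidata'
      });
    }
    // Shared movements (P135)
    const sharedMovements = intersect(wdProps.movements, node.properties?.movements);
    if (sharedMovements.length > 0) {
      connections.push({
        source: newNode.id,
        target: node.id,
        type: 'shared_movement',
        provenance: 'wikidata',
        data: sharedMovements
      });
    }
  }

  // 2. Wikipedia link analysis: outgoing links from the new node's article
  const wikiLinks = new Set(await fetchWikipediaLinks(newNode.wikipediaTitle));
  for (const node of existingNodes) {
    if (node.wikipediaTitle && wikiLinks.has(node.wikipediaTitle)) {
      connections.push({
        source: newNode.id,
        target: node.id,
        type: 'wikipedia_link',
        provenance: 'wikipedia'
      });
    }
  }
  return connections;
}
```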
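Step 1 of the list above (label search) is not shown in discoverConnections. A sketch using the wbsearchentities module of the Wikidata Action API follows; it returns candidates rather than a single answer, because a label can match several entities. The helper names and the default limit of 5 are assumptions.

```javascript
// Sketch of label search via the Wikidata Action API (wbsearchentities)
const WIKIDATA_API = 'https://www.wikidata.org/w/api.php';

function buildEntitySearchUrl(label, language = 'en', limit = 5) {
  const params = new URLSearchParams({
    action: 'wbsearchentities',
    search: label,
    language,
    type: 'item',
    limit: String(limit),
    format: 'json',
    origin: '*'
  });
  return `${WIKIDATA_API}?${params}`;
}

// Candidates for the user (or a heuristic) to choose from
function parseSearchCandidates(json) {
  return (json.search ?? []).map(hit => ({
    id: hit.id,
    label: hit.label ?? null,
    description: hit.description ?? null // helps disambiguate namesakes
  }));
}

// Hypothetical helper combining the two
async function searchEntities(label, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(buildEntitySearchUrl(label));
  return parseSearchCandidates(await response.json());
}
```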
4.2 Full Graph Scan
The "Discover Links" function runs pairwise comparison across all nodes, finding connections that might have been missed on initial addition (e.g., if node B was added before node A had its properties fetched).
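A pairwise scan can be sketched as a double loop over unordered pairs that calls a comparison function in both directions and deduplicates the results against edges already in the graph. compare, edgeKey, and the SYMMETRIC_TYPES list are hypothetical; the type names follow the section 4.1 code.

```javascript
// Edge types whose meaning does not depend on direction (assumed list)
const SYMMETRIC_TYPES = new Set(['shared_movement', 'shared_property']);

// Canonical key: symmetric edges ignore endpoint order, directed ones keep it
function edgeKey(edge) {
  let { source, target } = edge;
  if (SYMMETRIC_TYPES.has(edge.type) && String(source) > String(target)) {
    [source, target] = [target, source];
  }
  return `${source}|${target}|${edge.type}`;
}

// compare(a, b) returns zero or more candidate edges for one ordered pair
function fullGraphScan(nodes, compare, existingEdges = []) {
  const seen = new Set(existingEdges.map(edgeKey));
  const found = [];
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      // Directional checks (influence, links) must run both ways
      const candidates = [...compare(nodes[i], nodes[j]), ...compare(nodes[j], nodes[i])];
      for (const edge of candidates) {
        const key = edgeKey(edge);
        if (!seen.has(key)) {
          seen.add(key);
          found.push(edge);
        }
      }
    }
  }
  return found;
}
```

The scan makes n(n - 1) comparison calls for n nodes, so its cost grows quadratically; any network lookups inside compare are best cached per node rather than repeated per pair.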
5. Edge Visualization
Different connection types render differently:
| Type | Style | Color |
|---|---|---|
| Wikidata influence | Solid, directed arrow | Orange |
| Shared properties | Dashed | Yellow |
| Wikipedia links | Thin solid | Blue |
| Manual | Solid | Green |
This visual hierarchy lets users quickly distinguish curated relationships (Wikidata influence), inferred ones (shared properties and Wikipedia links), and personal annotations (manual).
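The styling table can be expressed as a lookup. This sketch assumes each edge records its origin in a provenance field (a hypothetical name) and that any shared_* type is a shared-property edge; the stroke widths are placeholder values, not taken from MEMEX.

```javascript
// Sketch: the styling table as data; widths are placeholders
const EDGE_STYLES = {
  wikidataInfluence: { line: 'solid', arrow: true, width: 2, color: 'orange' },
  sharedProperty: { line: 'dashed', arrow: false, width: 1.5, color: 'yellow' },
  wikipediaLink: { line: 'solid', arrow: false, width: 0.75, color: 'blue' }, // "thin"
  manual: { line: 'solid', arrow: false, width: 1.5, color: 'green' }
};

function styleForEdge(edge) {
  // Manual edges win regardless of their relationship type
  if (edge.provenance === 'manual') return EDGE_STYLES.manual;
  if (edge.type === 'influenced_by') return EDGE_STYLES.wikidataInfluence;
  if (edge.type.startsWith('shared_')) return EDGE_STYLES.sharedProperty;
  // Wikipedia links and anything unrecognized get the least prominent style
  return EDGE_STYLES.wikipediaLink;
}
```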
6. SPARQL for Complex Queries
For advanced discovery, MEMEX can execute SPARQL queries against the Wikidata Query Service. Example: find all entities that influenced both McLuhan AND Kittler:
```sparql
SELECT ?influencer ?influencerLabel WHERE {
  wd:Q193489 wdt:P737 ?influencer.  # McLuhan influenced by
  wd:Q77116 wdt:P737 ?influencer.   # Kittler influenced by
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
  }
}
```
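The same pattern generalizes to any number of entities: each additional triple pattern that reuses ?influencer narrows the result to influencers common to all of them. The builder below is a hypothetical helper; its optional exclusion list uses SPARQL's NOT IN filter to hide influencers that are already nodes in the graph, leaving only candidates the user has not yet added.

```javascript
// Only strings shaped like Q-IDs are interpolated into the query text
const QID = /^Q\d+$/;

function commonInfluencersQuery(qids, excludeQids = [], language = 'en') {
  if (qids.length === 0) throw new Error('Need at least one entity');
  for (const id of [...qids, ...excludeQids]) {
    if (!QID.test(id)) throw new Error(`Not a Q-ID: ${id}`);
  }
  if (!/^[a-z]+(-[a-z]+)*$/.test(language)) throw new Error(`Bad language code: ${language}`);
  const lines = ['SELECT ?influencer ?influencerLabel WHERE {'];
  // One P737 pattern per entity, all sharing ?influencer (an implicit AND)
  for (const id of qids) lines.push(`  wd:${id} wdt:P737 ?influencer.`);
  // Optionally hide influencers that are already nodes in the graph
  if (excludeQids.length > 0) {
    lines.push(`  FILTER(?influencer NOT IN (${excludeQids.map(id => `wd:${id}`).join(', ')}))`);
  }
  lines.push(
    '  SERVICE wikibase:label {',
    `    bd:serviceParam wikibase:language "${language}".`,
    '  }',
    '}'
  );
  return lines.join('\n');
}
```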
This returns Harold Innis — a node the user might not have thought to add but which connects two existing nodes in their graph.
7. The Emergent Ontology Thesis
Wikipedia's link structure isn't random. When thousands of editors independently decide which articles to link, they're making semantic judgments. The aggregate of these judgments forms an emergent ontology — a map of conceptual relationships that no single person designed but that reflects collective human knowledge organization.
MEMEX treats Wikipedia not as a source of text to read, but as a graph to traverse. The articles are nodes; the links are edges; the structure is the knowledge.
8. Limitations
8.1 Wikidata Coverage
Wikidata's "influenced by" property (P737) is unevenly populated. Major philosophers have extensive entries; obscure academics may have none. The system falls back to Wikipedia links when Wikidata is sparse.
8.2 Link Noise
Not every Wikipedia link is semantically meaningful. An article might link to "United States" or "1945" — these are navigational, not conceptual. MEMEX filters common low-signal targets.
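A simple version of that filter combines a stoplist with title patterns. The specific titles and patterns below are illustrative assumptions, not MEMEX's actual list.

```javascript
// Sketch: drop navigational link targets before matching against nodes
const LOW_SIGNAL_TITLES = new Set([
  'United States', 'United Kingdom', 'France', 'Germany', 'English language'
]);
const LOW_SIGNAL_PATTERNS = [
  /^\d{1,4}$/,                      // bare years such as "1945"
  /^\d{1,4}s$/,                     // decades such as "1960s"
  /^\d{1,2}(st|nd|rd|th) century$/, // e.g. "20th century"
  /^List of /,                      // list articles
  /^(January|February|March|April|May|June|July|August|September|October|November|December) \d{1,2}$/ // calendar dates
];

function isLowSignal(title) {
  return LOW_SIGNAL_TITLES.has(title) ||
    LOW_SIGNAL_PATTERNS.some(re => re.test(title));
}

function filterLinks(titles) {
  return titles.filter(t => !isLowSignal(t));
}
```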
8.3 Directionality
Wikipedia links are one-way. If A links to B, we know that A's article mentions B, but not whether B's article mentions A; answering that takes a second lookup, such as a backlinks query. Wikidata's inverse properties (P737/P738) address this for influence relationships, but not for all connection types.
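For Wikidata relationships specifically, direction is less of an obstacle, because a SPARQL triple pattern can place the entity in either the subject or the object position. The hypothetical builder below asks for both directions of P737 in one query and labels each row with its direction.

```javascript
// Sketch: P737 in both directions for one entity, using a UNION of two patterns
function influenceBothWaysQuery(qid, language = 'en') {
  if (!/^Q\d+$/.test(qid)) throw new Error(`Not a Q-ID: ${qid}`);
  if (!/^[a-z]+(-[a-z]+)*$/.test(language)) throw new Error(`Bad language code: ${language}`);
  return `SELECT ?other ?otherLabel ?direction WHERE {
  {
    wd:${qid} wdt:P737 ?other.   # ${qid} was influenced by ?other
    BIND("influenced_by" AS ?direction)
  } UNION {
    ?other wdt:P737 wd:${qid}.   # ?other was influenced by ${qid}
    BIND("influenced" AS ?direction)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "${language}". }
}`;
}
```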
9. Future Work
- OpenAlex integration — Citation networks from academic literature, showing who actually cited whom (not just who Wikipedia says influenced whom)
- Path finding — SPARQL queries to find shortest path between two entities through the Wikidata graph
- Clustering — Automatic grouping of nodes by shared properties (all Frankfurt School members, all media theorists)
- Temporal visualization — Timeline view showing when influences could have occurred based on birth/death dates
10. Conclusion
Personal knowledge management tools typically treat connections as something users must create manually. MEMEX inverts this: connections are discovered automatically from the semantic structure already encoded in Wikidata and Wikipedia. The user's job shifts from link creation to link curation — reviewing discovered connections, adding manual refinements, and building a personalized map of how ideas relate.
The underlying thesis is that the web already contains a vast, implicit knowledge graph. MEMEX makes it explicit and personal.