◐ Shell
clean mode source β†—

GitHub - Manu577228/Advanced-String-Algorithms-Java: A collection of advanced String algorithms with efficient implementations and clear explanations.

by Manu577228

"Strings are the DNA of computation β€” master them, and you master the machine."


🧬 What Is This?

A curated, production-quality Java library of the most sophisticated string algorithms ever devised β€” algorithms that power search engines, compilers, bioinformatics tools, and beyond.

This repository goes far beyond the basics. Every algorithm here is a mathematical masterpiece with deep theoretical roots, implemented cleanly and efficiently in Java.


πŸ—‚οΈ Algorithms Included

Algorithm Category Complexity What It Does
πŸ” Booth's Algorithm Canonical Form O(n) Finds the lexicographically smallest rotation of a string
πŸ”„ Duval's Algorithm Lyndon Words O(n) Decomposes a string into Lyndon words in linear time
🌳 EER Tree Palindromes O(n) Builds a palindrome tree β€” tracks all distinct palindromic substrings
🧬 Farach's Algorithm Suffix Trees O(n) Optimal suffix tree construction for integer alphabets
πŸ€– Suffix Automaton Automata O(n) The smallest DFA that accepts all suffixes of a string
πŸ—œοΈ Ukkonen's Algorithm Suffix Trees O(n) Real-time, online suffix tree construction β€” a true classic

πŸ“ Project Structure

Advanced-String-Algorithms-Java/
└── src/
    └── main/
        └── java/
            └── org/bharadwaj/t/
                β”œβ”€β”€ BoothsAlgorithm/
                β”œβ”€β”€ DuvalsAlgorithm/
                β”œβ”€β”€ EERTreeAlgorithm/
                β”œβ”€β”€ FarachsAlgorithm/
                β”œβ”€β”€ SuffixAutomatonAlgorithm/
                └── UkkonensAlgorithm/

πŸš€ Getting Started

Prerequisites

  • Java 11 or higher
  • Maven or Gradle (optional)

Clone the Repository

git clone https://github.com/Manu577228/Advanced-String-Algorithms-Java.git
cd Advanced-String-Algorithms-Java

Compile & Run (Example β€” Ukkonen's Algorithm)

javac src/main/java/org/bharadwaj/t/UkkonensAlgorithm/*.java
java -cp src/main/java org.bharadwaj.t.UkkonensAlgorithm.Main

🧠 Deep Dives

πŸ” Booth's Algorithm β€” Lexicographic Rotation

Booth's algorithm finds the starting index of the lexicographically least rotation of a string in O(n) time. It's a cornerstone of string normalization and is used in canonical form detection, graph isomorphism, and data compression.

Key insight: Uses a failure function similar to KMP to avoid recomparison.

πŸ”„ Duval's Algorithm β€” Lyndon Factorization

Duval's algorithm decomposes any string into a sequence of Lyndon words β€” strings that are strictly smaller than all of their rotations. The factorization is unique (Chen–Fox–Lyndon theorem) and runs in O(n) time with O(1) space.

Applications: String sorting, suffix array construction, algebraic combinatorics.

🌳 EER Tree (Palindromic Tree)

The EER Tree (Eertree / Palindrome Tree) is a data structure that encodes all distinct palindromic substrings of a string in a compact tree. It has at most n+2 nodes for a string of length n.

Applications: Palindrome counting, eertree-based DP, genome analysis.

🧬 Farach's Algorithm β€” Suffix Trees for Integer Alphabets

Farach's algorithm is theoretically optimal β€” it constructs suffix trees for integer alphabets in O(n) time, improving on earlier algorithms. It uses a clever divide-and-conquer approach based on odd/even position merging.

Applications: Full-text indexing, bioinformatics, pattern matching at scale.

πŸ€– Suffix Automaton (SAM)

The Suffix Automaton is the minimal DAWG (Directed Acyclic Word Graph) that recognizes all suffixes of a given string. It has O(n) states and transitions and is one of the most powerful string data structures ever developed.

Applications: Substring search, LCS computation, string counting.

πŸ—œοΈ Ukkonen's Algorithm β€” Online Suffix Tree

Ukkonen's is perhaps the most celebrated string algorithm β€” it builds a suffix tree online, one character at a time, in O(n) total time. The use of suffix links and active points makes it beautifully efficient.

Applications: Pattern matching, data compression, plagiarism detection.


πŸ“Š Complexity Overview

Algorithm          | Time      | Space     | Online?
-------------------|-----------|-----------|--------
Booth's            | O(n)      | O(n)      | No
Duval's            | O(n)      | O(1)      | Yes βœ…
EER Tree           | O(n)      | O(n)      | Yes βœ…
Farach's           | O(n)      | O(n)      | No
Suffix Automaton   | O(n)      | O(n)      | Yes βœ…
Ukkonen's          | O(n)      | O(n)      | Yes βœ…

πŸ’‘ Why These Algorithms?

Most string algorithm repositories stop at KMP, Rabin-Karp, or Z-algorithm. This repo starts where others end. These are:

  • πŸ“ Mathematically deep β€” rooted in automata theory, combinatorics on words, and formal language theory
  • βš™οΈ Practically powerful β€” used in real-world systems from Linux grep to bioinformatics pipelines
  • 🧩 Interview gold β€” rare enough to impress, important enough to matter

🀝 Contributing

Contributions are warmly welcome! If you'd like to:

  • πŸ› Fix a bug β†’ open an Issue first
  • βž• Add an algorithm β†’ open a Pull Request with tests
  • πŸ“– Improve docs β†’ direct PRs welcome
# Fork β†’ Clone β†’ Branch β†’ Code β†’ PR
git checkout -b feature/your-algorithm-name

πŸ“œ License

This project is licensed under the MIT License β€” use it freely, attribution appreciated.


Made with 🧠 + β˜• by Manu577228

If this helped you, consider leaving a ⭐ β€” it means a lot!