GitHub - Manu577228/Advanced-String-Algorithms-Java: A collection of advanced String algorithms with efficient implementations and clear explanations.
𧬠What Is This?
A curated, production-quality Java library of the most sophisticated string algorithms ever devised β algorithms that power search engines, compilers, bioinformatics tools, and beyond.
This repository goes far beyond the basics. Every algorithm here is a mathematical masterpiece with deep theoretical roots, implemented cleanly and efficiently in Java.
ποΈ Algorithms Included
| Algorithm | Category | Complexity | What It Does |
|---|---|---|---|
| π Booth's Algorithm | Canonical Form | O(n) |
Finds the lexicographically smallest rotation of a string |
| π Duval's Algorithm | Lyndon Words | O(n) |
Decomposes a string into Lyndon words in linear time |
| π³ EER Tree | Palindromes | O(n) |
Builds a palindrome tree β tracks all distinct palindromic substrings |
| 𧬠Farach's Algorithm | Suffix Trees | O(n) |
Optimal suffix tree construction for integer alphabets |
| π€ Suffix Automaton | Automata | O(n) |
The smallest DFA that accepts all suffixes of a string |
| ποΈ Ukkonen's Algorithm | Suffix Trees | O(n) |
Real-time, online suffix tree construction β a true classic |
π Project Structure
Advanced-String-Algorithms-Java/
βββ src/
βββ main/
βββ java/
βββ org/bharadwaj/t/
βββ BoothsAlgorithm/
βββ DuvalsAlgorithm/
βββ EERTreeAlgorithm/
βββ FarachsAlgorithm/
βββ SuffixAutomatonAlgorithm/
βββ UkkonensAlgorithm/
π Getting Started
Prerequisites
- Java 11 or higher
- Maven or Gradle (optional)
Clone the Repository
git clone https://github.com/Manu577228/Advanced-String-Algorithms-Java.git
cd Advanced-String-Algorithms-JavaCompile & Run (Example β Ukkonen's Algorithm)
javac src/main/java/org/bharadwaj/t/UkkonensAlgorithm/*.java
java -cp src/main/java org.bharadwaj.t.UkkonensAlgorithm.Mainπ§ Deep Dives
π Booth's Algorithm β Lexicographic Rotation
Booth's algorithm finds the starting index of the lexicographically least rotation of a string in O(n) time. It's a cornerstone of string normalization and is used in canonical form detection, graph isomorphism, and data compression.
Key insight: Uses a failure function similar to KMP to avoid recomparison.
π Duval's Algorithm β Lyndon Factorization
Duval's algorithm decomposes any string into a sequence of Lyndon words β strings that are strictly smaller than all of their rotations. The factorization is unique (ChenβFoxβLyndon theorem) and runs in O(n) time with O(1) space.
Applications: String sorting, suffix array construction, algebraic combinatorics.
π³ EER Tree (Palindromic Tree)
The EER Tree (Eertree / Palindrome Tree) is a data structure that encodes all distinct palindromic substrings of a string in a compact tree. It has at most n+2 nodes for a string of length n.
Applications: Palindrome counting, eertree-based DP, genome analysis.
𧬠Farach's Algorithm β Suffix Trees for Integer Alphabets
Farach's algorithm is theoretically optimal β it constructs suffix trees for integer alphabets in O(n) time, improving on earlier algorithms. It uses a clever divide-and-conquer approach based on odd/even position merging.
Applications: Full-text indexing, bioinformatics, pattern matching at scale.
π€ Suffix Automaton (SAM)
The Suffix Automaton is the minimal DAWG (Directed Acyclic Word Graph) that recognizes all suffixes of a given string. It has O(n) states and transitions and is one of the most powerful string data structures ever developed.
Applications: Substring search, LCS computation, string counting.
ποΈ Ukkonen's Algorithm β Online Suffix Tree
Ukkonen's is perhaps the most celebrated string algorithm β it builds a suffix tree online, one character at a time, in O(n) total time. The use of suffix links and active points makes it beautifully efficient.
Applications: Pattern matching, data compression, plagiarism detection.
π Complexity Overview
Algorithm | Time | Space | Online?
-------------------|-----------|-----------|--------
Booth's | O(n) | O(n) | No
Duval's | O(n) | O(1) | Yes β
EER Tree | O(n) | O(n) | Yes β
Farach's | O(n) | O(n) | No
Suffix Automaton | O(n) | O(n) | Yes β
Ukkonen's | O(n) | O(n) | Yes β
π‘ Why These Algorithms?
Most string algorithm repositories stop at KMP, Rabin-Karp, or Z-algorithm. This repo starts where others end. These are:
- π Mathematically deep β rooted in automata theory, combinatorics on words, and formal language theory
- βοΈ Practically powerful β used in real-world systems from Linux
grepto bioinformatics pipelines - π§© Interview gold β rare enough to impress, important enough to matter
π€ Contributing
Contributions are warmly welcome! If you'd like to:
- π Fix a bug β open an Issue first
- β Add an algorithm β open a Pull Request with tests
- π Improve docs β direct PRs welcome
# Fork β Clone β Branch β Code β PR
git checkout -b feature/your-algorithm-nameπ License
This project is licensed under the MIT License β use it freely, attribution appreciated.
Made with π§ + β by Manu577228
If this helped you, consider leaving a β β it means a lot!