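Regex to match the end of sentences in order to split a block of text into sentences. It matches terminal punctuation (".", "?" or "!"), any closing quotes or brackets that follow it, and the whitespace after it. Negative lookbehinds keep common abbreviations such as "e.g.", "Mr." and "Mrs.", and single initials such as "J.", from being treated as sentence ends.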

([\.\?!][\'\"\u2018\u2019\u201c\u201d\)\]]*\s*(?<!\w\.\w.)(?<![A-Z][a-z][a-z]\.)(?<![A-Z][a-z]\.)(?<![A-Z]\.)\s+)
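To make the one-line pattern easier to read, here is a sketch that lays the same regex out with Python's `re.VERBOSE` flag and comments each part. The names `COMPACT` and `ANNOTATED` are illustrative, and the comments are an interpretation of what each lookbehind guards against rather than the original author's notes. The assertion checks that both forms split a sample string identically (Python 3 assumed).

```python
import re

# The pattern from this post, unchanged, as a Python 3 raw string.
COMPACT = r'([\.\?!][\'\"\u2018\u2019\u201c\u201d\)\]]*\s*(?<!\w\.\w.)(?<![A-Z][a-z][a-z]\.)(?<![A-Z][a-z]\.)(?<![A-Z]\.)\s+)'

# The same pattern spread out for re.VERBOSE, which ignores unescaped
# whitespace and '#' comments outside character classes.
ANNOTATED = r"""
(                                     # capture the whole sentence ending
  [\.\?!]                             # terminal punctuation: . ? or !
  [\'\"\u2018\u2019\u201c\u201d\)\]]* # optional closing quotes or brackets
  \s*                                 # optional whitespace
  (?<!\w\.\w.)                        # not after e.g. / i.e. style abbreviations
  (?<![A-Z][a-z][a-z]\.)              # not after Mrs. / Dec. style abbreviations
  (?<![A-Z][a-z]\.)                   # not after Mr. / Dr. style abbreviations
  (?<![A-Z]\.)                        # not after a single initial such as J.
  \s+                                 # at least one whitespace character
)
"""

compact_re = re.compile(COMPACT)
annotated_re = re.compile(ANNOTATED, re.VERBOSE)

# Both forms should split a sample string identically.
sample = 'Mr. Smith arrived, e.g. at noon. "Is he late?" she asked. J. Doe said no!'
assert compact_re.split(sample) == annotated_re.split(sample)
print(annotated_re.split(sample))
# ['Mr. Smith arrived, e.g. at noon', '. ', '"Is he late', '?" ', 'she asked', '. ', 'J. Doe said no!']
```

All four lookbehinds are fixed-width, which Python's `re` module requires for lookbehind assertions.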

Python Code

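import re

# Python 3: the Python 2-only ur'' prefix is a syntax error, so a plain raw string r'' is used.
sentence_regex = r'([\.\?!][\'\"\u2018\u2019\u201c\u201d\)\]]*\s*(?<!\w\.\w.)(?<![A-Z][a-z][a-z]\.)(?<![A-Z][a-z]\.)(?<![A-Z]\.)\s+)'
regex = re.compile(sentence_regex, flags=re.UNICODE)
sentences = regex.split(TEXT_BLOCK)  # TEXT_BLOCK: the str to split; matched endings are kept as separate items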
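Because the whole pattern is wrapped in one capturing group, `re.split` returns the matched endings as separate list items: sentence, ending, sentence, ending, and so on, with the final sentence last. The sketch below uses a hypothetical helper, `split_sentences`, to reattach each ending to the sentence before it and strip the surrounding whitespace. It assumes Python 3 and is one possible way to post-process the split, not part of the original code.

```python
import re

# The post's pattern, compiled once (Python 3 raw string).
SENTENCE_REGEX = re.compile(
    r'([\.\?!][\'\"\u2018\u2019\u201c\u201d\)\]]*\s*(?<!\w\.\w.)(?<![A-Z][a-z][a-z]\.)(?<![A-Z][a-z]\.)(?<![A-Z]\.)\s+)'
)


def split_sentences(text):
    """Hypothetical helper: split text into sentences, keeping closing punctuation.

    SENTENCE_REGEX.split returns [body, ending, body, ending, ..., last_body],
    so bodies sit at even indexes and their endings at the following odd index.
    """
    parts = SENTENCE_REGEX.split(text)
    sentences = []
    for i in range(0, len(parts), 2):
        # The last body has no ending after it.
        ending = parts[i + 1] if i + 1 < len(parts) else ''
        sentence = (parts[i] + ending).strip()
        if sentence:  # skip empty pieces, e.g. from an empty input string
            sentences.append(sentence)
    return sentences


print(split_sentences('Mr. Smith arrived at noon. Was he late? Nobody knew!'))
# ['Mr. Smith arrived at noon.', 'Was he late?', 'Nobody knew!']
```

One caveat: the lookbehinds that protect abbreviations also block a split after a sentence ending in a capitalized two- or three-letter word (such as "Bob.") or in any capital letter (such as "NASA."), so `split_sentences('I met Bob. He left.')` returns a single item.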
