Jevgeni Tsaikin's blog

All about me and Mobile Development for iOS and Windows platforms. This blog reflects only my personal opinions.



Text to Speech in iOS using AVSpeechSynthesizer and automatic language detection

In this post I am going to share the approach I took to synthesize speech from text with automatic language detection. I will provide sample code in both Objective-C and Swift 3.0 below.

In order to synthesize an utterance I had to use the AVFoundation framework, and the AVSpeechSynthesizer and AVSpeechUtterance classes in particular. AVSpeechSynthesizer is used to pronounce utterances, and the AVSpeechUtterance class allows you to set up various properties of an utterance, including its rate, voice and volume. Before passing the utterance to the synthesizer I had to activate an audio session with the Playback category and the DuckOthers option (feel free to pick other categories or options depending on your application’s behaviour or specific use case). Once the synthesizer had finished speaking the last utterance, I had to deactivate the previously activated audio session.

For automatic language detection I used NSLinguisticTagger (from the Foundation framework), which allowed me to detect the language of a given NSString (String) object. Then I had to iterate over the speech voices installed on a device (or simulator) to find a match for the text’s language. If an appropriate match couldn’t be found, or if the text language couldn’t be detected (e.g. when the text is too short, or in other cases where the tagger returns “und” for undetermined), I had to return a default language.
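To illustrate the detection step on its own, here is a minimal Swift 3 sketch (the German sample sentence is just an arbitrary example):

import Foundation

let tagger = NSLinguisticTagger(tagSchemes: [NSLinguisticTagSchemeLanguage], options: 0)
tagger.string = "Guten Morgen, wie geht es dir?"
// the tag at index 0 is the detected language code (e.g. "de" here);
// for very short or ambiguous text the tagger returns "und" (undetermined)
let detected = tagger.tag(at: 0, scheme: NSLinguisticTagSchemeLanguage, tokenRange: nil, sentenceRange: nil)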

Below is the sample code for the Swift version of the implementation. (Unfortunately, at the moment the WordPress highlighter doesn’t support Swift, so I had to use the “cpp” highlighter instead.)

import AVFoundation

@objc
class TextToSpeechUtils: NSObject, AVSpeechSynthesizerDelegate {
    
    let synthesizer = AVSpeechSynthesizer()
    let audioSession = AVAudioSession.sharedInstance()
    let defaultLanguage = "en-US"
    var lastPlayingUtterance: AVSpeechUtterance?
    
    override init() {
        super.init()
        // the delegate must be set, otherwise speechSynthesizer(_:didFinish:) below is never called
        synthesizer.delegate = self
    }
    
    public func synthesizeSpeech(forText text: String) {
        
        if (text.isEmpty) { return }
        
        do {
            // activate the audio session so other audio is ducked while speaking
            try audioSession.setCategory(AVAudioSessionCategoryPlayback, with: [.duckOthers])
            try audioSession.setActive(true)
        } catch {
            return
        }
        
        let utterance = AVSpeechUtterance(string: text)
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        utterance.volume = 0.7
        utterance.voice = AVSpeechSynthesisVoice(language: detectLanguageFromText(text))
        self.synthesizer.speak(utterance)
        
        self.lastPlayingUtterance = utterance
    }
    
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        if (synthesizer == self.synthesizer && self.lastPlayingUtterance == utterance) {
            do {
                // after the last utterance has played, deactivate the audio session
                try self.audioSession.setActive(false)
            } catch {
                return
            }
        }
    }
    
    private func detectLanguageFromText(_ text: String) -> String {
        let tagger = NSLinguisticTagger(tagSchemes: [NSLinguisticTagSchemeLanguage], options: 0)
        tagger.string = text
        let textLanguage = tagger.tag(at: 0, scheme: NSLinguisticTagSchemeLanguage, tokenRange: nil, sentenceRange: nil)
        
        // check whether the detected language matches any installed speech voice
        var detectedLanguage: String?
        for voice in AVSpeechSynthesisVoice.speechVoices() {
            let languageStringParts = voice.language.components(separatedBy: "-")
            if (languageStringParts.count > 0 && languageStringParts[0] == textLanguage) {
                detectedLanguage = voice.language
                break
            }
        }
        
        // if the language could not be detected, return the default language
        return detectedLanguage ?? defaultLanguage
    }
}
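Using the class is then a one-liner per utterance. Here is a hypothetical call site (the sample strings are arbitrary; in a real app the instance should be kept alive, e.g. as a property, so it is not deallocated while speaking):

let textToSpeech = TextToSpeechUtils()
// spoken with an en-US voice
textToSpeech.synthesizeSpeech(forText: "Hello, world!")
// should pick a German voice, if one is installed and detection succeeds
textToSpeech.synthesizeSpeech(forText: "Guten Morgen, wie geht es dir?")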

And here is the Objective-C version of the same code.
Header file:

#import <Foundation/Foundation.h>
#import <AVFoundation/AVFoundation.h>

@interface TSTextToSpeechUtils : NSObject<AVSpeechSynthesizerDelegate>

- (void)synthesizeSpeechForText:(NSString *)text;

@end

Implementation file:

#import "TSTextToSpeechUtils.h"

@interface TSTextToSpeechUtils ()

@property (strong, nonatomic) AVSpeechSynthesizer *synthesizer;
@property (strong, nonatomic) AVSpeechUtterance *lastPlayingUtterance;
@property (strong, nonatomic) AVAudioSession *audioSession;

@end

@implementation TSTextToSpeechUtils

- (instancetype)init
{
    if ((self = [super init])) {
        _synthesizer = [[AVSpeechSynthesizer alloc] init];
        _synthesizer.delegate = self;
    }
    return self;
}

- (void)synthesizeSpeechForText:(NSString *)text
{
    if ([text length] == 0) {
        return;
    }
    
    self.audioSession = [AVAudioSession sharedInstance];
    
    NSError *error;
    
    // activate the audio session to play the utterance (ducking other audio)
    [self.audioSession setCategory:AVAudioSessionCategoryPlayback withOptions:AVAudioSessionCategoryOptionDuckOthers error:&error];
    [self.audioSession setActive:YES error:&error];
    if (error != nil) {
        return;
    }
    
    AVSpeechUtterance *utterance = [[AVSpeechUtterance alloc] initWithString:text];
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate;
    utterance.voice = [AVSpeechSynthesisVoice voiceWithLanguage:[self detectLanguageFromText:text]];
    utterance.volume = 0.7;
    [self.synthesizer speakUtterance:utterance];
    
    self.lastPlayingUtterance = utterance;
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance *)utterance
{
    if (synthesizer == self.synthesizer && self.lastPlayingUtterance == utterance) {
        NSError *error;
        // after last utterance has played - deactivate the audio session
        [self.audioSession setActive:NO error:&error];
    }
}

- (NSString *)detectLanguageFromText:(NSString *)text
{
    NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeLanguage] options:0];
    [tagger setString:text];
    NSString *textLanguage = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:nil sentenceRange:nil];
    
    NSString *detectedLanguage = nil;
    
    // check if the text language exists within the installed speech voices
    for (AVSpeechSynthesisVoice *voice in [AVSpeechSynthesisVoice speechVoices]) {
        NSArray *languageStringParts = [voice.language componentsSeparatedByString:@"-"];
        if (languageStringParts.count > 0 && [languageStringParts[0] isEqualToString:textLanguage]) {
            detectedLanguage = voice.language;
            break;
        }
    }
    
    if (detectedLanguage == nil) {
        // if the language could not be detected, fall back to the default
        detectedLanguage = @"en-US";
    }
    return detectedLanguage;
}

@end



Short trip to London for a C/C++ Security course

Last evening of my first and unfortunately too short trip to London. The purpose of my trip was to attend a “Secure C/C++ Development” training kindly provided by Skype and MWR InfoSecurity. The course itself was really intense, professionally delivered and well structured, so I would highly recommend all my friends to attend, even if you don’t have much working experience or knowledge in C/C++ development (like me). I learned about so many new security-related topics, including Secure Cookies, canaries, DEP, ASLR, buffer overflows, heap overflows, exploitation principles and different mitigation techniques. Besides learning all those wonderful things, I did a little bit of sightseeing in the evenings. I didn’t have enough time to see much, but London made a great first impression on me: gorgeous architecture with a synergy of modern and old elements, and nice food, even though the city felt a little bit overcrowded, with a slight feeling of social diversity; but first impressions can be misleading. Of the places I had a chance to see, the area around Big Ben is the most eye-catching and spectacular. Hopefully I will be coming back shortly.

[Photos: Tower Bridge, Buckingham Palace, Big Ben]



JT at Codess 2014 Stockholm


Last week I had the honor to represent Skype and give a flash talk on Mobile App development, with some examples from the Windows Phone platform and my experience. It was a great event: good organization, many good questions, discussions and a lot of highly enthusiastic codesses. Thank you all!

[Photo: Jevgeni Tsaikin at Codess 2014 Stockholm]



Stretching ListView items to full width in WinRT XAML apps

If you have a ListView filled with child controls that you would like to stretch to full width, setting HorizontalContentAlignment to Stretch on the ListView itself, unfortunately, will not do the trick. In this situation you have to modify the ItemContainerStyle (as shown in the code sample below). This should work for both Windows 8.1 and Windows Phone 8.1 apps.

<ListView>
    <ListView.ItemContainerStyle>
        <Style TargetType="ListViewItem">
            <Setter Property="HorizontalContentAlignment" Value="Stretch" />
        </Style>
    </ListView.ItemContainerStyle>
    <ListView.ItemTemplate>
        <DataTemplate>
          ...
        </DataTemplate>
    </ListView.ItemTemplate>
</ListView>



Shared Projects vs Portable Class Libraries for Windows 8.1 and Windows Phone 8.1

When you build a new modern application, you might also think of sharing some of the sources between Windows 8.1 and Windows Phone 8.1 apps. To accomplish that, Visual Studio 2013 Update 2 offers two main options: Shared Projects or Portable Class Libraries. In this blog post I will compare those two options and gather the information on the major differences between Shared Projects (for Universal apps) and Portable Class Libraries for Windows 8.1 and Windows Phone 8.1 projects.

Continue reading



Switching my focus to Windows 8.1 and Windows Phone 8.1 development

After some big announcements at the Build 2014 conference, I guess it is time to switch my focus from building standard Silverlight Windows Phone apps to Jupiter framework (XAML/C#) apps for Windows 8.1 and Windows Phone 8.1. I believe this switch will also make me post articles more frequently. Right now I am in the middle of building a new Universal application which will run on both platforms, and it really feels like the future has already arrived.



Post-event summary of TechDay 2013 in Estonia, Latvia, Lithuania

More than one week has passed since one of the largest local Microsoft events took place in the three Baltic countries: Estonia, Latvia and Lithuania. This year there were again 5 different tracks, covering essentially every IT person’s needs from Security to the Kinect SDK. The event was held in the same three locations as last year, and there were again around 500 participants in each of the three countries, which is a relatively huge audience if we take the countries’ populations into consideration. In this post I will try to express my opinion about the event and share my feedback and observations.
Continue reading